Overview
Miraxis is building the rights-cleared data factory for robotics and physical AI. A key differentiator is turning messy, heterogeneous real-world robotics data into training-ready datasets with verifiable quality.
As Robotics Data Pipeline Engineer, you will own the multimodal pipeline layer: ingestion, transformation, validation, QA gates, and delivery packaging. You should be able to talk shop with vendors/partners/clients on best practices (formats, sync, calibration metadata, labeling, eval outputs) and also build the tooling to manipulate and audit datasets directly.
Miraxis was founded by Artem Sokolov, who is also the Founder and CEO of Humanoid AI. https://thehumanoid.ai/te%d0%b0m/
What you’ll do (Responsibilities)
* Build and operate multimodal pipelines for robotics/physical AI datasets: ingestion, transformation, validation, and delivery packaging.
* Define “training-ready” as enforceable checks: alignment validation, integrity checks, schema enforcement, and reproducibility standards.
* Build tooling to inspect, transform, and audit datasets (large files, long-running jobs, real-world edge cases).
* Collaborate with Ops/Delivery and Hardware & Integration to ensure capture metadata and formats support downstream usability.
* Work with partners/vendors/clients to align on formats and best practices; turn external constraints into concrete pipeline requirements.
* Maintain clear documentation (schemas, runbooks, data contracts) so a remote team can operate consistently.
What we’re looking for
* Hands-on experience with robotics/physical AI datasets (multimodal: video + sensors + proprioception) and their failure modes.
* Strong Python and data engineering instincts: validation, reproducibility, and careful handling of messy real-world data.
* Comfort working at the intersection of software and the robotics domain: able to reason about timing/sync, calibration metadata, and the practicalities of capture pipelines.
* Able to communicate clearly with both engineers and external stakeholders, and to convert ambiguity into executable specs.
Nice to have
* Experience with ROS/ROS2 data formats (bags) or other robotics logging systems.
* Familiarity with simulation/teleoperation datasets, annotation/labeling workflows, and evaluation harnesses.
* Experience building QA frameworks that surface issues early (before downstream training).
Working style & expectations
* Remote-friendly, high-ownership role. Writing and maintaining clear docs is part of the job.
* Travel may be required occasionally for partner debugging and alignment.
Location
Remote-friendly.