Principal Machine Learning Engineer – Production Systems

Bristol (City of Bristol)

SoftInWay UK Ltd.

Machine learning engineer

Posted: 19 January

Offer description

Overview

SoftInWay UK Ltd. is seeking a highly experienced ML Systems Architect to design and implement a scalable, production-grade architecture for our machine learning solver. This role bridges research prototypes and commercial deployment, ensuring reliability, maintainability, and performance in a mixed technology stack.

Responsibilities

* Architect the ML Solver Platform:
  * Define modular architecture for data preprocessing, model execution, and post-processing.
  * Establish clear API contracts between Python/TensorFlow and C# services.
* Productionize ML Workflows:
  * Convert research code into robust, testable, and observable services.
  * Implement CI/CD pipelines, automated testing, and reproducibility standards.
* Integration & Interoperability:
  * Design REST/gRPC endpoints for cross-language communication.
  * Ensure compatibility with C#/.NET services.
* Performance & Scalability:
  * Optimize GPU/CPU utilization, batching strategies, and memory management.
  * Plan for multi-model and multi-tenant scenarios.
* MLOps & Lifecycle Management:
  * Implement model versioning, artifact registries, and deployment workflows.
  * Set up monitoring, logging, and alerting for solver performance.
* Security & Compliance:
  * Apply best practices for secrets management, dependency scanning, and secure artifact storage.

Required Skills & Experience

* ML Frameworks: Expert in TensorFlow (TF2/Keras); experience with ONNX Runtime for inference.
* Programming: Advanced Python for ML; strong understanding of packaging, type checking, and performance profiling.
* Architecture: Proven experience designing scalable ML systems for production.
* APIs: Proficiency in gRPC/Protobuf and REST for cross-language integration.
* MLOps: CI/CD pipelines, containerization (Docker/Kubernetes), model registries, reproducibility.
* Performance Optimization: GPU acceleration (CUDA/cuDNN), mixed precision, XLA, profiling.
* Observability: Metrics, tracing, structured logging, dashboards.
* Security: SBOM, image signing, role-based access, vulnerability scanning.

Preferred Qualifications

* Experience with ONNX Runtime Training, PyTorch, or hybrid ML architectures.
* Familiarity with distributed training strategies and multi-GPU setups.
* Knowledge of feature stores and data validation frameworks.
* Exposure to regulated environments and compliance frameworks.

Tools & Technologies

* ML: TensorFlow, ONNX Runtime, tf2onnx.
* APIs: FastAPI, gRPC.
* DevOps: GitLab CI/GitHub Actions, Docker, Kubernetes.
* Monitoring: Prometheus, Grafana, OpenTelemetry.
* Security: HashiCorp Vault, Sigstore.

Why Join Us?

* Work on cutting-edge ML solutions integrated into commercial engineering software.
* Define architecture that scales across global deployments.
* Collaborate with a team of experts in ML, software engineering, and UI development.
* Competitive salary and benefits.

To apply: Send your resume and a brief cover letter to HR@softinway.com
