Job Title: DevOps Specialist & Data Engineer
Location: Remote
Type: Full-time
Experience Level: Senior
Industry: Generative AI / Artificial Intelligence / Machine Learning
Reports To: Head of Engineering / CTO
About Us
Ready to join a cutting-edge AI company? We’re on a mission to become the OpenAI of the spicy content industry, building a full-spectrum ecosystem of revolutionary AI infrastructure and products. Our platform, OhChat, features digital twins of real-world personalities and original AI characters, enabling users to interact with lifelike AI-generated characters through text, voice, and images, with a roadmap that includes agentic superModels, API integrations, and video capabilities.
Role Overview
We are looking for a Senior DevOps Specialist with a strong Python and data engineering background to support our R&D and tech teams by designing, building, and maintaining robust infrastructure and data pipelines across AWS and GCP. You will be instrumental in ensuring our systems are scalable, observable, cost-effective, and secure. This role is hands-on, cross-functional, and central to our product and research success.
Key Responsibilities
DevOps & Infrastructure
* Design, implement, and maintain infrastructure on AWS and Google Cloud Platform (GCP) to support high-performance computing workloads and scalable services.
* Collaborate with R&D teams to provision and manage compute environments for model training and experimentation.
* Maintain and monitor systems, implement observability solutions (e.g., logging, metrics, tracing), and proactively resolve infrastructure issues.
* Manage CI/CD pipelines for rapid, reliable deployment of services and models.
* Ensure high availability, disaster recovery, and robust security practices across environments.
Data Engineering
* Build and maintain data processing pipelines for model training, experimentation, and analytics.
* Work closely with machine learning engineers and researchers to understand data requirements and workflows.
* Design and implement solutions for data ingestion, transformation, and storage using tools such as Scrapy, Playwright, agentic crawling workflows (e.g., crawl4ai), or equivalent.
* Optimize and benchmark AI training, inference, and data workflows to ensure high performance, scalability, cost efficiency, and an exceptional customer experience.
* Maintain data quality, lineage, and compliance across multiple environments.
Key Requirements
* 5+ years of experience in DevOps, Site Reliability Engineering, or Data Engineering roles.
* Deep expertise with AWS and GCP, including services such as EC2, S3, Lambda, IAM, GKE, and BigQuery.
* Strong proficiency in infrastructure-as-code tools (e.g., Terraform, Pulumi, CloudFormation).
* Extensive hands-on experience with Docker, Kubernetes, and CI/CD tools such as GitHub Actions, Bitbucket Pipelines, or Jenkins, with a strong ability to optimize CI/CD workflows as well as AI training and inference pipelines for performance and reliability.
* Exceptional programming skills in Python. You are expected to write clean, efficient, and production-ready code. You should be highly proficient with modern Python programming paradigms and tooling.
* Proficiency in data-centric programming and scripting languages beyond Python (e.g., SQL, Bash).
* Proven experience designing and maintaining scalable ETL/ELT pipelines.
* Focused, sharp, and results-oriented: You are decisive, work with a high degree of autonomy, and consistently deliver high-quality results. You are quick to understand and solve the core of a problem and know how to summarize it efficiently for stakeholders.
* Effective communicator and concise in reporting: You should be able to communicate technical insights in a clear and actionable manner, both verbally and in written form. Your reports should be precise, insightful, and aligned with business objectives.
Nice to Have
* Experience supporting AI/ML model training infrastructure (e.g., GPU orchestration, model serving) for both diffusion and LLM pipelines.
* Familiarity with data lake architectures and tools like Delta Lake, LakeFS, or Databricks.
* Knowledge of security and compliance best practices (e.g., SOC2, ISO 27001).
* Exposure to MLOps platforms or frameworks (e.g., MLflow, Kubeflow, Vertex AI).
What We Offer
* Competitive salary + equity
* Flexible work environment and remote-friendly culture
* Opportunities to work on cutting-edge AI/ML technology
* Fast-paced environment with high impact and visibility
* Professional growth support and resources