Overview
The Software Assurance Group is seeking a Principal Cloud Engineer to support the Machine Learning Engineering team. In this role, you will collaborate with ML engineers to design and implement architectures for cloud-native tools that enhance the security, observability, and scalability of large-scale, global AI systems. You will build and guide a team responsible for secure, compliant cloud services tailored to machine learning environments, including secure transport and processing of ML artifacts, scalable telemetry pipelines, and orchestration frameworks for dynamic ML workloads. The architecture will support multi-tenant, global deployments while meeting compliance and privacy requirements.
Responsibilities
* Design and implement scalable, highly available cloud services, management consoles, and robust telemetry/observability.
* Integrate modern security tools and orchestrate large-scale workloads using Kubernetes or similar frameworks.
* Automate deployments and infrastructure (e.g., via Terraform, Ansible) and enable CI/CD integration.
* Optimize system performance and costs.
* Support SLAs/SLOs and drive response for critical production issues.
* Champion security-first, compliant architectures and collaborate with engineering and ML teams worldwide.
Qualifications
* BS degree in Computer Science, Software Engineering, or a related field; MS or PhD preferred.
* 8+ years of experience with distributed/cloud systems on a major cloud platform (OCI, AWS, GCP, Azure).
* Proficiency in programming languages such as Java, Python, Go, and C/C++.
* Expertise in multi-tenant cloud applications, microservices, APIs, and Infrastructure-as-Code.
* Strong background in CI/CD, Linux, security design (including compliance), and working with global teams.
* Demonstrated experience with observability, scalability, and cost optimization in production.
* Eligibility to work in the United Kingdom without sponsorship is essential.
* Direct experience with Oracle Cloud Infrastructure (OCI).
* Experience with security and quality tools.
* Exposure to ML/data engineering platforms and collaboration with ML teams.
* Experience with large-scale orchestration and monitoring (e.g., Kubernetes, Prometheus, Airflow) and with disaster recovery.
* Background in application or cloud security and leading cross-functional technical projects.
* Experience mentoring junior staff.
Benefits
Oracle offers competitive benefits, including flexible medical, life insurance, and retirement options, and supports employees in volunteering and community involvement.