Software Engineer - AI/ML, AWS Neuron Distributed Training

Cambridge
Amazon
Software engineer
Posted: 22h ago
Offer description

Overview

Software Engineer - AI/ML, AWS Neuron Distributed Training: Do you love decomposing problems to develop products that impact millions of people around the world? Would you enjoy identifying, defining, and building software solutions that revolutionize how businesses operate?

The Annapurna Labs team at Amazon Web Services (AWS) is looking for a Software Development Engineer II to build, deliver, and maintain complex products that delight our customers and raise our performance bar. You’ll design fault-tolerant systems that run at massive scale as we continue to innovate best-in-class services and applications in the AWS Cloud.

Annapurna Labs was a startup acquired by AWS in 2015 and is now fully integrated. If AWS is an infrastructure company, then think of Annapurna Labs as the infrastructure provider of AWS. Our org covers multiple disciplines, including silicon engineering, hardware design and verification, software, and operations. AWS Neuron is the complete software stack for the AWS Inferentia and Trainium cloud-scale machine learning accelerators and the servers that use them. This role is for a senior software engineer in the Machine Learning Applications (ML Apps) team for AWS Neuron, responsible for development, enablement, and performance tuning of a wide variety of ML model families, including large language models and other ML workloads.

The ML Distributed Training team works with chip architects, compiler engineers, and runtime engineers to create, build, and tune distributed training solutions with Trn2 and Trn1. Experience training large models using Python is a must. FSDP, DeepSpeed, and other distributed training libraries are central to this role, and extending them for Neuron-based systems is key.


Key job responsibilities

* Lead efforts building distributed training support into PyTorch, TensorFlow, JAX and the Neuron compiler and runtime stacks.
* Tune models to ensure high performance and maximize efficiency for customers on AWS Trainium and Inferentia silicon and on Trn2, Trn1, and Inf1 servers.
* Apply strong software development and ML knowledge to optimize distributed training workloads.


About the team

Inclusive Team Culture: Here at AWS, we embrace differences and are committed to a culture of inclusion. We have employee-led affinity groups and learning experiences. Amazon’s leadership principles encourage seeking diverse perspectives, learning, and earning trust.

Work/Life Balance: We value balance between personal and professional life and offer flexible working hours.

Mentorship & Career Growth: The team supports new members with mentorship and project assignments that foster professional growth.


BASIC QUALIFICATIONS

* 3+ years of non-internship professional software development experience
* 3+ years of non-internship design or architecture experience (design patterns, reliability and scaling)
* Experience programming with at least one software programming language
* Deep Learning industry experience


PREFERRED QUALIFICATIONS

* 3+ years of full software development life cycle experience (coding standards, code reviews, source control, build, testing, operations)
* Bachelor's degree in computer science or equivalent
* Experience with PyTorch, JAX, or TensorFlow, distributed libraries and frameworks, and end-to-end model training. The role offers the opportunity to optimize and scale large deep learning models on the Trainium architecture.

Amazon is an equal opportunity employer and does not discriminate on the basis of protected veteran status, disability, or other legally protected status.

Los Angeles County applicants: job duties include working safely, communicating effectively, and following laws and company policies. The company may consider qualified applicants with arrest and conviction records in accordance with local ordinances. Our inclusive culture empowers Amazonians to deliver the best results; accommodations are available during the application and hiring process. For more information, see the accommodations content on amazon.jobs.

Our compensation reflects the cost of labor across US markets. Base pay ranges from $129,300/year to $223,600/year, with compensation varying by market, knowledge, skills, and experience. Amazon is a total compensation company; equity, sign-on bonuses, and other benefits may be provided as part of a total package. This position will remain posted until filled. Applicants should apply via our career site.

Important FAQs for current Government employees: Please review the FAQs before proceeding.

© 2025 Jobijoba - All Rights Reserved
