Job Title: Principal Engineer - Distributed Compute
Job Description
The Principal Engineer is the senior technical authority responsible for setting engineering direction, ensuring platform reliability, and driving innovation across a critical enterprise technology domain. This role provides deep technical leadership, shapes long-term strategy, and ensures that engineering teams deliver secure, scalable, and resilient services that underpin the organisation's digital ecosystem. The Principal Engineer acts as the highest-level hands-on expert, partnering with architects, product owners, and engineering squads to define standards, modernise platforms, and embed automation and DevOps practices across all services.
Responsibilities
1. Serve as the domain’s foremost technical expert, accountable for engineering excellence and long-term platform strategy.
2. Define and maintain technical standards, patterns, and best practices across the domain.
3. Lead complex design decisions, ensuring solutions are secure, scalable, and aligned with enterprise architecture.
4. Drive adoption of automation, DevOps tooling, and platform-as-a-product principles across all engineering teams.
5. Oversee the health, performance, and lifecycle of core platforms within the domain.
6. Ensure capacity planning, resilience, backup, recovery, and disaster readiness are embedded into all services.
7. Champion observability, monitoring, and proactive incident prevention.
8. Identify opportunities to modernise legacy platforms, reduce technical debt, and introduce new technologies.
9. Evaluate emerging tools, frameworks, and architectures relevant to the domain.
10. Lead proofs of concept and guide engineering teams through adoption.
11. Partner with cross-domain Principal Engineers to ensure cohesive enterprise-wide engineering standards.
12. Work closely with product, security, architecture, and operations teams to deliver integrated solutions.
13. Mentor senior engineers and uplift engineering capability across the organisation.
14. Ensure compliance with security, regulatory, and operational standards.
15. Provide technical oversight for major changes, upgrades, and transformation initiatives.
16. Act as an escalation point for critical incidents and complex technical challenges.
Essential Skills
1. Strong experience with virtualisation platforms (VMware ESXi/vSphere, KVM, Hyper-V, or similar).
2. Deep understanding of Linux and/or Windows operating systems, including kernel-level concepts.
3. Hands-on experience with distributed compute systems or cluster management.
4. Proficiency in automation and scripting (Python, Bash, PowerShell, etc.).
5. Solid understanding of networking fundamentals in virtualised and distributed environments.
Additional Skills & Qualifications
1. Experience with containerisation and orchestration (Docker, Kubernetes, Nomad).
2. Knowledge of cloud compute platforms (Azure, AWS, GCP).
3. Familiarity with high-performance computing (HPC) or large-scale scheduling systems.
4. Exposure to security hardening and compliance frameworks.
5. Experience with performance tuning at OS, hypervisor, or cluster level.
Location
Sheffield, UK
Trading as TEKsystems. Allegis Group Limited, Maxis 2, Western Road, Bracknell, RG12 1RT, United Kingdom. No. 2876353. Allegis Group Limited operates as an Employment Business and Employment Agency as set out in the Conduct of Employment Agencies and Employment Businesses Regulations 2003. TEKsystems is a company within the Allegis Group network of companies (collectively referred to as "Allegis Group"). Aerotek, Aston Carter, EASi, Talentis Solutions, TEKsystems, Stamford Consultants and The Stamford Group are Allegis Group brands.