Senior Software Engineer/SRE - Application Middleware
Location: London, United Kingdom
Description & Requirements
Are you passionate about building high‑performance systems that are fast, resilient, and operate at global scale? Join Bloomberg’s Application Middleware SRE team, where you’ll combine software engineering and systems expertise to keep the backbone of the Bloomberg Terminal running smoothly for hundreds of thousands of users around the world.
The Team
We’re the Site Reliability Engineering team within Bloomberg’s Application Middleware group. Our mission is to ensure that Bloomberg’s core connectivity and messaging layers are resilient, scalable, and fully observable. We own systems that operate at high throughput and low latency, including:
* Gateways: Secure, high‑performance TCP/SSL entry points to our data centers
* HFN & NSTP: A global HTTP CDN and SOCKS5 proxy network delivering fast access from any geography
* Playlist Services: Dynamic path configuration systems optimizing user connectivity in real time
* PGM Relays: Infrastructure for reliable multicast data delivery
What You’ll Do
* Build production‑grade software that powers Bloomberg’s global infrastructure
* Design and implement scalable, fault‑tolerant systems with a focus on observability, performance, and automation
* Collaborate across engineering teams to introduce automated, self‑service operational workflows
* Conduct deep systems analysis and root cause investigations for complex, distributed systems
* Propose and prototype innovative approaches to reliability and risk mitigation
* Contribute to design docs, runbooks, and post‑incident reviews—clear communication is part of the job
You’ll Need to Have
* A degree in Computer Science, Engineering, Mathematics, or equivalent practical experience
* Strong software engineering skills in a high‑level language (Python and C++ are primary)
* A deep understanding of software system reliability and risk management—including how to identify potential points of failure and design mitigation strategies
* A good understanding of data structures, algorithms, and system design
* Experience navigating and improving large, distributed codebases
* Ability to identify system risks and engineer around points of failure
* Clear written and verbal communication, including technical documentation and incident analysis
We’d Love to See
* Systems knowledge: Operating systems, networking protocols (TCP, UDP, multicast), or core database concepts in modern infrastructure
* Cluster management experience, especially with Kubernetes, Argo, or other pipeline management platforms
* Machine management at scale: capacity planning and automating lifecycles of large machine fleets
* System observability and monitoring: SLIs/SLOs/SLAs, alerting, and building dashboards for complex systems
* Reliability in distributed systems: knowledge of fault tolerance and network/node failure challenges
* Mentoring: Proven experience mentoring and growing junior engineers