The central action of the market is the trade: the exchange of assets between two parties. The process of trading is composed of many parts, but once research is complete and traders have decided to proceed, they use trade automation and an Execution Management System (EMS) to route orders, execute trades, and observe the results. Trade Automation and Execution (TRAX) is an organization within Enterprise Products Engineering that provides both trade automation and an EMS for trading a variety of assets to our clients. There is substantial work related to the reliability, scalability, and technical risk management of those systems, and this is where TRAX Environments SRE plays a role.

Our team partners across the TRAX organization on that work and is responsible for maintaining the holistic Environments within which TRAX systems operate. The combination of software, hardware, and firm and regulatory requirements is what we consider the 'Environment' in which software runs. To give a sense of scale, this presently includes 65 parent clusters with 200 child clusters, and hundreds of services and middleware packages.

These clusters change over time, and those changes create risk. Examples include increases in trading volume resulting in higher CPU needs, new feature deployments resulting in different performance behavior, and firm and regulatory requirements such as OS upgrades and deprecations of older technology. TRAX Environments SRE is responsible for understanding these changes and the risk they create, and for partnering with stakeholder teams to mitigate that risk.

What's in it for you:

You'll have direct influence on the stability and resilience of Bloomberg systems, specifically for products that directly face clients. This means your work can directly improve the client experience as they perform trades.

You'll learn best practices for managing distributed systems by helping to understand and fix the risks our team identifies.

You'll work with many team members across the organization who are based in New York, London, and Frankfurt. This will increase your network significantly as we work across many teams.

We'll trust you to:
- Identify Environment risks across TRAX to drive our reliability efforts
- Partner with teams to help them navigate a variety of reliability issues
- Communicate solutions and roadmaps in meetings and presentations
- Develop and help stakeholder teams implement paved-path Chaos Engineering solutions
- Collaborate with company-wide infrastructure teams and promote best practices and guidelines
- Develop our own best practices for cluster and Environment management to help improve stability across TRAX clusters
- Develop and maintain tools and automation capable of early detection and intervention of issues in our production environment

You'll need to have:
- Demonstrated experience building enterprise applications with an object-oriented programming language (Python, Java, C++, etc.)
- A degree in Computer Science, Engineering, Mathematics, or a similar field of study, or equivalent work experience
- Knowledge of Unix or Linux fundamentals (or basic knowledge and a strong desire to learn)
- Experience contributing to and triaging scaling and reliability problems on production distributed systems
- Experience with metrics or monitoring solutions like Grafana, Prometheus, Graphite, Telegraf, Humio, or Metrictank

We would love to see:
- Experience with lower-level languages like C, C++, or Java
- Eagerness to learn about all levels of any software or hardware stack
- Strong interest in written and verbal technical communication