Role overview for the Software Engineer (Monitoring Platform)
As a Software Engineer (Monitoring Platform) at SRT, you will be part of a small team responsible for designing, building, and maintaining our productised monitoring and observability platform. This platform is deployed across geographically distributed on‑premises sites worldwide and serves clients with varying infrastructure and WAN capabilities. Rather than simply using Prometheus and Grafana, you will engineer the frameworks, tooling, and configuration pipelines that make our monitoring platform consistent, maintainable, and scalable across dozens of deployments. You will work closely with a lead observability engineer who oversees the platform’s architecture, and you will have the authority to architect monitoring solutions and specify changes to be implemented by other development teams. We are fortunate to have a team of highly experienced engineers, including UX designers, who can provide support and guidance as we extend the platform’s capabilities to serve both internal engineers and external end‑users. Please note: you will be required to come to our Cardiff office one day a week.
Key Responsibilities (not exhaustive) - Software Engineer (Monitoring Platform)
Platform Engineering & Configuration-as-Code
* Build and maintain configuration generation frameworks using Ansible, Jinja2, and Jsonnet to ensure consistency across deployments
* Design and manage Docker Compose‑based service orchestration for the monitoring stack
* Develop and maintain CI/CD pipelines (Jenkins) for building, testing, and packaging platform releases
Dashboards-as-Code & Visualisation
* Develop Grafana dashboards programmatically using the Grafana Foundation SDK (Python) and JSON provisioning
* Design reusable, templated dashboard components that can be configured per deployment
* Collaborate with engineering and product teams to create tailored visualisations for both engineers and end‑users
Monitoring Architecture & Design
* Design and configure Prometheus‑based metric collection, including recording rules, alerting rules, and service discovery
* Develop and maintain metric exporters for application and system‑level data
* Architect monitoring solutions and produce specifications for implementation by other development teams
Tooling & Automation
* Build and maintain Python and Bash tooling for deployment, bundling, and platform operations
* Develop automation to support environment‑specific configuration layering and threshold management
* Contribute to the platform’s packaging and distribution pipeline
Required Skills & Experience - Software Engineer (Monitoring Platform)
* Strong software engineering fundamentals—clean, well‑structured, maintainable code across languages and files
* Proven experience with Prometheus (including PromQL) and Grafana in production environments
* Experience with configuration management and generation tools (Ansible, Jinja2, or similar)
* Proficiency in Python and Bash in a Linux environment
* Experience with Docker and container orchestration (Docker Compose)
* Strong knowledge of Linux‑based systems
* Familiarity with CI/CD pipelines (Jenkins or similar)
* Architectural thinking—designing solutions that are consistent, scalable, and maintainable across multiple deployments
* Ability to work autonomously in a small team with significant ownership
Desirable Skills
* Experience with Grafana‑as‑code approaches (Grafana Foundation SDK, Grafonnet, or JSON provisioning)
* Familiarity with Jsonnet for configuration generation
* Experience with Thanos or other long‑term metric storage solutions
* Knowledge of SNMP‑based monitoring
Within SRT, the role title for this position will be System Monitoring & Observability Engineer.
SRT Marine Systems plc are an equal opportunity employer. We are committed to creating an inclusive working environment for all employees and actively encourage applications from all sectors of the community.