Join us as a Site Reliability Engineer where you'll spearhead the evolution of our digital landscape, driving innovation and excellence.
As a Microsoft SQL Database Site Reliability Engineer (SRE) at Barclays, you will assume a key technical role. You will help shape the direction of our database administration, ensuring our technological approaches are innovative and aligned with the Bank’s business goals. You will help drive high-impact projects to completion, collaborate with management, and implement SRE practices that use software engineering and database administration to address infrastructure and operational challenges at scale. As part of the Database SRE team, you will be data-driven and work to eliminate toil through simplification, automation, and observability, thereby enhancing the reliability of our platforms. With a focus on database scalability, availability, security, and performance, you will work closely with the Engineering team, product managers, and other teams. You will ensure the seamless flow and robust security of information on our platforms, meeting high traffic volumes and demanding operational needs.
To be successful as a Site Reliability Engineer, you should have experience with:
* Technical specialisation in MS SQL Server, versions 2016 to 2022, covering complex database-related issues from availability to tuning to architecture at enterprise scale.
* Contributing to the shaping and design of the SRE practice for the MSSQL offering, and delivering it through the SRE team.
* Serving as the technical escalation point for complex database-related issues, providing expert solutions.
* Assisting with the establishment and evolution of the SRE function, and implementing advanced SLIs and SLOs.
Some other highly valued skills may include:
* Experience with database automation and estate standardisation.
* Expert knowledge of system configuration management tools, such as Chef or Ansible, for database server configuration.
* Expertise with scripting languages (e.g. PowerShell) for automation and migration tasks.
You may be assessed on the key critical skills relevant for success in the role, such as risk and controls, change and transformation, business acumen, strategic thinking, and digital and technology, as well as job-specific technical skills.
This role will be based at our Knutsford campus.
Purpose of the role
To apply software engineering techniques, automation, and best practices in incident response to ensure the reliability, availability, and scalability of our systems, platforms, and technology.
Accountabilities
* Availability, performance, and scalability of systems and services through proactive monitoring, maintenance, and capacity planning.
* Resolution and analysis of, and response to, system outages and disruptions, and implementation of measures to prevent similar incidents from recurring.
* Development of tools and scripts to automate operational processes, reducing manual workload, increasing efficiency, and improving system resilience.
* Monitoring and optimisation of system performance and resource usage, identification and resolution of bottlenecks, and implementation of best practices for performance tuning.
* Collaboration with development teams to integrate best practices for reliability, scalability, and performance into the software development lifecycle, and work closely