We have an excellent contract job opportunity for an Observability/Monitoring Service Owner – Cloud with our leading client.
Role overview:
Own the technical execution of the observability solutions and the integration of monitoring tools, leveraging the AI capabilities of the NOW platform to manage events for the client's Transform products and technical platforms.
Contract – 6 months (high potential to extend further)
Location – Waterside (UB7 0GB) (2–3 days per week on-site)
Pay – attractive daily rate (inside IR35)
In this role, you will…
Leadership and Governance:
1. Lead and own the IT observability, automation and autohealing services for IAG Transform.
2. Foster a culture of innovation, collaboration, and continuous improvement in the organisation.
3. Develop and implement policies, processes and procedures for the observability service.
4. Define standards for logs, event alerts and quality assurance.
5. Establish governance frameworks to ensure consistent and compliant usage of observability tools.
6. Set up a technical review board for monitoring solutions to define, validate and endorse monitoring strategies, solutions, demands, etc.
7. Conduct regular audits to ensure compliance with established policies and standards.
8. Responsible for providing an observability centre of excellence, and for owning and providing observability solutions to product and platform teams.
Innovation and Strategy:
9. Develop strategies to leverage new observability tools and technologies to enhance IT service operations and overall business operations.
10. Lead proof-of-concept initiatives to automate the resolution of events and incidents.
11. Introduce and implement new machine learning models and AIOps features.
Process Improvement:
12. Responsible for identifying service optimisation initiatives to mature the overall service.
13. Continuously improve IT and business service availability through effective use of observability and automation tooling.
14. Identify opportunities to automate processes and reduce manual efforts.
15. Optimise Metric Intelligence.
Vendor Management:
16. Manage vendors and partners to provide a best-in-class service that meets IAG requirements.
17. Manage vendor relationships, service-level agreements (SLAs), escalations and CSI plans.
18. Evaluate and select new vendors and tools as needed.
Observability Tooling Architecture:
19. Design and oversee the implementation of a comprehensive enterprise observability tooling architecture and strategy that supports ITSM, monitoring, observability, automation, and delivery management.
20. Engage in the AIOps project to ensure that key monitoring tools such as Datadog, AWS, Azure Monitor and Dynatrace feed the right logs and metrics into the Event Management module in ServiceNow.
21. Optimize observability tooling infrastructure to improve efficiency, reliability, and performance.
22. Ensure that all tools integrate seamlessly with each other and with other enterprise systems.
23. Develop and maintain a roadmap for enterprise tool enhancements and upgrades.
24. Set up business service monitoring dashboards for the critical business services.
Automation & Autohealing:
25. Own the automation and autohealing service, platforms and tools.
26. Define the automation and autohealing policy, process and procedure.
27. Identify potential use cases for automation and autohealing and take them through the right governance to implement automation playbooks using Ansible or whichever AWS/Azure native services best fit the use case.
28. Responsible for reducing manual effort in service operations and increasing automation.
Tool Integration and Optimization:
29. Work collaboratively with cross-functional teams to ensure integration of tools across the Enterprise to reduce manual effort and maximise quality and productivity.
30. Define the technical specifications, standards, and policy for technical integration of monitoring tools into ServiceNow/Ansible.
31. Validate the technical architecture of the integration to ensure it is fit for use, fit for purpose, and scalable and flexible enough to meet the demands of measuring business services.
32. Implement best practices, industry standards and frameworks for configuration and usage of observability and automation technology tools.
ITSM Tooling:
33. Responsible for identifying opportunities to increase the proactive prediction, detection and restoration of events and incidents using machine learning models.
34. Responsible for leveraging AIOps and ServiceNow to increase the automation of resolution.
35. Design and oversee the implementation of ITSM tooling solutions that support ITIL-aligned processes.
36. Work collaboratively with cross-functional teams to ensure integration of ITSM tools with other essential enterprise tools (monitoring, CMDB, service desk, automation tools).
Training and Support:
37. Provide training and support to technology staff on the effective use of observability and automation services.
38. Serve as a subject matter expert for enterprise tools and related technologies.
Minimum Requirements:
39. Extensive experience in observability and automation technology, tools, services and processes, with a strong focus on management, effectiveness and architecture.
40. Significant experience in observability and automation architecture and enterprise systems.
41. Proven expertise in designing, implementing and managing a variety of observability tools such as CloudWatch, Azure Monitor, Datadog, ThousandEyes, etc.
42. Proven expertise in designing, implementing and managing a variety of automation and autohealing tools such as Ansible, Nexthink and native AWS/Azure services.
43. Experience of integrating with other industry tooling such as ServiceNow, Ansible, Nexthink, GitHub, and other DevOps tooling.
44. Experience with industry-standard SDLCs including, but not limited to, Agile, Waterfall, Hybrid and the product operating model.
45. Demonstrated ability to integrate and optimize observability tooling across complex IT environments, in cloud hosting (specifically AWS and Azure), on-premises and SaaS. Experience with a range of cloud-native solutions, e.g. Kubernetes monitoring, is preferred.
46. Experience and knowledge of the ServiceNow platform, preferably ITSM/ITOM/AIOps including Metric Intelligence.
47. Experience in defining and owning the event management and automation process.
48. Strong understanding of event correlation and noise reduction techniques that complement the AIOps, automation and autohealing capabilities.
49. Experience with defining observability, automation and autohealing strategies to increase the coverage and adoption across the landscape.
50. Experience embedding observability, automation and autohealing practices into BAU operations through service design and service operations processes.
Critical Skills:
51. Excellent analytical, problem-solving, and strategic thinking skills.
52. Strong communication and interpersonal skills with the ability to work effectively with cross-functional teams.
53. Exceptional organizational, communication, and interpersonal skills suited to a fast-paced, global corporate environment.
54. Robust problem-solving and analytical capabilities.
55. Experience in vendor management and negotiation.
56. Excellent verbal and written communication skills to effectively convey change proposals, document architecture and processes, and liaise with stakeholders at all levels.
57. Meticulous attention to detail to ensure accuracy and thoroughness.