Role Overview
This is a varied role working across a range of observability tools to ensure the smooth operation of Liquidnet’s Production trading platforms and the infrastructure that supports them. The role is responsible for identifying issues early, correlating related issues across hardware and software, and ensuring that clear runbooks are created and followed at all times.
Liquidnet employ an offshore ‘Network Operations Centre’ (NOC) in India who perform ‘eyes on glass’ monitoring of systems, raising alerts to the relevant teams as defined in the runbooks. This role provides oversight and guidance to the offshore team, but is embedded within the wider Production Support organisation so that it can provide real-time statistics and input towards root cause analysis, particularly during major incidents. This input is critical on major incident bridges and in post-mortems to ensure continuous improvement in the observability of systems.
Liquidnet currently operate on-prem. However, a major transformation programme over the next 2 years will see all non-latency-sensitive applications move to the Cloud, so experience of observability across both environments is crucial. A consolidation project is also underway across the observability tooling stack within the TP ICAP group, which is likely to affect Liquidnet’s tooling and working practices.
Lastly, there is a big drive to automate as much as possible at Liquidnet, and this role will contribute by automating smoke checks and other manual workflows. The ideal candidate will combine scripting and automation skills with a ruthless desire to reduce toil and repetitive, low-value tasks.
Role Responsibilities
1. Day-to-day oversight of the offshore NOC (alongside the role’s manager in New York)
2. Day-to-day responsibility for monitoring tooling, ensuring that the design is robust and that prescribed documentation is followed at all times
3. Actively engage and participate on incident bridges, helping to identify root cause as quickly as possible through real-time status updates and monitoring
4. Participate in post-mortems to ensure monitoring quality is continually evaluated and any gaps identified within incidents are closed as quickly as possible
5. Onboard new workflows and systems into the observability stack, ensuring adherence to standards. This includes building bespoke solutions where vendor products fall short
6. Work with DevOps and Deployment teams to ensure system changes are agreed from an observability perspective and do not introduce unnecessary risk
7. Contribute to Automation and AI workstreams, with a particular focus on automating post-deployment checks and smoke testing of application workflows to remove the need for manual work
8. Occasional weekend work will be required during major upgrades and out of hours testing
Experience / Competences
Essential
9. At least 5 years’ hands-on experience within a SysOps or NOC team, ideally within a financial institution (buy-side, sell-side, venue/platform provider)
10. Working knowledge of Cloud-based infrastructure and application monitoring, ideally with AWS certification (Cloud Practitioner, SysOps Administrator or other)
11. Proven hands-on automation and scripting experience (Perl, Python, PowerShell, Bash etc)
12. Basic application support experience within a Unix / Linux environment
13. Experience of supporting Windows Server environments
14. Experience in troubleshooting network problems, e.g. firewall and routing problems. Proven track record of implementing and maintaining a robust and flexible monitoring solution across a complex technical environment
15. Proactive, tenacious individual with the ability to solve complex issues
16. Willingness to challenge the status quo and bring about positive change
17. Capable of balancing multiple conflicting priorities and managing stakeholder expectations honestly and appropriately
Desired
18. Knowledge of the OpenTelemetry (OTel) and StatsD protocols
19. Knowledge of DevOps principles and workflows, including collaboration with Development teams
20. Experience with automation tools (Ansible, Puppet etc)
21. Experience supporting message-based architecture (Solace, Tibco, MQ etc)
22. Experience with industry-standard monitoring tools (ITRS, Prometheus or similar)
23. Working knowledge of, and experience with, SNMP and iLO
24. AWS-certified to SysOps Administrator level
25. Basic knowledge of the FIX protocol and workflows
26. Experience with MSSQL, Oracle or Sybase database environments
27. Experience working within an ITIL framework, ideally with ITIL Foundation qualification