 
        
About the role
We’re looking for a Senior Software Quality Engineer to own test strategy end-to-end for backend services. You’ll build scalable automation and performance frameworks, integrate them into CI/CD, and validate resiliency and operational readiness across AWS/Azure environments. You’ll partner closely with engineering, SRE, and product to enable fast, reliable releases.
Key responsibilities
Strategy and planning
 * Own test strategy, planning, and estimation for services and programs
 * Define quality gates, risk-based coverage, and release-readiness criteria
Automation and quality engineering
 * Design and maintain unified automation frameworks (Java, Cucumber, Robot Framework)
 * Build API and integration tests (Postman), reduce flakiness, and improve maintainability
 * Standardize builds (Gradle) and containerize test tooling (Docker)
Performance engineering
 * Design, execute, and analyze load/stress/soak tests with Gatling
 * Model realistic workloads, establish SLOs, and provide tuning recommendations
 * Track throughput, latency (P95/P99), error budgets, and capacity signals
Resilience and operational readiness
 * Run chaos tests with Litmus; validate failure handling, timeouts, and fallbacks
 * Verify backup/restore and disaster recovery objectives (RTO/RPO)
 * Lead game-days and resilience drills; document runbooks and playbooks
Observability and feedback loops
 * Instrument and monitor with Prometheus, Grafana, and New Relic
 * Wire test results and service telemetry into dashboards and alerts
 * Enable data-driven go/no-go decisions with objective quality signals
CI/CD and DevOps integration
 * Integrate tests into pipelines (Git/GitHub), enforce quality gates, and parallelize execution
 * Support trunk-based development, shift-left checks, and stable environments
Collaboration and enablement
 * Partner with developers, SRE, and product to triage, root-cause, and prevent defects
 * Mentor engineers on testing best practices and reliability-first design
 * Contribute to documentation, standards, and continuous improvement
What we expect from the candidate (must-haves)
 * 10+ years in Quality Engineering/SDET roles focused on backend or platform services
 * Strong coding with Java and hands-on automation using Cucumber and/or Robot Framework
 * Proven experience building CI/CD-integrated test frameworks (Git/GitHub, Gradle, Docker)
 * Performance testing expertise with Gatling (workload design, analysis, recommendations)
 * Chaos and resilience testing experience (Litmus) and operational readiness validation
 * Observability: Prometheus/Grafana/New Relic for metrics, dashboards, SLOs, and alerting
 * API testing experience (Postman) and a strong understanding of REST and common integration patterns
 * Cloud experience with AWS and/or Azure
 * Solid grasp of testing strategy: functional, integration, system, and non-functional
 * Excellent communication, critical thinking, and cross-functional collaboration
Nice to have
 * Hercules or similar performance harness tooling
 * Experience with Azure DevOps, GitHub Actions, or Jenkins (pipelines and environments)
 * Contract testing, service virtualization, or test containers
 * Kubernetes familiarity (Litmus typically runs on K8s) and IaC basics (e.g., Terraform)
 * Domain knowledge in banking/fintech and compliance-minded testing
Success metrics you’ll influence
 * Reduced test cycle time and flakiness rate; improved pipeline pass rate
 * Meaningful automation coverage aligned to business risk
 * Measurable improvements in P95/P99 latency and error budgets
 * Fewer escaped defects and faster MTTD/MTTR via actionable telemetry
 * Consistent, auditable release-readiness signals
First 90 days
 * Days 0–30: Onboard; baseline current coverage and performance; ship quick wins in CI gating
 * Days 31–60: Deliver Gatling suites and dashboards (Prometheus/Grafana/New Relic); standardize framework patterns
 * Days 61–90: Run first chaos game-day; validate backup/restore; publish reliability playbooks; measure impact
Tech stack you’ll use
 * Languages/Frameworks: Java, Cucumber, Robot Framework
 * Performance/Resilience: Gatling, Litmus, Hercules (nice to have)
 * API/Tools: Postman, Git/GitHub, Gradle, Docker
 * Observability: Prometheus, Grafana, New Relic
 * Cloud: AWS, Azure