Job Description:
Responsibilities:
* Monitor and analyze capacity metrics, aligning hardware requirements with physical data center resources.
* Perform complex rack-based 3-phase power calculations to balance redundant bus feeds and ensure electrical stability.
* Design and configure network fabric architectures that ensure high availability and redundant network connections at the node level.
* Manage remote ticketing and troubleshooting in lights-out environments using Out-of-Band (OOB) management for full remote operational control.
* Apply Site Reliability Engineering (SRE) principles to data center operations, focusing on automation, latency reduction, and overall reliability of physical and virtual infrastructure.
* Execute hands-on technical tasks, including server builds, hardware assembly, and troubleshooting.
* Coordinate with Data Center Operations teams to support physical deployments, including racking and precision cabling.
* Manage incoming requests through Jira and ServiceNow, prioritizing tasks and coordinating with cross-functional teams to maintain uptime.
* Coordinate procurement and vendor relations to ensure timely hardware orders, resolve shipment discrepancies, and maintain accurate documentation.
* Identify and implement process improvements to enhance operational efficiency, scalability, and time-to-compute metrics.
Requirements (Must Have):
* Deep familiarity with spine-leaf architecture, network fabric design, and implementing redundant failover paths per node.
* Practical knowledge of 3-phase power distribution, including managing A/B redundant power feeds and load balancing.
* Experience using Infrastructure as Code (IaC) and SRE methodologies to automate repetitive operational tasks and hardware provisioning workflows.
* Proven ability to maintain uptime in lights-out facilities through advanced remote monitoring and Out-of-Band (OOB) management.
* Strong knowledge of data center infrastructure and the ability to translate technical demand into physical hardware needs.
* Experience with server builds, hardware troubleshooting, and physical deployment support.
* Proficiency with Jira and ServiceNow.
Skills:
* Advanced networking and network fabric design.
* Strong understanding of data center capacity and power engineering.
* Knowledge of automation and infrastructure reliability practices.
* Excellent troubleshooting and operational problem-solving skills.
* Strong project management and workflow prioritization skills.
* Excellent communication and cross-functional collaboration skills.
* Ability to work independently in a fast-paced environment.