Principal Reliability and Automation Engineer

Hydrolix

All India, Delhi • 2 months ago

Experience: 10 to 14 Yrs

Job Description

As a Principal Site Reliability Engineer on our dynamic Services team, you will play a crucial role in ensuring the reliability and scalability of our cutting-edge platform. Your deep expertise in system reliability and automation will be instrumental in delivering exceptional solutions tailored to our customers' unique needs.

**Key Responsibilities:**

- **Reliability Engineering:** Design and build automated systems to ensure the reliability and scalability of Kubernetes clusters and Hydrolix deployments across multiple cloud platforms.
- **Automation and Efficiency:** Identify and eliminate repetitive manual work through automation and improved tooling, freeing the team to focus on high-value work.
- **Observability Infrastructure:** Enhance observability systems for deep visibility into system behavior, debugging, troubleshooting, and data-driven reliability decisions.
- **CI/CD and Deployment Automation:** Design robust CI/CD pipelines and deployment automation for safe, frequent releases with minimal human intervention.
- **Infrastructure Reliability:** Deploy and maintain a highly reliable fleet of Kubernetes clusters and Hydrolix deployments.
- **Service Optimization:** Implement systems and processes to enhance the reliability, availability, and performance of our services.
- **Root Cause Analysis:** Conduct comprehensive root cause analyses for system failures and implement long-term preventive measures.
- **Collaboration and Customer Engagement:** Work closely with cross-functional teams, share knowledge, and champion SRE best practices.

**Qualifications and Skills:**

- **SRE Expertise:** 10+ years of experience as a Site Reliability Engineer or DevOps Engineer supporting large-scale distributed systems.
- **Architecture, Performance & Scalability:** Deep experience in designing system architectures with reliability, scalability, and operability as primary concerns.
- **Automation, Platform & Infrastructure Engineering:** Track record of eliminating toil through automation and expertise in configuration management tools.
- **Observability & Reliability Engineering:** Deep expertise in observability tools, reliability concepts, and experience with chaos engineering.
- **Kubernetes & Distributed Systems:** Understanding of Kubernetes architecture, operations, and experience in operating multi-cluster environments.
- **Cloud & Multi-Cloud Expertise:** Proficiency in at least one major cloud platform and familiarity with multi-cloud architectures.
- **Networking, Security & Traffic Management:** Experience in network load balancing, security technology stacks, and standard networking protocols.
- **Data & Storage Systems:** Experience with SQL databases and ability to reason about performance and scaling characteristics of data-intensive systems.
- **Programming & Systems Engineering:** Strong programming skills in Go, Python, or Rust with the ability to build and maintain production-quality tools.
- **Linux & Infrastructure Fundamentals:** Deep experience with Linux systems, including performance tuning and low-level troubleshooting.
- **Incident Management & Operational Excellence:** Extensive experience in leading high-severity incidents, driving post-incident reviews, and improving operational standards.

We are excited to see how your expertise can contribute to the success of Hydrolix and make a significant impact on our platform.

Posted on: March 1, 2026
