Senior Site Reliability Engineer

Gruve

All India • 1 month ago

Experience: 6 to 10 Yrs

Job Description

Role Overview:
At Gruve, you will lead reliability strategy and architectural improvements across infrastructure, GPU systems, observability, ML Ops, and IT Ops. The role involves mentoring engineers, managing high-severity incidents, and driving SLO governance. Working with a team of SRE engineers, you will be responsible for setting up, maintaining, and troubleshooting the stack from bare metal through the application layer.

Key Responsibilities:
- Architect reliability improvements across Kubernetes, GPU infrastructure, ML Ops, networking, and monitoring.
- Lead incident management, blameless post-mortems, and error-budget policies.
- Drive automation, IaC, and reliability tooling at scale.
- Oversee metrics, logs, tracing, and dashboards; ensure actionable alerting.
- Integrate GPU operators/exporters and model lifecycle workflows for inference platforms.
- Mentor junior and mid-level SREs and guide cross-team initiatives.

Qualifications Required:
- 6-9 years of SRE or platform engineering experience.
- Expertise in Kubernetes operations and experience with a cloud platform (AWS/GCP/Azure).
- Advanced networking and security fundamentals.
- Strong coding background in Python, Go, or Java.
- Deep observability knowledge in Prometheus, Grafana, ELK / Fluentd.

About Gruve:
Gruve is an innovative software services startup dedicated to transforming enterprises into AI powerhouses. It specializes in cybersecurity, customer experience, cloud infrastructure, and advanced technologies such as Large Language Models (LLMs), and its mission is to help customers use their data to make more intelligent decisions. As a well-funded early-stage startup, Gruve offers a dynamic environment with strong customer and partner networks. For people who are passionate about technology and eager to make an impact, Gruve fosters a culture of innovation, collaboration, and continuous learning in a diverse and inclusive workplace.

Gruve is an equal opportunity employer welcoming applicants from all backgrounds.

Posted on: March 6, 2026
