DevOps Manager

Deutsche Telekom Digital Labs Private Limited

All India • 2 months ago

Experience: 10 to 14 Yrs

Job Description

Role Overview:

You are a DevOps Engineering Manager responsible for leading Cloud and Platform engineering for AI-first teams. Your main focus will be on building and operating highly reliable, secure, and scalable platforms that support microservices-based workloads and enable rapid experimentation and production rollout of Agentic AI systems. You will collaborate closely with AI/ML, platform, and product teams across India and Europe to operationalize AI solutions at scale.

Key Responsibilities:

- Define and own the cloud and platform architecture for large-scale containerized microservices and Agentic AI / LLM workloads, ensuring scalability, reliability, and cost efficiency.
- Lead CI/CD platform engineering, enabling automated build, test, security scanning, and deployment for backend services, React-based web applications, and mobile app backends.
- Enable production-grade AI platforms, supporting agent frameworks, vector databases, prompt pipelines, and inference.
- Define Infrastructure as Code standards, cloud account structures, networking, and environment provisioning across AWS and secondary clouds.
- Implement and enforce SRE practices: define SLIs/SLOs, error budgets, capacity and reliability targets, and lead incident response and post-incident reviews.
- Ensure end-to-end observability across services and AI workloads, including logs, metrics, traces, model performance, and cost visibility.
- Embed security, compliance, and governance by design, including IAM, secrets management, network security, vulnerability management, and AI-specific controls.
- Make informed build vs. buy decisions, evaluate emerging cloud and AI infrastructure technologies, and drive continuous platform modernization.

Qualifications Required:

- 10+ years of experience in DevOps / Cloud / Platform Engineering, including people management and technical leadership.
- Deep hands-on expertise with AWS, with working exposure to GCP and Azure in multi-cloud or hybrid environments.
- Proven experience operating large-scale, production-grade containerized workloads, with a strong understanding of high availability, fault tolerance, and capacity planning in global teams.
- Practical experience supporting AI/ML or LLM workloads in production environments.
- Strong expertise in Kubernetes and Docker, including cluster operations, workload isolation, ingress, service meshes, and deployment strategies.
- Advanced experience with Infrastructure as Code for cloud provisioning, networking, security controls, and environment standardization across multiple stages.
- Solid understanding of observability and reliability engineering, including metrics, logging, tracing, alerting, and defining SLIs/SLOs for distributed systems and AI services.
- Hands-on exposure to cloud security and compliance practices, including IAM design, secrets management, vulnerability scanning, and secure deployment patterns, especially for AI platforms.
- Knowledge of cloud cost optimization (FinOps), especially for AI workloads.
- Background in strong product-based organizations solving real customer-facing problems.

Additional Details of the Company:

The company values an AI-first mindset with curiosity and adaptability to turn rapid AI innovation into stable production systems. They are seeking a strategic thinker with hands-on technical depth, excellent communication and collaboration skills in global, distributed teams, and an ownership-driven leader who builds accountable teams and fosters a culture of reliability, automation, and continuous improvement.

Posted on: March 12, 2026

Relevant Jobs

Site Reliability Engineer - Vice President Level

NatWest Group

All India, Gurugram

View Job →

Senior Site Reliability Engineer, Tenant Services Geo (Mumbai)

Gitlab

All India

View Job →

DevOps Release & Deployment Architect

Sureify

All India, Hyderabad

View Job →

Digital Technology Advisor - Software Architecture

Baker Hughes

All India

View Job →
