Site Reliability Engineer Manager

Macquarie Group Limited

Hyderabad • 1 month ago

Experience: 9 to 13 Yrs

Job Description

Join us in our User Access Management (UAM) transformation journey where we have exciting opportunities for engineers to innovate, design, build, and maintain solutions. Our team ensures reliable, efficient UAM systems by solving operational challenges with code whilst driving automation, continuous improvement, and collaboration. As a Site Reliability Engineer (SRE) at Macquarie, you will play a crucial role in designing and operating large-scale, distributed systems powering our identity applications. Your expertise in Go, Python, or Java, along with knowledge of distributed systems, networking, and Linux internals, will be utilized to deliver robust and reliable solutions. Your responsibilities will include automating deployment, monitoring, and recovery using tools like Kubernetes, gRPC, and cloud platforms such as AWS or GCP. Additionally, you will leverage observability tooling like Prometheus, Grafana, and OpenTelemetry for production debugging. Your curiosity about how complex systems fail and commitment to building resilient, scalable services through continuous improvement and collaboration will be key to your success. 
Key Responsibilities:
- Design and operate large-scale, distributed systems using Go, Python, or Java
- Automate deployment, monitoring, and recovery utilizing tools like Kubernetes, gRPC, and cloud platforms
- Leverage observability tooling such as Prometheus, Grafana, and OpenTelemetry for production debugging
- Collaborate with teams to drive automation, continuous improvement, and reliability

Qualifications Required:
- Proven experience (9+ years) in software engineering or reliability roles
- Expertise in defining and measuring SLOs, SLIs, and error budgets
- Strong ability to design scalable, reliable systems and evaluate architectural choices for latency and performance
- Proficiency in building automation tools and services using Python, Go, or similar languages
- Skilled in maintaining high-quality documentation and runbooks through code generation and automation

At Macquarie, we offer a wide range of benefits including wellbeing leave, paid maternity and parental leave, company-subsidized childcare services, paid volunteer leave, and comprehensive medical and life insurance cover. We provide access to learning and development opportunities, hybrid and flexible working arrangements, as well as reimbursement for work-from-home equipment.

If you are inspired to build a better future and excited about the role at Macquarie, we encourage you to apply and be part of our diverse, equitable, and inclusive workplace where everyone is welcomed and valued.

Skills Required

Go, Python, Java, distributed systems, networking, Linux internals, Kubernetes, AWS, GCP, automation, automation tools, documentation, monitoring, metrics, tracing, forecasting, resource optimization, performance tuning, gRPC, Prometheus, Grafana, OpenTelemetry, SLOs, SLIs, error budgets, scalable systems, architectural choices, self-healing, autoscaling capabilities, runbooks, mean time to detect (MTTD), mean time to recover (MTTR), logs, data-driven scaling

Posted on: March 24, 2026
