Site Reliability Engineer Manager
Macquarie Group Limited
Hyderabad • 1 month ago
Experience: 9 to 13 Yrs
Job Description
Join us in our User Access Management (UAM) transformation journey where we have exciting opportunities for engineers to innovate, design, build, and maintain solutions. Our team ensures reliable, efficient UAM systems by solving operational challenges with code whilst driving automation, continuous improvement, and collaboration.
As a Site Reliability Engineer (SRE) at Macquarie, you will play a crucial role in designing and operating large-scale, distributed systems powering our identity applications. Your expertise in Go, Python, or Java, along with knowledge of distributed systems, networking, and Linux internals, will be utilized to deliver robust and reliable solutions. Your responsibilities will include automating deployment, monitoring, and recovery using tools like Kubernetes, gRPC, and cloud platforms such as AWS or GCP. Additionally, you will leverage observability tooling like Prometheus, Grafana, and OpenTelemetry for production debugging. Your curiosity about how complex systems fail and commitment to building resilient, scalable services through continuous improvement and collaboration will be key to your success.
Key Responsibilities:
- Design and operate large-scale, distributed systems using Go, Python, or Java
- Automate deployment, monitoring, and recovery utilizing tools like Kubernetes, gRPC, and cloud platforms
- Leverage observability tooling such as Prometheus, Grafana, and OpenTelemetry for production debugging
- Collaborate with teams to drive automation, continuous improvement, and reliability
Qualifications Required:
- Proven experience (9+ years) in software engineering or reliability roles
- Expertise in defining and measuring SLOs, SLIs, and error budgets
- Strong ability to design scalable, reliable systems and evaluate architectural choices for latency and performance
- Proficiency in building automation tools and services using Python, Go, or similar languages
- Skill in maintaining high-quality documentation and runbooks through code generation and automation
At Macquarie, we offer a wide range of benefits including wellbeing leave, paid maternity and parental leave, company-subsidized childcare services, paid volunteer leave, and comprehensive medical and life insurance cover. We provide access to learning and development opportunities, hybrid and flexible working arrangements, as well as reimbursement for work-from-home equipment.
If you are inspired to build a better future and excited about the role at Macquarie, we encourage you to apply and be part of our diverse, equitable, and inclusive workplace where everyone is welcomed and valued.
Skills Required
Go
Python
Java
distributed systems
networking
Linux internals
Kubernetes
AWS
GCP
automation
automation tools
documentation
monitoring
metrics
tracing
forecasting
resource optimization
performance tuning
gRPC
Prometheus
Grafana
OpenTelemetry
SLOs
SLIs
error budgets
scalable systems
architectural choices
self-healing
autoscaling capabilities
runbooks
mean time to detect (MTTD)
mean time to recover (MTTR)
logs
data-driven scaling
Posted on: March 24, 2026