Sr Specialist App/Prod Support SITE RELIABILITY ENGINEER
AT&T
All India, Chennai • 1 month ago
Experience: 9 to 13 Yrs
Job Description
As a Site Reliability Engineer (SRE), you will work closely with development teams to define non-functional requirements, such as reliability, performance, and scalability, for Java-based enterprise applications. Your primary responsibilities will include:
- Leading the response to production issues, from troubleshooting to implementing immediate fixes, to ensure minimal downtime and adherence to service level agreements (SLAs).
- Building alerting, monitoring, and dashboards for proactive problem identification; recent hands-on experience in alert creation and maintenance is expected.
- Utilizing strong analytical and technical skills to diagnose and resolve complex issues within production environments, focusing on immediate impact mitigation and collaborating with development teams for long-term solutions.
- Monitoring application performance using APM tools and optimizing performance through code improvements and resource optimization.
Additionally, it would be beneficial if you have experience in:
- Creating and maintaining comprehensive documentation for system architecture, deployment procedures, and troubleshooting guides.
- Developing and maintaining scripts and automation tools to streamline operations and deployment processes.
- Working with development teams to identify and provide non-functional requirements and acceptance criteria during design and development.
- Participating in security assessments and implementing security best practices to safeguard applications and data.
- Providing metrics and status reports to leadership and stakeholder communities, establishing processes for metrics gathering and communication.
- Working closely with Product Development teams for knowledge transfer related to system changes.
Qualifications required for this role:
- Bachelor's degree in Computer Science, Information Technology, or a related field.
- 9+ years of technical engineering experience, either architecting and developing web applications or working in SRE roles.
- Strong experience with observability tools and strong problem-solving skills.
- Proficiency in Java, J2EE technologies, and automation tools.
- Familiarity with containerization, cloud services, and DevOps practices.
- Knowledge of network protocols, load balancing, security principles, and database SQL queries.
Desirable Skills:
- Certifications in Java, cloud technologies, or SRE methodologies.
- Experience in Salesforce Sales, Service, and Marketing Clouds.
- Experience within high tech, software, or wireless/telecom industry.
- Foundational understanding of Artificial Intelligence (AI) and Machine Learning (ML) principles.
**Job requires working in night shifts**
Location: Chennai, Tamil Nadu, India
Please note that AT&T is an equal employment opportunity employer, providing reasonable accommodations for qualified individuals with disabilities. Background checks are initiated only after an offer is made.
Skills Required
Java
Architecture
Operational Support
Incident Management
Problem Solving
Performance Optimization
APM
Dynatrace
Automation
Monitoring
Dashboards
Documentation
Automation Tools
Capacity Planning
Security
Release Management
Communication
Knowledge Transfer
J2EE
Salesforce
Salesforce Marketing Cloud
MuleSoft
Splunk
WebLogic
Object Oriented Programming
JavaScript
Spring
Scripting Languages
Python
Containerization
Docker
Kubernetes
Cloud Services
Azure
Git
Jenkins
Load Balancing
SQL Queries
Software Industry
Artificial Intelligence
Machine Learning
Site Reliability Engineering
Observability
AppDynamics
ELK
Non Functional Requirements
Blameless Postmortems
Problem Management Engineering
Synthetic Monitoring
API Gateways
Shell
DevOps Practices
CI/CD Pipelines
Network Protocols
Security Principles
Certifications
Salesforce Sales
Salesforce Service
Salesforce Marketing Clouds
High Tech Industry
Wireless Industry
Posted on: March 7, 2026