Senior HPC Engineer
Netweb Technologies India Ltd.
All India, Faridabad • 1 month ago
Experience: 8 to 12 Yrs
Job Description
As a Senior Engineer-HPC, you will bring over 10 years of enterprise Linux administration experience and more than 5 years of hands-on experience managing large-scale HPC clusters (more than 500 cores and multi-petabyte storage environments). Your expertise in designing, implementing, and optimizing HPC infrastructure will be crucial to delivering maximum performance for demanding workloads.
Key Responsibilities:
- Design, implement, and maintain HPC environments, including compute, storage, and network components.
- Configure and optimize workload managers/schedulers such as Slurm and PBS Pro for efficient job scheduling and resource allocation.
- Implement performance tuning for CPU, GPU, memory, I/O, and network subsystems to meet workload demands.
- Manage HPC filesystem solutions such as Lustre, BeeGFS, or GPFS/Spectrum Scale.
Linux Administration:
- Administer enterprise-grade Linux distributions (RHEL, CentOS, Rocky, Ubuntu) in large-scale compute environments.
- Manage kernel upgrades, patching, and security hardening.
- Troubleshoot kernel-level and system-level issues for performance and stability.
Automation & Configuration Management:
- Develop and maintain Ansible playbooks/roles for automated provisioning, configuration, and patching of HPC systems.
- Integrate Ansible with CI/CD pipelines for infrastructure as code (IaC) practices.
- Automate cluster deployment and ensure environment consistency across nodes.
Monitoring, Troubleshooting & Support:
- Implement and maintain monitoring tools such as Grafana, Prometheus, and Nagios.
- Troubleshoot complex HPC workloads, MPI communication issues, and application performance bottlenecks.
- Provide Tier-3 escalation support for Linux/HPC-related incidents.
Collaboration & Documentation:
- Work closely with research teams, DevOps engineers, and system architects to deliver high-performance solutions.
- Document architecture, SOPs, troubleshooting guides, and performance tuning methodologies.
Required Skills & Experience:
- 8-10 years of hands-on Linux system administration experience in production environments.
- 5+ years managing HPC clusters at scale (500+ cores / multiple petabytes of storage).
- Strong Ansible automation skills.
- Deep understanding of MPI, OpenMP, and GPU/accelerator integration in HPC workloads.
- Proficiency with HPC job schedulers (Slurm, PBS Pro, LSF), HPC storage (Lustre, BeeGFS, GPFS), TCP/IP networking, InfiniBand, and RDMA technologies.
- Experience with performance tuning and benchmarking tools.
- Scripting proficiency in Bash, Python, or Perl.
Preferred Qualifications:
- Experience with containerized HPC.
- Familiarity with cloud-HPC integration.
- Knowledge of security compliance standards.
- Contributions to HPC community tools or open-source projects.
Soft Skills:
- Strong problem-solving and analytical thinking.
- Ability to mentor junior engineers and collaborate across teams.
- Excellent communication skills for technical and non-technical stakeholders.
Skills Required
Linux system administration
MPI
OpenMP
InfiniBand
Performance tuning
Bash scripting
Perl scripting
Mentoring
Communication skills
HPC cluster management
Ansible automation
GPU/accelerator integration
HPC schedulers
HPC storage
TCP/IP networking
RDMA technologies
Python scripting
Containerized HPC
Cloud-HPC integration
Security compliance standards
Problem-solving
Analytical thinking
Posted on: March 16, 2026