Senior HPC Engineer

Netweb Technologies India Ltd.

All India, Faridabad • 1 month ago

Experience: 8 to 12 Yrs

Job Description

We are looking for an accomplished HPC Systems Engineer to join us as a Senior Engineer-HPC. The ideal candidate brings over 10 years of enterprise Linux administration experience and more than 5 years of hands-on experience managing large-scale HPC clusters (more than 500 cores) and multi-petabyte storage environments. Your expertise in designing, implementing, and optimizing HPC infrastructure will be crucial to delivering maximum performance for demanding workloads.

Key Responsibilities:
- Design, implement, and maintain HPC environments, including compute, storage, and network components.
- Configure and optimize workload managers/schedulers such as Slurm and PBS Pro for efficient job scheduling and resource allocation.
- Tune CPU, GPU, memory, I/O, and network subsystems to meet workload demands.
- Manage HPC filesystems such as Lustre, BeeGFS, or GPFS/Spectrum Scale.

Linux Administration:
- Administer enterprise-grade Linux distributions (RHEL, CentOS, Rocky, Ubuntu) in large-scale compute environments.
- Manage kernel upgrades, patching, and security hardening.
- Troubleshoot kernel-level and system-level issues affecting performance and stability.

Automation & Configuration Management:
- Develop and maintain Ansible playbooks/roles for automated provisioning, configuration, and patching of HPC systems.
- Integrate Ansible with CI/CD pipelines to support infrastructure as code (IaC) practices.
- Automate cluster deployment and keep the environment consistent across nodes.

Monitoring, Troubleshooting & Support:
- Implement and maintain monitoring tools such as Grafana, Prometheus, and Nagios.
- Troubleshoot complex HPC workloads, MPI communication issues, and application performance bottlenecks.
- Provide Tier-3 escalation support for Linux/HPC-related incidents.

Collaboration & Documentation:
- Work closely with research teams, DevOps engineers, and system architects to deliver high-performance solutions.
- Document architecture, SOPs, troubleshooting guides, and performance tuning methodologies.

Required Skills & Experience:
- 8-10 years of hands-on Linux system administration experience in production environments.
- 5+ years managing HPC clusters at scale (500+ cores / multiple petabytes of storage).
- Strong Ansible automation skills and a deep understanding of MPI, OpenMP, and GPU/accelerator integration in HPC workloads.
- Proficiency with HPC job schedulers (Slurm, PBS Pro, LSF), HPC storage (Lustre, BeeGFS, GPFS), TCP/IP networking, InfiniBand, and RDMA technologies.
- Experience with performance tuning and benchmarking tools, and scripting proficiency in Bash, Python, or Perl.

Preferred Qualifications:
- Experience with containerized HPC.
- Familiarity with cloud-HPC integration.
- Knowledge of security compliance standards.
- Contributions to HPC community tools or open-source projects.

Soft Skills:
- Strong problem-solving and analytical thinking.
- Ability to mentor junior engineers and collaborate across teams.
- Excellent communication skills with both technical and non-technical stakeholders.

Posted on: March 16, 2026

Relevant Jobs

Medical Copywriter

Thepharmadaily

All India

QuickTV AI Video and Sound Editor (Contract)

Sharechat

All India

Senior Designer- Electrical

Barry-Wehmiller

All India, Chennai

Digital and print media artist

Stackular

All India, Hyderabad

Director Brand Marketing

Upstox

All India

Content and Social Media Marketing Internship

calmveda

All India, Delhi

Social Media & Content Lead

FrugalTesting

All India

Video Content Creator/Producer (Shoot & Edit)

alt.f coworking

All India, Gurugram

Video Editing/Making - Internship

Animtopedia Private Limited

All India, Faridabad

Senior Performance Marketer

Get Marketed

All India, Jaipur
