ML Research Engineer (Inference)
Important Group
All India • 3 weeks ago
Experience: 1 to 5 Yrs
Job Description
Role Overview:
You will be a Research Engineer on the Inference ML team at Cerebras Systems, where you will be responsible for adapting advanced language and vision models to efficiently run on the flagship Cerebras architecture. Working closely with ML researchers and engineers, you will design, prototype, validate, and optimize models to push the boundaries of inference research on the world's fastest AI accelerator.
Key Responsibilities:
- Implement and adapt transformer-based models (NLP and/or vision) to operate on Cerebras hardware
- Assist in optimizing models for inference performance, focusing on latency and throughput
- Conduct experiments, analyze outcomes, and contribute to model enhancements
- Assist in setting up and validating models on the Cerebras system
- Troubleshoot and debug model or system issues under the guidance of senior team members
- Support profiling and performance analysis utilizing internal tools
- Collaborate with cross-functional teams (ML, software, hardware) for model integration
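The latency and throughput focus above is typically measured with a simple timing harness around a model's forward pass. As an illustration only (the `dummy_model` function and harness below are hypothetical stand-ins, not Cerebras tooling), a minimal, framework-agnostic sketch:

```python
import time

def dummy_model(batch):
    # Hypothetical stand-in for a model forward pass.
    return [sum(x) for x in batch]

def benchmark(model, batch, iters=100):
    # Warm-up run so one-time setup cost doesn't skew the measurement.
    model(batch)
    start = time.perf_counter()
    for _ in range(iters):
        model(batch)
    elapsed = time.perf_counter() - start
    latency_ms = elapsed / iters * 1000          # avg time per call
    throughput = len(batch) * iters / elapsed    # samples per second
    return latency_ms, throughput

batch = [[1.0] * 64 for _ in range(8)]
latency_ms, throughput = benchmark(dummy_model, batch)
```

The same latency-vs-throughput trade-off drives real inference tuning: larger batches raise throughput but also raise per-request latency.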
Qualifications Required:
- Bachelor's or Master's degree in Computer Science, Engineering, or a related field
- 1-3 years of experience in software engineering or machine learning, including internships
- Proficiency in Python and at least one ML framework (e.g., PyTorch, Transformers, vLLM, or SGLang)
- Understanding of deep learning concepts such as neural networks and transformers
- Experience with Generative AI and Machine Learning systems
- Strong programming skills in Python and/or C++
Additional Details:
Cerebras Systems is known for building the world's largest AI chip, roughly 56 times larger than the largest GPU. Its wafer-scale architecture provides significant AI compute power on a single chip, simplifying programming and delivering industry-leading training and inference speeds. Cerebras' collaboration with OpenAI and its commitment to groundbreaking technology make it an exciting place to work and contribute to cutting-edge advancements in the AI industry.
If you are passionate about AI research and want to work on cutting-edge technology that is revolutionizing the industry, Cerebras Systems offers a unique opportunity to be part of a dynamic team that values innovation and collaboration. Apply now and be at the forefront of groundbreaking advancements in AI!
Skills Required
Python
Transformers
neural networks
transformers
ML frameworks
PyTorch
vLLM
SGLang
deep learning concepts
Generative AI
Machine Learning systems
speculative decoding
neural network pruning
compression
sparse attention
quantization
sparsity
post-training techniques
inference-focused evaluations
Linux environments
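Several of the listed skills (quantization, compression, sparsity) name standard inference-optimization techniques. As a hedged illustration of just one of them, a minimal sketch of symmetric int8 weight quantization, where a single scale maps floats into the signed 8-bit range (all names here are illustrative, not from the posting):

```python
def quantize_int8(weights):
    # Symmetric quantization: one scale maps floats onto [-127, 127].
    scale = max(abs(w) for w in weights) / 127.0
    q = [round(w / scale) for w in weights]
    return q, scale

def dequantize(q, scale):
    # Recover approximate floats; error is bounded by ~scale/2 per weight.
    return [x * scale for x in q]

weights = [0.5, -1.0, 0.25, 0.75]
q, scale = quantize_int8(weights)
recovered = dequantize(q, scale)
```

Storing `q` as int8 cuts memory 4x versus float32, at the cost of a small, bounded reconstruction error.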
Posted on: April 12, 2026