AI Agent Performance & Reliability Engineer
Pilotcrew AI
All India, Delhi • 1 month ago
Experience: 2 to 6 Yrs
Job Description
As an AI Agent Performance & Reliability Engineer at Pilotcrew AI, you will design and build scalable evaluation infrastructure for Large Language Models (LLMs) and AI agents. The role involves architecting distributed inference pipelines, implementing automated benchmarking systems, developing adversarial testing frameworks, and optimizing inference for latency, cost, and throughput.
Key Responsibilities:
- Design and implement distributed LLM inference pipelines
- Build automated benchmarking systems for reasoning, planning, and tool use
- Implement pass@k, reliability metrics, variance analysis, and statistical confidence evaluation
- Develop adversarial testing frameworks for stress-testing agents
- Create structured evaluation pipelines (rule-based and model-based graders)
- Build trace capture, logging, and telemetry systems for multi-step agent workflows
- Validate tool calls and sandboxed execution environments
- Optimize inference for latency, cost, and throughput
- Manage dataset versioning and reproducible benchmark pipelines
- Deploy and monitor GenAI systems in production (AWS/GCP/Azure)
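To illustrate the pass@k metric listed among the responsibilities: pass@k estimates the probability that at least one of k sampled generations passes, given n total samples of which c passed. A minimal sketch of the standard unbiased estimator (the function name `pass_at_k` is illustrative, not from the posting):

```python
import math

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: 1 - C(n-c, k) / C(n, k).

    n -- total samples generated per task
    c -- number of those samples that passed
    k -- budget of samples we imagine drawing
    """
    if n - c < k:
        # Fewer than k failures exist, so any k-subset contains a pass.
        return 1.0
    # Product form of 1 - C(n-c, k)/C(n, k), numerically stable for large n.
    return 1.0 - math.prod((n - c - i) / (n - i) for i in range(k))
```

For example, with 2 samples of which 1 passed, pass@1 is 0.5; with all samples passing, pass@k is 1.0 for any k ≤ n.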
Qualifications Required:
- Strong Python programming and system design skills
- Hands-on experience with Generative AI systems and LLM APIs
- Experience with PyTorch or TensorFlow
- Experience building production ML or GenAI systems
- Strong understanding of decoding strategies, temperature effects, and sampling variance
- Familiarity with async processing, distributed task execution, or job scheduling
- Experience with Docker and cloud deployment
- Strong debugging, observability, and reliability engineering mindset
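The "temperature effects and sampling variance" item above can be illustrated with a minimal softmax-sampling sketch (all names here are illustrative, assuming plain list-of-float logits): lower temperature sharpens the token distribution and reduces output variance, higher temperature flattens it.

```python
import math
import random

def temperature_softmax(logits, temperature):
    """Convert logits to probabilities; T < 1 sharpens, T > 1 flattens."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    z = sum(exps)
    return [e / z for e in exps]

def sample_token(logits, temperature, rng=random):
    """Draw one token index from the temperature-scaled distribution."""
    probs = temperature_softmax(logits, temperature)
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]
```

At very low temperature the sampler is effectively greedy (always the argmax token); as temperature rises, repeated calls spread across more tokens, which is exactly the run-to-run variance an evaluation pipeline must account for.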
Additional Company Details:
Pilotcrew AI builds infrastructure for AI Agent Evaluation, benchmarking large language models, running automated agent evaluations, and hosting AI arenas for competitive testing. The company's mission is to make AI agents measurable, reliable, and production-ready through structured, scalable evaluation systems.
Why Join Pilotcrew AI:
- Work on cutting-edge AI agent evaluation infrastructure
- Solve real-world GenAI reliability challenges
- High technical ownership and autonomy
- Opportunity to shape how AI agents are benchmarked at scale
Posted on: March 16, 2026