AI Agent Performance & Reliability Engineer

Pilotcrew AI

All India, Delhi • 1 month ago

Experience: 2 to 6 Yrs

Job Description

As a Machine Learning Engineer at Pilotcrew AI, you will be responsible for designing and building scalable evaluation infrastructure for Large Language Models (LLMs) and AI agents. Your role will involve architecting distributed inference pipelines, implementing automated benchmarking systems, developing adversarial testing frameworks, and optimizing inference for latency, cost, and throughput.

Key Responsibilities
  • Design and implement distributed LLM inference pipelines
  • Build automated benchmarking systems for reasoning, planning, and tool use
  • Implement pass@k, reliability metrics, variance analysis, and statistical confidence evaluation
  • Develop adversarial testing frameworks for stress-testing agents
  • Create structured evaluation pipelines (rule-based and model-based graders)
  • Build trace capture, logging, and telemetry systems for multi-step agent workflows
  • Validate tool calls and sandboxed execution environments
  • Optimize inference for latency, cost, and throughput
  • Manage dataset versioning and reproducible benchmark pipelines
  • Deploy and monitor GenAI systems in production (AWS/GCP/Azure)

Qualifications Required
  • Strong Python programming and system design skills
  • Hands-on experience with Generative AI systems and LLM APIs
  • Experience with PyTorch or TensorFlow
  • Experience building production ML or GenAI systems
  • Strong understanding of decoding strategies, temperature effects, and sampling variance
  • Familiarity with async processing, distributed task execution, or job scheduling
  • Experience with Docker and cloud deployment
  • Strong debugging, observability, and reliability engineering mindset

Additional Company Details

Pilotcrew AI builds infrastructure for AI agent evaluation: benchmarking large language models, running automated agent evaluations, and hosting AI arenas for competitive testing. The company's mission is to make AI agents measurable, reliable, and production-ready through structured, scalable evaluation systems.

Why Join Pilotcrew AI
  • Work on cutting-edge AI agent evaluation infrastructure
  • Solve real-world GenAI reliability challenges
  • High technical ownership and autonomy
  • Opportunity to shape how AI agents are benchmarked at scale
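
For candidates unfamiliar with the pass@k metric named under Key Responsibilities, here is a minimal sketch of the commonly used unbiased estimator: given n sampled attempts of which c passed, it estimates the probability that at least one of k attempts passes. The function name `pass_at_k` and its parameters are illustrative assumptions, not part of any Pilotcrew AI codebase.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k from n sampled attempts, c of which passed.

    Hypothetical helper for illustration: computes
    1 - C(n - c, k) / C(n, k), i.e. one minus the probability that
    a random size-k subset of the n attempts contains no pass.
    """
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    # Fewer than k failures: every size-k subset must include a pass.
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)
```

Calling `pass_at_k(n, c, k)` per task and averaging over tasks gives a benchmark-level score. Variance and confidence intervals would be computed on top of those per-task values.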

Posted on: March 16, 2026

Relevant Jobs

Remote Senior Project Manager - IT Infrastructure

WhatJobs Direct

All India, Chennai

View Job →

Backend Architect - Magento Platform

Xebia IT Architects India Pvt Ltd

All India, Gurugram

View Job →

DevOps Release & Deployment Architect

Sureify

All India, Hyderabad

View Job →

MEAN/MERN Stack Developer

Bigscal Technologies Pvt Ltd.

All India

View Job →

Dot Net Full stack- Senior

Ernst & Young

All India, Chennai

View Job →

Security Architect

GlobalLogic

Hyderabad

View Job →

Software Engineer II - .Net Core Frameworks

Sampoorna Consultants Pvt. Ltd

All India, Chennai

View Job →

AI Agent DevOps engineer

Chuwa America Corporation

All India, Hyderabad

View Job →
