Job Description
Role Overview
We are seeking a detail-oriented and proactive Agentic AI Engineer to join our AI product team in Bangalore. The ideal candidate will lead testing efforts for goal-driven LLM workflows, with an emphasis on system robustness, safety, and reliability.
Responsibilities
- Design, execute, and automate test plans for agentic AI workflows (tool usage, planning, state transitions).
- Validate AI behaviors across prompts, subgoal decomposition, retries, and multi-step tool chains.
- Identify and troubleshoot issues such as hallucinations, inconsistent responses, planning and memory failures, tool invocation errors, and latency or cost inefficiencies.
- Evaluate model robustness, safety alignment, and adherence to business rules.
- Integrate evaluation metrics like factuality, coherence, completeness, and toxicity into test suites.
- Analyze prompt behavior, trace tool execution paths, and maintain test logs and audit trails.
- Run structured experiments to evaluate changes to prompts, tools, or workflow logic.
- Maintain regression pipelines and CI/CD integration for agent QA workflows.
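To illustrate the kind of agent QA work described above, here is a minimal sketch of a Pytest-style regression check that validates a (mocked) LLM tool call before it would reach a real tool. All names here (`validate_tool_call`, the `search_flights` schema) are illustrative assumptions, not tied to any specific framework:

```python
# Minimal agent-QA sketch: check that an LLM's tool call is well-formed
# before dispatching it. Schema and function names are hypothetical.
import json

TOOL_SCHEMA = {
    "name": "search_flights",
    "required_args": {"origin", "destination", "date"},
}

def validate_tool_call(raw_call: str, schema: dict) -> list:
    """Return a list of problems found in a JSON tool call (empty = OK)."""
    errors = []
    try:
        call = json.loads(raw_call)
    except json.JSONDecodeError:
        return ["tool call is not valid JSON"]
    if call.get("name") != schema["name"]:
        errors.append("unexpected tool name: %r" % call.get("name"))
    missing = schema["required_args"] - set(call.get("args", {}))
    if missing:
        errors.append("missing required args: %s" % sorted(missing))
    return errors

# Pytest-style regression tests (runnable with `pytest` or plain asserts)
def test_valid_call_passes():
    good = json.dumps({"name": "search_flights",
                       "args": {"origin": "BLR", "destination": "DEL",
                                "date": "2024-06-01"}})
    assert validate_tool_call(good, TOOL_SCHEMA) == []

def test_missing_arg_is_flagged():
    bad = json.dumps({"name": "search_flights", "args": {"origin": "BLR"}})
    errors = validate_tool_call(bad, TOOL_SCHEMA)
    assert any("missing required args" in e for e in errors)
```

In a real pipeline, checks like these would run in CI against recorded agent traces rather than hand-written payloads.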
Requirements
- 3+ years of QA or test automation experience, preferably in AI, ML, or agent-driven systems.
- Proficiency in Python or JavaScript testing frameworks (e.g., Pytest, Playwright).
- Strong grasp of REST APIs, JSON workflows, and debugging distributed systems.
- Familiarity with LLM-based systems and orchestration tools (LangChain, OpenAI tool calling, CrewAI).
- Experience evaluating AI output quality using metrics such as precision, hallucination rate, and contextual correctness.
- Exposure to AI observability or evaluation platforms such as LangSmith, TruLens, or similar.
Skills
- Python
- JavaScript
- AI Testing
- REST APIs
- LLM Orchestration