This is not a slide-making or prompt-engineering role. We are looking for someone who has built multi-agent AI systems that run in production - not demos, not pilots that died after a sprint. You will anchor AI delivery programs end-to-end, work directly with global clients, and stay sharp in a field that changes every few weeks.
You will report to a senior AI delivery leader and replicate that leader's function - which means you need both the depth to architect solutions and the presence to walk a CXO through what you built and why it works.
Delivery & Architecture
Own end-to-end delivery of AI-native programs - from architecture through production deployment
Design and build multi-agent orchestration systems using LangChain, LangGraph, CrewAI, or equivalent
Integrate agents with enterprise systems: APIs, ERPs, CRMs, data platforms - not toy datasets
Define agent topology: tool routing, memory strategy, state machines, fallback handling
Agentic Coding & Development
Run agentic coding workflows using Claude Code, Cursor, OpenAI Codex, or equivalent tools
Lead projects where AI writes significant portions of the codebase - and you guide, review, and ship it
Work with CLAUDE.md, shared context frameworks, and multi-session agent setups for team use
Debug non-deterministic agent outputs systematically - not by gut feel
Client & Stakeholder Engagement
Translate business problems into agent architectures for global CXO-level stakeholders
Run discovery workshops, solution reviews, and delivery cadences with client teams
Prepare and present technical proposals, POC plans, and roadmaps - own the story end-to-end
Team & Practice
Mentor junior AI engineers; raise AI engineering quality across the delivery team
Stay current: evaluate new models, frameworks, and tooling on their merits, ahead of the hype cycle
Contribute to internal knowledge bases, reusable frameworks, and accelerators
Not just what you know. What you have shipped.
Deployed 2–3 agent-based systems to production - stateful, multi-step, with real users
Used LangGraph for multi-agent orchestration with memory, tool routing, and state management
Built projects where AI (Claude Code, Codex, Cursor) wrote significant portions of the code
Implemented RAG pipelines end-to-end - chunking, embedding, retrieval, re-ranking, evaluation
Integrated agents with real enterprise APIs - not just the OpenAI Playground or sample data
Debugged a production agent failure - and fixed it without blaming the model
Can articulate when NOT to use agents - that is how we know you have built things
Nice to have
Experience with Claude Code CLI in team environments (CLAUDE.md, shared context, multi-session flows)
Familiarity with LangSmith for agent tracing, evaluation pipelines, and debugging at scale
A shipped project built on MCP (Model Context Protocol) or similar shared-context tooling
QA/testing mindset for agents - systematic evaluation of non-deterministic outputs
Background in IT services or consulting - managing client expectations while building
Experience with SLMs, fine-tuning, or on-device/edge agent deployment
Not the right fit
People who list LLMs on a resume but have only called the API in a Jupyter notebook
AI enthusiasts whose hands-on experience is less than a year old
People who explain everything in terms of frameworks they have never deployed
Consultants who can only narrate what others have built
Not a theory round.
Expect to walk through something you have actually built - architecture decisions, what broke in production, what you would do differently. If you cannot do that with specifics, this role is not the right fit.
Evaluation stages:
Stage 1 - Technical screen: Walk us through a live agent system you built
Stage 2 - Architecture discussion: Given a business problem, design an agent solution on the spot
Stage 3 - Stakeholder simulation: Present your approach to a non-technical executive audience
Skills
Agent Orchestration: LangChain, LangGraph, CrewAI - hands-on, not just conceptual
Agentic Coding Tools: Claude Code CLI, Cursor, OpenAI Codex, Copilot
RAG & Vector Stores: Chroma, Weaviate, Pinecone - knows where RAG breaks
LLM APIs & SDKs: Anthropic, OpenAI, Gemini - prompt design, tool use
Python / TypeScript: primary languages for agent and backend development
LangSmith / Observability: tracing, evaluation, and debugging of agent runs
Cloud Platforms: Azure, AWS, or GCP (at least one) - deployment, infrastructure, managed services
API & System Integration: REST, gRPC, Kafka - enterprise integration patterns
MCP / Shared Context: Model Context Protocol, CLAUDE.md, Beads
Agent Evaluation: testing non-deterministic outputs, guardrails, evals
CI/CD & DevOps: Git, containers, pipelines - agents need to ship
Client Communication: can present architecture to a CXO without jargon