Within Assurance, we help our clients address the accounting and financial reporting challenges facing their businesses. You will be part of a team that provides insight and services that accelerate analytics, decision-making and innovation to build a stronger and more efficient finance function. You will experience ongoing professional development through diverse experiences, world-class learning and individually tailored coaching.
That is how we develop outstanding leaders who team to deliver on our promises to all of our stakeholders, and in so doing, play a critical role in building a better working world for our people, for our clients and for our communities.
Sound interesting? Well, this is just the beginning. Because whenever you join, however long you stay, the exceptional EY experience lasts a lifetime.
The opportunity
As a Senior QA Engineer - AI Systems in the AI Hub, you will test, evaluate, and improve AI products before they reach clients and production. You will validate end-to-end AI solutions (LLMs, agents, RAG, OCR/document intelligence, traditional ML, prompts/tools, and data pipelines) and the human workflows around them.
Your key responsibilities
You will partner with AI engineers, full-stack, data, and DevOps teams, product, and security to define quality and risk, build evaluation approaches, surface failure modes, and ensure solutions are reliable, safe, explainable, and enterprise-ready.
Skills and attributes for success
Define and execute AI QA strategies for PoCs, MVPs, and production (models, prompts, retrieval, agents/tools, data flows, user workflows, ops)
Translate business needs into measurable quality criteria, evaluation metrics, acceptance tests, and release gates
Build and maintain eval assets: golden answers, synthetic/edge-case libraries, and regression suites for LLM/RAG/OCR/agentic/ML systems
LLM/GenAI testing: response quality, grounding, safety/refusals, bias, privacy leakage, and prompt adherence
RAG testing: ingestion/OCR & metadata quality, chunking/indexing, retrieval relevance, citations, and evidence traceability
Agent testing: tool selection/call accuracy, permissions, multi-step task completion, state/error handling, escalation, and safeguards
OCR/document intelligence testing: extraction accuracy (incl. tables/layout/multilingual), confidence scores, and downstream impact
Traditional ML testing: data/features/labels, performance & thresholds, drift, calibration, monitoring, and retraining readiness
Adversarial/negative testing: prompt injection (direct/indirect), jailbreaks, sensitive data disclosure, unsafe outputs, poisoning/manipulation risks
Integrate evaluations into CI/CD (automated gates, regression runs, dashboards, and monitoring feedback loops)
Use traces/logs/retrieved context/tool calls and feedback to debug failures and recommend improvements
Validate behavior across personas, languages, domains, data quality, ambiguity, and client constraints
Production readiness: observability/alerts, human-in-the-loop (HITL) review & fallbacks, cost/tokens, latency, and reliability under load
Apply standard QA where useful (API, integration, UI, performance, E2E) while prioritizing AI quality and risk
Collaborate with product/UX/engineering/security and clients to ensure solutions are trustworthy and aligned to business intent
Document test plans, eval results, limitations, failure modes, risk decisions, and release recommendations (audit-friendly)
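The eval-asset and regression-suite responsibilities above can be illustrated with a minimal sketch: a golden-set regression check that compares model answers against expected grounding substrings and flags disallowed content. The `ask_model` function is a hypothetical client, stubbed here with canned answers so the sketch runs; in practice it would call a serving API.

```python
# Minimal sketch of a golden-set regression check for an LLM/RAG system.
# `ask_model` is a hypothetical client; it is stubbed so the sketch runs.

def ask_model(question: str) -> str:
    # Stand-in for a real model call (e.g. an HTTP request to a serving API).
    canned = {
        "What is the VAT rate?": "The standard VAT rate is 5%, per the cited regulation.",
    }
    return canned.get(question, "I don't know.")

GOLDEN_SET = [
    # Each case: a question, substrings the answer must contain (grounding),
    # and substrings it must not contain (leakage / unsafe content).
    {
        "question": "What is the VAT rate?",
        "must_contain": ["5%"],
        "must_not_contain": ["confidential"],
    },
]

def run_regression(golden_set) -> list[dict]:
    """Return one result record per case; a release gate can fail on any miss."""
    results = []
    for case in golden_set:
        answer = ask_model(case["question"])
        passed = all(s in answer for s in case["must_contain"]) and not any(
            s in answer for s in case["must_not_contain"]
        )
        results.append({"question": case["question"], "answer": answer, "passed": passed})
    return results

if __name__ == "__main__":
    for r in run_regression(GOLDEN_SET):
        print(f"{'PASS' if r['passed'] else 'FAIL'}: {r['question']}")
```

A suite like this is what typically sits behind an automated CI/CD gate: each run produces per-case records that can feed a dashboard, and any regression blocks release.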
To qualify for the role you must have
7+ years in QA engineering, test automation, AI/ML evaluation, data quality, or related roles
Hands-on testing of AI/ML/GenAI/OCR/data solutions beyond traditional UI/API checks
Working knowledge of LLMs, prompts, embeddings/vector search, RAG, agents, OCR/document extraction, and model-serving APIs
Ability to design test strategies for probabilistic/context-dependent systems and non-deterministic outputs
Ability to build and maintain eval datasets (golden sets, adversarial/edge cases) and regression suites
Ability to evaluate quality with fit-for-purpose metrics (e.g., groundedness/relevance, task success, hallucination rate, OCR accuracy, precision/recall, latency, cost)
Experience with test automation using Python, TypeScript/JavaScript, or similar scripting languages
Familiarity with Azure AI/Azure OpenAI/Azure AI Foundry/Document Intelligence or similar platforms
Understanding of responsible AI and enterprise risk (safety, privacy, bias/fairness, explainability)
Experience in iterative delivery teams (PoCs/MVPs to production)
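As one example of the fit-for-purpose metrics mentioned above, field-level precision and recall for a document-extraction step can be computed directly from extracted key/value pairs against a labeled ground truth. This is a generic sketch, not a specific library's API; `field_metrics` is a hypothetical helper name.

```python
# Minimal sketch of field-level precision/recall for document extraction.

def field_metrics(predicted: dict, expected: dict) -> dict:
    """Compare extracted key/value pairs against labeled ground truth.

    A pair counts as a true positive only when both the field name and
    its extracted value match exactly.
    """
    pred_items = set(predicted.items())
    true_items = set(expected.items())
    tp = len(pred_items & true_items)
    precision = tp / len(pred_items) if pred_items else 0.0
    recall = tp / len(true_items) if true_items else 0.0
    return {"precision": precision, "recall": recall}
```

For example, if an OCR pipeline extracts the right total but the wrong date from an invoice, both precision and recall come out at 0.5, which a threshold-based release gate can then act on.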
Ideally you’ll also have
Experience with AI evaluation tools or frameworks such as Azure AI Foundry evaluators, promptfoo, Ragas, DeepEval, MLflow, LangSmith, Weights & Biases, or similar tools
Experience testing agentic systems built with frameworks such as LangChain, LangGraph, Semantic Kernel, AutoGen, OpenAI Agents SDK, or Microsoft Agent Framework
Experience with LLM-as-judge evaluation patterns, custom evaluators, human-in-the-loop review workflows, and evaluation rubric design
Experience with document-heavy AI systems, OCR quality assessment, Arabic/English content, financial/regulatory documents, or enterprise knowledge bases
Experience with responsible AI, AI governance, NIST AI RMF, OWASP Top 10 for LLM Applications, OWASP Machine Learning Security Top 10, or similar AI risk frameworks
Experience with cybersecurity or AI red-teaming techniques, especially prompt injection, data leakage, excessive agency, unsafe tool use, vector/embedding weaknesses, and model manipulation risks
Experience with Azure Monitor, Application Insights, OpenTelemetry traces, or production AI observability dashboards
Experience with Playwright, Cypress, Postman, pytest, Vitest, or other automation tools used to support end-to-end quality gates
Experience supporting regulated or security-sensitive environments where privacy, auditability, access control, and data residency matter
GenAI engineering exposure (prompting, RAG design, model selection, eval harnesses, and secure deployment patterns)
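The LLM-as-judge pattern with a rubric, mentioned above, can be sketched as follows. In a real system the `judge` function would prompt a judge model with the rubric and parse its score; here it is a hypothetical keyword heuristic stand-in so the sketch is runnable.

```python
# Minimal sketch of an LLM-as-judge evaluation loop with a rubric.
# `judge` is a stand-in for a judge-model call, replaced by a simple
# keyword-overlap heuristic so the sketch runs without a model.

RUBRIC = (
    "Score 1 if the answer is fully supported by the context, "
    "0.5 if partially supported, 0 if it contradicts or invents facts."
)

def judge(context: str, answer: str) -> float:
    # Heuristic stand-in: full credit when the answer's key terms
    # appear in the context, partial credit otherwise.
    terms = [w for w in answer.lower().split() if len(w) > 4]
    if not terms:
        return 0.0
    hits = sum(1 for w in terms if w in context.lower())
    ratio = hits / len(terms)
    return 1.0 if ratio > 0.8 else 0.5 if ratio > 0.4 else 0.0

def evaluate(cases: list[dict], threshold: float = 0.5) -> dict:
    """Aggregate judge scores; a gate can fail when the mean falls below threshold."""
    scores = [judge(c["context"], c["answer"]) for c in cases]
    mean = sum(scores) / len(scores)
    return {"scores": scores, "mean": mean, "passed": mean >= threshold}
```

Because judge models are themselves non-deterministic, teams commonly calibrate such evaluators against a human-labeled sample before trusting them in a human-in-the-loop review workflow.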
What we look for
We are interested in entrepreneurs who have the confidence to develop and promote a brand-new strategic vision both internally and externally. You will be business savvy with a passion for innovation as well as the motivation to create your own EY journey.
What we offer
We offer a competitive compensation package where you’ll be rewarded based on performance and recognized for the value you bring to our business.
If you can demonstrate that you meet the criteria above, please contact us as soon as possible.
The exceptional EY experience. It’s yours to build.
EY | Building a better working world
EY exists to build a better working world, helping to create long-term value for clients, people and society and build trust in the capital markets.
Enabled by data and technology, diverse EY teams in over 150 countries provide trust through assurance and help clients grow, transform and operate.
Working across assurance, consulting, law, strategy, tax and transactions, EY teams ask better questions to find new answers for the complex issues facing our world today.