Within Assurance, we help our clients address the accounting and financial reporting challenges facing their businesses. You will be part of a team that provides insight and services that accelerate analytics, decision-making and innovation to build a stronger and more efficient finance function. You will experience ongoing professional development through diverse experiences, world-class learning and individually tailored coaching.
That is how we develop outstanding leaders who team to deliver on our promises to all of our stakeholders, and in so doing, play a critical role in building a better working world for our people, for our clients and for our communities.
Sound interesting? Well, this is just the beginning. Because whenever you join, however long you stay, the exceptional EY experience lasts a lifetime.
The opportunity
As a Senior QA Engineer - AI Systems in the AI Hub, you will test, evaluate, and improve AI products before they reach clients and production. You will validate end-to-end AI solutions (LLMs, agents, RAG, OCR/document intelligence, traditional ML, prompts/tools, and data pipelines) and the human workflows around them.
Your key responsibilities
You will partner with AI engineers, full-stack, data, and DevOps teams, product, and security to define quality and risk, build evaluation approaches, surface failure modes, and ensure solutions are reliable, safe, explainable, and enterprise-ready.
Skills and attributes for success
Define and execute AI QA strategies for PoCs, MVPs, and production (models, prompts, retrieval, agents/tools, data flows, user workflows, ops)
Translate business needs into measurable quality criteria, evaluation metrics, acceptance tests, and release gates
Build and maintain eval assets: golden answers, synthetic/edge-case libraries, and regression suites for LLM/RAG/OCR/agentic/ML systems
LLM/GenAI testing: response quality, grounding, safety/refusals, bias, privacy leakage, and prompt adherence
RAG testing: ingestion/OCR & metadata quality, chunking/indexing, retrieval relevance, citations, and evidence traceability
Agent testing: tool selection/call accuracy, permissions, multi-step task completion, state/error handling, escalation, and safeguards
OCR/document intelligence testing: extraction accuracy (incl. tables/layout/multilingual), confidence scores, and downstream impact
Traditional ML testing: data/features/labels, performance & thresholds, drift, calibration, monitoring, and retraining readiness
Adversarial/negative testing: prompt injection (direct/indirect), jailbreaks, sensitive data disclosure, unsafe outputs, poisoning/manipulation risks
Integrate evaluations into CI/CD (automated gates, regression runs, dashboards, and monitoring feedback loops)
Use traces/logs/retrieved context/tool calls and feedback to debug failures and recommend improvements
Validate behavior across personas, languages, domains, data quality, ambiguity, and client constraints
Production readiness: observability/alerts, human-in-the-loop (HITL) review & fallbacks, cost/tokens, latency, and reliability under load
Apply standard QA where useful (API, integration, UI, performance, E2E) while prioritizing AI quality and risk
Collaborate with product/UX/engineering/security and clients to ensure solutions are trustworthy and aligned to business intent
Document test plans, eval results, limitations, failure modes, risk decisions, and release recommendations (audit-friendly)
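The eval-asset and regression-suite responsibilities above can be illustrated with a minimal sketch: a golden-set regression check that compares model answers against expected grounding substrings and flags disallowed content. The `ask_model` function is a hypothetical client, stubbed here with canned answers so the sketch runs; in practice it would call a serving API.

```python
# Minimal sketch of a golden-set regression check for an LLM/RAG system.
# `ask_model` is a hypothetical client; it is stubbed so the sketch runs.

def ask_model(question: str) -> str:
    # Stand-in for a real model call (e.g. an HTTP request to a serving API).
    canned = {
        "What is the VAT rate?": "The standard VAT rate is 5%, per the cited regulation.",
    }
    return canned.get(question, "I don't know.")

GOLDEN_SET = [
    # Each case: a question, substrings the answer must contain (grounding),
    # and substrings it must not contain (leakage / unsafe content).
    {
        "question": "What is the VAT rate?",
        "must_contain": ["5%"],
        "must_not_contain": ["confidential"],
    },
]

def run_regression(golden_set) -> list[dict]:
    """Return one result record per case; a release gate can fail on any miss."""
    results = []
    for case in golden_set:
        answer = ask_model(case["question"])
        passed = all(s in answer for s in case["must_contain"]) and not any(
            s in answer for s in case["must_not_contain"]
        )
        results.append({"question": case["question"], "answer": answer, "passed": passed})
    return results

if __name__ == "__main__":
    for r in run_regression(GOLDEN_SET):
        print(f"{'PASS' if r['passed'] else 'FAIL'}: {r['question']}")
```

A suite like this is what typically sits behind an automated CI/CD gate: each run produces per-case records that can feed a dashboard, and any regression blocks release.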
To qualify for the role you must have
7+ years in QA engineering, test automation, AI/ML evaluation, data quality, or related roles
Hands-on testing of AI/ML/GenAI/OCR/data solutions beyond traditional UI/API checks
Working knowledge of LLMs, prompts, embeddings/vector search, RAG, agents, OCR/document extraction, and model-serving APIs
Ability to design test strategies for probabilistic/context-dependent systems and non-deterministic outputs
Ability to build and maintain eval datasets (golden sets, adversarial/edge cases) and regression suites
Ability to evaluate quality with fit-for-purpose metrics (e.g., groundedness/relevance, task success, hallucination rate, OCR accuracy, precision/recall, latency, cost)
Experience with test automation using Python, TypeScript/JavaScript, or similar scripting languages
Familiarity with Azure AI/Azure OpenAI/Azure AI Foundry/Document Intelligence or similar platforms
Understanding of responsible AI and enterprise risk (safety, privacy, bias/fairness, explainability)
Experience in iterative delivery teams (PoCs/MVPs to production)
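As one example of the fit-for-purpose metrics mentioned above, field-level precision and recall for a document-extraction step can be computed directly from extracted key/value pairs against a labeled ground truth. This is a generic sketch, not a specific library's API; `field_metrics` is a hypothetical helper name.

```python
# Minimal sketch of field-level precision/recall for document extraction.

def field_metrics(predicted: dict, expected: dict) -> dict:
    """Compare extracted key/value pairs against labeled ground truth.

    A pair counts as a true positive only when both the field name and
    its extracted value match exactly.
    """
    pred_items = set(predicted.items())
    true_items = set(expected.items())
    tp = len(pred_items & true_items)
    precision = tp / len(pred_items) if pred_items else 0.0
    recall = tp / len(true_items) if true_items else 0.0
    return {"precision": precision, "recall": recall}
```

For example, if an OCR pipeline extracts the right total but the wrong date from an invoice, both precision and recall come out at 0.5, which a threshold-based release gate can then act on.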
Ideally you’ll also have
Experience with AI evaluation tools or frameworks such as Azure AI Foundry evaluators, promptfoo, Ragas, DeepEval, MLflow, LangSmith, Weights & Biases, or similar tools
Experience testing agentic systems built with frameworks such as LangChain, LangGraph, Semantic Kernel, AutoGen, OpenAI Agents SDK, or Microsoft Agent Framework
Experience with LLM-as-judge evaluation patterns, custom evaluators, human-in-the-loop review workflows, and evaluation rubric design
Experience with document-heavy AI systems, OCR quality assessment, Arabic/English content, financial/regulatory documents, or enterprise knowledge bases
Experience with responsible AI, AI governance, NIST AI RMF, OWASP Top 10 for LLM Applications, OWASP Machine Learning Security Top 10, or similar AI risk frameworks
Experience with cybersecurity or AI red-teaming techniques, especially prompt injection, data leakage, excessive agency, unsafe tool use, vector/embedding weaknesses, and model manipulation risks
Experience with Azure Monitor, Application Insights, OpenTelemetry traces, or production AI observability dashboards
Experience with Playwright, Cypress, Postman, pytest, Vitest, or other automation tools used to support end-to-end quality gates
Experience supporting regulated or security-sensitive environments where privacy, auditability, access control, and data residency matter
GenAI engineering exposure (prompting, RAG design, model selection, eval harnesses, and secure deployment patterns)
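The LLM-as-judge pattern with a rubric, mentioned above, can be sketched as follows. In a real system the `judge` function would prompt a judge model with the rubric and parse its score; here it is a hypothetical keyword heuristic stand-in so the sketch is runnable.

```python
# Minimal sketch of an LLM-as-judge evaluation loop with a rubric.
# `judge` is a stand-in for a judge-model call, replaced by a simple
# keyword-overlap heuristic so the sketch runs without a model.

RUBRIC = (
    "Score 1 if the answer is fully supported by the context, "
    "0.5 if partially supported, 0 if it contradicts or invents facts."
)

def judge(context: str, answer: str) -> float:
    # Heuristic stand-in: full credit when the answer's key terms
    # appear in the context, partial credit otherwise.
    terms = [w for w in answer.lower().split() if len(w) > 4]
    if not terms:
        return 0.0
    hits = sum(1 for w in terms if w in context.lower())
    ratio = hits / len(terms)
    return 1.0 if ratio > 0.8 else 0.5 if ratio > 0.4 else 0.0

def evaluate(cases: list[dict], threshold: float = 0.5) -> dict:
    """Aggregate judge scores; a gate can fail when the mean falls below threshold."""
    scores = [judge(c["context"], c["answer"]) for c in cases]
    mean = sum(scores) / len(scores)
    return {"scores": scores, "mean": mean, "passed": mean >= threshold}
```

Because judge models are themselves non-deterministic, teams commonly calibrate such evaluators against a human-labeled sample before trusting them in a human-in-the-loop review workflow.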
What we look for
We are interested in entrepreneurs who have the confidence to develop and promote a brand-new strategic vision both internally and externally. You will be business savvy with a passion for innovation as well as the motivation to create your own EY journey.
What we offer
We offer a competitive compensation package where you’ll be rewarded based on performance and recognized for the value you bring to our business.
If you can demonstrate that you meet the criteria above, please contact us as soon as possible.
The exceptional EY experience. It’s yours to build.
EY | Building a better working world
EY exists to build a better working world, helping to create long-term value for clients, people and society and build trust in the capital markets.
Enabled by data and technology, diverse EY teams in over 150 countries provide trust through assurance and help clients grow, transform and operate.
Working across assurance, consulting, law, strategy, tax and transactions, EY teams ask better questions to find new answers for the complex issues facing our world today.