About Welvaart
At Welvaart, we create technology solutions that put people at the center.
Our approachable leadership and flexible, growth-oriented culture empower our teams and raise the quality of what we deliver. We combine rigor, innovation, and empathy to drive projects that transform businesses and build lasting relationships of trust.
We complement this vision with a performance-driven Digital Marketing offering that helps companies strengthen their online visibility and accelerate growth through smart, measurable strategies.
Project
You will define and maintain evaluation strategies for AI and LLM systems, creating and managing versioned datasets that cover core scenarios, edge cases, negative paths, and safety conditions. You will validate conversational behavior end to end: intent recognition, slot extraction, state transitions, business rules, and the correctness of tool and function calls.
You will play a key role in detecting regressions and evaluation drift as models or prompts evolve, defining meaningful metrics and thresholds (accuracy, precision, recall, F1), and providing clear quality signals to support release decisions. Working closely with QA and ML teams, you will integrate evaluation practices into CI/CD and help bring structure and determinism to inherently non-deterministic systems.
Role
We are looking for
What can you discover with us?
UNLEASH THE POWER OF YOUR CAREER