AI Quality Analyst
Requirements
What You Will Be Doing
- Architect Automated Evaluation Frameworks: Design, implement, and maintain scalable evaluation pipelines (Evals) for LLMs and agent graphs using modern tooling like LangSmith, DeepEval, Ragas, or Opik.
- Curate Ground-Truth Benchmarks: Collaborate with domain experts to build, version, and sanitize robust gold-standard datasets, synthetic evaluation profiles, and edge-case testing matrices reflecting real-world business scenarios.
- Own Non-Deterministic Quality Tracking: Define, monitor, and enforce quality KPIs across multi-agent workflows, specifically focusing on tool-calling accuracy, intent-recognition safety, structured output formatting, and context-retrieval (RAG) precision.
- Mitigate and Quantify Systemic Risk: Lead rigorous failure and hallucination analyses on production outputs. Implement structured LLM-as-Judge patterns, validation metrics, and guardrail heuristics while actively ensuring the judge profiles remain free of baseline evaluation bias.
- Enforce CI/CD Evaluation Gates: Partner directly with MLOps and Backend Engineering teams to integrate automated testing gates into our deployment pipelines, proactively preventing regressions or behavioral drifts from reaching production runtime environments.
- Drive Optimization for Latency & Cost: Regularly analyze the efficiency of prompt templates, few-shot structures, and model selections (e.g., GPT, Claude, LLaMA) to ensure a highly calibrated balance between execution throughput, sub-second latency, and platform compute costs.
Benefits
Make a genuine impact on the product
Join our upward trajectory, and grow with us. We provide the resources and opportunities for continuous personal and professional development, empowering you to make a genuine impact on our evolving product.
Work in the EU
Embark on this exciting journey with us and enjoy the flexibility of traveling and working remotely or in a hybrid model across Europe.
Become a stock options holder
Unlock your inner entrepreneur and align your aspirations with ours through our Stock Options Program. This exciting opportunity is available to every team member, from junior team members to our founders.
Receive unwavering support and care
Finom stands by you at every step. Our commitment to your well-being and success is reflected in our modern, friendly, and eco-conscious corporate culture, and we offer constant support and care to ensure your Finom experience is successful and fulfilling.
Work & Swim program
Immerse yourself in our exclusive Work & Swim Program. Spend one month in a comfortable corporate apartment in enchanting Cyprus. It's the ideal opportunity to strike the perfect work-life balance while enjoying breathtaking Mediterranean views.
Equal Opportunity Statement
At Finom, we're an equal opportunity employer. We value diversity and invite applications from all walks of life. We do not discriminate based on race, religion, color, national origin, gender, sexual orientation, age, marital status, disability status, or other applicable legally protected characteristics.
Original Advert
Who You Are
- A Data-Savvy Automation Advocate: You possess strong software engineering fundamentals and concrete Python coding experience, allowing you to seamlessly script custom evaluation routines and query multi-tenant databases.
- An Analytical Thinker with an AI Lens: You understand that testing non-deterministic LLMs requires a completely different mindset than traditional QA. You possess deep intuition for token behaviors, retrieval dynamics, prompt engineering nuances, and failure states.
- Radically Autonomous & Collaborative: You do not wait around for static technical specifications. You independently coordinate syncs with AI leads, domain backend engineers, and product stakeholders to identify and patch system vulnerabilities.
- Rigorously Quality-Oriented: You hold a low ego but maintain high standards for system stability. You are deeply passionate about separating market hype from practical, measurable production metrics.
Application managed by Finom