“You don’t run evaluations. You build the organizations that run evaluations.”
At L9, you build eval organizations. Not teams — organizations. You hire, you train, you define the culture, you design the methodology, you set the standards, and you ensure that the organization produces consistently excellent evaluation work without depending on any single person — including you.
This is rare air. There aren’t many people in the world who can build an AI evaluation organization from zero. The discipline is too new. The talent is too scarce. The methodology is still being invented. You’re building the plane while flying it — hiring evaluators for a role that didn’t exist five years ago, training them in methods that are still evolving, and delivering work that clients and regulators trust.
Your output isn’t eval reports. It’s eval teams. When a company needs to stand up an AI evaluation function, you’re the person who makes it happen. You design the org structure, write the job descriptions, build the training curriculum, create the methodology playbooks, set the quality standards, and develop the culture that attracts rigorous people. Then you step back and watch it run.
What You Do
- Build eval organizations — from zero. Org design, hiring plan, training curriculum, methodology library, tooling stack, process documentation. The full picture.
- Hire and develop eval talent — find people with the right combination of statistical rigor, domain curiosity, and practical judgment. Develop them through the L1-L8 career track.
- Define eval culture — set the values and norms that make an eval organization work. Intellectual honesty. Methodological rigor. Practical impact. No theater.
- Strategic eval leadership — work with executives to align eval priorities with business strategy, regulatory requirements, and product roadmaps.
- Methodology governance — ensure eval methodology stays current, rigorous, and practical across the organization. Update standards as the field evolves.
- Client portfolio management — manage relationships across multiple clients, ensuring consistent eval quality and strategic alignment.
- Industry leadership — represent Worca’s eval practice in the industry. Standards bodies, conferences, advisory boards.
AI Skills Required
- AI-assisted org design — use AI to analyze organizational structures, identify optimal team configurations, and model growth scenarios
- AI-powered talent assessment — design and refine AI-assisted evaluation systems for assessing eval talent across the career ladder
- Methodology lifecycle management — use AI to track the evolving state of eval methodology, identify gaps, and prioritize research
- Strategic AI analysis — use AI to analyze industry trends, regulatory changes, and competitive landscape to inform eval strategy
- Scalable training systems — build AI-enhanced training programs that develop eval talent efficiently across the organization
Self-Evaluation Checklist
- I’ve built an eval organization (or function) from zero — hiring, training, methodology, culture
- The eval teams I’ve built produce high-quality work without my direct involvement
- I’ve hired and developed 10+ evaluators across multiple career levels
- The eval culture I’ve created attracts top talent — people want to join
- I manage client relationships across a portfolio, not just individual engagements
- I represent Worca’s eval practice externally — conferences, standards bodies, advisory roles
- I’ve developed 2+ evaluators to L5 or above
- My organizational systems (hiring, training, methodology, quality) are documented and replicable
- Company leadership seeks my strategic guidance on AI evaluation priorities
Training Curriculum
Months 1-12: Organization Building
- Org Design from Scratch — study successful eval organizations (internal and external). Design your own org structure, hiring plan, and growth model.
- Hiring and Assessment — develop expertise in identifying eval talent. Build interview processes, assessment rubrics, and trial project designs.
- Culture Engineering — study what makes eval organizations excellent. Intellectual honesty, methodological rigor, practical impact. Design the culture intentionally.
- Training System Design — build a training curriculum that develops L1 trainees into L5+ senior evaluators. Scalable, measurable, effective.
Months 13-24: Organizational Excellence
- Multi-Client Operations — manage eval delivery across multiple simultaneous clients. Quality control, resource allocation, priority management.
- Talent Development Pipeline — build a system that consistently produces excellent evaluators. Not dependent on individual mentors.
- Methodology Governance — design systems for keeping eval methodology current across the organization. Research intake, standard updates, methodology review cycles.
- Organizational Metrics — define and track the metrics that matter for an eval organization. Not just client satisfaction — evaluator development, methodology quality, retention, innovation.
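The organizational-metrics bullet above can be made concrete as a tracked record. A minimal sketch — the field names and thresholds are illustrative assumptions, not a prescribed Worca schema:

```python
from dataclasses import dataclass

@dataclass
class OrgMetrics:
    """One quarter's snapshot of eval-organization health.

    Field names are illustrative, not a Worca standard: they mirror the
    dimensions named in the curriculum (client satisfaction, evaluator
    development, methodology quality, retention).
    """
    client_satisfaction: float   # e.g. average CSAT on a 0-5 scale
    evaluators_promoted: int     # career-level advancements this quarter
    methodology_updates: int     # standards revised through review cycles
    retention_rate: float        # trailing 12-month evaluator retention, 0-1

    def healthy(self) -> bool:
        # Example thresholds; an organization would tune these to its
        # own baselines rather than adopt them as-is.
        return self.retention_rate >= 0.85 and self.client_satisfaction >= 4.0

snapshot = OrgMetrics(
    client_satisfaction=4.3,
    evaluators_promoted=2,
    methodology_updates=3,
    retention_rate=0.90,
)
```

The point of the sketch is the bullet's argument in miniature: the record deliberately carries more than client satisfaction, so a dashboard built on it cannot silently reduce "organizational health" to one number.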
Months 25-36: Industry Leadership
- Advisory Board Participation — join advisory boards for AI companies, standards bodies, or regulatory agencies.
- Industry Thought Leadership — establish yourself as a recognized voice on AI evaluation. Publishing, speaking, advising at the industry level.
- Partnership Development — build relationships that create opportunities for Worca’s eval practice. Companies, regulators, academic institutions.
- L10 Portfolio — compile your organization-building record, industry impact, and leadership outcomes for Partner consideration.
Ranking Standard
| Metric | Threshold | How It’s Measured |
|---|---|---|
| Organizations built | 1+ eval organization or function built from zero | Portfolio review |
| Talent developed | 10+ evaluators hired and developed | Career tracking |
| Senior talent | 2+ evaluators developed to L5+ | Rank records |
| Client portfolio | 5+ active client relationships managed | Account records |
| Organizational independence | Teams produce quality work without direct involvement | Quality audits |
| Industry presence | Recognized authority in AI evaluation | External references |
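The quantitative thresholds in the table above reduce to a simple readiness check. A sketch — the function and parameter names are assumptions for illustration, and the qualitative rows (organizational independence, industry presence) still require portfolio review and quality audits, not a boolean:

```python
def l9_quantitative_thresholds_met(
    orgs_built: int,
    talent_developed: int,
    senior_talent: int,
    active_clients: int,
) -> bool:
    """Check only the countable L9 thresholds from the ranking table:
    1+ organization built, 10+ evaluators developed, 2+ developed to
    L5+, and 5+ active client relationships."""
    return (
        orgs_built >= 1
        and talent_developed >= 10
        and senior_talent >= 2
        and active_clients >= 5
    )

# Exactly at every threshold still qualifies:
l9_quantitative_thresholds_met(1, 12, 2, 5)  # → True
```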
Promotion to L10
By Invitation Only
L10 — Worca Partner, AI Eval — is not applied for. It’s offered. The Worca partnership evaluates candidates based on:
- Organizational legacy — have they built eval organizations that outlast their direct involvement?
- Industry impact — have their standards, benchmarks, or methodologies shaped how the industry evaluates AI?
- Talent multiplication — have they developed senior eval leaders who themselves develop others?
- Strategic vision — do they see where AI evaluation is going and have they positioned Worca to lead?
- Cultural contribution — have they shaped Worca’s identity as an AI evaluation authority?
There is no timeline. There is no checklist. The partnership knows when someone is ready.
Mentorship at This Level
- You receive: Worca Partner mentorship and strategic advisory.
- You give: Mentorship across all levels, with focus on developing L7-L8 eval architects.
- Referral cut: 8% of the mentee’s monthly rate for 24 months after placement.
- Leadership role: You are Worca’s eval practice. Your decisions, standards, and culture define it.
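The referral arithmetic above is simple but worth making explicit. A minimal sketch — the $20,000/month figure is hypothetical, not a Worca rate; the 8% cut and 24-month window come from the bullet above:

```python
def referral_payout(monthly_rate: float, months: int = 24, cut: float = 0.08) -> float:
    """Total referral income: a fixed cut of the mentee's monthly rate,
    paid for a fixed number of months after placement."""
    return monthly_rate * cut * months

# Hypothetical mentee billing $20,000/month:
# 20,000 * 0.08 * 24 = 38,400 over the 24-month window.
total = referral_payout(20_000)
```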
What Unlocks at L10
- Worca Partnership — equity, strategic decision-making, profit sharing
- Industry authority — your name is synonymous with AI evaluation excellence
- Legacy — the organizations and people you’ve built carry the work forward
- The privilege of defining how the world evaluates AI