About the founder

How the studio is led, and how doctoral-level methodology informs expert-panel evaluation.

Who runs Delta Evals

Sherif Bakr

Computer Science PhD candidate · IEEE Senior Member · USPTO patent holder

Multi-agent labeling · Soft-label distributions · LLM calibration · Active learning · Software engineering · Mechanical engineering

Sherif’s research focuses on multi-agent labeling, soft-label distributions, LLM calibration, and active learning. These areas form the methodological foundation of Delta Evals’ expert-panel evaluation workflow.

He has graduate training in both software engineering and mechanical engineering. His prior work spans AI trust and safety, software engineering, and applied research, including research at a U.S. national defense research laboratory.

Delta Evals is built around a simple thesis: frontier AI teams need auditable distributions of expert judgment, not just one-off labels.

Delta Evals was founded by Sherif Bakr, a Computer Science PhD candidate, IEEE Senior Member, and USPTO patent holder with graduate training in both software engineering and mechanical engineering.

Sherif’s doctoral work focuses on multi-agent labeling, soft-label distributions, LLM calibration, reinforcement learning, and active learning. That research directly informs Delta Evals’ approach: combine calibrated human experts, structured rubrics, disagreement modeling, and audit-ready QA to produce higher-signal evaluation data for frontier AI teams.

Before Delta Evals, Sherif worked across AI trust and safety, software engineering, and applied research. That work included technical evaluation workflows, engineering-heavy problem domains, and research at a U.S. national defense research laboratory.

Have a data problem worth solving?

Tell us the model behaviors you want to measure or improve. We’ll come back with a labeling and benchmark plan in days, not weeks.
