Derek Pham

Research Engineer, Snorkel AI

Derek Pham is a Research Engineer at Snorkel AI, working on benchmarks, evaluation, and synthetic data workflows for frontier model development. He previously built large-scale NLP systems in the data-as-a-service domain and holds an MS in Computer Science from Columbia University.

The latest from Derek

Automating Benchmark Design
The rapid progress and widespread deployment of LLMs and LLM-powered agents have outpaced our ability to evaluate them. Hand-crafted, static benchmarks are the primary tool for assessing model capabilities, but they quickly become saturated. In contrast, dynamic benchmarks evolve alongside the models they evaluate, but are expensive to create and continuously update. To address these challenges, we develop BeTaL (Benchmark Tuning with an LLM-in-the-loop), a framework that leverages environment design principles to automate the process of dynamic benchmark design. BeTaL works by parameterizing key design choices in base benchmark templates and using LLMs to reason through the resulting parameter space...
Research Paper
Accepted to ICLR 2026

Scaling Trust: Rubrics in Snorkel’s Quality Process
Blog

Snorkel’s “Trusted Scale” philosophy
Welcome to Part 4 of Snorkel AI’s rubric series. In previous posts, we explored how rubrics enable structured evaluation (Part 1), the spectrum of rubric types and use cases (Part 2), and the science behind designing and validating them (Part 3). In this latest installment, we pull back the curtain on how Snorkel puts these principles…

Oct 16, 2025
