Derek Pham

Research Engineer, Snorkel AI

Derek Pham is a Research Engineer at Snorkel AI, working on benchmarks, evaluation, and synthetic data workflows for frontier model development. He previously built large-scale NLP systems in the data-as-a-service domain and holds an MS in Computer Science from Columbia University.

The latest from Derek

Automating Benchmark Design

The rapid progress and widespread deployment of LLMs and LLM-powered agents have outpaced our ability to evaluate them. Hand-crafted, static benchmarks are the primary tool for assessing model capabilities, but they quickly become saturated. In contrast, dynamic benchmarks evolve alongside the models they evaluate but are expensive to create and to keep updated. To address these challenges, we develop BeTaL (Benchmark Tuning with an LLM-in-the-loop), a framework that leverages environment design principles to automate dynamic benchmark design. BeTaL works by parameterizing key design choices in base benchmark templates and using LLMs to reason through the resulting parameter space...

Research Paper

Accepted to ICLR 2026

Oct 30, 2025

Amanda Dsouza, Harit Vishwakarma, Zhengyang Qi, Justin Bauer, Derek Pham, Thomas Walshe, Armin Parchami, Frederic Sala, Paroma Varma

Blog

Scaling Trust: Rubrics in Snorkel’s Quality Process

Snorkel’s “Trusted Scale” philosophy

Welcome to Part 4 of Snorkel AI’s rubric series. In previous posts, we explored how rubrics enable structured evaluation (Part 1), the spectrum of rubric types and use cases (Part 2), and the science behind designing and validating them (Part 3). In this latest installment, we pull back the curtain on how Snorkel puts these principles…

Oct 16, 2025

Derek Pham

For models that need to be right. Not just good enough.
