Paroma Varma

Automating Benchmark Design

The rapid progress and widespread deployment of LLMs and LLM-powered agents have outpaced our ability to evaluate them. Hand-crafted, static benchmarks are the primary tool for assessing model capabilities, but these quickly become saturated. In contrast, dynamic benchmarks evolve alongside the models they evaluate, but are expensive to create and continuously update. To address these challenges, we develop BeTaL (Benchmark Tuning with an LLM-in-the-loop), a framework that leverages environment design principles to automate the process of dynamic benchmark design. BeTaL parameterizes key design choices in base benchmark templates and uses LLMs to reason through the resulting parameter space...

Research Paper

Accepted to ICLR 2026

Oct 30, 2025

Amanda Dsouza, Harit Vishwakarma, Zhengyang Qi, Justin Bauer, Derek Pham, Thomas Walshe, Armin Parchami, Frederic Sala, Paroma Varma

Blog

Walking safely before building flying saucer seatbelts: introducing Enterprise Alignment

Snorkel takes a step on the path to enterprise superalignment with new data development workflows for enterprise alignment

May 20, 2024

Alex Ratner, Tom Walshe, Chris Glaze, Fred Sala, Paroma Varma, Hoang Tran

Blog

Here’s how Snorkel Flow + Google AI built an enterprise-ready model in a day

Google and Snorkel AI customized PaLM 2 using domain expertise and data development to improve performance by 38 F1 points in a matter of hours.

Mar 19, 2024

Ali Arsanjani, Paroma Varma

DEEM’22: Data Management for End-to-End Machine Learning

The DEEM’22 workshop (Data Management for End-to-End Machine Learning) is held on Sunday, June 12th, in conjunction with SIGMOD/PODS 2022. DEEM brings together researchers and practitioners at the intersection of applied machine learning, data management and systems research, with the goal of discussing the data management issues arising in ML application scenarios. The workshop solicits regular research papers (10 pages) describing preliminary and ongoing research results, including industrial experience reports of end-to-end ML deployments, related to DEEM topics. In addition, DEEM 2022 establishes a new paper category for reports on applications and tools (4 pages) as a forum for sharing interesting...

Research Paper

Oct 20, 2023

M. Boehm, et al.

Parameterizing neural power spectra into periodic and aperiodic components

Electrophysiological signals exhibit both periodic and aperiodic properties. Periodic oscillations have been linked to numerous physiological, cognitive, behavioral and disease states. Emerging evidence demonstrates that the aperiodic component has putative physiological interpretations and that it dynamically changes with age, task demands and cognitive states. Electrophysiological neural activity is typically analyzed using canonically defined frequency bands, without consideration of the aperiodic (1/f-like) component. We show that standard analytic approaches can conflate periodic parameters (center frequency, power, bandwidth) with aperiodic ones (offset, exponent), compromising physiological interpretations. To overcome these limitations, we introduce an algorithm to parameterize neural power spectra as a combination...

Research Paper

Nov 23, 2020

T. Donoghue, et al.

Cardiac Imaging of Aortic Valve Area From 34 287 UK Biobank Participants Reveals Novel Genetic Associations and Shared Genetic Comorbidity With Multiple Disease Phenotypes

Background: The aortic valve is an important determinant of cardiovascular physiology and anatomic location of common human diseases. Methods: From a sample of 34 287 white British ancestry participants, we estimated functional aortic valve area by planimetry from prospectively obtained cardiac magnetic resonance imaging sequences of the aortic valve. Aortic valve area measurements were submitted to genome-wide association testing, followed by polygenic risk scoring and phenome-wide screening, to identify genetic comorbidities. Results: A genome-wide association study of aortic valve area in these UK Biobank participants showed 3 significant associations, indexed by rs71190365 (chr13:50764607, DLEU1, P=1.8×10⁻⁹), rs35991305 (chr12:94191968, CRADD, P=3.4×10⁻⁸), and chr17:45013271:C:T...

Research Paper

Oct 30, 2020

A. Córdova-Palomera, et al.

Utilizing Weak Supervision to Infer Complex Objects in Autonomous Driving Data

This paper explores the applicability of weak supervision, which relies on higher-level, noisier forms of supervision to label training data, specifically using data programming.

Research Paper

Dec 19, 2019

Z. Weng, et al., 2019

Snuba: Automating Weak Supervision to Label Training Data

Presenting Snuba, a system to automatically generate heuristics using a small labeled dataset to assign training labels to a large, unlabeled dataset in the weak supervision setting.

Research Paper

Dec 16, 2019

P. Varma and C. Ré, 2019

Scene Graph Prediction With Limited Labels

This paper introduces a semi-supervised method that assigns probabilistic relationship labels to a large number of unlabeled images using few labeled examples.

Research Paper

Dec 13, 2019

V. Chen, et al., 2019
