Continual Learning Bench by Berkeley & Snorkel

We define and advance data and environments to push the AI frontier

Built on 10+ years of pioneering research in data-centric AI, including 250+ publications and benchmarks.

Browse research library

from the lab

Featured research

Research Paper

Accepted to MLSys

Learning from Less: Measuring the Effectiveness of RLVR in Low Data and Compute Regimes

Benchmark

Open Benchmarks Grants

Benchmarking Agents in Insurance Underwriting Environments

key research areas

Vision and impact

We help labs advance frontier models by working with domain experts to design and build complex, realistic datasets that drive model performance.

Benchmarking & Evaluation

Build benchmarks that define and advance the AI frontier

featured work

Continual Learning Bench
Co-published with Berkeley

Terminal-Bench 2.0 (+3.0)
Co-authored with Laude Institute

BigLaw Bench: Research
Co-released with Harvey

SlopCode Bench
Co-released with UW-Madison

Scaling Subject Matter Expertise

Define how subject matter experts encode their knowledge into data

featured work

Weak-to-Strong Generalization Through Data-Centric Lens
ICLR 2025

Rapid Data Creation with Weak Supervision
Best of VLDB 2017

RL, Training, & Data Valuation

Drive dataset development based on feedback from RL and model training

featured work

Learning from Less: Effectiveness of RLVR in Low Data and Compute Regimes
MLSys 2026

4B FinQA Model Outperforms 235B Model with the Right Data
Co-authored with Berkeley

RIFT: A Rubric Failure Mode Taxonomy and Automated Diagnostics
ICLR Workshop 2026

initiatives

Community and open science

Open benchmarks, conversations, and research for real-world AI performance.

Open Benchmarks Grants

Backed by a $3M commitment, the program funds open-source datasets, benchmarks, and evaluation artifacts that shape how frontier AI systems are built and evaluated.

Bench Talks

Our podcast series at the intersection of AI evaluation, data quality, and real-world impact.

Reading Group

A recurring forum for researchers and practitioners to explore the latest frontier developments in AI while building meaningful connections within the community.

DEEP RESEARCH Expertise

Technical advisors and distinguished affiliates

Stephen Bach

Brown University

Eliot Horowitz Assistant Professor, Computer Science Department

Jason Fries

Stanford University

Assistant Professor of Biomedical Data Science and of Medicine

Jared Dunnmon

Co-Founder & Chief Scientist, Stealth Startup

Prev. Dir. of AI at DIU

Fred Sala

Chief Scientist

Snorkel AI

Assistant Professor @ University of Wisconsin-Madison

Chris Ré

Co-Founder

Snorkel AI

Professor @ Stanford University

Ludwig Schmidt

Stanford University · LAION

Stanford researcher and LAION collaborator

Karthik Narasimhan

Princeton University

Professor of Computer Science

Yu Su

Ohio State University

Associate Professor of Computer Science and Engineering

Lewis Tunstall

Hugging Face

Machine Learning Engineer

PUBLICATIONS

Browse research blogs and academic papers

Learning Dependency Structures for Weak Supervision Models

This work presents a robust PCA-based algorithm for learning dependency structures in weak supervision models, establishes improved theoretical recovery rates, and outperforms existing methods on various real-world tasks.

Research Paper

Dec 09, 2019 • P. Varma et al., 2019

Interactive Programmatic Labeling for Weak Supervision

Demonstrating in synthetic and real-world experiments how two simple labeling function acquisition strategies outperform a random baseline.

Research Paper

Dec 08, 2019 • B. Cohen-Wang et al., 2019

Bootstrapping Conversational Agents with Weak Supervision

This paper presents a framework called search, label, and propagate (SLP) for bootstrapping intents from existing chat logs using weak supervision.

Research Paper

Dec 07, 2019 • N. Mallinar et al., 2019

A Machine-Compiled Database of Genome-Wide Association Studies

Describing GWASkb, a machine-compiled knowledge base of genetic associations collected from the scientific literature using automated information extraction algorithms.

Research Paper

Dec 06, 2019 • V. Kuleshov et al., 2019

A Clinical Text Classification Paradigm Using Weak Supervision…

This work develops a rule-based NLP algorithm that automatically generates labels for the training data, then uses pre-trained word embeddings as deep representation features for training machine learning models.

Research Paper

Dec 05, 2019 • Alex Ratner, Armin Parchami, Bhavishya Pohani

Training Classifiers with Natural Language Explanations

Introducing BabbleLabble, a framework for training classifiers in which an annotator provides a natural language explanation for each labeling decision.

Research Paper

Dec 20, 2018 • B. Hancock et al., 2018

Software 2.0 and Snorkel: Beyond Hand-Labeled Data

This paper describes Snorkel, a system that enables users to help shape, create, and manage training data for Software 2.0 stacks.

Research Paper

Dec 19, 2018 • C. Ré, 2018 (invited)

Snorkel MeTaL: Weak Supervision for Multi-Task Learning

Presenting Snorkel MeTaL, an end-to-end system for multi-task learning.

Research Paper

Dec 18, 2018 • A. Ratner et al., 2018

Fonduer: Knowledge Base Construction From Richly Formatted Data

Introducing Fonduer, a machine-learning-based KBC system for richly formatted data.

Research Paper

Dec 17, 2018 • S. Wu et al., 2018

Let’s research together

Join our team of leading researchers and help shape the future of AI.

View all careers
