Continual Learning Bench by Berkeley & Snorkel

We define and advance data and environments to push the AI frontier

Built on 10+ years of pioneering research in data-centric AI, including 250+ publications and benchmarks.

from the lab

Featured research

Research Paper

Accepted to MLSys

Learning from Less: Measuring the Effectiveness of RLVR in Low Data and Compute Regimes

Benchmark

Open Benchmarks Grants

Benchmarking Agents in Insurance Underwriting Environments

key research areas

Vision and impact

We help labs advance frontier models by working with domain experts to design and build complex, realistic datasets that drive model performance.

Benchmarking & Evaluation

Build benchmarks that define and advance the AI frontier

featured work

Continual Learning Bench: Evaluating agents that adapt and improve over time
Co-published with Berkeley

Terminal-Bench 2.0 (+3.0)
Co-authored with Laude Institute

BigLaw Bench: Research
Co-released with Harvey

SlopCode Bench: A community benchmark measuring code erosion
Co-released with UW-Madison

Scaling Subject Matter Expertise

Define how subject matter experts encode their knowledge into data

featured work

Weak-to-Strong Generalization Through the Data-Centric Lens
ICLR 2025

Snorkel: Rapid Training Data Creation with Weak Supervision
Best of VLDB 2017

RL, Training, & Data Valuation

Drive dataset development based on feedback from RL and model training

featured work

Learning from Less: Effectiveness of RLVR in Low Data and Compute Regimes
MLSys 2026

4B FinQA Model Outperforms 235B Model with the Right Data
Co-authored with Berkeley

RIFT: A Rubric Failure Mode Taxonomy and Automated Diagnostics
ICLR Workshop 2026

initiatives

Community and open science

Open benchmarks, conversations, and research for real-world AI performance.

Open Benchmarks Grants

Backed by a $3M commitment, the program funds open-source datasets, benchmarks, and evaluation artifacts that shape how frontier AI systems are built and evaluated.

Bench Talks

Our podcast series at the intersection of AI evaluation, data quality, and real-world impact.

Reading Group

A recurring forum for researchers and practitioners to explore the latest frontier developments in AI while building meaningful connections within the community.

deep research expertise

Technical advisors and distinguished affiliates

Stephen Bach

Brown University

Eliot Horowitz Assistant Professor, Computer Science Department

Jason Fries

Stanford University

Assistant Professor of Biomedical Data Science and of Medicine

Jared Dunnmon

Co-Founder & Chief Scientist, Stealth Startup

Prev. Dir. of AI at DIU

Fred Sala

Chief Scientist

Snorkel AI

Assistant Professor @ University of Wisconsin-Madison

Chris Ré

Co-Founder

Snorkel AI

Professor @ Stanford University

Ludwig Schmidt

Stanford University · LAION

Stanford researcher and LAION collaborator

Karthik Narasimhan

Princeton University

Professor of Computer Science

Yu Su

Ohio State University

Associate Professor of Computer Science and Engineering

Lewis Tunstall

Hugging Face

Machine Learning Engineer

PUBLICATIONS

Browse research blogs and academic papers

Deep Text Mining of Instagram Data Without Strong Supervision

This paper showcases methods for unsupervised mining of fashion attributes from Instagram text, which can enable a new kind of user recommendation in the fashion domain.

Research Paper

Dec 16, 2018 • K. Hammar et al., 2018

Snorkel: Fast Training Set Generation for Information Extraction

Introducing Snorkel, a new system for quickly creating, managing, and modeling training datasets.

Research Paper

Dec 20, 2017 • A. Ratner et al., 2017

Learning to Compose Domain-Specific Transformations for Data Augmentation

Automating data augmentation by learning a generative sequence model over user-specified transformation functions.

Research Paper

Dec 19, 2017 • A. Ratner et al., 2017

Learning the Structure of Generative Models Without Labeled Data

Proposing a method for estimating the structure of generative models of training data that is 100x faster than a maximum likelihood approach.

Research Paper

Dec 18, 2017 • S. Bach et al., 2017

Inferring Generative Model Structure With Static Analysis

Presenting Coral, a paradigm that infers generative model structure, significantly reducing the amount of data required to learn structure.

Research Paper

Dec 17, 2017 • P. Varma et al., 2017

SwellShark: A Generative Model for Biomedical Named Entity Recognition Without Labeled Data

Introducing SwellShark, a framework for building biomedical named entity recognition (NER) systems quickly.

Research Paper

Nov 13, 2017 • J. Fries et al., 2017

Socratic Learning: Augmenting Generative Models to Incorporate Latent Subsets in Training Data

Introducing Socratic learning, a paradigm that uses feedback from a discriminative model to automatically identify latent data subsets in training data.

Research Paper

Nov 13, 2017 • P. Varma et al., 2017

Snorkel: Rapid Training Data Creation With Weak Supervision

This paper presents a flexible interface layer for writing labeling functions, designed by drawing on the authors' experience.

Research Paper

Oct 04, 2017 • Alexander Ratner, Stephen H. Bach, Henry Ehrenberg, Jason Fries, Sen Wu, Christopher Ré

Data Programming: Creating Large Training Sets, Quickly

A paradigm for labeling training datasets programmatically rather than by hand.

Research Paper

Dec 20, 2016 • A. Ratner et al., 2016

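To illustrate what labeling training data "programmatically rather than by hand" can look like, here is a minimal sketch. It assumes a toy spam-filtering task and three hypothetical labeling functions (`lf_contains_link`, `lf_mentions_prize`, `lf_short_message`), each of which votes or abstains. The votes are combined by simple majority, which is a deliberately simplified stand-in: the Data Programming paper instead learns a generative model of the labeling functions' accuracies, and this is not the Snorkel library's API.

```python
from collections import Counter
from typing import Callable, List, Optional

ABSTAIN = None
SPAM, HAM = 1, 0

# Hypothetical labeling functions: each encodes one noisy heuristic and may abstain.
def lf_contains_link(text: str) -> Optional[int]:
    return SPAM if "http" in text.lower() else ABSTAIN

def lf_mentions_prize(text: str) -> Optional[int]:
    return SPAM if "prize" in text.lower() else ABSTAIN

def lf_short_message(text: str) -> Optional[int]:
    return HAM if len(text.split()) < 5 else ABSTAIN

LABELING_FUNCTIONS: List[Callable[[str], Optional[int]]] = [
    lf_contains_link,
    lf_mentions_prize,
    lf_short_message,
]

def majority_vote(text: str) -> Optional[int]:
    """Simplified label aggregation: majority over non-abstaining votes (not the paper's label model)."""
    votes = [lf(text) for lf in LABELING_FUNCTIONS]
    votes = [v for v in votes if v is not ABSTAIN]
    if not votes:
        return ABSTAIN
    counts = Counter(votes).most_common()
    # Treat a tie between labels as an abstention rather than guessing.
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return ABSTAIN
    return counts[0][0]

unlabeled = [
    "Claim your prize now at http://example.com",
    "See you at lunch",
    "Meeting notes attached for tomorrow's review session",
]
print([majority_vote(t) for t in unlabeled])  # prints [1, 0, None]
```

Examples where every labeling function abstains (or the votes tie) stay unlabeled; in the paper's approach, a learned model weighs disagreeing sources by their estimated accuracy instead of counting them equally.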

Let’s research together

Join our team of leading researchers and help shape the future of AI.
