Research

Snorkel AI emerged from a research project, and we remain closely connected to the research community. Students and professors associated with the Snorkel project continue to publish academic papers that push the field forward, and the Snorkel AI research team integrates the most promising of those ideas into our platform.

Our picks

Getting better performance from foundation models (with less data)

August 4, 2023

•

Fred Sala

Snorkel AI researchers present 18 papers at NeurIPS 2023

The Snorkel AI team will present 18 research papers and talks at the 2023 Neural Information Processing Systems (NeurIPS) conference from December 10-16. The Snorkel papers cover a broad range of topics including fairness, semi-supervised learning, large language models (LLMs), and domain-specific models. Snorkel AI is proud of its roots in the research community and endeavors to remain at the forefront

October 31, 2023

•

Team Snorkel

Long context models in the enterprise: benchmarks and beyond

Snorkel researchers devised a new way to evaluate long context models and address their “lost-in-the-middle” challenges with medoid voting.

June 6, 2024

•

Amanda Dsouza

All articles on Research

Snorkel’s Journey to Data-Centric AI, with Chris Ré

The Future of Data-Centric AI Talk Series Background Snorkel co-founder Chris Ré is an associate professor of Computer Science at Stanford University and an award-winning researcher in data-based theory and machine learning. He has co-founded four companies based on his research in machine learning systems. Chris recently presented at the Future of Data-Centric AI virtual event in September, where he

November 3, 2021

•

Team Snorkel

Forager: Rapid Data Exploration for Rapid Model Development

Machine Learning Whiteboard (MLW) Open-source Series We started our machine learning whiteboard (MLW) series earlier this year as an open-invite space to brainstorm ideas and discuss the latest papers, techniques, and workflows in the AI space. We emphasize an informal and open environment to everyone interested in learning about machine learning. In this episode, Fait Poms, a Ph.D. student at Stanford

October 14, 2021

•

Team Snorkel

Recap: The Future of Data-Centric AI Event

Main takeaways from The Future of Data-Centric AI Event We recently hosted The Future of Data-Centric AI, where academia, research, and industry experts and practitioners came together to discuss the shift from model-centric AI development to data-centric AI and what lies ahead. This post gives you a quick overview of the event and top takeaways from over eight hours of

October 11, 2021

•

Aarti Bagul

Building Malleable Machine Learning (ML) Systems

Defining and Building Malleable ML Systems – Machine Learning Whiteboard (MLW) Open-Source Series As you may know, earlier this year, we started our machine learning whiteboard (MLW) series, an open-invite space to brainstorm ideas and discuss the latest papers, techniques, and workflows in the AI space. We emphasize an informal and open environment to everyone interested in learning about machine learning. In this

September 22, 2021

•

Team Snorkel

Applying Weak Supervision Research

ScienceTalks with Paroma Varma In this episode of Science Talks, Snorkel AI’s Braden Hancock chats with Paroma Varma – a co-founder of Snorkel AI and one of the first and leading contributors to the Snorkel project. We discuss Paroma’s path into machine learning, her work in optimization and signal processing during her undergrad, weak supervision and image data during her

September 13, 2021

•

Team Snorkel

Sliceline: Fast, Linear-Algebra-Based Slice Finding for ML Model Debugging

Diving Into SliceLine – Machine Learning Whiteboard (MLW) Open-source Series Earlier this year, we started our machine learning whiteboard (MLW) series, an open-invite space to brainstorm ideas and discuss the latest papers, techniques, and workflows in the AI space. We emphasize an informal and open environment to everyone interested in learning about machine learning. In this episode, Kaushik Shivakumar dives into

September 8, 2021

•

Team Snorkel

The Future of Data-Centric AI – Virtual Live Event

Join the live discussion. Learn how to unlock data-centric AI and make AI development practical in your organization Working with vast unstructured and unlabeled data is one of the bottlenecks in the machine learning lifecycle. Machine learning models can only get as reliable and accurate as the data being fed to them. With a data-centric approach, your data science

August 31, 2021

•

Team Snorkel

Developing and Managing Systems to Extract Structured Data

Machine Learning Whiteboard (MLW) Open-source Series Earlier this year, we started our machine learning whiteboard (MLW) series, an open-invite space to brainstorm ideas and discuss the latest papers, techniques, and workflows in the AI space. We emphasize an informal and open environment to everyone interested in learning about machine learning. In this episode, Manan Shah dives into “Glean: Structured Extractions from

August 2, 2021

•

Team Snorkel

Multi-Resolution Weak Supervision for Sequential Data

Machine Learning Whiteboard (MLW) Open-source Series Our machine learning whiteboard (MLW) is an open-invite space to brainstorm ideas and discuss the latest papers, techniques, and workflows in the AI space. We emphasize an informal and open environment to everyone interested in discovering more about machine learning. In this episode, Hiromu Hota, Vincent Sunn Chen, Daniel Y. Fu, and Frederic Sala dive

June 25, 2021

•

Team Snorkel

Weak Supervision in Biomedicine

In this episode of Science Talks, Snorkel AI’s Braden Hancock chats with Jason Fries – a research scientist at Stanford University’s Biomedical Informatics Research lab and Snorkel Research, and one of the first contributors to the Snorkel open-source library. We discuss Jason’s path into machine learning, empowering doctors and scientists with weak supervision, and utilizing organizational resources in biomedical applications of Snorkel. This episode is part

June 16, 2021

•

Team Snorkel

Training Classifiers With Natural Language Explanations

Machine Learning Whiteboard (MLW) Open-source Series Earlier this year, we started our machine learning whiteboard (MLW) series, an open-invite space to brainstorm ideas and discuss the latest papers, techniques, and workflows in the AI space. We emphasize an informal and open environment to everyone interested in learning about machine learning. In this episode, our Co-founder and Head of Technology, Braden Hancock

May 24, 2021

•

Team Snorkel

Applying Information Theory to ML With Fred Sala

In this episode of Science Talks, Frederic Sala, an assistant professor of Computer Science at the University of Wisconsin-Madison and a research scientist at Snorkel, discusses his path into machine learning, the central thesis that ties together his multidisciplinary research, his thoughts on the future of weak supervision, and his decision to go into academia.

May 19, 2021

•

Team Snorkel

3 Impractical Assumptions About AI to Avoid

Impractical ML assumptions are made every day in research, which limit its adoption. In the real world, these assumptions do not hold up. Learn more about how to avoid making these assumptions about AI application development.

May 4, 2021

•

Braden Hancock

Measuring NLP Progress With Sebastian Ruder

In this episode of Science Talks, Sebastian Ruder, Research Scientist at DeepMind, shares his thoughts on making AI practical with Snorkel AI’s Braden Hancock. This conversation covers progress made in the NLP domain with emerging research, new benchmarks like SuperGLUE, rich repositories and news sources that keep you in the loop and on top of what’s new in NLP, and more.

March 10, 2021

•

Team Snorkel

Productionizing ML Research With Thomas Wolf

In this episode of ScienceTalks, Snorkel AI’s Braden Hancock chats with Hugging Face’s Chief Science Officer, Thomas Wolf. Thomas shares his story about how he got into machine learning and discusses important design decisions behind the widely adopted Transformers library, as well as the challenges of bringing research projects into production. ScienceTalks is an interview series from Snorkel AI, highlighting some of the best work and ideas to make AI practical.

February 5, 2021

•

Team Snorkel
