Category

Research

Snorkel AI emerged from a research project, and we remain closely connected to the research community. Students and professors associated with the Snorkel project continue to publish academic papers that push the field forward, and the Snorkel AI research team integrates the most promising of those ideas into our platform.

Our picks

Getting better performance from foundation models (with less data)
August 4, 2023
Fred Sala
Snorkel AI researchers present 18 papers at NeurIPS 2023
The Snorkel AI team will present 18 research papers and talks at the 2023 Neural Information Processing Systems (NeurIPS) conference from December 10-16. The Snorkel papers cover a broad range of topics including fairness, semi-supervised learning, large language models (LLMs), and domain-specific models. Snorkel AI is proud of its roots in the research community and endeavors to remain at the forefront
October 31, 2023
Team Snorkel
Long context models in the enterprise: benchmarks and beyond
Snorkel researchers devised a new way to evaluate long context models and address their “lost-in-the-middle” challenges with medoid voting.
June 6, 2024
Amanda Dsouza

All articles on Research

Snorkel’s Journey to Data-Centric AI, with Chris Ré
The Future of Data-Centric AI Talk Series Background Snorkel co-founder Chris Ré is an associate professor of Computer Science at Stanford University and an award-winning researcher in data-based theory and machine learning. He has co-founded four companies based on his research in machine learning systems. Chris recently presented at the Future of Data-Centric AI virtual event in September, where he
November 3, 2021
Team Snorkel
Forager: Rapid Data Exploration for Rapid Model Development
Machine Learning Whiteboard (MLW) Open-source Series We started our machine learning whiteboard (MLW) series earlier this year as an open-invite space to brainstorm ideas and discuss the latest papers, techniques, and workflows in the AI space. We emphasize an informal and open environment to everyone interested in learning about machine learning. In this episode, Fait Poms, a Ph.D. student at Stanford
October 14, 2021
Team Snorkel
Recap: The Future of Data-Centric AI Event
Main takeaways from The Future of Data-Centric AI Event We recently hosted The Future of Data-Centric AI, where academia, research, and industry experts and practitioners came together to discuss the shift from model-centric AI development to data-centric AI and what lies ahead. This post gives you a quick overview of the event and top takeaways from over eight hours of
October 11, 2021
Aarti Bagul
Building Malleable Machine Learning (ML) Systems
Defining and Building Malleable ML Systems – Machine Learning Whiteboard (MLW) Open-Source Series As you may know, earlier this year, we started our machine learning whiteboard (MLW) series, an open-invite space to brainstorm ideas and discuss the latest papers, techniques, and workflows in the AI space. We emphasize an informal and open environment to everyone interested in learning about machine learning. In this
September 22, 2021
Team Snorkel
Applying Weak Supervision Research
ScienceTalks with Paroma Varma In this episode of Science Talks, Snorkel AI’s Braden Hancock chats with Paroma Varma – a co-founder of Snorkel AI and one of the first and leading contributors to the Snorkel project. We discuss Paroma’s path into machine learning, her work in optimization and signal processing during her undergrad, weak supervision and image data during her
September 13, 2021
Team Snorkel
Sliceline: Fast, Linear-Algebra-Based Slice Finding for ML Model Debugging
Diving Into SliceLine – Machine Learning Whiteboard (MLW) Open-source Series Earlier this year, we started our machine learning whiteboard (MLW) series, an open-invite space to brainstorm ideas and discuss the latest papers, techniques, and workflows in the AI space. We emphasize an informal and open environment to everyone interested in learning about machine learning. In this episode, Kaushik Shivakumar dives into
September 8, 2021
Team Snorkel
The Future of Data-Centric AI – Virtual Live Event
Join the live discussion. Learn how to unlock data-centric AI and make AI development practical in your organization. Working with vast unstructured and unlabeled data is one of the bottlenecks in the machine learning lifecycle. Machine learning models can only get as reliable and accurate as the data being fed to them. With a data-centric approach, your data science
August 31, 2021
Team Snorkel
Developing and Managing Systems to Extract Structured Data
Machine Learning Whiteboard (MLW) Open-source Series Earlier this year, we started our machine learning whiteboard (MLW) series, an open-invite space to brainstorm ideas and discuss the latest papers, techniques, and workflows in the AI space. We emphasize an informal and open environment to everyone interested in learning about machine learning. In this episode, Manan Shah dives into “Glean: Structured Extractions from
August 2, 2021
Team Snorkel
Multi-Resolution Weak Supervision for Sequential Data
Machine Learning Whiteboard (MLW) Open-source Series Our machine learning whiteboard (MLW) is an open-invite space to brainstorm ideas and discuss the latest papers, techniques, and workflows in the AI space. We emphasize an informal and open environment to everyone interested in discovering more about machine learning. In this episode, Hiromu Hota, Vincent Sunn Chen, Daniel Y. Fu, and Frederic Sala dive
June 25, 2021
Team Snorkel
Weak Supervision in Biomedicine
In this episode of Science Talks, Snorkel AI’s Braden Hancock chats with Jason Fries – a research scientist at Stanford University’s Biomedical Informatics Research lab and Snorkel Research, and one of the first contributors to the Snorkel open-source library. We discuss Jason’s path into machine learning, empowering doctors and scientists with weak supervision, and utilizing organizational resources in biomedical applications of Snorkel. This episode is part
June 16, 2021
Team Snorkel
Training Classifiers With Natural Language Explanations
Machine Learning Whiteboard (MLW) Open-source Series Earlier this year, we started our machine learning whiteboard (MLW) series, an open-invite space to brainstorm ideas and discuss the latest papers, techniques, and workflows in the AI space. We emphasize an informal and open environment to everyone interested in learning about machine learning. In this episode, our Co-founder and Head of Technology, Braden Hancock
May 24, 2021
Team Snorkel
Applying Information Theory to ML With Fred Sala
In this episode of Science Talks, Frederic Sala, an assistant professor of Computer Science at the University of Wisconsin-Madison and a research scientist at Snorkel, discusses his path into machine learning, the central thesis that ties together his multidisciplinary research, his thoughts on the future of weak supervision, and his decision to go into academia.
May 19, 2021
Team Snorkel
3 Impractical Assumptions About AI to Avoid
Researchers make impractical ML assumptions every day, and those assumptions limit ML adoption because they do not hold up in the real world. Learn how to avoid making these assumptions about AI application development.
May 4, 2021
Braden Hancock
Measuring NLP Progress With Sebastian Ruder
In this episode of Science Talks, Sebastian Ruder, Research Scientist at DeepMind, shares his thoughts on making AI practical with Snorkel AI’s Braden Hancock. This conversation covers progress made in the NLP domain with emerging research, new benchmarks like SuperGLUE, rich repositories and news sources that keep you in the loop and on top of what’s new in NLP, and more.
March 10, 2021
Team Snorkel
Productionizing ML Research With Thomas Wolf
In this episode of ScienceTalks, Snorkel AI’s Braden Hancock chats with Hugging Face’s Chief Science Officer, Thomas Wolf. Thomas shares his story about how he got into machine learning and discusses important design decisions behind the widely adopted Transformers library, as well as the challenges of bringing research projects into production. ScienceTalks is an interview series from Snorkel AI, highlighting some of the best work and ideas to make AI practical.
February 5, 2021
Team Snorkel