2026: The year of environments

December 10, 2025
Snorkel Team

Our NeurIPS 2025 retrospective

The Snorkel AI team

We just returned from NeurIPS 2025, and we’re still processing everything we saw. The energy around data-centric AI has never been stronger—and we couldn’t be more grateful to the research community for pushing these ideas forward.

The evolution we’ve witnessed

When we first brought Snorkel AI research to NeurIPS in 2019, data-centric AI barely registered as a topic. By 2025, an entire section of the conference floor was dedicated to it. A shift like that doesn't happen by accident: it's the result of countless researchers recognizing the central role that top-quality data plays in getting the best outcomes from AI.

What stood out this year

A few themes dominated the conversations we had and the talks we attended.

2026 will be the year of environments. Talks such as Aksel Joonas Reedi's presentation on OpenEnv, Mike Merrill's discussion of Terminal-Bench 2.0, and Grégoire Mialon's discussion of ARE made it clear that the community is getting serious about building diverse, scalable environments for evaluation and RL. The insight that environments provide a natural curriculum for scaling complexity seems likely to shape a lot of work in 2026. Noteworthy papers include:

Data still need human expertise. Tools and techniques remain vital, but the trend that stood out was a growing recognition that data quality makes or breaks results, and that working with human experts is still the best way to produce top-quality data. We found some very interesting datasets among the accepted papers this year:

Rubrics are getting more principled. We saw exciting work on more systematic factorization of evaluation criteria, new human-in-the-loop paradigms for data development, and frameworks for continual learning. Liangchen Luo's talk, "How to Develop in the Agentic Era," emphasized building evals before training, which strongly reinforces the idea that well-written rubrics and evaluation criteria are of utmost importance. Two papers of note here:

Our events

Snorkel Social

Snorkel AI cofounder & CEO Alex Ratner, cofounder & Chief Scientist Fred Sala, and the broader Snorkel research team hosted an intimate evening of whiskey, small bites, and research-driven conversation at The Whiskey House San Diego. We’re so grateful for everyone who joined us!

SEA Workshop (sponsorship)

We want to thank the SEA (Scaling Environments for Agents) workshop organizers for an excellent day of highly engaging invited talks and poster sessions that drew a great deal of interest. We were pleased to sponsor this event alongside fellow Diamond sponsor Inclusion AI and Platinum sponsors Vmax and Sonic Jobs.

Award winners

Outstanding papers:

Outstanding posters:

Nikhil Chandak, Shashwat Goel, Ameya Prabhu, Moritz Hardt, Jonas Geiping

Thank you

To everyone who shared their work, challenged our thinking, and stopped by to chat—thank you. The progress in this field happens because researchers are willing to publish their failures alongside their successes, and build on each other’s ideas.

We're heading into 2026 energized by what we saw. If the trends at NeurIPS are any indication, it's going to be a big year for environments, evaluation, and data-centric approaches to AI development. See you at the next one. In the meantime, if you're interested in collaborating with us on building impactful environments or need expert-verified data developed in agent environments, come talk to us!