SNORKEL DATA SERIES // Terminal-Style Coding Tasks for Agents

Frontier datasets for terminal-based agentic coding

Built for teams training agents in terminal and repository environments, this Snorkel Data Series provides the high-volume, expert-authored task data needed to move from simple code generation to autonomous software engineering.


Two dataset tracks: SWE-Bench CLI+ and Terminal-Bench+

Designed for leading AI labs, our coding tracks are structured as a curriculum of progressively increasing difficulty and paired with Dockerized evaluation infrastructure built to match production engineering environments.

SWE-Bench CLI+
Terminal-based, repo-grounded SWE tasks inside real repositories, spanning 7+ languages.

Agents must navigate real codebases, manage cross-file dependencies, and execute fixes via the CLI across multiple languages.

Terminal-Bench+
Multi-step terminal tasks with milestones, tools, and larger environments.

Optimized for long-horizon planning, tool use, and system-state manipulation under realistic constraints.

This Data Series is intentionally calibrated to stress state-of-the-art coding agents.

Built for frontier model evaluation.

Tiered difficulty from Core to Frontier
Calibrated to remain challenging for models that have memorized public software benchmarks
Designed for SFT/RL training, benchmarking, and deployment validation

If your agent succeeds here, it performs in production.