SNORKEL DATA SERIES // Terminal-Style Coding Tasks for Agents
Frontier datasets for terminal-based agentic coding
Built for teams training agents in terminal and repository environments, this Snorkel Data Series provides the high-volume, expert-authored task data needed to move from simple code generation to autonomous software engineering.
Two dataset tracks: SWE-Bench-CLI+ and Terminal-Bench+
Designed for leading AI labs, both coding tracks follow a curriculum structure with progressively increasing difficulty and are paired with Dockerized evaluation infrastructure built to match production engineering environments.
SWE-Bench CLI+
Terminal-based, repo-grounded SWE tasks inside real repositories, spanning 7+ languages. Agents must navigate real codebases, manage cross-file dependencies, and execute fixes via the CLI.
Terminal-Bench+
Multi-step terminal tasks with milestones, tools, and larger environments. Optimized for long-horizon planning, tool use, and system-state manipulation under realistic constraints.
This Data Series is intentionally calibrated to stress state-of-the-art coding agents.
Built for frontier model evaluation.
- Tiered difficulty from Core to Frontier
- Calibrated to remain challenging for models that have memorized public software benchmarks
- Designed for SFT/RL training, benchmarking, and deployment validation
If your agent succeeds here, it performs in production.
Why the Snorkel Data Series
High-volume quarterly drops
Multi-layer quality pipeline
Unified execution environment
Direct roadmap influence
Expert-led validation
Every task is built and validated through a multi-layer quality pipeline.
01
Human review
Subject-matter experts (SMEs) verify clarity, correctness, and full solvability.
02
LLM-assisted validation
Automated checks flag instruction-test mismatches and missing constraints.
03
Deterministic testing
Code-based unit tests validate compliance, syntax, formatting, and outcomes.
04
Guardrails
Additional checks catch cheating paths, non-determinism, and reward hacking.
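As a rough illustration of the deterministic-testing and non-determinism checks described above, the sketch below reruns a task's test command several times and flags any run-to-run difference. It is a minimal example under stated assumptions, not Snorkel's actual pipeline: the function names (`run_check`, `is_deterministic`, `validate_task`), the default run count, and the sample commands are all hypothetical.

```python
import subprocess
import sys
from collections import Counter


def run_check(cmd, timeout=60):
    """Run one verification command and return (exit_code, stdout)."""
    proc = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
    return proc.returncode, proc.stdout


def is_deterministic(cmd, runs=3):
    """Return True when every run yields the same exit code and output."""
    outcomes = Counter(run_check(cmd) for _ in range(runs))
    return len(outcomes) == 1


def validate_task(test_cmd, runs=3):
    """Hypothetical guardrail: the tests must pass and be repeatable."""
    exit_code, _ = run_check(test_cmd)
    return {
        "tests_pass": exit_code == 0,
        "deterministic": is_deterministic(test_cmd, runs),
    }


# Illustrative commands only: one stable check and one that varies per run.
stable = [sys.executable, "-c", "print('ok')"]
flaky = [sys.executable, "-c", "import uuid; print(uuid.uuid4())"]

print(validate_task(stable))
print(is_deterministic(flaky))
```

Comparing the full `(exit_code, stdout)` pair is deliberately strict. A real harness would likely normalize timestamps or temporary paths before comparing, and would run each command inside the task's container rather than on the host.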