Harit Vishwakarma

Research Intern, Snorkel AI

Harit Vishwakarma is a Research Intern at Snorkel AI, where he focuses on evaluating and improving the reasoning capabilities of large language models. He recently completed his PhD in Computer Science at the University of Wisconsin–Madison. His research centers on developing methods for reliable inference and applying them to automated data labeling and to improving model performance at test time. Next, he will join the University of Oxford as a postdoctoral researcher.

The latest from Harit

Automating Benchmark Design
The rapid progress and widespread deployment of LLMs and LLM-powered agents have outpaced our ability to evaluate them. Hand-crafted, static benchmarks are the primary tool for assessing model capabilities, but they quickly become saturated. Dynamic benchmarks, in contrast, evolve alongside the models they evaluate, but they are expensive to create and to keep updated. To address these challenges, we develop BeTaL (Benchmark Tuning with an LLM-in-the-loop), a framework that leverages environment design principles to automate dynamic benchmark design. BeTaL works by parameterizing key design choices in base benchmark templates and using LLMs to reason through the resulting parameter space...
Research Paper
Accepted to ICLR 2026

Blog
Introducing SnorkelSpatial

A procedurally generated and programmatically verified benchmark for evaluating spatial reasoning capabilities in LLMs.

Large language models (LLMs) are showing remarkable results on complex reasoning problems across domains, from mathematical proofs and logical puzzles to graduate-level science and engineering questions. Their spatial reasoning capabilities, however, are less well understood, even though such reasoning underlies many everyday tasks. We…

Oct 24, 2025
