
Conversational, decision-grade responses in 15 seconds

Impact
15.2-second responses vs. hours

98.6% safety & governance score

+5 pts decision usefulness with GPT-5.4-mini upgrade

The challenge

The client is a global media SaaS company that helps large enterprise clients manage communications, reputation, and strategic decision-making. It analyzes hundreds of millions of sources daily, from public news, social, and broadcast to proprietary analyst-curated databases. Its competitive advantage is the layer on top of publicly available data: in-house human editorial teams, proprietary scoring and analytics frameworks, and years of analyst judgment refined into decision-grade intelligence. When a crisis signal is building or a competitor’s narrative is gaining traction, speed and accuracy matter enormously. Historically, getting an answer meant waiting for a human analyst to manually aggregate across those sources: a process measured in hours, not seconds.

The company’s AI team set out to make that synthesis conversational and instant. The hard part was encoding the institutional expertise that makes its output decision-grade, because that output informs communications and strategic decisions whose stakes can run into tens or hundreds of millions of dollars.

The solution

Snorkel designed and built a multi-agent conversational intelligence system that orchestrates specialized agents across the company’s data sources and returns grounded, decision-ready answers in seconds. Snorkel also built a custom evaluation harness around the client team’s own institutional knowledge: what made an answer useful for decision-makers, what counted as properly grounded, where the process needed to be reliable, and which safety and governance boundaries mattered for their use cases.

With the harness in place, Snorkel could directly measure the impact of upgrading from GPT-4.1-mini to GPT-5.4-mini. The harness showed a 5-point lift in decision usefulness, a 100% pass rate on safety-critical refusal checks, and an improvement from 82.6% to 98.6% on broader governance checks for avoiding internal jargon and keeping system details out of responses. This provided a clear, data-backed case for upgrading to GPT-5.4-mini.
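To make the comparison step concrete, here is a minimal sketch of how per-category pass rates for two models could be tallied and compared. It is not Snorkel's harness: the function names, the category labels, and the toy records are all hypothetical, and the data is illustrative only, not the client's results.

```python
from collections import defaultdict

# Hypothetical check results: (model, category, passed). Toy data only;
# these are NOT the figures reported in this story.
RESULTS = [
    ("baseline", "safety_refusal", True),
    ("baseline", "governance", True),
    ("baseline", "governance", False),
    ("candidate", "safety_refusal", True),
    ("candidate", "governance", True),
    ("candidate", "governance", True),
]


def pass_rates(results):
    """Return {model: {category: fraction of checks passed}}."""
    counts = defaultdict(lambda: defaultdict(lambda: [0, 0]))  # [passed, total]
    for model, category, passed in results:
        tally = counts[model][category]
        tally[0] += int(passed)
        tally[1] += 1
    return {
        model: {cat: p / t for cat, (p, t) in cats.items()}
        for model, cats in counts.items()
    }


def compare(results, baseline, candidate):
    """Return {category: candidate rate minus baseline rate} for shared categories."""
    rates = pass_rates(results)
    shared = rates[baseline].keys() & rates[candidate].keys()
    return {cat: rates[candidate][cat] - rates[baseline][cat] for cat in sorted(shared)}


if __name__ == "__main__":
    for category, delta in compare(RESULTS, "baseline", "candidate").items():
        print(f"{category}: {delta:+.1%}")
```

Keeping the checks grouped by category means a model swap can be judged on each boundary separately, so a gain in one area cannot hide a regression in another.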

The outcome

The agent replaces a process that used to take hours and delivers answers in an average of 15 seconds, with safety scores high enough to clear enterprise launch requirements. As models continue to evolve, the eval-first foundation lets the client test, compare, and swap models without rebuilding the agent or losing the expert judgement that makes it trustworthy.


