Thomson Reuters Bets on 100% Accurate Vertical AI

This week’s conversations highlight a crucial shift from foundational models to specialized applications.

The AI talent wars are over; now it's about who can ship product from the scattered remains of the old guard and the new wave.


The Intake

📊 12 episodes across 10 podcasts

⏱ 799 minutes of intelligence analyzed

🎙 Featuring: Corey Knowles (Host), Grant Harvey (Host), Andrew Dai (Co-founder and CEO, Elorian)


Presented by

Velocity Road

AI That Moves the Needle in 90 Days

Not consulting. Not software. Orchestration. We deploy AI that actually works for middle market companies and their PE investors.

The Big Shift

This week’s conversations consistently highlighted a crucial shift beyond foundational model capabilities: the real battleground for AI value is now in specialized applications, efficient infrastructure, and an acute understanding of organizational friction.

Why it matters: Despite the hype around powerful new models, enterprises are hitting a wall where general-purpose AI falls short, whether because of accuracy demands in regulated fields or the sheer cost of scaling. That is pushing them toward bespoke solutions.

The evidence: Thomson Reuters' Steve Hasker emphasized that "professional-grade" vertical AI requires 100% accuracy in legal and tax, a stark contrast to the 90-97% of general models. Similarly, Rubrik's Devvret Rishi pointed out that traditional security models built for static software fail against creative, improvisational AI agents.

"In the legal profession, where if you use a foundation model that's say 90% correct or even 97% correct, that is not good enough."
— Steve Hasker, CEO at Thomson Reuters on The AI in Business Podcast

Meanwhile, the debate around the Fable 5 shutdown shows growing unease about relying solely on frontier models because of "cost and access predictability concerns" (NLW on The AI Daily Brief: Artificial Intelligence News and Analysis). This has led to a surge of interest in smaller, open models and compound architectures that can deliver similar performance at a fraction of the cost.

The move: Prioritize deep vertical domain knowledge and customized solutions over a generic "AI-first" approach. Invest in infrastructure and integration that can handle the nuanced constraints of your industry, rather than chasing the next big foundational model. The critical need is to reduce AI adoption risk and improve ROI by focusing on specific, high-value applications where human expertise can be augmented, not replaced, and where AI can be held to higher standards of accuracy and predictability.


The Rundown

① Specialized Models Outperform General AI in Visual Reasoning.

Current frontier AI models, despite their scale, still struggle with complex visual reasoning tasks like spatial understanding and object permanence, reasoning about images not much better than a 3-year-old. (Corey Knowles on The Neuron: AI Explained)

Why it matters: For industries relying on visual data (e.g., engineering, satellite analysis, robotics), investing in specialist visual AI models (like Elorian's) is crucial to bridge this gap, as general multimodal models are not yet fit for purpose.

② AI Agents Demand a Revolution in Enterprise Security.

Traditional security models, built on static rules and human approval, are catastrophically inadequate for AI agents that can "plan, improvise, [and] call tools" at machine speed. (Devvret Rishi on The TWIML AI Podcast (formerly This Week in Machine Learning & Artificial Intelligence))

What to watch: Enterprises need to adopt dynamic runtime security with AI-in-the-loop systems, like Rubrik's Sage, that can enforce policies, maintain visibility, and enable 'rewind' capabilities to counter agent-driven threats and inadvertent errors. As Devvret Rishi put it: "What's really supposed to stop it from, like, for example, taking sensitive data from Salesforce and then writing it out in an email to another customer? Like those conventional guardrails that you would have that say, like, I've secured each system individually, doesn't really work in this agent feature."

③ US AI Policy is Unpredictable and Vulnerable to Individual Influence.

The US government's AI regulatory approach, especially under the Trump administration, is unpredictable and susceptible to individual conversations (e.g., Amazon CEO Andy Jassy's influence on the Fable 5 situation). (Hayden Field on Decoder with Nilay Patel)

The context: This creates a volatile political risk environment for tech companies, making consistent strategic planning difficult. Hayden Field noted that "Companies everywhere are saying, oh, wow, we gotta make political risk part of our business plan now... which is not good for the industry."

④ Self-Driving Labs are Accelerating Scientific Discovery.

Radical AI's Joseph Krause is building "self-driving labs" that automate hypothesis generation, synthesis, and characterization. They have produced 1,200 alloys in six months, 300 of them novel and 10 commercially promising. (Joseph Krause on Latent Space: The AI Engineer Podcast)

Why it matters: This closed-loop approach drastically accelerates materials science discovery, offering a model for other scientific disciplines to move beyond human intuition and slow, serial processing for breakthrough innovation.

⑤ Process Supervision is Key to Trustworthy AI in High-Stakes Research.

Elicit is using "process supervision" and domain-specific languages to ensure reliable AI workflows for critical decisions in life sciences, addressing the instability of current AI model probabilities. (Andreas Stuhlmüller & Jungwon Byun on "The Cognitive Revolution" | AI Builders, Researchers, and Live Player Analysis)

What to watch: For regulated industries, building robust evaluation infrastructure and AI observability tools that check not just the outcome but the step-by-step reasoning is essential for adopting AI in critical operations without compromising reliability.
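
To make the difference between checking the outcome and checking the step-by-step reasoning concrete, here is a minimal sketch in Python. It is not Elicit's implementation: the `Step` class, the checker functions, and the dosing numbers are all hypothetical, chosen only to show a workflow whose final answer looks right even though an intermediate step is wrong.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Step:
    description: str
    value: int
    is_valid: Callable[[int], bool]  # hypothetical per-step checker


def outcome_check(steps: List[Step], expected: int) -> bool:
    # Outcome-only evaluation: look at the final answer and nothing else.
    return steps[-1].value == expected


def process_check(steps: List[Step], expected: int) -> bool:
    # Process-style evaluation: every intermediate step must pass its own
    # check, and the final answer must still be right.
    return all(s.is_valid(s.value) for s in steps) and steps[-1].value == expected


# Hypothetical workflow: the first step is wrong (2 g is 2000 mg, not 200),
# yet the final answer happens to be the expected one.
workflow = [
    Step("convert 2 g to mg", 200, lambda v: v == 2000),
    Step("split into 4 equal doses (mg each)", 500, lambda v: v == 500),
]

outcome_ok = outcome_check(workflow, expected=500)
process_ok = process_check(workflow, expected=500)
print(outcome_ok, process_ok)
```

The outcome-only check accepts this workflow while the step-by-step check rejects it, which is the kind of gap that evaluation infrastructure for regulated settings is meant to catch.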


The Signals

🔥 Heating Up

Vertical AI in regulated environments: Demand for highly accurate, domain-specific AI solutions is surging, especially in legal and tax, where general models fall short. (Steve Hasker on The AI in Business Podcast)

Self-Driving Lab (SDL): The concept and implementation of automated scientific discovery platforms are accelerating, particularly in materials science. (Joseph Krause on Latent Space: The AI Engineer Podcast)

AI for science: Significant investment and breakthroughs are being made in using AI to accelerate research, especially in drug discovery and materials. (Jungwon Byun on "The Cognitive Revolution" | AI Builders, Researchers, and Live Player Analysis)

👀 On Watch

🆕 Lip Bu Tan: The Intel CEO is making strategic moves to revitalize the company by focusing on niche AI markets like agentic and physical AI, challenging established narratives. (Lip Bu Tan on No Priors: Artificial Intelligence | Technology | Startups)

🆕 Anjney Midha: The CEO of AMP is challenging the traditional compute scaling narrative, advocating for "output maxing" and responsible infrastructure to maximize AI efficiency. (Anjney Midha on Latent Space: The AI Engineer Podcast)

🆕 Model FLOPs Utilization (MFU): A critical metric for AI efficiency, MFU is gaining attention as a bottleneck in current AI scaling efforts, with new frontier labs reportedly running at surprisingly low utilization rates. (Anjney Midha on Latent Space: The AI Engineer Podcast)

🆕 Siri AI: Apple's integration of a custom Google Gemini model into Siri signals a shift in mobile AI strategy and a reliance on competitors for core capabilities. (Jeremie Harris on Last Week in AI)
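
For readers who want to see what the MFU signal above measures, here is a rough worked example in Python. MFU is the ratio of the FLOP/s a training run actually achieves to the theoretical peak of the hardware it runs on. The sketch assumes the common approximation of about 6 FLOPs per parameter per token for training a dense transformer, and every number in it (model size, throughput, accelerator count, peak rate) is hypothetical rather than taken from the episode.

```python
def model_flops_utilization(
    params: float,
    tokens_per_second: float,
    num_accelerators: int,
    peak_flops_per_accelerator: float,
) -> float:
    """Return achieved model FLOP/s divided by theoretical peak FLOP/s."""
    # Assumption: ~6 FLOPs per parameter per token (forward + backward pass),
    # ignoring attention-specific terms.
    achieved_flops = 6 * params * tokens_per_second
    peak_flops = num_accelerators * peak_flops_per_accelerator
    return achieved_flops / peak_flops


# Hypothetical cluster: 7B-parameter model, 1M tokens/s across 256 chips,
# each rated at 4e14 FLOP/s peak.
mfu = model_flops_utilization(
    params=7e9,
    tokens_per_second=1.0e6,
    num_accelerators=256,
    peak_flops_per_accelerator=4.0e14,
)
print(f"MFU: {mfu:.1%}")
```

In this made-up case the cluster does useful model math about 41% of the time; the rest of the paid-for compute is lost to communication, stalls, and other overhead, which is why a low MFU matters as much as raw GPU count.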

🧊 Cooling Off

Claude Fable 5: Despite impressive benchmarks, the severe guardrails and unpredictable access issues are causing enterprises to seek alternatives. (Jeremie Harris on Last Week in AI)

General-purpose foundation models in regulated fields: Their accuracy levels (90-97%) are proving insufficient for fiduciary professions, driving demand toward specialized vertical AI. (Dan Faggella on The AI in Business Podcast)


The Debate

The pace of recursive self-improvement (RSI) in AI and its implications for future development remains a hot topic.

🐂 The bull case: Daniel Kokotajlo, co-author of "AI 2027," puts roughly 50% odds on AI models being able to do their own AI R&D by late 2028, which he expects to trigger a rapid acceleration of AI capabilities. He believes that "once they've fully automated the AI research process, things will probably go faster and faster." (Hard Fork)

🐻 The bear case: Sayash Kapoor from Princeton, co-author of "AI as Normal Technology," argues that real-world bottlenecks, especially the "rate of hallucinations or the reliability" in subjective domains like law, will slow down diffusion. He emphasizes that "to really get to sort of artificial superintelligence, you need to cover all of these different domains," implying that computational advancements alone won't suffice. (Hard Fork)

Our read: While AI will undoubtedly accelerate R&D, real-world deployment and achieving true "superintelligence" will likely be tempered by the persistent challenges of reliability, domain specificity, and the sheer inertia of existing human systems.


The Bottom Line

The next AI frontier isn't just bigger models, but smarter utilization, specialized applications, and resilient infrastructure that holds up in the real world, not just in the lab.


Episode Guide

1. The TWIML AI Podcast (formerly This Week in Machine Learning & Artificial Intelligence) — "Why AI Agents Break the GenAI Security Model with Devvret Rishi - #770"

Runtime: 56 min | Host: Sam Charrington | Guest: Devvret Rishi (GM of AI, Rubrik)

Worth a listen for: Enterprise security leaders needing to secure their AI deployments and understand the unique threats posed by autonomous AI agents.

This episode unpacks how traditional enterprise security fails in the face of creative, fast-acting AI agents and details Rubrik's 'Sage' solution using small language models for dynamic runtime security and recovery. It highlights the urgent need for external, AI-powered security for agent sprawl.

"Static rules are hard because agents are creative. They don't just follow a fixed path through software. They plan, improvise, call tools, and find workarounds. And human approval is hard because agents can operate much faster than we can."
— Devvret Rishi, GM of AI at Rubrik

2. The Neuron: AI Explained — "Why Frontier AI Still Sees Like a Toddler, w/ Andrew Dai"

Runtime: 43 min | Host: Corey Knowles | Guest: Andrew Dai (Co-founder and CEO, Elorian)

Worth a listen for: Engineers and product managers working with visual AI in fields like robotics, design, or satellite imaging, to understand current limitations and future directions.

Corey and Grant speak with Andrew Dai about the surprising limitations of frontier AI models in visual reasoning, noting their struggles with spatial awareness and causality. The discussion emphasizes the need for specialized models and frequently updated evaluations to drive real progress in fields like engineering design.

"Frontier models still can't reason images much better than a 3 year old. Anything we say that takes longer than a second, these models just can't handle."
— Corey Knowles, Host on The Neuron: AI Explained

3. The AI in Business Podcast — "How Vertical AI Achieves Defensible Accuracy - with Steve Hasker of Thomson Reuters"

Runtime: 27 min | Host: Dan Faggella | Guest: Steve Hasker (CEO, Thomson Reuters)

Worth a listen for: Executives in regulated industries (legal, tax, healthcare) evaluating AI solutions, to grasp the critical difference between general and professional-grade AI.

Steve Hasker explains why "professional-grade" vertical AI requires 100% accuracy in regulated fields, contrasting it with general-purpose models. He details how Thomson Reuters leverages domain experts and historical content to achieve such accuracy for applications like AI-driven litigation support and tax preparation.

"In the legal profession, where if you use a foundation model that's say 90% correct or even 97% correct, that is not good enough."
— Steve Hasker, CEO at Thomson Reuters on The AI in Business Podcast

4. Decoder with Nilay Patel — "Who decides when AI is too dangerous?"

Runtime: 41 min | Host: Nilay Patel | Guest: Hayden Field (Senior AI Reporter, The Verge)

Worth a listen for: Executives and policy makers concerned with AI governance and the unpredictable landscape of US AI regulation.

This episode dissects the US government's export controls on Anthropic's Fable 5, revealing the unpredictable and personal nature of AI regulation under the current administration. Hayden Field discusses the implications for transparency, industry stability, and the unique position of companies advocating for AI regulation while developing powerful models.

"Companies everywhere are saying, oh, wow, we gotta make political risk part of our business plan now... which is not good for the industry."
— Hayden Field, Senior AI Reporter at The Verge on Decoder with Nilay Patel

5. Latent Space: The AI Engineer Podcast — "🔬 The Self-Driving Lab — Joseph Krause, Radical AI"

Runtime: 77 min | Host: swyx + Alessio | Guest: Joseph Krause (CEO, Radical AI)

Worth a listen for: Scientists, R&D leaders, and investors in materials science, chemistry, and other experimental domains looking to accelerate discovery.

Joseph Krause discusses Radical AI's work building "self-driving labs" that automate materials discovery, successfully producing 1200 alloys, many novel, in six months. He highlights the critical role of experimental data, human intuition capture, and interdisciplinary teams in accelerating scientific progress beyond what AI alone can do.

"in materials, the ground truth is the material itself, you have to be able to make it, you have to be able to test it and characterize it."
— Joseph Krause, CEO of Radical AI on Latent Space: The AI Engineer Podcast

6. "The Cognitive Revolution" | AI Builders, Researchers, and Live Player Analysis — "Dean Ball, on Joining OpenAI: New Power Centers, Frontier AI Policy, & Main Character Energy"

Runtime: 159 min | Host: Nathan | Guest: Dean Ball (Author of Hyperdimensional, Senior Fellow at the Foundation for American Innovation (soon to be leading Strategic Futures at OpenAI))

Worth a listen for: Anyone interested in the intersection of AI, policy, and geopolitics, especially those tracking power shifts in the AI landscape.

Dean Ball, soon to lead Strategic Futures at OpenAI, discusses the evolving landscape of frontier AI policy, critiquing the US government's unpredictable actions (like the 'Fable ban'). He emphasizes the emergence of Frontier AI Labs as new centers of political and economic power and the need for public transparency in policy development, even when the government opts for secrecy.

"The biggest concern you hear from people abroad, especially in Europe, is I just worry that you Americans are going to turn off the models at some point if you get mad at us. And when I was in government we were trying to assuage this concern."
— Dean Ball, Author of Hyperdimensional, Senior Fellow at the Foundation for American Innovation (soon to be leading Strategic Futures at OpenAI) on "The Cognitive Revolution" | AI Builders, Researchers, and Live Player Analysis

7. "The Cognitive Revolution" | AI Builders, Researchers, and Live Player Analysis — "Radically Better Reasoning: Elicit's Andreas Stuhlmüller & Jungwon Byun on World Models for Research"

Runtime: 106 min | Host: Erik Torenberg | Guest: Andreas Stuhlmüller (Co-founder, Elicit)

Worth a listen for: Researchers, R&D managers, and software engineers engaged in high-stakes, data-intensive tasks, particularly in life sciences.

Andreas Stuhlmüller and Jungwon Byun of Elicit discuss their AI platform for scientific research, focusing on "process supervision" for reliable AI workflows in life sciences. They highlight the need for robust evaluation infrastructure, "world models" for improved reasoning, and the unique challenges of building AI that performs tasks as specified, especially in regulated environments.

"Elicit was founded on the belief that process supervision, where models are evaluated and rewarded for the quality of their step by step reasoning, or rather than just their final answer, would improve the consistency, reliability and legibility of AI workflows."
— Erik Torenberg, Host of "The Cognitive Revolution" | AI Builders, Researchers, and Live Player Analysis

8. Hard Fork — "‘Hard Fork’ Live, Part 3: Differing Visions of an A.I. Future"

Runtime: 56 min | Host: The New York Times | Guest: Daniel Kokotajlo (Executive Director and Co-author of AI 2027, AI Futures Project)

Worth a listen for: Anyone interested in the future trajectory of AI, from its potential for superintelligence to its real-world integration challenges.

Daniel Kokotajlo and Sayash Kapoor debate the pace of AI advancement, with Kokotajlo predicting rapid recursive self-improvement and Kapoor pointing to real-world bottlenecks like AI hallucinations. The episode also touches on the practical challenges of humanoid robots and the critical issue of data privacy.

"What's your best estimate for when we will achieve AI models that can do their own AI R&D? Probably 50% by late 2028."
— Daniel Kokotajlo, Executive Director of AI Futures Project on Hard Fork

9. Last Week in AI — "#248 - Fable 5, Siri AI, IPOs, Policy on the AI Exponential"

Runtime: 101 min | Host: Andrey Kurenkov | Guest: Jeremie Harris (Host, Gladstone AI)

Worth a listen for: Tech investors, product managers, and strategists tracking competitive dynamics in the AI services and hardware markets.

Andrey Kurenkov and Jeremie Harris analyze Anthropic's Claude Fable 5, Apple's Siri AI, and SpaceX's venture into orbital AI data centers. The episode covers critical discussions around AI regulation, the strategic implications of Apple licensing Google Gemini, and the intense competition driving aggressive pricing and innovative delivery models in AI.

"The general consensus is this, like, the real deal, like Fable 5 is a big leap. Everyone I've seen discuss it in terms of their firsthand experience says that this model is now able to be handling really complex stuff and sort of trust it to deliver on it in a way that was not the case with prior models."
— Andrey Kurenkov, Host at Astrocade on Last Week in AI

10. No Priors: Artificial Intelligence | Technology | Startups — "Re-engineering the Semiconductor Supply Chain with Intel CEO Lip Bu Tan"

Runtime: 45 min | Host: Elad Gil | Guest: Lip Bu Tan (CEO, Intel; former CEO of Cadence and legendary investor from Walden)

Worth a listen for: Semiconductor industry leaders, hardware investors, and executives interested in Intel's turnaround strategy and the future of AI infrastructure.

Intel CEO Lip Bu Tan discusses his mission to "save Intel" by focusing on cultural change, faster decision-making, and strategic investments. He outlines Intel's transformation to an AI-enabled company, targeting niche areas like agentic and physical AI, challenging the notion that GPUs dominate all AI workloads.

"Jensen Huang, my old time friend, he also put 5 billion in investing and support me. His 5 billion become 25 billion now or more."
— Lip Bu Tan, CEO of Intel on No Priors: Artificial Intelligence | Technology | Startups

11. The AI Daily Brief: Artificial Intelligence News and Analysis — "The Models Trying to Fill the Fable Gap"

Runtime: 29 min | Host: Nathaniel Whittemore | Guest: Shadow Matthew

Worth a listen for: Enterprise AI implementers and strategists seeking cost-effective and predictable alternatives to high-cost frontier models.

Nathaniel Whittemore discusses the fallout from the Fable 5 shutdown, highlighting enterprises' move to alternative AI models due to cost and predictability concerns. The episode reveals Microsoft's exploration of Chinese open models and advanced AI architectures like OpenRouter's Fusion API for optimizing inference costs with agentic workloads.

"You cannot export control your way out of an open source race. The ban didn't slow China down. Indeed, a lot of the early coverage has been around GLM 5.2 beating models like GPT5.5 on a variety of highly valuable tasks, including Long horizon code tasks for a fraction of the cost."
— NLW on The AI Daily Brief: Artificial Intelligence News and Analysis

12. Latent Space: The AI Engineer Podcast — "The Professor of Outputmaxxing — Anjney Midha, AMP"

Runtime: 59 min | Host: swyx | Guest: Anjney Midha (CEO, Founder, AMP)

Worth a listen for: Infrastructure and operations leaders, or anyone grappling with the realities of AI scale, efficiency, and resource management.

Anjney Midha, CEO of AMP, addresses the critical issue of compute utilization in AI, advocating for "output maxing" and "responsible infrastructure" over simply acquiring more GPUs. He discusses the surprising inefficiencies in current AI scaling, the growing local community backlash against data centers, and the strategic importance of culture in AI labs.

"If anything, AI scaling should be putting a premium on the value of common sense and infrastructure because the margin of error now is so much lower and the costs of wastage are so much higher."
— Anjney Midha, CEO, Founder of AMP on Latent Space: The AI Engineer Podcast

PARTNER

Not sure where AI fits in your operations? Start with the data.

Velocity Road's AI Readiness Assessment maps your organization against 7 operational dimensions and shows exactly where AI creates ROI — in under 10 minutes.

Avi Savar
