
Mira Murati’s Thinking Machines Says Every AI Lab Built Interaction Wrong

Thinking Machines Lab, the AI startup founded by former OpenAI CTO Mira Murati, released a research preview of what it calls Interaction Models on May 11, 2026. The claim at the center of the release is direct: every major AI lab has built its interaction layer as an afterthought, and the resulting latency and limitations are not a tuning problem but an architectural one.

Thinking Machines bills its interaction model as the first commercial-class multimodal model designed from scratch for real-time collaboration. It does not wait for a user to finish speaking before processing input. It listens, watches, and responds simultaneously across audio, video, and text, working in 200-millisecond micro-turns.

What the Thinking Machines Interaction Model Actually Does

The core architecture is called a multi-stream, micro-turn design. Instead of flattening all inputs and outputs into a single ordered token sequence — the approach every major lab uses today — the model runs continuous parallel streams for audio, video, and text, all grounded in wall-clock time. The model therefore keeps perceiving what the user is doing even while it is generating a response.
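Thinking Machines has not published implementation details, so the following Python sketch is illustrative only: the per-stream queues, the `drain` and `model_step` helpers, and the fixed 200 ms clock are assumptions extrapolated from the description above, not the company's code. What it shows is the shape of a micro-turn loop, where input is polled on a fixed cadence rather than gated behind a completed user turn.

```python
import asyncio
import time

MICRO_TURN_SECONDS = 0.2  # one 200 ms micro-turn, per the research preview

def drain(q: asyncio.Queue) -> list:
    """Non-blocking read: return everything queued on a stream right now."""
    items = []
    while not q.empty():
        items.append(q.get_nowait())
    return items

def model_step(frames: dict) -> str | None:
    """Hypothetical stand-in for per-turn inference over the fused streams."""
    if frames["text"]:
        return f"ack: {frames['text'][-1]}"
    return None  # staying silent is a valid micro-turn output

async def micro_turn_loop(audio_q, video_q, text_q) -> None:
    """Poll all three input streams on a fixed clock, so perception never
    pauses while the model decides whether (and what) to say."""
    while True:
        turn_start = time.monotonic()
        frames = {
            "audio": drain(audio_q),
            "video": drain(video_q),
            "text": drain(text_q),
        }
        action = model_step(frames)
        if action is not None:
            print(action)
        # Sleep only for the remainder of the 200 ms window.
        await asyncio.sleep(max(0.0, MICRO_TURN_SECONDS - (time.monotonic() - turn_start)))

async def main() -> None:
    audio_q, video_q, text_q = asyncio.Queue(), asyncio.Queue(), asyncio.Queue()
    task = asyncio.create_task(micro_turn_loop(audio_q, video_q, text_q))
    await text_q.put("user is still talking...")
    await asyncio.sleep(1.0)  # several micro-turns elapse
    task.cancel()
    try:
        await task
    except asyncio.CancelledError:
        pass

asyncio.run(main())
```

The design point is the fixed clock: emitting output never blocks the drain calls, so input keeps accumulating and is seen at the next 200 ms tick instead of after the full response.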

The system ships as two components working together. The interaction model handles real-time dialog: it manages turn-taking, detects whether a speaker is thinking or yielding, issues visual and verbal interjections without waiting for a prompt, and maintains time-awareness. A separate background model handles tasks that require deeper reasoning or tool calls, passing results back to the interaction model so they can be woven into live conversation without a noticeable pause.
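The paper describes this split only at a high level. A toy version of the handoff might look like the asyncio sketch below, where the `needs_deep_reasoning` routing rule, the two-second delay, and the queue-based result passing are all invented for illustration; the real criterion for escalating to the background model is presumably the model's own judgment.

```python
import asyncio

async def background_model(task: str, results_q: asyncio.Queue) -> None:
    """Hypothetical slow path: deep reasoning or tool calls off the hot loop."""
    await asyncio.sleep(2.0)  # stand-in for multi-second reasoning or a tool call
    await results_q.put(f"[deep result for {task!r}]")

def needs_deep_reasoning(utterance: str) -> bool:
    """Toy routing rule, invented for this sketch."""
    return "analyze" in utterance

async def interaction_model(user_q: asyncio.Queue, results_q: asyncio.Queue) -> None:
    """Fast path: keep the dialog moving without ever blocking on the slow
    model, and weave background results into the conversation as they land."""
    while True:
        if not user_q.empty():
            utterance = user_q.get_nowait()
            if needs_deep_reasoning(utterance):
                asyncio.create_task(background_model(utterance, results_q))
                print(f"model: on it, keep talking while I dig into {utterance!r}")
            else:
                print(f"model: quick reply to {utterance!r}")
        if not results_q.empty():
            print(f"model: circling back: {results_q.get_nowait()}")
        await asyncio.sleep(0.2)  # one micro-turn

async def main() -> None:
    user_q, results_q = asyncio.Queue(), asyncio.Queue()
    task = asyncio.create_task(interaction_model(user_q, results_q))
    await user_q.put("analyze last quarter's churn")
    await asyncio.sleep(3.0)  # dialog keeps flowing; the result arrives mid-stream
    task.cancel()
    try:
        await task
    except asyncio.CancelledError:
        pass

asyncio.run(main())
```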

The demos published alongside the research preview illustrate what this looks like in practice. In one, the model tracks posture in real time, interrupting the user the moment they start to slouch rather than waiting to be asked. In another, it simultaneously builds a data visualization and discusses the underlying business context while the user is still speaking. In a third, it provides live translations and fact-corrections whispered in the user’s ear during a conversation, without interrupting the flow.

The Benchmark Numbers

The model released in this preview is TML-Interaction-Small: a 276-billion-parameter Mixture-of-Experts architecture with 12 billion active parameters. On FD-bench, a benchmark designed specifically to measure interaction quality rather than raw intelligence, TML-Interaction-Small achieved a turn-taking latency of 0.40 seconds. GPT-realtime-2.0 clocked in at 1.18 seconds. Gemini-3.1-flash-live hit 0.57 seconds.

On the interaction quality score within FD-bench V1.5, TML-Interaction-Small scored 77.8. GPT-realtime-2.0 minimal scored 46.8. That is a 31-point gap on the metric that measures what makes real-time AI actually usable.

The Thinking Machines interaction model also outperformed competing models on visual proactivity benchmarks, including RepCount-A and ProactiveVideoQA, where other frontier models either stayed silent or produced incorrect answers when presented with streaming video input.

One technical criticism is worth noting: an independent analysis by engineer Sean Goedecke pointed out that the background-model architecture, while clever, leaves an open question about self-correction: what happens when the slower background model contradicts something the faster interaction model has already said. That is a fair concern, and one Thinking Machines implicitly acknowledged by labeling this a research preview rather than a production release.


The Argument Against the Entire Industry

The research paper published alongside the release is unusually pointed. It cites a recent Anthropic model card directly, quoting Anthropic's own finding that its model underperforms when used in a synchronous, hands-on-keyboard pattern and that users perceive it as too slow in that mode. Thinking Machines uses this to argue that the industry has converged on autonomous, long-running agent workflows not because that is what users need, but because that is what the current architecture supports.

The argument is that every major lab has optimized for the wrong objective. Autonomy is valuable, but most real work requires a human to stay in the loop, clarifying intent and redirecting as understanding develops. The current turn-based architecture structurally prevents that kind of collaboration, because the model's perception freezes while it is generating a response.
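The pattern being criticized is easy to caricature. In the classic turn-based loop, sketched below with stand-in functions (the blocking read and the one-second generation delay are placeholders, not any lab's actual code), input handling and generation strictly alternate, so the model is effectively deaf and blind for the entire generation window:

```python
import time

def wait_for_user_to_finish() -> str:
    """Blocking read: the model perceives nothing until the user is done."""
    return input("user> ")

def generate_full_response(prompt: str) -> str:
    """Stand-in for generation; while this runs, no new input is perceived."""
    time.sleep(1.0)  # the frozen-perception window the paper criticizes
    return f"full response to {prompt!r}"

# Perceive, then generate, strictly alternating: the architecture the
# Thinking Machines paper argues cannot support live collaboration.
while True:
    prompt = wait_for_user_to_finish()
    print(generate_full_response(prompt))
```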

For AI labs that have spent the last two years racing to build better autonomous agents, this is a structural critique, not a product complaint.

What Comes Next

The research preview is available to a limited group of researchers. A wider release is planned for later in 2026, with larger models to follow once Thinking Machines solves the latency constraints that currently make bigger architectures too slow to serve in real-time settings.

Thinking Machines raised $2 billion in seed funding led by a16z, with participation from Nvidia, AMD, Accel, ServiceNow, and others. Its first product, Tinker, focused on model fine-tuning for developers. The interaction model is its first major model release and represents the clearest statement yet of what the company believes AI collaboration should actually look like.

Whether the benchmarks hold under enterprise conditions remains to be seen. But the architecture argument — that real-time collaboration cannot be bolted on after the fact — is one the industry has not yet answered.

