On April 26, 2026, OpenAI CEO Sam Altman posted a short note on X that accumulated 1.4 million impressions within 48 hours: “feels like a good time to seriously rethink how operating systems and user interfaces are designed (also the internet; there should be a protocol that is equally usable by people and agents).”
Most people read it as a provocation. It is also a roadmap.
Two days later, OpenAI’s developer account posted a two-minute demo video and a link to a GitHub repo called openai/realtime-voice-component. The demo showed a user playing chess on a webpage using only voice. No clicking. No typing. The user spoke, the app responded, and the game progressed. The tweet pulled 1.3 million views in under 24 hours.
What the Repo Actually Is
The openai/realtime-voice-component repo is an open-source React toolkit that lets developers build voice-controlled applications using gpt-realtime-1.5, OpenAI’s flagship audio model for voice agents. Rather than building a voice assistant that sits on top of an existing UI, the component is designed to let voice control the state of the application directly. The user speaks a goal. The AI reads the current state of the app. The AI completes the action.
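To make the pattern concrete, here is a minimal sketch of what “voice controls the state of the app” looks like in React. It is illustrative only: the function resolveVoiceCommand stands in for the round trip to the realtime model, and none of the names below are taken from the actual openai/realtime-voice-component API.

```tsx
// Hypothetical sketch of the "voice controls app state" pattern.
// resolveVoiceCommand is a stand-in for streaming the user's speech plus the
// serialized app state to a realtime voice model and getting back a
// structured action; these names are not the library's actual exports.
import { useState } from "react";

type TodoState = { items: string[] };
type TodoAction =
  | { type: "add"; text: string }
  | { type: "remove"; index: number };

// Stubbed model call: in a real integration this would send `state` as
// context along with a schema of allowed actions, and the model would
// return one structured action instead of free-form text.
async function resolveVoiceCommand(
  transcript: string,
  state: TodoState
): Promise<TodoAction> {
  return { type: "add", text: transcript };
}

export function VoiceTodoApp() {
  const [state, setState] = useState<TodoState>({ items: [] });

  // The model never clicks anything: it reads the current state, returns an
  // action, and the app applies that action to its own state.
  const onSpeech = async (transcript: string) => {
    const action = await resolveVoiceCommand(transcript, state);
    if (action.type === "add") {
      setState((s) => ({ items: [...s.items, action.text] }));
    } else {
      setState((s) => ({
        items: s.items.filter((_, i) => i !== action.index),
      }));
    }
  };

  return (
    <div>
      <button onClick={() => onSpeech("buy milk")}>Simulate voice input</button>
      <ul>
        {state.items.map((item, i) => (
          <li key={i}>{item}</li>
        ))}
      </ul>
    </div>
  );
}
```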
OpenAI describes it as a reference implementation, not a production-ready product. But reference implementations are how platforms begin. The repo is licensed under Apache-2.0, meaning anyone can fork it, extend it, and ship on top of it. That is the point.
The Interface Layer Is a Business
Every major computing shift of the last 50 years has been, at its core, a fight over the interface layer. The command line gave way to the graphical desktop. The desktop gave way to the browser. The browser gave way to the mobile app store. Each transition reshuffled which companies controlled how humans accessed software and data. The companies that owned the interface layer captured the most value.
The current interface layer, from the app grid on your phone to the browser tab to the operating system underneath it all, was designed for humans who click and tap. It was not designed for AI agents operating on your behalf. An AI navigating a traditional app is working inside an interface built for fingers and eyes, not for machine reasoning. The ceiling on what it can do is set by the constraints of a paradigm it did not create.
Investor Chamath Palihapitiya, responding to the broader conversation this week, framed the shift this way: “The past 50 years of computing was about inventing form factors to interact with information. AI is about interacting with knowledge. It is completely different. Agents and models are there to do the dirty work. We need a new layer, more executive function, less tactical tools.”
What Comes After the Click
Sam Altman’s note pointed at something specific: the internet needs a protocol equally usable by people and agents. That does not exist yet. The web was built for human eyes and human hands. Menus, buttons, forms, navigation flows, all of it assumes a human on one end. An agent trying to navigate that infrastructure is doing so through workarounds.
The OpenAI real-time voice component is one small piece of what a different kind of interface could look like. Voice in, action out. The AI sees the state of the application. The AI completes the task. The user never touches a button.
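What “action out” means in practice is that the app describes its allowed actions to the model in a machine-readable form. The snippet below is an illustration of that idea using the style of OpenAI’s function/tool-calling schema, keyed to the chess demo; the specific tool name and fields are assumptions, not part of the published repo.

```ts
// Illustrative only: a machine-readable description of one app action,
// written in the style of OpenAI's function/tool-calling schema. The broader
// "protocol equally usable by people and agents" does not exist yet; this is
// roughly what one app-level slice of it looks like today.
const chessTools = [
  {
    type: "function",
    name: "move_piece",
    description: "Move a piece on the current board.",
    parameters: {
      type: "object",
      properties: {
        from: { type: "string", description: "Origin square, e.g. 'e2'" },
        to: { type: "string", description: "Destination square, e.g. 'e4'" },
      },
      required: ["from", "to"],
    },
  },
];
```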
Whether this specific toolkit becomes the foundation for something larger is not the point. The point is that the question Sam Altman raised on April 26 is now being answered in code, in public, with an open-source license. Developers can start building the answer today.
The interface layer of computing is not a permanent infrastructure. It is an assumption. That assumption is being questioned at the highest levels of the AI industry, and the tools to replace it are already shipping.

