OpenAI has announced an update to ChatGPT, its popular generative AI assistant, that takes it beyond text-only interactions. ChatGPT, best known for producing essays, poems, and summaries from text prompts, will now support voice conversations and image-based queries.
The update is a notable step for generative AI, as OpenAI combines voice-assistant features with its large language models (LLMs). Users will be able to talk to ChatGPT to ask questions or make impromptu requests, such as asking it to make up a bedtime story.
The voice feature relies on two components. OpenAI's open-source Whisper speech-recognition system transcribes the user's spoken words into text, and a new text-to-speech model generates human-like audio from ChatGPT's text replies. OpenAI worked with professional voice actors to create five distinct voices.
ChatGPT will also accept images as part of a query. For example, users can upload a photo and ask ChatGPT to explain what it shows or to give instructions related to it.
The new features will roll out to paying Plus and Enterprise subscribers over the next two weeks. To enable voice, users open the app's "settings" menu, select "new features," and opt in to voice conversations. They can then pick their preferred voice by tapping the headphone button in the top-right corner.
Initially, voice will be available only in the ChatGPT apps for Android and iOS as an opt-in beta, while image-based queries will be enabled by default on all platforms. With this expansion, OpenAI aims to make ChatGPT a more versatile and interactive AI assistant.