With the debut of voice-controlled chatbots and assistants, the world entered a new chapter in which tasks could be accomplished by voice alone. Powered by AI-based voice recognition software, users slowly prepared for a keyboard-less reality. The launch of voice-controlled products such as Apple's Siri and the Amazon Echo was met with unexpected market demand, as customers were intrigued by the features and benefits these devices offered. These were followed by automotive voice assistants such as the BMW Intelligent Personal Assistant, Amazon Echo Auto, Apple CarPlay, Google Android Auto, and more. Today, we also rely on voice-based biometrics for user identification. All these developments are now creating a new demand in the voice AI industry: voice-driven coding.
Voice-driven coding will not only make the software development industry more accessible by lowering entry barriers; it will also allow individuals with injuries or chronic illnesses to continue working. Many programmers suffer from repetitive strain injury (RSI), in which repetitive motions damage muscles, tendons, and nerves, and voice-driven coding can be a blessing for them. The concept is simple: artificial intelligence generates code from natural-language descriptions of what users want to accomplish, spoken in plain, easy-to-understand language. Voice coding involves two types of software: a speech recognition engine and a voice coding platform. A good microphone is also required, especially if you wish to reduce background or non-stationary noise.
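The division of labor described above can be sketched in a few lines: the speech recognition engine produces a transcript, and the voice coding platform maps recognized phrases to keystrokes or editor actions. The command table and phrasing below are invented purely for illustration; real platforms like VoiceCode and Talon use far richer grammars.

```python
# Minimal sketch of a voice-coding pipeline. A speech recognition
# engine (not shown) yields a transcript of command phrases; a
# hypothetical command table maps each phrase to text the editor
# should insert. Unrecognized phrases are typed out literally.
COMMANDS = {
    "new function": "def ",
    "open paren": "(",
    "close paren": ")",
    "colon": ":",
}

def transcript_to_keystrokes(transcript: str) -> str:
    """Translate comma-separated command phrases into editor keystrokes."""
    output = []
    for phrase in transcript.split(","):
        phrase = phrase.strip()
        # Exact match against the command table; otherwise treat the
        # phrase as a literal identifier to type.
        output.append(COMMANDS.get(phrase, phrase))
    return "".join(output)

print(transcript_to_keystrokes(
    "new function, greet, open paren, close paren, colon"
))  # -> def greet():
```

Even this toy version shows why exact phrase matching matters: a single misheard word produces a literal string in the code rather than the intended action.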
One speech recognition engine that supports voice coding is Dragon, a powerful engine developed by Nuance, a Massachusetts-based speech-recognition software company. Dragon is available in separate versions for Windows and Mac. Examples of voice coding platforms include VoiceCode, Talon, and Aenea. VoiceCode and Talon run on macOS, while Aenea, a client-server library for using voice macros from Dragon NaturallySpeaking and Dragonfly on remote hosts, runs on Linux.
The difference between popular voice assistants like Siri and voice-driven coding platforms like VoiceCode and Talon is that the latter do not process natural language, so spoken commands must precisely match the directives the system already understands. Further, these voice-driven coding platforms employ continuous command recognition, which eliminates the need for users to pause between instructions, as is required with voice assistants. The majority of VoiceCode instructions are made up of terms that are not English words. Talon and Aenea, however, have dynamic grammars, continually refreshing which words the program can detect based on which applications are open. This means users may issue commands in English without creating ambiguity. Talon can also emulate mouse navigation, moving a pointer across the screen based on eye movements and producing clicks based on lip pops.
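The dynamic-grammar idea can be illustrated with a small sketch: the set of phrases the recognizer will accept is recomputed from whichever application has focus, so the same English word can be a command in one context and ignored in another. The app names and command sets below are hypothetical, not taken from Talon or Aenea.

```python
# Illustrative sketch of a dynamic grammar: the active vocabulary is
# refreshed based on the focused application, so English phrases stay
# unambiguous. All grammars here are invented for illustration.
BASE_GRAMMAR = {"scroll down", "scroll up", "close window"}

APP_GRAMMARS = {
    "editor": {"save file", "go to line", "select word"},
    "browser": {"new tab", "back", "reload"},
}

def active_grammar(focused_app: str) -> set:
    """Return the commands currently recognizable, given the focused app."""
    return BASE_GRAMMAR | APP_GRAMMARS.get(focused_app, set())

def recognize(phrase: str, focused_app: str) -> bool:
    """Exact-match recognition: the phrase must be in the active grammar."""
    return phrase in active_grammar(focused_app)

print(recognize("go to line", "editor"))   # accepted in the editor
print(recognize("go to line", "browser"))  # rejected in the browser
```

Because "go to line" simply is not in the browser's grammar, saying it there cannot trigger an unintended action, which is how dynamic grammars let everyday English double as a command language.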
There is also voice-to-code software such as Serenade, which includes a speech-to-text engine created exclusively for coding, unlike standard conversational speech-to-text APIs (e.g., Google's). Serenade's engine passes the code spoken by the user to its natural-language processing unit, which uses machine-learning models to recognize common programming structures and transform them into syntactically correct code.
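The core transformation, from a spoken description of a programming structure to syntactically correct code, can be sketched with a single rule. To be clear, the phrasing and regular expression below are invented for illustration; Serenade's actual engine relies on machine-learning models, not hand-written rules like this.

```python
import re

# Hedged sketch of the voice-to-code idea: turn a recognized
# natural-language description of a structure into valid Python.
# The utterance format "add function NAME [with parameter PARAM]"
# is a made-up example, not Serenade's real command syntax.
def spoken_to_python(utterance: str) -> str:
    """Convert a spoken function description into a Python definition."""
    m = re.match(r"add function (\w+)(?: with parameter (\w+))?", utterance)
    if not m:
        return ""  # structure not recognized
    name, param = m.group(1), m.group(2) or ""
    return f"def {name}({param}):"

print(spoken_to_python("add function greet with parameter name"))
# -> def greet(name):
```

A learned model generalizes where this rule cannot, handling varied phrasings, nested structures, and multiple target languages, but the input-output contract is the same.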
Serenade is compatible with a number of major IDEs, including Visual Studio Code and IntelliJ IDEA. When you install and activate the Serenade app, it identifies your IDE and integrates with its capabilities, so you can get started quickly by issuing specific commands. Most of these tools also offer visual options that help users resolve issues when voice commands are misinterpreted.
Salesforce is also exploring a voice-driven programming approach named CodeGen. In an exclusive interview with TechCrunch, Silvio Savarese, Executive Vice President and Chief Scientist at Salesforce, revealed that CodeGen is based on a large autoregressive model with 16 billion parameters, trained on a tremendous quantity of data. It divides its use cases, sampling from the model differently depending on whether the user is an expert programmer or a non-coder.
While the study is currently at the proof-of-concept stage, Savarese plans to share his findings at an internal Salesforce developer conference later this month.
The future of voice-driven coding looks promising so far. However, whether such technology enters mainstream use depends on user demand and on users' willingness to transition from keyboard-and-mouse-based coding to voice coding.