On Monday, September 26, Hugging Face, an AI startup, and ServiceNow Research, ServiceNow’s R&D arm, introduced BigCode, an ambitious new project that seeks to develop “state-of-the-art” AI systems for code in an open and responsible manner. The main objective is to eventually build a dataset suitable for training a code-generating system, then use it to train a 15-billion-parameter prototype model on ServiceNow’s internal graphics card cluster.
Experts note that DeepMind’s AlphaCode, Amazon’s CodeWhisperer, and OpenAI’s Codex, which powers GitHub’s Copilot service, offer a fascinating preview of what AI is capable of today in the world of computer programming. However, few of these AI systems have been made publicly accessible or open-sourced. OpenAI, for instance, provides access to Codex only through a paid API, while GitHub has started charging for access to Copilot — a sign that companies see code-generating tools chiefly as commercial opportunities rather than public resources.
Anyone with a background in professional AI research and the time to contribute is welcome to participate in BigCode, which was inspired by Hugging Face’s BigScience initiative and its BLOOM project to open-source a highly sophisticated text-generating system. The application form is now active.
BigCode is attempting to resolve some of the concerns that have come up regarding AI-powered code generation, especially around fair use. It aims to do this by jointly developing a code-generating system that will be open-sourced under a license permitting developers to reuse it subject to certain terms and conditions.
The initial objective of BigCode is to create a dataset of code gathered in the most ethically acceptable manner possible. In contrast to other efforts that simply scrape all of GitHub for code, the BigCode developers promise to go to great lengths to ensure that only files from repositories with permissive licenses are included in the training dataset. They say they will build “responsible” AI practices along the way for training and sharing code-generating systems of all kinds, and will seek input from relevant parties before announcing any policy changes.
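The project has not published its filtering pipeline, but the license-based curation described above can be sketched roughly as follows. This is a hypothetical illustration, assuming repository metadata with `name` and `license` fields carrying SPDX-style license identifiers; the allowlist of permissive licenses shown here is an example, not BigCode’s actual policy.

```python
# Hypothetical sketch of permissive-license filtering for a code corpus.
# The metadata schema and the allowlist below are illustrative assumptions,
# not BigCode's actual pipeline or policy.

PERMISSIVE_LICENSES = {"mit", "apache-2.0", "bsd-2-clause", "bsd-3-clause", "isc"}

def filter_permissive(repos):
    """Keep only repositories whose declared SPDX license id is permissive."""
    kept = []
    for repo in repos:
        # Repositories with no declared license are excluded by default.
        license_id = (repo.get("license") or "").lower()
        if license_id in PERMISSIVE_LICENSES:
            kept.append(repo)
    return kept

repos = [
    {"name": "alpha", "license": "MIT"},
    {"name": "beta", "license": "GPL-3.0"},  # copyleft: excluded
    {"name": "gamma", "license": None},      # no license: excluded
]
print([r["name"] for r in filter_permissive(repos)])  # ['alpha']
```

A real pipeline would also need to handle per-file license headers, dual-licensed projects, and vendored third-party code, which is part of why the project treats curation as a research problem in its own right.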
Hugging Face and ServiceNow could not specify a date for when the project would be finished. However, over the coming months they anticipate researching a range of code generation technologies, including auto-completion and code synthesis systems that operate across many domains, tasks, and programming languages. They added that the prototype model trained on the dataset would be smaller than AlphaCode, which has approximately 41.4 billion parameters, but larger than Codex, which has 12 billion.