OpenAI recently announced three new families of embedding models for its API: text similarity, text search, and code search, each geared toward a different set of tasks. Models in all three families take text or code as input and return an embedding vector. This makes natural language and code tasks such as clustering, semantic search, and classification easier to carry out.
Embeddings are numerical representations of concepts, expressed as sequences of numbers. They are useful for working with natural language and code because other machine learning models and algorithms, such as clustering and search, can readily consume and compare them.
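To make "comparing" embeddings concrete, here is a minimal sketch of cosine similarity, one common way to measure how close two vectors are. The announcement above does not prescribe a metric, so treat the choice as an assumption; the three-dimensional vectors and their names are invented for illustration and are not real model outputs.

```python
import math

def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)

# Toy vectors made up for this example; real embeddings have far more dimensions.
cat = [0.9, 0.1, 0.0]
kitten = [0.8, 0.2, 0.1]
invoice = [0.0, 0.1, 0.9]

sim_cat_kitten = cosine_similarity(cat, kitten)
sim_cat_invoice = cosine_similarity(cat, invoice)
print(sim_cat_kitten > sim_cat_invoice)
```

With these toy values, the two related concepts score much closer to 1 than the unrelated pair, which is the property that clustering and search algorithms rely on.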
The new endpoint uses neural network models to map text and code to a vector representation, “embedding” them in a high-dimensional space. The models are descendants of GPT-3, and each dimension of the resulting vector captures some aspect of the input.
Text similarity models provide embeddings that capture the semantic similarity of texts, which helps in tasks such as clustering, data visualization, and classification.
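As a rough illustration of how similarity embeddings can feed a classifier, the sketch below assigns a new vector to the label whose average embedding (centroid) it is most similar to. Nearest-centroid classification is one simple option chosen for this example, not a method described in the announcement. The two-dimensional vectors, the labels, and the helper names `centroid` and `classify` are all hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def centroid(vectors):
    """Average a list of equal-length vectors component by component."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

def classify(vec, centroids):
    """Return the label whose centroid is most similar to vec."""
    return max(centroids, key=lambda label: cosine_similarity(vec, centroids[label]))

# Made-up embeddings of labeled example texts (not real model outputs).
labeled = {
    "positive": [[0.9, 0.1], [0.8, 0.3]],
    "negative": [[0.1, 0.9], [0.2, 0.8]],
}
centroids = {label: centroid(vecs) for label, vecs in labeled.items()}

predicted = classify([0.7, 0.2], centroids)
print(predicted)
```

In practice the example vectors would come from embedding labeled texts, and a trained classifier on top of the embeddings may perform better than this centroid rule.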
Text search models provide embeddings that enable large-scale search tasks, such as finding a relevant document in a collection of documents given a text query.
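The search task described here can be sketched as ranking documents by the similarity between the query embedding and each document embedding. The code below assumes the embeddings have already been computed. The vectors, document names, and the `search` helper are invented for illustration, and cosine similarity is an assumed choice of metric.

```python
import math

def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def search(query_vec, doc_vecs, top_k=2):
    """Return the top_k (doc_id, score) pairs, most similar first."""
    scored = [
        (doc_id, cosine_similarity(query_vec, vec))
        for doc_id, vec in doc_vecs.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]

# Made-up document embeddings; a real system would obtain these from a model.
doc_vecs = {
    "pasta_recipe": [0.9, 0.1, 0.1],
    "bread_recipe": [0.7, 0.3, 0.1],
    "tax_guide": [0.1, 0.1, 0.9],
}
query_vec = [0.85, 0.15, 0.05]  # stands in for an embedded text query

results = search(query_vec, doc_vecs)
for doc_id, score in results:
    print(doc_id, round(score, 3))
```

This brute-force scan compares the query against every document, which is fine for small collections. Large collections typically use an approximate nearest-neighbor index instead.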
Code search models provide embeddings of both code and text, so that the most relevant code block for a natural language query can be found in a collection of code blocks.