Embeddings, also known as feature vectors, are the native way AI models represent data of any kind, which makes them well suited to a wide range of AI tools and algorithms. They can represent text, images, audio, and video. There are many ways to generate them, from running open-source text embedding models to calling APIs from cloud services.

Chroma provides lightweight wrappers for popular embedding models, making them easy to use in your applications. When you create a Chroma collection, you can attach an embedding function that automatically computes text vectors whenever text data is added or updated.

Note: You can also skip Chroma's built-in embedding functions entirely and pre-compute text vectors with any embedding model of your choice.
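For illustration, pre-computed vectors are just plain Python lists of floats, one vector per document. A minimal sketch, where the embed helper is a hypothetical stand-in for whatever real model you use:

```python
# Hypothetical stand-in for an external embedding model.
def embed(texts):
    # Toy: a 4-dimensional vector per text based on the average character code.
    return [[sum(map(ord, t)) / max(len(t), 1)] * 4 for t in texts]

docs = ["hello world", "chroma stores vectors"]
embeddings = embed(docs)

# One vector per document; these could then be handed to Chroma directly,
# e.g. collection.add(documents=docs, embeddings=embeddings, ids=[...]).
```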

To use Chroma's embedding functions, import the embedding_functions module from chromadb.utils:

from chromadb.utils import embedding_functions

Default Model: all-MiniLM-L6-v2

By default, Chroma uses the all-MiniLM-L6-v2 model from Sentence Transformers to compute vectors. This model can embed sentences and documents, runs locally on your machine, and downloads its model files automatically the first time it is used.

default_ef = embedding_functions.DefaultEmbeddingFunction()

Sentence Transformers Models

Chroma can also use any Sentence Transformers model to compute vectors.

sentence_transformer_ef = embedding_functions.SentenceTransformerEmbeddingFunction(model_name="all-MiniLM-L6-v2")

The optional model_name parameter selects which Sentence Transformers model to use; it defaults to all-MiniLM-L6-v2. You can browse available model names on Hugging Face.

OpenAI Model

Chroma provides a convenient wrapper for OpenAI's embedding API. Using it requires an OpenAI API key, which you can obtain by registering for an OpenAI account.

This embedding function depends on the openai Python package, which can be installed using pip install openai.

openai_ef = embedding_functions.OpenAIEmbeddingFunction(
                api_key="YOUR_API_KEY",
                model_name="text-embedding-ada-002"
            )

To use the OpenAI embedding model on platforms like Azure, you can use the api_base and api_type parameters:

openai_ef = embedding_functions.OpenAIEmbeddingFunction(
                api_key="YOUR_API_KEY",
                api_base="YOUR_API_BASE_PATH",
                api_type="azure",
                model_name="text-embedding-ada-002"
            )

Custom Embedding Function

You can create your own embedding function to use with Chroma by implementing the __call__ method of the EmbeddingFunction base class: it receives a list of documents and must return one vector per document.

from chromadb.api.types import Documents, EmbeddingFunction, Embeddings

class MyEmbeddingFunction(EmbeddingFunction):
    def __call__(self, texts: Documents) -> Embeddings:
        # Compute one vector per document with the model of your choice,
        # returning a list of lists of floats (one list per document).
        embeddings = compute_embeddings(texts)  # placeholder: replace with your model call
        return embeddings
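As an illustration only, here is a runnable toy that follows the same callable protocol with no external dependencies. The hash-based vectors are deterministic but meaningless for similarity search; they stand in for a real model, and the chromadb base class is omitted so the sketch is self-contained:

```python
import hashlib

class ToyEmbeddingFunction:
    """Maps each document to a deterministic 8-dimensional vector.

    Toy stand-in for a real model: it mimics the __call__ protocol that
    Chroma's EmbeddingFunction expects (documents in, one vector per
    document out), but the vectors carry no semantic meaning.
    """

    def __call__(self, texts):
        embeddings = []
        for text in texts:
            digest = hashlib.sha256(text.encode("utf-8")).digest()
            # Use the first 8 digest bytes as components, scaled to [0, 1).
            embeddings.append([b / 256.0 for b in digest[:8]])
        return embeddings

ef = ToyEmbeddingFunction()
vectors = ef(["hello", "world"])
```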