Embedding vectors are the native way for machine-learning systems to represent data, and many algorithms operate directly on these feature vectors. They can represent text, images, audio, and video. There are many ways to compute them, from open-source embedding models to cloud services such as the OpenAI API.
Chroma provides a lightweight wrapper for popular embedding models, making it easy for you to use them in your applications.
Transformers.js
Chroma can generate embeddings locally on your machine using Transformers.js. The TransformersEmbeddingFunction uses the "Xenova/all-MiniLM-L6-v2" model. To use it, first install the Transformers.js library from the command line:

npm install @xenova/transformers
const {ChromaClient, TransformersEmbeddingFunction} = require('chromadb');

const client = new ChromaClient({path: "http://localhost:8000"});
const embedder = new TransformersEmbeddingFunction();

(async () => {
    // Specify the embedding function through the embeddingFunction parameter
    const collection = await client.getOrCreateCollection({name: "name", embeddingFunction: embedder});

    // Add data without supplying vectors; add() calls the embedding function to compute text vectors
    await collection.add({
        ids: ["id1", "id2", "id3"],
        metadatas: [{"chapter": "3", "verse": "16"}, {"chapter": "3", "verse": "5"}, {"chapter": "29", "verse": "11"}],
        documents: ["lorem ipsum...", "doc2", "doc3"],
    });

    // query() also calls the embedding function to compute vectors for the queryTexts, then searches for similar data
    const results = await collection.query({
        nResults: 2,
        queryTexts: ["lorem ipsum"],
    });
})();
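Under the hood, "searching for similar data" means comparing the query vector against the stored vectors with a similarity metric. The sketch below illustrates the idea with plain cosine similarity; it is not Chroma's internal implementation, and the vectors and ids are made up purely for illustration.

```javascript
// Toy illustration of vector similarity search (not Chroma internals).
function cosineSimilarity(a, b) {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Return the ids of the nResults stored vectors most similar to the query vector.
function topK(queryVec, store, nResults) {
  return store
    .map(({id, vec}) => ({id, score: cosineSimilarity(queryVec, vec)}))
    .sort((x, y) => y.score - x.score)
    .slice(0, nResults)
    .map((entry) => entry.id);
}

// Made-up 3-dimensional vectors; real embeddings have hundreds of dimensions.
const store = [
  {id: "id1", vec: [0.9, 0.1, 0.0]},
  {id: "id2", vec: [0.0, 1.0, 0.2]},
  {id: "id3", vec: [0.8, 0.2, 0.1]},
];
console.log(topK([1, 0, 0], store, 2)); // → ["id1", "id3"]
```

Chroma performs this comparison (with an approximate index, for speed) for you; the sketch is only meant to show why both the documents and the query must be embedded by the same function.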
OpenAI
Chroma provides a convenient wrapper for OpenAI's embedding model API. To make calls to the OpenAI API, you'll need an API key, which you can obtain by registering for an OpenAI account.
const {OpenAIEmbeddingFunction} = require('chromadb');

const embedder = new OpenAIEmbeddingFunction({openai_api_key: "apiKey"});

// Use the embedding function directly (generate is async)
const embeddings = await embedder.generate(["document1", "document2"]);

// Or specify it through the embeddingFunction parameter when creating or fetching a collection
const collection = await client.createCollection({name: "name", embeddingFunction: embedder});
// const collection = await client.getCollection({name: "name", embeddingFunction: embedder});
You can optionally pass a model_name parameter to choose the OpenAI embedding model to use. By default, Chroma uses text-embedding-ada-002.
Custom Embedding Functions
If you use another embedding model, you can write a custom embedding function — any class with an async generate method that takes an array of texts and returns an array of vectors — and call the model of your choice inside it.
Note: a custom embedding function is not required; you can precompute vectors with your own model and then read and write Chroma data using those vectors directly.
class MyEmbeddingFunction {
    private api_key: string;

    constructor(api_key: string) {
        this.api_key = api_key;
    }

    public async generate(texts: string[]): Promise<number[][]> {
        // Call your own model here and return one vector per input text
        const embeddings: number[][] = []; // placeholder: fill with your model's output
        return embeddings;
    }
}
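As a concrete, runnable illustration of the shape Chroma expects, the toy class below derives a deterministic fixed-dimension vector from character codes instead of calling a real model. The hashing scheme is made up purely so the example executes; in practice generate would call your model's API with this.api_key.

```javascript
// Toy embedding function: maps each text to a deterministic 8-dimensional
// vector built from character codes. Illustrative only -- not a real model.
class ToyEmbeddingFunction {
  constructor(dim = 8) {
    this.dim = dim;
  }

  // Same async generate(texts) shape that Chroma calls on an embedding function.
  async generate(texts) {
    return texts.map((text) => {
      const vec = new Array(this.dim).fill(0);
      for (let i = 0; i < text.length; i++) {
        // Accumulate scaled character codes into fixed-size buckets
        vec[i % this.dim] += text.charCodeAt(i) / 255;
      }
      return vec;
    });
  }
}

// Usage: one vector of length `dim` per input text.
const toyEmbedder = new ToyEmbeddingFunction();
toyEmbedder.generate(["doc1", "doc2"]).then((vecs) => {
  console.log(vecs.length, vecs[0].length); // → 2 8
});
```

Because the output is deterministic, the same text always maps to the same vector — the property that makes adding and querying with one embedding function consistent.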