Chroma is an open-source vector database that uses vector similarity search to store and retrieve large collections of high-dimensional vectors (embeddings) efficiently.
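Chroma is meant to be embedded in an application: you add it to your code as a package, and its main advantage is simplicity. If you are building an LLM application that needs a vector database to give the model memory and to support semantic text-similarity search, and you do not want to install and operate a standalone vector database, Chroma is a good choice. Note that while the Python package can run Chroma inside your application process, the JavaScript chromadb package is a client that connects to a Chroma server (by default at http://localhost:8000), so a server needs to be running. This tutorial uses JavaScript.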
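To make "vector similarity search" concrete, here is a toy brute-force version: every stored vector is compared with the query and the k closest are returned. It is a sketch for intuition only; the names store, squaredL2 and nearest are made up for this example, and Chroma itself uses an approximate HNSW index rather than scanning every vector. Squared L2 distance is used because it is Chroma's default distance function for collections.

```javascript
// Toy brute-force similarity search over an in-memory list of vectors.
// Illustrative only: Chroma uses an approximate HNSW index instead of a full scan.

// Squared Euclidean (L2) distance: smaller means more similar.
function squaredL2(a, b) {
  if (a.length !== b.length) {
    throw new Error(`dimension mismatch: ${a.length} vs ${b.length}`);
  }
  let sum = 0;
  for (let i = 0; i < a.length; i++) {
    const d = a[i] - b[i];
    sum += d * d;
  }
  return sum;
}

// Return the k stored items closest to `query`, nearest first.
function nearest(items, query, k) {
  return items
    .map((item) => ({ id: item.id, distance: squaredL2(item.vector, query) }))
    .sort((x, y) => x.distance - y.distance)
    .slice(0, k);
}

const store = [
  { id: "a", vector: [0, 0] },
  { id: "b", vector: [1, 1] },
  { id: "c", vector: [5, 5] },
];

console.log(nearest(store, [0.9, 1.2], 2));
```

Here b is returned first and a second, while c is too far away to make the top two.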
1. Package installation
npm install --save chromadb # yarn add chromadb
2. Initialize the Chroma client
const {ChromaClient} = require('chromadb');
// With no arguments, the client connects to a Chroma server at http://localhost:8000
const client = new ChromaClient();
3. Create a collection
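A collection in Chroma is similar to a table in MySQL: it is where vectors are stored, together with their documents and metadata. The following code creates a collection. Because createCollection returns a promise, run it (and the later await calls) inside an async function or an ES module, since top-level await is not available in CommonJS files: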
const {OpenAIEmbeddingFunction} = require('chromadb');
// Computes embeddings through OpenAI's API; replace the placeholder with your own key
const embedder = new OpenAIEmbeddingFunction({openai_api_key: "your_api_key"});
const collection = await client.createCollection({name: "my_collection", embeddingFunction: embedder});
Here an OpenAI text embedding model computes the text vectors, so you need to provide your OpenAI API key. You can also omit the embeddingFunction parameter and let Chroma use its default embedding function, or substitute another embedding model, including an open-source one.
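If neither OpenAI nor the default model fits, you can plug in your own embedding function. The sketch below assumes, as in the 1.x JavaScript client, that Chroma accepts any object with an async generate(texts) method that resolves to one number array per input text. HashingEmbeddingFunction is a hypothetical toy that hashes words into a small fixed-size vector; it captures word overlap only, not meaning, so treat it as a demonstration of the interface rather than a real embedding model.

```javascript
// Hypothetical toy embedding function: hashes words into a fixed-size vector.
// It reflects word overlap only, not meaning; it just shows the assumed interface.
class HashingEmbeddingFunction {
  constructor(dim = 8) {
    this.dim = dim;
  }

  // FNV-1a-style 32-bit hash over UTF-16 code units, returned as an unsigned integer.
  hash(word) {
    let h = 0x811c9dc5;
    for (let i = 0; i < word.length; i++) {
      h ^= word.charCodeAt(i);
      h = Math.imul(h, 0x01000193) >>> 0;
    }
    return h;
  }

  // Assumed interface: async generate(texts) resolves to one vector per text.
  async generate(texts) {
    return texts.map((text) => {
      const vec = new Array(this.dim).fill(0);
      const words = text.toLowerCase().split(/\W+/).filter(Boolean);
      for (const word of words) {
        vec[this.hash(word) % this.dim] += 1;
      }
      // L2-normalize so longer texts do not get larger vectors (all-zero stays zero).
      const norm = Math.sqrt(vec.reduce((sum, x) => sum + x * x, 0)) || 1;
      return vec.map((x) => x / norm);
    });
  }
}

new HashingEmbeddingFunction(8)
  .generate(["This is a document"])
  .then((vectors) => console.log(vectors[0].length)); // 8
```

Under that assumption it would be passed the same way as the OpenAI embedder, for example client.createCollection({name: "hashed_docs", embeddingFunction: new HashingEmbeddingFunction()}), where "hashed_docs" is a placeholder name.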
4. Add data
Once the collection exists, add data to it. Chroma computes an embedding for each document with the collection's embedding function, stores the documents together with their metadata, and indexes the vectors so they can be queried by similarity later.
await collection.add({
ids: ["id1", "id2"],
metadatas: [{"source": "my_source"}, {"source": "my_source"}],
documents: ["This is a document", "This is another document"],
})
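The three arrays passed to add are parallel: ids[i], metadatas[i] and documents[i] together describe one entry, and ids must be unique. When your data starts out as a list of records, a small helper can build those arrays. buildAddPayload is a hypothetical helper written for this tutorial, not part of the chromadb package; the final add call is commented out because it needs a running Chroma server.

```javascript
// Hypothetical helper: turn a list of records into the parallel arrays that
// collection.add() expects (ids[i], documents[i] and metadatas[i] describe one entry).
function buildAddPayload(records) {
  const ids = [];
  const documents = [];
  const metadatas = [];
  const seen = new Set();
  for (const { id, text, metadata } of records) {
    // Reject duplicate ids up front instead of letting the add call fail.
    if (seen.has(id)) {
      throw new Error(`duplicate id: ${id}`);
    }
    seen.add(id);
    ids.push(id);
    documents.push(text);
    metadatas.push(metadata);
  }
  return { ids, documents, metadatas };
}

const payload = buildAddPayload([
  { id: "id1", text: "This is a document", metadata: { source: "my_source" } },
  { id: "id2", text: "This is another document", metadata: { source: "my_source" } },
]);
console.log(payload.ids); // [ 'id1', 'id2' ]
// await collection.add(payload);
```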
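If you have already computed the vectors yourself, pass them in embeddings; Chroma then stores them as given instead of calling the collection's embedding function. All vectors in a collection must have the same dimension, so the 3-dimensional placeholders below only work in a collection created for 3-dimensional data, and ids that already exist in the collection are not overwritten by add: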
await collection.add({
ids: ["id1", "id2"],
embeddings: [[1.2, 2.3, 4.5], [6.7, 8.2, 9.2]],
metadatas: [{"source": "my_source"}, {"source": "my_source"}],
documents: ["This is a document", "This is another document"]
})
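Because Chroma stores pre-computed vectors as given, it is worth checking them before calling add. The hypothetical checkEmbeddings helper below (not a chromadb API) verifies that every vector contains only finite numbers and that all vectors share one dimension, optionally against the dimension your collection already uses.

```javascript
// Hypothetical pre-flight check for pre-computed embeddings: every vector must be
// an array of finite numbers, and all vectors must share one dimensionality.
// expectedDim defaults to the length of the first vector.
function checkEmbeddings(embeddings, expectedDim = embeddings[0]?.length) {
  embeddings.forEach((vec, i) => {
    if (!Array.isArray(vec) || vec.length !== expectedDim) {
      throw new Error(`embedding ${i} has length ${vec?.length}, expected ${expectedDim}`);
    }
    if (!vec.every(Number.isFinite)) {
      throw new Error(`embedding ${i} contains a non-finite value`);
    }
  });
  return expectedDim;
}

console.log(checkEmbeddings([[1.2, 2.3, 4.5], [6.7, 8.2, 9.2]])); // 3
```

Passing an expected dimension makes mismatches explicit: checkEmbeddings([[1.2, 2.3, 4.5]], 1536) throws, which is what you would want before adding 3-dimensional vectors to a collection that already holds 1536-dimensional ones.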
5. Query collection data
Chroma embeds each string in queryTexts with the collection's embedding function and returns the nResults most similar entries for each query:
const results = await collection.query({
nResults: 2,
queryTexts: ["This is a query document"]
})
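The object returned by query groups its fields by query. Assuming the nested shape of the 1.x JavaScript client, results.ids[q][r], results.documents[q][r], results.metadatas[q][r] and results.distances[q][r] describe the r-th hit for the q-th query text, with smaller distances meaning closer matches. The hypothetical toHits helper below regroups these arrays into one list of hit objects per query; the sample object is made up to show the shape and is not real Chroma output.

```javascript
// Hypothetical helper: query results hold one inner array per query text
// (ids[q][r], documents[q][r], ...). Regroup them into one list of hits per query.
// Fields that were not returned become null.
function toHits(results) {
  return results.ids.map((idsForQuery, q) =>
    idsForQuery.map((id, r) => ({
      id,
      document: results.documents?.[q]?.[r] ?? null,
      metadata: results.metadatas?.[q]?.[r] ?? null,
      distance: results.distances?.[q]?.[r] ?? null,
    }))
  );
}

// Made-up object with the nested shape described above (not real output).
const sample = {
  ids: [["id1", "id2"]],
  documents: [["This is a document", "This is another document"]],
  metadatas: [[{ source: "my_source" }, { source: "my_source" }]],
  distances: [[0.12, 0.34]],
};

// Print the hits for the first (and only) query text.
for (const hit of toHits(sample)[0]) {
  console.log(hit.id, hit.distance, hit.document);
}
```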