Building Vector Index
This guide describes how to build a vector index in Milvus.
A vector index is an organizational unit of metadata used to accelerate vector similarity search. If no index has been built on a vector field, Milvus falls back to a brute-force search over that field.
By default, Milvus does not index segments with fewer than 1,024 rows.
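To make the brute-force fallback concrete, here is a minimal sketch in plain Python. It is not how Milvus implements the search. The function names and the sample vectors are made up for illustration. The point is that every stored vector is compared against the query, which is the cost an index is meant to avoid.

```python
import math


def l2_distance(a, b):
    """Euclidean (L2) distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def brute_force_search(query, vectors, limit):
    """Score the query against every vector and return the row positions
    of the `limit` closest ones, nearest first."""
    scored = [(l2_distance(query, v), i) for i, v in enumerate(vectors)]
    scored.sort()
    return [i for _, i in scored[:limit]]


# Hypothetical sample data, not taken from the guide.
vectors = [[0.0, 0.0], [1.0, 1.0], [0.2, 0.1], [5.0, 5.0]]
nearest = brute_force_search([0.1, 0.2], vectors, limit=2)
```

The work grows linearly with the number of stored vectors, which is why building an index matters for larger collections.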
The following example illustrates building an IVF_FLAT index with 1024 clusters using the Euclidean distance (L2). You can choose the index and metric type that suit your scenario.
Prepare Index Parameters
Prepare the index parameters as follows:
index_params = {
    "metric_type": "L2",
    "index_type": "IVF_FLAT",
    "params": {"nlist": 1024}
}
Parameter | Description | Options |
---|---|---|
metric_type | The type of metric used to measure vector similarity. | For floating-point vectors: L2 (Euclidean distance), IP (inner product), COSINE (cosine similarity). For binary vectors: JACCARD (Jaccard distance), HAMMING (Hamming distance). |
index_type | The type of index used to accelerate vector search. | For floating-point vectors: FLAT, IVF_FLAT, IVF_SQ8, IVF_PQ, GPU_IVF_FLAT*, GPU_IVF_PQ*, HNSW, DISKANN*. For binary vectors: BIN_FLAT, BIN_IVF_FLAT. |
params | Specific construction parameters for the index. | For more information, see In-Memory and On-Disk Indexing. |
- DISKANN has certain prerequisites. For more information, see On-Disk Indexing.
- GPU_IVF_FLAT and GPU_IVF_PQ are only available in Milvus installations with GPU support enabled.
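The table above can be turned into a small client-side check before the parameters are sent to Milvus. The sketch below is only an illustration. `make_index_params` is a hypothetical helper, not part of pymilvus, and it checks nothing beyond the floating-point versus binary pairing that the table implies. The server remains the authority on which combinations and build parameters are valid.

```python
# Options copied from the table above.
FLOAT_METRICS = {"L2", "IP", "COSINE"}
BINARY_METRICS = {"JACCARD", "HAMMING"}
FLOAT_INDEXES = {
    "FLAT", "IVF_FLAT", "IVF_SQ8", "IVF_PQ",
    "GPU_IVF_FLAT", "GPU_IVF_PQ", "HNSW", "DISKANN",
}
BINARY_INDEXES = {"BIN_FLAT", "BIN_IVF_FLAT"}


def make_index_params(index_type, metric_type, **build_params):
    """Hypothetical helper: build an index_params dict and reject
    metric/index pairs that mix floating-point and binary options."""
    if index_type in FLOAT_INDEXES:
        allowed_metrics = FLOAT_METRICS
    elif index_type in BINARY_INDEXES:
        allowed_metrics = BINARY_METRICS
    else:
        raise ValueError(f"unknown index_type: {index_type!r}")
    if metric_type not in allowed_metrics:
        raise ValueError(
            f"metric_type {metric_type!r} does not match index_type {index_type!r}"
        )
    return {
        "metric_type": metric_type,
        "index_type": index_type,
        "params": dict(build_params),
    }


# Reproduces the dictionary used in this guide.
index_params = make_index_params("IVF_FLAT", "L2", nlist=1024)
```

A mismatched pair such as `make_index_params("BIN_FLAT", "L2")` raises `ValueError` locally instead of failing later on the server.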
Build Index
Build the index by specifying the vector field name and the index parameters.
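from pymilvus import Collection, utility

# Get an existing collection.
collection = Collection("book")

# Build the index on the vector field.
collection.create_index(
    field_name="book_intro",
    index_params=index_params
)

# Check how far the index build has progressed.
utility.index_building_progress("book")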
Parameter | Description |
---|---|
field_name | The name of the vector field on which to build the index. |
index_params | The parameters of the index to build. |
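Index building runs in the background, so a script may want to wait until it finishes before searching. The sketch below polls a progress function until all rows are indexed. `wait_for_index` is a hypothetical helper, and it assumes the progress call returns a dictionary with `total_rows` and `indexed_rows` keys. Check the return value of `utility.index_building_progress` in your pymilvus version before relying on that. The progress source is passed in as a callable so the sketch runs without a Milvus server.

```python
import time


def wait_for_index(get_progress, poll_interval=1.0, timeout=60.0):
    """Poll get_progress() until every row is indexed or the timeout expires.

    Assumption: get_progress() returns a dict with 'total_rows' and
    'indexed_rows' keys.
    """
    deadline = time.monotonic() + timeout
    while True:
        progress = get_progress()
        if progress["indexed_rows"] >= progress["total_rows"]:
            return progress
        if time.monotonic() >= deadline:
            raise TimeoutError(f"index build not finished: {progress}")
        time.sleep(poll_interval)


# Stand-in progress source so the sketch runs on its own.
_fake_progress = iter([
    {"total_rows": 100, "indexed_rows": 40},
    {"total_rows": 100, "indexed_rows": 100},
])
final = wait_for_index(lambda: next(_fake_progress), poll_interval=0.01)

# Against a real deployment, the call might look like this:
# wait_for_index(lambda: utility.index_building_progress("book"))
```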
Building Scalar Index
Unlike vectors, scalars have only magnitude and no direction. Milvus considers individual numbers and strings as scalars. Below is the list of available data types for scalar fields in Milvus.
Starting from Milvus v2.1.0, to accelerate attribute filtering in hybrid searches, you can build an index on scalar fields. You can read more about scalar field indexing here.
Building Index
When building an index on a scalar field, you don't need to set any index parameters. By default, the index name of a scalar field is default_idx followed by the name of the indexed field. You can set a different name with index_name if you want.
The following code snippet assumes that a collection named book already exists and an index needs to be created on the string field book_name.
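from pymilvus import Collection

# Get an existing collection.
collection = Collection("book")

# Build a scalar index on the string field; no index parameters are needed.
collection.create_index(
    field_name="book_name",
    index_name="scalar_index",
)

# Load the collection into memory so that it can be searched.
collection.load()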
Once the index is created, you can include a boolean expression for this string field in vector similarity search, as shown below:
search_param = {
    "data": [[0.1, 0.2]],
    "anns_field": "book_intro",
    "param": {"metric_type": "L2", "params": {"nprobe": 10}},
    "limit": 2,
    "expr": "book_name like \"Hello%\"",
}
res = collection.search(**search_param)
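Writing the filter expression by hand means nesting escaped quotes, which is easy to get wrong. The sketch below builds the same prefix-match expression from a field name and a prefix. `prefix_filter` is a hypothetical helper, not a pymilvus function. Because the escaping rules of Milvus expressions are not covered in this guide, it rejects prefixes containing quotes, backslashes, or `%` rather than guessing how to escape them.

```python
def prefix_filter(field_name, prefix):
    """Hypothetical helper: return an expression such as
    book_name like "Hello%" for use as the 'expr' search argument."""
    for forbidden in ('"', "\\", "%"):
        if forbidden in prefix:
            raise ValueError(f"prefix must not contain {forbidden!r}")
    return f'{field_name} like "{prefix}%"'


# Produces the same expression string as the search example above.
expr = prefix_filter("book_name", "Hello")
```

The result can be passed as the `expr` value in `search_param` in place of the hand-written string.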
Deleting Index
from pymilvus import Collection
collection = Collection("book") # Get an existing collection.
collection.drop_index()
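Some Milvus versions refuse to drop an index while the collection is loaded, so a cleanup script may want to release the collection first. The sketch below shows that order of operations. `drop_index_if_present` is a hypothetical helper. It relies only on the `has_index`, `release`, and `drop_index` methods of a collection object and handles the default index only, as in the call above. Whether the release step is required depends on your Milvus version, so treat it as an assumption. A stand-in class is used so the sketch runs without a server.

```python
def drop_index_if_present(collection):
    """Hypothetical helper: release the collection, then drop its index.

    Returns True if an index was dropped and False if there was none.
    """
    if not collection.has_index():
        return False
    collection.release()     # unload first (assumed to be required)
    collection.drop_index()
    return True


class _FakeCollection:
    """Stand-in used only so this sketch runs without a Milvus server."""

    def __init__(self):
        self.calls = []
        self._indexed = True

    def has_index(self):
        self.calls.append("has_index")
        return self._indexed

    def release(self):
        self.calls.append("release")

    def drop_index(self):
        self.calls.append("drop_index")
        self._indexed = False


fake = _FakeCollection()
dropped = drop_index_if_present(fake)

# Against a real deployment, the call might look like this:
# drop_index_if_present(Collection("book"))
```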