Inserting Entities

This section describes how to insert data into Milvus from the client side.

You can also use MilvusDM to migrate data to Milvus. MilvusDM is an open-source tool specifically designed for importing and exporting Milvus data.

Milvus 2.1 supports the VARCHAR data type on scalar fields. When building an index for a scalar field of type VARCHAR, the default index type is a trie.

The following example inserts 2,000 rows of randomly generated data as sample data (the Milvus CLI example uses a pre-built remote CSV file containing similar data). Real-world applications often use higher-dimensional vectors than the example. You can prepare your own data to replace the sample data.

Data Preparation

First, prepare the data to be inserted. The type of each field's data must match the collection schema; otherwise, Milvus raises an exception.

Milvus supports default values for scalar fields, except for primary key fields. This means that during data insertion or update, some fields can be left empty. For more information, see Creating a Collection.

After enabling dynamic schema, you can append dynamic fields to the data. For detailed information, see Dynamic Schema.

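import random

# Column-based data: one list per field, 2,000 rows each.
data = [
  [i for i in range(2000)],
  [str(i) for i in range(2000)],
  [i for i in range(10000, 12000)],
  [[random.random() for _ in range(2)] for _ in range(2000)],
  # To fall back on a field's default value, leave the field empty ...
  [],
  # ... or pass None.
  None,
]

# With dynamic schema enabled, values for a dynamic field can be appended.
data.append(["dy" * i for i in range(2000)])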
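Because the data is column-based (one list per field), every non-empty column should hold the same number of rows. The following is a minimal sketch of a check you might run before inserting. count_rows is a hypothetical helper, not part of PyMilvus, and it assumes that an empty list or None marks a field left to its default value.

```python
def count_rows(columns):
    """Return the shared row count of column-based data.

    Columns that are None or empty are skipped, on the assumption
    that they stand for fields left to their default values.
    Raises ValueError if the remaining columns differ in length.
    """
    lengths = {len(col) for col in columns if col}
    if len(lengths) > 1:
        raise ValueError(f"columns have different lengths: {sorted(lengths)}")
    # Data with no populated columns has zero rows.
    return lengths.pop() if lengths else 0


# Example: three rows, with the last field left empty.
sample = [[0, 1, 2], ["0", "1", "2"], [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]], None]
print(count_rows(sample))  # 3
```

Running a check like this on the client gives a clearer error message than waiting for the server to reject a mismatched insert.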

Inserting Data into Milvus

Insert the data into the collection.

By specifying partition_name, you can choose which partition to insert the data into.

from pymilvus import Collection
collection = Collection("book")      # Get an existing collection.
mr = collection.insert(data)

Parameter                   Description
data                        The data to be inserted into Milvus.
partition_name (optional)   The name of the partition where the data will be inserted.
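The insert call shown earlier does not pass partition_name. As a rough sketch, the parameter can be supplied as a keyword argument. insert_into_partition is a hypothetical wrapper and "novel" is a made-up partition name used only for illustration; the partition must already exist in your collection.

```python
def insert_into_partition(collection, data, partition_name):
    """Insert column-based data into a single partition.

    `collection` is expected to be a pymilvus Collection, or any
    object with a compatible insert() method.
    """
    # partition_name is forwarded to insert() as a keyword argument.
    return collection.insert(data, partition_name=partition_name)


# Intended usage against a running Milvus instance (not executed here):
#   from pymilvus import Collection
#   collection = Collection("book")
#   mr = insert_into_partition(collection, data, "novel")  # hypothetical partition
```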

After inserting entities into a previously indexed collection, there is no need to re-index the collection, as Milvus will automatically create indexes for the newly inserted data. For more information, see Can Indexes Be Created After Inserting Vectors?

Flushing Data in Milvus

When data is inserted into Milvus, it is written to segments. A segment must reach a certain size before it is sealed and indexed, and unsealed segments are searched by brute force. To avoid this for the last of your data, call flush() once it has been inserted. The flush() call seals any remaining segments and sends them for indexing. Call this method only at the end of an insertion session; calling it too often produces fragmented data that must be cleaned up later.
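One way to follow this advice is to insert in batches and flush a single time after the last batch. The sketch below assumes column-based data in which every column is populated and has the same length. insert_in_batches is a hypothetical helper, and the batch size of 500 is an arbitrary choice, not a recommendation.

```python
def insert_in_batches(collection, columns, batch_size=500):
    """Insert column-based data in slices, then flush once at the end."""
    num_rows = len(columns[0])
    for start in range(0, num_rows, batch_size):
        # Take the same row range from every column.
        batch = [col[start:start + batch_size] for col in columns]
        collection.insert(batch)
    # A single flush after the last batch seals the remaining segments.
    collection.flush()


# Intended usage against a running Milvus instance (not executed here):
#   from pymilvus import Collection
#   insert_in_batches(Collection("book"), data)
```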

Limitations

Feature            Maximum Limit
Vector dimension   32,768
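A client-side check can catch dimension problems before the data reaches Milvus. This is only an illustrative sketch: check_vector_dims is a hypothetical helper, and the constant simply restates the limit listed above.

```python
MAX_VECTOR_DIM = 32768  # maximum vector dimension listed above


def check_vector_dims(vectors, expected_dim):
    """Raise ValueError if the dimension is over the limit or a row mismatches."""
    if expected_dim > MAX_VECTOR_DIM:
        raise ValueError(f"dimension {expected_dim} exceeds {MAX_VECTOR_DIM}")
    for row, vec in enumerate(vectors):
        if len(vec) != expected_dim:
            raise ValueError(f"row {row} has {len(vec)} values, expected {expected_dim}")


# Two 2-dimensional vectors pass the check silently.
check_vector_dims([[0.1, 0.2], [0.3, 0.4]], expected_dim=2)
```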

Upsert Entity

An upsert is a combination of insert and delete operations. In Milvus, an upsert is a data-level operation: it overwrites an existing entity if the specified primary key value already exists in the collection, and inserts a new entity if that value does not exist.
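The overwrite-or-insert behavior can be pictured with a plain dictionary keyed by primary key. This is a toy model for intuition only, not how Milvus implements upsert; the upsert function and the word_count field name are hypothetical.

```python
def upsert(store, entities, pk_field="book_id"):
    """Toy upsert on a dict keyed by primary key.

    Returns a tuple (inserted, overwritten) of counts.
    """
    inserted = overwritten = 0
    for entity in entities:
        key = entity[pk_field]
        if key in store:
            overwritten += 1  # key exists: the old entity is replaced
        else:
            inserted += 1     # key is new: the entity is added
        store[key] = entity
    return inserted, overwritten


store = {0: {"book_id": 0, "word_count": 10000}}
incoming = [{"book_id": 0, "word_count": 10500}, {"book_id": 1, "word_count": 10001}]
print(upsert(store, incoming))  # (1, 1)
```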

The following example upserts 3,000 rows of randomly generated sample data. Note that an upsert can affect performance because it involves deleting data.

Prepare Data

First, prepare the data to be upserted. The type of each field's data must match the collection schema; otherwise, Milvus raises an exception.

Milvus supports default values for scalar fields, except for primary key fields. This means that during data insertion or update, some fields can be left empty. For more information, see Creating a Collection.

import random
nb = 3000
dim = 8
vectors = [[random.random() for _ in range(dim)] for _ in range(nb)]
data = [
    [i for i in range(nb)],
    [str(i) for i in range(nb)],
    [i for i in range(10000, 10000+nb)],
    vectors,
    [str("dy"*i) for i in range(nb)]
]

Update Data

Upsert the data into the collection.

from pymilvus import Collection
collection = Collection("book")  # Get the existing collection.
mr = collection.upsert(data)

Delete Entity

Milvus supports deleting entities by primary key, selected with a boolean expression.

Prepare Boolean Expression

Prepare a boolean expression to filter the entities to be deleted.

Milvus only supports deleting entities with explicitly specified primary keys, which can be achieved using the "in" operator. Other operators can only be used in scalar filtering for querying or vector searching.

The following example filters data using primary key values of 0 and 1.

expr = "book_id in [0,1]"

The expression is similar to a SQL WHERE clause.
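When the primary keys come from application code, the expression can be assembled from a list. The sketch below handles integer primary keys only; build_pk_in_expr is a hypothetical helper, not a PyMilvus function.

```python
def build_pk_in_expr(pk_field, ids):
    """Build an "in" expression for a list of integer primary keys."""
    if not ids:
        raise ValueError("at least one primary key is required")
    # int() rejects values that are not integer-like.
    id_list = ",".join(str(int(i)) for i in ids)
    return f"{pk_field} in [{id_list}]"


expr = build_pk_in_expr("book_id", [0, 1])
print(expr)  # book_id in [0,1]
```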

Delete Entity

Use the boolean expression you created to delete entities. Milvus will return the list of IDs of the entities that were deleted.

from pymilvus import Collection
collection = Collection("book")      # Get the existing collection.
collection.delete(expr)

Compact Data

Milvus supports automatic data compaction by default. You can configure Milvus to enable or disable compaction and automatic compaction.

If automatic compaction is disabled, you can still compact data manually.

Compact Data Manually

Since compaction usually takes a long time, compaction requests are processed asynchronously.

from pymilvus import Collection
collection = Collection("book")      # Get the existing collection.
collection.compact()

Check Compaction State

You can use the compaction ID returned when triggering manual compaction to check the compaction state.

collection.get_compaction_state()
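Because the request is asynchronous, a script that needs the result may want to wait before reading the state. The sketch below chains the two calls shown above with wait_for_compaction_completed(), a PyMilvus Collection method that this section does not otherwise cover. compact_and_check is a hypothetical wrapper.

```python
def compact_and_check(collection):
    """Trigger a manual compaction, wait for it, and return its state."""
    collection.compact()                        # asynchronous request
    collection.wait_for_compaction_completed()  # block until it finishes
    return collection.get_compaction_state()


# Intended usage against a running Milvus instance (not executed here):
#   from pymilvus import Collection
#   state = compact_and_check(Collection("book"))
```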