Inserting Entities
This section describes how to insert data into Milvus from the client side.
You can also use MilvusDM to migrate data to Milvus. MilvusDM is an open-source tool specifically designed for importing and exporting Milvus data.
Milvus 2.1 supports the VARCHAR data type on scalar fields. When you build an index on a scalar field of type VARCHAR, the default index type is a trie.
The following example inserts 2,000 rows of randomly generated data as sample data (the Milvus CLI example uses a pre-built remote CSV file containing similar data). Real-world applications often use higher-dimensional vectors than the example. You can prepare your own data to replace the sample data.
Data Preparation
First, prepare the data to be inserted. The type of each inserted value must match the collection schema; otherwise, Milvus raises an exception.
Milvus supports default values for scalar fields, except for primary key fields. This means that during data insertion or update, some fields can be left empty. For more information, see Creating a Collection.
After enabling dynamic schema, you can append dynamic fields to the data. For detailed information, see Dynamic Schema.
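import random
# Column-based data: one list per field, in schema order.
data = [
    [i for i in range(2000)],
    [str(i) for i in range(2000)],
    [i for i in range(10000, 12000)],
    [[random.random() for _ in range(2)] for _ in range(2000)],  # 2-dimensional vectors
    [],    # placeholder for a field left empty so that its default value applies
    None,  # alternative placeholder for a field left empty
]
# With dynamic schema enabled, values for a field not defined in the schema can be appended.
data.append(["dy" * i for i in range(2000)])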
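Because a mismatch between the data and the schema only surfaces as an exception at insert time, it can help to check the shape of the data locally first. The following is a minimal sketch, not part of the Milvus API: `check_columns` is a hypothetical helper, it assumes fully populated columns (no default-value placeholders or dynamic fields), and the role assigned to each column in the comments is an assumption for illustration.

```python
import random

MAX_DIM = 32768  # vector dimension limit listed under Limitations


def check_columns(columns, vector_index, dim):
    """Return the row count if every column has the same length and every
    vector has `dim` components; raise ValueError otherwise."""
    if not 0 < dim <= MAX_DIM:
        raise ValueError(f"dim must be in 1..{MAX_DIM}, got {dim}")
    lengths = {len(col) for col in columns}
    if len(lengths) != 1:
        raise ValueError(f"columns have different lengths: {sorted(lengths)}")
    for row, vec in enumerate(columns[vector_index]):
        if len(vec) != dim:
            raise ValueError(f"row {row}: expected {dim} components, got {len(vec)}")
    return lengths.pop()


columns = [
    [i for i in range(2000)],                                    # assumed primary keys
    [str(i) for i in range(2000)],                               # assumed VARCHAR field
    [i for i in range(10000, 12000)],                            # assumed integer field
    [[random.random() for _ in range(2)] for _ in range(2000)],  # 2-dimensional vectors
]
n_rows = check_columns(columns, vector_index=3, dim=2)
```

This check covers only row counts and vector dimensions; Milvus still validates the actual field types against the schema when the data is inserted.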
Inserting Data into Milvus
Insert the data into the collection.
By specifying partition_name, you can choose which partition to insert the data into.
from pymilvus import Collection
collection = Collection("book") # Get an existing collection.
mr = collection.insert(data)
Parameter | Description |
---|---|
data | The data to be inserted into Milvus. |
partition_name (optional) | The name of the partition where the data will be inserted. |
After inserting entities into a previously indexed collection, there is no need to re-index the collection, as Milvus will automatically create indexes for the newly inserted data. For more information, see Can Indexes Be Created After Inserting Vectors?
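For larger data sets, one possible pattern is to slice the column-based data into batches and pass the same partition_name on every call. The sketch below is an illustration under stated assumptions: `insert_in_batches` and `RecordingCollection` are hypothetical names, the partition name "novel" is made up, and the stand-in class exists only so the example runs without a Milvus server.

```python
def insert_in_batches(collection, columns, batch_size, partition_name=None):
    """Insert column-based data in slices of at most `batch_size` rows.

    `collection` is any object with an insert(data, partition_name=...) method,
    such as a pymilvus Collection. Returns the number of insert calls made.
    """
    n_rows = len(columns[0])
    calls = 0
    for start in range(0, n_rows, batch_size):
        # Take the same row range from every column so rows stay aligned.
        batch = [col[start:start + batch_size] for col in columns]
        collection.insert(batch, partition_name=partition_name)
        calls += 1
    return calls


class RecordingCollection:
    """Hypothetical stand-in that records calls instead of contacting a server."""

    def __init__(self):
        self.calls = []

    def insert(self, data, partition_name=None):
        self.calls.append((len(data[0]), partition_name))


fake = RecordingCollection()
columns = [list(range(2000)), [str(i) for i in range(2000)]]
n_calls = insert_in_batches(fake, columns, batch_size=500, partition_name="novel")
```

With a real server, the same function could be called with `Collection("book")` in place of the stand-in, provided the named partition already exists.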
Flushing Data in Milvus
When data is inserted into Milvus, it is written into segments. A segment must reach a certain size before it is sealed and indexed; unsealed segments are searched by brute force. To avoid brute-force search over the remaining data, call flush() after the last insert. The flush() call seals any remaining segments and sends them to be indexed. Call this method only at the end of an insertion session: calling it too frequently leads to fragmented data that has to be cleaned up later.
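The "insert many times, flush once" pattern can be sketched as follows. This is an illustration rather than a prescribed implementation: `insert_session` and `RecordingCollection` are hypothetical names, and the stand-in only logs the order of calls so the example runs without a server.

```python
def insert_session(collection, batches):
    """Insert every batch, then flush exactly once at the end of the session."""
    inserted = 0
    for batch in batches:
        collection.insert(batch)
        inserted += len(batch[0])  # rows in this batch (length of the first column)
    collection.flush()  # seal the remaining segments once, after all inserts
    return inserted


class RecordingCollection:
    """Hypothetical stand-in that logs the order of calls."""

    def __init__(self):
        self.log = []

    def insert(self, data):
        self.log.append("insert")

    def flush(self):
        self.log.append("flush")


fake = RecordingCollection()
batches = [[list(range(start, start + 100))] for start in (0, 100, 200)]
total = insert_session(fake, batches)
```

Keeping flush() outside the loop is the point of the sketch: the log ends with a single flush entry no matter how many batches were inserted.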
Limitations
Feature | Maximum Limit |
---|---|
Vector dimension | 32,768 |
Upsert Entity
An upsert combines an insertion and a deletion. In Milvus, an upsert is a data-level operation: it overwrites the existing entity if an entity with the specified primary key value already exists in the collection, and inserts a new entity if that value does not exist.
The following example upserts 3,000 rows of randomly generated sample data. Note that an upsert may affect performance because it involves deleting data.
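The overwrite-or-insert behavior can be modeled in a few lines of plain Python. This is a conceptual sketch only and says nothing about how Milvus implements upsert internally: the dictionary stands in for a collection keyed by primary key, and the field name `word_count` is a hypothetical example.

```python
def upsert(store, rows, pk="book_id"):
    """Model upsert on a dict keyed by primary key: delete any existing
    entity with the same key, then insert the new one."""
    replaced = 0
    for row in rows:
        key = row[pk]
        if key in store:
            del store[key]  # deletion half of the upsert
            replaced += 1
        store[key] = row    # insertion half of the upsert
    return replaced


store = {0: {"book_id": 0, "word_count": 10000}}
replaced = upsert(store, [
    {"book_id": 0, "word_count": 10500},  # existing key: overwritten
    {"book_id": 1, "word_count": 10001},  # new key: inserted
])
```

After the call, the store holds two entities, and the one with key 0 carries the new value.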
Prepare Data
First, prepare the data to be upserted. The type of each value must match the collection schema; otherwise, Milvus raises an exception.
Milvus supports default values for scalar fields, except for primary key fields. This means that during data insertion or update, some fields can be left empty. For more information, see Creating a Collection.
import random
nb = 3000
dim = 8
vectors = [[random.random() for _ in range(dim)] for _ in range(nb)]
data = [
[i for i in range(nb)],
[str(i) for i in range(nb)],
[i for i in range(10000, 10000+nb)],
vectors,
[str("dy"*i) for i in range(nb)]
]
Update Data
Upsert the data into the collection.
from pymilvus import Collection
collection = Collection("book") # Get the existing collection.
mr = collection.upsert(data)
Delete Entity
Milvus supports deleting entities by primary key, selected with a boolean expression.
Prepare Boolean Expression
Prepare a boolean expression to filter the entities to be deleted.
Milvus only supports deleting entities with explicitly specified primary keys, which can be achieved using the "in" operator. Other operators can only be used in scalar filtering for querying or vector searching.
The following example filters data using primary key values of 0 and 1.
expr = "book_id in [0,1]"
The expression is similar to a SQL WHERE clause.
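When the primary keys come from application code rather than a literal, the expression can be built programmatically. The helper below is a hypothetical sketch that assumes integer primary keys; it only formats a string and does not call Milvus.

```python
def pk_in_expr(field, keys):
    """Build a '<field> in [k1, k2, ...]' expression from integer primary keys."""
    keys = list(keys)
    if not keys:
        raise ValueError("at least one primary key is required")
    if not all(isinstance(k, int) and not isinstance(k, bool) for k in keys):
        raise TypeError("this sketch only handles integer primary keys")
    return f"{field} in [{', '.join(str(k) for k in keys)}]"


expr = pk_in_expr("book_id", [0, 1])
print(expr)  # book_id in [0, 1]
```

Rejecting an empty key list up front avoids sending an expression that selects nothing, and the type check keeps arbitrary strings out of the expression.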
Delete Entity
Use the boolean expression you created to delete entities. Milvus will return the list of IDs of the entities that were deleted.
from pymilvus import Collection
collection = Collection("book") # Get the existing collection.
collection.delete(expr)
Compact Data
Milvus compacts data automatically by default. You can configure Milvus to enable or disable compaction and automatic compaction.
If automatic compaction is disabled, you can still compact data manually.
Compact Data Manually
Because compaction usually takes a long time, compaction requests are processed asynchronously.
from pymilvus import Collection
collection = Collection("book") # Get the existing collection.
collection.compact()
Check Compaction State
You can use the compaction ID returned when manual compaction is triggered to check the compaction state.
collection.get_compaction_state()
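Because the compaction request is asynchronous, a caller that needs the result has to wait for it. The sketch below assumes that the collection object also offers `wait_for_compaction_completed(timeout=...)`, as pymilvus `Collection` does in the 2.x releases as far as this example assumes; `compact_and_wait` and `FakeCollection` are hypothetical names, and the stand-in returns a plain string where a real collection would return a state object.

```python
def compact_and_wait(collection, timeout=None):
    """Trigger a manual compaction, block until it finishes, return its state."""
    collection.compact()
    # Assumed method: blocks until the compaction triggered above completes.
    collection.wait_for_compaction_completed(timeout=timeout)
    return collection.get_compaction_state()


class FakeCollection:
    """Hypothetical stand-in so the sketch runs without a Milvus server."""

    def __init__(self):
        self.calls = []

    def compact(self):
        self.calls.append("compact")

    def wait_for_compaction_completed(self, timeout=None):
        self.calls.append("wait")

    def get_compaction_state(self):
        self.calls.append("state")
        return "Completed"


fake = FakeCollection()
state = compact_and_wait(fake, timeout=60)
```

If blocking is not acceptable, the alternative is to call `collection.get_compaction_state()` periodically and inspect the returned state.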