The Collection class in Milvus is similar to a table in MySQL, used to organize data and composed of one or more partitions.
Creating a Collection
A collection is composed of one or more partitions. When creating a new collection, Milvus creates a default partition named _default. For more information, refer to the glossary entry on collections.
The following example creates a two-shard collection named book with a primary key field named book_id, a VARCHAR scalar field named book_name, an INT64 scalar field named word_count, and a two-dimensional floating-point vector field named book_intro. Real applications typically use vectors of much higher dimension than this example.
Preparing the Schema
Defining a schema is similar to defining the structure of a MySQL table.
The collection to be created must contain a primary key field and a vector field. The primary key field supports INT64 and VarChar data types.
First, prepare the necessary parameters, including the field schema, collection schema, and collection name.
Before defining the collection schema, create a schema for each field in the collection. To reduce the complexity of data insertion, Milvus allows you to specify a default value for each scalar field (except the primary key field). This means that if you leave a field empty when inserting data, the default value configured during field schema creation will be used.
from pymilvus import CollectionSchema, FieldSchema, DataType

book_id = FieldSchema(
    name="book_id",
    dtype=DataType.INT64,
    is_primary=True,
)
book_name = FieldSchema(
    name="book_name",
    dtype=DataType.VARCHAR,
    max_length=200,
    default_value="Unknown"  # Default value is "Unknown"
)
word_count = FieldSchema(
    name="word_count",
    dtype=DataType.INT64,
    default_value=9999  # Default value is 9999
)
book_intro = FieldSchema(
    name="book_intro",
    dtype=DataType.FLOAT_VECTOR,
    dim=2
)
schema = CollectionSchema(
    fields=[book_id, book_name, word_count, book_intro],
    description="Test book search",  # Description is "Test book search"
    enable_dynamic_field=True  # Enable dynamic schema
)
collection_name = "book"
Schema Type | Parameter | Description | Options |
---|---|---|---|
FieldSchema | name | The name of the field to be created. | N/A |
FieldSchema | dtype | The data type of the field to be created. | Primary key field: DataType.INT64 (numpy.int64), DataType.VARCHAR (VARCHAR). Scalar fields: DataType.BOOL (Boolean), DataType.INT8 (numpy.int8), DataType.INT16 (numpy.int16), DataType.INT32 (numpy.int32), DataType.INT64 (numpy.int64), DataType.FLOAT (numpy.float32), DataType.DOUBLE (numpy.double), DataType.VARCHAR (VARCHAR), DataType.JSON (JSON). Vector fields: DataType.BINARY_VECTOR (Binary vector), DataType.FLOAT_VECTOR (Float vector). |
FieldSchema | is_primary | A switch that controls whether the field is the primary key field. Must be specified for the primary key field. | True or False |
FieldSchema | auto_id | A switch that enables or disables automatic ID (primary key) assignment. Specified on the primary key field; defaults to False. | True or False |
FieldSchema | max_length (required for VARCHAR fields) | The maximum length of the string that can be inserted. | [1, 65,535] |
FieldSchema | default_value | The default value of the field. This parameter applies only to non-array and non-JSON scalar fields. A default value cannot be specified for the primary key field. For more information, refer to the default_value parameter notes under Limitations. | N/A |
FieldSchema | dim (required for vector fields) | The dimension of the vector. | [1, 32,768] |
FieldSchema | description (optional) | Description of the field. | N/A |
CollectionSchema | fields | The fields to be created for the collection. | N/A |
CollectionSchema | description (optional) | Description of the collection to be created. | N/A |
CollectionSchema | enable_dynamic_field (optional) | Whether to enable dynamic schema. Defaults to False. For more information on dynamic schema, refer to the user guides on dynamic schema and collection management. | True or False |
N/A | collection_name | The name of the collection to be created. | N/A |
Creating a Collection with Schema
Next, create a collection with the specified schema.
from pymilvus import Collection
collection = Collection(
    name=collection_name,
    schema=schema,
    using='default',
    shards_num=2
)
Parameter | Description | Options |
---|---|---|
using (optional) | The alias of the server connection, which determines the Milvus server on which the collection is created. | N/A |
shards_num (optional) | The number of shards for the collection to be created. | [1, 16] |
num_partitions (optional) | The logical number of partitions for the collection to be created. | [1, 4096] |
*kwargs: collection.ttl.seconds (optional) | The TTL (time to live) of the collection, that is, its expiration time in seconds. Data in an expired collection is cleaned up and no longer participates in searches or queries. | 0 or greater. 0 disables TTL. |
Limitations
Resource Configuration
Feature | Maximum Limit |
---|---|
Collection Name Length | 255 characters |
Number of Partitions in the Collection | 4,096 |
Number of Fields in the Collection | 64 |
Number of Shards in the Collection | 16 |
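The limits in this table can be checked on the client before a create request is sent. The following is a minimal sketch and not part of pymilvus: `check_collection_limits` and the constant names are hypothetical, the maximums are copied from the table, and the lower bound of 1 is an assumption based on the option ranges listed for shards_num and num_partitions.

```python
# Maximums copied from the Resource Configuration table; adjust them if
# your Milvus version documents different limits.
MAX_NAME_LENGTH = 255
MAX_PARTITIONS = 4096
MAX_FIELDS = 64
MAX_SHARDS = 16


def check_collection_limits(name, num_fields, shards_num=1, num_partitions=1):
    """Return a list of limit violations (an empty list means none found)."""
    checks = [
        ("collection name length", len(name), MAX_NAME_LENGTH),
        ("number of fields", num_fields, MAX_FIELDS),
        ("number of shards", shards_num, MAX_SHARDS),
        ("number of partitions", num_partitions, MAX_PARTITIONS),
    ]
    problems = []
    for label, value, maximum in checks:
        # Lower bound of 1 is an assumption of this sketch.
        if not 1 <= value <= maximum:
            problems.append(f"{label} is {value}, expected 1 to {maximum}")
    return problems


# The "book" collection from this guide has 4 fields and 2 shards.
print(check_collection_limits("book", num_fields=4, shards_num=2))  # []
```

A non-empty result lists each violated limit, so the caller can report all problems at once instead of stopping at the first one.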
default_value Parameter
- default_value applies only to non-array and non-JSON scalar fields.
- default_value does not apply to the primary key.
- The data type of default_value must be the same as the data type specified in dtype. Otherwise, errors may occur.
- When auto_id is used, the remaining fields cannot all be left to their default values. In other words, an insert or update operation must specify the value of at least one field. Otherwise, errors may occur.
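The first three rules can be illustrated with a small client-side check. This is a sketch under stated assumptions, not Milvus's own validation: `check_default_value` and `ACCEPTED_TYPES` are hypothetical names, dtypes are passed as plain strings to keep the example free of dependencies, and the mapping from dtype names to Python types is an assumption.

```python
# Python type accepted for each scalar dtype name. This mapping is an
# assumption of the sketch; JSON and vector dtypes are deliberately absent
# because default_value does not apply to them.
ACCEPTED_TYPES = {
    "BOOL": bool,
    "INT8": int, "INT16": int, "INT32": int, "INT64": int,
    "FLOAT": float, "DOUBLE": float,
    "VARCHAR": str,
}


def check_default_value(dtype_name, default_value, is_primary=False):
    """Return None if the default is acceptable, otherwise an error message."""
    if is_primary:
        return "default_value cannot be set on the primary key field"
    if dtype_name not in ACCEPTED_TYPES:
        return f"default_value is not supported for {dtype_name} fields"
    expected = ACCEPTED_TYPES[dtype_name]
    # bool is a subclass of int in Python, so reject it for integer fields.
    if expected is int and isinstance(default_value, bool):
        return f"{dtype_name} field cannot take a bool default"
    if not isinstance(default_value, expected):
        return f"{dtype_name} field needs a {expected.__name__} default"
    return None


print(check_default_value("INT64", 9999))         # None
print(check_default_value("VARCHAR", "Unknown"))  # None
print(check_default_value("INT64", "9999"))       # an error message
```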
Renaming a Collection
To rename an existing collection, use the collection renaming API of your chosen SDK.
The following code snippet creates a collection named old_collection and then renames it to new_collection.
from pymilvus import Collection, FieldSchema, CollectionSchema, DataType, connections, utility

connections.connect(alias="default")
schema = CollectionSchema(fields=[
    FieldSchema("int64", DataType.INT64, description="int64", is_primary=True),
    FieldSchema("float_vector", DataType.FLOAT_VECTOR, is_primary=False, dim=128),
])
collection = Collection(name="old_collection", schema=schema)
utility.rename_collection("old_collection", "new_collection") # Output: True
utility.has_collection("new_collection") # Output: True (the renamed collection exists)
Modifying a Collection
Currently, the TTL feature is only available in Python.
collection.set_properties(properties={"collection.ttl.seconds": 1800})
The above example changes the collection's TTL to 1800 seconds.
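Because the TTL is given in raw seconds, it can be convenient to build the properties dictionary from a duration. The helper below is a minimal sketch; `ttl_properties` and `TTL_KEY` are hypothetical names, and only the property key `collection.ttl.seconds` and the rule that the value must be 0 or greater come from this guide.

```python
from datetime import timedelta

# Property key as used in this guide.
TTL_KEY = "collection.ttl.seconds"


def ttl_properties(ttl):
    """Build a properties dict from a timedelta; zero disables TTL."""
    seconds = int(ttl.total_seconds())
    if seconds < 0:
        raise ValueError("TTL must be 0 or greater")
    return {TTL_KEY: seconds}


print(ttl_properties(timedelta(minutes=30)))  # {'collection.ttl.seconds': 1800}
```

The result can be passed as `collection.set_properties(properties=ttl_properties(timedelta(minutes=30)))`, which is equivalent to the literal dictionary used above.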
Checking If a Collection Exists
Verify whether the collection exists in Milvus.
from pymilvus import utility
utility.has_collection("book")
Checking Collection Details
from pymilvus import Collection
collection = Collection("book") # Get an existing collection.
collection.schema # Returns the CollectionSchema of the collection.
collection.description # Returns the description of the collection.
collection.name # Returns the name of the collection.
collection.is_empty # Returns a boolean indicating whether the collection is empty.
collection.num_entities # Returns the number of entities in the collection.
collection.primary_field # Returns the schema.FieldSchema of the primary key field.
collection.partitions # Returns a list of [Partition] objects.
collection.indexes # Returns a list of [Index] objects.
collection.properties # Returns the properties of the collection, such as the expiration time (TTL) of its data.
List all collections
from pymilvus import utility
utility.list_collections()
Drop a collection
from pymilvus import utility
utility.drop_collection("book")
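Dropping is often combined with an existence check, for example in setup scripts that may run more than once. The sketch below assumes only the two calls shown in this guide, `has_collection` and `drop_collection`. `drop_if_exists` is a hypothetical helper, and `FakeUtility` is an in-memory stand-in used so the example runs without a Milvus server.

```python
def drop_if_exists(utility, name):
    """Drop the collection if present. Returns True if one was dropped."""
    if not utility.has_collection(name):
        return False
    utility.drop_collection(name)
    return True


class FakeUtility:
    """In-memory stand-in for pymilvus.utility, for demonstration only."""

    def __init__(self, names):
        self.names = set(names)

    def has_collection(self, name):
        return name in self.names

    def drop_collection(self, name):
        self.names.discard(name)


fake = FakeUtility(["book"])
print(drop_if_exists(fake, "book"))  # True
print(drop_if_exists(fake, "book"))  # False
```

With a live connection, the real module would be passed instead: `drop_if_exists(utility, "book")` after `from pymilvus import utility`.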
Create a collection alias
from pymilvus import utility
utility.create_alias(
    collection_name = "book",
    alias = "publication"
)
Drop a collection alias
from pymilvus import utility
utility.drop_alias(alias = "publication")
Modify a collection alias
Modify an existing alias to point to a different collection. The following example assumes that the alias publication was originally created for another collection.
from pymilvus import utility
utility.alter_alias(
    collection_name = "book",
    alias = "publication"
)
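Repointing an alias lets clients keep using one name while the underlying collection changes. The sketch below guards the switch with an existence check. It relies only on `has_collection` and `alter_alias` as shown in this guide. `repoint_alias` is a hypothetical helper, and `FakeUtility` and the collection name book_archive are illustrative stand-ins so the example runs without a server.

```python
def repoint_alias(utility, alias, new_collection):
    """Point an existing alias at new_collection after checking it exists."""
    if not utility.has_collection(new_collection):
        raise ValueError(f"collection {new_collection!r} does not exist")
    utility.alter_alias(collection_name=new_collection, alias=alias)


class FakeUtility:
    """In-memory stand-in for pymilvus.utility, for demonstration only."""

    def __init__(self, collections, aliases):
        self.collections = set(collections)
        self.aliases = dict(aliases)  # alias -> collection name

    def has_collection(self, name):
        return name in self.collections

    def alter_alias(self, collection_name, alias):
        self.aliases[alias] = collection_name


# The alias initially points at a hypothetical older collection.
fake = FakeUtility({"book", "book_archive"}, {"publication": "book_archive"})
repoint_alias(fake, "publication", "book")
print(fake.aliases["publication"])  # book
```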
Load a collection
Load a collection into memory before performing search or query operations. In Milvus, all search and query operations are executed in memory.
Milvus allows users to load collections as multiple replicas to utilize additional CPU and memory resources of query nodes. This feature improves overall QPS and throughput without additional hardware. Before loading a collection, ensure that you have created an index for it.
from pymilvus import Collection, utility
collection = Collection("book")
collection.load(replica_number=2)
utility.load_state("book")
utility.loading_progress("book")
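Loading can take a while for large collections, so callers often poll the progress before searching. The following is a minimal polling sketch. `wait_until_loaded` is a hypothetical helper, and it assumes that the progress call returns a dictionary shaped like `{"loading_progress": "100%"}`; verify that shape against your pymilvus version. The demonstration uses a fake progress source so it runs without a server.

```python
import time


def wait_until_loaded(get_progress, timeout=60.0, interval=0.5):
    """Poll get_progress() until it reports 100% or the timeout elapses.

    get_progress is any zero-argument callable returning a dict shaped like
    {"loading_progress": "100%"} (assumed return shape).
    Returns True if loading completed, False on timeout.
    """
    deadline = time.monotonic() + timeout
    while True:
        if get_progress()["loading_progress"] == "100%":
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


# Fake progress source that reports completion on the third poll.
reports = iter(["0%", "50%", "100%"])
print(wait_until_loaded(lambda: {"loading_progress": next(reports)}, interval=0.01))  # True
```

Against a live server, the callable would wrap the call shown above: `wait_until_loaded(lambda: utility.loading_progress("book"))`.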
Release a collection
Release a collection after search or query operations to reduce memory usage.
from pymilvus import Collection
collection = Collection("book")
collection.release()