A collection in Milvus is similar to a table in MySQL: it organizes data and consists of one or more partitions.

Creating a Collection

A collection consists of one or more partitions. When you create a collection, Milvus creates a default partition named _default. For more information, see the Collection entry in the glossary.

The following example creates a two-shard collection named book with a primary key field named book_id, a VARCHAR scalar field named book_name, an INT64 scalar field named word_count, and a two-dimensional floating-point vector field named book_intro. Real applications typically use vectors of much higher dimension than this example.

Preparing the Schema

A schema is similar to the definition of a table structure in MySQL.

The collection to be created must contain a primary key field and a vector field. The primary key field supports the INT64 and VARCHAR data types.

First, prepare the necessary parameters, including the field schema, collection schema, and collection name.

Before defining the collection schema, create a schema for each field in the collection. To reduce the complexity of data insertion, Milvus allows you to specify a default value for each scalar field (except the primary key field). This means that if you leave a field empty when inserting data, the default value configured during field schema creation will be used.

from pymilvus import CollectionSchema, FieldSchema, DataType
book_id = FieldSchema(
  name="book_id",
  dtype=DataType.INT64,
  is_primary=True,
)
book_name = FieldSchema(
  name="book_name",
  dtype=DataType.VARCHAR,
  max_length=200,
  default_value="Unknown"  # Default value is "Unknown"
)
word_count = FieldSchema(
  name="word_count",
  dtype=DataType.INT64,
  default_value=9999  # Default value is 9999
)
book_intro = FieldSchema(
  name="book_intro",
  dtype=DataType.FLOAT_VECTOR,
  dim=2
)
schema = CollectionSchema(
  fields=[book_id, book_name, word_count, book_intro],
  description="Test book search",  # Description is "Test book search"
  enable_dynamic_field=True  # Enable dynamic schema
)
collection_name = "book"

Schema type | Parameter | Description | Options
FieldSchema | name | The name of the field to be created. | N/A
FieldSchema | dtype | The data type of the field to be created. | Primary key field: DataType.INT64 (numpy.int64), DataType.VARCHAR (VARCHAR). Scalar fields: DataType.BOOL (Boolean), DataType.INT8 (numpy.int8), DataType.INT16 (numpy.int16), DataType.INT32 (numpy.int32), DataType.INT64 (numpy.int64), DataType.FLOAT (numpy.float32), DataType.DOUBLE (numpy.double), DataType.VARCHAR (VARCHAR), DataType.JSON (JSON). Vector fields: BINARY_VECTOR (binary vector), FLOAT_VECTOR (float vector).
FieldSchema | is_primary | A switch that controls whether the field is the primary key field. Must be specified for the primary key field. | True or False
FieldSchema | auto_id | A switch that enables or disables automatic ID (primary key) assignment. Specified on the primary key field; defaults to False. | True or False
FieldSchema | max_length | (Required for VARCHAR fields) The maximum length of a string that can be inserted. | [1, 65,535]
FieldSchema | default_value | The default value of the field. Applies only to non-array, non-JSON scalar fields. A default value cannot be specified for the primary key field. For more information, see the default_value Parameter section. | N/A
FieldSchema | dim | (Required for vector fields) The dimension of the vector. | [1, 32,768]
FieldSchema | description | (Optional) A description of the field. | N/A
CollectionSchema | fields | The fields of the collection to be created. | N/A
CollectionSchema | description | (Optional) A description of the collection to be created. | N/A
CollectionSchema | enable_dynamic_field | Whether to enable dynamic schema. | Boolean (True or False). Optional; defaults to False. For more information, see the user guides on dynamic schema and collection management.
(none) | collection_name | The name of the collection to be created. | N/A
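
The constraints listed in the table can be checked before a schema is ever sent to Milvus. The following is a minimal sketch in plain Python and is not part of pymilvus: the function name check_fields and the dictionary layout are hypothetical, and the ranges are the ones given in the table. It also assumes that a collection has exactly one primary key field.

```python
# Hypothetical pre-flight check; the dict layout is an assumption, not a pymilvus type.
PRIMARY_KEY_TYPES = {"INT64", "VARCHAR"}
VECTOR_TYPES = {"FLOAT_VECTOR", "BINARY_VECTOR"}


def check_fields(fields):
    """Return a list of human-readable problems found in the field definitions."""
    problems = []
    primary = [f for f in fields if f.get("is_primary")]
    if len(primary) != 1:  # assumption: exactly one primary key per collection
        problems.append("exactly one primary key field is required")
    for f in primary:
        if f["dtype"] not in PRIMARY_KEY_TYPES:
            problems.append(f"{f['name']}: primary key must be INT64 or VARCHAR")
    if not any(f["dtype"] in VECTOR_TYPES for f in fields):
        problems.append("at least one vector field is required")
    for f in fields:
        if f["dtype"] == "VARCHAR" and not 1 <= f.get("max_length", 0) <= 65535:
            problems.append(f"{f['name']}: max_length must be in [1, 65535]")
        if f["dtype"] in VECTOR_TYPES and not 1 <= f.get("dim", 0) <= 32768:
            problems.append(f"{f['name']}: dim must be in [1, 32768]")
    return problems


# The book schema from this guide, written as plain dictionaries.
book_fields = [
    {"name": "book_id", "dtype": "INT64", "is_primary": True},
    {"name": "book_name", "dtype": "VARCHAR", "max_length": 200},
    {"name": "word_count", "dtype": "INT64"},
    {"name": "book_intro", "dtype": "FLOAT_VECTOR", "dim": 2},
]
print(check_fields(book_fields))  # an empty list means no problems were found
```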

Creating a Collection with Schema

Next, create a collection with the specified schema.

from pymilvus import Collection
collection = Collection(
    name=collection_name,
    schema=schema,
    using='default',
    shards_num=2
    )

Parameter | Description | Options
using (optional) | The alias of the server connection, which determines the Milvus server in which the collection is created. | N/A
shards_num (optional) | The number of shards of the collection to be created. | [1, 16]
num_partitions (optional) | The logical number of partitions of the collection to be created. | [1, 4096]
**kwargs: collection.ttl.seconds (optional) | The time to live (TTL) of a collection is its expiration time. Data in an expired collection is cleaned up and no longer takes part in searches or queries. Specify the TTL in seconds. | 0 or greater. 0 disables TTL.

Limitations

Resource Configuration

Feature | Maximum limit
Collection name length | 255 characters
Number of partitions in a collection | 4,096
Number of fields in a collection | 64
Number of shards in a collection | 16
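
As an illustration of how these limits might be enforced on the client side, the sketch below checks a planned collection against the four values in the table. It is plain Python rather than a Milvus API, and the names LIMITS and within_limits are hypothetical.

```python
# Limits copied from the table above; the function itself is illustrative only.
LIMITS = {
    "name_length": 255,
    "partitions": 4096,
    "fields": 64,
    "shards": 16,
}


def within_limits(name, num_fields, num_partitions=1, shards_num=1):
    """Return True when the planned collection stays inside every listed limit."""
    return (
        1 <= len(name) <= LIMITS["name_length"]
        and 1 <= num_fields <= LIMITS["fields"]
        and 1 <= num_partitions <= LIMITS["partitions"]
        and 1 <= shards_num <= LIMITS["shards"]
    )


print(within_limits("book", num_fields=4, shards_num=2))   # True
print(within_limits("book", num_fields=4, shards_num=32))  # False
```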

default_value Parameter

  • default_value applies only to non-array and non-JSON scalar fields.
  • default_value does not apply to the primary key.
  • The data type of default_value must be the same as the data type specified in dtype. Otherwise, errors may occur.
  • In the case of using auto_id, it is not allowed to set all remaining fields to use the default value. In other words, when performing insert or update operations, you need to specify the value of at least one field. Otherwise, errors may occur.
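
The first three rules can be pictured with a small model in plain Python. This is only a sketch of the behavior described above, not the way Milvus or pymilvus implements it: the helper names check_default and apply_defaults, the dictionary layout, and the mapping from Milvus data types to Python types are all assumptions made for the example.

```python
# Illustrative only: mimics the rules above with plain dicts, not pymilvus objects.
PYTHON_TYPES = {
    "BOOL": bool,
    "INT8": int, "INT16": int, "INT32": int, "INT64": int,
    "FLOAT": float, "DOUBLE": float,
    "VARCHAR": str,
}


def check_default(field):
    """Raise ValueError if the field's default_value breaks one of the rules."""
    if "default_value" not in field:
        return
    if field.get("is_primary"):
        raise ValueError("the primary key field cannot have a default value")
    expected = PYTHON_TYPES.get(field["dtype"])
    if expected is None:  # JSON, array, and vector types are not in the mapping
        raise ValueError(f"{field['dtype']} fields cannot have a default value")
    value = field["default_value"]
    # bool is a subclass of int in Python, so reject it explicitly for integer fields.
    if not isinstance(value, expected) or (expected is int and isinstance(value, bool)):
        raise ValueError(f"default for {field['name']} must be {expected.__name__}")


def apply_defaults(fields, row):
    """Return a copy of row with missing fields filled from their defaults."""
    filled = dict(row)
    for field in fields:
        if field["name"] not in filled and "default_value" in field:
            filled[field["name"]] = field["default_value"]
    return filled


fields = [
    {"name": "book_name", "dtype": "VARCHAR", "default_value": "Unknown"},
    {"name": "word_count", "dtype": "INT64", "default_value": 9999},
]
for f in fields:
    check_default(f)
# book_name and word_count are omitted from the row, so their defaults are used.
print(apply_defaults(fields, {"book_id": 1, "book_intro": [0.1, 0.2]}))
```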

Renaming a Collection

To rename a collection, use the collection-renaming API of your SDK.

The following snippet creates a collection named old_collection and then renames it to new_collection.

from pymilvus import Collection, FieldSchema, CollectionSchema, DataType, connections, utility
connections.connect(alias="default")
schema = CollectionSchema(fields=[
    FieldSchema("int64", DataType.INT64, description="int64", is_primary=True),
    FieldSchema("float_vector", DataType.FLOAT_VECTOR, is_primary=False, dim=128),
])
collection = Collection(name="old_collection", schema=schema)
utility.rename_collection("old_collection", "new_collection")  # Output: True
utility.has_collection("new_collection")  # Output: True, the collection now exists under its new name
utility.has_collection("old_collection")  # Output: False, the old name is no longer in use

Modifying a Collection

Currently, the TTL feature is available only in the Python SDK.

collection.set_properties(properties={"collection.ttl.seconds": 1800})

The example above sets the collection's TTL to 1800 seconds (30 minutes).
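
To make the TTL semantics concrete, the sketch below models when an entity would count as expired. It is a plain-Python illustration, not Milvus code: the function is_expired is hypothetical, and the actual cleanup runs inside the Milvus server on a schedule that this guide does not describe.

```python
import time


def is_expired(inserted_at, ttl_seconds, now=None):
    """Return True if an entity inserted at `inserted_at` (epoch seconds) has outlived the TTL."""
    if ttl_seconds < 0:
        raise ValueError("TTL must be 0 or greater")
    if ttl_seconds == 0:  # 0 disables TTL, so nothing ever expires
        return False
    now = time.time() if now is None else now
    return now - inserted_at > ttl_seconds


# With a TTL of 1800 seconds, data inserted an hour ago is past its expiration time.
print(is_expired(inserted_at=time.time() - 3600, ttl_seconds=1800))  # True
```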

Checking If a Collection Exists

Verify whether the collection exists in Milvus.

from pymilvus import utility
utility.has_collection("book")

Checking Collection Details

from pymilvus import Collection
collection = Collection("book")  # Get an existing collection.

collection.schema                # Returns the CollectionSchema of the collection.
collection.description           # Returns the description of the collection.
collection.name                  # Returns the name of the collection.
collection.is_empty              # Returns a boolean indicating whether the collection is empty.
collection.num_entities          # Returns the number of entities in the collection.
collection.primary_field         # Returns the schema.FieldSchema of the primary key field.
collection.partitions            # Returns a list of Partition objects.
collection.indexes               # Returns a list of Index objects.
collection.properties            # Returns the expiration time of the data in the collection.

List all collections

from pymilvus import utility
utility.list_collections()

Drop a collection

from pymilvus import utility
utility.drop_collection("book")

Create a collection alias

from pymilvus import utility
utility.create_alias(
  collection_name = "book",
  alias = "publication"
)

Drop a collection alias

from pymilvus import utility
utility.drop_alias(alias = "publication")

Modify a collection alias

Modify an existing alias to point to a different collection. The following example is based on the scenario where the alias publication was originally created for another collection.

from pymilvus import utility
utility.alter_alias(
  collection_name = "book",
  alias = "publication"
)
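
One way to picture the three alias operations is as updates to a mapping from alias names to collection names. The class below is a toy in-memory model written for this guide, not how Milvus stores aliases. The class name AliasRegistry and the collection name magazine are hypothetical, and the error conditions (an alias can be created only once, and only an existing alias can be altered) are assumptions marked in the comments.

```python
class AliasRegistry:
    """Toy in-memory model of collection aliases; not how Milvus stores them."""

    def __init__(self, collections):
        self.collections = set(collections)
        self.aliases = {}  # alias -> collection name

    def create_alias(self, collection_name, alias):
        if collection_name not in self.collections:
            raise KeyError(collection_name)
        if alias in self.aliases:  # assumption: an alias name can be created only once
            raise ValueError(f"alias {alias!r} already exists")
        self.aliases[alias] = collection_name

    def alter_alias(self, collection_name, alias):
        if alias not in self.aliases:  # assumption: only existing aliases can be altered
            raise KeyError(alias)
        if collection_name not in self.collections:
            raise KeyError(collection_name)
        self.aliases[alias] = collection_name

    def drop_alias(self, alias):
        del self.aliases[alias]

    def resolve(self, name):
        """Map an alias to its collection; plain collection names pass through."""
        return self.aliases.get(name, name)


registry = AliasRegistry(["book", "magazine"])
registry.create_alias("magazine", "publication")  # alias first points at another collection
registry.alter_alias("book", "publication")       # re-point the alias at book
print(registry.resolve("publication"))            # book
```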

Load a collection

Load a collection into memory before performing a search or query. In Milvus, all search and query operations are executed in memory.

Milvus allows users to load collections as multiple replicas to utilize additional CPU and memory resources of query nodes. This feature improves overall QPS and throughput without additional hardware. Before loading a collection, ensure that you have created an index for it.

from pymilvus import Collection, utility

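collection = Collection("book")      # Get an existing collection.
collection.load(replica_number=2)    # Load the collection as two replicas.

utility.load_state("book")           # Check the load state of the collection.

utility.loading_progress("book")     # Check the loading progress of the collection.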
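
If a script has to wait until loading has finished, the progress call can be polled. The helper below is a minimal sketch with a hypothetical name, wait_until_loaded. It assumes that utility.loading_progress returns a dictionary such as {"loading_progress": "100%"}; check this against the pymilvus version you have installed. A stand-in callable is used here so the sketch runs without a Milvus server.

```python
import time


def wait_until_loaded(get_progress, timeout=60.0, interval=0.5):
    """Poll get_progress() until it reports 100% or the timeout elapses."""
    deadline = time.monotonic() + timeout
    while True:
        # Assumed shape of the result: {"loading_progress": "40%"}
        progress = get_progress()["loading_progress"]
        if int(progress.rstrip("%")) >= 100:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


# Stand-in for utility.loading_progress("book") so the sketch runs without a server.
fake_states = iter(["0%", "50%", "100%"])
print(wait_until_loaded(lambda: {"loading_progress": next(fake_states)}, interval=0))  # True
```

Against a real server, the call would look like wait_until_loaded(lambda: utility.loading_progress("book")).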

Release a collection

Release a collection after search or query operations to reduce memory usage.

from pymilvus import Collection
collection = Collection("book") 
collection.release()