Milvus is an open-source vector similarity search engine. It supports inserting, deleting, and updating vectors at terabyte scale, with near-real-time queries, and is designed for flexibility, stability, reliability, and fast search. Milvus integrates widely used vector index libraries such as Faiss, NMSLIB, and Annoy behind a simple, intuitive set of APIs, so you can choose a different index type for each scenario. Milvus can also filter on scalar data, which further improves recall and makes searches more flexible.
Features
- Heterogeneous computing
  - Vector search and index building are optimized for GPUs
  - A single general-purpose server can search terabyte-scale data in milliseconds
- Dynamic data management
- Support for mainstream index libraries, distance metrics, and monitoring tools
  - Integrates vector index libraries such as Faiss, NMSLIB, and Annoy
  - Supports quantization-based, graph-based, and tree-based indexes
  - Similarity metrics include Euclidean distance (L2), inner product (IP), Hamming distance, Jaccard distance, and others
  - Prometheus stores monitoring and performance metrics, and Grafana visualizes them
- Near-real-time search
  - Data inserted into Milvus becomes searchable within 1 second by default
Vector distance
- Euclidean distance (L2)
- Inner product (IP)
- Jaccard distance
- Tanimoto distance
- Hamming distance
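To make the metrics listed above concrete, here is a minimal pure-Python sketch of each one. It is for illustration only and is not how Milvus computes them internally. The function names are made up for this example, the binary metrics assume vectors of 0/1 values, and the Tanimoto distance is assumed to be -log2 of the Tanimoto coefficient, which is one common definition.

```python
import math

def l2_distance(a, b):
    """Euclidean (L2) distance between two float vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def inner_product(a, b):
    """Inner product (IP); larger values mean more similar vectors."""
    return sum(x * y for x, y in zip(a, b))

def hamming_distance(a, b):
    """Number of positions at which two binary vectors differ."""
    return sum(1 for x, y in zip(a, b) if x != y)

def jaccard_similarity(a, b):
    """Shared 1-bits divided by the 1-bits present in either vector."""
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    return both / either if either else 1.0

def jaccard_distance(a, b):
    """Jaccard distance is 1 minus the Jaccard similarity."""
    return 1.0 - jaccard_similarity(a, b)

def tanimoto_distance(a, b):
    """Assumed definition: -log2 of the Tanimoto (Jaccard) coefficient."""
    coefficient = jaccard_similarity(a, b)
    return math.inf if coefficient == 0 else -math.log2(coefficient)
```

L2 and inner product apply to float vectors, while the Hamming, Jaccard, and Tanimoto distances apply to binary vectors.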
Python SDK
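pip3 install pymilvus

The examples below import from the milvus module, which is the API of pymilvus 1.x and earlier. pymilvus 2.x exposes a different API, so install a client version that matches your Milvus server.

from milvus import Milvus, IndexType, MetricType, Status

# Connect with a host and port
milvus = Milvus(host='localhost', port='19530')
# Or, equivalently, connect with a URI
milvus = Milvus(uri='tcp://localhost:19530')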
- Create collection
- Create a collection named test01 with 256-dimensional vectors, an index_file_size of 1024 MB (the data file size that triggers automatic index creation), and Euclidean distance (L2) as the distance metric
param = {
    'collection_name': 'test01',
    'dimension': 256,
    'index_file_size': 1024,
    'metric_type': MetricType.L2,
}
milvus.create_collection(param)
- Delete collection
milvus.drop_collection(collection_name='test01')
- Create partition
milvus.create_partition('test01', 'tag01')
- Delete partition
milvus.drop_partition(collection_name='test01', partition_tag='tag01')
- Insert vectors into the collection
import random

# Generate 20 random 256-dimensional vectors
vectors = [[random.random() for _ in range(256)] for _ in range(20)]
# Insert the vectors and let Milvus assign the IDs
milvus.insert(collection_name='test01', records=vectors)

# Alternatively, insert the vectors with custom IDs
vector_ids = list(range(20))
milvus.insert(collection_name='test01', records=vectors, ids=vector_ids)
- Insert vectors into a partition
milvus.insert('test01', vectors, partition_tag="tag01")
- Delete by id
ids = list(range(20))  # IDs 0 through 19
milvus.delete_entity_by_id(collection_name='test01', id_array=ids)
- Create index
ivf_param = {'nlist': 16384}
milvus.create_index('test01', IndexType.IVF_FLAT, ivf_param)
- Delete index
milvus.drop_index('test01')
- Search vectors
search_param = {'nprobe': 16}
# Five random 256-dimensional query vectors
q_records = [[random.random() for _ in range(256)] for _ in range(5)]
milvus.search(collection_name='test01', query_records=q_records, top_k=2, params=search_param)

top_k is the number of nearest vectors returned for each query vector. Its valid range is [1, 16384].
- Search vectors in a partition
search_param = {'nprobe': 16}
q_records = [[random.random() for _ in range(256)] for _ in range(5)]
milvus.search(collection_name='test01', query_records=q_records, top_k=1, partition_tags=['tag01'], params=search_param)
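The nlist, nprobe, and top_k parameters used above are easier to reason about with a toy inverted-file (IVF) index. The sketch below is a simplified illustration, not Milvus's or Faiss's implementation. All function names are hypothetical, and the centroids are sampled at random, whereas real IVF indexes train them with k-means. nlist is the number of buckets the vectors are partitioned into, nprobe is the number of buckets scanned per query, and top_k is the number of nearest vectors returned.

```python
import math
import random

def l2(a, b):
    """Euclidean distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def build_ivf(vectors, nlist, seed=0):
    """Split vector ids into nlist buckets, one per centroid.

    Assumption for brevity: centroids are a random sample of the data
    rather than the result of k-means training.
    """
    rng = random.Random(seed)
    centroids = rng.sample(vectors, nlist)
    buckets = [[] for _ in range(nlist)]
    for vid, vec in enumerate(vectors):
        nearest = min(range(nlist), key=lambda c: l2(vec, centroids[c]))
        buckets[nearest].append(vid)
    return centroids, buckets

def ivf_search(vectors, index, query, top_k, nprobe):
    """Scan only the nprobe buckets whose centroids are closest to the query."""
    centroids, buckets = index
    by_distance = sorted(range(len(centroids)), key=lambda c: l2(query, centroids[c]))
    candidates = [vid for c in by_distance[:nprobe] for vid in buckets[c]]
    return sorted((l2(query, vectors[vid]), vid) for vid in candidates)[:top_k]

def brute_force_search(vectors, query, top_k):
    """Exact search: compare the query against every vector."""
    return sorted((l2(query, vec), vid) for vid, vec in enumerate(vectors))[:top_k]

rng = random.Random(42)
vectors = [[rng.random() for _ in range(8)] for _ in range(200)]
query = [rng.random() for _ in range(8)]
index = build_ivf(vectors, nlist=10)

exact = brute_force_search(vectors, query, top_k=2)
full_probe = ivf_search(vectors, index, query, top_k=2, nprobe=10)
partial_probe = ivf_search(vectors, index, query, top_k=2, nprobe=3)
```

When nprobe equals nlist, every bucket is scanned and the result matches the exact search. A smaller nprobe scans fewer vectors, which is faster but may miss some true nearest neighbors.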
- Flush data to disk (modified data is written to disk automatically after 1 second by default)
milvus.flush(collection_name_array=['test01'])
- Compact data segments
A collection can contain multiple data segments. When vectors in a segment are deleted, the space they occupied is not released automatically. Call compact to reclaim it.
milvus.compact(collection_name='test01', timeout=1)
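Why deleted vectors keep occupying space until compaction can be shown with a small conceptual model. This is a toy sketch with a hypothetical Segment class, not Milvus's storage format. It assumes a delete only marks an id as removed, and that compaction rewrites the segment with the surviving rows.

```python
class Segment:
    """Toy data segment: delete() only marks rows, compact() rewrites them."""

    def __init__(self):
        self.rows = []        # stored (id, vector) pairs
        self.deleted = set()  # ids marked as deleted

    def insert(self, vid, vector):
        self.rows.append((vid, vector))

    def delete(self, vid):
        # The row stays in storage; it is only hidden from readers.
        self.deleted.add(vid)

    def live_ids(self):
        return [vid for vid, _ in self.rows if vid not in self.deleted]

    def compact(self):
        # Rewrite the segment with live rows only, releasing the space.
        self.rows = [(vid, vec) for vid, vec in self.rows if vid not in self.deleted]
        self.deleted.clear()

segment = Segment()
for vid in range(5):
    segment.insert(vid, [float(vid)] * 4)
segment.delete(1)
segment.delete(3)
rows_before = len(segment.rows)  # deleted rows still take up space
segment.compact()
rows_after = len(segment.rows)   # only live rows remain
```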