Open source vector database: Milvus

Milvus is an open source vector similarity search engine. It supports inserting, deleting, and updating vectors at terabyte scale with near real-time queries, and it aims for flexibility, stability, reliability, and fast search. Milvus integrates widely used vector index libraries such as Faiss, NMSLIB, and Annoy behind a simple, intuitive set of APIs, so you can choose a different index type for each scenario. Milvus can also filter on scalar data, which further improves recall and makes searches more flexible.

Features

Heterogeneous computing

GPU-based vector search and index building are optimized for performance
Millisecond-level search over terabytes of data on a single commodity server
Dynamic data management

Support for mainstream index libraries, distance metrics, and monitoring tools

Integrates vector index libraries such as Faiss, NMSLIB, and Annoy
Supports quantization-based, graph-based, and tree-based indexes
Similarity metrics include Euclidean distance (L2), inner product (IP), Hamming distance, Jaccard distance, and others
Prometheus stores monitoring and performance metrics, and Grafana visualizes them

Near real-time search

By default, data inserted into Milvus becomes searchable within 1 second

Vector distance

Euclidean distance (L2)

Inner product (IP)

Jaccard distance

Tanimoto distance

Hamming distance
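To make the metric names concrete, here is a minimal pure-Python sketch of four of them. The function names are illustrative and are not part of the Milvus SDK. Binary vectors are assumed to be lists of 0/1 values. Tanimoto is left out: for binary vectors the Tanimoto coefficient coincides with the Jaccard similarity, and how Milvus converts it into a distance is not covered here.

```python
import math


def l2_distance(a, b):
    """Euclidean (L2) distance between two float vectors; smaller is closer."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def inner_product(a, b):
    """Inner product (IP) of two float vectors; larger is more similar."""
    return sum(x * y for x, y in zip(a, b))


def hamming_distance(a, b):
    """Number of positions at which two binary vectors differ."""
    return sum(x != y for x, y in zip(a, b))


def jaccard_distance(a, b):
    """1 - |A and B| / |A or B|, treating each binary vector as a set of set bits."""
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    return 0.0 if either == 0 else 1.0 - both / either


print(l2_distance([0.0, 0.0], [3.0, 4.0]))
print(inner_product([1.0, 2.0], [3.0, 4.0]))
print(hamming_distance([1, 0, 1, 1], [1, 1, 0, 1]))
print(jaccard_distance([1, 0, 1, 1], [1, 1, 0, 1]))
```

L2 and IP apply to float vectors, while Hamming and Jaccard apply to binary vectors, so the metric chosen for a collection has to match the kind of data stored in it.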

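Python SDK

The examples below import from the milvus module, which ships with pymilvus releases before 2.0; later releases expose a different API. Pick the release that matches your Milvus server version.

pip3 install "pymilvus<2"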

from milvus import Milvus, IndexType, MetricType, Status


# Connect by host and port
milvus = Milvus(host='localhost', port='19530')
# Or, equivalently, connect by URI
milvus = Milvus(uri='tcp://localhost:19530')

Create collection
- Create a collection named test01 with a dimension of 256, an index file size of 1024 MB (the data file size at which an index is built automatically), and Euclidean distance (L2) as the distance metric


param = {'collection_name': 'test01', 'dimension': 256, 'index_file_size': 1024, 'metric_type': MetricType.L2}


milvus.create_collection(param)
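Creating a collection that already exists returns an error status, so it can help to check first. The helper below is a hypothetical sketch, not part of pymilvus. It assumes the client behaves like the pre-2.0 SDK object above, where has_collection(name) returns a (status, bool) pair, create_collection(param) returns a status, and a status exposes an OK() method.

```python
def ensure_collection(client, param):
    """Create the collection described by `param` unless it already exists.

    Assumption: `client` follows the pre-2.0 pymilvus conventions, i.e.
    has_collection(name) -> (status, bool) and create_collection(param) -> status.
    Returns True if a collection was created, False if it already existed.
    """
    name = param['collection_name']
    status, exists = client.has_collection(name)
    if not status.OK():
        raise RuntimeError(f'has_collection failed: {status}')
    if exists:
        return False
    status = client.create_collection(param)
    if not status.OK():
        raise RuntimeError(f'create_collection failed: {status}')
    return True
```

With the objects defined above, the call would be ensure_collection(milvus, param).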

Delete collection

milvus.drop_collection(collection_name='test01')

Create partition

milvus.create_partition('test01', 'tag01')

Delete partition

milvus.drop_partition(collection_name='test01', partition_tag='tag01')

Insert vectors into the collection

import random
vectors = [[random.random() for _ in range(256)] for _ in range(20)]

milvus.insert(collection_name='test01', records=vectors)


Insert with custom IDs
vector_ids = list(range(20))
milvus.insert(collection_name='test01', records=vectors, ids=vector_ids)
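A client-side check before inserting can catch mismatched arguments early. The helper below is a hypothetical sketch, not part of pymilvus. It assumes that every vector must have the collection's dimension and that, when custom IDs are given, there must be exactly one ID per vector.

```python
def check_insert_args(vectors, dimension, ids=None):
    """Raise ValueError if the insert arguments are inconsistent."""
    for i, vec in enumerate(vectors):
        if len(vec) != dimension:
            raise ValueError(
                f'vector {i} has {len(vec)} values, expected {dimension}')
    if ids is not None and len(ids) != len(vectors):
        raise ValueError(f'got {len(ids)} ids for {len(vectors)} vectors')
```

For the data above, check_insert_args(vectors, 256, vector_ids) passes silently, and the insert call can follow.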

Insert vectors into a partition

milvus.insert('test01', vectors, partition_tag="tag01")

Delete by ID

ids = list(range(20))

milvus.delete_entity_by_id(collection_name='test01', id_array=ids)

Create index

ivf_param = {'nlist': 16384}
milvus.create_index('test01', IndexType.IVF_FLAT, ivf_param)

Delete index

milvus.drop_index('test01')

Search vectors

search_param = {'nprobe': 16}
q_records = [[random.random() for _ in range(256)] for _ in range(5)]
milvus.search(collection_name='test01', query_records=q_records, top_k=2, params=search_param)


top_k is the number of vectors nearest to the query vector that the search returns.
top_k must be in the range [1, 16384].
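The roles of nlist, nprobe, and top_k are easier to see in a toy inverted-file (IVF) index. The sketch below is a simplified illustration and not how Milvus or Faiss implements IVF_FLAT: centroids are sampled from the data instead of being trained with k-means, and all function names are hypothetical. nlist is the number of cells the vectors are split into, nprobe is how many of the nearest cells a query scans, and top_k is how many results come back.

```python
import heapq
import math
import random


def l2(a, b):
    """Euclidean distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def build_ivf(vectors, nlist, seed=0):
    """Split vectors into nlist cells, each owned by one centroid.

    A real IVF index trains centroids with k-means; sampling them from the
    data keeps this sketch short.
    """
    rng = random.Random(seed)
    centroids = rng.sample(vectors, nlist)
    cells = [[] for _ in range(nlist)]
    for vec_id, vec in enumerate(vectors):
        nearest = min(range(nlist), key=lambda c: l2(vec, centroids[c]))
        cells[nearest].append(vec_id)
    return centroids, cells


def search_ivf(vectors, centroids, cells, query, top_k, nprobe):
    """Return the top_k (distance, id) pairs found in the nprobe nearest cells."""
    probe = heapq.nsmallest(nprobe, range(len(centroids)),
                            key=lambda c: l2(query, centroids[c]))
    candidates = [vec_id for c in probe for vec_id in cells[c]]
    return heapq.nsmallest(
        top_k, ((l2(query, vectors[i]), i) for i in candidates))


rng = random.Random(42)
data = [[rng.random() for _ in range(8)] for _ in range(200)]
centroids, cells = build_ivf(data, nlist=10)
hits = search_ivf(data, centroids, cells, data[0], top_k=2, nprobe=3)
print(hits)
```

Scanning fewer cells (a smaller nprobe) is faster but can miss true neighbors that fall in unscanned cells. Setting nprobe equal to nlist scans everything and gives the same result as a brute-force search.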

Search vectors in a partition

q_records = [[random.random() for _ in range(256)] for _ in range(5)]
milvus.search(collection_name='test01', query_records=q_records, top_k=1, partition_tags=['tag01'], params=search_param)

Flush data to disk (by default, modified data is flushed automatically after 1 second)

milvus.flush(collection_name_array=['test01'])

Compact data segments

A collection can contain multiple data segments. When vectors in a segment are deleted, the space they occupied is not released automatically; compact the collection to reclaim it.

milvus.compact(collection_name='test01', timeout=1)
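The reason deleted vectors keep occupying space can be shown with a toy model. The class below is a conceptual sketch under the assumption that deletes only mark rows and that compaction rewrites the segment without them. It does not reflect the actual storage format Milvus uses, and the names are hypothetical.

```python
class Segment:
    """Toy data segment: delete() only marks rows, compact() rewrites storage."""

    def __init__(self, rows):
        self.rows = dict(rows)   # id -> vector, the data physically stored
        self.deleted = set()     # ids marked as deleted but still stored

    def delete(self, ids):
        """Mark existing ids as deleted without freeing their storage."""
        self.deleted.update(i for i in ids if i in self.rows)

    def live_ids(self):
        """Ids that a search would still be allowed to return."""
        return [i for i in self.rows if i not in self.deleted]

    def stored_count(self):
        """Number of rows still taking up space."""
        return len(self.rows)

    def compact(self):
        """Drop marked rows from storage and clear the marks."""
        self.rows = {i: v for i, v in self.rows.items()
                     if i not in self.deleted}
        self.deleted.clear()


seg = Segment({i: [float(i)] * 4 for i in range(20)})
seg.delete(range(10))
print(seg.stored_count(), len(seg.live_ids()))  # before compaction
seg.compact()
print(seg.stored_count(), len(seg.live_ids()))  # after compaction
```

Before compact() the stored count is larger than the live count; afterwards the two match.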

Keywords: Big Data NLP milvus

Added by dibyendrah on Tue, 08 Mar 2022 16:49:49 +0200
