Basic concepts of Elasticsearch
Index: the logical area in which Elasticsearch stores data. It is similar to the database concept in relational databases. An index can be spread over one or more shards, and each shard may have multiple replicas.
Document: the entity data stored in Elasticsearch. It is similar to a row of a table in a relational database.
A document consists of multiple fields. Fields with the same name in different documents must have the same type. A field can appear repeatedly in a document, that is, one field can hold multiple values (it is multivalued).
Document type: to make querying easier, an index may contain several kinds of documents, that is, several document types. A type is similar to the concept of a table in a relational database. Note, however, that fields with the same name in different document types must be of the same type. Note also that mapping types were deprecated in Elasticsearch 6 and removed in later releases, so this concept applies to older versions.
Mapping: similar to the concept of a schema definition in a relational database. It stores the mapping information for each field. Different document types have different mappings.
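To make the field rules above concrete, here is a minimal sketch in plain Python, with no Elasticsearch client involved. The documents and the helper `field_types` are hypothetical. The helper only mimics the rule that a field name keeps one type across documents, while a field may still hold several values.

```python
# Hypothetical documents destined for one index; "tags" is multivalued.
docs = [
    {"title": "hello", "views": 3, "tags": ["search", "python"]},
    {"title": "world", "views": 7, "tags": ["django"]},
]


def field_types(documents):
    """Collect the value type seen for each field name.

    A list is treated as a multivalued field: the type of its
    elements is recorded, not `list` itself.
    Raises ValueError if one field name shows up with two types.
    """
    seen = {}
    for doc in documents:
        for name, value in doc.items():
            values = value if isinstance(value, list) else [value]
            for v in values:
                kind = type(v).__name__
                if seen.setdefault(name, kind) != kind:
                    raise ValueError(f"field {name!r}: {seen[name]} vs {kind}")
    return seen


print(field_types(docs))
```

In Elasticsearch itself this consistency is enforced by the mapping; the check here is only an illustration of the idea.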
The following table compares some terms used in Elasticsearch and in relational databases:
Relational database | Elasticsearch |
---|---|
Database | Index |
Table | Type |
Row | Document |
Column | Field |
Schema | Mapping |
Index | Everything is indexed |
SQL | Query DSL |
SELECT * FROM table... | GET http://... |
UPDATE table SET | PUT http://... |
Introduction to Python Elasticsearch DSL
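Connect to Elasticsearch:

    import elasticsearch

    # Host and port of a local node.
    # Newer client versions (8.x) also expect a "scheme" key in each host dict.
    es = elasticsearch.Elasticsearch([{'host': '127.0.0.1', 'port': 9200}])
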
Let's take a look at search. q is the search content (a query string); extra spaces around the terms in q have no effect on the results. size specifies how many hits to return, from_ specifies the starting offset, and filter_path restricts which parts of the response are returned. In the second call of this example, the final result contains only _id and _type.

    # One hit, starting at offset 1, for the query string "Holmes"
    res_3 = es.search(index="bank", q="Holmes", size=1, from_=1)
    # Up to 1000 hits, returning only the _id and _type of each hit
    res_4 = es.search(index="bank", q=" 39225 5686 ", size=1000,
                      filter_path=['hits.hits._id', 'hits.hits._type'])

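To show what filter_path does to a response, the sketch below applies the two paths from the example to a hand-written response in plain Python. The response content and the helpers `pick`, `merge` and `filter_path` are hypothetical. The real filtering happens inside Elasticsearch and also supports wildcards, which this simplified stand-in does not.

```python
def pick(node, parts):
    """Return a copy of node that keeps only the path given by parts."""
    if not parts:
        return node
    if isinstance(node, list):
        # Descend into every element of a list.
        return [pick(item, parts) for item in node]
    key = parts[0]
    if isinstance(node, dict) and key in node:
        return {key: pick(node[key], parts[1:])}
    return {}


def merge(a, b):
    """Recursively merge two filtered copies of the same response."""
    if isinstance(a, dict) and isinstance(b, dict):
        out = dict(a)
        for key, value in b.items():
            out[key] = merge(out[key], value) if key in out else value
        return out
    if isinstance(a, list) and isinstance(b, list):
        return [merge(x, y) for x, y in zip(a, b)]
    return b


def filter_path(response, paths):
    """Keep only the dotted paths listed in paths."""
    result = {}
    for path in paths:
        result = merge(result, pick(response, path.split(".")))
    return result


# Hand-written response in the general shape of a search result.
response = {
    "took": 3,
    "hits": {
        "total": 2,
        "hits": [
            {"_id": "1", "_type": "account", "_source": {"balance": 39225}},
            {"_id": "2", "_type": "account", "_source": {"balance": 5686}},
        ],
    },
}

filtered = filter_path(response, ["hits.hits._id", "hits.hits._type"])
print(filtered)
```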
Query all data in the specified index:
Here, index specifies where to search. A string names a single index; a list names multiple indexes, such as index=["bank", "banner", "country"]; a wildcard pattern selects every index that matches it, such as index=["apple*"], which selects all indexes whose names start with apple.
You can also specify a doc type in search.

    from elasticsearch_dsl import Search

    s = Search(using=es, index="index-test")
    response = s.execute()  # no query was added, so every document matches
    print(response.to_dict())

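The three forms of the index argument described above can be pictured with the standard library's `fnmatch` module. This is only an approximation of how index names are selected. The list of existing index names and the helper `resolve` are hypothetical, and the real resolution is done by Elasticsearch.

```python
from fnmatch import fnmatchcase

# Hypothetical index names present in a cluster.
existing = ["apple-2021", "apple-2022", "bank", "banner", "country"]


def resolve(index, names):
    """Return the names selected by a string, a list, or wildcard patterns."""
    patterns = [index] if isinstance(index, str) else list(index)
    return [n for n in names if any(fnmatchcase(n, p) for p in patterns)]


print(resolve("bank", existing))                          # a single index
print(resolve(["bank", "banner", "country"], existing))   # several indexes
print(resolve(["apple*"], existing))                      # wildcard pattern
```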
Query by field. Multiple query criteria can be chained:

    s = Search(using=es, index="index-test").query("match", sip="192.168.1.1")
    s = s.query("match", dip="192.168.1.2")
    response = s.execute()

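Chaining two query() calls combines them so that both conditions must match. As a rough illustration, the request body for the search above should look like the dictionary built below. `match_all_of` is a hypothetical helper written for this sketch, not part of elasticsearch_dsl.

```python
import json


def match_all_of(**fields):
    """Build a search body in which every given field must match."""
    clauses = [{"match": {name: value}} for name, value in fields.items()]
    return {"query": {"bool": {"must": clauses}}}


body = match_all_of(sip="192.168.1.1", dip="192.168.1.2")
print(json.dumps(body, indent=2))
```

Printing s.to_dict() on the chained Search object is a quick way to compare the library's actual output with this shape.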
Multi-field query:

    from elasticsearch_dsl.query import MultiMatch

    multi_match = MultiMatch(query='hello', fields=['title', 'content'])
    s = Search(using=es, index="index-test").query(multi_match)
    response = s.execute()
    print(response.to_dict())

You can also use the Q() object to query multiple fields. fields is a list of field names, and query is the value to search for.

    from elasticsearch_dsl import Q

    q = Q("multi_match", query="hello", fields=['title', 'content'])
    s = Search(using=es, index="index-test").query(q)
    response = s.execute()
    print(response.to_dict())

The first parameter of Q() is the query type, for example match or bool.

    q = Q('bool', must=[Q('match', title='hello'), Q('match', content='world')])
    s = Search(using=es, index="index-test").query(q)
    response = s.execute()
    print(response.to_dict())

Q() objects can also be combined with operators, which is an equivalent way of writing the bool query above.

    # OR: either clause may match
    q = Q("match", title='python') | Q("match", title='django')
    print(q.to_dict())
    # {"bool": {"should": [...]}}

    # AND: both clauses must match
    q = Q("match", title='python') & Q("match", title='django')
    print(q.to_dict())
    # {"bool": {"must": [...]}}

    # NOT: the clause must not match
    q = ~Q("match", title="python")
    print(q.to_dict())
    # {"bool": {"must_not": [...]}}

    # Any of these can then be run like the earlier queries
    response = Search(using=es, index="index-test").query(q).execute()

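The operator forms rely on Python's operator overloading. The sketch below shows the idea with a tiny hypothetical class, `Clause`, and a hypothetical helper, `match`. It is not the library's implementation, which among other things flattens nested combinations; it only reproduces the three bool shapes noted in the comments above.

```python
class Clause:
    """Hypothetical stand-in for a query object; it wraps a query dict."""

    def __init__(self, body):
        self.body = body

    def __or__(self, other):
        # a | b: at least one of the clauses should match
        return Clause({"bool": {"should": [self.body, other.body]}})

    def __and__(self, other):
        # a & b: both clauses must match
        return Clause({"bool": {"must": [self.body, other.body]}})

    def __invert__(self):
        # ~a: the clause must not match
        return Clause({"bool": {"must_not": [self.body]}})


def match(**field):
    """Build a match clause for one field."""
    return Clause({"match": field})


python = match(title="python")
django = match(title="django")

either = (python | django).body
both = (python & django).body
not_python = (~python).body

print(either)
print(both)
print(not_python)
```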
Filtering: the example below uses a range filter. range is the filter type, timestamp is the name of the field to filter on, gte means greater than or equal to, and lt means less than; set the bounds as needed.
On the difference between term and match: term looks for an exact match and does not analyze the query value, whereas match analyzes the query text first (word segmentation and, with the standard analyzer, lowercasing) and returns a relevance score. (For instance, when a text field has been indexed as lowercase tokens, a term query that contains uppercase letters returns no hits, while match is case-insensitive and returns the same results either way.)

    import time

    # s is a Search object, e.g. Search(using=es, index="index-test")
    # Range filter on timestamp, combined with a match query on country
    s = s.filter("range", timestamp={"gte": 0, "lt": time.time()}).query("match", country="in")
    # Terms filter: balance_num must equal one of the listed values
    res_3 = s.filter("terms", balance_num=["39225", "5686"]).execute()

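The term versus match distinction is easier to see on a toy inverted index. The sketch below is plain Python with a deliberately naive analyzer (lowercase, split on whitespace). The documents and the functions `analyze`, `term` and `match` are hypothetical and only imitate the behaviour described above; real analyzers and scoring are far richer.

```python
def analyze(text):
    """Toy analyzer: lowercase the text and split it on whitespace."""
    return text.lower().split()


# Hypothetical documents, indexed through the analyzer.
docs = {1: "Sherlock Holmes", 2: "Mycroft Holmes", 3: "John Watson"}

# Inverted index: token -> set of document ids containing it.
index = {}
for doc_id, text in docs.items():
    for token in analyze(text):
        index.setdefault(token, set()).add(doc_id)


def term(value):
    """Exact lookup: the value is NOT analyzed before the lookup."""
    return index.get(value, set())


def match(text):
    """Analyzed lookup: any token of the analyzed text may hit."""
    hits = set()
    for token in analyze(text):
        hits |= index.get(token, set())
    return hits


print(term("Holmes"))   # uppercase H is not in the index: no hits
print(term("holmes"))   # exact lowercase token: hits
print(match("Holmes"))  # analyzed first, so case does not matter
```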
Other ways of writing filters:

    from elasticsearch_dsl import Search, Q

    # filter() puts the clause into the filter branch of a bool query
    s = Search()
    s = s.filter('terms', tags=['search', 'python'])
    print(s.to_dict())
    # {'query': {'bool': {'filter': [{'terms': {'tags': ['search', 'python']}}]}}}

    # which is equivalent to
    s = Search()
    s = s.query('bool', filter=[Q('terms', tags=['search', 'python'])])
    print(s.to_dict())
    # {'query': {'bool': {'filter': [{'terms': {'tags': ['search', 'python']}}]}}}

    # exclude() negates the filter
    s = Search()
    s = s.exclude('terms', tags=['search', 'python'])
    # or, equivalently:
    # s = s.query('bool', filter=[~Q('terms', tags=['search', 'python'])])
    print(s.to_dict())
    # {'query': {'bool': {'filter': [{'bool': {'must_not': [{'terms': {'tags': ['search', 'python']}}]}}]}}}

Aggregations can be added on top of queries, filters and other operations; they are attached through the aggs attribute.
A bucket is a group. Its first parameter is the name of the group, which you choose yourself; the second parameter is the aggregation type, and the third specifies the field.
The same is true for metric. The metric types include sum, avg, max, min, and so on. Note that two types return several of these values at once: stats and extended_stats, the latter additionally returning variance and similar statistics.

    from elasticsearch_dsl import A

    # Example 1: terms buckets with two stats metrics inside each bucket
    (s.aggs.bucket("per_country", "terms", field="timestamp")
        .metric("sum_click", "stats", field="click")
        .metric("sum_request", "stats", field="request"))

    # Example 2: terms buckets with one stats metric
    s.aggs.bucket("per_age", "terms", field="click.keyword").metric("sum_click", "stats", field="click")

    # Example 3: a top-level extended_stats metric
    s.aggs.metric("sum_age", "extended_stats", field="impression")

    # Example 4: terms buckets only
    s.aggs.bucket("per_age", "terms", field="country.keyword")

    # Example 5: buckets based on intervals of a numeric field
    a = A("range", field="account_number", ranges=[{"to": 10}, {"from": 11, "to": 21}])
    s.aggs.bucket("account_ranges", a)  # the bucket name here is illustrative

    res = s.execute()

Finally, execute() still has to be called. Note that the s.aggs calls modify the Search object in place, so their return value should not be captured in a variable in place of the search (for example, res = s.aggs... is wrong); the aggregation results are stored in res, the response returned by execute().
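To show what a terms bucket with a nested stats metric computes, the sketch below reproduces the calculation on a small in-memory list. The documents and the helpers `stats` and `terms_with_stats` are hypothetical, and the output layout is simplified compared with a real Elasticsearch response.

```python
from collections import defaultdict

# Hypothetical documents; the field names follow the examples above.
rows = [
    {"country": "in", "click": 3},
    {"country": "in", "click": 5},
    {"country": "us", "click": 10},
]


def stats(values):
    """Return count, min, max, avg and sum in one go, like the stats metric."""
    return {
        "count": len(values),
        "min": min(values),
        "max": max(values),
        "avg": sum(values) / len(values),
        "sum": sum(values),
    }


def terms_with_stats(documents, bucket_field, metric_field):
    """Group documents by bucket_field, then compute stats per group."""
    groups = defaultdict(list)
    for doc in documents:
        groups[doc[bucket_field]].append(doc[metric_field])
    return {
        key: {"doc_count": len(values), "sum_click": stats(values)}
        for key, values in groups.items()
    }


per_country = terms_with_stats(rows, "country", "click")
print(per_country)
```

With the real library, the corresponding values are read from the response after execute(), for example through res.aggregations.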
Sorting:

    s = Search().sort(
        'category',   # ascending
        '-title',     # a leading "-" means descending
        {"lines": {"order": "asc", "mode": "avg"}},
    )

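The string forms of the sort keys can be pictured with an ordinary Python sort. This is a minimal sketch under the assumption that a leading "-" means descending order, as in the example above. The helper `sort_docs` and the sample documents are hypothetical, and the dictionary form with "mode" is not covered.

```python
def sort_docs(documents, *keys):
    """Order documents by several keys; a leading '-' means descending."""
    result = list(documents)
    # Apply the keys from right to left. sorted() is stable, so the
    # leftmost key ends up as the primary sort criterion.
    for key in reversed(keys):
        descending = key.startswith("-")
        field = key[1:] if descending else key
        result = sorted(result, key=lambda d: d[field], reverse=descending)
    return result


docs = [
    {"category": "b", "title": "x"},
    {"category": "a", "title": "y"},
    {"category": "a", "title": "z"},
]

ordered = sort_docs(docs, "category", "-title")
print([(d["category"], d["title"]) for d in ordered])
```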
Paging:

    s = s[10:20]
    # {"from": 10, "size": 10}

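The relation between a slice and the from/size pair can be written out in a few lines. The sketch below is plain Python. The helpers `paging` and `page` are hypothetical, and rejecting open-ended slices is a simplification chosen for this example rather than a statement about the library.

```python
def paging(window):
    """Translate a slice into the from/size pair it stands for."""
    start = window.start or 0
    if window.stop is None:
        raise ValueError("an open-ended slice has no size in this sketch")
    return {"from": start, "size": window.stop - start}


def page(number, per_page):
    """Return the slice for a 1-based page number."""
    first = (number - 1) * per_page
    return slice(first, first + per_page)


print(paging(slice(10, 20)))   # same window as s[10:20]
print(paging(page(2, 10)))     # second page of ten results
```

A slice built by `page` could then be applied to a Search object as s[page(2, 10)].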
Some additional methods, for interested readers:

    s = Search()

    # Set extra properties of the request body with the extra() method
    s = s.extra(explain=True)

    # Set query-string parameters with params()
    # (search_type="count" is only accepted by old Elasticsearch versions)
    s = s.params(search_type="count")

    # To limit the returned fields, use the source() method
    # only return the selected fields
    s = s.source(['title', 'body'])
    # don't return any fields, just the metadata
    s = s.source(False)
    # explicitly include/exclude fields
    s = s.source(includes=["title"], excludes=["user.*"])
    # reset the field selection
    s = s.source(None)

    # Build a Search object from a dict
    s = Search.from_dict({"query": {"match": {"title": "python"}}})

    # Modify an existing Search object in place
    s.update_from_dict({"query": {"match": {"title": "python"}}, "size": 42})
