Learn Python, day 02-22: Building a Search Engine with a Distributed Python Crawler (Scrapy)

Section 366, Building a Search Engine with a Distributed Python Crawler (Scrapy): bool compound queries in Elasticsearch

bool query description

filter:[], filters on fields; does not contribute to scoring
must:[], if there are several clauses, all of them must match [and]
should:[], if there are several clauses, at least one of them must match [or]
must_not:[], the opposite of must: a document matches only if none of the clauses match [not]

# bool query
# The filtered query from older versions has been replaced by bool
# bool combines four clause types: must, should, must_not and filter
# The format is as follows:

#bool:{
#     "filter": [], filters on fields; does not contribute to scoring
#     "must": [], if there are several clauses, all of them must match [and]
#     "should": [], if there are several clauses, at least one of them must match [or]
#     "must_not": [], the opposite of must: a document matches only if none of the clauses match [not]
#}
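To make the four clause types concrete, here is a minimal Python sketch that evaluates a bool query against plain dictionaries. It is only a model for reasoning about which documents match, not how Elasticsearch executes queries: the helper names matches and as_list are made up for this example, only match_all, term and bool are handled, and scoring is ignored. It also assumes the default rule that should clauses are required only when the bool query has no must or filter clause.

```python
def as_list(clause):
    """Accept a single clause, a list of clauses, or nothing."""
    if clause is None:
        return []
    return clause if isinstance(clause, list) else [clause]


def matches(doc, query):
    """Return True if doc satisfies query (toy model, no scoring)."""
    (kind, body), = query.items()
    if kind == "match_all":
        return True
    if kind == "term":
        (field, value), = body.items()
        return doc.get(field) == value
    if kind == "bool":
        # must and filter select documents the same way; only scoring differs
        required = as_list(body.get("must")) + as_list(body.get("filter"))
        optional = as_list(body.get("should"))
        excluded = as_list(body.get("must_not"))
        if not all(matches(doc, q) for q in required):
            return False
        if any(matches(doc, q) for q in excluded):
            return False
        # Without must/filter, at least one should clause has to match
        if optional and not required:
            return any(matches(doc, q) for q in optional)
        return True
    raise ValueError(f"unsupported query type: {kind}")


jobs = [
    {"salary": 10, "title": "python"},
    {"salary": 20, "title": "scrapy"},
]
query = {
    "bool": {
        "should": [{"term": {"salary": 20}}, {"term": {"title": "python"}}],
        "must_not": [{"term": {"salary": 10}}],
    }
}
hits = [job["title"] for job in jobs if matches(job, query)]
print(hits)
```

The first job satisfies a should clause but is rejected by must_not, so only the second job is kept.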
 

Set up test data

#Set up test data
POST jobbole/job/_bulk
{"index":{"_id":1}}
{"salary":10,"title":"python"}
{"index":{"_id":2}}
{"salary":20,"title":"Scrapy"}
{"index":{"_id":3}}
{"salary":30,"title":"Django"}
{"index":{"_id":4}}
{"salary":40,"title":"Elasticsearch"}
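The bulk body above is newline-delimited JSON: one action line followed by one document line, and the body has to end with a newline. A small Python sketch can generate the same payload from a list of dictionaries. The function name bulk_body and its start_id parameter are illustrative, not part of any client library.

```python
import json


def bulk_body(docs, start_id=1):
    """Build an NDJSON bulk payload with sequential ids starting at start_id."""
    lines = []
    for offset, doc in enumerate(docs):
        # Action line: index the next document under an explicit _id
        lines.append(json.dumps({"index": {"_id": start_id + offset}}))
        # Source line: the document itself
        lines.append(json.dumps(doc))
    # The bulk API expects a trailing newline after the last line
    return "\n".join(lines) + "\n"


jobs = [
    {"salary": 10, "title": "python"},
    {"salary": 20, "title": "Scrapy"},
    {"salary": 30, "title": "Django"},
    {"salary": 40, "title": "Elasticsearch"},
]
payload = bulk_body(jobs)
print(payload, end="")
```

The printed text can be pasted under a POST jobbole/job/_bulk line in the console.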

bool compound query - the simplest filter query: a term query (equals)

Filter for documents whose salary field equals 20

The query works in two steps: must with match_all first selects every document, then filter keeps only the documents whose salary field equals 20

#Simple filter query
#Find the documents whose salary field equals 20
GET jobbole/job/_search
{
  "query": {
    "bool": {                   #bool compound query
      "must":{                  #every clause listed here must match
        "match_all":{}          #matches all documents
      },
      "filter": {               #filter clause, does not affect scoring
        "term": {               #term query: the search value is not analyzed and must match exactly
          "salary": 20          #the salary field equals 20
        }
      }
    }
  }
}



#Simple filter query, without the inline annotations
#Find the documents whose salary field equals 20
GET jobbole/job/_search
{
  "query": {
    "bool": {
      "must":{
        "match_all":{}
      },
      "filter": {
        "term": {
          "salary": 20
        }
      }
    }
  }
}
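When the query is built from Python code, the same request body is just a nested dictionary. The sketch below builds it and, as a sanity check, applies the equivalent condition to a local copy of the test data. The function name term_filter_query is hypothetical, and nothing here talks to Elasticsearch.

```python
import json


def term_filter_query(field, value):
    """Build a request body: match everything, then filter on one exact value."""
    return {
        "query": {
            "bool": {
                "must": {"match_all": {}},
                "filter": {"term": {field: value}},
            }
        }
    }


body = term_filter_query("salary", 20)
print(json.dumps(body, indent=2))

# Local equivalent of the filter on a copy of the test data
jobs = [
    {"salary": 10, "title": "python"},
    {"salary": 20, "title": "Scrapy"},
    {"salary": 30, "title": "Django"},
    {"salary": 40, "title": "Elasticsearch"},
]
selected = [job["title"] for job in jobs if job["salary"] == 20]
print(selected)
```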

bool compound query - the simplest filter query with terms (equivalent to or)

Filter for documents whose salary field equals 10 or 20

#Simple filter query
#Filter for documents whose salary field is 10 or 20
GET jobbole/job/_search
{
  "query": {
    "bool": {
      "must":{
        "match_all":{}
      },
      "filter": {
        "terms": {
          "salary":[10,20]
        }
      }
    }
  }
}
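As a filter, terms selects the same documents as a bool query with one should term clause per value. The Python sketch below shows that rewrite and checks the "or" reading on a local copy of the test data. The function name terms_to_should is made up for this example.

```python
def terms_to_should(field, values):
    """Rewrite a terms condition as one should/term clause per value."""
    return {"bool": {"should": [{"term": {field: value}} for value in values]}}


rewritten = terms_to_should("salary", [10, 20])
print(rewritten)

jobs = [
    {"salary": 10, "title": "python"},
    {"salary": 20, "title": "Scrapy"},
    {"salary": 30, "title": "Django"},
    {"salary": 40, "title": "Elasticsearch"},
]
# Membership in the list of values ...
by_membership = [job["title"] for job in jobs if job["salary"] in [10, 20]]
# ... is the same as chaining equality checks with "or"
by_or = [job["title"] for job in jobs if job["salary"] == 10 or job["salary"] == 20]
print(by_membership, by_or)
```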

Note: the other basic queries can also be used inside filter

Use _analyze to inspect the tokens an analyzer produces
analyzer sets the analyzer: ik_max_word splits the text at the finest granularity, ik_smart at the coarsest
text is the text to analyze

#Use _analyze to inspect the tokens an analyzer produces
#analyzer sets the analyzer: ik_max_word splits the text at the finest granularity, ik_smart at the coarsest
#text is the text to analyze
GET _analyze
{
  "analyzer": "ik_max_word",
  "text": "Python Network Development Engineer"
}

GET _analyze
{
  "analyzer": "ik_smart",
  "text": "Python Network Development Engineer"
}

bool compound query - complex combination 1
Find documents where the salary field equals 20 or the title field equals python, and the salary field is neither 30 nor 10

# Find documents where the salary field equals 20 or the title field equals python, and the salary field is neither 30 nor 10
GET jobbole/job/_search
{
  "query": {
    "bool": {
      "should": [
        {"term":{"salary":20}},
        {"term":{"title":"python"}}
      ],
      "must_not": [
        {"term": {"salary":30}},
        {"term": {"salary":10}}
      ]
    }
  }
}
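Read as a plain condition, this query is "(salary is 20 or title is python) and salary is not 30 and salary is not 10". The Python sketch below applies that condition to a local copy of the test data. It assumes the title field is analyzed by the default standard analyzer, which lowercases tokens, so the title is compared in lower case. The function name complex_query_1 is illustrative.

```python
def complex_query_1(job):
    """Local reading of the should + must_not query above."""
    # should: at least one of the two term clauses has to match
    should = job["salary"] == 20 or job["title"].lower() == "python"
    # must_not: neither of the two term clauses may match
    must_not = job["salary"] == 30 or job["salary"] == 10
    return should and not must_not


jobs = [
    {"salary": 10, "title": "python"},
    {"salary": 20, "title": "Scrapy"},
    {"salary": 30, "title": "Django"},
    {"salary": 40, "title": "Elasticsearch"},
]
kept = [job["title"] for job in jobs if complex_query_1(job)]
print(kept)
```

The python document satisfies a should clause, but its salary of 10 hits must_not, so it is dropped.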

bool compound query - complex combination 2
Find documents where the title field equals python, or where the title field equals elasticsearch and the salary field equals 30

# Query for data where title field equals python, or (title field equals elasticsearch and salary equals 30)
GET jobbole/job/_search
{
  "query": {
    "bool": {
      "should":[
        {"term":{"title":"python"}},
        {"bool": {
          "must": [
            {"term": {"title":"elasticsearch"}},
            {"term":{"salary":30}}
          ]
        }}
      ]
    }
  }
}
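Nesting a bool inside a should clause is how "a or (b and c)" is expressed. When such queries are assembled in Python, two tiny helpers keep the nesting readable. The names term and bool_query are hypothetical helpers written for this sketch, not functions from a client library.

```python
import json


def term(field, value):
    """One exact-match clause."""
    return {"term": {field: value}}


def bool_query(**clauses):
    """Wrap must / should / must_not / filter clauses in a bool query."""
    return {"bool": clauses}


# title is python, or (title is elasticsearch and salary is 30)
body = {
    "query": bool_query(
        should=[
            term("title", "python"),
            bool_query(must=[term("title", "elasticsearch"), term("salary", 30)]),
        ]
    )
}
print(json.dumps(body, indent=2))
```

With the test data above, and assuming the default mapping, only the python document should come back: the Elasticsearch document has a salary of 40, so the inner must is not satisfied.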

bool compound query - filtering empty and non-empty values

#Set up test data
POST bbole/jo/_bulk
{"index":{"_id":"1"}}
{"tags":["search"]}
{"index":{"_id":"2"}}
{"tags":["search","python"]}
{"index":{"_id":"3"}}
{"other_field":["some data"]}
{"index":{"_id":"4"}}
{"tags":null}
{"index":{"_id":"5"}}
{"tags":["search",null]}

Handling null values

Get the documents whose tags field exists and holds at least one non-null value

#Handling null values
#Get the documents whose tags field exists and holds at least one non-null value
GET bbole/jo/_search
{
  "query": {
    "bool": {
      "filter": {
        "exists": {
          "field": "tags"
        }
      }
    }
  }
}

Get the documents whose tags field is empty or null, or which have no tags field at all

#Get the documents whose tags field is empty or null, or which have no tags field at all
GET bbole/jo/_search
{
  "query": {
    "bool": {
      "must_not": {
        "exists": {
          "field": "tags"
        }
      }
    }
  }
}
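The rule behind exists is that a field counts as present when it holds at least one non-null value. The Python sketch below models that rule on a local copy of the test data. The function name exists is chosen for this example, and the last document is stored under _id 5 here so that it does not overwrite document 1.

```python
def exists(doc, field):
    """True if the field holds at least one non-null value."""
    value = doc.get(field)
    # Treat a single value like a one-element list
    values = value if isinstance(value, list) else [value]
    return any(item is not None for item in values)


docs = {
    "1": {"tags": ["search"]},
    "2": {"tags": ["search", "python"]},
    "3": {"other_field": ["some data"]},
    "4": {"tags": None},
    "5": {"tags": ["search", None]},
}
# filter + exists: documents that have a usable tags value
with_tags = [doc_id for doc_id, doc in docs.items() if exists(doc, "tags")]
# must_not + exists: documents where tags is missing, null or empty
without_tags = [doc_id for doc_id, doc in docs.items() if not exists(doc, "tags")]
print(with_tags, without_tags)
```

Document 5 still counts as having tags because one of its two values is not null.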

Keywords: Python ElasticSearch network Django

Added by jonshutt on Tue, 18 Feb 2020 05:14:49 +0200