Search: elastic4s search in filter mode

Now we can start to explore the core feature of ES: search. Search has two modes, filter and query. Elasticsearch calls them filter context and query context. Filter mode returns the records that satisfy the filter conditions. Query mode works in two steps: it first selects the matching records, then calculates a relevance (similarity) score for each one. The only extra work is the scoring.

If we first want to reproduce the querying capability of a traditional database, filter mode is enough. Filter mode can still use the search engine's text analysis (tokenization) to produce high-quality results. Filter results can also be cached, which makes them more efficient. A traditional database management system cannot offer these capabilities. In ES, filter mode is expressed through the filter clause of a bool query, as follows:

GET /_search
{
  "query": {
    "bool": {
      "filter": [
        { "term":  { "status": "published" }},
        { "range": { "publish_date": { "gte": "2015-01-01" }}}
      ]
    }
  }
}
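
The difference between the two modes can be made concrete with a toy model in plain Scala, with no Elasticsearch involved. The documents and the term-count "score" below are assumptions chosen for illustration. Elasticsearch itself scores with BM25, not raw term counts.

```scala
// Toy model of filter mode vs. query mode. The documents and the
// term-count "score" are illustrative assumptions; Elasticsearch itself
// scores with BM25, not raw term counts.
object SearchModes {
  final case class Doc(id: Int, status: String, body: String)

  val docs: List[Doc] = List(
    Doc(1, "published", "scala json scala"),
    Doc(2, "draft", "scala database"),
    Doc(3, "published", "scala database"),
    Doc(4, "published", "json database")
  )

  private def tokens(d: Doc): Array[String] = d.body.split(" ")

  // Filter mode: a yes/no predicate; matching docs are returned unscored.
  def filter(p: Doc => Boolean): List[Doc] = docs.filter(p)

  // Query mode: step 1 selects docs that pass p and contain the term,
  // step 2 scores each one (here: occurrences of the term), highest first.
  def query(p: Doc => Boolean, term: String): List[(Doc, Int)] =
    docs
      .filter(d => p(d) && tokens(d).contains(term))
      .map(d => d -> tokens(d).count(_ == term))
      .sortBy { case (_, score) => -score }
}
```

Both paths use the same predicate. Only the query path computes a score and orders results by it, which is the extra cost that filter mode avoids.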

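Here is a simple example in elastic4s:

  import com.sksamuel.elastic4s.ElasticDsl._  // elastic4s 7.x query DSL: search, boolQuery, termQuery, ...

  val filterTerm = search("bank")
    .query(
      boolQuery().filter(termQuery("city.keyword", "Brogan")))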

The generated request JSON is:

POST /bank/_search
{
  "query":{
    "bool":{
      "filter":[
        {
          "term":{"city.keyword":{"value":"Brogan"}}
        }
      ]
    }
  }
}

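Let's explain the query request first. termQuery is a term-level query: the value must match exactly, including case. It should not be run against a field that has been processed by the analyzer (tokenizer). With the default standard analyzer, the text field city stores the lowercase token "brogan", so the term "Brogan" would not match it. That is why we query the un-analyzed city.keyword sub-field instead.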
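
The reason term queries target city.keyword can be sketched in plain Scala. This assumes the city text field uses the standard analyzer, approximated here as "split on non-alphanumerics and lowercase". The keyword sub-field keeps the raw value. Real analyzers do more than this simplification.

```scala
// Simplified model of analyzed text vs. keyword indexing (an assumption,
// not Elasticsearch's actual analyzer implementation).
object TermVsKeyword {
  // "text" field: tokenized and lowercased, roughly like the standard analyzer
  def analyzedTokens(value: String): Set[String] =
    value.split("[^\\p{Alnum}]+").filter(_.nonEmpty).map(_.toLowerCase).toSet

  // "keyword" field: the whole value is indexed unchanged
  def keywordTokens(value: String): Set[String] = Set(value)

  // A term query is NOT analyzed: it looks for the exact value among the indexed tokens
  def termMatches(indexed: Set[String], term: String): Boolean = indexed.contains(term)
}
```

In this model, termMatches(analyzedTokens("Brogan"), "Brogan") is false, while the same term against keywordTokens("Brogan") is true.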

The query returns the following JSON:

{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 0.0,
    "hits" : [
      {
        "_index" : "bank",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 0.0,
        "_source" : {
          "account_number" : 1,
          "balance" : 39225,
          "firstname" : "Amber",
          "lastname" : "Duke",
          "age" : 32,
          "gender" : "M",
          "address" : "880 Holmes Lane",
          "employer" : "Pyrami",
          "email" : "amberduke@pyrami.com",
          "city" : "Brogan",
          "state" : "IL"
        }
      }
    ]
  }
}

Let's look at how elastic4s represents the JSON result above. First, the return type is Response[SearchResponse], where Response is defined as follows:

sealed trait Response[+U] {
  def status: Int                  // the http status code of the response
  def body: Option[String]         // the http response body if the response included one
  def headers: Map[String, String] // any http headers included in the response
  def result: U                    // returns the marshalled response U or throws an exception
  def error: ElasticError          // returns the error or throw an exception
  def isError: Boolean             // returns true if this is an error response
  final def isSuccess: Boolean = !isError // returns true if this is a success

  def map[V](f: U => V): Response[V]
  def flatMap[V](f: U => Response[V]): Response[V]

  final def fold[V](ifError: => V)(f: U => V): V = if (isError) ifError else f(result)
  final def fold[V](onError: RequestFailure => V, onSuccess: U => V): V = this match {
    case failure: RequestFailure => onError(failure)
    case RequestSuccess(_, _, _, result) => onSuccess(result)
  }
  final def foreach[V](f: U => V): Unit          = if (!isError) f(result)

  final def toOption: Option[U] = if (isError) None else Some(result)
}

Response[+U] is a generic wrapper type. With U = SearchResponse, def result: SearchResponse returns the parsed result. status is the standard HTTP status code, and isError and isSuccess report whether the request succeeded. error holds the detailed error information as an ElasticError. The HTTP headers of the response are available in headers. Now let's look at the definition of the SearchResponse class:

case class SearchResponse(took: Long,
                          @JsonProperty("timed_out") isTimedOut: Boolean,
                          @JsonProperty("terminated_early") isTerminatedEarly: Boolean,
                          private val suggest: Map[String, Seq[SuggestionResult]],
                          @JsonProperty("_shards") private val _shards: Shards,
                          @JsonProperty("_scroll_id") scrollId: Option[String],
                          @JsonProperty("aggregations") private val _aggregationsAsMap: Map[String, Any],
                          hits: SearchHits) {...}


case class SearchHits(total: Total,
                      @JsonProperty("max_score") maxScore: Double,
                      hits: Array[SearchHit]) {
  def size: Long = hits.length
  def isEmpty: Boolean = hits.isEmpty
  def nonEmpty: Boolean = hits.nonEmpty
}

case class SearchHit(@JsonProperty("_id") id: String,
                     @JsonProperty("_index") index: String,
                     @JsonProperty("_type") `type`: String,
                     @JsonProperty("_version") version: Long,
                     @JsonProperty("_seq_no") seqNo: Long,
                     @JsonProperty("_primary_term") primaryTerm: Long,
                     @JsonProperty("_score") score: Float,
                     @JsonProperty("_parent") parent: Option[String],
                     @JsonProperty("_shard") shard: Option[String],
                     @JsonProperty("_node") node: Option[String],
                     @JsonProperty("_routing") routing: Option[String],
                     @JsonProperty("_explanation") explanation: Option[Explanation],
                     @JsonProperty("sort") sort: Option[Seq[AnyRef]],
                     private val _source: Map[String, AnyRef],
                     fields: Map[String, AnyRef],
                     @JsonProperty("highlight") private val _highlight: Option[Map[String, Seq[String]]],
                     private val inner_hits: Map[String, Map[String, Any]],
                     @JsonProperty("matched_queries") matchedQueries: Option[Set[String]])
  extends Hit {...}

The important parts of the result, such as "_score", "_source" and "fields", are all in SearchHit. A complete example of handling the result looks like this:

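  // assumes elastic4s 7.x and an ElasticClient named client created earlier
  import com.sksamuel.elastic4s.ElasticDsl._
  import scala.concurrent.ExecutionContext.Implicits.global

  val filterTerm = client.execute(search("bank")
    .query(
      boolQuery().filter(termQuery("city.keyword", "Brogan")))).await  // block until the Future completes

  if (filterTerm.isSuccess) {
    // print the _source of every hit
    if (filterTerm.result.nonEmpty)
      filterTerm.result.hits.hits.foreach { hit => println(hit.sourceAsMap) }
  } else println(s"Error: ${filterTerm.error.reason}")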
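
The success/failure branching used above can be illustrated with a stripped-down imitation of the Response pattern. This is not the elastic4s source. Ok and Err are hypothetical stand-ins for RequestSuccess and RequestFailure, and the error is a plain string instead of an ElasticError.

```scala
// Hypothetical imitation of elastic4s's Response / RequestSuccess / RequestFailure.
object ResponseSketch {
  sealed trait Resp[+U] {
    def status: Int
    def isError: Boolean
    def result: U // throws on an error response, like the original
    final def isSuccess: Boolean = !isError
    final def fold[V](ifError: => V)(f: U => V): V = if (isError) ifError else f(result)
    final def toOption: Option[U] = if (isError) None else Some(result)
  }

  final case class Ok[U](status: Int, result: U) extends Resp[U] {
    def isError: Boolean = false
  }

  final case class Err(status: Int, reason: String) extends Resp[Nothing] {
    def isError: Boolean = true
    def result: Nothing = throw new NoSuchElementException(s"error response: $reason")
  }

  // Same decision as the processing code above: report the hits or the error reason.
  def describe(r: Resp[Seq[String]]): String = r match {
    case Ok(_, hits)    => s"${hits.size} hit(s)"
    case Err(_, reason) => s"Error: $reason"
  }
}
```

Pattern matching on the sealed trait, fold, or toOption all avoid calling result on an error response, which would throw.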

Prefix queries, which are common in traditional database querying (for example SQL's LIKE 'Bro%'), can also be used as filters:

POST /bank/_search
{
  "query":{
    "bool":{
      "filter":[
        {
          "prefix":{"city.keyword":{"value":"Bro"}}
        }
      ]
    }
  }
}

  val filterPrefix = client.execute(search("bank")
    .query(
      boolQuery().filter(prefixQuery("city.keyword", "Bro")))
    .sourceInclude("address", "city", "state")  // only return these _source fields
  ).await
  if (filterPrefix.isSuccess) {
    if (filterPrefix.result.nonEmpty)
      filterPrefix.result.hits.hits.foreach { hit => println(hit.sourceAsMap) }
  } else println(s"Error: ${filterPrefix.error.reason}")

....

Map(address -> 880 Holmes Lane, city -> Brogan, state -> IL)
Map(address -> 810 Nostrand Avenue, city -> Brooktrails, state -> GA)
Map(address -> 295 Whitty Lane, city -> Broadlands, state -> VT)
Map(address -> 511 Heath Place, city -> Brookfield, state -> OK)
Map(address -> 918 Bridge Street, city -> Brownlee, state -> HI)
Map(address -> 806 Pierrepont Place, city -> Brownsville, state -> MI)
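
A prefix filter on a keyword field behaves roughly like a case-sensitive startsWith on the whole stored value. The plain-Scala model below reuses the six cities printed above plus Elizaville from the same dataset. It also adds "brooklyn", a hypothetical lowercase value that shows the case sensitivity.

```scala
// Rough model of a prefix filter on a keyword field (an approximation,
// not how Lucene evaluates prefix queries internally).
object PrefixSketch {
  val cities: List[String] = List(
    "Brogan", "Brooktrails", "Broadlands", "Brookfield", "Brownlee", "Brownsville",
    "Elizaville",
    "brooklyn" // hypothetical value: lowercase, so the "Bro" prefix misses it
  )

  // keyword values are not analyzed, so the comparison is exact and case-sensitive
  def prefixFilter(values: List[String], prefix: String): List[String] =
    values.filter(_.startsWith(prefix))
}
```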

Regular-expression queries are supported as well:

POST /bank/_search
{
  "query":{
    "bool":{
      "filter":[
        {
          "regexp":{"address.keyword":{"value":".*bridge.*"}}
        }
      ]
    }
  }
}


  val filterRegex  = client.execute(search("bank")
    .query(
      boolQuery().filter(regexQuery("address.keyword",".*bridge.*")))
    .sourceInclude("address","city","state")
  ).await
  if (filterRegex.isSuccess) {
    if (filterRegex.result.nonEmpty)
      filterRegex.result.hits.hits.foreach {hit => println(hit.sourceAsMap)}
  } else println(s"Error: ${filterRegex.error.reason}")


....
Map(address -> 384 Bainbridge Street, city -> Elizaville, state -> MS)
Map(address -> 721 Cambridge Place, city -> Efland, state -> ID)
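
Lucene regular expressions must match the whole value, which is why the pattern is wrapped in .* on both sides. Java's String.matches behaves the same way, so it serves as a stand-in below. Java regex syntax is not identical to Lucene's, but for this simple pattern they agree. The addresses come from the results shown in this article. Note that "918 Bridge Street" is not matched: the keyword field is case-sensitive, and "Bridge" starts with a capital B.

```scala
// Approximation of a regexp filter on a keyword field using Java regex,
// which (like Lucene regexp) is anchored to the whole string.
object RegexSketch {
  val addresses: List[String] = List(
    "384 Bainbridge Street", "721 Cambridge Place", "918 Bridge Street", "880 Holmes Lane"
  )

  def regexFilter(values: List[String], pattern: String): List[String] =
    values.filter(_.matches(pattern))
}
```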

ES builds compound queries with the bool query, so a bool query can itself be nested inside the filter clause:

POST /bank/_search
{
  "query":{
    "bool":{
      "filter":[
        {
          "regexp":{"address.keyword":{"value":".*bridge.*"}}
        },
        {
          "bool":{
            "must":[
              { "match":{"lastname":"lane"} }
            ]
          }
        }
      ]
    }
  }
}

The elastic4s QueryDSL statement and the returned result are as follows:

  val filterBool  = client.execute(search("bank")
    .query(
      boolQuery().filter(regexQuery("address.keyword",".*bridge.*"),
        boolQuery().must(matchQuery("lastname","lane"))))
    .sourceInclude("lastname","address","city","state")
  ).await
  if (filterBool.isSuccess) {
    if (filterBool.result.nonEmpty)
      filterBool.result.hits.hits.foreach {hit => println(s"score: ${hit.score}, ${hit.sourceAsMap}")}
  } else println(s"Error: ${filterBool.error.reason}")


...

score: 0.0, Map(address -> 384 Bainbridge Street, city -> Elizaville, state -> MS, lastname -> Lane)

The score is 0.0 because clauses in filter context are not scored, including the match query nested inside the filter. Skipping the relevance calculation can improve execution efficiency.
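
The compound filter can be modelled in plain Scala. A document is kept only if every sub-condition holds, and no score is computed, so each hit carries 0.0. This sketch makes several assumptions. The nested match query is approximated as a lowercase token comparison, because lastname is an analyzed field. The regexp on address.keyword stays case-sensitive. The lastnames marked hypothetical are invented for illustration.

```scala
// Toy model of: filter [ regexp(address.keyword), bool.must(match(lastname)) ]
object CompoundFilterSketch {
  final case class Doc(address: String, lastname: String)

  val docs: List[Doc] = List(
    Doc("384 Bainbridge Street", "Lane"), // from the result above
    Doc("721 Cambridge Place", "Smith"),  // hypothetical lastname
    Doc("880 Holmes Lane", "Duke"),       // Amber Duke from the first result
    Doc("918 Bridge Street", "Lane")      // hypothetical lastname
  )

  // keyword field: whole-value, case-sensitive regex match
  private def regexp(d: Doc): Boolean = d.address.matches(".*bridge.*")

  // analyzed field: lowercase both sides, compare tokens
  private def matchLastname(d: Doc, text: String): Boolean =
    d.lastname.toLowerCase.split("\\s+").contains(text.toLowerCase)

  // Filter context: every clause must hold and no relevance score is computed.
  def search(text: String): List[(Doc, Double)] =
    docs.filter(d => regexp(d) && matchLastname(d, text)).map(_ -> 0.0)
}
```

Only "384 Bainbridge Street" passes both conditions. "918 Bridge Street" fails the case-sensitive regexp, and "721 Cambridge Place" fails the lastname match.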


Added by phppaper on Sun, 26 Apr 2020 16:51:34 +0300