Set up the frontend system
Static resources
New Project -> Static Web -> Static Web
Save the project to hm49\code\leyou-portal
Unzip the leyou-portal archive from the course materials
and copy its contents directly into the new project
live-server
This project is not built with webpack, so we cannot use webpack-dev-server for hot deployment.
It is not a single-page application either.
Therefore, we use another hot-deployment tool here: live-server.
brief introduction
Address: https://www.npmjs.com/package/live-server
This is a small development server with live-reload capability. Use it to serve your HTML/JavaScript/CSS, but not to deploy the final site.
Installation and operating parameters
Install it with npm. A global install is recommended so it can be used anywhere later:
npm install -g live-server
When running, directly enter the command:
live-server
You can also append parameters to the command to configure it:
- --port=NUMBER - select the port to use; defaults to the PORT env var or 8080
- --host=ADDRESS - select the host address to bind to; defaults to the IP env var or 0.0.0.0 ("any address")
- --no-browser - suppress automatic web browser launching
- --browser=BROWSER - specify a browser to use instead of the system default
- --quiet | -q - suppress logging
- --verbose | -V - more logging (logs all requests, shows all listening IPv4 interfaces, etc.)
- --open=PATH - launch the browser to PATH instead of the server root
- --watch=PATH - comma-separated paths to exclusively watch for changes (default: watch everything)
- --ignore=PATH - comma-separated string of paths to ignore (anymatch-compatible definition)
- --ignorePattern=RGXP - regular expression of files to ignore (e.g. `.*\.jade`) (deprecated in favor of --ignore)
- --middleware=PATH - path to a .js file exporting a middleware function to add; can also be the name, without path or extension, of a middleware bundled in the middleware folder
- --entry-file=PATH - serve this file (server root relative) in place of missing files (useful for single-page applications)
- --mount=ROUTE:PATH - serve the contents of PATH under the route ROUTE (multiple definitions possible)
- --spa - translate requests from /abc to /#/abc (handy for single-page applications)
- --wait=MILLISECONDS - (default 100ms) wait for all changes before reloading
- --htpasswd=PATH - enable HTTP auth, expecting the htpasswd file located at PATH
- --cors - enable CORS for any origin (reflects the request origin; requests with credentials are supported)
- --https=PATH - path to an HTTPS configuration module
- --proxy=ROUTE:URL - proxy all requests for ROUTE to URL
- --help | -h - display a terse usage hint and exit
- --version | -v - display the version and exit
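For example, a sketch combining several of these options (the watched paths are hypothetical):

```
live-server --port=9002 --no-browser --watch=css,js
```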
test
Enter the leyou-portal directory and run:
live-server --port=9002
If it fails to start, shut down the admin frontend project first and try again; the port may be occupied.
Domain name access
Right now we can only access the site at http://127.0.0.1:9002,
but we want to access it at http://www.leyou.com.
Step 1: modify the hosts file and add a line of configuration:
127.0.0.1 www.leyou.com
The second step is to modify the nginx configuration and reverse proxy www.leyou.com to 127.0.0.1:9002
This configuration should be written as the first server block:
```
server {
    listen 80;
    server_name www.leyou.com;

    proxy_set_header X-Forwarded-Host $host;
    proxy_set_header X-Forwarded-Server $host;
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;

    location / {
        proxy_pass http://127.0.0.1:9002;
        proxy_connect_timeout 600;
        proxy_read_timeout 600;
    }
}
```
Reload nginx configuration: nginx.exe -s reload
common.js
To make subsequent development easier, the frontend system defines some utilities in common.js:
First, it applies some global configuration to axios, such as the request timeout, the base request path, and whether cookies are allowed in cross-domain requests.
It then defines an object `ly` (short for leyou) with the following members:
- getUrlParam(key): get a parameter from the URL path
- http: an alias for the axios object, so ajax requests can later be made with ly.http.get()
- store: a convenience wrapper around localStorage, described in detail later
- formatPrice: format a price; a string argument is multiplied by 100 and converted to a number, while a number argument is divided by 100 and converted to a string
- formatDate(val, pattern): format the date object val according to the given pattern template
- stringify: convert an object to a query-parameter string
- parse: convert a query-parameter string back to a js object
Elasticsearch introduction and installation
When users visit our home page, they usually search directly to find the goods they want to buy.
The number of goods is very large, and the classification is complex. How to correctly display the goods users want, filter reasonably and promote the transaction as soon as possible is the core of the search system.
In the face of such complex search business and data volume, it is difficult to use traditional database search. Generally, we will use full-text retrieval technology, such as Solr, which we have learned before.
Today, however, we will look at another full-text retrieval technology: Elasticsearch.
brief introduction
Elastic
Elastic official website: https://www.elastic.co/cn/
Elastic has a complete product line and solution set: Elasticsearch, Kibana, Logstash, etc. These three make up the ELK technology stack.
Elasticsearch
Elasticsearch website: https://www.elastic.co/cn/products/elasticsearch
As mentioned above, Elasticsearch has the following features:
- Distributed; no need to build a cluster manually (Solr requires manual configuration and uses Zookeeper as the registry)
- RESTful style; all APIs follow REST principles and are easy to use
- Near-real-time search; data updates become visible in Elasticsearch almost immediately
The latest Elasticsearch release is currently 6.3.1; we will use 6.3.0.
JDK 1.8 or above is required on the virtual machine.
install and configure
To simulate a real production scenario, we will install Elasticsearch on Linux.
Create a new user leyou
For security reasons, Elasticsearch refuses to run as the root user by default.
Create user:
useradd leyou
Set password:
passwd leyou
Switch users:
su - leyou
Upload the installation package and unzip it
Upload the installation package to the /home/leyou directory.
Decompression:
tar -zxvf elasticsearch-6.3.0.tar.gz
We rename the directory:
mv elasticsearch-6.3.0/ elasticsearch
Enter the directory and view its structure:
Modify configuration
Go into the config directory: cd config
There are two configuration files that need to be modified:
- jvm.options
Elasticsearch is based on Lucene, and the underlying layer of Lucene is implemented in java, so we need to configure jvm parameters.
Edit jvm.options:
vim jvm.options
The default configuration is as follows:
```
-Xms1g
-Xmx1g
```
That is too much memory for our virtual machine; turn it down:

```
-Xms512m
-Xmx512m
```
- elasticsearch.yml
vim elasticsearch.yml
- Modify the data and log directory:
```
path.data: /home/leyou/elasticsearch/data  # data directory
path.logs: /home/leyou/elasticsearch/logs  # log directory
```
We modified the data and logs directories to point to the installation directory of elasticsearch. However, these two directories do not exist, so we need to create them.
Enter the root directory of elasticsearch and create:
```
mkdir data
mkdir logs
```
- Modify the bound ip:
network.host: 0.0.0.0 # Bind to 0.0.0.0 and allow any ip to access
Only local access is allowed by default. After it is modified to 0.0.0.0, it can be accessed remotely
At present, we are doing stand-alone installation. If we want to do cluster, we only need to add other node information to this configuration file.
Other configurable information of elasticsearch.yml:
Attribute | Description |
---|---|
cluster.name | Configure the cluster name of elasticsearch. The default is elasticsearch. It is suggested to change it to a meaningful name. |
node.name | For the node name, es will specify a name randomly by default. It is recommended to specify a meaningful name to facilitate management |
path.conf | The path of the configuration files. A tar or zip installation defaults to the config folder under the ES root directory; an rpm installation defaults to /etc/elasticsearch |
path.data | Set the storage path of index data. The default is the data folder under the es root directory. You can set multiple storage paths separated by commas |
path.logs | Set the storage path of the log file. The default is the logs folder under the es root directory |
path.plugins | Set the storage path of plug-ins. The default is the plugins folder under the es root directory |
bootstrap.memory_lock | Set to true to lock the memory used by ES and avoid memory swap |
network.host | Set bind_host and publish_host, set to 0.0.0.0 to allow Internet access |
http.port | Set the http port of external service. The default value is 9200. |
transport.tcp.port | Communication port between cluster nodes |
discovery.zen.ping.timeout | Set the timeout time of ES automatic discovery node connection, which is 3 seconds by default. If the network delay is high, it can be set larger |
discovery.zen.minimum_master_nodes | The minimum number of primary nodes. The formula of this value is: (master_eligible_nodes / 2) + 1. For example, if there are three qualified primary nodes, it should be set to 2 |
Run
Enter the elasticsearch/bin directory, where you can see the following executable files:
Then enter the command:
./elasticsearch
Errors are reported and startup fails.
Error 1: kernel too low
We are using CentOS 6, whose Linux kernel is version 2.6, while this Elasticsearch plug-in requires at least 3.5. It doesn't matter: we can disable the plug-in.
Modify the elasticsearch.yml file and add the following configuration at the bottom:
bootstrap.system_call_filter: false
Then restart
Error 2: insufficient file permissions
[1]: max file descriptors [4096] for elasticsearch process likely too low, increase to at least [65536]
We are running as the leyou user instead of root, and that user's resource limits are too low.
First log in as root.
Then modify the configuration file:
vim /etc/security/limits.conf
Add the following:
```
* soft nofile 65536
* hard nofile 131072
* soft nproc 4096
* hard nproc 4096
```
Error 3: insufficient threads
In the error report just now, there is another line:
[1]: max number of threads [1024] for user [leyou] is too low, increase to at least [4096]
This is because there are not enough threads.
Continue to modify the configuration:
vim /etc/security/limits.d/90-nproc.conf
Modify the following:
* soft nproc 1024
Replace with:
* soft nproc 4096
Error 4: process virtual memory
[3]: max virtual memory areas vm.max_map_count [65530] likely too low, increase to at least [262144]
vm.max_map_count limits the number of VMAs (virtual memory areas) a process may own. Continue by modifying the configuration file:
vim /etc/sysctl.conf
Add the following:
vm.max_map_count=655360
Then execute the command:
sysctl -p
Restart terminal window
After all errors are corrected, be sure to restart your Xshell terminal; otherwise the configuration will not take effect.
Exception in thread "main" java.nio.file.AccessDeniedException: /root/home/searchengine/elasticsearc
Startup failed because of insufficient permissions.
Since Elasticsearch cannot be started as root, fix the ownership instead.
Modify it with the command:
chown -R ITCAST:ITCAST /home/leyou/eal
https://blog.csdn.net/czczcz_/article/details/83308702
ERROR: [2] bootstrap checks failed
[1]: max file descriptors [4096] for elasticsearch process is too low, increase to at least [65536]
[2]: max number of threads [3767] for user [ITCAST] is too low, increase to at least [4096]
It is best to restart the terminal again after these modifications.
After startup, you can see that two ports are bound:
- 9300: the port for communication between cluster nodes
- 9200: the HTTP port for client access
We access in the browser: http://192.168.56.101:9200
Install Kibana
What is Kibana?
Kibana is a Node.js-based visualization tool for Elasticsearch indices. It can use Elasticsearch's aggregation capabilities to generate various charts, such as bar charts, line charts, and pie charts.
It also provides a console for operating on Elasticsearch index data, with API hints that are very helpful for learning Elasticsearch syntax.
install
Kibana depends on Node. Node is not installed on our virtual machine, but it is installed on Windows, so we will run Kibana on Windows.
Use the version matching Elasticsearch, which is also 6.3.0
Unzip into hm49\tools
Configure and run
Configure
Enter the config directory under the installation directory and modify the kibana.yml file:
Modify the address of elasticsearch server:
elasticsearch.url: "http://192.168.56.101:9200"
Run
Enter the bin directory under the installation directory:
Double-click to run it.
You will see that Kibana listens on port 5601
We visit: http://127.0.0.1:5601
Console
Select the DevTools menu on the left to enter the console page:
On the right side of the page, we can enter requests to access Elasticsearch.
Install the IK analyzer
Lucene's IK analyzer stopped being maintained as early as 2012. It was later taken over, upgraded, and developed into an integrated Elasticsearch plug-in, maintained and versioned in step with Elasticsearch; the latest version is 6.3.0.
install
Upload the zip package from the course materials and unzip it into the plugins directory under the Elasticsearch directory:
Unzip the file using the unzip command:
unzip elasticsearch-analysis-ik-6.3.0.zip -d ik-analyzer
Then restart elasticsearch:
API
Elasticsearch provides a REST-style API, i.e. an HTTP request interface, and also provides client APIs in various languages.
Rest style API
Document address: https://www.elastic.co/guide/en/elasticsearch/reference/current/index.html
Client API
Elasticsearch supports many clients: https://www.elastic.co/guide/en/elasticsearch/client/index.html
After clicking Java REST Client, you will find there are two flavors:
The Low Level REST Client is a low-level wrapper providing basic functionality, but it is more flexible
The High Level REST Client is a higher-level wrapper built on the Low Level REST Client; its functionality is richer and more complete, and the API is simpler
How to learn
It is recommended to learn the Rest style API first to understand the underlying implementation of the request and the request body format.
Notes mixing the API with usage examples
This part combines the API introduction with the demo project that follows, so the corresponding pieces are put together to make the worked examples easier to understand and look up.
Spring Data Elasticsearch instance preparation
brief introduction
Spring Data Elasticsearch is a sub module under the Spring Data project.
Check the official website of Spring Data: http://projects.spring.io/spring-data/
The mission of Spring Data is to provide a familiar and consistent Spring based programming model for data access, while still retaining the special characteristics of the underlying data store.
It makes data access technology easy to use: relational and non-relational databases, map-reduce frameworks, and cloud-based data services. It is an umbrella project containing many subprojects specific to a given database, developed in collaboration with many companies and developers.
The mission of Spring Data is to provide a unified programming interface for all kinds of data access, whether a relational database (such as MySQL), a non-relational database (such as Redis), or an index database such as Elasticsearch, thereby simplifying developers' code and improving development efficiency.
Modules containing many different data operations:
Page of Spring Data Elasticsearch: https://projects.spring.io/spring-data-elasticsearch/
features:
- Supports Spring's @Configuration-based Java configuration or XML configuration
- Provides **ElasticsearchTemplate**, a convenient utility class for operating ES, including automatic intelligent mapping between documents and POJOs
- Uses Spring's data conversion service for feature-rich object mapping
- Annotation-based metadata mapping, extensible to support more data formats
- Automatically generates implementations from persistence-layer interfaces, without hand-writing basic operation code (similar to MyBatis, implemented automatically from the interface); custom queries are also supported
Create Demo project
Create it directly with Maven
pom dependency:
```
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <groupId>com.leyou.demo</groupId>
    <artifactId>elasticsearch</artifactId>
    <version>0.0.1-SNAPSHOT</version>
    <packaging>jar</packaging>

    <name>elasticsearch</name>
    <description>Demo project for Spring Boot</description>

    <parent>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-parent</artifactId>
        <version>2.0.6.RELEASE</version>
        <relativePath/> <!-- lookup parent from repository -->
    </parent>

    <properties>
        <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
        <project.reporting.outputEncoding>UTF-8</project.reporting.outputEncoding>
        <java.version>1.8</java.version>
    </properties>

    <dependencies>
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-data-elasticsearch</artifactId>
        </dependency>
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-test</artifactId>
            <scope>test</scope>
        </dependency>
    </dependencies>

    <build>
        <plugins>
            <plugin>
                <groupId>org.springframework.boot</groupId>
                <artifactId>spring-boot-maven-plugin</artifactId>
            </plugin>
        </plugins>
    </build>
</project>
```
application.yml file configuration:
```
spring:
  data:
    elasticsearch:
      cluster-name: elasticsearch
      cluster-nodes: 192.168.56.101:9300
```
Entity class
First, we prepare the entity class:
```
public class Item {
    Long id;
    String title;    // title
    String category; // category
    String brand;    // brand
    Double price;    // price
    String images;   // image URL
}
```
mapping
Spring Data declares field mapping properties through annotations. There are three annotations:
- @Document acts on the class, marking the entity class as a document object. It commonly has four attributes
  - indexName: the name of the corresponding index
  - type: the type within the index
  - shards: the number of shards, default 5
  - replicas: the number of replicas, default 1
- @Id acts on a member variable, marking it as the id primary key
- @Field acts on a member variable, marking it as a field of the document and specifying its mapping properties:
  - type: the field type, an enumeration value of FieldType
  - index: whether to index; boolean, default true
  - store: whether to store; boolean, default false
  - analyzer: the name of the analyzer, e.g. ik_max_word
Example:
```
@Document(indexName = "item", type = "docs", shards = 1, replicas = 0)
public class Item {
    @Id
    private Long id;

    @Field(type = FieldType.Text, analyzer = "ik_max_word")
    private String title;    // title

    @Field(type = FieldType.Keyword)
    private String category; // category

    @Field(type = FieldType.Keyword)
    private String brand;    // brand

    @Field(type = FieldType.Double)
    private Double price;    // price

    @Field(index = false, type = FieldType.Keyword)
    private String images;   // image URL
}
```
Operation index
Basic concepts
Elasticsearch is also a Lucene-based full-text search library; in essence it stores data, and many of its concepts are similar to MySQL's.
Comparison relationship:
```
Index (indices) -------------- Database (database)
Type (type) ------------------ Table (table)
Document (document) ---------- Row (row)
Field (field) ---------------- Column (column)
```
Detailed description:
Concept | Description |
---|---|
Index | indices is the plural of index, representing many indexes. |
Type | Type models the concept of a table in MySQL. One index can hold different types of data, such as a goods index and an order index, with different data formats. However, this tends to make the index confusing, so the concept will be removed in future versions. |
Document | The original data stored in the index. For example, each piece of product information is a document. |
Field | A property within a document. |
Mapping configuration | The data type, properties, indexing, storage, and other characteristics of fields. |
These are quite similar to the concepts in Lucene and Solr.
In addition, there are some cluster related concepts in SolrCloud, and similar concepts in Elasticsearch:
- index (indices, plural): a complete logical index collection
- shard: each piece the data is split into
- replica: a copy of each shard
It should be noted that Elasticsearch itself is distributed, so even if you have only one node, Elasticsearch will slice and copy your data by default. When you add new data to the cluster, the data will be balanced among the newly added nodes.
Create index
Syntax
Elasticsearch uses a REST-style API, so each API call is just an HTTP request; you can issue it with any HTTP tool.
Request format for index creation:
- Request method: PUT
- Request path: /index_name
- Request body: JSON, for example:
{ "settings": { "number_of_shards": 3, "number_of_replicas": 2 } }
- settings: the index settings
  - number_of_shards: the number of shards
  - number_of_replicas: the number of replicas
Create with kibana
kibana's console can simplify http requests, for example:
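A sketch, creating the heima index used in later examples with the settings shown above:

```
PUT /heima
{
  "settings": {
    "number_of_shards": 3,
    "number_of_replicas": 2
  }
}
```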
Note that the Elasticsearch server address is omitted.
The console also offers syntax hints, which is very convenient.
View index settings
Syntax
A GET request lets us view the index information. Format:
GET /index_name
Alternatively, we can use * to view the configuration of all indexes:
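For example:

```
GET /*
```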
Delete index
Use DELETE request
Syntax
DELETE /index_name
Example
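For instance, deleting the heima index:

```
DELETE /heima
```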
Of course, we can also use the HEAD request to check whether the index exists:
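For example (this returns 200 - OK if the index exists, and 404 otherwise):

```
HEAD /heima
```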
Mapping configuration
Create mapping field
Syntax
The request method is still PUT:
```
PUT /index_name/_mapping/type_name
{
  "properties": {
    "field_name": {
      "type": "type",
      "index": true,
      "store": true,
      "analyzer": "analyzer name"
    }
  }
}
```
- Type name: the type concept, similar to different tables in a database
- Field name: filled in freely; a number of properties can be specified, such as:
  - type: the field type, e.g. text, long, short, date, integer, object, etc.
  - index: whether to index; default true
  - store: whether to store; default false
  - analyzer: the analyzer; ik_max_word here uses the IK analyzer
Example
Initiate request:
```
PUT heima/_mapping/goods
{
  "properties": {
    "title": {
      "type": "text",
      "analyzer": "ik_max_word"
    },
    "images": {
      "type": "keyword",
      "index": "false"
    },
    "price": {
      "type": "float"
    }
  }
}
```
Response result:
{ "acknowledged": true }
View mapping relationships
Syntax:
GET /index_name/_mapping
Example:
GET /heima/_mapping
Response:
{ "heima": { "mappings": { "goods": { "properties": { "images": { "type": "keyword", "index": false }, "price": { "type": "float" }, "title": { "type": "text", "analyzer": "ik_max_word" } } } } } }
Detailed explanation of field properties
type
The data types supported in Elasticsearch are very rich:
Let's highlight a few key ones:
- String: two types:
  - text: analyzed (tokenized); cannot be used in aggregations
  - keyword: not analyzed; the value is matched as a complete field and can participate in aggregations
- Numeric: two categories:
  - basic data types: long, integer, short, byte, double, float, half_float
  - scaled_float, a high-precision floating-point type: you specify a scaling factor such as 10 or 100; Elasticsearch multiplies the real value by this factor before storing it and restores it when reading it back (see the mapping sketch below)
- Date: date type. Elasticsearch can store dates as formatted strings, but it is recommended to store them as long millisecond values to save space.
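A sketch of a scaled_float mapping (the demo index and goods type here are hypothetical; the type must be chosen when the mapping is first created, since the type of an existing field cannot be changed):

```
PUT /demo/_mapping/goods
{
  "properties": {
    "price": {
      "type": "scaled_float",
      "scaling_factor": 100
    }
  }
}
```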
index
index controls whether the field is indexed.
- true: the field is indexed and can be searched; this is the default
- false: the field is not indexed and cannot be searched
Since the default is true, every field is indexed without any configuration.
For fields we don't want searchable, such as a product's image URL, we set index to false manually.
store
Whether to store the data additionally.
When learning Lucene and Solr, we saw that if a field's store is set to false, its value does not appear in the document list and is not shown in the user's search results.
In Elasticsearch, however, results can be found even when store is set to false.
The reason is that when Elasticsearch indexes a document, it backs up the original document into an attribute called _source. We can also filter _source to choose which fields to return and which not to.
If store is set to true, an extra copy of the data is kept outside _source, which is redundant. So we generally set store to false; in fact, false is the default.
boost
boost is the weighting factor, the same as in Lucene.
The other properties will not be covered one by one; we rarely use them. See the official documentation:
New data
Randomly generated id
Through POST request, data can be added to an existing index library.
Syntax:
```
POST /index_name/type_name
{
  "key": "value"
}
```
Example:
```
POST /heima/goods/
{
  "title": "Mi phones",
  "images": "http://image.leyou.com/12479122.jpg",
  "price": 2699.00
}
```
Response:
{ "_index": "heima", "_type": "goods", "_id": "r9c1KGMBIhaxtY5rlRKv", "_version": 1, "result": "created", "_shards": { "total": 3, "successful": 1, "failed": 0 }, "_seq_no": 0, "_primary_term": 2 }
View data through kibana:
```
GET _search
{
  "query": {
    "match_all": {}
  }
}
```
{ "_index": "heima", "_type": "goods", "_id": "r9c1KGMBIhaxtY5rlRKv", "_version": 1, "_score": 1, "_source": { "title": "Mi phones", "images": "http://image.leyou.com/12479122.jpg", "price": 2699 } }
- _source: the source document data; everything is in there.
- _id: the unique identifier of this document; it is not tied to the document's own id field
Custom id
If we want to specify the id when adding, we can do this:
```
POST /index_name/type_name/id
{
  ...
}
```
Example:
```
POST /heima/goods/2
{
  "title": "Rice mobile phone",
  "images": "http://image.leyou.com/12479122.jpg",
  "price": 2899.00
}
```
Data obtained:
{ "_index": "heima", "_type": "goods", "_id": "2", "_score": 1, "_source": { "title": "Rice mobile phone", "images": "http://image.leyou.com/12479122.jpg", "price": 2899 } }
Intelligent judgment
When learning Solr, we found that when adding data, we can only use the fields with mapping attributes configured in advance, otherwise an error will be reported.
However, Elasticsearch has no such restriction.
In fact, Elasticsearch is quite intelligent: you don't need to configure any mapping for the index; it infers the type from the data you insert and adds the mapping dynamically.
Test:
```
POST /heima/goods/3
{
  "title": "Super meter mobile phone",
  "images": "http://image.leyou.com/12479122.jpg",
  "price": 2899.00,
  "stock": 200,
  "saleable": true
}
```
We added two extra fields: stock (inventory) and saleable (whether the item is on sale).
Look at the results:
{ "_index": "heima", "_type": "goods", "_id": "3", "_version": 1, "_score": 1, "_source": { "title": "Super meter mobile phone", "images": "http://image.leyou.com/12479122.jpg", "price": 2899, "stock": 200, "saleable": true } }
Look at the mapping relationship of the index library:
{ "heima": { "mappings": { "goods": { "properties": { "images": { "type": "keyword", "index": false }, "price": { "type": "float" }, "saleable": { "type": "boolean" }, "stock": { "type": "long" }, "title": { "type": "text", "analyzer": "ik_max_word" } } } } } }
Both stock and saleable were successfully mapped.
Modify data
Changing the request method of the add operation to PUT performs a modification. Note that the id must be specified:
- If the document corresponding to id exists, modify it
- If the document corresponding to id does not exist, it will be added
For example, we modify the data with id 3:
```
PUT /heima/goods/3
{
  "title": "Super rice mobile phone",
  "images": "http://image.leyou.com/12479122.jpg",
  "price": 3899.00,
  "stock": 100,
  "saleable": true
}
```
Result (querying the document again):

```
{
  "took": 17,
  "timed_out": false,
  "_shards": { "total": 9, "successful": 9, "skipped": 0, "failed": 0 },
  "hits": {
    "total": 1,
    "max_score": 1,
    "hits": [
      { "_index": "heima", "_type": "goods", "_id": "3", "_score": 1,
        "_source": { "title": "Super rice mobile phone", "images": "http://image.leyou.com/12479122.jpg",
                     "price": 3899, "stock": 100, "saleable": true } }
    ]
  }
}
```
Delete data
Deletion uses a DELETE request, by id:
Syntax
DELETE /index_name/type_name/id
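For example, deleting the document with id 3 added above:

```
DELETE /heima/goods/3
```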
Instance - ElasticsearchTemplate index operations
Create indexes and maps
Create index
ElasticsearchTemplate provides APIs for creating indexes:
An index can be generated automatically from the class information, or you can specify the indexName and settings manually.
mapping
The mapping-related APIs:
The mapping can be generated from the class's bytecode information (its annotation configuration), or written by hand.
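The relevant overloads look roughly like this (a sketch abbreviated from the Spring Data Elasticsearch 3.0 ElasticsearchOperations interface):

```
// create an index from the @Document annotation of a class, or by name,
// optionally with explicit settings
<T> boolean createIndex(Class<T> clazz);
boolean createIndex(String indexName);
boolean createIndex(String indexName, Object settings);

// put a mapping generated from a class's annotations, or supply one manually
<T> boolean putMapping(Class<T> clazz);
boolean putMapping(String indexName, String type, Object mapping);
```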
Here we use the bytecode information of the class to create an index and map:
```
@RunWith(SpringRunner.class)
@SpringBootTest(classes = ItcastElasticsearchApplication.class)
public class IndexTest {

    @Autowired
    private ElasticsearchTemplate elasticsearchTemplate;

    @Test
    public void testCreate(){
        // creates the index according to the @Document annotation on Item
        elasticsearchTemplate.createIndex(Item.class);
        // configures the mapping from the @Id and @Field annotations in Item
        elasticsearchTemplate.putMapping(Item.class);
    }
}
```
result:
```
GET /item
```

```
{
  "item": {
    "aliases": {},
    "mappings": {
      "docs": {
        "properties": {
          "brand": { "type": "keyword" },
          "category": { "type": "keyword" },
          "images": { "type": "keyword", "index": false },
          "price": { "type": "double" },
          "title": { "type": "text", "analyzer": "ik_max_word" }
        }
      }
    },
    "settings": {
      "index": {
        "refresh_interval": "1s",
        "number_of_shards": "1",
        "provided_name": "item",
        "creation_date": "1525405022589",
        "store": { "type": "fs" },
        "number_of_replicas": "0",
        "uuid": "4sE9SAw3Sqq1aAPz5F6OEg",
        "version": { "created": "6020499" }
      }
    }
  }
}
```
Delete index
The APIs for deleting indexes:
An index can be deleted by class or by index name.
Example:
```
@Test
public void deleteIndex() {
    elasticsearchTemplate.deleteIndex("heima");
}
```
Instance - Repository document operations
The strength of Spring Data is that you don't write any DAO code; CRUD operations are performed automatically based on method names and class information. You only need to define an interface extending one of the Repository sub-interfaces to get the basic CRUD functionality.
So we just define the interface:
```
public interface ItemRepository extends ElasticsearchRepository<Item, Long> {
}
```
Let's take a look at the inheritance relationship of the Repository:
We see an ElasticsearchRepository interface:
New document
```
@Autowired
private ItemRepository itemRepository;

@Test
public void index() {
    Item item = new Item(1L, "Xiaomi mobile phone 7", " mobile phone", "millet",
            3499.00, "http://image.leyou.com/13123.jpg");
    itemRepository.save(item);
}
```
Check the page:
GET /item/_search
result:
{ "took": 14, "timed_out": false, "_shards": { "total": 1, "successful": 1, "skipped": 0, "failed": 0 }, "hits": { "total": 1, "max_score": 1, "hits": [ { "_index": "item", "_type": "docs", "_id": "1", "_score": 1, "_source": { "id": 1, "title": "Xiaomi mobile phone 7", "category": " mobile phone", "brand": "millet", "price": 3499, "images": "http://image.leyou.com/13123.jpg" } } ] } }
Batch add
code:
```
@Test
public void indexList() {
    List<Item> list = new ArrayList<>();
    list.add(new Item(2L, "Nut phone R1", " mobile phone", "hammer", 3699.00, "http://image.leyou.com/123.jpg"));
    list.add(new Item(3L, "Huawei META10", " mobile phone", "Huawei", 4499.00, "http://image.leyou.com/3.jpg"));
    // saveAll takes a collection, enabling batch insertion
    itemRepository.saveAll(list);
}
```
Go to the page again to query:
{ "took": 5, "timed_out": false, "_shards": { "total": 1, "successful": 1, "skipped": 0, "failed": 0 }, "hits": { "total": 3, "max_score": 1, "hits": [ { "_index": "item", "_type": "docs", "_id": "2", "_score": 1, "_source": { "id": 2, "title": "Nut phone R1", "category": " mobile phone", "brand": "hammer", "price": 3699, "images": "http://image.leyou.com/13123.jpg" } }, { "_index": "item", "_type": "docs", "_id": "3", "_score": 1, "_source": { "id": 3, "title": "Huawei META10", "category": " mobile phone", "brand": "Huawei", "price": 4499, "images": "http://image.leyou.com/13123.jpg" } }, { "_index": "item", "_type": "docs", "_id": "1", "_score": 1, "_source": { "id": 1, "title": "Xiaomi mobile phone 7", "category": " mobile phone", "brand": "millet", "price": 3499, "images": "http://image.leyou.com/13123.jpg" } } ] } }
Modify document
Modification uses the same interface as insertion; they are distinguished by id, similar to issuing PUT requests in the console.
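A minimal sketch (the new title and price here are hypothetical; it assumes the document with id 1 from above already exists):

```
@Test
public void update() {
    // save() with an existing id overwrites that document
    Item item = new Item(1L, "Xiaomi mobile phone 7S", " mobile phone", "millet",
            3299.00, "http://image.leyou.com/13123.jpg");
    itemRepository.save(item);
}
```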
Basic query
ElasticsearchRepository provides some basic query methods:
Let's try to query all:
```
@Test
public void testQuery(){
    Optional<Item> optional = this.itemRepository.findById(1L);
    System.out.println(optional.get());
}

@Test
public void testFind(){
    // query all, sorted by price in descending order
    Iterable<Item> items = this.itemRepository.findAll(Sort.by(Sort.Direction.DESC, "price"));
    items.forEach(item -> System.out.println(item));
}
```
result:
Custom method
Another powerful feature of Spring Data is the automatic implementation of functions based on method names.
For example, if a method is named findByTitle, Spring Data knows it queries by title and implements it for you automatically; no implementation class is needed.
Of course, the method name must comply with certain conventions:
Keyword | Sample | Elasticsearch Query String |
---|---|---|
And | findByNameAndPrice | {"bool" : {"must" : [ {"field" : {"name" : "?"}}, {"field" : {"price" : "?"}} ]}} |
Or | findByNameOrPrice | {"bool" : {"should" : [ {"field" : {"name" : "?"}}, {"field" : {"price" : "?"}} ]}} |
Is | findByName | {"bool" : {"must" : {"field" : {"name" : "?"}}}} |
Not | findByNameNot | {"bool" : {"must_not" : {"field" : {"name" : "?"}}}} |
Between | findByPriceBetween | {"bool" : {"must" : {"range" : {"price" : {"from" : ?,"to" : ?,"include_lower" : true,"include_upper" : true}}}}} |
LessThanEqual | findByPriceLessThan | {"bool" : {"must" : {"range" : {"price" : {"from" : null,"to" : ?,"include_lower" : true,"include_upper" : true}}}}} |
GreaterThanEqual | findByPriceGreaterThan | {"bool" : {"must" : {"range" : {"price" : {"from" : ?,"to" : null,"include_lower" : true,"include_upper" : true}}}}} |
Before | findByPriceBefore | {"bool" : {"must" : {"range" : {"price" : {"from" : null,"to" : ?,"include_lower" : true,"include_upper" : true}}}}} |
After | findByPriceAfter | {"bool" : {"must" : {"range" : {"price" : {"from" : ?,"to" : null,"include_lower" : true,"include_upper" : true}}}}} |
Like | findByNameLike | {"bool" : {"must" : {"field" : {"name" : {"query" : "?*","analyze_wildcard" : true}}}}} |
StartingWith | findByNameStartingWith | {"bool" : {"must" : {"field" : {"name" : {"query" : "?*","analyze_wildcard" : true}}}}} |
EndingWith | findByNameEndingWith | {"bool" : {"must" : {"field" : {"name" : {"query" : "*?","analyze_wildcard" : true}}}}} |
Contains/Containing | findByNameContaining | {"bool" : {"must" : {"field" : {"name" : {"query" : "**?**","analyze_wildcard" : true}}}}} |
In | findByNameIn(Collection<String>names) | {"bool" : {"must" : {"bool" : {"should" : [ {"field" : {"name" : "?"}}, {"field" : {"name" : "?"}} ]}}}} |
NotIn | findByNameNotIn(Collection<String>names) | {"bool" : {"must_not" : {"bool" : {"should" : {"field" : {"name" : "?"}}}}}} |
Near | findByStoreNear | Not Supported Yet ! |
True | findByAvailableTrue | {"bool" : {"must" : {"field" : {"available" : true}}}} |
False | findByAvailableFalse | {"bool" : {"must" : {"field" : {"available" : false}}}} |
OrderBy | findByAvailableTrueOrderByNameDesc | {"sort" : [{ "name" : {"order" : "desc"} }],"bool" : {"must" : {"field" : {"available" : true}}}} |
For example, let's define a method to query by price range:
```
public interface ItemRepository extends ElasticsearchRepository<Item, Long> {

    /**
     * Query by price range
     * @param price1 lower bound
     * @param price2 upper bound
     */
    List<Item> findByPriceBetween(double price1, double price2);
}
```
Then add some test data:
```
@Test
public void indexList() {
    List<Item> list = new ArrayList<>();
    list.add(new Item(1L, "Xiaomi mobile phone 7", "mobile phone", "millet", 3299.00, "http://image.leyou.com/13123.jpg"));
    list.add(new Item(2L, "Nut phone R1", "mobile phone", "hammer", 3699.00, "http://image.leyou.com/13123.jpg"));
    list.add(new Item(3L, "Huawei META10", "mobile phone", "Huawei", 4499.00, "http://image.leyou.com/13123.jpg"));
    list.add(new Item(4L, "millet Mix2S", "mobile phone", "millet", 4299.00, "http://image.leyou.com/13123.jpg"));
    list.add(new Item(5L, "glory V10", "mobile phone", "Huawei", 2799.00, "http://image.leyou.com/13123.jpg"));
    // saveAll takes a collection, enabling batch insertion
    itemRepository.saveAll(list);
}
```
There is no need to write the implementation class, and then we run it directly:
```
@Test
public void queryByPriceBetween(){
    List<Item> list = this.itemRepository.findByPriceBetween(2000.00, 3500.00);
    for (Item item : list) {
        System.out.println("item = " + item);
    }
}
```
result:
Although the basic queries and custom methods are powerful, they are not enough for complex queries (fuzzy, wildcard, term queries, etc.). In those cases we have to use native queries.
query
Let's look at queries in four parts:
- basic queries
- result filtering (_source filtering)
- advanced queries
- sorting
Basic query
Basic syntax
```
GET /index_name/_search
{
  "query": {
    "query_type": {
      "query_condition": "condition value"
    }
  }
}
```
Here query represents a query object, which can contain different query properties:
- query type: e.g. match_all, match, term, range, etc.
- query conditions: these are written differently depending on the type, and will be explained in detail later
Query all (match_all)
Example:
```
GET /heima/_search
{
  "query": {
    "match_all": {}
  }
}
```
- Query: represents the query object
- match_all: query all
result:
{ "took": 2, "timed_out": false, "_shards": { "total": 3, "successful": 3, "skipped": 0, "failed": 0 }, "hits": { "total": 2, "max_score": 1, "hits": [ { "_index": "heima", "_type": "goods", "_id": "2", "_score": 1, "_source": { "title": "Rice mobile phone", "images": "http://image.leyou.com/12479122.jpg", "price": 2899 } }, { "_index": "heima", "_type": "goods", "_id": "r9c1KGMBIhaxtY5rlRKv", "_score": 1, "_source": { "title": "Mi phones", "images": "http://image.leyou.com/12479122.jpg", "price": 2699 } } ] } }
- took: query time, in milliseconds
- timed_out: whether the query timed out
- _shards: shard information
- hits: overview of the search results
  - total: total number of matching documents
  - max_score: the highest document score among all results
  - hits: the array of matched documents; each element is one document's information
    - _index: the index
    - _type: the document type
    - _id: the document id
    - _score: the document score
    - _source: the document's source data
match query
Let's add a piece of data to facilitate the test:
```
PUT /heima/goods/3
{
  "title": "Xiaomi TV 4 A",
  "images": "http://image.leyou.com/12479122.jpg",
  "price": 3899.00
}
```
Now, there are 2 mobile phones and 1 TV in the index library:
- or relation
A match query analyzes the query string into terms and then searches; the relationship between the terms is or:
```
GET /heima/_search
{
  "query": {
    "match": { "title": "Mi TV" }
  }
}
```
result:
"hits": { "total": 2, "max_score": 0.6931472, "hits": [ { "_index": "heima", "_type": "goods", "_id": "tmUBomQB_mwm6wH_EC1-", "_score": 0.6931472, "_source": { "title": "Mi phones", "images": "http://image.leyou.com/12479122.jpg", "price": 2699 } }, { "_index": "heima", "_type": "goods", "_id": "3", "_score": 0.5753642, "_source": { "title": "Xiaomi TV 4 A", "images": "http://image.leyou.com/12479122.jpg", "price": 3899 } } ] }
In the case above, not only the TV but also anything related to Xiaomi is returned; the relationship between the terms is or.
- and relation
In some cases, we need to find more accurately. We want this relationship to become and, which can be done as follows:
```
GET /heima/_search
{
  "query": {
    "match": {
      "title": {
        "query": "Mi TV",
        "operator": "and"
      }
    }
  }
}
```
result:
{ "took": 2, "timed_out": false, "_shards": { "total": 3, "successful": 3, "skipped": 0, "failed": 0 }, "hits": { "total": 1, "max_score": 0.5753642, "hits": [ { "_index": "heima", "_type": "goods", "_id": "3", "_score": 0.5753642, "_source": { "title": "Xiaomi TV 4 A", "images": "http://image.leyou.com/12479122.jpg", "price": 3899 } } ] } }
In this example, only entries containing both Xiaomi and TV will be searched.
- Between or and and?
Choosing between or and and is a bit black-or-white. If the user's query analyzes into 5 terms, how do we find documents containing only 4 of them? Setting operator to and would exclude those documents.
Sometimes that is exactly what we want, but in most full-text search scenarios we want to include documents that may be relevant while excluding those unlikely to be. In other words, we want something in between.
The match query supports the minimum_should_match parameter, which lets us specify how many terms must match for a document to count as relevant. It can be set to a specific number, but more commonly to a percentage, since we cannot control how many words a user types:
```
GET /heima/_search
{
  "query": {
    "match": {
      "title": {
        "query": "Millet curved TV",
        "minimum_should_match": "75%"
      }
    }
  }
}
```
The search phrase here analyzes into three terms. With the and relationship, a document would have to contain all three. With minimum_should_match: 75%, a document only has to match 75% of the terms: 3 × 75% ≈ 2, so any document containing 2 of the terms qualifies.
result:
multi_match
multi_match is similar to match, except that it can be queried in multiple fields
```
GET /heima/_search
{
  "query": {
    "multi_match": {
      "query": "millet",
      "fields": ["title", "subTitle"]
    }
  }
}
```
In this example, we look for the word millet in both the title and subTitle fields.
Term matching
A term query is used for exact value matching, which may be a number, a date, a boolean, or a non-analyzed string:
```
GET /heima/_search
{
  "query": {
    "term": { "price": 2699.00 }
  }
}
```
result:
{ "took": 2, "timed_out": false, "_shards": { "total": 3, "successful": 3, "skipped": 0, "failed": 0 }, "hits": { "total": 1, "max_score": 1, "hits": [ { "_index": "heima", "_type": "goods", "_id": "r9c1KGMBIhaxtY5rlRKv", "_score": 1, "_source": { "title": "Mi phones", "images": "http://image.leyou.com/12479122.jpg", "price": 2699 } } ] } }
Multi entry exact matching (terms)
A terms query is like a term query, but allows specifying multiple values; a document matches if the field contains any of them:
```
GET /heima/_search
{
  "query": {
    "terms": {
      "price": [2699.00, 2899.00, 3899.00]
    }
  }
}
```
result:
{ "took": 4, "timed_out": false, "_shards": { "total": 3, "successful": 3, "skipped": 0, "failed": 0 }, "hits": { "total": 3, "max_score": 1, "hits": [ { "_index": "heima", "_type": "goods", "_id": "2", "_score": 1, "_source": { "title": "Rice mobile phone", "images": "http://image.leyou.com/12479122.jpg", "price": 2899 } }, { "_index": "heima", "_type": "goods", "_id": "r9c1KGMBIhaxtY5rlRKv", "_score": 1, "_source": { "title": "Mi phones", "images": "http://image.leyou.com/12479122.jpg", "price": 2699 } }, { "_index": "heima", "_type": "goods", "_id": "3", "_score": 1, "_source": { "title": "Xiaomi TV 4 A", "images": "http://image.leyou.com/12479122.jpg", "price": 3899 } } ] } }
Result filtering
By default, Elasticsearch returns all fields of the document's _source in the search results.
If we only want some of them, we can add _source filtering.
Specify fields directly
Example:
```
GET /heima/_search
{
  "_source": ["title", "price"],
  "query": {
    "term": { "price": 2699 }
  }
}
```
Returned results:
{ "took": 12, "timed_out": false, "_shards": { "total": 3, "successful": 3, "skipped": 0, "failed": 0 }, "hits": { "total": 1, "max_score": 1, "hits": [ { "_index": "heima", "_type": "goods", "_id": "r9c1KGMBIhaxtY5rlRKv", "_score": 1, "_source": { "price": 2699, "title": "Mi phones" } } ] } }
Specify includes and excludes
We can also specify:
- includes: the fields to include in the results
- excludes: the fields to exclude from the results
Both are optional.
Example:
```
GET /heima/_search
{
  "_source": {
    "includes": ["title", "price"]
  },
  "query": {
    "term": { "price": 2699 }
  }
}
```
The result is the same as with:
```
GET /heima/_search
{
  "_source": {
    "excludes": ["images"]
  },
  "query": {
    "term": { "price": 2699 }
  }
}
```
Advanced query
Boolean combination (bool)
bool combines other queries with must (and), must_not (not), and should (or):
```
GET /heima/_search
{
  "query": {
    "bool": {
      "must":     { "match": { "title": "rice" }},
      "must_not": { "match": { "title": "television" }},
      "should":   { "match": { "title": "mobile phone" }}
    }
  }
}
```
result:
{ "took": 10, "timed_out": false, "_shards": { "total": 3, "successful": 3, "skipped": 0, "failed": 0 }, "hits": { "total": 1, "max_score": 0.5753642, "hits": [ { "_index": "heima", "_type": "goods", "_id": "2", "_score": 0.5753642, "_source": { "title": "Rice mobile phone", "images": "http://image.leyou.com/12479122.jpg", "price": 2899 } } ] } }
Range query
A range query finds numbers or dates that fall within a specified range:
```
GET /heima/_search
{
  "query": {
    "range": {
      "price": {
        "gte": 1000.0,
        "lt": 2800.00
      }
    }
  }
}
```
The range query accepts the following operators:
Operator | Description |
---|---|
gt | greater than |
gte | Greater than or equal to |
lt | less than |
lte | Less than or equal to |
Fuzzy query
We added a new item:
```
POST /heima/goods/4
{
  "title": "apple mobile phone",
  "images": "http://image.leyou.com/12479122.jpg",
  "price": 6899.00
}
```
The fuzzy query is the fuzzy counterpart of the term query: it allows the search term to deviate from the actual term in spelling, as long as the edit distance (the number of differing characters) is at most 2:
```
GET /heima/_search
{
  "query": {
    "fuzzy": { "title": "appla" }
  }
}
```
The query above also finds the apple mobile phone.
We can specify the allowable editing distance through fuzziness:
```
GET /heima/_search
{
  "query": {
    "fuzzy": {
      "title": {
        "value": "appla",
        "fuzziness": 1
      }
    }
  }
}
```
Filter
Filter in condition query
All queries affect document scoring and ranking. To filter results without letting the filter condition affect the score, don't place it among the query conditions; use filter instead:
```
GET /heima/_search
{
  "query": {
    "bool": {
      "must": { "match": { "title": "Mi phones" }},
      "filter": {
        "range": { "price": { "gt": 2000.00, "lt": 3800.00 }}
      }
    }
  }
}
```
Note: bool combination condition filtering can also be performed again in the filter.
No query criteria, direct filtering
If a query has only filters and no query conditions, and scoring is not wanted, constant_score can replace a bool query that contains only a filter. The performance is identical, but it makes the query simpler and clearer.

```
GET /heima/_search
{
  "query": {
    "constant_score": {
      "filter": {
        "range": { "price": { "gt": 2000.00, "lt": 3000.00 }}
      }
    }
  }
}
```
sort
Single field sorting
sort lets us order results by different fields, with the direction specified by order:
```
GET /heima/_search
{
  "query": {
    "match": { "title": "Mi phones" }
  },
  "sort": [
    { "price": { "order": "desc" }}
  ]
}
```
Multi field sorting
Suppose we want to use price and _score (score) together for query, and the matching results are sorted by price first, and then by correlation score:
```
GET /goods/_search
{
  "query": {
    "bool": {
      "must": { "match": { "title": "Mi phones" }},
      "filter": {
        "range": { "price": { "gt": 200000, "lt": 300000 }}
      }
    }
  },
  "sort": [
    { "price": { "order": "desc" }},
    { "_score": { "order": "desc" }}
  ]
}
```
Instance - advanced query
Basic query
Let's look at basic usage first:
```
@Test
public void testQuery(){
    // build a match (analyzed) query on the title field
    MatchQueryBuilder queryBuilder = QueryBuilders.matchQuery("title", "millet");
    // execute the query
    Iterable<Item> items = this.itemRepository.search(queryBuilder);
    items.forEach(System.out::println);
}
```
The search method of the Repository takes a QueryBuilder parameter; Elasticsearch provides the QueryBuilders utility class for this:
QueryBuilders has a large number of static methods for generating all kinds of query objects: term, fuzzy, wildcard, and other QueryBuilder objects.
result:
The built-in query methods are convenient, but not flexible enough; filtering and aggregation queries are hard to express with them.
Custom query
Let's first look at the most basic match query:
```
@Test
public void testNativeQuery(){
    // build a custom query builder
    NativeSearchQueryBuilder queryBuilder = new NativeSearchQueryBuilder();
    // add a basic analyzed (match) query
    queryBuilder.withQuery(QueryBuilders.matchQuery("title", "millet"));
    // execute the search and get the results
    Page<Item> items = this.itemRepository.search(queryBuilder.build());
    // total number of elements
    System.out.println(items.getTotalElements());
    // total number of pages
    System.out.println(items.getTotalPages());
    items.forEach(System.out::println);
}
```
- NativeSearchQueryBuilder: a query builder provided by Spring Data that helps construct the JSON request body
- Page<Item>: the query is paged by default, so a page result object is returned, containing:
  - totalElements: total number of elements
  - totalPages: total number of pages
  - an iterator: the page implements the Iterable interface, so the current page's data can be iterated directly
  - other properties:
result
Paging query
Using NativeSearchQueryBuilder, paging can be easily realized:
```
@Test
public void testNativeQuery(){
    // build the query criteria
    NativeSearchQueryBuilder queryBuilder = new NativeSearchQueryBuilder();
    // add a basic term query
    queryBuilder.withQuery(QueryBuilders.termQuery("category", "mobile phone"));
    // initialize the paging parameters
    int page = 0;
    int size = 3;
    // set the paging parameters
    queryBuilder.withPageable(PageRequest.of(page, size));
    // execute the search and get the results
    Page<Item> items = this.itemRepository.search(queryBuilder.build());
    System.out.println(items.getTotalElements()); // total elements
    System.out.println(items.getTotalPages());    // total pages
    System.out.println(items.getSize());          // page size
    System.out.println(items.getNumber());        // current page
    items.forEach(System.out::println);
}
```
result:
Note that paging in Elasticsearch starts from page 0.
sort
Sorting is also done through NativeSearchQueryBuilder:
```
@Test
public void testSort(){
    // build the query criteria
    NativeSearchQueryBuilder queryBuilder = new NativeSearchQueryBuilder();
    // add a basic term query
    queryBuilder.withQuery(QueryBuilders.termQuery("category", "mobile phone"));
    // sort by price, descending
    queryBuilder.withSort(SortBuilders.fieldSort("price").order(SortOrder.DESC));
    // execute the search and get the results
    Page<Item> items = this.itemRepository.search(queryBuilder.build());
    System.out.println(items.getTotalElements());
    items.forEach(System.out::println);
}
```
result:
aggregations
Aggregation makes it extremely convenient for us to realize the statistics and analysis of data. For example:
- Which brand of mobile phone is the most popular?
- What are the average, highest, and lowest prices of these phones?
- How are monthly sales of these phones doing?
Implementing such statistics is much more convenient than SQL in a database, and the queries are so fast that near-real-time results are possible.
Basic concepts
There are many types of aggregations in Elasticsearch. The two most commonly used are bucket and metric:
bucket
A bucket groups data in a certain way; each group is called a bucket in ES. For example, bucketing people by nationality gives a Chinese bucket, a British bucket, a Japanese bucket... Or we can bucket people by age range: 0-10, 10-20, 20-30, 30-40, and so on.
There are many ways to divide buckets in Elasticsearch:
- Date Histogram Aggregation: bucket by date interval; for example, given an interval of one week, each week automatically forms a group
- Histogram Aggregation: bucket by numeric interval, similar to dates
- Terms Aggregation: bucket by term content; exactly matching terms form one group
- Range Aggregation: bucket values or dates by ranges, specifying the start and end of each segment
- ......
bucket aggregations are only responsible for grouping the data, not for computing over it. So another kind of aggregation is often nested inside a bucket: metrics aggregations.
metrics
After bucketing, we usually compute over the data within each group, e.g. average, maximum, minimum, sum, and so on; ES calls these metrics.
Some common measurement aggregation methods:
- Avg Aggregation: Average
- Max Aggregation: find the maximum value
- Min Aggregation: find the minimum value
- Percentiles Aggregation: calculate the percentage
- Stats Aggregation: returns avg, max, min, sum, count, etc
- Sum Aggregation: Sum
- Top hits Aggregation: return the top matching documents
- Value Count Aggregation: find the total number
- ......
To test aggregations, let's first bulk-import some data.
Create index:
```
PUT /cars
{
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 0
  },
  "mappings": {
    "transactions": {
      "properties": {
        "color": { "type": "keyword" },
        "make": { "type": "keyword" }
      }
    }
  }
}
```
Note: in ES, fields used for aggregation, sorting, and filtering are handled in a special way and therefore cannot be analyzed. Here we set the color and make fields to the keyword type, which is not analyzed and can participate in aggregations later.
Import data
```
POST /cars/transactions/_bulk
{ "index": {}}
{ "price" : 10000, "color" : "red", "make" : "honda", "sold" : "2014-10-28" }
{ "index": {}}
{ "price" : 20000, "color" : "red", "make" : "honda", "sold" : "2014-11-05" }
{ "index": {}}
{ "price" : 30000, "color" : "green", "make" : "ford", "sold" : "2014-05-18" }
{ "index": {}}
{ "price" : 15000, "color" : "blue", "make" : "toyota", "sold" : "2014-07-02" }
{ "index": {}}
{ "price" : 12000, "color" : "green", "make" : "toyota", "sold" : "2014-08-19" }
{ "index": {}}
{ "price" : 20000, "color" : "red", "make" : "honda", "sold" : "2014-11-05" }
{ "index": {}}
{ "price" : 80000, "color" : "red", "make" : "bmw", "sold" : "2014-01-01" }
{ "index": {}}
{ "price" : 25000, "color" : "blue", "make" : "ford", "sold" : "2014-02-12" }
```
Aggregating into buckets
First, let's bucket the cars by color:
```
GET /cars/_search
{
  "size": 0,
  "aggs": {
    "popular_colors": {
      "terms": {
        "field": "color"
      }
    }
  }
}
```
- size: set to 0 here because we don't care about the matched documents, only the aggregation results; this improves efficiency
- aggs: declares an aggregation query; short for aggregations
- popular_colors: the name we give this aggregation; it can be anything
- terms: the bucketing strategy; here we bucket by term
- field: the field to bucket on
result:
{ "took": 1, "timed_out": false, "_shards": { "total": 1, "successful": 1, "skipped": 0, "failed": 0 }, "hits": { "total": 8, "max_score": 0, "hits": [] }, "aggregations": { "popular_colors": { "doc_count_error_upper_bound": 0, "sum_other_doc_count": 0, "buckets": [ { "key": "red", "doc_count": 4 }, { "key": "blue", "doc_count": 2 }, { "key": "green", "doc_count": 2 } ] } } }
- hits: the query result is empty because we set size to 0
- aggregations: the aggregation results
- popular_colors: the aggregation name we defined
- buckets: the buckets found; each distinct value of the color field forms one bucket
- key: the value of the color field for this bucket
- doc_count: the number of documents in this bucket
From the aggregation results we can see that red cars are currently the most popular!
Metrics within buckets
The previous example tells us how many documents are in each bucket, which is useful. But applications usually need more sophisticated metrics over those documents. For example: what is the average price of the cars of each color?
To get that, we need to tell Elasticsearch which field to use and which metric to compute. This information is nested inside the bucket definition, and the metric is computed over the documents in each bucket.
Now, let's add an average-price metric to the previous aggregation:
```json
GET /cars/_search
{
  "size": 0,
  "aggs": {
    "popular_colors": {
      "terms": {
        "field": "color"
      },
      "aggs": {
        "avg_price": {
          "avg": {
            "field": "price"
          }
        }
      }
    }
  }
}
```
- aggs: we added a new aggs inside the previous one (popular_colors); this shows that a metric is itself an aggregation
- avg_price: the name of this aggregation
- avg: the metric type; here, the average
- field: the field the metric is computed on
result:
... "aggregations": { "popular_colors": { "doc_count_error_upper_bound": 0, "sum_other_doc_count": 0, "buckets": [ { "key": "red", "doc_count": 4, "avg_price": { "value": 32500 } }, { "key": "blue", "doc_count": 2, "avg_price": { "value": 20000 } }, { "key": "green", "doc_count": 2, "avg_price": { "value": 21000 } } ] } } ...
You can see that each bucket now has its own avg_price field, which holds the result of the metric aggregation.
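The avg metric can be swapped for any of the metrics listed earlier. As a minimal sketch (price_stats is an arbitrary name we choose, like avg_price above), the stats metric returns count, min, max, avg and sum for each bucket in one go:

```json
GET /cars/_search
{
  "size": 0,
  "aggs": {
    "popular_colors": {
      "terms": { "field": "color" },
      "aggs": {
        "price_stats": {
          "stats": { "field": "price" }
        }
      }
    }
  }
}
```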
Nested buckets
In the example above we nested a metric inside a bucket. In fact, buckets can nest not only metrics but also other buckets; that is, each group can contain further groups.
For example, to count the makers of the cars of each color, we additionally bucket by the make field:
```json
GET /cars/_search
{
  "size": 0,
  "aggs": {
    "popular_colors": {
      "terms": {
        "field": "color"
      },
      "aggs": {
        "avg_price": {
          "avg": {
            "field": "price"
          }
        },
        "maker": {
          "terms": {
            "field": "make"
          }
        }
      }
    }
  }
}
```
- The original color buckets and the avg metric are unchanged
- maker: a new bucket named maker, added inside the nested aggs
- terms: the bucketing strategy is still by term
- field: we bucket by the make field
Partial results:
... {"aggregations": { "popular_colors": { "doc_count_error_upper_bound": 0, "sum_other_doc_count": 0, "buckets": [ { "key": "red", "doc_count": 4, "maker": { "doc_count_error_upper_bound": 0, "sum_other_doc_count": 0, "buckets": [ { "key": "honda", "doc_count": 3 }, { "key": "bmw", "doc_count": 1 } ] }, "avg_price": { "value": 32500 } }, { "key": "blue", "doc_count": 2, "maker": { "doc_count_error_upper_bound": 0, "sum_other_doc_count": 0, "buckets": [ { "key": "ford", "doc_count": 1 }, { "key": "toyota", "doc_count": 1 } ] }, "avg_price": { "value": 20000 } }, { "key": "green", "doc_count": 2, "maker": { "doc_count_error_upper_bound": 0, "sum_other_doc_count": 0, "buckets": [ { "key": "ford", "doc_count": 1 }, { "key": "toyota", "doc_count": 1 } ] }, "avg_price": { "value": 21000 } } ] } } } ...
- We can see that the new maker aggregation is nested inside each of the original color buckets.
- Each color is further grouped by the make field
- Information we can read from the result:
- There are 4 red cars in total
- The average price of a red car is $32500.
- Three of them are made by Honda and one by BMW.
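Other metrics can be nested the same way. As a minimal sketch (cheapest is an arbitrary name we choose), the Top Hits metric from the list above could pull the single cheapest car in each color bucket; each bucket then carries a small hits array with that one document:

```json
GET /cars/_search
{
  "size": 0,
  "aggs": {
    "popular_colors": {
      "terms": { "field": "color" },
      "aggs": {
        "cheapest": {
          "top_hits": {
            "size": 1,
            "sort": [ { "price": "asc" } ]
          }
        }
      }
    }
  }
}
```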
Other ways of dividing buckets
As mentioned earlier, there are many ways to divide buckets, such as:
- Date Histogram Aggregation: groups by date intervals. For example, with the interval set to one week, documents are automatically grouped week by week
- Histogram Aggregation: groups by numeric intervals, analogous to the date histogram
- Terms Aggregation: groups by term; documents with exactly the same term fall into the same group
- Range Aggregation: groups values or dates into ranges; you specify the start and end of each range
In the examples so far, we used the Terms Aggregation, which divides buckets by term.
Next, let's learn some more practical ones:
Histogram: interval bucketing
Principle:
A histogram groups a numeric field by intervals of a fixed size. You need to specify an interval value, which determines the size of each step.
For example:
Suppose you have a price field and set interval to 200. The buckets will be:
0, 200, 400, 600, ...
The numbers listed above are the bucket keys, i.e. the starting point of each interval.
If a commodity is priced at 450, which bucket will it fall into? The key is calculated with the following formula:
bucket_key = Math.floor((value - offset) / interval) * interval + offset
- value: the value of the current document, in this case 450
- offset: the starting offset, 0 by default
- interval: the interval size, such as 200
So you get key = Math.floor((450 - 0) / 200) * 200 + 0 = 400, and the document falls into the 400 bucket.
Operation:
For example, let's group the car prices with an interval of 5000:
```json
GET /cars/_search
{
  "size": 0,
  "aggs": {
    "price": {
      "histogram": {
        "field": "price",
        "interval": 5000
      }
    }
  }
}
```
result:
{ "took": 21, "timed_out": false, "_shards": { "total": 5, "successful": 5, "skipped": 0, "failed": 0 }, "hits": { "total": 8, "max_score": 0, "hits": [] }, "aggregations": { "price": { "buckets": [ { "key": 10000, "doc_count": 2 }, { "key": 15000, "doc_count": 1 }, { "key": 20000, "doc_count": 2 }, { "key": 25000, "doc_count": 1 }, { "key": 30000, "doc_count": 1 }, { "key": 35000, "doc_count": 0 }, { "key": 40000, "doc_count": 0 }, { "key": 45000, "doc_count": 0 }, { "key": 50000, "doc_count": 0 }, { "key": 55000, "doc_count": 0 }, { "key": 60000, "doc_count": 0 }, { "key": 65000, "doc_count": 0 }, { "key": 70000, "doc_count": 0 }, { "key": 75000, "doc_count": 0 }, { "key": 80000, "doc_count": 1 } ] } } }
You will find that there are a large number of buckets with 0 documents in the middle, which looks ugly.
We can add the parameter min_doc_count: 1 to require at least one document per bucket; buckets with zero documents are then filtered out
Example:
```json
GET /cars/_search
{
  "size": 0,
  "aggs": {
    "price": {
      "histogram": {
        "field": "price",
        "interval": 5000,
        "min_doc_count": 1
      }
    }
  }
}
```
result:
{ "took": 15, "timed_out": false, "_shards": { "total": 5, "successful": 5, "skipped": 0, "failed": 0 }, "hits": { "total": 8, "max_score": 0, "hits": [] }, "aggregations": { "price": { "buckets": [ { "key": 10000, "doc_count": 2 }, { "key": 15000, "doc_count": 1 }, { "key": 20000, "doc_count": 2 }, { "key": 25000, "doc_count": 1 }, { "key": 30000, "doc_count": 1 }, { "key": 80000, "doc_count": 1 } ] } } }
Perfect!
If you use Kibana to turn the result into a bar chart, it looks even better:
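The Date Histogram from the list above works the same way on date fields. A minimal sketch against the sold field, bucketing the sample cars by month (sales_per_month is an arbitrary name; this interval syntax matches the ES version used here, while newer versions use calendar_interval instead):

```json
GET /cars/_search
{
  "size": 0,
  "aggs": {
    "sales_per_month": {
      "date_histogram": {
        "field": "sold",
        "interval": "month",
        "min_doc_count": 1
      }
    }
  }
}
```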
Range buckets
Range bucketing is similar to histogram bucketing: it also groups numeric values into segments, but with the range method you specify the start and end of each group yourself, for example:
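A minimal sketch on the cars data, with hand-picked boundaries (the aggregation name price_range and the boundaries are arbitrary choices for illustration):

```json
GET /cars/_search
{
  "size": 0,
  "aggs": {
    "price_range": {
      "range": {
        "field": "price",
        "ranges": [
          { "to": 20000 },
          { "from": 20000, "to": 40000 },
          { "from": 40000 }
        ]
      }
    }
  }
}
```

Note that each range includes its from boundary and excludes its to boundary.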
Example: aggregations in Java
Aggregating into buckets
Bucketing means grouping. Here, for example, we bucket by brand, which is equivalent to SQL's GROUP BY.
```java
@Test
public void testAgg(){
    NativeSearchQueryBuilder queryBuilder = new NativeSearchQueryBuilder();
    // Don't fetch any source fields; we only care about the aggregation results
    queryBuilder.withSourceFilter(new FetchSourceFilter(new String[]{""}, null));
    // 1. Add an aggregation: type terms, name "brands", field "brand"
    queryBuilder.addAggregation(
        AggregationBuilders.terms("brands").field("brand"));
    // 2. Run the aggregation query; the result must be cast to the AggregatedPage type
    AggregatedPage<Item> aggPage = (AggregatedPage<Item>) this.itemRepository.search(queryBuilder.build());
    // 3. Parse the aggregation result set, casting according to the aggregation type and field type.
    // 3.1. Take the aggregation named "brands" from the results.
    //      Because it is a terms aggregation on a String field, cast it to StringTerms
    StringTerms agg = (StringTerms) aggPage.getAggregation("brands");
    // 3.2. Get the buckets
    List<StringTerms.Bucket> buckets = agg.getBuckets();
    // 3.3. Traverse them
    for (StringTerms.Bucket bucket : buckets) {
        // 3.4. The bucket key is the brand name
        System.out.println(bucket.getKeyAsString());
        // 3.5. The number of documents in this bucket
        System.out.println(bucket.getDocCount());
    }
}
```
Results displayed:
Key APIs:
- AggregationBuilders: the factory class for building aggregations. All aggregations are created through its static methods:
- AggregatedPage: the result class of an aggregation query; it is a sub-interface of Page<T>
AggregatedPage extends Page with aggregation-related functionality. It is essentially a wrapper around the aggregation results; you can compare it with the JSON structure of those results.
The returned results are all objects of type Aggregation, but they are represented by different subclasses depending on the field type
Let's compare the JSON result we queried earlier with the corresponding Java classes:
Nested aggregations: computing the average
code:
```java
@Test
public void testSubAgg(){
    NativeSearchQueryBuilder queryBuilder = new NativeSearchQueryBuilder();
    // Don't fetch any source fields; we only care about the aggregation results
    queryBuilder.withSourceFilter(new FetchSourceFilter(new String[]{""}, null));
    // 1. Add an aggregation: type terms, name "brands", field "brand",
    //    with a nested avg aggregation on "price" inside each brand bucket
    queryBuilder.addAggregation(
        AggregationBuilders.terms("brands").field("brand")
            .subAggregation(AggregationBuilders.avg("priceAvg").field("price"))
    );
    // 2. Run the query; the result must be cast to the AggregatedPage type
    AggregatedPage<Item> aggPage = (AggregatedPage<Item>) this.itemRepository.search(queryBuilder.build());
    // 3. Parse the results
    // 3.1. Take the aggregation named "brands"; because it is a terms aggregation
    //      on a String field, cast it to StringTerms
    StringTerms agg = (StringTerms) aggPage.getAggregation("brands");
    // 3.2. Get the buckets
    List<StringTerms.Bucket> buckets = agg.getBuckets();
    // 3.3. Traverse them
    for (StringTerms.Bucket bucket : buckets) {
        // 3.4/3.5. The bucket key is the brand name; doc count is the number of items
        System.out.println(bucket.getKeyAsString() + ", " + bucket.getDocCount() + " items");
        // 3.6. Get the nested sub-aggregation result
        InternalAvg avg = (InternalAvg) bucket.getAggregations().asMap().get("priceAvg");
        System.out.println("Average price: " + avg.getValue());
    }
}
```
result: