Distributed e-commerce project Gulimall: learning notes (3)

10, ES

7. Advanced - aggregation

Aggregations let you group data and extract statistics from it.

The simplest aggregations are roughly equivalent to SQL GROUP BY and the SQL aggregate functions (AVG, MAX, MIN, and so on).


Aggregations are computed over the documents matched by the query.

# Age distribution, average age and average balance of people whose address contains "mill"
GET bank/_search
{
  "query": {            # Match documents whose address contains "mill"
    "match": {
      "address": "Mill"
    }
  },
  "aggs": {             # Aggregations run on the query results
    "ageAgg": {         # Name of the aggregation; any name you like
      "terms": {        # One bucket per distinct value, similar to GROUP BY
        "field": "age",
        "size": 10
      }
    },
    "ageAvg": {
      "avg": {          # Average of the age values
        "field": "age"
      }
    },
    "balanceAvg": {
      "avg": {          # Average balance
        "field": "balance"
      }
    }
  },
  "size": 0             # Return only the aggregation results, no hit details
}

Sub aggregation

That is, nest another aggs inside an aggregation.

GET bank/_search
{
  "query": {
    "match_all": {}
  },
  "aggs": {
    "ageAgg": {
      "terms": {        # Distribution of ages
        "field": "age",
        "size": 100
      },
      "aggs": {         # Sibling of "terms": a sub-aggregation computed inside each age bucket
        "ageAvg": {     # Average balance per age group, e.g. the average balance of everyone aged 20
          "avg": {
            "field": "balance"
          }
        }
      }
    }
  },
  "size": 0
}

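To make the SQL analogy concrete, here is a plain-Java sketch (JDK only, no Elasticsearch involved) of what the two queries above compute: a "terms" bucket per age with its document count, and an "avg" of balance inside each age bucket. The Account record and the sample values are hypothetical. In ES, all of this happens on the server over the matched documents.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

class AggregationSketch {

    // Hypothetical stand-in for one document of the bank index
    record Account(int age, long balance) {}

    // Like "terms" on age: the number of documents in each age bucket
    static Map<Integer, Long> countByAge(List<Account> docs) {
        return docs.stream()
                .collect(Collectors.groupingBy(Account::age, TreeMap::new, Collectors.counting()));
    }

    // Like "terms" on age with an "avg" sub-aggregation on balance
    static Map<Integer, Double> avgBalanceByAge(List<Account> docs) {
        return docs.stream()
                .collect(Collectors.groupingBy(Account::age, TreeMap::new,
                        Collectors.averagingLong(Account::balance)));
    }

    public static void main(String[] args) {
        List<Account> docs = List.of(
                new Account(20, 1000), new Account(20, 3000), new Account(31, 500));
        System.out.println(countByAge(docs));      // {20=2, 31=1}
        System.out.println(avgBalanceByAge(docs)); // {20=2000.0, 31=500.0}
    }
}
```

ES orders terms buckets by document count (descending) by default. The TreeMap here sorts by key only to keep the output predictable.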
8. Mapping (field mappings)

Mappings are defined directly on the index. Fields with the same name in different types of one index are processed in exactly the same way, so types never provided real separation. Dropping types places documents directly under the index.

Elasticsearch 7 removes the type concept.

Create an index and specify its mapping

PUT /my_index          # Like creating a table in MySQL and specifying each column's type
{
  "mappings": {
    "properties": {
      "age": {
        "type": "integer"
      },
      "email": {
        "type": "keyword"  # keyword: exact value, not analyzed
      },
      "name": {
        "type": "text"     # Full-text search: analyzed when saved, and the query is analyzed for matching
      }
    }
  }
}

PUT /my_index/_mapping     # Add a new field to the existing mapping (similar to adding a column)
{
  "properties": {
    "employee-id": {
      "type": "keyword",
      "index": false       # Not indexed: the field cannot be searched and is stored only as a redundant field
    }
  }
}

GET /my_index   # View the index, including its mapping

Cannot update mapping

An existing field mapping cannot be updated. It is as if the column types of a MySQL table could not be changed.

To change a mapping, you must create a new index with the new mapping and migrate the old data into it.

Create the new index

PUT /newbank
{
  "mappings": {
    "properties": {
      "account_number": { "type": "long" },
      "address": { "type": "text" },
      "age": { "type": "integer" },
      "balance": { "type": "long" },
      "city": { "type": "keyword" },
      "email": { "type": "keyword" },
      "employer": { "type": "keyword" },
      "firstname": { "type": "text" },
      "gender": { "type": "keyword" },
      "lastname": {
        "type": "text",
        "fields": {
          "keyword": {
            "type": "keyword",
            "ignore_above": 256
          }
        }
      },
      "state": { "type": "keyword" }
    }
  }
}

Migrate the data from bank to newbank

POST _reindex
{
  "source": {
    "index": "bank",
    "type": "account"     # Name and type of the original index
  },
  "dest": {
    "index": "newbank"    # Name of the new index
  }
}

In the new index the type becomes _doc (the default type); in the old index it was account.

9. Tokenization (analysis)

A tokenizer receives a stream of characters, splits it into individual tokens (usually individual words), and outputs a stream of tokens.

POST _analyze
{
  "analyzer": "standard",
  "text": "The 2 Brown-Foxes bone."
}

The standard analyzer splits English text into words. It is not suitable for Chinese, because it treats every Chinese character as a separate word.
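To see why the standard analyzer fails on Chinese, here is a simplified approximation of it in plain Java. It splits on anything that is not a letter or digit, lowercases each token, and emits every Han character as its own token. This is an assumption-level sketch, not Lucene's implementation: the real analyzer follows the Unicode UAX #29 word-break rules, and the class name is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

class StandardTokenizerSketch {

    // Rough imitation of the standard analyzer (not the real UAX #29 algorithm)
    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        text.codePoints().forEach(cp -> {
            if (Character.UnicodeScript.of(cp) == Character.UnicodeScript.HAN) {
                // Each Chinese character becomes its own token
                flush(current, tokens);
                tokens.add(new String(Character.toChars(cp)));
            } else if (Character.isLetterOrDigit(cp)) {
                current.appendCodePoint(cp);
            } else {
                // Spaces, hyphens and punctuation end the current token
                flush(current, tokens);
            }
        });
        flush(current, tokens);
        return tokens;
    }

    private static void flush(StringBuilder current, List<String> tokens) {
        if (current.length() > 0) {
            tokens.add(current.toString().toLowerCase(Locale.ROOT));
            current.setLength(0);
        }
    }

    public static void main(String[] args) {
        System.out.println(analyze("The 2 Brown-Foxes bone.")); // [the, 2, brown, foxes, bone]
        System.out.println(analyze("我是中国人"));               // [我, 是, 中, 国, 人]
    }
}
```

The second call shows the problem: "中国" (China) is cut into single characters, so searches cannot match it as a word.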

So we need a different analyzer for Chinese.

Install the IK analyzer

When we installed Elasticsearch earlier, we mapped the container directory /usr/share/elasticsearch/plugins to /mydata/elasticsearch/plugins on the host. So the convenient way is to download elasticsearch-analysis-ik-7.4.2.zip (the plugin version must match the ES version), unzip it into that folder (for example into /mydata/elasticsearch/plugins/ik), and then restart the Elasticsearch container.

GET _analyze
{
  "analyzer": "ik_smart",      # Coarsest-grained segmentation
  "text": "I am Chinese"
}

GET _analyze
{
  "analyzer": "ik_max_word",   # Finest-grained segmentation: outputs every possible word
  "text": "I am Chinese"
}

Supplement: editing a file with vi on the Linux command line

vi <file name>: open the file

i: enter insert mode

Esc: leave insert mode

:wq: save and exit

Custom Dictionary

Modify IKAnalyzer.cfg.xml in /usr/share/elasticsearch/plugins/ik/config (mapped to /mydata/elasticsearch/plugins/ik/config on the host):

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE properties SYSTEM "http://java.sun.com/dtd/properties.dtd">
<properties>
	<comment>IK Analyzer Extended configuration</comment>
	<!--Users can configure their own extended dictionary here -->
	<entry key="ext_dict"></entry>
	 <!--Users can configure their own extended stop word dictionary here-->
	<entry key="ext_stopwords"></entry>
	<!--Users can configure the remote extension dictionary here: put the URL of your custom word file inside this entry -->
	<entry key="remote_ext_dict"></entry>
	<!--Users can configure the remote extended stop word dictionary here-->
	<!-- <entry key="remote_ext_stopwords">words_location</entry> -->
</properties>

See the comments in the file for details; I won't go into them here.
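Why a custom dictionary matters: a dictionary-based segmenter can only produce words it knows. The plain-Java sketch below uses forward maximum matching as a simplified stand-in. It is not IK's actual algorithm, and the dictionaries are hypothetical. Adding a word to the dictionary, which is what ext_dict and remote_ext_dict do for IK, changes the segmentation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

class ForwardMaxMatchSketch {

    // Forward maximum matching: at each position take the longest dictionary word,
    // falling back to a single character when nothing in the dictionary matches.
    static List<String> segment(String text, Set<String> dict) {
        int maxLen = dict.stream().mapToInt(String::length).max().orElse(1);
        List<String> words = new ArrayList<>();
        int i = 0;
        while (i < text.length()) {
            int len = Math.min(maxLen, text.length() - i);
            while (len > 1 && !dict.contains(text.substring(i, i + len))) {
                len--; // shrink the window until a dictionary word (or one char) remains
            }
            words.add(text.substring(i, i + len));
            i += len;
        }
        return words;
    }

    public static void main(String[] args) {
        String text = "尚硅谷很好"; // "Atguigu is very good"
        System.out.println(segment(text, Set.of("很好")));          // [尚, 硅, 谷, 很好]
        System.out.println(segment(text, Set.of("很好", "尚硅谷"))); // [尚硅谷, 很好]
    }
}
```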



Import the dependency

<dependency>
    <groupId>org.elasticsearch.client</groupId>
    <artifactId>elasticsearch-rest-high-level-client</artifactId>
    <version>7.4.2</version>
</dependency>

The Spring Boot version used here already manages Elasticsearch 6.8.5, so override the ES version in the pom's properties:

<properties>
    <elasticsearch.version>7.4.2</elasticsearch.version> <!-- It used to be 6.8.5 -->
</properties>

This microservice does not need a data source, but it depends on a parent project that brings in data-source configuration, so exclude the data-source auto-configuration.

Annotate the startup class

@SpringBootApplication(exclude = DataSourceAutoConfiguration.class)

Configuration class

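import org.apache.http.HttpHost;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestClientBuilder;
import org.elasticsearch.client.RestHighLevelClient;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class GulimallElasticSearchConfig {

    // Assumption: address of your ES host; replace it with your own
    private final String host = "localhost";

    // Shared request options (headers such as authentication can be added to the builder)
    public static final RequestOptions COMMON_OPTIONS;

    static {
        RequestOptions.Builder builder = RequestOptions.DEFAULT.toBuilder();
        COMMON_OPTIONS = builder.build();
    }

    @Bean
    public RestHighLevelClient esRestClient() {
        // You can pass more than one HttpHost to connect to several ES nodes
        RestClientBuilder builder = RestClient.builder(new HttpHost(host, 9200, "http"));
        return new RestHighLevelClient(builder);
    }
}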

Test class

Save / modify

@Test
public void indexData() throws IOException {
    // Target index "users" with document id "1"
    IndexRequest indexRequest = new IndexRequest("users");
    indexRequest.id("1");
    User user = new User();
    user.setUserName("Zhang San");
    user.setAge(20);
    user.setGender("male");
    String jsonString = JSON.toJSONString(user);
    // Set the content to save and declare its format as JSON
    indexRequest.source(jsonString, XContentType.JSON);
    // Execute: creates the index if needed and saves the document
    IndexResponse index = client.index(indexRequest, GulimallElasticSearchConfig.COMMON_OPTIONS);
    System.out.println(index);
}

If the same request (same index and id) is sent again, it becomes an update: the document is overwritten and its version increases.
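A toy in-memory model of that behavior, in plain Java rather than the ES client: indexing with an explicit id creates the document the first time and overwrites it afterwards, bumping a version counter. This loosely mirrors the "result" ("created"/"updated") and "_version" fields of a real index response. The class and its return strings are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

class IndexByIdSketch {

    // Hypothetical stored document: its source plus a version counter
    record Doc(String source, long version) {}

    private final Map<String, Doc> docs = new HashMap<>();

    // First call for an id creates the document; later calls overwrite it and bump the version
    String index(String id, String source) {
        Doc old = docs.get(id);
        long version = (old == null) ? 1 : old.version() + 1;
        docs.put(id, new Doc(source, version));
        return (old == null ? "created" : "updated") + " v" + version;
    }

    public static void main(String[] args) {
        IndexByIdSketch users = new IndexByIdSketch();
        System.out.println(users.index("1", "{\"age\":20}")); // created v1
        System.out.println(users.index("1", "{\"age\":21}")); // updated v2
    }
}
```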

Retrieval and aggregation

@Test
public void find() throws IOException {
    // 1. Create a search request and specify the index
    SearchRequest searchRequest = new SearchRequest();
    searchRequest.indices("bank");
    // Build the query DSL: query, from, size and aggregations are all set on SearchSourceBuilder
    SearchSourceBuilder sourceBuilder = new SearchSourceBuilder();
    // Documents whose address contains "mill"
    sourceBuilder.query(QueryBuilders.matchQuery("address", "mill"));
    // First aggregation: distribution of ages (the name "agg1" is used to read it back)
    TermsAggregationBuilder agg1 = AggregationBuilders.terms("agg1").field("age").size(10);
    sourceBuilder.aggregation(agg1);
    // Second aggregation: average balance
    AvgAggregationBuilder agg2 = AggregationBuilders.avg("agg2").field("balance");
    sourceBuilder.aggregation(agg2);
    // sourceBuilder.toString() is the JSON of the query DSL
    System.out.println(sourceBuilder.toString());
    searchRequest.source(sourceBuilder);

    // 2. Execute the search
    SearchResponse response = client.search(searchRequest, GulimallElasticSearchConfig.COMMON_OPTIONS);

    // 3. Analyze the response (its toString() is also JSON)
    System.out.println(response.toString());
    // 3.1 Hits: the real document is in _source; convert it to a bean (your own VO or an existing PO)
    SearchHits hits = response.getHits();
    for (SearchHit hit : hits.getHits()) {
        String sourceAsString = hit.getSourceAsString();
        Account account = JSON.parseObject(sourceAsString, Account.class);
        System.out.println(account);
    }
    // 3.2 Aggregation results, looked up by aggregation name
    Aggregations aggregations = response.getAggregations();
    Terms ageTerms = aggregations.get("agg1");
    for (Terms.Bucket bucket : ageTerms.getBuckets()) {
        System.out.println(bucket.getKeyAsString() + " -> " + bucket.getDocCount());
    }
    Avg balanceAvg = aggregations.get("agg2");
    System.out.println("average balance: " + balanceAvg.getValue());
}

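By convention, a class named after another class plus "s" (QueryBuilders, AggregationBuilders) is a utility class of static factory methods that build instances of that type.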

11, Installing nginx

Like Tomcat, nginx can be thought of as a web server.

First start a temporary nginx instance, only to copy its configuration:

docker run -p 80:80 --name nginx -d nginx:1.10

Copy the container's configuration files to /mydata/nginx/conf/:

mkdir -p /mydata/nginx/html
mkdir -p /mydata/nginx/logs
mkdir -p /mydata/nginx/conf
# docker cp copies the whole /etc/nginx directory, so it ends up as conf/nginx
docker container cp nginx:/etc/nginx /mydata/nginx/conf/
# Move its contents up into conf/ and delete the extra folder
mv /mydata/nginx/conf/nginx/* /mydata/nginx/conf/
rm -rf /mydata/nginx/conf/nginx

Stop the original container:

docker stop nginx

Delete the original container:

docker rm nginx

Create the new nginx container with the directories mounted:

docker run -p 80:80 --name nginx \
  -v /mydata/nginx/html:/usr/share/nginx/html \
  -v /mydata/nginx/logs:/var/log/nginx \
  -v /mydata/nginx/conf/:/etc/nginx \
  -d nginx:1.10

Make nginx start automatically with Docker:

docker update nginx --restart=always

Create /mydata/nginx/html/index.html to test whether nginx can be accessed normally:

echo '<h2>hello nginx!</h2>' > /mydata/nginx/html/index.html

Visit http://<host IP>:80/index.html

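12, Preparing the product index in ES

ES relies heavily on memory (its inverted indexes are served largely from the OS file-system cache), so it is much faster than MySQL for retrieval. ES also supports clustering and stores data in shards.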

1. Determine the index model

Two schemes

First, index by SPU: the search targets SPUs, and each SPU document stores only its SKU ids, which saves space. The serious problem is that when a matching SPU is found, a large number of SKU ids are returned at once, which may cause blocking under high concurrency.

Second, index by SKU: the search targets SKUs, and each document holds all of the SKU's information. This creates many redundant fields, because many SKUs share the same SPU attributes. Although it takes more space than the first scheme, each search returns less data.

All things considered, we choose the second scheme, trading space for time.

The index model is as follows:

PUT product
{
  "mappings": {
    "properties": {
      "skuId": { "type": "long" },
      "spuId": { "type": "keyword" },        # Not analyzed (no tokenization)
      "skuTitle": {
        "type": "text",
        "analyzer": "ik_smart"               # Chinese analyzer
      },
      "skuPrice": { "type": "keyword" },     # keyword keeps the exact value (precision)
      "skuImg": {
        "type": "keyword",
        "index": false                       # Not searchable, no index is built (also false in the video)
      },
      "saleCount": { "type": "long" },
      "hasStock": { "type": "boolean" },
      "hotScore": { "type": "long" },
      "brandId": { "type": "long" },
      "catalogId": { "type": "long" },
      "brandName": { "type": "keyword" },    # In the video this field has "index": false
      "brandImg": {
        "type": "keyword",
        "index": false,                      # Not searchable, no index is built; only displayed on the page
        "doc_values": false                  # Cannot be aggregated or sorted (the default is true)
      },
      "catalogName": {
        "type": "keyword",
        "index": false                       # Not searchable, no index is built (also false in the video)
      },
      "attrs": {
        "type": "nested",
        "properties": {
          "attrId": { "type": "long" },
          "attrName": {
            "type": "keyword",
            "index": false,
            "doc_values": false
          },
          "attrValue": { "type": "keyword" }
        }
      }
    }
  }
}

2. nested (embedded) objects

attrs is declared with "type": "nested" because we need to search on the fields of the objects inside it.

By default, an array of objects is flattened: the values of each field across all the objects are stored together in one list, so the link between fields of the same object is lost.

With this storage layout a wrong match can occur. For example, with the objects {aaa, bbb} and {ccc, ddd}, a query for the combination {aaa, ddd} matches even though no single object contains that combination.

Flattening lets a query match combinations that do not exist. The nested type solves this. It is needed when the array elements are objects; arrays of plain values do not need it.
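The difference can be reproduced in a few lines of plain Java. The sketch below models the two storage layouts: flattened (one set of values per field) and nested (each object kept whole). It then runs the {aaa, ddd} query against both. The Attr record and its field names first/last are hypothetical.

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

class NestedVsFlattenedSketch {

    // Hypothetical object stored inside an array field
    record Attr(String first, String last) {}

    // Default object mapping: each field's values are collected into their own set,
    // so the link between "first" and "last" of the same object is lost.
    static boolean matchesFlattened(List<Attr> attrs, String first, String last) {
        Set<String> firsts = attrs.stream().map(Attr::first).collect(Collectors.toSet());
        Set<String> lasts = attrs.stream().map(Attr::last).collect(Collectors.toSet());
        return firsts.contains(first) && lasts.contains(last);
    }

    // nested mapping: each object is indexed separately, so both conditions
    // must hold for the same object.
    static boolean matchesNested(List<Attr> attrs, String first, String last) {
        return attrs.stream().anyMatch(a -> a.first().equals(first) && a.last().equals(last));
    }

    public static void main(String[] args) {
        List<Attr> attrs = List.of(new Attr("aaa", "bbb"), new Attr("ccc", "ddd"));
        System.out.println(matchesFlattened(attrs, "aaa", "ddd")); // true: a false match
        System.out.println(matchesNested(attrs, "aaa", "ddd"));    // false
    }
}
```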

13, Putting goods on the shelf

1. Basic idea

The admin system sends the spuId to the back end. The back end queries MySQL for all SKUs of that SPU and uploads them to ES. This involves converting the PO classes into the ES model (SkuEsModel).

2. Batch-query whether SKUs are in stock

// All SKUs of an SPU share the same specification attributes, so those are queried once in advance
/**
 * Query whether each SKU has stock
 * Returns skuId and whether it has stock
 */
@PostMapping("/hasstock")
public R getSkuHasStock(@RequestBody List<Long> skuIds) {
    List<SkuHasStockVo> vos = wareSkuService.getSkuHasStock(skuIds);
    return R.ok().setData(vos);
}

3. Bulk-save the SkuEsModels to ES

/**
 * Put goods on the shelf
 */
@PostMapping("/product") // ElasticSaveController
public R productStatusUp(@RequestBody List<SkuEsModel> skuEsModels) {
    boolean hasFailures;
    try {
        hasFailures = productSaveService.productStatusUp(skuEsModels);
    } catch (IOException e) {
        log.error("ElasticSaveController: error putting goods on the shelf", e);
        return R.error(BizCodeEnum.PRODUCT_UP_EXCEPTION.getCode(), BizCodeEnum.PRODUCT_UP_EXCEPTION.getMsg());
    }
    // productStatusUp returns true when the bulk request had failures
    if (!hasFailures) {
        return R.ok();
    }
    return R.error(BizCodeEnum.PRODUCT_UP_EXCEPTION.getCode(), BizCodeEnum.PRODUCT_UP_EXCEPTION.getMsg());
}

public boolean productStatusUp(List<SkuEsModel> skuEsModels) throws IOException {
    // 1. Save into the ES index "product" (create it with the mapping above first)
    BulkRequest bulkRequest = new BulkRequest();
    // 2. Build one save request per SKU
    for (SkuEsModel esModel : skuEsModels) {
        // Target index
        IndexRequest indexRequest = new IndexRequest(EsConstant.PRODUCT_INDEX);
        // Use the skuId as the document id
        indexRequest.id(esModel.getSkuId().toString());
        String jsonString = JSON.toJSONString(esModel);
        indexRequest.source(jsonString, XContentType.JSON);
        // Add it to the bulk request
        bulkRequest.add(indexRequest);
    }
    // Bulk save
    BulkResponse bulk = client.bulk(bulkRequest, GulimallElasticSearchConfig.COMMON_OPTIONS);
    // TODO: handle errors
    boolean hasFailures = bulk.hasFailures();
    if (hasFailures) {
        List<String> failedIds = Arrays.stream(bulk.getItems())
                .filter(BulkItemResponse::isFailed)
                .map(BulkItemResponse::getId)
                .collect(Collectors.toList());
        log.error("Product listing error: {}", failedIds);
    }
    return hasFailures;
}
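What BulkRequest sends is the _bulk API body: newline-delimited JSON with an action line followed by the document source on the next line, and a final newline. The JDK-only sketch below builds such a body by hand to show the shape. The Sku record and its fields are hypothetical. The string concatenation does no JSON escaping, which is acceptable only for this illustration; real code should use a JSON library, as the notes do with JSON.toJSONString.

```java
import java.util.List;

class BulkBodySketch {

    // Hypothetical, simplified document to save
    record Sku(long skuId, String skuTitle) {}

    // One "index" action line plus one source line per document; skuId doubles as _id.
    // Titles are NOT JSON-escaped here (illustration only).
    static String bulkBody(String index, List<Sku> skus) {
        StringBuilder body = new StringBuilder();
        for (Sku sku : skus) {
            body.append("{\"index\":{\"_index\":\"").append(index)
                .append("\",\"_id\":\"").append(sku.skuId()).append("\"}}\n");
            body.append("{\"skuId\":").append(sku.skuId())
                .append(",\"skuTitle\":\"").append(sku.skuTitle()).append("\"}\n");
        }
        return body.toString(); // the _bulk API requires the trailing newline
    }

    public static void main(String[] args) {
        System.out.print(bulkBody("product", List.of(new Sku(1L, "Phone A"), new Sku(2L, "Phone B"))));
    }
}
```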

4. Package the listing data by spuId

That is, build the SkuEsModels for a given spuId.

// SpuInfoServiceImpl
public void upSpuForSearch(Long spuId) {
    // 1. Find all SKUs (with their brand ids) of the current spuId
    List<SkuInfoEntity> skuInfoEntities = skuInfoService.getSkusBySpuId(spuId);

    // TODO 4. Find the searchable specification attributes of this SPU
    // (all SKUs of one SPU share them, so they are queried only once)
    List<ProductAttrValueEntity> productAttrValueEntities = productAttrValueService.list(new QueryWrapper<ProductAttrValueEntity>().eq("spu_id", spuId));
    List<Long> attrIds = productAttrValueEntities.stream()
            .map(ProductAttrValueEntity::getAttrId)
            .collect(Collectors.toList());
    // Queries the database by id and keeps only searchable attributes (the SQL is shown below)
    List<Long> searchIds = attrService.selectSearchAttrIds(attrIds);
    Set<Long> ids = new HashSet<>(searchIds);
    List<SkuEsModel.Attr> searchAttrs = productAttrValueEntities.stream()
            .filter(entity -> ids.contains(entity.getAttrId()))
            .map(entity -> {
                SkuEsModel.Attr attr = new SkuEsModel.Attr();
                BeanUtils.copyProperties(entity, attr);
                return attr;
            })
            .collect(Collectors.toList());

    // TODO 1. Remote call to the inventory service: does each SKU have stock?
    Map<Long, Boolean> stockMap = null;
    try {
        List<Long> longList = skuInfoEntities.stream().map(SkuInfoEntity::getSkuId).collect(Collectors.toList());
        List<SkuHasStockVo> skuHasStocks = wareFeignService.getSkuHasStocks(longList);
        stockMap = skuHasStocks.stream().collect(Collectors.toMap(SkuHasStockVo::getSkuId, SkuHasStockVo::getHasStock));
    } catch (Exception e) {
        log.error("Remote call to inventory service failed", e);
    }

    // 2. Encapsulate the information of each SKU
    Map<Long, Boolean> finalStockMap = stockMap; // effectively final copy for use in the lambda
    List<SkuEsModel> skuEsModels = skuInfoEntities.stream().map(sku -> {
        SkuEsModel skuEsModel = new SkuEsModel();
        BeanUtils.copyProperties(sku, skuEsModel);
        // TODO 2. Hotness score: 0 for now
        skuEsModel.setHotScore(0L);
        // TODO 3. Query brand and category names
        BrandEntity brandEntity = brandService.getById(sku.getBrandId());
        skuEsModel.setBrandName(brandEntity.getName());
        CategoryEntity categoryEntity = categoryService.getById(sku.getCatalogId());
        skuEsModel.setCatalogName(categoryEntity.getName());
        // Set the searchable attributes
        skuEsModel.setAttrs(searchAttrs);
        // Set whether there is stock (assumption: default to true if the inventory call failed)
        skuEsModel.setHasStock(finalStockMap == null || finalStockMap.getOrDefault(sku.getSkuId(), false));
        return skuEsModel;
    }).collect(Collectors.toList());

    // TODO 5. Send the data to ES for saving (gulimall-search)
    R r = searchFeignService.saveProductAsIndices(skuEsModels);
    if (r.getCode() == 0) {
        // Saved successfully: mark the SPU as on the shelf
        this.baseMapper.upSpuStatus(spuId, ProductConstant.ProductStatusEnum.SPU_UP.getCode());
    } else {
        log.error("Remote ES save of products failed");
    }
}

The persistence-layer SQL for selectSearchAttrIds

<resultMap type="com.atguigu.gulimall.product.entity.AttrEntity" id="attrMap">
    <result property="attrId" column="attr_id"/>
    <result property="attrName" column="attr_name"/>
    <result property="searchType" column="search_type"/>
    <result property="valueType" column="value_type"/>
    <result property="icon" column="icon"/>
    <result property="valueSelect" column="value_select"/>
    <result property="attrType" column="attr_type"/>
    <result property="enable" column="enable"/>
    <result property="catelogId" column="catelog_id"/>
    <result property="showDesc" column="show_desc"/>
</resultMap>

resultMap maps the database column names to the properties of the returned PO.

<select id="selectSearchAttrIds" resultType="java.lang.Long">
    SELECT attr_id FROM `pms_attr` WHERE attr_id IN
    <foreach collection="attrIds" item="id" separator="," open="(" close=")">
        #{id}
    </foreach>
    AND search_type = 1
</select>

resultType is java.lang.Long: the wrapper type Long, unlike the primitive long, can also hold null.

Here attrIds is a collection, so the XML iterates over it with foreach (the mapper method's parameter needs @Param("attrIds") for this name to resolve). open and close are the symbols added at the beginning and end: because the list follows IN, they add the two parentheses. separator is the separator placed between elements.
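The open/close/separator attributes of foreach behave much like Collectors.joining(delimiter, prefix, suffix) in Java. The sketch below builds the same IN clause with JDBC-style ? placeholders, roughly what MyBatis produces when it binds each #{id} as a parameter. The method name is hypothetical.

```java
import java.util.List;
import java.util.stream.Collectors;

class ForeachInClauseSketch {

    // Mirrors <foreach separator="," open="(" close=")">#{id}</foreach>:
    // one "?" placeholder per element, joined by ",", wrapped in parentheses.
    static String selectSearchAttrIdsSql(List<Long> attrIds) {
        String in = attrIds.stream()
                .map(id -> "?")
                .collect(Collectors.joining(",", "(", ")"));
        return "SELECT attr_id FROM `pms_attr` WHERE attr_id IN " + in + " AND search_type = 1";
    }

    public static void main(String[] args) {
        System.out.println(selectSearchAttrIdsSql(List.of(1L, 2L, 3L)));
        // SELECT attr_id FROM `pms_attr` WHERE attr_id IN (?,?,?) AND search_type = 1
    }
}
```

With an empty list, neither version produces valid SQL, so it is worth returning early when attrIds is empty instead of calling the mapper.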

After finding all SKUs of the SPU, check whether each one is in stock and write the result into the boolean hasStock field (one of the differences between the ES model and the PO).

Then upload the encapsulated SkuEsModels by calling the service from step 3.
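Stripped of Spring, Feign and MyBatis, the packaging step comes down to two things. First, copy the SPU's searchable attributes into every SKU document (the redundancy accepted with the second indexing scheme). Second, set hasStock from the stock map. The JDK-only sketch below uses hypothetical record types. Its stock rule is an assumption: a null map (the inventory call failed) means "in stock", and a SKU missing from a non-null map means "out of stock".

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class SkuPackagingSketch {

    // Hypothetical simplified types
    record Attr(long attrId, String attrValue) {}
    record Sku(long skuId, String title) {}
    record SkuDoc(long skuId, String title, boolean hasStock, List<Attr> attrs) {}

    // Every SKU of the SPU gets its own copy of the SPU's searchable attributes.
    // Assumption: stockMap == null -> treat as in stock; missing skuId -> out of stock.
    static List<SkuDoc> pack(List<Sku> skus, List<Attr> spuAttrs, Map<Long, Boolean> stockMap) {
        return skus.stream()
                .map(sku -> new SkuDoc(
                        sku.skuId(),
                        sku.title(),
                        stockMap == null || stockMap.getOrDefault(sku.skuId(), false),
                        List.copyOf(spuAttrs)))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Attr> attrs = List.of(new Attr(1L, "6.1 inch"));
        List<Sku> skus = List.of(new Sku(10L, "Phone black"), new Sku(11L, "Phone white"));
        pack(skus, attrs, Map.of(10L, true, 11L, false)).forEach(System.out::println);
    }
}
```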

5. A handy idiom

When you want to apply the same operation to the elements of a collection, you can write:

 List<CategoryEntity> newCategoryEntities = categoryEntities.stream()
        .filter(categoryEntity -> categoryEntity.getEntityNum() == 1)
        .map(categoryEntity -> {
            categoryEntity.setSize(100L);
            return categoryEntity;
        })
        .collect(Collectors.toList());

Lambda expressions filter and transform the elements of the collection, and the results are collected into a new list.
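A runnable JDK-only version of the same idiom, with a hypothetical Category class standing in for CategoryEntity. It also shows a detail that is easy to miss: because the lambda in map calls a setter, the objects in the original list are modified too. The new list holds references to the same objects, not copies.

```java
import java.util.List;
import java.util.stream.Collectors;

class StreamIdiomSketch {

    // Hypothetical stand-in for CategoryEntity
    static class Category {
        int entityNum;
        long size;
        Category(int entityNum) { this.entityNum = entityNum; }
    }

    // Keep the categories with entityNum == 1 and set their size to 100
    static List<Category> resizeSingles(List<Category> categories) {
        return categories.stream()
                .filter(c -> c.entityNum == 1)
                .map(c -> {
                    c.size = 100L; // mutates the original object, not a copy
                    return c;
                })
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Category> all = List.of(new Category(1), new Category(2), new Category(1));
        List<Category> picked = resizeSingles(all);
        System.out.println(picked.size());   // 2
        System.out.println(all.get(0).size); // 100: the original element changed as well
    }
}
```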

14, Nginx

1. Brief introduction

A forward proxy acts on behalf of the client: I want to visit Google but cannot reach it, so I ask another server to relay my request to Google. That server is a forward proxy for my request.

A reverse proxy acts on behalf of the server: I send my request to Google, and Google quietly passes it on to Baidu and returns the result. Google is then a reverse proxy for my request. I never chose to go to Baidu and do not even know it was involved.

2. Logic of nginx + gateway

Nginx hides the IPs of the intranet and exposes only its own IP. Once there is a domain name, it resolves to the nginx address.

What we want to implement: the browser on our machine requests gulimall.com. After the hosts file is configured, entering gulimall.com in the browser resolves the domain (just as DNS would) to the nginx IP. The request therefore reaches nginx first, not a Java service. When the project goes online, gulimall.com would likewise resolve to the nginx IP, and users would always reach nginx.

After the request reaches nginx:

If it is a static resource (/static/), nginx finds it on the nginx server and returns it directly.
Otherwise it falls under location /. This is the shortest prefix, so nginx uses it only when no longer prefix such as /static/ matches. nginx then forwards the request to its upstream, whose IP and port are those of the gateway.
(When forwarding to the upstream, remember to configure proxy_set_header Host $host; so that the Host header is kept.)
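For plain prefix locations like these, nginx picks the location with the longest matching prefix, regardless of the order in the file. Regex locations and modifiers such as ^~ follow additional rules that this sketch ignores. A small JDK-only illustration of the prefix rule, with hypothetical URIs:

```java
import java.util.List;
import java.util.Optional;

class LocationMatchSketch {

    // Picks the longest prefix location that matches the URI, as nginx does for
    // plain prefix locations (regex locations and modifiers are ignored here).
    static Optional<String> match(String uri, List<String> prefixLocations) {
        String best = null;
        for (String prefix : prefixLocations) {
            if (uri.startsWith(prefix) && (best == null || prefix.length() > best.length())) {
                best = prefix;
            }
        }
        return Optional.ofNullable(best);
    }

    public static void main(String[] args) {
        List<String> locations = List.of("/", "/static/");
        System.out.println(match("/static/index/js/index.js", locations).orElse("none")); // /static/
        System.out.println(match("/api/product/list", locations).orElse("none"));        // /
    }
}
```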

At the gateway, predicates on the URL information decide which microservice registered in Nacos the request is forwarded to (the URL can also be rewritten first), and that service returns the response.

3. Structure of the nginx configuration file


Global block: directives that affect nginx as a whole, e.g. the user group, the path of the nginx process pid file, the log path, included configuration files, and the number of worker processes allowed.
events block: affects the network connections between the nginx server and users. Common settings include whether to serialize network connections across multiple worker processes, whether to accept multiple connections at the same time, which event-driven model handles connection requests, and the maximum number of simultaneous connections per worker process.
http block:
http global block: directives for file includes, MIME-TYPE definitions, log customization, connection timeouts, the maximum number of requests per connection, error pages, etc.
server block: closely tied to virtual hosts. From the user's point of view, a virtual host behaves exactly like an independent hardware host. Each http block can contain multiple server blocks, and each server block corresponds to one virtual host.
location block: configures request routing and the handling of various pages.

4. Nginx + gateway configuration

Modify the hosts file on your machine to map gulimall.com to the IP of the machine running nginx, and turn off the firewall.

At this point, visiting gulimall.com returns nginx's home page (index.html).

To make nginx reverse-proxy to port 10000 of the local machine, the main change is to the server block:

server {
    listen       80;
    server_name  gulimall.com;
    #charset koi8-r;
    #access_log  /var/log/nginx/log/host.access.log  main;

    location / {
        proxy_pass http://<host IP>:10000;
    }
}

listen is the port to listen on, and server_name is the domain name to listen for. Because of the hosts mapping, gulimall.com actually resolves to the IP of the host.

Inside location, proxy_pass gives the IP and port to forward requests to.

Modify nginx/conf/nginx.conf to define an upstream pointing to our gateway service:

upstream gulimall {
    # 88 is the gateway's port
    server <host IP>:88;
}

Modify nginx/conf/conf.d/gulimall.conf so that a request for gulimall.com matching / is forwarded to that upstream. By default nginx does not pass the original Host header to the upstream, so the gateway would not know which host was requested. We therefore set the header explicitly:

location / {
    proxy_pass http://gulimall;
    proxy_set_header Host $host;
}

Gateway routing and forwarding configuration

Configure a gateway route that forwards requests for **.gulimall.com to the product service. The gateway matches routes in order, so put this route after the more specific ones; otherwise it would capture their requests.

        - id: gulimall_host_route
          uri: lb://gulimall-product
          predicates:
            - Host=**.gulimall.com
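Why the host route goes last: routes are tried in their configured order and the first match wins, and a broad Host predicate would match almost every request to the site. The JDK-only sketch below imitates this with a simplified host-pattern matcher (`*` = one label, `**` = zero or more labels). The matcher is a stand-in for Spring's AntPathMatcher, not the gateway's real code. The route ids and the /api/product/ path are hypothetical.

```java
import java.util.List;
import java.util.Optional;
import java.util.function.BiPredicate;

class GatewayRoutingSketch {

    // Route id plus a predicate over (host, path)
    record Route(String id, BiPredicate<String, String> predicate) {}

    // Simplified host pattern on dot-separated labels: "*" = one label, "**" = zero or more
    static boolean hostMatches(String pattern, String host) {
        return match(pattern.split("\\."), 0, host.split("\\."), 0);
    }

    private static boolean match(String[] p, int i, String[] h, int j) {
        if (i == p.length) return j == h.length;
        if (p[i].equals("**")) {
            for (int k = j; k <= h.length; k++) { // let ** consume 0..n labels
                if (match(p, i + 1, h, k)) return true;
            }
            return false;
        }
        if (j == h.length) return false;
        boolean same = p[i].equals("*") || p[i].equalsIgnoreCase(h[j]);
        return same && match(p, i + 1, h, j + 1);
    }

    // First matching route wins, like an ordered route list
    static Optional<String> route(List<Route> routes, String host, String path) {
        return routes.stream()
                .filter(r -> r.predicate().test(host, path))
                .map(Route::id)
                .findFirst();
    }

    public static void main(String[] args) {
        List<Route> routes = List.of(
                new Route("product_api_route", (host, path) -> path.startsWith("/api/product/")),
                new Route("gulimall_host_route", (host, path) -> hostMatches("**.gulimall.com", host)));
        System.out.println(route(routes, "gulimall.com", "/api/product/list").orElse("404")); // product_api_route
        System.out.println(route(routes, "gulimall.com", "/index.html").orElse("404"));       // gulimall_host_route
        System.out.println(route(routes, "example.com", "/index.html").orElse("404"));        // 404
    }
}
```

If the two routes were swapped, the API request would also be sent to gulimall_host_route.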

In short, nginx plays a gateway-like role here, mainly hiding the addresses of the real servers. External clients request the domain name, the domain name resolves to nginx, and nginx forwards the request to the real servers.


Added by banjax on Sun, 07 Nov 2021 04:16:25 +0200