Spring Boot 2.0 + ES 5 in practice: article content search

Contents of this chapter

  1. Article content search approach
  2. Search content segmentation
  3. Search query
  4. Filter conditions
  5. Paging and sorting conditions
  6. Summary

Reading time: 8 minutes. Excerpt: In life, there is no fixed measure of how much is enough, or of whether you are living well. What matters is what is in your heart: if you want too much, you won't be happy.

1. Article content search approach

The previous article covered how to integrate ES 5 with Spring Boot 2.0. This one is hands-on: a brief look at how to implement content search over articles and Q&A. The idea is simple:

  • Match on phrases, requiring a minimum number of matching clauses (minimum_should_match)
  • Where do the phrases come from? From IK word segmentation
  • Filter with filter clauses
  • Page and sort with Pageable

Calling a plain search directly tends to give unsatisfying results, because content search cares about whether words appear together, in sequence. The approach here is admittedly crude: segment the search content into many phrases, then match each phrase exactly. Suggestions for a better search method are very welcome.
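To make the difference concrete, here is a minimal in-memory sketch (plain Java, not ES code): a match-style check passes if any query token appears anywhere in the document, while a match_phrase-style check requires the tokens to appear contiguously and in order. The class and method names are hypothetical, and real ES also analyzes the text and supports a slop setting, which this toy ignores.

```java
import java.util.Collections;
import java.util.List;

/** Hypothetical toy model of term matching vs phrase matching over pre-tokenized text. */
final class PhraseMatchDemo {

    private PhraseMatchDemo() {
    }

    /** match-style: true if any query token occurs anywhere in the document. */
    static boolean matchesAnyTerm(List<String> docTokens, List<String> queryTokens) {
        for (String token : queryTokens) {
            if (docTokens.contains(token)) {
                return true;
            }
        }
        return false;
    }

    /** match_phrase-style with slop 0: true if the phrase tokens occur contiguously and in order. */
    static boolean matchesPhrase(List<String> docTokens, List<String> phraseTokens) {
        return !phraseTokens.isEmpty() && Collections.indexOfSubList(docTokens, phraseTokens) >= 0;
    }
}
```

For document tokens [new, shampoo, formula], the query [formula, new] passes the term check but fails the phrase check, which is why phrase matching keeps results closer to the searched content.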

Installing the IK analyzer plugin in ES is easy. First, download the release that matches your ES version from https://github.com/medcl/elasticsearch-analysis-ik/releases. Second, create a new folder ik under elasticsearch-5.5.3/plugins, unzip elasticsearch-analysis-ik-5.5.3.zip and copy the extracted files into elasticsearch-5.5.3/plugins/ik. Finally, restart ES.
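The plugin version has to match the ES version exactly: ES reads elasticsearch.version from the plugin's plugin-descriptor.properties and refuses to load a plugin built for a different version. A minimal pre-restart check could look like this, assuming the zip was extracted into plugins/ik as described; the class and method names are hypothetical.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Properties;

/** Hypothetical helper: check that an extracted IK plugin targets the installed ES version. */
final class IkPluginCheck {

    private IkPluginCheck() {
    }

    /** Load plugin-descriptor.properties from an extracted plugin directory, e.g. elasticsearch-5.5.3/plugins/ik. */
    static Properties loadDescriptor(Path pluginDir) throws IOException {
        Properties descriptor = new Properties();
        try (InputStream in = Files.newInputStream(pluginDir.resolve("plugin-descriptor.properties"))) {
            descriptor.load(in);
        }
        return descriptor;
    }

    /** True if the descriptor's elasticsearch.version equals the version of the ES node. */
    static boolean matchesEsVersion(Properties descriptor, String esVersion) {
        return esVersion.equals(descriptor.getProperty("elasticsearch.version"));
    }
}
```

Running this against plugins/ik before restarting catches a mismatched download (for example a 5.1.1 plugin on a 5.5.3 node) without waiting for ES to fail on startup.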

2. Search content segmentation

With IK installed, how do we call it?

Step 1: my search content arrives as terms joined by commas, so first split it on the commas.

Step 2: add each term itself to the search term list, because IK segmentation sometimes drops words (this looks like a bug in IK).

Step 3: use AnalyzeRequestBuilder to get the list of tokens produced by IK segmentation.

Step 4: refine the segmentation result. If every token is a multi-character word, keep them all; if multi-character words and single characters are mixed, keep only the words; if there are only single characters, keep them all.

The core implementation code is as follows:

    /**
     * Search content segmentation
     */
    protected List<String> handlingSearchContent(String searchContent) {

        List<String> searchTermResultList = new ArrayList<>();
        if (StringUtils.isEmpty(searchContent)) {
            return searchTermResultList;
        }
        // Split the comma-separated search content into individual search terms
        List<String> searchTermList = Arrays.asList(searchContent.split(SearchConstant.STRING_TOKEN_SPLIT));

        // For every search term, keep the term itself and add its IK segmentation result
        searchTermList.forEach(searchTerm -> {
            // Add the term itself, because IK segmentation can drop some words
            searchTermResultList.add(searchTerm);
            // Add the IK segmentation result for this term
            searchTermResultList.addAll(getIkAnalyzeSearchTerms(searchTerm));
        });

        return searchTermResultList;
    }

    /**
     * Call ES to get the result of IK word segmentation
     */
    protected List<String> getIkAnalyzeSearchTerms(String searchContent) {
        AnalyzeRequestBuilder ikRequest = new AnalyzeRequestBuilder(elasticsearchTemplate.getClient(),
                AnalyzeAction.INSTANCE, SearchConstant.INDEX_NAME, searchContent);
        ikRequest.setTokenizer(SearchConstant.TOKENIZER_IK_MAX);
        List<AnalyzeResponse.AnalyzeToken> ikTokenList = ikRequest.execute().actionGet().getTokens();

        // Collect the term text of every token
        List<String> searchTermList = new ArrayList<>();
        ikTokenList.forEach(ikToken -> searchTermList.add(ikToken.getTerm()));

        return handlingIkResultTerms(searchTermList);
    }

    /**
     * Refine the IK segmentation result. For example, 洗发水 (shampoo) is segmented into
     * 洗发水, 洗发, 洗, 发, 水 (shampoo, shampoo, wash, hair, water):
     * - all multi-character words: keep them all
     * - multi-character words mixed with single characters: keep only the words
     * - all single characters: keep them all
     */
    private List<String> handlingIkResultTerms(List<String> searchTermList) {
        boolean isPhrase = false;
        boolean isWord = false;
        for (String term : searchTermList) {
            if (term.length() > SearchConstant.SEARCH_TERM_LENGTH) {
                isPhrase = true;
            } else {
                isWord = true;
            }
        }

        // Mixed result: drop the single characters and keep only the multi-character words
        if (isWord && isPhrase) {
            List<String> phraseList = new ArrayList<>();
            searchTermList.forEach(term -> {
                if (term.length() > SearchConstant.SEARCH_TERM_LENGTH) {
                    phraseList.add(term);
                }
            });
            return phraseList;
        }

        return searchTermList;
    }
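Because the refinement rules in steps 1 to 4 do not depend on ES itself, one option is to inject the analyzer so the logic can be unit-tested without a running cluster. The sketch below is a variation, not the author's code: the analyzer is a hypothetical Function<String, List<String>> standing in for the AnalyzeRequestBuilder request (returning raw token terms), a comma and a length threshold of 1 are assumed values for SearchConstant.STRING_TOKEN_SPLIT and SearchConstant.SEARCH_TERM_LENGTH, and unlike the original it trims each term and removes duplicates (the term itself usually also comes back from IK).

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Function;

/** Hypothetical ES-free version of the segmentation pipeline, with the analyzer injected. */
final class SearchTermPipeline {

    // Assumed values of SearchConstant.STRING_TOKEN_SPLIT and SearchConstant.SEARCH_TERM_LENGTH
    static final String TOKEN_SPLIT = ",";
    static final int SEARCH_TERM_LENGTH = 1;

    private final Function<String, List<String>> analyzer;

    SearchTermPipeline(Function<String, List<String>> analyzer) {
        this.analyzer = analyzer;
    }

    List<String> handle(String searchContent) {
        // LinkedHashSet keeps insertion order and drops duplicates (a deviation from the original)
        Set<String> result = new LinkedHashSet<>();
        for (String searchTerm : searchContent.split(TOKEN_SPLIT)) {
            String trimmed = searchTerm.trim();
            if (trimmed.isEmpty()) {
                continue;
            }
            // Step 2: keep the term itself in case the analyzer drops it
            result.add(trimmed);
            // Steps 3 and 4: analyze the term, then refine the tokens
            result.addAll(filterTokens(analyzer.apply(trimmed)));
        }
        return new ArrayList<>(result);
    }

    /** Step 4: if multi-character words and single characters are mixed, keep only the words. */
    static List<String> filterTokens(List<String> tokens) {
        List<String> words = new ArrayList<>();
        for (String token : tokens) {
            if (token.length() > SEARCH_TERM_LENGTH) {
                words.add(token);
            }
        }
        boolean mixed = !words.isEmpty() && words.size() < tokens.size();
        return mixed ? words : tokens;
    }
}
```

In a test, the analyzer can be a map lookup; in the service it would wrap the ES analyze call.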

3. Search query

First, define an enum that lists the fields to search. ContentSearchTermEnum looks like this:

import lombok.AllArgsConstructor;

@AllArgsConstructor
public enum ContentSearchTermEnum {

    // title
    TITLE("title"),
    // content
    CONTENT("content");

    /**
     * Name of the index field to search
     */
    private final String name;

    public String getName() {
        return name;
    }

}

For every search term and every search field, add a match_phrase should clause, then set minimum_should_match to 1 so that at least one phrase has to match. The core code is as follows:

    /**
     * Build the query: one match_phrase should clause per (search term, search field) pair
     */
    private void buildMatchQuery(BoolQueryBuilder queryBuilder, List<String> searchTermList) {
        for (String searchTerm : searchTermList) {
            for (ContentSearchTermEnum searchTermEnum : ContentSearchTermEnum.values()) {
                queryBuilder.should(QueryBuilders.matchPhraseQuery(searchTermEnum.getName(), searchTerm));
            }
        }
        // Require at least this many should clauses to match (1 in this article)
        queryBuilder.minimumShouldMatch(SearchConstant.MINIMUM_SHOULD_MATCH);
    }
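Why set minimum_should_match explicitly? In ES 5.x, when a bool query also has must or filter clauses, its should clauses are optional by default; without must or filter clauses, at least one should clause has to match. Since buildFilterQuery adds filter clauses to the same bool query, leaving the minimum unset would let through documents that match none of the phrases. The toy function below models only this rule in plain Java; it is not ES code, and the class and method names are hypothetical.

```java
/** Hypothetical toy model of ES 5.x bool matching; assumes the query has at least one should clause. */
final class BoolShouldModel {

    private BoolShouldModel() {
    }

    /**
     * @param matchedShould      how many should clauses the document matches
     * @param hasMustOrFilter    whether the bool query also has must or filter clauses
     * @param mustAndFiltersPass whether the document passes all must and filter clauses
     * @param minimumShouldMatch explicit minimum_should_match, or null if not set
     */
    static boolean matches(int matchedShould, boolean hasMustOrFilter,
                           boolean mustAndFiltersPass, Integer minimumShouldMatch) {
        if (!mustAndFiltersPass) {
            return false;
        }
        // Default: should clauses are optional next to must/filter clauses, otherwise one must match
        int required = minimumShouldMatch != null ? minimumShouldMatch : (hasMustOrFilter ? 0 : 1);
        return matchedShould >= required;
    }
}
```

The case matches(0, true, true, null) returning true is exactly what the explicit setting guards against: with a type filter and no minimum, every document of that type would come back, merely ranked lower.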

4. Filter conditions

Matching alone is often not enough. A common requirement is to search within a category, for example an e-commerce site searching for products under one brand. For that you build filters, which correspond to the AND/OR conditions in the WHERE clause of a SQL statement. In ES, filters are added with the filter method; filter clauses restrict the results without affecting the relevance score. The code is as follows:

    /**
     * Build filter criteria; each group is added as a separate filter, so the groups are ANDed
     */
    private void buildFilterQuery(BoolQueryBuilder boolQueryBuilder, Integer type, String category) {
        // Content type filtering; lenient(true) ignores format-based errors, e.g. a text value for a numeric field
        if (type != null) {
            BoolQueryBuilder typeFilterBuilder = QueryBuilders.boolQuery();
            typeFilterBuilder.should(QueryBuilders.matchQuery(SearchConstant.TYPE_NAME, type).lenient(true));
            boolQueryBuilder.filter(typeFilterBuilder);
        }

        // Content category filtering
        if (!StringUtils.isEmpty(category)) {
            BoolQueryBuilder categoryFilterBuilder = QueryBuilders.boolQuery();
            categoryFilterBuilder.should(QueryBuilders.matchQuery(SearchConstant.CATEGORY_NAME, category).lenient(true));
            boolQueryBuilder.filter(categoryFilterBuilder);
        }
    }

type is the top-level category and category is the subcategory, so both levels can be filtered. But what if you need results where type = 1 or type = 2? The code is simple:

typeFilterBuilder
    .should(QueryBuilders.matchQuery(SearchConstant.TYPE_NAME, 1).lenient(true))
    .should(QueryBuilders.matchQuery(SearchConstant.TYPE_NAME, 2).lenient(true));

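Chaining two should calls on the same BoolQueryBuilder gives OR, like OR in a SQL WHERE clause: a bool query that has only should clauses requires at least one of them to match. AND is achieved by adding separate BoolQueryBuilders as filters on the outer query, as buildFilterQuery does for type and category.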
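The same structure can be pictured with plain Java predicates: an inner bool with only should clauses behaves like an OR of its conditions, and the outer query's list of filters behaves like an AND of those groups. This is an in-memory analogy, not ES code; the Content class, its fields and the method names are hypothetical.

```java
import java.util.Arrays;
import java.util.function.Predicate;

/** Hypothetical in-memory analogy for the filter structure: should = OR, filter = AND. */
final class FilterAnalogy {

    static final class Content {
        final int type;
        final String category;

        Content(int type, String category) {
            this.type = type;
            this.category = category;
        }
    }

    private FilterAnalogy() {
    }

    /** Like an inner bool with only should clauses: at least one condition must hold. */
    @SafeVarargs
    static <T> Predicate<T> anyOf(Predicate<T>... shoulds) {
        return item -> Arrays.stream(shoulds).anyMatch(p -> p.test(item));
    }

    /** Like the outer bool's filter list: every group must hold. */
    @SafeVarargs
    static <T> Predicate<T> allOf(Predicate<T>... filters) {
        return item -> Arrays.stream(filters).allMatch(p -> p.test(item));
    }

    /** WHERE (type = 1 OR type = 2) AND category = ? */
    static Predicate<Content> typeOneOrTwoInCategory(String category) {
        Predicate<Content> typeGroup = anyOf(c -> c.type == 1, c -> c.type == 2);
        Predicate<Content> categoryGroup = anyOf(c -> category.equals(c.category));
        return allOf(typeGroup, categoryGroup);
    }
}
```

A document with type 3, or with type 1 in another category, is rejected, just as it would be by the two filters in ES.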

5. Paging and sorting conditions

The paging and sorting code is simple:

    @Override
    public PageBean searchContent(ContentSearchBean contentSearchBean) {

        Integer pageNumber = contentSearchBean.getPageNumber();
        Integer pageSize = contentSearchBean.getPageSize();

        PageBean<ContentEntity> resultPageBean = new PageBean<>();
        resultPageBean.setPageNumber(pageNumber);
        resultPageBean.setPageSize(pageSize);

        // Build search phrase
        String searchContent = contentSearchBean.getSearchContent();
        List<String> searchTermList = handlingSearchContent(searchContent);

        // Build query criteria
        BoolQueryBuilder boolQueryBuilder = QueryBuilders.boolQuery();
        buildMatchQuery(boolQueryBuilder, searchTermList);

        // Build filter criteria
        buildFilterQuery(boolQueryBuilder, contentSearchBean.getType(), contentSearchBean.getCategory());

        // Build paging and sorting conditions
        Pageable pageable = PageRequest.of(pageNumber, pageSize);
        if (!StringUtils.isEmpty(contentSearchBean.getOrderName())) {
            pageable = PageRequest.of(pageNumber, pageSize, Sort.Direction.DESC, contentSearchBean.getOrderName());
        }
        SearchQuery searchQuery = new NativeSearchQueryBuilder().withPageable(pageable)
                .withQuery(boolQueryBuilder).build();

        // search
        LOGGER.info("\n ContentServiceImpl.searchContent() [" + searchContent
                + "] \n DSL  = \n " + searchQuery.getQuery().toString());
        Page<ContentEntity> contentPage = contentRepository.search(searchQuery);

        resultPageBean.setResult(contentPage.getContent());
        resultPageBean.setTotalCount((int) contentPage.getTotalElements());
        // Page computes the total pages as a ceiling division (total / pageSize + 1 overcounts on exact multiples)
        resultPageBean.setTotalPage(contentPage.getTotalPages());
        return resultPageBean;
    }

The Pageable object carries the paging parameters and, optionally, the sort field and sort direction (DESC or ASC). Note that PageRequest page numbers are zero-based.
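Two paging details are worth checking with numbers. Spring Data's PageRequest is zero-based, so if the client sends 1-based page numbers (an assumption; the article does not say which convention ContentSearchBean uses) they need converting. And the total page count is a ceiling division: a formula such as total / pageSize + 1 overcounts whenever the total is an exact multiple of the page size, and reports one page for zero results. A minimal sketch with hypothetical helper names:

```java
/** Hypothetical paging helpers; assumes the client sends 1-based page numbers. */
final class PagingMath {

    private PagingMath() {
    }

    /** Convert a 1-based page number from the client to Spring Data's 0-based page index. */
    static int toZeroBasedPage(int oneBasedPage) {
        return Math.max(oneBasedPage - 1, 0);
    }

    /** Ceiling division: number of pages needed to hold totalCount items. */
    static int totalPages(long totalCount, int pageSize) {
        if (pageSize <= 0) {
            throw new IllegalArgumentException("pageSize must be positive");
        }
        return (int) ((totalCount + pageSize - 1) / pageSize);
    }
}
```

For example, totalPages(20, 10) is 2 and totalPages(21, 10) is 3, whereas total / pageSize + 1 gives 3 for both.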

6. Summary

The approach is fairly simple. If you have a better implementation, you are welcome to share and discuss it.

Added by lvitup on Sat, 18 Dec 2021 04:28:18 +0200