ES series tutorial 02: A one-day tour of Elasticsearch

This article was first published on the Geek Barracks official account.

The best way to learn Elasticsearch (hereinafter referred to as ES) is to practice. Throughout this series of tutorials, I will use a small project, an "online bookstore", in every chapter. The background of the project is simple: each book has seven attributes (ISBN, title, description, price, author, publisher and stock), the book information is stored in ES, and users can search for the book they want to buy by any of these attributes.

To give you an intuitive feel for ES as a whole, this section takes a quick tour of its main capabilities. Some concepts may be confusing at first, but that doesn't matter: everything touched on here is covered in detail in later chapters. For now, you only need a first impression of the concepts in this section.

Interact with ElasticSearch

To use the services provided by ES, you must first know how to send it instructions. The ES server exposes a set of RESTful APIs: clients send data or instructions to them over HTTP, serialized as JSON.

As is well known, web applications designed in the RESTful style use four HTTP verbs, GET, POST, PUT and DELETE, to operate on server-side resources. GET retrieves a resource, POST creates a new resource, PUT updates a resource and DELETE deletes a resource.

For example, the following curl command sends an HTTP POST request to ES to save the information of a book:

curl -X POST -H "Content-Type: application/json" -d \
'{"title": "Zero Basics Java", "description": "self-taught JAVA Introduction to", "price": 66.88}' \
http://localhost:9200/book_store/_doc?pretty

The execution result of this curl command is shown in the figure below. Any programming language or tool that can speak HTTP can call the RESTful APIs provided by ElasticSearch.
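As an illustration of calling the API from a programming language, here is a minimal Python sketch that builds the same POST request with the standard library only. The host and port are taken from the curl command above; the helper name build_index_request is hypothetical, and the request is only constructed, not sent, so the snippet runs without an ES server.

```python
import json
import urllib.request

# Assumption: ES listens on localhost:9200, as in the curl command above.
ES_URL = "http://localhost:9200"


def build_index_request(index, document):
    """Build (but do not send) the HTTP request that saves one document."""
    body = json.dumps(document).encode("utf-8")
    return urllib.request.Request(
        url=f"{ES_URL}/{index}/_doc?pretty",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


book = {
    "title": "Zero Basics Java",
    "description": "self-taught JAVA Introduction to",
    "price": 66.88,
}
request = build_index_request("book_store", book)
print(request.get_method(), request.full_url)

# To actually send it (requires a running ES server):
# with urllib.request.urlopen(request) as response:
#     print(response.read().decode("utf-8"))
```

Keeping construction separate from sending makes it easy to inspect exactly what would go over the wire before talking to a real cluster.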

Calling ElasticSearch through a long curl command every time is somewhat cumbersome. Kibana provides a more convenient way.

The left navigation bar of Kibana has a menu called Dev Tools. Its position is shown in the figure below:

Dev Tools lets us operate ES very conveniently. For example, the operation in Dev Tools in the figure below does exactly the same thing as the curl command above: it saves the information of a book in ES.

Compared with the curl command, Kibana Dev Tools lets us omit the Content-Type declaration, the IP address of the ES server and its listening port. Kibana already knows this information and adds it automatically, so we don't need to specify it explicitly. In addition, Kibana formats the commands we enter neatly. Clicking the arrow (Click to send request) sends the command directly to ES, and the response is displayed on the right.
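To make concrete what Dev Tools fills in for us, the following Python sketch expands a Dev Tools style request into the pieces a plain HTTP client would need. It is only an illustration: the function name expand_console_request and the default host and port are assumptions, and this is not how Kibana is implemented internally.

```python
def expand_console_request(console_text, host="localhost", port=9200):
    """Turn a Dev Tools style request into the pieces an HTTP client needs."""
    # The first line holds the verb and the path; everything after is the body.
    first_line, _, body = console_text.strip().partition("\n")
    method, path = first_line.split(maxsplit=1)
    body = body.strip()
    return {
        "method": method.upper(),
        "url": f"http://{host}:{port}{path}",
        # The header is only needed when a JSON body is sent.
        "headers": {"Content-Type": "application/json"} if body else {},
        "body": body,
    }


console = """POST /book_store/_doc
{"title": "Zero Basics Java", "price": 66.88}"""

expanded = expand_console_request(console)
print(expanded["method"], expanded["url"])
print(expanded["headers"])
```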

All the following examples are written for Kibana, so they do not explicitly specify HTTP header information or the server address.

Basic concepts

Before moving on, let's take a closer look at the response returned by ES:

{
    "_index" : "book_store",
    "_type" : "_doc",
    "_id" : "E2lww30BAtxnt_qoQMJz",
    "_version" : 1,
    "result" : "created",
    "_shards" : {
        "total" : 2,
        "successful" : 2,
        "failed" : 0
    },
    "_seq_no" : 15,
    "_primary_term" : 2
}

The response from ES is also JSON. Its result field tells us that the information of a book was saved in ES successfully. Three other fields deserve attention: _index, _type and _id, which represent the index, the type and the document ID respectively.

  • Document: the most basic unit for reading and writing data. In this example, a book corresponds to a document. Similarly, in an e-commerce system a product corresponds to a document, and in a personnel system an employee corresponds to a document. Each document has an _id field that identifies it; its value is unique among documents of the same type.
  • Index: a container for documents. It is a collection of similar documents and the logical unit of document storage.
  • Type: in earlier versions of ES, an index could contain multiple types, and each type contained multiple documents. In version 7.0, however, an index can have only one type, and its name must be _doc. So at present, type has no substantive meaning.

Here is a rough analogy between ElasticSearch concepts and relational database concepts to help you understand:
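The three fields can be read directly from the parsed response. The sketch below uses the response shown earlier in this section (with the _shards object written on one line); the helper name document_address is hypothetical.

```python
import json

# The response returned by ES for the POST request in this section.
response_text = """
{
    "_index" : "book_store",
    "_type" : "_doc",
    "_id" : "E2lww30BAtxnt_qoQMJz",
    "_version" : 1,
    "result" : "created",
    "_shards" : {"total" : 2, "successful" : 2, "failed" : 0},
    "_seq_no" : 15,
    "_primary_term" : 2
}
"""


def document_address(response):
    """Return the (index, type, id) triple that locates a document."""
    return response["_index"], response["_type"], response["_id"]


response = json.loads(response_text)
index, doc_type, doc_id = document_address(response)
print(f"/{index}/{doc_type}/{doc_id}", response["result"])
```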

ElasticSearch   | Relational database
Index + type    | Table
Document        | Row
Document field  | Column
Document ID     | Primary key

We use the index book_store to store the information of all books. The following request saves the information of a book under the _doc type of the book_store index (the type can only be _doc):

POST /book_store/_doc
{
    "ISBN": "9787187651807",
    "title": "Zero Basics Java", 
    "description": "Zero basic self-study JAVA Introduction to programming", 
    "price": 66.88, 
    "author": "Poype",
    "publisher": "Geek barracks", 
    "stock": 55
}

A relational database requires the table to be created before an insert can be performed. ES does not require the index to exist before a document is written: if the index does not exist at write time, ES creates it automatically by default.

Because no ID was provided when saving the document, ES automatically generates a unique ID and uses it as the value of the _id field.
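The difference between letting ES generate the ID and supplying one yourself shows up only in the verb and the path. The sketch below illustrates this; the helper name index_endpoint is hypothetical, and using the ISBN as a document ID is just an example choice, not something this tutorial does.

```python
from urllib.parse import quote


def index_endpoint(index, doc_id=None):
    """Return the HTTP method and path for saving one document.

    Without an ID, POST lets ES generate one; with an ID, PUT stores the
    document under exactly that ID.
    """
    if doc_id is None:
        return "POST", f"/{index}/_doc"
    # Percent-encode the ID so unusual characters cannot break the path.
    return "PUT", f"/{index}/_doc/{quote(str(doc_id), safe='')}"


auto_id = index_endpoint("book_store")
explicit_id = index_endpoint("book_store", "9787187651807")
print(auto_id)
print(explicit_id)
```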

Get document data

You can retrieve a document by its document ID: send a request to the ES server using the HTTP GET method, specifying the index and the document ID. Note that the document ID here was generated by ES, so it will be different on every run.

GET /book_store/_doc/FWmJw30BAtxnt_qoj8IE

After receiving the request, ES will return the following Response:

{
    "_index" : "book_store",
    "_type" : "_doc",
    "_id" : "FWmJw30BAtxnt_qoj8IE",
    "_version" : 1,
    "_seq_no" : 16,
    "_primary_term" : 2,
    "found" : true,
    "_source" : {
        "ISBN" : "9787187651807",
        "title" : "Zero Basics Java",
        "description" : "Zero basic self-study JAVA Introduction to programming",
        "price" : 66.88,
        "author" : "Poype",
        "publisher" : "Geek barracks",
        "stock" : 55
    }
}

The response contains metadata describing the document, such as the _index, _type and _id fields already described. In addition, a found field of true indicates that the document was found. The actual document data is under the _source field.

If you initiate a query with a nonexistent document ID, you will receive the following Response, where the found field is false, indicating that there is no document with the specified ID.

{
    "_index" : "book_store",
    "_type" : "_doc",
    "_id" : "FWmJw30BAtxnt_qoj81E",
    "found" : false
}
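Client code usually branches on the found field before touching _source. A minimal sketch, using trimmed copies of the two responses above (the helper name extract_source is hypothetical):

```python
def extract_source(response):
    """Return the stored document, or None when ES reports found = false."""
    if response.get("found"):
        return response["_source"]
    return None


hit = {
    "_index": "book_store",
    "_type": "_doc",
    "_id": "FWmJw30BAtxnt_qoj8IE",
    "found": True,
    "_source": {"title": "Zero Basics Java", "price": 66.88, "stock": 55},
}
miss = {
    "_index": "book_store",
    "_type": "_doc",
    "_id": "FWmJw30BAtxnt_qoj81E",
    "found": False,
}

print(extract_source(hit))
print(extract_source(miss))
```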

Modify document data

You can modify an existing document based on the document ID. Send a Request to modify a document using the HTTP PUT method:

PUT /book_store/_doc/FWmJw30BAtxnt_qoj8IE
{
    "ISBN": "9787187651807",
    "title": "Zero Basics Java", 
    "description": "Zero basic self-study JAVA Introduction to programming", 
    "price": 55.66, 
    "author": "Poype",
    "publisher": "Geek barracks", 
    "stock": 55
}

We revised the price of the book from 66.88 to 55.66, and other information remained unchanged. After receiving the request, ES returns the following results:

{
    "_index" : "book_store",
    "_type" : "_doc",
    "_id" : "FWmJw30BAtxnt_qoj8IE",
    "_version" : 2,
    "result" : "updated",
    "_shards" : {
        ...
    },
    "_seq_no" : 17,
    "_primary_term" : 2
}

The result field is updated, indicating that the document was updated successfully. Also note that the values of the _version and _seq_no fields have both increased. These two fields are used for concurrency control; their exact meaning is described in later chapters.

Delete document

You can delete a document using the HTTP DELETE method, for example the document from the previous section:
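Note that the PUT request above repeats every field, including the unchanged ones, because a PUT to an existing ID replaces the stored document as a whole. A client that only wants to change the price can therefore read the current _source, apply its change and send the full result back. The sketch below shows only the merge step; the helper name updated_document is hypothetical.

```python
def updated_document(current_source, changes):
    """Return a complete new document: the old fields plus the changes."""
    new_document = dict(current_source)  # copy, so the input stays untouched
    new_document.update(changes)
    return new_document


current = {
    "ISBN": "9787187651807",
    "title": "Zero Basics Java",
    "description": "Zero basic self-study JAVA Introduction to programming",
    "price": 66.88,
    "author": "Poype",
    "publisher": "Geek barracks",
    "stock": 55,
}

# This dictionary would be the body of PUT /book_store/_doc/<id>.
put_body = updated_document(current, {"price": 55.66})
print(put_body["price"], put_body["stock"])
```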

DELETE /book_store/_doc/FWmJw30BAtxnt_qoj8IE

After receiving the request, ES returns the following Response:

{
    "_index" : "book_store",
    "_type" : "_doc",
    "_id" : "FWmJw30BAtxnt_qoj8IE",
    "_version" : 8,
    "result" : "deleted",
    "_shards" : {
        ...
    },
    "_seq_no" : 23,
    "_primary_term" : 2
}

The result field is deleted, indicating that the document has been successfully deleted.
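In client code, the same check can be made on the parsed response. A minimal sketch follows; the helper name was_deleted is hypothetical, and the second response assumes that ES reports a result of not_found when asked to delete an ID that does not exist.

```python
def was_deleted(response):
    """True only when ES reports that the document was actually removed."""
    return response.get("result") == "deleted"


delete_response = {
    "_index": "book_store",
    "_type": "_doc",
    "_id": "FWmJw30BAtxnt_qoj8IE",
    "_version": 8,
    "result": "deleted",
    "_seq_no": 23,
    "_primary_term": 2,
}
# Assumed shape of the response for an ID that does not exist.
missing_response = {"_index": "book_store", "_type": "_doc", "result": "not_found"}

print(was_deleted(delete_response))
print(was_deleted(missing_response))
```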

Full text search

You may be disappointed that the CRUD operations introduced so far are well supported by any database. So what is the advantage of ES? Don't worry. Let's look at what ElasticSearch is good at compared with traditional databases: full-text search.

To illustrate this better, let's add several more books to ES:

POST /book_store/_doc/
{
    "ISBN": "9787111213826",
    "title": "JAVA Programming thought",
    "description": "JAVA Learning classics,Palace level works",
    "price": 89.88,
    "author": "Bruce Eckel",
    "publisher": "Machinery Industry Press",
    "stock": 231
}

POST /book_store/_doc/
{
    "ISBN": "9085115891807",
    "title": "Python Programming, from introduction to practice",
    "description": "Zero Basics Python Programming tutorial books",
    "price": 82.30,
    "author": "Eric Matthes",
    "publisher": "People's Posts and Telecommunications Publishing House",
    "stock": 121
}

POST /book_store/_doc/
{
    "ISBN": "9787115449153",
    "title": "Elasticsearch actual combat",
    "description": "Elasticsearch Introductory tutorial books",
    "price": 79,
    "author": "Radu Gheorghe",
    "publisher": "People's Posts and Telecommunications Publishing House",
    "stock": 87
}

POST /book_store/_doc/
{
    "ISBN": "9787115472588",
    "title": "Brother bird's Linux Private dishes",
    "description": "apply Linux System application, development and operation and maintenance personnel",
    "price": 98.80,
    "author": "Brother bird",
    "publisher": "People's Posts and Telecommunications Publishing House",
    "stock": 77
}

Suppose a customer wants to search for books about Java. With a relational database, you might execute SQL like the following to find them:

select * from book_store where title like '%Java%';

This SQL performs poorly because the leading wildcard typically forces a scan of the entire table. It is also not very tolerant: for example, with a case-sensitive comparison, titles containing JAVA or java are not retrieved.

With ElasticSearch, you can instead use the following query to search for Java-related books:

GET /book_store/_search
{
    "query": {
        "match": {
            "title": "java"
        }
    }
}

ElasticSearch returns the following results after receiving the request:

{
    "took" : 619,
    "timed_out" : false,
    "_shards" : {
        "total" : 1,
        "successful" : 1,
        "skipped" : 0,
        "failed" : 0
    },
    "hits" : {
        "total" : {
            "value" : 2,
            "relation" : "eq"
        },
        "max_score" : 0.9278223,
        "hits" : [
            {
                "_index" : "book_store",
                "_type" : "_doc",
                "_id" : "GGmpw30BAtxnt_qoXMJq",
                "_score" : 0.9278223,
                "_source" : {
                    "ISBN" : "9787111213826",
                    "title" : "JAVA Programming thought",
                    "description" : "JAVA Learning classics,Palace level works",
                    "price" : 89.88,
                    "author" : "Bruce Eckel",
                    "publisher" : "Machinery Industry Press",
                    "stock" : 231
                }
            },
            {
                "_index" : "book_store",
                "_type" : "_doc",
                "_id" : "FWmJw30BAtxnt_qoj8IE",
                "_score" : 0.9278223,
                "_source" : {
                    "ISBN" : "9787187651807",
                    "title" : "Zero Basics Java",
                    "description" : "Zero basic self-study JAVA Introduction to programming",
                    "price" : 55.66,
                    "author" : "Poype",
                    "publisher" : "Geek barracks",
                    "stock" : 55
                }
            }
        ]
    }
}

You can see that the response contains the results we want. The match is tolerant: documents whose title contains either JAVA or Java are found. Performance is also very good: even with a large amount of data, ElasticSearch can quickly find what users are looking for.
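Why does a query for "java" find both "JAVA" and "Java", and why is no full scan needed? Roughly speaking, text is split into normalized (for example, lowercased) tokens at write time, and each token points to the documents that contain it. The toy Python sketch below illustrates only this idea on the five titles from this section; the numeric IDs and function names are hypothetical, and the real implementation inside ES is far more sophisticated.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Split text into lowercase word tokens (a rough stand-in for an analyzer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(titles):
    """Map each token to the set of document IDs whose title contains it."""
    index = defaultdict(set)
    for doc_id, title in titles.items():
        for token in tokenize(title):
            index[token].add(doc_id)
    return index


titles = {
    1: "Zero Basics Java",
    2: "JAVA Programming thought",
    3: "Python Programming, from introduction to practice",
    4: "Elasticsearch actual combat",
    5: "Brother bird's Linux Private dishes",
}

inverted = build_inverted_index(titles)

# A lookup by token finds both spellings without scanning every title.
token_matches = sorted(inverted.get("java", set()))
# A case-sensitive substring scan, like the SQL LIKE example, misses "JAVA".
substring_matches = [doc_id for doc_id, title in titles.items() if "Java" in title]

print(token_matches)
print(substring_matches)
```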

More complex search

Suppose a customer wants to buy a book for learning Java but has a limited budget of at most 70 yuan. He wants to retrieve Java-related books whose price is less than 70 yuan. Such search scenarios are common in daily life. This requirement can be met with the following command, where "lt" means "less than":

GET /book_store/_search
{
    "query": {
        "bool": {
            "filter": {
                "range": {
                    "price": {
                        "lt": 70
                    }
                }
            },
            "must": {
                "match": {
                    "title": "java"
                }
            }
        }
    }
}

After receiving the request, ES returns the following results:

{
    "took" : 320,
    "timed_out" : false,
    "_shards" : {
        "total" : 1,
        "successful" : 1,
        "skipped" : 0,
        "failed" : 0
    },
    "hits" : {
        "total" : {
            "value" : 1,
            "relation" : "eq"
        },
        "max_score" : 0.9278223,
        "hits" : [
            {
                "_index" : "book_store",
                "_type" : "_doc",
                "_id" : "FWmJw30BAtxnt_qoj8IE",
                "_score" : 0.9278223,
                "_source" : {
                    "ISBN" : "9787187651807",
                    "title" : "Zero Basics Java",
                    "description" : "Zero basic self-study JAVA Introduction to programming",
                    "price" : 55.66,
                    "author" : "Poype",
                    "publisher" : "Geek barracks",
                    "stock" : 55
                }
            }
        ]
    }
}
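When the keyword and the budget come from user input, the request body is usually assembled in code rather than typed by hand. The following sketch builds the same bool query as a Python dictionary; the function name title_under_price_query is hypothetical.

```python
import json


def title_under_price_query(keyword, max_price):
    """Build a bool query: title must match keyword, price must be < max_price."""
    return {
        "query": {
            "bool": {
                # "must" contributes to the relevance score of each hit.
                "must": {"match": {"title": keyword}},
                # "filter" only includes or excludes documents.
                "filter": {"range": {"price": {"lt": max_price}}},
            }
        }
    }


query = title_under_price_query("java", 70)
print(json.dumps(query, indent=4))
```

The printed JSON can be pasted into Dev Tools as the body of the search request.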

Highlight matching fragments

Suppose another customer wants to find a classic. We can meet this requirement by searching for documents whose description contains the word "classic".

However, the customer not only wants the search results, but also wants to see at a glance why each result matches. A common approach is to highlight the keywords in the search results so that users can spot them immediately.

GET /book_store/_search
{
    "query": {
        "match": {
            "description": "classic"
        }
    },
    "highlight": {
        "fields": {
            "description": {}
        }
    }
}

We use highlight in the search command to specify the fields to highlight. ElasticSearch returns the following results after receiving the request:

{
    ...
    "hits" : {
        "total" : {
            "value" : 1,
            "relation" : "eq"
        },
        "max_score" : 2.8796844,
        "hits" : [
            {
                "_index" : "book_store",
                "_type" : "_doc",
                "_id" : "GGmpw30BAtxnt_qoXMJq",
                "_score" : 2.8796844,
                "_source" : {
                    "ISBN" : "9787111213826",
                    "title" : "JAVA Programming thought",
                    "description" : "JAVA Learning classics,Palace level works",
                    "price" : 89.88,
                    "author" : "Bruce Eckel",
                    "publisher" : "Machinery Industry Press",
                    "stock" : 231
                },
                "highlight" : {
                    "description" : [
                        "JAVA study<em>through</em><em>Canon</em>,Palace level works"
                    ]
                }
            }
        ]
    }
}

Compared with the previous search results, this response also contains a highlight field, in which HTML <em> tags mark the matching keywords. However, this highlighting is not quite appropriate: the two halves of the word for "classic" in the original description were wrapped in separate <em> tags (shown above as <em>through</em><em>Canon</em>), whereas wrapping the whole word in a single <em> tag would be more reasonable. Just keep this in mind for now. When we study analyzers later, we will see how to fix this problem.
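To see the problem concretely, the sketch below pulls the highlighted pieces out of the fragment returned above and then joins directly adjacent <em> tags. The function names are hypothetical, and the joining step is only a client-side illustration of what a better result would look like; it is not the analyzer-based fix discussed later in the series.

```python
import re


def highlighted_terms(fragment):
    """Return the pieces of text wrapped in <em> tags."""
    return re.findall(r"<em>(.*?)</em>", fragment)


def merge_adjacent_highlights(fragment):
    """Join <em> spans that touch each other into a single span."""
    return fragment.replace("</em><em>", "")


fragment = "JAVA study<em>through</em><em>Canon</em>,Palace level works"
merged = merge_adjacent_highlights(fragment)

print(highlighted_terms(fragment))
print(merged)
```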

Summary

Through a small example, this chapter takes you through some basic concepts in ElasticSearch. We learned how to interact with ElasticSearch server, how to operate ElasticSearch more conveniently through Kibana, how to perform basic CRUD operations, and how to realize simple search requirements. The following chapters will formally introduce the technical details of ElasticSearch.


Keywords: Java Big Data ElasticSearch search engine

Added by delphi123 on Wed, 05 Jan 2022 03:04:48 +0200