To study Elasticsearch and Lucene in depth, this article starts from the creation of an Elasticsearch index and runs a small experiment on two important APIs, Refresh and Flush, to understand how Elasticsearch indexes a document after receiving it. Once the whole process is clear, we will go further and examine the in-memory data structures and on-disk file formats of a Lucene index. Note: Elasticsearch and Kibana 7.10.2 are used throughout this article.
1. Experimental environment
1.1 index creation
Create an index named lucene-learning, set the number of shards to 1, turn off replicas, and effectively disable automatic refresh and translog syncing (refresh is disabled outright with refresh_interval: -1, and the translog sync interval is stretched to one hour), so that these operations only happen when we trigger them manually.
PUT lucene-learning
{
  "settings": {
    "index": {
      "number_of_shards": 1,
      "number_of_replicas": 0,
      "refresh_interval": -1,
      "translog": {
        "sync_interval": "3600s"
      }
    }
  },
  "mappings": {
    "properties": {
      "name": { "type": "text" },
      "age": { "type": "integer" }
    }
  }
}
1.2 data directory
Looking at Elasticsearch's data directory, we can see that each index's UUID corresponds to a folder name. Because the lucene-learning index has only a single shard, shard 0, there is just one folder named 0, which contains two directories: index and translog. The former corresponds to a Lucene index (an Elasticsearch shard is exactly one Lucene index). That folder is managed entirely by Lucene; Elasticsearch never writes files there directly, and all interaction goes through the Lucene API. The latter is the folder where Elasticsearch stores its translog. At this point neither directory has any real content; we will inspect them again later.
GET _cat/indices?v

health status index           uuid                   pri rep docs.count docs.deleted store.size pri.store.size
green  open   lucene-learning yGvjXskJQlql1-b2Sx2MHA   1   0          0            0       208b           208b

$ pwd
elasticsearch-7.10.2/data/nodes/0
$ tree -I '_state|5Uwl-yHUTjiCMn9XrZgM-g' -L 5
.
├── indices
│   └── yGvjXskJQlql1-b2Sx2MHA
│       └── 0
│           ├── index
│           │   ├── segments_2
│           │   └── write.lock
│           └── translog
│               ├── translog-2.tlog
│               └── translog.ckp
└── node.lock

$ cat index/segments_2
?�segments translog_uuidh8Y1UqnyT9ePnvkI4mMQ3Qlocal_checkpoint-1 history_uuidvic9U3IUSvWWP0rDi-fEpA max_seq_no-1max_unsafe_auto_id_timestamp-1�(��38kZ

$ cat translog/translog-2.tlog
?�translogh8Y1UqnyT9ePnvkI4mMQ3Qy�roW
Note: Elasticsearch performs some internal operations of its own, which is why Lucene's segments generation is already 2. This generation number does not appear to map directly onto the numbering of the actual segments (_0, _1, ...).
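As a side check, Lucene can read a commit point back programmatically. Below is a minimal sketch, assuming Lucene 8.7 (the version bundled with Elasticsearch 7.10.2); the class name is made up for illustration and the path must point at your own shard's index directory. It prints the commit generation (the N in segments_N) and the names of the segments the commit references, which are numbered independently:

import java.nio.file.Paths;
import org.apache.lucene.index.SegmentCommitInfo;
import org.apache.lucene.index.SegmentInfos;
import org.apache.lucene.store.FSDirectory;

// Reads the latest commit point (segments_N) from a shard's index directory
// and prints the commit generation alongside the segment names it references.
public class ReadCommitPoint {
    public static void main(String[] args) throws Exception {
        try (FSDirectory dir = FSDirectory.open(
                Paths.get("elasticsearch-7.10.2/data/nodes/0/indices/yGvjXskJQlql1-b2Sx2MHA/0/index"))) {
            SegmentInfos infos = SegmentInfos.readLatestCommit(dir);
            System.out.println("commit generation: " + infos.getGeneration()); // the N in segments_N
            for (SegmentCommitInfo sci : infos) {
                System.out.println("segment: " + sci.info.name); // _0, _1, ...
            }
        }
    }
}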
2. Experimental design
2.1 steps
The steps of the whole experiment are as follows (a Lucene-level sketch of the same flow appears right after the list):
- First, index a document
- The document's content is written to the in-memory buffer and the translog at the same time
- Manually execute refresh
- Elasticsearch creates segment 0
- Thanks to Lucene's near-real-time search, the document becomes searchable at this point
- Index a second document and refresh again
- This produces segment 1, and the new request is appended to the translog file
- Manually execute flush
- Segments 0 and 1 are both fsynced to disk, the commit point in the segments_N file is updated, and the translog is cleared
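Since Elasticsearch drives all of this through the Lucene API, the flow above can be approximated directly in Lucene. The following is a minimal sketch, not Elasticsearch's actual code path (Lucene 8.7 assumed; class and field names are made up for illustration): addDocument buffers the document in memory, opening a near-real-time reader plays the role of refresh, and commit() plays the role of flush.

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.ByteBuffersDirectory;
import org.apache.lucene.store.Directory;

// Lucene-level sketch of the experiment: buffer -> NRT reader ("refresh") -> commit ("flush").
public class RefreshFlushSketch {
    public static void main(String[] args) throws Exception {
        try (Directory dir = new ByteBuffersDirectory();
             IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig(new StandardAnalyzer()))) {

            Document doc = new Document();
            doc.add(new TextField("name", "Allen Hank", Field.Store.YES));
            writer.addDocument(doc); // buffered in memory, not yet searchable

            // "refresh": an NRT reader flushes the buffer into a searchable segment
            try (DirectoryReader reader = DirectoryReader.open(writer)) {
                System.out.println("visible docs after refresh: " + reader.numDocs()); // 1
            }

            // "flush": commit() fsyncs the segment files and writes a new commit point
            writer.commit();
        }
    }
}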
2.2 interpretation
For interpretation, see the experimental walkthrough in part 3 together with the article Guide to Refresh and Flush Operations in Elasticsearch.
3. Experimental process
3.1 index document 1
Now let's index a document:
PUT lucene-learning/_doc/1
{
  "name": "Allen Hank",
  "age": 30
}
Even though no refresh has happened yet, by design Elasticsearch has already written the request to both the in-memory buffer and the translog. If writes went only to the in-memory buffer, a crash would lose every buffered request; by appending each request to the translog as soon as it arrives, a crash loses at most the requests that have not yet been fsynced to the translog file. At this point the segments file is unchanged, but some empty Lucene files have appeared.
$ pwd
elasticsearch-7.10.2/data/nodes/0/indices/yGvjXskJQlql1-b2Sx2MHA/0
$ tree -I '_state'
.
├── index
│   ├── _0.fdm
│   ├── _0.fdt
│   ├── _0_Lucene85FieldsIndex-doc_ids_0.tmp
│   ├── _0_Lucene85FieldsIndexfile_pointers_1.tmp
│   ├── segments_2
│   └── write.lock
└── translog
    ├── translog-2.tlog
    └── translog.ckp

$ ll index
-rw-r--r--  1 daichen  staff    0B May 27 15:02 _0.fdm
-rw-r--r--  1 daichen  staff    0B May 27 15:02 _0.fdt
-rw-r--r--  1 daichen  staff    0B May 27 15:02 _0_Lucene85FieldsIndex-doc_ids_0.tmp
-rw-r--r--  1 daichen  staff    0B May 27 15:02 _0_Lucene85FieldsIndexfile_pointers_1.tmp
-rw-r--r--  1 daichen  staff  208B May 27 14:53 segments_2
-rw-r--r--  1 daichen  staff    0B May 27 14:53 write.lock

$ cat translog/translog-2.tlog
?�translogPpqSibBGQcK_Cowcf0190Q�$W 1_doc({ "name": "Allen Hank", "age": 30 } ��������PQ�
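The durability logic here is the classic write-ahead log. Below is a toy sketch of the idea in Java; this is not Elasticsearch's actual Translog implementation, and the file name and record format are invented for illustration:

import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

// Write-ahead log sketch: append every operation to a log file before
// acknowledging it, so a crash loses at most the ops not yet forced to disk.
public class TranslogSketch {
    public static void main(String[] args) throws Exception {
        try (FileChannel log = FileChannel.open(Paths.get("translog.demo"),
                StandardOpenOption.CREATE, StandardOpenOption.WRITE, StandardOpenOption.APPEND)) {
            byte[] op = "{\"index\":{\"_id\":\"1\"}} {\"name\":\"Allen Hank\",\"age\":30}\n"
                    .getBytes(StandardCharsets.UTF_8);
            log.write(ByteBuffer.wrap(op)); // lands in the OS page cache
            log.force(false);               // fsync: now durable across a crash
        }
    }
}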
[Optional] If you search now, the document just indexed cannot be found (note that this step may affect what you observe at the refresh in the next step).
POST lucene-learning/_search

{
  "took" : 8,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 0,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  }
}
After manually executing refresh, you can see that Lucene has packed the various files into two compound files, cfe and cfs (see the appendix for their meaning), which means segment 0 has been created. But there has been no commit yet, which is why the segments_N file has not changed. (If a search was performed earlier, Elasticsearch seems to flush automatically.)
POST lucene-learning/_refresh

{
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "failed" : 0
  }
}

$ tree -I '_state'
.
├── index
│   ├── _0.cfe
│   ├── _0.cfs
│   ├── _0.si
│   ├── segments_2
│   └── write.lock
└── translog
    ├── translog-2.tlog
    └── translog.ckp
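The same .cfs/.cfe layout can be reproduced with Lucene directly. Here is a small sketch (Lucene 8.7 assumed; the temp directory and class name are illustrative): with compound files enabled, which is the default for small flushed segments, opening an NRT reader flushes the buffered document into segment _0, packed as _0.cfs/_0.cfe plus the _0.si metadata, and no segments_N file appears until commit.

import java.nio.file.Files;
import java.util.Arrays;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.FSDirectory;

// Shows which files a flushed-but-uncommitted segment leaves on disk.
public class CompoundFileSketch {
    public static void main(String[] args) throws Exception {
        IndexWriterConfig config = new IndexWriterConfig(new StandardAnalyzer());
        config.setUseCompoundFile(true); // the default: flushed segments are packed into .cfs/.cfe
        try (FSDirectory dir = FSDirectory.open(Files.createTempDirectory("cfs-demo"));
             IndexWriter writer = new IndexWriter(dir, config)) {
            Document doc = new Document();
            doc.add(new TextField("name", "Allen Hank", Field.Store.YES));
            writer.addDocument(doc);
            DirectoryReader.open(writer).close(); // "refresh": flush into a searchable segment
            System.out.println(Arrays.toString(dir.listAll())); // _0.cfe, _0.cfs, _0.si, write.lock
        }
    }
}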
3.2 index document 2
Index a new document.
PUT lucene-learning/_doc/2
{
  "name": "Tom Hank",
  "age": 25
}

$ tree -I '_state'
.
├── index
│   ├── _0.cfe
│   ├── _0.cfs
│   ├── _0.si
│   ├── _1.fdm
│   ├── _1.fdt
│   ├── _1_Lucene85FieldsIndex-doc_ids_2.tmp
│   ├── _1_Lucene85FieldsIndexfile_pointers_3.tmp
│   ├── segments_2
│   └── write.lock
└── translog
    ├── translog-2.tlog
    └── translog.ckp
Manually execute refresh to get segment 1:
POST lucene-learning/_refresh

{
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "failed" : 0
  }
}

$ tree -I '_state'
.
├── index
│   ├── _0.cfe
│   ├── _0.cfs
│   ├── _0.si
│   ├── _1.cfe
│   ├── _1.cfs
│   ├── _1.si
│   ├── segments_2
│   └── write.lock
└── translog
    ├── translog-2.tlog
    └── translog.ckp

$ cat translog/translog-2.tlog
?�translogh8Y1UqnyT9ePnvkI4mMQ3Qy�roW 1_doc({ "name": "Allen Hank", "age": 30 } ��������PQ�U 2_doc&{ "name": "Tom Hank", "age": 25 } ����������X�
3.3 commit
After flush executes, the segment data files can be considered truly on disk. You can see that the segments generation has changed to 3 (segments_3), and the commit point now contains segments 0 and 1. Note that because of the operating system's own caching, the files seen in the previous steps may not actually have been persisted to disk yet; for details, look up the relationship between the write and fsync system calls.
POST lucene-learning/_flush

{
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "failed" : 0
  }
}

$ tree -I '_state'
.
├── index
│   ├── _0.cfe
│   ├── _0.cfs
│   ├── _0.si
│   ├── _1.cfe
│   ├── _1.cfs
│   ├── _1.si
│   ├── segments_3
│   └── write.lock
└── translog
    ├── translog-3.tlog
    └── translog.ckp

$ cat index/segments_3
?�segments ��F�ߝᰆ�4�[3 translog_uuidh8Y1UqnyT9ePnvkI4mMQ3Qmin_retained_seq_no0�ߝᰆ�4�Z�_1��F�ߝᰆ�4Lucene87��������������������������F�ߝᰆ�4�[local_checkpoint1max_unsafe_auto_id_timestamp-1 history_uuidvic9U3IUSvWWP0rDi-fEpA max_seq_no1�(��!��9

$ cat translog/translog-3.tlog
?�translogh8Y1UqnyT9ePnvkI4mMQ3Qy�ro
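At the Lucene level this is exactly IndexWriter.commit(): fsync the segment files, then durably write a new segments_N whose generation N increases, which is the segments_2 to segments_3 jump seen above. A minimal sketch (Lucene 8.7 assumed; the class name and temp directory are illustrative):

import java.nio.file.Files;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexCommit;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.FSDirectory;

// Each IndexWriter.commit() writes a new segments_N commit point with a higher generation.
public class CommitGeneration {
    public static void main(String[] args) throws Exception {
        try (FSDirectory dir = FSDirectory.open(Files.createTempDirectory("commit-demo"));
             IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig(new StandardAnalyzer()))) {
            Document doc = new Document();
            doc.add(new StringField("id", "1", Field.Store.YES));
            writer.addDocument(doc);
            writer.commit(); // fsyncs segment files, then writes the segments_N commit point
            for (IndexCommit commit : DirectoryReader.listCommits(dir)) {
                System.out.println(commit.getSegmentsFileName()); // e.g. segments_1
            }
        }
    }
}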