Using Field and maintaining the index library
Preface
Previously, we walked through a simple demo of Lucene. This article explains in detail how to use Field in Lucene and how to maintain the index library.
Using Field
Brief introduction
Lucene stores data in units of Document, and the attribute values of an object are stored in Fields.
A Field is a field of a Document and consists of a field name and a field value. A Document can contain multiple Fields; the Document is only a container for its Fields.
A Field value is both the content to be indexed and the content to be searched.
Field has three properties:
- Tokenized
  - Yes: the Field value is split into words (tokens). The purpose of tokenization is indexing.
    For example: product name, product description. Users search these by entering keywords, and because the content has no fixed format and there is a lot of it, the value is tokenized and the resulting tokens are indexed.
  - No: the value is not tokenized.
    For example: order number, ID number.
- Indexed
  - Yes: the tokens produced by tokenization, or the whole Field value, are indexed. The purpose of indexing is searching.
    For example: product name and product description are indexed after tokenization; order number and ID number are not tokenized but are still indexed, because they will be used as query conditions later.
  - No: the Field is not indexed, so its content cannot be searched.
    For example: file path, image path. These are not used as query conditions, so they are not indexed.
- Stored
  - Yes: the Field value is stored in the Document, and a stored Field can be read back from the Document.
    For example: product name, order number. Any Field that will later be read from the Document must be stored.
  - No: the Field value is not stored, and a Field that is not stored cannot be read back from the Document.
    For example: product description. The content is large and does not need to be stored. To show the description to users, look it up in the system's relational database by the product ID returned from the search, then display it.
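The three properties can be hard to keep apart, so here is a minimal sketch of how they interact. It is a toy model in plain Java and not Lucene's implementation: `ToyField`, `ToyIndex`, and their methods are hypothetical names, and tokenization is assumed to be a simple lowercase split on whitespace.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy model only, NOT Lucene's implementation. All names here are hypothetical.
class ToyField {
    final String name;
    final String value;
    final boolean tokenized;
    final boolean indexed;
    final boolean stored;

    ToyField(String name, String value, boolean tokenized, boolean indexed, boolean stored) {
        this.name = name;
        this.value = value;
        this.tokenized = tokenized;
        this.indexed = indexed;
        this.stored = stored;
    }
}

class ToyIndex {
    // "field:term" -> ids of the documents that contain that term
    private final Map<String, Set<Integer>> postings = new HashMap<>();
    // document id -> values of the fields that were marked as stored
    private final List<Map<String, String>> storedValues = new ArrayList<>();

    int add(List<ToyField> doc) {
        int docId = storedValues.size();
        Map<String, String> kept = new HashMap<>();
        for (ToyField f : doc) {
            if (f.indexed) {
                // Tokenized: one term per lowercased word. Not tokenized: the whole value is one term.
                String[] terms = f.tokenized
                        ? f.value.toLowerCase(Locale.ROOT).trim().split("\\s+")
                        : new String[] { f.value };
                for (String term : terms) {
                    postings.computeIfAbsent(f.name + ":" + term, k -> new TreeSet<>()).add(docId);
                }
            }
            if (f.stored) {
                kept.put(f.name, f.value);
            }
        }
        storedValues.add(kept);
        return docId;
    }

    // Only indexed fields can be found here.
    Set<Integer> search(String field, String term) {
        return postings.getOrDefault(field + ":" + term, Collections.emptySet());
    }

    // Only stored fields can be read back; anything else yields null.
    String stored(int docId, String field) {
        return storedValues.get(docId).get(field);
    }
}
```

In this model, a product description added as tokenized, indexed, and not stored can be found by any of its words, but `stored(...)` returns `null` for it. An image path added as stored only can be read back but never matches a search.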
Common types
The class corresponding to a field is org.apache.lucene.document.Field, which implements the org.apache.lucene.index.IndexableField interface and represents a field used for indexing. The Field class itself is fairly low-level, so Lucene provides many Field subclasses for different scenarios.
The common Field types and their usage are shown in the following table:
Field type | Data type | Tokenized | Indexed | Stored | Description |
---|---|---|---|---|---|
StringField(FieldName, FieldValue, Store.YES) | String | N | Y | Y/N | String Field that is not tokenized but is indexed as a whole (for example: ID number, order number). Store.YES or Store.NO determines whether it is stored. |
TextField(FieldName, FieldValue, Store.NO) | Text | Y | Y | Y/N | Text Field that is tokenized and indexed. Store.YES or Store.NO determines whether it is stored. |
LongField(FieldName, FieldValue, Store.YES) or LongPoint(String name, long... point), etc. | Numeric | Y | Y | Y/N | In Lucene 6.0, LongField was replaced by LongPoint, IntField by IntPoint, FloatField by FloatPoint, and DoubleField by DoublePoint. Point fields are indexed but not stored; to store the value as well, add a StoredField alongside. |
StoredField(FieldName, FieldValue) | Multiple types supported | N | N | Y | Field that is stored only, with no tokenization and no indexing (for example: product image path). |
Field application code
    // Assumes JUnit 4 for @Test. Book is a simple POJO (not shown) with id, name, price and desc.
    import java.nio.file.Paths;
    import java.util.ArrayList;
    import java.util.List;
    import org.apache.lucene.analysis.Analyzer;
    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.document.*;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.index.IndexWriterConfig;
    import org.apache.lucene.store.Directory;
    import org.apache.lucene.store.FSDirectory;
    import org.junit.Test;

    @Test
    public void createIndex() throws Exception {
        // 1. Collect data
        List<Book> bookList = new ArrayList<>();

        Book booka = new Book();
        booka.setId(1);
        booka.setDesc("Lucene Core is a Java library providing powerful indexing and search features, as well as spellchecking, hit highlighting and advanced analysis/tokenization capabilities. The PyLucene sub project provides Python bindings for Lucene Core. ");
        booka.setName("Lucene");
        booka.setPrice(100.45f);
        bookList.add(booka);

        Book bookb = new Book();
        bookb.setId(11);
        bookb.setDesc("Solr is highly scalable, providing fully fault tolerant distributed indexing, search and analytics. It exposes Lucene's features through easy to use JSON/HTTP interfaces or native clients for Java and other languages. ");
        bookb.setName("Solr");
        bookb.setPrice(320.45f);
        bookList.add(bookb);

        Book bookc = new Book();
        bookc.setId(21);
        bookc.setDesc("The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models.");
        bookc.setName("Hadoop");
        bookc.setPrice(620.45f);
        bookList.add(bookc);

        // 2. Encapsulate the collected data into Document objects
        List<Document> docList = new ArrayList<>();
        for (Book book : bookList) {
            Document document = new Document();
            // IntPoint: indexed, not stored; pair it with a StoredField to read the value back
            Field id = new IntPoint("id", book.getId());
            System.out.println(id.fieldType().tokenized() + ":" + id.fieldType().stored());
            Field idStored = new StoredField("id", book.getId());
            // TextField: tokenized, indexed, stored
            Field name = new TextField("name", book.getName(), Field.Store.YES);
            // FloatPoint: numeric, indexed, not stored
            Field price = new FloatPoint("price", book.getPrice());
            // TextField: tokenized, indexed, stored here (use Field.Store.NO to skip storing large text)
            Field desc = new TextField("desc", book.getDesc(), Field.Store.YES);
            // Add the fields to the Document
            document.add(id);
            document.add(idStored);
            document.add(name);
            document.add(price);
            document.add(desc);
            docList.add(document);
        }

        // 3. Create the Analyzer used to tokenize documents
        Analyzer analyzer = new StandardAnalyzer();
        // Create the Directory and IndexWriterConfig objects
        Directory directory = FSDirectory.open(Paths.get("D:/lucene/index"));
        IndexWriterConfig indexWriterConfig = new IndexWriterConfig(analyzer);

        // 4. Create the IndexWriter and add the documents
        IndexWriter indexWriter = new IndexWriter(directory, indexWriterConfig);
        for (Document doc : docList) {
            indexWriter.addDocument(doc);
        }
        // Release resources
        indexWriter.close();
    }
Maintenance of index library
Index addition
Adding to the index creates a new record that does not yet exist in the index library. The code is as follows:
    @Test
    public void indexCreate() throws Exception {
        // Create the analyzer
        Analyzer analyzer = new StandardAnalyzer();
        // Create the Directory object
        Directory directory = FSDirectory.open(Paths.get("D:/lucene/index3"));
        IndexWriterConfig config = new IndexWriterConfig(analyzer);
        // Create the index writer
        IndexWriter indexWriter = new IndexWriter(directory, config);
        // Create the Document
        Document document = new Document();
        document.add(new TextField("id", "1001", Field.Store.YES));
        document.add(new TextField("name", "game", Field.Store.YES));
        document.add(new TextField("desc", "one world one dream", Field.Store.YES));
        // Add the document to the index
        indexWriter.addDocument(document);
        indexWriter.close();
    }
Index update
An update first deletes the documents that match the given Term and then adds the new document. This is the recommended way to handle update requirements. To make sure you are updating an existing record, query for it first, confirm that it exists, and then perform the update.
If no document matches the Term, the update simply adds the new document.
The code is as follows:
    @Test
    public void indexUpdate() throws Exception {
        // Create the analyzer
        Analyzer analyzer = new StandardAnalyzer();
        // Create the Directory object
        Directory directory = FSDirectory.open(Paths.get("D:/lucene/index3"));
        IndexWriterConfig config = new IndexWriterConfig(analyzer);
        // Create the index writer
        IndexWriter indexWriter = new IndexWriter(directory, config);
        // Create the replacement Document
        Document document = new Document();
        document.add(new TextField("id", "1001", Field.Store.YES));
        document.add(new TextField("name", "study hard", Field.Store.YES));
        document.add(new TextField("desc", "What should I do when the game is over", Field.Store.YES));
        // Update: delete documents whose "name" contains the term "game", then add the new one
        indexWriter.updateDocument(new Term("name", "game"), document);
        indexWriter.close();
    }
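The delete-then-add behavior of an update can be sketched without Lucene. The following is a toy model in plain Java, not Lucene's implementation: `ToyWriter` and its methods are hypothetical names, and each document is assumed to be a map from field name to already tokenized terms.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model only, NOT Lucene's implementation. All names here are hypothetical.
class ToyWriter {
    // Each document maps a field name to the terms indexed for that field.
    private final List<Map<String, List<String>>> docs = new ArrayList<>();

    // Convenience helper to build a one-field document.
    static Map<String, List<String>> doc(String field, String... terms) {
        Map<String, List<String>> d = new HashMap<>();
        d.put(field, Arrays.asList(terms));
        return d;
    }

    void addDocument(Map<String, List<String>> doc) {
        docs.add(doc);
    }

    // Removes every document whose field contains the term; returns how many were removed.
    int deleteDocuments(String field, String term) {
        int before = docs.size();
        docs.removeIf(d -> d.getOrDefault(field, Collections.emptyList()).contains(term));
        return before - docs.size();
    }

    // Update = delete all matches, then add. With no match this is a plain add.
    void updateDocument(String field, String term, Map<String, List<String>> doc) {
        deleteDocuments(field, term);
        addDocument(doc);
    }

    int size() {
        return docs.size();
    }
}
```

One consequence worth noting: if several documents match the term, all of them are removed and a single new document takes their place. That is why the term used for an update is usually one that identifies exactly one record.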
Index deletion
Delete documents by Term; every document that matches the Term is deleted.
Example: indexWriter.deleteDocuments(new Term("name", "game"));
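One detail that often causes confusion when deleting by Term: the Term is compared with the indexed tokens as-is, while text in a TextField indexed with StandardAnalyzer is lowercased at index time. The sketch below models this in plain Java. It is not Lucene code: `TermDeleteSketch`, `analyze`, and `deleteByTerm` are hypothetical names, and the analyzer is assumed to be a simple lowercase split on whitespace.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Toy model only, NOT Lucene's implementation. All names here are hypothetical.
class TermDeleteSketch {
    // Stand-in for an analyzer: splits on whitespace and lowercases each token.
    static List<String> analyze(String text) {
        return Arrays.asList(text.toLowerCase(Locale.ROOT).trim().split("\\s+"));
    }

    // Deletes the documents whose analyzed text contains the raw term.
    // The term itself is NOT analyzed, so its case must match the indexed tokens.
    static int deleteByTerm(List<String> docs, String rawTerm) {
        int before = docs.size();
        docs.removeIf(text -> analyze(text).contains(rawTerm));
        return before - docs.size();
    }
}
```

In this model, deleting by the term "Game" removes nothing, because the indexed tokens are all lowercase, while deleting by "game" removes every document whose text contains that word.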
Delete all
Clear all indexes in the index library:
indexWriter.deleteAll();
Summary
In this article, we covered the use of Field in Lucene, several common Field types, and how to add, update, and delete entries in the index library.