Lucene application practice: using Field and maintaining the index library


Preface

In the previous article we walked through a simple Lucene demo. This article explains in detail how to use Field in Lucene and how to maintain the index library.

Using Field

Brief introduction

Lucene uses the Document as its unit of storage; each attribute of a stored object is kept in a Field.

A Field is a field within a Document and consists of a field name and a field value. A Document can contain multiple fields and is simply a container for them.
The field value is both the content that is indexed and the content that is searched.

A Field has three properties:

  1. tokenized
  • Yes: the field value is tokenized, that is, split into terms by the analyzer. The purpose of tokenizing is indexing.

For example, product names and product descriptions: users search this content by typing keywords, its format is not fixed and there is a lot of it, so it is tokenized and the resulting terms are indexed.

  • No: the field value is not tokenized.

For example, order numbers and ID numbers, which must be matched as a whole.

  2. indexed
  • Yes: the field is indexed, either as the terms produced by tokenization or as the whole field value. The purpose of indexing is searching.

    For example, product names and descriptions are tokenized and then indexed, while order numbers and ID numbers are indexed without tokenization. All of them will later be used as query conditions.

  • No: the field is not indexed, so its contents cannot be searched.

    For example, file paths and image paths are never used as query conditions, so they do not need to be indexed.

  3. stored
  • Yes: the field value is stored in the Document, so it can be read back from the Document after a search.

    For example, product names and order numbers: any field whose value you will need to retrieve from a Document later should be stored.

  • No: the field value is not stored and cannot be retrieved from the Document.

    For example, a product description is large and does not need to be stored. If you need to show it to users, query the system's relational database by the product ID returned from the search and display the description from there.
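To make the three properties concrete, here is a minimal sketch that sets each flag explicitly on a FieldType. It assumes Lucene 8.x or 9.x (lucene-core) on the classpath; the class name FieldPropertiesDemo and the field names orderNo and imagePath are hypothetical. In everyday code you would normally use predefined subclasses such as TextField and StringField instead of building a FieldType by hand.

```java
import org.apache.lucene.document.Field;
import org.apache.lucene.document.FieldType;
import org.apache.lucene.index.IndexOptions;

// Hypothetical demo: Field instances whose tokenized / indexed / stored flags are set explicitly
class FieldPropertiesDemo {

    // Tokenized + indexed (with positions, so phrase queries work), not stored: e.g. a product description
    static Field descriptionField(String value) {
        FieldType type = new FieldType();
        type.setTokenized(true);
        type.setIndexOptions(IndexOptions.DOCS_AND_FREQS_AND_POSITIONS);
        type.setStored(false);
        type.freeze(); // make the type immutable before it is used
        return new Field("desc", value, type);
    }

    // Not tokenized, indexed as one single term, stored: e.g. an order number
    static Field orderNumberField(String value) {
        FieldType type = new FieldType();
        type.setTokenized(false);
        type.setIndexOptions(IndexOptions.DOCS);
        type.setStored(true);
        type.freeze();
        return new Field("orderNo", value, type);
    }

    // Not indexed, only stored: e.g. an image path that is never a query condition
    static Field imagePathField(String value) {
        FieldType type = new FieldType();
        type.setTokenized(false);
        type.setIndexOptions(IndexOptions.NONE);
        type.setStored(true);
        type.freeze();
        return new Field("imagePath", value, type);
    }

    public static void main(String[] args) {
        Field[] fields = {
            descriptionField("a large product description"),
            orderNumberField("ORD-2022-0001"),
            imagePathField("/images/p1.png")
        };
        for (Field f : fields) {
            System.out.println(f.name()
                    + " tokenized=" + f.fieldType().tokenized()
                    + " indexed=" + (f.fieldType().indexOptions() != IndexOptions.NONE)
                    + " stored=" + f.fieldType().stored());
        }
    }
}
```

Note that Lucene rejects a Field that is neither indexed nor stored, so at least one of the two must be enabled.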

Common types

The Field class is org.apache.lucene.document.Field, which implements the org.apache.lucene.index.IndexableField interface and represents a field used for indexing. Because the Field class is fairly low-level, Lucene provides many Field subclasses for different scenarios.

The common types and usage of Field are shown in the following table:

| Field type | Data type | Tokenized | Indexed | Stored | Notes |
|---|---|---|---|---|---|
| StringField(FieldName, FieldValue, Store.YES) | String | N | Y | Y/N | A string field that is not tokenized but is indexed as a whole (e.g. ID numbers, order numbers). Store.YES or Store.NO decides whether the value is stored. |
| TextField(FieldName, FieldValue, Store.NO) | Text | Y | Y | Y/N | A text field that is tokenized and indexed. Store.YES or Store.NO decides whether the value is stored. |
| LongField(FieldName, FieldValue, Store.YES), or LongPoint(String name, long... point), etc. | Numeric | Y | Y | Y/N | In Lucene 6.0, LongField was replaced by LongPoint, IntField by IntPoint, FloatField by FloatPoint and DoubleField by DoublePoint. Point fields are indexed but not stored; to store the value, add a StoredField with the same name. |
| StoredField(FieldName, FieldValue) | Multiple types | N | N | Y | Stores a value of various types without tokenizing or indexing it (e.g. a product image path). |
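One way to check the table against your own Lucene version is to print the FieldType flags of each built-in class. This sketch assumes Lucene 8.x or 9.x; BuiltInFieldTypes and the field names are hypothetical. Point fields such as IntPoint report IndexOptions.NONE because they are indexed in a separate points (BKD tree) structure rather than as terms, and the tokenized flag only has an effect on term-indexed fields.

```java
import org.apache.lucene.document.Field;
import org.apache.lucene.document.IntPoint;
import org.apache.lucene.document.StoredField;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.IndexableFieldType;

// Hypothetical demo: prints how the built-in Field subclasses configure their FieldType
class BuiltInFieldTypes {

    static String describe(Field f) {
        IndexableFieldType t = f.fieldType();
        return f.getClass().getSimpleName()
                + " tokenized=" + t.tokenized()
                + " indexOptions=" + t.indexOptions()
                + " stored=" + t.stored();
    }

    public static void main(String[] args) {
        System.out.println(describe(new StringField("orderNo", "ORD-1", Field.Store.YES)));
        System.out.println(describe(new TextField("desc", "some text", Field.Store.NO)));
        System.out.println(describe(new IntPoint("id", 1)));
        System.out.println(describe(new StoredField("imagePath", "/images/p1.png")));
    }
}
```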

Field application code

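import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.FloatPoint;
import org.apache.lucene.document.IntPoint;
import org.apache.lucene.document.StoredField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.junit.Test;

    @Test
    public void createIndex() throws Exception {
        // 1. Collect data (Book is a simple POJO with int id, String name, String desc and float price)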
        Book  booka  = new Book();
        List<Book> bookList = new ArrayList<Book>();
        booka.setId(1);
        booka.setDesc("Lucene Core is a Java library providing powerful indexing and search features, as well as spellchecking, hit highlighting and advanced analysis/tokenization capabilities. The PyLucene sub project provides Python bindings for Lucene Core. ");
        booka.setName("Lucene");
        booka.setPrice(100.45f);
        bookList.add(booka);

        Book  bookb  = new Book();
        bookb.setId(11);
        bookb.setDesc("Solr is highly scalable, providing fully fault tolerant distributed indexing, search and analytics. It exposes Lucene's features through easy to use JSON/HTTP interfaces or native clients for Java and other languages. ");
        bookb.setName("Solr");
        bookb.setPrice(320.45f);
        bookList.add(bookb);
        Book  bookc  = new Book();
        bookc.setId(21);
        bookc.setDesc("The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models.");
        bookc.setName("Hadoop");
        bookc.setPrice(620.45f);
        bookList.add(bookc);

        // 2. Encapsulate the collected data into the Document object
        List<Document> docList = new ArrayList<>();
        Document document;
        for (Book book : bookList) {
            document = new Document();
            // IntPoint: indexed for exact and range queries but not stored, so pair it with a StoredField
            Field id = new IntPoint("id", book.getId());
            // Inspect the tokenized and stored flags of the point field's type
            System.out.println(id.fieldType().tokenized() + ":" + id.fieldType().stored());
            // StoredField with the same name keeps the id retrievable from search results
            Field id_v = new StoredField("id", book.getId());
            // TextField: tokenized, indexed and stored
            Field name = new TextField("name", book.getName(), Field.Store.YES);
            // FloatPoint: numeric, indexed for range queries, not stored
            Field price = new FloatPoint("price", book.getPrice());
            // TextField: tokenized and indexed, but not stored (large content)
            Field desc = new TextField("desc",
                    book.getDesc(), Field.Store.NO);

            // Add the fields to the Document

            document.add(id);
            document.add(id_v);
            document.add(name);
            document.add(price);
            document.add(desc);

            docList.add(document);
        }
        // 3. Create the Analyzer used to tokenize text fields
        Analyzer analyzer = new StandardAnalyzer();
        // Open the index directory on disk and create the IndexWriterConfig
        Directory directory = FSDirectory.open(Paths.get("D:/lucene/index"));

        IndexWriterConfig indexWriterConfig = new IndexWriterConfig(analyzer);

        // 4. Create the IndexWriter
        IndexWriter indexWriter = new IndexWriter(directory, indexWriterConfig);

        // Add each Document to the index
        for (Document doc : docList) {
            indexWriter.addDocument(doc);
        }
        // Release resources; closing the writer commits the changes
        indexWriter.close();
        directory.close();
    }
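Indexing the id twice (IntPoint for searching, StoredField for retrieval) pays off at query time. The sketch below rebuilds the same three books in an in-memory ByteBuffersDirectory instead of D:/lucene/index, then runs an exact id query and a price range query. It assumes Lucene 8.x or 9.x (IndexSearcher.doc is deprecated in later 9.x releases); the class and method names are hypothetical.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.FloatPoint;
import org.apache.lucene.document.IntPoint;
import org.apache.lucene.document.StoredField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.store.ByteBuffersDirectory;
import org.apache.lucene.store.Directory;

// Hypothetical demo: indexes books like createIndex, but in memory, then queries the point fields
class PointFieldSearchDemo {

    static Directory buildIndex() throws IOException {
        Directory dir = new ByteBuffersDirectory();
        int[] ids = {1, 11, 21};
        String[] names = {"Lucene", "Solr", "Hadoop"};
        float[] prices = {100.45f, 320.45f, 620.45f};
        try (IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig(new StandardAnalyzer()))) {
            for (int i = 0; i < ids.length; i++) {
                Document doc = new Document();
                doc.add(new IntPoint("id", ids[i]));         // searchable, not stored
                doc.add(new StoredField("id", ids[i]));      // stored copy for retrieval
                doc.add(new TextField("name", names[i], Field.Store.YES));
                doc.add(new FloatPoint("price", prices[i])); // range-searchable, not stored
                writer.addDocument(doc);
            }
        } // closing the writer commits the documents
        return dir;
    }

    // Returns "name#id" for every hit, sorted so the result does not depend on hit order
    static List<String> search(Directory dir, Query query) throws IOException {
        List<String> hits = new ArrayList<>();
        try (DirectoryReader reader = DirectoryReader.open(dir)) {
            IndexSearcher searcher = new IndexSearcher(reader);
            for (ScoreDoc scoreDoc : searcher.search(query, 10).scoreDocs) {
                Document doc = searcher.doc(scoreDoc.doc); // only stored fields come back
                hits.add(doc.get("name") + "#" + doc.getField("id").numericValue());
            }
        }
        Collections.sort(hits);
        return hits;
    }

    public static void main(String[] args) throws IOException {
        Directory dir = buildIndex();
        System.out.println(search(dir, IntPoint.newExactQuery("id", 11)));              // [Solr#11]
        System.out.println(search(dir, FloatPoint.newRangeQuery("price", 200f, 700f))); // [Hadoop#21, Solr#11]
    }
}
```

Because a point field is not stored, doc.getField("id") in the hits resolves to the StoredField copy; without it, the id could be searched but not read back.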

Maintenance of index library

Index addition

Adding to the index means writing a new document that does not yet exist in the index library. The code is as follows:

@Test
    public void indexCreate() throws Exception {
        // Create the analyzer
        Analyzer analyzer = new StandardAnalyzer();
        // Open the index directory
        Directory directory = FSDirectory.open(Paths.get("D:/lucene/index3"));
        IndexWriterConfig config = new IndexWriterConfig(analyzer);
        // Create the index writer
        IndexWriter indexWriter = new IndexWriter(directory, config);
        // Create the Document; the id is a StringField because IDs should not be tokenized
        Document document = new Document();
        document.add(new StringField("id", "1001", Field.Store.YES));
        document.add(new TextField("name", "game", Field.Store.YES));
        document.add(new TextField("desc", "one world one dream", Field.Store.YES));
        // Add the document to the index
        indexWriter.addDocument(document);
        indexWriter.close();
        directory.close();
    }
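Running the snippet above twice appends a second copy rather than replacing the index, because IndexWriterConfig defaults to OpenMode.CREATE_OR_APPEND. A minimal sketch of the difference, assuming Lucene 8.x or 9.x and an in-memory ByteBuffersDirectory in place of the D:/lucene/index3 path (IndexAddDemo and its methods are hypothetical):

```java
import java.io.IOException;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.IndexWriterConfig.OpenMode;
import org.apache.lucene.store.ByteBuffersDirectory;
import org.apache.lucene.store.Directory;

// Hypothetical demo: how OpenMode affects what addDocument does to an existing index
class IndexAddDemo {

    // Opens a writer with the given mode, adds one document, and closes (commits) the writer
    static void addOne(Directory dir, OpenMode mode, String id, String name) throws IOException {
        IndexWriterConfig config = new IndexWriterConfig(new StandardAnalyzer());
        config.setOpenMode(mode);
        try (IndexWriter writer = new IndexWriter(dir, config)) {
            Document doc = new Document();
            doc.add(new StringField("id", id, Field.Store.YES));
            doc.add(new TextField("name", name, Field.Store.YES));
            writer.addDocument(doc);
        }
    }

    // Number of live documents in the last commit
    static int countDocs(Directory dir) throws IOException {
        try (DirectoryReader reader = DirectoryReader.open(dir)) {
            return reader.numDocs();
        }
    }

    public static void main(String[] args) throws IOException {
        Directory dir = new ByteBuffersDirectory();
        addOne(dir, OpenMode.CREATE_OR_APPEND, "1001", "game");
        addOne(dir, OpenMode.CREATE_OR_APPEND, "1002", "music");
        System.out.println(countDocs(dir)); // 2: the second writer appended
        addOne(dir, OpenMode.CREATE, "1003", "movie");
        System.out.println(countDocs(dir)); // 1: CREATE replaced the old index
    }
}
```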

Index update

Lucene updates a document by first deleting the documents that contain a given Term and then adding the new document; updateDocument performs both steps, and readers see them as a single atomic change. This is the recommended way to handle update requirements. To be sure you are updating an existing document, you can search for it first, confirm that the record exists, and then perform the update.
If no document matches the Term, updateDocument simply adds the new document.

The code is as follows:

 @Test
    public void indexUpdate() throws Exception {
        // Create the analyzer
        Analyzer analyzer = new StandardAnalyzer();
        // Open the index directory
        Directory directory = FSDirectory.open(Paths.get("D:/lucene/index3"));
        IndexWriterConfig config = new IndexWriterConfig(analyzer);
        // Create the index writer
        IndexWriter indexWriter = new IndexWriter(directory, config);
        // Create the replacement Document
        Document document = new Document();

        document.add(new StringField("id", "1001", Field.Store.YES));
        document.add(new TextField("name", "study hard", Field.Store.YES));
        document.add(new TextField("desc", "What should I do when the game is over", Field.Store.YES));
        // Update: delete every document whose name contains the term "game", then add the new document
        indexWriter.updateDocument(new Term("name", "game"), document);
        indexWriter.close();
        directory.close();

    }
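The update above matches on a tokenized name term, so it would also replace any other document whose name contains "game". A common alternative is to key updates on an untokenized StringField id, so that exactly one document is replaced, and to rely on updateDocument adding the document when no id matches. This sketch assumes Lucene 8.x or 9.x and an in-memory ByteBuffersDirectory; IndexUpdateDemo and its helpers are hypothetical.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.MatchAllDocsQuery;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.store.ByteBuffersDirectory;
import org.apache.lucene.store.Directory;

// Hypothetical demo: updateDocument keyed on a unique, untokenized id
class IndexUpdateDemo {

    // id is a StringField, so it is indexed as one exact term
    static Document book(String id, String name) {
        Document doc = new Document();
        doc.add(new StringField("id", id, Field.Store.YES));
        doc.add(new TextField("name", name, Field.Store.YES));
        return doc;
    }

    static Directory demo() throws IOException {
        Directory dir = new ByteBuffersDirectory();
        try (IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig(new StandardAnalyzer()))) {
            writer.addDocument(book("1001", "game"));
            // id 1001 exists: the old document is deleted and the new one added
            writer.updateDocument(new Term("id", "1001"), book("1001", "study hard"));
            // no document has id 1002: updateDocument just adds it
            writer.updateDocument(new Term("id", "1002"), book("1002", "music"));
        }
        return dir;
    }

    // Stored names of all live documents, sorted
    static List<String> names(Directory dir) throws IOException {
        List<String> result = new ArrayList<>();
        try (DirectoryReader reader = DirectoryReader.open(dir)) {
            IndexSearcher searcher = new IndexSearcher(reader);
            for (ScoreDoc hit : searcher.search(new MatchAllDocsQuery(), 100).scoreDocs) {
                result.add(searcher.doc(hit.doc).get("name"));
            }
        }
        Collections.sort(result);
        return result;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(names(demo())); // [music, study hard]
    }
}
```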

Index deletion

Documents can be deleted by Term: every document that contains the term is deleted.

Example: indexWriter.deleteDocuments(new Term("name", "game"));
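A sketch of term-based deletion, assuming Lucene 8.x or 9.x and an in-memory ByteBuffersDirectory (IndexDeleteDemo is hypothetical). Because name is a TextField, the term "game" matches both "game" and "video game". IndexWriter also offers deleteDocuments(Query...), shown here with an IntPoint range query.

```java
import java.io.IOException;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.IntPoint;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.store.ByteBuffersDirectory;
import org.apache.lucene.store.Directory;

// Hypothetical demo: deleting by Term and by Query
class IndexDeleteDemo {

    static Document doc(int id, String name) {
        Document d = new Document();
        d.add(new IntPoint("id", id));
        d.add(new TextField("name", name, Field.Store.YES));
        return d;
    }

    static int count(Directory dir) throws IOException {
        try (DirectoryReader reader = DirectoryReader.open(dir)) {
            return reader.numDocs();
        }
    }

    // Returns {documents before deletion, documents after deletion}
    static int[] run() throws IOException {
        Directory dir = new ByteBuffersDirectory();
        try (IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig(new StandardAnalyzer()))) {
            writer.addDocument(doc(1, "game"));
            writer.addDocument(doc(2, "video game"));
            writer.addDocument(doc(3, "music"));
            writer.addDocument(doc(4, "movie"));
            writer.commit();
            int before = count(dir);
            // Deletes both documents whose tokenized name contains the term "game"
            writer.deleteDocuments(new Term("name", "game"));
            // Query-based deletion: every document with id in [4, 10]
            writer.deleteDocuments(IntPoint.newRangeQuery("id", 4, 10));
            writer.commit();
            return new int[] {before, count(dir)};
        }
    }

    public static void main(String[] args) throws IOException {
        int[] counts = run();
        System.out.println(counts[0] + " -> " + counts[1]); // 4 -> 1
    }
}
```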

Delete all

To remove every document from the index library:

indexWriter.deleteAll();
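Like other deletions, deleteAll only becomes visible to readers after a commit (or when the writer is closed). A minimal sketch, assuming Lucene 8.x or 9.x and an in-memory ByteBuffersDirectory; DeleteAllDemo is hypothetical.

```java
import java.io.IOException;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.ByteBuffersDirectory;
import org.apache.lucene.store.Directory;

// Hypothetical demo: deleteAll is not visible to readers until it is committed
class DeleteAllDemo {

    static int count(Directory dir) throws IOException {
        try (DirectoryReader reader = DirectoryReader.open(dir)) {
            return reader.numDocs();
        }
    }

    // Returns {documents seen before commit, documents seen after commit}
    static int[] run() throws IOException {
        Directory dir = new ByteBuffersDirectory();
        try (IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig(new StandardAnalyzer()))) {
            for (String name : new String[] {"game", "music", "movie"}) {
                Document doc = new Document();
                doc.add(new TextField("name", name, Field.Store.YES));
                writer.addDocument(doc);
            }
            writer.commit();
            writer.deleteAll();              // pending until the next commit
            int beforeCommit = count(dir);   // reader still sees the last commit
            writer.commit();
            return new int[] {beforeCommit, count(dir)};
        }
    }

    public static void main(String[] args) throws IOException {
        int[] counts = run();
        System.out.println(counts[0] + " -> " + counts[1]); // 3 -> 0
    }
}
```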


Summary

This article covered how to use Field in Lucene, the most common Field types, and how to add, update, and delete documents in the index library.


Added by lukevrn on Wed, 26 Jan 2022 01:56:09 +0200