Django full-text search using haystack

Using haystack to implement a Django full-text search engine

Django is a powerful web framework for Python. With a few plug-ins, it is easy to add search capabilities to a website.
The search engine used here is whoosh, a compact and simple full-text search engine implemented in pure Python.
Searching Chinese text requires Chinese word segmentation, which is done here with jieba.
Using whoosh directly in a Django project means handling a number of low-level details yourself. With haystack, a search framework, you can add search to Django without dealing with details such as indexing and query parsing.
haystack supports several search engines: not only whoosh, but also Solr, Elasticsearch, and others. You can switch engines without modifying the search code.

Configure Search

1. Install packages

pip install django-haystack
pip install whoosh
pip install jieba

2. Configure settings for django

#Modify the settings.py file to add a haystack application:
INSTALLED_APPS = (
    ...
    'haystack', #Put haystack at the end
)

#Append the haystack configuration to settings:
HAYSTACK_CONNECTIONS = {
    'default': {
        'ENGINE': 'haystack.backends.whoosh_cn_backend.WhooshEngine',
        'PATH': os.path.join(BASE_DIR, 'whoosh_index'),
    }
}

#Add this setting so that the index is updated automatically whenever the database changes:

HAYSTACK_SIGNAL_PROCESSOR = 'haystack.signals.RealtimeSignalProcessor'
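
Switching engines comes down to changing the ENGINE entry (and its connection options) in HAYSTACK_CONNECTIONS. The following is a minimal sketch of that idea, not part of the original setup: the helper name build_haystack_connections, the HAYSTACK_BACKEND environment variable, and the URLs and index name are assumptions for illustration, and whoosh_cn_backend is the copy created later in step 8.

```python
import os

# Engine class paths. The whoosh entry points at the Chinese-enabled copy
# from step 8; the other two are the backends that ship with django-haystack.
ENGINES = {
    "whoosh": "haystack.backends.whoosh_cn_backend.WhooshEngine",
    "elasticsearch": "haystack.backends.elasticsearch_backend.ElasticsearchSearchEngine",
    "solr": "haystack.backends.solr_backend.SolrEngine",
}


def build_haystack_connections(backend, base_dir):
    """Return a HAYSTACK_CONNECTIONS dict for the chosen backend (hypothetical helper)."""
    connection = {"ENGINE": ENGINES[backend]}
    if backend == "whoosh":
        # whoosh stores its index in a local directory
        connection["PATH"] = os.path.join(base_dir, "whoosh_index")
    elif backend == "elasticsearch":
        connection["URL"] = "http://127.0.0.1:9200/"  # placeholder address
        connection["INDEX_NAME"] = "haystack"         # placeholder index name
    else:
        connection["URL"] = "http://127.0.0.1:8983/solr"  # placeholder address
    return {"default": connection}


# In settings.py this could be used as:
# HAYSTACK_CONNECTIONS = build_haystack_connections(
#     os.environ.get("HAYSTACK_BACKEND", "whoosh"), BASE_DIR)
whoosh_settings = build_haystack_connections("whoosh", "/srv/project")
solr_settings = build_haystack_connections("solr", "/srv/project")
```

The search index class, templates, and views stay the same whichever entry is selected; only this dictionary changes.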

3. Add url

#In urls.py for the entire project, configure the url path for search functionality:
urlpatterns = [
    ...
    url(r'^search/', include('haystack.urls')),
]

4. Add an index to the application directory

#In the application directory, create a file named search_indexes.py.
from haystack import indexes
# Modify this for your own model
from .models import GoodsInfo

#Name this class <model class>Index: if the model class is GoodsInfo, this class is GoodsInfoIndex
class GoodsInfoIndex(indexes.SearchIndex, indexes.Indexable):
    text = indexes.CharField(document=True, use_template=True)

    def get_model(self):
        # Modify this for your own model
        return GoodsInfo

    def index_queryset(self, using=None):
        return self.get_model().objects.all()

Notes:
1) Adjust the three places marked by comments above (the model import, the index class name, and the model returned by get_model) for your own model.
2) This file specifies how existing data is indexed. Return your Django model class from get_model and haystack indexes it directly; you do not need to handle details such as reading the database or building the index.
3) The line text = indexes.CharField(...) declares the document field that gets indexed. use_template=True means that a template file, created in the next step, specifies which model fields go into it.

5. Specify index template file

Create a file named <model class name>_text.txt under the project's templates/search/indexes/<application name>/ directory.
For example, if the model class is GoodsInfo, create goodsinfo_text.txt (all lowercase). This file specifies which model fields are indexed. Write it as follows, replacing only the field names and leaving object unchanged:
{{ object.field1 }}
{{ object.field2 }}
{{ object.field3 }}
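
The naming rule above can be sketched as a small helper that computes the template path and its content. This is only an illustration: the function name index_template, the application name goods, and the field name gContent are hypothetical (gName is the field used by the results page in step 6).

```python
def index_template(app_name, model_name, fields, templates_dir="templates"):
    """Return (path, content) for a haystack index template (hypothetical helper).

    The path follows templates/search/indexes/<app>/<model>_text.txt with the
    model name in lowercase; the content has one {{ object.<field> }} per line.
    """
    file_name = model_name.lower() + "_text.txt"
    path = "/".join([templates_dir, "search", "indexes", app_name, file_name])
    content = "".join("{{ object.%s }}\n" % field for field in fields)
    return path, content


# "goods" and "gContent" are made-up names for this example.
path, content = index_template("goods", "GoodsInfo", ["gName", "gContent"])
print(path)
print(content)
```

Writing content to path (after creating the directories) produces the file described in this step.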

6. Specify search results page

#Under templates/search/, create a search.html page.
<!DOCTYPE html>
<html>
<head>
    <title></title>
</head>
<body>
{% if query %}
    <h3>The search results are as follows:</h3>
    {% for result in page.object_list %}
        <a href="/{{ result.object.id }}/">{{ result.object.gName }}</a><br/>
    {% empty %}
        <p>Nothing found</p>
    {% endfor %}

    {% if page.has_previous or page.has_next %}
        <div>
            {% if page.has_previous %}<a href="?q={{ query }}&amp;page={{ page.previous_page_number }}">{% endif %}&laquo; Previous page{% if page.has_previous %}</a>{% endif %}
        |
            {% if page.has_next %}<a href="?q={{ query }}&amp;page={{ page.next_page_number }}">{% endif %}next page &raquo;{% if page.has_next %}</a>{% endif %}
        </div>
    {% endif %}
{% endif %}
</body>
</html>

7. Use jieba as the Chinese word segmenter

#In the backends directory of the haystack installation (for example, /home/python/.virtualenvs/django_py2/lib/python2.7/site-packages/haystack/backends), create a file named ChineseAnalyzer.py and write the following:
import jieba
from whoosh.analysis import Tokenizer, Token


class ChineseTokenizer(Tokenizer):
    def __call__(self, value, positions=False, chars=False,
                 keeporiginal=False, removestops=True,
                 start_pos=0, start_char=0, mode='', **kwargs):
        t = Token(positions, chars, removestops=removestops, mode=mode,
                  **kwargs)
        seglist = jieba.cut(value, cut_all=True)
        for w in seglist:
            t.original = t.text = w
            t.boost = 1.0
            if positions:
                t.pos = start_pos + value.find(w)
            if chars:
                t.startchar = start_char + value.find(w)
                t.endchar = start_char + value.find(w) + len(w)
            yield t


def ChineseAnalyzer():
    return ChineseTokenizer()
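
One detail of the tokenizer above: value.find(w) always returns the first occurrence of w, so a word that appears more than once is reported at the same offset every time. The sketch below shows a forward-moving search instead. It assumes non-overlapping tokens in text order, which is what jieba's default precise mode is expected to produce, unlike cut_all=True. The function name token_offsets is hypothetical, and the tokens are hard-coded so the snippet runs without jieba.

```python
def token_offsets(value, tokens):
    """Yield (token, start, end), searching forward from the previous token's end."""
    cursor = 0
    for w in tokens:
        start = value.find(w, cursor)
        if start == -1:
            continue  # token not found after the cursor; skip it
        end = start + len(w)
        yield w, start, end
        cursor = end


value = "我爱我家"
tokens = ["我", "爱", "我", "家"]  # hand-written stand-in for a segmenter's output

naive = [value.find(w) for w in tokens]                    # first occurrence only
tracked = [start for _, start, _ in token_offsets(value, tokens)]
print(naive)
print(tracked)
```

With the naive lookup the second 我 is placed at offset 0 again, while the forward search places it at offset 2.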

8. Switch the whoosh backend to the Chinese word segmenter

#Copy whoosh_backend.py in the backends directory above to a new file named whoosh_cn_backend.py, then edit the copy as follows.
#Import the Chinese analyzer just added, at the top of the file:
from .ChineseAnalyzer import ChineseAnalyzer

#Throughout the file, find
analyzer=StemmingAnalyzer()
#and change every occurrence to
analyzer=ChineseAnalyzer()
#There are about two or three occurrences.
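
The manual edit above can also be scripted. The following is a rough sketch under two assumptions: the backend source contains a line starting with "from whoosh.analysis import" (the new import is placed right after it), and every StemmingAnalyzer() call should be replaced. The function name patch_backend_source is hypothetical, and the sample string only imitates the shape of the real file.

```python
IMPORT_LINE = "from .ChineseAnalyzer import ChineseAnalyzer"


def patch_backend_source(source):
    """Return backend source text rewritten to use ChineseAnalyzer (hypothetical helper)."""
    patched = []
    import_added = False
    for line in source.splitlines():
        # Swap the analyzer wherever it is instantiated.
        patched.append(line.replace("StemmingAnalyzer()", "ChineseAnalyzer()"))
        # Add the new import once, directly after the whoosh.analysis import.
        if not import_added and line.startswith("from whoosh.analysis import"):
            patched.append(IMPORT_LINE)
            import_added = True
    if not import_added:
        raise ValueError("no 'from whoosh.analysis import' line found")
    return "\n".join(patched) + "\n"


# Tiny stand-in for whoosh_backend.py, just to show the effect.
sample = (
    "from whoosh.analysis import StemmingAnalyzer\n"
    "schema_field = TEXT(stored=True, analyzer=StemmingAnalyzer())\n"
)
result = patch_backend_source(sample)
print(result)

# Applied to the real files it could look like this:
# with open("whoosh_backend.py") as src:
#     patched_text = patch_backend_source(src.read())
# with open("whoosh_cn_backend.py", "w") as dst:
#     dst.write(patched_text)
```

The original whoosh_backend.py is left untouched, so the stock whoosh engine remains available.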

9. Generate Index

#Manually generate an index:
python manage.py rebuild_index

10. Implement Search Entry

#Add a search box to the page:
<form method='get' action="/search/" target="_blank">
    <input type="text" name="q">
    <input type="submit" value="query">
</form>
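
The form above sends a GET request to /search/ with the text in the q parameter, and the pagination links in step 6 add a page parameter. A small sketch of building the same URLs from Python, for example in a test or a redirect; search_url is a hypothetical helper name.

```python
from urllib.parse import urlencode


def search_url(query, page=None, base="/search/"):
    """Build a search URL with the q and optional page parameters (hypothetical helper)."""
    params = {"q": query}
    if page is not None:
        params["page"] = page
    # urlencode escapes spaces and non-ASCII characters in the query text
    return base + "?" + urlencode(params)


first_page = search_url("red shoes")
second_page = search_url("red shoes", page=2)
print(first_page)
print(second_page)
```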

Rich customizations

The above is only a quick, basic search setup. haystack offers many more customizable features to meet more specific needs.
Refer to the official documentation: http://django-haystack.readth...

Custom search view

In the configuration above, search requests are routed to haystack.urls. To customize the search view and add functionality, you can replace that routing.
#haystack.urls itself is very simple:
from django.conf.urls import url  
from haystack.views import SearchView  
  
urlpatterns = [  
    url(r'^$', SearchView(), name='haystack_search'),  
]  

#Now write a view that inherits from SearchView, and point the search url at the custom view instead of haystack.urls.
class MySearchView(SearchView):
    #Override the relevant attributes or methods
    template = 'search_result.html'

Read the source code or documentation of SearchView to see what each method does, then override what you need.
The example above overrides the template attribute, which changes the template used for the search results page.
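
Putting the pieces together, the project's urls.py could route searches to the custom view roughly as follows. This is a sketch that mirrors the shape of haystack.urls shown above rather than a definitive setup; in a real project MySearchView would normally live in the application's views.py and be imported here, and it only runs inside a configured Django project.

```python
# urls.py of the project
from django.conf.urls import url  # removed in Django 4.0; newer versions use django.urls.re_path
from haystack.views import SearchView


class MySearchView(SearchView):
    # Same override as above: use a different results template.
    template = 'search_result.html'


urlpatterns = [
    # Replaces url(r'^search/', include('haystack.urls')) from step 3.
    url(r'^search/$', MySearchView(), name='haystack_search'),
]
```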

Highlight

#In the search results page template, you can use the highlight tag (load it first with {% load highlight %}):
{% highlight <text_block> with <query> [css_class "class_name"] [html_tag "span"] [max_length 200] %}

text_block is the full text and query is the keyword to highlight. The optional parameters that follow set the css class name and html tag that wrap each highlighted keyword, and the maximum length of the highlighted excerpt.
The highlighting source code is in haystack/templatetags/highlight.py and haystack/utils/highlighting.py; it can be copied and modified to implement custom highlighting.
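
To give an idea of what a custom highlighter does, here is a deliberately simplified standalone sketch. It is not haystack's implementation: it simply clips the text to max_length from the start and wraps each query word in the given tag, whereas the real highlighter has its own logic for choosing the excerpt. The parameter names follow the tag options above, and the function name highlight_text is hypothetical.

```python
import html
import re


def highlight_text(text_block, query, html_tag="span", css_class="highlighted",
                   max_length=200):
    """Wrap each query word found in text_block in an HTML tag (simplified sketch)."""
    clipped = text_block[:max_length]
    # Longer words first so that a short word does not pre-empt a longer match.
    words = sorted((re.escape(w) for w in query.split()), key=len, reverse=True)
    if not words:
        return html.escape(clipped)
    pattern = re.compile("(" + "|".join(words) + ")", re.IGNORECASE)
    pieces = []
    # With one capturing group, re.split alternates plain text and matches.
    for index, part in enumerate(pattern.split(clipped)):
        escaped = html.escape(part)
        if index % 2 == 1:
            pieces.append('<%s class="%s">%s</%s>' % (html_tag, css_class, escaped, html_tag))
        else:
            pieces.append(escaped)
    return "".join(pieces)


snippet = highlight_text("Red shoes and red hats", "red")
print(snippet)
```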

Keywords: Django Python pip Database

Added by trufla on Thu, 01 Aug 2019 04:27:55 +0300