This article discusses how to operate RedisSearch from Java.
RedisSearch is a search engine. It segments text into words both when building an index and when processing a query. For English, word segmentation is relatively simple: splitting on spaces and punctuation is basically enough. Chinese word segmentation is more complex, because Chinese text cannot be split into words simply by spaces.
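To make the difference concrete, here is a minimal sketch in plain Java (it is not how friso or RedisSearch work internally). The class name `SegmentationSketch` and the method `naiveTokens` are made up for illustration. It splits on whitespace and ASCII punctuation, which is enough for the English phrase but leaves the Chinese phrase as one unsegmented token:

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class SegmentationSketch {
    // Naive tokenizer: split on runs of whitespace and ASCII punctuation.
    static List<String> naiveTokens(String text) {
        return Arrays.stream(text.split("[\\s\\p{Punct}]+"))
                .filter(token -> !token.isEmpty())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // English: spaces and punctuation are enough to find the words.
        System.out.println(naiveTokens("Hebei Province, Zhangjiakou City"));
        // prints [Hebei, Province, Zhangjiakou, City]

        // Chinese: there are no spaces, so the whole phrase stays one token.
        System.out.println(naiveTokens("河北省张家口市"));
        // prints [河北省张家口市]
    }
}
```

A dictionary-based segmenter is what turns the second phrase into separate words, which is why the dictionary matters so much in the next section.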
There are various Chinese word segmenters, such as jieba. The one used by RedisSearch is friso.
friso can be found on gitee: https://gitee.com/lionsoul/friso
For the specific use of friso, please refer to the introduction in gitee.
Before using it, I compared friso with jieba and found that its segmentation quality is a little worse than jieba's. No disrespect to the author is intended. Moreover, friso is currently in maintenance mode only, and no new version has been released for five years. The main author of friso is now maintaining a newer word segmenter, jcseg. If you are interested, you can take a look:
https://gitee.com/lionsoul/jcseg#jcseg%E6%98%AF%E4%BB%80%E4%B9%88
1. Customize the friso dictionary
For my Chinese word segmentation needs, the default friso dictionary is not suitable, so I need to customize it.
RedisSearch bundles friso directly, so the only way to customize friso is to change its initialization configuration.
You can see the default configuration first:
127.0.0.1:6379> FT.CONFIG get *
 1) 1) EXTLOAD
    2) (nil)
 2) 1) SAFEMODE
    2) true
 3) 1) CONCURRENT_WRITE_MODE
    2) false
 4) 1) NOGC
    2) false
 5) 1) MINPREFIX
    2) 2
 6) 1) FORKGC_SLEEP_BEFORE_EXIT
    2) 0
 7) 1) MAXDOCTABLESIZE
    2) 1000000
 8) 1) MAXSEARCHRESULTS
    2) 1000000
 9) 1) MAXAGGREGATERESULTS
    2) unlimited
10) 1) MAXEXPANSIONS
    2) 200
11) 1) MAXPREFIXEXPANSIONS
    2) 200
12) 1) TIMEOUT
    2) 500
13) 1) INDEX_THREADS
    2) 8
14) 1) SEARCH_THREADS
    2) 20
15) 1) FRISOINI
    2) nil
16) 1) ON_TIMEOUT
    2) return
17) 1) GCSCANSIZE
    2) 100
18) 1) MIN_PHONETIC_TERM_LEN
    2) 3
19) 1) GC_POLICY
    2) fork
20) 1) FORK_GC_RUN_INTERVAL
    2) 30
21) 1) FORK_GC_CLEAN_THRESHOLD
    2) 100
22) 1) FORK_GC_RETRY_INTERVAL
    2) 5
23) 1) _MAX_RESULTS_TO_UNSORTED_MODE
    2) 1000
24) 1) UNION_ITERATOR_HEAP
    2) 20
25) 1) CURSOR_MAX_IDLE
    2) 300000
26) 1) NO_MEM_POOLS
    2) false
27) 1) PARTIAL_INDEXED_DOCS
    2) false
28) 1) UPGRADE_INDEX
    2) Upgrade config for upgrading
29) 1) _NUMERIC_COMPRESS
    2) false
30) 1) _PRINT_PROFILE_CLOCK
    2) true
31) 1) RAW_DOCID_ENCODING
    2) false
32) 1) _NUMERIC_RANGES_PARENTS
    2) 0
The 15th configuration item, FRISOINI, is the friso configuration. It is empty by default.
I found two ways to change it, but I have only tried the first.
1.1 Method 1:
When starting redis, pass the configuration as a module parameter:
redis-server --loadmodule /usr/lib/redis/modules/redisearch.so FRISOINI /home/friso.ini
The command can be put into a Dockerfile, which also copies the friso initialization file and dictionary into the image:
FROM redislabs/redisearch:latest
MAINTAINER qzh "qiaozh2006@126.com"
WORKDIR /opt/
ADD friso.ini /home/
ADD friso_dict /home/
EXPOSE 6379
ENTRYPOINT ["redis-server", "--loadmodule", "/usr/lib/redis/modules/redisearch.so","FRISOINI", "/home/friso.ini"]
The friso.ini file can be obtained from gitee. You only need to change the dictionary path:
friso.lex_dir = /home/vendors/dict/UTF-8/
The structure of the friso_dict folder is:
friso_dict
-vendors
--Makefile.am
--dict
---Makefile.am
---GBK
---UTF-8
----friso.lex.ini
----lex-placename.lex
lex-placename.lex is a user-defined dictionary that mainly contains place names. Add the custom dictionary to friso.lex.ini:
__LEX_CJK_WORDS__ :[
lex-main.lex;
lex-admin.lex;
lex-chars.lex;
lex-cn-mz.lex;
lex-cn-place.lex;
lex-company.lex;
lex-festival.lex;
lex-flname.lex;
lex-food.lex;
lex-lang.lex;
lex-nation.lex;
lex-net.lex;
lex-org.lex;
lex-touris.lex;
lex-placename.lex;
# add more here
]
1.2 Method 2:
The module and its configuration can also be loaded through the redis configuration file. However, I have only seen this method described online and haven't tried it. I'll try it when I have time.
After the custom dictionary is configured, build and run the image, then check the friso configuration in RedisSearch again:
15) 1) FRISOINI
2) /home/friso.ini
2. Java operation instance
2.1 environment configuration
jedis is still used, and the configuration is the same as before. You can find it in the first article of this series: RedisJSON and RedisSearch (I).
2.2 POJO definition
Define province:
package com.redisStream.pojo.address;

import lombok.Getter;
import lombok.Setter;

import java.util.List;

@Getter
@Setter
public class Province {
    private String provinceName;
    private String provincePinyin;
    private List<City> cityList;
}
Define city:
package com.redisStream.pojo.address;

import com.fasterxml.jackson.annotation.JsonFormat;
import lombok.Getter;
import lombok.Setter;

import java.util.List;

@Getter
@Setter
public class City {
    private String cityName;
    private List<County> countyList;
    private String cityPinyin;
    // Stored as "longitude,latitude", e.g. "geoinfo": "-122.064228,37.377658"
    private String geoinfo;
}
Define county:
package com.redisStream.pojo.address;

import com.fasterxml.jackson.annotation.JsonFormat;
import lombok.Getter;
import lombok.Setter;

import java.util.List;

@Getter
@Setter
public class County {
    private String countyName;
    private String countyPinyin;
    private List<String> attributes;
}
2.3 index creation
package com.redisStream.utils;

import org.apache.commons.lang3.StringUtils;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Component;
import redis.clients.jedis.UnifiedJedis;
import redis.clients.jedis.search.FieldName;
import redis.clients.jedis.search.IndexDefinition;
import redis.clients.jedis.search.IndexOptions;
import redis.clients.jedis.search.Schema;

import javax.annotation.PostConstruct;
import java.util.Map;

@Component
public class RedisSearchUtils {
    private static final Logger log = LoggerFactory.getLogger(RedisSearchUtils.class);

    @Autowired
    private UnifiedJedis jedis;

    // JSONPath root that is prepended to every field path
    private String prefix = "$.";

    @PostConstruct
    private void init() {
        createIndex("place-index", "place:", new String[]{"provinceName", "cityList[*].cityName", "cityList[*].geoinfo", "cityList[*].countyList[*].countyName"});
    }

    public boolean createIndex(String indexName, String key, String... fields) {
        try {
            // Drop the index first if it already exists, so that it can be recreated
            try {
                Map<String, Object> map = jedis.ftInfo(indexName);
                log.info("index configuration:{}", map);
                jedis.ftDropIndex(indexName);
            } catch (Exception e) {
                log.error("the index does not exist", e);
            }

            Schema schema = new Schema();
            float weight = 1.0f;
            for (String field : fields) {
                if (StringUtils.isNoneBlank(field)) {
                    // The alias is the last segment of the path, e.g. "cityList[*].cityName" -> "cityName"
                    String attribute;
                    if (field.indexOf(".") == -1) {
                        attribute = field;
                    } else {
                        String[] fieldSplit = field.split("\\.");
                        attribute = fieldSplit[fieldSplit.length - 1];
                    }

                    if (attribute.toLowerCase().startsWith("geo")) {
                        // Fields whose alias starts with "geo" are indexed as GEO
                        Schema.Field geoField = new Schema.Field(FieldName.of(prefix + field).as(attribute), Schema.FieldType.GEO);
                        schema.addField(geoField);
                    } else {
                        // Everything else is indexed as TEXT; each later text field gets three times the weight
                        Schema.TextField textField = new Schema.TextField(FieldName.of(prefix + field).as(attribute), weight, false, false, false, null);
                        schema.addField(textField);
                        weight *= 3;
                    }
                }
            }

            // Index JSON documents whose key starts with the given prefix, segmented as Chinese
            IndexDefinition rule = new IndexDefinition(IndexDefinition.Type.JSON)
                    .setLanguage("chinese")
                    .setPrefixes(new String[]{key});
            jedis.ftCreate(indexName, IndexOptions.defaultOptions().setDefinition(rule), schema);
            return true;
        } catch (Exception e) {
            log.error("create redis search index failed", e);
            return false;
        }
    }
}
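To see what schema createIndex ends up describing, it can help to write out the corresponding FT.CREATE command. The following is a minimal sketch in plain Java that only builds the command text; it does not talk to redis, and the exact arguments that jedis sends may differ. The class `FtCreateSketch` and its methods are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

class FtCreateSketch {
    // Same alias rule as createIndex: keep only the last path segment.
    static String alias(String field) {
        int dot = field.lastIndexOf('.');
        return dot < 0 ? field : field.substring(dot + 1);
    }

    // Build an FT.CREATE command string for a JSON index with Chinese segmentation.
    static String ftCreate(String indexName, String keyPrefix, String... fields) {
        List<String> parts = new ArrayList<>();
        parts.add("FT.CREATE " + indexName + " ON JSON PREFIX 1 " + keyPrefix
                + " LANGUAGE chinese SCHEMA");
        int weight = 1;
        for (String field : fields) {
            String alias = alias(field);
            if (alias.toLowerCase().startsWith("geo")) {
                parts.add("$." + field + " AS " + alias + " GEO");
            } else {
                parts.add("$." + field + " AS " + alias + " TEXT WEIGHT " + weight);
                weight *= 3; // each later text field weighs three times more
            }
        }
        return String.join(" ", parts);
    }

    public static void main(String[] args) {
        System.out.println(ftCreate("place-index", "place:",
                "provinceName", "cityList[*].cityName", "cityList[*].geoinfo"));
    }
}
```

Pasting the printed command into redis-cli is one way to check whether a problem comes from the schema itself or from the Java side.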
2.4 defining Controller
2.4.1 adding data
package com.redisStream.controller;

import com.alibaba.fastjson.JSON;
import com.redisStream.pojo.address.Province;
import org.apache.commons.lang3.StringUtils;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestBody;
import org.springframework.web.bind.annotation.RestController;
import redis.clients.jedis.UnifiedJedis;
import redis.clients.jedis.json.Path2;
import redis.clients.jedis.search.Document;
import redis.clients.jedis.search.Query;
import redis.clients.jedis.search.SearchResult;

import java.util.HashMap;
import java.util.List;
import java.util.Map;

@RestController
public class PlaceController {
    private static final Logger log = LoggerFactory.getLogger(PlaceController.class);

    @Autowired
    private UnifiedJedis jedis;

    // Must match the key prefix used when the index was created
    private String key_prefix = "place:";

    @PostMapping("/addProvince")
    public String addProvince(@RequestBody Province newKeyInfo) throws Exception {
        // Store the province as a JSON document under place:<provinceName>, then read it back
        jedis.jsonSet(key_prefix + newKeyInfo.getProvinceName(), JSON.toJSONString(newKeyInfo));
        return JSON.toJSONString(jedis.jsonGet(key_prefix + newKeyInfo.getProvinceName()));
    }
Test it and send a request to add data:
POST http://localhost:8081/addProvince
{
    "provinceName": "Hebei Province",
    "provincePinyin": "hebeisheng",
    "cityList": [{
        "cityName": "Zhangjiakou City",
        "cityPinyin": "zhangjiakoushi",
        "geoinfo": "115.408848,40.970239",
        "countyList": [{
            "countyName": "Chongli county",
            "countyPinyin": "chonglixian",
            "attributes": ["ski resort", "mountain"]
        }]
    }]
}
Return value:
{"cityList":[{"cityName":"Zhangjiakou City","cityPinyin":"zhangjiakoushi","countyList":[{"attributes":["ski resort","mountain"],"countyName":"Chongli county","countyPinyin":"chonglixian"}],"geoinfo":"115.408848,40.970239"}],"provinceName":"Hebei Province","provincePinyin":"hebeisheng"}
2.4.2 search a province
@PostMapping("/queryforProvince")
public Map<String, String> queryProvince(@RequestBody String keyword) throws Exception {
    // Restrict the search to the provinceName field
    Query q = new Query("@provinceName:" + keyword);
    SearchResult result = jedis.ftSearch("place-index", q);
    List<Document> docs = result.getDocuments();
    Map<String, String> map = new HashMap<>();
    for (Document doc : docs) {
        doc.getProperties().forEach(a -> map.put(doc.getId(), a.toString()));
    }
    return map;
}
Send a request:
POST http://localhost:8081/queryforProvince
Hebei Province
The response is empty.
Why is it empty? I tried several approaches and finally found that the search works once the geo field is removed from the index.
Can a GEO field not be indexed together with TEXT fields in the same index? I checked the official website and found no relevant instructions. I suspected that my JSONPath was wrong here, so I read through "JSONPath - XPath for JSON" and tried several variations, but nothing helped.
In the end I had no choice but to split the index into two:
createIndex("place-index","place:", new String[]{"provinceGeoInfo", "provinceName","cityList.cityName","cityList.countyList.countyName"});
createIndex("place-geo-index","place:", new String[]{"cityList[*].geoinfo"});
If anyone knows how to do this properly, please leave a comment.
After changing the indexes as follows, the search can be tried again:
createIndex("place-index","place:", new String[]{"provinceName","cityList[*].cityName","cityList[*].countyList[*].countyName"});
createIndex("place-geo-index","place:", new String[]{"provinceName","cityList[*].geoinfo"});
2.4.3 search a city
@PostMapping("/queryforCity")
public Map<String, String> queryCity(@RequestBody String keyword) throws Exception {
    // Restrict the search to the cityName field
    Query q = new Query("@cityName:" + keyword);
    SearchResult result = jedis.ftSearch("place-index", q);
    List<Document> docs = result.getDocuments();
    Map<String, String> map = new HashMap<>();
    for (Document doc : docs) {
        doc.getProperties().forEach(a -> map.put(doc.getId(), a.toString()));
    }
    return map;
}
This is the same as searching for a province, except that a different field is specified when building the Query. Try it:
POST http://localhost:8081/queryforCity
{Zhangjiakou City}
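Since the province and city handlers differ only in the field name, the query string could be built by a small helper. This is a minimal sketch, not what the controller above does. It assumes that a multi-word keyword should be wrapped in parentheses so that the field modifier applies to every word, and that punctuation in user input should be escaped with a backslash; the escaped character list is my assumption, and `FieldQuerySketch`, `escape` and `fieldQuery` are made-up names.

```java
class FieldQuerySketch {
    // Punctuation that is assumed to need a backslash escape inside a query.
    private static final String SPECIAL = ",.<>{}[]\"':;!@#$%^&*()-+=~";

    // Prefix every special character with a backslash.
    static String escape(String keyword) {
        StringBuilder sb = new StringBuilder();
        for (char c : keyword.toCharArray()) {
            if (SPECIAL.indexOf(c) >= 0) {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    // Build "@field:(keyword)" so all words of the keyword are matched in that field.
    static String fieldQuery(String field, String keyword) {
        return "@" + field + ":(" + escape(keyword.trim()) + ")";
    }

    public static void main(String[] args) {
        System.out.println(fieldQuery("provinceName", "Hebei Province"));
        System.out.println(fieldQuery("cityName", "Zhangjiakou City"));
    }
}
```

With such a helper, the handlers would call something like new Query(fieldQuery("cityName", keyword)) instead of concatenating the strings directly.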
2.4.4 full text search
@PostMapping("/queryforAddrall")
public Map<String, String> queryAddrALl(@RequestBody String keyword) throws Exception {
    // No field modifier: the keyword is searched in all text fields of the index
    Query q = new Query(keyword);
    SearchResult result = jedis.ftSearch("place-index", q);
    List<Document> docs = result.getDocuments();
    Map<String, String> map = new HashMap<>();
    for (Document doc : docs) {
        doc.getProperties().forEach(a -> map.put(doc.getId(), a.toString()));
    }
    return map;
}
With this interface, searching for either Hebei Province or Zhangjiakou City finds the document.
2.4.5 geographic location filtering
@PostMapping("/queryforgeo")
public Map<String, String> queryGeo(@RequestBody GEOQueryBody body) throws Exception {
    Query q = new Query(body.getName());
    if (StringUtils.isNoneBlank(body.getGeoinfo())) {
        // geoinfo has the form "longitude,latitude,radius"
        String[] geo = body.getGeoinfo().split(",");
        q.addFilter(new Query.GeoFilter("geoinfo",
                Double.parseDouble(geo[0].trim()),
                Double.parseDouble(geo[1].trim()),
                Double.parseDouble(geo[2].trim()),
                Query.GeoFilter.KILOMETERS));
    }
    SearchResult result = jedis.ftSearch("place-geo-index", q);
    List<Document> docs = result.getDocuments();
    Map<String, String> map = new HashMap<>();
    for (Document doc : docs) {
        doc.getProperties().forEach(a -> map.put(doc.getId(), a.toString()));
    }
    return map;
}
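GEOQueryBody is also very simple. Like the other POJOs, it uses lombok to generate the getters and setters that the controller calls:
import lombok.Getter;
import lombok.Setter;

@Getter
@Setter
public class GEOQueryBody {
    private String name;
    // "longitude,latitude,radius", with the radius in kilometers
    private String geoinfo;
}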
Note the first parameter of GeoFilter, the property name. If the field was given an alias with as when the index was defined, use the alias:
q.addFilter(new Query.GeoFilter("geoinfo", Double.parseDouble(geo[0].trim()), Double.parseDouble(geo[1].trim()), Double.parseDouble(geo[2].trim()), Query.GeoFilter.KILOMETERS));
If no alias was defined with as, the property must be written as the JSONPath, for example:
q.addFilter(new Query.GeoFilter("$.geoinfo", Double.parseDouble(geo[0]), Double.parseDouble(geo[1]), Double.parseDouble(geo[2]), Query.GeoFilter.KILOMETERS));
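The handler above indexes into the split array directly, so a malformed geoinfo string fails with an unhelpful exception. Below is a minimal sketch of a stricter parser, assuming the "longitude,latitude,radius" format used in this article. `GeoInfoSketch` is a made-up class; its toQueryClause method shows the inline geo filter form of the RedisSearch query syntax, which I have not tried against this index.

```java
class GeoInfoSketch {
    final double lon;
    final double lat;
    final double radiusKm;

    GeoInfoSketch(double lon, double lat, double radiusKm) {
        this.lon = lon;
        this.lat = lat;
        this.radiusKm = radiusKm;
    }

    // Parse "longitude,latitude,radius" and reject anything else.
    static GeoInfoSketch parse(String geoinfo) {
        String[] parts = geoinfo.split(",");
        if (parts.length != 3) {
            throw new IllegalArgumentException(
                    "expected \"longitude,latitude,radius\" but got: " + geoinfo);
        }
        double lon = Double.parseDouble(parts[0].trim());
        double lat = Double.parseDouble(parts[1].trim());
        double radiusKm = Double.parseDouble(parts[2].trim());
        if (lon < -180 || lon > 180 || lat < -90 || lat > 90 || radiusKm <= 0) {
            throw new IllegalArgumentException("geoinfo out of range: " + geoinfo);
        }
        return new GeoInfoSketch(lon, lat, radiusKm);
    }

    // Inline geo filter clause: @field:[lon lat radius km]
    String toQueryClause(String field) {
        return "@" + field + ":[" + lon + " " + lat + " " + radiusKm + " km]";
    }

    public static void main(String[] args) {
        GeoInfoSketch geo = parse("115.408848,40.970239,20");
        System.out.println(geo.toQueryClause("geoinfo"));
    }
}
```

The parsed values can be passed to Query.GeoFilter exactly as in the handler; the only difference is that bad input is rejected with a clear message before the query is sent.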
Send a request to try it. If the longitude and latitude in the request are changed, for example:
{
    "name": "Hebei Province",
    "geoinfo": "125.408848,40.970239,20"
}
The search result is empty.
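The empty result is expected: the query center at longitude 125.408848 is roughly 840 km from the stored point at 115.408848,40.970239, far outside the 20 km radius. A minimal sketch that checks this with the haversine formula follows. It is an approximation on a sphere with an assumed mean Earth radius of 6371 km, so redis may compute a slightly different distance; `HaversineSketch` is a made-up class.

```java
class HaversineSketch {
    // Assumed mean Earth radius in kilometers.
    static final double EARTH_RADIUS_KM = 6371.0;

    // Great-circle distance between two (longitude, latitude) points in degrees.
    static double distanceKm(double lon1, double lat1, double lon2, double lat2) {
        double phi1 = Math.toRadians(lat1);
        double phi2 = Math.toRadians(lat2);
        double dPhi = phi2 - phi1;
        double dLambda = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dPhi / 2) * Math.sin(dPhi / 2)
                + Math.cos(phi1) * Math.cos(phi2)
                * Math.sin(dLambda / 2) * Math.sin(dLambda / 2);
        return 2 * EARTH_RADIUS_KM * Math.asin(Math.sqrt(a));
    }

    public static void main(String[] args) {
        // Query center from the request above vs. the stored Zhangjiakou City point
        double d = distanceKm(125.408848, 40.970239, 115.408848, 40.970239);
        System.out.println("distance in km: " + d);
        System.out.println("within 20 km: " + (d <= 20));
    }
}
```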
2.4.6 adding city information
@PostMapping("/addforCity")
public String addCity(@RequestBody Province newKeyInfo) throws Exception {
    // Append the first city of the request body to the cityList array of the existing province document
    Path2 path = new Path2("$.cityList");
    jedis.jsonArrAppend(key_prefix + newKeyInfo.getProvinceName(), path, JSON.toJSONString(newKeyInfo.getCityList().get(0)));
    return JSON.toJSONString(jedis.jsonGet(key_prefix + newKeyInfo.getProvinceName()));
}
Add the following:
{
    "provinceName": "Hebei Province",
    "geoinfoprovince": "125.1111,33.2222",
    "provincePinyin": "hebeisheng",
    "cityList": [{
        "cityName": "Shijiazhuang city",
        "cityPinyin": "shijiazhuangshi",
        "geoinfo": "125.408848,41.970239",
        "countyList": [{
            "countyName": "Zhengding County",
            "countyPinyin": "chonglixian",
            "attributes": ["Zhengding", "famous historical city"]
        }]
    }]
}
The returned result is:
{"cityList":[{"cityName":"Zhangjiakou City","cityPinyin":"zhangjiakoushi","countyList":[{"attributes":["ski resort","mountain"],"countyName":"Chongli county","countyPinyin":"chonglixian"}],"geoinfo":"115.408848,40.970239"},{"cityName":"Shijiazhuang city","cityPinyin":"shijiazhuangshi","countyList":[{"attributes":["Zhengding","famous historical city"],"countyName":"Zhengding County","countyPinyin":"chonglixian"}],"geoinfo":"125.408848,41.970239"}],"geoinfoprovince":"125.1111,33.2222","provinceName":"Hebei Province","provincePinyin":"hebeisheng"}
3. Application restrictions
Most of the problems I ran into could be worked around, even where I did not fully understand the cause. However, when I index a JSON array and the array contains more than one element, the search no longer works. Take the example above: after adding Shijiazhuang City as a second city, the same search returns empty results. After struggling with this for a long time, I found this sentence on the official website:
JSON arrays can only be indexed in a TAG field.
In other words, indexing a JSON array as a TEXT field is not supported.
Frustrating. I hope I'm wrong.
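If the restriction quoted above holds, one possible workaround is to index the array elements as TAG fields instead of TEXT. I have not tried this against a server, so treat the following as a sketch under that assumption. It only builds the schema fragment and the tag query text; `TagWorkaroundSketch` and its methods are made-up names. Keep in mind that a TAG field matches whole values, so the friso word segmentation would not apply to it.

```java
class TagWorkaroundSketch {
    // Schema fragment that indexes a JSON array path as a TAG field.
    static String tagField(String jsonPath, String alias) {
        return jsonPath + " AS " + alias + " TAG";
    }

    // Tag query "@field:{value}"; spaces and punctuation in the value are escaped.
    static String tagQuery(String field, String value) {
        StringBuilder sb = new StringBuilder();
        for (char c : value.toCharArray()) {
            if (!Character.isLetterOrDigit(c) && c != '_') {
                sb.append('\\');
            }
            sb.append(c);
        }
        return "@" + field + ":{" + sb + "}";
    }

    public static void main(String[] args) {
        System.out.println(tagField("$.cityList[*].cityName", "cityName"));
        System.out.println(tagQuery("cityName", "Zhangjiakou City"));
    }
}
```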