PostgreSQL Billion-Level Fuzzy Query Best Practice

Label

PostgreSQL, Fuzzy Query, Regular Query, pg_trgm, bytea, gin, Functional Index

background

Pre-fuzziness (with prefix fuzziness), post-fuzziness (with suffix fuzziness), pre-and post-fuzziness (without prefix fuzziness), and regular matching are common requirements in the field of text search.

PostgreSQL has a strong text search capability. Besides supporting full-text search, it also supports fuzzy query and regular query. The built-in pg_trgm plug-in is not available in general databases and may not have been heard of by many people. At the same time, the functions of expression index and GIN index are built-in.

Different fuzzy query requirements have different optimization methods.

PostgreSQL, like other databases, can be accelerated with btree for forward and backward ambiguities. Post-ambiguity can be accelerated by using functional index of inversion function.

For forward and backward fuzziness and regular matching, one method is to use pg_trgm plug-in and use GIN index to speed up the fuzziness and regular query (the effect of inputting three or more characters is very good). Another method is to customize GIN expression index, which is suitable for customized fuzzy query.

I. Optimization of Front-Fuzzy and Back-Fuzzy

1. Pre-Fuzzy (Pre-Fuzzy) Optimization Method

Pre-ambiguous queries can be supported with b-tree. Only suitable for collate="C" queries, when the default database lc_collate<>C, index and query need to specify collate "C" explicitly.

The collate s of indexes and query conditions must be consistent in order to use indexes.

Example

test=# create table test(id int, info text);    
CREATE TABLE    
test=# insert into test select generate_series(1,1000000),md5(random()::text);    
INSERT 0 1000000    
test=# create index idx on test(info collate "C");    
CREATE INDEX    
    
test=# explain (analyze,verbose,timing,costs,buffers) select * from test where info like 'abcd%' collate "C";    
                                                      QUERY PLAN                                                          
----------------------------------------------------------------------------------------------------------------------    
 Index Scan using idx on public.test  (cost=0.42..16.76 rows=100 width=37) (actual time=0.057..0.093 rows=18 loops=1)    
   Output: id, info    
   Index Cond: ((test.info >= 'abcd'::text) AND (test.info < 'abce'::text))    
   Filter: (test.info ~~ 'abcd%'::text COLLATE "C")    
   Buffers: shared hit=18 read=3    
 Planning time: 0.424 ms    
 Execution time: 0.124 ms    
(7 rows)

2. Optimizing Method of Postfuzzy (Fuzzy with Suffix)

Using reverse function index, post-ambiguous queries can be supported. Only suitable for collate="C" queries, when the default database lc_collate<>C, index and query need to specify collate "C" explicitly.

The collate s of indexes and query conditions must be consistent in order to use indexes.

Example

test=# create index idx1 on test(reverse(info) collate "C");    
CREATE INDEX    
test=# select * from test limit 1;    
 id |               info                   
----+----------------------------------    
  1 | b3275976cdd437a033d4329775a52514    
(1 row)    
    
test=# explain (analyze,verbose,timing,costs,buffers) select * from test where reverse(info) like '4152%' collate "C";    
                                                        QUERY PLAN                                                            
--------------------------------------------------------------------------------------------------------------------------    
 Index Scan using idx1 on public.test  (cost=0.42..4009.43 rows=5000 width=37) (actual time=0.061..0.097 rows=18 loops=1)    
   Output: id, info    
   Index Cond: ((reverse(test.info) >= '4152'::text) AND (reverse(test.info) < '4153'::text))    
   Filter: (reverse(test.info) ~~ '4152%'::text COLLATE "C")    
   Buffers: shared hit=18 read=3    
 Planning time: 0.128 ms    
 Execution time: 0.122 ms    
(7 rows)    
    
test=# select * from test where reverse(info) like '4152%' collate "C";    
   id   |               info                   
--------+----------------------------------    
 847904 | abe2ecd90393b5275df8e34a39702514    
 414702 | 97f66d26545329321164042657d02514    
 191232 | 7820972c6220c2b01d46c11ebb532514    
 752742 | 93232ac39c6632e2540df44627c42514    
 217302 | 39e518893a1a7b1e691619bd1fc42514    
      1 | b3275976cdd437a033d4329775a52514    
 615718 | 4948f94c484c13dc6c4fae8a3db52514    
 308815 | fc2918ceff7c7a4dafd2e04031062514    
 149521 | 546d963842ea5ca593e622c810262514    
 811093 | 4b6eca2eb6b665af67b2813e91a62514    
 209000 | 1dfd0d4e326715c1739f031cca992514    
 937616 | 8827fd81f5b673fb5afecbe3e11b2514    
 419553 | bd6e01ce360af16137e8b6abc8ab2514    
 998324 | 7dff51c19dc5e5d9979163e7d14c2514    
 771518 | 8a54e30003a48539fff0aedc73ac2514    
 691566 | f90368348e3b6bf983fcbe10db2d2514    
 652274 | 8bf4a97b5f122a5540a21fa85ead2514    
 233437 | 739ed715fc203d47e37e79b5bcbe2514    
(18 rows)

3. Integrative optimization method of forward and backward ambiguity

Using the pg_trgm index, you can support forward and backward ambiguous queries. Note:

(Fuzzy with prefix) Enter at least one character (Fuzzy with suffix) and at least two characters. Only good index filtering effect. To support Chinese, the database lc_collate and lc_ctype cannot be "C".

The collate s of indexes and query conditions must be consistent in order to use indexes.

test=# \l+ test    
                                              List of databases    
 Name |  Owner   | Encoding |  Collate   |   Ctype    | Access privileges |  Size  | Tablespace | Description     
------+----------+----------+------------+------------+-------------------+--------+------------+-------------    
 test | postgres | UTF8     | zh_CN.utf8 | zh_CN.utf8 |                   | 245 MB | pg_default |     
(1 row)    
    
test=# create extension pg_trgm;    
    
test=# create table test001(c1 text);    
CREATE TABLE

Functions for Generating Random Chinese Strings

test=# create or replace function gen_hanzi(int) returns text as $$                
declare      
  res text;      
begin      
  if $1 >=1 then      
    select string_agg(chr(19968+(random()*20901)::int), '') into res from generate_series(1,$1);      
    return res;      
  end if;      
  return null;      
end;      
$$ language plpgsql strict;      
CREATE FUNCTION

Generating Random Data

test=# insert into test001 select gen_hanzi(20) from generate_series(1,100000);    
INSERT 0 100000    
    
test=# create index idx_test001_1 on test001 using gin (c1 gin_trgm_ops);    
CREATE INDEX    
    
test=# select * from test001 limit 5;    
                    c1                        
------------------------------------------    
 Noise-provoking Office of Xing He's flourishing and eloquent.    
 It's a good idea to be very skillful and well-organized at the same time. It's a good idea to be a good friend of younger generations and to be a good friend of younger generations of younger generations of younger generations of younger generations of younger generations of younger generations of younger generations of younger generations of younger generations of younger generations of younger generations of younger generations of younger generations of younger generations of younger generations.    
 The two sorrows of Jiazhao Peng and Zheng Zheng violate the law.    
 Swimming, crashing and penetrating, and staring at each other.    
 Publishing and punishing the dragon in Qiaopu is full of vigor and vigour.    
(5 rows)

Fuzzy Query

test=# Explain (analysis, verbose, timing, costs, buffers) select * from test001 where C1 like'you%';    
                                                      QUERY PLAN                                                           
-----------------------------------------------------------------------------------------------------------------------    
 Bitmap Heap Scan on public.test001  (cost=5.08..15.20 rows=10 width=61) (actual time=0.030..0.034 rows=3 loops=1)    
   Output: c1    
   Recheck Cond: (test001.c1 ~~ 'you%'::text)    
   Heap Blocks: exact=3    
   Buffers: shared hit=7    
   ->  Bitmap Index Scan on idx_test001_1  (cost=0.00..5.08 rows=10 width=0) (actual time=0.020..0.020 rows=3 loops=1)    
         Index Cond: (test001.c1 ~~ 'you%'::text)    
         Buffers: shared hit=4    
 Planning time: 0.119 ms    
 Execution time: 0.063 ms    
(10 rows)    
    
test=# Explain (analysis, verbose, timing, costs, buffers) select * from test001 where C1 like'% skeleton';    
                                                      QUERY PLAN                                                           
-----------------------------------------------------------------------------------------------------------------------    
 Bitmap Heap Scan on public.test001  (cost=5.08..15.20 rows=10 width=61) (actual time=0.031..0.034 rows=1 loops=1)    
   Output: c1    
   Recheck Cond: (test001.c1 ~~ '%Chinese style'::text)    
   Rows Removed by Index Recheck: 1    
   Heap Blocks: exact=2    
   Buffers: shared hit=6    
   ->  Bitmap Index Scan on idx_test001_1  (cost=0.00..5.08 rows=10 width=0) (actual time=0.020..0.020 rows=2 loops=1)    
         Index Cond: (test001.c1 ~~ '%Chinese style'::text)    
         Buffers: shared hit=4    
 Planning time: 0.136 ms    
 Execution time: 0.062 ms    
(11 rows)

2. Fuzzy optimization

pg_trgm plug-in is used to support ambiguous queries. Be careful:

If pg_trgm is needed to support Chinese fuzzy queries, the database lc_collate and lc_ctype cannot be "C".

For example, input three or more characters, otherwise the effect is not good.

Example

test=# Explain (analysis, verbose, timing, costs, buffers) select * from test001 where C1 like'% Xinxing He%';    
                                                      QUERY PLAN                                                           
-----------------------------------------------------------------------------------------------------------------------    
 Bitmap Heap Scan on public.test001  (cost=5.08..15.20 rows=10 width=61) (actual time=0.038..0.038 rows=1 loops=1)    
   Output: c1    
   Recheck Cond: (test001.c1 ~~ '%Xin Xinghe%'::text)    
   Heap Blocks: exact=1    
   Buffers: shared hit=5    
   ->  Bitmap Index Scan on idx_test001_1  (cost=0.00..5.08 rows=10 width=0) (actual time=0.025..0.025 rows=1 loops=1)    
         Index Cond: (test001.c1 ~~ '%Xin Xinghe%'::text)    
         Buffers: shared hit=4    
 Planning time: 0.170 ms    
 Execution time: 0.076 ms    
(10 rows)    
    
test=# Explain (analysis, verbose, timing, costs, buffers) select * from test001 where C1 like'% Xing%';    
                                                              QUERY PLAN                                                                  
--------------------------------------------------------------------------------------------------------------------------------------    
 Bitmap Heap Scan on public.test001  (cost=7615669.08..7615679.20 rows=10 width=61) (actual time=147.524..178.232 rows=1 loops=1)    
   Output: c1    
   Recheck Cond: (test001.c1 ~~ '%Xing Xing%'::text)    
   Rows Removed by Index Recheck: 99999    
   Heap Blocks: exact=1137    
   Buffers: shared hit=14429    
   ->  Bitmap Index Scan on idx_test001_1  (cost=0.00..7615669.08 rows=10 width=0) (actual time=147.377..147.377 rows=100000 loops=1)    
         Index Cond: (test001.c1 ~~ '%Xing Xing%'::text)    
         Buffers: shared hit=13292    
 Planning time: 0.133 ms    
 Execution time: 178.265 ms    
(11 rows)

3. Optimization of Regular Matching

At present, the effect of pg_trgm on the regular matching of Chinese (or multi-byte characters) is not good, and that of ascii characters is good.

Rows Removed by Index Recheck is very small, indicating that the index is very filterable.

Example

test=# explain (analyze,verbose,timing,costs,buffers) select * from test001 where c1 ~ '12[0-9]{3,9}';    
                                                       QUERY PLAN                                                           
------------------------------------------------------------------------------------------------------------------------    
 Bitmap Heap Scan on public.test001  (cost=65.08..75.20 rows=10 width=61) (actual time=0.196..0.196 rows=0 loops=1)    
   Output: c1    
   Recheck Cond: (test001.c1 ~ '12[0-9]{3,9}'::text)    
   Rows Removed by Index Recheck: 1    
   Heap Blocks: exact=1    
   Buffers: shared hit=50    
   ->  Bitmap Index Scan on idx_test001_1  (cost=0.00..65.08 rows=10 width=0) (actual time=0.183..0.183 rows=1 loops=1)    
         Index Cond: (test001.c1 ~ '12[0-9]{3,9}'::text)    
         Buffers: shared hit=49    
 Planning time: 0.452 ms    
 Execution time: 0.221 ms    
(11 rows)    
    
test=# Explain (analysis, verbose, timing, costs, buffers) select * from test001 where C1 ~'Xinxing He';    
                                                              QUERY PLAN                                                                  
--------------------------------------------------------------------------------------------------------------------------------------    
 Bitmap Heap Scan on public.test001  (cost=7615669.08..7615679.20 rows=10 width=61) (actual time=143.846..232.278 rows=1 loops=1)    
   Output: c1    
   Recheck Cond: (test001.c1 ~ 'Xin Xinghe'::text)    
   Rows Removed by Index Recheck: 99999    
   Heap Blocks: exact=1137    
   Buffers: shared hit=14429    
   ->  Bitmap Index Scan on idx_test001_1  (cost=0.00..7615669.08 rows=10 width=0) (actual time=143.688..143.688 rows=100000 loops=1)    
         Index Cond: (test001.c1 ~ 'Xin Xinghe'::text)    
         Buffers: shared hit=13292    
 Planning time: 0.254 ms    
 Execution time: 232.312 ms    
(11 rows)

Principle of pg_trgm Fuzzy Query

First, pg_trgm adds two spaces to the front end of the string and one space to the end.

Then, each consecutive three characters is a TOKEN, which is disassembled.

Finally, GIN inverted index is established for TOKEN.

To view the TOKEN of a string, you can use the following method.

test=# select show_trgm('123');    
        show_trgm            
-------------------------    
 {"  1"," 12",123,"23 "}    
(1 row)

Reasons for Requirement of Number of Fuzzy Characters around pg_trgm

When using pg_trgm, if you want to get the best results, you'd better satisfy these conditions.

1. Fuzzy queries with prefixes, such as a%, require at least one character. (The search is for token ='a')

2. Fuzzy queries with suffixes, such as% ab, need to provide at least two characters. (The search is for token='ab')

3. Fuzzy queries, such as% abcd, need to provide at least three characters. (This uses an array search for token(s) containing {"a", "ab", "abc", "bcd"}

Why?

Because the TOKEN generated by pg_trgm is three characters, only under the above three conditions can the corresponding TOKEN be matched.

test=# select show_trgm('123');    
        show_trgm            
-------------------------    
 {"  1"," 12",123,"23 "}    
(1 row)

4. Optimization of Fuzzy Query with Less than 3 Input Characters

pg_trgm can't satisfy the requirement when we need to fuzzily search one or two characters before and after, but we can use the expression GIN index.

Using expressions, the string is split into a single word, two consecutive character arrays, and the GIN index can be established for logarithmic groups.

Example

test=# create or replace function split001(text) returns text[] as $$    
declare    
  res text[];    
begin    
  select regexp_split_to_array($1,'') into res;    
  for i in 1..length($1)-1 loop    
    res := array_append(res, substring($1,i,2));    
  end loop;    
  return res;    
end;    
$$ language plpgsql strict immutable;    
CREATE FUNCTION    
    
test=# create index idx_test001_2 on test001 using gin (split001(c1));    
    
test=# Explain (analysis, verbose, timing, costs, buffers) select * from test001 where split001 (c1) @ > array ['hello'];    
                                                       QUERY PLAN                                                           
------------------------------------------------------------------------------------------------------------------------    
 Bitmap Heap Scan on public.test001  (cost=8.87..550.12 rows=500 width=61) (actual time=0.041..0.041 rows=0 loops=1)    
   Output: c1    
   Recheck Cond: (split001(test001.c1) @> '{Hello}'::text[])    
   Buffers: shared hit=4    
   ->  Bitmap Index Scan on idx_test001_2  (cost=0.00..8.75 rows=500 width=0) (actual time=0.039..0.039 rows=0 loops=1)    
         Index Cond: (split001(test001.c1) @> '{Hello}'::text[])    
         Buffers: shared hit=4    
 Planning time: 0.104 ms    
 Execution time: 0.068 ms    
(9 rows)    
    
test=# Explain (analysis, verbose, timing, costs, buffers) select * from test001 where split001 (c1) @ > array ['you'];    
                                                       QUERY PLAN                                                            
-------------------------------------------------------------------------------------------------------------------------    
 Bitmap Heap Scan on public.test001  (cost=8.87..550.12 rows=500 width=61) (actual time=0.063..0.183 rows=86 loops=1)    
   Output: c1    
   Recheck Cond: (split001(test001.c1) @> '{you}'::text[])    
   Heap Blocks: exact=80    
   Buffers: shared hit=84    
   ->  Bitmap Index Scan on idx_test001_2  (cost=0.00..8.75 rows=500 width=0) (actual time=0.048..0.048 rows=86 loops=1)    
         Index Cond: (split001(test001.c1) @> '{you}'::text[])    
         Buffers: shared hit=4    
 Planning time: 0.101 ms    
 Execution time: 0.217 ms    
(10 rows)    
    
test=# Select * from test001 where split001 (c1) @ > array ['you'];    
                    c1                        
------------------------------------------    
 //The great and dirty breasts of Hong Xian and Xian Xian collapsed and your captain collapsed in the vicinity of Hong Xian and Xian Xian.and your captain's captain's captain's captain's captain's captain's captain's captain's captain's captain's captain's captain's captain's captain's captain's Captain    
 //With a warm face, the eel always accompanies you.    
......

V. Similar Query Optimization

Fuzzy query and regular matching are both to find records that are fully qualified. Another requirement is similar query.

For example, the postgresql string, input p0stgresgl, can also be matched according to similarity.

Example

test=# create index idx_test001_3 on test001 using gist (c1 gist_trgm_ops);    
CREATE INDEX    
      
test=# Explain (analysis, verbose, timing, costs, buffers) SELECT, C1 <->'cockroach cockroach cockroach cockroach cockroach cockroach cockroach cockroach cockroach cockroach cockroach cockroach cockroach cockroach cockroach cockroach cockroach cockroach cockroach cockroach as'AS dist      
  FROM test001 t      
  ORDER BY dist LIMIT 5;      
                                                                   QUERY PLAN                                                                        
-------------------------------------------------------------------------------------------------------------------------------------------------    
 Limit  (cost=0.28..0.52 rows=5 width=89) (actual time=37.462..37.639 rows=5 loops=1)    
   Output: t.*, ((c1 <-> 'Bromoconaze in the milk of cockroach cockroach'::text))    
   Buffers: shared hit=1631    
   ->  Index Scan using idx_test001_3 on public.test001 t  (cost=0.28..4763.28 rows=100000 width=89) (actual time=37.461..37.636 rows=5 loops=1)    
         Output: t.*, (c1 <-> 'Bromoconaze in the milk of cockroach cockroach'::text)    
         Order By: (t.c1 <-> 'Bromoconaze in the milk of cockroach cockroach'::text)    
         Buffers: shared hit=1631    
 Planning time: 0.089 ms    
 Execution time: 37.668 ms    
(9 rows)    
    
test=# SELECT, C1 <->'Cockroach gills'AS dist      
  FROM test001 t      
  ORDER BY dist LIMIT 5;      
                     t                      |   dist       
--------------------------------------------+----------    
 (The gills of the cockroaches tell you that the milk of the cockroaches is bromoconaze.) | 0.307692    
 (Coral peas, cups, cups and lead batteries are in full swing.) | 0.976744    
 (He intimidated and praised the toughness of the cuttlefish.) | 0.976744    
 (There is a great deal of anxiety in the world.) | 0.976744    
 (The shield of the Yuanxia Gorge bears the burden of the valley, the pulse, and the parochial silence.) | 0.976744    
(5 rows)

Summary

1. If there are only pre-Ambiguous query requirements, use collate "C" b-tree index.

2. If there are only post-ambiguous query requirements, use the b-tree index of the reverse() expression of collate "C".

3. If you have ambiguous query requirements and include Chinese, please use lc_collate, lc_ctype <> "C" database, and use gin index of pg_trgm plug-in.

4. If you have ambiguous query requirements and do not include Chinese, please use gin index of pg_trgm plug-in.

5. If there is a query requirement for regular expressions containing ascii characters, use the gin index of the pg_trgm plug-in.

6. If there is a need for fuzzy query with input conditions less than 3 characters, we can use GIN expression index to search by array inclusion, and the performance is as good as before.

7. Performance

There are 100 million records, 15 random Chinese characters per record, a total of 8 GB. Fuzzy query performance before and after testing.

1. Generating test data

vi test.sql    
insert into test001 select gen_hanzi(15) from generate_series(1,2500000);    
    
pgbench -n -r -P 1 -f ./test.sql -c 40 -j 40 -t 1 test    
  
  
  
test=# select count(*) from test001;    
   count       
-----------    
 100000000    
(1 row)    
test=# select * from test001 limit 10;  
               c1                 
--------------------------------  
 The flatfish are very choosy.  
 The winding line of the town's dike is rolling and winding.  
 There is a great deal of interest in the world.  
 It's very beautiful and graceful.  
 To be open-minded, to be strong, to be jealous  
 Coconut coconut bun  
 Ba Weng is one of the most important people in the world.  
 Lofty and elegant dressing  
 It's a gorgeous cuckoo.  
 A sudden wound on the cake and a round rope.  
(10 rows)

2. Creating Index

test=# set maintenance_work_mem ='32GB';    
test=# create index idx_test001_1 on test001 using gin (c1 gin_trgm_ops);

3. Fuzzy Query Performance Test

3.1 Pre-ambiguity

Response time: 9 ms

Return to line 4701

select * from test001 where c1 like 'you%';  
  
test=# Explain (analysis, verbose, timing, costs, buffers) select * from test001 where C1 like'you%';  
                                                          QUERY PLAN                                                            
------------------------------------------------------------------------------------------------------------------------------  
 Bitmap Heap Scan on public.test001  (cost=89.50..10161.50 rows=10000 width=46) (actual time=1.546..8.868 rows=4701 loops=1)  
   Output: c1  
   Recheck Cond: (test001.c1 ~~ 'you%'::text)  
   Rows Removed by Index Recheck: 85  
   Heap Blocks: exact=4776  
   Buffers: shared hit=4784  
   ->  Bitmap Index Scan on idx_test001_1  (cost=0.00..87.00 rows=10000 width=0) (actual time=0.879..0.879 rows=4786 loops=1)  
         Index Cond: (test001.c1 ~~ 'you%'::text)  
         Buffers: shared hit=8  
 Planning time: 0.099 ms  
 Execution time: 9.166 ms  
(11 rows)

3.2 Post-ambiguity

Response time: 0.25 milliseconds

Return 2 rows

select * from test001 where c1 like '%Chinese style';  
  
test=# Explain (analysis, verbose, timing, costs, buffers) select * from test001 where C1 like'% bow';  
                                                         QUERY PLAN                                                           
----------------------------------------------------------------------------------------------------------------------------  
 Bitmap Heap Scan on public.test001  (cost=89.50..10161.50 rows=10000 width=46) (actual time=0.049..0.223 rows=2 loops=1)  
   Output: c1  
   Recheck Cond: (test001.c1 ~~ '%Chinese style'::text)  
   Rows Removed by Index Recheck: 87  
   Heap Blocks: exact=89  
   Buffers: shared hit=94  
   ->  Bitmap Index Scan on idx_test001_1  (cost=0.00..87.00 rows=10000 width=0) (actual time=0.031..0.031 rows=89 loops=1)  
         Index Cond: (test001.c1 ~~ '%Chinese style'::text)  
         Buffers: shared hit=5  
 Planning time: 0.113 ms  
 Execution time: 0.249 ms  
(11 rows)

3.3 Ambiguity

Response time: 0.2 milliseconds

Return one line

select * from test001 where c1 like '%Cockroach%';  
  
test=# Explain (analysis, verbose, timing, costs, buffers) select * from test001 where C1 like'% cockroach';  
                                                         QUERY PLAN                                                           
----------------------------------------------------------------------------------------------------------------------------  
 Bitmap Heap Scan on public.test001  (cost=89.50..10161.50 rows=10000 width=46) (actual time=0.044..0.175 rows=1 loops=1)  
   Output: c1  
   Recheck Cond: (test001.c1 ~~ '%Cockroach%'::text)  
   Rows Removed by Index Recheck: 81  
   Heap Blocks: exact=82  
   Buffers: shared hit=87  
   ->  Bitmap Index Scan on idx_test001_1  (cost=0.00..87.00 rows=10000 width=0) (actual time=0.027..0.027 rows=82 loops=1)  
         Index Cond: (test001.c1 ~~ '%Cockroach%'::text)  
         Buffers: shared hit=5  
 Planning time: 0.112 ms  
 Execution time: 0.201 ms  
(11 rows)

Keywords: Database PostgreSQL ascii less

Added by sm on Sat, 06 Jul 2019 01:07:09 +0300

Programming VIP

PostgreSQL Billion-Level Fuzzy Query Best Practice

Label

background

I. Optimization of Front-Fuzzy and Back-Fuzzy

2. Fuzzy optimization

3. Optimization of Regular Matching

Principle of pg_trgm Fuzzy Query

Reasons for Requirement of Number of Fuzzy Characters around pg_trgm

4. Optimization of Fuzzy Query with Less than 3 Input Characters

V. Similar Query Optimization

Summary

7. Performance

Popular Keywords