[PostgreSQL] prefix fuzzy query level optimization

Pre match blur

Using B-Tree to speed up matching fuzzy query before optimization

Tectonic data

Create a new product table and insert 10 million pieces of data.

create table goods(id int, name varchar);

insert into goods select generate_series(1,10000000),md5(random()::varchar); 

Query the product name of "123%" when the index is not established, and the execution plan will take about 575ms to display:

explain (analyze,verbose,timing,costs,buffers) select * from goods where name like '123%' ;
========================================
Gather  (cost=1000.00..136516.59 rows=1000 width=37) (actual time=1.390..572.857 rows=2364 loops=1)
  Output: id, name
  Workers Planned: 2
  Workers Launched: 2
  Buffers: shared hit=83334
  ->  Parallel Seq Scan on public.goods  (cost=0.00..135416.59 rows=417 width=37) (actual time=0.750..528.116 rows=788 loops=3)
        Output: id, name
        Filter: ((goods.name)::text ~~ '123%'::text)
        Rows Removed by Filter: 3332545
        Buffers: shared hit=83334
        Worker 0: actual time=1.032..511.776 rows=676 loops=1
          Buffers: shared hit=24201
        Worker 1: actual time=0.145..511.737 rows=755 loops=1
          Buffers: shared hit=26101
Planning time: 0.065 ms
Execution time: 573.157 ms

Optimize 1, build index, LC  collate (B-Tree)

lc_collate (string)
Area used for sorting report text data. See section 23.1 for details. The value is determined when the database is created.

lc_collate (string)
Area used for sorting report text data. See section 23.1 for details. The value is determined when the database is created.

create index idx_c on goods(name collate "C");

Execution plan display time is less than 10ms:

explain (analyze,verbose,timing,costs,buffers) select * from goods where name like '123%' collate "C";

Optimize 2, build index, operator class varchar? Pattern? Ops Mode

create index idx_varchar on goods(name varchar_pattern_ops);

The execution plan shows that it takes less than 5ms:

explain (analyze,verbose,timing,costs,buffers) select * from goods where name like '123%' collate "C";

======================================
Bitmap Heap Scan on public.goods  (cost=86.60..7681.10 rows=1000 width=37) (actual time=0.740..4.628 rows=2364 loops=1)
  Output: id, name
  Filter: ((goods.name)::text ~~ '123%'::text)
  Heap Blocks: exact=2330
  Buffers: shared hit=2351
  ->  Bitmap Index Scan on idx_varchar  (cost=0.00..86.35 rows=2179 width=0) (actual time=0.487..0.487 rows=2364 loops=1)
        Index Cond: (((goods.name)::text ~>=~ '123'::text) AND ((goods.name)::text ~<~ '124'::text))
        Buffers: shared hit=21
Planning time: 0.139 ms
Execution time: 4.891 ms

PostgreSQL extension

The keyword ILIKE can replace LIKE, indicating that case is irrelevant. This is a PostgreSQL extension.

Reference resources

https://www.postgresql.org/docs/10/indexes-opclass.html
https://github.com/digoal/blog/blob/master/201704/20170426_01.md

Keywords: PHP PostgreSQL Database less github

Added by DaPrince on Thu, 31 Oct 2019 22:46:18 +0200