Recursive query of PostgreSQL tree structure

 

background

For hierarchical structures dealing with uncertain depth, such as organizations, a common design is to save ID and parent in a table_ ID, and construct a tree by self association. This method is very friendly to the process of writing data, but the query process becomes relatively complex. On the premise of not introducing MPTT model, a node and its child nodes must be queried by recursive algorithm.
The connect by extension syntax provided by Oracle is simple and easy to use. But other RDBMS are not so user-friendly (or I don't know). Recently, PostgreSQL was used in the project to query tree data and record it.

Construct sample data

drop table if exists demo.tree_data;
create table demo.tree_data (
    id integer,
    code text,
    pid integer,
    sort integer
);

insert into demo.tree_data values(1, 'China', null, 1);
insert into demo.tree_data values(2, 'Sichuan', 1, 1);
insert into demo.tree_data values(3, 'Yunnan', 1, 2);
insert into demo.tree_data values(4, 'Chengdu', 2, 1);
insert into demo.tree_data values(5, 'Mianyang', 2, 2);	
insert into demo.tree_data values(6, 'Wuhou District', 4, 1);
insert into demo.tree_data values(7, 'Kunming', 3, 1);	

Table structure

connectby function

If the tablefunc extension is installed, you can use the PG version of the connectby function. This is not as powerful as Oracle, but it can meet the basic requirements.

-- API as follows
connectby(text relname,             -- Table name
  text keyid_fld,           -- id field
  text parent_keyid_fld     -- father id field    
  [, text orderby_fld ],    -- sort field
  text start_with,          -- Start line id value
  int max_depth             -- Tree depth, 0 means infinite
  [, text branch_delim ])   -- path separator 
-- The basic usage is as follows, which must be passed AS Clause defines the name and type of the returned field
select * 
    from connectby('demo.tree_data', 'id', 'pid', 'sort', '1', 0, '~')
    as (id int, pid int, lvl int, branch text, sort int);
     
-- Query results
id | pid | lvl | branch | sort
----+-----+-----+---------+------
 1 | | 0 | 1 | 1
 2 | 1 | 1 | 1~2 | 2
 4 | 2 | 2 | 1~2~4 | 3
 6 | 4 | 3 | 1~2~4~6 | 4
 5 | 2 | 2 | 1~2~5 | 5
 3 | 1 | 1 | 1~3 | 6
 7 | 3 | 2 | 1~3~7 | 7
(7 rows)
-- Using only the basic usage, you can only query id If you want to query the relevant information of code And other fields, you need to pass additional join Operation.
select
    t.id, n.code, t.pid, p.code as pcode, lvl, branch
from (
    select * from connectby('demo.tree_data', 'id', 'pid', 'sort', '1', 0, '~')
        as (id int, pid int, lvl int, branch text, sort int)
) as t
    left join demo.tree_data as n on (t.id = n.id)
    left join demo.tree_data as p on (t.pid = p.id)
order by t.sort ;   
 
 id | code | pid | pcode | lvl | branch
----+--------+-----+-------+-----+---------
 1 | China | | | 0 | 1
 2 | Sichuan | 1 | China | 1 | 1~2
 4 | Chengdu | 2 | Sichuan | 2 | 1~2~4
 6 | Wuhou District | 4 | Chengdu | 3 | 1~2~4~6
 5 | Mianyang | 2 | Sichuan | 2 | 1~2~5
 3 | Yunnan | 1 | China | 1 | 1~3
 7 | Kunming | 3 | Yunnan | 2 | 1~3~7
(7 rows)

Although the code of the node can be queried through the join, the branch part cannot be directly converted into the corresponding code, which is still inconvenient to use.

CTE syntax

The recursive query of tree data is realized by using CTE syntax and with recursive. Although this method is not as direct as connectby, it has better flexibility and display effect.

-- 
with recursive cte as
(
 -- Query first root node 
 select
 id, code, pid, '' as pcode,
 code as branch
 from demo.tree_data where id = 1
 union all
 -- adopt cte recursive query  root Direct child node of node 
 select
 origin.id, origin.code, cte.id as pid, cte.code as pcode,
 cte.branch || '~' || origin.code
 from cte
 join demo.tree_data as origin on origin.pid = cte.id
)
select
 id,code, pid, pcode, branch, 
 -- By calculating the number of separators, the depth of the tree is simulated
 (length(branch)-length(replace(branch, '~', ''))) as lvl
from cte;
 
-- 
 id | code | pid | pcode | branch  | lvl
----+--------+-----+-------+-----------------------+-----
 1 | China | | | China   | 0
 2 | Sichuan | 1 | China | China~Sichuan  | 1
 3 | Yunnan | 1 | China | China~Yunnan  | 1
 4 | Chengdu | 2 | Sichuan | China~Sichuan~Chengdu | 2
 5 | Mianyang | 2 | Sichuan | China~Sichuan~Mianyang | 2
 7 | Kunming | 3 | Yunnan | China~Yunnan~Kunming | 2
 6 | Wuhou District | 4 | Chengdu | China~Sichuan~Chengdu~Wuhou District | 3
(7 rows)

Query a node and its child nodes

with RECURSIVE r as 
( select t1.* from tree_data t1 where t1.id = 1 
union all 
select t2.* from tree_data t2 inner join r  on r.id = t2.pid) 
select * from r order by id 

Query results

Query a node and parent node

with RECURSIVE r as
( select t1.* from tree_data t1 where t1.id= 7 
union all 
select t2.* FROM tree_data t2, r WHERE t2.id = r.pid ) 
select * from r order by id 

Query results

Description of execution process

As can be seen from the above example, the WITH RECURSIVE statement contains two parts

  • Non recursive term (non recursive part), that is, the front part of union all in the above example
  • recursive term, that is, the part after union all in the above example

Perform the following steps

  • Execute non recursive term. (if you use union instead of union all, you need to de duplicate the result) the result is used as a reference to the result in the recursive term, and this part of the result is placed in the temporary working table
  • Repeat the following steps until the working table is empty: replace the recursive self reference with the content of the working table, execute the recursive term (remove the duplicate data if using union instead of union all), and replace the working table with the result (the de duplication result if using union instead of union all)

Take the query above as an example to see the specific process

  1. Execute non recursive query
-- step 1 implement
 select
 id, code, pid, '' as pcode,
 code as branch
 from demo.tree_data where id = 1
  
-- Result set and working table by
 id | code | pid | pcode | branch
----+------+-----+-------+--------
 1 | China | | | China
  1. Execute recursive query
-- step 2 Recursion is performed, and self reference is performed at this time cte The data in is step 1 Results
 select
 origin.id, origin.code, cte.id as pid, cte.code as pcode,
 cte.branch || '~' || origin.code
 from cte
 join demo.tree_data as origin on origin.pid = cte.id
  
 -- Result set and working table by
 id | code | pid | pcode | branch 
----+--------+-----+-------+---------------------
 2 | Sichuan | 1 | China | China~Sichuan  
 3 | Yunnan | 1 | China | China~Yunnan
  1. Continue to execute recursive query until the result set and working table are empty

  2. End recursion and merge the result sets of the first three steps to obtain the final result set of WITH RECURSIVE.

    Strictly speaking, the implementation of this process is an iterative process rather than RECURSIVE, but the keyword RECURSIVE is set by the SQL Standards Committee, so PostgreSQL also uses the keyword RECURSIVE.

Original link: https://juejin.im/post/5cdac101e51d453d022cb666

Keywords: PostgreSQL

Added by ozzy on Fri, 21 Jan 2022 18:32:53 +0200