background
In our daily development, one kind of data we will come into contact with is hierarchical data. What is tiered data? Business organization chart, content management category, RBAC permission management, product category, etc. These are hierarchical data. The following is the product category hierarchy of an e-store:
In this paper, we will study two models for processing hierarchical data in MySQL, starting with the traditional adjacency table model.
Adjacency table model
Usually, the sample categories shown above will be stored in the following table (I write down the CREATE and INSERT statements, and you can run them):
CREATE TABLE category( category_id INT AUTO_INCREMENT PRIMARY KEY, name VARCHAR(20) NOT NULL, parent INT DEFAULT NULL ); INSERT INTO category VALUES(1,'ELECTRONICS',NULL),(2,'TELEVISIONS',1),(3,'TUBE',2), (4,'LCD',2),(5,'PLASMA',2),(6,'PORTABLE ELECTRONICS',1),(7,'MP3 PLAYERS',6),(8,'FLASH',7), (9,'CD PLAYERS',6),(10,'2 WAY RADIOS',6); SELECT * FROM category ORDER BY category_id; +-------------+----------------------+--------+ | category_id | name | parent | +-------------+----------------------+--------+ | 1 | ELECTRONICS | NULL | | 2 | TELEVISIONS | 1 | | 3 | TUBE | 2 | | 4 | LCD | 2 | | 5 | PLASMA | 2 | | 6 | PORTABLE ELECTRONICS | 1 | | 7 | MP3 PLAYERS | 6 | | 8 | FLASH | 7 | | 9 | CD PLAYERS | 6 | | 10 | 2 WAY RADIOS | 6 | +-------------+----------------------+--------+ 10 rows in set (0.00 sec)
In the adjacency table model, each item in the table contains a pointer to its parent. The top element, in this case, is ELECTRONICS, and its parent element is NULL. The advantage of adjacency table model is that it is relatively simple. It is easy to see that FLASH is a child of MP3 PLAYERS, a child of PORTABLE ELECTRONICS and a child of ELECTRONICS. The disadvantages are also obvious. We need to check all superiors or all subordinates of a node. We need to query recursively. In a business scenario, if the number of layers increases to a large number and leaf nodes also increase, our query will become very slow. We can also optimize this, that is, store the path between each leaf node. However, this method will increase the amount of data storage. So what is the way to store our entire number in a relational database\
Nested set model
In the nested set model, we can look at our hierarchy in a new way, not as nodes and lines, but as nested containers to try to describe our electronic product category in this way:
Note that our hierarchy remains because the parent classes surround their child classes. We use the left and right values to represent the nesting of nodes and represent this form of hierarchy in the table:
CREATE TABLE nested_category ( category_id INT AUTO_INCREMENT PRIMARY KEY, name VARCHAR(20) NOT NULL, lft INT NOT NULL, rgt INT NOT NULL ); INSERT INTO nested_category VALUES(1,'ELECTRONICS',1,20),(2,'TELEVISIONS',2,9),(3,'TUBE',3,4), (4,'LCD',5,6),(5,'PLASMA',7,8),(6,'PORTABLE ELECTRONICS',10,19),(7,'MP3 PLAYERS',11,14),(8,'FLASH',12,13), (9,'CD PLAYERS',15,16),(10,'2 WAY RADIOS',17,18); SELECT * FROM nested_category ORDER BY category_id; +-------------+----------------------+-----+-----+ | category_id | name | lft | rgt | +-------------+----------------------+-----+-----+ | 1 | ELECTRONICS | 1 | 20 | | 2 | TELEVISIONS | 2 | 9 | | 3 | TUBE | 3 | 4 | | 4 | LCD | 5 | 6 | | 5 | PLASMA | 7 | 8 | | 6 | PORTABLE ELECTRONICS | 10 | 19 | | 7 | MP3 PLAYERS | 11 | 14 | | 8 | FLASH | 12 | 13 | | 9 | CD PLAYERS | 15 | 16 | | 10 | 2 WAY RADIOS | 17 | 18 | +-------------+----------------------+-----+-----+
So how do we determine the left and right values? We start numbering from the far left of the outer node and continue to the right:
This design can also be applied to typical trees:
When using the tree, we go from left to right, one layer at a time, and drop to the child nodes of each node before assigning the right-hand number and moving to the right. This method is called preorder tree traversal algorithm.
Retrieve the complete tree
We can retrieve the complete tree by using self connection, which connects the parent node with the node, and the lft value based on the node will always appear between the lft and rgt values of its parent node:
SELECT node.name FROM nested_category AS node, nested_category AS parent WHERE node.lft BETWEEN parent.lft AND parent.rgt AND parent.name = 'ELECTRONICS' ORDER BY node.lft; +----------------------+ | name | +----------------------+ | ELECTRONICS | | TELEVISIONS | | TUBE | | LCD | | PLASMA | | PORTABLE ELECTRONICS | | MP3 PLAYERS | | FLASH | | CD PLAYERS | | 2 WAY RADIOS | +----------------------+
Unlike our previous example using the adjacency list model, this query will work regardless of the depth of the tree. We don't care about the rgt value of the node in the BETWEEN clause, because the rgt value will always be in the same parent node as the lft value.