Postgresql gossip 05 - JSON and JSONB types in Postgresql

1, JSON and JSONB

Compared with other relational databases, Postgresql has a great advantage in that it has built-in many special data types, such as inet and MAC. The json and json types introduced in this paper are also two unique data types of Postgresql. In fact, their essence is to store data in jon format, However, the difference is that the json format is stored according to the original json string, including repeated keys, spaces and sequences in the string; The jsonb format parses the original json string, removes duplicate keys and spaces, and does not guarantee the order of keys. It will be converted to binary when stored. Therefore, json data is stored faster, but read slower; Jsonb data is stored slowly, but read faster.

In addition, json and jsonb have some other differences:

(1) Most of the operations and operators used by the two are the same, but jsonb will have some specific operators, such as "=" to judge whether the two jsonb objects are equal, "@ >" to judge whether the operands on the left contain the operands on the right.

(2) We can directly create an index on a field of json type, but we can only create a functional index on a field of json type, not a direct index.

Next, the author verifies these contents. First, let's look at the two data tables used by the writer:

stock_analysis_data=# \d+ custmer
                                                       Table "public.custmer"
 Column |         Type          | Collation | Nullable |               Default               | Storage  | Stats target | Description 
--------+-----------------------+-----------+----------+-------------------------------------+----------+--------------+-------------
 id     | integer               |           | not null | nextval('custmer_id_seq'::regclass) | plain    |              | 
 name   | character varying(40) |           |          |                                     | extended |              | 
 tag    | json  
  • custmer table -- tag with a json type field
stock_analysis_data=# \d+ account
                                                       Table "public.account"
 Column |         Type          | Collation | Nullable |               Default               | Storage  | Stats target | Description 
--------+-----------------------+-----------+----------+-------------------------------------+----------+--------------+-------------
 id     | integer               |           | not null | nextval('account_id_seq'::regclass) | plain    |              | 
 name   | character varying(40) |           |          |                                     | extended |              | 
 tag    | jsonb                 |           |          |                                     | extended |              | 
  • account table - has a field tag of type jsonb

2, Insert JSON data

  • Insert json standard format data

We first insert a piece of data in standard json format into the custmer table to verify the insertion effect of json type data.

stock_analysis_data=# Insert into customer (name, tag) values ('Tom ',' {"age": 17, "job": "development"} '); 
INSERT 0 1

Next, let's view the data in the entire custmer table:

stock_analysis_data=# select * from custmer;
 id | name |           tag           
----+------+-------------------------
  1 | tom  | {"age":17,"job":"development"}
(1 row)

Then, insert a piece of data into the account.

stock_analysis_data=# insert into account (name,tag) values ('tom','{"age":17,"job": "development"} '); INSERT 0 1

To view the inserted data:

stock_analysis_data=# insert into account (name,tag) values ('tom','{"age":17,"job": "development"} ');
INSERT 0 1

We found that the tag field of the data queried from the account table is also a standard json format. In fact, the main difference between json and jsonb lies in the stored data structure, and jsonb is stored in binary mode. However, for our client-side query, the queried data in json B format will eventually be presented in json standard format. The similarities and differences of their storage structures are insensitive to us.

  • Repeat key, space, and sequence processing during insertion

Next, we verify the processing of repeated keys, spaces and sequences in json data in json format and json B format. We insert data containing spaces and repeating keys into the custmer table.

stock_analysis_data=# Insert into customer (name, tag) values ('Tom ',' {"age": 17, "J o b": "development", "age":19} ');
INSERT 0 1
stock_analysis_data=# select * from custmer;
 id | name |               tag                
----+------+----------------------------------
  3 | tom  | {"age":17,"j o b":"development","age":19}
(1 row)

From the data queried later, we can see that the json format data inserted by the author has duplicate keys age, and there are spaces in the keys and values of the job field, but the json format fields do not process the duplicate keys and spaces, and they are stored as they are.

Next, the author inserts the same data into the account table and queries:

stock_analysis_data=# insert into account (name,tag) values ('tom','{"age":17,"j o b": "development", "age":19} ');
INSERT 0 1
stock_analysis_data=# select * from account;
 id | name |              tag               
----+------+--------------------------------
  2 | tom  | {"age": 19, "j o b": "development"}
(1 row)

Careful friends have found why the spaces in json key value pairs are not removed? Didn't jsob remove the spaces? In fact, the de whitespace in isonb does not refer to the de whitespace inside key value pairs, but to remove the whitespace between key value pairs. For example, in the following example, the author added several spaces between the first age and job. After checking it out, it was found that it had disappeared.

stock_analysis_data=# Insert into account (name, tag) values ('Tom ',' {"age": 17, "J o b": "development", "age":19} ');
INSERT 0 1
stock_analysis_data=# select * from account;
 id | name |              tag               
----+------+--------------------------------
  3 | tom  | {"age": 19, "j o b": "development"}

Moreover, we also found that when jsob is de duplicated, the last duplicate key value pair is retained by default, and the order of key value pairs in json is sorted.

  • Insert malformed json data

json and json B store json strings. Compared with directly using text and other string types to store json strings, the biggest feature is that the former will verify the json format, while the latter will not. For example, in the following example, the author directly inserts a json string in wrong format (there is no comma between the latter two key value pairs) into the custmer table, Final insert failed:

stock_analysis_data=# insert into account (name,tag) values ('tom','{"age":17,"job": "development" age: 19} ');
ERROR:  invalid input syntax for type json
LINE 1: insert into account (name,tag) values ('tom','{"age":17,"job...
                                                     ^
DETAIL:  Expected "," or "}", but found ""age"".
CONTEXT:  JSON data, line 1: {"age":17,"job":"development""age"..

3, Common JSON/JSONB operators or functions

Postgresql provides a series of json/jsonb related operators or functions that enable us to easily operate json/jsonb. Next, the author introduces some common operators or functions.

  • ->Operator

Function: fetch the value in json/jsonb through the key.

stock_analysis_data=# select tag->'age' from account where name='tom';
 ?column? 
----------
 19
  • ->>Operator

Function: retrieve the value in json/jsonb through the key, but the returned result will be automatically converted to text type.

stock_analysis_data=# select tag->>'age' from account where name='tom';
 ?column? 
----------
 19
  • #>Operator

Function: for nested json data, it can be obtained according to the path.

Before verifying the use of this operator, we need to insert a nested json data into the database. The inserted data is as follows:

stock_analysis_data=# select * from account where name='jerry';
 id | name  |                  tag                   
----+-------+----------------------------------------
  4 | jerry | {"info": {"age": 18, "job": "architect "}}
(1 row)

What we need to do now is to use the #> operator to obtain the value corresponding to age:

stock_analysis_data=# select tag#>'{info,age}' from account where name='jerry';
 ?column? 
----------
 18
(1 row)
  • =Operator

Function: it can only be used for jsonb to judge whether the data of two jsonb are equal.

stock_analysis_data=# select * from account where tag='{"age": 19, "j o b": "development"}':: jsonb;
 id | name |              tag               
----+------+--------------------------------
  3 | tom  | {"age": 19, "j o b": "development"}

It should be noted that the jsonb data is order sensitive. The order of key value pairs is different, and the data of two jsonb are different.

  • @>Operator and < @ operator

Function: it can only be used for jsonb. The function of the @ > operator is to judge whether the jsonb object of the left operand contains the jsonb object of the right operand, while the function of < @ is the opposite.

stock_analysis_data=# select * from account where tag@>'{"age": 19}'::jsonb;
 id | name |              tag               
----+------+--------------------------------
  3 | tom  | {"age": 19, "j o b": "development"}
  • to_json function

Function: convert other types of objects into json objects

Before verifying its use, we first create a custom TYPE:

stock_analysis_data=# create type info as(age integer,job text);
CREATE TYPE

Use to_json adds info type data to the tag field in the account table and automatically converts it to jsonb format:

stock_analysis_data=# insert into account (name,tag) values ('tom',to_json('(19, "development") ':: info));
INSERT 0 1
stock_analysis_data=# insert into account (name,tag) values ('jerry',to_json('(35, "architect") ':: info));
INSERT 0 1
stock_analysis_data=# select * from account;
 id | name  |             tag              
----+-------+------------------------------
  5 | tom   | {"age": 19, "job": "development"}
  6 | jerry | {"age": 35, "job": "architect "}
(2 rows)

4, Modification of JSON/JSONB data

To modify the json/jsonb field, you can use the | operator:

stock_analysis_data=# update account set tag=tag||'{"age":20}' where name='tom';
UPDATE 1

In addition, if the data is in jsonb format, you can use jsonb_set method:

stock_analysis_data=# update account set tag=jsonb_set(tag,'{age}','28')  where name='tom';
UPDATE 1
stock_analysis_data=# select * from account where name='tom';
 id | name |            tag             
----+------+----------------------------
  5 | tom  | {"age": 28, "job": "development"}
(1 row)

The function prototype of jsonb is as follows:

jsonb_set(target jsonb, path text[], new_value jsonb[, create_missing boolean])
  • Target -- field in target jsonb format
  • Path -- the path of the target value in json format
  • new_value -- the new value. Note that if the value type of the target is a number, a single quotation mark should also be added here.

5, Deletion of JSON/JSONB data

To delete the key value pair in json/jsonb, you can use the '-' operator:

stock_analysis_data=# update account set tag=tag-'age'  where name='tom';
UPDATE 1
stock_analysis_data=# select * from account where name='tom';
 id | name |       tag       
----+------+-----------------
  5 | tom  | {"job": "development"}
(1 row)

6, Adding indexes to JSON/JSONB

As mentioned above, JSON types do not provide relevant comparison functions, so indexes cannot be created directly. Only functional indexes can be used. However, the JSONB type can be indexed directly on it, either BTree index or GIN index. However, the BTree index does not care about the internal data of JSON, but simply arranges the values of JSONB in order, which is basically not used in actual scenarios. The index created in the JSONB type we usually say refers to the GIN index.

  • Adding a functional index to a JSON field

Next, the author first demonstrates the method of creating a functional index in the custmer table. Before that, we need to make some preparations and insert 200W data into the table.

stock_analysis_data=# select count(*) from custmer;
  count  
---------
 2097152
(1 row)

Among the 200W pieces of data, the tag field of 10 pieces of data is age=21. Before inserting the index, execute the query to view the time consumption:

stock_analysis_data=# explain (analyze,verbose) select * from custmer where tag->>'age'='21';
                                                             QUERY PLAN                                                    
          
---------------------------------------------------------------------------------------------------------------------------
----------
 Gather  (cost=1000.00..34755.80 rows=10486 width=47) (actual time=0.338..1683.812 rows=10 loops=1)
   Output: id, name, tag
   Workers Planned: 2
   Workers Launched: 2
   ->  Parallel Seq Scan on public.custmer  (cost=0.00..32707.20 rows=4369 width=47) (actual time=1098.140..1657.965 rows=3
 loops=3)
         Output: id, name, tag
         Filter: ((custmer.tag ->> 'age'::text) = '21'::text)
         Rows Removed by Filter: 699047
         Worker 0: actual time=1650.803..1650.816 rows=7 loops=1
         Worker 1: actual time=1643.537..1643.538 rows=0 loops=1
 Planning Time: 0.049 ms
 Execution Time: 1683.847 ms
(12 rows)

Time: 1684.334 ms (00:01.684) 

We find that the whole query takes 1683 MS without index. Now, let's use JSON_ extract_ path_ Textbuild a function index.

stock_analysis_data=# create index myfuncinx on custmer using btree(json_extract_path_text(tag,'age'));
CREATE INDEX
Time: 3879.206 ms (00:03.879)

Reuse JSON_ extract_ path_ Textfunction execution query:

stock_analysis_data=# explain (analyze,verbose) select * from custmer where json_extract_path_text(tag,'age')='21';
                                                         QUERY PLAN                                                        
 
---------------------------------------------------------------------------------------------------------------------------
-
 Bitmap Heap Scan on public.custmer  (cost=197.69..17322.43 rows=10486 width=47) (actual time=0.082..0.094 rows=10 loops=1)
   Output: id, name, tag
   Recheck Cond: (json_extract_path_text(custmer.tag, VARIADIC '{age}'::text[]) = '21'::text)
   Heap Blocks: exact=2
   ->  Bitmap Index Scan on myfuncinx  (cost=0.00..195.07 rows=10486 width=0) (actual time=0.064..0.065 rows=10 loops=1)
         Index Cond: (json_extract_path_text(custmer.tag, VARIADIC '{age}'::text[]) = '21'::text)
 Planning Time: 2.450 ms
 Execution Time: 0.147 ms
(8 rows)

Time: 3.063 ms

It can be seen that this query takes the index, and the total time is 3ms, which is 1 / 500 of the original.

  • Create GIN index for JSONB

Next, the author demonstrates the method of creating GIN index on JSONB field and compares the query efficiency before and after index creation. Before that, we need to insert 200W test data into the account table.

stock_analysis_data=# select  count(*) from account;
  count  
---------
 2097152
(1 row)

Among the 200W pieces of data, there are 6 pieces of data in the tag field, and the job is "project manager". We find these 6 pieces of data.

First, find without adding an index:

stock_analysis_data=# Explain (analyze, verbose) select * from account where tag @ > '{job ":" project manager "}';
                                                            QUERY PLAN                                                     
       
---------------------------------------------------------------------------------------------------------------------------
-------
 Gather  (cost=1000.00..30692.37 rows=2097 width=43) (actual time=0.336..565.635 rows=6 loops=1)
   Output: id, name, tag
   Workers Planned: 2
   Workers Launched: 2
   ->  Parallel Seq Scan on public.account  (cost=0.00..29482.67 rows=874 width=43) (actual time=354.463..542.827 rows=2 lo
ops=3)
         Output: id, name, tag
         Filter: (account.tag @> '{"job": "project manager"}'::jsonb)
         Rows Removed by Filter: 699049
         Worker 0: actual time=525.154..525.161 rows=4 loops=1
         Worker 1: actual time=538.161..538.162 rows=0 loops=1
 Planning Time: 0.062 ms
 Execution Time: 565.678 ms
(12 rows)

Time: 566.094 ms

It was found that the whole search process took 566ms. Next, we create a GIN index on the tag. There are two types of GIN indexes for jsonb:

(1) Use the default jsonb_ops operator creation.

(2) Using jsonb_path_ops operator creation.

The difference between the two is that each key and value in the jsonb data of the former is regarded as a separate index item, while the latter makes only one index item for each key value pair. For example, for the json of {"job": "project manager"}, jsonb_ops creates two indexes for "job" and "project manager", and jsonb_path_ops will only combine "job" and "project manager" into a Hash value as an index item. Because there are many fewer index entries, jsonb_path_ops index will be better than jsonb_ The OPS index is small and the performance is improved.

Next, we create two GIN indexes and demonstrate the query. First, create the default jsonb_ops index:

stock_analysis_data=# create index myjsongininx on account using gin(tag);
CREATE INDEX
Time: 2018.663 ms (00:02.019

stock_analysis_data=# Explain (analyze, verbose) select * from account where tag @ > '{job ":" project manager "}';
                                                        QUERY PLAN                                                         
---------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on public.account  (cost=1140.25..7158.99 rows=2097 width=43) (actual time=0.118..0.131 rows=6 loops=1)
   Output: id, name, tag
   Recheck Cond: (account.tag @> '{"job": "project manager"}'::jsonb)
   Heap Blocks: exact=3
   ->  Bitmap Index Scan on myjsongininx  (cost=0.00..1139.73 rows=2097 width=0) (actual time=0.101..0.102 rows=6 loops=1)
         Index Cond: (account.tag @> '{"job": "project manager"}'::jsonb)
 Planning Time: 0.089 ms
 Execution Time: 0.171 ms
(8 rows)

Time: 0.723 ms

You can see that the query performance has been greatly improved. Then use jsonb_path_ops creates a GIN index (first delete the previous index) and queries it.

stock_analysis_data=# create index myjsongininx2 on account using gin(tag jsonb_path_ops);
CREATE INDEX
Time: 1097.963 ms (00:01.098)
stock_analysis_data=# Explain (analyze, verbose) select * from account where tag @ > '{job ":" project manager "}';
                                                        QUERY PLAN                                                         
---------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on public.account  (cost=440.25..6458.99 rows=2097 width=43) (actual time=0.027..0.039 rows=6 loops=1)
   Output: id, name, tag
   Recheck Cond: (account.tag @> '{"job": "project manager"}'::jsonb)
   Heap Blocks: exact=3
   ->  Bitmap Index Scan on myjsongininx2  (cost=0.00..439.73 rows=2097 width=0) (actual time=0.013..0.013 rows=6 loops=1)
         Index Cond: (account.tag @> '{"job": "project manager"}'::jsonb)
 Planning Time: 0.081 ms
 Execution Time: 0.072 ms
(8 rows)
Time: 0.639 ms

        jsonb_path_ops index also greatly improves the query efficiency, and its query time is even more time-consuming than jsonb_ The OPS index should be smaller.

Keywords: Database JSON PostgreSQL

Added by The Chancer on Tue, 18 Jan 2022 07:42:43 +0200