PostgreSQL best practices for automatically creating partitions

PostgreSQL best practices for automatically creating partitions

introduction

Partition table is a feature of PostgreSQL in version 10. In actual use, users often need to create partitions in advance or create partitions in real time according to the written data. This paper discusses several common automatic partition creation schemes.

scene

In practical use, the partition table generally takes the time field as the partition key. To simplify the problem, we assume that the partition field type is timestamp and the partition method is List of values

The structure of the table is as follows:

CREATE TABLE tab
(
    id   bigint GENERATED ALWAYS AS IDENTITY,
    ts   timestamp NOT NULL,
    data text
) PARTITION BY LIST ((ts::date));

CREATE TABLE tab_def PARTITION OF tab DEFAULT;

The creation of zones is generally divided into the following two scenarios:

  1. Create partitions regularly and in advance
  2. Create partitions on demand in real time

programme

Create partitions regularly and in advance

Creating partitions regularly and in advance generally requires only one scheduled task scheduling tool. The common ones are as follows:

  1. Use system scheduler, such as cronab (Linux, UNIX, etc.) and Task Scheduler (Windows);
  2. Use the database built-in scheduler, such as pg_cron, pg_timetable ;
  3. Use special partition management plug-ins, such as pg_partman ;

Using system scheduler

Take the Linux operating system as an example, create the partition table of the next day at 14 pm every day;

cat > /tmp/create_part.sh <<EOF
dateStr=\$(date -d '+1 days' +%Y%m%d);
psql -c "CREATE TABLE tab_\$dateStr (LIKE tab INCLUDING INDEXES); ALTER TABLE tab ATTACH PARTITION tab_\$dateStr FOR VALUES IN ('\$dateStr')";
EOF

(crontab -l 2>/dev/null; echo "0 14 * * * bash /tmp/create_part.sh ") | crontab -

Using the database built-in scheduler

In pg_cron, for example, creates the partition table of the next day at 14 pm every day;

CREATE OR REPLACE FUNCTION create_tab_part() RETURNS integer
    LANGUAGE plpgsql AS
$$
DECLARE
    dateStr varchar;
BEGIN
    SELECT to_char(DATE 'tomorrow', 'YYYYMMDD') INTO dateStr;
    EXECUTE
        format('CREATE TABLE tab_%s (LIKE tab INCLUDING INDEXES)', dateStr);
    EXECUTE
        format('ALTER TABLE tab ATTACH PARTITION tab_%s FOR VALUES IN (%L)', dateStr, dateStr);
    RETURN 1;
END;
$$;

CREATE EXTENSION pg_cron;

SELECT cron.schedule('0 14 * * *', $$SELECT create_tab_part();$$);

Using a dedicated partition management plug-in

In PG_ Take partman as an example, create the partition table of the next day in advance every day;

CREATE EXTENSION pg_partman;

SELECT partman.create_parent(p_parent_table => 'public.tab',
                             p_control => 'ts',
                             p_type => 'native',
                             p_interval=> 'daily',
                             p_premake => 1);

Create partitions on demand in real time

Sometimes, we need to create partitions according to the needs of data insertion, rather than pre create them in advance. According to the existence of the partition, it is convenient to judge whether there is data in the time interval. At this time, trigger is generally used to realize.

However, there are still two problems to be solved:

  1. BEFORE/FOR EACH ROW trigger for partition table is only provided in version 13 and above.
ERROR:  "tab" is a partitioned table
DETAIL:  Partitioned tables cannot have BEFORE / FOR EACH ROW triggers.
  1. When inserting data, the partition table definition cannot be modified due to locking the table, that is, the ATTACH sub table cannot be modified. Therefore, there must be another connection to do the ATTACH operation. Here we think of using the LISTEN/NOTIFY mechanism to notify another connection to modify the partition definition.
ERROR:  cannot CREATE TABLE .. PARTITION OF "tab"
        because it is being used by active queries in this session
 or
ERROR:  cannot ALTER TABLE "tab"
        because it is being used by active queries in this session

Trigger (implement sub table creation and NOTIFY)

CREATE FUNCTION part_trig() RETURNS trigger
    LANGUAGE plpgsql AS
$$
BEGIN
    BEGIN
        /* try to create a table for the new partition */
        EXECUTE
            format('CREATE TABLE %I (LIKE tab INCLUDING INDEXES)', 'tab_' || to_char(NEW.ts, 'YYYYMMDD'));

        /*
         * tell listener to attach the partition
         * (only if a new table was created)
         */
        EXECUTE
            format('NOTIFY tab, %L', to_char(NEW.ts, 'YYYYMMDD'));
    EXCEPTION
        WHEN duplicate_table THEN
            NULL; -- ignore
    END;

    /* insert into the new partition */
    EXECUTE
        format('INSERT INTO %I VALUES ($1.*)', 'tab_' || to_char(NEW.ts, 'YYYYMMDD'))
        USING NEW;

    /* skip insert into the partitioned table */
    RETURN NULL;
END;
$$;

CREATE TRIGGER part_trig
    BEFORE INSERT
    ON TAB
    FOR EACH ROW
    WHEN (pg_trigger_depth() < 1)
EXECUTE FUNCTION part_trig();

Code (implement LISTEN and sub table ATTACH)

#! /usr/bin/env python3.9
# encoding:utf8
import asyncio

import psycopg2
from psycopg2.extensions import ISOLATION_LEVEL_AUTOCOMMIT

conn = psycopg2.connect('application_name=listener')
conn.set_isolation_level(ISOLATION_LEVEL_AUTOCOMMIT)
cursor = conn.cursor()
cursor.execute(f'LISTEN tab;')


def attach_partition(table, date):
    with conn.cursor() as cs:
        cs.execute('ALTER TABLE "%s" ATTACH PARTITION "%s_%s" FOR VALUES IN (\'%s\')' % (table, table, date, date))


def handle_notify():
    conn.poll()
    for notify in conn.notifies:
        print(notify.payload)
        attach_partition(notify.channel, notify.payload)
    conn.notifies.clear()


loop = asyncio.get_event_loop()
loop.add_reader(conn, handle_notify)
loop.run_forever()

summary

This paper introduces several solutions for automatically creating partitions in two scenarios, each of which has its own advantages and disadvantages.

It is easy to understand when the plug-in is created or migrated in advance, but it is easy to understand the cost of several scenarios.

In the scenario of creating partitions on demand in real time, the number of unnecessary partitions can be reduced according to the law of actual data, but it also needs a higher version (> = 13) and additional connections, which is relatively complex.

We can choose the appropriate way to automatically create partitions according to our own business conditions.

Keywords: Database PostgreSQL

Added by dustinnoe on Sun, 13 Feb 2022 11:40:24 +0200