Reflections on an index failure case in PostgreSQL

A strange index failure occurred some time ago. The actual situation is similar to the following:

bill=# begin;
BEGIN
bill=*# create index idx_t1 on t1(id);
CREATE INDEX
bill=*# explain select * from t1 where id = 1;
                     QUERY PLAN
----------------------------------------------------
 Seq Scan on t1  (cost=0.00..25.88 rows=6 width=36)
   Filter: (id = 1)
(2 rows)

bill=*# end;
COMMIT
bill=# explain select * from t1 where id = 1;
                             QUERY PLAN
---------------------------------------------------------------------
 Bitmap Heap Scan on t1  (cost=1.50..7.01 rows=6 width=36)
   Recheck Cond: (id = 1)
   ->  Bitmap Index Scan on idx_t1  (cost=0.00..1.50 rows=6 width=0)
         Index Cond: (id = 1)
(4 rows)

The obvious problem is that I created an index in the transaction, but I can't use it. But when a transaction is committed, it can be used normally. What is the situation?

This is actually PG_ The indcheckxmin attribute in index is related. The explanation of this field is as follows:
If true, queries must not use the index until the xmin of this pg_index row is below their TransactionXmin event horizon, because the table may contain broken HOT chains with incompatible rows that they can see

This is true after inspection:

bill=*# select indcheckxmin from pg_index where indexrelid = 'idx_t1'::regclass;
 indcheckxmin
--------------
 t
(1 row)

So the question is, under what circumstances will the attribute of the index be set to true when creating an index?

From the explanation of this field in the previous official document, if the table contains broken HOT chains, it will be true. What is broken HOT chains? It seems to be related to the HOT mechanism. Is it true only when broken HOT chains exist?

We won't sell the key here. We'll give a conclusion directly, and then we'll verify it one by one.

After testing, it is found that the following two conditions will cause the indcheckxmin of the index to be set to true:

  1. There are broken HOT chains on the table in the current transaction, which is stated in the official document;
  2. When old_ snapshot_ When threshold is set.

Scenario 1: broken HOT chains

This situation exists as long as there are HOT updated rows in the table in the current transaction. So when will the HOT update take place? Two premises:
The new tuple and the old tuple must be in the same page;
Index fields cannot be updated.

In this case, there are two common situations in practice:

  • Update the last page on the table;
  • fillfactor is set in the table, that is, there is reserved free space on each page.

example:
Insert 10 pieces of data into the table, and naturally there is only one page:

bill=# insert into t1 select generate_series(1,10),md5(random()::text);
INSERT 0 10

Update:

bill=# update t1 set info = 'bill' where id = 10;
UPDATE 1

Check and find that it is indeed a HOT update:
About t_ The explanation of infomask2 field will not be repeated here.

Next, we create an index:
It can be found that indcheckxmin is set to true, and the index is not available in the current transaction.

After verification, in index_ In the build phase, if the BrokenHotChain is judged, the indcheckxmin is modified to true.

The specific modification code is as follows:

/*Indexinfo - > II_ Brokenhotchain has been modified to true */	
if ((indexInfo->ii_BrokenHotChain || EarlyPruningEnabled(heapRelation)) &&
		!isreindex &&
		!indexInfo->ii_Concurrent)
	{
		Oid			indexId = RelationGetRelid(indexRelation);
		Relation	pg_index;
		HeapTuple	indexTuple;
		Form_pg_index indexForm;

		pg_index = table_open(IndexRelationId, RowExclusiveLock);

		indexTuple = SearchSysCacheCopy1(INDEXRELID,
										 ObjectIdGetDatum(indexId));
		if (!HeapTupleIsValid(indexTuple))
			elog(ERROR, "cache lookup failed for index %u", indexId);
		indexForm = (Form_pg_index) GETSTRUCT(indexTuple);

		/* If it's a new index, indcheckxmin shouldn't be set ... */
		Assert(!indexForm->indcheckxmin);

/*Modify indcheckxmin to true */
		indexForm->indcheckxmin = true;
		CatalogTupleUpdate(pg_index, &indexTuple->t_self, indexTuple);

		heap_freetuple(indexTuple);
		table_close(pg_index, RowExclusiveLock);
	}

Similarly, we can verify that indcheckxmin is set to true because of broken hotwins.

Scenario 2: old_snapshot_threshold

Let's start with an example:
In the simplest scenario, for a completely empty table, create an index in the transaction, and indcheckxmin will be set to true. Sure enough, the index is also unavailable.

bill=# drop table t1;
DROP TABLE
bill=# create table t1(id int,info text);
CREATE TABLE
bill=# begin;
BEGIN
bill=*# create index idx_t1 on t1(id);
CREATE INDEX
bill=*# select indcheckxmin from pg_index where indexrelid = 'idx_t1'::regclass;
 indcheckxmin
--------------
 t
(1 row)

bill=*# explain select * from t1 where id = 1;
                     QUERY PLAN
----------------------------------------------------
 Seq Scan on t1  (cost=0.00..25.88 rows=6 width=36)
   Filter: (id = 1)
(2 rows)

So why old_ snapshot_ Will threshold have such an impact?

After tracking, it is found that when this parameter is enabled, the snapshot data structure of creating an index in the transaction is as follows:

(SnapshotData) $6 = {
  snapshot_type = SNAPSHOT_MVCC
  xmin = 856
  xmax = 856
  xip = 0x00007fd55c804fc0
  xcnt = 0
  subxip = 0x00007fd55ad5d000
  subxcnt = 0
  suboverflowed = false
  takenDuringRecovery = false
  copied = false
  curcid = 1
  speculativeToken = 0
  vistest = NULL
  active_count = 0
  regd_count = 0
  ph_node = {
    first_child = NULL
    next_sibling = NULL
    prev_or_parent = NULL
  }
  whenTaken = 691752041261069
  lsn = 208079736
}

And disable this parameter?

(SnapshotData) $7 = {
  snapshot_type = SNAPSHOT_MVCC
  xmin = 828
  xmax = 828
  xip = 0x00007fad31704780
  xcnt = 0
  subxip = 0x00007fad3155d000
  subxcnt = 0
  suboverflowed = false
  takenDuringRecovery = false
  copied = false
  curcid = 1
  speculativeToken = 0
  active_count = 0
  regd_count = 0
  ph_node = {
    first_child = NULL
    next_sibling = NULL
    prev_or_parent = NULL
  }
  whenTaken = 0
  lsn = 0
}

You can see that the difference is that when you do not use this parameter, when you create a snapshot data, you will not set whenTaken and lsn. What are these two parameters?

Let's take a look at the structure of snapshot data:

typedef struct SnapshotData
{
    SnapshotType snapshot_type; /* type of snapshot */

    /*
     * The remaining fields are used only for MVCC snapshots, and are normally
     * just zeroes in special snapshots.  (But xmin and xmax are used
     * specially by HeapTupleSatisfiesDirty, and xmin is used specially by
     * HeapTupleSatisfiesNonVacuumable.)
     *
     * An MVCC snapshot can never see the effects of XIDs >= xmax. It can see
     * the effects of all older XIDs except those listed in the snapshot. xmin
     * is stored as an optimization to avoid needing to search the XID arrays
     * for most tuples.
     */
    TransactionId xmin;         /* all XID < xmin are visible to me */
    TransactionId xmax;         /* all XID >= xmax are invisible to me */

    /*
     * For normal MVCC snapshot this contains the all xact IDs that are in
     * progress, unless the snapshot was taken during recovery in which case
     * it's empty. For historic MVCC snapshots, the meaning is inverted, i.e.
     * it contains *committed* transactions between xmin and xmax.
     *
     * note: all ids in xip[] satisfy xmin <= xip[i] < xmax
     */
    TransactionId *xip;
    uint32      xcnt;           /* # of xact ids in xip[] */

    /*
     * For non-historic MVCC snapshots, this contains subxact IDs that are in
     * progress (and other transactions that are in progress if taken during
     * recovery). For historic snapshot it contains *all* xids assigned to the
     * replayed transaction, including the toplevel xid.
     *
     * note: all ids in subxip[] are >= xmin, but we don't bother filtering
     * out any that are >= xmax
     */
    TransactionId *subxip;
    int32       subxcnt;        /* # of xact ids in subxip[] */
    bool        suboverflowed;  /* has the subxip array overflowed? */

    bool        takenDuringRecovery;    /* recovery-shaped snapshot? */
    bool        copied;         /* false if it's a static snapshot */

    CommandId   curcid;         /* in my xact, CID < curcid are visible */

    /*
     * An extra return value for HeapTupleSatisfiesDirty, not used in MVCC
     * snapshots.
     */
    uint32      speculativeToken;

    /*
     * For SNAPSHOT_NON_VACUUMABLE (and hopefully more in the future) this is
     * used to determine whether row could be vacuumed.
     */
    struct GlobalVisState *vistest;

    /*
     * Book-keeping information, used by the snapshot manager
     */
    uint32      active_count;   /* refcount on ActiveSnapshot stack */
    uint32      regd_count;     /* refcount on RegisteredSnapshots */
    pairingheap_node ph_node;   /* link in the RegisteredSnapshots heap */

    TimestampTz whenTaken;      /* timestamp when snapshot was taken */
    XLogRecPtr  lsn;            /* position in the WAL stream when taken */

    /*
     * The transaction completion count at the time GetSnapshotData() built
     * this snapshot. Allows to avoid re-computing static snapshots when no
     * transactions completed since the last GetSnapshotData().
     */
    uint64      snapXactCompletionCount;
} SnapshotData;

As shown above, TimestampTz indicates when the snapshot is generated and why old is enabled_ snapshot_ Will this value be set at threshold?

Because this value is used to judge whether the snapshot is too old:

/*
 * Implement slower/larger portions of TestForOldSnapshot
 *
 * Smaller/faster portions are put inline, but the entire set of logic is too
 * big for that.
 */
void
TestForOldSnapshot_impl(Snapshot snapshot, Relation relation)
{
        if (RelationAllowsEarlyPruning(relation)
                && (snapshot)->whenTaken < GetOldSnapshotThresholdTimestamp())
                ereport(ERROR,
                                (errcode(ERRCODE_SNAPSHOT_TOO_OLD),
                                 errmsg("snapshot too old")));
}

In this way, we can better understand why the index created when this parameter is set is not available in the current transaction:
When we do not set this parameter, creating an index in a transaction can ensure the consistency of MVCC, so the index is safe and available.
When using the parameter, because TimestampTz is set, the database will judge whether the row data has expired. If it has expired, it will be cleaned up. This is unsafe for the index and cannot ensure the data consistency. For indexes that are not hot safe, it is natural to set indcheckxmin to true, Prevent the data from actually expired and deleted after the index is created in the transaction.

 /*
     * At this moment we are sure that there are no transactions with the
     * table open for write that don't have this new index in their list of
     * indexes.  We have waited out all the existing transactions and any new
     * transaction will have the new index in its list, but the index is still
     * marked as "not-ready-for-inserts".  The index is consulted while
     * deciding HOT-safety though.  This arrangement ensures that no new HOT
     * chains can be created where the new tuple and the old tuple in the
     * chain have different index keys.
     *
     * We now take a new snapshot, and build the index using all tuples that
     * are visible in this snapshot.  We can be sure that any HOT updates to
     * these tuples will be compatible with the index, since any updates made
     * by transactions that didn't know about the index are now committed or
     * rolled back.  Thus, each visible tuple is either the end of its
     * HOT-chain or the extension of the chain is HOT-safe for this index.
     */

Summary:

When PG_ The indcheckxmin field of index is set to true until this PG_ Queries cannot use this index until the xmin of the index row is below the TransactionXmin horizon of the query.

There are two main situations for this phenomenon:

1. broken HOT chains exist in the current transaction on the table;
2. old_ snapshot_ When threshold is set.

Keywords: Database PostgreSQL

Added by dclamp on Fri, 03 Dec 2021 11:03:37 +0200