A strange index failure occurred some time ago. The actual situation is similar to the following:
bill=# begin; BEGIN bill=*# create index idx_t1 on t1(id); CREATE INDEX bill=*# explain select * from t1 where id = 1; QUERY PLAN ---------------------------------------------------- Seq Scan on t1 (cost=0.00..25.88 rows=6 width=36) Filter: (id = 1) (2 rows) bill=*# end; COMMIT bill=# explain select * from t1 where id = 1; QUERY PLAN --------------------------------------------------------------------- Bitmap Heap Scan on t1 (cost=1.50..7.01 rows=6 width=36) Recheck Cond: (id = 1) -> Bitmap Index Scan on idx_t1 (cost=0.00..1.50 rows=6 width=0) Index Cond: (id = 1) (4 rows)
The obvious problem is that I created an index in the transaction, but I can't use it. But when a transaction is committed, it can be used normally. What is the situation?
This is actually PG_ The indcheckxmin attribute in index is related. The explanation of this field is as follows:
If true, queries must not use the index until the xmin of this pg_index row is below their TransactionXmin event horizon, because the table may contain broken HOT chains with incompatible rows that they can see
This is true after inspection:
bill=*# select indcheckxmin from pg_index where indexrelid = 'idx_t1'::regclass; indcheckxmin -------------- t (1 row)
So the question is, under what circumstances will the attribute of the index be set to true when creating an index?
From the explanation of this field in the previous official document, if the table contains broken HOT chains, it will be true. What is broken HOT chains? It seems to be related to the HOT mechanism. Is it true only when broken HOT chains exist?
We won't sell the key here. We'll give a conclusion directly, and then we'll verify it one by one.
After testing, it is found that the following two conditions will cause the indcheckxmin of the index to be set to true:
- There are broken HOT chains on the table in the current transaction, which is stated in the official document;
- When old_ snapshot_ When threshold is set.
Scenario 1: broken HOT chains
This situation exists as long as there are HOT updated rows in the table in the current transaction. So when will the HOT update take place? Two premises:
The new tuple and the old tuple must be in the same page;
Index fields cannot be updated.
In this case, there are two common situations in practice:
- Update the last page on the table;
- fillfactor is set in the table, that is, there is reserved free space on each page.
example:
Insert 10 pieces of data into the table, and naturally there is only one page:
bill=# insert into t1 select generate_series(1,10),md5(random()::text); INSERT 0 10
Update:
bill=# update t1 set info = 'bill' where id = 10; UPDATE 1
Check and find that it is indeed a HOT update:
About t_ The explanation of infomask2 field will not be repeated here.
Next, we create an index:
It can be found that indcheckxmin is set to true, and the index is not available in the current transaction.
After verification, in index_ In the build phase, if the BrokenHotChain is judged, the indcheckxmin is modified to true.
The specific modification code is as follows:
/*Indexinfo - > II_ Brokenhotchain has been modified to true */ if ((indexInfo->ii_BrokenHotChain || EarlyPruningEnabled(heapRelation)) && !isreindex && !indexInfo->ii_Concurrent) { Oid indexId = RelationGetRelid(indexRelation); Relation pg_index; HeapTuple indexTuple; Form_pg_index indexForm; pg_index = table_open(IndexRelationId, RowExclusiveLock); indexTuple = SearchSysCacheCopy1(INDEXRELID, ObjectIdGetDatum(indexId)); if (!HeapTupleIsValid(indexTuple)) elog(ERROR, "cache lookup failed for index %u", indexId); indexForm = (Form_pg_index) GETSTRUCT(indexTuple); /* If it's a new index, indcheckxmin shouldn't be set ... */ Assert(!indexForm->indcheckxmin); /*Modify indcheckxmin to true */ indexForm->indcheckxmin = true; CatalogTupleUpdate(pg_index, &indexTuple->t_self, indexTuple); heap_freetuple(indexTuple); table_close(pg_index, RowExclusiveLock); }
Similarly, we can verify that indcheckxmin is set to true because of broken hotwins.
Scenario 2: old_snapshot_threshold
Let's start with an example:
In the simplest scenario, for a completely empty table, create an index in the transaction, and indcheckxmin will be set to true. Sure enough, the index is also unavailable.
bill=# drop table t1; DROP TABLE bill=# create table t1(id int,info text); CREATE TABLE bill=# begin; BEGIN bill=*# create index idx_t1 on t1(id); CREATE INDEX bill=*# select indcheckxmin from pg_index where indexrelid = 'idx_t1'::regclass; indcheckxmin -------------- t (1 row) bill=*# explain select * from t1 where id = 1; QUERY PLAN ---------------------------------------------------- Seq Scan on t1 (cost=0.00..25.88 rows=6 width=36) Filter: (id = 1) (2 rows)
So why old_ snapshot_ Will threshold have such an impact?
After tracking, it is found that when this parameter is enabled, the snapshot data structure of creating an index in the transaction is as follows:
(SnapshotData) $6 = { snapshot_type = SNAPSHOT_MVCC xmin = 856 xmax = 856 xip = 0x00007fd55c804fc0 xcnt = 0 subxip = 0x00007fd55ad5d000 subxcnt = 0 suboverflowed = false takenDuringRecovery = false copied = false curcid = 1 speculativeToken = 0 vistest = NULL active_count = 0 regd_count = 0 ph_node = { first_child = NULL next_sibling = NULL prev_or_parent = NULL } whenTaken = 691752041261069 lsn = 208079736 }
And disable this parameter?
(SnapshotData) $7 = { snapshot_type = SNAPSHOT_MVCC xmin = 828 xmax = 828 xip = 0x00007fad31704780 xcnt = 0 subxip = 0x00007fad3155d000 subxcnt = 0 suboverflowed = false takenDuringRecovery = false copied = false curcid = 1 speculativeToken = 0 active_count = 0 regd_count = 0 ph_node = { first_child = NULL next_sibling = NULL prev_or_parent = NULL } whenTaken = 0 lsn = 0 }
You can see that the difference is that when you do not use this parameter, when you create a snapshot data, you will not set whenTaken and lsn. What are these two parameters?
Let's take a look at the structure of snapshot data:
typedef struct SnapshotData { SnapshotType snapshot_type; /* type of snapshot */ /* * The remaining fields are used only for MVCC snapshots, and are normally * just zeroes in special snapshots. (But xmin and xmax are used * specially by HeapTupleSatisfiesDirty, and xmin is used specially by * HeapTupleSatisfiesNonVacuumable.) * * An MVCC snapshot can never see the effects of XIDs >= xmax. It can see * the effects of all older XIDs except those listed in the snapshot. xmin * is stored as an optimization to avoid needing to search the XID arrays * for most tuples. */ TransactionId xmin; /* all XID < xmin are visible to me */ TransactionId xmax; /* all XID >= xmax are invisible to me */ /* * For normal MVCC snapshot this contains the all xact IDs that are in * progress, unless the snapshot was taken during recovery in which case * it's empty. For historic MVCC snapshots, the meaning is inverted, i.e. * it contains *committed* transactions between xmin and xmax. * * note: all ids in xip[] satisfy xmin <= xip[i] < xmax */ TransactionId *xip; uint32 xcnt; /* # of xact ids in xip[] */ /* * For non-historic MVCC snapshots, this contains subxact IDs that are in * progress (and other transactions that are in progress if taken during * recovery). For historic snapshot it contains *all* xids assigned to the * replayed transaction, including the toplevel xid. * * note: all ids in subxip[] are >= xmin, but we don't bother filtering * out any that are >= xmax */ TransactionId *subxip; int32 subxcnt; /* # of xact ids in subxip[] */ bool suboverflowed; /* has the subxip array overflowed? */ bool takenDuringRecovery; /* recovery-shaped snapshot? */ bool copied; /* false if it's a static snapshot */ CommandId curcid; /* in my xact, CID < curcid are visible */ /* * An extra return value for HeapTupleSatisfiesDirty, not used in MVCC * snapshots. */ uint32 speculativeToken; /* * For SNAPSHOT_NON_VACUUMABLE (and hopefully more in the future) this is * used to determine whether row could be vacuumed. */ struct GlobalVisState *vistest; /* * Book-keeping information, used by the snapshot manager */ uint32 active_count; /* refcount on ActiveSnapshot stack */ uint32 regd_count; /* refcount on RegisteredSnapshots */ pairingheap_node ph_node; /* link in the RegisteredSnapshots heap */ TimestampTz whenTaken; /* timestamp when snapshot was taken */ XLogRecPtr lsn; /* position in the WAL stream when taken */ /* * The transaction completion count at the time GetSnapshotData() built * this snapshot. Allows to avoid re-computing static snapshots when no * transactions completed since the last GetSnapshotData(). */ uint64 snapXactCompletionCount; } SnapshotData;
As shown above, TimestampTz indicates when the snapshot is generated and why old is enabled_ snapshot_ Will this value be set at threshold?
Because this value is used to judge whether the snapshot is too old:
/* * Implement slower/larger portions of TestForOldSnapshot * * Smaller/faster portions are put inline, but the entire set of logic is too * big for that. */ void TestForOldSnapshot_impl(Snapshot snapshot, Relation relation) { if (RelationAllowsEarlyPruning(relation) && (snapshot)->whenTaken < GetOldSnapshotThresholdTimestamp()) ereport(ERROR, (errcode(ERRCODE_SNAPSHOT_TOO_OLD), errmsg("snapshot too old"))); }
In this way, we can better understand why the index created when this parameter is set is not available in the current transaction:
When we do not set this parameter, creating an index in a transaction can ensure the consistency of MVCC, so the index is safe and available.
When using the parameter, because TimestampTz is set, the database will judge whether the row data has expired. If it has expired, it will be cleaned up. This is unsafe for the index and cannot ensure the data consistency. For indexes that are not hot safe, it is natural to set indcheckxmin to true, Prevent the data from actually expired and deleted after the index is created in the transaction.
/* * At this moment we are sure that there are no transactions with the * table open for write that don't have this new index in their list of * indexes. We have waited out all the existing transactions and any new * transaction will have the new index in its list, but the index is still * marked as "not-ready-for-inserts". The index is consulted while * deciding HOT-safety though. This arrangement ensures that no new HOT * chains can be created where the new tuple and the old tuple in the * chain have different index keys. * * We now take a new snapshot, and build the index using all tuples that * are visible in this snapshot. We can be sure that any HOT updates to * these tuples will be compatible with the index, since any updates made * by transactions that didn't know about the index are now committed or * rolled back. Thus, each visible tuple is either the end of its * HOT-chain or the extension of the chain is HOT-safe for this index. */
Summary:
When PG_ The indcheckxmin field of index is set to true until this PG_ Queries cannot use this index until the xmin of the index row is below the TransactionXmin horizon of the query.
There are two main situations for this phenomenon:
1. broken HOT chains exist in the current transaction on the table;
2. old_ snapshot_ When threshold is set.