List of data structures of konjaku
If there is a list of the embodiment of human great wisdom in the world, the data structure must have a place in it.
This article is just a summary of the author's study of data structure. Because most of them have learned it, they don't explain it in detail. Only a few points that can slightly increase understanding will be left.
array
The most important, basic and complex data structure will not be repeated
Stack:
First in, then out. There are ready-made packages in the c + + STL library. There's nothing to say
std::stack<type> st; //Type indicates the data type stored in the stack st.pop() st.push() st.size() st.empty()
queue
First in, first out, ready-made packaging, nothing to say
std::queue<type> que; //Type indicates the data type stored in the queue que.size() que.front() que.push() que.pop()
Priority queue
A maximum (small or custom) heap. The top of the heap is the maximum, and the heap is disordered
std::priority_queue<type> que; que.push() que.pop() que.top() que.size() que.empty()
Irreducible set
Query whether a data is in the set, the internal of the set is orderly and the data is unique.
std::set<type> st; st.insert() st.clear() st.find()//If true, the iterator of this position will be returned; if false, st.end() will be returned st.size()
Linked list
c + + has sub packaging, but the actual situation is that the linked list is not used much. Most of the topics are on the array simulation linked list, which leads to the embarrassing status of the linked list
std::list<type> lst; //Type indicates the data type stored in the linked list lst.size() lst.begin() lst.end() lst.insert(iter,key) lst.clear()
Binary heap (red black tree implementation)
A balanced binary search tree can quickly and efficiently maintain complex functions such as the maximum value, minimum value, ranking of each number and number of each ranking in a sequence.
It is not easy to implement here, but because there is a ready-made library pbds (commonly known as flat panel TV) that can be called
#include "bits/extc++.h" using namespace __gnu_pbds; //Note the header function and namespace __gnu_pbds::tree<type,null_type,less<type>,rb_tree_tag,tree_order_statistics_node_update> tr; // Storage type regardless of sorting method red black tree (regardless) update method (regardless) //insert erase delete // order_of_key for ranking, starting from 0 // find_by_order find the number k, starting from 0 // tr1.join(tr2) merges two red and black trees with the same type and no duplication // tr1.split(v,tr2) splits a red black tree. Elements less than or equal to v remain in Tr1, while others belong to tr2 // lower_bound(x) queries the number of that is greater than or equal to X // upper_bound(x) query number greater than x
Hash table
A hash function is designed to map a large range or complex data to a simpler location. For example, a long string is mapped to a number through string hash to facilitate key value pair query
When mapping, the data will be distorted and mapping conflicts will occur, so it is basically unnecessary or not recommended
String hashes are used when matching strings
//Prehistoric Shishan was not maintained because it was not used //Double hash of the string that sets the weight for each bit ll data1[MAXLEN] = {0},data2[MAXLEN]={0}; ll PowP1[MAXLEN] = {1},PowP2[MAXLEN]={1}; struct Hash { const ll p1 = 131,p2 = 131; const ll mod1 = 998244353,mod2 = 1e4 +7; void init() { for (int i = 1; i < MAXLEN; i++) PowP1[i] = (PowP1[i - 1] * p1) % mod1, PowP2[i] = (PowP2[i - 1] * p2) % mod2; } void CreatHash(string s) { int len = s.size(); data1[0] = data2[0] = s[0]; for (int i = 1; i < len; i++) data1[i] = (data1[i - 1] * p1 % mod1 + s[i]) % mod1, data2[i] = (data2[i - 1] * p2 % mod2 + s[i]) % mod2; } int substr1(int x,int y) { if (x == 0) return data1[y]; return ((data1[y] - (data1[x - 1] * PowP1[y - x + 1]) % mod1) + mod1) % mod1; } int substr2(int x,int y) { if (x == 0) return data2[y]; return ((data2[y] - (data2[x - 1] * PowP2[y - x + 1]) % mod2) + mod2) % mod2; } }Hash;
Some notes about trees
A tree can be regarded as a connectivity graph without rings. A tree with n nodes has n-1 edges
Generally speaking, in order to facilitate data query, the content of data structure basically does not appear rootless tree
For a tree with roots,
For interval information maintenance, recursive query is often used;
If it is a single point query, pointer variables are often established, and the root is queried downward in a recursive manner;
The segment tree best reflects this point. Recursive query is adopted in the ordinary segment tree of query interval and, while recursive query is adopted in the weighted segment tree of single point counting
Dictionary tree
Dictionary tree, a multi fork tree, is used to store a series of strings, and can complete the query of this string set including and more than the following contents
1. Whether a string appears in the string set several times
2. Whether a string appears in the string set as a prefix several times
const int CHAR_NUM = 200; const int MAXN = 50; //Number of strings in the dictionary const int MAXM = 1010; //Maximum length of string const int NUM = MAXN * MAXM; //Space = number * length struct Trie{ int data[NUM][CHAR_NUM]; int cnt=0; bool val[NUM];//The maintenance of various information is related to this variable void add(string s){ int p=0,len=s.size(); for(int i=0;i<len;i++){ if(!data[p][s[i]]) data[p][s[i]]=++cnt; p=data[p][s[i]]; } val[p]=true; } bool query(string s){ int p,len=s.size(); for(int i=0;i<len;i++){ p=data[p][s[i]]; if(!p)return false; } return val[p]; } }trie;
Dictionary tree - > 01 dictionary tree
The dictionary tree maintains a string, and the 01 dictionary tree maintains some numbers.
The number can be converted into a string of warranty letter 0 and 1 in binary mode (pay attention to length complement), and then these data can be queried similarly.
Because of the particularity of 01 string, 01 dictionary tree is often used for bitwise and / or operations between number sets
The XOR Largest Pair.
The content remains unchanged, and the string CHAR_NUM Only 2 Just convert the number into an equal length string
Joint search set
It is used for data sets and supports merging between sets.
The data of the collection is recorded on the root of the collection. By adding variables, you can maintain queries including but not limited to the following contents:
1. Whether two data belong to the same set
food chain
2. Size of a set
3. The position of a data in its set
Ginga Eiyudensetsu
wait
There are many patterns of parallel search set, but the path compression is simple and the parallel search set code is surprisingly short
int fa[maxn]; void init(int n){ for(int i = 1 ;i <= n; i++) fa[i] = i; } int find(int x){ return fa[x] = (fa[x] == x? x:find(fa[x])); } void merge(int x,int y){ fa[find(x)] = find(y); }
Joint query - > revocable joint query
Revocable join query sets are simple derivatives of join query sets.
The objectives of revocable and query set are:
Complete the basic functions of and query set
It can return the status of the consolidated query set to the k-th version, and delete all the consolidation results after the k-th version
The basic idea of revocable query set is:
Record a fallback stack. When you want to fallback to version k, reverse the changes after version k in the stack.
Because the version information of the parent node will be lost after path compression, path compression is not performed, and rank merging is adopted
For example, fa[1] = 2,fa[2] = 2,fa[3] = 3;
Now merge 2,3 sets, fa[2] = 3; Then query the root node of point 1, and fa[1] will point to 3 after path compression
Then I ask to undo merge 23. In limited operations, fa[2] can only be restored, but fa[1] still points to 3
The basic idea of merging by rank: merge less information into more information, and change as little information as possible when backing back to the previous version
Note: in some offline topics, you can sort the input first, and then use revocable search set instead of persistent search set
int fa[maxn],sz[maxn];//Generally, set size is regarded as rank stack<pair<int,int>> st; // Version number, modification location void init(int n){ for(int i = 1 ;i <= n; i++) fa[i] = i,sz[i] = 1; } int find(int x){ return fa[x] == x? x:find(fa[x]); } void merge(int x,int y,int id){ x = find(x) ; y = find(y) ; if(x == y)return ; if(sz[x]>sz[y])swap(x,y); fa[x] = y; sz[y] += sz[x];//Merge x into y st.push({id,x}); } void cancel(int id){ while(st.top().first > id){ int p = st.top().second(); st.pop(); sz[fa[p]] -=sz[p]; fa[p] = p; } }
Joint query set - > delete point joint query set
It is simple to expand the query set. When a point is deleted, the merged set due to the point is broken.
In order to preserve the parent node information, the rank merging method is still used.
If a point is deleted, connect the parent node of the point to a non-existent point (if the subscript starts with 1, then 0 is a good choice)
Later, when querying the parent node at each point, if fa[fa[x]] == 0, it means that the parent node has been deleted. At this time, update fa[x]=[x], and the two merged sets can be disconnected
Tree array
Tree array is a data structure used to maintain interval information and support single point modification. It uses the idea of doubling to maintain an interval. For example, 1-2 maintains 1-4 and 1-4 maintains 1-8. Later, data modification and query only need log(n) time
However, the tree array can do both line segments and trees, but, but, but
It's really short
lowbit wrote that a fairy thought of it
int d[maxn]; int lowbit(int x) {return x & (-x);} void edit(int x,int p){ while(p<maxn){ d[p] += x; p = p + lowbit(p); } } int pre(int p){ int sum = 0; while(p>=0){ sum+=d[p]; p -= lowbit(p); } return sum; }
Segment tree
The way humans play with segment trees is as flashy as the way humans play with dijestra.
The segment tree mainly maintains the query of interval information.
If the two interval information can be combined, we can get the large interval information through the information between the two cells when querying the large interval. Therefore, we only need to preprocess the inter cell, and we can skip the inter cell calculation in the subsequent query to reduce the time complexity
However, in fact, most interval information can be obtained by merging, such as
Interval sum, interval maximum, interval coverage, interval mode, etc
The function of delay marking is that the modification range covers the whole interval. In order to reduce the time complexity, the modifications to the next level are reserved for the time being. If the modification is a single point modification, there is no need to delay the mark. What is the delay after the modification.
When cells are merged into a large area, some information will be lost. At this time, you need to use push up to update the upper node during decentralization.
The simplest example is the scan line (shown below)
//Segment tree maintenance interval and struct SEG{ int l,r,tag; int data;// }nod[MAXN << 4]; int val[MAXN<<4]; void build(int l,int r,int p){// Function entry build(l,r,1) nod[p] = {l,r,0,0}; int mid = (r+l) / 2 ,ls = p * 2,rs = p * 2 + 1; if(l == r) nod[p].data = val[l]; else { build(l,mid,ls) ; build(mid+1,r,rs); nod[p].data = nod[ls].data + nod[rs].data; } } void pushdown(int p){//The lazy flag indicates that the current location has been modified, and the following locations have not been modified int ls = p * 2,rs = p * 2 + 1 ; nod[ls].data += nod[p].tag * (nod[ls].r - nod[ls].l + 1); nod[rs].data += nod[p].tag * (nod[rs].r - nod[rs].l + 1); nod[ls].tag += nod[p].tag, nod[rs].tag += nod[p].tag; nod[p].tag = 0; } void edit(int l ,int r ,int num,int p){ //Function entry edit(l,r,num,1) int ls = p * 2 ,rs = p * 2 + 1; if (nod[p].tag)pushdown(p); //If the update method of the interval information is affected by the update of the self interval information, the delay flag must be placed first when modifying if (l <= nod[p].l && nod[p].r <= r) {//Complete, inclusive nod[p].data += num * (nod[p].r - nod[p].l + 1); nod[p].tag += num; return ; } if (l<= (nod[p].l+nod[p].r) / 2) edit(l,r,num,ls); if (r > (nod[p].l+nod[p].r) / 2) edit(l,r,num,rs); nod[p].data = nod[ls].data + nod[rs].data ; } int query(int l, int r, int p) {// Function entry query_sum(l,r,1) int ls = p * 2 , rs = p * 2 + 1; if (nod[p].tag)pushdown(p); if (l <= nod[p].l && nod[p].r <= r)//Full inclusion return nod[p].data; int ans = 0,mid = (nod[p].l+nod[p].r) / 2; if(l <= mid) ans += query(l,r,ls); if(r > mid) ans += query(l,r,rs); return ans; }
Discretization
Discretization is an effective method to reduce the spatial complexity after the increase and complexity of data.
The new data can be directly put into the array from the original data; The original data to the new data can only be put into the map, but the array can't be put into it
//Store the read data in num, and then directly change the value in num to the result of discretization //lsh: new data to original data mp original data to new data int lsh[MAXN],num[MAXN]; map<int,int> mp; void LSH(int n){ for(int i=1;i<=n;i++) lsh[i]=num[i]; sort(lsh+1,lsh+1+n); int cnt=unique(lsh+1,lsh+n+1)-lsh-1; for(int i=1;i<=n;i++){ num[i]=lower_bound(lsh+1,lsh+cnt+1,num[i])-lsh; mp.insert({lsh[i],i}); } }
Discretization + line segment tree - > scanline
Scanline template
The scan line is processed by the line segment tree. After reading in the data sorting, discretization and some messy processing, the biggest problems that the scan line needs to deal with are:
When the scan line moves to a certain position, there are n points, each point has a weight, each point may be covered, and the coverage is discontinuous. How to find the weight sum of covered points.
Therefore, two values need to be recorded on each node of the segment tree: tag and sum
tag indicates the number of times this interval has been completely covered. If there is a value at present, it does not need to be maintained down, and the length of the interval is directly output
Sum represents the length of the part covered by the current interval. If the interval is not completely covered, sum should be the sum of the coverage lengths of the two sub intervals
For the coverage of an interval, the tag is only meaningful at the corresponding leaf node. Uploading is meaningless and delegating is redundant.
For example, the tag of 2-3 indicates that 2-3 is covered, but 23 coverage does not mean that 1-3 is covered.
23 coverage is dropped to 22,33 coverage, but in fact, if we read the tag in 23, we will end the recursion, not downward recursion.
Generally, the scan lines appear in pairs, and not moving the tag position will greatly reduce the difficulty of maintenance.
//The core code of the above template questions void edit(int l,int r,int num,int p){ if(l > nod[p].r || r < nod[p].l) return ; if (l <= nod[p].l && nod[p].r <= r) {//Complete, inclusive nod[p].tag += num; if (nod[p].tag) nod[p].dat = lsh[nod[p].r + 1] - lsh[nod[p].l]; else nod[p].dat = (l == r?0:nod[p * 2].dat + nod[p * 2 + 1].dat); return; } int mid = (nod[p].l + nod[p].r) / 2; if( l <= mid )edit(l,r,num,p*2); if( r > mid )edit(l,r,num,p*2+1); if(!nod[p].tag)nod[p].dat = nod[p * 2].dat + nod[p * 2 + 1] .dat; }
Discretization + segment tree - > weight segment tree
If there are n numbers, the weight line segment tree can be used to maintain the k-th largest (smallest) number from 1 to m (m < = n).
Generally, it will cooperate with the chairman tree to find a number with the largest k in the interval. Here, let's talk about the weighted segment tree first
Firstly, n numbers are read in and discretized to obtain a discretized array. The original ranking of each number is saved in the discretized array.
Next, insert from left to right according to the order of the original array. The insertion position is the ranking of this number in the original array. At the same time, recursively make the weight of all its parent nodes + 1.
After m numbers are inserted, we get a segment tree with these characteristics:
Each node represents several numbers within this range.
The number represented by the left son of each node is smaller than that represented by the right son.
Next, just
1. Set a pointer variable to point to the root node
2. Query the size of the left son.
3. If the weight of the left son is greater than k, it means that the K largest is in the left son. Move the pointer left, otherwise move it right, and subtract the size of the left son from K
Repeat 2-3 until k is 0
At this time, the node pointed to by k indicates the ranking of the numbers to be queried in the original interval
int query(int k){ int p = 1; while(k){ if(nod[p * 2].dat >= k) p = p*2; else{ p = p * 2 + 1 ; k -= nod[p*2].dat; } } return nod[p].r; }
Block Point Branch off-line divide and conquer algorithm
QAQ
Sustainable thought
The above data structures maintain the current status of data, but sometimes the data that needs to be queried may be the data before modification. What should I do at this time?
Naturally, first open a root node to distinguish different versions, and then copy the unchanged data, or make the pointer still point to the previous data storage location, and save the new space to be changed.
Because the pointer pointing will change, some conventional pointing methods will not work, for example, ls = 2 * p of the line segment tree;
Persistence is an enchanting prop that makes all kinds of good data structures into hell
Persistent + dictionary tree = persistent dictionary tree
As mentioned above, for each inserted string, first open a root node, then copy the information of each possible character, and then open a new node with the newly inserted information.
I haven't seen many problems
//yy version, which has not been verified and deliberated, should be used with caution struct Trie{ int data[NUM][CHAR_NUM]; int cnt= 1; int id = 1; bool val[NUM]; int root[MAXN]; void add(string s){ int q = root[id] , p = root[++id] = ++cnt ,len=s.size(); for(int i=0;i<len;i++){ for(int j = 0;j<CHAR_NUM;j++) data[p][j] = data[q][j]; if(!data[p][s[i]]) data[p][s[i]]=++cnt; p = data[p][s[i]]; q = data[q][s[i]]; } val[p]=true; } bool query(string s,int k){ int p = root[k],len=s.size(); for(int i=0;i<len;i++){ p=data[p][s[i]]; if(!p)return false; } return val[p]; } }trie;
Persistent + segment tree - > persistent segment tree
Also known as the chairman tree
The basic idea is similar. Open the root node, copy the information, and re open the node for different information
Because multiple versions are stored, the next node cannot be determined simply by multiplying 2
As for the query method, it depends on whether the tree function uses interval maintenance or single point query
int root[maxn],val[maxn]; int cnt = 0; struct SEG{ int l,r; int dat; int ls,rs; }nod[maxn * 40]; void build(int l,int r ,int p){ nod[p] = { l,r,0,0,0 }; if(l == r){ //nod[p].dat = val[p]; It depends on whether the subject needs initial value or not nod[p].rs = nod[p].ls = -1; return ; } int mid = (l+r) /2; build(l ,mid ,nod[p].ls = ++cnt); build(mid + 1,r,nod[p].rs = ++cnt); nod[p].dat = nod[nod[p].ls].dat + nod[nod[p].rs].dat; } void edit(int l ,int r ,int num ,int x ,int p){ nod[p] = nod [x]; if (nod[p].l == nod[p].r) { //There are differences between interval modification and single point modification nod[p].dat += num * (nod[p].r - nod[p].l + 1); return ; } if (l<= (nod[p].l+nod[p].r) / 2) edit(l,r,num,nod[x].ls,nod[p].ls = ++ cnt); if (r > (nod[p].l+nod[p].r) / 2) edit(l,r,num,nod[x].rs,nod[p].rs = ++ cnt); nod[p].dat = nod[nod[p].ls].dat + nod[nod[p].rs].dat ; }
Persistent segment tree - > persistent array
Although it is called an array, it is actually implemented by the chairman tree. In fact, as long as the operations of the chairman tree are single point operations, it becomes an array that can record versions
Because it is only a single point query, nodes can delete a lot of redundant information
#define maxn 1000050 int val[maxn] ,root[maxn]; struct SEG{ int l,r; int ls,rs; int dat; }nod[maxn * 40]; int id = 0; void build(int l,int r,int p){//Program entry build (1, N, root [0] = + ID); nod[p] = {l,r,-1,-1,0}; if(l == r) { nod[p].dat = val[r]; return ; } int mid = (l + r) / 2 ; build(l,mid,nod[p].ls = ++id); build(mid + 1, r,nod[p].rs = ++ id); } void edit(int d ,int num,int x,int p){ //Program entry edit(l,r,num,root [version id],root [current id]) nod[p] = nod[x]; if(nod[p].l == nod[p].r){ nod[p].dat = num; return ; } if(d <= (nod[p].l + nod[p].r) /2) edit(l,r,nod[x].ls,num, nod[p].ls = ++ id) else edit(l,r,nod[x].rs,num, nod[p].rs = ++ id); } int query(int d,int p){ if(nod[p].l == nod[p].r) return nod[p].dat; if(d<=(nod[p].l + nod[p].r)/2) return query(d,nod[p].ls); else return query(d,nod[p].rs); }
Persistent array - > persistent and query set
To put it bluntly, what is useful is a fa array. The array uses a persistent array, and the parallel search set becomes a persistent parallel search set
find and merge all have version numbers, but they're gone
Chairman tree + tree array - > Dynamic chairman tree
QAQ
ST table
Multiply for maximum or minimum
//ST table is used to calculate the interval extreme value, and the following code is used to calculate the interval maximum value //The time of establishing ST table is O (nlogn), and the query time is O(1) //The data cannot be changed. If the data is changed, the table must be rebuilt #define MAXN 1000050 int input[MAXN]={0}; int st[30][MAXN]={0}; //Store the square of 2 for easy calling int bin[30]={1}; //st[i][j] refers to the maximum value from data[j] to data[j+2^i-1], and its length is 2^i // j->j+2^i-1 == max{j->j+2^(i-1)-1 , j+2^(i-1) + 2^(i-1) -1) //That is, st[i][j]=max{st[i-1][j],st[i-1][j+2^(i-1)]} void creat_ST(int n){ int N=(int)log2(n); for(int i=1;i<=N;i++) bin[i]=bin[i-1]*2; for(int i=1;i<=n;i++) st[0][i]=input[i]; for(int i=1;i<=N;i++){ for(int j=1;j<=n;j++){ st[i][j]=(j+bin[i-1]<=n?max(st[i-1][j],st[i-1][j+bin[i-1]]):st[i-1][j]); //Maintained } } return ; } //Find the maximum value of the subscript from start to end int using_ST(int start,int end){ int N=(int)log2(end-start+1); return max(st[N][start],st[N][end-bin[N]+1]); }
Incomplete to be updated