List of data structures of konjaku

If there were a list of the great embodiments of human wisdom, data structures would surely have a place on it.

This article is just a summary of the author's study of data structures. Since most of them have already been learned, they are not explained in detail; only a few notes that may slightly deepen understanding are kept.

Array

The most important, most basic, and yet most complex data structure; it will not be covered again here.

Stack

First in, last out (LIFO). The C++ STL provides a ready-made wrapper; there is not much to say.

std::stack<type> st;
//Type indicates the data type stored in the stack
st.push(x)   //push x on top
st.top()     //read the top element (pop() does not return it)
st.pop()     //remove the top element
st.size()
st.empty()

Queue

First in, first out (FIFO); again a ready-made wrapper, not much to say.

std::queue<type> que;
//Type indicates the data type stored in the queue
que.push()   //enqueue at the back
que.front()  //read the front element
que.pop()    //remove the front element
que.size()
que.empty()

Priority queue

A heap, max-heap by default (min-heaps and custom orderings are also possible). The top of the heap is the extreme element; the rest of the heap is not kept in sorted order.

std::priority_queue<type> que;
que.push()
que.pop()
que.top()
que.size()
que.empty()
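
For a min-heap, or a custom ordering, the comparator is passed as the third template argument. A quick sketch (the names minq, Cmp and pq are just illustrative):

#include <queue>
#include <vector>
#include <utility>
#include <functional>

//Min-heap of ints: the smallest element is on top
std::priority_queue<int, std::vector<int>, std::greater<int>> minq;

//Custom ordering: the pair with the larger .second ends up on top
struct Cmp {
    bool operator()(const std::pair<int,int>& a, const std::pair<int,int>& b) const {
        return a.second < b.second;
    }
};
std::priority_queue<std::pair<int,int>, std::vector<std::pair<int,int>>, Cmp> pq;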

Set (no duplicate elements)

Queries whether an element is in the set; internally the elements are kept in order and are unique.

std::set<type> st;
st.insert()
st.erase()
st.clear()
st.find()//Returns an iterator to the element if it exists; otherwise returns st.end()
st.size()

Linked list

C++ ships a ready-made wrapper (std::list), but in practice linked lists are not used much in contests; most problems use an array-simulated linked list instead, which leaves std::list in an awkward position.

std::list<type> lst;
//Type indicates the data type stored in the linked list
lst.size()
lst.begin()
lst.end()
lst.insert(iter,key)
lst.clear()

Balanced binary search tree (red-black tree implementation)

A balanced binary search tree can quickly and efficiently maintain things like the maximum, the minimum, the rank of each value, and the value at each rank in a sequence.
It is not easy to implement by hand, but there is a ready-made library, pbds (nicknamed "flat-panel TV" in the Chinese OI community), that can be called instead.

#include "bits/extc++.h"
using namespace __gnu_pbds;
//Note the header file and the namespace

__gnu_pbds::tree<type,null_type,less<type>,rb_tree_tag,tree_order_statistics_node_update> tr;
// Template arguments: key type, mapped type (null_type = set), comparator,
// tree tag (red-black tree), node update policy (needed for order statistics)
// insert / erase for insertion and deletion
// order_of_key(x): rank of x (number of keys smaller than x), counted from 0
// find_by_order(k): iterator to the k-th smallest key, counted from 0
// tr1.join(tr2): merges two trees of the same type whose key ranges do not overlap
// tr1.split(v,tr2): splits tr1; keys less than or equal to v stay in tr1, the rest go to tr2
// lower_bound(x): iterator to the first key greater than or equal to x
// upper_bound(x): iterator to the first key greater than x
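
A minimal usage sketch of the operations listed above, with int keys (the values in the comments are only illustrative):

#include "bits/extc++.h"
#include <functional>
using namespace __gnu_pbds;

__gnu_pbds::tree<int, null_type, std::less<int>,
                 rb_tree_tag, tree_order_statistics_node_update> tr;

int main() {
    tr.insert(5); tr.insert(1); tr.insert(9);
    int rk  = tr.order_of_key(5);      // 1: one key (namely 1) is smaller than 5
    int kth = *tr.find_by_order(2);    // 9: the 3rd smallest key (0-based index 2)
    tr.erase(1);
    auto it1 = tr.lower_bound(7);      // iterator to 9 (first key >= 7)
    auto it2 = tr.upper_bound(9);      // tr.end() (no key > 9)
    return 0;
}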

Hash table

A hash function maps data from a large or complicated range to a smaller, simpler one; for example, string hashing maps a long string to a number so it can be used conveniently as a key.
The mapping loses information, so collisions can occur; because of that, hand-written hash tables are basically unnecessary and not recommended in most situations.
String hashes are mainly used when matching strings.

//Prehistoric legacy code, not maintained because it is rarely used
//Double hash of a string: each position is weighted by a power of the base
ll   data1[MAXLEN] = {0},data2[MAXLEN]={0};
ll   PowP1[MAXLEN] = {1},PowP2[MAXLEN]={1};

struct Hash {
    const ll  p1 = 131,p2 = 131;
    const ll  mod1 = 998244353,mod2 = 1e4 +7;
    void init() {
        for (int i = 1; i < MAXLEN; i++)
            PowP1[i] = (PowP1[i - 1] * p1) % mod1,
                    PowP2[i] = (PowP2[i - 1] * p2) % mod2;
    }
    void CreatHash(string s) {
        int len = s.size();
        data1[0] = data2[0] = s[0];
        for (int i = 1; i < len; i++)
            data1[i] = (data1[i - 1] * p1 % mod1 + s[i]) % mod1,
                    data2[i] = (data2[i - 1] * p2 % mod2 + s[i]) % mod2;
    }
    int substr1(int x,int y) {
        if (x == 0)
            return data1[y];
        return ((data1[y] - (data1[x - 1] * PowP1[y - x + 1]) % mod1) + mod1) % mod1;
    }
    int   substr2(int x,int y) {
        if (x == 0)
            return data2[y];
        return ((data2[y] - (data2[x - 1] * PowP2[y - x + 1]) % mod2) + mod2) % mod2;
    }
}Hash;
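
A minimal usage sketch of the struct above (an assumption about the intended use, not part of the original code): after Hash.init() and Hash.CreatHash(s), two substrings of s can be compared in O(1) by comparing both hash values.

//True if s[a..a+len-1] and s[b..b+len-1] are (almost certainly) equal; 0-based indices
bool sameSubstr(int a, int b, int len) {
    return Hash.substr1(a, a + len - 1) == Hash.substr1(b, b + len - 1)
        && Hash.substr2(a, a + len - 1) == Hash.substr2(b, b + len - 1);
}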

Some notes about trees

A tree can be regarded as a connected graph without cycles; a tree with n nodes has n-1 edges.
Generally speaking, to make queries convenient, unrooted trees basically do not appear in data-structure problems.

For a rooted tree:
to maintain interval information, recursive queries are usually used;
for a single-point query, a pointer variable is usually kept and the query walks down from the root;
the segment tree shows this best: the ordinary segment tree answering interval sums uses recursive queries, while the weight segment tree answering single-point rank queries walks a pointer down from the root.

Dictionary tree (trie)

The dictionary tree is a multi-way tree used to store a collection of strings. It can answer queries about this string collection including, but not limited to:
1. How many times a string appears in the collection
2. How many times a string appears as a prefix of strings in the collection

const int CHAR_NUM = 200;
const int MAXN = 50;	//Number of strings in the dictionary
const int MAXM = 1010;	//Maximum length of string
const int NUM = MAXN * MAXM;	//Space = number * length

struct Trie{
    int data[NUM][CHAR_NUM];
    int cnt=0;
    bool val[NUM];//The maintenance of various information is related to this variable
    void add(string s){
        int p=0,len=s.size();
        for(int i=0;i<len;i++){
            if(!data[p][s[i]])
                data[p][s[i]]=++cnt;
            p=data[p][s[i]];
        }
       val[p]=true;
    }
    bool query(string s){
        int p=0,len=s.size();
        for(int i=0;i<len;i++){
            p=data[p][s[i]];
            if(!p)return false;
        }
        return val[p];
    }
}trie;

Dictionary tree -> 01 dictionary tree (01-trie)

The dictionary tree stores strings; the 01 dictionary tree stores numbers.
Write each number in binary as a string containing only 0 and 1 (pad to a fixed length), and then the same kinds of queries can be answered on these strings.

Because of the special structure of 0/1 strings, the 01-trie is often used for bitwise operations over a set of numbers, most famously maximum XOR, as in The XOR Largest Pair (a sketch is given below).

The code stays the same, except that CHAR_NUM becomes 2
and each number is converted into an equal-length 0/1 string.
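
A hedged sketch of a 01-trie for The XOR Largest Pair style queries: insert every number bit by bit from the highest bit, then for each number walk down preferring the opposite bit at every level. The bit width (31) and the node count are assumptions about the input range.

const int BITS = 31;                        //assumed value range: numbers fit in 31 bits
const int NODE = 100005 * (BITS + 1);       //assumed upper bound on node count
int ch[NODE][2], tot = 0;

void insert01(int x) {
    int p = 0;
    for (int i = BITS - 1; i >= 0; i--) {
        int b = (x >> i) & 1;
        if (!ch[p][b]) ch[p][b] = ++tot;
        p = ch[p][b];
    }
}
//Maximum of x ^ y over all previously inserted y (at least one must have been inserted)
int maxXor(int x) {
    int p = 0, ans = 0;
    for (int i = BITS - 1; i >= 0; i--) {
        int b = (x >> i) & 1;
        if (ch[p][b ^ 1]) { ans |= 1 << i; p = ch[p][b ^ 1]; }
        else p = ch[p][b];
    }
    return ans;
}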

Disjoint set union (union-find)

It manages a collection of sets and supports merging sets.
Information about a set is recorded at the root of that set; by adding extra variables, it can maintain queries including but not limited to:
1. Whether two elements belong to the same set (e.g. the classic "food chain" problem)
2. The size of a set
3. The position of an element within its set (e.g. "Ginga Eiyudensetsu")
and so on.

There are many variants of union-find, but plain path compression is simple and makes the union-find code surprisingly short.

int fa[maxn];
void init(int n){
	for(int i = 1 ;i <= n; i++)
		fa[i] = i;
}
int find(int x){
    return fa[x] = (fa[x] == x? x:find(fa[x]));
}
void merge(int x,int y){
    fa[find(x)] = find(y);
}

Union-find -> revocable union-find

The revocable union-find is a simple derivative of the union-find.

The goals of the revocable union-find are:
to support the basic union-find operations;
to be able to roll the structure back to its k-th version, discarding every merge performed after version k.

The basic idea of the revocable union-find is:
to keep an undo stack; to roll back to version k, undo the changes recorded after version k, in reverse order.

Path compression would lose the old parent information, so it is not used; union by rank is used instead.
For example, fa[1] = 2, fa[2] = 2, fa[3] = 3;
now merge the sets of 2 and 3, so fa[2] = 3. Then query the root of node 1: with path compression, fa[1] would now point to 3.
If we then ask to undo the merge of 2 and 3, within the allowed operations we can only restore fa[2], but fa[1] would still point to 3.

The basic idea of union by rank: merge the set carrying less information into the one carrying more, so that as little information as possible has to be changed when rolling back to an earlier version.

Note: in some offline problems, you can sort the input first and then use a revocable union-find instead of a persistent one.

int fa[maxn],sz[maxn];//Generally, set size is regarded as rank
stack<pair<int,int>> st; // Version number, modification location
void init(int n){
	for(int i = 1 ;i <= n; i++)
		fa[i] = i,sz[i] = 1;
}
int find(int x){
    return fa[x] == x? x:find(fa[x]);
}
void merge(int x,int y,int id){
	x = find(x) ; y = find(y) ; 
	if(x == y)return ;
    if(sz[x]>sz[y])swap(x,y);
    fa[x] = y; sz[y] += sz[x];//Merge x into y
    st.push({id,x});
}
void cancel(int id){
	while(!st.empty() && st.top().first > id){
		int p = st.top().second;
		st.pop();
		sz[fa[p]] -= sz[p];
		fa[p] = p;
	}
}

Union-find -> union-find with point deletion

A simple extension of the union-find: when a point is deleted, the merges that went through that point are broken.
To preserve the parent links, union by rank is still used instead of path compression.
When a point is deleted, its parent link is pointed at a non-existent node (if indices start at 1, then 0 is a good choice).
Later, whenever the parent of a node is examined, if fa[fa[x]] == 0 the parent has been deleted; set fa[x] = x at that moment, and the sets that were merged through the deleted point come apart. A rough sketch is given below.
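
A rough sketch following the description above (untested, so treat it as an assumption rather than a verified template; in particular, a deleted point that is queried again later simply becomes a fresh singleton):

int fa[maxn], sz[maxn];          //1-based elements; index 0 plays the role of the non-existent node
void init(int n) {
    for (int i = 1; i <= n; i++) fa[i] = i, sz[i] = 1;
}
int find(int x) {                //no path compression, so parent links stay meaningful
    if (fa[fa[x]] == 0) fa[x] = x;   //the parent was deleted: cut the link here
    return fa[x] == x ? x : find(fa[x]);
}
void remove_point(int x) {       //delete x; its children detach lazily inside find()
    fa[x] = 0;
}
void merge(int x, int y) {       //union by size, as in the revocable version
    x = find(x); y = find(y);
    if (x == y) return;
    if (sz[x] > sz[y]) swap(x, y);
    fa[x] = y; sz[y] += sz[x];
}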

Tree array (binary indexed tree)

The tree array is a data structure that maintains prefix information and supports single-point modification. It uses a doubling idea: each index maintains an interval whose length is a power of two (for example d[4] covers 1-4 and d[8] covers 1-8), so after building, both modification and query take O(log n) time.
Everything a tree array can do, a segment tree can also do, but, but, but
it's really short.
Whoever came up with lowbit was a genius.

int d[maxn];
int lowbit(int x) {return x & (-x);}
void edit(int x,int p){      //add x at position p
    while(p<maxn){
        d[p] += x;
        p = p + lowbit(p);
    }
}
int pre(int p){              //prefix sum of positions 1..p
    int sum = 0;
    while(p>0){              //p > 0: lowbit(0) = 0 would loop forever
        sum+=d[p];
        p -= lowbit(p);
    }
    return sum;
}
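
A quick usage note: a range sum is just the difference of two prefix sums, using the 1-based indexing assumed above.

//Sum of the original array over [l, r], 1-based
int range_sum(int l, int r) {
    return pre(r) - pre(l - 1);
}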

Segment tree

The ways humans play with segment trees are as flashy as the ways they play with Dijkstra.
The segment tree mainly maintains queries over interval information.
If the information of two adjacent intervals can be merged, then the information of a large interval can be obtained from the information of its two halves. So we only need to preprocess the small intervals, and later queries can skip recomputing them, which reduces the time complexity.

In practice, most interval information can indeed be obtained by merging, for example:
interval sum, interval maximum, interval coverage, interval mode, and so on.

Lazy (delayed) tags are used when a modification covers a node's whole interval: to save time, pushing the modification down to the children is postponed for the moment. If the modification is a single-point modification there is no need for a lazy tag; after the modification there is nothing left to delay.

When small intervals are merged into a large one, some information can get out of date; whenever the children change, pushup has to be used to rebuild the parent node.
The simplest example is the scanline (shown below).

//Segment tree maintaining interval sums

struct SEG{
    int l,r,tag;
    int data;//interval sum
}nod[MAXN << 4];
int val[MAXN<<4];

void build(int l,int r,int p){// Function entry build(l,r,1)
    nod[p] = {l,r,0,0};
    int mid = (r+l) / 2 ,ls = p * 2,rs = p * 2 + 1;
    if(l == r)
        nod[p].data = val[l];
    else {
        build(l,mid,ls) ;
        build(mid+1,r,rs);
        nod[p].data = nod[ls].data + nod[rs].data;
    }
}

void pushdown(int p){//The lazy flag indicates that the current location has been modified, and the following locations have not been modified
    int ls = p * 2,rs = p * 2 + 1 ;
    nod[ls].data += nod[p].tag * (nod[ls].r - nod[ls].l + 1);
    nod[rs].data += nod[p].tag * (nod[rs].r - nod[rs].l + 1);
    nod[ls].tag += nod[p].tag, nod[rs].tag += nod[p].tag;
    nod[p].tag = 0;
}

void edit(int l ,int r ,int num,int p){ //Function entry edit(l,r,num,1)
    int ls = p * 2 ,rs = p * 2 + 1;
    if (nod[p].tag)pushdown(p);
    //If a pending tag could affect how the information below is updated, push it down before modifying
    if (l <= nod[p].l && nod[p].r <= r) {//the node's interval is fully contained in [l, r]
        nod[p].data += num * (nod[p].r - nod[p].l + 1);
        nod[p].tag  += num;
        return ;
    }
    if (l<= (nod[p].l+nod[p].r) / 2) edit(l,r,num,ls);
    if (r > (nod[p].l+nod[p].r) / 2) edit(l,r,num,rs);
    nod[p].data = nod[ls].data + nod[rs].data ;
}

int query(int l, int r, int p) {// Function entry query(l,r,1)
    int ls = p * 2 , rs = p * 2 + 1; 
    if (nod[p].tag)pushdown(p);
    if (l <= nod[p].l && nod[p].r <= r)//Full inclusion
        return nod[p].data;
    int ans = 0,mid = (nod[p].l+nod[p].r) / 2;
    if(l <= mid) ans += query(l,r,ls);
    if(r >  mid) ans += query(l,r,rs);
    return ans;
}

Discretization

Discretization is an effective way to reduce space complexity when the data values are large or complicated.
The mapping from new index to original value can simply be stored in an array; the mapping from original value to new index has a huge key range, so it goes into a map rather than an array.

//Store the read data in num, then overwrite the values in num with the discretized results
//lsh: new index -> original value    mp: original value -> new index

int lsh[MAXN],num[MAXN];
map<int,int> mp;
void LSH(int n){
    for(int i=1;i<=n;i++)
        lsh[i]=num[i];
    sort(lsh+1,lsh+1+n);
    int cnt=unique(lsh+1,lsh+n+1)-lsh-1;//cnt distinct values remain in lsh[1..cnt]
    for(int i=1;i<=n;i++)
        num[i]=lower_bound(lsh+1,lsh+cnt+1,num[i])-lsh;
    for(int i=1;i<=cnt;i++)//only the first cnt entries of lsh are valid keys
        mp.insert({lsh[i],i});//original value -> new index
}

Discretization + segment tree -> scanline

Scanline template.
The scanline is handled with a segment tree. After reading the data, sorting, discretizing and doing some miscellaneous processing, the main problem the scanline has to solve is:
when the scanline is at some position there are n points, each with a weight; some of the points may be covered, and the covered part is not contiguous. How do we find the total weight of the covered points?

Therefore each node of the segment tree records two values: tag and sum.
tag is the number of times this node's interval is completely covered. If it is currently non-zero, nothing further down needs to be maintained; the full length of the interval is reported directly.
sum is the covered length inside the current interval. If the interval is not completely covered, sum is the sum of the covered lengths of its two children.

For a covering operation, the tag only has meaning at the nodes where it is placed: pushing it up is meaningless, and pushing it down is redundant.
For example, a tag on 2-3 means 2-3 is covered, but 2-3 being covered does not mean 1-3 is covered.
Covering 2-3 could be pushed down to covering 2-2 and 3-3, but in practice, when we read a non-zero tag at 2-3 we end the recursion there instead of recursing further down.

Since scanline events always come in +1/-1 pairs, never moving the tags around greatly reduces the difficulty of maintenance (a sketch of the sweep that drives the edit below is given after the code).

//Core code of the scanline (the segment tree part)
void edit(int l,int r,int num,int p){
    if(l > nod[p].r || r < nod[p].l)
        return ;
    if (l <= nod[p].l && nod[p].r <= r) {//Complete, inclusive
        nod[p].tag += num;
        if (nod[p].tag)
            nod[p].dat = lsh[nod[p].r + 1] - lsh[nod[p].l];
        else
            nod[p].dat = (l == r?0:nod[p * 2].dat + nod[p * 2 + 1].dat);
        return;
    }
	int mid = (nod[p].l + nod[p].r) / 2;
    if( l <= mid )edit(l,r,num,p*2);
    if( r >  mid )edit(l,r,num,p*2+1);
    if(!nod[p].tag)nod[p].dat = nod[p * 2].dat + nod[p * 2 + 1] .dat;
}
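
A hedged sketch of how the edit above is typically driven for a rectangle-union area problem: every rectangle contributes a +1 event on its left edge and a -1 event on its right edge, both carrying the discretized y-range; between two consecutive x positions the covered y-length sits in nod[1].dat. The Event struct, the sweep function and the integer coordinates are assumptions, not part of the original template. The last event never contributes area, so it is not applied.

#include <vector>
#include <algorithm>

//One vertical edge of a rectangle: x position, discretized y-range [y1, y2), and +1/-1
struct Event { long long x; int y1, y2, k; };

//Sweep from left to right; assumes nod, edit and the discretization array lsh above are already built
long long sweep(std::vector<Event> ev) {
    std::sort(ev.begin(), ev.end(),
              [](const Event& a, const Event& b){ return a.x < b.x; });
    long long area = 0;
    for (size_t i = 0; i + 1 < ev.size(); i++) {
        edit(ev[i].y1, ev[i].y2 - 1, ev[i].k, 1);            //leaf j covers the strip [lsh[j], lsh[j+1]]
        area += 1LL * nod[1].dat * (ev[i + 1].x - ev[i].x);  //covered y-length times x-distance
    }
    return area;
}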

Discretization + segment tree -> weight segment tree

If there are n numbers, the weight segment tree can maintain the k-th smallest among the first m of them (m <= n).
It usually works together with the chairman tree to find the k-th smallest number inside an arbitrary interval; here we only talk about the weight segment tree itself.

First, read in the n numbers and discretize them; the discretized array stores each number's rank among all the original numbers.
Then insert the numbers from left to right in their original order. The position inserted is the number's rank, and the insertion recursively adds 1 to the weight of every node on the path from that leaf to the root (a sketch of this insertion is given after the query code below).
After the first m numbers have been inserted, we get a segment tree with these properties:
each node stores how many of the inserted numbers fall inside its range;
the numbers represented by the left son of a node are all smaller than those represented by the right son.

To query the k-th smallest, simply:
1. Keep a pointer variable starting at the root node.
2. Look at the weight of the left son.
3. If the weight of the left son is at least k, the k-th smallest lies in the left son, so move the pointer left; otherwise subtract the left son's weight from k and move right.
Repeat 2-3 until a leaf is reached.
The leaf reached gives the rank of the queried number, which maps back to the original value through the discretization array.

int query(int k){
    int p = 1;
    while(nod[p].l != nod[p].r){        //walk down until a leaf is reached
        if(nod[p * 2].dat >= k)
            p = p * 2;                  //the k-th smallest lies in the left son
        else{
            k -= nod[p * 2].dat;        //skip everything in the left son, then go right
            p = p * 2 + 1;
        }
    }
    return nod[p].r;                    //leaf position = discretized rank of the answer
}
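
As referenced above, a minimal sketch of the insertion routine; it assumes the same nod layout as the query (fields l, r, dat), built over [1, n] of discretized ranks with dat initially 0.

//Add one occurrence at discretized rank pos; entry: insert(pos, 1)
void insert(int pos, int p) {
    nod[p].dat++;                        //every node on the root-to-leaf path counts one more number
    if (nod[p].l == nod[p].r) return;
    int mid = (nod[p].l + nod[p].r) / 2;
    if (pos <= mid) insert(pos, p * 2);
    else            insert(pos, p * 2 + 1);
}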

Blocking, point divide and conquer, offline divide and conquer

QAQ

The idea of persistence

The data structures above all maintain the current state of the data, but sometimes a query needs the data as it was before some modification. What then?
Naturally: first open a new root node to distinguish the versions, then share the unchanged data by keeping pointers into the previous version's storage, and allocate new space only for the parts that change.

Because the child pointers now differ between versions, some of the usual addressing tricks no longer work, for example ls = 2 * p in the segment tree; explicit child pointers are stored instead.

Persistence is a bewitching prop that turns all kinds of nice data structures into hell.

Persistence + dictionary tree = persistent dictionary tree

As described above: for each inserted string, first open a new root node, copy (share) the child pointers of the corresponding old node along the path, and allocate new nodes for the newly inserted information.
I have not seen many problems on this.

//Written off the top of my head, not verified or polished; use with caution
struct Trie{
    int data[NUM][CHAR_NUM];
    int cnt= 1;
    int id = 1;
    bool val[NUM];
    int root[MAXN];
    void add(string s){
        int q = root[id] ,  p = root[++id] = ++cnt ,len=s.size();
        for(int i=0;i<len;i++){
            for(int j = 0;j<CHAR_NUM;j++)   //share the old node's children
                data[p][j] = data[q][j];
            val[p] = val[q];                //inherit end-of-string marks along the copied path
            data[p][s[i]] = ++cnt;          //always allocate a fresh node on the path (never reuse an old one)
            p = data[p][s[i]];
            q = data[q][s[i]];              //old counterpart, 0 if the old trie lacks this branch
        }
        for(int j = 0;j<CHAR_NUM;j++)       //the final new node also shares the old node's children
            data[p][j] = data[q][j];
        val[p]=true;
    }
    bool query(string s,int k){
        int p = root[k],len=s.size();
        for(int i=0;i<len;i++){
            p=data[p][s[i]];
            if(!p)return false;
        }
        return val[p];
    }
}trie;

Persistence + segment tree -> persistent segment tree

Also known as the chairman tree.
The basic idea is the same: open a new root node, share the unchanged information, and allocate new nodes only for the information that changes.
Because multiple versions are stored, a child node can no longer be located simply by multiplying by 2; explicit child pointers are kept instead.

How to query depends on whether the tree is used for interval maintenance or for single-point queries (a hedged sketch of a range-sum query is given after the code).

int root[maxn],val[maxn];
int cnt = 0;
struct SEG{
    int l,r;
    int dat;
    int ls,rs;
}nod[maxn * 40];

void build(int l,int r ,int p){
    nod[p] = {
            l,r,0,0,0
    };
    if(l == r){
        //nod[p].dat = val[l];   //depends on whether the problem needs initial values
        nod[p].rs = nod[p].ls = -1;
        return ;
    }
    int mid = (l+r) /2;
    build(l ,mid ,nod[p].ls = ++cnt);
    build(mid + 1,r,nod[p].rs = ++cnt);
    nod[p].dat = nod[nod[p].ls].dat + nod[nod[p].rs].dat;
}
void edit(int l ,int r ,int num ,int x ,int p){ //x: node in the old version, p: new node; entry: edit(pos,pos,num,root[old],root[new] = ++cnt)
    nod[p] = nod[x];
    if (nod[p].l == nod[p].r) {    //interval modification and single-point modification differ here
        nod[p].dat += num * (nod[p].r - nod[p].l + 1);
        return ;
    }
    if (l<= (nod[p].l+nod[p].r) / 2) edit(l,r,num,nod[x].ls,nod[p].ls = ++ cnt);
    if (r > (nod[p].l+nod[p].r) / 2) edit(l,r,num,nod[x].rs,nod[p].rs = ++ cnt);
    nod[p].dat = nod[nod[p].ls].dat + nod[nod[p].rs].dat ;
}
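
A hedged sketch of the query mentioned above, for the single-point-addition / range-sum flavour of the code (pass the root of the version to be queried as p; no lazy tags are involved because edits touch single points):

//Sum over [l, r] in the version whose root is p; entry: query(l, r, root[version])
int query(int l, int r, int p) {
    if (l <= nod[p].l && nod[p].r <= r)     //fully contained: use the stored sum
        return nod[p].dat;
    int mid = (nod[p].l + nod[p].r) / 2, ans = 0;
    if (l <= mid) ans += query(l, r, nod[p].ls);
    if (r >  mid) ans += query(l, r, nod[p].rs);
    return ans;
}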

Persistent segment tree -> persistent array

Although it is called an array, it is actually implemented with a chairman tree. As long as every operation on the chairman tree is a single-point operation, it becomes an array that records all of its versions.

Because only single-point queries are needed, the nodes can drop a lot of redundant information.

#define maxn 1000050
int val[maxn] ,root[maxn];
struct SEG{
    int l,r;
    int ls,rs;
    int dat;
}nod[maxn * 40];

int id = 0;
void build(int l,int r,int p){//Program entry build(1, n, root[0] = ++id)
    nod[p] = {l,r,-1,-1,0};
    if(l == r) {
        nod[p].dat = val[r];
        return ;
    }
    int mid = (l + r) / 2 ;
    build(l,mid,nod[p].ls = ++id);
    build(mid + 1, r,nod[p].rs = ++ id);
}

void edit(int d ,int num,int x,int p){ //Program entry edit(pos, num, root[old version], root[new version] = ++id)
    nod[p] = nod[x];
    if(nod[p].l == nod[p].r){
        nod[p].dat = num;
        return ;
    }
    if(d <= (nod[p].l + nod[p].r) / 2)
        edit(d, num, nod[x].ls, nod[p].ls = ++id);
    else
        edit(d, num, nod[x].rs, nod[p].rs = ++id);
}

int query(int d,int p){
    if(nod[p].l == nod[p].r)
        return nod[p].dat;
    if(d<=(nod[p].l + nod[p].r)/2)
        return query(d,nod[p].ls);
    else
        return query(d,nod[p].rs);
}

Persistent array -> persistent union-find

To put it bluntly, all a union-find really uses is the fa array; store that array in a persistent array and the union-find becomes persistent.
find and merge each take a version number. The original snippet has been lost, so a rough sketch is given below.
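
A rough sketch built directly on the persistent array above (fill val[i] = i for i in 1..n and call build(1, n, root[0] = ++id) to get the initial version). It does not persist a rank array, so the tree depth can degenerate; treat it as an illustration of the idea rather than a tuned implementation, and the version-numbering convention is an assumption.

//Representative of x in version v: read fa[x] from the persistent array and recurse
int find(int v, int x) {
    int f = query(x, root[v]);
    return f == x ? x : find(v, f);
}
//Merge the sets containing a and b; the result becomes version cur, derived from version v
void merge(int v, int cur, int a, int b) {
    a = find(v, a); b = find(v, b);
    root[cur] = ++id;
    if (a == b) { nod[root[cur]] = nod[root[v]]; return; }  //nothing changes: share everything
    edit(a, b, root[v], root[cur]);                         //write fa[a] = b into the new version
}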

Chairman tree + tree array -> dynamic chairman tree

QAQ

ST table

Doubling (sparse table) for range maximum / minimum.

//The ST table computes interval extrema; the code below computes the interval maximum
//Building the table takes O(n log n); each query takes O(1)
//The data must not change; if it does, the table has to be rebuilt

#define MAXN 1000050

int input[MAXN]={0};
int st[30][MAXN]={0};
//Precomputed powers of 2 for convenience
int bin[30]={1};

//st[i][j] refers to the maximum value from data[j] to data[j+2^i-1], and its length is 2^i
// [j, j+2^i-1]  ==  max{ [j, j+2^(i-1)-1] , [j+2^(i-1), j+2^i-1] }
//That is, st[i][j]=max{st[i-1][j],st[i-1][j+2^(i-1)]}

void creat_ST(int n){
    int N=(int)log2(n);
    for(int i=1;i<=N;i++) bin[i]=bin[i-1]*2;
    for(int i=1;i<=n;i++) st[0][i]=input[i];
    for(int i=1;i<=N;i++){
        for(int j=1;j<=n;j++){
            st[i][j]=(j+bin[i-1]<=n?max(st[i-1][j],st[i-1][j+bin[i-1]]):st[i-1][j]);
            //if the second half would run past n, fall back to the shorter interval
            }
        }
    return ;
}

//Find the maximum value of the subscript from start to end
int using_ST(int start,int end){
    int N=(int)log2(end-start+1);
    return max(st[N][start],st[N][end-bin[N]+1]);
}

Incomplete; to be updated.
