Heap downward adjustment algorithm, heap upward adjustment algorithm, heap implementation, Topk problem

catalogue

1, Pile

1. Concept of heap

2. Nature of heap

3. Heap classification

2, Downward adjustment algorithm of heap

3, Heap creation  

4, Heap up adjustment algorithm

5, Implementation of heap

1. Heap initialization

2. Heap destruction

3. Heap insertion

4. Deletion of heap

5. Get data from the top of the heap

6. Number of data in the heap

7. Empty judgment of heap

8. Heap printing

9. Test small example

  6, Topk problem

Test topk

 

1, Pile

1. Concept of heap

Heap: if there is a key set K={k0,k1,k2,..., kn-1}, all its elements are stored in a one-dimensional array in the order of a complete binary tree, and meet ki < = k2i + 1 and Ki < = k2i + 2 (or ki > = k2i + 1 and Ki > = k2i + 2), where i=0,1,2,..., the set is called heap.

2. Nature of heap

① The value of a node in the heap is always not greater than or less than the value of its parent node;

② Heap is always a complete binary tree.

3. Heap classification

Heap: the heap with the largest root node is called the maximum heap or large root heap. The value of the parent node is always greater than that of the child.

Small heap: the heap with the smallest root node is called the minimum heap or small root heap. The value of the parent node is always less than that of the child.

2, Downward adjustment algorithm of heap

Now we give an array, which is logically regarded as a complete binary tree. We can adjust it into a small heap through the downward adjustment algorithm starting from the root node. The downward adjustment algorithm has a premise: the left and right subtrees must be a heap before adjustment.

Basic idea of downward adjustment algorithm:   (take the construction of small piles as an example)

① Starting from the root node, select the smaller value of the left and right child nodes

② Compare fathers with younger children

If the father is larger than the child, then exchange

If the father is smaller than this child, no exchange will be made

③ End condition

1. Father < = small children stop

2. Adjust to the leaf node (the leaf node is characterized by no left child, that is, if the array subscript exceeds the range, it does not exist)

The code is as follows:

void Swap(int* x,int* y)
{
int*tmp=*x;
*x=*y;
*y=tmp;
}
void AdjustDown(int* a, int n, int parent) //Note that the right child may not exist
{
	//Child records the subscript of the smaller child in the left and right children
	int child = 2 * parent + 1; //First, the value of the left child is smaller by default
	while(child<n)
	{ 
		if (child + 1 < n && a[child + 1] < a[child])//The right child exists and the right child is smaller than the left child
		{
			child++;//The younger child is changed to the right child
		}
		if (a[child < a[parent]])
		{
			Swap(&a[child], & a[parent]);
			parent = child;
			child = parent * 2 + 1;
		}
		else
		{
			break;
		}
	}

}

Detailed downward adjustment algorithm process:  

3, Heap creation  

How to adjust an arbitrary tree to heap?

Here, we can use the heap downward adjustment algorithm just written. We know that to meet the heap, the left and right subtrees must be large or small piles. We can use this idea to walk backwards from bottom to top, starting from the first non leaf node, and adjust it downward as a root through the control of array subscript, Until the current path is adjusted to a large or small pile that meets the conditions

The code is as follows:

	//Build heap n is the number of arrays, n-1 is the subscript (n-1-1) / 2 of the last element, and find the father of the first non leaf node
	for (int i = (n - 1 - 1) / 2; i >= 0; i--)
	{
		AdjustDown(hp->a, hp->size, i);
	}

The time complexity of heap building is O(N)

Because the heap is a complete binary tree, and the full binary tree is also a complete binary tree, the full binary tree is used here for simplification (the time complexity is an approximate value, and more nodes do not affect the final result):

 

4, Heap up adjustment algorithm

When we already have a heap, we need to insert data at the end of the heap and adjust it to keep the heap structure. Here we need to use the heap up adjustment algorithm

Basic idea of heap up adjustment algorithm: (taking building a small heap as an example)

① Compare the data to be inserted with its parent node data

② If the data of the child node is less than the data of the parent node, it is exchanged

If the data of the child node is larger than the data of the parent node, no exchange and adjustment are required, and the heap structure has been met

The code is as follows:

void Swap(int* x, int* y)
{
	int* tmp = *x;
	*x = *y;
	*y = tmp;
}
void AdjustUp(int * a, int child)
{
	int parent = (child - 1) / 2;
	while (child>0)
	{
		if (a[parent] > a[child])
		{
			Swap(&a[child],&a[parent]);
			child = parent;
			parent = (child - 1) / 2;
		}
		else 
		{
			break;
		}
	}
}

Detailed upward adjustment algorithm:

  Now that we know the heap up and down adjustment algorithms, we can start to implement the heap structure

5, Implementation of heap

We need a structure to represent the heap

typedef int HPDataType;//Facilitate subsequent modification of data types
typedef struct Heap
{
	HPDataType* a;//Pointer for dynamic development
	int size;
	int capacity;
}Heap;

1. Heap initialization

We still set size and capacity to 0 and array a to NULL.

void HeapInit(Heap* hp)
{
	assert(hp);
	hp->capacity = hp->size = 0;
	hp->a = NULL;
}

2. Heap destruction

The array used for dynamic development should be released and returned to the operating system in order to avoid memory leakage

void HeapDestroy(Heap* hp)
{
	assert(hp);

	free(hp->a);//Release dynamic array
	hp->a = NULL;//Empty in time
	hp->capacity =hp->size = 0;//Set the number of elements to 0
}

3. Heap insertion

After inserting data into the heap, its structure will still be the heap, so we need to adjust it upward. At the beginning, both size and capacity are 0, so we need to expand the capacity first and then put the data in

void HeapPush(Heap* hp,HPDataType x)
{
	assert(hp);
	if (hp->capacity == hp->size)
	{
		size_t newcapacity = hp->capacity == 0 ? 4 : hp->capacity * 2;
		HPDataType* tmp = (HPDataType*)realloc(hp->a, sizeof(HPDataType) * newcapacity);
		if (tmp == NULL)
		{
			printf("realloc fail\n");
				exit(-1);
		}
		hp->a = tmp;
		hp->capacity = newcapacity;
	}
	hp->a[hp->size] = x;
	hp->size++;
	AdjustUp(hp->a, hp->size - 1);
}

4. Deletion of heap

Deleting the heap is to delete the data at the top of the heap, exchange the data at the top of the heap with the last data, and then delete the last data. The remaining data can be adjusted downward. Due to the array structure, we only need size -- which is equivalent to deleting that data

void HeapPop(Heap* hp)
{
	assert(hp);
	assert(!HeapEmpty(hp));
	Swap(&hp->a[0],&hp->a[hp->size-1]);
	hp->size--;
	AdjustDown(hp->a,hp->size,0);
}

5. Get data from the top of the heap

We can directly use the array subscript to return data

//Take the data from the top of the heap
HPDataType HeapTop(Heap* hp)
{
	assert(hp);
	assert(!HeapEmpty(hp));
	return hp->a[0];
}

6. Number of data in the heap

Just return size directly

//Total number of data in the heap
int HeapSize(Heap* hp)
{
	assert(hp);

	return hp->size;
}

7. Empty judgment of heap

Judge whether the number of data in the heap is 0

//Empty judgment of heap
bool HeapEmpty(Heap* hp)
{
	assert(hp);

	return hp->size == 0;//0 is false, not 0 is true
}

8. Heap printing

void HeapPrint(Heap* hp)
{
	assert(hp);
	for(int i = 0; i < hp->size; i++)
	{
		printf("%d ",hp->a[i]);
	}
	printf("\n");
}

9. Test small example

#include"Heap.h"

int main()
{
	int arr[] = { 27,15,19,18,28,34,65,49,25,37 };
	Heap hp;
	HeapInit(&hp);
	int i = 0;
	for (i = 0; i < sizeof(arr) / sizeof(arr[0]); i++)
	{
		HeapPush(&hp, arr[i]);
	}
	HeapPrint1(&hp);
	HeapPop(&hp);
	HeapPop(&hp);
	HeapPop(&hp);
	HeapPop(&hp);
    HeapPrint1(&hp);
	printf("top=%d\n", HeapTop(&hp));
	printf("size=%d\n",HeapSize(&hp));
	HeapDestroy(&hp);
	return 0;

}

test result

  6, Topk problem

That is, find the first K largest or smallest elements in the data combination. Generally, the amount of data is relatively large.

For example, the top 10 professional players, the world's top 500, the rich list, the top 100 active players in the game, etc.

For the Top-K problem, the simplest and direct way you can think of is sorting. However, if the amount of data is very large, sorting is not desirable (maybe all the data can not be loaded into memory at once). The best way is to use heap. The basic idea is as follows:

1. Use the first K elements in the data set to build the heap

For the first k largest elements, build a small heap

For the first k smallest elements, build a lot

2. Compare the remaining N-K elements with the top elements in turn. If not, replace the top elements

After comparing the remaining N-K elements with the top elements in turn

3. The remaining K elements in the heap are the first K minimum or maximum elements

The code is as follows:  

//topk problem
void PrintTopK(int* a, int n, int k) 
{
	// 1. Build a heap -- build a heap with the first k elements in a (insert one by one)
	Heap hp;
	HeapInit(&hp);
	for(int i=0;i<k;i++)
	{
		HeapPush(&hp,a[i]);
	}

	// 2. Exchange the remaining n-k elements with the top elements in turn, and replace them if they are not satisfied
	for (int i = k; i < n; i++)
	{
		if (a[i] > HeapTop(&hp))
		{
			HeapPop(&hp);
			HeapPush(&hp,a[i]);
		}
	}
	HeapPrint(&hp);
	HeapDestroy(&hp);

	
}

Test topk

void TestTopk()
{
	int n = 10000;
	int* a = (int*)malloc(sizeof(int) * n);
	srand((unsigned int)time(0));
	for (size_t i = 0; i < n; ++i)
	{
		a[i] = rand() % 1000000;
	}
	a[5] = 1000000 + 1;
	a[1231] = 1000000 + 2;
	a[531] = 1000000 + 3;
	a[5121] = 1000000 + 4;
	a[115] = 1000000 + 5;
	a[2335] = 1000000 + 6;
	a[9999] = 1000000 + 7;
	a[76] = 1000000 + 8;
	a[423] = 1000000 + 9;
	a[3144] = 1000000 + 10;
	PrintTopK(a, n, 10);
}
int main()
{
	TestTopk();
	return 0;
}

Here, we use the random number generation method to randomly generate 10000 numbers less than 100000, and then randomly construct 10 numbers greater than 100000. We can find these 10 numbers greater than 100000 by using Topk function!

Test results:

  Thank you for watching!

 

 

Keywords: data structure Binary tree

Added by joelg on Mon, 22 Nov 2021 08:15:20 +0200