The worst case is the linear time selection algorithm

The worst case is the linear time selection algorithm

  • Reference: [algorithm] Introduction to algorithm: https://www.bilibili.com/video/BV1Tb411M7FA?p=6

Ask questions: find the K-th largest number from an array, that is, the TOPK problem. This problem is often encountered in interviews and research. Then, how should this problem be solved?

  • Of course, we will think of sorting. We can use sorting algorithm to make the array orderly, and then select it directly. Using quick sort, merge sort, or heap sort can make the time complexity O(nlgn)
  • Build the heap and take out the first K numbers. When K is close to 0 or the length of the array, the time complexity is almost linear, but when K tends to the median, the complexity will become nlgn

Today we will introduce an algorithm that makes the time complexity of selecting TOPK O (n), that is, the worst case is the linear time selection algorithm (Introduction to algorithm, YYDS).

1: Detailed algorithm

  • The array is divided into several arrays, and each subarray contains 5 elements. Since the length of the array is not necessarily an integer multiple of 5, the length of the last array is allowed to be less than 5
  • Find the median of each subarray and put it on the second position of each subarray, that is, all the median are arranged in a straight line
  • Recursively call select to find the median of the obtained median, that is, the median on a straight line
  • The original array is divided into two parts using a method similar to snapshot. Let K be one more than the number of elements in the divided low area, so X is the element with the smallest K, and there are n - k elements in the divided high area.
  • If i = k, we found it
    • If I < K, a recursive call is made in the low area to find the i-th smallest element.
    • If I > k, the i - k smallest element will be found in the recursive call in the high area (we have removed the K smallest element, so we will find the i - k smallest element in the following array)

2: Code implementation

#include <stdlib.h>
#include <stdio.h>
#define swap(a,b) (a)^=(b);(b)^=(a);(a)^=(b)
#define MAX 1000

void sort(int* input, int size){
    printf ( "sort arry size = %d\n", size );
    int i,j;
    for(i = 0; i< size ; i++){
        for(j = 0; j<size-i-1;j++){
            if(input[j]<input[j+1]){
                swap(input[j],input[j+1]);
            }
        }
    }
}
void output(int * input, int size){
    for(;size>0 && *input;size--,input++){
        printf("%d ", *input);
    }
    printf("\n");

}

int partion(int *input, int size, int key){
    printf ( "--------------Step4---------------\n" );
    printf("key = %d \n", input[key]);
    int *head, *tail;
    head = input;
    tail = head + size - 1;
    swap(*head, input[key]);

    int *k = head;
    while(head<tail){
        while(*tail && *k >= *tail){
            tail--;
        }
        if(tail<=head) break;
        swap(*k,*tail);
        k = tail;
        while(*head && *k < *head)
            head++;
        if(head>=tail) break;
        swap(*k,*head);
        k = head;
    }
    output(input, size);
    printf ( "--------------Step4 done--------------\n" );
    return k-input+1;
}

int kselect(int *input, int size, int k){
    printf ( "start element : %d \n", *input );
    if(size<=5){
        sort(input, size);
        return input[k-1];
    }
    int mid[MAX] = {0};
    int midvalue[MAX] = {0};
    int groups = size/5;
    int i;

    printf ( "-----------------step 1, 2--------------\n" );
    for(i = 0; i<groups;i++){
        sort(input+i*5, (i*5+5 > size) ? (size-1):5);
        printf ( "sorted group %d:\n", i );
        output(input+i*5, 5);
        mid[i] = i*5 + 2;
        midvalue[i] = input[i*5 + 2];
    }

    printf ( "-----------------step 1, 2 done--------------\n" );

    printf ( "---------step3-------------\n" );
    sort(midvalue, groups);
    printf ( "---------step3 done-------\n" );
    int m = -1;
    for(i = 0; i<5;i++){
        if(input[mid[i]] == midvalue[groups/2]){
            m = partion(input, size, mid[i]);
        }
    }
    if(m == k){
        return input[m-1];
    }
    if(k<m){
        return kselect(input,m,k);
    }
    else{
        return kselect(input+m, size - m, k-m);
    }
    return 0xffff;
}

int main(){
    int input[] = {1,3,2,10,5,11, 12, 8 ,6, 7};     /*Output the 7th largest element*/
    int r = kselect(input,sizeof(input)/sizeof(int), 7);
    printf("result %d \n", r);
    return 0;
}

3: About the author

  • This algorithm was designed by Blum, Floyd, Pratt, Rivest and Tarjan. I just started seeing this. I only know Floyd. I had no idea how deep the water was
    • Floyd, the only person I know. I have studied Floyd algorithm, which can calculate the distance between any two fixed points in the graph, the weight can be negative, and the efficiency is higher than dijkstra algorithm. 1978 Turing
    • Blum, in integer decomposition, the second Blum in the Blum Blum Shub encryption algorithm is him. 1995 Turing
    • Pratt, the P in KMP algorithm is him! Huh? What does KMP say?
    • Rivest, the discoverer of RSA encryption algorithm. Is RSA symmetric or asymmetric encryption? As a result, he won Turing award in 2002
    • Tarjan, an expert in graph theory, invented LCA (nearest common ancestor) and strong connected component algorithm. It also invented fiboracci heap and play data structure. And he analyzed and searched the collection, and obtained Turing in 1986
  • Almost all staff Turing, everyone has made outstanding contributions to the development of computer science! Tarjan is a student of Floyd and Knuth. Knuth is the author of The Art of Computer Programming and the inventor of tex. Turing was obtained at the age of 36.

4: Little thinking

  • The algorithm divides the array into small arrays with a length of 5. Why 5?
    • 3 is that ok?
    • 7 is that ok?

Keywords: Algorithm data structure

Added by ChessclubFriend on Tue, 18 Jan 2022 01:00:08 +0200