Analysis and code of TopK problem

Problem Description:

Find the maximum number of K from arr[1,n]

Method 1: heap sorting

Sorting algorithm reference: Summary of ten sorting algorithms

Idea:

Only topk is found, and topk is not sorted.

  1. First, use the first k elements to generate a small top heap (that is, a complete binary tree with a parent node smaller than a child node). This small top heap is used to store the current maximum k elements.
  2. Next, start scanning from the k+1 element and compare it with the top of the heap (the smallest element in the heap). If the scanned element is larger than the top of the heap, replace the elements at the top of the heap and adjust the heap to ensure that the k elements in the heap are always the current maximum k elements.
  3. After scanning all n-k elements, the k elements in the final heap are topk

Adopt priority queue:

The purpose of the following code is to get the k-th element using the priority queue

import java.util.PriorityQueue;
public class Solution{
    public int findKthLargest(int[] nums, int k){
        int len = nums.length;
        
        // Use a minimum heap of k elements
        PriorityQUeue<Integer> minHeap = new Priority<>(k,(a,b)->a-b);
        for(int i = 0; i<k;i++){
            minHeap.add(nums[i]);
        }
        
        for(int i = k; i<len;i++){
            Integer topEle = minHeap.peek();
            
            if(nums[i]>topELe){
                minHeap.poll();
                minHeap.offer(nums[i]);
            }
        }
        return minHeap.peek();
    }
}

Finally, the k elements in the heap in the above code are the topk

Complexity:

Scan n elements once. Assuming bad luck, adjust them in the heap every time. Adjust the time complexity to the height of the heap, i.e. lg(k), so the overall time complexity is O(n*logk).

Method 2: random selection

Random selection algorithm is a classical algorithm in the introduction to algorithms. Its time complexity is O(n), which is a method with linear complexity.

Quick sort

We know that the idea of fast scheduling is divide and conquer, and its code is as follows

public void quick_sort(int[]arr, int low, inthigh){ 
     if(low== high) return; 
     int i = partition(arr, low, high); 
     quick_sort(arr, low, i-1); 
     quick_sort(arr, i+1, high); 
}

Then the partition() method in quick sort will return an integer j such that a[l... j-1] is less than or equal to a[j], and a[j+1... h] is greater than or equal to a[j]. At this time, a[j] is the jth largest element of the array. You can use this feature to find the k-th element of the array.

It is easy to know that the time complexity of partition is O(n).

Relationship between partition and topk problem:

Topk is to find the largest number of k in arr[1... n]. If the largest number of k is found, make a partition, and find the largest number of k at one time, that is, the number of k in the right half of the partition.

So the key is how to find the k-th largest number. The reference code is as follows

public int findKthLargest(int[] nums, int k){
    int len = nums.length;
    int left =0;
    int right = len-1;
    
    // The subscript of the K-th element is len-k
    int target = len - k;
    
    while(true){
        int index = partition(nums, left, right);
        if(index == target){
            return nums[index]; //At this point, the index position in the array starts with topk
        }else if(index < target){
            left=index+1;
        }else{
            right = index-1;
        } 
    }      
}


public int partition(int[] nums, int left, int right){
    int pivot = nums[left]; 
    int j = left;
    for(int i=left+1; i<= right; i++){
        // If you always encounter a number smaller than pivot, j will increase with i
        // If a number greater than or equal to pivot is encountered, j will stop, i will scan the next number smaller than pivot, and then the two will be exchanged
        if(nums[i]<pivot){
            // If num [i] < pivot, j will increase and j will stay at a position greater than or equal to the pivot element
            j++;
            // Elements smaller than pivot are swapped to the front
            swap(nums,j,i);
        }
    }
    swap(nums, j, left);
    return j;
}

private void swap(int[] nums, int index1, int index2) {
    int temp = nums[index1];
    nums[index1] = nums[index2];
    nums[index2] = temp;
}

Another way to write the partition function:

private int partition(int[] a, int l, int h){
    int i = l, j = h + 1;
    while (true) {
        while (a[++i] < a[l] && i < h) ;
        while (a[--j] > a[l] && j > l) ;
        if (i >= j) {
            break;
        }
        swap(a, i, j);
    }
    swap(a, l, j);
    return j;
}

#Reference

Detailed explanation of the interviewer's favorite TopK problem algorithm - Zhihu (zhihu.com)

TopK (bestzuo.cn)

partition reduction + priority queue (Java) - the kth largest element in the array - LeetCode (leetcode-cn.com)

Keywords: Algorithm

Added by aopen on Thu, 30 Dec 2021 00:38:22 +0200