Quick sort and its optimization super detailed answer + code (real understanding)

Original text: https://zhuanlan.zhihu.com/p/...
Welcome to my Zhihu: https://www.zhihu.com/people/...

QuickSort adopts the divide and conquer method, which divides the array linked list or other element sets into the to be sorted set and the sorted set, and converts the elements of the to be sorted set into the sorted set in one iteration until all the elements are sorted.

Quick sort uses this strategy to save the cost of processing sorted elements. The algorithm only focuses on the remaining elements to be sorted, and the substrings of unordered elements with continuous positions are divided into S1 (left), pivot (exchange hub element) and S2 (right)

Take quick sort to realize ascending sort as an example:

First, select an array from the array as the pivot element pivot. (the strategy of selecting HUB elements is crucial)
Move all elements of the set to be sorted smaller than the pivot element pivot to S1 (left), and all elements larger than the pivot element pivot to s2 (right): (opposite pointer method in double pointer)
First, put the pivot element pivot (quasi ordered part) to the left or right to distinguish the parts to be sorted.
Point I and j to the leftmost and rightmost part of the index to be sorted respectively.
If the element pointed to by the i index is smaller than the hub element, i + +; Otherwise, i stop. The element pointed to by the j index is larger than the hub element, J --; Otherwise, J stop.
If i < J, exchange two elements and continue to cycle steps 3 and 4; Otherwise, jump out of the loop and exchange the element corresponding to i with the pivot element pivot (at this time, the segmentation is completed)
At this time, the pivot element pivot is defined as the sorted part, and the remaining unordered parts S1 (left) and S2 (right) continue to cycle steps 1, 2 and 3 until all elements have been sorted.
The most primitive quick sort
The most primitive quick sort adopts the simplest and rough hub element selection strategy, that is, select the first or last element as pivot.

The average time complexity of this scheme is O(nlogn). When the array is just ordered, the worst time complexity O(n^2) will appear, because when the input sequence itself is ordered, the S1 or S2 set will be empty, and all elements except the hub element are in S1 or S2. At this time, the time efficiency of time exchange will be greatly reduced, because the role of another interval is wasted.

Python code

nums = [23,2,4,6,2,5,1,6,13,54,8]

def quicksort(nums, left, right):   # left is the leftmost index and right is the rightmost index
    if left >= right:
        return
    pivot = left //Take the first element as pivot
    i, j = left, right
    while i < j:
        while nums[pivot] <= nums[j] and i < j:
            j -= 1
        while nums[pivot] >= nums[i] and i < j:
            i += 1
        if i < j:
            nums[i], nums[j] = nums[j], nums[i]
    nums[pivot], nums[i] = nums[i], nums[pivot]
    quicksort(nums, left, i-1)
    quicksort(nums, i+1, right)

def quicksort2(nums, left, right):          # Stack instead of recursion
    if left >= right:
        return
    stack = []
    while stack or left < right:
        if left < right:
            pivot = left
            i, j = left, right
            while i < j:
                while nums[pivot] <= nums[j] and i < j:
                    j -= 1
                while nums[pivot] >= nums[i] and i < j:
                    i += 1
                if i < j:
                    nums[i], nums[j] = nums[j], nums[i]
            nums[pivot], nums[i] = nums[i], nums[pivot]
            stack.append((left, i, right))
            right = i - 1
        else:
            left, mid, right = stack.pop()
            left = mid + 1

quicksort2(nums, 0, len(nums)-1)
print(nums) 

Due to stack storage (recursion is equivalent to stack substitution), the space complexity of quick sorting is O(logn)

Therefore, S1 and S2 are approximately balanced. The higher the efficiency of quick sorting, that is, the median (median) is the best choice for hub element (because the sequence can be divided into two subsequences. Merging sorting tells us that it is O(NlogN) at this time)

However, it is time-consuming to calculate the median of a group of arrays, which will slow down the efficiency of fast sorting.

Small question: why do you have to translate the right pointer j instead of the left pointer i when the pivot element pivot is placed on the left?

Answer: in order to avoid the pivot itself being exchanged in the translation exchange process.

Optimization of quick sort
Three number median quick sorting (pivot selected by three number median method)
Although calculating the median of a group of arrays is time-consuming, it will slow down the efficiency of fast sorting. However, it can be replaced by calculating the first left, right left / 2 (rounded down or up) and the median of the last right element of the array.

Python code

def quicksort_opt1(nums, left, right): 
    if left >= right:
        return
    stack = []
    while stack or left < right:
        if left < right:
            l, m, r = left, (right-left)//2. Right # first, middle and last
            pivot = findmedian(l, m, r) # Selecting pivot element pivot by taking the median value of three numbers
            i, j = left, right
            while i < j:
                while nums[pivot] <= nums[j] and i < j:
                    j -= 1
                while nums[pivot] >= nums[i] and i < j:
                    i += 1
                if i < j:
                    nums[i], nums[j] = nums[j], nums[i]
            nums[pivot], nums[i] = nums[i], nums[pivot]
            stack.append((left, i, right))
            right = i - 1
        else:
            left, mid, right = stack.pop()
            left = mid + 1

def findmedian(l, m, r):
    if nums[l] <= nums[m]:
        if nums[m] <= nums[r]:
            return m
        else:
            if nums[l] >= nums[r]:
                return l
            else:
                return r
    else:
        if nums[m] >= nums[r]:
            return m
        else:
            if nums[l] >= nums[r]:
                return r
            else:
                return l

The pivot element pivot is selected by the three number median method, which avoids the worst case to a certain extent.

But even so, our quick sort algorithm may still have the worst case: time complexity O(n^2)

Because the more repeated elements will bring more useless work to the algorithm, the following problem will be involved:

To put it simply, do you need to stop the translation of the left and right index pointers when encountering elements equal to the pivot element?

If there is only one stop: this will cause all elements equal to pivot to move to the same side (S1 or S2). In extreme cases, all elements are repeated, resulting in worst-case O(n^2)
If you don't stop: in extreme cases, all elements are repeated, and the pivot element pivot of the whole process is equivalent to traversing the whole array. The time complexity is (n + n-1 +...+2+1)=(1/2)(1+n)*n, that is, the time complexity O(n^2). In fact, if you deduce it, you will find that the execution rule based on quick sort is equivalent to case 1, Not stopping is equivalent to only one pointer translating all the time, and it doesn't stop until the end of the translation.
If all stop: in extreme cases, all elements are repeated. Although it seems that there will be many "meaningless" exchanges, because the place where each double pointer meets is the midpoint of the array, it happens that the sequence is divided into two equally distributed subsequences at this time, which is still the principle of merging and sorting, so as to maximize the efficiency of divide and conquer, By analogy, the pivot element pivot will walk the entire array at the speed of logn. Therefore, the worst-case time complexity of this method is O(nlogn)
Idea of repeated element processing: three-way segmentation and quick sorting

Three-way segmentation quick sorting recommended Jianshu on this blog

Quick sort
​www.jianshu.com/p/779bc4b61254

Two way quick sort
Graphical quick sort and two-way three-way quick sort - SegmentFault
​segmentfault.com/a/1190000021726667

Two way quick sorting divides the area to be sorted into three sub areas with pointers I and j:

(< num [pivot]), unknown, (> num [pivot])

& amp; amp; lt; V & amp; amp; gt; V two parts are placed at both ends of the array, with i pointing to & amp; amp; lt; The next element in the V section, with j pointing to & amp; amp; gt; The previous element of the V section.

Traverse backward from i, if the traversed element & amp; amp; lt; v. Then continue to traverse backward until the traversed element & amp; amp; gt; v. Then stop traversal. Similarly, start traversing forward from j, if the traversed element & amp; amp; gt; v. Then continue to traverse forward until the traversed element & amp; amp; lt; v. Then stop traversal.

Swap the element pointed to by i and the element pointed to by j. Then i + +, j -- continue to compare the next one.
Python code

def quicksort_opt2(nums, left, right): 
    if left >= right:
        return
    stack = []
    while stack or left < right:
        if left < right:
            l, m, r = left, (right-left)//2. Right # first, middle and last
            pivot = findmedian(l, m, r) # Selecting pivot element pivot by taking the median value of three numbers
            nums[pivot], nums[left] = nums[left], nums[pivot]
            pivot = left
            i, j = left+1, right
            while i <= j:   # Why can't it be I < J?
                # Here, first translating j and first translating i have no effect on the result, because i=left+1 avoids pivot
                while nums[i] < nums[pivot] and i <= j:
                    #It cannot be changed to num [i] < = num [pivot], because in this way, these consecutive values are classified as one of them, making the two trees unbalanced
                    i += 1
                while nums[j] > nums[pivot] and i <= j: 
                    j -= 1
                if i <= j:
                    nums[i], nums[j] = nums[j], nums[i]
                    j -= 1
                    i += 1
            nums[pivot], nums[j] = nums[j], nums[pivot] 
            #Because the pivot is placed on the left, it cannot be exchanged with the left pointer i ----- > which will cause an infinite loop
            stack.append((left, j, right))  
            right = j - 1
        else:
            left, mid, right = stack.pop()
            left = mid + 1

nums = [23,2,4,6,2,5,1,6,13,54,8]
quicksort_opt2(nums, 0, len(nums)-1)
print(nums) 

Question 1: why can't I < J replace I < = J as the cycle condition here?

Question 2: why does pivot exchange with j instead of i after translation? Why does exchanging with i cause an infinite loop?

On this basis, can we optimize again and save the 'meaningless' exchange of repeated elements mentioned above? Of course, the following three-way quick sort is optimized around this point.

Three way quick sort
On the basis that the two-way fast platoon divides the unordered area into three sub areas, the three-way fast platoon is divided into one more sub area to store repeated elements equal to pivot to identify and skip them.

On the basis of two-way quick sorting, we take the element equal to V as a separate part. lt points to the last element that is less than V, and gt points to the first element that is greater than v.

Traverse backward from i. if the traversed element e=v, e is directly merged into the = V part, and then i + + continues to traverse. If the traversed element E & amp; amp; lt; v. Then exchange the first element of the E and = V parts (the element pointed to by lt+1), and then LT + +, i + + continue to traverse. If the traversed element E & amp; amp; gt; v. E and & amp; amp; gt; The previous element of Part V (the element pointed to by gt-1) is exchanged, and then GT --, but this

Python code

def quicksort_opt3(nums, left, right): 
    if left >= right:
        return
    stack = []
    while stack or left < right:
        if left < right:
            l, m, r = left, (right-left)//2, right
            pivot = findmedian(l, m, r)
            nums[pivot], nums[left] = nums[left], nums[pivot]
            pivot = left
            lt, gt = left, right + 1
            i = left + 1
            while i < gt:
                if nums[i] < nums[pivot]:
                    nums[i], nums[lt+1] = nums[lt+1], nums[i]
                    i += 1
                    lt += 1
                elif nums[i] > nums[pivot]:
                    nums[i], nums[gt-1] = nums[gt-1], nums[i] 
                    gt -= 1
                else:# nums[i] == nums[pivot]
                    i += 1
            nums[pivot], nums[lt] = nums[lt], nums[pivot] 
            stack.append((left, lt, right))  
            right = lt - 1
        else:
            left, mid, right = stack.pop()
            left = mid + 1

nums = [23,2,4,6,2,5,1,6,13,54,8]
quicksort_opt3(nums, 0, len(nums)-1)
print(nums) 

Quick sort complexity analysis
The average time complexity of quick sort is O(nlogn)

For the mathematical proof of the average complexity of quick sorting, please refer to this answer:
https://www.zhihu.com/questio...

Short boards for quick sorting:
Even three-way quick sort cannot guarantee the worst time complexity of O(nlogn) as heap sort

Quick sort is unstable sort (the so-called stability means that when there are duplicate elements in the array to be sorted, the relative order of the duplicate elements will not change after sorting)

When the number of elements in the sequence to be sorted is very small (n < = 20), the quick sort is not as fast as the insertion sort, and the insertion sort is a stable sort.

Source of github Code: https://github.com/stevezkw19...

Welcome to my github: stevezkw1998 - Overview

Keywords: Python data structure quick sort computer

Added by vaavi8r on Sat, 26 Feb 2022 07:29:10 +0200