Principles and implementations of ten sorting algorithms [Python version]

0 Preface

In algorithm interviews, the interviewer will often ask: "What is the time complexity of this algorithm? What is its space complexity? Is it stable?" To answer such questions cleanly, I have organized the classic sorting algorithms here. Mastering these methods will help you handle sorting problems in coding interviews and deepen your understanding of questions of this kind.

1 Algorithms

Before diving in, it helps to have a general picture of how sorting algorithms are classified: they divide into comparison sorts and non-comparison sorts.

The comparison sort includes:

  • Exchange sorts: bubble sort, quick sort
  • Insertion sorts: simple insertion sort, Shell sort
  • Selection sorts: simple selection sort, heap sort
  • Merge sort

Non-comparison sorts include:

  • Counting sort
  • Bucket sort
  • Radix sort

Understanding algorithm stability

Definition: suppose a sequence contains multiple records with equal keys. If, after sorting, the relative order of those records is unchanged — that is, if r[i] = r[j] and r[i] precedes r[j] in the original sequence, then r[i] still precedes r[j] in the sorted sequence — the sorting algorithm is said to be stable; otherwise it is unstable.
Example: take the sequence 5 3 3 4 3 8 9 10 11 and choose the first element 5 as the pivot. During partitioning, 5 is swapped with the third 3 (the fifth element, with subscripts starting from 1), which changes the relative order of the equal 3s. Quick sort is therefore an unstable sorting algorithm; the instability arises when the pivot is swapped with a[j].
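
To see stability in action, here is a minimal sketch (the (key, tag) records are invented for illustration): Python's built-in sorted is stable, so records with equal keys keep their left-to-right input order.

records = [(5, 'a'), (3, 'b'), (3, 'c'), (4, 'd'), (3, 'e')]
# sorted() is stable: the three records with key 3 keep their input order
print(sorted(records, key=lambda r: r[0]))
# [(3, 'b'), (3, 'c'), (3, 'e'), (4, 'd'), (5, 'a')]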

Understanding in-place sorting
An in-place sort does not request extra storage during sorting; it compares and exchanges elements within the storage that already holds the data to be sorted, so its space complexity is O(1). A sort that needs extra space is not in place, and its space complexity is usually greater than O(1). Merge sort, counting sort, bucket sort, and radix sort are all non-in-place sorts.
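
As a quick illustration of the distinction (from the caller's point of view; CPython's built-in sort still uses auxiliary memory internally): list.sort() rearranges a list in place and returns None, while sorted() allocates and returns a new list.

data = [30, 24, 5]
new_data = sorted(data)   # allocates a new list; data itself is unchanged
data.sort()               # rearranges data in place and returns None
print(new_data, data)     # [5, 24, 30] [5, 24, 30]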

1.1 Bubble sort

The idea of bubble sort is to repeatedly traverse the data, comparing adjacent elements in turn and swapping them whenever they are out of order, until a full pass performs no swaps, at which point the sort is complete.

def sort_bubble(lists):
    for i in range(len(lists) - 1):   # n elements need at most n-1 passes
        flag = 0          # records whether this pass performed any swap
        for j in range(len(lists) - 1 - i):
            if lists[j] > lists[j + 1]:
                lists[j], lists[j + 1] = lists[j + 1], lists[j]
                flag = 1
        if flag == 0:     # no swap in a full pass: already sorted, exit early
            break
    return lists

According to the code, the time complexity of bubble sort is O(n²) and the space complexity is O(1). The algorithm is stable.

1.2 Quick sort

Quick sort improves on bubble sort. The idea is to pick a pivot (usually the first element) and partition the data into two parts: elements smaller than the pivot go to its left, elements larger go to its right. The left and right parts are then processed the same way until the whole sequence is sorted.

def sort_quick(lists, i, j):
    if i >= j:
        return lists
    pivot = lists[i]    # take the first element as the pivot
    low = i
    high = j
    while i < j:
        # Scan from the right for an element smaller than the pivot
        while i < j and lists[j] >= pivot:
            j -= 1
        lists[i] = lists[j]
        # Scan from the left for an element larger than the pivot
        while i < j and lists[i] <= pivot:
            i += 1
        lists[j] = lists[i]
    lists[j] = pivot    # i == j: drop the pivot into its final position
    sort_quick(lists, low, i - 1)
    sort_quick(lists, i + 1, high)
    return lists

According to the code, the average time complexity of quick sort is O(n log n); in the worst case (e.g., already-sorted input with a first-element pivot) it degrades to O(n²). The space complexity is O(log n) for the recursion stack. The algorithm is unstable.
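
One common mitigation for the worst case, sketched here as a variant rather than part of the original code: choose the pivot at random before partitioning, which makes an O(n²) run unlikely on any fixed input.

import random

def sort_quick_random(lists, i, j):
    if i >= j:
        return lists
    # Swap a randomly chosen element to the front, then partition as above
    r = random.randint(i, j)
    lists[i], lists[r] = lists[r], lists[i]
    pivot = lists[i]
    low, high = i, j
    while i < j:
        while i < j and lists[j] >= pivot:
            j -= 1
        lists[i] = lists[j]
        while i < j and lists[i] <= pivot:
            i += 1
        lists[j] = lists[i]
    lists[j] = pivot
    sort_quick_random(lists, low, i - 1)
    sort_quick_random(lists, i + 1, high)
    return lists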

1.3 Simple insertion sort

Simple insertion sort works the way card players pick up cards: the cards already in hand are kept in order, and each new card is compared with them one by one until its place is found. If the element being inserted equals an element already in the ordered part, it is inserted after that equal element, which is what makes the sort stable.

def sort_insert(lists):
    for i in range(1, len(lists)):
        j = i - 1
        temp = lists[i]
        # Compare the i-th data with the data in the previous order one by one
        while j >= 0 and temp < lists[j]:
            lists[j + 1] = lists[j]
            j -= 1
        lists[j + 1] = temp
    return lists

According to the code, the time complexity of insertion sort is O(n²) and the space complexity is O(1). The algorithm is stable.

1.4 Shell sort

Shell sort improves on simple insertion sort. The idea is to split the data into groups of elements that are a fixed gap apart, insertion-sort each group, and then progressively shrink the gap, finishing with an ordinary insertion sort at gap 1.

def sort_shell(lists):
    gap = len(lists) // 2
    while gap != 0:
        # Insertion sort over the elements that lie `gap` apart
        for i in range(gap, len(lists)):
            j = i - gap
            temp = lists[i]
            while j >= 0 and temp < lists[j]:
                lists[j + gap] = lists[j]
                j = j - gap
            lists[j + gap] = temp
        # Halve the gap after each pass
        gap = gap // 2
    return lists

The time complexity of Shell sort depends on the chosen gap sequence; with common sequences it falls roughly between O(n^1.3) and O(n²), and the space complexity is O(1). The algorithm is unstable.

1.5 Simple selection sort

Simple selection sort finds the smallest (or largest) element in the unsorted part, places it at the end of the sorted part, and repeats on the remaining unsorted elements until none are left.

def sort_selection(lists):
    for i in range(0, len(lists) - 1):
        for j in range(i + 1, len(lists)):
            # Swap whenever a smaller element is found behind position i
            if lists[i] > lists[j]:
                lists[i], lists[j] = lists[j], lists[i]
    return lists

The time complexity of simple selection sort is O(n²) and the space complexity is O(1). The algorithm is unstable.
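
The version above swaps every time it finds a smaller element behind position i. The textbook formulation, sketched below (the name sort_selection_min is mine), remembers the index of the minimum and performs at most one swap per pass; both are O(n²), but this one matches the description more literally.

def sort_selection_min(lists):
    for i in range(len(lists) - 1):
        # Find the index of the smallest element in the unsorted suffix
        min_idx = i
        for j in range(i + 1, len(lists)):
            if lists[j] < lists[min_idx]:
                min_idx = j
        # One swap per pass: move the minimum to the front of the suffix
        lists[i], lists[min_idx] = lists[min_idx], lists[i]
    return lists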

1.6 Heap sort

Heap sort first builds the data into a max-heap, then repeatedly swaps the heap root (the maximum) with the last element of the heap, shrinks the heap by one, and sifts the new root down to restore the heap property.

# Build a max-heap bottom-up, starting from the last parent node
def buildMaxHeap(data, size):
    for i in range(size // 2 - 1, -1, -1):
        heapify(data, size, i)


# Sift the element at index i down until the subtree rooted at i is a max-heap
def heapify(data, size, i):
    left = 2 * i + 1
    right = 2 * i + 2
    largest = i
    # Check whether the left child is larger than the current largest
    if left < size and data[left] > data[largest]:
        largest = left
    # Check whether the right child is larger than the current largest
    if right < size and data[right] > data[largest]:
        largest = right
    if largest != i:
        data[i], data[largest] = data[largest], data[i]
        # The swap may break the heap property of the affected subtree
        heapify(data, size, largest)


def sort_heapify(lists):
    # Build the max-heap, then sort by repeatedly extracting the maximum
    size = len(lists)
    buildMaxHeap(lists, size)
    # Swap the heap root (the maximum) with the last element of the heap;
    # shrinking size excludes the sorted tail from further heap operations
    for i in range(len(lists) - 1, 0, -1):
        lists[0], lists[i] = lists[i], lists[0]
        size -= 1
        # Restore the heap property on the shrunken heap
        heapify(lists, size, 0)
    return lists

The time complexity of heap sort is O(n log n) and the space complexity is O(1). The algorithm is unstable.
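
For comparison, Python's standard library ships a binary min-heap in heapq; here is a minimal heap-based sort sketched on top of it (ascending order falls out of repeatedly popping the minimum):

import heapq

def sort_with_heapq(lists):
    heap = list(lists)        # copy so the input is not consumed
    heapq.heapify(heap)       # O(n) bottom-up heap construction
    return [heapq.heappop(heap) for _ in range(len(lists))]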

1.7 Merge sort

Merge sort first makes each subsequence ordered, then merges the ordered subsequences to obtain a fully ordered sequence; it is a classic divide-and-conquer algorithm.

def merge(left, right):
    result = []
    # Repeatedly take the smaller head element; <= keeps equal keys in order
    while left and right:
        if left[0] <= right[0]:
            result.append(left.pop(0))
        else:
            result.append(right.pop(0))
    # At most one of the two halves still has leftover elements
    if left:
        result.extend(left)
    if right:
        result.extend(right)
    return result


def sort_merge(lists):
    if len(lists) < 2:
        return lists
    else:
        mid = len(lists)//2
        left, right = lists[:mid], lists[mid:]
    return merge(sort_merge(left), sort_merge(right))

The time complexity of merge sort is O(n log n). It is not an in-place sort; the space complexity is O(n). The algorithm is stable. (Note that list.pop(0) shifts every remaining element, so the merge above is simple but not optimal.)
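
Because list.pop(0) shifts every remaining element, a common refinement (a sketch, not the original code) merges with two cursors instead:

def merge_indexed(left, right):
    result = []
    i = j = 0
    # Advance two cursors instead of popping from the front of a list
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:   # <= preserves stability
            result.append(left[i])
            i += 1
        else:
            result.append(right[j])
            j += 1
    # Append whichever half still has elements left
    result.extend(left[i:])
    result.extend(right[j:])
    return result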

1.8 Counting sort

Counting sort can be faster than the comparison sorts above because it never compares elements. The idea: find the maximum and minimum of the data, build a count array recording how many times each value occurs, transform the count array into prefix sums, and finally traverse the original array in reverse order, using the prefix sums to place each element at its correct position in the result array.

def sort_count(lists):
    # Find the maximum and minimum
    minnum, maxnum = lists[0], lists[0]
    for i in range(len(lists)):
        if lists[i] < minnum:
            minnum = lists[i]
        if lists[i] > maxnum:
            maxnum = lists[i]
    d = maxnum - minnum

    # Count array: countArray[k] holds the number of occurrences of minnum + k
    countArray = [0] * (d + 1)
    for num in lists:
        countArray[num - minnum] += 1

    # Prefix sums: countArray[k] becomes the number of elements <= minnum + k
    sumnum = 0
    for i in range(len(countArray)):
        sumnum += countArray[i]
        countArray[i] = sumnum

    # Traverse the original array in reverse, placing each element at its final
    # position; the reverse traversal is what keeps the sort stable
    sortedArray = [0] * len(lists)
    for i in range(len(lists) - 1, -1, -1):
        sortedArray[countArray[lists[i] - minnum] - 1] = lists[i]
        countArray[lists[i] - minnum] -= 1
    return sortedArray
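
A tiny trace may make the prefix-sum step concrete (input values chosen arbitrarily):

lists = [2, 5, 3, 0, 2, 3]            # minnum = 0, maxnum = 5, d = 5
# counts:       [1, 0, 2, 2, 0, 1]    one 0, two 2s, two 3s, one 5
# prefix sums:  [1, 1, 3, 5, 5, 6]    number of elements <= each value
# Right to left: the last 3 lands at index 5 - 1 = 4, the count for 3 drops
# to 4, so the earlier 3 lands at index 3, and so on
print(sort_count(lists))              # [0, 2, 2, 3, 3, 5]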

The time complexity of counting sort is O(n+k), where k is the range of the values. It is not an in-place sort; the space complexity is O(n+k). The algorithm is stable.

1.9 Bucket sort

The idea of bucket sort is to put elements from the same value range into the same bucket, i.e., to divide the data across several buckets according to their values. Once the elements inside every bucket are sorted, the concatenation of all buckets is sorted; the algorithm is thus another application of divide and conquer. (The in-bucket sort can be any other sorting algorithm, or bucket sort applied recursively.)
Bucket sort can be seen as a generalization of counting sort: counting sort amounts to buckets that each hold a single value, whereas a bucket holds a range of values. Bucket sort needs the elements to be spread as evenly as possible across the buckets; if all the data land in the same bucket, bucket sort degenerates.

def sort_bucket(lists):
    min_num, max_num = min(lists), max(lists)
    # Width of the value range covered by each bucket
    bucketsize = 3
    # Number of buckets needed to cover the whole value range
    bucket_num = (max_num - min_num) // bucketsize + 1
    buckets = [[] for _ in range(bucket_num)]
    # Scatter each element into its bucket
    for num in lists:
        buckets[(num - min_num) // bucketsize].append(num)
    # Sort inside each bucket (sorted() here) and concatenate the buckets
    new_lists = []
    for bucket in buckets:
        new_lists.extend(sorted(bucket))
    return new_lists

The average time complexity of bucket sort is O(n+k). It is not an in-place sort; the space complexity is O(n+k). With a stable in-bucket sort (such as Python's sorted), the algorithm is stable.

1.10 Radix sort

Radix sort sorts by the lowest digit first and collects the results, then sorts by the next digit and collects again, and so on up to the highest digit. The same idea applies when records have attributes of different priority: sort by the low-priority attribute first and the high-priority attribute last, and the final order ranks by the high-priority attribute, breaking ties by the low-priority one.

def sort_radix(lists):
    # Assumes non-negative integers. The number of digits of the largest
    # element determines how many passes are needed
    digits = max(len(str(num)) for num in lists)
    n = 1  # weight of the current digit: 1 = units, 10 = tens, ...
    for _ in range(digits):
        # Ten buckets, one per possible digit value 0-9
        buckets = [[] for _ in range(10)]
        for num in lists:
            buckets[(num // n) % 10].append(num)
        # Collect the numbers back in bucket order; appending preserves the
        # order within each bucket, which keeps the sort stable
        lists = [num for bucket in buckets for num in bucket]
        n *= 10
    return lists
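
Using the sample data from the harness below, the two passes look like this (each pass preserves the order produced by the previous one, which is why the result is correct):

print(sort_radix([30, 24, 5, 58, 18, 36, 12, 42, 39]))
# after pass 1 (units digit): 30 12 42 24 5 36 58 18 39
# after pass 2 (tens digit):  5 12 18 24 30 36 39 42 58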

The time complexity of radix sort is O(n·k), where k is the number of digits. It is not an in-place sort; the space complexity is O(n+k). The algorithm is stable.

2 Summary

To sum up, we now have an overall picture of the classic sorting algorithms and their implementations. Classifying and comparing them builds a clear mental model, which the following table summarizes.

| Sorting method        | Time (average) | Time (worst)  | Time (best)  | Space    | Stability |
|-----------------------|----------------|---------------|--------------|----------|-----------|
| Bubble sort           | O(n²)          | O(n²)         | O(n)         | O(1)     | stable    |
| Quick sort            | O(n log n)     | O(n²)         | O(n log n)   | O(log n) | unstable  |
| Simple insertion sort | O(n²)          | O(n²)         | O(n)         | O(1)     | stable    |
| Shell sort            | O(n log² n)    | O(n²)         | O(n)         | O(1)     | unstable  |
| Simple selection sort | O(n²)          | O(n²)         | O(n²)        | O(1)     | unstable  |
| Heap sort             | O(n log n)     | O(n log n)    | O(n log n)   | O(1)     | unstable  |
| Merge sort            | O(n log n)     | O(n log n)    | O(n log n)   | O(n)     | stable    |
| Counting sort         | O(n+k)         | O(n+k)        | O(n+k)       | O(n+k)   | stable    |
| Bucket sort           | O(n+k)         | O(n²)         | O(n+k)       | O(n+k)   | stable    |
| Radix sort            | O(n·k)         | O(n·k)        | O(n·k)       | O(n+k)   | stable    |

PS: example test code

# Performance test code: replace sort_{Sorting function} with any of the
# functions above, e.g. sort_bubble or sort_insert
if __name__ == "__main__":
    lists = [30, 24, 5, 58, 18, 36, 12, 42, 39]
    print("The sequence before sorting is:")
    for i in lists:
        print(i, end=" ")
    print("\nThe sorted sequence is:")
    for i in sort_{Sorting function}(lists):
        print(i, end=" ")
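
To sanity-check all ten implementations at once, one option is to compare each against Python's built-in sorted (function names as defined above; most of the functions mutate their input, so each gets a fresh copy):

import random

data = [random.randint(0, 99) for _ in range(50)]
results = {
    "bubble": sort_bubble(list(data)),
    "quick": sort_quick(list(data), 0, len(data) - 1),
    "insert": sort_insert(list(data)),
    "shell": sort_shell(list(data)),
    "selection": sort_selection(list(data)),
    "heap": sort_heapify(list(data)),
    "merge": sort_merge(list(data)),
    "count": sort_count(list(data)),
    "bucket": sort_bucket(list(data)),
    "radix": sort_radix(list(data)),
}
for name, result in results.items():
    assert result == sorted(data), f"{name} sort disagrees with sorted()"
print("all ten implementations agree with sorted()")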

Final words

The problems we meet at work are far more complex than these algorithms. An algorithm is only a tool; only by mastering these tools can you recall and apply them at the critical moment and make an algorithm part of the solution. The implementations in this article may not be optimal; corrections and additions are welcome.

