Python data structures and algorithms (day 6)

44. Shell sort

Shell Sort

Shell sort is a variant of insertion sort. Also known as diminishing increment sort, it is a more efficient improvement on direct insertion sort. Shell sort is an unstable sorting algorithm. The method is named after Donald L. Shell, who proposed it in 1959. Shell sort groups the records by a certain increment of their subscripts and sorts each group with direct insertion sort; as the increment decreases, each group contains more and more keys. When the increment decreases to 1, the whole file forms a single group, and the algorithm terminates.

Shell sorting process

The basic idea of Shell sort is to lay the array out as a table and insertion-sort the columns. This process is repeated, each time with longer columns (a longer step length, hence fewer columns). In the end the table has only one column. To better understand the algorithm, keep in mind that the algorithm itself still operates on the array; the table is only a way of visualizing it.

For example, suppose we have this set of numbers: [13 14 94 33 82 25 59 94 65 23 45 27 73 25 39 10]. If we start sorting with a step of 5, we can describe the algorithm better by writing the list as a table with 5 columns, so that it looks like this (each column consists of elements one step apart; the slicing sketch below prints these columns):

13 14 94 33 82
25 59 94 65 23
45 27 73 25 39
10
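
These columns are exactly the slices of the list taken with a stride of 5, so the table can be checked directly in Python (a minimal sketch; data and gap are our own names, not part of the original example):

data = [13, 14, 94, 33, 82, 25, 59, 94, 65, 23, 45, 27, 73, 25, 39, 10]
gap = 5
for start in range(gap):
    # Each slice is one column of the table above
    print(data[start::gap])
# [13, 25, 45, 10]
# [14, 59, 27]
# [94, 94, 73]
# [33, 65, 25]
# [82, 23, 39]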

Then we sort each column:

10 14 73 25 23
13 27 94 33 39
25 59 94 65 82
45

Reading the above four rows off row by row, we get: [10 14 73 25 23 13 27 94 33 39 25 59 94 65 82 45]. At this point, 10 has already moved to its correct position. We then sort with a step of 3:

10 14 73
25 23 13
27 94 33
39 25 59
94 65 82
45

After sorting, it becomes:

10 14 13
25 23 33
27 25 59
39 65 73
45 94 82
94

Finally, sort with a step of 1 (at this point it is just a simple insertion sort), and the whole list is in order.
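
To check the walkthrough above, here is a minimal sketch that replays the passes one step at a time (gap_pass is a helper name introduced only for this demonstration; the complete implementation follows in the next section):

def gap_pass(alist, gap):
    # One insertion-sort pass over the elements that are `gap` apart
    for i in range(gap, len(alist)):
        j = i
        while j >= gap and alist[j - gap] > alist[j]:
            alist[j - gap], alist[j] = alist[j], alist[j - gap]
            j -= gap

data = [13, 14, 94, 33, 82, 25, 59, 94, 65, 23, 45, 27, 73, 25, 39, 10]
gap_pass(data, 5)
print(data)  # [10, 14, 73, 25, 23, 13, 27, 94, 33, 39, 25, 59, 94, 65, 82, 45]
gap_pass(data, 3)
print(data)  # [10, 14, 13, 25, 23, 33, 27, 25, 59, 39, 65, 73, 45, 94, 82, 94]
gap_pass(data, 1)
print(data)  # [10, 13, 14, 23, 25, 25, 27, 33, 39, 45, 59, 65, 73, 82, 94, 94]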

45. Shell sort implementation

Shell sort implementation

def shell_sort(alist):
    n = len(alist)
    # Initial step size: half the length of the list
    gap = n // 2
    while gap > 0:
        # Insertion sort over the elements that are `gap` apart
        for i in range(gap, n):
            j = i
            # Swap alist[j] backwards within its group until it is in place
            while j >= gap and alist[j - gap] > alist[j]:
                alist[j - gap], alist[j] = alist[j], alist[j - gap]
                j -= gap
        # Halve the step size for the next pass
        gap = gap // 2

alist = [54, 26, 93, 17, 77, 31, 44, 55, 20]
shell_sort(alist)
print(alist)  # [17, 20, 26, 31, 44, 54, 55, 77, 93]

Time complexity

  • Optimal time complexity: varies with the chosen step sequence
  • Worst time complexity: O(n^2)
  • Stability: unstable
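
Because the running time depends on the step sequence, that sequence is the main tuning knob. As an illustration (this variant is our own sketch, not part of the original text), here is the same algorithm with Knuth's step sequence 1, 4, 13, 40, ..., which is known to bring the worst case down to O(n^(3/2)):

def shell_sort_knuth(alist):
    n = len(alist)
    # Build the largest Knuth step (1, 4, 13, 40, ...) below n // 3
    gap = 1
    while gap < n // 3:
        gap = gap * 3 + 1
    while gap > 0:
        for i in range(gap, n):
            j = i
            while j >= gap and alist[j - gap] > alist[j]:
                alist[j - gap], alist[j] = alist[j], alist[j - gap]
                j -= gap
        # Step back down through the same sequence: ... 13 -> 4 -> 1 -> 0
        gap //= 3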

46. Quick sort

Quick sort

Quick sort (English: Quicksort), also known as partition-exchange sort, divides the data to be sorted into two independent parts in a single pass, such that every element of one part is smaller than every element of the other; each part is then quicksorted in the same way. The whole process can be carried out recursively, turning the data into an ordered sequence.

The steps are:

  1. Pick an element from the sequence; this element is called the "pivot".
  2. Reorder the sequence so that all elements smaller than the pivot come before it and all elements larger than the pivot come after it (equal elements can go to either side). When this partition is finished, the pivot is in its final position. This is called the partition operation.
  3. Recursively sort the subsequence of elements smaller than the pivot and the subsequence of elements larger than the pivot.

The base case of the recursion is a sequence of size zero or one, which is already sorted. Although the algorithm keeps recursing, it always terminates, because each iteration puts at least one element into its final position.
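
The three steps can be seen most directly in a short out-of-place version built with list concatenation (a minimal sketch under our own name quick_sort_simple; the in-place implementation used in this article follows in the next section). This is also the "version using concatenation" referred to in the time-complexity discussion below.

def quick_sort_simple(seq):
    # Base case: a sequence of size zero or one is already sorted
    if len(seq) <= 1:
        return seq
    pivot = seq[0]
    smaller = [x for x in seq[1:] if x < pivot]  # elements that go before the pivot
    larger = [x for x in seq[1:] if x >= pivot]  # elements that go after the pivot
    return quick_sort_simple(smaller) + [pivot] + quick_sort_simple(larger)

print(quick_sort_simple([1, 2, 44, 11, 22, 45, 52, 55, 12, 23]))
# [1, 2, 11, 12, 22, 23, 44, 45, 52, 55]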

47,48. Quick sort implementation

Quick sort implementation

def quick_sort(alist, first, last):
    # Base case: a range of size zero or one is already sorted
    if first >= last:
        return
    # Use the first element as the pivot; its slot becomes a movable "hole"
    mid_value = alist[first]
    low = first
    high = last
    while low < high:
        # Move high leftwards past elements that belong on the right
        while low < high and alist[high] >= mid_value:
            high -= 1
        alist[low] = alist[high]
        # Move low rightwards past elements that belong on the left
        while low < high and alist[low] < mid_value:
            low += 1
        alist[high] = alist[low]
    # When the loop exits, low == high: drop the pivot into its final position
    alist[low] = mid_value

    # Recursively sort the partitions on either side of the pivot
    quick_sort(alist, first, low - 1)
    quick_sort(alist, low + 1, last)

if __name__ == "__main__":
    li = [1, 2, 44, 11, 22, 45, 52, 55, 12, 23]
    print(li)
    quick_sort(li, 0, len(li) - 1)
    print(li)
[1, 2, 44, 11, 22, 45, 52, 55, 12, 23]
[1, 2, 11, 12, 22, 23, 44, 45, 52, 55]

Time complexity

  • Optimal time complexity: O(n log n)
  • Worst time complexity: O(n^2)
  • Stability: unstable

That quicksort takes O(n log n) time on average is not obvious at first sight. However, it is not hard to see from the partition operation: each element of the array is visited once per pass, using O(n) time. In the version using concatenation, this operation is also O(n).

In the best case, every partition splits a sequence into two nearly equal pieces. Each recursive call then handles a sequence of half the size, so only log n nested calls are needed before reaching sequences of size one; the depth of the call tree is therefore O(log n). Two calls at the same level never process the same part of the original sequence, so each level of calls takes only O(n) time in total (each call has some constant overhead, but since there are only O(n) calls per level, this is absorbed into the O(n) bound). The result is that the algorithm needs only O(n log n) time.
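
For a concrete sense of scale (our own illustration, assuming the ideal balanced case):

import math

n = 1024
depth = int(math.log2(n))  # depth of the call tree in the balanced case
print(depth)       # 10 levels
print(n * depth)   # about 10240 element visits in total, versus roughly
                   # n * n / 2 = 524288 comparisons in the O(n^2) worst case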
