Data structure learning path subsequence optimization

**The content of this blog is based on the data structure of Mooc Zhejiang University, lesson 1.3, what is algorithm?
If you have any questions about the algorithm in the blog, please correct it.**

Problem: subsequence problem

There is a sequence N, which contains [A1,.... AN] integers. The maximum continuous subsequence sum is obtained by solving.

Example: 
N: [1, 2, -1]
Several successive subsequences:
[1]
[2]
[-1]
[1, 2]
[2, -1]
[1, 2, -1]
The largest continuous subsequence is 3.

Facing this problem, the most violent way is to exhaustively get all subsequences, and then get the largest sum of subsequences.

C1 cycle method:

def get_max_sequence_sum(nums: list) -> int:
    length = len(nums)
    this_sum = max_sum = 0
    # Starting index of ergodic subsequence
    for first_index in range(length):
        # Termination index of ergodic subsequence
        for end_index in range(first_index, length):
            # Calculate the sum of current subsequences
            for num in nums[first_index: end_index+1]:
                this_sum += num
            max_sum = this_sum if this_sum > max_sum else max_sum
            this_sum = 0
    return max_sum    

For this algorithm, three cycles are carried out in the calculation process, T(n) = n ^ 3

Obviously, when calculating the sum of subsequences, we can get the sum of new subsequences by adding the previous subsequence and the next integer.

C2 cycle method 2.0

def get_max_sequence_sum(nums: list) -> int:
    length = len(nums)
    this_sum = max_sum = 0
    # Starting index of ergodic subsequence
    for first_index in range(length):
        # Termination index of ergodic subsequence
        for end_index in range(first_index, length):
            # The sum of the current subsequence is the sum of the previous subsequence plus the current integer
            this_sum += nums[end_index]
            max_sum = this_sum if this_sum > max_sum else max_sum
        this_sum = 0
    return max_sum

By optimizing the sum of subsequences, we can reduce the cycle of calculating the sum of subsequences, T(n) = n ^ 2

After the optimization of reducing cycle, we can further use the idea of divide and conquer to optimize this algorithm.

For the maximum subsequence and problem, we can decompose the problem into

Divide the sequence into left and right sequences according to the middle
 Then the sum of the largest subsequences of the left subsequence and the right subsequence and the sum of the largest subsequences obtained by scanning from the middle to both sides are calculated.


The first step is to get the left and right sequences [4, - 3, 5, 2], [- 1, 2, 6, 2]
The second step is to calculate the sum of the largest subsequences for the left and right sequences
      The left sequence continues to decompose into [4, - 1], [5, 2]
      Left left continues to decompose into [4], [- 1]; [5], [2]
      The maximum sum of left left left sequence is 4.
      The maximum sum of left and right sequences is 5
      Then the left sequence [4, - 3, ︱ 5, 2] obtains the maximum sum of subsequences from the dividing line to both sides as 6.
      So the sum of the largest subsequences of the left sequence is 6.
      So the sum of the largest subsequences of the right sequence is 8.
Step 3: the maximum sum of the initial sequence scanned from the dividing line to the left and right sides is 11
 Step 4: compare the sum of the three parts to get the maximum value of 11
 (in writing, it's more complicated QAQ. If you can't understand it, you can see the video of the reference course teacher.)

The idea of divide and conquer method is to decompose the problem and then combine the results of subproblems to get the final result.

So we can get the third algorithm.

C3 divide and conquer method

def get_max_sequence_sum(nums: list) -> int:

    length = len(nums)
    middle_index = length // 2
    middle_to_left_this_sequence_sum = middle_to_left_max_sequence_sum = 0
    middle_to_right_this_sequence_sum = middle_to_right_max_sequence_sum = 0

    # Recursive termination condition
    if length == 0:
        return 0
    if length == 1:
        return nums[0] if nums[0] > 0 else 0
    # The sum of the largest subsequences of the left and right subsequences is obtained recursively
    left_max_sequence_sum = get_max_sequence_sum(nums[:middle_index])
    right_max_sequence_sum = get_max_sequence_sum(nums[middle_index:])

    # Calculate the sum of the largest subsequences scanned from the middle to both sides

    # Scan from the middle to the left to get the largest
    for num in nums[:middle_index]:
        middle_to_left_this_sequence_sum += num
        middle_to_left_max_sequence_sum = middle_to_left_this_sequence_sum if \
            middle_to_left_this_sequence_sum > middle_to_left_max_sequence_sum else \
            middle_to_left_max_sequence_sum

    # Scan from the middle to the right to get the largest
    for num in nums[middle_index:]:
        middle_to_right_this_sequence_sum += num
        middle_to_right_max_sequence_sum = middle_to_right_this_sequence_sum if \
            middle_to_right_this_sequence_sum > middle_to_right_max_sequence_sum else \
            middle_to_right_max_sequence_sum

    middle_max_sequence_sum = middle_to_left_max_sequence_sum + middle_to_right_max_sequence_sum
    return max([left_max_sequence_sum, right_max_sequence_sum, middle_max_sequence_sum])

With this method, the problem will be decomposed in half every time, and then the whole sequence will be scanned, so the final T(n) = nlogn.

In general, we can get a more optimal solution by using the divide and conquer method, but for some problems we can also optimize the problem through more sophisticated mathematical thinking.

For example, for the calculation of the sum of subsequences, because it is a continuous subsequence, if we get a subsequence and < 0, then we can completely discard this subsequence, and directly continue to get a new sum of subsequences from the next number. This approach is also called "online problem solving."

C4 online processing

def get_max_sequence_sum_4(nums: list) -> int:
    this_sum = max_sum = 0
    for num in nums:
        this_sum += num
        max_sum = this_sum if this_sum > max_sum else max_sum
        # If the current subsequence and < 0 discard this subsequence directly
        if this_sum < 0:
            this_sum = 0
    return max_sum

We can see that the average time complexity of this algorithm is n.

Reference resources

Mooc data structure

Keywords: Python

Added by TRI0N on Mon, 28 Oct 2019 04:12:27 +0200