Python data structure and algorithm DAY 1

Data structures and algorithms (Python)

1. Introduction of concepts

If a+b+c=1000 and a2+b2=c^2 (a, b and c are natural numbers), how to find all possible combinations of a, b and c?

Enumeration method:

First attempt

import time
start_time = time.time()

# Note that it is a triple cycle
for a in range(0, 1001):
    for b in range(0, 1001):
        for c in range(0, 1001):
            if a**2 + b**2 == c**2 and a+b+c == 1000:
                print("a, b, c: %d, %d, %d" % (a, b, c))

end_time = time.time()
print("elapsed: %f" % (end_time - start_time))
print("complete!")

Operation results:

a, b, c: 0, 500, 500
a, b, c: 200, 375, 425
a, b, c: 375, 200, 425
a, b, c: 500, 0, 500
elapsed: 214.583347
complete!

It took 214.583347 seconds.

Concept of algorithm

Algorithm is the essence of computer processing information, because computer program is essentially an algorithm to tell the computer the exact steps to perform a specified task. Generally, when the algorithm is processing information, it will read the data from the input device or the storage address of the data, and write the result to the output device or a storage address for later call.

Algorithm is an independent method and idea to solve problems.

Five characteristics of the algorithm

1. Input: the algorithm has 0 or more inputs

2. Output: the algorithm has at least one or more outputs

3. Finiteness: the algorithm will automatically end after limited steps without infinite loop, and each step can be completed in an acceptable time

4. Certainty: every step in the algorithm has a definite meaning, and there will be no ambiguity

5. Feasibility: each step of the algorithm is feasible, that is, each step can be completed a limited number of times

Second attempt:

import time

start_time = time.time()

# Note that it is a double cycle
for a in range(0, 1001):
    for b in range(0, 1001-a):
        c = 1000 - a - b
        if a**2 + b**2 == c**2:
            print("a, b, c: %d, %d, %d" % (a, b, c))

end_time = time.time()
print("elapsed: %f" % (end_time - start_time))
print("complete!")

Operation results:

a, b, c: 0, 500, 500
a, b, c: 200, 375, 425
a, b, c: 375, 200, 425
a, b, c: 500, 0, 500
elapsed: 1.204760
complete!

Algorithm efficiency measurement

The execution time of the algorithm program can reflect the efficiency of the algorithm, that is, the advantages and disadvantages of the algorithm.

Simply relying on the running time to compare the advantages and disadvantages of the algorithm is not necessarily objective and accurate!

We assume that the time for the computer to perform each basic operation of the algorithm is a fixed time unit, so how many basic operations represent how many time units it will take. However, for different machine environments, the exact unit time is different, but the number of basic operations (i.e. how many time units it takes) of the algorithm is the same in the order of magnitude. Therefore, the influence of the machine environment can be ignored and the time efficiency of the algorithm can be reflected objectively.

The time efficiency of the algorithm can be expressed by "Big O notation".

"Large o notation": for monotonic integer function f, if there is an integer function g and real constant C > 0, so that for sufficiently large N, there is always f (n) < = C * g (n), that is, function g is an asymptotic function of F (ignoring the constant), which is recorded as f(n)=O(g(n)). In other words, in the sense of the limit towards infinity, the growth rate of function f is constrained by function g, that is, the characteristics of function f and function g are similar.

Time complexity: assuming that there is A function g such that the time taken by algorithm A to process the problem example with scale n is T(n)=O(g(n)), then O(g(n)) is called the asymptotic time complexity of algorithm A, referred to as time complexity for short, and recorded as T(n).

Worst time complexity

There are several possible considerations when analyzing the algorithm:

  • How many basic operations does the algorithm need to complete the work, that is, the optimal time complexity

  • How many basic operations does the algorithm need to complete the work, that is, the worst time complexity

  • The average number of basic operations required for the algorithm to complete the work, that is, the average time complexity

We mainly focus on the worst case of the algorithm, that is, the worst time complexity.

Several basic calculation rules of time complexity

1. Basic operation, that is, there is only constant term, and its time complexity is considered to be O(1)

2. Sequential structure, and the time complexity is calculated by addition

3. Loop structure, and the time complexity is calculated by multiplication

4. Branch structure, with the maximum time complexity

5. When judging the efficiency of an algorithm, we often only need to pay attention to the highest order term of the number of operations, and other secondary terms and constant terms can be ignored

6. Without special instructions, the time complexity of the algorithm we analyze refers to the worst time complexity

Time complexity of first attempt:

T(n) = O(n*n\*n) = O(n^3)

Time complexity of the second attempt:

·T(n) = O(n*n*(1+1)) = O(n*n) = O(n^2)

It can be seen that the second algorithm we try has better time complexity than the first algorithm.

Common time complexity

Note that log2n (base 2 logarithm) is often abbreviated to logn

Relationship between common time complexity

Time consumed from small to large:

O(1) < O(logn) < O(n) < O(nlogn) < O(n^2) < O(n^2logn) < O(n^3) < O(2^n) < O(n!) < O(n^n)

Python built-in type performance analysis

timeit module

The timeit module can be used to test the execution speed of a small piece of Python code.

class timeit.Timer(stmt='pass', setup='pass', timer=<timer function>)

Timer is a class that measures the execution speed of small pieces of code.

stmt parameter is the code statement to be tested;

The setup parameter is the setting required to run the code;

The timer parameter is a timer function, which is platform related.

timeit.Timer.timeit(number=1000000)

The object method in the Timer class that tests the execution speed of statements. The number parameter is the number of tests when testing the code. The default is 1000000. Method returns the average time spent executing code, in seconds of a float type.

list operation test

Test the time efficiency of different operations of list types:

def test1():
   l = []
   for i in range(1000):
      l = l + [i]
def test2():
   l = []
   for i in range(1000):
      l.append(i)
def test3():
   l = [i for i in range(1000)]
def test4():
   l = list(range(1000))
def test5():
    l = []
    for i in range(1000):
        l.extend([i])
def test6():
    l = []
    for i in range(1000):
        l.insert(0,i)
def test7():
   l = []
   for i in range(1000):
      l += [i]

from timeit import Timer

t1 = Timer("test1()", "from __main__ import test1")
print("+:",t1.timeit(number=1000), "seconds")
t2 = Timer("test2()", "from __main__ import test2")
print("append:",t2.timeit(number=1000), "seconds")
t3 = Timer("test3()", "from __main__ import test3")
print("[i for i in range]:",t3.timeit(number=1000), "seconds")
t4 = Timer("test4()", "from __main__ import test4")
print("list(range()):",t4.timeit(number=1000), "seconds")
t5 = Timer("test5()", "from __main__ import test5")
print("extend:",t5.timeit(number=1000), "seconds")
t6 = Timer("test6()", "from __main__ import test6")
print("insert(0):",t6.timeit(number=1000), "seconds")
t7 = Timer("test7()", "from __main__ import test7")
print("+=:",t7.timeit(number=1000), "seconds")

Execution results:

+: 2.5817442 seconds
append: 0.12733859999999986 seconds
[i for i in range]: 0.0662126999999999 seconds
list(range()): 0.022346999999999895 seconds
extend: 0.23506410000000022 seconds
insert(0): 0.7607794999999999 seconds
+=: 0.19090319999999972 seconds

pop operation test

x = range(2000000)
pop_zero = Timer("x.pop(0)","from __main__ import x")
print("pop_zero ",pop_zero.timeit(number=1000), "seconds")
x = range(2000000)
pop_end = Timer("x.pop()","from __main__ import x")
print("pop_end ",pop_end.timeit(number=1000), "seconds")

Operation results:

('pop_zero ', 1.9101738929748535, 'seconds')
('pop_end ', 0.00023603439331054688, 'seconds')

It can be seen from the results that the efficiency of the last element of pop is much higher than that of the first element of pop

Time complexity of list built-in operations

Time complexity of built-in operation of dict

data structure

Data is an abstract concept. After classifying it, we can get the basic types in programming language. Such as int, float, char, etc. Data elements are not independent, and there are specific relationships. These relationships are structures. Data structure refers to the relationship between data elements in a data object.

Python provides us with many ready-made data structure types. The data structures defined by these systems and not defined by us are called Python's built-in data structures, such as lists, tuples and dictionaries. Some data organization methods are not directly defined in the python system. We need to define the organization methods to realize these data. These data organization methods are called Python extended data structures, such as stacks, queues, etc.

Difference between algorithm and data structure

The data structure only statically describes the relationship between data elements.

Efficient programs need to design and select algorithms based on data structures.

Program = data structure + algorithm

Conclusion: the algorithm is designed to solve practical problems, and the data structure is the problem carrier that the algorithm needs to deal with

Abstract data type

Abstract data type (ADT) refers to a mathematical model and a set of operations defined on the mathematical model. That is, data types and operations on data types are bound together for encapsulation. The purpose of introducing abstract data types is to separate the representation of data types and the implementation of operations on data types from the references of these data types and operations in programs, so as to make them independent of each other.

There are five most commonly used data operations:

  • insert
  • delete
  • modify
  • lookup
  • sort

Keywords: Python Algorithm data structure

Added by CreativityLost on Wed, 06 Oct 2021 17:56:33 +0300