Summary of Python usage and high-performance tips

1. Confusing operations

This section compares some of Python's confusing operations.

1.1 Random sampling with replacement and without replacement

import random
random.choices(seq, k=1)  # list of length k, sampled with replacement
random.sample(seq, k)     # list of length k, sampled without replacement
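
As a small illustration of the difference (the list seq and the fixed seed are made up for this sketch), sampling with replacement may repeat an element and may draw more items than the sequence holds, while sampling without replacement cannot:

```python
import random

random.seed(0)         # fixed seed only so the sketch is repeatable
seq = ['a', 'b', 'c']  # hypothetical data

with_replacement = random.choices(seq, k=5)    # k may exceed len(seq)
without_replacement = random.sample(seq, k=3)  # k must be <= len(seq)

print(with_replacement)
print(without_replacement)

# Asking for more items than exist fails without replacement.
try:
    random.sample(seq, k=5)
    sample_failed = False
except ValueError as exc:
    sample_failed = True
    print('sample failed:', exc)
```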

1.2 parameters of lambda function

func = lambda y: x + y          # The value of x is bound when the function runs
func = lambda y, x=x: x + y     # The value of x is bound when the function is defined
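
The following sketch (variable names are illustrative) shows what the two binding times mean in practice, including the loop pitfall that the default-argument form is commonly used to avoid:

```python
x = 10
late = lambda y: x + y        # x is looked up each time late() is called
early = lambda y, x=x: x + y  # x is captured as a default when the lambda is defined

x = 20
print(late(1))   # 21
print(early(1))  # 11

# The same rule applies to lambdas created in a loop or comprehension.
late_funcs = [lambda y: i + y for i in range(3)]
early_funcs = [lambda y, i=i: i + y for i in range(3)]
print([f(0) for f in late_funcs])   # [2, 2, 2]
print([f(0) for f in early_funcs])  # [0, 1, 2]
```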

1.3 copy and deepcopy

import copy
y = copy.copy(x)      # Copy only the top layer
y = copy.deepcopy(x)  # Copy all nested parts

Copying is easy to confuse with aliasing:

a = [1, 2, [3, 4]]

# Alias.
b_alias = a  
assert b_alias == a and b_alias is a

# Shallow copy.
b_shallow_copy = a[:]  
assert b_shallow_copy == a and b_shallow_copy is not a and b_shallow_copy[2] is a[2]

# Deep copy.
import copy
b_deep_copy = copy.deepcopy(a)  
assert b_deep_copy == a and b_deep_copy is not a and b_deep_copy[2] is not a[2]

Changes made through an alias affect the original variable. The elements of a shallow copy are aliases of the elements of the original list, so mutating a nested element in place is visible through both. A deep copy is copied recursively, so changes to it do not affect the original variable.
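
A minimal sketch of these three cases, reusing the variable names from the example above and modifying the original list after the copies are made:

```python
import copy

a = [1, 2, [3, 4]]
b_alias = a
b_shallow_copy = a[:]
b_deep_copy = copy.deepcopy(a)

a[0] = 100      # replace a top-level element
a[2].append(5)  # mutate the nested list in place

print(b_alias)         # [100, 2, [3, 4, 5]]  sees both changes
print(b_shallow_copy)  # [1, 2, [3, 4, 5]]    sees only the nested change
print(b_deep_copy)     # [1, 2, [3, 4]]       sees neither change
```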

1.4 == and is

x == y  # Whether the two referenced objects have equal values
x is y  # Whether the two references point to the same object

1.5 Type checking

type(a) == int      # Exact type match only; ignores polymorphism (subclasses do not match)
isinstance(a, int)  # Takes polymorphism into account (subclasses of int also match)
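
To make the difference concrete, here is a short sketch using a hypothetical subclass MyInt, which is not part of any library:

```python
class MyInt(int):
    """Hypothetical subclass of int, used only for this illustration."""


a = MyInt(3)
exact_match = type(a) == int          # False: type(a) is MyInt, not int
instance_match = isinstance(a, int)   # True: MyInt derives from int
bool_is_int = isinstance(True, int)   # True: bool is itself a subclass of int
either = isinstance(a, (int, float))  # a tuple of types checks several at once

print(exact_match, instance_match, bool_is_int, either)
```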

1.6 string search

str.find(sub, start=None, end=None); str.rfind(...)     # Returns -1 if not found
str.index(sub, start=None, end=None); str.rindex(...)   # Raises ValueError if not found
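
A brief sketch of both behaviours on a made-up string. Note that a find() result should be compared with -1 rather than used as a truth value, because a match at position 0 is falsy while -1 is truthy:

```python
s = 'hello world'

first_o = s.find('o')   # 4
last_o = s.rfind('o')   # 7
missing = s.find('z')   # -1

try:
    s.index('z')
    index_raised = False
except ValueError:
    index_raised = True

starts_with_h = s.find('h') != -1  # compare with -1; s.find('h') is 0, which is falsy
contains_world = 'world' in s      # when only membership matters, use the in operator

print(first_o, last_o, missing, index_raised, starts_with_h, contains_world)
```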

1.7 Backward list indexing

This is just a matter of habit. Forward indexes start from 0. If you want backward indexes to start from 0 as well, use ~, since ~i equals -i - 1.

print(a[-1], a[-2], a[-3])
print(a[~0], a[~1], a[~2])

2. A guide for C/C++ users

Many Python users have migrated from C/C++. The two languages differ somewhat in syntax and code style. This section briefly introduces the differences.

2.1 Very large and very small numbers

In C/C++ it is customary to define a very large constant for this purpose. Python has inf and -inf:

a = float('inf')
b = float('-inf')

2.2 Boolean

In C/C++ it is customary to use 0 and non-zero values to represent False and True. Python recommends using True and False directly for Boolean values.

a = True
b = False

2.3 Checking for None

In C/C++ a null pointer is customarily checked with if (a) and if (!a). In Python, the check for None is:

if x is None:
    pass

If "if not x" is used instead, other falsy objects (such as empty strings, lists, tuples, and dictionaries) are treated the same way as None.
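
The sketch below shows how the two tests differ. The helper describe is hypothetical and exists only for this illustration:

```python
def describe(x):
    """Hypothetical helper that separates None from other falsy values."""
    if x is None:
        return 'missing'
    if not x:
        return 'empty or zero'
    return 'has a value'


results = [describe(v) for v in (None, '', [], 0, 'text', [0])]
print(results)
```

Here [0] counts as having a value because a list is falsy only when it is empty, regardless of what it contains.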

2.4 exchange value

In C/C++ it is customary to define a temporary variable to exchange two values. With Python's tuple packing and unpacking, it can be done in one step.

a, b = b, a

2.5 comparison

In C/C++ it is customary to write two conditions joined by a logical operator. Python can chain the comparisons in a single expression.

if 0 < a < 5:
    pass

2.6 Getters and setters for class members

In C/C++ it is customary to make class members private and access their values through a series of set and get functions. In Python, equivalent accessors can be defined with @property, @name.setter, and @name.deleter, but unnecessary abstraction should be avoided: access through a property is roughly 4-5 times slower than direct attribute access.
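
For cases where an accessor is genuinely needed, for example to validate a value, a minimal sketch could look like the following. The Circle class and its validation rule are assumptions made for illustration:

```python
class Circle:
    """Hypothetical class whose radius is validated through a property."""

    def __init__(self, radius):
        self.radius = radius  # goes through the setter below

    @property
    def radius(self):
        return self._radius

    @radius.setter
    def radius(self, value):
        if value < 0:
            raise ValueError('radius must be non-negative')
        self._radius = value

    @radius.deleter
    def radius(self):
        del self._radius


c = Circle(1.0)
c.radius = 2.0   # looks like plain attribute access, but runs the setter
print(c.radius)  # 2.0
```

Callers keep using plain attribute syntax, so a simple attribute can be turned into a property later without changing the calling code.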

2.7 Input and output parameters of functions

In C/C++ it is customary to list both inputs and outputs as function parameters, write the outputs through pointers, and return an execution status that the caller checks to see whether the call succeeded. In Python the caller does not need to check a status return value: when something goes wrong inside the function, it raises an exception directly.
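
A minimal sketch of this style follows. The function parse_ratio and its input format are made up for illustration. The function simply returns its result, and failures surface as exceptions that the caller may handle where it makes sense:

```python
def parse_ratio(text):
    """Hypothetical function: parse 'a/b' and return a / b as a float."""
    numerator, denominator = text.split('/')  # ValueError if there is no single '/'
    return int(numerator) / int(denominator)  # ValueError or ZeroDivisionError on bad input


outcomes = {}
for text in ('3/4', '1/0', 'abc'):
    try:
        outcomes[text] = parse_ratio(text)
    except (ValueError, ZeroDivisionError) as exc:
        outcomes[text] = type(exc).__name__

print(outcomes)
```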

2.8 Reading files

Compared with C/C++, reading files is much easier in Python. An opened file is an iterable object that returns one line at a time.

with open(file_path, 'rt', encoding='utf-8') as f:
    for line in f:
        print(line)       # The trailing \n is retained

2.9 Joining file paths

In C/C++ paths are usually spliced directly with +, which is error-prone. In Python, os.path.join automatically inserts the separator appropriate for the operating system between path components:

import os
os.path.join('usr', 'lib', 'local')

2.10 Parsing command-line options

Although Python code can parse command-line options directly from sys.argv, as C/C++ code does with argv, the ArgumentParser class in argparse is more convenient and more powerful.
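
As a rough sketch of what this looks like (the option names and the argument list are invented for this example), a parser with one positional argument and two options could be written as:

```python
import argparse


def build_parser():
    """Build a parser for a hypothetical tool."""
    parser = argparse.ArgumentParser(description='Hypothetical example tool.')
    parser.add_argument('path', help='input file')
    parser.add_argument('-n', '--count', type=int, default=1, help='number of repetitions')
    parser.add_argument('-v', '--verbose', action='store_true', help='print extra output')
    return parser


# Passing a list parses it instead of sys.argv[1:], which is convenient for testing.
args = build_parser().parse_args(['data.txt', '-n', '3', '--verbose'])
print(args.path, args.count, args.verbose)
```

In a real script, parse_args() would be called without arguments so that the actual command line is used, and help text for -h is generated automatically.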

2.11 Calling external commands

Although os.system can call external commands directly, as system() does in C/C++, subprocess.check_output lets you choose whether to go through the shell and also captures the output of the external command.

import subprocess
# If the external command exits with a non-zero status, subprocess.CalledProcessError is raised
result = subprocess.check_output(['cmd', 'arg1', 'arg2']).decode('utf-8')  
# Collect standard output and standard error at the same time
result = subprocess.check_output(['cmd', 'arg1', 'arg2'], stderr=subprocess.STDOUT).decode('utf-8')  
# Execute shell commands (pipelines, redirection, etc.); shlex.quote() can be used to quote an argument safely for the shell
result = subprocess.check_output('grep python | wc > out', shell=True).decode('utf-8')
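
Since 'cmd' above is only a placeholder, here is a runnable sketch that uses the current interpreter (sys.executable) as the external command, on the assumption that child processes may be launched in the environment. The untrusted string is a made-up example of input that needs quoting:

```python
import shlex
import subprocess
import sys

# Capture the standard output of a command given as an argument list (no shell involved).
out = subprocess.check_output([sys.executable, '-c', 'print("hello")']).decode('utf-8')
print(out.strip())

# A non-zero exit status is reported as an exception.
status = None
try:
    subprocess.check_output([sys.executable, '-c', 'import sys; sys.exit(3)'])
except subprocess.CalledProcessError as exc:
    status = exc.returncode
print('exit status:', status)

# Before embedding a value in a shell=True command string, quote it.
untrusted = 'my file; echo oops'  # hypothetical user input
quoted = shlex.quote(untrusted)   # a POSIX shell now treats it as a single argument
print(quoted)
```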

2.12 Don't reinvent the wheel

Don't reinvent the wheel. Python is described as "batteries included", meaning that its standard library already provides solutions to many common problems.

3. Common tools

3.1 reading and writing CSV files

import csv
# Read and write without a header
with open(name, 'rt', encoding='utf-8', newline='') as f:  # newline='' leaves line-ending handling to the csv module
    for row in csv.reader(f):
        print(row[0], row[1])  # Values read from CSV are always str
with open(name, mode='wt', encoding='utf-8', newline='') as f:  # newline='' is needed when writing as well
    f_csv = csv.writer(f)
    f_csv.writerow(['symbol', 'change'])

# Read and write with a header
with open(name, mode='rt', encoding='utf-8', newline='') as f:
    for row in csv.DictReader(f):
        print(row['symbol'], row['change'])
with open(name, mode='wt', encoding='utf-8', newline='') as f:
    header = ['symbol', 'change']
    f_csv = csv.DictWriter(f, header)
    f_csv.writeheader()
    f_csv.writerow({'symbol': xx, 'change': xx})

Note that when a field in the CSV file is too large, an error is raised: _csv.Error: field larger than field limit (131072). This is solved by raising the limit:

import sys
csv.field_size_limit(sys.maxsize)

csv can also read data separated by \t:

f = csv.reader(f, delimiter='\t')
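
To try these calls without touching the file system, the sketch below writes tab-separated rows to an in-memory buffer and reads them back. The buffer stands in for a real file and the row values are made up:

```python
import csv
import io

buffer = io.StringIO(newline='')  # in-memory stand-in for a file opened with newline=''
writer = csv.DictWriter(buffer, fieldnames=['symbol', 'change'], delimiter='\t')
writer.writeheader()
writer.writerow({'symbol': 'AA', 'change': -0.18})  # made-up values
writer.writerow({'symbol': 'BB', 'change': 0.25})

buffer.seek(0)
rows = list(csv.DictReader(buffer, delimiter='\t'))
print(rows[0]['symbol'], rows[0]['change'])  # every value comes back as str

# Convert explicitly when numbers are needed.
changes = [float(row['change']) for row in rows]
print(changes)
```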

3.2 iterator tools

itertools defines many iterator tools, such as subsequence tools:

import itertools
itertools.islice(iterable, start, stop[, step])
# islice('ABCDEF', 2, None) -> C, D, E, F

itertools.filterfalse(predicate, iterable)         # Keep only the elements for which predicate is false
# filterfalse(lambda x: x < 5, [1, 4, 6, 4, 1]) -> 6

itertools.takewhile(predicate, iterable)           # Stop at the first element for which predicate is false
# takewhile(lambda x: x < 5, [1, 4, 6, 4, 1]) -> 1, 4

itertools.dropwhile(predicate, iterable)           # Start at the first element for which predicate is false
# dropwhile(lambda x: x < 5, [1, 4, 6, 4, 1]) -> 6, 4, 1

itertools.compress(iterable, selectors)            # Keep the elements whose corresponding selector is true
# compress('ABCDEF', [1, 0, 1, 0, 1, 1]) -> A, C, E, F

Sequence sorting:

sorted(iterable, key=None, reverse=False)

itertools.groupby(iterable, key=None)              # Groups consecutive equal values, so the iterable needs to be sorted first
# groupby(sorted([1, 4, 6, 4, 1])) -> (1, iter1), (4, iter4), (6, iter6)

itertools.permutations(iterable, r=None)           # Permutations; each result is a tuple
# permutations('ABCD', 2) -> AB, AC, AD, BA, BC, BD, CA, CB, CD, DA, DB, DC

itertools.combinations(iterable, r)                # Combinations; each result is a tuple
itertools.combinations_with_replacement(...)
# combinations('ABCD', 2) -> AB, AC, AD, BC, BD, CD

Merge multiple sequences:

itertools.chain(*iterables)                        # Concatenates multiple sequences end to end
# chain('ABC', 'DEF') -> A, B, C, D, E, F

import heapq
heapq.merge(*iterables, key=None, reverse=False)   # Merges multiple already-sorted sequences into one sorted sequence
# merge('ABF', 'CDE') -> A, B, C, D, E, F

zip(*iterables)                                    # Stops when the shortest sequence is exhausted; the result can only be consumed once
itertools.zip_longest(*iterables, fillvalue=None)  # Stops when the longest sequence is exhausted; the result can only be consumed once
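
The requirement that groupby input be sorted is easy to overlook, so here is a short sketch with made-up words grouped by length. Without sorting, only adjacent items with equal keys end up in the same group:

```python
import itertools

words = ['apple', 'bob', 'cat', 'dodo', 'egg', 'fig']  # made-up data

# Unsorted input: length 3 appears as two separate groups.
unsorted_groups = [(k, list(g)) for k, g in itertools.groupby(words, key=len)]
print(unsorted_groups)

# Sort by the same key first to get one group per key.
by_length = sorted(words, key=len)
sorted_groups = {k: list(g) for k, g in itertools.groupby(by_length, key=len)}
print(sorted_groups)
```

Each group is itself a one-shot iterator, which is why the sketch converts it to a list before moving on to the next group.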

3.3 counter

A Counter counts the number of occurrences of each element in an iterable object.

import collections
# Create
counter = collections.Counter(iterable)

# Frequency
counter[key]                              # Number of occurrences of key (0 if key is absent)
# Returns the n most frequent elements with their counts. If n is None, returns all elements
counter.most_common(n=None)

# Insert / update
counter.update(iterable)
counter1 + counter2; counter1 - counter2  # Counter addition and subtraction

# Check whether two sequences are made up of the same elements
collections.Counter(list1) == collections.Counter(list2)
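
A short worked example of these operations on a made-up sentence:

```python
import collections

words = 'the quick brown fox jumps over the lazy dog the end'.split()
counter = collections.Counter(words)

the_count = counter['the']         # 3
absent_count = counter['missing']  # 0: missing keys do not raise KeyError
top = counter.most_common(1)       # [('the', 3)]

counter.update(['fox', 'fox'])     # counts are added to the existing ones
fox_count = counter['fox']         # 3

# Subtraction keeps only positive counts.
diff = collections.Counter('aab') - collections.Counter('abb')

# Two strings are anagrams when their element counts are equal.
is_anagram = collections.Counter('listen') == collections.Counter('silent')

print(the_count, absent_count, top, fox_count, dict(diff), is_anagram)
```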

3.4 Dict with default value

When accessing a Key that does not exist, defaultdict will set it to a default value.

import collections
collections.defaultdict(type)  # When accessing dict[key] for the first time, type will be called without parameters to provide an initial value for dict[key]
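
Two typical uses, grouping and counting, are sketched below with made-up data. Note that merely indexing a missing key creates it, while a membership test does not:

```python
import collections

pairs = [('fruit', 'apple'), ('veg', 'kale'), ('fruit', 'pear')]  # made-up data

groups = collections.defaultdict(list)  # missing keys start as []
for kind, name in pairs:
    groups[kind].append(name)           # no need to check whether kind exists

counts = collections.defaultdict(int)   # missing keys start as 0
for kind, _ in pairs:
    counts[kind] += 1

print(dict(groups))
print(dict(counts))

before = 'meat' in groups  # False: a membership test does not create the key
groups['meat']             # indexing calls list() and stores the result
after = 'meat' in groups   # True
print(before, after)
```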

3.5 ordered Dict

import collections
collections.OrderedDict(items=None)  # Preserves the original insertion order during iteration (a plain dict also does since Python 3.7)

4. High performance programming and debugging

4.1 output error and warning information

Output information to standard error

import sys
sys.stderr.write('')

Output warning information

import warnings
warnings.warn(message, category=UserWarning)  
# Other values of category include DeprecationWarning, SyntaxWarning, RuntimeWarning, ResourceWarning and FutureWarning

Control the output of warning messages

$ python -W all     # Output all warnings; equivalent to warnings.simplefilter('always')
$ python -W ignore  # Ignore all warnings; equivalent to warnings.simplefilter('ignore')
$ python -W error   # Convert all warnings to exceptions; equivalent to warnings.simplefilter('error')
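
The same filters can be applied from inside a program. The sketch below uses a hypothetical deprecated function old_api and shows a warning first being recorded and then being converted into an exception:

```python
import warnings


def old_api():
    """Hypothetical deprecated function."""
    warnings.warn('old_api() is deprecated', category=DeprecationWarning, stacklevel=2)
    return 42


# Record warnings instead of printing them; filter changes are undone on exit.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter('always')
    value = old_api()
print(value, [str(w.message) for w in caught])

# Turn warnings into exceptions, as python -W error does.
message = None
with warnings.catch_warnings():
    warnings.simplefilter('error')
    try:
        old_api()
    except DeprecationWarning as exc:
        message = str(exc)
print(message)
```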

4.2 Debug-only code

Sometimes we want to add extra code for debugging, usually some print statements. It can be written as:

# The debug-only part of the code
if __debug__:
    pass

Once debugging is complete, run with the -O option on the command line and this part of the code is ignored:

$ python -O main.py

4.3 Code style check

Pylint checks many code style and syntax problems and can find some errors before the code is run:

pylint main.py

4.4 Timing code

Profile a whole program:

$ python -m cProfile main.py

Time a block of code:

# Define the timing context manager
from contextlib import contextmanager
from time import perf_counter

@contextmanager
def timeblock(label):
    tic = perf_counter()
    try:
        yield
    finally:
        toc = perf_counter()
        print('%s : %s' % (label, toc - tic))

# Time a block of code
with timeblock('counting'):
    pass

Some principles for optimizing running time:

  • Focus on optimizing where the performance bottlenecks are, not all of the code.
  • Avoid global variables. Local variable lookup is faster than global variable lookup. Moving code that runs at global scope into a function usually makes it 15%-30% faster.
  • Avoid attribute access with the dot operator. Using from module import name is faster, and a frequently accessed member such as self.member can be put into a local variable.
  • Prefer built-in data structures. str, list, set and dict are implemented in C and run very fast.
  • Avoid creating unnecessary intermediate variables and calling copy.deepcopy().
  • String splicing such as a + ':' + b + ':' + c creates many useless intermediate objects, and ':'.join([a, b, c]) is much more efficient. In addition, consider whether the splicing is necessary at all. For example, print(':'.join([a, b, c])) is less efficient than print(a, b, c, sep=':').
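
A rough sketch of two of these principles follows: caching attribute lookups in local names, and joining strings instead of concatenating them. The function names are made up, and the timings printed by timeit depend on the machine and Python version, so treat them as something to measure rather than a guaranteed speed-up:

```python
import math
import timeit


def roots_attribute(n):
    result = []
    for i in range(n):
        result.append(math.sqrt(i))  # looks up math.sqrt and result.append on every iteration
    return result


def roots_local(n):
    sqrt = math.sqrt                 # cache the attribute lookups in local names
    result = []
    append = result.append
    for i in range(n):
        append(sqrt(i))
    return result


parts = ['a', 'b', 'c']
joined = ':'.join(parts)             # avoids the intermediate strings that repeated + creates

for func in (roots_attribute, roots_local):
    seconds = timeit.timeit(lambda: func(10_000), number=20)
    print(func.__name__, round(seconds, 4))
```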

5. Other skills

5.1 argmin and argmax

items = [2, 1, 3, 4]
argmin = min(range(len(items)), key=items.__getitem__)

argmax works the same way, with max in place of min.
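
For completeness, a sketch of both on the same list, plus an equivalent spelling with enumerate that some readers may find easier to follow:

```python
items = [2, 1, 3, 4]

argmin = min(range(len(items)), key=items.__getitem__)
argmax = max(range(len(items)), key=items.__getitem__)
print(argmin, argmax)  # 1 3

# Equivalent: pick the (index, value) pair with the largest value, then keep the index.
argmax_enum = max(enumerate(items), key=lambda pair: pair[1])[0]
print(argmax_enum)     # 3
```

When several elements tie, min and max return the first one encountered, so both forms give the lowest qualifying index.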

5.2 transpose 2D list

A = [['a11', 'a12'], ['a21', 'a22'], ['a31', 'a32']]
A_transpose = list(zip(*A))  # list of tuple
A_transpose = [list(col) for col in zip(*A)]  # list of list

5.3 Reshape a one-dimensional list into a two-dimensional list

A = [1, 2, 3, 4, 5, 6]

# Preferred.
list(zip(*[iter(A)] * 2))
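
This idiom works because [iter(A)] * 2 holds the same iterator twice, so zip pulls consecutive elements from it. The sketch below spells that out on a list of odd length (chosen here to show the edge case) and gives two alternatives that keep the leftover element:

```python
import itertools

A = [1, 2, 3, 4, 5, 6, 7]  # odd length, to show what happens to the last element

it = iter(A)
pairs = list(zip(it, it))  # same as list(zip(*[iter(A)] * 2)); the trailing 7 is dropped
print(pairs)

# zip_longest pads the last row instead of dropping it.
padded = list(itertools.zip_longest(*[iter(A)] * 2, fillvalue=None))
print(padded)

# Slicing keeps a shorter last row and produces lists rather than tuples.
sliced = [A[i:i + 2] for i in range(0, len(A), 2)]
print(sliced)
```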

Author: Zhang Hao https://zhuanlan.zhihu.com/p/48293468

Added by nicholasstephan on Fri, 04 Mar 2022 07:55:59 +0200