Packed with practical content! 20 Python tips

1. Easily confused operations

This section compares Python operations that are easy to mix up.

1.1 random sampling with replacement and without replacement

import random
random.choices(seq, k=1)  # List of length k, sampled with replacement
random.sample(seq, k)     # List of length k, sampled without replacement
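
To make the difference concrete, here is a minimal sketch. The list seq and the fixed seed are assumptions made only so the demo is repeatable.

```python
import random

random.seed(0)  # fixed seed, only to make the demo repeatable
seq = ['a', 'b', 'c', 'd']

# With replacement: k may exceed len(seq) and elements may repeat.
with_replacement = random.choices(seq, k=6)

# Without replacement: each position of seq is used at most once.
without_replacement = random.sample(seq, k=3)

print(with_replacement)
print(without_replacement)

# Asking sample() for more items than seq contains raises ValueError.
try:
    random.sample(seq, k=5)
except ValueError as exc:
    print('sample failed:', exc)
```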

1.2 parameters of lambda function

func = lambda y: x + y          # x is looked up each time the function is called
func = lambda y, x=x: x + y     # x is bound once, when the function is defined
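
A small sketch of what the two bindings mean in practice. The function name binding_demo and the loop example are illustrative assumptions, not part of the original tip.

```python
def binding_demo():
    x = 10
    late = lambda y: x + y         # x is looked up when the lambda is called
    early = lambda y, x=x: x + y   # the current x (10) is stored as a default
    x = 20                         # rebind x after both lambdas exist
    return late(1), early(1)

print(binding_demo())  # (21, 11)

# The same effect in a loop: every "late" lambda sees the final value of i.
late_funcs = [lambda: i for i in range(3)]
early_funcs = [lambda i=i: i for i in range(3)]
print([f() for f in late_funcs])   # [2, 2, 2]
print([f() for f in early_funcs])  # [0, 1, 2]
```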

1.3 copy and deepcopy

import copy
y = copy.copy(x)      # Copy only the top layer
y = copy.deepcopy(x)  # Copy all nested parts

Copying is easy to confuse with aliasing:

a = [1, 2, [3, 4]]

# Alias.
b_alias = a  
assert b_alias == a and b_alias is a

# Shallow copy.
b_shallow_copy = a[:]  
assert b_shallow_copy == a and b_shallow_copy is not a and b_shallow_copy[2] is a[2]

# Deep copy.
import copy
b_deep_copy = copy.deepcopy(a)  
assert b_deep_copy == a and b_deep_copy is not a and b_deep_copy[2] is not a[2]

Modifying an object through its alias changes the original, because both names refer to the same object. The elements of a shallow copy are aliases of the elements of the original list, whereas a deep copy copies recursively, so modifying a deep copy does not affect the original.
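
The following sketch mutates the original list after the three kinds of "copy" have been made, which shows who shares what. The variable names are chosen for illustration.

```python
import copy

a = [1, 2, [3, 4]]
alias = a                 # same object
shallow = a[:]            # new outer list, shared inner list
deep = copy.deepcopy(a)   # new outer list and new inner list

a[2].append(5)   # mutate the nested list in place
a[0] = 100       # rebind a top-level slot of the original

print(alias)    # [100, 2, [3, 4, 5]]  sees both changes
print(shallow)  # [1, 2, [3, 4, 5]]    sees only the nested change
print(deep)     # [1, 2, [3, 4]]       sees neither change
```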

1.4 == and is

x == y  # Do the two referenced objects have the same value?
x is y  # Do the two references point to the same object?
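
A short illustration with two equal but distinct lists. Comparing against None with is is the usual idiom, shown at the end.

```python
a = [1, 2, 3]
b = [1, 2, 3]   # equal value, different object
c = a           # same object

print(a == b, a is b)  # True False
print(a == c, a is c)  # True True

value = None
print(value is None)   # True
```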

1.5 type checking

type(a) == int      # Exact type match only; instances of subclasses do not match
isinstance(a, int)  # Respects inheritance; instances of subclasses also match
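
A minimal sketch of the difference. The classes Animal and Dog are hypothetical and exist only for this example.

```python
class Animal:
    pass

class Dog(Animal):
    pass

d = Dog()
print(type(d) == Animal)      # False: the exact type is Dog
print(isinstance(d, Animal))  # True: Dog inherits from Animal

# bool is a subclass of int, so the two checks disagree here as well.
print(type(True) == int)      # False
print(isinstance(True, int))  # True

# isinstance also accepts a tuple of types.
print(isinstance(3.0, (int, float)))  # True
```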

1.6 string search

str.find(sub[, start[, end]]); str.rfind(...)     # Returns -1 if sub is not found
str.index(sub[, start[, end]]); str.rindex(...)   # Raises ValueError if sub is not found
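
A quick comparison on a sample string (the string itself is an arbitrary choice). Because find returns 0 for a match at the start, a plain membership test with in is usually safer than testing the return value for truthiness.

```python
s = 'hello world'

print(s.find('o'))    # 4, first occurrence
print(s.rfind('o'))   # 7, last occurrence
print(s.find('z'))    # -1, not found

try:
    s.index('z')
except ValueError as exc:
    print('index failed:', exc)

# find() returns 0 for a match at position 0, which is falsy.
print(bool(s.find('hello')))  # False, although the substring is present
print('hello' in s)           # True
```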

1.7 reverse list indexing

This is just a matter of habit. Forward indexing starts from 0. If you want reverse indexing to start from 0 as well, use ~.

print(a[-1], a[-2], a[-3])
print(a[~0], a[~1], a[~2])
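
This works because ~i equals -i - 1 for integers. A small sketch follows; the list a and the helper is_palindrome are assumptions added for illustration.

```python
a = ['first', 'second', 'third', 'fourth']

# ~i is -i - 1, so a[~i] is the i-th element counted from the end.
for i in range(len(a)):
    assert a[~i] == a[-i - 1] == a[len(a) - 1 - i]

print(a[0], a[~0])  # first fourth

def is_palindrome(seq):
    """Compare the i-th item from the front with the i-th from the back."""
    return all(seq[i] == seq[~i] for i in range(len(seq) // 2))

print(is_palindrome('level'))   # True
print(is_palindrome('python'))  # False
```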

2. Common tools

2.1 reading and writing CSV files

import csv
# Reading and writing without a header
with open(name, 'rt', encoding='utf-8', newline='') as f:  # newline='' leaves line-ending handling to the csv module
    for row in csv.reader(f):
        print(row[0], row[1])  # Every value read from a CSV file is a str
with open(name, mode='wt', newline='') as f:
    f_csv = csv.writer(f)
    f_csv.writerow(['symbol', 'change'])

# Reading and writing with a header
with open(name, mode='rt', newline='') as f:
    for row in csv.DictReader(f):
        print(row['symbol'], row['change'])
with open(name, mode='wt', newline='') as f:
    header = ['symbol', 'change']
    f_csv = csv.DictWriter(f, header)
    f_csv.writeheader()
    f_csv.writerow({'symbol': xx, 'change': xx})

Note that reading a CSV file with a very large field raises an error: _csv.Error: field larger than field limit (131072). It can be solved by raising the limit:

import sys
csv.field_size_limit(sys.maxsize)

csv can also read tab-separated data (fields delimited by \t):

reader = csv.reader(f, delimiter='\t')
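
A self-contained round trip, sketched with an in-memory io.StringIO buffer instead of a real file so it can run anywhere. The rows are made-up sample data.

```python
import csv
import io

# In-memory stand-in for a file opened with newline=''.
buffer = io.StringIO(newline='')

writer = csv.DictWriter(buffer, fieldnames=['symbol', 'change'])
writer.writeheader()
writer.writerow({'symbol': 'AA', 'change': -0.18})   # made-up sample rows
writer.writerow({'symbol': 'AIG', 'change': 0.15})

buffer.seek(0)  # rewind before reading back
rows = list(csv.DictReader(buffer))

print(rows[0]['symbol'], rows[0]['change'])  # AA -0.18

# Values come back as str, so convert explicitly when numbers are needed.
changes = [float(row['change']) for row in rows]
print(changes)  # [-0.18, 0.15]
```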

2.2 iterator tools

Many iterator tools are defined in itertools, such as subsequence tools:

import itertools
itertools.islice(iterable, stop)                   # Or islice(iterable, start, stop[, step])
# islice('ABCDEF', 2, None) -> C, D, E, F

itertools.filterfalse(predicate, iterable)         # Keep only the elements for which predicate is false
# filterfalse(lambda x: x < 5, [1, 4, 6, 4, 1]) -> 6

itertools.takewhile(predicate, iterable)           # Yield elements until predicate is first false, then stop
# takewhile(lambda x: x < 5, [1, 4, 6, 4, 1]) -> 1, 4

itertools.dropwhile(predicate, iterable)           # Skip elements while predicate is true, then yield all the rest
# dropwhile(lambda x: x < 5, [1, 4, 6, 4, 1]) -> 6, 4, 1

itertools.compress(iterable, selectors)            # Keep the elements whose corresponding selector is true
# compress('ABCDEF', [1, 0, 1, 0, 1, 1]) -> A, C, E, F

Sequence sorting:

sorted(iterable, key=None, reverse=False)

itertools.groupby(iterable, key=None)              # Groups consecutive equal items, so sort the iterable first
# groupby(sorted([1, 4, 6, 4, 1])) -> (1, iter1), (4, iter4), (6, iter6)

itertools.permutations(iterable, r=None)           # Yields tuples
# permutations('ABCD', 2) -> AB, AC, AD, BA, BC, BD, CA, CB, CD, DA, DB, DC

itertools.combinations(iterable, r)                # Yields tuples; r is required
# combinations('ABCD', 2) -> AB, AC, AD, BC, BD, CD
itertools.combinations_with_replacement(...)
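
Why groupby needs sorted input is easiest to see by running it both ways. In this sketch the word list and the key function are assumptions chosen for illustration.

```python
import itertools

words = ['apple', 'avocado', 'banana', 'blueberry', 'cherry', 'apricot']
first_letter = lambda w: w[0]

# Without sorting, only adjacent items are grouped, so 'a' appears twice.
unsorted_groups = [(k, list(g)) for k, g in itertools.groupby(words, key=first_letter)]
print(unsorted_groups)

# Sort with the same key first to get one group per key.
sorted_words = sorted(words, key=first_letter)
sorted_groups = {k: list(g) for k, g in itertools.groupby(sorted_words, key=first_letter)}
print(sorted_groups)
```

Each group is itself an iterator that is invalidated when groupby advances, which is why the sketch turns every group into a list right away.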

Merge multiple sequences:

itertools.chain(*iterables)                        # Concatenate several iterables end to end
# chain('ABC', 'DEF') -> A, B, C, D, E, F

import heapq
heapq.merge(*iterables, key=None, reverse=False)   # Merge several already-sorted inputs into one sorted stream
# merge('ABF', 'CDE') -> A, B, C, D, E, F

zip(*iterables)                                    # Stops when the shortest input is exhausted; the result can be consumed only once
itertools.zip_longest(*iterables, fillvalue=None)  # Stops when the longest input is exhausted; the result can be consumed only once
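
A brief sketch of the merging tools on small made-up inputs, including the "consumed only once" behaviour of zip.

```python
import heapq
import itertools

# heapq.merge assumes each input is already sorted.
merged = list(heapq.merge([1, 4, 7], [2, 3, 9]))
print(merged)  # [1, 2, 3, 4, 7, 9]

pairs = zip('ABC', [1, 2])
first_pass = list(pairs)
second_pass = list(pairs)   # the zip iterator is already exhausted
print(first_pass)   # [('A', 1), ('B', 2)]
print(second_pass)  # []

padded = list(itertools.zip_longest('ABC', [1, 2], fillvalue=0))
print(padded)  # [('A', 1), ('B', 2), ('C', 0)]
```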

2.3 counter

A Counter counts the number of occurrences of each element in an iterable object.

import collections
# Create
counter = collections.Counter(iterable)

# Frequency
counter[key]                 # Number of occurrences of key (0 if key is absent)
# Return the n most frequent elements with their counts. If n is None, return all elements
counter.most_common(n=None)

# Insert / update
counter.update(iterable)
counter1 + counter2; counter1 - counter2  # Counter addition and subtraction

# Check whether two sequences consist of the same elements
collections.Counter(list1) == collections.Counter(list2)
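
The operations above, run on small sample strings chosen for illustration:

```python
from collections import Counter

counter = Counter('abracadabra')
print(counter['a'])            # 5
print(counter['z'])            # 0, a missing key does not raise KeyError
print(counter.most_common(1))  # [('a', 5)]

counter.update('aaa')          # add more observations
print(counter['a'])            # 8

total = Counter('ab') + Counter('bc')        # counts are added per key
difference = Counter('aab') - Counter('abc')  # only positive counts are kept
print(total)
print(difference)

# Same elements with the same multiplicities, regardless of order.
print(Counter('listen') == Counter('silent'))  # True
```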

2.4 dict with default values

When a nonexistent key is accessed, defaultdict sets it to a default value.

import collections
collections.defaultdict(default_factory)  # The first time d[key] is accessed, default_factory is called with no arguments to provide the initial value of d[key]
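
A typical use is grouping values without checking whether the key exists yet. The word list in this sketch is a made-up example.

```python
from collections import defaultdict

groups = defaultdict(list)   # missing keys start as an empty list
for word in ['apple', 'avocado', 'banana']:
    groups[word[0]].append(word)

print(dict(groups))  # {'a': ['apple', 'avocado'], 'b': ['banana']}

# A membership test does not create the key, but indexing does.
print('z' in groups)  # False
groups['z']           # triggers the default factory
print('z' in groups)  # True
print(groups['z'])    # []
```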

2.5 ordered Dict

import collections
collections.OrderedDict(items=None)  # Preserves insertion order when iterating (a plain dict also does since Python 3.7)

3. High performance programming and debugging

3.1 output error and warning messages

Output information to standard error

import sys
sys.stderr.write('')

Output warning information

import warnings
warnings.warn(message, category=UserWarning)  
# Possible values of category include DeprecationWarning, SyntaxWarning, RuntimeWarning, ResourceWarning and FutureWarning

Control the output of warning messages

$ python -W all     # Show all warnings; equivalent to warnings.simplefilter('always')
$ python -W ignore  # Ignore all warnings; equivalent to warnings.simplefilter('ignore')
$ python -W error   # Turn all warnings into exceptions; equivalent to warnings.simplefilter('error')
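
The same filters can be applied from inside a program. The sketch below uses warnings.catch_warnings so the filter change stays local; the function old_api is a hypothetical example.

```python
import warnings

def old_api():
    """Hypothetical deprecated function used only for this demo."""
    warnings.warn('old_api() is deprecated', category=DeprecationWarning, stacklevel=2)
    return 42

# Record warnings instead of printing them.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter('always')
    result = old_api()
print(result, caught[0].category.__name__)  # 42 DeprecationWarning

# Turn warnings into exceptions, like python -W error.
raised = False
with warnings.catch_warnings():
    warnings.simplefilter('error')
    try:
        old_api()
    except DeprecationWarning as exc:
        raised = True
        print('raised:', exc)
```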

3.2 testing in code

Sometimes we want to add debug-only code, usually a few print statements. It can be written as:

# Debug-only section of the code
if __debug__:
    pass

Once debugging is complete, running Python with the -O option on the command line skips this part of the code:

$ python -O main.py
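
A minimal sketch of a debug-only block in a real function. The function mean is a hypothetical example; note that assert statements are also removed when Python runs in optimized mode.

```python
def mean(values):
    if __debug__:
        # This block is skipped entirely when Python runs in optimized mode.
        print('mean() received', len(values), 'values')
    # assert statements are also removed in optimized mode.
    assert values, 'values must not be empty'
    return sum(values) / len(values)

print(mean([1, 2, 3]))  # 2.0
```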

3.3 code style check

pylint checks many code style and syntax issues and can find some errors before the code runs:

$ pylint main.py

3.4 timing code

Profile the whole program

$ python -m cProfile main.py

Time a single block of code

# Definition of the timing context manager
from contextlib import contextmanager
from time import perf_counter

@contextmanager
def timeblock(label):
    tic = perf_counter()
    try:
        yield
    finally:
        toc = perf_counter()
        print('%s : %s' % (label, toc - tic))

# Timing a code block
with timeblock('counting'):
    pass

Some principles of code optimization

  • Focus on optimizing the performance bottlenecks, not all of the code.
  • Avoid global variables. Local variable lookups are faster than global ones; moving code that runs at global scope into a function usually makes it 15% - 30% faster.
  • Avoid repeated attribute access with the . operator. Using from module import name is faster, and a frequently accessed member such as self.member can be stored in a local variable.
  • Prefer built-in data structures. str, list, set, dict, etc. are implemented in C and run very fast.
  • Avoid creating unnecessary intermediate variables and unnecessary calls to copy.deepcopy().
  • String concatenation such as a + ':' + b + ':' + c creates many useless intermediate strings; ':'.join([a, b, c]) is much more efficient. Also consider whether concatenation is needed at all. For example, print(':'.join([a, b, c])) is less efficient than print(a, b, c, sep=':').
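
The attribute-access principle can be sketched as follows. The functions slow_norms and fast_norms are hypothetical, and the size of any speedup depends on the Python version and the machine, so measure before relying on it.

```python
import math
import timeit

def slow_norms(points):
    result = []
    for x, y in points:
        # math.sqrt and result.append are looked up on every iteration
        result.append(math.sqrt(x * x + y * y))
    return result

def fast_norms(points):
    sqrt = math.sqrt          # cache the attribute lookups in local names
    result = []
    append = result.append
    for x, y in points:
        append(sqrt(x * x + y * y))
    return result

points = [(i, i + 1) for i in range(1000)]
assert slow_norms(points) == fast_norms(points)  # same result either way

print('attribute access:', timeit.timeit(lambda: slow_norms(points), number=200))
print('local names:     ', timeit.timeit(lambda: fast_norms(points), number=200))
```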

4. Other skills

4.1 argmin and argmax

items = [2, 1, 3, 4]
argmin = min(range(len(items)), key=items.__getitem__)

argmax works the same way, with max in place of min.
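
Both variants side by side, plus an equivalent form based on enumerate (the name argmax_alt is an assumption for this sketch):

```python
items = [2, 1, 3, 4]

argmin = min(range(len(items)), key=items.__getitem__)
argmax = max(range(len(items)), key=items.__getitem__)
print(argmin, argmax)  # 1 3

# Equivalent formulation: compare (index, value) pairs by value.
argmax_alt = max(enumerate(items), key=lambda pair: pair[1])[0]
print(argmax_alt)  # 3
```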

4.2 transpose 2D list

A = [['a11', 'a12'], ['a21', 'a22'], ['a31', 'a32']]
A_transpose = list(zip(*A))  # list of tuple
A_transpose = [list(col) for col in zip(*A)]  # list of list
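
A quick check of the result, with one caveat: zip stops at the shortest row, so ragged input is silently truncated. The ragged list is a made-up example.

```python
A = [['a11', 'a12'], ['a21', 'a22'], ['a31', 'a32']]

A_transpose = [list(col) for col in zip(*A)]
print(A_transpose)  # [['a11', 'a21', 'a31'], ['a12', 'a22', 'a32']]

# Transposing twice gives back the original.
round_trip = [list(col) for col in zip(*A_transpose)]
print(round_trip == A)  # True

# Rows of unequal length are cut to the shortest one.
ragged = [[1, 2, 3], [4, 5]]
ragged_transpose = list(zip(*ragged))
print(ragged_transpose)  # [(1, 4), (2, 5)]
```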

4.3 expand one-dimensional list into two-dimensional list

A = [1, 2, 3, 4, 5, 6]

# Preferred.
list(zip(*[iter(A)] * 2))
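
The trick works because [iter(A)] * 2 holds the same iterator twice, so zip pulls consecutive items from it. A sketch with a hypothetical helper chunk, and a slicing alternative that keeps a trailing remainder:

```python
A = [1, 2, 3, 4, 5, 6]

it = iter(A)
pairs = list(zip(it, it))   # equivalent to zip(*[iter(A)] * 2)
print(pairs)  # [(1, 2), (3, 4), (5, 6)]

def chunk(seq, n):
    """Split seq into lists of length n; leftover items are dropped."""
    return [list(group) for group in zip(*[iter(seq)] * n)]

print(chunk(A, 3))                # [[1, 2, 3], [4, 5, 6]]
print(chunk([1, 2, 3, 4, 5], 2))  # [[1, 2], [3, 4]]

def chunk_keep_rest(seq, n):
    """Slicing variant that keeps a shorter final chunk."""
    return [seq[i:i + n] for i in range(0, len(seq), n)]

print(chunk_keep_rest([1, 2, 3, 4, 5], 2))  # [[1, 2], [3, 4], [5]]
```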

Keywords: Python

Added by linuxdoniv on Wed, 05 Jan 2022 17:26:27 +0200