Chapter 14: Testing, Debugging, and Exceptions
Testing rocks, but debugging? Not so much. The fact that there's no compiler to analyze your code before Python executes it makes testing a critical part of development. The goal of this chapter is to discuss common problems related to testing, debugging, and exception handling. It is not meant to be a brief introduction to test-driven development or the unittest module; the author assumes the reader is already familiar with testing concepts.
14.1 Testing Output Sent to stdout
Problem
Your program has a method whose output goes to standard output (sys.stdout); that is, it prints text to the screen. You want to write a test that proves that, given a certain input, the proper output is displayed.
Solution
The patch() function in the unittest.mock module makes this simple. For a single test, it can replace sys.stdout with a stand-in object and then roll the change back, all without producing a pile of temporary variables or leaking state variables into your test cases.
As an example, we define the following function in a module named mymodule:
# mymodule.py

def urlprint(protocol, host, domain):
    url = '{}://{}.{}'.format(protocol, host, domain)
    print(url)
By default, the built-in print function sends its output to sys.stdout. To test that the output is actually arriving there, you can substitute a mock object for sys.stdout and then make assertions about the result. The patch() method of the unittest.mock module conveniently replaces an object only within the context of a running test, restoring the original automatically when the test completes. Here is the test code for mymodule:
from io import StringIO
from unittest import TestCase
from unittest.mock import patch
import mymodule

class TestURLPrint(TestCase):
    def test_url_gets_to_stdout(self):
        protocol = 'http'
        host = 'www'
        domain = 'example.com'
        expected_url = '{}://{}.{}\n'.format(protocol, host, domain)

        with patch('sys.stdout', new=StringIO()) as fake_out:
            mymodule.urlprint(protocol, host, domain)
            self.assertEqual(fake_out.getvalue(), expected_url)
Discussion
The urlprint() function takes three arguments, and the test starts by setting a value for each of them. The expected_url variable is set to a string containing the expected output.

The unittest.mock.patch() function is used as a context manager here, with a StringIO object standing in for sys.stdout. The fake_out variable is the mock object created during this process; it can be used inside the with block to perform various checks. When the with statement ends, patch restores everything to the state it was in before the test started. Note that certain C extensions to Python may ignore the setting of sys.stdout and write directly to standard output. That scenario is beyond the scope of this recipe, which applies to pure Python code. If you really need to capture I/O from a C extension, you can open a temporary file and redirect the standard-output file descriptor to it. For more information about capturing I/O as strings and about StringIO objects, see section 5.6.
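For the curious, here is a minimal sketch of that temporary-file idea, assuming a POSIX-style platform. The helper name capture_fd_stdout() is hypothetical; the point is that duplicating file descriptor 1 captures output even from C extensions that bypass sys.stdout:

import os
import sys
import tempfile

def capture_fd_stdout(func, *args, **kwargs):
    # Hypothetical helper: capture everything written to fd 1
    sys.stdout.flush()
    saved_fd = os.dup(1)                      # keep the real stdout
    with tempfile.TemporaryFile(mode='w+') as tmp:
        os.dup2(tmp.fileno(), 1)              # point fd 1 at the temp file
        try:
            func(*args, **kwargs)
            sys.stdout.flush()
        finally:
            os.dup2(saved_fd, 1)              # restore the real stdout
            os.close(saved_fd)
        tmp.seek(0)
        return tmp.read()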
14.2 Patching Objects in Unit Tests
Problem
In the unit tests you write, you need to patch selected objects so you can make assertions about how they were used (for example, asserting that they were called with certain parameters, or that certain attributes were accessed).
Solution
The unittest.mock.patch() function can be used to solve this problem. patch() can be used as a decorator, as a context manager, or on its own, although this last form is less common. For example, here it is used as a decorator:
from unittest.mock import patch
import example

@patch('example.func')
def test1(x, mock_func):
    example.func(x)       # Uses patched example.func
    mock_func.assert_called_with(x)
It can also be used as a context manager:
with patch('example.func') as mock_func:
    example.func(x)       # Uses patched example.func
    mock_func.assert_called_with(x)
Finally, you can also use it to patch manually:
p = patch('example.func')
mock_func = p.start()
example.func(x)
mock_func.assert_called_with(x)
p.stop()
If necessary, you can stack decorators or nest context managers to patch multiple objects. For example:
@patch('example.func1')
@patch('example.func2')
@patch('example.func3')
def test1(mock1, mock2, mock3):
    ...

def test2():
    with patch('example.patch1') as mock1, \
         patch('example.patch2') as mock2, \
         patch('example.patch3') as mock3:
        ...
Discussion
patch() works by taking the fully qualified name of an existing object and replacing it with a new value. The original value is then restored when the decorated function or context manager completes. By default, values are replaced with MagicMock instances. For example:
>>> x = 42
>>> with patch('__main__.x'):
...     print(x)
...
<MagicMock name='x' id='4314230032'>
>>> x
42
>>>
However, you can replace the value with anything you want by providing a second parameter to patch():
>>> x
42
>>> with patch('__main__.x', 'patched_value'):
...     print(x)
...
patched_value
>>> x
42
>>>
The MagicMock instances used as replacement values can simulate callables and instances. They record information about how the object was used and allow you to make assertions about it. For example:
>>> from unittest.mock import MagicMock
>>> m = MagicMock(return_value=10)
>>> m(1, 2, debug=True)
10
>>> m.assert_called_with(1, 2, debug=True)
>>> m.assert_called_with(1, 2)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File ".../unittest/mock.py", line 726, in assert_called_with
    raise AssertionError(msg)
AssertionError: Expected call: mock(1, 2)
Actual call: mock(1, 2, debug=True)
>>>
>>> m.upper.return_value = 'HELLO'
>>> m.upper('hello')
'HELLO'
>>> assert m.upper.called
>>> m.split.return_value = ['hello', 'world']
>>> m.split('hello world')
['hello', 'world']
>>> m.split.assert_called_with('hello world')
>>>
>>> m['blah']
<MagicMock name='mock.__getitem__()' id='4314412048'>
>>> m.__getitem__.called
True
>>> m.__getitem__.assert_called_with('blah')
>>>
More often than not, operations like these are carried out inside a unit test. For example, suppose you already have a function like this:
# example.py
from urllib.request import urlopen
import csv

def dowprices():
    u = urlopen('http://finance.yahoo.com/d/quotes.csv?s=@^DJI&f=sl1')
    lines = (line.decode('utf-8') for line in u)
    rows = (row for row in csv.reader(lines) if len(row) == 2)
    prices = { name: float(price) for name, price in rows }
    return prices
Normally, this function uses urlopen() to fetch data from the Web and parse it. In a unit test, however, you can give it a predefined dataset instead. Here is an example that does so with a patch:
import unittest
from unittest.mock import patch
import io
import example

sample_data = io.BytesIO(b'''\
"IBM",91.1\r
"AA",13.25\r
"MSFT",27.72\r
\r
''')

class Tests(unittest.TestCase):
    @patch('example.urlopen', return_value=sample_data)
    def test_dowprices(self, mock_urlopen):
        p = example.dowprices()
        self.assertTrue(mock_urlopen.called)
        self.assertEqual(p,
                         {'IBM': 91.1,
                          'AA': 13.25,
                          'MSFT': 27.72})

if __name__ == '__main__':
    unittest.main()
In this example, the urlopen() function in the example module is replaced with a mock object that returns a BytesIO() containing the test data.
One subtle point: the patch targets example.urlopen, not urllib.request.urlopen. When you create patches, you have to use the names as they appear in the code under test. Since the test code uses from urllib.request import urlopen, the urlopen() function called inside dowprices() actually lives in the example module.
This recipe merely gives a taste of the unittest.mock module. For more advanced features, consult the official documentation.
14.3 Testing for Exceptional Conditions in Unit Tests
Problem
You want to write a test case that cleanly verifies whether a particular exception is raised.
Solution
For testing exceptions, use the assertRaises() method. For example, to test that a function raises a ValueError exception, write this:
import unittest

# A simple function to illustrate
def parse_int(s):
    return int(s)

class TestConversion(unittest.TestCase):
    def test_bad_int(self):
        self.assertRaises(ValueError, parse_int, 'N/A')
If you need to test the exception's value in some way, a different approach is needed. For example:
import errno

class TestIO(unittest.TestCase):
    def test_file_not_found(self):
        try:
            f = open('/file/not/found')
        except IOError as e:
            self.assertEqual(e.errno, errno.ENOENT)
        else:
            self.fail('IOError not raised')
Discussion
The assertRaises() method provides a convenient way to test for the presence of an exception. A common pitfall is writing tests that try to detect exceptions by hand. For example:
class TestConversion(unittest.TestCase):
    def test_bad_int(self):
        try:
            r = parse_int('N/A')
        except ValueError as e:
            self.assertEqual(type(e), ValueError)
The problem with this approach is that it is easy to miss edge cases, such as when no exception is raised at all. To cover that, you would need to add an extra check, as follows:
class TestConversion(unittest.TestCase):
    def test_bad_int(self):
        try:
            r = parse_int('N/A')
        except ValueError as e:
            self.assertEqual(type(e), ValueError)
        else:
            self.fail('ValueError not raised')
The assertRaises() method simply takes care of all of these details, so you should use it instead.
The one limitation of assertRaises() is that it provides no way to test the value of the exception object. To do that, use the assertRaisesRegex() method, which tests both for the exception and for a regular-expression match against the exception's string representation. For example:
class TestConversion(unittest.TestCase):
    def test_bad_int(self):
        self.assertRaisesRegex(ValueError, 'invalid literal .*',
                               parse_int, 'N/A')
A little-known fact about assertRaises() and assertRaisesRegex() is that they can also be used as context managers:
class TestConversion(unittest.TestCase):
    def test_bad_int(self):
        with self.assertRaisesRegex(ValueError, 'invalid literal .*'):
            r = parse_int('N/A')
This form is useful when your test involves multiple steps of execution, as in the sketch below.
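As a hedged illustration (reusing the parse_int() function from above, with a hypothetical multi-step scenario), setup steps can run normally before the single statement that is expected to fail:

class TestConversion(unittest.TestCase):
    def test_bad_int_with_setup(self):
        # Setup steps: these conversions must succeed
        n = parse_int('42')
        self.assertEqual(n, 42)
        # Only this final step is expected to raise
        with self.assertRaises(ValueError):
            parse_int('N/A')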
14.4 Logging Test Output to a File
Problem
You want to write the output of unit tests to a file instead of printing to standard output.
Solution
A common technique for running unit tests is to include a small code fragment like this at the bottom of your test file:
import unittest

class MyTest(unittest.TestCase):
    pass

if __name__ == '__main__':
    unittest.main()
This makes the test file executable and prints the results of running the tests to standard output. If you want to redirect that output instead, you need to unwind the main() call a bit and write your own main() function like this:
import sys

def main(out=sys.stderr, verbosity=2):
    loader = unittest.TestLoader()
    suite = loader.loadTestsFromModule(sys.modules[__name__])
    unittest.TextTestRunner(out, verbosity=verbosity).run(suite)

if __name__ == '__main__':
    with open('testing.out', 'w') as f:
        main(f)
Discussion
The interesting thing about this recipe is not so much redirecting test results to a file as the fact that doing so exposes some notable inner workings of the unittest module.
At a basic level, the unittest module first assembles a test suite. This suite contains the various testing methods you defined. Once the suite has been assembled, the tests it contains can be executed.
These two steps are separate from each other. The unittest.TestLoader instance created in the solution is used to assemble the test suite; loadTestsFromModule() is one of the methods it defines for gathering tests. In this case, it scans a module for TestCase classes and extracts test methods from them. If you want something more fine-grained, the loadTestsFromTestCase() method can be used to pull test methods from a single class that inherits from TestCase.

The TextTestRunner class is an example of a test-runner class. Its main purpose is to execute the test methods contained in a test suite. It is the same runner that sits behind the unittest.main() function; here, however, we configure it at a slightly lower level, supplying the output file and the verbosity level.

Although this recipe involves only a small amount of code, it hints at how to customize the unittest framework further. To customize how test suites are assembled, you can do more with the TestLoader class. To customize how tests run, you can build your own test-runner class that emulates the functionality of TextTestRunner. Both topics are beyond the scope of this recipe, but the documentation for the unittest module covers the underlying protocols in depth.
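For example, here is a small self-contained sketch (test names hypothetical) of assembling a suite from a single class with loadTestsFromTestCase() and running it with a customized TextTestRunner:

import sys
import unittest

class MyTest(unittest.TestCase):
    def test_pass(self):
        self.assertTrue(True)

if __name__ == '__main__':
    loader = unittest.TestLoader()
    # Gather tests from just this one class
    suite = loader.loadTestsFromTestCase(MyTest)
    runner = unittest.TextTestRunner(stream=sys.stderr, verbosity=2)
    runner.run(suite)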
14.5 Skipping or Anticipating Test Failures
Problem
You want to skip selected tests, or mark certain tests as anticipated failures, in your unit tests.
Solution
The unittest module has decorators that can be applied to individual test methods to control their handling. For example:
import unittest
import os
import platform

class Tests(unittest.TestCase):
    def test_0(self):
        self.assertTrue(True)

    @unittest.skip('skipped test')
    def test_1(self):
        self.fail('should have failed!')

    @unittest.skipIf(os.name == 'posix', 'Not supported on Unix')
    def test_2(self):
        import winreg

    @unittest.skipUnless(platform.system() == 'Darwin', 'Mac specific test')
    def test_3(self):
        self.assertTrue(True)

    @unittest.expectedFailure
    def test_4(self):
        self.assertEqual(2+2, 5)

if __name__ == '__main__':
    unittest.main()
If you run this code on a Mac, you will get the following output:
bash % python3 testsample.py -v
test_0 (__main__.Tests) ... ok
test_1 (__main__.Tests) ... skipped 'skipped test'
test_2 (__main__.Tests) ... skipped 'Not supported on Unix'
test_3 (__main__.Tests) ... ok
test_4 (__main__.Tests) ... expected failure

----------------------------------------------------------------------
Ran 5 tests in 0.002s

OK (skipped=2, expected failures=1)
Discussion
The skip() decorator can be used to skip over a test you don't want to run at all. skipIf() and skipUnless() are useful for writing tests that only apply to a particular platform, Python version, or some other dependency. The @unittest.expectedFailure decorator marks tests that you know will fail, but for which you don't want the test framework to report extra information.
The skipping decorators can also be applied to entire testing classes. For example:
@unittest.skipUnless(platform.system() == 'Darwin', 'Mac specific tests')
class DarwinTests(unittest.TestCase):
    pass
14.6 Handling Multiple Exceptions
Problem
You have a piece of code that can throw any of several different exceptions, and you need to account for all of the potential exceptions without creating duplicate code.
Solution
If you can handle different exceptions all using a single block of code, they can be grouped together in a tuple, like this:
try:
    client_obj.get_url(url)
except (URLError, ValueError, SocketTimeout):
    client_obj.remove_url(url)
In this example, the remove_url() method is executed if any of the exceptions listed in the tuple occurs. If, on the other hand, you need to handle one of the exceptions differently, put it into its own except clause:
try:
    client_obj.get_url(url)
except (URLError, ValueError):
    client_obj.remove_url(url)
except SocketTimeout:
    client_obj.handle_url_timeout(url)
Many exceptions are arranged into an inheritance hierarchy. For such exceptions, you can catch all of them by simply specifying a base class. For example, this code:
try:
    f = open(filename)
except (FileNotFoundError, PermissionError):
    pass
Can be rewritten as:
try:
    f = open(filename)
except OSError:
    pass
This works because OSError is the base class common to both the FileNotFoundError and PermissionError exceptions.
Discussion
Although there is nothing particularly special about handling multiple exceptions per se, note that you can get a handle on the thrown exception using the as keyword:
try:
    f = open(filename)
except OSError as e:
    if e.errno == errno.ENOENT:
        logger.error('File not found')
    elif e.errno == errno.EACCES:
        logger.error('Permission denied')
    else:
        logger.error('Unexpected error: %d', e.errno)
In this example, the e variable holds an instance of the OSError that was raised. This is useful if you need to inspect the exception further, such as handling it based on the value of an additional status code.
Be aware that except clauses are checked in the order listed and that the first match executes. You can easily create situations where multiple except clauses could match. For example:
>>> f = open('missing')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
FileNotFoundError: [Errno 2] No such file or directory: 'missing'
>>> try:
...     f = open('missing')
... except OSError:
...     print('It failed')
... except FileNotFoundError:
...     print('File not found')
...
It failed
>>>
The FileNotFoundError clause never executes here because OSError is more general: it matches the FileNotFoundError exception and is listed first. When debugging, if you're unsure about the class hierarchy of a particular exception, you can quickly check it by inspecting the exception's __mro__ attribute. For example:
>>> FileNotFoundError.__mro__
(<class 'FileNotFoundError'>, <class 'OSError'>, <class 'Exception'>,
 <class 'BaseException'>, <class 'object'>)
>>>
Any of the classes in the above list, up to BaseException, can be used with an except statement.
14.7 Catching All Exceptions
Problem
How do I catch all exceptions in my code?
Solution
To catch all exceptions, write an exception handler for Exception, like this:
try:
    ...
except Exception as e:
    ...
    log('Reason:', e)       # Important!
This will catch all exceptions except SystemExit, KeyboardInterrupt, and GeneratorExit. If you also want to catch those three exceptions, change Exception to BaseException.
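A quick sketch of the distinction: KeyboardInterrupt derives from BaseException, not Exception, so an except Exception handler lets it pass through:

try:
    raise KeyboardInterrupt()       # simulates Ctrl-C for illustration
except Exception:
    print('caught by Exception')    # not reached
except BaseException:
    print('caught by BaseException')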
Discussion
Catching all exceptions is commonly used in situations where programmers can't remember all of the possible exceptions that might occur in some complicated operation. It is also, if you are not careful, a very good way to write undebuggable code.
Because of this, if you choose to catch all exceptions, it is absolutely critical to record the actual reason for the failure somewhere (e.g., in a log file, as an error message printed to the screen, etc.). If you don't, at some point you'll find yourself staring at baffling output from a function like this one:
def parse_int(s):
    try:
        n = int(v)
    except Exception:
        print("Couldn't parse")
Try to run this function, and the result is as follows:
>>> parse_int('n/a')
Couldn't parse
>>> parse_int('42')
Couldn't parse
>>>
At this point, you find yourself scratching your head, wondering, "What's going on?" Now suppose the function is rewritten like this:
def parse_int(s):
    try:
        n = int(v)
    except Exception as e:
        print("Couldn't parse")
        print('Reason:', e)
Now the output gives you a clue about a programming error:
>>> parse_int('42')
Couldn't parse
Reason: global name 'v' is not defined
>>>
For obvious reasons, you should define your exception handlers to be as precise as possible. However, if you must catch all exceptions, make sure you report accurate diagnostic information or propagate the exception so that its cause doesn't get lost.
14.8 Creating Custom Exceptions
Problem
You're building an application and want to wrap lower-level exceptions with custom ones that have more meaning in the context of your application.
Solution
Creating new exceptions is easy: just define them as classes that inherit from Exception (or one of the other existing exception types if it makes more sense). For example, if you are writing networking-related code, you might define exceptions similar to these:
class NetworkError(Exception):
    pass

class HostnameError(NetworkError):
    pass

class TimeoutError(NetworkError):
    pass

class ProtocolError(NetworkError):
    pass
Then users can use these exceptions as usual, for example:
try:
    msg = s.recv()
except TimeoutError as e:
    ...
except ProtocolError as e:
    ...
Discussion
Custom exception classes should almost always inherit from the built-in Exception class, or from a class that itself inherits from Exception. Although all exceptions also derive from BaseException, you should not use it as a base class for new exceptions. BaseException is reserved for system-exiting exceptions, such as KeyboardInterrupt and SystemExit, and other exceptions that signal the application to exit. Catching these exceptions is therefore not their intended use; if you inherit from BaseException, your custom exception may not be caught at all and may instead signal the program to terminate.
Introducing custom exceptions into a program can make your code read more clearly and show exactly who raised them. A related design consideration is how to group custom exceptions via inheritance. In complex applications, it can be useful to introduce base classes that group families of exceptions together. This allows users to catch a narrowly defined error, such as the following:
try:
    s.send(msg)
except ProtocolError:
    ...
You can also catch a wider range of exceptions, as follows:
try:
    s.send(msg)
except NetworkError:
    ...
If you define a new exception that overrides the __init__() method, make sure you always call Exception.__init__() with all of the passed arguments. For example:
class CustomError(Exception):
    def __init__(self, message, status):
        super().__init__(message, status)
        self.message = message
        self.status = status
This might look a little strange, but the default behavior of Exception is to accept all of the arguments passed and to store them as a tuple in the .args attribute. Various other libraries, and parts of Python itself, expect all exceptions to have the .args attribute, so if you skip this step, you might find that your new exception doesn't behave quite right in certain contexts. To illustrate the use of .args, consider this interactive session with the built-in RuntimeError exception, and note the number of arguments used with the raise statement:
>>> try:
...     raise RuntimeError('It failed')
... except RuntimeError as e:
...     print(e.args)
...
('It failed',)
>>> try:
...     raise RuntimeError('It failed', 42, 'spam')
... except RuntimeError as e:
...     print(e.args)
...
('It failed', 42, 'spam')
>>>
For more information on creating your own exceptions, see the official Python documentation (https://docs.python.org/3/tutorial/errors.html).
14.9 Raising an Exception in Response to Another Exception
Problem
You want to catch an exception and raise a different one, while keeping information about both exceptions in the traceback.
Solution
To chain exceptions, use the raise from statement instead of a plain raise statement. This keeps information about both exceptions. For example:
>>> def example():
...     try:
...         int('N/A')
...     except ValueError as e:
...         raise RuntimeError('A parsing error occurred') from e
...
>>> example()
Traceback (most recent call last):
  File "<stdin>", line 3, in example
ValueError: invalid literal for int() with base 10: 'N/A'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 5, in example
RuntimeError: A parsing error occurred
>>>
As you can see in the traceback, both exceptions are captured. To catch such an exception, you use an ordinary except statement. If you like, though, you can also look at the __cause__ attribute of the exception object to follow the exception chain. For example:
try:
    example()
except RuntimeError as e:
    print("It didn't work:", e)
    if e.__cause__:
        print('Cause:', e.__cause__)
An implicit form of exception chaining occurs when another exception gets raised inside an except block. For example:
>>> def example2():
...     try:
...         int('N/A')
...     except ValueError as e:
...         print("Couldn't parse:", err)
...
>>> example2()
Traceback (most recent call last):
  File "<stdin>", line 3, in example2
ValueError: invalid literal for int() with base 10: 'N/A'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 5, in example2
NameError: global name 'err' is not defined
>>>
In this example, you get information about both exceptions, but the interpretation is a bit different. Here, the NameError is raised as the result of a programming mistake, not as a direct (intentional) response to the parsing failure. For this case, the __cause__ attribute of the exception is not set; instead, the __context__ attribute records the earlier exception.
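As a small sketch, the implicit chain can be inspected programmatically. For the example2() function above, __cause__ remains None while __context__ holds the original ValueError:

try:
    example2()
except NameError as e:
    print('Cause:', e.__cause__)        # None: no explicit `raise from`
    print('Context:', e.__context__)    # the original ValueError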
If, for whatever reason, you want to suppress the chain entirely, use raise from None:
>>> def example3():
...     try:
...         int('N/A')
...     except ValueError:
...         raise RuntimeError('A parsing error occurred') from None
...
>>> example3()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 5, in example3
RuntimeError: A parsing error occurred
>>>
Discussion
In designing code, you should give careful attention to the use of the raise statement inside other except blocks. In most cases, such raise statements should probably be changed to raise from statements. That is, you should prefer this style:
try:
    ...
except SomeException as e:
    raise DifferentException() from e
The reason for doing this is that you are explicitly chaining the causes: the DifferentException is raised in direct response to getting a SomeException. This relationship will be stated explicitly in the resulting traceback.
If you write your code in the following style, you still get a chained exception, but it is often unclear whether the chain was intentional or the result of an unforeseen programming error:
try:
    ...
except SomeException:
    raise DifferentException()
When you use raise from, you make it clear that you meant to raise the second exception.
Resist the urge to suppress exception information, as in the last example. Although suppressing the chain leads to smaller tracebacks, it also discards information that might be useful for debugging. All things being equal, it's often best to keep as much information as possible.
14.10 Reraising the Last Exception
Problem
You caught an exception in an except block and now want to reraise it.
Solution
Simply use the raise statement all by itself. For example:
>>> def example():
...     try:
...         int('N/A')
...     except ValueError:
...         print("Didn't work")
...         raise
...
>>> example()
Didn't work
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 3, in example
ValueError: invalid literal for int() with base 10: 'N/A'
>>>
Discussion
This problem typically arises when you need to take some action in response to an exception (e.g., logging, cleanup, etc.) but want to propagate the exception onward afterwards. A very common use is in catch-all exception handlers:
try:
    ...
except Exception as e:
    # Process exception information in some way
    ...
    # Propagate the exception
    raise
14.11 Issuing Warning Messages
Problem
You want your program to issue warning messages (e.g., about deprecated features or usage problems).
Solution
To issue a warning message, use the warnings.warn() function. For example:
import warnings

def func(x, y, logfile=None, debug=False):
    if logfile is not None:
        warnings.warn('logfile argument deprecated', DeprecationWarning)
    ...
The arguments to warn() are a warning message and a warning class, which is typically one of the following: UserWarning, DeprecationWarning, SyntaxWarning, RuntimeWarning, ResourceWarning, or FutureWarning.
The handling of warnings depends on how you execute the interpreter and other configuration. For example, if you run Python with the -W all option, you get output like this:
bash % python3 -W all example.py
example.py:5: DeprecationWarning: logfile argument deprecated
  warnings.warn('logfile argument deprecated', DeprecationWarning)
Normally, warnings simply produce output messages on standard error. If you want to turn warnings into exceptions, use the -W error option:
bash % python3 -W error example.py
Traceback (most recent call last):
  File "example.py", line 10, in <module>
    func(2, 3, logfile='log.txt')
  File "example.py", line 5, in func
    warnings.warn('logfile argument deprecated', DeprecationWarning)
DeprecationWarning: logfile argument deprecated
bash %
Discussion
Issuing a warning message is often a useful technique for maintaining software and assisting users with issues that don't quite rise to the level of an exception. For example, if you're going to change the behavior of a library or framework, you can start issuing warning messages for the parts being changed while still providing backward compatibility for a time. You can also warn users about problematic usage patterns in their code.
As another example of a warning in the built-in library, here is a warning message generated when a file is destroyed without being closed:
>>> import warnings
>>> warnings.simplefilter('always')
>>> f = open('/etc/passwd')
>>> del f
__main__:1: ResourceWarning: unclosed file <_io.TextIOWrapper name='/etc/passwd' mode='r' encoding='UTF-8'>
>>>
By default, not all warning messages appear. The -W option to Python controls the output of warning messages: -W all outputs all warnings, -W ignore ignores all warnings, and -W error turns warnings into exceptions. As an alternative, you can use the warnings.simplefilter() function to control output: an argument of 'always' makes all warning messages appear, 'ignore' ignores all warnings, and 'error' turns warnings into exceptions.
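For instance, here is a minimal sketch of controlling warnings programmatically rather than from the command line. With the 'error' filter in place, warn() raises the warning as a catchable exception:

import warnings

warnings.simplefilter('error')      # same effect as running with -W error
try:
    warnings.warn('logfile argument deprecated', DeprecationWarning)
except DeprecationWarning as e:
    print('Trapped warning:', e)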
For simple cases, this is all you need to issue warning messages. The warnings module provides a variety of more advanced configuration options related to the filtering and handling of warning messages. See the Python documentation for more information.
14.12 Debugging Basic Program Crashes
Problem
How do you debug your program after it crashes?
Solution
If your program is crashing with an exception, running it as python3 -i someprogram.py can be a useful tool for simple debugging. The -i option starts an interactive shell as soon as the program terminates, and from there you can explore the environment. For example, suppose you have this code:
# sample.py

def func(n):
    return n + 10

func('Hello')
Running python3 -i sample.py produces output like the following:
bash % python3 -i sample.py
Traceback (most recent call last):
  File "sample.py", line 6, in <module>
    func('Hello')
  File "sample.py", line 4, in func
    return n + 10
TypeError: Can't convert 'int' object to str implicitly
>>> func(10)
20
>>>
If you don't see anything obvious, a further step is to launch the Python debugger after the crash. For example:
>>> import pdb
>>> pdb.pm()
> sample.py(4)func()
-> return n + 10
(Pdb) w
  sample.py(6)<module>()
-> func('Hello')
> sample.py(4)func()
-> return n + 10
(Pdb) print n
'Hello'
(Pdb) q
>>>
If your code is deeply buried in an environment where it is difficult to obtain an interactive shell (e.g., on a server), you can often catch errors and produce tracebacks yourself. For example:
import traceback
import sys

try:
    func(arg)
except:
    print('**** AN ERROR OCCURRED ****')
    traceback.print_exc(file=sys.stderr)
If your program isn't crashing, but is producing results you don't understand, inserting a few print() calls at places of interest is also a fine option. However, if you're going to do that, there are a few related techniques worth knowing. First, the traceback.print_stack() function creates a stack trace of your program at exactly the point where it is called. For example:
>>> import sys
>>> import traceback
>>> def sample(n):
...     if n > 0:
...         sample(n-1)
...     else:
...         traceback.print_stack(file=sys.stderr)
...
>>> sample(5)
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 3, in sample
  File "<stdin>", line 3, in sample
  File "<stdin>", line 3, in sample
  File "<stdin>", line 3, in sample
  File "<stdin>", line 3, in sample
  File "<stdin>", line 5, in sample
>>>
Additionally, you can make the debugger start at any point in the program by calling pdb.set_trace(), like this:
import pdb

def func(arg):
    ...
    pdb.set_trace()
    ...
This can be useful for poking around in the internals of a large program where you want to inspect the control flow or the arguments to a function. For instance, once the debugger starts, you can inspect variables using print or type a command such as w to get the stack traceback.
Discussion
Don't make debugging more complicated than it needs to be. Simple errors can often be resolved by just reading the program traceback; the actual error is usually reported on its last line. Inserting a few print() calls at points of interest while developing can also work well (just remember to remove the statements before the final release).
A common use of the debugger is to observe variables in a function that has crashed. Knowing how to get into the debugger after a function crashes is a useful skill.
Inserting statements like pdb.set_trace() can be useful when trying to unravel an extremely complicated program where you don't understand the underlying control flow.
Essentially, the program will run until it hits the set_trace() call, at which point it immediately enters the debugger, and you can take it from there.
If you use an IDE for Python development, the IDE will typically provide its own debugging interface on top of or in place of pdb. Consult the manual for your IDE for more information.
14.13 Profiling and Timing Your Program
Problem
You want to find out how much time your program takes to run and make performance measurements.
Solution
If you simply want to time your whole program as a unit, it's usually easy enough to use something like the Unix time command:
bash % time python3 someprogram.py
real    0m13.937s
user    0m12.162s
sys     0m0.098s
bash %
On the other extreme, if you want a detailed report on what the program is doing, you can use the cProfile module:
bash % python3 -m cProfile someprogram.py
         859647 function calls in 16.016 CPU seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
   263169    0.080    0.000    0.080    0.000 someprogram.py:16(frange)
      513    0.001    0.000    0.002    0.000 someprogram.py:30(generate_mandel)
   262656    0.194    0.000   15.295    0.000 someprogram.py:32(<genexpr>)
        1    0.036    0.036   16.077   16.077 someprogram.py:4(<module>)
   262144   15.021    0.000   15.021    0.000 someprogram.py:4(in_mandelbrot)
        1    0.000    0.000    0.000    0.000 os.py:746(urandom)
        1    0.000    0.000    0.000    0.000 png.py:1056(_readable)
        1    0.000    0.000    0.000    0.000 png.py:1073(Reader)
        1    0.227    0.227    0.438    0.438 png.py:163(<module>)
      512    0.010    0.000    0.010    0.000 png.py:200(group)
   ...
bash %
More often than not, profiling falls somewhere in between these two extremes. For example, you may already know that your code spends most of its time in a few selected functions. For profiling such functions, a simple decorator can be useful:
# timethis.py

import time
from functools import wraps

def timethis(func):
    @wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        r = func(*args, **kwargs)
        end = time.perf_counter()
        print('{}.{} : {}'.format(func.__module__, func.__name__, end - start))
        return r
    return wrapper
To use the decorator, you simply place it in front of the definition of any function whose performance you want to measure. For example:
>>> @timethis
... def countdown(n):
...     while n > 0:
...         n -= 1
...
>>> countdown(10000000)
__main__.countdown : 0.803001880645752
>>>
To test the runtime of a code block, you can define a context manager, for example:
import time
from contextlib import contextmanager

@contextmanager
def timeblock(label):
    start = time.perf_counter()
    try:
        yield
    finally:
        end = time.perf_counter()
        print('{} : {}'.format(label, end - start))
The following is an example of using this context manager:
>>> with timeblock('counting'):
...     n = 10000000
...     while n > 0:
...         n -= 1
...
counting : 1.5551159381866455
>>>
For testing the running performance of very small code fragments, it is convenient to use the timeit module, for example:
>>> from timeit import timeit
>>> timeit('math.sqrt(2)', 'import math')
0.1432319980012835
>>> timeit('sqrt(2)', 'from math import sqrt')
0.10836604500218527
>>>
timeit works by executing the statement specified in the first argument a million times and measuring the total time. The second argument is a setup string that is executed to set up the environment prior to the test. If you need to change the number of iterations, supply a number argument like this:
>>> timeit('math.sqrt(2)', 'import math', number=10000000)
1.434852126003534
>>> timeit('sqrt(2)', 'from math import sqrt', number=10000000)
1.0270336690009572
>>>
Discussion
When making performance measurements, be aware that any results you get are approximations. The time.perf_counter() function used in the solution provides the highest-resolution timer available on a given platform. It still measures wall-clock time, however, and can be affected by many different factors, such as machine load. If you are interested in process time rather than wall-clock time, use time.process_time() instead. For example:
import time
from functools import wraps

def timethis(func):
    @wraps(func)
    def wrapper(*args, **kwargs):
        start = time.process_time()
        r = func(*args, **kwargs)
        end = time.process_time()
        print('{}.{} : {}'.format(func.__module__, func.__name__, end - start))
        return r
    return wrapper
Last, but not least, if you're going to perform detailed timing analysis, make sure you read the documentation for the time, timeit, and other associated modules, so that you understand important platform-related differences and other pitfalls. See also section 13.13 for a related recipe on creating a stopwatch timer class.
14.14 Making Your Programs Run Faster
Problem
Your program runs too slowly and you'd like to speed it up without the assistance of more extreme solutions, such as C extensions or a just-in-time (JIT) compiler.
Solution
The first rule of optimization might be "don't do it," and the second rule is probably "don't optimize the unimportant parts." If your program runs slowly, you should start by profiling it using the techniques in section 14.13 to find out where the problems are.
Typically, you will find that your program spends most of its time in a few hotspots, such as inner data-processing loops. Once you've identified those locations, you can use the no-nonsense techniques that follow to make your program run faster.
Use functions
Many programmers start out using Python as a language for writing simple scripts, and when writing scripts, it is easy to fall into the habit of writing unstructured code at the global level, such as:
# somescript.py

import sys
import csv

with open(sys.argv[1]) as f:
    for row in csv.reader(f):
        # Some kind of processing
        pass
A little-known fact is that code defined at the global level like this runs more slowly than code defined in a function. The speed difference has to do with the implementation of local versus global variables (operations involving locals are faster). So, if you want to make the program run faster, simply put the scripting statements in a function:
# somescript.py

import sys
import csv

def main(filename):
    with open(filename) as f:
        for row in csv.reader(f):
            # Some kind of processing
            pass

main(sys.argv[1])
The speed difference depends heavily on the processing being performed, but in our experience, speedups of 15-30% from using a function are not uncommon.
Remove attribute access as much as possible
Every use of the dot (.) operator to access attributes comes with a cost. Under the covers, it triggers special methods such as __getattribute__() and __getattr__(), which often lead to dictionary lookups.
You can often avoid attribute lookups by using the from module import name form of import, as well as by making selective use of bound methods. To illustrate, consider the following code fragment:
import math

def compute_roots(nums):
    result = []
    for n in nums:
        result.append(math.sqrt(n))
    return result

# Test
nums = range(1000000)
for n in range(100):
    r = compute_roots(nums)
When tested on our machine, this program ran in about 40 seconds. Now change the compute_roots() function as follows:
from math import sqrt

def compute_roots(nums):
    result = []
    result_append = result.append
    for n in nums:
        result_append(sqrt(n))
    return result
This version runs in about 29 seconds. The only difference from the previous version is the elimination of attribute access: sqrt() is used instead of math.sqrt(), and the result.append() method is bound to a local variable, result_append, which is then used inside the inner loop.
However, these changes only make sense in code that is executed frequently, such as loops. So, this optimization really only makes sense in carefully selected places.
Understanding local variables
As noted earlier, local variables are faster than global variables. For frequently accessed names, speedups can be obtained by making those names as local as possible. For example, consider this modified version of the compute_roots() function:
import math

def compute_roots(nums):
    sqrt = math.sqrt
    result = []
    result_append = result.append
    for n in nums:
        result_append(sqrt(n))
    return result
In this version, sqrt has been lifted from the math module and placed into a local variable. If you run this code, it now takes about 25 seconds (an improvement over the previous 29). The additional speedup is due to a local lookup of sqrt being a bit faster than a global lookup of sqrt.
Locality arguments also apply when working with attributes of classes. In general, looking up a value such as self.name will be considerably slower than accessing a local variable. In inner loops, it pays to lift commonly accessed attributes into local variables, as in this example:
# Slower
class SomeClass:
    ...
    def method(self):
        for x in s:
            op(self.value)

# Faster
class SomeClass:
    ...
    def method(self):
        value = self.value
        for x in s:
            op(value)
Avoid unnecessary abstraction
Any time you wrap code with extra layers of processing, such as decorators, properties, or descriptors, you're going to make it slower. As an example, consider this class:
class A:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    @property
    def y(self):
        return self._y

    @y.setter
    def y(self, value):
        self._y = value
Now a simple test:
>>> from timeit import timeit
>>> a = A(1, 2)
>>> timeit('a.x', 'from __main__ import a')
0.07817923510447145
>>> timeit('a.y', 'from __main__ import a')
0.35766440676525235
>>>
As you can see, accessing the property y is not just slightly slower than the attribute x; it's about 4.5 times slower. If this difference matters, ask yourself whether the definition of y as a property is really necessary. If not, simply get rid of it and go back to using a plain attribute instead. Just because it might be common for programs in another language to use getter/setter functions, that's no reason to adopt that coding style in Python.
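For comparison, a version with plain attributes (a sketch of the simplification suggested above) eliminates the wrapper overhead entirely:

class A:
    def __init__(self, x, y):
        self.x = x
        self.y = y      # plain attribute: no property machinery on access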
Use built-in containers
The built-in data types such as strings, tuples, lists, sets, and dicts are all implemented in C and are rather fast. If you're inclined to build your own data structures as a replacement (e.g., linked lists, balanced trees, etc.), it can be rather difficult, if not impossible, to match the speed of the built-ins. So you're often better off just using them.
Avoid creating unnecessary data structures or copies
Sometimes programmers get carried away and construct data structures that serve no purpose. For example, someone might write code like this:
values = [x for x in sequence]
squares = [x*x for x in values]
Perhaps the thinking here is to first collect a bunch of values into a list and then to feed it into a list comprehension. However, the first list is completely unnecessary; the code can simply be written like this:
squares = [x*x for x in sequence]
Related to this, be on the lookout for code written by programmers who are overly paranoid about Python's sharing of values. Overuse of functions such as copy.deepcopy() may be a sign of code written by someone who doesn't fully understand or trust Python's memory model. In such code, it is often safe to eliminate many of the copies.
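As a hypothetical illustration, a defensive copy like the one below can usually be deleted once you confirm the function never mutates its argument:

import copy

def total_cost(prices):
    # prices = copy.deepcopy(prices)    # unnecessary: prices is only read
    return sum(prices.values())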
Discussion
Before optimizing, it's usually worthwhile to study the algorithms you're using first. You'll get a much bigger speedup by switching to an O(n log n) algorithm than by trying to tune the implementation of an O(n**2) algorithm.
If you've decided that you still must optimize, take a look at the big picture. As a general rule, you don't want to apply optimizations to every part of your program, because such changes make the code hard to read and understand. Instead, focus only on known performance bottlenecks, such as inner loops.
You should also be wary of the results of micro-optimization. For example, consider these two ways of creating a dictionary:
a = {
    'name': 'AAPL',
    'shares': 100,
    'price': 534.22
}

b = dict(name='AAPL', shares=100, price=534.22)
The latter choice has the benefit of less typing (you don't need to quote the key names). However, if you pit the two code fragments against each other in a performance test, you'll find that using dict() runs about three times slower. Armed with this information, you might be inclined to scan your code and replace every use of dict() with the first form. Don't. A smart programmer concentrates only on the parts of a program where it actually matters, such as an inner loop. In other places, the speed difference just isn't going to make any noticeable impact.
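If you want to verify the claim yourself, a quick timeit sketch (results will vary by machine and Python version) looks like this:

from timeit import timeit

t_literal = timeit("{'name': 'AAPL', 'shares': 100, 'price': 534.22}")
t_dict = timeit("dict(name='AAPL', shares=100, price=534.22)")
print('literal:', t_literal, 'dict():', t_dict)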
If your optimization requirements go beyond the simple techniques in this recipe, you might investigate tools based on just-in-time (JIT) compilation. For example, the PyPy project is an alternate implementation of the Python interpreter that analyzes your program as it runs and generates native machine code for the most frequently executed parts; it can sometimes make programs run significantly faster, often approaching the speed of C code. Unfortunately, as of the writing of this book, PyPy does not yet fully support Python 3, so that is something to look into in the future. You might also consider the Numba project. Numba is a dynamic compiler: you annotate selected Python functions with a decorator, and they get compiled into native machine code through the use of LLVM. It, too, can produce significant performance gains. However, like PyPy, support for Python 3 should be viewed as somewhat experimental.
Finally, the words of John Ousterhout serve as a good ending: "The best performance improvement is the transition from the nonworking to the working state." Don't worry about optimization until you need to. Making sure your program works correctly is usually more important than making it run fast (at least initially).