Advanced Python Data Structures

Almost every introductory Python tutorial covers built-in data structures such as list, set and dict. Python also provides some more advanced data structures. Most of them live in the collections module, but there are others as well, such as array from the array module. You can think of them as roughly analogous to the containers in the C++ STL.
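For instance, array.array from the standard-library array module stores values of a single C type compactly, with the element type selected by a type code. The following is a minimal sketch; the numbers are arbitrary example data:

```python
from array import array

# 'i' is the type code for a signed C int; every element must be an integer
nums = array('i', [1, 2, 3])
nums.append(4)         # common list-like operations are supported
print(nums)            # array('i', [1, 2, 3, 4])
print(nums.itemsize)   # bytes per element; platform-dependent (usually 4)

# Unlike a list, an 'i' array rejects values of other types
try:
    nums.append(1.5)
except TypeError as exc:
    print("TypeError:", exc)
```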

These advanced data structures are also widely used in third-party libraries. For example, in PyTorch we can pass a collections.OrderedDict to nn.Sequential to build a neural network and give each layer a name:

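import torch.nn as nn
from collections import OrderedDict

model = nn.Sequential(OrderedDict([
    ('conv1', nn.Conv2d(1, 20, 5)),
    ('relu1', nn.ReLU()),
    ('conv2', nn.Conv2d(20, 64, 5)),
    ('relu2', nn.ReLU())
]))
# Each layer can now be accessed by its name, e.g. model.conv1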

The rest of this article introduces several of the more commonly used advanced data structures. For more details, see the official documentation: https://docs.python.org/3.8/library/collections

collections.namedtuple

Tuples can group values of different types, e.g. (1, 'A', [0.1, 0.2]). In that sense they are used somewhat like structs in C++. However, the elements of a plain tuple cannot be given names, so it is often unclear what each position in a tuple means. collections.namedtuple solves this problem.

Its basic signature is as follows:

collections.namedtuple(typename, field_names)
# typename: the name of the new tuple type
# field_names: the names of the tuple's elements, given either as an iterable of strings or as a single string with the field names separated by spaces and/or commas
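To illustrate the accepted forms of field_names, the sketch below builds the same type three ways. Point is a hypothetical example type used only for illustration:

```python
from collections import namedtuple

# field_names as a list of strings
PointA = namedtuple('Point', ['x', 'y'])
# field_names as a single space-separated string
PointB = namedtuple('Point', 'x y')
# commas (optionally followed by spaces) are also accepted as separators
PointC = namedtuple('Point', 'x, y')

print(PointA._fields, PointB._fields, PointC._fields)
# ('x', 'y') ('x', 'y') ('x', 'y')
```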

The following code demonstrates typical usage:

#! /usr/bin/python3
from collections import namedtuple

# Define a new namedtuple type
Course = namedtuple('Course', ['name', 'credits', 'time'])

# Get the field names of the namedtuple
print(Course._fields)

# Instantiate Course
Linear_Algebra = Course('linear algebra',4,('Tue','13:30-15:05'))
print(Linear_Algebra)

# Access a field by name
print(Linear_Algebra.time)

# Convert the namedtuple to a dictionary (an OrderedDict before Python 3.8, a regular dict from 3.8 on)
print(Linear_Algebra._asdict())

The results are as follows:

('name', 'credits', 'time')
Course(name='linear algebra', credits=4, time=('Tue', '13:30-15:05'))
('Tue', '13:30-15:05')
{'name': 'linear algebra', 'credits': 4, 'time': ('Tue', '13:30-15:05')}
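Because a namedtuple type is a subclass of tuple, instances of the Course type from the example above still support ordinary tuple operations. The sketch below redefines Course so that it runs on its own, and shows indexing, unpacking, _replace() for making a modified copy, and the fact that fields cannot be reassigned:

```python
from collections import namedtuple

Course = namedtuple('Course', ['name', 'credits', 'time'])
linear_algebra = Course('linear algebra', 4, ('Tue', '13:30-15:05'))

# Positional access and unpacking work just like a plain tuple
print(linear_algebra[1])                  # 4
name, credits, slot = linear_algebra
print(isinstance(linear_algebra, tuple))  # True

# _replace() returns a new instance; the original is unchanged
updated = linear_algebra._replace(credits=5)
print(updated.credits, linear_algebra.credits)  # 5 4

# Fields are read-only, like tuple elements
try:
    linear_algebra.credits = 5
except AttributeError as exc:
    print("AttributeError:", exc)
```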

collections.OrderedDict

Python dicts are implemented as hash tables (see the book Fluent Python for a detailed explanation of the mechanism). This gives dicts fast lookups, but before Python 3.6 it came at the cost of ordering: the iteration order of keys was not guaranteed. collections.OrderedDict, by contrast, remembers the order in which keys were inserted, so iterating over it always yields the keys in insertion order.

The following code demonstrates its usage:

#! /usr/bin/python3
from collections import OrderedDict

print("Before deleting:")
od = OrderedDict()
od['a'] = 1
od['b'] = 2
od['c'] = 3
od['d'] = 4
od['e'] = 5
od['f'] = 6

for key, value in od.items():
	print(key, value)

print("After deleting:")
od.pop('c') # Delete operation
for key, value in od.items():
	print(key, value)

print("After re-inserting:")
od['c'] = 3  # Re-insert 'c'; it is appended at the end
for key, value in od.items():
	print(key, value)

The results are as follows:

Before deleting:
a 1
b 2
c 3
d 4
e 5
f 6
After deleting:
a 1
b 2
d 4
e 5
f 6
After re-inserting:
a 1
b 2
d 4
e 5
f 6
c 3

As you can see, OrderedDict keeps elements in the order in which they were inserted; a key that is deleted and then re-inserted moves to the end. Otherwise, OrderedDict behaves much like an ordinary dict.

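Note: Since Python 3.6, regular dicts also preserve insertion order (in 3.6 this was a CPython implementation detail; it has been a language guarantee since Python 3.7), so in most everyday code the two behave the same. Some differences remain, however: equality comparison between two OrderedDicts is order-sensitive, and OrderedDict offers extra methods such as move_to_end() and popitem(last=False).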
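Even on Python 3.7 and later, where plain dicts preserve insertion order, OrderedDict still behaves differently in a few ways. A minimal sketch using only documented OrderedDict methods (the keys are arbitrary examples):

```python
from collections import OrderedDict

# Equality between two OrderedDicts is order-sensitive...
a = OrderedDict([('x', 1), ('y', 2)])
b = OrderedDict([('y', 2), ('x', 1)])
print(a == b)              # False
# ...while plain dicts, and OrderedDict vs dict, ignore order
print(dict(a) == dict(b))  # True
print(a == dict(b))        # True

# move_to_end() reorders an existing key
od = OrderedDict.fromkeys('abc')
od.move_to_end('a')              # move 'a' to the end
print(list(od))                  # ['b', 'c', 'a']
od.move_to_end('a', last=False)  # move 'a' back to the front
print(list(od))                  # ['a', 'b', 'c']

# popitem(last=False) removes items in FIFO order
print(od.popitem(last=False))    # ('a', None)
```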

collections.defaultdict

When using a dict, accessing a key that does not exist raises a KeyError, as in this example:

GPAdict = {'A+':4.0,'A':4.0}
print(GPAdict['A-'])
Traceback (most recent call last):
  File "/home/nullptr/open-source/advanced_python/1_advanced_data_structure/defaultdict_demo.py", line 2, in <module>
    print(GPAdict['A-'])
KeyError: 'A-'

collections.defaultdict solves this problem. It is created with a callable (an ordinary function, a lambda, a class such as int or list, etc.), which is called with no arguments to produce a default value whenever a missing key is accessed.

from collections import defaultdict

GPAdict = defaultdict(lambda:4.0)
GPAdict['A+'] = 4.0
GPAdict['A'] = 4.0
print(GPAdict['B+'])

# Equivalently, with a named function instead of a lambda:
'''
def foo():
    return 4.0
GPAdict = defaultdict(foo)
GPAdict['A+'] = 4.0
GPAdict['A'] = 4.0
print(GPAdict['B+'])
'''

The results are as follows:

4.0
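Two further points are worth knowing. First, built-in types such as int and list make convenient factories, which is why defaultdict is handy for counting and grouping. Second, reading a missing key with [] inserts the default value into the dictionary, whereas .get() and the in operator do not call the factory. A small sketch, where the word list is made-up example data:

```python
from collections import defaultdict

words = ['apple', 'avocado', 'banana', 'blueberry', 'cherry', 'apple']

# int() returns 0, so every missing count starts at zero
counts = defaultdict(int)
for w in words:
    counts[w] += 1
print(dict(counts))
# {'apple': 2, 'avocado': 1, 'banana': 1, 'blueberry': 1, 'cherry': 1}

# list() returns [], so each new key starts with an empty list
by_letter = defaultdict(list)
for w in words:
    by_letter[w[0]].append(w)
print(dict(by_letter))
# {'a': ['apple', 'avocado', 'apple'], 'b': ['banana', 'blueberry'], 'c': ['cherry']}

# Side effect: indexing a missing key inserts the default value
grades = defaultdict(lambda: 4.0)
print('B+' in grades)    # False
print(grades.get('B+'))  # None (.get() does not call the factory)
print(grades['B+'])      # 4.0
print('B+' in grades)    # True (the key now exists)
```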


Added by squiblo on Mon, 24 Jan 2022 21:23:36 +0200