Understanding all array data types in Python

About me
A thoughtful programmer and lifelong learner, currently working as a team leader at a startup. My technology stack covers Android, Python, Java and Go, which is also our team's main stack.
Github: https://github.com/hylinux1024
WeChat Official Account: Angrycode

Arrays are a basic data structure in most programming languages. This article reviews the various "array" types available in Python.

  • list
  • tuple
  • array.array
  • str
  • bytes
  • bytearray

Strictly speaking, it is inaccurate to call all of the above types arrays. Here "array" is used as a broad concept: lists, sequences and arrays are all treated as array-like data types.

Note that all the code in this article was run on Python 3.7.

0x00 Mutable dynamic list

list is probably the most commonly used array type in Python. It is mutable, grows dynamically, and can store any Python object without declaring the element type.

It's very simple to use.

>>> arr = ["one","two","three"]
>>> arr[0]
'one'
# Dynamic expansion
>>> arr.append(4)
>>> arr
['one', 'two', 'three', 4]
# Delete an element
>>> del arr[2]
>>> arr
['one', 'two', 4]
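
As a small sketch of the "mutable and dynamic" behaviour described above, the script below grows and shrinks one list in place. The variable names (items, last, same_object) are only illustrative and do not come from the original session.

```python
# A list stores references, so one list can mix element types.
items = ["one", 2, 3.0, None]

# Lists grow and shrink in place; the list object itself stays the same.
before = id(items)
items.append("four")        # add one element at the end
items.extend([5, 6])        # add several elements at once
items.insert(0, "zero")     # insert at an arbitrary position
last = items.pop()          # remove and return the last element
same_object = id(items) == before

print(items)        # ['zero', 'one', 2, 3.0, None, 'four', 5]
print(last)         # 6
print(same_object)  # True
```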

0x01 Immutable tuple

tuple is used much like list. It is immutable and cannot grow, but it can hold any Python object, and you do not need to declare the element type.

>>> t = 'one','two',3
>>> t
('one', 'two', 3)
>>> t.append(4)
AttributeError: 'tuple' object has no attribute 'append'
>>> del t[0]
TypeError: 'tuple' object doesn't support item deletion

A tuple supports the + operator, which creates a new tuple object to hold the combined data.

>>> t+(1,)
('one', 'two', 3, 1)
>>> tcopy = t+(1,)
>>> tcopy
('one', 'two', 3, 1)
>>> id(tcopy)
4604415336
>>> id(t)
4605245696

As the two id() values show, the + operator leaves the original tuple untouched and returns a different object.
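
The same point can be checked without comparing raw id() values. The following is a minimal sketch with illustrative names (alias, nested); it also shows that a tuple's immutability is shallow, which the session above does not cover.

```python
t = ("one", "two", 3)
alias = t            # second name for the same tuple object

t += (4,)            # builds a new tuple and rebinds the name t

print(t)             # ('one', 'two', 3, 4)
print(alias)         # ('one', 'two', 3)
print(t is alias)    # False

# Immutability is shallow: a mutable object stored inside a tuple can still change.
nested = ("fixed", [1, 2])
nested[1].append(3)
print(nested)        # ('fixed', [1, 2, 3])
```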

0x02 array.array

If you want a data structure similar to the "arrays" of other languages, use the array module from the standard library. An array.array is mutable and stores values of a single numeric type; it cannot store arbitrary objects.

Because the element type is fixed when the array is created, array is more space-efficient than list and tuple.

# Import the module, then specify the element type code 'f' (C float)
>>> import array
>>> arr = array.array('f', (1.0, 1.5, 2.0, 2.5))
>>> arr
array('f', [1.0, 1.5, 2.0, 2.5])
# Modify an element
>>> arr[1]=12.45
>>> arr
array('f', [1.0, 12.449999809265137, 2.0, 2.5])
# Delete an element
>>> del arr[2]
>>> arr
array('f', [1.0, 12.449999809265137, 2.5])
# Add an element
>>> arr.append(4.89)
>>> arr
array('f', [1.0, 12.449999809265137, 2.5, 4.889999866485596])
# Storing a string in an array of floating-point numbers raises an error
>>> arr[0]='hello'
TypeError: must be real number, not str
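
To get a rough feel for the space saving, the sketch below compares the memory used by a list of floats with an array of type code 'd'. It relies on sys.getsizeof, whose numbers are specific to CPython and vary by version and platform, so treat the printed figures as indicative only. The variable names are illustrative.

```python
import array
import sys

n = 1000
as_list = [float(i) for i in range(n)]
as_array = array.array('d', as_list)   # 'd' = C double

# An array stores raw C values back to back.
payload_bytes = as_array.itemsize * len(as_array)

# A list stores pointers to separate float objects, so count both.
list_bytes = sys.getsizeof(as_list) + sum(sys.getsizeof(x) for x in as_list)
array_bytes = sys.getsizeof(as_array)

print(payload_bytes, array_bytes, list_bytes)
```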

The table below lists the element type codes that array supports.

Type code   C Type                Python Type
'b'         signed char           int
'B'         unsigned char         int
'u'         Py_UNICODE            Unicode character
'h'         signed short          int
'H'         unsigned short        int
'i'         signed int            int
'I'         unsigned int          int
'l'         signed long           int
'L'         unsigned long         int
'q'         signed long long      int
'Q'         unsigned long long    int
'f'         float                 float
'd'         double                float
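
The size of each C type depends on the platform, so a quick way to see what the type codes mean on your machine is to ask an empty array for its itemsize. The sketch below does that for the numeric codes and also shows that out-of-range values are rejected. The names small and overflowed are illustrative.

```python
import array

# itemsize reports how many bytes one element of each type code occupies
# on the current platform (the C standard only guarantees minimum sizes).
for code in "bBhHiIlLqQfd":
    print(code, array.array(code).itemsize)

# Values outside the C type's range are rejected instead of silently wrapping.
small = array.array('b', [1, 2, 3])   # signed char: -128..127
try:
    small.append(200)
    overflowed = False
except OverflowError:
    overflowed = True

print(overflowed)  # True
```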

0x03 Character sequence str

Python 3 uses str objects to represent text (much like String in Java). A str is an immutable sequence of Unicode characters.

Each element of a str is itself a str object of length 1.

>>> s ='123abc'
>>> s
'123abc'
>>> s[0]
'1'
>>> s[2]
'3'
# Strings are immutable, so elements cannot be deleted
>>> del s[1]
TypeError: 'str' object doesn't support item deletion
# To manipulate a string, convert it to a list first
>>> sn = list(s)
>>> sn
['1', '2', '3', 'a', 'b', 'c']
>>> sn.append(9)
>>> sn
['1', '2', '3', 'a', 'b', 'c', 9]
# Elements in strings are also string objects
>>> type(s[2])
<class 'str'>
>>> type(s)
<class 'str'>

str objects also support the + operator, which creates a new object to hold the result.

>>> s2 = s+'33'
>>> s2
'123abc33'
>>> id(s2)
4605193648
>>> id(s)
4552640416
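
Because a str never changes in place, every "edit" produces a new string. The following sketch shows a few common ways to derive one string from another; the variable names are illustrative, and the advice to join a list instead of concatenating in a loop is a general Python idiom rather than something stated in the original text.

```python
s = "123abc"

# Every "modifying" method returns a new str; the original is unchanged.
upper = s.upper()
replaced = s.replace("abc", "xyz")
print(s, upper, replaced)   # 123abc 123ABC 123xyz

# To change one character, rebuild the string from slices.
patched = s[:2] + "X" + s[3:]
print(patched)              # 12Xabc

# When assembling many pieces, collect them in a list and join once
# rather than concatenating with + in a loop.
parts = [str(n) for n in range(5)]
joined = "-".join(parts)
print(joined)               # 0-1-2-3-4
```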

0x04 bytes

A bytes object stores a sequence of bytes. It is immutable, and each element is an integer in the range 0-255.

>>> b = bytes([0,2,4,8])
>>> b[2]
4
>>> b
b'\x00\x02\x04\x08'
>>> b[0]=33
TypeError: 'bytes' object does not support item assignment
>>> del b[0]
TypeError: 'bytes' object doesn't support item deletion
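
A short sketch of the value range and of the difference between indexing and slicing a bytes object follows. The name accepted is illustrative.

```python
b = bytes([0, 2, 4, 8])

# Indexing yields an int, slicing yields another bytes object.
print(b[1])      # 2
print(b[1:3])    # b'\x02\x04'

# 255 is the largest value a single byte can hold.
print(bytes([255]))   # b'\xff'

try:
    bytes([256])
    accepted = True
except ValueError:
    accepted = False
print(accepted)  # False
```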

0x05 bytearray

bytearray objects, like bytes, store byte sequences. Unlike bytes, a bytearray is mutable and can grow dynamically.

>>> ba = bytearray((1,3,5,7,9))
>>> ba
bytearray(b'\x01\x03\x05\x07\t')
>>> ba[1]
3
# Delete an element
>>> del ba[1]
>>> ba
bytearray(b'\x01\x05\x07\t')
>>> ba[0]=2
>>> ba[0]
2
# Add an element
>>> ba.append(6)
# Only integers in range(0, 256) can be added
>>> ba.append(s)
TypeError: 'str' object cannot be interpreted as an integer
>>> ba
bytearray(b'\x02\x05\x07\t\x06')
# Each element must be in the range 0-255
>>> ba[2]=288
ValueError: byte must be in range(0, 256)

A bytearray can be converted into a bytes object, but the conversion copies the data, so it is not free.

# Converting bytearray to bytes generates a new object
>>> bn = bytes(ba)
>>> id(bn)
4604114344
>>> id(ba)
4552473544
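
The sketch below contrasts in-place edits of a bytearray with the copy made by bytes(). The names before and snapshot are illustrative.

```python
ba = bytearray(b"\x01\x02\x03")
before = id(ba)

# In-place edits keep the same object.
ba[0] = 9
ba.extend(b"\x04\x05")
print(id(ba) == before)   # True

# bytes(ba) copies the data, so later edits to ba do not show up in the snapshot.
snapshot = bytes(ba)
ba[0] = 0
print(snapshot)           # b'\t\x02\x03\x04\x05'
print(ba)                 # bytearray(b'\x00\x02\x03\x04\x05')
```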

0x06 Converting between types

list->tuple

>>> l = ['a', 'b', 'c']
>>> tuple(l)
('a', 'b', 'c')

tuple->list

>>> t = ('a', 'b', 'c')
>>> t
('a', 'b', 'c')
>>> list(t)
['a', 'b', 'c']

str->list

>>> l = list('abc')
>>> l
['a', 'b', 'c']

list->str

>>> l
['a', 'b', 'c']
>>> ''.join(l)
'abc'

str->bytes

>>> s = '123'
>>> bytes(s)
TypeError: string argument without an encoding
>>> bytes(s,encoding='utf-8')
b'123'
# Or use str's encode() method
>>> s.encode()
b'123'

bytes->str

>>> b = b'124'
>>> b
b'124'
>>> type(b)
<class 'bytes'>
>>> str(b,encoding='utf-8')
'124'
# Or use the decode() method of bytes
>>> b.decode()
'124'
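
The examples above use ASCII text, where the choice of encoding makes little visible difference. The following sketch uses a non-ASCII character to show why the encoding has to match in both directions. The sample text and variable names are illustrative.

```python
text = "h\u00e9llo"   # "héllo", written with an escape to keep the source ASCII

utf8 = text.encode("utf-8")
latin1 = text.encode("latin-1")
print(len(text), len(utf8), len(latin1))   # 5 6 5

# Decoding with the codec used for encoding restores the original text.
print(utf8.decode("utf-8") == text)        # True

# Decoding with the wrong codec either fails or gives the wrong text.
try:
    latin1.decode("utf-8")
    decoded_ok = True
except UnicodeDecodeError:
    decoded_ok = False
print(decoded_ok)                          # False
```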

0x07 Summary

All of these types ship with Python (array.array lives in the standard library's array module; the rest are built in). In actual development, choose the type that fits the specific need. For example, when the elements to be stored are of mixed types, use list or tuple. array.array is more space-efficient, but it can only store a single element type.

In many business scenarios a list or tuple is enough, but the other types are still worth understanding: performance matters when building low-level components, and knowing these types makes it easier to read other people's code.

0x08 Learning Materials


Added by brainstorm on Fri, 09 Aug 2019 11:40:29 +0300