13 practical PyTorch features you should know

1. DatasetFolder

When learning PyTorch, one of the first things people tend to do is implement some kind of custom Dataset. This is a beginner's mistake, and there is no need to waste time on it. Typically, a dataset is either a list of data (or a numpy array) or files on disk. It is therefore better to organize your data on disk than to write a custom Dataset to load some exotic format.

One of the most common data layouts for classification is a directory with subfolders. Each subfolder represents a class, and the files inside it are the samples, as shown below.

folder/class_0/file1.txt
folder/class_0/file2.txt
folder/class_0/...

folder/class_1/file3.txt
folder/class_1/file4.txt

folder/class_2/file5.txt
folder/class_2/...

There is a built-in way to load this kind of dataset: whether your data consists of images, text files, or anything else, just use DatasetFolder. Surprisingly, this class lives in the torchvision package, not in core PyTorch. The class is very comprehensive: you can filter files by extension, load them with custom code, and transform the raw samples on the fly. For example:

from torchvision.datasets import DatasetFolder
from pathlib import Path
# I have text files in this folder
ds = DatasetFolder("/Users/marcin/Dev/tmp/my_text_dataset", 
    loader=lambda path: Path(path).read_text(),
    extensions=(".txt",), #only load .txt files
    transform=lambda text: text[:100], # only take first 100 characters
)

# Everything you need is already there
len(ds), ds.classes, ds.class_to_idx
(20, ['novels', 'thrillers'], {'novels': 0, 'thrillers': 1})

If you're working with images, there is also the torchvision.datasets.ImageFolder class, which builds on DatasetFolder and comes preconfigured to load images.
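
A minimal sketch of its usage (the directory path here is hypothetical, and the layout is the same folder/class_x/file structure shown above):

from torchvision.datasets import ImageFolder
# hypothetical path, laid out as folder/class_0/img1.jpg, folder/class_1/img2.jpg, ...
ds = ImageFolder("/path/to/my_image_dataset")
img, class_idx = ds[0]  # a PIL image and its integer class index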

2. Use .to(device) sparingly: prefer zeros_like / ones_like and friends

I've read a lot of PyTorch code from GitHub repositories. What annoys me most is the many .to(device) lines scattered through almost every repo, moving data between the CPU and the GPU. Such statements show up in large repos and beginner tutorials alike. I strongly recommend using them as little as possible and relying instead on built-in PyTorch functions that handle device placement automatically. Sprinkling .to(device) everywhere usually leads to performance degradation and exceptions like:

Expected object of device type cuda but got device type cpu

Obviously, there are cases where you can't avoid it, but most, if not all, can be handled differently. One of them is initializing a tensor of all zeros or all ones, which often comes up when computing the loss of a deep neural network: the model's output is already on CUDA, and you need another tensor that is also on CUDA. For this you can use the *_like operators:

my_output  # on any device; if it's CUDA, my_zeros will also be on CUDA
my_zeros = torch.zeros_like(my_output)

Internally, PyTorch performs the equivalent of:

my_zeros = torch.zeros(my_output.size(), dtype=my_output.dtype, layout=my_output.layout, device=my_output.device)

So the device, dtype, and layout are all set correctly, which reduces the chance of errors in your code. Similar operations include the following; a small end-to-end sketch follows the list:

torch.zeros_like()
torch.ones_like()
torch.rand_like()
torch.randn_like()
torch.randint_like()
torch.empty_like()
torch.full_like()
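
Putting it together, here is a minimal sketch of the loss case described above; the tensor names and the choice of MSE are illustrative, not from the original:

import torch
import torch.nn.functional as F

model_output = torch.randn(8, requires_grad=True)  # in real code this may live on CUDA
targets = torch.ones_like(model_output)            # same device and dtype, no .to(device) needed
loss = F.mse_loss(model_output, targets)
loss.backward()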

3. Register buffers (nn.Module.register_buffer)

This ties in with my earlier advice about not using .to(device) everywhere. Sometimes your model or loss function needs preset constants that it uses in forward: for example, a "weight" tensor that scales the loss, or some fixed tensor that never changes but is used on every call. For this case, use the nn.Module.register_buffer method, which tells PyTorch to store the value inside the module and move it together with the module. If you initialize your module and then move it to the GPU, these values move along automatically. Moreover, if you save the module's state, the buffers are saved too! (A quick check of both claims follows the example below.)

Once registered, these values can be accessed in the forward function just like any other module attribute.

from torch import nn
import torch

class ModuleWithCustomValues(nn.Module):
    def __init__(self, weights, alpha):
        super().__init__()
        self.register_buffer("weights", torch.tensor(weights))
        self.register_buffer("alpha", torch.tensor(alpha))
    
    def forward(self, x):
        return x * self.weights + self.alpha

m = ModuleWithCustomValues(
    weights=[1.0, 2.0], alpha=1e-4
)
m(torch.tensor([1.23, 4.56]))
tensor([1.2301, 9.1201])
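
A quick check of the two claims above, reusing the module m from the example:

print(list(m.state_dict().keys()))  # ['weights', 'alpha'] -- buffers are saved with the state
m = m.to("cpu")                     # .to() moves buffers together with the module
print(m.weights.device)             # device(type='cpu')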

4. Built-in Identity()

Sometimes, when doing transfer learning, you need to replace a layer with a 1:1 mapping. You could write an nn.Module that simply returns its input, but PyTorch already has this class built in: nn.Identity.

For example, say you want to get the image representation from a pretrained ResNet50, taken just before the classification layer. Here's how to do it:

from torchvision.models import resnet50
model = resnet50(pretrained=True)
model.fc = nn.Identity()
last_layer_output = model(torch.rand((1, 3, 224, 224)))
last_layer_output.shape
torch.Size([1, 2048])

5. Pairwise distances: torch.cdist

The next time you need to compute Euclidean distances (or, more generally, p-norm distances) between two batches of tensors, remember torch.cdist. It does exactly that, and when computing Euclidean distances it can use matrix multiplication under the hood, which improves performance.

points1 = torch.tensor([[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]])
points2 = torch.tensor([[0.0, 0.0], [-1.0, -1.0], [-2.0, -2.0], [-3.0, -3.0]]) # batches don't have to be equal
torch.cdist(points1, points2, p=2.0)
tensor([[0.0000, 1.4142, 2.8284, 4.2426],
        [1.4142, 2.8284, 4.2426, 5.6569],
        [2.8284, 4.2426, 5.6569, 7.0711]])

Here is the performance without and with matrix multiplication; on my machine it is more than twice as fast with it.

%%timeit
points1 = torch.rand((512, 2))
points2 = torch.rand((512, 2))
torch.cdist(points1, points2, p=2.0, compute_mode="donot_use_mm_for_euclid_dist")

867 µs ± 142 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

%%timeit
points1 = torch.rand((512, 2))
points2 = torch.rand((512, 2))
torch.cdist(points1, points2, p=2.0)

417 µs ± 52.9 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

6. Cosine similarity: F.cosine_similarity

Staying on the subject of distances: Euclidean distance is not always what you need. When working with vectors, cosine similarity is often the metric of choice, and PyTorch has a built-in implementation.

import torch.nn.functional as F
vector1 = torch.tensor([0.0, 1.0])
vector2 = torch.tensor([0.05, 1.0])
print(F.cosine_similarity(vector1, vector2, dim=0))
vector3 = torch.tensor([0.0, -1.0])
print(F.cosine_similarity(vector1, vector3, dim=0))
tensor(0.9988)
tensor(-1.)

Batched cosine similarity in PyTorch: the two unsqueeze calls create shapes (4, 1, 64) and (1, 4, 64), which broadcast so that every vector is compared with every other.

import torch.nn.functional as F
batch_of_vectors = torch.rand((4, 64))
similarity_matrix = F.cosine_similarity(batch_of_vectors.unsqueeze(1), batch_of_vectors.unsqueeze(0), dim=2)
similarity_matrix
tensor([[1.0000, 0.6922, 0.6480, 0.6789],
        [0.6922, 1.0000, 0.7143, 0.7172],
        [0.6480, 0.7143, 1.0000, 0.7312],
        [0.6789, 0.7172, 0.7312, 1.0000]])

7. Normalizing vectors: F.normalize

This point is still loosely related to vectors and distances: normalization, which usually improves the stability of computations by rescaling vectors. The most common normalization is L2, and it can be applied in PyTorch as follows:

vector = torch.tensor([99.0, -512.0, 123.0, 0.1, 6.66])
normalized_vector = F.normalize(vector, p=2.0, dim=0)
normalized_vector
tensor([ 1.8476e-01, -9.5552e-01,  2.2955e-01,  1.8662e-04,  1.2429e-02])

The old method of performing normalization in PyTorch was:

vector = torch.tensor([99.0, -512.0, 123.0, 0.1, 6.66])
normalized_vector = vector / torch.norm(vector, p=2.0)
normalized_vector
tensor([ 1.8476e-01, -9.5552e-01,  2.2955e-01,  1.8662e-04,  1.2429e-02])

Batched L2 normalization in PyTorch:

batch_of_vectors = torch.rand((4, 64))
normalized_batch_of_vectors = F.normalize(batch_of_vectors, p=2.0, dim=1)
normalized_batch_of_vectors.shape, torch.norm(normalized_batch_of_vectors, dim=1) # all vectors will have length of 1.0
(torch.Size([4, 64]), tensor([1.0000, 1.0000, 1.0000, 1.0000]))

8. Linear layers + chunking (torch.chunk)

This is a creative trick I recently discovered. Suppose you want to map your input to N different linear projections. You could create N separate nn.Linear layers, or you could create a single bigger linear layer, do one forward pass, and then chunk the output into N pieces. This approach usually performs better, so it's a trick worth remembering.

d = 1024
batch = torch.rand((8, d))
layers = nn.Linear(d, 128, bias=False), nn.Linear(d, 128, bias=False), nn.Linear(d, 128, bias=False)
one_layer = nn.Linear(d, 128 * 3, bias=False)
%%timeit
o1 = layers[0](batch)
o2 = layers[1](batch)
o3 = layers[2](batch)

289 µs ± 30.8 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

%%timeit
o1, o2, o3 = torch.chunk(one_layer(batch), 3, dim=1)

202 µs ± 8.09 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

9. Masked select (torch.masked_select)

Sometimes you only need to compute over part of an input tensor. For example, you may want to compute the loss only on elements that satisfy some condition; a sketch of this follows the example below. For this you can use torch.masked_select. Note that this operation can also be used when gradients are required.

data = torch.rand((3, 3)).requires_grad_()
print(data)
mask = data > data.mean()
print(mask)
torch.masked_select(data, mask)
tensor([[0.0582, 0.7170, 0.7713],
        [0.9458, 0.2597, 0.6711],
        [0.2828, 0.2232, 0.1981]], requires_grad=True)
tensor([[False,  True,  True],
        [ True, False,  True],
        [False, False, False]])
tensor([0.7170, 0.7713, 0.9458, 0.6711], grad_fn=<MaskedSelectBackward>)
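
To make the loss use case from above concrete, here is a minimal sketch; the names and the validity condition are illustrative:

import torch
import torch.nn.functional as F

preds = torch.rand(3, 3, requires_grad=True)
targets = torch.rand(3, 3)
valid = targets > 0.5  # compute the loss only where the target is "valid"
loss = F.mse_loss(torch.masked_select(preds, valid),
                  torch.masked_select(targets, valid))  # assumes at least one element is selected
loss.backward()  # gradients flow only through the selected elements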

Applying the mask directly to the tensor

Similar behavior can be achieved by using the mask as an "indexer" into the input tensor.

data[mask]
tensor([0.7170, 0.7713, 0.9458, 0.6711], grad_fn=<IndexBackward>)

Sometimes, the ideal solution is instead to fill the positions where the mask is False with zeros, which you can do like this:

data * mask
tensor([[0.0000, 0.7170, 0.7713],
        [0.9458, 0.0000, 0.6711],
        [0.0000, 0.0000, 0.0000]], grad_fn=<MulBackward0>)

10. Conditional selection with torch.where

This function is useful when you want to combine two tensors under a condition: where the condition is true, take the element from the first tensor; where it is false, take it from the second.

x = torch.tensor([1.0, 2.0, 3.0, 4.0, 5.0], requires_grad=True)
y = -x
condition_or_mask = x <= 3.0
torch.where(condition_or_mask, x, y)
tensor([ 1.,  2.,  3., -4., -5.], grad_fn=<SWhereBackward>)
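
torch.where combines nicely with the *_like operators from point 2. As a purely illustrative sketch, here is a hand-rolled ReLU:

x = torch.linspace(-2.0, 2.0, 5, requires_grad=True)
relu_x = torch.where(x > 0, x, torch.zeros_like(x))  # tensor([0., 0., 0., 1., 2.]); gradients flow through the kept branch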

11. Filling a tensor at given positions (Tensor.scatter)

The use case for this function is as follows: you want to fill a tensor with values taken from another tensor, at specified positions. The one-dimensional case is easier to understand, so I'll show it first and then move on to a more advanced example.

data = torch.tensor([1, 2, 3, 4, 5])
index = torch.tensor([0, 1])
values = torch.tensor([-1, -2, -3, -4, -5])
data.scatter(0, index, values)
tensor([-1, -2,  3,  4,  5])

The above example is very simple, but now let's see what happens if we change the index to index = torch.tensor([0, 1, 4]):

data = torch.tensor([1, 2, 3, 4, 5])
index = torch.tensor([0, 1, 4])
values = torch.tensor([-1, -2, -3, -4, -5])
data.scatter(0, index, values)
tensor([-1, -2,  3,  4, -3])

Why is the last value -3? It's counterintuitive, isn't it? But this is the central idea of PyTorch's scatter: the index tensor indicates where the i-th value of the values tensor should be placed in the data tensor. The plain Python below should make the behavior clear:

data_orig = torch.tensor([1, 2, 3, 4, 5])
index = torch.tensor([0, 1, 4])
values = torch.tensor([-1, -2, -3, -4, -5])
scattered = data_orig.scatter(0, index, values)

data = data_orig.clone()
for idx_in_values, where_to_put_the_value in enumerate(index):
    what_value_to_put = values[idx_in_values]
    data[where_to_put_the_value] = what_value_to_put
data, scattered
(tensor([-1, -2,  3,  4, -3]), tensor([-1, -2,  3,  4, -3]))

A PyTorch scatter example with 2D data

Always remember: the shape of index matches the shape of values, while the entries of index are positions in data along the given dim. For dim=1 this means data[i][index[i][j]] = values[i][j]; see the loop sketch after the example.

data = torch.zeros((4, 4)).float()
index = torch.tensor([
    [0, 1],
    [2, 3],
    [0, 3],
    [1, 2]
])
values = torch.arange(1, 9).float().view(4, 2)
values, data.scatter(1, index, values)
(tensor([[1., 2.],
        [3., 4.],
        [5., 6.],
        [7., 8.]]),
tensor([[1., 2., 0., 0.],
        [0., 0., 3., 4.],
        [5., 0., 0., 6.],
        [0., 7., 8., 0.]]))
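
In plain Python, the 2D example above is equivalent to the loop below (the data[i][index[i][j]] = values[i][j] rule for dim=1):

result = data.clone()
for i in range(index.size(0)):
    for j in range(index.size(1)):
        result[i, index[i, j]] = values[i, j]
result  # identical to data.scatter(1, index, values)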

12. Image interpolation in the network (F.interpolate)

When I was learning PyTorch, I was surprised to find that I could actually resize an image (or any intermediate tensor) during the forward pass while keeping the gradient flow intact. This is particularly useful with CNNs and GANs.

from PIL import Image
from torchvision.transforms.functional import to_pil_image, to_tensor
import torch.nn.functional as F

# image from https://commons.wikimedia.org/wiki/File:A_female_British_Shorthair_at_the_age_of_20_months.jpg
img = Image.open("./cat.jpg")
img

to_pil_image(
    F.interpolate(to_tensor(img).unsqueeze(0),  # batch of size 1
                  mode="bilinear", 
                  scale_factor=2.0, 
                  align_corners=False).squeeze(0) # remove batch dimension
)

Note how the gradient flow is preserved:

F.interpolate(to_tensor(img).unsqueeze(0).requires_grad_(),
                  mode="bicubic", 
                  scale_factor=2.0, 
                  align_corners=False)
tensor([[[[0.9216, 0.9216, 0.9216,  ..., 0.8361, 0.8272, 0.8219],
    [0.9214, 0.9214, 0.9214,  ..., 0.8361, 0.8272, 0.8219],
    [0.9212, 0.9212, 0.9212,  ..., 0.8361, 0.8272, 0.8219],
    ...,
    [0.9098, 0.9098, 0.9098,  ..., 0.3592, 0.3486, 0.3421],
    [0.9098, 0.9098, 0.9098,  ..., 0.3566, 0.3463, 0.3400],
    [0.9098, 0.9098, 0.9098,  ..., 0.3550, 0.3449, 0.3387]],

    [[0.6627, 0.6627, 0.6627,  ..., 0.5380, 0.5292, 0.5238],
    [0.6626, 0.6626, 0.6626,  ..., 0.5380, 0.5292, 0.5238],
    [0.6623, 0.6623, 0.6623,  ..., 0.5380, 0.5292, 0.5238],
    ...,
    [0.6196, 0.6196, 0.6196,  ..., 0.3631, 0.3525, 0.3461],
    [0.6196, 0.6196, 0.6196,  ..., 0.3605, 0.3502, 0.3439],
    [0.6196, 0.6196, 0.6196,  ..., 0.3589, 0.3488, 0.3426]],

    [[0.4353, 0.4353, 0.4353,  ..., 0.1913, 0.1835, 0.1787],
    [0.4352, 0.4352, 0.4352,  ..., 0.1913, 0.1835, 0.1787],
    [0.4349, 0.4349, 0.4349,  ..., 0.1913, 0.1835, 0.1787],
    ...,
    [0.3333, 0.3333, 0.3333,  ..., 0.3827, 0.3721, 0.3657],
    [0.3333, 0.3333, 0.3333,  ..., 0.3801, 0.3698, 0.3635],
    [0.3333, 0.3333, 0.3333,  ..., 0.3785, 0.3684, 0.3622]]]],
grad_fn=<UpsampleBicubic2DBackward1>)

13. Arranging images into a grid (torchvision.utils.make_grid)

With PyTorch and torchvision, you don't need matplotlib or copy-pasted code from external libraries to display a grid of images: just use torchvision.utils.make_grid.

from torchvision.utils import make_grid
from torchvision.transforms.functional import to_tensor, to_pil_image
from PIL import Image
img = Image.open("./cat.jpg")
to_pil_image(
    make_grid(
        [to_tensor(i) for i in [img, img, img]],
         nrow=2, # number of images in single row
         padding=5 # "frame" size
     )
)

Reference website:

  1.  https://zhuanlan.zhihu.com/p/414682349
