Summary of PyTorch loss functions

1 nn.L1Loss

  torch.nn.L1Loss(reduction='mean')

It is MAE (mean absolute error), and the calculation formula is

    $\ell(x, y)=L=\left\{l_{1}, \ldots, l_{N}\right\}^{\top}, \quad l_{n}=\left|x_{n}-y_{n}\right|$

    $\ell(x, y)=\left\{\begin{array}{ll}\operatorname{mean}(L), & \text { if reduction }=\text { 'mean'; } \\\operatorname{sum}(L), & \text { if reduction }=\text { 'sum' }\end{array}\right.$

Example: element-by-element calculation

import torch
import torch.nn as nn

input = torch.arange(1., 7.).view(2, 3)
target = torch.arange(6.).view(2, 3)   # float target so the dtypes match
print(input)
print(target)
"""
tensor([[1., 2., 3.],
        [4., 5., 6.]])
tensor([[0., 1., 2.],
        [3., 4., 5.]])
"""
loss = nn.L1Loss(reduction='sum')
output = loss(input, target)
print(output)
"""
tensor(6.)
"""
loss = nn.L1Loss(reduction='mean')
output = loss(input, target)
print(output)
"""
tensor(1.)
"""

2 nn.MSELoss

    torch.nn.MSELoss(reduction='mean')

As its name suggests, this is the mean squared error (the squared L2 distance), calculated as

    $\ell(x, y)=L=\left\{l_{1}, \ldots, l_{N}\right\}^{\top}, \quad l_{n}=\left(x_{n}-y_{n}\right)^{2}$

    $\ell(x, y)=\left\{\begin{array}{ll}\operatorname{mean}(L), & \text { if reduction }=\text { 'mean'; } \\\operatorname{sum}(L), & \text { if reduction }=\text { 'sum' }\end{array}\right.$

The mean and sum modes are selected with the reduction argument; reduction='none' is also available (see the sketch after the example below).

Example: element-by-element calculation

loss = nn.MSELoss(reduction="mean")
output = loss(input, target)
print(output)
"""
tensor(1.)
"""
loss = nn.MSELoss(reduction="sum")
output = loss(input, target)
print(output)
"""
tensor(6.)
"""

From the experiments above it can be seen that

    $l_{n}=\left(x_{n}-y_{n}\right)^{2}$

is computed element by element.

3 nn.SmoothL1Loss

    torch.nn.SmoothL1Loss(reduction='mean', beta=1.0)

It smooths the L1 loss a little and is less sensitive to outliers than MSELoss:

    $\ell(x, y)=L=\left\{l_{1}, \ldots, l_{N}\right\}^{T}$

    $l_{n}=\left\{\begin{array}{ll}0.5\left(x_{n}-y_{n}\right)^{2} / \text { beta }, & \text { if }\left|x_{n}-y_{n}\right|<\text { beta } \\\left|x_{n}-y_{n}\right|-0.5 * \text { beta }, & \text { otherwise }\end{array}\right.$

It is used in Fast R-CNN to avoid exploding gradients.

Example: element-by-element calculation

loss = nn.SmoothL1Loss(reduction="mean")
output = loss(input, target)
print(output)
"""
tensor(0.5000)
"""
loss = nn.SmoothL1Loss(reduction="mean",beta = 3)
output = loss(input, target)
print(output)
"""
tensor(0.1667)
"""

4 nn.BCELoss and nn.BCEWithLogitsLoss

    torch.nn.BCELoss(weight=None,reduction='mean')

Binary Cross Entropy, the formula is as follows:

    $\ell(x, y)=L=\left\{l_{1}, \ldots, l_{N}\right\}^{\top}, \quad l_{n}=-w_{n}\left[y_{n} \cdot \log x_{n}+\left(1-y_{n}\right) \cdot \log \left(1-x_{n}\right)\right]$

    $\ell(x, y)=\left\{\begin{array}{ll}\operatorname{mean}(L), & \text { if reduction }=\text { 'mean'; } \\\operatorname{sum}(L), & \text { if reduction }=\text { 'sum' }\end{array}\right.$

Binary cross entropy is the two-class special case of the cross entropy formula; it can also be used for multi-label tasks whose classes are not mutually exclusive.

BCELoss expects probabilities, so you have to apply the sigmoid yourself; for each position it adds $-\log(x)$ if the label is 1 and $-\log(1-x)$ otherwise, then takes the average.

BCEWithLogitsLoss applies the sigmoid internally, so it takes raw scores (logits); everything else is exactly the same.

Example: element-by-element calculation.

import numpy as np

target = torch.tensor([[1,0,1],[0,1,1]], dtype=torch.float32)
raw_output = torch.randn(2,3, dtype=torch.float32)
output = torch.sigmoid(raw_output)
print(output)

# hand-written BCE, element by element
result = np.zeros((2,3))
for ix in range(2):
    for iy in range(3):
        if(target[ix, iy]==1): 
            result[ix, iy] += -np.log(output[ix, iy])
        elif(target[ix, iy]==0): 
            result[ix, iy] += -np.log(1-output[ix, iy])

print(result)
print(np.mean(result))

loss_fn = torch.nn.BCELoss(reduction='none')
print(loss_fn(output, target))
loss_fn = torch.nn.BCELoss(reduction='mean')
print(loss_fn(output, target))
loss_fn = torch.nn.BCEWithLogitsLoss(reduction='sum')
print(loss_fn(raw_output, target))
"""
tensor([[0.5316, 0.6816, 0.4768],
        [0.6485, 0.3037, 0.5490]])
[[0.63186073 1.14431179 0.74067789]
 [1.04543173 1.19187558 0.59973639]]
0.892315685749054
tensor([[0.6319, 1.1443, 0.7407],
        [1.0454, 1.1919, 0.5997]])
tensor(0.8923)
tensor(5.3539)
"""
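
As a quick consistency check (not part of the original example), BCEWithLogitsLoss on the raw scores should agree with BCELoss on their sigmoid:

bce_logits = torch.nn.BCEWithLogitsLoss(reduction='mean')(raw_output, target)
bce = torch.nn.BCELoss(reduction='mean')(torch.sigmoid(raw_output), target)
print(torch.allclose(bce_logits, bce))
"""
True
"""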

5 nn.CrossEntropyLoss

     torch.nn.CrossEntropyLoss(weight=None, ignore_index=-100, reduction='mean', label_smoothing=0.0)

Classic Loss, the calculation formula is:

    $\text { weight }[\text { class }]\left(-\log \left(\frac{\exp (x[\text { class }])}{\sum\limits_{j} \exp (x[j])}\right)\right)=\text { weight }[\text { class }]\left(-x[\text { class }]+\log \left(\sum\limits_{j} \exp (x[j])\right)\right)$

It is equivalent to mapping the output values through softmax into $[0,1]$, with the values summing to $1$.

We want the loss of the correct class to be as small as possible, so we take $-\log\left(\frac{\exp (x[\text { class }])}{\sum\limits_{j} \exp (x[j])}\right)$, which maps $[0,1]$ to $[0,+\infty)$: the larger the probability of the correct class, the smaller the overall loss.

CrossEntropyLoss(x) in torch is equivalent to NLLLoss(LogSoftmax(x)).

It expects unnormalized scores as input. The input shapes are the same as for NLLLoss: $(N, C)$ and $(N)$.
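
This equivalence is easy to check with a small sketch (not from the original post; the tensors here are made up for illustration):

scores = torch.randn(4, 5)             # 4 samples, 5 classes
labels = torch.randint(0, 5, (4,))
ce = torch.nn.CrossEntropyLoss()(scores, labels)
nll = torch.nn.NLLLoss()(torch.nn.functional.log_softmax(scores, dim=1), labels)
print(torch.allclose(ce, nll))
"""
True
"""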

Example: computed per sample

target = torch.tensor([1,0,3])
output = torch.randn(3,5)
print(output)
"""
tensor([[-2.5728, -0.4581, -0.2017,  1.8813,  0.4544],
        [-0.7278,  0.6300,  0.6510, -1.7570,  1.1788],
        [-0.4660,  0.0410,  0.6876,  0.8966,  0.1446]])
"""
loss_fn = torch.nn.CrossEntropyLoss(reduction='mean')
loss = loss_fn(output, target)
print(loss)
"""
tensor(2.1940)
"""
loss_fn = torch.nn.CrossEntropyLoss(reduction='sum')
loss = loss_fn(output, target)
print(loss)
"""
tensor(6.5821)
"""

Example: handwritten version

target = torch.tensor([1,0,3])
output = torch.randn(3,5)
print(output)
"""
tensor([[-0.1168,  1.5417,  1.1748, -1.1856, -0.1233],
        [ 0.2074, -0.7376, -0.8934,  0.0899,  0.5337],
        [-0.5323, -0.2945, -0.1710,  1.5925,  1.3654]])
"""
# hand-written cross entropy: per sample, -x[class] + log(sum_j exp(x[j]))
result = np.array([0.0, 0.0, 0.0])
for ix in range(3):
    log_sum = 0.0
    for iy in range(5):
        if(iy==target[ix]): 
            result[ix] += -output[ix, iy]
        log_sum += np.exp(output[ix, iy])
    result[ix] += np.log(log_sum)
print(result)
print(np.mean(result))

loss_fn = torch.nn.CrossEntropyLoss(reduction='mean')
loss = loss_fn(output, target)
print(loss.item())
"""
[0.75984335 1.3853296  0.80614853]
0.9837738275527954
0.9837737679481506
"""

6 nn.NLLLoss

     torch.nn.NLLLoss(weight=None, ignore_index=-100, reduction='mean')

Negative log likelihood loss is used to train an n-class classifier. For unbalanced data sets a per-class weight can be supplied, and the calculation formula is

    $l_{n}=-w_{y_{n}} x_{n, y_{n}}$

    $w_{c}=\text { weight }[c] \cdot \mathbb{1}\{c \neq \text { ignore\_index }\}$

The expected input shapes are $(N, C)$ and $(N)$, where $N$ is the batch size and $C$ is the number of classes.

For each sample, it takes the negative of the input value at the target class, then averages / sums. It is usually used together with LogSoftmax, so that the inputs are log-probabilities.

Example: computed per sample

target = torch.tensor([1,0,3])
output = torch.randn(3,5)
print(output)

loss_fn = torch.nn.NLLLoss(reduction='mean')
loss = loss_fn(output, target)
print(loss)

loss_fn = torch.nn.NLLLoss(reduction='sum')
loss = loss_fn(output, target)
print(loss)
"""
tensor([[ 1.5083,  0.1846, -1.8400, -0.0068, -0.1943],
        [ 0.5303, -0.0350, -0.3924,  0.3026,  0.6159],
        [ 2.0047, -1.0653,  0.0718, -0.8632, -1.0695]])
tensor(0.0494)
tensor(0.1482)
"""

Clearly the loss is computed per sample, not element by element.
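
As an aside, the class-weight behaviour described above can be sketched with made-up numbers (not from the original post). Note that with reduction='mean' the sum is divided by the sum of the selected weights, not by the batch size:

log_probs = torch.log(torch.tensor([[0.7, 0.2, 0.1],
                                    [0.1, 0.6, 0.3]]))
target = torch.tensor([0, 1])
weight = torch.tensor([2.0, 1.0, 1.0])   # class 0 counts twice as much

loss_fn = torch.nn.NLLLoss(weight=weight, reduction='mean')
print(loss_fn(log_probs, target))
# manual check: (2 * -log(0.7) + 1 * -log(0.6)) / (2 + 1)
print((2 * -torch.log(torch.tensor(0.7)) + 1 * -torch.log(torch.tensor(0.6))) / 3)
"""
tensor(0.4081)
tensor(0.4081)
"""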

Example:

import torch
input=torch.randn(3,3)
soft_input = torch.nn.Softmax(dim=0)   # note: for (N, C) logits the class dimension is usually dim=1
soft_input(input)
"""
tensor([[0.2603, 0.6519, 0.5811],
        [0.5248, 0.3026, 0.1783],
        [0.2148, 0.0455, 0.2406]])
"""
# take the log of the softmax result
torch.log(soft_input(input))
"""
tensor([[-1.3458, -0.4279, -0.5428],
        [-0.6447, -1.1952, -1.7243],
        [-1.5379, -3.0898, -1.4248]])
"""

Assuming the labels are [0, 1, 2], take element 0 of the first row, element 1 of the second row, and element 2 of the third row of the log-softmax result, drop the minus signs to get [1.3458, 1.1952, 1.4248], and average them to obtain the loss value: (1.3458 + 1.1952 + 1.4248) / 3 ≈ 1.3219.

loss = torch.nn.NLLLoss()
target = torch.tensor([0,1,2])
loss(torch.log(soft_input(input)), target)
"""
tensor(1.3219)
"""

So nn.NLLLoss takes the mean of the negative log(softmax) values at the target classes, which is exactly the cross entropy.

 

Reference: https://segmentfault.com/a/1190000038584083
