1 nn.L1Loss
torch.nn.L1Loss(reduction='mean')
This is the MAE (mean absolute error); the calculation formula is
$\ell(x, y)=L=\left\{l_{1}, \ldots, l_{N}\right\}^{\top}, \quad l_{n}=\left|x_{n}-y_{n}\right|$
$\ell(x, y)=\left\{\begin{array}{ll}\operatorname{mean}(L), & \text { if reduction }=\text { 'mean'; } \\\operatorname{sum}(L), & \text { if reduction }=\text { 'sum' }\end{array}\right.$
Example: element by element calculation
import torch
import torch.nn as nn
import numpy as np

input = torch.arange(1, 7.).view(2, 3)
target = torch.arange(6.).view(2, 3)  # use a float target so the dtype matches the input
print(input)
print(target)
"""
tensor([[1., 2., 3.],
        [4., 5., 6.]])
tensor([[0., 1., 2.],
        [3., 4., 5.]])
"""
loss = nn.L1Loss(reduction='sum')
output = loss(input, target)
print(output)
"""
tensor(6.)
"""
loss = nn.L1Loss(reduction='mean')
output = loss(input, target)
print(output)
"""
tensor(1.)
"""
2 nn.MSELoss
torch.nn.MSELoss(reduction='mean')
As its name suggests, this is the mean squared error (the squared L2 distance); the calculation formula is
$\ell(x, y)=L=\left\{l_{1}, \ldots, l_{N}\right\}^{\top}, \quad l_{n}=\left(x_{n}-y_{n}\right)^{2}$
$\ell(x, y)=\left\{\begin{array}{ll}\operatorname{mean}(L), & \text { if reduction }=\text { 'mean'; } \\ \operatorname{sum}(L), & \text { if reduction }=\text { 'sum' }\end{array}\right.$
There are two modes: mean and sum, which are controlled by reduction.
Example: element by element calculation
loss = nn.MSELoss(reduction="mean")
output = loss(input, target)
print(output)
"""
tensor(1.)
"""
loss = nn.MSELoss(reduction="sum")
output = loss(input, target)
print(output)
"""
tensor(6.)
"""
It can be seen from the above experiments that
$l_{n}=\left(x_{n}-y_{n}\right)^{2}$
is calculated element by element.
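A quick manual check of this, reusing the input and target tensors defined in the L1Loss example above:
# element-wise squared differences, then sum / mean
diff = input - target
print(diff.pow(2).sum())   # tensor(6.), matches MSELoss(reduction='sum')
print(diff.pow(2).mean())  # tensor(1.), matches MSELoss(reduction='mean')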
3 nn.SmoothL1Loss
torch.nn.SmoothL1Loss(reduction='mean', beta=1.0)
It smooths the L1 loss near zero and is less sensitive to outliers than MSELoss.
$\ell(x, y)=L=\left\{l_{1}, \ldots, l_{N}\right\}^{T}$
$l_{n}=\left\{\begin{array}{ll}0.5\left(x_{n}-y_{n}\right)^{2} / \text { beta }, & \text { if }\left|x_{n}-y_{n}\right|<\text { beta } \\\left|x_{n}-y_{n}\right|-0.5 * \text { beta }, & \text { otherwise }\end{array}\right.$
It is used in Fast R-CNN to avoid exploding gradients.
Example: element by element calculation (MSELoss on the same input is repeated first for comparison)
loss = nn.MSELoss(reduction="sum")
output = loss(input, target)
print(output)
"""
tensor(6.)
"""
loss = nn.SmoothL1Loss(reduction="mean")
output = loss(input, target)
print(output)
"""
tensor(0.5000)
"""
loss = nn.SmoothL1Loss(reduction="mean",beta = 3)
output = loss(input, target)
print(output)
"""
tensor(0.1667)
"""
4 nn.BCELoss and nn.BCEWithLogitsLoss
torch.nn.BCELoss(weight=None, reduction='mean')
Binary Cross Entropy, the formula is as follows:
$\ell(x, y)=L=\left\{l_{1}, \ldots, l_{N}\right\}^{\top}, \quad l_{n}=-w_{n}\left[y_{n} \cdot \log x_{n}+\left(1-y_{n}\right) \cdot \log \left(1-x_{n}\right)\right]$
$\ell(x, y)=\left\{\begin{array}{ll}\operatorname{mean}(L), & \text { if reduction }=\text { 'mean'; } \\ \operatorname{sum}(L), & \text { if reduction }=\text { 'sum' }\end{array}\right.$
Binary cross entropy is the two-class simplification of the general cross-entropy formula; it can be used for multi-label classification tasks where the classes are not mutually exclusive.
BCELoss expects the input to have already been passed through a sigmoid; then, for each position, it adds $-\log(x)$ if the label is 1 and $-\log(1-x)$ if the label is 0, and finally averages.
BCEWithLogitsLoss applies the sigmoid internally (it takes raw logits); everything else is exactly the same.
Example: element by element calculation.
target = torch.tensor([[1, 0, 1], [0, 1, 1]], dtype=torch.float32)
raw_output = torch.randn(2, 3, dtype=torch.float32)
output = torch.sigmoid(raw_output)
print(output)
# handwritten BCE: -log(p) where the label is 1, -log(1-p) where it is 0
result = np.zeros((2, 3))
for ix in range(2):
    for iy in range(3):
        if target[ix, iy] == 1:
            result[ix, iy] = -np.log(output[ix, iy].item())
        elif target[ix, iy] == 0:
            result[ix, iy] = -np.log(1 - output[ix, iy].item())
print(result)
print(np.mean(result))
loss_fn = torch.nn.BCELoss(reduction='none')
print(loss_fn(output, target))
loss_fn = torch.nn.BCELoss(reduction='mean')
print(loss_fn(output, target))
loss_fn = torch.nn.BCEWithLogitsLoss(reduction='sum')
print(loss_fn(raw_output, target))
"""
tensor([[0.5316, 0.6816, 0.4768],
        [0.6485, 0.3037, 0.5490]])
[[0.63186073 1.14431179 0.74067789]
 [1.04543173 1.19187558 0.59973639]]
0.892315685749054
tensor([[0.6319, 1.1443, 0.7407],
        [1.0454, 1.1919, 0.5997]])
tensor(0.8923)
tensor(5.3539)
"""
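To confirm that BCEWithLogitsLoss is just a sigmoid followed by BCELoss, a quick check on the raw_output and target tensors above (a minimal sketch):
bce = torch.nn.BCELoss(reduction='sum')
bce_logits = torch.nn.BCEWithLogitsLoss(reduction='sum')
# the two results should agree up to floating-point error
print(torch.allclose(bce(torch.sigmoid(raw_output), target),
                     bce_logits(raw_output, target)))  # True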
5 nn.CrossEntropyLoss
torch.nn.CrossEntropyLoss(weight=None, ignore_index=-100, reduction='mean', label_smoothing=0.0)
The classic classification loss; the calculation formula is:
$\text { weight }[\text { class }]\left(-\log \left(\frac{\exp (x[\text { class }])}{\sum\limits_{j} \exp (x[j])}\right)\right)=\text { weight }[\text { class }]\left(-x[\text { class }]+\log \left(\sum\limits_{j} \exp (x[j])\right)\right)$
It is equivalent to mapping the output values through a softmax into probabilities that lie in $[0,1]$ and sum to $1$.
We want the loss for the correct class to be as small as possible, so for $\frac{\exp (x[\text {class}])}{\sum\limits_{j} \exp (x[j])}$ we take $-\log(\cdot)$, which maps $[0,1]$ to $[0,+\infty)$: the larger the probability of the correct class, the smaller the overall loss.
CrossEntropyLoss(x) in torch is equivalent to NLLLoss(LogSoftmax(x)).
It expects unnormalized scores (logits) as input. The input shapes are the same as for NLLLoss: $(N, C)$ for the input and $(N)$ for the target.
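A quick check of this equivalence (a minimal sketch; the names scores and labels are only illustrative):
scores = torch.randn(3, 5)        # (N, C) unnormalized scores
labels = torch.tensor([1, 0, 3])  # (N,) class indices
ce = torch.nn.CrossEntropyLoss()(scores, labels)
nll = torch.nn.NLLLoss()(torch.nn.LogSoftmax(dim=1)(scores), labels)
print(torch.allclose(ce, nll))    # True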
Example: computed per sample
target = torch.tensor([1,0,3])
output = torch.randn(3,5)
print(output)
"""
tensor([[-2.5728, -0.4581, -0.2017, 1.8813, 0.4544],
        [-0.7278, 0.6300, 0.6510, -1.7570, 1.1788],
        [-0.4660, 0.0410, 0.6876, 0.8966, 0.1446]])
"""
loss_fn = torch.nn.CrossEntropyLoss(reduction='mean')
loss = loss_fn(output, target)
print(loss)
"""
tensor(2.1940)
"""
loss_fn = torch.nn.CrossEntropyLoss(reduction='sum')
loss = loss_fn(output, target)
print(loss)
"""
tensor(6.5821)
"""
Example: handwritten version
target = torch.tensor([1,0,3])
output = torch.randn(3,5)
print(output)
"""
tensor([[-0.1168, 1.5417, 1.1748, -1.1856, -0.1233],
        [ 0.2074, -0.7376, -0.8934, 0.0899, 0.5337],
        [-0.5323, -0.2945, -0.1710, 1.5925, 1.3654]])
"""
# handwritten cross entropy: -x[class] + log(sum_j exp(x[j])) for each sample
result = np.array([0.0, 0.0, 0.0])
for ix in range(3):
    log_sum = 0.0
    for iy in range(5):
        if iy == target[ix]:
            result[ix] += -output[ix, iy].item()
        log_sum += np.exp(output[ix, iy].item())
    result[ix] += np.log(log_sum)
print(result)
print(np.mean(result))
loss_fn = torch.nn.CrossEntropyLoss(reduction='mean')
loss = loss_fn(output, target)
print(loss.item())
"""
[0.75984335 1.3853296 0.80614853]
0.9837738275527954
0.9837737679481506
"""
6 nn.NLLLoss
torch.nn.NLLLoss(weight=None, ignore_index=-100, reduction='mean')
Negative log likelihood loss, used to train an n-class classifier. For unbalanced datasets, a weight can be assigned to each class. The calculation formula is
$l_{n}=-w_{y_{n}} x_{n, y_{n}}, \quad w_{c}=\text {weight}[c] \cdot \mathbb{1}\{c \neq \text {ignore\_index}\}$
The expected input shapes are $(N, C)$ and $(N)$, where $N$ is the batch size and $C$ is the number of classes.
For each sample, it takes the negative of the input value at the target class index, then averages or sums over the batch. It is generally used together with a LogSoftmax so that the inputs are log probabilities.
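The weight argument mentioned above rescales each class's contribution; a minimal sketch (the weight values here are arbitrary, chosen only for illustration):
log_probs = torch.nn.LogSoftmax(dim=1)(torch.randn(3, 5))
labels = torch.tensor([1, 0, 3])
class_weight = torch.tensor([1.0, 2.0, 1.0, 0.5, 1.0])  # per-class weights
weighted_loss = torch.nn.NLLLoss(weight=class_weight)(log_probs, labels)
# with reduction='mean', the result is divided by the sum of the selected weights rather than by N
print(weighted_loss)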
Example: computed per sample
target = torch.tensor([1,0,3])
output = torch.randn(3,5)
print(output)
loss_fn = torch.nn.NLLLoss(reduction='mean')
loss = loss_fn(output, target)
print(loss)
loss_fn = torch.nn.NLLLoss(reduction='sum')
loss = loss_fn(output, target)
print(loss)
"""
tensor([[ 1.5083, 0.1846, -1.8400, -0.0068, -0.1943],
        [ 0.5303, -0.0350, -0.3924, 0.3026, 0.6159],
        [ 2.0047, -1.0653, 0.0718, -0.8632, -1.0695]])
tensor(0.0494)
tensor(0.1482)
"""
Clearly this is computed per sample, not element by element.
Example:
import torch
input = torch.randn(3, 3)
# note: dim=0 normalizes each column here; for a real (N, C) classifier the softmax is taken over dim=1
soft_input = torch.nn.Softmax(dim=0)
soft_input(input)
"""
tensor([[0.2603, 0.6519, 0.5811],
        [0.5248, 0.3026, 0.1783],
        [0.2148, 0.0455, 0.2406]])
"""
# take the log of the softmax result
torch.log(soft_input(input))
"""
tensor([[-1.3458, -0.4279, -0.5428],
        [-0.6447, -1.1952, -1.7243],
        [-1.5379, -3.0898, -1.4248]])
"""
Assuming the label is [0,1,2], the first row takes element 0, the second row takes element 1, and the third row takes element 2. Negating these values gives [1.3458, 1.1952, 1.4248], and averaging them yields the loss value:
(1.3458 + 1.1952 + 1.4248) / 3 ≈ 1.3219
loss = torch.nn.NLLLoss()
target = torch.tensor([0, 1, 2])
loss(torch.log(soft_input(input)), target)
"""
tensor(1.3219)
"""
So nn.NLLLoss takes the log(softmax) value at each sample's target index, negates it, and averages.
Reference: https://segmentfault.com/a/1190000038584083