PyTorch learning notes (linear regression with gradient descent)

A simple example of the gradient descent method in machine learning

A summary based on the official PyTorch learning materials

y = [0.5, 14.0, 15.0, 28.0, 11.0, 8.0, 3.0, -4.0, 6.0, 13.0, 21.0]
x = [35.7, 55.9, 58.2, 81.9, 56.3, 48.9, 33.9, 21.8, 48.4, 60.4, 68.4]

We now have eleven points with the x and y coordinates listed above; in the original post they are shown as a scatter plot.

What we have to do is find a and b so that the straight line y = a*x + b fits these eleven points as closely as possible.
That is, we want to estimate a and b in y = a*x + b.
Starting from some initial a and b, we can compute a predicted value y' for each input x.
There will be a gap between this prediction y' and the actual value y; we measure this gap with a loss function:
L = (y - y')**2
This loss is a quadratic function of the prediction error, so its graph is a parabola (the original figure is not reproduced here; a small sketch follows).
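Since the figure is missing, here is a minimal matplotlib sketch (an added illustration, not part of the original post) that plots L for a single sample point (y = 0.5) as the prediction y' varies; the plotting range of +/- 10 around y is an arbitrary choice:

import numpy as np
from matplotlib import pyplot as plt

y = 0.5                                    # actual value of one sample point
y_pred = np.linspace(y - 10, y + 10, 200)  # candidate predictions around y
L = (y - y_pred) ** 2                      # squared-error loss for each candidate
plt.plot(y_pred, L)                        # the curve is a parabola with its minimum at y
plt.xlabel("y'")
plt.ylabel('L')
plt.show()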

The gradient is simply the derivative, here the partial derivatives of the loss with respect to a and b.
Written in terms of a and b, the loss is

L = (y - (ax+b))**2

There are two parameters in this loss, so we can take two partial derivatives (written out below, since the original figure is not reproduced).
The partial derivative of L with respect to a is called the gradient of a (Ga), and the partial derivative of L with respect to b is called the gradient of b (Gb).
Gradient descent updates a and b according to the values of these gradients.
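For reference, the two partial derivatives for a single point (x, y), obtained with the chain rule, are:

Ga = dL/da = -2 * x * (y - (a*x + b))
Gb = dL/db = -2 * (y - (a*x + b))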

So how do we update the values of a and b using the gradient?

a = a - Ga
b = b - Gb 

A simple subtraction updates a and b in the direction opposite to the gradient: when the gradient is positive, we decrease the parameter to reduce the loss, and when it is negative, we increase it.
How large should each update step be? The update should be gradual: when the gradient is large, a raw subtraction would overshoot, so we scale the step with a learning rate.

a = a - Ga * learning rate
b = b - Gb * learning rate
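As a concrete illustration, here is a minimal pure-Python sketch of a single update step on one sample point; the starting values a = 1, b = 0 and the learning rate 1e-4 are arbitrary choices for this example:

x, y = 35.7, 0.5               # one sample point from the data above
a, b = 1.0, 0.0                # initial guess for the parameters
learning_rate = 1e-4           # small step size chosen for this illustration

y_pred = a * x + b
loss_before = (y - y_pred) ** 2

Ga = -2 * x * (y - y_pred)     # partial derivative of L with respect to a
Gb = -2 * (y - y_pred)         # partial derivative of L with respect to b

a = a - learning_rate * Ga
b = b - learning_rate * Gb

loss_after = (y - (a * x + b)) ** 2
print(loss_before, loss_after)  # the loss decreases after the step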

The learning rate controls the size of each update step. At every iteration we update the parameters using the gradient, and after a few thousand iterations we obtain a line that fits these eleven points (the plot is produced by the code below).

The PyTorch code is as follows:

import torch
import numpy
from matplotlib import pyplot as plt

t_c = [0.5, 14.0, 15.0, 28.0, 11.0, 8.0, 3.0, -4.0, 6.0, 13.0, 21.0]
t_u = [35.7, 55.9, 58.2, 81.9, 56.3, 48.9, 33.9, 21.8, 48.4, 60.4, 68.4]

t_c = torch.tensor(t_c)
t_u = torch.tensor(t_u)

# t_c = w * t_u + b
"""
predicted temperatures t_p/ actual measurements t_c
loss :
1. |t_p - t_c|
2. (t_p - t_c)^2
"""


def model(t_u, w, b):
    return w * t_u + b


def loss_fn(t_p, t_c):
    squared_diffs = (t_p - t_c)**2
    return squared_diffs.mean()


w = torch.ones(1)
b = torch.zeros(1)

t_p = model(t_u, w, b)

loss = loss_fn(t_p, t_c)  # tensor(1763.8846)

delta = 0.1
loss_rate_of_change_w = \
    (loss_fn(model(t_u, w+delta, b), t_c) -
     loss_fn(model(t_u, w-delta, b), t_c)) / (2.0 * delta)
"""

this code is saying that in a small neighborhood of the current values of w and b, 
a unit increase in w leads to do some change in the loss.If the change is negative ,
you need to increase w to minimize the loss, whereas if the change is positive,you 
need to decrease w.
"""
"""
you should scale the rate of change by a typically small factor, this scaling factor
has many names, the one used in machine learning is learning_rate. 
"""

learning_rate = 1e-2

w = w - learning_rate * loss_rate_of_change_w

# we can do the same with b:
loss_rate_of_change_b = \
    (loss_fn(model(t_u, w, b+delta), t_c) -
     loss_fn(model(t_u, w, b-delta), t_c)) / (2.0 * delta)

b = b - learning_rate * loss_rate_of_change_b


# The gradient can also be derived analytically instead of being estimated with
# finite differences; loss_fn and model are the same functions defined above.
def dloss_fn(t_p, t_c):
    dsq_diffs = 2 * (t_p - t_c)
    return dsq_diffs


def dmodel_dw(t_u, w, b):
    return t_u


def dmodel_db(t_u, w, b):
    return 1.0


def grad_fn(t_u, t_c, t_p, w, b):
    dloss_dw = dloss_fn(t_p, t_c) * dmodel_dw(t_u, w, b)
    dloss_db = dloss_fn(t_p, t_c) * dmodel_db(t_u, w, b)
    return torch.stack([dloss_dw.mean(), dloss_db.mean()])
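

# Sanity check (an added sketch, not part of the original code): because the loss
# is quadratic in w, the analytic gradient from grad_fn should agree closely with
# the finite-difference estimate used earlier. With the current w and b:
check_p = model(t_u, w, b)
numeric_dw = (loss_fn(model(t_u, w + delta, b), t_c) -
              loss_fn(model(t_u, w - delta, b), t_c)) / (2.0 * delta)
print(grad_fn(t_u, t_c, check_p, w, b)[0], numeric_dw)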


def training_loop(n_epochs, learning_rate, params, t_u, t_c):
    for epoch in range(1, n_epochs+1):
        w, b = params
        t_p = model(t_u, w, b)
        loss = loss_fn(t_p, t_c)
        grad = grad_fn(t_u, t_c, t_p, w, b)
        params = params - learning_rate * grad
        print('params : ')
        print(params)
        print('grads : ')
        print(grad)
        print('Epoch %d, Loss %f' % (epoch, float(loss)))
    return params


# a different normalization of the input gives essentially the same result:
# t_un = t_u/t_u.sum() * 10  # Epoch 5000, Loss 2.937128

t_un = 0.1 * t_u  # Epoch 5000, Loss 2.927648
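
# Note (an added comment): without this rescaling the gradient with respect to w
# is roughly a factor of t_u (a few tens) larger than the gradient with respect
# to b, so a single learning rate of 1e-2 overshoots for w and the loss typically
# blows up. You can check this with the un-normalized input, e.g.:
# training_loop(n_epochs=100, learning_rate=1e-2,
#               params=torch.tensor([1.0, 0.0]), t_u=t_u, t_c=t_c)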
params = training_loop(
    n_epochs=5000,
    learning_rate=1e-2,
    params=torch.tensor([1.0, 0.0]),
    t_u=t_un,
    t_c=t_c)

t_p = model(t_un, *params)
print(t_p.detach())
fig = plt.figure(dpi=70)
plt.xlabel('t_u')
plt.ylabel('t_c')
plt.plot(t_u.numpy(), t_p.detach().numpy())  # fitted line
plt.plot(t_u.numpy(), t_c.numpy(), 'o')      # original data points
plt.show()
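
# Recovering the fit in the original units (an added note): since t_un = 0.1 * t_u,
# the fitted line for the raw input is t_c ~ (0.1 * params[0]) * t_u + params[1].
print('slope in original units:', float(0.1 * params[0]))
print('intercept:', float(params[1]))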
