## A simple example of the gradient descent method in machine learning

A summary of the official PyTorch learning materials.

`y = [0.5, 14.0, 15.0, 28.0, 11.0, 8.0, 3.0, -4.0, 6.0, 13.0, 21.0]`

`x = [35.7, 55.9, 58.2, 81.9, 56.3, 48.9, 33.9, 21.8, 48.4, 60.4, 68.4]`

We now have eleven points, whose horizontal (x) and vertical (y) coordinates are listed above. Plotted, they look like the figure below:

What we have to do is estimate a and b so that the straight line they define fits these eleven points as closely as possible.

That is, we estimate a and b in $y = ax + b$.

Starting from some initial values of a and b, we feed in x and get a predicted value y'.

At first there will be a large gap between this y' and the actual value y. We measure this gap with a loss function:

$$L = (y - y')^2$$
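To make the loss concrete, here is a minimal plain-Python sketch, assuming the initial values a = 1 and b = 0 that the PyTorch code later in this post uses. It computes the squared error for each point and then averages them, which is the mean squared error that `loss_fn` computes. The helper names `predict` and `squared_error` are illustrative, not from the original.

```python
# Data from the example above
xs = [35.7, 55.9, 58.2, 81.9, 56.3, 48.9, 33.9, 21.8, 48.4, 60.4, 68.4]
ys = [0.5, 14.0, 15.0, 28.0, 11.0, 8.0, 3.0, -4.0, 6.0, 13.0, 21.0]

def predict(a, b, x):
    """Hypothetical helper: the model y' = a*x + b."""
    return a * x + b

def squared_error(y, y_pred):
    """Hypothetical helper: loss for a single point, L = (y - y')**2."""
    return (y - y_pred) ** 2

# Initial guess (assumed, matching w = 1, b = 0 in the PyTorch code)
a, b = 1.0, 0.0
per_point = [squared_error(y, predict(a, b, x)) for x, y in zip(xs, ys)]

# Average over all points, as loss_fn does with .mean()
mean_loss = sum(per_point) / len(per_point)
print(mean_loss)
```

The averaged value should agree with the `tensor(1763.8846)` noted in the PyTorch code, up to float32 rounding.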

This loss is a quadratic function of the prediction, so its graph is a parabola, as shown in the figure below:

The gradient is the derivative of the loss, taken here with respect to the parameters a and b.

Substituting y' = ax + b, the loss can be written as

$$L = \bigl(y - (ax + b)\bigr)^2$$

This loss has two variables, a and b, so we can take two partial derivatives, as shown in the figure below:

The partial derivative of L with respect to a is called the gradient of a ($G_a$), and the partial derivative of L with respect to b is called the gradient of b ($G_b$).
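Applying the chain rule to $L = \bigl(y - (ax+b)\bigr)^2$ gives the two gradients explicitly. For a single point:

$$G_a = \frac{\partial L}{\partial a} = -2x\,\bigl(y - (ax+b)\bigr) = 2x\,(ax + b - y)$$

$$G_b = \frac{\partial L}{\partial b} = -2\,\bigl(y - (ax+b)\bigr) = 2\,(ax + b - y)$$

When the loss is averaged over all $N$ points (the mean squared error used by `loss_fn` in the PyTorch code), each gradient is the average of these per-point terms, for example $G_a = \frac{1}{N}\sum_{i=1}^{N} 2x_i\,(ax_i + b - y_i)$. This is what `grad_fn` computes by multiplying `dloss_fn` by `dmodel_dw` (or `dmodel_db`) and taking `.mean()`.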

Gradient descent updates a and b according to the values of these gradients.
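The PyTorch code obtains a rate of change in two ways: a central finite difference (`loss_rate_of_change_w`, with `delta = 0.1`) and the analytic derivatives (`grad_fn`). The plain-Python sketch below computes both for all eleven points at a = 1, b = 0. The helper names `mse`, `analytic_grads` and `finite_diff_grads` are illustrative assumptions. Because the loss is quadratic, the central difference matches the analytic gradient up to floating-point rounding.

```python
xs = [35.7, 55.9, 58.2, 81.9, 56.3, 48.9, 33.9, 21.8, 48.4, 60.4, 68.4]
ys = [0.5, 14.0, 15.0, 28.0, 11.0, 8.0, 3.0, -4.0, 6.0, 13.0, 21.0]

def mse(a, b):
    """Mean squared error of the line y' = a*x + b over all points."""
    return sum((y - (a * x + b)) ** 2 for x, y in zip(xs, ys)) / len(xs)

def analytic_grads(a, b):
    """Averaged G_a = 2x(ax + b - y) and G_b = 2(ax + b - y)."""
    n = len(xs)
    ga = sum(2 * x * (a * x + b - y) for x, y in zip(xs, ys)) / n
    gb = sum(2 * (a * x + b - y) for x, y in zip(xs, ys)) / n
    return ga, gb

def finite_diff_grads(a, b, delta=0.1):
    """Central differences, the same idea as loss_rate_of_change_w / _b."""
    ga = (mse(a + delta, b) - mse(a - delta, b)) / (2 * delta)
    gb = (mse(a, b + delta) - mse(a, b - delta)) / (2 * delta)
    return ga, gb

ga, gb = analytic_grads(1.0, 0.0)
fa, fb = finite_diff_grads(1.0, 0.0)
print("analytic:", ga, gb)
print("finite difference:", fa, fb)
```

At a = 1, b = 0 both gradients are positive (about 4517.3 for a and 82.6 for b), so gradient descent will decrease both parameters on the first step.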

So how do we update a and b from their gradients?

$$a = a - G_a, \qquad b = b - G_b$$

A simple subtraction is all we need. When the gradient is positive, increasing the weight would increase the loss, so we decrease the weight; when the gradient is negative, we increase the weight instead.
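The sign rule can be checked numerically. This sketch keeps b fixed at 0 and evaluates $G_a$ at two assumed values of a, one on each side of the best slope. It then moves a against the gradient by a deliberately tiny step; why this scaling is needed is the subject of the next paragraph. In both cases the loss goes down. The names `mse`, `grad_a` and `small_step` are illustrative.

```python
xs = [35.7, 55.9, 58.2, 81.9, 56.3, 48.9, 33.9, 21.8, 48.4, 60.4, 68.4]
ys = [0.5, 14.0, 15.0, 28.0, 11.0, 8.0, 3.0, -4.0, 6.0, 13.0, 21.0]

def mse(a, b):
    return sum((y - (a * x + b)) ** 2 for x, y in zip(xs, ys)) / len(xs)

def grad_a(a, b):
    """Averaged partial derivative of the loss with respect to a."""
    return sum(2 * x * (a * x + b - y) for x, y in zip(xs, ys)) / len(xs)

b = 0.0
small_step = 1e-5  # assumed tiny scale factor so the step cannot overshoot
results = {}
for a in (1.0, 0.1):  # a = 1.0 gives G_a > 0, a = 0.1 gives G_a < 0
    ga = grad_a(a, b)
    new_a = a - small_step * ga  # subtracting moves a against the gradient's sign
    results[a] = (ga, mse(a, b), mse(new_a, b))
    print(f"a={a}: G_a={ga:.2f}, loss {mse(a, b):.2f} -> {mse(new_a, b):.2f}")
```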

How large should each update be? The update process should be gradual: when the gradient is large, subtracting it directly can overshoot the minimum, so we need a way to control the update speed. This is where the learning rate comes in.

$$a = a - \eta\, G_a, \qquad b = b - \eta\, G_b$$

where $\eta$ is the learning rate.
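The effect of the learning rate can be seen by running a few updates on the original, unscaled x values with two different rates. In this sketch, which assumes 10 steps from a = 1, b = 0, a learning rate of 1e-2 makes every step overshoot, so the loss grows explosively. A learning rate of 1e-4 makes the loss shrink at every step. The helper names `mse`, `gradients` and `run` are illustrative.

```python
xs = [35.7, 55.9, 58.2, 81.9, 56.3, 48.9, 33.9, 21.8, 48.4, 60.4, 68.4]
ys = [0.5, 14.0, 15.0, 28.0, 11.0, 8.0, 3.0, -4.0, 6.0, 13.0, 21.0]

def mse(a, b):
    return sum((y - (a * x + b)) ** 2 for x, y in zip(xs, ys)) / len(xs)

def gradients(a, b):
    n = len(xs)
    ga = sum(2 * x * (a * x + b - y) for x, y in zip(xs, ys)) / n
    gb = sum(2 * (a * x + b - y) for x, y in zip(xs, ys)) / n
    return ga, gb

def run(learning_rate, steps=10, a=1.0, b=0.0):
    """Apply a = a - G_a * lr, b = b - G_b * lr and record the loss."""
    history = [mse(a, b)]
    for _ in range(steps):
        ga, gb = gradients(a, b)
        a = a - ga * learning_rate
        b = b - gb * learning_rate
        history.append(mse(a, b))
    return history

too_large = run(1e-2)
small = run(1e-4)
print("lr=1e-2, loss after 10 steps:", too_large[-1])
print("lr=1e-4, loss after 10 steps:", small[-1])
```

The PyTorch code avoids the blow-up by scaling the inputs (`t_un = 0.1 * t_u`), which lets it keep `learning_rate=1e-2`.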

The learning rate controls the size of each update. In each iteration we update the weights using their gradients, and after thousands of iterations we obtain a line that fits these eleven points, as shown in the figure below.
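The whole procedure fits in a short loop. This dependency-free sketch mirrors the settings of `training_loop` in the PyTorch code: inputs scaled by 0.1, 5000 epochs, learning rate 1e-2, starting from a = 1, b = 0. It is an illustrative re-implementation, not the original code; the names `mse`, `gradients` and `train` are hypothetical.

```python
xs = [35.7, 55.9, 58.2, 81.9, 56.3, 48.9, 33.9, 21.8, 48.4, 60.4, 68.4]
ys = [0.5, 14.0, 15.0, 28.0, 11.0, 8.0, 3.0, -4.0, 6.0, 13.0, 21.0]

def mse(a, b, xs, ys):
    return sum((y - (a * x + b)) ** 2 for x, y in zip(xs, ys)) / len(xs)

def gradients(a, b, xs, ys):
    """Averaged G_a and G_b for the line y' = a*x + b."""
    n = len(xs)
    ga = sum(2 * x * (a * x + b - y) for x, y in zip(xs, ys)) / n
    gb = sum(2 * (a * x + b - y) for x, y in zip(xs, ys)) / n
    return ga, gb

def train(xs, ys, n_epochs=5000, learning_rate=1e-2, a=1.0, b=0.0):
    """Plain gradient descent: one update of a and b per epoch."""
    for _ in range(n_epochs):
        ga, gb = gradients(a, b, xs, ys)
        a -= learning_rate * ga
        b -= learning_rate * gb
    return a, b

# Scale the inputs by 0.1, as the PyTorch code does with t_un = 0.1 * t_u
xs_scaled = [0.1 * x for x in xs]
a, b = train(xs_scaled, ys)
final_loss = mse(a, b, xs_scaled, ys)
print(f"a = {a:.4f}, b = {b:.4f}, loss = {final_loss:.6f}")
```

Because a was fitted on the scaled inputs, the slope for the original x is 0.1 × a. The final loss should land close to the `Epoch 5000, Loss 2.927648` noted in the comments of the PyTorch code.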

The PyTorch code is as follows:

```python
import torch
from matplotlib import pyplot as plt

# Data: t_c are the targets (y), t_u are the inputs (x)
t_c = [0.5, 14.0, 15.0, 28.0, 11.0, 8.0, 3.0, -4.0, 6.0, 13.0, 21.0]
t_u = [35.7, 55.9, 58.2, 81.9, 56.3, 48.9, 33.9, 21.8, 48.4, 60.4, 68.4]
t_c = torch.tensor(t_c)
t_u = torch.tensor(t_u)

# Model: t_c = w * t_u + b
# t_p are the predicted temperatures, t_c the actual measurements.
# Candidate losses: 1. |t_p - t_c|   2. (t_p - t_c)^2 (used here, averaged)

def model(t_u, w, b):
    return w * t_u + b

def loss_fn(t_p, t_c):
    squared_diffs = (t_p - t_c) ** 2
    return squared_diffs.mean()

w = torch.ones(1)
b = torch.zeros(1)
t_p = model(t_u, w, b)
loss = loss_fn(t_p, t_c)  # tensor(1763.8846)

# One update step using a central finite difference
delta = 0.1
loss_rate_of_change_w = \
    (loss_fn(model(t_u, w + delta, b), t_c) -
     loss_fn(model(t_u, w - delta, b), t_c)) / (2.0 * delta)
# In a small neighborhood of the current w and b, a unit increase in w
# changes the loss by roughly this amount. If the change is negative,
# increase w to reduce the loss; if it is positive, decrease w.

# Scale the rate of change by a typically small factor. This factor has
# many names; the one used in machine learning is learning_rate.
learning_rate = 1e-2
w = w - learning_rate * loss_rate_of_change_w

# We can do the same with b:
loss_rate_of_change_b = \
    (loss_fn(model(t_u, w, b + delta), t_c) -
     loss_fn(model(t_u, w, b - delta), t_c)) / (2.0 * delta)
b = b - learning_rate * loss_rate_of_change_b

# Analytic gradients via the chain rule
def dloss_fn(t_p, t_c):
    # Derivative of (t_p - t_c)^2; the 1/N of the mean is applied in grad_fn
    dsq_diffs = 2 * (t_p - t_c)
    return dsq_diffs

def dmodel_dw(t_u, w, b):
    return t_u

def dmodel_db(t_u, w, b):
    return 1.0

def grad_fn(t_u, t_c, t_p, w, b):
    dloss_dw = dloss_fn(t_p, t_c) * dmodel_dw(t_u, w, b)
    dloss_db = dloss_fn(t_p, t_c) * dmodel_db(t_u, w, b)
    return torch.stack([dloss_dw.mean(), dloss_db.mean()])

def training_loop(n_epochs, learning_rate, params, t_u, t_c):
    for epoch in range(1, n_epochs + 1):
        w, b = params
        t_p = model(t_u, w, b)
        loss = loss_fn(t_p, t_c)
        grad = grad_fn(t_u, t_c, t_p, w, b)
        params = params - learning_rate * grad
        if epoch % 500 == 0:  # print progress occasionally, not every epoch
            print('params :', params)
            print('grads :', grad)
            print('Epoch %d, Loss %f' % (epoch, float(loss)))
    return params

# Normalizing the input gives a similar result:
# t_un = t_u / t_u.sum() * 10   # Epoch 5000, Loss 2.937128
t_un = 0.1 * t_u                # Epoch 5000, Loss 2.927648

params = training_loop(
    n_epochs=5000,
    learning_rate=1e-2,
    params=torch.tensor([1.0, 0.0]),
    t_u=t_un,
    t_c=t_c)

t_p = model(t_un, *params)
print(t_p.detach())

fig = plt.figure(dpi=70)
plt.xlabel('x_label')
plt.ylabel('y_label')
plt.plot(t_u.numpy(), t_p.detach().numpy())  # fitted line
plt.plot(t_u.numpy(), t_c.numpy(), 'o')      # data points
plt.show()
```