Gradient Descent
Drawback of exhaustive search: as the dimensionality of the weights grows, the number of points that must be searched explodes.
Divide-and-conquer idea: find a local optimum, then keep searching for a better local optimum within that region. Problem: it easily gets trapped in a local optimum that is not the global optimum.
Gradient Descent Algorithm
Gradient of the loss with respect to the weight: $\frac{\partial loss}{\partial \omega}$
Update (with learning rate $\alpha$):
$\omega = \omega - \alpha \frac{\partial loss}{\partial \omega}$
This is a greedy strategy: at every step, move in the direction in which the loss decreases fastest.
Saddle point: a point where the gradient is 0 but which is not a minimum, so gradient descent can stall there.
Taking the gradient of the loss function (MSE for the linear model $\hat{y} = \omega x$): $\frac{\partial loss}{\partial \omega} = \frac{1}{N} \sum_{n=1}^{N} 2x_{n}(\omega x_{n} - y_{n})$
Update
$\omega = \omega - \alpha \frac{1}{N} \sum_{n=1}^{N} \left( 2\omega x_{n}^{2} - 2x_{n}y_{n} \right)$
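As a quick sanity check, here is a worked first update step, assuming the same toy data as the code below ($x = [1.0, 2.0, 3.0]$, $y = [2.0, 4.0, 6.0]$), initial $\omega = 1$ and learning rate $\alpha = 0.01$:

$$\frac{\partial loss}{\partial \omega} = \frac{1}{3}\bigl[2\cdot 1\cdot(1-2) + 2\cdot 2\cdot(2-4) + 2\cdot 3\cdot(3-6)\bigr] = -\frac{28}{3} \approx -9.33, \qquad \omega \leftarrow 1 - 0.01\cdot(-9.33) \approx 1.093$$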
Code implementation
```python
# -*- coding: UTF-8 -*-
import numpy as np
import matplotlib.pyplot as plt

x_data = [1.0, 2.0, 3.0]
y_data = [2.0, 4.0, 6.0]
w = 1.0                              # initial guess for the weight

def forward(x):                      # linear model: y_hat = x * w
    return x * w

def loss(xs, ys):                    # mean squared error over the whole data set
    loss = 0
    for x, y in zip(xs, ys):
        y_pred = forward(x)
        loss += (y_pred - y) ** 2
    return loss / len(xs)

def grad(xs, ys):                    # averaged gradient: 1/N * sum(2*w*x^2 - 2*x*y)
    grad = 0
    for x, y in zip(xs, ys):
        grad += 2 * w * (x ** 2) - 2 * x * y
    return grad / len(xs)

print('before training', 4, forward(4))
epochs = []
losses = []
for epoch in range(100):
    epochs.append(epoch)
    loss_val = loss(xs=x_data, ys=y_data)
    losses.append(loss_val)
    grad_val = grad(xs=x_data, ys=y_data)
    w -= 0.01 * grad_val             # update with learning rate 0.01
    print('Epoch:', epoch, 'w=', w, 'loss=', loss_val)
print('after training', 4, forward(4))

plt.plot(epochs, losses)             # plot the loss curve over the epochs
plt.xlabel('EPOCH')
plt.ylabel('LOSS')
plt.show()
```
Weighted average for smoothing the recorded loss curve (exponential moving average): $C_0^{\prime} = C_0; \quad C_i^{\prime} = \beta C_i + (1 - \beta) C_{i-1}^{\prime}$
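A minimal sketch of this weighted average applied to the `losses` list recorded by the code above; the helper name `smooth` and the value $\beta = 0.3$ are assumptions for illustration, not from the original notes.

```python
def smooth(values, beta=0.3):
    # C'_0 = C_0
    smoothed = [values[0]]
    # C'_i = beta * C_i + (1 - beta) * C'_{i-1}
    for c in values[1:]:
        smoothed.append(beta * c + (1 - beta) * smoothed[-1])
    return smoothed

# e.g. plt.plot(epochs, smooth(losses)) plots a less noisy loss curve
```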
Stochastic Gradient Descent (SGD)
Randomly pick one sample, take the derivative of that single sample's loss with respect to the weight, and then update:
$\frac{\partial loss(\omega)}{\partial \omega} = 2\omega x_{n}^{2} - 2x_{n}y_{n}$
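The original SGD listing did not survive extraction, so below is a minimal sketch of what such a loop looks like on the same toy data: it mirrors the batch version above, but the loss, gradient and weight update are computed one sample at a time.

```python
# -*- coding: UTF-8 -*-
x_data = [1.0, 2.0, 3.0]
y_data = [2.0, 4.0, 6.0]
w = 1.0

def forward(x):            # linear model: y_hat = x * w
    return x * w

def loss(x, y):            # loss of a single sample
    return (forward(x) - y) ** 2

def grad(x, y):            # d(loss)/dw for one sample: 2*x*(w*x - y)
    return 2 * x * (w * x - y)

print('before training', 4, forward(4))
for epoch in range(100):
    for x, y in zip(x_data, y_data):
        w -= 0.01 * grad(x, y)       # update right after every sample
        l = loss(x, y)
    print('Epoch:', epoch, 'w=', w, 'loss=', l)
print('after training', 4, forward(4))
```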
Gradient descent: the per-sample gradients are independent, so computing the gradient can be parallelized; it is efficient, but its learning performance is weaker.
Stochastic gradient descent: each update depends on the previous one, so the gradient computation cannot be parallelized; the learning performance is better, but the efficiency is lower.
Solution: Batch / Mini-batch stochastic gradient descent, a compromise between the two (see the sketch below).
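A minimal sketch of the mini-batch idea on the same toy data; `batch_size = 2`, the shuffling scheme and the helper `batch_grad` are assumptions for illustration, not from the original notes.

```python
import random

x_data = [1.0, 2.0, 3.0]
y_data = [2.0, 4.0, 6.0]
w = 1.0
batch_size = 2

def batch_grad(xs, ys):
    # averaged gradient over one mini-batch: 1/|B| * sum(2*x*(w*x - y))
    return sum(2 * x * (w * x - y) for x, y in zip(xs, ys)) / len(xs)

for epoch in range(100):
    order = list(range(len(x_data)))
    random.shuffle(order)                      # visit samples in random order
    for start in range(0, len(order), batch_size):
        batch = order[start:start + batch_size]
        xs = [x_data[i] for i in batch]
        ys = [y_data[i] for i in batch]
        w -= 0.01 * batch_grad(xs, ys)         # one update per mini-batch
```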