
Gradient Descent

Drawback of exhaustive search: as the dimensionality of the weights grows, the number of points to search explodes (with $k$ candidate values per weight and $d$ weights, roughly $k^d$ evaluations).
Divide-and-conquer idea: find a locally best region, then keep searching within it for a better local optimum. Problem: this easily gets stuck in a local optimum that is not the global optimum.

Gradient Descent Algorithm

Gradient of the loss with respect to the weight: $\frac{\partial loss}{\partial \omega}$

Update:
$\omega = \omega - \alpha \frac{\partial loss}{\partial \omega}$
This is a greedy strategy: each step moves $\omega$ in the direction of steepest descent, scaled by the learning rate $\alpha$.

Saddle point: a point where the gradient is 0, so gradient descent makes no progress there even though it is not a minimum.

Taking the gradient of the loss function
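
For the linear model $\hat{y} = \omega x$ and the mean-squared-error loss used in the code below, the gradient with respect to $\omega$ is

$loss(\omega) = \frac{1}{N} \sum_{n=1}^{N} (\omega x_{n} - y_{n})^2$

$\frac{\partial loss}{\partial \omega} = \frac{1}{N} \sum_{n=1}^{N} (2 \omega x_{n}^2 - 2 x_{n} y_{n})$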

Update:
$\omega = \omega - \alpha \frac{1}{N} \sum_{n=1}^{N} (2 \omega x_{n}^2 - 2 x_{n} y_{n})$
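
As a concrete first step with the toy data used in the code below ($x = [1, 2, 3]$, $y = [2, 4, 6]$, initial $\omega = 1$, $\alpha = 0.01$): the averaged gradient is $\frac{1}{3}[(-2) + (-8) + (-18)] = -\frac{28}{3} \approx -9.33$, so the update gives $\omega = 1 - 0.01 \times (-9.33) \approx 1.093$.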

Code Implementation

# -*- coding: UTF-8 -*-
import numpy as np
import matplotlib.pyplot as plt

x_data = [1.0, 2.0, 3.0]
y_data = [2.0, 4.0, 6.0]

w = 1.0  # initial guess for the weight

def forward(x):
    # linear model: y_hat = x * w
    return x * w

def loss(xs, ys):
    # mean squared error over the whole dataset
    loss = 0
    for x, y in zip(xs, ys):
        y_pred = forward(x)
        loss += (y_pred - y) ** 2
    return loss / len(xs)

def grad(xs, ys):
    # gradient of the MSE loss w.r.t. w, averaged over all samples
    grad = 0
    for x, y in zip(xs, ys):
        grad += 2 * w * (x ** 2) - 2 * x * y
    return grad / len(xs)


print('before training', 4, forward(4))

epochs = []
losses = []
for epoch in range(100):
    epochs.append(epoch)
    loss_val = loss(xs=x_data, ys=y_data)
    losses.append(loss_val)
    grad_val = grad(xs=x_data, ys=y_data)
    w -= 0.01 * grad_val  # learning rate alpha = 0.01
    print('Epoch:', epoch, 'w=', w, 'loss=', loss_val)
print('after training', 4, forward(4))

plt.plot(epochs, losses)
plt.xlabel('EPOCH')
plt.ylabel('LOSS')
plt.show()

Weighted average (an exponential moving average, often used to smooth a noisy loss curve): $C_0' = C_0;\ C_i' = \beta C_i + (1 - \beta) C_{i-1}'$
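
A minimal sketch of this smoothing, assuming the epochs and losses lists produced by the gradient-descent code above and a hypothetical smoothing factor beta:

# Exponential weighted average of the recorded losses (beta is a hypothetical choice).
beta = 0.9
smoothed = [losses[0]]  # C'_0 = C_0
for c in losses[1:]:
    # C'_i = beta * C_i + (1 - beta) * C'_{i-1}
    smoothed.append(beta * c + (1 - beta) * smoothed[-1])
plt.plot(epochs, smoothed)  # plot the smoothed curve
plt.show()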

Stochastic Gradient Descent (SGD)

Randomly pick one sample, compute its loss, take the derivative with respect to the weight, and update:
$\frac{\partial loss(\omega)}{\partial \omega} = 2 \omega x_{n}^2 - 2 x_{n} y_{n}$

# -*- coding: UTF-8 -*-
import numpy as np
import matplotlib.pyplot as plt

x_data = [1.0, 2.0, 3.0]
y_data = [2.0, 4.0, 6.0]

w = 1.0  # initial guess for the weight

def forward(x):
    # linear model: y_hat = x * w
    return x * w

def loss(x, y):
    # squared error of a single sample
    return (forward(x) - y) ** 2

def sgd(x, y):
    # gradient of the single-sample loss w.r.t. w: 2x(wx - y)
    return 2 * x * (x * w - y)

for epoch in range(1000):
    for x, y in zip(x_data, y_data):
        grad = sgd(x, y)
        w -= 0.01 * grad  # update w after every single sample
        print('\t grad', x, y, grad)
        l = loss(x, y)
    print('epoch:', epoch, 'w=', w, 'loss=', l)
print('after training', 4, forward(4))

Gradient descent: the gradient can be computed in parallel across samples; its learning performance is weaker, but it is more efficient.
Stochastic gradient descent: gradient computations cannot be parallelized, since each update depends on the previous one; its learning performance is better, but it is less efficient.

Compromise: Batch / Mini-batch stochastic gradient descent, sketched below.
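
A minimal sketch of the mini-batch idea, reusing the same toy data; batch_size = 2 and the 0.01 learning rate are hypothetical choices. The gradient is averaged over a small random batch instead of one sample or the full dataset.

# -*- coding: UTF-8 -*-
import random

x_data = [1.0, 2.0, 3.0]
y_data = [2.0, 4.0, 6.0]

w = 1.0  # initial guess for the weight
batch_size = 2  # hypothetical mini-batch size

for epoch in range(100):
    # shuffle the sample indices each epoch
    indices = list(range(len(x_data)))
    random.shuffle(indices)
    for start in range(0, len(indices), batch_size):
        batch = indices[start:start + batch_size]
        # average the per-sample gradients 2x(wx - y) over the mini-batch
        grad = sum(2 * x_data[i] * (x_data[i] * w - y_data[i]) for i in batch) / len(batch)
        w -= 0.01 * grad
print('after training', 4, 4 * w)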
