Gradient Descent and Stochastic Gradient Descent
Gradient descent updates the weight in the direction in which the function value decreases fastest. I won't rederive the principle here; it is just a matter of taking a partial derivative, and anyone with a basic calculus background can follow it quickly. I keep updating my PyTorch learning notes on GitHub at: https://github.com/00paning/Pytorch_Learning
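For reference, here is the formula the code below implements (a quick sketch; the model is y_pred = x * w and the cost is the mean squared error over the N training points): cost(w) = (1/N) * Σ (x_i * w - y_i)^2, whose derivative with respect to w is (1/N) * Σ 2 * x_i * (x_i * w - y_i). Each epoch then applies the update w ← w - α * dcost/dw with learning rate α = 0.01.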
Here I show a simple Python implementation directly. Let's first look at the plot produced by the run:
The corresponding Python code is as follows:
import numpy as np
import matplotlib.pyplot as plt

x_data = [1.0, 2.0, 3.0]
y_data = [2.0, 4.0, 6.0]

w = 1.0  # initial guess for the weight

epoch_list = []
cost_list = []

def forward(x):
    # linear model: y_pred = x * w
    return x * w

def cost(xs, ys):
    # mean squared error over the whole training set
    cost = 0
    for x, y in zip(xs, ys):
        y_pred = forward(x)
        cost += (y_pred - y) ** 2
    return cost / len(xs)

def gradient(xs, ys):
    # d(cost)/dw = (1/N) * sum of 2 * x * (x * w - y)
    grad = 0
    for x, y in zip(xs, ys):
        grad += 2 * x * (x * w - y)
    return grad / len(xs)

print('Predict (before training)', 4, forward(4))
for epoch in range(100):
    cost_val = cost(x_data, y_data)
    grad_val = gradient(x_data, y_data)
    w -= 0.01 * grad_val  # learning rate 0.01
    epoch_list.append(epoch)
    cost_list.append(cost_val)
    print('Epoch:', epoch, 'w=', w, 'loss=', cost_val)
print('Predict (after training)', 4, forward(4))

plt.plot(epoch_list, cost_list)
plt.ylabel('Loss')
plt.xlabel('Epoch')
plt.show()
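To make the first iteration concrete, here is a quick by-hand check of what the loop computes at the starting point w = 1.0 (a small sketch that just repeats the same arithmetic):

w0 = 1.0
xs, ys = [1.0, 2.0, 3.0], [2.0, 4.0, 6.0]
cost0 = sum((x * w0 - y) ** 2 for x, y in zip(xs, ys)) / len(xs)     # (1 + 4 + 9) / 3 ≈ 4.667
grad0 = sum(2 * x * (x * w0 - y) for x, y in zip(xs, ys)) / len(xs)  # (-2 - 8 - 18) / 3 ≈ -9.333
w1 = w0 - 0.01 * grad0                                               # ≈ 1.0933

Since the true weight is 2.0, each step moves w a little closer and the loss curve decreases steadily.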
Next is stochastic gradient descent. Because of saddle points (points where the derivative is zero even though they are not a minimum), plain gradient descent can get stuck: once the gradient is zero, the weight update formula no longer changes w. In that situation stochastic gradient descent can be used. The change compared with gradient descent is that each update computes the loss, and therefore the gradient 2 * x * (x * w - y), from a single sample rather than from the whole dataset. This makes it slower to run, because the per-sample updates are sequential and the values for each x cannot be computed in parallel, but the noisy updates help it get out of saddle points and the accuracy improves. Let's first look at the plot again:
The corresponding Python code is as follows:
import numpy as np
import matplotlib.pyplot as plt

x_data = [1.0, 2.0, 3.0]
y_data = [2.0, 4.0, 6.0]

w = 1.0  # initial guess for the weight

epoch_list = []
cost_list = []

def forward(x):
    # linear model: y_pred = x * w
    return x * w

def loss(x, y):
    # squared error of a single sample
    y_pred = forward(x)
    return (y_pred - y) ** 2

def gradient(x, y):
    # d(loss)/dw for a single sample: 2 * x * (x * w - y)
    return 2 * x * (x * w - y)

print('Predict (before training)', 4, forward(4))
for epoch in range(100):
    for x, y in zip(x_data, y_data):
        grad = gradient(x, y)
        w = w - 0.01 * grad  # update w once per sample
        l = loss(x, y)
    # record the loss of the last sample so the two lists stay the same length
    epoch_list.append(epoch)
    cost_list.append(l)
    print('Epoch:', epoch, 'w=', w, 'loss=', l)
print('Predict (after training)', 4, forward(4))

plt.plot(epoch_list, cost_list)
plt.ylabel('Loss')
plt.xlabel('Epoch')
plt.show()
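To see the difference from the batch version, here is a small sketch of the first epoch by hand, using the per-sample gradient 2 * x * (x * w - y) from the code above; the weight is updated three times in one epoch, once per sample, instead of once:

w = 1.0
for x, y in zip([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]):
    w -= 0.01 * 2 * x * (x * w - y)  # per-sample update
    print(w)  # 1.02, then 1.0984, then ≈ 1.2607

After one epoch w is already at about 1.26, versus about 1.09 for gradient descent with the same learning rate, because the three smaller updates compound within the epoch.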