Gradient descent algorithm
Repeat until convergence
$w = w - \alpha \frac{\partial}{\partial w} J(w,b)$
α (the Greek letter alpha) is the learning rate. It controls the size of the step taken when updating the model parameters w and b.
$\frac{\partial}{\partial w} J(w,b)$ is the derivative term: the partial derivative of the cost J with respect to w.
$b = b - \alpha \frac{\partial}{\partial b} J(w,b)$
Simultaneously update w and b:
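$\text{tmp\_w} = w - \alpha \frac{\partial}{\partial w} J(w,b)$
$\text{tmp\_b} = b - \alpha \frac{\partial}{\partial b} J(w,b)$
$w = \text{tmp\_w}$
$b = \text{tmp\_b}$
Both derivatives are evaluated at the old values of w and b; only then are the parameters overwritten.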
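The simultaneous update can be sketched in Python as follows. This is only an illustration: the notes do not fix a particular cost function, so the quadratic cost J(w, b) = (w - 3)^2 + (b + 1)^2, the function names `cost_gradient` and `gradient_descent`, and the fixed iteration count (standing in for "repeat until convergence") are all assumptions made for the example.

```python
def cost_gradient(w, b):
    """Partial derivatives of a hypothetical cost J(w, b) = (w - 3)**2 + (b + 1)**2."""
    dj_dw = 2 * (w - 3)
    dj_db = 2 * (b + 1)
    return dj_dw, dj_db


def gradient_descent(w, b, alpha, num_iters):
    """Run num_iters simultaneous updates of w and b (assumed stand-in for a convergence test)."""
    for _ in range(num_iters):
        # Evaluate both derivatives at the current (w, b) before changing either one.
        dj_dw, dj_db = cost_gradient(w, b)
        tmp_w = w - alpha * dj_dw
        tmp_b = b - alpha * dj_db
        # Only now overwrite the parameters, so neither update sees a half-updated state.
        w, b = tmp_w, tmp_b
    return w, b


w, b = gradient_descent(w=0.0, b=0.0, alpha=0.1, num_iters=200)
print(w, b)  # should be close to the minimum of the assumed cost, (3, -1)
```

Computing `tmp_w` and `tmp_b` before assigning either one is what makes the update simultaneous; updating w first and then using the new w inside the derivative for b would be a different procedure.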
Derivatives are part of calculus
Consider a cost function of a single parameter, J(w), with the goal $\min_{w} J(w)$:
$w = w - \alpha \frac{\partial}{\partial w} J(w)$
If $\frac{d}{dw} J(w) > 0$: $w = w - \alpha \cdot (\text{positive number})$
Subtracting a positive number makes w smaller.
If $\frac{d}{dw} J(w) < 0$: $w = w - \alpha \cdot (\text{negative number})$
Subtracting a negative number makes w larger.
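The effect of the derivative's sign can be checked with a small sketch. The cost J(w) = (w - 2)^2, the starting points, and the helper names `dj_dw` and `step` are assumptions chosen for illustration; they do not come from the notes.

```python
def dj_dw(w):
    """Derivative of a hypothetical cost J(w) = (w - 2)**2, whose minimum is at w = 2."""
    return 2 * (w - 2)


def step(w, alpha):
    """One gradient descent update: w = w - alpha * dJ/dw."""
    return w - alpha * dj_dw(w)


alpha = 0.1

w_right = 5.0                       # right of the minimum: derivative is positive
w_right_new = step(w_right, alpha)  # subtracts a positive number, so w decreases

w_left = -1.0                       # left of the minimum: derivative is negative
w_left_new = step(w_left, alpha)    # subtracts a negative number, so w increases

print(dj_dw(w_right), w_right, "->", w_right_new)
print(dj_dw(w_left), w_left, "->", w_left_new)
```

In both cases the update moves w toward the minimum of this assumed cost at w = 2.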
Learning rate
If α is too small, gradient descent still works, but it may be very slow.
If α is too large, gradient descent may:
Overshoot and never reach the minimum
Fail to converge, and may even diverge
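A rough way to see both failure modes is to run the same update with different learning rates. The sketch below assumes a hypothetical cost J(w) = w^2 (so dJ/dw = 2w, minimum at w = 0); the specific values of α, the starting point, and the helper name `run` are illustrative choices, not values from the notes.

```python
def run(alpha, w=1.0, num_iters=20):
    """Apply w = w - alpha * dJ/dw repeatedly for a hypothetical cost J(w) = w**2."""
    for _ in range(num_iters):
        w = w - alpha * 2 * w  # dJ/dw = 2 * w
    return w


w_small = run(alpha=0.001)    # too small: w barely moves from its starting value
w_moderate = run(alpha=0.1)   # moderate: w gets close to the minimum at 0
w_large = run(alpha=1.1)      # too large: each step overshoots and |w| grows

print(w_small, w_moderate, w_large)
```

For this cost each update multiplies w by (1 - 2α), so the magnitude of w shrinks only when 0 < α < 1; with α = 1.1 the factor is -1.2 and the iterates diverge while flipping sign.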
Gradient descent can reach a local minimum with a fixed learning rate α.
As it approaches a local minimum, gradient descent automatically takes smaller steps.
Near a local minimum:
The derivative becomes smaller
The update steps become smaller
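The shrinking step size under a fixed learning rate can be observed directly. This sketch again assumes the hypothetical cost J(w) = w^2 with a starting point of w = 4 and α = 0.1; none of these values come from the notes.

```python
def dj_dw(w):
    """Derivative of a hypothetical cost J(w) = w**2, whose minimum is at w = 0."""
    return 2 * w


alpha = 0.1  # fixed learning rate, never changed during the run
w = 4.0
step_sizes = []

for _ in range(10):
    update = alpha * dj_dw(w)       # step is proportional to the derivative
    step_sizes.append(abs(update))
    w = w - update

print(step_sizes)  # each step is smaller than the one before it
```

The learning rate stays constant throughout; the steps shrink only because the derivative shrinks as w approaches the minimum.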