
Machine Learning ---- Gradient Descent

Contents

1. The concept of gradient:

       ① In a univariate function:

       ② In multivariate functions:

2. An intuitive example of gradient descent:

3. The gradient descent formula and an intuitive understanding:

4. Precautions when applying the formula:


1. The concept of gradient:

       ① In a univariate function

the gradient is simply the derivative of the function, representing the slope of the tangent line to the function at a given point.

       ② In multivariate functions

the gradient is a vector, and its direction indicates the direction in which the function increases fastest at a given point.
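        To make the two cases concrete, here is a small Python sketch (our own illustration, not from the original post) that approximates a derivative and a gradient numerically with central differences; the test functions and the helper name numerical_gradient are assumptions chosen for illustration:

import numpy as np

def numerical_gradient(f, x, eps=1e-6):
    # Approximate the gradient of f at x using central differences.
    x = np.asarray(x, dtype=float)
    grad = np.zeros_like(x)
    for i in range(x.size):
        step = np.zeros_like(x)
        step[i] = eps
        grad[i] = (f(x + step) - f(x - step)) / (2.0 * eps)
    return grad

# Univariate case: for f(w) = w^2, the derivative at w = 3 is 6.
print(numerical_gradient(lambda v: v[0] ** 2, [3.0]))                  # ~ [6.]

# Multivariate case: for f(w, b) = w^2 + 3b, the gradient is (2w, 3).
print(numerical_gradient(lambda v: v[0] ** 2 + 3 * v[1], [1.0, 2.0]))  # ~ [2. 3.]

        The vector returned in the second call points in the direction of fastest increase, matching the definition above.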

2. An intuitive example of gradient descent:

       Do you remember the golf course in Tom and Jerry? It looks like this in the animation:

        Look at these two pictures: you can easily spot the distant hills. They make a typical example, and the golf course can be abstracted into a coordinate plot:

        In this coordinate system we map (x, y) to (w, b). Starting from a point where J(w, b) is large, the peak in the red area of the graph, we begin the gradient descent process.

        First, standing at the highest point, we turn a full circle to find the direction of steepest slope, and take a small step down in that direction. We choose this direction precisely because it is the steepest: for a step of the same length, it gives the greatest drop in height, so we reach the lowest point (a local minimum) fastest. After each step we look around again and choose a new direction, eventually tracing out a path that ends at local minimum A. Is this the only minimum? Of course not:

        It is also possible to reach point B, which is another local minimum. With this, we have walked through the gradient descent process intuitively; next, we deepen that understanding through its mathematical formula.

3. The gradient descent formula and an intuitive understanding:

        We first provide the formula for gradient descent:

w = w - \alpha \frac{ \partial J(w,b) }{ \partial w }

b = b - \alpha \frac{ \partial J(w,b) }{ \partial b }

        In the formulas, \alpha is what we call the learning rate, and the equals sign acts like the assignment operator in program code. J(w, b) is the cost function introduced in the previous post on the regression equation. How to choose the learning rate will be shared next time; here we first focus on the meaning of the formula:
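        To see one update in numbers, take a hypothetical cost J(w) = (w - 3)^2 with b fixed at 0 (this concrete example is ours, not from the original post). Its derivative is:

\frac{ \partial J(w) }{ \partial w } = 2(w - 3)

        Starting from w = 4 with \alpha = 0.1, a single update gives:

w = 4 - 0.1 \times 2(4 - 3) = 3.8

        so w moves from 4 toward the minimum at w = 3, exactly as the descent intuition suggests.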

        Continuing with this idea, let's simplify the formula by setting b = 0; this lets us understand its meaning in a two-dimensional Cartesian coordinate system:

        In this plot of J(w, b), which is a quadratic function, since we set b = 0 the partial derivative \frac{ \partial J(w,b) }{ \partial w } reduces to the ordinary derivative \frac{ dJ(w) }{ dw } of the univariate case. Now, with \alpha > 0, consider a w value on the right half of the curve: the derivative there is positive (the slope is positive), so the update subtracts a positive number from w, and w moves to the left, toward the minimum, which is the optimal solution. Similarly, on the left half of the function the derivative is negative, so w moves to the right, again toward the minimum. The learning rate \alpha scales how large each step is.
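        The following minimal Python sketch runs this one-dimensional case, again assuming the hypothetical cost J(w) = (w - 3)^2, so the derivative is 2(w - 3); the variable names are our own:

# Gradient descent on the assumed cost J(w) = (w - 3)^2, with b = 0.
def dJ_dw(w):
    return 2 * (w - 3)        # derivative of the assumed cost

alpha = 0.1                   # learning rate
w = 10.0                      # start on the right half: derivative > 0

for _ in range(100):
    w = w - alpha * dJ_dw(w)  # positive slope pushes w to the left

print(w)                      # converges toward 3, the minimum

        Starting instead from w = -5 (the left half), the derivative is negative, so the same update moves w to the right, just as described above.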

        This is a simple understanding of the gradient descent formula.


4. Precautions when applying the formula:

        When implementing the updates, w and b must be updated simultaneously: compute both temporary values using the current w and b first, and only then assign them back, just like this:

temp_w = w - \alpha \frac{ \partial J(w,b) }{ \partial w }

temp_b = b - \alpha \frac{ \partial J(w,b) }{ \partial b }

w = temp_w

b = temp_b

        The following is an incorrect order of operations that should be avoided:

temp_w = w - \alpha \frac{ \partial J(w,b) }{ \partial w }

w = temp_w

temp_b = b - \alpha \frac{ \partial J(w,b) }{ \partial b }

b = temp_b
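        The second ordering is wrong because w is overwritten before temp_b is computed, so the b update sees the new w rather than the old one. A minimal Python sketch of the correct simultaneous update (dJ_dw and dJ_db are placeholders for the partial derivatives of J(w, b); their exact form depends on the cost function and is assumed here):

def gradient_step(w, b, dJ_dw, dJ_db, alpha):
    # Evaluate both gradients at the OLD (w, b) before assigning back.
    temp_w = w - alpha * dJ_dw(w, b)
    temp_b = b - alpha * dJ_db(w, b)  # still uses the old w
    return temp_w, temp_b

# Incorrect ordering for comparison:
#   w = w - alpha * dJ_dw(w, b)
#   b = b - alpha * dJ_db(w, b)      # bug: this sees the updated w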

        That covers the formula and the update procedure for gradient descent. As for the code implementation, we will continue to explain it in future articles.

        Previous post: Machine Learning ---- Cost function (CSDN blog)

