Recurrent Neural Networks (RNN): Data Flow

article 2025/4/2 18:16:05

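The core of an RNN is a recurrent unit. At each time step it receives two inputs: the current input $x_t$ and the hidden state from the previous time step, $h_{t-1}$.

From these, the RNN produces the new hidden state $h_t$ and the output $o_t$: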

$h_t = \text{tanh}(W_{ih} x_t + b_{ih} + W_{hh} h_{t-1} + b_{hh})$

$o_t = W_{ho} h_t + b_{ho}$

where:

$W_{ih}$: shape [hidden_dim, input_dim].
- Maps the input $x_t$ from input_dim to hidden_dim.
$W_{hh}$: shape [hidden_dim, hidden_dim].
- Maps the hidden state $h_{t-1}$ from hidden_dim to hidden_dim.
$W_{ho}$: shape [output_dim, hidden_dim].
- Maps the hidden state $h_t$ from hidden_dim to output_dim.
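
The two update equations can be sketched in plain Python to make each term explicit. This is a minimal illustration, not a reference implementation: the helper names `matvec` and `rnn_step` and the toy sizes (input_dim = 2, hidden_dim = 3, output_dim = 1) are assumptions made for the example, and matrices are stored as lists of rows.

```python
import math

def matvec(W, v):
    """Multiply a matrix W (a list of rows) by a vector v (a list of floats)."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def rnn_step(x_t, h_prev, W_ih, b_ih, W_hh, b_hh, W_ho, b_ho):
    """Compute one time step and return (h_t, o_t)."""
    ih = matvec(W_ih, x_t)      # W_ih x_t,     length hidden_dim
    hh = matvec(W_hh, h_prev)   # W_hh h_{t-1}, length hidden_dim
    # h_t = tanh(W_ih x_t + b_ih + W_hh h_{t-1} + b_hh), applied element-wise
    h_t = [math.tanh(a + bi + c + bh) for a, bi, c, bh in zip(ih, b_ih, hh, b_hh)]
    # o_t = W_ho h_t + b_ho
    o_t = [s + b for s, b in zip(matvec(W_ho, h_t), b_ho)]
    return h_t, o_t

# Toy sizes chosen only for illustration: input_dim=2, hidden_dim=3, output_dim=1.
W_ih = [[0.1, 0.2], [0.0, -0.1], [0.3, 0.0]]
b_ih = [0.0, 0.0, 0.0]
W_hh = [[0.0, 0.1, 0.0], [0.2, 0.0, 0.0], [0.0, 0.0, 0.1]]
b_hh = [0.0, 0.0, 0.0]
W_ho = [[1.0, 1.0, 1.0]]
b_ho = [0.5]

h_t, o_t = rnn_step([1.0, 2.0], [0.0, 0.0, 0.0], W_ih, b_ih, W_hh, b_hh, W_ho, b_ho)
print(len(h_t), len(o_t))  # 3 1
```

The returned `h_t` is what would be passed in as `h_prev` at the next time step.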

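Computing $W_{ih} x_t$:
- The input $x_t$ has shape [input_dim].
- The weight $W_{ih}$ has shape [hidden_dim, input_dim].
- The result has shape [hidden_dim].
Computing $W_{hh} h_{t-1}$:
- The hidden state $h_{t-1}$ has shape [hidden_dim].
- The weight $W_{hh}$ has shape [hidden_dim, hidden_dim].
- The result has shape [hidden_dim].
Summing:
- $W_{ih} x_t + b_{ih} + W_{hh} h_{t-1} + b_{hh}$ has shape [hidden_dim], because every term in the sum has shape [hidden_dim].
Applying the activation function:
- $\text{tanh}$ is applied element-wise and does not change the shape.
- The final result $h_t$ has shape [hidden_dim].
Computing the fully connected layer's output $o_t$:
- The input $h_t$ has shape [hidden_dim].
- The weight $W_{ho}$ has shape [output_dim, hidden_dim].
- The result has shape [output_dim].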
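
The shape bookkeeping above can be checked mechanically. The sketch below is only an illustration under assumed names: `shape` and `matvec` are hypothetical helpers, the sizes 3, 4 and 2 are arbitrary, and the weights are constant placeholders rather than trained values. `matvec` refuses any product whose inner dimensions do not match, which is the rule each step relies on.

```python
import math

def shape(a):
    """Shape of a vector or matrix stored as (nested) Python lists."""
    if a and isinstance(a[0], list):
        return [len(a), len(a[0])]
    return [len(a)]

def matvec(W, v):
    """[rows, cols] times [cols] gives [rows]; anything else is rejected."""
    rows, cols = shape(W)
    if shape(v) != [cols]:
        raise ValueError(f"cannot multiply shape {shape(W)} by shape {shape(v)}")
    return [sum(w * x for w, x in zip(row, v)) for row in W]

# Small hypothetical sizes so the printed shapes are easy to read.
input_dim, hidden_dim, output_dim = 3, 4, 2

x_t = [1.0] * input_dim
h_prev = [0.0] * hidden_dim
W_ih = [[0.1] * input_dim for _ in range(hidden_dim)]
W_hh = [[0.1] * hidden_dim for _ in range(hidden_dim)]
W_ho = [[0.1] * hidden_dim for _ in range(output_dim)]
b_ih, b_hh, b_ho = [0.0] * hidden_dim, [0.0] * hidden_dim, [0.0] * output_dim

ih = matvec(W_ih, x_t)                                            # [hidden_dim]
hh = matvec(W_hh, h_prev)                                         # [hidden_dim]
pre = [a + b + c + d for a, b, c, d in zip(ih, b_ih, hh, b_hh)]   # [hidden_dim]
h_t = [math.tanh(p) for p in pre]                                 # tanh keeps the shape
o_t = [s + b for s, b in zip(matvec(W_ho, h_t), b_ho)]            # [output_dim]

print(shape(ih), shape(hh), shape(pre), shape(h_t), shape(o_t))
# prints: [4] [4] [4] [4] [2]
```

Calling `matvec(W_ih, h_prev)` in this setup raises `ValueError`, since $W_{ih}$ expects a vector of length input_dim and `h_prev` has length hidden_dim.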

Assume input_dim = 100, hidden_dim = 256, and output_dim = 10. Then:

$W_{ih}$: shape [256, 100].
- For example:
  $W_{ih} = \begin{bmatrix} w_{11} & w_{12} & \dots & w_{1,100} \\ w_{21} & w_{22} & \dots & w_{2,100} \\ \vdots & \vdots & \ddots & \vdots \\ w_{256,1} & w_{256,2} & \dots & w_{256,100} \end{bmatrix}$
$W_{hh}$: shape [256, 256].
- For example:
  $W_{hh} = \begin{bmatrix} w_{11} & w_{12} & \dots & w_{1,256} \\ w_{21} & w_{22} & \dots & w_{2,256} \\ \vdots & \vdots & \ddots & \vdots \\ w_{256,1} & w_{256,2} & \dots & w_{256,256} \end{bmatrix}$
$W_{ho}$: shape [10, 256].
- For example:
  $W_{ho} = \begin{bmatrix} w_{11} & w_{12} & \dots & w_{1,256} \\ w_{21} & w_{22} & \dots & w_{2,256} \\ \vdots & \vdots & \ddots & \vdots \\ w_{10,1} & w_{10,2} & \dots & w_{10,256} \end{bmatrix}$
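
One thing these shapes determine directly is how many numbers each parameter holds. The following sketch simply multiplies out the dimensions. The dictionary layout is an assumption made for the example, and the bias shapes are inferred from the update equations ($b_{ih}$ and $b_{hh}$ are added to 256-dimensional vectors, $b_{ho}$ to a 10-dimensional one).

```python
import math

input_dim, hidden_dim, output_dim = 100, 256, 10

# Shape of every parameter that appears in the two update equations.
shapes = {
    "W_ih": (hidden_dim, input_dim),
    "W_hh": (hidden_dim, hidden_dim),
    "W_ho": (output_dim, hidden_dim),
    "b_ih": (hidden_dim,),
    "b_hh": (hidden_dim,),
    "b_ho": (output_dim,),
}

# Number of entries = product of the dimensions.
counts = {name: math.prod(s) for name, s in shapes.items()}
total = sum(counts.values())

for name, n in counts.items():
    print(f"{name}: {shapes[name]} -> {n}")
print("total:", total)  # total: 94218
```

In this configuration the recurrent matrix $W_{hh}$ holds 65536 of the 94218 values, more than the other five parameters combined.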

Computing $W_{ih} x_t$:
$W_{ih} x_t = \begin{bmatrix} w_{11} x_1 + w_{12} x_2 + \dots + w_{1,100} x_{100} \\ w_{21} x_1 + w_{22} x_2 + \dots + w_{2,100} x_{100} \\ \vdots \\ w_{256,1} x_1 + w_{256,2} x_2 + \dots + w_{256,100} x_{100} \end{bmatrix}$
The result is a 256-dimensional vector.
Computing $W_{hh} h_{t-1}$:
$W_{hh} h_{t-1} = \begin{bmatrix} w_{11} h_1 + w_{12} h_2 + \dots + w_{1,256} h_{256} \\ w_{21} h_1 + w_{22} h_2 + \dots + w_{2,256} h_{256} \\ \vdots \\ w_{256,1} h_1 + w_{256,2} h_2 + \dots + w_{256,256} h_{256} \end{bmatrix}$
The result is a 256-dimensional vector.
Summing:
$W_{ih} x_t + b_{ih} + W_{hh} h_{t-1} + b_{hh}$
The result is a 256-dimensional vector.
Applying the activation function:
$h_t = \text{tanh}(W_{ih} x_t + b_{ih} + W_{hh} h_{t-1} + b_{hh})$
The result is a 256-dimensional vector.
Computing the fully connected layer's output $o_t$:
$o_t = W_{ho} h_t + b_{ho}$
where:
- $W_{ho}$ has shape [10, 256].
- $h_t$ has shape [256].
- The result has shape [10].
For example, with $h_1, \dots, h_{256}$ denoting the entries of $h_t$:
$o_t = \begin{bmatrix} w_{11} h_1 + w_{12} h_2 + \dots + w_{1,256} h_{256} \\ w_{21} h_1 + w_{22} h_2 + \dots + w_{2,256} h_{256} \\ \vdots \\ w_{10,1} h_1 + w_{10,2} h_2 + \dots + w_{10,256} h_{256} \end{bmatrix} + b_{ho}$
The result is a 10-dimensional vector.
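
The expanded matrices say that every row of a weight matrix contributes one dot product. The sketch below runs the walkthrough at the 100, 256 and 10 sizes and writes the first row of $W_{ih} x_t$ out as an explicit sum. All values are random placeholders generated with a fixed seed, the biases are set to zero, and `rand_matrix` and `matvec` are hypothetical helper names, so only the lengths and the row-by-row structure are meaningful.

```python
import math
import random

random.seed(0)  # fixed seed so repeated runs use the same numbers
input_dim, hidden_dim, output_dim = 100, 256, 10

def rand_matrix(rows, cols):
    """Placeholder weights: small uniform random values, not trained ones."""
    return [[random.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

def matvec(W, v):
    """One dot product per row of W."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]

W_ih = rand_matrix(hidden_dim, input_dim)    # [256, 100]
W_hh = rand_matrix(hidden_dim, hidden_dim)   # [256, 256]
W_ho = rand_matrix(output_dim, hidden_dim)   # [10, 256]
b_ih, b_hh, b_ho = [0.0] * hidden_dim, [0.0] * hidden_dim, [0.0] * output_dim

x_t = [random.uniform(-1.0, 1.0) for _ in range(input_dim)]
h_prev = [random.uniform(-1.0, 1.0) for _ in range(hidden_dim)]

ih = matvec(W_ih, x_t)

# First entry written out term by term: w_11 x_1 + w_12 x_2 + ... + w_{1,100} x_100
first = 0.0
for j in range(input_dim):
    first += W_ih[0][j] * x_t[j]
assert math.isclose(first, ih[0], rel_tol=1e-9, abs_tol=1e-12)

hh = matvec(W_hh, h_prev)
h_t = [math.tanh(a + b + c + d) for a, b, c, d in zip(ih, b_ih, hh, b_hh)]
o_t = [s + b for s, b in zip(matvec(W_ho, h_t), b_ho)]

print(len(ih), len(h_t), len(o_t))  # 256 256 10
```

The comparison uses `math.isclose` rather than `==` because the built-in `sum` and a hand-written loop may round intermediate results differently.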

Data:
- Input $x_t$: shape [input_dim].
- Hidden state $h_{t-1}$: shape [hidden_dim].
- Output $o_t$: shape [output_dim].
Weights:
- $W_{ih}$: shape [hidden_dim, input_dim].
- $W_{hh}$: shape [hidden_dim, hidden_dim].
- $W_{ho}$: shape [output_dim, hidden_dim].
Biases:
- $b_{ih}$ and $b_{hh}$: shape [hidden_dim].
- $b_{ho}$: shape [output_dim].
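
Because $h_t$ has the same shape as $h_{t-1}$, the step can be repeated over a whole sequence with the same weights. The sketch below assumes a hypothetical `rnn_forward` helper, a zero initial hidden state, and toy sizes (input_dim = 2, hidden_dim = 3, output_dim = 1, four time steps). It illustrates the data flow only and is not a training-ready implementation.

```python
import math

def matvec(W, v):
    """Multiply a matrix W (a list of rows) by a vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def rnn_forward(xs, h0, params):
    """Run the recurrent unit over the sequence xs, reusing the same weights."""
    W_ih, b_ih, W_hh, b_hh, W_ho, b_ho = params
    h = h0
    outputs = []
    for x_t in xs:
        ih = matvec(W_ih, x_t)
        hh = matvec(W_hh, h)   # h still holds h_{t-1} here
        h = [math.tanh(a + b + c + d) for a, b, c, d in zip(ih, b_ih, hh, b_hh)]
        outputs.append([s + b for s, b in zip(matvec(W_ho, h), b_ho)])
    return outputs, h

# Toy parameters, chosen only for illustration.
params = (
    [[0.5, 0.5] for _ in range(3)],        # W_ih: [3, 2]
    [0.0, 0.0, 0.0],                       # b_ih: [3]
    [[0.5, 0.5, 0.5] for _ in range(3)],   # W_hh: [3, 3]
    [0.0, 0.0, 0.0],                       # b_hh: [3]
    [[1.0, 1.0, 1.0]],                     # W_ho: [1, 3]
    [0.0],                                 # b_ho: [1]
)

xs = [[1.0, 1.0] for _ in range(4)]        # the same input at every step
outputs, h_last = rnn_forward(xs, [0.0, 0.0, 0.0], params)
print(len(outputs), len(outputs[0]), len(h_last))  # 4 1 3
```

Although the input is identical at every step, `outputs[0]` and `outputs[1]` differ, because the second step also sees the hidden state produced by the first.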