[Transformer Theory + Practice (Part 3)] Essential PyTorch Knowledge

article 2025/3/25 13:36:43

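[Transformer Theory + Practice (Part 3)] Essential PyTorch Knowledge
[Transformer Theory + Practice (Part 2)] LoRA Local Fine-Tuning in Practice -- deepseek-r1 Distilled Model
[Transformer Theory + Practice (Part 1)] Introduction to Transformer, LLaMA, and LoRA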

Contents

PyTorch Basics
- Tensors
- Concatenating and Splitting
- Reshaping
- Indexing and Slicing
- Reducing and Adding Dimensions
- Tensor Arithmetic

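PyTorch was released by Facebook AI Research in 2017. It provides GPU-accelerated tensor computation and automatic differentiation, which makes it possible to optimize model parameters with gradient-based methods. As of August 2022, PyTorch ranked alongside the Linux kernel and Kubernetes as one of the five fastest-growing open source communities in the world. At top machine learning conferences such as NeurIPS and ICML, more than 80% of researchers now use PyTorch.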

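PyTorch Basics

Tensors

Tensors are the foundation of deep learning. A 0-dimensional tensor is called a scalar, a 1-dimensional tensor a vector, and a 2-dimensional tensor a matrix. PyTorch is essentially a tensor-based math toolkit, and it offers several ways to create tensors: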

>>> import torch
>>> torch.empty(2, 3)  # uninitialized tensor of shape (2, 3); the values shown are whatever was in memory
tensor([[0., 0., 0.],
        [0., 0., 0.]])
>>> torch.rand(2, 3) # random tensor, each value taken from [0,1)
tensor([[0.0956, 0.6929, 0.5450],
        [0.0942, 0.1600, 0.6606]])
>>> torch.rand(2, 3).cuda() # move the tensor to the GPU (requires a CUDA device)
tensor([[0.0405, 0.1489, 0.8197],
        [0.9589, 0.0379, 0.5734]], device='cuda:0')
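
Besides `torch.empty` and `torch.rand`, tensors are often built from Python data or filled with constants. The following is a minimal sketch using standard PyTorch constructors; the variable names and the CPU fallback are illustrative choices that do not come from the original article.

```python
import torch

# Build tensors from Python data or fill them with constants.
from_list = torch.tensor([[1.0, 2.0], [3.0, 4.0]])  # values copied from a nested list
zeros = torch.zeros(2, 3)                             # all zeros, default floating-point dtype
ones = torch.ones(2, 3, dtype=torch.long)             # all ones, 64-bit integers
steps = torch.arange(0, 10, 2)                        # 0, 2, 4, 6, 8

# Use the GPU only when one is available, so the same script also runs on CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
on_device = torch.rand(2, 3, device=device)

print(from_list.shape, zeros.dtype, ones.dtype, steps.tolist(), on_device.device.type)
```

Passing `device=` at creation time avoids allocating the tensor on the CPU first and then copying it, which is what `.cuda()` does.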

Concatenating and Splitting

Concatenation with torch.cat:

>>> x = torch.tensor([[1, 2, 3], [ 4,  5,  6]], dtype=torch.double)
>>> y = torch.tensor([[7, 8, 9], [10, 11, 12]], dtype=torch.double)
>>> torch.cat((x, y), dim=0) # dim=0 is the default: stack rows (vertical); both tensors need the same number of columns
tensor([[ 1.,  2.,  3.],
        [ 4.,  5.,  6.],
        [ 7.,  8.,  9.],
        [10., 11., 12.]], dtype=torch.float64)
>>> torch.cat((x, y), dim=1) # join columns (horizontal); both tensors need the same number of rows
tensor([[ 1.,  2.,  3.,  7.,  8.,  9.],
        [ 4.,  5.,  6., 10., 11., 12.]], dtype=torch.float64)

>>> print(x.shape)
torch.Size([2, 3])
>>> z = torch.cat((x, y), dim=-1) # concatenate along the last dimension
>>> print(z.shape)
torch.Size([2, 6])
>>> z = torch.cat((x, y), dim=-2) # concatenate along the second-to-last dimension
>>> print(z.shape)
torch.Size([4, 3])

>>> x = y = torch.rand(2, 3, 4)
>>> z = torch.cat((x, y), dim=-1) # higher-dimensional case: concatenate along the last dimension
>>> print(z.shape)
torch.Size([2, 3, 8])

torch.split() splits one tensor into several tensors, which makes it the inverse of cat:

>>> t = torch.rand([4, 128, 512, 512])
>>> print(t.shape)
torch.Size([4, 128, 512, 512])
>>> a, b = torch.split(t, 64, dim=1) # split dim 1 into chunks of size 64
>>> print(a.shape)
torch.Size([4, 64, 512, 512])
>>> print(b.shape)
torch.Size([4, 64, 512, 512])

>>> t = torch.rand([4, 128, 512, 512])
>>> a, b = torch.split(t, [128, 384], dim=2) # pass a list to split dim 2 into pieces of sizes 128 and 384
>>> print(a.shape)
torch.Size([4, 128, 128, 512])
>>> print(b.shape)
torch.Size([4, 128, 384, 512])
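
To make the "inverse of cat" relationship concrete, here is a small sketch that splits a tensor and concatenates the pieces back together. The tensor sizes and variable names are chosen only for illustration.

```python
import torch

t = torch.arange(24).view(2, 6, 2)

# An integer chunk size gives equal pieces; the last piece may be smaller.
chunks = torch.split(t, 4, dim=1)
chunk_shapes = [tuple(c.shape) for c in chunks]

# A list gives pieces of exactly those sizes; they must sum to the dimension length.
first, second = torch.split(t, [1, 5], dim=1)

# Concatenating the pieces along the same dimension restores the original tensor.
restored = torch.cat(chunks, dim=1)
print(chunk_shapes, tuple(first.shape), tuple(second.shape), torch.equal(restored, t))
```

Because 6 is not a multiple of 4, the integer form yields pieces of length 4 and 2 along dim 1.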

Reshaping

>>> x = torch.tensor([1, 2, 3, 4, 5, 6])
>>> print(x, x.shape)
tensor([1, 2, 3, 4, 5, 6]) torch.Size([6])
>>> x.view(2, 3) # shape adjusted to (2, 3)
tensor([[1, 2, 3],
        [4, 5, 6]])
>>> x.view(3, 2) # shape adjusted to (3, 2)
tensor([[1, 2],
        [3, 4],
        [5, 6]])
>>> x.view(-1, 3) # -1 means automatic inference
tensor([[1, 2, 3],
        [4, 5, 6]])

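A tensor must be contiguous for view to work; call is_contiguous() to check. If it is not contiguous, first call contiguous() to get a contiguous copy. Alternatively, use the newer reshape function, which behaves almost identically to view and handles non-contiguous tensors automatically.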
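
The contiguity requirement is easiest to see with a transposed tensor. The sketch below is illustrative (the tensor and variable names are assumptions): it shows view failing on a non-contiguous tensor and the two workarounds mentioned above.

```python
import torch

x = torch.arange(6).view(2, 3)
xt = x.t()  # transposing changes the strides, not the underlying storage

print(x.is_contiguous(), xt.is_contiguous())

# view cannot flatten the transposed tensor because its memory layout is incompatible.
try:
    xt.view(6)
    view_failed = False
except RuntimeError:
    view_failed = True

via_contiguous = xt.contiguous().view(6)  # copy into contiguous memory first
via_reshape = xt.reshape(6)               # reshape copies only when it has to
print(view_failed, via_contiguous.tolist(), via_reshape.tolist())
```

Both workarounds return the elements in the transposed order, so they agree with each other.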

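Indexing and Slicing

As with Python lists, tensors can be indexed and sliced. Indices start at 0, a slice [m:n] runs from m up to but not including n, and indexing or slicing can be applied to any dimension of the tensor. For example: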

>>> x = torch.arange(12).view(3, 4)
>>> x
tensor([[ 0,  1,  2,  3],
        [ 4,  5,  6,  7],
        [ 8,  9, 10, 11]])
>>> x[1, 3] # element at row 1, column 3
tensor(7)
>>> x[1] # all elements in row 1
tensor([4, 5, 6, 7])
>>> x[1:3] # elements in row 1 & 2
tensor([[ 4,  5,  6,  7],
        [ 8,  9, 10, 11]])
>>> x[:, 2] # all elements in column 2
tensor([ 2,  6, 10])
>>> x[:, 2:4] # elements in column 2 & 3
tensor([[ 2,  3],
        [ 6,  7],
        [10, 11]])
>>> x[:, 2:4] = 100 # set elements in column 2 & 3 to 100
>>> x
tensor([[  0,   1, 100, 100],
        [  4,   5, 100, 100],
        [  8,   9, 100, 100]])

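Reducing and Adding Dimensions

Sometimes a computation requires removing or adding a dimension. For example, neural networks usually expect a batch of samples as input, so if there is only one sample, a batch dimension has to be added by hand.

Adding a dimension: torch.unsqueeze(input, dim, out=None) inserts a dimension of size 1 at position dim of the input tensor. As with indexing, dim may be negative.
Removing a dimension: torch.squeeze(input, dim=None, out=None) removes every dimension of size 1 when dim is not given, so (A,1,B,1,C) becomes (A,B,C). When dim is given, only that dimension is removed, and only if its size is 1. For example, for a tensor of shape (A,1,B), squeeze(input, dim=0) leaves the tensor unchanged, while squeeze(input, dim=1) changes the shape to (A,B).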

>>> a = torch.tensor([1, 2, 3, 4])
>>> print(a, a.shape)
tensor([1, 2, 3, 4]) torch.Size([4])
>>> b = torch.unsqueeze(a, dim=0)
>>> print(b, b.shape)
tensor([[1, 2, 3, 4]]) torch.Size([1, 4])
>>> b = a.unsqueeze(dim=0)   # another way to unsqueeze tensor
>>> print(b, b.shape)
tensor([[1, 2, 3, 4]]) torch.Size([1, 4])
>>> c = b.squeeze()
>>> print(c, c.shape)
tensor([1, 2, 3, 4]) torch.Size([4])
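
The behavior of squeeze with an explicit dim, and unsqueeze with a negative dim, can be checked with a short sketch. The concrete sizes (A=2, B=3) and the variable names are assumptions made for illustration.

```python
import torch

x = torch.zeros(2, 1, 3)  # plays the role of shape (A, 1, B)

same = torch.squeeze(x, dim=0)               # dim 0 has size 2, so nothing is removed
dropped = torch.squeeze(x, dim=1)            # dim 1 has size 1, so it is removed
appended = torch.unsqueeze(dropped, dim=-1)  # negative dim: insert a new last dimension

# Add a batch dimension to a single sample before passing it to a model.
sample = torch.rand(3, 4)
batch = sample.unsqueeze(0)

print(tuple(same.shape), tuple(dropped.shape), tuple(appended.shape), tuple(batch.shape))
```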

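Tensor Arithmetic

Addition, subtraction, multiplication, and division of tensors are computed element by element. For example: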

>>> x = torch.tensor([1, 2, 3], dtype=torch.double)
>>> y = torch.tensor([4, 5, 6], dtype=torch.double)
>>> print(x + y)
tensor([5., 7., 9.], dtype=torch.float64)
>>> print(x - y)
tensor([-3., -3., -3.], dtype=torch.float64)
>>> print(x * y)
tensor([ 4., 10., 18.], dtype=torch.float64)
>>> print(x / y)
tensor([0.2500, 0.4000, 0.5000], dtype=torch.float64)

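Einstein Summation Convention

torch.einsum is the PyTorch function that evaluates expressions written in the Einstein summation convention. The convention provides a concise set of rules that can express, among others, vector inner products, vector outer products, matrix multiplication, transposition, and tensor contraction. Once you are comfortable with einsum, complex tensor operations become easier to write and less error-prone.

The basic syntax of torch.einsum is: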

torch.einsum(equation, *operands)

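Here, equation is a string that specifies the operation in Einstein notation, and operands is one or more input tensors.

In the equation, each letter is a subscript that labels one dimension of an operand; any ASCII letter, lowercase or uppercase, may be used. A letter that appears in more than one operand ties those dimensions together, so their sizes must match, and a letter that appears in the inputs but not in the output is summed over. The mapping from input subscripts to output subscripts is what defines the operation.

First, here is how einsum implements matrix multiplication: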

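import torch

a = torch.rand(2, 3)
b = torch.rand(3, 4)
# Left of the arrow: one comma-separated subscript group per input tensor. Right of the arrow: the output.
# Each subscript must be a single ASCII letter.
c = torch.einsum("ik,kj->ij", a, b)
# Equivalent to torch.mm(a, b)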
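
The other operations listed earlier (inner product, outer product, transpose) follow the same pattern. The sketch below is illustrative; the input values and variable names are assumptions, and each result can be compared with the corresponding built-in such as torch.dot or torch.outer.

```python
import torch

u = torch.tensor([1.0, 2.0, 3.0])
v = torch.tensor([4.0, 5.0, 6.0])
m = torch.arange(6, dtype=torch.float32).view(2, 3)

inner = torch.einsum("i,i->", u, v)      # i is absent from the output, so it is summed: inner product
outer = torch.einsum("i,j->ij", u, v)    # no shared subscript: outer product
transposed = torch.einsum("ij->ji", m)   # reordered output subscripts: transpose
row_sums = torch.einsum("ij->i", m)      # j is absent from the output, so each row is summed

print(inner.item(), tuple(outer.shape), tuple(transposed.shape), row_sums.tolist())
```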

A higher-dimensional case:

import torch

# Tensor dimensions (kept small so the example is easy to follow)
batch_size = 2     # b
seq_len_s = 3      # s (target sequence length)
seq_len_t = 4      # t (source sequence length)
num_heads = 2      # h (number of attention heads)
head_dim = 5       # d (feature dimension of each attention head)

# Build the input tensors
# scores shape: (b, s, h, t) = (2, 3, 2, 4)
scores = torch.randn(batch_size, seq_len_s, num_heads, seq_len_t)
# v_cache shape: (b, t, h, d) = (2, 4, 2, 5)
v_cache = torch.randn(batch_size, seq_len_t, num_heads, head_dim)

# Einstein summation: t is absent from the output, so it is summed over
output = torch.einsum("bsht,bthd->bshd", scores, v_cache)
print("Output shape:", output.shape)  # expected: (2, 3, 2, 5)

# --------------------------------------------------
# Explicit check (the same logic written as for loops)
# --------------------------------------------------
manual_output = torch.zeros_like(output)

for b in range(batch_size):
    for s in range(seq_len_s):
        for h in range(num_heads):
            for d in range(head_dim):
                # Sum over the t dimension
                total = 0.0
                for t in range(seq_len_t):
                    # scores[b, s, h, t] * v_cache[b, t, h, d]
                    total += scores[b, s, h, t].item() * v_cache[b, t, h, d].item()
                manual_output[b, s, h, d] = total

# Check that the two methods agree
print("\nMax absolute error:", torch.max(torch.abs(output - manual_output)).item())

The output is as follows (the exact error varies from run to run because the inputs are random):

Output shape: torch.Size([2, 3, 2, 5])

Max absolute error: 2.384185791015625e-07
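
The same contraction can also be written without einsum, which may help when reading code that uses permute and matmul instead. The sketch below is one possible equivalent, not the article's method; the intermediate variable names and the fixed seed are assumptions added for illustration.

```python
import torch

torch.manual_seed(0)  # fixed seed so the comparison is reproducible
scores = torch.randn(2, 3, 2, 4)   # (b, s, h, t)
v_cache = torch.randn(2, 4, 2, 5)  # (b, t, h, d)

expected = torch.einsum("bsht,bthd->bshd", scores, v_cache)

# Move the head axis next to the batch axis so matmul computes (s, t) @ (t, d) per (b, h).
scores_bhst = scores.permute(0, 2, 1, 3)           # (b, h, s, t)
values_bhtd = v_cache.permute(0, 2, 1, 3)          # (b, h, t, d)
out_bhsd = torch.matmul(scores_bhst, values_bhtd)  # (b, h, s, d)
output = out_bhsd.permute(0, 2, 1, 3)              # back to (b, s, h, d)

print(tuple(output.shape), torch.allclose(output, expected, atol=1e-5))
```

The einsum version states the index bookkeeping in one string, whereas the matmul version needs three permutes to express the same thing.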
