What are the four phases of the NVIDIA BlueField DPU boot flow, and what does each one do?
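Contents
- BlueField hardware overview
- Boot flow
- Boot sequence:
- The two storage areas in eMMC:
- ATF overview
- The four boot phases:
- Four main steps:
- Boot files each stage depends on
- Reading the boot flow from a failed firmware flash
- Summary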
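BlueField hardware overview
This article uses BlueField-2 as an example. The diagrams show that RSHIM is essentially the collection of boot-related functions, and they also show the two partitions on the eMMC.
BlueField hardware block diagram (note in particular that RSHIM is a separate hardware unit):
BlueField interface diagram: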
Boot flow
The default BlueField bootstream (BFB) shown above is a standard boot BFB stored on the embedded Multi-Media Card (eMMC), as can be seen from the boot path, which points to a GUID Partition Table (GPT) partition on the eMMC device.
Boot sequence:
After a reset (echo "SW_RESET 1" > /dev/rshim0/misc), execution first enters BL1 in the boot ROM.
Reference: https://docs.nvidia.com/networking/display/bluefielddpuosv385/upgrading+boot+software
The two storage areas in eMMC:
When booting from eMMC, these stages make use of two different types of storage within the eMMC part:
• ATF and UEFI are loaded from a special area known as an eMMC boot partition. Data from a boot partition is automatically streamed from the eMMC device to the eMMC controller under hardware control during the initial boot-up. Each eMMC device has two boot partitions, and the partition which is used to stream the boot data is chosen by a nonvolatile configuration register in the eMMC.
• The operating system, applications, and user data come from the remainder of the chip, known as the user area. This area is accessed via block-size reads and writes, done by a device driver or similar software routine.
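Booting from eMMC uses two types of storage area within the eMMC:
- The boot partition, which holds ATF and UEFI. During boot-up, its data is streamed from the eMMC device to the eMMC controller automatically under hardware control, with no software involved.
- The user area, which holds the OS, applications, and user data. It is read and written in block-size units and requires a device driver or a similar software routine.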
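The boot partition is selected by a nonvolatile configuration register in the eMMC. As a sketch, assuming the standard JEDEC eMMC layout in which this setting is the PARTITION_CONFIG byte of the Extended CSD register (EXT_CSD[179]), the code below decodes which area the device will stream at boot. The class and function names are hypothetical, and the article does not say how BlueField tooling reads or writes this register.

```python
from dataclasses import dataclass

# BOOT_PARTITION_ENABLE values (PARTITION_CONFIG bits [5:3]), per the JEDEC eMMC spec.
BOOT_SOURCES = {
    0: "not boot enabled",
    1: "boot partition 1",
    2: "boot partition 2",
    7: "user area",
}


@dataclass(frozen=True)
class PartitionConfig:
    boot_ack: bool              # bit 6: device sends a boot acknowledge
    boot_partition_enable: int  # bits [5:3]: area streamed to the controller at boot
    partition_access: int       # bits [2:0]: area selected for normal read/write

    @property
    def boot_source(self) -> str:
        # Values 3-6 are reserved in the spec.
        return BOOT_SOURCES.get(self.boot_partition_enable, "reserved")


def decode_partition_config(value: int) -> PartitionConfig:
    """Split a raw EXT_CSD[179] byte into its fields."""
    if not 0 <= value <= 0xFF:
        raise ValueError("PARTITION_CONFIG is a single byte")
    return PartitionConfig(
        boot_ack=bool(value & 0x40),
        boot_partition_enable=(value >> 3) & 0x7,
        partition_access=value & 0x7,
    )


if __name__ == "__main__":
    for raw in (0x48, 0x50):
        cfg = decode_partition_config(raw)
        print(f"0x{raw:02X}: boots from {cfg.boot_source}, boot_ack={cfg.boot_ack}")
```

Because a single 3-bit field chooses between the two boot partitions, a device can keep a second copy of the boot images and switch to it by rewriting only bits [5:3].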
ATF overview
ATF is used in Armv8 systems for booting the chip and then providing secure interfaces. It implements various Arm interface standards like PSCI (Power State Coordination Interface), SMC (Secure Monitor Call) and TBBR (Trusted Board Boot Requirements). ATF is used as the primary bootloader to load UEFI (Unified Extensible Firmware Interface) on the BlueField platform.
ATF is the primary bootloader on BlueField and is used to load UEFI. It implements standard Arm interfaces such as PSCI, SMC, and TBBR.
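PSCI requests reach ATF's EL3 runtime as SMC instructions whose function ID follows Arm's SMC Calling Convention (SMCCC). A minimal decoder sketch, assuming the SMCCC layout (bit 31 fast call, bit 30 SMC64, bits [29:24] owning entity, bits [15:0] function number); the owner table is abbreviated and the names are illustrative.

```python
from typing import NamedTuple

# Owning entity numbers (bits [29:24]) from the SMCCC; abbreviated.
OWNERS = {
    0: "Arm architecture",
    1: "CPU service",
    2: "SiP service",
    3: "OEM service",
    4: "Standard secure service (e.g. PSCI)",
    5: "Standard hypervisor service",
}


class SmcFunctionId(NamedTuple):
    fast_call: bool
    smc64: bool
    owner: int
    function: int

    def describe(self) -> str:
        owner = OWNERS.get(self.owner, f"owner {self.owner}")
        kind = "fast" if self.fast_call else "yielding"
        width = "SMC64" if self.smc64 else "SMC32"
        return f"{kind} {width} call, {owner}, function 0x{self.function:04X}"


def decode_smc_fid(fid: int) -> SmcFunctionId:
    """Split a 32-bit SMC function ID into its SMCCC fields."""
    if not 0 <= fid <= 0xFFFFFFFF:
        raise ValueError("function ID is a 32-bit value")
    return SmcFunctionId(
        fast_call=bool((fid >> 31) & 1),
        smc64=bool((fid >> 30) & 1),
        owner=(fid >> 24) & 0x3F,
        function=fid & 0xFFFF,
    )


if __name__ == "__main__":
    # PSCI_VERSION (SMC32) and CPU_ON (SMC64) function IDs from the PSCI spec.
    for fid in (0x84000000, 0xC4000003):
        print(hex(fid), "->", decode_smc_fid(fid).describe())
```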
The four boot phases:
Four main steps:
The BlueField™ boot flow is comprised of 4 main phases:
• Hardware loads Arm Trusted Firmware (ATF)
• ATF loads UEFI—together ATF and UEFI make up the booter software
• UEFI loads the operating system, such as the Linux kernel
• The operating system loads applications and user data
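- Phase 1, hardware loads ATF: BL1, the first ATF stage, lives in the on-chip boot ROM (what is usually called the BootROM) and is executed directly by hardware. It cannot be modified after tape-out.
- Phase 2, ATF loads UEFI: BL1 loads BL2 into on-chip SRAM, which unlike DDR can be used without prior initialization, and runs it there. BL2 then initializes DRAM and loads BL31 and BL33 (UEFI).
- Phase 3, UEFI loads the operating system.
- Phase 4, the operating system loads applications and user data.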
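The four phases form a chain in which each phase's payload becomes the next phase's loader. A small sketch that records the phases together with where each payload is stored, assuming the eMMC boot path described earlier (BL1 in on-chip ROM; the rest of ATF and UEFI in an eMMC boot partition; the OS and applications in the user area). The data structure and names are illustrative only.

```python
from typing import NamedTuple


class Phase(NamedTuple):
    loader: str
    payload: str
    payload_source: str


# The four BlueField boot phases; storage locations follow the eMMC section.
PHASES = [
    Phase("hardware", "ATF", "on-chip boot ROM (BL1) + eMMC boot partition (BL2, BL31)"),
    Phase("ATF", "UEFI", "eMMC boot partition"),
    Phase("UEFI", "operating system", "eMMC user area"),
    Phase("operating system", "applications and user data", "eMMC user area"),
]


def boot_chain(phases: list[Phase]) -> list[str]:
    """Return components in start-up order, checking every loader was loaded first."""
    for prev, cur in zip(phases, phases[1:]):
        if prev.payload != cur.loader:
            raise ValueError(f"{cur.loader!r} was never loaded by the previous phase")
    return [phases[0].loader] + [p.payload for p in phases]


if __name__ == "__main__":
    print(" -> ".join(boot_chain(PHASES)))
    for p in PHASES:
        print(f"{p.payload:<28} from {p.payload_source}")
```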
ATF has various bootloader stages when loading:
• BL1 – BL1 is stored in the on-chip boot ROM; it is executed when the primary core is reset. Its main functionality is to do some initial architectural and platform initialization to the point where it can load the BL2 image, then it loads BL2 and switches execution to it.
• BL2 – BL2 is loaded and then executed on the on-chip boot SRAM. Its main functionality is to perform the rest of the low-level architectural and platform initialization (e.g. initializing DRAM, setting up the System Address Mapping and calculating the Physical Memory Regions). It then loads the rest of the boot images (BL31, BL33). After loading the images, it traps itself back to BL1 via an SMC, which in turn switches execution to BL31.
• BL31 – BL31 is known as the EL3 Runtime Software. It is loaded to the boot RAM. Its main functionality is to provide low-level runtime service support. After it finishes all its runtime software initialization, it passes control to BL33.
• BL33 – BL33 is known as the Non-trusted Firmware. For this case we are using EDK2 (Tianocore) UEFI. It is in charge of loading and passing control to the OS. For more detail on this, please see the EDK2 source.
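- BL1: stored in the on-chip boot ROM. It does the initial architectural and platform initialization needed to load BL2, then loads BL2 and hands execution to it. On a real board, BL1 prints just one line: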
Mellanox BlueField-2 A1 BL1 V1.1
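- BL2: executes from on-chip SRAM. It completes the remaining low-level architectural and platform initialization, such as initializing DRAM (the failure example later in this article is exactly a DRAM initialization failure), setting up the system address map, and calculating the physical memory regions. It then loads BL31 and BL33. When it finishes, it traps back to BL1 via an SMC, and BL1 switches execution to BL31. Successful DRAM initialization shows up in the log as: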
NOTICE: Finished initializing DDR
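- BL31: the EL3 Runtime Software, loaded into boot RAM. It provides low-level runtime services and, once its initialization is done, passes control to BL33. Note that the following line, which also appears in the log, does not come from BL31:
GNU GRUB version 2.04
GRUB is an OS loader launched by UEFI (BL33), so this line marks the hand-off from UEFI toward the operating system.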
- BL33: Tianocore (EDK2) UEFI. It loads the OS and passes control to it. For more detail, see the EDK II documentation: https://github.com/tianocore/tianocore.github.io/wiki/EDK-II-User-Documentation
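The ATF stage hand-offs described above, including BL2's trap back to BL1 through an SMC, can be traced with a toy simulation. It is a sketch assuming that a DRAM initialization failure in BL2 simply halts the chain, as in the failure example later in this article; the step strings are descriptive labels, not real BlueField log messages.

```python
def atf_boot_sequence(dram_ok: bool = True) -> list[tuple[str, str]]:
    """Return the (stage, action) steps of the ATF hand-off.

    dram_ok=False models a BL2 that stops during DRAM initialization,
    so BL31 and BL33 are never reached.
    """
    steps = [
        ("BL1", "architectural and platform init from on-chip boot ROM"),
        ("BL1", "load BL2 into boot SRAM and switch execution to it"),
        ("BL2", "initialize DRAM"),
    ]
    if not dram_ok:
        steps.append(("BL2", "halt: DRAM initialization failed"))
        return steps
    steps += [
        ("BL2", "set up system address map, calculate physical memory regions"),
        ("BL2", "load BL31 and BL33 images"),
        ("BL2", "trap back to BL1 via SMC"),
        ("BL1", "switch execution to BL31"),
        ("BL31", "initialize EL3 runtime services, pass control to BL33"),
        ("BL33", "UEFI (EDK2) loads the OS and passes control to it"),
    ]
    return steps


def stages_reached(steps: list[tuple[str, str]]) -> list[str]:
    """Distinct stages in first-seen order."""
    return list(dict.fromkeys(stage for stage, _ in steps))


if __name__ == "__main__":
    for ok in (True, False):
        print(f"dram_ok={ok}:", " -> ".join(stages_reached(atf_boot_sequence(ok))))
```

BL1 shows up twice in the full trace because control returns to it after BL2's SMC, before execution moves on to BL31.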
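ARMv7 and ARMv8 take fundamentally different approaches to the boot flow. ARMv8 must support secure boot, which requires handling at different exception levels, while also giving SoC vendors some configurable flexibility. It therefore introduces additional boot concepts, and its design is considerably more complex than that of ARMv7 and earlier.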
For a very detailed walkthrough of the ARM boot flow, see: https://github.com/carloscn/blog/issues/65
Boot files each stage depends on
Reading the boot flow from a failed firmware flash
The boot was stuck in the BL2 stage at: ERROR: DDR Values not val
The RShim log gives a brief summary: BL2 starts and then fails. (To enable the log: echo "DISPLAY_LEVEL 2" > /dev/rshim0/misc, then view it with: cat /dev/rshim0/misc.)
BL2 reports the boot mode as eMMC, and booting from eMMC then fails.
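The two rshim commands used in this article (SW_RESET 1 and DISPLAY_LEVEL 2) are plain text written to the misc file, and the log is read back from the same file. A minimal Python wrapper, assuming the rshim driver exposes /dev/rshim0/misc on the host as above and the script runs with sufficient privileges; the helper names are hypothetical, and the path is a parameter so the functions can be tried on an ordinary file first.

```python
from pathlib import Path

RSHIM_MISC = Path("/dev/rshim0/misc")  # device file used in this article


def misc_write(command: str, misc: Path = RSHIM_MISC) -> None:
    """Same as: echo "<command>" > /dev/rshim0/misc"""
    with open(misc, "w") as f:
        f.write(command + "\n")


def misc_read(misc: Path = RSHIM_MISC) -> str:
    """Same as: cat /dev/rshim0/misc"""
    with open(misc) as f:
        return f.read()


def enable_boot_log(misc: Path = RSHIM_MISC) -> str:
    """Set DISPLAY_LEVEL 2, then return what the misc file reports."""
    misc_write("DISPLAY_LEVEL 2", misc)
    return misc_read(misc)


def software_reset(misc: Path = RSHIM_MISC) -> None:
    """Reset the DPU; boot restarts at BL1 in the boot ROM."""
    misc_write("SW_RESET 1", misc)


if __name__ == "__main__":
    print(enable_boot_log())
```

On an ordinary file, reading back simply returns the last command written; on the real device, the driver returns its own status output (including the boot log at display level 2).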
In particular, this points to damaged boot software (the ATF/UEFI images).
Log analysis:
Reference: https://docs.nvidia.com/networking/display/bluefielddpubspv422/logging
This problem resembles the errors "Memory Device: 0 BIST Failed" and "DDR BIST POST failed!" reported here:
https://forums.developer.nvidia.com/t/install-doca-on-bluefield-2-failed/231797
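The manual reading above (find the last boot milestone printed, then look at the ERROR lines) can be automated. A rough sketch; the marker substrings are assumptions drawn from the log lines quoted in this article, not an official or complete list of BlueField messages.

```python
from dataclasses import dataclass, field

# Assumed milestone substrings, in boot order, taken from log lines quoted
# in this article.
STAGE_MARKERS = [
    ("BL1", "BL1 V"),
    ("BL2 (DRAM initialized)", "Finished initializing DDR"),
    ("OS loader (GRUB)", "GNU GRUB"),
]


@dataclass
class BootReport:
    last_stage: str = "none"
    errors: list[str] = field(default_factory=list)


def analyze_boot_log(lines: list[str]) -> BootReport:
    """Find the furthest milestone reached and collect ERROR lines."""
    report = BootReport()
    furthest = -1
    for line in lines:
        for rank, (stage, marker) in enumerate(STAGE_MARKERS):
            if marker in line and rank > furthest:
                furthest, report.last_stage = rank, stage
        if line.lstrip().startswith("ERROR"):
            report.errors.append(line.strip())
    return report


if __name__ == "__main__":
    failed = [
        "Mellanox BlueField-2 A1 BL1 V1.1",
        "ERROR: DDR Values not val",
    ]
    print(analyze_boot_log(failed))
```

For the failed flash, the report stops at BL1 with a DDR error and no "Finished initializing DDR" line, which places the failure inside BL2's DRAM initialization.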
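Summary
Understanding the BlueField DPU boot flow helps greatly in understanding BlueField's functional components and tools, and gives a clearer picture of how the overall DPU architecture is implemented.
References: https://docs.nvidia.com/networking/display/bfswtroubleshooting/software+installation+and+upgrade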
https://docs.nvidia.com/networking/display/bluefieldbsp480/upgrading+boot+software#src-3094733907_UpgradingBootSoftware-UEFISystemConfiguration