
Huawei vs. NVIDIA: Data Center AI Compute Comparison

article 2025/2/28 15:45:28

Huawei Ascend 910B

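The Ascend 910B is a high-performance AI processor that Huawei launched in 2023, positioned against the NVIDIA A100/A800. Its compute specifications are as follows:

Peak compute: half-precision (FP16) performance reaches 256 TFLOPS (256 trillion floating-point operations per second).
Integer performance: INT8 performance reaches 512 TOPS.
Single-precision performance: FP32 performance reaches 128 TFLOPS.
Energy efficiency: the Ascend 910B delivers 5.2 TFLOPS/W, versus 4.7 TFLOPS/W for the NVIDIA A100, which would make the Ascend 910B the more power-efficient of the two.
Memory bandwidth: 768 GB/s.
Interconnect bandwidth: chip-to-chip interconnect bandwidth is 600 GB/s; card-to-card communication uses PCIe 4.0 x16, with a theoretical bandwidth of 31.5 GB/s.
Power: maximum power consumption is 350 W.
Real-world comparison: after joint optimization by iFLYTEK and Huawei, the Ascend 910B reportedly matches NVIDIA A100 performance in iFLYTEK's own workloads.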
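Two of the figures in this list can be sanity-checked with simple arithmetic. The sketch below assumes that "performance per watt" means peak FP16 TFLOPS divided by maximum power, which the list does not actually state; under that assumption the Ascend 910B comes out near 0.73 TFLOPS/W rather than 5.2, so the 5.2 and 4.7 TFLOPS/W numbers were presumably derived some other way. It also reproduces the 31.5 GB/s PCIe 4.0 x16 figure from the 16 GT/s lane rate and 128b/130b line encoding. The function names are illustrative only.

```python
# Derived figures for the Ascend 910B, computed only from the numbers listed above.
# Assumption: "performance per watt" is taken naively as peak FP16 TFLOPS / max power;
# the source does not say how its 5.2 and 4.7 TFLOPS/W figures were measured.

def tflops_per_watt(peak_tflops: float, power_w: float) -> float:
    """Naive efficiency: peak throughput divided by rated power."""
    return peak_tflops / power_w


def pcie_gbps(gt_per_s: float, lanes: int, payload_bits: int = 128, block_bits: int = 130) -> float:
    """Theoretical one-direction PCIe bandwidth in GB/s.

    PCIe 3.0 and later use 128b/130b line encoding, so only 128 of every
    130 transferred bits carry data.
    """
    bits_per_s = gt_per_s * 1e9 * lanes * payload_bits / block_bits
    return bits_per_s / 8 / 1e9


ascend_eff = tflops_per_watt(256, 350)  # FP16 peak and max power from the list above
pcie4_x16 = pcie_gbps(16, 16)           # PCIe 4.0 runs at 16 GT/s per lane

print(f"Ascend 910B naive FP16 efficiency: {ascend_eff:.2f} TFLOPS/W")
print(f"PCIe 4.0 x16 theoretical bandwidth: {pcie4_x16:.1f} GB/s per direction")
```

The 31.5 GB/s value is per direction; a bidirectional figure would be roughly double.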

NVIDIA A100

Precision	A100 80GB PCIe	A100 80GB SXM
FP64	9.7 TFLOPS	9.7 TFLOPS
FP64 Tensor Core	19.5 TFLOPS	19.5 TFLOPS
FP32	19.5 TFLOPS	19.5 TFLOPS
TF32 Tensor Core	156 TFLOPS (312*)	156 TFLOPS (312*)
BFLOAT16 Tensor Core	312 TFLOPS (624*)	312 TFLOPS (624*)
FP16 Tensor Core	312 TFLOPS (624*)	312 TFLOPS (624*)
INT8 Tensor Core	624 TOPS (1248*)	624 TOPS (1248*)
GPU Memory	80GB HBM2e	80GB HBM2e
GPU Memory Bandwidth	1935 GB/s	2039 GB/s
TDP	300 W	400 W
Form factor	PCIe 4.0	SXM

* With sparsity.
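To put the Ascend 910B and A100 spec sheets side by side, the sketch below divides the Ascend figures quoted earlier by the A100 80GB PCIe figures, using dense (non-sparse) Tensor Core peaks for the A100. The dictionary keys are made up for illustration, and peak ratios say nothing about achieved throughput on real workloads, which is what the iFLYTEK comparison in the Ascend section refers to.

```python
# Hypothetical helper: ratio of Ascend 910B figures (as listed in this article)
# to NVIDIA A100 80GB PCIe dense figures (no sparsity).
ASCEND_910B = {"FP16 TFLOPS": 256, "INT8 TOPS": 512, "Memory BW GB/s": 768, "Power W": 350}
A100_PCIE = {"FP16 TFLOPS": 312, "INT8 TOPS": 624, "Memory BW GB/s": 1935, "Power W": 300}


def ratios(a: dict, b: dict) -> dict:
    """Return a[k] / b[k] for every key the two spec sheets share."""
    return {k: a[k] / b[k] for k in a if k in b}


for metric, r in ratios(ASCEND_910B, A100_PCIE).items():
    print(f"{metric:>15}: Ascend 910B / A100 = {r:.2f}")
```

On these listed numbers the Ascend 910B sits at roughly 82% of the A100's dense FP16 and INT8 peaks, but at only about 40% of its memory bandwidth.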

NVIDIA H100

NVIDIA H100 Tensor Core GPU

Precision	H100 SXM	H100 NVL
FP64	34 TFLOPS	30 TFLOPS
FP64 Tensor Core	67 TFLOPS	60 TFLOPS
FP32	67 TFLOPS	60 TFLOPS
TF32 Tensor Core*	989 TFLOPS	835 TFLOPS
BFLOAT16 Tensor Core*	1979 TFLOPS	1671 TFLOPS
FP16 Tensor Core*	1979 TFLOPS	1671 TFLOPS
FP8 Tensor Core*	3958 TFLOPS	3341 TFLOPS
INT8 Tensor Core*	3958 TOPS	3341 TOPS
GPU Memory	80GB	94GB
GPU Memory Bandwidth	3.35TB/s	3.9TB/s
TDP	700 W	400 W
Form factor	SXM	PCIe 5.0

* With sparsity; dense peaks are half of these values.

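The PCIe-based NVIDIA H100 NVL (with an NVLink bridge) combines the Transformer Engine, NVLink, and 188 GB of HBM3 memory (94 GB per card across an NVLink-bridged pair). NVIDIA positions it as delivering top performance and easy scaling in any data center, bringing large language models into the mainstream.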
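NVIDIA's H100 datasheet quotes the TF32, BF16, FP16, FP8, and INT8 Tensor Core peaks with 2:4 structured sparsity, which doubles the headline number relative to dense math; the FP64 and FP32 rows are unaffected. If the Ascend 910B figures earlier in the article are dense (an assumption, since the source does not say), a like-for-like comparison should halve those H100 rows. The sketch below does exactly that; the dictionary and key names are illustrative and do not come from any NVIDIA tool.

```python
# Convert NVIDIA's "with sparsity" Tensor Core peaks to dense peaks.
# Assumption: the TF32/BF16/FP16/FP8/INT8 rows of the H100 table are quoted
# with 2:4 structured sparsity, which doubles the peak relative to dense math.
SPARSE_ROWS = {"TF32", "BF16", "FP16", "FP8", "INT8"}

H100_SXM = {"FP64": 34, "FP32": 67, "TF32": 989, "BF16": 1979,
            "FP16": 1979, "FP8": 3958, "INT8": 3958}


def dense_peaks(spec: dict) -> dict:
    """Halve only the rows that were quoted with structured sparsity."""
    return {k: (v / 2 if k in SPARSE_ROWS else v) for k, v in spec.items()}


for precision, value in dense_peaks(H100_SXM).items():
    print(f"{precision:>5}: {value:g}")
```

With this adjustment the H100 SXM's dense FP16 peak is 989.5 TFLOPS, which is the figure to set against the Ascend 910B's 256 TFLOPS.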

NVIDIA H200

NVIDIA H200 Tensor Core GPU

Precision	H200 SXM	H200 NVL
FP64	34 TFLOPS	30 TFLOPS
FP64 Tensor Core	67 TFLOPS	60 TFLOPS
FP32	67 TFLOPS	60 TFLOPS
TF32 Tensor Core*	989 TFLOPS	835 TFLOPS
BFLOAT16 Tensor Core*	1979 TFLOPS	1671 TFLOPS
FP16 Tensor Core*	1979 TFLOPS	1671 TFLOPS
FP8 Tensor Core*	3958 TFLOPS	3341 TFLOPS
INT8 Tensor Core*	3958 TOPS	3341 TOPS
GPU Memory	141GB	141GB
GPU Memory Bandwidth	4.8TB/s	4.8TB/s
TDP	700 W	600 W
Form factor	SXM	PCIe 5.0

* With sparsity; dense peaks are half of these values.

Built on the NVIDIA Hopper architecture, the NVIDIA H200 is the first GPU to offer 141 GB of HBM3e memory with 4.8 TB/s of memory bandwidth.
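The H200 compute rows are identical to the H100's; what changes is the memory subsystem. The sketch below computes the H200 SXM's uplift over the H100 SXM from the listed values, plus the time needed to read the entire HBM once at peak bandwidth. That second number is only a rough lower bound, assuming decimal units (1 TB = 1000 GB) and perfect bandwidth utilization; the helper names are hypothetical.

```python
# Compare H200 SXM with H100 SXM using only the table values in this article.
H100_SXM = {"FP8 TFLOPS": 3958, "memory_gb": 80, "bandwidth_tbs": 3.35, "tdp_w": 700}
H200_SXM = {"FP8 TFLOPS": 3958, "memory_gb": 141, "bandwidth_tbs": 4.8, "tdp_w": 700}


def uplift(new: dict, old: dict) -> dict:
    """Per-metric ratio new / old."""
    return {k: new[k] / old[k] for k in old}


def full_read_ms(memory_gb: float, bandwidth_tbs: float) -> float:
    """Milliseconds to stream all of HBM once at peak bandwidth (1 TB = 1000 GB)."""
    seconds = memory_gb / (bandwidth_tbs * 1000)
    return seconds * 1000


for metric, r in uplift(H200_SXM, H100_SXM).items():
    print(f"{metric:>13}: {r:.2f}x")

for name, spec in (("H100 SXM", H100_SXM), ("H200 SXM", H200_SXM)):
    ms = full_read_ms(spec["memory_gb"], spec["bandwidth_tbs"])
    print(f"{name}: {ms:.1f} ms to stream full memory")
```

Capacity grows by about 1.76x but bandwidth only by about 1.43x, so a full pass over the larger memory takes slightly longer on the H200 (about 29 ms versus 24 ms), even though any fixed-size working set is read roughly 1.43x faster.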

NVIDIA GB200 & GB200 NVL72

Specification	GB200 NVL72	GB200
Configuration	36 Grace CPUs : 72 Blackwell GPUs	1 Grace CPU : 2 Blackwell GPUs
FP4 Tensor Core	1440 PFLOPS	40 PFLOPS
FP8/FP6 Tensor Core	720 PFLOPS	20 PFLOPS
INT8 Tensor Core	720 POPS	20 POPS
FP16/BF16 Tensor Core	360 PFLOPS	10 PFLOPS
TF32 Tensor Core	180 PFLOPS	5 PFLOPS
FP32	6480 TFLOPS	180 TFLOPS
FP64	3240 TFLOPS	90 TFLOPS
FP64 Tensor Core	3240 TFLOPS	90 TFLOPS
GPU Memory	Up to 13.5 TB HBM3e	Up to 384 GB HBM3e
GPU Memory Bandwidth	576 TB/s	16 TB/s
NVLink Bandwidth	130TB/s	3.6TB/s
CPU Core Count	2592 Arm Neoverse V2 cores	72 Arm Neoverse V2 cores
CPU Memory	Up to 17 TB LPDDR5X	Up to 480GB LPDDR5X
CPU Bandwidth	Up to 18.4 TB/s	Up to 512 GB/s

GB200 NVL72 architecture:

Combines 36 Grace Blackwell Superchips (72 Blackwell GPUs and 36 Grace CPUs in total), interconnected with fifth-generation NVLink.
Each Grace Blackwell Superchip contains two high-performance NVIDIA Blackwell Tensor Core GPUs and one NVIDIA Grace CPU, linked by NVIDIA NVLink-C2C.
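If the rack is exactly 36 GB200 Superchips, as described above, the NVL72 column should follow from the single-GB200 column. The sketch multiplies the per-superchip figures by 36 and prints them next to the table values. It assumes these rows scale linearly with superchip count, and it leaves out the memory-capacity rows because the "up to" figures do not multiply out exactly (36 × 384 GB = 13,824 GB versus the listed 13.5 TB). The dictionary keys are illustrative.

```python
# Rebuild the GB200 NVL72 column from the single-GB200 column.
# Assumption: the rack is exactly 36 GB200 Superchips and these rows scale linearly.
SUPERCHIPS = 36

GB200 = {  # per superchip: 1 Grace CPU + 2 Blackwell GPUs
    "FP4 PFLOPS": 40, "FP16/BF16 PFLOPS": 10, "FP32 TFLOPS": 180,
    "GPU mem BW TB/s": 16, "NVLink TB/s": 3.6,
    "CPU cores": 72, "CPU BW GB/s": 512,
    "GPUs": 2, "CPUs": 1,
}
NVL72_TABLE = {  # values from the GB200 NVL72 column of the table
    "FP4 PFLOPS": 1440, "FP16/BF16 PFLOPS": 360, "FP32 TFLOPS": 6480,
    "GPU mem BW TB/s": 576, "NVLink TB/s": 130,
    "CPU cores": 2592, "CPU BW GB/s": 18_400,
    "GPUs": 72, "CPUs": 36,
}


def scaled(per_unit: dict, n: int) -> dict:
    """Multiply every per-superchip figure by the number of superchips."""
    return {k: v * n for k, v in per_unit.items()}


for key, value in scaled(GB200, SUPERCHIPS).items():
    print(f"{key:>17}: 36 x GB200 = {value:g}, table = {NVL72_TABLE[key]:g}")
```

Most rows match exactly; NVLink (129.6 vs. 130 TB/s) and CPU bandwidth (18,432 vs. 18,400 GB/s) differ only by the rounding used in the table.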


http://www.kler.cn/a/452289.html
