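About TFLOPS, GFLOPS, and TOPS
Preface
With artificial intelligence and high-performance computing advancing rapidly, the terms "GFLOPS", "TOPS", and "TFLOPS" appear constantly in chip specifications and device reviews.
One example is NVIDIA's official technical specifications for the Jetson Nano and the rest of the Jetson lineup, reproduced below.
These units act as the yardsticks of computing power and are key indicators when judging hardware performance.
So what do these abbreviations actually stand for, and how do they relate to and differ from one another?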
Jetson Modules, Support, Ecosystem, and Lineup | NVIDIA Developer
Technical Specifications
Jetson AGX Orin Series | Jetson Orin NX Series | Jetson Orin Nano Series | Jetson AGX Xavier Series | Jetson Xavier NX Series | Jetson TX2 Series | Jetson Nano | |||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Jetson AGX Orin 64GB | Jetson AGX Orin Industrial | Jetson AGX Orin 32GB | Jetson Orin NX 16GB | Jetson Orin NX 8GB | Jetson Orin Nano 8GB | Jetson Orin Nano 4GB | Jetson AGX Xavier Industrial | Jetson AGX Xavier 64GB | Jetson AGX Xavier | Jetson Xavier NX 16GB | Jetson Xavier NX | TX2i | TX2 | TX2 4GB | TX2 NX | ||
AI Performance | 275 TOPS | 248 TOPS | 200 TOPS | 157 TOPS | 117 TOPS | 67 TOPS | 34 TOPS | 30 TOPS | 32 TOPS | 21 TOPS | 1.26 TFLOPS | 1.33 TFLOPS | 472 GFLOPS | ||||
GPU | 2048-core NVIDIA Ampere architecture GPU with 64 Tensor Cores | 1792-core NVIDIA Ampere architecture GPU with 56 Tensor Cores | 1024-core NVIDIA Ampere architecture GPU with 32 Tensor Cores | 1024-core NVIDIA Ampere architecture GPU with 32 Tensor Cores | 512-core NVIDIA Ampere architecture GPU with 16 Tensor Cores | 512-core NVIDIA Volta architecture GPU with 64 Tensor Cores | 384-core NVIDIA Volta™ architecture GPU with 48 Tensor Cores | 256-core NVIDIA Pascal™ architecture GPU | 128-core NVIDIA Maxwell™ architecture GPU | ||||||||
GPU Max Frequency | 1.3 GHz | 1.2 GHz | 930 MHz | 1173 MHz | 1173 MHz | 1020 MHz | 1020 MHz | 1211 MHz | 1377 MHz | 1100 MHz | 1.12 GHz | 1.3 GHz | 921 MHz | ||||
CPU | 12-core NVIDIA Arm® Cortex A78AE v8.2 64-bit CPU 3MB L2 + 6MB L3 | 8-core NVIDIA Arm® Cortex A78AE v8.2 64-bit CPU 2MB L2 + 4MB L3 | 6-core NVIDIA Arm® Cortex A78AE v8.2 64-bit CPU 2MB L2 + 4MB L3 | 6-core Arm® Cortex®-A78AE v8.2 64-bit CPU 1.5MB L2 + 4MB L3 | 8-core NVIDIA Carmel Arm®v8.2 64-bit CPU 8MB L2 + 4MB L3 | 6-core NVIDIA Carmel Arm®v8.2 64-bit CPU 6MB L2 + 4MB L3 | Dual-Core NVIDIA Denver 2 64-Bit CPU and Quad-Core Arm® Cortex®-A57 MPCore processor | Quad-Core Arm® Cortex®-A57 MPCore processor | |||||||||
CPU Max Frequency | 2.2 GHz | 2.0 GHz | 2.2 GHz | 2.0 GHz | 1.7 GHz | 2.0 GHz | 2.2 GHz | 1.9 GHz | Denver 2: 1.95 GHz Cortex-A57: 1.92 GHz | Denver 2: 2.2 GHz Cortex-A57: 2 GHz | 1.43 GHz | ||||||
DL Accelerator | 2x NVDLA v2.0 | 1x NVDLA v2.0 | — | 2x NVDLA | — | ||||||||||||
DL Max Frequency | 1.6 GHz | 1.4 GHz | 1.23 GHz | — | 1.2 GHz | 1.4 GHz | 1.1 GHz | — | |||||||||
Vision Accelerator | 1 x PVA v2.0 | — | 2x PVA v1.0 | — | |||||||||||||
Safety Cluster Engine | — | — | — | 2x Arm® Cortex®-R5 in lockstep | — | — | — | ||||||||||
Memory | 64GB 256-bit LPDDR5 204.8GB/s | 64GB 256-bit LPDDR5 (+ ECC) 204.8GB/s | 32GB 256-bit LPDDR5 204.8GB/s | 16GB 128-bit LPDDR5 102.4GB/s | 8GB 128-bit LPDDR5 102.4GB/s | 8GB 128-bit LPDDR5 68 GB/s | 4GB 64-bit LPDDR5 51 GB/s | 32GB 256-bit LPDDR4x (ECC support) 136.5GB/s | 64GB 256-bit LPDDR4x 136.5GB/s | 32GB 256-bit LPDDR4x 136.5GB/s | 16GB 128-bit LPDDR4x 59.7GB/s | 8GB 128-bit LPDDR4x 59.7GB/s | 8GB 128-bit LPDDR4 (ECC Support) 51.2GB/s | 8GB 128-bit LPDDR4 59.7GB/s | 4GB 128-bit LPDDR4 51.2GB/s | 4GB 64-bit LPDDR4 25.6GB/s | |
Storage | 64GB eMMC 5.1 | — | — | 64GB eMMC 5.1 | 32GB eMMC 5.1 | 16GB eMMC 5.1 | 32GB eMMC 5.1 | 16GB eMMC 5.1 | 16GB eMMC 5.1† | ||||||||
Video Encode | 2x 4K60 (H.265) 4x 4K30 (H.265) 8x 1080p60 (H.265) 16x 1080p30 (H.265) | 1x 4K60 (H.265) 3x 4K30 (H.265) 7x 1080p60 (H.265) 15x 1080p30 (H.265) | 1x 4K60 (H.265) 3x 4K30 (H.265) 6x 1080p60 (H.265) 12x 1080p30 (H.265) | 1080p30 supported by 1-2 CPU cores | 2x 4K60 (H.265) 6x 4K30 (H.265) 12x 1080p60 (H.265) 24x 1080p30 (H.265) | 4x 4K60 (H.265) 8x 4K30 (H.265) 16x 1080p60 (H.265) 32x 1080p30 (H.265) | 2x 4K60 (H.265) 4x 4K30 (H.265) 10x 1080p60 (H.265) 22x 1080p30 (H.265) | 1x 4K60 (H.265) 3x 4K30 (H.265) 4x 1080p60 (H.265) | 1x 4K30 (H.265) 2x 1080p60 (H.265) | ||||||||
Video Decode | 1x 8K30 (H.265) 3x 4K60 (H.265) 7x 4K30 (H.265) 11x 1080p60 (H.265) 22x 1080p30 (H.265) | 1x 8K30 (H.265) 3x 4K60 (H.265) 7x 4K30 (H.265) 11x 1080p60 (H.265) 23x 1080p30 (H.265) | 1x 8K30 (H.265) 2x 4K60 (H.265) 4x 4K30 (H.265) 9x 1080p60 (H.265) 18x 1080p30 (H.265) | 1x 4K60 (H.265) 2x 4K30 (H.265) 5x 1080p60 (H.265) 11x 1080p30 (H.265) | 2x 8K30 (H.265) 4x 4K60 (H.265) 8x 4K30 (H.265) 18x 1080p60 (H.265) 36x 1080p30 (H.265) | 2x 8K30 (H.265) 6x 4K60 (H.265) 12x 4K30 (H.265) 26x 1080p60 (H.265) 52x 1080p30 (H.265) | 2x 8K30 (H.265) 6x 4K60 (H.265) 12x 4K30 (H.265) 22x 1080p60 (H.265) 44x 1080p30 (H.265) | 2x 4K60 (H.265) 7x 1080p60 (H.265) 14x 1080p30 (H.265) | 1x 4K60 (H.265) 4x 1080p60 (H.265) | ||||||||
CSI Camera | Up to 6 cameras (16 via virtual channels**) 16 lanes MIPI CSI-2 D-PHY 2.1 (up to 40Gbps) | C-PHY 2.0 (up to 164Gbps) | 2x MIPI CSI-2 22-pin Camera Connectors | Up to 4 cameras (8 via virtual channels***) 8 lanes MIPI CSI-2 D-PHY 2.1 (up to 20Gbps) | Up to 4 cameras (8 via virtual channels***) 8 lanes MIPI CSI-2 D-PHY 2.1 (up to 20Gbps) | Up to 6 cameras (36 via virtual channels) 16 lanes MIPI CSI-2 D-PHY 1.2 (up to 40 Gbps) C-PHY 1.1 (up to 62 Gbps) | Up to 6 cameras (36 via virtual channels) 16 lanes MIPI CSI-2 | 8 lanes SLVS-EC D-PHY 1.2 (up to 40 Gbps) C-PHY 1.1 (up to 62 Gbps) | Up to 6 cameras (24 via virtual channels) 14 lanes MIPI CSI-2 D-PHY 1.2 (up to 30 Gbps) | Up to 6 cameras (12 via virtual channels) 12 lanes MIPI CSI-2 D-PHY 1.2 (up to 30 Gbps) | Up to 5 cameras (12 via virtual channels) 12 lanes MIPI CSI-2 D-PHY 1.1 (up to 30 Gbps) | Up to 4 cameras 12 lanes MIPI CSI-2 D-PHY 1.1 (up to 18 Gbps) | |||||||
PCIE* | Up to 2 x8 + 2 x4 + 2 x1 (PCIe Gen4, Root Port & Endpoint) | 1 x4 + 3 x1 (PCIe Gen4, Root Port & Endpoint) | 1 x4 + 3 x1 (PCIe Gen3, Root Port, & Endpoint) | 1 x8 + 1 x4 + 1 x2 + 2 x1 (PCIe Gen4, Root Port & Endpoint) | 1 x4 (PCIe Gen4) + 1 x1 (PCIe Gen3) | up to 1 x1 + 1 x4 OR 1 x1 + 1 x1 + 1 x2 (PCIe Gen2) | 1 x1 + 1 x2 (PCIe Gen2) | 1 x4 (PCIe Gen2) | |||||||||
USB* | 3x USB 3.2 Gen2 (10 Gbps) 4x USB 2.0 | 3x USB 3.2 Gen2 (10 Gbps) 3x USB 2.0 | 3x USB 3.2 Gen2 (10 Gbps) 3x USB 2.0 | 3x USB 3.2 Gen2 (10 Gbps) 4x USB 2.0 | 1x USB 3.2 Gen2 (10 Gbps) 3x USB 2.0 | up to 3x USB 3.0 (5 Gbps) 3x USB 2.0 | 1x USB 3.0 (5 Gbps) 3x USB 2.0 | 1x USB 3.0 (5 Gbps) 3x USB 2.0 | |||||||||
Networking* | 1x GbE 1x 10GbE | 1x GbE | 1x GbE | 1x GbE | 1x GbE | 1x GbE | 1x GbE, WLAN | 1x GbE | 1x GbE | ||||||||
Display | 1x 8K60 multi-mode DP 1.4a (+MST)/eDP 1.4a/HDMI 2.1 | 1x 8K60 multi-mode DP 1.4a (+MST)/eDP 1.4a/HDMI 2.1 | 1x 4K30 multi-mode DP 1.2 (+MST)/eDP 1.4/HDMI 1.4** | 3 multi-mode DP 1.4/eDP 1.4/HDMI 2.0 | 2 multi-mode DP 1.4/eDP 1.4/HDMI 2.0 | 2 multi-mode DP 1.2/eDP 1.4/HDMI 2.0 2 x4 DSI (1.5Gbps/lane) | 2 multi-mode DP 1.2/eDP 1.4/HDMI 2.0 1 x2 DSI (1.5Gbps/lane) | 2 multi-mode DP 1.2/eDP 1.4/HDMI 2.0 1 x2 DSI (1.5Gbps/lane) | |||||||||
Other IO | 4x UART, 3x SPI, 4x I2S, 8x I2C, 2x CAN, PWM, DMIC & DSPK, GPIOs | 3x UART, 2x SPI, 2x I2S, 4x I2C, 1x CAN, DMIC & DSPK, PWM, GPIOs | 3x UART, 2x SPI, 2x I2S, 4x I2C, 1x CAN, DMIC & DSPK, PWM, GPIOs | 5x UART, 3x SPI, 4x I2S, 8x I2C, 2x CAN, PWM, DMIC, GPIOs | 3x UART, 2x SPI, 2x I2S, 4x I2C, 1x CAN, DMIC & DSPK, PWM, GPIOs | 5x UART, 3x SPI, 4x I2S, 8x I2C, 2x CAN, GPIOs | 3x UART, 2x SPI, 4x I2S, 4x I2C, 1x CAN, GPIOs | 3x UART, 2x SPI, 2x I2S, 4x I2C, GPIOs | |||||||||
Power | 15W - 60W | 15W - 75W | 15W - 40W | 10W - 15W - 25W - 40W | 10W - 15W - 25W - 40W | 7W - 15W - 25W | 7W - 10W - 25W | 20W - 40W | 10W - 30W | 10W - 20W | 10W - 20W | 7.5W - 15W | 5W - 10W | ||||
Mechanical | 100mm x 87mm 699-pin Molex Mirror Mezz Connector Integrated Thermal Transfer Plate | 69.6mm x 45mm 260-pin SO-DIMM connector | 69.6mm x 45mm 260-pin SO-DIMM connector | 100mm x 87mm 699-pin connector Integrated Thermal Transfer Plate | 69.6mm x 45mm 260-pin SO-DIMM connector | 87mm x 50mm 400-pin connector Integrated Thermal Transfer Plate | 69.6mm x 45mm 260-pin SO-DIMM connector | 69.6mm x 45mm 260-pin SO-DIMM connector | |||||||||
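Understanding compute units from the basics
GFLOPS (Giga Floating-point Operations Per Second) means one billion floating-point operations per second (1 GFLOPS = 10^9 FLOPS). It measures how fast hardware can process floating-point arithmetic.
Scientific computing and graphics rendering involve large volumes of non-integer arithmetic. Weather forecasting, for example, simulates atmospheric change by computing over massive sets of fractional meteorological data. All else being equal, a higher GFLOPS figure means the machine finishes these calculations and returns results sooner.
TFLOPS (Tera Floating-point Operations Per Second) means one trillion floating-point operations per second (1 TFLOPS = 10^12 FLOPS). It is the next unit up from GFLOPS: 1 TFLOPS equals 1000 GFLOPS.
As AI models and complex computing workloads demand ever more compute, TFLOPS has become a standard performance measure for high-end GPUs and AI accelerators. In deep-learning training, for example, a model processes huge amounts of data and performs heavy floating-point computation to adjust its parameters, so hardware with a higher TFLOPS figure can noticeably shorten training time and speed up development.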
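To see where figures like these come from, a common back-of-the-envelope estimate is cores × clock × operations per core per cycle. The sketch below is only that, an estimate under stated assumptions: it assumes each GPU core completes one fused multiply-add per cycle (counted as 2 floating-point operations) and that FP16 runs at twice the FP32 rate. Neither assumption is stated in the spec table, and the function name `peak_gflops` is made up for illustration. With the Jetson Nano's 128 cores at 921 MHz, the FP16 estimate lands close to the 472 GFLOPS listed in the table.

```python
def peak_gflops(cores: int, clock_mhz: float,
                flops_per_core_per_cycle: float = 2.0) -> float:
    """Theoretical peak throughput in GFLOPS.

    flops_per_core_per_cycle defaults to 2 (one fused multiply-add per
    cycle). This is an assumption about the hardware, not a measurement.
    """
    flops_per_second = cores * clock_mhz * 1e6 * flops_per_core_per_cycle
    return flops_per_second / 1e9


# Jetson Nano values taken from the spec table: 128 cores, 921 MHz.
nano_fp32 = peak_gflops(128, 921)        # FP32 estimate
nano_fp16 = peak_gflops(128, 921, 4.0)   # assumes FP16 at 2x the FP32 rate

print(f"Nano FP32 peak estimate: {nano_fp32:.1f} GFLOPS")
print(f"Nano FP16 peak estimate: {nano_fp16:.1f} GFLOPS "
      f"({nano_fp16 / 1000:.3f} TFLOPS)")
```

These are theoretical peaks. Sustained throughput on real workloads is usually lower because of memory bandwidth and scheduling limits.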
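TOPS (Tera Operations Per Second) means one trillion operations per second (1 TOPS = 10^12 OPS). Unlike TFLOPS, TOPS is not limited to floating-point operations: it can count integer operations, logical operations, or a mix of operation types.
Many AI workloads consist largely of integer multiply-accumulate operations, so a TOPS figure, typically quoted at a low integer precision such as INT8, reflects what the chip can do on those workloads better than a floating-point figure would. This is why TOPS is commonly used to rate smartphone AI chips and edge-computing devices.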
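A TOPS figure is commonly derived by counting each multiply-accumulate (MAC) as two operations, one multiply and one add. The sketch below shows that arithmetic for a hypothetical accelerator with 4096 MAC units running at 1 GHz. The numbers and the function name `peak_tops` are invented for illustration and do not describe any real chip.

```python
def peak_tops(mac_units: int, clock_ghz: float, ops_per_mac: int = 2) -> float:
    """Peak throughput in TOPS for an array of MAC units.

    ops_per_mac=2 follows the common convention of counting the multiply
    and the add separately; a vendor may count differently.
    """
    ops_per_second = mac_units * clock_ghz * 1e9 * ops_per_mac
    return ops_per_second / 1e12


# Hypothetical accelerator: 4096 MAC units at 1 GHz.
example_tops = peak_tops(4096, 1.0)
print(f"Hypothetical accelerator peak: {example_tops:.3f} TOPS")
```

Because the result depends on what counts as one operation and at what precision, two TOPS figures are comparable only when both conventions match.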
Unit | In GFLOPS | In TFLOPS |
---|---|---|
1 GFLOPS | 1 GFLOPS | 0.001 TFLOPS |
1 TFLOPS | 1000 GFLOPS | 1 TFLOPS |
1 TOPS | Depends on the operation type (see below) | Depends on the operation type (see below) |
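The floating-point rows of this table are plain SI-prefix scaling, which can be sketched in a few lines. The names `PREFIX_FACTORS` and `convert` are illustrative, and the helper deliberately handles only the prefix: it makes no attempt to convert between TOPS and TFLOPS.

```python
# Powers of ten behind the SI prefixes used for FLOPS and OPS figures.
PREFIX_FACTORS = {"K": 1e3, "M": 1e6, "G": 1e9, "T": 1e12, "P": 1e15}


def convert(value: float, from_prefix: str, to_prefix: str) -> float:
    """Rescale a throughput figure between SI prefixes (same operation type)."""
    return value * PREFIX_FACTORS[from_prefix] / PREFIX_FACTORS[to_prefix]


# Two figures from the spec table, rescaled.
nano_tflops = convert(472, "G", "T")    # 472 GFLOPS expressed in TFLOPS
tx2_gflops = convert(1.33, "T", "G")    # 1.33 TFLOPS expressed in GFLOPS

print(f"472 GFLOPS  = {nano_tflops:.3f} TFLOPS")
print(f"1.33 TFLOPS = {tx2_gflops:.0f} GFLOPS")
```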
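Key point: how TOPS differs from TFLOPS/GFLOPS
- TOPS (Tera Operations Per Second): one trillion operations per second, with no restriction on the operation type (integer, logical, or floating-point).
  - If a TOPS figure refers specifically to floating-point operations, then 1 TOPS = 1 TFLOPS. This holds only in specific cases and requires an explicit statement from the vendor.
  - If a TOPS figure includes integer operations, it cannot be converted directly to TFLOPS/GFLOPS, because integer and floating-point operations differ in precision and hardware cost.
- TFLOPS/GFLOPS: measure floating-point throughput only, and convert directly between each other: 1 TFLOPS = 1000 GFLOPS.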
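One way to keep this rule from being broken by accident is to carry the operation type alongside the number and refuse to compare mismatched figures. The sketch below is a minimal illustration of that idea. The `Throughput` class, its fields, and all the values are hypothetical and do not describe any real device.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Throughput:
    tera_ops: float   # magnitude in tera-operations per second
    precision: str    # operation type, e.g. "FP32", "FP16", "INT8"

    def ratio_to(self, other: "Throughput") -> float:
        """Return self / other, but only for matching operation types."""
        if self.precision != other.precision:
            raise ValueError(
                f"cannot compare {self.precision} with {other.precision}"
            )
        return self.tera_ops / other.tera_ops


# Hypothetical figures, chosen only to exercise the check.
chip_a = Throughput(40.0, "INT8")
chip_b = Throughput(20.0, "INT8")
gpu = Throughput(10.0, "FP16")

print(chip_a.ratio_to(chip_b))   # same precision, so the ratio is meaningful

try:
    chip_a.ratio_to(gpu)         # INT8 vs FP16: rejected
except ValueError as err:
    print(f"refused: {err}")
```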
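Where each unit is typically used
1. GFLOPS is mainly used in traditional scientific computing and graphics processing.
In 3D game development, for example, the graphics card must render complex scenes in real time and process large amounts of graphics data. A GFLOPS figure gives a direct indication of the card's rendering throughput, helping developers and players judge whether it can run a game smoothly at high quality settings.
2. TFLOPS plays the key role in AI training and inference, high-performance computing, and similar frontier fields.
Take large language model training: the parameter count is huge, and an enormous number of floating-point operations must be completed in a limited time, so a server cluster with a high TFLOPS figure can substantially cut training cost and time. In supercomputing centers, scientists use high-TFLOPS machines for climate simulation, drug discovery, and other complex research, speeding up the scientific process.
3. TOPS is widely used in edge-computing scenarios such as mobile devices, smart home products, and smart surveillance.
The AI chip in a smartphone, for example, must deliver image recognition, voice assistants, and similar features within a tight power budget. A TOPS figure helps vendors and consumers gauge how efficiently the chip handles these AI tasks, and therefore how capable the device is.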
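Two of the scenarios above reduce to simple arithmetic: training time is roughly total work divided by sustained throughput, and edge efficiency is often summarized as TOPS per watt. The sketch below shows both. The function names, the workload size (10^21 floating-point operations), the 1000 TFLOPS cluster, and the 40% utilization are all made-up assumptions for illustration. The 275 TOPS and 60 W values are the Jetson AGX Orin 64GB entries from the spec table, with 60 W taken as the upper end of its listed power range.

```python
SECONDS_PER_DAY = 86_400


def training_days(total_flops: float, peak_tflops: float,
                  utilization: float) -> float:
    """Rough training time: total work / (peak throughput * utilization)."""
    sustained_flops_per_second = peak_tflops * 1e12 * utilization
    return total_flops / sustained_flops_per_second / SECONDS_PER_DAY


def tops_per_watt(tops: float, watts: float) -> float:
    """Efficiency metric often quoted for edge AI hardware."""
    return tops / watts


# Hypothetical workload and cluster: 1e21 FLOPs, 1000 TFLOPS peak, 40% utilized.
days = training_days(1e21, 1000, 0.4)
print(f"Estimated training time: {days:.1f} days")

# Doubling the peak throughput halves the estimate under the same assumptions.
print(f"With 2x the TFLOPS: {training_days(1e21, 2000, 0.4):.1f} days")

# Jetson AGX Orin 64GB from the table: 275 TOPS at up to 60 W.
orin_efficiency = tops_per_watt(275, 60)
print(f"AGX Orin 64GB: {orin_efficiency:.2f} TOPS/W at maximum power")
```

Real utilization varies widely with the model, software stack, and interconnect, so estimates like these are best treated as order-of-magnitude guides.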