
NVIDIA Blackwell B200

Arch: Blackwell | Production Ready | TDP: 1000W
Compute Performance (FP4 Tensor): 20.0 PetaFLOPS (▲ 5.0x vs H100)
Dense FP4 performance utilizing 2nd Gen Transformer Engine
B200 Blackwell: 20.0 PF
H200 Hopper: 3.9 PF
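The ratio between the two bars above can be checked with a few lines of arithmetic. The sketch below uses only the figures listed on this page; the function name `speedup` is illustrative. Note that the headline badge is stated against the H100, while the comparison bar is labeled H200.

```python
# Figures as listed on this page, in petaFLOPS (PF).
B200_FP4_PF = 20.0
H200_PF = 3.9


def speedup(new_pf: float, old_pf: float) -> float:
    """Return how many times larger new_pf is than old_pf."""
    if old_pf <= 0:
        raise ValueError("old_pf must be positive")
    return new_pf / old_pf


ratio = speedup(B200_FP4_PF, H200_PF)
print(f"B200 vs H200 (listed figures): {ratio:.2f}x")
```

On the listed figures the ratio comes out slightly above the rounded 5.0x shown in the badge.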
Memory System
192GB HBM3e
8-Hi stacks / 1024-bit interface per stack (8192-bit total)
9.5 TB/s Bandwidth
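As a rough consistency check, the listed bandwidth and the 8192-bit interface width from the specification table imply a per-pin data rate. The sketch below assumes 1 TB/s = 10^12 bytes/s and that the full interface width is active; the function name is illustrative, not part of any vendor tool.

```python
def per_pin_gbit_per_s(bandwidth_tb_s: float, bus_width_bits: int) -> float:
    """Per-pin data rate in Gbit/s implied by a total bandwidth and bus width.

    Assumes decimal units: 1 TB/s = 1e12 bytes/s, 1 Gbit/s = 1e9 bits/s.
    """
    total_bits_per_s = bandwidth_tb_s * 1e12 * 8  # bytes/s -> bits/s
    return total_bits_per_s / bus_width_bits / 1e9


# Figures as listed on this page.
rate = per_pin_gbit_per_s(bandwidth_tb_s=9.5, bus_width_bits=8192)
print(f"Implied per-pin rate: {rate:.2f} Gbit/s")
```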
Interconnect & I/O
1.8 TB/s NVLink 5
Bi-directional total bandwidth
PCIe Gen 6.0 x16
Real-World Applications
Large Language Model Training

Train frontier-scale models with 192GB of HBM3e per GPU. With FP4 Tensor performance rated at 5.0x that of the H100, the B200 can run training jobs that previously required roughly five times as many H100s, reducing cluster size and operational costs for models above 70B parameters.
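To put the 192GB figure in context, the sketch below estimates the memory taken by model weights alone at different precisions. It is a simplification under stated assumptions: decimal gigabytes, weights only, and no optimizer state, gradients, or activations, all of which add substantially to real training memory. The helper name `weight_footprint_gb` is illustrative.

```python
GB = 1e9  # assumption: decimal gigabytes, as in the 192GB figure


def weight_footprint_gb(num_params: float, bits_per_param: int) -> float:
    """Memory needed for model weights alone, in GB."""
    return num_params * bits_per_param / 8 / GB


HBM_CAPACITY_GB = 192  # per-GPU capacity listed on this page
PARAMS_70B = 70e9      # the 70B-parameter threshold mentioned above

for bits in (16, 8, 4):
    size = weight_footprint_gb(PARAMS_70B, bits)
    fits = "fits" if size <= HBM_CAPACITY_GB else "does not fit"
    print(f"{bits}-bit weights: {size:.0f} GB ({fits} in {HBM_CAPACITY_GB} GB)")
```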

High-Throughput Inference

Serve production LLM workloads at scale with the 2nd Gen Transformer Engine and native FP4 quantization. A single B200 node can handle the inference throughput of an entire H100 rack for latency-sensitive applications such as real-time chat and code completion.
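For latency-sensitive decoding, memory bandwidth is often the limiting factor, because every generated token requires reading the model weights. The sketch below computes a simple upper bound for a single sequence from the bandwidth listed on this page. It assumes the weights are read exactly once per token and ignores KV-cache traffic, compute time, and batching, so real throughput will differ; the function name is illustrative.

```python
def decode_tokens_per_second(bandwidth_tb_s: float,
                             num_params: float,
                             bits_per_param: int) -> float:
    """Bandwidth-bound upper limit on tokens/s for one sequence.

    Assumes each token requires one full read of the weights and that
    nothing else (KV cache, compute, interconnect) is a bottleneck.
    """
    weight_bytes = num_params * bits_per_param / 8
    return bandwidth_tb_s * 1e12 / weight_bytes


# 70B parameters at 4-bit precision, using the 9.5 TB/s figure from this page.
bound = decode_tokens_per_second(9.5, 70e9, 4)
print(f"Upper bound: {bound:.0f} tokens/s per sequence")
```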

Scientific Computing & Simulation

Accelerate molecular dynamics, climate modeling, and computational fluid dynamics. The 9.5 TB/s memory bandwidth and NVLink 5 interconnect make multi-GPU simulation workloads up to 4x faster than on the previous generation.

Generative AI & Media

Power video generation, 3D rendering, and multimodal AI pipelines. The B200's 192GB memory capacity supports workloads such as Sora-class video generators and real-time neural radiance fields without the memory bottlenecks that limit H100-based deployments.

Full Technical Specifications
Transistor Count: 208 Billion (4NP Process)
Die Size: Dual-Die CoWoS-L (Reticle Limit x2)
CUDA Cores: 160 Streaming Multiprocessors (Est.)
Tensor Cores: 5th Gen Tensor Core Architecture
Memory Interface: 8192-bit HBM3e
L2 Cache: 128MB Unified
Form Factor: SXM6 / PCIe Add-in Card
Thermal Design Power: 1000W (Configurable)