
AMD Instinct MI355X

Arch: CDNA 4 | Production Ready | TDP 1400W
Compute Performance (FP8 Matrix): 1.9x vs MI300X
10.1 PetaFLOPS
Peak FP8 matrix performance with doubled per-CU throughput on CDNA 4
MI355X (CDNA 4): 10.1 PF
MI300X (CDNA 3): 5.3 PF
B200 (Blackwell): 9.0 PF
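The 1.9x headline is the ratio of the peak FP8 figures listed above. The minimal Python sketch below reproduces that arithmetic using only the numbers on this page. Keep in mind that a ratio of vendor peak figures is not a measured workload speedup.

```python
# Peak FP8 matrix figures as listed on this page, in PFLOPS.
PEAK_FP8_PFLOPS = {
    "MI355X": 10.1,
    "MI300X": 5.3,
    "B200": 9.0,
}


def speedup(a: str, b: str) -> float:
    """Ratio of the peak FP8 figure of accelerator a to that of accelerator b."""
    return PEAK_FP8_PFLOPS[a] / PEAK_FP8_PFLOPS[b]


if __name__ == "__main__":
    for other in ("MI300X", "B200"):
        print(f"MI355X vs {other}: {speedup('MI355X', other):.2f}x")
```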
Memory System
288GB HBM3e
Eight 12-Hi stacks / 8192-bit interface
8.0 TB/s Bandwidth
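The memory figures are tied together by simple arithmetic. Each HBM3/HBM3E stack exposes a 1024-bit interface, so an 8192-bit bus implies eight stacks, and 288GB / 8 gives 36GB per 12-Hi stack. Bandwidth equals bus width times per-pin data rate, divided by 8. The sketch below back-solves the per-pin rate implied by the rounded 8.0 TB/s figure. It assumes decimal units (1 TB = 10^12 bytes).

```python
HBM_BITS_PER_STACK = 1024  # interface width of one HBM3/HBM3E stack


def stack_count(bus_width_bits: int) -> int:
    """Number of HBM stacks implied by the total memory interface width."""
    return bus_width_bits // HBM_BITS_PER_STACK


def implied_pin_rate_gbps(bandwidth_tb_s: float, bus_width_bits: int) -> float:
    """Per-pin data rate (Gbit/s) from bandwidth[B/s] = width[bits] * rate[bit/s] / 8."""
    bytes_per_s = bandwidth_tb_s * 1e12            # decimal TB/s -> bytes/s
    return bytes_per_s * 8 / bus_width_bits / 1e9  # bits/s per pin -> Gbit/s


if __name__ == "__main__":
    stacks = stack_count(8192)
    print(f"stacks: {stacks}, capacity per stack: {288 / stacks:.0f} GB")
    print(f"implied pin rate: {implied_pin_rate_gbps(8.0, 8192):.2f} Gbit/s")
```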
Interconnect & I/O
896 GB/s Infinity Fabric (4th Gen)
Bi-directional xGMI links
PCIe Gen 5.0 x16
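For comparison with the Infinity Fabric figure, you can estimate the host link's bandwidth from the PCIe 5.0 signaling rate of 32 GT/s per lane and its 128b/130b line coding. The sketch below gives a per-direction upper bound. It does not subtract packet and protocol overhead, so real transfers land below this number.

```python
PCIE5_GT_PER_S = 32e9             # PCIe 5.0 signaling rate per lane (transfers/s)
ENCODING_EFFICIENCY = 128 / 130   # 128b/130b line coding (PCIe 3.0 and later)


def pcie_gb_per_s(lanes: int, gt_per_s: float = PCIE5_GT_PER_S) -> float:
    """Per-direction bandwidth in GB/s after line coding, before protocol overhead."""
    return gt_per_s * lanes * ENCODING_EFFICIENCY / 8 / 1e9


if __name__ == "__main__":
    per_dir = pcie_gb_per_s(16)
    print(f"PCIe 5.0 x16: {per_dir:.1f} GB/s per direction, {2 * per_dir:.1f} GB/s both ways")
```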
Real-World Applications
Large-Scale LLM Inference

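With 288GB of HBM3e and 8 TB/s of bandwidth, the MI355X can hold the weights of a 200B+ parameter model on a single accelerator, without tensor parallelism, when those weights are stored in FP8 or a lower-precision format. At FP16, 200B parameters alone need 400GB. Native MXFP4 support raises peak throughput to 20.1 PFLOPS, enabling cost-efficient inference for production AI platforms at a claimed 33% lower TCO than competing solutions.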
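Whether a model fits on one accelerator comes down to bytes per parameter. The sketch below counts weights only and ignores KV cache, activations, and runtime overhead, all of which also consume HBM. It assumes the OCP Microscaling (MX) layout for MXFP4, where each group of 32 four-bit elements shares one 8-bit scale.

```python
HBM_CAPACITY_GB = 288  # MI355X capacity as listed on this page

# Storage bits per parameter. MXFP4: 4-bit elements plus one 8-bit shared
# scale per 32-element block, i.e. 4 + 8/32 = 4.25 bits per parameter.
BITS_PER_PARAM = {
    "FP16": 16.0,
    "FP8": 8.0,
    "MXFP4": 4.0 + 8.0 / 32,
}


def weight_footprint_gb(params_billion: float, fmt: str) -> float:
    """Decimal GB needed for model weights alone (no KV cache or activations)."""
    return params_billion * 1e9 * BITS_PER_PARAM[fmt] / 8 / 1e9


if __name__ == "__main__":
    for fmt in BITS_PER_PARAM:
        gb = weight_footprint_gb(200, fmt)
        verdict = "fits" if gb <= HBM_CAPACITY_GB else "does not fit"
        print(f"200B params @ {fmt}: {gb:.1f} GB -> {verdict} in {HBM_CAPACITY_GB} GB")
```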

Mixed-Precision AI Training

CDNA 4 expands datatype support with the MXFP6 and MXFP4 microscaling formats, which raise training throughput while using per-block scaling to preserve model accuracy. The MI355X doubles per-CU matrix throughput versus the MI300X, targeting high-performance training of open-source models on ROCm.
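MXFP6 and MXFP4 come from the OCP Microscaling (MX) specification. In these formats, each block of 32 values shares one power-of-two (E8M0) scale, and each element is stored as a narrow float (E2M1 for MXFP4). The pure-Python sketch below models only the MXFP4 number format. It is not how CDNA 4 hardware or ROCm libraries implement it. It also simplifies round-to-nearest tie-breaking and does not clamp the shared exponent to the E8M0 range.

```python
import math

BLOCK_SIZE = 32  # elements sharing one scale in the OCP MX formats
# Non-negative magnitudes representable by FP4 E2M1 (the MXFP4 element type).
E2M1_GRID = (0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0)
E2M1_EMAX = 2  # exponent of the largest E2M1 value (6 = 1.5 * 2**2)


def quantize_block(block):
    """Return (shared power-of-two exponent, E2M1 element values) for one block."""
    amax = max(abs(v) for v in block)
    if amax == 0.0:
        return 0, [0.0] * len(block)
    # Shared exponent: floor(log2(amax)) minus the element format's max exponent.
    shared_exp = math.floor(math.log2(amax)) - E2M1_EMAX
    scale = 2.0 ** shared_exp
    elems = []
    for v in block:
        mag = min(abs(v) / scale, E2M1_GRID[-1])  # saturate at +/-6
        nearest = min(E2M1_GRID, key=lambda g: abs(g - mag))  # simplified ties
        elems.append(math.copysign(nearest, v))
    return shared_exp, elems


def quantize_mxfp4(values):
    """Split values into 32-element blocks and quantize each block independently."""
    return [quantize_block(values[i:i + BLOCK_SIZE])
            for i in range(0, len(values), BLOCK_SIZE)]


def dequantize_mxfp4(blocks):
    """Rebuild floats as element * 2**shared_exp for every block."""
    return [e * 2.0 ** exp for exp, elems in blocks for e in elems]


if __name__ == "__main__":
    data = [0.7, -0.5, 0.25, 3.0]
    print(quantize_mxfp4(data))
    print(dequantize_mxfp4(quantize_mxfp4(data)))
```

In the usage example, the block maximum of 3.0 sets the shared exponent to -1. The value 0.7 then maps to the nearest E2M1 step, 1.5, and dequantizes to 0.75.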

HPC & Scientific Computing

With 78.6 TFLOPS of peak FP64 and 288GB of memory, the MI355X handles memory-intensive scientific workloads, from climate modeling to computational chemistry. Its 3D chiplet design, with 3nm compute dies stacked on 6nm I/O dies, improves energy efficiency for sustained HPC workloads.
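The roofline model shows why memory-intensive HPC codes depend more on the 8 TB/s bandwidth than on the FP64 peak. Attainable throughput is the lower of two limits: the compute peak, and bandwidth times arithmetic intensity (FLOPs per byte moved). The sketch below uses the page's peak figures, which sustained kernels do not reach. The triad kernel is a standard STREAM-style example, not a workload named on this page.

```python
PEAK_FP64_TFLOPS = 78.6  # from this page
PEAK_BW_TB_S = 8.0       # from this page

# Arithmetic intensity (FLOP/byte) where the bandwidth and compute limits meet.
RIDGE_POINT = PEAK_FP64_TFLOPS / PEAK_BW_TB_S


def attainable_tflops(arith_intensity: float) -> float:
    """Roofline bound: min(compute peak, TB/s * FLOP/byte = TFLOP/s)."""
    return min(PEAK_FP64_TFLOPS, PEAK_BW_TB_S * arith_intensity)


if __name__ == "__main__":
    # FP64 triad a[i] = b[i] + s * c[i]: 2 FLOPs per element and
    # 3 * 8 bytes moved (two loads, one store), ignoring cache reuse.
    triad_ai = 2 / 24
    print(f"ridge point: {RIDGE_POINT:.2f} FLOP/byte")
    print(f"triad bound: {attainable_tflops(triad_ai):.2f} TFLOPS")
```

Any kernel whose intensity falls below the ridge point, about 9.8 FLOP/byte here, is limited by bandwidth rather than by FP64 compute.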

Multi-Model AI Platforms

The 288GB memory pool can host several AI models at once on a single accelerator, such as routing models, embedding engines, and multiple LLMs. This consolidation can reduce infrastructure costs by up to 40% for AI-as-a-service platforms.
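Consolidating models on one accelerator is a capacity-budgeting problem. The sketch below admits models in order while each one's weights plus runtime reserve still fit within the HBM budget minus a safety margin. The model names, sizes, reserves, and 10% headroom are all hypothetical values chosen for illustration. The greedy first-fit policy is one simple choice, not a recommended scheduler.

```python
HBM_CAPACITY_GB = 288  # per-accelerator capacity listed on this page

# Hypothetical deployment: (name, weights_gb, reserve_gb). The reserve stands
# in for KV cache, activations, and framework overhead.
MODELS = [
    ("router-small", 4, 2),
    ("embedding", 2, 1),
    ("llm-70b-fp8", 70, 40),
    ("llm-32b-fp8", 32, 20),
]


def plan(models, capacity_gb=HBM_CAPACITY_GB, headroom_frac=0.10):
    """Greedy first-fit: admit each model in order if it fits the remaining budget."""
    budget = capacity_gb * (1 - headroom_frac)
    placed, used = [], 0.0
    for name, weights_gb, reserve_gb in models:
        need = weights_gb + reserve_gb
        if used + need <= budget:
            placed.append(name)
            used += need
    return placed, used, budget


if __name__ == "__main__":
    placed, used, budget = plan(MODELS)
    print(f"placed {placed}: {used:.0f} of {budget:.1f} GB budget")
```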

Full Technical Specifications
GPU Architecture: AMD CDNA 4
Process Node: TSMC 3nm / 6nm (3D Chiplet)
Compute Units: 256
Stream Processors: 16,384
Matrix Cores: 1,024 (AI Accelerators)
Memory Capacity: 288 GB HBM3e
Memory Interface: 8192-bit
Memory Bandwidth: 8.0 TB/s
Form Factor: OAM (OCP Accelerator Module)
Thermal Design Power: 1400W (Liquid Cooled)
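The specification values are internally consistent, and the per-unit ratios are easier to remember than the totals. The sketch below derives those ratios from the table. It assumes a 1024-bit interface per HBM stack, as in the HBM3/HBM3E standards.

```python
SPEC = {
    "compute_units": 256,
    "stream_processors": 16_384,
    "matrix_cores": 1_024,
    "memory_gb": 288,
    "memory_interface_bits": 8_192,
}


def derived(spec):
    """Per-CU and per-stack ratios implied by the specification table."""
    stacks = spec["memory_interface_bits"] // 1024  # 1024-bit interface per HBM stack
    return {
        "sp_per_cu": spec["stream_processors"] // spec["compute_units"],
        "matrix_cores_per_cu": spec["matrix_cores"] // spec["compute_units"],
        "hbm_stacks": stacks,
        "gb_per_stack": spec["memory_gb"] / stacks,
    }


if __name__ == "__main__":
    for key, value in derived(SPEC).items():
        print(f"{key}: {value}")
```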