Name: AMD Instinct MI355X
Brand: AMD

Compute Performance (FP8 Matrix): ▲ 1.9x vs MI300X

10.1 PetaFLOPS

Peak FP8 matrix performance with doubled per-CU throughput on CDNA 4

MI355X (CDNA 4): 10.1 PF

MI300X (CDNA 3): 5.3 PF

B200 (Blackwell): 9.0 PF

Memory System

288GB HBM3e

12-Hi stacks / 8192-bit interface

8.0 TB/s Bandwidth
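
The bandwidth and bus-width figures above imply a per-pin data rate. The following is a minimal sketch of that arithmetic, assuming 1 TB = 10^12 bytes and that 8.0 TB/s is the aggregate peak across the full 8192-bit interface. The function name is illustrative and not part of any AMD tool.

```python
def per_pin_gbps(bandwidth_tb_s: float, bus_width_bits: int) -> float:
    """Per-pin data rate (Gbit/s) implied by aggregate bandwidth and bus width."""
    bits_per_second = bandwidth_tb_s * 1e12 * 8  # bytes/s -> bits/s
    return bits_per_second / bus_width_bits / 1e9


# Spec-sheet values: 8.0 TB/s over an 8192-bit interface.
rate = per_pin_gbps(8.0, 8192)
print(f"{rate:.4f} Gbit/s per pin")
```

The result is a back-of-the-envelope check that the two headline memory numbers are mutually consistent, not a measured signaling rate.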

Interconnect & I/O

896 GB/s Infinity Fabric (4th Gen)

Bi-directional xGMI links

PCIe Gen 5.0 x16

Real-World Applications

Large-Scale LLM Inference

With 288GB of HBM3e and 8 TB/s of bandwidth, the MI355X can hold the weights of a 200B+ parameter model on a single accelerator at 8-bit or lower precision, so serving it does not require tensor parallelism. Native MXFP4 support delivers 20.1 PFLOPS peak, enabling cost-efficient inference for production AI platforms at 33% lower TCO than competing solutions.
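
Whether a model fits on one accelerator depends on the weight format. The sketch below estimates the weight footprint only, assuming 1 GB = 10^9 bytes and ignoring the KV cache, activations, and the small per-block scale overhead of the MX formats. The helper names are hypothetical.

```python
HBM_CAPACITY_GB = 288  # from the spec sheet

# Bits per parameter for common weight formats (MX block-scale overhead ignored).
BITS_PER_PARAM = {"FP16": 16, "FP8": 8, "MXFP6": 6, "MXFP4": 4}


def weight_footprint_gb(params_billion: float, fmt: str) -> float:
    """Memory needed for the weights alone, in GB."""
    return params_billion * BITS_PER_PARAM[fmt] / 8


def fits_on_one_gpu(params_billion: float, fmt: str,
                    capacity_gb: float = HBM_CAPACITY_GB) -> bool:
    """True if the weights alone fit within the given HBM capacity."""
    return weight_footprint_gb(params_billion, fmt) <= capacity_gb


for fmt in BITS_PER_PARAM:
    gb = weight_footprint_gb(200, fmt)
    print(f"200B @ {fmt}: {gb:.0f} GB, fits: {fits_on_one_gpu(200, fmt)}")
```

Under these assumptions a 200B-parameter model needs 400 GB at FP16, which exceeds 288 GB, but 200 GB at FP8 and 100 GB at MXFP4, which leaves room for the KV cache.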

Mixed-Precision AI Training

CDNA 4's expanded datatype support, including MXFP6 and MXFP4, maximizes training throughput while maintaining model accuracy. The MI355X doubles per-CU throughput versus MI300X, delivering leadership performance for open-source model training on ROCm.

HPC & Scientific Computing

With 78.6 TFLOPS FP64 and 288GB of memory, the MI355X handles memory-intensive scientific workloads from climate modeling to computational chemistry. The 3D chiplet architecture on 3nm delivers exceptional energy efficiency for sustained HPC workloads.
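
For memory-intensive HPC kernels, the useful number is how much arithmetic a kernel must do per byte moved before the FP64 peak becomes reachable. The following is a simple roofline sketch using the two spec-sheet peaks. It assumes both peaks are achievable simultaneously, which real kernels rarely manage, and the function names are illustrative.

```python
PEAK_FP64_TFLOPS = 78.6  # from the spec sheet
PEAK_BW_TB_S = 8.0       # from the spec sheet


def ridge_point(peak_tflops: float = PEAK_FP64_TFLOPS,
                bw_tb_s: float = PEAK_BW_TB_S) -> float:
    """Arithmetic intensity (FLOP/byte) above which a kernel is compute-bound."""
    return peak_tflops / bw_tb_s


def attainable_tflops(intensity_flop_per_byte: float,
                      peak_tflops: float = PEAK_FP64_TFLOPS,
                      bw_tb_s: float = PEAK_BW_TB_S) -> float:
    """Roofline upper bound on throughput for a given arithmetic intensity."""
    return min(peak_tflops, intensity_flop_per_byte * bw_tb_s)


# A DAXPY-like update y = a*x + y does 2 FLOPs per element while moving
# 24 bytes (read x, read y, write y at 8 bytes each).
daxpy_intensity = 2 / 24

print(f"ridge point: {ridge_point():.3f} FLOP/byte")
print(f"DAXPY-like bound: {attainable_tflops(daxpy_intensity):.3f} TFLOPS")
```

Kernels with an arithmetic intensity below the ridge point are limited by the 8.0 TB/s memory system rather than by the FP64 units.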

Multi-Model AI Platforms

The 288GB memory pool can host multiple AI models simultaneously on a single accelerator, such as routing models, embedding engines, and several LLMs. This consolidation reduces infrastructure costs by up to 40% for AI-as-a-service platforms.
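
A co-hosting plan can be sanity-checked by summing weight footprints against the 288 GB capacity. The sketch below does this under stated assumptions: the model names and sizes are hypothetical, and the 20% reserve for KV cache, activations, and runtime overhead is an illustrative guess rather than a recommendation.

```python
from dataclasses import dataclass


@dataclass
class Model:
    name: str
    params_billion: float
    bits_per_param: int

    @property
    def weights_gb(self) -> float:
        """Weight footprint in GB (1 GB = 1e9 bytes)."""
        return self.params_billion * self.bits_per_param / 8


def plan(models, capacity_gb: float = 288.0, reserve_fraction: float = 0.2):
    """Return (used_gb, budget_gb, fits) for a set of co-hosted models."""
    budget = capacity_gb * (1 - reserve_fraction)
    used = sum(m.weights_gb for m in models)
    return used, budget, used <= budget


# Hypothetical fleet: a router, an embedding model, and two LLMs.
fleet = [
    Model("router", 1, 16),
    Model("embedder", 0.5, 16),
    Model("llm-a", 70, 8),
    Model("llm-b", 120, 8),
]

used, budget, fits = plan(fleet)
print(f"used {used:.1f} GB of {budget:.1f} GB budget, fits: {fits}")
```

With these example sizes the fleet needs 193 GB of weights against a 230.4 GB budget. Adding a third 70B FP8 model would push it over.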

Full Technical Specifications

GPU Architecture	AMD CDNA 4
Process Node	TSMC 3nm / 6nm (3D Chiplet)
Compute Units	256
Stream Processors	16,384
Matrix Cores	1,024 (AI Accelerators)
Memory Capacity	288 GB HBM3e
Memory Interface	8192-bit
Memory Bandwidth	8.0 TB/s
Form Factor	OAM (OCP Accelerator Module)
Thermal Design Power	1400W (Liquid Cooled)
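
A few derived figures follow directly from the table. The sketch below computes per-CU resource counts and a peak FP8 efficiency figure, taking the 10.1 PFLOPS FP8 number from the compute section above. The dictionary keys are illustrative, and the efficiency value divides a theoretical peak by TDP, so it is an upper bound rather than a measured result.

```python
spec = {
    "compute_units": 256,
    "stream_processors": 16_384,
    "matrix_cores": 1_024,
    "peak_fp8_pflops": 10.1,
    "tdp_watts": 1400,
}

# Per-CU resource counts implied by the totals.
sp_per_cu = spec["stream_processors"] // spec["compute_units"]
mc_per_cu = spec["matrix_cores"] // spec["compute_units"]

# Peak FP8 throughput per watt of TDP (theoretical upper bound).
fp8_tflops_per_watt = spec["peak_fp8_pflops"] * 1000 / spec["tdp_watts"]

print(f"stream processors per CU: {sp_per_cu}")
print(f"matrix cores per CU: {mc_per_cu}")
print(f"peak FP8 TFLOPS per watt: {fp8_tflops_per_watt:.2f}")
```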