With 288 GB of HBM3e and 8 TB/s of memory bandwidth, the MI355X can serve models with more than 200B parameters on a single GPU, without tensor parallelism, when the weights are stored at 8-bit or lower precision. Native MXFP4 support delivers a peak of 20.1 PFLOPS, enabling cost-efficient production inference at 33% lower TCO than competing solutions.
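
The single-GPU claim depends on how much memory the weights take. The sketch below counts weights only and treats GB as 10^9 bytes, ignoring KV cache, activations, and runtime overhead. It shows why a 200B-parameter dense model fits in 288 GB at FP8 or MXFP4 but not at BF16. The names `weight_footprint_gb` and `fits_on_one_gpu` are illustrative and are not part of any AMD or ROCm API. The 0.25 extra bits per element for MX formats assumes the OCP MX layout of one 8-bit scale per 32 elements.

```python
# Weight-only memory footprint of a dense model at several storage formats.
# Assumptions: GB = 10**9 bytes; KV cache, activations, and runtime overhead ignored.

HBM_CAPACITY_GB = 288.0  # MI355X memory capacity (spec table)

# Bytes per parameter. MX formats add one 8-bit shared scale per 32-element
# block (OCP MX layout), i.e. 8 / 32 = 0.25 extra bits per element.
BYTES_PER_PARAM = {
    "BF16": 2.0,
    "FP8": 1.0,
    "MXFP6": (6 + 8 / 32) / 8,
    "MXFP4": (4 + 8 / 32) / 8,
}


def weight_footprint_gb(num_params: float, fmt: str) -> float:
    """Weight storage in GB for num_params parameters stored in fmt."""
    return num_params * BYTES_PER_PARAM[fmt] / 1e9


def fits_on_one_gpu(num_params: float, fmt: str,
                    capacity_gb: float = HBM_CAPACITY_GB) -> bool:
    """True if the weights alone fit in one GPU's memory."""
    return weight_footprint_gb(num_params, fmt) <= capacity_gb


if __name__ == "__main__":
    params = 200e9  # a 200B-parameter dense model
    for fmt in BYTES_PER_PARAM:
        gb = weight_footprint_gb(params, fmt)
        print(f"{fmt:>5}: {gb:6.1f} GB, fits on one GPU: {fits_on_one_gpu(params, fmt)}")
```

At BF16 the weights alone need 400 GB, which exceeds the 288 GB capacity. At FP8 they need 200 GB, which leaves roughly 88 GB for KV cache and runtime state.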
CDNA 4 expands datatype support with the MXFP6 and MXFP4 microscaling formats. These raise training throughput while limiting accuracy loss, because each block of low-bit values shares a power-of-two scale factor that preserves dynamic range. The MI355X doubles per-CU matrix throughput versus the MI300X, delivering leadership performance for open-source model training on ROCm.
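
To make block scaling concrete, here is a minimal pure-Python sketch of MXFP4 quantization. It follows the OCP Microscaling (MX) v1.0 scheme as we understand it: 32-element blocks, one shared power-of-two exponent per block, and FP4 (E2M1) elements. It does not reproduce AMD's hardware path or any ROCm API. The function names are hypothetical, and tie-breaking is simplified (see the comment in `_round_to_fp4`).

```python
import math
from typing import List, Tuple

BLOCK_SIZE = 32        # OCP MX block size
FP4_E2M1_EMAX = 2      # exponent of the largest normal FP4 (E2M1) value, 6 = 1.5 * 2**2
# Non-negative values representable in FP4 E2M1; the sign is handled separately.
FP4_E2M1_GRID = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]


def _round_to_fp4(x: float) -> float:
    """Round x to the nearest FP4 E2M1 value, saturating at +/-6.

    Simplification: ties go to the smaller magnitude (min() keeps the first
    match in the ascending grid), whereas the MX spec rounds ties to even.
    """
    mag = min(FP4_E2M1_GRID, key=lambda g: abs(abs(x) - g))
    return math.copysign(mag, x)


def quantize_block(block: List[float]) -> Tuple[int, List[float]]:
    """Quantize one block to (shared exponent, FP4 element values)."""
    max_abs = max(abs(v) for v in block)
    if max_abs == 0.0:
        return 0, [0.0] * len(block)
    # frexp gives max_abs = m * 2**e with 0.5 <= m < 1, so floor(log2(max_abs)) = e - 1.
    _, e = math.frexp(max_abs)
    shared_exp = (e - 1) - FP4_E2M1_EMAX
    scale = 2.0 ** shared_exp
    return shared_exp, [_round_to_fp4(v / scale) for v in block]


def dequantize_block(shared_exp: int, elems: List[float]) -> List[float]:
    """Reconstruct approximate values: element * 2**shared_exp."""
    scale = 2.0 ** shared_exp
    return [x * scale for x in elems]


def quantize_mxfp4(values: List[float]) -> List[Tuple[int, List[float]]]:
    """Split values into 32-element blocks and quantize each independently."""
    return [quantize_block(values[i:i + BLOCK_SIZE])
            for i in range(0, len(values), BLOCK_SIZE)]
```

For example, quantizing `[1.0, -0.3, 5.0, 12.0]` selects a shared exponent of 1 (scale 2), and dequantizing returns `[1.0, -0.0, 4.0, 12.0]`. Values near the block maximum survive, while small values lose precision or flush to zero. Per-block scaling keeps that loss local to each block.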
With 78.6 TFLOPS of peak FP64 performance and 288 GB of memory, the MI355X handles memory-intensive scientific workloads, from climate modeling to computational chemistry. Its 3D chiplet design, with compute dies built on a 3nm process, delivers high energy efficiency for sustained HPC workloads.
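
Whether a given HPC kernel can approach that FP64 peak depends on its arithmetic intensity. A simple roofline sketch using only the two peak figures quoted on this page (78.6 TFLOPS FP64 and 8.0 TB/s) gives a machine balance of about 9.8 FLOPs per byte. Kernels below that intensity, such as the STREAM triad, are bandwidth-bound. These figures are theoretical peaks, so real kernels land below both roofs, and the triad byte count ignores caches and write-allocate traffic.

```python
# Roofline estimate for FP64 kernels using the peak figures quoted above.
PEAK_FP64_TFLOPS = 78.6  # peak FP64 compute
PEAK_BW_TBPS = 8.0       # peak HBM bandwidth


def machine_balance() -> float:
    """FLOPs per byte needed to become compute-bound (TFLOP/s divided by TB/s)."""
    return PEAK_FP64_TFLOPS / PEAK_BW_TBPS


def attainable_tflops(flops_per_byte: float) -> float:
    """Roofline: the lower of peak compute and intensity x bandwidth."""
    return min(PEAK_FP64_TFLOPS, flops_per_byte * PEAK_BW_TBPS)


# FP64 STREAM triad a[i] = b[i] + s * c[i]:
# 2 FLOPs (one multiply, one add) per 3 x 8 = 24 bytes moved.
TRIAD_INTENSITY = 2 / 24

if __name__ == "__main__":
    print(f"machine balance: {machine_balance():.2f} FLOP/byte")
    print(f"triad bound: {attainable_tflops(TRIAD_INTENSITY):.2f} TFLOPS")
```

The triad is capped near 0.67 TFLOPS by bandwidth, far below the compute peak. For memory-intensive scientific codes, the 8 TB/s figure matters as much as the FP64 rating.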
The 288 GB memory pool can host several models at once on a single accelerator, such as a routing model, an embedding model, and multiple LLMs. For AI-as-a-service platforms, this consolidation can reduce infrastructure costs by up to 40%.
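
Consolidation starts with a memory budget. The sketch below adds each model's weight storage to an assumed KV-cache reservation and keeps a fixed fraction of the 288 GB free for runtime overhead. The model sizes, precisions, KV-cache reservations, and the 10% reserve are hypothetical placeholders, not measurements. `ModelSpec` and `fits_together` are illustrative names.

```python
from dataclasses import dataclass
from typing import List

HBM_CAPACITY_GB = 288.0  # MI355X memory capacity


@dataclass
class ModelSpec:
    name: str
    params_billion: float
    bytes_per_param: float   # e.g. 2.0 for BF16, 1.0 for FP8
    kv_cache_gb: float       # assumed reservation for KV cache and activations

    @property
    def total_gb(self) -> float:
        # 1e9 params x bytes/param = GB (10**9 bytes), plus the KV reservation
        return self.params_billion * self.bytes_per_param + self.kv_cache_gb


def fits_together(models: List[ModelSpec], capacity_gb: float = HBM_CAPACITY_GB,
                  reserve_fraction: float = 0.10) -> bool:
    """True if all models fit while keeping reserve_fraction of memory free."""
    usable_gb = capacity_gb * (1.0 - reserve_fraction)
    return sum(m.total_gb for m in models) <= usable_gb


# Hypothetical fleet: sizes and reservations are placeholders, not measurements.
fleet = [
    ModelSpec("router", 8, 1.0, 4.0),
    ModelSpec("embedder", 1, 2.0, 1.0),
    ModelSpec("llm-a", 70, 1.0, 40.0),
    ModelSpec("llm-b", 70, 1.0, 40.0),
]

if __name__ == "__main__":
    used = sum(m.total_gb for m in fleet)
    print(f"{used:.1f} GB requested, fits: {fits_together(fleet)}")
```

This fleet requests 235 GB against about 259 GB of usable memory. Adding a third 70B FP8 model with the same reservation would exceed the budget.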

| Specification | Value |
|---|---|
| GPU Architecture | AMD CDNA 4 |
| Process Node | TSMC 3nm (compute dies) / 6nm (I/O dies), 3D chiplet |
| Compute Units | 256 |
| Stream Processors | 16,384 |
| Matrix Cores | 1,024 (AI Accelerators) |
| Memory Capacity | 288 GB HBM3e |
| Memory Interface | 8,192-bit |
| Memory Bandwidth | 8.0 TB/s |
| Form Factor | OAM (OCP Accelerator Module) |
| Total Board Power | 1,400 W (liquid cooled) |
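
Several rows in the table are related by simple arithmetic, which offers a quick way to sanity-check the figures. The sketch below derives per-CU counts. It also derives the implied HBM stack count, capacity per stack, and per-pin data rate, assuming each HBM3E stack has a 1,024-bit interface and that TB means 10^12 bytes. Because 8.0 TB/s is a rounded figure, the implied rate of about 7.8 Gb/s per pin is an estimate, not an official specification. `SPEC` and `derived` are illustrative names.

```python
# Figures copied from the spec table above.
SPEC = {
    "compute_units": 256,
    "stream_processors": 16_384,
    "matrix_cores": 1_024,
    "memory_gb": 288,
    "memory_interface_bits": 8_192,
    "memory_bandwidth_tbps": 8.0,
}
HBM_STACK_WIDTH_BITS = 1_024  # assumed interface width of one HBM3E stack


def derived(spec: dict) -> dict:
    """Quantities implied by the table rows (estimates, not official specs)."""
    stacks = spec["memory_interface_bits"] // HBM_STACK_WIDTH_BITS
    bytes_per_transfer = spec["memory_interface_bits"] / 8  # whole bus, one beat
    return {
        "stream_processors_per_cu": spec["stream_processors"] // spec["compute_units"],
        "matrix_cores_per_cu": spec["matrix_cores"] // spec["compute_units"],
        "hbm_stacks": stacks,
        "gb_per_stack": spec["memory_gb"] / stacks,
        # TB/s -> bytes/s, divided by bytes per transfer -> transfers/s per pin
        "per_pin_gbps": spec["memory_bandwidth_tbps"] * 1e12 / bytes_per_transfer / 1e9,
    }


if __name__ == "__main__":
    for key, value in derived(SPEC).items():
        print(f"{key}: {value}")
```

The results are consistent with 64 stream processors and 4 Matrix Cores per CU, and with eight 36 GB HBM3E stacks.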