The Instinct MI100 GPU accelerator, built for large-scale workloads, promises substantial performance gains from its new architecture.
RX 6000 series graphics cards based on the RDNA 2 graphics architecture have started reaching users. While we wait for more affordable cards such as the RX 6700 and RX 6500 series, AMD has made a new move in high-performance computing. The red team introduced the AMD Instinct MI100 accelerator, which it describes as “the world’s fastest HPC GPU accelerator for scientific workloads.”
AMD says the Instinct MI100 uses the new CDNA architecture, built from the ground up to deliver “a huge leap in computing and interconnection performance.” Compared with previous HPC accelerators, AMD claims roughly 3.5 times the FP32 performance and roughly 7 times the FP16 throughput for artificial intelligence workloads.
Key technologies behind the MI100 GPU include:
- All-new Matrix Core technology for higher machine learning performance.
- AMD Infinity Fabric Link technology for up to 276 GB/s peer-to-peer (P2P) bandwidth.
- PCIe Gen 4.0 connectivity for 64 GB/s CPU-to-GPU bandwidth.
- Peak FP64 performance of up to 11.5 TFLOPS (or 23.1 TFLOPS peak FP32).
- Ultra-fast HBM2 memory technology.
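As a rough sanity check, the 64 GB/s CPU-to-GPU figure is consistent with a PCIe 4.0 x16 link counted in both directions. The sketch below is only back-of-the-envelope arithmetic: the x16 link width, the 16 GT/s per-lane rate and the 128b/130b encoding are assumptions taken from the PCIe 4.0 standard, not from AMD's announcement, and the function name is purely illustrative.

```python
# Back-of-the-envelope PCIe bandwidth estimate (illustrative only).
# Assumptions not stated in the article: an x16 link, 16 GT/s per lane
# (the PCIe 4.0 signalling rate) and 128b/130b line encoding.

def pcie_bandwidth_gb_s(gt_per_s: float = 16.0, lanes: int = 16,
                        encoding: float = 128 / 130) -> float:
    """Return the approximate bandwidth in one direction, in GB/s."""
    bits_per_s = gt_per_s * 1e9 * lanes * encoding
    return bits_per_s / 8 / 1e9

# Raw signalling rate, ignoring encoding overhead (how headline numbers are usually quoted).
raw_both_ways = 2 * pcie_bandwidth_gb_s(encoding=1.0)

# Usable rate once the 128b/130b encoding overhead is subtracted.
one_way = pcie_bandwidth_gb_s()
both_ways = 2 * one_way

print(f"raw, both directions:     {raw_both_ways:.1f} GB/s")
print(f"encoded, per direction:   {one_way:.1f} GB/s")
print(f"encoded, both directions: {both_ways:.1f} GB/s")
```

Under these assumptions the raw bidirectional rate works out to 64 GB/s, while the rate after encoding overhead is slightly lower, at about 63 GB/s.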
The new Instinct MI100 accelerator cards have been in testing for some time at the Oak Ridge Leadership Computing Facility. Bronson Messer, the facility's director of science, said the MI100 delivered “up to 2-3 times performance improvement over other GPUs” on test platforms. AMD also says that energy efficiency has improved.
AMD Radeon Instinct Accelerator Features
| Accelerator Name | AMD Radeon Instinct MI6 | AMD Radeon Instinct MI8 | AMD Radeon Instinct MI25 | AMD Radeon Instinct MI50 | AMD Radeon Instinct MI60 | AMD Instinct MI100 |
| --- | --- | --- | --- | --- | --- | --- |
| GPU | Polaris 10 | Fiji XT | Vega 10 | Vega 20 | Vega 20 | Arcturus |
| Process Node | 14nm FinFET | 28nm | 14nm FinFET | 7nm FinFET | 7nm FinFET | 7nm FinFET |
| Clock Speed | 1237 MHz | 1000 MHz | 1500 MHz | 1725 MHz | 1800 MHz | ~1500 MHz |
| FP16 Compute | 5.7 TFLOPS | 8.2 TFLOPS | 24.6 TFLOPS | 26.5 TFLOPS | 29.5 TFLOPS | 185 TFLOPS |
| FP32 Compute | 5.7 TFLOPS | 8.2 TFLOPS | 12.3 TFLOPS | 13.3 TFLOPS | 14.7 TFLOPS | 23.1 TFLOPS |
| FP64 Compute | 384 GFLOPS | 512 GFLOPS | 768 GFLOPS | 6.6 TFLOPS | 7.4 TFLOPS | 11.5 TFLOPS |
| VRAM | 16 GB GDDR5 | 4 GB HBM1 | 16 GB HBM2 | 16 GB HBM2 | 32 GB HBM2 | 32 GB HBM2 |
| Memory Frequency | 1750 MHz | 500 MHz | 945 MHz | 1000 MHz | 1000 MHz | 1200 MHz |
| Memory Bandwidth | 224 GB/s | 512 GB/s | 484 GB/s | 1 TB/s | 1 TB/s | 1.23 TB/s |
| Cooling | Passive Cooling | Passive Cooling | Passive Cooling | Passive Cooling | Passive Cooling | Passive Cooling |
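
To put the generational claims in context, the peak figures in the table can be turned into simple ratios. The short sketch below uses only the MI50, MI60 and MI100 numbers listed above; the dictionary layout and the `speedup` helper are illustrative names, not part of any AMD tool.

```python
# Peak throughput figures copied from the table above, in TFLOPS.
SPECS = {
    "MI50":  {"fp16": 26.5,  "fp32": 13.3, "fp64": 6.6},
    "MI60":  {"fp16": 29.5,  "fp32": 14.7, "fp64": 7.4},
    "MI100": {"fp16": 185.0, "fp32": 23.1, "fp64": 11.5},
}

def speedup(new: str, old: str, precision: str) -> float:
    """Ratio of peak throughput between two accelerators at one precision."""
    return SPECS[new][precision] / SPECS[old][precision]

# Compare the MI100 against both Vega 20 based cards at every listed precision.
for old in ("MI50", "MI60"):
    for precision in ("fp16", "fp32", "fp64"):
        ratio = speedup("MI100", old, precision)
        print(f"MI100 vs {old} ({precision}): {ratio:.2f}x")
```

With these numbers, the FP16 ratio against the MI50 comes out at just under 7, in line with the roughly 7 times claim. The FP32 and FP64 rows, by contrast, give ratios below 2, so the roughly 3.5 times FP32 claim presumably rests on a figure not listed in the table (for example a matrix throughput number). That reading is an inference, not something stated in the announcement.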