Powered by NVIDIA Tesla V100 GPUs and NVSwitch


    World’s Most Powerful Accelerated Server Platform for Deep Learning, Machine Learning, and HPC

    We’re at the dawn of a new age of intelligence, where deep learning, machine learning, and high-performance computing (HPC) are transforming our world. From autonomous vehicles and retail logistics optimization to global climate simulations, new challenges are emerging whose solutions demand enormous computing resources.

    NVIDIA HGX-2 is the world’s most powerful accelerated scale-up server platform. Designed with mixed-precision computing, it accelerates every workload to solve these massive challenges. The HGX-2 platform was used to set records on MLPerf, the first industry-wide AI benchmark, delivering the highest single-node performance and validating it as the world’s most powerful, versatile, and scalable computing platform.

    Enables “the World’s Largest GPU”

    Accelerated by 16 NVIDIA® Tesla® V100 GPUs and NVIDIA NVSwitch™, HGX-2 has the unprecedented compute power, bandwidth, and memory topology to train massive models, analyze datasets, and solve simulations faster and more efficiently. The 16 Tesla V100 GPUs work as a single unified 2-petaFLOP accelerator with half a terabyte (TB) of total GPU memory, allowing it to handle the most computationally intensive workloads and enable "the world’s largest GPU."


    AI Training: HGX-2 Replaces 300 CPU-Only Server Nodes

    Workload: ResNet50, 90 epochs to solution | CPU Server: Dual-Socket Intel Xeon Gold 6140 | Dataset: ImageNet2012

    Driving Next-Generation AI Deep Learning to Faster Performance

    AI deep learning models are exploding in complexity and require large memory, multiple GPUs, and an extremely fast connection between the GPUs to work. With NVSwitch connecting all GPUs and unified memory, HGX-2 provides the power to handle these new models for faster training of advanced AI. A single HGX-2 replaces 300 CPU-powered servers, saving significant cost, space, and energy in the data center.

    Machine Learning: HGX-2 544X Speedup Compared to CPU-Only Server Nodes

    GPU measurements completed on DGX-2 | CPU: 20-CPU cluster; comparison is prorated to 1 CPU (61GB of memory, 8 vCPUs, 64-bit platform), Apache Spark | US mortgage data from Fannie Mae and Freddie Mac, 2006-2017 | 146M mortgages | Benchmark: 200GB CSV dataset | Data preparation includes joins and variable transformations

    Driving Next-Generation AI Machine Learning to Faster Performance

    AI machine learning models require loading, transforming and processing extremely large datasets to glean insights. With 0.5TB of unified memory accessible at a bandwidth of 16TB/s, and all-to-all GPU communications with NVSwitch, HGX-2 has the power to load and perform calculations on enormous datasets to derive actionable insights quickly. With RAPIDS open source machine learning software, a single HGX-2 replaces 544 CPU-based servers, generating significant cost and space savings.

    HPC: HGX-2 Replaces up to 135 CPU-Only Server Nodes

    Application (Dataset): MILC (APEX Medium) and Chroma (szscl21_24_128) | CPU Server: Dual-Socket Intel Xeon Platinum 8280 (Cascade Lake)

    The Highest-Performing HPC Supernode

    HPC applications require strong server nodes with the computing power to perform a massive number of calculations per second. Increasing the compute density of each node dramatically reduces the number of servers required, resulting in huge savings in cost, power, and space consumed in the data center. For HPC simulations, high-dimension matrix multiplication requires a processor to fetch data from many neighbors to facilitate computation, making GPUs connected by NVSwitch ideal. A single HGX-2 server replaces up to 135 CPU-based servers in science applications.

    NVSwitch for Full-Bandwidth Computing

    NVSwitch enables every GPU to communicate with every other GPU at full bandwidth, with 2.4TB/s of bisection bandwidth across the platform, to solve the largest AI and HPC problems. Every GPU has full access to 0.5TB of aggregate HBM2 memory at a bandwidth of 16TB/s to handle the most massive datasets. By enabling a unified server node, NVSwitch dramatically accelerates complex AI deep learning, AI machine learning, and HPC applications.



    Specification         | HGX-1                              | HGX-2
    Performance           | 1 petaFLOP tensor operations       | 2 petaFLOPS tensor operations
                          | 125 teraFLOPS single-precision     | 250 teraFLOPS single-precision
                          | 62 teraFLOPS double-precision      | 125 teraFLOPS double-precision
    GPUs                  | 8x NVIDIA Tesla V100               | 16x NVIDIA Tesla V100
    GPU Memory            | 256GB total                        | 512GB total
                          | 7.2TB/s bandwidth                  | 16TB/s bandwidth
    NVIDIA CUDA® Cores    | 40,960                             | 81,920
    NVIDIA Tensor Cores   | 5,120                              | 10,240
    Communication Channel | Hybrid cube mesh powered by NVLink | NVSwitch powered by NVLink
                          | 300GB/s bisection bandwidth        | 2.4TB/s bisection bandwidth
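
As a quick sanity check on the specifications above, the platform totals can be divided by the GPU count to recover approximate per-GPU figures. The sketch below is illustrative only: the dictionary keys and the `per_gpu` helper are names invented for this example, not part of any NVIDIA tool, and the numbers are copied from the specifications above with petaFLOPS converted to teraFLOPS. Because the published totals are rounded, the per-GPU results are estimates as well.

```python
# Platform totals copied from the HGX-1 / HGX-2 specifications.
# Key names are hypothetical and chosen only for this example.
SPECS = {
    "HGX-1": {
        "gpus": 8,
        "tensor_tflops": 1000,   # 1 petaFLOP of tensor operations
        "memory_gb": 256,
        "memory_bw_tbs": 7.2,
        "cuda_cores": 40960,
        "tensor_cores": 5120,
    },
    "HGX-2": {
        "gpus": 16,
        "tensor_tflops": 2000,   # 2 petaFLOPS of tensor operations
        "memory_gb": 512,
        "memory_bw_tbs": 16.0,
        "cuda_cores": 81920,
        "tensor_cores": 10240,
    },
}


def per_gpu(platform):
    """Divide each platform total by the GPU count (hypothetical helper)."""
    spec = SPECS[platform]
    count = spec["gpus"]
    return {key: value / count for key, value in spec.items() if key != "gpus"}


if __name__ == "__main__":
    for name in SPECS:
        print(name, per_gpu(name))
```

Both platforms work out to the same core counts and memory capacity per GPU, which is consistent with both being built from Tesla V100 GPUs; the difference between them lies in the GPU count and the interconnect.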

    HGX-1 Reference Architecture

    Powered by NVIDIA Tesla GPUs and NVLink

    NVIDIA HGX-1 is a reference architecture that standardized the design of data centers accelerating AI in the cloud. Based on eight Tesla SXM2 V100 boards, a hybrid cube mesh topology for scalability, and 1 petaFLOP of compute power, its modular design works seamlessly in hyperscale data centers and delivers a quick, simple path to AI.

    Empowering the Data Center Ecosystem

    NVIDIA partners with the world’s leading manufacturers to rapidly advance AI cloud computing. NVIDIA provides HGX-2 GPU baseboards, design guidelines, and early access to GPU computing technologies for partners to integrate into servers and deliver at scale to their data center ecosystems.