Deploy a solution that delivers the compute resources needed to run compute-intensive AI workloads on existing high performance computing (HPC) clusters.
Base and Plus Configurations for Intel® Select Solutions for HPC & AI Converged Clusters [Univa]
Ingredient | Intel Select Solutions for HPC & AI Converged Clusters [Univa] Base Configuration Details | Intel Select Solutions for HPC & AI Converged Clusters [Univa] Plus Configuration Details |
---|---|---|
Workload Domain (Minimum 4-Compute-Node Configuration) | ||
Platform | Dual-socket server platform | Dual-socket server platform |
Processor | 2 × Intel® Xeon® Gold 6126 processor (2.60 GHz, 12 cores, 24 threads), Intel® Xeon® Gold 6226 processor (2.70 GHz, 12 cores, 24 threads), or a higher model number Intel® Xeon® Scalable processor | 2 × Intel® Xeon® Gold 6252 processor (2.10 GHz, 24 cores, 48 threads), or a higher model number Intel Xeon Scalable processor |
Memory | 192 GB | 192 GB |
Boot Drive | 240 GB Intel® SSD Data Center (Intel® SSD DC) S3520 SATA 3.0, 6 Gbps or equivalent | 240 GB Intel SSD DC S3520 SATA 3.0, 6 Gbps or equivalent |
Storage | HPC parallel file system (470 megabits per second [Mbps] per client) | HPC parallel file system (470 Mbps per client) |
Messaging Fabric | Intel® Omni-Path Host Fabric Interface (Intel® OP HFI) adapter 100 series | Intel Omni-Path Host Fabric Interface (Intel OP HFI) adapter 100 series |
Management Network Switch | 10 GbE switch | 10 GbE switch |
Scheduling Software | Univa Grid Engine and Univa Universal Resource Broker* | Univa Grid Engine and Univa Universal Resource Broker |
Software | Linux operating system, Intel® Cluster Checker 2019, OpenHPC**, Intel® Omni-Path Fabric (Intel® OP Fabric) software, Intel® Parallel Studio XE 2019 Cluster Edition**, Apache Spark, TensorFlow, Horovod | Linux operating system, Intel Cluster Checker 2019, OpenHPC**, Intel OP Fabric software, Intel Parallel Studio XE 2019 Cluster Edition**, Apache Spark, TensorFlow, Horovod |
Management Domain | ||
Management Network | Integrated 10 GbE** | Integrated 10 GbE** |
Firmware and Software Optimizations | Intel® Hyper-Threading Technology (Intel® HT Technology) enabled, Intel® Turbo Boost Technology enabled, XPT prefetch enabled | Intel HT Technology enabled, Intel Turbo Boost Technology enabled, XPT prefetch enabled |
Minimum Performance Standards | In addition to the benchmarks below, performance of simulation and modeling applications on HPC and AI converged clusters will be comparable to the levels specified in the Intel Select Solutions for Simulation and Modeling solution brief. | |
Algorithm/Test | Training/Inference | Throughput on 4-node Univa cluster (images/sec) |
ResNet50 Int8 | Inference | 6,300 |
ResNet50 | Training | 400 |
Business Value of Choosing a Plus Configuration Over a Base Configuration | The Plus configuration's additional compute capability shortens the time needed to train AI models, and Intel® Deep Learning Boost (Intel® DL Boost) accelerates AI inferencing for faster time to insights. | |
**Recommended, not required |
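
The software stack above pairs TensorFlow with Horovod for distributed deep-learning training, with Univa Grid Engine scheduling jobs across the compute nodes. The Python sketch below illustrates, under stated assumptions, how a ResNet50 training job of the kind benchmarked above can be distributed with Horovod's Keras API. The synthetic dataset, batch size, and learning rate are illustrative placeholders rather than the configuration used to produce the throughput figures, and in practice the script would be launched across the cluster's nodes (for example with horovodrun or mpirun) through a Univa Grid Engine parallel environment.

```python
# Minimal sketch: distributed ResNet50 training with Horovod's Keras API.
# Assumes TensorFlow 2.x and Horovod built with TensorFlow support.
import numpy as np
import tensorflow as tf
import horovod.tensorflow.keras as hvd

# Initialize Horovod; each launched process becomes one worker in the job.
hvd.init()

# Synthetic ImageNet-shaped data stands in for a real, per-worker-sharded
# tf.data pipeline read from the HPC parallel file system.
images = np.random.rand(32, 224, 224, 3).astype("float32")
labels = np.random.randint(0, 1000, size=(32,))
dataset = tf.data.Dataset.from_tensor_slices((images, labels)).repeat().batch(8)

# ResNet50 trained from scratch (no pretrained weights).
model = tf.keras.applications.ResNet50(weights=None, classes=1000)

# Scale the learning rate by the worker count, a common Horovod convention;
# the base rate here is an illustrative assumption.
opt = tf.keras.optimizers.SGD(learning_rate=0.0125 * hvd.size(), momentum=0.9)

# Wrap the optimizer so gradients are averaged across workers via allreduce.
opt = hvd.DistributedOptimizer(opt)

model.compile(loss="sparse_categorical_crossentropy",
              optimizer=opt,
              metrics=["accuracy"])

callbacks = [
    # Broadcast initial variables from rank 0 so all workers start identically.
    hvd.callbacks.BroadcastGlobalVariablesCallback(0),
]

# Only rank 0 prints progress to keep cluster logs readable.
model.fit(dataset,
          steps_per_epoch=4,
          epochs=1,
          callbacks=callbacks,
          verbose=1 if hvd.rank() == 0 else 0)
```

On a four-node Base or Plus cluster, one Horovod process per node (or per CPU socket) is a typical starting point; hvd.size() then reflects the total worker count used to scale the learning rate.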