Big Data and advances in Artificial Intelligence have outstripped the capabilities of conventional processors. Compute accelerators and GPUs have filled the gap, providing thousands of compute cores, field programmable gate arrays (FPGAs), specialised AI cores and high-bandwidth interconnects.
Most systems are configured with dual processors, large amounts of RAM (256GB is a typical starting point, but much more is common), high-speed SAS SSDs or NVMe storage, multiple high-wattage power supplies and multiple compute accelerators or GPUs.
A specialised bus is also common at the high end, allowing faster interconnects between GPUs.
What Makes a High-Performance Compute Server Different?
High Performance Compute servers generally use the highest performance processors available, combining a high clock rate with a large number of cores.
AMD currently has the performance edge with up to 64 cores at clock speeds of up to 3.4GHz, and also offers a range of 32-core processors.
Intel has the 28-core Xeon Platinum 8280 at 2.7GHz shipping and has announced 56-core processors, having discontinued its 64-72 core Xeon Phi line.
HPC processors are low-volume, very expensive items, with each processor costing tens of thousands of dollars.
We have selected the fastest and latest processors in our high-end systems and offer slightly slower processors in our mid-range systems.
A number of our HPC servers offer dual AMD 64-core processors, and several of our systems offer multiple Intel 28-core processors.
Most of the high-performance servers run DDR4-2933 RAM; although faster RAM is available, it is often not supported by the equipment manufacturers.
Most high-performance compute servers have a minimum of 512GB RAM, although systems with 3TB or more are available.
Intel Optane DC Persistent Memory, available in 512GB modules, can also be used to massively expand the maximum memory size and to provide non-volatile storage for in-memory databases.
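As a rough illustration, persistent memory is typically exposed to applications as a DAX-mounted filesystem that can be memory-mapped directly, so ordinary loads and stores reach the Optane media. The sketch below is a minimal Python example; the mount point and file name are hypothetical and not specific to any particular server configuration.

```python
# Minimal sketch: memory-mapping a file on a DAX-mounted persistent-memory
# filesystem (path is hypothetical) so the data survives a restart.
import mmap
import os

PMEM_PATH = "/mnt/pmem0/example.dat"   # assumed DAX mount point
SIZE = 64 * 1024 * 1024                # 64 MiB region

# Create or open the backing file and size it.
fd = os.open(PMEM_PATH, os.O_CREAT | os.O_RDWR)
os.ftruncate(fd, SIZE)

# Map it into the process address space; byte writes go straight to the region.
buf = mmap.mmap(fd, SIZE)
buf[0:11] = b"hello pmem\n"   # ordinary byte writes
buf.flush()                   # msync to make the write durable
buf.close()
os.close(fd)
```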
We have the complete Intel Optane range available, and it can be included in many of our HPC compute configurations.
The industry-standard PCIe 3 bus is currently being updated to PCIe 4, effectively doubling the transfer rate from 32GB/sec to 64GB/sec.
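As a back-of-the-envelope check, the quoted figures correspond to the aggregate bidirectional bandwidth of an x16 slot. The short calculation below roughly reproduces them from the per-lane signalling rates; it is a sketch, not vendor-published arithmetic.

```python
# Rough PCIe x16 bandwidth estimate from per-lane signalling rates.
# PCIe 3.0: 8 GT/s per lane; PCIe 4.0: 16 GT/s per lane; both use 128b/130b coding.
def x16_bandwidth_gb_s(gt_per_s):
    encoding = 128 / 130                      # 128b/130b line-coding overhead
    per_lane_gb_s = gt_per_s * encoding / 8   # bits -> bytes
    per_direction = per_lane_gb_s * 16        # x16 link
    return per_direction * 2                  # bidirectional aggregate

print(f"PCIe 3.0 x16: ~{x16_bandwidth_gb_s(8):.0f} GB/s")   # ~32 GB/s
print(f"PCIe 4.0 x16: ~{x16_bandwidth_gb_s(16):.0f} GB/s")  # ~63 GB/s
```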
For systems with a large number of GPUs, Nvidia offers SXM3/NVLink, bypassing the PCIe bus and providing up to 300GB/sec between GPUs.
We have the largest range of SXM3 systems available in Australia, along with new PCIe 4-based systems.
The HPSS Collaboration is an organisation specifically addressing High Performance Computing storage requirements.
PetaFlops understands that storage performance is critical, so SAS SSD technology is preferred. Many systems also utilise M.2 NVMe for its advantages in transfer rates.
As the latency chart shows, Intel Optane offers extremely low drive latency, and using the memory slots avoids significant controller latency.
The use of Intel Optane DC Persistent Memory (see above) is perfect for certain HPC operations, but M.2 NVMe and SAS SSDs are more commonly used.
We offer M.2 NVMe in several of our high-performance systems; others have arrays of high-performance SAS SSDs.
InfiniBand at 200Gbit/sec is still popular for multi-compute-node supercomputers, but 100Gbit/sec Ethernet products are also considered.
For industry applications, a minimum of 10Gb Ethernet (10GbE) is recommended.
All PetaFlops systems come with a minimum of 10GbE network connections.
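To put those link speeds in context, the short calculation below estimates how long an idealised 1TB transfer takes at each line rate. It ignores protocol overhead, so real transfers will be somewhat slower.

```python
# Idealised time to move 1 TB at various line rates (no protocol overhead).
DATASET_BYTES = 1e12  # 1 TB

for name, gbit_per_s in [("10GbE", 10), ("100GbE", 100), ("200Gb/s InfiniBand", 200)]:
    bytes_per_s = gbit_per_s * 1e9 / 8
    print(f"{name:>20}: ~{DATASET_BYTES / bytes_per_s / 60:.1f} minutes")
```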
Nvidia has recently upgraded its V100 to the V100S, which has been measured at 8.2TFLOPS for FP64 double-precision floating point operations.
Both the V100 and V100S have 5120 CUDA compute cores and 640 AI-specific Tensor cores per board. With 8-board systems, performance of over 50TFLOPS, with more than 40,000 compute cores and 5,000 Tensor cores, is possible.
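The 8-board figures follow directly from the per-board specifications; the short calculation below shows the arithmetic, using nominal peak numbers and assuming no scaling losses.

```python
# Aggregate nominal figures for an 8 x V100S system (peak, no scaling losses).
BOARDS = 8
FP64_TFLOPS_PER_BOARD = 8.2
CUDA_CORES_PER_BOARD = 5120
TENSOR_CORES_PER_BOARD = 640

print(f"FP64 peak:    {BOARDS * FP64_TFLOPS_PER_BOARD:.1f} TFLOPS")  # 65.6
print(f"CUDA cores:   {BOARDS * CUDA_CORES_PER_BOARD}")              # 40960
print(f"Tensor cores: {BOARDS * TENSOR_CORES_PER_BOARD}")            # 5120
```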
AMD has the Radeon Instinct series of accelerators; the latest MI50 accelerators support PCIe 4 and have been benchmarked at 6.6TFLOPS for FP64 double-precision floating point operations.
We support the AMD accelerators in some of our Gigabyte and HPE server offerings.
Intel’s FPGA (field programmable gate array) cards are often used to implement algorithms directly in hardware, providing the highest possible performance for those tasks. This sometimes has applicability in deep learning models or in financial systems, but the cards require significant programming effort.
Intel FPGA cards are supported in a few of our HPC systems. Please contact us for details.
The Alveo U250 is an impressive PCIe 3 FPGA accelerator card from Xilinx and is supported in a limited number of HPC servers, including the HPE DL380 Gen10.
How Many GPUs?
In most traditional PCIe servers the limit is 4 double-width GPUs (or 8 lower-powered single-width GPUs), supported by specialised power supplies (e.g. dual 2000W units requiring 15A power).
Up to 8 SXM GPU modules are supported in a number of the supercomputers we have available.
The Nvidia DGX-2 system supports up to 16 GPUs in one system, making it the world’s most powerful deep learning server. Nvidia has linked 96 DGX-2 servers into one supercomputer (in the Top 100) with 8.8 million CUDA and Tensor cores and 12,000 TFLOPS of performance.
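Those headline numbers can be sanity-checked from the per-GPU figures. The sketch below assumes the DGX-2’s V100 GPUs deliver roughly 7.8 TFLOPS FP64 each; that per-GPU figure is our assumption rather than a published breakdown of the cluster.

```python
# Sanity check of the 96 x DGX-2 cluster figures (assumed per-V100 specs).
SERVERS = 96
GPUS_PER_SERVER = 16
CUDA_CORES = 5120          # per V100
TENSOR_CORES = 640         # per V100
FP64_TFLOPS = 7.8          # assumed per-V100 FP64 peak

gpus = SERVERS * GPUS_PER_SERVER
print(f"Total cores: {gpus * (CUDA_CORES + TENSOR_CORES):,}")   # ~8.8 million
print(f"FP64 peak:   {gpus * FP64_TFLOPS:,.0f} TFLOPS")         # ~12,000
```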
Systems with 2000W or larger power supplies are common and necessary to support the power requirements of the GPUs.
High-performance fans are standard, and many vendors offer additional cooling options such as liquid cooling.
The systems generally require multiple 15A connections.
A single high-performance compute node can provide more performance than 100 CPU-only nodes.
However, to scale to high-end supercomputers leveraging massively parallel processing, it is necessary to interconnect these computers. This is generally achieved with extremely high-speed networking.
Up to 200Gbit/sec networking is available with Mellanox InfiniBand and Ethernet, and standards are being finalised for much faster solutions, up to 1Tbit/sec.
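At the application level, spreading work across interconnected nodes is usually done with a message-passing library such as MPI. The sketch below uses the mpi4py binding as a minimal illustration of aggregating partial results over the interconnect; the launch command and rank count are only an example.

```python
# Minimal multi-node sketch with MPI: each rank computes a partial sum,
# then the results are combined across the interconnect with allreduce.
# Example launch: mpirun -np 256 --hostfile nodes python partial_sums.py
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()

# Each rank handles its own slice of a large problem (here, a trivial sum).
N = 10_000_000
local = sum(range(rank, N, size))

# The interconnect (InfiniBand or Ethernet) carries this reduction.
total = comm.allreduce(local, op=MPI.SUM)

if rank == 0:
    print(f"Sum over {size} ranks: {total}")
```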