[N] HGX-2 Deep Learning Benchmarks: The 81,920 CUDA Core “Behemoth” GPU Server

Written by torontoai on August 15, 2019. Posted in Reddit MachineLearning.

Deep learning benchmarks for TensorFlow on the Exxact TensorEX HGX-2 server.


Notable GPU Server Features

16x NVIDIA Tesla V100 SXM3
81,920 NVIDIA CUDA Cores
10,240 NVIDIA Tensor Cores
0.5 TB Total GPU Memory
NVSwitch powered by NVLink, 2.4 TB/sec aggregate bandwidth
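
The headline totals in the list above follow from multiplying per-GPU figures by 16. Here is a minimal sketch of that arithmetic, using NVIDIA's published per-GPU core counts for the Tesla V100; the 32 GB per-GPU memory size is not stated in the post and is an assumption inferred from the 0.5 TB total.

```python
# Per-GPU core counts are NVIDIA's published Tesla V100 specifications.
# The 32 GB memory size is an assumption inferred from the 0.5 TB total.
NUM_GPUS = 16
CUDA_CORES_PER_GPU = 5120
TENSOR_CORES_PER_GPU = 640
MEMORY_GB_PER_GPU = 32  # assumed, not stated in the post

total_cuda_cores = NUM_GPUS * CUDA_CORES_PER_GPU
total_tensor_cores = NUM_GPUS * TENSOR_CORES_PER_GPU
total_memory_tb = NUM_GPUS * MEMORY_GB_PER_GPU / 1024

print(f"{total_cuda_cores:,} CUDA cores")      # 81,920 CUDA cores
print(f"{total_tensor_cores:,} Tensor Cores")  # 10,240 Tensor Cores
print(f"{total_memory_tb} TB GPU memory")      # 0.5 TB GPU memory
```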

Tests were run on ResNet-50, ResNet-152, Inception V3, and VGG-16. We also compared FP16 to FP32 performance, using a batch size of 256 (except for ResNet-152 in FP32, where the batch size was 64). The same tests were run on 1, 2, 4, 8, and 16 GPU configurations. All benchmarks were done using ‘vanilla’ TensorFlow settings for FP16 and FP32.
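
To make the size of that test matrix concrete, here is a rough sketch that enumerates every combination described above. The model label strings and the function names are hypothetical, chosen only for illustration; the post does not say which benchmark script was used or whether the batch size is per GPU or total, so the sketch records the number as given.

```python
from itertools import product

# Hypothetical labels for the four networks named in the post.
MODELS = ["resnet50", "resnet152", "inception3", "vgg16"]
PRECISIONS = ["fp16", "fp32"]
GPU_COUNTS = [1, 2, 4, 8, 16]


def batch_size_for(model, precision):
    """Batch size 256 everywhere, except ResNet-152 in FP32, which used 64."""
    if model == "resnet152" and precision == "fp32":
        return 64
    return 256


def benchmark_matrix():
    """Return one dict per (model, precision, GPU count) combination."""
    return [
        {
            "model": model,
            "precision": precision,
            "num_gpus": num_gpus,
            "batch_size": batch_size_for(model, precision),
        }
        for model, precision, num_gpus in product(MODELS, PRECISIONS, GPU_COUNTS)
    ]


runs = benchmark_matrix()
print(len(runs))  # 40 runs: 4 models x 2 precisions x 5 GPU counts
```

Only the five ResNet-152 FP32 runs use the smaller batch size; the other 35 use 256.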

For the full write-up + tables and numbers visit: https://blog.exxactcorp.com/hgx2-benchmarks-for-deep-learning-in-tensorflow-16x-v100-exxact-tensorex-server/

submitted by /u/exxact-jm