A modular benchmarking infrastructure for high-performance deep learning — from the single operator to distributed training.
Deep learning is used for many of today's data analysis tasks. It is also a high-performance computing problem: it requires high utilization of computing devices, collective communication, and fast parallel I/O to feed samples into training. The richness of this domain raises an important question: how can we benchmark software and hardware for large-scale deep learning?
The key issue, given the complexity of these workloads, is that no single metric makes one neural network or hardware platform objectively better than another on all counts. This is an open question that we aim to tackle with this benchmark. We measure multiple metrics (e.g., throughput, communication volume, time-to-solution) to rank hardware and algorithms, using a modular benchmarking meta-framework to provide a fair and reproducible basis for competition.
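
To make the idea of measuring several metrics over the same run concrete, here is a minimal Python sketch. It is not this project's API: the `Metric` interface, the three metric classes, `run_benchmark`, and `dummy_step` are all hypothetical names chosen for illustration. The step function is assumed to report how many samples it processed and how many bytes it communicated. Total wall-clock time stands in for time-to-solution, which in practice would also need a stopping criterion such as a target accuracy.

```python
import time
from abc import ABC, abstractmethod


class Metric(ABC):
    """Hypothetical metric interface: observes one measured step at a time."""

    @abstractmethod
    def record(self, elapsed_s: float, samples: int, bytes_sent: int) -> None:
        ...

    @abstractmethod
    def summary(self) -> float:
        ...


class TotalTime(Metric):
    """Accumulated wall-clock time in seconds."""

    def __init__(self) -> None:
        self.total_s = 0.0

    def record(self, elapsed_s, samples, bytes_sent):
        self.total_s += elapsed_s

    def summary(self):
        return self.total_s


class Throughput(Metric):
    """Samples processed per second over the whole run."""

    def __init__(self) -> None:
        self.total_s = 0.0
        self.samples = 0

    def record(self, elapsed_s, samples, bytes_sent):
        self.total_s += elapsed_s
        self.samples += samples

    def summary(self):
        # Guard against a zero elapsed time on very coarse clocks.
        return self.samples / self.total_s if self.total_s > 0 else float("inf")


class CommunicationVolume(Metric):
    """Total bytes reported as communicated by the step function."""

    def __init__(self) -> None:
        self.total_bytes = 0

    def record(self, elapsed_s, samples, bytes_sent):
        self.total_bytes += bytes_sent

    def summary(self):
        return self.total_bytes


def run_benchmark(step, num_steps, metrics):
    """Time each call to `step` and feed the result to every metric."""
    for _ in range(num_steps):
        start = time.perf_counter()
        samples, bytes_sent = step()
        elapsed_s = time.perf_counter() - start
        for metric in metrics:
            metric.record(elapsed_s, samples, bytes_sent)
    return {type(metric).__name__: metric.summary() for metric in metrics}


def dummy_step():
    """Placeholder workload (an assumption): 32 samples, 1024 bytes sent."""
    sum(i * i for i in range(10_000))  # stand-in for real computation
    return 32, 1024


if __name__ == "__main__":
    results = run_benchmark(
        dummy_step, 5, [TotalTime(), Throughput(), CommunicationVolume()]
    )
    for name, value in results.items():
        print(f"{name}: {value}")
```

Keeping each metric as a separate object means new metrics can be added without changing the measurement loop, and two configurations can be compared per metric rather than collapsed into a single score.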