HPC mixed precision quantization of encoder-decoder deep neural networks

Applicant

Prof. Dr. Andreas Kist
Juniorprofessur für Artificial Intelligence in Communication Disorders
Department for Artificial Intelligence in Biomedical Engineering
Friedrich-Alexander-Universität Erlangen-Nürnberg

Project Overview

Floating-point operations are a crucial part of the computations in deep neural networks. However, these operations are costly and only computationally efficient on dedicated hardware, such as GPUs. Typically, the data type float32 (“single precision”), which uses 32 bits to represent a single number, is used for computations. Mixed precision, a smart combination of data types with different floating-point depths (e.g. float16 and float32), potentially allows more computationally efficient operations and thus faster training and inference of deep neural networks. Tensor Cores, custom-designed matrix-multiplication units found on modern GPUs, specifically take this into account and can dynamically adapt to low-bit data types.
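
For illustration, the sketch below shows how such a mixed-precision policy can be enabled in TensorFlow/Keras: computations run in float16 while the weights stay in float32, and the output layer is kept in float32 for numerical stability. The toy model here is only a placeholder; the project itself used a U-Net, and the exact setup may differ.

```python
import tensorflow as tf
from tensorflow.keras import layers, mixed_precision

# Compute in float16, keep variables (weights) in float32.
mixed_precision.set_global_policy("mixed_float16")

# Hypothetical toy model as a stand-in for the U-Net used in the project.
inputs = tf.keras.Input(shape=(256, 256, 1))
x = layers.Conv2D(16, 3, padding="same", activation="relu")(inputs)
# Keep the final layer in float32 for numerically stable outputs.
outputs = layers.Conv2D(1, 1, activation="sigmoid", dtype="float32")(x)
model = tf.keras.Model(inputs, outputs)
```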

In our KONWIHR project, we investigated the use of mixed precision in a segmentation task relying on encoder-decoder deep neural networks. We ran our experiments on the U-Net architecture, a well-established and well-parametrized architecture that is easy to deploy and to scale to different HPC resources. Using a biomedical dataset, we analyzed the effect of enabling mixed precision on accuracy, training speed, and inference speed across sandboxing and HPC hardware components. We found that the speed-ups grow with model size. Using mixed precision, we trained up to 1.9x faster while retaining the same overall segmentation accuracy; on HPC hardware, the speed-up was up to 1.6x. We further compared two established neural network frameworks (TensorFlow/Keras and PyTorch) and found that PyTorch is generally slightly faster than TensorFlow, but TensorFlow takes better advantage of the mixed precision setting.
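
In PyTorch, the comparable mechanism is automatic mixed precision (autocast plus gradient scaling). The following is a minimal sketch of one training step under assumed stand-in components (a tiny convolutional model and a dummy batch instead of the project's U-Net and biomedical data):

```python
import torch
from torch import nn

# Stand-ins for illustration only; the project trained a U-Net on biomedical images.
device = "cuda" if torch.cuda.is_available() else "cpu"
model = nn.Sequential(nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
                      nn.Conv2d(16, 1, 1)).to(device)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.BCEWithLogitsLoss()
scaler = torch.cuda.amp.GradScaler(enabled=(device == "cuda"))

images = torch.randn(4, 1, 256, 256, device=device)                  # dummy batch
masks = torch.randint(0, 2, (4, 1, 256, 256), device=device).float() # dummy labels

optimizer.zero_grad()
with torch.cuda.amp.autocast(enabled=(device == "cuda")):  # forward pass in float16 where safe
    loss = criterion(model(images), masks)
scaler.scale(loss).backward()  # scale the loss to avoid float16 gradient underflow
scaler.step(optimizer)
scaler.update()
```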

Taken together, we found that a simple technical change in the network training procedure, i.e. enabling mixed precision, lets us train faster at the same overall accuracy, resulting in shorter training times and thus less energy consumption. This allows us to utilize existing hardware better, achieve higher throughput, and largely reduce the CO2 footprint of a given experiment.

Publications

M. Dörrich, M. Fan and A. M. Kist, “Impact of Mixed Precision Techniques on Training and Inference Efficiency of Deep Neural Networks,” IEEE Access, 2023, doi: 10.1109/ACCESS.2023.3284388.