HPC mixed precision quantization of encoder-decoder deep neural networks

Applicant

Prof. Dr. Andreas Kist
Juniorprofessur für Artificial Intelligence in Communication Disorders
Department for Artificial Intelligence in Biomedical Engineering
Friedrich-Alexander-Universität Erlangen-Nürnberg

Project Overview

Encoder-decoder deep neural networks are a common architectural design for semantic segmentation of scenes, for example the segmentation of streets for autonomous driving or the segmentation of cancerous tissue in brain MRIs. These architectures are, however, not optimized for embedded hardware, where only specific datatypes, such as int8, are available, in contrast to float32 on GPUs. In previous work, we have shown that quantization of encoder-decoder networks enables these deep neural networks to be successfully deployed to AI accelerator hardware, such as the Edge TPU (Kist and Döllinger, IEEE Access 2020).
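The post-training quantization step underlying this deployment path can be illustrated with a minimal, framework-independent sketch of affine int8 quantization (plain NumPy; the actual pipeline uses the Edge TPU toolchain, and all function names here are illustrative):

```python
import numpy as np

def quantize_int8(w: np.ndarray):
    """Affine (asymmetric) quantization of a float32 tensor to int8."""
    w_min, w_max = float(w.min()), float(w.max())
    scale = (w_max - w_min) / 255.0 if w_max > w_min else 1.0
    zero_point = int(round(-128 - w_min / scale))       # maps w_min to -128
    q = np.clip(np.round(w / scale) + zero_point, -128, 127).astype(np.int8)
    return q, scale, zero_point

def dequantize(q: np.ndarray, scale: float, zero_point: int) -> np.ndarray:
    """Recover an approximate float32 tensor from its int8 representation."""
    return (q.astype(np.float32) - zero_point) * scale

# Quantize a random float32 weight tensor and measure the reconstruction error.
rng = np.random.default_rng(0)
w = rng.normal(0.0, 0.1, size=(3, 3, 16)).astype(np.float32)
q, scale, zp = quantize_int8(w)
max_err = float(np.abs(w - dequantize(q, scale, zp)).max())
# The error per weight is bounded by half a quantization step.
```

The reconstruction error per weight is at most half the step size `scale`; whether this is "lossless" in the sense of unchanged segmentation output depends on the network's sensitivity, which motivates the mixed precision approach below.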

The current process is, however, limited to training the full model in 32-bit precision and then quantizing it post hoc to int8. We hypothesize that mixed precision training would already allow us to find the optimal precision needed for successful training and lossless quantization. We believe that, in conjunction with HPC resources, we can develop a dynamic training process that exploits the parallel evaluation of several mixed precision runs, not only to determine optimized deep neural networks, but also to enable architectural design optimization through AutoML techniques.
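The core mechanics of mixed precision training, namely float16 compute against float32 master weights with loss scaling to keep small gradients representable, can be sketched on a toy least-squares problem (plain NumPy; all names and hyperparameters are illustrative, not our actual training setup):

```python
import numpy as np

# Toy problem: fit a linear model with float16 forward/backward passes,
# float32 master weights, and a static loss scale against gradient underflow.
rng = np.random.default_rng(42)
x = rng.normal(size=(64, 8)).astype(np.float16)
y = rng.normal(size=(64, 1)).astype(np.float16)

def mse(w):
    """Loss evaluated in float32 for monitoring only."""
    return float(np.mean((x.astype(np.float32) @ w - y.astype(np.float32)) ** 2))

w_master = np.zeros((8, 1), dtype=np.float32)  # float32 master copy of the weights
loss_scale = np.float16(32.0)                  # static loss scaling factor
lr = 0.1

loss_before = mse(w_master)
for _ in range(100):
    w16 = w_master.astype(np.float16)          # cast weights down for compute
    err = x @ w16 - y                          # float16 forward pass and residual
    grad16 = x.T @ (err * loss_scale) / np.float16(len(x))   # scaled float16 gradient
    grad = grad16.astype(np.float32) / np.float32(loss_scale)  # unscale in float32
    if np.all(np.isfinite(grad)):              # skip the step if float16 overflowed
        w_master -= lr * grad
loss_after = mse(w_master)
```

Frameworks automate exactly this pattern (cast policies, loss scaling, overflow skipping); the precision assignments that such a run converges with are the signal we intend to exploit for the subsequent int8 quantization.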

With the KONWIHR short project funding, we pursue the following aims:
(i) enable mixed precision training on HPC resources;
(ii) allow quantization together with mixed precision training on HPC resources;
(iii) enable AutoML to optimize architecture selection; and
(iv) evaluate int8-converted models for their portability to the Edge TPU and their pixel-wise classification accuracy in semantic segmentation tasks.
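The parallel evaluation of several mixed precision runs mentioned above can, in its simplest form, be orchestrated as a configuration sweep. The sketch below uses only the Python standard library; `train_and_score` is a hypothetical placeholder that on the cluster would submit and monitor an actual training job, and the precision search space and cost model are illustrative:

```python
from concurrent.futures import ThreadPoolExecutor
import itertools

# Hypothetical search space: one precision choice each for encoder and decoder.
PRECISIONS = ("float32", "float16", "int8")

def train_and_score(config):
    """Stand-in for one full mixed precision training run.
    On the cluster this would launch a training job and return its score;
    here it returns a dummy cost so the sweep is runnable."""
    encoder_prec, decoder_prec = config
    cost = {"float32": 4, "float16": 2, "int8": 1}
    return config, cost[encoder_prec] + cost[decoder_prec]

# Evaluate all encoder/decoder precision combinations in parallel.
configs = list(itertools.product(PRECISIONS, repeat=2))
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(train_and_score, configs))
best = min(results, key=lambda r: r[1])  # cheapest configuration wins here
```

In the envisaged dynamic training process, the scoring function would combine segmentation accuracy with quantization loss, and an AutoML search strategy would replace the exhaustive product over configurations.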