Efficient Training Principles for Small Neural Networks on High-Performance Computing (HPC) Systems

Applicant

Prof. Dr. Andreas M. Kist
Artificial Intelligence in Communication Disorders
Department Artificial Intelligence in Biomedical Engineering (AIBE)
Friedrich-Alexander-Universität Erlangen-Nürnberg

Project Overview

In high-performance computing (HPC), most optimization efforts focus on very large neural networks, such as large language models, to reduce the resources required for training. However, prior work by us and others shows that common optimization techniques such as quantization and pruning are particularly effective at large model sizes, while their benefit diminishes for smaller networks. For small models, neural architecture search (NAS) is instead often used to identify optimal structures by evaluating thousands of candidate architectures. Although each individual model is small, this large-scale evaluation is computationally expensive.
The goal of this project is to identify and quantify training strategies that improve resource efficiency for small neural networks during NAS, enabling scalable and high-throughput architecture search in an HPC environment. To achieve this, we pursue three complementary approaches. First, we integrate knowledge distillation into our HPC-NAS system and optimize the implementation for near-maximal GPU utilization through dynamic parallelization and mixed- and low-precision techniques on modern GPU architectures. Second, we enhance zero-cost proxy methods to better predict which architectures should be trained and for how long, building on preliminary evidence that suggests substantial reductions in search time.
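The first approach can be illustrated with the standard distillation objective: a weighted combination of hard-label cross-entropy and a temperature-softened KL term between teacher and student outputs. The following is a minimal NumPy sketch of that loss, not the project's HPC implementation (which would run in a GPU framework with mixed precision); function and parameter names are illustrative.

```python
import numpy as np

def softmax(z, T=1.0):
    """Temperature-scaled softmax over the last axis."""
    z = z / T
    z = z - z.max(axis=-1, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.5):
    """Hard-label cross-entropy plus soft-target KL divergence.

    The T**2 factor keeps the soft-target gradients on the same scale as
    the hard-label term when the temperature T is raised.
    """
    p_teacher = softmax(teacher_logits, T)
    p_student = softmax(student_logits, T)
    kl = np.sum(p_teacher * (np.log(p_teacher + 1e-12)
                             - np.log(p_student + 1e-12)), axis=-1)
    ce = -np.log(softmax(student_logits)[np.arange(len(labels)), labels] + 1e-12)
    return float(np.mean(alpha * ce + (1 - alpha) * (T ** 2) * kl))
```

Setting `alpha=0` isolates the soft-target term, which vanishes when the student reproduces the teacher's logits exactly.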
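Zero-cost proxies of the kind referred to above score a candidate architecture without any training. As one concrete example, a SynFlow-style proxy runs an all-ones input through the element-wise absolute weights and sums each parameter times its gradient. The sketch below implements this for a toy two-layer linear network with manual gradients; it is an illustration of the proxy family, not the project's actual proxy.

```python
import numpy as np

def synflow_score(W1, W2):
    """SynFlow-style zero-cost proxy for a two-layer linear network.

    Forward: loss = ones @ |W1| @ |W2| @ ones (a data-free scalar).
    Score:   sum over all parameters of |theta| * dLoss/d|theta|.
    """
    A, B = np.abs(W1), np.abs(W2)
    x = np.ones((1, A.shape[0]))
    loss = float(x @ A @ B @ np.ones((B.shape[1], 1)))  # forward pass
    # Manual backward pass through the two linear layers:
    dB = (x @ A).T @ np.ones((1, B.shape[1]))   # dLoss/dB
    dA = x.T @ (np.ones((1, B.shape[1])) @ B.T)  # dLoss/dA
    score = float(np.sum(A * dA) + np.sum(B * dB))
    return score, loss
```

For this purely linear network each layer's contribution equals the forward loss itself, so the score is exactly twice the loss; in a real NAS run such scores are computed per candidate and used to rank architectures before committing GPU hours.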
Third, we develop a REST interface through which the NAS system retrieves real-world hardware metrics, such as memory footprint, energy consumption, and inference time, from an existing online evaluation platform. These measurements enable accurate multi-objective optimization based on actual system behavior rather than simulated estimates. Together, these approaches will be systematically evaluated to balance resource efficiency with NAS performance and to bridge the gap between the requirements of small-model training and the capabilities of high-performance computing systems.
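A client for such an interface could look like the following stdlib-only sketch. The endpoint URL and JSON field names are hypothetical placeholders, since the evaluation platform's actual schema is not specified here; only the pattern (GET per candidate, parse the measured objectives) is the point.

```python
import json
import urllib.request

# Hypothetical endpoint and field names; the real platform's schema may differ.
METRICS_URL = "https://example-eval-platform.org/api/v1/metrics/{model_id}"

def parse_metrics(payload: str) -> dict:
    """Extract the objectives used for multi-objective NAS from a JSON reply."""
    data = json.loads(payload)
    return {
        "memory_mb": float(data["memory_footprint_mb"]),
        "energy_j": float(data["energy_consumption_j"]),
        "latency_ms": float(data["inference_time_ms"]),
    }

def fetch_metrics(model_id: str, timeout: float = 10.0) -> dict:
    """GET the measured hardware metrics for one candidate architecture."""
    url = METRICS_URL.format(model_id=model_id)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return parse_metrics(resp.read().decode("utf-8"))
```

The returned dictionary maps directly onto the objective vector of a multi-objective search, so measured rather than simulated values drive the Pareto ranking.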
