DynaTok: Flexible KV caching for selectively changing tokens during LLM training

Applicant

Prof. Dr. Timo Baumann

Project Overview

Large language models (LLMs) struggle to adhere strictly to formal requirements when generating responses. Additionally, LLMs need to be able to adapt to changes in these requirements during generation, which is especially important in human-machine interaction tasks.
In our previous work, we implemented adaptation to changing requirements by dynamically changing the prompt during generation. By making proper use of KV caching, our implementation closely matches the generation performance of inference with static prompts.
Both the results from our previous work and the available literature, however, emphasize the importance of finetuning for an LLM’s performance on a given task. Importantly, the most common training approach used for static prompts, supervised finetuning (SFT), cannot easily be applied to dynamically changing prompts: SFT performs a backward pass for all tokens simultaneously, which is possible because it uses masking to enforce a causal relationship. This is not possible if the prompt changes dynamically, because the dynamic tokens cannot be in multiple states within a single pass, which prevents the model from learning the relationship between the different states and the generated tokens.
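To make this limitation concrete, the following minimal sketch in plain Python builds the causal mask of a single teacher-forced pass and lists what each target position can see. The token strings and the helper names `causal_mask` and `visible_context` are hypothetical and chosen for illustration only. Every response position sees the same single value in the prompt slot, so a prompt that was in a different state for earlier response tokens cannot be represented within the same pass.

```python
def causal_mask(n):
    """mask[i][j] is True if position i may attend to position j (j <= i)."""
    return [[j <= i for j in range(n)] for i in range(n)]


def visible_context(tokens, mask, i):
    """Return the tokens that position i can attend to under the given mask."""
    return [tok for j, tok in enumerate(tokens) if mask[i][j]]


# Hypothetical sequence: one prompt slot followed by three response tokens.
sequence = ["<req:formal>", "Good", "morning", "everyone"]
mask = causal_mask(len(sequence))

# Collect the value of the prompt slot (position 0) as seen from each
# response position. The set has exactly one element: the slot cannot
# hold a different state for different response tokens in a single pass.
slot_values_seen = {
    visible_context(sequence, mask, i)[0] for i in range(1, len(sequence))
}
```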
We have adapted SFT to generation with dynamic prompts by taking the state of the whole text at some point during generation and masking out all tokens for which the dynamic tokens are in the wrong state. This is inherently inefficient, as most of the performed forward passes are discarded by masking them out, wasting compute resources. It also proved highly unstable in our experiments.
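The following sketch illustrates one possible reading of this masking scheme and why it is wasteful. It is not our actual implementation: the function names, the token IDs, and the `state_at_step` bookkeeping are assumptions made for illustration. One training example is built per prompt state, and the labels of all response tokens generated under a different state are set to -100, the value that PyTorch's cross-entropy loss ignores by default.

```python
IGNORE_INDEX = -100  # default ignore_index of PyTorch's cross-entropy loss


def build_examples(prompt_states, response_ids, state_at_step):
    """Build one example per prompt state.

    prompt_states: list of token-ID lists, one per state of the dynamic prompt.
    response_ids:  token IDs of the generated response.
    state_at_step: for each response token, the index of the prompt state
                   that was active when the token was generated.
    """
    examples = []
    for state_id, prompt_ids in enumerate(prompt_states):
        # Prompt tokens are never supervised.
        labels = [IGNORE_INDEX] * len(prompt_ids)
        # Response tokens are supervised only where this state was active.
        labels += [
            tok if active == state_id else IGNORE_INDEX
            for tok, active in zip(response_ids, state_at_step)
        ]
        examples.append({"input_ids": prompt_ids + response_ids, "labels": labels})
    return examples


def masked_response_fraction(examples, response_len):
    """Fraction of processed response positions that contribute no loss."""
    supervised = sum(
        label != IGNORE_INDEX for ex in examples for label in ex["labels"]
    )
    return 1 - supervised / (response_len * len(examples))


# Hypothetical data: three prompt states, six response tokens.
prompt_states = [[1, 2], [1, 3], [1, 4]]
response_ids = [11, 12, 13, 14, 15, 16]
state_at_step = [0, 0, 1, 1, 2, 2]

examples = build_examples(prompt_states, response_ids, state_at_step)
wasted = masked_response_fraction(examples, len(response_ids))
```

In this toy setting each response token is supervised in exactly one of the three examples, so two thirds of the processed response positions are masked out. Under these assumptions, the discarded share grows with the number of prompt states.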
The objective of the project is to develop an efficient and effective mechanism for training LLMs on tasks with dynamically changing tokens, adapting the existing SFT approach as necessary. This mechanism will then be implemented and integrated into the existing software stack built around the transformers and TRL Python libraries, and will target large, state-of-the-art LLMs.