LexicoLLM – Leveraging Large Language Models for Lexicography


Prof. Dr. Peter Uhrig
Digital Linguistics with a focus on Big Data
Friedrich-Alexander-Universität Erlangen-Nürnberg

Project Overview

Large Language Models (LLMs) such as BERT (Devlin et al. 2019) or GPT (Radford et al. 2019) have extensive knowledge of language, as they are trained on over 1,000 times more language material than a single person encounters in a lifetime. In the public perception, these systems (e.g. ChatGPT 4.0) are often ascribed the characteristics of Artificial General Intelligence (AGI) due to their excellent language production, although they are not designed as such. Within the scope of their capabilities, however, LLMs offer excellent opportunities to accelerate text production, which can be further optimized through various types of fine-tuning and other adaptations.

The project proposed here aims to investigate how LLMs can be made to formulate the best possible dictionary entries, both for general dictionaries (e.g. learner’s dictionaries such as the Oxford Advanced Learner’s Dictionary) and for specialized dictionaries (e.g. collocation dictionaries such as the Oxford Collocations Dictionary for Students of English). Where a fully publishable entry is out of reach, the system should at least generate a draft that minimizes the manual work required of the lexicographer.
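As a minimal illustration of the draft-generation idea, the sketch below assembles a structured prompt that could be sent to any chat-style LLM. The entry fields, the template wording, and the example headword are illustrative assumptions for this sketch, not specifics of the proposed project.

```python
# Hypothetical sketch: building a prompt that asks an LLM to draft a
# learner's-dictionary entry from corpus evidence. Field names and the
# example headword are illustrative, not part of the project proposal.

def build_entry_prompt(headword: str, pos: str, examples: list[str]) -> str:
    """Assemble a prompt requesting a draft dictionary entry."""
    corpus_lines = "\n".join(f"- {sentence}" for sentence in examples)
    return (
        f"Draft a learner's-dictionary entry for the {pos} '{headword}'.\n"
        "Include: a short definition in controlled defining vocabulary,\n"
        "one example sentence, and typical collocations.\n"
        "Corpus evidence:\n"
        f"{corpus_lines}"
    )

# Example usage with two invented corpus sentences:
prompt = build_entry_prompt(
    "mitigate",
    "verb",
    [
        "Measures were taken to mitigate the flood risk.",
        "The new drug mitigates the worst symptoms.",
    ],
)
print(prompt)
```

The resulting string would then be passed to an LLM API or a locally fine-tuned model; the lexicographer reviews and corrects the returned draft rather than writing the entry from scratch.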