Dr. Peter Uhrig
Lehrstuhl Anglistik: Linguistik
Department Anglistik/Amerikanistik und Romanistik
In the course of this project, automatic gesture recognition and forced alignment of audio and subtitles were carried out to enable the large-scale analysis of co-speech gesture.
Co-speech gesture is highly relevant to the study of human face-to-face and media communication. In the following video snippet, for example, the speaker expresses doubts about Hillary Clinton’s illness (during the 2016 presidential campaign) with air quotes; these doubts would be hard to detect if only the audio signal were available.
However, large collections of recordings annotated for gesture do not exist. To remedy that situation and facilitate research on co-speech gesture, a large collection of 36,000 video recordings was annotated with a gesture detection system developed specifically for that purpose at Case Western Reserve University. In combination with forced alignment using Gentle, we can now query linguistic expressions and restrict the results to items with co-speech gesture (within the limits of the system’s accuracy), which saves enormous amounts of time compared to a manual inspection of all examples. The following video shows the automatic annotations on a short snippet; they work best when the speaker is alone on screen and appears relatively large.
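The restriction of query results to items with co-speech gesture can be illustrated with a minimal sketch. It assumes that the forced alignment is available as a list of words with start and end times in seconds, and that the gesture detector's output has been reduced to a list of (start, end) intervals. The field names `word`, `case`, `start` and `end`, the interval format and the function names are assumptions made for illustration, not the project's actual data format or code.

```python
from typing import Dict, List, Tuple

Interval = Tuple[float, float]  # (start, end) in seconds; hypothetical format


def overlaps(a_start: float, a_end: float, b_start: float, b_end: float) -> bool:
    """Return True if the two half-open time intervals share any time."""
    return a_start < b_end and b_start < a_end


def hits_with_gesture(
    words: List[Dict],
    gesture_intervals: List[Interval],
    query: str,
) -> List[Dict]:
    """Return aligned occurrences of `query` that overlap a detected gesture.

    `words` is assumed to be a list of dicts with the keys "word", "case",
    "start" and "end"; words that could not be aligned are skipped.
    """
    hits = []
    for w in words:
        # Skip words without a successful alignment (no timestamps).
        if w.get("case") != "success":
            continue
        if w["word"].lower() != query.lower():
            continue
        # Keep the word if any gesture interval overlaps its time span.
        if any(overlaps(w["start"], w["end"], g0, g1) for g0, g1 in gesture_intervals):
            hits.append(w)
    return hits
```

A real query would operate on multi-word expressions and on an index rather than on a plain list, but the core step remains an interval-overlap test between aligned words and detected gestures.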
This project made use of most CPU clusters at FAU, namely those whose machines had enough RAM (LiMa, TinyEth, TinyFat, Emmy, Meggie). Containerization with Singularity and a central queue based on Apache ActiveMQ allowed for seamless portability between clusters and efficient resource utilization.
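The central-queue idea can be sketched as a pull-based work queue: every worker, regardless of the cluster it runs on, fetches the next unprocessed video from one shared queue until the queue is empty. The sketch below uses Python's standard-library `queue.Queue` and threads as a stand-in for the Apache ActiveMQ broker and the cluster jobs; it does not use the ActiveMQ API, and the function name `run_workers` and the job identifiers are hypothetical.

```python
import queue
import threading
from typing import Dict, List


def run_workers(job_ids: List[str], worker_names: List[str]) -> Dict[str, str]:
    """Distribute jobs to workers through one shared queue.

    Returns a mapping from job id to the name of the worker that took it.
    The in-process queue stands in for a central message broker.
    """
    jobs: "queue.Queue[str]" = queue.Queue()
    for job_id in job_ids:
        jobs.put(job_id)

    done: Dict[str, str] = {}
    lock = threading.Lock()

    def worker(name: str) -> None:
        while True:
            try:
                # Pull the next job; stop once the queue is empty.
                job_id = jobs.get_nowait()
            except queue.Empty:
                return
            # Placeholder for the real work (e.g. processing one video).
            with lock:
                done[job_id] = name

    threads = [threading.Thread(target=worker, args=(n,)) for n in worker_names]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return done
```

Because workers pull jobs rather than receiving a fixed share in advance, faster or more numerous nodes simply take more jobs, which is one plausible reason a central queue helps keep heterogeneous clusters busy.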