Master Thesis defense by Anton Golles

Title: Text-conditioned Reverse Diffusion on a Latent Representation of Humanoid Motion

Abstract: Character animation is a labor-intensive, time-consuming part of developing animated films and video games. We present a method for synthesizing humanoid motion sequences with a neural network that can be trained on a laptop, making the approach accessible without specialized hardware. We explain the intricacies of learning the time-reversed diffusion process on a latent representation of high-dimensional, sequential data.

By off-loading some of the compression work from a transformer encoder to a linear encoder, we obtain a larger model that is nonetheless easier to train. To explore the latent diffusion model, we first implement the pipeline on the MNIST dataset, a sub-task that provides perspective and guides our later design choices. We find that the forward diffusion process on a latent representation does not expose the model to the entire embedding space, which limits how well the reverse diffusion process can be learned in that latent space.
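For readers unfamiliar with the forward process mentioned above, the sketch below shows the standard DDPM-style closed form for noising a latent code, q(z_t | z_0) = N(sqrt(abar_t) z_0, (1 - abar_t) I). It is an illustration under stated assumptions, not the thesis implementation: the linear beta schedule, its endpoints (1e-4 to 0.02), the 1000 steps, and the 4-dimensional example latent are all hypothetical choices. Because each noised sample is a scaled latent plus isotropic Gaussian noise, this closed form is a convenient starting point for inspecting which regions of an embedding space the forward process actually visits.

```python
import math
import random


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linearly spaced noise variances beta_1..beta_T (assumed schedule)."""
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def alpha_bars(betas):
    """Cumulative products abar_t = prod_{s <= t} (1 - beta_s)."""
    out, prod = [], 1.0
    for b in betas:
        prod *= 1.0 - b
        out.append(prod)
    return out


def forward_diffuse(z0, t, abar, rng):
    """Sample z_t ~ N(sqrt(abar_t) * z0, (1 - abar_t) * I) in one step.

    z0 is a latent vector (list of floats), t is a 0-based step index.
    Returns the noised latent and the Gaussian noise that was added.
    """
    a = abar[t]
    eps = [rng.gauss(0.0, 1.0) for _ in z0]
    zt = [math.sqrt(a) * x + math.sqrt(1.0 - a) * e for x, e in zip(z0, eps)]
    return zt, eps


if __name__ == "__main__":
    rng = random.Random(0)
    abar = alpha_bars(linear_beta_schedule(1000))
    z0 = [1.5, -0.3, 0.8, 0.0]  # hypothetical 4-dimensional latent code
    for t in (0, 250, 999):
        zt, _ = forward_diffuse(z0, t, abar, rng)
        print(t, round(abar[t], 4), [round(v, 3) for v in zt])
```

At small t the noised latent stays close to z0; near the final step abar_t is almost zero, so z_t is essentially a draw from a standard normal regardless of where z0 started.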

Niels Bohr Institute
