Deep Generative Models for Text-to-Image Generation

The MSc thesis of Andrey Sukhobok studies deep generative models with discrete latent codes for text-to-image generation.

Deep generative models have achieved impressive results in text-to-image generation, the task of synthesizing a visual scene from its textual description. The Master of Science thesis by Andrey Sukhobok studies the recently proposed DALL-E model, which uses a transformer-based generative model over discrete codes that represent images. The thesis evaluates different architectural choices of DALL-E on an artificial dataset and on the CUB dataset, which contains images of birds. The results suggest that using a separate encoder for the textual descriptions yields better performance than processing text and image tokens with a single encoder.
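To make the architectural comparison concrete, below is a minimal PyTorch sketch of the two designs: (a) a single transformer that models the concatenated text-and-image token sequence jointly, and (b) a dedicated text encoder whose outputs condition an image-token decoder via cross-attention. All vocabulary sizes, model widths, layer counts, and sequence lengths are illustrative assumptions, not values from the thesis.

```python
# Hypothetical sketch; dimensions below are assumptions, not thesis values.
import torch
import torch.nn as nn

TEXT_VOCAB, IMAGE_VOCAB = 1000, 8192   # assumed token vocabularies
TEXT_LEN, IMAGE_LEN, DIM = 32, 256, 512

class SingleStream(nn.Module):
    """(a) One transformer processes text and image tokens jointly."""
    def __init__(self):
        super().__init__()
        self.text_emb = nn.Embedding(TEXT_VOCAB, DIM)
        self.image_emb = nn.Embedding(IMAGE_VOCAB, DIM)
        self.pos = nn.Parameter(torch.zeros(TEXT_LEN + IMAGE_LEN, DIM))
        layer = nn.TransformerEncoderLayer(DIM, nhead=8, batch_first=True)
        self.transformer = nn.TransformerEncoder(layer, num_layers=4)
        self.head = nn.Linear(DIM, IMAGE_VOCAB)

    def forward(self, text_tokens, image_tokens):
        x = torch.cat([self.text_emb(text_tokens),
                       self.image_emb(image_tokens)], dim=1) + self.pos
        # Causal mask: each position attends only to earlier ones, as in an
        # autoregressive generative model over the joint token sequence.
        mask = nn.Transformer.generate_square_subsequent_mask(x.size(1))
        h = self.transformer(x, mask=mask)
        return self.head(h[:, TEXT_LEN:])  # logits for the image tokens only

class SeparateTextEncoder(nn.Module):
    """(b) A separate text encoder; the image decoder cross-attends to it."""
    def __init__(self):
        super().__init__()
        self.text_emb = nn.Embedding(TEXT_VOCAB, DIM)
        self.image_emb = nn.Embedding(IMAGE_VOCAB, DIM)
        enc_layer = nn.TransformerEncoderLayer(DIM, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc_layer, num_layers=2)
        dec_layer = nn.TransformerDecoderLayer(DIM, nhead=8, batch_first=True)
        self.decoder = nn.TransformerDecoder(dec_layer, num_layers=4)
        self.head = nn.Linear(DIM, IMAGE_VOCAB)

    def forward(self, text_tokens, image_tokens):
        memory = self.encoder(self.text_emb(text_tokens))  # text features
        tgt = self.image_emb(image_tokens)
        mask = nn.Transformer.generate_square_subsequent_mask(tgt.size(1))
        # Image tokens are decoded causally while cross-attending to the text.
        h = self.decoder(tgt, memory, tgt_mask=mask)
        return self.head(h)

text = torch.randint(0, TEXT_VOCAB, (2, TEXT_LEN))
image = torch.randint(0, IMAGE_VOCAB, (2, IMAGE_LEN))
print(SingleStream()(text, image).shape)         # (2, IMAGE_LEN, IMAGE_VOCAB)
print(SeparateTextEncoder()(text, image).shape)  # (2, IMAGE_LEN, IMAGE_VOCAB)
```

In design (a) the text tokens share the same attention stack and parameters as the image tokens, while in design (b) the text is summarized by its own encoder and injected only through cross-attention, which is the separation the thesis found beneficial.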

More information can be found in the thesis here.

This thesis work was partially funded by HIIT.

Contact: Alexander Ilin, Aalto University

Figure from Andrey Sukhobok’s MSc thesis.