Deep Generative Models for Text-to-Image Generation
The MSc thesis of Andrey Sukhobok studies deep generative models with discrete latent codes for text-to-image generation.
Deep generative models have achieved impressive results in text-to-image generation, the task of producing a visual scene from its textual description. The thesis studies DALL-E, a recently proposed model that uses a transformer-based generative model over discrete codes representing images. Different architectural choices of DALL-E are evaluated on an artificial dataset and on the CUB dataset, which contains images of birds. The obtained results suggest that processing the textual description with a separate encoder yields better results than processing text and image tokens with a single encoder.
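The two architectural variants compared in the thesis can be sketched roughly as follows. This is a minimal illustration, not the thesis code: the vocabulary sizes, embedding dimension, and layer counts are placeholders, and causal masking is omitted for brevity. The single-stream variant runs one transformer over the concatenated text and image token sequences, while the two-stream variant encodes the text separately and lets the image decoder attend to it via cross-attention.

```python
import torch
import torch.nn as nn

# Hypothetical sizes chosen only for illustration.
VOCAB_TEXT, VOCAB_IMAGE, D = 100, 512, 64

class SingleStream(nn.Module):
    """One transformer over the concatenation [text tokens; image tokens]."""
    def __init__(self):
        super().__init__()
        self.text_emb = nn.Embedding(VOCAB_TEXT, D)
        self.img_emb = nn.Embedding(VOCAB_IMAGE, D)
        layer = nn.TransformerEncoderLayer(D, nhead=4, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(D, VOCAB_IMAGE)

    def forward(self, text, image):
        x = torch.cat([self.text_emb(text), self.img_emb(image)], dim=1)
        h = self.backbone(x)  # a real model would apply a causal mask here
        # Predict logits only for the image-token positions.
        return self.head(h[:, text.shape[1]:])

class TwoStream(nn.Module):
    """Separate text encoder; the image decoder cross-attends to its output."""
    def __init__(self):
        super().__init__()
        self.text_emb = nn.Embedding(VOCAB_TEXT, D)
        self.img_emb = nn.Embedding(VOCAB_IMAGE, D)
        enc = nn.TransformerEncoderLayer(D, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc, num_layers=2)
        dec = nn.TransformerDecoderLayer(D, nhead=4, batch_first=True)
        self.decoder = nn.TransformerDecoder(dec, num_layers=2)
        self.head = nn.Linear(D, VOCAB_IMAGE)

    def forward(self, text, image):
        memory = self.encoder(self.text_emb(text))
        h = self.decoder(self.img_emb(image), memory)  # causal mask omitted
        return self.head(h)

text = torch.randint(0, VOCAB_TEXT, (2, 10))    # batch of 2, 10 text tokens
image = torch.randint(0, VOCAB_IMAGE, (2, 16))  # batch of 2, 16 image tokens
print(SingleStream()(text, image).shape)
print(TwoStream()(text, image).shape)
```

Both variants output next-token logits over the discrete image-code vocabulary for each image position; the difference is only in how the text conditions the image stream.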
More information can be found in the thesis here.
This thesis work was partially funded by HIIT.
Contact: Alexander Ilin, Aalto University
Figure from Andrey Sukhobok’s MSc thesis.