Computer vision and natural language processing are normally thought of as two separate research fields. However, the two are converging in some areas, such as image and video captioning and visual question answering. This creates interesting opportunities for collaboration.
The Conference on Empirical Methods in Natural Language Processing (EMNLP) has opened a special track on vision and language. Hamed Rezazadegan Tavakoli, a postdoctoral researcher at Aalto University, attended the 2018 conference in Brussels. Tavakoli and his collaborators build systems that perceive their environment and describe it using natural language. During the conference, he met with Professor Noah Smith from the University of Washington to discuss the basis for a joint project.