In continuation with the previous post, here are some more datasets that are useful to work with Visual Dialogue Systems.
VFD:
The dataset was first introduced in the paper: A Visually-grounded First-person Dialogue Dataset with Verbal and Non-verbal Responses
Images: Used the 34,775 first-person images with eye-gaze annotations in the GazeFollow dataset.
Dialogues: Yahoo! Crowd Sourcing is used to collect the verbal and non-verbal responses for a given image. A total of 308,793 verbal and 81,867 non-verbal responses are collected after removing noisy/trivial responses.
The dataset can be downloaded from here
An example from the dataset is shown below
U, V, N denote an utterance, a verbal response, and a non-verbal response respectively
PhotoBook:
The dataset was first introduced in the paper: The PhotoBook Dataset: Building Common Ground through Visually-Grounded Dialogue
Images: MS COCO dataset images are used.
Dialogues: Unlike other visual dialogue datasets where a player acts as questioner and other as answerer/guesser, in Photobook Dataset, there are no such roles. Both players converse freely and naturally leading to normal conversation. AMT is used to curate the dialogues.
The dataset consists of 164,615 utterances, 130,322 actions and spans a vocabulary of 11,805 unique tokens
The dataset can be downloaded from the github page and the source code from here.
An example from the dataset is shown below
IGC: Event-Centric Conversations on Images
The dataset was first introduced in the paper: Image-Grounded Conversations: Multimodal Context for Natural Question and Response Generation
Images: A sample of eventful images from VQG Dataset.
The dataset can be downloaded here and the baseline code from here
An example from the dataset is shown below
Image-Chat:
The dataset was first introduced in the paper: Image-Chat: Engaging Grounded Conversations
Images: Images are randomly selected from YFCC100M Dataset.
The dataset can be downloaded from here and the baseline code is available here
An example from the dataset is shown below
#datasets #visual #dialog #task-oriented