Majority of the previous works in task-oriented systems, question generator used dialogue history to generate the next question in the conversation. So far, the understanding was dialogue history, along with answers aided in anticipating the best probable future question. There is also a paper on answer-driven visual estimation to improve the task success rate. However there is not much analysis on how the dialogue history is impacting the oracle’s responses. How oracle can use dialog history? This is the theme for today’s blog post. The blog post discusses about the paper, Region under Discussion for visual dialog
Authors: Mazuecos et al
Published: at EMNLP-2021
Github Repo: Region under Discussion for visual dialog
- Core Idea: Analysis of dialogue history and its impact on Oracle modelling
- Proposes a novel interpretable representation that visually grounds dialog history
- Two Oracle models are extended (i) Question+Category+Spatial (QCS) baseline (ii) LXMERT-based cross-modal Oracle (CMO)
- QCS+RuD and CMO+RuD show an increment of 41% and 46% respectively
- Released a manually annotated subset of history dependent questions
The proposed architecture and the sample dialogue are given below
Authors observations:
- Spatial, color and size questions are relative and can have their meaning changed due to the RuD (Region Under Discussion)
- Visual questions dependent on dialog history do not contain more pronouns and ellipses than history independent visual questions
- RuD is capturing the region of the image on which the history dependent question is being interpreted
- Many of these history dependent questions come from an object question that has already identified the category of the target object and now are looking for another salient object to univocally identifying it.
- Only a low percentage of questions (12%) are indeed history dependent in the Guess what?! dataset
#visual #dialouge #hallucinations #guesswhat #task-oriented