Visual question answering (VQA) is a fundamental task in vision-and-language research and has attracted considerable attention from the computer vision (CV), natural language processing (NLP), and broader artificial intelligence communities. VQA connects CV and NLP, thereby stimulating research and expanding the limits of both fields. In the most common form of VQA, the computer is presented with an image and a textual question about that image. It must then determine the correct answer and present it in a few words or a short phrase. Variants include binary (yes/no) and multiple-choice settings, in which candidate answers are provided.