Show simple item record

Vyvozování v přirozeném jazyce s využitím obrazových dat
dc.contributor.advisor: Pecina, Pavel
dc.creator: Vu Trong, Hoa
dc.date.accessioned: 2018-10-02T17:31:17Z
dc.date.available: 2018-10-02T17:31:17Z
dc.date.issued: 2018
dc.identifier.uri: http://hdl.handle.net/20.500.11956/101573
dc.description.abstract: Despite the surge of research interest in problems involving linguistic and visual information, exploring multimodal data for Natural Language Inference remains unexplored. Natural Language Inference, regarded as the basic step towards Natural Language Understanding, is extremely challenging due to the natural complexity of human languages. However, we believe this issue can be alleviated by using multimodal data. Given an image and its description, our proposed task is to determine whether a natural language hypothesis contradicts, entails, or is neutral with regard to the image and its description. To address this problem, we develop a multimodal framework based on the Bilateral Multi-perspective Matching framework. Data is collected by mapping the SNLI dataset with the image dataset Flickr30k. The resulting dataset, made publicly available, has more than 565k instances. Experiments on this dataset show that the multimodal model outperforms the state-of-the-art textual model. [en_US]
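The dataset construction the abstract describes is possible because SNLI premises were drawn from Flickr30k captions, and each SNLI example carries a `captionID` whose prefix is the source image filename (e.g. `3416050648.jpg#4`). A minimal sketch of that join, assuming an SNLI-style JSONL file and a local Flickr30k image directory (the function name and paths are illustrative, not from the thesis):

```python
import json
from pathlib import Path

def map_snli_to_flickr30k(snli_jsonl_path, image_dir):
    """Yield (premise, hypothesis, label, image_path) tuples by joining
    SNLI examples to Flickr30k images via the SNLI captionID field,
    whose prefix is the Flickr30k image filename (e.g. '3416050648.jpg#4').

    Illustrative sketch only; the thesis's actual pipeline may differ.
    """
    with open(snli_jsonl_path) as f:
        for line in f:
            ex = json.loads(line)
            # SNLI marks examples without annotator consensus with gold_label "-"
            if ex["gold_label"] == "-":
                continue
            image_name = ex["captionID"].split("#")[0]
            yield (ex["sentence1"], ex["sentence2"],
                   ex["gold_label"], Path(image_dir) / image_name)
```

Each yielded tuple pairs one premise/hypothesis/label triple with the image the premise originally captioned, which is the alignment needed to train a multimodal entailment model.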
dc.language: English [cs_CZ]
dc.language.iso: en_US
dc.publisher: Univerzita Karlova, Matematicko-fyzikální fakulta [cs_CZ]
dc.subject: Grounding Natural Language Inference on Images [en_US]
dc.subject: vyvozování v přirozeném jazyce [cs_CZ]
dc.title: Grounding Natural Language Inference on Images [en_US]
dc.type: diplomová práce [cs_CZ]
dcterms.created: 2018
dcterms.dateAccepted: 2018-09-11
dc.description.department: Ústav formální a aplikované lingvistiky [cs_CZ]
dc.description.department: Institute of Formal and Applied Linguistics [en_US]
dc.description.faculty: Faculty of Mathematics and Physics [en_US]
dc.description.faculty: Matematicko-fyzikální fakulta [cs_CZ]
dc.identifier.repId: 191640
dc.title.translated: Vyvozování v přirozeném jazyce s využitím obrazových dat [cs_CZ]
dc.contributor.referee: Libovický, Jindřich
thesis.degree.name: Mgr.
thesis.degree.level: navazující magisterské [cs_CZ]
thesis.degree.discipline: Computational Linguistics [en_US]
thesis.degree.discipline: Matematická lingvistika [cs_CZ]
thesis.degree.program: Informatika [cs_CZ]
thesis.degree.program: Computer Science [en_US]
uk.thesis.type: diplomová práce [cs_CZ]
uk.taxonomy.organization-cs: Matematicko-fyzikální fakulta::Ústav formální a aplikované lingvistiky [cs_CZ]
uk.taxonomy.organization-en: Faculty of Mathematics and Physics::Institute of Formal and Applied Linguistics [en_US]
uk.faculty-name.cs: Matematicko-fyzikální fakulta [cs_CZ]
uk.faculty-name.en: Faculty of Mathematics and Physics [en_US]
uk.faculty-abbr.cs: MFF [cs_CZ]
uk.degree-discipline.cs: Matematická lingvistika [cs_CZ]
uk.degree-discipline.en: Computational Linguistics [en_US]
uk.degree-program.cs: Informatika [cs_CZ]
uk.degree-program.en: Computer Science [en_US]
thesis.grade.cs: Velmi dobře [cs_CZ]
thesis.grade.en: Very good [en_US]
uk.file-availability: V
uk.publication.place: Praha [cs_CZ]
uk.grantor: Univerzita Karlova, Matematicko-fyzikální fakulta, Ústav formální a aplikované lingvistiky [cs_CZ]
thesis.grade.code: 2


Files in this item


This item appears in the following collection(s)



© 2017 Charles University, Central Library, Ovocný trh 560/5, 116 36 Praha 1; email: admin-repozitar [at] cuni.cz

Each constituent part of Charles University is responsible for adherence to all provisions of the copyright law.

Notice: Any retrieved information shall not be used for any commercial purposes or claimed as results of studying, scientific or any other creative activities of any person other than the author.

DSpace software copyright © 2002-2015  DuraSpace