dc.contributor.advisor | Popel, Martin | |
dc.creator | Dwiastuti, Meisyarah | |
dc.date.accessioned | 2019-10-17T12:12:15Z | |
dc.date.available | 2019-10-17T12:12:15Z | |
dc.date.issued | 2019 | |
dc.identifier.uri | http://hdl.handle.net/20.500.11956/109425 | |
dc.description.abstract | Title: Indonesian-English Neural Machine Translation Author: Meisyarah Dwiastuti Department: Institute of Formal and Applied Linguistics Supervisor: Mgr. Martin Popel, Ph.D., Institute of Formal and Applied Linguistics Abstract: In this thesis, we conduct a study on neural machine translation (NMT) for an under-studied language, Indonesian, specifically for English-Indonesian (EN-ID) and Indonesian-English (ID-EN) in a low-resource domain, TED talks. Our goal is to implement domain adaptation methods to improve the low-resource EN-ID and ID-EN NMT systems. First, we implement model fine-tuning method for EN-ID and ID-EN NMT systems by leveraging a large parallel corpus containing movie subtitles. Our analysis shows the benefit of this method for the improvement of both systems. Second, we improve our ID-EN NMT system by leveraging English monolingual corpora through back-translation. Our back-translation experiments focus on how to incorporate the back-translated monolingual corpora to the training set, in which we investigate various existing training regimes and introduce a novel 4-way-concat training regime. We also analyze the effect of fine-tuning our back-translation models with different scenarios. Experimental results show that our method of implementing back-translation followed by model... | en_US |
dc.language | English | cs_CZ |
dc.language.iso | en_US | |
dc.publisher | Univerzita Karlova, Matematicko-fyzikální fakulta | cs_CZ |
dc.subject | strojový překlad | cs_CZ |
dc.subject | hluboké neuronové sítě | cs_CZ |
dc.subject | Transformer | cs_CZ |
dc.subject | indonéština | cs_CZ |
dc.subject | machine translation | en_US |
dc.subject | deep neural networks | en_US |
dc.subject | Transformer | en_US |
dc.subject | Indonesian | en_US |
dc.title | Indonesian-English Neural Machine Translation | en_US |
dc.type | diplomová práce | cs_CZ |
dcterms.created | 2019 | |
dcterms.dateAccepted | 2019-09-09 | |
dc.description.department | Institute of Formal and Applied Linguistics | en_US |
dc.description.department | Ústav formální a aplikované lingvistiky | cs_CZ |
dc.description.faculty | Faculty of Mathematics and Physics | en_US |
dc.description.faculty | Matematicko-fyzikální fakulta | cs_CZ |
dc.identifier.repId | 211042 | |
dc.title.translated | Indonésko-anglický neuronový strojový překlad | cs_CZ |
dc.contributor.referee | Novák, Michal | |
thesis.degree.name | Mgr. | |
thesis.degree.level | navazující magisterské | cs_CZ |
thesis.degree.discipline | Computational Linguistics | en_US |
thesis.degree.discipline | Matematická lingvistika | cs_CZ |
thesis.degree.program | Informatika | cs_CZ |
thesis.degree.program | Computer Science | en_US |
uk.thesis.type | diplomová práce | cs_CZ |
uk.taxonomy.organization-cs | Matematicko-fyzikální fakulta::Ústav formální a aplikované lingvistiky | cs_CZ |
uk.taxonomy.organization-en | Faculty of Mathematics and Physics::Institute of Formal and Applied Linguistics | en_US |
uk.faculty-name.cs | Matematicko-fyzikální fakulta | cs_CZ |
uk.faculty-name.en | Faculty of Mathematics and Physics | en_US |
uk.faculty-abbr.cs | MFF | cs_CZ |
uk.degree-discipline.cs | Matematická lingvistika | cs_CZ |
uk.degree-discipline.en | Computational Linguistics | en_US |
uk.degree-program.cs | Informatika | cs_CZ |
uk.degree-program.en | Computer Science | en_US |
thesis.grade.cs | Výborně | cs_CZ |
thesis.grade.en | Excellent | en_US |
uk.abstract.en | Title: Indonesian-English Neural Machine Translation Author: Meisyarah Dwiastuti Department: Institute of Formal and Applied Linguistics Supervisor: Mgr. Martin Popel, Ph.D., Institute of Formal and Applied Linguistics Abstract: In this thesis, we conduct a study on neural machine translation (NMT) for an under-studied language, Indonesian, specifically for English-Indonesian (EN-ID) and Indonesian-English (ID-EN) in a low-resource domain, TED talks. Our goal is to implement domain adaptation methods to improve the low-resource EN-ID and ID-EN NMT systems. First, we implement model fine-tuning method for EN-ID and ID-EN NMT systems by leveraging a large parallel corpus containing movie subtitles. Our analysis shows the benefit of this method for the improvement of both systems. Second, we improve our ID-EN NMT system by leveraging English monolingual corpora through back-translation. Our back-translation experiments focus on how to incorporate the back-translated monolingual corpora to the training set, in which we investigate various existing training regimes and introduce a novel 4-way-concat training regime. We also analyze the effect of fine-tuning our back-translation models with different scenarios. Experimental results show that our method of implementing back-translation followed by model... | en_US |
uk.file-availability | V | |
uk.publication.place | Praha | cs_CZ |
uk.grantor | Univerzita Karlova, Matematicko-fyzikální fakulta, Ústav formální a aplikované lingvistiky | cs_CZ |
thesis.grade.code | 1 | |