Indonesian-English Neural Machine Translation
Indonésko-anglický neuronový strojový překlad
diploma thesis (DEFENDED)

View/ Open
Permanent link
http://hdl.handle.net/20.500.11956/109425Identifiers
Study Information System: 211042
Collections
- Kvalifikační práce [11325]
Author
Advisor
Referee
Novák, Michal
Faculty / Institute
Faculty of Mathematics and Physics
Discipline
Computational Linguistics
Department
Institute of Formal and Applied Linguistics
Date of defense
9. 9. 2019
Publisher
Univerzita Karlova, Matematicko-fyzikální fakultaLanguage
English
Grade
Excellent
Keywords (Czech)
strojový překlad, hluboké neuronové sítě, Transformer, indonéštinaKeywords (English)
machine translation, deep neural networks, Transformer, IndonesianTitle: Indonesian-English Neural Machine Translation Author: Meisyarah Dwiastuti Department: Institute of Formal and Applied Linguistics Supervisor: Mgr. Martin Popel, Ph.D., Institute of Formal and Applied Linguis- tics Abstract: In this thesis, we conduct a study on neural machine translation (NMT) for an under-studied language, Indonesian, specifically for English-Indonesian (EN-ID) and Indonesian-English (ID-EN) in a low-resource domain, TED talks. Our goal is to implement domain adaptation methods to improve the low-resource EN-ID and ID-EN NMT systems. First, we implement model fine-tuning method for EN-ID and ID-EN NMT systems by leveraging a large parallel corpus contain- ing movie subtitles. Our analysis shows the benefit of this method for the improve- ment of both systems. Second, we improve our ID-EN NMT system by leveraging English monolingual corpora through back-translation. Our back-translation ex- periments focus on how to incorporate the back-translated monolingual corpora to the training set, in which we investigate various existing training regimes and introduce a novel 4-way-concat training regime. We also analyze the effect of fine- tuning our back-translation models with different scenarios. Experimental results show that our method of implementing back-translation followed by model...