Indonesian-English Neural Machine Translation
Indonésko-anglický neuronový strojový překlad
Master's thesis (DEFENDED)

Permanent link
http://hdl.handle.net/20.500.11956/109425
Identifiers
SIS: 211042
Collection
- Theses [11320]
Author
Meisyarah Dwiastuti
Supervisor
Mgr. Martin Popel, Ph.D.
Opponent
Novák, Michal
Faculty / Unit
Faculty of Mathematics and Physics
Field of study
Mathematical Linguistics
Department / Institute / Clinic
Institute of Formal and Applied Linguistics
Date of defense
9. 9. 2019
Publisher
Charles University, Faculty of Mathematics and Physics
Language
English
Grade
Excellent
Keywords (Czech)
strojový překlad, hluboké neuronové sítě, Transformer, indonéština
Keywords (English)
machine translation, deep neural networks, Transformer, Indonesian
Title: Indonesian-English Neural Machine Translation
Author: Meisyarah Dwiastuti
Department: Institute of Formal and Applied Linguistics
Supervisor: Mgr. Martin Popel, Ph.D., Institute of Formal and Applied Linguistics
Abstract: In this thesis, we study neural machine translation (NMT) for an under-studied language, Indonesian, specifically English-Indonesian (EN-ID) and Indonesian-English (ID-EN) translation in a low-resource domain, TED talks. Our goal is to implement domain adaptation methods to improve the low-resource EN-ID and ID-EN NMT systems. First, we implement a model fine-tuning method for the EN-ID and ID-EN NMT systems by leveraging a large parallel corpus of movie subtitles. Our analysis shows that this method improves both systems. Second, we improve our ID-EN NMT system by leveraging English monolingual corpora through back-translation. Our back-translation experiments focus on how to incorporate the back-translated monolingual corpora into the training set: we investigate various existing training regimes and introduce a novel 4-way-concat training regime. We also analyze the effect of fine-tuning our back-translation models under different scenarios. Experimental results show that our method of implementing back-translation followed by model...