Multilingual Learning using Syntactic Multi-Task Training
Vícejazyčné učení pomocí víceúlohového trénování syntaxe
Master's thesis (DEFENDED)

Permanent link
http://hdl.handle.net/20.500.11956/107286
Identifiers
SIS: 211732
Collections
- Qualification theses [11322]
Author
Supervisor
Referee
Mareček, David
Faculty / unit
Matematicko-fyzikální fakulta
Field of study
Mathematical Linguistics
Department / institute / clinic
Ústav formální a aplikované lingvistiky
Date of defense
11 June 2019
Publisher
Univerzita Karlova, Matematicko-fyzikální fakulta
Language
English
Grade
Excellent
Abstract
Recent research has shown promise in multilingual modeling, demonstrating how a single model is capable of learning tasks across several languages. However, typical recurrent neural models fail to scale beyond a small number of related languages, and grouping multiple distant languages together for training can be quite detrimental. This thesis introduces a simple method that does not have this scaling problem, producing a single multi-task model that predicts universal part-of-speech, morphological features, lemmas, and dependency trees simultaneously for 124 Universal Dependencies treebanks across 75 languages. By leveraging the multilingual BERT model pretrained on 104 languages, we apply several modifications and fine-tune it on all available Universal Dependencies training data. The resulting model, which we call UDify, can closely match or exceed state-of-the-art UPOS, UFeats, Lemmas, and (especially) UAS and LAS scores, without requiring any recurrent or language-specific components. We evaluate UDify for multilingual learning, showing that low-resource languages benefit the most from cross-linguistic annotations. We also evaluate UDify for zero-shot learning, with results suggesting that multilingual training provides strong UD predictions even for languages that neither UDify nor BERT have been trained on.
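To make the multi-task setup in the abstract concrete, the sketch below shows the general pattern of a shared multilingual BERT encoder with one token-level prediction head per UD task, written with PyTorch and Hugging Face Transformers. It is only an illustration under assumed names (the class and head names are hypothetical), not the implementation described in the thesis, and it omits details such as the dependency-parsing head (commonly a biaffine scorer) and the treebank-mixing strategy.

```python
# Minimal sketch, not the thesis code: one shared multilingual encoder,
# one linear head per UD task, trained jointly on token-level labels.
import torch
import torch.nn as nn
from transformers import AutoModel

class MultiTaskUDModel(nn.Module):
    """Shared multilingual BERT encoder with per-task classification heads."""

    def __init__(self, n_upos, n_ufeats, n_lemma_rules,
                 encoder_name="bert-base-multilingual-cased"):
        super().__init__()
        # Single encoder shared across all tasks and all languages.
        self.encoder = AutoModel.from_pretrained(encoder_name)
        hidden = self.encoder.config.hidden_size
        # One head per task; a real parser would add a biaffine head for arcs/labels.
        self.upos_head = nn.Linear(hidden, n_upos)
        self.ufeats_head = nn.Linear(hidden, n_ufeats)
        self.lemma_head = nn.Linear(hidden, n_lemma_rules)  # lemmas as edit-rule classes

    def forward(self, input_ids, attention_mask):
        states = self.encoder(input_ids=input_ids,
                              attention_mask=attention_mask).last_hidden_state
        return {
            "upos": self.upos_head(states),
            "ufeats": self.ufeats_head(states),
            "lemmas": self.lemma_head(states),
        }

# Joint training would sum per-task cross-entropy losses over all treebanks, e.g.:
# loss = ce(out["upos"], gold_upos) + ce(out["ufeats"], gold_ufeats) + ce(out["lemmas"], gold_lemmas)
```

Because every task reads the same contextual representations, annotations from high-resource treebanks can inform predictions for low-resource and unseen languages, which is the effect the abstract reports for multilingual and zero-shot evaluation.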