Multilingual Learning using Syntactic Multi-Task Training
Multilingual Learning Using Multi-Task Training of Syntax (Czech title)
diploma thesis (DEFENDED)
Permanent link
http://hdl.handle.net/20.500.11956/107286
Identifiers
Study Information System: 211732
Collections
- Kvalifikační práce (Qualification theses) [11242]
Author
Advisor
Referee
Mareček, David
Faculty / Institute
Faculty of Mathematics and Physics
Discipline
Computational Linguistics
Department
Institute of Formal and Applied Linguistics
Date of defense
11. 6. 2019
Publisher
Univerzita Karlova, Matematicko-fyzikální fakulta
Language
English
Grade
Excellent
Abstract
Recent research has shown promise in multilingual modeling, demonstrating how a single model is capable of learning tasks across several languages. However, typical recurrent neural models fail to scale beyond a small number of related languages, and grouping multiple distant languages together for training can be quite detrimental. This thesis introduces a simple method that does not have this scaling problem, producing a single multi-task model that predicts universal part-of-speech tags, morphological features, lemmas, and dependency trees simultaneously for 124 Universal Dependencies treebanks across 75 languages. By leveraging the multilingual BERT model pretrained on 104 languages, we apply several modifications and fine-tune it on all available Universal Dependencies training data. The resulting model, which we call UDify, can closely match or exceed state-of-the-art UPOS, UFeats, Lemmas, and (especially) UAS and LAS scores, without requiring any recurrent or language-specific components. We evaluate UDify for multilingual learning, showing that low-resource languages benefit the most from cross-linguistic annotations. We also evaluate UDify for zero-shot learning, with results suggesting that multilingual training provides strong UD predictions even for languages that neither UDify nor BERT have ever been trained on.
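The abstract describes a single shared encoder with one prediction head per UD task. The sketch below is only a rough illustration of that shared-encoder, multi-head idea, not the thesis implementation (UDify is built on AllenNLP and uses biaffine attention for dependency parsing). It assumes PyTorch and Hugging Face Transformers, and all head sizes, class counts, and the simplified bilinear arc scorer are hypothetical choices for the example.

```python
# Minimal sketch: multilingual BERT as a shared encoder with per-task heads.
# Class counts (n_upos, n_feats, n_lemma_rules) are illustrative assumptions.
import torch
import torch.nn as nn
from transformers import AutoModel

class MultiTaskUDModel(nn.Module):
    """Shared mBERT encoder with one lightweight head per UD task."""

    def __init__(self, encoder_name="bert-base-multilingual-cased",
                 n_upos=17, n_feats=300, n_lemma_rules=500):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(encoder_name)
        hidden = self.encoder.config.hidden_size
        self.upos_head = nn.Linear(hidden, n_upos)           # universal POS tags
        self.feats_head = nn.Linear(hidden, n_feats)         # morphological feature bundles
        self.lemma_head = nn.Linear(hidden, n_lemma_rules)   # lemmas as edit-script classes
        self.arc_head = nn.Bilinear(hidden, hidden, 1)       # simplified unlabeled arc scorer

    def forward(self, input_ids, attention_mask):
        # Contextual subword embeddings shared by all task heads.
        # (A real system would pool subwords back to word positions first.)
        h = self.encoder(input_ids=input_ids,
                         attention_mask=attention_mask).last_hidden_state  # (B, n, d)
        b, n, d = h.shape
        # Score every (dependent, head) token pair for unlabeled attachment.
        deps = h.unsqueeze(2).expand(b, n, n, d).reshape(-1, d)
        heads = h.unsqueeze(1).expand(b, n, n, d).reshape(-1, d)
        arc_scores = self.arc_head(deps, heads).view(b, n, n)
        return {
            "upos": self.upos_head(h),     # (B, n, n_upos)
            "feats": self.feats_head(h),   # (B, n, n_feats)
            "lemma": self.lemma_head(h),   # (B, n, n_lemma_rules)
            "arcs": arc_scores,            # (B, n, n) head score per token
        }
```

In such a setup, each head would contribute a cross-entropy term to a single summed multi-task loss, so fine-tuning on treebanks from many languages updates the shared encoder jointly for all tasks.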