Creating a Dependency Treebank for Yorùbá Using Parallel Data

Oluokun, Adedayo

master's thesis (DEFENDED)

Permanent link

http://hdl.handle.net/20.500.11956/101633

Identifiers

SIS: 200733

Thesis opponent (reviewer)

Rosa, Rudolf

Faculty / unit

Faculty of Mathematics and Physics

Field of study

Mathematical Linguistics

Department / institute / clinic

Institute of Formal and Applied Linguistics

Date of defense

11 September 2018

Publisher

Charles University, Faculty of Mathematics and Physics

Language

English

Grade

Very good

Keywords (Czech)

dependency syntax, universal dependencies, low-resource languages

Keywords (English)

dependency parsing, annotation, parallel data, projection, UDPipe, part-of-speech tagging, low-resource

The goal of this thesis is to create a dependency treebank for Yorùbá, a language with very few pre-existing machine-readable resources. The treebank follows the Universal Dependencies (UD) annotation standard; certain language-specific guidelines for Yorùbá were specified. Known techniques for porting resources from resource-rich languages were tested, in particular projection of annotation across parallel bilingual data. Manual annotation is not the main focus of this thesis; nevertheless, a small portion of the data was verified manually in order to evaluate the annotation quality. In addition, a model was trained on the manual annotation using UDPipe.