dc.contributor.advisor | Zeman, Daniel | |
dc.creator | Oluokun, Adedayo | |
dc.date.accessioned | 2018-10-02T17:51:39Z | |
dc.date.available | 2018-10-02T17:51:39Z | |
dc.date.issued | 2018 | |
dc.identifier.uri | http://hdl.handle.net/20.500.11956/101633 | |
dc.description.abstract | The goal of this thesis is to create a dependency treebank for Yorùbá, a language with very few pre-existing machine-readable resources. The treebank follows the Universal Dependencies (UD) annotation standard; certain language-specific guidelines for Yorùbá were specified. Known techniques for porting resources from resource-rich languages were tested, in particular projection of annotation across parallel bilingual data. Manual annotation is not the main focus of this thesis; nevertheless, a small portion of the data was verified manually in order to evaluate the annotation quality. In addition, a model was trained on the manually annotated data using UDPipe. | en_US |
dc.language | English | cs_CZ |
dc.language.iso | en_US | |
dc.publisher | Univerzita Karlova, Matematicko-fyzikální fakulta | cs_CZ |
dc.subject | dependency parsing | en_US |
dc.subject | annotation | en_US |
dc.subject | parallel data | en_US |
dc.subject | projection | en_US |
dc.subject | UDPipe | en_US |
dc.subject | part-of-speech tagging | en_US |
dc.subject | low-resource | en_US |
dc.subject | závislostní syntax | cs_CZ |
dc.subject | universal dependencies | cs_CZ |
dc.subject | jazyky s nedostatečnými zdroji | cs_CZ |
dc.title | Creating a dependency treebank for Yorùbá using parallel data | en_US |
dc.type | diplomová práce | cs_CZ |
dcterms.created | 2018 | |
dcterms.dateAccepted | 2018-09-11 | |
dc.description.department | Ústav formální a aplikované lingvistiky | cs_CZ |
dc.description.department | Institute of Formal and Applied Linguistics | en_US |
dc.description.faculty | Faculty of Mathematics and Physics | en_US |
dc.description.faculty | Matematicko-fyzikální fakulta | cs_CZ |
dc.identifier.repId | 200733 | |
dc.title.translated | Tvorba závislostního korpusu pro jorubštinu s využitím paralelních dat | cs_CZ |
dc.contributor.referee | Rosa, Rudolf | |
thesis.degree.name | Mgr. | |
thesis.degree.level | navazující magisterské | cs_CZ |
thesis.degree.discipline | Computational Linguistics | en_US |
thesis.degree.discipline | Matematická lingvistika | cs_CZ |
thesis.degree.program | Informatika | cs_CZ |
thesis.degree.program | Computer Science | en_US |
uk.thesis.type | diplomová práce | cs_CZ |
uk.taxonomy.organization-cs | Matematicko-fyzikální fakulta::Ústav formální a aplikované lingvistiky | cs_CZ |
uk.taxonomy.organization-en | Faculty of Mathematics and Physics::Institute of Formal and Applied Linguistics | en_US |
uk.faculty-name.cs | Matematicko-fyzikální fakulta | cs_CZ |
uk.faculty-name.en | Faculty of Mathematics and Physics | en_US |
uk.faculty-abbr.cs | MFF | cs_CZ |
uk.degree-discipline.cs | Matematická lingvistika | cs_CZ |
uk.degree-discipline.en | Computational Linguistics | en_US |
uk.degree-program.cs | Informatika | cs_CZ |
uk.degree-program.en | Computer Science | en_US |
thesis.grade.cs | Velmi dobře | cs_CZ |
thesis.grade.en | Very good | en_US |
uk.abstract.en | The goal of this thesis is to create a dependency treebank for Yorùbá, a language with very few pre-existing machine-readable resources. The treebank follows the Universal Dependencies (UD) annotation standard; certain language-specific guidelines for Yorùbá were specified. Known techniques for porting resources from resource-rich languages were tested, in particular projection of annotation across parallel bilingual data. Manual annotation is not the main focus of this thesis; nevertheless, a small portion of the data was verified manually in order to evaluate the annotation quality. In addition, a model was trained on the manually annotated data using UDPipe. | en_US |
uk.file-availability | V | |
uk.publication.place | Praha | cs_CZ |
uk.grantor | Univerzita Karlova, Matematicko-fyzikální fakulta, Ústav formální a aplikované lingvistiky | cs_CZ |
thesis.grade.code | 2 | |