Detection and Correction of Inconsistencies in the Multilingual Treebank HamleDT

Mašek, Jan

dc.contributor.advisor	Žabokrtský, Zdeněk
dc.creator	Mašek, Jan
dc.date.accessioned	2017-05-26T11:49:42Z
dc.date.available	2017-05-26T11:49:42Z
dc.date.issued	2015
dc.identifier.uri	http://hdl.handle.net/20.500.11956/62601
dc.description.abstract	Prostudovali jsme závislostní korpusy, jež jsou součástí projektu HamleDT, a částečně jsme sjednotili soubor značek užitých pro anotaci syntaktické roviny. Následně jsme použili metodu založenou na variačních n-gramech pro automatickou detekci chyb na morfologické a syntaktické rovině. Potom jsme využili výstup morfologického značkovače, respektive závislostního syntaktického analyzátoru pro opravení chyb detekovaných v předchozím kroku. Spolehlivost detekce i opravy chyb na obou anotačních rovinách jsme vyhodnotili na základě náhodně vybraných vzorků nalezených předpokládaných chyb z několika korpusů.	cs_CZ
dc.description.abstract	We studied the treebanks included in HamleDT and partially unified their label sets. Afterwards, we used a method based on variation n-grams to automatically detect errors in morphological and dependency annotation. Then we used the output of a part-of-speech tagger or dependency parser trained on each treebank to correct the detected errors. The performance of both the detection and the correction of errors on both annotation levels was manually evaluated on randomly selected samples of suspected errors from several treebanks.	en_US
dc.language	English	cs_CZ
dc.language.iso	en_US
dc.publisher	Univerzita Karlova, Matematicko-fyzikální fakulta	cs_CZ
dc.subject	závislostní korpusy	cs_CZ
dc.subject	detekce chyb	cs_CZ
dc.subject	oprava chyb	cs_CZ
dc.subject	variační n-gramy	cs_CZ
dc.subject	dependency treebanks	en_US
dc.subject	error detection	en_US
dc.subject	error correction	en_US
dc.subject	variation n-grams	en_US
dc.title	Detection and Correction of Inconsistencies in the Multilingual Treebank HamleDT	en_US
dc.type	diplomová práce	cs_CZ
dcterms.created	2015
dcterms.dateAccepted	2015-06-05
dc.description.department	Institute of Formal and Applied Linguistics	en_US
dc.description.department	Ústav formální a aplikované lingvistiky	cs_CZ
dc.description.faculty	Matematicko-fyzikální fakulta	cs_CZ
dc.description.faculty	Faculty of Mathematics and Physics	en_US
dc.identifier.repId	149420
dc.title.translated	Detection and Correction of Inconsistencies in the Multilingual Treebank HamleDT	cs_CZ
dc.contributor.referee	Mareček, David
dc.identifier.aleph	002004733
thesis.degree.name	Mgr.
thesis.degree.level	navazující magisterské	cs_CZ
thesis.degree.discipline	Matematická lingvistika	cs_CZ
thesis.degree.discipline	Computational Linguistics	en_US
thesis.degree.program	Informatika	cs_CZ
thesis.degree.program	Computer Science	en_US
uk.thesis.type	diplomová práce	cs_CZ
uk.taxonomy.organization-cs	Matematicko-fyzikální fakulta::Ústav formální a aplikované lingvistiky	cs_CZ
uk.taxonomy.organization-en	Faculty of Mathematics and Physics::Institute of Formal and Applied Linguistics	en_US
uk.faculty-name.cs	Matematicko-fyzikální fakulta	cs_CZ
uk.faculty-name.en	Faculty of Mathematics and Physics	en_US
uk.faculty-abbr.cs	MFF	cs_CZ
uk.degree-discipline.cs	Matematická lingvistika	cs_CZ
uk.degree-discipline.en	Computational Linguistics	en_US
uk.degree-program.cs	Informatika	cs_CZ
uk.degree-program.en	Computer Science	en_US
thesis.grade.cs	Velmi dobře	cs_CZ
thesis.grade.en	Very good	en_US
uk.abstract.cs	Prostudovali jsme závislostní korpusy, jež jsou součástí projektu HamleDT, a částečně jsme sjednotili soubor značek užitých pro anotaci syntaktické roviny. Následně jsme použili metodu založenou na variačních n-gramech pro automatickou detekci chyb na morfologické a syntaktické rovině. Potom jsme využili výstup morfologického značkovače, respektive závislostního syntaktického analyzátoru pro opravení chyb detekovaných v předchozím kroku. Spolehlivost detekce i opravy chyb na obou anotačních rovinách jsme vyhodnotili na základě náhodně vybraných vzorků nalezených předpokládaných chyb z několika korpusů.	cs_CZ
uk.abstract.en	We studied the treebanks included in HamleDT and partially unified their label sets. Afterwards, we used a method based on variation n-grams to automatically detect errors in morphological and dependency annotation. Then we used the output of a part-of-speech tagger or dependency parser trained on each treebank to correct the detected errors. The performance of both the detection and the correction of errors on both annotation levels was manually evaluated on randomly selected samples of suspected errors from several treebanks.	en_US
uk.file-availability	V
uk.publication.place	Praha	cs_CZ
uk.grantor	Univerzita Karlova, Matematicko-fyzikální fakulta, Ústav formální a aplikované lingvistiky	cs_CZ
dc.identifier.lisID	990020047330106986