Forma a funkce u substantiv v češtině: vztah pádu a syntaktické funkce. Na materiálu korpusu současné psané češtiny (SYN2005)

Jelínek, Tomáš

Form and function of nouns in Czech: relation between nominal case and syntactic function. Based on a synchronic written corpus of Czech (SYN2005)

dc.contributor.advisor	Petkevič, Vladimír
dc.creator	Jelínek, Tomáš
dc.date.accessioned	2020-11-26T17:02:26Z
dc.date.available	2020-11-26T17:02:26Z
dc.date.issued	2012
dc.identifier.uri	http://hdl.handle.net/20.500.11956/44160
dc.description.abstract	Case is the basic morphological means by which Czech nouns express their function in a sentence. The objective of this thesis is to describe, from a frequency point of view, the relation between the form and function of nouns or, more precisely, how frequently cases (both simple and prepositional) are used to realise syntactic functions in sentences. The thesis is based on one of the largest corpora of synchronic written Czech, the 100-million-token corpus SYN2005. To obtain data on the frequencies of syntactic functions of nouns in relation to their cases, we annotated SYN2005 with dependency syntactic annotation, adopting the format of the analytical layer of the Prague Dependency Treebank. The annotation was performed by a stochastic parser, the MST parser. Since the reliability of this annotation was not high enough, we built an automatic correction module, which identifies errors of syntactic annotation in the parser's output and corrects them by means of linguistic rules. We implemented 26 different rules, but they reduced annotation errors by only 6-8%. The correction module can, however, be developed further. It can be used to correct the output of any dependency parser trained on the data from...	en_US
dc.description.abstract	Pád je v češtině základním prostředkem morfologické roviny, jímž substantiva vyjadřují svou funkci ve větě. Cílem této práce je popsat z frekvenčního hlediska vztah mezi formou a funkcí substantiv, přesněji řečeno, jak často se prosté a předložkové pády substantiv používají k realizaci syntaktických funkcí ve větě. Práce je založena na rozsáhlém korpusu synchronní psané češtiny SYN2005. Abychom získali údaje o frekvencích syntaktických funkcí substantiv ve vztahu k jejich pádům, opatřili jsme korpus SYN2005 závislostním syntaktickým značkováním, jehož formát jsme převzali z analytické roviny Pražského závislostního korpusu. Syntaktickou anotaci jsme uskutečnili pomocí stochastického MST parseru. Spolehlivost syntaktické anotace však nebyla dostatečně vysoká, vytvořili jsme proto automatický opravný modul, který vyhledává chyby syntaktické anotace ve výstupu stochastického parseru a na základě lingvistických pravidel tyto chyby opravuje. Implementovali jsme 26 různých pravidel, počet chyb anotace se však podařilo snížit jen o 6-8 %. Opravný modul je však možné dále rozvíjet. Lze jím korigovat výstup kteréhokoli závislostního parseru natrénovaného na datech Pražského závislostního korpusu. Syntakticky anotovaný korpus SYN2005 jsme využili jako základ výzkumu frekvence syntaktických funkcí substantiv...	cs_CZ
dc.language	Čeština	cs_CZ
dc.language.iso	cs_CZ
dc.publisher	Univerzita Karlova, Filozofická fakulta	cs_CZ
dc.subject	surface syntax	en_US
dc.subject	Czech	en_US
dc.subject	frequency	en_US
dc.subject	nouns	en_US
dc.subject	syntactic functions	en_US
dc.subject	simple and prepositional case	en_US
dc.subject	corpus	en_US
dc.subject	dependency syntax	en_US
dc.subject	povrchová syntax	cs_CZ
dc.subject	čeština	cs_CZ
dc.subject	frekvence	cs_CZ
dc.subject	substantiva	cs_CZ
dc.subject	syntaktické funkce	cs_CZ
dc.subject	prostý a předložkový pád	cs_CZ
dc.subject	korpus	cs_CZ
dc.subject	závislostní syntax	cs_CZ
dc.title	Forma a funkce u substantiv v češtině: vztah pádu a syntaktické funkce. Na materiálu korpusu současné psané češtiny (SYN2005)	cs_CZ
dc.type	dizertační práce	cs_CZ
dcterms.created	2012
dcterms.dateAccepted	2012-06-25
dc.description.department	Institute of Theoretical and Computational Linguistics	en_US
dc.description.department	Ústav teoretické a komputační lingvistiky	cs_CZ
dc.description.faculty	Filozofická fakulta	cs_CZ
dc.description.faculty	Faculty of Arts	en_US
dc.identifier.repId	25748
dc.title.translated	Form and function of nouns in Czech: relation between nominal case and syntactic function. Based on a synchronic written corpus of Czech (SYN2005)	en_US
dc.contributor.referee	Lopatková, Markéta
dc.contributor.referee	Uličný, Oldřich
dc.identifier.aleph	001481511
thesis.degree.name	Ph.D.
thesis.degree.level	doktorské	cs_CZ
thesis.degree.discipline	Matematická lingvistika	cs_CZ
thesis.degree.discipline	Mathematical Linguistics	en_US
thesis.degree.program	Filologie	cs_CZ
thesis.degree.program	Philology	en_US
uk.thesis.type	dizertační práce	cs_CZ
uk.taxonomy.organization-cs	Filozofická fakulta::Ústav teoretické a komputační lingvistiky	cs_CZ
uk.taxonomy.organization-en	Faculty of Arts::Institute of Theoretical and Computational Linguistics	en_US
uk.faculty-name.cs	Filozofická fakulta	cs_CZ
uk.faculty-name.en	Faculty of Arts	en_US
uk.faculty-abbr.cs	FF	cs_CZ
uk.degree-discipline.cs	Matematická lingvistika	cs_CZ
uk.degree-discipline.en	Mathematical Linguistics	en_US
uk.degree-program.cs	Filologie	cs_CZ
uk.degree-program.en	Philology	en_US
thesis.grade.cs	Prospěl/a	cs_CZ
thesis.grade.en	Pass	en_US
uk.abstract.cs	Pád je v češtině základním prostředkem morfologické roviny, jímž substantiva vyjadřují svou funkci ve větě. Cílem této práce je popsat z frekvenčního hlediska vztah mezi formou a funkcí substantiv, přesněji řečeno, jak často se prosté a předložkové pády substantiv používají k realizaci syntaktických funkcí ve větě. Práce je založena na rozsáhlém korpusu synchronní psané češtiny SYN2005. Abychom získali údaje o frekvencích syntaktických funkcí substantiv ve vztahu k jejich pádům, opatřili jsme korpus SYN2005 závislostním syntaktickým značkováním, jehož formát jsme převzali z analytické roviny Pražského závislostního korpusu. Syntaktickou anotaci jsme uskutečnili pomocí stochastického MST parseru. Spolehlivost syntaktické anotace však nebyla dostatečně vysoká, vytvořili jsme proto automatický opravný modul, který vyhledává chyby syntaktické anotace ve výstupu stochastického parseru a na základě lingvistických pravidel tyto chyby opravuje. Implementovali jsme 26 různých pravidel, počet chyb anotace se však podařilo snížit jen o 6-8 %. Opravný modul je však možné dále rozvíjet. Lze jím korigovat výstup kteréhokoli závislostního parseru natrénovaného na datech Pražského závislostního korpusu. Syntakticky anotovaný korpus SYN2005 jsme využili jako základ výzkumu frekvence syntaktických funkcí substantiv...	cs_CZ
uk.abstract.en	Case is the basic morphological means by which Czech nouns express their function in a sentence. The objective of this thesis is to describe, from a frequency point of view, the relation between the form and function of nouns or, more precisely, how frequently cases (both simple and prepositional) are used to realise syntactic functions in sentences. The thesis is based on one of the largest corpora of synchronic written Czech, the 100-million-token corpus SYN2005. To obtain data on the frequencies of syntactic functions of nouns in relation to their cases, we annotated SYN2005 with dependency syntactic annotation, adopting the format of the analytical layer of the Prague Dependency Treebank. The annotation was performed by a stochastic parser, the MST parser. Since the reliability of this annotation was not high enough, we built an automatic correction module, which identifies errors of syntactic annotation in the parser's output and corrects them by means of linguistic rules. We implemented 26 different rules, but they reduced annotation errors by only 6-8%. The correction module can, however, be developed further. It can be used to correct the output of any dependency parser trained on the data from...	en_US
uk.file-availability	V
uk.grantor	Univerzita Karlova, Filozofická fakulta, Ústav teoretické a komputační lingvistiky	cs_CZ
thesis.grade.code	P
uk.publication-place	Praha	cs_CZ
dc.identifier.lisID	990014815110106986