dc.contributor.advisor | Hajič, Jan | |
dc.creator | Pecina, Pavel | |
dc.date.accessioned | 2021-05-19T17:46:24Z | |
dc.date.available | 2021-05-19T17:46:24Z | |
dc.date.issued | 2010 | |
dc.identifier.uri | http://hdl.handle.net/20.500.11956/24519 | |
dc.description.abstract | Lexical Association Measures: Collocation Extraction. Pavel Pecina. Abstract of Doctoral Thesis. This thesis is devoted to an empirical study of lexical association measures and their application to collocation extraction. We focus on two-word (bigram) collocations only. We compiled a comprehensive inventory of 82 lexical association measures and present their empirical evaluation on four reference data sets: dependency bigrams from the manually annotated Prague Dependency Treebank, surface bigrams from the same source, instances of the latter from the Czech National Corpus with automatically assigned lemmas and part-of-speech tags, and distance verb-noun bigrams from the automatically part-of-speech tagged Swedish Parole Corpus. Collocation candidates in the reference data sets were manually annotated as collocations or non-collocations. The evaluation scheme measures how well collocation candidates are ranked according to their likelihood of forming collocations. The methods are compared by precision-recall curves and mean average precision scores adopted from the field of information retrieval. Tests of statistical significance were also performed. Further, we study the possibility of combining lexical association measures and present empirical results of several... | en_US |
dc.description.abstract | Lexical Association Measures: Collocation Extraction. Pavel Pecina. Abstract of Doctoral Thesis. This thesis is devoted to an empirical study of lexical association measures and their application to collocation extraction. We focus on two-word (bigram) collocations only. We compiled a comprehensive inventory of 82 lexical association measures and present their empirical evaluation on four reference data sets: dependency bigrams from the manually annotated Prague Dependency Treebank, surface bigrams from the same source, instances of the latter from the Czech National Corpus with automatically assigned lemmas and part-of-speech tags, and distance verb-noun bigrams from the automatically part-of-speech tagged Swedish Parole Corpus. Collocation candidates in the reference data sets were manually annotated as collocations or non-collocations. The evaluation scheme measures how well collocation candidates are ranked according to their likelihood of forming collocations. The methods are compared by precision-recall curves and mean average precision scores adopted from the field of information retrieval. Tests of statistical significance were also performed. Further, we study the possibility of combining lexical association measures and present empirical results of several... | cs_CZ |
dc.language | English | cs_CZ |
dc.language.iso | en_US | |
dc.publisher | Univerzita Karlova, Matematicko-fyzikální fakulta | cs_CZ |
dc.title | Lexical Association Measures: Collocation Extraction | en_US |
dc.type | rigorózní práce | cs_CZ |
dcterms.created | 2010 | |
dcterms.dateAccepted | 2010-01-21 | |
dc.description.department | Institute of Formal and Applied Linguistics | en_US |
dc.description.department | Ústav formální a aplikované lingvistiky | cs_CZ |
dc.description.faculty | Faculty of Mathematics and Physics | en_US |
dc.description.faculty | Matematicko-fyzikální fakulta | cs_CZ |
dc.identifier.repId | 81946 | |
dc.title.translated | Lexical Association Measures: Collocation Extraction | cs_CZ |
dc.identifier.aleph | 001117146 | |
thesis.degree.name | RNDr. | |
thesis.degree.level | rigorózní řízení | cs_CZ |
thesis.degree.discipline | Data Engineering | en_US |
thesis.degree.discipline | Datové inženýrství | cs_CZ |
thesis.degree.program | Informatics | en_US |
thesis.degree.program | Informatika | cs_CZ |
uk.thesis.type | rigorózní práce | cs_CZ |
uk.taxonomy.organization-cs | Matematicko-fyzikální fakulta::Ústav formální a aplikované lingvistiky | cs_CZ |
uk.taxonomy.organization-en | Faculty of Mathematics and Physics::Institute of Formal and Applied Linguistics | en_US |
uk.faculty-name.cs | Matematicko-fyzikální fakulta | cs_CZ |
uk.faculty-name.en | Faculty of Mathematics and Physics | en_US |
uk.faculty-abbr.cs | MFF | cs_CZ |
uk.degree-discipline.cs | Datové inženýrství | cs_CZ |
uk.degree-discipline.en | Data Engineering | en_US |
uk.degree-program.cs | Informatika | cs_CZ |
uk.degree-program.en | Informatics | en_US |
thesis.grade.cs | Uznáno | cs_CZ |
thesis.grade.en | Recognized | en_US |
uk.abstract.cs | Lexical Association Measures: Collocation Extraction. Pavel Pecina. Abstract of Doctoral Thesis. This thesis is devoted to an empirical study of lexical association measures and their application to collocation extraction. We focus on two-word (bigram) collocations only. We compiled a comprehensive inventory of 82 lexical association measures and present their empirical evaluation on four reference data sets: dependency bigrams from the manually annotated Prague Dependency Treebank, surface bigrams from the same source, instances of the latter from the Czech National Corpus with automatically assigned lemmas and part-of-speech tags, and distance verb-noun bigrams from the automatically part-of-speech tagged Swedish Parole Corpus. Collocation candidates in the reference data sets were manually annotated as collocations or non-collocations. The evaluation scheme measures how well collocation candidates are ranked according to their likelihood of forming collocations. The methods are compared by precision-recall curves and mean average precision scores adopted from the field of information retrieval. Tests of statistical significance were also performed. Further, we study the possibility of combining lexical association measures and present empirical results of several... | cs_CZ |
uk.abstract.en | Lexical Association Measures: Collocation Extraction. Pavel Pecina. Abstract of Doctoral Thesis. This thesis is devoted to an empirical study of lexical association measures and their application to collocation extraction. We focus on two-word (bigram) collocations only. We compiled a comprehensive inventory of 82 lexical association measures and present their empirical evaluation on four reference data sets: dependency bigrams from the manually annotated Prague Dependency Treebank, surface bigrams from the same source, instances of the latter from the Czech National Corpus with automatically assigned lemmas and part-of-speech tags, and distance verb-noun bigrams from the automatically part-of-speech tagged Swedish Parole Corpus. Collocation candidates in the reference data sets were manually annotated as collocations or non-collocations. The evaluation scheme measures how well collocation candidates are ranked according to their likelihood of forming collocations. The methods are compared by precision-recall curves and mean average precision scores adopted from the field of information retrieval. Tests of statistical significance were also performed. Further, we study the possibility of combining lexical association measures and present empirical results of several... | en_US |
uk.file-availability | V | |
uk.grantor | Univerzita Karlova, Matematicko-fyzikální fakulta, Ústav formální a aplikované lingvistiky | cs_CZ |
thesis.grade.code | U | |
uk.publication-place | Praha | cs_CZ |
uk.thesis.defenceStatus | U | |
dc.identifier.lisID | 990011171460106986 | |
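The abstract's evaluation scheme ranks bigram candidates by an association score and then judges the ranking against manual annotations using average precision. The following is a minimal Python sketch of that pipeline, not the thesis's own code. It assumes pointwise mutual information (a standard association measure) as the score, maximum-likelihood probabilities estimated from raw bigram counts, and a hypothetical toy candidate list. The `average_precision` function follows the usual information-retrieval definition and may differ in detail (for example interpolation or cross-validation folds) from the evaluation protocol actually used in the thesis; all function names are illustrative.

```python
import math
from collections import Counter


def pmi_scores(bigrams):
    """Score each distinct bigram type (x, y) by pointwise mutual information,
    PMI = log2(P(xy) / (P(x*) * P(*y))), using maximum-likelihood estimates
    from the bigram counts (an assumption made for this sketch)."""
    pair_counts = Counter(bigrams)
    left = Counter(x for x, _ in bigrams)   # counts of x in first position
    right = Counter(y for _, y in bigrams)  # counts of y in second position
    n = len(bigrams)
    return {
        (x, y): math.log2((c / n) / ((left[x] / n) * (right[y] / n)))
        for (x, y), c in pair_counts.items()
    }


def average_precision(ranked, gold):
    """Mean of the precision values at each rank where a true collocation
    (a member of `gold`) appears; gold items never ranked contribute 0."""
    hits, total = 0, 0.0
    for rank, cand in enumerate(ranked, start=1):
        if cand in gold:
            hits += 1
            total += hits / rank
    return total / len(gold) if gold else 0.0


# Hypothetical toy data: bigram tokens and a manually annotated gold set.
bigrams = ([("prime", "minister")] * 3 + [("the", "minister")] * 2
           + [("the", "state")] * 2 + [("a", "state")] * 1)
gold = {("prime", "minister")}

scores = pmi_scores(bigrams)
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked)
print(average_precision(ranked, gold))
```

On this toy input the single-occurrence pair ("a", "state") receives the highest PMI and pushes the true collocation to rank 2, giving an average precision of 0.5. This illustrates the well-known tendency of PMI to favour low-frequency pairs, which is one reason to compare many measures, and combinations of them, rather than rely on a single score.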