Corpora as Data Sources for the Up-Grading of Morphological Tagging
dc.contributor.author: Osolsobě, Klára
dc.contributor.author: Čermák, Petr
dc.date.accessioned: 2018-05-28T11:04:06Z
dc.date.available: 2018-05-28T11:04:06Z
dc.date.issued: 2015
dc.identifier.uri: http://hdl.handle.net/20.500.11956/96413
dc.description.abstract: Adjectives ending in -oucí/-ící are regularly derived from verbs and are therefore not usually listed in Czech monolingual dictionaries. In automatic morphological analysis of Czech, the dictionary should generate them from verbal roots and tag them as verbal adjectives (POS tag: AG.*). Data from Czech corpora reveal a) inconsistencies in tagging and b) gaps in the dictionary. The main cause of both shortcomings is the existence of variants among the verbal forms from which the verbal adjectives are potentially derived. Consequently, text corpora are a significant source of knowledge about the formation and use of adjectives ending in -oucí/-ící, which is relevant to both a) automatic morphological analysis of Czech and b) the theoretical description of Czech grammar (derivational morphology). Our goal is to present a corpus-based study of the Czech gerund, i.e. verbal adjectives in -oucí/-ící. The link between the inflectional and the word-formation variants is demonstrated using material from the SYN corpus (2.6 billion tokens of written Czech) and the large web corpus czTenTen12 (5.2 billion tokens of Czech text from the Internet, cleaned and deduplicated). [en]
dc.format: pdf
dc.language.iso: cs
dc.publisher: Univerzita Karlova, Filozofická fakulta
dc.source: Časopis pro moderní filologii (Journal for Modern Philology), 2015, 97, 2, 136-145
dc.title: Korpusy jako zdroje dat pro úpravy nástrojů automatické morfologické analýzy (Slovotvorné varianty adjektiv na [(ou)|í]cí z hlediska morfologického značkování) [cs]
dc.type: Vědecký článek [cs]
dcterms.accessRights: openAccess
dcterms.license: https://creativecommons.org/licenses/by-nc-nd/2.0/
dc.title.translated: Corpora as Data Sources for the Up-Grading of Morphological Tagging [en]
dc.publisher.publicationPlace: Praha
uk.internal-type: uk_publication
dc.description.startPage: 136
dc.description.endPage: 145
dcterms.isPartOf.name: Časopis pro moderní filologii (Journal for Modern Philology) [cs]
dcterms.isPartOf.journalYear: 2015
dcterms.isPartOf.journalVolume: 2015
dcterms.isPartOf.journalIssue: 2
dcterms.isPartOf.issn: 2336-6591
dc.relation.isPartOfUrl: https://casopispromodernifilologii.ff.cuni.cz
dc.subject.keyword: verbální adjektivum [cs]
dc.subject.keyword: morfologické značkování [cs]
dc.subject.keyword: automatická morfologická analýza [cs]
dc.subject.keyword: varianta [cs]
dc.subject.keyword: slovotvorba [cs]
dc.subject.keyword: gerund/deverbal adjective [en]
dc.subject.keyword: pos tagging [en]
dc.subject.keyword: automatic morphological analysis [en]
dc.subject.keyword: variant [en]
dc.subject.keyword: derivational [en]
dc.subject.keyword: morphology [en]



