Towards the lemmatization of Old Czech texts: data, software, applications
dc.contributor.author: Synková, Pavlína
dc.contributor.author: Lehečka, Boris
dc.contributor.author: Svoboda, Ondřej
dc.date.accessioned: 2018-11-28T15:02:26Z
dc.date.available: 2018-11-28T15:02:26Z
dc.date.issued: 2018
dc.identifier.issn: 2336-6702
dc.identifier.uri: http://hdl.handle.net/20.500.11956/103953
dc.publisher: Univerzita Karlova, Filozofická fakulta [cs_CZ]
dc.rights.uri: http://creativecommons.org/licenses/by-nc-nd/2.0/
dc.source: Studie z aplikované lingvistiky - Studies in Applied Linguistics, 2018, 9, Special Issue, 66-84 [cs_CZ]
dc.source.uri: https://studiezaplikovanelingvistiky.ff.cuni.cz
dc.subject: common nouns [cs_CZ]
dc.subject: NLP software and applications [cs_CZ]
dc.subject: Old Czech [cs_CZ]
dc.subject: tagging [cs_CZ]
dc.subject: XML [cs_CZ]
dc.subject: apelativa [cs_CZ]
dc.subject: lemmatizace [cs_CZ]
dc.subject: NLP software a aplikace [cs_CZ]
dc.subject: stará čeština [cs_CZ]
dc.subject: tagování [cs_CZ]
dc.subject: XML [cs_CZ]
dc.title: Na cestě k lemmatizaci staročeských textů: data, software, aplikace [cs_CZ]
dc.title.alternative: Towards the lemmatization of Old Czech texts: data, software, applications [cs_CZ]
dc.type: Vědecký článek [cs_CZ]
uk.abstract.en: This paper introduces the description of Old Czech common nouns developed and used in a tool for tagging and lemmatizing common nouns occurring in transcribed digital editions of Old Czech texts. This description consists of four parts: the first features an overview of all declension type endings (approx. 100 declension patterns), the second part analyses alternations in the morphological basis accompanying declension (approx. 120 types of alternations), the third part deals with formal changes connected mainly with the language’s historical development (approx. 100 formal changes) and, finally, the fourth part contains a list of lemmas extracted from modern dictionaries of Old Czech (approx. 29 000 lemmas). Furthermore, the paper introduces the software developed and used for this purpose, namely i) the tool which makes it possible a) to generate word forms and subsequently search for multiple word forms in the texts at once, b) to create lists of word forms filtered by sequences of characters occurring at the end of the word forms, ii) the tool for assigning a declension pattern to a lemma, and iii) the tool enabling work with large databases. Finally, the paper describes two applications developed on the basis of Old Czech common noun description, i.e. i) a database of Old Czech common noun declension patterns connected with Old Czech dictionaries and the Old Czech text bank, ii) a tool for generating word forms, which is used for the lemmatization and tagging of Old Czech texts. [cs_CZ]
uk.internal-type: uk_publication
dc.description.startPage: 66
dc.description.endPage: 84
dcterms.isPartOf.name: Studie z aplikované lingvistiky - Studies in Applied Linguistics [cs_CZ]
dcterms.isPartOf.journalYear: 2018
dcterms.isPartOf.journalVolume: 9
dcterms.isPartOf.journalIssue: Special Issue


Except where otherwise noted, this item's license is described as http://creativecommons.org/licenses/by-nc-nd/2.0/

© 2017 Univerzita Karlova, Ústřední knihovna, Ovocný trh 560/5, 116 36 Praha 1; email: admin-repozitar [at] cuni.cz

Each constituent part of Charles University is responsible for adherence to all provisions of the copyright law.

Notice: Any retrieved information shall not be used for any commercial purposes or claimed as results of studying, scientific or any other creative activities of any person other than the author.
