dc.contributor.author | Synková, Pavlína | |
dc.contributor.author | Lehečka, Boris | |
dc.contributor.author | Svoboda, Ondřej | |
dc.date.accessioned | 2018-11-28T15:02:26Z | |
dc.date.available | 2018-11-28T15:02:26Z | |
dc.date.issued | 2018 | |
dc.identifier.issn | 2336-6702 | |
dc.identifier.uri | http://hdl.handle.net/20.500.11956/103953 | |
dc.publisher | Univerzita Karlova, Filozofická fakulta | cs_CZ |
dc.rights.uri | http://creativecommons.org/licenses/by-nc-nd/2.0/ | |
dc.source | Studie z aplikované lingvistiky - Studies in Applied Linguistics, 2018, 9, Special Issue, 66-84 | cs_CZ |
dc.source.uri | https://studiezaplikovanelingvistiky.ff.cuni.cz | |
dc.subject | common nouns | cs_CZ |
dc.subject | NLP software and applications | cs_CZ |
dc.subject | Old Czech | cs_CZ |
dc.subject | tagging | cs_CZ |
dc.subject | XML | cs_CZ |
dc.subject | lemmatization | cs_CZ |
dc.title | Na cestě k lemmatizaci staročeských textů: data, software, aplikace | cs_CZ |
dc.title.alternative | Towards the lemmatization of Old Czech texts: data, software, applications | cs_CZ |
dc.type | Scientific article | cs_CZ |
uk.abstract.en | This paper introduces the description of Old Czech common nouns developed and used in a tool for tagging and lemmatizing common nouns occurring in transcribed digital editions of Old Czech texts. This description consists of four parts: the first features an overview of all declension type endings (approx. 100 declension patterns), the second part analyses alternations in the morphological basis accompanying declension (approx. 120 types of alternations), the third part deals with formal changes connected mainly with the language’s historical development (approx. 100 formal changes) and, finally, the fourth part contains a list of lemmas extracted from modern dictionaries of Old Czech (approx. 29 000 lemmas). Furthermore, the paper introduces the software developed and used for this purpose, namely i) the tool which makes it possible a) to generate word forms and subsequently search for multiple word forms in the texts at once, b) to create lists of word forms filtered by sequences of characters occurring at the end of the word forms, ii) the tool for assigning a declension pattern to a lemma, and iii) the tool enabling work with large databases. Finally, the paper describes two applications developed on the basis of Old Czech common noun description, i.e. i) a database of Old Czech common noun declension patterns connected with Old Czech dictionaries and the Old Czech text bank, ii) a tool for generating word forms, which is used for the lemmatization and tagging of Old Czech texts. | cs_CZ |
uk.internal-type | uk_publication | |
dc.description.startPage | 66 | |
dc.description.endPage | 84 | |
dcterms.isPartOf.name | Studie z aplikované lingvistiky - Studies in Applied Linguistics | cs_CZ |
dcterms.isPartOf.journalYear | 2018 | |
dcterms.isPartOf.journalVolume | 9 | |
dcterms.isPartOf.journalIssue | Special Issue | |