Kvantitativní charakteristiky termínů

Kováříková, Dominika

Quantitative Characteristics of Terms

dc.contributor.advisor	Čermák, František
dc.creator	Kováříková, Dominika
dc.date.accessioned	2018-10-29T21:26:01Z
dc.date.available	2018-10-29T21:26:01Z
dc.date.issued	2014
dc.identifier.uri	http://hdl.handle.net/20.500.11956/66501
dc.description.abstract	Metoda automatického vyhledávání termínů TERMIT je zaměřena nejen na samotnou úspěšnost, tedy co nejvyšší počet správně vyhledaných termínů, ale v první řadě na vlastnosti, které při identifikaci jednoslovných a víceslovných termínů hrají nejdůležitější roli. Je založena na data miningu, tedy na vytěžování informací z velkých objemů (korpusových) dat. Metoda TERMIT se při rozpoznávání termínů v reálných textech i při hledání podstatných kvantitativních rysů termínů osvědčila. Na jejím základě je možné jednoslovný termín charakterizovat jako slovo, které se v odborných textech daného oboru vyskytuje výrazně častěji než v textech neakademických, vyskytuje se jen v malém počtu akademických disciplín, v celém korpusu (SYN2010) je nerovnoměrně rozložené a málo frekventované a rozestupy mezi jeho jednotlivými výskyty jsou nepravidelné. Víceslovný termín je podle výsledků metody TERMIT ustálená kolokace složená z méně frekventovaných slov, která obvykle obsahuje alespoň jedno slovo s vysokou terminologickou platností, tedy jednoslovný termín. S pomocí těchto charakteristik termínů lze více než 95 % textu zařadit správně mezi jednoslovné i víceslovné termíny a netermíny. Na...	cs_CZ
dc.description.abstract	TERMIT, a new method of automatic term recognition, aims not only at a high number of correctly labeled terms but also at identifying the attributes of a term that matter most in the automatic term identification process. The method is based on data mining, i.e. extracting meaningful information from very large volumes of corpus data. It proved able both to successfully identify terms in academic texts and to find the constitutive features of a term as a terminological unit. The single-word term (SWT) can be characterized as a word with a low frequency in the corpus (SYN2010) that occurs considerably more often in specialized texts of a given field than in non-academic texts, occurs in only a small number of academic disciplines, is unevenly distributed across the corpus, and has irregular distances between its individual occurrences. The multi-word term (MWT) is a stable collocation that consists of low-frequency words and contains at least one (and often more than one) single-word term. Based on these characteristics of SWTs and MWTs, it is possible to classify individual tokens in texts as terms or non-terms with a success rate of more than 95 %. Automatically identified terms can be used to determine the percentage of SWTs or MWTs in different academic disciplines, as well as to find terms shared by two or more domains in order to assess their...	en_US
dc.language	Čeština	cs_CZ
dc.language.iso	cs_CZ
dc.publisher	Univerzita Karlova, Filozofická fakulta	cs_CZ
dc.subject	single-word term	en_US
dc.subject	multi-word term	en_US
dc.subject	characteristics of terms	en_US
dc.subject	automatic term recognition	en_US
dc.subject	data mining	en_US
dc.subject	jednoslovný termín	cs_CZ
dc.subject	víceslovný termín	cs_CZ
dc.subject	charakteristiky termínů	cs_CZ
dc.subject	automatické vyhledávání termínů	cs_CZ
dc.subject	data mining	cs_CZ
dc.title	Kvantitativní charakteristiky termínů	cs_CZ
dc.type	dizertační práce	cs_CZ
dcterms.created	2014
dcterms.dateAccepted	2014-12-17
dc.description.department	Institute of Czech Language and Theory of Communication	en_US
dc.description.department	Ústav českého jazyka a teorie komunikace	cs_CZ
dc.description.faculty	Filozofická fakulta	cs_CZ
dc.description.faculty	Faculty of Arts	en_US
dc.identifier.repId	102157
dc.title.translated	Quantitative Characteristics of Terms	en_US
dc.contributor.referee	Bozděchová, Ivana
dc.contributor.referee	Machová, Svatava
dc.identifier.aleph	001879387
thesis.degree.name	Ph.D.
thesis.degree.level	doktorské	cs_CZ
thesis.degree.discipline	Czech Language	en_US
thesis.degree.discipline	Český jazyk	cs_CZ
thesis.degree.program	Philology	en_US
thesis.degree.program	Filologie	cs_CZ
uk.thesis.type	dizertační práce	cs_CZ
uk.taxonomy.organization-cs	Filozofická fakulta::Ústav českého jazyka a teorie komunikace	cs_CZ
uk.taxonomy.organization-en	Faculty of Arts::Institute of Czech Language and Theory of Communication	en_US
uk.faculty-name.cs	Filozofická fakulta	cs_CZ
uk.faculty-name.en	Faculty of Arts	en_US
uk.faculty-abbr.cs	FF	cs_CZ
uk.degree-discipline.cs	Český jazyk	cs_CZ
uk.degree-discipline.en	Czech Language	en_US
uk.degree-program.cs	Filologie	cs_CZ
uk.degree-program.en	Philology	en_US
thesis.grade.cs	Prospěl/a	cs_CZ
thesis.grade.en	Pass	en_US
uk.abstract.cs	Metoda automatického vyhledávání termínů TERMIT je zaměřena nejen na samotnou úspěšnost, tedy co nejvyšší počet správně vyhledaných termínů, ale v první řadě na vlastnosti, které při identifikaci jednoslovných a víceslovných termínů hrají nejdůležitější roli. Je založena na data miningu, tedy na vytěžování informací z velkých objemů (korpusových) dat. Metoda TERMIT se při rozpoznávání termínů v reálných textech i při hledání podstatných kvantitativních rysů termínů osvědčila. Na jejím základě je možné jednoslovný termín charakterizovat jako slovo, které se v odborných textech daného oboru vyskytuje výrazně častěji než v textech neakademických, vyskytuje se jen v malém počtu akademických disciplín, v celém korpusu (SYN2010) je nerovnoměrně rozložené a málo frekventované a rozestupy mezi jeho jednotlivými výskyty jsou nepravidelné. Víceslovný termín je podle výsledků metody TERMIT ustálená kolokace složená z méně frekventovaných slov, která obvykle obsahuje alespoň jedno slovo s vysokou terminologickou platností, tedy jednoslovný termín. S pomocí těchto charakteristik termínů lze více než 95 % textu zařadit správně mezi jednoslovné i víceslovné termíny a netermíny. Na...	cs_CZ
uk.abstract.en	TERMIT, a new method of automatic term recognition, aims not only at a high number of correctly labeled terms but also at identifying the attributes of a term that matter most in the automatic term identification process. The method is based on data mining, i.e. extracting meaningful information from very large volumes of corpus data. It proved able both to successfully identify terms in academic texts and to find the constitutive features of a term as a terminological unit. The single-word term (SWT) can be characterized as a word with a low frequency in the corpus (SYN2010) that occurs considerably more often in specialized texts of a given field than in non-academic texts, occurs in only a small number of academic disciplines, is unevenly distributed across the corpus, and has irregular distances between its individual occurrences. The multi-word term (MWT) is a stable collocation that consists of low-frequency words and contains at least one (and often more than one) single-word term. Based on these characteristics of SWTs and MWTs, it is possible to classify individual tokens in texts as terms or non-terms with a success rate of more than 95 %. Automatically identified terms can be used to determine the percentage of SWTs or MWTs in different academic disciplines, as well as to find terms shared by two or more domains in order to assess their...	en_US
uk.file-availability	V
uk.publication.place	Praha	cs_CZ
uk.grantor	Univerzita Karlova, Filozofická fakulta, Ústav českého jazyka a teorie komunikace	cs_CZ
thesis.grade.code	P
dc.identifier.lisID	990018793870106986