Analýza starých manuskriptů

Piptová, Marcela

Medieval manuscripts' analysis

dc.contributor.advisor	Šikudová, Elena
dc.creator	Piptová, Marcela
dc.date.accessioned	2021-07-23T10:05:57Z
dc.date.available	2021-07-23T10:05:57Z
dc.date.issued	2021
dc.identifier.uri	http://hdl.handle.net/20.500.11956/127964
dc.description.abstract	Tato práce se věnuje analýze historických manuskriptů s využitím statistických metod. Konkrétně se jedná o binarizaci dokumentu, tj. oddělení popředí od pozadí, dále detekci řádek textu a nakonec rozdělování těchto řádek na jednotlivá slova. Oproti tištěným dokumentům je tento proces ovšem značně komplikován obecně horší kvalitou rukopisů, nepravidelnou strukturou dokumentu, ozdobnými prvky přímo v textu apod. V práci uvádíme možné přístupy k řešení těchto problémů a detailně popisujeme algoritmus, který byl navržen a zvolen k implementaci. Důraz je kladen zejména na to, aby byly co nejlépe nalezeny a odstraněny netextové oblasti (iluminace apod.) v dokumentu. Součástí práce jsou i experimenty a vyhodnocení úspěšnosti zvolené metody.	cs_CZ
dc.description.abstract	This thesis deals with the analysis of medieval manuscripts using statistical methods. First, the document is binarized, i.e. the foreground is separated from the background. Next, text lines are detected. Finally, the detected text lines are split into individual words. Compared to printed documents, this process is more complicated for historical manuscripts because of their age, irregular page layout, and non-textual parts (images) within the text. The thesis discusses various approaches to these problems. Particular attention is paid to the algorithm that was designed and implemented to detect and remove non-textual parts of the document. Experimental results are included and evaluated.	en_US
dc.language	Čeština	cs_CZ
dc.language.iso	cs_CZ
dc.publisher	Univerzita Karlova, Matematicko-fyzikální fakulta	cs_CZ
dc.subject	binarizace|segmentace|detekce řádků	cs_CZ
dc.subject	binarization|segmentation|line detection	en_US
dc.title	Analýza starých manuskriptů	cs_CZ
dc.type	bakalářská práce	cs_CZ
dcterms.created	2021
dcterms.dateAccepted	2021-07-02
dc.description.department	Department of Software and Computer Science Education	en_US
dc.description.department	Katedra softwaru a výuky informatiky	cs_CZ
dc.description.faculty	Matematicko-fyzikální fakulta	cs_CZ
dc.description.faculty	Faculty of Mathematics and Physics	en_US
dc.identifier.repId	235695
dc.title.translated	Medieval manuscripts' analysis	en_US
dc.contributor.referee	Bída, Michal
thesis.degree.name	Bc.
thesis.degree.level	bakalářské	cs_CZ
thesis.degree.discipline	General Computer Science	en_US
thesis.degree.discipline	Obecná informatika	cs_CZ
thesis.degree.program	Informatika	cs_CZ
thesis.degree.program	Computer Science	en_US
uk.thesis.type	bakalářská práce	cs_CZ
uk.taxonomy.organization-cs	Matematicko-fyzikální fakulta::Katedra softwaru a výuky informatiky	cs_CZ
uk.taxonomy.organization-en	Faculty of Mathematics and Physics::Department of Software and Computer Science Education	en_US
uk.faculty-name.cs	Matematicko-fyzikální fakulta	cs_CZ
uk.faculty-name.en	Faculty of Mathematics and Physics	en_US
uk.faculty-abbr.cs	MFF	cs_CZ
uk.degree-discipline.cs	Obecná informatika	cs_CZ
uk.degree-discipline.en	General Computer Science	en_US
uk.degree-program.cs	Informatika	cs_CZ
uk.degree-program.en	Computer Science	en_US
thesis.grade.cs	Výborně	cs_CZ
thesis.grade.en	Excellent	en_US
uk.abstract.cs	Tato práce se věnuje analýze historických manuskriptů s využitím statistických metod. Konkrétně se jedná o binarizaci dokumentu, tj. oddělení popředí od pozadí, dále detekci řádek textu a nakonec rozdělování těchto řádek na jednotlivá slova. Oproti tištěným dokumentům je tento proces ovšem značně komplikován obecně horší kvalitou rukopisů, nepravidelnou strukturou dokumentu, ozdobnými prvky přímo v textu apod. V práci uvádíme možné přístupy k řešení těchto problémů a detailně popisujeme algoritmus, který byl navržen a zvolen k implementaci. Důraz je kladen zejména na to, aby byly co nejlépe nalezeny a odstraněny netextové oblasti (iluminace apod.) v dokumentu. Součástí práce jsou i experimenty a vyhodnocení úspěšnosti zvolené metody.	cs_CZ
uk.abstract.en	This thesis deals with the analysis of medieval manuscripts using statistical methods. First, the document is binarized, i.e. the foreground is separated from the background. Next, text lines are detected. Finally, the detected text lines are split into individual words. Compared to printed documents, this process is more complicated for historical manuscripts because of their age, irregular page layout, and non-textual parts (images) within the text. The thesis discusses various approaches to these problems. Particular attention is paid to the algorithm that was designed and implemented to detect and remove non-textual parts of the document. Experimental results are included and evaluated.	en_US
uk.file-availability	V
uk.grantor	Univerzita Karlova, Matematicko-fyzikální fakulta, Katedra softwaru a výuky informatiky	cs_CZ
thesis.grade.code	1
uk.publication-place	Praha	cs_CZ
uk.thesis.defenceStatus	O