Adaptabilní heuristika pro vyhodnocování podobnosti zdrojových textů

Vodsloň, František

Adaptable heuristic for source code similarity measuring.

dc.contributor.advisor	Holan, Tomáš
dc.creator	Vodsloň, František
dc.date.accessioned	2017-04-21T06:12:00Z
dc.date.available	2017-04-21T06:12:00Z
dc.date.issued	2010
dc.identifier.uri	http://hdl.handle.net/20.500.11956/30601
dc.description.abstract	Princip fungování většiny existujících systémů pro vyhledávání plagiátů v zadané množině zdrojových textů spočívá v postupném porovnávání každého textu s ostatními texty v množině. Ve většině případů vyjde spočítaná míra podobnosti natolik malá, že se dále není třeba danou dvojicí souborů zabývat (můžeme s jistotou na základě dosažené míry podobnosti prohlásit, že se nejedná o plagiáty). Cílem této práce je navrhnout algoritmus pro předvýběr dvojic souborů určených k porovnání. Heuristický algoritmus by měl efektivně odhadovat výsledky složitějšího porovnávacího programu a na základě tohoto odhadu rozhodovat, zda připustit dvojici zdrojových textů k porovnání. Algoritmus by měl být adaptabilní v tom smyslu, že by měnil svoje odhady v závislosti na spektru zdrojových textů obsažených v systému.	cs_CZ
dc.description.abstract	Most systems for detecting plagiarism within a set of source codes work by sequentially comparing each source code with all the other source codes in the set. In most cases the computed similarity is so low that we can conclude the compared codes are not plagiarized. The purpose of this work is to create a heuristic algorithm for pre-selecting the pairs of source codes to be compared. The heuristic algorithm should efficiently approximate the results of the main comparison program, which is more complicated and slower. Based on the result of the heuristic algorithm, the plagiarism detection system then decides whether a source code pair will be compared using the main comparison program. The algorithm should be self-adapting: it should be able to improve itself depending on the set of source codes stored in the system.	en_US
dc.language	Čeština	cs_CZ
dc.language.iso	cs_CZ
dc.publisher	Univerzita Karlova, Matematicko-fyzikální fakulta	cs_CZ
dc.title	Adaptabilní heuristika pro vyhodnocování podobnosti zdrojových textů	cs_CZ
dc.type	diplomová práce	cs_CZ
dcterms.created	2010
dcterms.dateAccepted	2010-02-02
dc.description.department	Department of Software and Computer Science Education	en_US
dc.description.department	Katedra softwaru a výuky informatiky	cs_CZ
dc.description.faculty	Faculty of Mathematics and Physics	en_US
dc.description.faculty	Matematicko-fyzikální fakulta	cs_CZ
dc.identifier.repId	49853
dc.title.translated	Adaptable heuristic for source code similarity measuring.	en_US
dc.contributor.referee	Kopecký, Michal
dc.identifier.aleph	001196804
thesis.degree.name	Mgr.
thesis.degree.level	navazující magisterské	cs_CZ
thesis.degree.discipline	Softwarové systémy	cs_CZ
thesis.degree.discipline	Software Systems	en_US
thesis.degree.program	Informatika	cs_CZ
thesis.degree.program	Computer Science	en_US
uk.thesis.type	diplomová práce	cs_CZ
uk.taxonomy.organization-cs	Matematicko-fyzikální fakulta::Katedra softwaru a výuky informatiky	cs_CZ
uk.taxonomy.organization-en	Faculty of Mathematics and Physics::Department of Software and Computer Science Education	en_US
uk.faculty-name.cs	Matematicko-fyzikální fakulta	cs_CZ
uk.faculty-name.en	Faculty of Mathematics and Physics	en_US
uk.faculty-abbr.cs	MFF	cs_CZ
uk.degree-discipline.cs	Softwarové systémy	cs_CZ
uk.degree-discipline.en	Software Systems	en_US
uk.degree-program.cs	Informatika	cs_CZ
uk.degree-program.en	Computer Science	en_US
thesis.grade.cs	Výborně	cs_CZ
thesis.grade.en	Excellent	en_US
uk.abstract.cs	Princip fungování většiny existujících systémů pro vyhledávání plagiátů v zadané množině zdrojových textů spočívá v postupném porovnávání každého textu s ostatními texty v množině. Ve většině případů vyjde spočítaná míra podobnosti natolik malá, že se dále není třeba danou dvojicí souborů zabývat (můžeme s jistotou na základě dosažené míry podobnosti prohlásit, že se nejedná o plagiáty). Cílem této práce je navrhnout algoritmus pro předvýběr dvojic souborů určených k porovnání. Heuristický algoritmus by měl efektivně odhadovat výsledky složitějšího porovnávacího programu a na základě tohoto odhadu rozhodovat, zda připustit dvojici zdrojových textů k porovnání. Algoritmus by měl být adaptabilní v tom smyslu, že by měnil svoje odhady v závislosti na spektru zdrojových textů obsažených v systému.	cs_CZ
uk.abstract.en	Most systems for detecting plagiarism within a set of source codes work by sequentially comparing each source code with all the other source codes in the set. In most cases the computed similarity is so low that we can conclude the compared codes are not plagiarized. The purpose of this work is to create a heuristic algorithm for pre-selecting the pairs of source codes to be compared. The heuristic algorithm should efficiently approximate the results of the main comparison program, which is more complicated and slower. Based on the result of the heuristic algorithm, the plagiarism detection system then decides whether a source code pair will be compared using the main comparison program. The algorithm should be self-adapting: it should be able to improve itself depending on the set of source codes stored in the system.	en_US
uk.publication.place	Praha	cs_CZ
uk.grantor	Univerzita Karlova, Matematicko-fyzikální fakulta, Katedra softwaru a výuky informatiky	cs_CZ
dc.identifier.lisID	990011968040106986