Automatic Generation of Synthetic XML Documents

Betík, Roman

Automatické generování umělých XML dokumentů

Master's thesis (DEFENDED)

Permanent link

http://hdl.handle.net/20.500.11956/78270

Identifiers

SIS: 167483

Charles University catalog: 990020261980106986

Thesis opponent

Svoboda, Martin

Faculty / unit

Faculty of Mathematics and Physics

Field of study

Software Systems

Department / institute / clinic

Department of Software Engineering

Date of defense

9 September 2015

Publisher

Charles University, Faculty of Mathematics and Physics

Language

English

Grade

Very good

Keywords (Czech)

XML, JSON, Big Data, generátor, testování, benchmark, umělá data, NoSQL

Keywords (English)

XML, JSON, Big Data, generator, testing, benchmark, synthetic data, NoSQL

The aim of this thesis is to explore the possibilities and limitations of generating synthetic XML and JSON documents used in the area of Big Data. The first part examines the properties of the most widely used XML generators and of Big Data and JSON generators, and compares them. The next part describes the design of a custom algorithm for generating semi-structured data. The algorithm focuses on parallel execution of the generation process while retaining the ability to control the content of the generated documents. The generator can use samples of real data when generating synthetic data, and it can also automatically generate simple references between output documents in JSON format. The last part presents the results of experiments with the generator in testing the MongoDB database, describes its contribution, and compares it with other solutions.

Abstract (English)

The aim of this thesis is to research the current possibilities and limitations of automatically generating synthetic XML and JSON documents used in the area of Big Data. The first part of the work discusses the properties of the most widely used XML data generators and of Big Data and JSON generators, and compares them. The next part of the thesis proposes an algorithm for generating semi-structured data. The algorithm focuses on parallel execution of the generation process while preserving the ability to control the contents of the generated documents. The data generator can also use samples of real data when generating synthetic data, and it can automatically create simple references between JSON documents. The last part of the thesis presents the results of experiments in which the data generator was used to test the MongoDB database, describes its added value, and compares it to other solutions.
