Efficient representation of k-mer sets
Efektivní reprezentace množin k-merů
bachelor thesis (DEFENDED)
Permanent link
http://hdl.handle.net/20.500.11956/184307
Identifiers
Study Information System: 249202
Collections
- Qualification theses [10928]
Author
Advisor
Consultant
Břinda, Karel
Referee
Kolman, Petr
Faculty / Institute
Faculty of Mathematics and Physics
Discipline
General Computer Science
Department
Computer Science Institute of Charles University
Date of defense
7. 9. 2023
Publisher
Univerzita Karlova, Matematicko-fyzikální fakulta
Language
English
Grade
Good
Keywords (Czech)
množiny k-merů|nejkratší nadřetězec|bioinformatika|hladový algoritmus
Keywords (English)
k-mer sets|shortest superstring|bioinformatics|greedy algorithm
Abstract
In this thesis we explore and compare various methods for efficient k-mer set representation. We evaluate traditional de Bruijn graph representation techniques against greedy approximation algorithms for the Shortest Superstring Problem. We describe the linear-time implementation of the well-known Greedy algorithm by Ukkonen [1990] and extend it to another related algorithm, called TGreedy. In addition, we test selected algorithms on a bacterial genome and pangenome to highlight the differences in the size of their output representation and the computational resources used, providing an insight into their respective efficiencies.