Machine learning-based identification of separating features in molecular fragments
Identifikace separujících vlastností molekulárních fragmentů pomocí strojového učení
Bachelor's thesis (DEFENDED)
Permanent link
http://hdl.handle.net/20.500.11956/2093
Identifiers
SIS: 172518
Collection
- Theses [10676]
Author
Supervisor
Opponent
Škoda, Petr
Faculty / unit
Faculty of Mathematics and Physics
Field of study
General Computer Science
Department / institute / clinic
Department of Software Engineering
Date of defense
31 January 2017
Publisher
Charles University, Faculty of Mathematics and Physics
Language
English
Grade
Excellent
Keywords (Czech)
cheminformatika, strojové učení, molekulární reprezentace
Keywords (English)
cheminformatics, machine learning, molecular representation
The chosen molecular representation is one of the key parameters of virtual screening campaigns, in which one searches in silico for molecules that are active with respect to a given macromolecular target. Most campaigns employ a molecular representation in which a molecule is represented by the presence or absence of a predefined set of topological fragments. Often, this information is enriched by physicochemical features of these fragments, i.e., the representation distinguishes fragments with identical topology but different features. For a given molecular representation, however, most approaches use the same static set of features irrespective of the specific target. The goal of this thesis is, given a set of known active and inactive molecules with respect to a target, to study the possibilities of parameterizing a fragment-based molecular representation with feature weights that depend on the given target. In this setting, we are given a very general molecular representation, with targets represented by sets of known active and inactive molecules. We subsequently propose a machine-learning approach that identifies which of the features are relevant for the given target. This will be done using a multi-stage pipeline that includes data preprocessing using statistical imputation and dimensionality...
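As a rough illustration of the idea in the abstract, the sketch below imputes missing feature values with column means and then ranks features by how well they separate known actives from inactives. It is not the thesis pipeline: the feature names (`logP`, `h_donors`, `aromatic`), the toy data, and the separation score (gap between class means divided by the summed class spreads) are all assumptions chosen for illustration, and the dimensionality step mentioned in the abstract is not modeled.

```python
from statistics import mean, pstdev


def impute_mean(rows):
    """Replace None entries with the mean of the observed values in that column."""
    n_cols = len(rows[0])
    col_means = []
    for j in range(n_cols):
        observed = [r[j] for r in rows if r[j] is not None]
        # Fall back to 0.0 if a column has no observed values at all.
        col_means.append(mean(observed) if observed else 0.0)
    return [[col_means[j] if r[j] is None else r[j] for j in range(n_cols)]
            for r in rows]


def separation_scores(rows, labels):
    """Score each feature by |mean(actives) - mean(inactives)| / summed class spread.

    This score is an illustrative assumption, not the method used in the thesis.
    """
    actives = [r for r, y in zip(rows, labels) if y == 1]
    inactives = [r for r, y in zip(rows, labels) if y == 0]
    scores = []
    for j in range(len(rows[0])):
        a = [r[j] for r in actives]
        b = [r[j] for r in inactives]
        gap = abs(mean(a) - mean(b))
        spread = pstdev(a) + pstdev(b)
        scores.append(gap / (spread + 1e-9))  # small constant avoids division by zero
    return scores


# Hypothetical fragment features and toy molecules (1 = active, 0 = inactive).
feature_names = ["logP", "h_donors", "aromatic"]
X = [
    [2.0, 1.0, 1.0],
    [2.2, None, 1.0],
    [0.5, 1.0, 1.0],
    [0.3, 1.0, None],
]
y = [1, 1, 0, 0]

X_imputed = impute_mean(X)
scores = separation_scores(X_imputed, y)
ranked = sorted(zip(feature_names, scores), key=lambda t: t[1], reverse=True)

for name, score in ranked:
    print(f"{name}: {score:.2f}")
```

In this toy data only `logP` differs between the two classes, so it ranks first; in a target-dependent weighting scheme such scores could serve as a starting point for per-feature weights.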