Aplikace teorie informace na studium učení hlubokých neuronových sítí
Applications of information theory to the study of deep learning
bachelor thesis (DEFENDED)
Permanent link
http://hdl.handle.net/20.500.11956/202629
Identifiers
Study Information System: 282204
Collections
- Kvalifikační práce [11981]
Author
Advisor
Referee
Schmid, Martin
Faculty / Institute
Faculty of Mathematics and Physics
Discipline
Mathematics for Information Technologies
Department
Department of Applied Mathematics
Date of defense
5. 9. 2025
Publisher
Univerzita Karlova, Matematicko-fyzikální fakulta
Language
Czech
Grade
Excellent
Keywords (Czech)
Hluboké učení|Strojové učení|Teorie informačního hrdla|Neuronové sítě
Keywords (English)
Deep Learning|information bottleneck theory|Machine Learning|Neural Networks
Abstract (Czech)
Přes pozoruhodný empirický úspěch hlubokého učení (deep learning) jeho teoretické základy zaostávají. Teorie informace poskytuje účinný rámec pro analýzu vnitřních reprezentací v sítích, zejména díky nedávnému pokroku v teorii informačního hrdla (information bottleneck, IB) a konceptu informační roviny. Tato práce zkoumá, jak struktura informační roviny, konkrétně shlukování vnitřních reprezentací, ovlivňuje výkon neuronových sítí. Představujeme Purity teorii, nový rámec pro kvantifikaci shlukování reprezentací po vrstvách, který doplňuje stávající IB perspektivy. Naše analýza odhaluje významnou korelaci mezi strukturou informační roviny a schopností generalizace v úlohách binární klasifikace. Na základě této korelace navrhujeme novou metriku založenou na teorii informace, která účinně předpovídá schopnost modelu generalizovat. Dále vyvíjíme algoritmus pro výběr modelu využívající tuto metriku, který prokazatelně překonává výběr založený výhradně na trénovací ztrátě (train loss).
Abstract (English)
Despite deep learning's remarkable empirical success, its theoretical underpinnings lag behind. Information theory provides a powerful framework for analyzing internal network representations, particularly through recent advances in information bottleneck (IB) theory and the information plane. This thesis investigates how the structure of the information plane, specifically the clustering behavior of internal representations, influences neural network performance. We introduce Purity theory, a novel framework for quantifying layer-wise clustering, complementing established IB perspectives. Our analysis reveals a significant correlation between information plane structure and generalization performance in binary classification tasks. Leveraging this correlation, we propose a new information-theoretic metric that effectively predicts model generalization capability. Furthermore, we develop a model selection algorithm based on this metric, which demonstrably outperforms selection based solely on training loss.
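The information plane mentioned in the abstract plots, for each layer's representation T, the mutual informations I(X; T) and I(T; Y). As a minimal illustration of how such coordinates are commonly estimated (a toy plug-in histogram sketch, not the estimator or the Purity metric from the thesis itself), one can discretize activations and compute mutual information from the empirical joint distribution:

```python
import numpy as np

def discretize(rows, n_bins=10):
    """Map each row of real-valued activations to an integer id by
    binning every coordinate and deduplicating the binned tuples."""
    lo, hi = rows.min(), rows.max()
    binned = np.floor((rows - lo) / (hi - lo + 1e-12) * n_bins).astype(int)
    _, ids = np.unique(binned, axis=0, return_inverse=True)
    return ids.reshape(-1)

def mutual_information(a, b):
    """Plug-in estimate of I(A; B) in bits for two discrete variables."""
    joint = np.zeros((a.max() + 1, b.max() + 1))
    np.add.at(joint, (a, b), 1)        # empirical joint counts
    joint /= joint.sum()               # normalize to probabilities
    pa = joint.sum(axis=1, keepdims=True)
    pb = joint.sum(axis=0, keepdims=True)
    nz = joint > 0
    return float((joint[nz] * np.log2(joint[nz] / (pa * pb)[nz])).sum())

# Toy "layer": T is a noisy squashing of a 1-D input X, and Y is a
# binary label derived from X, mimicking binary classification.
rng = np.random.default_rng(0)
x = rng.normal(size=(1000, 1))
t = np.tanh(3 * x) + 0.05 * rng.normal(size=x.shape)
y = (x[:, 0] > 0).astype(int)

x_id, t_id = discretize(x), discretize(t)
mi_xt = mutual_information(x_id, t_id)   # information-plane x-coordinate
mi_ty = mutual_information(t_id, y)      # information-plane y-coordinate
print(round(mi_xt, 3), round(mi_ty, 3))
```

Tracking (I(X; T), I(T; Y)) per layer over training epochs yields the trajectories that IB theory analyzes; note that plug-in estimates like this are sensitive to the choice of bin count.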
