Smoothness of Functions Learned by Neural Networks
Hladkost funkcí naučených neuronovými sítěmi
Bachelor thesis (defended)
Permanent link
http://hdl.handle.net/20.500.11956/119446
Identifiers
Study Information System: 224645
Collections
Kvalifikační práce (Qualification theses)
Author
Advisor
Referee
Straka, Milan
Faculty / Institute
Faculty of Mathematics and Physics
Discipline
General Computer Science
Department
Institute of Formal and Applied Linguistics
Date of defense
7. 7. 2020
Publisher
Univerzita Karlova, Matematicko-fyzikální fakulta
Language
English
Grade
Excellent
Keywords (Czech)
strojové učení, neuronové sítě, hladkost, zobecňování
Keywords (English)
machine learning, neural networks, smoothness, generalization
Abstract
Modern neural networks can easily fit their training set perfectly. Surprisingly, they generalize well despite being "overfit" in this way, defying the bias-variance trade-off. A prevalent explanation is that stochastic gradient descent (SGD) has an implicit bias that leads it to learn simple functions, and that these simple functions generalize well. However, the specifics of this implicit bias are not well understood. In this work, we explore the hypothesis that SGD is implicitly biased towards learning smooth functions. We propose several measures that formalize the intuitive notion of smoothness and conduct experiments to determine whether SGD implicitly optimizes them. We rule out smoothness measures based on first derivatives (the gradient); measures based on second derivatives (the Hessian), on the other hand, show promising results.
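The abstract contrasts two families of smoothness measures without defining them here. As a minimal illustrative sketch (an assumption, not the thesis's actual definitions), a first-derivative measure could be the mean norm of the network's input gradient, and a second-derivative measure a Hutchinson estimate of the mean input-Hessian trace. The PyTorch functions `gradient_norm` and `hessian_trace` and the toy model below are hypothetical names introduced only for illustration.

```python
import torch
import torch.nn as nn

def gradient_norm(model: nn.Module, x: torch.Tensor) -> torch.Tensor:
    """First-derivative measure: mean L2 norm of d(output)/d(input)."""
    x = x.clone().requires_grad_(True)
    y = model(x)
    # Summing the scalar outputs lets one backward pass produce
    # per-sample input gradients (samples do not interact).
    (g,) = torch.autograd.grad(y.sum(), x)
    return g.flatten(1).norm(dim=1).mean()

def hessian_trace(model: nn.Module, x: torch.Tensor,
                  n_probes: int = 10) -> torch.Tensor:
    """Second-derivative measure: Hutchinson estimate of the mean
    input-Hessian trace, tr(H) ~ E[v^T H v] for random +/-1 vectors v."""
    x = x.clone().requires_grad_(True)
    y = model(x)
    (g,) = torch.autograd.grad(y.sum(), x, create_graph=True)
    estimate = x.new_zeros(())
    for _ in range(n_probes):
        v = torch.randint_like(x, high=2) * 2 - 1  # Rademacher probe
        # Hessian-vector product via a second backward pass.
        (hv,) = torch.autograd.grad((g * v).sum(), x, retain_graph=True)
        estimate = estimate + (v * hv).flatten(1).sum(dim=1).mean()
    return estimate / n_probes

# Toy usage on random data.
model = nn.Sequential(nn.Linear(10, 64), nn.Tanh(), nn.Linear(64, 1))
x = torch.randn(32, 10)
print(gradient_norm(model, x).item(), hessian_trace(model, x).item())
```

The toy model uses Tanh rather than ReLU because a piecewise-linear network has a zero input Hessian almost everywhere, which would make any second-derivative measure trivially vanish.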