Smoothness of Functions Learned by Neural Networks

Volhejn, Václav

Hladkost funkcí naučených neuronovými sítěmi

bakalářská práce (OBHÁJENO)

Zobrazit/otevřít

Záznam o průběhu obhajoby (151.8Kb)

Trvalý odkaz

http://hdl.handle.net/20.500.11956/119446

Identifikátory

SIS: 224645

Oponent práce

Straka, Milan

Fakulta / součást

Matematicko-fyzikální fakulta

Obor

Obecná informatika

Katedra / ústav / klinika

Ústav formální a aplikované lingvistiky

Datum obhajoby

7. 7. 2020

Nakladatel

Univerzita Karlova, Matematicko-fyzikální fakulta

Jazyk

Angličtina

Známka

Výborně

Klíčová slova (česky)

strojové učení, neuronové sítě, hladkost, zobecňování

Klíčová slova (anglicky)

machine learning, neural networks, smoothness, generalization

Modern neural networks can easily fit their training set perfectly. Surprisingly, they generalize well despite being "overfit" in this way, defying the bias-variance trade-off. A prevalent explanation is that stochastic gradient descent has an implicit bias which leads it to learn functions that are simple, and these simple functions generalize well. However, the specifics of this implicit bias are not well understood. In this work, we explore the hypothesis that SGD is implicitly biased towards learning functions that are smooth. We propose several measures to formalize the intuitive notion of smoothness, and conduct experiments to determine whether these measures are implicitly being optimized for. We exclude the possibility that smoothness measures based on first derivatives (the gradient) are being implicitly optimized for. Measures based on second derivatives (the Hessian), on the other hand, show promising results. 1

Citace dokumentu

Metadata

Zobrazit celý záznam