Comparison of approaches to text classification
Porovnání přístupů ke klasifikaci textu
bakalářská práce (OBHÁJENO)
Zobrazit/ otevřít
Trvalý odkaz
http://hdl.handle.net/20.500.11956/117016Identifikátory
SIS: 208448
Kolekce
- Kvalifikační práce [11242]
Autor
Vedoucí práce
Oponent práce
Vidová Hladká, Barbora
Fakulta / součást
Matematicko-fyzikální fakulta
Obor
Obecná informatika
Katedra / ústav / klinika
Ústav formální a aplikované lingvistiky
Datum obhajoby
5. 9. 2019
Nakladatel
Univerzita Karlova, Matematicko-fyzikální fakultaJazyk
Angličtina
Známka
Výborně
Klíčová slova (česky)
NLP, klasifikace textu, strojové učení, klasifikace recenzíKlíčová slova (anglicky)
NLP, text classification, machine learning, review classificationThe focus of this thesis is short text classification. Short text is the prevailing form of text on e-commerce and review platforms, such as Yelp, Tripadvisor or Heureka. As the popularity of the online communication is increasing, it is becoming infeasible for users to filter information manually. It is therefore becoming more and more important to recog- nise the relevant information in text. Classification of reviews is especially challenging, because they have limited structure, use informal language, contain a high number of errors and rely heavily on context and common knowledge. One of the possible appli- cations of machine learning is to automatically filter data and show users only relevant pieces of information. We work with restaurant reviews from Yelp and aim to predict their usefulness. Most restaurants have relatively many reviews, yet only few are truly useful. Our objective is to compare machine learning methods for predicting usefulness. 1