Comparison of approaches to text classification
Porovnání přístupů ke klasifikaci textu
bachelor thesis (DEFENDED)

Permanent link
http://hdl.handle.net/20.500.11956/117016
Identifiers
Study Information System: 208448
Collections
- Kvalifikační práce (Qualification theses)
Author
Advisor
Referee
Vidová Hladká, Barbora
Faculty / Institute
Faculty of Mathematics and Physics
Discipline
General Computer Science
Department
Institute of Formal and Applied Linguistics
Date of defense
5 September 2019
Publisher
Univerzita Karlova, Matematicko-fyzikální fakulta (Charles University, Faculty of Mathematics and Physics)
Language
English
Grade
Excellent
Keywords (Czech)
NLP, klasifikace textu, strojové učení, klasifikace recenzí
Keywords (English)
NLP, text classification, machine learning, review classification
Abstract
The focus of this thesis is short text classification. Short text is the prevailing form of text on e-commerce and review platforms such as Yelp, Tripadvisor or Heureka. As online communication grows in popularity, it is becoming infeasible for users to filter information manually, so it is increasingly important to recognise the relevant information in text. Classification of reviews is especially challenging, because reviews have limited structure, use informal language, contain many errors and rely heavily on context and common knowledge. One possible application of machine learning is to filter such data automatically and show users only the relevant pieces of information. We work with restaurant reviews from Yelp and aim to predict their usefulness. Most restaurants have a relatively large number of reviews, yet only a few of them are truly useful. Our objective is to compare machine learning methods for predicting usefulness.
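To make the task concrete, here is a minimal sketch of how usefulness prediction can be framed as binary text classification and how two methods can be compared on the same split. Several things are assumptions and are not taken from the thesis. The labelling rule `label_useful` (a review counts as useful if it received at least one "useful" vote) is hypothetical. The two compared models, a majority-class baseline and a bag-of-words multinomial Naive Bayes, are generic stand-ins rather than the methods the thesis evaluates. The reviews and vote counts are made up for illustration and are not Yelp data.

```python
import math
from collections import Counter


def tokenize(text):
    # Naive lowercase whitespace tokenization; real reviews need more care
    # (typos, emoji, punctuation), which is part of what makes them hard.
    return text.lower().split()


def label_useful(useful_votes, threshold=1):
    # Hypothetical labelling rule: a review is "useful" (1) if it received
    # at least `threshold` useful votes, otherwise "not useful" (0).
    return int(useful_votes >= threshold)


class MajorityBaseline:
    """Always predicts the most frequent training label."""

    def fit(self, texts, labels):
        self.label = Counter(labels).most_common(1)[0][0]
        return self

    def predict(self, texts):
        return [self.label for _ in texts]


class MultinomialNB:
    """Bag-of-words multinomial Naive Bayes with add-alpha smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, texts, labels):
        self.classes = sorted(set(labels))
        class_counts = Counter(labels)
        self.log_prior = {c: math.log(class_counts[c] / len(labels)) for c in self.classes}
        self.word_counts = {c: Counter() for c in self.classes}
        for text, y in zip(texts, labels):
            self.word_counts[y].update(tokenize(text))
        self.vocab = set()
        for c in self.classes:
            self.vocab.update(self.word_counts[c])
        self.totals = {c: sum(self.word_counts[c].values()) for c in self.classes}
        return self

    def _log_prob(self, token, c):
        # Smoothed estimate of log P(token | class c).
        denom = self.totals[c] + self.alpha * len(self.vocab)
        return math.log((self.word_counts[c][token] + self.alpha) / denom)

    def predict(self, texts):
        predictions = []
        for text in texts:
            # Tokens never seen during training are ignored.
            tokens = [t for t in tokenize(text) if t in self.vocab]
            scores = {c: self.log_prior[c] + sum(self._log_prob(t, c) for t in tokens)
                      for c in self.classes}
            predictions.append(max(scores, key=scores.get))
        return predictions


def accuracy(predicted, gold):
    return sum(p == g for p, g in zip(predicted, gold)) / len(gold)


# Toy, made-up reviews with hypothetical useful-vote counts (not Yelp data).
train = [
    ("portions are large and prices are fair", 3),
    ("staff explained the menu and prices", 2),
    ("ok", 0),
    ("good", 0),
    ("ok good", 0),
]
test = [("menu prices and portions", 1), ("ok", 0)]

train_texts = [text for text, _ in train]
train_labels = [label_useful(votes) for _, votes in train]
test_texts = [text for text, _ in test]
test_labels = [label_useful(votes) for _, votes in test]

results = {}
for name, model in [("majority baseline", MajorityBaseline()),
                    ("multinomial naive bayes", MultinomialNB())]:
    model.fit(train_texts, train_labels)
    results[name] = accuracy(model.predict(test_texts), test_labels)
    print(f"{name}: accuracy {results[name]:.2f}")
```

Because only a few reviews are truly useful, the classes are usually imbalanced, so a majority baseline can already score a high accuracy on realistic data; comparing methods against such a baseline, and reporting metrics like precision, recall or F1 alongside accuracy, makes the comparison more informative.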