| dc.contributor.advisor | Plátek, Ondřej | |
| dc.creator | Obedkova, Maria | |
| dc.date.accessioned | 2019-10-17T12:08:10Z | |
| dc.date.available | 2019-10-17T12:08:10Z | |
| dc.date.issued | 2019 | |
| dc.identifier.uri | http://hdl.handle.net/20.500.11956/109402 | |
| dc.description.abstract | In ASR systems, dictionaries are usually used to describe pronunciations of words in a language. These dictionaries are typically hand-crafted by linguists. One of the most significant drawbacks of dictionaries created this way is that linguistically motivated pronunciations are not necessarily the optimal ones for ASR. The goal of this research was to explore approaches to data-driven pronunciation generation for ASR. We investigated several approaches to lexicon generation and implemented a completely new data-driven solution based on pronunciation clustering. We proposed an approach for feature extraction and researched different unsupervised methods for pronunciation clustering. We evaluated the proposed approach and compared it with the current hand-crafted dictionary. The proposed data-driven approach beat the established baselines but underperformed in comparison to the hand-crafted dictionary, which could be due to unsatisfactory features extracted from the data or insufficient fine-tuning. | en_US |
| dc.language | English | cs_CZ |
| dc.language.iso | en_US | |
| dc.publisher | Univerzita Karlova, Matematicko-fyzikální fakulta | cs_CZ |
| dc.subject | ASR | cs_CZ |
| dc.subject | fonetický slovník | cs_CZ |
| dc.subject | data-driven | cs_CZ |
| dc.subject | fonetika | cs_CZ |
| dc.subject | ASR | en_US |
| dc.subject | phonetic dictionary | en_US |
| dc.subject | data-driven | en_US |
| dc.subject | unsupervised | en_US |
| dc.subject | phonetics | en_US |
| dc.title | Data-driven Pronunciation Generation for ASR | en_US |
| dc.type | diplomová práce | cs_CZ |
| dcterms.created | 2019 | |
| dcterms.dateAccepted | 2019-09-09 | |
| dc.description.department | Institute of Formal and Applied Linguistics | en_US |
| dc.description.department | Ústav formální a aplikované lingvistiky | cs_CZ |
| dc.description.faculty | Faculty of Mathematics and Physics | en_US |
| dc.description.faculty | Matematicko-fyzikální fakulta | cs_CZ |
| dc.identifier.repId | 212087 | |
| dc.title.translated | Generování fonetického slovníku pro rozpoznávání řeči z dat | cs_CZ |
| dc.contributor.referee | Peterek, Nino | |
| thesis.degree.name | Mgr. | |
| thesis.degree.level | navazující magisterské | cs_CZ |
| thesis.degree.discipline | Computational Linguistics | en_US |
| thesis.degree.discipline | Matematická lingvistika | cs_CZ |
| thesis.degree.program | Informatika | cs_CZ |
| thesis.degree.program | Computer Science | en_US |
| uk.thesis.type | diplomová práce | cs_CZ |
| uk.taxonomy.organization-cs | Matematicko-fyzikální fakulta::Ústav formální a aplikované lingvistiky | cs_CZ |
| uk.taxonomy.organization-en | Faculty of Mathematics and Physics::Institute of Formal and Applied Linguistics | en_US |
| uk.faculty-name.cs | Matematicko-fyzikální fakulta | cs_CZ |
| uk.faculty-name.en | Faculty of Mathematics and Physics | en_US |
| uk.faculty-abbr.cs | MFF | cs_CZ |
| uk.degree-discipline.cs | Matematická lingvistika | cs_CZ |
| uk.degree-discipline.en | Computational Linguistics | en_US |
| uk.degree-program.cs | Informatika | cs_CZ |
| uk.degree-program.en | Computer Science | en_US |
| thesis.grade.cs | Výborně | cs_CZ |
| thesis.grade.en | Excellent | en_US |
| uk.abstract.en | In ASR systems, dictionaries are usually used to describe pronunciations of words in a language. These dictionaries are typically hand-crafted by linguists. One of the most significant drawbacks of dictionaries created this way is that linguistically motivated pronunciations are not necessarily the optimal ones for ASR. The goal of this research was to explore approaches to data-driven pronunciation generation for ASR. We investigated several approaches to lexicon generation and implemented a completely new data-driven solution based on pronunciation clustering. We proposed an approach for feature extraction and researched different unsupervised methods for pronunciation clustering. We evaluated the proposed approach and compared it with the current hand-crafted dictionary. The proposed data-driven approach beat the established baselines but underperformed in comparison to the hand-crafted dictionary, which could be due to unsatisfactory features extracted from the data or insufficient fine-tuning. | en_US |
| uk.file-availability | V | |
| uk.publication.place | Praha | cs_CZ |
| uk.grantor | Univerzita Karlova, Matematicko-fyzikální fakulta, Ústav formální a aplikované lingvistiky | cs_CZ |
| thesis.grade.code | 1 | |