Spoken Language Translation via Phoneme Representation of the Source Language

Polák, Peter

Strojový překlad mluvené řeči přes fonetickou reprezentaci zdrojové řeči

dc.contributor.advisor	Bojar, Ondřej
dc.creator	Polák, Peter
dc.date.accessioned	2020-07-29T10:00:44Z
dc.date.available	2020-07-29T10:00:44Z
dc.date.issued	2020
dc.identifier.uri	http://hdl.handle.net/20.500.11956/119532
dc.description.abstract	We refactor the traditional two-step approach of automatic speech recognition for spoken language translation. Instead of conventional graphemes, we use phonemes as an intermediate speech representation. Starting with the acoustic model, we revise the cross-lingual transfer and propose a coarse-to-fine method that further speeds up convergence and improves performance. Next, we review the translation model. We experiment with source and target encoding and boost robustness through fine-tuning and transfer across ASR and SLT. We empirically document that this conventional setup with an alternative representation not only performs well on standard test sets but also provides robust transcripts and translations on challenging (e.g., non-native) test sets. Notably, our ASR system outperforms commercial ASR systems.	en_US
dc.description.abstract	Revidujeme tradičný dvojkrokový prístup automatického rozpoznávania reči pre preklad hovoreného jazyka. Namiesto konvenčných grafémov používame fonémy ako reprezentáciu reči v medzikroku. Počnúc akustickým modelom, revidujeme prenos medzi jazykmi a navrhujeme "coarse-to-fine" metódu, ktorá poskytuje ďalšie zrýchlenie konvergencie a zvýšenie výkonu. Ďalej skúmame prekladový model. Experimentujeme so zdrojovým a cieľovým kódovaním a zvyšujeme robustnosť pomocou fine-tuningu a transferu medzi ASR a SLT. Empiricky dokumentujeme, že toto konvenčné nastavenie s alternatívnou reprezentáciou nielen dobre funguje na štandardných testovacích súboroch, ale tiež poskytuje kvalitné transkripty a preklady na náročných (napr. nerodilých) testovacích dátach. Náš ASR systém prekonáva komerčné ASR systémy.	cs_CZ
dc.language	English	cs_CZ
dc.language.iso	en_US
dc.publisher	Univerzita Karlova, Matematicko-fyzikální fakulta	cs_CZ
dc.subject	spoken language translation	en_US
dc.subject	automatic speech recognition	en_US
dc.subject	transfer learning	en_US
dc.subject	non-native speech translation	en_US
dc.title	Spoken Language Translation via Phoneme Representation of the Source Language	en_US
dc.type	diplomová práce	cs_CZ
dcterms.created	2020
dcterms.dateAccepted	2020-07-08
dc.description.department	Institute of Formal and Applied Linguistics	en_US
dc.description.department	Ústav formální a aplikované lingvistiky	cs_CZ
dc.description.faculty	Matematicko-fyzikální fakulta	cs_CZ
dc.description.faculty	Faculty of Mathematics and Physics	en_US
dc.identifier.repId	222230
dc.title.translated	Strojový překlad mluvené řeči přes fonetickou reprezentaci zdrojové řeči	cs_CZ
dc.contributor.referee	Peterek, Nino
thesis.degree.name	Mgr.
thesis.degree.level	navazující magisterské	cs_CZ
thesis.degree.discipline	Umělá inteligence	cs_CZ
thesis.degree.discipline	Artificial Intelligence	en_US
thesis.degree.program	Computer Science	en_US
thesis.degree.program	Informatika	cs_CZ
uk.thesis.type	diplomová práce	cs_CZ
uk.taxonomy.organization-cs	Matematicko-fyzikální fakulta::Ústav formální a aplikované lingvistiky	cs_CZ
uk.taxonomy.organization-en	Faculty of Mathematics and Physics::Institute of Formal and Applied Linguistics	en_US
uk.faculty-name.cs	Matematicko-fyzikální fakulta	cs_CZ
uk.faculty-name.en	Faculty of Mathematics and Physics	en_US
uk.faculty-abbr.cs	MFF	cs_CZ
uk.degree-discipline.cs	Umělá inteligence	cs_CZ
uk.degree-discipline.en	Artificial Intelligence	en_US
uk.degree-program.cs	Informatika	cs_CZ
uk.degree-program.en	Computer Science	en_US
thesis.grade.cs	Výborně	cs_CZ
thesis.grade.en	Excellent	en_US
uk.abstract.cs	Revidujeme tradičný dvojkrokový prístup automatického rozpoznávania reči pre preklad hovoreného jazyka. Namiesto konvenčných grafémov používame fonémy ako reprezentáciu reči v medzikroku. Počnúc akustickým modelom, revidujeme prenos medzi jazykmi a navrhujeme "coarse-to-fine" metódu, ktorá poskytuje ďalšie zrýchlenie konvergencie a zvýšenie výkonu. Ďalej skúmame prekladový model. Experimentujeme so zdrojovým a cieľovým kódovaním a zvyšujeme robustnosť pomocou fine-tuningu a transferu medzi ASR a SLT. Empiricky dokumentujeme, že toto konvenčné nastavenie s alternatívnou reprezentáciou nielen dobre funguje na štandardných testovacích súboroch, ale tiež poskytuje kvalitné transkripty a preklady na náročných (napr. nerodilých) testovacích dátach. Náš ASR systém prekonáva komerčné ASR systémy.	cs_CZ
uk.abstract.en	We refactor the traditional two-step approach of automatic speech recognition for spoken language translation. Instead of conventional graphemes, we use phonemes as an intermediate speech representation. Starting with the acoustic model, we revise the cross-lingual transfer and propose a coarse-to-fine method that further speeds up convergence and improves performance. Next, we review the translation model. We experiment with source and target encoding and boost robustness through fine-tuning and transfer across ASR and SLT. We empirically document that this conventional setup with an alternative representation not only performs well on standard test sets but also provides robust transcripts and translations on challenging (e.g., non-native) test sets. Notably, our ASR system outperforms commercial ASR systems.	en_US
uk.file-availability	V
uk.grantor	Univerzita Karlova, Matematicko-fyzikální fakulta, Ústav formální a aplikované lingvistiky	cs_CZ
thesis.grade.code	1
uk.publication-place	Praha	cs_CZ