LLM-Based Synthetic Data Generation for NLP Metric Validation

Eigler, Lukáš

Generování syntetických dat pomocí LLM pro validaci evaluačních metrik v NLP

Master's thesis (DEFENDED)

Permanent link

http://hdl.handle.net/20.500.11956/209637

Identifiers

SIS: 288612

Thesis reviewer

Kartáč, Ivan

Faculty / unit

Faculty of Mathematics and Physics

Field of study

Computer Science - Artificial Intelligence

Department / institute / clinic

Institute of Formal and Applied Linguistics

Date of defense

8 June 2026

Publisher

Charles University, Faculty of Mathematics and Physics

Language

English

Grade

Excellent

Keywords (Czech)

velké jazykové modely|zpracování přirozeného jazyka|automatické evaluační metriky|syntetická data

Keywords (English)

large language models|natural language processing|automatic evaluation metrics|synthetic data

Validating evaluation metrics for natural language generation (NLG) typically relies on human annotations, which are expensive and time-consuming to collect and exist predominantly for English datasets. We propose Meta-Judge, a scalable framework that uses large language models (LLMs) to generate synthetic evaluation datasets through controlled semantic degradation of reference texts, as a substitute for human judgments. We validate the approach with meta-correlation, which measures how well metric rankings derived from the synthetic data align with rankings derived from human-annotated data. We run experiments on Machine Translation, Question Answering, and Summarization in eight languages using four open-source LLMs. Large models achieve meta-correlation above 0.9 on question-answering datasets. To reduce inference cost, we fine-tune a 1B-parameter model using GRPO with an unsupervised ensemble of metrics, recovering most of the performance of the large models.
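
The meta-correlation idea in the abstract can be illustrated with a small sketch. This is not the thesis implementation: the abstract does not state which correlation coefficient is used, so Spearman rank correlation is an assumption here, and the function names (metric_quality, meta_correlation), the metric names, and all toy numbers are hypothetical. Each metric is first scored by how well it correlates with the labels of a dataset (human annotations in one case, synthetic quality levels in the other), and the two resulting lists of per-metric scores are then correlated with each other.

```python
from statistics import mean


def ranks(values):
    """Return 1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        average_rank = (start + end) / 2 + 1
        for position in range(start, end + 1):
            result[order[position]] = average_rank
        start = end + 1
    return result


def pearson(x, y):
    """Pearson correlation; raises ZeroDivisionError if either input is constant."""
    mean_x, mean_y = mean(x), mean(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman correlation, computed as Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def metric_quality(metric_scores, labels):
    """Correlate each metric's scores with the dataset's quality labels."""
    return {name: spearman(scores, labels) for name, scores in metric_scores.items()}


def meta_correlation(human_quality, synthetic_quality):
    """Agreement between the metric ranking on human data and on synthetic data."""
    names = sorted(human_quality)
    return spearman([human_quality[n] for n in names],
                    [synthetic_quality[n] for n in names])


# Toy data (hypothetical): five examples per dataset, three metrics.
human_labels = [1, 2, 3, 4, 5]          # human quality ratings
human_metric_scores = {
    "metric_a": [0.1, 0.2, 0.3, 0.4, 0.5],
    "metric_b": [0.2, 0.1, 0.3, 0.5, 0.4],
    "metric_c": [0.5, 0.4, 0.3, 0.2, 0.1],
}
synthetic_labels = [1, 2, 3, 4, 5]      # assumed: higher means less degradation
synthetic_metric_scores = {
    "metric_a": [0.3, 0.2, 0.5, 0.6, 0.9],
    "metric_b": [0.3, 0.1, 0.2, 0.5, 0.4],
    "metric_c": [0.9, 0.7, 0.8, 0.2, 0.1],
}

meta = meta_correlation(
    metric_quality(human_metric_scores, human_labels),
    metric_quality(synthetic_metric_scores, synthetic_labels),
)
print(round(meta, 3))
```

In this toy setup the three metrics are ordered identically on both datasets, so the meta-correlation is at its maximum; a synthetic dataset that ranked the metrics differently from the human annotations would yield a lower value.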
