Details
Original language | English |
---|---|
Title of host publication | Instructional Design and Technology Enhanced Learning: Current States and Future Perspectives |
Subtitle of host publication | Book of Abstracts |
Pages | 57 |
Number of pages | 1 |
Publication status | Published - 23 Aug 2024 |
Event | EARLI SIG 6&7 Biennial Conference 2024: Instructional Design and Technology Enhanced Learning: Current States and Future Perspectives - University of Tübingen, Tübingen, Germany Duration: 21 Aug 2024 → 23 Aug 2024 https://www.earli.org/sig-6-7-conference-2024 |
Abstract
Cite this
Instructional Design and Technology Enhanced Learning: Current States and Future Perspectives: Book of Abstracts. 2024. p. 57.
Research output: Chapter in book/report/conference proceeding › Conference abstract › Research › peer review
TY - CHAP
T1 - Generation and Evaluation of Synthetic Text Data for the Students’ Conceptions Identification Task
AU - Stanja, Judith
AU - Hoppe, Anett
AU - Dannemann, Sarah
AU - Krugel, Johannes
PY - 2024/8/23
Y1 - 2024/8/23
N2 - Synthetic data generation is a solution to mitigate data scarcity. We investigate the generation of synthetic text data via prompting a pre-trained Large Language Model (LLM). The prompt design is based on reconstructive analyses of real student texts from biology education. Prompts were designed for the generation of positive and negative samples for intentional explanation patterns for the evolutionary adaptation of whales. We propose a mixed-methods approach for the evaluation of the dataset: investigating statistical commonalities and differences between synthetic and real data, and assessing frame-related aspects and correctness via an annotation study. Our preliminary findings show that the ranges of text lengths and numbers of sentences are similar for synthetic and real data. We get mixed results for the similarity and lexical complexity of texts. The range of vocabulary sizes is similar in both datasets. We find that it is possible to generate data with indicators for the intentional patterns, though we also get false samples. Generating positive samples worked better than generating negative samples. Due to generation errors, further usage as training data requires cleaning of the synthetic data. The inter-annotator agreement in the annotation study was high. The study revealed crucial differences in frame annotations for correct positive and negative samples. We identify open questions and further steps for future research.
M3 - Conference abstract
SP - 57
BT - Instructional Design and Technology Enhanced Learning: Current States and Future Perspectives
T2 - EARLI SIG 6&7 Biennial Conference 2024
Y2 - 21 August 2024 through 23 August 2024
ER -
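
The abstract mentions comparing ranges of text lengths, sentence counts, and vocabulary sizes between synthetic and real data, but does not say how these were computed. The following is only a minimal sketch of such a comparison under stated assumptions: the function name `describe`, the regex-based tokenization, the punctuation-based sentence splitting, and the two placeholder texts are all hypothetical and are not taken from the study.

```python
import re


def describe(texts):
    """Return simple surface statistics for a list of texts.

    Assumptions (not from the abstract): words are runs of word
    characters, sentences are split on '.', '!' and '?'.
    """
    word_counts = []
    sentence_counts = []
    vocabulary = set()
    for text in texts:
        tokens = re.findall(r"\w+", text.lower())
        sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
        word_counts.append(len(tokens))
        sentence_counts.append(len(sentences))
        vocabulary.update(tokens)
    return {
        "words": (min(word_counts), max(word_counts)),            # (min, max) per text
        "sentences": (min(sentence_counts), max(sentence_counts)),  # (min, max) per text
        "vocabulary_size": len(vocabulary),                        # distinct tokens overall
    }


# Made-up placeholder texts for illustration only, not data from the study.
placeholder_texts = [
    "Whales needed fins. So they grew them.",
    "The whales adapted because they wanted to swim.",
]

stats = describe(placeholder_texts)
print(stats)
```

Calling `describe` once on the real texts and once on the synthetic texts would give two dictionaries whose ranges can be compared side by side. A real analysis would likely use a proper tokenizer and sentence splitter for the language of the student texts.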