
Automatic feedback on physics tasks using open-source generative artificial intelligence

Publication: Contribution to journal › Article › Research › Peer review

Authors

  • André Meyer
  • Tom Bleckmann
  • Gunnar Friege

Organisational units

Details

Original language: English
Pages (from - to): 1–26
Number of pages: 26
Journal: International Journal of Science Education
Publication status: Published electronically (E-pub ahead of print) - 12 May 2025

Abstract

This study explores the feasibility of using open-source large language models (LLMs) to generate automatic feedback on physics problem-solving tasks in educational settings. A quantised version of the open-source LLM OpenChat 3.6 was employed to generate German-language feedback for high school students on standard school hardware. The study procedure involved five stages: data preparation, model selection, prompt design, response evaluation, and quality analysis of feedback. OpenChat 3.6 achieved an accuracy of 0.84 in classifying student answers. In comparison, GPT-4o achieved an accuracy of 0.85. The open-source LLM provided accurate and suitable feedback in 69% of cases, with substantial interrater agreement (κ = 0.89) on feedback quality. However, performance varied across task types, highlighting areas for improvement in prompt specificity, especially in handling physics terminology. These findings suggest that, with optimisation, open-source LLMs can offer a locally controlled and effective solution for formative assessment in physics education, enabling real-time, targeted feedback to support student learning.
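The abstract describes the setup only at a high level. As a rough illustration of what running a quantised open-source model for answer classification and feedback generation on standard school hardware can look like, the sketch below uses llama-cpp-python for local inference and scikit-learn for the reported metrics (classification accuracy, Cohen's κ). The library choice, the GGUF file name, the prompt wording, and the placeholder labels are all assumptions for illustration, not the authors' published pipeline.

```python
# Minimal sketch only: the paper does not publish its code, so the runtime
# (llama-cpp-python), model file name, and prompt below are assumptions.
from llama_cpp import Llama
from sklearn.metrics import accuracy_score, cohen_kappa_score

# Load a quantised OpenChat 3.6 build; a 4-bit GGUF runs CPU-only on
# modest hardware. "openchat-3.6-8b.Q4_K_M.gguf" is a hypothetical file name.
llm = Llama(model_path="openchat-3.6-8b.Q4_K_M.gguf", n_ctx=2048, verbose=False)

def classify_and_feedback(task: str, sample_solution: str, answer: str) -> str:
    """Ask the model to grade a student answer and draft formative feedback."""
    response = llm.create_chat_completion(
        messages=[
            {"role": "system",
             "content": "You are a physics teacher. Compare the student answer "
                        "with the sample solution, state CORRECT or INCORRECT, "
                        "and give short, encouraging feedback in German."},
            {"role": "user",
             "content": f"Task: {task}\nSample solution: {sample_solution}\n"
                        f"Student answer: {answer}"},
        ],
        max_tokens=256,
        temperature=0.2,  # low temperature for more reproducible grading
    )
    return response["choices"][0]["message"]["content"]

# Evaluation in the spirit of the abstract: accuracy of the model's
# correct/incorrect classification against expert labels, and Cohen's kappa
# for agreement between two human raters on feedback quality.
model_labels  = ["correct", "incorrect", "correct"]    # placeholder data
expert_labels = ["correct", "incorrect", "incorrect"]  # placeholder data
print("accuracy:", accuracy_score(expert_labels, model_labels))

rater_a = [1, 1, 0]  # placeholder quality ratings from two human raters
rater_b = [1, 0, 0]
print("kappa:", cohen_kappa_score(rater_a, rater_b))
```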

ASJC Scopus subject areas

Cite this

Automatic feedback on physics tasks using open-source generative artificial intelligence. / Meyer, André; Bleckmann, Tom; Friege, Gunnar.
In: International Journal of Science Education, 12.05.2025, p. 1–26.


Meyer A, Bleckmann T, Friege G. Automatic feedback on physics tasks using open-source generative artificial intelligence. International Journal of Science Education. 2025 May 12; 1–26. Epub 2025 May 12. doi: 10.1080/09500693.2025.2499220
BibTeX
@article{abb7d6ea134a48d3871a693840c2a47c,
title = "Automatic feedback on physics tasks using open-source generative artificial intelligence",
abstract = "This study explores the feasibility of using open-source large language models (LLMs) to generate automatic feedback on physics problem-solving tasks in educational settings. A quantised version of the open-source LLM OpenChat 3.6 was employed to generate German-language feedback for high school students on standard school hardware. The study procedure involved five stages: data preparation, model selection, prompt design, response evaluation, and quality analysis of feedback. OpenChat 3.6 achieved an accuracy of 0.84 in classifying student answers. In comparison, GPT-4o achieved an accuracy of 0.85. The open-source LLM provided accurate and suitable feedback in 69% of cases, with substantial interrater agreement (κ = 0.89) on feedback quality. However, performance varied across task types, highlighting areas for improvement in prompt specificity, especially in handling physics terminology. These findings suggest that, with optimisation, open-source LLMs can offer a locally controlled and effective solution for formative assessment in physics education, enabling real-time, targeted feedback to support student learning.",
keywords = "LLM, automated feedback, science education",
author = "Andr{\'e} Meyer and Tom Bleckmann and Gunnar Friege",
note = "Publisher Copyright: {\textcopyright} 2025 The Author(s). Published by Informa UK Limited, trading as Taylor & Francis Group.",
year = "2025",
month = may,
day = "12",
doi = "10.1080/09500693.2025.2499220",
language = "English",
pages = "1–26",
journal = "International Journal of Science Education",
issn = "0950-0693",
publisher = "Taylor and Francis Ltd.",
}

RIS

TY - JOUR

T1 - Automatic feedback on physics tasks using open-source generative artificial intelligence

AU - Meyer, André

AU - Bleckmann, Tom

AU - Friege, Gunnar

N1 - Publisher Copyright: © 2025 The Author(s). Published by Informa UK Limited, trading as Taylor & Francis Group.

PY - 2025/5/12

Y1 - 2025/5/12

N2 - This study explores the feasibility of using open-source large language models (LLMs) to generate automatic feedback on physics problem-solving tasks in educational settings. A quantised version of the open-source LLM OpenChat 3.6 was employed to generate German-language feedback for high school students on standard school hardware. The study procedure involved five stages: data preparation, model selection, prompt design, response evaluation, and quality analysis of feedback. OpenChat 3.6 achieved an accuracy of 0.84 in classifying student answers. In comparison, GPT-4o achieved an accuracy of 0.85. The open-source LLM provided accurate and suitable feedback in 69% of cases, with substantial interrater agreement (κ = 0.89) on feedback quality. However, performance varied across task types, highlighting areas for improvement in prompt specificity, especially in handling physics terminology. These findings suggest that, with optimisation, open-source LLMs can offer a locally controlled and effective solution for formative assessment in physics education, enabling real-time, targeted feedback to support student learning.

AB - This study explores the feasibility of using open-source large language models (LLMs) to generate automatic feedback on physics problem-solving tasks in educational settings. A quantised version of the open-source LLM OpenChat 3.6 was employed to generate German-language feedback for high school students on standard school hardware. The study procedure involved five stages: data preparation, model selection, prompt design, response evaluation, and quality analysis of feedback. OpenChat 3.6 achieved an accuracy of 0.84 in classifying student answers. In comparison, GPT-4o achieved an accuracy of 0.85. The open-source LLM provided accurate and suitable feedback in 69% of cases, with substantial interrater agreement (κ = 0.89) on feedback quality. However, performance varied across task types, highlighting areas for improvement in prompt specificity, especially in handling physics terminology. These findings suggest that, with optimisation, open-source LLMs can offer a locally controlled and effective solution for formative assessment in physics education, enabling real-time, targeted feedback to support student learning.

KW - LLM

KW - automated feedback

KW - science education

UR - http://www.scopus.com/inward/record.url?scp=105004836149&partnerID=8YFLogxK

U2 - 10.1080/09500693.2025.2499220

DO - 10.1080/09500693.2025.2499220

M3 - Article

SP - 1

EP - 26

JO - International Journal of Science Education

JF - International Journal of Science Education

SN - 0950-0693

ER -