
Automatic feedback on physics tasks using open-source generative artificial intelligence

Publication: Contribution to journal › Article › Research › Peer review

Authors

  • André Meyer
  • Tom Bleckmann
  • Gunnar Friege

Organisational units

Details

Original language: English
Pages (from - to): 1–26
Number of pages: 26
Journal: International Journal of Science Education
Publication status: Published electronically (E-pub ahead of print) - 12 May 2025

Abstract

This study explores the feasibility of using open-source large language models (LLMs) to generate automatic feedback on physics problem-solving tasks in educational settings. A quantised version of the open-source LLM OpenChat 3.6 was employed to generate German-language feedback for high school students on standard school hardware. The study procedure involved five stages: data preparation, model selection, prompt design, response evaluation, and quality analysis of feedback. OpenChat 3.6 achieved an accuracy of 0.84 in classifying student answers. In comparison, GPT-4o achieved an accuracy of 0.85. The open-source LLM provided accurate and suitable feedback in 69% of cases, with substantial interrater agreement (κ = 0.89) on feedback quality. However, performance varied across task types, highlighting areas for improvement in prompt specificity, especially in handling physics terminology. These findings suggest that, with optimisation, open-source LLMs can offer a locally controlled and effective solution for formative assessment in physics education, enabling real-time, targeted feedback to support student learning.
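The abstract describes the setup only at a high level. As a rough illustration of what running a quantised open-source model for answer classification and feedback generation on standard school hardware can look like, the sketch below uses llama-cpp-python for local inference and scikit-learn for the reported metrics (classification accuracy, Cohen's κ). The library choice, the GGUF file name, the prompt wording, and the placeholder labels are all assumptions for illustration, not the authors' published pipeline.

```python
# Minimal sketch only: the paper does not publish its code, so the runtime
# (llama-cpp-python), model file name, and prompt below are assumptions.
from llama_cpp import Llama
from sklearn.metrics import accuracy_score, cohen_kappa_score

# Load a quantised OpenChat 3.6 build; a 4-bit GGUF runs CPU-only on
# modest hardware. "openchat-3.6-8b.Q4_K_M.gguf" is a hypothetical file name.
llm = Llama(model_path="openchat-3.6-8b.Q4_K_M.gguf", n_ctx=2048, verbose=False)

def classify_and_feedback(task: str, sample_solution: str, answer: str) -> str:
    """Ask the model to grade a student answer and draft formative feedback."""
    response = llm.create_chat_completion(
        messages=[
            {"role": "system",
             "content": "You are a physics teacher. Compare the student answer "
                        "with the sample solution, state CORRECT or INCORRECT, "
                        "and give short, encouraging feedback in German."},
            {"role": "user",
             "content": f"Task: {task}\nSample solution: {sample_solution}\n"
                        f"Student answer: {answer}"},
        ],
        max_tokens=256,
        temperature=0.2,  # low temperature for more reproducible grading
    )
    return response["choices"][0]["message"]["content"]

# Evaluation in the spirit of the abstract: accuracy of the model's
# correct/incorrect classification against expert labels, and Cohen's kappa
# for agreement between two human raters on feedback quality.
model_labels  = ["correct", "incorrect", "correct"]    # placeholder data
expert_labels = ["correct", "incorrect", "incorrect"]  # placeholder data
print("accuracy:", accuracy_score(expert_labels, model_labels))

rater_a = [1, 1, 0]  # placeholder quality ratings from two human raters
rater_b = [1, 0, 0]
print("kappa:", cohen_kappa_score(rater_a, rater_b))
```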

ASJC Scopus subject areas

Cite this

Automatic feedback on physics tasks using open-source generative artificial intelligence. / Meyer, André; Bleckmann, Tom; Friege, Gunnar.
In: International Journal of Science Education, 12.05.2025, p. 1–26.


Meyer A, Bleckmann T, Friege G. Automatic feedback on physics tasks using open-source generative artificial intelligence. International Journal of Science Education. 2025 May 12; 1–26. Epub 2025 May 12. doi: 10.1080/09500693.2025.2499220
BibTeX
@article{abb7d6ea134a48d3871a693840c2a47c,
title = "Automatic feedback on physics tasks using open-source generative artificial intelligence",
abstract = "This study explores the feasibility of using open-source large language models (LLMs) to generate automatic feedback on physics problem-solving tasks in educational settings. A quantised version of the open-source LLM OpenChat 3.6 was employed to generate German-language feedback for high school students on standard school hardware. The study procedure involved five stages: data preparation, model selection, prompt design, response evaluation, and quality analysis of feedback. OpenChat 3.6 achieved an accuracy of 0.84 in classifying student answers. In comparison, GPT-4o achieved an accuracy of 0.85. The open-source LLM provided accurate and suitable feedback in 69% of cases, with substantial interrater agreement (κ = 0.89) on feedback quality. However, performance varied across task types, highlighting areas for improvement in prompt specificity, especially in handling physics terminology. These findings suggest that, with optimisation, open-source LLMs can offer a locally controlled and effective solution for formative assessment in physics education, enabling real-time, targeted feedback to support student learning.",
keywords = "LLM, automated feedback, science education",
author = "Andr{\'e} Meyer and Tom Bleckmann and Gunnar Friege",
note = "Publisher Copyright: {\textcopyright} 2025 The Author(s). Published by Informa UK Limited, trading as Taylor & Francis Group.",
year = "2025",
month = may,
day = "12",
doi = "10.1080/09500693.2025.2499220",
language = "English",
pages = "1–26",
journal = "International Journal of Science Education",
issn = "0950-0693",
publisher = "Taylor and Francis Ltd.",
}

RIS

TY - JOUR

T1 - Automatic feedback on physics tasks using open-source generative artificial intelligence

AU - Meyer, André

AU - Bleckmann, Tom

AU - Friege, Gunnar

N1 - Publisher Copyright: © 2025 The Author(s). Published by Informa UK Limited, trading as Taylor & Francis Group.

PY - 2025/5/12

Y1 - 2025/5/12

N2 - This study explores the feasibility of using open-source large language models (LLMs) to generate automatic feedback on physics problem-solving tasks in educational settings. A quantised version of the open-source LLM OpenChat 3.6 was employed to generate German-language feedback for high school students on standard school hardware. The study procedure involved five stages: data preparation, model selection, prompt design, response evaluation, and quality analysis of feedback. OpenChat 3.6 achieved an accuracy of 0.84 in classifying student answers. In comparison, GPT-4o achieved an accuracy of 0.85. The open-source LLM provided accurate and suitable feedback in 69% of cases, with substantial interrater agreement (κ = 0.89) on feedback quality. However, performance varied across task types, highlighting areas for improvement in prompt specificity, especially in handling physics terminology. These findings suggest that, with optimisation, open-source LLMs can offer a locally controlled and effective solution for formative assessment in physics education, enabling real-time, targeted feedback to support student learning.

AB - This study explores the feasibility of using open-source large language models (LLMs) to generate automatic feedback on physics problem-solving tasks in educational settings. A quantised version of the open-source LLM OpenChat 3.6 was employed to generate German-language feedback for high school students on standard school hardware. The study procedure involved five stages: data preparation, model selection, prompt design, response evaluation, and quality analysis of feedback. OpenChat 3.6 achieved an accuracy of 0.84 in classifying student answers. In comparison, GPT-4o achieved an accuracy of 0.85. The open-source LLM provided accurate and suitable feedback in 69% of cases, with substantial interrater agreement (κ = 0.89) on feedback quality. However, performance varied across task types, highlighting areas for improvement in prompt specificity, especially in handling physics terminology. These findings suggest that, with optimisation, open-source LLMs can offer a locally controlled and effective solution for formative assessment in physics education, enabling real-time, targeted feedback to support student learning.

KW - LLM

KW - automated feedback

KW - science education

UR - http://www.scopus.com/inward/record.url?scp=105004836149&partnerID=8YFLogxK

U2 - 10.1080/09500693.2025.2499220

DO - 10.1080/09500693.2025.2499220

M3 - Article

SP - 1

EP - 26

JO - International Journal of Science Education

JF - International Journal of Science Education

SN - 0950-0693

ER -