TY - GEN
T1 - Assessing Critical Thinking through a Multi-Agent LLM-Based Debate Chatbot
AU - Park, Bogyeom
AU - Seo, Kyoungwon
N1 - Publisher Copyright:
© 2025 Copyright held by the owner/author(s).
PY - 2025/4/26
Y1 - 2025/4/26
N2 - Critical thinking (CT) is a crucial competency in education, requiring structured and reliable assessment methods. This study introduces an LLM-based debate chatbot framework designed to evaluate CT by integrating argument construction and analytical reasoning (i.e., argumentation skills) with qualitative reasoning criteria (i.e., intellectual standards). Two assessment models were developed: the Single-Agent (SA) model and the Multi-Agent (MA) model. The MA model achieved an Intraclass Correlation Coefficient (ICC) of 0.78 and 97.37% agreement within ±1 with human evaluators, indicating strong alignment with human assessment results. By independently assessing argumentation skills and intellectual standards, the MA model more effectively mirrored the nuanced assessment patterns observed in human assessments. These findings highlight the potential of multi-agent approaches in CT assessment, bridging structured assessment models with expert-like evaluative reasoning. This study contributes to the development of LLM-driven CT assessment frameworks, offering a scalable foundation for integrating diverse evaluative criteria in automated educational assessment.
AB - Critical thinking (CT) is a crucial competency in education, requiring structured and reliable assessment methods. This study introduces an LLM-based debate chatbot framework designed to evaluate CT by integrating argument construction and analytical reasoning (i.e., argumentation skills) with qualitative reasoning criteria (i.e., intellectual standards). Two assessment models were developed: the Single-Agent (SA) model and the Multi-Agent (MA) model. The MA model achieved an Intraclass Correlation Coefficient (ICC) of 0.78 and 97.37% agreement within ±1 with human evaluators, indicating strong alignment with human assessment results. By independently assessing argumentation skills and intellectual standards, the MA model more effectively mirrored the nuanced assessment patterns observed in human assessments. These findings highlight the potential of multi-agent approaches in CT assessment, bridging structured assessment models with expert-like evaluative reasoning. This study contributes to the development of LLM-driven CT assessment frameworks, offering a scalable foundation for integrating diverse evaluative criteria in automated educational assessment.
KW - Critical thinking assessment
KW - Debate chatbot
KW - Large language model
KW - Multi-Agent
UR - https://www.scopus.com/pages/publications/105005731601
U2 - 10.1145/3706599.3721207
DO - 10.1145/3706599.3721207
M3 - Conference contribution
AN - SCOPUS:105005731601
T3 - Conference on Human Factors in Computing Systems - Proceedings
BT - CHI EA 2025 - Extended Abstracts of the 2025 CHI Conference on Human Factors in Computing Systems
PB - Association for Computing Machinery
T2 - 2025 CHI Conference on Human Factors in Computing Systems, CHI EA 2025
Y2 - 26 April 2025 through 1 May 2025
ER -