Assessing Critical Thinking through a Multi-Agent LLM-Based Debate Chatbot

Bogyeom Park, Kyoungwon Seo

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Critical thinking (CT) is a crucial competency in education, requiring structured and reliable assessment methods. This study introduces an LLM-based debate chatbot framework designed to evaluate CT by integrating argument construction and analytical reasoning (i.e., argumentation skills) with qualitative reasoning criteria (i.e., intellectual standards). Two assessment models were developed: the Single-Agent (SA) model and the Multi-Agent (MA) model. The MA model achieved an Intraclass Correlation Coefficient (ICC) of 0.78 and 97.37% agreement within ±1 with human evaluators, indicating strong alignment with human assessment results. By independently assessing argumentation skills and intellectual standards, the MA model more effectively mirrored the nuanced assessment patterns observed in human assessments. These findings highlight the potential of multi-agent approaches in CT assessment, bridging structured assessment models with expert-like evaluative reasoning. This study contributes to the development of LLM-driven CT assessment frameworks, offering a scalable foundation for integrating diverse evaluative criteria in automated educational assessment.
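The two reliability metrics reported above can be illustrated with a short sketch. This is not code from the paper: the score data and the 1–5 scale are hypothetical, and ICC(2,1) (two-way random effects, absolute agreement, single rater) is one common ICC variant; the paper does not specify which form was used.

```python
# Hypothetical example: comparing chatbot-assigned CT scores with human
# scores on a 1-5 scale, using agreement within +/-1 and ICC(2,1).
# The score lists below are illustrative, not data from the study.

def agreement_within_one(a, b):
    """Fraction of paired scores that differ by at most 1 point."""
    return sum(abs(x - y) <= 1 for x, y in zip(a, b)) / len(a)

def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.
    `ratings` is a list of rows, one row per subject, one column per rater."""
    n, k = len(ratings), len(ratings[0])
    grand = sum(sum(r) for r in ratings) / (n * k)
    row_means = [sum(r) / k for r in ratings]
    col_means = [sum(r[j] for r in ratings) / n for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for r in ratings for x in r)
    ss_err = ss_total - ss_rows - ss_cols
    msr = ss_rows / (n - 1)              # between-subjects mean square
    msc = ss_cols / (k - 1)              # between-raters mean square
    mse = ss_err / ((n - 1) * (k - 1))   # residual mean square
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)

human   = [3, 4, 2, 5, 4, 3, 1, 4]
chatbot = [3, 5, 2, 4, 4, 3, 2, 5]
print(agreement_within_one(human, chatbot))           # 1.0 (all pairs within 1)
print(round(icc_2_1(list(zip(human, chatbot))), 3))
```

In practice an established implementation (e.g., the `pingouin` package's `intraclass_corr`) would be preferable to a hand-rolled ANOVA decomposition; the sketch is only to make the two reported numbers concrete.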

Original language: English
Title of host publication: CHI EA 2025 - Extended Abstracts of the 2025 CHI Conference on Human Factors in Computing Systems
Publisher: Association for Computing Machinery
ISBN (Electronic): 9798400713958
DOIs
State: Published - 26 Apr 2025
Event: 2025 CHI Conference on Human Factors in Computing Systems, CHI EA 2025 - Yokohama, Japan
Duration: 26 Apr 2025 – 1 May 2025

Publication series

Name: Conference on Human Factors in Computing Systems - Proceedings

Conference

Conference: 2025 CHI Conference on Human Factors in Computing Systems, CHI EA 2025
Country/Territory: Japan
City: Yokohama
Period: 26/04/25 – 1/05/25

Keywords

  • Critical thinking assessment
  • Debate chatbot
  • Large language model
  • Multi-Agent

