Enriching the Korean learner corpus for grammatical error correction and writing assessment

  • Jayoung Song
  • Kyung Tae Lim
  • Jungyeul Park

Research output: Contribution to journal › Article › peer-review

Abstract

Despite growing global interest in Korean language education, learner corpora tailored to Korean L2 writing remain scarce. This paper introduces KoLLA v2.0, the first Korean learner corpus to incorporate both multi-reference grammatical error correction (GEC) annotations and rubric-based essay scoring. We extend the original KoLLA dataset by adding a second human correction for each sentence, creating the first multi-reference GEC resource for Korean. This design captures the variability of valid corrections in an agglutinative and morphologically complex language, enabling fairer and more realistic evaluation of GEC systems. In parallel, we enrich the corpus with rubric-based scores derived from criteria from the Korean National Language Institute, providing standardized, multi-dimensional assessments of grammatical accuracy, coherence, and lexical diversity. These enhancements position KoLLA v2.0 as a standardized resource for research in Korean L2 learning and instruction, and as a benchmark for evaluating automated error correction and essay scoring systems.

Original language: English
Article number: 15
Journal: Language Resources and Evaluation
Volume: 60
Issue number: 1
State: Published - Mar 2026

Keywords

  • Grammatical error correction
  • KoLLA corpus
  • Korean learner corpus
  • Multi-reference GEC
  • Rubric-based scoring
