TY - JOUR
T1 - KLUE: Korean Language Understanding Evaluation
T2 - 35th Conference on Neural Information Processing Systems - Track on Datasets and Benchmarks, NeurIPS Datasets and Benchmarks 2021
AU - Park, Sungjoon
AU - Moon, Jihyung
AU - Kim, Sungdong
AU - Cho, Won Ik
AU - Han, Jiyoon
AU - Park, Jangwon
AU - Song, Chisung
AU - Kim, Junseong
AU - Song, Youngsook
AU - Oh, Taehwan
AU - Lee, Joohong
AU - Oh, Juhyun
AU - Lyu, Sungwon
AU - Jeong, Younghoon
AU - Lee, Inkwon
AU - Seo, Sangwoo
AU - Lee, Dongjun
AU - Kim, Hyunwoo
AU - Lee, Myeonghwa
AU - Jang, Seongbo
AU - Do, Seungwon
AU - Kim, Sunkyoung
AU - Lim, Kyungtae
AU - Lee, Jongwon
AU - Park, Kyumin
AU - Shin, Jamin
AU - Kim, Seonghyun
AU - Park, Lucy
AU - Oh, Alice
AU - Ha, Jung-Woo
AU - Cho, Kyunghyun
N1 - Publisher Copyright:
© 2021 Neural Information Processing Systems Foundation. All rights reserved.
PY - 2021
Y1 - 2021
N2 - We introduce the Korean Language Understanding Evaluation (KLUE) benchmark. KLUE is a collection of eight Korean natural language understanding (NLU) tasks: Topic Classification, Semantic Textual Similarity, Natural Language Inference, Named Entity Recognition, Relation Extraction, Dependency Parsing, Machine Reading Comprehension, and Dialogue State Tracking. We create all of the datasets from scratch in a principled way. We design the tasks to have diverse formats and build each task upon source corpora that respect copyrights. We also propose suitable evaluation metrics and organize annotation protocols to ensure quality. To prevent ethical risks in KLUE, we proactively remove examples that reflect social biases or contain toxic content or personally identifiable information (PII). Along with the benchmark datasets, we release pretrained language models (PLMs) for Korean, KLUE-BERT and KLUE-RoBERTa, and find that KLUE-RoBERTa-large outperforms other baselines, including multilingual PLMs and existing open-source Korean PLMs. The fine-tuning recipes are publicly available so that anyone can reproduce our baseline results. We believe our work will facilitate future research on Korean and cross-lingual language models, as well as the creation of similar resources for other languages.
AB - We introduce the Korean Language Understanding Evaluation (KLUE) benchmark. KLUE is a collection of eight Korean natural language understanding (NLU) tasks: Topic Classification, Semantic Textual Similarity, Natural Language Inference, Named Entity Recognition, Relation Extraction, Dependency Parsing, Machine Reading Comprehension, and Dialogue State Tracking. We create all of the datasets from scratch in a principled way. We design the tasks to have diverse formats and build each task upon source corpora that respect copyrights. We also propose suitable evaluation metrics and organize annotation protocols to ensure quality. To prevent ethical risks in KLUE, we proactively remove examples that reflect social biases or contain toxic content or personally identifiable information (PII). Along with the benchmark datasets, we release pretrained language models (PLMs) for Korean, KLUE-BERT and KLUE-RoBERTa, and find that KLUE-RoBERTa-large outperforms other baselines, including multilingual PLMs and existing open-source Korean PLMs. The fine-tuning recipes are publicly available so that anyone can reproduce our baseline results. We believe our work will facilitate future research on Korean and cross-lingual language models, as well as the creation of similar resources for other languages.
UR - http://www.scopus.com/inward/record.url?scp=105000459569&partnerID=8YFLogxK
M3 - Conference article
AN - SCOPUS:105000459569
SN - 1049-5258
JO - Advances in Neural Information Processing Systems
JF - Advances in Neural Information Processing Systems
Y2 - 6 December 2021 through 14 December 2021
ER -