Abstract
In natural language processing, substantial progress has been made since the advent of the Transformer and its self-attention mechanism. At the same time, the growing size of these models makes them difficult to deploy for online serving, which requires fast inference. When the target domain is consistent with the training corpus of pre-trained models such as BERT (i.e., a general domain), model compression techniques alone can address this issue. However, when such pre-trained models are applied to specialized target domains such as medicine, law, or finance, an additional domain adaptation step is required alongside model compression. In this paper, we propose Efficient Domain Adaptive Distillation (EDAD), a method that efficiently produces a lightweight, fast-inference model for a target domain by integrating knowledge distillation, a popular model compression method, with the domain adaptation process. Experimental results demonstrate that by merging the two separate processes, adaptation and compression, into a single process, EDAD trains a compact model for the target domain at a much lower computational cost while achieving performance comparable to existing methods on named entity recognition (NER) tasks in the medical domain.
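To illustrate the general idea of folding adaptation and compression into one training pass, the sketch below shows one plausible way to combine a domain-adaptation objective (masked-LM loss on target-domain text) with a knowledge-distillation objective (matching a general-domain teacher's soft predictions) in a single loop. This is an assumption-laden sketch, not the paper's exact formulation; the function name, loss weighting, and HuggingFace-style `.logits` outputs are all illustrative.

```python
import torch
import torch.nn.functional as F

def train_joint_adapt_distill(student, teacher, target_domain_loader, optimizer,
                              temperature=2.0, alpha=0.5, num_steps=10000):
    """Hypothetical single-loop combination of domain adaptation and distillation.

    `teacher` is a general-domain pre-trained model, `student` is a smaller model,
    and `target_domain_loader` yields masked-LM batches from the target domain
    (e.g., medical text). Names and hyperparameters are illustrative assumptions.
    """
    teacher.eval()
    student.train()
    step = 0
    for input_ids, attention_mask, mlm_labels in target_domain_loader:
        # Domain adaptation signal: masked-LM loss on target-domain text.
        student_out = student(input_ids=input_ids, attention_mask=attention_mask)
        vocab_size = student_out.logits.size(-1)
        mlm_loss = F.cross_entropy(
            student_out.logits.view(-1, vocab_size),
            mlm_labels.view(-1),
            ignore_index=-100,  # positions that were not masked
        )

        # Distillation signal: match the teacher's soft predictions on the
        # same target-domain batch, so adaptation and compression share one pass.
        with torch.no_grad():
            teacher_out = teacher(input_ids=input_ids, attention_mask=attention_mask)
        kd_loss = F.kl_div(
            F.log_softmax(student_out.logits / temperature, dim=-1),
            F.softmax(teacher_out.logits / temperature, dim=-1),
            reduction="batchmean",
        ) * (temperature ** 2)

        loss = alpha * mlm_loss + (1.0 - alpha) * kd_loss
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        step += 1
        if step >= num_steps:
            break
```

Because both loss terms are computed from one forward pass over target-domain data, a scheme of this shape avoids running a full adaptation stage followed by a separate compression stage, which is the cost saving the abstract attributes to integrating the two processes.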
| Translated title of the contribution | EDAD: Efficient Domain Adaptive Distillation by Integrating Domain Adaptation and Knowledge Distillation |
|---|---|
| Original language | Korean |
| Pages (from-to) | 133-141 |
| Number of pages | 9 |
| Journal | 대한산업공학회지 |
| Volume | 49 |
| Issue number | 2 |
| DOIs | |
| State | Published - 2023 |