XNC: XOR and NOT-Based Lossless Compression for Optimizing Unquantized Embedding Layers in Large Language Models

Junghyeok Lee, Jihoon Jang, Hyun Kim

Research output: Chapter in Book/Report/Conference proceedingConference contributionpeer-review

Abstract

Although 4-bit quantized small LLMs have been proposed recently, many studies have retained FP16 precision for embedding layers, as they constitute a relatively small proportion of the overall model in existing LLMs and suffer from severe accuracy degradation when quantized. However, in quantized small LLMs, the embedding layer accounts for a substantial proportion of the total model parameters, necessitating its compression. Since embedding layers are sensitive to approximation, lossless compression is more desirable than lossy compression methods such as quantization. While existing lossless compression methods efficiently compress patterns such as zeros, narrow values, or frequently occurring values, embedding layers typically lack these patterns, making effective compression more challenging. In this paper, we propose XOR and NOT-based lossless compression (XNC), which applies XOR operations between adjacent 16-bit blocks and then performs a NOT operation on the result, effectively truncating the upper and lower bits to compress the embedding layer to 9-bit without any loss. The proposed method leverages XOR and NOT operations, enabling easy hardware implementation, with only four cycles required for compression and three cycles for decompression, ensuring efficient data compression without performance degradation. As a result, the proposed compression technique achieves an average compression ratio of 1.34× for the embedding layers of small LLMs without any loss, effectively reducing the model size of 4-bit quantized LLMs by an average of 9.91%. The code is available at https://github.com/IDSL-SeoulTech/XNC.

Original languageEnglish
Title of host publicationISCAS 2025 - IEEE International Symposium on Circuits and Systems, Proceedings
PublisherInstitute of Electrical and Electronics Engineers Inc.
ISBN (Electronic)9798350356830
DOIs
StatePublished - 2025
Event2025 IEEE International Symposium on Circuits and Systems, ISCAS 2025 - London, United Kingdom
Duration: 25 May 202528 May 2025

Publication series

NameProceedings - IEEE International Symposium on Circuits and Systems
ISSN (Print)0271-4310

Conference

Conference2025 IEEE International Symposium on Circuits and Systems, ISCAS 2025
Country/TerritoryUnited Kingdom
CityLondon
Period25/05/2528/05/25

Keywords

  • embedding layer
  • Lossless compression
  • on-device AI
  • quantization
  • small language models

Fingerprint

Dive into the research topics of 'XNC: XOR and NOT-Based Lossless Compression for Optimizing Unquantized Embedding Layers in Large Language Models'. Together they form a unique fingerprint.

Cite this