TY - GEN
T1 - Cache compression with Golomb-Rice code and quantization for convolutional neural networks
AU - Bae, Seung Hwan
AU - Lee, Hyuk Jae
AU - Kim, Hyun
N1 - Publisher Copyright:
© 2021 Institute of Electrical and Electronics Engineers Inc. All rights reserved.
PY - 2021
Y1 - 2021
AB - Cache compression schemes reduce the cache miss rate by increasing the effective cache capacity, thereby reducing memory accesses and power consumption. Cache compression is therefore beneficial for applications with heavy memory traffic, including convolutional neural networks (CNNs). In this paper, a new cache compression scheme for floating-point numbers is proposed for CNNs. The exponent is compressed with the Golomb-Rice code, instead of the Huffman code, for an efficient hardware implementation. The compression syntax is carefully designed, by distinguishing the two different types of data used in CNNs, so that the size of the compressed data stays close to the entropy, which is the theoretical limit. The mantissa of CNN data, on the other hand, can hardly be compressed by entropy coding, so it is simply quantized; thanks to the error robustness of CNNs, this quantization does not significantly degrade CNN performance. The quantization reduces the 23-bit mantissa to 4 bits. Experimental results show that the miss rate of a 1 MB cache with the proposed compression applied is nearly the same as that of an uncompressed 2 MB cache, with no decrease in CNN accuracy.
KW - Cache compression
KW - Convolutional neural network
KW - Golomb-Rice code
KW - Quantization
UR - http://www.scopus.com/inward/record.url?scp=85109002481&partnerID=8YFLogxK
U2 - 10.1109/ISCAS51556.2021.9401655
DO - 10.1109/ISCAS51556.2021.9401655
M3 - Conference contribution
AN - SCOPUS:85109002481
T3 - Proceedings - IEEE International Symposium on Circuits and Systems
BT - 2021 IEEE International Symposium on Circuits and Systems, ISCAS 2021 - Proceedings
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 53rd IEEE International Symposium on Circuits and Systems, ISCAS 2021
Y2 - 22 May 2021 through 28 May 2021
ER -
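
The abstract above describes compressing a float32 value by Golomb-Rice coding its exponent and truncating its 23-bit mantissa to 4 bits. Below is a minimal Python sketch of that idea only; the Rice parameter k, the zigzag mapping around an assumed reference exponent, and the bitstring output are illustrative assumptions, not the paper's actual compression syntax (which, per the abstract, additionally distinguishes two types of CNN data).

import struct

def rice_encode(symbol: int, k: int) -> str:
    # Golomb-Rice code with parameter k: quotient in unary
    # (q ones followed by a zero), remainder in k binary bits.
    q, r = symbol >> k, symbol & ((1 << k) - 1)
    return "1" * q + "0" + format(r, f"0{k}b")

def compress_float32(x: float, k: int = 2, mant_bits: int = 4,
                     ref_exp: int = 127) -> str:
    # Unpack the IEEE-754 single-precision bit fields.
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    sign = (bits >> 31) & 0x1
    exponent = (bits >> 23) & 0xFF   # biased 8-bit exponent
    mantissa = bits & 0x7FFFFF       # 23-bit mantissa
    # Zigzag-map the exponent's offset from an assumed reference so that
    # exponents clustered near ref_exp become small non-negative symbols,
    # which the Rice code encodes in few bits.
    d = exponent - ref_exp
    symbol = 2 * d if d >= 0 else -2 * d - 1
    # Quantize the mantissa by keeping only its top mant_bits bits.
    quantized = mantissa >> (23 - mant_bits)
    return str(sign) + rice_encode(symbol, k) + format(quantized, f"0{mant_bits}b")

if __name__ == "__main__":
    for v in (0.15625, -1.0, 3.14):
        print(f"{v:>9} -> {compress_float32(v)}")

Decompression would reverse each step. A likely reason the Rice code is preferred over Huffman for the hardware implementation the abstract mentions is that encoding and decoding need only shifts, masks, and a unary counter rather than a stored code table.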