TY - GEN
T1 - Improving small file I/O performance for massive digital archives
AU - Kim, Hwajung
AU - Yeom, Heonyoung
N1 - Publisher Copyright: © 2017 IEEE.
PY - 2017/11/14
Y1 - 2017/11/14
N2 - With the growth of online services, a large number of files are generated by users or by the services themselves. To serve users with different network environments and devices, online services usually keep several versions of the same file at various sizes. Users with high-speed networks and top-of-the-line displays can be supplied with a large, high-precision file, while users with mobile devices typically receive a smaller file with less precision. In some cases, a large file is divided into small files to make it easier to transmit over wide area networks. As a result, the underlying filesystem must efficiently maintain a large number of small files. Providing such a huge number of files to applications is a new challenge for existing filesystems. In this paper, we propose techniques to efficiently manage a large number of files in digital archives using the data characteristics and access patterns of the application. Based on our knowledge of the upper-layer applications, we modified both the in-memory and on-disk inode structures of the existing filesystem and were able to dramatically reduce the number of storage I/O operations needed to serve the same files. Our experimental results show that the proposed methods significantly reduce the number of storage I/O operations for both reading and writing files, especially small ones. Moreover, using several synthetic benchmarks and microbenchmarks, we demonstrate that the proposed techniques reduce application-level latency and improve file operation throughput.
UR - http://www.scopus.com/inward/record.url?scp=85043788078&partnerID=8YFLogxK
U2 - 10.1109/eScience.2017.39
DO - 10.1109/eScience.2017.39
M3 - Conference contribution
AN - SCOPUS:85043788078
T3 - Proceedings - 13th IEEE International Conference on eScience, eScience 2017
SP - 256
EP - 265
BT - Proceedings - 13th IEEE International Conference on eScience, eScience 2017
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 13th IEEE International Conference on eScience, eScience 2017
Y2 - 24 October 2017 through 27 October 2017
ER -