TY - GEN
T1 - Finer-LRU
T2 - 35th IEEE International Parallel and Distributed Processing Symposium, IPDPS 2021
AU - Bang, Jiwoo
AU - Kim, Chungyong
AU - Kim, Sunggon
AU - Chen, Qichen
AU - Lee, Cheongjun
AU - Byun, Eun Kyu
AU - Lee, Jaehwan
AU - Eom, Hyeonsang
N1 - Publisher Copyright: © 2021 IEEE.
PY - 2021/5
Y1 - 2021/5
N2 - In HPC systems, the increasing need for higher concurrency has led to packing more cores onto a single chip. However, because multiple processes share the memory space, frequent access to resources in critical sections, where operations must execute atomically, can result in poor performance. In this paper, we focus on reducing lock contention in the memory management system of an HPC manycore architecture. One of the critical sections causing severe lock contention in the I/O path is in the page management system, which uses multiple Least Recently Used (LRU) lists protected by a single lock instance. To solve this problem, we propose the Finer-LRU scheme, which optimizes the page reclamation process by splitting the LRU lists into multiple sub-lists, each with its own lock instance. Our evaluation results show that the Finer-LRU scheme can improve sequential write throughput by 57.03% and reduce latency by 98.94% compared to the baseline Linux kernel version 5.2.8 on the Intel Knights Landing (KNL) architecture.
KW - Fine-grained lock
KW - High performance computing
KW - Manycore architecture
KW - Page reclamation process
UR - http://www.scopus.com/inward/record.url?scp=85113540722&partnerID=8YFLogxK
U2 - 10.1109/IPDPS49936.2021.00065
DO - 10.1109/IPDPS49936.2021.00065
M3 - Conference contribution
AN - SCOPUS:85113540722
T3 - Proceedings - 2021 IEEE 35th International Parallel and Distributed Processing Symposium, IPDPS 2021
SP - 567
EP - 576
BT - Proceedings - 2021 IEEE 35th International Parallel and Distributed Processing Symposium, IPDPS 2021
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 17 May 2021 through 21 May 2021
ER -