TY - GEN
T1 - Evaluation of the Memory Communication Traffic in a Hierarchical Cache Model for Massively-Manycore Processors
AU - Khanjari, Sharifa Al
AU - Vanderbauwhede, Wim
N1 - Publisher Copyright: © 2016 IEEE.
PY - 2016/3/31
Y1 - 2016/3/31
N2 - The scaling of semiconductor technologies is leading to processors with increasing numbers of cores. A key enabler in manycore systems is the use of Networks-on-Chip (NoC) as a global communication mechanism. The adoption of NoCs in manycore systems requires a shift in focus from computation to communication, as communication is fast becoming the dominant factor in processor performance. Many researchers have focused on direct communication between cores in the NoC; however, in a manycore processor the communication is actually between the cores and the memory hierarchy. In this work, we investigate the memory communication traffic of threads that share memory in a hierarchical cache architecture. We argue that the performance scalability of shared-memory applications in a hierarchical cache architecture for systems with thousands of processor cores depends on the distance between threads sharing memory in terms of the cache hierarchy (the "memory distance"). We present latency and throughput results comparing fat quadtree, concentrated mesh and mesh topologies as a function of the "memory distance" between the threads. Our results, using the ITRS physical data for 2023, show that the thread placement model and the distance at which threads are placed significantly affect NoC performance, and that scale-invariant topologies perform better than flat topologies.
KW - Manycore
KW - Network on Chip
KW - Quadtree
KW - Shared-Memory Architecture
UR - http://www.scopus.com/inward/record.url?scp=84968894978&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84968894978&partnerID=8YFLogxK
U2 - 10.1109/PDP.2016.30
DO - 10.1109/PDP.2016.30
M3 - Conference contribution
AN - SCOPUS:84968894978
T3 - Proceedings - 24th Euromicro International Conference on Parallel, Distributed, and Network-Based Processing, PDP 2016
SP - 726
EP - 733
BT - Proceedings - 24th Euromicro International Conference on Parallel, Distributed, and Network-Based Processing, PDP 2016
A2 - Cotronis, Yiannis
A2 - Daneshtalab, Masoud
A2 - Papadopoulos, George Angelos
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 24th Euromicro International Conference on Parallel, Distributed, and Network-Based Processing, PDP 2016
Y2 - 17 February 2016 through 19 February 2016
ER -