TY - GEN
T1 - Adjustable block size coherent caches
AU - Dubnicki, Czarek
AU - LeBlanc, Thomas J.
PY - 1993/12/1
Y1 - 1993/12/1
N2 - Several studies have shown that the performance of coherent caches depends on the relationship between the cache block size and the granularity of sharing and locality exhibited by the program. Large cache blocks exploit processor and spatial locality, but may cause unnecessary cache invalidations due to false sharing. Small cache blocks can reduce the number of cache invalidations, but increase the number of bus or network transactions required to load data into the cache. In this paper we describe a cache organization that dynamically adjusts the cache block size according to recently observed reference behavior. Cache blocks are split across cache lines when false sharing occurs, and merged back into a single cache line when spatial locality improves. To evaluate this cache organization, we simulate a scalable multiprocessor with coherent caches, using a suite of memory reference traces to model program behavior. For each fixed block size, some program suffers a 33% increase in the average waiting time per reference, and a factor of 2 increase in the average number of words transferred per reference, when compared against the performance of an adjustable block size cache. In the few cases where adjusting the block size does not provide superior performance, it comes within 7% of the best fixed block size alternative. We conclude that an adjustable block size cache offers significantly better performance than every fixed block size cache, especially when there is variability in the granularity of sharing exhibited by the application.
AB - Several studies have shown that the performance of coherent caches depends on the relationship between the cache block size and the granularity of sharing and locality exhibited by the program. Large cache blocks exploit processor and spatial locality, but may cause unnecessary cache invalidations due to false sharing. Small cache blocks can reduce the number of cache invalidations, but increase the number of bus or network transactions required to load data into the cache. In this paper we describe a cache organization that dynamically adjusts the cache block size according to recently observed reference behavior. Cache blocks are split across cache lines when false sharing occurs, and merged back into a single cache line when spatial locality improves. To evaluate this cache organization, we simulate a scalable multiprocessor with coherent caches, using a suite of memory reference traces to model program behavior. For each fixed block size, some program suffers a 33% increase in the average waiting time per reference, and a factor of 2 increase in the average number of words transferred per reference, when compared against the performance of an adjustable block size cache. In the few cases where adjusting the block size does not provide superior performance, it comes within 7% of the best fixed block size alternative. We conclude that an adjustable block size cache offers significantly better performance than every fixed block size cache, especially when there is variability in the granularity of sharing exhibited by the application.
UR - http://www.scopus.com/inward/record.url?scp=0027805837&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=0027805837&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:0027805837
SN - 0897915097
T3 - Proceedings of the 19th Annual International Symposium on Computer Architecture
SP - 170
EP - 180
BT - Proceedings of the 19th Annual International Symposium on Computer Architecture
PB - ACM
T2 - Proceedings of the 19th Annual International Symposium on Computer Architecture
Y2 - 19 May 1992 through 21 May 1992
ER -