Take advantage of spatial locality
In C, matrices are stored in row-major order: the elements of each row are contiguous in memory. So after an access to element a[i][j], an access to a[i][j+1] is likely to hit the cache, and no access to main memory is needed. Caches are much faster than main memory, so the access pattern does matter.
Of course, more factors must be taken into account, such as read vs. write access, write policy (write-through vs. write-back, write-allocate vs. no-write-allocate), multilevel caches, and so on. But that seems like overkill for this question.
Have some fun with a profiling tool such as cachegrind and see it for yourself.
For example, consider a dummy program accessing a 4 MB matrix (1024 × 1024 ints). Check out the difference between the miss rates for each access pattern.
Column access
$ cat col_major.c
#include <stdio.h>

int main(void) {
    size_t i, j;
    const size_t dim = 1024;
    int matrix[dim][dim];

    for (i = 0; i < dim; i++) {
        for (j = 0; j < dim; j++) {
            matrix[j][i] = i * j;
        }
    }
    return 0;
}
$ valgrind --tool=cachegrind ./col_major
==3228== D refs: 10,548,665 (9,482,149 rd + 1,066,516 wr)
==3228== D1 misses: 1,049,704 ( 968 rd + 1,048,736 wr)
==3228== L2d misses: 1,049,623 ( 893 rd + 1,048,730 wr)
==3228== D1 miss rate: 9.9% ( 0.0% + 98.3% )
==3228== L2d miss rate: 9.9% ( 0.0% + 98.3% )
Row access
$ cat row_major.c
#include <stdio.h>

int main(void) {
    size_t i, j;
    const size_t dim = 1024;
    int matrix[dim][dim];

    for (i = 0; i < dim; i++) {
        for (j = 0; j < dim; j++) {
            matrix[i][j] = i * j;
        }
    }
    return 0;
}
$ valgrind --tool=cachegrind ./row_major
==3524== D refs: 10,548,665 (9,482,149 rd + 1,066,516 wr)
==3524== D1 misses: 66,664 ( 968 rd + 65,696 wr)
==3524== L2d misses: 66,583 ( 893 rd + 65,690 wr)
==3524== D1 miss rate: 0.6% ( 0.0% + 6.1% )
==3524== L2d miss rate: 0.6% ( 0.0% + 6.1% )