The natural candidate for parallelism is the outer loop. However, because it uses two iteration variables (i.e., i and k), it cannot simply be parallelized as-is with OpenMP.
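For reference, the question's double loop is not reproduced in this answer, but from the printed indices shown below one can reconstruct a plausible form of it (this reconstruction is an assumption, not necessarily the asker's exact code):

for(int i=0, k=0; i<n*n; i+=n, k++){            /* two outer variables: i and k */
    for(int j=i, l=0; l<n; j++, l++)
        result[k] += matrix[j] * vector[l];     /* row-major matrix-vector product */
}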
Therefore, we first need to rearrange the outer loop so that it uses only one variable. Let us choose k, since it is the variable used to iterate over the array result. That choice matters because result is the only array being written to, so it is the one whose iterations make the most sense to distribute among threads.
If one runs your double loop (with n=4) and, instead of result[k] += matrix[j]*vector[l];, prints the values of i, k and j, one gets the following:
i=0 k=0 j=0
i=0 k=0 j=1
i=0 k=0 j=2
i=0 k=0 j=3
i=4 k=1 j=4
i=4 k=1 j=5
i=4 k=1 j=6
i=4 k=1 j=7
i=8 k=2 j=8
i=8 k=2 j=9
i=8 k=2 j=10
i=8 k=2 j=11
i=12 k=3 j=12
i=12 k=3 j=13
i=12 k=3 j=14
i=12 k=3 j=15
So i = k * n, and your double loop is therefore equivalent to:
for(int k=0; k<n; k++){
    int i = k * n;
    for(int j=i, l=0; l<n; j++, l++)    /* l<n alone suffices: j and l advance together */
        result[k] += matrix[j] * vector[l];
}
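As an aside, since j always equals k*n + l inside the inner loop, the same computation can be written with a single inner index (a stylistic alternative, not part of the original answer):

for(int k=0; k<n; k++){
    for(int l=0; l<n; l++)
        result[k] += matrix[k*n + l] * vector[l];
}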
Now the outer loop can be parallelized with OpenMP:
#pragma omp parallel for
for(int k=0; k<n; k++){
    for(int j=k*n, l=0; l<n; j++, l++)
        result[k] += matrix[j] * vector[l];
}
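This is safe to parallelize because each value of k, and therefore each element result[k], is handled by exactly one thread, so there are no concurrent writes; matrix and vector are only read. For completeness, here is a minimal self-contained sketch that embeds the parallel loop in a test program (main, the fill values, and n are assumptions added for illustration; compile with e.g. gcc -fopenmp):

#include <stdio.h>
#include <stdlib.h>

int main(void){
    int n = 4;                                    /* assumed size for the demo */
    double *matrix = malloc(n * n * sizeof *matrix);
    double *vector = malloc(n * sizeof *vector);
    double *result = calloc(n, sizeof *result);   /* result must start at zero */

    for(int i = 0; i < n * n; i++) matrix[i] = i; /* arbitrary fill values */
    for(int l = 0; l < n; l++)     vector[l] = 1.0;

    #pragma omp parallel for
    for(int k = 0; k < n; k++){
        for(int j = k * n, l = 0; l < n; j++, l++)
            result[k] += matrix[j] * vector[l];
    }

    for(int k = 0; k < n; k++)
        printf("result[%d] = %g\n", k, result[k]);

    free(matrix); free(vector); free(result);
    return 0;
}

Note that the pragma alone needs no header; without -fopenmp it is simply ignored and the program runs serially, which makes it easy to compare serial and parallel results.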