Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
328 views
in Technique[技术] by (71.8m points)

simd - What is "vectorization"?

Several times now, I've encountered this term in matlab, fortran ... some other ... but I've never found an explanation what does it mean, and what it does? So I'm asking here, what is vectorization, and what does it mean for example, that "a loop is vectorized" ?

Question&Answers:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Answer

0 votes
by (71.8m points)

Many CPUs have "vector" or "SIMD" instruction sets which apply the same operation simultaneously to two, four, or more pieces of data. Modern x86 chips have the SSE instructions, many PPC chips have the "Altivec" instructions, and even some ARM chips have a vector instruction set, called NEON.

"Vectorization" (simplified) is the process of rewriting a loop so that instead of processing a single element of an array N times, it processes (say) 4 elements of the array simultaneously N/4 times.

(I chose 4 because it's what modern hardware is most likely to directly support; the term "vectorization" is also used to describe a higher level software transformation where you might just abstract away the loop altogether and just describe operating on arrays instead of the elements that comprise them)


The difference between vectorization and loop unrolling: Consider the following very simple loop that adds the elements of two arrays and stores the results to a third array.

for (int i=0; i<16; ++i)
    C[i] = A[i] + B[i];

Unrolling this loop would transform it into something like this:

for (int i=0; i<16; i+=4) {
    C[i]   = A[i]   + B[i];
    C[i+1] = A[i+1] + B[i+1];
    C[i+2] = A[i+2] + B[i+2];
    C[i+3] = A[i+3] + B[i+3];
}

Vectorizing it, on the other hand, produces something like this:

for (int i=0; i<16; i+=4)
    addFourThingsAtOnceAndStoreResult(&C[i], &A[i], &B[i]);

Where "addFourThingsAtOnceAndStoreResult" is a placeholder for whatever intrinsic(s) your compiler uses to specify vector instructions. Note that some compilers are able to auto vectorize very simple loops like this, which can often be enabled via a compile option. More complex algorithms still require help from the programmer to generate good vector code.


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...