Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

c - Conditional SSE/AVX add or zero elements based on compare

I have the following __m128 vectors:

v_weight

v_entropy

I need to add v_entropy to v_weight, but only in the elements where v_weight is not 0.0f.

Obviously _mm_add_ps() adds all elements regardless.

I can compile up to AVX, but not AVX2.

EDIT

I do know beforehand how many elements in v_weight will be 0 (either none, or the last 1, 2, or 3 elements). If it's easier, how do I zero out the corresponding elements in v_entropy?



1 Answer


The cmpeq/cmpgt instructions produce a per-element mask: all ones where the comparison is true, all zeros where it is false. Use that mask to zero the entropy elements you don't want before adding:

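__m128 mask = _mm_cmpeq_ps(_mm_setzero_ps(), w); // all ones where w == 0
mask = _mm_andnot_ps(mask, entropy);             // ~mask & entropy: entropy where w != 0, else 0
w = _mm_add_ps(w, mask);                         // adds 0 in the lanes where w == 0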
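As a minimal sketch, the three lines above can be wrapped into a small function that needs nothing beyond baseline SSE. The names add_where_nonzero and add_where_nonzero4 are hypothetical helpers introduced only for this illustration, and the array wrapper assumes unaligned 4-float buffers:

```c
#include <xmmintrin.h> /* SSE: __m128, _mm_cmpeq_ps, _mm_andnot_ps, _mm_add_ps */

/* Hypothetical helper: returns w + e in lanes where w != 0,
   and leaves w (i.e. 0) unchanged in lanes where w == 0. */
static __m128 add_where_nonzero(__m128 w, __m128 e)
{
    __m128 is_zero = _mm_cmpeq_ps(w, _mm_setzero_ps()); /* all ones where w == 0 */
    __m128 masked  = _mm_andnot_ps(is_zero, e);         /* ~is_zero & e */
    return _mm_add_ps(w, masked);                       /* adds 0 where w == 0 */
}

/* Hypothetical convenience wrapper for plain 4-float arrays (unaligned access). */
static void add_where_nonzero4(const float *w, const float *e, float *out)
{
    _mm_storeu_ps(out, add_where_nonzero(_mm_loadu_ps(w), _mm_loadu_ps(e)));
}
```

Note that -0.0f compares equal to zero, so lanes holding -0.0f are treated as zero weights as well.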

Another option is to add unconditionally, then use blendv (SSE4.1) to select between the summed and the original value in each element.

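__m128 zero = _mm_setzero_ps();
__m128 w2 = _mm_add_ps(entropy, w);   // add everywhere
__m128 mask = _mm_cmpeq_ps(zero, w);  // all ones where w == 0
w = _mm_blendv_ps(w2, w, mask);       // take w (0) where the mask is set, else w2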
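If SSE4.1 is not available on some target, the same select can be emulated with and/andnot/or, since the compare mask is all ones or all zeros in every lane. This is a sketch of that substitute technique, not the blendv instruction itself; select_ps and add_where_nonzero_select are hypothetical names used only here:

```c
#include <xmmintrin.h> /* SSE only */

/* Hypothetical helper: per lane, pick b where mask is all ones, else a.
   Same argument order as _mm_blendv_ps, but it assumes a full-lane mask
   (blendv itself looks only at the sign bit of each mask element). */
static __m128 select_ps(__m128 a, __m128 b, __m128 mask)
{
    return _mm_or_ps(_mm_and_ps(mask, b), _mm_andnot_ps(mask, a));
}

/* Add everywhere, then keep the original w (0) in lanes where w == 0. */
static __m128 add_where_nonzero_select(__m128 w, __m128 e)
{
    __m128 sum     = _mm_add_ps(w, e);
    __m128 is_zero = _mm_cmpeq_ps(w, _mm_setzero_ps());
    return select_ps(sum, w, is_zero);
}
```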

A third option uses the fact that the result must be 0 wherever w == 0: add unconditionally, then clear those elements.

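__m128 m = _mm_cmpeq_ps(w, _mm_setzero_ps()); // mask as above: all ones where w == 0
w = _mm_add_ps(w, entropy);                   // add everywhere
w = _mm_andnot_ps(m, w);                      // ~m & w: clear the lanes where w was 0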

(I'm using cmpeq rather than cmpneq so the same approach carries over to integer vectors, where SSE provides only equality and greater-than compares.)
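For the case in the question's edit, where the number of trailing zero elements is known in advance, no compare is needed: a constant mask can zero the corresponding elements of v_entropy directly. The following is a rough sketch under that assumption; zero_last_n and its window table are hypothetical, and "last" is taken to mean the highest-indexed elements in memory order. It uses SSE2 for the integer load and cast.

```c
#include <emmintrin.h> /* SSE2: _mm_loadu_si128, _mm_castsi128_ps */
#include <stdint.h>

/* Hypothetical helper: zero the last n elements of v, for n in 0..4.
   Reading 4 ints starting at window + n yields (4 - n) all-ones lanes
   followed by n all-zero lanes. */
static __m128 zero_last_n(__m128 v, int n)
{
    static const int32_t window[8] = { -1, -1, -1, -1, 0, 0, 0, 0 };
    __m128i mask = _mm_loadu_si128((const __m128i *)(window + n));
    return _mm_and_ps(v, _mm_castsi128_ps(mask));
}
```

With the entropy vector masked this way, a plain _mm_add_ps(v_weight, zero_last_n(v_entropy, n)) leaves the zero weights untouched.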

