I'm testing Intel ADX add with carry and add with overflow to pipeline adds on large integers. I'd like to see what expected code generation should look like. From _addcarry_u64 and _addcarryx_u64 with MSVC and ICC, I thought this would be a suitable test case:
#include <stdint.h>
#include <x86intrin.h>
#include "immintrin.h"
int main(int argc, char* argv[])
{
#define MAX_ARRAY 100
uint8_t c1 = 0, c2 = 0;
uint64_t a[MAX_ARRAY]={0}, b[MAX_ARRAY]={0}, res[MAX_ARRAY];
for(unsigned int i=0; i< MAX_ARRAY; i++){
c1 = _addcarryx_u64(c1, res[i], a[i], (unsigned long long int*)&res[i]);
c2 = _addcarryx_u64(c2, res[i], b[i], (unsigned long long int*)&res[i]);
}
return 0;
}
When I examine the generated code from GCC 6.1 using -O3
and -madx
, it reveals serialized addc
. -O1
and -O2
produces similar results:
main:
subq $688, %rsp
xorl %edi, %edi
xorl %esi, %esi
leaq -120(%rsp), %rdx
xorl %ecx, %ecx
leaq 680(%rsp), %r8
.L2:
movq (%rdx), %rax
addb $-1, %sil
adcq %rcx, %rax
setc %sil
addb $-1, %dil
adcq %rcx, %rax
setc %dil
movq %rax, (%rdx)
addq $8, %rdx
cmpq %r8, %rdx
jne .L2
xorl %eax, %eax
addq $688, %rsp
ret
So I'm guessing the test case is not quite hitting the mark, or I am doing something wrong, or I am using something incorrectly, ...
If I am parsing Intel's docs on _addcarryx_u64
correctly, I believe the C code should generate the pipeline. So I'm guessing I am doing something wrong:
Description
Add unsigned 64-bit integers a and b with unsigned 8-bit carry-in c_in
(carry or overflow flag), and store the unsigned 64-bit result in out,
and the carry-out in dst (carry or overflow flag).
How can I generate the pipeline'd add with carry/add with overflow (adcx
/adox
)?
I've actually got a 5th generation Core i7 ready for testing (notice the adx
cpu flag):
$ cat /proc/cpuinfo | grep adx
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush
dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc
arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni
pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 fma cx16 xtpr pdcm pcid sse4_1
sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm
3dnowprefetch ida arat epb pln pts dtherm tpr_shadow vnmi flexpriority ept vpid fsgsbase
tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm rdseed adx smap xsaveopt
...
See Question&Answers more detail:
os