Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
441 views
in Technique[技术] by (71.8m points)

c++ - Efficient computation of the high order bits of a 32 bit integer multiplication

Many CPUs have single assembly opcodes for returning the high order bits of a 32 bit integer multiplication. Normally multiplying two 32 bit integers produces a 64 bit result, but this is truncated to the low 32 bits if you store it in a 32 bit integer.

For example, on PowerPC, the mulhw opcode returns the high 32 bits of the 64 bit result of a 32x32 bit multiply in one clock. This is exactly what I'm looking for, but more portably. There's a similar opcode, umulhi(), in NVidia CUDA.

In C/C++, is there an efficient way to return the high order bits of the 32x32 multiply? Currently I compute it by casting to 64 bits, something like:

unsigned int umulhi32(unsigned int x, unsigned int y)
{
  unsigned long long xx=x;
  xx*=y;
  return (unsigned int)(xx>>32);
}

but this is over 11 times slower than a regular 32 by 32 multiply because I'm using overkill 64 bit math even for the multiply.

Is there a faster way to compute the high order bits?

This is clearly not best solved with a BigInteger library (which is overkill and will have huge overhead).

SSE seems to have PMULHUW, a 16x16 -> top 16 bit version of this, but not a 32x32 -> top 32 version like I'm looking for.

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Answer

0 votes
by (71.8m points)

gcc 4.3.2, with -O1 optimisation or higher, translated your function exactly as you showed it to IA32 assembly like this:

umulhi32:
        pushl   %ebp
        movl    %esp, %ebp
        movl    12(%ebp), %eax
        mull    8(%ebp)
        movl    %edx, %eax
        popl    %ebp
        ret

Which is just doing a single 32 bit mull and putting the high 32 bits of the result (from %edx) into the return value.

That's what you wanted, right? Sounds like you just need to turn up the optimisation on your compiler ;) It's possible you could push the compiler in the right direction by eliminating the intermediate variable:

unsigned int umulhi32(unsigned int x, unsigned int y)
{
  return (unsigned int)(((unsigned long long)x * y)>>32);
}

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

2.1m questions

2.1m answers

60 comments

57.0k users

...