Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
695 views
in Technique[技术] by (71.8m points)

algorithm - De Bruijn-like sequence for `2^n - 1`: how is it constructed?

I'm looking at the entry Find the log base 2 of an N-bit integer in O(lg(N)) operations with multiply and lookup from Bit Twiddling hacks.

I can easily see how the second algorithm in that entry works

static const int MultiplyDeBruijnBitPosition2[32] = 
{
  0, 1, 28, 2, 29, 14, 24, 3, 30, 22, 20, 15, 25, 17, 4, 8, 
  31, 27, 13, 23, 21, 19, 16, 7, 26, 12, 18, 6, 11, 5, 10, 9
};
r = MultiplyDeBruijnBitPosition2[(uint32_t)(v * 0x077CB531U) >> 27];

which calculates n = log2 v where v is known to be a power of 2. In this case 0x077CB531 is an ordinary De Bruijn sequence, and the rest is obvious.

However, the first algorithm in that entry

static const int MultiplyDeBruijnBitPosition[32] =
{
  0, 9, 1, 10, 13, 21, 2, 29, 11, 14, 16, 18, 22, 25, 3, 30,
  8, 12, 20, 28, 15, 17, 24, 7, 19, 27, 23, 6, 26, 5, 4, 31
};

v |= v >> 1;
v |= v >> 2;
v |= v >> 4;
v |= v >> 8;
v |= v >> 16;

r = MultiplyDeBruijnBitPosition[(uint32_t)(v * 0x07C4ACDDU) >> 27];

looks a bit more tricky to me. We begin by snapping v to a nearest greater 2^n - 1 value. This 2^n - 1 value is then multiplied by 0x07C4ACDD, which in this case acts the same way as the DeBruijn sequence in the previous algorithm did.

My question is: how do we construct this magic 0x07C4ACDD sequence? I.e. how do we construct a sequence that can be used to generate unique indices when multiplied by a 2^n - 1 value? For 2^n multiplier it is just an ordinary De Bruijn sequence, as we can see above, so it is clear where 0x077CB531 came from. But what about 2^n - 1 multiplier 0x07C4ACDD? I feel like I'm missing something obvious here.

P.S. To clarify my question: I'm not really looking for an algorithm to generate these sequences. I'm more interested in some more-or-less trivial property (if one exists) that makes 0x07C4ACDD work as we want it to work. For 0x077CB531 the property that makes it work is pretty obvious: it contains all 5-bit combinations "stored" in the sequence with 1-bit stepping (which is basically what De Bruijn sequence is).

The 0x07C4ACDD, on the other hand, is not a De Bruijn sequence by itself. So, what property were they aiming for when constructing 0x07C4ACDD (besides the non-constructive "it should make the above algorithm work")? Someone did come up with the above algorithm somehow. So they probably knew that the approach is viable, and that the appropriate sequence exists. How did they know that?

For example, if I were to construct the algorithm for an arbitrary v, I'd do

v |= v >> 1;
v |= v >> 2;
...

first. Then I'd just do ++v to turn v into a power of 2 (let's assume it doesn't overflow). Then I'd apply the first algorithm. And finally I'd do --r to obtain the final answer. However, these people managed to optimize it: they eliminated the leading ++v and the trailing --r steps simply by changing the multiplier and rearranging the table. How did they know it was possible? What is the math behind this optimization?

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Answer

0 votes
by (71.8m points)

A De Bruijn sequence of order n over k symbols (and of k^n length) have a property that every possible n-length word appears as consecutive characters in it, some of them with cyclic wrapping. For example, in the case of k=2, n=2, the possible words are 00, 01, 10, 11, and a De Bruijn sequence is 0011. 00, 01, 11 appears in it, 10 with wrapping. This property naturally means that left shifting a De Bruijn sequence (multiplying with power of two) and taking its upper n bits results in a unique number for each power of two multiplier. Then you only need a lookup table to determine which one it is. It works on a similar principle to numbers which are one less than power of two, but the magic number in this case is not a De Bruijn sequence, but an analogue. The defining property simply changes to "every possible n-length word appears as the sum of the first m subsequences of length n, mod 2^n". This property is all that is needed for the algorithm to work. They simply used this different class of magic numbers to speed up the algorithm. I did as well.

One possible method of construction of De Bruijn numbers is the generation of a Hamiltonian path of the De Bruijn graph, Wikipedia provides an example of such a graph. In this case, the nodes are 2^5=32-bit integers, the directed edges are transitions between them, where a transition is a left shift and a binary or operation according to the label of the edge, 0 or 1. There might be a direct analogue to 2^n-1 type magic numbers, it might be worth exploring, but this isn't a way people usually construct such algorithms.

In practice you might try to construct it differently, especially if you want it to behave in a tad different manner. For example, the implementation of leading/trailing number of zeros algorithms on the bit twiddling hacks page can only return values in [0..31]. It needs additional checking for the case of 0, which has 32 zeros. This check requires a branching and can be way too slow on some CPUs.

The way I did it, I used a 64 element lookup table instead of 32, generated random magic numbers, and for each of them I built up a lookup table with power of two inputs, checked its correctness (injectivity), then verified it for all 32-bit numbers. I continued till I encountered a correct magic number. The resulting numbers do not fulfill a property of "every possible n-length word appears", since only 33 numbers appear, which are unique for all 33 possible input.

Exhaustive brute force search sounds slow, especially if good magic numbers are rare, but if we first test known power of two values as inputs, the table is filled quickly, rejection is fast, and the rejection rate is very high. We only need to clear the table after each magic number. In essence I exploited a high rejection rate algorithm to construct magic numbers.

The resulting algorithms are

int32 Integer::numberOfLeadingZeros (int32 x)
{
    static int32 v[64] = {
        32, -1, 1, 19, -1, -1, -1, 27, -1, 24, 3, -1, 29, -1, 9, -1,
        12, 7, -1, 20, -1, -1, 4, 30, 10, -1, 21, -1, 5, 31, -1, -1,
        -1, -1, 0, 18, 17, 16, -1, -1, 15, -1, -1, -1, 26, -1, 14, -1,
        23, -1, 2, -1, -1, 28, 25, -1, -1, 13, 8, -1, -1, 11, 22, 6};
    x |= x >> 1;
    x |= x >> 2;
    x |= x >> 4;
    x |= x >> 8;
    x |= x >> 16;
    x *= 0x749c0b5d;
    return v[cast<uint32>(x) >> 26];
}

int32 Integer::numberOfTrailingZeros (int32 x)
{
    static int32 v[64] = {
        32, -1, 2, -1, 3, -1, -1, -1, -1, 4, -1, 17, 13, -1, -1, 7,
        0, -1, -1, 5, -1, -1, 27, 18, 29, 14, 24, -1, -1, 20, 8, -1,
        31, 1, -1, -1, -1, 16, 12, 6, -1, -1, -1, 26, 28, 23, 19, -1,
        30, -1, 15, 11, -1, 25, 22, -1, -1, 10, -1, 21, 9, -1, -1, -1};
    x &= -x;
    x *= 0x4279976b;
    return v[cast<uint32>(x) >> 26];
}

As for your question of how did they know, they probably didn't. They experimented, tried to change things, just like me. After all, it isn't a big stretch of imagination that 2^n-1 inputs might work instead of 2^n inputs with different magic number and lookup table.

Here, I made a simplified version of my magic number generator code. It checks all possible magic numbers in 5 minutes if we check only for power of two inputs, finding 1024 magic numbers. Checking against other inputs are pointless since they are reduced to 2^n-1 form anyway. Does not construct the table, but it is trivial once you know the magic number.

#include <Frigo/all>
#include <Frigo/all.cpp>

using namespace Frigo::Lang;
using namespace std;

class MagicNumberGenerator
{

    public:

        static const int32 log2n = 5;
        static const int32 n = 1 << log2n;
        static const bool tryZero = false;

        MagicNumberGenerator () {}

        void tryAllMagic ()
        {
            for( int32 magic = 0; magic < Integer::MAX_VALUE; magic++ ){
                tryMagic(magic);
            }
            tryMagic(Integer::MAX_VALUE);
            for( int32 magic = Integer::MIN_VALUE; magic < 0; magic++ ){
                tryMagic(magic);
            }
        }

        bool tryMagic (int32 magic)
        {
            //  clear table
            for( int32 i = 0; i < n; i++ ){
                table[i] = -1;
            }
            //  try for zero
            if( tryZero and not tryInput(magic, 0) ){
                return false;
            }
            //  try for all power of two inputs, filling table quickly in the process
            for( int32 i = 0; i < 32; i++ ){
                if( not tryInput(magic, 1 << i) ){
                    return false;
                }
            }
            //  here we would test all possible 32-bit inputs except zero, but it is pointless due to the reduction to 2^n-1 form
            //  we found a magic number
            cout << "Magic number found: 0x" << Integer::toHexString(magic) << endl;
            return true;
        }

        bool tryInput (int32 magic, int32 x)
        {
            //  calculate good answer
            int32 leadingZeros = goodNumberOfLeadingZeros(x);
            //  calculate scrambled but hopefully injective answer
            x |= x >> 1;
            x |= x >> 2;
            x |= x >> 4;
            x |= x >> 8;
            x |= x >> 16;
            x *= magic;
            x = Integer::unsignedRightShift(x, 32 - log2n);
            //  reject if answer is not injective
            if( table[x] != -1 ){
                return table[x] == leadingZeros;
            }
            //  store result for further injectivity checks
            table[x] = leadingZeros;
            return true;
        }

        static int32 goodNumberOfLeadingZeros (int32 x)
        {
            int32 r = 32;
            if( cast<uint32>(x) & 0xffff0000 ){
                x >>= 16;
                r -= 16;
            }
            if( x & 0xff00 ){
                x >>= 8;
                r -= 8;
            }
            if( x & 0xf0 ){
                x >>= 4;
                r -= 4;
            }
            if( x & 0xc ){
                x >>= 2;
                r -= 2;
            }
            if( x & 0x2 ){
                x >>= 1;
                r--;
            }
            if( x & 0x1 ){
                r--;
            }
            return r;
        }

        int32 table[n];

};

int32 main (int32 argc, char* argv[])
{
    if(argc||argv){}
    measure{
        MagicNumberGenerator gen;
        gen.tryAllMagic();
    }
}

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...