assembly - Assembler task : translations

Question

Welcome To Ask or Share your Answers For Others

assembly - Assembler task : translations

1 Answer

深蓝 · Answer 1 · 2022-01-31T07:26:51+0000

Anyway, about your question... the first loop doesn't have dependency inside the loop, i.e. each iteration of loop is independent and can be calculated ASAP (actually all except the last iteration can be just thrown away, because they don't affect return value at all).

The second loop for each iteration depends on previous result, so the CPU has to wait with each next imul until the previous result is ready. The imul on modern x86 has still throughput about 1.0 I guess, but the latency may be above 1.0 and not sure what the dependency will do, depends completely on your target CPU platform, which you didn't specify. (somebody like Peter Cordes can surely answer this for particular modern Intel micro architectures, or you can read yourself Agner's tables, but as you didn't specify target architecture, I don't see point in making any particular real world example, for me this general chit-chat level is enough)

For example on 80386 I guess the second loop would be faster, because it has less instructions, and 80386 was still quite "simple" inside, with imul taking several clocks in either case. On latest Intel CPUs the dependency will probably just so-so skew it in favour of first one, but not much, as imul is reasonably fast today.

Anyway, this is nice example how sorting out your algorithm first, and tuning that, will give you the biggest performance gain, as the first loop is not a real loop, and writing it as simple formula will make the code even faster.

Curiously enough, I tried in godbolt explorer, what modern compilers do about it, and gcc does some quite convoluted thing to read through each array member, or what exactly does that wall of instructions do (too lazy to check in debugger), while clang compiler does see through it and produces the simplified formula instead: https://godbolt.org/g/p2MGHs

P.S. the first loop can be simplified down to:

int loop1_fix(int *a, int x, int n) {
    if (0 < n) return a[n-1]*x*x*x*x;
    else return x*x*x;
}

Categories

assembly - Assembler task : translations

assembly - Assembler task : translations

Please log in or register to add a comment.

Please log in or register to answer this question.

1 Answer

Please log in or register to add a comment.

Just Browsing Browsing

Most popular tags