Showing posts with label compilers. Show all posts
Showing posts with label compilers. Show all posts

[ next | prev | up ]

Superoptimizer

In 1987, Henry Massalin invented the “superoptimizer,” a program that finds the shortest possible instruction sequence for a given function. The superoptimizer works by sheer dumb luck: it checks whether every possible instruction sequence up to a given length produces the desired answer. The sequences that pass this test are not guaranteed to be an exact match for the function, but they usually are and can be checked analytically by a human.

As an example of what a superoptimizer can find, here is an Intel x86 instruction sequence that starts with a signed 16-bit number in ax and ends with the sign of that number (-1, 0, or 1) in dx.

cwd
neg ax
adc dx, dx

Massalin gives this example in his 1987 paper, “Superoptimizer: a look at the smallest program” (subscription required, sadly).

Superoptimizers are useful for compiler writers, but perhaps more as a source of amusement than as a time-saving device. One important way that they can be useful is to find sequences that avoid conditional branches, which get more and more expensive as processor pipelines deepen. Torbjörn Granlund and Richard Kenner used a superoptimizer to eliminate branches in gcc's output; see their paper “Eliminating Branches using a Superoptimizer and the GNU C Compiler.”

The Wikipedia entry for Superoptimization gives links to implementations.

Division via Multiplication

Division is the most expensive integer operation you can ask of your CPU. A well-known optimization rewrites division by constant powers of two as bit shifts: if x is an unsigned integer, then x>>8 is a fast way to compute x/256.

Division by constants that aren't powers of two are a little harder. Jim Blinn dedicated his November 1995 column (“Three Wrongs Make a Right”; subscription required) in IEEE Computer Graphics and Applications to tricks for computing x/255 for 16-bit x (a common computation during alpha compositing). Since 1/255 is approximately the same as 257/65536, you can get pretty close to x/255 by computing x*257/65536 (aka (x*257)>>16)). 257/65536 is a tiny bit bigger smaller than 1/255, so the approximation is sometimes too small by 1. Adding 257/65536 to nudge the values slightly upward results in a perfect approximation: for unsigned 16-bit x, x/255 = (x*257+257)>>16. No division!

In 1991, Robert Alverson described how to determine these approximations for the general case of arbitrary integer constants (“Integer Division Using Reciprocals”). Alverson was on the team building the 64-bit Tera Computer; they used the technique as the hardware implementation of integer division.

The Tera Computer didn't exactly catch on, but a few years later, Torbjörn Granlund and Peter Montgomery, inspired by Alverson's paper, implemented the technique in the GNU C compiler (gcc) for division by integer constants (“Division by Invariant Integers Using Multiplication”). Gcc still uses this technique, as do other compilers. Gcc rewrites a 32-bit unsigned x/255 into (x*0x80808081)>>39 (the intermediate values are 64-bit integers).

Henry S. Warren's delightfully bit-twiddly book Hacker's Delight devotes 46 pages to this topic. His web site includes a magic number computer.