Thoughts and links about programming, by

Division via Multiplication
Posted on Monday, January 14, 2008.

Division is the most expensive integer operation you can ask of your CPU. A well-known optimization rewrites division by constant powers of two as bit shifts: if x is an unsigned integer, then x>>8 is a fast way to compute x/256.

Division by constants that aren't powers of two are a little harder. Jim Blinn dedicated his November 1995 column (“Three Wrongs Make a Right”; subscription required) in IEEE Computer Graphics and Applications to tricks for computing x/255 for 16-bit x (a common computation during alpha compositing). Since 1/255 is approximately the same as 257/65536, you can get pretty close to x/255 by computing x*257/65536 (aka (x*257)>>16)). 257/65536 is a tiny bit bigger smaller than 1/255, so the approximation is sometimes too small by 1. Adding 257/65536 to nudge the values slightly upward results in a perfect approximation: for unsigned 16-bit x, x/255 = (x*257+257)>>16. No division!

In 1991, Robert Alverson described how to determine these approximations for the general case of arbitrary integer constants (“Integer Division Using Reciprocals”). Alverson was on the team building the 64-bit Tera Computer; they used the technique as the hardware implementation of integer division.

The Tera Computer didn't exactly catch on, but a few years later, Torbj√∂rn Granlund and Peter Montgomery, inspired by Alverson's paper, implemented the technique in the GNU C compiler (gcc) for division by integer constants (“Division by Invariant Integers Using Multiplication”). Gcc still uses this technique, as do other compilers. Gcc rewrites a 32-bit unsigned x/255 into (x*0x80808081)>>39 (the intermediate values are 64-bit integers).

Henry S. Warren's delightfully bit-twiddly book Hacker's Delight devotes 46 pages to this topic. His web site includes a magic number computer.

(Comments originally posted via Blogger.)

  • Taj (January 14, 2008 11:24 AM) s/tiny bit bigger/tiny bit smaller/ ?

  • Russ Cox (January 14, 2008 11:54 AM) Fixed, thanks Taj.

  • knowak (August 17, 2009 11:34 AM) Very insightful and a good starting point to explore the topic further. Thanks!

  • Ralph Corderoy (July 4, 2010 6:01 AM) And on some architectures, the resulting x * 257 step can be done cheaply with an add and a barrel-shifter, e.g. ARM's "add r0, r0, r0 lsl #8", r0 = r0 + (r0 << 8), avoiding a multiply.

  • Anonymous (January 20, 2011 5:07 AM) 66413.....25989