Showing posts with label math. Show all posts
Showing posts with label math. Show all posts

[ next | prev | up ]

Bi-quinary and other bases

Doug McIlroy's talk on History of Computing at Bell Labs mentioned that George Stibitz's relay-based decimal calculators that stored digits in what he called bi-quinary: “Arithmetic was done in bi-quinary, a two out of five representation for decimal integers, and if there weren't exactly two out of five relays activated it would stop.” This allowed a sequence of computing jobs to run unattended over the weekend: if the computer detected an error (not exactly two relays active in some digit), then it would stop the current job and start over.

McIlroy's description of the number system as being 5 bits differs from any of the representations given by the Wikipedia entries for bi-quinary coded decimal and biquinary, which describe a system that is basically unary: 5 bits to count 0 through 4 (only 1 bit on at a time) and two top bits: 01 = 0, 10 = 5. Other historical summaries you can find online by searching for Stibitz and bi-quinary seem to back up Wikipedia. In this version, bi-quinary was 1-of-2 plus 1-of-5, not 2-of-5. In the 7-bit version, the bit positions are worth 5, 0, 4, 3, 2, 1, and 0, which isn't very interesting. It's far more interesting to think about the 2-out-of-5 system.

The valid codes in a 2-of-5 system like McIlroy described would be 00011, 00101, 00110, 01001, 01010, 01100, 10001, 10010, 10100, 11000. If you assign these the values 0 through 9, you can set up a system of linear equations and solve them to find the value of each of the five bit positions b4, b3, b2, b1, and b0. The fact that 00011 = 0 means that b1 + b0 = 0. The fact that 00110 - 00101 = 2 - 1 = 1 means that b1 - b0 = 1. These two equalities imply b1 = ½ and b0 = -½. Given that you can pick out the other bits fairly easily: from right to left the "places" are -½, ½, 1½, 3½, and 6½. This works for 0 through 8 but then, sadly, fails for 9: 11000 = 6½ + 3½ = 10. No other code assignment has a solution either (I wrote a small program to try them all.)

I guess that's why the 1-of-2 plus 1-of-5 form was more popular than 2-of-5: it was probably easier to design the arithmetic circuits if each bit could be assigned a particular value. It's too bad, because 2-of-5 would have mathematical elegance on its side if it could be made to work.

If you dig around, you can find other esoteric number representations: negabinary has base -2 (negative two), so the positional values are 1, -2, 4, -8, and so on. Thus 5 is 101 but 3 is 111 = 1-2+4. Negabinary appears to have been first invented by Vittorio Grunwald in 1885. It was used in a few experimental computers in the 1950s, but it didn't catch on. Even weirder than negabinary numbers are the quater-imaginary numbers, invented by a high school student named Donald Knuth in 1955. Those use base 2i (where i is the imaginary square root of -1). (Knuth is also responsible for the negabinary history.)

Odder still are balanced ternary (ternary but digits are -1, 0, and 1) and Bubble Babble, but no discussion of interesting counting systems or high school math nerds would be complete without mentioning counting on your fingers in binary.

Leaping Years and Drawing Lines

Some of the earliest mathematics was invented to design calendars. Since the Earth rotates around its axis about 365.25 times per revolution around the sun, some years in a solar calendar need to be 365 days long and others 366 days long in order to keep the calendar in sync with the sun. The key challenge in designing a calendar is to find a pattern of short and long years that achieve the desired approximation. The Julian calendar uses the simple pattern of one leap year every four years. The Islamic calendar, which syncs against the moon, needs to have years approximating 354.37 days per year. To do this, it inserts 11 leap years over a 30 year cycle.

Bresenham's algorithm chooses which pixels to use when drawing a bitmap representation of a line. The basic idea, for a mostly horizontal line that angles upward, is to move left to right, one pixel at a time, moving upward when the distance between the actual y and the pixel-approximate y has grown too large.

drawline(dx, dy)    /* mostly horizontal line, upward line */
    x = 0
    y = 0
    error = 0
    while (x < dx || y < dy)
        drawpoint(x, y);
        x++;
        if (x*(dy/dx) - y > 0.5)
            y++;
    drawpoint(x, y);

For example, drawing a line 30 pixels long and 12 pixels high produces

In their 2004 paper “Line Drawing, Leap Years, and Euclid” (PDF), Mitchell Harris and Edward Reingold show that the problem of choosing a leap year pattern is a generalized version of Bresenham's line drawing problem. This is surprising, but makes intuitive sense when you realize that both are concerned with computing an integer approximation to a rational fraction as evenly as possible. Leap year computations are basically drawing a line with slope 1/365.25 (or 1/364.37): each pixel is a day, and its y coordinate determines which year it belongs to.

Harris and Reingold go on to show that Euclid's algorithm can compute leap year patterns! Euclid's algorithm usually computes the greatest common divisor of two numbers:

gcd(u, v)
    while (u != v)
        if (u > v)
            u = u mod v;
        else
            v = v mod u;
    return u;

The connection is that if you're trying to draw a line dx by dy but dx and dy have some common divisor d, then the Bresenham algorithm will simply emit the pattern for a line dx/d by dy/d, d times. The example shown above can be viewed as two 15x6 lines, three 10x4 lines, or six 5x2 lines:

Euclid's algorithm gives the largest possible d, and you can adapt the algorithm to build the associated repeating pattern while it computes d.

In Volume 2, Donald Knuth calls it “the granddaddy of all algorithms, because it is the oldest nontrivial algorithm that has survived to the present day.” It also seems to have some famous children.


Further reading:

Bresenham described his algorithm in his 1965 paper “Algorithm for computer control of a digital plotter

Godfried Toussaint's 2005 paper “The Euclidean algorithm generates traditional musical rhythms” shows yet another pattern connection, this time to rhythms.

The Gregorian calendar is an evolved version of the Julian calendar and can't be viewed as a Bresenham line. However, Jeffrey Shallit's 1994 paper “Pierce expansions and rules for the determination of leap years” shows how to compute leap years using Pierce expansions, and this model can generate the Gregorian calendar too.

Edward Reingold and Nashum Dershowitz are the experts on calendar algorithms. Their pair of papers 1990 Calendrical Calculations and 1993 Calendrical Calculations, II: Three Historical Calendars gives algorithms (and Lisp code) for essentially any date-related computation you might want to do. They expanded upon these papers in their books Calendrical Calculations: The Millennium Edition and Calendrical Tabulations: 1900-2200.

Elegance and Power

These are the most elegant programs I've ever seen.

A power series is an infinite-degree polynomial. For example,


e^x = 1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + \frac{x^4}{4!} + \cdots

Computation and manipulation of power series has a long and distinguished history as a test of so-called stream processing systems, which manipulate (arbitrarily long finite prefixes of) infinite streams. In the 1970s, Gilles Kahn and David MacQueen used power series as an unpublished test of their coroutine-based stream-processing system. Hal Abelson and Gerald Sussman devote Section 3.5 of their well-known 1985 textbook Structure and Interpretation of Computer Programs to stream processing, although only a single exercise mentions power series. In the late 1980s, Doug McIlroy explored power series processing in the context of Rob Pike's concurrent programming language Newsqueak. He published the work in his 1990 paper “Squinting at Power Series” (the Newsqueak interpreter is named squint).

Stream processing requires lazy evaluation to keep computations finite. All of these implementations built frameworks for lazy evaluation atop other building blocks: concurrent processes (Kahn and MacQueen, McIlroy) or first-class functions (Abelson and Sussman). It's only natural, then, to consider what the implementations would look like in a language with explicit support and encouragement for lazy evaluation, a language like Haskell. McIlroy revisited the topic in the context of Haskell in his 1998 Functional Pearl, “Power Series, Power Serious” (gzipped PS).

Symbolic manipulations can represent a power series as just a list of integer coefficients. Mathematically, this is equivalent to writing the polynormial in Horner form, which repeatedly factors x out of the remainder of the polynomial: F = f0 + xF1, or in Haskell, (f:fs) (= f + x fs).



In this form, ex is [1, 1, 1%2, 1%6, 1%24, 1%120, ...].

As a simple example, McIlroy's addition and multiplication implementations mimic the usual rules:


(f0 + xF1) + (g0 + xG1) = (f0 + g0) + x(F1 + G1)


  (f:fs) + (g:gs) = f+g : fs+gs

(f0 + xF1) (g0 + xG1) = f0g0 + x(f0G1 + g0F1) + x2(F1G1)


{- (f:fs) * (g:gs) = (f*g : 0) + (0 : f.*gs + g.*fs) + (0 : 0 : fs*gs) -}
   (f:fs) * (g:gs) = f*g : (f.*gs + g.*fs + (0 : fs*gs))
       c .* (f:fs) = c.*f : c.*fs

The commented-out definition is a more literal, but less efficient, translation of the equation.

Haskell's pattern matching and operator overloading are responsible for the elegance of the presentation, but it is lazy evaluation that is responsible for the elegance of the computation. The underlying lazy computation mechanism (whether provided directly, as in Haskell, or via other primitives, as in Scheme and Newsqueak) takes care of all the bookkeeping necessary to compute only the terms needed.

McIlroy implements derivative and integral operators as well. Both rely on a helper function to keep track of the leading multiplicative constant:


deriv (f:fs) = (deriv1 fs 1) where
    deriv1 (g:gs) n = n*g : (deriv1 gs (n+1))

integral fs = 0 : (int1 fs 1) where
    int1 (g:gs) n = g/n : (int1 gs (n+1))

These definitions enable elegant definitions of the power series for exp, sin, and cos:


expx = 1 + (integral expx)

sinx = integral cosx
cosx = 1 - (integral sinx)

To me, these three equations programs are beautifully elegant, almost magical.

When learning recursion in an introductory programming course (or induction in an introductory math course), it is common to feel like there's no actual work going on: one case just got rewritten in terms of others. The base cases, of course, provide the foundation on which the others are built. Learning to determine which base cases are necessary given the recursive steps is the key to being comfortable with recursion. Only then can you see that the program or proof is structurally sound.

The recursion in the definitions above gives me the same introductary queasiness, because it is hard to see the base cases. Upon closer inspection, the base case is in the expansion of integral, which begins with a constant zero term, making it possible to determine the first term in each of those series without further recursion. Having the first term makes it possible to find the second term, and so on.

For me, understanding why they work only enhances the beauty of these programs.

McIlroy continues the theme in his 2000 ICFP paper “The Music of Streams” (gzipped PS), which contains even more examples but fewer programs.

I've extracted runnable code from the paper. McIlroy maintains equivalent code. McIlroy's version uses these shorter versions of integral and deriv, which will delight Haskell aficionados:


int fs = 0 : zipWith (/) fs [1..]
diff (_:ft) = zipWith (*) ft [1..]

For the numerical computing aficionados, there is a one-line implementation of power series reversion:


revert (0:ft) = rs where rs = 0 : 1/(ft#rs)

(For comparison, Donald Knuth devotes Section 4.7 of Volume 2, about 10 pages, to the manipulation of power series. The equivalent reversion implementation takes about half a page.)

Alpha Compositing

The notion of an alpha channel carrying the opacities of a digital image is so ordinary now in computer graphics that it may surprise you to discover that it had to be invented. The idea, however, is much deeper than even its inventors understood: Not only does the alpha channel solve the original problem of image compositing but it frees us from the tyranny of the rectangular image and paves the way for the digital convergence of the two main branches of computer picturing, the sampling theory and geometry branches.

Alvy Ray Smith and Ed Catmull invented the alpha channel in 1977. The excerpt above is from Smith's account.

Tom Porter and Tom Duff explored the algebra of the alpha channel in their 1984 SIGGRAPH paper “Compositing Digital Images,” in which they define a full suite of 16 compositing operators analogous to the 16 boolean bitblt operators. They also showed how to perform alpha computations efficiently by storing the color channels in “premultiplied alpha” form.

For the invention and development of the alpha channel, Smith, Catmull, Porter, and Duff won a technical Academy Award in 1996.

Jim Blinn gives a nice overview of compositing in his pair of 1994 columns (Part 1: Theory, Part 2: Practice; subscription required). Alvy Ray Smith also wrote two technical memos that clarify the subject: “Image Compositing Fundamentals” and “Alpha and the History of Digital Compositing.”

I Could Tell You, But ...

For two centuries, Benjamin Franklin had the final word on sharing secrets: “Three may keep a Secret, if two of them are dead.

Shamir's 1979 paper “How to share a secret” is a significant advance over Franklin's algorithm. It shows how to split a secret into n pieces such that any k pieces can be used to reconstruct the secret but k–1 cannot.

Best of all, the idea is dead simple. Assume the secret s is a number smaller than some prime p. Then pick random integer coefficients to create a degree-k–1 polynomial f(x) computed mod p. Set the x0 coefficient to the secret message s, so that f(0) = s. Then hand out the pairs (1, f(1)), (2, f(2)), ..., (n, f(n)) as the secret shares. Since any polynomial of degree k–1 is uniquely determined by k points, any k shares can be used to reconstruct the original polynomial and then compute f(0). But each set of k–1 shares is consistent with p different possible polynomials, each with a different f(0) value, and thus leak no information at all.

Shamir's idea is not quite as simple as Franklin's, but I'm confident Franklin would have understood it.

Martin Tompa and Heather Woll explain how to detect participants that lie about their shares in “How to Share a Secret with Cheaters.”

Superoptimizer

In 1987, Henry Massalin invented the “superoptimizer,” a program that finds the shortest possible instruction sequence for a given function. The superoptimizer works by sheer dumb luck: it checks whether every possible instruction sequence up to a given length produces the desired answer. The sequences that pass this test are not guaranteed to be an exact match for the function, but they usually are and can be checked analytically by a human.

As an example of what a superoptimizer can find, here is an Intel x86 instruction sequence that starts with a signed 16-bit number in ax and ends with the sign of that number (-1, 0, or 1) in dx.

cwd
neg ax
adc dx, dx

Massalin gives this example in his 1987 paper, “Superoptimizer: a look at the smallest program” (subscription required, sadly).

Superoptimizers are useful for compiler writers, but perhaps more as a source of amusement than as a time-saving device. One important way that they can be useful is to find sequences that avoid conditional branches, which get more and more expensive as processor pipelines deepen. Torbjörn Granlund and Richard Kenner used a superoptimizer to eliminate branches in gcc's output; see their paper “Eliminating Branches using a Superoptimizer and the GNU C Compiler.”

The Wikipedia entry for Superoptimization gives links to implementations.

Division via Multiplication

Division is the most expensive integer operation you can ask of your CPU. A well-known optimization rewrites division by constant powers of two as bit shifts: if x is an unsigned integer, then x>>8 is a fast way to compute x/256.

Division by constants that aren't powers of two are a little harder. Jim Blinn dedicated his November 1995 column (“Three Wrongs Make a Right”; subscription required) in IEEE Computer Graphics and Applications to tricks for computing x/255 for 16-bit x (a common computation during alpha compositing). Since 1/255 is approximately the same as 257/65536, you can get pretty close to x/255 by computing x*257/65536 (aka (x*257)>>16)). 257/65536 is a tiny bit bigger smaller than 1/255, so the approximation is sometimes too small by 1. Adding 257/65536 to nudge the values slightly upward results in a perfect approximation: for unsigned 16-bit x, x/255 = (x*257+257)>>16. No division!

In 1991, Robert Alverson described how to determine these approximations for the general case of arbitrary integer constants (“Integer Division Using Reciprocals”). Alverson was on the team building the 64-bit Tera Computer; they used the technique as the hardware implementation of integer division.

The Tera Computer didn't exactly catch on, but a few years later, Torbjörn Granlund and Peter Montgomery, inspired by Alverson's paper, implemented the technique in the GNU C compiler (gcc) for division by integer constants (“Division by Invariant Integers Using Multiplication”). Gcc still uses this technique, as do other compilers. Gcc rewrites a 32-bit unsigned x/255 into (x*0x80808081)>>39 (the intermediate values are 64-bit integers).

Henry S. Warren's delightfully bit-twiddly book Hacker's Delight devotes 46 pages to this topic. His web site includes a magic number computer.