[ next | up ]

On Duff's Device and Coroutines

At first glance, Duff's Device is one of the most mysterious pieces of C code you'll ever see:

    void
    send(short *to, short *from, int count)
    {
        int n=(count+7)/8;
        switch(count%8){
        case 0: do{ *to = *from++;
        case 7:     *to = *from++;
        case 6:     *to = *from++;
        case 5:     *to = *from++;
        case 4:     *to = *from++;
        case 3:     *to = *from++;
        case 2:     *to = *from++;
        case 1:     *to = *from++;
                }while(--n>0);
        }
    }

It's an 8x-unrolled while loop interlaced with a switch statement. The switch jumps into the middle of the while loop to handle the part of the count that isn't a multiple of 8. Once inside the while loop, the switch logic never executes again.

Tom Duff first described Duff's Device in a November 1983 email to Ron Gomes, Dennis Ritchie, and Rob Pike. It spread to a larger group when he revised the note and posted it to net.lang.c in May 1984. The 1984 message is the less frequently reproduced but is the one in which Duff gave it a name: “I think I'll name it after myself — ‘Duff's Device’ has a nice ring to it.” Bjarne Stroustroup used a variant as an example in The C++ Programming Language. Eventually Duff posted the 1983 message along with commentary to Usenet in 1988, in response to a heated debate about the merits of Duff's Device as an idiom.

Duff's Device is rarely worth reusing, but in the specific case that it was written for, it was an apt tool for the job. The more amazing thing about it is that it's valid C. Since a switch statement is just a computed goto and the case labels are just labels, switching into the middle of a while is just as legal as goto'ing there. In Duff's description, he wrote:

It amazes me that after 10 years of writing C there are still little corners that I haven't explored fully. (Actually, I have another revolting way to use switches to implement interrupt driven state machines but it's too horrid to go into.)

In 2000, Simon Tatham wrote about a C coroutine implementation based on this idea, to make callback-based event-driven programming easier. (He used it in his Windows SSH client, PuTTY.) A switch-based implementation has the advantage of being portable and relatively simple to emit using preprocessor macros, though it doesn't allow the use of other switch statements in the code.

A year ago, I asked Duff if that kind of coroutine implementation is what he had in mind, and he confirmed that it was, pointing at a blog comment and adding:

I only ever did this twice: once for a coroutinish application in an SDLC driver (for UNIX to IBM OS/360 communication) in the 1970s and once for loop unwinding in 1983. Both were desperate situations. I have never felt the need to stoop so low since then.

(Today, of course, there are few excuses not to use threads, or at the very least, coroutines with real stacks. If you are stuck with callbacks and events, something like Max Krohn's tame preprocessor is a cleaner solution than macros.)

Alpha Compositing

The notion of an alpha channel carrying the opacities of a digital image is so ordinary now in computer graphics that it may surprise you to discover that it had to be invented. The idea, however, is much deeper than even its inventors understood: Not only does the alpha channel solve the original problem of image compositing but it frees us from the tyranny of the rectangular image and paves the way for the digital convergence of the two main branches of computer picturing, the sampling theory and geometry branches.

Alvy Ray Smith and Ed Catmull invented the alpha channel in 1977. The excerpt above is from Smith's account.

Tom Porter and Tom Duff explored the algebra of the alpha channel in their 1984 SIGGRAPH paper “Compositing Digital Images,” in which they define a full suite of 16 compositing operators analogous to the 16 boolean bitblt operators. They also showed how to perform alpha computations efficiently by storing the color channels in “premultiplied alpha” form.

For the invention and development of the alpha channel, Smith, Catmull, Porter, and Duff won a technical Academy Award in 1996.

Jim Blinn gives a nice overview of compositing in his pair of 1994 columns (Part 1: Theory, Part 2: Practice; subscription required). Alvy Ray Smith also wrote two technical memos that clarify the subject: “Image Compositing Fundamentals” and “Alpha and the History of Digital Compositing.”

Play Tic-Tac-Toe with Knuth

Section 7.1.2 of the Volume 4 pre-fascicle 0A of Donald Knuth's The Art of Computer Programming is titled “Boolean Evaluation.” In it, Knuth considers the construction of a set of nine boolean functions telling the correct next move in an optimal game of tic-tac-toe. In a footnote, Knuth tells this story:

This setup is based on an exhibit from the early 1950s at the Museum of Science and Industry in Chicago, where the author was first introduced to the magic of switching circuits. The machine in Chicago, designed by researchers at Bell Telephone Laboratories, allowed me to go first; yet I soon discovered there was no way to defeat it. Therefore I decided to move as stupidly as possible, hoping that the designers had not anticipated such bizarre behavior. In fact I allowed the machine to reach a position where it had two winning moves; and it seized both of them! Moving twice is of course a flagrant violation of the rules, so I had won a moral victory even though the machine had announced that I had lost.

That story alone is fairly amusing. But turning the page, the reader finds a quotation from Charles Babbage's Passages from the Life of a Philosopher, published in 1864:

I commenced an examination of a game called “tit-tat-to” ... to ascertain what number of combinations were required for all the possible variety of moves and situations. I found this to be comparatively insignificant. ... A difficulty, however, arose of a novel kind. When the automaton had to move, it might occur that there were two different moves, each equally conducive to his winning the game. ... Unless, also, some provision were made, the machine would attempt two contradictory motions.

The only real winning move is not to play.

Debugging the Universe

Every programmer knows what debugging is. Given a program that isn't behaving as expected, you slowly refine your understanding of both the program and the anomalous behavior until you understand exactly why the two aren't in agreement. Sometimes the bug is in the program, other times in your understanding of the program. Then you fix it.

Physicists debug the universe. When the universe doesn't behave as expected, they debug it, trying to reconcile their understanding of the universe and what they are seeing. The difference is that the bug, by definition, is always in their understanding and never in the universe.

During the development of the Global Positioning System (GPS), the physicists and programmers had to debug general relativity. To tell the story, you need to know a tiny bit about how GPS works.

Each GPS satellite broadcasts a known pseudo-random number sequence. A simple GPS receiver has its own pseudo-random sequence generator that is synced with the satellites. By timing how far “behind” the satellite sequences appear to be compared to the receiver's time, the receiver can determine how far away they are. Using the known positions of and distances to three satellites, a GPS receiver can triangulate its position in three-dimensional space. If the GPS receiver's clock is not exactly in sync with the satellites, it can use readings from four satellites to triangulate its position in four-dimensional space-time, synchronizing its clock in the process.

All this assumes that the clocks in the satellites are running at the same speed as the clocks on the Earth, but the satellites are literally running circles around the Earth; at those speeds, relativity kicks in and unexpected behaviors emerge. This was actually something the GPS engineers had to consider. Peter Galison tells the story better than I can:

According to relativity, satellites that were orbiting the earth at 12,500 miles per hour ran their clocks slow (relative to the earth) by 7 millionths of a second per day. Even general relativity (Einstein's theory of gravity) had to be programmed into the system. Eleven thousand miles in space, where the satellites orbited, general relativity predicted that the weaker gravitational field would leave the satellite clocks running fast (relative to the earth's surface) by 45 millionths of a second per day. Together, these two corrections add up to a staggering correction of 38 millionths (that is, 38,000 billionths) of a second per day in a GPS system that had to be accurate to within 50 billionths of a second each day. Before the first cesium atomic clock launch in June 1977, some GPS engineers were sufficiently dubious about these enormous relativistic effects to insist that the satellite's atomic clock broadcast its time “raw.” Its relativity-correction mechanism idled onboard. Down came the signal, running fast over the first twenty-four hours almost precisely by the predicted 38,000 billionths of a second. After twenty days of such gains, ground control commanded the frequency synthesizer to activate, correcting the broadcast time signal. Without that relativistic correction, it would have taken less than two minutes for the GPS system to exceed its allowable [daily] error.


(From Peter Galison, Einstein's Clocks, Poincaré's Maps pp. 288-289.)

I heartily recommend Galison's book, a history of the development of the physical concept of time throughout the twentieth century. Galison has doctorates in both physics and the history of science; using his dual expertise he makes the material accessible to dual laymen.

For a more technical account, the book's endnotes cite Neil Ashby, “General Relativity in the Global Positioning System

I Could Tell You, But ...

For two centuries, Benjamin Franklin had the final word on sharing secrets: “Three may keep a Secret, if two of them are dead.

Shamir's 1979 paper “How to share a secret” is a significant advance over Franklin's algorithm. It shows how to split a secret into n pieces such that any k pieces can be used to reconstruct the secret but k–1 cannot.

Best of all, the idea is dead simple. Assume the secret s is a number smaller than some prime p. Then pick random integer coefficients to create a degree-k–1 polynomial f(x) computed mod p. Set the x0 coefficient to the secret message s, so that f(0) = s. Then hand out the pairs (1, f(1)), (2, f(2)), ..., (n, f(n)) as the secret shares. Since any polynomial of degree k–1 is uniquely determined by k points, any k shares can be used to reconstruct the original polynomial and then compute f(0). But each set of k–1 shares is consistent with p different possible polynomials, each with a different f(0) value, and thus leak no information at all.

Shamir's idea is not quite as simple as Franklin's, but I'm confident Franklin would have understood it.

Martin Tompa and Heather Woll explain how to detect participants that lie about their shares in “How to Share a Secret with Cheaters.”

Play Chess with God

Starting in the late 1970s, Ken Thompson proved by example that computers could completely analyze chess endgames, which involve only a small number of pieces. Thompson's key insight was that when there are only a few pieces on the board, there might be very many possible move sequences, but there are relatively few distinct board positions. An exhaustive search over board positions can run efficiently even though an exhaustive search over move sequences would not. (The same idea underlies Thompson's efficient parallel NFA simulation for regular expression matching.)

Thompson's first endgame analyses ran on boards with only four pieces, but over time he took advantage of improvements in processor speeds and storage sizes to analyze five- and six-piece endgames as well. Thompson described the programs in the ICCA Journal: his 1986 article “Retrograde Analysis of Certain Endgames” (PDF; 1MB) introduces the technique and covers five-piece endgames, and his 1996 article “6-Piece Endgames” (PDF; 1MB) describes refinements and optimizations for six pieces.

Writing about a 262-move optimal ‘rook and knight versus two knights’ game Thompson found in the 6-piece endgames, former championship chess player Tim Krabbé explained:

Playing over these moves is an eerie experience. They are not human; a grandmaster does not understand them any better than someone who has learned chess yesterday. The knights jump, the kings orbit, the sun goes down, and every move is the truth. It's like being revealed the Meaning of Life, but it's in Estonian.
On his web page, Thompson links to the online 6-piece endgame database as “Play Chess with God.” The CD art for the five-piece databases used a similar theme.


Further reading: With Joe Condon, Thompson built the chess playing computer Belle, which won the ACM North American Computer Chess Championship five times. Belle was later confiscated by U. S. Customs on its way to a contest in Moscow, on suspicion of being of military use. Thompson was quoted as saying that Belle's only military use would be “to drop it out of an airplane. You might kill somebody that way.”

Thompson's work in computer chess was featured in a Computer History Museum exhibit, which included a video of Thompson telling his story. (Beware that in the transcript, ‘endgames’ is usually written as ‘n games.’)

Superoptimizer

In 1987, Henry Massalin invented the “superoptimizer,” a program that finds the shortest possible instruction sequence for a given function. The superoptimizer works by sheer dumb luck: it checks whether every possible instruction sequence up to a given length produces the desired answer. The sequences that pass this test are not guaranteed to be an exact match for the function, but they usually are and can be checked analytically by a human.

As an example of what a superoptimizer can find, here is an Intel x86 instruction sequence that starts with a signed 16-bit number in ax and ends with the sign of that number (-1, 0, or 1) in dx.

cwd
neg ax
adc dx, dx

Massalin gives this example in his 1987 paper, “Superoptimizer: a look at the smallest program” (subscription required, sadly).

Superoptimizers are useful for compiler writers, but perhaps more as a source of amusement than as a time-saving device. One important way that they can be useful is to find sequences that avoid conditional branches, which get more and more expensive as processor pipelines deepen. Torbjörn Granlund and Richard Kenner used a superoptimizer to eliminate branches in gcc's output; see their paper “Eliminating Branches using a Superoptimizer and the GNU C Compiler.”

The Wikipedia entry for Superoptimization gives links to implementations.

Division via Multiplication

Division is the most expensive integer operation you can ask of your CPU. A well-known optimization rewrites division by constant powers of two as bit shifts: if x is an unsigned integer, then x>>8 is a fast way to compute x/256.

Division by constants that aren't powers of two are a little harder. Jim Blinn dedicated his November 1995 column (“Three Wrongs Make a Right”; subscription required) in IEEE Computer Graphics and Applications to tricks for computing x/255 for 16-bit x (a common computation during alpha compositing). Since 1/255 is approximately the same as 257/65536, you can get pretty close to x/255 by computing x*257/65536 (aka (x*257)>>16)). 257/65536 is a tiny bit bigger smaller than 1/255, so the approximation is sometimes too small by 1. Adding 257/65536 to nudge the values slightly upward results in a perfect approximation: for unsigned 16-bit x, x/255 = (x*257+257)>>16. No division!

In 1991, Robert Alverson described how to determine these approximations for the general case of arbitrary integer constants (“Integer Division Using Reciprocals”). Alverson was on the team building the 64-bit Tera Computer; they used the technique as the hardware implementation of integer division.

The Tera Computer didn't exactly catch on, but a few years later, Torbjörn Granlund and Peter Montgomery, inspired by Alverson's paper, implemented the technique in the GNU C compiler (gcc) for division by integer constants (“Division by Invariant Integers Using Multiplication”). Gcc still uses this technique, as do other compilers. Gcc rewrites a 32-bit unsigned x/255 into (x*0x80808081)>>39 (the intermediate values are 64-bit integers).

Henry S. Warren's delightfully bit-twiddly book Hacker's Delight devotes 46 pages to this topic. His web site includes a magic number computer.

Scooping the Loop Snooper

an elementary proof of the undecidability of the halting problem
by Geoffrey K. Pullum
Mathematics Magazine, October 2000

No program can say what another will do.
Now, I won't just assert that, I'll prove it to you:
I will prove that although you might work til you drop,
you can't predict whether a program will stop.

Imagine we have a procedure called P
that will snoop in the source code of programs to see
there aren't infinite loops that go round and around;
and P prints the word “Fine!” if no looping is found.

You feed in your code, and the input it needs,
and then P takes them both and it studies and reads
and computes whether things will all end as they should
(as opposed to going loopy the way that they could).

Well, the truth is that P cannot possibly be,
because if you wrote it and gave it to me,
I could use it to set up a logical bind
that would shatter your reason and scramble your mind.

Here's the trick I would use – and it's simple to do.
I'd define a procedure – we'll name the thing Q –
that would take a program we will call P (of course!)
to tell if it looped, by reading the source;

And if so, Q would simply print “Loop!” and then stop;
but if no, Q would go right back to the top,
and start off again, looping endlessly back,
til the universe dies and is frozen and black.

And this program called Q wouldn't stay on the shelf;
I would run it, and (fiendishly) feed it itself.
What behaviour results when I do this with Q?
When it reads its own source, just what will it do?

If P warns of loops, Q will print “Loop!” and quit;
yet P is supposed to speak truly of it.
So if Q's going to quit, then P should say, “Fine!” –
which will make Q go back to its very first line!

No matter what P would have done, Q will scoop it:
Q uses P's output to make P look stupid.
If P gets things right then it lies in its tooth;
and if it speaks falsely, it's telling the truth!

I've created a paradox, neat as can be –
and simply by using your putative P.
When you assumed P you stepped into a snare;
Your assumptions have led you right into my lair.

So, how to escape from this logical mess?
I don't have to tell you; I'm sure you can guess.
By reductio, there cannot possibly be
a procedure that acts like the mythical P.

You can never discover mechanical means
for predicting the acts of computing machines.
It's something that cannot be done. So we users
must find our own bugs; our computers are losers!

If you ever took a theory of computation course and had to write undecidability proofs, at least they didn't have to rhyme.

Crabs, the bitmap terror!

Today, window systems seem as inevitable as hierarchical file systems, a fundamental building block of computer systems. But it wasn't always that way. This paper could only have been written in the beginning, when everything about user interfaces was up for grabs.

A bitmap screen is a graphic universe where windows, cursors and icons live in harmony, cooperating with each other to achieve functionality and esthetics. A lot of effort goes into making this universe consistent, the basic law being that every window is a self contained, protected world. In particular, (1) a window shall not be affected by the internal activities of another window. (2) A window shall not be affected by activities of the window system not concerning it directly, i.e. (2.1) it shall not notice being obscured (partially or totally) by other windows or obscuring (partially or totally) other windows, (2.2) it shall not see the image of the cursor sliding on its surface (it can only ask for its position).

Of course it is difficult to resist the temptation to break these rules. Violations can be destructive or non-destructive, useful or pointless. Useful non-destructive violations include programs printing out an image of the screen, or magnifying part of the screen in a lens window. Useful destructive violations are represented by the pen program, which allows one to scribble on the screen. Pointless non-destructive violations include a magnet program, where a moving picture of a magnet attracts the cursor, so that one has to continuously pull away from it to keep working. The first pointless, destructive program we wrote was crabs.

As the crabs walk over the screen, they leave gray behind, “erasing” the apps underfoot:

For the rest of the story, see Luca Cardelli's “Crabs: the bitmap terror!” (6.7MB). Additional details in “Crabs (History and Screen Dumps)” (57.1MB).

Bitblt

In the 1980s, bitmap graphics used a powerful primitive called bitblt. Bitblt combined a rectangle of a source image with a similarly-sized rectangle in a destination image using a boolean function and replaced the destination rectangle with the boolean result.

Dan Ingalls invented bitblt for the Alto workstation developed at Xerox PARC. Rob Pike and Bart Locanthi used bitblt as the graphics primitive and the name for the Blit, the first graphical terminal for Unix.

At the 1984 SIGGRAPH conference, Ingalls, Leo Guibas (also at Xerox PARC), and Pike gave a course on the history, use, and implementation of bitblt. The course notes from 1984 (pdf, 68 pages, 5MB) have a wealth of information about bitblt, including area-filling, bitmap rotation, bitmap magnification, implementation via on-the-fly code generation, line-drawing, text manipulation, and more.

Literate Programming with Lguest

Norman Ramsey defines literate programming as “the art of preparing programs for human readers.”

Donald Knuth invented literate programming when Tony Hoare (inventor of quicksort) challenged him to publish TeX (the program). In his article introducing literate programming, Knuth explained:

I believe that the time is ripe for significantly better documentation of programs, and that we can best achieve this by considering programs to be works of literature. Hence, my title: “Literate Programming.”

Let us change our traditional attitude to the construction of programs: Instead of imagining that our main task is to instruct a computer what to do, let us concentrate rather on explaining to human beings what we want a computer to do.

The practitioner of literate programming can be regarded as an essayist, whose main concern is with exposition and excellence of style. Such an author, with thesaurus in hand, chooses the names of variables carefully and explains what each variable means. He or she strives for a program that is comprehensible because its concepts have been introduced in an order that is best for human understanding, using a mixture of formal and informal methods that reinforce each other.

You can't beat literate programs for completeness. In a good literate program, every line is explained in clear prose. Having to explain the program using English often forces the program itself to be more clearly written too.

Good, full-size literate programs are rare. Knuth's book TeX: The Program and Chris Fraser and David Hanson's book lcc, a Retargetable Compiler for ANSI C are two, but I'd love to hear about more.

Today's link is to the literate version of lguest, an up-and-coming x86 virtual machine monitor for Linux. To first approximation, lguest is like Xen or VMware but simpler.

Killing Quicksort

Quicksort is a worst-case O(n2) algorithm, but if you assume some randomness in either the input or in the decisions made by the algorithm itself, the worst case becomes exceedingly unlikely and the expected runtime becomes O(n log n). As a result, most people treat quicksort as an O(n log n) algorithm. The only wrinkle in this logic is that most quicksort implementations are entirely deterministic, so whether they run in O(n log n) time depends entirely on whether the input data is “random enough.” So there must exist inputs that force these quicksorts into quadratic behavior.

Doug McIlroy's 1999 paper “A Killer Adversary for Quicksort” describes a simple way to find these deadly inputs. The basic idea is that since the C library qsort calls a user-supplied comparison function as the only way it inspects the input, the comparison function can decide what the input is “on the fly,” making decisions that are consistent with previous comparisons but bad for qsort. See the paper for the elegant details.

The paper graphs the 64-element input that forces Digital's qsort to go quadratic:

McIlroy provides a simple C program illustrating the technique. The GNU C library quicksort goes quadratic on exactly the same kind of input as the Digital quicksort.


(A note of caution for those playing along at home: the GNU C library qsort function is actually mergesort if the C library thinks the computer has enough free memory; to force quicksort, call the undeclared _quicksort function and compile with gcc -static.)

Introduction

I am a computer programmer. I believe that there is an art to programming, and that programs and algorithms can be beautiful.

This blog is a collection of links about programming that I find interesting. By sharing them with you, I hope to attract others with similar tastes who will share new links with me. I hope this blog will be a kind of /nev/dull or Boing Boing for programmers, what programming.reddit.com could have been, if not for the voting system's inherent bias against links to material that requires thought and time to evaluate and appreciate.

For now, I intend to publish new blog posts on Mondays, Wednesdays, and Fridays. I have accumulated enough ideas for posts that I can keep the blog going for at least a few months at that rate to bootstrap it and build up a collection of material that will attract like-minded readers to help grow it.

The blog name, research!rsc (pronounced “research bang r s c”), was one of my earliest email addresses, a reminder of a time when the net was slower and signal-to-noise ratios were higher. Submission of new material to my current email address, rsc@swtch.com, is always welcome.