Showing posts with label notation. Show all posts
Showing posts with label notation. Show all posts

[ next | prev | up ]

Bi-quinary and other bases

Doug McIlroy's talk on History of Computing at Bell Labs mentioned that George Stibitz's relay-based decimal calculators that stored digits in what he called bi-quinary: “Arithmetic was done in bi-quinary, a two out of five representation for decimal integers, and if there weren't exactly two out of five relays activated it would stop.” This allowed a sequence of computing jobs to run unattended over the weekend: if the computer detected an error (not exactly two relays active in some digit), then it would stop the current job and start over.

McIlroy's description of the number system as being 5 bits differs from any of the representations given by the Wikipedia entries for bi-quinary coded decimal and biquinary, which describe a system that is basically unary: 5 bits to count 0 through 4 (only 1 bit on at a time) and two top bits: 01 = 0, 10 = 5. Other historical summaries you can find online by searching for Stibitz and bi-quinary seem to back up Wikipedia. In this version, bi-quinary was 1-of-2 plus 1-of-5, not 2-of-5. In the 7-bit version, the bit positions are worth 5, 0, 4, 3, 2, 1, and 0, which isn't very interesting. It's far more interesting to think about the 2-out-of-5 system.

The valid codes in a 2-of-5 system like McIlroy described would be 00011, 00101, 00110, 01001, 01010, 01100, 10001, 10010, 10100, 11000. If you assign these the values 0 through 9, you can set up a system of linear equations and solve them to find the value of each of the five bit positions b4, b3, b2, b1, and b0. The fact that 00011 = 0 means that b1 + b0 = 0. The fact that 00110 - 00101 = 2 - 1 = 1 means that b1 - b0 = 1. These two equalities imply b1 = ½ and b0 = -½. Given that you can pick out the other bits fairly easily: from right to left the "places" are -½, ½, 1½, 3½, and 6½. This works for 0 through 8 but then, sadly, fails for 9: 11000 = 6½ + 3½ = 10. No other code assignment has a solution either (I wrote a small program to try them all.)

I guess that's why the 1-of-2 plus 1-of-5 form was more popular than 2-of-5: it was probably easier to design the arithmetic circuits if each bit could be assigned a particular value. It's too bad, because 2-of-5 would have mathematical elegance on its side if it could be made to work.

If you dig around, you can find other esoteric number representations: negabinary has base -2 (negative two), so the positional values are 1, -2, 4, -8, and so on. Thus 5 is 101 but 3 is 111 = 1-2+4. Negabinary appears to have been first invented by Vittorio Grunwald in 1885. It was used in a few experimental computers in the 1950s, but it didn't catch on. Even weirder than negabinary numbers are the quater-imaginary numbers, invented by a high school student named Donald Knuth in 1955. Those use base 2i (where i is the imaginary square root of -1). (Knuth is also responsible for the negabinary history.)

Odder still are balanced ternary (ternary but digits are -1, 0, and 1) and Bubble Babble, but no discussion of interesting counting systems or high school math nerds would be complete without mentioning counting on your fingers in binary.

Elegance and Power

These are the most elegant programs I've ever seen.

A power series is an infinite-degree polynomial. For example,


e^x = 1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + \frac{x^4}{4!} + \cdots

Computation and manipulation of power series has a long and distinguished history as a test of so-called stream processing systems, which manipulate (arbitrarily long finite prefixes of) infinite streams. In the 1970s, Gilles Kahn and David MacQueen used power series as an unpublished test of their coroutine-based stream-processing system. Hal Abelson and Gerald Sussman devote Section 3.5 of their well-known 1985 textbook Structure and Interpretation of Computer Programs to stream processing, although only a single exercise mentions power series. In the late 1980s, Doug McIlroy explored power series processing in the context of Rob Pike's concurrent programming language Newsqueak. He published the work in his 1990 paper “Squinting at Power Series” (the Newsqueak interpreter is named squint).

Stream processing requires lazy evaluation to keep computations finite. All of these implementations built frameworks for lazy evaluation atop other building blocks: concurrent processes (Kahn and MacQueen, McIlroy) or first-class functions (Abelson and Sussman). It's only natural, then, to consider what the implementations would look like in a language with explicit support and encouragement for lazy evaluation, a language like Haskell. McIlroy revisited the topic in the context of Haskell in his 1998 Functional Pearl, “Power Series, Power Serious” (gzipped PS).

Symbolic manipulations can represent a power series as just a list of integer coefficients. Mathematically, this is equivalent to writing the polynormial in Horner form, which repeatedly factors x out of the remainder of the polynomial: F = f0 + xF1, or in Haskell, (f:fs) (= f + x fs).



In this form, ex is [1, 1, 1%2, 1%6, 1%24, 1%120, ...].

As a simple example, McIlroy's addition and multiplication implementations mimic the usual rules:


(f0 + xF1) + (g0 + xG1) = (f0 + g0) + x(F1 + G1)


  (f:fs) + (g:gs) = f+g : fs+gs

(f0 + xF1) (g0 + xG1) = f0g0 + x(f0G1 + g0F1) + x2(F1G1)


{- (f:fs) * (g:gs) = (f*g : 0) + (0 : f.*gs + g.*fs) + (0 : 0 : fs*gs) -}
   (f:fs) * (g:gs) = f*g : (f.*gs + g.*fs + (0 : fs*gs))
       c .* (f:fs) = c.*f : c.*fs

The commented-out definition is a more literal, but less efficient, translation of the equation.

Haskell's pattern matching and operator overloading are responsible for the elegance of the presentation, but it is lazy evaluation that is responsible for the elegance of the computation. The underlying lazy computation mechanism (whether provided directly, as in Haskell, or via other primitives, as in Scheme and Newsqueak) takes care of all the bookkeeping necessary to compute only the terms needed.

McIlroy implements derivative and integral operators as well. Both rely on a helper function to keep track of the leading multiplicative constant:


deriv (f:fs) = (deriv1 fs 1) where
    deriv1 (g:gs) n = n*g : (deriv1 gs (n+1))

integral fs = 0 : (int1 fs 1) where
    int1 (g:gs) n = g/n : (int1 gs (n+1))

These definitions enable elegant definitions of the power series for exp, sin, and cos:


expx = 1 + (integral expx)

sinx = integral cosx
cosx = 1 - (integral sinx)

To me, these three equations programs are beautifully elegant, almost magical.

When learning recursion in an introductory programming course (or induction in an introductory math course), it is common to feel like there's no actual work going on: one case just got rewritten in terms of others. The base cases, of course, provide the foundation on which the others are built. Learning to determine which base cases are necessary given the recursive steps is the key to being comfortable with recursion. Only then can you see that the program or proof is structurally sound.

The recursion in the definitions above gives me the same introductary queasiness, because it is hard to see the base cases. Upon closer inspection, the base case is in the expansion of integral, which begins with a constant zero term, making it possible to determine the first term in each of those series without further recursion. Having the first term makes it possible to find the second term, and so on.

For me, understanding why they work only enhances the beauty of these programs.

McIlroy continues the theme in his 2000 ICFP paper “The Music of Streams” (gzipped PS), which contains even more examples but fewer programs.

I've extracted runnable code from the paper. McIlroy maintains equivalent code. McIlroy's version uses these shorter versions of integral and deriv, which will delight Haskell aficionados:


int fs = 0 : zipWith (/) fs [1..]
diff (_:ft) = zipWith (*) ft [1..]

For the numerical computing aficionados, there is a one-line implementation of power series reversion:


revert (0:ft) = rs where rs = 0 : 1/(ft#rs)

(For comparison, Donald Knuth devotes Section 4.7 of Volume 2, about 10 pages, to the manipulation of power series. The equivalent reversion implementation takes about half a page.)

The Hideous Name

In 1985, email addresses looked a lot different than they do today. They were full of now-foreign symbols like : ! and %, because each mailer had its own syntax and there was no translation between syntaxes at borders, resulting in addresses like

research!ucbvax!@cmu-cs-pt.arpa:@CMU-ITC-LINUS:dave%CMU-ITC-LINUS@CMU-CS-PT.

Rob Pike and Peter Weinberger used that address as the opening example in their paper “The Hideous Name,” which examines good and bad examples of naming in both file systems and mail addresses. Name spaces, they argue, are most powerful when new semantics can be added without adding new syntax.

The mail examples in “The Hideous Name” are not particularly relevant anymore. Everyone has standardized on the two-level ARPANET @ syntax, and the other syntaxes are used only by people looking for open mail relays. But the file system examples are more relevant today than they were 20 years ago.

Unix file names are the canonical example: a program that parses file names like /etc/passwd will have no problem parsing names of ‘special’ files like /dev/stdin. The latter is special only in semantics, not in syntax: if you run cp /dev/stdin /tmp, cp will have no trouble deciding the name of the target file (/tmp/stdin). Contrast this with the convention of using to mean standard input: what target should cp – /tmp use? In fact (on Linux, at least) it looks for a file named rather than using standard input, illustrating another problem. Special syntax is often implemented by individual programs rather than the operating system, resulting in non-uniform handling of the names. It's hard to justify why (again, on Linux) cat – >/tmp/a means something different from cp – /tmp/a.

Providing uniform interpretation of names means having some central arbiter, tradtionally the kernel. Then all programs use the same name space, and one can apply old tools to new resources. For example, if other machines' file systems were mounted on /n/system/,

sed 's!:.*!!' /n/*/etc/passwd | sort -u 

produces a list of all the user of all the machines. And if backups are mounted on /dump,

ls -l /dump/am/2007/02*/etc/passwd

shows all the password files from last February. Special semantics, but no special syntax.

Better than changing the kernel for each file system is having a general mechanism by which the kernel can defer interpretation of particular file trees to other programs. Plan 9 was the first system to make extensive use of such user-level file servers. FUSE now makes it possible to do the same on modern Unixes. For example, Amit Singh has built some interesting new file systems for OS X using MacFUSE.

Pike and Weinberger end their paper with an observation from Doug McIlroy:

bad notations can stifle progress. Roman numerals hobbled mathematics for a millennium but were propagated by custom and by natural deference to authority. Today we no longer meekly accept individual authority. Instead, we have “standards,” impersonal imprimaturs on convention. Some standards are sound and indispensable; some simply celebrate bureaucratic littleness of mind. A harvest of gimmicks to save appearances within the standard has grown up, then gimmicks to save the appearances within the appearances. You know how each one got there: an overnight hack to paste another tumor onto a wild cancerous growth. The concern was with method, regardless of results. The result is extravagantly worse than Roman numerals: you can't read the notation right to left or left to right. As an amalgam of languages, it can't be deciphered by a native speaker of any one of them, much as if we were to switch at random places in a number between Roman and Arabic signs and between big-endian and little-endian order. But now that it all “works” — at least for the strong of stomach — the tumors themselves are being standardized.

What bad notations are stifling your progress?