Showing posts with label brute force. Show all posts
Showing posts with label brute force. Show all posts

[ next | prev | up ]

Mel was Real

Things have been pretty crazy around here, so no new posts recently, but this caught my attention.

James Grimmelmann reports a few interesting facts about the Librascope/Royal McBee LGP-30. Specifically, Mel was real, and the hexadecimal digits were 0 1 2 3 4 5 6 7 8 9 F G J K Q W!

I can't resist telling a story. James is now a professor at the New York Law School, but he and I were fellow computer science students in college. He was a TA for the undergraduate introductory theory course when I took it, and his duties included, among other more enjoyable tasks, grading the first problem set. The “challenge” problem asked for a proof that in any group of six people, there is either a group of three that all know each other (a clique) or a group of three that all don't know each other (an anti-clique). I didn't see how to prove it*, so I just wrote a simple C program to iterate over all 215 possibilities and verify that the condition holds. I wrote something about using brute force and attached the C program to the problem set. This problem set was just a warm-up in proof techniques, but the course is all about what can and can't be computed by various systems. The pinnacle result, of course, is Turing's proof that some problems are uncomputable and Cook's definition of NP-completeness. In that light, my brute force proof ran against everything the course was trying to teach.

James was the grader for that question. Next to the comment about brute force, he wrote “Wait until the NP-completeness unit, Mr. Brute Force” At the end, he added, “Very, very clever. Such tricks will not avail you much where this course is headed.” I didn't appreciate it then, but in hindsight I find it very funny.

* Truth be told, I still don't see how to prove it. You're supposed to use the pigeonhole principle, and the details have been explained to me in the past, but I can never reconstruct them on demand. The exhaustive search still feels easier, though perhaps less enlightening.

Bigger Programs are Better, not Best

When computer scientists study algorithms, they are interested in how much time or working space an algorithm requires, as a function of the input size: quicksort uses O(n log n) time, but only O(n) space. For the most part, computer scientists don't study how much code a problem requires. In fact, part of what it means to be an algorithm is that it can be implemented as a fixed-size program that works regardless of the input size. Intuitively, though, bigger programs should be able to do more than smaller programs.

It is pretty easy to prove that bigger programs are better, or at least that they can do more. To start, we need to precisely define what we mean by the size a program. These kinds of discussions often use Turing machines, but these days not many people are comfortable with Turing machines, so I'm going to use Scheme. [؟]

Let's define a Scheme program's size to be the number of left parentheses it contains. We need to disallow arbitrarily large parenthesis-free expressions, so that there are only a finite number of Scheme programs of a given size. Let's limit functions to three arguments and integer literals to be numbers less than 100. These requirements might make Scheme a little more verbose, but they don't make it any less powerful. Notice that these programs are just expressions that don't take any input.

If we look at all the Scheme programs of a given size, some of those programs evaluate to integers: (+ 1 (+ 2 3)) has size 2 and evaluates to 6. Let's define B(n) to be the largest integer that can be produced by evaluating a Scheme program of size n. The example demonstrates that B(3) is at least 6.

The formal statement of the “bigger is better” assertion is that B(n+1) > B(n) for any n. For any given n, the definition of B requires that there be some Scheme program e of size n that evaluates to B(n). Then (+ 1 e) has size n+1 and evaluates to 1+B(n). This demonstrates that B(n+1) is at least 1+B(n), so B(n+1) > B(n).

There you have it. Bigger programs are better. But it turns out that no matter how big a program you write, it can't compute B(n). B(n) gets so big so fast that no fixed-size program can keep up. We can prove this too.

Consider any Scheme function f that takes a single integer argument. This is a lambda expression, not a standalone program like we've been considering, but it still has some size s. Define n = 2s+1. We can write a Scheme program n1, also of size s, that computes 2s+2 = n+1. The program looks like (+ 2 (+ 2 ... (+ 2 (+ 2 2)) ...)). Now consider the Scheme program (f n1), wihch has size s+s+1 = n. Since it has size n, it cannot compute a number bigger than B(n). The value of the argument to f is n+1, and B(n+1) > B(n), so f cannot be computing B. Essentially, B grows faster than any function you can implement in Scheme.*

There you have it. The function B cannot be computed. Bigger has bested the computer.

The original presentation of this result is Tibor Rado's 1962 paper “On Non-Computable Functions” (PDF, 320kB), which appeared in the Bell System Technical Journal. Rado used Turing machines, not Scheme programs, and called the function Σ(n), not B(n). He also defined a function S(n) which is the maximum length of any computation by a Turing machine of size n that eventually stops. Rado made a game of trying to make Turing machines of a given size that ran for as long as possible (but stopped), and he called it the “Busy Beaver game.”

The function Σ is sometimes called the Busy Beaver function, leading to a slew of papers by otherwise respectable computer scientists and mathematicians about busy beavers. In particular, a favorite pastime is to attempt to compute Σ(n) and S(n) by hand, for small values of n. This requires analyzing all possible Turing machines of the given size to figure out whether each eventually stops, and if so, how long it runs and what number it prints.

Using a computer to analyze the easy machines and then doing the hard ones by hand, Rado and his student Shen Lin proved that Σ(1) = 1, S(1) = 1, Σ(2) = 4, and S(2) = 6 in their 1965 paper “Computer Studies of Turing Machine Problems” (subscription required). Lin proved in his Ph.D. thesis that Σ(3) = 6 and S(3) = 21. In 1983, Allen Brady proved that Σ(4) = 13 and S(4) = 107.

Rona Machlin and Quentin Stout summarized the situation nicely in their 1990 paper “The Complex Behavior Of Simple Machines:”

Brady predicted that there will never be a proof of the values of Σ(5) and S(5). We are just slightly more optimistic, and are lead to recast a parable due to Erdös (who spoke in the context of determining Ramsey numbers): suppose a vastly superior alien force lands and announces that they will destroy the planet unless we provide a value of the S function, along with a proof of its correctness. If they ask for S(5) we should put all of our mathematicians, computer scientists, and computers to the task, but if they ask for S(6) we should immediately attack because the task is hopeless.

Michiel Wijers has a good bibliography. Allen Brady has recently been exploring ternary Turing machines.


* It doesn't matter that we used Scheme and Rado used Turing machines, because Scheme can simulate a Turing machine and vice versa. In fact, so far no feasible computational model has yet been found that can't be simulated by a Turing machine (or, by extension, by Scheme). The Church-Turing thesis is the name for the hypothesis that anything we would reasonably consider computable can be computed by a Turing machine (or lambda calculus or a general-recursive function, both of which are equivalent). The exact details of what you can do with a given size differs from model to model, but the net result is the same: even a Turing machine can't compute our Scheme-based B(n). The result would work for any modern programming language, but Scheme makes the proofs particularly elegant.

Play Tic-Tac-Toe with Knuth

Section 7.1.2 of the Volume 4 pre-fascicle 0A of Donald Knuth's The Art of Computer Programming is titled “Boolean Evaluation.” In it, Knuth considers the construction of a set of nine boolean functions telling the correct next move in an optimal game of tic-tac-toe. In a footnote, Knuth tells this story:

This setup is based on an exhibit from the early 1950s at the Museum of Science and Industry in Chicago, where the author was first introduced to the magic of switching circuits. The machine in Chicago, designed by researchers at Bell Telephone Laboratories, allowed me to go first; yet I soon discovered there was no way to defeat it. Therefore I decided to move as stupidly as possible, hoping that the designers had not anticipated such bizarre behavior. In fact I allowed the machine to reach a position where it had two winning moves; and it seized both of them! Moving twice is of course a flagrant violation of the rules, so I had won a moral victory even though the machine had announced that I had lost.

That story alone is fairly amusing. But turning the page, the reader finds a quotation from Charles Babbage's Passages from the Life of a Philosopher, published in 1864:

I commenced an examination of a game called “tit-tat-to” ... to ascertain what number of combinations were required for all the possible variety of moves and situations. I found this to be comparatively insignificant. ... A difficulty, however, arose of a novel kind. When the automaton had to move, it might occur that there were two different moves, each equally conducive to his winning the game. ... Unless, also, some provision were made, the machine would attempt two contradictory motions.

The only real winning move is not to play.

Play Chess with God

Starting in the late 1970s, Ken Thompson proved by example that computers could completely analyze chess endgames, which involve only a small number of pieces. Thompson's key insight was that when there are only a few pieces on the board, there might be very many possible move sequences, but there are relatively few distinct board positions. An exhaustive search over board positions can run efficiently even though an exhaustive search over move sequences would not. (The same idea underlies Thompson's efficient parallel NFA simulation for regular expression matching.)

Thompson's first endgame analyses ran on boards with only four pieces, but over time he took advantage of improvements in processor speeds and storage sizes to analyze five- and six-piece endgames as well. Thompson described the programs in the ICCA Journal: his 1986 article “Retrograde Analysis of Certain Endgames” (PDF; 1MB) introduces the technique and covers five-piece endgames, and his 1996 article “6-Piece Endgames” (PDF; 1MB) describes refinements and optimizations for six pieces.

Writing about a 262-move optimal ‘rook and knight versus two knights’ game Thompson found in the 6-piece endgames, former championship chess player Tim Krabbé explained:

Playing over these moves is an eerie experience. They are not human; a grandmaster does not understand them any better than someone who has learned chess yesterday. The knights jump, the kings orbit, the sun goes down, and every move is the truth. It's like being revealed the Meaning of Life, but it's in Estonian.
On his web page, Thompson links to the online 6-piece endgame database as “Play Chess with God.” The CD art for the five-piece databases used a similar theme.


Further reading: With Joe Condon, Thompson built the chess playing computer Belle, which won the ACM North American Computer Chess Championship five times. Belle was later confiscated by U. S. Customs on its way to a contest in Moscow, on suspicion of being of military use. Thompson was quoted as saying that Belle's only military use would be “to drop it out of an airplane. You might kill somebody that way.”

Thompson's work in computer chess was featured in a Computer History Museum exhibit, which included a video of Thompson telling his story. (Beware that in the transcript, ‘endgames’ is usually written as ‘n games.’)

Superoptimizer

In 1987, Henry Massalin invented the “superoptimizer,” a program that finds the shortest possible instruction sequence for a given function. The superoptimizer works by sheer dumb luck: it checks whether every possible instruction sequence up to a given length produces the desired answer. The sequences that pass this test are not guaranteed to be an exact match for the function, but they usually are and can be checked analytically by a human.

As an example of what a superoptimizer can find, here is an Intel x86 instruction sequence that starts with a signed 16-bit number in ax and ends with the sign of that number (-1, 0, or 1) in dx.

cwd
neg ax
adc dx, dx

Massalin gives this example in his 1987 paper, “Superoptimizer: a look at the smallest program” (subscription required, sadly).

Superoptimizers are useful for compiler writers, but perhaps more as a source of amusement than as a time-saving device. One important way that they can be useful is to find sequences that avoid conditional branches, which get more and more expensive as processor pipelines deepen. Torbjörn Granlund and Richard Kenner used a superoptimizer to eliminate branches in gcc's output; see their paper “Eliminating Branches using a Superoptimizer and the GNU C Compiler.”

The Wikipedia entry for Superoptimization gives links to implementations.