When computer scientists study algorithms, they are
interested in how much time or working space an algorithm
requires, as a function of the input size: quicksort
uses O(n log n) time, but only O(n) space.
For the most part, computer scientists don't study
how much code a problem requires. In fact,
part of what it means to be an algorithm is that it can be
implemented as a fixed-size program that works regardless
of the input size.
Intuitively, though, bigger programs should
be able to do more than smaller programs.
It is pretty easy to prove that bigger programs are better,
or at least that they can do more.
To start, we need to precisely define what we mean by
the size a program. These kinds of discussions often
use Turing machines, but these days not many people are
comfortable with Turing machines, so I'm going to use Scheme.
[؟]
Let's define a Scheme program's size to be the number of
left parentheses it contains.
We need to disallow arbitrarily large parenthesis-free expressions,
so that there are only a finite number of Scheme programs of a given size.
Let's limit functions to three arguments and
integer literals to be numbers less than 100.
These requirements might make Scheme a little more verbose,
but they don't make it any less powerful.
Notice that these programs are just expressions that don't take any input.
If we look at all the Scheme programs of a given size,
some of those programs evaluate to integers:
(+ 1 (+ 2 3)) has size 2 and evaluates to 6.
Let's define B(n) to be the largest integer that can be
produced by evaluating a Scheme program of size n.
The example demonstrates that B(3) is at least 6.
The formal statement of the “bigger is better”
assertion is that B(n+1) > B(n) for any n.
For any given n, the definition of B requires that
there be some Scheme program e of size n that
evaluates to B(n).
Then (+ 1 e) has size n+1 and evaluates to 1+B(n).
This demonstrates that B(n+1) is at least 1+B(n),
so B(n+1) > B(n).
There you have it. Bigger programs are better.
But it turns out that no matter how big a program
you write, it can't compute B(n).
B(n) gets so big so fast that no fixed-size program can keep up.
We can prove this too.
Consider any Scheme function f that takes a single integer argument.
This is a lambda expression, not a standalone program like we've been
considering, but it still has some size s.
Define n = 2s+1.
We can write a Scheme program n1, also of size s,
that computes 2s+2 = n+1. The program looks like
(+ 2 (+ 2 ... (+ 2 (+ 2 2)) ...)).
Now consider the Scheme program (f n1),
wihch has size s+s+1 = n.
Since it has size n, it cannot compute a number bigger than B(n).
The value of the argument to f is n+1,
and B(n+1) > B(n), so
f cannot be computing B.
Essentially, B grows faster than any function you can implement in Scheme.*
There you have it. The function B cannot be computed.
Bigger has bested the computer.
The original presentation of this result is
Tibor Rado's 1962 paper “On Non-Computable Functions” (PDF, 320kB),
which appeared in the Bell System Technical Journal.
Rado used Turing machines, not Scheme programs,
and called the function Σ(n), not B(n).
He also defined a function S(n) which is the maximum
length of any computation by a Turing machine of size n
that eventually stops.
Rado made a game of trying to make Turing machines
of a given size that ran for as long as possible (but stopped),
and he called it the “Busy Beaver game.”
The function Σ is sometimes called the Busy Beaver function,
leading to a slew of papers by
otherwise respectable computer scientists and mathematicians
about busy beavers.
In particular, a favorite pastime is to attempt to compute
Σ(n) and S(n) by hand, for small values of n.
This requires analyzing all possible Turing machines
of the given size to figure out
whether each eventually stops, and if so, how long it runs
and what number it prints.
Using a computer to analyze the easy machines
and then doing the hard ones by hand,
Rado and his student Shen Lin
proved that Σ(1) = 1, S(1) = 1, Σ(2) = 4, and S(2) = 6
in their 1965 paper “Computer Studies of Turing Machine Problems” (subscription required).
Lin proved in his Ph.D. thesis that Σ(3) = 6 and S(3) = 21.
In 1983, Allen Brady proved that Σ(4) = 13 and S(4) = 107.
Rona Machlin and Quentin Stout summarized the situation
nicely in their 1990 paper “The Complex Behavior Of Simple Machines:”
Brady predicted that there
will never be a proof of the values of Σ(5) and S(5).
We are just slightly more optimistic, and are
lead to recast a parable due to Erdös (who spoke
in the context of determining Ramsey numbers):
suppose a vastly superior alien force lands and announces
that they will destroy the planet unless we provide
a value of the S function, along with a proof of its
correctness. If they ask for S(5) we should put all
of our mathematicians, computer scientists, and
computers to the task, but if they ask for S(6) we should
immediately attack because the task is hopeless.
Michiel Wijers has a good bibliography.
Allen Brady has recently been
exploring ternary
Turing machines.
* It doesn't matter that we used Scheme and Rado used Turing machines,
because Scheme can simulate a Turing machine and vice versa.
In fact, so far no feasible computational model has yet been found
that can't be simulated by a Turing machine (or, by extension, by Scheme).
The Church-Turing
thesis is the name for the hypothesis that anything we would
reasonably consider computable can be computed by a Turing machine
(or lambda calculus or a general-recursive function, both of which are equivalent).
The exact details of what you can do with a given size differs
from model to model, but the net result is the same:
even a Turing machine can't compute our Scheme-based B(n).
The result would work for any modern programming language,
but Scheme makes the proofs particularly elegant.