<p>
<b>C and C++ Prioritize Performance over Correctness</b>
<p>
Russ Cox (rsc@swtch.com, https://swtch.com/~rsc), research!rsc, 2023-08-18
<p>
The meaning of “undefined behavior” has changed significantly since its introduction in the 1980s.
<p>
The original ANSI C standard, C89, introduced the concept of “undefined behavior,”
which was used both to describe the effect of outright bugs like
accessing memory in a freed object
and also to capture the fact that existing implementations differed about
handling certain aspects of the language,
including use of uninitialized values,
signed integer overflow, and null pointer handling.
<p>
The C89 spec defined undefined behavior (in section 1.6) as:<blockquote>
<p>
Undefined behavior—behavior, upon use of a nonportable or
erroneous program construct, of erroneous data, or of
indeterminately-valued objects, for which the Standard imposes no
requirements. Permissible undefined behavior ranges from ignoring the
situation completely with unpredictable results, to behaving during
translation or program execution in a documented manner characteristic
of the environment (with or without the issuance of a diagnostic
message), to terminating a translation or execution (with the issuance
of a diagnostic message).</blockquote>
<p>
Lumping both non-portable and buggy code into the same category was a mistake.
As time has gone on, the way compilers treat undefined behavior
has led to more and more unexpectedly broken programs,
to the point where it is becoming difficult to tell whether any program
will compile to the meaning in the original source.
This post looks at a few examples and then tries to make some general observations.
In particular, today’s C and C++ prioritize
performance to the clear detriment of correctness.
<a class=anchor href="#uninit"><h2 id="uninit">Uninitialized variables</h2></a>
<p>
Unlike Go and Java, C and C++ do not require variables
to be initialized on declaration (explicitly or implicitly).
Reading from an uninitialized variable is undefined behavior.
<p>
In a <a href="http://blog.llvm.org/2011/05/what-every-c-programmer-should-know.html">blog post</a>,
Chris Lattner (creator of LLVM and Clang) explains the rationale:<blockquote>
<p>
<b>Use of an uninitialized variable</b>:
This is commonly known as source of problems in C programs
and there are many tools to catch these:
from compiler warnings to static and dynamic analyzers.
This improves performance by not requiring that all variables
be zero initialized when they come into scope (as Java does).
For most scalar variables, this would cause little overhead,
but stack arrays and malloc’d memory would incur
a memset of the storage, which could be quite costly,
particularly since the storage is usually completely overwritten.</blockquote>
<p>
Early C compilers were too crude to detect
use of uninitialized basic variables like integers and pointers,
but modern compilers are dramatically more sophisticated.
They could absolutely react in these cases by
“terminating a translation or execution (with the issuance
of a diagnostic message),”
which is to say reporting a compile error.
Or, if they were worried about not rejecting old programs,
they could insert a zero initialization with, as Lattner admits, little overhead.
But they don’t do either of these.
Instead, they just do whatever they feel like during code generation.
<p>
For example, here’s a simple C++ program with an uninitialized variable (a bug):
<pre>#include <stdio.h>
int main() {
    for(int i; i < 10; i++) {
        printf("%d\n", i);
    }
    return 0;
}
</pre>
<p>
If you compile this with <code>clang++</code> <code>-O1</code>, it deletes the loop entirely:
<code>main</code> contains only the <code>return</code> <code>0</code>.
In effect, Clang has noticed the uninitialized variable and chosen
not to report the error to the user but instead
to pretend <code>i</code> is always initialized to 10 or more, making the loop disappear.
<p>
It is true that if you compile with <code>-Wall</code>, then Clang does report the
use of the uninitialized variable as a warning.
This is why you should always build C and C++ programs with warnings enabled and fix every warning reported.
But not all compiler-optimized undefined behaviors
are reliably reported as warnings.
<a class=anchor href="#overflow"><h2 id="overflow">Arithmetic overflow</h2></a>
<p>
At the time C89 was standardized, there were still legacy
<a href="https://en.wikipedia.org/wiki/Ones%27_complement">ones’-complement computers</a>,
so ANSI C could not assume the now-standard two’s-complement representation
for negative numbers.
In two’s complement, an <code>int8</code> −1 is 0b11111111;
in ones’ complement that’s −0, while −1 is 0b11111110.
This meant that operations like signed integer overflow could not be defined,
because<blockquote>
<p>
<code>int8</code> 127+1 = 0b01111111+1 = 0b10000000</blockquote>
<p>
is −127 in ones’ complement but −128 in two’s complement.
That is, signed integer overflow was non-portable.
Declaring it undefined behavior let compilers escalate the behavior
from “non-portable”, with one of two clear meanings,
to whatever they feel like doing.
For example, a common thing programmers expect is that you can test
for signed integer overflow by checking whether the result is
less than one of the operands, as in this program:
<pre>#include <stdio.h>
int f(int x) {
    if(x+100 < x)
        printf("overflow\n");
    return x+100;
}
</pre>
<p>
Clang optimizes away the <code>if</code> statement.
The justification is that since signed integer overflow is undefined behavior,
the compiler can assume it never happens, so <code>x+100</code> must never be less than <code>x</code>.
Ironically, this program would correctly detect overflow
on both ones’-complement and two’s-complement machines
if the compiler would actually emit the check.
<p>
In this case, <code>clang++</code> <code>-O1</code> <code>-Wall</code> prints no warning while it deletes the <code>if</code> statement,
and neither does <code>g++</code>,
although I seem to remember it used to, perhaps in subtly different situations
or with different flags.
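<p>
The <code>int8</code> arithmetic in this section is easy to check with a small model.
The Python sketch below decodes an 8-bit pattern under each representation;
the helper names <code>decode_twos</code> and <code>decode_ones</code> are made up for illustration.
It shows that the single bit pattern produced by 127+1 means −128 under one reading and −127 under the other.

```python
# Illustrative model only: decode 8-bit patterns under the two signed
# representations discussed above. Helper names are hypothetical.

def decode_twos(bits: int) -> int:
    """Interpret an 8-bit pattern as a two's-complement integer."""
    assert 0 <= bits <= 0xFF
    return bits - 0x100 if bits & 0x80 else bits

def decode_ones(bits: int) -> int:
    """Interpret an 8-bit pattern as a ones'-complement integer.

    A set sign bit means "negate by flipping every bit", so 0b11111111
    decodes to negative zero, which Python reports as plain 0.
    """
    assert 0 <= bits <= 0xFF
    return -(~bits & 0xFF) if bits & 0x80 else bits

# 127 + 1 computed as a raw 8-bit addition.
pattern = (0b01111111 + 1) & 0xFF

print(f"{pattern:#010b}")     # 0b10000000
print(decode_twos(pattern))   # -128
print(decode_ones(pattern))   # -127
```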
<p>
For C++20, the <a href="http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/p0907r0.html">first version of proposal P0907</a>
suggested standardizing that signed integer overflow
wraps in two’s complement. The original draft gave a very clear statement of the history
of the undefined behavior and the motivation for making a change:<blockquote>
<p>
[C11] Integer types allows three representations for signed integral types:
<ul>
<li>
Signed magnitude
<li>
Ones’ complement
<li>
Two’s complement</ul>
<p>
See §4 C Signed Integer Wording for full wording.
<p>
C++ inherits these three signed integer representations from C. To the author’s knowledge no modern machine uses both C++ and a signed integer representation other than two’s complement (see §5 Survey of Signed Integer Representations). None of [MSVC], [GCC], and [LLVM] support other representations. This means that the C++ that is taught is effectively two’s complement, and the C++ that is written is two’s complement. It is extremely unlikely that there exist any significant code base developed for two’s complement machines that would actually work when run on a non-two’s complement machine.
<p>
The C++ that is spec’d, however, is not two’s complement. Signed integers currently allow for trap representations, extra padding bits, integral negative zero, and introduce undefined behavior and implementation-defined behavior for the sake of this extremely abstract machine.
<p>
Specifically, the current wording has the following effects:
<ul>
<li>
Associativity and commutativity of integers is needlessly obtuse.
<li>
Naïve overflow checks, which are often security-critical, often get eliminated by compilers. This leads to exploitable code when the intent was clearly not to and the code, while naïve, was correctly performing security checks for two’s complement integers. Correct overflow checks are difficult to write and equally difficult to read, exponentially so in generic code.
<li>
Conversion between signed and unsigned are implementation-defined.
<li>
There is no portable way to generate an arithmetic right-shift, or to sign-extend an integer, which every modern CPU supports.
<li>
constexpr is further restrained by this extraneous undefined behavior.
<li>
Atomic integral are already two’s complement and have no undefined results, therefore even freestanding implementations already support two’s complement in C++.</ul>
<p>
Let’s stop pretending that the C++ abstract machine should represent integers as signed magnitude or ones’ complement. These theoretical implementations are a different programming language, not our real-world C++. Users of C++ who require signed magnitude or ones’ complement integers would be better served by a pure-library solution, and so would the rest of us.</blockquote>
<p>
In the end, the C++ standards committee put up “strong resistance against” the idea of defining
signed integer overflow the way every programmer expects; the undefined behavior remains.
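<p>
The P0907 text quoted above notes that correct overflow checks are difficult to write.
One common pattern is to compare against the limit <i>before</i> adding, so that no intermediate value ever leaves the representable range.
Python integers never overflow, so the sketch below models a C <code>int</code> with explicit bounds, assuming a 32-bit <code>int</code>;
<code>wrap32</code>, <code>naive_check</code>, and <code>add_would_overflow</code> are hypothetical names.

```python
# Sketch only: Python integers never overflow, so a 32-bit C int is
# modelled with explicit bounds. All names here are hypothetical.
INT_MIN = -2**31
INT_MAX = 2**31 - 1

def wrap32(v: int) -> int:
    """Reduce v to 32 bits, as two's-complement wrapping would."""
    return (v - INT_MIN) % 2**32 + INT_MIN

def naive_check(x: int, y: int) -> bool:
    """The x+y < x test from f, evaluated under wrapping semantics."""
    return wrap32(x + y) < x

def add_would_overflow(x: int, y: int) -> bool:
    """Decide whether x + y leaves [INT_MIN, INT_MAX] without ever
    computing an out-of-range intermediate value."""
    if y > 0:
        return x > INT_MAX - y
    return x < INT_MIN - y

print(naive_check(INT_MAX, 100), add_would_overflow(INT_MAX, 100))  # True True
print(naive_check(0, 100), add_would_overflow(0, 100))              # False False
```

<p>
For a positive addend such as 100, the two checks agree in this model.
The difference in C is that <code>x > INT_MAX - 100</code> stays in range for every <code>x</code>, so it does not depend on what overflow does.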
<a class=anchor href="#loops"><h2 id="loops">Infinite loops</h2></a>
<p>
A programmer would never accidentally cause a program to execute an infinite loop, would they?
Consider this program:
<pre>#include <stdio.h>
int stop = 1;
void maybeStop() {
    if(stop)
        for(;;);
}
int main() {
    printf("hello, ");
    maybeStop();
    printf("world\n");
}
</pre>
<p>
This seems like a completely reasonable program to write. Perhaps you are debugging and want the program to stop so you can attach a debugger. Changing the initializer for <code>stop</code> to <code>0</code> lets the program run to completion.
But it turns out that, at least with the latest Clang, the program runs to completion anyway:
the call to <code>maybeStop</code> is optimized away entirely, even when <code>stop</code> is <code>1</code>.
<p>
The problem is that C++ lets the compiler assume that every side-effect-free loop terminates.
That is, a side-effect-free loop that does not terminate is undefined behavior.
This is purely for compiler optimizations, once again treated as more important than correctness.
The rationale for this decision played out in the C standard and was more or less adopted in the C++ standard as well.
<p>
John Regehr pointed out this problem in his post
“<a href="https://blog.regehr.org/archives/140">C Compilers Disprove Fermat’s Last Theorem</a>,”
which included this entry in a FAQ:<blockquote>
<p>
Q: Does the C standard permit/forbid the compiler to terminate infinite loops?
<p>
A: The compiler is given considerable freedom in how it implements the C program,
but its output must have the same externally visible behavior that the program would have when interpreted by the “C abstract machine” that is described in the standard. Many knowledgeable people (including me) read this as saying that the termination behavior of a program must not be changed. Obviously some compiler writers disagree, or else don’t believe that it matters. The fact that reasonable people disagree on the interpretation would seem to indicate that the C standard is flawed.</blockquote>
<p>
A few months later, Douglas Walls wrote <a href="http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1509.pdf">WG14/N1509: Optimizing away infinite loops</a>,
making the case that the standard should <i>not</i> allow this optimization.
In response, Hans-J. Boehm wrote
<a href="http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1528.htm">WG14/N1528: Why undefined behavior for infinite loops?</a>,
arguing for allowing the optimization.
<p>
Consider the potential optimization of this code:
<pre>for (p = q; p != 0; p = p->next)
    ++count;
for (p = q; p != 0; p = p->next)
    ++count2;
</pre>
<p>
A sufficiently smart compiler might reduce it to this code:
<pre>for (p = q; p != 0; p = p->next) {
    ++count;
    ++count2;
}
</pre>
<p>
Is that safe? Not if the first loop is an infinite loop. If the list at <code>p</code> is cyclic and another thread is modifying <code>count2</code>,
then the first program has no race, while the second program does.
Compilers clearly can’t turn correct, race-free programs into racy programs.
But what if we declare that infinite loops are not correct programs?
That is, what if infinite loops were undefined behavior?
Then the compiler could optimize to its robotic heart’s content.
This is exactly what the C standards committee decided to do.
<p>
The rationale, paraphrased, was:
<ul>
<li>
It is very difficult to tell if a given loop is infinite.
<li>
Infinite loops are rare and typically unintentional.
<li>
There are many loop optimizations that are only valid for non-infinite loops.
<li>
The performance wins of these optimizations are deemed important.
<li>
Some compilers already apply these optimizations, making infinite loops non-portable too.
<li>
Therefore, we should declare programs with infinite loops undefined behavior, enabling the optimizations.</ul>
<a class=anchor href="#null"><h2 id="null">Null pointer usage</h2></a>
<p>
We’ve all seen how dereferencing a null pointer causes a crash on modern operating systems:
they leave page zero unmapped by default precisely for this purpose.
But not all systems where C and C++ run have hardware memory protection.
For example, I wrote my first C and C++ programs using Turbo C on an MS-DOS system.
Reading or writing a null pointer did not cause any kind of fault:
the program just touched the memory at location zero and kept running.
The correctness of my code improved dramatically when I moved to
a Unix system that made those programs crash at the moment of the mistake.
Because the behavior is non-portable, though, dereferencing a null pointer is undefined behavior.
<p>
At some point, the justification for keeping the undefined behavior became performance.
<a href="http://blog.llvm.org/2011/05/what-every-c-programmer-should-know.html">Chris Lattner explains</a>:<blockquote>
<p>
In C-based languages, NULL being undefined enables a large number of simple scalar optimizations that are exposed as a result of macro expansion and inlining.</blockquote>
<p>
In <a href="plmm#ub">an earlier post</a>, I showed this example, lifted from <a href="https://twitter.com/andywingo/status/903577501745770496">Twitter in 2017</a>:
<pre>#include <cstdlib>
typedef int (*Function)();
static Function Do;
static int EraseAll() {
    return system("rm -rf slash");
}
void NeverCalled() {
    Do = EraseAll;
}
int main() {
    return Do();
}
</pre>
<p>
Because calling <code>Do()</code> is undefined behavior when <code>Do</code> is null, a modern C++ compiler like Clang
simply assumes that can’t possibly be what’s happening in <code>main</code>.
Since <code>Do</code> must be either null or <code>EraseAll</code> and since null is undefined behavior,
we might as well assume <code>Do</code> is <code>EraseAll</code> unconditionally,
even though <code>NeverCalled</code> is never called.
So this program can be (and is) optimized to:
<pre>int main() {
    return system("rm -rf slash");
}
</pre>
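<p>
The chain of reasoning can be written out as a toy model.
This Python sketch is not how Clang is implemented, and the names <code>possible_values</code> and <code>resolve_call</code> are invented for the example.
It relies on one fact from the program: <code>Do</code> is <code>static</code>, so every store to it is visible in this one file.

```python
# Toy model of the reasoning, not an actual compiler pass.
# None stands for a null function pointer; names are hypothetical.

def possible_values(initial, stores):
    """Every value a static variable could hold: its initializer plus
    anything stored to it anywhere in the translation unit."""
    return {initial, *stores}

def resolve_call(candidates):
    """Discard candidates whose call would be undefined behavior (null).
    If exactly one target remains, the call can be made direct."""
    defined = {c for c in candidates if c is not None}
    if len(defined) == 1:
        return next(iter(defined))
    return "indirect call"

# Do starts out null; the only store anywhere is Do = EraseAll.
candidates = possible_values(None, ["EraseAll"])
print(resolve_call(candidates))  # EraseAll
```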
<p>
Lattner gives <a href="https://blog.llvm.org/2011/05/what-every-c-programmer-should-know_14.html">an equivalent example</a> (search for <code>FP()</code>)
and then this advice:<blockquote>
<p>
The upshot is that it is a fixable issue: if you suspect something weird is going on like this, try building at -O0, where the compiler is much less likely to be doing any optimizations at all.</blockquote>
<p>
This advice is not uncommon: if you cannot debug the correctness problems in your C++ program, disable optimizations.
<a class=anchor href="#sort"><h2 id="sort">Crashes out of sorts</h2></a>
<p>
C++’s <code>std::sort</code> sorts a collection of values
(abstracted as a random access iterator, but almost always an array)
according to a user-specified comparison function.
The default function is <code>operator<</code>, but you can write any function.
For example if you were sorting instances of class <code>Person</code> your
comparison function might sort by the <code>LastName</code> field, breaking
ties with the <code>FirstName</code> field.
These comparison functions end up being subtle yet boring to write,
and it’s easy to make a mistake.
If you do make a mistake and pass in a comparison function that
returns inconsistent results or accidentally reports that any value
is less than itself, that’s undefined behavior:
<code>std::sort</code> is now allowed to do whatever it likes,
including walking off either end of the array
and corrupting other memory.
If you’re lucky, it will pass some of this memory to your comparison
function, and since it won’t have pointers in the right places,
your comparison function will crash.
Then at least you have a chance of guessing the comparison function is at fault.
In the worst case, memory is silently corrupted and the crash happens much later,
with <code>std::sort</code> nowhere to be found.
<p>
Programmers make mistakes, and when they do, <code>std::sort</code> corrupts memory.
This is not hypothetical. It happens enough in practice to be a
<a href="https://stackoverflow.com/questions/18291620/why-will-stdsort-crash-if-the-comparison-function-is-not-as-operator">popular question on StackOverflow</a>.
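<p>
One cheap defense is to test a comparison function against the rules a sort relies on before handing it over.
The sketch below is in Python rather than C++; <code>Person</code>, <code>less</code>, <code>buggy_less</code>, and <code>check_ordering</code> are hypothetical names.
It checks only three rules by brute force over a sample (a full strict weak ordering also requires that “neither is less” be transitive).

```python
# Sketch: brute-force check of the properties a "less than" function
# must have on a sample of values. All names are hypothetical.
from itertools import product
from typing import NamedTuple

class Person(NamedTuple):
    last_name: str
    first_name: str

def less(a: Person, b: Person) -> bool:
    """Order by last name, breaking ties with first name."""
    if a.last_name != b.last_name:
        return a.last_name < b.last_name
    return a.first_name < b.first_name

def buggy_less(a: Person, b: Person) -> bool:
    """A plausible mistake: <= makes equal values 'less' than each other."""
    return (a.last_name, a.first_name) <= (b.last_name, b.first_name)

def check_ordering(lt, sample) -> list:
    """Return the names of the ordering rules that lt breaks on sample."""
    broken = []
    if any(lt(a, a) for a in sample):
        broken.append("irreflexive")
    if any(lt(a, b) and lt(b, a) for a, b in product(sample, repeat=2)):
        broken.append("asymmetric")
    if any(lt(a, b) and lt(b, c) and not lt(a, c)
           for a, b, c in product(sample, repeat=3)):
        broken.append("transitive")
    return broken

people = [Person("Cox", "Russ"), Person("Regehr", "John"),
          Person("Cox", "Alice"), Person("Cox", "Russ")]
print(check_ordering(less, people))        # []
print(check_ordering(buggy_less, people))  # ['irreflexive', 'asymmetric']
```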
<p>
As a final note, it turns out that <code>operator<</code> is not a valid comparison function
on floating-point numbers if NaNs are involved, because:
<ul>
<li>
1 < NaN and NaN < 1 are both false, implying NaN == 1.
<li>
2 < NaN and NaN < 2 are both false, implying NaN == 2.
<li>
Since NaN == 1 and NaN == 2, 1 == 2, yet 1 < 2 is true.</ul>
<p>
Programming with NaNs is never pleasant, but it seems particularly extreme
to allow <code>std::sort</code> to crash when handed one.
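<p>
The three bullet points above can be checked directly.
Python floats use the same IEEE 754 comparison rules for NaN, so this short sketch reproduces the contradiction;
<code>equivalent</code> is a hypothetical helper that spells out how a sort infers equality from <code>operator<</code>.

```python
# Sketch: the equivalence that a sort derives from "<" breaks on NaN.
import math

def equivalent(a: float, b: float) -> bool:
    """Neither value is less than the other, so a sort treats them as equal."""
    return not (a < b) and not (b < a)

nan = math.nan
print(equivalent(1.0, nan))   # True: 1 and NaN look equal
print(equivalent(nan, 2.0))   # True: NaN and 2 look equal
print(equivalent(1.0, 2.0))   # False: yet 1 < 2
```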
<a class=anchor href="#reveal"><h2 id="reveal">Reflections and revealed preferences</h2></a>
<p>
Looking over these examples,
it could not be more obvious that in modern C and C++,
performance is job one and correctness is job two.
To a C/C++ compiler, a programmer making a mistake and (gasp!)
compiling a program containing a bug is just not a concern.
Rather than have the compiler point out the bug or at least
compile the code in a clear, understandable, debuggable manner,
the approach over and over again is
to let the compiler do whatever it likes,
in the name of performance.
<p>
This may not be the wrong decision for these languages.
There are undeniably power users for whom every last bit of performance
translates to very large sums of money, and I don’t claim
to know how to satisfy them otherwise.
On the other hand, this performance comes at a significant
development cost, and there are probably plenty of people and companies
who spend more than their performance savings
on unnecessarily difficult debugging sessions
and additional testing and sanitizing.
It also seems like there must be a middle ground where
programmers retain most of the control they have in C and C++
but the program doesn’t crash when sorting NaNs or
behave arbitrarily badly if you accidentally dereference a null pointer.
Whatever the merits, it is important to see clearly the choice that C and C++ are making.
<p>
In the case of arithmetic overflow, later drafts of the
proposal removed the defined behavior for wrapping, explaining:<blockquote>
<p>
The main change between [P0907r0] and the subsequent revision is to maintain undefined behavior when signed integer overflow occurs, instead of defining wrapping behavior. This direction was motivated by:
<ul>
<li>
Performance concerns, whereby defining the behavior prevents optimizers from assuming that overflow never occurs;
<li>
Implementation leeway for tools such as sanitizers;
<li>
Data from Google suggesting that over 90% of all overflow is a bug, and defining wrapping behavior would not have solved the bug.</ul>
</blockquote>
<p>
Again, performance concerns rank first.
I find the third item in the list particularly telling.
I’ve known C/C++ compiler authors who got excited about a 0.1% performance improvement,
and incredibly excited about 1%.
Yet here we have an idea that would change 10% of affected programs from incorrect to correct,
and it is rejected, because performance is more important.
<p>
The argument about sanitizers is more nuanced.
Leaving a behavior undefined allows any implementation at all, including reporting the
behavior at runtime and stopping the program.
True, the widespread use of undefined behavior enables sanitizers like ThreadSanitizer, MemorySanitizer, and UBSan,
but so would defining the behavior as “either this specific behavior, or a sanitizer report.”
If you believed correctness was job one, you could
define overflow to wrap, fixing the 10% of programs outright
and making the 90% behave at least more predictably,
and then at the same time define that overflow is still
a bug that can be reported by sanitizers.
You might object that requiring wrapping in the absence of a sanitizer
would hurt performance, and that’s fine: it’s just more evidence that
performance trumps correctness.
<p>
One thing I find surprising, though, is that correctness gets ignored even
when it clearly doesn’t hurt performance.
It would certainly not hurt performance to emit a compiler warning
about deleting the <code>if</code> statement testing for signed overflow,
or about optimizing away the possible null pointer dereference in <code>Do()</code>.
Yet I could find no way to make compilers report either one; certainly not <code>-Wall</code>.
<p>
The explanatory shift from non-portable to optimizable also seems revealing.
As far as I can tell, C89 did not use performance as a justification for any of
its undefined behaviors.
They were non-portabilities, like signed overflow and null pointer dereferences,
or they were outright bugs, like use-after-free.
But now experts like Chris Lattner and Hans Boehm point to optimization potential,
not portability, as justification for undefined behaviors.
I conclude that the rationales really have shifted from the mid-1980s to today:
an idea that meant to capture non-portability has been preserved for performance,
trumping concerns like correctness and debuggability.
<p>
Occasionally in Go we have <a href="https://go.dev/blog/compat#input">changed library functions to remove surprising behavior</a>.
It’s always a difficult decision, but we are willing
to break existing programs depending on a mistake
if correcting the mistake fixes a much larger number of programs.
I find it striking that the C and C++ standards committees are
willing in some cases to break existing programs if doing so
merely <i>speeds up</i> a large number of programs.
This is exactly what happened with the infinite loops.
<p>
I find the infinite loop example telling for a second reason:
it shows clearly the escalation from non-portable to optimizable.
In fact, it would appear that if you want to break C++ programs in
service of optimization, one possible approach is to just do that in a
compiler and wait for the standards committee to notice.
The de facto non-portability of whatever programs you have broken
can then serve as justification for undefining their behavior,
leading to a future version of the standard in which your optimization is legal.
In the process, programmers have been handed yet another footgun
to try to avoid setting off.
<p>
(A common counterargument is that the standards committee cannot
force existing implementations to change their compilers.
This doesn’t hold up to scrutiny: every new feature that gets added
is the standards committee forcing existing implementations
to change their compilers.)
<p>
I am not claiming that anything should change about C and C++.
I just want people to recognize that the current versions of these
languages sacrifice correctness for performance.
To some extent, all languages do this: there is almost always a tradeoff
between performance and slower, safer implementations.
Go has data races in part for performance reasons:
we could have done everything by message copying
or with a single global lock instead, but the performance wins of
shared memory were too large to pass up.
For C and C++, though, it seems no performance win is too small
to trade against correctness.
<p>
As a programmer, you have a tradeoff to make too,
and the language standards make it clear which side they are on.
In some contexts, performance is the dominant priority and
nothing else matters quite as much.
If so, C or C++ may be the right tool for you.
But in most contexts, the balance flips the other way.
If programmer productivity, debuggability, reproducible bugs,
and overall correctness and understandability
are more important than squeezing every last little bit of performance,
then C and C++ are not the right tools for you.
I say this with some regret, as I spent many years happily writing C programs.
<p>
I have tried to avoid exaggerated, hyperbolic language in this post,
instead laying out the tradeoff and the preferences revealed
by the decisions being made.
John Regehr wrote a less restrained series of posts about undefined behavior
a decade ago, and in <a href="https://blog.regehr.org/archives/226">one of them</a> he concluded:<blockquote>
<p>
It is basically evil to make certain program actions wrong, but to not give developers any way to tell whether or not their code performs these actions and, if so, where. One of C’s design points was “trust the programmer.” This is fine, but there’s trust and then there’s trust. I mean, I trust my 5 year old but I still don’t let him cross a busy street by himself. Creating a large piece of safety-critical or security-critical code in C or C++ is the programming equivalent of crossing an 8-lane freeway blindfolded.</blockquote>
<p>
To be fair to C and C++,
if you set yourself the goal of crossing an 8-lane freeway blindfolded,
it does make sense to focus on doing it as fast as you possibly can.
<p>
<b>Coroutines for Go</b>
<p>
research!rsc, 2023-07-17
<p>
Why we need coroutines for Go, and what they might look like.
<p>
This post is about why we need a coroutine package for Go, and what it would look like.
But first, what are coroutines?
<p>
Every programmer today is familiar with function calls (subroutines):
F calls G, which stops F and runs G.
G does its work, potentially calling and waiting for other functions, and eventually returns.
When G returns, G is gone and F continues running.
In this pattern, only one function is running at a time,
while its callers wait, all the way up the call stack.
<p>
In contrast to subroutines, coroutines run concurrently on different stacks,
but it's still true that only one is running at a time,
while its caller waits.
F starts G, but G does not run immediately.
Instead, F must explicitly <i>resume</i> G, which then starts running.
At any point, G may turn around and <i>yield</i> back to F.
That pauses G and continues F from its resume operation.
Eventually F calls resume again, which pauses F and continues G from its yield.
On and on they go, back and forth, until G returns, which cleans up G and
continues F from its most recent resume, with some signal to F that G is done
and that F should no longer try to resume G.
In this pattern, only one coroutine is running at a time,
while its caller waits on a different stack.
They take turns in a well-defined, coordinated manner.
<p>
This is a bit abstract. Let's look at real programs.
<a class=anchor href="#lua"><h2 id="lua">Coroutines in Lua</h2></a>
<p>
To use a <a href="pcdata#gopher">venerable example</a>, consider comparing two binary trees
to see if they have the same value sequence, even if their structures are different.
For example, here is code in <a href="https://lua.org">Lua 5</a> to generate some binary trees:
<pre>function T(l, v, r)
    return {left = l, value = v, right = r}
end
e = nil
t1 = T(T(T(e, 1, e), 2, T(e, 3, e)), 4, T(e, 5, e))
t2 = T(e, 1, T(e, 2, T(e, 3, T(e, 4, T(e, 5, e)))))
t3 = T(e, 1, T(e, 2, T(e, 3, T(e, 4, T(e, 6, e)))))
</pre>
<p>
The trees <code>t1</code> and <code>t2</code> both contain the values 1, 2, 3, 4, 5; <code>t3</code> contains 1, 2, 3, 4, 6.
<p>
We can write a coroutine to walk over a tree and yield each value:
<pre>function visit(t)
    if t ~= nil then  -- note: ~= is "not equal"
        visit(t.left)
        coroutine.yield(t.value)
        visit(t.right)
    end
end
</pre>
<p>
Then to compare two trees, we can create two visit coroutines and
alternate between them to read and compare successive values:
<pre>function cmp(t1, t2)
    co1 = coroutine.create(visit)
    co2 = coroutine.create(visit)
    while true
    do
        ok1, v1 = coroutine.resume(co1, t1)
        ok2, v2 = coroutine.resume(co2, t2)
        if ok1 ~= ok2 or v1 ~= v2 then
            return false
        end
        if not ok1 and not ok2 then
            return true
        end
    end
end
</pre>
<p>
The <code>t1</code> and <code>t2</code> arguments to <code>coroutine.resume</code> are only used on the first iteration,
as the argument to <code>visit</code>.
Subsequent resumes return that value from <code>coroutine.yield</code>, but the code ignores the value.
<p>
A more idiomatic Lua version would use <code>coroutine.wrap</code>, which returns a function
that hides the coroutine object:
<pre><span style="color: #aaa">function cmp(t1, t2)</span>
    next1 = coroutine.wrap(function() visit(t1) end)
    next2 = coroutine.wrap(function() visit(t2) end)
<span style="color: #aaa">    while true</span>
<span style="color: #aaa">    do</span>
        v1 = next1()
        v2 = next2()
        if v1 ~= v2 then
<span style="color: #aaa">            return false</span>
<span style="color: #aaa">        end</span>
        if v1 == nil and v2 == nil then
<span style="color: #aaa">            return true</span>
<span style="color: #aaa">        end</span>
<span style="color: #aaa">    end</span>
<span style="color: #aaa">end</span>
</pre>
<p>
When the coroutine has finished, the <code>next</code> function returns <code>nil</code> (<a href="https://gist.github.com/rsc/5908886288b741b847a83c0c6597c690">full code</a>).
<a class=anchor href="#python"><h2 id="python">Generators in Python (Iterators in CLU)</h2></a>
<p>
Python provides generators that look a lot like Lua's coroutines,
but they are not coroutines, so it's worth pointing out the differences.
The main difference is that the “obvious” programs don't work.
For example, here's a direct translation of our Lua tree and visitor to Python:
<pre>def T(l, v, r):
    return {'left': l, 'value': v, 'right': r}
def visit(t):
    if t is not None:
        visit(t['left'])
        yield t['value']
        visit(t['right'])
</pre>
<p>
But this obvious translation doesn't work:
<pre>>>> e = None
>>> t1 = T(T(T(e, 1, e), 2, T(e, 3, e)), 4, T(e, 5, e))
>>> for x in visit(t1):
...     print(x)
...
4
>>>
</pre>
<p>
We lost 1, 2, 3, and 5. What happened?
<p>
In Python, that <code>def visit</code> does not define an ordinary function.
Because the body contains a <code>yield</code> statement, the result is a generator instead:
<pre>>>> type(visit(t1))
<class 'generator'>
>>>
</pre>
<p>
The call <code>visit(t['left'])</code> doesn't run the code in <code>visit</code> at all.
It only creates and returns a new generator, which is then discarded.
To avoid discarding those results, you have to loop over the generator and re-yield them:
<pre><span style="color: #aaa">def visit(t):</span>
<span style="color: #aaa">    if t is not None:</span>
        for x in visit(t['left']):
            yield x
<span style="color: #aaa">        yield t['value']</span>
        for x in visit(t['right']):
            yield x
</pre>
<p>
Python 3.3 introduced <code>yield</code> <code>from</code>, allowing:
<pre><span style="color: #aaa">def visit(t):</span>
<span style="color: #aaa">    if t is not None:</span>
        yield from visit(t['left'])
<span style="color: #aaa">        yield t['value']</span>
        yield from visit(t['right'])
</pre>
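<p>
With a working <code>visit</code>, the Lua tree comparison carries over almost line for line.
This is a sketch rather than a canonical translation: the sentinel <code>DONE</code> is an assumption made for this example, standing in for the <code>nil</code> that a finished Lua coroutine returns.

```python
# Sketch: the Lua cmp rewritten with Python generators.
# The sentinel name DONE is an assumption made for this example.

def T(l, v, r):
    return {'left': l, 'value': v, 'right': r}

def visit(t):
    if t is not None:
        yield from visit(t['left'])
        yield t['value']
        yield from visit(t['right'])

DONE = object()  # marks an exhausted generator, like nil from next1()

def cmp(t1, t2):
    next1 = visit(t1)
    next2 = visit(t2)
    while True:
        # next(gen, default) returns default instead of raising StopIteration.
        v1 = next(next1, DONE)
        v2 = next(next2, DONE)
        if v1 != v2:
            return False
        if v1 is DONE and v2 is DONE:
            return True

e = None
t1 = T(T(T(e, 1, e), 2, T(e, 3, e)), 4, T(e, 5, e))
t2 = T(e, 1, T(e, 2, T(e, 3, T(e, 4, T(e, 5, e)))))
t3 = T(e, 1, T(e, 2, T(e, 3, T(e, 4, T(e, 6, e)))))
print(cmp(t1, t2), cmp(t1, t3))  # True False
```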
<p>
The generator object contains the state of the single call to <code>visit</code>,
meaning local variable values and which line is executing.
That state is pushed onto the call stack each time the generator is resumed
and then popped back into the generator object at each <code>yield</code>,
which can only occur in the top-most call frame.
In this way, the generator uses the same stack as the original program,
avoiding the need for a full coroutine implementation
but introducing these confusing limitations instead.
<p>
Python's generators appear to be almost exactly copied from CLU,
which pioneered this abstraction (and so many other things),
although CLU calls them iterators, not generators.
A CLU tree iterator looks like:
<pre>visit = iter (t: cvt) yields (int):
    tagcase t
    tag empty: ;
    tag non_empty(t: node):
        for x: int
            in tree$visit(t.left) do
                yield(x);
        end;
        yield(t.value);
        for x: int
            in tree$visit(t.right) do
                yield(x);
        end;
    end;
end visit;
</pre>
<p>
The syntax is different, especially the <code>tagcase</code> that is examining
a tagged union representation of a tree, but the basic structure,
including the nested <code>for</code> loops, is exactly the same as our first
working Python version.
Also, because CLU was statically typed, <code>visit</code> is clearly marked as an iterator (<code>iter</code>)
not a function (<code>proc</code> in CLU).
Thanks to that type information,
misuse of <code>visit</code> as an ordinary function call,
like in our buggy Python example,
is something that the compiler could (and I assume did) diagnose.
<p>
About CLU's implementation, the original implementers wrote,
“Iterators are a form of coroutine; however, their use is sufficiently constrained
that they are implemented using just the program stack.
Using an iterator is therefore only slightly more expensive than using a
procedure.”
This sounds exactly like the explanation I gave above for the Python generators.
For more, see Barbara Liskov <i>et al.</i>'s 1977 paper
“<a href="https://dl.acm.org/doi/10.1145/359763.359789">Abstraction Mechanisms in CLU</a>”,
specifically sections 4.2, 4.3, and 6.
<a class=anchor href="#thread"><h2 id="thread">Coroutines, Threads, and Generators</h2></a>
<p>
At first glance, coroutines, threads, and generators look alike.
All three provide <a href="pcdata">concurrency</a> in one form or another,
but they differ in important ways.
<ul>
<li>
<p>
Coroutines provide concurrency without parallelism:
when one coroutine is running, the one that resumed it
or yielded to it is not.
<p>
Because coroutines run one at a time and only switch at specific
points in the program, the coroutines can share data among themselves
without races.
The explicit switches (<code>coroutine.resume</code> in the first Lua example
or calling a <code>next</code> function in the second Lua example)
serve as synchronization points, creating <a href="gomm#gos_memory_model_today">happens-before edges</a>.
<p>
Because scheduling is explicit (without any preemption)
and done entirely without the operating system,
a coroutine switch takes at most around ten nanoseconds, usually even less.
Startup and teardown are also much cheaper than for threads.
<li>
<p>
Threads provide more power than coroutines, but with more cost.
The additional power is parallelism, and the cost is
the overhead of scheduling, including more expensive context switches
and the need to add preemption in some form.
Typically the operating system provides threads,
and a thread switch takes a few microseconds.
<p>
For this taxonomy, Go's goroutines are cheap threads:
a goroutine switch is closer to a few hundred nanoseconds,
because the Go runtime takes on some of the scheduling work,
but goroutines still provide the full parallelism and preemption
of threads.
(Java's new lightweight threads are basically the same as goroutines.)
<li>
<p>
Generators provide less power than coroutines, because only
the top-most frame in the coroutine is allowed to yield.
That frame is moved back and forth between an object and the call stack
to suspend and resume it.</ul>
<p>
Coroutines are a useful building block for writing programs that want
concurrency for program structuring
but not for parallelism.
For one detailed example of that, see my previous post,
“<a href="pcdata">Storing Data in Control Flow</a>”.
For other examples, see Ana Lúcia De Moura and Roberto Ierusalimschy's 2009 paper
“<a href="https://dl.acm.org/doi/pdf/10.1145/1462166.1462167">Revisiting Coroutines</a>”.
For the original example, see Melvin Conway's 1963 paper
“<a href="https://dl.acm.org/doi/pdf/10.1145/366663.366704">Design of a Separable Transition-Diagram Compiler</a>”.
<a class=anchor href="#why"><h2 id="why">Why Coroutines in Go?</h2></a>
<p>
Coroutines are a concurrency pattern not directly
served by existing Go concurrency libraries.
Goroutines are often close enough,
but as we saw,
they are not the same, and sometimes that difference matters.
<p>
For example,
Rob Pike's 2011 talk “<a href="https://go.dev/talks/2011/lex.slide">Lexical Scanning in Go</a>”
presents the original lexer and parser for the <a href="https://go.dev/pkg/text/template">text/template package</a>.
They ran in separate goroutines connected by a channel,
imperfectly simulating a pair of coroutines: the
lexer and parser ran in parallel, with the lexer looking ahead to
the next token while the parser processed the most recent one.
Generators would not have been good enough—the lexer yields values from many different functions—but
full goroutines proved to be a bit too much.
The parallelism provided by the goroutines caused races
and eventually led to abandoning the design
in favor of the lexer storing state in an object,
which was a more faithful simulation of a coroutine.
Proper coroutines would have avoided the races
and been more efficient than goroutines.
<p>
An anticipated future use case for coroutines in Go
is iteration over generic collections.
We have discussed adding support to Go for
<a href="https://github.com/golang/go/discussions/56413">ranging over functions</a>,
which would encourage authors of collections and other abstractions
to provide CLU-like iterator functions.
Iterators can be implemented today using function values, without any language changes.
For example, a slightly simplified tree iterator in Go could be:
<pre>func (t *Tree[V]) All(yield func(v V)) {
    if t != nil {
        t.left.All(yield)
        yield(t.value)
        t.right.All(yield)
    }
}
</pre>
<p>
That iterator can be invoked today as:
<pre>t.All(func(v V) {
    fmt.Println(v)
})
</pre>
<p>
and perhaps a variant could be invoked in a future version of Go as:
<pre>for v := range t.All {
    fmt.Println(v)
}
</pre>
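<p>
To make the function-value form concrete, here is a minimal, self-contained sketch.
The <code>Tree</code> struct layout and the <code>collect</code> helper are assumptions made for illustration;
only the <code>All</code> method comes from the text above.

```go
package main

import "fmt"

// Tree is an assumed node layout; the field names follow the All method.
type Tree[V any] struct {
    left  *Tree[V]
    value V
    right *Tree[V]
}

// All calls yield once for each value in t, in sorted (in-order) order.
func (t *Tree[V]) All(yield func(v V)) {
    if t != nil {
        t.left.All(yield)
        yield(t.value)
        t.right.All(yield)
    }
}

// collect is a hypothetical helper that gathers every value the
// push iterator produces into a slice.
func collect[V any](t *Tree[V]) []V {
    var out []V
    t.All(func(v V) { out = append(out, v) })
    return out
}

func main() {
    leaf := func(v int) *Tree[int] { return &Tree[int]{value: v} }
    root := &Tree[int]{
        left:  &Tree[int]{left: leaf(1), value: 2, right: leaf(3)},
        value: 4,
        right: leaf(5),
    }
    fmt.Println(collect(root)) // [1 2 3 4 5]
}
```

<p>
Note that the caller's loop body has become a function literal: the iterator is in control and “pushes” each value into it.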
<p>
Sometimes, however, we want to iterate over a collection
in a way that doesn't fit a single <code>for</code> loop.
The binary tree comparison is an example of this:
the two iterations need to be interlaced somehow.
As we've already seen, coroutines would provide an answer,
letting us turn a function like <code>(*Tree).All</code> (a “push” iterator)
into a function that returns a stream of values, one per call
(a “pull” iterator).
<a class=anchor href="#how"><h2 id="how">How to Implement Coroutines in Go</h2></a>
<p>
If we are to add coroutines to Go, we should aim to do it without language changes.
That means the definition of coroutines should be possible to implement
and understand in terms of ordinary Go code.
Later, I will argue for an optimized implementation provided directly by the runtime,
but that implementation should be indistinguishable from the pure Go definition.
<p>
Let's start with a very simple version that ignores the yield operation entirely.
It just runs a function in another goroutine:
<pre>package coro
func New[In, Out any](f func(In) Out) (resume func(In) Out) {
    cin := make(chan In)
    cout := make(chan Out)
    resume = func(in In) Out {
        cin <- in
        return <-cout
    }
    go func() { cout <- f(<-cin) }()
    return resume
}
</pre>
<p>
<code>New</code> takes a function <code>f</code> which must have one argument and one result.
<code>New</code> allocates channels, defines <code>resume</code>,
creates a goroutine to run <code>f</code>,
and returns the <code>resume</code> function.
The new goroutine blocks on <code><-cin</code>,
so there is no opportunity for parallelism.
The <code>resume</code> function unblocks the
new goroutine by sending an <code>in</code> value and then
blocks receiving an <code>out</code> value.
This send-receive pair makes a coroutine switch.
We can use <code>coro.New</code> like this (<a href="https://go.dev/play/p/gLhqAutT9Q4">full code</a>):
<pre>func main() {
    resume := coro.New(strings.ToUpper)
    fmt.Println(resume("hello world"))
}
</pre>
<p>
So far, <code>coro.New</code> is just a clunky way to call a function.
We need to add <code>yield</code>, which we can pass as an argument to <code>f</code>:
<pre>func New[In, Out any](f func(in In, yield func(Out) In) Out) (resume func(In) Out) {
<span style="color: #aaa"></span>
<span style="color: #aaa">    cin := make(chan In)</span>
<span style="color: #aaa">    cout := make(chan Out)</span>
<span style="color: #aaa">    resume = func(in In) Out {</span>
<span style="color: #aaa">        cin <- in</span>
<span style="color: #aaa">        return <-cout</span>
<span style="color: #aaa">    }</span>
    yield := func(out Out) In {
        cout <- out
        return <-cin
    }
    go func() { cout <- f(<-cin, yield) }()
<span style="color: #aaa">    return resume</span>
<span style="color: #aaa">}</span>
</pre>
<p>
Note that there is still no parallelism here: <code>yield</code> is another send-receive pair.
These goroutines are constrained by the communication pattern
to act indistinguishably from coroutines.
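<p>
As a small illustration of values flowing in both directions, the sketch below copies <code>New</code> into
a single file and runs a coroutine that keeps a running total.
The <code>runningTotal</code> function is a hypothetical example, not part of the package.

```go
package main

import "fmt"

// New is the channel-based definition from the text, copied here so
// that this file is self-contained.
func New[In, Out any](f func(in In, yield func(Out) In) Out) (resume func(In) Out) {
    cin := make(chan In)
    cout := make(chan Out)
    resume = func(in In) Out {
        cin <- in
        return <-cout
    }
    yield := func(out Out) In {
        cout <- out
        return <-cin
    }
    go func() { cout <- f(<-cin, yield) }()
    return resume
}

// runningTotal is a hypothetical coroutine body: each value passed to
// resume is added to a total, and the total so far is yielded back.
// It never returns, so resume can be called any number of times.
func runningTotal(in int, yield func(int) int) int {
    total := 0
    for {
        total += in
        in = yield(total)
    }
}

func main() {
    resume := New(runningTotal)
    for _, n := range []int{1, 2, 3, 4} {
        fmt.Println(resume(n)) // prints 1, 3, 6, 10
    }
}
```

<p>
The first <code>resume(1)</code> supplies the <code>in</code> argument of <code>runningTotal</code>; each later
<code>resume(n)</code> becomes the return value of the pending <code>yield</code> call.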
<a class=anchor href="#parser"><h2 id="parser">Example: String Parser</h2></a>
<p>
Before we build up to iterator conversion, let's look at a few simpler examples.
In “<a href="pcdata">Storing Data in Control Flow</a>,” we considered the problem of
taking a function
<pre>func parseQuoted(read func() byte) bool
</pre>
<p>
and running it in a separate control flow so that bytes can be provided one
at a time to a <code>Write</code> method. Instead of the ad hoc channel-based implementation
in that post, we can use:
<pre>type parser struct {
    resume func(byte) Status
}
func (p *parser) Init() {
    coparse := func(_ byte, yield func(Status) byte) Status {
        read := func() byte { return yield(NeedMoreInput) }
        if !parseQuoted(read) {
            return BadInput
        }
        return Success
    }
    p.resume = coro.New(coparse)
    p.resume(0)
}
func (p *parser) Write(c byte) Status {
    return p.resume(c)
}
</pre>
<p>
The <code>Init</code> function does all the work, and not much.
It defines a function <code>coparse</code> that has the signature needed by <code>coro.New</code>,
which means adding a throwaway input of type <code>byte</code>.
That function defines a <code>read</code> that yields <code>NeedMoreInput</code>
and then returns the byte provided by the caller.
It then runs <code>parseQuoted(read)</code>, converting the boolean result
to the usual status code.
Having created a coroutine for <code>coparse</code> using <code>coro.New</code>,
<code>Init</code> calls <code>p.resume(0)</code> to allow <code>coparse</code> to advance
to the first read in <code>parseQuoted</code>.
Finally the <code>Write</code> method is a trivial wrapper around <code>p.resume</code> (<a href="https://go.dev/play/p/MNGVPk11exV">full code</a>).
<p>
This setup abstracts away the pair of channels that we
maintained by hand in the previous post, allowing us to work
at a higher level as we write the program.
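<p>
For readers who want to run the parser without the earlier post at hand, here is a self-contained sketch.
The <code>Status</code> type, its <code>String</code> method, and this particular <code>parseQuoted</code>
(a double-quoted string with backslash escapes) are stand-ins assumed for illustration;
<code>New</code> is copied into the same file, so the call is <code>New</code> rather than <code>coro.New</code>.

```go
package main

import "fmt"

// Status and its values are assumed definitions for this sketch.
type Status int

const (
    NeedMoreInput Status = iota
    BadInput
    Success
)

func (s Status) String() string {
    return [...]string{"NeedMoreInput", "BadInput", "Success"}[s]
}

// New is the two-channel definition with yield, copied from the text.
func New[In, Out any](f func(in In, yield func(Out) In) Out) (resume func(In) Out) {
    cin := make(chan In)
    cout := make(chan Out)
    resume = func(in In) Out {
        cin <- in
        return <-cout
    }
    yield := func(out Out) In {
        cout <- out
        return <-cin
    }
    go func() { cout <- f(<-cin, yield) }()
    return resume
}

// parseQuoted is an assumed stand-in: it accepts a double-quoted string,
// skipping the byte that follows each backslash.
func parseQuoted(read func() byte) bool {
    if read() != '"' {
        return false
    }
    var c byte
    for c != '"' {
        c = read()
        if c == '\\' {
            read()
        }
    }
    return true
}

type parser struct {
    resume func(byte) Status
}

func (p *parser) Init() {
    coparse := func(_ byte, yield func(Status) byte) Status {
        read := func() byte { return yield(NeedMoreInput) }
        if !parseQuoted(read) {
            return BadInput
        }
        return Success
    }
    p.resume = New(coparse)
    p.resume(0) // advance to the first read
}

func (p *parser) Write(c byte) Status {
    return p.resume(c)
}

func main() {
    var p parser
    p.Init()
    for _, c := range []byte(`"hi"`) {
        fmt.Println(p.Write(c)) // NeedMoreInput three times, then Success
    }
}
```

<p>
With this version of <code>New</code>, calling <code>Write</code> again after <code>Success</code> or <code>BadInput</code> would block forever,
because the coroutine has already returned.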
<a class=anchor href="#sieve"><h2 id="sieve">Example: Prime Sieve</h2></a>
<p>
As a slightly larger example, consider <a href="https://www.cs.dartmouth.edu/~doug/sieve/sieve.pdf">Doug McIlroy's concurrent prime sieve</a>.
It consists of a pipeline of coroutines, one for each prime <code>p</code>, each running:
<pre>loop:
    n = get a number from left neighbor
    if (p does not divide n)
        pass n to right neighbor
</pre>
<p>
A counting coroutine on the leftmost side of the pipeline feeds the numbers 2, 3, 4, ... into the left end of the pipeline.
A printing coroutine on the rightmost side can read primes out, print them, and create new filtering coroutines.
The first filter in the pipeline removes multiples of 2, the next removes multiples of 3, the next removes multiples of 5, and so on.
<p>
The <code>coro.New</code> primitive we've created lets us take a straightforward loop that yields values
and convert it into a function that can be called to obtain each value one at a time.
Here is the counter:
<pre>func counter() func(bool) int {
    return coro.New(func(more bool, yield func(int) bool) int {
        for i := 2; more; i++ {
            more = yield(i)
        }
        return 0
    })
}
</pre>
<p>
The counter logic is the function literal passed to <code>New</code>.
It takes a yield function of type <code>func(int)</code> <code>bool</code>.
The code yields a value by passing it to <code>yield</code> and then receives back a boolean
saying whether to continue generating more numbers.
When told to stop, either because <code>more</code> was false on entry
or because a <code>yield</code> call returned false,
the loop ends.
It returns a final, ignored value, to satisfy the function
type required by <code>New</code>.
<p>
<code>New</code> turns this loop into a function that is the inverse of <code>yield</code>: a <code>func(bool)</code> <code>int</code>
that can be called with true to obtain the next value or with false to shut down
the generator.
The filtering coroutine is only slightly more complex:
<pre>func filter(p int, next func(bool) int) (filtered func(bool) int) {
    return coro.New(func(more bool, yield func(int) bool) int {
        for more {
            n := next(true)
            if n%p != 0 {
                more = yield(n)
            }
        }
        return next(false)
    })
}
</pre>
<p>
It takes a prime <code>p</code> and a <code>next</code> func connected to the coroutine on the left
and then returns the filtered output stream to connect to the coroutine on the right.
<p>
Finally we have the printing coroutine:
<pre>func main() {
    next := counter()
    for i := 0; i < 10; i++ {
        p := next(true)
        fmt.Println(p)
        next = filter(p, next)
    }
    next(false)
}
</pre>
<p>
Starting with the counter, <code>main</code> maintains in <code>next</code> the output
of the pipeline constructed so far.
Then it loops: read a prime <code>p</code>, print <code>p</code>, and then add a new
filter on the right end of the pipeline to remove multiples of <code>p</code> (<a href="https://go.dev/play/p/3OHQ_FHe_Na">full code</a>).
<p>
Notice that the calling relationship between coroutines can change over time:
any coroutine C can call another coroutine D's <code>next</code> function and become the
coroutine that D yields to.
The counter's first <code>yield</code> goes to <code>main</code>, while its subsequent <code>yield</code>s
go to the 2-filter.
Similarly each <code>p</code>-filter <code>yield</code>s its first output (the next prime) to <code>main</code>
while its subsequent <code>yield</code>s go to the filter for that next prime.
<a class=anchor href="#goroutines"><h2 id="goroutines">Coroutines and Goroutines</h2></a>
<p>
In a certain sense, it is a misnomer to call these control flows coroutines.
They are full goroutines, and they can do everything an ordinary
goroutine can, including block waiting for mutexes, channels,
system calls, and so on.
What <code>coro.New</code> does is create goroutines
with access to coroutine switch operations
inside the <code>yield</code> and <code>resume</code> functions (which the sieve calls <code>next</code>).
The ability to use those operations can even be passed to
different goroutines, which is happening with <code>main</code> handing off
each of its <code>next</code> streams to each successive <code>filter</code> goroutine.
Unlike the <code>go</code> statement, <code>coro.New</code> adds new concurrency to the program
<i>without</i> new parallelism.
The goroutine that <code>coro.New(f)</code> creates can only run
when some other goroutine explicitly loans it the
ability to run using <code>resume</code>; that loan is repaid by <code>yield</code>
or by <code>f</code> returning.
If you have just one main goroutine
and run 10 <code>go</code> statements, then all 11 goroutines can be running at once.
In contrast, if you have one main goroutine
and run 10 <code>coro.New</code> calls, there are now 11 control flows
but the parallelism of the program is what it was before: only one runs at a time.
Exactly which goroutines are paused in coroutine operations
can vary as the program runs, but the parallelism never increases.
<p>
In short, <code>go</code> creates a new concurrent, <i>parallel</i> control flow,
while <code>coro.New</code> creates a new concurrent, <i>non-parallel</i> control flow.
It is convenient to continue to talk about the non-parallel control flows
as coroutines, but remember that exactly which goroutines are
“non-parallel” can change over the execution of a program,
exactly the same way that which goroutines are receiving from or sending on channels
can change over the execution of a program.
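<p>
One way to see “concurrency without parallelism” in action is to let several coroutines update a shared
variable with no mutex. The sketch below is an illustration under the channel-based <code>New</code> from
earlier in the post; the <code>run</code> helper and its parameters are hypothetical.

```go
package main

import "fmt"

// New is the channel-based definition with yield, copied from the text.
func New[In, Out any](f func(in In, yield func(Out) In) Out) (resume func(In) Out) {
    cin := make(chan In)
    cout := make(chan Out)
    resume = func(in In) Out {
        cin <- in
        return <-cout
    }
    yield := func(out Out) In {
        cout <- out
        return <-cin
    }
    go func() { cout <- f(<-cin, yield) }()
    return resume
}

// run starts n coroutines that all add to one shared counter and
// resumes each of them once per round. No locking is needed: only
// one control flow runs at a time, and every switch is a channel
// operation that orders the accesses.
func run(n, rounds int) int {
    counter := 0 // shared by main and all n coroutines
    var resumes []func(int) int
    for i := 0; i < n; i++ {
        resumes = append(resumes, New(func(delta int, yield func(int) int) int {
            for {
                counter += delta
                delta = yield(counter)
            }
        }))
    }
    for r := 0; r < rounds; r++ {
        for _, resume := range resumes {
            resume(1)
        }
    }
    return counter
}

func main() {
    fmt.Println(run(10, 100)) // 1000
}
```

<p>
Replacing <code>New</code> with plain <code>go</code> statements that increment <code>counter</code> directly would make this a data race.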
<a class=anchor href="#resume"><h2 id="resume">Robust Resumes</h2></a>
<p>
There are a few improvements we can make to <code>coro.New</code> so that it works better in real programs.
The first is to allow <code>resume</code> to be called after the function is done: right now it deadlocks.
Let's add a bool result indicating whether <code>resume</code>'s result came from a yield.
The <code>coro.New</code> implementation we have so far is:
<pre>func New[In, Out any](f func(in In, yield func(Out) In) Out) (resume func(In) Out) {
    cin := make(chan In)
    cout := make(chan Out)
    resume = func(in In) Out {
        cin <- in
        return <-cout
    }
    yield := func(out Out) In {
        cout <- out
        return <-cin
    }
    go func() {
        cout <- f(<-cin, yield)
    }()
    return resume
}
</pre>
<p>
To add this extra result, we need to track whether <code>f</code> is running
and return that result from <code>resume</code>:
<pre>func New[In, Out any](f func(in In, yield func(Out) In) Out) (resume func(In) (Out, bool)) {
<span style="color: #aaa">    cin := make(chan In)</span>
<span style="color: #aaa">    cout := make(chan Out)</span>
    running := true
    resume = func(in In) (out Out, ok bool) {
        if !running {
            return
        }
<span style="color: #aaa">        cin <- in</span>
        out = <-cout
        return out, running
<span style="color: #aaa">    }</span>
<span style="color: #aaa">    yield := func(out Out) In {</span>
<span style="color: #aaa">        cout <- out</span>
<span style="color: #aaa">        return <-cin</span>
<span style="color: #aaa">    }</span>
<span style="color: #aaa">    go func() {</span>
        out := f(<-cin, yield)
        running = false
        cout <- out
<span style="color: #aaa">    }()</span>
<span style="color: #aaa">    return resume</span>
<span style="color: #aaa">}</span>
</pre>
<p>
Note that since <code>resume</code> can only run when the coroutine's goroutine is blocked,
and vice versa, sharing the <code>running</code> variable is not a race.
The two are synchronizing by taking turns executing.
If <code>resume</code> is called after the coroutine has exited, <code>resume</code> returns a zero value and false.
<p>
Now we can tell when a goroutine is done (<a href="https://go.dev/play/p/Y2tcF-MHeYS">full code</a>):
<pre>func main() {
    resume := coro.New(func(_ int, yield func(string) int) string {
        yield("hello")
        yield("world")
        return "done"
    })
    for i := 0; i < 4; i++ {
        s, ok := resume(0)
        fmt.Printf("%q %v\n", s, ok)
    }
}
$ go run cohello.go
"hello" true
"world" true
"done" false
"" false
$
</pre>
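<p>
With the extra boolean, a caller no longer needs to know in advance how many times to resume.
The following sketch loops until <code>ok</code> is false; the <code>drain</code> helper is a hypothetical
convenience, and <code>New</code> is copied into the file so it can run on its own.

```go
package main

import "fmt"

// New is the version from the text whose resume reports whether the
// result came from a yield (true) or from f returning (false).
func New[In, Out any](f func(in In, yield func(Out) In) Out) (resume func(In) (Out, bool)) {
    cin := make(chan In)
    cout := make(chan Out)
    running := true
    resume = func(in In) (out Out, ok bool) {
        if !running {
            return
        }
        cin <- in
        out = <-cout
        return out, running
    }
    yield := func(out Out) In {
        cout <- out
        return <-cin
    }
    go func() {
        out := f(<-cin, yield)
        running = false
        cout <- out
    }()
    return resume
}

// drain is a hypothetical helper: it resumes the coroutine until it
// finishes, returning the yielded values and the final return value.
func drain[Out any](resume func(int) (Out, bool)) (yielded []Out, final Out) {
    for {
        v, ok := resume(0)
        if !ok {
            return yielded, v
        }
        yielded = append(yielded, v)
    }
}

func main() {
    resume := New(func(_ int, yield func(string) int) string {
        yield("hello")
        yield("world")
        return "done"
    })
    yielded, final := drain(resume)
    fmt.Println(yielded, final) // [hello world] done
}
```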
<a class=anchor href="#iterator"><h2 id="iterator">Example: Iterator Conversion</h2></a>
<p>
The prime sieve example showed direct use of <code>coro.New</code>,
but the <code>more bool</code> argument was a bit awkward and does not
match the iterator functions we saw before.
Let's look at converting any push iterator into a pull iterator
using <code>coro.New</code>.
We will need a way to terminate the coroutine running
the push iterator if we want to stop early, so we will add a boolean
result from <code>yield</code> indicating whether to continue,
just like in the prime sieve:
<pre>push func(yield func(V) bool)
</pre>
<p>
The goal of the new function <code>coro.Pull</code> is to turn that push function
into a pull iterator. The iterator will return the next value
and a boolean indicating whether that value is valid (false means the iteration is over),
just like a channel receive or map lookup:
<pre>pull func() (V, bool)
</pre>
<p>
If we want to stop the push iteration early, we need some
way to signal that, so <code>Pull</code> will return not just the pull
function but also a stop function:
<pre>stop func()
</pre>
<p>
Putting those together, the full signature of <code>Pull</code> is:
<pre>func Pull[V any](push func(yield func(V) bool)) (pull func() (V, bool), stop func()) {
...
}
</pre>
<p>
The first thing <code>Pull</code> needs to do is start a coroutine to run the push iterator,
and to do that it needs a wrapper function with the right type,
namely one that takes a <code>more bool</code> to match the bool result from <code>yield</code>,
and that returns a final <code>V</code>.
The <code>pull</code> function can call <code>resume(true)</code>, while the <code>stop</code> function can call <code>resume(false)</code>:
<pre>func Pull[V any](push func(yield func(V) bool)) (pull func() (V, bool), stop func()) {
    copush := func(more bool, yield func(V) bool) V {
        if more {
            push(yield)
        }
        var zero V
        return zero
    }
    resume := coro.New(copush)
    pull = func() (V, bool) {
        return resume(true)
    }
    stop = func() {
        resume(false)
    }
    return pull, stop
}
</pre>
<p>
That's the complete implementation.
With the power of <code>coro.New</code>, it took very little code and effort to build a nice iterator converter.
<p>
To use <code>coro.Pull</code>, we need to redefine the tree's <code>All</code> method
to expect and use the new <code>bool</code> result from <code>yield</code>:
<pre>func (t *Tree[V]) All(yield func(v V) bool) {
    t.all(yield)
}
func (t *Tree[V]) all(yield func(v V) bool) bool {
    return t == nil ||
        t.Left.all(yield) && yield(t.Value) && t.Right.all(yield)
}
</pre>
<p>
Now we have everything we need to write a tree comparison function in Go (<a href="https://go.dev/play/p/hniFxnbXTgH">full code</a>):
<pre>func cmp[V comparable](t1, t2 *Tree[V]) bool {
    next1, stop1 := coro.Pull(t1.All)
    next2, stop2 := coro.Pull(t2.All)
    defer stop1()
    defer stop2()
    for {
        v1, ok1 := next1()
        v2, ok2 := next2()
        if v1 != v2 || ok1 != ok2 {
            return false
        }
        if !ok1 && !ok2 {
            return true
        }
    }
}
</pre>
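<p>
Putting the pieces in one file gives a runnable sketch of the comparison.
The <code>Tree</code> field layout and the <code>T</code> constructor are assumptions for illustration;
<code>New</code>, <code>Pull</code>, <code>All</code>, and <code>cmp</code> follow the definitions above,
with <code>coro.New</code> and <code>coro.Pull</code> written as <code>New</code> and <code>Pull</code> because everything lives in package <code>main</code>.

```go
package main

import "fmt"

func New[In, Out any](f func(in In, yield func(Out) In) Out) (resume func(In) (Out, bool)) {
    cin := make(chan In)
    cout := make(chan Out)
    running := true
    resume = func(in In) (out Out, ok bool) {
        if !running {
            return
        }
        cin <- in
        out = <-cout
        return out, running
    }
    yield := func(out Out) In {
        cout <- out
        return <-cin
    }
    go func() {
        out := f(<-cin, yield)
        running = false
        cout <- out
    }()
    return resume
}

// Pull converts a push iterator into a pull function plus a stop function.
func Pull[V any](push func(yield func(V) bool)) (pull func() (V, bool), stop func()) {
    copush := func(more bool, yield func(V) bool) V {
        if more {
            push(yield)
        }
        var zero V
        return zero
    }
    resume := New(copush)
    pull = func() (V, bool) { return resume(true) }
    stop = func() { resume(false) }
    return pull, stop
}

// Tree is an assumed node layout matching the All method in the text.
type Tree[V any] struct {
    Left  *Tree[V]
    Value V
    Right *Tree[V]
}

func (t *Tree[V]) All(yield func(v V) bool) { t.all(yield) }

func (t *Tree[V]) all(yield func(v V) bool) bool {
    return t == nil ||
        t.Left.all(yield) && yield(t.Value) && t.Right.all(yield)
}

// cmp reports whether t1 and t2 hold the same in-order sequence.
func cmp[V comparable](t1, t2 *Tree[V]) bool {
    next1, stop1 := Pull(t1.All)
    next2, stop2 := Pull(t2.All)
    defer stop1()
    defer stop2()
    for {
        v1, ok1 := next1()
        v2, ok2 := next2()
        if v1 != v2 || ok1 != ok2 {
            return false
        }
        if !ok1 && !ok2 {
            return true
        }
    }
}

// T is a hypothetical constructor for int trees.
func T(l *Tree[int], v int, r *Tree[int]) *Tree[int] { return &Tree[int]{l, v, r} }

func main() {
    var e *Tree[int]
    t1 := T(T(T(e, 1, e), 2, T(e, 3, e)), 4, T(e, 5, e))
    t2 := T(e, 1, T(e, 2, T(T(e, 3, e), 4, T(e, 5, e))))
    t3 := T(e, 1, T(e, 2, T(T(e, 3, e), 4, T(e, 6, e))))
    fmt.Println(cmp(t1, t2)) // true: both yield 1 2 3 4 5
    fmt.Println(cmp(t1, t3)) // false: the last values differ
}
```

<p>
In the second comparison both coroutines are still suspended in <code>yield</code> when <code>cmp</code> returns;
the deferred <code>stop</code> calls resume them with false so that each push iteration unwinds and its goroutine exits.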
<p>
<a class=anchor href="#panic"><h2 id="panic">Propagating Panics</h2></a>
<p>
Another improvement is to pass panics from a coroutine back to its caller,
meaning the coroutine that most recently called <code>resume</code> to run it
(and is therefore sitting blocked in <code>resume</code> waiting for it).
Some mechanism to inform one goroutine when another panics is a very common request,
but in general that can be difficult, because we don't know which
goroutine to inform and whether it is ready to hear that message.
In the case of coroutines, we have the caller blocked waiting for news,
so it makes sense to deliver news of the panic.
<p>
To do that, we can add a <code>defer</code> to catch a panic in the new coroutine
and trigger it again in the <code>resume</code> that is waiting.
<pre>type msg[T any] struct {
    panic any
    val   T
}
<span style="color: #aaa"></span>
<span style="color: #aaa">func New[In, Out any](f func(in In, yield func(Out) In) Out) (resume func(In) (Out, bool)) {</span>
<span style="color: #aaa">    cin := make(chan In)</span>
    cout := make(chan msg[Out])
<span style="color: #aaa">    running := true</span>
<span style="color: #aaa">    resume = func(in In) (out Out, ok bool) {</span>
<span style="color: #aaa">        if !running {</span>
<span style="color: #aaa">            return</span>
<span style="color: #aaa">        }</span>
<span style="color: #aaa">        cin <- in</span>
        m := <-cout
        if m.panic != nil {
            panic(m.panic)
        }
        return m.val, running
<span style="color: #aaa">    }</span>
<span style="color: #aaa">    yield := func(out Out) In {</span>
        cout <- msg[Out]{val: out}
<span style="color: #aaa">        return <-cin</span>
<span style="color: #aaa">    }</span>
<span style="color: #aaa">    go func() {</span>
        defer func() {
            if running {
                running = false
                cout <- msg[Out]{panic: recover()}
            }
        }()
<span style="color: #aaa">        out := f(<-cin, yield)</span>
<span style="color: #aaa">        running = false</span>
        cout <- msg[Out]{val: out}
<span style="color: #aaa">    }()</span>
<span style="color: #aaa">    return resume</span>
<span style="color: #aaa">}</span>
</pre>
<p>
Let's test it out (<a href="https://go.dev/play/p/Sihm8KVlTIB">full code</a>):
<pre>func main() {
    defer func() {
        if e := recover(); e != nil {
            fmt.Println("main panic:", e)
            panic(e)
        }
    }()
    next, _ := coro.Pull(func(yield func(string) bool) {
        yield("hello")
        panic("world")
    })
    for {
        fmt.Println(next())
    }
}
</pre>
<p>
The new coroutine yields <code>hello</code> and then panics <code>world</code>.
That panic is propagated back to the main goroutine,
which prints the value and repanics.
We can see that the panic appears to originate in the call to <code>resume</code>:
<pre>% go run coro.go
hello true
main panic: world
panic: world [recovered]
panic: world
goroutine 1 [running]:
main.main.func1()
/tmp/coro.go:9 +0x95
panic({0x108f360?, 0x10c2cf0?})
/go/src/runtime/panic.go:1003 +0x225
main.coro_New[...].func1()
/tmp/coro.go.go:55 +0x91
main.Pull[...].func2()
/tmp/coro.go.go:31 +0x1c
main.main()
/tmp/coro.go.go:17 +0x52
exit status 2
%
</pre>
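<p>
Because the panic resurfaces in the goroutine that called <code>resume</code>, that caller can handle it with an
ordinary <code>recover</code>. The sketch below shows one way to do so; the <code>try</code> helper is hypothetical,
and <code>New</code> and <code>msg</code> are copied from the panic-propagating version above.

```go
package main

import "fmt"

type msg[T any] struct {
    panic any
    val   T
}

func New[In, Out any](f func(in In, yield func(Out) In) Out) (resume func(In) (Out, bool)) {
    cin := make(chan In)
    cout := make(chan msg[Out])
    running := true
    resume = func(in In) (out Out, ok bool) {
        if !running {
            return
        }
        cin <- in
        m := <-cout
        if m.panic != nil {
            panic(m.panic)
        }
        return m.val, running
    }
    yield := func(out Out) In {
        cout <- msg[Out]{val: out}
        return <-cin
    }
    go func() {
        defer func() {
            if running {
                running = false
                cout <- msg[Out]{panic: recover()}
            }
        }()
        out := f(<-cin, yield)
        running = false
        cout <- msg[Out]{val: out}
    }()
    return resume
}

// try is a hypothetical helper that calls resume and converts a
// propagated panic into an error instead of crashing the caller.
func try[In, Out any](resume func(In) (Out, bool), in In) (out Out, ok bool, err error) {
    defer func() {
        if e := recover(); e != nil {
            err = fmt.Errorf("coroutine panicked: %v", e)
        }
    }()
    out, ok = resume(in)
    return out, ok, nil
}

func main() {
    resume := New(func(_ int, yield func(string) int) string {
        yield("hello")
        panic("world")
    })
    for i := 0; i < 3; i++ {
        s, ok, err := try(resume, 0)
        fmt.Printf("%q %v %v\n", s, ok, err)
    }
    // Output:
    // "hello" true <nil>
    // "" false coroutine panicked: world
    // "" false <nil>
}
```

<p>
The third call shows that a coroutine that died from a panic behaves like any other finished coroutine: <code>resume</code> returns a zero value and false.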
<a class=anchor href="#cancel"><h2 id="cancel">Cancellation</h2></a>
<p>
Panic propagation takes care of telling the caller about an early coroutine exit,
but what about telling a coroutine about an early caller exit?
Analogous to the <code>stop</code> function in the pull iterator,
we need some way to signal to the coroutine that it's no longer needed,
perhaps because the caller is panicking, or perhaps because the caller
is simply returning.
<p>
To do that, we can change <code>coro.New</code> to return not just <code>resume</code> but
also a <code>cancel</code> func.
Calling <code>cancel</code> will be like <code>resume</code>, except that <code>yield</code> panics instead of returning a value.
If a coroutine panics in a different way during cancellation,
we want <code>cancel</code> to propagate that panic, just as <code>resume</code> does.
But of course we don't want <code>cancel</code> to propagate its own panic,
so we create a unique panic value we can check for.
We also have to handle a cancellation before <code>f</code> begins.
<pre>var ErrCanceled = errors.New("coroutine canceled")
<span style="color: #aaa"></span>
func New[In, Out any](f func(in In, yield func(Out) In) Out) (resume func(In) (Out, bool), cancel func()) {
    cin := make(chan msg[In])
<span style="color: #aaa">    cout := make(chan msg[Out])</span>
<span style="color: #aaa">    running := true</span>
<span style="color: #aaa">    resume = func(in In) (out Out, ok bool) {</span>
<span style="color: #aaa">        if !running {</span>
<span style="color: #aaa">            return</span>
<span style="color: #aaa">        }</span>
        cin <- msg[In]{val: in}
<span style="color: #aaa">        m := <-cout</span>
<span style="color: #aaa">        if m.panic != nil {</span>
<span style="color: #aaa">            panic(m.panic)</span>
<span style="color: #aaa">        }</span>
<span style="color: #aaa">        return m.val, running</span>
<span style="color: #aaa">    }</span>
    cancel = func() {
        if !running {
            return // coroutine already finished: nothing to cancel
        }
        e := fmt.Errorf("%w", ErrCanceled) // unique wrapper
        cin <- msg[In]{panic: e}
        m := <-cout
        if m.panic != nil && m.panic != e {
            panic(m.panic)
        }
    }
    yield := func(out Out) In {
        cout <- msg[Out]{val: out}
        m := <-cin
        if m.panic != nil {
            panic(m.panic)
        }
        return m.val
<span style="color: #aaa">    }</span>
<span style="color: #aaa">    go func() {</span>
<span style="color: #aaa">        defer func() {</span>
<span style="color: #aaa">            if running {</span>
<span style="color: #aaa">                running = false</span>
<span style="color: #aaa">                cout <- msg[Out]{panic: recover()}</span>
<span style="color: #aaa">            }</span>
<span style="color: #aaa">        }()</span>
        var out Out
        m := <-cin
        if m.panic == nil {
            out = f(m.val, yield)
        }
        running = false
        cout <- msg[Out]{val: out}
    }()
    return resume, cancel
<span style="color: #aaa">}</span>
</pre>
<p>
We could change <code>Pull</code> to use panics to cancel iterators as well,
but in that context the explicit <code>bool</code> seems clearer,
especially since stopping an iterator is unexceptional.
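<p>
The practical benefit of cancelling by panic is that deferred cleanup inside the coroutine runs without any
cooperation from its loop. Here is a minimal sketch: the <code>demo</code> function is hypothetical, and the
copy of <code>New</code> carries one assumed addition, a <code>running</code> check at the top of <code>cancel</code>
so that cancelling a finished coroutine returns instead of blocking.

```go
package main

import (
    "errors"
    "fmt"
)

var ErrCanceled = errors.New("coroutine canceled")

type msg[T any] struct {
    panic any
    val   T
}

func New[In, Out any](f func(in In, yield func(Out) In) Out) (resume func(In) (Out, bool), cancel func()) {
    cin := make(chan msg[In])
    cout := make(chan msg[Out])
    running := true
    resume = func(in In) (out Out, ok bool) {
        if !running {
            return
        }
        cin <- msg[In]{val: in}
        m := <-cout
        if m.panic != nil {
            panic(m.panic)
        }
        return m.val, running
    }
    cancel = func() {
        if !running {
            return // assumption: a late cancel is a no-op
        }
        e := fmt.Errorf("%w", ErrCanceled) // unique wrapper
        cin <- msg[In]{panic: e}
        m := <-cout
        if m.panic != nil && m.panic != e {
            panic(m.panic)
        }
    }
    yield := func(out Out) In {
        cout <- msg[Out]{val: out}
        m := <-cin
        if m.panic != nil {
            panic(m.panic)
        }
        return m.val
    }
    go func() {
        defer func() {
            if running {
                running = false
                cout <- msg[Out]{panic: recover()}
            }
        }()
        var out Out
        m := <-cin
        if m.panic == nil {
            out = f(m.val, yield)
        }
        running = false
        cout <- msg[Out]{val: out}
    }()
    return resume, cancel
}

// demo pulls two values from an endless coroutine, cancels it, and
// reports whether the coroutine's deferred cleanup ran.
func demo() (values []int, cleaned bool) {
    resume, cancel := New(func(_ int, yield func(int) int) int {
        defer func() { cleaned = true }() // runs while the cancel panic unwinds f
        for i := 0; ; i++ {
            yield(i)
        }
    })
    for n := 0; n < 2; n++ {
        v, _ := resume(0)
        values = append(values, v)
    }
    cancel()
    return values, cleaned
}

func main() {
    fmt.Println(demo()) // [0 1] true
}
```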
<a class=anchor href="#sieve2"><h2 id="sieve2">Example: Prime Sieve Revisited</h2></a>
<p>
Let's look at how panic propagation and cancellation make cleanup of the prime sieve “just work”.
First let's update the sieve to use the new API.
The <code>counter</code> and <code>filter</code> functions are already
“one-line” <code>return coro.New(...)</code> calls.
They change signature to include the additional cancel func returned from <code>coro.New</code>:
<pre>func counter() (func(bool) (int, bool), func()) {
    return coro.New(...)
}
func filter(p int, next func(bool) (int, bool)) (func(bool) (int, bool), func()) {
    return coro.New(...)
}
</pre>
<p>
Then let's convert the <code>main</code> function to be a <code>primes</code> function that prints <code>n</code> primes (<a href="https://go.dev/play/p/XWV8ACRKjDS">full code</a>):
<pre>func primes(n int) {
    next, cancel := counter()
    defer cancel()
    for i := 0; i < n; i++ {
        p, _ := next(true)
        fmt.Println(p)
        next, cancel = filter(p, next)
        defer cancel()
    }
}
</pre>
<p>
When this function runs, after it has gotten <code>n</code> primes, it returns.
Each of the deferred <code>cancel</code> calls cleans up the
coroutines that were created.
And what if one of the coroutines has a bug and panics?
If the coroutine was resumed by a <code>next</code> call in <code>primes</code>,
then the panic comes back to <code>primes</code>, and <code>primes</code>'s deferred
<code>cancel</code> calls clean up all the other coroutines.
If the coroutine was resumed by a <code>next</code> call in a <code>filter</code> coroutine,
then the panic will propagate up to the waiting <code>filter</code> coroutine
and then the next waiting <code>filter</code> coroutine, and so on, until it
gets to the <code>p, _ := next(true)</code> in <code>primes</code>, which will
again clean up the remaining coroutines.
<a class=anchor href="#api"><h2 id="api">API</h2></a>
<p>
The API we've arrived at is:<blockquote>
<p>
New creates a new, paused coroutine ready to run the function f.
The new coroutine is a goroutine that never runs on its own:
it only runs while some other goroutine invokes and waits for it,
by calling resume or cancel.
<p>
A goroutine can pause itself and switch to the new coroutine by calling resume(in).
The first call to resume starts f(in, yield).
Resume blocks while f runs, until either f calls yield(out) or returns out.
When f calls yield, yield blocks and resume returns out, true.
When f returns, resume returns out, false.
When resume has returned due to a yield, the next resume(in)
switches back to f, with yield returning in.
<p>
Cancel stops the execution of f and shuts down the coroutine.
If resume has not been called,
then f does not run at all.
Otherwise, cancel causes the blocked yield call
to panic with an error satisfying errors.Is(err, ErrCanceled).
<p>
If f panics and does not recover the panic,
the panic is stopped in f's coroutine and restarted in the goroutine
waiting for f, by causing the blocked resume or cancel that is waiting
to re-panic with the same panic value.
Cancel does not re-panic when f's panic is one that
cancel itself triggered.
<p>
Once f has returned or panicked, the coroutine no longer exists.
Subsequent calls to resume return zero, false.
Subsequent calls to cancel simply return.
<p>
The functions resume, cancel, and yield can be passed between
and used by different goroutines, in effect dynamically changing
which goroutine is “the coroutine.”
Although New creates a new goroutine, it also establishes an
invariant that one goroutine is always blocked,
either in resume, cancel, yield, or (right after New)
waiting for the resume that will call f.
This invariant holds until f returns, at which point the
new goroutine is shut down.
The net result is that coro.New creates new concurrency in the program
without any new parallelism.
<p>
If multiple goroutines call resume or cancel, those calls are serialized.
Similarly, if multiple goroutines call yield, those calls are serialized.</blockquote>
<pre>func New[In, Out any](f func(in In, yield func(Out) In) Out) (resume func(In) (Out, bool), cancel func())
</pre>
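<p>
To make the resume and yield hand-off concrete, here is a minimal channel-based sketch.
It is a deliberately reduced version of the API above: it leaves out <code>cancel</code> and panic propagation,
it does not serialize concurrent callers, and the helper type <code>msg</code> and the example function <code>runningSum</code>
are illustrative names that are not part of the API.

```go
package main

import "fmt"

// msg carries a value from the coroutine back to the resumer,
// along with whether f has returned. (Hypothetical helper type.)
type msg[T any] struct {
	val  T
	done bool
}

// New is a simplified sketch: resume only, no cancel, no panic handling.
func New[In, Out any](f func(in In, yield func(Out) In) Out) (resume func(In) (Out, bool)) {
	cin := make(chan In)
	cout := make(chan msg[Out])
	running := true
	resume = func(in In) (out Out, ok bool) {
		if !running {
			return // zero, false once f has returned
		}
		cin <- in   // wake the coroutine
		m := <-cout // block until it yields or returns
		if m.done {
			running = false
		}
		return m.val, !m.done
	}
	yield := func(out Out) In {
		cout <- msg[Out]{val: out} // wake the resumer
		return <-cin               // block until the next resume
	}
	go func() {
		out := f(<-cin, yield) // the first resume starts f
		cout <- msg[Out]{val: out, done: true}
	}()
	return resume
}

// runningSum yields the running total of its inputs and
// returns the final total when it is resumed with 0.
func runningSum(in int, yield func(int) int) int {
	total := 0
	for in != 0 {
		total += in
		in = yield(total)
	}
	return total
}

func main() {
	resume := New(runningSum)
	fmt.Println(resume(1)) // 1 true
	fmt.Println(resume(2)) // 3 true
	fmt.Println(resume(0)) // 3 false
	fmt.Println(resume(5)) // 0 false
}
```

<p>
At every moment exactly one of the two goroutines is running and the other is blocked on a channel,
which is the invariant the description above relies on.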
<a class=anchor href="#efficiency"><h2 id="efficiency">Efficiency</h2></a>
<p>
As I said at the start, while it's important to have a definition of coroutines
that can be understood by reference to a pure Go implementation,
I believe we should use an optimized runtime implementation.
On my 2019 MacBook Pro, passing values back and
forth using the channel-based <code>coro.New</code> in this post requires
approximately 190ns per switch, or 380ns per value in <code>coro.Pull</code>.
Remember that <code>coro.Pull</code> would not be the standard way
to use an iterator: the standard way would be to invoke the iterator
directly, which has no coroutine overhead at all.
You only need <code>coro.Pull</code> when you want to process
iterated values incrementally, not using a single for loop.
Even so, we want to make <code>coro.Pull</code> as fast as we can.
<p>
First I tried having the compiler mark send-receive pairs
and leave hints for the runtime to fuse them into a single operation.
That would let the channel runtime bypass the scheduler
and jump directly to the other coroutine.
This implementation requires
about 118ns per switch, or 236ns per pulled value (38% faster).
That's better, but it's still not as fast as I would like.
The full generality of channels is adding too much overhead.
<p>
Next I added a direct coroutine switch to the runtime,
avoiding channels entirely.
That cuts the coroutine switch to three atomic compare-and-swaps
(one in the coroutine data structure, one for the scheduler status
of the blocking coroutine, and one for the scheduler status of the resuming coroutine),
which I believe is optimal given the safety invariants that must be maintained.
That implementation takes 20ns per switch, or 40ns per pulled value.
This is about 10X faster than the original channel implementation.
Perhaps more importantly, 40ns per pulled value seems small enough
in absolute terms not to be a bottleneck for code that needs <code>coro.Pull</code>.
Storing Data in Control Flowtag:research.swtch.com,2012:research.swtch.com/pcdata2023-07-11T14:00:00-04:002023-07-11T14:02:00-04:00Write programs, not simulations of programs.
<p>
A decision that arises over and over when designing concurrent programs
is whether to represent program state in control flow or as data.
This post is about what that decision means and how to approach it.
Done well, taking program state stored in data
and storing it instead in control flow can make programs
much clearer and more maintainable than they otherwise would be.
<p>
Before saying much more, it’s important to note that
<a href="https://www.youtube.com/watch?v=oV9rvDllKEg">concurrency is not parallelism</a>:
<ul>
<li>
<p>
Concurrency is about <i>how you write programs</i>,
about being able to compose independently executing control flows,
whether you call them processes or threads or goroutines,
so that your program can be <i>dealing with</i> lots of things at once without turning into a giant mess.
<li>
<p>
On the other hand, parallelism is about <i>how you execute programs</i>,
allowing multiple computations to run simultaneously,
so that your program can be <i>doing</i> lots of things at once efficiently.</ul>
<p>
Concurrency lends itself naturally to parallel execution,
but the focus in this post is on how to use concurrency
to write cleaner programs, not faster ones.
<p>
The difference between concurrent programs and non-concurrent programs
is that concurrent programs can be written as if they are executing multiple
independent control flows at the same time.
The name for the smaller control flows varies by language:
thread, task, process, fiber, coroutine, goroutine, and so on.
No matter the name, the fundamental point for this post
is that writing a program in terms of multiple independently executing control flows
allows you to store program state in the execution state of one or more
of those control flows, specifically in the program counter
(which line is executing in that piece)
and on the stack.
Control flow state can always be maintained as explicit data instead,
but then the explicit data form is essentially simulating the control flow.
Most of the time, using the control flow features built into a programming language
is easier to understand, reason about, and maintain than simulating them
in data structures.
<p>
The rest of this post illustrates the rather abstract claims I’ve been making
about storing data in control flow by walking through some
concrete examples.
They happen to be written in <a href="https://go.dev/">Go</a>,
but the ideas apply to any language that supports writing concurrent programs,
including essentially every modern language.
<a class=anchor href="#step"><h2 id="step">A Step-by-Step Example</h2></a>
<p>
Here is a seemingly trivial problem
that demonstrates what it means to store program state in control flow.
Suppose we are reading characters from a file and want to scan over a C-style double-quoted string.
In this case, we have a non-parallel program.
There is no opportunity for parallelism here,
but as we will see, concurrency can still play a useful part.
<p>
If we don’t worry about checking the exact escape sequences in the string,
it suffices to match the regular expression <code>"([^"\\]|\\.)*"</code>,
which matches a double quote, then a sequence of zero or more characters,
and then another double quote.
Between the quotes, a character is anything that’s not a quote or backslash,
or else a backslash followed by anything (including a quote or backslash).
<p>
Every regular expression can be compiled into a finite automaton or state machine,
so we might use a tool to turn that specification into this Go code:
<pre>state := 0
for {
    c := read()
    switch state {
    case 0:
        if c != '"' {
            return false
        }
        state = 1
    case 1:
        if c == '"' {
            return true
        }
        if c == '\\' {
            state = 2
        } else {
            state = 1
        }
    case 2:
        state = 1
    }
}
</pre>
<p>
The code has a single variable named <code>state</code> that represents the state of the automaton.
The for loop reads a character and updates the state, over and over,
until it finds either the end of the string or a syntax error.
This is the kind of code that a program would write and that only a program could love.
It’s difficult for people to read, and it will be difficult for people to maintain.
<p>
The main reason this program is so opaque is that its program state is stored as data,
specifically in the variable named <code>state</code>.
When it’s possible to store state in code instead, that often leads to a clearer program.
To see this, let’s transform the program, one small step at a time,
into an equivalent but much more understandable version.
<p>
<p>
We can start by duplicating the <code>read</code> calls into each case of the switch:
<pre><span style="color: #aaa">state := 0                      state := 0</span>
<span style="color: #aaa">for {                           for {</span>
    c := read()
<span style="color: #aaa">    switch state {                  switch state {</span>
<span style="color: #aaa">    case 0:                         case 0:</span>
                                        c := read()
<span style="color: #aaa">        if c != '"' {                   if c != '"' {</span>
<span style="color: #aaa">            return false                    return false</span>
<span style="color: #aaa">        }                               }</span>
<span style="color: #aaa">        state = 1                       state = 1</span>
<span style="color: #aaa">    case 1:                         case 1:</span>
                                        c := read()
<span style="color: #aaa">        if c == '"' {                   if c == '"' {</span>
<span style="color: #aaa">            return true                     return true</span>
<span style="color: #aaa">        }                               }</span>
<span style="color: #aaa">        if c == '\\' {                  if c == '\\' {</span>
<span style="color: #aaa">            state = 2                       state = 2</span>
<span style="color: #aaa">        } else {                        } else {</span>
<span style="color: #aaa">            state = 1                       state = 1</span>
<span style="color: #aaa">        }                               }</span>
<span style="color: #aaa">    case 2:                         case 2:</span>
                                        c := read()
<span style="color: #aaa">        state = 1                       state = 1</span>
<span style="color: #aaa">    }                               }</span>
<span style="color: #aaa">}                               }</span>
</pre>
<p>
(In this and all the displays that follow, the old program is on the left, the new program
is on the right, and lines that haven’t changed are printed in gray text.)
<p>
<p>
Now, instead of writing to <code>state</code> and then immediately going around the for loop again
to look up what to do in that state, we can use code labels and goto statements:
<pre>state := 0                      state0:
for {
    switch state {
    case 0:
<span style="color: #aaa">        c := read()                 c := read()</span>
<span style="color: #aaa">        if c != '"' {               if c != '"' {</span>
<span style="color: #aaa">            return false                return false</span>
<span style="color: #aaa">        }                           }</span>
        state = 1                   goto state1
    case 1:                     state1:
<span style="color: #aaa">        c := read()                 c := read()</span>
<span style="color: #aaa">        if c == '"' {               if c == '"' {</span>
<span style="color: #aaa">            return true                 return true</span>
<span style="color: #aaa">        }                           }</span>
<span style="color: #aaa">        if c == '\\' {              if c == '\\' {</span>
            state = 2                   goto state2
<span style="color: #aaa">        } else {                    } else {</span>
            state = 1                   goto state1
<span style="color: #aaa">        }                           }</span>
    case 2:                     state2:
        c := read()                 read()
        state = 1                   goto state1
    }
}
</pre>
<p>
Then we can simplify the program further.
The <code>goto</code> <code>state1</code> right before the <code>state1</code> label is a no-op and can be deleted.
And we can see that there’s only one way to get to state2,
so we might as well replace the <code>goto</code> <code>state2</code> with the actual code from state2:
<pre><span style="color: #aaa">state0:                         state0:</span>
<span style="color: #aaa">    c := read()                     c := read()</span>
<span style="color: #aaa">    if c != '"' {                   if c != '"' {</span>
<span style="color: #aaa">        return false                    return false</span>
<span style="color: #aaa">    }                               }</span>
    goto state1
<span style="color: #aaa">state1:                         state1:</span>
<span style="color: #aaa">    c := read()                     c := read()</span>
<span style="color: #aaa">    if c == '"' {                   if c == '"' {</span>
<span style="color: #aaa">        return true                     return true</span>
<span style="color: #aaa">    }                               }</span>
<span style="color: #aaa">    if c == '\\' {                  if c == '\\' {</span>
        goto state2
    } else {
        goto state1
    }
state2:
<span style="color: #aaa">    read()                              read()</span>
<span style="color: #aaa">    goto state1                         goto state1</span>
                                    } else {
                                        goto state1
                                    }
</pre>
<p>
Then we can factor the “goto state1” out of both branches of the if statement.
<pre><span style="color: #aaa">state0:                         state0:</span>
<span style="color: #aaa">    c := read()                     c := read()</span>
<span style="color: #aaa">    if c != '"' {                   if c != '"' {</span>
<span style="color: #aaa">        return false                    return false</span>
<span style="color: #aaa">    }                               }</span>
<span style="color: #aaa"> </span>
<span style="color: #aaa">state1:                         state1:</span>
<span style="color: #aaa">    c := read()                     c := read()</span>
<span style="color: #aaa">    if c == '"' {                   if c == '"' {</span>
<span style="color: #aaa">        return true                     return true</span>
<span style="color: #aaa">    }                               }</span>
<span style="color: #aaa">    if c == '\\' {                  if c == '\\' {</span>
<span style="color: #aaa">        read()                          read()</span>
        goto state1                 }
    } else {                        goto state1
        goto state1
    }
</pre>
<p>
Then we can drop the unused <code>state0</code> label and replace the <code>state1</code> loop with an actual loop.
Now we have something that looks like a real program:
<pre>state0:
<span style="color: #aaa">    c := read()                 c := read()</span>
<span style="color: #aaa">    if c != '"' {               if c != '"' {</span>
<span style="color: #aaa">        return false                return false</span>
<span style="color: #aaa">    }                           }</span>
<span style="color: #aaa"> </span>
state1:                         for {
<span style="color: #aaa">    c := read()                     c := read()</span>
<span style="color: #aaa">    if c == '"' {                   if c == '"' {</span>
<span style="color: #aaa">        return true                     return true</span>
<span style="color: #aaa">    }                               }</span>
<span style="color: #aaa">    if c == '\\' {                  if c == '\\' {</span>
<span style="color: #aaa">        read()                          read()</span>
<span style="color: #aaa">    }                               }</span>
    goto state1                 }
</pre>
<p>
We can simplify a little further, eliminating some unnecessary variables,
and we can make the check for the final quote (<code>c == '"'</code>) be the loop terminator.
<pre>c := read()                     if read() != '"' {
if c != '"' {
<span style="color: #aaa">    return false                    return false</span>
<span style="color: #aaa">}                               }</span>
<span style="color: #aaa"> </span>
for {                           var c byte
    c := read()                 for c != '"' {
    if c == '"' {                   c = read()
        return true
    }
<span style="color: #aaa">    if c == '\\' {                  if c == '\\' {</span>
<span style="color: #aaa">        read()                          read()</span>
<span style="color: #aaa">    }                               }</span>
<span style="color: #aaa">}                               }</span>
                                return true
</pre>
<p>
The final version is:
<pre>func parseQuoted(read func() byte) bool {
    if read() != '"' {
        return false
    }
    var c byte
    for c != '"' {
        c = read()
        if c == '\\' {
            read()
        }
    }
    return true
}
</pre>
<p>
Earlier I explained the regular expression by saying it
“matches a double quote, then a sequence of zero or more characters,
and then another double quote.
Between the quotes, a character is anything that’s not a quote or backslash,
or else a backslash followed by anything.”
It’s easy to see that this program does exactly that.
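<p>
As a sanity check on the transformation, the following sketch runs the original state machine and the final version side by side on a few inputs.
The names <code>parseQuotedSM</code> and <code>reader</code> are hypothetical helpers introduced only for this comparison,
and <code>reader</code> assumes the parser stops before the input runs out, since neither version handles end of file.

```go
package main

import "fmt"

// parseQuotedSM is the explicit state-machine version from the start of this section.
func parseQuotedSM(read func() byte) bool {
	state := 0
	for {
		c := read()
		switch state {
		case 0:
			if c != '"' {
				return false
			}
			state = 1
		case 1:
			if c == '"' {
				return true
			}
			if c == '\\' {
				state = 2
			} else {
				state = 1
			}
		case 2:
			state = 1
		}
	}
}

// parseQuoted is the final control-flow version.
func parseQuoted(read func() byte) bool {
	if read() != '"' {
		return false
	}
	var c byte
	for c != '"' {
		c = read()
		if c == '\\' {
			read() // skip the escaped character
		}
	}
	return true
}

// reader returns a read function over s.
// It panics if the parser reads past the end of s.
func reader(s string) func() byte {
	i := 0
	return func() byte {
		c := s[i]
		i++
		return c
	}
}

func main() {
	for _, in := range []string{`"hello"`, `"a\"b"`, `"\\"`, `""`, `x"`} {
		fmt.Printf("%-10s sm=%v cf=%v\n", in,
			parseQuotedSM(reader(in)), parseQuoted(reader(in)))
	}
}
```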
<p>
Hand-written programs can have opportunities to use control flow too.
For example, here is a version that a person might have written by hand:
<pre>if read() != '"' {
    return false
}
inEscape := false
for {
    c := read()
    if inEscape {
        inEscape = false
        continue
    }
    if c == '"' {
        return true
    }
    if c == '\\' {
        inEscape = true
    }
}
</pre>
<p>
The same kinds of small steps can be used to convert the boolean variable
<code>inEscape</code> from data to control flow,
ending at the same cleaned up version.
<p>
<p>
Either way, the <code>state</code> variable in the original is now implicitly represented
by the program counter, meaning which part of the program is executing.
The comments in this version indicate the implicit value of the original’s
<code>state</code> (or <code>inEscape</code>) variables:
<pre><span style="color: #aaa">func parseQuoted(read func() byte) bool {</span>
    // state == 0
<span style="color: #aaa">    if read() != '"' {</span>
<span style="color: #aaa">        return false</span>
<span style="color: #aaa">    }</span>
<span style="color: #aaa"></span>
<span style="color: #aaa">    var c byte</span>
<span style="color: #aaa">    for c != '"' {</span>
        // state == 1 (inEscape = false)
<span style="color: #aaa">        c = read()</span>
<span style="color: #aaa">        if c == '\\' {</span>
            // state == 2 (inEscape = true)
<span style="color: #aaa">            read()</span>
<span style="color: #aaa">        }</span>
<span style="color: #aaa">    }</span>
<span style="color: #aaa">    return true</span>
<span style="color: #aaa">}</span>
</pre>
<p>
The original program was, in essence, <i>simulating</i> this control flow
using the explicit <code>state</code> variable as a program counter,
tracking which line was executing.
If a program can be converted to store explicit state in control flow instead,
then that explicit state was merely an awkward simulation of the control flow.
<a class=anchor href="#more"><h2 id="more">More Threads for More State</h2></a>
<p>
Before widespread support for concurrency,
that kind of awkward simulation was often necessary,
because a different part of the program wanted
to use the control flow instead.
<p>
For example, suppose the text being parsed is
the result of decoding base64 input, in which
sequences of four 6-bit characters (drawn from a 64-character alphabet)
decode to three 8-bit bytes.
The core of that decoder looks like:
<pre>for {
    c1, c2, c3, c4 := read(), read(), read(), read()
    b1, b2, b3 := decode(c1, c2, c3, c4)
    write(b1)
    write(b2)
    write(b3)
}
</pre>
<p>
If we want those <code>write</code> calls to feed into the parser from the
previous section, we need a parser that can be called with one byte at a time,
not one that demands a <code>read</code> callback.
This decode loop cannot be presented as a <code>read</code> callback
because it obtains 3 input bytes at a time and
uses its control flow to track which ones have been written.
Because the decoder is storing its own state
in its control flow, <code>parseQuoted</code> cannot.
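<p>
For readers who want to run the decoder loop, here is one possible <code>decode</code>, sketched under simplifying assumptions:
it uses the standard base64 alphabet, expects only valid characters, and ignores <code>=</code> padding.
The <code>alphabet</code> constant and the <code>decodeAll</code> driver are illustrative names, not part of the code above.

```go
package main

import (
	"fmt"
	"strings"
)

// alphabet is the standard base64 alphabet; a character's index is its 6-bit value.
const alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"

// decode packs four 6-bit values into 24 bits and splits them into three bytes.
// It assumes all four characters are in the alphabet.
func decode(c1, c2, c3, c4 byte) (b1, b2, b3 byte) {
	v := uint32(strings.IndexByte(alphabet, c1))<<18 |
		uint32(strings.IndexByte(alphabet, c2))<<12 |
		uint32(strings.IndexByte(alphabet, c3))<<6 |
		uint32(strings.IndexByte(alphabet, c4))
	return byte(v >> 16), byte(v >> 8), byte(v)
}

// decodeAll runs the decoder loop over in, four characters at a time.
// Which write comes next is tracked only by the program counter.
func decodeAll(in string, write func(byte)) {
	i := 0
	read := func() byte { c := in[i]; i++; return c }
	for i+4 <= len(in) {
		c1, c2, c3, c4 := read(), read(), read(), read()
		b1, b2, b3 := decode(c1, c2, c3, c4)
		write(b1)
		write(b2)
		write(b3)
	}
}

func main() {
	var out []byte
	decodeAll("TWFuTWFu", func(b byte) { out = append(out, b) })
	fmt.Printf("%s\n", out) // ManMan
}
```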
<p>
In a non-concurrent program, this base64 decoder and <code>parseQuoted</code>
would be at an impasse: one would have to give up
its use of control flow state and fall back to some kind
of simulated version instead.
<p>
<p>
To rewrite <code>parseQuoted</code>, we have to reintroduce the <code>state</code>
variable, which we can encapsulate in a struct with a <code>Write</code> method:
<pre>type parser struct {
    state int
}
func (p *parser) Init() {
    p.state = 0
}
func (p *parser) Write(c byte) Status {
    switch p.state {
    case 0:
        if c != '"' {
            return BadSyntax
        }
        p.state = 1
    case 1:
        if c == '"' {
            return Success
        }
        if c == '\\' {
            p.state = 2
        } else {
            p.state = 1
        }
    case 2:
        p.state = 1
    }
    return NeedMoreInput
}
</pre>
<p>
The <code>Init</code> method initializes the state,
and then each <code>Write</code> loads the state,
takes actions based on the state and the input byte,
and then saves the state back to the struct.
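<p>
A caller drives this parser by pushing bytes until the status changes.
The sketch below shows that usage; the <code>Status</code> type and its constants are never defined in this post, so the definitions here are assumptions,
as is the <code>feed</code> helper.

```go
package main

import "fmt"

// Status and its values are assumed definitions for illustration.
type Status int

const (
	NeedMoreInput Status = iota
	BadSyntax
	Success
)

type parser struct {
	state int
}

func (p *parser) Init() {
	p.state = 0
}

func (p *parser) Write(c byte) Status {
	switch p.state {
	case 0:
		if c != '"' {
			return BadSyntax
		}
		p.state = 1
	case 1:
		if c == '"' {
			return Success
		}
		if c == '\\' {
			p.state = 2
		} else {
			p.state = 1
		}
	case 2:
		p.state = 1
	}
	return NeedMoreInput
}

// feed pushes the bytes of s into a fresh parser, one at a time,
// and stops at the first status other than NeedMoreInput.
func feed(s string) Status {
	var p parser
	p.Init()
	for i := 0; i < len(s); i++ {
		if st := p.Write(s[i]); st != NeedMoreInput {
			return st
		}
	}
	return NeedMoreInput
}

func main() {
	fmt.Println(feed(`"a\"b"`) == Success)     // complete string
	fmt.Println(feed(`x`) == BadSyntax)        // no opening quote
	fmt.Println(feed(`"abc`) == NeedMoreInput) // still inside the string
}
```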
<p>
For <code>parseQuoted</code>, the state machine is simple enough that this may be completely fine.
But maybe the state machine is much more complex,
or maybe the algorithm is best expressed recursively.
In those cases, being passed an input sequence by the caller
one byte at a time means making all that state explicit in a
data structure simulating the original control flow.
<p>
Concurrency eliminates the contention between different parts of the
program over which gets to store state in control flow,
because now there can be multiple control flows.
<p>
<p>
Suppose we already have the <code>parseQuoted</code> function,
and it’s big and complicated
and tested and correct, and we don’t want to change it.
We can avoid editing that code at all by writing this wrapper:
<pre>type parser struct {
    c      chan byte
    status chan Status
}
func (p *parser) Init() {
    p.c = make(chan byte)
    p.status = make(chan Status)
    go p.run()
    <-p.status // always NeedMoreInput
}
func (p *parser) run() {
    if !parseQuoted(p.read) {
        p.status <- BadSyntax
    } else {
        p.status <- Success
    }
}
func (p *parser) read() byte {
    p.status <- NeedMoreInput
    return <-p.c
}
func (p *parser) Write(c byte) Status {
    p.c <- c
    return <-p.status
}
</pre>
<p>
Note the use of <code>parseQuoted</code>, completely unmodified, in the <code>run</code> method.
Now the base64 decoder can use <code>p.Write</code> and keep its program counter
and local variables.
<p>
The new goroutine that <code>Init</code> creates runs the <code>p.run</code> method,
which invokes the original <code>parseQuoted</code> function with an
appropriate implementation of <code>read</code>.
Before starting <code>p.run</code>, <code>Init</code> allocates two channels for communicating
between the <code>p.run</code> method, running in its own goroutine,
and whatever goroutine calls <code>p.Write</code> (such as the base64 decoder’s goroutine).
The channel <code>p.c</code> carries bytes from <code>Write</code> to <code>read</code>, and the channel <code>p.status</code>
carries status updates back.
Each time <code>parseQuoted</code> calls <code>read</code>, <code>p.read</code> sends <code>NeedMoreInput</code> on <code>p.status</code>
and waits for an input byte on <code>p.c</code>.
Each time <code>p.Write</code> is called, it does the opposite: it sends the input byte <code>c</code> on <code>p.c</code>
and then waits for and returns an updated status from <code>p.status</code>.
These two calls take turns, back and forth,
one executing and one waiting at any given moment.
<p>
To get this cycle going, the <code>Init</code> method does the initial receive from <code>p.status</code>,
which will correspond to the first <code>read</code> in <code>parseQuoted</code>.
The actual status for that first update is guaranteed to be <code>NeedMoreInput</code>
and is discarded.
To end the cycle, we assume that when <code>Write</code> returns <code>BadSyntax</code> or <code>Success</code>,
the caller knows not to call <code>Write</code> again.
If the caller incorrectly kept calling <code>Write</code>,
the send on <code>p.c</code> would block forever, since <code>parseQuoted</code> is done.
We would of course make that more robust in a production implementation.
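<p>
Putting the pieces together, here is a runnable sketch of the wrapper driving the unmodified <code>parseQuoted</code>.
As before, the <code>Status</code> definitions and the <code>feed</code> helper are assumptions added for illustration.
Note that when the input ends while the parser still wants more, the goroutine stays blocked in <code>read</code>; this sketch makes no attempt to clean it up.

```go
package main

import "fmt"

// Status and its values are assumed definitions for illustration.
type Status int

const (
	NeedMoreInput Status = iota
	BadSyntax
	Success
)

// parseQuoted is the unmodified control-flow parser.
func parseQuoted(read func() byte) bool {
	if read() != '"' {
		return false
	}
	var c byte
	for c != '"' {
		c = read()
		if c == '\\' {
			read()
		}
	}
	return true
}

type parser struct {
	c      chan byte
	status chan Status
}

func (p *parser) Init() {
	p.c = make(chan byte)
	p.status = make(chan Status)
	go p.run()
	<-p.status // always NeedMoreInput
}

func (p *parser) run() {
	if !parseQuoted(p.read) {
		p.status <- BadSyntax
	} else {
		p.status <- Success
	}
}

func (p *parser) read() byte {
	p.status <- NeedMoreInput
	return <-p.c
}

func (p *parser) Write(c byte) Status {
	p.c <- c
	return <-p.status
}

// feed pushes the bytes of s into a fresh parser and stops
// at the first status other than NeedMoreInput.
func feed(s string) Status {
	p := new(parser)
	p.Init()
	for i := 0; i < len(s); i++ {
		if st := p.Write(s[i]); st != NeedMoreInput {
			return st
		}
	}
	return NeedMoreInput // the parser goroutine is left blocked in read
}

func main() {
	fmt.Println(feed(`"a\"b"`) == Success)
	fmt.Println(feed(`x`) == BadSyntax)
	fmt.Println(feed(`"abc`) == NeedMoreInput)
}
```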
<p>
<p>
By creating a new control flow (a new goroutine), we were able to keep the
code-state-based implementation of <code>parseQuoted</code> as well as our code-state-based
base64 decoder.
We avoided having to understand the internals of either implementation.
In this example, both are trivial enough that rewriting one would not
have been a big deal,
but in a larger program, it could be a huge win to be able to write this kind of
adapter instead of having to make changes to existing code.
As we’ll discuss <a href="#limits">later</a>, the conversion is not entirely free – we need to
make sure the extra control flow gets cleaned up, and we need to think about
the cost of the context switches – but it may well still be a net win.
<a class=anchor href="#stack"><h2 id="stack">Store Stacks on the Stack</h2></a>
<p>
The base64 decoder’s control flow state included not just the program counter
but also two local variables.
Those would have to be pulled out into a struct if the decoder had to be changed
not to use control flow state.
Programs can use an arbitrary number of local variables by using their call stack.
For example, suppose we have a simple binary tree data structure:
<pre>type Tree[V any] struct {
    left  *Tree[V]
    right *Tree[V]
    value V
}
</pre>
<p>
If you can’t use control flow state, then to implement iteration over this tree,
you have to introduce an explicit “iterator”:
<pre>type Iter[V any] struct {
    stk []*Tree[V]
}
func (t *Tree[V]) NewIter() *Iter[V] {
    it := new(Iter[V])
    for ; t != nil; t = t.left {
        it.stk = append(it.stk, t)
    }
    return it
}
func (it *Iter[V]) Next() (v V, ok bool) {
    if len(it.stk) == 0 {
        return v, false
    }
    t := it.stk[len(it.stk)-1]
    v = t.value
    it.stk = it.stk[:len(it.stk)-1]
    for t = t.right; t != nil; t = t.left {
        it.stk = append(it.stk, t)
    }
    return v, true
}
</pre>
<p>
<p>
On the other hand, if you can use control flow state,
confident that other parts of the
program that need their own state can run in other control flows,
then you can implement iteration without an explicit iterator,
as a method that calls a yield function for each value:
<pre>func (t *Tree[V]) All(f func(v V)) {
    if t != nil {
        t.left.All(f)
        f(t.value)
        t.right.All(f)
    }
}
</pre>
<p>
The <code>All</code> method is obviously correct.
The correctness of the <code>Iter</code> version is much less obvious.
The simplest explanation is that <code>Iter</code> is simulating <code>All</code>.
The <code>NewIter</code> method’s loop that sets up <code>stk</code> is
simulating the recursion in <code>t.All(f)</code> down successive <code>t.left</code> branches.
<code>Next</code> pops and saves the <code>t</code> at the top of the stack
and then simulates the recursion in <code>t.right.All(f)</code> down successive <code>t.left</code> branches,
setting up for the next <code>Next</code>.
Finally it returns the value from the top-of-stack <code>t</code>,
simulating <code>f(value)</code>.
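<p>
One way to convince yourself that <code>Iter</code> really does simulate <code>All</code> is to run both on the same tree and compare the visit order.
This sketch does that; <code>sample</code>, <code>collectAll</code>, and <code>collectIter</code> are hypothetical helpers, not part of the code above.

```go
package main

import "fmt"

type Tree[V any] struct {
	left  *Tree[V]
	right *Tree[V]
	value V
}

type Iter[V any] struct {
	stk []*Tree[V]
}

func (t *Tree[V]) NewIter() *Iter[V] {
	it := new(Iter[V])
	for ; t != nil; t = t.left {
		it.stk = append(it.stk, t)
	}
	return it
}

func (it *Iter[V]) Next() (v V, ok bool) {
	if len(it.stk) == 0 {
		return v, false
	}
	t := it.stk[len(it.stk)-1]
	v = t.value
	it.stk = it.stk[:len(it.stk)-1]
	for t = t.right; t != nil; t = t.left {
		it.stk = append(it.stk, t)
	}
	return v, true
}

func (t *Tree[V]) All(f func(v V)) {
	if t != nil {
		t.left.All(f)
		f(t.value)
		t.right.All(f)
	}
}

// sample builds a small tree with 4 at the root,
// 2 (children 1, 3) on the left and 6 (left child 5) on the right.
func sample() *Tree[int] {
	return &Tree[int]{
		left:  &Tree[int]{left: &Tree[int]{value: 1}, right: &Tree[int]{value: 3}, value: 2},
		right: &Tree[int]{left: &Tree[int]{value: 5}, value: 6},
		value: 4,
	}
}

// collectAll gathers values using the recursive All method.
func collectAll(t *Tree[int]) []int {
	var out []int
	t.All(func(v int) { out = append(out, v) })
	return out
}

// collectIter gathers values using the explicit iterator.
func collectIter(t *Tree[int]) []int {
	var out []int
	it := t.NewIter()
	for {
		v, ok := it.Next()
		if !ok {
			return out
		}
		out = append(out, v)
	}
}

func main() {
	fmt.Println(collectAll(sample()))  // [1 2 3 4 5 6]
	fmt.Println(collectIter(sample())) // [1 2 3 4 5 6]
}
```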
<p>
We could write code like <code>NewIter</code> and argue its correctness
by explaining that it simulates a simple function like <code>All</code>.
I’d rather write <code>All</code> and stop there.
<a class=anchor href="#tree"><h2 id="tree">Comparing Binary Trees</h2></a>
<p>
One might argue that <code>NewIter</code> is better than <code>All</code>,
because it does not use any control flow state, so it can be
used in contexts that already use their control flows
to hold other information.
For example, what if we want to traverse two binary trees
at the same time, checking that they hold the same values
even if their internal structure differs?
With <code>NewIter</code>, this is straightforward:
<pre>func SameValues[V comparable](t1, t2 *Tree[V]) bool {
    it1 := t1.NewIter()
    it2 := t2.NewIter()
    for {
        v1, ok1 := it1.Next()
        v2, ok2 := it2.Next()
        if v1 != v2 || ok1 != ok2 {
            return false
        }
        if !ok1 && !ok2 {
            return true
        }
    }
}
</pre>
<p>
This program cannot be written as easily using <code>All</code>,
the argument goes, because <code>SameValues</code> wants to use
its own control flow (advancing two lists in lockstep)
that cannot be replaced by <code>All</code>’s control flow
(recursion over the tree).
But this is a false dichotomy, the same one we saw
with <code>parseQuoted</code> and the base64 decoder.
If two different functions have different demands
on control flow state, they can run in different control flows.
<p>
<p>
In our case, we can write this instead:
<pre><span style="color: #aaa">func SameValues[V comparable](t1, t2 *Tree[V]) bool {</span>
    c1 := make(chan V)
    c2 := make(chan V)
    go gopher(c1, t1.All)
    go gopher(c2, t2.All)
<span style="color: #aaa">    for {</span>
        v1, ok1 := <-c1
        v2, ok2 := <-c2
<span style="color: #aaa">        if v1 != v2 || ok1 != ok2 {</span>
<span style="color: #aaa">            return false</span>
<span style="color: #aaa">        }</span>
<span style="color: #aaa">        if !ok1 && !ok2 {</span>
<span style="color: #aaa">            return true</span>
<span style="color: #aaa">        }</span>
<span style="color: #aaa">    }</span>
<span style="color: #aaa">}</span>
<span style="color: #aaa"></span>
func gopher[V any](c chan<- V, all func(func(V))) {
    all(func(v V) { c <- v })
    close(c)
}
</pre>
<p>
The function <code>gopher</code> uses <code>all</code> to walk a tree, announcing
each value into a channel. After the walk, it closes the channel.
<p>
<code>SameValues</code> starts two concurrent gophers,
each of which walks one tree and announces the values into one channel.
Then <code>SameValues</code> does exactly the same loop as before
to compare the two value streams.
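<p>
Here is the gopher-based comparison as a complete program, applied to two trees with different shapes.
The type parameter is constrained to <code>comparable</code> so that <code>v1 != v2</code> compiles,
and <code>node</code> is a hypothetical constructor added to keep the example short.

```go
package main

import "fmt"

type Tree[V any] struct {
	left  *Tree[V]
	right *Tree[V]
	value V
}

func (t *Tree[V]) All(f func(v V)) {
	if t != nil {
		t.left.All(f)
		f(t.value)
		t.right.All(f)
	}
}

// gopher walks with all, sending each value on c, then closes c.
func gopher[V any](c chan<- V, all func(func(V))) {
	all(func(v V) { c <- v })
	close(c)
}

func SameValues[V comparable](t1, t2 *Tree[V]) bool {
	c1 := make(chan V)
	c2 := make(chan V)
	go gopher(c1, t1.All)
	go gopher(c2, t2.All)
	for {
		v1, ok1 := <-c1
		v2, ok2 := <-c2
		if v1 != v2 || ok1 != ok2 {
			return false // any gopher still mid-walk stays blocked
		}
		if !ok1 && !ok2 {
			return true
		}
	}
}

// node is a small constructor for int trees.
func node(l *Tree[int], v int, r *Tree[int]) *Tree[int] {
	return &Tree[int]{left: l, right: r, value: v}
}

func main() {
	balanced := node(node(nil, 1, nil), 2, node(nil, 3, nil))
	chain := node(nil, 1, node(nil, 2, node(nil, 3, nil)))
	other := node(nil, 1, node(nil, 2, node(nil, 4, nil)))
	fmt.Println(SameValues(balanced, chain)) // true
	fmt.Println(SameValues(balanced, other)) // false
}
```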
<p>
Note that <code>gopher</code> is not specific to binary trees in any way:
it applies to <i>any</i> iteration function.
That is, the general idea of starting a goroutine to run the <code>All</code> method
works for converting any code-state-based iteration into an
incremental iterator.
My next post, “<a href="coro">Coroutines for Go</a>,” expands on this idea.
<a class=anchor href="#limits"><h2 id="limits">Limitations</h2></a>
<p>
This approach of storing data in control flow is not a panacea.
Here are a few caveats:
<ul>
<li>
<p>
If the state needs to evolve in ways that don’t naturally map
to control flow, then it’s usually best to leave the state as data.
For example, the state maintained by a node in a distributed system
is usually not best represented in control flow,
because timeouts, errors, and other unexpected events tend to
require adjusting the state in unpredictable ways.
<li>
<p>
If the state needs to be serialized for operations like snapshots,
or sending over a network, that’s usually easier with data than code.
<li>
<p>
When you do need to create multiple control flows to hold different control flow state,
the helper control flows need to be shut down. When <code>SameValues</code> returns
false, it leaves the two concurrent <code>gopher</code>s blocked waiting to
send their next values. Instead, it should unblock them.
That requires communication in the other direction
to tell <code>gopher</code> to stop early. “<a href="coro">Coroutines for Go</a>” shows that.
<li>
<p>
In the multiple thread case, the switching costs can be significant.
On my laptop, a C thread switch takes a few microseconds.
A channel operation and goroutine switch
is an order of magnitude cheaper: a couple hundred nanoseconds.
An optimized coroutine system can reduce the cost to tens of
nanoseconds or less.</ul>
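<p>
To illustrate the cleanup caveat, here is one simple way to unblock the gophers when <code>SameValues</code> returns early.
It is only a sketch, and not necessarily the approach taken in “Coroutines for Go”:
a <code>done</code> channel (an added parameter) is closed on return, after which each gopher skips its remaining sends and finishes its walk without blocking.
The walk is not cut short; it merely stops waiting.

```go
package main

import "fmt"

type Tree[V any] struct {
	left  *Tree[V]
	right *Tree[V]
	value V
}

func (t *Tree[V]) All(f func(v V)) {
	if t != nil {
		t.left.All(f)
		f(t.value)
		t.right.All(f)
	}
}

// gopher sends each value on c until done is closed.
// After that, every remaining send is skipped, so the walk
// runs to completion and the goroutine exits.
func gopher[V any](c chan<- V, done <-chan struct{}, all func(func(V))) {
	all(func(v V) {
		select {
		case c <- v:
		case <-done:
		}
	})
	close(c)
}

func SameValues[V comparable](t1, t2 *Tree[V]) bool {
	done := make(chan struct{})
	defer close(done) // unblock any gopher still trying to send
	c1 := make(chan V)
	c2 := make(chan V)
	go gopher(c1, done, t1.All)
	go gopher(c2, done, t2.All)
	for {
		v1, ok1 := <-c1
		v2, ok2 := <-c2
		if v1 != v2 || ok1 != ok2 {
			return false
		}
		if !ok1 && !ok2 {
			return true
		}
	}
}

// node is a small constructor for int trees.
func node(l *Tree[int], v int, r *Tree[int]) *Tree[int] {
	return &Tree[int]{left: l, right: r, value: v}
}

func main() {
	a := node(node(nil, 1, nil), 2, node(nil, 3, nil))
	b := node(nil, 1, node(nil, 2, node(nil, 3, nil)))
	c := node(nil, 9, node(nil, 2, node(nil, 3, nil)))
	fmt.Println(SameValues(a, b)) // true
	fmt.Println(SameValues(a, c)) // false, and both gophers are released
}
```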
<p>
In general, storing data in control flow is a valuable tool
for writing clean, simple, maintainable programs.
Like all tools, it works very well for some jobs and not as well for others.
<a class=anchor href="#gopher"><h2 id="gopher">Counterpoint: John McCarthy’s GOPHER</h2></a>
<p>
The idea of using concurrency to align a pair of binary trees is over 50 years old.
It first appeared in Charles Prenner’s
“<a href="https://dl.acm.org/doi/abs/10.1145/942582.807990">The control structure facilities of ECL</a>”
(<i>ACM SIGPLAN Notices</i>, Volume 6, Issue 12, December 1971; see pages 106–109).
In that presentation, titled “Tree Walks Using Coroutines”, the problem was to take two binary trees A and B with the same number of nodes
and copy the value sequence from A into B despite the two having different internal structure.
They present a straightforward coroutine-based variant.
<p>
Brian Smith and Carl Hewitt introduced the problem of simply comparing two
Lisp-style cons trees (in which internal nodes carry no values)
in their draft of “<a href="https://www.scribd.com/document/185900689/A-Plasma-Primer">A Plasma Primer</a>” (March 1975; see pages 61–62).
For that problem, which they named “samefringe”, they used continuation-based actors
to run a pair of “fringe” actors (credited to Howie Shrobe)
over the two trees and report nodes back to a comparison loop.
<p>
Gerald Sussman and Guy Steele presented the samefringe problem again,
in “<a href="https://dspace.mit.edu/bitstream/handle/1721.1/5794/AIM-349.pdf">Scheme: An Interpreter for Extended Lambda Calculus</a>” (December 1975; see pages 8–9),
with roughly equivalent code (crediting Smith, Hewitt, and Shrobe for inspiration).
They refer to it as a “classic problem difficult to solve in most programming languages”.
<p>
In August 1976, <i>ACM SIGART Bulletin</i> published Patrick Greussay’s
“<a href="https://dl.acm.org/doi/10.1145/1045270.1045273">An Iterative Lisp Solution to the Samefringe Problem</a>”.
This prompted <a href="https://dl.acm.org/action/showFmPdf?doi=10.1145%2F1045276">a response letter by Tim Finin and Paul Rutler in the November 1976 issue</a> (see pages 4–5)
pointing out that Greussay’s solution runs in quadratic time and memory
but also remarking that
“the SAMEFRINGE problem has been notoriously overused as a justification for coroutines.”
That discussion prompted <a href="https://dl.acm.org/action/showFmPdf?doi=10.1145%2F1045283">a response letter by John McCarthy in the February 1977 issue</a> (see page 4).
<p>
In his response, titled “Another samefringe”, McCarthy gives the following LISP solution:
<pre>(DE SAMEFRINGE (X Y)
    (OR (EQ X Y)
        (AND (NOT (ATOM X))
             (NOT (ATOM Y))
             (SAME (GOPHER X) (GOPHER Y)))))
(DE SAME (X Y)
    (AND (EQ (CAR X) (CAR Y))
         (SAMEFRINGE (CDR X) (CDR Y))))
(DE GOPHER (U)
    (COND ((ATOM (CAR U)) U)
          (T (GOPHER (CONS (CAAR U)
                           (CONS (CDAR U) (CDR U)))))))
</pre>
<p>
He then explains:<blockquote>
<p>
<i>gopher</i> digs up the first atom in an S-expression, piling up the <i>cdr</i> parts
(with its hind legs) so that indexing through the atoms can be resumed.
Because of shared structure, the number of new cells in use in each argument
at any time (apart from those occupied by the original expression and assuming
iterative execution) is the number of <i>cars</i> required to go from the top to the
current atom – usually a small fraction of the size of the S-expression.</blockquote>
<p>
In modern terms, McCarthy’s <code>GOPHER</code> loops applying <a href="https://en.wikipedia.org/wiki/Tree_rotation">right tree rotations</a>
until the leftmost node is at the top of the tree.
<code>SAMEFRINGE</code> applies <code>GOPHER</code> to the two trees, compares the tops,
and then loops to consider the remainders.
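<p>
For readers more comfortable with Go than Lisp, here is a rough transliteration of McCarthy’s three functions.
It is a sketch under some assumptions: cons cells are a hypothetical <code>Cell</code> struct, atoms are plain Go strings,
<code>EQ</code> is approximated by <code>==</code> on interface values, and the recursion in <code>GOPHER</code> is written as a loop.
The example uses dotted pairs so that no NIL list terminators appear in the fringe.

```go
package main

import "fmt"

// Cell is a cons cell. Anything that is not a *Cell is treated as an atom.
type Cell struct {
	car, cdr any
}

func cons(a, d any) *Cell { return &Cell{a, d} }

func atom(x any) bool {
	_, isCell := x.(*Cell)
	return !isCell
}

// gopher rotates u to the right until its car is an atom,
// piling up the cdr parts so the walk can be resumed later.
func gopher(u *Cell) *Cell {
	for !atom(u.car) {
		a := u.car.(*Cell)
		u = cons(a.car, cons(a.cdr, u.cdr))
	}
	return u
}

// sameFringe combines SAMEFRINGE and SAME from the Lisp version.
func sameFringe(x, y any) bool {
	if x == y { // equal atoms, or the very same cell
		return true
	}
	if atom(x) || atom(y) {
		return false
	}
	gx, gy := gopher(x.(*Cell)), gopher(y.(*Cell))
	return gx.car == gy.car && sameFringe(gx.cdr, gy.cdr)
}

func main() {
	x := cons(cons("a", "b"), "c") // ((a . b) . c)
	y := cons("a", cons("b", "c")) // (a . (b . c))
	z := cons("a", cons("c", "b")) // (a . (c . b))
	fmt.Println(sameFringe(x, y))  // true
	fmt.Println(sameFringe(x, z))  // false
}
```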
<p>
After presenting a second, more elaborate solution, McCarthy remarks:<blockquote>
<p>
I think all this shows that <i>samefringe</i> is not an example of the need for co-routines,
and a new “simplest example” should be found.
There is no merit in merely moving information from data structure to control structure,
and it makes some kinds of modification harder.</blockquote>
<p>
I disagree with “no merit”. We can view McCarthy’s <code>GOPHER</code>-ized trees as an encoding
of the same stack that <code>NewIter</code> maintains but in tree form.
The correctness follows for the same reasons: it is simulating a
simple recursive traversal.
This <code>GOPHER</code> is clever, but it only works on trees.
If you’re not John McCarthy, it’s easier to write the recursive traversal
and then rely on the general, concurrency-based <code>gopher</code> we saw earlier
to do the rest.
<p>
My experience is that when it is possible,
moving information from data structure to control structure
usually makes programs clearer, easier to understand,
and easier to maintain.
I hope you find similar results.
Opting In to Transparent Telemetrytag:research.swtch.com,2012:research.swtch.com/telemetry-opt-in2023-02-24T08:59:00-05:002023-02-24T09:01:00-05:00Updating the transparent telemetry design to be opt-in. (Transparent Telemetry, Part 4)
<p>
Earlier this month I posted “<a href="telemetry-intro">Transparent Telemetry for Open-Source Projects</a>”,
making the case that open-source software projects need to find an open-source-friendly way
to collect basic usage and performance information about their software,
to help maintainers understand how their software is used and prioritize their work.
I invited feedback on a GitHub discussion and over email.
<p>
The feedback was mostly constructive and mostly positive.
In the GitHub discussion, there were some unconstructive trolls
with no connection to Go who showed up for a while,
but they were the exception rather than the rule:
most people seemed to be engaging in good faith.
I also saw some good discussion on Twitter and Mastodon,
and I received some good feedback via email.
<p>
People who read the posts seemed to agree that there's a real problem here
for open-source maintainers and that transparent telemetry or something
like it is an appropriately minimal amount of collection.
By far the most common suggestion was to make the system
opt-in (default off) instead of opt-out (default on).
I have revised the design to do that.
<p>
The rest of this post discusses the reasons for the change to opt-in
as well as the effects on the rest of the design.
<a class=anchor href="#opt-in_vs_opt-out"><h2 id="opt-in_vs_opt-out">Opt-in vs Opt-out</h2></a>
<p>
Many people were fine with this telemetry being on by default,
precisely because it is carefully designed to be so minimal.
On the other hand, multiple people told me things like
“this is the first system I've seen that I'd actually opt in to, but I'd still
turn it off if it was on by default.”
A few people were concerned about a kind of reidentification attack
on the published records:
if you knew that company X was on record as being the only user
of a given obscure configuration, you might be able to find their
records and learn other things about which Go features they use.
My imagination is not good enough to figure out how an attacker
could exploit data like that,
but even so it seems worth not dismissing entirely.
<p>
Another concern I have with an opt-out system is that my goal
is to find a path that works for many open source projects, not just Go.
Although we've carefully architected the system not to collect
sensitive data, if Go's telemetry were opt-out, that might be used to
justify other opt-out systems that are not as careful about
what they collect.
For example, despite the blog posts discussing at length
the distinction between
everything .NET collects and the much more limited data I am suggesting
that Go collect,
too many people still seemed to equate them.
<p>
For all these reasons, I've revised the design to be opt-in (default off).
Doing so does introduce at least two costs:
first, we must run an ongoing campaign to educate users
about why opting in is a good choice for them,
and second, with fewer systems reporting, the telemetry cost
imposed on any particular user is higher.
<a class=anchor href="#campaign"><h2 id="campaign">The Campaign Cost of Opt-In</h2></a>
<p>
The first new cost in an opt-in system is the ongoing overhead
of asking users to opt in, with a clear explanation of what is and
is not collected.
There are a few obvious times when that makes sense:
<ul>
<li>
During a graphical install of Go,
with two different buttons for the two choices
(no click-through default).
<li>
In blog posts and release notes for new Go releases.
<li>
During our traditional user surveys.
<li>
The first time VS Code Go is invoked on Go code.</ul>
<p>
We may think of other ways too, of course, including giving talks
about this topic, explaining why we designed the system as we did,
and encouraging users to opt in.
<p>
One person suggested making the system opt-in and then
collecting tons more information, like the details of every
command invocation, geolocation, and more.
I think that would be a serious mistake.
I would be happy to stand in front of a crowd and
explain the current system and why users should opt in.
I would be embarrassed to try to do the same for
the more detailed telemetry this person suggested,
because it seems indefensible: we simply don't need
all that information to inform our decisions.
<a class=anchor href="#privacy"><h2 id="privacy">The Privacy Cost of Opt-In</h2></a>
<p>
Even with an ongoing campaign to have users opt in,
we should expect that fewer systems will end up with
telemetry enabled than if it were opt-out.
I don't have a good sense of what level of opting in
we should expect from an effective opt-in campaign.
In part, this is because most systems can't run such
a campaign (see the previous paragraph).
However, in the past I have heard some people say that
typical opt-in rates can be as low as a few percent.
<p>
I think we would be happy to get 10% and thrilled to get 20%, but
I don't know whether that's possible.
We don't even have a precise way to measure the opt-in rate,
since by definition we can't see anything about
people who have not opted in.
We have estimated the
<a href="gophercount">number of Go developers</a>
at a few million, so seeing a few hundred thousand
systems opted in would, I think, be a success.
<p>
Transparent telemetry uses sampling to reduce the privacy cost
to any one system: we only need around 16,000 reports
in a given week for 1% accuracy at a 99% confidence level.
If there are a million systems reporting, which seemed plausible
in the opt-out system, then any given system only needs to
report 1.6% of the time, or less than once a year.
On the other hand, if we'd be happy to get 100,000 systems
opted in, then we need each system to report 16% of the time,
or about once every month and a half.
The opt-in system is fundamentally more invasive
to any given installation than the opt-out system.
Of course, by design it is still not terribly invasive,
since uploaded reports do not contain any identifying
information or any strings not already known to the collection system.
But still, individual systems carry a larger burden when there
are fewer, as there will be in an opt-in system.
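<p>
The arithmetic in this section can be sketched in a few lines of Go.
The sample-size function below assumes the usual normal-approximation formula
with a worst-case proportion of 50% and a z-score of 2.576 for 99% confidence;
that is an assumption of this example and may differ in detail from the calculation in the sampling post.
The function names are made up for illustration.

```go
package main

import (
	"fmt"
	"math"
)

// sampleSize returns the number of reports needed for a margin of error e
// at the confidence level with z-score z, assuming worst-case p = 0.5:
// n = z^2 * p * (1-p) / e^2.
func sampleSize(z, e float64) int {
	return int(math.Ceil(z * z * 0.25 / (e * e)))
}

// reportRate returns the fraction of weeks in which each opted-in system
// must upload a report so that about `needed` reports arrive per week.
func reportRate(needed, systems int) float64 {
	return math.Min(float64(needed)/float64(systems), 1)
}

func main() {
	fmt.Println("reports needed:", sampleSize(2.576, 0.01))

	// Use the rounded figure of 16,000 from the text.
	for _, systems := range []int{1_000_000, 100_000} {
		rate := reportRate(16_000, systems)
		fmt.Printf("%9d systems: report %.1f%% of weeks, about once every %.1f weeks\n",
			systems, 100*rate, 1/rate)
	}
}
```

<p>
Under these assumptions the formula gives a little over 16,000 reports,
and the two rates come out to 1.6% and 16% of weeks, matching the figures above.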
<a class=anchor href="#decide"><h2 id="decide">Can We Still Make Good Decisions?</h2></a>
<p>
Thinking about even lower opt-in rates,
will the system still be able to provide data to help us make decisions?
I think it will be, although the sampling rates will be higher.
<p>
A common mistake when thinking about polls and other sampled data collection
is to confuse the fraction of samples
(surveying, say, 1% of a population) with the accuracy of the result
(making a claim with 1% accuracy).
Those are different percentages.
As we saw in “<a href="sample">The Magic of Sampling, and its Limitations</a>”,
if there are a billion systems and we only sample 16,000 of them
(0.0016%), we can still make claims with 1% accuracy,
despite ignoring 99.9984% of the systems.
Of course, there are not billions of Go installations (yet!).
Assuming there are three million Go installations,
as long as there is no correlation between whether a system is opted in
and the value we want to measure,
having even a 1% opt-in rate (30,000 systems available to report)
should give us fairly accurate data.
And of course as more systems opt in,
each individual system will be sampled less often.
<p>
The “as long as there is no correlation” caveat deserves more scrutiny.
What might be correlated with opting in to Go telemetry?
Certainly the opt-in campaign methods will skew who opts in:
<ul>
<li>
People using the Go graphical installers will be more likely to opt in.
<li>
People who read our blog and release notes will be more likely to opt in.
<li>
People who take our surveys will be more likely to opt in.
<li>
People who use VS Code Go will be more likely to opt in.</ul>
<p>
The next question is whether any of these would be correlated
with and therefore skew metrics we are interested in,
and how we would know.
One obvious blind spot would be people installing Go
through a package manager like <code>apt-get</code> or Homebrew.
Perhaps the telemetry would undercount Linux and Mac.
It might also overcount VS Code Go users.
We can check many of these high-level machine demographics
against the Go developer survey and others.
These are biases to be aware of when interpreting the data,
but I've been clear from the start that data is one input
to the decision making process, not the determining factor.
Some data is almost always better than no data:
if the 10,000 systems you can see are all working properly,
it's incredibly unlikely that there's a problem affecting
even as many as 5% of all systems.
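<p>
To put a number on “incredibly unlikely”: if the visible systems were an unbiased random sample
(which, as discussed above, they are not quite), the chance that every one of them
misses a problem affecting a fraction f of all systems is (1−f)<sup>n</sup>.
The following sketch computes that under the independence assumption; the function name is made up.

```go
package main

import (
	"fmt"
	"math"
)

// probAllHealthy returns the probability that n independently sampled
// systems are all unaffected when a fraction f of all systems is affected.
func probAllHealthy(f float64, n int) float64 {
	return math.Pow(1-f, float64(n))
}

func main() {
	for _, n := range []int{100, 1000, 10000} {
		fmt.Printf("n=%5d: %.3g\n", n, probAllHealthy(0.05, n))
	}
}
```

<p>
Even 100 healthy systems make a 5% problem unlikely (well under 1%);
at 10,000 the probability is too small to matter, so any remaining doubt comes from sampling bias, not sample size.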
<a class=anchor href="#none"><h2 id="none">No Telemetry At All?</h2></a>
<p>
Before ending the post, I want to look at a couple other
common suggestions.
The first was to not have any telemetry at all:
just use bug reports and surveys.
Of all the people I saw make this comment,
not one of them acknowledged reading the
<a href="telemetry-intro#why">explanation of why those don't work</a>
at the start of the first post.
<p>
Some people made a “slippery slope” argument that
having any telemetry at all would make it easier to add
more invasive telemetry:
good telemetry opens the door to bad telemetry.
When you examine this argument more closely, it doesn't hold up.
The process for adding “bad telemetry” to Go
would be the same whether or not there's already “good telemetry”:
modify the source code, get it checked in, and wait for it
to go out in a new release.
The defense strategy in either case is that Go is open source:
anyone can inspect its source code, including recompiling it and
verifying that the distributed binaries match that source code.
Also, anyone can watch and participate in its development.
If you are concerned about bad actors making changes,
watch the development of Go, or trust others to watch it.
If you don't believe enough people are watching,
then you probably shouldn't use Go at all.
You might <a href="deps">reconsider other open-source software you use</a> too.
<a class=anchor href="#crypto"><h2 id="crypto">Cryptographic Approaches</h2></a>
<p>
The most interesting suggestion was to use a system like
<a href="https://crypto.stanford.edu/prio/paper.pdf">Prio</a>
or <a href="https://arxiv.org/pdf/2001.03618.pdf">Prochlo</a>.
Prio in particular is being productionized by
ISRG (the same group that runs Let's Encrypt)
as <a href="https://divviup.org/about/">Divvi Up</a>.
<p>
Those systems are not yet widely available,
and perhaps it would make sense to use them in the future.
Both depend on having two separate entities to run the system,
with users trusting that the two will not collude to unmask their
privacy.
This kind of scheme provides enough privacy to
allow collecting far more sensitive data,
such as fine-grained location data or web page URLs.
Luckily, for Go we have no need of anything that sensitive.
<p>
Another problem with these systems is that they are difficult to explain and inspect.
The math behind them is solid, and if you're collecting that kind of
sensitive data, you absolutely need something like them.
But if, like in the Go design, you're not sending anything sensitive,
then it could be better to use a simpler system
that is easier to explain, so that people understand exactly
what they are opting in to, and that is easier to watch,
so that people can see exactly what information is being sent.
People might well be more comfortable
and more likely to opt in to a system they
can more easily understand and verify.
<p>
Adopting a cryptographic scheme would therefore
require significantly more effort and explanation for a relatively small privacy improvement.
For now, it doesn't seem worth using one of those.
If in the future one of those systems became
a standard, widely accepted mechanism for open source telemetry,
it would make sense to reexamine that decision.
<p>
Another possibility would be for operating system vendors
to serve the role of collectors in one of these systems,
taking care of collection from many programs running
on the system instead.
If done right, this could ensure greater privacy
(provided users trust their operating system vendors!)
instead of each system inventing a new wheel.
If privacy-respecting telemetry collection became a standard
operating system functionality, then it would absolutely make
sense for Go to use that.
Some day, perhaps.
<a class=anchor href="#next_steps"><h2 id="next_steps">Next Steps</h2></a>
<p>
Next week I intend to submit an actual Go proposal to add
<i>opt-in</i> transparent telemetry to the Go toolchain.
I have also added notes to the original blog posts mentioning
the design change and pointing to this one.
<p>
It probably makes sense to prototype the system in <code>gopls</code>,
the LSP server used by VS Code Go and other editors,
to allow faster iteration.
Once we are happy with the system, then it would
make sense to add it to the standard Go toolchain as well,
to enable the <a href="telemetry-uses">many use cases</a> in the earlier post.
<p>
Thanks to everyone who took the time to write constructive,
helpful feedback.
Those discussions are open source at its best.
Use Cases for Transparent Telemetrytag:research.swtch.com,2012:research.swtch.com/telemetry-uses2023-02-08T08:00:03-05:002023-02-08T08:02:03-05:00Many examples of ways that transparent telemetry would help developers understand their software better. (Transparent Telemetry, Part 3)
<p>
I believe open-source software projects need to find an open-source-friendly way to do telemetry.
This post is part of a short series of posts describing <i>transparent telemetry</i>,
one possible answer to that challenge.
For more about the rationale and background, see <a href="telemetry-intro">the introductory post</a>.
For details about the design, see <a href="telemetry-design">the previous post</a>.
<p>
For many years I believed we could make good enough decisions for Go
without collecting any telemetry, by focusing on testing, benchmarks,
and surveys.
Over that time, however, I have collected many examples of decisions
that can’t be made in a principled way without better data,
as well as performance problems that would go unnoticed.
This post presents some of those examples.
<a class=anchor href="#gopath-usage"><h3 id="gopath-usage">What fraction of <code>go</code> command invocations still use GOPATH mode (as compared to Go modules)? What fraction of Go installations do?</h3></a>
<p>
Knowing how often GOPATH mode (non-module mode) is used would tell us
how important it is to keep it working, and for which use cases.
Like many of these questions, knowing the basic usage statistics
would help us decide whether or not to ask more specific questions on a Go survey
or in research meetings with Go users.
We are also considering changes like fine-grained compatibility
(<a href="https://go.dev/issue/56986">#56986</a> and <a href="https://go.dev/issue/57001">#57001</a>)
and per-iteration loop scoping (<a href="https://go.dev/issue/56010">#56010</a>)
that depend heavily on version information available only in module mode.
Understanding how often GOPATH mode is used would help us
understand how many users those changes would leave behind.
<p>
To answer these questions, we would need two counters:
P, the number of times the <code>go</code> command is invoked in GOPATH mode, and
N, the number of times the <code>go</code> command is invoked at all.
The answer to the first question is P divided by N.
When collected across all sampled systems, it wouldn’t make sense
to present the total P divided by the total N.
Instead, we’d want to present the distribution of P/N in sampled reports,
plotting the cumulative distribution of P/N over all systems.
The answer to the second question is the number of reports with P > 0
divided by the total number of reports.
This can also be read off the cumulative distribution graph by
looking for P/N > 0.
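<p>
A minimal sketch of that analysis, assuming each report carries just the two counters.
The <code>Report</code> type, the function names, and the sample data are all hypothetical.

```go
package main

import (
	"fmt"
	"sort"
)

// Report holds the two counters uploaded by one sampled system.
type Report struct {
	P int // go command invocations in GOPATH mode
	N int // all go command invocations
}

// ratios returns the sorted per-report P/N values, skipping reports with N == 0.
func ratios(reports []Report) []float64 {
	var rs []float64
	for _, r := range reports {
		if r.N > 0 {
			rs = append(rs, float64(r.P)/float64(r.N))
		}
	}
	sort.Float64s(rs)
	return rs
}

// fractionAbove returns the fraction of sorted values strictly greater
// than x, which is one minus the cumulative distribution at x.
func fractionAbove(sorted []float64, x float64) float64 {
	if len(sorted) == 0 {
		return 0
	}
	first := sort.Search(len(sorted), func(i int) bool { return sorted[i] > x })
	return float64(len(sorted)-first) / float64(len(sorted))
}

func main() {
	reports := []Report{{P: 0, N: 200}, {P: 0, N: 50}, {P: 5, N: 100}, {P: 40, N: 40}}
	rs := ratios(reports)
	fmt.Println("P/N per report:", rs)
	fmt.Println("installations using GOPATH mode at all:", fractionAbove(rs, 0))
	fmt.Println("installations using it more than half the time:", fractionAbove(rs, 0.5))
}
```

<p>
In this made-up data the pooled ratio would be 45/390, about 12%,
a number that describes none of the four systems; the distribution shows instead
that half never use GOPATH mode and one uses nothing else.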
<a class=anchor href="#work-usage"><h3 id="work-usage">What fraction of <code>go</code> command invocations use Go workspaces (a <code>go.work</code> file)? What fraction of Go installations do?</h3></a>
<p>
We added Go workspaces in Go 1.18.
Most of the time, <code>go.work</code> files will be most appropriate in modules that
are not dependencies of other modules, such as closed-source projects.
Understanding how often workspaces are used in practice would help us
prioritize work on them as well as understand whether they are as useful
as we expected.
That knowledge would again inform future survey questions and research interviews.
<p>
Answering these questions requires the N counter from above
as well as a new counter for the number of times the <code>go</code> command is invoked in a Go workspace.
<p>
<a class=anchor href="#buildmode=shared"><h3 id="buildmode=shared">What fraction of <code>go</code> command invocations use <code>-buildmode=shared</code>? What fraction of Go installations do?</h3></a>
<p>
As mentioned in the <a href="telemetry-intro">introductory post</a>,
<code>-buildmode=shared</code> has never worked particularly well,
and its current design is essentially incompatible with Go modules.
We have <a href="https://github.com/golang/go/issues/47788#issuecomment-954890659">a rough design</a>
for how we might make it work, but that’s a lot of work, and it’s unclear
whether it should be a high priority.
Our guess is that there are essentially no users of this flag,
and that therefore it shouldn’t be prioritized.
But that’s just a guess; data would be better.
<p>
Answering these questions requires the N counter as well as a new counter
for the number of times the <code>go</code> command is invoked with <code>-buildmode=shared</code>.
In general we would probably want to define a counter for the usage of every
command flag. When the set of flag values is limited to a fixed set,
as it is for <code>-buildmode</code>, we would also include the flag value,
to distinguish, for example, <code>-buildmode=c-shared</code> (which is working and easy to support)
from <code>-buildmode=shared</code> (which is not). As noted in the discussion of this
example in the main text, it is possible that only certain
build machines use <code>-buildmode=shared</code> and that those all also
have telemetry disabled. If so, we won’t see them.
This is a fundamental limitation of telemetry: it serves
as an additional input into decisions, not as the only deciding factor.
<a class=anchor href="#goosage"><h3 id="goosage">What fraction of <code>go</code> command invocations use a specific GOOS, GOARCH, or subarchitecture setting? What fraction of Go installations do?</h3></a>
<p>
Go has a <a href="https://go.dev/wiki/PortingPolicy">porting policy</a> that distinguishes
between first-class and secondary ports.
The first-class ports are the ones that the Go team at Google supports directly
and that are important enough to stop a release if a serious bug is encountered.
We do not have a principled, data-driven way to decide which ports are
first-class and which are secondary.
Instead, we make guesses based on prior knowledge and other non-Go usage statistics.
Today the first-class ports are Linux, macOS, and Windows running on 386, amd64, arm, and arm64.
Should we add FreeBSD to the list? (Maybe?) What about RISC-V? (Probably not yet?)
At the subarchitecture level, is it time to retire support for <code>GOARM=5</code> (ARMv5),
which does not have support for atomic memory operations in hardware
and must rely on the operating system kernel to provide them? (I have no idea.)
Data would be helpful in making those decisions.
<p>
Answering these questions requires the N counter as well as a new counter
for the number of times each of the Go-specific environment variables
takes one of its known values. For example, there would be different counters
for the <code>go</code> command running with
<code>GOARCH=amd64</code>, <code>GOARCH=arm</code>, <code>GOOS=linux</code>, <code>GOOS=windows</code>, <code>GOARM=5</code>, <code>GO386=sse2</code>, and so on.
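<p>
One way to picture “one counter per known value” is a fixed table of names, as in the sketch below.
The counter naming scheme, the abbreviated value lists, and the shared “other” bucket for
unrecognized values are assumptions of this example, not part of the design as stated.

```go
package main

import "fmt"

// knownValues lists, per setting, the values that get their own counter.
// The lists are abbreviated and illustrative only.
var knownValues = map[string][]string{
	"GOOS":   {"linux", "darwin", "windows", "freebsd"},
	"GOARCH": {"386", "amd64", "arm", "arm64"},
	"GOARM":  {"5", "6", "7"},
}

// counterName maps a setting to a counter name drawn from a fixed set.
// Unrecognized values share one "other" counter, so no unexpected
// string is ever recorded.
func counterName(key, value string) string {
	for _, v := range knownValues[key] {
		if v == value {
			return key + "=" + value
		}
	}
	return key + "=other"
}

func main() {
	counts := map[string]int{}

	// One go command invocation with these made-up settings.
	settings := [][2]string{{"GOOS", "linux"}, {"GOARCH", "arm"}, {"GOARM", "5"}}
	counts["invocations"]++ // the N counter
	for _, kv := range settings {
		counts[counterName(kv[0], kv[1])]++
	}
	fmt.Println(counts)
}
```

<p>
Dividing any one of these counters by the invocation counter gives the per-system fraction for that setting.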
<a class=anchor href="#windows7"><h3 id="windows7">What fraction of Go installations run on a particular major version of an operating system such as Windows 7, Windows 8, or WSL?</h3></a>
<p>
As operating systems get older, they often become harder to support.
It gets harder to run them in modern virtual machines for testing,
kernel bugs affecting Go are left unfixed, and so on.
One of the <a href="https://github.com/golang/go/wiki/PortingPolicy#removing-old-operating-system-and-architecture-versions">considerations</a>
when deciding to end support for a port is the amount of usage it gets,
yet we have no reliable way to measure that.
Recently we decided to end support for Windows 7 (<a href="https://go.dev/issue/57003">#57003</a>) and Windows 8 (<a href="https://go.dev/issue/57004">#57004</a>).
We were able to rely on other considerations, namely Microsoft’s own support policy,
as well as operating system statistics collected by companies deploying Go binaries.
It would be helpful to have our own direct statistics as well.
More recently, we announced that Go 1.20 will be the last
release to support macOS High Sierra and were
promptly asked to keep it around (<a href="https://go.dev/issue/57125#issuecomment-1416277589">#57125</a>).
<p>
Answering these questions would require counters for each major operating system
version like <code>Windows 7</code>, <code>Windows 8</code>, <code>Debian Buster</code>, <code>Linux 3.2.x</code>, and so on.
There would be no need to collect fine-grained information like specific patch releases,
unless some specific release was important for some reason
(for example, we might want more resolution on Linux 2.6.x,
to inform advancing the minimum Linux kernel version beyond 2.6.32).
<a class=anchor href="#cross"><h3 id="cross">What fraction of builds cross-compile, and what are the most common cross-compilation targets?</h3></a>
<p>
Go has <a href="https://go.dev/blog/ports">always been portable and supported cross-compilation</a>.
That’s clearly important and won’t be removed,
but what are the host/target pairs that are most important to make work well?
For example, Go 1.20 will improve compiling macOS binaries from a Linux system:
the binaries will now use the host DNS resolver, same as native macOS builds.
After I made this change, multiple users reached out privately to thank me
for solving a major pain point for them.
This was a surprise to me: I was unaware that cross-compiling macOS binaries
from Linux was so common.
Are there other common cross-compilation targets that merit special attention?
<p>
Answering these questions uses the same counters as in the last section,
along with information about the native default <code>GOOS</code> and <code>GOARCH</code> settings.
<a class=anchor href="#miss1"><h3 id="miss1">What is the overall cache miss rate in the Go build cache?</h3></a>
<p>
The Go build cache is a critical part of the user experience,
but we don’t know how well it works in practice.
The original build cache heuristics were decided in 2017 based on
a tracing facility that we <a href="https://go.dev/issue/22990">asked users to run for a few weeks</a>
and submit traces.
Since then, many things about Go builds have changed,
and we have no idea how well things are working.
<p>
Answering this question requires a counter for build cache misses
and another for build cache hits or accesses.
<a class=anchor href="#miss2"><h3 id="miss2">What is the typical cache miss rate in the Go build cache for a given build?</h3></a>
<p>
A more detailed question would be what a typical cache miss rate is for a given build.
We hope it would be very low, and if it went up, we’d want to investigate why.
This is something that users are unlikely to notice and report directly,
but it would still indicate a serious problem worth fixing.
<p>
Answering this question would require keeping an in-memory
total of cache misses and hits, and then at the end of the build
computing the overall rate and incrementing a counter
corresponding to one column of a histogram, as described in the <a href="telemetry-intro">introductory post</a>.
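<p>
A sketch of that bookkeeping, with made-up column boundaries and counter names:
the build keeps a private tally, and only the name of one histogram column is recorded at the end.

```go
package main

import "fmt"

// upperBounds are the histogram columns, as miss-rate percentages.
// The boundaries are hypothetical.
var upperBounds = []int{1, 5, 10, 25, 50, 100}

// buildStats is the in-memory tally kept during a single build.
type buildStats struct {
	hits, misses int
}

// column returns the name of the histogram counter to increment at the
// end of the build, or false if the build never touched the cache.
func (s buildStats) column() (string, bool) {
	total := s.hits + s.misses
	if total == 0 {
		return "", false
	}
	for _, ub := range upperBounds {
		// misses/total <= ub/100, in integer arithmetic.
		if s.misses*100 <= ub*total {
			return fmt.Sprintf("build-cache-miss-rate<=%d%%", ub), true
		}
	}
	return "", false // unreachable: the last bound is 100%
}

func main() {
	histogram := map[string]int{}
	builds := []buildStats{{hits: 97, misses: 3}, {hits: 400, misses: 0}, {hits: 10, misses: 30}}
	for _, b := range builds {
		if name, ok := b.column(); ok {
			histogram[name]++
		}
	}
	fmt.Println(histogram)
}
```

<p>
The exact hit and miss totals never leave the process; only the column they fall into is counted.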
<a class=anchor href="#stdmiss"><h3 id="stdmiss">What fraction of <code>go</code> command invocations recompile the standard library?</h3></a>
<p>
The cache hit rate for standard library packages should be very high,
since users are not editing those packages.
Before Go 1.20, distributions shipped with pre-compiled <code>.a</code> files
that should have served as cache hits for default builds.
Only cross-compilation or changing build tags or compiler flags
should have caused a rebuild, and then those would have been cached too.
The cache hit rate in the standard library can therefore serve as
a good indicator that the build cache is working properly.
Especially if the overall hit rate is low, checking the standard library hit rate
should help distinguish a few possible failure modes.
As noted in the <a href="telemetry-intro">introductory post</a>, this metric would have detected
a bug invalidating the pre-compiled macOS <code>.a</code> files that instead lingered
for six releases.
<p>
Answering this question would require keeping a counter of
the number of <code>go</code> command invocations that recompile any
standard library package, along with the total number of invocations N.
<a class=anchor href="#cacheage"><h3 id="cacheage">What is the typical cache hit age in the Go build cache?</h3></a>
<p>
Another important question is how big the cache needs to be.
Today the build cache has a fixed policy of keeping entries for 5 days
after last use and then deleting them, based on the traces we collected in 2017.
Some users have reported the build cache being far too large
for their machines (<a href="https://go.dev/issue/29561">#29561</a>).
One possible change would be to reduce the maximum time since last use.
Perhaps 1 day or 3 days would be almost as effective,
at a fraction of the cost.
<p>
Answering this question would require making another histogram
of counters, this time for a cache hit in an entry with a given age range.
<a class=anchor href="#cachegc"><h3 id="cachegc">Would a generational build cache work effectively?</h3></a>
<p>
Another possible change to the build cache would be to
adopt a <a href="https://github.com/golang/go/issues/29561#issuecomment-1255769245">collection policy inspired by generational garbage collection</a>:
an entry is deleted after a few hours, unless it is used a second time,
in which case it is kept until it hasn’t been used again for a few days.
Would this work?
<p>
Answering this question would require a second histogram
counting the age distribution of cache entries at their first reuse,
as well as a counter of the number of cache entries
deleted without any reuse at all.
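<p>
The policy itself is small enough to sketch. The two thresholds below stand in for
“a few hours” and “a few days” and are placeholders, as are the names.

```go
package main

import (
	"fmt"
	"time"
)

const (
	nursery = 4 * time.Hour      // hypothetical "a few hours"
	tenured = 3 * 24 * time.Hour // hypothetical "a few days"
)

// expired reports whether a cache entry should be deleted, given its age
// since creation, the time since its last use, and whether it has ever
// been used a second time.
func expired(age, idle time.Duration, reused bool) bool {
	if !reused {
		return age > nursery // never reused: short lifetime
	}
	return idle > tenured // reused at least once: long lifetime
}

func main() {
	fmt.Println(expired(6*time.Hour, 6*time.Hour, false))   // never reused, past the short limit
	fmt.Println(expired(6*time.Hour, 5*time.Hour, true))    // reused, recently enough
	fmt.Println(expired(10*24*time.Hour, 4*24*time.Hour, true)) // reused, but idle too long
}
```

<p>
The two telemetry measurements described above would show whether thresholds like these are reasonable:
the age-at-first-reuse histogram indicates how short the first limit can be,
and the deleted-without-reuse counter indicates how much space the policy would save.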
<a class=anchor href="#modules"><h3 id="modules">How many modules are listed in a typical Go workspace?</h3></a>
<p>
Go 1.18 workspaces allow working in multiple work modules simultaneously.
How many work modules are in a typical workspace?
We don’t know, so we don’t know how important per-module overheads are.
<p>
Answering this question would require a histogram of workspace size.
<a class=anchor href="#modfiles"><h3 id="modfiles">How many <code>go.mod</code> files are loaded in a typical <code>go</code> command invocation?</h3></a>
<p>
In addition to the one or more modules making up the workspace,
every time the <code>go</code> command starts up it needs to load information
about all the dependency modules.
In many projects, the number of direct module requirements
appears small but indirect requirements add many more modules.
What is a typical number of modules the <code>go</code> command must load?
<p>
Go 1.17 introduced module graph pruning, which lists all used transitive dependencies
in the top-level <code>go.mod</code> file for a project and then avoids reading <code>go.mod</code> files
from unused indirect dependencies.
That reduces the number of <code>go.mod</code> files loaded.
We know it works on our test cases and a few real repositories like Kubernetes,
but we don’t know how well it works across a variety of Go installations.
If we’d had telemetry before the change, we should have seen a significant
drop in this statistic. If not, that would be a bug to investigate.
<p>
Answering this question would require a histogram
of the number of modules loaded.
<a class=anchor href="#download"><h3 id="download">What is the distribution of time spent downloading modules in a given build?</h3></a>
<p>
Even with module graph pruning, the <code>go</code> command still spends time downloading
<code>go.mod</code> files and module content into the cache as needed.
How much time is spent on downloads?
If it’s a lot, then maybe we should work on reducing that number,
such as by parallelizing downloads or improving download speed.
<p>
Answering this question would require a histogram of time
spent in the module download phase of the <code>go</code> command.
<a class=anchor href="#latency"><h3 id="latency">What is the latency distribution of a Go workspace load?</h3></a>
<p>
Most <code>go</code> command invocations will not need to download anything.
Those are bottlenecked by the overall workspace load,
which spends its time reading the <code>go.mod</code> files and
scanning all the Go packages to understand the import graph
and relevant source files.
How long does that typically take?
<p>
We found out from discussions with users that the number was
significantly higher in module mode than in GOPATH mode
in a real system, so in Go 1.19 we introduced module cache indexing,
which lets the <code>go</code> command avoid reading the top of every Go source file in
every dependency at startup to learn the import graph.
If we’d had telemetry before the change, we should have seen a
significant drop in this statistic. If not, that would be a bug to investigate.
<p>
Answering this question would require a histogram of time spent
in the workspace load phase of the <code>go</code> command.
<a class=anchor href="#indexmiss"><h3 id="indexmiss">What is the module cache index miss rate?</h3></a>
<p>
The module cache index should have a very low miss rate,
because changes to the module’s dependency graph are much
less common than changes to source files.
Is that true on real systems? If not, there’s a bug to find.
<p>
Answering this question would require computing the overall
module cache index miss rate during a build and then keeping
a histogram of rates observed, just like for the build cache miss rate.
<a class=anchor href="#corrupt"><h3 id="corrupt">What fraction of builds encounter corrupt Go source files in the module cache? Does this percentage correlate with host operating system?</h3></a>
<p>
We have some reports of corrupted Go source files when using the module index (<a href="https://go.dev/issue/54651">#54651</a>).
This corruption seems very uncommon, and it seems to manifest only
when using <code>GOOS=linux</code> under the Windows Subsystem for Linux (WSL2),
suggesting that the problem may not be Go’s fault at all.
There is not an obvious path forward for that issue, and we don’t understand
the severity, although it appears to be low.
If we could change the Go toolchain to work around observed corruption
but also self-report its occurrence, then we could get a better
sense of the impact of this bug and how important it is to fix.
For a bug like this, we would typically reason that it is not
happening very often so it would be better to leave it open and
learn more about what causes it than to add a workaround and lose
that signal.
Telemetry lets us both add the workaround and watch to see that
the bug is not a sign of a larger problem.
<p>
Answering this question would require having a counter for
encountering invalid Go source files in the module cache.
That should be a rare event, and if we saw an uptick we could
look at correlation with other information like the default <code>GOOS</code>
and the operating system major versions,
provided WSL2 was reported as something other than Linux.
<a class=anchor href="#cpu"><h3 id="cpu">How much CPU and RAM does a typical invocation of the compiler or linker take?</h3></a>
<p>
We have invested significant effort into reducing the CPU and RAM requirements
of both the compiler and the linker.
The linker in particular was almost completely reengineered in Go 1.15 and Go 1.16.
We have some benchmarks to exercise them, but we don’t know how that translates
into real-world performance.
The linker work was motivated by performance limits in
Google’s build environment compiling Google’s Go source code.
Other environments may have significantly different performance
characteristics.
If so, that would be good to know.
<p>
Answering this question would require a CPU histogram for the compiler,
a RAM histogram for the compiler, and then CPU and RAM histograms for the linker as well.
<a class=anchor href="#versions"><h3 id="versions">What is the relative usage of different versions of the Go toolchain?</h3></a>
<p>
The current <a href="https://go.dev/doc/devel/release#policy">Go release support policy</a>
is that each release is supported until there are two newer major releases.
With work like the <a href="https://go.dev/doc/go1compat">Go compatibility policy</a>,
we aim to make it easy to update to newer releases, so that users do not
get stuck on unsupported releases.
How well is that working?
We don’t know.
The fine-grained compatibility work
(<a href="https://go.dev/issue/56986">#56986</a> and <a href="https://go.dev/issue/57001">#57001</a>)
aims to make it easier to update to a newer toolchain,
which should mean usage of older versions falls off quicker.
It would be helpful to confirm that with data.
Also, once there is a baseline for how quickly new versions are adopted,
seeing one lag would indicate a problem worth investigating.
For example, I recently learned about a Windows timer problem
(<a href="https://github.com/golang/go/issues/44343">#44343</a>)
that is keeping some Windows users on Go 1.15.
If we saw that in the data, we could look for what is holding users back.
<p>
Answering this question would not require any direct counters.
The reports are already annotated with which version
of Go was running,
so the percentage can be calculated as number of reports
from a specific version of Go divided by total reports.
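<p>
As a rough sketch of that calculation, assuming the reports have already been
tallied into a map from Go version to report count
(the <code>versionShare</code> function and the numbers below are made up for illustration):

```go
package main

import (
	"fmt"
	"sort"
)

// versionShare returns, for each Go version, the fraction of all reports
// that came from that version. The input maps a version string to the
// number of reports annotated with it.
func versionShare(reports map[string]int) map[string]float64 {
	total := 0
	for _, n := range reports {
		total += n
	}
	share := make(map[string]float64, len(reports))
	if total == 0 {
		return share // no reports, no percentages
	}
	for v, n := range reports {
		share[v] = float64(n) / float64(total)
	}
	return share
}

func main() {
	// Made-up report counts, for illustration only.
	reports := map[string]int{"go1.18": 50, "go1.19": 150, "go1.20": 300}
	share := versionShare(reports)

	// Sort the versions so the output order is stable.
	versions := make([]string, 0, len(share))
	for v := range share {
		versions = append(versions, v)
	}
	sort.Strings(versions)
	for _, v := range versions {
		fmt.Printf("%s %.1f%%\n", v, 100*share[v])
	}
}
```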
<a class=anchor href="#golines"><h3 id="golines">What is the relative distribution of Go versions in <code>go</code> lines in the work module during a build?</h3></a>
<p>
Today the <code>go</code> line indicates roughly (but not exactly) the maximum version of Go that a module targets.
Part of proposal <a href="https://go.dev/issue/57001">#57001</a> is to change it to be the exact minimum permitted version of Go.
It would be good to know how often modules set that line to the latest version of Go
as compared to the previous version of Go or an unsupported earlier version,
for much the same reasons as the previous question.
<p>
Answering this question requires a counter for each released <code>go</code> version,
incremented for each build using a <code>go</code> line with that version.
Another way to answer this question would be to scan open-source Go modules,
but conventions in the open-source Go library ecosystem may well be different
from conventions in the production systems that use those packages.
We don’t want to overfit our decisions to what happens in open-source code.
<a class=anchor href="#godebug"><h3 id="godebug">What fraction of builds encounter <code>//go:debug</code> lines? What is the relative usage of each setting?</h3></a>
<p>
The fine-grained compatibility work (<a href="https://go.dev/issue/56986">#56986</a>) proposes to
add a new <code>//go:debug</code> line used in main packages to set default <code>GODEBUG</code> settings.
Those settings would be guaranteed to be provided for two years,
but after that we would be able to retire settings as needed.
Knowing how much each one is used would let us make informed decisions
about whether to retire specific settings.
<p>
Answering these questions requires a counter for each <code>GODEBUG</code> setting,
incremented for each build using a <code>//go:debug</code> line with that setting.
As before, another possibility
would be to scan open-source Go modules,
but the vast majority of open source is libraries, not main packages,
so there would be very little signal,
and the signal we do get could be very different from
the reality of production usage.
<a class=anchor href="#cgo"><h3 id="cgo">What fraction of builds use C code via cgo? What about C++, Objective-C, Fortran, or SWIG?</h3></a>
<p>
Cgo is another area where the open-source Go library ecosystem is almost certainly
not aligned with actual production usage:
cgo tends to be discouraged in most public Go libraries
but is a critical part of interoperation with existing code in many production systems.
We don’t know how many though, nor what the cgo use looks like.
Go added support for C++ and SWIG early on, due to usage at Google.
Objective-C and Fortran were added at user request.
Objective-C probably gets lots of use now on macOS and iOS,
but what about Fortran?
Is anyone using it at all?
Do we need to prioritize testing it or fixing bugs in it?
<p>
Answering these questions requires a counter for each <code>go</code> build of a package using cgo,
and then additional counters for when the cgo code includes each of the languages of interest.
Again we could scan open-source Go modules,
but production usage is likely to be different from open-source libraries.
<a class=anchor href="#cgo-disabled"><h3 id="cgo-disabled">What fraction of builds with cgo code explicitly disable cgo? What fraction implicitly disable it? What fraction of builds implicitly disable cgo because there is no host C compiler?</h3></a>
<p>
Many builds insist on not using cgo by setting <code>CGO_ENABLED=0</code>.
Cgo is also disabled by default when cross-compiling.
Starting in Go 1.20, cgo is disabled by default when there is no host C compiler.
It would be helpful to know how often these cases happen.
If lots of builds explicitly disable cgo, that would be something to ask about
in surveys and research interviews.
If lots of builds implicitly disable cgo due to cross-compiling,
we might want to look into improving support for cgo when cross-compiling.
If lots of builds implicitly disable cgo for not having a host C compiler,
we might want to talk to users to understand how it happens
and then update documentation appropriately.
<p>
Answering these questions requires counters for builds with cgo code, builds with an explicit <code>CGO_ENABLED=0</code>,
builds with implicit disabling due to cross-compilation, and builds with implicit disabling due to lack of host C compiler.
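<p>
A minimal sketch of how one build might be classified into those counters.
The counter names and the <code>cgoCounter</code> function are hypothetical,
and the logic is a simplification, not the <code>go</code> command’s actual decision procedure:

```go
package main

import "fmt"

// cgoCounter returns the name of the (hypothetical) counter to increment
// for one build of a package containing cgo code.
//
// cgoEnabledEnv is the value of the CGO_ENABLED environment variable
// ("" if unset), crossCompiling reports whether the target differs from
// the host, and haveCC reports whether a host C compiler was found.
func cgoCounter(cgoEnabledEnv string, crossCompiling, haveCC bool) string {
	switch {
	case cgoEnabledEnv == "0":
		return "cgo/disabled-explicit"
	case cgoEnabledEnv == "1":
		return "cgo/enabled"
	case crossCompiling:
		return "cgo/disabled-cross"
	case !haveCC:
		return "cgo/disabled-no-cc"
	}
	return "cgo/enabled"
}

func main() {
	fmt.Println(cgoCounter("0", false, true))
	fmt.Println(cgoCounter("", true, true))
	fmt.Println(cgoCounter("", false, false))
	fmt.Println(cgoCounter("", false, true))
}
```

<p>
A separate counter for all builds with cgo code would be incremented on every call,
giving the denominator for each fraction.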
<a class=anchor href="#clang"><h3 id="clang">What fraction of Go installations use a particular version of a Go toolchain dependency such as Clang, GCC, Git, or SWIG?</h3></a>
<p>
The Go toolchain contains many workarounds for old versions
of tools that it invokes, like Clang, GCC, Git, and SWIG.
For example, Clang 3.8’s <code>-g</code> flag is incompatible
with <code>.note.GNU-stack</code> sections, so the <code>go</code> command
has to look for a specific error and retry without the flag.
The <code>go</code> command also invokes various <code>git</code> commands
to understand the repository it is working inside,
and it is difficult to tell whether a new command will be safe.
For example, <code>go</code> uses the global <code>git</code> <code>-c</code> option,
which was added in Git 1.7.2.
Is that old enough to depend on?
We decided yes in 2018, but that broke CentOS 6.
Even today we have not resolved what versions we should
require for version control dependencies (<a href="https://go.dev/issue/26746">#26746</a>).
Data would help us make the decision.
<p>
Answering these questions requires counters for builds that invoke each of these tools
along with per-tool-version counters, giving a version histogram for each tool.
<a class=anchor href="#boringcrypto"><h3 id="boringcrypto">What fraction of builds use <code>GOEXPERIMENT=boringcrypto</code>?</h3></a>
<p>
Go published a fork using <a href="https://boringssl.googlesource.com/boringssl/+/master/crypto/fipsmodule/FIPS.md?pli=1">BoringCrypto</a>
years ago; in Go 1.19 the code moved into the main tree, accessible with <code>GOEXPERIMENT=boringcrypto</code>.
The code is unsupported today, but if there was substantial usage,
that might make us think more about whether to start supporting it.
<p>
Answering this question requires a counter for builds with <code>GOEXPERIMENT=boringcrypto</code>.
<a class=anchor href="#compiler-panics"><h3 id="compiler-panics">What are the observed internal compiler errors in practice?</h3></a>
<p>
Because compilers try to keep processing a program even after finding the first error,
it is common to see index out of bounds or nil reference panics
happen in code that has already been diagnosed as invalid.
Instead of crashing in this situation, the Go compiler has a panic handler that
checks whether any errors have already been printed.
If so, then the compiler silently exits early (unsuccessfully),
leaving the user to fix the reported errors and try again.
After the reported errors are fixed, the next run is likely to work just fine.
Hiding these panics provides a much better experience to users,
but each such panic is still a bug we would like to fix.
Since the panics are successfully masked, users will never file bug reports for them.
It would be helpful to collect stack traces for these panics,
even when the compiler exits gracefully.
<p>
Answering this question would require a crash counter
(corresponding to the current call stack) to be incremented
in the panic handler even when the panic is otherwise silenced.
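<p>
The shape of such a handler can be sketched as follows.
This is an illustration, not the compiler’s code:
<code>checkProgram</code> is a made-up driver, and the plain integer
<code>silencedPanics</code> stands in for a real crash counter.

```go
package main

import "fmt"

// silencedPanics stands in for a crash (stack) counter.
var silencedPanics int

// checkProgram runs check, which reports diagnostics by calling report.
// It returns the number of errors reported. If check panics after at
// least one error was reported, the panic is hidden from the user but
// still counted.
func checkProgram(check func(report func(msg string))) (nerr int) {
	defer func() {
		r := recover()
		if r == nil {
			return
		}
		if nerr == 0 {
			panic(r) // no prior diagnostics: a real internal compiler error
		}
		silencedPanics++ // masked from the user, but still recorded
	}()
	check(func(msg string) {
		nerr++
		fmt.Println("error:", msg)
	})
	return nerr
}

func main() {
	n := checkProgram(func(report func(string)) {
		report("undefined: x")
		var s []int
		_ = s[3] // index out of range while processing already-invalid code
	})
	fmt.Println("errors:", n, "silenced panics:", silencedPanics)
}
```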
<a class=anchor href="#compiler-errors"><h3 id="compiler-errors">What is the relative distribution of compiler errors printed during development?</h3></a>
<p>
The errors printed by the compiler for invalid programs are a critical part
of the overall Go user experience.
In the early days of Go, I spent a lot of time making sure that the errors printed
for my own buggy programs were clear.
But different Go users make different mistakes:
helping new Go users with their own programs
was an important way I found out about other
errors that needed to be clearer.
Today, Go usage has grown far beyond anything we can understand from personal experience.
Understanding which errors people hit the most
would help us decide which ones to focus on in research interviews
and make clearer or more specific.
Commonly encountered errors might also prompt ideas about language changes.
For example, we made unused imports and variables errors before <code>go</code> <code>vet</code> existed.
If users hit those frequently enough to be bothersome,
we might consider making them no longer compiler errors
and instead leave them to be <code>go</code> <code>vet</code> errors.
<p>
It would also be interesting to compare the error distribution
from the compiler against the error distribution from <code>gopls</code>
for users who work in an IDE.
Does the IDE make certain errors less (or more) likely?
Does it keep code with certain errors from ever reaching the compiler?
<p>
Answering these questions would require a counter for each distinct error message printed.
One option would be to have each error-reporting call site in the compiler
also declare a counter passed to the error-reporting function.
Another option would be to treat a call to the error-reporting function itself
as a crash, incrementing a crash counter corresponding to the current call stack,
perhaps truncated to just a few frames.
<a class=anchor href="#vet-errors"><h3 id="vet-errors">What is the relative distribution of <code>go</code> <code>vet</code>-reported errors printed during development?</h3></a>
<p>
The errors printed by <code>go</code> <code>vet</code> are its entire user experience. They have to be good.
Just like with the compiler, we don’t know which ones are printed the most
and might need more attention.
If there are <code>vet</code> errors that are never or rarely printed,
that would also be good to know.
Perhaps the checks are broken, or perhaps they are no longer necessary and can be removed.
As a concrete example, we noticed recently that the
vet shift check has unfortunate false positives (<a href="https://go.dev/issue/58030">#58030</a>).
Data about how often the check triggers
would be a useful input to the decision about what to do about it.
<p>
Answering this question would use the same approach as in the compiler,
either explicit counters or short-stack crash counters.
<a class=anchor href="#vet-test"><h3 id="vet-test">What fraction of Go installations invoke <code>go</code> <code>vet</code> explicitly, as opposed to the automatic run during <code>go</code> <code>test</code>?</h3></a>
<p>
I suspect that the majority of Go users never run <code>go</code> <code>vet</code> themselves,
that they only run it implicitly as part of <code>go</code> <code>test</code>.
Am I right?
If so, we should probably spend more time improving the analyses
that were deemed too noisy for <code>go</code> <code>test</code>, because those aren’t running
and therefore aren’t helping find bugs.
<p>
Answering this question would require a counter for <code>go</code> <code>vet</code> invocations
and a counter for <code>go</code> <code>test</code>-initiated <code>vet</code> invocations.
<a class=anchor href="#vet-analyses"><h3 id="vet-analyses">What is the relative distribution of specific analyses run using <code>go</code> <code>vet</code>?</h3></a>
<p>
When <code>go</code> <code>vet</code> is run explicitly, passing no command-line flags runs all analyses.
Adding command-line flags runs only specific analyses.
Do most runs use all analyses or only specific ones?
If the latter, which ones?
If we found that some analyses were being avoided,
we might want to do further research to understand why.
If we found that some analyses were being run very frequently,
we might want to look into adding them to the <code>go</code> <code>test</code> set.
<p>
Answering this question would require a counter for <code>go</code> <code>vet</code> invocations
and a counter for each analysis mode.
<a class=anchor href="#gopls-versions"><h3 id="gopls-versions">What is the relative usage of different versions of <code>gopls</code>?</h3></a>
<p>
<code>Gopls</code> is the only widely-used Go tool maintained by the Go team
that ships separate from the Go distribution.
As with Go releases, knowing which versions are still commonly used
would help us understand which still need to be supported.
Knowing how frequently versions are updated would help us
tell when the upgrade process needs to be easier or better documented.
<p>
Answering this question would require a counter incremented at
<code>gopls</code> startup, since reports already separate counter sets
by the version of the program being run.
<a class=anchor href="#gopls-editors"><h3 id="gopls-editors">What fraction of <code>gopls</code> installations connect to specific editors?</h3></a>
<p>
<code>Gopls</code> is an <a href="https://microsoft.github.io/language-server-protocol/">LSP server</a>
that can connect to any editor with LSP support.
Different editors speak LSP differently, though, and editor-specific bugs are common.
If we knew which editors are most commonly used,
we could prioritize those in testing.
<p>
Answering this question would require a counter for <code>gopls</code> being invoked
from each known editor (Emacs, Vim, VS Code, and so on),
along with a counter for “Other”, since we cannot collect counters
with arbitrary values. If “Other” became a significant percentage,
a survey or user research interview would help us identify a new,
more specific counter to add.
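<p>
A small sketch of the “Other” bucket idea.
The counter names, the <code>editorCounter</code> function, and the client name strings
are assumptions for illustration; real editors may identify themselves differently.

```go
package main

import (
	"fmt"
	"strings"
)

// editorCounters is the fixed set of known editors. Only these names,
// plus "other", can ever appear as counter names.
var editorCounters = map[string]string{
	"emacs":              "gopls/editor:emacs",
	"vim":                "gopls/editor:vim",
	"visual studio code": "gopls/editor:vscode",
}

// editorCounter maps the name an editor reports for itself to one of
// the predefined counter names. Unknown names are never recorded
// verbatim; they all fall into the single "other" counter.
func editorCounter(clientName string) string {
	key := strings.ToLower(strings.TrimSpace(clientName))
	if c, ok := editorCounters[key]; ok {
		return c
	}
	return "gopls/editor:other"
}

func main() {
	for _, name := range []string{"Emacs", "Visual Studio Code", "My Secret Editor 3000"} {
		fmt.Println(editorCounter(name))
	}
}
```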
<a class=anchor href="#gopls-settings"><h3 id="gopls-settings">Which settings is <code>gopls</code> configured with?</h3></a>
<p>
<code>Gopls</code> has <a href="https://github.com/golang/tools/blob/master/gopls/doc/settings.md">many settings</a>,
some of which are deprecated or obsolete.
We don’t know which ones can be removed without disrupting users,
or how many users would be disrupted by each
(<a href="https://go.dev/issue/52897">#52897</a>, <a href="https://go.dev/issue/54180">#54180</a>, <a href="https://go.dev/issue/55268">#55268</a>, <a href="https://go.dev/issue/57329">#57329</a>).
We guess as best we can, but real data would be better.
<p>
Answering this question would require a counter for each setting,
incremented at <code>gopls</code> startup or after each setting change.
Like with the LSP editors and the compiler flags,
only a fixed collection of settings can be collected.
For settings that take arbitrary user-defined values,
there can only be a counter for the setting being used at all.
<a class=anchor href="#gopls-codelens"><h3 id="gopls-codelens">What fraction of <code>gopls</code> installations make use of a given code lens?</h3></a>
<p>
<code>Gopls</code> provides quite a few
<a href="https://github.com/golang/tools/blob/master/gopls/doc/settings.md#code-lenses">code lenses</a>
that users can invoke using their editor UI.
Which ones actually get used?
If one isn’t being used, we would want to use a survey or user research interview
to understand why and either improve or remove it.
<p>
Answering this question would require a counter for each code lens setting being enabled,
incremented at <code>gopls</code> startup or after each configuration change.
<a class=anchor href="#gopls-analyzer"><h3 id="gopls-analyzer">What fraction of <code>gopls</code> installations disable or enable a given analyzer?</h3></a>
<p>
<code>Gopls</code> provides quite a few
<a href="https://github.com/golang/tools/blob/master/gopls/doc/analyzers.md">analyzers</a> as well.
These run while a program is being edited and show warnings or
other information about the code.
If many users turn off an analyzer that is on by default,
that’s a useful signal that the analyzer has too many false positives
or is too noisy in some other way.
And if many users enable an analyzer that is off by default,
that’s a useful signal that we should consider turning it on by default instead.
<p>
Answering this question would require a counter for each analyzer being enabled,
incremented at <code>gopls</code> startup or after each configuration change.
<a class=anchor href="#gopls-suggest"><h3 id="gopls-suggest">What fraction of <code>gopls</code> code completion suggestions are accepted?</h3></a>
<p>
One core feature of <code>gopls</code> is making code completion suggestions.
Are they any good?
We only know what we see from our own usage
and from limited user research studies,
which are time-consuming to collect, cannot be repeated frequently,
and may not be representative of Go users overall.
If instead we knew what fraction of suggestions were accepted,
we could tell whether suggestions need more work
and whether they are getting better or worse.
<p>
Answering this question would require a counter for the number of suggestions made
as well as the number of times the first suggestion was accepted, the number of times the second suggestion was accepted, and so on.
<a class=anchor href="#gopls-latency"><h3 id="gopls-latency">What is the latency distribution for important <code>gopls</code> editor operations, like saving a file, suggesting a completion, or finding a definition?</h3></a>
<p>
We believe that performance greatly influences overall user happiness with <code>gopls</code>:
when we spent a few months in 2022 working on performance improvements,
user-reported happiness in VS Code Go surveys noticeably increased.
We can run synthetic performance benchmarks,
but there is no substitute for actual real-world performance measurements.
<p>
Answering this question would require a histogram for each key operation.
<a class=anchor href="#gopls-crash"><h3 id="gopls-crash">How often does <code>gopls</code> crash? Why?</h3></a>
<p>
<code>Gopls</code> is a long-running server that can, over time, find its way into a bad state and crash.
We spent a few months in 2022 on overall <code>gopls</code> stability improvements,
and we believe the crash rate is much lower now, but we have no real-world data to back that up.
<p>
Answering this question would require counters for each time <code>gopls</code> starts
and each time it crashes.
Each crash should also increment a crash counter to record the stack.
<a class=anchor href="#gopls-restart"><h3 id="gopls-restart">How often does <code>gopls</code> get manually restarted?</h3></a>
<p>
Sometimes <code>gopls</code> hasn’t crashed but also isn’t behaving as it should,
so users manually restart it.
It is difficult to know why <code>gopls</code> is being restarted each time,
but at the least we can track how often that happens
and then follow up in surveys or user research interviews
to understand the root cause.
<p>
Answering this question would require counters for each time <code>gopls</code> starts
and each time it is manually restarted.
<a class=anchor href="#coverage"><h3 id="coverage">Is the body of this <code>if</code> statement ever reached?</h3></a>
<p>
In general, when refactoring or cleaning up code it is common
to see a workaround or special case and wonder if it ever happens anymore
and how.
<p>
Answering this question would require a stack counter
incremented at the start of the body of the <code>if</code> statement in question.
The reports would then include not just the fact that
the body is reached but also a few frames of context
showing the call stack that led there, to help understand how it is reached.
<a class=anchor href="#more"><h3 id="more">And so on ...</h3></a>
<p>
I could keep going, but I will stop here.
It should be clear that knowing the answers to these questions
really would help developers working on Go to make better,
more informed decisions about feature work
and identify performance bugs and other latent problems in Go itself.
It should also be clear from the breadth of examples that there is no way
to ask all these questions on a survey, at least not one that users will finish.
Many of them, including all the histograms, are completely impractical as survey questions anyway.
<p>
I hope it is also clear that the basic primitive of counters for predefined events
is powerful enough to answer all these questions, while at the same time
not exposing any user data, path names, project names, identifiers,
or anything of that sort.
<p>
Finally, I hope it is clear that none of this is terribly specific to Go.
Any open source project with more than a few users has the problem
of understanding how the software is used and how well it’s working.
<p>
For more background about telemetry and why it is important,
see the <a href="telemetry-intro">introductory post</a>.
For details about the design, see <a href="telemetry-design">the previous post</a>.
The Design of Transparent Telemetrytag:research.swtch.com,2012:research.swtch.com/telemetry-design2023-02-08T08:00:02-05:002023-02-08T08:02:02-05:00Details about how transparent telemetry could work. (Transparent Telemetry, Part 2)
<p>
I believe open-source software projects need to find an open-source-friendly way to do telemetry.
This post is part of a short series of posts describing <i>transparent telemetry</i>,
one possible answer to that challenge.
For more about the rationale and background, see <a href="telemetry-intro">the previous post</a>.
For additional use cases, see <a href="telemetry-uses">the next post</a>.
<p>
Transparent telemetry is made up of five parts:
<ul>
<li>
Counting: Go toolchain programs store counter values in per-week files maintained locally.
<li>
Configuration: There is a reviewed public process for defining a new graph or metric to track
and publish on the Go web site. The exact counters that need to be collected,
along with the sampling rate needed for high accuracy results, are derived from
this configuration.
<li>
Reporting: Once a week, an automated reporting program
randomly decides whether to fetch the current configuration and
then whether to be one of the sampled systems that week.
If so, it reports the counters listed in the configuration to a server run by the Go team at Google.
In typical usage, we expect a particular Go installation to report each week with under 2% probability,
meaning less than once per year on average.
[<i>Update</i>, 2023-02-24: The design has been <a href="telemetry-feedback#opt-in">changed to be opt-in</a>,
which raises the expected reporting probabilities.]
<li>
Publishing: The server publishes each day’s reports in full (in a compressed form)
as well as publishing the tabular and graphical summaries defined in the configuration.
<li>
Opt-out: The system is enabled by default, but opting out is as straightforward, simple, and complete as possible.
[<i>Update</i>, 2023-02-24: The design has been <a href="telemetry-feedback#opt-in">changed to be opt-in</a>.]</ul>
<p>
This post details each of these parts in turn.
I wrote an implementation of local counter collection
to convince myself it could be made cheap enough,
but as of the publication of this post,
no other part of the system exists in any form.
I hope that the system can be built for Go over the course of 2023,
and I hope that other open-source projects will be interested
to adopt this approach or inspired to explore others.
<a class=anchor href="#counting"><h2 id="counting">Counting</h2></a>
<p>
Go toolchain programs
(<code>go</code> and the other programs that ship in the Go distribution,
like <code>go</code> <code>tool</code> <code>compile</code> and <code>go</code> <code>tool</code> <code>vet</code>,
along with other Go team-maintained programs like
<code>gopls</code> and <code>govulncheck</code>)
collect counter values in local files using a simple API:
<pre>package counter
func New(name string) *Counter
func (c *Counter) Inc()
func NewStack(name string, frames int) *Stack
func (s *Stack) Inc()
</pre>
<p>
Basic named counters are created with <code>counter.New</code>,
typically assigned to a global variable (this has no init-time overhead),
and then incremented as the program runs by using the <code>Inc</code> method.
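<p>
To make the usage pattern concrete, here is a toy in-memory stand-in for that API.
It is only a sketch: the real counters would be backed by the local files described below,
the <code>Value</code> method is a helper added for this example,
and the counter name <code>go/build</code> is made up.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// Counter is an in-memory stand-in for the counter type in the API above.
type Counter struct {
	n    uint64 // updated atomically; kept first for 64-bit alignment
	name string
}

// New creates a named counter. It only allocates a small struct,
// so calling it from a global variable initializer is cheap.
func New(name string) *Counter { return &Counter{name: name} }

// Inc adds one to the counter. It is safe to call from many goroutines.
func (c *Counter) Inc() { atomic.AddUint64(&c.n, 1) }

// Value is a hypothetical helper for this example, not part of the API above.
func (c *Counter) Value() uint64 { return atomic.LoadUint64(&c.n) }

// buildCount is a global counter with a made-up name.
var buildCount = New("go/build")

func main() {
	var wg sync.WaitGroup
	for i := 0; i < 8; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for j := 0; j < 1000; j++ {
				buildCount.Inc()
			}
		}()
	}
	wg.Wait()
	fmt.Println(buildCount.name, buildCount.Value()) // prints "go/build 8000"
}
```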
<p>
For example, suppose we want to monitor the typical build cache miss rate
for a <code>go</code> <code>build</code> command.
Each <code>go</code> command invocation can track the miss rate during its run
and then increment one bucket of a histogram
with exponentially-spaced buckets:
0%, <0.1%, <0.2%, <0.5%, <1%, <2%, <5%, <10%, <20%, <50%, and <100%.
After a week, those 11 counters record the distribution of build cache miss rate
experienced on that system.
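<p>
Choosing the bucket for one invocation might look like the sketch below.
The <code>bucketName</code> function and the counter name prefix are hypothetical,
and folding a 100% miss rate into the last bucket is my assumption,
since the list above does not say where that case goes.

```go
package main

import "fmt"

// upperBounds are the exclusive upper bounds, in percent, of the ten
// non-zero buckets. Together with the "0%" bucket that makes 11 counters.
var upperBounds = []float64{0.1, 0.2, 0.5, 1, 2, 5, 10, 20, 50, 100}

// bucketName returns the histogram bucket for a run that had the given
// number of cache misses out of the given number of cache lookups.
func bucketName(misses, lookups int) string {
	if lookups == 0 || misses == 0 {
		return "0%"
	}
	pct := 100 * float64(misses) / float64(lookups)
	for _, ub := range upperBounds {
		if pct < ub {
			return fmt.Sprintf("<%g%%", ub)
		}
	}
	// Assumption: a 100% miss rate is folded into the last bucket.
	return "<100%"
}

func main() {
	// A run with 3 misses in 1000 lookups has a 0.3% miss rate.
	// The counter name prefix is made up for this example.
	fmt.Println("go/buildcache/miss-rate:" + bucketName(3, 1000))
}
```

<p>
Each invocation increments exactly one of the 11 counters,
so the weekly totals form the histogram directly.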
<p>
Stack counters are similar, but the constructor also takes
the maximum number of frames to record.
Each frame is represented by an import path, function name,
and line number relative to the start of the function,
such as <code>cmd/compile/internal/base.Errorf+10</code>.
The counter name is the concatenation of the name passed
to the constructor and the given number of frames.
For example, the counter name for one increment of
the result of <code>NewStack("missing-std-import",</code> <code>5)</code> might be
<pre>missing-std-import
cmd/compile/internal/types2.(*Checker).importPackage+39
cmd/compile/internal/types2.(*Checker).collectObjects+54
cmd/compile/internal/types2.(*Checker).checkFiles+18
cmd/compile/internal/types2.(*Checker).Files+0
cmd/compile/internal/types2.(*Config).Check+2
</pre>
<p>
A line number relative to the start of the function
is fairly stable across unrelated edits in the source code,
making it possible to identify the same stack trace
even across different versions of a program.
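<p>
One way such a name could be assembled with the standard <code>runtime</code> package is sketched below.
This is not the actual implementation: <code>stackCounterName</code> and <code>relLine</code>
are names invented here, and frames whose function entry is unknown
(for example, fully inlined calls) are simply written without a line offset.

```go
package main

import (
	"fmt"
	"runtime"
	"strings"
)

// relLine formats the frame's line number relative to the line where
// its function starts, such as "+10".
func relLine(f runtime.Frame) string {
	if f.Func == nil {
		return "" // inlined or non-Go frame: entry line unknown in this sketch
	}
	_, start := f.Func.FileLine(f.Func.Entry())
	return fmt.Sprintf("+%d", f.Line-start)
}

// stackCounterName builds a counter name from a base name followed by
// up to maxFrames frames of the caller's stack, one per line.
func stackCounterName(name string, maxFrames int) string {
	pcs := make([]uintptr, maxFrames)
	// Skip runtime.Callers and stackCounterName itself.
	n := runtime.Callers(2, pcs)
	frames := runtime.CallersFrames(pcs[:n])

	var b strings.Builder
	b.WriteString(name)
	for {
		f, more := frames.Next()
		if f.Function != "" {
			// f.Function is "import/path.FuncName", never a file system path.
			b.WriteString("\n" + f.Function + relLine(f))
		}
		if !more {
			break
		}
	}
	return b.String()
}

func importPackage() string {
	return stackCounterName("missing-std-import", 5)
}

func main() {
	fmt.Println(importPackage())
}
```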
<p>
One of the key properties of transparent telemetry is that
uploaded reports only contain strings that are already known to the collection server.
Using an import path instead of the full file path
allows aggregation across different systems
and more importantly avoids exposing details like the full path
to a directory where the Go compiler source code was stored when it was built.
Another important consideration is that function names in
modified copies of Go tools might contain unexpected strings:
we don’t want to know about a modified copy that adds
<code>types2.checkWithChatGPT</code> to the call stack.
This problem is handled by only saving stack traces
from specific unmodified, released versions of tools.
The version information and the presence of any
modifications can be identified using the
<a href="https://pkg.go.dev/runtime/debug#ReadBuildInfo">build information embedded in the binary</a>.
(In contrast, .NET
reports full file system paths and then documents that it is the developer’s responsibility
to “<a href="https://learn.microsoft.com/en-us/dotnet/core/tools/telemetry#avoid-inadvertent-disclosure-of-information">avoid inadvertent disclosure of information</a>”
by not building the software in “directories whose path names
expose personal or sensitive information”.
The burden of not exposing private data in a telemetry system
should never be placed on users.)
<p>
The counter files are stored in the directory <code><user>/go/telemetry/local/</code>,
where <code><user></code> is the user configuration directory,
as reported by <a href="https://pkg.go.dev/os/#UserConfigDir"><code>os.UserConfigDir</code></a>.
Each file is named by the program’s name, version, build toolchain version,
GOOS, and GOARCH, along with the date of the start of the week.
For example:
<pre>/Users/rsc/Library/Application Support/go/telemetry/local/
compile-go1.21.1-darwin-arm64-2023-01-04.v1.count
gopls@v0.11.0-go1.21.1-linux-386-2023-01-04.v1.count
...
</pre>
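<p>
The naming scheme in that example can be sketched as follows.
The format is inferred from the two example names above and is an assumption,
as are the helper names <code>countFileName</code>, <code>localDir</code>, and <code>day</code>.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"time"
)

// day is a small helper that builds a date at midnight UTC.
func day(y int, m time.Month, d int) time.Time {
	return time.Date(y, m, d, 0, 0, 0, 0, time.UTC)
}

// countFileName builds a counter file name. progVersion is empty for
// programs that ship in the Go distribution, which are identified by
// the Go version alone.
func countFileName(prog, progVersion, goVersion, goos, goarch string, week time.Time) string {
	name := prog
	if progVersion != "" {
		name += "@" + progVersion
	}
	return fmt.Sprintf("%s-%s-%s-%s-%s.v1.count",
		name, goVersion, goos, goarch, week.Format("2006-01-02"))
}

// localDir returns the directory holding the counter files.
func localDir() (string, error) {
	cfg, err := os.UserConfigDir()
	if err != nil {
		return "", err
	}
	return filepath.Join(cfg, "go", "telemetry", "local"), nil
}

func main() {
	dir, err := localDir()
	if err != nil {
		fmt.Println("no user configuration directory:", err)
		return
	}
	week := day(2023, 1, 4)
	fmt.Println(filepath.Join(dir, countFileName("compile", "", "go1.21.1", "darwin", "arm64", week)))
	fmt.Println(filepath.Join(dir, countFileName("gopls", "v0.11.0", "go1.21.1", "linux", "386", week)))
}
```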
<p>
The version and build toolchain version are recorded only as <code>devel</code>
for unversioned tools, such as when developing Go itself.
<p>
Aggregating counters by week has two important purposes.
First, it should help reduce privacy concerns
by making clear that there is no way to reconstruct
any kind of fine-grained trace of user behavior.
Second, reporting counters by week
reduces statistical noise caused by persistent
usage variations such as weekends.
Every long-term monitoring dashboard I have ever seen begins
by computing 7- or 28-day averages of the data to remove
this high-frequency noise.
Removing it client-side gives both more privacy and cleaner reports.
<p>
When Go telemetry first creates the <code>local</code> directory,
it randomly selects the start of that system’s week.
The system in our example has chosen weeks beginning on Wednesday.
The random choice of week start spreads the server load over the week
and also provides prompt reporting of new problems:
if a new Go distribution is published on Tuesday,
one seventh of the systems that install it immediately
will include Tuesday’s operation in the Wednesday uploads.
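<p>
A sketch of the week arithmetic, under the assumption that the randomly chosen
start day is stored once and reused afterward
(<code>pickWeekStart</code>, <code>weekStart</code>, and <code>day</code> are names invented for this example):

```go
package main

import (
	"fmt"
	"math/rand"
	"time"
)

// pickWeekStart chooses the day on which this system's weeks begin.
// It would be called once, when the local directory is first created.
func pickWeekStart() time.Weekday {
	return time.Weekday(rand.Intn(7))
}

// weekStart returns midnight on the most recent day, at or before now,
// that falls on the weekday first.
func weekStart(now time.Time, first time.Weekday) time.Time {
	delta := (int(now.Weekday()) - int(first) + 7) % 7
	y, m, d := now.Date()
	// time.Date normalizes out-of-range days, so d-delta may be <= 0.
	return time.Date(y, m, d-delta, 0, 0, 0, 0, now.Location())
}

// day is a small helper that builds a date at midnight UTC.
func day(y int, m time.Month, d int) time.Time {
	return time.Date(y, m, d, 0, 0, 0, 0, time.UTC)
}

func main() {
	// Tuesday 2023-01-10 belongs to the week that began on Wednesday 2023-01-04.
	fmt.Println(weekStart(day(2023, 1, 10), time.Wednesday).Format("2006-01-02")) // prints 2023-01-04

	first := pickWeekStart()
	fmt.Println("randomly chosen start day:", first)
	fmt.Println("current week began:", weekStart(time.Now(), first).Format("2006-01-02"))
}
```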
<p>
These files use a custom binary format that starts with a simple key-value header
repeating the information that went into the file name:
<pre>Week: 2023-01-04
Program: compile
GoVersion: go1.21.1
GOOS: darwin
GOARCH: arm64
</pre>
<p>
After the header come the counters, in an on-disk hash table suitable for memory mapping into
each running instance of the program.
Those instances use lock-free atomic operations to increment counters and maintain the file,
keeping the overhead associated with telemetry very low.
The design also avoids any possibility of deadlock or unbounded latency when a counter is incremented,
even if one instance of the program is hung or otherwise misbehaving.
A tool, perhaps called <code>go</code> <code>tool</code> <code>telemetry</code>, will convert one of these binary files to JSON for
processing by other interested programs.
<p>
Note that the raw data stored on disk is only names of counters and their associated 64-bit totals.
There is no event log or any more detailed kind of trace.
The decision to maintain the counters directly,
instead of deriving them from a more detailed trace, is motivated
mainly by concerns about disk space and update latency.
However, never having any kind of event log or trace
also reduces the privacy impact of the local collection.
<p>
A local web server (perhaps <code>go</code> <code>tool</code> <code>telemetry</code> <code>-http</code>)
will display the local counters and be able to graph
counter data over time for user inspection (at only 1-week granularity, of course).
<a class=anchor href="#configuration"><h2 id="configuration">Configuration</h2></a>
<p>
Data collection in transparent telemetry starts with the reason the data
is being collected: a specific graph that is going to be computed,
along with the specific margin of error desired for that graph.
From that graphing configuration, the transparent telemetry server
can compute the reporting configuration, declaring which counters to
report at what sampling rate in order to produce that graph.
<p>
For example, a graphing configuration for the Go build cache miss rate
graph we considered in the previous section might look like:
<pre>title: Go build cache miss rate
type: histogram
error: 1%
counter: go/buildcache/miss:{0,0.1,0.2,0.5,1,2,5,10,20,50,100}
</pre>
<p>
For a margin of error of 1% at a 99% confidence level, we need <a href="sample">about 16,000 samples</a>.
The server would keep track of an estimate of the number of reporting systems
and adjust the sampling rate each week to produce the right number of samples.
If there are one million reporting systems, then the sampling rate to get 16,000
samples is 1.6%, so the corresponding reporting configuration would sample
each counter with that probability.
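<p>
The arithmetic can be sketched as follows. This assumes the standard normal-approximation formula for the sample size of a proportion, n = z²·p(1−p)/e² with the worst case p = 0.5, and a z-score of 2.576 for 99% confidence; the linked post may derive the number differently. The function names are hypothetical.

```go
package main

import (
	"fmt"
	"math"
)

// sampleSize returns the number of samples needed for the given margin of
// error (for example 0.01 for 1%) and z-score, assuming the worst-case
// proportion p = 0.5, so that p*(1-p) = 0.25.
func sampleSize(margin, z float64) int {
	return int(math.Ceil(z * z * 0.25 / (margin * margin)))
}

// samplingRate returns the fraction of reporting systems that must upload
// in order to collect the wanted number of samples, capped at 1.
func samplingRate(samples, systems int) float64 {
	if systems <= 0 {
		return 1
	}
	return math.Min(float64(samples)/float64(systems), 1)
}

func main() {
	const z99 = 2.576 // z-score for a 99% confidence level
	fmt.Println("samples needed:", sampleSize(0.01, z99))
	fmt.Println("rate for 16,000 samples from 1,000,000 systems:", samplingRate(16000, 1000000))
}
```

Under these assumptions the formula gives a little over 16,500 samples, in line with the "about 16,000" figure, and 16,000 samples from one million systems is a rate of 1.6%.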
<p>
Changing what is collected can have privacy
implications, so we have to ensure changes are properly reviewed.
As an example of a privacy mistake, suppose a developer
mistakenly decided it was important to
understand which standard library packages are most imported
and created a histogram of import paths using
<code>counter.New("import:"+path).Inc()</code>.
I don’t think that would be a useful histogram anyway,
but the privacy mistake is that the histogram
would include private user import paths as well as standard library paths.
However, the impact of the mistake would be limited to local collection,
because the graphing and reporting configurations would not
mention counters like <code>import:my.company/private/package</code>,
so they would never be reported.
<p>
Developers of the Go toolchain will probably want to add counters
to the toolchain purely for local use, to understand whether they
would be helpful to report.
That decision should not be overburdened with process,
because the stakes are relatively low.
Probably our standard code review process suffices,
paired with clear documentation about what kinds of
counters are and are not appropriate to introduce.
<p>
Changes to the server’s graphing configuration merit more attention,
since as we saw it is the graphing configuration that determines
which counters are reported.
It probably makes sense to require such changes to go through
a review by a small group charged with ownership and maintenance
of the configuration, either on the issue tracker or on the Gerrit server.
<p>
Finally, note the lack of any kind of wildcards in the graphing configuration.
It is impossible to ask for all the counters beginning with <code>import:</code>, which means
<code>import:my.company/private/package</code> will never be reported,
because the graphing configuration will never list that counter explicitly by name.
(Any attempt to do so would be caught by the public configuration review process.)
<a class=anchor href="#reporting"><h2 id="reporting">Reporting</h2></a>
<p>
When a counter file’s week is over,
toolchain programs (even long-running ones)
automatically start writing counters to the next week’s file.
Remember that “week” refers to a 7-day period that starts
on a weekday chosen randomly for each Go installation:
on some machines weeks are Sunday to Saturday, others use Tuesday to Monday, and so on.
At some point after the week ends, a reporting program
(probably the <code>go</code> command, perhaps also <code>gopls</code>)
will notice the completed week of counters and begin the reporting process.
<p>
The reporting program uses a reporting configuration
to find out which counters should be reported.
It would be served as a Go module
(perhaps <code>telemetry.go.dev/config</code>).
Visiting that same page in a browser would print a nice HTML page
listing all the counters that have ever been collected, annotating each
with the date ranges when it was collected
and the justification for collection.
In the event that some counter is deemed no longer necessary
or somehow problematic to collect,
it can be removed from the configuration,
and programs will immediately stop reporting it.
Similarly, if the system must be shut down for some reason,
serving an empty configuration would stop all reporting.
<p>
The reporting configuration would be JSON corresponding
to the Go type <code>ReportConfig</code> defined as:
<pre>type ReportConfig struct {
	GOOS      []string
	GOARCH    []string
	GoVersion []string
	Programs  []ProgramConfig
}
type ProgramConfig struct {
	Name     string
	Versions []string
	Counters []CounterConfig
	Stacks   []CounterConfig
}
type CounterConfig struct {
	Name string
	Rate float64
}
</pre>
<p>
The <code>ReportConfig</code> lists the known GOOS, GOARCH, and Go versions that
can be reported. This ensures that programs testing with an experimental,
as-yet-unknown operating system, architecture, or Go version
are not accidentally collected.
Similarly, the <code>ProgramConfig</code> lists the programs that should be collected from
and their specific versions, if they are separate from the main Go toolchain
(like <code>gopls</code> and <code>govulncheck</code>).
The <code>CounterConfig</code> lists the specific counters being collected
and their individual sample rates.
<p>
The reporter starts by picking a random floating point number X between 0 and 1.
If X ≥ 0.1, then the reporter stops without even downloading the configuration.
For example, if the reporter picks X = 0.2, it stops immediately.
This step imposes a hard limit of 10% sampling rate for any counter or stack,
and it arranges that a particular Go installation won’t even download
the collection configuration more than once every couple of months on average.
<p>
Assuming X < 0.1, the reporter downloads and reads the collection configuration.
It then reads all the per-program counter files and filters them
to include only the ones with matching GOOS, GOARCH, Go version, program name,
and program version.
It further filters the selected reports to drop any counters for
which the configured rate is less than X.
For example, if the reporter picks X = 0.05, it will report counters
configured with rate 0.1 but not counters configured with rate 0.01.
If a particular program has no sampled counters, that program is dropped from the report.
If the report has no programs, no report is sent at all.
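<p>
The sampling decision described above can be sketched in a few lines. This is an illustration, not the real reporter: <code>selectCounters</code>, <code>maxRate</code>, and the counter names are hypothetical, and only the <code>CounterConfig</code> type is taken from the configuration shown earlier.

```go
package main

import (
	"fmt"
	"math/rand"
)

// CounterConfig mirrors the type of the same name in the reporting configuration.
type CounterConfig struct {
	Name string
	Rate float64
}

// maxRate is the hard limit on any sampling rate (10%).
const maxRate = 0.1

// selectCounters returns the names of the counters to report for the random
// draw x. It returns nil when x >= maxRate, the case in which the reporter
// stops before even downloading the configuration.
func selectCounters(x float64, counters []CounterConfig) []string {
	if x >= maxRate {
		return nil
	}
	var keep []string
	for _, c := range counters {
		if c.Rate < x {
			continue // configured rate is less than X: drop this counter
		}
		keep = append(keep, c.Name)
	}
	return keep
}

func main() {
	counters := []CounterConfig{
		{Name: "example/common", Rate: 0.1},
		{Name: "example/rare", Rate: 0.01},
	}
	x := rand.Float64() // random draw in [0, 1)
	fmt.Printf("X = %.3f, reporting %v\n", x, selectCounters(x, counters))
}
```

Because a single X is compared against every configured rate, a system that reports a low-rate counter in a given week also reports every higher-rate counter that week.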
<p>
In a large deployment such as Go’s, a typical reporting rate will
be under 0.02 (2%), with the effect that each system will average
around one weekly report per year, or fewer.
One nice property of transparent telemetry
is that as more and more systems run with it enabled,
each system reports less and less data.
<p>
[<i>Update</i>, 2023-02-24: The hard limit of 10% and the expected reporting rate of 2%
were based on opt-out telemetry with millions of installations.
The design has <a href="telemetry-feedback#opt-in">changed to be opt-in</a>, which will raise those probabilities.]
<p>
When there is a report to send,
the reporting program prepares JSON corresponding to the Go type <code>Report</code> defined as:
<pre>type Report struct {
	Config   string
	Week     string
	LastWeek string
	X        float64
	Programs []ProgramReport
}
type ProgramReport struct {
	Program   string
	Version   string
	GoVersion string
	GOOS      string
	GOARCH    string
	Counters  []Counter
	Stacks    []Counter
}
type Counter struct {
	Name  string
	Count int64
	Stack []string
}
</pre>
<p>
The <code>Report</code>’s <code>Config</code> field lists the configuration version
used for generating the report, so analysis can determine
the sampling rates applied.
<p>
On a system that uses Go only intermittently, a reporting program might not run
for a few days or more after the week ends.
The <code>Report</code>’s <code>Week</code> field identifies the week this report covers,
by giving its first day in <code>yyyy-mm-dd</code> format.
If it has been more than seven days since the last use of Go,
the now-weeks-old local report will not be uploaded.
This lets the server “close the books” on a given week’s telemetry after seven days.
<p>
In any data collection system it is important to quantify how much data is being discarded.
(This is why, for example, <code>pprof</code> attributes missed profile events to synthesized
functions like <code>_LostExternalCode</code>.)
In transparent telemetry, if a system is used one week
but then not used at all the next week,
the system will have no opportunity to (randomly decide to) report the
first week’s data.
The number of systems being used so intermittently
is probably low enough not to worry about having a statistically
significant effect on the results,
but it would be good to measure that rather than guess.
The <code>LastWeek</code> field reports the week prior to the one being reported
when the reporting system last gathered any counters at all.
On a frequently used system, <code>LastWeek</code> will always be seven days earlier than <code>Week</code>.
After a long pause in Go usage,
<code>LastWeek</code> will be two or more weeks earlier than <code>Week</code>,
indicating that this system never even considered
reporting counters from <code>LastWeek</code>.
If a substantial number of reports have a multiweek gap,
we can conclude that the earlier week’s data may be less
accurate than previously estimated.
Again, this is generally unlikely, but perhaps it would happen after
vacations such as end-of-year holidays.
It would be good to have an explicit signal that those
numbers are not as trustworthy rather than puzzle through
why they look different.
The <code>LastWeek</code> field also makes it possible
to estimate the number of active users over longer
time periods, such as 4 weeks or 52 weeks,
which may be useful for understanding overall usage.
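<p>
On the analysis side, detecting such a gap is a simple date subtraction. The sketch below assumes both fields use the <code>yyyy-mm-dd</code> format described above; <code>weeksBetween</code> is a hypothetical helper, not part of any published tool.

```go
package main

import (
	"fmt"
	"time"
)

// weeksBetween returns the number of whole weeks from lastWeek to week,
// both given as yyyy-mm-dd dates. A result of 1 means the system was used
// in consecutive weeks; a larger result indicates a multiweek gap.
func weeksBetween(lastWeek, week string) (int, error) {
	const layout = "2006-01-02"
	a, err := time.Parse(layout, lastWeek)
	if err != nil {
		return 0, err
	}
	b, err := time.Parse(layout, week)
	if err != nil {
		return 0, err
	}
	// Both times are midnight UTC, so the difference is a whole number of days.
	days := int(b.Sub(a).Hours() / 24)
	return days / 7, nil
}

func main() {
	for _, last := range []string{"2022-12-28", "2022-12-14"} {
		gap, err := weeksBetween(last, "2023-01-04")
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("LastWeek %s, Week 2023-01-04: gap of %d week(s)\n", last, gap)
	}
}
```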
<p>
Note that the different programs’ counter sets are all uploaded together,
so that for example if the <code>go</code> command is taking a surprisingly long time
to run a build, the associated counters from the <code>compile</code> and <code>link</code> programs
are in the same record.
Note also that there is no persistent identifier in the records
that would allow linking one week’s upload with a different week’s upload.
<p>
The server would necessarily observe the source IP address in the TCP session
uploading the report,
but the server would not record that address with the data,
a fact that can be confirmed by inspecting the reporting server source code
(the server would be open source like the rest of Go)
or by reference to a stated privacy policy like <a href="https://proxy.golang.org/privacy">the one for the Go module mirror</a>,
depending on whether you lean more toward trusting software engineers or lawyers.
A company could also run their own HTTP proxy to shield individual systems’ IP addresses
and arrange for employee systems to set <code>GOTELEMETRY</code> to the address of that proxy.
It may also make sense to allow Go module proxies to proxy uploads,
so that the existing <code>GOPROXY</code> setting also works for redirecting the upload
and shielding the system’s IP address.
<p>
Recall from above that the local, binary counter files are stored in <code><user>/go/telemetry/local/</code>.
When a report is uploaded, the exact JSON that was uploaded is written to
<code><user>/go/telemetry/uploaded/</code>, named for the day of the upload
(<code>2006-01-02.json</code>).
The aim of both these directories (including their naming) is to make
the system’s overall operation as transparent as possible.
The expectation is that a typical report will be under 1,000 counters,
requiring about 50 kB in JSON format.
Assuming twice as many counters are counted locally
as are uploaded, that’s 2,000 counters in binary format,
which is another 100 kB.
The storage cost of keeping the local forms indefinitely is then under 100 kB/week or 5 MB/year.
An upload once or twice a year adds only another 100 kB/year.
A command like <code>go</code> <code>clean</code> <code>-telemetry</code> would delete all of these.
<p>
The privacy feature of waiting at least a week before uploading
anything at all (to give people plenty of time to opt out before any data is sent)
means that ephemeral machines such as build containers will never be counted.
The tradeoff of better privacy seems worth the loss of visibility into these machines.
<a class=anchor href="#publishing"><h2 id="publishing">Publishing</h2></a>
<p>
Every day, the upload server takes the previous 24 hours’ worth of uploads
and updates the published graphs defined in the graph configuration.
<p>
It also publishes the full, raw JSON for the previous 24 hours’ worth of uploads,
in seven distinct data sets corresponding to the seven different possible weeks
(starting Sunday, Monday, Tuesday, ...) that could have been reported that day.
For example, the files published on 2023-01-18 would be:
<pre>week-2023-01-04-uploaded-2023-01-17.v1.reports
week-2023-01-05-uploaded-2023-01-17.v1.reports
week-2023-01-06-uploaded-2023-01-17.v1.reports
week-2023-01-07-uploaded-2023-01-17.v1.reports
week-2023-01-08-uploaded-2023-01-17.v1.reports
week-2023-01-09-uploaded-2023-01-17.v1.reports
week-2023-01-10-uploaded-2023-01-17.v1.reports
</pre>
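<p>
The seven names follow mechanically from the upload day. The sketch below assumes, based on the example and the seven-day upload window, that a week reported on a given day began between 13 and 7 days earlier; <code>publishedFiles</code> is a hypothetical helper.

```go
package main

import (
	"fmt"
	"time"
)

// publishedFiles lists the seven data set names for reports uploaded on
// uploadDay (yyyy-mm-dd). It assumes a week can be reported from the day
// after it ends until seven days after it ends, so the weeks reported on a
// given day began between 13 and 7 days earlier.
func publishedFiles(uploadDay string) ([]string, error) {
	const layout = "2006-01-02"
	t, err := time.Parse(layout, uploadDay)
	if err != nil {
		return nil, err
	}
	var names []string
	for back := 13; back >= 7; back-- {
		week := t.AddDate(0, 0, -back).Format(layout)
		names = append(names, fmt.Sprintf("week-%s-uploaded-%s.v1.reports", week, uploadDay))
	}
	return names, nil
}

func main() {
	names, err := publishedFiles("2023-01-17")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, name := range names {
		fmt.Println(name)
	}
}
```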
<p>
Thanks to sampling, the collected uploads will be fairly small
and will not grow even as the number of active installations does.
Estimating 50 kB per uploaded report and a target of about 16,000 reports per week,
each week’s reports total only 800 MB
(split across the seven different starting days in that week).
Compression with Brotli should reduce the footprint by at least a factor of 10,
making each week at most 80 MB, or roughly 4 GB for an entire year’s worth of uploads.
<a class=anchor href="#opt-out"><h2 id="opt-out">Opt-Out</h2></a>
<p>
[<i>Update</i>, 2023-02-24: The design has been <a href="telemetry-feedback#opt-in">changed to be opt-in</a>.
This section is unmodified from the original for historical purposes.]
<p>
An explicit goal of this design is to build a system that is
reasonable to have enabled by default, for two reasons.
First, the vast majority of users do not change any default settings.
In systems that have collection off by default, opt-in rates
tend to be very low, skewing the results toward power users
who understand the system well.
Second, the existence of an opt-in checkbox
is in my opinion too often used as justification for collecting
far more data than is necessary.
Aiming for an opt-out system with as few reasons as possible
to opt out led to this minimal design instead.
Also, because the design collects a fixed number of samples,
more systems being opted in means collecting less from any given system,
reducing the privacy impact to each individual system.
<p>
Enabling the system by default requires proper notice
to users who are installing the system.
As we did with the on-by-default module proxy and checksum database,
notices would be posted in the release notes for the first Go distribution
that enables telemetry as well as displayed next to the download links
on <a href="https://go.dev/"><code>go.dev</code></a> and <a href="https://go.dev/dl/"><code>go.dev/dl</code></a>.
<p>
Some users will want to opt out on general principle,
no matter how minimal the system is,
and that should be as easy as possible,
something like:
<pre>go env -w GOTELEMETRY=off
</pre>
<p>
Like all <code>go</code> <code>env</code> <code>-w</code> commands, this would configure a
per-user setting that applies to all installed Go toolchains,
present and future: a new Go toolchain installed tomorrow
would respect the setting too.
<p>
In addition, some Linux distributions may want to
prompt users during installation
or disable telemetry unconditionally.
We should make that easy to do too.
Proposal <a href="https://go.dev/issue/57179">#57179</a> introduced
a <code>go.env</code> file in the root of the Go toolchain that configures
per-toolchain settings. This will ship in Go 1.21.
Linux distributions that want to disable telemetry
could include a <code>go.env</code> file containing <code>GOTELEMETRY=off</code>.
<p>
Another dark pattern in opt-out systems is reporting
information before the user has a chance to opt out.
For example, I was once told about a popular developer tool
that showed a telemetry checkbox,
pre-checked, during the installation process, giving users
the opportunity to uncheck the box.
But at this point, a few screens into the installation, telemetry had already been sent,
allowing the company behind the tool
to track installation counts and opt-out rate
by the fact that telemetry suddenly stopped,
as well as tracking details like the IP and MAC addresses of systems that have opted out.
In that system, to avoid sending any telemetry at all, you had to set an environment variable
and then invoke the installer from the command line.
I can’t find concrete evidence anywhere for this story,
so I am not sure if the system in question still behaves this way
or ever did.
Either way, I strongly disagree with this kind of trick
as violating the entire spirit of an opt-out decision.
<p>
Transparent telemetry waits at least a week after installation
before sending any report or even fetching the collection configuration.
This should give plenty of time to run <code>go</code> <code>env</code> <code>-w</code> to opt out.
<p>
<a class=anchor href="#summary"><h2 id="summary">Summary</h2></a>
<p>
Repeating the summary from the <a href="telemetry-intro">introductory post</a>,
transparent telemetry has the following key properties:
<ul>
<li>
<p>
The decisions about what metrics to collect are made in an
open, public process.
<li>
<p>
The collection configuration is automatically generated from
the actively tracked metrics: no data is collected that isn’t needed
for the metrics.
<li>
<p>
The collection configuration is served using a tamper-evident
transparent log, making it very difficult to serve different
collection configurations to different systems.
<li>
<p>
The collection configuration is a cacheable, proxied Go module,
so any privacy-enhancing local Go proxy already in use for
ordinary modules will automatically be used for collection configuration.
To further ameliorate concerns about tracking systems
by the downloading of the collection configuration,
each installation only bothers downloading the configuration
each week with probability 10%,
so that each installation only asks for the configuration
about five times per year.
[<i>Update</i>, 2023-02-24: The design has <a href="telemetry-feedback#opt-in">changed to be opt-in</a>,
which requires raising these probabilities.]
<li>
<p>
Uploaded reports only include total event counts over a full week,
not any kind of time-ordered event trace.
<li>
<p>
Uploaded reports do not include user IDs, machine IDs, or any other kind of ID.
<li>
<p>
Uploaded reports only contain strings that are already known to the collection server:
counter names, program names, and version strings repeated from the collection configuration,
along with the names of functions in specific, unmodified Go toolchain programs
for stack traces.
The only types of non-string data in the reports are event counts, dates, and line numbers.
<li>
<p>
IP addresses exposed by the HTTP session that uploads the report are not
recorded with the reports.
<li>
<p>
Thanks to <a href="sample">sampling</a>, only a constant number of uploaded reports
are needed to achieve a specific accuracy target, no matter how many
installations exist. Specifically, only about 16,000 reports are needed
for 1% accuracy at a 99% confidence level.
This means that <i>as new systems are added to
the system, each system reports less often</i>.
With a conservative estimate of two million Go installations,
about 16,000 reporting each week corresponds to an overall
reporting rate of well under 2% per week,
meaning each installation would upload a report on average less than
once per year.
[<i>Update</i>, 2023-02-24: The design has <a href="telemetry-feedback#opt-in">changed to be opt-in</a>,
which requires raising these probabilities.]
<li>
<p>
The aggregate computed metrics are made public in graphical and tabular form.
<li>
<p>
The full raw data as collected is made public, so that project maintainers
have no proprietary advantage or insights in their role as the direct data collector.
<li>
<p>
The system is on by default, but opting out is easy, effective, and persistent.
[<i>Update</i>, 2023-02-24: The design has been <a href="telemetry-feedback#opt-in">changed to be opt-in</a>.]</ul>
<a class=anchor href="#next_steps"><h2 id="next_steps">Next Steps</h2></a>
<p>
For more background about telemetry and why it is important,
see the <a href="telemetry-intro">introductory post</a>.
For more use cases, see the <a href="telemetry-uses">next post</a>.
<p>
Although these posts use Go as the example system using
transparent telemetry, I hope that the ideas apply and
can be adopted by other open-source projects too,
in their own, separate collection systems.
<p>
I am posting these to start a <a href="https://go.dev/s/telemetry-discussion">discussion about how the Go toolchain can
adopt telemetry</a>
in some form to help the Go toolchain developers make better
decisions about the development and maintenance of Go.
I have written an implementation of local counter collection
to convince myself it could be made cheap enough,
but no other part of the system exists today in any form.
I hope that the system can be built over the course of 2023.
Transparent Telemetry for Open-Source Projectstag:research.swtch.com,2012:research.swtch.com/telemetry-intro2023-02-08T08:00:01-05:002023-02-08T08:02:01-05:00What would open-source-friendly telemetry look like? (Transparent Telemetry, Part 1)
<p>
How do software developers understand which parts of their software
are being used and whether they are performing as expected?
The modern answer is <i>telemetry</i>, which means software sending
data to answer those questions back to a collection server.
This post is about why I believe telemetry is important for open-source projects,
and what it might look like to approach telemetry
in an open-source-friendly way.
That leads to a new design I call <i>transparent telemetry</i>.
If you are impatient, skip to the <a href="#summary">summary at the end</a>.
Other posts in the series <a href="telemetry-design">detail the design</a>
and <a href="telemetry-uses">present various uses</a>.
<a class=anchor href="#why"><h2 id="why">Why Telemetry?</h2></a>
<p>
Without telemetry, developers rely on bug reports and surveys to find out
when their software isn’t working or how it is being used. Both of these
techniques are too limited in their effectiveness. Let’s look at each in turn.
<p>
<b>Bug reports are not enough.</b>
Users only file bug reports when they think something is broken.
If a function is not behaving as documented, that’s a clear bug to report.
But if a program is misbehaving in a way that doesn’t affect correctness,
users are much less likely to notice.
Statistics gathered by transparent telemetry make it possible for
developers to notice that something is going wrong even when users do not.
<p>
For example, during the Go 1.14 release process in early 2020 we made a change
to the way macOS Go distributions are built, as part of keeping them acceptable
to Apple’s signing tools.
Unfortunately, the way we made the change also made all
the pre-compiled <code>.a</code> files shipped in the distribution appear
stale to builds.
The effect was that the <code>go</code> command rebuilt and cached the
standard library on first run, which meant that compiling any
program using package <code>net</code> (which uses <code>cgo</code>) required Xcode to be installed.
So Go 1.14 and later unintentionally required Xcode to compile even
trivial demo Go programs like a basic HTTP server.
This is not the way we want Go to work on macOS.
On systems without Xcode, when <code>go</code> tried to invoke <code>clang</code>,
macOS popped up a box explaining how to install it.
Users simply accepted that this was necessary,
perhaps even thinking <code>go</code> had displayed the popup.
No one reported the bug over three years of Go releases.
We didn’t notice and fix the problem until late 2022 while investigating something else.
With telemetry for the miss rate in the cache of pre-compiled standard library packages,
the impact would have been obvious: all Macs running Go 1.14 or later
would have a pre-installed package miss rate of 100%.
This bug wasn’t caught by our unit tests because it was caused
by the distribution build machines having a modified environment
different from actual user machines.
The unit tests ran in the same modified environment
as the build and worked fine.
These kinds of unexpected differences between developer machines
and user machines are inevitable at scale.
Instrumenting the software on user machines is the most reliable
way to understand how well it is working.
<p>
<b>Surveys are not enough.</b>
Surveys help us understand what users want to do with Go,
but they are only a small sample and have limited resolution.
Asking about usage of infrequently-used features on a survey wastes
time for a majority of respondents, and it requires large response
counts to get an accurate measurement.
<p>
For example, we announced in the Go 1.13 release notes
that future releases would drop support for Native Client (<code>GOOS=nacl</code>).
Similarly, we announced in the Go 1.15 release notes
that future releases would drop support for hardware floating point
on 32-bit Intel CPUs without SSE2 instructions (<code>GO386=387</code>).
Both of those removals went off okay, retroactively proving that our
instincts about how few people would be affected were correct.
On the other hand, we drafted an announcement for Go 1.18
removing <code>-buildmode=shared</code>, because it had essentially been
broken since the introduction of modules,
but when we issued Go 1.18 beta 1 we got feedback
from at least a few people who were using it in some form.
We still don’t know how many people are using it or whether it is
worth the maintenance costs, so <a href="https://github.com/golang/go/issues/47788">it lingers on</a>.
Another question is how long to keep supporting ARMv5 (<code>GOARM=5</code>),
which doesn’t have modern atomic instructions.
More recently, we announced that Go 1.20 will be the last
release to support macOS High Sierra and were
<a href="https://github.com/golang/go/issues/57125#issuecomment-1416277589">promptly asked to keep it around</a>.
Usage information would help us make more informed decisions.
It’s important to note the limitations of this usage information:
if telemetry is disabled on all the machines that use the
feature in question, or if it is only used in machines
that don’t stay up long enough to report anything,
then we won’t observe the usage.
Telemetry is never perfect, but it’s a useful input to the decision
and much better than guessing.
A survey is not any better and usually worse:
there is a limit to how many questions we can
reasonably ask in a survey,
and asking a question where 99% of people answer “no I don’t use that”
is a waste of most people’s time.
<a class=anchor href="#why-open-source"><h2 id="why-open-source">Why Telemetry For Open Source?</h2></a>
<p>
When you hear the word telemetry, if you’re like me, you may have
a visceral negative reaction to a mental image of intrusive, detailed traces
of your every keystroke and mouse click headed back to the developers
of the software you’re using.
And for good reason! That mental image sounds like it must be an exaggeration
but turns out to be fairly accurate.
(Citations:
<a href="https://www.theverge.com/2020/1/31/21117217/amazon-kindle-tracking-page-turn-taps-e-reader-privacy-policy-security-whispersync">Kindle tracking individual page turns</a>,
<a href="https://www.roboleary.net/tools/2022/04/20/vscode-telemetry.html">VS Code telemetry logs</a>,
and
<a href="https://learn.microsoft.com/en-us/dotnet/core/tools/telemetry">.NET telemetry events</a>.)
<p>
Open-source software projects have tended to avoid this kind of telemetry, for two reasons.
The first is the significant privacy cost to users of collecting and storing detailed activity traces.
The second is the fact that access to this data must be restricted,
which would make the project less open than most strive to be.
When the choice is between this kind of invasive tracking or doing nothing,
doing nothing seems like an easy call.
Still, doing nothing has real disadvantages.
It means open-source developers like me tend not to understand as well
how our software is used or how it performs.
Then, because we lack that knowledge,
we end up wasting time by maintaining features that aren’t used,
hurting users by removing features that are still being used,
and delivering a poorer user experience by failing to notice
when our software is underperforming in real-world usage.
<p>
Some open-source projects have adopted traditional telemetry,
with mixed success and varying levels of user pushback.
For example: <a href="https://www.theregister.com/2021/05/07/audacity_telemetry/">Audacity</a>,
<a href="https://www.zdnet.com/article/gitlab-backs-down-on-planned-telemetry-changes-forced-tracking/">GitLab</a>,
and
<a href="https://news.ycombinator.com/item?id=11566720">Homebrew</a>.
Homebrew’s telemetry seems to be generally accepted by users,
and VS Code’s detailed telemetry has not stopped
it from being used by 74% of developers,
as reported by the <a href="https://survey.stackoverflow.co/2022/#integrated-development-environment">2022 StackOverflow survey</a>.
It could even be that the benefits from telemetry are
part of how VS Code’s developers have been able to build a tool that users like so much.
Even so, the vast majority of projects, even large ones that would benefit,
stay away from telemetry.
<p>
I believe that the choice between invasive tracking
and doing nothing at all is a false dichotomy,
and it’s harming open source.
Not having basic information
about how their software is used and how well it is performing
puts open-source developers at a disadvantage compared
to commercial software developers.
Not having this information makes it more difficult to understand
what’s important and what isn’t working,
making prioritization that much harder.
Not having clear prioritization in turn exacerbates
the pre-existing problems with maintainer burnout.
<p>
Eric Raymond famously declared that
“given enough eyeballs, all bugs are shallow,”
which he explained as meaning that
“[g]iven a large enough beta-tester and co-developer base,
almost every problem will be characterized quickly and the fix obvious to someone.”
Perhaps this was true in 1997 (perhaps not),
but it’s certainly not true today,
as the Go macOS cache bug shows.
A quarter century later, software is much larger,
and open-source software is used
by far more people who didn’t develop it
and aren’t familiar with how it should and should not behave.
Eyeballs don’t scale.
<p>
I believe that open-source software projects need to explore new
telemetry designs that help developers get
the information they need to work efficiently and effectively,
without collecting invasive traces of detailed user activity.
<a class=anchor href="#design"><h2 id="design">Transparent Telemetry</h2></a>
<p>
This series of blog posts
presents one such design, which I call <i>transparent telemetry</i>,
because it collects as little as possible (kilobytes per year from each installation)
and then publishes every bit that it collects, for public inspection and analysis.
<p>
I’d like to explore using this system, or one like it, in the Go toolchain,
which I hope will help Go project developers and users alike.
To be clear, I am only suggesting that the instrumentation be added to the Go
command-line tools written and distributed by the Go team,
such as the <code>go</code> command,
the Go compiler, <code>gopls</code>, and <code>govulncheck</code>.
I am <i>not</i> suggesting that instrumentation be added by the Go compiler
to all Go programs in the world: that’s clearly inappropriate.
Also, throughout these posts, “developer” refers to the authors of a given piece
of software, while “user” refers to the users
of that software. From the point of view of the Go toolchain,
“developer” means a Go toolchain developer like me,
while “user” means one of the millions of Go programmers
using that toolchain.
<p>
With transparent telemetry,
as programs from the Go toolchain run,
they would increment counters for various events of interest
(for example: cache hit, use of a given feature, measured latency in a given range) in a per-week on-disk file.
These files hold only counter values, not user data nor user identifiers.
Some counter names include a short stack trace (function names and line offsets only, no argument data).
<p>
The Go team at Google would run a collection server.
Each week, with 10% probability (averaging ~5 times per year)
the user’s Go installation would download a “collection configuration”
to find out which counter values are of interest to the server and at what sample rate.
The collection configuration would be served in a Go module
validated using the <a href="https://go.dev/design/25530-sumdb">Go checksum database</a>,
for added confidence that all clients are being served
the same configuration.
Based on the sample rates, the Go installation might send a report
containing the counter values of interest.
Typical sample rates would be around 2% (averaging ~1 report per installation per year),
but very rare events could be sampled at a higher rate, up to the 10% limit.
As more systems take part in transparent telemetry,
the overall sample rate on any given system will decrease,
because <a href="sample">only a fixed number of samples is necessary</a>.
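<p>
The parenthetical averages above are simple expected-value arithmetic, which can be checked with a few lines of Go. This is only a sketch: the names <code>weeksPerYear</code> and <code>perYear</code> are made up for illustration and are not part of any proposed implementation.

```go
package main

import "fmt"

// weeksPerYear is the number of weekly opportunities an installation gets.
const weeksPerYear = 52

// perYear returns the expected number of events per year for something
// that happens independently each week with the given probability.
func perYear(weeklyProbability float64) float64 {
	return weeksPerYear * weeklyProbability
}

func main() {
	fmt.Printf("config downloads per year at 10%%: %.2f\n", perYear(0.10))
	fmt.Printf("reports per year at 2%% sampling: %.2f\n", perYear(0.02))
}
```

The two results are 5.20 and 1.04, matching the “~5 times per year” and “~1 report per installation per year” figures.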
<p>
The report would contain no ID of any form – no user login, no machine ID, no MAC address, no IP address,
no IP address prefix, no geolocation information,
no randomly-generated pseudo-ID, no other kind of identifiers.
The report would contain basic information about the toolchain,
such as its version and what operating system and architecture it was built for.
The report could also contain coarse-grained information about the version
of the host operating system (for example, “Windows 8”) and other tools the Go toolchain uses,
such as the local C compiler (“gcc 2.95”).
<p>
The server would collect each day’s uploaded reports,
update telemetry graphs served publicly on go.dev,
and post the full set of uploaded reports for public download, inspection, and analysis.
<p>
Although the report would not include any identifiers, the TCP connection uploading the report
would expose the system’s public IP address to the server if a proxy is not being used.
This IP address would not be associated with the uploaded reports in any way.
Standard system maintenance, including DoS prevention, might require logs that include the IP address,
but uploaded reports will be kept separate from those logs.
The privacy policy would be similar to the one used by
<a href="https://proxy.golang.org/privacy">the Go module mirror and checksum database</a>.
<p>
The <a href="https://go.dev/">Go home page</a> and <a href="https://go.dev/dl">download page</a>
already include a notice about the default
use of the Go module mirror and a link to more information.
That notice and link would be updated to disclose on-by-default telemetry.
To opt out, users would set <code>GOTELEMETRY=off</code> in their environment
or run a simple command like <code>go env -w GOTELEMETRY=off</code>.
The first telemetry report is not sent until at least one week after installation,
giving ample time to opt out.
Opting out stops all collection and reporting: no “opt out” event is sent.
It is simply impossible to see systems that install Go and then opt out in the next seven days.
<a class=anchor href="#summary"><h2 id="summary">Summary</h2></a>
<p>
Transparent telemetry has the following key properties:
<ul>
<li>
<p>
The decisions about what metrics to collect are made in an
open, public process.
<li>
<p>
The collection configuration is automatically generated from
the actively tracked metrics: no data is collected that isn’t needed
for the metrics.
<li>
<p>
The collection configuration is served using a tamper-evident
transparent log, making it very difficult to serve different
collection configurations to different systems.
<li>
<p>
The collection configuration is a cacheable, proxied Go module,
so any privacy-enhancing local Go proxy already in use for
ordinary modules will automatically be used for collection configuration.
To further ameliorate concerns about tracking systems
by the downloading of the collection configuration,
each installation only bothers downloading the configuration
each week with probability 10%,
so that each installation only asks for the configuration
about five times per year.
[<i>Update</i>, 2023-02-24: The design has <a href="telemetry-feedback#opt-in">changed to be opt-in</a>,
which requires raising these probabilities.]
<li>
<p>
Uploaded reports only include total event counts over a full week,
not any kind of time-ordered event trace.
<li>
<p>
Uploaded reports do not include user IDs, machine IDs, or any other kind of ID.
<li>
<p>
Uploaded reports only contain strings that are already known to the collection server:
counter names, program names, and version strings repeated from the collection configuration,
along with the names of functions in specific, unmodified Go toolchain programs
for stack traces.
The only types of non-string data in the reports are event counts, dates, and line numbers.
<li>
<p>
IP addresses exposed by the HTTP session that uploads the report are not
recorded with the reports.
<li>
<p>
Thanks to <a href="sample">sampling</a>, only a constant number of uploaded reports
are needed to achieve a specific accuracy target, no matter how many
installations exist. Specifically, only about 16,000 reports are needed
for 1% accuracy at a 99% confidence level.
This means that <i>as new installations join,
each installation reports less often</i>.
With a conservative estimate of two million Go installations,
about 16,000 reporting each week corresponds to an overall
reporting rate of well under 2% per week,
meaning each installation would upload a report on average less than
once per year.
[<i>Update</i>, 2023-02-24: The design has <a href="telemetry-feedback#opt-in">changed to be opt-in</a>,
which requires raising these probabilities.]
<li>
<p>
The aggregate computed metrics are made public in graphical and tabular form.
<li>
<p>
The full raw data as collected is made public, so that project maintainers
have no proprietary advantage or insights in their role as the direct data collector.
<li>
<p>
The system is on by default, but opting out is easy, effective, and persistent.
[<i>Update</i>, 2023-02-24: The design has been <a href="telemetry-feedback#opt-in">changed to be opt-in</a>.]</ul>
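<p>
To make the sampling property concrete, here is a small Go sketch of how the required weekly reporting rate shrinks as the number of installations grows. The 16,000 figure and the two-million estimate come from the list above; the other installation counts and the names <code>reportsNeeded</code> and <code>weeklyRate</code> are hypothetical, chosen only for illustration.

```go
package main

import "fmt"

// reportsNeeded is the approximate weekly sample size quoted above for
// 1% accuracy at a 99% confidence level.
const reportsNeeded = 16000

// weeklyRate returns the fraction of installations that would need to
// report each week so that about reportsNeeded reports arrive.
// It assumes installations is positive.
func weeklyRate(installations int) float64 {
	return float64(reportsNeeded) / float64(installations)
}

func main() {
	// Only the two-million figure appears in the text; the others are
	// hypothetical points to show the trend.
	for _, n := range []int{1_000_000, 2_000_000, 10_000_000} {
		r := weeklyRate(n)
		fmt.Printf("%8d installations: %.2f%% per week, %.2f reports per installation per year\n",
			n, 100*r, 52*r)
	}
}
```

For two million installations this works out to 0.80% per week, or about 0.42 reports per installation per year.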
<a class=anchor href="#next_steps"><h2 id="next_steps">Next Steps</h2></a>
<p>
For more detail about the design, see the <a href="telemetry-design">next post</a>.
For more use cases, see the <a href="telemetry-uses">post after that</a>.
<p>
Although these posts use Go as the example system using
transparent telemetry, I hope that the ideas apply and
can be adopted by other open-source projects too,
in their own, separate collection systems.
For example, even though VS Code collects high-resolution event traces
(sometimes tens or hundreds of events per minute),
a close reading of those traces shows hardly anything is new in each event.
That is, VS Code suffers the reputational hit of collecting lots of data
but appears to gather relatively little actual information.
Perhaps using transparent telemetry in VS Code or a similar editor could offer
the editor’s developers roughly equivalent insights and development velocity
at a much lower privacy cost to users.
<p>
I am posting these to start a <a href="https://go.dev/s/telemetry-discussion">discussion about how the Go toolchain can
adopt telemetry</a>
in some form to help the Go toolchain developers make better
decisions about the development and maintenance of Go.
I have written an implementation of local counter collection
to convince myself it could be made cheap enough,
but no other part of the system exists today in any form.
I hope that the system can be built over the course of 2023.
Transparent Telemetrytag:research.swtch.com,2012:research.swtch.com/telemetry2023-02-08T08:00:00-05:002023-02-08T08:02:00-05:00Topic Index
<p>
These are the posts in the February 2023 “Transparent Telemetry” series:
<ul>
<li>
“<a href="telemetry-intro">Transparent Telemetry for Open-Source Projects</a>” [<a href="telemetry-intro.pdf">PDF</a>].
<li>
“<a href="telemetry-design">The Design of Transparent Telemetry</a>” [<a href="telemetry-design.pdf">PDF</a>].
<li>
“<a href="telemetry-uses">Use Cases for Transparent Telemetry</a>” [<a href="telemtry-uses.pdf">PDF</a>].</ul>
<p>
A (now closed) GitHub discussion about adding transparent telemetry to Go is at
<a href="https://go.dev/s/telemetry-discussion">https://go.dev/s/telemetry-discussion</a>.
<p>
There is one followup post:
<ul>
<li>
“<a href="telemetry-opt-in">Opting In to Transparent Telemetry</a>” [<a href="telemtry-opt-in.pdf">PDF</a>].</ul>
The Magic of Sampling, and its Limitationstag:research.swtch.com,2012:research.swtch.com/sample2023-02-04T12:00:00-05:002023-02-04T12:02:00-05:00The magic of using small samples to learn about large data sets.
<p>
Suppose I have a large number of M&Ms
and want to estimate what fraction of them have <a href="https://spinroot.com/pjw">Peter’s face</a> on them.
As one does.
<p>
<img name="sample-pjw1" class="center pad resizable" width=450 height=276 src="sample-pjw1.jpg" srcset="sample-pjw1.jpg 1x, sample-pjw1@2x.jpg 2x, sample-pjw1@4x.jpg 4x">
<p>
If I am too lazy to count them all, I can estimate the true fraction using sampling:
pick N at random, count how many P have Peter’s face, and then estimate
the fraction to be P/N.
<p>
I can <a href="https://go.dev/play/p/GQr6ShQ_ivG">write a Go program</a> to pick 10 of the 37 M&Ms for me: 27 30 1 13 36 5 33 7 10 19.
(Yes, I am too lazy to count them, but I was not too lazy to number the M&Ms in order to use the Go program.)
<p>
<img name="sample-pjw2" class="center pad resizable" width=450 height=73 src="sample-pjw2.jpg" srcset="sample-pjw2.jpg 1x, sample-pjw2@2x.jpg 2x, sample-pjw2@4x.jpg 4x">
<p>
Based on this sample, we can estimate that 3/10 = 30% of my M&Ms have Peter’s face.
We can do it a few more times:
<p>
<img name="sample-pjw3" class="center pad resizable" width=450 height=64 src="sample-pjw3.jpg" srcset="sample-pjw3.jpg 1x, sample-pjw3@2x.jpg 2x, sample-pjw3@4x.jpg 4x">
<p>
<img name="sample-pjw4" class="center pad resizable" width=450 height=61 src="sample-pjw4.jpg" srcset="sample-pjw4.jpg 1x, sample-pjw4@2x.jpg 2x, sample-pjw4@4x.jpg 4x">
<p>
<img name="sample-pjw5" class="center pad resizable" width=450 height=73 src="sample-pjw5.jpg" srcset="sample-pjw5.jpg 1x, sample-pjw5@2x.jpg 2x, sample-pjw5@4x.jpg 4x">
<p>
And we get a few new estimates: 30%, 40%, 20%. The actual fraction turns out to be 9/37 = 24.3%.
These estimates are perhaps not that impressive,
but we are only using 10 samples.
With not too many more samples, we can get far more accurate estimates,
even for much larger data sets.
Suppose we had many more M&Ms, again 24.3% Peter faces, and we sample 100 of them, or 1,000, or 10,000.
Since we’re lazy, let’s write <a href="https://go.dev/play/p/VcqirSSiS1Q">a program to simulate the process</a>.
<pre>$ go run sample.go
10: 40.0% 20.0% 30.0% 0.0% 10.0% 30.0% 10.0% 20.0% 20.0% 0.0%
100: 25.0% 26.0% 21.0% 26.0% 15.0% 25.0% 30.0% 30.0% 29.0% 20.0%
1000: 24.7% 23.8% 21.0% 25.4% 25.1% 24.2% 25.7% 22.9% 24.0% 23.8%
10000: 23.4% 24.6% 24.3% 24.3% 24.7% 24.6% 24.6% 24.7% 24.1% 25.0%
$
</pre>
<p>
Accuracy improves fairly quickly:
<ul>
<li>
With 10 samples, our estimates are accurate to within about 15%.
<li>
With 100 samples, our estimates are accurate to within about 5%.
<li>
With 1,000 samples, our estimates are accurate to within about 3%.
<li>
With 10,000 samples, our estimates are accurate to within about 1%.</ul>
<p>
Because we are estimating only the percentage of Peter faces,
not the total number, the accuracy (also measured in percentages)
does not depend on the total number of M&Ms, only on the number of samples.
So 10,000 samples is enough to get roughly 1% accuracy whether we have
100,000 M&Ms, 1 million M&Ms, or even 100 billion M&Ms!
In the last scenario, we have 1% accuracy despite only sampling 0.00001% of the M&Ms.
<p>
<b>The magic of sampling is that we can derive accurate estimates
about a very large population using a relatively small number of samples.</b>
<p>
Sampling turns many one-off estimations into jobs that are feasible to do by hand.
For example, suppose we are considering revising an error-prone API
and want to estimate how often that API is used incorrectly.
If we have a way to randomly sample uses of the API
(maybe <code>grep -Rn pkg.Func . | shuffle -m 100</code>),
then manually checking 100 of them will give us an estimate
that’s accurate to within 5% or so.
And checking 1,000 of them, which may not take more than an hour or so
if they’re easy to eyeball, improves the accuracy to 1.5% or so.
Real data to decide an important question
is usually well worth a small amount of manual effort.
<p>
For the kinds of decisions I look at related to Go,
this approach comes up all the time:
What fraction of <code>for</code> loops in real code have a <a href="https://github.com/golang/go/discussions/56010">loop scoping bug</a>?
What fraction of warnings by a new <code>go</code> <code>vet</code> check are false positives?
What fraction of modules have no dependencies?
These are drawn from my experience, and so they may seem specific to Go
or to language development, but once you realize that
sampling makes accurate estimates so easy to come by,
all kinds of uses present themselves.
Any time you have a large data set,
<pre>select * from data order by random() limit 1000;
</pre>
<p>
is a very effective way to get a data set you can analyze by hand
and still derive many useful conclusions from.
<a class=anchor href="#accuracy"><h2 id="accuracy">Accuracy</h2></a>
<p>
Let’s work out what accuracy we should expect from these estimates.
The brute force approach would be to run many samples of a given size
and calculate the accuracy for each.
<a href="https://go.dev/play/p/NWUOanCpFtl">This program</a> runs 1,000 trials of 100 samples each,
calculating the observed error for each estimate
and then printing them all in sorted order.
If we plot those points one after the other along the x axis,
we get a picture like this:
<p>
<img name="sample1" class="center pad" width=370 height=369 src="sample1.png" srcset="sample1.png 1x, sample1@2x.png 2x">
<p>
The <a href="https://9fans.github.io/plan9port/man/man1/gview.html">data viewer I’m using in this screenshot</a> has scaled the x-axis labels by
a factor of 1,000 (“x in thousands”).
Eyeballing the scatterplot, we can see that half the time the error
is under 3%, and 80% of the time the error is under 5½%.
<p>
We might wonder at this point whether the error
depends on the actual answer (24.3% in our programs so far).
It does: the error will be lower when the population is lopsided.
Obviously, if the M&Ms are 0% or 100% Peter faces,
our estimates will have no error at all.
In a slightly less degenerate case,
if the M&Ms are 1% or 99% Peter faces, the most likely estimate
from just a few samples is 0% or 100%, which has only 1% error.
It turns out that, in general, the error is maximized when
the actual fraction is 50%,
so <a href="https://go.dev/play/p/Vm2s1SwlKKT">we’ll use that</a> for the rest of the analysis.
<p>
With an actual fraction of 50%, 1,000 sorted errors
from estimating by sampling 100 values look like:
<p>
<img name="sample2" class="center pad" width=369 height=369 src="sample2.png" srcset="sample2.png 1x, sample2@2x.png 2x">
<p>
The errors are a bit larger.
Now half the time the error is under 4% and 80% of the time the error is under 6%.
Zooming in on the tail end of the plot produces:
<p>
<img name="sample3" class="center pad" width=390 height=368 src="sample3.png" srcset="sample3.png 1x, sample3@2x.png 2x">
<p>
We can see that 90% of the trials have error 8% or less,
95% of the trials have error 10% or less,
and 99% of the trials have error 12% or less.
The statistical way to phrase those statements
is that “a sample of size N = 100
produces a margin of error of 8% with 90% confidence,
10% with 95% confidence,
and 12% with 99% confidence.”
<p>
Instead of eyeballing the graphs, we can <a href="https://go.dev/play/p/Xq7WMyrNWxq">update the program</a>
to compute these numbers directly.
<pre>$ go run sample.go
N = 10: 90%: 30.00% 95%: 30.00% 99%: 40.00%
N = 100: 90%: 9.00% 95%: 11.00% 99%: 13.00%
N = 1000: 90%: 2.70% 95%: 3.20% 99%: 4.30%
N = 10000: 90%: 0.82% 95%: 0.98% 99%: 1.24%
$
</pre>
<p>
There is something meta about using sampling (of trials) to estimate the errors introduced
by sampling of an actual distribution.
What about the error being introduced by sampling the errors?
We could instead write a program to count all possible outcomes
and calculate the exact error distribution,
but counting won’t work for larger sample sizes.
Luckily, others have done the math for us
and even implemented the relevant functions
in Go’s standard <a href="https://pkg.go.dev/math">math package</a>.
The margin of error for a given confidence level
and sample size is:
<pre>func moe(confidence float64, N int) float64 {
	return math.Erfinv(confidence) / math.Sqrt(2 * float64(N))
}
</pre>
<p>
That lets us compute the table <a href="https://go.dev/play/p/DKeNfDwLmJZ">more directly</a>.
<pre>$ go run sample.go
N = 10: 90%: 26.01% 95%: 30.99% 99%: 40.73%
N = 20: 90%: 18.39% 95%: 21.91% 99%: 28.80%
N = 50: 90%: 11.63% 95%: 13.86% 99%: 18.21%
N = 100: 90%: 8.22% 95%: 9.80% 99%: 12.88%
N = 200: 90%: 5.82% 95%: 6.93% 99%: 9.11%
N = 500: 90%: 3.68% 95%: 4.38% 99%: 5.76%
N = 1000: 90%: 2.60% 95%: 3.10% 99%: 4.07%
N = 2000: 90%: 1.84% 95%: 2.19% 99%: 2.88%
N = 5000: 90%: 1.16% 95%: 1.39% 99%: 1.82%
N = 10000: 90%: 0.82% 95%: 0.98% 99%: 1.29%
N = 20000: 90%: 0.58% 95%: 0.69% 99%: 0.91%
N = 50000: 90%: 0.37% 95%: 0.44% 99%: 0.58%
N = 100000: 90%: 0.26% 95%: 0.31% 99%: 0.41%
$
</pre>
<p>
We can also reverse the equation to compute the necessary
sample size from a given confidence level and margin of error:
<pre>func N(confidence, moe float64) int {
	return int(math.Ceil(0.5 * math.Pow(math.Erfinv(confidence)/moe, 2)))
}
</pre>
<p>
That lets us <a href="https://go.dev/play/p/Y81_FORHvw5">compute this table</a>.
<pre>$ go run sample.go
moe = 5%: 90%: 271 95%: 385 99%: 664
moe = 2%: 90%: 1691 95%: 2401 99%: 4147
moe = 1%: 90%: 6764 95%: 9604 99%: 16588
$
</pre>
<a class=anchor href="#limitations"><h2 id="limitations">Limitations</h2></a>
<p>
To accurately estimate the fraction of items with
a given property, like M&Ms with Peter faces,
each item must have the same chance of being selected,
as each M&M did.
Suppose instead that we had ten bags of M&Ms:
nine one-pound bags with 500 M&Ms each,
and a small bag containing the 37 M&Ms we used before.
If we want to estimate the fraction of M&Ms with
Peter faces, it would not work to sample by
first picking a bag at random
and then picking an M&M at random from the bag.
The chance of picking any specific M&M from a one-pound bag
would be 1/10 × 1/500 = 1/5,000, while the chance
of picking any specific M&M from the small bag would be
1/10 × 1/37 = 1/370.
We would end up with an estimate of around 9/370 = 2.4% Peter faces,
even though the actual answer is 9/(9×500+37) = 0.2% Peter faces.
<p>
The problem here is not the kind of random sampling error
that we computed in the previous section.
Instead it is a systematic error caused by a sampling mechanism
that does not align with the statistic being estimated.
We could recover an accurate estimate by weighting
an M&M found in the small bag as only w = 37/500 of an M&M
in both the numerator and denominator of any estimate.
For example, if we picked 100 M&Ms with replacement from each bag
and found 24 Peter faces in the small bag,
then instead of 24/1000 = 2.4% we would compute 24w/(900+100w) = 0.2%.
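<p>
The same correction can be written as a short Go sketch. It weights each draw by the number of M&Ms in its bag that the draw stands for (bag size divided by draws taken), which differs from the w = 37/500 weighting above only by a common factor. The type and function names are invented for the example, and it assumes, as the numbers above imply, that the one-pound bags contain no Peter faces.

```go
package main

import "fmt"

// bag describes one bag in the example.
type bag struct {
	size    int // total M&Ms in the bag
	sampled int // draws taken from this bag (with replacement)
	faces   int // Peter faces among those draws
}

// exampleBags returns nine one-pound bags with no Peter faces (an
// assumption) plus the small bag of 37, with 100 draws from each.
func exampleBags() []bag {
	var bags []bag
	for i := 0; i < 9; i++ {
		bags = append(bags, bag{size: 500, sampled: 100, faces: 0})
	}
	return append(bags, bag{size: 37, sampled: 100, faces: 24})
}

// naive treats every draw equally, ignoring bag sizes.
func naive(bags []bag) float64 {
	faces, draws := 0, 0
	for _, b := range bags {
		faces += b.faces
		draws += b.sampled
	}
	return float64(faces) / float64(draws)
}

// weighted scales each draw by size/sampled, the number of M&Ms in the
// bag that one draw represents.
func weighted(bags []bag) float64 {
	var faces, total float64
	for _, b := range bags {
		w := float64(b.size) / float64(b.sampled)
		faces += w * float64(b.faces)
		total += w * float64(b.sampled)
	}
	return faces / total
}

func main() {
	bags := exampleBags()
	fmt.Printf("naive: %.1f%%  weighted: %.1f%%\n", 100*naive(bags), 100*weighted(bags))
}
```

This prints 2.4% for the naive estimate and 0.2% for the weighted one, matching the figures above.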
<p>
As a less contrived example,
<a href="https://go.dev/blog/pprof">Go’s memory profiler</a>
aims to sample approximately one allocation per half-megabyte allocated
and then derive statistics about where programs allocate memory.
Roughly speaking, to do this the profiler maintains a sampling trigger,
initialized to a random number between 0 and one million.
Each time a new object is allocated,
the profiler decrements the trigger by the size of the object.
When an allocation decrements the trigger below zero,
the profiler samples that allocation
and then resets the trigger to a new random number
between 0 and one million.
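<p>
The trigger mechanism can be sketched in Go as follows. This follows the simplified description above, not the Go runtime's actual code; the type <code>sampler</code>, its methods, and the workload of 64-byte objects are all hypothetical.

```go
package main

import (
	"fmt"
	"math/rand"
)

// sampler follows the simplified description in the text, not the Go
// runtime's implementation: a countdown of bytes until the next sample.
type sampler struct {
	rng     *rand.Rand
	trigger int64 // bytes remaining before the next sampled allocation
}

func newSampler(seed int64) *sampler {
	s := &sampler{rng: rand.New(rand.NewSource(seed))}
	s.reset()
	return s
}

// reset picks a new random trigger between 0 and one million.
func (s *sampler) reset() {
	s.trigger = s.rng.Int63n(1_000_000)
}

// alloc records an allocation of size bytes and reports whether the
// profiler would sample it.
func (s *sampler) alloc(size int64) bool {
	s.trigger -= size
	if s.trigger < 0 {
		s.reset()
		return true
	}
	return false
}

func main() {
	s := newSampler(1)
	// Hypothetical workload: 1 GiB allocated as 64-byte objects.
	const objSize, total = 64, 1 << 30
	samples := 0
	for allocated := int64(0); allocated < total; allocated += objSize {
		if s.alloc(objSize) {
			samples++
		}
	}
	fmt.Printf("%d samples, about one per %d bytes allocated\n", samples, total/samples)
}
```

Because the trigger averages half a million, the gap between samples averages roughly half a megabyte.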
<p>
This byte-based sampling means that to estimate the
fraction of bytes allocated in a given function,
the profiler can divide the total sampled bytes allocated in that function
by the total sampled bytes allocated in the entire program.
Using the same approach to
estimate the fraction of <i>objects</i> allocated in a given function
would be inaccurate: it would overcount large objects and undercount
small ones, because large objects are more likely to be sampled.
In order to recover accurate statistics about allocation counts,
the profiler applies a size-based weighting function
during the calculation, just as in the M&M example.
(This is the reverse of the situation with the M&Ms:
we are randomly sampling individual bytes of allocated memory
but now want statistics about their “bags”.)
<p>
It is not always possible to undo skewed sampling,
and the skew makes margin of error calculation
more difficult too.
It is almost always better to make sure that the
sampling is aligned with the statistic you want to compute.
Go’s Version Control Historytag:research.swtch.com,2012:research.swtch.com/govcs2022-02-14T10:00:00-05:002022-02-14T10:02:00-05:00A tour of Go’s four version control systems.
<p>
Every once in a while someone notices the first commit in the Go repo is dated 1972:
<pre>% git log --reverse --stat
commit 7d7c6a97f815e9279d08cfaea7d5efb5e90695a8
Author: Brian Kernighan <bwk>
AuthorDate: Tue Jul 18 19:05:45 1972 -0500
Commit: Brian Kernighan <bwk>
CommitDate: Tue Jul 18 19:05:45 1972 -0500
hello, world
R=ken
DELTA=7 (7 added, 0 deleted, 0 changed)
src/pkg/debug/macho/testdata/hello.b | 7 +++++++
1 file changed, 7 insertions(+)
...
</pre>
<p>
Obviously something silly is going on, and people usually stop there.
But Go’s actual version control history is richer and more interesting.
For example, there are a few more fake commits and then the fifth
commit is the first real one:
<pre>commit 18c5b488a3b2e218c0e0cf2a7d4820d9da93a554
Author: Robert Griesemer <gri@golang.org>
AuthorDate: Sun Mar 2 20:47:34 2008 -0800
Commit: Robert Griesemer <gri@golang.org>
CommitDate: Sun Mar 2 20:47:34 2008 -0800
Go spec starting point.
SVN=111041
doc/go_spec | 1197 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 1197 insertions(+)
</pre>
<p>
Why does that commit have a different trailer than the four fake commits?
<a class=anchor href="#subversion"><h2 id="subversion">Subversion</h2></a>
<p>
Go started out using Subversion, as part of a small experiment to evaluate
Subversion for wider use inside Google. The experiment did not result
in wider Subversion use, but the <code>SVN=111041</code> tag in the
<a href="https://go.googlesource.com/go/+/18c5b488a3b2e218c0e0cf2a7d4820d9da93a554">first real commit above</a>
records that on the original Subversion server, that Go commit
was revision 111,041. (Subversion assigns revision numbers in increasing order,
and the server was a small monorepo being used for a few other projects besides Go.
There were not 111,040 other Go commits that didn’t make it out.)
<a class=anchor href="#perforce"><h2 id="perforce">Perforce</h2></a>
<p>
The SVN tags continue in the logs <a href="https://go.googlesource.com/go/+/05caa7f82030327ccc9ae63a2b0121a029286501">until July 2008</a>,
where we see one last SVN commit and then a new form:
<pre>commit 777ee7163bba96f2c9b3dfe135d8ad4ab837c062
Author: Rob Pike <r@golang.org>
AuthorDate: Mon Jul 21 16:18:04 2008 -0700
Commit: Rob Pike <r@golang.org>
CommitDate: Mon Jul 21 16:18:04 2008 -0700
map delete
SVN=128258
doc/go_lang.txt | 6 ++++++
1 file changed, 6 insertions(+)
commit 05caa7f82030327ccc9ae63a2b0121a029286501
Author: Rob Pike <r@golang.org>
AuthorDate: Mon Jul 21 17:10:49 2008 -0700
Commit: Rob Pike <r@golang.org>
CommitDate: Mon Jul 21 17:10:49 2008 -0700
help management of empty pkg and lib directories in perforce
R=gri
DELTA=4 (4 added, 0 deleted, 0 changed)
OCL=13328
CL=13328
lib/place-holder | 2 ++
pkg/place-holder | 2 ++
src/cmd/gc/mksys.bash | 0
3 files changed, 4 insertions(+)
</pre>
<p>
This was the first commit after Go moved from a lightly-used
Subversion server to a lightly-used Perforce server.
Google was a <a href="https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/39983.pdf">very heavy Perforce user</a>
due to its use of a <a href="https://dl.acm.org/doi/pdf/10.1145/2854146">single giant monorepo for most code</a>.
Go was not part of that monorepo world, and at the time,
projects that didn’t have to be on the heavily-loaded main server
were hosted instead on secondary ones.
<p>
At the transition to Perforce, you can see the introduction of telltale
<code>DELTA=</code>, <code>OCL=</code>, and <code>CL=</code> tags.
Perforce imposes a linear ordering on change lists, like Subversion revisions,
but each change list ends up with two sequence numbers: it is assigned
one when it is created and uses that number while it is a local, pending change list,
including in our code review systems.
Then, when it is submitted and becomes part of the official history,
it is assigned a new number, to keep submitted changes in order.
The <code>OCL=</code> is the original change list number, while the <code>CL=</code> is the final one.
The <code>R=</code> line means that <code>gri</code> (Robert Griesemer) reviewed the change before
it was submitted,
using our internal code review system, then called Mondrian.
Because the server was so lightly used (and presumably the review so quick),
no new change lists had been
created or submitted while this one was pending, and its final submission
reused its original number instead of needing to create a new one.
<p>
Many other changes have the same <code>OCL=</code> and <code>CL=</code> because they were
created and submitted in a single Perforce command, without review,
like <a href="https://go.googlesource.com/go/+/c1f5eda7a2465dae196d1fa10baf6bfa9253808a">the next one</a>:
<pre>commit c1f5eda7a2465dae196d1fa10baf6bfa9253808a
Author: Rob Pike <r@golang.org>
AuthorDate: Mon Jul 21 18:06:39 2008 -0700
Commit: Rob Pike <r@golang.org>
CommitDate: Mon Jul 21 18:06:39 2008 -0700
change date
OCL=13331
CL=13331
doc/go_lang.txt | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
</pre>
<p>
You can also see from this that the <code>DELTA=</code> line is added
by the code review system, not Perforce:
this unreviewed change doesn’t have it.
(The last two lines are being provided by the <code>git log --stat</code>
command as we explore the Git version of these changes.)
<p>
The bulk of the pre-open-source development of Go was done
on that Perforce server.
<a class=anchor href="#mercurial"><h2 id="mercurial">Mercurial</h2></a>
<p>
The <code>OCL=</code> and <code>CL=</code> lines continue until we get to <a href="https://go.googlesource.com/go/+/b74fd8ecb17c1959bbf2dbba6ccb8bae6bfabeb8">October 2009</a>, when they switch to a new form:
<pre>commit 942d6590d9005f89e971ed5af0374439a264a20e
Author: Kai Backman <kaib@golang.org>
AuthorDate: Fri Oct 23 11:03:16 2009 -0700
Commit: Kai Backman <kaib@golang.org>
CommitDate: Fri Oct 23 11:03:16 2009 -0700
one more argsize fix. we were copying with the correct
alignment but not enough (duh).
R=rsc
APPROVED=rsc
DELTA=16 (13 added, 0 deleted, 3 changed)
OCL=36020
CL=36024
src/cmd/5g/ggen.c | 2 +-
test/arm-pass.txt | 17 +++++++++++++++--
2 files changed, 16 insertions(+), 3 deletions(-)
commit b74fd8ecb17c1959bbf2dbba6ccb8bae6bfabeb8
Author: Kai Backman <kaib@golang.org>
AuthorDate: Fri Oct 23 12:43:01 2009 -0700
Commit: Kai Backman <kaib@golang.org>
CommitDate: Fri Oct 23 12:43:01 2009 -0700
fix build issue cause by transition to hg
R=rsc
http://go/go-review/1013012
src/make-arm.bash | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
</pre>
<p>
That commit introduces yet another kind of trailer line,
which says <code>http://go/go-review/1013012</code>.
At one point that link pointed, inside Google,
to an internal version of the Rietveld App Engine app,
which we used for code review on the way to submission
to a private Mercurial repo on Google Code Project Hosting.
We were doing that conversion, in October 2009, as part
of the preparation to open-source Go in November.
<p>
It was at this point, during the conversion to Mercurial,
that I introduced the “hello, world” commits.
The Subversion and Perforce repos had been Google-internal servers,
and we had stored essentially all the Go code in the world
alongside the Go implementation.
We each had “user directories” like <code>/usr/rsc</code> in the repo.
Those directories contained various code that wasn’t going
out in the release, whether because it was targeting Google-internal
technologies or because it was simply not worth publishing.
(You can still see references to <code>/usr</code> in a few commit messages
that did make it out.)
<p>
For the conversion, I ended up with a directory full of patches,
one per commit, and a script to run the patches to create a new
Mercurial repository.
Then I edited the patches (with automated help) to remove the files that weren’t
being released, including the entire <code>/usr</code> tree,
and to add the new open-source copyright notices to every file.
<p>
This took me about a week, and it was annoyingly difficult.
If I removed a file from some patches but then it got renamed
in others, Mercurial would complain about the rename using
a file that hadn’t been created in the first place.
And if I added a copyright notice when the file was created,
I had to be careful to update later patches to not have
merge conflicts when changing the top of the file.
And so on. I remember thinking that it was like being
a con man who constructs an elaborate false identity
and struggles to keep it all straight.
<a class=anchor href="#hello_world"><h2 id="hello_world">Hello, world</h2></a>
<p>
A month earlier, I had created object file parsing packages
like <a href="https://go.dev/pkg/debug/macho">debug/macho</a>,
and as test data for each of those packages I <a href="https://go.googlesource.com/go/+/bf69025825fd2b8e7aac01f27d5c974bd30af542">checked in an object
file</a> compiled from a trivial “hello, world” program, along with the
source code for them:
<pre>commit bf69025825fd2b8e7aac01f27d5c974bd30af542
Author: Russ Cox <rsc@golang.org>
AuthorDate: Fri Sep 18 11:49:22 2009 -0700
Commit: Russ Cox <rsc@golang.org>
CommitDate: Fri Sep 18 11:49:22 2009 -0700
Mach-O file reading
R=r
DELTA=784 (784 added, 0 deleted, 0 changed)
OCL=34715
CL=34788
src/pkg/debug/macho/Makefile | 12 +
src/pkg/debug/macho/file.go | 374 +++++++++++++++++++++
src/pkg/debug/macho/file_test.go | 159 +++++++++
src/pkg/debug/macho/macho.go | 230 +++++++++++++
src/pkg/debug/macho/testdata/gcc-386-darwin-exec | Bin 0 -> 12588 bytes
src/pkg/debug/macho/testdata/gcc-amd64-darwin-exec | Bin 0 -> 8512 bytes
.../macho/testdata/gcc-amd64-darwin-exec-debug | Bin 0 -> 4540 bytes
7 files changed, 775 insertions(+)
</pre>
<p>
This was the original commit that introduced <code>src/pkg/debug/macho/testdata/hello.c</code>, of course.
As I added copyright notices to files, it seemed wrong to
add a copyright notice to that <code>hello.c</code> file.
Instead, since I had the repo split into this patch-file-per-commit form,
it was easy to create a few fake commits that showed at least part of
the real history of that program, as an Easter egg for people who looked that closely:
<pre>commit 7d7c6a97f815e9279d08cfaea7d5efb5e90695a8
Author: Brian Kernighan <bwk>
AuthorDate: Tue Jul 18 19:05:45 1972 -0500
Commit: Brian Kernighan <bwk>
CommitDate: Tue Jul 18 19:05:45 1972 -0500
hello, world
R=ken
DELTA=7 (7 added, 0 deleted, 0 changed)
src/pkg/debug/macho/testdata/hello.b | 7 +++++++
1 file changed, 7 insertions(+)
commit 0bb0b61d6a85b2a1a33dcbc418089656f2754d32
Author: Brian Kernighan <bwk>
AuthorDate: Sun Jan 20 01:02:03 1974 -0400
Commit: Brian Kernighan <bwk>
CommitDate: Sun Jan 20 01:02:03 1974 -0400
convert to C
R=dmr
DELTA=6 (0 added, 3 deleted, 3 changed)
src/pkg/debug/macho/testdata/hello.b | 7 -------
src/pkg/debug/macho/testdata/hello.c | 3 +++
2 files changed, 3 insertions(+), 7 deletions(-)
commit 0744ac969119db8a0ad3253951d375eb77cfce9e
Author: Brian Kernighan <research!bwk>
AuthorDate: Fri Apr 1 02:02:04 1988 -0500
Commit: Brian Kernighan <research!bwk>
CommitDate: Fri Apr 1 02:02:04 1988 -0500
convert to Draft-Proposed ANSI C
R=dmr
DELTA=5 (2 added, 0 deleted, 3 changed)
src/pkg/debug/macho/testdata/hello.c | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)
commit d82b11e4a46307f1f1415024f33263e819c222b8
Author: Brian Kernighan <bwk@research.att.com>
AuthorDate: Fri Apr 1 02:03:04 1988 -0500
Commit: Brian Kernighan <bwk@research.att.com>
CommitDate: Fri Apr 1 02:03:04 1988 -0500
last-minute fix: convert to ANSI C
R=dmr
DELTA=3 (2 added, 0 deleted, 1 changed)
src/pkg/debug/macho/testdata/hello.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
</pre>
<p>
The <a href="https://go.googlesource.com/go/+/7d7c6a97f815e9279d08cfaea7d5efb5e90695a8">July 1972 commit</a> shows the very first “hello, world” program,
copied from Brian Kernighan’s
“<a href="https://www.bell-labs.com/usr/dmr/www/btut.html">A Tutorial Introduction to the Language B</a>”:
<pre>main( ) {
    extrn a, b, c;
    putchar(a); putchar(b); putchar(c); putchar('!*n');
}
a 'hell';
b 'o, w';
c 'orld';
</pre>
<p>
Various online sources refer to this as a “book”, but it definitely was not.
It was a printed, 17-page document that was later included as half of January 1973’s
“<a href="https://www.bell-labs.com/usr/dmr/www/bintro.html">Bell Laboratories Computing Science Technical Report #8: The Programming Language B</a>”
(still not even close to a book!).
The “hello, world” program appears in section 7,
preceded by the less catchy “hi!” program in section 6.
As I write this blog post in 2022, I cannot find any online
reference for the B tutorial being originally dated July 1972,
but I believe I must have had a good reason.
<p>
The <a href="https://go.googlesource.com/go/+/0bb0b61d6a85b2a1a33dcbc418089656f2754d32">January 1974 commit</a>
converts the program to C, as found in Kernighan’s “<a href="https://www.bell-labs.com/usr/dmr/www/ctut.pdf">Programming in C — A Tutorial</a>”.
The linked PDF is a retyped copy from Dennis Ritchie, without a date,
but Ritchie’s “<a href="https://www.bell-labs.com/usr/dmr/www/cman74.pdf">C Reference Manual</a>” technical memo
dated January 15, 1974 cites Kernighan’s tutorial as “Unpublished internal memorandum, Bell Laboratories, 1974”,
implying that the tutorial must have been written in January as well. That C program was shorter than
what we are used to today, but much nicer than the B program:
<pre>main( ) {
    printf("hello, world");
}
</pre>
<p>
I skipped over the presentation in the first edition of <i>The C Programming Language</i>, which looks like:
<pre>main() {
    printf("hello, world\n");
}
</pre>
<p>
The <a href="https://go.googlesource.com/go/+/0744ac969119db8a0ad3253951d375eb77cfce9e">April 1988 commit</a>
shows the program from the “Draft-Proposed ANSI C” version of the second edition of <i>The C Programming Language</i>:
<pre>#include <stdio.h>

main()
{
    printf("hello, world\n");
}
</pre>
<p>
The <a href="https://go.googlesource.com/go/+/d82b11e4a46307f1f1415024f33263e819c222b8">second April 1988 commit</a>
shows the final full ANSI C version we know today:
<pre>#include <stdio.h>

int
main(void)
{
    printf("hello, world\n");
    return 0;
}
</pre>
<p>
Wikipedia says the second edition was published in April 1988.
I didn’t have a day, but April 1 seemed appropriate.
I didn’t have a time for either one of these, so I used 02:03:04 for
the second commit and therefore 02:02:04 for the first.
<p>
The email addresses in the commits are also period-appropriate,
although Mondrian’s <code>R=</code> and <code>DELTA=</code> tags clearly are not:
there was no code review happening back then!
<p>
For what it’s worth, I’ve since heard from many people about
how the 1972 dates have broken various presentations and
analyses they’ve done on the Go repository as compared to
other projects. Oops! My apologies.
(I also heard from someone who was analyzing Mercurial
repos for “branchiness” and decided to use the Go
repo as a large test case.
Their program said it had branchiness 0 and they
spent a while debugging their program before
realizing that the repo really was completely linear,
thanks to our “rebase and merge” policy.)
<a class=anchor href="#public_mercurial"><h2 id="public_mercurial">Public Mercurial</h2></a>
<p>
We published the Google Code Mercurial repo on November 10, 2009,
and to mark the occasion we added one more Easter egg.
A footnote in Brian Kernighan and Rob Pike’s 1984 book <i>The Unix Programming Environment</i>
says:<blockquote>
<p>
Ken Thompson was once asked what he would do differently
if he were redesigning the <small>UNIX</small> system.
His reply: “I’d spell <code>creat</code> with an <code>e</code>.”</blockquote>
<p>
This refers to the <a href="https://man7.org/linux/man-pages/man2/open.2.html">creat(2) system call</a>
and the <code>O_CREAT</code> file open flag.
<p>
As soon as the release was public,
<a href="https://go.googlesource.com/go/+/c90d392ce3d3203e0c32b3f98d1e68c4c2b4c49b">Ken mailed me a change</a> fixing this mistake:
<pre>commit c90d392ce3d3203e0c32b3f98d1e68c4c2b4c49b
Author: Ken Thompson <ken@golang.org>
AuthorDate: Tue Nov 10 15:05:15 2009 -0800
Commit: Ken Thompson <ken@golang.org>
CommitDate: Tue Nov 10 15:05:15 2009 -0800
spell it with an "e"
R=rsc
http://go/go-review/1025037
src/pkg/os/file.go | 1 +
1 file changed, 1 insertion(+)
</pre>
<p>
Sadly, we didn’t execute the joke perfectly, in that the code review went to
our internal Rietveld server instead of the public one.
So the <a href="https://go.googlesource.com/go/+/44fb865a484b8e12adfa0a1413eacc807cec085b">very next commit</a>
updated the code review server in our configuration.
<p>
But here, for posterity, is the review, preserved in my email:
<p>
<img name="creat" class="center pad" width=776 height=1008 src="creat.png" srcset="creat.png 1x, creat@2x.png 2x">
<a class=anchor href="#git"><h2 id="git">Git</h2></a>
<p>
So things stood from November 2009 until late 2014,
when we knew that Google Code Project Hosting was
going to shut down and we needed a new home.
After investigating a few options, we ended up using
<a href="https://www.gerritcodereview.com/">Gerrit Code Review</a>, which has been a fantastic choice.
Many people think of Go as being hosted on GitHub,
but GitHub is only primary for our issue tracker:
the official primary copy of the sources is on <code>go.googlesource.com</code>.
<p>
You can see the transition from Mercurial to Git
in the logs when the <code>R=</code> lines end, followed by
a few completely unreviewed commits, and then
<code>Reviewed-by:</code> lines begin:
<pre>commit 94151eb2799809ece7e44ce3212aa3cbb9520849
Author: Russ Cox <rsc@golang.org>
AuthorDate: Fri Dec 5 21:33:07 2014 -0500
Commit: Russ Cox <rsc@golang.org>
CommitDate: Fri Dec 5 21:33:07 2014 -0500
encoding/xml: remove SyntaxError.Byte
It is unused. It was introduced in the CL that added InputOffset.
I suspect it was an editing mistake.
LGTM=bradfitz
R=bradfitz
CC=golang-codereviews
https://golang.org/cl/182580043
src/encoding/xml/xml.go | 1 -
1 file changed, 1 deletion(-)
commit 258f53dee33b9055ea168cb186f8c076edee5905
Author: David Symonds <dsymonds@golang.org>
AuthorDate: Mon Dec 8 13:50:49 2014 +1100
Commit: David Symonds <dsymonds@golang.org>
CommitDate: Mon Dec 8 13:50:49 2014 +1100
remove .hgtags.
.hgtags | 140 ----------------------------------------------------------------
1 file changed, 140 deletions(-)
commit 369873c6e5d00314ae30276363f58e5af11b149c
Author: David Symonds <dsymonds@golang.org>
AuthorDate: Mon Dec 8 13:50:49 2014 +1100
Commit: David Symonds <dsymonds@golang.org>
CommitDate: Mon Dec 8 13:50:49 2014 +1100
convert .hgignore to .gitignore.
.hgignore => .gitignore | 9 +--------
1 file changed, 1 insertion(+), 8 deletions(-)
commit f33fc0eb95be84f0a688a62e25361a117e5b995b
Author: David Symonds <dsymonds@golang.org>
AuthorDate: Mon Dec 8 13:53:11 2014 +1100
Commit: David Symonds <dsymonds@golang.org>
CommitDate: Mon Dec 8 13:53:11 2014 +1100
cmd/dist: convert dist from Hg to Git.
src/cmd/dist/build.c | 100 ++++++++++++++++++++++++++++++---------------------
1 file changed, 59 insertions(+), 41 deletions(-)
commit 26399948e3402d3512cb14fe5901afaef54482fa
Author: David Symonds <dsymonds@golang.org>
AuthorDate: Mon Dec 8 11:39:11 2014 +1100
Commit: David Symonds <dsymonds@golang.org>
CommitDate: Mon Dec 8 04:42:22 2014 +0000
add bin/ to .gitignore.
Change-Id: I5c788d324e56ca88366fb54b67240cebf5dced2c
Reviewed-on: https://go-review.googlesource.com/1171
Reviewed-by: Andrew Gerrand <adg@golang.org>
.gitignore | 1 +
1 file changed, 1 insertion(+)
</pre>
<p>
As part of the conversion from Mercurial to Git, we did
not add visible lines to the commit bodies showing the
old Mercurial hashes, but we did record them in the
underlying Git commit objects. For example:
<pre>% git cat-file -p 7d7c6a97f815e9279d08cfaea7d5efb5e90695a8
tree e06bd601885e16ad3d72c2a8c9b411889b2e478e
author Brian Kernighan <bwk> 80352345 -0500
committer Brian Kernighan <bwk> 80352345 -0500
golang-hg f6182e5abf5eb0c762dddbb18f8854b7e350eaeb
hello, world
R=ken
DELTA=7 (7 added, 0 deleted, 0 changed)
%
</pre>
<p>
The <code>golang-hg</code> line records the original Mercurial commit hash.
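<p>
As a rough illustration of how a tool could recover that hash, the Go sketch below scans the header lines of a raw commit object, as printed by <code>git cat-file -p</code>, and returns the value of a named header. The function name <code>extraHeader</code> is made up for this example, and the parser is deliberately simple: it assumes single-line headers and stops at the blank line that separates headers from the commit message.

```go
package main

import (
	"fmt"
	"strings"
)

// extraHeader returns the value of the named header line in a raw Git
// commit object. Headers end at the first blank line; everything after
// that is the commit message and is never searched.
// (Hypothetical helper; multi-line headers are not handled.)
func extraHeader(raw, name string) (string, bool) {
	prefix := name + " "
	for _, line := range strings.Split(raw, "\n") {
		if line == "" {
			break // end of headers; the message follows
		}
		if strings.HasPrefix(line, prefix) {
			return strings.TrimPrefix(line, prefix), true
		}
	}
	return "", false
}

func main() {
	// The commit object shown above, with the blank line that Git
	// places between the headers and the message.
	raw := "tree e06bd601885e16ad3d72c2a8c9b411889b2e478e\n" +
		"author Brian Kernighan <bwk> 80352345 -0500\n" +
		"committer Brian Kernighan <bwk> 80352345 -0500\n" +
		"golang-hg f6182e5abf5eb0c762dddbb18f8854b7e350eaeb\n" +
		"\n" +
		"hello, world\n"

	hash, ok := extraHeader(raw, "golang-hg")
	fmt.Println(hash, ok)
}
```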
<p>
And that’s the end of the story, until we move to a fifth
version control system at some point in the future.
Our Software Dependency Problem
tag:research.swtch.com,2012:research.swtch.com/deps
2019-01-23T11:00:00-05:00
2019-01-23T11:02:00-05:00
Download and run code from strangers on the internet. What could go wrong?
<p>
For decades, discussion of software reuse was far more common than actual software reuse.
Today, the situation is reversed: developers reuse software written by others every day,
in the form of software dependencies,
and the situation goes mostly unexamined.
<p>
My own background includes a decade of working with
Google’s internal source code system,
which treats software dependencies as a first-class concept,<a class=footnote id=body1 href="#note1"><sup>1</sup></a>
and also developing support for
dependencies in the Go programming language.<a class=footnote id=body2 href="#note2"><sup>2</sup></a>
<p>
Software dependencies carry with them
serious risks that are too often overlooked.
The shift to easy, fine-grained software reuse has happened so quickly
that we do not yet understand the best practices for choosing
and using dependencies effectively,
or even for deciding when they are appropriate and when not.
My purpose in writing this article is to raise awareness of the risks
and encourage more investigation of solutions.
<a class=anchor href="#what_is_a_dependency"><h2 id="what_is_a_dependency">What is a dependency?</h2></a>
<p>
In today’s software development world,
a <i>dependency</i> is additional code that you want to call from your program.
Adding a dependency avoids repeating work already done:
designing, writing, testing, debugging, and maintaining a specific
unit of code.
In this article we’ll call that unit of code a <i>package</i>;
some systems use terms like library or module instead of package.
<p>
Taking on externally-written dependencies is an old practice:
most programmers have at one point in their careers
had to go through the steps of manually downloading and installing
a required library, like C’s PCRE or zlib, or C++’s Boost or Qt,
or Java’s JodaTime or JUnit.
These packages contain high-quality, debugged code
that required significant expertise to develop.
For a program that needs the functionality provided by one of these packages,
the tedious work of manually downloading, installing, and updating
the package
is easier than the work of redeveloping that functionality from scratch.
But the high fixed costs of reuse
mean that manually-reused packages tend to be big:
a tiny package would be easier to reimplement.
<p>
A <i>dependency manager</i>
(sometimes called a package manager)
automates the downloading and installation of dependency packages.
As dependency managers
make individual packages easier to download and install,
the lower fixed costs make
smaller packages economical to publish and reuse.
<p>
For example, the Node.js dependency manager NPM provides
access to over 750,000 packages.
One of them, <code>escape-string-regexp</code>,
provides a single function that escapes regular expression
operators in its input.
The entire implementation is:
<pre>var matchOperatorsRe = /[|\\{}()[\]^$+*?.]/g;

module.exports = function (str) {
    if (typeof str !== 'string') {
        throw new TypeError('Expected a string');
    }
    return str.replace(matchOperatorsRe, '\\$&');
};
</pre>
<p>
Before dependency managers, publishing an eight-line code library
would have been unthinkable: too much overhead for too little benefit.
But NPM has driven the overhead approximately to zero,
with the result that nearly-trivial functionality
can be packaged and reused.
In late January 2019, the <code>escape-string-regexp</code> package
is explicitly depended upon by almost a thousand
other NPM packages,
not to mention all the packages developers write for their own use
and don’t share.
<p>
Dependency managers now exist for essentially every programming language.
Maven Central (Java),
NuGet (.NET),
Packagist (PHP),
PyPI (Python),
and RubyGems (Ruby)
each host over 100,000 packages.
The arrival of this kind of fine-grained, widespread software reuse
is one of the most consequential shifts in software development
over the past two decades.
And if we’re not more careful, it will lead to serious problems.
<a class=anchor href="#what_could_go_wrong"><h2 id="what_could_go_wrong">What could go wrong?</h2></a>
<p>
A package, for this discussion, is code you download from the internet.
Adding a package as a dependency outsources the work of developing that
code—designing, writing, testing, debugging, and maintaining—to
someone else on the internet,
someone you often don’t know.
By using that code, you are exposing your own program
to all the failures and flaws in the dependency.
Your program’s execution now literally <i>depends</i>
on code downloaded from this stranger on the internet.
Presented this way, it sounds incredibly unsafe.
Why would anyone do this?
<p>
We do this because it’s easy,
because it seems to work,
because everyone else is doing it too,
and, most importantly, because
it seems like a natural continuation of
age-old established practice.
But there are important differences we’re ignoring.
<p>
Decades ago, most developers already
trusted others to write software they depended on,
such as operating systems and compilers.
That software was bought from known sources,
often with some kind of support agreement.
There was still a potential for bugs or outright mischief,<a class=footnote id=body3 href="#note3"><sup>3</sup></a>
but at least we knew who we were dealing with and usually
had commercial or legal recourse available.
<p>
The phenomenon of open-source software,
distributed at no cost over the internet,
has displaced many of those earlier software purchases.
When reuse was difficult, there were fewer projects publishing reusable code packages.
Even though their licenses typically disclaimed, among other things,
any “implied warranties of merchantability and fitness for
a particular purpose,”
the projects built up well-known reputations
that often factored heavily into people’s decisions about which to use.
The commercial and legal support for trusting our software sources
was replaced by reputational support.
Many common early packages still enjoy good reputations:
consider BLAS (published 1979), Netlib (1987), libjpeg (1991),
LAPACK (1992), HP STL (1994), and zlib (1995).
<p>
Dependency managers have scaled this open-source code reuse model down:
now, developers can share code at the granularity of
individual functions of tens of lines.
This is a major technical accomplishment.
Myriad packages are now available,
and writing a program can involve a large number of them,
but the commercial, legal, and reputational support mechanisms
for trusting the code have not carried over.
We are trusting more code with less justification for doing so.
<p>
The cost of adopting a bad dependency can be viewed
as the sum, over all possible bad outcomes,
of the cost of each bad outcome
multiplied by its probability of happening (risk).
<p>
<img name="deps-cost" class="center pad" width=383 height=95 src="deps-cost.png" srcset="deps-cost.png 1x, deps-cost@1.5x.png 1.5x, deps-cost@2x.png 2x, deps-cost@3x.png 3x, deps-cost@4x.png 4x">
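<p>
The sum is simple enough to sketch in a few lines of Go. Everything in this example is an assumption made for illustration: the <code>outcome</code> type, the outcome names, and especially the numbers, which are invented and not measurements of anything. The only point is that identical probabilities produce very different expected costs once the cost of each outcome changes.

```go
package main

import "fmt"

// outcome is one hypothetical bad result of adopting a dependency.
type outcome struct {
	name        string
	cost        float64 // cost if the outcome happens, in arbitrary units
	probability float64 // estimated chance that it happens
}

// expectedCost sums cost times probability over all outcomes.
func expectedCost(outcomes []outcome) float64 {
	total := 0.0
	for _, o := range outcomes {
		total += o.cost * o.probability
	}
	return total
}

func main() {
	// Same made-up probabilities, very different made-up costs.
	hobby := []outcome{
		{"crash on odd input", 1, 0.10},
		{"data leak", 5, 0.01},
	}
	production := []outcome{
		{"crash on odd input", 50000, 0.10},
		{"data leak", 2000000, 0.01},
	}
	fmt.Printf("hobby project: %.2f\n", expectedCost(hobby))
	fmt.Printf("production:    %.2f\n", expectedCost(production))
}
```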
<p>
The context where a dependency will be used
determines the cost of a bad outcome.
At one end of the spectrum is a personal hobby project,
where the cost of most bad outcomes
is near zero:
you’re just having fun, bugs have no real impact other than
wasting some time, and even debugging them can be fun.
So the risk probability almost doesn’t matter: it’s being multiplied by zero.
At the other end of the spectrum is production software
that must be maintained for years.
Here, the cost of a bug in
a dependency can be very high:
servers may go down,
sensitive data may be divulged,
customers may be harmed,
companies may fail.
High failure costs make it much more important
to estimate and then reduce any risk of a serious failure.
<p>
No matter what the expected cost,
experiences with larger dependencies
suggest some approaches for
estimating and reducing the risks of adding a software dependency.
It is likely that better tooling is needed to help reduce
the costs of these approaches,
much as dependency managers have focused to date on
reducing the costs of download and installation.
<a class=anchor href="#inspect_the_dependency"><h2 id="inspect_the_dependency">Inspect the dependency</h2></a>
<p>
You would not hire a software developer you’ve never heard of
and know nothing about.
You would learn more about them first:
check references, conduct a job interview,
run background checks, and so on.
Before you depend on a package you found on the internet,
it is similarly prudent
to learn a bit about it first.
<p>
A basic inspection can give you a sense
of how likely you are to run into problems trying to use this code.
If the inspection reveals likely minor problems,
you can take steps to prepare for or maybe avoid them.
If the inspection reveals major problems,
it may be best not to use the package:
maybe you’ll find a more suitable one,
or maybe you need to develop one yourself.
Remember that open-source packages are published
by their authors in the hope that they will be useful
but with no guarantee of usability or support.
In the middle of a production outage, you’ll be the one debugging it.
As the original GNU General Public License warned,
“The entire risk as to the quality and performance of the
program is with you.
Should the program prove defective, you assume the cost of all
necessary servicing, repair or correction.”<a class=footnote id=body4 href="#note4"><sup>4</sup></a>
<p>
The rest of this section outlines some considerations when inspecting a package
and deciding whether to depend on it.
<a class=anchor href="#design"><h3 id="design">Design</h3></a>
<p>
Is the package’s documentation clear? Does the API have a clear design?
If the authors can explain the package’s API and its design well to you, the user,
in the documentation,
that increases the likelihood they have explained the implementation well to the computer, in the source code.
Writing code for a clear, well-designed API is also easier, faster, and hopefully less error-prone.
Have the authors documented what they expect from client code
in order to make future upgrades compatible?
(Examples include the C++<a class=footnote id=body5 href="#note5"><sup>5</sup></a> and Go<a class=footnote id=body6 href="#note6"><sup>6</sup></a> compatibility documents.)
<a class=anchor href="#code_quality"><h3 id="code_quality">Code Quality</h3></a>
<p>
Is the code well-written?
Read some of it.
Does it look like the authors have been careful, conscientious, and consistent?
Does it look like code you’d want to debug? You may need to.
<p>
Develop your own systematic ways to check code quality.
For example, something as simple as compiling a C or C++ program with
important compiler warnings enabled (for example, <code>-Wall</code>)
can give you a sense of how seriously the developers work to avoid
various undefined behaviors.
Recent languages like Go, Rust, and Swift use an <code>unsafe</code> keyword or package to mark
code that violates the type system; look to see how much unsafe code there is.
More advanced semantic tools like Infer<a class=footnote id=body7 href="#note7"><sup>7</sup></a> or SpotBugs<a class=footnote id=body8 href="#note8"><sup>8</sup></a> are helpful too.
Linters are less helpful: you should ignore rote suggestions
about topics like brace style and focus instead on semantic problems.
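<p>
As one small example of a systematic check, the Go sketch below reports whether a Go source file imports the <code>unsafe</code> package, using the standard <code>go/parser</code> package. In Go, <code>unsafe</code> is an imported package, so the import list is a reasonable first place to look. The helper name <code>importsUnsafe</code> and the two sample files are invented for this illustration; a real check would walk every file in the package under inspection.

```go
package main

import (
	"fmt"
	"go/parser"
	"go/token"
	"strconv"
)

// importsUnsafe reports whether the Go source text src imports package
// unsafe. (Hypothetical helper for illustration.)
func importsUnsafe(filename, src string) (bool, error) {
	fset := token.NewFileSet()
	// ImportsOnly stops parsing once the import declarations are read.
	f, err := parser.ParseFile(fset, filename, src, parser.ImportsOnly)
	if err != nil {
		return false, err
	}
	for _, imp := range f.Imports {
		// Path.Value is the quoted import path as written in the source.
		path, err := strconv.Unquote(imp.Path.Value)
		if err != nil {
			return false, err
		}
		if path == "unsafe" {
			return true, nil
		}
	}
	return false, nil
}

func main() {
	// Two made-up files standing in for a package under inspection.
	files := []struct{ name, src string }{
		{"safe.go", "package demo\n\nimport \"fmt\"\n\nfunc Hello() { fmt.Println(\"hi\") }\n"},
		{"tricky.go", "package demo\n\nimport \"unsafe\"\n\nvar size = unsafe.Sizeof(0)\n"},
	}
	for _, file := range files {
		uses, err := importsUnsafe(file.name, file.src)
		if err != nil {
			fmt.Println(file.name, "parse error:", err)
			continue
		}
		fmt.Printf("%s imports unsafe: %v\n", file.name, uses)
	}
}
```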
<p>
Keep an open mind to development practices you may not be familiar with.
For example, the SQLite library ships as a single 200,000-line C source file
and a single 11,000-line header, the “amalgamation.”
The sheer size of these files should raise an initial red flag,
but closer investigation would turn up the
actual development source code, a traditional file tree with
over a hundred C source files, tests, and support scripts.
It turns out that the single-file distribution is built automatically from the original sources
and is easier for end users, especially those without dependency managers.
(The compiled code also runs faster, because the compiler can see more optimization opportunities.)
<a class=anchor href="#testing"><h3 id="testing">Testing</h3></a>
<p>
Does the code have tests?
Can you run them?
Do they pass?
Tests establish that the code’s basic functionality is correct,
and they signal that the developer is serious about keeping it correct.
For example, the SQLite development tree has an incredibly thorough test suite
with over 30,000 individual test cases
as well as developer documentation explaining the testing strategy.<a class=footnote id=body9 href="#note9"><sup>9</sup></a>
On the other hand,
if there are few tests or no tests, or if the tests fail, that’s a serious red flag:
future changes to the package
are likely to introduce regressions that could easily have been caught.
If you insist on tests in code you write yourself (you do, right?),
you should insist on tests in code you outsource to others.
<p>
Assuming the tests exist, run, and pass, you can gather more
information by running them with run-time instrumentation
like code coverage analysis, race detection,<a class=footnote id=body10 href="#note10"><sup>10</sup></a>
memory allocation checking,
and memory leak detection.
<a class=anchor href="#debugging"><h3 id="debugging">Debugging</h3></a>
<p>
Find the package’s issue tracker.
Are there many open bug reports? How long have they been open?
Are there many fixed bugs? Have any bugs been fixed recently?
If you see lots of open issues about what look like real bugs,
especially if they have been open for a long time,
that’s not a good sign.
On the other hand, if the closed issues show that bugs are
rarely found and promptly fixed,
that’s great.
<a class=anchor href="#maintenance"><h3 id="maintenance">Maintenance</h3></a>
<p>
Look at the package’s commit history.
How long has the code been actively maintained?
Is it actively maintained now?
Packages that have been actively maintained for an extended
amount of time are more likely to continue to be maintained.
How many people work on the package?
Many packages are personal projects that developers
create and share for fun in their spare time.
Others are the result of thousands of hours of work
by a group of paid developers.
In general, the latter kind of package is more likely to have
prompt bug fixes, steady improvements, and general upkeep.
<p>
On the other hand, some code really is “done.”
For example, NPM’s <code>escape-string-regexp</code>,
shown earlier, may never need to be modified again.
<a class=anchor href="#usage"><h3 id="usage">Usage</h3></a>
<p>
Do many other packages depend on this code?
Dependency managers can often provide statistics about usage,
or you can use a web search to estimate how often
others write about using the package.
More users should at least mean more people for whom
the code works well enough,
along with faster detection of new bugs.
Widespread usage is also a hedge against the question of continued maintenance:
if a widely-used package loses its maintainer,
an interested user is likely to step forward.
<p>
For example, libraries like PCRE or Boost or JUnit
are incredibly widely used.
That makes it more likely—although certainly not guaranteed—that
bugs you might otherwise run into have already been fixed,
because others ran into them first.
<a class=anchor href="#security"><h3 id="security">Security</h3></a>
<p>
Will you be processing untrusted inputs with the package?
If so, does it seem to be robust against malicious inputs?
Does it have a history of security problems
listed in the National Vulnerability Database (NVD)?<a class=footnote id=body11 href="#note11"><sup>11</sup></a>
<p>
For example, when Jeff Dean and I started work on
Google Code Search<a class=footnote id=body12 href="#note12"><sup>12</sup></a>—<code>grep</code> over public source code—in 2006,
the popular PCRE regular expression library seemed like an obvious choice.
In an early discussion with Google’s security team, however,
we learned that PCRE had a history of problems like buffer overflows,
especially in its parser.
We could have learned the same by searching for PCRE in the NVD.
That discovery didn’t immediately cause us to abandon PCRE,
but it did make us think more carefully about testing and isolation.
<a class=anchor href="#licensing"><h3 id="licensing">Licensing</h3></a>
<p>
Is the code properly licensed?
Does it have a license at all?
Is the license acceptable for your project or company?
A surprising fraction of projects on GitHub have no clear license.
Your project or company may impose further restrictions on the
allowed licenses of dependencies.
For example, Google disallows the use of code licensed under
AGPL-like licenses (too onerous) as well as WTFPL-like licenses (too vague).<a class=footnote id=body13 href="#note13"><sup>13</sup></a>
<a class=anchor href="#dependencies"><h3 id="dependencies">Dependencies</h3></a>
<p>
Does the code have dependencies of its own?
Flaws in indirect dependencies are just as bad for your program
as flaws in direct dependencies.
Dependency managers can list all the transitive dependencies
of a given package, and each of them should ideally be inspected as
described in this section.
A package with many dependencies incurs additional inspection work,
because those same dependencies incur additional risk
that needs to be evaluated.
<p>
Many developers have never looked at the full list of transitive
dependencies of their code and don’t know what they depend on.
For example, in March 2016 the NPM user community discovered
that many popular projects—including Babel, Ember, and React—all depended
indirectly on a tiny package called <code>left-pad</code>,
consisting of a single 8-line function body.
They discovered this when
the author of <code>left-pad</code> deleted that package from NPM,
inadvertently breaking most Node.js users’ builds.<a class=footnote id=body14 href="#note14"><sup>14</sup></a>
And <code>left-pad</code> is hardly exceptional in this regard.
For example, 30% of the
750,000 packages published on NPM
depend—at least indirectly—on <code>escape-string-regexp</code>.
Adapting Leslie Lamport’s observation about distributed systems,
a dependency manager can easily
create a situation in which the failure of a package you didn’t
even know existed can render your own code unusable.
<a class=anchor href="#test_the_dependency"><h2 id="test_the_dependency">Test the dependency</h2></a>
<p>
The inspection process should include running a package’s own tests.
If the package passes the inspection and you decide to make your
project depend on it,
the next step should be to write new tests focused on the functionality
needed by your application.
These tests often start out as short standalone programs
written to make sure you can understand the package’s API
and that it does what you think it does.
(If you can’t or it doesn’t, turn back now!)
It is worth then taking the extra effort to turn those programs
into automated tests that can be run against newer versions of the package.
If you find a bug and have a potential fix,
you’ll want to be able to rerun these project-specific tests
easily, to make sure that the fix did not break anything else.
<p>
It is especially worth exercising the likely problem areas
identified by the
basic inspection.
For Code Search, we knew from past experience
that PCRE sometimes took
a long time to execute certain regular expression searches.
Our initial plan was to have separate thread pools for
“simple” and “complicated” regular expression searches.
One of the first tests we ran was a benchmark,
comparing <code>pcregrep</code> with a few other <code>grep</code> implementations.
When we found that, for one basic test case,
<code>pcregrep</code> was 70X slower than the
fastest <code>grep</code> available,
we started to rethink our plan to use PCRE.
Even though we eventually dropped PCRE entirely,
that benchmark remains in our code base today.
<a class=anchor href="#abstract_the_dependency"><h2 id="abstract_the_dependency">Abstract the dependency</h2></a>
<p>
Depending on a package is a decision that you are likely to
revisit later.
Perhaps updates will take the package in a new direction.
Perhaps serious security problems will be found.
Perhaps a better option will come along.
For all these reasons, it is worth the effort
to make it easy to migrate your project to a new dependency.
<p>
If the package will be used from many places in your project’s source code,
migrating to a new dependency would require making
changes to all those different source locations.
Worse, if the package will be exposed in your own project’s API,
migrating to a new dependency would require making
changes in all the code calling your API,
which you might not control.
To avoid these costs, it makes sense to
define an interface of your own,
along with a thin wrapper implementing that
interface using the dependency.
Note that the wrapper should include only
what your project needs from the dependency,
not everything the dependency offers.
Ideally, that allows you to
substitute a different, equally appropriate dependency later,
by changing only the wrapper.
Migrating your per-project tests to use the new interface
tests the interface and wrapper implementation
and also makes it easy to test any potential replacements
for the dependency.
<p>
For Code Search, we developed an abstract <code>Regexp</code> class
that defined the interface Code Search needed from any
regular expression engine.
Then we wrote a thin wrapper around PCRE
implementing that interface.
The indirection made it easy to test alternate libraries,
and it kept us from accidentally introducing knowledge
of PCRE internals into the rest of the source tree.
That in turn ensured that it would be easy to switch
to a different dependency if needed.
<a class=anchor href="#isolate_the_dependency"><h2 id="isolate_the_dependency">Isolate the dependency</h2></a>
<p>
It may also be appropriate to isolate a dependency
at run-time, to limit the possible damage caused by bugs in it.
For example, Google Chrome allows users to add dependencies—extension code—to the browser.
When Chrome launched in 2008, it introduced
the critical feature (now standard in all browsers)
of isolating each extension in a sandbox running in a separate
operating-system process.<a class=footnote id=body15 href="#note15"><sup>15</sup></a>
An exploitable bug in a badly written extension
therefore did not automatically have access to the entire memory
of the browser itself
and could be stopped from making inappropriate system calls.<a class=footnote id=body16 href="#note16"><sup>16</sup></a>
For Code Search, until we dropped PCRE entirely,
our plan was to isolate at least the PCRE parser
in a similar sandbox.
Today,
another option would be a lightweight hypervisor-based sandbox
like gVisor.<a class=footnote id=body17 href="#note17"><sup>17</sup></a>
Isolating dependencies
reduces the associated risks of running that code.
<p>
Even with these examples and other off-the-shelf options,
run-time isolation of suspect code is still too difficult and rarely done.
True isolation would require a completely memory-safe language,
with no escape hatch into untyped code.
That’s challenging not just in entirely unsafe languages like C and C++
but also in languages that provide restricted unsafe operations,
like Java when including JNI, or like Go, Rust, and Swift
when including their “unsafe” features.
Even in a memory-safe language like JavaScript,
code often has access to far more than it needs.
In November 2018, the latest version of the NPM package <code>event-stream</code>,
which provided a functional streaming API for JavaScript events,
was discovered to contain obfuscated malicious code that had been
added two and a half months earlier.
The code, which harvested large Bitcoin wallets from users of the Copay mobile app,
was accessing system resources entirely unrelated to processing
event streams.<a class=footnote id=body18 href="#note18"><sup>18</sup></a>
One of many possible defenses to this kind of problem
would be to better restrict what dependencies can access.
<a class=anchor href="#avoid_the_dependency"><h2 id="avoid_the_dependency">Avoid the dependency</h2></a>
<p>
If a dependency seems too risky and you can’t find
a way to isolate it, the best answer may be to avoid it entirely,
or at least to avoid the parts you’ve identified as most problematic.
<p>
For example, as we better understood the risks and costs associated
with PCRE, our plan for Google Code Search evolved
from “use PCRE directly,” to “use PCRE but sandbox the parser,”
to “write a new regular expression parser but keep the PCRE execution engine,”
to “write a new parser and connect it to a different, more efficient open-source execution engine.”
Later we rewrote the execution engine as well,
so that no dependencies were left,
and we open-sourced the result: RE2.<a class=footnote id=body19 href="#note19"><sup>19</sup></a>
<p>
If you only need a
tiny fraction of a dependency,
it may be simplest to make a copy of what you need
(preserving appropriate copyright and other legal notices, of course).
You are taking on responsibility for fixing bugs, maintenance, and so on,
but you’re also completely isolated from the larger risks.
The Go developer community has a proverb about this:
“A little copying is better than a little dependency.”<a class=footnote id=body20 href="#note20"><sup>20</sup></a>
<a class=anchor href="#upgrade_the_dependency"><h2 id="upgrade_the_dependency">Upgrade the dependency</h2></a>
<p>
For a long time, the conventional wisdom about software was “if it ain’t broke, don’t fix it.”
Upgrading carries a chance of introducing new bugs;
without a corresponding reward—like a new feature you need—why take the risk?
This analysis ignores two costs.
The first is the cost of the eventual upgrade.
In software, the difficulty of making code changes does not scale linearly:
making ten small changes is less work and easier to get right
than making one equivalent large change.
The second is the cost of discovering already-fixed bugs the hard way.
Especially in a security context, where known bugs are actively exploited,
every day you wait is another day that attackers can break in.
<p>
For example, consider the year 2017 at Equifax, as recounted by executives
in detailed congressional testimony.<a class=footnote id=body21 href="#note21"><sup>21</sup></a>
On March 7, a new vulnerability in Apache Struts was disclosed, and a patched version was released.
On March 8, Equifax received a notice from US-CERT about the need to update
any uses of Apache Struts.
Equifax ran source code and network scans on March 9 and March 15, respectively;
neither scan turned up a particular group of public-facing web servers.
On May 13, attackers found the servers that Equifax’s security teams could not.
They used the Apache Struts vulnerability to breach Equifax’s network
and then steal detailed personal and financial information
about 148 million people
over the next two months.
Equifax finally noticed the breach on July 29
and publicly disclosed it on September 4.
By the end of September, Equifax’s CEO, CIO, and CSO had all resigned,
and a congressional investigation was underway.
<p>
Equifax’s experience drives home the point that
although dependency managers know the versions they are using at build time,
you need other arrangements to track that information
through your production deployment process.
For the Go language, we are experimenting with automatically
including a version manifest in every binary, so that deployment
processes can scan binaries for dependencies that need upgrading.
Go also makes that information available at run-time, so that
servers can consult databases of known bugs and self-report to
monitoring software when they are in need of upgrades.
<p>
Upgrading promptly is important, but upgrading means
adding new code to your project,
which should mean updating your evaluation of the risks
of using the dependency based on the new version.
At a minimum, you’d want to skim the diffs showing the
changes being made from the current version to the
upgraded versions,
or at least read the release notes,
to identify the most likely areas of concern in the upgraded code.
If a lot of code is changing, so that the diffs are difficult to digest,
that is also information you can incorporate into your
risk assessment update.
<p>
You’ll also want to re-run the tests you’ve written
that are specific to your project,
to make sure the upgraded package is at least as suitable
for the project as the earlier version.
It also makes sense to re-run the package’s own tests.
If the package has its own dependencies,
it is entirely possible that your project’s configuration
uses different versions of those dependencies
(either older or newer ones) than the package’s authors use.
Running the package’s own tests can quickly identify problems
specific to your configuration.
<p>
Again, upgrades should not be completely automatic.
You need to verify that the upgraded versions are appropriate for
your environment before deploying them.<a class=footnote id=body22 href="#note22"><sup>22</sup></a>
<p>
If your upgrade process includes re-running the
integration and qualification tests you’ve already written for the dependency,
so that you are likely to identify new problems before they reach production,
then, in most cases, delaying an upgrade is riskier than upgrading quickly.
<p>
The window for security-critical upgrades is especially short.
In the aftermath of the Equifax breach, forensic security teams found
evidence that attackers (perhaps different ones)
had successfully exploited the Apache Struts
vulnerability on the affected servers on March 10, only three days
after it was publicly disclosed, but they’d only run a single <code>whoami</code> command.
<a class=anchor href="#watch_your_dependencies"><h2 id="watch_your_dependencies">Watch your dependencies</h2></a>
<p>
Even after all that work, you’re not done tending your dependencies.
It’s important to continue to monitor them and perhaps even
re-evaluate your decision to use them.
<p>
First, make sure that you keep using the
specific package versions you think you are.
Most dependency managers now make it easy or even automatic
to record the cryptographic hash of the expected source code
for a given package version
and then to check that hash when re-downloading the package
on another computer or in a test environment.
This ensures that your builds use
the same dependency source code you inspected and tested.
These kinds of checks
prevented the <code>event-stream</code> attacker,
described earlier, from silently inserting
malicious code in the already-released version 3.3.5.
Instead, the attacker had to create a new version, 3.3.6,
and wait for people to upgrade (without looking closely at the changes).
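<p>
The underlying check is simple enough to sketch. The program below records a SHA-256 hash when the source is inspected and refuses any later download that does not match. It is an illustration only: the function names, the package name <code>example-pkg</code>, and the sample contents are hypothetical, and real dependency managers hash whole file trees and use their own lock-file formats.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// hashSource returns the hex-encoded SHA-256 hash of a package's contents.
func hashSource(src []byte) string {
	sum := sha256.Sum256(src)
	return hex.EncodeToString(sum[:])
}

// verify returns an error if src does not match the hash recorded when
// the given package version was first inspected.
func verify(name, version string, src []byte, recorded string) error {
	if got := hashSource(src); got != recorded {
		return fmt.Errorf("%s@%s: hash mismatch: got %s, recorded %s",
			name, version, got, recorded)
	}
	return nil
}

func main() {
	inspected := []byte("module.exports = function pad(s) { return s }\n")
	recorded := hashSource(inspected) // saved alongside the version at inspection time

	// A later download with identical contents passes the check.
	if err := verify("example-pkg", "1.0.0", inspected, recorded); err != nil {
		fmt.Println("unexpected:", err)
	} else {
		fmt.Println("example-pkg@1.0.0: ok")
	}

	// Contents silently changed under the same version number are rejected.
	tampered := append(append([]byte{}, inspected...), "steal()\n"...)
	if err := verify("example-pkg", "1.0.0", tampered, recorded); err != nil {
		fmt.Println("rejected:", err)
	}
}
```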
<p>
It is also important to watch for new indirect dependencies creeping in:
upgrades can easily introduce new packages
upon which the success of your project now depends.
They deserve your attention as well.
In the case of <code>event-stream</code>, the malicious code was
hidden in a different package, <code>flatmap-stream</code>,
which the new <code>event-stream</code> release added as a
new dependency.
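<p>
One low-tech way to notice creeping dependencies is to compare the full dependency list before and after an upgrade and review anything new. A minimal sketch, assuming the two lists have already been extracted from a lock file or build tool; apart from <code>flatmap-stream</code>, the package names below are placeholders.

```go
package main

import (
	"fmt"
	"sort"
)

// addedDeps returns the packages present in after but not in before,
// sorted and without duplicates.
func addedDeps(before, after []string) []string {
	seen := make(map[string]bool, len(before))
	for _, p := range before {
		seen[p] = true
	}
	var added []string
	for _, p := range after {
		if !seen[p] {
			added = append(added, p)
			seen[p] = true // report each new package only once
		}
	}
	sort.Strings(added)
	return added
}

func main() {
	// Hypothetical dependency lists before and after an upgrade.
	before := []string{"example-a", "example-b"}
	after := []string{"example-a", "example-b", "flatmap-stream"}

	for _, p := range addedDeps(before, after) {
		fmt.Println("new dependency to review:", p)
	}
}
```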
<p>
Creeping dependencies can also affect the size of your project.
During the development of Google’s Sawzall<a class=footnote id=body23 href="#note23"><sup>23</sup></a>—a JIT’ed
logs processing language—the authors discovered at various times that
the main interpreter binary contained not just Sawzall’s JIT
but also (unused) PostScript, Python, and JavaScript interpreters.
Each time, the culprit turned out to be unused dependencies
declared by some library Sawzall did depend on,
combined with the fact that Google’s build system
eliminated any manual effort needed to start using a new dependency.
This kind of error is the reason that the Go language
makes importing an unused package a compile-time error.
<p>
Upgrading is a natural time to revisit the decision to use a dependency that’s changing.
It’s also important to periodically revisit any dependency that <i>isn’t</i> changing.
Does it seem plausible that there are no security problems or other bugs to fix?
Has the project been abandoned?
Maybe it’s time to start planning to replace that dependency.
<p>
It’s also important to recheck the security history of each dependency.
For example, Apache Struts disclosed different major remote code execution
vulnerabilities in 2016, 2017, and 2018.
Even if you have a list of all the servers that run it and
update them promptly, that track record might make you rethink using it at all.
<a class=anchor href="#conclusion"><h2 id="conclusion">Conclusion</h2></a>
<p>
Software reuse is finally here,
and I don’t mean to understate its benefits:
it has brought an enormously positive transformation
for software developers.
Even so, we’ve accepted this transformation without
completely thinking through the potential consequences.
The old reasons for trusting dependencies are becoming less valid
at exactly the same time we have more dependencies than ever.
<p>
The kind of critical examination of specific dependencies that
I outlined in this article is a significant amount of work
and remains the exception rather than the rule.
I doubt there are any developers who actually
make the effort to do this for every possible new dependency.
I have only done a subset of them for a subset of my own dependencies.
Most of the time the entirety of the decision is “let’s see what happens.”
Too often, anything more than that seems like too much effort.
<p>
But the Copay and Equifax attacks are clear warnings of
real problems in the way we consume software dependencies today.
We should not ignore the warnings.
I offer three broad recommendations.
<ol>
<li>
<p>
<i>Recognize the problem.</i>
If nothing else, I hope this article has convinced
you that there is a problem here worth addressing.
We need many people to focus significant effort on solving it.
<li>
<p>
<i>Establish best practices for today.</i>
We need to establish best practices for managing dependencies
using what’s available today.
This means working out processes that evaluate, reduce, and track risk,
from the original adoption decision through to production use.
In fact, just as some engineers specialize in testing,
it may be that we need engineers who specialize in managing dependencies.
<li>
<p>
<i>Develop better dependency technology for tomorrow.</i>
Dependency managers have essentially eliminated the cost of
downloading and installing a dependency.
Future development effort should focus on reducing the cost of
the kind of evaluation and maintenance necessary to use
a dependency.
For example, package discovery sites might work to find
more ways to allow developers to share their findings.
Build tools should, at the least, make it easy to run a package’s own tests.
More aggressively,
build tools and package management systems could also work together
to allow package authors to test new changes against all public clients
of their APIs.
Languages should also provide easy ways to isolate a suspect package.</ol>
<p>
There’s a lot of good software out there.
Let’s work together to find out how to reuse it safely.
<p>
<a class=anchor href="#references"><h2 id="references">References</h2></a>
<ol>
<li><a name=note1></a>
Rachel Potvin and Josh Levenberg, “Why Google Stores Billions of Lines of Code in a Single Repository,” <i>Communications of the ACM</i> 59(7) (July 2016), pp. 78–87. <a href="https://doi.org/10.1145/2854146">https://doi.org/10.1145/2854146</a> <a class=back href="#body1">(⇡)</a>
<li><a name=note2></a>
Russ Cox, “Go & Versioning,” February 2018. <a href="https://research.swtch.com/vgo">https://research.swtch.com/vgo</a> <a class=back href="#body2">(⇡)</a>
<li><a name=note3></a>
Ken Thompson, “Reflections on Trusting Trust,” <i>Communications of the ACM</i> 27(8) (August 1984), pp. 761–763. <a href="https://doi.org/10.1145/358198.358210">https://doi.org/10.1145/358198.358210</a> <a class=back href="#body3">(⇡)</a>
<li><a name=note4></a>
GNU Project, “GNU General Public License, version 1,” February 1989. <a href="https://www.gnu.org/licenses/old-licenses/gpl-1.0.html">https://www.gnu.org/licenses/old-licenses/gpl-1.0.html</a> <a class=back href="#body4">(⇡)</a>
<li><a name=note5></a>
Titus Winters, “SD-8: Standard Library Compatibility,” C++ Standing Document, August 2018. <a href="https://isocpp.org/std/standing-documents/sd-8-standard-library-compatibility">https://isocpp.org/std/standing-documents/sd-8-standard-library-compatibility</a> <a class=back href="#body5">(⇡)</a>
<li><a name=note6></a>
Go Project, “Go 1 and the Future of Go Programs,” September 2013. <a href="https://golang.org/doc/go1compat">https://golang.org/doc/go1compat</a> <a class=back href="#body6">(⇡)</a>
<li><a name=note7></a>
Facebook, “Infer: A tool to detect bugs in Java and C/C++/Objective-C code before it ships.” <a href="https://fbinfer.com/">https://fbinfer.com/</a> <a class=back href="#body7">(⇡)</a>
<li><a name=note8></a>
“SpotBugs: Find bugs in Java Programs.” <a href="https://spotbugs.github.io/">https://spotbugs.github.io/</a> <a class=back href="#body8">(⇡)</a>
<li><a name=note9></a>
D. Richard Hipp, “How SQLite is Tested.” <a href="https://www.sqlite.org/testing.html">https://www.sqlite.org/testing.html</a> <a class=back href="#body9">(⇡)</a>
<li><a name=note10></a>
Alexander Potapenko, “Testing Chromium: ThreadSanitizer v2, a next-gen data race detector,” April 2014. <a href="https://blog.chromium.org/2014/04/testing-chromium-threadsanitizer-v2.html">https://blog.chromium.org/2014/04/testing-chromium-threadsanitizer-v2.html</a> <a class=back href="#body10">(⇡)</a>
<li><a name=note11></a>
NIST, “National Vulnerability Database – Search and Statistics.” <a href="https://nvd.nist.gov/vuln/search">https://nvd.nist.gov/vuln/search</a> <a class=back href="#body11">(⇡)</a>
<li><a name=note12></a>
Russ Cox, “Regular Expression Matching with a Trigram Index, or How Google Code Search Worked,” January 2012. <a href="https://swtch.com/~rsc/regexp/regexp4.html">https://swtch.com/~rsc/regexp/regexp4.html</a> <a class=back href="#body12">(⇡)</a>
<li><a name=note13></a>
Google, “Google Open Source: Using Third-Party Licenses.” <a href="https://opensource.google.com/docs/thirdparty/licenses/#banned">https://opensource.google.com/docs/thirdparty/licenses/#banned</a> <a class=back href="#body13">(⇡)</a>
<li><a name=note14></a>
Nathan Willis, “A single Node of failure,” LWN, March 2016. <a href="https://lwn.net/Articles/681410/">https://lwn.net/Articles/681410/</a> <a class=back href="#body14">(⇡)</a>
<li><a name=note15></a>
Charlie Reis, “Multi-process Architecture,” September 2008. <a href="https://blog.chromium.org/2008/09/multi-process-architecture.html">https://blog.chromium.org/2008/09/multi-process-architecture.html</a> <a class=back href="#body15">(⇡)</a>
<li><a name=note16></a>
Adam Langley, “Chromium’s seccomp Sandbox,” August 2009. <a href="https://www.imperialviolet.org/2009/08/26/seccomp.html">https://www.imperialviolet.org/2009/08/26/seccomp.html</a> <a class=back href="#body16">(⇡)</a>
<li><a name=note17></a>
Nicolas Lacasse, “Open-sourcing gVisor, a sandboxed container runtime,” May 2018. <a href="https://cloud.google.com/blog/products/gcp/open-sourcing-gvisor-a-sandboxed-container-runtime">https://cloud.google.com/blog/products/gcp/open-sourcing-gvisor-a-sandboxed-container-runtime</a> <a class=back href="#body17">(⇡)</a>
<li><a name=note18></a>
Adam Baldwin, “Details about the event-stream incident,” November 2018. <a href="https://blog.npmjs.org/post/180565383195/details-about-the-event-stream-incident">https://blog.npmjs.org/post/180565383195/details-about-the-event-stream-incident</a> <a class=back href="#body18">(⇡)</a>
<li><a name=note19></a>
Russ Cox, “RE2: a principled approach to regular expression matching,” March 2010. <a href="https://opensource.googleblog.com/2010/03/re2-principled-approach-to-regular.html">https://opensource.googleblog.com/2010/03/re2-principled-approach-to-regular.html</a> <a class=back href="#body19">(⇡)</a>
<li><a name=note20></a>
Rob Pike, “Go Proverbs,” November 2015. <a href="https://go-proverbs.github.io/">https://go-proverbs.github.io/</a> <a class=back href="#body20">(⇡)</a>
<li><a name=note21></a>
U.S. House of Representatives Committee on Oversight and Government Reform, “The Equifax Data Breach,” Majority Staff Report, 115th Congress, December 2018. <a href="https://republicans-oversight.house.gov/wp-content/uploads/2018/12/Equifax-Report.pdf">https://republicans-oversight.house.gov/wp-content/uploads/2018/12/Equifax-Report.pdf</a> <a class=back href="#body21">(⇡)</a>
<li><a name=note22></a>
Russ Cox, “The Principles of Versioning in Go,” GopherCon Singapore, May 2018. <a href="https://www.youtube.com/watch?v=F8nrpe0XWRg">https://www.youtube.com/watch?v=F8nrpe0XWRg</a> <a class=back href="#body22">(⇡)</a>
<li><a name=note23></a>
Rob Pike, Sean Dorward, Robert Griesemer, and Sean Quinlan, “Interpreting the Data: Parallel Analysis with Sawzall,” <i>Scientific Programming Journal</i>, vol. 13 (2005). <a href="https://doi.org/10.1155/2005/962135">https://doi.org/10.1155/2005/962135</a> <a class=back href="#body23">(⇡)</a></ol>
<a class=anchor href="#coda"><h2 id="coda">Coda</h2></a>
<p>
A version of this post was published
in <a href="https://queue.acm.org/detail.cfm?id=3344149">ACM Queue</a>
(March-April 2019) and then <a href="https://dl.acm.org/doi/pdf/10.1145/3347446">Communications of the ACM</a>
(August 2019) under the title “Surviving Software Dependencies.”
What is Software Engineering?tag:research.swtch.com,2012:research.swtch.com/vgo-eng2018-05-30T10:00:00-04:002018-05-30T10:02:00-04:00What is software engineering and what does Go mean by it? (Go & Versioning, Part 9)
<p>
Nearly all of Go’s distinctive design decisions
were aimed at making software engineering simpler and easier.
We've said this often.
The canonical reference is Rob Pike's 2012 article,
“<a href="https://talks.golang.org/2012/splash.article">Go at Google: Language Design in the Service of Software Engineering</a>.”
But what is software engineering?<blockquote>
<p>
<i>Software engineering is what happens to programming
<br>when you add time and other programmers.</i></blockquote>
<p>
Programming means getting a program working.
You have a problem to solve, you write some Go code,
you run it, you get your answer, you’re done.
That’s programming,
and that's difficult enough by itself.
But what if that code has to keep working, day after day?
What if five other programmers need to work on the code too?
Then you start to think about version control systems,
to track how the code changes over time
and to coordinate with the other programmers.
You add unit tests,
to make sure bugs you fix are not reintroduced over time,
not by you six months from now,
and not by that new team member who’s unfamiliar with the code.
You think about modularity and design patterns,
to divide the program into parts that team members
can work on mostly independently.
You use tools to help you find bugs earlier.
You look for ways to make programs as clear as possible,
so that bugs are less likely.
You make sure that small changes can be tested quickly,
even in large programs.
You're doing all of this because your programming
has turned into software engineering.
<p>
(This definition and explanation of software engineering
is my riff on an original theme by my Google colleague Titus Winters,
whose preferred phrasing is “software engineering is programming integrated over time.”
It's worth seven minutes of your time to see
<a href="https://www.youtube.com/watch?v=tISy7EJQPzI&t=8m17s">his presentation of this idea at CppCon 2017</a>,
from 8:17 to 15:00 in the video.)
<p>
As I said earlier,
nearly all of Go’s distinctive design decisions
have been motivated by concerns about software engineering,
by trying to accommodate time and other programmers
into the daily practice of programming.
<p>
For example, most people think that we format Go code with <code>gofmt</code>
to make code look nicer or to end debates among
team members about program layout.
But the <a href="https://groups.google.com/forum/#!msg/golang-nuts/HC2sDhrZW5Y/7iuKxdbLExkJ">most important reason for <code>gofmt</code></a>
is that if an algorithm defines how Go source code is formatted,
then programs, like <code>goimports</code> or <code>gorename</code> or <code>go</code> <code>fix</code>,
can edit the source code more easily,
without introducing spurious formatting changes when writing the code back.
This helps you maintain code over time.
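<p>
To make that concrete, here is a small sketch of a source-editing program built on the standard <code>go/parser</code>, <code>go/ast</code>, and <code>go/format</code> packages. It is a deliberately crude rename (it matches identifiers by spelling and ignores scope, unlike <code>gorename</code>), and the <code>renameIdent</code> helper and sample source are made up for illustration. Because the output is printed in canonical format, the only differences from a gofmt’ed input are the renamed identifiers.

```go
package main

import (
	"bytes"
	"fmt"
	"go/ast"
	"go/format"
	"go/parser"
	"go/token"
)

// renameIdent parses src, renames every identifier spelled from to to,
// and prints the result in canonical gofmt style.
// It is not scope-aware; it only illustrates the parse/edit/print cycle.
func renameIdent(src, from, to string) (string, error) {
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, "example.go", src, parser.ParseComments)
	if err != nil {
		return "", err
	}
	ast.Inspect(file, func(n ast.Node) bool {
		if id, ok := n.(*ast.Ident); ok && id.Name == from {
			id.Name = to
		}
		return true
	})
	var buf bytes.Buffer
	if err := format.Node(&buf, fset, file); err != nil {
		return "", err
	}
	return buf.String(), nil
}

// src is a small, already gofmt'ed program used as sample input.
const src = `package main

func oldName(x int) int {
	return x + 1
}

func main() {
	println(oldName(41))
}
`

func main() {
	out, err := renameIdent(src, "oldName", "newName")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Print(out)
}
```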
<p>
As another example, Go import paths are URLs.
If code said <code>import</code> <code>"uuid"</code>,
you’d have to ask which <code>uuid</code> package.
Searching for <code>uuid</code> on <a href="https://godoc.org">godoc.org</a> turns up dozens of packages.
If instead the code says <code>import</code> <code>"github.com/pborman/uuid"</code>,
now it’s clear which package we mean.
Using URLs avoids ambiguity
and also reuses an existing mechanism for giving out names,
making it simpler and easier to coordinate with other programmers.
<p>
Continuing the example,
Go import paths are written in Go source files,
not in a separate build configuration file.
This makes Go source files self-contained,
which makes it easier to understand, modify, and copy them.
These decisions, and more, were all made with the goal of
simplifying software engineering.
<p>
In later posts I will talk specifically about why
versions are important for software engineering
and how software engineering concerns motivate
the design changes from dep to vgo.
Go and Dogmatag:research.swtch.com,2012:research.swtch.com/dogma2017-01-09T09:00:00-05:002017-01-09T09:02:00-05:00Programming language dogmatics.
<p>
[<i>Cross-posting from last year’s <a href="https://www.reddit.com/r/golang/comments/46bd5h/ama_we_are_the_go_contributors_ask_us_anything/d05yyde/?context=3&st=ixq5hjko&sh=7affd469">Go contributors AMA</a> on Reddit, because it’s still important to remember.</i>]
<p>
One of the perks of working on Go these past years has been the chance to have many great discussions with other language designers and implementers, for example about how well various design decisions worked out or the common problems of implementing what look like very different languages (for example both Go and Haskell need some kind of “green threads”, so there are more shared runtime challenges than you might expect). In one such conversation, when I was talking to a group of early Lisp hackers, one of them pointed out that these discussions are basically never dogmatic. Designers and implementers remember working through the good arguments on both sides of a particular decision, and they’re often eager to hear about someone else’s experience with what happens when you make that decision differently. Contrast that kind of discussion with the heated arguments or overly zealous statements you sometimes see from users of the same languages. There’s a real disconnect, possibly because the users don’t have the experience of weighing the arguments on both sides and don’t realize how easily a particular decision might have gone the other way.
<p>
Language design and implementation is engineering. We make decisions using evaluations of costs and benefits or, if we must, using predictions of those based on past experience. I think we have an important responsibility to explain both sides of a particular decision, to make clear that the arguments for an alternate decision are actually good ones that we weighed and balanced, and to avoid the suggestion that particular design decisions approach dogma. I hope <a href="https://www.reddit.com/r/golang/comments/46bd5h/ama_we_are_the_go_contributors_ask_us_anything/d05yyde/?context=3&st=ixq5hjko&sh=7affd469">the Reddit AMA</a> as well as discussion on <a href="https://groups.google.com/group/golang-nuts">golang-nuts</a> or <a href="http://stackoverflow.com/questions/tagged/go">StackOverflow</a> or the <a href="https://forum.golangbridge.org/">Go Forum</a> or at <a href="https://golang.org/wiki/Conferences">conferences</a> help with that.
<p>
But we need help from everyone. Remember that none of the decisions in Go are infallible; they’re just our best attempts at the time we made them, not wisdom received on stone tablets. If someone asks why Go does X instead of Y, please try to present the engineering reasons fairly, including for Y, and avoid argument solely by appeal to authority. It’s too easy to fall into the “well that’s just not how it’s done here” trap. And now that I know about and watch for that trap, I see it in nearly every technical community, although some more than others.
A Tour of Acmetag:research.swtch.com,2012:research.swtch.com/acme2012-09-17T11:00:00-04:002012-09-17T11:00:00-04:00A video introduction to Acme, the Plan 9 text editor
<p class="lp">
People I work with recognize my computer easily:
it's the one with nothing but yellow windows and blue bars on the screen.
That's the text editor acme, written by Rob Pike for Plan 9 in the early 1990s.
Acme focuses entirely on the idea of text as user interface.
It's difficult to explain acme without seeing it, though, so I've put together
a screencast explaining the basics of acme and showing a brief programming session.
Remember as you watch the video that the 854x480 screen is quite cramped.
Usually you'd run acme on a larger screen: even my MacBook Air has almost four times
as much screen real estate.
</p>
<center>
<div style="border: 1px solid black; width: 853px; height: 480px;"><iframe width="853" height="480" src="https://www.youtube.com/embed/dP1xVpMPn8M?rel=0" frameborder="0" allowfullscreen></iframe></div>
</center>
<p class=pp>
The video doesn't show everything acme can do, nor does it show all the ways you can use it.
Even small idioms like where you type text to be loaded or executed vary from user to user.
To learn more about acme, read Rob Pike's paper “<a href="/acme.pdf">Acme: A User Interface for Programmers</a>” and then try it.
</p>
<p class=pp>
Acme runs on most operating systems.
If you use <a href="https://9p.io/">Plan 9 from Bell Labs</a>, you already have it.
If you use FreeBSD, Linux, OS X, or most other Unix clones, you can get it as part of <a href="http://swtch.com/plan9port/">Plan 9 from User Space</a>.
If you use Windows, I suggest trying acme as packaged in <a href="http://code.google.com/p/acme-sac/">acme stand alone complex</a>, which is based on the Inferno programming environment.
</p>
<p class=lp><b>Mini-FAQ</b>:
<ul>
<li><i>Q. Can I use scalable fonts?</i> A. On the Mac, yes. If you run <code>acme -f /mnt/font/Monaco/16a/font</code> you get 16-point anti-aliased Monaco as your font, served via <a href="http://swtch.com/plan9port/man/man4/fontsrv.html">fontsrv</a>. If you'd like to add X11 support to fontsrv, I'd be happy to apply the patch.
<li><i>Q. Do I need X11 to build on the Mac?</i> A. No. The build will complain that it cannot build ‘snarfer’ but it should complete otherwise. You probably don't need snarfer.
</ul>
<p class=pp>
If you're interested in history, the predecessor to acme was called help. Rob Pike's paper “<a href="/help.pdf">A Minimalist Global User Interface</a>” describes it. See also “<a href="/sam.pdf">The Text Editor sam</a>.”
</p>
<p class=pp>
<i>Correction</i>: the smiley program in the video was written by Ken Thompson.
I got it from Dennis Ritchie, the more meticulous archivist of the pair.
</p>
Minimal Boolean Formulastag:research.swtch.com,2012:research.swtch.com/boolean2011-05-18T00:00:00-04:002011-05-18T00:00:00-04:00Simplify equations with God
<p><style type="text/css">
p { line-height: 150%; }
blockquote { text-align: left; }
pre.alg { font-family: sans-serif; font-size: 100%; margin-left: 60px; }
td, th { padding-left: 5px; padding-right: 5px; vertical-align: top; }
#times td { text-align: right; }
table { padding-top: 1em; padding-bottom: 1em; }
#find td { text-align: center; }
</style>
<p class=lp>
<a href="http://oeis.org/A056287">28</a>.
That's the minimum number of AND or OR operators
you need in order to write any Boolean function of five variables.
<a href="http://alexhealy.net/">Alex Healy</a> and I computed that in April 2010. Until then,
I believe no one had ever known that little fact.
This post describes how we computed it
and how we almost got scooped by <a href="http://research.swtch.com/2011/01/knuth-volume-4a.html">Knuth's Volume 4A</a>
which considers the problem for AND, OR, and XOR.
</p>
<h3>A Naive Brute Force Approach</h3>
<p class=pp>
Any Boolean function of two variables
can be written with at most 3 AND or OR operators: the parity function
on two variables X XOR Y is (X AND Y') OR (X' AND Y), where X' denotes
“not X.” We can shorten the notation by writing AND and OR
like multiplication and addition: X XOR Y = X*Y' + X'*Y.
</p>
<p class=pp>
For three variables, parity is also a hardest function, requiring 9 operators:
X XOR Y XOR Z = (X*Z'+X'*Z+Y)*(X*Z+X'*Z'+Y').
</p>
<p class=pp>
For four variables, parity is still a hardest function, requiring 15 operators:
W XOR X XOR Y XOR Z = (X*Z'+X'*Z+W'*Y+W*Y')*(X*Z+X'*Z'+W*Y+W'*Y').
</p>
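<p class=pp>
As a sanity check, the two- and four-variable formulas above can be compared against parity on every input. This is only an illustrative sketch: the function names are made up for this example, and Go's <code>&amp;&amp;</code>, <code>||</code>, and <code>!</code> stand in for *, +, and the prime.
</p>

```go
package main

import "fmt"

// xor2 is the two-variable formula X*Y' + X'*Y (3 operators).
func xor2(x, y bool) bool {
	return (x && !y) || (!x && y)
}

// xor4 is the four-variable formula from the text (15 operators).
func xor4(w, x, y, z bool) bool {
	return ((x && !z) || (!x && z) || (!w && y) || (w && !y)) &&
		((x && z) || (!x && !z) || (w && y) || (!w && !y))
}

// checkFormulas reports whether both formulas agree with parity
// on every assignment of the inputs.
func checkFormulas() bool {
	vals := []bool{false, true}
	for _, w := range vals {
		for _, x := range vals {
			for _, y := range vals {
				for _, z := range vals {
					if xor2(x, y) != (x != y) {
						return false
					}
					// For bools, a != b is exclusive or.
					if xor4(w, x, y, z) != (((w != x) != y) != z) {
						return false
					}
				}
			}
		}
	}
	return true
}

func main() {
	fmt.Println("formulas match parity:", checkFormulas())
}
```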
<p class=pp>
The sequence so far prompts a few questions. Is parity always a hardest function?
Does the minimum number of operators alternate between 2<sup>n</sup>−1 and 2<sup>n</sup>+1?
</p>
<p class=pp>
I computed these results in January 2001 after hearing
the problem from Neil Sloane, who suggested it as a variant
of a similar problem first studied by Claude Shannon.
</p>
<p class=pp>
The program I wrote to compute a(4) computes the minimum number of
operators for every Boolean function of n variables
in order to find the largest minimum over all functions.
There are 2<sup>4</sup> = 16 settings of four variables, and each function
can pick its own value for each setting, so there are 2<sup>16</sup> different
functions. To make matters worse, you build new functions
by taking pairs of old functions and joining them with AND or OR.
2<sup>16</sup> different functions means 2<sup>16</sup>·2<sup>16</sup> = 2<sup>32</sup> pairs of functions.
</p>
<p class=pp>
The program I wrote was a mangling of the Floyd-Warshall
all-pairs shortest paths algorithm. That algorithm is:
</p>
<pre class="indent alg">
// Floyd-Warshall all pairs shortest path
func compute():
    for each node i
        for each node j
            dist[i][j] = direct distance, or ∞
    for each node k
        for each node i
            for each node j
                d = dist[i][k] + dist[k][j]
                if d < dist[i][j]
                    dist[i][j] = d
    return
</pre>
<p class=lp>
The algorithm begins with the distance table dist[i][j] set to
an actual distance if i is connected to j and infinity otherwise.
Then each round updates the table to account for paths
going through the node k: if it's shorter to go from i to k to j,
it saves that shorter distance in the table. The nodes are
numbered from 0 to n−1, so the variables i, j, k are simply integers.
Because there are only n nodes, we know we'll be done after
the outer loop finishes.
</p>
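<p class=pp>
For readers who want to run it, here is a minimal Go sketch of the same loop. The four-node graph in <code>exampleGraph</code> is a hypothetical example chosen for illustration, and using floating-point +Inf for ∞ is just one convenient way to avoid overflow when adding.
</p>

```go
package main

import (
	"fmt"
	"math"
)

// inf marks "no known path". With +Inf, inf+x stays inf.
var inf = math.Inf(1)

// floydWarshall updates dist in place so that dist[i][j] becomes the
// length of the shortest path from i to j. dist must be square, with
// dist[i][j] initially the direct edge length, or inf if there is no edge.
func floydWarshall(dist [][]float64) {
	n := len(dist)
	for k := 0; k < n; k++ {
		for i := 0; i < n; i++ {
			for j := 0; j < n; j++ {
				if d := dist[i][k] + dist[k][j]; d < dist[i][j] {
					dist[i][j] = d
				}
			}
		}
	}
}

// exampleGraph returns a small directed graph (hypothetical data).
func exampleGraph() [][]float64 {
	return [][]float64{
		{0, 3, inf, 7},
		{8, 0, 2, inf},
		{5, inf, 0, 1},
		{2, inf, inf, 0},
	}
}

func main() {
	dist := exampleGraph()
	floydWarshall(dist)
	for _, row := range dist {
		fmt.Println(row)
	}
}
```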
<p class=pp>
The program I wrote to find minimum Boolean formula sizes is
an adaptation, substituting formula sizes for distance.
</p>
<pre class="indent alg">
// Algorithm 1
func compute()
    for each function f
        size[f] = ∞
    for each single variable function f = v
        size[f] = 0
    loop
        changed = false
        for each function f
            for each function g
                d = size[f] + 1 + size[g]
                if d < size[f OR g]
                    size[f OR g] = d
                    changed = true
                if d < size[f AND g]
                    size[f AND g] = d
                    changed = true
        if not changed
            return
</pre>
<p class=lp>
Algorithm 1 runs the same kind of iterative update loop as the Floyd-Warshall algorithm,
but it isn't as obvious when you can stop, because you don't
know the maximum formula size beforehand.
So it runs until a round doesn't find any new functions to make,
iterating until it finds a fixed point.
</p>
<p class=pp>
The pseudocode above glosses over some details, such as
the fact that the per-function loops can iterate over a
queue of functions known to have finite size, so that each
loop omits the functions that aren't
yet known. That's only a constant factor improvement,
but it's a useful one.
</p>
<p class=pp>
Another important detail missing above
is the representation of functions. The most convenient
representation is a binary truth table.
For example,
if we are computing the complexity of two-variable functions,
there are four possible inputs, which we can number as follows.
</p>
<center>
<table>
<tr><th>X <th>Y <th>Value
<tr><td>false <td>false <td>00<sub>2</sub> = 0
<tr><td>false <td>true <td>01<sub>2</sub> = 1
<tr><td>true <td>false <td>10<sub>2</sub> = 2
<tr><td>true <td>true <td>11<sub>2</sub> = 3
</table>
</center>
<p class=pp>
The functions are then the 4-bit numbers giving the value of the
function for each input. For example, function 13 = 1101<sub>2</sub>
is true for all inputs except X=false Y=true.
Three-variable functions correspond to 3-bit inputs generating 8-bit truth tables,
and so on.
</p>
<p class=pp>
This representation has two key advantages. The first is that
the numbering is dense, so that you can implement a map keyed
by function using a simple array. The second is that the operations
“f AND g” and “f OR g” can be implemented using
bitwise operators: the truth table for “f AND g” is the bitwise
AND of the truth tables for f and g.
</p>
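<p class=pp>
Putting the truth-table representation together with Algorithm 1 gives a short program. The following Go sketch is not the 2001 program, only a reconstruction under a few assumptions: input v is taken to be true at table indices with bit v set, each variable and its negation start at size 0, and the names <code>literals</code>, <code>formulaSizes</code>, and <code>maxOf</code> are invented here. It is only practical for three or fewer inputs.
</p>

```go
package main

import "fmt"

const inf = 1 << 30 // stands in for ∞

// literals returns the truth tables of every input and its negation.
// Input v is true exactly at the table indices that have bit v set.
func literals(n uint) []uint32 {
	nbits := uint(1) << n
	mask := uint32(1)<<nbits - 1
	var lits []uint32
	for v := uint(0); v < n; v++ {
		var t uint32
		for i := uint(0); i < nbits; i++ {
			if (i>>v)&1 == 1 {
				t |= 1 << i
			}
		}
		lits = append(lits, t, ^t&mask)
	}
	return lits
}

// formulaSizes runs Algorithm 1 to a fixed point. The result is an array
// indexed by truth table, which works because the numbering is dense.
func formulaSizes(n uint) []int {
	nfunc := 1 << (uint(1) << n)
	size := make([]int, nfunc)
	for f := range size {
		size[f] = inf
	}
	for _, t := range literals(n) {
		size[t] = 0
	}
	for changed := true; changed; {
		changed = false
		for f := 0; f < nfunc; f++ {
			if size[f] == inf {
				continue // not yet known; nothing to combine
			}
			for g := 0; g < nfunc; g++ {
				if size[g] == inf {
					continue
				}
				d := size[f] + 1 + size[g]
				if d < size[f|g] { // "f OR g" is a bitwise OR of tables
					size[f|g] = d
					changed = true
				}
				if d < size[f&g] { // "f AND g" is a bitwise AND
					size[f&g] = d
					changed = true
				}
			}
		}
	}
	return size
}

// maxOf returns the largest entry, the size of a hardest function.
func maxOf(size []int) int {
	m := 0
	for _, s := range size {
		if s > m {
			m = s
		}
	}
	return m
}

func main() {
	for n := uint(1); n <= 3; n++ {
		fmt.Printf("n=%d max size=%d\n", n, maxOf(formulaSizes(n)))
	}
}
```

<p class=lp>
Skipping functions whose size is still ∞ plays the role of the queue of known functions mentioned above.
</p>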
<p class=pp>
That program worked well enough in 2001 to compute the
minimum number of operators necessary to write any
1-, 2-, 3-, and 4-variable Boolean function. Each round
takes asymptotically O(2<sup>2<sup>n</sup></sup>·2<sup>2<sup>n</sup></sup>) = O(2<sup>2<sup>n+1</sup></sup>) time, and the number of
rounds needed is O(the final answer). The answer for n=4
is 15, so the computation required on the order of
15·2<sup>2<sup>5</sup></sup> = 15·2<sup>32</sup> iterations of the innermost loop.
That was plausible on the computer I was using at
the time, but the answer for n=5, likely around 30,
would need 30·2<sup>64</sup> iterations to compute, which
seemed well out of reach.
At the time, it seemed plausible that parity was always
a hardest function and that the minimum size would
continue to alternate between 2<sup>n</sup>−1 and 2<sup>n</sup>+1.
It's a nice pattern.
</p>
<h3>Exploiting Symmetry</h3>
<p class=pp>
Five years later, though, Alex Healy and I got to talking about this sequence,
and Alex shot down both conjectures using results from the theory
of circuit complexity. (Theorists!) Neil Sloane added this note to
the <a href="http://oeis.org/history?seq=A056287">entry for the sequence</a> in his Online Encyclopedia of Integer Sequences:
</p>
<blockquote>
<tt>
%E A056287 Russ Cox conjectures that X<sub>1</sub> XOR ... XOR X<sub>n</sub> is always a worst f and that a(5) = 33 and a(6) = 63. But (Jan 27 2006) Alex Healy points out that this conjecture is definitely false for large n. So what is a(5)?
</tt>
</blockquote>
<p class=lp>
Indeed. What is a(5)? No one knew, and it wasn't obvious how to find out.
</p>
<p class=pp>
In January 2010, Alex and I started looking into ways to
speed up the computation for a(5). 30·2<sup>64</sup> is too many
iterations but maybe we could find ways to cut that number.
</p>
<p class=pp>
In general, if we can identify a class of functions f whose
members are guaranteed to have the same complexity,
then we can save just one representative of the class as
long as we recreate the entire class in the loop body.
What used to be:
</p>
<pre class="indent alg">
for each function f
    for each function g
        visit f AND g
        visit f OR g
</pre>
<p class=lp>
can be rewritten as
</p>
<pre class="indent alg">
for each canonical function f
    for each canonical function g
        for each ff equivalent to f
            for each gg equivalent to g
                visit ff AND gg
                visit ff OR gg
</pre>
<p class=lp>
That doesn't look like an improvement: it's doing all
the same work. But it can open the door to new optimizations
depending on the equivalences chosen.
For example, the functions “f” and “¬f” are guaranteed
to have the same complexity, by <a href="http://en.wikipedia.org/wiki/De_Morgan's_laws">De Morgan's laws</a>.
If we keep just one of
those two on the lists that “for each function” iterates over,
we can unroll the inner two loops, producing:
</p>
<pre class="indent alg">
for each canonical function f
    for each canonical function g
        visit f OR g
        visit f AND g
        visit ¬f OR g
        visit ¬f AND g
        visit f OR ¬g
        visit f AND ¬g
        visit ¬f OR ¬g
        visit ¬f AND ¬g
</pre>
<p class=lp>
That's still not an improvement, but it's no worse.
Each of the two loops considers half as many functions
but the inner iteration is four times longer.
Now we can notice that half of the tests aren't
worth doing: “f AND g” is the negation of
“¬f OR ¬g,” and so on, so only half
of them are necessary.
</p>
<p class=pp>
Let's suppose that when choosing between “f” and “¬f”
we keep the one that is false when presented with all true inputs.
(This has the nice property that <code>f ^ (int32(f) >> 31)</code>
is the truth table for the canonical form of <code>f</code>.)
Then we can tell which combinations above will produce
canonical functions when f and g are already canonical:
</p>
<pre class="indent alg">
for each canonical function f
    for each canonical function g
        visit f OR g
        visit f AND g
        visit ¬f AND g
        visit f AND ¬g
</pre>
<p class=lp>
That's a factor of two improvement over the original loop.
</p>
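<p class=pp>
The canonical-form expression and the choice of four combinations can be checked directly. In this Go sketch the helper names (<code>canon</code>, <code>isCanon</code>, <code>combos</code>, <code>dropped</code>) are invented for illustration, and a 5-input truth table is assumed to be a <code>uint32</code> whose bit 31 is the all-true input.
</p>

```go
package main

import "fmt"

// canon returns whichever of f and ¬f is false on the all-true input,
// which is bit 31 of a 5-input truth table. The arithmetic shift of the
// signed value yields all ones when bit 31 is set and zero otherwise.
func canon(f uint32) uint32 {
	return f ^ uint32(int32(f)>>31)
}

// isCanon reports whether bit 31 (the all-true input) is clear.
func isCanon(f uint32) bool { return f>>31 == 0 }

// combos returns the four combinations kept in the reduced loop.
func combos(f, g uint32) [4]uint32 {
	return [4]uint32{f | g, f & g, ^f & g, f & ^g}
}

// dropped returns the other four combinations, ordered so that
// dropped[i] is the negation of combos[i] by De Morgan's laws.
func dropped(f, g uint32) [4]uint32 {
	return [4]uint32{^f & ^g, ^f | ^g, f | ^g, ^f | g}
}

func main() {
	f := canon(0x12345678) // bit 31 clear: already canonical
	g := canon(0xdeadbeef) // bit 31 set: canon returns the complement
	fmt.Printf("f=%#x g=%#x\n", f, g)
	keep, drop := combos(f, g), dropped(f, g)
	for i := range keep {
		fmt.Println(isCanon(keep[i]), drop[i] == ^keep[i])
	}
}
```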
<p class=pp>
Another observation is that permuting
the inputs to a function doesn't change its complexity:
“f(V, W, X, Y, Z)” and “f(Z, Y, X, W, V)” will have the same
minimum size. For complex functions, each of the
5! = 120 permutations will produce a different truth table.
A factor of 120 reduction in storage is good but again
we have the problem of expanding the class in the
iteration. This time, there's a different trick for reducing
the work in the innermost iteration.
Since we only need to produce one member of
the equivalence class, it doesn't make sense to
permute the inputs to both f and g. Instead,
permuting just the inputs to f while fixing g
is guaranteed to hit at least one member
of each class that permuting both f and g would.
So we gain the factor of 120 twice in the loops
and lose it once in the iteration, for a net savings
of 120.
(In some ways, this is the same trick we did with “f” vs “¬f.”)
</p>
<p class=pp>
A final observation is that negating any of the inputs
to the function doesn't change its complexity,
because X and X' have the same complexity.
The same argument we used for permutations applies
here, for another constant factor of 2<sup>5</sup> = 32.
</p>
<p class=pp>
The code stores a single function for each equivalence class
and then recomputes the equivalent functions for f, but not g.
</p>
<pre class="indent alg">
for each canonical function f
    for each function ff equivalent to f
        for each canonical function g
            visit ff OR g
            visit ff AND g
            visit ¬ff AND g
            visit ff AND ¬g
</pre>
<p class=lp>
In all, we just got a savings of 2·120·32 = 7680,
cutting the total number of iterations from 30·2<sup>64</sup> = 5×10<sup>20</sup>
to 7×10<sup>16</sup>. If you figure we can do around
10<sup>9</sup> iterations per second, that's still 800 days of CPU time.
</p>
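<p class=pp>
The arithmetic behind that estimate is easy to reproduce. The Go sketch below uses a hypothetical helper, <code>estimate</code>, and floating-point math, so the results are approximate.
</p>

```go
package main

import (
	"fmt"
	"math"
)

// estimate returns the number of inner-loop iterations left after
// dividing 2^64 function pairs per round by the symmetry savings, and
// the CPU days that takes at the given rate (iterations per second).
func estimate(rounds, savings, rate float64) (iters, days float64) {
	iters = rounds * math.Pow(2, 64) / savings
	days = iters / rate / 86400
	return iters, days
}

func main() {
	savings := 2.0 * 120 * 32 // negation × permutations × input inversions
	iters, days := estimate(30, savings, 1e9)
	fmt.Printf("savings=%.0f iterations=%.1e days=%.0f\n", savings, iters, days)
}
```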
<p class=pp>
The full algorithm at this point is:
</p>
<pre class="indent alg">
// Algorithm 2
func compute():
    for each function f
        size[f] = ∞
    for each single variable function f = v
        size[f] = 0
    loop
        changed = false
        for each canonical function f
            for each function ff equivalent to f
                for each canonical function g
                    d = size[ff] + 1 + size[g]
                    changed |= visit(d, ff OR g)
                    changed |= visit(d, ff AND g)
                    changed |= visit(d, ff AND ¬g)
                    changed |= visit(d, ¬ff AND g)
        if not changed
            return
func visit(d, fg):
    if size[fg] != ∞
        return false
    record fg as canonical
    for each function ffgg equivalent to fg
        size[ffgg] = d
    return true
</pre>
<p class=lp>
The helper function “visit” must set the size not only of its argument fg
but also all equivalent functions under permutation or inversion of the inputs,
so that future tests will see that they have been computed.
</p>
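<p class=pp>
To make “all equivalent functions” concrete, here is a brute-force Go sketch that enumerates the class of a 5-input truth table. It is deliberately naive: it remaps truth-table indices one bit at a time for each of the 120 permutations and 32 inversion patterns, which is slower than the incremental bitwise approach but easier to check. The names <code>transform</code>, <code>permutations</code>, and <code>class</code> are invented here, and input i is assumed to be bit i of the table index.
</p>

```go
package main

import "fmt"

// transform returns the truth table g with g(idx) = f(src), where bit
// perm[i] of src is bit i of idx XOR inv. That is, the inputs selected
// by inv are negated and then the inputs are rearranged by perm.
func transform(f uint32, perm [5]uint, inv uint) uint32 {
	var out uint32
	for idx := uint(0); idx < 32; idx++ {
		var src uint
		for i := uint(0); i < 5; i++ {
			src |= (((idx ^ inv) >> i) & 1) << perm[i]
		}
		out |= ((f >> src) & 1) << idx
	}
	return out
}

// permutations returns all 120 orderings of 0..4.
func permutations() [][5]uint {
	var out [][5]uint
	var rec func(p [5]uint, k int)
	rec = func(p [5]uint, k int) {
		if k == 5 {
			out = append(out, p)
			return
		}
		for i := k; i < 5; i++ {
			p[k], p[i] = p[i], p[k]
			rec(p, k+1) // p is an array, so rec gets its own copy
			p[k], p[i] = p[i], p[k]
		}
	}
	rec([5]uint{0, 1, 2, 3, 4}, 0)
	return out
}

// class returns every truth table equivalent to f under permutation of
// inputs, inversion of inputs, and negation of the output.
func class(f uint32) map[uint32]bool {
	set := map[uint32]bool{}
	for _, p := range permutations() {
		for inv := uint(0); inv < 32; inv++ {
			g := transform(f, p, inv)
			set[g] = true
			set[^g] = true
		}
	}
	return set
}

func main() {
	fmt.Println("single variable:", len(class(0xAAAAAAAA)))
	fmt.Println("parity:", len(class(0x96696996)))
}
```

<p class=lp>
A class has at most 2·120·32 = 7680 members, but symmetric functions have far fewer, which is why a visit routine has to cope with reaching the same truth table more than once.
</p>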
<h3>Methodical Exploration</h3>
<p class=pp>
There's one final improvement we can make.
The approach of looping until things stop changing
considers each function pair multiple times
as their sizes go down. Instead, we can consider functions
in order of complexity, so that the main loop
builds first all the functions of minimum complexity 1,
then all the functions of minimum complexity 2,
and so on. If we do that, we'll consider each function pair at most once.
We can stop when all functions are accounted for.
</p>
<p class=pp>
Applying this idea to Algorithm 1 (before canonicalization) yields:
</p>
<pre class="indent alg">
// Algorithm 3
func compute()
    for each function f
        size[f] = ∞
    for each single variable function f = v
        size[f] = 0
    for k = 1 to ∞
        for each function f
            for each function g of size k − size(f) − 1
                if size[f AND g] == ∞
                    size[f AND g] = k
                    nsize++
                if size[f OR g] == ∞
                    size[f OR g] = k
                    nsize++
        if nsize == 2<sup>2<sup>n</sup></sup>
            return
</pre>
<p class=lp>
Applying the idea to Algorithm 2 (after canonicalization) yields:
</p>
<pre class="indent alg">
// Algorithm 4
func compute():
    for each function f
        size[f] = ∞
    for each single variable function f = v
        size[f] = 0
    for k = 1 to ∞
        for each canonical function f
            for each function ff equivalent to f
                for each canonical function g of size k − size(f) − 1
                    visit(k, ff OR g)
                    visit(k, ff AND g)
                    visit(k, ff AND ¬g)
                    visit(k, ¬ff AND g)
        if nvisited == 2<sup>2<sup>n</sup></sup>
            return
func visit(d, fg):
    if size[fg] != ∞
        return
    record fg as canonical
    for each function ffgg equivalent to fg
        if size[ffgg] == ∞
            size[ffgg] = d
            nvisited += 2  // counts ffgg and ¬ffgg
    return
</pre>
<p class=lp>
The original loop in Algorithms 1 and 2 considered each pair f, g in every
iteration of the loop after they were computed.
The new loop in Algorithms 3 and 4 considers each pair f, g only once,
when k = size(f) + size(g) + 1. This removes the
leading factor of 30 (the number of times we
expected the first loop to run) from our estimation
of the run time.
Now the expected number of iterations is around
2<sup>64</sup>/7680 = 2.4×10<sup>15</sup>. If we can do 10<sup>9</sup> iterations
per second, that's only 28 days of CPU time,
which I can deliver if you can wait a month.
</p>
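<p class=pp>
The size-ordered idea of Algorithm 3 can be sketched in Go for small n. This is an illustrative reconstruction, not the program used for a(5): it keeps a list of functions per size (<code>bySize</code>, a name invented here) so that “each function g of size k − size(f) − 1” is a direct lookup, and it counts the literals as already found.
</p>

```go
package main

import "fmt"

const unknown = -1 // stands in for ∞

// sizesInOrder computes minimum formula sizes for all functions of n
// inputs by building functions in order of increasing size. It returns
// the size table and the largest size found.
func sizesInOrder(n uint) (size []int, max int) {
	nbits := uint(1) << n
	nfunc := 1 << nbits
	mask := uint32(nfunc - 1)
	size = make([]int, nfunc)
	for f := range size {
		size[f] = unknown
	}
	bySize := [][]uint32{nil} // bySize[k] lists the functions of size k
	nfound := 0
	for v := uint(0); v < n; v++ {
		var t uint32
		for i := uint(0); i < nbits; i++ {
			if (i>>v)&1 == 1 {
				t |= 1 << i
			}
		}
		for _, lit := range []uint32{t, ^t & mask} {
			size[lit] = 0
			bySize[0] = append(bySize[0], lit)
			nfound++
		}
	}
	for k := 1; nfound < nfunc; k++ {
		bySize = append(bySize, nil)
		// Each pair is considered only when size(f) + size(g) + 1 == k.
		for fs := 0; fs < k; fs++ {
			gs := k - fs - 1
			for _, f := range bySize[fs] {
				for _, g := range bySize[gs] {
					for _, fg := range []uint32{f & g, f | g} {
						if size[fg] == unknown {
							size[fg] = k
							bySize[k] = append(bySize[k], fg)
							nfound++
						}
					}
				}
			}
		}
		max = k
	}
	return size, max
}

func main() {
	for n := uint(1); n <= 3; n++ {
		_, max := sizesInOrder(n)
		fmt.Printf("n=%d max size=%d\n", n, max)
	}
}
```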
<p class=pp>
Our estimate does not include the fact that not all function pairs need
to be considered. For example, if the maximum size is 30, then the
functions of size 14 need never be paired against the functions of size 16,
because any result would have size 14+1+16 = 31.
So even 2.4×10<sup>15</sup> is an overestimate, but it's in the right ballpark.
(With hindsight I can report that only 1.7×10<sup>14</sup> pairs need to be considered
but also that our estimate of 10<sup>9</sup> iterations
per second was optimistic. The actual calculation ran for 20 days,
an average of about 10<sup>8</sup> iterations per second.)
</p>
<h3>Endgame: Directed Search</h3>
<p class=pp>
A month is still a long time to wait, and we can do better.
Near the end (after k is bigger than, say, 22), we are exploring
the fairly large space of function pairs in hopes of finding a
fairly small number of remaining functions.
At that point it makes sense to change from the
bottom-up “bang things together and see what we make”
to the top-down “try to make this one of these specific functions.”
That is, the core of the current search is:
</p>
<pre class="indent alg">
for each canonical function f
    for each function ff equivalent to f
        for each canonical function g of size k − size(f) − 1
            visit(k, ff OR g)
            visit(k, ff AND g)
            visit(k, ff AND ¬g)
            visit(k, ¬ff AND g)
</pre>
<p class=lp>
We can change it to:
</p>
<pre class="indent alg">
for each missing function fg
    for each canonical function g
        for all possible f such that one of these holds
                * fg = f OR g
                * fg = f AND g
                * fg = ¬f AND g
                * fg = f AND ¬g
            if size[f] == k − size(g) − 1
                visit(k, fg)
                next fg
</pre>
<p class=lp>
By the time we're at the end, exploring all the possible f to make
the missing functions—a directed search—is much less work than
the brute force of exploring all combinations.
</p>
<p class=pp>
As an example, suppose we are looking for f such that fg = f OR g.
The equation is only possible to satisfy if fg OR g == fg.
That is, if g has any extraneous 1 bits, no f will work, so we can move on.
Otherwise, the remaining condition is that
f AND ¬g == fg AND ¬g. That is, for the bit positions where g is 0, f must match fg.
The other bits of f (the bits where g has 1s)
can take any value.
We can enumerate the possible f values by recursively trying all
possible values for the “don't care” bits.
</p>
<pre class="indent alg">
func find(x, any, xsize):
    if size(x) == xsize
        return x
    while any != 0
        bit = any AND −any  // rightmost 1 bit in any
        any = any AND ¬bit
        if f = find(x OR bit, any, xsize) succeeds
            return f
    return failure
</pre>
<p class=lp>
It doesn't matter which 1 bit we choose for the recursion,
but finding the rightmost 1 bit is cheap: it is isolated by the
(admittedly surprising) expression “any AND −any.”
</p>
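<p class=pp>
Here is a small Go sketch of the same recursion. To keep it self-contained, the size test is replaced by a caller-supplied predicate <code>ok</code>, which is an assumption of this example rather than part of the original program, and the “any” bits are called <code>free</code> because <code>any</code> is a predeclared name in current Go. The helper <code>visited</code> shows that every combination of the free bits is tried exactly once.
</p>

```go
package main

import "fmt"

// lowBit isolates the rightmost 1 bit of v (0 if v == 0). In two's
// complement, -v flips every bit above the lowest set bit, so only
// that bit survives the AND.
func lowBit(v uint32) uint32 { return v & -v }

// find searches x OR s, over subsets s of the free bits, for a value
// accepted by ok, which stands in for the "size(x) == xsize" test.
func find(x, free uint32, ok func(uint32) bool) (uint32, bool) {
	if ok(x) {
		return x, true
	}
	for free != 0 {
		bit := lowBit(free)
		free &^= bit // later branches never add this bit back
		if f, found := find(x|bit, free, ok); found {
			return f, true
		}
	}
	return 0, false
}

// visited returns every value find examines when nothing is accepted.
func visited(x, free uint32) []uint32 {
	var seen []uint32
	find(x, free, func(v uint32) bool {
		seen = append(seen, v)
		return false
	})
	return seen
}

func main() {
	fmt.Printf("lowBit(0x16) = %#x\n", lowBit(0x16))
	fmt.Printf("%#x\n", visited(0x1, 0x16))
}
```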
<p class=pp>
Given <code>find</code>, the loop above can try these four cases:
</p>
<center>
<table id=find>
<tr><th>Formula <th>Condition <th>Base x <th>“Any” bits
<tr><td>fg = f OR g <td>fg OR g == fg <td>fg AND ¬g <td>g
<tr><td>fg = f OR ¬g <td>fg OR ¬g == fg <td>fg AND g <td>¬g
<tr><td>¬fg = f OR g <td>¬fg OR g == ¬fg <td>¬fg AND ¬g <td>g
<tr><td>¬fg = f OR ¬g <td>¬fg OR ¬g == ¬fg <td>¬fg AND g <td>¬g
</table>
</center>
<p class=lp>
Rewriting the Boolean expressions to use only the four OR forms
means that we only need to write the “adding bits” version of find.
</p>
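<p class=pp>
All four rows follow one pattern: to solve want = f OR h, require want OR h == want, start from want AND ¬h, and let the bits of h vary. The Go sketch below checks that pattern exhaustively on 4-bit truth tables (2-input functions), which keeps the brute force tiny. The helper names are invented for this example.
</p>

```go
package main

import "fmt"

const mask = 0xF // 4-bit truth tables of 2-input functions

// candidates lists every f of the form base OR s, where s is a subset
// of free, or nothing at all when the feasibility condition fails.
func candidates(cond bool, base, free uint32) map[uint32]bool {
	out := map[uint32]bool{}
	if !cond {
		return out
	}
	// Standard subset walk: s visits every subset of free, ending at 0.
	for s := free; ; s = (s - 1) & free {
		out[base|s] = true
		if s == 0 {
			break
		}
	}
	return out
}

// direct lists every f with want == f OR h, by trying all of them.
func direct(want, h uint32) map[uint32]bool {
	out := map[uint32]bool{}
	for f := uint32(0); f <= mask; f++ {
		if f|h == want {
			out[f] = true
		}
	}
	return out
}

// same reports whether two sets are equal.
func same(a, b map[uint32]bool) bool {
	if len(a) != len(b) {
		return false
	}
	for k := range a {
		if !b[k] {
			return false
		}
	}
	return true
}

// checkTable verifies all four rows for every pair fg, g.
func checkTable() bool {
	for fg := uint32(0); fg <= mask; fg++ {
		for g := uint32(0); g <= mask; g++ {
			nfg, ng := ^fg&mask, ^g&mask
			rows := []struct{ want, h uint32 }{
				{fg, g}, {fg, ng}, {nfg, g}, {nfg, ng},
			}
			for _, r := range rows {
				got := candidates(r.want|r.h == r.want, r.want&^r.h, r.h)
				if !same(got, direct(r.want, r.h)) {
					return false
				}
			}
		}
	}
	return true
}

func main() {
	fmt.Println("table rows agree with brute force:", checkTable())
}
```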
<p class=pp>
The final algorithm is:
</p>
<pre class="indent alg">
// Algorithm 5
func compute():
    for each function f
        size[f] = ∞
    for each single variable function f = v
        size[f] = 0
    // Generate functions.
    for k = 1 to max_generate
        for each canonical function f
            for each function ff equivalent to f
                for each canonical function g of size k − size(f) − 1
                    visit(k, ff OR g)
                    visit(k, ff AND g)
                    visit(k, ff AND ¬g)
                    visit(k, ¬ff AND g)
    // Search for functions.
    for k = max_generate+1 to ∞
        for each missing function fg
            for each canonical function g
                fsize = k − size(g) − 1
                if fg OR g == fg
                    if f = find(fg AND ¬g, g, fsize) succeeds
                        visit(k, fg)
                        next fg
                if fg OR ¬g == fg
                    if f = find(fg AND g, ¬g, fsize) succeeds
                        visit(k, fg)
                        next fg
                if ¬fg OR g == ¬fg
                    if f = find(¬fg AND ¬g, g, fsize) succeeds
                        visit(k, fg)
                        next fg
                if ¬fg OR ¬g == ¬fg
                    if f = find(¬fg AND g, ¬g, fsize) succeeds
                        visit(k, fg)
                        next fg
        if nvisited == 2<sup>2<sup>n</sup></sup>
            return
func visit(d, fg):
    if size[fg] != ∞
        return
    record fg as canonical
    for each function ffgg equivalent to fg
        if size[ffgg] == ∞
            size[ffgg] = d
            nvisited += 2  // counts ffgg and ¬ffgg
    return
func find(x, any, xsize):
    if size(x) == xsize
        return x
    while any != 0
        bit = any AND −any  // rightmost 1 bit in any
        any = any AND ¬bit
        if f = find(x OR bit, any, xsize) succeeds
            return f
    return failure
</pre>
<p class=lp>
To get a sense of the speedup here, and to check my work,
I ran the program using both algorithms
on a 2.53 GHz Intel Core 2 Duo E7200.
</p>
<center>
<table id=times>
<tr><th> <th colspan=3>————— # of Functions —————<th colspan=2>———— Time ————
<tr><th>Size <th>Canonical <th>All <th>All, Cumulative <th>Generate <th>Search
<tr><td>0 <td>1 <td>10 <td>10
<tr><td>1 <td>2 <td>82 <td>92 <td>< 0.1 seconds <td>3.4 minutes
<tr><td>2 <td>2 <td>640 <td>732 <td>< 0.1 seconds <td>7.2 minutes
<tr><td>3 <td>7 <td>4420 <td>5152 <td>< 0.1 seconds <td>12.3 minutes
<tr><td>4 <td>19 <td>24544 <td>29696 <td>< 0.1 seconds <td>30.1 minutes
<tr><td>5 <td>44 <td>117440 <td>147136 <td>< 0.1 seconds <td>1.3 hours
<tr><td>6 <td>142 <td>515040 <td>662176 <td>< 0.1 seconds <td>3.5 hours
<tr><td>7 <td>436 <td>1999608 <td>2661784 <td>0.2 seconds <td>11.6 hours
<tr><td>8 <td>1209 <td>6598400 <td>9260184 <td>0.6 seconds <td>1.7 days
<tr><td>9 <td>3307 <td>19577332 <td>28837516 <td>1.7 seconds <td>4.9 days
<tr><td>10 <td>7741 <td>50822560 <td>79660076 <td>4.6 seconds <td>[ 10 days ? ]
<tr><td>11 <td>17257 <td>114619264 <td>194279340 <td>10.8 seconds <td>[ 20 days ? ]
<tr><td>12 <td>31851 <td>221301008 <td>415580348 <td>21.7 seconds <td>[ 50 days ? ]
<tr><td>13 <td>53901 <td>374704776 <td>790285124 <td>38.5 seconds <td>[ 80 days ? ]
<tr><td>14 <td>75248 <td>533594528 <td>1323879652 <td>58.7 seconds <td>[ 100 days ? ]
<tr><td>15 <td>94572 <td>667653642 <td>1991533294 <td>1.5 minutes <td>[ 120 days ? ]
<tr><td>16 <td>98237 <td>697228760 <td>2688762054 <td>2.1 minutes <td>[ 120 days ? ]
<tr><td>17 <td>89342 <td>628589440 <td>3317351494 <td>4.1 minutes <td>[ 90 days ? ]
<tr><td>18 <td>66951 <td>468552896 <td>3785904390 <td>9.1 minutes <td>[ 50 days ? ]
<tr><td>19 <td>41664 <td>287647616 <td>4073552006 <td>23.4 minutes <td>[ 30 days ? ]
<tr><td>20 <td>21481 <td>144079832 <td>4217631838 <td>57.0 minutes <td>[ 10 days ? ]
<tr><td>21 <td>8680 <td>55538224 <td>4273170062 <td>2.4 hours <td>2.5 days
<tr><td>22 <td>2730 <td>16099568 <td>4289269630 <td>5.2 hours <td>11.7 hours
<tr><td>23 <td>937 <td>4428800 <td>4293698430 <td>11.2 hours <td>2.2 hours
<tr><td>24 <td>228 <td>959328 <td>4294657758 <td>22.0 hours <td>33.2 minutes
<tr><td>25 <td>103 <td>283200 <td>4294940958 <td>1.7 days <td>4.0 minutes
<tr><td>26 <td>21 <td>22224 <td>4294963182 <td>2.9 days <td>42 seconds
<tr><td>27 <td>10 <td>3602 <td>4294966784 <td>4.7 days <td>2.4 seconds
<tr><td>28 <td>3 <td>512 <td>4294967296 <td>[ 7 days ? ] <td>0.1 seconds
</table>
</center>
<p class=pp>
The bracketed times are estimates based on the work involved: I did not
wait that long for the intermediate search steps.
The search algorithm is quite a bit worse than generate until there are
very few functions left to find.
However, it comes in handy just when it is most useful: when the
generate algorithm has slowed to a crawl.
If we run generate through formulas of size 22 and then switch
to search for 23 onward, we can run the whole computation in
just over half a day of CPU time.
</p>
<p class=pp>
The computation of a(5) identified the sizes of all 616,126
canonical Boolean functions of 5 inputs.
In contrast, there are <a href="http://oeis.org/A000370">just over 200 trillion canonical Boolean functions of 6 inputs</a>.
Determining a(6) is unlikely to happen by brute force computation, no matter what clever tricks we use.
</p>
<h3>Adding XOR</h3>
<p class=pp>We've assumed the use of just AND and OR as our
basis for the Boolean formulas. If we also allow XOR, functions
can be written using many fewer operators.
In particular, a hardest function for the 2-, 3-, and 4-input
cases—parity—is now trivial.
Knuth examines the complexity of 5-input Boolean functions
using AND, OR, and XOR in detail in <a href="http://www-cs-faculty.stanford.edu/~uno/taocp.html">The Art of Computer Programming, Volume 4A</a>.
Section 7.1.2's Algorithm L is the same as our Algorithm 3 above,
given for computing 4-input functions.
Knuth mentions that to adapt it for 5-input functions one must
treat only canonical functions and gives results for 5-input functions
with XOR allowed.
So another way to check our work is to add XOR to our Algorithm 4
and check that our results match Knuth's.
</p>
<p class=pp>
Because the minimum formula sizes are smaller (at most 12), the
computation of sizes with XOR is much faster than before:
</p>
<center>
<table>
<tr><th> <th><th colspan=5>————— # of Functions —————<th>
<tr><th>Size <th width=10><th>Canonical <th width=10><th>All <th width=10><th>All, Cumulative <th width=10><th>Time
<tr><td align=right>0 <td><td align=right>1 <td><td align=right>10 <td><td align=right>10 <td><td>
<tr><td align=right>1 <td><td align=right>3 <td><td align=right>102 <td><td align=right>112 <td><td align=right>< 0.1 seconds
<tr><td align=right>2 <td><td align=right>5 <td><td align=right>1140 <td><td align=right>1252 <td><td align=right>< 0.1 seconds
<tr><td align=right>3 <td><td align=right>20 <td><td align=right>11570 <td><td align=right>12822 <td><td align=right>< 0.1 seconds
<tr><td align=right>4 <td><td align=right>93 <td><td align=right>109826 <td><td align=right>122648 <td><td align=right>< 0.1 seconds
<tr><td align=right>5 <td><td align=right>366 <td><td align=right>936440 <td><td align=right>1059088 <td><td align=right>0.1 seconds
<tr><td align=right>6 <td><td align=right>1730 <td><td align=right>7236880 <td><td align=right>8295968 <td><td align=right>0.7 seconds
<tr><td align=right>7 <td><td align=right>8782 <td><td align=right>47739088 <td><td align=right>56035056 <td><td align=right>4.5 seconds
<tr><td align=right>8 <td><td align=right>40297 <td><td align=right>250674320 <td><td align=right>306709376 <td><td align=right>24.0 seconds
<tr><td align=right>9 <td><td align=right>141422 <td><td align=right>955812256 <td><td align=right>1262521632 <td><td align=right>95.5 seconds
<tr><td align=right>10 <td><td align=right>273277 <td><td align=right>1945383936 <td><td align=right>3207905568 <td><td align=right>200.7 seconds
<tr><td align=right>11 <td><td align=right>145707 <td><td align=right>1055912608 <td><td align=right>4263818176 <td><td align=right>121.2 seconds
<tr><td align=right>12 <td><td align=right>4423 <td><td align=right>31149120 <td><td align=right>4294967296 <td><td align=right>65.0 seconds
</table>
</center>
<p class=pp>
Knuth does not discuss anything like Algorithm 5,
because the search for specific functions does not apply to
the AND, OR, and XOR basis. XOR is a non-monotone
function (it can both turn bits on and turn bits off), so
there is no test like our “<code>if fg OR g == fg</code>”
and no small set of “don't care” bits to trim the search for f.
The search for an appropriate f in the XOR case would have
to try all f of the right size, which is exactly what Algorithm 4 already does.
</p>
<p class=pp>
Volume 4A also considers the problem of building minimal circuits,
which are like formulas but can use common subexpressions additional times for free,
and the problem of building the shallowest possible circuits.
See Section 7.1.2 for all the details.
</p>
<h3>Code and Web Site</h3>
<p class=pp>
The web site <a href="http://boolean-oracle.swtch.com">boolean-oracle.swtch.com</a>
lets you type in a Boolean expression and gives back the minimal formula for it.
It uses tables generated while running Algorithm 5; those tables and the
programs described in this post are also <a href="http://boolean-oracle.swtch.com/about">available on the site</a>.
</p>
<h3>Postscript: Generating All Permutations and Inversions</h3>
<p class=pp>
The algorithms given above depend crucially on the step
“<code>for each function ff equivalent to f</code>,”
which generates all the ff obtained by permuting or inverting inputs to f,
but I did not explain how to do that.
We already saw that we can manipulate the binary truth table representation
directly to turn <code>f</code> into <code>¬f</code> and to compute
combinations of functions.
We can also manipulate the binary representation directly to
invert a specific input or swap a pair of adjacent inputs.
Using those operations we can cycle through all the equivalent functions.
</p>
<p class=pp>
To invert a specific input,
let's consider the structure of the truth table.
The index of a bit in the truth table encodes the inputs for that entry.
For example, the low bit of the index gives the value of the first input.
So the even-numbered bits—at indices 0, 2, 4, 6, ...—correspond to
the first input being false, while the odd-numbered bits—at indices 1, 3, 5, 7, ...—correspond
to the first input being true.
Changing just that bit in the index corresponds to changing the
single variable, so indices 0, 1 differ only in the value of the first input,
as do 2, 3, and 4, 5, and 6, 7, and so on.
Given the truth table for f(V, W, X, Y, Z) we can compute
the truth table for f(¬V, W, X, Y, Z) by swapping adjacent bit pairs
in the original truth table.
Even better, we can do all the swaps in parallel using a bitwise
operation.
To invert a different input, we swap larger runs of bits.
</p>
<center>
<table>
<tr><th>Function <th width=10> <th>Truth Table (<span style="font-weight: normal;"><code>f</code> = f(V, W, X, Y, Z)</span>)
<tr><td>f(¬V, W, X, Y, Z) <td><td><code>(f&0x55555555)<< 1 | (f>> 1)&0x55555555</code>
<tr><td>f(V, ¬W, X, Y, Z) <td><td><code>(f&0x33333333)<< 2 | (f>> 2)&0x33333333</code>
<tr><td>f(V, W, ¬X, Y, Z) <td><td><code>(f&0x0f0f0f0f)<< 4 | (f>> 4)&0x0f0f0f0f</code>
<tr><td>f(V, W, X, ¬Y, Z) <td><td><code>(f&0x00ff00ff)<< 8 | (f>> 8)&0x00ff00ff</code>
<tr><td>f(V, W, X, Y, ¬Z) <td><td><code>(f&0x0000ffff)<<16 | (f>>16)&0x0000ffff</code>
</table>
</center>
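<p class=pp>
The five expressions share one shape: the mask selects the entries where the input is false, and the shift distance is the run length 1, 2, 4, 8, or 16. A minimal Go sketch, with the function names invented here and inputs numbered 0 (V) through 4 (Z), compares the bitwise version against a slow bit-by-bit version:
</p>

```go
package main

import "fmt"

// invertMasks[i] selects the truth-table bits whose index has input i false.
var invertMasks = [5]uint32{0x55555555, 0x33333333, 0x0f0f0f0f, 0x00ff00ff, 0x0000ffff}

// invertInput returns the truth table of f with input i negated,
// swapping neighboring runs of 1<<i bits in parallel.
func invertInput(f uint32, i uint) uint32 {
	m, s := invertMasks[i], uint(1)<<i
	return (f&m)<<s | (f>>s)&m
}

// invertSlow does the same thing one table entry at a time.
func invertSlow(f uint32, i uint) uint32 {
	var out uint32
	for idx := uint(0); idx < 32; idx++ {
		// Entry idx of the result is entry idx with bit i flipped in f.
		src := idx ^ (uint(1) << i)
		out |= ((f >> src) & 1) << idx
	}
	return out
}

func main() {
	f := uint32(0xAAAAAAAA) // the function "V"
	fmt.Printf("%#x\n", invertInput(f, 0))
	fmt.Println(invertInput(0x12345678, 3) == invertSlow(0x12345678, 3))
}
```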
<p class=lp>
Being able to invert a specific input lets us consider all possible
inversions by building them up one at a time.
The <a href="http://oeis.org/A003188">Gray code</a> lets us
enumerate all possible 5-bit input codes while changing only 1 bit at
a time as we move from one input to the next:
</p>
<center>
0, 1, 3, 2, 6, 7, 5, 4, <br>
12, 13, 15, 14, 10, 11, 9, 8, <br>
24, 25, 27, 26, 30, 31, 29, 28, <br>
20, 21, 23, 22, 18, 19, 17, 16
</center>
<p class=lp>
This minimizes
the number of inversions we need: to consider all 32 cases, we only
need 31 inversion operations.
In contrast, visiting the 5-bit input codes in the usual binary order 0, 1, 2, 3, 4, ...
would often need to change multiple bits, like when changing from 3 to 4.
</p>
<p class=pp>
To swap a pair of adjacent inputs, we can again take advantage of the truth table.
For a pair of inputs, there are four cases: 00, 01, 10, and 11. We can leave the
00 and 11 cases alone, because they are invariant under swapping,
and concentrate on swapping the 01 and 10 bits.
The first two inputs change most often in the truth table: each run of 4 bits
corresponds to those four cases.
In each run, we want to leave the first and fourth alone and swap the second and third.
For later inputs, the four cases consist of sections of bits instead of single bits.
</p>
<center>
<table>
<tr><th>Function <th width=10> <th>Truth Table (<span style="font-weight: normal;"><code>f</code> = f(V, W, X, Y, Z)</span>)
<tr><td>f(<b>W, V</b>, X, Y, Z) <td><td><code>f&0x99999999 | (f&0x22222222)<<1 | (f>>1)&0x22222222</code>
<tr><td>f(V, <b>X, W</b>, Y, Z) <td><td><code>f&0xc3c3c3c3 | (f&0x0c0c0c0c)<<2 | (f>>2)&0x0c0c0c0c</code>
<tr><td>f(V, W, <b>Y, X</b>, Z) <td><td><code>f&0xf00ff00f | (f&0x00f000f0)<<4 | (f>>4)&0x00f000f0</code>
<tr><td>f(V, W, X, <b>Z, Y</b>) <td><td><code>f&0xff0000ff | (f&0x0000ff00)<<8 | (f>>8)&0x0000ff00</code>
</table>
</center>
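<p class=pp>
In each case the “keep” mask covers the 00 and 11 sections, the “move” mask covers the section where the lower input is true and the higher one false, and the shift distance is the section width: 1, 2, 4, or 8 bits. The Go sketch below uses those shift distances and checks the bitwise version against a slow bit-by-bit version; the names are invented here and inputs are numbered 0 (V) through 4 (Z).
</p>

```go
package main

import "fmt"

// swapKeep[i] keeps the entries where inputs i and i+1 agree.
// swapMove[i] selects the entries where input i is true and i+1 is false.
var swapKeep = [4]uint32{0x99999999, 0xc3c3c3c3, 0xf00ff00f, 0xff0000ff}
var swapMove = [4]uint32{0x22222222, 0x0c0c0c0c, 0x00f000f0, 0x0000ff00}

// swapInputs returns the truth table of f with inputs i and i+1
// exchanged, for i = 0..3.
func swapInputs(f uint32, i uint) uint32 {
	k, m, s := swapKeep[i], swapMove[i], uint(1)<<i
	return f&k | (f&m)<<s | (f>>s)&m
}

// swapSlow does the same thing one table entry at a time.
func swapSlow(f uint32, i uint) uint32 {
	var out uint32
	for idx := uint(0); idx < 32; idx++ {
		lo, hi := (idx>>i)&1, (idx>>(i+1))&1
		// src is idx with bits i and i+1 exchanged.
		src := idx&^(uint(3)<<i) | hi<<i | lo<<(i+1)
		out |= ((f >> src) & 1) << idx
	}
	return out
}

func main() {
	v := uint32(0xAAAAAAAA) // the function "V"
	fmt.Printf("%#x\n", swapInputs(v, 0))
	fmt.Println(swapInputs(0x12345678, 2) == swapSlow(0x12345678, 2))
}
```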
<p class=lp>
Being able to swap a pair of adjacent inputs lets us consider all
possible permutations by building them up one at a time.
Again it is convenient to have a way to visit all permutations by
applying only one swap at a time.
Here Volume 4A comes to the rescue.
Section 7.2.1.2 is titled “Generating All Permutations,” and Knuth delivers
many algorithms to do just that.
The most convenient for our purposes is Algorithm P, which
generates a sequence that considers all permutations exactly once
with only a single swap of adjacent inputs between steps.
Knuth calls it Algorithm P because it corresponds to the
“Plain changes” algorithm used by <a href="http://en.wikipedia.org/wiki/Change_ringing">bell ringers in 17th century England</a>
to ring a set of bells in all possible permutations.
The algorithm is described in a manuscript written around 1653!
</p>
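<p class=pp>
For a feel of how plain changes works, here is a Go sketch that produces the sequence of adjacent swaps. It is written from memory in the style of Knuth's Algorithm P and should be read as an illustration of the idea, not a verbatim transcription; the names <code>plainChanges</code> and <code>countDistinct</code> are invented here.
</p>

```go
package main

import "fmt"

// plainChanges returns a sequence of adjacent swaps that visits every
// arrangement of n items exactly once. Each value p means "exchange
// positions p and p+1" (0-based).
func plainChanges(n int) []int {
	if n < 1 {
		return nil
	}
	c := make([]int, n+1) // c[j]: how far item j has moved (1-based j)
	o := make([]int, n+1) // o[j]: direction item j is moving, +1 or -1
	for j := range o {
		o[j] = 1
	}
	var swaps []int
	for {
		j, s := n, 0
		for {
			q := c[j] + o[j]
			if q >= 0 && q != j {
				// Item j moves one step. The two 1-based positions are
				// j-c[j]+s and j-q+s; record the lower one, 0-based.
				p := j - c[j] + s
				if j-q+s < p {
					p = j - q + s
				}
				swaps = append(swaps, p-1)
				c[j] = q
				break
			}
			if q == j {
				if j == 1 {
					return swaps
				}
				s++ // item j is parked at the left end
			}
			o[j] = -o[j] // item j reverses; try the next smaller item
			j--
		}
	}
}

// countDistinct applies the swaps to 0..n-1 and counts the distinct
// arrangements seen, including the starting one.
func countDistinct(n int) int {
	a := make([]int, n)
	for i := range a {
		a[i] = i
	}
	seen := map[string]bool{fmt.Sprint(a): true}
	for _, p := range plainChanges(n) {
		a[p], a[p+1] = a[p+1], a[p]
		seen[fmt.Sprint(a)] = true
	}
	return len(seen)
}

func main() {
	fmt.Println(plainChanges(3))
	fmt.Println(len(plainChanges(5)), countDistinct(5))
}
```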
<p class=pp>
We can examine all possible permutations and inversions by
nesting a loop over all permutations inside a loop over all inversions,
and in fact that's what my program does.
Knuth does one better, though: his Exercise 7.2.1.2-20
suggests that it is possible to build up all the possibilities
using only adjacent swaps and inversion of the first input.
Negating arbitrary inputs is not hard, though, and still does
minimal work, so the code sticks with Gray codes and Plain changes.
</p></p>
Zip Files All The Way Downtag:research.swtch.com,2012:research.swtch.com/zip2010-03-18T00:00:00-04:002010-03-18T00:00:00-04:00Did you think it was turtles?
<p><p class=lp>
Stephen Hawking begins <i><a href="http://www.amazon.com/-/dp/0553380168">A Brief History of Time</a></i> with this story:
</p>
<blockquote>
<p class=pp>
A well-known scientist (some say it was Bertrand Russell) once gave a public lecture on astronomy. He described how the earth orbits around the sun and how the sun, in turn, orbits around the center of a vast collection of stars called our galaxy. At the end of the lecture, a little old lady at the back of the room got up and said: “What you have told us is rubbish. The world is really a flat plate supported on the back of a giant tortoise.” The scientist gave a superior smile before replying, “What is the tortoise standing on?” “You're very clever, young man, very clever,” said the old lady. “But it's turtles all the way down!”
</p>
</blockquote>
<p class=lp>
Scientists today are pretty sure that the universe is not actually turtles all the way down,
but we can create that kind of situation in other contexts.
For example, here we have <a href="http://www.youtube.com/watch?v=Y-gqMTt3IUg">video monitors all the way down</a>
and <a href="http://www.amazon.com/gp/customer-media/product-gallery/0387900926/ref=cm_ciu_pdp_images_all">set theory books all the way down</a>,
and <a href="http://blog.makezine.com/archive/2009/01/thousands_of_shopping_carts_stake_o.html">shopping carts all the way down</a>.
</p>
<p class=pp>
And here's a computer storage equivalent:
look inside <a href="http://swtch.com/r.zip"><code>r.zip</code></a>.
It's zip files all the way down:
each one contains another zip file under the name <code>r/r.zip</code>.
(For the die-hard Unix fans, <a href="http://swtch.com/r.tar.gz"><code>r.tar.gz</code></a> is
gzipped tar files all the way down.)
Like the line of shopping carts, it never ends,
because it loops back onto itself: the zip file contains itself!
And it's probably less work to put together a self-reproducing zip file
than to put together all those shopping carts,
at least if you're the kind of person who would read this blog.
This post explains how.
</p>
<p class=pp>
Before we get to self-reproducing zip files, though,
we need to take a brief detour into self-reproducing programs.
</p>
<h3>Self-reproducing programs</h3>
<p class=pp>
The idea of self-reproducing programs dates back to the 1960s.
My favorite statement of the problem is the one Ken Thompson gave in his 1983 Turing Award address:
</p>
<blockquote>
<p class=pp>
In college, before video games, we would amuse ourselves by posing programming exercises. One of the favorites was to write the shortest self-reproducing program. Since this is an exercise divorced from reality, the usual vehicle was FORTRAN. Actually, FORTRAN was the language of choice for the same reason that three-legged races are popular.
</p>
<p class=pp>
More precisely stated, the problem is to write a source program that, when compiled and executed, will produce as output an exact copy of its source. If you have never done this, I urge you to try it on your own. The discovery of how to do it is a revelation that far surpasses any benefit obtained by being told how to do it. The part about “shortest” was just an incentive to demonstrate skill and determine a winner.
</p>
</blockquote>
<p class=lp>
<b>Spoiler alert!</b>
I agree: if you have never done this, I urge you to try it on your own.
The internet makes it so easy to look things up that it's refreshing
to discover something yourself once in a while.
Go ahead and spend a few days figuring it out. This blog will still be here
when you get back.
(If you don't mind the spoilers, the entire <a href="http://cm.bell-labs.com/who/ken/trust.html">Turing Award address</a> is worth reading.)
</p>
<center>
<br><br>
<i>(Spoiler blocker.)</i>
<br>
<a href="http://www.robertwechsler.com/projects.html"><img src="http://research.swtch.com/applied_geometry.jpg"></a>
<br>
<i><a href="http://www.robertwechsler.com/projects.html">http://www.robertwechsler.com/projects.html</a></i>
<br><br>
</center>
<p class=pp>
Let's try to write a Python program that prints itself.
It will probably be a <code>print</code> statement, so here's a first attempt,
run at the interpreter prompt:
</p>
<pre class=indent>
>>> print '<span style="color: #005500">hello</span>'
hello
</pre>
<p class=lp>
That didn't quite work. But now we know what the program is, so let's print it:
</p>
<pre class=indent>
>>> print "<span style="color: #005500">print 'hello'</span>"
print 'hello'
</pre>
<p class=lp>
That didn't quite work either. The problem is that when you execute
a simple print statement, it only prints part of itself: the argument to the print.
We need a way to print the rest of the program too.
</p>
<p class=pp>
The trick is to use recursion: you write a string that is the whole program,
but with itself missing, and then you plug it into itself before passing it to print.
</p>
<pre class=indent>
>>> s = '<span style="color: #005500">print %s</span>'; print s % repr(s)
print 'print %s'
</pre>
<p class=lp>
Not quite, but closer: the problem is that the string <code>s</code> isn't actually
the program. But now we know the general form of the program:
<code>s = '<span style="color: #005500">%s</span>'; print s % repr(s)</code>.
That's the string to use.
</p>
<pre class=indent>
>>> s = '<span style="color: #005500">s = %s; print s %% repr(s)</span>'; print s % repr(s)
s = 's = %s; print s %% repr(s)'; print s % repr(s)
</pre>
<p class=lp>
Recursion for the win.
</p>
<p class=pp>
This form of self-reproducing program is often called a <a href="http://en.wikipedia.org/wiki/Quine_(computing)">quine</a>,
in honor of the philosopher and logician W. V. O. Quine,
who discovered the paradoxical sentence:
</p>
<blockquote>
“Yields falsehood when preceded by its quotation”<br>yields falsehood when preceded by its quotation.
</blockquote>
<p class=lp>
The simplest English form of a self-reproducing quine is a command like:
</p>
<blockquote>
Print this, followed by its quotation:<br>“Print this, followed by its quotation:”
</blockquote>
<p class=lp>
There's nothing particularly special about Python that makes quining possible.
The most elegant quine I know is a Scheme program that is a direct, if somewhat inscrutable, translation of that
sentiment:
</p>
<pre class=indent>
((lambda (x) `<span style="color: #005500">(</span>,x <span style="color: #005500">'</span>,x<span style="color: #005500">)</span>)
'<span style="color: #005500">(lambda (x) `(,x ',x))</span>)
</pre>
<p class=lp>
I think the Go version is a clearer translation, at least as far as the quoting is concerned:
</p>
<pre class=indent>
/* Go quine */
package main
import "<span style="color: #005500">fmt</span>"
func main() {
fmt.Printf("<span style="color: #005500">%s%c%s%c\n</span>", q, 0x60, q, 0x60)
}
var q = `<span style="color: #005500">/* Go quine */
package main
import "fmt"
func main() {
fmt.Printf("%s%c%s%c\n", q, 0x60, q, 0x60)
}
var q = </span>`
</pre>
<p class=lp>(I've colored the data literals green throughout to make it clear what is program and what is data.)</p>
<p class=pp>The Go program has the interesting property that, ignoring the pesky newline
at the end, the entire program is the same thing twice (<code>/* Go quine */ ... q = `</code>).
That got me thinking: maybe it's possible to write a self-reproducing program
using only a repetition operator.
And you know what programming language has essentially only a repetition operator?
The language used to encode Lempel-Ziv compressed files
like the ones used by <code>gzip</code> and <code>zip</code>.
</p>
<h3>Self-reproducing Lempel-Ziv programs</h3>
<p class=pp>
Lempel-Ziv compressed data is a stream of instructions with two basic
opcodes: <code>literal(</code><i>n</i><code>)</code> followed by
<i>n</i> bytes of data means write those <i>n</i> bytes into the
decompressed output,
and <code>repeat(</code><i>d</i><code>,</code> <i>n</i><code>)</code>
means look backward <i>d</i> bytes from the current location
in the decompressed output and copy the <i>n</i> bytes you find there
into the output stream.
</p>
<p class=pp>
The programming exercise, then, is this: write a Lempel-Ziv program
using just those two opcodes that prints itself when run.
In other words, write a compressed data stream that decompresses to itself.
Feel free to assume any reasonable encoding for the <code>literal</code>
and <code>repeat</code> opcodes.
For the grand prize, find a program that decompresses to
itself surrounded by an arbitrary prefix and suffix,
so that the sequence could be embedded in an actual <code>gzip</code>
or <code>zip</code> file, which has a fixed-format header and trailer.
</p>
<p class=pp>
<b>Spoiler alert!</b>
I urge you to try this on your own before continuing to read.
It's a great way to spend a lazy afternoon, and you have
one critical advantage that I didn't: you know there is a solution.
</p>
<center>
<br><br>
<i>(Spoiler blocker.)</i>
<br>
<a href="http://www.robertwechsler.com/thebest.html"><img src="http://research.swtch.com/the_best_circular_bike(sbcc_sbma_students_roof).jpg"></a>
<br>
<i><a href="http://www.robertwechsler.com/thebest.html">http://www.robertwechsler.com/thebest.html</a></i>
<br><br>
</center>
<p class=lp>By the way, here's <a href="http://swtch.com/r.gz"><code>r.gz</code></a>, gzip files all the way down.
<pre class=indent>
$ gunzip < r.gz > r
$ cmp r r.gz
$
</pre>
<p class=lp>The nice thing about <code>r.gz</code> is that even broken web browsers
that ordinarily decompress downloaded gzip data before storing it to disk
will handle this file correctly!
</p>
<p class=pp>Enough stalling to hide the spoilers.
Let's use this shorthand to describe Lempel-Ziv instructions:
<code>L</code><i>n</i> and <code>R</code><i>n</i> stand for
<code>literal(</code><i>n</i><code>)</code> and
<code>repeat(</code><i>n</i><code>,</code> <i>n</i><code>)</code>,
and each code is assumed to occupy one byte.
<code>L0</code> is therefore the Lempel-Ziv no-op;
<code>L5</code> <code>hello</code> prints <code>hello</code>;
and so does <code>L3</code> <code>hel</code> <code>R1</code> <code>L1</code> <code>o</code>.
</p>
<p class=pp>
Here's a Lempel-Ziv program that prints itself.
(Each line is one instruction.)
</p>
<br>
<center>
<table border=0>
<tr><th></th><th width=30></th><th>Code</th><th width=30></th><th>Output</th></tr>
<tr><td align=right><i><span style="font-size: 0.8em;">no-op</span></i></td><td></td><td><code>L0</code></td><td></td><td></td></tr>
<tr><td align=right><i><span style="font-size: 0.8em;">no-op</span></i></td><td></td><td><code>L0</code></td><td></td><td></td></tr>
<tr><td align=right><i><span style="font-size: 0.8em;">no-op</span></i></td><td></td><td><code>L0</code></td><td></td><td></td></tr>
<tr><td align=right><i><span style="font-size: 0.8em;">print 4 bytes</span></i></td><td></td><td><code>L4 <span style="color: #005500">L0 L0 L0 L4</span></code></td><td></td><td><code>L0 L0 L0 L4</code></td></tr>
<tr><td align=right><i><span style="font-size: 0.8em;">repeat last 4 printed bytes</span></i></td><td></td><td><code>R4</code></td><td></td><td><code>L0 L0 L0 L4</code></td></tr>
<tr><td align=right><i><span style="font-size: 0.8em;">print 4 bytes</span></i></td><td></td><td><code>L4 <span style="color: #005500">R4 L4 R4 L4</span></code></td><td></td><td><code>R4 L4 R4 L4</code></td></tr>
<tr><td align=right><i><span style="font-size: 0.8em;">repeat last 4 printed bytes</span></i></td><td></td><td><code>R4</code></td><td></td><td><code>R4 L4 R4 L4</code></td></tr>
<tr><td align=right><i><span style="font-size: 0.8em;">print 4 bytes</span></i></td><td></td><td><code>L4 <span style="color: #005500">L0 L0 L0 L0</span></code></td><td></td><td><code>L0 L0 L0 L0</code></td></tr>
</table>
</center>
<br>
<p class=lp>
(The two columns Code and Output contain the same byte sequence.)
</p>
<p class=pp>
The interesting core of this program is the 6-byte sequence
<code>L4 R4 L4 R4 L4 R4</code>, which prints the 8-byte sequence <code>R4 L4 R4 L4 R4 L4 R4 L4</code>.
That is, it prints itself with an extra byte before and after.
</p>
<p class=pp>
When we were trying to write the self-reproducing Python program,
the basic problem was that the print statement was always longer
than what it printed. We solved that problem with recursion,
computing the string to print by plugging it into itself.
Here we took a different approach.
The Lempel-Ziv program is
particularly repetitive, so that a repeated substring ends up
containing the entire fragment. The recursion is in the
representation of the program rather than its execution.
Either way, that fragment is the crucial point.
Before the final <code>R4</code>, the output lags behind the input.
Once it executes, the output is one code ahead.
</p>
<p class=pp>
The <code>L0</code> no-ops are plugged into
a more general variant of the program, which can reproduce itself
with the addition of an arbitrary three-byte prefix and suffix:
</p>
<br>
<center>
<table border=0>
<tr><th></th><th width=30></th><th>Code</th><th width=30></th><th>Output</th></tr>
<tr><td align=right><i><span style="font-size: 0.8em;">print 4 bytes</span></i></td><td></td><td><code>L4 <span style="color: #005500"><i>aa bb cc</i> L4</span></code></td><td></td><td><code><i>aa bb cc</i> L4</code></td></tr>
<tr><td align=right><i><span style="font-size: 0.8em;">repeat last 4 printed bytes</span></i></td><td></td><td><code>R4</code></td><td></td><td><code><i>aa bb cc</i> L4</code></td></tr>
<tr><td align=right><i><span style="font-size: 0.8em;">print 4 bytes</span></i></td><td></td><td><code>L4 <span style="color: #005500">R4 L4 R4 L4</span></code></td><td></td><td><code>R4 L4 R4 L4</code></td></tr>
<tr><td align=right><i><span style="font-size: 0.8em;">repeat last 4 printed bytes</span></i></td><td></td><td><code>R4</code></td><td></td><td><code>R4 L4 R4 L4</code></td></tr>
<tr><td align=right><i><span style="font-size: 0.8em;">print 4 bytes</span></i></td><td></td><td><code>L4 <span style="color: #005500">R4 <i>xx yy zz</i></span></code></td><td></td><td><code>R4 <i>xx yy zz</i></code></td></tr>
<tr><td align=right><i><span style="font-size: 0.8em;">repeat last 4 printed bytes</span></i></td><td></td><td><code>R4</code></td><td></td><td><code>R4 <i>xx yy zz</i></code></td></tr>
</table>
</center>
<br>
<p class=lp>
(The byte sequence in the Output column is <code><i>aa bb cc</i></code>, then
the byte sequence from the Code column, then <code><i>xx yy zz</i></code>.)
</p>
<p class=pp>
It took me the better part of a quiet Sunday to get this far,
but by the time I got here I knew the game was over
and that I'd won.
From all that experimenting, I knew it was easy to create
a program fragment that printed itself minus a few instructions
or even one that printed an arbitrary prefix
and then itself, minus a few instructions.
The extra <code>aa bb cc</code> in the output
provides a place to attach such a program fragment.
Similarly, it's easy to create a fragment to attach
to the <code>xx yy zz</code> that prints itself,
minus the first three instructions, plus an arbitrary suffix.
We can use that generality to attach an appropriate
header and trailer.
</p>
<p class=pp>
Here is the final program, which prints itself surrounded by an
arbitrary prefix and suffix.
<code>[P]</code> denotes the <i>p</i>-byte compressed form of the prefix <code>P</code>;
similarly, <code>[S]</code> denotes the <i>s</i>-byte compressed form of the suffix <code>S</code>.
</p>
<br>
<center>
<table border=0>
<tr><th></th><th width=30></th><th>Code</th><th width=30></th><th>Output</th></tr>
<tr>
<td align=right><i><span style="font-size: 0.8em;">print prefix</span></i></td>
<td></td>
<td><code>[P]</code></td>
<td></td>
<td><code>P</code></td>
</tr>
<tr>
<td align=right><span style="font-size: 0.8em;"><i>print </i>p<i>+1 bytes</i></span></td>
<td></td>
<td><code>L</code><span style="font-size: 0.8em;"><i>p</i>+1</span><code> <span style="color: #005500">[P] L</span></code><span style="color: #005500"><span style="font-size: 0.8em;"><i>p</i>+1</span></span><code></code></td>
<td></td>
<td><code>[P] L</code><span style="font-size: 0.8em;"><i>p</i>+1</span><code></code></td>
</tr>
<tr>
<td align=right><span style="font-size: 0.8em;"><i>repeat last </i>p<i>+1 printed bytes</i></span></td>
<td></td>
<td><code>R</code><span style="font-size: 0.8em;"><i>p</i>+1</span><code></code></td>
<td></td>
<td><code>[P] L</code><span style="font-size: 0.8em;"><i>p</i>+1</span><code></code></td>
</tr>
<tr>
<td align=right><span style="font-size: 0.8em;"><i>print 1 byte</i></span></td>
<td></td>
<td><code>L1 <span style="color: #005500">R</span></code><span style="color: #005500"><span style="font-size: 0.8em;"><i>p</i>+1</span></span><code></code></td>
<td></td>
<td><code>R</code><span style="font-size: 0.8em;"><i>p</i>+1</span><code></code></td>
</tr>
<tr>
<td align=right><span style="font-size: 0.8em;"><i>print 1 byte</i></span></td>
<td></td>
<td><code>L1 <span style="color: #005500">L1</span></code></td>
<td></td>
<td><code>L1</code></td>
</tr>
<tr>
<td align=right><i><span style="font-size: 0.8em;">print 4 bytes</span></i></td>
<td></td>
<td><code>L4 <span style="color: #005500">R</span></code><span style="color: #005500"><span style="font-size: 0.8em;"><i>p</i>+1</span></span><code><span style="color: #005500"> L1 L1 L4</span></code></td>
<td></td>
<td><code>R</code><span style="font-size: 0.8em;"><i>p</i>+1</span><code> L1 L1 L4</code></td>
</tr>
<tr>
<td align=right><i><span style="font-size: 0.8em;">repeat last 4 printed bytes</span></i></td>
<td></td>
<td><code>R4</code></td>
<td></td>
<td><code>R</code><span style="font-size: 0.8em;"><i>p</i>+1</span><code> L1 L1 L4</code></td>
</tr>
<tr>
<td align=right><i><span style="font-size: 0.8em;">print 4 bytes</span></i></td>
<td></td>
<td><code>L4 <span style="color: #005500">R4 L4 R4 L4</span></code></td>
<td></td>
<td><code>R4 L4 R4 L4</code></td>
</tr>
<tr>
<td align=right><i><span style="font-size: 0.8em;">repeat last 4 printed bytes</span></i></td>
<td></td>
<td><code>R4</code></td>
<td></td>
<td><code>R4 L4 R4 L4</code></td>
</tr>
<tr>
<td align=right><i><span style="font-size: 0.8em;">print 4 bytes</span></i></td>
<td></td>
<td><code>L4 <span style="color: #005500">R4 L0 L0 L</span></code><span style="color: #005500"><span style="font-size: 0.8em;"><i>s</i>+1</span></span><code><span style="color: #005500"></span></code></td>
<td></td>
<td><code>R4 L0 L0 L</code><span style="font-size: 0.8em;"><i>s</i>+1</span><code></code></td>
</tr>
<tr>
<td align=right><i><span style="font-size: 0.8em;">repeat last 4 printed bytes</span></i></td>
<td></td>
<td><code>R4</code></td>
<td></td>
<td><code>R4 L0 L0 L</code><span style="font-size: 0.8em;"><i>s</i>+1</span><code></code></td>
</tr>
<tr>
<td align=right><i><span style="font-size: 0.8em;">no-op</span></i></td>
<td></td>
<td><code>L0</code></td>
<td></td>
<td></td>
</tr>
<tr>
<td align=right><i><span style="font-size: 0.8em;">no-op</span></i></td>
<td></td>
<td><code>L0</code></td>
<td></td>
<td></td>
</tr>
<tr>
<td align=right><span style="font-size: 0.8em;"><i>print </i>s<i>+1 bytes</i></span></td>
<td></td>
<td><code>L</code><span style="font-size: 0.8em;"><i>s</i>+1</span><code> <span style="color: #005500">R</span></code><span style="color: #005500"><span style="font-size: 0.8em;"><i>s</i>+1</span></span><code><span style="color: #005500"> [S]</span></code></td>
<td></td>
<td><code>R</code><span style="font-size: 0.8em;"><i>s</i>+1</span><code> [S]</code></td>
</tr>
<tr>
<td align=right><span style="font-size: 0.8em;"><i>repeat last </i>s<i>+1 bytes</i></span></td>
<td></td>
<td><code>R</code><span style="font-size: 0.8em;"><i>s</i>+1</span><code></code></td>
<td></td>
<td><code>R</code><span style="font-size: 0.8em;"><i>s</i>+1</span><code> [S]</code></td>
</tr>
<tr>
<td align=right><i><span style="font-size: 0.8em;">print suffix</span></i></td>
<td></td>
<td><code>[S]</code></td>
<td></td>
<td><code>S</code></td>
</tr>
</table>
</center>
<br>
<p class=lp>
(The byte sequence in the Output column is <code><i>P</i></code>, then
the byte sequence from the Code column, then <code><i>S</i></code>.)
</p>
<h3>Self-reproducing zip files</h3>
<p class=pp>
Now the rubber meets the road.
We've solved the main theoretical obstacle to making a self-reproducing
zip file, but there are a couple of practical obstacles
still in our way.
</p>
<p class=pp>
The first obstacle is to translate our self-reproducing Lempel-Ziv program,
written in simplified opcodes, into the real opcode encoding.
<a href="http://www.ietf.org/rfc/rfc1951.txt">RFC 1951</a> describes the DEFLATE format used in both gzip and zip: a sequence of blocks, each of which
is a sequence of opcodes encoded using Huffman codes.
Huffman codes assign different length bit strings
to different opcodes,
breaking our assumption above that opcodes have
fixed length.
But wait!
We can, with some care, find a set of fixed-size encodings
that says what we need to be able to express.
</p>
<p class=pp>
In DEFLATE, there are literal blocks and opcode blocks.
The header at the beginning of a literal block is 5 bytes:
</p>
<center>
<img src="http://research.swtch.com/zip1.png">
</center>
<p class=pp>
If the translations of our <code>L</code> opcodes above
are 5 bytes each, the translations of the <code>R</code> opcodes
must also be 5 bytes each, with all the byte counts
above scaled by a factor of 5.
(For example, <code>L4</code> now has a 20-byte argument,
and <code>R4</code> repeats the last 20 bytes of output.)
The opcode block
with a single <code>repeat(20,20)</code> instruction falls well short of
5 bytes:
</p>
<center>
<img src="http://research.swtch.com/zip2.png">
</center>
<p class=lp>Luckily, an opcode block containing two
<code>repeat(20,10)</code> instructions has the same effect and is exactly 5 bytes:
</p>
<center>
<img src="http://research.swtch.com/zip3.png">
</center>
<p class=lp>
Encoding the other sized repeats
(<code>R</code><span style="font-size: 0.8em;"><i>p</i>+1</span> and
<code>R</code><span style="font-size: 0.8em;"><i>s</i>+1</span>)
takes more effort
and some sleazy tricks, but it turns out that
we can design 5-byte codes that repeat any amount
from 9 to 64 bytes.
For example, here are the repeat blocks for 10 bytes and for 40 bytes:
</p>
<center>
<img src="http://research.swtch.com/zip4.png">
<br>
<img src="http://research.swtch.com/zip5.png">
</center>
<p class=lp>
The repeat block for 10 bytes is two bits too short,
but every repeat block is followed by a literal block,
which starts with three zero bits and then padding
to the next byte boundary.
If a repeat block ends two bits short of a byte
but is followed by a literal block, the literal block's
padding will insert the extra two bits.
Similarly, the repeat block for 40 bytes is five bits too long,
but they're all zero bits.
Starting a literal block five bits too late
steals the bits from the padding.
Both of these tricks only work because the last 7 bits of
any repeat block are zero and the bits in the first byte
of any literal block are also zero,
so the boundary isn't directly visible.
If the literal block started with a one bit,
this sleazy trick wouldn't work.
</p>
<p class=pp>The second obstacle is that zip archives (and gzip files)
record a CRC32 checksum of the uncompressed data.
Since the uncompressed data is the zip archive,
the data being checksummed includes the checksum itself.
So we need to find a value <i>x</i> such that writing <i>x</i> into
the checksum field causes the file to checksum to <i>x</i>.
Recursion strikes back.
</p>
<p class=pp>
The CRC32 checksum computation interprets the entire file as a big number and computes
the remainder when you divide that number by a specific constant
using a specific kind of division.
We could go through the effort of setting up the appropriate
equations and solving for <i>x</i>.
But frankly, we've already solved one nasty recursive puzzle
today, and <a href="http://www.youtube.com/watch?v=TQBLTB5f3j0">enough is enough</a>.
There are only four billion possibilities for <i>x</i>:
we can write a program to try each in turn, until it finds one that works.
</p>
<p class=pp>
If you want to recreate these files yourself, there are a
few more minor obstacles, like making sure the tar file is a multiple
of 512 bytes and compressing the rather large zip trailer to
at most 59 bytes so that <code>R</code><span style="font-size: 0.8em;"><i>s</i>+1</span> is
at most <code>R</code><span style="font-size: 0.8em;">64</span>.
But they're just a simple matter of programming.
</p>
<p class=pp>
So there you have it:
<code><a href="http://swtch.com/r.gz">r.gz</a></code> (gzip files all the way down),
<code><a href="http://swtch.com/r.tar.gz">r.tar.gz</a></code> (gzipped tar files all the way down),
and
<code><a href="http://swtch.com/r.zip">r.zip</a></code> (zip files all the way down).
I regret that I have been unable to find any programs
that insist on decompressing these files recursively, ad infinitum.
It would have been fun to watch them squirm, but
it looks like much less sophisticated
<a href="http://en.wikipedia.org/wiki/Zip_bomb">zip bombs</a> have spoiled the fun.
</p>
<p class=pp>
If you're feeling particularly ambitious, here is
<a href="http://swtch.com/rgzip.go">rgzip.go</a>,
the <a href="http://golang.org/">Go</a> program that generated these files.
I wonder if you can create a zip file that contains a gzipped tar file
that contains the original zip file.
Ken Thompson suggested trying to make a zip file that
contains a slightly larger copy of itself, recursively,
so that as you dive down the chain of zip files
each one gets a little bigger.
(If you do manage either of these, please leave a comment.)
</p>
<br>
<p class=lp><font size=-1>P.S. I can't end the post without sharing my favorite self-reproducing program: the one-line shell script <code>#!/bin/cat</code></font>.
</p></p>
</div>
</div>
</div>
UTF-8: Bits, Bytes, and Benefitstag:research.swtch.com,2012:research.swtch.com/utf82010-03-05T00:00:00-05:002010-03-05T00:00:00-05:00The reasons to switch to UTF-8
<p><p class=pp>
UTF-8 is a way to encode Unicode code points—integer values from
0 through 10FFFF—into a byte stream,
and it is far simpler than many people realize.
The easiest way to make it confusing or complicated
is to treat it as a black box, never looking inside.
So let's start by looking inside. Here it is:
</p>
<center>
<table cellspacing=5 cellpadding=0 border=0>
<tr height=10><th colspan=4></th></tr>
<tr><th align=center colspan=2>Unicode code points</th><th width=10><th align=center>UTF-8 encoding (binary)</th></tr>
<tr height=10><td colspan=4></td></tr>
<tr><td align=right>00-7F</td><td>(7 bits)</td><td></td><td align=right>0<i>tuvwxyz</i></td></tr>
<tr><td align=right>0080-07FF</td><td>(11 bits)</td><td></td><td align=right>110<i>pqrst</i> 10<i>uvwxyz</i></td></tr>
<tr><td align=right>0800-FFFF</td><td>(16 bits)</td><td></td><td align=right>1110<i>jklm</i> 10<i>npqrst</i> 10<i>uvwxyz</i></td></tr>
<tr><td align=right valign=top>010000-10FFFF</td><td>(21 bits)</td><td></td><td align=right valign=top>11110<i>efg</i> 10<i>hijklm</i> 10<i>npqrst</i> 10<i>uvwxyz</i></td></tr>
<tr height=10><td colspan=4></td></tr>
</table>
</center>
<p class=lp>
The convenient properties of UTF-8 are all consequences of the choice of encoding.
</p>
<ol>
<li><i>All ASCII files are already UTF-8 files.</i><br>
The first 128 Unicode code points are the 7-bit ASCII character set,
and UTF-8 preserves their one-byte encoding.
</li>
<li><i>ASCII bytes always represent themselves in UTF-8 files. They never appear as part of other UTF-8 sequences.</i><br>
All the non-ASCII UTF-8 sequences consist of bytes
with the high bit set, so if you see the byte 0x7A in a UTF-8 file,
you can be sure it represents the character <code>z</code>.
</li>
<li><i>ASCII bytes are always represented as themselves in UTF-8 files. They cannot be hidden inside multibyte UTF-8 sequences.</i><br>
The ASCII <code>z</code> 01111010 cannot be encoded as a two-byte UTF-8 sequence
11000001 10111010. Code points must be encoded using the shortest
possible sequence.
A corollary is that decoders must detect long-winded sequences as invalid.
In practice, it is useful for a decoder to use the Unicode replacement
character, code point FFFD, as the decoding of an invalid UTF-8 sequence
rather than stop processing the text.
</li>
<li><i>UTF-8 is self-synchronizing.</i><br>
Let's call a byte of the form 10<i>xxxxxx</i>
a continuation byte.
Every UTF-8 sequence is a byte that is not a continuation byte
followed by zero or more continuation bytes.
If you start processing a UTF-8 file at an arbitrary point,
you might not be at the beginning of a UTF-8 encoding,
but you can easily find one: skip over
continuation bytes until you find a non-continuation byte.
(The same applies to scanning backward.)
</li>
<li><i>Substring search is just byte string search.</i><br>
Properties 2, 3, and 4 imply that given a string
of correctly encoded UTF-8, the only way those bytes
can appear in a larger UTF-8 text is when they represent the
same code points. So you can use any 8-bit-safe, byte-at-a-time
search function, like <code>strchr</code> or <code>strstr</code>, to run the search.
</li>
<li><i>Most programs that handle 8-bit files safely can handle UTF-8 safely.</i><br>
This also follows from Properties 2, 3, and 4.
I say “most” programs, because programs that
take apart a byte sequence expecting one character per byte
will not behave correctly, but very few programs do that.
It is far more common to split input at newline characters,
or split whitespace-separated fields, or do other similar parsing
around specific ASCII characters.
For example, Unix tools like cat, cmp, cp, diff, echo, head, tail, and tee
can process UTF-8 files as if they were plain ASCII files.
Most operating system kernels should also be able to handle
UTF-8 file names without any special arrangement, since the
only operations done on file names are comparisons
and splitting at <code>/</code>.
In contrast, tools like grep, sed, and wc, which inspect arbitrary
individual characters, do need modification.
</li>
<li><i>UTF-8 sequences sort in code point order.</i><br>
You can verify this by inspecting the encodings in the table above.
This means that Unix tools like join, ls, and sort (without options) don't need to handle
UTF-8 specially.
</li>
<li><i>UTF-8 has no “byte order.”</i><br>
UTF-8 is a byte encoding. It is not little endian or big endian.
Unicode defines a byte order mark (BOM), code point FEFF,
which is used to determine the byte order of a stream of
raw 16-bit values, like UCS-2 or UTF-16.
It has no place in a UTF-8 file.
Some programs like to write a UTF-8-encoded BOM
at the beginning of UTF-8 files, but this is unnecessary
(and annoying to programs that don't expect it).
</li>
</ol>
<p class=lp>
UTF-8 does give up the ability to do random
access using code point indices.
Programs that need to jump to the <i>n</i>th
Unicode code point in a file or on a line—text editors are the canonical example—will
typically convert incoming UTF-8 to an internal representation
like an array of code points and then convert back to UTF-8
for output,
but most programs are simpler when written to manipulate UTF-8 directly.
</p>
<p class=pp>
Programs that make UTF-8 more complicated than it needs to be
are typically trying to be too general,
not wanting to make assumptions that might not be true of
other encodings.
But there are good tools to convert other encodings to UTF-8,
and it is slowly becoming the standard encoding:
even the fraction of web pages
written in UTF-8 is
<a href="http://googleblog.blogspot.com/2010/01/unicode-nearing-50-of-web.html">nearing 50%</a>.
UTF-8 was explicitly designed
to have these nice properties. Take advantage of them.
</p>
<p class=pp>
For more on UTF-8, see “<a href="https://9p.io/sys/doc/utf.html">Hello World
or
Καλημέρα κόσμε
or
こんにちは 世界</a>,” by Rob Pike
and Ken Thompson, and also this <a href="http://www.cl.cam.ac.uk/~mgk25/ucs/utf-8-history.txt">history</a>.
</p>
<br>
<font size=-1>
<p class=lp>
Notes: Property 6 assumes the tools do not strip the high bit from each byte.
Such mangling was common years ago but is very uncommon now.
Property 7 assumes the comparison is done treating
the bytes as unsigned, but such behavior is mandated
by the ANSI C standard for <code>memcmp</code>,
<code>strcmp</code>, and <code>strncmp</code>.
</p>
</font></p>
Computing History at Bell Labstag:research.swtch.com,2012:research.swtch.com/bell-labs2008-04-09T00:00:00-04:002008-04-09T00:00:00-04:00Doug McIlroy’s remembrances
<p><p class=pp>
In 1997, on his retirement from Bell Labs, <a href="http://www.cs.dartmouth.edu/~doug/">Doug McIlroy</a> gave a
fascinating talk about the “<a href="https://web.archive.org/web/20081022192943/http://cm.bell-labs.com/cm/cs/doug97.html"><b>History of Computing at Bell Labs</b></a>.”
Almost ten years ago I transcribed the audio but never did anything with it.
The transcript is below.
</p>
<p class=pp>
My favorite parts of the talk are the description of the bi-quinary decimal relay calculator
and the description of a team that spent over a year tracking down a race condition bug in
a missile detector (reliability was king: today you’d just stamp
“cannot reproduce” and send the report back).
But the whole thing contains many fantastic stories.
It’s well worth the read or listen.
I also like his recollection of programming using cards: “It’s the kind of thing you can be nostalgic about, but it wasn’t actually fun.”
</p>
<p class=pp>
For more information, Bernard D. Holbrook and W. Stanley Brown’s 1982
technical report
“<a href="cstr99.pdf">A History of Computing Research at Bell Laboratories (1937-1975)</a>”
covers the earlier history in more detail.
</p>
<p><i>Corrections added August 19, 2009. Links updated May 16, 2018.</i></p>
<p><i>Update, December 19, 2020.</i> The original audio files disappeared along with the rest of the Bell Labs site some time ago, but I discovered a saved copy on one of my computers: [<a href="mcilroy97history.mp3">MP3</a> | <a href="mcilroy97history.rm">original RealAudio</a>].
I also added a few corrections and notes from Doug McIlroy, dated 2015 [sic].</p>
<br>
<br>
<p class=lp><b>Transcript</b></p>
<p class=pp>
Computing at Bell Labs is certainly an outgrowth of the
<a href="https://web.archive.org/web/20080622172015/http://cm.bell-labs.com/cm/ms/history/history.html">mathematics department</a>, which grew from that first hiring
in 1897, G A Campbell. When Bell Labs was formally founded
in 1925, what it had been was the engineering department
of Western Electric.
When it was formally founded in 1925,
almost from the beginning there was a math department with Thornton Fry as the department head, and if you look at some of Fry’s work, it turns out that
he was fussing around in 1929 with trying to discover
information theory. It didn’t actually gel until twenty years later with Shannon.</p>
<p class=pp><span style="font-size: 0.7em;">1:10</span>
Of course, most of the mathematics at that time was continuous.
One was interested in analyzing circuits and propagation. And indeed, this is what led to the growth of computing in Bell Laboratories. The computations could not all be done symbolically. There were not closed form solutions. There was lots of numerical computation done.
The math department had a fair stable of computers,
which in those days meant people. [laughter]</p>
<p class=pp><span style="font-size: 0.7em;">2:00</span>
And in the late ’30s, <a href="http://en.wikipedia.org/wiki/George_Stibitz">George Stibitz</a> had an idea that some of
the work that they were doing on hand calculators might be
automated by using some of the equipment that the Bell System
was installing in central offices, namely relay circuits.
He went home, and on his kitchen table, he built out of relays
a binary arithmetic circuit. He decided that binary was really
the right way to compute.
However, when he finally came to build some equipment,
he determined that binary to decimal conversion and
decimal to binary conversion was a drag, and he didn’t
want to put it in the equipment, and so he finally built
in 1939, a relay calculator that worked in decimal,
and it worked in complex arithmetic.
Do you have a hand calculator now that does complex arithmetic?
Ten-digit, I believe, complex computations: add, subtract,
multiply, and divide.
The I/O equipment was teletypes, so essentially all the stuff to make such
machines out of was there.
Since the I/O was teletypes, it could be remotely accessed,
and there were in fact four stations in the West Street Laboratories
of Bell Labs. West Street is down on the left side of Manhattan.
I had the good fortune to work there one summer, right next to a
district where you’re likely to get bowled over by rolling beeves hanging from racks or tumbling cabbages. The building is still there. It’s called <a href="http://query.nytimes.com/gst/fullpage.html?res=950DE3DB1F38F931A35751C0A96F948260">Westbeth Apartments</a>. It’s now an artists’ colony.</p>
<p class=pp><span style="font-size: 0.7em;">4:29</span>
Anyway, in West Street, there were four separate remote stations from which the complex calculator could be accessed. It was not time sharing. You actually reserved your time on the machine, and only one of the four terminals worked at a time.
In 1940, this machine was shown off to the world at the AMS annual convention, which happened to be held in Hanover at Dartmouth that year, and mathematicians could wonder at remote computing, doing computation on an electromechanical calculator at 300 miles away.</p>
<p class=pp><span style="font-size: 0.7em;">5:22</span>
Stibitz went on from there to make a whole series of relay machines. Many of them were made for the government during the war. They were named, imaginatively, Mark I through Mark VI.
I have read some of his patents. They’re kind of fun. One is a patent on conditional transfer. [laughter] And how do you do a conditional transfer?
Well these gadgets were, the relay calculator was run from your fingers, I mean the complex calculator.
The later calculators, of course, if your fingers were a teletype, you could perfectly well feed a paper tape in,
because that was standard practice. And these later machines were intended really to be run more from paper tape.
And the conditional transfer was this: you had two teletypes, and there’s a code that says "time to read from the other teletype". Loops were of course easy to do. You take paper and [laughter; presumably Doug curled a piece of paper to form a physical loop].
These machines never got to the point of having stored programs.
But they got quite big. I saw, one of them was here in 1954, and I did see it, behind glass, and if you’ve ever seen these machines in the, there’s one in the Franklin Institute in Philadelphia, and there’s one in the Science Museum in San Jose, you know these machines that drop balls that go wandering sliding around and turning paddle wheels and ringing bells and who knows what. It kind of looked like that.
It was a very quiet room, with just a little clicking of relays, which is what a central office used to be like. It was the one air-conditioned room in Murray Hill, I think. This machine ran, the Mark VI, well I think that was the Mark V, the Mark VI actually went to Aberdeen.
This machine ran for a good number of years, probably six, eight.
And it is said that it never made an undetected error. [laughter]</p>
<p class=pp><span style="font-size: 0.7em;">8:30</span>
What that means is that it never made an error that it did not diagnose itself and stop.
Relay technology was very very defensive. The telephone switching system had to work. It was full of self-checking,
and so were the calculators, so were the calculators that Stibitz made.</p>
<p class=pp><span style="font-size: 0.7em;">9:04</span>
Arithmetic was done in bi-quinary, a two out of five representation for decimal integers, and if there weren’t exactly two out of five relays activated it would stop.
This machine ran unattended over the weekends. People would
bring their tapes in, and the operator would paste everybody’s tapes together.
There was a beginning of job code on the tape and there was also a time indicator.
If the machine ran out of time, it automatically stopped and went to the next job. If the machine caught itself in an error, it backed up to the current job and tried it again.
They would load this machine on Friday night, and on Monday morning, all the tapes, all the entries would be available on output tapes.</p>
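<p class=pp>[<i>Illustrative sketch, not part of the talk</i>: the self-checking idea described above, where a digit is valid only if exactly two of five relays are on, can be modeled in a few lines of Python. The assignment of digits to relay patterns below is an arbitrary assumption; the talk does not say which pattern stood for which digit, and the names are made up for this example.]</p>

```python
from itertools import combinations

# Assumption: digits 0-9 are assigned to the ten 2-of-5 patterns in
# enumeration order. The real relay assignment is not given in the talk.
CODES = {}
for digit, pair in enumerate(combinations(range(5), 2)):
    CODES[digit] = tuple(1 if i in pair else 0 for i in range(5))
DIGITS = {pattern: digit for digit, pattern in CODES.items()}

class RelayFault(Exception):
    """Raised when a pattern does not have exactly two of five relays on."""

def decode(pattern):
    """Return the digit for a relay pattern, or stop with a fault."""
    pattern = tuple(pattern)
    valid_shape = len(pattern) == 5 and all(bit in (0, 1) for bit in pattern)
    if not valid_shape or sum(pattern) != 2:
        raise RelayFault(f"invalid relay pattern {pattern}")
    return DIGITS[pattern]
```

<p class=pp>[Because every valid pattern has exactly two relays on, any single relay that sticks on or drops out changes the count to three or one, so <code>decode</code> raises <code>RelayFault</code> instead of returning a wrong digit.]</p>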
<p class=pp>Question: I take it they were using a different representation for loops
and conditionals by then.</p>
<p class=pp>Doug: Loops were done actually by they would run back and forth across the tape now, on this machine.</p>
<p class=pp><span style="font-size: 0.7em;">10:40</span>
Then came the transistor in ’48.
At Whippany, they actually had a transistorized computer, which was a respectable minicomputer, a box about this big, running in 1954, it ran from 1954 to 1956 solidly as a test run.
The notion was that this computer might fly in an airplane.
And during that two-year test run, one diode failed.
In 1957, this machine called <a href="http://www.cedmagic.com/history/tradic-transistorized.html">TRADIC</a>, did in fact fly in an airplane, but to the best of my knowledge, that machine was a demonstration machine. It didn’t turn into a production machine.
About that time, we started buying commercial machines.
It’s wonderful to think about the set of different architectures that existed in that time. The first machine we got was called a <a href="http://www.columbia.edu/acis/history/cpc.html">CPC from IBM</a>. And all it was was a big accounting machine with a very special plugboard on the side that provided an interpreter for doing ten-digit decimal arithmetic, including
opcodes for the trig functions and square root.</p>
<p class=pp><span style="font-size: 0.7em;">12:30</span>
It was also not a computer as we know it today,
because it wasn’t stored program, it had twenty-four memory locations as I recall, and it took its program instead of from tapes, from cards. This was not a total advantage. A tape didn’t get into trouble if you dropped it on the floor. [laughter].
CPC, the operator would stand in front of it, and there, you
would go through loops by taking cards out, it took human intervention, to take the cards out of the output of the card reader and put them in the ?top?. I actually ran some programs on the CPC ?...?. It’s the kind of thing you can be nostalgic about, but it wasn’t actually fun.
[laughter]</p>
<p class=pp><span style="font-size: 0.7em;">13:30</span>
The next machine was an <a href="http://www.columbia.edu/acis/history/650.html">IBM 650</a>, and here, this was a stored program, with the memory being on drum. There was no operating system for it. It came with a manual: this is what the machine does. And Michael Wolontis made an interpreter called the <a href="http://hopl.info/showlanguage2.prx?exp=6497">L1 interpreter</a> for this machine, so you could actually program in, the manual told you how to program in binary, and L1 allowed you to give something like 10 for add and 9 for subtract, and program in decimal instead. And of course that machine required interesting optimization, because it was a nice thing if the next program step were stored somewhere -- each program step had the address of the following step in it, and you would try to locate them around the drum so to minimize latency. So there were all kinds of optimizers around, but I don’t think Bell Labs made ?...? based on this called "soap" from Carnegie Mellon. That machine didn’t last very long. Fortunately, a machine with core memory came out from IBM in about ’56, the 704. Bell Labs was a little slow in getting one, in ’58. Again, the machine came without an operating system.
In fact, but it did have Fortran, which really changed the world.
It suddenly made it easy to write programs. But the way Fortran came from IBM, it came with a thing called the Fortran Stop Book.
This was a list of what happened, a diagnostic would execute the halt instruction, the operator would go read the panel lights and discover where the machine had stopped, you would then go look up in the stop book what that meant.
Bell Labs, with George Mealy and Gwen Hanson, made an operating system, and one of the things they did was to bring the stop book to heel. They took the compiler, replaced all the stop instructions with jumps to somewhere, and allowed the program instead of stopping to go on to the next trial.
By the time I arrived at Bell Labs in 1958, this thing was running nicely.</p>
<p class=pp>[<i>McIlroy comments, 2015</i>: I’m pretty sure I was wrong in saying Mealy and Hanson brought
the stop book to heel. They built the OS, but I believe Dolores
Leagus tamed Fortran. (Dolores was the most accurate programmer I
ever knew. She’d write 2000 lines of code before testing a single
line--and it would work.)]</p>
<p class=pp><span style="font-size: 0.7em;">16:36</span>
Bell Labs continued to be a major player in operating systems.
This was called BESYS. BE was the SHARE abbreviation for Bell Labs. Each company that belonged to SHARE, which was the IBM users group, had a two-letter abbreviation. It’s hard to imagine taking all the computer users now and giving them a two-letter abbreviation. BESYS went through many generations, up to BESYS 5, I believe. Each one with innovations. IBM delivered a machine, the 7090, in 1960. This machine had interrupts in it, but IBM didn’t use them. But BESYS did. And that sent IBM back to the drawing board to make it work. [Laughter]</p>
<p class=pp><span style="font-size: 0.7em;">17:48</span>
Rob Pike: It also didn’t have memory protection.</p>
<p class=pp>Doug: It didn’t have memory protection either, and a lot of people actually got IBM to put memory protection in the 7090, so that one could leave the operating system resident in the presence of a wild program, an idea that the PC didn’t discover until, last year or something like that. [laughter]</p>
<p class=pp>Big players then, <a href="http://en.wikipedia.org/wiki/Richard_Hamming">Dick Hamming</a>, a name that I’m sure everybody knows,
was sort of the numerical analysis guru, and a seer.
He liked to make outrageous predictions. He predicted in 1960, that half of Bell Labs was going to be busy doing something with computers eventually.
?...? exaggerating some ?...? abstract in his thought.
He was wrong.
Half was a gross underestimate. Dick Hamming retired twenty years ago, and just this June he completed his full twenty years term in the Navy, which entitles him again to retire from the Naval Postgraduate Institute in Monterey. Stibitz, incidentally died, I think within the last year.
He was doing medical instrumentation at Dartmouth essentially, near the end.</p>
<p class=pp>[<i>McIlroy comments, 2015</i>: I’m not sure what exact unintelligible words I uttered about Dick
Hamming. When he predicted that half the Bell Labs budget would
be related to computing in a decade, people scoffed in terms like
“that’s just Dick being himself, exaggerating for effect”.]</p>
<p class=pp><span style="font-size: 0.7em;">20:00</span>
Various problems intrigued, besides the numerical problems, which in fact were stock in trade, and were the real justification for buying machines, until at least the ’70s I would say. But some non-numerical problems had begun to tickle the palate of the math department. Even G A Campbell got interested in graph theory, the reason being he wanted to think of all the possible ways you could take the three wires and the various parts of the telephone and connect them together, and try permutations to see what you could do about reducing sidetone by putting things into the various parts of the circuit, and devised every possible way of connecting the telephone up. And that was sort of the beginning of combinatorics at Bell Labs. John Riordan, a mathematician, parlayed this into a major subject. Two problems which are now deemed as computing problems, have intrigued the math department for a very long time, and those are the minimum spanning tree problem, and the wonderfully ?comment about Joe Kruskal, laughter?</p>
<p class=pp><span style="font-size: 0.7em;">21:50</span>
And in the 50s Bob Prim and Kruskal, who I don’t think worked on the Labs at that point, invented algorithms for the minimum spanning tree. Somehow or other, computer scientists usually learn these algorithms, one of the two at least, as Dijkstra’s algorithm, but he was a latecomer.</p>
<p class=pp>[<i>McIlroy comments, 2015</i>:
I erred in attributing Dijkstra’s algorithm to Prim and Kruskal. That
honor belongs to yet a third member of the math department: Ed
Moore. (Dijkstra’s algorithm is for shortest path, not spanning
tree.)]</p>
<p class=pp>Another pet was the traveling salesman. There’s been a long list of people at Bell Labs who played with that: Shen Lin and Ron Graham and David Johnson and dozens more, oh and ?...?. And then another problem is the Steiner minimum spanning tree, where you’re allowed to add points to the graph. Every one of these problems grew, actually had a justification in telephone billing. One jurisdiction or another would specify that the way you bill for a private line network was in one jurisdiction by the minimum spanning tree. In another jurisdiction, by the traveling salesman route. NP-completeness wasn’t a word in the vocabulary of lawmakers [laughter]. And the <a href="http://en.wikipedia.org/wiki/Steiner_tree">Steiner problem</a> came up because customers discovered they could beat the system by inventing offices in the middle of Tennessee that had nothing to do with their business, but they could put the office at a Steiner point and reduce their phone bill by adding to what the service that the Bell System had to give them. So all of these problems actually had some justification in billing besides the fun.</p>
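<p class=pp>[<i>Illustrative sketch, not part of the talk</i>: the billing trick described above can be reproduced with a small Python model. It assumes, purely for illustration, that the bill is proportional to the Euclidean length of the minimum spanning tree over a customer’s offices; the function name and coordinates are made up. The tree is computed with Prim’s algorithm.]</p>

```python
import heapq
import math

def minimum_spanning_tree_length(points):
    """Total edge length of a minimum spanning tree over points (Prim's algorithm).

    Every pair of points is treated as connectable at Euclidean distance.
    """
    if not points:
        return 0.0
    visited = {0}
    heap = [(math.dist(points[0], p), j) for j, p in enumerate(points) if j != 0]
    heapq.heapify(heap)
    total = 0.0
    while heap and len(visited) != len(points):
        distance, j = heapq.heappop(heap)
        if j in visited:
            continue  # stale entry: j was already reached by a shorter edge
        visited.add(j)
        total += distance
        for k, p in enumerate(points):
            if k not in visited:
                heapq.heappush(heap, (math.dist(points[j], p), k))
    return total

# Three real offices at the corners of an equilateral triangle with side 1.
offices = [(0.0, 0.0), (1.0, 0.0), (0.5, math.sqrt(3) / 2)]
# A fictitious extra office at the center of the triangle (a Steiner point).
steiner_office = (0.5, math.sqrt(3) / 6)

bill_before = minimum_spanning_tree_length(offices)
bill_after = minimum_spanning_tree_length(offices + [steiner_office])
```

<p class=pp>[In this model the three real offices cost a tree of length 2, while adding the fictitious center office drops the tree length to about 1.73, even though the network now serves one more location.]</p>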
<p class=pp><span style="font-size: 0.7em;">24:15</span>
Come the 60s, we actually started to hire people for computing per se. I was perhaps the third person who was hired with a Ph.D. to help take care of the computers and I’m told that the then director and head of the math department, Hendrik Bode, had said to his people, "yeah, you can hire this guy, instead of a real mathematician, but what’s he gonna be doing in five years?" [laughter]</p>
<p class=pp><span style="font-size: 0.7em;">25:02</span>
Nevertheless, we started hiring for real in about ’67. Computer science got split off from the math department. I had the good fortune to move into the office that I’ve been in ever since then. Computing began to make, get a personality of its own. One of the interesting people that came to Bell Labs for a while was Hao Wang. Is his name well known? [Pause] One nod. Hao Wang was a philosopher and logician, and we got a letter from him in England out of the blue saying "hey you know, can I come and use your computers? I have an idea about theorem proving." There was theorem proving in the air in the late 50s, and it was mostly pretty thin stuff. Obvious that the methods being proposed wouldn’t possibly do anything more difficult than solve tic-tac-toe problems by enumeration. Wang had a notion that he could mechanically prove theorems in the style of Whitehead and Russell’s great treatise Principia Mathematica in the early part of the century. He came here, learned how to program in machine language, and took all of Volume I of Principia Mathematica --
if you’ve ever hefted Principia, well that’s about all it’s good for, it’s a real good door stop. It’s really big. But it’s theorem after theorem after theorem in propositional calculus. Of course, there’s a decision procedure for propositional calculus, but he was proving them more in the style of Whitehead and Russell. And when he finally got them all coded and put them into the computer, he proved the entire contents of this immense book in eight minutes.
This was actually a neat accomplishment. Also that was the beginning of all the language theory. We hired people like <a href="http://www1.cs.columbia.edu/~aho/">Al Aho</a> and <a href="http://infolab.stanford.edu/~ullman/">Jeff Ullman</a>, who probed around every possible model of grammars, syntax, and all of the things that are now in the standard undergraduate curriculum, were pretty well nailed down here, on syntax and finite state machines and so on were pretty well nailed down in the 60s. Speaking of finite state machines, in the 50s, both Mealy and Moore, who have two of the well-known models of finite state machines, were here.</p>
<p class=pp><span style="font-size: 0.7em;">28:40</span>
During the 60s, we undertook an enormous development project in the guise of research, which was <a href="http://www.multicians.org/">MULTICS</a>, and it was the notion of MULTICS was computing was the public utility of the future. Machines were very expensive, and ?indeed? like you don’t own your own electric generator, you rely on the power company to do generation for you, and it was seen that this was a good way to do computing -- time sharing -- and it was also recognized that shared data was a very good thing. MIT pioneered this and Bell Labs joined in on the MULTICS project, and this occupied five years of system programming effort, until Bell Labs pulled out, because it turned out that MULTICS was too ambitious for the hardware at the time, and also with 80 people on it was not exactly a research project. But, that led to various people who were on the project, in particular <a href="http://en.wikipedia.org/wiki/Ken_Thompson">Ken Thompson</a> -- right there -- to think about how to -- <a href="http://en.wikipedia.org/wiki/Dennis_Ritchie">Dennis Ritchie</a> and Rudd Canaday were in on this too -- to think about how you might make a pleasant operating system with a little less resources.</p>
<p class=pp><span style="font-size: 0.7em;">30:30</span>
And Ken found -- this is a story that’s often been told, so I won’t go into very much of unix -- Ken found an old machine cast off in the corner, the <a href="http://en.wikipedia.org/wiki/PDP-7">PDP-7</a>, and put up this little operating system on it, and we had an immense <a href="http://en.wikipedia.org/wiki/GE-600_series">GE635</a> available at the comp center at the time, and I remember as the department head, muscling in to use this little computer to be, to get to be Unix’s first user, customer, because it was so much pleasanter to use this tiny machine than it was to use the big and capable machine in the comp center. And of course the rest of the story is known to everybody and has affected all college campuses in the country.</p>
<p class=pp><span style="font-size: 0.7em;">31:33</span>
Along with the operating system work, there was a fair amount of language work done at Bell Labs. Often curious off-beat languages. One of my favorites was called <a href="http://hopl.murdoch.edu.au/showlanguage.prx?exp=6937&language=BLODI-B">Blodi</a>, B L O D I, a block diagram compiler by Kelly and Vyssotsky. Perhaps the most interesting early uses of computers in the sense of being unexpected, were those that came from the acoustics research department, and what the Blodi compiler was invented in the acoustic research department for doing digital simulations of sample data system. DSPs are classic sample data systems,
where instead of passing analog signals around, you pass around streams of numerical values. And Blodi allowed you to say here’s a delay unit, here’s an amplifier, here’s an adder, the standard piece parts for a sample data system, and each one was described on a card, and with description of what it’s wired to. It was then compiled into one enormous single straight line loop for one time step. Of course, you had to rearrange the code because some one part of the sample data system would feed another and produce really very efficient 7090 code for simulating sample data systems.
By and large, from that time forth, the acoustic department stopped making hardware. It was much easier to do signal processing digitally than previous ways that had been analog. Blodi had an interesting property. It was the only programming language I know where -- this is not my original observation, Vyssotsky said -- where you could take the deck of cards, throw it up the stairs, and pick them up at the bottom of the stairs, feed them into the computer again, and get the same program out. Blodi had two, aside from syntax diagnostics, it did have one diagnostic when it would fail to compile, and that was "somewhere in your system is a loop that consists of all delays or has no delays" and you can imagine how they handled that.</p>
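<p class=pp>[<i>Illustrative sketch, not part of the talk</i>: the rearranging step described above, where each block must be computed after the blocks that feed it, is a topological sort. The Python below is a guess at the general idea, not a reconstruction of the Blodi compiler; the block kinds, the dictionary layout, and the function name are all assumptions. A delay unit is treated as emitting the value it stored on the previous time step, so its inputs do not constrain its position, and a feedback loop with no delay in it cannot be ordered at all.]</p>

```python
from graphlib import CycleError, TopologicalSorter

def schedule(blocks):
    """Return an evaluation order for one time step.

    blocks maps a block name to (kind, list of input block names).
    Each entry plays the role of one card: a block and what it is wired to.
    """
    sorter = TopologicalSorter()
    for name, (kind, inputs) in blocks.items():
        # A delay outputs last step's stored value, so it waits on nothing.
        dependencies = [] if kind == "delay" else inputs
        sorter.add(name, *dependencies)
    try:
        return list(sorter.static_order())
    except CycleError as err:
        raise ValueError("somewhere in your system is a loop with no delays") from err

# A feedback circuit: the adder sums the input with a delayed, amplified copy
# of its own output.
circuit = {
    "in": ("input", []),
    "add": ("adder", ["in", "dly"]),
    "amp": ("amplifier", ["add"]),
    "dly": ("delay", ["amp"]),
}
order = schedule(circuit)
shuffled_order = schedule(dict(reversed(list(circuit.items()))))
```

<p class=pp>[Since the order comes from the wiring rather than from the sequence of the cards, feeding the same cards in a different order, as in <code>shuffled_order</code>, still yields a valid schedule. That is consistent with the thrown-up-the-stairs property mentioned above.]</p>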
<p class=pp><span style="font-size: 0.7em;">35:09</span>
Another interesting programming language of the 60s was <a href="http://www.knowltonmosaics.com/">Ken Knowlton</a>’s <a href="http://beflix.com/beflix.php">Beflix</a>. This was for making movies on something with resolution kind of comparable to 640x480, really coarse, and the
programming notion in here was bugs. You put on your grid a bunch of bugs, and each bug carried along some data as baggage,
and then you would do things like cellular automata operations. You could program it or you could kind of let it go by itself. If a red bug is next to a blue bug then it turns into a green bug on the following step and so on. <span style="font-size: 0.7em;">36:28</span> He and Lillian Schwartz made some interesting abstract movies at the time. It also did some interesting picture processing. One wonderful picture of a reclining nude, something about the size of that blackboard over there, all made of pixels about a half inch high each with a different little picture in it, picked out for their density, and so if you looked at it close up it consisted of pickaxes and candles and dogs, and if you looked at it far enough away, it was a <a href="http://blog.the-eg.com/2007/12/03/ken-knowlton-mosaics/">reclining nude</a>. That picture got a lot of play all around the country.</p>
<p class=pp>Lorinda Cherry: That was with Leon, wasn’t it? That was with <a href="https://en.wikipedia.org/wiki/Leon_Harmon">Leon Harmon</a>.</p>
<p class=pp>Doug: Was that Harmon?</p>
<p class=pp>Lorinda: ?...?</p>
<p class=pp>Doug: Harmon was also an interesting character. He did more things than pictures. I’m glad you reminded me of him. I had him written down here. Harmon was a guy who among other things did a block diagram compiler for writing a handwriting recognition program. I never did understand how his scheme worked, and in fact I guess it didn’t work too well. [laughter]
It didn’t do any production ?things? but it was an absolutely
immense sample data circuit for doing handwriting recognition.
Harmon’s most famous work was trying to estimate the information content in a face. And every one of these pictures which are a cliche now, that show a face digitized very coarsely, go back to Harmon’s <a href="https://web.archive.org/web/20080807162812/http://www.doubletakeimages.com/history.htm">first psychological experiments</a>, when he tried to find out how many bits of picture he needed to try to make a face recognizable. He went around and digitized about 256 faces from Bell Labs and did real psychological experiments asking which faces could be distinguished from other ones. I had the good fortune to have one of the most distinguishable faces, and consequently you’ll find me in freshman psychology texts through no fault of my own.</p>
<p class=pp><span style="font-size: 0.7em;">39:15</span>
Another thing going on the 60s was the halting beginning here of interactive computing. And again the credit has to go to the acoustics research department, for good and sufficient reason. They wanted to be able to feed signals into the machine, and look at them, and get them back out. They bought yet another weird architecture machine called the <a href="http://www.piercefuller.com/library/pb250.html">Packard Bell 250</a>, where the memory elements were <a href="http://en.wikipedia.org/wiki/Delay_line_memory">mercury delay lines</a>.</p>
<p class=pp>Question: Packard Bell?</p>
<p class=pp>Doug: Packard Bell, same one that makes PCs today.</p>
<p class=pp><span style="font-size: 0.7em;">40:10</span>
They hung this off of the comp center 7090 and put in a scheme for quickly shipping jobs into the job stream on the 7090. The Packard Bell was the real-time terminal that you could play with and repair stuff, ?...? off the 7090, get it back, and then you could play it. From that grew some graphics machines also, built by ?...? et al. And it was one of the old graphics machines
in fact that Ken picked up to build Unix on.</p>
<p class=pp><span style="font-size: 0.7em;">40:55</span>
Another thing that went on in the acoustics department was synthetic speech and music. <a href="http://csounds.com/mathews/index.html">Max Mathews</a>, who was the director of the department has long been interested in computer music. In fact since retirement he spent a lot of time with Pierre Boulez in Paris at a wonderful institute with lots of money simply for making synthetic music. He had a language called Music 5. Synthetic speech or, well first of all simply speech processing was pioneered particularly by <a href="http://en.wikipedia.org/wiki/John_Larry_Kelly,_Jr">John Kelly</a>. I remember my first contact with speech processing. It was customary for computer operators, for the benefit of computer operators, to put a loudspeaker on the low bit of some register on the machine, and normally the operator would just hear kind of white noise. But if you got into a loop, suddenly the machine would scream, and this signal could be used to the operator "oh the machine’s in a loop. Go stop it and go on to the next job." I remember feeding them an Ackermann’s function routine once. [laughter] They were right. It was a silly loop. But anyway. One day, the operators were ?...?. The machine started singing. Out of the blue. “Help! I’m caught in a loop.” [laughter] And in a broad Texas accent, which was the recorded voice of John Kelly.</p>
<p class=pp><span style="font-size: 0.7em;">43:14</span>
However. From there Kelly went on to do some speech synthesis. Of course there’s been a lot more speech synthesis work done since, by <span style="font-size: 0.7em;">43:31</span> folks like Cecil Coker, Joe Olive. But they produced a record, which unfortunately I can’t play because records are not modern anymore. And everybody got one in the Bell Labs Record, which is a magazine, contained once a record from the acoustics department, with both speech and music and one very famous combination where the computer played and sang "A Bicycle Built For Two".</p>
<p class=pp>?...?</p>
<p class=pp><span style="font-size: 0.7em;">44:32</span>
At the same time as all this stuff is going on here, needless
to say computing is going on in the rest of the Labs. It was about early 1960 when the math department lost its monopoly on computing machines and other people started buying them too, but for switching. The first experiments with switching computers were operational in around 1960. They were planned for several years prior to that; essentially as soon as the transistor was invented, the making of electronic rather than electromechanical switching machines was anticipated. Part of the saga of the switching machines is cheap memory. These machines had enormous memories -- thousands of words. [laughter] And it was said that the present worth of each word of memory that programmers saved across the Bell System was something like eleven dollars, as I recall. And it was worthwhile to struggle to save some memory. Also, programs were permanent. You were going to load up the switching machine with switching program and that was going to run. You didn’t change it every minute or two. And it would be cheaper to put it in read only memory than in core memory. And there was a whole series of wild read-only memories, both tried and built.
The first experimental Essex System had a thing called the flying spot store
which was large photographic plates with bits on them and CRTs projecting on the plates and you would detect underneath on the photodetector whether the bit was set or not. That was the program store of Essex. The program store of the first ESS systems consisted of twistors, which I actually am not sure I understand to this day, but they consist of iron wire with a copper wire wrapped around them and vice versa. There were also experiments with an IC type memory called the waffle iron. Then there was a period when magnetic bubbles were all the rage. As far as I know, although microelectronics made a lot of memory, most of the memory work at Bell Labs has not had much effect on ?...?. Nice tries though.</p>
<p class=pp><span style="font-size: 0.7em;">48:28</span>
Another thing that folks began to work on was the application of (and of course, right from the start) computers to data processing. When you owned equipment scattered through every street in the country, and you have a hundred million customers, and you have bills for a hundred million transactions a day, there’s really some big data processing going on. And indeed in the early 60s, AT&T was thinking of making its own data processing computers solely for billing. Somehow they pulled out of that, and gave all the technology to IBM, and one piece of that technology went into use in high end equipment called tractor tapes. Inch wide magnetic tapes that would be used for a while.</p>
<p class=pp><span style="font-size: 0.7em;">49:50</span>
By and large, although Bell Labs has participated until fairly recently in data processing in quite a big way, AT&T never really quite trusted the Labs to do it right because here is where the money is. I can recall one occasion when during a strike of temporary employees, a fill-in employee like from the
Laboratories and so on, lost a day’s billing tape in Chicago. And that was a million dollars. And that’s generally speaking the money people did not until fairly recently trust Bell Labs to take good care of money, even though they trusted the Labs very well to make extremely reliable computing equipment for switches.
The downtime on switches is still spectacular by any industry standards. The design for the first ones was two hours down in 40 years, and the design was met. Great emphasis on reliability and redundancy, testing.</p>
<p class=pp><span style="font-size: 0.7em;">51:35</span>
Another branch of computing was for the government. The whole Whippany Laboratories [time check]
Whippany, where we took on contracts for the government particularly in the computing era in anti-missile defense, missile defense, and underwater sound. Missile defense was a very impressive undertaking. It was about in the early ’63 time frame when it was estimated the amount of computation to do a reasonable job of tracking incoming missiles would be 30 M floating point operations a second. In the day of the Cray that doesn’t sound like a great lot, but it’s more than your high end PCs can do. And the machines were supposed to be reliable. They designed the machines at Whippany, a twelve-processor multiprocessor, to no specs, enormously rugged, one watt transistors. This thing in real life performed remarkably well. There were sixty-five missile shots, tests across the Pacific Ocean ?...? and Lorinda Cherry here actually sat there waiting for them to come in. [laughter] And only a half dozen of them really failed. As a measure of the interest in reliability, one of them failed apparently due to processor error. Two people were assigned to look at the dumps, enormous amounts of telemetry and logging information were taken during these tests, which are truly expensive to run. Two people were assigned to look at the dumps. A year later they had not found the trouble. The team was beefed up. They finally decided that there was a race condition in one circuit. They then realized that this particular kind of race condition had not been tested for in all the simulations. They went back and simulated the entire hardware system to see if there’s a remote possibility of any similar cases, found twelve of them, and changed the hardware. But to spend over a year looking for a bug is a sign of what reliability meant.</p>
<p class=pp><span style="font-size: 0.7em;">54:56</span>
Since I’m coming up on the end of an hour, one could go on and on and on,</p>
<p class=pp>Crowd: go on, go on. [laughter]</p>
<p class=pp><span style="font-size: 0.7em;">55:10</span>
Doug: I think I’d like to end up by mentioning a few of the programs that have been written at Bell Labs that I think are most surprising. Of course there are lots of grand programs that have been written.</p>
<p class=pp>I already mentioned the block diagram compiler.</p>
<p class=pp>Another really remarkable piece of work was <a href="eqn.pdf">eqn</a>, the equation
typesetting language, which has been imitated since, by Lorinda Cherry and Brian Kernighan. The notion of taking an auditory syntax, the way people talk about equations, but only talk, this was not borrowed from any written notation before, getting the auditory one down on paper, that was very successful and surprising.</p>
<p class=pp>Another of my favorites, and again Lorinda Cherry was in this one, with Bob Morris, was typo. This was a program for finding spelling errors. It didn’t know the first thing about spelling. It would read a document, measure its statistics, and print out the words of the document in increasing order of what it thought the likelihood of that word having come from the same statistical source as the document. The words that did not come from the statistical source of the document were likely to be typos, and now I mean typos as distinct from spelling errors, where you actually hit the wrong key. Those tend to be off the wall, whereas phonetic spelling errors you’ll never find. And this worked remarkably well. Typing errors would come right up to the top of the list. A really really neat program.</p>
<p class=pp><span style="font-size: 0.7em;">57:50</span>
Another one of my favorites was by Brenda Baker called <a href="http://doi.acm.org/10.1145/800168.811545">struct</a>, which took Fortran programs and converted them into a structured programming language called Ratfor, which was Fortran with C syntax. This seemed like a possible undertaking, like something you do by the seat of the pants and you get something out. In fact, folks at Lockheed had done things like that before. But Brenda managed to find theorems that said there’s really only one form, there’s a canonical form into which you can structure a Fortran program, and she did this. It took your Fortran program, completely mashed it, put it out perhaps in almost certainly a different order than it was in Fortran connected by GOTOs, without any GOTOs, and the really remarkable thing was that authors of the program who clearly knew the way they wrote it in the first place, preferred it after it had been rearranged by Brenda. I was astonished at the outcome of that project.</p>
<p class=pp><span style="font-size: 0.7em;">59:19</span>
Another first that happened around here was by Fred Grampp, who got interested in computer security. One day he decided he would make a program for sniffing the security arrangements on a computer, as a service: Fred would never do anything crooked. [laughter] This particular program did a remarkable job, and founded a whole minor industry within the company. A department was set up to take this idea and parlay it, and indeed ever since there has been some improvement in the way computer centers are managed, at least until we got Berkeley Unix.</p>
<p class=pp><span style="font-size: 0.7em;">60:24</span>
And the last interesting program that I have time to mention is one by <a href="http://www.cs.jhu.edu/~kchurch/">Ken Church</a>. He was dealing with -- text processing has always been a continuing ?...? of the research, and in some sense it has an application to our business because we’re handling speech, but he got into consulting with the department in North Carolina that has to translate manuals. There are millions of pages of manuals in the Bell System and its successors, and ever since we’ve gone global, these things had to get translated into many languages.</p>
<p class=pp><span style="font-size: 0.7em;">61:28</span>
To help in this, he was making tools which would put up on the screen, graphed on the screen quickly a piece of text and its translation, because a translator, particularly a technical translator, wants to know, the last time we mentioned this word how was it translated. You don’t want to be creative in translating technical text. You’d like to be able to go back into the archives and pull up examples of translated text. And the neat thing here is the idea for how do you align texts in two languages. You’ve got the original, you’ve got the translated one, how do you bring up on the screen, the two sentences that go together? And the following scam worked beautifully. This is on western languages. <span style="font-size: 0.7em;">62:33</span>
Simply look for common four letter tetragrams, four letter combinations between the two and as best as you can, line them up as nearly linearly with the lengths of the two types as possible. And this <a href="church-tetragram.pdf">very simple idea</a> works like storm. Something for nothing. I like that.</p>
<p class=pp><span style="font-size: 0.7em;">63:10</span>
The last thing is one slogan that sort of got started with Unix and is just rife within the industry now. Software tools. We were making software tools in Unix before we knew we were, just like the Molière character was amazed at discovering he’d been speaking prose all his life. [laughter] But then <a href="http://www.amazon.com/-/dp/020103669X">Kernighan and Plauger</a> came along and christened what was going on, making simple generally useful and compositional programs to do one thing and do it well and to fit together. They called it software tools, made a book, wrote a book, and this notion now is abroad in the industry. And it really did begin all up in the little attic room where you [points?] sat for many years writing up here.</p>
<p class=pp> Oh I forgot to. I haven’t used any slides. I’ve brought some, but I don’t like looking at bullets and you wouldn’t either, and I forgot to show you the one exhibit I brought, which I borrowed from Bob Kurshan. When Bell Labs was founded, it had of course some calculating machines, and it had one wonderful computer. This. That was bought in 1918. There’s almost no other computing equipment from any time prior to ten years ago that still exists in Bell Labs. This is an <a href="http://infolab.stanford.edu/pub/voy/museum/pictures/display/2-5-Mechanical.html">integraph</a>. It has two styluses. You trace a curve on a piece of paper with one stylus and the other stylus draws the indefinite integral here. There was somebody in the math department who gave this service to the whole company, with about 24 hours turnaround time, calculating integrals. Our recent vice president Arno Penzias actually did, he calculated integrals differently, with a different background. He had a chemical balance, and he cut the curves out of the paper and weighed them. This was bought in 1918, so it’s eighty years old. It used to be shiny metal, it’s a little bit rusty now. But it still works.</p>
<p class=pp><span style="font-size: 0.7em;">66:30</span>
Well, that’s a once over lightly of a whole lot of things that have gone on at Bell Labs. It’s just such a fun place that, as I said, I just could go on and on. If you’re interested, there actually is a history written. This is only one of about six volumes, <a href="http://www.amazon.com/gp/product/0932764061">this</a> is the one that has the mathematical computer sciences, the kind of things that I’ve mostly talked about here. A few people have copies of them. For some reason, the AT&T publishing house thinks that because they’re history they’re obsolete, and they stopped printing them. [laughter]</p>
<p class=pp>Thank you, and that’s all.</p></p>
Using Uninitialized Memory for Fun and Profittag:research.swtch.com,2012:research.swtch.com/sparse2008-03-14T00:00:00-04:002008-03-14T00:00:00-04:00An unusual but very useful data structure
<p><p class=lp>
This is the story of a clever trick that's been around for
at least 35 years, in which array values can be left
uninitialized and then read during normal operations,
yet the code behaves correctly no matter what garbage
is sitting in the array.
Like the best programming tricks, this one is the right tool for the
job in certain situations.
The sleaziness of uninitialized data
access is offset by performance improvements:
some important operations change from linear
to constant time.
</p>
<p class=pp>
Alfred Aho, John Hopcroft, and Jeffrey Ullman's 1974 book
<i>The Design and Analysis of Computer Algorithms</i>
hints at the trick in an exercise (Chapter 2, exercise 2.12):
</p>
<blockquote>
Develop a technique to initialize an entry of a matrix to zero
the first time it is accessed, thereby eliminating the <i>O</i>(||<i>V</i>||<sup>2</sup>) time
to initialize an adjacency matrix.
</blockquote>
<p class=lp>
Jon Bentley's 1986 book <a href="http://www.cs.bell-labs.com/cm/cs/pearls/"><i>Programming Pearls</i></a> expands
on the exercise (Column 1, exercise 8; <a href="http://www.cs.bell-labs.com/cm/cs/pearls/sec016.html">exercise 9</a> in the Second Edition):
</p>
<blockquote>
One problem with trading more space for less time is that
initializing the space can itself take a great deal of time.
Show how to circumvent this problem by designing a technique
to initialize an entry of a vector to zero the first time it is
accessed. Your scheme should use constant time for initialization
and each vector access; you may use extra space proportional
to the size of the vector. Because this method reduces
initialization time by using even more space, it should be
considered only when space is cheap, time is dear, and
the vector is sparse.
</blockquote>
<p class=lp>
Aho, Hopcroft, and Ullman's exercise talks about a matrix and
Bentley's exercise talks about a vector, but for now let's consider
just a simple set of integers.
</p>
<p class=pp>
One popular representation of a set of <i>n</i> integers ranging
from 0 to <i>m</i> is a bit vector, with 1 bits at the
positions corresponding to the integers in the set.
Adding a new integer to the set, removing an integer
from the set, and checking whether a particular integer
is in the set are all very fast constant-time operations
(just a few bit operations each).
Unfortunately, two important operations are slow:
iterating over all the elements in the set
takes time <i>O</i>(<i>m</i>), as does clearing the set.
If the common case is that
<i>m</i> is much larger than <i>n</i>
(that is, the set is only sparsely
populated) and iterating or clearing the set
happens frequently, then it could be better to
use a representation that makes those operations
more efficient. That's where the trick comes in.
</p>
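<p class=pp>
For comparison with what follows, here is a minimal sketch of the bit vector representation in Go.
The type name <code>BitSet</code> and its method names are made up for this illustration.
Note that <code>Clear</code> and <code>Elems</code> both walk every word of the vector, which is where the <i>O</i>(<i>m</i>) cost comes from.
</p>

```go
package main

import (
	"fmt"
	"math/bits"
)

// BitSet is a hypothetical bit-vector set over the integers 0..m-1.
type BitSet struct {
	words []uint64
}

// NewBitSet allocates room for m bits, rounded up to whole words.
func NewBitSet(m int) *BitSet {
	return &BitSet{words: make([]uint64, (m+63)/64)}
}

// Add, Remove, and Has are each a few bit operations: O(1).
func (b *BitSet) Add(i uint)      { b.words[i/64] |= 1 << (i % 64) }
func (b *BitSet) Remove(i uint)   { b.words[i/64] &^= 1 << (i % 64) }
func (b *BitSet) Has(i uint) bool { return b.words[i/64]&(1<<(i%64)) != 0 }

// Clear has to touch every word: O(m).
func (b *BitSet) Clear() {
	for k := range b.words {
		b.words[k] = 0
	}
}

// Elems has to scan every word, even the empty ones: O(m).
// Elements come out in increasing integer order.
func (b *BitSet) Elems() []uint {
	var out []uint
	for k, w := range b.words {
		for w != 0 {
			out = append(out, uint(k)*64+uint(bits.TrailingZeros64(w)))
			w &= w - 1 // clear the lowest set bit
		}
	}
	return out
}

func main() {
	b := NewBitSet(1000)
	b.Add(5)
	b.Add(1)
	b.Add(4)
	fmt.Println(b.Has(4), b.Has(7), b.Elems())
	b.Clear()
	fmt.Println(b.Elems())
}
```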
<p class=pp>
Preston Briggs and Linda Torczon's 1993 paper,
“<a href="http://citeseer.ist.psu.edu/briggs93efficient.html"><b>An Efficient Representation for Sparse Sets</b></a>,”
describes the trick in detail.
Their solution represents the sparse set using an integer
array named <code>dense</code> and an integer <code>n</code>
that counts the number of elements in <code>dense</code>.
The <code>dense</code> array is simply a packed list of the elements in the
set, stored in order of insertion.
If the set contains the elements 5, 1, and 4, then <code>n = 3</code> and
<code>dense[0] = 5</code>, <code>dense[1] = 1</code>, <code>dense[2] = 4</code>:
</p>
<center>
<img src="http://research.swtch.com/sparse0.png" />
</center>
<p class=pp>
Together <code>n</code> and <code>dense</code> are
enough information to reconstruct the set, but this representation
is not very fast.
To make it fast, Briggs and Torczon
add a second array named <code>sparse</code>
which maps integers to their indices in <code>dense</code>.
Continuing the example,
<code>sparse[5] = 0</code>, <code>sparse[1] = 1</code>,
<code>sparse[4] = 2</code>.
Essentially, the set is a pair of arrays that point at
each other:
</p>
<center>
<img src="http://research.swtch.com/sparse0b.png" />
</center>
<p class=pp>
Adding a member to the set requires updating both of these arrays:
</p>
<pre class=indent>
add-member(i):
    dense[n] = i
    sparse[i] = n
    n++
</pre>
<p class=lp>
It's not as efficient as flipping a bit in a bit vector, but it's
still very fast and constant time.
</p>
<p class=pp>
To check whether <code>i</code> is in the set, you verify that
the two arrays point at each other for that element:
</p>
<pre class=indent>
is-member(i):
    return sparse[i] < n && dense[sparse[i]] == i
</pre>
<p class=lp>
If <code>i</code> is not in the set, then <i>it doesn't matter what <code>sparse[i]</code> is set to</i>:
either <code>sparse[i]</code>
will be at least <code>n</code> or it will point at a value in
<code>dense</code> that doesn't point back at it.
Either way, we're not fooled. For example, suppose <code>sparse</code>
actually looks like:
</p>
<center>
<img src="http://research.swtch.com/sparse1.png" />
</center>
<p class=lp>
<code>Is-member</code> knows to ignore
members of <code>sparse</code> that point at index <code>n</code> or beyond, or that
point at cells in <code>dense</code> that don't point back
(the grayed out entries):
</p>
<center>
<img src="http://research.swtch.com/sparse2.png" />
</center>
<p class=pp>
Notice what just happened:
<code>sparse</code> can have <i>any arbitrary values</i> in
the positions for integers not in the set,
those values actually get used during membership
tests, and yet the membership test behaves correctly!
(This would drive <a href="http://valgrind.org/">valgrind</a> nuts.)
</p>
<p class=pp>
Clearing the set can be done in constant time:
</p>
<pre class=indent>
clear-set():
    n = 0
</pre>
<p class=lp>
Zeroing <code>n</code> effectively clears
<code>dense</code> (the code only ever accesses
entries in <code>dense</code> with indices less than <code>n</code>), and
<code>sparse</code> can be uninitialized, so there's no
need to clear out the old values.
</p>
<p class=pp>
This sparse set representation has one more trick up its sleeve:
the <code>dense</code> array allows an
efficient implementation of set iteration.
</p>
<pre class=indent>
iterate():
    for(i = 0; i < n; i++)
        yield dense[i]
</pre>
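<p class=pp>
The four operations can be put together in a short Go sketch.
This is an illustration under stated assumptions, not Briggs and Torczon's code:
the names <code>SparseSet</code>, <code>NewSparseSet</code>, and the <code>garbage</code> callback are invented here.
Go's <code>make</code> always zeroes memory, so the constructor deliberately fills both arrays with caller-supplied junk to simulate uninitialized storage.
The sketch also checks membership inside <code>Add</code>, which the pseudocode above leaves to the caller, so that adding a value twice does not put a duplicate in <code>dense</code>.
</p>

```go
package main

import "fmt"

// SparseSet is a hypothetical Go rendering of the dense/sparse pair.
// The unsigned element type matters: a garbage sparse[i] can never be
// negative, so the single comparison against n keeps dense in bounds.
type SparseSet struct {
	dense  []uint32 // members, packed in insertion order
	sparse []uint32 // sparse[i] = index of i in dense, or garbage
	n      uint32   // number of members
}

// NewSparseSet makes a set over 0..m-1. Both arrays are filled with
// garbage(i) purely to simulate uninitialized memory in Go.
func NewSparseSet(m int, garbage func(i int) uint32) *SparseSet {
	s := &SparseSet{dense: make([]uint32, m), sparse: make([]uint32, m)}
	for i := 0; i < m; i++ {
		s.dense[i] = garbage(i)
		s.sparse[i] = garbage(i)
	}
	return s
}

// Has reports whether the two arrays point at each other for i.
func (s *SparseSet) Has(i uint32) bool {
	j := s.sparse[i]
	return j < s.n && s.dense[j] == i
}

// Add inserts i, skipping values that are already members.
func (s *SparseSet) Add(i uint32) {
	if s.Has(i) {
		return
	}
	s.dense[s.n] = i
	s.sparse[i] = s.n
	s.n++
}

// Clear empties the set in constant time; old array contents stay put.
func (s *SparseSet) Clear() { s.n = 0 }

// Elems returns the members in insertion order without copying.
func (s *SparseSet) Elems() []uint32 { return s.dense[:s.n] }

func main() {
	s := NewSparseSet(10, func(i int) uint32 { return uint32(i*7919 + 3) })
	for _, v := range []uint32{5, 1, 4} {
		s.Add(v)
	}
	fmt.Println(s.Has(4), s.Has(7), s.Elems())
	s.Clear()
	fmt.Println(s.Has(4), s.Elems())
}
```

<p class=lp>
Any <code>garbage</code> function should give the same answers, including one that returns small values pointing into the live part of <code>dense</code>.
</p>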
<p class=pp>
Let's compare the run times of a bit vector
implementation against the sparse set:
</p>
<center>
<table>
<tr>
<td><i>Operation</i>
<td align=center width=10>
<td align=center><i>Bit Vector</i>
<td align=center width=10>
<td align=center><i>Sparse set</i>
</tr>
<tr>
<td>is-member
<td>
<td align=center><i>O</i>(1)
<td>
<td align=center><i>O</i>(1)
</tr>
<tr>
<td>add-member
<td>
<td align=center><i>O</i>(1)
<td>
<td align=center><i>O</i>(1)
</tr>
<tr>
<td>clear-set
<td><td align=center><i>O</i>(<i>m</i>)
<td><td align=center><i>O</i>(1)
</tr>
<tr>
<td>iterate
<td><td align=center><i>O</i>(<i>m</i>)
<td><td align=center><i>O</i>(<i>n</i>)
</tr>
</table>
</center>
<p class=lp>
Asymptotically, the sparse set is as fast as or faster than bit vectors for
every operation. The only problem is the space cost:
two words replace each bit.
Still, there are times when the speed differences are enough
to balance the added memory cost.
Briggs and Torczon point out that liveness sets used
during register allocation inside a compiler are usually
small and are cleared very frequently, making sparse sets the
representation of choice.
</p>
<p class=pp>
Another situation where sparse sets are the better choice
is work queue-based graph traversal algorithms.
Iteration over sparse sets visits elements
in the order they were inserted (above, 5, 1, 4),
so that new entries inserted during the iteration
will be visited later in the same iteration.
In contrast, iteration over bit vectors visits elements in
integer order (1, 4, 5), so that new elements inserted
during traversal might be missed, requiring repeated
iterations.
</p>
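<p class=pp>
As a sketch of the work queue idea, the Go function below computes the nodes reachable from a starting node.
The function name <code>reachable</code> and the adjacency-list input are assumptions made for this example.
The set doubles as the queue: the loop bound <code>n</code> grows as new nodes are added, so nodes discovered during the pass are visited later in that same pass.
(Go zeroes the arrays here, so this shows only the iteration order, not the uninitialized-memory trick.)
</p>

```go
package main

import "fmt"

// reachable returns the nodes reachable from start, in the order they
// are first discovered. adj[v] lists the successors of node v.
func reachable(adj [][]uint32, start uint32) []uint32 {
	dense := make([]uint32, len(adj))
	sparse := make([]uint32, len(adj))
	n := uint32(0)

	// add queues v unless it is already in the set.
	add := func(v uint32) {
		if j := sparse[v]; j < n && dense[j] == v {
			return
		}
		dense[n], sparse[v] = v, n
		n++
	}

	add(start)
	// n grows as add is called, so the loop picks up
	// newly queued work in the same pass.
	for i := uint32(0); i < n; i++ {
		for _, w := range adj[dense[i]] {
			add(w)
		}
	}
	return dense[:n]
}

func main() {
	adj := [][]uint32{
		0: {3},
		1: {0, 4},
		2: {},
		3: {1},
		4: {4},
	}
	fmt.Println(reachable(adj, 3))
}
```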
<p class=pp>
Returning to the original exercises, it is trivial to change
the set into a vector (or matrix) by making <code>dense</code>
an array of index-value pairs instead of just indices.
Alternatively, one might add the value to the <code>sparse</code>
array or to a new array.
The relative space overhead isn't as bad if you would have been
storing values anyway.
</p>
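<p class=pp>
Here is one way the index-value variant might look in Go, as an answer to Bentley's exercise.
The names <code>LazyVector</code> and <code>entry</code> are invented for this sketch.
Every element reads as zero until it is first set, and <code>Reset</code> returns the whole vector to zero in constant time.
In Go the allocation itself is still zeroed by <code>make</code>, so the constant-time initialization is only simulated: <code>main</code> scribbles junk into <code>sparse</code> to show that the contents don't matter.
</p>

```go
package main

import "fmt"

// entry pairs a vector index with the value stored there.
type entry struct {
	index uint32
	value float64
}

// LazyVector is a hypothetical vector of m floats whose elements
// read as zero until they are first set.
type LazyVector struct {
	dense  []entry  // index-value pairs, in order of first assignment
	sparse []uint32 // sparse[i] = position of i in dense, or garbage
	n      uint32   // number of elements that have been set
}

func NewLazyVector(m int) *LazyVector {
	return &LazyVector{dense: make([]entry, m), sparse: make([]uint32, m)}
}

// slot returns the position of i in dense and whether i has been set.
func (v *LazyVector) slot(i uint32) (uint32, bool) {
	j := v.sparse[i]
	return j, j < v.n && v.dense[j].index == i
}

// Get returns the stored value, or zero if i was never set.
func (v *LazyVector) Get(i uint32) float64 {
	if j, ok := v.slot(i); ok {
		return v.dense[j].value
	}
	return 0
}

// Set stores x at index i, claiming a new dense slot on first use.
func (v *LazyVector) Set(i uint32, x float64) {
	j, ok := v.slot(i)
	if !ok {
		j = v.n
		v.sparse[i] = j
		v.n++
	}
	v.dense[j] = entry{i, x}
}

// Reset makes every element read as zero again, in constant time.
func (v *LazyVector) Reset() { v.n = 0 }

func main() {
	v := NewLazyVector(8)
	// Simulate uninitialized memory.
	for i := range v.sparse {
		v.sparse[i] = uint32(i % 2)
	}
	v.Set(3, 2.5)
	v.Set(1, 7)
	fmt.Println(v.Get(3), v.Get(1), v.Get(0), v.Get(5))
	v.Reset()
	fmt.Println(v.Get(3))
}
```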
<p class=pp>
Briggs and Torczon's paper implements additional set
operations and examines performance speedups from
using sparse sets inside a real compiler.
</p></p>
Play Tic-Tac-Toe with Knuthtag:research.swtch.com,2012:research.swtch.com/tictactoe2008-01-25T00:00:00-05:002008-01-25T00:00:00-05:00The only winning move is not to play.
<p><p class=lp>Section 7.1.2 of the <b><a href="http://www-cs-faculty.stanford.edu/~knuth/taocp.html#vol4">Volume 4 pre-fascicle 0A</a></b> of Donald Knuth's <i>The Art of Computer Programming</i> is titled “Boolean Evaluation.” In it, Knuth considers the construction of a set of nine boolean functions telling the correct next move in an optimal game of tic-tac-toe. In a footnote, Knuth tells this story:</p>
<blockquote><p class=lp>This setup is based on an exhibit from the early 1950s at the Museum of Science and Industry in Chicago, where the author was first introduced to the magic of switching circuits. The machine in Chicago, designed by researchers at Bell Telephone Laboratories, allowed me to go first; yet I soon discovered there was no way to defeat it. Therefore I decided to move as stupidly as possible, hoping that the designers had not anticipated such bizarre behavior. In fact I allowed the machine to reach a position where it had two winning moves; and it seized <i>both</i> of them! Moving twice is of course a flagrant violation of the rules, so I had won a moral victory even though the machine had announced that I had lost.</p></blockquote>
<p class=lp>
That story alone is fairly amusing. But turning the page, the reader finds a quotation from Charles Babbage's <i><a href="http://onlinebooks.library.upenn.edu/webbin/book/lookupid?key=olbp36384">Passages from the Life of a Philosopher</a></i>, published in 1864:</p>
<blockquote><p class=lp>I commenced an examination of a game called “tit-tat-to” ... to ascertain what number of combinations were required for all the possible variety of moves and situations. I found this to be comparatively insignificant. ... A difficulty, however, arose of a novel kind. When the automaton had to move, it might occur that there were two different moves, each equally conducive to his winning the game. ... Unless, also, some provision were made, the machine would attempt two contradictory motions.</p></blockquote>
<p class=lp>
The only real winning move is not to play.</p></p>
Crabs, the bitmap terror!tag:research.swtch.com,2012:research.swtch.com/crabs2008-01-09T00:00:00-05:002008-01-09T00:00:00-05:00A destructive, pointless violation of the rules
<p><p class=lp>Today, window systems seem as inevitable as hierarchical file systems, a fundamental building block of computer systems. But it wasn't always that way. This paper could only have been written in the beginning, when everything about user interfaces was up for grabs.</p>
<blockquote><p class=lp>A bitmap screen is a graphic universe where windows, cursors and icons live in harmony, cooperating with each other to achieve functionality and esthetics. A lot of effort goes into making this universe consistent, the basic law being that every window is a self contained, protected world. In particular, (1) a window shall not be affected by the internal activities of another window. (2) A window shall not be affected by activities of the window system not concerning it directly, i.e. (2.1) it shall not notice being obscured (partially or totally) by other windows or obscuring (partially or totally) other windows, (2.2) it shall not see the <i>image</i> of the cursor sliding on its surface (it can only ask for its position).</p>
<p class=pp>
Of course it is difficult to resist the temptation to break these rules. Violations can be destructive or non-destructive, useful or pointless. Useful non-destructive violations include programs printing out an image of the screen, or magnifying part of the screen in a <i>lens</i> window. Useful destructive violations are represented by the <i>pen</i> program, which allows one to scribble on the screen. Pointless non-destructive violations include a magnet program, where a moving picture of a magnet attracts the cursor, so that one has to continuously pull away from it to keep working. The first pointless, destructive program we wrote was <i>crabs</i>.</p>
</blockquote>
<p class=lp>As the crabs walk over the screen, they leave gray behind, “erasing” the apps underfoot:</p>
<blockquote><img src="http://research.swtch.com/crabs1.png">
</blockquote>
<p class=lp>
For the rest of the story, see Luca Cardelli's “<a style="font-weight: bold;" href="http://lucacardelli.name/Papers/Crabs.pdf">Crabs: the bitmap terror!</a>” (6.7MB). Additional details in “<a href="http://lucacardelli.name/Papers/Crabs%20%28History%20and%20Screen%20Dumps%29.pdf">Crabs (History and Screen Dumps)</a>” (57.1MB).</p></p>