research!rsc

Thoughts and links about programming, by

RSS

Defining Go Modules
(Go & Versioning, Part 6)
Posted on Thursday, February 22, 2018. PDF

As introduced in the overview post, a Go module is a collection of packages versioned as a unit, along with a go.mod file listing other required modules. The move to modules is an opportunity for us to revisit and fix many details of how the go command manages source code. The current go get model will be about ten years old when we retire it in favor of modules. We need to make sure that the module design will serve us well for the next decade. In particular:

This post presents the parts of the vgo design that address these issues. Everything here is preliminary: we will change the design if we find that it is not right.

Versioned Releases

Abstraction boundaries let projects scale. Originally, all Go packages could be imported by all other Go packages. We introduced the internal directory convention in Go 1.4 to eliminate the problem that developers who chose to structure a program as multiple packages had to worry about other users importing and depending on details of helper packages never meant for public use.

The Go community has a similar visibility problem now with repository commits. Today, it's very common for users to identify package versions by commit identifiers (usually Git hashes), with the result that developers who structure work as a sequence of commits need to worry, at least in the back of their mind, about users pinning to any of those commits, which again were never meant for public use. We need to change the expectations in the Go open source community, to establish a norm that authors tag releases and users prefer those.

I don't think this point, that users should be choosing from versions issued by authors instead of picking out individual commits from the Git history, is particularly controversial. The difficult part is shifting the norm. We need to make it easy for authors to tag commits and easy for users to use those tags.

The most common way authors share code today is on code hosting sites, especially GitHub. For code on GitHub, all authors will need to do is tag a commit and push the tag. We also plan to provide a tool, maybe called go release, to compare different versions of a module for API compatibility at the type level, to catch inadvertent breaking changes that are visible in the type system, and also to help authors decide between issuing should be a minor release (because it adds new API or changes many lines of code) or only a patch release.

For users, vgo itself operates entirely in terms of tagged versions. However, we know that at least during the transition from old practices to new, and perhaps indefinitely as a way to bootstrap new projects, an escape hatch will be necessary, to allow specifying a commit. This is possible in vgo, but it has been designed so as to make users prefer explicitly tagged versions.

Specifically, vgo understands the special pseudo-version v0.0.0-yyyymmddhhmmss-commit as referring to the given commit identifier, which is typically a shortened Git hash and which must have a commit time matching the (UTC) timestamp. This form is a valid semantic version string for a prerelease of v0.0.0. For example, this pair of Gopkg.toml stanzas:

[[projects]]
  name = "google.golang.org/appengine"
  packages = [
    "internal",
    "internal/base",
    "internal/datastore",
    "internal/log",
    "internal/remote_api",
    "internal/urlfetch",
    "urlfetch"
  ]
  revision = "150dc57a1b433e64154302bdc40b6bb8aefa313a"
  version = "v1.0.0"

[[projects]]
  branch = "master"
  name = "github.com/google/go-github"
  packages = ["github"]
  revision = "922ceac0585d40f97d283d921f872fc50480e06e"

correspond to these go.mod lines:

require (
    "google.golang.org/appengine" v1.0.0
    "github.com/google/go-github" v0.0.0-20180116225909-922ceac0585d
)

The pseudo-version form is chosen so that the standard semver precedence rules compare two pseudo-versions by commit time, because the timestamp encoding makes string comparison match time comparison. The form also ensures that vgo will always prefer a tagged semantic version over an untagged pseudo-version, beacuse even if v0.0.1 is very old, it has a greater semver precedence than any v0.0.0 prerelease. (Note also that this matches the choice made by dep when adding a new dependency to a project.) And of course pseudo-version strings are unwieldy: they stand out in go.mod files and vgo list -m output. All these inconveniences help encourage authors and users to prefer explicitly tagged versions, a bit like the extra step of having to write import "unsafe" encourages developers to prefer writing safe code.

The go.mod File

A module version is defined by a tree of source files. The go.mod file describes the module and also indicates the root directory. When vgo is run in a directory, it looks in the current directory and then successive parents to find the go.mod marking the root.

The file format is line-oriented, with // comments only. Each line holds a single directive, which is a single verb (module, require, exclude, or replace, as defined by minimum version selection), followed by arguments:

module "my/thing"
require "other/thing" v1.0.2
require "new/thing" v2.3.4
exclude "old/thing" v1.2.3
replace "bad/thing" v1.4.5 => "good/thing" v1.4.5

The leading verb can be factored out of adjacent lines, leading to a block, like in Go imports:

require (
    "new/thing" v2.3.4
    "old/thing" v1.2.3
)

My goals for the file format were that it be (1) clear and simple, (2) easy for people to read, edit, manipulate, and diff, (3) easy for programs like vgo to read, modify, and write back, preserving comments and general structure, and (4) have room for limited future growth. I looked at JSON, TOML, XML, and YAML but none of them seemed to have those four properties all at once. For example, the approach used in Gopkg.toml above leads to three lines for each requirement, making them harder to skim, sort, and diff. Instead I designed a minimal format reminiscent of the top of a Go program, but hopefully not close enough to be confusing. I adapted an existing comment-friendly parser.

The eventual go command integration may change the file format, perhaps even adopting a more standard framing, but for compatibility we will keep the ability to read today's go.mod files, just as vgo can also read requirement information from GLOCKFILE, Godeps/Godeps.json, Gopkg.lock, dependencies.tsv, glide.lock, vendor.conf, vendor.yml, vendor/manifest, and vendor/vendor.json files.

From Repository to Modules

Developers work in version control systems, and clearly vgo must make that as easy as possible. It is not reasonable to expect developers to prepare module archives themselves, for example. Instead, vgo makes it easy to export modules directly from any version control repository following some basic, unobtrusive conventions.

To start, it suffices to create a repository and tag a commit, using a semver-formatted tag like v0.1.0. The leading v is required, and having three numbers is also required. Although vgo itself accepts shorthands like v0.1 on the command line, the canonical form v0.1.0 must be used in repository tags, to avoid ambiguity. Only the tag is required. In order to use commits made without use of vgo, a go.mod file is not strictly required at this point. Creating new tagged commits creates new module versions. Easy.

When developers reach v2, semantic import versioning means that a /v2/ is added to the import path at the end of the module root prefix: my/thing/v2/sub/pkg. There are good reasons for this convention, as described in the earlier post, but it is still a departure from existing tools. Realizing this, vgo will not use any v2 or later tag in a source code repository without first checking that it has a go.mod with a module path declaration ending in that major version (for example, module "my/thing/v2"). Vgo uses that declaration as evidence that the author is using semantic import versioning to name packages within that module. This is especially important for multi-package modules, since the import paths within the module must contain the /v2/ element to avoid referring back to the v1 module.

We expect that most developers will prefer to follow the usual “major branch” convention, in which different major versions live in different branches. In this case, the root directory in a v2 branch would have a go.mod indicating v2, like this:

This is roughly how most developers already work. In the picture, the v1.0.0 tag points to a commit that predates vgo. It has no go.mod file at all, and that works fine. In the commit tagged v1.0.1, the author has added a go.mod file that says module "my/thing". After that commit, however, the author forks a new v2 development branch. In addition to whatever code changes prompted v2 (including the replacement of bar with quux), the go.mod in that new branch is updated to say module "my/thing/v2". The branches can then proceed independently. In truth, vgo really has no idea about branches. It just resolves the tag to a commit and then looks at the go.mod file in the commit. Again, the go.mod file is required for v2 and later so that vgo can use the module line as a sign that the code has been written with semantic import versioning in mind, so the imports in foo say my/thing/v2/foo/quux, not my/thing/foo/quux.

As an alternative, vgo also supports a “major subdirectory” convention, in which major versions above v1 are developed in subdirectories:

In this case, v2.0.0 is created not by forking the whole tree into a separate branch but by copying it into a subdirectory. Again the go.mod must be updated to say "my/thing/v2". Afterward, v1.x.x tags pointing at commits address the files in the root directory, excluding v2/, while v2.x.x tags pointing at commits address the files in the v2/ subdirectory only. The go.mod file lets vgo distinguishes the two cases. It would also be meaningful to have a v1.x.x and a v2.x.x tag pointing at the same commit: they would address different subtrees of the commit.

We expect that developers may feel strongly about choosing one convention or the other. Instead of taking sides, vgo supports both. Note that for major versions above v2, the major subdirectory approach may provide a more graceful transition for users of go get. On the other hand, users of dep or vendoring tools should be able to consume repositories using either convention. Certainly we will make sure dep can.

Multiple-Module Repositories

Developers may also find it useful to maintain a collection of modules in a single source code repository. We want vgo to support this possibility. In general, there is already wide variation in how different developers, teams, projects, and companies apply source control, and we do not believe it is productive to impose a single mapping like “one repository equals one module” onto all developers. Having some flexibility here should also help vgo adapt as best practices around souce control continue to change.

In the major subdirectory convention, v2/ contains the module "my/thing/v2". A natural extension is to allow subdirectories not named for major versions. For example, we could add a blue/ subdirectory that contains the module "my/thing/blue", confirmed by a blue/go.mod file with that module path. In this case, the source control commit tags addressing that module would take the form blue/v1.x.x. Similarly, the tag blue/v2.x.x would address the blue/v2/ subdirectory. The existence of the blue/go.mod file excludes the blue/ tree from the outer my/thing module.

In the Go project, we intend to explore using this convention to allow repositories like golang.org/x/text to define multiple, independent modules. This lets us retain the convenience of coarse-grained source control but still promote different subtrees to v1 at different times.

Deprecated Versions

Authors also need to be able to deprecate a version, to indicate that it should not be used anymore. This is not yet implemented in the vgo prototype, but one way it could work would be to define that on code hosting sites, the existence of a tag v1.0.0+deprecated (ideally pointing at the same commit as v1.0.0) would indicate that the commit is deprecated. It is of course important not to remove the tag entirely, because that will break builds. Deprecated modules would be highlighted in some way in vgo list -m -u output (“show me my modules and information about updates”), so that users would know to update.

Also, because programs will have access to their own module lists and versions at runtime, a program could also be configured to check its own module versions against some chosen authority and self-report in some way when it is running deprecated versions. Again, the details here are not worked out, but it's a good example of something that's possible once developers and tools share a vocabulary for describing versions.

Publishing

Given a source control repository, developers need to be able to publish it in a form that vgo can consume. In the general case, we will provide a command that authors run to turn their source control repositories into file trees that can be served to vgo by any web server capable of serving static files. Similar to current go get, vgo expects a page with a <meta> tag to help translate from a module name to the tree of files for that module. For example, to look up swtch.com/testmod, the vgo command fetches the usual page:

$ curl -sSL 'https://swtch.com/testmod?go-get=1'
<!DOCTYPE html>
<meta name="go-import" content="swtch.com/testmod mod https://storage.googleapis.com/gomodules/rsc">
Nothing to see here.
$ 

The mod server type indicates that modules are served in a file tree at that base URL. The relevant files at storage.googleapis.com/gomodules/rsc in this simple case are:

The exact meaning of these URLs is discussed in the “Download Protocol” section later in the post.

Code Hosting Sites

A huge amount of development happens on code hosting sites, and we want that work to integrate into vgo as smoothly as possible. Instead of expecting developers to publish modules elsewhere, vgo will have support for reading the information it needs from those sites directly, using their HTTP-based APIs. In general, archive downloads can be significantly faster than the existing version control checkouts. For example, working on a laptop with a gigabit internet connection, it takes 10 seconds to download the CockroachDB source tree as a zip file from GitHub but almost four minutes to git clone it. Sites need only provide an archive of any form that can be fetched with a simple HTTP GET. Gerrit servers, for example, only support downloading gzipped tar files. Vgo translates downloaded archives into the standard form.

The initial prototype only includes support for GitHub and the Go project's Gerrit server, but we will add support for Bitbucket and other major hosting sites too, before shipping anything in the main Go toolchain.

With the combination of the lightweight repository conventions, which mostly match what developers are already doing, and the support for known code hosting sites, we expect that most open source activity will be unaffected by the move to modules, other than simply adding a go.mod to each repository.

Companies taking advantage of old go get's direct use of git and other source control tools will need to adjust. Perhaps it would make sense to write a proxy that serves the vgo expectations but using version control tools. Companies could then run one of those to produce an experience much like using the open source hosting sites.

Module Archives

The mapping from repositories to modules is a bit complex, because the way developers use source control varies. The end goal is to map all that complexity down into a common, single format for Go modules for use by proxies or other code consumers (for example, godoc.org or any code checking tools).

The standard format in the vgo prototype is zip archives in which all paths begin with the module path and version. For example, after running vgo get of rsc.io/quote v1.5.2, you can find the zip file in vgo's download cache:

$ unzip -l $GOPATH/src/v/cache/rsc.io/quote/@v/v1.5.2.zip
     1479  00-00-1980 00:00   rsc.io/quote@v1.5.2/LICENSE
      131  00-00-1980 00:00   rsc.io/quote@v1.5.2/README.md
      240  00-00-1980 00:00   rsc.io/quote@v1.5.2/buggy/buggy_test.go
       55  00-00-1980 00:00   rsc.io/quote@v1.5.2/go.mod
      793  00-00-1980 00:00   rsc.io/quote@v1.5.2/quote.go
      917  00-00-1980 00:00   rsc.io/quote@v1.5.2/quote_test.go
$ 

I used zip because it is well-specified, widely supported, and cleanly extensible if needed, and it allows random access to individual files. (In contrast, tar files, the other obvious choice, are none of these things and don't.)

Download Protocol

To download information about modules, as well as the modules themselves, the vgo prototype issues only simple HTTP GET requests. A key design goal was to make it possible to serve modules from static hosting sites, so the requests have no URL query parameters.

As we saw earlier, custom domains can specify that a module is hosted at a particular base URL. As implemented in vgo today (but, like all of vgo, subject to change), that module-hosting server must serve four request forms:

The JSON information served in the version.info form will likely evolve, but today it corresponds to this struct:

type RevInfo struct {
    Version string    // version string
    Name    string    // complete ID in underlying repository
    Short   string    // shortened ID, for use in pseudo-version
    Time    time.Time // commit time
}

The vgo list -m -u command shows the commit time of each available update by using the Time field.

A general module-hosting server may optionally respond to version.info requests for non-semver versions as well. A vgo command like

vgo get my/thing/v2@1459def

will fetch 1459def.info and then derive a pseudo-version using the Time and Short fields.

There are two more optional request forms:

These support the use of untagged commits in vgo. If vgo is adding a module and finds no tagged commits at all, it uses the first form to find the latest commit as of now. It does the same when looking for available updates, assuming there are still no tagged commits. The branch-limited form is used for the internal simulation of gopkg.in. These forms also support the command line syntaxes:

vgo get my/thing/v2@2018-02-01T15:34:45
vgo get my/thing/v2@2018-02-01T15:34:45@branch

These might be a mistake, but they're in the prototype today, so I'm mentioning them.

Proxy Servers

Both individuals and companies may prefer to download Go modules from proxy servers, whether for efficiency, availability, security, license compliance, or any other reason. Having a standard Go module format and a standard download protocol, as described in the last two sections, makes it trivial to introduce support for proxies. If the $GOPROXY environment variable is set, vgo fetches all modules from the server at the given base URL, not from their usual locations. For easy debugging, $GOPROXY can even be a file:/// URL pointing at a local tree.

We intend to write a basic proxy server that serves from vgo's own local cache, downloading new modules as needed. Sharing such a proxy among a set of computers would help reduce redundant downloads from the proxy’s users but more importantly would ensure future availability, even if the original copies disappear. The proxy will also have an option not to allow downloads of new modules. In this mode, the proxy would limit the available modules to exactly those whitelisted by the proxy administrator. Both proxy modes are frequently requested features in corporate environments.

Perhaps some day it would make sense to establish a distributed collection of proxy servers used by default in go get, to ensure module availability and fast downloads for Go developers worldwide. But not yet. Today, we are focused on making sure that go get works as well as it can without assuming any kind of central proxy servers.

The End of Vendoring

Vendor directories serve two purposes. First, they specify by their contents the exact version of the dependencies to use during go build. Second, they ensure the availability of those dependencies, even if the original copies disappear. On the other hand, vendor directories are also difficult to manage and bloat the repositories in which they appear. With the go.mod file specifying the exact version of dependencies to use during vgo build, and with proxy servers for ensuring availability, vendor directories are now almost entirely redundant. They can, however, serve one final purpose: to enable a smooth transition to the new versioned world.

When building a module, vgo (and later go) will completely ignore vendored dependencies; those dependencies will also not be included in the module's zip file. To make it possible for authors to move to vgo and go.mod while still supporting users who haven't converted, the new vgo vendor command populates a module's vendor directory with the packages users need to reproduce the vgo-based build.

What's Next?

The details here may be revised, but today's go.mod files will be understood by any future tooling. Please start tagging your packages with release tags; add go.mod files if that makes sense for your project.

The next post in the series will cover changes to the go tool command line experience.