DISCLAIMER: This site is a mirror of the original, which was once available at http://iki.fi/~tuomov/b/


1. Just like almost everyone else in the free software world, at some point I started managing the code of my projects with CVS. But CVS had many problems, and once Subversion came along, I switched to it. But centralised version control is cumbersome: it depends on special centralised server software, requires creating accounts on the server for every contributor, and so on. Of the distributed version control systems, GNU Arch (tla) has been around for quite some time, but I never liked it, because it imposes too much policy instead of simply providing mechanisms, and has a very laborious command set.

Then Darcs came along. It offered a very nice command set, and no policy. The architecture was also very simple: every working copy is a repository, and no other storage needs to be set up. Branching is then simple, and no trace is left of any scrap branches after they've been deleted from the file system, unlike in systems that demand creation of branches in a local or central repository. These repositories can be served with any plain Web server, but even more importantly, Darcs was the first to take the communication aspect of version control seriously: there's excellent support for sending patches by email (if you ignore the fact that all MTAs suck), and thanks to its patch theory, the merging or application of patches sent this way is quite conflict-free. Given Darcs' simplicity and a command set that is quite approachable for any command line user, it is easy for a random contributor to create a patch that is then easy for the primary author(s) of a piece of software to apply, with all the necessary metadata already entered by the contributor. This has led me, for example, to not accept plain old diffs at all: for those I'd have to consider whether and where I want to enter the author's name, what the log message should be, and so on.

Unfortunately, there's one aspect of communication that Darcs doesn't take seriously at all: differing encodings. Darcs does not support locales and locale encodings even for patch metadata (for other things the support would indeed be too difficult), and instead expects every contributor to a project to use the same locale/encoding. One should not have to ask one's contributors to use a particular locale to enter their name! To tell them how to write their name! Some of the developers indeed appear to outright desire a UTF-8 monoculture [1], an evolutionary dead-end, and the corresponding issue in the bug database is still marked as “wont-fix”, although after long persuasion, the lead developer has at least given a go-ahead for “write it yourself!” As if I had the time (and will) to study the mess-of-a-source-code to figure out all the things that need to be changed; the developers have been quite unhelpful towards that end. For someone who already knows the code, the job would be so much simpler. (While Haskell is a nice language, it does not lend itself to easy refactoring.)
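To make concrete what locale encoding support for patch metadata could look like, here is a minimal Python sketch. It is not how Darcs or any of the other tools works; the function names are hypothetical, and the choice of UTF-8 as the canonical on-disk form is an assumption of the sketch. The point is only that conversion happens at the boundary, so each contributor can type their name in whatever encoding their locale uses.

```python
import locale


def metadata_to_storage(raw: bytes, user_encoding: str) -> bytes:
    """Convert metadata typed in the user's encoding to the canonical form.

    Assumption of this sketch: the repository stores metadata as UTF-8.
    """
    return raw.decode(user_encoding).encode("utf-8")


def metadata_to_display(stored: bytes, user_encoding: str) -> bytes:
    """Convert stored metadata to the viewer's encoding for display.

    Characters the viewer's encoding cannot represent are replaced with '?'
    instead of raising an error.
    """
    return stored.decode("utf-8").encode(user_encoding, errors="replace")


if __name__ == "__main__":
    # The encoding of the current locale, as reported by the C library.
    enc = locale.getpreferredencoding(False)
    typed = "Väinö Example".encode(enc, errors="replace")
    stored = metadata_to_storage(typed, enc)
    print(metadata_to_display(stored, enc).decode(enc))
```

With this split, a contributor in an ISO-8859-15 locale and one in a UTF-8 locale can both record and read the same author name without either having to change their setup.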

2. This elitist and monoculturist (and perhaps also anglophone?) reluctance on the Darcs developers' part to support locales (including a locale or ISO 8601 time format instead of the illogically ordered anglophone one), or even to assist in implementing that support, has led me to look for alternatives to Darcs. Unfortunately, the grass isn't much greener on the other side of the fence. There's no going back to centralised version control, and given my distaste for tla, the contenders seem to be: Bazaar-ng (bzr), Mercurial (hg), and Git. For comparison, I will also include Monotone below, although it counts as semi-centralised, demanding specialised servers, and is thus not an option. SVK likewise is a decentralisation hack demanding Subversion (WebDAV) servers, and is therefore not considered. (It would fare quite similarly to Monotone in the summary table, but is slightly more cumbersome to use.) Codeville is also not considered, as it requires special server software and seems to be doomed to obscurity.

Various technical aspects of these version control systems have been covered elsewhere, with Darcs quite clearly at the top in most respects, although not without some serious problems (at least one of which, combinatorial explosions arising from conflicts caused by file deletions, they seem to be unwilling to solve in a simple manner). I will therefore concentrate in the following on some aspects of subjective usability and approachability, in particular the aspect of communicating with random contributors, and what I see as obstacles to their using the system.

An important feature, of course, is that a simple $VCS get http://location/ command can be used to fetch the sources. All the contenders except Monotone succeed in this, all the others that require special server software having been excluded from the comparison at the outset. The basic command sets of all these tools are also quite simple, unlike that of tla, although Darcs' appears to be the simplest, and I cringe at the popular use of commit for what is called record in Darcs: you simply don't “commit” to anything in a properly distributed version control system until you “push” the changes into a public repository.

Unfortunately, it appears that only bzr and Monotone support locales. Locale support is planned for Mercurial, but who knows when that will happen. What is it with this reluctance towards locale support? This fascination with a UTF-8 monoculture, prevalent [2] in the free software world these days? Is it the prevalence of English in the world of computers? Should we not let anglophones, people who have never had to suffer from encoding assumptions, near computers at all?

Another thing one sees when starting to use these alternatives is that they don't take email seriously at all. First of all, bzr, hg and git all make (or made; this should have been fixed in hg recently) the assumption that user@hostname is a meaningful author of a change. Most of the time, it is totally meaningless! Most people don't do development on servers that match their email addresses, or even on computers that have meaningful hostnames. Darcs in its interactive bliss takes the path of least surprise and asks for the email address if it has not been configured, and stores it in the repository for future use, instead of making an assumption, the mother of all fuck-ups. (Unfortunately, this non-assumptiveness does not extend to encodings.) Monotone also asks you to configure a user and a cryptographic key before it lets you commit/record any changes.
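The "ask and remember" behaviour described above can be sketched in a few lines of Python. This is an illustration only, not Darcs' actual implementation: the function name, the JSON file and its location are all hypothetical. What matters is that nothing like user@hostname is ever guessed; the identity is asked for once and then reused.

```python
import json
import os
import tempfile
from typing import Callable


def get_author(config_path: str, ask: Callable[[str], str] = input) -> str:
    """Return the stored author identity, asking once if none is stored.

    config_path is a hypothetical per-repository settings file.
    """
    if os.path.exists(config_path):
        with open(config_path, encoding="utf-8") as f:
            author = json.load(f).get("author", "")
        if author:
            return author  # Remembered from an earlier run: do not ask again.
    author = ask("What is your email address? ").strip()
    if not author:
        # Refuse to continue rather than fall back on a guessed identity.
        raise ValueError("an author identity is required")
    with open(config_path, "w", encoding="utf-8") as f:
        json.dump({"author": author}, f)
    return author


def _must_not_ask(prompt: str) -> str:
    raise AssertionError("asked again although the answer was stored")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "author.json")
        # A canned answer stands in for the interactive prompt in this demo.
        first = get_author(path, ask=lambda prompt: "Jane Doe <jane@example.org>")
        second = get_author(path, ask=_must_not_ask)
        print(first == second)
```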

Neither do these other version control systems take sending email seriously, once you've managed to figure out how to have a meaningful author for a patch. Monotone does not seem to support it at all, and bzr and git don't support it by default: you have to install unpackaged extra plugins for that. And once you've managed to do that, you still have to figure out the email address to send the patches to, because unlike Darcs, they don't take it from the upstream repository.
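As a rough illustration of what taking email seriously could mean mechanically, the following Python sketch bundles a set of changes into a single message addressed to whatever contact address the upstream repository publishes. The function and parameter names are hypothetical, and how the upstream address is obtained is deliberately left out; this is not the sending code of any of the tools discussed.

```python
from email.message import EmailMessage
from typing import Mapping


def build_patch_email(
    author: str, upstream_email: str, patches: Mapping[str, bytes]
) -> EmailMessage:
    """Build one message that carries every patch as an attachment."""
    msg = EmailMessage()
    msg["From"] = author
    msg["To"] = upstream_email
    msg["Subject"] = f"{len(patches)} patch(es)"
    # The body lists the patch names so the recipient sees what is enclosed.
    msg.set_content("\n".join(patches))
    for name, data in patches.items():
        msg.add_attachment(
            data, maintype="application", subtype="octet-stream", filename=name
        )
    return msg


if __name__ == "__main__":
    message = build_patch_email(
        "Jane Doe <jane@example.org>",
        "maintainer@example.org",
        {"fix-typo.patch": b"...", "add-feature.patch": b"..."},
    )
    print(message["Subject"])
```

The contributor supplies neither the recipient nor any options: the author comes from the stored configuration and the recipient from upstream.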

Furthermore, in Git, I've been told, patches sent by email are not handled like patches transmitted in other ways, and there are all kinds of problems with synchronising repositories that have had emailed patches applied to them: if you use email to send patches, you're supposed to manage some sort of patch stacks with extra tools. Bzr also seems to suffer from this, to a somewhat lesser degree.

Mercurial, on the other hand, while to my knowledge it does handle patches sent by email like any other changes, defaults to sending a zillion emails (one for each change), and I don't want a mailbomb in my mailbox. It has an option to change the behaviour, but entering such options is again too much to ask of the random contributor. Even in general, Mercurial asks for too much manual configuration. It doesn't even remember where you've pushed or pulled previously, and instead expects you to specify the default sources and destinations in a configuration file. Same with bzr, but at least its commands have options to make it remember the parameters.

Another problem I have with many of these systems is that they're difficult to install on systems on which you don't have root privileges. The Darcs binary (from the Debian package) I can simply copy over, and it will work if the target system isn't too different. But tools that have many library dependencies, or are written in scripting languages and spread into a zillion files, are much more difficult to simply copy over, given the all-in-one-basket mess that the unix directory hierarchy is. I'd thus have to install these programs from the source package, and that is simply too much work. Of course, this problem is not limited to just these version control systems, and is not their fault as such. It would be nice to have a way to build self-contained executables from packages already installed on a system.

3. Here's a summary of the various aspects discussed above, and some others:

Darcs Mercurial Bazaar-ng Git Monotone
Plain HTTP for serving repositories × × × ×
Simple fully distributed repository model × × × ×
Simple command set × × × × ×
Ask and remember, do not assume ½ ½
Locale encoding support for metadata (unlikely) (planned) × ×
Takes email seriously × ½
Tools for full conversion [3] from Darcs -

I am likely to switch my projects over to the first version control system that fulfills all these criteria and addresses any other gripes I might come up with.


[1] Don't get me wrong, I like the extended character range that UTF-8 can offer (despite a lot of redundant and useless parts), but I don't want it at the price of a monoculture, an evolutionary dead-end. I used to suggest that people switch to a UTF-8 locale. I no longer do, having seen where the switch to UTF-8 is heading: ignoring locales and other encoding specifications. Now it's better to slow down the switch to UTF-8, because it's being done the wrong way.

[2] In other news, apparently GHC has also joined the UTF-8 monoculture: it expects all source code to be UTF-8, instead of letting the encoding be specified in the source file, e.g. in the format that some editors already support for specifying various attributes, in a comment at the top as follows:

-- -*- encoding: utf-8; -*-
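Honouring such a declaration is not much work for a tool. As a hedged illustration in Python (the function names and the regular expression are hypothetical, and this is not something GHC does), a reader could look at the first line for the declaration and fall back on a default only when none is given:

```python
import re

# Matches an "encoding: NAME" entry between a pair of -*- markers.
_DECLARATION = re.compile(r"-\*-.*?\bencoding:\s*([-\w.]+)\s*;?.*?-\*-")


def declared_encoding(first_line: bytes, default: str = "utf-8") -> str:
    """Return the encoding named on the first line, or the default."""
    text = first_line.decode("ascii", errors="replace")
    match = _DECLARATION.search(text)
    return match.group(1) if match else default


def decode_source(data: bytes) -> str:
    """Decode a whole source file using the encoding it declares."""
    first_line = data.split(b"\n", 1)[0]
    return data.decode(declared_encoding(first_line))


if __name__ == "__main__":
    print(declared_encoding(b"-- -*- encoding: iso-8859-15; -*-"))
```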

[3] That means tags included. Tailor and darcs2hg do not support tags (from Darcs). (Interactive manual intervention is acceptable for tags that cannot be converted as such.)