WDS Global Build Coordination

We have three groups of developers around the world, located (not by accident) 8 hours apart, with each group colocated. Thus we sidestep the "distributed build token" problem. Except... just recently, what with one thing and another, we've had to move the EMEA team in a somewhat piecemeal fashion, and so have ended up with just that problem across two sites in the UK.

Some developers addressed this problem with a proposal to instrument the build script to fire up a database connection and record the start times, end times (and workstation name) of attempted builds in a table, and then have the build interrogate this table to find out if a collision was likely. And they were going to "do the simplest thing that could possibly work" and spike this and iteramentally that and so on and so on. And because they were a little bit confused about what it is that they do that adds value to the customer, they were only going to tinker about with this in their own time. Meanwhile, invalid builds were tripping pairs up left, right and center (to add to the fun, our build time is far, far too long).

Instead, I grabbed someone to work with and applied "do the right thing": we stopped working on the story before us (which was being held up by having to redo builds) and thought a bit about the problem. The interesting question is not "is someone else building?" but rather, "when my build finishes will I be blocked from checking in?"

We use Perforce, and the P4 server knows what everyone is doing. And we use ant, and ant has all sorts of hooks in it. So in a couple of hours we put together an alternative build script, "pant", which is really about 100 lines of Ruby (including some hefty here docs for messages). It asks P4, via its unix-stylee text interface, whether any of the files not edited on this client are also not synched to the head revision on this client, and warns the user (but does NOT stop the build) if so. Then every so often during the build it looks to see if anyone else has checked in (i.e., the highest numbered submitted changelist is no longer the one it was when pant started), and warns the user if so (but again does NOT stop the build). If either condition occurs, pant calls out to ant to play a build failed sound (in this case, the theme from The Love Boat).
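For the curious, the two checks can be sketched roughly as below. This is not the pant source, just a minimal illustration. The method names are made up for this example, and it assumes the usual shape of p4's text output: lines like "//depot/path#7 - action ..." from "p4 sync -n" and "p4 opened", and "Change 4711 on ..." from "p4 changes -m1 -s submitted".

```ruby
# Sketch only: method names and sample text are illustrative, not taken from pant.

# Depot paths from "p4 opened" or "p4 sync -n" style output, where each
# line is assumed to look like "//depot/path/File.java#7 - <action> ...".
def depot_paths(p4_text)
  p4_text.each_line.map { |line| line[%r{\A(//.+?)#\d+ - }, 1] }.compact
end

# Files that a sync would touch but that are not opened on this client,
# i.e. "not edited here, and not at the head revision here".
def stale_unopened_files(sync_preview_text, opened_text)
  depot_paths(sync_preview_text) - depot_paths(opened_text)
end

# Changelist number from "p4 changes -m1 -s submitted" style output,
# e.g. "Change 4711 on 2005/03/01 by kb@ws 'Fix the build'".
# Returns nil if the text does not look like that.
def head_changelist(changes_text)
  number = changes_text[/\AChange (\d+) /, 1]
  number && number.to_i
end

# True if the newest submitted changelist is no longer the one seen at start.
def someone_checked_in?(changelist_at_start, changes_text_now)
  latest = head_changelist(changes_text_now)
  !latest.nil? && latest != changelist_at_start
end

# Demo with canned text standing in for real p4 output.
sync_preview = <<~TEXT
  //depot/src/Foo.java#7 - updating /work/src/Foo.java
  //depot/src/Bar.java#4 - is opened and not being changed
TEXT
opened = "//depot/src/Bar.java#3 - edit default change (text)\n"

stale = stale_unopened_files(sync_preview, opened)
warn "WARNING: out-of-date files: #{stale.join(', ')}" unless stale.empty?
```

In a real script the text would come from shelling out, something like `p4 sync -n` and `p4 opened` before the build, and `p4 changes -m1 -s submitted` at the start and then every so often while ant runs. Keeping the parsing separate from the shelling out means the checks can be tried against canned text without a Perforce server to hand. Note that in both cases the result is only a warning; nothing here stops the build.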

We didn't download anything from the internet for this, but we did use examples (of how CruiseControl plays its failure sounds); source code (of the existing CruiseControl script); loosely structured data (p4's textual reporting of change lists); dynamic typing (Ruby); focussed components (P4, CruiseControl); composed components (we didn't change P4 or the existing build script); actual capabilities (stuff that P4 does that can be interpreted the right way); and simplified the problem.

This is the sort of thinking that we try to encourage: don't write code. Our codebase is currently 3 GB of source, which is a shameful admission of failure. The team here has learned to celebrate the deletion of code, but is still a bit too fond of writing the stuff. We emphasise to our guys that they are not "programmers" (who necessarily add value by, well, writing code), but "developers", who add value by building systems. We even tried to change the name of the department from "development" to "technology solutions", but that was a step too far and didn't stick.

But back to the distributed build token problem. A better solution has recently emerged serendipitously: being spread around, we run up quite substantial phone bills, so we have started to install VoIP phones. Turns out that a USB VoIP phone plugged into a Windows box becomes the default sound output device, so the build sounds play through it. And since we leave the connection open all the time, the folks at the other site hear that build sound too, so they know that a potential conflict is coming down the line. Problem solved with zero code written.

— Keith Braithwaite