Friday, November 20, 2009

Revisionist History

Remember the Subversion 1.0 celebration T-shirts? Alas, there are no `svnadmin dump' and `svnadmin load' for cottonware.
As you might have read elsewhere, the Subversion project has recently been accepted in the Apache Software Foundation's Incubator. As part of the process, we'll be migrating all kinds of goodies off of — which has been the project's home (and a good one at that) for the past near-decade — and onto Apache-hosted servers. Last weekend we took the first of those steps by migrating our version control history. I managed the Subversion side of this migration, prepping the data for delivery into the ASF Infrastructure team's able hands.

But I wanted to do more than simply move our Subversion history from one place to another. See, when Subversion began, it was a bunch of source code living in a CVS repository. When the source code compiled into something trustworthy, we let Subversion hold its own source code. (For the record, we were never given cause to regret that decision.) But at the time we made that change, there was no reliable and simple way to convert CVS history into Subversion history like there is with cvs2svn today. So we just exported the latest snapshot of our main development line, imported that into Subversion, and dealt with the severed history. However, this repository migration, which was going to be disruptive anyway, presented an opportunity to stitch together our old CVS history and our Subversion history. So I did. Here's how:

  1. Using cvs2svn, I converted all CVS history to Subversion and deposited it into a temporary repository, svn-from-cvs.
  2. Now, the CVS repository data contained some trailing changes that were created after the switch to Subversion back in 2001. Most of those were commits to www/ (which we manually mirrored for a while based on our Subversion commits to trunk/www/). A couple of them were things like system-wide automated tweaks to www/robots.txt made by CollabNet folk. Also, we had real tags and branches in our CVS repos that we didn't bring with us into Subversion. So I dumped the first 3654 revisions from svn-from-cvs — the pre-switchover changes only — and loaded that into the stitch repository, svn-complete.
  3. To historically preserve the fact that apparently we didn't care too much about those old CVS tags and branches, I committed their deletion from svn-complete (but left the branches/ and tags/ top-level directories themselves).
  4. Since the first revision of our project's Subversion history (in the main svn repository) was a massive import into trunk, that would have clashed mightily being loaded atop already-existing files and directories in svn-complete. So instead I checked out svn-complete/trunk@HEAD, then exported svn/trunk@1 atop it. The local mods were the small delta between what we got outta CVS on August 31, 2001, and what we put into Subversion. They were mostly the result of $Date$ keyword formatting differences. I committed those local mods, which now brought svn-complete into sync with svn@1, except that svn-complete still had empty tags/ and branches/ directories (which in svn were only added in r532 and r1237, respectively).
  5. I dumped -r2:531 of svn, loading the result into svn-complete.
  6. I skipped r532 (the revision in which we created our tags/ directory) from svn, instead adding a no-op placeholder revision to svn-complete.
  7. I then dumped -r533:1236 of svn, loading the result into svn-complete.
  8. Once again, I skipped r1237 (the revision in which we created our branches/ directory) from svn, instead adding another no-op placeholder revision to svn-complete.
  9. Finally, I dumped the rest of the svn history (r1238:r40515), loading those revisions into svn-complete.
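
Steps 5 through 9 boil down to "dump everything from r2 to r40515 except the two skipped revisions." Here is a minimal Python sketch of that range splitting. The function names `dump_ranges` and `plan` are hypothetical, and so is the use of `--incremental`; the steps above don't say which dump options were used. The script only prints command lines and runs nothing. The no-op placeholder commits would still have to be made by hand between loads.

```python
def dump_ranges(first, last, skipped):
    """Return inclusive (lo, hi) ranges covering first..last minus the skipped revisions."""
    ranges = []
    lo = first
    for rev in sorted(skipped):
        if rev < first or rev > last:
            continue  # a skipped revision outside the window changes nothing
        if lo <= rev - 1:
            ranges.append((lo, rev - 1))
        lo = rev + 1
    if lo <= last:
        ranges.append((lo, last))
    return ranges


def plan(src="svn", dst="svn-complete", first=2, last=40515, skipped=(532, 1237)):
    """Build one dump-and-load pipeline per range, as strings. Nothing is executed.

    Assumption: --incremental is used so that each chunk's first revision is
    dumped as a delta rather than as a full tree.
    """
    return [
        f"svnadmin dump {src} -r {lo}:{hi} --incremental | svnadmin load {dst}"
        for lo, hi in dump_ranges(first, last, skipped)
    ]


if __name__ == "__main__":
    for command in plan():
        print(command)
```

With the defaults, this yields three pipelines, covering r2:531, r533:1236, and r1238:40515.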

The result was a single repository (svn-complete) of 44170 revisions (3654 from CVS, 40515 from Subversion, and 1 cleanup revision) that contained all of the Subversion project's version control history, starting with the inception of the project. It's this data that I handed off to the ASF Infrastructure team.

The ASF Infrastructure team took this repository's data and loaded it into the ASF repository (with external commits disabled to prevent interleaved commit history). At the time that this history was loaded into the ASF repository, that repository already had 836419 revisions in it. The next 3655 revisions represent the Subversion CVS history (plus a fixup revision). This means that any historical references found in Subversion's source code, issue trackers, mailing lists, etc. that refer to pre-migration revisions (which are easy to spot, as they are all quite a bit smaller than 800000!) may be found in the ASF repository by adding 836419 + 3655 = 840074 to the revision number.
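
For anyone who wants to translate old references mechanically, the arithmetic above fits in a few lines of Python. This is only a sketch: the constant and function names are mine, and the bounds check assumes the last pre-migration revision is r40515, as described earlier.

```python
ASF_PREEXISTING = 836419   # revisions already in the ASF repository before the load
CVS_AND_FIXUP = 3655       # converted CVS history plus the one fixup revision
OFFSET = ASF_PREEXISTING + CVS_AND_FIXUP   # 840074
LAST_OLD_REVISION = 40515  # last revision of the pre-migration Subversion repository


def asf_revision(old_rev):
    """Map a pre-migration Subversion revision number to its ASF repository number."""
    if not 1 <= old_rev <= LAST_OLD_REVISION:
        raise ValueError(f"r{old_rev} is not a pre-migration revision")
    return old_rev + OFFSET


if __name__ == "__main__":
    for rev in (1, 8810, LAST_OLD_REVISION):
        print(f"old r{rev} -> ASF r{asf_revision(rev)}")
```

So an old reference to r8810, for instance, lands on r848884 in the ASF repository.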

We as a project are still finding all the places that this change of address (and revision numbers) will affect us. I can assure you that the last place I expected it to hit was my clothes closet, though!

1 comment:

  1. The fact that my r8810 t-shirt would no longer be 100% accurate was actually the first thing that occurred to me when I saw your email about how the conversion was done ;-)

    Nice job by the way, very cool to have all that history brought along for the ride.