Tag: Subversion

How To Use Git-SVN as the Only Subversion Client You’ll Need

I’ve been using git as my favorite version control tool for quite a while now. One of its numerous distinguishing features is an optional component called git-svn, which serves as a bi-directional “bridge” that enables native git repositories to interact with a Subversion repository, performing all the normal operations you would need to use svn for. In other words, since you can check out, commit to, and query the logs of Subversion repositories (among other things) using git-svn, git can serve as your all-in-one Subversion client.

One reason you might use git-svn is that your project actually resides in a Subversion repository and other people need to access it using Subversion-only tools. Another might be that you have multiple projects, some that use git and others that use Subversion, and, like me, you’re tired of switching between svn and git commands. For us, it’s far easier to simply use git as a Subversion client and never have to call svn directly.

As an important aside, please note that I would strongly discourage people who are new to git from learning about it by using git-svn. Although you may think that moving to git from Subversion would be eased by using the git-svn bridge, I really don’t think that’s the case. You’re much, much better off simply using git by itself right off the bat, and you can do this even if your fellow committers are using Subversion.

Also, I’m going to assume you’ve already got a Subversion repository set up somewhere.

First, check out the Subversion repository. In Subversion you would do this:

svn checkout http://example.com/path/to/svn/repo

With git-svn, you do this:

git svn clone http://example.com/path/to/svn/repo

This will cause git-svn to create a new directory called repo, switch to it, initialize a new git repository, configure the Subversion repository at http://example.com/path/to/svn/repo as a remote git branch (confusingly called git-svn by default, although you can specify your own name by passing a -Rremote_name or --svn-remote=remote_name option), and then do a checkout.

The output of this command will be a little awkward. Here’s a sample from one of my repositories:

r14 = dbd7266f328ef2ad061ea4532f39ce7cebaba0c5 (git-svn)
	M	trunk/Chapter 6/Chapter 6.doc
	M	trunk/Chapter 6/code examples/6.1.html
	A	trunk/Chapter 6/code examples/6.2.html
r15 = 4cca08341ab0600069cece77ce67afc449caca68 (git-svn)
	M	trunk/Chapter 6/Chapter 6.doc
	A	trunk/Chapter 6/code examples/print.css
	A	trunk/Chapter 6/code examples/screen.css
	M	trunk/Chapter 6/code examples/6.1.html
	M	trunk/Chapter 6/code examples/6.2.html
r16 = 7b2f3e0ccfd79be61b527b6ba325f8689475dc01 (git-svn)
	M	trunk/Chapter 5/Chapter 5.doc
r17 = a319764855361d92bb6e006cfd18a51319046cae (git-svn)
	M	trunk/Chapter 5/Chapter 5.doc
r18 = 4cd5cb43d33b2dd45bd39b9a2b7ea9416f9e3d8f (git-svn)
	M	trunk/Chapter 6/Chapter 6.doc
	M	trunk/Chapter 6/code examples/screen.css
	M	trunk/Chapter 6/code examples/6.1.html

As you can see, git-svn is associating specific Subversion revisions with particular git commit objects. Due to this required mapping, the initial cloning process of a Subversion repository may take some time. This is a good opportunity for your morning coffee break.

When this process is done, you’ll have a typical git repository with a local master branch and one remote branch for the Subversion repository:

Perseus:repo maymay$ git branch
* master
Perseus:repo maymay$ git branch -r
  git-svn

You can now treat the Subversion repository as though it were a remote branch of sorts. Say you’ve done a bunch of work and, as you typically do with git, you commit this work to your topic branch.

Perseus:repo maymay$ git checkout -b awesome-feature
Switched to a new branch "awesome-feature"
Perseus:repo maymay$ vim awesome-feature-stylesheet.css
Perseus:repo maymay$ git add awesome-feature-stylesheet.css 
Perseus:repo maymay$ git commit -m "Now I'm perty."
Created commit 07ee832: Now I'm perty.
 1 files changed, 1 insertions(+), 0 deletions(-)
 create mode 100644 awesome-feature-stylesheet.css

Right now your changes are still in the topic branch (called awesome-feature in the above example). To get them to Subversion, you merely need to say git svn dcommit:

Perseus:repo maymay$ git svn dcommit
Committing to http://example.com/path/to/svn/repo ...

Note that pesky extra “d” in the command. This is the equivalent of Subversion’s svn commit, but the commit message used is the one from the previous command, which in this case was git commit -m "Now I'm perty.". Also interesting to note here is that because Subversion doesn’t understand git branches, any change on any branch can be “pushed” to Subversion at any time using git svn dcommit—the git commits don’t have to be on any specific branch, since all git-svn does is map a git commit object to a Subversion revision and vice versa.

Similarly, you can at any time run the equivalent of svn update to get the latest changes from the Subversion repository into your Subversion branch.

  • To do this, without affecting your working tree—that is, to only fetch the latest changes but not write them to the filesystem, just to the git-svn metadata area and the remote git branch—use git svn fetch. To apply these changes to your local branch, you simply merge: git checkout master; git merge git-svn.
  • If you do want to write out the changes to the filesystem (as svn update would do), use git svn rebase, which automatically linearizes your local git commit history after the commit history of the incoming Subversion changesets. Very slick.

If your fetching/rebasing causes a conflict, you’ll be notified and will have to resolve it as per usual. If your “pushes” to the svn repo cause a Subversion conflict, you’ll be notified and you should again edit the appropriate files to resolve it, but this time make sure you run a git svn rebase before you try dcommit-ing again (since, remember, Subversion can only handle linear commit history).
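The update-then-commit cycle described above can be sketched as a small shell helper. This is only an illustration: svn_sync is a made-up function name, and the GIT variable is an assumption added so the commands can be previewed (for example with GIT=echo) without touching a real repository.

```shell
#!/bin/sh
# Sketch of the "rebase first, then dcommit" cycle.
# GIT is overridable (for example GIT=echo) so the commands can be previewed.
GIT=${GIT:-git}

# svn_sync is a hypothetical helper name, not part of git-svn.
svn_sync() {
    # Replay local commits on top of the newest Subversion revisions.
    $GIT svn rebase || {
        echo "rebase stopped; resolve the conflict, then try again" >&2
        return 1
    }
    # Only reached when the rebase succeeded: send each commit to Subversion.
    $GIT svn dcommit
}
```

Run svn_sync from inside the git-svn clone. Because the dcommit only happens after a successful rebase, a conflict stops the helper before anything is sent to Subversion.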

As always, saying man git-svn or git help svn to your shell will give you all the other details. Among these, the one you’ll most likely want to learn about is how to track multiple Subversion branches as normal git branches.
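As a starting point for tracking branches, here is a minimal sketch that assumes the Subversion repository follows the conventional trunk/, branches/ and tags/ layout. The clone_stdlayout name and the overridable GIT variable are illustrative assumptions; --stdlayout itself is a git-svn option (short form -s).

```shell
#!/bin/sh
# GIT is overridable (for example GIT=echo) so the command can be previewed.
GIT=${GIT:-git}

# clone_stdlayout URL [DIR]: hypothetical wrapper around git svn clone.
clone_stdlayout() {
    # --stdlayout tells git-svn that the repository uses the conventional
    # trunk/, branches/ and tags/ directories, so each Subversion branch
    # is imported as its own remote branch instead of a plain subdirectory.
    $GIT svn clone --stdlayout "$@"
}
```

For example, clone_stdlayout http://example.com/path/to/svn/repo. Afterwards git branch -r should list one remote branch per Subversion branch; the exact ref names depend on your git version, so check the manual page if they look different from what you expect.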

Are you missing the point of using a version control tool?

The other day I gave a brief (and overly-hyper) talk about git, the (very) dumb, (very) fast version control system. It was part of SyPy’s Git vs. Hg vs. Bzr night. Rather than be flamingly competitive, however, I had a lot of fun that night learning about the differences between the DSCM tools, which was especially interesting since I’ve only ever used Git in real life scenarios.

Since I’m a Subversion refugee, my experience with different version control systems is mostly with the distinctions between the centralized and the distributed models, not between the various tools you can use in either paradigm. What struck me when I first began using git was how conceptually similar it felt to using Subversion when I was using it by myself (as a lone developer) but how radically different it suddenly felt the moment I was sharing my code with someone else.

Now, I’m a die-hard individualist. I want things to happen my way as much as possible, and I don’t really care what happens for anyone else as long as when I interact with other people those interactions are as mutually beneficial as they can possibly be. That’s why I love DSCM tools so much.

Distributed source code management systems feel much more like translator tools between the ways in which people work as opposed to feeling like a dogma of workflow management processes, like centralized systems do. This paradigm appeals both to my preferred way to work and, as it turns out, helps more people stay more productive all at the same time.

This is also why I’m a firm believer that most of the people I’ve worked with in the past completely missed the point of using version control systems. It seems to me that most developers I’ve worked with have thought of SCM tools as “the ‘Save As…’ button on steroids.” While these developers are technically correct, their narrow view of what a VCS does means they aren’t taking advantage of the full potential of the concept.

The power of a version control system isn’t just in that it gives you the ability to easily hit the proverbial “Save As…” button as much as you want, but rather in that you get to retrieve those other versions when you’re ready for them, regardless of what your fellow developers are doing to the code on their machines. This means that a version control system’s real purpose is to insulate you from changes of any sort until you’re ready to deal with them. A good tool also does this reciprocally; it will insulate your fellow developers from the changes you’re making until they’re ready for them.

Admittedly, that’s not a very concrete “feature.” It’s more like a fundamental philosophical principle, which is probably why it’s so hard to encode into the physical manifestation of a tool. Then on top of all of that complicatedness you have to add things like usability and interoperability and resource efficiency. That’s where I learned about the majority of the distinctions between the various DSCM tools discussed in SyPy’s presentation.

However, for me, all of those things ultimately get evaluated against the following question: Does Feature X help insulate me from change (does it help in persisting my view of the state of the world until I’m ready for it to change), or not?

For example, Bazaar’s interesting notion of “nested commits” with dotted revision numbers is really intriguing because it’s much (much) more user-friendly than git’s notion of exposing SHA-1 hashes to (mere mortal) end user’s eyes. Yet, while it’s certainly less painful than copying-and-pasting hashes all over the place, there’s little fundamental difference in the way these mechanisms actually portray the state of the world to me. Any given SHA-1 will always be the exact same commit object. Any given dotted revision number will also always be the same commit (within one’s own unchanged repository).

In contrast, I learned from Martin Pool that Bazaar has a “push over SFTP” feature to let you “export” or “archive” a version of code by transmitting it over an SFTP connection. Now that really caught my attention because it’s an example of the version control tool acting like that translator I was mentioning earlier; the interoperability helps people not need to change until they want to. In this case, it means you never have to install Bazaar on a remote server to get your content there via the tool. That’s very cool—much cooler than the mundane technical fact that bzr supports the SFTP protocol out of the box.

Of course, it’s technically pretty trivial to write an expect or shell script wrapper to enable git (or whatever other tool you want to use) to mimic this behavior. And that’s exactly the point: technology is always the easy part. It’s doing it right at a fundamental level that’s actually really difficult to do correctly.

Fix Subversion “checksum mismatch” error by editing .svn/entries file

I can’t explain why this happened because in my several-year-long history with Subversion, I’ve never experienced this issue once. However, today, I fell into the (arguably) unfortunate circumstance of running into a most disturbing error from SVN. When trying to commit my changes, SVN barfed at me and complained of a “checksum mismatch”. It looked something like this:

Transmitting file data ..svn: Commit failed (details follow):
svn: Checksum mismatch for '/Users/maymay/Sites/path/to/subversion/working/copy/.svn/text-base/working-file.php.svn-base'; expected 'cde4d8fbd5c623da3a8b1a343aa7b3f4', actual: '270b2f20804a5fcdbd15eec5910f6e3f'

Of course, the path/to/subversion/working/copy bit was the path to my working copy file’s parent directory and the working-file.php was an actual file in my working directory.

I think what Subversion was trying to tell me is that its hashed copy of the working-file.php file and the copy I was asking it to commit weren’t the same. It would be nice if it would actually tell me why that happened, but it’s clearly more temperamental than that.

Anyway, to fix this issue (at least for now…?) I simply checked out a new working copy of this directory, examined the .svn/entries file from it and sure enough, found the actual checksums in there, just as Subversion reported expecting. I simply copied those expected checksums into the broken working copy’s .svn/entries file, overwriting the old actual checksums, and, voila, Subversion had been fooled. After that, I could commit my changes.

Step by step (because I’m sure someone, somewhere, somehow, will run into this again—if it’s not me that is!), this procedure looked like this:

  1. Copy the “expected” and “actual” checksums Subversion reports to you to a new text file so you can refer to them later. Note which one is the expected and which is the actual checksum.
  2. Go to where the problem is (that is, cd path/to/broken-files-parent-dir/.svn)
  3. Open the entries for editing (for example, vim entries)
  4. Search the file for the actual checksum.
  5. Replace it with the expected checksum. Be careful not to change any other part of the file.
  6. Save the file.
  7. Try to svn commit again.
  8. Lather, rinse, and repeat for any other files Subversion barfs at you about.
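Steps 3 through 6 above can also be scripted. The following is a rough sketch, not a recommended Subversion procedure: fix_checksum is a hypothetical helper name, and it assumes the entries file is writable (if it is read-only on your system, you may need chmod u+w first). It keeps a backup next to the file in case the edit makes things worse.

```shell
#!/bin/sh
# fix_checksum ENTRIES ACTUAL EXPECTED   (hypothetical helper name)
# Replaces the ACTUAL checksum with the EXPECTED one inside ENTRIES.
fix_checksum() {
    entries=$1 actual=$2 expected=$3
    # Refuse to touch the file unless the actual checksum really is in it.
    grep -q "$actual" "$entries" || {
        echo "checksum $actual not found in $entries" >&2
        return 1
    }
    # Keep a backup copy before editing anything.
    cp -p "$entries" "$entries.bak" || return 1
    # MD5 checksums are plain hex, so they are safe inside a sed pattern.
    sed "s/$actual/$expected/g" "$entries.bak" > "$entries"
}
```

With the error message shown earlier, the call would look like fix_checksum .svn/entries 270b2f20804a5fcdbd15eec5910f6e3f cde4d8fbd5c623da3a8b1a343aa7b3f4, run from the broken file's parent directory.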

I’m sure this is not an elegant or even the recommended solution to this problem. The truth is I never bothered to look up what the recommended solution is, because it seems to me that any code repository that can’t guarantee what I get out of it is the same as what I put into it isn’t a versioning system I really want to trust the “recommended” solution of, anyway.

Also known as: this is another reason why I like git better now.

Quick ‘N’ Dirty Drupal Module SVN Tagging Script

In a (rather beastly) project at work today, I found myself needing to import a significant number of contributed Drupal modules into Subversion vendor branches to prepare for custom development. To do so manually would have been quite the hassle, so after downloading the appropriate tarballs and creating a module_name/current directory under my vendor/drupal/modules vendor branch directory, I concocted this little (relatively untested) script to handle the mass tagging operations I needed to perform.

for i in *; do
    # pull the version string out of the module's .info file, e.g. v5.x-1.0
    v=`grep 'version = "' "$i/current/$i/"*.info |
      cut -d ':' -f 2 |
        sed -e 's/^version = "/v/' -e 's/"$//'`
    svn cp "$i/current" "$i/$v"
done

It’s a bit buggy for some modules that have multiple .info files, but I’m sure a few more pipeline stages can fix that. (Which, because I’m done with this at the moment, I will leave as an exercise to the reader.)
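One possible way to handle the multiple .info files case is to read only the file named after the module itself. This is a sketch under that assumption (that the main file is module_name/current/module_name/module_name.info); module_version is a made-up helper name.

```shell
#!/bin/sh
# module_version MODULE   (hypothetical helper name)
# Prints a tag name such as "v" followed by the version string found in
# MODULE/current/MODULE/MODULE.info, ignoring any other .info files.
module_version() {
    info="$1/current/$1/$1.info"
    [ -f "$info" ] || return 1
    # Print only the text between the quotes, prefixed with "v".
    sed -n 's/^version = "\(.*\)"$/v\1/p' "$info" | head -n 1
}

# Usage, with the same loop as the script above:
#   for i in *; do
#       v=$(module_version "$i") && [ -n "$v" ] && svn cp "$i/current" "$i/$v"
#   done
```

Modules whose main .info file is missing or has no version line are simply skipped by the loop in the usage comment, so they still need to be tagged by hand.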

Chalk this one up as another testament to the power of shell scripting and how it can help every developer get their job done faster.

A Better Expect Subversion Post-Commit Hook

In a previous post I wrote a small expect script to update a remote web server’s deployed code on a new commit to a Subversion repository using Expect and Subversion’s post-commit hooks. That first script was extraordinarily basic, so I’ve been wanting to add some sanity and error checking to it for a while. I finally got around to it today.

This improved version of the post-commit hook does the same thing as the last one (that is, it logs into your web server over SSH with the given user and password, and yes, I’m aware of the scariness of embedding a password in such a way, so you should really set up SSH to use public keys for authentication for this), except now it also produces useful output.

Here’s the same script as before, but improved:

#!/usr/bin/expect -f

# AUTHOR: Meitar Moscovitz 
# DATE  : Thu Jun 21 16:32:42 EDT 2007

set HOST my.web.server
set USER someuser
set PASS xxx

# the working copy we're going to update
set WC /path/to/working/copy

# the path to the svn executable on the remote web server
set SVNBIN /usr/local/bin/svn

# our network is slow, set a long timeout
set timeout 30


# The post-commit hook is invoked after a commit.  Subversion runs
# this hook by invoking a program (script, executable, binary, etc.)
# named 'post-commit' (this file) with the
# following ordered arguments:
#   [1] REPOS-PATH   (the path to this repository)
#   [2] REV          (the number of the revision just committed)
# Note that Subversion does not provide this program with an environment
# of any kind. That means this program lacks a current working directory,
# a home directory, a $PATH, and so on.

set REPOS [lindex $argv 0]
set REV [lindex $argv 1]

# Define error codes
set E_NO_SSH     1 ;# can't find a usable SSH on our system
set E_NO_CONNECT 2 ;# failure to connect to remote server (timed out)
set E_WRONG_PASS 3 ;# password provided does not work
set E_UNKNOWN   25 ;# unexpected failure

# find the SSH binary on our system
if {[file executable /usr/bin/ssh]} {
	set SSHBIN /usr/bin/ssh
} elseif {[file executable /usr/local/bin/ssh]} {
	set SSHBIN /usr/local/bin/ssh
} else {
	send_error "Can't find a usable SSH on this system.\n"
	exit $E_NO_SSH
}

spawn $SSHBIN $USER@$HOST $SVNBIN update $WC

expect {
    "continue connecting (yes/no)? " { send "yes\r"; exp_continue; }
    -nocase "password:" { send "$PASS\r"; }
    timeout {
        send_error "\nWe have timed out after $timeout seconds while trying to connect to $HOST!\n";
        exit $E_NO_CONNECT;
    }
}

expect {
	-nocase "password:" { ;# if we are asked for the password again, then we have provided the wrong password
		send_error "\nCan not log in to $HOST because the password provided for user $USER has been rejected.\n";
		exit $E_WRONG_PASS;
	}
	-re "revision (\[0-9]+)." {
		if {$REV == $expect_out(1,string)} {
			send_user "\nSuccessfully updated $WC on $HOST to revision $REV.\n"
		} else {
			send_user "\nUpdated repository to revision $expect_out(1,string), but svn reports that we are at revision number $REV.\n"
			send_error "CAUTION: Repository updated to revision $expect_out(1,string), but committed revision $REV.\n"
		}
	}
	default {
		send_error "An unexpected error has occurred. The process at spawn ID $spawn_id has produced the following output:\n"
		send_error $expect_out(buffer)
		exit $E_UNKNOWN
	}
}

Use expect with Subversion’s post-commit hook to automatically update remote servers

In one of my web development projects, it became important to keep the staging web server in sync with the latest code that several other developers and I were working on. There are a number of ways to mirror files and directories across machines, rsync being one of the most widely known. However, in addition to simply mirroring the files across the two servers, we also needed a way to kick off the mirroring process that cleanly integrated with our development workflow. Subversion’s post-commit hook allowed us to do just that.

Still, the problem was not exactly straightforward. We needed to execute an svn update command on a server other than the one hosting the Subversion repository. Shell scripts are the obvious choice for command-line automation in UNIX, but they don’t deal with interactive commands very well. So instead of writing a shell script, I wrote an expect script.

This expect script is really basic. There’s a better one in a future post on this topic.

#!/usr/bin/expect -f

# This expect post-commit hook connects to staging-webserver
# and updates the working copy hosted there to the latest checked-in code.
# This means that whenever code is committed to the repository, the web site hosted
# will always be running the latest version of the code.
# AUTHOR: Meitar Moscovitz

# [...]

set REPOS [lindex $argv 0]
set REV [lindex $argv 1]
set HOST staging-webserver
# Use update-user to log in to staging-webserver
# to update the working copy, but this can probably be improved so as not
# to expose this user's password.
set USER update-user
set PASS update-user's-password

spawn /usr/bin/ssh $USER@$HOST svn update /path/to/web/site/directory
expect "continue connecting (yes/no)? "
send "yes\r"
expect "password: "
send "$PASS\r"
expect eof

There’s one really tricky bit to this script, which is the understanding that when Subversion runs the post-commit hook, no environment information is passed to this script. As a result, there is no home directory or path information set for this executable. This is why everything is defined using absolute paths. Also, because there is no home directory for update-user, the expect script will always be prompted by SSH to re-verify the server’s identity. So rather than just expecting “password: ”, we always expect “continue connecting (yes/no)? ” and say yes, and then send our password.

Note, of course, that update-user should be a user with limited access to the system, yet enough so that he may update the working copy on the web server. I’m sure there is probably a more secure way of doing this as well, so any sort of feedback on securing this or scaling it would be welcome.
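One commonly used way to avoid embedding a password at all is SSH public key authentication, which also removes the need for expect because nothing is interactive any more. The following is a sketch only: the key path, the known_hosts path, and the update_remote function name are all hypothetical placeholders, and it assumes update-user's public key has already been installed on the web server.

```shell
#!/bin/sh
# Sketch of a key-based post-commit hook. Every value below is a placeholder.
SSHBIN=${SSHBIN:-/usr/bin/ssh}        # absolute path, since hooks get no $PATH
SVNBIN=/usr/local/bin/svn             # svn on the remote web server
KEY=/path/to/deploy_key               # hypothetical private key without a passphrase
KNOWN_HOSTS=/path/to/known_hosts      # hypothetical file holding the server's host key
REMOTE=update-user@staging-webserver
WC=/path/to/web/site/directory

# update_remote is a hypothetical helper name.
update_remote() {
    # BatchMode=yes makes ssh fail instead of prompting, so the hook can
    # never hang waiting for a password or a yes/no answer.
    # UserKnownHostsFile points at an explicit file because the hook has
    # no home directory to find the default one in.
    "$SSHBIN" -i "$KEY" -o BatchMode=yes -o UserKnownHostsFile="$KNOWN_HOSTS" \
        "$REMOTE" "$SVNBIN" update "$WC"
}
```

To use this as the hook itself, add a call to update_remote as the last line of the file. The server's host key has to be recorded in the known_hosts file beforehand, otherwise ssh in batch mode will most likely refuse to connect rather than ask.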