Minor Problems During CVS to Subversion Conversion

This past weekend, I converted our legacy CVS repository over to Subversion, preserving the last 5 years' worth of history. Overall, I am very pleased with the process and the end result. Over the past few weeks, I had done portions of the conversion into a test repository and worked out most of the kinks.

Here is the process we used:

  • Set the CVS repositories to read-only (using the readers / writers configuration files)
  • Export each CVS module using the cvs2svn.py script provided by Subversion
  • Create any directories in Subversion needed to receive the exported CVS data
  • Import the files created by cvs2svn.py

Most of these steps were scripted and ran unattended with me checking on them occasionally.
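The script itself is not reproduced here, but a minimal sketch of the loop, assuming a dumpfile-based load (cvs2svn's --dumpfile option plus svnadmin load --parent-dir) and entirely placeholder paths, URLs, and module names, looks like this:

```shell
# Sketch of the per-module conversion loop. Paths, URLs, and module
# names are placeholders, not our real layout; DRYRUN=1 prints each
# command instead of running it.
CVSROOT=/var/cvs
SVNREPO=/var/svn/main
SVNURL=http://svnserver/repos/main
DRYRUN=${DRYRUN:-1}

run() {
    if [ "$DRYRUN" = "1" ]; then
        echo "$1"      # dry run: show the command only
    else
        eval "$1"      # real run
    fi
}

for module in moduleA moduleB; do
    # export the CVS module to a Subversion dumpfile
    run "cvs2svn.py --dumpfile=/tmp/$module.dump $CVSROOT/$module"
    # create the receiving directory, then load the dump beneath it
    run "svn mkdir -m 'Create $module' $SVNURL/$module"
    run "svnadmin load --parent-dir $module $SVNREPO < /tmp/$module.dump"
done
```

Setting DRYRUN=0 would execute the commands for real; eyeballing the echoed commands first is how I caught most problems in the test runs.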

As with all plans, a couple of minor issues cropped up. The export using cvs2svn.py went smoothly (it started at 7:30 pm Friday and ended sometime around 2:00 am Saturday). When I examined the output of the scripts, I discovered that 3 exports had failed: one because I misspelled the name of the CVS module directory, and the other 2 because of an error of the form “A CVS repository cannot contain both repo/path/file.txt,v and repo/path/Attic/file.txt,v”. The cvs2svn FAQ discusses this error. After looking at the files in question, I chose option 4 (rename the Attic version) and was able to re-export.

I also noticed that some of the exports did not result in a file being created. Looking at these, I discovered that while the module directories I was exporting from CVS existed, they were empty. I could safely ignore these.

All of my imports failed because the target directory did not exist in the repository. I thought I had accounted for this by including the appropriate svn mkdir command before each import. Looking through the logs, I discovered that all of the mkdir commands had failed for lack of the proper authorization credentials: I was running my scripts from an account where I did not have credentials cached. I added the appropriate --username and --password options and let the script run.

In the end, most of the imports ran successfully, and we have had no issues since starting to use the repository. Most of the team is pleased with the results of the conversion and the new tools. All in all, a good experience.


CVS to Subversion: Line Endings and Bad Binaries

We are in the process of converting our CVS repository to Subversion. The provided cvs2svn script works really well. As part of the conversion, we did a dry run and have been performing a variety of tests to verify that everything converted correctly. One of the tests was to check out an arbitrary branch from both the CVS and SVN repositories and compare the contents.

The comparison did not go as smoothly as I had hoped. Quite a few of the files did not match. The investigation resulted in a change we needed to make to the CVS repository and in the discovery of a nice feature of Subversion.

It turned out that our CVS repository had multiple binary files that were not marked as binary. There was always a chance that these files would be corrupted when checked out of CVS, but we never encountered this. (That is not entirely true: we never encountered corruption when building on Unix, but we had seen issues when building on Windows.) After correcting the keyword-expansion setting, the conversion process worked fine.

The second issue had to do with DOS vs. Unix line endings. Subversion does a better job of handling the various line endings and converting them appropriately (via its svn:eol-style property). While this made comparing files difficult, the end result is very beneficial.


Search and Replace Across Multiple Files (Perl)

I suggest that testers learn at least one scripting language. The first one I learned was Perl. Yesterday, I was reminded why a little bit of Perl can save a lot of time (and eliminate the possibility of human error). Due to some changes in our corporate email system, we needed to change several hundred data files from using one email address to a different email address. While there are many ways to do this (editor macros, shell tools like sed, etc.), I decided some Perl executed from the command line would be appropriate.

I started with:

perl -p -i -w -e 's/oldemail\@meesqa.com/newemail\@meesqa.com/g;' *.data

For those unfamiliar with perl, this line can be explained pretty quickly:
perl — executes the perl interpreter
-p — wrap the one-line program in an implicit loop: read each line of input, run the code against it, then print the (possibly modified) line.
-i — edit the files "in place", changing the files themselves. You can keep a backup copy by giving -i an extension (for example -i.bak); I didn't, because all of the data files were checked into source code control.
-w — enable warnings
-e (followed by some perl code) — the one-line program to execute
's/searchvalue/replacevalue/g' — this is the generic form of the program. The g at the end applies the expression globally, replacing every instance of searchvalue with replacevalue. In the code above, I needed to escape the @ character so perl would not try to interpolate it as an array.
*.data — Finally, what input to pass into the program, in this case all files that match the pattern *.data

After running this in one directory and examining the output, I realized that we had not been consistent with the case of the email domain. While I had fixed all of the instances of @meesqa.com, I had missed @MEESQA.COM (and any other permutation of upper and lower case). Easy enough. I made the following minor change:

perl -p -i -w -e 's/oldemail\@meesqa.com/newemail\@meesqa.com/gi;' *.data

The i added after the /g tells perl to ignore case. After rerunning, all of the email addresses in that directory had been changed appropriately.

That fixed one directory of files. However, the rest of the files were scattered across a deep hierarchy of directories. Since I was doing this all on Unix, I executed the following line:

find . -name '*.data' | xargs perl -p -i -w -e 's/oldemail\@meesqa.com/newemail\@meesqa.com/gi;'

The find . -name '*.data' creates a list of all of the *.data files in the current directory and its subdirectories (the pattern is quoted so the shell passes it to find unexpanded rather than matching it against files in the current directory first). This list is piped to the next command. The xargs command passes the piped list of filenames as arguments to the command that follows it, in this case my perl program. Running this was fairly quick. I verified the changes and checked them into source control, reminded that a little scripting can go a long way.