28 March 2012

Bazaar for Version Control

One of the MATSIQEL RDM requirements is that software must support multiple versions of the research data. Due to data protection and ethical constraints, only some project partners may see all the research data; others may see and update the data, whilst others are denied access to the raw (partially anonymised) data but may see the processed data.
These requirements, especially version control, led us to consider the version control software used in software development. Such software, like the versioning built into Microsoft's SharePoint, also provides differential user access rights and permissions. However, SharePoint is a commercial product, so it cannot be recommended for unfunded work (i.e. when there is no funding for IT software or hardware).
Since most of the project team are not computer scientists, the software needs the lowest possible barrier to entry, or it will not be used. That is, in addition to satisfying the technical requirements, data management software needs to be:
· Easy to use
· Free
· Multiplatform (Windows, Mac, Linux)
We examined several open source version control products to identify one to complement our case study.
Git is the most popular distributed version control system. Written by Linus Torvalds, it is used to manage development of the Linux kernel. It is (reputedly) fast, and projects that use it can be hosted free on GitHub. Git, however, is designed for efficient software development and so saves versions of files as collections of incremental changes on a base file. That is, any particular version is assembled from pieces. This runs counter to the MATSIQEL requirements, where versions of research data arrive externally and are not necessarily increments.
The alternative we evaluated is Bazaar. Bazaar is version control software 'for everyone'. Sponsored by Canonical and used to develop Ubuntu Linux, Bazaar claims:
· "Version control for everyone
· Work offline
· Any workflow
· Cross platform support
· Rename tracking and smart merging
· High storage efficiency and speed
· Any workspace model
· Plays well with others"

Bazaar is also a distributed version control system. This avoids reliance on a single central bottleneck and allows multiple workflow styles. In particular, it is straightforward to set up a web-based repository that end users can access freely and appropriately. Bazaar has several graphical clients that integrate well with Windows. Most ordinary users will be able to use Bazaar version control, and access centrally stored data, with minimal impact on their usual workflow.
Bazaar simply stores its version history in hidden subdirectories (named .bzr) alongside the data. Graphical version histories are readily available. Also, since Bazaar does not rely on a proprietary storage mechanism, a Bazaar repository may be zipped, archived (e.g. in SharePoint), and revived intact as needed.
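As a rough illustration of the zip, archive and revive idea, the sketch below uses only Python's standard library. It assumes the repository is an ordinary directory whose history lives in a hidden .bzr subdirectory. The folder and file names (survey_data, responses.csv) and the two helper functions are made up for the example and are not part of Bazaar or of the MATSIQEL set-up.

```python
import shutil
import tempfile
from pathlib import Path


def archive_branch(branch_dir: Path, archive_base: Path) -> Path:
    """Zip a whole working tree, including the hidden .bzr directory."""
    # make_archive appends ".zip" to the base name and returns the archive path.
    return Path(shutil.make_archive(str(archive_base), "zip", root_dir=str(branch_dir)))


def restore_branch(archive_path: Path, target_dir: Path) -> Path:
    """Unpack a previously archived tree into target_dir."""
    shutil.unpack_archive(str(archive_path), extract_dir=str(target_dir))
    return target_dir


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        tmp_path = Path(tmp)

        # Stand-in for a real Bazaar branch: a data file plus a .bzr folder.
        # (Hypothetical contents; a real .bzr directory is created by Bazaar.)
        branch = tmp_path / "survey_data"
        (branch / ".bzr").mkdir(parents=True)
        (branch / ".bzr" / "README").write_text("placeholder metadata\n")
        (branch / "responses.csv").write_text("id,score\n1,4\n")

        archive = archive_branch(branch, tmp_path / "survey_data_backup")
        restored = restore_branch(archive, tmp_path / "restored")

        # List everything that came back, hidden directory included.
        for path in sorted(restored.rglob("*")):
            print(path.relative_to(restored).as_posix())
```

The point of the sketch is only that nothing Bazaar-specific is needed to back up or move a repository: copying the directory, hidden folder included, should carry the history with it.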
Bazaar may be simply configured on cloud-based web servers, which may be set up with the access controls needed to grant the differential access rights that the MATSIQEL project requires. In summary, Bazaar is a multiplatform product that fulfils the requirements for research data management in our case study project, MATSIQEL, since it supports versioned repositories with controlled access.
Posted on behalf of Jeremy Ellman
