Revision control

From Wikipedia, the free encyclopedia

Revision control (also known as version control, source control or (source) code management (SCM)) is the management of multiple revisions of the same unit of information. It is most commonly used in engineering and software development to manage ongoing development of digital documents like application source code, art resources such as blueprints or electronic models and other critical information that may be worked on by a team of people. Changes to these documents are identified by incrementing an associated number or letter code, termed the "revision number", "revision level", or simply "revision" and associated historically with the person making the change. A simple form of revision control, for example, has the initial issue of a drawing assigned the revision number "1". When the first change is made, the revision number is incremented to "2" and so on.

Software tools for revision control are increasingly recognized as being necessary for almost all software development projects.[citation needed]

Overview

Engineering revision control developed from formalized processes based on tracking revisions of early blueprints or bluelines. Implicit in this control was the ability to return to any earlier state of the design, for cases in which an engineering dead-end was reached in the development of the design. Likewise, in computer software engineering, revision control is any practice that tracks and provides control over changes to source code. Software developers sometimes use revision control software to maintain documentation and configuration files as well as source code. In theory, revision control can be applied to any type of information record. However, in practice, the more sophisticated techniques and tools for revision control have rarely been used outside of software development circles (though they could actually be of benefit in many other areas). They are beginning to be used for the electronic tracking of changes to CAD files, supplanting the "manual" electronic implementation of traditional revision control.

As software is developed and deployed, it is extremely common for multiple versions of the same software to be deployed in different sites, and for the software's developers to be working simultaneously on updates. Bugs and other issues with software are often only present in certain versions (because of the fixing of some problems and the introduction of others as the program develops). Therefore, for the purposes of locating and fixing bugs, it is vitally important to be able to retrieve and run different versions of the software to determine in which version(s) the problem occurs. It may also be necessary to develop two versions of the software concurrently (for instance, where one version has bugs fixed, but no new features, while the other version is where new features are worked on).

At the simplest level, developers could retain multiple copies of the different versions of the program and number them appropriately. This approach has been used on many large software projects. While it can work, it is inefficient because many near-identical copies of the program have to be maintained. It also requires a lot of self-discipline on the part of developers and often leads to mistakes. Consequently, systems that automate some or all of the revision control process have been developed.

Compression

Most revision control software uses delta compression, which retains only the differences between successive versions of files. This allows more efficient storage of many different versions of files.
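
The idea can be illustrated with a minimal sketch in Python. It is not the storage format of any particular tool: the function names `make_delta` and `apply_delta` and the copy/insert encoding are assumptions made for this example, built on the standard `difflib` module. Only lines that differ from the previous revision are stored literally; unchanged runs are recorded as ranges.

```python
from difflib import SequenceMatcher


def make_delta(old_lines, new_lines):
    """Encode new_lines as copy/insert operations relative to old_lines."""
    delta = []
    matcher = SequenceMatcher(a=old_lines, b=new_lines, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            # Unchanged run: store only the range, not the text itself.
            delta.append(("copy", i1, i2))
        elif tag in ("replace", "insert"):
            # New or changed text has to be stored literally.
            delta.append(("insert", new_lines[j1:j2]))
        # "delete" needs no entry: the old lines are simply never copied.
    return delta


def apply_delta(old_lines, delta):
    """Rebuild the newer revision from the older one plus its delta."""
    result = []
    for op in delta:
        if op[0] == "copy":
            result.extend(old_lines[op[1]:op[2]])
        else:
            result.extend(op[1])
    return result


rev1 = ["line one", "line two", "line three"]
rev2 = ["line one", "line 2", "line three", "line four"]
delta = make_delta(rev1, rev2)
assert apply_delta(rev1, delta) == rev2
```

Here the delta stores two literal lines instead of the four lines of the second revision; the saving grows with file size when changes between revisions are small.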

Storage models

In most software development projects, multiple developers work on the program at the same time. If two developers try to change the same file at the same time without some method of managing access, they may easily end up overwriting each other's work. Revision control systems solve this problem using one of three "storage models": file locking, version merging, and distributed version control.

Traditionally, revision control systems have used a centralized model, where all the revision control functions are performed on a shared server.

The merits and drawbacks of file locking are hotly debated. It can provide some protection against difficult merge conflicts when a user is making radical changes to many sections of a large file (or group of files). But if the files are left exclusively locked for too long, other developers can be tempted to simply bypass the revision control software and change the files locally anyway. That can lead to more serious problems.

Some systems attempt to manage who is allowed to make changes to different aspects of the program, for instance, allowing changes to a file to be checked by a designated reviewer before being added.

File locking

The simplest method of preventing "concurrent access" problems is to lock files so that only one developer at a time has write access to the central "repository" copies of those files. Once one developer "checks out" a file, others can read that file, but no one else is allowed to change that file until that developer "checks in" the updated version (or cancels the checkout).
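
The check-out and check-in cycle described above might be sketched as follows. This is a toy in-memory model, not the interface of any real revision control system; the class `LockingRepository` and its method names are hypothetical.

```python
class LockingRepository:
    """Toy in-memory repository with exclusive check-out locks (hypothetical API)."""

    def __init__(self):
        self.files = {}  # path -> current content
        self.locks = {}  # path -> user currently holding the lock

    def read(self, path):
        """Anyone may read a file, whether or not it is locked."""
        return self.files.get(path, "")

    def check_out(self, path, user):
        """Take the exclusive write lock and return the current content."""
        holder = self.locks.get(path)
        if holder is not None and holder != user:
            raise PermissionError(f"{path} is locked by {holder}")
        self.locks[path] = user
        return self.read(path)

    def check_in(self, path, user, content):
        """Store the new content and release the lock."""
        if self.locks.get(path) != user:
            raise PermissionError(f"{user} does not hold the lock on {path}")
        self.files[path] = content
        del self.locks[path]

    def cancel_check_out(self, path, user):
        """Release the lock without changing the file."""
        if self.locks.get(path) == user:
            del self.locks[path]


repo = LockingRepository()
repo.check_out("main.c", "alice")
try:
    repo.check_out("main.c", "bob")  # refused while alice holds the lock
except PermissionError as err:
    print(err)
repo.check_in("main.c", "alice", "int main(void) { return 0; }\n")
```

After the check-in the lock is released, so a later `repo.check_out("main.c", "bob")` would succeed.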

Version merging

Most version control systems, such as CVS, allow multiple developers to edit the same file at the same time. The first developer to "check in" changes to the central repository always succeeds. The system provides facilities to merge later changes into the central repository, so that the first developer's improvements are preserved when the other programmers check in.

The concept of a "reserved edit" can provide an optional means to explicitly lock a file for exclusive write access, even though a merging capability exists.
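
One common way to keep the first developer's work from being overwritten is to reject a check-in that was not based on the latest revision, so that the second developer must update and merge first. The sketch below models that rule for a single file. The class `MergingRepository`, the exception `OutOfDateError`, and the hand-written merge result are all assumptions for illustration, not the behavior of a specific tool.

```python
class OutOfDateError(Exception):
    """Raised when a check-in is based on a revision that is no longer the latest."""


class MergingRepository:
    """Toy single-file repository without locks (hypothetical API)."""

    def __init__(self, content=""):
        self.revisions = [content]  # revisions[0] is revision 1

    def latest(self):
        """Return (revision number, content) of the newest revision."""
        return len(self.revisions), self.revisions[-1]

    def check_in(self, base_revision, content):
        """Accept the change only if it was made against the newest revision."""
        if base_revision != len(self.revisions):
            raise OutOfDateError("update and merge before checking in")
        self.revisions.append(content)
        return len(self.revisions)


repo = MergingRepository("a\nb\n")
alice_base, alice_text = repo.latest()
bob_base, bob_text = repo.latest()

repo.check_in(alice_base, "A\nb\n")        # the first check-in succeeds
try:
    repo.check_in(bob_base, "a\nB\n")      # Bob's copy is now out of date
except OutOfDateError:
    bob_base, latest_text = repo.latest()  # Bob updates his working copy,
    merged = "A\nB\n"                      # merges by hand (assumed result),
    repo.check_in(bob_base, merged)        # and checks in again
```

The repository ends up with three revisions, and the final one contains both developers' edits.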

Distributed version control

Distributed systems inherently allow multiple developers to edit simultaneously. In a distributed revision control model, there is no such thing as checking in or out. Instead, every programmer has a working copy that includes the complete repository. All changes are distributed by merging (pushing and pulling) between repositories. This mode of operation allows developers to work without a network connection, and it also gives developers full revision control capabilities without requiring permissions to be granted by a central authority. One of the leading proponents of distributed revision control is Linus Torvalds, the main developer of the Linux kernel. He created Git, the distributed version control system now used by the Linux kernel developers.

Distributed version control systems include TeamWare, BitKeeper, Wandisco and Bazaar. Other distributed revision control systems are listed in a comparison of revision control software.
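
The pull operation between two complete repositories can be sketched roughly as follows. This is a deliberately simplified model and not the object format or protocol of Git or any other system named above: the class `Repository`, the way commit identifiers are derived, and the three return values of `pull` are assumptions for illustration. A real system would go on to merge diverged histories; this sketch only detects them.

```python
import hashlib


class Repository:
    """Toy repository holding its complete history locally (hypothetical API)."""

    def __init__(self):
        self.commits = {}  # commit id -> (parent id, content)
        self.head = None   # id of the newest local commit

    def commit(self, content):
        """Record a new revision locally; no network or server is involved."""
        parent = self.head
        commit_id = hashlib.sha1(f"{parent}:{content}".encode()).hexdigest()
        self.commits[commit_id] = (parent, content)
        self.head = commit_id
        return commit_id

    def _is_ancestor(self, ancestor, descendant):
        """True if `ancestor` is reachable from `descendant` via parent links."""
        commit_id = descendant
        while commit_id is not None:
            if commit_id == ancestor:
                return True
            commit_id = self.commits[commit_id][0]
        return ancestor is None  # the empty history precedes everything

    def pull(self, other):
        """Copy the other repository's history and advance our head if possible."""
        self.commits.update(other.commits)
        if self._is_ancestor(other.head, self.head):
            return "up-to-date"    # we already have everything they have
        if self._is_ancestor(self.head, other.head):
            self.head = other.head
            return "fast-forward"  # they are strictly ahead of us
        return "diverged"          # both sides changed; a merge is needed


alice, bob = Repository(), Repository()
alice.commit("v1")
bob.pull(alice)          # bob now holds the full history
bob.commit("v2")         # bob works offline
status = alice.pull(bob)
```

Neither repository is special: either side can pull from the other, and each keeps working when the other is unreachable.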

Integration

Some of the more advanced revision control tools offer many other facilities, allowing deeper integration with other tools and software engineering processes. Plugins are often available for IDEs such as IntelliJ IDEA, Eclipse and Visual Studio. NetBeans IDE comes with integrated version control support. AccuRev is one vendor example with its AccuBridge technology.

Common vocabulary

[1][2]

Repository 
The repository is where the current and historical file data is stored, often on a server. Sometimes also called a depot (e.g. with SVK, AccuRev and Perforce).
Working copy
The working copy is the local copy of files from a repository, at a specific time or revision. All work done to the files in a repository is initially done on a working copy, hence the name. Conceptually, it is a sandbox.
Check-out 
A check-out (or checkout or co) creates a local working copy from the repository. Either a specific revision is specified, or the latest is obtained.
Commit 
A commit (check-in, ci or, more rarely, install or submit) occurs when a copy of the changes made to the working copy is written or merged into the repository.
Change 
A change (or diff, or delta) represents a specific modification to a document under version control. The granularity of the modification considered a change varies between version control systems.
Change list 
On many version control systems with atomic multi-change commits, a changelist, change set, or patch identifies the set of changes made in a single commit. This can also represent a sequential view of the source code, allowing source to be examined as of any particular changelist ID.
Update 
An update (or sync) merges changes that have been made in the repository (e.g. by other people) into the local working copy.
Branch 
A set of files under version control may be branched or forked at a point in time so that, from that time forward, two copies of those files may be developed at different speeds or in different ways, independently of each other.
Merge 
A merge or integration brings together two sets of changes to a file or set of files into a unified revision of that file or files.
  • This may happen when one user, working on those files, updates their working copy with changes that other users have made and checked into the repository. Conversely, the same process may happen in the repository when a user tries to check in their changes.
  • It may happen after a set of files has been branched, then a problem that existed before the branching is fixed in one branch and this fix needs merging into the other.
  • It may happen after files have been branched, developed independently for a while and then are required to be merged back into a single unified trunk.
Dynamic stream 
A stream (a data structure that implements a configuration of the elements in a particular repository) whose configuration changes over time, with new versions promoted from child workspaces and/or from other dynamic streams. It also inherits versions from its parent stream.
Revision 
A revision or version is one version in a chain of changes.
Tag 
A tag or release refers to an important snapshot in time, consistent across many files. These files at that point may all be tagged with a user-friendly, meaningful name or revision number.
Import 
An import is the action of copying a local directory tree (that is not currently a working copy) into the repository for the first time.
Export 
An export is similar to a check-out except that it creates a clean directory tree without the version control metadata used in a working copy. Often used prior to publishing the contents.
Conflict 
A conflict occurs when two changes are made by different parties to the same document or place within a document. When the software is not intelligent enough to decide which change is 'correct', a user is required to resolve such a conflict.
Resolve 
The act of user intervention to address a conflict between different changes to the same document.
Baseline 
An approved revision of a document or source file from which subsequent changes can be made.
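
The terms merge, conflict and resolve can be tied together with a small sketch of a three-way merge at whole-file granularity. The function `three_way_merge` and the representation of a revision as a mapping from path to content are assumptions for this example; real tools usually merge at the level of lines within a file.

```python
def three_way_merge(base, ours, theirs):
    """Merge two sets of changes made against a common base revision.

    Each argument maps a file path to its content; a missing path means
    the file does not exist in that revision. Returns the merged mapping
    and a list of paths that are in conflict.
    """
    merged, conflicts = {}, []
    for path in sorted(set(base) | set(ours) | set(theirs)):
        b, o, t = base.get(path), ours.get(path), theirs.get(path)
        if o == t:
            result = o              # both sides agree (or both deleted it)
        elif o == b:
            result = t              # only their side changed the file
        elif t == b:
            result = o              # only our side changed the file
        else:
            conflicts.append(path)  # both changed it differently
            continue
        if result is not None:
            merged[path] = result
    return merged, conflicts


base = {"a.txt": "1", "b.txt": "1", "c.txt": "1"}
ours = {"a.txt": "2", "b.txt": "1", "c.txt": "2"}
theirs = {"a.txt": "1", "b.txt": "3", "c.txt": "4"}
merged, conflicts = three_way_merge(base, ours, theirs)
```

In this example `a.txt` and `b.txt` merge automatically because only one side changed each, while `c.txt` is reported as a conflict and left out of the result until a user resolves it.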

References

  1. Collins-Sussman, Ben; Fitzpatrick, B.W.; Pilato, C.M. (2004). Version Control with Subversion. O'Reilly. ISBN 0-596-00448-6.
  2. Wingerd, Laura (2005). Practical Perforce. O'Reilly. ISBN 0-596-10185-6.
