Open source software development

From Wikipedia, the free encyclopedia

Open source software development is the process by which open source software (or similar software whose source is publicly available) is developed.

Types of open source development

Several types of tasks are generally associated with the development of open source software:

Writing code

This task involves working on the program's source code: fixing bugs, adding new functionality, refactoring, and so on. It is probably the most prestigious of the activities that fall under the umbrella of open source development.

Documentation

This task involves documenting open source programs or libraries. It may involve creating full-coverage reference documentation, writing how-tos, tips or tutorials, or producing other types of documentation.

Localization and translations

This task involves translating the messages emitted by the program and the text that the user sees in the program's graphical user interface.

It should not be confused with internationalization, in which a program that is not necessarily localized is adapted so that it can process text in different (mainly non-English) human languages. If the program is not already internationalized, internationalizing it usually requires modifications to the code (and so falls under actual programming). Translation and localization, by contrast, can be done without much programming.

Translation can also cover the program's documentation.
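The split between internationalization (a code change) and translation (a data change) can be illustrated with a minimal sketch. The catalog layout and the `translate` helper below are hypothetical and are not taken from any particular project or library; real programs typically use a dedicated message-catalog system instead.

```python
# Hypothetical message catalogs: translators edit only this data, not the code.
CATALOGS = {
    "de": {"File not found": "Datei nicht gefunden"},
    "fr": {"File not found": "Fichier introuvable"},
}


def translate(message, language):
    """Return the translated message, falling back to the original text."""
    return CATALOGS.get(language, {}).get(message, message)


# Internationalized code routes every user-visible string through translate().
print(translate("File not found", "de"))
print(translate("File not found", "es"))  # no Spanish catalog: falls back
```

Adding a new language in this sketch means adding an entry to `CATALOGS`, with no change to the program logic.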

Packaging

Open source software is, by its nature, often deployed on a large number of operating systems and distributions. Packaging involves preparing a working source or binary package for the program so that it can be deployed more easily on such systems.

Bug reports and feature requests

This type of development involves reporting software bugs or submitting feature requests to the developers, who then record them (typically in a bug tracker) for later resolution.

Infrastructure

This involves the various tasks of maintaining the project's online or offline infrastructure: managing the project's website, download area, bug tracker and version control system, arranging physical meetings of the developers, and so on.

Answering questions

This task involves providing knowledgeable answers to questions raised by people who are trying to use the open source project. (See also the "How To Ask Questions The Smart Way" document.)

Other types

Other types of activities may also fall under the umbrella of open source development.

Types of open source projects

One can distinguish several types of open source projects. First, there are ordinary software programs and libraries: standalone pieces of code, some of which depend on other open source projects. These projects serve a specified purpose and fill a definite need. Examples include the Linux kernel, the Firefox web browser and the OpenOffice.org office suite.

Distributions are another type of open source project. Distributions are collections of software that are published from the same source with a common purpose. The most prominent example of a "distribution" is an operating system. There are a large number of Linux distributions (such as Debian, Fedora Core, Mandriva and Slackware) which ship the Linux kernel along with many user-land components. There are also other distributions, like ActivePerl, a distribution of the Perl programming language for various operating systems, and even the OpenCD and Cygwin distributions of open source programs for Microsoft Windows.

Other open source projects, like the BSD derivatives, maintain the source code of an entire operating system (the kernel and all of its core components) in one revision control system, developing the entire system together as a single team. These operating system development projects integrate their tools more closely than the distribution-based systems do.

Finally, there is the book or standalone document project. These items are usually not shipped as part of an open source software package. The Linux Documentation Project hosts many such projects, which document various aspects of the GNU/Linux operating system. There are many other examples of this type of open source project.

Starting an open source project

There are several ways in which work on an open source project can start:

  1. An individual who senses the need for a project announces the intent to develop the project in public. The individual may receive offers of help from others. The group may then proceed to work on the code.
  2. A developer working on a limited but working codebase releases it to the public as the first version of an open source program. The developer continues to improve it, and is possibly joined by other developers.
  3. The source code of a mature project is released to the public after being developed as proprietary or in-house software.
  4. A well-established open-source project can be forked by an interested outside party. Several developers can then start a new project, whose source code then diverges from the original.

Eric Raymond observed in his famous essay "The Cathedral and the Bazaar" that announcing the intent for a project is usually inferior to releasing a working project to the public.

A common mistake is to start a new project when contributing to an existing, similar project would be more effective (the "not invented here" syndrome). Before starting a project, it is very important to investigate what already exists.

Participants in OSS development projects

Participants in OSS development projects fall into two broad categories: the Core and the Peripheral.

The Core or Inner Circle are developers who modify the primary code that constitutes the project.

The Peripheral usually consists of users of the software. They report bugs, submit fixes, and suggest changes.

The participants can be divided into the following:

  1. Project leaders who have the overall responsibility (Core). Most of them might have been involved in coding the first release of the software. They control the overall direction of individual projects.
  2. Volunteer developers (Core / Periphery) who do actual coding for the project. These include:
    • Senior members with broader overall authority
    • Peripheral developers producing and submitting code fixes
    • Occasional contributors
    • Maintainers who work on different aspects of the project
  3. Everyday users (Periphery) who perform testing, identify bugs, deliver bug reports, etc.
  4. Posters (Periphery) who participate frequently in newsgroups and discussions, but do not do any coding.

Projects often exhibit an early geographical trend, even if there is international interest. For example, most of the core founders of the KDE desktop environment were German.

Tools used for open source development

Communication channels

Developers and users of an open source project do not necessarily work in physical proximity. They require some electronic means of communication.

E-mail

E-mail is one of the most common forms of communication among open source developers and users. Often, electronic mailing lists are used to make sure that e-mail messages are delivered to all interested parties at once, so that any member can reply (in private or to the whole mailing list).

A small project may have only one mailing list, but as it grows it often spawns several, each for a different purpose. Common mailing list purposes include:

  • Announcements - a low-volume mailing list dedicated to project announcements, usually with a restricted or moderated posting policy.
  • Commits - a mailing list to which all check-ins to the revision control system are sent for review by peer developers.
  • Development - a mailing list dedicated to discussing the development of the code itself, as opposed to making use of the product.
  • User - a mailing list dedicated to helping users of the product with their problems.

Instant messaging

In order to communicate in real time, many projects use an instant messaging method such as IRC (although there are many others available). IRC is especially suitable because the project can set up one or more IRC channels for discussions among its participants as well as for users to get help. The Freenode IRC network has been especially popular for hosting channels for open source projects. There has been a lot of activity on other networks, some of which are also dedicated to open-source projects. Sometimes a project will use communication channels on more than one network.

Developers also communicate using other instant messaging protocols, but IRC seems to be preferred. Many developers like the ease and transparency of IRC's multi-person chatrooms.

Web forums

Web forums have recently become a common way for users to get help with problems they encounter when using an open source product. To a lesser extent, they have been useful as ways for developers to communicate regarding the development of the core code, but most hardcore and experienced developers still tend to prefer e-mails over web forums.

Wikis

Wikis have become common as a communication medium for developers and users. They are used to collaboratively edit documents and keep track of other resources. Since the web was a somewhat late introduction to the open source development scene, and wikis even more so, the concept is still not as common as it could potentially become. Wikis often pose problems as a communication channel, because it is harder to hold a dialog using them. They are therefore mostly used as a resource for easy-to-modify collaborative documents.

Software engineering tools

Version control systems

Main article: Revision control

In OSS development the participants, who are mostly volunteers, are distributed across different geographic regions, so tools are needed to help them collaborate on the source code.

Concurrent Versions System (CVS) is a prominent example of a source code collaboration tool used in OSS projects. CVS helps manage the files and code of a project when several people work on it at the same time, and allows several people to work on the same file simultaneously. This is done by copying the file into each user's working directory and then merging the changes when the users are done. CVS also makes it easy to retrieve a previous version of a file.
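The idea of merging two users' independent edits against a common original can be sketched as follows. This is a deliberately naive illustration, not the algorithm CVS uses: the hypothetical `merge_lines` function assumes that both users only changed lines in place (no insertions or deletions), whereas real tools use a more general text merge.

```python
def merge_lines(base, mine, theirs):
    """Naive line-by-line three-way merge for files of equal length."""
    if not (len(base) == len(mine) == len(theirs)):
        raise ValueError("this sketch only handles in-place line edits")
    merged = []
    for original, a, b in zip(base, mine, theirs):
        if a == b:
            merged.append(a)   # both copies agree (possibly unchanged)
        elif a == original:
            merged.append(b)   # only the other user changed this line
        elif b == original:
            merged.append(a)   # only this user changed this line
        else:
            raise ValueError(f"conflict: {a!r} versus {b!r}")
    return merged


base = ["int x = 1;", "int y = 2;", "return x + y;"]
alice = ["int x = 10;", "int y = 2;", "return x + y;"]
bob = ["int x = 1;", "int y = 2;", "return x * y;"]
print(merge_lines(base, alice, bob))
```

When both users change the same line in different ways, the sketch raises an error; this corresponds to a merge conflict that a person has to resolve by hand.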

The Subversion revision control system (svn) was created to replace CVS. It is quickly gaining ground as an OSS project version control system.

There are many other version control systems.

Bug trackers and task lists

Most large-scale projects require a bug tracker (usually web-based or otherwise Internet-based) to keep track of the status of various issues in the development of the project. A simple text file is not sufficient, because such projects have many bugs and because they wish to make it easy for users and secondary developers to report and maintain bugs. Some popular bug trackers include:

  • Bugzilla - a sophisticated bug tracker from the Mozilla house. Web-based.
  • Mantis - a web-based PHP/MySQL bug tracker.
  • Trac - integrates a bug tracker with a wiki and an interface to the Subversion version control system.
  • Request Tracker - written in Perl. Provided by default for CPAN modules - see rt.cpan.org.
  • GNATS - the GNU bug tracking system.
  • SourceForge and its forks provide a bug tracker as part of their services. As a result, many projects hosted at SourceForge.net and similar services default to using it.

Build tools

Other tools

Web sites

Download areas

Common development methodologies

Refactoring, rewrites and other revamps

Open source developers often feel that their code requires a revamp. This can be either because the code was written or maintained without proper refactoring (as is often the case if the code was inherited from a previous developer), or because a proposed enhancement or extension cannot be cleanly implemented with the existing codebase. A final reason for wishing to revamp the code is that it "smells bad" (to quote Martin Fowler's Refactoring book) and does not meet the developer's standards.

There are several kinds of revamps:

  1. Refactoring implies that code is moved from one place to another, methods, functions or classes are extracted, duplicate code is eliminated and so forth, all while preserving the integrity of the code. Such refactoring can be done in small amounts (so-called "continuous refactoring") to prepare for a certain change, or one can decide on a large refactoring of existing code that lasts for several days or weeks.
  2. "Partial rewrites" involve rewriting a certain part of the code from scratch, while keeping the rest of the code. Such partial rewrites have been common in the Linux kernel development, where several subsystems were rewritten or re-implemented from scratch, while keeping the rest of the code intact.
  3. Complete rewrites involve starting the project from scratch, while possibly still making use of some old code. A good example of a complete rewrite was the Subversion version control system, whose developers started from scratch: they believed the codebase of CVS (an older attempt at creating a version control system) was useless and needed to be completely scrapped. Another good example of such a rewrite was the Apache web server, which was almost completely rewritten between version 1.3.x and version 2.0.x.
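The first kind of revamp, refactoring, can be illustrated with a small sketch in which duplicated code is extracted into a function while the observable behaviour stays the same. All function names here are hypothetical and chosen only for the illustration.

```python
def summary_before(prices):
    # Before: the same summation loop appears twice.
    total = 0
    for price in prices:
        total += price
    subtotal = 0
    for price in prices:
        subtotal += price
    average = subtotal / len(prices)
    return f"total={total} average={average}"


def total_of(prices):
    """Extracted helper that replaces both duplicated loops."""
    return sum(prices)


def summary_after(prices):
    # After: one call to the helper, same output as before.
    total = total_of(prices)
    return f"total={total} average={total / len(prices)}"


prices = [10, 20, 30]
# A refactoring must not change behaviour, so both versions must agree.
assert summary_before(prices) == summary_after(prices)
```

The final assertion is the point of the exercise: a refactoring changes the structure of the code, not its results.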

Joel Spolsky's essays "Things You Should Never Do, Part I" and "Rub a Dub Dub" gave some arguments against complete or even partial rewrites in the context of commercial software. This did not completely eliminate them from the open source world, but has made some people more conscious of their inherent problems and risks.

Automated tests

Software testing is an integral part of open source development. While many open source packages were known to be released with some glaring bugs even in some stable releases, most open source software eventually becomes very stable.

Traditionally, most of the open source world showed a general lack of awareness of automated tests, in which one writes test scripts and programs that run the software and check whether it behaves correctly. Recently, however, this awareness has been growing, possibly because of the influence of Extreme Programming and because of some high-profile software packages that incorporated such test suites.

Most open source software consists of command-line programs or APIs, and as such is very easy to test automatically.
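As a minimal sketch of what such an automated test can look like, the example below uses Python's standard unittest module to check a small API function. The `word_count` function is hypothetical and stands in for whatever code a project wants to test.

```python
import unittest


def word_count(text):
    """Hypothetical function under test: count whitespace-separated words."""
    return len(text.split())


class WordCountTest(unittest.TestCase):
    def test_counts_words(self):
        self.assertEqual(word_count("open source software"), 3)

    def test_empty_input(self):
        self.assertEqual(word_count(""), 0)


# Run the tests programmatically and keep the result for inspection.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(WordCountTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

Because the tests run without human interaction, they can be repeated after every change to catch regressions early.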

Publicizing a project

Software directories and release logs

Freshmeat, directory.fsf.org, etc.

Articles

O'Reilly Net, Linux Weekly News, IBM developerworks, etc.

Mailing lists

External links