Dependency hell

From Wikipedia, the free encyclopedia

Dependency hell is a colloquial term for the frustration experienced by software users who install packages that depend on specific versions of other software packages.

Overview

Often, rather than "reinventing the wheel", software is designed to take advantage of other software components that are already available, or have already been designed and implemented for use elsewhere. This could be compared to how people building a house might buy off-the-shelf components, such as bricks, windows, and doors, rather than building everything themselves.

Even for a builder, it can be a problem if a building is designed for a certain door type, and only doors with different specifications are available. However, in the software world, where components evolve rapidly, and components are often dependent on other components, this problem is more pronounced.

The issue of dependency hell may be regarded as an anti-pattern, where the fault lies less with the suppliers of the products than with the framework into which they have to fit.

Platform-specific

On specific computing platforms, "dependency hell" often goes by a platform-specific name, generally derived from the name of the components involved (for example, "DLL hell" on Microsoft Windows).

Problems

Dependency hell takes several forms:

many dependencies
An application depends on many libraries, which requires lengthy downloads and large amounts of disk space and reduces portability (every library must be ported before the application can be). Tracking down all the dependencies can also be difficult; a repository (see below) alleviates this. The problem is partly inevitable: an application built on a given platform (such as Java) requires that platform to be installed, although further applications built on the same platform can then share it. The problem is most acute when an application uses only a small part of a big library (which can be addressed by refactoring), or when a simple application relies on many libraries.
Internet access hell
In some Linux distributions, new packages must be installed to configure Internet access, but Internet access is needed to download those packages. This is a circular dependency (a form of Catch-22).
long chains of dependencies
app depends on liba, which depends on libb, ..., which depends on libz. This is distinct from "many dependencies" only if the dependencies must be resolved manually (e.g., attempting to install app prompts the user to install liba first, and attempting to install liba then prompts for libb); otherwise it is equivalent to "many dependencies". A package manager that resolves all dependencies automatically solves this. Besides being a hassle, manual resolution can mask dependency cycles or conflicts.
conflicting dependencies
If app1 depends on libfoo 1.2, app2 depends on libfoo 1.3, and different versions of libfoo cannot be installed side by side, then app1 and app2 cannot be used at the same time (or even installed together, if the installer checks dependencies). This can be solved by allowing different versions of a library to be installed simultaneously.
circular dependencies
If version 1 of appX depends on app2, which depends on app3, which depends on app4, which in turn depends on version 0 of the original appX, then in systems such as RPM or dpkg the user must install all the packages simultaneously. On Linux, apparent circular dependencies are therefore often the result of a user misunderstanding the packaging system. On other platforms, however, the packaging system may be unable to resolve the cycle.
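
The long-chain, circular, and conflicting cases above can be illustrated with a short Python sketch. It is not the algorithm of any real package manager; the package names, the DEPENDS table, and the functions install_order and find_conflicts are hypothetical and exist only for illustration. The sketch walks a dependency chain depth-first so that each package is installed after its dependencies, reports a cycle if a package turns out to depend on itself, and lists libraries that two applications require in different exact versions.

```python
from typing import Dict, List

# Hypothetical dependency graph: package name -> packages it depends on.
DEPENDS: Dict[str, List[str]] = {
    "app": ["liba"],
    "liba": ["libb"],
    "libb": ["libc"],
    "libc": [],
}


class DependencyCycle(Exception):
    """Raised when a package transitively depends on itself."""


def install_order(package: str, depends: Dict[str, List[str]]) -> List[str]:
    """Return an order in which every dependency precedes its dependents."""
    order: List[str] = []
    done = set()                  # packages already placed in `order`
    in_progress: List[str] = []   # current chain, used to report cycles

    def visit(name: str) -> None:
        if name in done:
            return
        if name in in_progress:
            cycle = in_progress[in_progress.index(name):] + [name]
            raise DependencyCycle(" -> ".join(cycle))
        in_progress.append(name)
        for dep in depends.get(name, []):
            visit(dep)
        in_progress.pop()
        done.add(name)
        order.append(name)

    visit(package)
    return order


def find_conflicts(requirements: Dict[str, Dict[str, str]]) -> Dict[str, Dict[str, str]]:
    """Map each library to the apps requiring it, keeping only libraries
    for which the apps demand different exact versions."""
    wanted: Dict[str, Dict[str, str]] = {}
    for app, libs in requirements.items():
        for lib, version in libs.items():
            wanted.setdefault(lib, {})[app] = version
    return {lib: apps for lib, apps in wanted.items()
            if len(set(apps.values())) > 1}


print(install_order("app", DEPENDS))
# ['libc', 'libb', 'liba', 'app']
print(find_conflicts({"app1": {"libfoo": "1.2"}, "app2": {"libfoo": "1.3"}}))
# {'libfoo': {'app1': '1.2', 'app2': '1.3'}}
```

Given a graph in which appX depends on app2, app2 on app3, app3 on app4, and app4 on appX again, install_order raises DependencyCycle instead of looping forever, which is the situation a manual, one-package-at-a-time installation can hide.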

Solutions

The most obvious (and very common) solution to this problem is a standardised numbering system, in which software uses a specific number for each version (the major version) and a subnumber for each revision (the minor version), e.g. 10.1 or 5.7. The major version changes only when programs that used the previous version would no longer be compatible. The minor version may change with even a simple revision that does not prevent other software from working with it. Software packages can then simply request a component with a particular major version and any minor version greater than or equal to a particular minor version. They will continue to work, and dependencies will be resolved successfully, even if the minor version changes.
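
A minimal Python sketch of this rule follows, assuming versions are written as "major.minor" strings. The helper names parse_version, satisfies, and best_match are hypothetical and do not belong to any particular packaging tool. A candidate is accepted when its major version equals the requested one and its minor version is at least the requested minor version.

```python
from typing import Iterable, Optional, Tuple

Version = Tuple[int, int]  # (major, minor)


def parse_version(text: str) -> Version:
    """Parse a 'major.minor' string such as '10.1' into a tuple of ints."""
    major, minor = text.split(".")
    return int(major), int(minor)


def satisfies(available: str, required: str) -> bool:
    """True if `available` has the same major version as `required`
    and a minor version at least as high."""
    have_major, have_minor = parse_version(available)
    need_major, need_minor = parse_version(required)
    return have_major == need_major and have_minor >= need_minor


def best_match(candidates: Iterable[str], required: str) -> Optional[str]:
    """Pick the highest compatible version, or None if nothing fits."""
    compatible = [c for c in candidates if satisfies(c, required)]
    return max(compatible, key=parse_version, default=None)


print(satisfies("5.9", "5.7"))   # True: same major, newer minor
print(satisfies("6.0", "5.7"))   # False: major version changed
print(best_match(["5.6", "5.10", "5.8", "6.0"], "5.7"))  # 5.10
```

The components are compared as integers rather than as text, so that 5.10 is treated as newer than 5.8.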

Some package managers can perform smart upgrades, in which interdependent software components are upgraded at the same time, thereby resolving the major number incompatibility issue too.

Many current Linux distributions have also implemented repository-based package management systems to try to solve the dependency problem. These systems are a layer on top of RPM, dpkg, or other packaging systems, and are designed to resolve dependencies automatically by searching predefined software repositories. Typically these repositories are FTP sites or websites, directories on the local computer or shared across a network or, much less commonly, directories on removable media such as CDs or DVDs. This largely eliminates dependency hell for software packaged in those repositories, which are typically maintained by the Linux distribution provider and mirrored worldwide. Although these repositories are often huge, they cannot contain every piece of software, so dependency hell can still occur for software outside them. In all cases, dependency hell is still faced by the repository maintainers. Examples of these systems include APT, Yum, urpmi and Portage.
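
The repository lookup can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not how APT, Yum, or any other tool is implemented: each repository is modelled as a hypothetical dictionary from package name to dependency list, versions are ignored, and the function name resolve is invented for the example. Packages that no repository provides are reported as missing, which corresponds to the case where dependency hell can still occur.

```python
from typing import Dict, List, Set, Tuple

# Each hypothetical repository maps a package name to its dependencies.
Repository = Dict[str, List[str]]


def resolve(package: str, repositories: List[Repository]) -> Tuple[List[str], Set[str]]:
    """Collect `package` and everything it transitively depends on.

    Returns (found, missing): `found` lists packages located in some
    repository, in discovery order; `missing` holds names that no
    repository provides.
    """
    found: List[str] = []
    missing: Set[str] = set()
    seen: Set[str] = set()
    queue = [package]
    while queue:
        name = queue.pop(0)
        if name in seen:
            continue
        seen.add(name)
        # Repositories are searched in order; the first one that
        # provides the package wins.
        for repo in repositories:
            if name in repo:
                found.append(name)
                queue.extend(repo[name])
                break
        else:
            missing.add(name)
    return found, missing


main_repo = {"app": ["liba", "libb"], "liba": ["libc"], "libc": []}
extra_repo = {"libb": ["libthirdparty"]}

found, missing = resolve("app", [main_repo, extra_repo])
print(found)    # ['app', 'liba', 'libb', 'libc']
print(missing)  # {'libthirdparty'}
```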

Because different pieces of software have different dependencies, it is possible to get into a vicious circle of dependency requirements, or (possibly worse) an ever-expanding tree of requirements, as each new package demands that several more be installed. Systems such as Debian's APT can resolve this by presenting the user with a range of solutions and allowing the user to accept or reject each one. The Glasgow Haskell Compiler (GHC) is an example of a circular dependency: compiling it requires GHC. This can be solved by downloading a binary version of GHC and using it to compile the new version.

Examples

James Donald, in his 2003 paper titled Improved Portability of Shared Libraries[1], argued that dependency hell is worse under Linux than under Microsoft Windows. Several Linux distributions have had problems, when updating libraries, with software not packaged for the distribution, since the application programming interfaces of some open-source libraries are prone to change between releases.

A modern example of dependency hell on Microsoft Windows, Linux, and Mac OS X is the Gecko Runtime Engine (GRE) used by Mozilla projects. Each product released by the Mozilla Foundation includes its own version of the complete Gecko Runtime Engine, because the programming interfaces used are volatile. Thus, if a user installs Thunderbird, Firefox, and Sunbird, there will be three copies of GRE on the machine. These may or may not be compatible, depending on when the GRE source tree was forked. Some external projects, such as Epiphany, depend on specific versions of the Mozilla Suite to use GRE and break if a different version is installed, while others, such as Nvu, bring their own copy of GRE. Note that duplicating the GRE is itself a workaround for the core problem of dependency hell.

By statically linking Gecko, the Mozilla developers avoid potential dependency hell, at the cost of increased disk usage. Given that hard disk space is now quite cheap, static linking in itself is not so bad. Tools such as bash or make that are statically compiled will never complain about a missing shared object when the C library (glibc) is upgraded. Both static and dynamic linking have advantages and disadvantages.

See also

  • apt-zip
  • Cygwin
  • Coupling - Forms of dependency among software artifacts
  • Configuration management - Techniques and tools for managing software versions
  • Explicit dependency, implicit dependency and recursive implicit dependency

References
