Molecular dynamics

From Wikipedia, the free encyclopedia

Molecular dynamics (MD) is a form of computer simulation where atoms and molecules are allowed to interact for a period of time under known laws of physics. Because in general molecular systems consist of a large number of particles, it is impossible to find the properties of such complex systems analytically. MD simulation circumvents this problem by using numerical methods. It represents an interface between laboratory experiments and theory and can be understood as a virtual experiment.

Molecular dynamics is a multidisciplinary field. Its laws and theories stem from mathematics, physics and chemistry. MD employs algorithms from computer science and information theory. It was originally conceived within theoretical physics in the 1950's, but it's mostly applied today in materials science and biomolecules.

Even though we know matter consists of interacting particles in motion at least since Boltzmann in the 19th Century, many still think of molecules as rigid museum models. Richard Feynman said in 1963 that "everything that living things do can be understood in terms of the jiggling and wiggling of atoms." [1] One of MD's key contributions is creating awareness that molecules like proteins and DNA are machines in motion. MD probes the relationship between molecular structure, movement and function.

Before it became possible to simulate molecular dynamics with computers, some undertook the hard work of trying it with physical models such as macroscopic spheres. The idea was to arrange them to replicate the properties of a liquid. Here's a quote from J.D. Bernal from 1962: "... I took a number of rubber balls and stuck them together with rods of a selection of different lengths ranging from 2.75 to 4 inch. I tried to do this in the first place as casually as possible, working in my own office, being interrupted every five minutes or so and not remembering what I had done before the interruption." [2] Fortunately, now computers keep track of bonds during a simulation.

MD has also been termed "statistical mechanics by numbers" and "Laplace's vision of Newtonian mechanics" of predicting the future by animating nature's forces. It is tempting to describe it as a virtual microscope. However, long MD simulations are mathematically ill conditioned. This generates cumulative numerical errors. This fact alone should dispel any illusions that the method acts like a molecular microscope that allows us to look at the actual trajectories a molecule would follow in time. Nevertheless, molecular dynamics techniques allow detailed time and space resolution into representative behavior in phase space.

More formally, MD is a special discipline of molecular modelling and Computer simulation. Based on molecular mechanics, it addresses numerical solutions of Newton's equations of motion i.e. Hamiltonian mechanics on an atomistic or similar model of a molecular system to obtain information about its equilibrium and dynamic properties. The main justification of the MD method is that statistical ensemble averages are equal to time averages of the system. This is called the Ergodic hypothesis.

Contents

[edit] Areas of Application

There is a significant difference between the focus and methods used by chemists and physicists, and this is reflected in differences in the jargon used by the different fields.

Beginning in theoretical physics, the method of MD gained popularity in materials science and since the 1970s also in biochemistry and biophysics. In chemistry, MD serves as an important tool in protein structure determination and refinement using experimental tools such as X-ray crystallography and NMR. It has also been applied with limited success as a method of refining protein structure predictions. In physics, MD is used to examine the dynamics of atomic-level phenomena that cannot be observed directly, such as thin film growth and ion-subplantation. It is also used to examine the physical properties of nanotechnological devices that have not or cannot yet be created.

In chemistry and biophysics, the interaction between the objects is either described by a force field (classical MD), a quantum chemical model, or a mix between the two. These terms are not used in physics, where the interactions are usually described by the name of the theory or approximation being used.

In applied mathematics and theoretical physics, molecular dynamics is a part of the research realm of dynamical systems, ergodic theory and statistical mechanics in general. Some techniques to calculate conformational entropy such as principal components analysis come from information theory. Mathematical techniques such as the transfer operator become applicable when MD is seen as a Markov chain. Also, there is a large community of mathematicians working on volume preserving, symplectic integrators for more computationally efficient MD simulations.

MD can also be seen as a special case of the discrete element method (DEM) in which the particles have spherical shape (e.g. with the size of their van der Waals radii.) Some authors in the DEM community employ the term MD rather loosely, even when their simulations do not model actual molecules.

[edit] Design Constraints

Design of a molecular dynamics simulation should account for the available computational power. Simulation size (n=number of particles), timestep and total time duration must be selected so that the calculation can finish within a reasonable time period. However, the simulations should be long enough to be relevant to the time scales of the natural processes being studied. Most scientific publications about the dynamics of proteins and DNA use data from simulations spanning nanoseconds (1E-9 s) to microseconds (1E-6 s). To obtain these simulations, several CPU-days to CPU-years are needed. Parallel algorithms allow the load to be distributed among CPUs.

During a classical MD simulation, the most CPU intensive task is the evaluation of the potential (force field) as a function of the particles' internal coordinates. Within that energy evaluation, the most expensive one is the non-bonded or non-covalent part. In Big O notation, common molecular dynamics simulations scale by O(n2) if all pair-wise electrostatic and van der Waals interactions must be accounted for explicitly. This computational cost can be reduced by employing electrostatics methods such as Particle Mesh Ewald ( O(nlog(n)) ) or good spherical cutoff techniques ( O(n) ).

Another factor that impacts total CPU time required by a simulation is the size of the integration timestep. This is the time length between evaluations of the potential. The timestep must be chosen small enough to avoid discretization errors (i.e. smaller than the fastest vibrational frequency in the system). Typical timesteps for classical MD are in the order of 1 femtosecond (1E-15 s). This value may be extended by using algorithms such as SHAKE and LINCS, which fix the fastest atoms (e.g. hydrogens) into place. Variable timestep methods have also been developed.

A choice should be made between explicit solvent and implicit solvent. Explicit solvent particles (such as water types TIP3P and SPC/E) must be calculated expensively by the force field, while implicit solvents use a mean-field approach. The impact of explicit solvents on CPU-time can be 10-fold or more. But the granularity and viscosity of explicit solvent is essential to reproduce certain properties of the solute molecules.

In simulations with explicit solvent molecules, the simulation box size must be large enough to avoid boundary condition artifacts. Boundary conditions are often treated by choosing fixed values at the edges, or by employing periodic boundary conditions in which one side of the simulation loops back to the opposite side, mimicking a bulk phase.

[edit] Physical Principles

[edit] Microcanonical ensemble (NVE)

In the microcanonical or NVE ensemble, the system is isolated from changes in moles (N), volume (V) and energy (E). It corresponds to an adiabatic process with no heat exchange. A microcanonical molecular dynamics trajectory may be seen as an exchange of potential and kinetic energy, with total energy being conserved. For a system of N particles with coordinates X and velocities V, the following pair of first order differential equations may be written in Newton's notation as

F(X) = -\nabla U(X)=M\dot{V}(t)
V(t) = \dot{X} (t)

The potential energy function U(X) of the system is a function of the particle coordinates X. It is referred to simply as the potential in Physics, or the force field in Chemistry. The first equation comes from Newton's laws; the force F acting on each particle in the system can be calculated as the negative gradient of U(X).

For every timestep, each particle's position X and velocity V may be integrated with a symplectic method such as Verlet. The time evolution of X and V is called a trajectory. Given the initial positions (e.g. from theoretical knowledge) and velocities (e.g. randomized Gaussian), we can calculate all future (or past) positions and velocities.

One frequent source of confusion is the meaning of temperature in MD. Commonly we have experience with macroscopic temperatures, which involve a huge number of particles. But temperature is a statistical quantity. If there is a large enough number of atoms, statistical temperature can be estimated from the instantaneous temperature, which is found by equating the kinetic energy of the system to nkBT/2 where n is the number of degrees of freedom of the system.

A temperature-related phenomenon arises due to the small number of atoms that are used in MD simulations. For example, consider simulating the growth of a copper film starting with a substrate containing 500 atoms and a deposition energy of 100 eV. In the real world, the 100 eV from the deposited atom would rapidly be transported through and shared among a large number of atoms (1010 or more) with no big change in temperature. When there are only 500 atoms, however, the substrate is almost immediately vaporized by the deposition. Something similar happens in biophysical simulations. The temperature of the system in NVE is naturally raised when macromolecules such as proteins undergo exothermic conformational changes and binding.

[edit] Canonical ensemble (NVT)

In the canonical ensemble, moles (N), volume (V) and temperature (T) are conserved. It is also sometimes called constant temperature molecular dynamics (CTMD). In NVT, the energy of endothermic and exothermic processes is exchanged with a thermostat.

A variety of thermostat methods are required to add and remove energy from the boundaries of an MD system in a realistic way, approximating the canonical ensemble. Popular techniques to control temperature include the Nosé-Hoover thermostat and Langevin dynamics.

[edit] Isothermal-Isobaric (NPT) ensemble

In the isothermal-isobaric ensemble, moles (N), pressure (P) and temperature (T) are conserved. In addition to a thermostat, a barostat is needed. It corresponds most closely to laboratory conditions with a flask open to ambient temperature and pressure.

In the simulation of biological membranes, isotropic pressure control is not appropriate. For lipid bilayers, pressure control occurs under constant membrane area (NPAT) or constant surface tension "gamma" (NPγT).

[edit] Generalized ensembles

The replica exchange method is a generalized ensemble. It was originally created to deal with the slow dynamics of disordered spin systems. It is also called parallel tempering. The replica exchange MD (REMD) formulation [3] tries to overcome the multiple-minima problem by exchanging the temperature of non-interacting replicas of the system running at several temperatures.

[edit] Types of MD Systems

Systems studied via MD are typically divided into the following categories:

[edit] Empirical Potentials

Empirical potentials either ignore quantum mechanical effects, or attempt to capture quantum effects in a limited way through entirely empirical equations. Parameters in the potential are fitted against known physical properties of the atoms being simulated, such as elastic constants and lattice parameters.

Empirical potentials can further be subcategorized into pair potentials, and many-body potentials.

For pair potentials, the total potential energy of a system can be calculated from the sum of energy contributions from pairs of atoms. One example of a pair potential is the Lennard-Jones potential (also known as the 6-12 potential).

U(r) = 4\varepsilon \left[ \left(\frac{\sigma}{r}\right)^{12} - \left(\frac{\sigma}{r}\right)^{6} \right]
U_{ij}(r_{ij}) = \sum \frac {z_i z_j}{4 \pi \epsilon_0} \frac {1}{r_{ij}} + \sum A_l exp \frac {-r_{ij}}{p_l} + \sum C _l r_{ij}^{-n_j} + ...

Another example is the Born (ionic) model of the ionic lattice. The first term in the above equation is coulomb's law for a pair of ions, the second term is the short-range repulsion explained by Pauli's exclusion principle and the final term is the dispersion interaction term. Usually in a simulation only the dipolar (sometimes quadrupolar) term is included.


In many-body potentials, the potential energy cannot be found by a sum over pairs of atoms. For example, the Tersoff Potential, which is used to simulate Silicon and Germanium, involves a sum over groups of three atoms, with the angles between the atoms being an important factor in the potential. Another example is the Tight-Binding Second Moment Approximation (TBSMA), where the electron density of states in the region of an atom is calculated from a sum of contributions from surrounding atoms, and the potential energy contribution is then a function of this sum.

[edit] Semi-Empirical Potentials

Semi-empirical potentials make use of the matrix representation from quantum mechanics. However, the values of the matrix elements are found through empirical formulae that estimate the degree of overlap of specific atomic orbitals. The matrix is then diagonalized to determine the occupancy of the different atomic orbitals, and empirical formulae are used once again to determine the energy contributions of the orbitals.

There are a wide variety of semi-empirical potentials, known as tight-binding potentials, which vary according to the atoms being modeled.

[edit] Ab-initio (First Principles) Methods

Ab-initio (first principles) methods use full quantum-mechanical formula to calculate the potential energy of a system of atoms or molecules. This calculation is usually made "locally", i.e., for nuclei in the close neighborhood of the reaction coordinate. Although various approximations may be used, these are based on theoretical considerations, not on empirical fitting. Ab-Initio produce a large amount of information that is not available from the empirical methods, such as density of states information.

Perhaps the most famous ab-initio package is the Car-Parrinello Molecular Dynamics (CPMD) package based on the density functional theory.

[edit] Coarse-Graining: "Reduced Representations"

MD simulations on very large systems may require such large computer resources that they cannot easily be studied by traditional all-atom methods. Similarly, simulations of processes on long timescales (beyond about 1 microsecond) are prohibitively expensive, because they require so many timesteps. In these cases, one can sometimes tackle the problem by using reduced representations, which are also called coarse-grained models. Instead of explicitly representing every atom of the system, one uses pseudoatoms to represent groups of atoms.

The earliest MD simulations on proteins, lipids and nucleic acids generally used "united atoms", which represents the lowest level of coarse graining. For example, instead of treating all four atoms of a methyl group explicitly (or all three atoms of a methylene group), one represents the whole group with a single pseudoatom. This pseudoatom must, of course, be properly parameterized so that its van der Waals interactions with other groups have the proper distance-dependence. Similar considerations apply to the bonds, angles, and torsions in which the pseudoatom participates. In this kind of united atom representation, one typically eliminates all explicit hydrogen atoms except those that have the capability to participate in hydrogen bonds ("polar hydrogens"). The polar hydrogens are usually retained in the model, because proper treatment of hydrogen bonds requires a reasonably accurate description of the geometry and the electrostatic interactions between the donor and acceptor groups. A hydroxyl group, for example, can be both a hydrogen bond donor and a hydrogen bond acceptor, and it would be impossible to treat this with a single OH pseudoatom. Note that about half the atoms in a protein or nucleic acid are nonpolar hydrogens, so the use of united atoms can provide a substantial savings in computer time.

Similar united atom approximations have been used in MD simulations on biological membranes. In these, the aliphatic tails of lipids are represented by a few pseudoatoms by gathering 2-4 methylene groups into each pseudoatom.

When coarse-graining is done at higher levels, the accuracy of the dynamic description may be less reliable. But very coarse-grained models have been used successfully to examine a wide range of questions in structural biology. Following are just a few examples.

  • protein folding studies are often carried out using a single (or a few) pseudoatoms per amino acid;
  • DNA supercoiling has been investigated using 1-3 pseudoatoms per basepair, and at even lower resolution;
  • Packaging of double-helical DNA into bacteriophage has been investigated with models where one pseudoatom represents one turn (about 10 basepairs) of the double helix;
  • RNA structure in the ribosome and other large systems has been modeled with one pseudoatom per nucleotide.

The parameterization of these very coarse-grained models must be done empirically, by matching the behavior of the model to appropriate experimental data.

[edit] Examples of applications

Molecular dynamics is used in many fields of science.

  • First macromolecular MD simulation published (1977, Size: 500 atoms, Simulation Time: 9.2 ps=0.0092 ns, Program: CHARMM precursor) Protein: Bovine Pancreatic Trypsine Inhibitor. This is one of the best studied proteins in terms of folding and kinetics. Its simulation published in Nature magazine paved the way for understanding protein motion as essential in function and not just accessory. [4]

The following two biophysical examples are not run-of-the-mill MD simulations. They illustrate almost heroic efforts to produce simulations of a system of very large size (a complete virus) and very long simulation times (500 microseconds):

  • MD simulation of the complete satellite tobacco mosaic virus (STMV) (2006, Size: 1 million atoms, Simulation time: 50 ns, program: NAMD) This virus is a small, icosahedral plant virus which worsens the symptoms of infection by Tobacco Mosaic Virus (TMV). Molecular dynamics simulations were used to probe the mechanisms of viral assembly. The entire STMV particle consists of 60 identical copies of a single protein that make up the viral capsid (coating), and a 1063 nucleotide single stranded RNA genome. One key finding is that the capsid is very unstable when there is no RNA inside. The simulation would take a single 2006 desktop computer around 35 years to complete. It was thus done in many processors in parallel with continuous communication between them. [5]
  • Folding Simulations of the Villin Headpiece in All-Atom Detail (2006, Size: 20,000 atoms; Simulation time: 500 µs = 500,000 ns, Program: folding@home) This simulation was run in 200,000 CPU's of participating personal computers around the world. These computers had the folding@home program installed, a large-scale distributed computing effort coordinated by Vijay Pande at Stanford University. The kinetic properties of the Villin Headpiece protein were probed by using many independent, short trajectories run by CPU's without continuous real-time communication. One technique employed was the Pfold value analysis, which measures the probability of folding before unfolding of a specific starting conformation. Pfold gives information about transition state structures and an ordering of conformations along the folding pathway. Each trajectory in a Pfold calculation can be relatively short, but many independent trajectories are needed. [6]

[edit] Major software for MD simulations

[edit] Related software

  • VMD - MD simulation trajectories can be loaded and visualized
  • PyMol - Molecular Modelling software written in python

[edit] See also

[edit] References

  • M. P. Allen, D. J. Tildesley (1989) Computer simulation of liquids. Oxford University Press. ISBN 0-19-855645-4.
  • J. A. McCammon, S. C. Harvey (1987) Dynamics of Proteins and Nucleic Acids. Cambridge University Press. ISBN 0-521-35652-0 (paperback); ISBN 0-52-130750(hardback).
  • D. C. Rapaport (1996) The Art of Molecular Dynamics Simulation. ISBN 0-521-44561-2.
  • Daan Frenkel, Berend Smit (2001) Understanding Molecular Simulation. Academic Press. ISBN 0-12-267351-4.
  • J. M. Haile (2001) Molecular Dynamics Simulation: Elementary Methods. ISBN 0-471-18439-X
  • Oren M. Becker, Alexander D. Mackerell Jr, Benoît Roux, Masakatsu Watanabe (2001) Computational Biochemistry and Biophysics. Marcel Dekker. ISBN 0-8247-0455-X.
  • Tamar Schlick (2002) Molecular Modeling and Simulation. Springer. ISBN 0-387-95404-X.
  1. ^ Feynman, Richard (1963). "Lectures on Physics" 1: 3-6.
  2. ^ Bernal, J.D. (1964). "The Bakerian lecture, 1962: The structure of liquids". Proc. R. Soc. 280: 299-322.
  3. ^ Sugita, Yuji, Yuko Okamoto (1999). "Replica-exchange molecular dynamics method for protein folding". Chem Phys Letters 314: 141-151.
  4. ^ McCammon, J, JB Gelin, M Karplus (1977). "Dynamics of folded proteins". Nature 267: 585-590.
  5. ^ Molecular dynamics simulation of the Satellite Tobacco Mosaic Virus (STMV) Peter Freddolino, Anton Arkhipov, Steven B. Larson, Alexander McPherson, Klaus Schulten. Theoretical and Computational Biophysics Group, University of Illinois at Urbana Champaign
  6. ^ The Folding@Home Project and recent papers published using trajectories from it. Vijay Pande Group. Stanford University

[edit] External links