Molecular docking

From Wikipedia, the free encyclopedia

Computational molecular docking is a research technique for predicting whether one molecule will bind to another, usually a protein. Protein-protein, protein-DNA and protein-ligand docking predictions are all performed, though the techniques used in each area differ considerably. Protein-ligand docking models the interaction between protein and ligand: if the geometry of the pair is complementary and involves favorable biochemical interactions, the ligand may bind the protein in vitro or in vivo.

Applications

A binding interaction may mean that the ligand inhibits the protein's function or acts as an agonist. Docking is most pertinent to the field of drug design: most drugs are small molecules, and a computational approach allows researchers to quickly screen large databases of potential drugs (e.g., the ZINC database of compounds for virtual screening) against protein targets such as HIV reverse transcriptase. By contrast, traditional discovery of drug candidates occurs by chance or through painstaking work in the lab. Virtual screening and related combinatorial chemistry techniques are particularly important in the search for new antibiotics, as resistant strains of bacteria increasingly appear due to overuse of antibiotics.

The mechanics of docking

To perform a docking screen, the first requirement is a structure of the protein of interest. Usually the structure has been determined in the lab using a biophysical technique such as X-ray crystallography or, less often, NMR spectroscopy. This protein structure and a database of potential ligands serve as inputs to a docking program. The success of a docking program depends on two components: the search algorithm and the scoring function.

The search algorithm

The search space consists of all possible orientations and conformations of the protein paired with the ligand. With present computing resources, it is impossible to explore the search space exhaustively: this would involve enumerating all possible distortions of each molecule (molecules are dynamic and exist in an ensemble of conformational states) and all possible rotational and translational orientations of the ligand relative to the protein at a given level of granularity. Most docking programs in use account for a flexible ligand, and several attempt to model a flexible protein receptor. Each "snapshot" of the pair is referred to as a pose. There are many strategies for sampling the search space. Here are some examples:

  • Use a coarse-grained molecular dynamics simulation to propose energetically reasonable poses
  • Use a "linear combination" of multiple structures determined for the same protein to emulate receptor flexibility
  • Use a genetic algorithm to "evolve" new poses that are successively more likely to represent favorable binding interactions
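
As an illustration of the genetic-algorithm strategy, the following Python sketch evolves a population of poses toward lower scores. It is a toy under stated assumptions, not the method of any particular docking program: a pose is reduced to six numbers (three translations and three rotation angles of a rigid ligand), and `toy_score`, `random_pose`, `crossover`, `mutate` and `evolve` are hypothetical names introduced only for this example.

```python
import random

def toy_score(pose):
    """Hypothetical stand-in for a scoring function; lower is better.
    It measures squared distance from an arbitrary 'ideal' pose."""
    ideal = [1.0, -2.0, 0.5, 0.3, 1.2, -0.7]
    return sum((p - q) ** 2 for p, q in zip(pose, ideal))

def random_pose(rng):
    # Assumption: a pose is three translations plus three rotation angles.
    return [rng.uniform(-5.0, 5.0) for _ in range(6)]

def crossover(a, b, rng):
    # Each coordinate is inherited from one of the two parents at random.
    return [x if rng.random() < 0.5 else y for x, y in zip(a, b)]

def mutate(pose, rng, rate=0.2, step=0.5):
    # Perturb each coordinate with probability `rate`.
    return [p + rng.gauss(0.0, step) if rng.random() < rate else p
            for p in pose]

def evolve(score, generations=200, pop_size=50, seed=0):
    rng = random.Random(seed)
    population = [random_pose(rng) for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=score)
        parents = population[: pop_size // 2]  # keep the better-scoring half
        children = [
            mutate(crossover(rng.choice(parents), rng.choice(parents), rng), rng)
            for _ in range(pop_size - len(parents))
        ]
        population = parents + children
    return min(population, key=score)

best = evolve(toy_score)
print("best pose:", best)
print("score:", toy_score(best))
```

Because the better-scoring half of each generation is carried over unchanged, the best score in the population can never get worse from one generation to the next. A real program would replace `toy_score` with its own scoring function and would usually add ligand torsion angles to the pose encoding.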

The scoring function

The scoring function takes a pose as input and returns a number indicating the likelihood that the pose represents a favorable binding interaction.

Most scoring functions are physics-based molecular mechanics force fields that estimate the energy of the pose; a low (negative) energy indicates a stable system and thus a likely binding interaction. An alternative approach is to derive a statistical potential for interactions from a large database of protein-ligand complexes, such as the Protein Data Bank, and evaluate the fit of the pose according to this inferred potential.
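
A physics-based score of this kind can be sketched in a few lines of Python. The example below sums a Lennard-Jones term and a Coulomb term over every protein-ligand atom pair. It is a simplified sketch only: the function names `pair_energy` and `score_pose` are hypothetical, the parameter values (`epsilon`, `sigma`, `dielectric`) are illustrative placeholders rather than values from any published force field, and real scoring functions typically include further terms such as hydrogen bonding and solvation.

```python
import math

def pair_energy(r, q1=0.0, q2=0.0, epsilon=0.2, sigma=3.5, dielectric=4.0):
    """Energy (kcal/mol) of two atoms r angstroms apart.
    Parameter defaults are placeholders chosen for illustration."""
    # Lennard-Jones 12-6: steep repulsion at short range, weak attraction beyond.
    lj = 4.0 * epsilon * ((sigma / r) ** 12 - (sigma / r) ** 6)
    # Coulomb term; 332.06 converts e^2/angstrom to kcal/mol.
    coulomb = 332.06 * q1 * q2 / (dielectric * r)
    return lj + coulomb

def score_pose(protein_atoms, ligand_atoms):
    """Sum pair energies over all protein-ligand atom pairs.
    Each atom is a tuple (x, y, z, partial_charge)."""
    total = 0.0
    for x1, y1, z1, q1 in protein_atoms:
        for x2, y2, z2, q2 in ligand_atoms:
            r = math.dist((x1, y1, z1), (x2, y2, z2))
            total += pair_energy(r, q1=q1, q2=q2)
    return total

# One negatively charged protein atom and one positively charged ligand atom.
protein = [(0.0, 0.0, 0.0, -0.5)]
well_placed = [(4.0, 0.0, 0.0, 0.5)]   # comfortable contact distance
clashing = [(2.0, 0.0, 0.0, 0.5)]      # atoms overlap sterically

print(score_pose(protein, well_placed))  # negative: favorable
print(score_pose(protein, clashing))     # positive: steric clash dominates
```

The well-placed ligand atom receives a negative energy and the clashing one a large positive energy, which is the behavior the search algorithm relies on when ranking poses.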

Many X-ray diffraction structures exist for complexes between proteins and high-affinity ligands, but very few for low-affinity ligands, as these do not stay bound long enough to be observed. Scoring functions trained on these data can dock high-affinity ligands correctly, but they will also give plausible docked conformations for ligands that are in fact inactive. This produces a large number of false positive hits, i.e., ligands predicted to bind to the protein that do not actually bind when placed together in a test tube.

One way to reduce the number of false positives is to recalculate the energy of the top-hit poses using a higher-resolution (and therefore slower) technique such as Generalized Born or Poisson-Boltzmann methods[1]. In practice, however, a researcher will typically screen a database of tens to hundreds of thousands of compounds and test the top 60 or so in vitro, and identifying any true binders among them is still considered a success.
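
The two-stage workflow described above, in which a fast score ranks the whole database and a slower method rescores only the top hits, can be sketched as follows. This is a schematic illustration: `screen`, `fast_score` and `slow_score` are hypothetical names, and the scores in the usage example are made-up numbers standing in for real docking and rescoring calculations.

```python
def screen(compounds, fast_score, slow_score, top_n=60):
    """Rank every compound with a cheap score, then rescore only the
    best `top_n` with a more expensive function. Lower is better."""
    ranked = sorted(compounds, key=fast_score)
    shortlist = ranked[:top_n]
    return sorted(shortlist, key=slow_score)

# Made-up scores for five hypothetical compounds (not real data).
fast = {"A": -9.1, "B": -8.7, "C": -8.5, "D": -4.0, "E": -3.2}
slow = {"A": -5.0, "B": -9.5, "C": -7.2, "D": -6.0, "E": -1.0}

hits = screen(fast, fast.get, slow.get, top_n=3)
print(hits)
```

Only compounds A, B and C survive the fast stage here, and the slower rescoring reorders them. The expensive function is never evaluated on D or E, which is what makes the approach affordable for large databases.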

References

  1. Feig et al. (2004). Performance comparison of generalized Born and Poisson methods in the calculation of electrostatic solvation energies for protein structures. J Comput Chem, 25(2):265-84.