Biological network inference

From Wikipedia, the free encyclopedia

Many types of biological networks exist. Few such networks are known in anything approaching their complete structure, even in the simplest bacteria. Still less is known about the parameters governing the behavior of such networks over time, how the networks at different levels in a cell interact, and how to predict the complete state description of a eukaryotic cell or bacterial organism at a given point in the future. Systems biology, in this sense, is still in its infancy. Prediction is the subject of dynamic modeling. This article focuses on a necessary prerequisite to dynamic modeling of a network: inference of the topology, that is, prediction of the "wiring diagram" of the network. More specifically, it focuses on inference of biological network structure using the growing sets of high-throughput expression data for genes, proteins, and metabolites.

Briefly, methods using high-throughput data for inference of regulatory networks rely on searching for patterns of partial correlation or conditional probabilities that indicate causal influence [1]. Such patterns of partial correlations found in the high-throughput data, possibly combined with other supplemental data on the genes or proteins in the proposed networks, or combined with other information on the organism, form the basis upon which such algorithms work. Such algorithms can be of use in inferring the topology of any network where the change in state of one node can affect the state of other nodes.
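The role of partial correlation can be illustrated with a minimal Python sketch. The function names and the toy data are assumptions made for illustration and are not taken from the cited works. In the toy data a regulator z drives both x and y, so x and y are strongly correlated, yet their first-order partial correlation given z is close to zero, which suggests that there is no direct x-y edge.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def partial_correlation(x, y, z):
    """First-order partial correlation of x and y, controlling for z."""
    r_xy = pearson(x, y)
    r_xz = pearson(x, z)
    r_yz = pearson(y, z)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Toy data (hypothetical): z influences both x and y; x and y do not
# influence each other directly.
rng = random.Random(0)
z = [rng.gauss(0, 1) for _ in range(2000)]
x = [zi + rng.gauss(0, 0.3) for zi in z]
y = [zi + rng.gauss(0, 0.3) for zi in z]

print("correlation of x and y:        ", round(pearson(x, y), 3))
print("partial correlation given z:   ", round(partial_correlation(x, y, z), 3))
```

A vanishing partial correlation only argues against a direct edge; it does not by itself establish the direction of the remaining edges.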

Computational inference methods

In a topological sense, a network is a set of nodes and a set of directed or undirected edges between the nodes. Biological networks currently under study using such computational inference methods include:

1) Transcriptional regulatory networks. Genes are the nodes and the edges are directed. A gene serves as the source of a direct regulatory edge to a target gene by producing an RNA or protein molecule that functions as a transcriptional activator or inhibitor of the target gene. If the gene is an activator, then it is the source of a positive regulatory connection; if an inhibitor, then it is the source of a negative regulatory connection. Computational algorithms used to infer the topology take as primary input the data from a set of microarray runs measuring the mRNA expression levels of the genes under consideration for inclusion in the network.

As of 2007, the great bulk of high-throughput data being fed into correlation-based algorithms comes from microarray experiments, and such analysis is the most fruitful point of biological application for such algorithms. (This is reflected in the reference list at bottom, where almost all bioinformatic algorithm references are directed toward use of microarray data.) Clustering or some form of statistical classification is typically employed to perform an initial organization of the high-throughput mRNA expression values derived from microarray experiments. The question then arises: how can the clustering or classification results be connected to the underlying biology? Such results can be useful for pattern classification, for example to classify subtypes of cancer or to predict differential responses to a drug (pharmacogenomics). But to understand the relationships between the genes, that is, to define more precisely the influence of each gene on the others, the scientist typically attempts to reconstruct the transcriptional regulatory network. This can be done by using background literature, or information in public databases, combined with the clustering results. It can also be done by applying a correlation-based inference algorithm, as discussed below, an approach that is having increasing success as the available microarray data sets continue to grow [2][3].

2) Signal transduction networks (very important in the biology of cancer). Proteins are the nodes and the edges are directed. Primary input into the inference algorithm would be data from a set of experiments measuring protein activation / inactivation (e.g., phosphorylation / dephosphorylation) across a set of proteins.

3) Metabolite networks. Metabolites are the nodes and the edges are directed. Primary input into an algorithm would be data from a set of experiments measuring metabolite levels.

4) Intraspecies or interspecies communication networks in microbial communities. Nodes are excreted organic compounds and the edges are directed. Input into an inference algorithm is data from a set of experiments measuring levels of excreted molecules.

Protein-protein interaction networks are also under very active study. However, reconstruction of these networks does not use correlation-based inference in the sense discussed for the networks already described (interaction does not necessarily imply a change in protein state), and a description of such interaction network reconstruction is left to other articles.
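The topological definition used for the network types listed above (a set of nodes plus directed or undirected edges, optionally signed for activation or inhibition) can be written down as a small data structure. The following Python sketch is only one possible representation; the class names, the sign encoding, and the gene names are assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Edge:
    source: str
    target: str
    sign: int = 0  # +1 activation, -1 inhibition, 0 unknown or unsigned


@dataclass
class Network:
    directed: bool = True
    nodes: set = field(default_factory=set)
    edges: set = field(default_factory=set)

    def add_edge(self, source, target, sign=0):
        self.nodes.update((source, target))
        if not self.directed:
            # Store undirected edges in a canonical order so (a, b) == (b, a).
            source, target = sorted((source, target))
        self.edges.add(Edge(source, target, sign))

    def targets_of(self, node):
        """Nodes directly influenced by `node` (all neighbours if undirected)."""
        out = {e.target for e in self.edges if e.source == node}
        if not self.directed:
            out |= {e.source for e in self.edges if e.target == node}
        return out


# Hypothetical transcriptional regulatory network.
grn = Network(directed=True)
grn.add_edge("geneA", "geneB", sign=+1)  # geneA activates geneB
grn.add_edge("geneA", "geneC", sign=-1)  # geneA inhibits geneC
print(sorted(grn.targets_of("geneA")))   # ['geneB', 'geneC']
```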

Correlation-based inference algorithms

1) from classical statistics - STUB

  - baseline: Pearson correlation
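As a rough illustration of the Pearson baseline, the sketch below links two genes whenever the absolute correlation of their expression profiles reaches a cutoff. The function names, the cutoff of 0.8, and the expression values are assumptions chosen for the example, not values from the literature.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def correlation_network(expression, threshold=0.8):
    """Return undirected edges (gene_a, gene_b, r) with |r| >= threshold.

    `expression` maps a gene name to its expression values, one value per
    microarray experiment.
    """
    edges = []
    for a, b in combinations(sorted(expression), 2):
        r = pearson(expression[a], expression[b])
        if abs(r) >= threshold:
            edges.append((a, b, r))
    return edges


# Hypothetical expression profiles over five experiments.
expression = {
    "g1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "g2": [2.1, 3.9, 6.2, 8.1, 9.8],  # rises with g1
    "g3": [5.0, 4.1, 2.9, 2.0, 1.1],  # falls as g1 rises
    "g4": [3.0, 1.0, 4.0, 1.0, 3.0],  # no clear linear relation to the others
}

for a, b, r in correlation_network(expression):
    print(a, b, round(r, 3))
```

Because correlation is symmetric, the resulting edges are undirected, and the sign of r distinguishes only co-expression from anti-correlated expression, not the direction of regulation.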

2) from information theory - STUB

  - concept of mutual information
  - ARACNE algorithm
  - CLR algorithm
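The information-theoretic approach can be sketched as follows: estimate the mutual information (MI) between every pair of discretized expression profiles, then prune the weakest edge in each fully connected gene triplet, the data processing inequality step associated with ARACNE. This is a simplified illustration and not the published implementation. The equal-width binning, the plug-in MI estimator, all function names, and the two-level toy profiles are assumptions.

```python
import math
from collections import Counter
from itertools import combinations


def discretize(values, bins=2):
    """Equal-width binning of expression values into labels 0..bins-1."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # avoid zero width for constant profiles
    return [min(int((v - lo) / width), bins - 1) for v in values]


def mutual_information(x, y):
    """Plug-in estimate of I(X;Y), in bits, from two label sequences."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    return sum(
        (c / n) * math.log2(c * n / (px[a] * py[b]))
        for (a, b), c in pxy.items()
    )


def mi_matrix(expression, bins=2):
    """Pairwise MI for a dict mapping gene name -> expression profile."""
    genes = sorted(expression)
    labels = {g: discretize(expression[g], bins) for g in genes}
    mi = {}
    for a, b in combinations(genes, 2):
        mi[a, b] = mi[b, a] = mutual_information(labels[a], labels[b])
    return genes, mi


def dpi_prune(genes, mi, threshold=0.0):
    """Keep edges with MI > threshold, minus the weakest edge of each triplet."""
    removed = set()
    for triplet in combinations(genes, 3):
        edges = list(combinations(triplet, 2))
        if min(mi[e] for e in edges) <= threshold:
            continue  # triplet is not fully connected; nothing to prune
        weakest = min(edges, key=lambda e: mi[e])
        if all(mi[weakest] < mi[e] for e in edges if e != weakest):
            removed.add(weakest)
    return {e for e in combinations(genes, 2)
            if mi[e] > threshold and e not in removed}


def two_level(bits, low=1.0, high=5.0):
    """Hypothetical helper: turn a 0/1 pattern into a low/high profile."""
    return [high if b else low for b in bits]


# Toy chain g1 -> g2 -> g3: g2 is a noisy copy of g1, g3 a noisy copy of g2.
expression = {
    "g1": two_level([0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1]),
    "g2": two_level([0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 0]),
    "g3": two_level([1, 0, 0, 0, 0, 1, 0, 1, 1, 1, 1, 0]),
}
genes, mi = mi_matrix(expression)
print(sorted(dpi_prune(genes, mi)))  # [('g1', 'g2'), ('g2', 'g3')]
```

In this toy chain the g1-g3 pair has the lowest MI of the triplet and is removed as a likely indirect interaction. CLR takes a different route: rather than pruning triplets, it compares each MI value with the background distribution of MI values for the two genes involved.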

3) from graphical probabilistic models - STUB

   - Bayesian network structure learning
   - K2 algorithm (requires a node ordering)
   - BANJO toolkit
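To show why K2 needs a node ordering, here is a minimal Python sketch of K2-style greedy structure learning on discrete data. It scores candidate parent sets with the logarithm of the Cooper-Herskovits (K2) metric and, for each node, considers only nodes that precede it in the ordering as possible parents, which keeps the learned graph acyclic. The function names, the limit of two parents, and the binary toy data are assumptions; this is not the BANJO implementation.

```python
import math
from collections import Counter


def k2_log_score(data, child, parents, arity):
    """Log K2 metric of `child` given `parents`.

    `data` is a list of dicts mapping variable name -> integer state;
    `arity` maps variable name -> number of states.
    """
    r = arity[child]
    n_ij = Counter()   # counts per parent configuration
    n_ijk = Counter()  # counts per (parent configuration, child state)
    for row in data:
        config = tuple(row[p] for p in parents)
        n_ij[config] += 1
        n_ijk[config, row[child]] += 1
    # Unobserved configurations and states contribute zero to the log score.
    score = sum(math.lgamma(r) - math.lgamma(n + r) for n in n_ij.values())
    score += sum(math.lgamma(n + 1) for n in n_ijk.values())
    return score


def k2(data, ordering, arity, max_parents=2):
    """Greedy parent selection for each node, restricted to its predecessors."""
    parents = {v: [] for v in ordering}
    for i, child in enumerate(ordering):
        best = k2_log_score(data, child, parents[child], arity)
        candidates = set(ordering[:i])
        while candidates and len(parents[child]) < max_parents:
            scored = [
                (k2_log_score(data, child, parents[child] + [c], arity), c)
                for c in sorted(candidates)
            ]
            top_score, top = max(scored)
            if top_score <= best:
                break  # no candidate improves the score
            parents[child].append(top)
            candidates.remove(top)
            best = top_score
    return parents


# Toy binary data for a hypothetical chain A -> B -> C.
a = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1]
b = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 0]
c = [1, 0, 0, 0, 0, 1, 0, 1, 1, 1, 1, 0]
data = [{"A": x, "B": y, "C": z} for x, y, z in zip(a, b, c)]
arity = {"A": 2, "B": 2, "C": 2}

learned = k2(data, ["A", "B", "C"], arity)
print(learned)  # {'A': [], 'B': ['A'], 'C': ['B']}
```

With the ordering A, B, C the sketch recovers the chain. A different ordering changes which edges can be proposed at all, which is why the ordering has to come from prior knowledge or be searched over.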

DREAM project - stub

Platforms for network inference - STUB

   - geWorkbench, Columbia
   - SEBINI

Visualization of inferred network - STUB

   - Cytoscape tool
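One simple way to hand an inferred network to Cytoscape is the Simple Interaction Format (SIF), in which each line names a source node, an interaction type, and a target node. The sketch below writes such lines separated by tabs. The function name, the interaction labels "activates" and "inhibits", and the gene names are assumptions chosen for illustration.

```python
def to_sif(edges):
    """Serialize (source, interaction, target) triples as tab-delimited SIF text."""
    return "".join(
        f"{source}\t{interaction}\t{target}\n"
        for source, interaction, target in edges
    )


# Hypothetical inferred regulatory edges.
inferred = [
    ("geneA", "activates", "geneB"),
    ("geneA", "inhibits", "geneC"),
]
print(to_sif(inferred), end="")
```

The resulting text can be saved to a file with a .sif extension and imported as a network.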

Expansion of inferred network using public databases - data integration - STUB

   - CABIN tool

References

  1. ^ Spirtes, P., Glymour, C. & Scheines, R. (2000). Causation, prediction, and search: Adaptive computation and machine learning. 2nd ed. Cambridge, MA: MIT Press.
  2. ^ Faith, J.J., et al. (2007). "Large-Scale Mapping and Validation of Escherichia coli Transcriptional Regulation from a Compendium of Expression Profiles". PLoS Biology 5 (1): 54–66. doi:10.1371/journal.pbio.0050008. 
  3. ^ Hayete, B., T.S. Gardner, and J.J. Collins (2007). "Size matters: network inference tackles the genome scale". Molecular Systems Biology 3: 77. doi:10.1038/msb4100118. 
  • Bansal, M., et al., How to infer gene networks from expression profiles. Molecular Systems Biology, 2007. 3(78).
  • Bansal, M., G.D. Gatta, and D. di Bernardo, Inference of gene regulatory networks and compound mode of action from time course gene expression profiles. Bioinformatics, 2006. 22(7): p. 815-822.
  • Barrett, C.L. and B.O. Palsson, Iterative Reconstruction of Transcriptional Regulatory Networks: An Algorithmic Approach. PLoS Comput. Biol., 2006. 2(5): p. e52.
  • Basso, K., et al., Reverse engineering of regulatory networks in human B cells. Nat. Genet., 2005. 37(4): p. 382-390. (uses ARACNE algorithm)
  • Bonneau, R., et al., The Inferelator: an algorithm for learning parsimonious regulatory networks from systems-biology data sets de novo. Genome Biology, 2006. 7(5, Article R36).
  • Gardner lab. Context Likelihood of Relatedness (CLR) algorithm. 2006 [CLR algorithm web site at Boston University]. Available from: http://gardnerlab.bu.edu.
  • Chen, X.-W., G. Anantha, and X. Wang, An effective structure learning method for constructing gene networks. Bioinformatics, 2006. 22(11): p. 1367-1374.
  • Chu, T., et al., A statistical problem for inference to regulatory structure from associations of gene expression measurements with microarrays. Bioinformatics, 2003. 19(9): p. 1147-1152.
  • Cover, T.M. and J.A. Thomas, Elements of Information Theory. 1st ed. 1991, New York: John Wiley & Sons.
  • Daub, C.O., et al., Estimating mutual information using B-spline functions - an improved similarity measure for analysing gene expression data. BMC Bioinformatics, 2004. 5: p. 118. doi:10.1186/1471-2105-5-118
  • de Jong, H., Modeling and simulation of genetic regulatory systems: a literature review. J Comput Biol, 2002. 9(1): p. 67-103.
  • de la Fuente, A., et al., Discovery of meaningful associations in genomic data using partial correlation coefficients. Bioinformatics, 2004. 20(18): p. 3565-3574.
  • Filkov, V., Identifying Gene Regulatory Networks from Gene Expression Data (chapter 27), in Handbook of Computational Molecular Biology. 2005, Chapman & Hall / CRC.
  • Hartemink, A.J. Bayesian Network Inference with Java Objects (BANJO). 2005 [BANJO algorithm web site at Duke University]. Available from: http://www.cs.duke.edu/~amink/software/banjo/
  • Hartemink, A.J., Reverse engineering gene regulatory networks. Nat. Biotech., 2005. 23(5): p. 554-5.
  • Husmeier, D., Sensitivity and specificity of inferring genetic regulatory interactions from microarray experiments with dynamic Bayesian networks. Bioinformatics, 2003. 19(17): p. 2271-2282.
  • Ideker, T., V. Thorsson, and R.M. Karp. Discovery of regulatory interactions through perturbation: inference and experimental design. in Pacific Symposium on Biocomputing 2000. Hawaii.
  • Liang, S., S. Fuhrman, and R. Somogyi, REVEAL: a general reverse engineering algorithm for inference of genetic network architectures. Pac. Symp. Biocomput., 1998. 3: p. 18-29.
  • Margolin, A.A., et al., Reverse engineering cellular networks. Nature Protocols, 2006. 1(2): p. 663-672. (full description of ARACNE algorithm)
  • Margolin, A.A., et al., ARACNE: an algorithm for the reconstruction of gene regulatory networks in a mammalian cellular context. BMC Bioinformatics, 2006. 7(Suppl1): p. S1-7.
  • Markowetz, F. A bibliography on learning causal networks of gene interactions (July 31, 2006).[available from: http://www.molgen.mpg.de/~markowet/doc/network-bib.pdf; http://genomics.princeton.edu/~florian/docs/network-bib.pdf]
  • Papin, J.A., et al., Reconstruction of cellular signalling networks and analysis of their properties. Nat. Rev. Mol. Cell Biol., 2005. 6(2): p. 99-111.
  • Pe'er, D., et al., Inferring subnetworks from perturbed expression profiles. Bioinformatics, 2001. 17: p. 215S-224S.
  • Pe'er, D. Bayesian Network Analysis of Signaling Networks: A Primer. Science STKE 2005 [on-line primer]. Available from: www.stke.org/content/full/sigtrans.
  • Perrin, B.E., et al., Gene networks inference using dynamic Bayesian networks. Bioinformatics, 2003. 19(S2): p. ii138-ii148.
  • Sachs, K., et al., Causal protein-signaling networks derived from multiparameter single-cell data. Science, 2005. 308: p. 523-529.
  • Schadt, E.E., et al., An integrative genomics approach to infer causal associations between gene expression and disease. Nat. Genet., 2005. 37(7): p. 710-717.
  • Segal, E., R. Yelensky, and D. Koller, Genome-wide discovery of transcriptional modules from DNA sequence and gene expression. Bioinformatics, 2003. 19(Suppl1): p. i264-272.
  • Segal, E., et al., Rich probabilistic models for gene expression. Bioinformatics, 2001. 17(Suppl1): p. S243-252.
  • Shannon, P., et al., Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Res., 2003. 13(11): p. 2498-504.
  • Singhal, M. and K. Domico, CABIN: Collective Analysis of Biological Interaction Networks. Journal of Computational Biology and Chemistry, (accepted for publication in 2007)
  • Taylor, R.C., et al., SEBINI: Software Environment for BIological Network Inference. Bioinformatics, 2006. 22(21): p. 2706-2708.
  • Troyanskaya, O.G., et al., A Bayesian framework for combining heterogeneous data sources for gene function prediction (in Saccharomyces cerevisiae). Proc. Natl. Acad. Sci. USA, 2003. 100: p. 8348-8353.
  • van Someren, E.P., et al., Genetic network modeling. Pharmacogenomics, 2002. 3(4): p. 507-25.
  • Weaver, D.C., C.T. Workman, and G.D. Stormo, Modeling regulatory networks with weight matrices. Pac. Symp. Biocomput., 1999.
  • Werhli, A.V., M. Grzegorczyk, and D. Husmeier, Comparative evaluation of reverse engineering gene regulatory networks with relevance networks, graphical gaussian models and bayesian networks. Bioinformatics, 2006. 22(20): p. 2523-2531.
  • Wessels, L.F., E.P. van Someren, and M.J. Reinders, A comparison of genetic network models. Pac. Symp. Biocomput., 2001.
  • Yu, H., et al., Advances to bayesian network inference for generating causal networks from observational biological data. Bioinformatics, 2004. 20: p. 3594-3603.
  • Zhao, W., E. Serpedin, and E.R. Dougherty, Inferring gene regulatory networks from time series data using the minimum description length principle. Bioinformatics, 2006. 22(17): p. 2129-35.
  • Zhou, X., et al., A Bayesian connectivity-based approach to constructing probabilistic gene regulatory networks. Bioinformatics, 2004. 20(17): p. 2918-27.
  • Zou, M. and S.D. Conzen, A new dynamic Bayesian network (DBN) approach for identifying gene regulatory networks from time course data. Bioinformatics, 2005. 21(1): p. 71-79.