Modeling Cell Regulatory Networks

Transcriptional regulatory module controlling the mesenchymal signature of high-grade glioma, computationally inferred and validated by biochemical and functional assays.

Our work is grounded in the premise that cellular phenotypes result not only from alterations in the genomic code but also from the influence of multiscale networks of molecular interactions that regulate gene expression, protein abundance, epigenetic state, and signaling activity. Understanding how the code embedded in the genome eventually generates cellular phenotypes therefore requires methods for integrating data that describe these various levels of activity. We have hypothesized that the multiscale networks that emerge from such factors are context-specific; that is, certain clearly defined molecular programs control specific functions in particular cell types and become dysregulated in specific ways as a cell enters a disease state.

Because it is impossible to empirically observe such a complex system in action, our lab has spent the last decade developing a rigorous pipeline of quantitative algorithms for building computational models of biological systems of interest. Based on concepts from information theory and Bayesian statistics, our methods use large collections of biological data to reverse engineer genome-wide interaction networks, called interactomes, and then identify modules within these networks that are essential to producing a particular phenotype of interest. Among our key accomplishments are:

  • the first transcriptional network of a human cell (normal and tumor-related human B cells)
  • the first network representing all post-translational modulators of transcription factor activity in human cells
  • the first protein-protein interaction network using 3D protein structure information and functional evidence
  • discovery of an entirely novel regulatory layer of microRNA-mediated interactions between competitive endogenous RNA species

A critical component of our practice is the rigorous experimental validation of the predictions our computational algorithms generate. These steps have consistently shown that our algorithms exhibit accuracy and sensitivity comparable to, and often exceeding, those of medium- and high-throughput experimental assays. In all cases, our laboratory has been involved in the design of the algorithm, its validation strategy, and its implementation.

Our pipeline

Over the past decade, our laboratory has developed an integrated toolbox of software for elucidating key features of regulatory networks. A critical input into these algorithms is gene expression data, which research in our lab and in other laboratories has shown to be the best indicator of what a cell is actually doing at the time of measurement. By subjecting cells to perturbations in a high-throughput fashion (using small-molecule or RNAi screens, for example) and observing how these perturbations lead to differential gene expression patterns, we gain insights into how regulatory networks are organized.

Our computational pipeline has evolved in a programmatic way, beginning with a method for distinguishing interactions between transcription factors and genes on a genome-wide scale, and then adding further layers of analysis to provide an increasingly high-resolution model of how these individual interactions are organized in context-specific networks. The following are some key tools that we have developed.

Algorithm for the Reconstruction of Accurate Cellular Networks

Using microarray expression profiles as an input, ARACNe applies methods based on information theory to identify high-probability interactions between transcription factors and their target genes, and to eliminate the vast majority of indirect interactions typically inferred by pairwise statistical analysis. It is specifically designed to scale up to the complexity of regulatory networks in mammalian cells, yet is general enough to address a wider range of network deconvolution problems.
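The core idea can be illustrated with a small sketch. The ARACNE paper listed under Related publications describes scoring gene pairs by mutual information (MI) and then applying the data processing inequality: within any fully connected triplet, the weakest edge is treated as indirect and removed. The Python below is a simplified illustration under stated assumptions, not the lab's implementation: the histogram MI estimator, the fixed threshold, the tolerance rule, the function names, and the simulated TF → geneA → geneB chain are all hypothetical choices made for the example.

```python
import itertools
import math
import random
from collections import Counter


def mutual_information(x, y, bins=8):
    """Plug-in (histogram) estimate of mutual information, in nats."""
    def discretize(values):
        lo, hi = min(values), max(values)
        width = (hi - lo) / bins or 1.0  # guard against constant profiles
        return [min(int((v - lo) / width), bins - 1) for v in values]

    bx, by = discretize(x), discretize(y)
    n = len(x)
    px, py, pxy = Counter(bx), Counter(by), Counter(zip(bx, by))
    return sum(c / n * math.log(c * n / (px[i] * py[j]))
               for (i, j), c in pxy.items())


def prune_indirect_edges(mi, threshold, tolerance=0.0):
    """Apply the data processing inequality to every fully connected triplet.

    mi maps frozenset({gene_a, gene_b}) to an MI value. Edges at or below
    `threshold` are dropped first; then the weakest edge of each remaining
    triangle is marked as indirect. Marks are applied only at the end, so
    the result does not depend on the order in which triplets are visited.
    """
    genes = sorted(set().union(*mi))
    candidates = {edge for edge, value in mi.items() if value > threshold}
    removed = set()
    for triplet in itertools.combinations(genes, 3):
        edges = [frozenset(pair) for pair in itertools.combinations(triplet, 2)]
        if all(edge in candidates for edge in edges):
            weakest = min(edges, key=mi.get)
            others = min(mi[e] for e in edges if e != weakest)
            if mi[weakest] < (1 - tolerance) * others:
                removed.add(weakest)
    return candidates - removed


# Simulated chain (assumption): TF drives geneA, and geneA drives geneB.
rng = random.Random(0)
tf = [rng.gauss(0, 1) for _ in range(3000)]
gene_a = [v + rng.gauss(0, 0.5) for v in tf]
gene_b = [v + rng.gauss(0, 0.5) for v in gene_a]
profiles = {"TF": tf, "geneA": gene_a, "geneB": gene_b}

mi = {frozenset(pair): mutual_information(profiles[pair[0]], profiles[pair[1]])
      for pair in itertools.combinations(profiles, 2)}
network = prune_indirect_edges(mi, threshold=0.05)
```

On this simulated chain, a pairwise analysis alone would link TF to geneB; the pruning step is expected to discard that edge as indirect because it carries the least information in the triplet.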

Algorithm for the genome-wide discovery of modulators of transcriptional interactions

MINDy identifies multivariate statistical dependencies between a transcription factor and one or more of its targets, conditional on the presence (or absence) of a candidate modulator gene. In this way it enables the systematic identification of genes that modulate a transcription factor’s transcriptional program at the post-translational level; i.e., genes that encode proteins that affect the TF’s activity without changing its mRNA abundance. The results can also include proteins that do not physically interact with the TF, such as those in its upstream signaling pathways.
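One way to picture the conditional dependency test is to compare how much information a TF and a target share when a candidate modulator is abundant versus scarce. The sketch below is a hedged illustration of that comparison only: the histogram MI estimator, the 0.35 sample fraction, the function names, and the simulated data are assumptions introduced for the example, and the significance testing a real analysis needs is omitted.

```python
import math
import random
from collections import Counter


def mutual_information(x, y, bins=6):
    """Plug-in (histogram) estimate of mutual information, in nats."""
    def discretize(values):
        lo, hi = min(values), max(values)
        width = (hi - lo) / bins or 1.0  # guard against constant profiles
        return [min(int((v - lo) / width), bins - 1) for v in values]

    bx, by = discretize(x), discretize(y)
    n = len(x)
    px, py, pxy = Counter(bx), Counter(by), Counter(zip(bx, by))
    return sum(c / n * math.log(c * n / (px[i] * py[j]))
               for (i, j), c in pxy.items())


def modulation_score(tf, target, modulator, fraction=0.35, bins=6):
    """MI(TF; target) in the samples with the highest modulator expression
    minus MI(TF; target) in the samples with the lowest."""
    order = sorted(range(len(modulator)), key=modulator.__getitem__)
    n = int(len(order) * fraction)
    low, high = order[:n], order[-n:]

    def mi_on(samples):
        return mutual_information([tf[s] for s in samples],
                                  [target[s] for s in samples], bins)

    return mi_on(high) - mi_on(low)


# Simulated data (assumption): the TF drives its target only when the
# modulator is abundant; the bystander gene has no effect at all.
rng = random.Random(1)
n_samples = 3000
tf = [rng.gauss(0, 1) for _ in range(n_samples)]
modulator = [rng.random() for _ in range(n_samples)]
bystander = [rng.random() for _ in range(n_samples)]
target = [t * (m > 0.5) + rng.gauss(0, 0.3) for t, m in zip(tf, modulator)]

score_modulator = modulation_score(tf, target, modulator)
score_bystander = modulation_score(tf, target, bystander)
```

Note that the TF's own expression is sampled independently of the modulator here, which mirrors the requirement that a modulator changes the TF's activity without changing its mRNA abundance. A large positive score for the modulator and a score near zero for the bystander would be the expected pattern.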

Master Regulator Inference Algorithm

MARINa analyzes gene expression profile data through the lens of genome-wide interaction models produced using software such as ARACNe or MINDy. Its purpose is to infer transcription factors that 1) control the transition from one phenotype to another, and 2) maintain the second phenotype. These “master regulator” genes produce transcription factors whose targets within the regulatory network are the most differentially expressed genes between the two cellular phenotypes. Repeated applications of MARINa across a wide range of diseases and cell types have demonstrated that inhibiting master regulators causes a total collapse of the disease phenotype because of the essential roles they play.
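The ranking logic described above (regulators whose targets are the most differentially expressed genes) can be sketched as a gene set enrichment test. The example below is an illustration under stated assumptions rather than MARINa itself: it uses a Welch t-statistic as the differential expression signature and an unweighted Kolmogorov-Smirnov running sum as the enrichment score, it ignores whether each target is activated or repressed, and it skips permutation-based significance. The regulons, gene names, and simulated phenotypes are hypothetical.

```python
import math
import random
from itertools import accumulate
from statistics import mean, variance


def differential_signature(group_a, group_b):
    """Welch t-statistic per gene. Each group maps gene -> list of values."""
    signature = {}
    for gene in group_a:
        a, b = group_a[gene], group_b[gene]
        std_err = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
        signature[gene] = (mean(b) - mean(a)) / std_err
    return signature


def enrichment_score(signature, targets):
    """Largest deviation of a running sum over genes ranked by |t|.

    The sum steps up at each target and down at every other gene, so a
    regulon concentrated at the top of the ranking scores close to 1.
    """
    ranked = sorted(signature, key=lambda g: -abs(signature[g]))
    hit = 1 / len(targets)
    miss = 1 / (len(ranked) - len(targets))
    running = accumulate(hit if g in targets else -miss for g in ranked)
    return max(running, key=abs)


# Simulated data (assumption): only TF_A's targets differ between phenotypes.
rng = random.Random(2)
genes = [f"g{i}" for i in range(200)]
regulons = {"TF_A": set(genes[:20]), "TF_B": set(genes[100:120])}
normal = {g: [rng.gauss(0, 1) for _ in range(10)] for g in genes}
disease = {g: [rng.gauss(3 if g in regulons["TF_A"] else 0, 1) for _ in range(10)]
           for g in genes}

signature = differential_signature(normal, disease)
scores = {tf: enrichment_score(signature, targets)
          for tf, targets in regulons.items()}
candidates = sorted(scores, key=scores.get, reverse=True)
```

In this toy setting TF_A should rank first as the candidate master regulator, because its regulon sits at the top of the differential expression ranking while TF_B's targets are scattered through the rest.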

Virtual Inference of Protein-activity by Enriched Regulon Analysis

VIPER uses computational methods to infer protein activity, on an individual sample basis, from gene expression profile data. It analyzes these data within the context of ARACNe- and MARINa-derived regulatory networks, using the expression of genes that are most directly regulated by a given protein, such as the targets of a transcription factor (TF), as an accurate reporter of its activity. VIPER offers an alternative to methods that use mRNA abundance as a sign of protein activity, because mRNA abundance fails to account for post-transcriptional and post-translational interactions that can affect a protein's activity.
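A toy example may help show why target expression can report activity when the regulator's own mRNA does not. The sketch below is not VIPER's enrichment method; as a stand-in, it z-scores each gene across samples and combines the signed z-scores of a regulator's targets into one score per sample. The function name, the regulon with its +1/-1 signs, and the expression values are hypothetical.

```python
import math
from statistics import mean, stdev


def infer_activity(expression, regulons):
    """Per-sample activity from signed target expression.

    expression: gene -> list of per-sample expression values.
    regulons: regulator -> {target gene: +1 (activated) or -1 (repressed)}.
    Returns regulator -> list of per-sample activity scores.
    """
    z = {}
    for gene, values in expression.items():
        mu, sd = mean(values), stdev(values)
        z[gene] = [(v - mu) / sd for v in values]

    n_samples = len(next(iter(expression.values())))
    activity = {}
    for regulator, targets in regulons.items():
        scale = math.sqrt(len(targets))
        activity[regulator] = [
            sum(sign * z[gene][s] for gene, sign in targets.items()) / scale
            for s in range(n_samples)
        ]
    return activity


# Toy data (assumption): TF1's mRNA is essentially flat across four samples,
# but its activated targets rise and its repressed target falls in the last two.
expression = {
    "TF1":   [2.0, 2.1, 2.0, 2.1],
    "geneA": [1.0, 1.2, 3.0, 3.1],  # activated by TF1
    "geneB": [2.0, 2.1, 4.0, 4.2],  # activated by TF1
    "geneC": [5.0, 5.1, 2.0, 2.2],  # repressed by TF1
}
regulons = {"TF1": {"geneA": 1, "geneB": 1, "geneC": -1}}

activity = infer_activity(expression, regulons)
```

Here `activity["TF1"]` is negative for the first two samples and positive for the last two, even though TF1's own expression barely changes, which is the kind of signal that mRNA abundance alone would miss.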

Driver-Gene Inference by Genetic-Genomic Information Theory

DIGGIT identifies genetic determinants of disease phenotypes by systematically exploring the regulatory/signaling networks that lie upstream of master regulators. By identifying genetic alterations that cause changes in the expression of master regulator genes, DIGGIT distinguishes between disease-driving genes and genetic alterations that are not upstream of master regulators and are therefore merely coincidental. The algorithm thus significantly reduces the number of testable hypotheses concerning the genetic origins of disease and provides mechanistic information about the network context in which genetic alterations lead to phenotypic changes.

Predicting Protein-Protein Interactions

In collaboration with Barry Honig, our lab also helped to develop the first algorithm that integrates perspectives from structural biology into systems biology. Using homology modeling principles developed in the Honig Lab for predicting protein structure, PrePPI computationally predicts whether two proteins are capable of interacting, eliminating predictions of structurally impossible interactions. In this way PrePPI provides information that can constrain interaction models generated using other means.

Reference networks available

Using the above approach, our laboratory has generated a growing number of downloadable regulatory models. These include interactomes for specific cancer subtypes as well as the transcriptional interactions involved in human B cells.

Related publications

Chiu HS, Llobet-Navas D, Yang X, Chung WJ, Ambesi-Impiombato A, Iyer A, Kim HR, Seviour EG, Luo Z, Sehgal V, Moss T, Lu Y, Ram P, Silva J, Mills GB, Califano A, Sumazin P. Cupid: simultaneous reconstruction of microRNA-target and ceRNA networks. Genome Res. 2015 Feb;25(2):257-67.

Chen JC, Alvarez MJ, Talos F, Dhruv H, Rieckhof GE, Iyer A, Diefes KL, Aldape K, Berens M, Shen MM, Califano A. Identification of causal genetic drivers of human disease through systems-level analysis of regulatory networks. Cell. 2014 Oct 9;159(2):402-14.

Sumazin P, Yang X, Chiu HS, Chung WJ, Iyer A, Llobet-Navas D, Rajbhandari P, Bansal M, Guarnieri P, Silva J, Califano A. An extensive microRNA-mediated network of RNA-RNA interactions regulates established oncogenic pathways in glioblastoma. Cell. 2011 Oct 14;147(2):370-81.

Lefebvre C, Rajbhandari P, Alvarez MJ, Bandaru P, Lim WK, Sato M, Wang K, Sumazin P, Kustagi M, Bisikirska BC, Basso K, Beltrao P, Krogan N, Gautier J, Dalla-Favera R, Califano A. A human B-cell interactome identifies MYB and FOXM1 as master regulators of proliferation in germinal centers. Mol Syst Biol. 2010 Jun 8;6:377.

Margolin AA, Nemenman I, Basso K, Wiggins C, Stolovitzky G, Dalla Favera R, Califano A. ARACNE: an algorithm for the reconstruction of gene regulatory networks in a mammalian cellular context. BMC Bioinformatics. 2006 Mar 20;7 Suppl 1:S7.