MDP: Mutations and Drugs Portal

The Mutations and Drugs Portal (MDP) is a publicly accessible database that combines the cell-based NCI60 pharmacological screening with genomic data extracted from the Cancer Cell Line Encyclopedia (CCLE) and the NCI60 DTP projects. MDP currently contains drug sensitivity data for more than 50,800 compounds, describing responses to drugs across 115 cancer cell lines. To identify genomic features associated with drug response, cell line drug sensitivity data are integrated with large genomic datasets, including information on somatic mutations and transcriptional data. MDP can be queried for drugs active in cancer cell lines carrying mutations or transcriptional alterations in specific cancer genes and signaling pathways, or for genetic and transcriptional profiles associated with sensitivity or resistance to a given compound. Results are presented through graphical representations with links to related data and are fully downloadable. MDP provides a user-friendly web resource to perform in-silico high-throughput screenings of thousands of compounds and to facilitate the discovery of associations between genomic portraits and drug responses.

Methods

MDP has been constructed using the drug response file GI50 Data (September 2014 release) retrieved from the NCI60 DTP portal, together with sequencing data and variant classifications retrieved from the CCLE and NCI60 public repositories. Gene expression profiles for CCLE have been downloaded from the CCLE repository, and additional profiles from Gene Expression Omnibus (GEO) (GSE5720 for data obtained with the Affymetrix U133 microarray and GSE32474 for data obtained with the Affymetrix U133 Plus 2.0 microarray). The GI50 Data file contains a matrix of GI50 values, each computed for a given compound as minus the log10 of the IC50, i.e., the drug concentration required to inhibit the growth of treated cells by 50% relative to untreated controls. Prior to analysis, for each compound, the GI50 values are first transformed back to IC50 values, and the IC50 of each cell line is then normalized by dividing it by the average IC50 across all cell lines. The normalized IC50 (on a log2 scale) of a compound is used to classify the response for each combination of drug and cell line as i) good response, if the normalized log2 IC50 is lower than two standard deviations of the distribution of all log2 IC50 values in that cell line, or ii) bad response otherwise.
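As a rough illustration of the preprocessing described above, the Python sketch below converts one compound's GI50 values back to IC50, normalizes them by the mean IC50 across cell lines, and labels each drug and cell line pair as good or bad response. The function names and the dictionary layout are hypothetical and not part of MDP. The sketch also assumes that "lower than two standard deviations" means lower than the mean minus two (population) standard deviations of that cell line's values across all compounds; the exact threshold is not spelled out in the text.

```python
import math
import statistics


def normalized_log2_ic50(gi50_by_line):
    """Normalize one compound's response across cell lines.

    gi50_by_line: {cell_line: GI50}, where GI50 = -log10(IC50) as stated
    in the text. Returns {cell_line: log2(IC50 / mean IC50)}.
    """
    # Transform GI50 back to IC50 by undoing the -log10.
    ic50 = {line: 10 ** (-g) for line, g in gi50_by_line.items()}
    mean_ic50 = statistics.fmean(ic50.values())
    # Divide by the mean IC50 across all cell lines, then move to log2 scale.
    return {line: math.log2(v / mean_ic50) for line, v in ic50.items()}


def classify_responses(norm, n_sd=2.0):
    """Label each (compound, cell line) pair as "good" or "bad" response.

    norm: {compound: {cell_line: normalized log2 IC50}} (hypothetical layout).
    ASSUMPTION: the threshold is mean - n_sd * population SD of all values
    observed in that cell line across compounds.
    """
    # Collect, per cell line, the values of every compound tested on it.
    by_line = {}
    for values in norm.values():
        for line, v in values.items():
            by_line.setdefault(line, []).append(v)
    threshold = {
        line: statistics.fmean(vs) - n_sd * statistics.pstdev(vs)
        for line, vs in by_line.items()
    }
    # Good response below the per-cell-line threshold, bad response otherwise.
    return {
        compound: {
            line: "good" if v < threshold[line] else "bad"
            for line, v in values.items()
        }
        for compound, values in norm.items()
    }
```

Because the normalization divides by the mean IC50 on the linear scale, a compound with identical GI50 values in every cell line yields a normalized log2 IC50 of 0 for all of them.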

Statistical Analysis

In the "from gene to drug" analysis, starting from a selected set of one or more mutations, MDP identifies compounds with increased activity in cases (treated cell lines carrying the selected set of mutations) compared with controls (treated cell lines without that set of mutations).

In the "from signature to drug" analysis, MDP first calculates a signature score for each cell line by summing the standardized expression values of all genes in the gene signature. Cell lines are then labeled as high signature if the score is positive, or as low signature if it is negative. When cell lines with upregulated signature profiles are selected, the high-signature cell lines constitute the cases group and the low-signature cell lines constitute the controls group; the roles are reversed when cell lines with downregulated signature profiles are selected.

Compounds are then ranked with a scoring function defined as the fraction of good responses in cases multiplied by the fraction of bad responses in controls, so that a drug scores highly when good response is enriched in the case group. The statistical significance of this ranking (p-value) is computed with a one-tailed Fisher's exact test for the enrichment of good responses in cases relative to controls, based on the 2×2 table of good and bad responses in cases and controls.

In the "from drug to gene" analysis, the normalized log2 IC50 values of the selected compound are first used to split cell lines into two groups, those with good and those with bad response to that drug. Then, for each mutation, MDP calculates the fraction of cell lines carrying the mutation in the good-response set (cases) and the fraction of cell lines carrying the mutation in the bad-response set (controls).
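The three analyses in this section can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not MDP's implementation: the function names and data layout are hypothetical, genes are standardized with a population z-score across cell lines, genes with zero variance are skipped, cell lines with a signature score of exactly zero are left out of both groups, and the one-tailed Fisher's exact test is computed directly from the hypergeometric distribution (equivalent to a "greater" alternative on the 2×2 table of good and bad responses in cases and controls).

```python
import math
import statistics


def signature_groups(expr, genes, upregulated=True):
    """From signature to drug: split cell lines by signature score.

    expr: {gene: {cell_line: expression}} (hypothetical layout).
    Returns (cases, controls) as sets of cell line names.
    """
    lines = list(next(iter(expr.values())))
    score = {line: 0.0 for line in lines}
    for gene in genes:
        values = [expr[gene][line] for line in lines]
        mu = statistics.fmean(values)
        sd = statistics.pstdev(values)
        if sd == 0:
            continue  # assumption: a constant gene carries no signal
        for line, v in zip(lines, values):
            score[line] += (v - mu) / sd  # standardized expression
    high = {line for line, s in score.items() if s > 0}
    low = {line for line, s in score.items() if s < 0}  # score == 0: neither
    return (high, low) if upregulated else (low, high)


def fisher_one_tailed(a, b, c, d):
    """One-tailed Fisher's exact test on the 2x2 table
    [[a, b], [c, d]] = [[good cases, bad cases], [good controls, bad controls]].
    Returns P(X >= a), X = good responders among cases, margins fixed."""
    n_cases, n_good, n = a + b, a + c, a + b + c + d
    tail = sum(
        math.comb(n_good, x) * math.comb(n - n_good, n_cases - x)
        for x in range(a, min(n_cases, n_good) + 1)
    )
    return tail / math.comb(n, n_cases)


def rank_compound(cases, controls, response):
    """Score and p-value for one compound (cases and controls non-empty).

    response: {cell_line: "good" | "bad"} for that compound.
    Score = fraction of good responses in cases * fraction of bad in controls.
    """
    a = sum(response[line] == "good" for line in cases)
    b = len(cases) - a
    c = sum(response[line] == "good" for line in controls)
    d = len(controls) - c
    score = (a / len(cases)) * (d / len(controls))
    return score, fisher_one_tailed(a, b, c, d)


def mutation_fractions(response, mutated_lines):
    """From drug to gene: fraction of good- and bad-response cell lines
    that carry a given mutation (mutated_lines is a set of names)."""
    good = [line for line, r in response.items() if r == "good"]
    bad = [line for line, r in response.items() if r == "bad"]
    frac_good = sum(line in mutated_lines for line in good) / len(good) if good else 0.0
    frac_bad = sum(line in mutated_lines for line in bad) / len(bad) if bad else 0.0
    return frac_good, frac_bad
```

For example, three cases that all respond well against three controls that all respond badly give a score of 1.0 and a one-tailed p-value of 1/20 = 0.05, since only one of the C(6, 3) = 20 equally likely case assignments places all three good responders among the cases.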