EviCor supports simple user requests regarding the correlation of molecular profiles in large-scale datasets.
This may be helpful in a range of contexts, from evaluating agreement between omics platforms to identifying potential markers of drug response.
Data from two major public resources, The Cancer Genome Atlas (TCGA) and The Cancer Cell Line Encyclopedia (CCLE), can be explored as plots of gene, protein, and drug sensitivity profiles, or used to identify correlates of response to anti-cancer drugs. Furthermore, such correlations can be traced to clinical treatment profiles in TCGA. An extra feature of EviCor is network context, which is instrumental in both biomarker discovery and evaluation of candidate molecules.
Refer to this page for answers to your questions and instructions for some typical tasks.
Demo "2D"
Demo "gene X drug"
Demo "survival"
The EviCor website offers a variety of interactive plots for exploratory data analysis. Plots can be shared via automatically generated URLs.
Available plot types
The plot type is always determined by the data types of the chosen variables. The following table describes the available plot types and their dimensionality.
Plot type | Dimensionality | Description
Bar | 1D | Basic bar plot for categorical data
Pie | 1D | Alternative representation for categorical data
Histogram | 1D/2D | 1D: basic histogram; 2D: stacked histogram (by categories)
Venn | 2D | Venn diagram for categorical data, shows the intersection of 2 categories
Box | 2D/3D | Interactive box plots with whiskers and outliers. In the 3D case, a grouped box plot in which one of the groupings is represented via color
Kaplan-Meier plot | 2D/3D | Kaplan-Meier survival plots; numeric and categorical data can be used to define groups
Scatter | 2D/3D | 2D plot for numerical data. In the 3D case, the 3rd dimension is represented via data point color and/or shape; data for the 3rd dimension can be either numeric or categorical
Meta-codes
Data for plotting can be selected by codes (for TCGA), by tissue of origin (for CCLE), or by meta-codes. A meta-code is a synonym for a collection of codes; e.g. the "metastatic" meta-code corresponds to all TCGA codes that represent samples from metastases.
Available data transformations
EviCor allows the user to rescale data when working with numerical data types. The following transformations are available:
Scale | Description
Original | Keep values as they are stored in the EviCor DB
Square root (sqrt) | Apply square root transformation to data
Logarithmic (log) | Apply log transformation to data
Beta | Convert values into beta-scale (methylation data only)
M (M-value) | Convert values into M-values (methylation data only)
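The relation between the Beta and M scales can be illustrated with a minimal Python sketch. It assumes the commonly used definition M = log2(beta / (1 - beta)); this page does not state which formula EviCor applies internally, and the function names `beta_to_m` and `m_to_beta` are hypothetical.

```python
import math


def beta_to_m(beta: float) -> float:
    """Convert a methylation beta-value to an M-value.

    Assumes the common definition M = log2(beta / (1 - beta));
    EviCor's own implementation may differ.
    """
    if not 0.0 < beta < 1.0:
        raise ValueError("beta must lie strictly between 0 and 1")
    return math.log2(beta / (1.0 - beta))


def m_to_beta(m: float) -> float:
    """Inverse conversion: beta = 2^M / (2^M + 1)."""
    return 2.0 ** m / (2.0 ** m + 1.0)


# A half-methylated site maps to M = 0 under this definition.
print(beta_to_m(0.5))
# Round trip from beta to M and back to beta.
print(round(m_to_beta(beta_to_m(0.8)), 6))
```

Beta-values are bounded between 0 and 1, whereas M-values are unbounded, which is one reason the two scales can give different impressions of the same data.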
Plot legends
In many cases plot legends contain additional information. Some of this information is descriptive and pre-defined by the user (chosen data source etc.); some is calculated by R scripts, such as different types of correlations. Correlations, when available, are calculated for the data on the X and Y axes.
REST API
The EviCor web platform allows the user to create plots via the REST API. Refer to the respective tab for more details.
EviCor web platform can create predictive models from pre-loaded data.
Available models
At the moment EviCor supports only glmnet models for regression and binomial classification. glmnet is an implementation of the elastic net algorithm. You can find more information on glmnet, including the original publications, on the official CRAN page of the respective package.
Predictive variables
EviCor allows the user to combine variables from different data types for better performance. For example, you can try to predict gene expression of TP53 based on gene expression of a certain set of genes and copy number of another set of genes (these sets can intersect). Additionally, you can have predictive variables of the same data type (e.g. gene expression) for the same genes, but measured using different platforms (e.g. Agilent and Illumina HiSeq v.2). To use all IDs of a chosen platform as predictive variables, type [all] in the respective IDs input.
Transferring significant correlates
Once you have retrieved significant correlations via the "Correlates of drug response" tab, you may use some or all of them as predictive variables. EviCor allows you to transfer them to the "Multivariate models" tab using a built-in buffer. You can either add IDs one by one by pressing the "Add to clipboard" button (icon with two documents one above another) or add all unique IDs by pressing "Copy all IDs for transfering" in the table header. The buffer stores only unique records. Next, select the predictive variables input on the "Multivariate models" tab, then either pick the required IDs in the clipboard window and choose the option to transfer the chosen IDs, or copy all IDs.
Meta-variables
Just like meta-codes, EviCor allows the use of meta-variables. A meta-variable is a mnemonic code that stands for a large set of IDs. As of version 1.3.1, only one meta-variable is supported: [all]. Use it if you want, for example, to test all genes from a certain platform as predictive variables.
Validation parameters
EviCor lets the user withhold part of the data for model validation. To use this option, tick the "Model validation" checkbox; you can then adjust the number of folds in k-fold validation and the portion of samples withheld for validation.
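How the two validation settings relate to each other can be sketched in Python. This is only an illustration under stated assumptions: this page does not document how EviCor actually partitions samples, and the function `split_samples`, its arguments, and the shuffling strategy are hypothetical.

```python
import random


def split_samples(sample_ids, validation_fraction, nfolds, seed=0):
    """Withhold a fraction of samples, then split the rest into k folds.

    Illustrative only: EviCor's real partitioning scheme is not
    described on this page.
    """
    ids = list(sample_ids)
    random.Random(seed).shuffle(ids)  # reproducible shuffle
    n_holdout = int(round(len(ids) * validation_fraction))
    holdout, training = ids[:n_holdout], ids[n_holdout:]
    # Deal the training samples into nfolds folds of near-equal size.
    folds = [training[i::nfolds] for i in range(nfolds)]
    return holdout, folds


holdout, folds = split_samples(range(100), validation_fraction=0.1, nfolds=10)
print(len(holdout), [len(fold) for fold in folds])
```

In this sketch the withheld samples never enter any fold, so they remain an independent test set, while the folds are used only for cross-validation during training.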
Exploring model creation results
After the model has been successfully created, the user will see a multi-tab dialog. This dialog shows the training procedure (refer to the glmnet documentation for an explanation) and the correlation of predicted vs actual values on the training and, if validation was selected, testing data sets.
Performance metrics
EviCor calculates a number of performance metrics on the training and, if the validation option is selected, testing data sets.
Using generated models in your pipeline
The generated model can be downloaded by pressing the respective button and loaded into the user's R environment (model name: model). Refer to the glmnet documentation for model usage tips.
Known compatibility issues
This site works best with Google Chrome (v. 71 or later) and Mozilla Firefox (v. 60 or later). It is also compatible with Edge.
Some functions do not work with IE 11.
The site also works with Apple Safari (v. 12.1, macOS Mojave v10.14.6); other versions of the Safari browser may have compatibility issues.
EviCor offers a REST API for all functional tabs. Some functions, such as batch jobs, are available only via the REST API. All scripts are accessible through https://www.evicor.org/cgi/script_name.
Retrieving correlations
Script name: cor_datatables_json.cgi
Returns: correlations in JSON format
Parameters:
- source: mandatory, correlation source (TCGA/CCLE);
- datatype: mandatory, data type (e.g. GE, COPY...). Can be set to all;
- cohort: mandatory, data cohort, can be set to all for TCGA, must be set to all for CCLE;
- platform: mandatory, correlation platform (e.g. Agilent), can be set to all;
- screen: mandatory, correlation screen (e.g. GDSC1), can be set to all for CCLE, must be set to all for TCGA;
- id: mandatory, gene/protein/pathway/drug name;
- fdr: mandatory, FDR threshold, use a dot as the decimal separator;
- mindrug: mandatory, minimal number of samples treated with the drug;
- data_columns: mandatory, comma-separated list of columns to retrieve from SQL:
  - for CCLE: gene,feature,ancova_q_1x,ancova_p_2x_cov1,ancova_p_2x_feature,ancova_q_2x_feature
  - for TCGA: gene,feature,followup,followup_part,q_drug,q_expr,q_interaction,n_patients,n_treated
- filter_columns: mandatory, columns to use for filtering (e.g. column with FDR values);
- concat_operator: mandatory if more than one value is specified in filter_columns, comma-separated list of logical operators used to generate the filtering conditions. For n columns in filter_columns this list should contain either n-1 operators (applied in order) or one operator (applied to all);
- limit_by: mandatory, name of the column by which results are sorted (in ascending order); the number of returned results is limited to 1000.
Example: https://www.evicor.org/cgi/cor_datatables_json.cgi?source=TCGA&datatype=GE&cohort=BRCA&platform=IlluminaHiSeq_RNASeqV2&screen=all&id=tamoxifen&fdr=0.005&mindrug=10&data_columns=gene,feature,followup,followup_part,q_drug,q_expr,q_interaction,n_patients,n_treated&filter_columns=q_expr,q_interaction&concat_operator=OR&limit_by=q_interaction
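A query like the one above can also be assembled programmatically. The following Python sketch only builds the URL from the documented parameters and does not contact the server; the helper name `correlations_url` is hypothetical, and commas are left unescaped to match the documented example.

```python
from urllib.parse import urlencode

BASE_URL = "https://www.evicor.org/cgi/"


def correlations_url(params: dict) -> str:
    """Build a cor_datatables_json.cgi query URL from a parameter dict."""
    # safe="," keeps commas in list-valued parameters unescaped,
    # as in the documented example URL.
    return BASE_URL + "cor_datatables_json.cgi?" + urlencode(params, safe=",")


params = {
    "source": "TCGA",
    "datatype": "GE",
    "cohort": "BRCA",
    "platform": "IlluminaHiSeq_RNASeqV2",
    "screen": "all",
    "id": "tamoxifen",
    "fdr": "0.005",
    "mindrug": 10,
    "data_columns": "gene,feature,followup,followup_part,q_drug,q_expr,"
                    "q_interaction,n_patients,n_treated",
    "filter_columns": "q_expr,q_interaction",
    "concat_operator": "OR",
    "limit_by": "q_interaction",
}

url = correlations_url(params)
print(url)
```

The resulting URL can then be fetched with any HTTP client; the structure of the returned JSON is not described on this page, so it is best inspected before parsing.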
Usage tips: The concat_operator option may be confusing; the following short example is designed to clarify it. Suppose columns col1, col2 and col3 are chosen for filtering and the FDR threshold is 0.05. If concat_operator is set to OR,AND the resulting condition will be: (col1<0.05) OR (col2<0.05) AND (col3<0.05). Setting this parameter to AND will result in the following filtering condition: (col1<0.05) AND (col2<0.05) AND (col3<0.05).
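The rule for combining filter_columns with concat_operator can be mimicked in a few lines of Python. This is a sketch of the documented behaviour, not the server's actual code; the function `build_filter` is hypothetical.

```python
def build_filter(columns, operators, threshold):
    """Compose a filtering condition the way the documentation describes.

    A single operator is applied between all columns; otherwise
    exactly len(columns) - 1 operators are applied in order.
    """
    if len(operators) == 1:
        operators = operators * (len(columns) - 1)
    if len(operators) != len(columns) - 1:
        raise ValueError("expected one operator or len(columns) - 1 operators")
    parts = [f"({columns[0]}<{threshold})"]
    for operator, column in zip(operators, columns[1:]):
        parts.append(operator)
        parts.append(f"({column}<{threshold})")
    return " ".join(parts)


print(build_filter(["col1", "col2", "col3"], ["OR", "AND"], 0.05))
print(build_filter(["col1", "col2", "col3"], ["AND"], 0.05))
```

Note that no parentheses are added around pairs of conditions. If such a condition is evaluated by an SQL engine, AND normally binds more tightly than OR, so mixed operator lists may group differently from a left-to-right reading; how EviCor evaluates the condition is not stated here.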
Creating plots
Script name: rplot.cgi
Returns: plot file name and meta-info in JSON format
Parameters:
- type: mandatory, plot type (e.g. scatter);
- source: mandatory, data source (TCGA or CCLE);
- cohort: mandatory, cohort to use;
- datatypes: mandatory, comma-separated list of datatypes;
- platforms: mandatory, comma-separated list of platforms;
- ids: mandatory, list of gene/antibody/pathway/drug names; must match the length of datatypes/platforms; if a platform has no ids, the id should be left empty (e.g. tp53,,mdm2);
- codes: mandatory, TCGA codes (01, 06...) or metacodes (all, normal...) for TCGA, all or tissue name for CCLE;
- scales: mandatory for most plot types, axis scales (original...);
- surv_period: optional, used for KM plots only; part of the follow-up interval which should be used, 1 = 100% = full interval, 0.25 = 25% etc.; default is 1.
Example: https://www.evicor.org/cgi/rplot.cgi?type=bar&source=TCGA&cohort=BRCA&datatypes=CLIN&platforms=vital_status&ids=%2C&codes=&scales=
Usage tips: This script returns the plot filename and plot metadata. The plot can be either an HTML page (plotly plots, for the majority of plot types) or a PNG image (for Venn diagrams). If only the plot name is returned, this is a sign of an error. The REST API does not provide error explanations.
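Keeping ids aligned with datatypes and platforms is easy to get wrong by hand, so a small Python helper may be useful. The sketch below only builds a URL; the helper names `join_ids` and `rplot_url` are hypothetical, and the scatter plot values in the usage example are illustrative and have not been checked against the server.

```python
from urllib.parse import urlencode


def join_ids(ids):
    """Join ids with commas, leaving an empty slot where a platform has no id."""
    return ",".join(identifier or "" for identifier in ids)


def rplot_url(plot_type, source, cohort, datatypes, platforms, ids, codes, scales):
    """Build an rplot.cgi URL, checking that the per-variable lists line up."""
    if not (len(datatypes) == len(platforms) == len(ids)):
        raise ValueError("datatypes, platforms and ids must have the same length")
    query = {
        "type": plot_type,
        "source": source,
        "cohort": cohort,
        "datatypes": ",".join(datatypes),
        "platforms": ",".join(platforms),
        "ids": join_ids(ids),
        "codes": codes,
        "scales": ",".join(scales),
    }
    return "https://www.evicor.org/cgi/rplot.cgi?" + urlencode(query, safe=",")


# An id list with a gap, as in the documented tp53,,mdm2 example.
print(join_ids(["tp53", None, "mdm2"]))

# Illustrative values only.
print(rplot_url("scatter", "TCGA", "BRCA",
                ["GE", "GE"],
                ["illuminahiseq_rnaseqv2", "illuminahiseq_rnaseqv2"],
                ["tp53", "mdm2"], "all", ["original", "original"]))
```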
Creating models
Script name: model_predict.cgi
Returns: model name (without any file extensions)
Parameters:
- method: mandatory, method for creating models; at the moment only glmnet is supported;
- source: mandatory, data source for model creation (TCGA/CCLE);
- cohort: mandatory, cohort for model creation;
- multiopt: mandatory, TCGA codes or metacodes (for TCGA) or CCLE tissues;
- xdatatypes: mandatory, comma-separated list of data types for independent variables;
- xplatforms: mandatory, comma-separated list of platforms for independent variables;
- xids: mandatory, lists of ids (if applicable) for independent variables, left empty if ids do not exist for the chosen platforms (refer to the example for the format); can be set to [all];
- rdatatype: mandatory, datatype for the dependent (response) variable;
- rplatform: mandatory, platform for the dependent (response) variable;
- rid: mandatory, response variable id (if applicable, otherwise empty);
- family: mandatory for glmnet, glmnet family (gaussian/cox...), refer to the glmnet documentation for further explanation;
- measure: mandatory for glmnet with cross-validation, the loss used to pick the best model, refer to the glmnet documentation;
- alpha: mandatory for glmnet, mixing parameter describing the balance between lasso and ridge regression (1 = lasso, 0 = ridge);
- nlambda: mandatory for glmnet, number of lambda values, refer to the glmnet documentation;
- minlambda: mandatory for glmnet, min lambda value, refer to the glmnet documentation;
- validation: mandatory, whether cross-validation should be used (binary flag);
- validation_fraction: mandatory for glmnet with the validation flag set to TRUE, numeric, defines the share of records to be used for independent validation; use a dot as the decimal separator;
- nfolds: mandatory for glmnet with the validation flag set to TRUE, number of folds for N-fold cross-validation (see cv.glmnet in the glmnet documentation);
- standardize: mandatory for glmnet, whether variables should be standardized (binary flag);
- stat_file: mandatory, filename to which performance metrics should be saved (use auto to assign the name automatically - see the table below);
- extended_output: mandatory, whether model creation parameters should be saved to the specified stat_file;
- header: mandatory, whether the specified stat_file should have a header; recommended to be set to TRUE. You can collect information on many models in one file; in this case set this parameter to TRUE only for the first query and make sure that all models belong to the same family and that the validation option is always the same, so that the columns are the same (extended_output should also be set the same for all the models).
Example: https://www.evicor.org/cgi/model_predict.cgi?method=glmnet&source=TCGA&cohort=BRCA&rdatatype=GE&rplatform=illuminahiseq_rnaseq&rid=TP53&xdatatypes=GE&xplatforms=illuminahiseq_rnaseq&xids=%5BA1CF%7CA2BP1%7CA4GALT%7CA4GNT%7CAAA1%7CAADAC%7CAADACL2%7CAADACL3%7CAADACL4%7CAADAT%7CAAK1%7CAANAT%7CAARS2%7CAASS%7CABCA4%7CABI2%7CABP1%7CABR%7CACAA1%7CACAN%7CACTA1%7CACTN1%7CADA%7CADAM7%7CADAP1%7CADAT2%7CADAT3%7CAFM%7CAGBL4%7CAHDC1%7CFOXA2%7CFOXB2%7CBCL2%7CTGFBI%7CTGFBRAP1%7CPTGFR%7CARID1A%7CARID1B%7CARID2%7CARID3B%7CARID4A%7CARID4B%7CCACNA1B%7CCACNA1C%7CCACNA2D1%7CCACNA2D2%7CCACNG2%7CCACNA2D4%7CCACNG3%7CCACNG4%7CCACNG7%7CCACNG8%7CVEGFC%7CHRAS%7CKRAS%7CNRAS%7CPDGFA%7CPDGFC%7CPDGFD%7CMDM1%7CMDM2%7CRBL1%7CRBL2%7CRBP1%7CCTRB1%7CPRB1%5D&multiopt=cancer&family=gaussian&measure=deviance&standardize=false&alpha=0.5&nlambda=10&minlambda=0.01&validation=true&nfolds=10&validation_fraction=0.1&stat_file=auto&extended_output=false&header=true
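In the example above, the xids value is a bracketed, pipe-separated list that has been percent-encoded (%5B, %7C and %5D stand for [, | and ]). A minimal Python sketch of producing that encoding for a single platform follows; the helper `format_xids` is hypothetical, and the format for several platforms at once is not shown on this page, so it is not covered.

```python
from urllib.parse import urlencode


def format_xids(ids):
    """Wrap ids for one platform as [ID1|ID2|...], as in the documented example."""
    return "[" + "|".join(ids) + "]"


# urlencode percent-encodes the brackets and pipes automatically.
query = urlencode({"xids": format_xids(["MDM2", "BCL2", "KRAS"])})
print(query)

# The [all] meta-variable is encoded the same way.
print(urlencode({"xids": "[all]"}))
```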
Usage tips: Using just the model name, you can access a graphical interpretation of the created model, the RData model file, the coefficients (in JSON format) and the performance metrics (in JSON and csv formats). If the script returned the value 'model578045416464', you can access the following files in https://www.evicor.org/pics/plots/ :
File name | Meaning
model578045416464.RData | glmnet model for R environment
coeff.model578045416464.json | Model coefficients (with weights) in JSON format
model578045416464_model.png | Graphical abstract of the created model
model578045416464_training.png | Graphical abstract on training step
model578045416464_validation.png | Graphical abstract on validation step (if set)
model578045416464.csv | Performance metrics and some additional information in csv format with header
perf.model578045416464.json | Performance metrics in JSON format
If an error was thrown for this model, it is stored in the file model578045416464_error.json.
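The file naming scheme above lends itself to a small lookup helper. The Python sketch below derives all result URLs from a returned model name; the function `model_files` and its dictionary keys are hypothetical, while the file name patterns and base URL are those listed above.

```python
PLOTS_URL = "https://www.evicor.org/pics/plots/"


def model_files(model_name: str) -> dict:
    """Map each documented result type to its URL for the given model name."""
    return {
        "rdata": f"{PLOTS_URL}{model_name}.RData",
        "coefficients": f"{PLOTS_URL}coeff.{model_name}.json",
        "model_plot": f"{PLOTS_URL}{model_name}_model.png",
        "training_plot": f"{PLOTS_URL}{model_name}_training.png",
        "validation_plot": f"{PLOTS_URL}{model_name}_validation.png",
        "metrics_csv": f"{PLOTS_URL}{model_name}.csv",
        "metrics_json": f"{PLOTS_URL}perf.{model_name}.json",
        "error": f"{PLOTS_URL}{model_name}_error.json",
    }


for kind, link in model_files("model578045416464").items():
    print(kind, link)
```

Since the REST API does not explain errors directly, checking the error URL first is a reasonable way to find out whether model creation failed.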
Batch models creation
Script name: models_from_correlations_batch.cgi
Returns: error or nothing (done is returned only after all models are created, so don't use synchronous calls with it; a link to the results will be sent to the specified email)
Parameters:
- source: mandatory, data source (TCGA/CCLE) for both correlation retrieval and model creation;
- datatype: mandatory, same as datatype for cor_datatables_json.cgi; however, several comma-separated values are allowed, and their number should equal the number of predictor types;
- cohort: mandatory, same as cohort for cor_datatables_json.cgi; may be a comma-separated list whose length matches the datatype list;
- screen: mandatory, same as screen for cor_datatables_json.cgi; like datatype and cohort, may be a list;
- id: mandatory, same as id for cor_datatables_json.cgi; like the previous parameters, may be a list;
- fdr: mandatory, same as fdr for cor_datatables_json.cgi; like the previous parameters, may be a list;
- mindrug: mandatory, same as mindrug for cor_datatables_json.cgi; unlike the previous parameters, this one is always a single value;
- columns: mandatory, same as columns for cor_datatables_json.cgi;
- method: mandatory, same as method for model_predict.cgi;
- model_cohort: mandatory, same as cohort for model_predict.cgi;
- multiopt: mandatory, same as multiopt for model_predict.cgi;
- xdatatypes: mandatory, same as xdatatypes for model_predict.cgi;
- xplatforms: mandatory, same as xplatforms for model_predict.cgi;
- additional_xids: optional but recommended, specifies which ids should be passed to glmnet even if they do not pass the filters; leave blank if there are no additional xids;
- rdatatype: mandatory, same as rdatatype for model_predict.cgi;
- rplatform: mandatory, same as rplatform for model_predict.cgi;
- rid: mandatory, same as rid for model_predict.cgi;
- family: mandatory for glmnet, same as family for model_predict.cgi;
- measure: mandatory for glmnet, same as measure for model_predict.cgi;
- alpha: mandatory for glmnet, same as alpha for model_predict.cgi;
- nlambda: mandatory for glmnet, same as nlambda for model_predict.cgi;
- minlambda: mandatory for glmnet, same as minlambda for model_predict.cgi;
- validation: mandatory, same as validation for model_predict.cgi;
- validation_fraction: mandatory for glmnet with the validation flag set to TRUE, same as validation_fraction for model_predict.cgi;
- nfolds: mandatory for glmnet with the validation flag set to TRUE, same as nfolds for model_predict.cgi;
- standardize: mandatory, same as standardize for model_predict.cgi;
- iter: mandatory, desired number of models;
- stat_file: mandatory, name of the file, without extension, in which to save your data (try to give your stat_file a unique name);
- extended_output: mandatory, binary flag; if set to TRUE, some additional info will be written into stat_file (such as source, rdatatype etc.);
- mail: mandatory, provide a correct email address; links to the results will be sent to it when your batch job is done.
Example: https://www.evicor.org/cgi/models_from_correlations_batch.cgi?source=TCGA&datatype=GE&cohort=LUAD&platform=all&screen=all&id=gemcitabine&fdr=0.05&mindrug=10&columns=gene,feature,followup_part,interaction,drug,expr,n_patients,n_treated,followup,q&iter=3&model_cohort=LUAD;method=glmnet&rdatatype=CLIN&rplatform=os&xdatatypes=GE&xplatforms=illuminahiseq_rnaseqv2&additional_xids=&multiopt=01&family=cox&measure=deviance&standardize=false&alpha=1&nlambda=10&minlambda=0.01&validation=true&nfolds=10&validation_fraction=0.1&stat_file=/your_file/&mail=/your_email/
Usage tips: Always try to specify a unique name for your output file! Files are created in append mode, so if you (or someone else) have used the specified file name before, you will get "mixed" results! When the job is done, the results will be available in the file https://www.evicor.org/pics/plots/your_file.csv.
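One way to reduce the risk of colliding with an existing output file is to add a random suffix to the stat_file name. The Python sketch below is an assumption-laden illustration: the helpers `unique_stat_file` and `results_url` are hypothetical, and it is assumed, not confirmed here, that the server accepts underscores and hexadecimal characters in file names.

```python
import uuid

PLOTS_URL = "https://www.evicor.org/pics/plots/"


def unique_stat_file(prefix: str) -> str:
    """Append a random hex suffix so the name is unlikely to have been used before."""
    return f"{prefix}_{uuid.uuid4().hex}"


def results_url(stat_file: str) -> str:
    """URL where the batch results are expected once the job is done."""
    return f"{PLOTS_URL}{stat_file}.csv"


stat_file = unique_stat_file("luad_gemcitabine")
print(stat_file)
print(results_url(stat_file))
```

The generated name is passed as the stat_file parameter of the batch request, and the same name is used later to locate the results.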
Using JS API
All the aforementioned functions are available through the API specified in the drugs.js module. Some parameters (like columns) are specified in druggable_config.js.
Datatypes, platforms and screens
Below you can find valid names of datatypes, platforms, variables to use with REST API.
Datatypes
Platforms
Independent variables
Dependent variables
Network-based biomarker discovery and validation:
Marcela Franco, Ashwini Jeggari, Sylvain Peuget, Franziska Böttger, Galina Selivanova, Andrey Alexeyenko
Prediction of response to anti-cancer drugs becomes robust via network integration of molecular data
Sci Rep 9, 2379 (2019) doi: 10.1186/1471-2105-13-226.
EviNet web resource:
Ashwini Jeggari, Zhanna Alekseenko, José Dias, Johan Ericson, Andrey Alexeyenko
EviNet: a web platform for network enrichment analysis with flexible definition of gene sets
Nucleic Acids Res 2018 Jul 2;46(W1):W163-W170. doi: 10.1002/1878-0261.12350.
Version history:
1.0.0 (25th of March, 2020) - stable release: exploring correlations, creating and sharing plots, creating and downloading predictive models.
1.1.0 (8th of April, 2021) - stable release: improved functionality, model comparison, REST API.
1.2.0 (21st of June, 2021) - experimental version: improved functionality, more responsive design.
1.3.0 (10th of September, 2021) - stable version: improved REST API, improved performance, new data, style changes, new tools for working with plots and models.
1.3.1 (9th of November, 2021) - stable version: improved error handling, visual adjustments, updated documentation, bugfixes.
1.3.2 (22nd of December, 2021) - stable version: improved KM plots, updated legend format for all plots, model viewer window now uses plotly, visual improvements, bugfixes
1.3.3 (20th of January, 2022) - stable version: 3D boxplots, visual improvements, bugfixes, extended help