This report summarises differential gene analysis as performed by the nf-core/differentialabundance pipeline.
A summary of sample metadata is below:
Comparisons were made between sample groups defined using metadata columns, as described in the following table of contrasts:
Input was a matrix of 63241 genes for 16 samples, reduced to 35292 genes after filtering for low abundance.
The following plots show the abundance value distributions of input matrices. A log2 transformation is applied where not already performed.
Whiskers in the above boxplots extend to the most extreme values lying within 1.5 times the inter-quartile range of the box.
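The whisker rule described above can be sketched as follows. This is a minimal illustration, assuming Tukey-style boxplots whose quartiles use linear interpolation (R's default quantile type 7, as used by ggplot2); the report does not state which quantile definition its plotting code uses, and the function names are hypothetical.

```python
def quantile_type7(sorted_values, p):
    """Linear-interpolation quantile (R's default type 7) on pre-sorted data."""
    h = (len(sorted_values) - 1) * p
    lo = int(h)  # floor, since h >= 0
    hi = min(lo + 1, len(sorted_values) - 1)
    return sorted_values[lo] + (h - lo) * (sorted_values[hi] - sorted_values[lo])


def tukey_whiskers(values, coef=1.5):
    """Return (lower_whisker, upper_whisker, outliers) for a Tukey-style boxplot."""
    xs = sorted(values)
    q1 = quantile_type7(xs, 0.25)
    q3 = quantile_type7(xs, 0.75)
    iqr = q3 - q1
    lower_limit = q1 - coef * iqr
    upper_limit = q3 + coef * iqr
    # Whiskers stop at the most extreme observations still inside the limits
    inside = [x for x in xs if lower_limit <= x <= upper_limit]
    outliers = [x for x in xs if x < lower_limit or x > upper_limit]
    return inside[0], inside[-1], outliers


print(tukey_whiskers([1, 2, 3, 4, 5, 6, 7, 8, 9, 100]))  # → (1, 9, [100])
```

Points beyond the whiskers (here the value 100) are the ones drawn individually.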
Principal component analysis (PCA) was conducted based on the 500 most variable genes. Each component is annotated with its percentage contribution to the total variance.
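The percent-variance annotation can be illustrated with a dependency-free sketch. It assumes the input is a dict mapping each gene to its per-sample values on an already transformed scale, that "most variable" means highest sample variance, and that genes are centred but not scaled (as with R's `prcomp()` defaults); power iteration on the sample-by-sample Gram matrix stands in for the SVD a real implementation would use. All names are hypothetical.

```python
import math
import random


def sample_variance(xs):
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs) / (len(xs) - 1)


def pca_percent_variance(matrix, n_top=500, n_components=2, iters=500, seed=0):
    """Percent of variance captured by the leading PCs of the n_top most variable genes.

    matrix: dict mapping gene -> list of per-sample values (same sample order).
    """
    genes = sorted(matrix, key=lambda g: sample_variance(matrix[g]), reverse=True)[:n_top]
    n = len(matrix[genes[0]])
    # Centre each gene across samples (no scaling)
    centred = []
    for g in genes:
        mean = sum(matrix[g]) / n
        centred.append([x - mean for x in matrix[g]])
    # Sample-by-sample Gram matrix; its eigenvalues are the PC variances times (n - 1)
    gram = [[sum(row[i] * row[j] for row in centred) for j in range(n)] for i in range(n)]
    total = sum(gram[i][i] for i in range(n))
    rng = random.Random(seed)
    percents = []
    for _ in range(n_components):
        v = [rng.random() for _ in range(n)]
        for _ in range(iters):  # power iteration for the dominant eigenvector
            w = [sum(gram[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0:
                break
            v = [x / norm for x in w]
        # Rayleigh quotient gives the eigenvalue for the unit vector v
        eig = sum(v[i] * gram[i][j] * v[j] for i in range(n) for j in range(n))
        percents.append(100 * eig / total)
        # Deflate so the next pass finds the next component
        gram = [[gram[i][j] - eig * v[i] * v[j] for j in range(n)] for i in range(n)]
    return percents


example = {
    "g1": [1.0, 2.0, 3.0, 4.0],
    "g2": [2.0, 4.0, 6.0, 8.0],
    "g3": [4.0, 3.0, 2.0, 1.0],
}
print(pca_percent_variance(example))  # rank-one data: PC1 close to 100, PC2 close to 0
```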
For the variance stabilised matrix, an ANOVA test was used to determine associations between continuous principal components and categorical covariates (including the variable of interest).
The resulting p values are illustrated below.
The variable ‘cellline’ shows an association with PC1 (80.3% of variance; p < 0.01). The variable ‘condition’ also shows an association with PC1 (80.3%; p < 0.01). The variable ‘treatment’ shows an association with PC3 (1.8%; p = 0.01).
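The association test between one principal component and one categorical covariate can be sketched as a one-way ANOVA on the PC scores. This is an illustration only, assuming one score per sample and at least two samples in some level; the scores and levels below are hypothetical, and the p-value would come from the upper tail of an F distribution with (k - 1, n - k) degrees of freedom (for example `scipy.stats.f.sf(F, dfn, dfd)` if SciPy is available), which is omitted here to keep the sketch dependency-free.

```python
def one_way_anova(scores, groups):
    """One-way ANOVA of PC scores against a categorical covariate.

    Returns (F, df_between, df_within).
    """
    by_level = {}
    for score, level in zip(scores, groups):
        by_level.setdefault(level, []).append(score)
    n, k = len(scores), len(by_level)
    grand_mean = sum(scores) / n
    means = {level: sum(v) / len(v) for level, v in by_level.items()}
    # Between-group sum of squares: spread of the group means around the grand mean
    ss_between = sum(len(v) * (means[lvl] - grand_mean) ** 2 for lvl, v in by_level.items())
    # Within-group sum of squares: spread of samples around their own group mean
    ss_within = sum((x - means[lvl]) ** 2 for lvl, v in by_level.items() for x in v)
    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


pc_scores = [1.0, 2.0, 3.0, 7.0, 8.0, 9.0]  # hypothetical PC scores
levels = ["A", "A", "A", "B", "B", "B"]      # hypothetical covariate levels
print(one_way_anova(pc_scores, levels))  # → (54.0, 1, 4)
```

A large F (relative to its degrees of freedom) means the covariate levels separate cleanly along that component.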
A hierarchical clustering of genes was undertaken based on the 500 most variable genes. Distances between genes were derived from Spearman correlation and then used to produce a clustering via the ward.D2 method with hclust() in R.
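A pure-Python sketch of the clustering step follows. It assumes the correlation is turned into a distance as 1 - rho (the report does not state the exact transformation) and follows the behaviour documented for R's `hclust(method = "ward.D2")`: dissimilarities are squared before the Lance-Williams updates, and merge heights are reported on the original scale. The naive search over all pairs is for illustration only, and all function names are hypothetical.

```python
import math
from itertools import combinations


def average_ranks(values):
    """1-based ranks, with tied values sharing their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = (start + end) / 2 + 1
        start = end + 1
    return ranks


def spearman(x, y):
    """Spearman correlation as the Pearson correlation of average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    return sxy / math.sqrt(sxx * syy)  # undefined for constant profiles


def ward_d2(profiles):
    """Ward clustering on squared dissimilarities; returns (members_a, members_b, height) merges."""
    n = len(profiles)
    d2 = {}
    for i, j in combinations(range(n), 2):
        d = 1.0 - spearman(profiles[i], profiles[j])  # assumed distance transform
        d2[(i, j)] = d * d
    size = {i: 1 for i in range(n)}
    members = {i: (i,) for i in range(n)}
    merges = []
    next_id = n
    while len(size) > 1:
        (a, b), best = min(d2.items(), key=lambda item: item[1])
        merges.append((members[a], members[b], math.sqrt(best)))
        # Lance-Williams update of the squared distance to the merged cluster
        for k in size:
            if k in (a, b):
                continue
            dak = d2[(min(a, k), max(a, k))]
            dbk = d2[(min(b, k), max(b, k))]
            na, nb, nk = size[a], size[b], size[k]
            d2[(k, next_id)] = ((na + nk) * dak + (nb + nk) * dbk - nk * best) / (na + nb + nk)
        # Drop every pair involving the two merged clusters
        d2 = {pair: v for pair, v in d2.items() if a not in pair and b not in pair}
        size[next_id] = size.pop(a) + size.pop(b)
        members[next_id] = members.pop(a) + members.pop(b)
        next_id += 1
    return merges


print(ward_d2([[1, 2, 3, 4], [2, 4, 6, 8], [4, 3, 2, 1]]))
```

In this toy input the first two profiles have identical rank order, so they merge first at height 0; the anti-correlated third profile joins last.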
Outlier detection based on median absolute deviation (MAD) was undertaken; the outlier scores are plotted below.
1 possible outlier was detected in groups defined by cellline: Atreated3
1 possible outlier was detected in groups defined by condition: Atreated3
1 possible outlier was detected in groups defined by treatment: Atreated3
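The general idea of MAD-based outlier scoring within a group can be sketched as a robust z-score. The report does not spell out which per-sample statistic is scored, so this assumes each sample already has one summary value (for instance its median correlation with the other samples in its group, a hypothetical choice); the sample names, values, and threshold are likewise hypothetical.

```python
import statistics


def mad_scores(values, constant=1.4826):
    """Robust z-scores: distance from the median in units of the scaled MAD."""
    med = statistics.median(values)
    # constant=1.4826 matches the default scaling used by R's mad()
    scale = constant * statistics.median([abs(v - med) for v in values])
    if scale == 0:
        raise ValueError("MAD is zero; robust scores are undefined")
    return [(v - med) / scale for v in values]


def flag_outliers(per_sample_stat, threshold):
    """Return sample names whose robust score falls at or below a (hypothetical) threshold."""
    names = list(per_sample_stat)
    scores = mad_scores([per_sample_stat[name] for name in names])
    return [name for name, score in zip(names, scores) if score <= threshold]


# Hypothetical per-sample statistics for one group
group = {"s1": 0.98, "s2": 0.97, "s3": 0.99, "s4": 0.60}
print(flag_outliers(group, threshold=-5))  # → ['s4']
```

Because the median and MAD are barely moved by a single aberrant sample, the outlier stands out sharply instead of inflating the spread it is measured against.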
The DESeq2 R package was used for differential analysis. p-values were adjusted with the Benjamini-Hochberg (BH) method to control the false discovery rate. Genes were considered differential if, for the respective contrast, the adjusted p-value was equal to or lower than 0.1 and the absolute log2 fold change was equal to or higher than 0 (that is, no fold-change threshold was applied).
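The BH adjustment and the significance rule can be sketched as below. DESeq2's `results()` computes adjusted p-values itself (BH by default, after independent filtering, which can leave some adjusted values as NA), so this sketch only mirrors the BH step-up logic of R's `p.adjust(method = "BH")` and the thresholds stated in the report; the p-values and fold changes are hypothetical.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values (step-up, capped at 1)."""
    m = len(pvalues)
    # Walk from the largest p-value down, enforcing monotonicity with a running minimum
    order = sorted(range(m), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for step, i in enumerate(order):
        rank = m - step  # rank of p-value i in ascending order (1 = smallest)
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def is_differential(padj, log2fc, max_padj=0.1, min_abs_log2fc=0.0):
    """Report thresholds: padj <= 0.1 and |log2FC| >= 0; an NA (None) padj is never called."""
    return padj is not None and padj <= max_padj and abs(log2fc) >= min_abs_log2fc


pvalues = [0.01, 0.04, 0.03, 0.5]   # hypothetical raw p-values
log2fc = [1.2, -0.3, 0.0, 2.0]      # hypothetical log2 fold changes
padj = bh_adjust(pvalues)
print([round(q, 4) for q in padj])  # → [0.04, 0.0533, 0.0533, 0.5]
print([is_differential(q, l) for q, l in zip(padj, log2fc)])  # → [True, True, True, False]
```

With a minimum absolute log2 fold change of 0, the fold-change condition is always met, so calls depend only on the adjusted p-value.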
Filtering was carried out by selecting genes with an abundance of at least 1 in at least 1 sample.
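The low-abundance filter amounts to a simple count per gene. A minimal sketch, assuming the matrix is a dict of gene to per-sample abundances; the function and parameter names are hypothetical, with defaults set to the thresholds stated above.

```python
def filter_low_abundance(matrix, min_abundance=1.0, min_samples=1):
    """Keep genes with abundance >= min_abundance in at least min_samples samples."""
    return {
        gene: values
        for gene, values in matrix.items()
        if sum(v >= min_abundance for v in values) >= min_samples
    }


counts = {"geneA": [0, 0, 0], "geneB": [0, 3, 0], "geneC": [0.5, 0.2, 0]}  # hypothetical
print(sorted(filter_low_abundance(counts)))  # → ['geneB']
```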
Note: For a more detailed accounting of the software and commands used (including containers), consult the execution report produced as part of the ‘pipeline info’ for this workflow.
Ewels PA, Peltzer A, Fillinger S, Patel H, Alneberg J, Wilm A, Garcia MU, Di Tommaso P, Nahnsen S. The nf-core framework for community-curated bioinformatics pipelines. Nat Biotechnol. 2020 Mar;38(3):276-278. doi: 10.1038/s41587-020-0439-x. PubMed PMID: 32055031.
Di Tommaso P, Chatzou M, Floden EW, Barja PP, Palumbo E, Notredame C. Nextflow enables reproducible computational workflows. Nat Biotechnol. 2017 Apr 11;35(4):316-319. doi: 10.1038/nbt.3820. PubMed PMID: 28398311.
Subramanian A, Tamayo P, Mootha VK, et al. Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc Natl Acad Sci U S A. 2005;102(43):15545-15550.
Gautier L, Cope L, Bolstad BM, Irizarry RA. affy–analysis of Affymetrix GeneChip data at the probe level. Bioinformatics. 2004;20(3):307-315.
Love MI, Huber W, Anders S (2014). Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol. 15(12):550. PubMed PMID: 25516281; PubMed Central PMCID: PMC4302049.
H. Wickham (2016). ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag New York.
Davis S, Meltzer PS. GEOquery: a bridge between the Gene Expression Omnibus (GEO) and BioConductor. Bioinformatics. 2007;23(14):1846-1847.
Ritchie ME, Phipson B, Wu D, et al. limma powers differential expression analyses for RNA-sequencing and microarray studies. Nucleic Acids Res. 2015;43(7):e47.
Trevor L Davis (2018). optparse: Command Line Option Parser.
C. Sievert (2020). Interactive Web-Based Data Visualization with R, plotly, and shiny. Chapman and Hall/CRC Florida.
Gierlinski M, Gastaldello F, Cole C, Barton GJ. Proteus: an R package for downstream analysis of MaxQuant output. Bioinformatics; 2018.
R Core Team (2017). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria.
Erich Neuwirth (2014). RColorBrewer: ColorBrewer Palettes.
Allaire JJ, Xie Y, McPherson J, Luraschi J, Ushey K, Atkins A, Wickham H, Cheng J, Chang W, Iannone R (2022). rmarkdown: Dynamic Documents for R.
Jonathan R Manning (2022). Shiny apps for NGS etc based on reusable components created using Shiny modules. Computer software. Vers. 1.5.3. Jonathan Manning, Dec. 2022. Web.
Morgan M, Obenchain V, Hester J and Pagès H (2020). SummarizedExperiment: SummarizedExperiment container.
Anaconda Software Distribution. Computer software. Vers. 2-2.4.0. Anaconda, Nov. 2016. Web.
Grüning B, Dale R, Sjödin A, Chapman BA, Rowe J, Tomkins-Tinch CH, Valieris R, Köster J; Bioconda Team. Bioconda: sustainable and comprehensive software distribution for the life sciences. Nat Methods. 2018 Jul;15(7):475-476. doi: 10.1038/s41592-018-0046-7. PubMed PMID: 29967506.
da Veiga Leprevost F, Grüning B, Aflitos SA, Röst HL, Uszkoreit J, Barsnes H, Vaudel M, Moreno P, Gatto L, Weber J, Bai M, Jimenez RC, Sachsenberg T, Pfeuffer J, Alvarez RV, Griss J, Nesvizhskii AI, Perez-Riverol Y. BioContainers: an open-source and community-driven framework for software standardization. Bioinformatics. 2017 Aug 15;33(16):2580-2582. doi: 10.1093/bioinformatics/btx192. PubMed PMID: 28379341; PubMed Central PMCID: PMC5870671.
Merkel, D. (2014). Docker: lightweight Linux containers for consistent development and deployment. Linux Journal, 2014(239), 2. doi: 10.5555/2600239.2600241.
Kurtzer GM, Sochat V, Bauer MW. Singularity: Scientific containers for mobility of compute. PLoS One. 2017 May 11;12(5):e0177459. doi: 10.1371/journal.pone.0177459. eCollection 2017. PubMed PMID: 28494014; PubMed Central PMCID: PMC5426675.