Abstract

This report summarises differential gene analysis as performed by the nf-core/differentialabundance pipeline.

Data

Samples

A summary of sample metadata is below:

Contrasts

Comparisons were made between sample groups defined using metadata columns, as described in the following table of contrasts:

Results

Counts

Input was a matrix of 63241 genes for 16 samples, reduced to 35292 genes after filtering for low abundance.

Exploratory analysis

Abundance value distributions

The following plots show the abundance value distributions of input matrices. A log2 transformation is applied where not already performed.

Box plots

Whiskers in the above boxplots show 1.5 times the inter-quartile range.
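As a rough illustration of the whisker rule, the sketch below computes whisker ends as the most extreme values lying within 1.5 times the inter-quartile range of the box. The function name and the toy values are hypothetical, and the quartile convention (Python's "inclusive" method) is an assumption; plotting libraries differ slightly in how they compute quartiles.

```python
import statistics

def whisker_limits(values, k=1.5):
    """Return (lower, upper) whisker ends for a box plot of `values`.

    Assumption: quartiles use the 'inclusive' method; other conventions
    give slightly different boxes.
    """
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - k * iqr
    upper_fence = q3 + k * iqr
    # Whiskers stop at the most extreme observations inside the fences;
    # anything beyond would be drawn as an individual point.
    inside = [v for v in values if lower_fence <= v <= upper_fence]
    return min(inside), max(inside)

# Hypothetical log2 abundance values with one extreme observation
toy_values = [1, 2, 3, 4, 5, 6, 7, 8, 30]
limits = whisker_limits(toy_values)
print(limits)  # (1, 8)
```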

Density plots

Sample relationships

Principal components plots

Principal components analysis was conducted based on the 500 most variable genes. Each component was annotated with its percent contribution to variance.
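The two steps described above, selecting the most variable genes and expressing each component's contribution as a percentage, can be sketched as follows. This is a minimal illustration rather than the pipeline's own code: the function names and toy data are hypothetical, and the eigenvalues are assumed to come from a separate PCA step that is not shown.

```python
import statistics

def most_variable_genes(matrix, n_top=500):
    """Rank genes by sample variance and return the names of the top n_top."""
    variances = {gene: statistics.variance(values) for gene, values in matrix.items()}
    ranked = sorted(variances, key=variances.get, reverse=True)
    return ranked[:n_top]

def percent_variance(eigenvalues):
    """Convert PCA eigenvalues (component variances) to percent contributions."""
    total = sum(eigenvalues)
    return [100.0 * ev / total for ev in eigenvalues]

# Hypothetical toy matrix: gene -> abundance in each of four samples
toy_matrix = {
    "g1": [5, 5, 5, 5],   # no variance
    "g2": [1, 9, 1, 9],   # high variance
    "g3": [4, 5, 4, 5],   # low variance
}
top_genes = most_variable_genes(toy_matrix, n_top=2)
contributions = percent_variance([8.0, 1.5, 0.5])  # hypothetical eigenvalues
print(top_genes)       # ['g2', 'g3']
print(contributions)   # [80.0, 15.0, 5.0]
```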

Variance stabilised (cellline)
Variance stabilised (condition)
Variance stabilised (treatment)
Normalised (cellline)
Normalised (condition)
Normalised (treatment)
Raw (cellline)
Raw (condition)
Raw (treatment)

Principal components/metadata associations

For the variance stabilised matrix, an ANOVA test was used to determine associations between continuous principal components and categorical covariates (including the variable of interest).

The resulting p values are illustrated below.

The variable ‘cellline’ shows an association with PC1 (80.3%) (p = 0.00). The variable ‘condition’ shows an association with PC1 (80.3%) (p = 0.00). The variable ‘treatment’ shows an association with PC3 (1.8%) (p = 0.01).
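To make the test concrete, a one-way ANOVA compares the spread of component scores between groups with the spread within groups. The sketch below computes only the F statistic and its degrees of freedom; turning F into a p value needs the F distribution, which is omitted here. The function name and the scores are hypothetical and are not taken from this report's data.

```python
def one_way_anova_f(scores, groups):
    """Return (F, df_between, df_within) for scores split by group label.

    Assumes at least two groups and some within-group variation.
    """
    by_group = {}
    for score, group in zip(scores, groups):
        by_group.setdefault(group, []).append(score)

    n_total = len(scores)
    grand_mean = sum(scores) / n_total
    group_means = {g: sum(v) / len(v) for g, v in by_group.items()}

    # Variation of group means around the grand mean
    ss_between = sum(len(v) * (group_means[g] - grand_mean) ** 2
                     for g, v in by_group.items())
    # Variation of observations around their own group mean
    ss_within = sum((x - group_means[g]) ** 2
                    for g, v in by_group.items() for x in v)

    df_between = len(by_group) - 1
    df_within = n_total - len(by_group)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Hypothetical PC1 scores for eight samples in two groups
pc1_scores = [-10, -9, -11, -10, 10, 11, 9, 10]
labels = ["A", "A", "A", "A", "B", "B", "B", "B"]
f_stat, df_b, df_w = one_way_anova_f(pc1_scores, labels)
print(round(f_stat, 6), df_b, df_w)
```

A large F, as in this toy case where the groups separate cleanly along the component, corresponds to a small p value.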

Clustering dendrograms

A hierarchical clustering of genes was undertaken based on the 500 most variable genes. Distances between genes were estimated from Spearman correlation and then used to produce a clustering via the ward.D2 method with hclust() in R.
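The distance step can be sketched as below. The conversion from correlation to distance (1 minus the Spearman coefficient) is an assumption, since the report does not state it, and all function names are hypothetical. The Ward clustering itself is not reproduced; in R the resulting distance matrix would be passed to hclust() with method = "ward.D2".

```python
import math

def average_ranks(values):
    """Rank values from 1, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def spearman_distance(x, y):
    """Assumed distance: 1 - Spearman rho (0 = same ordering, 2 = reversed)."""
    return 1.0 - pearson(average_ranks(x), average_ranks(y))

# Hypothetical abundance profiles
same_order = spearman_distance([1, 2, 3, 4], [10, 20, 30, 400])
reversed_order = spearman_distance([1, 2, 3, 4], [4, 3, 2, 1])
print(round(same_order, 6), round(reversed_order, 6))
```

Because Spearman correlation works on ranks, the large value 400 in the second profile has no more influence than any other top-ranked value.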

Variance stabilised (cellline)

Variance stabilised (condition)

Variance stabilised (treatment)

Normalised (cellline)

Normalised (condition)

Normalised (treatment)

Raw (cellline)

Raw (condition)

Raw (treatment)

Outlier detection

Outlier detection based on median absolute deviation was undertaken; the outlier scores are plotted below.

cellline

1 possible outlier was detected in groups defined by cellline: Atreated3

condition

1 possible outlier was detected in groups defined by condition: Atreated3

treatment

1 possible outlier was detected in groups defined by treatment: Atreated3
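For readers unfamiliar with scoring based on median absolute deviation (MAD), the general idea can be sketched as follows. This is an illustration under stated assumptions, not the pipeline's scoring code: the per-sample summary values, the sample names, the scaling constant 1.4826 and the cut-off of 5 are all hypothetical choices.

```python
import statistics

def mad_scores(values, constant=1.4826):
    """Score each value by its distance from the median in scaled MAD units.

    Assumes the values are not all identical (MAD would be zero).
    """
    med = statistics.median(values)
    mad = constant * statistics.median([abs(v - med) for v in values])
    return [abs(v - med) / mad for v in values]

# Hypothetical per-sample summary within one group, for example the mean
# correlation of each sample with the other samples in the group
summary = {"s1": 0.98, "s2": 0.97, "s3": 0.99, "s4": 0.60}
scores = dict(zip(summary, mad_scores(list(summary.values()))))

threshold = 5  # hypothetical cut-off
flagged = [name for name, score in scores.items() if score > threshold]
print(flagged)  # ['s4']
```

Using the median rather than the mean keeps the reference point stable even when the outlier itself is extreme.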

Differential analysis

The DESeq2 R package was used for differential analysis. P-values were adjusted with the Benjamini-Hochberg (BH) method to control the false discovery rate. Genes were considered differential if, for the respective contrast, the adjusted p-value was equal to or lower than 0.1 and the absolute log2 fold change was equal to or higher than 0 (that is, no minimum fold change was imposed).
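The adjustment and the calling rule can be sketched in a few lines. This is a simplified illustration and not DESeq2's implementation: it ignores missing p-values and any filtering DESeq2 applies before adjustment, and the function names and example p-values are hypothetical.

```python
def bh_adjust(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    n = len(p_values)
    # Walk from the largest p-value to the smallest
    order = sorted(range(n), key=lambda i: p_values[i], reverse=True)
    adjusted = [0.0] * n
    running_min = 1.0
    for position, idx in enumerate(order):
        rank = n - position  # 1-based rank in ascending order
        # Scale by n / rank, then enforce monotonicity with a running minimum
        running_min = min(running_min, p_values[idx] * n / rank)
        adjusted[idx] = running_min
    return adjusted

def is_differential(padj, log2_fold_change, alpha=0.1, min_abs_lfc=0.0):
    """Apply the thresholds stated in this report."""
    return padj <= alpha and abs(log2_fold_change) >= min_abs_lfc

# Hypothetical raw p-values for four genes
raw_p = [0.01, 0.04, 0.03, 0.005]
adj_p = bh_adjust(raw_p)
print([round(p, 4) for p in adj_p])  # [0.02, 0.04, 0.04, 0.02]
print(is_differential(adj_p[0], -0.3))  # True
```

With a fold change threshold of 0, the second condition is always satisfied, so the adjusted p-value alone decides the call.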

Differential gene counts

Adjusted

Unadjusted

Differential gene details

B versus A in cellline

Adjusted p values
Unadjusted p values

Atreated versus Acontrol in condition

Adjusted p values
Unadjusted p values

Btreated versus Bcontrol in condition

Adjusted p values
Unadjusted p values

Gene set analysis

GSEA

c5.all.v2023.2.Hs.symbols
B versus A in cellline
Atreated versus Acontrol in condition
Btreated versus Bcontrol in condition

Methods

Filtering

Filtering was carried out by selecting genes with an abundance of at least 1 in at least 1 sample.
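The filtering rule amounts to a simple per-gene check, sketched below. The function name, parameter names and toy matrix are hypothetical; the defaults mirror the thresholds stated above.

```python
def filter_low_abundance(matrix, min_abundance=1.0, min_samples=1):
    """Keep genes whose abundance reaches min_abundance in at least min_samples samples."""
    kept = {}
    for gene, values in matrix.items():
        n_passing = sum(1 for v in values if v >= min_abundance)
        if n_passing >= min_samples:
            kept[gene] = values
    return kept

# Hypothetical toy matrix: gene -> abundance in each of four samples
toy_counts = {
    "geneA": [0, 0, 0, 0],     # never detected, removed
    "geneB": [0, 3, 0, 0],     # detected in one sample, kept
    "geneC": [12, 8, 15, 9],   # detected everywhere, kept
}
filtered = filter_low_abundance(toy_counts)
print(sorted(filtered))  # ['geneB', 'geneC']
```

With both thresholds set to 1, only genes with no abundance of 1 or more in any sample are removed.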

Exploratory analysis

Differential analysis

Gene set analysis

GSEA

Appendices

All parameters

Software versions

Note: For a more detailed accounting of the software and commands used (including containers), consult the execution report produced as part of the ‘pipeline info’ for this workflow.

nf-core/differentialabundance: Citations

nf-core

Ewels PA, Peltzer A, Fillinger S, Patel H, Alneberg J, Wilm A, Garcia MU, Di Tommaso P, Nahnsen S. The nf-core framework for community-curated bioinformatics pipelines. Nat Biotechnol. 2020 Mar;38(3):276-278. doi: 10.1038/s41587-020-0439-x. PubMed PMID: 32055031.

Nextflow

Di Tommaso P, Chatzou M, Floden EW, Barja PP, Palumbo E, Notredame C. Nextflow enables reproducible computational workflows. Nat Biotechnol. 2017 Apr 11;35(4):316-319. doi: 10.1038/nbt.3820. PubMed PMID: 28398311.

Pipeline tools

  • GSEA

    Subramanian A, Tamayo P, Mootha VK, et al. Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc Natl Acad Sci U S A. 2005;102(43):15545-15550.

R packages

  • affy

    Gautier L, Cope L, Bolstad BM, Irizarry RA. affy: analysis of Affymetrix GeneChip data at the probe level. Bioinformatics. 2004;20(3):307-315.

  • DESeq2

    Love MI, Huber W, Anders S (2014). Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol. 15(12):550. PubMed PMID: 25516281; PubMed Central PMCID: PMC4302049.

  • ggplot2

    H. Wickham (2016). ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag New York.

  • GEOquery

    Davis S, Meltzer PS. GEOquery: a bridge between the Gene Expression Omnibus (GEO) and BioConductor. Bioinformatics. 2007;23(14):1846-1847.

  • limma

    Ritchie ME, Phipson B, Wu D, et al. limma powers differential expression analyses for RNA-sequencing and microarray studies. Nucleic Acids Res. 2015;43(7):e47.

  • optparse

    Trevor L Davis (2018). optparse: Command Line Option Parser.

  • plotly

    C. Sievert (2020). Interactive Web-Based Data Visualization with R, plotly, and shiny. Chapman and Hall/CRC Florida.

  • Proteus

    Gierlinski M, Gastaldello F, Cole C, Barton GJ. Proteus: an R package for downstream analysis of MaxQuant output. Bioinformatics; 2018.

  • R

    R Core Team (2017). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria.

  • RColorBrewer

    Erich Neuwirth (2014). RColorBrewer: ColorBrewer Palettes.

  • RMarkdown

    JJ Allaire and Yihui Xie and Jonathan McPherson and Javier Luraschi and Kevin Ushey and Aron Atkins and Hadley Wickham and Joe Cheng and Winston Chang and Richard Iannone (2022). rmarkdown: Dynamic Documents for R.

  • shinyngs

    Jonathan R Manning (2022). Shiny apps for NGS etc based on reusable components created using Shiny modules. Computer software. Vers. 1.5.3. Jonathan Manning, Dec. 2022. Web.

  • SummarizedExperiment

    Morgan M, Obenchain V, Hester J and Pagès H (2020). SummarizedExperiment: SummarizedExperiment container.

Software packaging/containerisation tools

  • Anaconda

    Anaconda Software Distribution. Computer software. Vers. 2-2.4.0. Anaconda, Nov. 2016. Web.

  • Bioconda

    Grüning B, Dale R, Sjödin A, Chapman BA, Rowe J, Tomkins-Tinch CH, Valieris R, Köster J; Bioconda Team. Bioconda: sustainable and comprehensive software distribution for the life sciences. Nat Methods. 2018 Jul;15(7):475-476. doi: 10.1038/s41592-018-0046-7. PubMed PMID: 29967506.

  • BioContainers

    da Veiga Leprevost F, Grüning B, Aflitos SA, Röst HL, Uszkoreit J, Barsnes H, Vaudel M, Moreno P, Gatto L, Weber J, Bai M, Jimenez RC, Sachsenberg T, Pfeuffer J, Alvarez RV, Griss J, Nesvizhskii AI, Perez-Riverol Y. BioContainers: an open-source and community-driven framework for software standardization. Bioinformatics. 2017 Aug 15;33(16):2580-2582. doi: 10.1093/bioinformatics/btx192. PubMed PMID: 28379341; PubMed Central PMCID: PMC5870671.

  • Docker

    Merkel, D. (2014). Docker: lightweight linux containers for consistent development and deployment. Linux Journal, 2014(239), 2. doi: 10.5555/2600239.2600241.

  • Singularity

    Kurtzer GM, Sochat V, Bauer MW. Singularity: Scientific containers for mobility of compute. PLoS One. 2017 May 11;12(5):e0177459. doi: 10.1371/journal.pone.0177459. eCollection 2017. PubMed PMID: 28494014; PubMed Central PMCID: PMC5426675.