nf-core/DifferentialAbundance

DifferentialAbundance_pipeline

nf-core/differentialabundance is a bioinformatics pipeline that can be used to analyse data represented as matrices, comparing groups of observations to generate differential statistics and downstream analyses. The pipeline supports RNA-seq data such as that generated by the nf-core rnaseq workflow, and Affymetrix arrays via .CEL files. Other types of matrix may also work with appropriate changes to parameters, and PRs to support additional specific modalities are welcomed.

  1. Optionally generate a list of genomic feature annotations using the input GTF file (if a table is not explicitly supplied).
  2. Cross-check matrices, sample annotations, feature set and contrasts to ensure consistency.
  3. Run differential analysis over all contrasts specified.
  4. Optionally run a differential gene set analysis.
  5. Generate exploratory and differential analysis plots for interpretation.
  6. Optionally build and (if specified) deploy a Shiny app for fully interactive mining of results.
  7. Build an HTML report based on R markdown, with interactive plots (where possible) and tables.

Set-up run

for those with more confidence this run can be prepared from scratch in a new directory.

  • the pipeline will taake ~40mins to run
  • for this reason there is a completed run in ~/workshop/nfDifferentialAbundance/, where you can go through the motions
  • a good chance to demonstrate the resume feature of nextflow
mkdir ~/workshop/DifferentialAbundance
cd ~/workshop/DifferentialAbundance
ls -l

optional you can use the nf-core launch command to build a launch command, but these instructions will be using a ’nextflow run’ command

You can check the ~/.nextflow/assets folders to see what is already installed

ls -l ~/.nextflow/assets/nf-core/

If you don’t see the pipeline, you can pull it from the nf-core website.

nextflow pull nf-core/differentialabundance

but this happeds automatically when running the nextflow run nf-core/differentialabundance

nfpull

to identify the requirements of the pipeline go to the website https://nf-co.re/differentialabundance/1.5.0/docs/usage/ The minimal input requirements are

  1. Sample sheet, containg the sample information, metadata and group relationships
  2. A counts table, such as that made with the RNAseq pipeline
  3. a transcript length table - this is output from the RNAseq pipeline. It allows for more acurate normalisation based on transcript length.
  4. profile, refering to the config file and the experiment type. rnaseq, singularity.
  5. a gtf file - ideally this is the same genome reference that was used in the mapping step of the nf-core/RNAseq run
  6. optional. gsea gene set for gsea analysis. this can be downloaded from https://www.gsea-msigdb.org/gsea/msigdb/human/collections.jsp

For those who just want to get the result, the previous run can be resumed with the following command

cd ~/workshop/nfDifferentialAbundance
sh nf_differentialabundance.sh

multiQC summary

link to multiQC

R Shiny App

  • one of the key outputs from the nf-core/DifferentialAbundance pipeline is an Shiny app.
  • This runs R processes in the background and presents the data as a site.
  • One way to run the app is to kick it off in R studio, to do this you can download the App directory and use the following R code to open it.

Launching the App on your local R Studio

option 1

The easiest way to run your R shiny app is using R studio.

Firstly, you will need to download the nf-core/DifferentialAbundance outputs which you have generated to your pc using scp.

IP=[yourIPaddress]

mkdir -p  ~/workshop_RNAseq/nfDifferentialAbundance 
scp -r workshop@$IP:/home/workshop/workshop/nfDifferentialAbundance/outs/ ~/workshop_RNAseq/nfDifferentialAbundance/

cd ~/workshop_RNAseq/nfDifferentialAbundance/outs

Open R studio, and paste the following R code. This also requires installation of a few packages in Rstudio including the *shinyngs package. The installation code is hashed out in this case.

Within R studio, in the top left panel you can paste this code

##To run shiny App
# run this in R studio

# install packages
install.packages("remotes")
remotes::install_github("pinin4fjords/shinyngs")
library(remotes)
library(shinyngs)
library(markdown)

# if you have downloaded the app locally then you will need to add your path to the working directory
# if you downloaded it to the directory described above, this is where it will be  '~/workshop_RNAseq/nfDifferentialAbundance'
setwd("/Users/danielthomson/workshop_RNAseq/nfDifferentialAbundance/outs/shinyngs_app/SAGC_Workshop_RNAseq")

#####
esel <- readRDS("data.rds") # you need to navigate to the 'shinyngs_app' directory
app <- prepareApp("rnaseq", esel)
shiny::shinyApp(app$ui, app$server)

Using Rstudio via Nectar

option 2

You can also run Rstudio from a virual machine. In some cases this is preferable when dealing with large datasets, where you can control the compute resources used. \

settingDescription
Hostuse the instance IP address, or the hostname [hostname].[project].cloud.edu.au
Loginuse the username you created in the Configure Application dialog

you shoulud see the following login page, access with your passoword
Nectar Rstudio

Within R studio, in the top left panel you can paste this code, making sure the path to the app directory is correct.

##To run shiny App
# run this in R studio

#######################
# install packages
#install.packages("remotes")
#if (!require("BiocManager", quietly = TRUE))
#  install.packages("BiocManager")
#BiocManager::install(c("SummarizedExperiment", "GSEABase", "limma"), force = TRUE)
#devtools::install_github('pinin4fjords/shinyngs', force = TRUE)

library(remotes)
library(shinyngs)
library(markdown)

#######################

# navigate to where the shiny app is
# if you are working on Rstudio running on your nectar instance, then this should be in the 'out' directory from where you ran the nf-core/DiferentialAbundance pipeline
setwd("/home/workshop/workshop/nfDifferentialAbundance/outs/shinyngs_app/SAGC_Workshop_RNAseq/")

#####
esel <- readRDS("data.rds") # you need to navigate to the 'shinyngs_app' directory
app <- prepareApp("rnaseq", esel)
shiny::shinyApp(app$ui, app$server)

Interractive RNAseq data analysis

This previous step is worth the hastle, because it gives you access to the R Shiny App with all your results available.

From here, you will see all the parameters you set up in your nextflow run kickoff script. And it will give you access to the data analysis.

Once you’ve made it to this point take some time to navigate around. You will see interactive versions of many key differential expression analysis tools. In the background, it is being run in your Rstudio session.

Now that we have done the combined work of what would take days writing R code for DEseq2 and ggplot2, there is a fair bit to unpack and understand
We’ll walk through some of the key analysis, results.

You can see from the ‘Experimental data’ page, all of the information came from the sample sheet provided to nf-core/DifferentialAbudance.

cd home/workshop/workshop/nfDifferentialAbundance/
cat SampleSheet.csv 

sample,fastq_1,fastq_2,treatment,cellline,condition Acontrol1,,,control,A,Acontrol Acontrol2,,,control,A,Acontrol Acontrol3,,,control,A,Acontrol Acontrol4,,,control,A,Acontrol Atreated1,,,treated,A,Atreated Atreated2,,,treated,A,Atreated Atreated3,,,treated,A,Atreated Atreated4,,,treated,A,Atreated Bcontrol1,,,control,B,Bcontrol Bcontrol2,,,control,B,Bcontrol Bcontrol3,,,control,B,Bcontrol Bcontrol4,,,control,B,Bcontrol Btreated1,,,treated,B,Btreated Btreated2,,,treated,B,Btreated Btreated3,,,treated,B,Btreated Btreated4,,,treated,B,Btreated

cat contrasts.csv 
id,variable,reference,target
cellline,cellline,A,B
treatedVScontrol1,condition,Acontrol,Atreated
treatedVScontrol2,condition,Bcontrol,Btreated

There is the opportunity to add as many extra columns to this table, which will all provide extra possible comparisons to the app.

Looking at the ‘row metadata’ page, you can see that all the gene information comes from the gtf file which we used.
For well annotated genomes like Hg38 (human) there is alot of extra information that can be pulled out.

head -n 10 genome.gtf
head -n 10 genome.gtf

normalisation

PCA

cluster

foldchange

PDX1