Nextflow Introduction

John Salamon

Aug 28, 2024

A brief history of Nextflow

Problems Nextflow aims to solve

How Nextflow aims to solve these problems

Core concepts (Workflow, Channel, Process)

  • A Nextflow workflow explicitly describes how processes (tasks) are connected
    • Every process is a self-contained black box
    • A process can run as soon as all of its inputs are available
    • Processes run in parallel whenever their inputs allow it
  • Processes are connected by channels
    • Every input and output of a process is a channel
    • Usually channels contain files
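
To make the concepts above concrete, here is a minimal sketch of one process fed by one channel. The process name `COUNT_LINES` and the glob `data/*.txt` are hypothetical, chosen only for illustration. Nextflow would launch one task per file, and tasks with no dependencies between them can run in parallel.

```groovy
// Hypothetical process: counts the lines in each input file
process COUNT_LINES {
    input:
    path infile

    output:
    path "${infile}.count"

    script:
    """
    wc -l < ${infile} > ${infile}.count
    """
}

workflow {
    // A channel of files; 'data/*.txt' is a placeholder path
    ch_files = Channel.fromPath('data/*.txt')

    // Calling the process with a channel connects the two
    COUNT_LINES(ch_files)

    // The process output is itself a channel; print each emitted file
    COUNT_LINES.out.view()
}
```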

Core concepts (Operators)

// Piping
A | B | C

// Versus, more procedural style (same outcome)
ch_out = A()
ch_out_2 = B(ch_out)
C(ch_out_2)
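
Operators can be piped in the same way as processes. The following is a small sketch, assuming a channel of plain integers rather than files; `map` and `view` are built-in Nextflow operators, and the values are arbitrary.

```groovy
workflow {
    // Create a channel of values, double each one, then print the results
    Channel.of(1, 2, 3) | map { n -> n * 2 } | view
}
```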

Implementation details

// Use two forward slashes for single line comments
// Assign variables like this:
x = 2

// define Lists
myList = [1,2,3]
// and Maps
myMap = ["key": "value"]

// access object methods and attributes with a dot
myList.size() // returns 3

// parentheses can be omitted for single parameter functions 
// e.g., the following two lines are equivalent:
println("hello world")
println "hello world"

High level comparison with Snakemake

                    Nextflow          Snakemake
Language extends    Groovy            Python
DAG is defined      Explicitly        Implicitly
Root of graph is    Inputs            Outputs
History             More commercial   More academic

Paradigms

Snakemake
  • Makefile style (start by naming outputs)
  • Define multiple rules
  • Naming a target then generates your DAG by combining rules
  • Your workflow structure is implicit

Nextflow
  • Dataflow programming (start by naming inputs)
  • Define multiple processes
  • Join them together in a workflow, explicitly
  • You just provide inputs and everything runs

nf-core

In the next section…

Let’s write a workflow!