MGIKIT
mgikit is a collection of tools used to demultiplex fastq files and generate demultiplexing and quality reports.
The toolkit includes the following commands:
demultiplex
This command is used to demultiplex fastq files and assign the sequencing reads to their associated samples. The tool requires the following mandatory input files to perform the demultiplexing:
- Fastq files (single/paired-end).
- Sample sheet which contains sample indexes and their templates (will be explained in detail).
In short, the tool reads the barcodes at the end of R2 (reverse) reads for paired-end input, or at the end of R1 (forward) reads for single-end input. Based on the barcode, it assigns the read to the relevant sample, allowing for mismatches below a specified threshold. The tool outputs fastq files for each sample, as well as summary reports that can be visualised through the MultiQC tool and the mgikit plugin.
The mgikit reports can be parsed by the mgikit-multiqc plugin, which lets MultiQC generate an HTML report summarising the demultiplexing results and the quality of the output data.
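The barcode-based assignment described above can be illustrated with a minimal Python sketch. It is not mgikit's implementation: the function names, the sample names, and the `max_mismatches` parameter are hypothetical, and the sketch assumes that the threshold is the maximum number of mismatches allowed and that ambiguous reads (ties between samples) stay unassigned.

```python
def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))


def assign_sample(read_seq, sample_indexes, max_mismatches=1):
    """Return the sample whose index best matches the end of the read, or None.

    sample_indexes maps (hypothetical) sample names to index sequences.
    Reads above the mismatch limit, and ties between samples, return None.
    """
    best_sample, best_dist, tie = None, None, False
    for sample, index in sample_indexes.items():
        if len(index) > len(read_seq):
            continue  # the read is too short to carry this index
        barcode = read_seq[-len(index):]  # the barcode sits at the end of the read
        dist = hamming(barcode, index)
        if best_dist is None or dist < best_dist:
            best_sample, best_dist, tie = sample, dist, False
        elif dist == best_dist:
            tie = True
    if best_dist is None or best_dist > max_mismatches or tie:
        return None
    return best_sample


samples = {"sample_A": "ACGTACGT", "sample_B": "TTGCAAGC"}
print(assign_sample("GGGGGGGGACGTACGA", samples))  # one mismatch against sample_A
print(assign_sample("GGGGGGGGCCCCCCCC", samples))  # too many mismatches for any sample
```

Comparing only the last bases of the read mirrors the description above, where the barcode is read from the end of R2 (or R1 for single-end input).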
template
This command is used to detect the location and form of the indexes within the read barcode. It scans a small number of reads and, for each possible location in the read barcode, counts the matches with the indexes in the sample sheet, considering each index both as is and as its reverse complement.
It reports matches for all possible combinations and uses the read template that has the maximum number of matches. This process happens for each sample individually, and therefore, the best-matching template for each sample will be reported.
Using this comprehensive scan, the tool can detect the templates for mixed libraries.
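The scan described above can be sketched in Python as follows. This is only an illustration under stated assumptions, not the tool's code: `detect_template` and its return format are hypothetical, matching is exact (no mismatches), and the function handles one index at a time, so it would be called once per sample.

```python
from collections import Counter

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def detect_template(barcodes, index):
    """Find the (offset, form) under which the index matches most barcodes.

    Every offset in each barcode is tried with the index as is and with its
    reverse complement. Returns ((offset, form), match_count), or (None, 0)
    when nothing matches.
    """
    forms = {"as_is": index, "reverse_complement": reverse_complement(index)}
    counts = Counter()
    for barcode in barcodes:
        for form, seq in forms.items():
            for offset in range(len(barcode) - len(seq) + 1):
                if barcode[offset:offset + len(seq)] == seq:
                    counts[(offset, form)] += 1
    if not counts:
        return None, 0
    return counts.most_common(1)[0]


# Barcodes taken from a handful of reads (made-up sequences for illustration).
barcodes = ["GGAACGTCC", "TTAACGTAA", "CCAACGTGG", "ACGTTGGGG"]
best, matches = detect_template(barcodes, "ACGTT")
print(best, matches)
```

Running the scan separately for each sample's index is what allows different samples to end up with different templates, as in the mixed-library case mentioned above.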
report
This command merges demultiplexing and quality reports from multiple lanes into one comprehensive report for visualisation with MultiQC.
reformat
This command reformats fastq files generated by splitBarcode into Illumina format and generates quality reports.
Important notes
- The tools only accept Unix line breaks (`\n`). If your data uses other line breaks, consider reformatting it (using dos2unix or other tools).
- The --flexible parameter can handle input reads of variable length, as long as no read is longer than double the length of the shortest read. (We don't expect the user to have such a case, but just in case the user is experimenting with something.)
- The template functionality checks the top reads in the file; if the input data is sorted, the functionality might not find all samples. Make sure your data is not sorted, or wait for our future improvements.
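The first two notes can be checked before running the tool. The Python sketch below shows one way to do so; both helper functions are hypothetical and not part of mgikit, and they assume the data (or the list of read lengths) has already been loaded from an uncompressed fastq file.

```python
def has_non_unix_line_breaks(data: bytes) -> bool:
    """True if the data contains a carriage return byte (Windows or old Mac line breaks)."""
    return b"\r" in data


def within_flexible_limit(read_lengths) -> bool:
    """True if no read is longer than double the length of the shortest read."""
    lengths = list(read_lengths)
    if not lengths:
        return True  # nothing to check
    return max(lengths) <= 2 * min(lengths)


print(has_non_unix_line_breaks(b"@read1\r\nACGT\r\n+\r\nIIII\r\n"))  # Windows line breaks
print(has_non_unix_line_breaks(b"@read1\nACGT\n+\nIIII\n"))          # Unix line breaks
print(within_flexible_limit([100, 150, 200]))  # longest read is exactly double the shortest
print(within_flexible_limit([100, 201]))       # longest read exceeds double the shortest
```

If the first check reports non-Unix line breaks, converting the file with dos2unix (or a similar tool) before running mgikit is the fix suggested above.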
Installation
You can use the static binary under bins directly; however, if you would like to build it from the source code:
You need to have Rust and Cargo installed first; check the Rust documentation.
git clone https://github.com/sagc-bioinformatics/mgikit.git
cd mgikit
cargo build --release
You can also install mgikit through conda, as it is available on Bioconda.
conda install bioconda::mgikit
Or create a conda environment with mgikit
conda create -n mgikit_env bioconda::mgikit
conda activate mgikit_env
Additionally, BioContainers provides Docker and Singularity images for mgikit.
Commercial Use
Please contact us if you want to use the software for commercial purposes.
Citation
If you use mgikit in your research, please cite the following publication:
Al Bkhetan, Ziad, and Sen Wang. "mgikit: demultiplexing toolkit for MGI fastq files." Bioinformatics 40.9 (2024): btae554.