MetaShot is a curated set of Docker images and Nextflow workflows for metagenomics and microbiome genomics
Getting started
Dependencies
MetaShot requires Nextflow [1] and Docker. It also works with alternative container engines, such as Charliecloud and Singularity. See this page for more information about Nextflow and container engines.
Example - Use Singularity
For instance, to use Singularity instead of Docker, comment out the Docker lines in the nextflow.config file (this file is present in each workflow) and add the following:
singularity.enabled = true
singularity.autoMounts = true
Alternatively, you can provide an extra configuration file with the command-line option -c <config_file> (see the Nextflow documentation).
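As a rough sketch of this second option, the two Singularity settings can be written to a separate file and passed with -c. The file name singularity.config is an arbitrary choice made for illustration, and "..." stands for the workflow-specific options.

```shell
set -eu

# Hypothetical file name; any path can be passed to -c.
config_file="singularity.config"

# Write the two Singularity settings shown above into the extra config file.
cat > "$config_file" <<'EOF'
singularity.enabled = true
singularity.autoMounts = true
EOF

# Print the command instead of running it, so the sketch works
# on a machine without Nextflow installed.
echo "nextflow run metashot/kraken2 -c $config_file ..."
```

Depending on what the workflow's own nextflow.config enables, the extra file may also need docker.enabled = false; this is an assumption, not something stated in the MetaShot documentation.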
HPC environments
MetaShot can run on several high-performance computing (HPC) and cloud environments, including GridEngine, SLURM, PBS, Amazon AWS, Google Cloud and Microsoft Azure (see this page).
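As an illustration, Nextflow selects a cluster scheduler through its process.executor setting. The sketch below writes a small extra config file for SLURM and prints a command that would use it. The file name slurm.config and the queue name normal are hypothetical placeholders, not values from the MetaShot documentation.

```shell
set -eu

# Hypothetical file name and queue name; adapt both to your cluster.
cat > slurm.config <<'EOF'
process.executor = 'slurm'
process.queue = 'normal'
EOF

# Printed rather than executed; "..." stands for the workflow options.
echo "nextflow run metashot/kraken2 -c slurm.config ..."
```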
Quick start: run the Kraken2/Bracken workflow on the local machine
This example shows how to run metashot/kraken2, a pipeline for the taxonomic classification of reads and the estimation of species abundance in metagenomic samples. It relies on two related tools, Kraken2 and Bracken.
- Download and extract/unzip a Kraken2/Bracken database available at this page;
- Start running the analysis on the compressed paired-end sequences in FASTQ format:

    nextflow run metashot/kraken2 -r 1.0.1 \
      --reads '*_R{1,2}.fastq.gz' \
      --kraken2_db k2db \
      --read_len 100 \
      --outdir results

- The pipeline will create in the results folder the following files and directories:

    bracken
    combined_bracken
    combined_bracken_mpa
    combined_bracken_report
    combined.kraken2.mpa
    combined.kraken2.report
    kraken2
    raw_reads_stats
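Before launching the run, it can help to confirm that every sample has both mates matching the --reads pattern. The following is a minimal sketch, assuming files are named <sample>_R1.fastq.gz and <sample>_R2.fastq.gz; the check_pairs function and the demo file names are illustrative and not part of MetaShot.

```shell
# check_pairs DIR: for each *_R1.fastq.gz in DIR, report whether the
# matching *_R2.fastq.gz exists. Returns non-zero if any mate is missing
# or if no R1 file is found.
check_pairs() {
    dir=$1
    status=0
    for r1 in "$dir"/*_R1.fastq.gz; do
        # If the glob matched nothing, the literal pattern is left in place.
        [ -e "$r1" ] || { echo "no *_R1.fastq.gz files in $dir"; return 1; }
        r2="${r1%_R1.fastq.gz}_R2.fastq.gz"
        sample=$(basename "${r1%_R1.fastq.gz}")
        if [ -e "$r2" ]; then
            echo "$sample: paired"
        else
            echo "$sample: missing R2"
            status=1
        fi
    done
    return "$status"
}

# Demo on empty placeholder files in a temporary directory.
demo=$(mktemp -d)
touch "$demo/A_R1.fastq.gz" "$demo/A_R2.fastq.gz" "$demo/B_R1.fastq.gz"
check_pairs "$demo" || echo "some samples are incomplete"
rm -r "$demo"
```

On real data, the function would be called on the directory that holds the reads, for example check_pairs . before the nextflow run command.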
System requirements
Each step in the pipeline has a default set of requirements for the number of CPUs, memory and time. For some steps, if the job exits with an error, it is automatically resubmitted with higher requests (see the file process.config). You can customize the compute resources that the pipeline requests by either:
- setting the global parameters --max_cpus, --max_memory and --max_time on the command line, or
- creating a custom config file (-c or -C parameters), or
- modifying the process.config file.
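For example, the first option could look like the command assembled below. The values 8, 32.GB and 24.h are placeholders, and the value syntax (Nextflow-style memory and duration units) is an assumption; check the workflow's process.config for the defaults and the accepted format.

```shell
set -eu

# Placeholder resource caps; the unit syntax is assumed,
# not taken from the MetaShot documentation.
max_cpus=8
max_memory='32.GB'
max_time='24.h'

# Assemble the command as a string and print it;
# "..." stands for the remaining workflow options.
cmd="nextflow run metashot/kraken2 --max_cpus $max_cpus --max_memory $max_memory --max_time $max_time ..."
echo "$cmd"
```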
Reproducibility
We recommend specifying a pipeline version with the -r parameter when running the workflow on your data, e.g.:
nextflow run metashot/kraken2 -r 1.0.0 ...
The workflows use the Docker images available in the MetaShot Docker Hub repositories for reproducibility. You can check the version of the software used in each workflow by opening the file process.config. For example, container = metashot/kraken2:2.0.9-beta-6 means that the version of kraken2 is 2.0.9-beta (the last number, 6, is the MetaShot release of this image).
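The tag convention described above can be taken apart with plain shell parameter expansion. This is only a sketch that assumes every tag follows the <software version>-<release> pattern of the example; the variable names are illustrative.

```shell
set -eu

# Value of the container setting, as in the example above.
container='metashot/kraken2:2.0.9-beta-6'

image=${container%%:*}   # part before the colon
tag=${container#*:}      # part after the colon
version=${tag%-*}        # tag without the final "-<release>"
release=${tag##*-}       # text after the last hyphen

echo "image:   $image"
echo "version: $version"
echo "release: $release"
```

Splitting on the last hyphen rather than the first keeps hyphenated software versions such as 2.0.9-beta intact.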
Credits
MetaShot is maintained by Davide Albanese at the FEM’s Unit of Computational Biology.
[1] Di Tommaso, P., Chatzou, M., Floden, E. et al. Nextflow enables reproducible computational workflows. Nat Biotechnol 35, 316–319 (2017). https://doi.org/10.1038/nbt.3820