MetaShot

MetaShot is a curated set of Docker images and Nextflow workflows for metagenomics and microbiome genomics.

Getting started

Dependencies

MetaShot requires Nextflow¹ and Docker. MetaShot also works with alternative container engines, such as Charliecloud and Singularity. See this page for more information about Nextflow and container engines.
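Before launching a workflow, you may want to confirm that the required tools are on your PATH. The following is a minimal sketch. The function name check_deps is hypothetical and not part of MetaShot. If you use another engine, replace docker with that engine's command (e.g. singularity).

```shell
# Hypothetical helper (not part of MetaShot): report which of the given
# commands can or cannot be found on the PATH.
# Returns 0 only if every command is present, 1 otherwise.
check_deps() {
  status=0
  for cmd in "$@"; do
    if command -v "$cmd" >/dev/null 2>&1; then
      echo "found: $cmd"
    else
      echo "missing: $cmd" >&2
      status=1
    fi
  done
  return "$status"
}
```

Usage: check_deps nextflow docker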

Example - Use Singularity

For instance, to use Singularity instead of Docker, comment out the Docker lines in the nextflow.config file (present in each workflow) and add the following:

singularity.enabled = true
singularity.autoMounts = true

Alternatively, you can provide an extra configuration file with the command-line option -c <config_file> (documentation).
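The following is a sketch of the -c approach. It writes the same Singularity settings into a separate file, so the workflow's own nextflow.config stays untouched. The file name singularity.config is arbitrary. The docker.enabled = false line assumes the workflow's nextflow.config enables Docker. Configuration files passed with -c take precedence over it.

```shell
# Write an extra Nextflow configuration file that switches the container
# engine from Docker to Singularity (file name is arbitrary).
cat > singularity.config <<'EOF'
docker.enabled = false
singularity.enabled = true
singularity.autoMounts = true
EOF
```

Then pass it to the run, e.g. nextflow run metashot/kraken2 -r 1.0.1 -c singularity.config --reads '*_R{1,2}.fastq.gz' ...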

HPC environments

MetaShot can run on several high-performance computing (HPC) environments, including GridEngine, SLURM and PBS clusters, and on the Amazon AWS, Google Cloud and Microsoft Azure cloud platforms (see this page).
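As an illustration, a minimal configuration file for a SLURM cluster might look like the one below. It uses Nextflow's standard process.executor and executor.queueSize settings. The partition name normal is a placeholder, so use one that exists on your cluster. The limit of 50 concurrently queued jobs is only an example value.

```shell
# Write an extra Nextflow configuration file that submits every process
# to SLURM. 'normal' is a placeholder partition name.
cat > slurm.config <<'EOF'
process.executor = 'slurm'
process.queue = 'normal'
executor.queueSize = 50
EOF
```

Pass it with -c, e.g. nextflow run metashot/kraken2 -r 1.0.1 -c slurm.config ...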

Quick start: run the Kraken2/Bracken workflow on the local machine

This example shows how to run metashot/kraken2, a pipeline for the taxonomic classification of reads and the estimation of species abundance in metagenomic samples. It relies on two related tools, Kraken2 and Bracken.

Download and extract a Kraken2/Bracken database from this page. The command below assumes it was extracted into the directory k2db.

Run the analysis on the compressed paired-end reads in FASTQ format:

nextflow run metashot/kraken2 -r 1.0.1 \
  --reads '*_R{1,2}.fastq.gz' \
  --kraken2_db k2db \
  --read_len 100 \
  --outdir results
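The --reads pattern '*_R{1,2}.fastq.gz' pairs each R1 file with the R2 file that shares its prefix. A quick sanity check before a long run is to list which samples would be paired. The sketch below assumes the workflow uses Nextflow's standard file-pair matching. The function list_pairs is hypothetical and only approximates that grouping. It checks R1 files for a missing R2 mate, not the reverse.

```shell
# Hypothetical helper (not part of MetaShot): print the sample names that
# '*_R{1,2}.fastq.gz' would pair in a directory, and warn on stderr about
# R1 files whose R2 mate is missing.
list_pairs() {
  dir="${1:-.}"
  for r1 in "$dir"/*_R1.fastq.gz; do
    [ -e "$r1" ] || continue   # no match: the glob is left unexpanded
    sample=$(basename "$r1" _R1.fastq.gz)
    if [ -e "$dir/${sample}_R2.fastq.gz" ]; then
      echo "$sample"
    else
      echo "missing R2 for: $sample" >&2
    fi
  done
}
```

Usage: list_pairs /path/to/reads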

The pipeline creates the following files and directories in the results folder:

bracken  combined_bracken  combined_bracken_mpa  combined_bracken_report  combined.kraken2.mpa  combined.kraken2.report  kraken2  raw_reads_stats

System requirements

Each step in the pipeline has a default set of requirements for number of CPUs, memory and time. Some steps are automatically resubmitted with higher resource requests if the job exits with an error (see the file process.config). You can customize the compute resources that the pipeline requests in any of the following ways:

setting the global parameters --max_cpus, --max_memory and --max_time in the command line, or
creating a custom config file (-c or -C parameters), or
modifying the process.config file.
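For example, the global limits can be set in a custom config file instead of on the command line. This is a sketch with example values. It assumes the workflows accept Nextflow's size and duration notation as strings, which they must already do for values passed as --max_memory and --max_time on the command line.

```shell
# Write a custom config file that caps the resources any single job
# may request (values are examples only).
cat > limits.config <<'EOF'
params {
  max_cpus   = 8
  max_memory = '32.GB'
  max_time   = '24.h'
}
EOF
```

Pass it with -c limits.config. Equivalently, give --max_cpus 8 --max_memory '32.GB' --max_time '24.h' on the command line.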

Reproducibility

We recommend specifying a pipeline version with the -r parameter when running the workflow on your data, e.g.:

  nextflow run metashot/kraken2 -r 1.0.0 ...

For reproducibility, the workflows use the Docker images available in the MetaShot Docker Hub repositories. You can check the version of the software used in each workflow in the file process.config. For example, container = metashot/kraken2:2.0.9-beta-6 means that the Kraken2 version is 2.0.9-beta (the last number, 6, is the MetaShot release of this image).
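The tag convention described above (<software version>-<MetaShot release>) can be split with plain shell parameter expansion. The sketch below assumes every tag follows that convention. The function name parse_image_tag is hypothetical.

```shell
# Hypothetical helper: split an image reference such as
# metashot/kraken2:2.0.9-beta-6 into tool name, software version and
# MetaShot release, assuming the tag ends in "-<release>".
parse_image_tag() {
  image="$1"
  tool="${image%%:*}"      # metashot/kraken2
  tool="${tool##*/}"       # kraken2
  tag="${image##*:}"       # 2.0.9-beta-6
  release="${tag##*-}"     # 6 (text after the last '-')
  version="${tag%-*}"      # 2.0.9-beta (text before the last '-')
  printf '%s %s %s\n' "$tool" "$version" "$release"
}
```

For example, parse_image_tag metashot/kraken2:2.0.9-beta-6 prints kraken2 2.0.9-beta 6.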

Credits

MetaShot is maintained by Davide Albanese at the FEM’s Unit of Computational Biology.

¹ Di Tommaso, P., Chatzou, M., Floden, E. et al. Nextflow enables reproducible computational workflows. Nat Biotechnol 35, 316–319 (2017). https://doi.org/10.1038/nbt.3820