Running the Pipeline

Getting Help

All steps in the MAVIS pipeline are invoked through the main mavis entry point. The usage menu can be viewed by running mavis without any arguments, or by giving the -h/--help option.

Example:

>>> mavis -h

Help sub-menus can be viewed by giving the pipeline step followed by no arguments, or by the -h option.

>>> mavis cluster -h

Running MAVIS using a Job Scheduler

By default, the main ‘pipeline’ step of MAVIS is set up to use a job scheduler on a compute cluster. Two schedulers are currently supported: SLURM and SGE. The pipeline step generates submission scripts and a wrapper bash script for the user to execute on their cluster head node.


Figure 1. The MAVIS pipeline is highly configurable. Some pipeline steps (cluster, validate) are optional and can be automatically skipped. The standard pipeline is shown on the far left.

Standard

The most common use case is auto-generating a configuration file and then running the pipeline setup step. The pipeline setup step will run clustering and create scripts for running the other steps.

>>> mavis config .... -w config.cfg
>>> mavis pipeline config.cfg -o /path/to/top/output_dir

This will create submission scripts as follows:

output_dir/
|-- library1/
|   |-- validation/<jobdir>/submit.sh
|   `-- annotation/<jobdir>/submit.sh
|-- library2/
|   |-- validation/<jobdir>/submit.sh
|   `-- annotation/<jobdir>/submit.sh
|-- pairing/submit.sh
|-- summary/submit.sh
`-- submit_pipeline_<batchid>.sh

The submit_pipeline_<batchid>.sh file is the wrapper script, which can be executed on the head node:

>>> ssh cluster_head_node
>>> cd /path/to/output_dir
>>> bash submit_pipeline_<batchid>.sh
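Because the batch id in the wrapper script name is generated at setup time, it can be convenient to look the script up rather than type its name. The following is a minimal sketch, not part of MAVIS: find_wrapper is a hypothetical helper, and it assumes only the submit_pipeline_<batchid>.sh naming shown in the layout above.

```shell
# Hypothetical helper (not part of MAVIS): list the wrapper scripts that sit
# directly inside a MAVIS output directory.
find_wrapper() {
    outdir=$1
    for script in "$outdir"/submit_pipeline_*.sh; do
        # An unmatched glob is left as a literal pattern, so confirm the file exists.
        [ -f "$script" ] && printf '%s\n' "$script"
    done
    return 0
}

# Prints nothing if the directory holds no wrapper script.
find_wrapper /path/to/output_dir
```

If the same output directory has been set up more than once, several wrapper scripts may be listed, so check which batch id is the intended one before submitting.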

Non-Standard

To set up a non-standard pipeline and skip steps, use the --skip_stage option.

>>> mavis pipeline /path/to/config -o /path/to/output/dir --skip_stage cluster
>>> mavis pipeline /path/to/config -o /path/to/output/dir --skip_stage validate

To skip both clustering and validation, give the option twice.

>>> mavis pipeline /path/to/config -o /path/to/output/dir --skip_stage cluster --skip_stage validate

Note

Skipping clustering will still produce an output directory and files, but no merging will be done.
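When the stages to skip are chosen in a script, the repeated option can be built from a list of stage names. This is a rough sketch: build_skip_args is a hypothetical helper, and the only MAVIS-specific detail it relies on is the --skip_stage option shown above. The command is printed rather than run so that it can be inspected first.

```shell
# Hypothetical helper (not part of MAVIS): print one --skip_stage option
# for each stage name passed as an argument.
build_skip_args() {
    args=""
    for stage in "$@"; do
        args="$args --skip_stage $stage"
    done
    # Strip the leading space before printing.
    printf '%s\n' "${args# }"
}

# Build the options for skipping both optional stages and show the command.
skip_args=$(build_skip_args cluster validate)
echo "mavis pipeline /path/to/config -o /path/to/output/dir $skip_args"
```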

Configuring Scheduler Settings

There are multiple ways to configure the scheduler settings. Some of the configurable options are listed below.

For example, to set the default job queue using an environment variable:

>>> export MAVIS_QUEUE=QUEUENAME

Or, to give it as an argument during config generation:

>>> mavis config -w /path/to/config --queue QUEUENAME

Finally, it can also be added to the config file manually:

[schedule]
queue = QUEUENAME
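Since the queue can be set in several places, it may help to confirm what a given config file actually contains before setting up the pipeline. The sketch below is an illustration, not a MAVIS feature: read_queue is a hypothetical helper, and it assumes the config file uses the plain section/key layout shown above, with no '=' inside the value.

```shell
# Hypothetical helper (not part of MAVIS): print the value of the queue key
# from the [schedule] section of a config file.
read_queue() {
    awk -F '=' '
        # Track whether the current section is [schedule].
        /^\[/ { in_schedule = ($0 == "[schedule]") }
        in_schedule && $1 ~ /^[ \t]*queue[ \t]*$/ {
            value = $2
            gsub(/^[ \t]+|[ \t]+$/, "", value)   # trim surrounding whitespace
            print value
        }
    ' "$1"
}

# Example use: read_queue config.cfg
```

This only reports what is written in the file. It says nothing about how MAVIS resolves a queue that is also set through MAVIS_QUEUE or --queue.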