Quickstart
In this quickstart you will learn to:
1. create a workflow
2. create a workflow run
3. execute the run
1. Create a workflow
We assume you have downloaded and unzipped the Molgenis Compute command-line distribution and are now in the unzipped directory.
You can generate a template for a new workflow using command:
This will create a new directory for the workflow:
The directory contains a typical Molgenis Compute workflow structure
Define workflow
You can define a workflow of steps using the workflow.csv file. For example:
This example consists of two steps 'step1' and 'step2', where 'step2' depends on 'step1'. 'step1' has its contents in the file protocols/step1.sh and 'step2' in the file protocols/step2.sh respectively.
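To make the dependency idea concrete, here is a minimal Python sketch that reads a workflow definition and lists the steps in an order that respects their dependencies. The column names `step`, `protocol` and `dependencies`, the `;` separator, and the helper `execution_order` are assumptions made for illustration; they are not taken from Molgenis Compute itself.

```python
import csv
import io
from graphlib import TopologicalSorter

# Hypothetical workflow.csv content; the column names are an assumption.
WORKFLOW_CSV = """step,protocol,dependencies
step1,protocols/step1.sh,
step2,protocols/step2.sh,step1
"""

def execution_order(workflow_csv: str) -> list[str]:
    """Return step names ordered so that each step follows its dependencies."""
    graph: dict[str, set[str]] = {}
    for row in csv.DictReader(io.StringIO(workflow_csv)):
        # Assumed convention: several dependencies are separated by ';'.
        deps = [d for d in (row["dependencies"] or "").split(";") if d]
        graph[row["step"]] = set(deps)
    return list(TopologicalSorter(graph).static_order())

print(execution_order(WORKFLOW_CSV))  # ['step1', 'step2']
```

Because 'step2' lists 'step1' as a dependency, 'step1' always comes first in the result.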
If we want parameter values to flow between steps, we can also map the parameters:
Define parameters
To feed parameter values to your workflow you can also use simple csv files. In this example, one parameter 'input' has two values, 'hello' and 'bye'.
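As an illustration of how such a file can be read, the Python sketch below collects the values of each parameter. The layout (one column per parameter, one row per value) and the helper `read_parameters` are assumptions for this example, not a description of how Molgenis Compute parses its files.

```python
import csv
import io

# Hypothetical parameters file: one column per parameter, one row per value.
PARAMETERS_CSV = """input
hello
bye
"""

def read_parameters(text: str) -> dict[str, list[str]]:
    """Collect the values of each parameter column in file order."""
    values: dict[str, list[str]] = {}
    for row in csv.DictReader(io.StringIO(text)):
        for name, value in row.items():
            values.setdefault(name, []).append(value)
    return values

print(read_parameters(PARAMETERS_CSV))  # {'input': ['hello', 'bye']}
```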
Define step contents
Finally, you need to implement what happens at each step. We therefore define a 'protocol' for each step. Protocols are simply bash scripts containing the commands you want to run.
For example protocols/step1.sh:
Given the parameters above, 'input' will be substituted with values 'hello' or 'bye'. In addition, the contents of 'out' will be available to the next step.
Inputs can be declared either as '#string', for variables with a single value, or as '#list', for variables with multiple values. Outputs are declared with the flag '#output'.
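The Python sketch below shows one way to read these declarations from a protocol. The protocol text and the helper `parse_header` are hypothetical; only the '#string', '#list' and '#output' markers and the names 'input' and 'out' come from the description above.

```python
# Hypothetical protocol text used only for this illustration.
PROTOCOL = """#string input
#output out
echo "${input}"
out="${input}_done"
"""

def parse_header(protocol: str) -> dict[str, list[str]]:
    """Group the declared variable names by their marker."""
    declared: dict[str, list[str]] = {"string": [], "list": [], "output": []}
    for line in protocol.splitlines():
        parts = line.split()
        # A declaration is a known marker followed by at least one name.
        if len(parts) >= 2 and parts[0].startswith("#") and parts[0][1:] in declared:
            declared[parts[0][1:]].extend(parts[1:])
    return declared

print(parse_header(PROTOCOL))
# {'string': ['input'], 'list': [], 'output': ['out']}
```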
In the same way, we can map the outputs of one step to the inputs of the next steps, as with 'strings' in 'step2', which has the protocol step2.sh.
The example protocols have the following listings:
In our example, the variables 'date' and 'wf' are defined in an additional parameters file, workflow.defaults.csv.
In this way, the parameters can be divided into several groups and re-used in different workflows. Users who prefer not to map parameters should use the same names in the protocols and the parameter files, which makes the parameters effectively global.
2. Generate jobs
Once you have defined your workflow, you can generate thousands of jobs. Just change the parameter values to get different runs.
N.B. Always use full paths to the parameter files, the workflow, etc.
or with a short command-line version
The directory rundir is created.
It contains a number of files
The .sh files are the actual scripts generated from the specified workflow. 'step1' has two scripts and 'step2' has only one, because 'step2' treats the outputs of the 'step1' scripts as a list, which is specified in step2.sh by declaring 'strings' as a '#list' input.
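The script counts described above can be sketched in Python as follows. The helper `script_names` is hypothetical and only mimics the naming seen in this example (step1_0.sh, step1_1.sh, step2_0.sh); it is not the generator used by Molgenis Compute.

```python
def script_names(step: str, values: list[str], as_list: bool) -> list[str]:
    """One script per value, or a single script when the values are taken as a list."""
    count = 1 if as_list else len(values)
    return [f"{step}_{i}.sh" for i in range(count)]

inputs = ["hello", "bye"]
print(script_names("step1", inputs, as_list=False))  # ['step1_0.sh', 'step1_1.sh']
print(script_names("step2", inputs, as_list=True))   # ['step2_0.sh']
```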
user.env contains all actual parameter mappings. In this example:
Parameters that are known beforehand can either be collected in the environment file or woven directly into the protocols (if the 'weave' flag is set in the command-line options). In our example, two shell scripts are generated for 'step1'. The woven versions of the generated files are shown below.
step1_0.sh:
and step1_1.sh
The output values of the first step are not known beforehand, so 'strings' cannot be woven and stays unchanged in the script generated for 'step2'. However, the 'wf' and 'date' values are woven.
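This partial substitution can be illustrated with Python's standard `string.Template`, whose `safe_substitute` method replaces known placeholders and leaves unknown ones untouched. The helper `weave`, the sample line, and the values 'myWorkflow' and 'today' are hypothetical; this is a sketch of the idea, not the substitution code of Molgenis Compute.

```python
from string import Template

def weave(script: str, known: dict[str, str]) -> str:
    """Replace ${name} placeholders that have known values; leave the rest as they are."""
    return Template(script).safe_substitute(known)

line = 'echo "${wf} ${date} ${strings}"'
print(weave(line, {"wf": "myWorkflow", "date": "today"}))
# echo "myWorkflow today ${strings}"
```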
step2_0.sh:
If values can be known, the script will have the following content
step2_0.sh with all known values:
If the 'weave' flag is not set, the step1_0.sh file, for example, looks as follows:
In this way, users can choose what the generated files look like. In the current implementation, values are first taken from the parameter files. If they are not present there, Compute checks whether the values can be known at run time by analysing all previous steps of the protocol in which the values are unknown. If the values cannot be known at run time, Compute reports a generation error.
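The lookup order described above can be summarised in a small Python sketch. The names `resolve` and `GenerationError`, and the choice to return a `${name}` placeholder for run-time values, are assumptions for illustration only.

```python
class GenerationError(Exception):
    """Raised when a value is neither in the parameter files nor produced at run time."""

def resolve(name: str, parameters: dict[str, str], earlier_outputs: set[str]) -> str:
    # 1. Values from the parameter files are used first.
    if name in parameters:
        return parameters[name]
    # 2. Otherwise the value must be an output of an earlier step;
    #    keep a placeholder that is filled in at run time.
    if name in earlier_outputs:
        return "${" + name + "}"
    # 3. Nothing can supply the value, so generation fails.
    raise GenerationError(f"no value for parameter '{name}'")

params = {"input": "hello"}
print(resolve("input", params, {"out"}))  # hello
print(resolve("out", params, {"out"}))    # ${out}
```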
3. Execute workflow
Execute locally
Compute can execute the jobs locally with command:
Now, rundir contains more files
.started and .finished files are created when jobs are started and finished, respectively.
In our example, the 'strings' variable of 'step2' requires run-time values produced in 'step1'. These values are taken from the step1_X.env files. For example:
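These marker files make it easy to check progress. The Python sketch below reports a status per generated script, assuming that the markers are named after the script (for example step1_0.sh.started); that naming and the helper `job_status` are assumptions, so adjust them to the files you actually see in rundir.

```python
import tempfile
from pathlib import Path

def job_status(rundir: Path) -> dict[str, str]:
    """Map each generated script to 'finished', 'started' or 'pending'."""
    status: dict[str, str] = {}
    for script in sorted(rundir.glob("*.sh")):
        # Assumed naming: <script>.started and <script>.finished
        if Path(f"{script}.finished").exists():
            status[script.name] = "finished"
        elif Path(f"{script}.started").exists():
            status[script.name] = "started"
        else:
            status[script.name] = "pending"
    return status

# Demonstration on a temporary directory with hypothetical files.
with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp)
    for name in ("step1_0.sh", "step1_0.sh.started", "step1_0.sh.finished",
                 "step1_1.sh", "step1_1.sh.started", "step2_0.sh"):
        (demo / name).touch()
    print(job_status(demo))
    # {'step1_0.sh': 'finished', 'step1_1.sh': 'started', 'step2_0.sh': 'pending'}
```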
step1_0.env:
In the workflow.csv file, it is specified with a simple '.' and substituted with 'has' in the generated script files.