Resources for Reusing Tools and Scripts

Overview

Teaching: 0 min
Exercises: 0 min

Questions

How can I find existing solutions and CWL recipes for awkward problems?

Objectives

Know good resources for finding solutions to common problems

Pre-written tool descriptions

Before you start writing a CWL workflow, check whether a CWL document already exists for the tools you want to use. bio-cwl-tools is a library of CWL documents for biology and life-sciences tools.

The CWL documents for the previous steps were provided for you, but you can also find them in this library. In this episode you will use the bio-cwl-tools library to add the last step to the workflow.

Adding a new step to the workflow

The last step of our workflow is counting the RNA-seq reads, for which we will use the featureCounts tool.

Exercise

Find the featureCounts tool in the bio-cwl-tools library. Have a look at the CWL document. Which inputs does this tool need? And what are the outputs of this tool?

Solution

The featureCounts CWL document can be found in the GitHub repo. It has two inputs, annotations (line 6) and mapped_reads (line 9), both of type File. The output of this tool is a file called featurecounts (line 21).

We need a local copy of featureCounts in order to use it in our workflow. We already imported this as a git submodule during setup, so the tool should be located at bio-cwl-tools/subread/featureCounts.cwl.

Exercise

Add the featureCounts tool to the workflow. Like the STAR tool, this tool needs more RAM than the default: at least 500 MiB. Use a requirements entry with ResourceRequirement to set ramMin to 500. Use the inputs and output identified in the previous exercise to connect this step to the earlier steps.

Solution
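
A minimal sketch of what the added step could look like is shown here. The tool path, the annotations and mapped_reads inputs, the featurecounts output, and the ramMin value come from the text above, and gene_model matches the entry in the input file. The step name count_reads and the upstream source index_alignment/bam_sorted_indexed are assumptions made for illustration, so substitute the names used in your own workflow.

```yaml
# Fragment of rna_seq_workflow.cwl: only the parts added for this step.
# Names marked "assumed" are illustrative and may differ in your workflow.

inputs:
  # ... existing workflow inputs ...
  gene_model: File            # annotation file, set in workflow_input.yml

steps:
  # ... existing steps ...
  count_reads:                # assumed step name
    requirements:
      ResourceRequirement:
        ramMin: 500           # minimum RAM for this step, in MiB
    run: bio-cwl-tools/subread/featureCounts.cwl
    in:
      # assumed source: the output of the step that produced the mapped reads
      mapped_reads: index_alignment/bam_sorted_indexed
      annotations: gene_model
    out: [featurecounts]

outputs:
  # ... existing workflow outputs ...
  featurecounts:
    type: File
    outputSource: count_reads/featurecounts
```

Placing the ResourceRequirement under the step, rather than at the top of the workflow, means the extra RAM is requested only for this step.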

The workflow is now complete; all that remains is to finish the YAML input file. The last entry in the input file is the annotations file, gene_model.

workflow_input.yml

rna_reads_forward:
  class: File
  location: rnaseq/GSM461177_1_subsampled.fastqsanger
  format: https://edamontology.org/format_1930  # FASTQ
rna_reads_reverse:
  class: File
  location: rnaseq/GSM461177_2_subsampled.fastqsanger
  format: https://edamontology.org/format_1930  # FASTQ
ref_genome:
  class: Directory
  location: rnaseq/dm6-STAR-index
gene_model:
  class: File
  location: rnaseq/Drosophila_melanogaster.BDGP6.87.gtf
  format: https://edamontology.org/format_2306  # GTF

With the workflow and the input file finished, you can now run the whole workflow.

cwltool rna_seq_workflow.cwl workflow_input.yml

Key Points

bio-cwl-tools is a library of CWL documents for biology/life-sciences related tools
