Process definitions

ATAC-Seq

data:workflow:atacseq workflow-atac-seq (data:reads:fastq  reads, data:index:bowtie2  genome, data:bed  promoter, basic:string  mode, basic:string  speed, basic:boolean  use_se, basic:boolean  discordantly, basic:boolean  rep_se, basic:integer  minins, basic:integer  maxins, basic:integer  trim_5, basic:integer  trim_3, basic:integer  trim_iter, basic:integer  trim_nucl, basic:string  rep_mode, basic:integer  k_reports, basic:integer  q_threshold, basic:integer  n_sub, basic:boolean  tn5, basic:integer  shift, basic:boolean  tagalign, basic:string  duplicates, basic:string  duplicates_prepeak, basic:decimal  qvalue, basic:decimal  pvalue, basic:decimal  pvalue_prepeak, basic:integer  cap_num, basic:integer  mfold_lower, basic:integer  mfold_upper, basic:integer  slocal, basic:integer  llocal, basic:integer  extsize, basic:integer  shift, basic:integer  band_width, basic:boolean  nolambda, basic:boolean  fix_bimodal, basic:boolean  nomodel, basic:boolean  nomodel_prepeak, basic:boolean  down_sample, basic:boolean  bedgraph, basic:boolean  spmr, basic:boolean  call_summits, basic:boolean  broad, basic:decimal  broad_cutoff) [Source: v3.1.1]

This ATAC-seq pipeline closely follows the official ENCODE DCC pipeline. It comprises three steps: alignment, pre-peakcall QC, and peak calling (with post-peakcall QC). First, reads are aligned to a genome using the [Bowtie2](http://bowtie-bio.sourceforge.net/index.shtml) aligner. Next, pre-peakcall QC metrics are calculated. The QC report contains the QC metrics proposed by ENCODE 3: [NRF](https://www.encodeproject.org/data-standards/terms/), [PBC bottlenecking coefficients, NSC, and RSC](https://genome.ucsc.edu/ENCODE/qualityMetrics.html#chipSeq). Finally, peaks are called using [MACS2](https://github.com/taoliu/MACS/). The post-peakcall QC report includes additional QC metrics: number of peaks, fraction of reads in peaks (FRiP), number of reads in peaks, and, if a promoter regions BED file is provided, number of reads in promoter regions, fraction of reads in promoter regions, number of peaks in promoter regions, and fraction of peaks in promoter regions.
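
As a rough illustration of the QC metrics named above, the following minimal sketch computes NRF, PBC1, and FRiP from toy data. The function names and the in-memory data layout are assumptions made for this example; the workflow itself computes these metrics from alignment and peak files.

```python
from collections import Counter


def library_complexity(positions):
    """Return (NRF, PBC1) for a list of hashable read positions.

    NRF  = distinct positions / total reads
    PBC1 = positions seen exactly once / distinct positions
    """
    counts = Counter(positions)
    total_reads = len(positions)
    distinct = len(counts)
    seen_once = sum(1 for c in counts.values() if c == 1)
    return distinct / total_reads, seen_once / distinct


def frip(read_positions, peaks):
    """Fraction of reads whose (chrom, pos) lies in any half-open peak interval."""
    in_peaks = sum(
        1
        for chrom, pos in read_positions
        if any(c == chrom and start <= pos < end for c, start, end in peaks)
    )
    return in_peaks / len(read_positions)


# Toy data: four reads, two of them at the same position.
reads = [("chr1", 100), ("chr1", 100), ("chr1", 250), ("chr2", 40)]
nrf, pbc1 = library_complexity(reads)
fraction_in_peaks = frip(reads, [("chr1", 90, 120)])
```

The linear scan over peaks keeps the sketch short; a real implementation would use an interval index or a tool such as bedtools.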

Input arguments

reads
label:

Select sample(s)

type:

data:reads:fastq

genome
label:

Genome

type:

data:index:bowtie2

promoter
label:

Promoter regions BED file

type:

data:bed

description:

BED file containing promoter regions (TSS ± 1000 bp, for example). Needed to get the number of peaks and reads mapped to promoter regions.

required:

False

alignment.mode
label:

Alignment mode

type:

basic:string

description:

End to end: Bowtie 2 requires that the entire read align from one end to the other, without any trimming (or “soft clipping”) of characters from either end. Local: Bowtie 2 does not require that the entire read align from one end to the other. Rather, some characters may be omitted (“soft clipped”) from the ends in order to achieve the greatest possible alignment score.

default:

--local

choices:

  • end to end mode: --end-to-end

  • local: --local

alignment.speed
label:

Speed vs. Sensitivity

type:

basic:string

default:

--sensitive

choices:

  • Very fast: --very-fast

  • Fast: --fast

  • Sensitive: --sensitive

  • Very sensitive: --very-sensitive

alignment.PE_options.use_se
label:

Map as single-ended (for paired-end reads only)

type:

basic:boolean

description:

If this option is selected, paired-end reads will be mapped as single-ended and other paired-end options are ignored.

default:

False

alignment.PE_options.discordantly
label:

Report discordantly matched read

type:

basic:boolean

description:

If both mates have unique alignments, but the alignments do not match paired-end expectations (orientation and relative distance) then alignment will be reported. Useful for detecting structural variations.

default:

True

alignment.PE_options.rep_se
label:

Report single ended

type:

basic:boolean

description:

If a paired alignment cannot be found, Bowtie2 tries to find alignments for the individual mates.

default:

True

alignment.PE_options.minins
label:

Minimal distance

type:

basic:integer

description:

The minimum fragment length for valid paired-end alignments. 0 imposes no minimum.

default:

0

alignment.PE_options.maxins
label:

Maximal distance

type:

basic:integer

description:

The maximum fragment length for valid paired-end alignments.

default:

2000

alignment.start_trimming.trim_5
label:

Bases to trim from 5’

type:

basic:integer

description:

Number of bases to trim from the 5’ (left) end of each read before alignment.

default:

0

alignment.start_trimming.trim_3
label:

Bases to trim from 3’

type:

basic:integer

description:

Number of bases to trim from the 3’ (right) end of each read before alignment.

default:

0

alignment.trimming.trim_iter
label:

Iterations

type:

basic:integer

description:

Number of iterations.

default:

0

alignment.trimming.trim_nucl
label:

Bases to trim

type:

basic:integer

description:

Number of bases to trim from 3’ end in each iteration.

default:

2

alignment.reporting.rep_mode
label:

Report mode

type:

basic:string

description:

Default mode: search for multiple alignments, report the best one; -k mode: search for one or more alignments, report each; -a mode: search for and report all alignments

default:

def

choices:

  • Default mode: def

  • -k mode: k

  • -a mode (very slow): a

alignment.reporting.k_reports
label:

Number of reports (for -k mode only)

type:

basic:integer

description:

Searches for at most X distinct, valid alignments for each read. The search terminates when it can’t find more distinct valid alignments, or when it finds X, whichever happens first.

default:

5
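
The alignment parameters above correspond to Bowtie 2 command-line options. The sketch below shows one way such a parameter set could be turned into an argument list. The function name, the dictionary layout, and the `paired` key are hypothetical, and the mapping of the boolean options to `--no-discordant` and `--no-mixed` is an assumption about how the workflow behaves, not its actual implementation.

```python
def bowtie2_options(params):
    """Translate a dict of workflow-style alignment parameters into Bowtie 2 flags."""
    # Mode and speed values are already Bowtie 2 flags (see the choices above).
    args = [params.get("mode", "--local"), params.get("speed", "--sensitive")]

    # Paired-end options are ignored when reads are mapped as single-ended.
    if params.get("paired", False) and not params.get("use_se", False):
        if not params.get("discordantly", True):
            args.append("--no-discordant")
        if not params.get("rep_se", True):
            args.append("--no-mixed")
        args += ["--minins", str(params.get("minins", 0))]
        args += ["--maxins", str(params.get("maxins", 2000))]

    # Fixed trimming of read ends before alignment.
    args += ["--trim5", str(params.get("trim_5", 0))]
    args += ["--trim3", str(params.get("trim_3", 0))]

    # Reporting mode: default (best alignment), -k N, or -a (all alignments).
    rep_mode = params.get("rep_mode", "def")
    if rep_mode == "k":
        args += ["-k", str(params.get("k_reports", 5))]
    elif rep_mode == "a":
        args.append("-a")
    return args


default_args = bowtie2_options({})
```

The index (`-x`) and read files (`-1`/`-2` or `-U`) are left out on purpose, since they come from the `genome` and `reads` inputs rather than from these settings.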

prepeakqc_settings.q_threshold
label:

Quality filtering threshold

type:

basic:integer

default:

30

prepeakqc_settings.n_sub
label:

Number of reads to subsample

type:

basic:integer

default:

25000000

prepeakqc_settings.tn5
label:

Tn5 shifting

type:

basic:boolean

description:

Tn5 transposon shifting. Shift reads on “+” strand by 4 bp and reads on “-” strand by 5 bp.

default:

True
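
A minimal sketch of the Tn5 shift on a single BED-like record follows. It assumes the common convention that “+” strand reads have their start moved forward by 4 bp and “-” strand reads have their end moved back by 5 bp; the function name and record layout are illustrative only.

```python
def tn5_shift(chrom, start, end, strand):
    """Shift one read interval to account for the Tn5 insertion offset."""
    if strand == "+":
        return chrom, start + 4, end, strand
    if strand == "-":
        return chrom, start, end - 5, strand
    raise ValueError(f"unexpected strand: {strand!r}")


plus_read = tn5_shift("chr1", 100, 150, "+")
minus_read = tn5_shift("chr1", 100, 150, "-")
```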

prepeakqc_settings.shift
label:

User-defined cross-correlation peak strandshift

type:

basic:integer

description:

If defined, SPP tool will not try to estimate fragment length but will use the given value as fragment length.

default:

0

settings.tagalign
label:

Use tagAlign files

type:

basic:boolean

description:

Use filtered tagAlign files as case (treatment) and control (background) samples. If extsize parameter is not set, run MACS using input’s estimated fragment length.

default:

True

settings.duplicates
label:

Number of duplicates

type:

basic:string

description:

It controls the MACS behavior towards duplicate tags at the exact same location (the same coordinates and the same strand). The ‘auto’ option makes MACS calculate the maximum number of tags at the exact same location based on a binomial distribution using 1e-5 as the p-value cutoff, and the ‘all’ option keeps all the tags. If an integer is given, at most this number of tags will be kept at the same location. By default, MACS keeps one tag at the same location.

required:

False

hidden:

settings.tagalign

choices:

  • 1: 1

  • auto: auto

  • all: all

settings.duplicates_prepeak
label:

Number of duplicates

type:

basic:string

description:

It controls the MACS behavior towards duplicate tags at the exact same location (the same coordinates and the same strand). The ‘auto’ option makes MACS calculate the maximum number of tags at the exact same location based on a binomial distribution using 1e-5 as the p-value cutoff, and the ‘all’ option keeps all the tags. If an integer is given, at most this number of tags will be kept at the same location. By default, MACS keeps one tag at the same location.

required:

False

hidden:

!settings.tagalign

default:

all

choices:

  • 1: 1

  • auto: auto

  • all: all
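
To make the ‘auto’ option more concrete, the sketch below derives a duplicate limit from a binomial model: each tag is assumed to land on a given position with probability 1/genome size, and the limit is the smallest count whose cumulative probability reaches 1 - 1e-5. This is an illustration of the idea under those assumptions, not MACS code, and the function names are made up for the example.

```python
def binomial_cdf_inverse(target, n, q):
    """Smallest k with P(X <= k) >= target for X ~ Binomial(n, q), 0 < q < 1.

    Uses the pmf recurrence, so it is only suitable when (1 - q) ** n
    does not underflow (small n * q).
    """
    pmf = (1.0 - q) ** n  # P(X = 0)
    cdf = pmf
    k = 0
    while cdf < target and k < n:
        pmf *= (n - k) / (k + 1) * q / (1.0 - q)
        k += 1
        cdf += pmf
    return k


def max_duplicate_tags(total_tags, genome_size, pvalue=1e-5):
    """Maximum number of tags to keep at one location (at least 1)."""
    return max(1, binomial_cdf_inverse(1.0 - pvalue, total_tags, 1.0 / genome_size))


limit = max_duplicate_tags(10_000_000, 2_700_000_000)
```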

settings.qvalue
label:

Q-value cutoff

type:

basic:decimal

description:

The q-value (minimum FDR) cutoff to call significant regions. Q-values are calculated from p-values using Benjamini-Hochberg procedure.

required:

False

disabled:

settings.pvalue && settings.pvalue_prepeak
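
The Benjamini-Hochberg adjustment mentioned in the description can be sketched as follows. This is a textbook version operating on a plain list of p-values; MACS computes its q-values internally, so treat the function as an illustration of the procedure rather than the tool's own code.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values (q-values) in input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        qvalues[i] = running_min
    return qvalues


qvalues = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
```

A region would then be called significant when its q-value is at or below the chosen cutoff.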

settings.pvalue
label:

P-value cutoff

type:

basic:decimal

description:

The p-value cutoff. If specified, MACS2 will use p-value instead of q-value cutoff.

required:

False

disabled:

settings.qvalue

hidden:

settings.tagalign

settings.pvalue_prepeak
label:

P-value cutoff

type:

basic:decimal

description:

The p-value cutoff. If specified, MACS2 will use p-value instead of q-value cutoff.

disabled:

settings.qvalue

hidden:

!settings.tagalign || settings.qvalue

default:

0.01

settings.cap_num
label:

Cap number of peaks by taking top N peaks

type:

basic:integer

description:

To keep all peaks set value to 0.

disabled:

settings.broad

default:

300000

settings.mfold_lower
label:

MFOLD range (lower limit)

type:

basic:integer

description:

This parameter is used to select the regions within the MFOLD range of high-confidence enrichment ratio against background to build the model. The fold enrichment of the regions must be lower than the upper limit and higher than the lower limit. DEFAULT:10,30 means using all regions not too low (>10) and not too high (<30) to build the paired-peaks model. If MACS cannot find more than 100 regions to build the model, it will use the --extsize parameter to continue the peak detection ONLY if --fix-bimodal is set.

required:

False

settings.mfold_upper
label:

MFOLD range (upper limit)

type:

basic:integer

description:

This parameter is used to select the regions within the MFOLD range of high-confidence enrichment ratio against background to build the model. The fold enrichment of the regions must be lower than the upper limit and higher than the lower limit. DEFAULT:10,30 means using all regions not too low (>10) and not too high (<30) to build the paired-peaks model. If MACS cannot find more than 100 regions to build the model, it will use the --extsize parameter to continue the peak detection ONLY if --fix-bimodal is set.

required:

False

settings.slocal
label:

Small local region

type:

basic:integer

description:

Slocal and llocal parameters control which two levels of regions will be checked around the peak regions to calculate the maximum lambda as local lambda. By default, MACS considers 1000 bp for the small local region (--slocal), and 10000 bp for the large local region (--llocal), which captures the bias from a long-range effect like an open chromatin domain. You can tweak these according to your project. Remember that if the region is set too small, a sharp spike in the input data may kill the significant peak.

required:

False

settings.llocal
label:

Large local region

type:

basic:integer

description:

Slocal and llocal parameters control which two levels of regions will be checked around the peak regions to calculate the maximum lambda as local lambda. By default, MACS considers 1000 bp for the small local region (--slocal), and 10000 bp for the large local region (--llocal), which captures the bias from a long-range effect like an open chromatin domain. You can tweak these according to your project. Remember that if the region is set too small, a sharp spike in the input data may kill the significant peak.

required:

False

settings.extsize
label:

extsize

type:

basic:integer

description:

While ‘--nomodel’ is set, MACS uses this parameter to extend reads in the 5’->3’ direction to fixed-size fragments. For example, if the size of the binding region for your transcription factor is 200 bp, and you want to bypass the model building by MACS, this parameter can be set to 200. This option is only valid when --nomodel is set or when MACS fails to build the model and --fix-bimodal is on.

default:

150

settings.shift
label:

Shift

type:

basic:integer

description:

Note that this is NOT the legacy --shiftsize option, which has been replaced by --extsize. You can set an arbitrary shift in bp here. Use discretion when setting it to a value other than the MACS default (0). When --nomodel is set, MACS will use this value to move cutting ends (5’) and then apply --extsize in the 5’ to 3’ direction to extend them to fragments. When this value is negative, ends will be moved in the 3’->5’ direction, otherwise in the 5’->3’ direction. It is recommended to keep the value at 0 for ChIP-Seq datasets, or to use -1 * half of EXTSIZE together with the --extsize option for detecting enriched cutting loci, such as in certain DNAseI-Seq datasets. Note that you can’t set values other than 0 if the format is BAMPE for paired-end data. The MACS default is 0.

default:

-75
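
The interplay of the shift and extsize values can be illustrated with a small sketch. It moves the 5’ cutting end by the shift and then extends in the 5’->3’ direction by the extension size, using simplified coordinates that ignore off-by-one conventions. The function name is hypothetical, and the defaults are the workflow defaults listed above (-75 and 150).

```python
def extend_cut_site(five_prime, strand, shift=-75, extsize=150):
    """Return (start, end) of the fragment built from a read's 5' end."""
    if strand == "+":
        # 5'->3' runs towards larger coordinates on the plus strand.
        start = five_prime + shift
        return start, start + extsize
    if strand == "-":
        # 5'->3' runs towards smaller coordinates on the minus strand.
        end = five_prime - shift
        return end - extsize, end
    raise ValueError(f"unexpected strand: {strand!r}")


plus_fragment = extend_cut_site(1000, "+")
minus_fragment = extend_cut_site(1000, "-")
```

With a shift of minus half the extension size, both strands yield a fragment centered on the cut site, which is why this combination is suggested for cutting-site data.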

settings.band_width
label:

Band width

type:

basic:integer

description:

The band width which is used to scan the genome ONLY for model building. You can set this parameter as the sonication fragment size expected from wet experiment. The previous side effect on the peak detection process has been removed. So this parameter only affects the model building.

required:

False

settings.nolambda
label:

Use background lambda as local lambda

type:

basic:boolean

description:

With this flag on, MACS will use the background lambda as local lambda. This means MACS will not consider the local bias at peak candidate regions.

default:

False

settings.fix_bimodal
label:

Turn on the auto paired-peak model process

type:

basic:boolean

description:

Turn on the auto paired-peak model process. If it is set, when MACS fails to build the paired model, it will use the nomodel settings and the ‘--extsize’ parameter to extend each tag. If it is not set, MACS will be terminated if the paired-peak model fails.

default:

False

settings.nomodel
label:

Bypass building the shifting model

type:

basic:boolean

description:

While on, MACS will bypass building the shifting model.

hidden:

settings.tagalign

default:

False

settings.nomodel_prepeak
label:

Bypass building the shifting model

type:

basic:boolean

description:

While on, MACS will bypass building the shifting model.

hidden:

!settings.tagalign

default:

True

settings.down_sample
label:

Down-sample

type:

basic:boolean

description:

When set to true, random sampling method will scale down the bigger sample. By default, MACS uses linear scaling. This option will make the results unstable and irreproducible since each time, random reads would be selected, especially the numbers (pileup, pvalue, qvalue) would change.

default:

False

settings.bedgraph
label:

Save fragment pileup and control lambda

type:

basic:boolean

description:

If this flag is on, MACS will store the fragment pileup, control lambda, -log10pvalue and -log10qvalue scores in bedGraph files. The bedGraph files will be stored in current directory named NAME+’_treat_pileup.bdg’ for treatment data, NAME+’_control_lambda.bdg’ for local lambda values from control, NAME+’_treat_pvalue.bdg’ for Poisson pvalue scores (in -log10(pvalue) form), and NAME+’_treat_qvalue.bdg’ for q-value scores from Benjamini-Hochberg-Yekutieli procedure.

default:

True

settings.spmr
label:

Save signal per million reads for fragment pileup profiles

type:

basic:boolean

disabled:

settings.bedgraph === false

default:

True

settings.call_summits
label:

Call summits

type:

basic:boolean

description:

MACS will now reanalyze the shape of signal profile (p or q-score depending on cutoff setting) to deconvolve subpeaks within each peak called from general procedure. It’s highly recommended to detect adjacent binding events. While used, the output subpeaks of a big peak region will have the same peak boundaries, and different scores and peak summit positions.

default:

True

settings.broad
label:

Composite broad regions

type:

basic:boolean

description:

When this flag is on, MACS will try to composite broad regions in BED12 (a gene-model-like format) by putting nearby highly enriched regions into a broad region with a loose cutoff. The broad region is controlled by another cutoff through --broad-cutoff. The maximum length of a broad region is 4 times d from MACS.

disabled:

settings.call_summits === true

default:

False

settings.broad_cutoff
label:

Broad cutoff

type:

basic:decimal

description:

Cutoff for broad region. This option is not available unless --broad is set. If -p is set, this is a p-value cutoff; otherwise, it is a q-value cutoff. DEFAULT = 0.1

required:

False

disabled:

settings.call_summits === true || settings.broad !== true

Output results

Abstract alignment process

data:alignment abstract-alignment () [Source: v1.0.1]

Input arguments

Output results

bam
label:

Alignment file

type:

basic:file

bai
label:

Alignment index BAI

type:

basic:file

species
label:

Species

type:

basic:string

build
label:

Build

type:

basic:string

Abstract annotation process

data:annotation abstract-annotation () [Source: v1.0.1]

Input arguments

Output results

annot
label:

Uploaded file

type:

basic:file

source
label:

Gene ID source

type:

basic:string

species
label:

Species

type:

basic:string

build
label:

Build

type:

basic:string

Abstract bed process

data:bed abstract-bed () [Source: v1.0.2]

Input arguments

Output results

bed
label:

BED

type:

basic:file

species
label:

Species

type:

basic:string

build
label:

Build

type:

basic:string

Abstract differential expression process

data:differentialexpression abstract-differentialexpression () [Source: v1.0.1]

Input arguments

Output results

raw
label:

Differential expression (gene level)

type:

basic:file

de_json
label:

Results table (JSON)

type:

basic:json

de_file
label:

Results table (file)

type:

basic:file

source
label:

Gene ID source

type:

basic:string

species
label:

Species

type:

basic:string

build
label:

Build

type:

basic:string

feature_type
label:

Feature type

type:

basic:string

Abstract expression process

data:expression abstract-expression () [Source: v1.0.1]

Input arguments

Output results

exp
label:

Normalized expression

type:

basic:file

rc
label:

Read counts

type:

basic:file

required:

False

exp_json
label:

Expression (json)

type:

basic:json

exp_type
label:

Expression type

type:

basic:string

source
label:

Gene ID source

type:

basic:string

species
label:

Species

type:

basic:string

build
label:

Build

type:

basic:string

feature_type
label:

Feature type

type:

basic:string

Annotate novel splice junctions (regtools)

data:junctions:regtools regtools-junctions-annotate (data:seq:nucleotide  genome, data:annotation:gtf  annotation, data:alignment:bam:star  alignment_star, data:alignment:bam  alignment, data:bed  input_bed_junctions) [Source: v1.3.1]

Identify novel splice junctions by using regtools to annotate against a reference. The process accepts a reference genome, a reference genome annotation (GTF), and an input with read information (a STAR alignment, reads aligned by any other aligner, or junctions in BED12 format). If STAR aligner data is given as input, the process calculates a BED12 file from the STAR ‘SJ.out.tab’ file and annotates all junctions with the ‘regtools junctions annotate’ command. When reads are aligned by another aligner, junctions are extracted with the ‘regtools junctions extract’ tool and then annotated with the ‘regtools junctions annotate’ command. The third option allows the user to directly provide a BED12 file with junctions, which are then annotated. Finally, annotated novel junctions are filtered into a separate output file. More information can be found in the [regtools manual](https://regtools.readthedocs.io/en/latest/).

Input arguments

genome
label:

Reference genome

type:

data:seq:nucleotide

annotation
label:

Reference genome annotation (GTF)

type:

data:annotation:gtf

alignment_star
label:

STAR alignment

type:

data:alignment:bam:star

description:

Splice junctions detected by the STAR aligner (SJ.out.tab STAR output file). Provide one of the following inputs: ‘STAR alignment’, ‘Alignment’ (from any aligner), or ‘Junctions in BED12 format’.

required:

False

alignment
label:

Alignment

type:

data:alignment:bam

description:

Aligned reads from which splice junctions are going to be extracted. Provide one of the following inputs: ‘STAR alignment’, ‘Alignment’ (from any aligner), or ‘Junctions in BED12 format’.

required:

False

input_bed_junctions
label:

Junctions in BED12 format

type:

data:bed

description:

Splice junctions in BED12 format. Provide one of the following inputs: ‘STAR alignment’, ‘Alignment’ (from any aligner), or ‘Junctions in BED12 format’.

required:

False
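
Since the three junction inputs are alternatives, the choice of processing path can be sketched as below. The function name, the step descriptions, and the strict "exactly one input" check are assumptions made for illustration; the process itself may handle missing or multiple inputs differently.

```python
def select_junction_source(alignment_star=None, alignment=None, input_bed_junctions=None):
    """Pick the processing path based on which optional input was provided."""
    inputs = {
        "alignment_star": alignment_star,
        "alignment": alignment,
        "input_bed_junctions": input_bed_junctions,
    }
    provided = [name for name, value in inputs.items() if value is not None]
    if len(provided) != 1:
        raise ValueError(
            "Provide exactly one of: alignment_star, alignment, input_bed_junctions"
        )
    # Steps per input type, following the process description.
    steps = {
        "alignment_star": ["convert SJ.out.tab to BED12", "regtools junctions annotate"],
        "alignment": ["regtools junctions extract", "regtools junctions annotate"],
        "input_bed_junctions": ["regtools junctions annotate"],
    }
    return provided[0], steps[provided[0]]


source, steps = select_junction_source(alignment="reads.bam")
```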

Output results

novel_splice_junctions
label:

Table of annotated novel splice junctions

type:

basic:file

splice_junctions
label:

Table of annotated splice junctions

type:

basic:file

novel_sj_bed
label:

Novel splice junctions in BED format

type:

basic:file

bed
label:

Splice junctions in BED format

type:

basic:file

novel_sj_bigbed_igv_ucsc
label:

Novel splice junctions in BigBed format

type:

basic:file

required:

False

bigbed_igv_ucsc
label:

Splice junctions in BigBed format

type:

basic:file

required:

False

novel_sj_tbi_jbrowse
label:

Novel splice junctions bed tbi index for JBrowse

type:

basic:file

tbi_jbrowse
label:

Bed tbi index for JBrowse

type:

basic:file

species
label:

Species

type:

basic:string

build
label:

Build

type:

basic:string

Archive samples

data:archive:samples archive-samples (list:data  data, list:basic:string  fields, basic:boolean  j) [Source: v0.5.2]

Create an archive of output files. The output folder structure is organized by sample slug and the data object’s output-field names.

Input arguments

data
label:

Data list

type:

list:data

fields
label:

Output file fields

type:

list:basic:string

j
label:

Junk paths

type:

basic:boolean

description:

Store just the names of saved files (junk the path).

default:

False
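
One possible reading of the folder layout and the junk-paths switch is sketched below. The exact directory order (sample slug, then output field) and the behavior of dropping all directories when `j` is enabled are assumptions based on the descriptions above; the helper name is hypothetical.

```python
import posixpath


def archive_member_path(sample_slug, field, file_path, junk_paths=False):
    """Return the path a file would get inside the archive."""
    name = posixpath.basename(file_path)
    if junk_paths:
        # Keep only the file name, similar to `zip -j`.
        return name
    return posixpath.join(sample_slug, field, name)


nested = archive_member_path("sample-1", "bam", "data/42/reads.bam")
flat = archive_member_path("sample-1", "bam", "data/42/reads.bam", junk_paths=True)
```

Note that junking paths can cause name collisions when several samples produce files with the same name.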

Output results

archive
label:

Archive

type:

basic:file

BAM file

data:alignment:bam:upload upload-bam (basic:file  src, basic:string  species, basic:string  build) [Source: v1.8.0]

Import a BAM file (.bam), which is the binary format for storing sequence alignment data. This format is described on the [SAM Tools web site](http://samtools.github.io/hts-specs/).

Input arguments

src
label:

Mapping (BAM)

type:

basic:file

description:

A mapping file in BAM format. The file will be indexed on upload, so additional BAI files are not required.

validate_regex:

\.(bam)$

species
label:

Species

type:

basic:string

description:

Species Latin name.

choices:

  • Homo sapiens: Homo sapiens

  • Mus musculus: Mus musculus

  • Rattus norvegicus: Rattus norvegicus

  • Dictyostelium discoideum: Dictyostelium discoideum

  • Odocoileus virginianus texanus: Odocoileus virginianus texanus

  • Solanum tuberosum: Solanum tuberosum

build
label:

Build

type:

basic:string

Output results

bam
label:

Uploaded file

type:

basic:file

bai
label:

Index BAI

type:

basic:file

stats
label:

Alignment statistics

type:

basic:file

species
label:

Species

type:

basic:string

build
label:

Build

type:

basic:string

BAM file and index

data:alignment:bam:upload upload-bam-indexed (basic:file  src, basic:file  src2, basic:string  species, basic:string  build) [Source: v1.8.0]

Import a BAM file (.bam) and BAM index (.bam.bai). BAM file is the binary format for storing sequence alignment data. This format is described on the [SAM Tools web site](http://samtools.github.io/hts-specs/).

Input arguments

src
label:

Mapping (BAM)

type:

basic:file

description:

A mapping file in BAM format.

validate_regex:

\.(bam)$

src2
label:

bam index (*.bam.bai file)

type:

basic:file

description:

An index file of a BAM mapping file (ending with bam.bai).

validate_regex:

\.(bam.bai)$
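
The two `validate_regex` patterns can be tried out directly. The sketch below assumes the platform applies them as a search over the file name (an assumption; the matching mode is not stated here). It also shows that the unescaped dot in the index pattern matches any character, not only a literal period.

```python
import re

BAM_RE = re.compile(r"\.(bam)$")
BAI_RE = re.compile(r"\.(bam.bai)$")


def is_valid(pattern, filename):
    """Return True when the pattern is found in the file name."""
    return pattern.search(filename) is not None


bam_ok = is_valid(BAM_RE, "sample.bam")
bai_ok = is_valid(BAI_RE, "sample.bam.bai")
# The "." between "bam" and "bai" is a wildcard, so this also passes.
loose_match = is_valid(BAI_RE, "sample.bamXbai")
```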

species
label:

Species

type:

basic:string

description:

Species Latin name.

choices:

  • Homo sapiens: Homo sapiens

  • Mus musculus: Mus musculus

  • Rattus norvegicus: Rattus norvegicus

  • Dictyostelium discoideum: Dictyostelium discoideum

  • Odocoileus virginianus texanus: Odocoileus virginianus texanus

  • Solanum tuberosum: Solanum tuberosum

build
label:

Build

type:

basic:string

Output results

bam
label:

Uploaded file

type:

basic:file

bai
label:

Index BAI

type:

basic:file

stats
label:

Alignment statistics

type:

basic:file

species
label:

Species

type:

basic:string

build
label:

Build

type:

basic:string

BBDuk (paired-end)

data:reads:fastq:paired:bbduk: bbduk-paired (data:reads:fastq:paired  reads, basic:integer  min_length, list:data:seq:nucleotide  sequences, list:basic:string  literal_sequences, basic:integer  kmer_length, basic:boolean  check_reverse_complements, basic:boolean  mask_middle_base, basic:integer  min_kmer_hits, basic:decimal  min_kmer_fraction, basic:decimal  min_coverage_fraction, basic:integer  hamming_distance, basic:integer  query_hamming_distance, basic:integer  edit_distance, basic:integer  hamming_distance2, basic:integer  query_hamming_distance2, basic:integer  edit_distance2, basic:boolean  forbid_N, basic:boolean  find_best_match, basic:boolean  remove_if_either_bad, basic:boolean  perform_error_correction, basic:string  k_trim, basic:string  k_mask, basic:boolean  mask_fully_covered, basic:integer  min_k, basic:string  quality_trim, basic:integer  trim_quality, basic:string  quality_encoding_offset, basic:boolean  ignore_bad_quality, basic:integer  trim_poly_A, basic:decimal  min_length_fraction, basic:integer  max_length, basic:integer  min_average_quality, basic:integer  min_average_quality_bases, basic:integer  min_base_quality, basic:integer  min_consecutive_bases, basic:integer  trim_pad, basic:boolean  trim_by_overlap, basic:boolean  strict_overlap, basic:integer  min_overlap, basic:integer  min_insert, basic:boolean  trim_pairs_evenly, basic:integer  force_trim_left, basic:integer  force_trim_right, basic:integer  force_trim_right2, basic:integer  force_trim_mod, basic:integer  restrict_left, basic:integer  restrict_right, basic:decimal  min_GC, basic:decimal  max_GC, basic:integer  maxns, basic:boolean  toss_junk, basic:boolean  chastity_filter, basic:boolean  barcode_filter, list:data:seq:nucleotide  barcode_files, list:basic:string  barcode_sequences, basic:integer  x_min, basic:integer  y_min, basic:integer  x_max, basic:integer  y_max, basic:decimal  entropy, basic:integer  entropy_window, basic:integer  entropy_k, basic:boolean  entropy_mask, basic:integer  min_base_frequency, basic:boolean  nogroup) [Source: v3.1.2]

Run BBDuk on paired-end reads. BBDuk combines the most common data-quality-related trimming, filtering, and masking operations into a single high-performance tool. It is capable of quality-trimming and filtering, adapter-trimming, contaminant-filtering via kmer matching, sequence masking, GC-filtering, length filtering, entropy-filtering, format conversion, histogram generation, subsampling, quality-score recalibration, kmer cardinality estimation, and various other operations in a single pass. See [here](https://jgi.doe.gov/data-and-tools/bbtools/bb-tools-user-guide/bbduk-guide/) for more information.

Input arguments

reads
label:

Reads

type:

data:reads:fastq:paired

required:

True

disabled:

False

hidden:

False

min_length
label:

Minimum length

type:

basic:integer

description:

Reads shorter than the minimum length will be discarded after trimming.

required:

True

disabled:

False

hidden:

False

default:

10

reference.sequences
label:

Sequences

type:

list:data:seq:nucleotide

description:

Reference sequences include adapters, contaminants, and degenerate sequences. They can be provided in a multi-sequence FASTA file or as a set of literal sequences below.

required:

False

disabled:

False

hidden:

False

reference.literal_sequences
label:

Literal sequences

type:

list:basic:string

description:

Literal sequences can be specified by inputting them one by one and pressing Enter after each sequence.

required:

False

disabled:

False

hidden:

False

default:

[]

processing.kmer_length
label:

Kmer length

type:

basic:integer

description:

Kmer length used for finding contaminants. Contaminants shorter than kmer length will not be found. Kmer length must be at least 1.

required:

True

disabled:

False

hidden:

False

default:

27

processing.check_reverse_complements
label:

Check reverse complements

type:

basic:boolean

description:

Look for reverse complements of kmers in addition to forward kmers.

required:

True

disabled:

False

hidden:

False

default:

True

processing.mask_middle_base
label:

Mask the middle base of a kmer

type:

basic:boolean

description:

Treat the middle base of a kmer as a wildcard to increase sensitivity in the presence of errors.

required:

True

disabled:

False

hidden:

False

default:

True
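
The kmer length, reverse-complement, and middle-base options can be pictured with a toy matcher. This is a simplified sketch of kmer matching in general, not BBDuk's implementation; all function names are made up, and masking is modeled by replacing the middle base with `N` in both reference and read kmers.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def kmers(seq, k):
    """All overlapping kmers of seq; empty when seq is shorter than k."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def mask_middle(kmer):
    """Replace the middle base with N so it acts as a wildcard."""
    mid = len(kmer) // 2
    return kmer[:mid] + "N" + kmer[mid + 1:]


def reference_index(references, k, check_rc=True, mask=True):
    """Build the set of (optionally masked) reference kmers."""
    index = set()
    for ref in references:
        for kmer in kmers(ref, k):
            forms = [kmer, reverse_complement(kmer)] if check_rc else [kmer]
            for form in forms:
                index.add(mask_middle(form) if mask else form)
    return index


def count_hits(read, index, k, mask=True):
    """Number of read kmers found in the reference index."""
    return sum(
        1 for kmer in kmers(read, k)
        if (mask_middle(kmer) if mask else kmer) in index
    )


masked_index = reference_index(["ACGTAC"], k=5)
hits = count_hits("ACCTA", masked_index, k=5)  # differs from ACGTA only in the middle
```

A reference shorter than the kmer length contributes no kmers at all, which is why such contaminants cannot be found.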

processing.min_kmer_hits
label:

Minimum number of kmer hits

type:

basic:integer

description:

Reads need at least this many matching kmers to be considered matching the reference.

required:

True

disabled:

False

hidden:

False

default:

1

processing.min_kmer_fraction
label:

Minimum kmer fraction

type:

basic:decimal

description:

A read needs at least this fraction of its total kmers to hit a reference in order to be considered a match. If this and ‘Minimum number of kmer hits’ are set, the greater is used.

required:

True

disabled:

False

hidden:

False

default:

0.0
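
The rule that the greater of the two thresholds applies can be written out as a short calculation. The rounding (ceiling) and the function name are assumptions for illustration; BBDuk's exact rounding is not specified here.

```python
import math


def required_kmer_hits(read_length, k, min_kmer_hits=1, min_kmer_fraction=0.0):
    """Number of matching kmers a read needs to count as a match."""
    total_kmers = max(read_length - k + 1, 0)
    from_fraction = math.ceil(min_kmer_fraction * total_kmers)
    return max(min_kmer_hits, from_fraction)


# A 100 bp read has 74 kmers of length 27.
needed_default = required_kmer_hits(100, 27)
needed_half = required_kmer_hits(100, 27, min_kmer_fraction=0.5)
```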

processing.min_coverage_fraction
label:

Minimum coverage fraction

type:

basic:decimal

description:

A read needs at least this fraction of its total bases to be covered by reference kmers to be considered a match. If specified, ‘Minimum coverage fraction’ overrides ‘Minimum number of kmer hits’ and ‘Minimum kmer fraction’.

required:

True

disabled:

False

hidden:

False

default:

0.0

processing.hamming_distance
label:

Maximum Hamming distance for kmers (substitutions only)

type:

basic:integer

description:

Hamming distance, i.e., the number of mismatches allowed in the kmer.

required:

True

disabled:

False

hidden:

False

default:

0
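
For reference, the Hamming distance between two equal-length kmers is simply the number of positions at which they differ. The helper names below are illustrative.

```python
def hamming_distance(a, b):
    """Number of mismatching positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))


def matches_within(read_kmer, ref_kmer, max_distance=0):
    """True when the kmers differ in at most max_distance positions."""
    return hamming_distance(read_kmer, ref_kmer) <= max_distance


exact_only = matches_within("ACGTA", "ACCTA")        # one mismatch, limit 0
one_allowed = matches_within("ACGTA", "ACCTA", 1)    # one mismatch, limit 1
```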

processing.query_hamming_distance
label:

Hamming distance for query kmers

type:

basic:integer

description:

Set a Hamming distance for query kmers instead of the read kmers. This makes the read processing much slower, but does not use additional memory.

required:

True

disabled:

False

hidden:

False

default:

0

processing.edit_distance
label:

Maximum edit distance from reference kmers (substitutions and indels)

type:

basic:integer

required:

True

disabled:

False

hidden:

False

default:

0

processing.hamming_distance2
label:

Hamming distance for short kmers when looking for shorter kmers

type:

basic:integer

required:

True

disabled:

False

hidden:

False

default:

0

processing.query_hamming_distance2
label:

Hamming distance for short query kmers when looking for shorter kmers

type:

basic:integer

required:

True

disabled:

False

hidden:

False

default:

0

processing.edit_distance2
label:

Maximum edit distance from short reference kmers (substitutions and indels) when looking for shorter kmers

type:

basic:integer

required:

True

disabled:

False

hidden:

False

default:

0

processing.forbid_N
label:

Forbid matching of read kmers containing N

type:

basic:boolean

description:

By default, these will match a reference ‘A’ if ‘Maximum Hamming distance for kmers’ > 0 or ‘Maximum edit distance from reference kmers’ > 0, to increase sensitivity.

required:

True

disabled:

False

hidden:

False

default:

False

processing.find_best_match
label:

Find best match

type:

basic:boolean

description:

If multiple matches, associate read with sequence sharing most kmers.

required:

True

disabled:

False

hidden:

False

default:

True

processing.remove_if_either_bad
label:

Remove both sequences of a paired-end read, if either of them is to be removed

type:

basic:boolean

required:

True

disabled:

False

hidden:

False

default:

True

processing.perform_error_correction
label:

Perform error correction with BBMerge prior to kmer operations

type:

basic:boolean

required:

True

disabled:

False

hidden:

False

default:

False

operations.k_trim
label:

Trimming protocol to remove bases matching reference kmers from reads

type:

basic:string

required:

True

disabled:

False

hidden:

False

default:

f

choices:

  • Don’t trim: f

  • Trim to the right: r

  • Trim to the left: l

operations.k_mask
label:

Symbol to replace bases matching reference kmers

type:

basic:string

description:

Allows any non-whitespace character other than t or f. Processes short kmers on both ends.

required:

True

disabled:

False

hidden:

False

default:

f

operations.mask_fully_covered
label:

Only mask bases that are fully covered by kmers

type:

basic:boolean

required:

True

disabled:

False

hidden:

False

default:

False

operations.min_k
label:

Look for shorter kmers at read tips down to this length when k-trimming or masking

type:

basic:integer

description:

-1 means disabled. Enabling this will disable treating the middle base of a kmer as a wildcard to increase sensitivity in the presence of errors.

required:

True

disabled:

False

hidden:

False

default:

-1

operations.quality_trim
label:

Trimming protocol to remove bases with quality below the minimum average region quality from read ends

type:

basic:string

description:

Performed after looking for kmers. If enabled, set also ‘Average quality below which to trim region’.

required:

True

disabled:

False

hidden:

False

default:

f

choices:

  • Trim neither end: f

  • Trim both ends: rl

  • Trim only right end: r

  • Trim only left end: l

  • Use sliding window: w

operations.trim_quality
label:

Average quality below which to trim region

type:

basic:integer

description:

Set trimming protocol to enable this parameter.

required:

True

disabled:

operations.quality_trim === ‘f’

hidden:

False

default:

6

operations.quality_encoding_offset
label:

Quality encoding offset

type:

basic:string

description:

Quality encoding offset for input FASTQ files.

required:

True

disabled:

False

hidden:

False

default:

auto

choices:

  • Sanger / Illumina 1.8+ (33): 33

  • Illumina up to 1.3+, 1.5+ (64): 64

  • Auto: auto
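As a small illustration of what the offset means, the sketch below decodes a FASTQ quality string into Phred scores by subtracting the offset from each character's ASCII code. The function name is hypothetical, and the ‘auto’ choice is not modeled here.

```python
def decode_qualities(quality_string: str, offset: int = 33) -> list[int]:
    """Convert a FASTQ quality string to Phred scores for a given offset."""
    scores = [ord(ch) - offset for ch in quality_string]
    if any(score < 0 for score in scores):
        # Negative scores suggest that the wrong offset was chosen.
        raise ValueError("quality characters are below the given offset")
    return scores
```

The same Phred score is written as a different character under each encoding: a score of 40 is `I` with offset 33 and `h` with offset 64.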

operations.ignore_bad_quality
label:

Don’t crash if quality values appear to be incorrect

type:

basic:boolean

required:

True

disabled:

False

hidden:

False

default:

False

operations.trim_poly_A
label:

Minimum length of poly-A or poly-T tails to trim on either end of reads

type:

basic:integer

required:

True

disabled:

False

hidden:

False

default:

0

operations.min_length_fraction
label:

Minimum length fraction

type:

basic:decimal

description:

Reads shorter than this fraction of original length after trimming will be discarded.

required:

True

disabled:

False

hidden:

False

default:

0.0

operations.max_length
label:

Maximum length

type:

basic:integer

description:

Reads longer than this after trimming will be discarded.

required:

False

disabled:

False

hidden:

False

operations.min_average_quality
label:

Minimum average quality

type:

basic:integer

description:

Reads with average quality (after trimming) below this will be discarded.

required:

True

disabled:

False

hidden:

False

default:

0

operations.min_average_quality_bases
label:

Number of initial bases to calculate minimum average quality from

type:

basic:integer

description:

If positive, calculate minimum average quality from this many initial bases.

required:

True

disabled:

False

hidden:

False

default:

0

operations.min_base_quality
label:

Minimum base quality below which reads are discarded after trimming

type:

basic:integer

required:

True

disabled:

False

hidden:

False

default:

0

operations.min_consecutive_bases
label:

Minimum number of consecutive called bases

type:

basic:integer

required:

True

disabled:

False

hidden:

False

default:

0

operations.trim_pad
label:

Number of bases to trim around matching kmers

type:

basic:integer

required:

True

disabled:

False

hidden:

False

default:

0

operations.trim_by_overlap
label:

Trim adapters based on where paired-end reads overlap

type:

basic:boolean

required:

True

disabled:

False

hidden:

False

default:

False

operations.strict_overlap
label:

Adjust sensitivity in ‘Trim adapters based on where paired-end reads overlap’ mode

type:

basic:boolean

required:

True

disabled:

False

hidden:

False

default:

True

operations.min_overlap
label:

Minimum number of overlapping bases

type:

basic:integer

description:

Require this many bases of overlap for detection.

required:

True

disabled:

False

hidden:

False

default:

14

operations.min_insert
label:

Minimum insert size

type:

basic:integer

description:

Require insert size of at least this for overlap. Should be reduced to 16 for small RNA sequencing.

required:

True

disabled:

False

hidden:

False

default:

40

operations.trim_pairs_evenly
label:

Trim both sequences of paired-end reads to the minimum length of either sequence

type:

basic:boolean

required:

True

disabled:

False

hidden:

False

default:

False

operations.force_trim_left
label:

Position from which to trim bases to the left

type:

basic:integer

required:

True

disabled:

False

hidden:

False

default:

0

operations.force_trim_right
label:

Position from which to trim bases to the right

type:

basic:integer

required:

True

disabled:

False

hidden:

False

default:

0

operations.force_trim_right2
label:

Number of bases to trim from the right end

type:

basic:integer

required:

True

disabled:

False

hidden:

False

default:

0

operations.force_trim_mod
label:

Modulo to right-trim reads

type:

basic:integer

description:

Trim reads to the largest multiple of modulo.

required:

True

disabled:

False

hidden:

False

default:

0
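The arithmetic behind this option can be sketched as follows. The function name is hypothetical, and treating a non-positive modulo as "disabled" is an assumption based on the default of 0.

```python
def length_after_modulo_trim(read_length: int, modulo: int) -> int:
    """Return the read length after right-trimming to a multiple of modulo."""
    if modulo <= 0:
        # Assumption: a non-positive modulo leaves the read untouched.
        return read_length
    return read_length - (read_length % modulo)
```

For example, a 151 bp read with a modulo of 5 would be trimmed by one base to 150 bp.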

operations.restrict_left
label:

Number of leftmost bases to look in for kmer matches

type:

basic:integer

required:

True

disabled:

False

hidden:

False

default:

0

operations.restrict_right
label:

Number of rightmost bases to look in for kmer matches

type:

basic:integer

required:

True

disabled:

False

hidden:

False

default:

0

operations.min_GC
label:

Minimum GC content

type:

basic:decimal

description:

Discard reads with lower GC content.

required:

True

disabled:

False

hidden:

False

default:

0.0

operations.max_GC
label:

Maximum GC content

type:

basic:decimal

description:

Discard reads with higher GC content.

required:

True

disabled:

False

hidden:

False

default:

1.0
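The two GC content bounds can be read as a simple range check, sketched below. The function names are hypothetical, and counting every character (including N) in the denominator is an assumption; BBDuk may handle ambiguous bases differently.

```python
def gc_fraction(sequence: str) -> float:
    """Fraction of G and C bases in a sequence (0.0 for an empty sequence)."""
    if not sequence:
        return 0.0
    seq = sequence.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def passes_gc_filter(sequence: str, min_gc: float = 0.0, max_gc: float = 1.0) -> bool:
    """Keep a read only if its GC fraction lies within [min_gc, max_gc]."""
    return min_gc <= gc_fraction(sequence) <= max_gc
```

With the defaults of 0.0 and 1.0, every read passes, so the filter is effectively off.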

operations.maxns
label:

Max Ns after trimming

type:

basic:integer

description:

If non-negative, reads with more Ns than this (after trimming) will be discarded.

required:

True

disabled:

False

hidden:

False

default:

-1

operations.toss_junk
label:

Discard reads with invalid characters as bases

type:

basic:boolean

required:

True

disabled:

False

hidden:

False

default:

False

header_parsing.chastity_filter
label:

Discard reads that fail Illumina chastity filtering

type:

basic:boolean

description:

Discard reads with id containing ‘ 1:Y:’ or ‘ 2:Y:’.

required:

True

disabled:

False

hidden:

False

default:

False
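The header check described above amounts to a substring test, sketched here with a hypothetical function name and a made-up example header. In Illumina-style headers the field after the read number is the filter flag, where `Y` marks a read that failed filtering.

```python
def fails_chastity_filter(read_id: str) -> bool:
    """Return True if the read id contains ' 1:Y:' or ' 2:Y:'."""
    return " 1:Y:" in read_id or " 2:Y:" in read_id


# Hypothetical read ids for illustration only.
failed = "@INSTR:1:FLOWCELL:1:1101:1000:2000 1:Y:0:ATCACG"
passed = "@INSTR:1:FLOWCELL:1:1101:1000:2000 1:N:0:ATCACG"
```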

header_parsing.barcode_filter
label:

Remove reads with unexpected barcodes

type:

basic:boolean

description:

Remove reads with unexpected barcodes if barcodes are set, or barcodes containing ‘N’ otherwise. A barcode must be the last part of the read header.

required:

True

disabled:

False

hidden:

False

default:

False

header_parsing.barcode_files
label:

Barcode sequences

type:

list:data:seq:nucleotide

description:

FASTA file(s) with barcode sequences.

required:

False

disabled:

False

hidden:

False

header_parsing.barcode_sequences
label:

Literal barcode sequences

type:

list:basic:string

description:

Literal barcode sequences can be specified by inputting them one by one and pressing Enter after each sequence.

required:

False

disabled:

False

hidden:

False

default:

[]

header_parsing.x_min
label:

Minimum X coordinate

type:

basic:integer

description:

If positive, discard reads with a smaller X coordinate.

required:

True

disabled:

False

hidden:

False

default:

-1

header_parsing.y_min
label:

Minimum Y coordinate

type:

basic:integer

description:

If positive, discard reads with a smaller Y coordinate.

required:

True

disabled:

False

hidden:

False

default:

-1

header_parsing.x_max
label:

Maximum X coordinate

type:

basic:integer

description:

If positive, discard reads with a larger X coordinate.

required:

True

disabled:

False

hidden:

False

default:

-1

header_parsing.y_max
label:

Maximum Y coordinate

type:

basic:integer

description:

If positive, discard reads with a larger Y coordinate.

required:

True

disabled:

False

hidden:

False

default:

-1

complexity.entropy
label:

Minimum entropy

type:

basic:decimal

description:

Set between 0 and 1 to filter reads with entropy below that value. Higher is more stringent.

required:

True

disabled:

False

hidden:

False

default:

-1.0

complexity.entropy_window
label:

Length of sliding window used to calculate entropy

type:

basic:integer

description:

To use the sliding window, set the minimum entropy to a value between 0.0 and 1.0.

required:

True

disabled:

False

hidden:

False

default:

50

complexity.entropy_k
label:

Length of kmers used to calculate entropy

type:

basic:integer

required:

True

disabled:

False

hidden:

False

default:

5
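To give an intuition for how the entropy window and entropy kmer length interact, here is a simplified sketch of a normalized Shannon entropy computed over the kmers in each window. The function names are hypothetical, and the normalization is an assumption; BBDuk's exact formula and the way it combines per-window values into a single read-level decision are not described in this document.

```python
import math
from collections import Counter


def window_entropy(window: str, k: int = 5) -> float:
    """Normalized Shannon entropy (0 to 1) of the kmers in one window."""
    kmers = [window[i:i + k] for i in range(len(window) - k + 1)]
    if len(kmers) < 2:
        return 0.0
    total = len(kmers)
    counts = Counter(kmers)
    shannon = -sum((c / total) * math.log(c / total) for c in counts.values())
    # Assumption: normalize by the largest entropy reachable in this window.
    max_entropy = math.log(min(total, 4 ** k))
    return max(0.0, shannon / max_entropy)


def window_entropies(read: str, window: int = 50, k: int = 5) -> list[float]:
    """Entropy of every sliding window; one value if the read is short."""
    if len(read) <= window:
        return [window_entropy(read, k)]
    return [window_entropy(read[i:i + window], k)
            for i in range(len(read) - window + 1)]
```

Under this sketch a homopolymer scores 0 and a window in which every kmer is distinct scores 1, which matches the idea that a higher minimum entropy is more stringent.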

complexity.entropy_mask
label:

Mask low-entropy parts of sequences with N instead of discarding

type:

basic:boolean

required:

True

disabled:

False

hidden:

False

default:

False

complexity.min_base_frequency
label:

Minimum base frequency

type:

basic:integer

required:

True

disabled:

False

hidden:

False

default:

0

fastqc.nogroup
label:

Disable grouping of bases for reads >50bp

type:

basic:boolean

description:

All reports will show data for every base in the read. FastQC may crash if this option is used on very long reads.

required:

True

disabled:

False

hidden:

False

default:

False

Output results

fastq
label:

Remaining upstream reads

type:

list:basic:file

required:

True

disabled:

False

hidden:

False

fastq2
label:

Remaining downstream reads

type:

list:basic:file

required:

True

disabled:

False

hidden:

False

statistics
label:

Statistics

type:

list:basic:file

required:

True

disabled:

False

hidden:

False

fastqc_url
label:

Upstream quality control with FastQC

type:

list:basic:file:html

required:

True

disabled:

False

hidden:

False

fastqc_url2
label:

Downstream quality control with FastQC

type:

list:basic:file:html

required:

True

disabled:

False

hidden:

False

fastqc_archive
label:

Download upstream FastQC archive

type:

list:basic:file

required:

True

disabled:

False

hidden:

False

fastqc_archive2
label:

Download downstream FastQC archive

type:

list:basic:file

required:

True

disabled:

False

hidden:

False

BBDuk (single-end)

data:reads:fastq:single:bbduk:bbduk-single (data:reads:fastq:single  reads, basic:integer  min_length, list:data:seq:nucleotide  sequences, list:basic:string  literal_sequences, basic:integer  kmer_length, basic:boolean  check_reverse_complements, basic:boolean  mask_middle_base, basic:integer  min_kmer_hits, basic:decimal  min_kmer_fraction, basic:decimal  min_coverage_fraction, basic:integer  hamming_distance, basic:integer  query_hamming_distance, basic:integer  edit_distance, basic:integer  hamming_distance2, basic:integer  query_hamming_distance2, basic:integer  edit_distance2, basic:boolean  forbid_N, basic:boolean  find_best_match, basic:string  k_trim, basic:string  k_mask, basic:boolean  mask_fully_covered, basic:integer  min_k, basic:string  quality_trim, basic:integer  trim_quality, basic:string  quality_encoding_offset, basic:boolean  ignore_bad_quality, basic:integer  trim_poly_A, basic:decimal  min_length_fraction, basic:integer  max_length, basic:integer  min_average_quality, basic:integer  min_average_quality_bases, basic:integer  min_base_quality, basic:integer  min_consecutive_bases, basic:integer  trim_pad, basic:integer  min_overlap, basic:integer  min_insert, basic:integer  force_trim_left, basic:integer  force_trim_right, basic:integer  force_trim_right2, basic:integer  force_trim_mod, basic:integer  restrict_left, basic:integer  restrict_right, basic:decimal  min_GC, basic:decimal  max_GC, basic:integer  maxns, basic:boolean  toss_junk, basic:boolean  chastity_filter, basic:boolean  barcode_filter, list:data:seq:nucleotide  barcode_files, list:basic:string  barcode_sequences, basic:integer  x_min, basic:integer  y_min, basic:integer  x_max, basic:integer  y_max, basic:decimal  entropy, basic:integer  entropy_window, basic:integer  entropy_k, basic:boolean  entropy_mask, basic:integer  min_base_frequency, basic:boolean  nogroup)[Source: v3.1.2]

Run BBDuk on single-end reads. BBDuk combines the most common data-quality-related trimming, filtering, and masking operations into a single high-performance tool. It is capable of quality-trimming and filtering, adapter-trimming, contaminant-filtering via kmer matching, sequence masking, GC-filtering, length filtering, entropy-filtering, format conversion, histogram generation, subsampling, quality-score recalibration, kmer cardinality estimation, and various other operations in a single pass. See [here](https://jgi.doe.gov/data-and-tools/bbtools/bb-tools-user-guide/bbduk-guide/) for more information.

Input arguments

reads
label:

Reads

type:

data:reads:fastq:single

required:

True

disabled:

False

hidden:

False

min_length
label:

Minimum length

type:

basic:integer

description:

Reads shorter than the minimum length will be discarded after trimming.

required:

True

disabled:

False

hidden:

False

default:

10

reference.sequences
label:

Sequences

type:

list:data:seq:nucleotide

description:

Reference sequences include adapters, contaminants, and degenerate sequences. They can be provided in a multi-sequence FASTA file or as a set of literal sequences below.

required:

False

disabled:

False

hidden:

False

reference.literal_sequences
label:

Literal sequences

type:

list:basic:string

description:

Literal sequences can be specified by inputting them one by one and pressing Enter after each sequence.

required:

False

disabled:

False

hidden:

False

default:

[]

processing.kmer_length
label:

Kmer length

type:

basic:integer

description:

Kmer length used for finding contaminants. Contaminants shorter than kmer length will not be found. Kmer length must be at least 1.

required:

True

disabled:

False

hidden:

False

default:

27

processing.check_reverse_complements
label:

Check reverse complements

type:

basic:boolean

description:

Look for reverse complements of kmers in addition to forward kmers.

required:

True

disabled:

False

hidden:

False

default:

True

processing.mask_middle_base
label:

Mask the middle base of a kmer

type:

basic:boolean

description:

Treat the middle base of a kmer as a wildcard to increase sensitivity in the presence of errors.

required:

True

disabled:

False

hidden:

False

default:

True

processing.min_kmer_hits
label:

Minimum number of kmer hits

type:

basic:integer

description:

Reads need at least this many matching kmers to be considered matching the reference.

required:

True

disabled:

False

hidden:

False

default:

1

processing.min_kmer_fraction
label:

Minimum kmer fraction

type:

basic:decimal

description:

A read needs at least this fraction of its total kmers to hit a reference in order to be considered a match. If this and ‘Minimum number of kmer hits’ are set, the greater is used.

required:

True

disabled:

False

hidden:

False

default:

0.0
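The interaction between ‘Minimum number of kmer hits’ and ‘Minimum kmer fraction’ can be sketched as taking the larger of the two thresholds. The function name is hypothetical, and both the kmer count per read (length minus kmer length plus one) and the rounding up of the fractional threshold are assumptions.

```python
import math


def required_kmer_hits(read_length: int, kmer_length: int,
                       min_kmer_hits: int = 1,
                       min_kmer_fraction: float = 0.0) -> int:
    """Number of matching kmers a read needs to count as a match."""
    total_kmers = max(read_length - kmer_length + 1, 0)
    # Assumption: the fractional threshold is rounded up to whole kmers.
    hits_from_fraction = math.ceil(min_kmer_fraction * total_kmers)
    return max(min_kmer_hits, hits_from_fraction)
```

For a 100 bp read and the default kmer length of 27 there are 74 kmers, so a fraction of 0.5 would require 37 hits, while the default fraction of 0.0 leaves the hit count threshold in control.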

processing.min_coverage_fraction
label:

Minimum coverage fraction

type:

basic:decimal

description:

A read needs at least this fraction of its total bases to be covered by reference kmers to be considered a match. If specified, ‘Minimum coverage fraction’ overrides ‘Minimum number of kmer hits’ and ‘Minimum kmer fraction’.

required:

True

disabled:

False

hidden:

False

default:

0.0

processing.hamming_distance
label:

Maximum Hamming distance for kmers (substitutions only)

type:

basic:integer

description:

Hamming distance, i.e. the number of mismatches allowed in the kmer.

required:

True

disabled:

False

hidden:

False

default:

0

processing.query_hamming_distance
label:

Hamming distance for query kmers

type:

basic:integer

description:

Set a Hamming distance for query kmers instead of the read kmers. This makes read processing much slower, but does not use additional memory.

required:

True

disabled:

False

hidden:

False

default:

0

processing.edit_distance
label:

Maximum edit distance from reference kmers (substitutions and indels)

type:

basic:integer

required:

True

disabled:

False

hidden:

False

default:

0

processing.hamming_distance2
label:

Hamming distance for short kmers when looking for shorter kmers

type:

basic:integer

required:

True

disabled:

False

hidden:

False

default:

0

processing.query_hamming_distance2
label:

Hamming distance for short query kmers when looking for shorter kmers

type:

basic:integer

required:

True

disabled:

False

hidden:

False

default:

0

processing.edit_distance2
label:

Maximum edit distance from short reference kmers (substitutions and indels) when looking for shorter kmers

type:

basic:integer

required:

True

disabled:

False

hidden:

False

default:

0

processing.forbid_N
label:

Forbid matching of read kmers containing N

type:

basic:boolean

description:

By default, these will match a reference ‘A’ if ‘Maximum Hamming distance for kmers’ > 0 or ‘Maximum edit distance from reference kmers’ > 0, to increase sensitivity.

required:

True

disabled:

False

hidden:

False

default:

False

processing.find_best_match
label:

Find best match

type:

basic:boolean

description:

If a read matches multiple sequences, associate it with the sequence sharing the most kmers.

required:

True

disabled:

False

hidden:

False

default:

True

operations.k_trim
label:

Trimming protocol to remove bases matching reference kmers from reads

type:

basic:string

required:

True

disabled:

False

hidden:

False

default:

f

choices:

  • Don’t trim: f

  • Trim to the right: r

  • Trim to the left: l
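The three trimming choices can be illustrated with a deliberately simplified sketch. It assumes that trimming to the right removes the matching kmer and everything to its right, and that trimming to the left removes the matching kmer and everything to its left. The function name is hypothetical, and only exact forward matches are considered: no reverse complements, wildcards, or distance tolerances.

```python
def k_trim(read: str, reference_kmers: set[str], k: int, mode: str = "f") -> str:
    """Trim a read at exact matches to reference kmers (simplified)."""
    if mode == "f":
        return read  # trimming disabled
    hits = [i for i in range(len(read) - k + 1)
            if read[i:i + k] in reference_kmers]
    if not hits:
        return read
    if mode == "r":
        # Drop the leftmost matching kmer and all bases to its right.
        return read[:hits[0]]
    if mode == "l":
        # Drop the rightmost matching kmer and all bases to its left.
        return read[hits[-1] + k:]
    raise ValueError(f"unknown trimming mode: {mode!r}")
```

Right trimming is the usual choice for 3’ adapters, since adapter sequence appears after the insert in the read.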

operations.k_mask
label:

Symbol to replace bases matching reference kmers

type:

basic:string

description:

Allows any non-whitespace character other than t or f. Processes short kmers on both ends.

required:

True

disabled:

False

hidden:

False

default:

f

operations.mask_fully_covered
label:

Only mask bases that are fully covered by kmers

type:

basic:boolean

required:

True

disabled:

False

hidden:

False

default:

False

operations.min_k
label:

Look for shorter kmers at read tips down to this length when k-trimming or masking

type:

basic:integer

description:

-1 means disabled. Enabling this will disable treating the middle base of a kmer as a wildcard to increase sensitivity in the presence of errors.

required:

True

disabled:

False

hidden:

False

default:

-1

operations.quality_trim
label:

Trimming protocol to remove bases with quality below the minimum average region quality from read ends

type:

basic:string

description:

Performed after looking for kmers. If enabled, set also ‘Average quality below which to trim region’.

required:

True

disabled:

False

hidden:

False

default:

f

choices:

  • Trim neither end: f

  • Trim both ends: rl

  • Trim only right end: r

  • Trim only left end: l

  • Use sliding window: w

operations.trim_quality
label:

Average quality below which to trim region

type:

basic:integer

description:

Set trimming protocol to enable this parameter.

required:

True

disabled:

operations.quality_trim === ‘f’

hidden:

False

default:

6

operations.quality_encoding_offset
label:

Quality encoding offset

type:

basic:string

description:

Quality encoding offset for input FASTQ files.

required:

True

disabled:

False

hidden:

False

default:

auto

choices:

  • Sanger / Illumina 1.8+ (33): 33

  • Illumina up to 1.3+, 1.5+ (64): 64

  • Auto: auto

operations.ignore_bad_quality
label:

Don’t crash if quality values appear to be incorrect

type:

basic:boolean

required:

True

disabled:

False

hidden:

False

default:

False

operations.trim_poly_A
label:

Minimum length of poly-A or poly-T tails to trim on either end of reads

type:

basic:integer

required:

True

disabled:

False

hidden:

False

default:

0

operations.min_length_fraction
label:

Minimum length fraction

type:

basic:decimal

description:

Reads shorter than this fraction of original length after trimming will be discarded.

required:

True

disabled:

False

hidden:

False

default:

0.0

operations.max_length
label:

Maximum length

type:

basic:integer

description:

Reads longer than this after trimming will be discarded.

required:

False

disabled:

False

hidden:

False
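The three length-based filters (‘Minimum length’, ‘Minimum length fraction’, and ‘Maximum length’) can be combined into one check, sketched below. The function name is hypothetical, and treating the boundaries as strict ("shorter than", "longer than") is an assumption taken from the wording of the descriptions.

```python
from typing import Optional


def passes_length_filters(original_length: int, trimmed_length: int,
                          min_length: int = 10,
                          min_length_fraction: float = 0.0,
                          max_length: Optional[int] = None) -> bool:
    """Return True if a trimmed read survives all length filters."""
    if trimmed_length < min_length:
        return False
    if trimmed_length < min_length_fraction * original_length:
        return False
    if max_length is not None and trimmed_length > max_length:
        return False
    return True
```

Because ‘Maximum length’ is optional and has no default, the sketch models it as `None`, meaning that no upper bound is applied.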

operations.min_average_quality
label:

Minimum average quality

type:

basic:integer

description:

Reads with average quality (after trimming) below this will be discarded.

required:

True

disabled:

False

hidden:

False

default:

0

operations.min_average_quality_bases
label:

Number of initial bases to calculate minimum average quality from

type:

basic:integer

description:

If positive, calculate minimum average quality from this many initial bases.

required:

True

disabled:

False

hidden:

False

default:

0

operations.min_base_quality
label:

Minimum base quality below which reads are discarded after trimming

type:

basic:integer

required:

True

disabled:

False

hidden:

False

default:

0

operations.min_consecutive_bases
label:

Minimum number of consecutive called bases

type:

basic:integer

required:

True

disabled:

False

hidden:

False

default:

0

operations.trim_pad
label:

Number of bases to trim around matching kmers

type:

basic:integer

required:

True

disabled:

False

hidden:

False

default:

0

operations.min_overlap
label:

Minimum number of overlapping bases

type:

basic:integer

description:

Require this many bases of overlap for detection.

required:

True

disabled:

False

hidden:

False

default:

14

operations.min_insert
label:

Minimum insert size

type:

basic:integer

description:

Require insert size of at least this for overlap. Should be reduced to 16 for small RNA sequencing.

required:

True

disabled:

False

hidden:

False

default:

40

operations.force_trim_left
label:

Position from which to trim bases to the left

type:

basic:integer

required:

True

disabled:

False

hidden:

False

default:

0

operations.force_trim_right
label:

Position from which to trim bases to the right

type:

basic:integer

required:

True

disabled:

False

hidden:

False

default:

0

operations.force_trim_right2
label:

Number of bases to trim from the right end

type:

basic:integer

required:

True

disabled:

False

hidden:

False

default:

0

operations.force_trim_mod
label:

Modulo to right-trim reads

type:

basic:integer

description:

Trim reads to the largest multiple of modulo.

required:

True

disabled:

False

hidden:

False

default:

0

operations.restrict_left
label:

Number of leftmost bases to look in for kmer matches

type:

basic:integer

required:

True

disabled:

False

hidden:

False

default:

0

operations.restrict_right
label:

Number of rightmost bases to look in for kmer matches

type:

basic:integer

required:

True

disabled:

False

hidden:

False

default:

0

operations.min_GC
label:

Minimum GC content

type:

basic:decimal

description:

Discard reads with lower GC content.

required:

True

disabled:

False

hidden:

False

default:

0.0

operations.max_GC
label:

Maximum GC content

type:

basic:decimal

description:

Discard reads with higher GC content.

required:

True

disabled:

False

hidden:

False

default:

1.0

operations.maxns
label:

Max Ns after trimming

type:

basic:integer

description:

If non-negative, reads with more Ns than this (after trimming) will be discarded.

required:

True

disabled:

False

hidden:

False

default:

-1

operations.toss_junk
label:

Discard reads with invalid characters as bases

type:

basic:boolean

required:

True

disabled:

False

hidden:

False

default:

False

header_parsing.chastity_filter
label:

Discard reads that fail Illumina chastity filtering

type:

basic:boolean

description:

Discard reads with id containing ‘ 1:Y:’ or ‘ 2:Y:’.

required:

True

disabled:

False

hidden:

False

default:

False

header_parsing.barcode_filter
label:

Remove reads with unexpected barcodes

type:

basic:boolean

description:

Remove reads with unexpected barcodes if barcodes are set, or barcodes containing ‘N’ otherwise. A barcode must be the last part of the read header.

required:

True

disabled:

False

hidden:

False

default:

False

header_parsing.barcode_files
label:

Barcode sequences

type:

list:data:seq:nucleotide

description:

FASTA file(s) with barcode sequences.

required:

False

disabled:

False

hidden:

False

header_parsing.barcode_sequences
label:

Literal barcode sequences

type:

list:basic:string

description:

Literal barcode sequences can be specified by inputting them one by one and pressing Enter after each sequence.

required:

False

disabled:

False

hidden:

False

default:

[]

header_parsing.x_min
label:

Minimum X coordinate

type:

basic:integer

description:

If positive, discard reads with a smaller X coordinate.

required:

True

disabled:

False

hidden:

False

default:

-1

header_parsing.y_min
label:

Minimum Y coordinate

type:

basic:integer

description:

If positive, discard reads with a smaller Y coordinate.

required:

True

disabled:

False

hidden:

False

default:

-1

header_parsing.x_max
label:

Maximum X coordinate

type:

basic:integer

description:

If positive, discard reads with a larger X coordinate.

required:

True

disabled:

False

hidden:

False

default:

-1

header_parsing.y_max
label:

Maximum Y coordinate

type:

basic:integer

description:

If positive, discard reads with a larger Y coordinate.

required:

True

disabled:

False

hidden:

False

default:

-1

complexity.entropy
label:

Minimum entropy

type:

basic:decimal

description:

Set between 0 and 1 to filter reads with entropy below that value. Higher is more stringent.

required:

True

disabled:

False

hidden:

False

default:

-1.0

complexity.entropy_window
label:

Length of sliding window used to calculate entropy

type:

basic:integer

description:

To use the sliding window, set the minimum entropy to a value between 0.0 and 1.0.

required:

True

disabled:

False

hidden:

False

default:

50

complexity.entropy_k
label:

Length of kmers used to calculate entropy

type:

basic:integer

required:

True

disabled:

False

hidden:

False

default:

5

complexity.entropy_mask
label:

Mask low-entropy parts of sequences with N instead of discarding

type:

basic:boolean

required:

True

disabled:

False

hidden:

False

default:

False

complexity.min_base_frequency
label:

Minimum base frequency

type:

basic:integer

required:

True

disabled:

False

hidden:

False

default:

0

fastqc.nogroup
label:

Disable grouping of bases for reads >50bp

type:

basic:boolean

description:

All reports will show data for every base in the read. FastQC may crash if this option is used on very long reads.

required:

True

disabled:

False

hidden:

False

default:

False

Output results

fastq
label:

Remaining reads

type:

list:basic:file

required:

True

disabled:

False

hidden:

False

statistics
label:

Statistics

type:

list:basic:file

required:

True

disabled:

False

hidden:

False

fastqc_url
label:

Quality control with FastQC

type:

list:basic:file:html

required:

True

disabled:

False

hidden:

False

fastqc_archive
label:

Download FastQC archive

type:

list:basic:file

required:

True

disabled:

False

hidden:

False

BBDuk - STAR - featureCounts - QC

data:workflow:rnaseq:featurecounts:qc:workflow-bbduk-star-featurecounts-qc (data:reads:fastq  reads, data:index:star  genome, data:annotation  annotation, basic:string  assay_type, data:index:salmon  cdna_index, data:index:star  rrna_reference, data:index:star  globin_reference, list:data:seq:nucleotide  adapters, list:basic:string  custom_adapter_sequences, basic:integer  kmer_length, basic:integer  min_k, basic:integer  hamming_distance, basic:integer  maxns, basic:integer  trim_quality, basic:integer  min_length, basic:string  quality_encoding_offset, basic:boolean  ignore_bad_quality, basic:boolean  unstranded, basic:boolean  noncannonical, basic:boolean  chimeric, basic:integer  chim_segment_min, basic:boolean  quant_mode, basic:boolean  single_end, basic:string  out_filter_type, basic:integer  out_multimap_max, basic:integer  out_mismatch_max, basic:decimal  out_mismatch_nl_max, basic:integer  out_score_min, basic:decimal  out_mismatch_nrl_max, basic:integer  align_overhang_min, basic:integer  align_sjdb_overhang_min, basic:integer  align_intron_size_min, basic:integer  align_intron_size_max, basic:integer  align_gap_max, basic:string  align_end_alignment, basic:boolean  out_unmapped, basic:string  out_sam_attributes, basic:string  out_rg_line, basic:integer  n_reads, basic:string  feature_class, basic:string  feature_type, basic:string  id_attribute, basic:boolean  by_read_group, basic:integer  n_reads, basic:integer  seed, basic:decimal  fraction, basic:boolean  two_pass)[Source: v6.2.0]

RNA-seq pipeline consisting of preprocessing, alignment, and quantification. First, reads are preprocessed by __BBDuk__, which removes adapters, trims reads for quality from the 3’-end, and discards reads that are too short after trimming. Compared to similar tools, BBDuk is well regarded for its computational efficiency. Next, preprocessed reads are aligned by the __STAR__ aligner. At the time of implementation, STAR was considered a state-of-the-art tool that consistently produces accurate results from diverse sets of reads and performs well even with default settings. For more information, see [this comparison of RNA-seq aligners](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5792058/). Finally, aligned reads are summarized to genes by __featureCounts__. Widely adopted in the bioinformatics community, featureCounts produces expression counts in a computationally efficient manner. All three tools in this workflow support parallelization to accelerate the analysis.

The rRNA contamination rate in the sample is determined using the STAR aligner. Quality-trimmed reads are down-sampled (using the __Seqtk__ tool) and aligned to the rRNA reference sequences. The alignment rate indicates the percentage of reads in the sample that are derived from rRNA sequences.

Input arguments

reads
label:

Reads (FASTQ)

type:

data:reads:fastq

description:

Reads in FASTQ format, single-end or paired-end.

required:

True

disabled:

False

hidden:

False

genome
label:

Indexed reference genome

type:

data:index:star

description:

Genome index prepared by STAR aligner indexing tool.

required:

True

disabled:

False

hidden:

False

annotation
label:

Annotation

type:

data:annotation

description:

GTF and GFF3 annotation formats are supported.

required:

True

disabled:

False

hidden:

False

assay_type
label:

Assay type

type:

basic:string

description:

In a strand non-specific assay, a read is considered overlapping with a feature regardless of whether it is mapped to the same or the opposite strand as the feature. In a strand-specific forward assay with single-end reads, the read has to be mapped to the same strand as the feature. For paired-end reads, the first read has to be on the same strand and the second read on the opposite strand. In a strand-specific reverse assay, these rules are reversed.

required:

True

disabled:

False

hidden:

False

default:

non_specific

choices:

  • Strand non-specific: non_specific

  • Strand-specific forward: forward

  • Strand-specific reverse: reverse

  • Detect automatically: auto
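The strand rules in the assay type description can be written as a small predicate. This is an illustrative sketch only; the function name and arguments are assumptions and do not correspond to the quantification tool's internals. The `auto` choice is assumed to have been resolved to one of the other three values beforehand.

```python
def read_matches_feature(assay_type, read_strand, feature_strand, is_second_mate=False):
    """Return True if a read's strand is compatible with a feature's strand."""
    if assay_type == "non_specific":
        # Strand is ignored entirely.
        return True
    if assay_type not in ("forward", "reverse"):
        raise ValueError(f"unresolved or unknown assay type: {assay_type}")
    # Forward assay: single reads and first mates lie on the feature strand,
    # second mates lie on the opposite strand.
    expected_same = not is_second_mate
    if assay_type == "reverse":
        # Reverse assay flips both expectations.
        expected_same = not expected_same
    return (read_strand == feature_strand) == expected_same


print(read_matches_feature("forward", "+", "+"))
print(read_matches_feature("reverse", "+", "+"))
```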

cdna_index
label:

cDNA index file

type:

data:index:salmon

description:

Transcriptome index file created using the Salmon indexing tool. The cDNA (transcriptome) sequences used to create the index file must be derived from the same species as the input sequencing reads to obtain reliable analysis results.

required:

False

disabled:

False

hidden:

assay_type != ‘auto’

rrna_reference
label:

Indexed rRNA reference sequence

type:

data:index:star

description:

Reference sequence index prepared by STAR aligner indexing tool.

required:

True

disabled:

False

hidden:

False

globin_reference
label:

Indexed Globin reference sequence

type:

data:index:star

description:

Reference sequence index prepared by STAR aligner indexing tool.

required:

True

disabled:

False

hidden:

False

preprocessing.adapters
label:

Adapters

type:

list:data:seq:nucleotide

description:

FASTA file(s) with adapters.

required:

False

disabled:

False

hidden:

False

preprocessing.custom_adapter_sequences
label:

Custom adapter sequences

type:

list:basic:string

description:

Custom adapter sequences can be specified by inputting them one by one and pressing Enter after each sequence.

required:

False

disabled:

False

hidden:

False

default:

[]

preprocessing.kmer_length
label:

K-mer length [k=]

type:

basic:integer

description:

Kmer length used for finding contaminants. Contaminants shorter than kmer length will not be found. Kmer length must be at least 1.

required:

True

disabled:

False

hidden:

False

default:

23

preprocessing.min_k
label:

Minimum k-mer length at right end of reads used for trimming [mink=]

type:

basic:integer

required:

True

disabled:

preprocessing.adapters.length === 0 && preprocessing.custom_adapter_sequences.length === 0

hidden:

False

default:

11

preprocessing.hamming_distance
label:

Maximum Hamming distance for k-mers [hammingdistance=]

type:

basic:integer

description:

Hamming distance i.e. the number of mismatches allowed in the kmer.

required:

True

disabled:

False

hidden:

False

default:

1
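To illustrate how a k-mer length and a maximum Hamming distance interact, the sketch below scans a read for k-mers that are within a given number of mismatches of any k-mer of a reference (adapter) sequence. It is a naive illustration, not BBDuk's implementation; the function names and the short example sequences are assumptions, and a small `k` is used only to keep the example readable.

```python
def hamming_distance(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def find_kmer_hits(read: str, reference: str, k: int = 23, max_distance: int = 1):
    """Return read positions whose k-mer is within max_distance of a reference k-mer."""
    ref_kmers = {reference[i:i + k] for i in range(len(reference) - k + 1)}
    hits = []
    for start in range(len(read) - k + 1):
        kmer = read[start:start + k]
        if any(hamming_distance(kmer, ref) <= max_distance for ref in ref_kmers):
            hits.append(start)
    return hits


adapter = "AGATCGGAAGAGC"                # example adapter fragment
read = "ACGTACGT" + "AGATCGGTAGAGC"      # adapter copy with one mismatch
print(find_kmer_hits(read, adapter, k=8, max_distance=1))
print(find_kmer_hits(read, adapter, k=8, max_distance=0))
```

A reference shorter than `k` yields no k-mers at all, which is why contaminants shorter than the k-mer length cannot be found.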

preprocessing.maxns
label:

Max Ns after trimming [maxns=]

type:

basic:integer

description:

If non-negative, reads with more Ns than this (after trimming) will be discarded.

required:

True

disabled:

False

hidden:

False

default:

-1

preprocessing.trim_quality
label:

Average quality below which to trim region [trimq=]

type:

basic:integer

description:

Phred algorithm is used, which is more accurate than naive trimming.

required:

True

disabled:

False

hidden:

False

default:

10
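For orientation, a Phred quality score Q corresponds to a base-call error probability of 10^(-Q/10). The sketch below only converts scores to probabilities; it does not reproduce BBDuk's trimming algorithm, and the function name is an assumption.

```python
def phred_to_error_probability(q: float) -> float:
    """Convert a Phred quality score to the probability that the base call is wrong."""
    return 10 ** (-q / 10)


# The default threshold of 10 corresponds to a 10% error probability.
for q in (10, 20, 30):
    print(q, phred_to_error_probability(q))
```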

preprocessing.min_length
label:

Minimum read length [minlength=]

type:

basic:integer

description:

Reads shorter than minimum read length after trimming are discarded.

required:

True

disabled:

False

hidden:

False

default:

20
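The two post-trimming filters (maximum number of Ns and minimum read length) can be sketched as a single predicate. This is a hypothetical helper for illustration, not part of BBDuk; the defaults mirror the values listed above.

```python
def keep_read(sequence: str, min_length: int = 20, maxns: int = -1) -> bool:
    """Return True if a trimmed read passes the length and N-content filters."""
    if len(sequence) < min_length:
        return False
    # A negative maxns disables the N filter.
    if maxns >= 0 and sequence.upper().count("N") > maxns:
        return False
    return True


print(keep_read("A" * 19))              # too short
print(keep_read("ACGTN" * 5, maxns=4))  # five Ns exceed the limit of four
```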

preprocessing.quality_encoding_offset
label:

Quality encoding offset [qin=]

type:

basic:string

description:

Quality encoding offset for input FASTQ files.

required:

True

disabled:

False

hidden:

False

default:

auto

choices:

  • Sanger / Illumina 1.8+: 33

  • Illumina up to 1.3+, 1.5+: 64

  • Auto: auto
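The offset determines how quality characters in a FASTQ record map to Phred scores: the score is the character's ASCII code minus the offset. A minimal sketch, with a hypothetical function name:

```python
def decode_qualities(quality_string: str, offset: int = 33):
    """Decode a FASTQ quality string into a list of Phred scores."""
    return [ord(char) - offset for char in quality_string]


print(decode_qualities("II5+", offset=33))
print(decode_qualities("hhTJ", offset=64))
```

Decoding with the wrong offset shifts every score by 31, which is why an incorrect setting can make quality values appear invalid.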

preprocessing.ignore_bad_quality
label:

Ignore bad quality [ignorebadquality]

type:

basic:boolean

description:

Don’t crash if quality values appear to be incorrect.

required:

True

disabled:

False

hidden:

False

default:

False

alignment.unstranded
label:

The data is unstranded [–outSAMstrandField intronMotif]

type:

basic:boolean

description:

For unstranded RNA-seq data, Cufflinks/Cuffdiff require spliced alignments with XS strand attribute, which STAR will generate with –outSAMstrandField intronMotif option. As required, the XS strand attribute will be generated for all alignments that contain splice junctions. The spliced alignments that have undefined strand (i.e. containing only non-canonical unannotated junctions) will be suppressed. If you have stranded RNA-seq data, you do not need to use any specific STAR options. Instead, you need to run Cufflinks with the library option –library-type options. For example, cufflinks –library-type fr-firststrand should be used for the standard dUTP protocol, including Illumina’s stranded Tru-Seq. This option has to be used only for Cufflinks runs and not for STAR runs.

required:

True

disabled:

False

hidden:

False

default:

False

alignment.noncannonical
label:

Remove non-canonical junctions (Cufflinks compatibility)

type:

basic:boolean

description:

It is recommended to remove the non-canonical junctions for Cufflinks runs using –outFilterIntronMotifs RemoveNoncanonical.

required:

True

disabled:

False

hidden:

False

default:

False

alignment.chimeric_reads.chimeric
label:

Detect chimeric and circular alignments [–chimOutType SeparateSAMold]

type:

basic:boolean

description:

To switch on detection of chimeric (fusion) alignments (in addition to normal mapping), –chimSegmentMin should be set to a positive value. Each chimeric alignment consists of two segments. Each segment is non-chimeric on its own, but the segments are chimeric to each other (i.e. the segments belong to different chromosomes, or different strands, or are far from each other). Both segments may contain splice junctions, and one of the segments may contain portions of both mates. –chimSegmentMin parameter controls the minimum mapped length of the two segments that is allowed. For example, if you have 2x75 reads and used –chimSegmentMin 20, a chimeric alignment with 130b on one chromosome and 20b on the other will be output, while 135 + 15 won’t be.

required:

True

disabled:

False

hidden:

False

default:

False

alignment.chimeric_reads.chim_segment_min
label:

Minimum length of chimeric segment [–chimSegmentMin]

type:

basic:integer

required:

True

disabled:

!alignment.chimeric_reads.chimeric

hidden:

False

default:

20
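The 2x75 example from the chimeric detection description can be checked with a short sketch. The helper below is hypothetical and only encodes the stated rule that both segments must reach the minimum mapped length; it is not STAR's chimeric detection logic.

```python
def passes_chim_segment_min(segment_lengths, chim_segment_min=20):
    """Return True if every chimeric segment reaches the minimum mapped length."""
    # A non-positive minimum means chimeric detection is switched off.
    if chim_segment_min <= 0:
        return False
    return min(segment_lengths) >= chim_segment_min


print(passes_chim_segment_min((130, 20)))  # reported
print(passes_chim_segment_min((135, 15)))  # not reported
```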

alignment.transcript_output.quant_mode
label:

Output in transcript coordinates [–quantMode]

type:

basic:boolean

description:

With –quantMode TranscriptomeSAM option STAR will output alignments translated into transcript coordinates in the Aligned.toTranscriptome.out.bam file (in addition to alignments in genomic coordinates in Aligned.*.sam/bam files). These transcriptomic alignments can be used with various transcript quantification software that require reads to be mapped to transcriptome, such as RSEM or eXpress.

required:

True

disabled:

False

hidden:

False

default:

False

alignment.transcript_output.single_end
label:

Allow soft-clipping and indels [–quantTranscriptomeBan Singleend]

type:

basic:boolean

description:

By default, the output satisfies RSEM requirements: soft-clipping or indels are not allowed. Use –quantTranscriptomeBan Singleend to allow insertions, deletions and soft-clips in the transcriptomic alignments, which can be used by some expression quantification tools (e.g. eXpress).

required:

True

disabled:

!t_coordinates.quant_mode

hidden:

False

default:

False

alignment.filtering_options.out_filter_type
label:

Type of filtering [–outFilterType]

type:

basic:string

description:

Normal: standard filtering using only current alignment; BySJout: keep only those reads that contain junctions that passed filtering into SJ.out.tab.

required:

True

disabled:

False

hidden:

False

default:

Normal

choices:

  • Normal: Normal

  • BySJout: BySJout

alignment.filtering_options.out_multimap_max
label:

Maximum number of loci [–outFilterMultimapNmax]

type:

basic:integer

description:

Maximum number of loci the read is allowed to map to. Alignments (all of them) will be output only if the read maps to no more loci than this value. Otherwise no alignments will be output, and the read will be counted as ‘mapped to too many loci’ (default: 10).

required:

False

disabled:

False

hidden:

False

alignment.filtering_options.out_mismatch_max
label:

Maximum number of mismatches [–outFilterMismatchNmax]

type:

basic:integer

description:

Alignment will be output only if it has no more mismatches than this value (default: 10). A large number (e.g. 999) switches off this filter.

required:

False

disabled:

False

hidden:

False

alignment.filtering_options.out_mismatch_nl_max
label:

Maximum no. of mismatches (map length) [–outFilterMismatchNoverLmax]

type:

basic:decimal

description:

Alignment will be output only if its ratio of mismatches to *mapped* length is less than or equal to this value (default: 0.3). The value should be between 0.0 and 1.0.

required:

False

disabled:

False

hidden:

False

alignment.filtering_options.out_score_min
label:

Minimum alignment score [–outFilterScoreMin]

type:

basic:integer

description:

Alignment will be output only if its score is higher than or equal to this value (default: 0).

required:

False

disabled:

False

hidden:

False

alignment.filtering_options.out_mismatch_nrl_max
label:

Maximum no. of mismatches (read length) [–outFilterMismatchNoverReadLmax]

type:

basic:decimal

description:

Alignment will be output only if its ratio of mismatches to *read* length is less than or equal to this value (default: 1.0). Using 0.04 for 2x100bp, the max number of mismatches is calculated as 0.04*200=8 for the paired read. The value should be between 0.0 and 1.0.

required:

False

disabled:

False

hidden:

False
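The worked example in the description (0.04 for 2x100 bp reads) can be reproduced as follows. This is a sketch of the ratio test as described, with hypothetical function and argument names; it is not taken from STAR's source.

```python
def passes_read_length_mismatch_filter(mismatches: int, read_length: int, ratio: float) -> bool:
    """Return True if mismatches / read_length does not exceed the allowed ratio.

    For paired-end reads, read_length is the combined length of both mates.
    """
    return mismatches <= ratio * read_length


paired_length = 2 * 100  # 2x100 bp
print(passes_read_length_mismatch_filter(8, paired_length, 0.04))
print(passes_read_length_mismatch_filter(9, paired_length, 0.04))
```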

alignment.alignment_options.align_overhang_min
label:

Minimum overhang [–alignSJoverhangMin]

type:

basic:integer

description:

Minimum overhang (i.e. block size) for spliced alignments (default: 5).

required:

False

disabled:

False

hidden:

False

alignment.alignment_options.align_sjdb_overhang_min
label:

Minimum overhang (sjdb) [–alignSJDBoverhangMin]

type:

basic:integer

description:

Minimum overhang (i.e. block size) for annotated (sjdb) spliced alignments (default: 3).

required:

False

disabled:

False

hidden:

False

alignment.alignment_options.align_intron_size_min
label:

Minimum intron size [–alignIntronMin]

type:

basic:integer

description:

Minimum intron size: the genomic gap is considered an intron if its length >= alignIntronMin, otherwise it is considered Deletion (default: 21).

required:

False

disabled:

False

hidden:

False

alignment.alignment_options.align_intron_size_max
label:

Maximum intron size [–alignIntronMax]

type:

basic:integer

description:

Maximum intron size. If 0, the maximum intron size is determined by (2^winBinNbits) * winAnchorDistNbins (default: 0).

required:

False

disabled:

False

hidden:

False
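The two intron size settings can be illustrated with a short sketch. The function names are hypothetical, and the values 16 and 9 for winBinNbits and winAnchorDistNbins are assumed here as STAR's documented defaults; check them against the STAR version in use.

```python
def implied_max_intron(win_bin_nbits: int = 16, win_anchor_dist_nbins: int = 9) -> int:
    """Upper bound used when the maximum intron size is left at 0."""
    return (2 ** win_bin_nbits) * win_anchor_dist_nbins


def classify_gap(gap_length: int, align_intron_min: int = 21) -> str:
    """Classify a genomic gap as an intron or a deletion by its length."""
    return "intron" if gap_length >= align_intron_min else "deletion"


print(implied_max_intron())
print(classify_gap(20), classify_gap(21))
```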

alignment.alignment_options.align_gap_max
label:

Maximum gap between mates [–alignMatesGapMax]

type:

basic:integer

description:

Maximum gap between two mates. If 0, the maximum intron gap is determined by (2^winBinNbits) * winAnchorDistNbins (default: 0).

required:

False

disabled:

False

hidden:

False

alignment.alignment_options.align_end_alignment
label:

Read ends alignment [–alignEndsType]

type:

basic:string

description:

Type of read ends alignment (default: Local). Local: standard local alignment with soft-clipping allowed. EndToEnd: force end-to-end read alignment, do not soft-clip. Extend5pOfRead1: fully extend only the 5’ end of read1; all other ends use local alignment. Extend5pOfReads12: fully extend only the 5’ ends of both read1 and read2; all other ends use local alignment.

required:

True

disabled:

False

hidden:

False

default:

Local

choices:

  • Local: Local

  • EndToEnd: EndToEnd

  • Extend5pOfRead1: Extend5pOfRead1

  • Extend5pOfReads12: Extend5pOfReads12

alignment.output_options.out_unmapped
label:

Output unmapped reads (SAM) [–outSAMunmapped Within]

type:

basic:boolean

description:

Output of unmapped reads in the SAM format.

required:

True

disabled:

False

hidden:

False

default:

False

alignment.output_options.out_sam_attributes
label:

Desired SAM attributes [–outSAMattributes]

type:

basic:string

description:

A string of desired SAM attributes, in the order desired for the output SAM.

required:

True

disabled:

False

hidden:

False

default:

Standard

choices:

  • Standard: Standard

  • All: All

  • NH HI NM MD: NH HI NM MD

  • None: None

alignment.output_options.out_rg_line
label:

SAM/BAM read group line [–outSAMattrRGline]

type:

basic:string

description:

The first word contains the read group identifier and must start with ID:, e.g. –outSAMattrRGline ID:xxx CN:yy "DS:z z z". xxx will be added as the RG tag to each output alignment. Any spaces in the tag values have to be double quoted. Comma-separated RG lines correspond to different (comma-separated) input files in –readFilesIn. Commas have to be surrounded by spaces, e.g. –outSAMattrRGline ID:xxx , ID:zzz "DS:z z" , ID:yyy DS:yyyy.

required:

False

disabled:

False

hidden:

False

quantification.n_reads
label:

Number of reads in subsampled alignment file

type:

basic:integer

description:

Alignment (.bam) file subsample size. Increase the number of reads to make automatic detection more reliable. Decrease the number of reads to make automatic detection run faster.

required:

True

disabled:

False

hidden:

assay_type != ‘auto’

default:

5000000

quantification.feature_class
label:

Feature class [-t]

type:

basic:string

description:

Feature class (3rd column in GTF/GFF3 file) to be used. All other features will be ignored.

required:

True

disabled:

False

hidden:

False

default:

exon

quantification.feature_type
label:

Feature type

type:

basic:string

description:

The type of feature the quantification program summarizes over (e.g. gene or transcript-level analysis). The value of this parameter needs to be chosen in line with ‘ID attribute’ below.

required:

True

disabled:

False

hidden:

False

default:

gene

choices:

  • gene: gene

  • transcript: transcript

quantification.id_attribute
label:

ID attribute [-g]

type:

basic:string

description:

GTF/GFF3 attribute to be used as feature ID. Several GTF/GFF3 lines with the same feature ID will be considered as parts of the same feature. The feature ID is used to identify the counts in the output table. In GTF files this is usually ‘gene_id’, in GFF3 files this is often ‘ID’, and ‘transcript_id’ is frequently a valid choice for both annotation formats.

required:

True

disabled:

False

hidden:

False

default:

gene_id

choices:

  • gene_id: gene_id

  • transcript_id: transcript_id

  • ID: ID

  • geneid: geneid
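To show how the feature class and ID attribute relate to an annotation file, the sketch below extracts a feature ID from a single GTF line. It is a simplified illustration with hypothetical function names and a made-up annotation line; it handles GTF-style `key "value";` attributes only (GFF3 uses `key=value`) and is not how featureCounts parses annotations.

```python
def parse_gtf_attributes(column9: str) -> dict:
    """Parse the GTF attribute column into a dictionary."""
    attributes = {}
    for item in column9.strip().split(";"):
        item = item.strip()
        if not item:
            continue
        key, _, value = item.partition(" ")
        attributes[key] = value.strip().strip('"')
    return attributes


def feature_id(gtf_line: str, feature_class: str = "exon", id_attribute: str = "gene_id"):
    """Return the feature ID of a GTF line, or None if the line is ignored."""
    fields = gtf_line.rstrip("\n").split("\t")
    # The feature class is the 3rd column; lines of other classes are skipped.
    if len(fields) != 9 or fields[2] != feature_class:
        return None
    return parse_gtf_attributes(fields[8]).get(id_attribute)


line = 'chr1\texample\texon\t100\t200\t.\t+\t.\tgene_id "G1"; transcript_id "T1";'
print(feature_id(line))
print(feature_id(line, id_attribute="transcript_id"))
```

Lines that share the same returned ID would be treated as parts of the same feature when counts are summarized.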

quantification.by_read_group
label:

Assign reads by read group

type:

basic:boolean

description:

RG tag is required to be present in the input BAM files.

required:

True

disabled:

False

hidden:

False

default:

True

downsampling.n_reads
label:

Number of reads

type:

basic:integer

description:

Number of reads to include in subsampling.

required:

True

disabled:

False

hidden:

False

default:

1000000

downsampling.advanced.seed
label:

Seed [-s]

type:

basic:integer

description:

Using the same random seed makes reads subsampling more reproducible in different environments.

required:

True

disabled:

False

hidden:

False

default:

11

downsampling.advanced.fraction
label:

Fraction of reads used

type:

basic:decimal

description:

Use the fraction of reads [0.0 - 1.0] from the original input file instead of the absolute number of reads. If set, this will override the ‘Number of reads’ input parameter.

required:

False

disabled:

False

hidden:

False
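The precedence between the two down-sampling inputs can be sketched as follows. The helper is hypothetical and only mirrors the documented rule that a fraction, when set, overrides the absolute number of reads; the actual subsampling is done by Seqtk.

```python
def reads_to_sample(total_reads: int, n_reads: int = 1_000_000, fraction=None) -> int:
    """Return how many reads would be kept after down-sampling."""
    if fraction is not None:
        if not 0.0 <= fraction <= 1.0:
            raise ValueError("fraction must be within [0.0, 1.0]")
        # A fraction, when given, takes precedence over n_reads.
        return int(total_reads * fraction)
    return min(n_reads, total_reads)


print(reads_to_sample(4_000_000))
print(reads_to_sample(4_000_000, fraction=0.25))
```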

downsampling.advanced.two_pass
label:

2-pass mode [-2]

type:

basic:boolean

description:

Enable two-pass mode when down-sampling. Two-pass mode is twice as slow but uses much less memory.

required:

True

disabled:

False

hidden:

False

default:

False

Output results

BBDuk - Salmon - QC

data:workflow:rnaseq:salmon:workflow-bbduk-salmon-qc (data:reads:fastq  reads, data:index:salmon  salmon_index, data:index:star  genome, data:annotation  annotation, data:index:star  rrna_reference, data:index:star  globin_reference, list:data:seq:nucleotide  adapters, list:basic:string  custom_adapter_sequences, basic:integer  kmer_length, basic:integer  min_k, basic:integer  hamming_distance, basic:integer  maxns, basic:integer  trim_quality, basic:integer  min_length, basic:string  quality_encoding_offset, basic:boolean  ignore_bad_quality, basic:boolean  seq_bias, basic:boolean  gc_bias, basic:decimal  consensus_slack, basic:decimal  min_score_fraction, basic:integer  range_factorization_bins, basic:integer  min_assigned_frag, basic:integer  num_bootstraps, basic:integer  num_gibbs_samples, basic:integer  n_reads, basic:integer  seed, basic:decimal  fraction, basic:boolean  two_pass)[Source: v4.3.1]

Alignment-free RNA-Seq pipeline. The Salmon tool and the tximport package are used in the quantification step to produce gene-level abundance estimates. The rRNA and globin-sequence contamination rate in the sample is determined using the STAR aligner. Quality-trimmed reads are down-sampled (using the Seqtk tool) and aligned to the genome, rRNA and globin reference sequences. The rRNA and globin-sequence alignment rates indicate the percentage of the reads in the sample that are of rRNA and globin origin, respectively. Alignment of down-sampled data to a whole genome reference sequence is used to produce an alignment file suitable for Samtools and QoRTs QC analysis. Per-sample analysis results and QC data are summarized by the MultiQC tool.

Input arguments

reads
label:

Select sample(s) (FASTQ)

type:

data:reads:fastq

description:

Reads in FASTQ file, single or paired end.

required:

True

disabled:

False

hidden:

False

salmon_index
label:

Salmon index

type:

data:index:salmon

description:

Transcriptome index file created using the Salmon indexing tool.

required:

True

disabled:

False

hidden:

False

genome
label:

Indexed reference genome

type:

data:index:star

description:

Genome index prepared by STAR aligner indexing tool.

required:

True

disabled:

False

hidden:

False

annotation
label:

Annotation

type:

data:annotation

description:

GTF and GFF3 annotation formats are supported.

required:

True

disabled:

False

hidden:

False

rrna_reference
label:

Indexed rRNA reference sequence

type:

data:index:star

description:

Reference sequence index prepared by STAR aligner indexing tool.

required:

True

disabled:

False

hidden:

False

globin_reference
label:

Indexed Globin reference sequence

type:

data:index:star

description:

Reference sequence index prepared by STAR aligner indexing tool.

required:

True

disabled:

False

hidden:

False

preprocessing.adapters
label:

Adapters

type:

list:data:seq:nucleotide

description:

FASTA file(s) with adapters.

required:

False

disabled:

False

hidden:

False

preprocessing.custom_adapter_sequences
label:

Custom adapter sequences

type:

list:basic:string

description:

Custom adapter sequences can be specified by inputting them one by one and pressing Enter after each sequence.

required:

False

disabled:

False

hidden:

False

default:

[]

preprocessing.kmer_length
label:

K-mer length

type:

basic:integer

description:

K-mer length must be smaller than or equal to the length of adapters.

required:

True

disabled:

False

hidden:

False

default:

23

preprocessing.min_k
label:

Minimum k-mer length at right end of reads used for trimming

type:

basic:integer

required:

True

disabled:

preprocessing.adapters.length === 0 && preprocessing.custom_adapter_sequences.length === 0

hidden:

False

default:

11

preprocessing.hamming_distance
label:

Maximum Hamming distance for k-mers

type:

basic:integer

required:

True

disabled:

False

hidden:

False

default:

1

preprocessing.maxns
label:

Max Ns after trimming

type:

basic:integer

description:

If non-negative, reads with more Ns than this (after trimming) will be discarded.

required:

True

disabled:

False

hidden:

False

default:

-1

preprocessing.trim_quality
label:

Quality below which to trim reads from the right end

type:

basic:integer

description:

Phred algorithm is used, which is more accurate than naive trimming.

required:

True

disabled:

False

hidden:

False

default:

10

preprocessing.min_length
label:

Minimum read length

type:

basic:integer

description:

Reads shorter than minimum read length after trimming are discarded.

required:

True

disabled:

False

hidden:

False

default:

20

preprocessing.quality_encoding_offset
label:

Quality encoding offset

type:

basic:string

description:

Quality encoding offset for input FASTQ files.

required:

True

disabled:

False

hidden:

False

default:

auto

choices:

  • Sanger / Illumina 1.8+: 33

  • Illumina up to 1.3+, 1.5+: 64

  • Auto: auto

preprocessing.ignore_bad_quality
label:

Ignore bad quality

type:

basic:boolean

description:

Don’t crash if quality values appear to be incorrect.

required:

True

disabled:

False

hidden:

False

default:

False

quantification.seq_bias
label:

Perform sequence-specific bias correction

type:

basic:boolean

description:

Perform sequence-specific bias correction.

required:

True

disabled:

False

hidden:

False

default:

True

quantification.gc_bias
label:

Perform fragment GC bias correction

type:

basic:boolean

description:

Perform fragment GC bias correction. If single-end reads are selected as input in this workflow, it is recommended that you set this option to False. If you selected paired-end reads as input in this workflow, it is recommended that you set this option to True.

required:

False

disabled:

False

hidden:

False

quantification.consensus_slack
label:

Consensus slack

type:

basic:decimal

description:

The amount of slack allowed in the quasi-mapping consensus mechanism. Normally, a transcript must cover all hits to be considered for mapping. If this is set to a fraction, X, greater than 0 (and in [0,1)), then a transcript can fail to cover up to (100 * X)% of the hits before it is discounted as a mapping candidate. The default value of this option is 0.2 in selective alignment mode and 0 otherwise.

required:

False

disabled:

False

hidden:

False
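The consensus slack rule can be read as a threshold on the fraction of hits a transcript fails to cover. The sketch below encodes that reading with hypothetical names; it is an interpretation of the description above, not Salmon's implementation.

```python
def is_mapping_candidate(hits_covered: int, total_hits: int, consensus_slack: float = 0.0) -> bool:
    """Return True if a transcript misses no more than the allowed fraction of hits."""
    if not 0.0 <= consensus_slack < 1.0:
        raise ValueError("consensus_slack must be in [0, 1)")
    if total_hits <= 0:
        raise ValueError("total_hits must be positive")
    missed_fraction = (total_hits - hits_covered) / total_hits
    return missed_fraction <= consensus_slack


print(is_mapping_candidate(8, 10, consensus_slack=0.2))  # misses 20% of hits
print(is_mapping_candidate(7, 10, consensus_slack=0.2))  # misses 30% of hits
```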

quantification.min_score_fraction
label:

Minimum alignment score fraction

type:

basic:decimal

description:

The fraction of the optimal possible alignment score that a mapping must achieve in order to be considered valid - should be in (0,1].

required:

True

disabled:

False

hidden:

False

default:

0.65

quantification.range_factorization_bins
label:

Range factorization bins

type:

basic:integer

description:

Factorizes the likelihood used in quantification by adopting a new notion of equivalence classes based on the conditional probabilities with which fragments are generated from different transcripts. This is a more fine-grained factorization than the normal rich equivalence classes. The default value (4) corresponds to the default used in Zakeri et al. 2017 and larger values imply a more fine-grained factorization. If range factorization is enabled, a common value to select for this parameter is 4. A value of 0 signifies the use of basic rich equivalence classes.

required:

True

disabled:

False

hidden:

False

default:

4

quantification.min_assigned_frag
label:

Minimum number of assigned fragments

type:

basic:integer

description:

The minimum number of fragments that must be assigned to the transcriptome for quantification to proceed.

required:

True

disabled:

False

hidden:

False

default:

10

quantification.num_bootstraps
label:

–numBootstraps

type:

basic:integer

description:

Salmon has the ability to optionally compute bootstrapped abundance estimates. This is done by resampling (with replacement) from the counts assigned to the fragment equivalence classes, and then re-running the optimization procedure, either the EM or VBEM, for each such sample. The values of these different bootstraps allow us to assess technical variance in the main abundance estimates we produce. Such estimates can be useful for downstream (e.g. differential expression) tools that can make use of such uncertainty estimates. This option takes a positive integer that dictates the number of bootstrap samples to compute. The more samples computed, the better the estimates of variance, but the more computation (and time) required.

required:

False

disabled:

quantification.num_gibbs_samples

hidden:

False

quantification.num_gibbs_samples
label:

–numGibbsSamples

type:

basic:integer

description:

Just as with the bootstrap procedure above, this option produces samples that allow us to estimate the variance in abundance estimates. However, in this case the samples are generated using posterior Gibbs sampling over the fragment equivalence classes rather than bootstrapping. We are currently analyzing these different approaches to assess the potential trade-offs in time / accuracy. The –numBootstraps and –numGibbsSamples options are mutually exclusive (i.e. in a given run, you must set at most one of these options to a positive integer.)

required:

False

disabled:

quantification.num_bootstraps

hidden:

False
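Since the two sampling options are mutually exclusive, a caller may want to validate them before launching the workflow. The check below is a hypothetical helper, not part of the workflow or of Salmon.

```python
def validate_sampling_options(num_bootstraps=None, num_gibbs_samples=None) -> None:
    """Raise ValueError if both sampling options are set to a positive integer."""
    positive = [
        value for value in (num_bootstraps, num_gibbs_samples)
        if value is not None and value > 0
    ]
    if len(positive) > 1:
        raise ValueError("set at most one of num_bootstraps and num_gibbs_samples")


validate_sampling_options(num_bootstraps=100)     # accepted
validate_sampling_options(num_gibbs_samples=50)   # accepted
```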

downsampling.n_reads
label:

Number of reads

type:

basic:integer

description:

Number of reads to include in subsampling.

required:

True

disabled:

False

hidden:

False

default:

10000000

downsampling.seed
label:

Seed

type:

basic:integer

description:

Using the same random seed makes reads subsampling reproducible in different environments.

required:

True

disabled:

False

hidden:

False

default:

11

downsampling.fraction
label:

Fraction of reads

type:

basic:decimal

description:

Use the fraction of reads [0.0 - 1.0] from the original input file instead of the absolute number of reads. If set, this will override the ‘Number of reads’ input parameter.

required:

False

disabled:

False

hidden:

False

downsampling.two_pass
label:

2-pass mode

type:

basic:boolean

description:

Enable two-pass mode when down-sampling. Two-pass mode is twice as slow but with much reduced memory usage.

required:

True

disabled:

False

hidden:

False

default:

False

Output results

BED file

data:bedupload-bed (basic:file  src, basic:string  species, basic:string  build)[Source: v1.5.0]

Import a BED file (.bed), a tab-delimited text file that defines a feature track. It can have any file extension, but .bed is recommended. The BED file format is described on the [UCSC Genome Bioinformatics web site](http://genome.ucsc.edu/FAQ/FAQformat#format1).

Input arguments

src
label:

BED file

type:

basic:file

description:

Upload BED file annotation track. The first three required BED fields are chrom, chromStart and chromEnd.

required:

True

validate_regex:

\.(bed|narrowPeak)$
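The effect of the validation pattern can be checked with Python's `re` module. This sketch assumes the pattern is applied as a search against the file name; the function name and the file names are examples only.

```python
import re

# Pattern copied from the validate_regex field above.
BED_NAME = re.compile(r"\.(bed|narrowPeak)$")


def is_valid_bed_name(filename: str) -> bool:
    """Return True if the file name ends in .bed or .narrowPeak."""
    return BED_NAME.search(filename) is not None


for name in ("peaks.bed", "peaks.narrowPeak", "peaks.bed.gz", "peaks.BED"):
    print(name, is_valid_bed_name(name))
```

The pattern is case-sensitive and anchored to the end of the name, so compressed files and upper-case extensions do not match.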

species
label:

Species

type:

basic:string

description:

Species Latin name.

choices:

  • Homo sapiens: Homo sapiens

  • Mus musculus: Mus musculus

  • Rattus norvegicus: Rattus norvegicus

  • Dictyostelium discoideum: Dictyostelium discoideum

  • Odocoileus virginianus texanus: Odocoileus virginianus texanus

  • Solanum tuberosum: Solanum tuberosum

build
label:

Genome build

type:

basic:string

Output results

bed
label:

BED file

type:

basic:file

bed_jbrowse
label:

Bgzip-compressed BED file for JBrowse

type:

basic:file

tbi_jbrowse
label:

BED file index for JBrowse

type:

basic:file

species
label:

Species

type:

basic:string

build
label:

Build

type:

basic:string

BEDPE file

data:bedpe:upload-bedpe (basic:file  src, basic:string  species, basic:string  build)[Source: v1.3.1]

Upload BEDPE files.

Input arguments

src
label:

Select BEDPE file to upload

type:

basic:file

required:

True

disabled:

False

hidden:

False

species
label:

Species

type:

basic:string

required:

True

disabled:

False

hidden:

False

choices:

  • Homo sapiens: Homo sapiens

  • Mus musculus: Mus musculus

  • Rattus norvegicus: Rattus norvegicus

  • Dictyostelium discoideum: Dictyostelium discoideum

  • Odocoileus virginianus texanus: Odocoileus virginianus texanus

  • Solanum tuberosum: Solanum tuberosum

build
label:

Build

type:

basic:string

required:

True

disabled:

False

hidden:

False

Output results

bedpe
label:

BEDPE file

type:

basic:file

required:

True

disabled:

False

hidden:

False

species
label:

Species

type:

basic:string

required:

True

disabled:

False

hidden:

False

build
label:

Build

type:

basic:string

required:

True

disabled:

False

hidden:

False

BWA ALN

data:alignment:bam:bwaalnalignment-bwa-aln (data:index:bwa  genome, data:reads:fastq  reads, basic:integer  q, basic:boolean  use_edit, basic:integer  edit_value, basic:decimal  fraction, basic:boolean  seeds, basic:integer  seed_length, basic:integer  seed_dist)[Source: v2.6.2]

Read aligner for mapping low-divergent sequences against a large reference genome. Designed for Illumina sequence reads up to 100bp.

Input arguments

genome
label:

Reference genome

type:

data:index:bwa

reads
label:

Reads

type:

data:reads:fastq

q
label:

Quality threshold

type:

basic:integer

description:

Parameter for dynamic read trimming.

default:

0

use_edit
label:

Use maximum edit distance (excludes fraction of missing alignments)

type:

basic:boolean

default:

False

edit_value
label:

Maximum edit distance

type:

basic:integer

hidden:

!use_edit

default:

5

fraction
label:

Fraction of missing alignments

type:

basic:decimal

description:

The fraction of missing alignments given 2% uniform base error rate. The maximum edit distance is automatically chosen for different read lengths.

hidden:

use_edit

default:

0.04

seeds
label:

Use seeds

type:

basic:boolean

default:

False

seed_length
label:

Seed length

type:

basic:integer

description:

Take the first X bases of the read as the seed. If X is larger than the query sequence, seeding will be disabled. For long reads, this value typically ranges from 25 to 35 when the seed maximum edit distance is 2.

hidden:

!seeds

default:

35

seed_dist
label:

Seed maximum edit distance

type:

basic:integer

hidden:

!seeds

default:

2

Output results

bam
label:

Alignment file

type:

basic:file

description:

Position sorted alignment

bai
label:

Index BAI

type:

basic:file

unmapped
label:

Unmapped reads

type:

basic:file

required:

False

stats
label:

Statistics

type:

basic:file

species
label:

Species

type:

basic:string

build
label:

Build

type:

basic:string

BWA MEM

data:alignment:bam:bwamemalignment-bwa-mem (data:index:bwa  genome, data:reads:fastq  reads, basic:integer  seed_l, basic:integer  band_w, basic:decimal  re_seeding, basic:boolean  m, basic:integer  match, basic:integer  missmatch, basic:integer  gap_o, basic:integer  gap_e, basic:integer  clipping, basic:integer  unpaired_p, basic:boolean  report_all, basic:integer  report_tr)[Source: v3.6.0]

BWA MEM is a read aligner for mapping low-divergent sequences against a large reference genome. It is designed for longer sequences ranging from 70bp to 1Mbp. The algorithm works by seeding alignments with maximal exact matches (MEMs) and then extending seeds with the affine-gap Smith-Waterman algorithm (SW). See [here](http://bio-bwa.sourceforge.net/) for more information.

Input arguments

genome
label:

Reference genome

type:

data:index:bwa

reads
label:

Reads

type:

data:reads:fastq

seed_l
label:

Minimum seed length

type:

basic:integer

description:

Minimum seed length. Matches shorter than minimum seed length will be missed. The alignment speed is usually insensitive to this value unless it significantly deviates from 20.

default:

19

band_w
label:

Band width

type:

basic:integer

description:

Gaps longer than this will not be found.

default:

100

re_seeding
label:

Re-seeding factor

type:

basic:decimal

description:

Trigger re-seeding for a MEM longer than minSeedLen*FACTOR. This is a key heuristic parameter for tuning the performance. Larger value yields fewer seeds, which leads to faster alignment speed but lower accuracy.

default:

1.5
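
As a rough illustration of the re-seeding rule above, the following sketch computes the MEM length beyond which re-seeding would be triggered with the default values. The helper names `reseed_threshold` and `needs_reseeding` are hypothetical and are not part of BWA or of this process.

```python
def reseed_threshold(seed_l: int = 19, re_seeding: float = 1.5) -> float:
    """Return minSeedLen * FACTOR, the MEM length above which re-seeding is triggered."""
    return seed_l * re_seeding


def needs_reseeding(mem_length: int, seed_l: int = 19, re_seeding: float = 1.5) -> bool:
    """True when a MEM is longer than minSeedLen * FACTOR."""
    return mem_length > reseed_threshold(seed_l, re_seeding)


print(reseed_threshold())   # 28.5
print(needs_reseeding(28))  # False
print(needs_reseeding(29))  # True
```

With the defaults, only MEMs of 29 bp or longer would be re-seeded; raising the factor raises that length, which is why fewer seeds are produced.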

m
label:

Mark shorter split hits as secondary

type:

basic:boolean

description:

Mark shorter split hits as secondary (for Picard compatibility)

default:

False

scoring.match
label:

Score of a match

type:

basic:integer

default:

1

scoring.missmatch
label:

Mismatch penalty

type:

basic:integer

default:

4

scoring.gap_o
label:

Gap open penalty

type:

basic:integer

default:

6

scoring.gap_e
label:

Gap extension penalty

type:

basic:integer

default:

1
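
The four scoring inputs above can be combined into an affine-gap alignment score. The sketch below assumes the BWA convention that a gap of length k costs gap_o + k * gap_e; the function `alignment_score` is a hypothetical helper for illustration, not code from this process.

```python
def alignment_score(matches, mismatches, gap_lengths,
                    match=1, missmatch=4, gap_o=6, gap_e=1):
    """Score an alignment from its match/mismatch counts and gap lengths.

    Parameter names follow the process inputs (including the
    'missmatch' spelling used by the process definition).
    """
    score = matches * match - mismatches * missmatch
    for k in gap_lengths:
        # Assumed affine gap cost: open penalty plus k extension penalties.
        score -= gap_o + k * gap_e
    return score


# 95 matches, 2 mismatches, one 3-bp gap: 95 - 8 - (6 + 3) = 78
print(alignment_score(95, 2, [3]))  # 78
```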

scoring.clipping
label:

Clipping penalty

type:

basic:integer

description:

Clipping is applied if final alignment score is smaller than (best score reaching the end of query) - (Clipping penalty)

default:

5

scoring.unpaired_p
label:

Penalty for an unpaired read pair

type:

basic:integer

description:

Affinity to force pair. Score: scoreRead1+scoreRead2-Penalty

default:

9

reporting.report_all
label:

Report all found alignments

type:

basic:boolean

description:

Output all found alignments for single-end or unpaired paired-end reads. These alignments will be flagged as secondary alignments.

default:

False

reporting.report_tr
label:

Report threshold score

type:

basic:integer

description:

Don’t output alignments with a score lower than the defined number. This option only affects output.

default:

30
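
For orientation, the inputs above correspond to standard `bwa mem` command-line options. The sketch below assembles an argument list from a dictionary of inputs. How this process actually invokes BWA is not shown in this document, so the mapping tables and the helper `build_bwa_mem_args` are assumptions for illustration only.

```python
# Assumed mapping from process input names to bwa mem options.
VALUE_FLAGS = {
    "seed_l": "-k",       # minimum seed length
    "band_w": "-w",       # band width
    "re_seeding": "-r",   # re-seeding factor
    "match": "-A",        # score of a match
    "missmatch": "-B",    # mismatch penalty
    "gap_o": "-O",        # gap open penalty
    "gap_e": "-E",        # gap extension penalty
    "clipping": "-L",     # clipping penalty
    "unpaired_p": "-U",   # penalty for an unpaired read pair
    "report_tr": "-T",    # report threshold score
}
BOOLEAN_FLAGS = {"m": "-M", "report_all": "-a"}


def build_bwa_mem_args(inputs):
    """Translate a dict of process inputs into a bwa mem argument list."""
    args = ["bwa", "mem"]
    for name, flag in VALUE_FLAGS.items():
        if name in inputs:
            args += [flag, str(inputs[name])]
    for name, flag in BOOLEAN_FLAGS.items():
        if inputs.get(name):
            args.append(flag)
    return args


print(build_bwa_mem_args({"seed_l": 19, "report_tr": 30, "m": True}))
# ['bwa', 'mem', '-k', '19', '-T', '30', '-M']
```

The reference index and read files would still need to be appended to the returned list before running the command.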

Output results

bam
label:

Alignment file

type:

basic:file

description:

Position sorted alignment

bai
label:

Index BAI

type:

basic:file

unmapped
label:

Unmapped reads

type:

basic:file

required:

False

stats
label:

Statistics

type:

basic:file

species
label:

Species

type:

basic:string

build
label:

Build

type:

basic:string

BWA MEM2

data:alignment:bam:bwamem2alignment-bwa-mem2 (data:index:bwamem2  genome, data:reads:fastq  reads, basic:integer  seed_l, basic:integer  band_w, basic:decimal  re_seeding, basic:boolean  m, basic:integer  match, basic:integer  missmatch, basic:integer  gap_o, basic:integer  gap_e, basic:integer  clipping, basic:integer  unpaired_p, basic:boolean  report_all, basic:integer  report_tr)[Source: v1.3.0]

Bwa-mem2 is the next version of the bwa-mem algorithm in bwa. It produces alignments identical to bwa and is ~1.3-3.1x faster, depending on the use case, dataset, and machine. See [here](https://github.com/bwa-mem2/bwa-mem2) for more information.

Input arguments

genome
label:

Reference genome

type:

data:index:bwamem2

reads
label:

Reads

type:

data:reads:fastq

seed_l
label:

Minimum seed length

type:

basic:integer

description:

Minimum seed length. Matches shorter than minimum seed length will be missed. The alignment speed is usually insensitive to this value unless it significantly deviates from 20.

default:

19

band_w
label:

Band width

type:

basic:integer

description:

Gaps longer than this will not be found.

default:

100

re_seeding
label:

Re-seeding factor

type:

basic:decimal

description:

Trigger re-seeding for a MEM longer than minSeedLen*FACTOR. This is a key heuristic parameter for tuning the performance. Larger value yields fewer seeds, which leads to faster alignment speed but lower accuracy.

default:

1.5

m
label:

Mark shorter split hits as secondary

type:

basic:boolean

description:

Mark shorter split hits as secondary (for Picard compatibility)

default:

False

scoring.match
label:

Score of a match

type:

basic:integer

default:

1

scoring.missmatch
label:

Mismatch penalty

type:

basic:integer

default:

4

scoring.gap_o
label:

Gap open penalty

type:

basic:integer

default:

6

scoring.gap_e
label:

Gap extension penalty

type:

basic:integer

default:

1

scoring.clipping
label:

Clipping penalty

type:

basic:integer

description:

Clipping is applied if final alignment score is smaller than (best score reaching the end of query) - (Clipping penalty)

default:

5

scoring.unpaired_p
label:

Penalty for an unpaired read pair

type:

basic:integer

description:

Affinity to force pair. Score: scoreRead1+scoreRead2-Penalty

default:

9
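
The scoring formula in the description above can be written out directly. This is a minimal sketch of the stated formula only; the function name `unpaired_score` is hypothetical, and the way the aligner uses this value internally is not described in this document.

```python
def unpaired_score(score_read1, score_read2, unpaired_p=9):
    """Score of a read pair treated as unpaired: scoreRead1 + scoreRead2 - Penalty."""
    return score_read1 + score_read2 - unpaired_p


print(unpaired_score(100, 95))  # 186
```

A larger penalty lowers the score of the unpaired interpretation, which makes pairing the two reads more attractive.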

reporting.report_all
label:

Report all found alignments

type:

basic:boolean

description:

Output all found alignments for single-end or unpaired paired-end reads. These alignments will be flagged as secondary alignments.

default:

False

reporting.report_tr
label:

Report threshold score

type:

basic:integer

description:

Don’t output alignments with a score lower than the defined number. This option only affects output.

default:

30

Output results

bam
label:

Alignment file

type:

basic:file

description:

Position sorted alignment

bai
label:

Index BAI

type:

basic:file

unmapped
label:

Unmapped reads

type:

basic:file

required:

False

stats
label:

Statistics

type:

basic:file

species
label:

Species

type:

basic:string

build
label:

Build

type:

basic:string

BWA SW

data:alignment:bam:bwaswalignment-bwa-sw (data:index:bwa  genome, data:reads:fastq  reads, basic:integer  match, basic:integer  missmatch, basic:integer  gap_o, basic:integer  gap_e)[Source: v2.5.2]

Read aligner for mapping low-divergent sequences against a large reference genome. It is designed for longer sequences ranging from 70 bp to 1 Mbp. The paired-end mode only works for reads from Illumina short-insert libraries.

Input arguments

genome
label:

Reference genome

type:

data:index:bwa

reads
label:

Reads

type:

data:reads:fastq

match
label:

Score of a match

type:

basic:integer

default:

1

missmatch
label:

Mismatch penalty

type:

basic:integer

default:

3

gap_o
label:

Gap open penalty

type:

basic:integer

default:

5

gap_e
label:

Gap extension penalty

type:

basic:integer

default:

2

Output results

bam
label:

Alignment file

type:

basic:file

description:

Position sorted alignment

bai
label:

Index BAI

type:

basic:file

unmapped
label:

Unmapped reads

type:

basic:file

required:

False

stats
label:

Statistics

type:

basic:file

species
label:

Species

type:

basic:string

build
label:

Build

type:

basic:string

BWA genome index

data:index:bwa:bwa-index (data:seq:nucleotide  ref_seq)[Source: v1.2.0]

Create BWA genome index.

Input arguments

ref_seq
label:

Reference sequence (nucleotide FASTA)

type:

data:seq:nucleotide

required:

True

disabled:

False

hidden:

False

Output results

index
label:

BWA index

type:

basic:dir

required:

True

disabled:

False

hidden:

False

fastagz
label:

FASTA file (compressed)

type:

basic:file

required:

True

disabled:

False

hidden:

False

fasta
label:

FASTA file

type:

basic:file

required:

True

disabled:

False

hidden:

False

fai
label:

FASTA file index

type:

basic:file

required:

True

disabled:

False

hidden:

False

species
label:

Species

type:

basic:string

required:

True

disabled:

False

hidden:

False

build
label:

Build

type:

basic:string

required:

True

disabled:

False

hidden:

False

BWA-MEM2 genome index

data:index:bwamem2:bwamem2-index (data:seq:nucleotide  ref_seq)[Source: v1.1.0]

Create BWA-MEM2 genome index.

Input arguments

ref_seq
label:

Reference sequence (nucleotide FASTA)

type:

data:seq:nucleotide

required:

True

disabled:

False

hidden:

False

Output results

index
label:

BWA-MEM2 index

type:

basic:dir

required:

True

disabled:

False

hidden:

False

fastagz
label:

FASTA file (compressed)

type:

basic:file

required:

True

disabled:

False

hidden:

False

fasta
label:

FASTA file

type:

basic:file

required:

True

disabled:

False

hidden:

False

fai
label:

FASTA file index

type:

basic:file

required:

True

disabled:

False

hidden:

False

species
label:

Species

type:

basic:string

required:

True

disabled:

False

hidden:

False

build
label:

Build

type:

basic:string

required:

True

disabled:

False

hidden:

False

BWA-MEM2 index files

data:index:bwamem2:upload-bwamem2-index (basic:file  ref_seq, basic:file  index_name, basic:string  species, basic:string  build)[Source: v1.0.0]

Import BWA-MEM2 index files.

Input arguments

ref_seq
label:

Reference sequence (nucleotide FASTA)

type:

basic:file

required:

True

disabled:

False

hidden:

False

index_name
label:

BWA-MEM2 index files

type:

basic:file

required:

True

disabled:

False

hidden:

False

species
label:

Species

type:

basic:string

description:

Select a species name from the dropdown menu or write a custom species name in the species field. For sequences that are not related to any particular species (e.g. adapters file), you can select the value Other.

required:

True

disabled:

False

hidden:

False

choices:

  • Homo sapiens: Homo sapiens

  • Mus musculus: Mus musculus

  • Rattus norvegicus: Rattus norvegicus

  • Macaca mulatta: Macaca mulatta

  • Dictyostelium discoideum: Dictyostelium discoideum

  • Other: Other

build
label:

Genome build

type:

basic:string

required:

True

disabled:

False

hidden:

False

Output results

index
label:

BWA-MEM2 index

type:

basic:dir

required:

True

disabled:

False

hidden:

False

fastagz
label:

FASTA file (compressed)

type:

basic:file

required:

True

disabled:

False

hidden:

False

fasta
label:

FASTA file

type:

basic:file

required:

True

disabled:

False

hidden:

False

fai
label:

FASTA file index

type:

basic:file

required:

True

disabled:

False

hidden:

False

species
label:

Species

type:

basic:string

required:

True

disabled:

False

hidden:

False

build
label:

Build

type:

basic:string

required:

True

disabled:

False

hidden:

False

Bam split

data:alignment:bam:primarybam-split (data:alignment:bam  bam, data:sam:header  header, data:sam:header  header2)[Source: v0.9.1]

Split hybrid bam file into two bam files.

Input arguments

bam
label:

Hybrid alignment bam

type:

data:alignment:bam

header
label:

Primary header sam file (optional)

type:

data:sam:header

description:

If no header file is provided, the headers will be extracted from the hybrid alignment bam file.

required:

False

header2
label:

Secondary header sam file (optional)

type:

data:sam:header

description:

If no header file is provided, the headers will be extracted from the hybrid alignment bam file.

required:

False

Output results

bam
label:

Uploaded file

type:

basic:file

bai
label:

Index BAI

type:

basic:file

species
label:

Species

type:

basic:string

build
label:

Build

type:

basic:string

Bamclipper

data:alignment:bam:bamclipped:bamclipper (data:alignment:bam  alignment, data:bedpe  bedpe, basic:boolean  skip)[Source: v1.5.1]

Remove primer sequence from BAM alignments by soft-clipping. This process is a wrapper for bamclipper which can be found at https://github.com/tommyau/bamclipper.

Input arguments

alignment
label:

Alignment BAM file

type:

data:alignment:bam

required:

True

disabled:

False

hidden:

False

bedpe
label:

BEDPE file

type:

data:bedpe

required:

False

disabled:

False

hidden:

False

skip
label:

Skip Bamclipper step

type:

basic:boolean

description:

Use this option to skip Bamclipper step.

required:

True

disabled:

False

hidden:

False

default:

False

Output results

bam
label:

Clipped BAM file

type:

basic:file

required:

True

disabled:

False

hidden:

False

bai
label:

Index of clipped BAM file

type:

basic:file

required:

True

disabled:

False

hidden:

False

stats
label:

Alignment statistics

type:

basic:file

required:

True

disabled:

False

hidden:

False

species
label:

Species

type:

basic:string

required:

True

disabled:

False

hidden:

False

build
label:

Build

type:

basic:string

required:

True

disabled:

False

hidden:

False

Bamliquidator

data:bam:plot:bamliquidatorbamliquidator (basic:string  analysis_type, list:data:alignment:bam  bam, basic:string  cell_type, basic:integer  bin_size, data:annotation:gtf  regions_gtf, data:bed  regions_bed, basic:integer  extension, basic:string  sense, basic:boolean  skip_plot, list:basic:string  black_list, basic:integer  threads)[Source: v0.3.3]

Set of tools for analyzing the density of short DNA sequence read alignments in the BAM file format.

Input arguments

analysis_type
label:

Analysis type

type:

basic:string

default:

bin

choices:

  • Bin mode: bin

  • Region mode: region

  • BED mode: bed

bam
label:

BAM File

type:

list:data:alignment:bam

cell_type
label:

Cell type

type:

basic:string

default:

cell_type

bin_size
label:

Bin size

type:

basic:integer

description:

Number of base pairs in each bin. The smaller the bin size the longer the runtime and the larger the data files. Default is 100000.

required:

False

hidden:

analysis_type != 'bin'

regions_gtf
label:

Region gff file / Annotation file (.gff|.gtf)

type:

data:annotation:gtf

required:

False

hidden:

analysis_type != 'region'

regions_bed
label:

Region bed file / Annotation file (.bed)

type:

data:bed

required:

False

hidden:

analysis_type != 'bed'

extension
label:

Extension

type:

basic:integer

description:

Extends reads by number of bp

default:

200

sense
label:

Mapping strand to gff file

type:

basic:string

default:

.

choices:

  • Forward: +

  • Reverse: -

  • Both: .

skip_plot
label:

Skip plot

type:

basic:boolean

required:

False

black_list
label:

Black list

type:

list:basic:string

description:

One or more chromosome patterns to skip during bin liquidation. By default, any chromosome whose name contains one of the following substrings is skipped: chrUn, _random, Zv9_, _hap.

required:

False
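
The black list described above acts as a substring filter on chromosome names. The sketch below shows that behaviour with the default patterns; `keep_chromosome` is a hypothetical helper and the chromosome names are example values, not taken from Bamliquidator itself.

```python
DEFAULT_BLACK_LIST = ["chrUn", "_random", "Zv9_", "_hap"]


def keep_chromosome(name, black_list=DEFAULT_BLACK_LIST):
    """Return False when the chromosome name contains any black-listed substring."""
    return not any(pattern in name for pattern in black_list)


chromosomes = ["chr1", "chrUn_gl000220", "chr6_apd_hap1", "chr1_gl000191_random", "chrX"]
print([c for c in chromosomes if keep_chromosome(c)])
# ['chr1', 'chrX']
```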

threads
label:

Threads

type:

basic:integer

description:

Number of threads to run concurrently during liquidation.

default:

1

Output results

analysis_type
label:

Analysis type

type:

basic:string

hidden:

True

output_dir
label:

Output directory

type:

basic:file

counts
label:

Counts HDF5 file

type:

basic:file

matrix
label:

Matrix file

type:

basic:file

required:

False

hidden:

analysis_type != 'region'

summary
label:

Summary file

type:

basic:file:html

required:

False

hidden:

analysis_type != 'bin'

Bamplot

data:bam:plot:bamplotbamplot (basic:string  genome, data:annotation:gtf  input_gff, basic:string  input_region, list:data:alignment:bam  bam, basic:integer  stretch_input, basic:string  color, basic:string  sense, basic:integer  extension, basic:boolean  rpm, basic:string  yscale, list:basic:string  names, basic:string  plot, basic:string  title, basic:string  scale, list:data:bed  bed, basic:boolean  multi_page)[Source: v1.4.3]

Plot a single locus from a bam.

Input arguments

genome
label:

Genome

type:

basic:string

choices:

  • HG19: HG19

  • HG18: HG18

  • MM8: MM8

  • MM9: MM9

  • MM10: MM10

  • RN6: RN6

  • RN4: RN4

input_gff
label:

Region string

type:

data:annotation:gtf

description:

Enter .gff file.

required:

False

input_region
label:

Region string

type:

basic:string

description:

Enter genomic region e.g. chr1:+:1-1000.

required:

False
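
The region string format shown above (chromosome, strand, and coordinate range separated by colons) can be split into its parts as follows. This is a sketch under the assumption that the format is exactly `chrom:strand:start-end`; the `Region` type and `parse_region` function are hypothetical and not part of Bamplot.

```python
from typing import NamedTuple


class Region(NamedTuple):
    chrom: str
    strand: str
    start: int
    end: int


def parse_region(region: str) -> Region:
    """Split a 'chrom:strand:start-end' string into typed fields."""
    chrom, strand, span = region.split(":")
    start, end = span.split("-")
    return Region(chrom, strand, int(start), int(end))


print(parse_region("chr1:+:1-1000"))
# Region(chrom='chr1', strand='+', start=1, end=1000)
```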

bam
label:

Bam

type:

list:data:alignment:bam

description:

bam to plot from

required:

False

stretch_input
label:

Stretch-input

type:

basic:integer

description:

Stretch the input regions to a minimum length in bp, e.g. 10000 (for 10kb).

required:

False

color
label:

Color

type:

basic:string

description:

Enter a colon-separated list of colors, e.g. 255,0,0:255,125,0. If no colors are given, the rainbow is sampled.

default:

255,0,0:255,125,0
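
The color value is a colon-separated list in which each item is a comma-separated RGB triplet. The sketch below parses the default value into tuples; `parse_colors` is a hypothetical helper shown only to make the format concrete.

```python
def parse_colors(color: str):
    """Parse 'R,G,B:R,G,B:...' into a list of (R, G, B) integer tuples."""
    colors = []
    for item in color.split(":"):
        red, green, blue = (int(value) for value in item.split(","))
        colors.append((red, green, blue))
    return colors


print(parse_colors("255,0,0:255,125,0"))
# [(255, 0, 0), (255, 125, 0)]
```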

sense
label:

Sense

type:

basic:string

description:

Map to forward, reverse or both strands. Default maps to both.

default:

both

choices:

  • Forward: forward

  • Reverse: reverse

  • Both: both

extension
label:

Extension

type:

basic:integer

description:

Extends reads by n bp. Default value is 200bp.

default:

200

rpm
label:

rpm

type:

basic:boolean

description:

Normalizes density to reads per million (rpm). Default is False.

required:

False

yscale
label:

y scale

type:

basic:string

description:

Choose either relative or uniform y axis scaling. Default is relative scaling.

default:

relative

choices:

  • relative: relative

  • uniform: uniform

names
label:

Names

type:

list:basic:string

description:

Enter a comma separated list of names for your bams.

required:

False

plot
label:

Single or multiple plot

type:

basic:string

description:

Choose either all lines on a single plot or multiple plots.

default:

merge

choices:

  • single: single

  • multiple: multiple

  • merge: merge

title
label:

Title

type:

basic:string

description:

Specify a title for the output plot(s), default will be the coordinate region.

default:

output

scale
label:

Scale

type:

basic:string

description:

Enter a comma separated list of multiplicative scaling factors for your bams. Default is none.

required:

False

bed
label:

Bed

type:

list:data:bed

description:

Add a space-delimited list of bed files to plot.

required:

False

multi_page
label:

Multi page

type:

basic:boolean

description:

If flagged will create a new pdf for each region.

default:

False

Output results

plot
label:

region plot

type:

basic:file

BaseQualityScoreRecalibrator

data:alignment:bam:bqsr:bqsr (data:alignment:bam  bam, data:seq:nucleotide  reference, list:data:variants:vcf  known_sites, data:bed  intervals, basic:string  read_group, basic:string  validation_stringency, basic:boolean  use_original_qualities, basic:integer  java_gc_threads, basic:integer  max_heap_size)[Source: v2.5.1]

A two-pass process of BaseRecalibrator and ApplyBQSR from GATK. See GATK website for more information on BaseRecalibrator. It is possible to modify read groups using GATK’s AddOrReplaceReadGroups through the Replace read groups in BAM (``read_group``) input field.
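
The ``read_group`` input described below takes a string of -name=value items separated by semicolons, with PL, LB, PU and SM required. The following sketch parses and checks such a string; `parse_read_group` is a hypothetical helper and does not reflect how the process itself validates the field.

```python
REQUIRED_TAGS = {"PL", "LB", "PU", "SM"}


def parse_read_group(read_group: str) -> dict:
    """Parse '-ID=1;-LB=...;...' into a dict and check the required tags."""
    tags = {}
    for item in read_group.split(";"):
        if not item:
            continue  # tolerate a trailing semicolon
        name, value = item.lstrip("-").split("=", 1)
        tags[name] = value
    missing = REQUIRED_TAGS - tags.keys()
    if missing:
        raise ValueError(f"Missing required read group tags: {sorted(missing)}")
    return tags


print(parse_read_group("-ID=1;-LB=GENIALIS;-PL=ILLUMINA;-PU=BARCODE;-SM=SAMPLENAME1"))
# {'ID': '1', 'LB': 'GENIALIS', 'PL': 'ILLUMINA', 'PU': 'BARCODE', 'SM': 'SAMPLENAME1'}
```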

Input arguments

bam
label:

BAM file containing reads

type:

data:alignment:bam

required:

True

disabled:

False

hidden:

False

reference
label:

Reference genome file

type:

data:seq:nucleotide

required:

True

disabled:

False

hidden:

False

known_sites
label:

List of known sites of variation

type:

list:data:variants:vcf

required:

True

disabled:

False

hidden:

False

intervals
label:

One or more genomic intervals over which to operate.

type:

data:bed

description:

This field is optional, but it can speed up the process by restricting calculations to specific genome regions.

required:

False

disabled:

False

hidden:

False

read_group
label:

Replace read groups in BAM

type:

basic:string

description:

Replace read groups in a BAM file. This argument enables the user to replace all read groups in the INPUT file with a single new read group and assign all reads to this read group in the OUTPUT BAM file. Addition or replacement is performed using Picard’s AddOrReplaceReadGroups tool. Input should take the form of -name=value delimited by a “;”, e.g. “-ID=1;-LB=GENIALIS;-PL=ILLUMINA;-PU=BARCODE;-SM=SAMPLENAME1”. See the tool’s documentation for more information on tag names. Note that PL, LB, PU and SM are required fields. See caveats of rewriting read groups in the documentation.

required:

True

disabled:

False

hidden:

False

default:

validation_stringency
label:

Validation stringency

type:

basic:string

description:

Validation stringency for all SAM files read by this program. Setting stringency to SILENT can improve performance when processing a BAM file in which variable-length data (read, qualities, tags) do not otherwise need to be decoded. Default is STRICT. This setting is used in BaseRecalibrator and ApplyBQSR processes.

required:

True

disabled:

False

hidden:

False

default:

STRICT

choices:

  • STRICT: STRICT

  • LENIENT: LENIENT

  • SILENT: SILENT

advanced.use_original_qualities
label:

Use the base quality scores from the OQ tag

type:

basic:boolean

description:

This flag tells GATK to use the original base qualities (that were in the data before BQSR/recalibration) which are stored in the OQ tag, if they are present, rather than use the post-recalibration quality scores. If no OQ tag is present for a read, the standard qual score will be used.

required:

True

disabled:

False

hidden:

False

default:

False

advanced.java_gc_threads
label:

Java ParallelGCThreads

type:

basic:integer

description:

Sets the number of threads used during parallel phases of the garbage collectors.

required:

True

disabled:

False

hidden:

False

default:

2

advanced.max_heap_size
label:

Java maximum heap size (Xmx)

type:

basic:integer

description:

Set the maximum Java heap size (in GB).

required:

True

disabled:

False

hidden:

False

default:

12
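
The two Java settings above correspond to standard JVM options. The sketch below formats them as a single option string; whether the process passes them to GATK in exactly this form is an assumption, and `java_options` is a hypothetical helper.

```python
def java_options(java_gc_threads: int = 2, max_heap_size: int = 12) -> str:
    """Build a JVM option string from the GC thread count and heap size (GB)."""
    return f"-XX:ParallelGCThreads={java_gc_threads} -Xmx{max_heap_size}g"


print(java_options())  # -XX:ParallelGCThreads=2 -Xmx12g
```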

Output results

bam
label:

Base quality score recalibrated BAM file

type:

basic:file

required:

True

disabled:

False

hidden:

False

bai
label:

Index of base quality score recalibrated BAM file

type:

basic:file

required:

True

disabled:

False

hidden:

False

stats
label:

Alignment statistics

type:

basic:file

required:

True

disabled:

False

hidden:

False

species
label:

Species

type:

basic:string

required:

True

disabled:

False

hidden:

False

build
label:

Build

type:

basic:string

required:

True

disabled:

False

hidden:

False

recal_table
label:

Recalibration table

type:

basic:file

required:

True

disabled:

False

hidden:

False

BaseSpace file

data:file:basespace-file-import (basic:string  file_id, basic:secret  access_token_secret, basic:string  output, basic:integer  tries, basic:boolean  verbose)[Source: v1.5.1]

Import a file from Illumina BaseSpace.

Input arguments

file_id
label:

BaseSpace file ID

type:

basic:string

required:

True

disabled:

False

hidden:

False

access_token_secret
label:

BaseSpace access token

type:

basic:secret

description:

BaseSpace access token secret handle needed to download the file.

required:

True

disabled:

False

hidden:

False

advanced.output
label:

Output

type:

basic:string

description:

Sets what is printed to standard output. Argument ‘Full’ outputs everything, argument ‘Filename’ outputs only file names of downloaded files.

required:

True

disabled:

False

hidden:

False

default:

filename

choices:

  • Full: full

  • Filename: filename

advanced.tries
label:

Tries

type:

basic:integer

description:

Number of tries to download a file before giving up.

required:

True

disabled:

False

hidden:

False

default:

3
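
The Tries input bounds how many download attempts are made before the import fails. A generic retry loop of that kind might look like the sketch below. The function `download_with_retries` and the use of `OSError` as the retried error type are assumptions; the actual download code of this process is not shown in this document.

```python
def download_with_retries(download, tries=3):
    """Call download() up to `tries` times and return its first successful result."""
    last_error = None
    for _attempt in range(tries):
        try:
            return download()
        except OSError as error:  # assumed error type for a failed download
            last_error = error
    raise RuntimeError(f"Download failed after {tries} tries") from last_error


# Example with a stand-in download function that fails twice, then succeeds.
attempts = {"count": 0}


def flaky_download():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise OSError("temporary failure")
    return "file contents"


print(download_with_retries(flaky_download, tries=3))  # file contents
```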

advanced.verbose
label:

Verbose

type:

basic:boolean

description:

Print detailed exception information to standard output when an error occurs. The Output argument has no effect on this argument.

required:

True

disabled:

False

hidden:

False

default:

False

Output results

file
label:

File with reads

type:

basic:file

required:

True

disabled:

False

hidden:

False

Bedtools bamtobed

data:bedpe:bedtools-bamtobed (data:alignment:bam  alignment)[Source: v1.3.1]

Takes a BAM file and calculates a normalization factor in BEDPE format. The BAM file is sorted with Samtools and then transformed with Bedtools.

Input arguments

alignment
label:

Alignment BAM file

type:

data:alignment:bam

required:

True

disabled:

False

hidden:

False

Output results

bedpe
label:

BEDPE file

type:

basic:file

required:

True

disabled:

False

hidden:

False

species
label:

Species

type:

basic:string

required:

True

disabled:

False

hidden:

False

build
label:

Build

type:

basic:string

required:

True

disabled:

False

hidden:

False

Beta Cut & Run workflow

data:workflow:cutnrun:workflow-cutnrun-beta (data:reads:fastq:paired  reads, basic:integer  quality, basic:integer  nextseq, basic:integer  min_length, list:basic:string  adapter_1, list:basic:string  adapter_2, data:seq:nucleotide  adapter_file_1, data:seq:nucleotide  adapter_file_2, basic:string  universal_adapter, basic:integer  stringency, basic:decimal  error_rate, data:index:bowtie2  genome, data:index:bowtie2  spikein_genome, basic:string  alignment_mode, basic:string  speed, basic:boolean  dovetail, basic:boolean  rep_se, basic:integer  minins, basic:integer  maxins, basic:boolean  discordantly, basic:boolean  no_unal, basic:boolean  skip_norm, basic:decimal  scale, basic:boolean  downsample_reads, basic:integer  n_reads, basic:boolean  remove_duplicates)[Source: v2.0.0]

Beta Cut & Run workflow. Analysis of samples processed for high-resolution mapping of DNA binding sites using a targeted nuclease strategy. The process is named CUT&RUN, which stands for Cleavage Under Target and Release Using Nuclease. The workflow trims the reads with trimgalore and aligns them using bowtie2 to the target species genome and, optionally, to a spike-in genome. Aligned reads are processed to produce bigwig files to be viewed in a genome browser.

Input arguments

reads
label:

Input Reads (FASTQ)

type:

data:reads:fastq:paired

description:

Paired-end reads in FASTQ file.

required:

True

disabled:

False

hidden:

False

trimming_options.quality
label:

Quality cutoff

type:

basic:integer

description:

Trim low-quality ends from reads based on Phred score. Default: 20.

required:

True

disabled:

False

hidden:

False

default:

20

trimming_options.nextseq
label:

NextSeq/NovaSeq trim cutoff

type:

basic:integer

description:

NextSeq/NovaSeq-specific quality trimming. Also trims dark cycles appearing as high-quality G bases. This will set a specific quality cutoff, but qualities of G bases are ignored. This cannot be used with Quality cutoff and will override it.

required:

False

disabled:

False

hidden:

False

trimming_options.min_length
label:

Minimum length after trimming

type:

basic:integer

description:

Discard reads that became shorter than selected length because of either quality or adapter trimming. Both reads of a read-pair need to be longer than the specified length to be printed out to validated paired-end files. A value of 0 disables filtering based on length. Default: 20.

required:

True

disabled:

False

hidden:

False

default:

20

trimming_options.adapter_options.adapter_1
label:

Read 1 adapter sequence

type:

list:basic:string

description:

Adapter sequences to be trimmed. Also see universal adapters field for predefined adapters. This is mutually exclusive with Read 1 adapters file and Universal adapters.

required:

False

disabled:

False

hidden:

False

default:

[]

trimming_options.adapter_options.adapter_2
label:

Read 2 adapter sequence

type:

list:basic:string

description:

Optional adapter sequence to be trimmed off read 2 of paired-end files. This is mutually exclusive with Read 2 adapters file and Universal adapters.

required:

False

disabled:

False

hidden:

False

default:

[]

trimming_options.adapter_options.adapter_file_1
label:

Read 1 adapters file

type:

data:seq:nucleotide

description:

This is mutually exclusive with Read 1 adapters and Universal adapters.

required:

False

disabled:

False

hidden:

False

trimming_options.adapter_options.adapter_file_2
label:

Read 2 adapters file

type:

data:seq:nucleotide

description:

This is mutually exclusive with Read 2 adapters and Universal adapters.

required:

False

disabled:

False

hidden:

False

trimming_options.adapter_options.universal_adapter
label:

Universal adapters

type:

basic:string

description:

Instead of default detection use specific adapters. Use 13bp of the Illumina universal adapter, 12bp of the Nextera adapter or 12bp of the Illumina Small RNA 3’ Adapter. Selecting to trim smallRNA adapters will also lower the min length value to 18bp. If smallRNA libraries are paired-end, then Read 2 adapter will be set to the Illumina small RNA 5’ adapter automatically (GATCGTCGGACT) unless defined explicitly. This is mutually exclusive with manually defined adapters and adapter files.

required:

False

disabled:

False

hidden:

False

choices:

  • Illumina: --illumina

  • Nextera: --nextera

  • Illumina small RNA: --small_rna
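
The adapter inputs above are mutually exclusive in several combinations. The sketch below collects those rules, as stated in the field descriptions, into one check. `adapter_conflicts` is a hypothetical helper and does not represent the validation performed by the workflow.

```python
def adapter_conflicts(adapter_1=(), adapter_2=(), adapter_file_1=None,
                      adapter_file_2=None, universal_adapter=None):
    """Return a list of human-readable conflicts between adapter inputs."""
    conflicts = []
    if adapter_1 and adapter_file_1:
        conflicts.append("adapter_1 with adapter_file_1")
    if adapter_2 and adapter_file_2:
        conflicts.append("adapter_2 with adapter_file_2")
    if universal_adapter and (adapter_1 or adapter_2 or adapter_file_1 or adapter_file_2):
        conflicts.append("universal_adapter with manual adapters or adapter files")
    return conflicts


print(adapter_conflicts(universal_adapter="--nextera"))
# []
print(adapter_conflicts(adapter_1=["AGATCGGAAGAGC"], universal_adapter="--illumina"))
# ['universal_adapter with manual adapters or adapter files']
```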

trimming_options.adapter_options.stringency
label:

Overlap with adapter sequence required to trim

type:

basic:integer

description:

Defaults to a very stringent setting of 1, i.e. even a single base pair of overlapping sequence will be trimmed off the 3’ end of any read.

required:

True

disabled:

False

hidden:

False

default:

1

trimming_options.adapter_options.error_rate
label:

Maximum allowed error rate

type:

basic:decimal

description:

Number of errors divided by the length of the matching region. Default: 0.1.

required:

True

disabled:

False

hidden:

False

default:

0.1
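
The error rate defined above is a simple ratio. The sketch below applies it to decide whether an adapter match is acceptable. Treating the limit as inclusive is an assumption, and `within_error_rate` is a hypothetical helper rather than code from the trimming tool.

```python
def within_error_rate(errors: int, match_length: int, error_rate: float = 0.1) -> bool:
    """True when errors / match_length does not exceed the allowed error rate."""
    return errors / match_length <= error_rate


print(within_error_rate(1, 10))  # True  (1 error in a 10-base match)
print(within_error_rate(2, 13))  # False (2 errors in a 13-base match)
```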

alignment_options.genome
label:

Species genome

type:

data:index:bowtie2

required:

True

disabled:

False

hidden:

False

alignment_options.spikein_genome
label:

Spike-in genome

type:

data:index:bowtie2

required:

False

disabled:

normalization_options.skip_norm == true

hidden:

False

alignment_options.alignment_mode
label:

Alignment mode

type:

basic:string

description:

Local: Some characters may be omitted (‘soft clipped’) from the ends in order to achieve the greatest possible alignment score. End-to-end: Option without any trimming (or ‘soft clipping’) of bases from either end. This option is enabled by default and is suitable if reads have been clipped beforehand.

required:

True

disabled:

False

hidden:

False

default:

--end-to-end

choices:

  • Local: --local

  • End-to-end: --end-to-end

alignment_options.speed
label:

Speed vs. Sensitivity

type:

basic:string

description:

Setting for aligning fast or accurately. Default: Very sensitive.

required:

True

disabled:

False

hidden:

False

default:

--very-sensitive

choices:

  • Very fast: --very-fast

  • Fast: --fast

  • Sensitive: --sensitive

  • Very sensitive: --very-sensitive

alignment_options.pe_options.dovetail
label:

Dovetail

type:

basic:boolean

description:

Mates dovetail when the alignment of one mate extends past the beginning of the other, so that the wrong mate begins upstream. When enabled, this condition is considered concordant. Default: True.

required:

True

disabled:

False

hidden:

False

default:

True

alignment_options.pe_options.rep_se
label:

Report single ended

type:

basic:boolean

description:

If paired alignment cannot be found, Bowtie2 tries to find alignments for the individual mates. Default: False.

required:

True

disabled:

False

hidden:

False

default:

False

alignment_options.pe_options.minins
label:

Minimal distance

type:

basic:integer

description:

The minimum fragment length (--minins) for valid paired-end alignments. Default: 10.

required:

True

disabled:

False

hidden:

False

default:

10

alignment_options.pe_options.maxins
label:

Maximal distance

type:

basic:integer

description:

The maximum fragment length (--maxins) for valid paired-end alignments. Default: 700.

required:

True

disabled:

False

hidden:

False

default:

700

alignment_options.pe_options.discordantly
label:

Report discordantly matched read

type:

basic:boolean

description:

If both mates have unique alignments, but the alignments do not match paired-end expectations (orientation and relative distance), alignment will still be reported. Useful for detecting structural variations. Default: False.

required:

True

disabled:

False

hidden:

False

default:

False

alignment_options.output_options.no_unal
label:

Suppress SAM records for unaligned reads

type:

basic:boolean

description:

When enabled, suppress SAM records for unaligned reads. Default: True.

required:

True

disabled:

False

hidden:

False

default:

True
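
The alignment options above resemble standard Bowtie2 command-line options. The sketch below shows one plausible translation using the defaults listed in this section. The mapping of `rep_se` to `--no-mixed` and of `discordantly` to `--no-discordant` is an assumption, as is the helper `build_bowtie2_options`; the workflow's actual command line is not shown in this document.

```python
def build_bowtie2_options(alignment_mode="--end-to-end", speed="--very-sensitive",
                          dovetail=True, rep_se=False, minins=10, maxins=700,
                          discordantly=False, no_unal=True):
    """Translate the workflow's alignment inputs into a Bowtie2 option list."""
    options = [alignment_mode, speed, "--minins", str(minins), "--maxins", str(maxins)]
    if dovetail:
        options.append("--dovetail")
    if not rep_se:
        options.append("--no-mixed")       # assumed: do not align mates individually
    if not discordantly:
        options.append("--no-discordant")  # assumed: do not report discordant pairs
    if no_unal:
        options.append("--no-unal")
    return options


print(build_bowtie2_options())
# ['--end-to-end', '--very-sensitive', '--minins', '10', '--maxins', '700',
#  '--dovetail', '--no-mixed', '--no-discordant', '--no-unal']
```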

normalization_options.skip_norm
label:

Skip normalization

type:

basic:boolean

description:

Skip the spike-in normalization step of BigWig output. Use this if you don’t provide a spike-in. Default: False.

required:

True

disabled:

False

hidden:

False

default:

False

normalization_options.scale
label:

Scale factor

type:

basic:decimal

description:

Magnitude of the scale factor. The scaling factor is calculated by dividing the scale with the number of features in BEDPE (scale/(number of features)). Default: 10000.

required:

True

disabled:

normalization_options.skip_norm == true

hidden:

False

default:

10000
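
The scaling factor formula above, scale/(number of features), can be worked through with a small example. The feature count used here is an illustrative value and `scaling_factor` is a hypothetical helper.

```python
def scaling_factor(n_features: int, scale: float = 10000.0) -> float:
    """Return scale / (number of features in the BEDPE file)."""
    if n_features <= 0:
        raise ValueError("The BEDPE file must contain at least one feature.")
    return scale / n_features


# Example: a spike-in BEDPE file with 2,500,000 features.
print(scaling_factor(2_500_000))  # 0.004
```

A sample with more spike-in features therefore receives a smaller scaling factor.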

downsampling_options.downsample_reads
label:

Downsample reads

type:

basic:boolean

description:

Option to downsample reads before trimming. Default: True.

required:

True

disabled:

False

hidden:

False

default:

True

downsampling_options.n_reads
label:

Number of reads to downsample

type:

basic:integer

description:

Number of reads to downsample from the input FASTQ file. Default: 10M.

required:

True

disabled:

downsampling_options.downsample_reads == false

hidden:

False

default:

10000000

deduplication_options.remove_duplicates
label:

Remove duplicates

type:

basic:boolean

description:

Option on how to handle duplicate reads. True: Mark and remove duplicate reads. False: Only mark duplicate reads. Note that this option is only available for species genome. In case of spike-in genome, duplicate reads are always removed. Default: False.

required:

True

disabled:

False

hidden:

False

default:

False

Output results

Bisulfite conversion rate

data:wgbs:bsrate:bs-conversion-rate (data:alignment:bam:walt  mr, basic:boolean  skip, data:seq:nucleotide  sequence, basic:boolean  count_all, basic:integer  read_length, basic:decimal  max_mismatch, basic:boolean  a_rich)[Source: v1.3.1]

Estimate bisulfite conversion rate in a control set. The program bsrate included in [Methpipe](https://github.com/smithlabcode/methpipe) will estimate the bisulfite conversion rate.

Input arguments

mr
label:

Aligned reads from bisulfite sequencing

type:

data:alignment:bam:walt

description:

A bisulfite-specific alignment, such as one produced by WALT, is required because the .mr file type is used. Duplicates should be removed to reduce any bias introduced by incomplete conversion on PCR duplicate reads.

required:

True

disabled:

False

hidden:

False

skip
label:

Skip Bisulfite conversion rate step

type:

basic:boolean

description:

Bisulfite conversion rate step can be skipped.

required:

True

disabled:

False

hidden:

False

default:

False

sequence
label:

Unmethylated control sequence

type:

data:seq:nucleotide

description:

A separate unmethylated control sequence FASTA file is required to estimate the bisulfite conversion rate.

required:

False

disabled:

False

hidden:

False

count_all
label:

Count all cytosines including CpGs

type:

basic:boolean

required:

True

disabled:

False

hidden:

False

default:

True

read_length
label:

Average read length

type:

basic:integer

required:

True

disabled:

False

hidden:

False

default:

150

max_mismatch
label:

Maximum fraction of mismatches

type:

basic:decimal

required:

False

disabled:

False

hidden:

False

a_rich
label:

Reads are A-rich

type:

basic:boolean

required:

True

disabled:

False

hidden:

False

default:

False

Output results

report
label:

Bisulfite conversion rate report

type:

basic:file

required:

True

disabled:

False

hidden:

False

Bowtie (Dicty)

data:alignment:bam:bowtie1alignment-bowtie (data:index:bowtie  genome, data:reads:fastq  reads, basic:string  mode, basic:integer  m, basic:integer  l, basic:boolean  use_se, basic:integer  trim_5, basic:integer  trim_3, basic:integer  trim_nucl, basic:integer  trim_iter, basic:string  r)[Source: v2.5.2]

An ultrafast memory-efficient short read aligner.

Input arguments

genome
label:

Reference genome

type:

data:index:bowtie

reads
label:

Reads

type:

data:reads:fastq

mode
label:

Alignment mode

type:

basic:string

description:

When the -n option is specified (which is the default), bowtie determines which alignments are valid according to the following policy, which is similar to Maq’s default policy. 1. Alignments may have no more than N mismatches (where N is a number 0-3, set with -n) in the first L bases (where L is a number 5 or greater, set with -l) on the high-quality (left) end of the read. The first L bases are called the “seed”. 2. The sum of the Phred quality values at all mismatched positions (not just in the seed) may not exceed E (set with -e). Where qualities are unavailable (e.g. if the reads are from a FASTA file), the Phred quality defaults to 40. In -v mode, alignments may have no more than V mismatches, where V may be a number from 0 through 3 set using the -v option. Quality values are ignored. The -v option is mutually exclusive with the -n option.

default:

-n

choices:

  • Use qualities (-n): -n

  • Use mismatches (-v): -v
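The two validity policies described above can be sketched as simple predicates. This is only an illustration of the rules as stated in the description, not Bowtie's implementation; the function names and the representation of an alignment as a list of mismatch positions are hypothetical, and the default of 70 for E (-e) is an assumption, since this page does not state it.

```python
from typing import Sequence


def is_valid_n_mode(mismatch_positions: Sequence[int],
                    qualities: Sequence[int],
                    n: int = 2, l: int = 28, e: int = 70) -> bool:
    """Check the -n policy for one ungapped alignment.

    mismatch_positions: 0-based offsets from the high-quality (left) end.
    qualities: Phred quality of every read position.
    e: ceiling on summed mismatch qualities (assumed default, see lead-in).
    """
    # Rule 1: at most N mismatches inside the seed (first L bases).
    seed_mismatches = sum(1 for pos in mismatch_positions if pos < l)
    # Rule 2: summed Phred quality over ALL mismatches must not exceed E.
    quality_sum = sum(qualities[pos] for pos in mismatch_positions)
    return seed_mismatches <= n and quality_sum <= e


def is_valid_v_mode(mismatch_positions: Sequence[int], v: int = 2) -> bool:
    """Check the -v policy: at most V mismatches, qualities ignored."""
    return len(mismatch_positions) <= v
```

For example, a 36-bp read with mismatches at offsets 3 and 30 passes the -n policy when every base has quality 30 (sum 60), but fails when every base has quality 40 (sum 80), given the assumed ceiling of 70.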

m
label:

Allowed mismatches

type:

basic:integer

description:

When used with “Use qualities (-n)”, this is the maximum number of mismatches permitted in the “seed”, i.e. the first L base pairs of the read (where L is set with -l/--seedlen). This may be 0, 1, 2 or 3; the default is 2. When used with “Use mismatches (-v)”, report alignments with at most <int> mismatches.

default:

2

l
label:

Seed length (for -n only)

type:

basic:integer

description:

Only for “Use qualities (-n)”. Seed length (-l) is the number of bases on the high-quality end of the read to which the -n ceiling applies. The lowest permitted setting is 5 and the default is 28. bowtie is faster for larger values of -l.

default:

28

use_se
label:

Map as single-ended (for paired end reads only)

type:

basic:boolean

description:

If this option is selected paired-end reads will be mapped as single-ended.

default:

False

start_trimming.trim_5
label:

Bases to trim from 5’

type:

basic:integer

description:

Number of bases to trim from the 5’ (left) end of each read before alignment.

default:

0

start_trimming.trim_3
label:

Bases to trim from 3’

type:

basic:integer

description:

Number of bases to trim from the 3’ (right) end of each read before alignment.

default:

0

trimming.trim_nucl
label:

Bases to trim

type:

basic:integer

description:

Number of bases to trim from 3’ end in each iteration.

default:

2

trimming.trim_iter
label:

Iterations

type:

basic:integer

description:

Number of iterations.

default:

0

reporting.r
label:

Reporting mode

type:

basic:string

description:

Report up to <int> valid alignments per read or pair (-k) (default: 1). Validity of alignments is determined by the alignment policy (combined effects of -n, -v, -l, and -e). If more than one valid alignment exists and the --best and --strata options are specified, then only those alignments belonging to the best alignment “stratum” will be reported. Bowtie is designed to be very fast for small -k, but it can become significantly slower as -k increases. If you would like to use Bowtie for larger values of -k, consider building an index with a denser suffix-array sample, i.e. specify a smaller -o/--offrate when invoking bowtie-build for the relevant index (see the Performance tuning section for details).

default:

-a -m 1 --best --strata

choices:

  • Report unique alignments: -a -m 1 --best --strata

  • Report all alignments: -a --best

  • Report all alignments in the best stratum: -a --best --strata

Output results

bam
label:

Alignment file

type:

basic:file

description:

Position sorted alignment

bai
label:

Index BAI

type:

basic:file

unmapped
label:

Unmapped reads

type:

basic:file

required:

False

stats
label:

Statistics

type:

basic:file

species
label:

Species

type:

basic:string

build
label:

Build

type:

basic:string

Bowtie genome index

data:index:bowtie:bowtie-index (data:seq:nucleotide  ref_seq)[Source: v1.2.1]

Create Bowtie genome index.

Input arguments

ref_seq
label:

Reference sequence (nucleotide FASTA)

type:

data:seq:nucleotide

required:

True

disabled:

False

hidden:

False

Output results

index
label:

Bowtie index

type:

basic:dir

required:

True

disabled:

False

hidden:

False

fastagz
label:

FASTA file (compressed)

type:

basic:file

required:

True

disabled:

False

hidden:

False

fasta
label:

FASTA file

type:

basic:file

required:

True

disabled:

False

hidden:

False

fai
label:

FASTA file index

type:

basic:file

required:

True

disabled:

False

hidden:

False

species
label:

Species

type:

basic:string

required:

True

disabled:

False

hidden:

False

build
label:

Build

type:

basic:string

required:

True

disabled:

False

hidden:

False

Bowtie2

data:alignment:bam:bowtie2alignment-bowtie2 (data:index:bowtie2  genome, data:reads:fastq  reads, basic:string  mode, basic:string  speed, basic:boolean  use_se, basic:boolean  discordantly, basic:boolean  rep_se, basic:integer  minins, basic:integer  maxins, basic:boolean  no_overlap, basic:boolean  dovetail, basic:integer  N, basic:integer  L, basic:integer  gbar, basic:string  mp, basic:string  rdg, basic:string  rfg, basic:string  score_min, basic:integer  trim_5, basic:integer  trim_3, basic:integer  trim_iter, basic:integer  trim_nucl, basic:string  rep_mode, basic:integer  k_reports, basic:boolean  no_unal)[Source: v2.8.2]

Bowtie is an ultrafast, memory-efficient short read aligner. It aligns short DNA sequences (reads) to the human genome at a rate of over 25 million 35-bp reads per hour. Bowtie indexes the genome with a Burrows-Wheeler index to keep its memory footprint small: typically about 2.2 GB for the human genome (2.9 GB for paired-end). See [here](http://bowtie-bio.sourceforge.net/index.shtml) for more information.

Input arguments

genome
label:

Reference genome

type:

data:index:bowtie2

reads
label:

Reads

type:

data:reads:fastq

mode
label:

Alignment mode

type:

basic:string

description:

End to end: Bowtie 2 requires that the entire read align from one end to the other, without any trimming (or “soft clipping”) of characters from either end. Local: Bowtie 2 does not require that the entire read align from one end to the other. Rather, some characters may be omitted (“soft clipped”) from the ends in order to achieve the greatest possible alignment score.

default:

--end-to-end

choices:

  • end to end mode: --end-to-end

  • local: --local

speed
label:

Speed vs. Sensitivity

type:

basic:string

description:

A quick setting for aligning fast or accurately. This option is a shortcut for the following parameters.

For --end-to-end:

  • --very-fast: -D 5 -R 1 -N 0 -L 22 -i S,0,2.50

  • --fast: -D 10 -R 2 -N 0 -L 22 -i S,0,2.50

  • --sensitive: -D 15 -R 2 -N 0 -L 22 -i S,1,1.15 (default)

  • --very-sensitive: -D 20 -R 3 -N 0 -L 20 -i S,1,0.50

For --local:

  • --very-fast-local: -D 5 -R 1 -N 0 -L 25 -i S,1,2.00

  • --fast-local: -D 10 -R 2 -N 0 -L 22 -i S,1,1.75

  • --sensitive-local: -D 15 -R 2 -N 0 -L 20 -i S,1,0.75 (default)

  • --very-sensitive-local: -D 20 -R 3 -N 0 -L 20 -i S,1,0.50

required:

False

choices:

  • Very fast: --very-fast

  • Fast: --fast

  • Sensitive: --sensitive

  • Very sensitive: --very-sensitive

PE_options.use_se
label:

Map as single-ended (for paired-end reads only)

type:

basic:boolean

description:

If this option is selected paired-end reads will be mapped as single-ended and other paired-end options are ignored.

default:

False

PE_options.discordantly
label:

Report discordantly matched read

type:

basic:boolean

description:

If both mates have unique alignments, but the alignments do not match paired-end expectations (orientation and relative distance) then alignment will be reported. Useful for detecting structural variations.

default:

True

PE_options.rep_se
label:

Report single ended

type:

basic:boolean

description:

If a paired alignment cannot be found, Bowtie2 tries to find alignments for the individual mates.

default:

True

PE_options.minins
label:

Minimal distance

type:

basic:integer

description:

The minimum fragment length for valid paired-end alignments. 0 imposes no minimum.

default:

0

PE_options.maxins
label:

Maximal distance

type:

basic:integer

description:

The maximum fragment length for valid paired-end alignments.

default:

500

PE_options.no_overlap
label:

Not concordant when mates overlap

type:

basic:boolean

description:

When true, mates that overlap at all are considered not concordant. Default is false.

default:

False

PE_options.dovetail
label:

Dovetail

type:

basic:boolean

description:

If the mates “dovetail”, that is if one mate alignment extends past the beginning of the other such that the wrong mate begins upstream, consider that to be concordant. Default: mates cannot dovetail in a concordant alignment.

default:

False

alignment_options.N
label:

Number of mismatches allowed in seed alignment (N)

type:

basic:integer

description:

Sets the number of mismatches allowed in a seed alignment during multiseed alignment. Can be set to 0 or 1. Setting this higher makes alignment slower (often much slower) but increases sensitivity. Default: 0.

required:

False

alignment_options.L
label:

Length of seed substrings (L)

type:

basic:integer

description:

Sets the length of the seed substrings to align during multiseed alignment. Smaller values make alignment slower but more sensitive. Default: the --sensitive preset is used by default for end-to-end alignment and --sensitive-local for local alignment. See documentation for details.

required:

False

alignment_options.gbar
label:

Disallow gaps within positions (gbar)

type:

basic:integer

description:

Disallow gaps within <int> positions of the beginning or end of the read. Default: 4.

required:

False

alignment_options.mp
label:

Maximal and minimal mismatch penalty (mp)

type:

basic:string

description:

Sets the maximum (MX) and minimum (MN) mismatch penalties, both integers. A number less than or equal to MX and greater than or equal to MN is subtracted from the alignment score for each position where a read character aligns to a reference character, the characters do not match, and neither is an N. If --ignore-quals is specified, the number subtracted equals MX. Otherwise, the number subtracted is MN + floor((MX-MN) * (MIN(Q, 40.0)/40.0)), where Q is the Phred quality value. Default for MX, MN: 6,2.

required:

False
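The mismatch penalty formula in the description above can be written out directly. The sketch below is a plain transcription of that formula for illustration; the function name and argument names are hypothetical and are not part of Bowtie 2 or of this process.

```python
import math


def mismatch_penalty(q: float, mx: int = 6, mn: int = 2,
                     ignore_quals: bool = False) -> int:
    """Penalty subtracted from the alignment score for one mismatch.

    q: Phred quality of the mismatched read base.
    mx, mn: maximum and minimum penalties (the "6,2" default of mp).
    """
    if ignore_quals:
        # With --ignore-quals every mismatch costs the maximum penalty.
        return mx
    # MN + floor((MX - MN) * (MIN(Q, 40.0) / 40.0))
    return mn + math.floor((mx - mn) * (min(q, 40.0) / 40.0))
```

With the default 6,2 setting, a mismatch at a base of quality 20 costs 4, and any quality of 40 or above costs the full 6.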

alignment_options.rdg
label:

Set read gap open and extend penalties (rdg)

type:

basic:string

description:

Sets the read gap open (<int1>) and extend (<int2>) penalties. A read gap of length N gets a penalty of <int1> + N * <int2>. Default: 5,3.

required:

False

alignment_options.rfg
label:

Set reference gap open and close penalties (rfg)

type:

basic:string

description:

Sets the reference gap open (<int1>) and extend (<int2>) penalties. A reference gap of length N gets a penalty of <int1> + N * <int2>. Default: 5,3.

required:

False
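Both rdg and rfg use the same affine form, <int1> + N * <int2>. A minimal sketch of that arithmetic follows, assuming the setting is passed as the comma-separated string shown in the defaults; the helper name is hypothetical.

```python
def gap_penalty(length: int, setting: str = "5,3") -> int:
    """Penalty for a gap of the given length under an "open,extend" setting."""
    open_penalty, extend_penalty = (int(part) for part in setting.split(","))
    # <int1> + N * <int2>, as stated in the rdg and rfg descriptions.
    return open_penalty + length * extend_penalty
```

With the default 5,3, a gap of length 1 costs 8 and a gap of length 3 costs 14.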

alignment_options.score_min
label:

Minimum alignment score needed for “valid” alignment (score_min)

type:

basic:string

description:

Sets a function governing the minimum alignment score needed for an alignment to be considered “valid” (i.e. good enough to report). This is a function of read length. For instance, specifying L,0,-0.6 sets the minimum-score function to f(x) = 0 + -0.6 * x, where x is the read length. The default in --end-to-end mode is L,-0.6,-0.6 and the default in --local mode is G,20,8.

required:

False
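To make the function notation concrete, the sketch below evaluates a score_min specification for a given read length. It handles only the two function types that appear in the description: L (linear, as in the worked example) and G, which the Bowtie 2 manual defines as natural log. The helper name is hypothetical and this is not how Bowtie 2 itself parses the option.

```python
import math


def min_score(spec: str, read_length: int) -> float:
    """Evaluate a "TYPE,constant,coefficient" minimum-score function."""
    kind, constant, coefficient = spec.split(",")
    a, b = float(constant), float(coefficient)
    if kind == "L":
        # Linear: f(x) = a + b * x
        return a + b * read_length
    if kind == "G":
        # Natural log: f(x) = a + b * ln(x)
        return a + b * math.log(read_length)
    raise ValueError(f"unsupported function type in this sketch: {kind}")
```

For a 100-bp read, the end-to-end default L,-0.6,-0.6 gives a minimum score of about -60.6, and the local default G,20,8 gives about 56.8.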

start_trimming.trim_5
label:

Bases to trim from 5’

type:

basic:integer

description:

Number of bases to trim from the 5’ (left) end of each read before alignment.

default:

0

start_trimming.trim_3
label:

Bases to trim from 3’

type:

basic:integer

description:

Number of bases to trim from the 3’ (right) end of each read before alignment.

default:

0

trimming.trim_iter
label:

Iterations

type:

basic:integer

description:

Number of iterations.

default:

0

trimming.trim_nucl
label:

Bases to trim

type:

basic:integer

description:

Number of bases to trim from 3’ end in each iteration.

default:

2

reporting.rep_mode
label:

Report mode

type:

basic:string

description:

Default mode: search for multiple alignments, report the best one; -k mode: search for one or more alignments, report each; -a mode: search for and report all alignments

default:

def

choices:

  • Default mode: def

  • -k mode: k

  • -a mode (very slow): a

reporting.k_reports
label:

Number of reports (for -k mode only)

type:

basic:integer

description:

Searches for at most X distinct, valid alignments for each read. The search terminates when it can’t find more distinct valid alignments, or when it finds X, whichever happens first. Default: 5.

default:

5

output_opts.no_unal
label:

Suppress SAM records for unaligned reads

type:

basic:boolean

description:

When true, suppress SAM records for unaligned reads. Default is false.

default:

False

Output results

bam
label:

Alignment file

type:

basic:file

description:

Position sorted alignment

bai
label:

Index BAI

type:

basic:file

unmapped
label:

Unmapped reads

type:

basic:file

required:

False

stats
label:

Statistics

type:

basic:file

species
label:

Species

type:

basic:string

build
label:

Build

type:

basic:string

Bowtie2 genome index

data:index:bowtie2:bowtie2-index (data:seq:nucleotide  ref_seq)[Source: v1.2.1]

Create Bowtie2 genome index.

Input arguments

ref_seq
label:

Reference sequence (nucleotide FASTA)

type:

data:seq:nucleotide

required:

True

disabled:

False

hidden:

False

Output results

index
label:

Bowtie2 index

type:

basic:dir

required:

True

disabled:

False

hidden:

False

fastagz
label:

FASTA file (compressed)

type:

basic:file

required:

True

disabled:

False

hidden:

False

fasta
label:

FASTA file

type:

basic:file

required:

True

disabled:

False

hidden:

False

fai
label:

FASTA file index

type:

basic:file

required:

True

disabled:

False

hidden:

False

species
label:

Species

type:

basic:string

required:

True

disabled:

False

hidden:

False

build
label:

Build

type:

basic:string

required:

True

disabled:

False

hidden:

False

Calculate coverage (bamCoverage)

data:coverage:bigwig:calculate-bigwig (data:alignment:bam  alignment, data:bedpe  bedpe, basic:decimal  scale, basic:integer  bin_size)[Source: v2.0.1]

Calculate bigWig coverage track. deepTools bamCoverage takes an alignment of reads or fragments as input (BAM file) and generates a coverage track (bigWig) as output. The coverage is calculated as the number of reads per bin, where bins are short consecutive counting windows of a defined size. More information is available in the [bamCoverage documentation](https://deeptools.readthedocs.io/en/latest/content/tools/bamCoverage.html).

Input arguments

alignment
label:

Alignment BAM file

type:

data:alignment:bam

required:

True

disabled:

False

hidden:

False

bedpe
label:

BEDPE Normalization factor

type:

data:bedpe

description:

The BEDPE file describes disjoint genome features, such as structural variations or paired-end sequence alignments. It is used to estimate the scale factor [--scaleFactor].

required:

False

disabled:

False

hidden:

False

scale
label:

Scale for the normalization factor

type:

basic:decimal

description:

Magnitude of the scale factor. The scaling factor [--scaleFactor] is calculated by dividing the scale by the number of features in BEDPE (scale/(number of features)).

required:

True

disabled:

!bedpe

hidden:

False

default:

10000
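The scale factor arithmetic described for this parameter can be sketched as follows. The feature-counting rule (one feature per non-empty, non-comment line of the BEDPE file) is an assumption made for illustration, and both function names are hypothetical.

```python
from typing import Iterable


def count_bedpe_features(lines: Iterable[str]) -> int:
    """Count BEDPE records, assuming one feature per non-empty, non-comment line."""
    return sum(1 for line in lines if line.strip() and not line.startswith("#"))


def scale_factor(scale: float, n_features: int) -> float:
    """Scale factor as described above: scale / (number of features)."""
    if n_features <= 0:
        raise ValueError("BEDPE file must contain at least one feature")
    return scale / n_features
```

With the default scale of 10000 and a BEDPE file of 2,000,000 features, the resulting factor is 0.005.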

bin_size
label:

Bin size [--binSize]

type:

basic:integer

description:

Size of the bins (in bp) for the output bigWig file. A smaller bin size value will result in a higher resolution of the coverage track but also in a larger file size.

required:

True

disabled:

False

hidden:

False

default:

50
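The effect of the bin size on resolution can be illustrated with a simplified reads-per-bin count. This is a toy sketch that counts every read overlapping a bin, with reads given as half-open (start, end) intervals on one chromosome; it is not the deepTools implementation, and the function name is hypothetical.

```python
from typing import Dict, Iterable, Tuple


def reads_per_bin(reads: Iterable[Tuple[int, int]],
                  bin_size: int = 50) -> Dict[int, int]:
    """Map each bin start coordinate to the number of reads overlapping it."""
    counts: Dict[int, int] = {}
    for start, end in reads:
        first_bin = start // bin_size
        last_bin = (end - 1) // bin_size  # end is exclusive
        for bin_index in range(first_bin, last_bin + 1):
            bin_start = bin_index * bin_size
            counts[bin_start] = counts.get(bin_start, 0) + 1
    return counts
```

For the three reads (0, 36), (40, 76) and (120, 156), a bin size of 50 yields four bins with counts 2, 1, 1, 1, while a bin size of 100 collapses them into two bins with counts 2 and 1: fewer values to store, but less positional detail.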

Output results

bigwig
label:

Coverage file (bigWig)

type:

basic:file

required:

True

disabled:

False

hidden:

False

species
label:

Species

type:

basic:string

required:

True

disabled:

False

hidden:

False

build
label:

Build

type:

basic:string

required:

True

disabled:

False

hidden:

False

Cell Ranger Count

data:scexpression:10x:cellranger-count (data:screads:10x:  reads, data:genomeindex:10x:  genome_index, basic:string  chemistry, basic:integer  trim_r1, basic:integer  trim_r2, basic:integer  expected_cells, basic:integer  force_cells)[Source: v1.2.2]

Perform gene expression analysis. Generate single cell feature counts for a single library. https://support.10xgenomics.com/single-cell-gene-expression/software/pipelines/latest/using/count

Input arguments

reads
label:

10x reads data object

type:

data:screads:10x:

required:

True

disabled:

False

hidden:

False

genome_index
label:

10x genome index data object

type:

data:genomeindex:10x:

required:

True

disabled:

False

hidden:

False

chemistry
label:

Chemistry

type:

basic:string

description:

Assay configuration. By default the assay configuration is detected automatically, which is the recommended mode. You should only specify chemistry if there is an error in automatic detection.

required:

False

disabled:

False

hidden:

False

default:

auto

choices:

  • auto: auto

  • threeprime: Single Cell 3'

  • fiveprime: Single Cell 5'

  • SC3Pv1: Single Cell 3' v1

  • SC3Pv2: Single Cell 3' v2

  • SC3Pv3: Single Cell 3' v3

  • SC5P-PE: Single Cell 5' paired-end

  • SC5P-R2: Single Cell 5' R2-only

trim_r1
label:

Trim R1

type:

basic:integer

description:

Hard-trim the input R1 sequence to this length. Note that the length includes the Barcode and UMI sequences so do not set this below 26 for Single Cell 3’ v2 or Single Cell 5’. This and “Trim R2” are useful for determining the optimal read length for sequencing.

required:

False

disabled:

False

hidden:

False

trim_r2
label:

Trim R2

type:

basic:integer

description:

Hard-trim the input R2 sequence to this length.

required:

False

disabled:

False

hidden:

False

expected_cells
label:

Expected number of recovered cells

type:

basic:integer

required:

True

disabled:

False

hidden:

False

default:

3000

force_cells
label:

Force cell number

type:

basic:integer

description:

Force pipeline to use this number of cells, bypassing the cell detection algorithm. Use this if the number of cells estimated by Cell Ranger is not consistent with the barcode rank plot.

required:

False

disabled:

False

hidden:

False

Output results

matrix_filtered
label:

Matrix (filtered)

type:

basic:file

required:

True

disabled:

False

hidden:

False

genes_filtered
label:

Genes (filtered)

type:

basic:file

required:

True

disabled:

False

hidden:

False

barcodes_filtered
label:

Barcodes (filtered)

type:

basic:file

required:

True

disabled:

False

hidden:

False

matrix_raw
label:

Matrix (raw)

type:

basic:file

required:

True

disabled:

False

hidden:

False

genes_raw
label:

Genes (raw)

type:

basic:file

required:

True

disabled:

False

hidden:

False

barcodes_raw
label:

Barcodes (raw)

type:

basic:file

required:

True

disabled:

False

hidden:

False

report
label:

Report

type:

basic:file:html

required:

True

disabled:

False

hidden:

False

build
label:

Build

type:

basic:string

required:

True

disabled:

False

hidden:

False

species
label:

Species

type:

basic:string

required:

True

disabled:

False

hidden:

False

source
label:

Gene ID source

type:

basic:string

required:

True

disabled:

False

hidden:

False

Cell Ranger Mkref

data:genomeindex:10x:cellranger-mkref (data:seq:nucleotide:  genome, data:annotation:gtf:  annotation)[Source: v2.1.3]

Reference preparation tool for 10x Genomics Cell Ranger. Build a Cell Ranger-compatible reference from genome FASTA and gene GTF files. https://support.10xgenomics.com/single-cell-gene-expression/software/pipelines/latest/advanced/references

Input arguments

genome
label:

Reference genome

type:

data:seq:nucleotide:

required:

True

disabled:

False

hidden:

False

annotation
label:

Annotation

type:

data:annotation:gtf:

required:

True

disabled:

False

hidden:

False

Output results

genome_index
label:

Indexed genome

type:

basic:dir

required:

True

disabled:

False

hidden:

False

build
label:

Build

type:

basic:string

required:

True

disabled:

False

hidden:

False

species
label:

Species

type:

basic:string

required:

True

disabled:

False

hidden:

False

source
label:

Gene ID source

type:

basic:string

required:

True

disabled:

False

hidden:

False

ChIP-Seq (Gene Score)

data:chipseq:genescorechipseq-genescore (data:chipseq:peakscore  peakscore, basic:decimal  fdr, basic:decimal  pval, basic:decimal  logratio)[Source: v1.3.1]

ChIP-Seq analysis - Gene Score (BCM)

Input arguments

peakscore
label:

PeakScore file

type:

data:chipseq:peakscore

description:

PeakScore file

fdr
label:

FDR threshold

type:

basic:decimal

description:

FDR threshold value (default = 0.00005).

default:

5e-05

pval
label:

Pval threshold

type:

basic:decimal

description:

Pval threshold value (default = 0.00005).

default:

5e-05

logratio
label:

Log-ratio threshold

type:

basic:decimal

description:

Log-ratio threshold value (default = 2).

default:

2.0

Output results

genescore
label:

Gene Score

type:

basic:file

ChIP-Seq (Peak Score)

data:chipseq:peakscorechipseq-peakscore (data:chipseq:callpeak:macs2  peaks, data:bed  bed)[Source: v2.3.1]

ChIP-Seq analysis - Peak Score (BCM)

Input arguments

peaks
label:

MACS2 results

type:

data:chipseq:callpeak:macs2

description:

MACS2 results file (NarrowPeak)

bed
label:

BED file

type:

data:bed

Output results

peak_score
label:

Peak Score

type:

basic:file

ChIP-seq (MACS2)

data:chipseq:batch:macs2macs2-batch (list:data:alignment:bam  alignments, data:bed  promoter, basic:boolean  tagalign, basic:integer  q_threshold, basic:integer  n_sub, basic:boolean  tn5, basic:integer  shift, basic:string  duplicates, basic:string  duplicates_prepeak, basic:decimal  qvalue, basic:decimal  pvalue, basic:decimal  pvalue_prepeak, basic:integer  cap_num, basic:integer  mfold_lower, basic:integer  mfold_upper, basic:integer  slocal, basic:integer  llocal, basic:integer  extsize, basic:integer  shift, basic:integer  band_width, basic:boolean  nolambda, basic:boolean  fix_bimodal, basic:boolean  nomodel, basic:boolean  nomodel_prepeak, basic:boolean  down_sample, basic:boolean  bedgraph, basic:boolean  spmr, basic:boolean  call_summits, basic:boolean  broad, basic:decimal  broad_cutoff, data:bed  blacklist, basic:boolean  calculate_enrichment, basic:integer  profile_window, basic:string  shift_size)[Source: v1.5.1]

This process runs MACS2 in batch mode. MACS2 analysis is triggered for pairs of samples as defined using treatment-background sample relations. If there are no sample relations defined, each sample is treated individually for the MACS analysis. Model-based Analysis of ChIP-Seq (MACS 2.0) is used to identify transcription factor binding sites. MACS 2.0 captures the influence of genome complexity to evaluate the significance of enriched ChIP regions, and MACS improves the spatial resolution of binding sites by combining the information of both sequencing tag position and orientation. It also has an option to link nearby peaks together in order to call broad peaks. See [here](https://github.com/taoliu/MACS/) for more information. In addition to peak-calling, this process computes ChIP-Seq and ATAC-Seq QC metrics. The process returns a QC metrics report, fragment length estimation, and a deduplicated tagAlign file. The QC report contains ENCODE 3 proposed QC metrics: [NRF](https://www.encodeproject.org/data-standards/terms/), [PBC bottlenecking coefficients, NSC, and RSC](https://genome.ucsc.edu/ENCODE/qualityMetrics.html#chipSeq).

Input arguments

alignments
label:

Aligned reads

type:

list:data:alignment:bam

description:

Select multiple treatment/background samples.

promoter
label:

Promoter regions BED file

type:

data:bed

description:

BED file containing promoter regions (for example, TSS ± 1000 bp). Needed to get the number of peaks and reads mapped to promoter regions.

required:

False

tagalign
label:

Use tagAlign files

type:

basic:boolean

description:

Use filtered tagAlign files as case (treatment) and control (background) samples. If extsize parameter is not set, run MACS using input’s estimated fragment length.

default:

True

prepeakqc_settings.q_threshold
label:

Quality filtering threshold

type:

basic:integer

default:

30

prepeakqc_settings.n_sub
label:

Number of reads to subsample

type:

basic:integer

default:

15000000

prepeakqc_settings.tn5
label:

Tn5 shifting

type:

basic:boolean

description:

Tn5 transposon shifting. Shift reads on “+” strand by 4 bp and reads on “-” strand by 5 bp.

default:

False
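The Tn5 shift can be illustrated on a single tagAlign-style record. The description above gives only the shift sizes, so the direction used here is an assumption: the start of a “+” strand read is moved 4 bp downstream and the end of a “-” strand read is moved 5 bp upstream, so that each 5’ end moves toward the center of the transposase insertion site. The function name is hypothetical.

```python
from typing import Tuple


def tn5_shift(start: int, end: int, strand: str) -> Tuple[int, int]:
    """Shift one read interval to account for the Tn5 insertion offset.

    Assumed convention (see lead-in): "+" reads get start + 4,
    "-" reads get end - 5; other coordinates are left unchanged.
    """
    if strand == "+":
        return start + 4, end
    if strand == "-":
        return start, end - 5
    raise ValueError(f"unknown strand: {strand!r}")
```

Under this convention, a “+” read at 100-136 becomes 104-136 and a “-” read at 100-136 becomes 100-131.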

prepeakqc_settings.shift
label:

User-defined cross-correlation peak strandshift

type:

basic:integer

description:

If defined, SPP tool will not try to estimate fragment length but will use the given value as fragment length.

required:

False

settings.duplicates
label:

Number of duplicates

type:

basic:string

description:

It controls the MACS behavior towards duplicate tags at the exact same location, i.e. the same coordinates and the same strand. The ‘auto’ option makes MACS calculate the maximum number of tags at the exact same location based on a binomial distribution using 1e-5 as the p-value cutoff, and the ‘all’ option keeps all the tags. If an integer is given, at most this number of tags will be kept at the same location. The default is to keep one tag at the same location.

required:

False

hidden:

tagalign

choices:

  • 1: 1

  • auto: auto

  • all: all
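The integer and ‘all’ choices can be sketched as a simple filter over tags, where a tag is identified by chromosome, position and strand. This is an illustration of the rule as described, not MACS code; the function name and the tuple representation are hypothetical, and the ‘auto’ choice is deliberately left out because its binomial threshold is not specified in enough detail here.

```python
from collections import Counter
from typing import Iterable, List, Tuple

Tag = Tuple[str, int, str]  # (chromosome, position, strand)


def filter_duplicates(tags: Iterable[Tag], keep: str = "1") -> List[Tag]:
    """Keep at most `keep` tags per exact location and strand."""
    if keep == "all":
        return list(tags)
    if keep == "auto":
        raise NotImplementedError("binomial-based limit is not sketched here")
    limit = int(keep)
    seen: Counter = Counter()
    kept: List[Tag] = []
    for tag in tags:
        # Tags on opposite strands at the same position are not duplicates.
        if seen[tag] < limit:
            seen[tag] += 1
            kept.append(tag)
    return kept
```

Given three identical “+” tags and one “-” tag at the same position, keep="1" retains two tags, keep="2" retains three, and keep="all" retains all four.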

settings.duplicates_prepeak
label:

Number of duplicates

type:

basic:string

description:

It controls the MACS behavior towards duplicate tags at the exact same location, i.e. the same coordinates and the same strand. The ‘auto’ option makes MACS calculate the maximum number of tags at the exact same location based on a binomial distribution using 1e-5 as the p-value cutoff, and the ‘all’ option keeps all the tags. If an integer is given, at most this number of tags will be kept at the same location. The default is to keep one tag at the same location.

required:

False

hidden:

!tagalign

default:

all

choices:

  • 1: 1

  • auto: auto

  • all: all

settings.qvalue
label:

Q-value cutoff

type:

basic:decimal

description:

The q-value (minimum FDR) cutoff to call significant regions. Q-values are calculated from p-values using Benjamini-Hochberg procedure.

required:

False

disabled:

settings.pvalue && settings.pvalue_prepeak
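For readers unfamiliar with the Benjamini-Hochberg procedure mentioned in the q-value description, the sketch below shows the generic textbook adjustment of a list of p-values. It illustrates the procedure only; it is not MACS2's internal q-value computation, and the function name is hypothetical.

```python
from typing import List, Sequence


def benjamini_hochberg(pvalues: Sequence[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted p-values (q-values), in input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonic q-values.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, pvalues[index] * m / rank)
        qvalues[index] = running_min
    return qvalues
```

A region would then be called significant when its q-value is at or below the chosen cutoff.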

settings.pvalue
label:

P-value cutoff

type:

basic:decimal

description:

The p-value cutoff. If specified, MACS2 will use p-value instead of q-value cutoff.

required:

False

disabled:

settings.qvalue

hidden:

tagalign

settings.pvalue_prepeak
label:

P-value cutoff

type:

basic:decimal

description:

The p-value cutoff. If specified, MACS2 will use p-value instead of q-value cutoff.

disabled:

settings.qvalue

hidden:

!tagalign || settings.qvalue

default:

1e-05

settings.cap_num
label:

Cap number of peaks by taking top N peaks

type:

basic:integer

description:

To keep all peaks set value to 0.

disabled:

settings.broad

default:

500000

settings.mfold_lower
label:

MFOLD range (lower limit)

type:

basic:integer

description:

This parameter is used to select the regions within the MFOLD range of high-confidence enrichment ratio against background to build the model. The regions must be lower than the upper limit and higher than the lower limit of fold enrichment. The default of 10,30 means using all regions not too low (>10) and not too high (<30) to build the paired-peaks model. If MACS cannot find more than 100 regions to build the model, it will use the --extsize parameter to continue the peak detection ONLY if --fix-bimodal is set.

required:

False

settings.mfold_upper
label:

MFOLD range (upper limit)

type:

basic:integer

description:

This parameter is used to select the regions within the MFOLD range of high-confidence enrichment ratio against background to build the model. The regions must be lower than the upper limit and higher than the lower limit of fold enrichment. The default of 10,30 means using all regions not too low (>10) and not too high (<30) to build the paired-peaks model. If MACS cannot find more than 100 regions to build the model, it will use the --extsize parameter to continue the peak detection ONLY if --fix-bimodal is set.

required:

False

settings.slocal
label:

Small local region

type:

basic:integer

description:

The slocal and llocal parameters control which two levels of regions around the peak regions are checked to calculate the maximum lambda as the local lambda. By default, MACS considers 1000 bp for the small local region (--slocal) and 10000 bp for the large local region (--llocal), which captures the bias from a long-range effect like an open chromatin domain. You can tweak these according to your project. Remember that if the region is set too small, a sharp spike in the input data may kill a significant peak.

required:

False

settings.llocal
label:

Large local region

type:

basic:integer

description:

The slocal and llocal parameters control which two levels of regions around the peak regions are checked to calculate the maximum lambda as the local lambda. By default, MACS considers 1000 bp for the small local region (--slocal) and 10000 bp for the large local region (--llocal), which captures the bias from a long-range effect like an open chromatin domain. You can tweak these according to your project. Remember that if the region is set too small, a sharp spike in the input data may kill a significant peak.

required:

False
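
The "maximum lambda" idea can be sketched as below. This is a simplified illustration under stated assumptions: the helper names are hypothetical, rates are expressed as control reads per bp, and the scaling MACS applies between treatment and control is left out.

```python
def window_rate(read_count, window_bp):
    """Average number of control reads per bp inside a window."""
    return read_count / window_bp


def local_lambda(lambda_bg, count_small, count_large, slocal=1000, llocal=10000):
    """Take the largest of the background, small-window and large-window rates."""
    return max(
        lambda_bg,
        window_rate(count_small, slocal),
        window_rate(count_large, llocal),
    )


# A spike of 50 control reads dominates when the small window is 1000 bp ...
narrow = local_lambda(0.01, count_small=50, count_large=120)
# ... and is diluted when the same reads are spread over a 5000 bp window.
wide = local_lambda(0.01, count_small=50, count_large=120, slocal=5000)
```

The comparison shows why a very small window lets a sharp spike in the input raise the local lambda and suppress an otherwise significant peak.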

settings.extsize
label:

extsize

type:

basic:integer

description:

When ‘--nomodel’ is set, MACS uses this parameter to extend reads in the 5’->3’ direction to fixed-size fragments. For example, if the size of the binding region for your transcription factor is 200 bp, and you want to bypass the model building by MACS, this parameter can be set to 200. This option is only valid when --nomodel is set or when MACS fails to build the model and --fix-bimodal is on.

required:

False

settings.shift
label:

Shift

type:

basic:integer

description:

Note, this is NOT the legacy --shiftsize option, which has been replaced by --extsize! You can set an arbitrary shift in bp here. Please use discretion when setting it to a value other than the default (0). When --nomodel is set, MACS will use this value to move cutting ends (5’) and then apply --extsize in the 5’ to 3’ direction to extend them to fragments. When this value is negative, ends will be moved in the 3’->5’ direction, otherwise in the 5’->3’ direction. It is recommended to keep the default of 0 for ChIP-Seq datasets, or to use -1 * half of EXTSIZE together with the --extsize option for detecting enriched cutting loci, such as in certain DNaseI-Seq datasets. Note that you can’t set values other than 0 if the format is BAMPE for paired-end data. Default is 0.

required:

False
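
The interplay of shift and extsize described above can be sketched as follows. This is an illustrative reading of the description, not MACS source code; the `fragment` function and its coordinate convention (the 5’ end given as a single position) are assumptions.

```python
def fragment(five_prime, strand, shift=0, extsize=200):
    """Move the 5' cutting end by `shift`, then extend by `extsize` toward 3'.

    On the "+" strand the 5'->3' direction is increasing coordinates,
    on the "-" strand it is decreasing coordinates.
    """
    if strand == "+":
        start = five_prime + shift
        return start, start + extsize
    if strand == "-":
        end = five_prime - shift
        return end - extsize, end
    raise ValueError("strand must be '+' or '-'")


# With shift = -1 * half of extsize, the fragment is centered on the cut site.
plus = fragment(1000, "+", shift=-100, extsize=200)
minus = fragment(1000, "-", shift=-100, extsize=200)
```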

settings.band_width
label:

Band width

type:

basic:integer

description:

The band width which is used to scan the genome ONLY for model building. You can set this parameter as the sonication fragment size expected from wet experiment. The previous side effect on the peak detection process has been removed. So this parameter only affects the model building.

required:

False

settings.nolambda
label:

Use background lambda as local lambda

type:

basic:boolean

description:

With this flag on, MACS will use the background lambda as local lambda. This means MACS will not consider the local bias at peak candidate regions.

default:

False

settings.fix_bimodal
label:

Turn on the auto paired-peak model process

type:

basic:boolean

description:

Turn on the auto paired-peak model process. If set, when MACS fails to build the paired model, it will use the nomodel settings and the ‘--extsize’ parameter to extend each tag. If not set, MACS will be terminated if the paired-peak model fails.

default:

False

settings.nomodel
label:

Bypass building the shifting model

type:

basic:boolean

description:

While on, MACS will bypass building the shifting model.

hidden:

tagalign

default:

False

settings.nomodel_prepeak
label:

Bypass building the shifting model

type:

basic:boolean

description:

While on, MACS will bypass building the shifting model.

hidden:

!tagalign

default:

True

settings.down_sample
label:

Down-sample

type:

basic:boolean

description:

When set to true, random sampling method will scale down the bigger sample. By default, MACS uses linear scaling. This option will make the results unstable and irreproducible since each time, random reads would be selected, especially the numbers (pileup, pvalue, qvalue) would change.

default:

False

settings.bedgraph
label:

Save fragment pileup and control lambda

type:

basic:boolean

description:

If this flag is on, MACS will store the fragment pileup, control lambda, -log10pvalue and -log10qvalue scores in bedGraph files. The bedGraph files will be stored in current directory named NAME+’_treat_pileup.bdg’ for treatment data, NAME+’_control_lambda.bdg’ for local lambda values from control, NAME+’_treat_pvalue.bdg’ for Poisson pvalue scores (in -log10(pvalue) form), and NAME+’_treat_qvalue.bdg’ for q-value scores from Benjamini-Hochberg-Yekutieli procedure.

default:

True

settings.spmr
label:

Save signal per million reads for fragment pileup profiles

type:

basic:boolean

disabled:

settings.bedgraph === false

default:

True

settings.call_summits
label:

Call summits

type:

basic:boolean

description:

MACS will now reanalyze the shape of signal profile (p or q-score depending on cutoff setting) to deconvolve subpeaks within each peak called from general procedure. It’s highly recommended to detect adjacent binding events. While used, the output subpeaks of a big peak region will have the same peak boundaries, and different scores and peak summit positions.

default:

False

settings.broad
label:

Composite broad regions

type:

basic:boolean

description:

When this flag is on, MACS will try to composite broad regions in BED12 (a gene-model-like format) by putting nearby highly enriched regions into a broad region with a loose cutoff. The broad region is controlled by another cutoff through --broad-cutoff. The maximum length of a broad region is 4 times d from MACS.

disabled:

settings.call_summits === true

default:

False

settings.broad_cutoff
label:

Broad cutoff

type:

basic:decimal

description:

Cutoff for broad region. This option is not available unless --broad is set. If -p is set, this is a p-value cutoff; otherwise, it’s a q-value cutoff. DEFAULT = 0.1

required:

False

disabled:

settings.call_summits === true || settings.broad !== true

chipqc_settings.blacklist
label:

Blacklist regions

type:

data:bed

description:

BED file containing genomic regions that should be excluded from the analysis.

required:

False

chipqc_settings.calculate_enrichment
label:

Calculate enrichment

type:

basic:boolean

description:

Calculate enrichment of signal in known genomic annotation. By default, annotation is provided from the TranscriptDB package specified by genome build, which should match one of the supported annotations (hg19, hg38, hg18, mm10, mm9, rn4, ce6, dm3). If the annotation is not supported, the analysis is skipped.

default:

False

chipqc_settings.profile_window
label:

Window size

type:

basic:integer

description:

An integer indicating the width of the window used for peak profiles. Peaks will be centered on their summits and include half of the window size upstream and half downstream of this point.

default:

400
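
The window placement can be illustrated with a small sketch. The `profile_window_bounds` helper is hypothetical, and clamping the start at 0 is an added assumption for summits near the start of a chromosome.

```python
def profile_window_bounds(summit, profile_window=400):
    """Center a window of `profile_window` bp on the peak summit."""
    half = profile_window // 2
    # Clamp at 0 so windows near the chromosome start stay valid (assumption).
    return max(0, summit - half), summit + half


bounds = profile_window_bounds(1000)
```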

chipqc_settings.shift_size
label:

Shift size

type:

basic:string

description:

Vector of values to try when computing optimal shift sizes. It should be specified as a vector of consecutive numbers in the form start:end.

default:

1:300
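
To make the start:end notation concrete, the sketch below expands such a string into the list of shift sizes it stands for. The `parse_shift_size` helper is hypothetical; it assumes the range is inclusive at both ends, as with the `:` operator in R, the language ChIPQC is written in.

```python
def parse_shift_size(shift_size="1:300"):
    """Expand a 'start:end' string into consecutive integers, both ends included."""
    start_text, end_text = shift_size.split(":")
    start, end = int(start_text), int(end_text)
    if start > end:
        raise ValueError("start must not exceed end")
    return list(range(start, end + 1))


shifts = parse_shift_size("1:300")
```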

Output results

ChIP-seq (MACS2-ROSE2)

data:chipseq:batch:macs2macs2-rose2-batch (list:data:alignment:bam  alignments, data:bed  promoter, basic:boolean  tagalign, basic:integer  q_threshold, basic:integer  n_sub, basic:boolean  tn5, basic:integer  shift, basic:string  duplicates, basic:string  duplicates_prepeak, basic:decimal  qvalue, basic:decimal  pvalue, basic:decimal  pvalue_prepeak, basic:integer  cap_num, basic:integer  mfold_lower, basic:integer  mfold_upper, basic:integer  slocal, basic:integer  llocal, basic:integer  extsize, basic:integer  shift, basic:integer  band_width, basic:boolean  nolambda, basic:boolean  fix_bimodal, basic:boolean  nomodel, basic:boolean  nomodel_prepeak, basic:boolean  down_sample, basic:boolean  bedgraph, basic:boolean  spmr, basic:boolean  call_summits, basic:boolean  broad, basic:decimal  broad_cutoff, basic:boolean  use_filtered_bam, basic:integer  tss, basic:integer  stitch, data:bed  mask, data:bed  blacklist, basic:boolean  calculate_enrichment, basic:integer  profile_window, basic:string  shift_size)[Source: v1.5.1]

This process runs MACS2 in batch mode. MACS2 analysis is triggered for pairs of samples as defined using treatment-background sample relations. If there are no sample relations defined, each sample is treated individually for the MACS analysis. Model-based Analysis of ChIP-Seq (MACS 2.0) is used to identify transcription factor binding sites. MACS 2.0 captures the influence of genome complexity to evaluate the significance of enriched ChIP regions, and MACS improves the spatial resolution of binding sites by combining the information of both sequencing tag position and orientation. It also has an option to link nearby peaks together in order to call broad peaks. See [here](https://github.com/taoliu/MACS/) for more information. In addition to peak-calling, this process computes ChIP-Seq and ATAC-Seq QC metrics. The process returns a QC metrics report, fragment length estimation, and a deduplicated tagAlign file. The QC report contains ENCODE 3 proposed QC metrics – [NRF](https://www.encodeproject.org/data-standards/terms/), [PBC bottlenecking coefficients, NSC, and RSC](https://genome.ucsc.edu/ENCODE/qualityMetrics.html#chipSeq). For identification of super enhancers, R2 uses the Rank Ordering of Super-Enhancers algorithm (ROSE2). This takes the peaks called by RSEG for acetylation and calculates the distances in-between to judge whether they can be considered super-enhancers. The ranked values can be plotted, and by locating the inflection point in the resulting graph, super-enhancers can be assigned. It can also be used with the MACS calculated data. See [here](http://younglab.wi.mit.edu/super_enhancer_code.html) for more information.

Input arguments

alignments
label:

Aligned reads

type:

list:data:alignment:bam

description:

Select multiple treatment/background samples.

promoter
label:

Promoter regions BED file

type:

data:bed

description:

BED file containing promoter regions (TSS ± 1000 bp, for example). Needed to get the number of peaks and reads mapped to promoter regions.

required:

False

tagalign
label:

Use tagAlign files

type:

basic:boolean

description:

Use filtered tagAlign files as case (treatment) and control (background) samples. If extsize parameter is not set, run MACS using input’s estimated fragment length.

default:

True

prepeakqc_settings.q_threshold
label:

Quality filtering threshold

type:

basic:integer

default:

30

prepeakqc_settings.n_sub
label:

Number of reads to subsample

type:

basic:integer

default:

15000000

prepeakqc_settings.tn5
label:

Tn5 shifting

type:

basic:boolean

description:

Tn5 transposon shifting. Shift reads on “+” strand by 4 bp and reads on “-” strand by 5 bp.

default:

False
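
A minimal sketch of the Tn5 shift follows. It assumes the commonly used convention in which "+" strand reads move 4 bp toward higher coordinates and "-" strand reads move 5 bp toward lower coordinates, and it shifts the whole read interval; the `tn5_shift` helper is hypothetical and the process may implement the shift differently.

```python
def tn5_shift(start, end, strand):
    """Shift a read interval to account for the Tn5 insertion offset."""
    if strand == "+":
        return start + 4, end + 4
    if strand == "-":
        return start - 5, end - 5
    raise ValueError("strand must be '+' or '-'")


shifted_plus = tn5_shift(100, 150, "+")
shifted_minus = tn5_shift(100, 150, "-")
```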

prepeakqc_settings.shift
label:

User-defined cross-correlation peak strandshift

type:

basic:integer

description:

If defined, SPP tool will not try to estimate fragment length but will use the given value as fragment length.

required:

False

settings.duplicates
label:

Number of duplicates

type:

basic:string

description:

It controls the MACS behavior towards duplicate tags at the exact same location – the same coordinates and the same strand. The ‘auto’ option makes MACS calculate the maximum tags at the exact same location based on binomial distribution using 1e-5 as pvalue cutoff and the ‘all’ option keeps all the tags. If an integer is given, at most this number of tags will be kept at the same location. The default is to keep one tag at the same location.

required:

False

hidden:

tagalign

choices:

  • 1: 1

  • auto: auto

  • all: all
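
The integer and ‘all’ choices can be sketched as below. The `filter_duplicates` helper and the `(chrom, position, strand)` tuple layout are assumptions made for illustration; the ‘auto’ choice is deliberately left out because it needs the binomial calculation performed by MACS.

```python
from collections import Counter


def filter_duplicates(tags, keep="1"):
    """Keep at most `keep` tags per (chrom, position, strand); 'all' keeps everything."""
    if keep == "all":
        return list(tags)
    if keep == "auto":
        raise NotImplementedError("'auto' requires the binomial estimate done by MACS")
    limit = int(keep)
    seen = Counter()
    kept = []
    for tag in tags:
        seen[tag] += 1
        if seen[tag] <= limit:
            kept.append(tag)
    return kept


tags = [("chr1", 100, "+")] * 3 + [("chr1", 100, "-")]
deduplicated = filter_duplicates(tags, keep="1")
```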

settings.duplicates_prepeak
label:

Number of duplicates

type:

basic:string

description:

It controls the MACS behavior towards duplicate tags at the exact same location – the same coordinates and the same strand. The ‘auto’ option makes MACS calculate the maximum tags at the exact same location based on binomial distribution using 1e-5 as pvalue cutoff and the ‘all’ option keeps all the tags. If an integer is given, at most this number of tags will be kept at the same location. The default is to keep one tag at the same location.

required:

False

hidden:

!tagalign

default:

all

choices:

  • 1: 1

  • auto: auto

  • all: all

settings.qvalue
label:

Q-value cutoff

type:

basic:decimal

description:

The q-value (minimum FDR) cutoff to call significant regions. Q-values are calculated from p-values using Benjamini-Hochberg procedure.

required:

False

disabled:

settings.pvalue && settings.pvalue_prepeak
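
For readers unfamiliar with the Benjamini-Hochberg procedure mentioned in the Q-value description, here is a generic textbook sketch. It is not the MACS implementation, and the `benjamini_hochberg` name is hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg q-values in the same order as the input p-values."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotone q-values.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, pvalues[index] * m / rank)
        qvalues[index] = running_min
    return qvalues


qvalues = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
```

A region would then be called significant when its q-value does not exceed the chosen cutoff.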

settings.pvalue
label:

P-value cutoff

type:

basic:decimal

description:

The p-value cutoff. If specified, MACS2 will use p-value instead of q-value cutoff.

required:

False

disabled:

settings.qvalue

hidden:

tagalign

settings.pvalue_prepeak
label:

P-value cutoff

type:

basic:decimal

description:

The p-value cutoff. If specified, MACS2 will use p-value instead of q-value cutoff.

disabled:

settings.qvalue

hidden:

!tagalign || settings.qvalue

default:

1e-05

settings.cap_num
label:

Cap number of peaks by taking top N peaks

type:

basic:integer

description:

To keep all peaks set value to 0.

disabled:

settings.broad

default:

500000

settings.mfold_lower
label:

MFOLD range (lower limit)

type:

basic:integer

description:

This parameter is used to select regions within the MFOLD range of high-confidence enrichment ratio against background to build the model. The fold enrichment of the regions must be lower than the upper limit and higher than the lower limit. DEFAULT:10,30 means using all regions not too low (>10) and not too high (<30) to build the paired-peaks model. If MACS cannot find more than 100 regions to build the model, it will use the --extsize parameter to continue the peak detection ONLY if --fix-bimodal is set.

required:

False

settings.mfold_upper
label:

MFOLD range (upper limit)

type:

basic:integer

description:

This parameter is used to select regions within the MFOLD range of high-confidence enrichment ratio against background to build the model. The fold enrichment of the regions must be lower than the upper limit and higher than the lower limit. DEFAULT:10,30 means using all regions not too low (>10) and not too high (<30) to build the paired-peaks model. If MACS cannot find more than 100 regions to build the model, it will use the --extsize parameter to continue the peak detection ONLY if --fix-bimodal is set.

required:

False

settings.slocal
label:

Small local region

type:

basic:integer

description:

The slocal and llocal parameters control which two levels of regions around the peak regions are checked to calculate the maximum lambda as the local lambda. By default, MACS considers 1000 bp for the small local region (--slocal) and 10000 bp for the large local region (--llocal), which captures the bias from a long-range effect like an open chromatin domain. You can tweak these according to your project. Remember that if the region is set too small, a sharp spike in the input data may kill a significant peak.

required:

False

settings.llocal
label:

Large local region

type:

basic:integer

description:

The slocal and llocal parameters control which two levels of regions around the peak regions are checked to calculate the maximum lambda as the local lambda. By default, MACS considers 1000 bp for the small local region (--slocal) and 10000 bp for the large local region (--llocal), which captures the bias from a long-range effect like an open chromatin domain. You can tweak these according to your project. Remember that if the region is set too small, a sharp spike in the input data may kill a significant peak.

required:

False

settings.extsize
label:

extsize

type:

basic:integer

description:

When ‘--nomodel’ is set, MACS uses this parameter to extend reads in the 5’->3’ direction to fixed-size fragments. For example, if the size of the binding region for your transcription factor is 200 bp, and you want to bypass the model building by MACS, this parameter can be set to 200. This option is only valid when --nomodel is set or when MACS fails to build the model and --fix-bimodal is on.

required:

False

settings.shift
label:

Shift

type:

basic:integer

description:

Note, this is NOT the legacy --shiftsize option, which has been replaced by --extsize! You can set an arbitrary shift in bp here. Please use discretion when setting it to a value other than the default (0). When --nomodel is set, MACS will use this value to move cutting ends (5’) and then apply --extsize in the 5’ to 3’ direction to extend them to fragments. When this value is negative, ends will be moved in the 3’->5’ direction, otherwise in the 5’->3’ direction. It is recommended to keep the default of 0 for ChIP-Seq datasets, or to use -1 * half of EXTSIZE together with the --extsize option for detecting enriched cutting loci, such as in certain DNaseI-Seq datasets. Note that you can’t set values other than 0 if the format is BAMPE for paired-end data. Default is 0.

required:

False

settings.band_width
label:

Band width

type:

basic:integer

description:

The band width which is used to scan the genome ONLY for model building. You can set this parameter as the sonication fragment size expected from wet experiment. The previous side effect on the peak detection process has been removed. So this parameter only affects the model building.

required:

False

settings.nolambda
label:

Use background lambda as local lambda

type:

basic:boolean

description:

With this flag on, MACS will use the background lambda as local lambda. This means MACS will not consider the local bias at peak candidate regions.

default:

False

settings.fix_bimodal
label:

Turn on the auto paired-peak model process

type:

basic:boolean

description:

Turn on the auto paired-peak model process. If set, when MACS fails to build the paired model, it will use the nomodel settings and the ‘--extsize’ parameter to extend each tag. If not set, MACS will be terminated if the paired-peak model fails.

default:

False

settings.nomodel
label:

Bypass building the shifting model

type:

basic:boolean

description:

While on, MACS will bypass building the shifting model.

hidden:

tagalign

default:

False

settings.nomodel_prepeak
label:

Bypass building the shifting model

type:

basic:boolean

description:

While on, MACS will bypass building the shifting model.

hidden:

!tagalign

default:

True

settings.down_sample
label:

Down-sample

type:

basic:boolean

description:

When set to true, random sampling method will scale down the bigger sample. By default, MACS uses linear scaling. This option will make the results unstable and irreproducible since each time, random reads would be selected, especially the numbers (pileup, pvalue, qvalue) would change.

default:

False

settings.bedgraph
label:

Save fragment pileup and control lambda

type:

basic:boolean

description:

If this flag is on, MACS will store the fragment pileup, control lambda, -log10pvalue and -log10qvalue scores in bedGraph files. The bedGraph files will be stored in current directory named NAME+’_treat_pileup.bdg’ for treatment data, NAME+’_control_lambda.bdg’ for local lambda values from control, NAME+’_treat_pvalue.bdg’ for Poisson pvalue scores (in -log10(pvalue) form), and NAME+’_treat_qvalue.bdg’ for q-value scores from Benjamini-Hochberg-Yekutieli procedure.

default:

True

settings.spmr
label:

Save signal per million reads for fragment pileup profiles

type:

basic:boolean

disabled:

settings.bedgraph === false

default:

True

settings.call_summits
label:

Call summits

type:

basic:boolean

description:

MACS will now reanalyze the shape of signal profile (p or q-score depending on cutoff setting) to deconvolve subpeaks within each peak called from general procedure. It’s highly recommended to detect adjacent binding events. While used, the output subpeaks of a big peak region will have the same peak boundaries, and different scores and peak summit positions.

default:

False

settings.broad
label:

Composite broad regions

type:

basic:boolean

description:

When this flag is on, MACS will try to composite broad regions in BED12 (a gene-model-like format) by putting nearby highly enriched regions into a broad region with a loose cutoff. The broad region is controlled by another cutoff through --broad-cutoff. The maximum length of a broad region is 4 times d from MACS.

disabled:

settings.call_summits === true

default:

False

settings.broad_cutoff
label:

Broad cutoff

type:

basic:decimal

description:

Cutoff for broad region. This option is not available unless --broad is set. If -p is set, this is a p-value cutoff; otherwise, it’s a q-value cutoff. DEFAULT = 0.1

required:

False

disabled:

settings.call_summits === true || settings.broad !== true
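
The sketch below shows one way a few of these settings could translate into a `macs2 callpeak` argument list, mirroring the `disabled` conditions listed for the cutoff and broad-region fields. It is an assumption about how such a mapping might look, not the process's actual code; the `macs2_args` helper and the settings dictionary are hypothetical, and input files are omitted.

```python
def macs2_args(settings):
    """Build a partial `macs2 callpeak` argument list from a settings dict."""
    args = ["macs2", "callpeak"]
    # A p-value cutoff takes the place of the q-value cutoff.
    if settings.get("pvalue") is not None:
        args += ["-p", str(settings["pvalue"])]
    elif settings.get("qvalue") is not None:
        args += ["-q", str(settings["qvalue"])]
    if settings.get("nomodel"):
        args.append("--nomodel")
    if settings.get("extsize") is not None:
        args += ["--extsize", str(settings["extsize"])]
    # Broad-region options are skipped whenever summit calling is enabled.
    if settings.get("call_summits"):
        args.append("--call-summits")
    elif settings.get("broad"):
        args.append("--broad")
        if settings.get("broad_cutoff") is not None:
            args += ["--broad-cutoff", str(settings["broad_cutoff"])]
    return args


broad_args = macs2_args({"qvalue": 0.05, "broad": True, "broad_cutoff": 0.1})
```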

rose_settings.use_filtered_bam
label:

Use Filtered BAM File

type:

basic:boolean

description:

Use filtered BAM file from a MACS2 object to rank enhancers by.

default:

True

rose_settings.tss
label:

TSS exclusion

type:

basic:integer

description:

Enter a distance from TSS to exclude. 0 = no TSS exclusion

default:

0

rose_settings.stitch
label:

Stitch

type:

basic:integer

description:

Enter a max linking distance for stitching. If not given, optimal stitching parameter will be determined automatically.

required:

False

rose_settings.mask
label:

Masking BED file

type:

data:bed

description:

Mask a set of regions from analysis. Provide a BED of masking regions.

required:

False

chipqc_settings.blacklist
label:

Blacklist regions

type:

data:bed

description:

BED file containing genomic regions that should be excluded from the analysis.

required:

False

chipqc_settings.calculate_enrichment
label:

Calculate enrichment

type:

basic:boolean

description:

Calculate enrichment of signal in known genomic annotation. By default, annotation is provided from the TranscriptDB package specified by genome build, which should match one of the supported annotations (hg19, hg38, hg18, mm10, mm9, rn4, ce6, dm3). If the annotation is not supported, the analysis is skipped.

default:

False

chipqc_settings.profile_window
label:

Window size

type:

basic:integer

description:

An integer indicating the width of the window used for peak profiles. Peaks will be centered on their summits and include half of the window size upstream and half downstream of this point.

default:

400

chipqc_settings.shift_size
label:

Shift size

type:

basic:string

description:

Vector of values to try when computing optimal shift sizes. It should be specified as a vector of consecutive numbers in the form start:end.

default:

1:300

Output results

Chemical Mutagenesis

data:workflow:chemutworkflow-chemut (basic:string  analysis_type, data:seq:nucleotide  genome, list:data:alignment:bam  parental_strains, list:data:alignment:bam  mutant_strains, basic:boolean  base_recalibration, data:variants:vcf  known_sites, list:data:variants:vcf  known_indels, basic:integer  stand_call_conf, basic:integer  mbq, basic:integer  read_depth)[Source: v2.1.0]

Input arguments

analysis_type
label:

Analysis type

type:

basic:string

description:

Choice of the analysis type. Use “SNV” or “INDEL” options to run the GATK analysis only on the haploid portion of the dicty genome. Choose options SNV_CHR2 or INDEL_CHR2 to run the analysis only on the diploid portion of CHR2 (-ploidy 2 -L chr2:2263132-3015703).

default:

snv

choices:

  • SNV: snv

  • INDEL: indel

  • SNV_CHR2: snv_chr2

  • INDEL_CHR2: indel_chr2

genome
label:

Reference genome

type:

data:seq:nucleotide

parental_strains
label:

Parental strains

type:

list:data:alignment:bam

mutant_strains
label:

Mutant strains

type:

list:data:alignment:bam

Vc.base_recalibration
label:

Do variant base recalibration

type:

basic:boolean

default:

False

Vc.known_sites
label:

Known sites (dbSNP)

type:

data:variants:vcf

required:

False

Vc.known_indels
label:

Known indels

type:

list:data:variants:vcf

required:

False

hidden:

Vc.base_recalibration === false

Vc.stand_call_conf
label:

Calling confidence threshold

type:

basic:integer

description:

The minimum confidence threshold (phred-scaled) at which the program should emit variant sites as called. If a site’s associated genotype has a confidence score lower than the calling threshold, the program will emit the site as filtered and will annotate it as LowQual. This threshold separates high confidence calls from low confidence calls.

default:

30

Vc.mbq
label:

Min base quality

type:

basic:integer

description:

Minimum base quality required to consider a base for calling.

default:

10

Vf.read_depth
label:

Read depth cutoff

type:

basic:integer

description:

The minimum number of replicate reads required for a variant site to be included.

default:

5

Output results

ChipQC

data:chipqc:chipqc (data:alignment:bam  alignment, data:chipseq:callpeak  peaks, data:bed  blacklist, basic:boolean  calculate_enrichment, basic:integer  quality_threshold, basic:integer  profile_window, basic:string  shift_size)[Source: v1.4.2]

Calculate quality control metrics for ChIP-seq samples. The analysis is based on the ChIPQC package, which computes a variety of quality control metrics and statistics, and provides plots and a report for assessment of experimental data for further analysis.

Input arguments

alignment
label:

Aligned reads

type:

data:alignment:bam

required:

True

disabled:

False

hidden:

False

peaks
label:

Called peaks

type:

data:chipseq:callpeak

required:

True

disabled:

False

hidden:

False

blacklist
label:

Blacklist regions

type:

data:bed

description:

BED file containing genomic regions that should be excluded from the analysis.

required:

False

disabled:

False

hidden:

False

calculate_enrichment
label:

Calculate enrichment

type:

basic:boolean

description:

Calculate enrichment of signal in known genomic annotation. By default, annotation is provided from the TranscriptDB package specified by genome build, which should match one of the supported annotations (hg19, hg38, hg18, mm10, mm9, rn4, ce6, dm3). If the annotation is not supported, the analysis is skipped.

required:

True

disabled:

False

hidden:

False

default:

False

advanced.quality_threshold
label:

Mapping quality threshold

type:

basic:integer

description:

Only reads with mapping quality scores above this threshold will be used for some statistics.

required:

True

disabled:

False

hidden:

False

default:

15

advanced.profile_window
label:

Window size

type:

basic:integer

description:

An integer indicating the width of the window used for peak profiles. Peaks will be centered on their summits and include half of the window size upstream and half downstream of this point.

required:

True

disabled:

False

hidden:

False

default:

400

advanced.shift_size
label:

Shift size

type:

basic:string

description:

Vector of values to try when computing optimal shift sizes. It should be specified as a vector of consecutive numbers in the form start:end.

required:

True

disabled:

False

hidden:

False

default:

1:300

Output results

report_folder
label:

ChipQC report folder

type:

basic:dir

required:

True

disabled:

False

hidden:

False

ccplot
label:

Cross coverage score plot

type:

basic:file

required:

True

disabled:

False

hidden:

False

coverage_histogram
label:

SSD metric plot

type:

basic:file

required:

True

disabled:

False

hidden:

False

peak_profile
label:

Peak profile plot

type:

basic:file

required:

True

disabled:

False

hidden:

False

peaks_barplot
label:

Barplot of reads in peaks

type:

basic:file

required:

True

disabled:

False

hidden:

False

peaks_density_plot
label:

Density plot of reads in peaks

type:

basic:file

required:

True

disabled:

False

hidden:

False

enrichment_heatmap
label:

Heatmap of reads in genomic features

type:

basic:file

required:

False

disabled:

False

hidden:

False

species
label:

Species

type:

basic:string

required:

True

disabled:

False

hidden:

False

build
label:

Build

type:

basic:string

required:

True

disabled:

False

hidden:

False

Convert GFF3 to GTF

data:annotation:gtfgff-to-gtf (data:annotation:gff3  annotation)[Source: v0.6.0]

Convert GFF3 file to GTF format.

Input arguments

annotation
label:

Annotation (GFF3)

type:

data:annotation:gff3

description:

Annotation in GFF3 format.

Output results

annot
label:

Converted GTF file

type:

basic:file

annot_sorted
label:

Sorted GTF file

type:

basic:file

annot_sorted_idx_igv
label:

IGV index for sorted GTF file

type:

basic:file

annot_sorted_track_jbrowse
label:

JBrowse track for sorted GTF

type:

basic:file

source
label:

Gene ID database

type:

basic:string

species
label:

Species

type:

basic:string

build
label:

Build

type:

basic:string

Convert files to reads (paired-end)

data:reads:fastq:paired:files-to-fastq-paired (list:data:file  src1, list:data:file  src2, basic:boolean  merge_lanes)[Source: v1.6.0]

Convert FASTQ files to paired-end reads.

Input arguments

src1
label:

Mate1

type:

list:data:file

required:

True

disabled:

False

hidden:

False

src2
label:

Mate2

type:

list:data:file

required:

True

disabled:

False

hidden:

False

merge_lanes
label:

Merge lanes

type:

basic:boolean

description:

Merge sample data split into multiple sequencing lanes into a single FASTQ file.

required:

True

disabled:

False

hidden:

False

default:

False
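
Lane merging can be pictured as plain concatenation of the per-lane files. The sketch below is a simplified illustration with a hypothetical `merge_lanes` helper, not the process's own code; byte-wise concatenation also works for gzip-compressed FASTQ files, because concatenated gzip members form a valid gzip stream.

```python
import shutil


def merge_lanes(lane_paths, merged_path):
    """Concatenate per-lane FASTQ files, in the given order, into one file."""
    with open(merged_path, "wb") as merged:
        for lane_path in lane_paths:
            with open(lane_path, "rb") as lane:
                shutil.copyfileobj(lane, merged)
    return merged_path
```

For paired-end data the same lane order would have to be used for mate 1 and mate 2 so that read pairs stay aligned between the two merged files.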

Output results

fastq
label:

Reads file (mate 1)

type:

list:basic:file

required:

True

disabled:

False

hidden:

False

fastq2
label:

Reads file (mate 2)

type:

list:basic:file

required:

True

disabled:

False

hidden:

False

fastqc_url
label:

Quality control with FastQC (Upstream)

type:

list:basic:file:html

required:

True

disabled:

False

hidden:

False

fastqc_url2
label:

Quality control with FastQC (Downstream)

type:

list:basic:file:html

required:

True

disabled:

False

hidden:

False

fastqc_archive
label:

Download FastQC archive (Upstream)

type:

list:basic:file

required:

True

disabled:

False

hidden:

False

fastqc_archive2
label:

Download FastQC archive (Downstream)

type:

list:basic:file

required:

True

disabled:

False

hidden:

False

Convert files to reads (single-end)

data:reads:fastq:single:files-to-fastq-single (list:data:file  src, basic:boolean  merge_lanes)[Source: v1.6.0]

Convert FASTQ files to single-end reads.

Input arguments

src
label:

Reads

type:

list:data:file

description:

Sequencing reads in FASTQ format

required:

True

disabled:

False

hidden:

False

merge_lanes
label:

Merge lanes

type:

basic:boolean

description:

Merge sample data split into multiple sequencing lanes into a single FASTQ file.

required:

True

disabled:

False

hidden:

False

default:

False

Output results

fastq
label:

Reads file

type:

list:basic:file

required:

True

disabled:

False

hidden:

False

fastqc_url
label:

Quality control with FastQC

type:

list:basic:file:html

required:

True

disabled:

False

hidden:

False

fastqc_archive
label:

Download FastQC archive

type:

list:basic:file

required:

True

disabled:

False

hidden:

False

Cuffdiff 2.2

data:differentialexpression:cuffdiff:cuffdiff (list:data:cufflinks:cuffquant  case, list:data:cufflinks:cuffquant  control, list:basic:string  labels, data:annotation  annotation, data:seq:nucleotide  genome, basic:boolean  multi_read_correct, basic:boolean  create_sets, basic:decimal  gene_logfc, basic:decimal  gene_fdr, basic:decimal  fdr, basic:string  library_type, basic:string  library_normalization, basic:string  dispersion_method)[Source: v3.4.0]

Run Cuffdiff 2.2 analysis. Cuffdiff finds significant changes in transcript expression, splicing, and promoter use. You can use it to find differentially expressed genes and transcripts, as well as genes that are being differentially regulated at the transcriptional and post-transcriptional level. See [here](http://cole-trapnell-lab.github.io/cufflinks/cuffdiff/) and [here](https://software.broadinstitute.org/cancer/software/genepattern/modules/docs/Cuffdiff/7) for more information.

Input arguments

case
label:

Case samples

type:

list:data:cufflinks:cuffquant

required:

True

disabled:

False

hidden:

False

control
label:

Control samples

type:

list:data:cufflinks:cuffquant

required:

True

disabled:

False

hidden:

False

labels
label:

Group labels

type:

list:basic:string

description:

Define labels for each sample group.

required:

True

disabled:

False

hidden:

False

default:

['control', 'case']

annotation
label:

Annotation (GTF/GFF3)

type:

data:annotation

description:

A transcript annotation file produced by cufflinks, cuffcompare, or another tool.

required:

True

disabled:

False

hidden:

False

genome
label:

Run bias detection and correction algorithm

type:

data:seq:nucleotide

description:

Provide Cufflinks with a multifasta file (genome file) via this option to instruct it to run a bias detection and correction algorithm which can significantly improve accuracy of transcript abundance estimates.

required:

False

disabled:

False

hidden:

False

multi_read_correct
label:

Do initial estimation procedure to more accurately weight reads with multiple genome mappings

type:

basic:boolean

required:

True

disabled:

False

hidden:

False

default:

False

create_sets
label:

Create gene sets

type:

basic:boolean

description:

After calculating differential gene expressions create gene sets for up-regulated genes, down-regulated genes and all genes.

required:

True

disabled:

False

hidden:

False

default:

False

gene_logfc
label:

Log2 fold change threshold for gene sets

type:

basic:decimal

description:

Genes above Log2FC are considered as up-regulated and genes below -Log2FC as down-regulated.

required:

True

disabled:

False

hidden:

!create_sets

default:

1.0

gene_fdr
label:

FDR threshold for gene sets

type:

basic:decimal

required:

True

disabled:

False

hidden:

!create_sets

default:

0.05
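
How the two gene set thresholds interact can be sketched as follows. This is an illustration only: the function name `make_gene_sets` and the input layout are hypothetical, and the assumption that a gene must have an FDR at or below the threshold to enter a set is not stated in the source.

```python
def make_gene_sets(results, logfc=1.0, fdr=0.05):
    """Split genes into up- and down-regulated sets.

    `results` maps a gene ID to a (log2 fold change, FDR) tuple.
    """
    up_regulated, down_regulated = set(), set()
    for gene, (gene_logfc, gene_fdr) in results.items():
        # Assumption: genes failing the FDR threshold are left out of both sets.
        if gene_fdr > fdr:
            continue
        if gene_logfc > logfc:
            up_regulated.add(gene)
        elif gene_logfc < -logfc:
            down_regulated.add(gene)
    return up_regulated, down_regulated
```

With the default thresholds, a gene with a log2 fold change of 2.0 and an FDR of 0.01 would land in the up-regulated set, while a gene with a log2 fold change of 0.5 would land in neither.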

fdr
label:

Allowed FDR

type:

basic:decimal

description:

The allowed false discovery rate. The default is 0.05.

required:

True

disabled:

False

hidden:

False

default:

0.05

library_type
label:

Library type

type:

basic:string

description:

In cases where Cufflinks cannot determine the platform and protocol used to generate input reads, you can supply this information manually, which will allow Cufflinks to infer source strand information with certain protocols. The available options are listed below. For paired-end data, we currently only support protocols where reads point towards each other: fr-unstranded - Reads from the left-most end of the fragment (in transcript coordinates) map to the transcript strand and the right-most end maps to the opposite strand; fr-firststrand - Same as above except we enforce the rule that the right-most end of the fragment (in transcript coordinates) is the first sequenced (or only sequenced for single-end reads). Equivalently, it is assumed that only the strand generated during first strand synthesis is sequenced; fr-secondstrand - Same as above except we enforce the rule that the left-most end of the fragment (in transcript coordinates) is the first sequenced (or only sequenced for single-end reads). Equivalently, it is assumed that only the strand generated during second strand synthesis is sequenced.

required:

True

disabled:

False

hidden:

False

default:

fr-unstranded

choices:

  • fr-unstranded: fr-unstranded

  • fr-firststrand: fr-firststrand

  • fr-secondstrand: fr-secondstrand

library_normalization
label:

Library normalization method

type:

basic:string

description:

You can control how library sizes (i.e. sequencing depths) are normalized in Cufflinks and Cuffdiff. Cuffdiff has several methods that require multiple libraries in order to work. Library normalization methods supported by Cufflinks work on one library at a time.

required:

True

disabled:

False

hidden:

False

default:

geometric

choices:

  • geometric: geometric

  • classic-fpkm: classic-fpkm

  • quartile: quartile

dispersion_method
label:

Dispersion method

type:

basic:string

description:

Cuffdiff works by modeling the variance in fragment counts across replicates as a function of the mean fragment count across replicates. Strictly speaking, it models a quantity called dispersion - the variance present in a group of samples beyond what is expected from a simple Poisson model of RNA-Seq. You can control how Cuffdiff constructs its model of dispersion in locus fragment counts. Each condition that has replicates can receive its own model, or Cuffdiff can use a global model for all conditions. All of these policies are identical to those used by DESeq (Anders and Huber, Genome Biology, 2010).

required:

True

disabled:

False

hidden:

False

default:

pooled

choices:

  • pooled: pooled

  • per-condition: per-condition

  • blind: blind

  • poisson: poisson

Output results

raw
label:

Differential expression

type:

basic:file

required:

True

disabled:

False

hidden:

False

de_json
label:

Results table (JSON)

type:

basic:json

required:

True

disabled:

False

hidden:

False

de_file
label:

Results table (file)

type:

basic:file

required:

True

disabled:

False

hidden:

False

transcript_diff_exp
label:

Differential expression (transcript level)

type:

basic:file

required:

True

disabled:

False

hidden:

False

tss_group_diff_exp
label:

Differential expression (primary transcript)

type:

basic:file

required:

True

disabled:

False

hidden:

False

cds_diff_exp
label:

Differential expression (coding sequence)

type:

basic:file

required:

True

disabled:

False

hidden:

False

cuffdiff_output
label:

Cuffdiff output

type:

basic:file

required:

True

disabled:

False

hidden:

False

source
label:

Gene ID database

type:

basic:string

required:

True

disabled:

False

hidden:

False

species
label:

Species

type:

basic:string

required:

True

disabled:

False

hidden:

False

build
label:

Build

type:

basic:string

required:

True

disabled:

False

hidden:

False

feature_type
label:

Feature type

type:

basic:string

required:

True

disabled:

False

hidden:

False

Cuffmerge

data:annotation:cuffmergecuffmerge (list:data:cufflinks:cufflinks  expressions, list:data:annotation:gtf  gtf, data:annotation  gff, data:seq:nucleotide  genome, basic:integer  threads)[Source: v2.2.0]

Cufflinks includes a script called Cuffmerge that you can use to merge together several Cufflinks assemblies. It also handles running Cuffcompare for you, and automatically filters a number of transfrags that are probably artifacts. The main purpose of Cuffmerge is to make it easier to make an assembly GTF file suitable for use with Cuffdiff. See [here](http://cole-trapnell-lab.github.io/cufflinks/cuffmerge/) for more information.

Input arguments

expressions
label:

Cufflinks transcripts (GTF)

type:

list:data:cufflinks:cufflinks

required:

False

gtf
label:

Annotation files (GTF)

type:

list:data:annotation:gtf

description:

Annotation files you wish to merge together with Cufflinks-produced annotation files (e.g. upload Cufflinks annotation GTF file).

required:

False

gff
label:

Reference annotation (GTF/GFF3)

type:

data:annotation

description:

An optional “reference” annotation GTF. The input assemblies are merged together with the reference GTF and included in the final output.

required:

False

genome
label:

Reference genome

type:

data:seq:nucleotide

description:

This argument should point to the genomic DNA sequences for the reference. If a directory, it should contain one fasta file per contig. If a multifasta file, all contigs should be present. The merge script will pass this option to cuffcompare, which will use the sequences to assist in classifying transfrags and excluding artifacts (e.g. repeats). For example, Cufflinks transcripts consisting mostly of lower-case bases are classified as repeats. Note that <seq_dir> must contain one fasta file per reference chromosome, and each file must be named after the chromosome, and have a .fa or .fasta extension

required:

False

threads
label:

Use this many processor threads

type:

basic:integer

description:

Use this many threads to align reads. The default is 1.

default:

1

Output results

annot
label:

Merged GTF file

type:

basic:file

source
label:

Gene ID database

type:

basic:string

species
label:

Species

type:

basic:string

build
label:

Build

type:

basic:string

Cuffnorm

data:cuffnormcuffnorm (list:data:cufflinks:cuffquant  cuffquant, data:annotation  annotation, basic:boolean  useERCC)[Source: v2.5.0]

Cufflinks includes a program, Cuffnorm, that you can use to generate tables of expression values that are properly normalized for library size. Cuffnorm takes a GTF2/GFF3 file of transcripts as input, along with two or more SAM, BAM, or CXB files for two or more samples. See [here](http://cole-trapnell-lab.github.io/cufflinks/cuffnorm/) for more information. Replicate relation needs to be defined for Cuffnorm to account for replicates. If the replicate relation is not defined, each sample will be treated individually.

Input arguments

cuffquant
label:

Cuffquant expression file

type:

list:data:cufflinks:cuffquant

annotation
label:

Annotation (GTF/GFF3)

type:

data:annotation

description:

A transcript annotation file produced by cufflinks, cuffcompare, or other source.

useERCC
label:

ERCC spike-in normalization

type:

basic:boolean

description:

Use ERCC spike-in controls for normalization.

default:

False

Output results

genes_count
label:

Genes count

type:

basic:file

genes_fpkm
label:

Genes FPKM

type:

basic:file

genes_attr
label:

Genes attr table

type:

basic:file

isoform_count
label:

Isoform count

type:

basic:file

isoform_fpkm
label:

Isoform FPKM

type:

basic:file

isoform_attr
label:

Isoform attr table

type:

basic:file

cds_count
label:

CDS count

type:

basic:file

cds_fpkm
label:

CDS FPKM

type:

basic:file

cds_attr
label:

CDS attr table

type:

basic:file

tss_groups_count
label:

TSS groups count

type:

basic:file

tss_groups_fpkm
label:

TSS groups FPKM

type:

basic:file

tss_attr
label:

TSS attr table

type:

basic:file

run_info
label:

Run info

type:

basic:file

raw_scatter
label:

FPKM exp scatter plot

type:

basic:file

boxplot
label:

Boxplot

type:

basic:file

fpkm_exp_raw
label:

FPKM exp raw

type:

basic:file

replicate_correlations
label:

Replicate correlations plot

type:

basic:file

fpkm_means
label:

FPKM means

type:

basic:file

exp_fpkm_means
label:

Exp FPKM means

type:

basic:file

norm_scatter
label:

FPKM exp scatter normalized plot

type:

basic:file

required:

False

fpkm_exp_norm
label:

FPKM exp normalized

type:

basic:file

required:

False

spike_raw
label:

Spike raw

type:

basic:file

required:

False

spike_norm
label:

Spike normalized

type:

basic:file

required:

False

R_data
label:

All R normalization data

type:

basic:file

source
label:

Gene ID database

type:

basic:string

species
label:

Species

type:

basic:string

build
label:

Build

type:

basic:string

Cuffquant 2.2

data:cufflinks:cuffquantcuffquant (data:alignment:bam  alignment, data:annotation  annotation, data:seq:nucleotide  genome, data:annotation:gtf  mask_file, basic:string  library_type, basic:boolean  multi_read_correct)[Source: v2.3.1]

Cuffquant allows you to compute the gene and transcript expression profiles and save these profiles to files that you can analyze later with Cuffdiff or Cuffnorm. See [here](http://cole-trapnell-lab.github.io/cufflinks/manual/) for more information.

Input arguments

alignment
label:

Aligned reads

type:

data:alignment:bam

annotation
label:

Annotation (GTF/GFF3)

type:

data:annotation

genome
label:

Run bias detection and correction algorithm

type:

data:seq:nucleotide

description:

Provide Cufflinks with a multifasta file (genome file) via this option to instruct it to run a bias detection and correction algorithm which can significantly improve accuracy of transcript abundance estimates.

required:

False

mask_file
label:

Mask file

type:

data:annotation:gtf

description:

Ignore all reads that could have come from transcripts in this GTF file. We recommend including in this file any annotated rRNA, mitochondrial transcripts, or other abundant transcripts you wish to ignore in your analysis. Due to variable efficiency of mRNA enrichment methods and rRNA depletion kits, masking these transcripts often improves the overall robustness of transcript abundance estimates.

required:

False

library_type
label:

Library type

type:

basic:string

description:

In cases where Cufflinks cannot determine the platform and protocol used to generate input reads, you can supply this information manually, which will allow Cufflinks to infer source strand information with certain protocols. The available options are listed below. For paired-end data, we currently only support protocols where reads point towards each other: fr-unstranded - Reads from the left-most end of the fragment (in transcript coordinates) map to the transcript strand, and the right-most end maps to the opposite strand; fr-firststrand - Same as above except we enforce the rule that the right-most end of the fragment (in transcript coordinates) is the first sequenced (or only sequenced for single-end reads). Equivalently, it is assumed that only the strand generated during first strand synthesis is sequenced; fr-secondstrand - Same as above except we enforce the rule that the left-most end of the fragment (in transcript coordinates) is the first sequenced (or only sequenced for single-end reads). Equivalently, it is assumed that only the strand generated during second strand synthesis is sequenced.

default:

fr-unstranded

choices:

  • fr-unstranded: fr-unstranded

  • fr-firststrand: fr-firststrand

  • fr-secondstrand: fr-secondstrand

multi_read_correct
label:

Do initial estimation procedure to more accurately weight reads with multiple genome mappings

type:

basic:boolean

description:

Run an initial estimation procedure that weights reads mapping to multiple locations more accurately.

default:

False

Output results

cxb
label:

Abundances (.cxb)

type:

basic:file

source
label:

Gene ID database

type:

basic:string

species
label:

Species

type:

basic:string

build
label:

Build

type:

basic:string

Cuffquant results

data:cufflinks:cuffquantupload-cxb (basic:file  src, basic:string  source, basic:string  species, basic:string  build, basic:string  feature_type)[Source: v1.3.3]

Upload Cuffquant results file (.cxb)

Input arguments

src
label:

Cuffquant file

type:

basic:file

description:

Upload Cuffquant results file. Supported extension: *.cxb

required:

True

validate_regex:

\.(cxb)$
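
The validation pattern above can be tried out with a short sketch. It assumes the pattern is applied with search semantics (it is anchored only at the end of the name); the helper name `is_valid_cxb_name` is hypothetical.

```python
import re

# Pattern copied from the validate_regex field above.
CXB_PATTERN = re.compile(r"\.(cxb)$")


def is_valid_cxb_name(file_name):
    """Return True when the file name ends with the .cxb extension."""
    # Assumption: the pattern is searched for anywhere in the name,
    # which in practice means it must match at the very end.
    return CXB_PATTERN.search(file_name) is not None
```

Under this assumption the check is case-sensitive and rejects compressed names such as `abundances.cxb.gz`.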

source
label:

Gene ID database

type:

basic:string

choices:

  • AFFY: AFFY

  • DICTYBASE: DICTYBASE

  • ENSEMBL: ENSEMBL

  • NCBI: NCBI

  • UCSC: UCSC

species
label:

Species

type:

basic:string

description:

Species Latin name.

choices:

  • Homo sapiens: Homo sapiens

  • Mus musculus: Mus musculus

  • Rattus norvegicus: Rattus norvegicus

  • Dictyostelium discoideum: Dictyostelium discoideum

  • Odocoileus virginianus texanus: Odocoileus virginianus texanus

  • Solanum tuberosum: Solanum tuberosum

build
label:

Build

type:

basic:string

feature_type
label:

Feature type

type:

basic:string

default:

gene

choices:

  • gene: gene

  • transcript: transcript

  • exon: exon

Output results

cxb
label:

Cuffquant results

type:

basic:file

source
label:

Gene ID database

type:

basic:string

species
label:

Species

type:

basic:string

build
label:

Build

type:

basic:string

feature_type
label:

Feature type

type:

basic:string

Cut & Run

data:workflow:cutnrunworkflow-cutnrun (data:reads:fastq:paired  reads, basic:integer  quality, basic:integer  nextseq, basic:string  phred, basic:integer  min_length, basic:integer  max_n, basic:boolean  retain_unpaired, basic:integer  unpaired_len_1, basic:integer  unpaired_len_2, basic:integer  clip_r1, basic:integer  clip_r2, basic:integer  three_prime_r1, basic:integer  three_prime_r2, list:basic:string  adapter, list:basic:string  adapter_2, data:seq:nucleotide  adapter_file_1, data:seq:nucleotide  adapter_file_2, basic:string  universal_adapter, basic:integer  stringency, basic:decimal  error_rate, basic:integer  trim_5, basic:integer  trim_3, data:index:bowtie2  genome, basic:string  mode, basic:string  speed, basic:boolean  discordantly, basic:boolean  rep_se, basic:integer  minins, basic:integer  maxins, basic:boolean  no_overlap, basic:boolean  dovetail, basic:boolean  no_unal, data:index:bowtie2  genome, basic:string  mode, basic:string  speed, basic:boolean  discordantly, basic:boolean  rep_se, basic:integer  minins, basic:integer  maxins, basic:boolean  no_overlap, basic:boolean  dovetail, basic:boolean  no_unal, basic:string  format, basic:decimal  pvalue, basic:string  duplicates, basic:boolean  bedgraph, basic:integer  min_frag_length, basic:integer  max_frag_length, basic:decimal  scale)[Source: v1.6.0]

Analysis of samples processed for high-resolution mapping of DNA binding sites using a targeted nuclease strategy. The process is named CUT&RUN, which stands for Cleavage Under Target and Release Using Nuclease. The workflow trims the reads with trimgalore and aligns them using bowtie2 to the target species genome as well as to a spike-in genome. Aligned reads are processed to produce bigwig files to be viewed in a genome browser. Peaks are called using MACS2. Length selection of reads is performed using the alignmentSieve tool from the deeptools package.

Input arguments

reads
label:

Input reads

type:

data:reads:fastq:paired

options_trimming.quality_trim.quality
label:

Quality cutoff

type:

basic:integer

description:

Trim low-quality ends from reads based on Phred score.

required:

False

options_trimming.quality_trim.nextseq
label:

NextSeq/NovaSeq trim cutoff

type:

basic:integer

description:

NextSeq/NovaSeq-specific quality trimming. Also trims dark cycles appearing as high-quality G bases. This will set a specific quality cutoff, but qualities of G bases are ignored. This cannot be used with Quality cutoff and will override it.

required:

False

options_trimming.quality_trim.phred
label:

Phred score encoding

type:

basic:string

description:

Use either ASCII+33 quality scores as Phred scores (Sanger/Illumina 1.9+ encoding) or ASCII+64 quality scores (Illumina 1.5 encoding) for quality trimming.

default:

--phred33

choices:

  • ASCII+33: --phred33

  • ASCII+64: --phred64
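
The difference between the two encodings comes down to the offset subtracted from each character code. The following sketch shows the decoding step; the function name `decode_phred` is hypothetical and the code is not part of the trimming tool.

```python
def decode_phred(quality_string, offset=33):
    """Convert a FASTQ quality string into a list of Phred scores."""
    if offset not in (33, 64):
        raise ValueError("offset must be 33 (ASCII+33) or 64 (ASCII+64)")
    # Each quality character encodes the Phred score plus the offset.
    return [ord(char) - offset for char in quality_string]
```

The same quality of 40 is written as `I` in ASCII+33 and as `h` in ASCII+64, which is why choosing the wrong encoding shifts every score by 31.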

options_trimming.quality_trim.min_length
label:

Minimum length after trimming

type:

basic:integer

description:

Discard reads that became shorter than selected length because of either quality or adapter trimming. Both reads of a read-pair need to be longer than specified length to be printed out to validated paired-end files. If only one read became too short there is the possibility of keeping such unpaired single-end reads with Retain unpaired. A value of 0 disables filtering based on length.

default:

20

options_trimming.quality_trim.max_n
label:

Maximum number of Ns

type:

basic:integer

description:

A read exceeding this limit will result in the entire pair being removed from the trimmed output files.

required:

False

options_trimming.quality_trim.retain_unpaired
label:

Retain unpaired reads after trimming

type:

basic:boolean

description:

If only one of the two paired-end reads became too short, the longer read will be written.

default:

False

options_trimming.quality_trim.unpaired_len_1
label:

Unpaired read length cutoff for mate 1

type:

basic:integer

hidden:

!quality_trim.retain_unpaired

default:

35

options_trimming.quality_trim.unpaired_len_2
label:

Unpaired read length cutoff for mate 2

type:

basic:integer

hidden:

!quality_trim.retain_unpaired

default:

35
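
The interplay of the minimum length, Retain unpaired, and the two unpaired length cutoffs can be sketched as below. This is a simplified reading of the descriptions above, not the trimming tool's code: the function name `classify_pair`, the returned labels, and the assumption that a surviving mate must reach its own unpaired cutoff are illustrative.

```python
def classify_pair(len_r1, len_r2, min_length=20, retain_unpaired=False,
                  unpaired_len_1=35, unpaired_len_2=35):
    """Decide what happens to a read pair after trimming."""
    # Both mates must reach the minimum length to stay paired.
    # A min_length of 0 lets every pair through.
    if len_r1 >= min_length and len_r2 >= min_length:
        return "paired"
    if retain_unpaired:
        # Assumption: the surviving mate is kept only if it reaches
        # the unpaired cutoff defined for that mate.
        if len_r1 >= unpaired_len_1:
            return "unpaired_1"
        if len_r2 >= unpaired_len_2:
            return "unpaired_2"
    return "discarded"
```

For example, with the defaults a pair trimmed to 50 and 10 bases is discarded, whereas with Retain unpaired enabled the 50-base mate would be kept as an unpaired read.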

options_trimming.quality_trim.clip_r1
label:

Trim bases from 5’ end of read 1

type:

basic:integer

description:

This may be useful if the qualities were very poor, or if there is some sort of unwanted bias at the 5’ end.

required:

False

options_trimming.quality_trim.clip_r2
label:

Trim bases from 5’ end of read 2

type:

basic:integer

description:

This may be useful if the qualities were very poor, or if there is some sort of unwanted bias at the 5’ end. For paired-end bisulfite sequencing, it is recommended to remove the first few bp because the end-repair reaction may introduce a bias towards low methylation.

required:

False

options_trimming.quality_trim.three_prime_r1
label:

Trim bases from 3’ end of read 1

type:

basic:integer

description:

Remove bases from the 3’ end of read 1 after adapter/quality trimming has been performed. This may remove some unwanted bias from the 3’ end that is not directly related to adapter sequence or basecall quality.

required:

False

options_trimming.quality_trim.three_prime_r2
label:

Trim bases from 3’ end of read 2

type:

basic:integer

description:

Remove bases from the 3’ end of read 2 after adapter/quality trimming has been performed. This may remove some unwanted bias from the 3’ end that is not directly related to adapter sequence or basecall quality.

required:

False

options_trimming.adapter_trim.adapter
label:

Read 1 adapter sequence

type:

list:basic:string

description:

Adapter sequences to be trimmed. Also see universal adapters field for predefined adapters. This is mutually exclusive with read 1 adapters file and universal adapters.

required:

False

options_trimming.adapter_trim.adapter_2
label:

Read 2 adapter sequence

type:

list:basic:string

description:

Optional adapter sequence to be trimmed off read 2 of paired-end files. This is mutually exclusive with read 2 adapters file and universal adapters.

required:

False

options_trimming.adapter_trim.adapter_file_1
label:

Read 1 adapters file

type:

data:seq:nucleotide

description:

This is mutually exclusive with read 1 adapters and universal adapters.

required:

False

options_trimming.adapter_trim.adapter_file_2
label:

Read 2 adapters file

type:

data:seq:nucleotide

description:

This is mutually exclusive with read 2 adapters and universal adapters.

required:

False

options_trimming.adapter_trim.universal_adapter
label:

Universal adapters

type:

basic:string

description:

Instead of default detection use specific adapters. Use 13bp of the Illumina universal adapter, 12bp of the Nextera adapter or 12bp of the Illumina Small RNA 3’ Adapter. Selecting to trim smallRNA adapters will also lower the length value to 18bp. If the smallRNA libraries are paired-end then read 2 adapter will be set to the Illumina small RNA 5’ adapter automatically (GATCGTCGGACT) unless defined explicitly. This is mutually exclusive with manually defined adapters and adapter files.

required:

False

choices:

  • Illumina: --illumina

  • Nextera: --nextera

  • Illumina small RNA: --small_rna

options_trimming.adapter_trim.stringency
label:

Overlap with adapter sequence required to trim

type:

basic:integer

description:

Defaults to a very stringent setting of 1, i.e. even a single base pair of overlapping sequence will be trimmed off the 3’ end of any read.

default:

1

options_trimming.adapter_trim.error_rate
label:

Maximum allowed error rate

type:

basic:decimal

description:

Number of errors divided by the length of the matching region. Default value of 0.1.

default:

0.1
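
The error rate definition above is a simple ratio. The sketch below assumes an adapter match is accepted when the ratio does not exceed the configured limit; the function name `adapter_match_allowed` is hypothetical and the exact rounding used by the trimming tool may differ.

```python
def adapter_match_allowed(errors, match_length, error_rate=0.1):
    """Check whether an adapter match stays within the allowed error rate."""
    if match_length <= 0:
        raise ValueError("match_length must be positive")
    # Error rate = number of errors / length of the matching region.
    return errors / match_length <= error_rate
```

With the default of 0.1, one mismatch in a 10-base match is tolerated, while two mismatches in the same region are not.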

options_trimming.hard_trim.trim_5
label:

Hard trim sequence from 3’ end

type:

basic:integer

description:

Instead of performing adapter-/quality trimming, this option will simply hard-trim sequences to the given number of bp from the 3’ end. This is incompatible with other hard trimming options.

required:

False

options_trimming.hard_trim.trim_3
label:

Hard trim sequences from 5’ end

type:

basic:integer

description:

Instead of performing adapter-/quality trimming, this option will simply hard-trim sequences to the given number of bp from the 5’ end. This is incompatible with other hard trimming options.

required:

False

options_aln_species.genome
label:

Species genome

type:

data:index:bowtie2

options_aln_species.mode
label:

Alignment mode

type:

basic:string

description:

End to end: Bowtie 2 requires that the entire read align from one end to the other, without any trimming (or “soft clipping”) of characters from either end. Local: Bowtie 2 does not require that the entire read align from one end to the other. Rather, some characters may be omitted (“soft clipped”) from the ends in order to achieve the greatest possible alignment score.

default:

--local

choices:

  • end to end mode: --end-to-end

  • local: --local

options_aln_species.speed
label:

Speed vs. Sensitivity

type:

basic:string

description:

A quick setting for aligning fast or accurately. This option is a shortcut for the following parameters. For --end-to-end: --very-fast: -D 5 -R 1 -N 0 -L 22 -i S,0,2.50; --fast: -D 10 -R 2 -N 0 -L 22 -i S,0,2.50; --sensitive: -D 15 -R 2 -N 0 -L 22 -i S,1,1.15 (Bowtie 2 default); --very-sensitive: -D 20 -R 3 -N 0 -L 20 -i S,1,0.50. For --local: --very-fast-local: -D 5 -R 1 -N 0 -L 25 -i S,1,2.00; --fast-local: -D 10 -R 2 -N 0 -L 22 -i S,1,1.75; --sensitive-local: -D 15 -R 2 -N 0 -L 20 -i S,1,0.75 (Bowtie 2 default); --very-sensitive-local: -D 20 -R 3 -N 0 -L 20 -i S,1,0.50.

default:

--very-sensitive

choices:

  • Very fast: --very-fast

  • Fast: --fast

  • Sensitive: --sensitive

  • Very sensitive: --very-sensitive

options_aln_species.discordantly
label:

Report discordantly matched read

type:

basic:boolean

description:

If both mates have unique alignments, but the alignments do not match paired-end expectations (orientation and relative distance) then alignment will be reported. Useful for detecting structural variations.

default:

True

options_aln_species.rep_se
label:

Report single ended

type:

basic:boolean

description:

If a paired alignment cannot be found, Bowtie2 tries to find alignments for the individual mates. Default is true (--no-mixed).

default:

True

options_aln_species.minins
label:

Minimal distance

type:

basic:integer

description:

The minimum fragment length (--minins) for valid paired-end alignments. Value 0 imposes no minimum.

default:

10

options_aln_species.maxins
label:

Maximal distance

type:

basic:integer

description:

The maximum fragment length (--maxins) for valid paired-end alignments.

default:

700

options_aln_species.no_overlap
label:

Not concordant when mates overlap

type:

basic:boolean

description:

When true, it is considered not concordant when mates overlap at all. Default is true (--no-overlap).

default:

False

options_aln_species.dovetail
label:

Dovetail

type:

basic:boolean

description:

If the mates “dovetail”, that is if one mate alignment extends past the beginning of the other such that the wrong mate begins upstream, consider that to be concordant. Default: mates cannot dovetail in a concordant alignment. If true, parameter --dovetail is turned on.

default:

False

options_aln_species.no_unal
label:

Suppress SAM records for unaligned reads

type:

basic:boolean

description:

When true, suppress SAM records for unaligned reads. Default is true (--no-unal).

default:

True

options_aln_spikein.genome
label:

Spike-in genome

type:

data:index:bowtie2

options_aln_spikein.mode
label:

Alignment mode

type:

basic:string

description:

End to end: Bowtie 2 requires that the entire read align from one end to the other, without any trimming (or “soft clipping”) of characters from either end. Local: Bowtie 2 does not require that the entire read align from one end to the other. Rather, some characters may be omitted (“soft clipped”) from the ends in order to achieve the greatest possible alignment score.

default:

--local

choices:

  • end to end mode: --end-to-end

  • local: --local

options_aln_spikein.speed
label:

Speed vs. Sensitivity

type:

basic:string

description:

A quick setting for aligning fast or accurately. This option is a shortcut for the following parameters. For --end-to-end: --very-fast: -D 5 -R 1 -N 0 -L 22 -i S,0,2.50; --fast: -D 10 -R 2 -N 0 -L 22 -i S,0,2.50; --sensitive: -D 15 -R 2 -N 0 -L 22 -i S,1,1.15 (Bowtie 2 default); --very-sensitive: -D 20 -R 3 -N 0 -L 20 -i S,1,0.50. For --local: --very-fast-local: -D 5 -R 1 -N 0 -L 25 -i S,1,2.00; --fast-local: -D 10 -R 2 -N 0 -L 22 -i S,1,1.75; --sensitive-local: -D 15 -R 2 -N 0 -L 20 -i S,1,0.75 (Bowtie 2 default); --very-sensitive-local: -D 20 -R 3 -N 0 -L 20 -i S,1,0.50.

default:

--very-sensitive

choices:

  • Very fast: --very-fast

  • Fast: --fast

  • Sensitive: --sensitive

  • Very sensitive: --very-sensitive

options_aln_spikein.discordantly
label:

Report discordantly matched read

type:

basic:boolean

description:

If both mates have unique alignments, but the alignments do not match paired-end expectations (orientation and relative distance) then alignment will be reported. Useful for detecting structural variations.

default:

True

options_aln_spikein.rep_se
label:

Report single ended

type:

basic:boolean

description:

If a paired alignment cannot be found, Bowtie2 tries to find alignments for the individual mates. Default is true (--no-mixed).

default:

True

options_aln_spikein.minins
label:

Minimal distance

type:

basic:integer

description:

The minimum fragment length (--minins) for valid paired-end alignments. Value 0 imposes no minimum.

default:

10

options_aln_spikein.maxins
label:

Maximal distance

type:

basic:integer

description:

The maximum fragment length (--maxins) for valid paired-end alignments.

default:

700

options_aln_spikein.no_overlap
label:

Not concordant when mates overlap

type:

basic:boolean

description:

When true, it is considered not concordant when mates overlap at all. Default is true (--no-overlap).

default:

True

options_aln_spikein.dovetail
label:

Dovetail

type:

basic:boolean

description:

If the mates “dovetail”, that is if one mate alignment extends past the beginning of the other such that the wrong mate begins upstream, consider that to be concordant. Default: mates cannot dovetail in a concordant alignment. If true, parameter --dovetail is turned on.

default:

False

options_aln_spikein.no_unal
label:

Suppress SAM records for unaligned reads

type:

basic:boolean

description:

When true, suppress SAM records for unaligned reads. Default is true (--no-unal).

default:

True

options_pc.format
label:

Format of tag file

type:

basic:string

description:

This specifies the format of input files. For paired-end data the format dictates how MACS2 will treat mates. If the selected format is BAM, MACS2 will only keep the left mate (5’ end) tag. However, when format BAMPE is selected, MACS2 will use the actual insert sizes of read pairs to build the fragment pileup, instead of building a bimodal distribution of plus and minus strand reads to predict fragment size.

required:

False

default:

BAMPE

choices:

  • BAM: BAM

  • BAMPE: BAMPE

options_pc.pvalue
label:

P-value cutoff

type:

basic:decimal

description:

The p-value cutoff.

required:

False

default:

0.001

options_pc.duplicates
label:

Number of duplicates

type:

basic:string

description:

It controls the MACS behavior towards duplicate tags at the exact same location, i.e. the same coordinates and the same strand. The ‘auto’ option makes MACS calculate the maximum tags at the exact same location based on binomial distribution using 1e-5 as pvalue cutoff and the ‘all’ option keeps all the tags. If an integer is given, at most this number of tags will be kept at the same location. MACS itself defaults to keeping one tag at the same location.

default:

all

choices:

  • 1: 1

  • auto: auto

  • all: all
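
The integer and ‘all’ choices can be illustrated with a small sketch. It is not MACS code: the function name `filter_duplicates` and the tag layout (chromosome, position, strand) are assumptions made for the example, and the binomial calculation behind ‘auto’ is deliberately left out.

```python
from collections import Counter


def filter_duplicates(tags, keep="1"):
    """Keep at most `keep` tags per (chromosome, position, strand) location."""
    if keep == "all":
        return list(tags)
    if keep == "auto":
        # The binomial-based limit that MACS derives is not reproduced here.
        raise NotImplementedError("'auto' is outside the scope of this sketch")
    limit = int(keep)
    seen = Counter()
    kept = []
    for tag in tags:
        seen[tag] += 1
        # Drop a tag once its location has reached the limit.
        if seen[tag] <= limit:
            kept.append(tag)
    return kept
```

Tags on opposite strands at the same coordinate count as different locations, so both survive even with a limit of one.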

options_pc.bedgraph
label:

Save fragment pileup and control lambda

type:

basic:boolean

description:

If this flag is on, MACS will store the fragment pileup, control lambda, -log10(pvalue) and -log10(qvalue) scores in bedGraph files. The bedGraph files will be stored in current directory named NAME+’_treat_pileup.bdg’ for treatment data, NAME+’_control_lambda.bdg’ for local lambda values from control, NAME+’_treat_pvalue.bdg’ for Poisson pvalue scores (in -log10(pvalue) form), and NAME+’_treat_qvalue.bdg’ for q-value scores from Benjamini-Hochberg-Yekutieli procedure.

default:

True

options_sieve.min_frag_length
label:

Minimum fragment length

type:

basic:integer

description:

The minimum fragment length needed for read/pair inclusion. This option is primarily useful in ATAC-seq experiments, for filtering mono- or di-nucleosome fragments. Default is 0.

default:

0

options_sieve.max_frag_length
label:

Maximum fragment length

type:

basic:integer

description:

The maximum fragment length allowed for read/pair inclusion. A value of 0 indicates no limit. Default is 0.

default:

0
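
A minimal sketch of how the two fragment-length limits might combine. The function name is hypothetical, and treating both bounds as inclusive is an assumption; only the meaning of 0 as "no limit" for the maximum comes from the description above.

```python
def passes_fragment_filter(fragment_length, min_frag_length=0, max_frag_length=0):
    """Return True if a fragment of this length would be kept.

    Assumes inclusive bounds. A max_frag_length of 0 disables the upper limit.
    """
    if fragment_length < min_frag_length:
        return False
    if max_frag_length > 0 and fragment_length > max_frag_length:
        return False
    return True
```

With the defaults (0 and 0), every fragment passes.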

options_scale.scale
label:

Scale factor

type:

basic:decimal

description:

Magnitude of the scale factor. The scaling factor is calculated by dividing the scale by the number of features in the BEDPE file (scale/(number of features)).

default:

10000
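
The formula scale/(number of features) can be written out as a short sketch. The function name and the example feature count are illustrative assumptions, not values taken from the process.

```python
def scaling_factor(scale, n_features):
    """Compute scale / (number of features in the BEDPE file)."""
    if n_features <= 0:
        raise ValueError("n_features must be positive")
    return scale / n_features


# Example with the default scale of 10000 and a hypothetical
# BEDPE file containing 2,000,000 features.
factor = scaling_factor(10000, 2_000_000)
```

More features therefore give a smaller factor, so samples of different depth end up on a comparable scale.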

Output results

Cutadapt (3’ mRNA-seq, single-end)

data:reads:fastq:single:cutadapt:cutadapt-3prime-single (data:reads:fastq:single  reads, basic:integer  nextseq_trim, basic:integer  quality_cutoff, basic:integer  min_len, basic:integer  min_overlap, basic:integer  times)[Source: v1.4.2]

Process 3’ mRNA-seq datasets using Cutadapt tool.

Input arguments

reads
label:

Select sample(s)

type:

data:reads:fastq:single

required:

True

disabled:

False

hidden:

False

options.nextseq_trim
label:

NextSeq/NovaSeq trim

type:

basic:integer

description:

NextSeq/NovaSeq-specific quality trimming. Also trims dark cycles appearing as high-quality G bases. This option is mutually exclusive with standard quality-cutoff trimming and is suitable for data generated by recent Illumina machines that use two-color chemistry to encode the four bases.

required:

True

disabled:

False

hidden:

False

default:

10

options.quality_cutoff
label:

Quality cutoff

type:

basic:integer

description:

Trim low-quality bases from the 3’ end of each read before adapter removal. Using this option overrides the NextSeq/NovaSeq trim option.

required:

False

disabled:

False

hidden:

False
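
The difference between the two trimming options above can be sketched as follows. The running-sum procedure is modelled on the quality-trimming algorithm described in the Cutadapt documentation. The NextSeq/NovaSeq variant shown here, where every G base is treated as if its quality were just below the cutoff, is an assumption about how that mode works and should be checked against Cutadapt itself. The function name is hypothetical.

```python
def quality_trim_index(qualities, cutoff, bases=None):
    """Return the number of leading bases to keep after 3' quality trimming.

    qualities: list of Phred scores, one per base.
    cutoff:    quality cutoff.
    bases:     optional read sequence; when given, G bases are scored as
               (cutoff - 1) to mimic NextSeq-style trimming (an assumption).
    """
    running_sum = 0
    best_sum = 0
    keep = len(qualities)
    # Walk from the 3' end, accumulating (cutoff - quality).
    for i in reversed(range(len(qualities))):
        quality = qualities[i]
        if bases is not None and bases[i] == "G":
            quality = cutoff - 1
        running_sum += cutoff - quality
        if running_sum < 0:
            break  # enough good bases seen; stop scanning
        if running_sum > best_sum:
            best_sum = running_sum
            keep = i
    return keep
```

A read ending in high-quality G bases is left untouched by the standard mode but loses its G tail in the NextSeq-style mode, which is the "dark cycle" behaviour the description refers to.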

options.min_len
label:

Discard reads shorter than specified minimum length.

type:

basic:integer

required:

True

disabled:

False

hidden:

False

default:

20

options.min_overlap
label:

Minimum overlap

type:

basic:integer

description:

Minimum overlap between adapter and read for an adapter to be found.

required:

True

disabled:

False

hidden:

False

default:

20

options.times
label:

Remove up to a specified number of adapters from each read.

type:

basic:integer

required:

True

disabled:

False

hidden:

False

default:

2

Output results

fastq
label:

Reads file.

type:

list:basic:file

required:

True

disabled:

False

hidden:

False

report
label:

Cutadapt report

type:

basic:file

required:

True

disabled:

False

hidden:

False

fastqc_url
label:

Quality control with FastQC.

type:

list:basic:file:html

required:

True

disabled:

False

hidden:

False

fastqc_archive
label:

Download FastQC archive.

type:

list:basic:file

required:

True

disabled:

False

hidden:

False

Cutadapt (Corall RNA-Seq, paired-end)

data:reads:fastq:paired:cutadapt:cutadapt-corall-paired (data:reads:fastq:paired  reads, basic:integer  nextseq_trim, basic:integer  quality_cutoff, basic:integer  min_len, basic:integer  min_overlap)[Source: v1.3.2]

Pre-process reads obtained using CORALL Total RNA-Seq Library Prep Kit. Trim UMI-tags from input reads and use Cutadapt to remove adapters and run QC filtering steps.

Input arguments

reads
label:

Select sample(s)

type:

data:reads:fastq:paired

required:

True

disabled:

False

hidden:

False

options.nextseq_trim
label:

NextSeq/NovaSeq trim

type:

basic:integer

description:

NextSeq/NovaSeq-specific quality trimming. Also trims dark cycles appearing as high-quality G bases. This option is mutually exclusive with standard quality-cutoff trimming and is suitable for data generated by recent Illumina machines that use two-color chemistry to encode the four bases.

required:

True

disabled:

False

hidden:

False

default:

10

options.quality_cutoff
label:

Quality cutoff

type:

basic:integer

description:

Trim low-quality bases from the 3’ end of each read before adapter removal. Using this option overrides the NextSeq/NovaSeq trim option.

required:

False

disabled:

False

hidden:

False

options.min_len
label:

Minimum read length

type:

basic:integer

required:

True

disabled:

False

hidden:

False

default:

20

options.min_overlap
label:

Minimum overlap

type:

basic:integer

description:

Minimum overlap between adapter and read for an adapter to be found.

required:

True

disabled:

False

hidden:

False

default:

20

Output results

fastq
label:

Remaining mate1 reads

type:

list:basic:file

required:

True

disabled:

False

hidden:

False

fastq2
label:

Remaining mate2 reads

type:

list:basic:file

required:

True

disabled:

False

hidden:

False

report
label:

Cutadapt report

type:

basic:file

required:

True

disabled:

False

hidden:

False

fastqc_url
label:

Mate1 quality control with FastQC

type:

list:basic:file:html

required:

True

disabled:

False

hidden:

False

fastqc_url2
label:

Mate2 quality control with FastQC

type:

list:basic:file:html

required:

True

disabled:

False

hidden:

False

fastqc_archive
label:

Download mate1 FastQC archive

type:

list:basic:file

required:

True

disabled:

False

hidden:

False

fastqc_archive2
label:

Download mate2 FastQC archive

type:

list:basic:file

required:

True

disabled:

False

hidden:

False

Cutadapt (Corall RNA-Seq, single-end)

data:reads:fastq:single:cutadapt:cutadapt-corall-single (data:reads:fastq:single  reads, basic:integer  nextseq_trim, basic:integer  quality_cutoff, basic:integer  min_len, basic:integer  min_overlap)[Source: v1.4.2]

Pre-process reads obtained using CORALL Total RNA-Seq Library Prep Kit. Trim UMI-tags from input reads and use Cutadapt to remove adapters and run QC filtering steps.

Input arguments

reads
label:

Select sample(s)

type:

data:reads:fastq:single

required:

True

disabled:

False

hidden:

False

options.nextseq_trim
label:

NextSeq/NovaSeq trim

type:

basic:integer

description:

NextSeq/NovaSeq-specific quality trimming. Also trims dark cycles appearing as high-quality G bases. This option is mutually exclusive with standard quality-cutoff trimming and is suitable for data generated by recent Illumina machines that use two-color chemistry to encode the four bases.

required:

True

disabled:

False

hidden:

False

default:

10

options.quality_cutoff
label:

Quality cutoff

type:

basic:integer

description:

Trim low-quality bases from the 3’ end of each read before adapter removal. Using this option overrides the NextSeq/NovaSeq trim option.

required:

False

disabled:

False

hidden:

False

options.min_len
label:

Minimum read length

type:

basic:integer

required:

True

disabled:

False

hidden:

False

default:

20

options.min_overlap
label:

Minimum overlap

type:

basic:integer

description:

Minimum overlap between adapter and read for an adapter to be found.

required:

True

disabled:

False

hidden:

False

default:

20

Output results

fastq
label:

Reads file

type:

list:basic:file

required:

True

disabled:

False

hidden:

False

report
label:

Cutadapt report

type:

basic:file

required:

True

disabled:

False

hidden:

False

fastqc_url
label:

Quality control with FastQC

type:

list:basic:file:html

required:

True

disabled:

False

hidden:

False

fastqc_archive
label:

Download FastQC archive

type:

list:basic:file

required:

True

disabled:

False

hidden:

False

Cutadapt (paired-end)

data:reads:fastq:paired:cutadaptcutadapt-paired (data:reads:fastq:paired  reads, data:seq:nucleotide  mate1_5prime_file, data:seq:nucleotide  mate1_3prime_file, data:seq:nucleotide  mate2_5prime_file, data:seq:nucleotide  mate2_3prime_file, list:basic:string  mate1_5prime_seq, list:basic:string  mate1_3prime_seq, list:basic:string  mate2_5prime_seq, list:basic:string  mate2_3prime_seq, basic:integer  times, basic:decimal  error_rate, basic:integer  min_overlap, basic:boolean  match_read_wildcards, basic:boolean  no_indels, basic:integer  nextseq_trim, basic:integer  leading, basic:integer  trailing, basic:integer  crop, basic:integer  headcrop, basic:integer  minlen, basic:integer  maxlen, basic:integer  max_n, basic:string  pair_filter)[Source: v2.7.2]

Cutadapt finds and removes adapter sequences, primers, poly-A tails and other types of unwanted sequence from high-throughput sequencing reads. More information about Cutadapt can be found [here](http://cutadapt.readthedocs.io/en/stable/).

Input arguments

reads
label:

Select sample(s)

type:

data:reads:fastq:paired

adapters.mate1_5prime_file
label:

5 prime adapter file for Mate 1

type:

data:seq:nucleotide

required:

False

adapters.mate1_3prime_file
label:

3 prime adapter file for Mate 1

type:

data:seq:nucleotide

required:

False

adapters.mate2_5prime_file
label:

5 prime adapter file for Mate 2

type:

data:seq:nucleotide

required:

False

adapters.mate2_3prime_file
label:

3 prime adapter file for Mate 2

type:

data:seq:nucleotide

required:

False

adapters.mate1_5prime_seq
label:

5 prime adapter sequence for Mate 1

type:

list:basic:string

required:

False

adapters.mate1_3prime_seq
label:

3 prime adapter sequence for Mate 1

type:

list:basic:string

required:

False

adapters.mate2_5prime_seq
label:

5 prime adapter sequence for Mate 2

type:

list:basic:string

required:

False

adapters.mate2_3prime_seq
label:

3 prime adapter sequence for Mate 2

type:

list:basic:string

required:

False

adapters.times
label:

Times

type:

basic:integer

description:

Remove up to COUNT adapters from each read.

default:

1

adapters.error_rate
label:

Error rate

type:

basic:decimal

description:

Maximum allowed error rate (no. of errors divided by the length of the matching region).

default:

0.1
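
The error-rate definition above (number of errors divided by the length of the matching region) can be expressed as a one-line check. This is a sketch with a hypothetical function name; how Cutadapt counts errors and measures the matching region internally is not covered here.

```python
def within_error_rate(n_errors, match_length, error_rate=0.1):
    """Return True if n_errors / match_length does not exceed error_rate."""
    if match_length <= 0:
        raise ValueError("match_length must be positive")
    return n_errors / match_length <= error_rate
```

At the default rate of 0.1, one error is tolerated in a 10-base match but not in a 9-base match.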

adapters.min_overlap
label:

Minimal overlap

type:

basic:integer

description:

Minimum overlap for an adapter match.

default:

3

adapters.match_read_wildcards
label:

Match read wildcards

type:

basic:boolean

description:

Interpret IUPAC wildcards in reads.

default:

False

adapters.no_indels
label:

No indels

type:

basic:boolean

description:

Disable (disallow) insertions and deletions in adapters.

default:

False

modify_reads.nextseq_trim
label:

NextSeq-specific quality trimming

type:

basic:integer

description:

NextSeq-specific quality trimming (each read). Also trims dark cycles appearing as high-quality G bases. This option is mutually exclusive with the use of regular (-q) quality trimming.

required:

False

modify_reads.leading
label:

Quality on 5 prime

type:

basic:integer

description:

Remove low quality bases from 5 prime. Specifies the minimum quality required to keep a base.

required:

False

modify_reads.trailing
label:

Quality on 3 prime

type:

basic:integer

description:

Remove low quality bases from the 3 prime. Specifies the minimum quality required to keep a base.

required:

False

modify_reads.crop
label:

Crop

type:

basic:integer

description:

Cut the specified number of bases from the end of the reads.

required:

False

modify_reads.headcrop
label:

Headcrop

type:

basic:integer

description:

Cut the specified number of bases from the start of the reads.

required:

False

filtering.minlen
label:

Min length

type:

basic:integer

description:

Drop the read if it is below a specified length.

required:

False

filtering.maxlen
label:

Max length

type:

basic:integer

description:

Drop the read if it is above a specified length.

required:

False

filtering.max_n
label:

Max number of N-s

type:

basic:integer

description:

Discard reads having more ‘N’ bases than specified.

required:

False

filtering.pair_filter
label:

Which of the reads have to match the filtering criterion

type:

basic:string

description:

Which of the reads in a paired-end read have to match the filtering criterion in order for the pair to be filtered.

default:

any

choices:

  • Any of the reads in a paired-end read have to match the filtering criterion: any

  • Both of the reads in a paired-end read have to match the filtering criterion: both
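
The two choices can be summarized in a short sketch. The function and argument names are hypothetical; each flag says whether the corresponding mate matches a filtering criterion such as being shorter than the minimum length.

```python
def pair_is_filtered(mate1_matches, mate2_matches, pair_filter="any"):
    """Decide whether a read pair is filtered out.

    mate1_matches / mate2_matches: True if that mate matches the filtering
    criterion (for example, it is shorter than the minimum length).
    """
    if pair_filter == "any":
        return mate1_matches or mate2_matches
    if pair_filter == "both":
        return mate1_matches and mate2_matches
    raise ValueError(f"unknown pair_filter: {pair_filter!r}")


# Example: with a minimum length of 20, mates of length 15 and 40.
min_len = 20
lengths = (15, 40)
flags = [length < min_len for length in lengths]
dropped_any = pair_is_filtered(flags[0], flags[1], "any")
dropped_both = pair_is_filtered(flags[0], flags[1], "both")
```

In this example the pair is discarded under `any` but retained under `both`, because only one mate is too short.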

Output results

fastq
label:

Reads file (forward)

type:

list:basic:file

fastq2
label:

Reads file (reverse)

type:

list:basic:file

report
label:

Cutadapt report

type:

basic:file

fastqc_url
label:

Quality control with FastQC (forward)

type:

list:basic:file:html

fastqc_url2
label:

Quality control with FastQC (reverse)

type:

list:basic:file:html

fastqc_archive
label:

Download FastQC archive (forward)

type:

list:basic:file

fastqc_archive2
label:

Download FastQC archive (reverse)

type:

list:basic:file

Cutadapt (single-end)

data:reads:fastq:single:cutadaptcutadapt-single (data:reads:fastq:single  reads, data:seq:nucleotide  up_primers_file, data:seq:nucleotide  down_primers_file, list:basic:string  up_primers_seq, list:basic:string  down_primers_seq, basic:integer  polya_tail, basic:integer  min_overlap, basic:integer  nextseq_trim, basic:integer  leading, basic:integer  trailing, basic:integer  crop, basic:integer  headcrop, basic:integer  minlen, basic:integer  maxlen, basic:integer  max_n, basic:boolean  match_read_wildcards, basic:boolean  no_indels, basic:integer  times, basic:decimal  error_rate)[Source: v2.5.2]

Cutadapt finds and removes adapter sequences, primers, poly-A tails and other types of unwanted sequence from high-throughput sequencing reads. More information about Cutadapt can be found [here](http://cutadapt.readthedocs.io/en/stable/).

Input arguments

reads
label:

Select sample(s)

type:

data:reads:fastq:single

adapters.up_primers_file
label:

5 prime adapter file

type:

data:seq:nucleotide

required:

False

adapters.down_primers_file
label:

3 prime adapter file

type:

data:seq:nucleotide

required:

False

adapters.up_primers_seq
label:

5 prime adapter sequence

type:

list:basic:string

required:

False

adapters.down_primers_seq
label:

3 prime adapter sequence

type:

list:basic:string

required:

False

adapters.polya_tail
label:

Poly-A tail

type:

basic:integer

description:

Length of poly-A tail, example - AAAN -> 3, AAAAAN -> 5

required:

False

adapters.min_overlap
label:

Minimal overlap

type:

basic:integer

description:

Minimum overlap for an adapter match

default:

3

modify_reads.nextseq_trim
label:

NextSeq-specific quality trimming

type:

basic:integer

description:

NextSeq-specific quality trimming (each read). Also trims dark cycles appearing as high-quality G bases. This option is mutually exclusive with the use of regular (-q) quality trimming.

required:

False

modify_reads.leading
label:

Quality on 5 prime

type:

basic:integer

description:

Remove low quality bases from 5 prime. Specifies the minimum quality required to keep a base. This option is mutually exclusive with the use of NextSeq-specific quality trimming.

required:

False

modify_reads.trailing
label:

Quality on 3 prime

type:

basic:integer

description:

Remove low quality bases from the 3 prime. Specifies the minimum quality required to keep a base. This option is mutually exclusive with the use of NextSeq-specific quality trimming.

required:

False

modify_reads.crop
label:

Crop

type:

basic:integer

description:

Cut the read to a specified length by removing bases from the end

required:

False

modify_reads.headcrop
label:

Headcrop

type:

basic:integer

description:

Cut the specified number of bases from the start of the read

required:

False

filtering.minlen
label:

Min length

type:

basic:integer

description:

Drop the read if it is below a specified length

required:

False

filtering.maxlen
label:

Max length

type:

basic:integer

description:

Drop the read if it is above a specified length.

required:

False

filtering.max_n
label:

Max number of N-s

type:

basic:integer

description:

Discard reads having more ‘N’ bases than specified.

required:

False

filtering.match_read_wildcards
label:

Match read wildcards

type:

basic:boolean

description:

Interpret IUPAC wildcards in reads.

required:

False

default:

False

filtering.no_indels
label:

No indels

type:

basic:boolean

description:

Disable (disallow) insertions and deletions in adapters.

default:

False

filtering.times
label:

Times

type:

basic:integer

description:

Remove up to COUNT adapters from each read.

default:

1

filtering.error_rate
label:

Error rate

type:

basic:decimal

description:

Maximum allowed error rate (no. of errors divided by the length of the matching region).

default:

0.1

Output results

fastq
label:

Reads file

type:

list:basic:file

report
label:

Cutadapt report

type:

basic:file

fastqc_url
label:

Quality control with FastQC

type:

list:basic:file:html

fastqc_archive
label:

Download FastQC archive

type:

list:basic:file

Cutadapt - STAR - StringTie (Corall, paired-end)

data:workflow:rnaseq:corallworkflow-corall-paired (data:reads:fastq:paired  reads, data:index:star  star_index, data:annotation  annotation, data:index:star  rrna_reference, data:index:star  globin_reference, basic:integer  quality_cutoff, basic:integer  n_reads, basic:integer  seed, basic:decimal  fraction, basic:boolean  two_pass, basic:string  feature_class, basic:string  id_attribute)[Source: v5.2.0]

RNA-seq pipeline optimized for the Lexogen Corall Total RNA-Seq Library Prep Kit. UMI-sequences are extracted from the raw reads before the reads are trimmed and quality filtered using Cutadapt. Preprocessed reads are aligned by the STAR aligner and de-duplicated using UMI-tools. Gene abundance estimates are reported by the featureCounts tool. QC operates on downsampled reads and includes alignment of input reads to the rRNA/globin reference sequences. The reported alignment rate is used to assess the rRNA/globin sequence depletion rate. The analysis results and QC reports are summarized by MultiQC.

Input arguments

reads
label:

Select sample(s)

type:

data:reads:fastq:paired

star_index
label:

Genome

type:

data:index:star

description:

Genome index prepared by STAR aligner indexing tool.

annotation
label:

Annotation

type:

data:annotation

description:

Genome annotation file (GTF).

rrna_reference
label:

Indexed rRNA reference sequence

type:

data:index:star

description:

Reference sequence index prepared by STAR aligner indexing tool.

globin_reference
label:

Indexed Globin reference sequence

type:

data:index:star

description:

Reference sequence index prepared by STAR aligner indexing tool.

cutadapt.quality_cutoff
label:

Reads quality cutoff

type:

basic:integer

description:

Trim low-quality bases from 3’ end of each read before adapter removal. Use this option when processing the data generated by older Illumina machines. The use of this option will override the NextSeq/NovaSeq-specific trimming procedure which is enabled by default and is recommended for Illumina machines that utilize 2-color chemistry to encode the four bases.

required:

False

downsampling.n_reads
label:

Number of reads

type:

basic:integer

default:

1000000

downsampling.seed
label:

Seed

type:

basic:integer

default:

11

downsampling.fraction
label:

Fraction

type:

basic:decimal

description:

Use the fraction of reads in range [0.0, 1.0] from the original input file instead of the absolute number of reads. If set, this will override the “Number of reads” input parameter.

required:

False

downsampling.two_pass
label:

2-pass mode

type:

basic:boolean

description:

Enable two-pass mode when down-sampling. Two-pass mode is twice as slow but uses much less memory.

default:

False
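
The interaction between the down-sampling parameters can be sketched as below. This section does not name the tool the workflow uses, so reservoir sampling (for a fixed number of reads) and independent per-read sampling (for a fraction) are stand-in techniques, and the function name is hypothetical. Two-pass mode is not modelled.

```python
import random


def downsample(reads, n_reads=1000000, seed=11, fraction=None):
    """Return a reproducible random subset of `reads`.

    If `fraction` is set it overrides `n_reads`, mirroring the
    "Fraction" parameter description.
    """
    rng = random.Random(seed)  # a fixed seed makes the subset reproducible
    if fraction is not None:
        if not 0.0 <= fraction <= 1.0:
            raise ValueError("fraction must be in [0.0, 1.0]")
        return [read for read in reads if rng.random() < fraction]
    # Reservoir sampling: keeps n_reads items in memory in a single pass.
    reservoir = []
    for i, read in enumerate(reads):
        if i < n_reads:
            reservoir.append(read)
        else:
            j = rng.randint(0, i)
            if j < n_reads:
                reservoir[j] = read
    return reservoir
```

Running the function twice with the same seed gives the same subset, which is the purpose of exposing the seed as a parameter.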

quantification.feature_class
label:

Feature class

type:

basic:string

description:

Feature class (3rd column in GTF/GFF3 file) to be used. All other features will be ignored.

default:

exon

quantification.id_attribute
label:

ID attribute

type:

basic:string

description:

GTF/GFF3 attribute to be used as feature ID. Several GTF/GFF3 lines with the same feature ID will be considered as parts of the same feature. The feature ID is used to identify the counts in the output table. In GTF files this is usually ‘gene_id’, in GFF3 files this is often ‘ID’, and ‘transcript_id’ is frequently a valid choice for both annotation formats.

default:

gene_id

choices:

  • gene_id: gene_id

  • transcript_id: transcript_id

  • ID: ID

  • geneid: geneid

Output results

Cutadapt - STAR - StringTie (Corall, single-end)

data:workflow:rnaseq:corallworkflow-corall-single (data:reads:fastq:single  reads, data:index:star  star_index, data:annotation  annotation, data:index:star  rrna_reference, data:index:star  globin_reference, basic:integer  quality_cutoff, basic:integer  n_reads, basic:integer  seed, basic:decimal  fraction, basic:boolean  two_pass, basic:string  feature_class, basic:string  id_attribute)[Source: v5.2.0]

RNA-seq pipeline optimized for the Lexogen Corall Total RNA-Seq Library Prep Kit. UMI-sequences are extracted from the raw reads before the reads are trimmed and quality filtered using Cutadapt. Preprocessed reads are aligned by the STAR aligner and de-duplicated using UMI-tools. Gene abundance estimates are reported by the featureCounts tool. QC operates on downsampled reads and includes alignment of input reads to the rRNA/globin reference sequences. The reported alignment rate is used to assess the rRNA/globin sequence depletion rate. The analysis results and QC reports are summarized by MultiQC.

Input arguments

reads
label:

Select sample(s)

type:

data:reads:fastq:single

star_index
label:

Genome

type:

data:index:star

description:

Genome index prepared by STAR aligner indexing tool.

annotation
label:

Annotation

type:

data:annotation

description:

Genome annotation file (GTF).

rrna_reference
label:

Indexed rRNA reference sequence

type:

data:index:star

description:

Reference sequence index prepared by STAR aligner indexing tool.

globin_reference
label:

Indexed Globin reference sequence

type:

data:index:star

description:

Reference sequence index prepared by STAR aligner indexing tool.

cutadapt.quality_cutoff
label:

Reads quality cutoff

type:

basic:integer

description:

Trim low-quality bases from 3’ end of each read before adapter removal. Use this option when processing the data generated by older Illumina machines. The use of this option will override the NextSeq/NovaSeq-specific trimming procedure which is enabled by default and is recommended for Illumina machines that utilize 2-color chemistry to encode the four bases.

required:

False

downsampling.n_reads
label:

Number of reads

type:

basic:integer

default:

1000000

downsampling.seed
label:

Seed

type:

basic:integer

default:

11

downsampling.fraction
label:

Fraction

type:

basic:decimal

description:

Use the fraction of reads in range [0.0, 1.0] from the original input file instead of the absolute number of reads. If set, this will override the “Number of reads” input parameter.

required:

False

downsampling.two_pass
label:

2-pass mode

type:

basic:boolean

description:

Enable two-pass mode when down-sampling. Two-pass mode is twice as slow but uses much less memory.

default:

False

quantification.feature_class
label:

Feature class

type:

basic:string

description:

Feature class (3rd column in GTF/GFF3 file) to be used. All other features will be ignored.

default:

exon

quantification.id_attribute
label:

ID attribute

type:

basic:string

description:

GTF/GFF3 attribute to be used as feature ID. Several GTF/GFF3 lines with the same feature ID will be considered as parts of the same feature. The feature ID is used to identify the counts in the output table. In GTF files this is usually ‘gene_id’, in GFF3 files this is often ‘ID’, and ‘transcript_id’ is frequently a valid choice for both annotation formats.

default:

gene_id

choices:

  • gene_id: gene_id

  • transcript_id: transcript_id

  • ID: ID

  • geneid: geneid
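
How the feature class and ID attribute jointly select and group annotation lines can be sketched as follows. This is an illustration only, not featureCounts code: it handles GTF-style attributes (`key "value";`) and not GFF3 `key=value` pairs, and the function name and returned tuple layout are assumptions.

```python
import re
from collections import defaultdict

# Matches GTF attributes of the form: key "value"
ATTRIBUTE_RE = re.compile(r'(\S+) "([^"]*)"')


def group_features(gtf_lines, feature_class="exon", id_attribute="gene_id"):
    """Group GTF lines of one feature class by the chosen ID attribute.

    Returns a dict mapping feature ID to a list of
    (chrom, start, end, strand) tuples.
    """
    groups = defaultdict(list)
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        # Column 3 holds the feature class; all other classes are ignored.
        if len(fields) < 9 or fields[2] != feature_class:
            continue
        attributes = dict(ATTRIBUTE_RE.findall(fields[8]))
        feature_id = attributes.get(id_attribute)
        if feature_id is not None:
            groups[feature_id].append(
                (fields[0], int(fields[3]), int(fields[4]), fields[6])
            )
    return dict(groups)
```

Two exon lines that share the same `gene_id` end up under one key, which is what "considered as parts of the same feature" means for counting.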

Output results

DESeq2

data:differentialexpression:deseq2:differentialexpression-deseq2 (list:data:expression  case, list:data:expression  control, basic:boolean  create_sets, basic:decimal  logfc, basic:decimal  fdr, basic:boolean  beta_prior, basic:boolean  count, basic:integer  min_count_sum, basic:boolean  cook, basic:decimal  cooks_cutoff, basic:boolean  independent, basic:decimal  alpha)[Source: v3.6.0]

Run DESeq2 analysis. The DESeq2 package estimates variance-mean dependence in count data from high-throughput sequencing assays and tests for differential expression based on a model using the negative binomial distribution. See [here](https://www.bioconductor.org/packages/release/bioc/manuals/DESeq2/man/DESeq2.pdf) and [here](http://bioconductor.org/packages/devel/bioc/vignettes/DESeq2/inst/doc/DESeq2.html) for more information.

Input arguments

case
label:

Case

type:

list:data:expression

description:

Case samples (replicates)

required:

True

disabled:

False

hidden:

False

control
label:

Control

type:

list:data:expression

description:

Control samples (replicates)

required:

True

disabled:

False

hidden:

False

create_sets
label:

Create gene sets

type:

basic:boolean

description:

After calculating differential gene expressions create gene sets for up-regulated genes, down-regulated genes and all genes.

required:

True

disabled:

False

hidden:

False

default:

False

logfc
label:

Log2 fold change threshold for gene sets

type:

basic:decimal

description:

Genes above Log2FC are considered as up-regulated and genes below -Log2FC as down-regulated.

required:

True

disabled:

False

hidden:

!create_sets

default:

1.0

fdr
label:

FDR threshold for gene sets

type:

basic:decimal

required:

True

disabled:

False

hidden:

!create_sets

default:

0.05
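
The gene-set thresholds can be illustrated with a short sketch. The log2 fold change rule follows the description above; requiring the adjusted p-value to be strictly below the FDR threshold is an assumption, as are the function name and the input layout.

```python
def make_gene_sets(results, logfc=1.0, fdr=0.05):
    """Split differential expression results into gene sets.

    results: iterable of (gene_id, log2_fold_change, adjusted_p_value).
    Returns a dict with 'up', 'down' and 'all' lists of gene IDs.
    """
    gene_sets = {"up": [], "down": [], "all": []}
    for gene_id, log2fc, padj in results:
        gene_sets["all"].append(gene_id)
        # Assumption: genes without a p-value or at/above the FDR are skipped.
        if padj is None or padj >= fdr:
            continue
        if log2fc > logfc:
            gene_sets["up"].append(gene_id)
        elif log2fc < -logfc:
            gene_sets["down"].append(gene_id)
    return gene_sets
```

A gene with a large fold change but an adjusted p-value above the threshold appears only in the `all` set.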

options.beta_prior
label:

Beta prior

type:

basic:boolean

description:

Whether or not to put a zero-mean normal prior on the non-intercept coefficients.

required:

True

disabled:

False

hidden:

False

default:

False

filter_options.count
label:

Filter genes based on expression count

type:

basic:boolean

required:

True

disabled:

False

hidden:

False

default:

True

filter_options.min_count_sum
label:

Minimum gene expression count summed over all samples

type:

basic:integer

description:

Filter genes in the expression matrix input. Remove genes where the expression count sum over all samples is below the threshold.

required:

True

disabled:

False

hidden:

!filter_options.count

default:

10
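
The count-sum filter described above amounts to a single pass over the expression matrix. A minimal sketch, assuming the matrix is held as a mapping from gene ID to per-sample counts (the data layout and function name are illustrative):

```python
def filter_low_counts(count_matrix, min_count_sum=10):
    """Drop genes whose counts summed over all samples are below the threshold.

    count_matrix: dict mapping gene ID to a list of counts, one per sample.
    """
    return {
        gene: counts
        for gene, counts in count_matrix.items()
        if sum(counts) >= min_count_sum
    }
```

Genes whose sum equals the threshold are kept, since only sums below it are removed.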

filter_options.cook
label:

Filter genes based on Cook’s distance

type:

basic:boolean

required:

True

disabled:

False

hidden:

False

default:

False

filter_options.cooks_cutoff
label:

Threshold on Cook’s distance

type:

basic:decimal

description:

If one or more samples have Cook’s distance larger than the threshold set here, the p-value for the row is set to NA. If left empty, the default threshold of 0.99 quantile of the F(p, m-p) distribution is used, where p is the number of coefficients being fitted and m is the number of samples. This test excludes Cook’s distance of samples belonging to experimental groups with only two samples.

required:

False

disabled:

False

hidden:

!filter_options.cook

filter_options.independent
label:

Apply independent gene filtering

type:

basic:boolean

required:

True

disabled:

False

hidden:

False

default:

True

filter_options.alpha
label:

Significance cut-off used for optimizing independent gene filtering

type:

basic:decimal

description:

The value should be set to adjusted p-value cut-off (FDR).

required:

True

disabled:

False

hidden:

!filter_options.independent

default:

0.1

Output results

raw
label:

Differential expression

type:

basic:file

required:

True

disabled:

False

hidden:

False

de_json
label:

Results table (JSON)

type:

basic:json

required:

True

disabled:

False

hidden:

False

de_file
label:

Results table (file)

type:

basic:file

required:

True

disabled:

False

hidden:

False

count_matrix
label:

Count matrix

type:

basic:file

required:

True

disabled:

False

hidden:

False

count_matrix_normalized
label:

Normalized count matrix (median of ratios)

type:

basic:file

required:

True

disabled:

False

hidden:

False
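
The median-of-ratios normalization named in the label above can be sketched in plain Python. This is a simplified illustration of the general method, not DESeq2's code: the function names and the dict-based matrix layout are assumptions, and genes with a zero count in any sample are skipped when estimating size factors.

```python
import math
from statistics import median


def size_factors(counts):
    """Estimate one size factor per sample by the median-of-ratios method.

    counts: dict mapping gene ID to a list of raw counts, one per sample.
    Requires at least one gene with no zero counts.
    """
    n_samples = len(next(iter(counts.values())))
    log_ratios = [[] for _ in range(n_samples)]
    for row in counts.values():
        if any(c <= 0 for c in row):
            continue  # the log geometric mean is undefined for zero counts
        log_geo_mean = sum(math.log(c) for c in row) / n_samples
        for j, c in enumerate(row):
            log_ratios[j].append(math.log(c) - log_geo_mean)
    return [math.exp(median(ratios)) for ratios in log_ratios]


def normalize_counts(counts):
    """Divide each sample's counts by that sample's size factor."""
    factors = size_factors(counts)
    return {
        gene: [c / f for c, f in zip(row, factors)]
        for gene, row in counts.items()
    }
```

If one sample is sequenced twice as deeply as another, its size factor is twice as large and the normalized counts of the two samples become comparable.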

source
label:

Gene ID database

type:

basic:string

required:

True

disabled:

False

hidden:

False

species
label:

Species

type:

basic:string

required:

True

disabled:

False

hidden:

False

build
label:

Build

type:

basic:string

required:

True

disabled:

False

hidden:

False

feature_type
label:

Feature type

type:

basic:string

required:

True

disabled:

False

hidden:

False

Detect library strandedness

data:strandednesslibrary-strandedness (data:reads:fastq  reads, basic:integer  read_number, data:index:salmon  salmon_index)[Source: v0.6.2]

This process uses the Salmon transcript quantification tool to automatically infer the NGS library strandedness. For more details, please see the Salmon [documentation](https://salmon.readthedocs.io/en/latest/library_type.html).

Input arguments

reads
label:

Sequencing reads

type:

data:reads:fastq

description:

Sequencing reads in .fastq format. Both single and paired-end libraries are supported

read_number
label:

Number of input reads

type:

basic:integer

description:

Number of sequencing reads that are subsampled from each of the original .fastq files before library strand detection

default:

50000

salmon_index
label:

Transcriptome index file

type:

data:index:salmon

description:

Transcriptome index file created using the Salmon indexing tool. The cDNA (transcriptome) sequences used to create the index file must be derived from the same species as the input sequencing reads to obtain reliable analysis results.

Output results

strandedness
label:

Library strandedness type

type:

basic:string

description:

The predicted library strandedness type. The codes U and IU indicate ‘strand non-specific’ library for single or paired-end reads, respectively. Codes SF and ISF correspond to the ‘strand-specific forward’ library, for the single or paired-end reads, respectively. For ‘strand-specific reverse’ library, the corresponding codes are SR and ISR. For more details, please see the Salmon [documentation](https://salmon.readthedocs.io/en/latest/library_type.html)
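
The codes listed above can be collected into a lookup table for downstream scripts. The table restates the description; the names `STRANDEDNESS_CODES` and `describe_strandedness` are hypothetical.

```python
# Code -> (read layout, strandedness), as described for the output field.
STRANDEDNESS_CODES = {
    "U": ("single-end", "strand non-specific"),
    "IU": ("paired-end", "strand non-specific"),
    "SF": ("single-end", "strand-specific forward"),
    "ISF": ("paired-end", "strand-specific forward"),
    "SR": ("single-end", "strand-specific reverse"),
    "ISR": ("paired-end", "strand-specific reverse"),
}


def describe_strandedness(code):
    """Return a human-readable description for a strandedness code."""
    layout, strandedness = STRANDEDNESS_CODES[code]
    return f"{strandedness} ({layout})"
```

An unknown code raises `KeyError`, so unexpected values are not silently accepted.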

fragment_ratio
label:

Compatible fragment ratio

type:

basic:decimal

description:

The ratio of fragments that support the predicted library strandedness type

log
label:

Log file

type:

basic:file

description:

Analysis log file.

Dictyostelium expressions

data:expression:polyaexpression-dicty (data:alignment:bam  alignment, data:annotation:gff3  gff, data:mappability:bcm  mappable)[Source: v1.4.2]

Dictyostelium-specific pipeline. Developed by Bioinformatics Laboratory, Faculty of Computer and Information Science, University of Ljubljana, Slovenia and Shaulsky Lab, Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA.

Input arguments

alignment
label:

Aligned sequence

type:

data:alignment:bam

gff
label:

Features (GFF3)

type:

data:annotation:gff3

mappable
label:

Mappability

type:

data:mappability:bcm

Output results

exp
label:

Expression RPKUM (polyA)

type:

basic:file

description:

mRNA reads scaled by uniquely mappable part of exons.

rpkmpolya
label:

Expression RPKM (polyA)

type:

basic:file

description:

mRNA reads scaled by exon length.

rc
label:

Read counts (polyA)

type:

basic:file

description:

mRNA reads uniquely mapped to gene exons.

rpkum
label:

Expression RPKUM

type:

basic:file

description:

Reads scaled by uniquely mappable part of exons.

rpkm
label:

Expression RPKM

type:

basic:file

description:

Reads scaled by exon length.
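
For orientation, the conventional RPKM calculation is sketched below. This is the standard textbook definition (reads per kilobase per million mapped reads) and is not confirmed as the exact formula this pipeline uses. RPKUM is assumed to follow the same shape with the uniquely mappable exon length in place of the full exon length.

```python
def rpkm(read_count, length_bp, total_mapped_reads):
    """Reads per kilobase of length per million mapped reads.

    For an RPKUM-style value, pass the uniquely mappable length
    as `length_bp` (an assumption about how RPKUM is derived).
    """
    if length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("length_bp and total_mapped_reads must be positive")
    return read_count * 1e9 / (length_bp * total_mapped_reads)


# Example: 500 reads on 2000 bp of exons, 10 million mapped reads in total.
value = rpkm(500, 2000, 10_000_000)
```

If only half of the exon length is uniquely mappable, the same read count gives a value twice as large, which is why the two measures can differ for genes with repetitive exons.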

rc_raw
label:

Read counts (raw)

type:

basic:file

description:

Reads uniquely mapped to gene exons.

exp_json
label:

Expression RPKUM (polyA) (json)

type:

basic:json

exp_type
label:

Expression Type (default output)

type:

basic:string

source
label:

Gene ID database

type:

basic:string

species
label:

Species

type:

basic:string

build
label:

Build

type:

basic:string

feature_type
label:

Feature type

type:

basic:string
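
The RPKM and RPKUM outputs above differ only in the length used for scaling: total exon length versus the uniquely mappable part of the exons. The sketch below uses the conventional reads-per-kilobase-per-million formula; the exact normalization applied by this pipeline is not stated here, so treat the function names and the formula as assumptions.

```python
def rpkm(read_count: int, exon_length_bp: int, total_mapped_reads: int) -> float:
    """Reads per kilobase of exon per million mapped reads (conventional definition)."""
    return read_count * 1e9 / (exon_length_bp * total_mapped_reads)


def rpkum(read_count: int, mappable_length_bp: int, total_mapped_reads: int) -> float:
    """Same scaling, but using only the uniquely mappable exon length (assumed)."""
    return read_count * 1e9 / (mappable_length_bp * total_mapped_reads)
```

For a gene with 100 reads, 1000 bp of exons of which 500 bp are uniquely mappable, and one million mapped reads, the two values differ by a factor of two.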

Differential Expression (table)

data:differentialexpression:uploadupload-diffexp (basic:file  src, basic:string  gene_id, basic:string  logfc, basic:string  fdr, basic:string  logodds, basic:string  fwer, basic:string  pvalue, basic:string  stat, basic:string  source, basic:string  species, basic:string  build, basic:string  feature_type, list:data:expression  case, list:data:expression  control)[Source: v1.5.1]

Upload Differential Expression table.

Input arguments

src
label:

Differential expression file

type:

basic:file

description:

Differential expression file. Supported file types: *.xls, *.xlsx, *.tab (tab-delimited file), *.diff. The DE file must contain a header row with column names and must include columns with log2(fold change) and FDR or p-value information. Accepts DESeq, DESeq2, edgeR and CuffDiff output files.

validate_regex:

\.(xls|xlsx|tab|tab.gz|diff|diff.gz)$
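
To check a file name against this pattern before uploading, something like the following could be used. This is a sketch: the pattern is copied verbatim from the field above, while the function name and the use of `re.search` are assumptions about how the validation is applied.

```python
import re

# Pattern copied verbatim from the validate_regex field above.
DE_FILE_PATTERN = re.compile(r"\.(xls|xlsx|tab|tab.gz|diff|diff.gz)$")


def is_supported_de_file(filename: str) -> bool:
    """Return True if the file name ends with one of the accepted extensions."""
    return DE_FILE_PATTERN.search(filename) is not None
```

Note that the dots inside `tab.gz` and `diff.gz` are unescaped, so they match any single character; a name such as `x.tabXgz` would also pass.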

gene_id
label:

Gene ID label

type:

basic:string

logfc
label:

LogFC label

type:

basic:string

fdr
label:

FDR label

type:

basic:string

required:

False

logodds
label:

LogOdds label

type:

basic:string

required:

False

fwer
label:

FWER label

type:

basic:string

required:

False

pvalue
label:

Pvalue label

type:

basic:string

required:

False

stat
label:

Statistics label

type:

basic:string

required:

False

source
label:

Gene ID database

type:

basic:string

choices:

  • AFFY: AFFY

  • DICTYBASE: DICTYBASE

  • ENSEMBL: ENSEMBL

  • NCBI: NCBI

  • UCSC: UCSC

species
label:

Species

type:

basic:string

description:

Species Latin name.

choices:

  • Homo sapiens: Homo sapiens

  • Mus musculus: Mus musculus

  • Rattus norvegicus: Rattus norvegicus

  • Dictyostelium discoideum: Dictyostelium discoideum

  • Odocoileus virginianus texanus: Odocoileus virginianus texanus

  • Solanum tuberosum: Solanum tuberosum

build
label:

Build

type:

basic:string

description:

Genome build or annotation version.

feature_type
label:

Feature type

type:

basic:string

default:

gene

choices:

  • gene: gene

  • transcript: transcript

  • exon: exon

case
label:

Case

type:

list:data:expression

description:

Case samples (replicates)

required:

False

control
label:

Control

type:

list:data:expression

description:

Control samples (replicates)

required:

False

Output results

raw
label:

Differential expression

type:

basic:file

de_json
label:

Results table (JSON)

type:

basic:json

de_file
label:

Results table (file)

type:

basic:file

source
label:

Gene ID database

type:

basic:string

species
label:

Species

type:

basic:string

build
label:

Build

type:

basic:string

feature_type
label:

Feature type

type:

basic:string

Differential expression of shRNA

data:shrna:differentialexpression:differentialexpression-shrna (data:file  parameter_file, list:data:expression:shrna2quant:  expression_data)[Source: v1.3.0]

Perform differential expression on a list of objects. Analysis starts by inputting a set of expression files (count matrices) and a parameter file. The parameter file is an xlsx file and consists of the following tabs:

  • `sample_key`: Should have a column `sample` with the exact sample name as in the input expression file(s), columns defining the treatment, and lastly a column which indicates the replicate.

  • `contrasts`: Defines the groups which will be used to perform differential expression analysis. The model for DE uses these contrasts and the replicate number. In R notation, this would be ` ~ 1 + group + replicate`. The table should have two columns named `group_1` and `group_2`.

  • `overall_contrasts`: This is a layer “above” `contrasts`, where results from two contrasts are compared for lethal, beneficial and neutral species. Thresholds governing the classification can be found in the `classification_parameters` tab.

  • `classification_parameters`: This tab holds three columns, `threshold`, `value` and `description`. Only the first two are used in the workflow; the description is for your benefit.

This process outputs DESeq2 results, results classified based on the provided thresholds, and counts of beneficial and lethal species.
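
Before uploading, the layout of the parameter file described above can be sanity-checked. The sketch below works on a plain mapping of tab names to column names (how the xlsx file is read is left out), and the helper name `check_parameter_tabs` is hypothetical. Only the column names mentioned in the description are checked.

```python
# Columns named explicitly in the description above; other columns
# (treatment, replicate, overall_contrasts layout) are not named there.
REQUIRED_COLUMNS = {
    "sample_key": {"sample"},
    "contrasts": {"group_1", "group_2"},
    "overall_contrasts": set(),
    "classification_parameters": {"threshold", "value"},
}


def check_parameter_tabs(tabs: dict[str, list[str]]) -> list[str]:
    """Return a list of problems found; an empty list means the layout looks fine."""
    problems = []
    for tab, required in REQUIRED_COLUMNS.items():
        if tab not in tabs:
            problems.append(f"missing tab: {tab}")
            continue
        missing = sorted(required - set(tabs[tab]))
        if missing:
            problems.append(f"tab {tab} lacks columns: {', '.join(missing)}")
    return problems
```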

Input arguments

parameter_file
label:

Excel parameter file (.xlsx)

type:

data:file

description:

Select .xlsx file which holds parameters for analysis. See [here](https://github.com/genialis/shRNAde/blob/master/inst/extdata/template_doDE_inputs.xlsx) for a template.

required:

True

disabled:

False

hidden:

False

expression_data
label:

List of expression files from shrna2quant

type:

list:data:expression:shrna2quant:

required:

True

disabled:

False

hidden:

False

Output results

deseq_results
label:

DESeq2 results

type:

basic:file

required:

True

disabled:

False

hidden:

False

class_results
label:

Results classified based on thresholds provided by the user

type:

basic:file

required:

True

disabled:

False

hidden:

False

beneficial_counts
label:

shRNAs considered as beneficial based on user input

type:

basic:file

required:

True

disabled:

False

hidden:

False

lethal_counts
label:

shRNAs considered as lethal based on user input

type:

basic:file

required:

True

disabled:

False

hidden:

False

Ensembl Variant Effect Predictor

data:variants:vcf:vep:ensembl-vep (data:variants:vcf  vcf, data:vep:cache  cache, data:seq:nucleotide  ref_seq, basic:integer  n_forks)[Source: v2.1.0]

Run Ensembl-VEP. VEP (Variant Effect Predictor) determines the effect of your variants (SNPs, insertions, deletions, CNVs or structural variants) on genes, transcripts, and protein sequence, as well as regulatory regions. This process accepts a VCF file and a VEP cache directory and produces a VCF file with annotated variants, its index, and a summary of the process.

Input arguments

vcf
label:

Input VCF file

type:

data:variants:vcf

required:

True

disabled:

False

hidden:

False

cache
label:

Cache directory for Ensembl-VEP

type:

data:vep:cache

required:

True

disabled:

False

hidden:

False

ref_seq
label:

Reference sequence

type:

data:seq:nucleotide

required:

True

disabled:

False

hidden:

False

n_forks
label:

Number of forks

type:

basic:integer

description:

Using forking enables VEP to run multiple parallel threads, with each thread processing a subset of your input. Forking can dramatically improve runtime.

required:

True

disabled:

False

hidden:

False

default:

2

Output results

vcf
label:

Annotated VCF file

type:

basic:file

required:

True

disabled:

False

hidden:

False

tbi
label:

Tabix index

type:

basic:file

required:

True

disabled:

False

hidden:

False

summary
label:

Summary of the analysis

type:

basic:file:html

required:

True

disabled:

False

hidden:

False

species
label:

Species

type:

basic:string

required:

True

disabled:

False

hidden:

False

build
label:

Build

type:

basic:string

required:

True

disabled:

False

hidden:

False

Ensembl-VEP cache directory

data:vep:cache:upload-vep-cache (basic:file  cache_file, basic:string  species, basic:string  build, basic:string  release)[Source: v1.1.0]

Import VEP cache directory.

Input arguments

cache_file
label:

Compressed cache directory

type:

basic:file

required:

True

disabled:

False

hidden:

False

species
label:

Species

type:

basic:string

description:

Select a species name from the dropdown menu.

required:

True

disabled:

False

hidden:

False

default:

Homo sapiens

choices:

  • Homo sapiens: Homo sapiens

  • Mus musculus: Mus musculus

  • Rattus norvegicus: Rattus norvegicus

build
label:

Genome build

type:

basic:string

required:

True

disabled:

False

hidden:

False

release
label:

Cache release

type:

basic:string

required:

True

disabled:

False

hidden:

False

Output results

cache
label:

Cache directory

type:

basic:dir

required:

True

disabled:

False

hidden:

False

species
label:

Species

type:

basic:string

required:

True

disabled:

False

hidden:

False

build
label:

Build

type:

basic:string

required:

True

disabled:

False

hidden:

False

release
label:

Cache release

type:

basic:string

required:

True

disabled:

False

hidden:

False

Expression Time Course

data:etcetc-bcm (list:data:expression  expressions, basic:boolean  avg)[Source: v1.2.2]

Select gene expression data and form a time course.

Input arguments

expressions
label:

RPKM expression profile

type:

list:data:expression

required:

True

avg
label:

Average by time

type:

basic:boolean

default:

True

Output results

etcfile
label:

Expression time course file

type:

basic:file

etc
label:

Expression time course

type:

basic:json

Expression aggregator

data:aggregator:expressionexpression-aggregator (list:data:expression  exps, basic:string  group_by, data:aggregator:expression  expr_aggregator)[Source: v0.5.1]

Collect expression data from samples grouped by sample descriptor field. The Expression aggregator process should not be run in Batch Mode, as this will create redundant outputs. Rather, select multiple samples below for which you wish to aggregate the expression matrix.

Input arguments

exps
label:

Expressions

type:

list:data:expression

group_by
label:

Sample descriptor field

type:

basic:string

expr_aggregator
label:

Expression aggregator

type:

data:aggregator:expression

required:

False

Output results

exp_matrix
label:

Expression matrix

type:

basic:file

box_plot
label:

Box plot

type:

basic:json

log_box_plot
label:

Log box plot

type:

basic:json

source
label:

Gene ID database

type:

basic:string

species
label:

Species

type:

basic:string

exp_type
label:

Expression type

type:

basic:string

Expression matrix

data:expressionsetmergeexpressions (list:data:expression  exps, list:basic:string  genes)[Source: v1.4.2]

Merge expression data to create an expression matrix where each column represents all the gene expression levels from a single experiment, and each row represents the expression of a gene across all experiments.
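
The matrix layout described above (one column per experiment, one row per gene) can be illustrated with a small sketch. The function name, the in-memory input format and the handling of missing genes are assumptions made for illustration; they do not describe the process's actual file format.

```python
def merge_expressions(samples, genes=None):
    """Merge per-sample {gene: value} mappings into (header, rows).

    samples: mapping of sample name -> {gene ID: expression value}
    genes:   optional list of gene IDs to keep (cf. the "Filter genes" input)
    """
    sample_names = list(samples)
    all_genes = sorted({gene for expr in samples.values() for gene in expr})
    if genes:
        wanted = set(genes)
        all_genes = [gene for gene in all_genes if gene in wanted]
    header = ["Gene"] + sample_names
    # A gene missing from a sample is reported as None in that column.
    rows = [[gene] + [samples[name].get(gene) for name in sample_names]
            for gene in all_genes]
    return header, rows
```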

Input arguments

exps
label:

Gene expressions

type:

list:data:expression

genes
label:

Filter genes

type:

list:basic:string

required:

False

Output results

expset
label:

Expression set

type:

basic:file

expset_type
label:

Expression set type

type:

basic:string

Expression time course

data:etcupload-etc (basic:file  src)[Source: v1.4.1]

Upload Expression time course.

Input arguments

src
label:

Expression time course file (xls or tab)

type:

basic:file

description:

Expression time course

required:

True

validate_regex:

\.(xls|xlsx|tab)$

Output results

etcfile
label:

Expression time course file

type:

basic:file

etc
label:

Expression time course

type:

basic:json

FASTA file

data:seq:nucleotide:upload-fasta-nucl (basic:file  src, basic:string  species, basic:string  build)[Source: v3.2.0]

Import a nucleotide sequence file in FASTA format. FASTA is a text-based format for representing nucleotide sequences, in which nucleotides are represented using single-letter codes. The uploaded FASTA file can hold multiple nucleotide sequences.

Input arguments

src
label:

Sequence file (FASTA)

type:

basic:file

required:

True

disabled:

False

hidden:

False

species
label:

Species

type:

basic:string

description:

Select a species name from the dropdown menu or write a custom species name in the species field. For sequences that are not related to any particular species (e.g. adapters file), you can select the value Other.

required:

True

disabled:

False

hidden:

False

choices:

  • Homo sapiens: Homo sapiens

  • Mus musculus: Mus musculus

  • Rattus norvegicus: Rattus norvegicus

  • Macaca mulatta: Macaca mulatta

  • Dictyostelium discoideum: Dictyostelium discoideum

  • Other: Other

build
label:

Genome build

type:

basic:string

description:

Enter the genome build information associated with the uploaded sequence(s).

required:

True

disabled:

False

hidden:

False

Output results

fastagz
label:

FASTA file (compressed)

type:

basic:file

required:

True

disabled:

False

hidden:

False

fasta
label:

FASTA file

type:

basic:file

required:

True

disabled:

False

hidden:

False

fai
label:

FASTA file index

type:

basic:file

required:

True

disabled:

False

hidden:

False

fasta_dict
label:

FASTA dictionary

type:

basic:file

required:

True

disabled:

False

hidden:

False

num_seqs
label:

Number of sequences

type:

basic:integer

required:

True

disabled:

False

hidden:

False

species
label:

Species

type:

basic:string

required:

True

disabled:

False

hidden:

False

build
label:

Build

type:

basic:string

required:

True

disabled:

False

hidden:

False

FASTQ file (paired-end)

data:reads:fastq:paired:upload-fastq-paired (list:basic:file  src1, list:basic:file  src2, basic:boolean  merge_lanes)[Source: v2.6.0]

Import paired-end reads in FASTQ format, a text-based format for storing both a biological sequence (usually a nucleotide sequence) and its corresponding quality scores.
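
To make the FASTQ layout concrete, here is a minimal reader sketch. It assumes the common four-line record layout (header, sequence, `+` separator, quality string) without line wrapping; the function name `read_fastq` is hypothetical.

```python
import io
from typing import Iterator


def read_fastq(handle) -> Iterator[tuple[str, str, str]]:
    """Yield (read name, sequence, quality string) from a four-line FASTQ stream."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return  # end of file
        sequence = handle.readline().rstrip("\n")
        handle.readline()  # '+' separator line, ignored
        quality = handle.readline().rstrip("\n")
        if not header.startswith("@") or len(sequence) != len(quality):
            raise ValueError(f"malformed FASTQ record: {header!r}")
        yield header[1:], sequence, quality


# Usage with an in-memory record:
records = list(read_fastq(io.StringIO("@read1\nACGT\n+\nIIII\n")))
```

Each quality character encodes the score of the base at the same position, which is why the two strings must have equal length.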

Input arguments

src1
label:

Mate1

type:

list:basic:file

description:

Sequencing reads in FASTQ format. Supported extensions: .fastq.gz (preferred), .fq.* or .fastq.*

required:

True

disabled:

False

hidden:

False

src2
label:

Mate2

type:

list:basic:file

description:

Sequencing reads in FASTQ format. Supported extensions: .fastq.gz (preferred), .fq.* or .fastq.*

required:

True

disabled:

False

hidden:

False

merge_lanes
label:

Merge lanes

type:

basic:boolean

description:

Merge sample data split into multiple sequencing lanes into a single FASTQ file.

required:

True

disabled:

False

hidden:

False

default:

False

Output results

fastq
label:

Reads file (mate 1)

type:

list:basic:file

required:

True

disabled:

False

hidden:

False

fastq2
label:

Reads file (mate 2)

type:

list:basic:file

required:

True

disabled:

False

hidden:

False

fastqc_url
label:

Quality control with FastQC (Upstream)

type:

list:basic:file:html

required:

True

disabled:

False

hidden:

False

fastqc_url2
label:

Quality control with FastQC (Downstream)

type:

list:basic:file:html

required:

True

disabled:

False

hidden:

False

fastqc_archive
label:

Download FastQC archive (Upstream)

type:

list:basic:file

required:

True

disabled:

False

hidden:

False

fastqc_archive2
label:

Download FastQC archive (Downstream)

type:

list:basic:file

required:

True

disabled:

False

hidden:

False

FASTQ file (single-end)

data:reads:fastq:single:upload-fastq-single (list:basic:file  src, basic:boolean  merge_lanes)[Source: v2.6.0]

Import single-end reads in FASTQ format, a text-based format for storing both a biological sequence (usually a nucleotide sequence) and its corresponding quality scores.

Input arguments

src
label:

Reads

type:

list:basic:file

description:

Sequencing reads in FASTQ format. Supported extensions: .fastq.gz (preferred), .fq.* or .fastq.*

required:

True

disabled:

False

hidden:

False

merge_lanes
label:

Merge lanes

type:

basic:boolean

description:

Merge sample data split into multiple sequencing lanes into a single FASTQ file.

required:

True

disabled:

False

hidden:

False

default:

False
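
One way to picture lane merging: gzip files can be concatenated byte-for-byte and the result is still a valid gzip stream. The sketch below relies on that property; it is an illustration only, and nothing here claims that the process merges lanes this way. The function name `merge_lanes` is hypothetical.

```python
import shutil


def merge_lanes(lane_paths, merged_path):
    """Concatenate gzip-compressed lane files into one gzip file.

    Lanes are written in the order given. For paired-end data, both mates
    should use the same lane order so that read pairs stay in sync.
    """
    with open(merged_path, "wb") as merged:
        for path in lane_paths:
            with open(path, "rb") as lane:
                shutil.copyfileobj(lane, merged)
```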

Output results

fastq
label:

Reads file

type:

list:basic:file

required:

True

disabled:

False

hidden:

False

fastqc_url
label:

Quality control with FastQC

type:

list:basic:file:html

required:

True

disabled:

False

hidden:

False

fastqc_archive
label:

Download FastQC archive

type:

list:basic:file

required:

True

disabled:

False

hidden:

False

Find similar genes

data:similarexpression:find-similar (list:data:expression  expressions, basic:string  gene, basic:string  distance)[Source: v1.3.1]

Find genes whose expression profile over time is similar to that of the query gene.

Input arguments

expressions
label:

Time series relation

type:

list:data:expression

description:

Select the time course to which the expressions belong.

required:

True

disabled:

False

hidden:

False

gene
label:

Query gene

type:

basic:string

description:

Select a gene to which others are compared.

required:

True

disabled:

False

hidden:

False

distance
label:

Distance metric

type:

basic:string

required:

True

disabled:

False

hidden:

False

default:

spearman

choices:

  • Euclidean: euclidean

  • Spearman: spearman

  • Pearson: pearson

Output results

similar_genes
label:

Similar genes

type:

basic:json

required:

True

disabled:

False

hidden:

False

source
label:

Gene ID database

type:

basic:string

required:

True

disabled:

False

hidden:

False

species
label:

Species

type:

basic:string

required:

True

disabled:

False

hidden:

False

build
label:

Build

type:

basic:string

required:

True

disabled:

False

hidden:

False

feature_type
label:

Feature type

type:

basic:string

required:

True

disabled:

False

hidden:

False

GAF file

data:gaf:2:0upload-gaf (basic:file  src, basic:string  source, basic:string  species)[Source: v1.4.0]

GO annotation file (GAF v2.0) relating gene IDs to their associated GO terms.
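
As an illustration of the gene-to-GO-term relation stored in a GAF v2.0 file, the sketch below reads the 17 tab-separated columns of the format and collects GO IDs per gene product. Using the DB Object ID column as the gene ID is an assumption, and the function and column key names are hypothetical.

```python
# The 17 columns of GAF 2.0, in order.
GAF_COLUMNS = [
    "db", "db_object_id", "db_object_symbol", "qualifier", "go_id",
    "db_reference", "evidence_code", "with_from", "aspect",
    "db_object_name", "db_object_synonym", "db_object_type", "taxon",
    "date", "assigned_by", "annotation_extension", "gene_product_form_id",
]


def parse_gaf(lines):
    """Return {gene ID: set of GO IDs}; lines starting with '!' are comments."""
    associations = {}
    for line in lines:
        if not line.strip() or line.startswith("!"):
            continue
        fields = line.rstrip("\n").split("\t")
        record = dict(zip(GAF_COLUMNS, fields))
        associations.setdefault(record["db_object_id"], set()).add(record["go_id"])
    return associations
```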

Input arguments

src
label:

GO annotation file (GAF v2.0)

type:

basic:file

description:

Upload a GO annotation file (GAF v2.0) relating gene IDs to their associated GO terms.

source
label:

Gene ID database

type:

basic:string

choices:

  • AFFY: AFFY

  • DICTYBASE: DICTYBASE

  • ENSEMBL: ENSEMBL

  • MGI: MGI

  • NCBI: NCBI

  • UCSC: UCSC

  • UniProtKB: UniProtKB

species
label:

Species

type:

basic:string

Output results

gaf
label:

GO annotation file (GAF v2.0)

type:

basic:file

gaf_obj
label:

GAF object

type:

basic:file

source
label:

Gene ID database

type:

basic:string

species
label:

Species

type:

basic:string

GATK GenomicsDBImport

data:genomicsdb:gatk-genomicsdb-import (list:data:variants:gvcf  gvcfs, data:bed  intervals, basic:boolean  use_existing, data:genomicsdb  existing_db, basic:integer  batch_size, basic:boolean  consolidate, basic:integer  max_heap_size, basic:boolean  use_cms_gc)[Source: v1.3.0]

Import single-sample GVCFs into GenomicsDB before joint genotyping.

Input arguments

gvcfs
label:

Input data (GVCF)

type:

list:data:variants:gvcf

required:

True

disabled:

False

hidden:

False

intervals
label:

Intervals file (.bed)

type:

data:bed

description:

Intervals file is required if a new database will be created.

required:

False

disabled:

False

hidden:

False

use_existing
label:

Add new samples to an existing GenomicsDB workspace

type:

basic:boolean

required:

True

disabled:

False

hidden:

False

default:

False

existing_db
label:

Select a GATK GenomicsDB object

type:

data:genomicsdb

description:

Instead of creating a new database the GVCFs are added to this database and a new GenomicsDB object is created.

required:

False

disabled:

False

hidden:

!use_existing

advanced_options.batch_size
label:

Batch size

type:

basic:integer

description:

Batch size controls the number of samples for which readers are open at once and therefore provides a way to minimize memory consumption. However, it can take longer to complete. Use the consolidate flag if more than a hundred batches were used. This will improve feature read time. batchSize=0 means no batching (i.e. readers for all samples will be opened at once).

required:

True

disabled:

False

hidden:

False

default:

0
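
The relationship between batch size, number of batches and the consolidate recommendation can be worked through with a short sketch. The function name `plan_batches` is hypothetical; the threshold of one hundred batches is taken from the description above.

```python
import math


def plan_batches(n_samples: int, batch_size: int) -> tuple[int, bool]:
    """Return (number of batches, whether consolidation is advisable)."""
    if batch_size < 0:
        raise ValueError("batch_size must be zero or positive")
    # batch_size == 0 means no batching: all samples are read at once.
    n_batches = 1 if batch_size == 0 else math.ceil(n_samples / batch_size)
    # The description suggests consolidating when more than a hundred batches were used.
    return n_batches, n_batches > 100
```

For example, 1010 samples with a batch size of 10 yield 101 batches, which crosses the threshold.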

advanced_options.consolidate
label:

Consolidate

type:

basic:boolean

description:

Boolean flag to enable consolidation. If importing data in batches, a new fragment is created for each batch. In case thousands of fragments are created, GenomicsDB feature readers will try to open ~20x as many files. Also, internally GenomicsDB would consume more memory to maintain bookkeeping data from all fragments. Use this flag to merge all fragments into one. Merging can potentially improve read performance, however overall benefit might not be noticeable as the top Java layers have significantly higher overheads. This flag has no effect if only one batch is used.

required:

True

disabled:

False

hidden:

False

default:

False

advanced_options.max_heap_size
label:

Java maximum heap size in GB (Xmx)

type:

basic:integer

description:

Set the maximum Java heap size.

required:

True

disabled:

False

hidden:

False

default:

28

advanced_options.use_cms_gc
label:

Use CMS Garbage Collector in Java

type:

basic:boolean

description:

The Concurrent Mark Sweep (CMS) implementation uses multiple garbage collector threads for garbage collection.

required:

True

disabled:

False

hidden:

False

default:

True
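
The two Java settings above correspond to standard JVM flags: `-Xmx` for the maximum heap size and `-XX:+UseConcMarkSweepGC` for the CMS collector. The sketch below only assembles the option string; how the process actually passes these options to GATK is an assumption, and the function name is hypothetical.

```python
def build_java_options(max_heap_size_gb: int, use_cms_gc: bool = True) -> str:
    """Assemble a JVM option string from the two settings described above."""
    options = [f"-Xmx{max_heap_size_gb}g"]
    if use_cms_gc:
        options.append("-XX:+UseConcMarkSweepGC")
    return " ".join(options)
```

With the defaults listed above (28 GB, CMS enabled) this gives `-Xmx28g -XX:+UseConcMarkSweepGC`. Note that the CMS collector was removed from the JDK in version 14, so the flag applies only to older Java runtimes.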

Output results

database
label:

GenomicsDB workspace

type:

basic:dir

required:

True

disabled:

False

hidden:

False

intervals
label:

Intervals file

type:

basic:file

required:

True

disabled:

False

hidden:

False

species
label:

Species

type:

basic:string

required:

True

disabled:

False

hidden:

False

build
label:

Build

type:

basic:string

required:

True

disabled:

False

hidden:

False

GATK GenotypeGVCFs

data:variants:vcf:genotypegvcfs:gatk-genotype-gvcfs (data:genomicsdb  database, data:seq:nucleotide  ref_seq, data:variants:vcf  dbsnp, basic:integer  n_jobs, basic:integer  max_heap_size)[Source: v2.3.0]

Consolidate GVCFs and run joint calling using GenotypeGVCFs tool.

Input arguments

database
label:

GATK GenomicsDB

type:

data:genomicsdb

required:

True

disabled:

False

hidden:

False

ref_seq
label:

Reference sequence

type:

data:seq:nucleotide

required:

True

disabled:

False

hidden:

False

dbsnp
label:

dbSNP file

type:

data:variants:vcf

required:

True

disabled:

False

hidden:

False

advanced_options.n_jobs
label:

Number of concurrent jobs

type:

basic:integer

description:

Use a fixed number of jobs for genotyping instead of determining it based on the number of available cores.

required:

False

disabled:

False

hidden:

False

advanced_options.max_heap_size
label:

Java maximum heap size in GB (Xmx)

type:

basic:integer

description:

Set the maximum Java heap size.

required:

True

disabled:

False

hidden:

False

default:

28

Output results

vcf
label:

GVCF file

type:

basic:file

required:

True

disabled:

False

hidden:

False

vcf_dir
label:

Folder with split GVCFs

type:

basic:dir

required:

True

disabled:

False

hidden:

False

tbi
label:

Tabix index

type:

basic:file

required:

True

disabled:

False

hidden:

False

species
label:

Species

type:

basic:string

required:

True

disabled:

False

hidden:

False

build
label:

Build

type:

basic:string

required:

True

disabled:

False

hidden:

False

GATK HaplotypeCaller (GVCF)

data:variants:gvcf:gatk-haplotypecaller-gvcf (data:alignment:bam  bam, data:seq:nucleotide  ref_seq, data:bed  intervals, basic:decimal  contamination)[Source: v1.3.0]

Run GATK HaplotypeCaller in GVCF mode.

Input arguments

bam
label:

Analysis ready BAM file

type:

data:alignment:bam

required:

True

disabled:

False

hidden:

False

ref_seq
label:

Reference sequence

type:

data:seq:nucleotide

required:

True

disabled:

False

hidden:

False

options.intervals
label:

Use intervals BED file to limit the analysis to the specified parts of the genome.

type:

data:bed

required:

False

disabled:

False

hidden:

False

options.contamination
label:

Contamination fraction

type:

basic:decimal

description:

Fraction of contamination in sequencing data (for all samples) to aggressively remove.

required:

True

disabled:

False

hidden:

False

default:

0

Output results

vcf
label:

GVCF file

type:

basic:file

required:

True

disabled:

False

hidden:

False

tbi
label:

Tabix index

type:

basic:file

required:

True

disabled:

False

hidden:

False

species
label:

Species

type:

basic:string

required:

True

disabled:

False

hidden:

False

build
label:

Build

type:

basic:string

required:

True

disabled:

False

hidden:

False

GATK MergeVcfs

data:variants:vcf:mergevcfs:gatk-merge-vcfs (list:data:variants:vcf  vcfs, data:seq:nucleotide  ref_seq, basic:integer  java_gc_threads, basic:integer  max_heap_size)[Source: v1.2.0]

Combine multiple variant files into a single variant file using GATK MergeVcfs.

Input arguments

vcfs
label:

Input data (VCFs)

type:

list:data:variants:vcf

required:

True

disabled:

False

hidden:

False

advanced_options.ref_seq
label:

Reference sequence

type:

data:seq:nucleotide

description:

Optionally use a sequence dictionary file (.dict) if the input VCF does not contain a complete contig list.

required:

False

disabled:

False

hidden:

False

advanced_options.java_gc_threads
label:

Java ParallelGCThreads

type:

basic:integer

description:

Sets the number of threads used during parallel phases of the garbage collectors.

required:

True

disabled:

False

hidden:

False

default:

2

advanced_options.max_heap_size
label:

Java maximum heap size (Xmx)

type:

basic:integer

description:

Set the maximum Java heap size (in GB).

required:

True

disabled:

False

hidden:

False

default:

12

Output results

vcf
label:

Merged VCF

type:

basic:file

required:

True

disabled:

False

hidden:

False

tbi
label:

Tabix index

type:

basic:file

required:

True

disabled:

False

hidden:

False

species
label:

Species

type:

basic:string

required:

True

disabled:

False

hidden:

False

build
label:

Build

type:

basic:string

required:

True

disabled:

False

hidden:

False

GATK SelectVariants (multi-sample)

data:variants:vcf:selectvariants:gatk-select-variants (data:variants:vcf  vcf, data:bed  intervals, list:basic:string  select_type, basic:boolean  exclude_filtered, data:seq:nucleotide  ref_seq, basic:integer  java_gc_threads, basic:integer  max_heap_size)[Source: v1.2.0]

Select a subset of variants based on various criteria using GATK SelectVariants. This tool takes a multi-sample VCF file as input.

Input arguments

vcf
label:

Input data (VCF)

type:

data:variants:vcf

required:

True

disabled:

False

hidden:

False

intervals
label:

Intervals file (.bed)

type:

data:bed

description:

One or more genomic intervals over which to operate. This can also be used to get data from a specific interval.

required:

False

disabled:

False

hidden:

False

select_type
label:

Select only a certain type of variants from the input file

type:

list:basic:string

description:

This argument selects particular kinds of variants out of a list. If left empty, there is no type selection and all variant types are considered for other selection criteria. Valid types are INDEL, SNP, MIXED, MNP, SYMBOLIC, NO_VARIATION. Can be specified multiple times.

required:

False

disabled:

False

hidden:

False

exclude_filtered
label:

Don’t include filtered sites

type:

basic:boolean

description:

If this flag is enabled, sites that have been marked as filtered (i.e. have anything other than `.` or `PASS` in the FILTER field) will be excluded from the output.

required:

True

disabled:

False

hidden:

False

default:

False
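
The rule behind this flag can be sketched directly on VCF text lines: FILTER is the seventh tab-separated column, and a site counts as filtered when that column holds anything other than `.` or `PASS`. The function names are hypothetical, and this is not the GATK implementation.

```python
def is_filtered(vcf_line: str) -> bool:
    """True if the FILTER column (7th field) is neither '.' nor 'PASS'."""
    filter_field = vcf_line.rstrip("\n").split("\t")[6]
    return filter_field not in {".", "PASS"}


def drop_filtered(lines):
    """Yield header lines and all records that are not marked as filtered."""
    for line in lines:
        if line.startswith("#") or not is_filtered(line):
            yield line
```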

advanced_options.ref_seq
label:

Reference sequence

type:

data:seq:nucleotide

required:

False

disabled:

False

hidden:

False

advanced_options.java_gc_threads
label:

Java ParallelGCThreads

type:

basic:integer

description:

Sets the number of threads used during parallel phases of the garbage collectors.

required:

True

disabled:

False

hidden:

False

default:

2

advanced_options.max_heap_size
label:

Java maximum heap size (Xmx)

type:

basic:integer

description:

Set the maximum Java heap size (in GB).

required:

True

disabled:

False

hidden:

False

default:

12

Output results

vcf
label:

Selected variants (VCF)

type:

basic:file

required:

True

disabled:

False

hidden:

False

tbi
label:

Tabix index

type:

basic:file

required:

True

disabled:

False

hidden:

False

species
label:

Species

type:

basic:string

required:

True

disabled:

False

hidden:

False

build
label:

Build

type:

basic:string

required:

True

disabled:

False

hidden:

False

GATK SelectVariants (single-sample)

data:variants:vcf:selectvariants:single:gatk-select-variants-single (data:variants:vcf  vcf, data:bed  intervals, list:basic:string  select_type, basic:boolean  exclude_filtered, data:seq:nucleotide  ref_seq, basic:integer  java_gc_threads, basic:integer  max_heap_size)[Source: v1.1.0]

Select a subset of variants based on various criteria using GATK SelectVariants. This tool takes a single-sample VCF file as input.

Input arguments

vcf
label:

Input data (VCF)

type:

data:variants:vcf

required:

True

disabled:

False

hidden:

False

intervals
label:

Intervals file (.bed)

type:

data:bed

description:

One or more genomic intervals over which to operate. This can also be used to get data from a specific interval.

required:

False

disabled:

False

hidden:

False

select_type
label:

Select only a certain type of variants from the input file

type:

list:basic:string

description:

This argument selects particular kinds of variants out of a list. If left empty, there is no type selection and all variant types are considered for other selection criteria. Valid types are INDEL, SNP, MIXED, MNP, SYMBOLIC, NO_VARIATION. Can be specified multiple times.

required:

False

disabled:

False

hidden:

False
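
A list of variant types can be checked against the valid values named in the description above before it is submitted. This is a minimal sketch; the helper name and the case-insensitive handling are assumptions.

```python
# Valid types as listed in the select_type description above.
VALID_SELECT_TYPES = {"INDEL", "SNP", "MIXED", "MNP", "SYMBOLIC", "NO_VARIATION"}


def validate_select_types(select_type):
    """Return the types in upper case, or raise ValueError for an unknown type.

    An empty list is allowed and means no type selection is applied.
    """
    normalized = [value.upper() for value in select_type]
    unknown = [value for value in normalized if value not in VALID_SELECT_TYPES]
    if unknown:
        raise ValueError(f"unknown variant type(s): {', '.join(unknown)}")
    return normalized
```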

exclude_filtered
label:

Don’t include filtered sites

type:

basic:boolean

description:

If this flag is enabled, sites that have been marked as filtered (i.e. have anything other than `.` or `PASS` in the FILTER field) will be excluded from the output.

required:

True

disabled:

False

hidden:

False

default:

False

advanced_options.ref_seq
label:

Reference sequence

type:

data:seq:nucleotide

required:

False

disabled:

False

hidden:

False

advanced_options.java_gc_threads
label:

Java ParallelGCThreads

type:

basic:integer

description:

Sets the number of threads used during parallel phases of the garbage collectors.

required:

True

disabled:

False

hidden:

False

default:

2

advanced_options.max_heap_size
label:

Java maximum heap size (Xmx)

type:

basic:integer

description:

Set the maximum Java heap size (in GB).

required:

True

disabled:

False

hidden:

False

default:

12

Output results

vcf
label:

Selected variants (VCF)

type:

basic:file

required:

True

disabled:

False

hidden:

False

tbi
label:

Tabix index

type:

basic:file

required:

True

disabled:

False

hidden:

False

species
label:

Species

type:

basic:string

required:

True

disabled:

False

hidden:

False

build
label:

Build

type:

basic:string

required:

True

disabled:

False

hidden:

False

GATK SplitNCigarReads

data:alignment:bam:splitncigar:gatk-split-ncigar (data:alignment:bam  bam, data:seq:nucleotide  ref_seq, basic:integer  java_gc_threads, basic:integer  max_heap_size)[Source: v1.2.0]

Splits reads that contain Ns in their CIGAR string. Identifies all N CIGAR elements and creates k+1 new reads (where k is the number of N CIGAR elements). The first read includes the bases that are to the left of the first N element, while the part of the read that is to the right of the N (including the Ns) is hard clipped, and so on for the rest of the new reads. Used for post-processing RNA reads aligned against the full reference.
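
The k+1 partition described above can be illustrated on a CIGAR string alone. The sketch below cuts the CIGAR at every N element and reports where each piece starts on the reference; unlike the real tool, it does not emit hard clips or modify read sequences, and the function name `split_at_n` is hypothetical.

```python
import re

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
# Operations that advance the position on the reference.
REF_CONSUMING = set("MDN=X")


def split_at_n(cigar: str, ref_start: int):
    """Split a CIGAR at N elements; return [(reference start, sub-CIGAR), ...]."""
    segments = []
    current = []
    position = ref_start
    segment_start = ref_start
    for length_text, op in CIGAR_OP.findall(cigar):
        length = int(length_text)
        if op == "N":
            # Close the current segment and skip over the gap.
            segments.append((segment_start, "".join(current)))
            current = []
            position += length
            segment_start = position
            continue
        current.append(f"{length}{op}")
        if op in REF_CONSUMING:
            position += length
    segments.append((segment_start, "".join(current)))
    return segments
```

A spliced read such as `10M100N20M` starting at position 1000 yields two pieces, one at 1000 and one at 1110, matching k+1 with k = 1.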

Input arguments

bam
label:

Alignment BAM file

type:

data:alignment:bam

required:

True

disabled:

False

hidden:

False

ref_seq
label:

Reference sequence FASTA file

type:

data:seq:nucleotide

required:

True

disabled:

False

hidden:

False

advanced.java_gc_threads
label:

Java ParallelGCThreads

type:

basic:integer

description:

Sets the number of threads used during parallel phases of the garbage collectors.

required:

True

disabled:

False

hidden:

False

default:

2

advanced.max_heap_size
label:

Java maximum heap size (Xmx)

type:

basic:integer

description:

Set the maximum Java heap size (in GB).

required:

True

disabled:

False

hidden:

False

default:

12

Output results

bam
label:

BAM file with reads split at N CIGAR elements

type:

basic:file

required:

True

disabled:

False

hidden:

False

bai
label:

Index of BAM file

type:

basic:file

required:

True

disabled:

False

hidden:

False

stats
label:

Alignment statistics

type:

basic:file

required:

True

disabled:

False

hidden:

False

species
label:

Species

type:

basic:string

required:

True

disabled:

False

hidden:

False

build
label:

Build

type:

basic:string

required:

True

disabled:

False

hidden:

False

GATK VariantFiltration (multi-sample)

data:variants:vcf:variantfiltration:gatk-variant-filtration (data:variants:vcf  vcf, data:seq:nucleotide  ref_seq, list:basic:string  filter_expressions, list:basic:string  filter_name, list:basic:string  genotype_filter_expressions, list:basic:string  genotype_filter_name, data:variants:vcf  mask, basic:string  mask_name, basic:integer  cluster, basic:integer  window, basic:integer  java_gc_threads, basic:integer  max_heap_size)[Source: v1.3.0]

Filter multi-sample variant calls based on INFO and/or FORMAT annotations. This tool is designed for hard-filtering variant calls based on certain criteria. Records are hard-filtered by changing the value in the FILTER field to something other than PASS. Passing variants are annotated as PASS and failing variants are annotated with the name(s) of the filter(s) they failed. If you want to remove failing variants, use the GATK SelectVariants process.
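As a rough illustration of hard-filtering, the sketch below computes a FILTER value for a single record. It is an assumption-laden stand-in: Python predicates replace JEXL expressions, and the function name `annotate_filter` is hypothetical. Failed filter names are joined with semicolons, as in the VCF FILTER column.

```python
def annotate_filter(info, filters):
    """Return the FILTER value for one record.

    `info` maps INFO keys to values. `filters` maps a filter name to a
    predicate that is True when the record FAILS that filter.
    """
    failed = [name for name, fails in filters.items() if fails(info)]
    return ";".join(failed) if failed else "PASS"


# Python stand-ins for the expressions 'FS > 30' and 'DP > 10'.
filters = {
    "FS": lambda info: info["FS"] > 30,
    "DP": lambda info: info["DP"] > 10,
}

print(annotate_filter({"FS": 45.0, "DP": 8}, filters))  # FS
print(annotate_filter({"FS": 2.0, "DP": 8}, filters))   # PASS
```

Note that the record is kept in both cases; only its annotation changes.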

Input arguments

vcf
label:

Input data (VCF)

type:

data:variants:vcf

required:

True

disabled:

False

hidden:

False

ref_seq
label:

Reference sequence

type:

data:seq:nucleotide

required:

True

disabled:

False

hidden:

False

filter_expressions
label:

Expressions used with INFO fields to filter

type:

list:basic:string

description:

VariantFiltration accepts any number of JEXL expressions (so you can have two named filters by using --filter-name One --filter-expression 'X < 1' --filter-name Two --filter-expression 'X > 2'). It is preferable to use multiple expressions, each specifying an individual filter criterion, rather than a single compound expression that specifies multiple filter criteria. Enter expressions one at a time and press ENTER after each expression. Examples of filter expressions: ‘FS > 30’, ‘DP > 10’.

required:

False

disabled:

False

hidden:

False

filter_name
label:

Names to use for the list of filters

type:

list:basic:string

description:

This name is put in the FILTER field for variants that get filtered. Note that there must be a 1-to-1 mapping between filter expressions and filter names. Enter names one at a time and press ENTER after each name. Warning: filter names should be in the same order as filter expressions. Example: if you specified filter expressions ‘FS > 30’ and ‘DP > 10’, now specify filter names ‘FS’ and ‘DP’.

required:

False

disabled:

False

hidden:

False
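The 1-to-1, same-order mapping between filter expressions and filter names can be sketched as follows. This is a minimal illustration, not the process's actual code: the helper name `build_filter_args` is hypothetical, and it only assembles `--filter-name` / `--filter-expression` argument pairs.

```python
def build_filter_args(expressions, names):
    """Pair each filter expression with its name, in order."""
    if len(expressions) != len(names):
        raise ValueError("each filter expression needs exactly one filter name")
    args = []
    for name, expression in zip(names, expressions):
        args += ["--filter-name", name, "--filter-expression", expression]
    return args


print(build_filter_args(["FS > 30", "DP > 10"], ["FS", "DP"]))
```

Raising an error on a length mismatch makes an incomplete pairing visible early, before any variants are annotated with the wrong name.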

genotype_filter_expressions
label:

Expressions used with FORMAT field to filter

type:

list:basic:string

description:

Similar to the INFO field based expressions, but used on the FORMAT (genotype) fields instead. VariantFiltration will add the sample-level FT tag to the FORMAT field of filtered samples (this does not affect the record’s FILTER tag). One can filter normally based on most fields (e.g. ‘GQ < 5.0’), but the GT (genotype) field is an exception. We have put in convenience methods so that one can now filter out hets (‘isHet == 1’), refs (‘isHomRef == 1’), or homs (‘isHomVar == 1’). Also available are expressions isCalled, isNoCall, isMixed, and isAvailable, in accordance with the methods of the Genotype object. To filter by alternative allele depth, use the expression: ‘AD.1 < 5’. This filter expression will filter all the samples in the multi-sample VCF file.

required:

False

disabled:

False

hidden:

False

genotype_filter_name
label:

Names to use for the list of genotype filters

type:

list:basic:string

description:

Similar to the INFO field based filter names, but used for the FORMAT (genotype) filters instead. Warning: filter names should be in the same order as filter expressions.

required:

False

disabled:

False

hidden:

False

mask
label:

Input mask

type:

data:variants:vcf

description:

Any variant which overlaps entries from the provided mask file will be filtered.

required:

False

disabled:

False

hidden:

False

mask_name
label:

The text to put in the FILTER field if a ‘mask’ is provided

type:

basic:string

description:

When using the mask file, the mask name will be annotated in the variant record.

required:

False

disabled:

!mask

hidden:

False

advanced.cluster
label:

Cluster size

type:

basic:integer

description:

The number of SNPs which make up a cluster. Must be at least 2.

required:

True

disabled:

False

hidden:

False

default:

3

advanced.window
label:

Window size

type:

basic:integer

description:

The window size (in bases) in which to evaluate clustered SNPs.

required:

True

disabled:

False

hidden:

False

default:

0

advanced.java_gc_threads
label:

Java ParallelGCThreads

type:

basic:integer

description:

Sets the number of threads used during parallel phases of the garbage collectors.

required:

True

disabled:

False

hidden:

False

default:

2

advanced.max_heap_size
label:

Java maximum heap size (Xmx)

type:

basic:integer

description:

Set the maximum Java heap size (in GB).

required:

True

disabled:

False

hidden:

False

default:

12

Output results

vcf
label:

Filtered variants (VCF)

type:

basic:file

required:

True

disabled:

False

hidden:

False

tbi
label:

Tabix index

type:

basic:file

required:

True

disabled:

False

hidden:

False

species
label:

Species

type:

basic:string

required:

True

disabled:

False

hidden:

False

build
label:

Build

type:

basic:string

required:

True

disabled:

False

hidden:

False

GATK VariantFiltration (single-sample)

data:variants:vcf:variantfiltration:single:gatk-variant-filtration-single (data:variants:vcf  vcf, data:seq:nucleotide  ref_seq, list:basic:string  filter_expressions, list:basic:string  filter_name, list:basic:string  genotype_filter_expressions, list:basic:string  genotype_filter_name, data:variants:vcf  mask, basic:string  mask_name, basic:integer  cluster, basic:integer  window, basic:integer  java_gc_threads, basic:integer  max_heap_size)[Source: v1.3.0]

Filter single-sample variant calls based on INFO and/or FORMAT annotations. This tool is designed for hard-filtering variant calls based on certain criteria. Records are hard-filtered by changing the value in the FILTER field to something other than PASS. Passing variants are annotated as PASS and failing variants are annotated with the name(s) of the filter(s) they failed. If you want to remove failing variants, use the GATK SelectVariants process.

Input arguments

vcf
label:

Input data (VCF)

type:

data:variants:vcf

required:

True

disabled:

False

hidden:

False

ref_seq
label:

Reference sequence

type:

data:seq:nucleotide

required:

True

disabled:

False

hidden:

False

filter_expressions
label:

Expressions used with INFO fields to filter

type:

list:basic:string

description:

VariantFiltration accepts any number of JEXL expressions (so you can have two named filters by using --filter-name One --filter-expression 'X < 1' --filter-name Two --filter-expression 'X > 2'). It is preferable to use multiple expressions, each specifying an individual filter criterion, rather than a single compound expression that specifies multiple filter criteria. Enter expressions one at a time and press ENTER after each expression. Examples of filter expressions: ‘FS > 30’, ‘DP > 10’.

required:

False

disabled:

False

hidden:

False

filter_name
label:

Names to use for the list of filters

type:

list:basic:string

description:

This name is put in the FILTER field for variants that get filtered. Note that there must be a 1-to-1 mapping between filter expressions and filter names. Enter names one at a time and press ENTER after each name. Warning: filter names should be in the same order as filter expressions. Example: if you specified filter expressions ‘FS > 30’ and ‘DP > 10’, now specify filter names ‘FS’ and ‘DP’.

required:

False

disabled:

False

hidden:

False

genotype_filter_expressions
label:

Expressions used with FORMAT field to filter

type:

list:basic:string

description:

Similar to the INFO field based expressions, but used on the FORMAT (genotype) fields instead. VariantFiltration will add the sample-level FT tag to the FORMAT field of filtered samples (this does not affect the record’s FILTER tag). One can filter normally based on most fields (e.g. ‘GQ < 5.0’), but the GT (genotype) field is an exception. We have put in convenience methods so that one can now filter out hets (‘isHet == 1’), refs (‘isHomRef == 1’), or homs (‘isHomVar == 1’). Also available are expressions isCalled, isNoCall, isMixed, and isAvailable, in accordance with the methods of the Genotype object. To filter by alternative allele depth, use the expression: ‘AD.1 < 5’.

required:

False

disabled:

False

hidden:

False

genotype_filter_name
label:

Names to use for the list of genotype filters

type:

list:basic:string

description:

Similar to the INFO field based filter names, but used for the FORMAT (genotype) filters instead. Warning: filter names should be in the same order as filter expressions.

required:

False

disabled:

False

hidden:

False

mask
label:

Input mask

type:

data:variants:vcf

description:

Any variant which overlaps entries from the provided mask file will be filtered.

required:

False

disabled:

False

hidden:

False

mask_name
label:

The text to put in the FILTER field if a ‘mask’ is provided

type:

basic:string

description:

When using the mask file, the mask name will be annotated in the variant record.

required:

False

disabled:

!mask

hidden:

False

advanced.cluster
label:

Cluster size

type:

basic:integer

description:

The number of SNPs which make up a cluster. Must be at least 2.

required:

True

disabled:

False

hidden:

False

default:

3

advanced.window
label:

Window size

type:

basic:integer

description:

The window size (in bases) in which to evaluate clustered SNPs.

required:

True

disabled:

False

hidden:

False

default:

0
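How the cluster size and window size interact can be sketched in Python. This is one plausible reading of the two parameters, not GATK's exact algorithm: the function name `clustered_snps` is hypothetical, boundary handling may differ in the real tool, and treating a window below 1 as "disabled" is an assumption of this sketch.

```python
def clustered_snps(positions, cluster=3, window=0):
    """Return SNP positions that fall in a cluster.

    A cluster is `cluster` consecutive SNPs whose first and last positions
    are at most `window` bases apart. A window below 1 disables the check
    (an assumption of this sketch).
    """
    if cluster < 2:
        raise ValueError("cluster size must be at least 2")
    if window < 1:
        return []
    positions = sorted(positions)
    flagged = set()
    for i in range(len(positions) - cluster + 1):
        group = positions[i:i + cluster]
        if group[-1] - group[0] <= window:
            flagged.update(group)
    return sorted(flagged)


print(clustered_snps([100, 110, 120, 500, 900], cluster=3, window=35))
print(clustered_snps([100, 110, 120, 500, 900], cluster=3, window=0))
```

With the defaults listed above (cluster size 3, window size 0), this sketch flags nothing.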

advanced.java_gc_threads
label:

Java ParallelGCThreads

type:

basic:integer

description:

Sets the number of threads used during parallel phases of the garbage collectors.

required:

True

disabled:

False

hidden:

False

default:

2

advanced.max_heap_size
label:

Java maximum heap size (Xmx)

type:

basic:integer

description:

Set the maximum Java heap size (in GB).

required:

True

disabled:

False

hidden:

False

default:

12

Output results

vcf
label:

Filtered variants (VCF)

type:

basic:file

required:

True

disabled:

False

hidden:

False

tbi
label:

Tabix index

type:

basic:file

required:

True

disabled:

False

hidden:

False

species
label:

Species

type:

basic:string

required:

True

disabled:

False

hidden:

False

build
label:

Build

type:

basic:string

required:

True

disabled:

False

hidden:

False

GATK VariantsToTable

data:variantstable:variants-to-table (data:variants:vcf  vcf, list:basic:string  vcf_fields, list:basic:string  gf_fields, basic:boolean  split_alleles)[Source: v1.2.0]

Run GATK VariantsToTable. This tool extracts specified fields for each variant in a VCF file to a tab-delimited table, which may be easier to work with than a VCF. For additional information, please see the [manual page](https://gatk.broadinstitute.org/hc/en-us/articles/360036711531-VariantsToTable).

Input arguments

vcf
label:

Input VCF file

type:

data:variants:vcf

required:

True

disabled:

False

hidden:

False

vcf_fields
label:

Select VCF fields

type:

list:basic:string

description:

The name of a standard VCF field or an INFO field to include in the output table. The field can be any standard VCF column (e.g. CHROM, ID, QUAL) or any annotation name in the INFO field (e.g. AC, AF).

required:

True

disabled:

False

hidden:

False

default:

['CHROM', 'POS', 'ID', 'REF', 'ALT']

advanced_options.gf_fields
label:

Include FORMAT/sample-level fields

type:

list:basic:string

required:

True

disabled:

False

hidden:

False

default:

['GT', 'GQ']

advanced_options.split_alleles
label:

Split multi-allelic records into multiple lines

type:

basic:boolean

description:

By default, a variant record with multiple ALT alleles will be summarized in one line, with per alt-allele fields (e.g. allele depth) separated by commas. This may cause difficulty when the table is loaded by an R script, for example. Use this flag to write multi-allelic records on separate lines of output.

required:

True

disabled:

False

hidden:

False

default:

True
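The effect of splitting multi-allelic records can be sketched in Python. This is a simplified illustration rather than the tool's behavior in full: the function name `vcf_line_to_rows` is hypothetical, only the fixed VCF columns are handled, and per-allele INFO or FORMAT values are not split.

```python
VCF_COLUMNS = ["CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO"]


def vcf_line_to_rows(line, fields, split_alleles=True):
    """Turn one VCF data line into one or more table rows.

    Only the fixed VCF columns are handled. With split_alleles set, a
    record with several ALT alleles yields one row per allele.
    """
    record = dict(zip(VCF_COLUMNS, line.rstrip("\n").split("\t")))
    alts = record["ALT"].split(",") if split_alleles else [record["ALT"]]
    return [
        [alt if field == "ALT" else record[field] for field in fields]
        for alt in alts
    ]


# A made-up record with two ALT alleles.
line = "chr1\t12345\trs1\tG\tA,T\t50\tPASS\tAC=1,2"
for row in vcf_line_to_rows(line, ["CHROM", "POS", "ID", "REF", "ALT"]):
    print("\t".join(row))
```

With `split_alleles=False` the same record produces a single row whose ALT column holds `A,T`.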

Output results

tsv
label:

Tab-delimited file with variants

type:

basic:file

required:

True

disabled:

False

hidden:

False

species
label:

Species

type:

basic:string

required:

True

disabled:

False

hidden:

False

build
label:

Build

type:

basic:string

required:

True

disabled:

False

hidden:

False

GATK filter variants (VQSR)

data:variants:vcf:vqsr:gatk-vqsr (data:variants:vcf  vcf, data:variants:vcf  dbsnp, data:variants:vcf  mills, data:variants:vcf  axiom_poly, data:variants:vcf  hapmap, data:variants:vcf  omni, data:variants:vcf  thousand_genomes, basic:boolean  use_as_anno, list:basic:string  indel_anno_fields, list:basic:string  snp_anno_fields, basic:decimal  indel_filter_level, basic:decimal  snp_filter_level, basic:integer  max_gaussians_indels, basic:integer  max_gaussians_snps)[Source: v1.2.0]

Filter WGS variants using the Variant Quality Score Recalibration (VQSR) procedure.

Input arguments

vcf
label:

Input data (VCF)

type:

data:variants:vcf

required:

True

disabled:

False

hidden:

False

resource_files.dbsnp
label:

dbSNP file

type:

data:variants:vcf

required:

True

disabled:

False

hidden:

False

resource_files.mills
label:

Mills and 1000G gold standard indels

type:

data:variants:vcf

required:

False

disabled:

False

hidden:

False

resource_files.axiom_poly
label:

1000G Axiom genotype data

type:

data:variants:vcf

required:

False

disabled:

False

hidden:

False

resource_files.hapmap
label:

HapMap variants

type:

data:variants:vcf

required:

False

disabled:

False

hidden:

False

resource_files.omni
label:

1000G Omni variants

type:

data:variants:vcf

required:

False

disabled:

False

hidden:

False

resource_files.thousand_genomes
label:

1000G high confidence SNPs

type:

data:variants:vcf

required:

False

disabled:

False

hidden:

False

advanced_options.use_as_anno
label:

--use-allele-specific-annotations

type:

basic:boolean

required:

True

disabled:

False

hidden:

False

default:

False

advanced_options.indel_anno_fields
label:

Annotation fields (INDEL filtering)

type:

list:basic:string

required:

True

disabled:

False

hidden:

False

default:

['FS', 'ReadPosRankSum', 'MQRankSum', 'QD', 'SOR', 'DP']

advanced_options.snp_anno_fields
label:

Annotation fields (SNP filtering)

type:

list:basic:string

required:

True

disabled:

False

hidden:

False

default:

['QD', 'MQRankSum', 'ReadPosRankSum', 'FS', 'MQ', 'SOR', 'DP']

advanced_options.indel_filter_level
label:

--truth-sensitivity-filter-level (INDELs)

type:

basic:decimal

required:

True

disabled:

False

hidden:

False

default:

99.0

advanced_options.snp_filter_level
label:

--truth-sensitivity-filter-level (SNPs)

type:

basic:decimal

required:

True

disabled:

False

hidden:

False

default:

99.7

advanced_options.max_gaussians_indels
label:

--max-gaussians (INDELs)

type:

basic:integer

description:

This parameter determines the maximum number of Gaussians that should be used when building a positive model using the variational Bayes algorithm. This parameter sets the expected number of clusters in modeling. If a dataset gives fewer distinct clusters, as can happen for smaller data, the tool will report that there is insufficient data with a "No data found" error message. In this case, try decrementing the --max-gaussians value.

required:

True

disabled:

False

hidden:

False

default:

4

advanced_options.max_gaussians_snps
label:

--max-gaussians (SNPs)

type:

basic:integer

description:

This parameter determines the maximum number of Gaussians that should be used when building a positive model using the variational Bayes algorithm. This parameter sets the expected number of clusters in modeling. If a dataset gives fewer distinct clusters, as can happen for smaller data, the tool will report that there is insufficient data with a "No data found" error message. In this case, try decrementing the --max-gaussians value.

required:

True

disabled:

False

hidden:

False

default:

6

Output results

vcf
label:

GVCF file

type:

basic:file

required:

True

disabled:

False

hidden:

False

tbi
label:

Tabix index

type:

basic:file

required:

True

disabled:

False

hidden:

False

species
label:

Species

type:

basic:string

required:

True

disabled:

False

hidden:

False

build
label:

Build

type:

basic:string

required:

True

disabled:

False

hidden:

False

GATK refine variants

data:variants:vcf:refinevariants:gatk-refine-variants (data:variants:vcf  vcf, data:seq:nucleotide  ref_seq, data:variants:vcf  vcf_pop)[Source: v1.1.1]

Run GATK Genotype Refinement. The goal of the Genotype Refinement workflow is to use additional data to improve the accuracy of genotype calls and to filter genotype calls that are not reliable enough for downstream analysis. In this sense it serves as an optional extension of the variant calling workflow, intended for researchers whose work requires high-quality identification of individual genotypes. For additional information, please see the [manual page](https://gatk.broadinstitute.org/hc/en-us/articles/360035531432-Genotype-Refinement-workflow-for-germline-short-variants).

Input arguments

vcf
label:

The main input, as produced in the GATK VQSR process

type:

data:variants:vcf

required:

True

disabled:

False

hidden:

False

ref_seq
label:

Reference sequence

type:

data:seq:nucleotide

required:

True

disabled:

False

hidden:

False

vcf_pop
label:

Population-level variant set (VCF)

type:

data:variants:vcf

required:

False

disabled:

False

hidden:

False

Output results

vcf
label:

Refined multi-sample vcf

type:

basic:file

required:

True

disabled:

False

hidden:

False

tbi
label:

Tabix index

type:

basic:file

required:

True

disabled:

False

hidden:

False

species
label:

Species

type:

basic:string

required:

True

disabled:

False

hidden:

False

build
label:

Build

type:

basic:string

required:

True

disabled:

False

hidden:

False

GATK4 (HaplotypeCaller)

data:variants:vcf:gatk:hc:vc-gatk4-hc (data:alignment:bam  alignment, data:seq:nucleotide  genome, data:bed  intervals_bed, data:variants:vcf  dbsnp, basic:integer  stand_call_conf, basic:integer  mbq, basic:integer  max_reads, basic:integer  interval_padding, basic:boolean  soft_clipped, basic:integer  java_gc_threads, basic:integer  max_heap_size)[Source: v1.5.0]

GATK HaplotypeCaller Variant Calling. Call germline SNPs and indels via local re-assembly of haplotypes. The HaplotypeCaller is capable of calling SNPs and indels simultaneously via local de-novo assembly of haplotypes in an active region. In other words, whenever the program encounters a region showing signs of variation, it discards the existing mapping information and completely reassembles the reads in that region. This allows the HaplotypeCaller to be more accurate when calling regions that are traditionally difficult to call, for example when they contain different types of variants close to each other. It also makes the HaplotypeCaller much better at calling indels than position-based callers like UnifiedGenotyper.

Input arguments

alignment
label:

Analysis ready BAM file

type:

data:alignment:bam

required:

True

disabled:

False

hidden:

False

genome
label:

Reference genome

type:

data:seq:nucleotide

required:

True

disabled:

False

hidden:

False

intervals_bed
label:

Intervals (from BED file)

type:

data:bed

description:

Use this option to perform the analysis over only part of the genome.

required:

False

disabled:

False

hidden:

False

dbsnp
label:

dbSNP file

type:

data:variants:vcf

description:

Database of known polymorphic sites.

required:

True

disabled:

False

hidden:

False

stand_call_conf
label:

Min call confidence threshold

type:

basic:integer

description:

The minimum phred-scaled confidence threshold at which variants should be called.

required:

True

disabled:

False

hidden:

False

default:

30
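Phred-scaled values such as this threshold map to error probabilities through the standard relation P = 10^(-Q/10). The short Python sketch below applies it; the function name is hypothetical and the snippet is only a worked illustration of the scale.

```python
def phred_to_error_probability(q):
    """Convert a Phred-scaled quality Q to an error probability 10^(-Q/10)."""
    return 10 ** (-q / 10)


# A threshold of 30 corresponds to roughly a 1 in 1000 chance of error,
# and a value of 20 to roughly 1 in 100.
print(phred_to_error_probability(30))
print(phred_to_error_probability(20))
```

Each increase of 10 on the Phred scale reduces the error probability tenfold.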

mbq
label:

Min Base Quality

type:

basic:integer

description:

Minimum base quality required to consider a base for calling.

required:

True

disabled:

False

hidden:

False

default:

20

max_reads
label:

Max reads per alignment start site

type:

basic:integer

description:

Maximum number of reads to retain per alignment start position. Reads above this threshold will be downsampled. Set to 0 to disable.

required:

True

disabled:

False

hidden:

False

default:

50

advanced.interval_padding
label:

Interval padding

type:

basic:integer

description:

Amount of padding (in bp) to add to each interval you are including. The recommended value is 100.

required:

False

disabled:

False

hidden:

!intervals_bed
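Interval padding can be pictured with a small Python sketch that widens each BED interval on both sides. It is an illustration under stated assumptions, not GATK's implementation: the function name `pad_intervals` is hypothetical, overlapping padded intervals are not merged, and interval ends are not clamped to chromosome lengths.

```python
def pad_intervals(intervals, padding=100):
    """Extend each (chrom, start, end) BED interval by `padding` bp per side.

    BED starts are 0-based, so the padded start is clamped at 0. The end is
    not clamped to the chromosome length, which this sketch does not know.
    """
    return [
        (chrom, max(0, start - padding), end + padding)
        for chrom, start, end in intervals
    ]


print(pad_intervals([("chr1", 50, 200), ("chr2", 1000, 1500)]))
```

Padding lets variants just outside a target region be called with their full surrounding read context.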

advanced.soft_clipped
label:

Do not analyze soft clipped bases in the reads

type:

basic:boolean

description:

Suitable option for RNA-seq variant calling.

required:

True

disabled:

False

hidden:

False

default:

False

advanced.java_gc_threads
label:

Java ParallelGCThreads

type:

basic:integer

description:

Sets the number of threads used during parallel phases of the garbage collectors.

required:

True

disabled:

False

hidden:

False

default:

2

advanced.max_heap_size
label:

Java maximum heap size (Xmx)

type:

basic:integer

description:

Set the maximum Java heap size (in GB).

required:

True

disabled:

False

hidden:

False

default:

12

Output results

vcf
label:

VCF file

type:

basic:file

required:

True

disabled:

False

hidden:

False

tbi
label:

Tabix index

type:

basic:file

required:

True

disabled:

False

hidden:

False

species
label:

Species

type:

basic:string

required:

True

disabled:

False

hidden:

False

build
label:

Build

type:

basic:string

required:

True

disabled:

False

hidden:

False

GEO import

data:geo:geo-import (basic:string  gse_accession, basic:boolean  prefetch, basic:string  max_size_prefetch, basic:integer  min_spot_id, basic:integer  max_spot_id, basic:integer  min_read_len, basic:boolean  clip, basic:boolean  aligned, basic:boolean  unaligned, basic:file  mapping_file, basic:string  source, basic:string  build)[Source: v2.7.2]

Import all runs from a GEO Series. WARNING: Additional costs for storage and processing may be incurred if a very large data set is selected. RNA-Seq, ChIP-Seq, ATAC-Seq and expression microarray data sets can be uploaded. For RNA-Seq data sets, this runs the SRA import process for each experiment (SRX) from the selected RNA-Seq GEO Series. The same procedure is followed for ChIP-Seq and ATAC-Seq data sets. If the GSE contains microarray data, it downloads individual samples and uploads them as microarray expression objects. Probe IDs can be mapped to Ensembl IDs if the corresponding GPL platform is supported; otherwise, a custom mapping file should be provided. Currently supported platforms are: GPL74, GPL201, GPL96, GPL571, GPL97, GPL570, GPL91, GPL8300, GPL92, GPL93, GPL94, GPL95, GPL17586, GPL5175, GPL80, GPL6244, GPL16686, GPL15207, GPL1352, GPL11068, GPL26966, GPL6848, GPL14550, GPL17077, GPL16981, GPL13497, GPL6947, GPL10558, GPL6883, GPL13376, GPL6884, GPL6254. In addition, a metadata table with sample information is created and uploaded to the same collection.

Input arguments

gse_accession
label:

GEO accession

type:

basic:string

description:

Enter a GEO series accession number.

required:

True

disabled:

False

hidden:

False

advanced.prefetch
label:

Prefetch SRA file

type:

basic:boolean

required:

True

disabled:

False

hidden:

False

default:

True

advanced.max_size_prefetch
label:

Maximum file size to download in KB

type:

basic:string

description:

A unit prefix can be used instead of a value in KB (e.g. 1024M or 1G).

required:

True

disabled:

False

hidden:

False

default:

20G
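The relation between a plain KB value and a value with a unit prefix can be sketched in Python. This is an assumption-based illustration: the helper name `size_to_kb` is hypothetical, binary multiples (1M = 1024 KB) are assumed because the description equates 1024M with 1G, and the underlying download tool may parse sizes differently.

```python
# Multipliers relative to KB, assuming binary multiples.
UNIT_TO_KB = {"K": 1, "M": 1024, "G": 1024 ** 2, "T": 1024 ** 3}


def size_to_kb(value):
    """Convert a size such as '20G', '1024M' or '500' to a number of KB."""
    value = value.strip().upper()
    if value and value[-1] in UNIT_TO_KB:
        return int(value[:-1]) * UNIT_TO_KB[value[-1]]
    return int(value)  # no unit prefix: the value is already in KB


print(size_to_kb("20G"))
print(size_to_kb("1024M") == size_to_kb("1G"))
```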

advanced.min_spot_id
label:

Minimum spot ID

type:

basic:integer

required:

False

disabled:

False

hidden:

False

advanced.max_spot_id
label:

Maximum spot ID

type:

basic:integer

required:

False

disabled:

False

hidden:

False

advanced.min_read_len
label:

Minimum read length

type:

basic:integer

required:

False

disabled:

False

hidden:

False

advanced.clip
label:

Clip adapter sequences

type:

basic:boolean

required:

True

disabled:

False

hidden:

False

default:

False

advanced.aligned
label:

Dump only aligned sequences

type:

basic:boolean

required:

True

disabled:

False

hidden:

False

default:

False

advanced.unaligned
label:

Dump only unaligned sequences

type:

basic:boolean

required:

True

disabled:

False

hidden:

False

default:

False

advanced.mapping_file
label:

File with probe ID mappings

type:

basic:file

description:

The file should be tab-separated and contain two columns with their column names. The first column should contain Gene IDs and the second one should contain probe names. Supported file extensions are .tab.*, .tsv.*, .txt.*

required:

False

disabled:

False

hidden:

False
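A custom mapping file in the layout described above (tab-separated, a header row, gene IDs in the first column and probe names in the second) could be read as in the following sketch. The function name `read_probe_mapping` and the example identifiers are placeholders, and this is not the process's actual parser.

```python
import csv
import io


def read_probe_mapping(handle):
    """Read a two-column, tab-separated mapping with a header row.

    Returns a dict from probe name (second column) to gene ID (first column).
    """
    reader = csv.reader(handle, delimiter="\t")
    next(reader)  # skip the row with column names
    return {row[1]: row[0] for row in reader if len(row) >= 2}


# Placeholder content, not real identifiers.
example = io.StringIO("gene_id\tprobe\nGENE_A\tprobe_1\nGENE_B\tprobe_2\n")
mapping = read_probe_mapping(example)
print(mapping)
```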

advanced.source
label:

Gene ID source

type:

basic:string

description:

Gene ID source used for probe mapping is required when using a custom file.

required:

False

disabled:

False

hidden:

False

choices:

  • AFFY: AFFY

  • DICTYBASE: DICTYBASE

  • ENSEMBL: ENSEMBL

  • NCBI: NCBI

  • UCSC: UCSC

advanced.build
label:

Genome build

type:

basic:string

description:

Genome build of mapping file is required when using a custom file.

required:

False

disabled:

False

hidden:

False

Output results

GFF3 file

data:annotation:gff3upload-gff3 (basic:file  src, basic:string  source, basic:string  species, basic:string  build)[Source: v3.5.0]

Import a General Feature Format (GFF) file which is a file format used for describing genes and other features of DNA, RNA and protein sequences. See [here](https://useast.ensembl.org/info/website/upload/gff3.html) and [here](https://en.wikipedia.org/wiki/General_feature_format) for more information.

Input arguments

src
label:

Annotation (GFF3)

type:

basic:file

description:

Annotation in GFF3 format. Supported extensions are: .gff, .gff3 and .gtf

validate_regex:

\.(gff|gff3|gtf)(|\.gz|\.bz2|\.tgz|\.tar\.gz|\.tar\.bz2|\.zip|\.rar|\.7z)$
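The validation pattern can be tried out on candidate file names with Python's `re` module. This is a sketch under assumptions: the helper name `is_accepted_gff3_name` is hypothetical, and whether the platform applies the pattern as a case-sensitive search exactly like this is not stated here.

```python
import re

# The validate_regex shown above, copied verbatim.
GFF3_NAME = re.compile(
    r"\.(gff|gff3|gtf)(|\.gz|\.bz2|\.tgz|\.tar\.gz|\.tar\.bz2|\.zip|\.rar|\.7z)$"
)


def is_accepted_gff3_name(filename):
    """Return True if the file name ends with a supported extension."""
    return GFF3_NAME.search(filename) is not None


for name in ["genes.gff3", "genes.gff.gz", "genes.gtf.tar.bz2", "genes.bed"]:
    print(name, is_accepted_gff3_name(name))
```

The pattern accepts a bare annotation extension or one followed by a single supported compression or archive suffix.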

source
label:

Gene ID database

type:

basic:string

choices:

  • DICTYBASE: DICTYBASE

  • ENSEMBL: ENSEMBL

  • NCBI: NCBI

  • UCSC: UCSC

species
label:

Species

type:

basic:string

description:

Species Latin name.

choices:

  • Homo sapiens: Homo sapiens

  • Mus musculus: Mus musculus

  • Rattus norvegicus: Rattus norvegicus

  • Dictyostelium discoideum: Dictyostelium discoideum

build
label:

Build

type:

basic:string

Output results

annot
label:

Uploaded GFF3 file

type:

basic:file

annot_sorted
label:

Sorted GFF3 file

type:

basic:file

annot_sorted_idx_igv
label:

IGV index for sorted GFF3

type:

basic:file

annot_sorted_track_jbrowse
label:

JBrowse track for sorted GFF3

type:

basic:file

source
label:

Gene ID database

type:

basic:string

species
label:

Species

type:

basic:string

build
label:

Build

type:

basic:string

GTF file

data:annotation:gtfupload-gtf (basic:file  src, basic:string  source, basic:string  species, basic:string  build)[Source: v3.5.0]

Import a Gene Transfer Format (GTF) file. It is a file format used to hold information about gene structure. It is a tab-delimited text format based on the general feature format (GFF), but contains some additional conventions specific to gene information. See [here](https://en.wikipedia.org/wiki/General_feature_format) for differences between GFF and GTF files.

Input arguments

src
label:

Annotation (GTF)

type:

basic:file

description:

Annotation in GTF format.

validate_regex:

\.(gtf|gff)(|\.gz|\.bz2|\.tgz|\.tar\.gz|\.tar\.bz2|\.zip|\.rar|\.7z)$

source
label:

Gene ID database

type:

basic:string

choices:

  • DICTYBASE: DICTYBASE

  • ENSEMBL: ENSEMBL

  • NCBI: NCBI

  • UCSC: UCSC

species
label:

Species

type:

basic:string

description:

Species Latin name.

choices:

  • Homo sapiens: Homo sapiens

  • Mus musculus: Mus musculus

  • Rattus norvegicus: Rattus norvegicus

  • Dictyostelium discoideum: Dictyostelium discoideum

build
label:

Build

type:

basic:string

Output results

annot
label:

Uploaded GTF file

type:

basic:file

annot_sorted
label:

Sorted GTF file

type:

basic:file

annot_sorted_idx_igv
label:

IGV index for sorted GTF file

type:

basic:file

required:

False

annot_sorted_track_jbrowse
label:

JBrowse track for sorted GTF

type:

basic:file

required:

False

source
label:

Gene ID database

type:

basic:string

species
label:

Species

type:

basic:string

build
label:

Build

type:

basic:string

Gene set

data:geneset:upload-geneset (basic:file  src, basic:string  source, basic:string  species)[Source: v1.3.2]

Upload a set of genes. Provide one gene ID per line in a .tab, .tab.gz, or .txt file.
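The expected file layout, one gene ID per line with optional gzip compression, can be illustrated with a short Python sketch. The function name `read_gene_set` and the gene IDs are placeholders; skipping blank lines is a choice made in this sketch, not documented behavior of the process.

```python
import gzip
import os
import tempfile


def read_gene_set(path):
    """Read one gene ID per line from a plain or gzip-compressed file."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        stripped = [line.strip() for line in handle]
    return [gene for gene in stripped if gene]  # drop blank lines


# Write a small placeholder gene set and read it back.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "genes.tab.gz")
    with gzip.open(path, "wt") as handle:
        handle.write("GENE_A\nGENE_B\n\nGENE_C\n")
    genes = read_gene_set(path)

print(genes)
```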

Input arguments

src
label:

Gene set

type:

basic:file

description:

List of genes (.tab/.txt extension), one gene ID per line.

required:

True

disabled:

False

hidden:

False

source
label:

Gene ID source

type:

basic:string

required:

True

disabled:

False

hidden:

False

choices:

  • AFFY: AFFY

  • DICTYBASE: DICTYBASE

  • ENSEMBL: ENSEMBL

  • NCBI: NCBI

  • UCSC: UCSC

species
label:

Species

type:

basic:string

description:

Species Latin name.

required:

True

disabled:

False

hidden:

False

choices:

  • Homo sapiens: Homo sapiens

  • Mus musculus: Mus musculus

  • Rattus norvegicus: Rattus norvegicus

  • Dictyostelium discoideum: Dictyostelium discoideum

  • Odocoileus virginianus texanus: Odocoileus virginianus texanus

  • Solanum tuberosum: Solanum tuberosum

Output results

geneset
label:

Gene set

type:

basic:file

required:

True

disabled:

False

hidden:

False

geneset_json
label:

Gene set (JSON)

type:

basic:json

required:

True

disabled:

False

hidden:

False

source
label:

Gene ID source

type:

basic:string

required:

True

disabled:

False

hidden:

False

species
label:

Species

type:

basic:string

required:

True

disabled:

False

hidden:

False

Gene set (create from Venn diagram)

data:geneset:venn:create-geneset-venn (list:basic:string  genes, basic:string  source, basic:string  species, basic:file  venn)[Source: v1.3.2]

Create a gene set from a Venn diagram.

Input arguments

genes
label:

Genes

type:

list:basic:string

description:

List of genes.

required:

True

disabled:

False

hidden:

False

source
label:

Gene ID source

type:

basic:string

required:

True

disabled:

False

hidden:

False

choices:

  • AFFY: AFFY

  • DICTYBASE: DICTYBASE

  • ENSEMBL: ENSEMBL

  • NCBI: NCBI

  • UCSC: UCSC

species
label:

Species

type:

basic:string

description:

Species Latin name.

required:

True

disabled:

False

hidden:

False

choices:

  • Homo sapiens: Homo sapiens

  • Mus musculus: Mus musculus

  • Rattus norvegicus: Rattus norvegicus

  • Dictyostelium discoideum: Dictyostelium discoideum

  • Odocoileus virginianus texanus: Odocoileus virginianus texanus

  • Solanum tuberosum: Solanum tuberosum

venn
label:

Venn diagram

type:

basic:file

description:

JSON file of Venn diagram.

required:

True

disabled:

False

hidden:

False

Output results

geneset
label:

Gene set

type:

basic:file

required:

True

disabled:

False

hidden:

False

geneset_json
label:

Gene set (JSON)

type:

basic:json

required:

True

disabled:

False

hidden:

False

source
label:

Gene ID source

type:

basic:string

required:

True

disabled:

False

hidden:

False

species
label:

Species

type:

basic:string

required:

True

disabled:

False

hidden:

False

venn
label:

Venn diagram

type:

basic:json

required:

True

disabled:

False

hidden:

False

Gene set (create)

data:geneset:create-geneset (list:basic:string  genes, basic:string  source, basic:string  species)[Source: v1.3.2]

Create a gene set from a list of genes.

Input arguments

genes
label:

Genes

type:

list:basic:string

description:

List of genes.

required:

True

disabled:

False

hidden:

False

source
label:

Gene ID source

type:

basic:string

required:

True

disabled:

False

hidden:

False

choices:

  • AFFY: AFFY

  • DICTYBASE: DICTYBASE

  • ENSEMBL: ENSEMBL

  • NCBI: NCBI

  • UCSC: UCSC

species
label:

Species

type:

basic:string

description:

Species Latin name.

required:

True

disabled:

False

hidden:

False

choices:

  • Homo sapiens: Homo sapiens

  • Mus musculus: Mus musculus

  • Rattus norvegicus: Rattus norvegicus

  • Dictyostelium discoideum: Dictyostelium discoideum

  • Odocoileus virginianus texanus: Odocoileus virginianus texanus

  • Solanum tuberosum: Solanum tuberosum
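
The input fields above can be checked before a gene set is created. The following is a minimal sketch, assuming the inputs are assembled as a plain dictionary keyed by the field names; the helper `build_geneset_inputs` is hypothetical, and only the choice lists are taken from this section.

```python
# Allowed values copied from the "choices" lists of the input fields.
GENE_ID_SOURCES = {"AFFY", "DICTYBASE", "ENSEMBL", "NCBI", "UCSC"}
SPECIES = {
    "Homo sapiens",
    "Mus musculus",
    "Rattus norvegicus",
    "Dictyostelium discoideum",
    "Odocoileus virginianus texanus",
    "Solanum tuberosum",
}


def build_geneset_inputs(genes, source, species):
    """Validate the choice fields and return a dict of inputs.

    Hypothetical helper: the dictionary layout is an assumption made for
    illustration, not a documented API.
    """
    if source not in GENE_ID_SOURCES:
        raise ValueError(f"Unsupported gene ID source: {source!r}")
    if species not in SPECIES:
        raise ValueError(f"Unsupported species: {species!r}")
    return {"genes": list(genes), "source": source, "species": species}
```

For example, `build_geneset_inputs(["ENSG00000141510"], "ENSEMBL", "Homo sapiens")` passes validation, while a source such as `"HGNC"` is rejected because it is not among the listed choices.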

Output results

geneset
label:

Gene set

type:

basic:file

required:

True

disabled:

False

hidden:

False

geneset_json
label:

Gene set (JSON)

type:

basic:json

required:

True

disabled:

False

hidden:

False

source
label:

Gene ID source

type:

basic:string

required:

True

disabled:

False

hidden:

False

species
label:

Species

type:

basic:string

required:

True

disabled:

False

hidden:

False

HISAT2

data:alignment:bam:hisat2alignment-hisat2 (data:index:hisat2  genome, data:reads:fastq  reads, basic:boolean  softclip, basic:integer  noncansplice, basic:boolean  cufflinks)[Source: v2.6.1]

HISAT2 is a fast and sensitive alignment program for mapping next-generation sequencing reads (both DNA and RNA) to a population of genomes (as well as to a single reference genome). See [here](https://ccb.jhu.edu/software/hisat2/index.shtml) for more information.

Input arguments

genome
label:

Reference genome

type:

data:index:hisat2

reads
label:

Reads

type:

data:reads:fastq

softclip
label:

Disallow soft clipping

type:

basic:boolean

default:

False

spliced_alignments.noncansplice
label:

Non-canonical splice sites penalty (optional)

type:

basic:integer

description:

Sets the penalty for each pair of non-canonical splice sites (e.g. non-GT/AG).

required:

False

spliced_alignments.cufflinks
label:

Report alignments tailored specifically for Cufflinks

type:

basic:boolean

description:

With this option, HISAT2 looks for novel splice sites with three signals (GT/AG, GC/AG, AT/AC), but all user-provided splice sites are used irrespective of their signals. HISAT2 produces an optional field, XS:A:[+-], for every spliced alignment.

default:

False
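
The three optional inputs above correspond to options documented in the HISAT2 manual (`--no-softclip`, `--pen-noncansplice`, `--dta-cufflinks`). The sketch below shows one plausible way to translate them into command-line arguments; the function `hisat2_options` is hypothetical, and whether the process builds its command exactly this way is an assumption.

```python
def hisat2_options(softclip=False, noncansplice=None, cufflinks=False):
    """Translate the optional process inputs into HISAT2 arguments.

    Hypothetical helper. Parameter names follow the input fields of this
    process; the flag mapping is an assumption based on the HISAT2 manual.
    """
    args = []
    if softclip:
        # The input is labeled "Disallow soft clipping".
        args.append("--no-softclip")
    if noncansplice is not None:
        # Penalty for each pair of non-canonical splice sites.
        args.extend(["--pen-noncansplice", str(noncansplice)])
    if cufflinks:
        # Report alignments tailored specifically for Cufflinks.
        args.append("--dta-cufflinks")
    return args
```

With all inputs left at their defaults the function returns an empty list, so HISAT2's own defaults apply.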

Output results

bam
label:

Alignment file

type:

basic:file

description:

Position sorted alignment

bai
label:

Index BAI

type:

basic:file

stats
label:

Statistics

type:

basic:file

splice_junctions
label:

Splice junctions

type:

basic:file

unmapped_f
label:

Unmapped reads (mate 1)

type:

basic:file

required:

False

unmapped_r
label:

Unmapped reads (mate 2)

type:

basic:file

required:

False

species
label:

Species

type:

basic:string

build
label:

Build

type:

basic:string

HISAT2 genome index

data:index:hisat2:hisat2-index (data:seq:nucleotide  ref_seq)[Source: v1.2.1]

Create HISAT2 genome index.

Input arguments

ref_seq
label:

Reference sequence (nucleotide FASTA)

type:

data:seq:nucleotide

required:

True

disabled:

False

hidden:

False

Output results

index
label:

HISAT2 index

type:

basic:dir

required:

True

disabled:

False

hidden:

False

fastagz
label:

FASTA file (compressed)

type:

basic:file

required:

True

disabled:

False

hidden:

False

fasta
label:

FASTA file

type:

basic:file

required:

True

disabled:

False

hidden:

False

fai
label:

FASTA file index

type:

basic:file

required:

True

disabled:

False

hidden:

False

species
label:

Species

type:

basic:string

required:

True

disabled:

False

hidden:

False

build
label:

Build

type:

basic:string

required:

True

disabled:

False

hidden:

False

HMR

data:wgbs:hmrhmr (data:wgbs:methcounts  methcounts)[Source: v1.4.0]

Identify hypo-methylated regions.

Input arguments

methcounts
label:

Methylation levels

type:

data:wgbs:methcounts

description:

Methylation levels data calculated using methcounts.

Output results

hmr
label:

Hypo-methylated regions

type:

basic:file

tbi_jbrowse
label:

BED file index for JBrowse

type:

basic:file

species
label:

Species

type:

basic:string

build
label:

Build

type:

basic:string

Hierarchical clustering of time courses

data:clustering:hierarchical:etc:clustering-hierarchical-etc (list:data:expression  expressions, list:basic:string  genes, basic:string  gene_species, basic:string  gene_source, basic:string  distance, basic:string  linkage, basic:boolean  ordering)[Source: v1.3.1]

Cluster gene expression time courses using hierarchical clustering.

Input arguments

expressions
label:

Time series relation

type:

list:data:expression

description:

Select the time course to which the expressions belong.

required:

True

disabled:

False

hidden:

False

genes
label:

Gene subset

type:

list:basic:string

description:

Select at least two genes or leave this field empty.

required:

False

disabled:

False

hidden:

False

gene_species
label:

Species

type:

basic:string

description:

Species to which the selected genes belong. This field is required if gene subset is set.

required:

False

disabled:

False

hidden:

!genes

choices:

  • Dictyostelium discoideum: Dictyostelium discoideum

  • Homo sapiens: Homo sapiens

  • Macaca mulatta: Macaca mulatta

  • Mus musculus: Mus musculus

  • Rattus norvegicus: Rattus norvegicus

gene_source
label:

Gene ID database of selected genes

type:

basic:string

description:

This field is required if gene subset is set.

required:

False

disabled:

False

hidden:

!genes
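
The conditional rules for the gene subset fields (at least two genes or none, with species and gene ID database required once a subset is given) can be expressed as a small check. This is a minimal sketch; the function `check_gene_subset` and its error messages are hypothetical.

```python
def check_gene_subset(genes=None, gene_species=None, gene_source=None):
    """Return a list of problems with the gene subset inputs (empty if fine).

    Hypothetical helper that encodes the field descriptions:
    - the subset is either empty or contains at least two genes;
    - species and gene ID database are required when a subset is set.
    """
    genes = list(genes or [])
    errors = []
    if not genes:
        return errors  # No subset selected, nothing else to check.
    if len(genes) < 2:
        errors.append("Select at least two genes or leave the field empty.")
    if not gene_species:
        errors.append("gene_species is required if gene subset is set.")
    if not gene_source:
        errors.append("gene_source is required if gene subset is set.")
    return errors
```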

distance
label:

Distance metric

type:

basic:string

required:

True

disabled:

False

hidden:

False

default:

spearman

choices:

  • Euclidean: euclidean

  • Spearman: spearman

  • Pearson: pearson

linkage
label:

Linkage method

type:

basic:string

required:

True

disabled:

False

hidden:

False

default:

average

choices:

  • single: single

  • average: average

  • complete: complete

ordering
label:

Use optimal ordering

type:

basic:boolean

description:

Results in a more intuitive tree structure, but may slow down the clustering on large datasets.

required:

True

disabled:

False

hidden:

False

default:

False

Output results

cluster
label:

Hierarchical clustering

type:

basic:json

required:

True

disabled:

False

hidden:

False

source
label:

Gene ID database

type:

basic:string

required:

True

disabled:

False

hidden:

False

species
label:

Species

type:

basic:string

required:

True

disabled:

False

hidden:

False

build
label:

Build

type:

basic:string

required:

True

disabled:

False

hidden:

False

feature_type
label:

Feature type

type:

basic:string

required:

True

disabled:

False

hidden:

False

IDAT file

data:methylationarray:idat:upload-idat (basic:file  red_channel, basic:file  green_channel, basic:string  species, basic:string  platform)[Source: v1.1.1]

Upload Illumina methylation array raw IDAT data. This import process accepts Illumina methylation array BeadChip raw files in IDAT format. Two input files, one for each of the Green and Red signal channels, are expected. The uploads of human (HM27, HM450, EPIC) and mouse (MM285) array types are supported.

Input arguments

red_channel
label:

Red channel IDAT file (*_Red.idat)

type:

basic:file

required:

True

disabled:

False

hidden:

False

green_channel
label:

Green channel IDAT file (*_Grn.idat)

type:

basic:file

required:

True

disabled:

False

hidden:

False

species
label:

Species

type:

basic:string

description:

Select a species name from the dropdown menu.

required:

True

disabled:

False

hidden:

False

default:

Homo sapiens

choices:

  • Homo sapiens: Homo sapiens

  • Mus musculus: Mus musculus

platform
label:

Platform

type:

basic:string

description:

Select a methylation array platform for human (HM450, HM27, EPIC) or mouse (MM285) samples.

required:

True

disabled:

False

hidden:

False

default:

HM450

choices:

  • HM450: HM450

  • HM27: HM27

  • EPIC: EPIC

  • MM285: MM285
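
The upload expects one file per channel and a platform that matches the species. A minimal sketch of such a pre-upload check follows; `check_idat_upload` is a hypothetical helper, the file name suffixes are taken from the field labels, and the assumption that uploads are uncompressed `.idat` files is mine.

```python
# Platforms per species, as listed in the process description.
PLATFORMS_BY_SPECIES = {
    "Homo sapiens": {"HM27", "HM450", "EPIC"},
    "Mus musculus": {"MM285"},
}


def check_idat_upload(red_channel, green_channel,
                      species="Homo sapiens", platform="HM450"):
    """Return a list of problems with the upload inputs (empty if fine).

    Hypothetical helper. Assumes uncompressed file names ending in
    _Red.idat and _Grn.idat, as shown in the field labels.
    """
    errors = []
    if not red_channel.endswith("_Red.idat"):
        errors.append("Red channel file should match *_Red.idat")
    if not green_channel.endswith("_Grn.idat"):
        errors.append("Green channel file should match *_Grn.idat")
    if platform not in PLATFORMS_BY_SPECIES.get(species, set()):
        errors.append(f"Platform {platform} is not listed for {species}")
    return errors
```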

Output results

red_channel
label:

Red channel IDAT file

type:

basic:file

required:

True

disabled:

False

hidden:

False

green_channel
label:

Green channel IDAT file

type:

basic:file

required:

True

disabled:

False

hidden:

False

species
label:

Species

type:

basic:string

required:

True

disabled:

False

hidden:

False

platform
label:

Platform

type:

basic:string

required:

True

disabled:

False

hidden:

False

MACS 1.4

data:chipseq:callpeak:macs14macs14 (data:alignment:bam  treatment, data:alignment:bam  control, basic:string  pvalue)[Source: v3.5.1]

Model-based Analysis of ChIP-Seq (MACS 1.4) empirically models the length of the sequenced ChIP fragments, which tends to be shorter than sonication or library construction size estimates, and uses it to improve the spatial resolution of predicted binding sites. MACS also uses a dynamic Poisson distribution to effectively capture local biases in the genome sequence, allowing for more sensitive and robust prediction. See the [original paper](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2592715/) for more information.

Input arguments

treatment
label:

BAM File

type:

data:alignment:bam

control
label:

BAM Background File

type:

data:alignment:bam

required:

False

pvalue
label:

P-value

type:

basic:string

default:

1e-9

choices:

  • 1e-9: 1e-9

  • 1e-6: 1e-6

Output results

peaks_bed
label:

Peaks (BED)

type:

basic:file

summits_bed
label:

Summits (BED)

type:

basic:file

peaks_xls
label:

Peaks (XLS)

type:

basic:file

wiggle
label:

Wiggle

type:

basic:file

control_bigwig
label:

Control (bigWig)

type:

basic:file

required:

False

treat_bigwig
label:

Treat (bigWig)

type:

basic:file

peaks_bigbed_igv_ucsc
label:

Peaks (bigBed)

type:

basic:file

required:

False

summits_bigbed_igv_ucsc
label:

Summits (bigBed)

type:

basic:file

required:

False

peaks_tbi_jbrowse
label:

JBrowse track peaks file

type:

basic:file

summits_tbi_jbrowse
label:

JBrowse track summits file

type:

basic:file

model
label:

Model

type:

basic:file

required:

False

neg_peaks
label:

Negative peaks (XLS)

type:

basic:file

required:

False

species
label:

Species

type:

basic:string

build
label:

Build

type:

basic:string

MACS 2.0

data:chipseq:callpeak:macs2:macs2-callpeak (data:alignment:bam  case, data:alignment:bam  control, data:bed  promoter, basic:boolean  tagalign, basic:integer  q_threshold, basic:integer  n_sub, basic:boolean  tn5, basic:integer  shift, basic:string  format, basic:string  duplicates, basic:string  duplicates_prepeak, basic:decimal  qvalue, basic:decimal  pvalue, basic:decimal  pvalue_prepeak, basic:integer  cap_num, basic:integer  mfold_lower, basic:integer  mfold_upper, basic:integer  slocal, basic:integer  llocal, basic:integer  extsize, basic:integer  shift, basic:integer  band_width, basic:boolean  nolambda, basic:boolean  fix_bimodal, basic:boolean  nomodel, basic:boolean  nomodel_prepeak, basic:boolean  down_sample, basic:boolean  bedgraph, basic:boolean  spmr, basic:boolean  call_summits, basic:boolean  broad, basic:decimal  broad_cutoff)[Source: v4.8.1]

Call ChIP-Seq peaks with MACS 2.0. Model-based Analysis of ChIP-Seq (MACS 2.0) is used to identify transcription factor binding sites. MACS 2.0 captures the influence of genome complexity to evaluate the significance of enriched ChIP regions, and improves the spatial resolution of binding sites by combining the information of both sequencing tag position and orientation. It also has an option to link nearby peaks together in order to call broad peaks. See [here](https://github.com/taoliu/MACS/) for more information. In addition to peak calling, this process computes ChIP-Seq and ATAC-Seq QC metrics. The process returns a QC metrics report, fragment length estimation, and a deduplicated tagAlign file. The QC report contains the QC metrics proposed by ENCODE 3 – [NRF](https://www.encodeproject.org/data-standards/terms/), [PBC bottlenecking coefficients, NSC, and RSC](https://genome.ucsc.edu/ENCODE/qualityMetrics.html#chipSeq).

Input arguments

case
label:

Case (treatment)

type:

data:alignment:bam

required:

True

disabled:

False

hidden:

False

control
label:

Control (background)

type:

data:alignment:bam

required:

False

disabled:

False

hidden:

False

promoter
label:

Promoter regions BED file

type:

data:bed

description:

BED file containing promoter regions (TSS+-1000bp for example). Needed to get the number of peaks and reads mapped to promoter regions.

required:

False

disabled:

False

hidden:

False

tagalign
label:

Use tagAlign files

type:

basic:boolean

description:

Use filtered tagAlign files as case (treatment) and control (background) samples. If extsize parameter is not set, run MACS using input’s estimated fragment length.

required:

True

disabled:

False

hidden:

False

default:

False

prepeakqc_settings.q_threshold
label:

Quality filtering threshold

type:

basic:integer

required:

True

disabled:

False

hidden:

False

default:

30

prepeakqc_settings.n_sub
label:

Number of reads to subsample

type:

basic:integer

required:

True

disabled:

False

hidden:

False

default:

15000000

prepeakqc_settings.tn5
label:

Tn5 shifting

type:

basic:boolean

description:

Tn5 transposon shifting. Shift reads on ‘+’ strand by 4bp and reads on ‘-’ strand by 5bp.

required:

True

disabled:

False

hidden:

False

default:

False
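
The Tn5 shift described above can be sketched for a single read interval. The example follows one common convention in tagAlign-based pipelines, where only the 5' end of the read is adjusted; the section does not say whether this process moves the whole read or only its 5' end, so treat that detail as an assumption. The function name `tn5_shift` is hypothetical.

```python
def tn5_shift(start, end, strand):
    """Apply the Tn5 offset to a 0-based, half-open read interval.

    Assumption: only the 5' end is adjusted. On the '+' strand the 5' end
    is the start coordinate (+4 bp); on the '-' strand it is the end
    coordinate (-5 bp).
    """
    if strand == "+":
        return start + 4, end
    if strand == "-":
        return start, end - 5
    raise ValueError(f"Unknown strand: {strand!r}")
```

For a 50 bp read at `(100, 150)`, this gives `(104, 150)` on the plus strand and `(100, 145)` on the minus strand.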

prepeakqc_settings.shift
label:

User-defined cross-correlation peak strandshift

type:

basic:integer

description:

If defined, SPP tool will not try to estimate fragment length but will use the given value as fragment length.

required:

False

disabled:

False

hidden:

False

settings.format
label:

Format of tag file

type:

basic:string

description:

This specifies the format of input files. For paired-end data the format dictates how MACS2 will treat mates. If the selected format is BAM, MACS2 will only keep the left mate (5’ end) tag. However, when format BAMPE is selected, MACS2 will use the actual insert sizes of read pairs to build the fragment pileup, instead of building a bimodal distribution of plus and minus strand reads to predict the fragment size.

required:

True

disabled:

False

hidden:

tagalign

default:

BAM

choices:

  • BAM: BAM

  • BAMPE: BAMPE

settings.duplicates
label:

Number of duplicates

type:

basic:string

description:

Controls how MACS treats duplicate tags at the exact same location (the same coordinates and the same strand). The ‘auto’ option makes MACS calculate the maximum number of tags allowed at the exact same location based on a binomial distribution, using 1e-5 as the p-value cutoff; the ‘all’ option keeps all the tags. If an integer is given, at most this number of tags will be kept at the same location. MACS by default keeps one tag at the same location.

required:

False

disabled:

False

hidden:

tagalign

choices:

  • 1: 1

  • auto: auto

  • all: all
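
The integer and ‘all’ settings can be illustrated with a small filter over tags. This is a simplified sketch, not MACS code: tags are represented as hypothetical `(chrom, position, strand)` tuples, and the ‘auto’ option is left unimplemented because its binomial threshold depends on details not given in this section.

```python
from collections import Counter


def filter_duplicates(tags, keep_dup="1"):
    """Keep at most `keep_dup` tags per (chrom, position, strand).

    Hypothetical helper. `tags` is an iterable of (chrom, position, strand)
    tuples; `keep_dup` is "all" or an integer given as int or str.
    """
    tags = list(tags)
    if keep_dup == "all":
        return tags
    if keep_dup == "auto":
        raise NotImplementedError("binomial-based limit is not sketched here")
    limit = int(keep_dup)
    seen = Counter()
    kept = []
    for tag in tags:
        if seen[tag] < limit:
            seen[tag] += 1
            kept.append(tag)
    return kept
```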

settings.duplicates_prepeak
label:

Number of duplicates

type:

basic:string

description:

Controls how MACS treats duplicate tags at the exact same location (the same coordinates and the same strand). The ‘auto’ option makes MACS calculate the maximum number of tags allowed at the exact same location based on a binomial distribution, using 1e-5 as the p-value cutoff; the ‘all’ option keeps all the tags. If an integer is given, at most this number of tags will be kept at the same location. MACS by default keeps one tag at the same location.

required:

True

disabled:

False

hidden:

!tagalign

default:

all

choices:

  • 1: 1

  • auto: auto

  • all: all

settings.qvalue
label:

Q-value cutoff

type:

basic:decimal

description:

The q-value (minimum FDR) cutoff to call significant regions. Q-values are calculated from p-values using Benjamini-Hochberg procedure.

required:

False

disabled:

settings.pvalue && settings.pvalue_prepeak

hidden:

False
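
For readers unfamiliar with the Benjamini-Hochberg procedure mentioned above, the following generic sketch converts a list of p-values into q-values. It illustrates the textbook procedure only and is not MACS2's internal implementation; the function name is hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg q-values in the order of the input."""
    m = len(pvalues)
    # Indices of the p-values from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so q-values stay monotone.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, pvalues[index] * m / rank)
        qvalues[index] = running_min
    return qvalues
```

A region would then be called significant when its q-value is at or below the chosen cutoff.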

settings.pvalue
label:

P-value cutoff

type:

basic:decimal

description:

The p-value cutoff. If specified, MACS2 will use p-value instead of q-value cutoff.

required:

False

disabled:

settings.qvalue

hidden:

tagalign

settings.pvalue_prepeak
label:

P-value cutoff

type:

basic:decimal

description:

The p-value cutoff. If specified, MACS2 will use p-value instead of q-value cutoff.

required:

True

disabled:

settings.qvalue

hidden:

!tagalign || settings.qvalue

default:

1e-05

settings.cap_num
label:

Cap number of peaks by taking top N peaks

type:

basic:integer

description:

To keep all peaks set value to 0.

required:

True

disabled:

settings.broad

hidden:

False

default:

500000
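
Capping can be sketched as sorting peaks and keeping the top N. The ranking key is an assumption: this section does not say which column the process ranks by, so the example takes a caller-supplied score function. The name `cap_peaks` and the `(name, score)` tuple layout are hypothetical.

```python
def cap_peaks(peaks, cap_num=500000, score=lambda peak: peak[1]):
    """Return peaks ranked by score, truncated to the top `cap_num`.

    Hypothetical helper. A `cap_num` of 0 keeps all peaks, as described
    for this input. `score` extracts the ranking value from a peak.
    """
    ranked = sorted(peaks, key=score, reverse=True)
    if cap_num == 0:
        return ranked
    return ranked[:cap_num]
```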

settings.mfold_lower
label:

MFOLD range (lower limit)

type:

basic:integer

description:

This parameter is used to select the regions within the MFOLD range of high-confidence enrichment ratio against background to build the model. The fold enrichment of the regions must be lower than the upper limit and higher than the lower limit. The default of 10,30 means that all regions that are not too low (>10) and not too high (<30) are used to build the paired-peaks model. If MACS cannot find more than 100 regions to build the model, it will use the --extsize parameter to continue the peak detection ONLY if --fix-bimodal is set.

required:

False

disabled:

False

hidden:

False

settings.mfold_upper
label:

MFOLD range (upper limit)

type:

basic:integer

description:

This parameter is used to select the regions within the MFOLD range of high-confidence enrichment ratio against background to build the model. The fold enrichment of the regions must be lower than the upper limit and higher than the lower limit. The default of 10,30 means that all regions that are not too low (>10) and not too high (<30) are used to build the paired-peaks model. If MACS cannot find more than 100 regions to build the model, it will use the --extsize parameter to continue the peak detection ONLY if --fix-bimodal is set.

required:

False

disabled:

False

hidden:

False

settings.slocal
label:

Small local region

type:

basic:integer

description:

The slocal and llocal parameters control which two levels of regions will be checked around the peak regions to calculate the maximum lambda as local lambda. By default, MACS considers 1000 bp for the small local region (--slocal) and 10000 bp for the large local region (--llocal), which captures the bias from a long-range effect like an open chromatin domain. You can tweak these according to your project. Remember that if the region is set too small, a sharp spike in the input data may kill the significant peak.

required:

False

disabled:

False

hidden:

False

settings.llocal
label:

Large local region

type:

basic:integer

description:

The slocal and llocal parameters control which two levels of regions will be checked around the peak regions to calculate the maximum lambda as local lambda. By default, MACS considers 1000 bp for the small local region (--slocal) and 10000 bp for the large local region (--llocal), which captures the bias from a long-range effect like an open chromatin domain. You can tweak these according to your project. Remember that if the region is set too small, a sharp spike in the input data may kill the significant peak.

required:

False

disabled:

False

hidden:

False

settings.extsize
label:

Extension size [--extsize]

type:

basic:integer

description:

When --nomodel is set, MACS uses this parameter to extend reads in the 5’->3’ direction to fixed-size fragments. For example, if the size of the binding region for your transcription factor is 200 bp, and you want to bypass the model building by MACS, this parameter can be set to 200. This option is only valid when --nomodel is set or when MACS fails to build the model and --fix-bimodal is on.

required:

False

disabled:

False

hidden:

False

settings.shift
label:

Shift

type:

basic:integer

description:

Note that this is NOT the legacy --shiftsize option, which has been replaced by --extsize. You can set an arbitrary shift in bp here. Use discretion when setting it to anything other than the default value (0). When --nomodel is set, MACS will use this value to move cutting ends (5’) and then apply --extsize in the 5’ to 3’ direction to extend them to fragments. When this value is negative, ends will be moved in the 3’->5’ direction, otherwise in the 5’->3’ direction. It is recommended to keep the default of 0 for ChIP-Seq datasets, or to use -1 * half of EXTSIZE together with the --extsize option for detecting enriched cutting loci, such as in certain DNaseI-Seq datasets. Note that you cannot set values other than 0 if the format is BAMPE for paired-end data. Default is 0.

required:

False

disabled:

False

hidden:

settings.format == 'BAMPE'
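
The interplay of shift and extension size is easier to see on a concrete tag. The sketch below derives a fragment interval from a tag's 5' end in the way the description above reads: first move the 5' end by the shift (negative values move it in the 3'->5' direction), then extend by the extension size in the 5'->3' direction. Coordinates are treated as simple integers and clamped at 0, which is a simplification; the function name is hypothetical and this is not MACS code.

```python
def fragment_from_tag(five_prime, strand, extsize, shift=0):
    """Return the (start, end) fragment built from a tag's 5' end.

    On the '+' strand, 5'->3' means increasing coordinates; on the '-'
    strand it means decreasing coordinates.
    """
    if strand == "+":
        start = five_prime + shift
        end = start + extsize
    elif strand == "-":
        end = five_prime - shift
        start = end - extsize
    else:
        raise ValueError(f"Unknown strand: {strand!r}")
    return max(start, 0), max(end, 0)
```

With `extsize=200` and `shift=-100`, a cut site at position 1000 yields the interval `(900, 1100)` on either strand, so the fragment is centered on the cutting locus. This matches the recommendation of -1 * half of EXTSIZE for enriched cutting loci.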

settings.band_width
label:

Band width

type:

basic:integer

description:

The band width which is used to scan the genome ONLY for model building. You can set this parameter as the sonication fragment size expected from wet experiment. The previous side effect on the peak detection process has been removed. So this parameter only affects the model building.

required:

False

disabled:

False

hidden:

False

settings.nolambda
label:

Use background lambda as local lambda

type:

basic:boolean

description:

With this flag on, MACS will use the background lambda as local lambda. This means MACS will not consider the local bias at peak candidate regions.

required:

True

disabled:

False

hidden:

False

default:

False

settings.fix_bimodal
label:

Turn on the auto paired-peak model process

type:

basic:boolean

description:

Turn on the auto paired-peak model process. If set, when MACS fails to build the paired model, it will use the nomodel settings and the --extsize parameter to extend each tag. If not set, MACS will be terminated if the paired-peak model fails.

required:

True

disabled:

False

hidden:

False

default:

False

settings.nomodel
label:

Bypass building the shifting model [--nomodel]

type:

basic:boolean

description:

While on, MACS will bypass building the shifting model.

required:

True

disabled:

False

hidden:

tagalign

default:

False

settings.nomodel_prepeak
label:

Bypass building the shifting model [--nomodel]

type:

basic:boolean

description:

While on, MACS will bypass building the shifting model.

required:

True

disabled:

False

hidden:

!tagalign

default:

True

settings.down_sample
label:

Down-sample

type:

basic:boolean

description:

When set to true, a random sampling method will scale down the bigger sample. By default, MACS uses linear scaling. This option makes the results unstable and irreproducible, because different reads are selected at random on each run; in particular, the reported numbers (pileup, p-value, q-value) will change.

required:

True

disabled:

False

hidden:

False

default:

False

settings.bedgraph
label:

Save fragment pileup and control lambda

type:

basic:boolean

description:

If this flag is on, MACS will store the fragment pileup, control lambda, -log10pvalue and -log10qvalue scores in bedGraph files. The bedGraph files will be stored in current directory named NAME+’_treat_pileup.bdg’ for treatment data, NAME+’_control_lambda.bdg’ for local lambda values from control, NAME+’_treat_pvalue.bdg’ for Poisson pvalue scores (in -log10(pvalue) form), and NAME+’_treat_qvalue.bdg’ for q-value scores from Benjamini-Hochberg-Yekutieli procedure.

required:

True

disabled:

False

hidden:

False

default:

True

settings.spmr
label:

Save fragment pileup and control lambda

type:

basic:boolean

required:

True

disabled:

settings.bedgraph === false

hidden:

False

default:

True

settings.call_summits
label:

Call summits [--call-summits]

type:

basic:boolean

description:

MACS will reanalyze the shape of the signal profile (p- or q-score, depending on the cutoff setting) to deconvolve subpeaks within each peak called by the general procedure. This is highly recommended for detecting adjacent binding events. When used, the output subpeaks of a big peak region will have the same peak boundaries, but different scores and peak summit positions.

required:

True

disabled:

False

hidden:

False

default:

False

settings.broad
label:

Composite broad regions [--broad]

type:

basic:boolean

description:

When this flag is on, MACS will try to composite broad regions in BED12 (a gene-model-like format) by putting nearby highly enriched regions into a broad region with a loose cutoff. The broad region is controlled by another cutoff through --broad-cutoff. The maximum length of a broad region is 4 times d from MACS.

required:

True

disabled:

settings.call_summits === true

hidden:

False

default:

False

settings.broad_cutoff
label:

Broad cutoff

type:

basic:decimal

description:

Cutoff for broad region. This option is not available unless --broad is set. If -p is set, this is a p-value cutoff, otherwise, it’s a q-value cutoff. DEFAULT = 0.1

required:

False

disabled:

settings.call_summits === true || settings.broad !== true

hidden:

False
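
Several of the inputs above constrain one another (q-value versus p-value cutoff, broad versus call summits, broad cutoff only with broad, no shift with BAMPE). The sketch below assembles a `macs2 callpeak` argument list for a subset of the settings and enforces those constraints. The flags are standard MACS2 options, but the function `macs2_callpeak_args` is hypothetical and the way this process actually builds its command is not shown in this section, so treat the mapping as an assumption. Options such as the genome size are omitted for brevity.

```python
def macs2_callpeak_args(case, control=None, fmt="BAM", qvalue=None,
                        pvalue=None, duplicates=None, nomodel=False,
                        extsize=None, shift=None, bedgraph=True, spmr=True,
                        call_summits=False, broad=False, broad_cutoff=None):
    """Build a macs2 callpeak argument list from a subset of the settings."""
    # Constraints mirrored from the "disabled" rules of the input fields.
    if qvalue is not None and pvalue is not None:
        raise ValueError("Set either a q-value or a p-value cutoff, not both")
    if broad and call_summits:
        raise ValueError("broad and call_summits cannot be combined")
    if broad_cutoff is not None and not broad:
        raise ValueError("broad_cutoff requires broad")
    if fmt == "BAMPE" and shift not in (None, 0):
        raise ValueError("shift must be 0 when the format is BAMPE")

    args = ["macs2", "callpeak", "-t", case, "-f", fmt]
    if control is not None:
        args += ["-c", control]
    if qvalue is not None:
        args += ["-q", str(qvalue)]
    if pvalue is not None:
        args += ["-p", str(pvalue)]
    if duplicates is not None:
        args += ["--keep-dup", str(duplicates)]
    if nomodel:
        args.append("--nomodel")
    if extsize is not None:
        args += ["--extsize", str(extsize)]
    if shift is not None:
        args += ["--shift", str(shift)]
    if bedgraph:
        args.append("-B")
        if spmr:  # only meaningful together with bedGraph output
            args.append("--SPMR")
    if call_summits:
        args.append("--call-summits")
    if broad:
        args.append("--broad")
        if broad_cutoff is not None:
            args += ["--broad-cutoff", str(broad_cutoff)]
    return args
```

The returned list can be passed to `subprocess.run` unchanged, which avoids shell quoting issues with file names.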

Output results

called_peaks
label:

Called peaks

type:

basic:file

required:

True

disabled:

False

hidden:

False

narrow_peaks
label:

Narrow peaks

type:

basic:file

required:

False

disabled:

False

hidden:

False

chip_qc
label:

QC report

type:

basic:file

required:

False

disabled:

False

hidden:

False

case_prepeak_qc
label:

Pre-peak QC report (case)

type:

basic:file

required:

True

disabled:

False

hidden:

False

case_tagalign
label:

Filtered tagAlign (case)

type:

basic:file

required:

True

disabled:

False

hidden:

False

case_bam
label:

Filtered BAM (case)

type:

basic:file

required:

True

disabled:

False

hidden:

False

case_bai
label:

Filtered BAM index (case)

type:

basic:file

required:

True

disabled:

False

hidden:

False

control_prepeak_qc
label:

Pre-peak QC report (control)

type:

basic:file

required:

False

disabled:

False

hidden:

False

control_tagalign
label:

Filtered tagAlign (control)

type:

basic:file

required:

False

disabled:

False

hidden:

False

control_bam
label:

Filtered BAM (control)

type:

basic:file

required:

False

disabled:

False

hidden:

False

control_bai
label:

Filtered BAM index (control)

type:

basic:file

required:

False

disabled:

False

hidden:

False

narrow_peaks_bigbed_igv_ucsc
label:

Narrow peaks (bigBed)

type:

basic:file

required:

False

disabled:

False

hidden:

False

summits
label:

Peak summits

type:

basic:file

required:

False

disabled:

False

hidden:

False

summits_tbi_jbrowse
label:

Peak summits tbi index for JBrowse

type:

basic:file

required:

False

disabled:

False

hidden:

False

summits_bigbed_igv_ucsc
label:

Summits (bigBed)

type:

basic:file

required:

False

disabled:

False

hidden:

False

broad_peaks
label:

Broad peaks

type:

basic:file

required:

False

disabled:

False

hidden:

False

gappedPeak
label:

Broad peaks (bed12/gappedPeak)

type:

basic:file

required:

False

disabled:

False

hidden:

False

treat_pileup
label:

Treatment pileup (bedGraph)

type:

basic:file

required:

False

disabled:

False

hidden:

False

treat_pileup_bigwig
label:

Treatment pileup (bigWig)

type:

basic:file

required:

False

disabled:

False

hidden:

False

control_lambda
label:

Control lambda (bedGraph)

type:

basic:file

required:

False

disabled:

False

hidden:

False

control_lambda_bigwig
label:

Control lambda (bigWig)

type:

basic:file

required:

False

disabled:

False

hidden:

False

model
label:

Model

type:

basic:file

required:

False

disabled:

False

hidden:

False

species
label:

Species

type:

basic:string

required:

True

disabled:

False

hidden:

False

build
label:

Build

type:

basic:string

required:

True

disabled:

False

hidden:

False

MACS2

data:workflow:chipseq:macs2rose2workflow-macs2 (data:alignment:bam  case, data:alignment:bam  control, data:bed  promoter, basic:boolean  tagalign, basic:integer  q_threshold, basic:integer  n_sub, basic:boolean  tn5, basic:integer  shift, basic:string  duplicates, basic:string  duplicates_prepeak, basic:decimal  qvalue, basic:decimal  pvalue, basic:decimal  pvalue_prepeak, basic:integer  cap_num, basic:integer  mfold_lower, basic:integer  mfold_upper, basic:integer  slocal, basic:integer  llocal, basic:integer  extsize, basic:integer  shift, basic:integer  band_width, basic:boolean  nolambda, basic:boolean  fix_bimodal, basic:boolean  nomodel, basic:boolean  nomodel_prepeak, basic:boolean  down_sample, basic:boolean  bedgraph, basic:boolean  spmr, basic:boolean  call_summits, basic:boolean  broad, basic:decimal  broad_cutoff, data:bed  blacklist, basic:boolean  calculate_enrichment, basic:integer  profile_window, basic:string  shift_size)[Source: v1.2.0]

Input arguments

case
label:

Case (treatment)

type:

data:alignment:bam

control
label:

Control (background)

type:

data:alignment:bam

required:

False

promoter
label:

Promoter regions BED file

type:

data:bed

description:

BED file containing promoter regions (TSS+-1000 bp for example). Needed to get the number of peaks and reads mapped to promoter regions.

required:

False

tagalign
label:

Use tagAlign files

type:

basic:boolean

description:

Use filtered tagAlign files as case (treatment) and control (background) samples. If extsize parameter is not set, run MACS using input’s estimated fragment length.

default:

False

prepeakqc_settings.q_threshold
label:

Quality filtering threshold

type:

basic:integer

default:

30

prepeakqc_settings.n_sub
label:

Number of reads to subsample

type:

basic:integer

default:

15000000

prepeakqc_settings.tn5
label:

Tn5 shifting

type:

basic:boolean

description:

Tn5 transposon shifting. Shift reads on “+” strand by 4 bp and reads on “-” strand by 5 bp.

default:

False

prepeakqc_settings.shift
label:

User-defined cross-correlation peak strandshift

type:

basic:integer

description:

If defined, SPP tool will not try to estimate fragment length but will use the given value as fragment length.

required:

False

settings.duplicates
label:

Number of duplicates

type:

basic:string

description:

Controls how MACS treats duplicate tags at the exact same location (the same coordinates and the same strand). The ‘auto’ option makes MACS calculate the maximum number of tags allowed at the exact same location based on a binomial distribution, using 1e-5 as the p-value cutoff; the ‘all’ option keeps all the tags. If an integer is given, at most this number of tags will be kept at the same location. MACS by default keeps one tag at the same location.

required:

False

hidden:

tagalign

choices:

  • 1: 1

  • auto: auto

  • all: all

settings.duplicates_prepeak
label:

Number of duplicates

type:

basic:string

description:

Controls how MACS treats duplicate tags at the exact same location (the same coordinates and the same strand). The ‘auto’ option makes MACS calculate the maximum number of tags allowed at the exact same location based on a binomial distribution, using 1e-5 as the p-value cutoff; the ‘all’ option keeps all the tags. If an integer is given, at most this number of tags will be kept at the same location. MACS by default keeps one tag at the same location.

required:

False

hidden:

!tagalign

default:

all

choices:

  • 1: 1

  • auto: auto

  • all: all

settings.qvalue
label:

Q-value cutoff

type:

basic:decimal

description:

The q-value (minimum FDR) cutoff to call significant regions. Q-values are calculated from p-values using the Benjamini-Hochberg procedure.

required:

False

disabled:

settings.pvalue && settings.pvalue_prepeak
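
For readers unfamiliar with the Benjamini-Hochberg procedure mentioned above, here is a generic textbook sketch of turning p-values into q-values. The function name is hypothetical, and MACS2 computes its q-values internally on its own score tables, so this illustrates the idea rather than the tool's code.

```python
from typing import List, Sequence


def benjamini_hochberg(pvalues: Sequence[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, so each q-value is the minimum of
    # p * m / rank over the current rank and all higher ranks.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        qvalues[idx] = running_min
    return qvalues


qvals = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
significant = [q <= 0.05 for q in qvals]
```

A region is then called significant when its q-value is at or below the chosen cutoff.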

settings.pvalue
label:

P-value cutoff

type:

basic:decimal

description:

The p-value cutoff. If specified, MACS2 will use p-value instead of q-value cutoff.

required:

False

disabled:

settings.qvalue

hidden:

tagalign

settings.pvalue_prepeak
label:

P-value cutoff

type:

basic:decimal

description:

The p-value cutoff. If specified, MACS2 will use p-value instead of q-value cutoff.

disabled:

settings.qvalue

hidden:

!tagalign || settings.qvalue

default:

1e-05

settings.cap_num
label:

Cap number of peaks by taking top N peaks

type:

basic:integer

description:

To keep all peaks set value to 0.

disabled:

settings.broad

default:

500000
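
The capping rule (top N peaks, with 0 meaning "keep all") can be sketched as follows. The `cap_peaks` helper and the `score` key are hypothetical; which column the workflow actually ranks by is not stated here, so ranking by a score field is an assumption.

```python
from typing import Dict, List


def cap_peaks(peaks: List[Dict], cap_num: int = 500000) -> List[Dict]:
    """Return the top `cap_num` peaks by score; 0 keeps all peaks."""
    # Assumption: peaks are ranked by a numeric "score" field, highest first.
    ranked = sorted(peaks, key=lambda peak: peak["score"], reverse=True)
    return ranked if cap_num == 0 else ranked[:cap_num]


peaks = [
    {"name": "a", "score": 5},
    {"name": "b", "score": 9},
    {"name": "c", "score": 1},
]
top_two = cap_peaks(peaks, cap_num=2)
```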

settings.mfold_lower
label:

MFOLD range (lower limit)

type:

basic:integer

description:

This parameter is used to select the regions within the MFOLD range of high-confidence enrichment ratio against background to build the model. The fold enrichment of the regions must be lower than the upper limit and higher than the lower limit. DEFAULT:10,30 means using all regions not too low (>10) and not too high (<30) to build the paired-peaks model. If MACS cannot find more than 100 regions to build the model, it will use the --extsize parameter to continue the peak detection ONLY if --fix-bimodal is set.

required:

False

settings.mfold_upper
label:

MFOLD range (upper limit)

type:

basic:integer

description:

This parameter is used to select the regions within the MFOLD range of high-confidence enrichment ratio against background to build the model. The fold enrichment of the regions must be lower than the upper limit and higher than the lower limit. DEFAULT:10,30 means using all regions not too low (>10) and not too high (<30) to build the paired-peaks model. If MACS cannot find more than 100 regions to build the model, it will use the --extsize parameter to continue the peak detection ONLY if --fix-bimodal is set.

required:

False

settings.slocal
label:

Small local region

type:

basic:integer

description:

Slocal and llocal parameters control which two levels of regions will be checked around the peak regions to calculate the maximum lambda as local lambda. By default, MACS considers 1000 bp for small local region (--slocal), and 10000 bp for large local region (--llocal) which captures the bias from a long range effect like an open chromatin domain. You can tweak these according to your project. Remember that if the region is set too small, a sharp spike in the input data may kill the significant peak.

required:

False

settings.llocal
label:

Large local region

type:

basic:integer

description:

Slocal and llocal parameters control which two levels of regions will be checked around the peak regions to calculate the maximum lambda as local lambda. By default, MACS considers 1000 bp for small local region (--slocal), and 10000 bp for large local region (--llocal) which captures the bias from a long range effect like an open chromatin domain. You can tweak these according to your project. Remember that if the region is set too small, a sharp spike in the input data may kill the significant peak.

required:

False

settings.extsize
label:

extsize

type:

basic:integer

description:

When ‘--nomodel’ is set, MACS uses this parameter to extend reads in the 5’->3’ direction to fixed-size fragments. For example, if the size of the binding region for your transcription factor is 200 bp and you want to bypass model building by MACS, this parameter can be set to 200. This option is only valid when --nomodel is set or when MACS fails to build the model and --fix-bimodal is on.

required:

False

settings.shift
label:

Shift

type:

basic:integer

description:

Note, this is NOT the legacy --shiftsize option, which is replaced by --extsize! You can set an arbitrary shift in bp here. Please use discretion when setting it to a value other than the default (0). When --nomodel is set, MACS will use this value to move cutting ends (5’) and then apply --extsize in the 5’ to 3’ direction to extend them to fragments. When this value is negative, ends will be moved in the 3’->5’ direction, otherwise in the 5’->3’ direction. It is recommended to keep the default of 0 for ChIP-Seq datasets, or to use -1 * half of EXTSIZE together with the --extsize option for detecting enriched cutting loci, such as in certain DNaseI-Seq datasets. Note that you can’t set values other than 0 if the format is BAMPE for paired-end data. Default is 0.

required:

False
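
The interplay of the shift and extsize parameters can be sketched as below. This is a simplified reading of the description, not MACS's source code: `fragment_from_cut_end` is a hypothetical helper, and coordinates are treated as plain integers on a single chromosome.

```python
from typing import Tuple


def fragment_from_cut_end(
    five_prime: int, strand: str, shift: int, extsize: int
) -> Tuple[int, int]:
    """Return (start, end) of the fragment built from a read's 5' end."""
    if strand == "+":
        # Positive shift moves the end in the 5'->3' direction (rightwards).
        start = five_prime + shift
        return start, start + extsize
    if strand == "-":
        # On the minus strand, 5'->3' points towards lower coordinates.
        end = five_prime - shift
        return end - extsize, end
    raise ValueError(f"unknown strand: {strand!r}")


# With shift = -1 * half of extsize, the fragment is centred on the cut site.
centred = fragment_from_cut_end(1000, "+", shift=-100, extsize=200)
```

With `shift=-100` and `extsize=200`, both strands yield a 200 bp window centred on the 5’ end, which is why that combination is suggested for detecting enriched cutting loci.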

settings.band_width
label:

Band width

type:

basic:integer

description:

The band width which is used to scan the genome ONLY for model building. You can set this parameter as the sonication fragment size expected from wet experiment. The previous side effect on the peak detection process has been removed. So this parameter only affects the model building.

required:

False

settings.nolambda
label:

Use background lambda as local lambda

type:

basic:boolean

description:

With this flag on, MACS will use the background lambda as local lambda. This means MACS will not consider the local bias at peak candidate regions.

default:

False

settings.fix_bimodal
label:

Turn on the auto paired-peak model process

type:

basic:boolean

description:

Turn on the auto paired-peak model process. If set, when MACS fails to build the paired model, it will use the nomodel settings and the ‘--extsize’ parameter to extend each tag. If not set, MACS will be terminated if the paired-peak model fails.

default:

False

settings.nomodel
label:

Bypass building the shifting model

type:

basic:boolean

description:

While on, MACS will bypass building the shifting model.

hidden:

tagalign

default:

False

settings.nomodel_prepeak
label:

Bypass building the shifting model

type:

basic:boolean

description:

While on, MACS will bypass building the shifting model.

hidden:

!tagalign

default:

True

settings.down_sample
label:

Down-sample

type:

basic:boolean

description:

When set to true, a random sampling method is used to scale down the bigger sample. By default, MACS uses linear scaling. This option makes the results unstable and irreproducible, since different reads are randomly selected on each run; in particular, the numbers (pileup, p-value, q-value) will change.

default:

False

settings.bedgraph
label:

Save fragment pileup and control lambda

type:

basic:boolean

description:

If this flag is on, MACS will store the fragment pileup, control lambda, -log10pvalue and -log10qvalue scores in bedGraph files. The bedGraph files will be stored in current directory named NAME+’_treat_pileup.bdg’ for treatment data, NAME+’_control_lambda.bdg’ for local lambda values from control, NAME+’_treat_pvalue.bdg’ for Poisson pvalue scores (in -log10(pvalue) form), and NAME+’_treat_qvalue.bdg’ for q-value scores from Benjamini-Hochberg-Yekutieli procedure.

default:

True

settings.spmr
label:

Save signal per million reads for fragment pileup profiles

type:

basic:boolean

disabled:

settings.bedgraph === false

default:

True

settings.call_summits
label:

Call summits

type:

basic:boolean

description:

MACS will now reanalyze the shape of signal profile (p or q-score depending on cutoff setting) to deconvolve subpeaks within each peak called from general procedure. It’s highly recommended to detect adjacent binding events. While used, the output subpeaks of a big peak region will have the same peak boundaries, and different scores and peak summit positions.

default:

False

settings.broad
label:

Composite broad regions

type:

basic:boolean

description:

When this flag is on, MACS will try to composite broad regions in BED12 (a gene-model-like format) by putting nearby highly enriched regions into a broad region with a loose cutoff. The broad region is controlled by another cutoff through --broad-cutoff. The maximum length of a broad region is 4 times d from MACS.

disabled:

settings.call_summits === true

default:

False

settings.broad_cutoff
label:

Broad cutoff

type:

basic:decimal

description:

Cutoff for broad region. This option is not available unless --broad is set. If -p is set, this is a p-value cutoff; otherwise, it’s a q-value cutoff. DEFAULT = 0.1

required:

False

disabled:

settings.call_summits === true || settings.broad !== true

chipqc_settings.blacklist
label:

Blacklist regions

type:

data:bed

description:

BED file containing genomic regions that should be excluded from the analysis.

required:

False

chipqc_settings.calculate_enrichment
label:

Calculate enrichment

type:

basic:boolean

description:

Calculate enrichment of signal in known genomic annotation. By default annotation is provided from the TranscriptDB package specified by genome build which should match one of the supported annotations (hg19, hg38, hg18, mm10, mm9, rn4, ce6, dm3). If annotation is not supported the analysis is skipped.

default:

False

chipqc_settings.profile_window
label:

Window size

type:

basic:integer

description:

An integer indicating the width of the window used for peak profiles. Peaks will be centered on their summits and include half of the window size upstream and half downstream of this point.

default:

400
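
The window described above, centred on the summit with half of the width on each side, can be sketched as follows. The `profile_window` helper is a hypothetical name, and clamping at coordinate 0 is an assumption added for safety.

```python
from typing import Tuple


def profile_window(summit: int, window_size: int = 400) -> Tuple[int, int]:
    """Return (start, end) of a window centred on the peak summit."""
    half = window_size // 2
    # Assumption: windows are clamped so they never start before position 0.
    return max(0, summit - half), summit + half


window = profile_window(1000)
```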

chipqc_settings.shift_size
label:

Shift size

type:

basic:string

description:

Vector of values to try when computing optimal shift sizes. It should be specified as consecutive numbers vector with start:end

default:

1:300
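
A `start:end` string such as the default `1:300` denotes consecutive integers. The sketch below expands it in Python, assuming both endpoints are included (as in R's `start:end` notation); `parse_shift_range` is a hypothetical helper, not part of the process.

```python
def parse_shift_range(spec: str) -> range:
    """Expand a 'start:end' string into consecutive integers, end inclusive."""
    start_text, end_text = spec.split(":")
    start, end = int(start_text), int(end_text)
    if start > end:
        raise ValueError(f"start must not exceed end: {spec!r}")
    return range(start, end + 1)


shifts = parse_shift_range("1:300")
```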

Output results

MACS2 - ROSE2

data:workflow:chipseq:macs2rose2workflow-macs-rose (data:alignment:bam  case, data:alignment:bam  control, data:bed  promoter, basic:boolean  tagalign, basic:integer  q_threshold, basic:integer  n_sub, basic:boolean  tn5, basic:integer  shift, basic:string  duplicates, basic:string  duplicates_prepeak, basic:decimal  qvalue, basic:decimal  pvalue, basic:decimal  pvalue_prepeak, basic:integer  cap_num, basic:integer  mfold_lower, basic:integer  mfold_upper, basic:integer  slocal, basic:integer  llocal, basic:integer  extsize, basic:integer  shift, basic:integer  band_width, basic:boolean  nolambda, basic:boolean  fix_bimodal, basic:boolean  nomodel, basic:boolean  nomodel_prepeak, basic:boolean  down_sample, basic:boolean  bedgraph, basic:boolean  spmr, basic:boolean  call_summits, basic:boolean  broad, basic:decimal  broad_cutoff, basic:boolean  use_filtered_bam, basic:integer  tss, basic:integer  stitch, data:bed  mask, data:bed  blacklist, basic:boolean  calculate_enrichment, basic:integer  profile_window, basic:string  shift_size)[Source: v1.4.0]

Input arguments

case
label:

Case (treatment)

type:

data:alignment:bam

control
label:

Control (background)

type:

data:alignment:bam

required:

False

promoter
label:

Promoter regions BED file

type:

data:bed

description:

BED file containing promoter regions (TSS ± 1000 bp, for example). Needed to get the number of peaks and reads mapped to promoter regions.

required:

False

tagalign
label:

Use tagAlign files

type:

basic:boolean

description:

Use filtered tagAlign files as case (treatment) and control (background) samples. If extsize parameter is not set, run MACS using input’s estimated fragment length.

default:

False

prepeakqc_settings.q_threshold
label:

Quality filtering threshold

type:

basic:integer

default:

30

prepeakqc_settings.n_sub
label:

Number of reads to subsample

type:

basic:integer

default:

15000000

prepeakqc_settings.tn5
label:

Tn5 shifting

type:

basic:boolean

description:

Tn5 transposon shifting. Shift reads on “+” strand by 4 bp and reads on “-” strand by 5 bp.

default:

False

prepeakqc_settings.shift
label:

User-defined cross-correlation peak strandshift

type:

basic:integer

description:

If defined, SPP tool will not try to estimate fragment length but will use the given value as fragment length.

required:

False

settings.duplicates
label:

Number of duplicates

type:

basic:string

description:

Controls how MACS handles duplicate tags at the exact same location, that is, the same coordinates and the same strand. The ‘auto’ option makes MACS calculate the maximum number of tags allowed at the exact same location based on a binomial distribution, using 1e-5 as the p-value cutoff; the ‘all’ option keeps all tags. If an integer is given, at most this number of tags will be kept at the same location. The default is to keep one tag at the same location.

required:

False

hidden:

tagalign

choices:

  • 1: 1

  • auto: auto

  • all: all

settings.duplicates_prepeak
label:

Number of duplicates

type:

basic:string

description:

Controls how MACS handles duplicate tags at the exact same location, that is, the same coordinates and the same strand. The ‘auto’ option makes MACS calculate the maximum number of tags allowed at the exact same location based on a binomial distribution, using 1e-5 as the p-value cutoff; the ‘all’ option keeps all tags. If an integer is given, at most this number of tags will be kept at the same location. The default is to keep one tag at the same location.

required:

False

hidden:

!tagalign

default:

all

choices:

  • 1: 1

  • auto: auto

  • all: all

settings.qvalue
label:

Q-value cutoff

type:

basic:decimal

description:

The q-value (minimum FDR) cutoff to call significant regions. Q-values are calculated from p-values using the Benjamini-Hochberg procedure.

required:

False

disabled:

settings.pvalue && settings.pvalue_prepeak

settings.pvalue
label:

P-value cutoff

type:

basic:decimal

description:

The p-value cutoff. If specified, MACS2 will use p-value instead of q-value cutoff.

required:

False

disabled:

settings.qvalue

hidden:

tagalign

settings.pvalue_prepeak
label:

P-value cutoff

type:

basic:decimal

description:

The p-value cutoff. If specified, MACS2 will use p-value instead of q-value cutoff.

disabled:

settings.qvalue

hidden:

!tagalign || settings.qvalue

default:

1e-05

settings.cap_num
label:

Cap number of peaks by taking top N peaks

type:

basic:integer

description:

To keep all peaks set value to 0.

disabled:

settings.broad

default:

500000

settings.mfold_lower
label:

MFOLD range (lower limit)

type:

basic:integer

description:

This parameter is used to select the regions within the MFOLD range of high-confidence enrichment ratio against background to build the model. The fold enrichment of the regions must be lower than the upper limit and higher than the lower limit. DEFAULT:10,30 means using all regions not too low (>10) and not too high (<30) to build the paired-peaks model. If MACS cannot find more than 100 regions to build the model, it will use the --extsize parameter to continue the peak detection ONLY if --fix-bimodal is set.

required:

False

settings.mfold_upper
label:

MFOLD range (upper limit)

type:

basic:integer

description:

This parameter is used to select the regions within the MFOLD range of high-confidence enrichment ratio against background to build the model. The fold enrichment of the regions must be lower than the upper limit and higher than the lower limit. DEFAULT:10,30 means using all regions not too low (>10) and not too high (<30) to build the paired-peaks model. If MACS cannot find more than 100 regions to build the model, it will use the --extsize parameter to continue the peak detection ONLY if --fix-bimodal is set.

required:

False

settings.slocal
label:

Small local region

type:

basic:integer

description:

Slocal and llocal parameters control which two levels of regions will be checked around the peak regions to calculate the maximum lambda as local lambda. By default, MACS considers 1000 bp for small local region (--slocal), and 10000 bp for large local region (--llocal) which captures the bias from a long range effect like an open chromatin domain. You can tweak these according to your project. Remember that if the region is set too small, a sharp spike in the input data may kill the significant peak.

required:

False

settings.llocal
label:

Large local region

type:

basic:integer

description:

Slocal and llocal parameters control which two levels of regions will be checked around the peak regions to calculate the maximum lambda as local lambda. By default, MACS considers 1000 bp for small local region (--slocal), and 10000 bp for large local region (--llocal) which captures the bias from a long range effect like an open chromatin domain. You can tweak these according to your project. Remember that if the region is set too small, a sharp spike in the input data may kill the significant peak.

required:

False

settings.extsize
label:

extsize

type:

basic:integer

description:

When ‘--nomodel’ is set, MACS uses this parameter to extend reads in the 5’->3’ direction to fixed-size fragments. For example, if the size of the binding region for your transcription factor is 200 bp and you want to bypass model building by MACS, this parameter can be set to 200. This option is only valid when --nomodel is set or when MACS fails to build the model and --fix-bimodal is on.

required:

False

settings.shift
label:

Shift

type:

basic:integer

description:

Note, this is NOT the legacy --shiftsize option, which is replaced by --extsize! You can set an arbitrary shift in bp here. Please use discretion when setting it to a value other than the default (0). When --nomodel is set, MACS will use this value to move cutting ends (5’) and then apply --extsize in the 5’ to 3’ direction to extend them to fragments. When this value is negative, ends will be moved in the 3’->5’ direction, otherwise in the 5’->3’ direction. It is recommended to keep the default of 0 for ChIP-Seq datasets, or to use -1 * half of EXTSIZE together with the --extsize option for detecting enriched cutting loci, such as in certain DNaseI-Seq datasets. Note that you can’t set values other than 0 if the format is BAMPE for paired-end data. Default is 0.

required:

False

settings.band_width
label:

Band width

type:

basic:integer

description:

The band width which is used to scan the genome ONLY for model building. You can set this parameter as the sonication fragment size expected from wet experiment. The previous side effect on the peak detection process has been removed. So this parameter only affects the model building.

required:

False

settings.nolambda
label:

Use background lambda as local lambda

type:

basic:boolean

description:

With this flag on, MACS will use the background lambda as local lambda. This means MACS will not consider the local bias at peak candidate regions.

default:

False

settings.fix_bimodal
label:

Turn on the auto paired-peak model process

type:

basic:boolean

description:

Turn on the auto paired-peak model process. If set, when MACS fails to build the paired model, it will use the nomodel settings and the ‘--extsize’ parameter to extend each tag. If not set, MACS will be terminated if the paired-peak model fails.

default:

False

settings.nomodel
label:

Bypass building the shifting model

type:

basic:boolean

description:

While on, MACS will bypass building the shifting model.

hidden:

tagalign

default:

False

settings.nomodel_prepeak
label:

Bypass building the shifting model

type:

basic:boolean

description:

While on, MACS will bypass building the shifting model.

hidden:

!tagalign

default:

True

settings.down_sample
label:

Down-sample

type:

basic:boolean

description:

When set to true, a random sampling method is used to scale down the bigger sample. By default, MACS uses linear scaling. This option makes the results unstable and irreproducible, since different reads are randomly selected on each run; in particular, the numbers (pileup, p-value, q-value) will change.

default:

False

settings.bedgraph
label:

Save fragment pileup and control lambda

type:

basic:boolean

description:

If this flag is on, MACS will store the fragment pileup, control lambda, -log10pvalue and -log10qvalue scores in bedGraph files. The bedGraph files will be stored in current directory named NAME+’_treat_pileup.bdg’ for treatment data, NAME+’_control_lambda.bdg’ for local lambda values from control, NAME+’_treat_pvalue.bdg’ for Poisson pvalue scores (in -log10(pvalue) form), and NAME+’_treat_qvalue.bdg’ for q-value scores from Benjamini-Hochberg-Yekutieli procedure.

default:

True

settings.spmr
label:

Save signal per million reads for fragment pileup profiles

type:

basic:boolean

disabled:

settings.bedgraph === false

default:

True

settings.call_summits
label:

Call summits

type:

basic:boolean

description:

MACS will now reanalyze the shape of signal profile (p or q-score depending on cutoff setting) to deconvolve subpeaks within each peak called from general procedure. It’s highly recommended to detect adjacent binding events. While used, the output subpeaks of a big peak region will have the same peak boundaries, and different scores and peak summit positions.

default:

False

settings.broad
label:

Composite broad regions

type:

basic:boolean

description:

When this flag is on, MACS will try to composite broad regions in BED12 (a gene-model-like format) by putting nearby highly enriched regions into a broad region with a loose cutoff. The broad region is controlled by another cutoff through --broad-cutoff. The maximum length of a broad region is 4 times d from MACS.

disabled:

settings.call_summits === true

default:

False

settings.broad_cutoff
label:

Broad cutoff

type:

basic:decimal

description:

Cutoff for broad region. This option is not available unless --broad is set. If -p is set, this is a p-value cutoff; otherwise, it’s a q-value cutoff. DEFAULT = 0.1

required:

False

disabled:

settings.call_summits === true || settings.broad !== true

rose_settings.use_filtered_bam
label:

Use Filtered BAM File

type:

basic:boolean

description:

Use filtered BAM file from a MACS2 object to rank enhancers by.

default:

False

rose_settings.tss
label:

TSS exclusion

type:

basic:integer

description:

Enter a distance from TSS to exclude. 0 = no TSS exclusion

default:

0

rose_settings.stitch
label:

Stitch

type:

basic:integer

description:

Enter a max linking distance for stitching. If not given, optimal stitching parameter will be determined automatically.

required:

False

rose_settings.mask
label:

Masking BED file

type:

data:bed

description:

Mask a set of regions from analysis. Provide a BED of masking regions.

required:

False

chipqc_settings.blacklist
label:

Blacklist regions

type:

data:bed

description:

BED file containing genomic regions that should be excluded from the analysis.

required:

False

chipqc_settings.calculate_enrichment
label:

Calculate enrichment

type:

basic:boolean

description:

Calculate enrichment of signal in known genomic annotation. By default annotation is provided from the TranscriptDB package specified by genome build which should match one of the supported annotations (hg19, hg38, hg18, mm10, mm9, rn4, ce6, dm3). If annotation is not supported the analysis is skipped.

default:

False

chipqc_settings.profile_window
label:

Window size

type:

basic:integer

description:

An integer indicating the width of the window used for peak profiles. Peaks will be centered on their summits and include half of the window size upstream and half downstream of this point.

default:

400

chipqc_settings.shift_size
label:

Shift size

type:

basic:string

description:

Vector of values to try when computing optimal shift sizes. It should be specified as consecutive numbers vector with start:end

default:

1:300

Output results

ML-ready expression

data:ml:table:expressions:upload-ml-expression (basic:file  exp, basic:string  source, basic:string  species, data:ml:space  reference_space)[Source: v1.0.2]

Upload ML-ready expression matrix.

Input arguments

exp
label:

Transformed expressions

type:

basic:file

description:

A tab-separated file containing transformed expression values, with sample IDs as the index (first column, labeled sample_id) and ENSEMBL IDs (recommended but not required) as the column names.

required:

True

disabled:

False

hidden:

False
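
The expected layout of the expression file (first column `sample_id`, remaining columns named by feature ID) can be checked with a short sketch like the one below. `read_expression_table` is a hypothetical helper; the platform's own validation may differ.

```python
import csv
import io
from typing import Dict, List, TextIO, Tuple


def read_expression_table(handle: TextIO) -> Tuple[List[str], Dict[str, Dict[str, float]]]:
    """Parse a tab-separated expression matrix indexed by sample_id."""
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    if not header or header[0] != "sample_id":
        raise ValueError("first column must be labeled 'sample_id'")
    features = header[1:]
    table = {}
    for row in reader:
        if not row:
            continue  # skip blank lines
        if len(row) != len(header):
            raise ValueError(f"row for {row[0]!r} has {len(row)} fields")
        table[row[0]] = dict(zip(features, (float(v) for v in row[1:])))
    return features, table


example = io.StringIO("sample_id\tENSG01\tENSG02\nS1\t1.5\t2.0\nS2\t0.0\t3.25\n")
features, table = read_expression_table(example)
```

The feature IDs `ENSG01` and `ENSG02` are placeholders used only for this example.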

source
label:

Feature source

type:

basic:string

required:

True

disabled:

False

hidden:

False

choices:

  • AFFY: AFFY

  • DICTYBASE: DICTYBASE

  • ENSEMBL: ENSEMBL

  • NCBI: NCBI

  • UCSC: UCSC

species
label:

Species

type:

basic:string

description:

Species Latin name.

required:

True

disabled:

False

hidden:

False

choices:

  • Homo sapiens: Homo sapiens

  • Mus musculus: Mus musculus

  • Rattus norvegicus: Rattus norvegicus

  • Dictyostelium discoideum: Dictyostelium discoideum

reference_space
label:

Reference space of ML-ready data

type:

data:ml:space

required:

True

disabled:

False

hidden:

False

Output results

exp
label:

Transformed expressions

type:

basic:file

required:

True

disabled:

False

hidden:

False

source
label:

Feature source

type:

basic:string

required:

True

disabled:

False

hidden:

False

species
label:

Species

type:

basic:string

required:

True

disabled:

False

hidden:

False

Map microarray probes

data:microarray:mapping:map-microarray-probes (list:data:microarray:normalized  expressions, basic:file  mapping_file, basic:string  source, basic:string  build)[Source: v1.1.1]

Map microarray probes to Gene IDs. Mapping can be done automatically or using a custom mapping file. For automatic probe mapping, all ‘Normalized expression’ objects should have a GEO platform ID. If the platform is supported, the provided probe IDs will be mapped to the corresponding Ensembl IDs. Currently supported platforms are: GPL74, GPL201, GPL96, GPL571, GPL97, GPL570, GPL91, GPL8300, GPL92, GPL93, GPL94, GPL95, GPL17586, GPL5175, GPL80, GPL6244, GPL16686, GPL15207, GPL1352, GPL11068, GPL26966, GPL6848, GPL14550, GPL17077, GPL16981, GPL13497, GPL6947, GPL10558, GPL6883, GPL13376, GPL6884, GPL6254.

Input arguments

expressions
label:

Normalized expressions

type:

list:data:microarray:normalized

required:

True

disabled:

False

hidden:

False

mapping_file
label:

File with probe ID mappings

type:

basic:file

description:

The file should be tab-separated and contain two columns with their column names. The first column should contain Gene IDs and the second one should contain probe names. Supported file extensions are .tab.*, .tsv.*, .txt.*

required:

False

disabled:

False

hidden:

False
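
A custom mapping file in the layout described above (a header row, then Gene ID in the first column and probe name in the second) could be read as sketched here. `load_probe_mapping` is a hypothetical helper, and how the process resolves probes that map to several genes is not covered.

```python
import csv
import io
from typing import Dict, TextIO


def load_probe_mapping(handle: TextIO) -> Dict[str, str]:
    """Return a probe -> gene ID dictionary from a two-column TSV file."""
    reader = csv.reader(handle, delimiter="\t")
    next(reader)  # skip the header row holding the two column names
    mapping = {}
    for row in reader:
        if not row:
            continue  # skip blank lines
        gene_id, probe = row[0], row[1]
        # Assumption: if a probe appears twice, the last occurrence wins.
        mapping[probe] = gene_id
    return mapping


example = io.StringIO("gene_id\tprobe\nENSG01\tprobe_a\nENSG02\tprobe_b\n")
mapping = load_probe_mapping(example)
```

The gene and probe identifiers in the example are placeholders.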

source
label:

Gene ID source

type:

basic:string

description:

Gene ID source used for probe mapping. Required when using a custom file.

required:

False

disabled:

False

hidden:

False

choices:

  • AFFY: AFFY

  • DICTYBASE: DICTYBASE

  • ENSEMBL: ENSEMBL

  • NCBI: NCBI

  • UCSC: UCSC

build
label:

Genome build

type:

basic:string

description:

Genome build of the mapping file. Required when using a custom file.

required:

False

disabled:

False

hidden:

False

Output results

mapped_exp
label:

Mapped expressions

type:

basic:file

required:

True

disabled:

False

hidden:

False

probe_mapping
label:

Probe to transcript mapping used

type:

basic:string

required:

True

disabled:

False

hidden:

False

mapping
label:

Mapping file

type:

basic:file

required:

True

disabled:

False

hidden:

False

platform
label:

Microarray platform type

type:

basic:string

required:

True

disabled:

False

hidden:

False

platform_id
label:

GEO platform ID

type:

basic:string

required:

False

disabled:

False

hidden:

False

Mappability

data:mappability:bcmmappability-bcm (data:index:bowtie  genome, data:annotation:gff3  gff, basic:integer  length)[Source: v3.1.2]

Compute genome mappability. Developed by Bioinformatics Laboratory, Faculty of Computer and Information Science, University of Ljubljana, Slovenia and Shaulsky’s Lab, Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA.

Input arguments

genome
label:

Reference genome

type:

data:index:bowtie

gff
label:

General feature format

type:

data:annotation:gff3

length
label:

Read length

type:

basic:integer

default:

50

Output results

mappability
label:

Mappability

type:

basic:file

Mappability info

data:mappability:bcmupload-mappability (basic:file  src)[Source: v1.2.3]

Upload mappability information.

Input arguments

src
label:

Mappability file

type:

basic:file

description:

Mappability file: two-column, tab-separated.

validate_regex:

\.(tab)(|\.gz|\.bz2|\.tgz|\.tar\.gz|\.tar\.bz2|\.zip|\.rar|\.7z)$
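
To see which file names the `validate_regex` above accepts, the pattern can be tried with Python's `re` module. Using `search` (rather than a full match) is an assumption about how the platform applies the pattern; the file names are made up for illustration.

```python
import re

# The pattern is copied verbatim from the validate_regex field.
MAPPABILITY_NAME = re.compile(
    r"\.(tab)(|\.gz|\.bz2|\.tgz|\.tar\.gz|\.tar\.bz2|\.zip|\.rar|\.7z)$"
)


def is_valid_mappability_name(filename: str) -> bool:
    """Return True if the file name ends in .tab, optionally compressed."""
    return MAPPABILITY_NAME.search(filename) is not None


candidates = ["genome.tab", "genome.tab.gz", "genome.tsv", "genome.tab.txt"]
accepted = [name for name in candidates if is_valid_mappability_name(name)]
```

Only names ending in `.tab`, optionally followed by one of the listed compression suffixes, pass the check.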

Output results

mappability
label:

Uploaded mappability

type:

basic:file

MarkDuplicates

data:alignment:bam:markduplicate:markduplicates (data:alignment:bam  bam, basic:boolean  skip, basic:boolean  remove_duplicates, basic:string  validation_stringency, basic:string  assume_sort_order, basic:integer  java_gc_threads, basic:integer  max_heap_size)[Source: v1.7.0]

Mark duplicate reads in a BAM file and, optionally, remove them. Tool from Picard, wrapped by GATK4. See GATK MarkDuplicates for more information.

Input arguments

bam
label:

Alignment BAM file

type:

data:alignment:bam

required:

True

disabled:

False

hidden:

False

skip
label:

Skip MarkDuplicates step

type:

basic:boolean

description:

MarkDuplicates step can be skipped.

required:

True

disabled:

False

hidden:

False

default:

False

remove_duplicates
label:

Remove duplicates

type:

basic:boolean

description:

If true, do not write duplicates to the output file instead of writing them with appropriate flags set.

required:

True

disabled:

False

hidden:

False

default:

False

validation_stringency
label:

Validation stringency

type:

basic:string

description:

Validation stringency for all SAM files read by this program. Setting stringency to SILENT can improve performance when processing a BAM file in which variable-length data (read, qualities, tags) do not otherwise need to be decoded. Default is STRICT.

required:

True

disabled:

False

hidden:

False

default:

STRICT

choices:

  • STRICT: STRICT

  • LENIENT: LENIENT

  • SILENT: SILENT

assume_sort_order
label:

Assume sort order

type:

basic:string

description:

If not null (default), assume that the input file has this order even if the header says otherwise. Possible values are unsorted, queryname, coordinate, duplicate and unknown.

required:

True

disabled:

False

hidden:

False

default:

choices:

  • as in BAM header (default):

  • unsorted: unsorted

  • queryname: queryname

  • coordinate: coordinate

  • duplicate: duplicate

  • unknown: unknown

advanced.java_gc_threads
label:

Java ParallelGCThreads

type:

basic:integer

description:

Sets the number of threads used during parallel phases of the garbage collectors.

required:

True

disabled:

False

hidden:

False

default:

2

advanced.max_heap_size
label:

Java maximum heap size (Xmx)

type:

basic:integer

description:

Set the maximum Java heap size (in GB).

required:

True

disabled:

False

hidden:

False

default:

12
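
The two advanced settings correspond to standard JVM options (`-XX:ParallelGCThreads` and `-Xmx`). The sketch below builds such an option string; `java_options` is a hypothetical helper, and how the process actually assembles its command line is an assumption.

```python
def java_options(gc_threads: int = 2, max_heap_gb: int = 12) -> str:
    """Build a JVM option string from GC thread count and heap size in GB."""
    if gc_threads < 1 or max_heap_gb < 1:
        raise ValueError("both values must be positive integers")
    return f"-XX:ParallelGCThreads={gc_threads} -Xmx{max_heap_gb}g"


options = java_options()
```

A string of this form is what the `gatk` launcher accepts through its `--java-options` argument.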

Output results

bam
label:

Marked duplicates BAM file

type:

basic:file

required:

True

disabled:

False

hidden:

False

bai
label:

Index of marked duplicates BAM file

type:

basic:file

required:

True

disabled:

False

hidden:

False

stats
label:

Alignment statistics

type:

basic:file

required:

True

disabled:

False

hidden:

False

species
label:

Species

type:

basic:string

required:

True

disabled:

False

hidden:

False

build
label:

Build

type:

basic:string

required:

True

disabled:

False

hidden:

False

metrics_file
label:

Metrics from MarkDuplicate process

type:

basic:file

required:

True

disabled:

False

hidden:

False

Merge Expressions (ETC)

data:expressionset:etcmergeetc (list:data:etc  exps, list:basic:string  genes)[Source: v1.2.4]

Merge Expression Time Course (ETC) data.

Input arguments

exps
label:

Expression Time Course (ETC)

type:

list:data:etc

genes
label:

Filter genes

type:

list:basic:string

required:

False

Output results

expset
label:

Expression set

type:

basic:file

expset_type
label:

Expression set type

type:

basic:string

Merge FASTQ (paired-end)

data:mergereads:paired:merge-fastq-paired (list:data:reads:fastq:paired:  reads)[Source: v2.2.2]

Merge paired-end FASTQs into one sample. Samples are merged based on the defined replicate group relations and then uploaded as separate samples.

Input arguments

reads
label:

Select relations

type:

list:data:reads:fastq:paired:

description:

Define and select replicate relations.

required:

True

disabled:

False

hidden:

False

Output results

Merge FASTQ (single-end)

data:mergereads:single:merge-fastq-single (list:data:reads:fastq:single:  reads)[Source: v2.2.2]

Merge single-end FASTQs into one sample. Samples are merged based on the defined replicate group relations and then uploaded as separate samples.

Input arguments

reads
label:

Select relations

type:

list:data:reads:fastq:single:

description:

Define and select replicate relations.

required:

True

disabled:

False

hidden:

False

Output results

Metadata table

data:metadata:upload-metadata (basic:file  src)[Source: v1.1.1]

Upload metadata file where more than one row can match to a single sample. The uploaded metadata table represents one-to-many (1:n) relation to samples in the working collection. Metadata table must contain a column with one of the following headers: “Sample ID”, “Sample name” or “Sample slug”.
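The header requirement and the one-to-many relation can be checked before upload. The sketch below is only an illustration under assumptions: the function names are hypothetical, the table is read as CSV text, and the order in which the three accepted headers are tried is not specified by the process.

```python
import csv
import io
from collections import Counter

# Accepted sample-identifying headers, as listed in the description above.
SAMPLE_HEADERS = ("Sample ID", "Sample name", "Sample slug")


def find_sample_column(header):
    """Return the first accepted sample column found in the header row."""
    for name in SAMPLE_HEADERS:
        if name in header:
            return name
    raise ValueError(f"Table needs one of these columns: {SAMPLE_HEADERS}")


def rows_per_sample(text, delimiter=","):
    """Count how many table rows refer to each sample."""
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    column = find_sample_column(reader.fieldnames or [])
    return Counter(row[column] for row in reader)


# Two rows match sample S1, so this table is one-to-many (1:n).
table = "Sample name,Timepoint\nS1,0h\nS1,4h\nS2,0h\n"
counts = rows_per_sample(table)
is_one_to_one = all(n == 1 for n in counts.values())
```

A table for which `is_one_to_one` is true would also fit the one-to-one variant of this process.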

Input arguments

src
label:

Table with metadata

type:

basic:file

description:

The metadata table should use one of the following extensions: .csv, .tab, .tsv, .xlsx, .xls

required:

True

disabled:

False

hidden:

False

Output results

table
label:

Uploaded table

type:

basic:file

required:

True

disabled:

False

hidden:

False

n_samples
label:

Number of samples

type:

basic:integer

required:

True

disabled:

False

hidden:

False

Metadata table (one-to-one)

data:metadata:unique:upload-metadata-unique (basic:file  src)[Source: v1.1.1]

Upload metadata file where each row corresponds to a single sample. The uploaded metadata table represents one-to-one (1:1) relation to samples in the working collection. Metadata table must contain a column with one of the following headers: “Sample ID”, “Sample name” or “Sample slug”.

Input arguments

src
label:

Table with metadata

type:

basic:file

description:

The metadata table should use one of the following extensions: .csv, .tab, .tsv, .xlsx, .xls

required:

True

disabled:

False

hidden:

False

Output results

table
label:

Uploaded table

type:

basic:file

required:

True

disabled:

False

hidden:

False

n_samples
label:

Number of samples

type:

basic:integer

required:

True

disabled:

False

hidden:

False

MultiQC

data:multiqc:multiqc (list:data:  data, basic:boolean  dirs, basic:integer  dirs_depth, basic:boolean  fullnames, basic:boolean  config, basic:string  cl_config)[Source: v1.22.0]

Aggregate results from bioinformatics analyses across many samples into a single report. [MultiQC](http://www.multiqc.info) searches a given directory for analysis logs and compiles an HTML report. It’s a general purpose tool, perfect for summarising the output from numerous bioinformatics tools.

Input arguments

data
label:

Input data

type:

list:data:

required:

True

disabled:

False

hidden:

False

advanced.dirs
label:

--dirs

type:

basic:boolean

description:

Prepend directory to sample names.

required:

True

disabled:

False

hidden:

False

default:

True

advanced.dirs_depth
label:

--dirs-depth

type:

basic:integer

description:

Prepend a specified number of directories to sample names. Enter a negative number (default) to take from start of path.

required:

True

disabled:

False

hidden:

False

default:

-1
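The effect of the two directory options on sample names can be sketched as follows. This is an approximation of the described behaviour, not MultiQC's own code: the function name, the ` | ` separator and the absence of sample name cleaning are all assumptions.

```python
from pathlib import PurePosixPath


def prepend_dirs(path: str, dirs_depth: int = -1, sep: str = " | ") -> str:
    """Sketch of sample naming when directories are prepended."""
    p = PurePosixPath(path)
    # Directory components of the path, without the root or "." markers.
    dirs = [d for d in p.parent.parts if d not in ("/", ".")]
    if dirs_depth > 0:
        dirs = dirs[-dirs_depth:]  # keep the last N directories
    elif dirs_depth < 0:
        dirs = dirs[:-dirs_depth]  # keep the first N directories (from start of path)
    return sep.join(dirs + [p.name])


# Default depth of -1 keeps only the first directory of the path.
print(prepend_dirs("run1/sampleA/fastqc.txt"))
# A depth of 1 keeps only the directory closest to the file.
print(prepend_dirs("run1/sampleA/fastqc.txt", dirs_depth=1))
```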

advanced.fullnames
label:

--fullnames

type:

basic:boolean

description:

Disable the sample name cleaning (leave as full file name).

required:

True

disabled:

False

hidden:

False

default:

False

advanced.config
label:

Use configuration file

type:

basic:boolean

description:

Use Genialis configuration file for MultiQC report.

required:

True

disabled:

False

hidden:

False

default:

True

advanced.cl_config
label:

--cl-config

type:

basic:string

description:

Enter text with command-line configuration options to override the defaults (e.g. custom_logo_url: https://www.genialis.com).

required:

False

disabled:

False

hidden:

False

Output results

report
label:

MultiQC report

type:

basic:file:html

required:

True

disabled:

False

hidden:

False

report_data
label:

Report data

type:

basic:dir

required:

True

disabled:

False

hidden:

False

OBO file

data:ontology:oboupload-obo (basic:file  src)[Source: v1.4.0]

Upload gene ontology in OBO format.

Input arguments

src
label:

Gene ontology (OBO)

type:

basic:file

description:

Gene ontology in OBO format.

required:

True

validate_regex:

\.obo(|\.gz|\.bz2|\.tgz|\.tar\.gz|\.tar\.bz2|\.zip|\.rar|\.7z)$
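To see which file names the pattern above accepts, it can be tried out directly. The sketch assumes the pattern is applied as a search against the file name; the helper name and the sample file names are hypothetical.

```python
import re

# Pattern copied from the validate_regex field above.
OBO_PATTERN = re.compile(
    r"\.obo(|\.gz|\.bz2|\.tgz|\.tar\.gz|\.tar\.bz2|\.zip|\.rar|\.7z)$"
)


def is_valid_obo_name(filename: str) -> bool:
    """True if the name ends in .obo, optionally followed by an archive suffix."""
    return OBO_PATTERN.search(filename) is not None


for name in ("go-basic.obo", "go-basic.obo.gz", "go-basic.owl", "obo.txt"):
    print(name, is_valid_obo_name(name))
```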

Output results

obo
label:

Ontology file

type:

basic:file

obo_obj
label:

OBO object

type:

basic:file

PCA

data:pcapca (list:data:expression  exps, list:basic:string  genes, basic:string  source, basic:string  species)[Source: v2.4.2]

Principal component analysis (PCA)

Input arguments

exps
label:

Expressions

type:

list:data:expression

genes
label:

Gene subset

type:

list:basic:string

required:

False

source
label:

Gene ID database of selected genes

type:

basic:string

description:

This field is required if gene subset is set.

required:

False

species
label:

Species

type:

basic:string

description:

Species Latin name. This field is required if gene subset is set.

required:

False

choices:

  • Homo sapiens: Homo sapiens

  • Mus musculus: Mus musculus

  • Rattus norvegicus: Rattus norvegicus

  • Dictyostelium discoideum: Dictyostelium discoideum

  • Odocoileus virginianus texanus: Odocoileus virginianus texanus

  • Solanum tuberosum: Solanum tuberosum

Output results

pca
label:

PCA

type:

basic:json

Picard AlignmentSummary

data:picard:summary:alignment-summary (data:alignment:bam  bam, data:seq:nucleotide  genome, data:seq:nucleotide  adapters, basic:string  validation_stringency, basic:integer  insert_size, basic:string  pair_orientation, basic:boolean  bisulfite, basic:boolean  assume_sorted)[Source: v2.3.0]

Produce a summary of alignment metrics from BAM file. Tool from Picard, wrapped by GATK4. See GATK CollectAlignmentSummaryMetrics for more information.

Input arguments

bam
label:

Alignment BAM file

type:

data:alignment:bam

required:

True

disabled:

False

hidden:

False

genome
label:

Genome

type:

data:seq:nucleotide

required:

True

disabled:

False

hidden:

False

adapters
label:

Adapter sequences

type:

data:seq:nucleotide

required:

False

disabled:

False

hidden:

False

validation_stringency
label:

Validation stringency

type:

basic:string

description:

Validation stringency for all SAM files read by this program. Setting stringency to SILENT can improve performance when processing a BAM file in which variable-length data (read, qualities, tags) do not otherwise need to be decoded. Default is STRICT.

required:

True

disabled:

False

hidden:

False

default:

STRICT

choices:

  • STRICT: STRICT

  • LENIENT: LENIENT

  • SILENT: SILENT

insert_size
label:

Maximum insert size

type:

basic:integer

required:

True

disabled:

False

hidden:

False

default:

100000

pair_orientation
label:

Pair orientation

type:

basic:string

required:

True

disabled:

False

hidden:

False

default:

null

choices:

  • Unspecified: null

  • FR: FR

  • RF: RF

  • TANDEM: TANDEM

bisulfite
label:

BAM file consists of bisulfite sequenced reads

type:

basic:boolean

required:

True

disabled:

False

hidden:

False

default:

False

assume_sorted
label:

Sorted BAM file

type:

basic:boolean

description:

If true, the sort order in the header file will be ignored.

required:

True

disabled:

False

hidden:

False

default:

False

Output results

report
label:

Alignment metrics

type:

basic:file

required:

True

disabled:

False

hidden:

False

species
label:

Species

type:

basic:string

required:

True

disabled:

False

hidden:

False

build
label:

Build

type:

basic:string

required:

True

disabled:

False

hidden:

False

Picard CollectRrbsMetrics

data:picard:rrbs:rrbs-metrics (data:alignment:bam  bam, data:seq:nucleotide  genome, basic:integer  min_quality, basic:integer  next_base_quality, basic:integer  min_lenght, basic:decimal  mismatch_rate, basic:string  validation_stringency, basic:boolean  assume_sorted)[Source: v2.3.0]

Produce metrics for RRBS data based on the methylation status. This tool uses reduced representation bisulfite sequencing (RRBS) data to determine cytosine methylation status across all reads of a genomic DNA sequence. Tool is wrapped by GATK4. See GATK CollectRrbsMetrics for more information.

Input arguments

bam
label:

Alignment BAM file

type:

data:alignment:bam

required:

True

disabled:

False

hidden:

False

genome
label:

Genome

type:

data:seq:nucleotide

required:

True

disabled:

False

hidden:

False

min_quality
label:

Threshold for base quality of a C base before it is considered

type:

basic:integer

required:

True

disabled:

False

hidden:

False

default:

20

next_base_quality
label:

Threshold for quality of a base next to a C before the C base is considered

type:

basic:integer

required:

True

disabled:

False

hidden:

False

default:

10

min_lenght
label:

Minimum read length

type:

basic:integer

required:

True

disabled:

False

hidden:

False

default:

5

mismatch_rate
label:

Maximum fraction of mismatches in a read to be considered (range: 0 to 1)

type:

basic:decimal

required:

True

disabled:

False

hidden:

False

default:

0.1

validation_stringency
label:

Validation stringency

type:

basic:string

description:

Validation stringency for all SAM files read by this program. Setting stringency to SILENT can improve performance when processing a BAM file in which variable-length data (read, qualities, tags) do not otherwise need to be decoded. Default is STRICT.

required:

True

disabled:

False

hidden:

False

default:

STRICT

choices:

  • STRICT: STRICT

  • LENIENT: LENIENT

  • SILENT: SILENT

assume_sorted
label:

Sorted BAM file

type:

basic:boolean

description:

If true, the sort order in the header file will be ignored.

required:

True

disabled:

False

hidden:

False

default:

False

Output results

report
label:

RRBS summary metrics

type:

basic:file

required:

True

disabled:

False

hidden:

False

detailed_report
label:

Detailed RRBS report

type:

basic:file

required:

True

disabled:

False

hidden:

False

plot
label:

QC plots

type:

basic:file

required:

True

disabled:

False

hidden:

False

species
label:

Species

type:

basic:string

required:

True

disabled:

False

hidden:

False

build
label:

Build

type:

basic:string

required:

True

disabled:

False

hidden:

False

Picard InsertSizeMetrics

data:picard:insert:insert-size (data:alignment:bam  bam, data:seq:nucleotide  genome, basic:decimal  minimum_fraction, basic:boolean  include_duplicates, basic:decimal  deviations, basic:string  validation_stringency, basic:boolean  assume_sorted)[Source: v2.3.0]

Collect metrics about the insert size of a paired-end library. Tool from Picard, wrapped by GATK4. See GATK CollectInsertSizeMetrics for more information.

Input arguments

bam
label:

Alignment BAM file

type:

data:alignment:bam

required:

True

disabled:

False

hidden:

False

genome
label:

Genome

type:

data:seq:nucleotide

required:

True

disabled:

False

hidden:

False

minimum_fraction
label:

Minimum fraction of reads in a category to be considered

type:

basic:decimal

description:

When generating the histogram, discard any data categories (out of FR, TANDEM, RF) that have fewer than this fraction of overall reads (range: 0 to 0.5).

required:

True

disabled:

False

hidden:

False

default:

0.05

include_duplicates
label:

Include reads marked as duplicates in the insert size histogram

type:

basic:boolean

required:

True

disabled:

False

hidden:

False

default:

False

deviations
label:

Deviations limit

type:

basic:decimal

description:

Generate mean, standard deviation and plots by trimming the data down to MEDIAN + DEVIATIONS*MEDIAN_ABSOLUTE_DEVIATION. This is done because insert size data typically includes enough anomalous values from chimeras and other artifacts to make the mean and standard deviation grossly misleading regarding the real distribution.

required:

True

disabled:

False

hidden:

False

default:

10.0
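The trimming rule in the description, MEDIAN + DEVIATIONS*MEDIAN_ABSOLUTE_DEVIATION, can be illustrated on a small list of insert sizes. This is a sketch of the formula only, not Picard's implementation; the function name and the sample values are hypothetical.

```python
from statistics import mean, median


def trimmed_insert_sizes(sizes, deviations=10.0):
    """Drop insert sizes above MEDIAN + DEVIATIONS * MEDIAN_ABSOLUTE_DEVIATION."""
    med = median(sizes)
    # Median absolute deviation: median distance of each value from the median.
    mad = median(abs(s - med) for s in sizes)
    cutoff = med + deviations * mad
    return [s for s in sizes if s <= cutoff], cutoff


sizes = [180, 190, 200, 210, 220, 50000]  # one chimeric outlier
kept, cutoff = trimmed_insert_sizes(sizes)
print("mean of all values:", mean(sizes))
print("cutoff:", cutoff, "mean after trimming:", mean(kept))
```

The single outlier dominates the untrimmed mean, which is why the statistics are computed on the trimmed data.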

validation_stringency
label:

Validation stringency

type:

basic:string

description:

Validation stringency for all SAM files read by this program. Setting stringency to SILENT can improve performance when processing a BAM file in which variable-length data (read, qualities, tags) do not otherwise need to be decoded. Default is STRICT.

required:

True

disabled:

False

hidden:

False

default:

STRICT

choices:

  • STRICT: STRICT

  • LENIENT: LENIENT

  • SILENT: SILENT

assume_sorted
label:

Sorted BAM file

type:

basic:boolean

description:

If True, the sort order in the header file will be ignored.

required:

True

disabled:

False

hidden:

False

default:

False

Output results

report
label:

Insert size metrics

type:

basic:file

required:

True

disabled:

False

hidden:

False

plot
label:

Insert size histogram

type:

basic:file

required:

True

disabled:

False

hidden:

False

species
label:

Species

type:

basic:string

required:

True

disabled:

False

hidden:

False

build
label:

Build

type:

basic:string

required:

True

disabled:

False

hidden:

False

Picard WGS Metrics

data:picard:wgsmetrics:wgs-metrics (data:alignment:bam  bam, data:seq:nucleotide  genome, basic:integer  read_length, basic:boolean  create_histogram, basic:integer  min_map_quality, basic:integer  min_quality, basic:integer  coverage_cap, basic:integer  accumulation_cap, basic:boolean  count_unpaired, basic:integer  sample_size, basic:string  validation_stringency)[Source: v2.4.0]

Collect metrics about coverage of whole genome sequencing. Tool from Picard, wrapped by GATK4. See GATK CollectWgsMetrics for more information.

Input arguments

bam
label:

Alignment BAM file

type:

data:alignment:bam

required:

True

disabled:

False

hidden:

False

genome
label:

Genome

type:

data:seq:nucleotide

required:

True

disabled:

False

hidden:

False

read_length
label:

Average read length

type:

basic:integer

required:

True

disabled:

False

hidden:

False

default:

150

create_histogram
label:

Include data for base quality histogram in the metrics file

type:

basic:boolean

required:

True

disabled:

False

hidden:

False

default:

False

options.min_map_quality
label:

Minimum mapping quality for a read to contribute coverage

type:

basic:integer

required:

True

disabled:

False

hidden:

False

default:

20

options.min_quality
label:

Minimum base quality for a base to contribute coverage

type:

basic:integer

description:

N bases will be treated as having a base quality of negative infinity and will therefore be excluded from coverage regardless of the value of this parameter.

required:

True

disabled:

False

hidden:

False

default:

20

options.coverage_cap
label:

Maximum coverage cap

type:

basic:integer

description:

Treat positions with coverage exceeding this value as if they had coverage at this set value.

required:

True

disabled:

False

hidden:

False

default:

250
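The coverage cap can be pictured as clamping per-position depth before averaging. A minimal sketch under that assumption; the function name and the depth values are hypothetical, and the real tool computes many more metrics.

```python
def capped_mean_coverage(depths, coverage_cap=250):
    """Mean coverage where any depth above the cap counts as the cap."""
    if not depths:
        raise ValueError("no positions given")
    return sum(min(d, coverage_cap) for d in depths) / len(depths)


depths = [30, 28, 35, 4000]  # one pile-up position
print("uncapped mean:", sum(depths) / len(depths))
print("capped mean:", capped_mean_coverage(depths))
```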

options.accumulation_cap
label:

Ignore positions with coverage above this value

type:

basic:integer

description:

At positions with coverage exceeding this value, completely ignore reads that accumulate beyond this value.

required:

True

disabled:

False

hidden:

False

default:

100000

options.count_unpaired
label:

Count unpaired reads and paired reads with one end unmapped

type:

basic:boolean

required:

True

disabled:

False

hidden:

False

default:

False

options.sample_size
label:

Sample Size used for Theoretical Het Sensitivity sampling

type:

basic:integer

required:

True

disabled:

False

hidden:

False

default:

10000

options.validation_stringency
label:

Validation stringency

type:

basic:string

description:

Validation stringency for all SAM files read by this program. Setting stringency to SILENT can improve performance when processing a BAM file in which variable-length data (read, qualities, tags) do not otherwise need to be decoded. Default is STRICT.

required:

True

disabled:

False

hidden:

False

default:

STRICT

choices:

  • STRICT: STRICT

  • LENIENT: LENIENT

  • SILENT: SILENT

Output results

report
label:

WGS metrics report

type:

basic:file

required:

True

disabled:

False

hidden:

False

species
label:

Species

type:

basic:string

required:

True

disabled:

False

hidden:

False

build
label:

Build

type:

basic:string

required:

True

disabled:

False

hidden:

False

Pre-peakcall QC

data:prepeakqcqc-prepeak (data:alignment:bam  alignment, basic:integer  q_treshold, basic:integer  n_sub, basic:boolean  tn5, basic:integer  shift)[Source: v0.5.2]

ChIP-Seq and ATAC-Seq QC metrics. Process returns a QC metrics report, fragment length estimation, and a deduplicated tagAlign file. Both fragment length estimation and the tagAlign file can be used as inputs in MACS 2.0. QC report contains ENCODE 3 proposed QC metrics – [NRF, PBC bottlenecking coefficients](https://www.encodeproject.org/data-standards/terms/), [NSC, and RSC](https://genome.ucsc.edu/ENCODE/qualityMetrics.html#chipSeq).
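The library complexity metrics named above follow simple ratios: NRF is the number of distinct read locations divided by the total number of reads, PBC1 is the number of locations with exactly one read divided by the number of distinct locations, and PBC2 is the number of locations with exactly one read divided by the number with exactly two. A minimal sketch, assuming each read is represented by a hypothetical `(chrom, start, strand)` tuple; it is not the code used by the process.

```python
from collections import Counter


def library_complexity(positions):
    """Compute NRF, PBC1 and PBC2 from mapped read positions."""
    per_location = Counter(positions)
    total = sum(per_location.values())
    distinct = len(per_location)
    m1 = sum(1 for n in per_location.values() if n == 1)  # locations with one read
    m2 = sum(1 for n in per_location.values() if n == 2)  # locations with two reads
    nrf = distinct / total
    pbc1 = m1 / distinct
    pbc2 = m1 / m2 if m2 else float("inf")
    return nrf, pbc1, pbc2


reads = (
    [("chr1", 100, "+")]
    + [("chr1", 250, "-")]
    + [("chr2", 40, "+")] * 2
    + [("chr2", 900, "-")] * 4
)
nrf, pbc1, pbc2 = library_complexity(reads)
print(nrf, pbc1, pbc2)
```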

Input arguments

alignment
label:

Aligned reads

type:

data:alignment:bam

q_treshold
label:

Quality filtering threshold

type:

basic:integer

default:

30

n_sub
label:

Number of reads to subsample

type:

basic:integer

default:

15000000

tn5
label:

Tn5 shifting

type:

basic:boolean

description:

Tn5 transposon shifting. Shift reads on “+” strand by 4bp and reads on “-” strand by 5bp.

default:

False
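Tn5 shifting can be sketched on tagAlign-style intervals. The description gives only the shift sizes; the directions used here (start moved forward by 4 on “+”, end moved back by 5 on “-”) follow a common ATAC-seq convention and are an assumption, as is the function name.

```python
def tn5_shift(start: int, end: int, strand: str):
    """Shift an interval towards the assumed Tn5 insertion site."""
    if strand == "+":
        return start + 4, end  # plus-strand reads: move the start by 4 bp
    if strand == "-":
        return start, end - 5  # minus-strand reads: move the end by 5 bp
    raise ValueError(f"unknown strand: {strand!r}")


print(tn5_shift(100, 150, "+"))
print(tn5_shift(100, 150, "-"))
```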

shift
label:

User-defined cross-correlation peak strandshift

type:

basic:integer

description:

If defined, SPP tool will not try to estimate fragment length but will use the given value as fragment length.

required:

False

Output results

chip_qc
label:

QC report

type:

basic:file

tagalign
label:

Filtered tagAlign

type:

basic:file

fraglen
label:

Fragment length

type:

basic:integer

species
label:

Species

type:

basic:string

build
label:

Build

type:

basic:string

Prepare GEO - ChIP-Seq

data:other:geo:chipseqprepare-geo-chipseq (list:data:reads:fastq  reads, list:data:chipseq:callpeak  macs, basic:string  name)[Source: v2.1.3]

Prepare ChIP-seq data for GEO upload.

Input arguments

reads
label:

Reads

type:

list:data:reads:fastq

description:

List of reads objects. Fastq files will be used.

macs
label:

MACS

type:

list:data:chipseq:callpeak

description:

List of MACS2 or MACS14 objects. BedGraph (MACS2) or Wiggle (MACS14) files will be used.

name
label:

Collection name

type:

basic:string

Output results

tarball
label:

GEO folder

type:

basic:file

table
label:

Annotation table

type:

basic:file

Prepare GEO - RNA-Seq

data:other:geo:rnaseqprepare-geo-rnaseq (list:data:reads:fastq  reads, list:data:expression  expressions, basic:string  name)[Source: v0.2.3]

Prepare RNA-Seq data for GEO upload.

Input arguments

reads
label:

Reads

type:

list:data:reads:fastq

description:

List of reads objects. Fastq files will be used.

expressions
label:

Expressions

type:

list:data:expression

description:

Cuffnorm data object. Expression table will be used.

name
label:

Collection name

type:

basic:string

Output results

tarball
label:

GEO folder

type:

basic:file

table
label:

Annotation table

type:

basic:file

QoRTs QC

data:qorts:qc:qorts-qc (data:alignment:bam  alignment, data:annotation:gtf  annotation, basic:string  stranded, data:index:salmon  cdna_index, basic:integer  n_reads, basic:integer  maxPhredScore, basic:integer  adjustPhredScore)[Source: v1.8.0]

QoRTs QC analysis.

Input arguments

alignment
label:

Alignment

type:

data:alignment:bam

required:

True

disabled:

False

hidden:

False

annotation
label:

GTF annotation

type:

data:annotation:gtf

required:

True

disabled:

False

hidden:

False

options.stranded
label:

Assay type

type:

basic:string

required:

True

disabled:

False

hidden:

False

default:

non_specific

choices:

  • Strand non-specific: non_specific

  • Strand-specific forward: forward

  • Strand-specific reverse: reverse

  • Detect automatically: auto

options.cdna_index
label:

cDNA index file

type:

data:index:salmon

required:

False

disabled:

False

hidden:

options.stranded != 'auto'

options.n_reads
label:

Number of reads in subsampled alignment file

type:

basic:integer

required:

True

disabled:

False

hidden:

options.stranded != 'auto'

default:

5000000

options.maxPhredScore
label:

Max Phred Score

type:

basic:integer

required:

False

disabled:

False

hidden:

False

options.adjustPhredScore
label:

Adjust Phred Score

type:

basic:integer

required:

False

disabled:

False

hidden:

False

Output results

plot
label:

QC multiplot

type:

basic:file

required:

False

disabled:

False

hidden:

False

summary
label:

QC summary

type:

basic:file

required:

True

disabled:

False

hidden:

False

qorts_data
label:

QoRTs report data

type:

basic:file

required:

True

disabled:

False

hidden:

False

QuantSeq workflow

data:workflow:quant:featurecounts:workflow-quantseq (basic:string  trimming_tool, data:reads:fastq  reads, data:index:star  genome, list:data:seq:nucleotide  adapters, data:annotation  annotation, basic:string  assay_type, data:index:star  rrna_reference, data:index:star  globin_reference, basic:integer  quality_cutoff, basic:integer  n_reads, basic:integer  seed, basic:decimal  fraction, basic:boolean  two_pass, basic:string  quality_encoding_offset, basic:boolean  ignore_bad_quality)[Source: v5.1.0]

3’ mRNA-Seq pipeline. Reads are preprocessed by __BBDuk__ or __Cutadapt__ which removes adapters, trims reads for quality from the 3’-end, and discards reads that are too short after trimming. Preprocessed reads are aligned by __STAR__ aligner. For read-count quantification, the __FeatureCounts__ tool is used. QoRTs QC and Samtools idxstats tools are used to report alignment QC metrics. QC steps include downsampling, QoRTs QC analysis and alignment of input reads to the rRNA/globin reference sequences. The reported alignment rate is used to assess the rRNA/globin sequence depletion rate.

Input arguments

trimming_tool
label:

Trimming tool

type:

basic:string

description:

Select the trimming tool. If you select BBDuk then please provide adapter sequences in fasta file(s). If you select Cutadapt as a trimming tool, pre-determined adapter sequences will be removed.

required:

True

disabled:

False

hidden:

False

choices:

  • BBDuk: bbduk

  • Cutadapt: cutadapt

reads
label:

Input reads (FASTQ)

type:

data:reads:fastq

description:

Reads in FASTQ file, single or paired end.

required:

True

disabled:

False

hidden:

False

genome
label:

Indexed reference genome

type:

data:index:star

description:

Genome index prepared by STAR aligner indexing tool.

required:

True

disabled:

False

hidden:

False

adapters
label:

Adapters

type:

list:data:seq:nucleotide

description:

Provide a list of sequencing adapters files (.fasta) to be removed by BBDuk.

required:

False

disabled:

False

hidden:

trimming_tool != 'bbduk'

annotation
label:

Annotation

type:

data:annotation

description:

GTF and GFF3 annotation formats are supported.

required:

True

disabled:

False

hidden:

False

assay_type
label:

Assay type

type:

basic:string

description:

In strand-specific forward assay and single reads, the read has to be mapped to the same strand as the feature. For paired-end reads, the first read has to be on the same strand and the second read on the opposite strand. In strand-specific reverse assay these rules are reversed.

required:

False

disabled:

False

hidden:

False

choices:

  • Strand-specific forward: forward

  • Strand-specific reverse: reverse
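The strand rules in the assay type description can be written as one small predicate. This is a sketch of the stated rule only; the function name and arguments are hypothetical and do not reflect how the quantification tool is implemented.

```python
def read_matches_feature(read_strand, feature_strand, assay_type, mate=1):
    """Apply the strand rule for forward and reverse assays.

    mate is 1 for single-end reads and for the first read of a pair,
    and 2 for the second read of a pair.
    """
    if assay_type not in ("forward", "reverse"):
        raise ValueError(f"unsupported assay type: {assay_type!r}")
    if mate not in (1, 2):
        raise ValueError("mate must be 1 or 2")
    # Forward assay: first read on the feature strand, second read opposite.
    expect_same = mate == 1
    # Reverse assay: the expectations are swapped.
    if assay_type == "reverse":
        expect_same = not expect_same
    return (read_strand == feature_strand) == expect_same


print(read_matches_feature("+", "+", "forward"))          # single read, same strand
print(read_matches_feature("-", "+", "forward", mate=2))  # second read, opposite strand
print(read_matches_feature("+", "+", "reverse"))          # same strand in reverse assay
```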

rrna_reference
label:

Indexed rRNA reference sequence

type:

data:index:star

description:

Reference sequence index prepared by STAR aligner indexing tool.

required:

False

disabled:

False

hidden:

False

globin_reference
label:

Indexed Globin reference sequence

type:

data:index:star

description:

Reference sequence index prepared by STAR aligner indexing tool.

required:

False

disabled:

False

hidden:

False

cutadapt.quality_cutoff
label:

Reads quality cutoff

type:

basic:integer

description:

Trim low-quality bases from 3’ end of each read before adapter removal. The use of this option will override the use of NextSeq/NovaSeq-specific trim option.

required:

False

disabled:

False

hidden:

False

downsampling.n_reads
label:

Number of reads

type:

basic:integer

description:

Number of reads to include in subsampling.

required:

True

disabled:

False

hidden:

False

default:

1000000

downsampling.advanced.seed
label:

Random seed

type:

basic:integer

description:

Using the same random seed makes reads subsampling reproducible in different environments.

required:

True

disabled:

False

hidden:

False

default:

11

downsampling.advanced.fraction
label:

Fraction

type:

basic:decimal

description:

Use the fraction of reads [0 - 1.0] from the original input file instead of the absolute number of reads. If set, this will override the ‘Number of reads’ input parameter.

required:

False

disabled:

False

hidden:

False
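How the downsampling inputs interact (a fixed seed for reproducibility, and a fraction that overrides the absolute number of reads) can be sketched as below. The function is hypothetical and works on an in-memory list; the actual process uses a dedicated subsampling tool whose selection algorithm may differ.

```python
import random


def subsample(reads, n_reads=1000000, seed=11, fraction=None):
    """Pick a reproducible random subset of reads."""
    rng = random.Random(seed)  # same seed gives the same subset
    if fraction is not None:
        if not 0 <= fraction <= 1:
            raise ValueError("fraction must be between 0 and 1")
        k = round(len(reads) * fraction)  # fraction overrides n_reads
    else:
        k = min(n_reads, len(reads))
    return rng.sample(reads, k)


reads = list(range(100))  # stand-in for 100 read records
by_count = subsample(reads, n_reads=10)
by_fraction = subsample(reads, n_reads=10, fraction=0.5)
print(len(by_count), len(by_fraction))
```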

downsampling.advanced.two_pass
label:

2-pass mode

type:

basic:boolean

description:

Enable two-pass mode when down-sampling. Two-pass mode is twice as slow but uses much less memory.

required:

True

disabled:

False

hidden:

False

default:

False

preprocessing.quality_encoding_offset
label:

Quality encoding offset

type:

basic:string

description:

Quality encoding offset for input FASTQ files.

required:

True

disabled:

False

hidden:

False

default:

auto

choices:

  • Sanger / Illumina 1.8+: 33

  • Illumina up to 1.3+, 1.5+: 64

  • Auto: auto
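The offset determines how a quality character maps to a Phred score: the score is the character's ASCII code minus the offset. A minimal sketch with a hypothetical helper; automatic detection of the offset is not shown.

```python
def decode_qualities(quality_string, offset=33):
    """Convert a FASTQ quality string to a list of Phred scores."""
    if offset not in (33, 64):
        raise ValueError("offset must be 33 or 64")
    return [ord(char) - offset for char in quality_string]


print(decode_qualities("!I"))             # offset 33 (Sanger / Illumina 1.8+)
print(decode_qualities("h", offset=64))   # offset 64 (older Illumina encodings)
```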

preprocessing.ignore_bad_quality
label:

Ignore bad quality

type:

basic:boolean

description:

Don’t crash if quality values appear to be incorrect.

required:

True

disabled:

False

hidden:

False

default:

False

Output results

Quantify shRNA species using bowtie2

data:expression:shrna2quantshrna-quant (data:alignment:bam  alignment, basic:integer  readlengths, basic:integer  alignscores)[Source: v1.4.0]

Based on `bowtie2` output (.bam file), calculate the number of mapped species. Input is limited to results from `bowtie2` since the `YT:Z:` tag used to fetch aligned species is specific to this process. The result is a count matrix (successfully mapped reads) where species are in rows and columns contain read specifics (count, species name, sequence, `AS:i:` tag value).

Input arguments

alignment
label:

Alignment

type:

data:alignment:bam

required:

True

readlengths
label:

Species lengths threshold

type:

basic:integer

description:

Species with read lengths below specified threshold will be removed from final output. Default is no removal.

alignscores
label:

Align scores filter threshold

type:

basic:integer

description:

Species with align score below specified threshold will be removed from final output. Default is no removal.
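The two optional thresholds act as simple filters on the result rows. The sketch below assumes each row is a dictionary with hypothetical `sequence` and `align_score` keys and that leaving a threshold unset means no removal; it does not reproduce the process's own code.

```python
def filter_species(rows, readlengths=None, alignscores=None):
    """Keep rows that pass the optional length and alignment score thresholds."""
    kept = []
    for row in rows:
        # Remove species whose read length is below the threshold, if one is set.
        if readlengths is not None and len(row["sequence"]) < readlengths:
            continue
        # Remove species whose alignment score is below the threshold, if one is set.
        if alignscores is not None and row["align_score"] < alignscores:
            continue
        kept.append(row)
    return kept


rows = [
    {"species": "sh1", "sequence": "ACGTACGTACGTACGTACGT", "align_score": 0},
    {"species": "sh2", "sequence": "ACGTACGTAC", "align_score": 0},
    {"species": "sh3", "sequence": "ACGTACGTACGTACGTACGT", "align_score": -12},
]
passing = filter_species(rows, readlengths=15, alignscores=-6)
print([row["species"] for row in passing])
```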

Output results

exp
label:

Normalized expression

type:

basic:file

rc
label:

Read counts

type:

basic:file

required:

False

exp_json
label:

Expression (json)

type:

basic:json

exp_type
label:

Expression type

type:

basic:string

source
label:

Gene ID source

type:

basic:string

species
label:

Species

type:

basic:string

build
label:

Build

type:

basic:string

feature_type
label:

Feature type

type:

basic:string

mapped_species
label:

Mapped species

type:

basic:file

RNA-SeQC

data:rnaseqc:qc:rnaseqc-qc (data:alignment:bam  alignment, data:annotation:gtf  annotation, basic:integer  mapping_quality, basic:integer  base_mismatch, basic:integer  offset, basic:integer  window_size, basic:integer  gene_length, basic:integer  detection_threshold, basic:boolean  exclude_chimeric, basic:string  stranded, data:index:salmon  cdna_index, basic:integer  n_reads)[Source: v2.0.0]

RNA-SeQC QC analysis. An efficient new version of RNA-SeQC that computes a comprehensive set of metrics for characterizing samples processed by a wide range of protocols. It also quantifies gene- and exon-level expression, enabling effective quality control of large-scale RNA-seq datasets. More information can be found in the [GitHub repository](https://github.com/getzlab/rnaseqc) and in the [original paper](https://academic.oup.com/bioinformatics/article/37/18/3048/6156810?login=false).

Input arguments

alignment
label:

Input aligned reads (BAM file)

type:

data:alignment:bam

required:

True

disabled:

False

hidden:

False

annotation
label:

Annotation file (GTF)

type:

data:annotation:gtf

description:

The input GTF file containing features to check the bam against. The file should include gene_id in the attributes column for all entries. During the process the file is formatted so the transcript_id matches the gene_id. Exons are merged to remove overlaps and exon_id field is then matched with gene_id including the consecutive exon number.

required:

True

disabled:

False

hidden:

False

rnaseqc_options.mapping_quality
label:

Mapping quality [--mapping-quality]

type:

basic:integer

description:

Set the lower bound on read quality for exon coverage counting. Reads below this number are excluded from coverage metrics.

required:

True

disabled:

False

hidden:

False

default:

255

rnaseqc_options.base_mismatch
label:

Base mismatch [--base-mismatch]

type:

basic:integer

description:

Set the maximum number of allowed mismatches between a read and the reference sequence. Reads with more than this number of mismatches are excluded from coverage metrics.

required:

True

disabled:

False

hidden:

False

default:

6

rnaseqc_options.offset
label:

Offset [--offset]

type:

basic:integer

description:

Set the offset into the gene for the 3’ and 5’ windows in bias calculation. A positive value shifts the 3’ and 5’ windows towards each other, while a negative value shifts them apart.

required:

True

disabled:

False

hidden:

False

default:

150

rnaseqc_options.window_size
label:

Window size [--window-size]

type:

basic:integer

description:

Set the size of the 3’ and 5’ windows in bias calculation.

required:

True

disabled:

False

hidden:

False

default:

100

rnaseqc_options.gene_length
label:

Gene length [--gene-length]

type:

basic:integer

description:

Set the minimum size of a gene for bias calculation. Genes below this size are ignored in the calculation.

required:

True

disabled:

False

hidden:

False

default:

600

rnaseqc_options.detection_threshold
label:

Detection threshold [--detection-threshold]

type:

basic:integer

description:

Number of counts on a gene to consider the gene ‘detected’. Additionally, genes below this limit are excluded from 3’ bias computation.

required:

True

disabled:

False

hidden:

False

default:

5

rnaseqc_options.exclude_chimeric
label:

Exclude chimeric reads [--exclude-chimeric]

type:

basic:boolean

description:

Exclude chimeric reads from the read counts.

required:

True

disabled:

False

hidden:

False

default:

False

strand_detection_options.stranded
label:

Assay type [--stranded]

type:

basic:string

required:

True

disabled:

False

hidden:

False

default:

non_specific

choices:

  • Strand non-specific: non_specific

  • Strand-specific reverse then forward: reverse

  • Strand-specific forward then reverse: forward

  • Detect automatically: auto

strand_detection_options.cdna_index
label:

cDNA index file

type:

data:index:salmon

required:

False

disabled:

False

hidden:

strand_detection_options.stranded != ‘auto’

strand_detection_options.n_reads
label:

Number of reads in subsampled alignment file. Subsampled reads will be used in strandedness detection

type:

basic:integer

required:

True

disabled:

False

hidden:

strand_detection_options.stranded != ‘auto’

default:

5000000

Output results

metrics
label:

metrics

type:

basic:file

required:

True

disabled:

False

hidden:

False

RNA-Seq (Cuffquant)

data:workflow:rnaseq:cuffquantworkflow-rnaseq-cuffquant (data:reads:fastq  reads, data:index:hisat2  genome, data:annotation  annotation)[Source: v2.1.0]

Input arguments

reads
label:

Input reads

type:

data:reads:fastq

genome
label:

genome

type:

data:index:hisat2

annotation
label:

Annotation file

type:

data:annotation

Output results

RNA-seq Variant Calling Workflow

data:workflow:rnaseq:variants:workflow-rnaseq-variantcalling (data:alignment:bam:star  bam, data:reads:fastq  reads, basic:boolean  preprocessing, data:seq:nucleotide  ref_seq, data:index:star  genome, data:variants:vcf  dbsnp, list:data:variants:vcf  indels, data:bed  intervals, data:variants:vcf  clinvar, data:geneset  geneset, list:basic:string  mutations, list:data:seq:nucleotide  adapters, list:basic:string  custom_adapter_sequences, basic:integer  kmer_length, basic:integer  min_k, basic:integer  hamming_distance, basic:integer  maxns, basic:integer  trim_quality, basic:integer  min_length, basic:string  quality_encoding_offset, basic:boolean  ignore_bad_quality, basic:boolean  two_pass_mode, basic:boolean  out_unmapped, basic:string  align_end_alignment, basic:string  read_group, basic:integer  stand_call_conf, basic:boolean  soft_clipped, basic:integer  interval_padding, list:basic:string  filter_expressions, list:basic:string  filter_name, list:basic:string  genotype_filter_expressions, list:basic:string  genotype_filter_name, data:variants:vcf  mask, basic:string  mask_name, basic:string  filtering_options, list:basic:string  vcf_fields, list:basic:string  ann_fields, basic:boolean  split_alleles, basic:boolean  show_filtered, list:basic:string  gf_fields, basic:boolean  multiqc, basic:integer  java_gc_threads, basic:integer  max_heap_size)[Source: v2.4.0]

Identify variants in RNA-seq data. This pipeline follows GATK best practices recommendations for variant calling with RNA-seq data. The pipeline steps include read alignment (STAR), data cleanup (MarkDuplicates), splitting reads that contain Ns in their CIGAR string (SplitNCigarReads), base quality recalibration (BaseRecalibrator, ApplyBQSR), variant calling (HaplotypeCaller), variant filtering (VariantFiltration) and variant annotation (SnpEff). The last step of the pipeline is the Mutations table process, which prepares variants for ReSDK VariantTables. It is also possible to run the pipeline directly from a BAM file. In this case, it is recommended that the STAR alignment was run in two-pass mode with the ‘--outSAMunmapped Within’ option turned on.

Input arguments

bam
label:

Input BAM file

type:

data:alignment:bam:star

description:

Input BAM file that was computed with STAR aligner. If you want to use a BAM file as an input, it is highly recommended that the alignment was run in two-pass mode with the ‘--outSAMunmapped Within’ option.

required:

False

disabled:

reads

hidden:

False

reads
label:

Input sample (FASTQ)

type:

data:reads:fastq

description:

Input data in FASTQ format.

required:

False

disabled:

bam

hidden:

False

preprocessing
label:

Perform reads processing with BBDuk

type:

basic:boolean

description:

If your reads have not been processed, set this to True.

required:

True

disabled:

bam

hidden:

False

default:

True

ref_seq
label:

Reference FASTA sequence

type:

data:seq:nucleotide

required:

True

disabled:

False

hidden:

False

genome
label:

Indexed reference genome

type:

data:index:star

description:

Genome index prepared by STAR aligner indexing tool.

required:

False

disabled:

bam

hidden:

False

dbsnp
label:

dbSNP file

type:

data:variants:vcf

description:

File with known variants.

required:

True

disabled:

False

hidden:

False

indels
label:

Known INDEL sites

type:

list:data:variants:vcf

required:

False

disabled:

False

hidden:

False

intervals
label:

Intervals (from BED file)

type:

data:bed

description:

Use this option to perform the analysis over only part of the genome.

required:

False

disabled:

False

hidden:

False

clinvar
label:

ClinVar VCF file

type:

data:variants:vcf

description:

[ClinVar](https://www.ncbi.nlm.nih.gov/clinvar/) is a freely available, public archive of human genetic variants and interpretations of their significance to disease.

required:

False

disabled:

False

hidden:

False

geneset
label:

Gene set

type:

data:geneset

description:

Select a gene set with genes you are interested in. Only variants of genes in the selected gene set will be in the output.

required:

False

disabled:

mutations

hidden:

False

mutations
label:

Gene and its mutations

type:

list:basic:string

description:

Insert the gene you are interested in, together with its mutations. First enter the name of the gene and then the mutations. Separate the gene from the mutations with ‘:’ and the mutations from each other with ‘,’. Example of an input: ‘KRAS: Gly12, Gly61’. Press Enter after each input (gene + mutations). NOTE: The field only accepts three-character amino acid symbols. If you use this option, the selected gene set will not be used for the Mutations table process.

required:

False

disabled:

geneset

hidden:

False
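
The input format described for the mutations field can be illustrated with a short Python sketch. The function name `parse_gene_mutations` is hypothetical and only shows how one ‘GENE: Mut1, Mut2’ entry could be split; it is not the parser used by the workflow.

```python
from typing import List, Tuple


def parse_gene_mutations(entry: str) -> Tuple[str, List[str]]:
    """Split one 'GENE: Mut1, Mut2' entry into a gene name and its mutations."""
    gene, sep, mutations = entry.partition(":")
    if not sep or not gene.strip():
        raise ValueError(f"Expected 'GENE: mutation, ...', got {entry!r}")
    # Mutations are comma-separated; surrounding whitespace is ignored.
    return gene.strip(), [m.strip() for m in mutations.split(",") if m.strip()]


print(parse_gene_mutations("KRAS: Gly12, Gly61"))
```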

bbduk.adapters
label:

Adapters

type:

list:data:seq:nucleotide

description:

Provide a list of sequencing adapters files (.fasta) to be removed by BBDuk.

required:

False

disabled:

False

hidden:

False

bbduk.custom_adapter_sequences
label:

Custom adapter sequences

type:

list:basic:string

description:

Custom adapter sequences can be specified by inputting them one by one and pressing Enter after each sequence.

required:

False

disabled:

False

hidden:

False

default:

[]

bbduk.kmer_length
label:

K-mer length [k=]

type:

basic:integer

description:

Kmer length used for finding contaminants. Contaminants shorter than kmer length will not be found. Kmer length must be at least 1.

required:

True

disabled:

False

hidden:

False

default:

23

bbduk.min_k
label:

Minimum k-mer length at right end of reads used for trimming [mink=]

type:

basic:integer

required:

True

disabled:

bbduk.adapters.length === 0 && bbduk.custom_adapter_sequences.length === 0

hidden:

False

default:

11

bbduk.hamming_distance
label:

Maximum Hamming distance for k-mers [hammingdistance=]

type:

basic:integer

description:

Hamming distance i.e. the number of mismatches allowed in the kmer.

required:

True

disabled:

False

hidden:

False

default:

1
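
As an illustration of what a Hamming distance threshold means for k-mer matching, the sketch below counts mismatches between two equal-length k-mers. The helper names `hamming_distance` and `kmer_matches` are hypothetical; this is not how BBDuk implements its matching internally.

```python
def hamming_distance(a: str, b: str) -> int:
    """Count positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance is only defined for equal lengths")
    return sum(x != y for x, y in zip(a, b))


def kmer_matches(read_kmer: str, reference_kmer: str, max_distance: int = 1) -> bool:
    """Return True if the k-mers differ in at most max_distance positions."""
    return hamming_distance(read_kmer, reference_kmer) <= max_distance


print(kmer_matches("ACGT", "ACGA"))  # one mismatch, within the default of 1
```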

bbduk.maxns
label:

Max Ns after trimming [maxns=]

type:

basic:integer

description:

If non-negative, reads with more Ns than this (after trimming) will be discarded.

required:

True

disabled:

False

hidden:

False

default:

-1

bbduk.trim_quality
label:

Average quality below which to trim region [trimq=]

type:

basic:integer

description:

Phred algorithm is used, which is more accurate than naive trimming.

required:

True

disabled:

False

hidden:

False

default:

28

bbduk.min_length
label:

Minimum read length [minlength=]

type:

basic:integer

description:

Reads shorter than minimum read length after trimming are discarded.

required:

True

disabled:

False

hidden:

False

default:

30

bbduk.quality_encoding_offset
label:

Quality encoding offset

type:

basic:string

description:

Quality encoding offset for input FASTQ files.

required:

True

disabled:

False

hidden:

False

default:

auto

choices:

  • Sanger / Illumina 1.8+: 33

  • Illumina up to 1.3+, 1.5+: 64

  • Auto: auto
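
The two numeric offsets refer to the ASCII shift used to encode Phred quality scores in FASTQ files. A minimal sketch of the decoding, with a hypothetical helper name `decode_phred`:

```python
from typing import List


def decode_phred(quality: str, offset: int = 33) -> List[int]:
    """Convert a FASTQ quality string into Phred scores for a given offset."""
    return [ord(ch) - offset for ch in quality]


# The same score of 40 is written as 'I' with offset 33 and as 'h' with offset 64.
print(decode_phred("I", 33), decode_phred("h", 64))
```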

bbduk.ignore_bad_quality
label:

Ignore bad quality

type:

basic:boolean

description:

Don’t crash if quality values appear to be incorrect.

required:

True

disabled:

False

hidden:

False

default:

False

alignment.two_pass_mode
label:

Use two pass mode [–twopassMode]

type:

basic:boolean

description:

Use two-pass mapping instead of first-pass only. In two-pass mode we first perform first-pass mapping, extract junctions, insert them into genome index, and re-map all reads in the second mapping pass.

required:

True

disabled:

False

hidden:

False

default:

True

alignment.out_unmapped
label:

Output unmapped reads (SAM) [–outSAMunmapped Within]

type:

basic:boolean

description:

Output of unmapped reads in the SAM format.

required:

True

disabled:

False

hidden:

False

default:

True

alignment.align_end_alignment
label:

Read ends alignment [–alignEndsType]

type:

basic:string

description:

Type of read ends alignment (default: Local). Local: standard local alignment with soft-clipping allowed. EndToEnd: force end-to-end read alignment, do not soft-clip. Extend5pOfRead1: fully extend only the 5’ end of read1, all other ends use local alignment. Extend5pOfReads12: fully extend only the 5’ ends of both read1 and read2, all other ends use local alignment.

required:

True

disabled:

False

hidden:

False

default:

Local

choices:

  • Local: Local

  • EndToEnd: EndToEnd

  • Extend5pOfRead1: Extend5pOfRead1

  • Extend5pOfReads12: Extend5pOfReads12

bam_processing.read_group
label:

Replace read groups in BAM

type:

basic:string

description:

Replace read groups in a BAM file. This argument enables the user to replace all read groups in the INPUT file with a single new read group and assign all reads to this read group in the OUTPUT BAM file. Addition or replacement is performed using Picard’s AddOrReplaceReadGroups tool. Input should take the form of -name=value delimited by a “;”, e.g. “-ID=1;-LB=GENIALIS;-PL=ILLUMINA;-PU=BARCODE;-SM=SAMPLENAME1”. See tool’s documentation for more information on tag names. Note that PL, LB, PU and SM are required fields. See caveats of rewriting read groups in the documentation.

required:

True

disabled:

False

hidden:

False

default:

-ID=1;-LB=GENIALIS;-PL=ILLUMINA;-PU=BARCODE;-SM=SAMPLENAME1
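
The ‘-name=value’ format delimited by ‘;’ can be checked before submission with a sketch like the one below. The function name `parse_read_group` is hypothetical, and the set of required tags is taken from the description above; the process itself may validate the input differently.

```python
from typing import Dict

REQUIRED_TAGS = {"PL", "LB", "PU", "SM"}  # required fields named in the description


def parse_read_group(value: str) -> Dict[str, str]:
    """Parse '-ID=1;-LB=...;...' into a tag-to-value mapping."""
    tags = {}
    for item in value.split(";"):
        item = item.strip()
        if not item:
            continue
        name, sep, tag_value = item.partition("=")
        if not sep or not name.startswith("-"):
            raise ValueError(f"Malformed read group item: {item!r}")
        tags[name.lstrip("-")] = tag_value
    missing = REQUIRED_TAGS - tags.keys()
    if missing:
        raise ValueError(f"Missing required tags: {sorted(missing)}")
    return tags


print(parse_read_group("-ID=1;-LB=GENIALIS;-PL=ILLUMINA;-PU=BARCODE;-SM=SAMPLENAME1"))
```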

haplotype_caller.stand_call_conf
label:

Min call confidence threshold

type:

basic:integer

description:

The minimum phred-scaled confidence threshold at which variants should be called.

required:

True

disabled:

False

hidden:

False

default:

20

haplotype_caller.soft_clipped
label:

Do not analyze soft clipped bases in the reads

type:

basic:boolean

description:

Suitable option for RNA-seq variant calling.

required:

True

disabled:

False

hidden:

False

default:

True

haplotype_caller.interval_padding
label:

Interval padding

type:

basic:integer

description:

Amount of padding (in bp) to add to each interval you are including. The recommended value is 100. Set to 0 if you want to turn it off.

required:

True

disabled:

False

hidden:

!intervals

default:

100
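
To make the effect of interval padding concrete, the sketch below widens a BED-style interval (0-based start, exclusive end) by a fixed number of base pairs on each side. The helper `pad_interval` is hypothetical; it clamps the start at 0 but, unlike a real tool, knows nothing about chromosome lengths.

```python
from typing import Tuple


def pad_interval(start: int, end: int, padding: int = 100) -> Tuple[int, int]:
    """Extend an interval by `padding` bp on both sides, never below position 0."""
    if padding < 0:
        raise ValueError("padding must be non-negative")
    return max(0, start - padding), end + padding


print(pad_interval(1000, 1500))      # padded by the recommended 100 bp
print(pad_interval(1000, 1500, 0))   # padding turned off
```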

variant_filtration.filter_expressions
label:

Expressions used with INFO fields to filter

type:

list:basic:string

description:

VariantFiltration accepts any number of JEXL expressions (so you can have two named filters by using --filter-name One --filter-expression ‘X < 1’ --filter-name Two --filter-expression ‘X > 2’). It is preferable to use multiple expressions, each specifying an individual filter criterion, rather than a single compound expression that specifies multiple filter criteria. Input expressions one by one and press ENTER after each expression. Examples of filter expressions: ‘FS > 30’, ‘DP > 10’.

required:

True

disabled:

False

hidden:

False

default:

['FS > 30.0', 'QD < 2.0']

variant_filtration.filter_name
label:

Names to use for the list of filters

type:

list:basic:string

description:

This name is put in the FILTER field for variants that get filtered. Note that there must be a 1-to-1 mapping between filter expressions and filter names. Input names one by one and press ENTER after each name. Warning: filter names should be in the same order as filter expressions. Example: if you specified filter expressions ‘FS > 30’ and ‘DP > 10’, specify filter names ‘FS’ and ‘DP’.

required:

True

disabled:

False

hidden:

False

default:

['FS', 'QD']
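
The 1-to-1, order-preserving mapping between filter expressions and filter names can be sketched as follows. The function `build_filter_args` is hypothetical and only shows how the two lists pair up into the --filter-name / --filter-expression arguments quoted in the description; it is not the workflow's own code.

```python
from typing import List


def build_filter_args(expressions: List[str], names: List[str]) -> List[str]:
    """Pair each filter name with the expression at the same position."""
    if len(expressions) != len(names):
        raise ValueError("Each filter expression needs exactly one filter name")
    args = []
    for name, expression in zip(names, expressions):
        args += ["--filter-name", name, "--filter-expression", expression]
    return args


print(build_filter_args(["FS > 30.0", "QD < 2.0"], ["FS", "QD"]))
```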

variant_filtration.genotype_filter_expressions
label:

Expressions used with FORMAT field to filter

type:

list:basic:string

description:

Similar to the INFO field based expressions, but used on the FORMAT (genotype) fields instead. VariantFiltration will add the sample-level FT tag to the FORMAT field of filtered samples (this does not affect the record’s FILTER tag). One can filter normally based on most fields (e.g. ‘GQ < 5.0’), but the GT (genotype) field is an exception. We have put in convenience methods so that one can now filter out hets (‘isHet == 1’), refs (‘isHomRef == 1’), or homs (‘isHomVar == 1’). Also available are expressions isCalled, isNoCall, isMixed, and isAvailable, in accordance with the methods of the Genotype object. To filter by alternative allele depth, use the expression: ‘AD.1 < 5’. This filter expression will filter all the samples in the multi-sample VCF file.

required:

True

disabled:

False

hidden:

False

default:

['AD.1 < 5.0']

variant_filtration.genotype_filter_name
label:

Names to use for the list of genotype filters

type:

list:basic:string

description:

Similar to the INFO field based expressions, but used on the FORMAT (genotype) fields instead. Warning: filter names should be in the same order as filter expressions.

required:

True

disabled:

False

hidden:

False

default:

['AD']

variant_filtration.mask
label:

Input mask

type:

data:variants:vcf

description:

Any variant which overlaps entries from the provided mask file will be filtered.

required:

False

disabled:

False

hidden:

False

variant_filtration.mask_name
label:

The text to put in the FILTER field if a ‘mask’ is provided

type:

basic:string

description:

When using the mask file, the mask name will be annotated in the variant record.

required:

False

disabled:

!variant_filtration.mask

hidden:

False

snpeff.filtering_options
label:

SnpEff filtering expressions

type:

basic:string

description:

Filter annotated VCF file using arbitrary expressions. Examples of filtering expressions: ‘(ANN[*].GENE = ‘PSD3’)’ or ‘( REF = ‘A’ )’ or ‘(countHom() > 3) | (( exists INDEL ) & (QUAL >= 20)) | (QUAL >= 30 )’. For more information, check out the official documentation of [SnpSift](https://pcingola.github.io/SnpEff/ss_filter/).

required:

False

disabled:

False

hidden:

False

mutations_table.vcf_fields
label:

Select VCF fields

type:

list:basic:string

description:

The name of a standard VCF field or an INFO field to include in the output table. The field can be any standard VCF column (e.g. CHROM, ID, QUAL) or any annotation name in the INFO field (e.g. AC, AF). Required fields are CHROM, POS, ID, REF and ANN. If your variants file was annotated with ClinVar information, the fields CLNDN, CLNSIG and CLNSIGCONF may be of interest.

required:

True

disabled:

False

hidden:

False

default:

['CHROM', 'POS', 'ID', 'QUAL', 'REF', 'ALT', 'FILTER', 'ANN', 'CLNDN', 'CLNSIG']

mutations_table.ann_fields
label:

ANN fields to use

type:

list:basic:string

description:

Only use specific fields from the SnpEff ANN field. All available fields: Allele | Annotation | Annotation_Impact | Gene_Name | Gene_ID | Feature_Type | Feature_ID | Transcript_BioType | Rank | HGVS.c | HGVS.p | cDNA.pos / cDNA.length | CDS.pos / CDS.length | AA.pos / AA.length | Distance | ERRORS / WARNINGS / INFO. Fields are separated by ‘|’. For more information, follow this [link](https://pcingola.github.io/SnpEff/se_inputoutput/#ann-field-vcf-output-files).

required:

True

disabled:

False

hidden:

False

default:

['Allele', 'Annotation', 'Annotation_Impact', 'Gene_Name', 'Feature_ID', 'HGVS.p']
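
Selecting sub-fields from a ‘|’-separated ANN entry can be sketched as below. The field order follows the list in the description; the function name `select_ann_fields` and the example annotation string are made up for illustration and do not come from real data.

```python
from typing import Dict, List

# Field names in the order listed in the description above.
ANN_FIELDS = [
    "Allele", "Annotation", "Annotation_Impact", "Gene_Name", "Gene_ID",
    "Feature_Type", "Feature_ID", "Transcript_BioType", "Rank", "HGVS.c",
    "HGVS.p", "cDNA.pos / cDNA.length", "CDS.pos / CDS.length",
    "AA.pos / AA.length", "Distance", "ERRORS / WARNINGS / INFO",
]


def select_ann_fields(ann_entry: str, wanted: List[str]) -> Dict[str, str]:
    """Split one ANN entry on '|' and keep only the requested fields."""
    values = dict(zip(ANN_FIELDS, ann_entry.split("|")))
    return {name: values[name] for name in wanted}


# Hypothetical ANN entry used only to show the splitting.
example = ("A|missense_variant|MODERATE|KRAS|GENE_ID|transcript|FEATURE_ID|"
           "protein_coding|2/6|c.35G>T|p.Gly12Val|225/1119|35/570|12/189||")
print(select_ann_fields(example, ["Allele", "Gene_Name", "HGVS.p"]))
```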

mutations_table.split_alleles
label:

Split multi-allelic records into multiple lines

type:

basic:boolean

description:

By default, a variant record with multiple ALT alleles will be summarized in one line, with per alt-allele fields (e.g. allele depth) separated by commas. This may cause difficulty when the table is loaded by an R script, for example. Use this flag to write multi-allelic records on separate lines of output.

required:

True

disabled:

False

hidden:

False

default:

True

mutations_table.show_filtered
label:

Include filtered records in the output

type:

basic:boolean

description:

Include filtered records in the output of the GATK VariantsToTable.

required:

True

disabled:

False

hidden:

False

default:

True

mutations_table.gf_fields
label:

Include FORMAT/sample-level fields. Note: If you specify DP from genotype field, it will overwrite the original DP field. By default fields GT (genotype), AD (allele depth), DP (depth at the sample level), FT (sample-level filter) are included in the analysis.

type:

list:basic:string

required:

True

disabled:

False

hidden:

False

default:

['GT', 'AD', 'DP', 'FT']

advanced.multiqc
label:

Trigger MultiQC

type:

basic:boolean

description:

If the input for the pipeline is a BAM file that has been computed by the RNA-seq gene expression pipeline, then a MultiQC object already exists for this sample, so there is no need for an additional MultiQC process. If the input for this pipeline is FASTQ, then MultiQC cannot be disabled.

required:

True

disabled:

False

hidden:

!bam

default:

False

advanced.java_gc_threads
label:

Java ParallelGCThreads

type:

basic:integer

description:

Sets the number of threads used during parallel phases of the garbage collectors.

required:

True

disabled:

False

hidden:

False

default:

2

advanced.max_heap_size
label:

Java maximum heap size (Xmx)

type:

basic:integer

description:

Set the maximum Java heap size (in GB).

required:

True

disabled:

False

hidden:

False

default:

12

Output results

RNA-seq variant calling preprocess

data:alignment:bam:rnaseqvc:rnaseq-vc-preprocess (data:alignment:bam  bam, data:seq:nucleotide  ref_seq, list:data:variants:vcf  known_sites, basic:string  read_group, basic:integer  java_gc_threads, basic:integer  max_heap_size)[Source: v1.3.0]

Prepare BAM file from STAR aligner for HaplotypeCaller. This process includes steps MarkDuplicates, SplitNCigarReads, read-group assignment and base quality recalibration (BQSR).

Input arguments

bam
label:

Alignment BAM file from STAR alignment

type:

data:alignment:bam

required:

True

disabled:

False

hidden:

False

ref_seq
label:

Reference sequence FASTA file

type:

data:seq:nucleotide

required:

True

disabled:

False

hidden:

False

known_sites
label:

List of known sites of variation

type:

list:data:variants:vcf

description:

One or more databases of known polymorphic sites used to exclude regions around known polymorphisms from analysis.

required:

True

disabled:

False

hidden:

False

read_group
label:

Replace read groups in BAM

type:

basic:string

description:

Replace read groups in a BAM file. This argument enables the user to replace all read groups in the INPUT file with a single new read group and assign all reads to this read group in the OUTPUT BAM file. Addition or replacement is performed using GATK AddOrReplaceReadGroups tool. Input should take the form of -name=value delimited by a “;”, e.g. “-ID=1;-LB=GENIALIS;-PL=ILLUMINA;-PU=BARCODE;-SM=SAMPLENAME1”. See tool’s documentation for more information on tag names. Note that PL, LB, PU and SM are required fields. See caveats of rewriting read groups in the documentation.

required:

True

disabled:

False

hidden:

False

default:

-ID=1;-LB=GENIALIS;-PL=ILLUMINA;-PU=BARCODE;-SM=SAMPLENAME1

advanced.java_gc_threads
label:

Java ParallelGCThreads

type:

basic:integer

description:

Sets the number of threads used during parallel phases of the garbage collectors.

required:

True

disabled:

False

hidden:

False

default:

2

advanced.max_heap_size
label:

Java maximum heap size (Xmx)

type:

basic:integer

description:

Set the maximum Java heap size (in GB).

required:

True

disabled:

False

hidden:

False

default:

12

Output results

bam
label:

Preprocessed BAM file

type:

basic:file

required:

True

disabled:

False

hidden:

False

bai
label:

Index of BAM file

type:

basic:file

required:

True

disabled:

False

hidden:

False

stats
label:

Alignment statistics

type:

basic:file

required:

True

disabled:

False

hidden:

False

species
label:

Species

type:

basic:string

required:

True

disabled:

False

hidden:

False

build
label:

Build

type:

basic:string

required:

True

disabled:

False

hidden:

False

metrics_file
label:

Metrics from MarkDuplicates process

type:

basic:file

required:

True

disabled:

False

hidden:

False

ROSE2

data:chipseq:rose2:rose2 (data:chipseq:callpeak  input_macs, data:bed  input_upload, basic:boolean  use_filtered_bam, data:alignment:bam  rankby, data:alignment:bam  control, basic:integer  tss, basic:integer  stitch, data:bed  mask)[Source: v5.2.1]

Run ROSE2. Rank Ordering of Super-Enhancers algorithm (ROSE2) takes the acetylation peaks called by a peak caller (MACS, MACS2…) and based on the in-between distances and the acetylation signal at the peaks judges whether they can be considered super-enhancers. The ranked values are plotted and by locating the inflection point in the resulting graph, super-enhancers are assigned. See [here](http://younglab.wi.mit.edu/super_enhancer_code.html) for more information.

Input arguments

input_macs
label:

BED/narrowPeak file (MACS results)

type:

data:chipseq:callpeak

required:

False

disabled:

False

hidden:

input_upload

input_upload
label:

BED file (Upload)

type:

data:bed

required:

False

disabled:

False

hidden:

input_macs || use_filtered_bam

use_filtered_bam
label:

Use Filtered BAM File

type:

basic:boolean

description:

Use filtered BAM file from a MACS2 object to rank enhancers by. Only applicable if input is MACS2.

required:

True

disabled:

False

hidden:

input_upload

default:

False

rankby
label:

BAM file

type:

data:alignment:bam

description:

BAM file to rank enhancers by.

required:

False

disabled:

False

hidden:

use_filtered_bam

control
label:

Control BAM File

type:

data:alignment:bam

description:

BAM file to rank enhancers by.

required:

False

disabled:

False

hidden:

use_filtered_bam

tss
label:

TSS exclusion

type:

basic:integer

description:

Enter a distance from TSS to exclude. 0 = no TSS exclusion.

required:

True

disabled:

False

hidden:

False

default:

0

stitch
label:

Stitch

type:

basic:integer

description:

Enter a max linking distance for stitching. If not given, optimal stitching parameter will be determined automatically.

required:

False

disabled:

False

hidden:

False

mask
label:

Masking BED file

type:

data:bed

description:

Mask a set of regions from analysis. Provide a BED of masking regions.

required:

False

disabled:

False

hidden:

False

Output results

all_enhancers
label:

All enhancers table

type:

basic:file

required:

True

disabled:

False

hidden:

False

enhancers_with_super
label:

Super enhancers table

type:

basic:file

required:

True

disabled:

False

hidden:

False

plot_points
label:

Plot points

type:

basic:file

required:

True

disabled:

False

hidden:

False

plot_panel
label:

Plot panel

type:

basic:file

required:

True

disabled:

False

hidden:

False

enhancer_gene
label:

Enhancer to gene

type:

basic:file

required:

True

disabled:

False

hidden:

False

enhancer_top_gene
label:

Enhancer to top gene

type:

basic:file

required:

True

disabled:

False

hidden:

False

gene_enhancer
label:

Gene to Enhancer

type:

basic:file

required:

True

disabled:

False

hidden:

False

stitch_parameter
label:

Stitch parameter

type:

basic:file

required:

False

disabled:

False

hidden:

False

all_output
label:

All output

type:

basic:file

required:

True

disabled:

False

hidden:

False

scatter_plot
label:

Super-Enhancer plot

type:

basic:json

required:

True

disabled:

False

hidden:

False

species
label:

Species

type:

basic:string

required:

True

disabled:

False

hidden:

False

build
label:

Build

type:

basic:string

required:

True

disabled:

False

hidden:

False

Reads (QSEQ multiplexed, paired)

data:multiplexed:qseq:pairedupload-multiplexed-paired (basic:file  reads, basic:file  reads2, basic:file  barcodes, basic:file  annotation)[Source: v1.4.1]

Upload multiplexed NGS reads in QSEQ format.

Input arguments

reads
label:

Multiplexed upstream reads

type:

basic:file

description:

NGS reads in QSeq format. Supported extensions: .qseq.txt.bz2 (preferred), .qseq.* or .qseq.txt.*.

required:

True

validate_regex:

((\.qseq|\.qseq\.txt)(\.gz|\.bz2|\.tgz|\.tar\.gz|\.tar\.bz2|\.zip|\.rar|\.7z))|(\.bz2)$
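
The behaviour of this validation pattern can be explored with Python's `re` module. The sketch assumes the pattern is applied with search semantics (a match anywhere in the file name); how the platform actually applies `validate_regex` is not stated here. The file names are hypothetical.

```python
import re

# Pattern copied from the validate_regex value above.
QSEQ_PATTERN = re.compile(
    r"((\.qseq|\.qseq\.txt)(\.gz|\.bz2|\.tgz|\.tar\.gz|\.tar\.bz2|\.zip|\.rar|\.7z))|(\.bz2)$"
)


def is_accepted(filename: str) -> bool:
    """Return True if the file name matches the QSEQ upload pattern."""
    return QSEQ_PATTERN.search(filename) is not None


for name in ["reads.qseq.txt.bz2", "reads.qseq.gz", "reads.qseq.txt", "reads.fastq.gz"]:
    print(name, is_accepted(name))
```

Under this assumption, a compressed QSEQ file passes, while an uncompressed `.qseq.txt` file or a FASTQ file does not.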

reads2
label:

Multiplexed downstream reads

type:

basic:file

description:

NGS reads in QSeq format. Supported extensions: .qseq.txt.bz2 (preferred), .qseq.* or .qseq.txt.*.

required:

True

validate_regex:

((\.qseq|\.qseq\.txt)(\.gz|\.bz2|\.tgz|\.tar\.gz|\.tar\.bz2|\.zip|\.rar|\.7z))|(\.bz2)$

barcodes
label:

NGS barcodes

type:

basic:file

description:

Barcodes in QSeq format. Supported extensions: .qseq.txt.bz2 (preferred), .qseq.* or .qseq.txt.*.

required:

True

validate_regex:

((\.qseq|\.qseq\.txt)(\.gz|\.bz2|\.tgz|\.tar\.gz|\.tar\.bz2|\.zip|\.rar|\.7z))|(\.bz2)$

annotation
label:

Barcode mapping

type:

basic:file

description:

A tsv file mapping barcodes to experiment name, e.g. “TCGCAGG\tHr00”.

required:

True

validate_regex:

(\.tsv)$
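
A barcode mapping file of the form shown in the description (barcode, tab, experiment name) can be read with a few lines of Python. The function name `read_barcode_mapping` is hypothetical, and the second row in the example is invented for illustration.

```python
import csv
import io
from typing import Dict, TextIO


def read_barcode_mapping(handle: TextIO) -> Dict[str, str]:
    """Read tab-separated 'barcode<TAB>experiment name' rows into a dict."""
    mapping = {}
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) < 2:
            continue  # skip empty or incomplete lines
        mapping[row[0].strip()] = row[1].strip()
    return mapping


# "TCGCAGG\tHr00" comes from the description; the second row is made up.
example = io.StringIO("TCGCAGG\tHr00\nACGTTGA\tHr02\n")
print(read_barcode_mapping(example))
```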

Output results

qseq_reads
label:

Multiplexed upstream reads

type:

basic:file

qseq_reads2
label:

Multiplexed downstream reads

type:

basic:file

qseq_barcodes
label:

NGS barcodes

type:

basic:file

annotation
label:

Barcode mapping

type:

basic:file

matched
label:

Matched

type:

basic:string

notmatched
label:

Not matched

type:

basic:string

badquality
label:

Bad quality

type:

basic:string

skipped
label:

Skipped

type:

basic:string

Reads (QSEQ multiplexed, single)

data:multiplexed:qseq:singleupload-multiplexed-single (basic:file  reads, basic:file  barcodes, basic:file  annotation)[Source: v1.4.1]

Upload multiplexed NGS reads in QSEQ format.

Input arguments

reads
label:

Multiplexed NGS reads

type:

basic:file

description:

NGS reads in QSeq format. Supported extensions: .qseq.txt.bz2 (preferred), .qseq.* or .qseq.txt.*.

required:

True

validate_regex:

(\.(qseq)(|\.txt)(|\.gz|\.bz2|\.tgz|\.tar\.gz|\.tar\.bz2|\.zip|\.rar|\.7z))|(\.bz2)$

barcodes
label:

NGS barcodes

type:

basic:file

description:

Barcodes in QSeq format. Supported extensions: .qseq.txt.bz2 (preferred), .qseq.* or .qseq.txt.*.

required:

True

validate_regex:

(\.(qseq)(|\.txt)(|\.gz|\.bz2|\.tgz|\.tar\.gz|\.tar\.bz2|\.zip|\.rar|\.7z))|(\.bz2)$

annotation
label:

Barcode mapping

type:

basic:file

description:

A tsv file mapping barcodes to experiment name, e.g. “TCGCAGG\tHr00”.

required:

True

validate_regex:

(\.tsv)$

Output results

qseq_reads
label:

Multiplexed NGS reads

type:

basic:file

qseq_barcodes
label:

NGS barcodes

type:

basic:file

annotation
label:

Barcode mapping

type:

basic:file

matched
label:

Matched

type:

basic:string

notmatched
label:

Not matched

type:

basic:string

badquality
label:

Bad quality

type:

basic:string

skipped
label:

Skipped

type:

basic:string

Reads (scRNA 10x)

data:screads:10x:upload-sc-10x (list:basic:file  barcodes, list:basic:file  reads)[Source: v1.4.1]

Import 10x scRNA reads in FASTQ format.

Input arguments

barcodes
label:

Barcodes (.fastq.gz)

type:

list:basic:file

required:

True

disabled:

False

hidden:

False

reads
label:

Reads (.fastq.gz)

type:

list:basic:file

required:

True

disabled:

False

hidden:

False

Output results

barcodes
label:

Barcodes

type:

list:basic:file

required:

True

disabled:

False

hidden:

False

reads
label:

Reads

type:

list:basic:file

required:

True

disabled:

False

hidden:

False

fastqc_url_barcodes
label:

Quality control with FastQC (Barcodes)

type:

list:basic:file:html

required:

True

disabled:

False

hidden:

False

fastqc_url_reads
label:

Quality control with FastQC (Reads)

type:

list:basic:file:html

required:

True

disabled:

False

hidden:

False

Reverse complement FASTQ (paired-end)

data:reads:fastq:paired:seqtk:seqtk-rev-complement-paired (data:reads:fastq:paired  reads, basic:string  select_mate)[Source: v1.2.2]

Reverse complement paired-end FASTQ reads file using Seqtk.
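
For readers unfamiliar with the operation, the sketch below shows what reverse complementing a FASTQ record means: the sequence is complemented and reversed, and the quality string is reversed so that each score stays with its base. This is a plain Python illustration with hypothetical function names, not the Seqtk implementation.

```python
from typing import Tuple

_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(sequence: str) -> str:
    """Complement every base and reverse the resulting sequence."""
    return sequence.translate(_COMPLEMENT)[::-1]


def reverse_complement_record(sequence: str, quality: str) -> Tuple[str, str]:
    """Reverse complement a read and reverse its per-base qualities."""
    return reverse_complement(sequence), quality[::-1]


print(reverse_complement_record("AAC", "I5!"))
```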

Input arguments

reads
label:

Reads

type:

data:reads:fastq:paired

required:

True

disabled:

False

hidden:

False

select_mate
label:

Select mate

type:

basic:string

description:

Select which mate should be reverse complemented.

required:

True

disabled:

False

hidden:

False

default:

Mate 1

choices:

  • Mate 1: Mate 1

  • Mate 2: Mate 2

  • Both: Both

Output results

fastq
label:

Reverse complemented FASTQ file

type:

list:basic:file

required:

True

disabled:

False

hidden:

False

fastq2
label:

Remaining mate

type:

list:basic:file

required:

True

disabled:

False

hidden:

False

fastqc_url
label:

Quality control with FastQC (Mate 1)

type:

list:basic:file:html

required:

True

disabled:

False

hidden:

False

fastqc_archive
label:

Download FastQC archive (Mate 1)

type:

list:basic:file

required:

True

disabled:

False

hidden:

False

fastqc_url2
label:

Quality control with FastQC (Mate 2)

type:

list:basic:file:html

required:

True

disabled:

False

hidden:

False

fastqc_archive2
label:

Download FastQC archive (Mate 2)

type:

list:basic:file

required:

True

disabled:

False

hidden:

False

Reverse complement FASTQ (single-end)

data:reads:fastq:single:seqtk:seqtk-rev-complement-single (data:reads:fastq:single  reads)[Source: v1.3.2]

Reverse complement single-end FASTQ reads file using Seqtk.

Input arguments

reads
label:

Reads

type:

data:reads:fastq:single

required:

True

disabled:

False

hidden:

False

Output results

fastq
label:

Reverse complemented FASTQ file

type:

list:basic:file

required:

True

disabled:

False

hidden:

False

fastqc_url
label:

Quality control with FastQC

type:

list:basic:file:html

required:

True

disabled:

False

hidden:

False

fastqc_archive
label:

Download FastQC archive

type:

list:basic:file

required:

True

disabled:

False

hidden:

False

SAM header

data:sam:headerupload-header-sam (basic:file  src)[Source: v1.2.3]

Upload a mapping file header in SAM format.

Input arguments

src
label:

Header (SAM)

type:

basic:file

description:

A mapping file header in SAM format.

validate_regex:

\.(sam)$

Output results

sam
label:

Uploaded file

type:

basic:file

SRA data

data:sra:import-sra (list:basic:string  sra_accession, basic:boolean  prefetch, basic:string  max_size_prefetch, basic:integer  min_spot_id, basic:integer  max_spot_id, basic:integer  min_read_len, basic:boolean  clip, basic:boolean  aligned, basic:boolean  unaligned)[Source: v1.5.1]

Import reads from SRA. Import single or paired-end reads from Sequence Read Archive (SRA) via an SRA accession number. SRA stores raw sequencing data and alignment information from high-throughput sequencing platforms.

Input arguments

sra_accession
label:

SRA accession(s)

type:

list:basic:string

required:

True

disabled:

False

hidden:

False

advanced.prefetch
label:

Prefetch SRA file

type:

basic:boolean

required:

True

disabled:

False

hidden:

False

default:

True

advanced.max_size_prefetch
label:

Maximum file size to download in KB

type:

basic:string

description:

A unit prefix can be used instead of a value in KB (e.g. 1024M or 1G).

required:

True

disabled:

False

hidden:

False

default:

20G
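To make the size notation concrete, here is a small sketch that converts such a string to kilobytes. The function name is hypothetical, and the binary multipliers (1M = 1024 KB) are an assumption suggested by the equivalence of the examples 1024M and 1G; the SRA Toolkit's own parsing may differ.

```python
# Assumed multipliers, expressed in KB.
_KB_PER_UNIT = {"K": 1, "M": 1024, "G": 1024 ** 2, "T": 1024 ** 3}


def size_to_kb(value):
    """Convert a size such as '20G', '1024M' or '500' to kilobytes."""
    value = value.strip().upper()
    if value and value[-1] in _KB_PER_UNIT:
        return int(value[:-1]) * _KB_PER_UNIT[value[-1]]
    return int(value)  # a bare number is already in KB
```

With these assumptions the default `20G` corresponds to 20971520 KB.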

advanced.min_spot_id
label:

Minimum spot ID

type:

basic:integer

required:

False

disabled:

False

hidden:

False

advanced.max_spot_id
label:

Maximum spot ID

type:

basic:integer

required:

False

disabled:

False

hidden:

False

advanced.min_read_len
label:

Minimum read length

type:

basic:integer

required:

False

disabled:

False

hidden:

False

advanced.clip
label:

Clip adapter sequences

type:

basic:boolean

required:

True

disabled:

False

hidden:

False

default:

False

advanced.aligned
label:

Dump only aligned sequences

type:

basic:boolean

required:

True

disabled:

False

hidden:

False

default:

False

advanced.unaligned
label:

Dump only unaligned sequences

type:

basic:boolean

required:

True

disabled:

False

hidden:

False

default:

False

Output results

SRA data (paired-end)

data:reads:fastq:paired:import-sra-paired (list:basic:string  sra_accession, basic:boolean  prefetch, basic:string  max_size_prefetch, basic:integer  min_spot_id, basic:integer  max_spot_id, basic:integer  min_read_len, basic:boolean  clip, basic:boolean  aligned, basic:boolean  unaligned)[Source: v1.6.1]

Import paired-end reads from SRA. Import paired-end reads from the Sequence Read Archive (SRA) via an SRA accession number. SRA stores raw sequencing data and alignment information from high-throughput sequencing platforms.

Input arguments

sra_accession
label:

SRA accession(s)

type:

list:basic:string

required:

True

disabled:

False

hidden:

False

advanced.prefetch
label:

Prefetch SRA file

type:

basic:boolean

required:

True

disabled:

False

hidden:

False

default:

True

advanced.max_size_prefetch
label:

Maximum file size to download in KB

type:

basic:string

description:

A unit suffix can be used instead of a plain value in KB (e.g. 1024M or 1G).

required:

True

disabled:

False

hidden:

False

default:

20G

advanced.min_spot_id
label:

Minimum spot ID

type:

basic:integer

required:

False

disabled:

False

hidden:

False

advanced.max_spot_id
label:

Maximum spot ID

type:

basic:integer

required:

False

disabled:

False

hidden:

False

advanced.min_read_len
label:

Minimum read length

type:

basic:integer

required:

False

disabled:

False

hidden:

False

advanced.clip
label:

Clip adapter sequences

type:

basic:boolean

required:

True

disabled:

False

hidden:

False

default:

False

advanced.aligned
label:

Dump only aligned sequences

type:

basic:boolean

required:

True

disabled:

False

hidden:

False

default:

False

advanced.unaligned
label:

Dump only unaligned sequences

type:

basic:boolean

required:

True

disabled:

False

hidden:

False

default:

False

Output results

fastq
label:

Reads file (mate 1)

type:

list:basic:file

required:

True

disabled:

False

hidden:

False

fastq2
label:

Reads file (mate 2)

type:

list:basic:file

required:

True

disabled:

False

hidden:

False

fastqc_url
label:

Quality control with FastQC (mate 1)

type:

list:basic:file:html

required:

True

disabled:

False

hidden:

False

fastqc_url2
label:

Quality control with FastQC (mate 2)

type:

list:basic:file:html

required:

True

disabled:

False

hidden:

False

fastqc_archive
label:

Download FastQC archive (mate 1)

type:

list:basic:file

required:

True

disabled:

False

hidden:

False

fastqc_archive2
label:

Download FastQC archive (mate 2)

type:

list:basic:file

required:

True

disabled:

False

hidden:

False

SRA data (single-end)

data:reads:fastq:single:import-sra-single (list:basic:string  sra_accession, basic:boolean  prefetch, basic:string  max_size_prefetch, basic:integer  min_spot_id, basic:integer  max_spot_id, basic:integer  min_read_len, basic:boolean  clip, basic:boolean  aligned, basic:boolean  unaligned)[Source: v1.6.1]

Import single-end reads from SRA. Import single-end reads from the Sequence Read Archive (SRA) via an SRA accession number. SRA stores raw sequencing data and alignment information from high-throughput sequencing platforms.

Input arguments

sra_accession
label:

SRA accession(s)

type:

list:basic:string

required:

True

disabled:

False

hidden:

False

advanced.prefetch
label:

Prefetch SRA file

type:

basic:boolean

required:

True

disabled:

False

hidden:

False

default:

True

advanced.max_size_prefetch
label:

Maximum file size to download in KB

type:

basic:string

description:

A unit suffix can be used instead of a plain value in KB (e.g. 1024M or 1G).

required:

True

disabled:

False

hidden:

False

default:

20G

advanced.min_spot_id
label:

Minimum spot ID

type:

basic:integer

required:

False

disabled:

False

hidden:

False

advanced.max_spot_id
label:

Maximum spot ID

type:

basic:integer

required:

False

disabled:

False

hidden:

False

advanced.min_read_len
label:

Minimum read length

type:

basic:integer

required:

False

disabled:

False

hidden:

False

advanced.clip
label:

Clip adapter sequences

type:

basic:boolean

required:

True

disabled:

False

hidden:

False

default:

False

advanced.aligned
label:

Dump only aligned sequences

type:

basic:boolean

required:

True

disabled:

False

hidden:

False

default:

False

advanced.unaligned
label:

Dump only unaligned sequences

type:

basic:boolean

required:

True

disabled:

False

hidden:

False

default:

False

Output results

fastq
label:

Reads file

type:

list:basic:file

required:

True

disabled:

False

hidden:

False

fastqc_url
label:

Quality control with FastQC

type:

list:basic:file:html

required:

True

disabled:

False

hidden:

False

fastqc_archive
label:

Download FastQC archive

type:

list:basic:file

required:

True

disabled:

False

hidden:

False

STAR

data:alignment:bam:star:alignment-star (data:reads:fastq  reads, data:index:star  genome, data:annotation  annotation, basic:boolean  unstranded, basic:boolean  noncannonical, basic:boolean  gene_counts, basic:string  feature_exon, basic:integer  sjdb_overhang, basic:boolean  chimeric, basic:integer  chim_segment_min, basic:boolean  quant_mode, basic:boolean  single_end, basic:string  out_filter_type, basic:integer  out_multimap_max, basic:integer  out_mismatch_max, basic:decimal  out_mismatch_nl_max, basic:integer  out_score_min, basic:decimal  out_mismatch_nrl_max, basic:integer  align_overhang_min, basic:integer  align_sjdb_overhang_min, basic:integer  align_intron_size_min, basic:integer  align_intron_size_max, basic:integer  align_gap_max, basic:string  align_end_alignment, basic:boolean  two_pass_mode, basic:boolean  out_unmapped, basic:string  out_sam_attributes, basic:string  out_rg_line, list:basic:integer  limit_buffer_size, basic:integer  limit_sam_records, basic:integer  limit_junction_reads, basic:integer  limit_collapsed_junctions, basic:integer  limit_inserted_junctions)[Source: v5.1.0]

Align reads with the STAR aligner. Spliced Transcripts Alignment to a Reference (STAR) software is based on an alignment algorithm that uses sequential maximum mappable seed search in uncompressed suffix arrays followed by a seed clustering and stitching procedure. In addition to unbiased de novo detection of canonical junctions, STAR can discover non-canonical splices and chimeric (fusion) transcripts, and is also capable of mapping full-length RNA sequences. More information can be found in the [STAR manual](https://github.com/alexdobin/STAR/blob/master/doc/STARmanual.pdf) and in the [original paper](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3530905/). The current version of STAR is 2.7.10b.

Input arguments

reads
label:

Input reads (FASTQ)

type:

data:reads:fastq

required:

True

disabled:

False

hidden:

False

genome
label:

Indexed reference genome

type:

data:index:star

description:

Genome index prepared by STAR aligner indexing tool.

required:

True

disabled:

False

hidden:

False

annotation
label:

Annotation file (GTF/GFF3)

type:

data:annotation

description:

Insert known annotations into genome indices at the mapping stage.

required:

False

disabled:

False

hidden:

False

unstranded
label:

The data is unstranded [--outSAMstrandField intronMotif]

type:

basic:boolean

description:

For unstranded RNA-seq data, Cufflinks/Cuffdiff require spliced alignments with the XS strand attribute, which STAR will generate with the --outSAMstrandField intronMotif option. As required, the XS strand attribute will be generated for all alignments that contain splice junctions. The spliced alignments that have undefined strand (i.e. containing only non-canonical unannotated junctions) will be suppressed. If you have stranded RNA-seq data, you do not need to use any specific STAR options. Instead, you need to run Cufflinks with the --library-type option. For example, cufflinks --library-type fr-firststrand should be used for the standard dUTP protocol, including Illumina’s stranded Tru-Seq. This option has to be used only for Cufflinks runs and not for STAR runs.

required:

True

disabled:

False

hidden:

False

default:

False

noncannonical
label:

Remove non-canonical junctions (Cufflinks compatibility)

type:

basic:boolean

description:

It is recommended to remove the non-canonical junctions for Cufflinks runs using --outFilterIntronMotifs RemoveNoncanonical.

required:

True

disabled:

False

hidden:

False

default:

False

gene_counts
label:

Gene count [--quantMode GeneCounts]

type:

basic:boolean

description:

With this option set to True, STAR will count the number of reads per gene while mapping. A read is counted if it overlaps (1 nt or more) one and only one gene. Both ends of the paired-end read are checked for overlaps. The counts coincide with those produced by htseq-count with default parameters.

required:

True

disabled:

False

hidden:

False

default:

False

annotation_options.feature_exon
label:

Feature type [--sjdbGTFfeatureExon]

type:

basic:string

description:

Feature type in GTF file to be used as exons for building transcripts.

required:

True

disabled:

False

hidden:

False

default:

exon

annotation_options.sjdb_overhang
label:

Junction length [--sjdbOverhang]

type:

basic:integer

description:

This parameter specifies the length of the genomic sequence around the annotated junction to be used in constructing the splice junction database. Ideally, this length should be equal to the ReadLength-1, where ReadLength is the length of the reads. For instance, for Illumina 2x100b paired-end reads, the ideal value is 100-1=99. In the case of reads of varying length, the ideal value is max(ReadLength)-1. In most cases, the default value of 100 will work as well as the ideal value.

required:

True

disabled:

False

hidden:

False

default:

100
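The rule for the ideal overhang can be written down directly. The helper below is a hypothetical sketch, not part of the process; it only restates max(ReadLength)-1 from the description.

```python
def ideal_sjdb_overhang(read_lengths):
    """Return max(ReadLength) - 1 for an iterable of read lengths."""
    return max(read_lengths) - 1
```

For 2x100 paired-end reads, `ideal_sjdb_overhang([100, 100])` gives 99, and for reads of varying length such as `[51, 75, 100]` it is again 99.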

detect_chimeric.chimeric
label:

Detect chimeric and circular alignments [--chimOutType SeparateSAMold]

type:

basic:boolean

description:

To switch on detection of chimeric (fusion) alignments (in addition to normal mapping), --chimSegmentMin should be set to a positive value. Each chimeric alignment consists of two segments. Each segment is non-chimeric on its own, but the segments are chimeric to each other (i.e. the segments belong to different chromosomes, or different strands, or are far from each other). Both segments may contain splice junctions, and one of the segments may contain portions of both mates. The --chimSegmentMin parameter controls the minimum mapped length of the two segments that is allowed. For example, if you have 2x75 reads and used --chimSegmentMin 20, a chimeric alignment with 130b on one chromosome and 20b on the other will be output, while 135 + 15 won’t be.

required:

True

disabled:

False

hidden:

False

default:

False
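The 2x75 example from the description can be checked with a simplified sketch. It models only the minimum segment length rule; STAR applies further chimeric filters that are not represented here, and the function name is hypothetical.

```python
def chimeric_segments_long_enough(segment_lengths, chim_segment_min):
    """Return True if both chimeric segments reach the minimum mapped length."""
    if chim_segment_min <= 0:
        return False  # chimeric detection is switched off
    return min(segment_lengths) >= chim_segment_min
```

With a minimum of 20, segments of 130 and 20 bases pass the check, while 135 and 15 do not.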

detect_chimeric.chim_segment_min
label:

Minimum length of chimeric segment [--chimSegmentMin]

type:

basic:integer

required:

True

disabled:

!detect_chimeric.chimeric

hidden:

False

default:

20

t_coordinates.quant_mode
label:

Output in transcript coordinates [--quantMode TranscriptomeSAM]

type:

basic:boolean

description:

With the --quantMode TranscriptomeSAM option, STAR will output alignments translated into transcript coordinates in the Aligned.toTranscriptome.out.bam file (in addition to alignments in genomic coordinates in Aligned.*.sam/bam files). These transcriptomic alignments can be used with various transcript quantification software that require reads to be mapped to transcriptome, such as RSEM or eXpress.

required:

True

disabled:

False

hidden:

False

default:

False

t_coordinates.single_end
label:

Allow soft-clipping and indels [--quantTranscriptomeBan Singleend]

type:

basic:boolean

description:

By default, the output satisfies RSEM requirements: soft-clipping or indels are not allowed. Use --quantTranscriptomeBan Singleend to allow insertions, deletions and soft-clips in the transcriptomic alignments, which can be used by some expression quantification software (e.g. eXpress).

required:

True

disabled:

!t_coordinates.quant_mode

hidden:

False

default:

False

filtering.out_filter_type
label:

Type of filtering [--outFilterType]

type:

basic:string

description:

Normal: standard filtering using only current alignment; BySJout: keep only those reads that contain junctions that passed filtering into SJ.out.tab.

required:

True

disabled:

False

hidden:

False

default:

Normal

choices:

  • Normal: Normal

  • BySJout: BySJout

filtering.out_multimap_max
label:

Maximum number of loci [--outFilterMultimapNmax]

type:

basic:integer

description:

Maximum number of loci the read is allowed to map to. Alignments (all of them) will be output only if the read maps to no more loci than this value. Otherwise no alignments will be output, and the read will be counted as ‘mapped to too many loci’ (default: 10).

required:

False

disabled:

False

hidden:

False

filtering.out_mismatch_max
label:

Maximum number of mismatches [--outFilterMismatchNmax]

type:

basic:integer

description:

Alignment will be output only if it has no more mismatches than this value (default: 10). A large number (e.g. 999) switches off this filter.

required:

False

disabled:

False

hidden:

False

filtering.out_mismatch_nl_max
label:

Maximum no. of mismatches (map length) [--outFilterMismatchNoverLmax]

type:

basic:decimal

description:

Alignment will be output only if its ratio of mismatches to *mapped* length is less than or equal to this value (default: 0.3). The value should be between 0.0 and 1.0.

required:

False

disabled:

False

hidden:

False

filtering.out_score_min
label:

Minimum alignment score [--outFilterScoreMin]

type:

basic:integer

description:

Alignment will be output only if its score is higher than or equal to this value (default: 0).

required:

False

disabled:

False

hidden:

False

filtering.out_mismatch_nrl_max
label:

Maximum no. of mismatches (read length) [--outFilterMismatchNoverReadLmax]

type:

basic:decimal

description:

Alignment will be output only if its ratio of mismatches to *read* length is less than or equal to this value (default: 1.0). Using 0.04 for 2x100bp, the max number of mismatches is calculated as 0.04*200=8 for the paired read. The value should be between 0.0 and 1.0.

required:

False

disabled:

False

hidden:

False
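The arithmetic in the read-length ratio example (0.04*200=8 for 2x100bp) can be sketched as follows. This is an illustrative check, not STAR's implementation; the function name and the small floating-point tolerance are assumptions.

```python
def passes_read_length_mismatch_filter(n_mismatches, mate_lengths, ratio):
    """Return True if mismatches / total read length is at most `ratio`."""
    total_length = sum(mate_lengths)  # both mates count for a paired read
    # The tiny tolerance guards against floating-point rounding of ratio * length.
    return n_mismatches <= ratio * total_length + 1e-9
```

With a ratio of 0.04 and 2x100bp reads, a pair with 8 mismatches passes and a pair with 9 does not.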

alignment.align_overhang_min
label:

Minimum overhang [--alignSJoverhangMin]

type:

basic:integer

description:

Minimum overhang (i.e. block size) for spliced alignments (default: 5).

required:

False

disabled:

False

hidden:

False

alignment.align_sjdb_overhang_min
label:

Minimum overhang (sjdb) [--alignSJDBoverhangMin]

type:

basic:integer

description:

Minimum overhang (i.e. block size) for annotated (sjdb) spliced alignments (default: 3).

required:

False

disabled:

False

hidden:

False

alignment.align_intron_size_min
label:

Minimum intron size [--alignIntronMin]

type:

basic:integer

description:

Minimum intron size: the genomic gap is considered an intron if its length >= alignIntronMin, otherwise it is considered a deletion (default: 21).

required:

False

disabled:

False

hidden:

False

alignment.align_intron_size_max
label:

Maximum intron size [--alignIntronMax]

type:

basic:integer

description:

Maximum intron size. If 0, the maximum intron size is determined by (2^winBinNbits)*winAnchorDistNbins (default: 0).

required:

False

disabled:

False

hidden:

False
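When alignIntronMax is left at 0, the window-based limit from the description can be computed as below. The default arguments (winBinNbits=16, winAnchorDistNbins=9) are assumed here to match STAR's defaults and should be verified against the STAR manual for the version in use; the function name is hypothetical.

```python
def window_based_max_intron(win_bin_nbits=16, win_anchor_dist_nbins=9):
    """Return (2 ** winBinNbits) * winAnchorDistNbins."""
    return (2 ** win_bin_nbits) * win_anchor_dist_nbins
```

Under the assumed defaults this evaluates to 589824 bases.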

alignment.align_gap_max
label:

Maximum gap between mates [--alignMatesGapMax]

type:

basic:integer

description:

Maximum gap between two mates. If 0, the maximum intron gap is determined by (2^winBinNbits)*winAnchorDistNbins (default: 0).

required:

False

disabled:

False

hidden:

False

alignment.align_end_alignment
label:

Read ends alignment [--alignEndsType]

type:

basic:string

description:

Type of read ends alignment (default: Local). Local: standard local alignment with soft-clipping allowed. EndToEnd: force end-to-end read alignment, do not soft-clip. Extend5pOfRead1: fully extend only the 5' end of read1, all other ends use local alignment. Extend5pOfReads12: fully extend only the 5' ends of both read1 and read2, all other ends use local alignment.

required:

False

disabled:

False

hidden:

False

choices:

  • Local: Local

  • EndToEnd: EndToEnd

  • Extend5pOfRead1: Extend5pOfRead1

  • Extend5pOfReads12: Extend5pOfReads12

two_pass_mapping.two_pass_mode
label:

Use two pass mode [--twopassMode]

type:

basic:boolean

description:

Use two-pass mapping instead of first-pass only. In two-pass mode we first perform first-pass mapping, extract junctions, insert them into the genome index, and re-map all reads in the second mapping pass.

required:

True

disabled:

False

hidden:

False

default:

False

output_options.out_unmapped
label:

Output unmapped reads (SAM) [--outSAMunmapped Within]

type:

basic:boolean

description:

Output of unmapped reads in the SAM format.

required:

True

disabled:

False

hidden:

False

default:

False

output_options.out_sam_attributes
label:

Desired SAM attributes [--outSAMattributes]

type:

basic:string

description:

A string of desired SAM attributes, in the order desired for the output SAM.

required:

True

disabled:

False

hidden:

False

default:

Standard

choices:

  • Standard: Standard

  • All: All

  • NH HI NM MD: NH HI NM MD

  • None: None

output_options.out_rg_line
label:

SAM/BAM read group line [--outSAMattrRGline]

type:

basic:string

description:

The first word contains the read group identifier and must start with ID:, e.g. --outSAMattrRGline ID:xxx CN:yy "DS:z z z". xxx will be added as RG tag to each output alignment. Any spaces in the tag values have to be double quoted. Comma-separated RG lines correspond to different (comma-separated) input files in --readFilesIn. Commas have to be surrounded by spaces, e.g. --outSAMattrRGline ID:xxx , ID:zzz "DS:z z" , ID:yyy DS:yyyy.

required:

False

disabled:

False

hidden:

False

limits.limit_buffer_size
label:

Buffer size [--limitIObufferSize]

type:

list:basic:integer

description:

Maximum available buffer size (bytes) for input/output, per thread. The parameter requires two numbers: separate sizes for the input and output buffers.

required:

True

disabled:

False

hidden:

False

default:

[30000000, 50000000]

limits.limit_sam_records
label:

Maximum size of the SAM record [--limitOutSAMoneReadBytes]

type:

basic:integer

description:

Maximum size of the SAM record (bytes) for one read. Recommended value: > 2*(LengthMate1+LengthMate2+100)*outFilterMultimapNmax.

required:

True

disabled:

False

hidden:

False

default:

100000
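The recommended lower bound for the SAM record limit is easy to evaluate for a given library. The helper below is a hypothetical sketch that only restates the expression 2*(LengthMate1+LengthMate2+100)*outFilterMultimapNmax; the chosen value should be larger than the result.

```python
def sam_record_bytes_lower_bound(len_mate1, len_mate2, multimap_nmax):
    """Return 2 * (LengthMate1 + LengthMate2 + 100) * outFilterMultimapNmax."""
    return 2 * (len_mate1 + len_mate2 + 100) * multimap_nmax
```

For 2x100 reads and a multimapping limit of 10, the bound is 6000 bytes, well below the default of 100000.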

limits.limit_junction_reads
label:

Maximum number of junctions [--limitOutSJoneRead]

type:

basic:integer

description:

Maximum number of junctions for one read (including all multi-mappers).

required:

True

disabled:

False

hidden:

False

default:

1000

limits.limit_collapsed_junctions
label:

Maximum number of collapsed junctions [--limitOutSJcollapsed]

type:

basic:integer

required:

True

disabled:

False

hidden:

False

default:

1000000

limits.limit_inserted_junctions
label:

Maximum number of junctions to be inserted [--limitSjdbInsertNsj]

type:

basic:integer

description:

Maximum number of junctions to be inserted into the genome on the fly at the mapping stage, including those from annotations and those detected in the 1st step of the 2-pass run.

required:

True

disabled:

False

hidden:

False

default:

1000000

Output results

bam
label:

Alignment file

type:

basic:file

required:

True

disabled:

False

hidden:

False

bai
label:

BAM file index

type:

basic:file

required:

True

disabled:

False

hidden:

False

unmapped_1
label:

Unmapped reads (mate 1)

type:

basic:file

required:

False

disabled:

False

hidden:

False

unmapped_2
label:

Unmapped reads (mate 2)

type:

basic:file

required:

False

disabled:

False

hidden:

False

sj
label:

Splice junctions

type:

basic:file

required:

True

disabled:

False

hidden:

False

chimeric
label:

Chimeric alignments

type:

basic:file

required:

False

disabled:

False

hidden:

False

alignment_transcriptome
label:

Alignment (transcriptome coordinates)

type:

basic:file

required:

False

disabled:

False

hidden:

False

gene_counts
label:

Gene counts

type:

basic:file

required:

False

disabled:

False

hidden:

False

stats
label:

Statistics

type:

basic:file

required:

True

disabled:

False

hidden:

False

species
label:

Species

type:

basic:string

required:

True

disabled:

False

hidden:

False

build
label:

Build

type:

basic:string

required:

True

disabled:

False

hidden:

False

STAR genome index

data:index:star:alignment-star-index (data:seq:nucleotide  ref_seq, data:annotation  annotation, basic:string  source, basic:string  feature_exon, basic:integer  sjdb_overhang, basic:integer  genome_sa_string_len, basic:integer  genome_chr_bin_size, basic:integer  genome_sa_sparsity)[Source: v4.0.0]

Generate STAR genome index. Generate genome index files from the supplied reference genome sequence and GTF files. The current version of STAR is 2.7.10b.

Input arguments

ref_seq
label:

Reference sequence (nucleotide FASTA)

type:

data:seq:nucleotide

required:

True

disabled:

False

hidden:

False

annotation
label:

Annotation file (GTF/GFF3)

type:

data:annotation

description:

Insert known annotations into genome indices at the indexing stage.

required:

False

disabled:

False

hidden:

False

source
label:

Gene ID Database Source

type:

basic:string

required:

False

disabled:

annotation

hidden:

False

choices:

  • ENSEMBL: ENSEMBL

  • NCBI: NCBI

  • UCSC: UCSC

annotation_options.feature_exon
label:

Feature type [--sjdbGTFfeatureExon]

type:

basic:string

description:

Feature type in GTF file to be used as exons for building transcripts.

required:

True

disabled:

False

hidden:

False

default:

exon

annotation_options.sjdb_overhang
label:

Junction length [--sjdbOverhang]

type:

basic:integer

description:

This parameter specifies the length of the genomic sequence around the annotated junction to be used in constructing the splice junction database. Ideally, this length should be equal to the ReadLength-1, where ReadLength is the length of the reads. For instance, for Illumina 2x100b paired-end reads, the ideal value is 100-1=99. In case of reads of varying length, the ideal value is max(ReadLength)-1. In most cases, the default value of 100 will work as well as the ideal value.

required:

True

disabled:

False

hidden:

False

default:

100

advanced.genome_sa_string_len
label:

Small genome adjustment [--genomeSAindexNbases]

type:

basic:integer

description:

For small genomes, the parameter --genomeSAindexNbases needs to be scaled down, with a typical value of min(14, log2(GenomeLength)/2 - 1). For example, for a 1 megabase genome, this is equal to 9, and for a 100 kilobase genome, this is equal to 7.

required:

False

disabled:

False

hidden:

False
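The scaling rule for small genomes can be evaluated with a short sketch. The function name is hypothetical, and rounding to the nearest integer is an assumption chosen because it reproduces the two examples in the description (9 and 7).

```python
import math


def genome_sa_index_nbases(genome_length):
    """Return min(14, log2(GenomeLength)/2 - 1), rounded to the nearest integer."""
    return min(14, round(math.log2(genome_length) / 2 - 1))
```

A 1,000,000-base genome gives 9, a 100,000-base genome gives 7, and a 3-gigabase genome is capped at 14.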

advanced.genome_chr_bin_size
label:

Bin size for genome storage [--genomeChrBinNbits]

type:

basic:integer

description:

If you are using a genome with a large (>5,000) number of references (chromosomes/scaffolds), you may need to reduce the --genomeChrBinNbits to reduce RAM consumption. The following scaling is recommended: --genomeChrBinNbits = min(18, log2(GenomeLength / NumberOfReferences)). For example, for a 3 gigabase genome with 100,000 chromosomes/scaffolds, this is equal to 15.

required:

False

disabled:

False

hidden:

False
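The recommended scaling for genomes with many references can be sketched in the same way. The function name is hypothetical, and rounding to the nearest integer is an assumption that matches the example in the description (15 for a 3-gigabase genome with 100,000 scaffolds).

```python
import math


def genome_chr_bin_nbits(genome_length, n_references):
    """Return min(18, log2(GenomeLength / NumberOfReferences)), rounded."""
    return min(18, round(math.log2(genome_length / n_references)))
```

A genome with few, long chromosomes (for example 3 gigabases in 25 references) stays at the cap of 18.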

advanced.genome_sa_sparsity
label:

Suffix array sparsity [--genomeSAsparseD]

type:

basic:integer

description:

Suffix array sparsity, i.e. distance between indices: use bigger numbers to decrease needed RAM at the cost of mapping speed reduction (integer > 0, default = 1).

required:

False

disabled:

False

hidden:

False

Output results

index
label:

Indexed genome

type:

basic:dir

required:

True

disabled:

False

hidden:

False

fastagz
label:

FASTA file (compressed)

type:

basic:file

required:

True

disabled:

False

hidden:

False

fasta
label:

FASTA file

type:

basic:file

required:

True

disabled:

False

hidden:

False

fai
label:

FASTA file index

type:

basic:file

required:

True

disabled:

False

hidden:

False

source
label:

Gene ID source

type:

basic:string

required:

True

disabled:

False

hidden:

False

species
label:

Species

type:

basic:string

required:

True

disabled:

False

hidden:

False

build
label:

Build

type:

basic:string

required:

True

disabled:

False

hidden:

False

STAR-based gene quantification workflow

data:workflow:rnaseq:star:qc:workflow-bbduk-star-qc (data:reads:fastq  reads, data:index:star  genome, data:annotation  annotation, basic:string  assay_type, data:index:salmon  cdna_index, data:index:star  rrna_reference, data:index:star  globin_reference, list:data:seq:nucleotide  adapters, list:basic:string  custom_adapter_sequences, basic:integer  kmer_length, basic:integer  min_k, basic:integer  hamming_distance, basic:integer  maxns, basic:integer  trim_quality, basic:integer  min_length, basic:string  quality_encoding_offset, basic:boolean  ignore_bad_quality, basic:boolean  unstranded, basic:boolean  noncannonical, basic:boolean  chimeric, basic:integer  chim_segment_min, basic:boolean  quant_mode, basic:boolean  single_end, basic:string  out_filter_type, basic:integer  out_multimap_max, basic:integer  out_mismatch_max, basic:decimal  out_mismatch_nl_max, basic:integer  out_score_min, basic:decimal  out_mismatch_nrl_max, basic:integer  align_overhang_min, basic:integer  align_sjdb_overhang_min, basic:integer  align_intron_size_min, basic:integer  align_intron_size_max, basic:integer  align_gap_max, basic:string  align_end_alignment, basic:boolean  two_pass_mode, basic:boolean  out_unmapped, basic:string  out_sam_attributes, basic:string  out_rg_line, basic:integer  n_reads, basic:integer  n_reads, basic:integer  seed, basic:decimal  fraction, basic:boolean  two_pass)[Source: v1.4.0]

STAR-based RNA-seq pipeline. First, reads are preprocessed by __BBDuk__, which removes adapters, trims reads for quality from the 3’-end, and discards reads that are too short after trimming. Compared to similar tools, BBDuk is well regarded for its computational efficiency. Next, preprocessed reads are aligned by the __STAR__ aligner. At the time of implementation, STAR is considered a state-of-the-art tool that consistently produces accurate results from diverse sets of reads, and performs well even with default settings. The STAR aligner counts and reports the number of aligned reads per gene while mapping. The STAR version used is 2.7.10b. For more information see [this comparison of RNA-seq aligners](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5792058/). The rRNA contamination rate in the sample is determined using the STAR aligner. Quality-trimmed reads are downsampled (using the __Seqtk__ tool) and aligned to the rRNA reference sequences. The alignment rate indicates the percentage of the reads in the sample that are derived from the rRNA sequences. The final step of the workflow is QoRTs QC analysis with downsampled reads.

Input arguments

reads
label:

Reads (FASTQ)

type:

data:reads:fastq

description:

Reads in FASTQ file, single or paired end.

required:

True

disabled:

False

hidden:

False

genome
label:

Indexed reference genome

type:

data:index:star

description:

Genome index prepared by STAR aligner indexing tool.

required:

True

disabled:

False

hidden:

False

annotation
label:

Annotation

type:

data:annotation

description:

GTF and GFF3 annotation formats are supported.

required:

True

disabled:

False

hidden:

False

assay_type
label:

Assay type

type:

basic:string

description:

In a strand non-specific assay, a read is considered overlapping with a feature regardless of whether it is mapped to the same or the opposite strand as the feature. In a strand-specific forward assay with single reads, the read has to be mapped to the same strand as the feature. For paired-end reads, the first read has to be on the same strand and the second read on the opposite strand. In a strand-specific reverse assay these rules are reversed.

required:

True

disabled:

False

hidden:

False

default:

non_specific

choices:

  • Strand non-specific: non_specific

  • Strand-specific forward: forward

  • Strand-specific reverse: reverse

  • Detect automatically: auto
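The strandedness rules in the assay type description can be expressed as a small decision function. This is a sketch of the stated rules only, not the workflow's implementation: the function name and the `mate` argument are hypothetical, single-end reads are treated as mate 1, and the `auto` choice is not modelled because it has to be resolved to one of the other values first.

```python
def read_supports_feature(assay_type, read_strand, feature_strand, mate=1):
    """Decide whether a read's strand is compatible with a feature's strand."""
    if assay_type == "non_specific":
        return True  # strand is ignored
    if assay_type not in ("forward", "reverse"):
        raise ValueError("assay_type must be resolved before applying the rule")
    same_strand = read_strand == feature_strand
    # Forward: mate 1 on the same strand, mate 2 on the opposite strand.
    expect_same = mate == 1
    if assay_type == "reverse":
        expect_same = not expect_same  # the rules are reversed
    return same_strand == expect_same
```

For example, in a forward assay a first mate on "+" supports a "+" feature, while in a reverse assay the same read does not.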

cdna_index
label:

Indexed cDNA reference sequence

type:

data:index:salmon

description:

Transcriptome index file created using the Salmon indexing tool. cDNA (transcriptome) sequences used for index file creation must be derived from the same species as the input sequencing reads to obtain reliable analysis results.

required:

False

disabled:

False

hidden:

assay_type != 'auto'

rrna_reference
label:

Indexed rRNA reference sequence

type:

data:index:star

description:

Reference sequence index prepared by STAR aligner indexing tool.

required:

True

disabled:

False

hidden:

False

globin_reference
label:

Indexed Globin reference sequence

type:

data:index:star

description:

Reference sequence index prepared by STAR aligner indexing tool.

required:

True

disabled:

False

hidden:

False

preprocessing.adapters
label:

Adapters

type:

list:data:seq:nucleotide

description:

FASTA file(s) with adapters.

required:

False

disabled:

False

hidden:

False

preprocessing.custom_adapter_sequences
label:

Custom adapter sequences

type:

list:basic:string

description:

Custom adapter sequences can be specified by inputting them one by one and pressing Enter after each sequence.

required:

False

disabled:

False

hidden:

False

default:

[]

preprocessing.kmer_length
label:

K-mer length [k=]

type:

basic:integer

description:

K-mer length used for finding contaminants. Contaminants shorter than k-mer length will not be found. K-mer length must be at least 1.

required:

True

disabled:

False

hidden:

False

default:

23

preprocessing.min_k
label:

Minimum k-mer length at right end of reads used for trimming [mink=]

type:

basic:integer

required:

True

disabled:

preprocessing.adapters.length === 0 && preprocessing.custom_adapter_sequences.length === 0

hidden:

False

default:

11

preprocessing.hamming_distance
label:

Maximum Hamming distance for k-mers [hammingdistance=]

type:

basic:integer

description:

Hamming distance i.e. the number of mismatches allowed in the k-mer.

required:

True

disabled:

False

hidden:

False

default:

1

preprocessing.maxns
label:

Max Ns after trimming [maxns=]

type:

basic:integer

description:

If non-negative, reads with more Ns than this (after trimming) will be discarded.

required:

True

disabled:

False

hidden:

False

default:

-1

preprocessing.trim_quality
label:

Average quality below which to trim region [trimq=]

type:

basic:integer

description:

Phred algorithm is used, which is more accurate than naive trimming.

required:

True

disabled:

False

hidden:

False

default:

10

preprocessing.min_length
label:

Minimum read length [minlength=]

type:

basic:integer

description:

Reads shorter than minimum read length after trimming are discarded.

required:

True

disabled:

False

hidden:

False

default:

20

preprocessing.quality_encoding_offset
label:

Quality encoding offset [qin=]

type:

basic:string

description:

Quality encoding offset for input FASTQ files.

required:

True

disabled:

False

hidden:

False

default:

auto

choices:

  • Sanger / Illumina 1.8+: 33

  • Illumina up to 1.3+, 1.5+: 64

  • Auto: auto
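
The two explicit offsets differ only in the constant subtracted from each quality character's ASCII code. A minimal sketch, assuming a hypothetical helper name:

```python
from typing import List


def decode_qualities(quality_string: str, offset: int = 33) -> List[int]:
    """Convert a FASTQ quality string to Phred scores for a given offset."""
    scores = [ord(ch) - offset for ch in quality_string]
    if any(score < 0 for score in scores):
        # Negative scores usually mean the wrong offset was chosen.
        raise ValueError("quality characters are below the chosen offset")
    return scores
```

The same Phred score of 40 is written as "I" with offset 33 and as "h" with offset 64, which is why choosing the wrong offset shifts every score by 31.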

preprocessing.ignore_bad_quality
label:

Ignore bad quality [ignorebadquality]

type:

basic:boolean

description:

Don’t crash if quality values appear to be incorrect.

required:

True

disabled:

False

hidden:

False

default:

False

alignment.unstranded
label:

The data is unstranded [--outSAMstrandField intronMotif]

type:

basic:boolean

description:

For unstranded RNA-seq data, Cufflinks/Cuffdiff require spliced alignments with the XS strand attribute, which STAR generates with the --outSAMstrandField intronMotif option. As required, the XS strand attribute will be generated for all alignments that contain splice junctions. Spliced alignments that have undefined strand (i.e. containing only non-canonical unannotated junctions) will be suppressed. If you have stranded RNA-seq data, you do not need to use any specific STAR options. Instead, you need to run Cufflinks with the --library-type option. For example, cufflinks --library-type fr-firststrand should be used for the standard dUTP protocol, including Illumina’s stranded Tru-Seq. This option has to be used only for Cufflinks runs and not for STAR runs.

required:

True

disabled:

False

hidden:

False

default:

False

alignment.noncannonical
label:

Remove non-canonical junctions (Cufflinks compatibility)

type:

basic:boolean

description:

It is recommended to remove the non-canonical junctions for Cufflinks runs using --outFilterIntronMotifs RemoveNoncanonical.

required:

True

disabled:

False

hidden:

False

default:

False

alignment.chimeric_reads.chimeric
label:

Detect chimeric and circular alignments [--chimOutType SeparateSAMold]

type:

basic:boolean

description:

To switch on detection of chimeric (fusion) alignments (in addition to normal mapping), --chimSegmentMin should be set to a positive value. Each chimeric alignment consists of two segments. Each segment is non-chimeric on its own, but the segments are chimeric to each other (i.e. the segments belong to different chromosomes, or different strands, or are far from each other). Both segments may contain splice junctions, and one of the segments may contain portions of both mates. The --chimSegmentMin parameter controls the minimum mapped length of the two segments that is allowed. For example, if you have 2x75 reads and used --chimSegmentMin 20, a chimeric alignment with 130b on one chromosome and 20b on the other will be output, while 135 + 15 won’t be.

required:

True

disabled:

False

hidden:

False

default:

False

alignment.chimeric_reads.chim_segment_min
label:

Minimum length of chimeric segment [--chimSegmentMin]

type:

basic:integer

required:

True

disabled:

!alignment.chimeric_reads.chimeric

hidden:

False

default:

20
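
The 2x75 example from the chimeric detection description can be checked with a small sketch. The helper name is hypothetical and the check only models the length criterion, not STAR's full chimeric detection:

```python
from typing import Iterable


def passes_chim_segment_min(segment_lengths: Iterable[int], chim_segment_min: int = 20) -> bool:
    """Return True if every chimeric segment reaches the minimum mapped length."""
    return all(length >= chim_segment_min for length in segment_lengths)
```

With the default of 20, the 130 + 20 split passes because both segments reach the minimum, while 135 + 15 fails because the shorter segment is only 15 bases long.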

alignment.transcript_output.quant_mode
label:

Output in transcript coordinates [--quantMode]

type:

basic:boolean

description:

With the --quantMode TranscriptomeSAM option, STAR will output alignments translated into transcript coordinates in the Aligned.toTranscriptome.out.bam file (in addition to alignments in genomic coordinates in Aligned.*.sam/bam files). These transcriptomic alignments can be used with various transcript quantification software that requires reads to be mapped to the transcriptome, such as RSEM or eXpress.

required:

True

disabled:

False

hidden:

False

default:

False

alignment.transcript_output.single_end
label:

Allow soft-clipping and indels [--quantTranscriptomeBan Singleend]

type:

basic:boolean

description:

By default, the output satisfies RSEM requirements: soft-clipping or indels are not allowed. Use --quantTranscriptomeBan Singleend to allow insertions, deletions and soft-clips in the transcriptomic alignments, which can be used by some expression quantification software (e.g. eXpress).

required:

True

disabled:

!t_coordinates.quant_mode

hidden:

False

default:

False

alignment.filtering_options.out_filter_type
label:

Type of filtering [--outFilterType]

type:

basic:string

description:

Normal: standard filtering using only current alignment; BySJout: keep only those reads that contain junctions that passed filtering into SJ.out.tab.

required:

True

disabled:

False

hidden:

False

default:

Normal

choices:

  • Normal: Normal

  • BySJout: BySJout

alignment.filtering_options.out_multimap_max
label:

Maximum number of loci [--outFilterMultimapNmax]

type:

basic:integer

description:

Maximum number of loci the read is allowed to map to. Alignments (all of them) will be output only if the read maps to no more loci than this value. Otherwise no alignments will be output, and the read will be counted as ‘mapped to too many loci’ (default: 10).

required:

False

disabled:

False

hidden:

False

alignment.filtering_options.out_mismatch_max
label:

Maximum number of mismatches [--outFilterMismatchNmax]

type:

basic:integer

description:

Alignment will be output only if it has no more mismatches than this value (default: 10). A large number (e.g. 999) switches off this filter.

required:

False

disabled:

False

hidden:

False

alignment.filtering_options.out_mismatch_nl_max
label:

Maximum no. of mismatches (map length) [--outFilterMismatchNoverLmax]

type:

basic:decimal

description:

Alignment will be output only if its ratio of mismatches to *mapped* length is less than or equal to this value (default: 0.3). The value should be between 0.0 and 1.0.

required:

False

disabled:

False

hidden:

False

alignment.filtering_options.out_score_min
label:

Minimum alignment score [--outFilterScoreMin]

type:

basic:integer

description:

Alignment will be output only if its score is higher than or equal to this value (default: 0).

required:

False

disabled:

False

hidden:

False

alignment.filtering_options.out_mismatch_nrl_max
label:

Maximum no. of mismatches (read length) [--outFilterMismatchNoverReadLmax]

type:

basic:decimal

description:

Alignment will be output only if its ratio of mismatches to *read* length is less than or equal to this value (default: 1.0). Using 0.04 for 2x100bp, the max number of mismatches is calculated as 0.04*200=8 for the paired read. The value should be between 0.0 and 1.0.

required:

False

disabled:

False

hidden:

False
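
The 2x100bp arithmetic in the description above can be reproduced with a short sketch. The function name is hypothetical, and the ratio is passed as a string so that decimal arithmetic is exact; rounding down for non-integer products is an assumption of this sketch.

```python
from decimal import Decimal


def max_allowed_mismatches(max_ratio: str, read_length: int, paired: bool = True) -> int:
    """Largest mismatch count allowed by a mismatch-to-read-length ratio."""
    # For paired-end reads the ratio applies to the combined length of both mates.
    total_length = read_length * (2 if paired else 1)
    return int(Decimal(max_ratio) * total_length)
```

For example, max_allowed_mismatches("0.04", 100) gives 8 for a 2x100bp pair, matching the 0.04*200=8 calculation in the description.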

alignment.alignment_options.align_overhang_min
label:

Minimum overhang [--alignSJoverhangMin]

type:

basic:integer

description:

Minimum overhang (i.e. block size) for spliced alignments (default: 5).

required:

False

disabled:

False

hidden:

False

alignment.alignment_options.align_sjdb_overhang_min
label:

Minimum overhang (sjdb) [--alignSJDBoverhangMin]

type:

basic:integer

description:

Minimum overhang (i.e. block size) for annotated (sjdb) spliced alignments (default: 3).

required:

False

disabled:

False

hidden:

False

alignment.alignment_options.align_intron_size_min
label:

Minimum intron size [--alignIntronMin]

type:

basic:integer

description:

Minimum intron size: the genomic gap is considered an intron if its length >= alignIntronMin, otherwise it is considered Deletion (default: 21).

required:

False

disabled:

False

hidden:

False
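
The intron/deletion rule stated above reduces to a single comparison. A minimal sketch, with a hypothetical function name and the default of 21 taken from the description:

```python
def classify_gap(gap_length: int, align_intron_min: int = 21) -> str:
    """Classify a genomic gap as an intron or a deletion by its length."""
    return "intron" if gap_length >= align_intron_min else "deletion"
```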

alignment.alignment_options.align_intron_size_max
label:

Maximum intron size [--alignIntronMax]

type:

basic:integer

description:

Maximum intron size. If 0, the max intron size will be determined by (2^winBinNbits) * winAnchorDistNbins (default: 0).

required:

False

disabled:

False

hidden:

False

alignment.alignment_options.align_gap_max
label:

Maximum gap between mates [--alignMatesGapMax]

type:

basic:integer

description:

Maximum gap between two mates. If 0, the max intron gap will be determined by (2^winBinNbits) * winAnchorDistNbins (default: 0).

required:

False

disabled:

False

hidden:

False
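
When the maximum intron size or the maximum mate gap is left at 0, the limit comes from the window formula quoted above. The sketch below evaluates it; the default values 16 and 9 are the STAR defaults for winBinNbits and winAnchorDistNbins as best recalled here, so treat them as assumptions and check the STAR manual for your version.

```python
def window_limit(win_bin_nbits: int = 16, win_anchor_dist_nbins: int = 9) -> int:
    """Evaluate (2^winBinNbits) * winAnchorDistNbins."""
    return (2 ** win_bin_nbits) * win_anchor_dist_nbins
```

Under these assumed defaults the limit is 65536 * 9 = 589824 bases.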

alignment.alignment_options.align_end_alignment
label:

Read ends alignment [--alignEndsType]

type:

basic:string

description:

Type of read ends alignment (default: Local). Local: standard local alignment with soft-clipping allowed. EndToEnd: force end-to-end read alignment, do not soft-clip. Extend5pOfRead1: fully extend only the 5’ end of read1; all other ends use local alignment. Extend5pOfReads12: fully extend only the 5’ ends of both read1 and read2; all other ends use local alignment.

required:

True

disabled:

False

hidden:

False

default:

Local

choices:

  • Local: Local

  • EndToEnd: EndToEnd

  • Extend5pOfRead1: Extend5pOfRead1

  • Extend5pOfReads12: Extend5pOfReads12

alignment.two_pass_mapping.two_pass_mode
label:

Use two pass mode [--twopassMode]

type:

basic:boolean

description:

Use two-pass mapping instead of first-pass only. In two-pass mode we first perform first-pass mapping, extract junctions, insert them into the genome index, and re-map all reads in the second mapping pass.

required:

True

disabled:

False

hidden:

False

default:

True

alignment.output_options.out_unmapped
label:

Output unmapped reads (SAM) [--outSAMunmapped Within]

type:

basic:boolean

description:

Output of unmapped reads in the SAM format.

required:

True

disabled:

False

hidden:

False

default:

True

alignment.output_options.out_sam_attributes
label:

Desired SAM attributes [--outSAMattributes]

type:

basic:string

description:

A string of desired SAM attributes, in the order desired for the output SAM.

required:

True

disabled:

False

hidden:

False

default:

Standard

choices:

  • Standard: Standard

  • All: All

  • NH HI NM MD: NH HI NM MD

  • None: None

alignment.output_options.out_rg_line
label:

SAM/BAM read group line [--outSAMattrRGline]

type:

basic:string

description:

The first word contains the read group identifier and must start with ID:, e.g. --outSAMattrRGline ID:xxx CN:yy "DS:z z z". xxx will be added as the RG tag to each output alignment. Any spaces in the tag values have to be double quoted. Comma-separated RG lines correspond to different (comma-separated) input files in --readFilesIn. Commas have to be surrounded by spaces, e.g. --outSAMattrRGline ID:xxx , ID:zzz "DS:z z" , ID:yyy DS:yyyy.

required:

False

disabled:

False

hidden:

False

quantification.n_reads
label:

Number of reads in subsampled alignment file for strandedness detection

type:

basic:integer

description:

Alignment (.bam) file subsample size. Increase the number of reads to make automatic detection more reliable. Decrease the number of reads to make automatic detection run faster.

required:

True

disabled:

False

hidden:

assay_type != ‘auto’

default:

5000000

downsampling.n_reads
label:

Number of reads

type:

basic:integer

description:

Number of reads to include in downsampling.

required:

True

disabled:

False

hidden:

False

default:

1000000

downsampling.advanced.seed
label:

Seed [-s]

type:

basic:integer

description:

Using the same random seed makes reads downsampling more reproducible in different environments.

required:

True

disabled:

False

hidden:

False

default:

11

downsampling.advanced.fraction
label:

Fraction of reads used

type:

basic:decimal

description:

Use the fraction of reads [0.0 - 1.0] from the original input file instead of the absolute number of reads. If set, this will override the ‘Number of reads’ input parameter.

required:

False

disabled:

False

hidden:

False

downsampling.advanced.two_pass
label:

2-pass mode [-2]

type:

basic:boolean

description:

Enable two-pass mode when downsampling. Two-pass mode is twice as slow but uses much less memory.

required:

True

disabled:

False

hidden:

False

default:

False

Output results

Salmon Index

data:index:salmonsalmon-index (data:seq:nucleotide  nucl, data:file  decoys, basic:boolean  gencode, basic:boolean  keep_duplicates, basic:string  source, basic:string  species, basic:string  build, basic:integer  kmerlen)[Source: v2.2.1]

Generate index files for Salmon transcript quantification tool.

Input arguments

nucl
label:

Nucleotide sequence

type:

data:seq:nucleotide

description:

A CDS sequence file in .FASTA format.

decoys
label:

Decoys

type:

data:file

description:

Treat these sequences as decoys that may have sequence homologous to some known transcript.

required:

False

gencode
label:

Gencode

type:

basic:boolean

description:

This flag will expect the input transcript FASTA to be in GENCODE format, and will split the transcript name at the first ‘|’ character. These reduced names will be used in the output and when looking for these transcripts in a gene to transcript GTF.

default:

False
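
The name reduction described for GENCODE-formatted input can be sketched as follows. The helper name and the example header are illustrative only; this is not Salmon's code.

```python
def reduce_gencode_name(header: str) -> str:
    """Keep only the part of a transcript name before the first '|' character."""
    return header.split("|", 1)[0]
```

For a header such as "ENST00000456328.2|ENSG00000223972.5|DDX11L1", the reduced name would be "ENST00000456328.2". Names without a '|' character are returned unchanged.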

keep_duplicates
label:

Keep duplicates

type:

basic:boolean

description:

This flag will disable the default indexing behavior of discarding sequence-identical duplicate transcripts. If this flag is passed, then duplicate transcripts that appear in the input will be retained and quantified separately.

default:

False

source
label:

Source of attribute ID

type:

basic:string

choices:

  • DICTYBASE: DICTYBASE

  • ENSEMBL: ENSEMBL

  • NCBI: NCBI

  • UCSC: UCSC

species
label:

Species

type:

basic:string

description:

Species Latin name.

choices:

  • Homo sapiens: Homo sapiens

  • Mus musculus: Mus musculus

  • Rattus norvegicus: Rattus norvegicus

  • Dictyostelium discoideum: Dictyostelium discoideum

build
label:

Genome build

type:

basic:string

kmerlen
label:

Size of k-mers

type:

basic:integer

description:

The size of k-mers that should be used for the quasi index. We find that a k of 31 seems to work well for reads of 75bp or longer, but you might consider a smaller k if you plan to deal with shorter reads.

default:

31

Output results

index
label:

Salmon index

type:

basic:dir

source
label:

Source of attribute ID

type:

basic:string

species
label:

Species

type:

basic:string

build
label:

Build

type:

basic:string

Samtools bedcov

data:bedcov:samtools-bedcov (data:alignment:bam  bam, data:bed  bedfile, basic:integer  min_read_qual, basic:boolean  rm_del_ref_skips, basic:string  output_option)[Source: v1.2.0]

Samtools bedcov. Reports the total read base count (i.e. the sum of per base read depths) for each genomic region specified in the supplied BED file. The regions are output as they appear in the BED file and are 0-based. The output is formatted as tab-delimited data, where the initial three columns indicate the chromosome, start, and end positions of the region. The subsequent column provides either the cumulative read base counts or the normalized sum of read base counts based on the length of each individual region (mean coverage). For more information about samtools bedcov, click [here](https://www.htslib.org/doc/samtools-bedcov.html).

Input arguments

bam
label:

Input BAM file

type:

data:alignment:bam

required:

True

disabled:

False

hidden:

False

bedfile
label:

Target BED file

type:

data:bed

description:

Target BED file with regions to extract.

required:

True

disabled:

False

hidden:

False

advanced.min_read_qual
label:

Minimum read mapping quality

type:

basic:integer

description:

Only count reads with mapping quality greater than or equal to this value [-Q].

required:

False

disabled:

False

hidden:

False

advanced.rm_del_ref_skips
label:

Skip deletions and ref skips

type:

basic:boolean

description:

Do not include deletions (D) and ref skips (N) in bedcov computation. [-j]

required:

True

disabled:

False

hidden:

False

default:

False

advanced.output_option
label:

Metric by which to output coverage

type:

basic:string

description:

Opt for either displaying the cumulative read base counts or the normalized read base counts based on the length of each region. The latter approach is not part of samtools but implemented within the resolwe-bio process.

required:

False

disabled:

False

hidden:

False

default:

sum

choices:

  • Sum (default): sum

  • Mean: mean
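
The difference between the two output options can be sketched as below. The function name is hypothetical, and the region length is taken as end minus start on the assumption that BED intervals are 0-based and half-open; the process's own implementation may differ in detail.

```python
def region_coverage(start: int, end: int, base_count_sum: int, output_option: str = "sum") -> float:
    """Report coverage for one BED region as a cumulative sum or a per-base mean."""
    if output_option == "sum":
        return float(base_count_sum)
    if output_option == "mean":
        length = end - start  # 0-based, half-open BED interval
        if length <= 0:
            raise ValueError("region must have positive length")
        return base_count_sum / length
    raise ValueError("output_option must be 'sum' or 'mean'")
```

A 100-base region with a cumulative read base count of 5000 would therefore be reported as 5000 with "sum" and as 50.0 with "mean".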

Output results

coverage_report
label:

Output coverage report

type:

basic:file

required:

True

disabled:

False

hidden:

False

Samtools coverage (multi-sample)

data:samtoolscoverage:multi:samtools-coverage-multi (list:data:alignment:bam  bam, basic:string  region, basic:integer  min_read_length, basic:integer  min_mq, basic:integer  min_bq, list:basic:string  excl_flags, basic:integer  depth, basic:boolean  no_header)[Source: v1.0.0]

Samtools coverage for multiple BAM files. Computes the depth at each position or region and creates tabulated text. For more information about samtools coverage, click [here](https://www.htslib.org/doc/samtools-coverage.html).

Input arguments

bam
label:

Input BAM files

type:

list:data:alignment:bam

description:

Select BAM file(s) for the analysis. Coverage information will be calculated from the merged alignments.

required:

True

disabled:

False

hidden:

False

region
label:

Region

type:

basic:string

description:

Region can be specified as: RNAME:STARTPOS-ENDPOS and all position coordinates are 1-based, where RNAME is the name of the contig. If the input BAM file was generated by General RNA-seq pipeline, you should use only chromosome numbers to subset the input file, e.g. 3:30293-39103.

required:

False

disabled:

False

hidden:

False
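
The RNAME:STARTPOS-ENDPOS format can be validated before a job is submitted. The parser below is a simplified sketch with a hypothetical name; it assumes contig names contain no colon, which samtools itself does not require.

```python
import re
from typing import Tuple

# Contig name, then 1-based start and end positions separated by a hyphen.
REGION_PATTERN = re.compile(r"^(?P<rname>[^:]+):(?P<start>\d+)-(?P<end>\d+)$")


def parse_region(region: str) -> Tuple[str, int, int]:
    """Split a region string into contig name, start and end (1-based)."""
    match = REGION_PATTERN.match(region.strip())
    if match is None:
        raise ValueError(f"not a valid region: {region!r}")
    start, end = int(match.group("start")), int(match.group("end"))
    if start < 1 or end < start:
        raise ValueError("positions must be 1-based with start <= end")
    return match.group("rname"), start, end
```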

advanced.min_read_length
label:

Minimum read length

type:

basic:integer

description:

Ignore reads shorter than specified number of base pairs.

required:

False

disabled:

False

hidden:

False

advanced.min_mq
label:

Minimum mapping quality

type:

basic:integer

description:

Minimum mapping quality for an alignment to be used.

required:

False

disabled:

False

hidden:

False

advanced.min_bq
label:

Minimum base quality

type:

basic:integer

description:

Minimum base quality for a base to be considered.

required:

False

disabled:

False

hidden:

False

advanced.excl_flags
label:

Filter flags

type:

list:basic:string

description:

Filter flags: skip reads with mask bits set. Press ENTER after each flag.

required:

True

disabled:

False

hidden:

False

default:

['UNMAP', 'SECONDARY', 'QCFAIL', 'DUP']
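
The default flag names correspond to bits of the SAM FLAG field. The sketch below combines them into a single mask and tests a read against it; the bit values come from the SAM specification, while the function names are hypothetical.

```python
from typing import Iterable

# SAM FLAG bits for the four default filter flags.
FLAG_BITS = {
    "UNMAP": 0x4,
    "SECONDARY": 0x100,
    "QCFAIL": 0x200,
    "DUP": 0x400,
}


def flags_to_mask(names: Iterable[str]) -> int:
    """Combine flag names into one integer mask."""
    mask = 0
    for name in names:
        mask |= FLAG_BITS[name]
    return mask


def is_skipped(read_flag: int, mask: int) -> bool:
    """A read is skipped if any of the mask bits is set in its FLAG."""
    return (read_flag & mask) != 0
```

The default list is equivalent to the numeric mask 1796.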

advanced.depth
label:

Maximum allowed coverage depth

type:

basic:integer

description:

If 0, depth is set to the maximum integer value effectively removing any depth limit.

required:

True

disabled:

False

hidden:

False

default:

1000000

advanced.no_header
label:

No header

type:

basic:boolean

description:

Do not output header.

required:

True

disabled:

False

hidden:

False

default:

False

Output results

table
label:

Output coverage table

type:

basic:file

required:

True

disabled:

False

hidden:

False

build
label:

Build

type:

basic:string

required:

True

disabled:

False

hidden:

False

species
label:

Species

type:

basic:string

required:

True

disabled:

False

hidden:

False

Samtools coverage (single-sample)

data:samtoolscoverage:single:samtools-coverage-single (data:alignment:bam  bam, basic:string  region, basic:integer  min_read_length, basic:integer  min_mq, basic:integer  min_bq, list:basic:string  excl_flags, basic:integer  depth, basic:boolean  no_header)[Source: v1.0.0]

Samtools coverage for a single BAM file. Computes the depth at each position or region and creates tabulated text. For more information about samtools coverage, click [here](https://www.htslib.org/doc/samtools-coverage.html).

Input arguments

bam
label:

Input BAM file

type:

data:alignment:bam

description:

Select BAM file for the analysis.

required:

True

disabled:

False

hidden:

False

region
label:

Region

type:

basic:string

description:

Region can be specified as: RNAME:STARTPOS-ENDPOS and all position coordinates are 1-based, where RNAME is the name of the contig. If the input BAM file was generated by General RNA-seq pipeline, you should use only chromosome numbers to subset the input file, e.g. 3:30293-39103.

required:

False

disabled:

False

hidden:

False

advanced.min_read_length
label:

Minimum read length

type:

basic:integer

description:

Ignore reads shorter than specified number of base pairs.

required:

False

disabled:

False

hidden:

False

advanced.min_mq
label:

Minimum mapping quality

type:

basic:integer

description:

Minimum mapping quality for an alignment to be used.

required:

False

disabled:

False

hidden:

False

advanced.min_bq
label:

Minimum base quality

type:

basic:integer

description:

Minimum base quality for a base to be considered.

required:

False

disabled:

False

hidden:

False

advanced.excl_flags
label:

Filter flags

type:

list:basic:string

description:

Filter flags: skip reads with mask bits set. Press ENTER after each flag.

required:

True

disabled:

False

hidden:

False

default:

['UNMAP', 'SECONDARY', 'QCFAIL', 'DUP']

advanced.depth
label:

Maximum allowed coverage depth

type:

basic:integer

description:

If 0, depth is set to the maximum integer value effectively removing any depth limit.

required:

True

disabled:

False

hidden:

False

default:

1000000

advanced.no_header
label:

No header

type:

basic:boolean

description:

Do not output header.

required:

True

disabled:

False

hidden:

False

default:

False

Output results

table
label:

Output coverage table

type:

basic:file

required:

True

disabled:

False

hidden:

False

build
label:

Build

type:

basic:string

required:

True

disabled:

False

hidden:

False

species
label:

Species

type:

basic:string

required:

True

disabled:

False

hidden:

False

Samtools fastq (paired-end)

data:reads:fastq:paired:bamtofastq:bamtofastq-paired (data:alignment:bam  bam)[Source: v1.3.2]

Convert aligned reads in BAM format to paired-end FASTQ files.

Input arguments

bam
label:

BAM file

type:

data:alignment:bam

required:

True

disabled:

False

hidden:

False

Output results

fastq
label:

Remaining mate1 reads

type:

list:basic:file

required:

True

disabled:

False

hidden:

False

fastq2
label:

Remaining mate2 reads

type:

list:basic:file

required:

True

disabled:

False

hidden:

False

fastqc_url
label:

Mate1 quality control with FastQC

type:

list:basic:file:html

required:

True

disabled:

False

hidden:

False

fastqc_url2
label:

Mate2 quality control with FastQC

type:

list:basic:file:html

required:

True

disabled:

False

hidden:

False

fastqc_archive
label:

Download mate1 FastQC archive

type:

list:basic:file

required:

True

disabled:

False

hidden:

False

fastqc_archive2
label:

Download mate2 FastQC archive

type:

list:basic:file

required:

True

disabled:

False

hidden:

False

Samtools idxstats

data:samtools:idxstats:samtools-idxstats (data:alignment:bam  alignment)[Source: v1.4.2]

Retrieve and print stats in the index file.

Input arguments

alignment
label:

Alignment

type:

data:alignment:bam

required:

True

disabled:

False

hidden:

False

Output results

report
label:

Samtools idxstats report

type:

basic:file

required:

True

disabled:

False

hidden:

False

Samtools view

data:alignment:bam:samtools:samtools-view (data:alignment:bam  bam, basic:string  region, data:bed  bedfile, basic:boolean  include_header, basic:boolean  only_header, basic:decimal  subsample, basic:integer  subsample_seed, basic:integer  threads)[Source: v1.0.1]

Samtools view. With no options or regions specified, saves all alignments from the input BAM file to the output, also in BAM format. You may specify one or more space-separated region specifications to restrict output to only those alignments which overlap the specified region(s). For more information about samtools view, click [here](https://www.htslib.org/doc/samtools-view.html).

Input arguments

bam
label:

Input BAM file

type:

data:alignment:bam

required:

True

disabled:

False

hidden:

False

region
label:

Region

type:

basic:string

description:

Region can be specified as: RNAME:STARTPOS-ENDPOS and all position coordinates are 1-based, where RNAME is the name of the contig. If the input BAM file was generated by General RNA-seq pipeline, you should use only chromosome numbers to subset the input file, e.g. 3:30293-39103.

required:

False

disabled:

False

hidden:

bedfile

bedfile
label:

Target BED file

type:

data:bed

description:

Target BED file with regions to extract. If the input BAM file was generated by General RNA-seq pipeline, you should use only chromosome numbers to subset the input file, e.g. 3:30292-39103.

required:

False

disabled:

False

hidden:

region

advanced.include_header
label:

Include the header in the output

type:

basic:boolean

required:

True

disabled:

advanced.only_header

hidden:

False

default:

True

advanced.only_header
label:

Output the header only

type:

basic:boolean

description:

Selecting this option overrides all other options.

required:

True

disabled:

advanced.include_header

hidden:

False

default:

False

advanced.subsample
label:

Fraction of the input alignments

type:

basic:decimal

description:

Output only a proportion of the input alignments, as specified by 0.0 ≤ FLOAT ≤ 1.0, which gives the fraction of templates/pairs to be kept. This subsampling acts in the same way on all of the alignment records in the same template or read pair, so it never keeps a read but not its mate.

required:

False

disabled:

False

hidden:

False

advanced.subsample_seed
label:

Subsampling seed

type:

basic:integer

description:

Subsampling seed used to influence which subset of reads is kept. When subsampling data that has previously been subsampled, be sure to use a different seed value from those used previously; otherwise more reads will be retained than expected.

required:

True

disabled:

False

hidden:

!advanced.subsample

default:

11
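
The two properties described above, that mates are kept or dropped together and that reusing a seed on already subsampled data retains more reads than expected, both follow from deciding per read name with a seeded hash. The sketch below illustrates the idea with a CRC32 hash; it is not samtools' actual hash function, and the helper name is hypothetical.

```python
import zlib


def keep_template(read_name: str, fraction: float, seed: int = 11) -> bool:
    """Decide whether to keep a template, using only its name and the seed."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0.0 and 1.0")
    # Both mates share the read name, so they always get the same decision.
    hashed = zlib.crc32(f"{seed}:{read_name}".encode("utf-8"))
    return hashed / 2**32 < fraction


names = [f"read{i}" for i in range(1000)]
first_pass = [n for n in names if keep_template(n, 0.5, seed=11)]
# Same seed again: every surviving name hashes below the threshold again,
# so nothing further is removed.
same_seed = [n for n in first_pass if keep_template(n, 0.5, seed=11)]
```

In this sketch the second pass with an unchanged seed keeps every read from the first pass, which is the situation the description warns about.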

advanced.threads
label:

Number of threads

type:

basic:integer

description:

Number of BAM compression threads to use in addition to main thread.

required:

True

disabled:

False

hidden:

False

default:

2

Output results

bam
label:

Output BAM file

type:

basic:file

required:

True

disabled:

False

hidden:

False

bai
label:

Output index file

type:

basic:file

required:

True

disabled:

False

hidden:

False

stats
label:

Alignment statistics

type:

basic:file

required:

False

disabled:

False

hidden:

False

build
label:

Build

type:

basic:string

required:

True

disabled:

False

hidden:

False

species
label:

Species

type:

basic:string

required:

True

disabled:

False

hidden:

False

Secondary hybrid BAM file

data:alignment:bam:secondaryupload-bam-secondary (data:alignment:bam  bam, basic:file  src, basic:string  species, basic:string  build)[Source: v0.10.0]

Upload a secondary mapping file in BAM format.

Input arguments

bam
label:

Hybrid bam

type:

data:alignment:bam

description:

Secondary bam will be appended to the same sample where hybrid bam is.

required:

False

src
label:

Mapping (BAM)

type:

basic:file

description:

A mapping file in BAM format. The file will be indexed on upload, so additional BAI files are not required.

validate_regex:

\.(bam)$
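
The pattern above accepts only file names that end in a lowercase ".bam" extension. A quick way to try it out, assuming the platform applies the pattern as a search over the file name (an assumption of this sketch) and using a hypothetical helper name:

```python
import re

BAM_NAME = re.compile(r"\.(bam)$")


def is_valid_upload_name(filename: str) -> bool:
    """Return True if the file name ends with the .bam extension."""
    return BAM_NAME.search(filename) is not None
```

Under this reading, "sample.bam" passes, while "sample.bam.bai" and "sample.BAM" do not.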

species
label:

Species

type:

basic:string

description:

Species Latin name.

choices:

  • Drosophila melanogaster: Drosophila melanogaster

  • Mus musculus: Mus musculus

build
label:

Build

type:

basic:string

Output results

bam
label:

Uploaded file

type:

basic:file

bai
label:

Index BAI

type:

basic:file

stats
label:

Alignment statistics

type:

basic:file

species
label:

Species

type:

basic:string

build
label:

Build

type:

basic:string

Single cell BAM file and index

data:alignment:bam:scseq:upload-bam-scseq-indexed (basic:file  src, basic:file  src2, data:screads:  reads, basic:string  species, basic:string  build)[Source: v1.4.1]

Import scSeq BAM file and index.

Input arguments

src
label:

Mapping (BAM)

type:

basic:file

description:

A mapping file in BAM format.

required:

True

disabled:

False

hidden:

False

src2
label:

BAM index (*.bam.bai file)

type:

basic:file

description:

An index file of a BAM mapping file (ending with bam.bai).

required:

True

disabled:

False

hidden:

False

reads
label:

Single cell fastq reads

type:

data:screads:

required:

True

disabled:

False

hidden:

False

species
label:

Species

type:

basic:string

description:

Species Latin name.

required:

True

disabled:

False

hidden:

False

build
label:

Build

type:

basic:string

required:

True

disabled:

False

hidden:

False

Output results

bam
label:

Uploaded BAM

type:

basic:file

required:

True

disabled:

False

hidden:

False

bai
label:

Index BAI

type:

basic:file

required:

True

disabled:

False

hidden:

False

stats
label:

Alignment statistics

type:

basic:file

required:

True

disabled:

False

hidden:

False

build
label:

Build

type:

basic:string

required:

True

disabled:

False

hidden:

False

species
label:

Species

type:

basic:string

required:

True

disabled:

False

hidden:

False

Spike-ins quality control

data:spikeinsspikein-qc (list:data:expression  samples, basic:string  mix)[Source: v1.4.1]

Plot measured spike-in abundances for sample quality control. The process will output graphs showing the correlation between the known concentrations of ERCC spike-ins and each sample’s measured abundances.
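
One common way to quantify such a relationship is a Pearson correlation on log-transformed values. The sketch below is only an illustration under that assumption: the function names, the log2 transform and the pseudocount are not taken from the process itself.

```python
import math
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    # Raises ZeroDivisionError if either sequence is constant.
    return cov / math.sqrt(var_x * var_y)


def log_log_correlation(known: Sequence[float], measured: Sequence[float],
                        pseudocount: float = 1.0) -> float:
    """Correlate log2 known concentrations with log2 measured abundances."""
    log_known = [math.log2(c) for c in known]
    log_measured = [math.log2(m + pseudocount) for m in measured]
    return pearson(log_known, log_measured)
```

A value close to 1 would indicate that measured abundances scale with the known spike-in concentrations.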

Input arguments

samples
label:

Expressions with spike-ins

type:

list:data:expression

mix
label:

Spike-ins mix

type:

basic:string

description:

Select spike-ins mix.

choices:

  • ERCC Mix 1: ercc_mix1

  • ERCC Mix 2: ercc_mix2

  • SIRV-Set 3: sirv_set3

Output results

plots
label:

Plot figures

type:

list:basic:file

required:

False

report
label:

HTML report with results

type:

basic:file:html

required:

False

hidden:

True

report_zip
label:

ZIP file containing HTML report with results

type:

basic:file

required:

False

Subsample FASTQ (paired-end)

data:reads:fastq:paired:seqtk:seqtk-sample-paired (data:reads:fastq:paired  reads, basic:integer  n_reads, basic:integer  seed, basic:decimal  fraction, basic:boolean  two_pass)[Source: v1.5.2]

Subsample reads from FASTQ files (paired-end). [Seqtk](https://github.com/lh3/seqtk) is a fast and lightweight tool for processing sequences in the FASTA or FASTQ format. The Seqtk “sample” command enables subsampling of the large FASTQ file(s).

Input arguments

reads
label:

Reads

type:

data:reads:fastq:paired

required:

True

disabled:

False

hidden:

False

n_reads
label:

Number of reads

type:

basic:integer

required:

True

disabled:

False

hidden:

False

default:

1000000

advanced.seed
label:

Seed

type:

basic:integer

required:

True

disabled:

False

hidden:

False

default:

11

advanced.fraction
label:

Fraction

type:

basic:decimal

description:

Use the fraction of reads [0 - 1.0] from the original input file instead of the absolute number of reads. If set, this will override the ‘Number of reads’ input parameter.

required:

False

disabled:

False

hidden:

False

advanced.two_pass
label:

2-pass mode

type:

basic:boolean

description:

Enable two-pass mode when down-sampling. Two-pass mode is twice as slow but uses much less memory.

required:

True

disabled:

False

hidden:

False

default:

False
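
Sampling both mate files with the same seed keeps pairs in sync, because the choice of which record positions to keep depends only on the seed and the record index. The sketch below shows this with textbook reservoir sampling (Algorithm R); it is not Seqtk's source code, and the function name is hypothetical.

```python
import random
from typing import Iterable, List


def reservoir_sample(records: Iterable[str], n: int, seed: int = 11) -> List[str]:
    """Pick n records uniformly at random in a single pass."""
    rng = random.Random(seed)
    reservoir: List[str] = []
    for index, record in enumerate(records):
        if index < n:
            reservoir.append(record)
        else:
            # Replace a stored record with probability n / (index + 1).
            slot = rng.randint(0, index)
            if slot < n:
                reservoir[slot] = record
    return reservoir


mate1 = [f"read{i}/1" for i in range(100)]
mate2 = [f"read{i}/2" for i in range(100)]
sample1 = reservoir_sample(mate1, 5, seed=11)
sample2 = reservoir_sample(mate2, 5, seed=11)
```

Because the same seed is used for both files, sample1 and sample2 contain the two mates of the same five read pairs, in the same order.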

Output results

fastq
label:

Remaining mate 1 reads

type:

list:basic:file

required:

True

disabled:

False

hidden:

False

fastq2
label:

Remaining mate 2 reads

type:

list:basic:file

required:

True

disabled:

False

hidden:

False

fastqc_url
label:

Mate 1 quality control with FastQC

type:

list:basic:file:html

required:

True

disabled:

False

hidden:

False

fastqc_url2
label:

Mate 2 quality control with FastQC

type:

list:basic:file:html

required:

True

disabled:

False

hidden:

False

fastqc_archive
label:

Download mate 1 FastQC archive

type:

list:basic:file

required:

True

disabled:

False

hidden:

False

fastqc_archive2
label:

Download mate 2 FastQC archive

type:

list:basic:file

required:

True

disabled:

False

hidden:

False

Subsample FASTQ (single-end)

data:reads:fastq:single:seqtk:seqtk-sample-single (data:reads:fastq:single  reads, basic:integer  n_reads, basic:integer  seed, basic:decimal  fraction, basic:boolean  two_pass)[Source: v1.5.2]

Subsample reads from FASTQ file (single-end). [Seqtk](https://github.com/lh3/seqtk) is a fast and lightweight tool for processing sequences in the FASTA or FASTQ format. The Seqtk “sample” command enables subsampling of the large FASTQ file(s).

Input arguments

reads
label:

Reads

type:

data:reads:fastq:single

required:

True

disabled:

False

hidden:

False

n_reads
label:

Number of reads

type:

basic:integer

required:

True

disabled:

False

hidden:

False

default:

1000000

advanced.seed
label:

Seed

type:

basic:integer

required:

True

disabled:

False

hidden:

False

default:

11

advanced.fraction
label:

Fraction

type:

basic:decimal

description:

Use the fraction of reads [0 - 1.0] from the original input file instead of the absolute number of reads. If set, this will override the ‘Number of reads’ input parameter.

required:

False

disabled:

False

hidden:

False

advanced.two_pass
label:

2-pass mode

type:

basic:boolean

description:

Enable two-pass mode when down-sampling. Two-pass mode is twice as slow but uses much less memory.

required:

True

disabled:

False

hidden:

False

default:

False
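
The speed and memory trade-off behind 2-pass mode can be illustrated with a generic sketch. This is not Seqtk's implementation; it only assumes that a single pass has to hold the sampled records in memory (reservoir sampling), whereas two passes can first choose record positions and then stream the input again. The function names are hypothetical.

```python
import random
from typing import Iterable, Iterator, List, Set


def sample_one_pass(records: Iterable[str], k: int, seed: int = 11) -> List[str]:
    """Reservoir sampling: one pass, but k full records stay in memory."""
    rng = random.Random(seed)
    reservoir: List[str] = []
    for i, rec in enumerate(records):
        if i < k:
            reservoir.append(rec)
        else:
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                reservoir[j] = rec
    return reservoir


def sample_two_pass(records: List[str], k: int, seed: int = 11) -> Iterator[str]:
    """Two passes: count and pick positions, then stream the chosen records."""
    rng = random.Random(seed)
    total = sum(1 for _ in records)  # pass 1: count the records
    keep: Set[int] = set(rng.sample(range(total), min(k, total)))
    for i, rec in enumerate(records):  # pass 2: emit selected records
        if i in keep:
            yield rec


reads = [f"read{i}" for i in range(10)]
print(sample_one_pass(reads, 3))
print(list(sample_two_pass(reads, 3)))
```

In the two-pass variant only integer positions are kept in memory, which is why it scales better for large inputs at the cost of reading the data twice.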

Output results

fastq
label:

Remaining reads

type:

list:basic:file

required:

True

disabled:

False

hidden:

False

fastqc_url
label:

Quality control with FastQC

type:

list:basic:file:html

required:

True

disabled:

False

hidden:

False

fastqc_archive
label:

Download FastQC archive

type:

list:basic:file

required:

True

disabled:

False

hidden:

False

Subsample FASTQ and BWA Aln (paired-end)

data:workflow:chipseq:seqtkbwaalnworkflow-subsample-bwa-aln-paired (data:reads:fastq:paired  reads, data:index:bwa  genome, basic:integer  n_reads, basic:integer  seed, basic:decimal  fraction, basic:boolean  two_pass, basic:integer  q, basic:boolean  use_edit, basic:integer  edit_value, basic:decimal  fraction, basic:boolean  seeds, basic:integer  seed_length, basic:integer  seed_dist)[Source: v1.1.0]

Input arguments

reads
label:

Reads

type:

data:reads:fastq:paired

genome
label:

Reference genome

type:

data:index:bwa

downsampling.n_reads
label:

Number of reads

type:

basic:integer

default:

10000000

downsampling.advanced.seed
label:

Seed

type:

basic:integer

default:

11

downsampling.advanced.fraction
label:

Fraction

type:

basic:decimal

description:

Use the fraction of reads [0 - 1.0] from the original input file instead of the absolute number of reads. If set, this will override the “Number of reads” input parameter.

required:

False

downsampling.advanced.two_pass
label:

2-pass mode

type:

basic:boolean

description:

Enable two-pass mode when down-sampling. Two-pass mode is twice as slow but uses much less memory.

default:

True

alignment.q
label:

Quality threshold

type:

basic:integer

description:

Parameter for dynamic read trimming.

default:

5

alignment.use_edit
label:

Use maximum edit distance (excludes fraction of missing alignments)

type:

basic:boolean

default:

False

alignment.edit_value
label:

Maximum edit distance

type:

basic:integer

hidden:

!use_edit

default:

5

alignment.fraction
label:

Fraction of missing alignments

type:

basic:decimal

description:

The fraction of missing alignments given 2% uniform base error rate. The maximum edit distance is automatically chosen for different read lengths.

hidden:

use_edit

default:

0.04

alignment.seeds
label:

Use seeds

type:

basic:boolean

default:

True

alignment.seed_length
label:

Seed length

type:

basic:integer

description:

Use the first X bases of the read as the seed. If X is larger than the query sequence, seeding will be disabled. For long reads, this option typically ranges from 25 to 35 when the seed maximum edit distance is 2.

hidden:

!seeds

default:

32

alignment.seed_dist
label:

Seed maximum edit distance

type:

basic:integer

hidden:

!seeds

default:

2
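
How the alignment inputs above might translate into a `bwa aln` call can be sketched as follows. This is an assumption-laden illustration, not the workflow's code: the helper name `bwa_aln_args` is hypothetical, and the mapping of fields to the `-q`, `-n`, `-l` and `-k` options follows the BWA manual, where `-n` accepts either an integer edit distance or a fractional missing-alignment rate. The `hidden` expressions (`!use_edit`, `use_edit`, `!seeds`) mirror the branches below.

```python
from typing import List


def bwa_aln_args(
    q: int = 5,
    use_edit: bool = False,
    edit_value: int = 5,
    fraction: float = 0.04,
    seeds: bool = True,
    seed_length: int = 32,
    seed_dist: int = 2,
) -> List[str]:
    """Build `bwa aln` options from the alignment inputs (hypothetical helper)."""
    args = ["bwa", "aln", "-q", str(q)]
    # Either the maximum edit distance or the fraction of missing
    # alignments is used, never both.
    args += ["-n", str(edit_value) if use_edit else str(fraction)]
    if seeds:
        args += ["-l", str(seed_length), "-k", str(seed_dist)]
    return args


print(bwa_aln_args())
print(bwa_aln_args(use_edit=True, seeds=False))
```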

Output results

Subsample FASTQ and BWA Aln (single-end)

data:workflow:chipseq:seqtkbwaalnworkflow-subsample-bwa-aln-single (data:reads:fastq:single  reads, data:index:bwa  genome, basic:integer  n_reads, basic:integer  seed, basic:decimal  fraction, basic:boolean  two_pass, basic:integer  q, basic:boolean  use_edit, basic:integer  edit_value, basic:decimal  fraction, basic:boolean  seeds, basic:integer  seed_length, basic:integer  seed_dist)[Source: v1.1.0]

Input arguments

reads
label:

Reads

type:

data:reads:fastq:single

genome
label:

Reference genome

type:

data:index:bwa

downsampling.n_reads
label:

Number of reads

type:

basic:integer

default:

10000000

downsampling.advanced.seed
label:

Seed

type:

basic:integer

default:

11

downsampling.advanced.fraction
label:

Fraction

type:

basic:decimal

description:

Use the fraction of reads [0 - 1.0] from the original input file instead of the absolute number of reads. If set, this will override the “Number of reads” input parameter.

required:

False

downsampling.advanced.two_pass
label:

2-pass mode

type:

basic:boolean

description:

Enable two-pass mode when down-sampling. Two-pass mode is twice as slow but uses much less memory.

default:

True

alignment.q
label:

Quality threshold

type:

basic:integer

description:

Parameter for dynamic read trimming.

default:

5

alignment.use_edit
label:

Use maximum edit distance (excludes fraction of missing alignments)

type:

basic:boolean

default:

False

alignment.edit_value
label:

Maximum edit distance

type:

basic:integer

hidden:

!use_edit

default:

5

alignment.fraction
label:

Fraction of missing alignments

type:

basic:decimal

description:

The fraction of missing alignments given 2% uniform base error rate. The maximum edit distance is automatically chosen for different read lengths.

hidden:

use_edit

default:

0.04

alignment.seeds
label:

Use seeds

type:

basic:boolean

default:

True

alignment.seed_length
label:

Seed length

type:

basic:integer

description:

Use the first X bases of the read as the seed. If X is larger than the query sequence, seeding will be disabled. For long reads, this option typically ranges from 25 to 35 when the seed maximum edit distance is 2.

hidden:

!seeds

default:

32

alignment.seed_dist
label:

Seed maximum edit distance

type:

basic:integer

hidden:

!seeds

default:

2

Output results

Test basic fields

data:test:fieldstest-basic-fields (basic:boolean  boolean, basic:date  date, basic:datetime  datetime, basic:decimal  decimal, basic:integer  integer, basic:string  string, basic:text  text, basic:url:download  url_download, basic:url:view  url_view, basic:string  string2, basic:string  string3, basic:string  string4, basic:string  string5, basic:string  string6, basic:string  string7, basic:string  tricky2)[Source: v1.2.4]

Test with all basic input fields whose values are printed by the processor and returned unmodified as output fields.

Input arguments

boolean
label:

Boolean

type:

basic:boolean

default:

True

date
label:

Date

type:

basic:date

default:

2013-12-31

datetime
label:

Date and time

type:

basic:datetime

default:

2013-12-31 23:59:59

decimal
label:

Decimal

type:

basic:decimal

default:

-123.456

integer
label:

Integer

type:

basic:integer

default:

-123

string
label:

String

type:

basic:string

default:

Foo b-a-r.gz 1.23

text
label:

Text

type:

basic:text

default:

Foo bar in 3 lines.

url_download
label:

URL download

type:

basic:url:download

default:

{'url': 'http://www.w3.org/TR/1998/REC-html40-19980424/html40.pdf'}

url_view
label:

URL view

type:

basic:url:view

default:

{'name': 'Something', 'url': 'http://www.something.com/'}

group.string2
label:

String 2 required

type:

basic:string

description:

String 2 description.

required:

True

disabled:

false

hidden:

false

placeholder:

Enter string

group.string3
label:

String 3 disabled

type:

basic:string

description:

String 3 description.

disabled:

true

default:

disabled

group.string4
label:

String 4 hidden

type:

basic:string

description:

String 4 description.

hidden:

True

default:

hidden

group.string5
label:

String 5 choices

type:

basic:string

description:

String 5 description.

hidden:

False

default:

choice_2

choices:

  • Choice 1: choice_1

  • Choice 2: choice_2

  • Choice 3: choice_3

group.string6
label:

String 6 regex only “Aa”

type:

basic:string

default:

AAaAaaa

validate_regex:

^[aA]*$
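
The effect of this `validate_regex` can be checked with a short sketch. It assumes the pattern is applied to the whole field value; the helper name `is_valid` is hypothetical and the platform's own validation code may differ.

```python
import re

# Pattern copied from the field definition: only "a" and "A" are allowed.
ONLY_A = re.compile(r"^[aA]*$")


def is_valid(value: str) -> bool:
    """Return True when the value passes the regex check."""
    return ONLY_A.match(value) is not None


print(is_valid("AAaAaaa"))  # the field's default value
print(is_valid("AAb"))
```

Because the pattern uses `*`, an empty string also passes the check.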

group.string7
label:

String 7 optional choices

type:

basic:string

description:

String 7 description.

required:

False

hidden:

False

default:

choice_2

choices:

  • Choice 1: choice_1

  • Choice 2: choice_2

  • Choice 3: choice_3

tricky.tricky1.tricky2
label:

Tricky 2

type:

basic:string

default:

true

Output results

output
label:

Result

type:

basic:url:view

out_boolean
label:

Boolean

type:

basic:boolean

out_date
label:

Date

type:

basic:date

out_datetime
label:

Date and time

type:

basic:datetime

out_decimal
label:

Decimal

type:

basic:decimal

out_integer
label:

Integer

type:

basic:integer

out_string
label:

String

type:

basic:string

out_text
label:

Text

type:

basic:text

out_url_download
label:

URL download

type:

basic:url:download

out_url_view
label:

URL view

type:

basic:url:view

out_group.string2
label:

String 2 required

type:

basic:string

description:

String 2 description.

out_group.string3
label:

String 3 disabled

type:

basic:string

description:

String 3 description.

out_group.string4
label:

String 4 hidden

type:

basic:string

description:

String 4 description.

out_group.string5
label:

String 5 choices

type:

basic:string

description:

String 5 description.

out_group.string6
label:

String 6 regex only “Aa”

type:

basic:string

out_group.string7
label:

String 7 optional choices

type:

basic:string

out_tricky.tricky1.tricky2
label:

Tricky 2

type:

basic:string

Test disabled inputs

data:test:disabledtest-disabled (basic:boolean  broad, basic:integer  broad_width, basic:string  width_label, basic:integer  if_and_condition)[Source: v1.2.4]

Test disabled input fields.

Input arguments

broad
label:

Broad peaks

type:

basic:boolean

default:

False

broad_width
label:

Width of peaks

type:

basic:integer

disabled:

broad === false

default:

5

width_label
label:

Width label

type:

basic:string

disabled:

broad === false

default:

FD

if_and_condition
label:

If width is 5 and label FDR

type:

basic:integer

disabled:

broad_width == 5 && width_label == 'FDR'

default:

5
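
The `disabled` conditions above read like JavaScript-style expressions evaluated against the current input values. Under that assumption, their effect can be mirrored in a small sketch; the function name `disabled_states` is hypothetical and this is not how the platform itself evaluates the expressions.

```python
from typing import Dict


def disabled_states(broad: bool, broad_width: int, width_label: str) -> Dict[str, bool]:
    """Mirror the three `disabled` expressions of this test process."""
    return {
        # broad === false
        "broad_width": broad is False,
        # broad === false
        "width_label": broad is False,
        # broad_width == 5 && width_label == 'FDR'
        "if_and_condition": broad_width == 5 and width_label == "FDR",
    }


# Evaluate with the documented defaults (broad=False, broad_width=5, width_label="FD").
print(disabled_states(False, 5, "FD"))
```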

Output results

output
label:

Result

type:

basic:string

Test hidden inputs

data:test:hiddentest-hidden (basic:boolean  broad, basic:integer  broad_width, basic:integer  parameter1, basic:integer  parameter2, basic:integer  broad_width2)[Source: v1.2.4]

Test hidden input fields.

Input arguments

broad
label:

Broad peaks

type:

basic:boolean

default:

False

broad_width
label:

Width of peaks

type:

basic:integer

hidden:

broad === false

default:

5

parameters_broad_f.parameter1
label:

parameter1

type:

basic:integer

default:

10

parameters_broad_f.parameter2
label:

parameter2

type:

basic:integer

default:

10

parameters_broad_t.broad_width2
label:

Width of peaks2

type:

basic:integer

default:

5

Output results

output
label:

Result

type:

basic:string

Test select controller

data:test:resulttest-list (data:test:result  single, list:data:test:result  multiple)[Source: v1.2.4]

Test with all basic input fields whose values are printed by the processor and returned unmodified as output fields.

Input arguments

single
label:

Single

type:

data:test:result

multiple
label:

Multiple

type:

list:data:test:result

Output results

output
label:

Result

type:

basic:string

Test sleep progress

data:test:resulttest-sleep-progress (basic:integer  t)[Source: v1.2.4]

Test for the progress bar by sleeping 5 times for the specified amount of time.

Input arguments

t
label:

Sleep time

type:

basic:integer

default:

5
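
The behaviour described for this test process (sleep five times, reporting progress in between) can be sketched as a generator. The name `sleep_with_progress` and the way progress is reported are assumptions made for illustration only.

```python
import time
from typing import Iterator


def sleep_with_progress(t: int, steps: int = 5) -> Iterator[float]:
    """Sleep `steps` times for `t` seconds, yielding progress after each sleep."""
    for i in range(steps):
        time.sleep(t)
        yield (i + 1) / steps


# t=0 keeps the example instantaneous; the documented default is 5 seconds.
for progress in sleep_with_progress(0):
    print(f"progress: {progress:.0%}")
```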

Output results

output
label:

Result

type:

basic:string

Trim Galore (paired-end)

data:reads:fastq:paired:trimgalore:trimgalore-paired (data:reads:fastq:paired  reads, list:basic:string  adapter, list:basic:string  adapter_2, data:seq:nucleotide  adapter_file_1, data:seq:nucleotide  adapter_file_2, basic:string  universal_adapter, basic:integer  stringency, basic:decimal  error_rate, basic:integer  quality, basic:integer  nextseq, basic:string  phred, basic:integer  min_length, basic:integer  max_n, basic:boolean  retain_unpaired, basic:integer  unpaired_len_1, basic:integer  unpaired_len_2, basic:integer  clip_r1, basic:integer  clip_r2, basic:integer  three_prime_r1, basic:integer  three_prime_r2, basic:integer  trim_5, basic:integer  trim_3)[Source: v1.3.2]

Process paired-end sequencing reads with Trim Galore. Trim Galore is a wrapper script that uses the publicly available adapter trimming tool Cutadapt, and runs FastQC for quality control once trimming has completed. Low-quality ends are trimmed from reads in addition to adapter removal in a single pass. If no adapter sequence is supplied, Trim Galore attempts to auto-detect the adapter that has been used. To do so, it analyses the first 1 million sequences of the first specified file and attempts to find the first 12 or 13 bp of the following standard adapters: Illumina: AGATCGGAAGAGC, Small RNA: TGGAATTCTCGG, Nextera: CTGTCTCTTATA. If no adapter contamination can be detected within the first 1 million sequences, or in case of a tie between several different adapters, Trim Galore defaults to Illumina adapters. For additional information, see the official [user guide](https://github.com/FelixKrueger/TrimGalore/blob/master/Docs/Trim_Galore_User_Guide.md).
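
The auto-detection rule described above can be sketched in a few lines. This is an illustrative approximation, not Trim Galore's code: the function name `detect_adapter` is hypothetical, and counting a read once per adapter it contains is an assumption. The adapter prefixes and the fallback to Illumina on no hit or a tie come from the description.

```python
from collections import Counter
from typing import Iterable

ADAPTERS = {
    "Illumina": "AGATCGGAAGAGC",
    "Small RNA": "TGGAATTCTCGG",
    "Nextera": "CTGTCTCTTATA",
}


def detect_adapter(sequences: Iterable[str], limit: int = 1_000_000) -> str:
    """Guess the adapter type from the first `limit` sequences."""
    counts: Counter = Counter()
    for n, seq in enumerate(sequences):
        if n >= limit:
            break
        for name, prefix in ADAPTERS.items():
            if prefix in seq:
                counts[name] += 1
    ranked = counts.most_common()
    if not ranked:
        return "Illumina"  # no contamination detected
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return "Illumina"  # tie between adapters
    return ranked[0][0]


print(detect_adapter(["ACGTCTGTCTCTTATAAC", "GGCTGTCTCTTATA", "ACGT"]))
```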

Input arguments

reads
label:

Select paired-end reads

type:

data:reads:fastq:paired

required:

True

disabled:

False

hidden:

False

adapter_trim.adapter
label:

Read 1 adapter sequence

type:

list:basic:string

description:

Adapter sequences to be trimmed. Also see universal adapters field for predefined adapters. This is mutually exclusive with read 1 adapters file and universal adapters.

required:

False

disabled:

False

hidden:

False

default:

[]

adapter_trim.adapter_2
label:

Read 2 adapter sequence

type:

list:basic:string

description:

Optional adapter sequence to be trimmed off read 2 of paired-end files. This is mutually exclusive with read 2 adapters file and universal adapters.

required:

False

disabled:

False

hidden:

False

default:

[]

adapter_trim.adapter_file_1
label:

Read 1 adapters file

type:

data:seq:nucleotide

description:

This is mutually exclusive with read 1 adapters and universal adapters.

required:

False

disabled:

False

hidden:

False

adapter_trim.adapter_file_2
label:

Read 2 adapters file

type:

data:seq:nucleotide

description:

This is mutually exclusive with read 2 adapters and universal adapters.

required:

False

disabled:

False

hidden:

False

adapter_trim.universal_adapter
label:

Universal adapters

type:

basic:string

description:

Instead of default detection use specific adapters. Use 13bp of the Illumina universal adapter, 12bp of the Nextera adapter or 12bp of the Illumina Small RNA 3’ Adapter. Selecting to trim smallRNA adapters will also lower the length value to 18bp. If the smallRNA libraries are paired-end then read 2 adapter will be set to the Illumina small RNA 5’ adapter automatically (GATCGTCGGACT) unless defined explicitly. This is mutually exclusive with manually defined adapters and adapter files.

required:

False

disabled:

False

hidden:

False

choices:

  • Illumina: --illumina

  • Nextera: --nextera

  • Illumina small RNA: --small_rna
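
The mutual-exclusivity rules stated in the adapter field descriptions can be summarised as a validation sketch. The function `adapter_conflicts` is hypothetical and only restates the documented constraints; the process itself may report such conflicts differently.

```python
from typing import List, Optional, Sequence


def adapter_conflicts(
    adapter: Sequence[str] = (),
    adapter_2: Sequence[str] = (),
    adapter_file_1: Optional[str] = None,
    adapter_file_2: Optional[str] = None,
    universal_adapter: Optional[str] = None,
) -> List[str]:
    """Return a list of conflicts between the adapter inputs."""
    problems: List[str] = []
    if adapter and adapter_file_1:
        problems.append("read 1: adapter sequence and adapters file are both set")
    if adapter_2 and adapter_file_2:
        problems.append("read 2: adapter sequence and adapters file are both set")
    manual = adapter or adapter_2 or adapter_file_1 or adapter_file_2
    if universal_adapter and manual:
        problems.append("universal adapters combined with manual adapters or files")
    return problems


print(adapter_conflicts(adapter=["AGATCGGAAGAGC"], universal_adapter="--illumina"))
```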

adapter_trim.stringency
label:

Overlap with adapter sequence required to trim

type:

basic:integer

description:

Defaults to a very stringent setting of 1, i.e. even a single base pair of overlapping sequence will be trimmed off the 3’ end of any read.

required:

True

disabled:

False

hidden:

False

default:

1

adapter_trim.error_rate
label:

Maximum allowed error rate

type:

basic:decimal

description:

Number of errors divided by the length of the matching region

required:

True

disabled:

False

hidden:

False

default:

0.1
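
As a worked illustration of how the overlap (stringency) and error-rate settings interact, the sketch below assumes that the number of tolerated errors is the error rate multiplied by the length of the matching region, rounded down. The helper names are hypothetical and the exact rounding used by the underlying tool is an assumption.

```python
import math


def max_errors(overlap: int, error_rate: float = 0.1) -> int:
    """Errors tolerated in a matching region of the given length (assumed floor)."""
    return math.floor(error_rate * overlap)


def would_trim(overlap: int, errors: int, stringency: int = 1, error_rate: float = 0.1) -> bool:
    """Trim when the overlap is long enough and within the error budget."""
    return overlap >= stringency and errors <= max_errors(overlap, error_rate)


print(max_errors(13))            # error budget for a full 13 bp adapter match
print(would_trim(1, 0))          # a single matching base with the default stringency
print(would_trim(3, 0, stringency=5))
```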

quality_trim.quality
label:

Quality cutoff

type:

basic:integer

description:

Trim low-quality ends from reads based on phred score.

required:

True

disabled:

False

hidden:

False

default:

20

quality_trim.nextseq
label:

NextSeq/NovaSeq trim cutoff

type:

basic:integer

description:

NextSeq/NovaSeq-specific quality trimming. Also trims dark cycles appearing as high-quality G bases. This will set a specific quality cutoff, but qualities of G bases are ignored. This cannot be used with Quality cutoff and will override it.

required:

False

disabled:

False

hidden:

False

quality_trim.phred
label:

Phred score encoding

type:

basic:string

description:

Use either ASCII+33 quality scores as Phred scores (Sanger/Illumina 1.9+ encoding) or ASCII+64 quality scores (Illumina 1.5 encoding) for quality trimming

required:

True

disabled:

False

hidden:

False

default:

--phred33

choices:

  • ASCII+33: --phred33

  • ASCII+64: --phred64
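
The difference between the two encodings is only the ASCII offset subtracted from each quality character. A minimal sketch (the function name `phred_scores` is hypothetical; the option strings are the choice values listed above):

```python
from typing import List

OFFSETS = {"--phred33": 33, "--phred64": 64}


def phred_scores(quality: str, encoding: str = "--phred33") -> List[int]:
    """Decode a FASTQ quality string into Phred scores."""
    offset = OFFSETS[encoding]
    return [ord(ch) - offset for ch in quality]


print(phred_scores("I5!"))               # ASCII+33
print(phred_scores("hT@", "--phred64"))  # ASCII+64
```

Decoding with the wrong offset shifts every score by 31, which is why the encoding has to match the data before the quality cutoff is applied.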

quality_trim.min_length
label:

Minimum length after trimming

type:

basic:integer

description:

Discard reads that became shorter than selected length because of either quality or adapter trimming. Both reads of a read-pair need to be longer than specified length to be printed out to validated paired-end files. If only one read became too short there is the possibility of keeping such unpaired single-end reads with Retain unpaired. A value of 0 disables filtering based on length.

required:

True

disabled:

False

hidden:

False

default:

20

quality_trim.max_n
label:

Maximum number of Ns

type:

basic:integer

description:

A read exceeding this limit will result in the entire pair being removed from the trimmed output files.

required:

False

disabled:

False

hidden:

False

quality_trim.retain_unpaired
label:

Retain unpaired reads after trimming

type:

basic:boolean

description:

If only one of the two paired-end reads became too short, the longer read will be written.

required:

True

disabled:

False

hidden:

False

default:

False

quality_trim.unpaired_len_1
label:

Unpaired read length cutoff for mate 1

type:

basic:integer

required:

True

disabled:

False

hidden:

!quality_trim.retain_unpaired

default:

35

quality_trim.unpaired_len_2
label:

Unpaired read length cutoff for mate 2

type:

basic:integer

required:

True

disabled:

False

hidden:

!quality_trim.retain_unpaired

default:

35
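
The length-filtering rules for read pairs described above can be sketched as a small classifier. This is a simplified illustration under stated assumptions: the function name and return labels are hypothetical, a read is treated as passing when its length is at least the cutoff, and mate 1 is checked before mate 2.

```python
def classify_pair(
    len1: int,
    len2: int,
    min_length: int = 20,
    retain_unpaired: bool = False,
    unpaired_len_1: int = 35,
    unpaired_len_2: int = 35,
) -> str:
    """Decide where a trimmed read pair ends up."""
    # With min_length == 0 every pair passes, i.e. length filtering is off.
    if len1 >= min_length and len2 >= min_length:
        return "paired"
    if retain_unpaired:
        if len1 >= unpaired_len_1:
            return "unpaired_1"
        if len2 >= unpaired_len_2:
            return "unpaired_2"
    return "discarded"


print(classify_pair(50, 10))
print(classify_pair(50, 10, retain_unpaired=True))
```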

quality_trim.clip_r1
label:

Trim bases from 5’ end of read 1

type:

basic:integer

description:

This may be useful if the qualities were very poor, or if there is some sort of unwanted bias at the 5’ end.

required:

False

disabled:

False

hidden:

False

quality_trim.clip_r2
label:

Trim bases from 5’ end of read 2

type:

basic:integer

description:

This may be useful if the qualities were very poor, or if there is some sort of unwanted bias at the 5’ end. For paired-end bisulfite sequencing, it is recommended to remove the first few bp because the end-repair reaction may introduce a bias towards low methylation.

required:

False

disabled:

False

hidden:

False

quality_trim.three_prime_r1
label:

Trim bases from 3’ end of read 1

type:

basic:integer

description:

Remove bases from the 3’ end of read 1 after adapter/quality trimming has been performed. This may remove some unwanted bias from the 3’ end that is not directly related to adapter sequence or basecall quality.

required:

False

disabled:

False

hidden:

False

quality_trim.three_prime_r2
label:

Trim bases from 3’ end of read 2

type:

basic:integer

description:

Remove bases from the 3’ end of read 2 after adapter/quality trimming has been performed. This may remove some unwanted bias from the 3’ end that is not directly related to adapter sequence or basecall quality.

required:

False

disabled:

False

hidden:

False

hard_trim.trim_5
label:

Hard trim sequences from 3’ end

type:

basic:integer

description:

Instead of performing adapter-/quality trimming, this option will simply hard-trim sequences to the specified number of bp from the 3’ end. This is incompatible with other hard trimming options.

required:

False

disabled:

False

hidden:

False

hard_trim.trim_3
label:

Hard trim sequences from 5’ end

type:

basic:integer

description:

Instead of performing adapter-/quality trimming, this option will simply hard-trim sequences to the specified number of bp from the 5’ end. This is incompatible with other hard trimming options.

required:

False

disabled:

False

hidden:

False

Output results

fastq
label:

Remaining mate 1 reads

type:

list:basic:file

required:

True

disabled:

False

hidden:

False

fastq2
label:

Remaining mate 2 reads

type:

list:basic:file

required:

True

disabled:

False

hidden:

False

report
label:

Trim galore report

type:

basic:file

required:

False

disabled:

False

hidden:

False

fastqc_url
label:

Mate 1 quality control with FastQC

type:

list:basic:file:html

required:

True

disabled:

False

hidden:

False

fastqc_url2
label:

Mate 2 quality control with FastQC

type:

list:basic:file:html

required:

True

disabled:

False

hidden:

False

fastqc_archive
label:

Download mate 1 FastQC archive

type:

list:basic:file

required:

True

disabled:

False

hidden:

False

fastqc_archive2
label:

Download mate 2 FastQC archive

type:

list:basic:file

required:

True

disabled:

False

hidden:

False

Trimmomatic (paired-end)

data:reads:fastq:paired:trimmomatictrimmomatic-paired (data:reads:fastq:paired  reads, data:seq:nucleotide  adapters, basic:integer  seed_mismatches, basic:integer  simple_clip_threshold, basic:integer  palindrome_clip_threshold, basic:integer  min_adapter_length, basic:boolean  keep_both_reads, basic:integer  window_size, basic:integer  required_quality, basic:integer  target_length, basic:decimal  strictness, basic:integer  leading, basic:integer  trailing, basic:integer  crop, basic:integer  headcrop, basic:integer  minlen, basic:integer  average_quality)[Source: v2.5.2]

Trimmomatic performs a variety of useful trimming tasks including removing adapters for Illumina paired-end and single-end data. FastQC is performed for quality control checks on trimmed raw sequence data, which are the output of Trimmomatic. See [Trimmomatic official website](http://www.usadellab.org/cms/?page=trimmomatic), the [introductory paper](https://www.ncbi.nlm.nih.gov/pubmed/24695404), and the [FastQC official website](https://www.bioinformatics.babraham.ac.uk/projects/fastqc/) for more information.

Input arguments

reads
label:

Reads

type:

data:reads:fastq:paired

illuminaclip.adapters
label:

Adapter sequences

type:

data:seq:nucleotide

description:

Adapter sequence in FASTA format that will be removed from the read. This field as well as ‘Seed mismatches’, ‘Simple clip threshold’ and ‘Palindrome clip threshold’ parameters are needed to perform Illuminaclip. ‘Minimum adapter length’ and ‘Keep both reads’ are optional parameters.

required:

False

illuminaclip.seed_mismatches
label:

Seed mismatches

type:

basic:integer

description:

Specifies the maximum mismatch count which will still allow a full match to be performed. This field as well as ‘Adapter sequence’, ‘Simple clip threshold’ and ‘Palindrome clip threshold’ parameters are needed to perform Illuminaclip.

required:

False

disabled:

!illuminaclip.adapters

illuminaclip.simple_clip_threshold
label:

Simple clip threshold

type:

basic:integer

description:

Specifies how accurate the match between any adapter etc. sequence must be against a read. This field as well as ‘Adapter sequence’, ‘Seed mismatches’ and ‘Palindrome clip threshold’ parameters are needed to perform Illuminaclip.

required:

False

disabled:

!illuminaclip.adapters

illuminaclip.palindrome_clip_threshold
label:

Palindrome clip threshold

type:

basic:integer

description:

Specifies how accurate the match between the two ‘adapter ligated’ reads must be for PE palindrome read alignment. This field as well as ‘Adapter sequence’, ‘Simple clip threshold’ and ‘Seed mismatches’ parameters are needed to perform Illuminaclip.

required:

False

disabled:

!illuminaclip.adapters

illuminaclip.min_adapter_length
label:

Minimum adapter length

type:

basic:integer

description:

In addition to the alignment score, palindrome mode can verify that a minimum length of adapter has been detected. If unspecified, this defaults to 8 bases, for historical reasons. However, since palindrome mode has a very low false positive rate, this can be safely reduced, even down to 1, to allow shorter adapter fragments to be removed. This field is optional for performing Illuminaclip. ‘Adapter sequences’, ‘Seed mismatches’, ‘Simple clip threshold’ and ‘Palindrome clip threshold’ are also needed in order to use this parameter.

disabled:

!illuminaclip.seed_mismatches && !illuminaclip.simple_clip_threshold && !illuminaclip.palindrome_clip_threshold

default:

8

illuminaclip.keep_both_reads
label:

Keep both reads

type:

basic:boolean

description:

After read-through has been detected by palindrome mode, and the adapter sequence removed, the reverse read contains the same sequence information as the forward read, albeit in reverse complement. For this reason, the default behaviour is to entirely drop the reverse read. By specifying this parameter, the reverse read will also be retained, which may be useful e.g. if the downstream tools cannot handle a combination of paired and unpaired reads. This field is optional for performing Illuminaclip. ‘Adapter sequence’, ‘Seed mismatches’, ‘Simple clip threshold’, ‘Palindrome clip threshold’ and also ‘Minimum adapter length’ are needed in order to use this parameter.

required:

False

disabled:

!illuminaclip.seed_mismatches && !illuminaclip.simple_clip_threshold && !illuminaclip.palindrome_clip_threshold && !illuminaclip.min_adapter_length
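
The Illuminaclip inputs above correspond to the fields of Trimmomatic's `ILLUMINACLIP` step, which the Trimmomatic manual gives as `ILLUMINACLIP:<fastaWithAdaptersEtc>:<seed mismatches>:<palindrome clip threshold>:<simple clip threshold>:<minAdapterLength>:<keepBothReads>`. The sketch below assembles such a step string; the helper name `illuminaclip_step`, the file name and the example threshold values are hypothetical, and this is not the process's own code.

```python
from typing import Optional


def illuminaclip_step(
    adapters: str,
    seed_mismatches: int,
    palindrome_clip_threshold: int,
    simple_clip_threshold: int,
    min_adapter_length: int = 8,
    keep_both_reads: Optional[bool] = None,
) -> str:
    """Assemble an ILLUMINACLIP step string from the illuminaclip inputs."""
    parts = [
        "ILLUMINACLIP",
        adapters,
        str(seed_mismatches),
        str(palindrome_clip_threshold),
        str(simple_clip_threshold),
        str(min_adapter_length),
    ]
    # keepBothReads is the last optional field and is only added when set.
    if keep_both_reads is not None:
        parts.append("true" if keep_both_reads else "false")
    return ":".join(parts)


print(illuminaclip_step("adapters.fa", 2, 30, 10))
print(illuminaclip_step("adapters.fa", 2, 30, 10, min_adapter_length=1, keep_both_reads=True))
```

Note that the palindrome threshold precedes the simple clip threshold in the step string, which is easy to get wrong because the input fields are listed in the opposite order.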

slidingwindow.window_size
label:

Window size

type:

basic:integer

description:

Specifies the number of bases to average across. This field as well as ‘Required quality’ are needed to perform a ‘Sliding window’ trimming (cutting once the average quality within the window falls below a threshold).

required:

False

slidingwindow.required_quality
label:

Required quality

type:

basic:integer

description:

Specifies the average quality required. This field as well as ‘Window size’ are needed to perform a ‘Sliding window’ trimming (cutting once the average quality within the window falls below a threshold).

required:

False
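
The sliding-window idea described by ‘Window size’ and ‘Required quality’ can be sketched as follows. This is a simplified illustration, not Trimmomatic's implementation (which handles further details at the read end); the function name `sliding_window_cut` is hypothetical.

```python
from typing import List


def sliding_window_cut(quals: List[int], window_size: int, required_quality: float) -> int:
    """Return how many leading bases to keep.

    The read is scanned from the 5' end and cut at the start of the first
    window whose average quality falls below the required quality.
    """
    for start in range(0, len(quals) - window_size + 1):
        window = quals[start:start + window_size]
        if sum(window) / window_size < required_quality:
            return start
    return len(quals)  # no window failed, keep the whole read


quals = [30, 30, 30, 30, 10, 10, 10, 10]
keep = sliding_window_cut(quals, window_size=4, required_quality=20)
print(keep, quals[:keep])
```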

maxinfo.target_length
label:

Target length

type:

basic:integer

description:

This specifies the read length which is likely to allow the location of the read within the target sequence to be determined. This field as well as ‘Strictness’ are needed to perform ‘Maxinfo’ feature (an adaptive quality trimmer which balances read length and error rate to maximise the value of each read).

required:

False

maxinfo.strictness
label:

Strictness

type:

basic:decimal

description:

This value, which should be set between 0 and 1, specifies the balance between preserving as much read length as possible vs. removal of incorrect bases. A low value of this parameter (<0.2) favours longer reads, while a high value (>0.8) favours read correctness. This field as well as ‘Target length’ are needed to perform ‘Maxinfo’ feature (an adaptive quality trimmer which balances read length and error rate to maximise the value of each read).

required:

False

trim_bases.leading
label:

Leading quality

type:

basic:integer

description:

Remove low quality bases from the beginning. Specifies the minimum quality required to keep a base.

required:

False

trim_bases.trailing
label:

Trailing

type:

basic:integer

description:

Remove low quality bases from the end. Specifies the minimum quality required to keep a base.

required:

False

trim_bases.crop
label:

Crop

type:

basic:integer

description:

Cut the read to a specified length by removing bases from the end.

required:

False

trim_bases.headcrop
label:

Headcrop

type:

basic:integer

description:

Cut the specified number of bases from the start of the read.

required:

False
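
The four base-trimming inputs above can be illustrated with a single sketch. The function `trim_bases` is hypothetical, and the order in which the steps are applied here (leading, trailing, crop, headcrop) is an assumption; in Trimmomatic itself the steps run in the order they are given on the command line.

```python
from typing import List, Optional, Tuple


def trim_bases(
    seq: str,
    quals: List[int],
    leading: Optional[int] = None,
    trailing: Optional[int] = None,
    crop: Optional[int] = None,
    headcrop: Optional[int] = None,
) -> Tuple[str, List[int]]:
    """Apply leading/trailing quality trimming, then crop and headcrop."""
    start, end = 0, len(seq)
    if leading is not None:
        # Drop bases from the start while they are below the threshold.
        while start < end and quals[start] < leading:
            start += 1
    if trailing is not None:
        # Drop bases from the end while they are below the threshold.
        while end > start and quals[end - 1] < trailing:
            end -= 1
    seq, quals = seq[start:end], quals[start:end]
    if crop is not None:
        seq, quals = seq[:crop], quals[:crop]  # keep at most `crop` bases
    if headcrop is not None:
        seq, quals = seq[headcrop:], quals[headcrop:]  # remove from the start
    return seq, quals


print(trim_bases("ACGTACGT", [2, 30, 30, 30, 30, 30, 2, 2], leading=3, trailing=3))
```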

reads_filtering.minlen
label:

Minimum length

type:

basic:integer

description:

Drop the read if it is below a specified length.

required:

False

reads_filtering.average_quality
label:

Average quality

type:

basic:integer

description:

Drop the read if the average quality is below the specified level.

required:

False
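
The two read filters above reduce to a simple keep-or-drop decision per read. A minimal sketch, assuming quality values are already decoded to integers and that an empty read fails the average-quality filter; the function name `keep_read` is hypothetical.

```python
from typing import List, Optional


def keep_read(
    quals: List[int],
    minlen: Optional[int] = None,
    average_quality: Optional[float] = None,
) -> bool:
    """Return False when the read should be dropped by either filter."""
    if minlen is not None and len(quals) < minlen:
        return False
    if average_quality is not None:
        if not quals or sum(quals) / len(quals) < average_quality:
            return False
    return True


print(keep_read([30, 30, 10, 10], minlen=3, average_quality=15))
print(keep_read([30, 30, 10, 10], minlen=5))
```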

Output results

fastq
label:

Reads file (mate 1)

type:

list:basic:file

fastq_unpaired
label:

Reads file

type:

basic:file

required:

False

fastq2
label:

Reads file (mate 2)

type:

list:basic:file

fastq2_unpaired
label:

Reads file

type:

basic:file

required:

False

fastqc_url
label:

Quality control with FastQC (Upstream)

type:

list:basic:file:html

fastqc_url2
label:

Quality control with FastQC (Downstream)

type:

list:basic:file:html

fastqc_archive
label:

Download FastQC archive (Upstream)

type:

list:basic:file

fastqc_archive2
label:

Download FastQC archive (Downstream)

type:

list:basic:file

Trimmomatic (single-end)

data:reads:fastq:single:trimmomatictrimmomatic-single (data:reads:fastq:single  reads, data:seq:nucleotide  adapters, basic:integer  seed_mismatches, basic:integer  simple_clip_threshold, basic:integer  window_size, basic:integer  required_quality, basic:integer  target_length, basic:decimal  strictness, basic:integer  leading, basic:integer  trailing, basic:integer  crop, basic:integer  headcrop, basic:integer  minlen, basic:integer  average_quality)[Source: v2.5.2]

Trimmomatic performs a variety of useful trimming tasks including removing adapters for Illumina paired-end and single-end data. FastQC is performed for quality control checks on trimmed raw sequence data, which are the output of Trimmomatic. See [Trimmomatic official website](http://www.usadellab.org/cms/?page=trimmomatic), the [introductory paper](https://www.ncbi.nlm.nih.gov/pubmed/24695404), and the [FastQC official website](https://www.bioinformatics.babraham.ac.uk/projects/fastqc/) for more information.

Input arguments

reads
label:

Reads

type:

data:reads:fastq:single

illuminaclip.adapters
label:

Adapter sequences

type:

data:seq:nucleotide

description:

Adapter sequence in FASTA format that will be removed from the read. This field as well as ‘Seed mismatches’ and ‘Simple clip threshold’ parameters are needed to perform Illuminaclip.

required:

False

illuminaclip.seed_mismatches
label:

Seed mismatches

type:

basic:integer

description:

Specifies the maximum mismatch count which will still allow a full match to be performed. This field as well as ‘Adapter sequences’ and ‘Simple clip threshold’ parameters are needed to perform Illuminaclip.

required:

False

disabled:

!illuminaclip.adapters

illuminaclip.simple_clip_threshold
label:

Simple clip threshold

type:

basic:integer

description:

Specifies how accurate the match between any adapter etc. sequence must be against a read. This field as well as ‘Adapter sequences’ and ‘Seed mismatches’ parameters are needed to perform Illuminaclip.

required:

False

disabled:

!illuminaclip.adapters

slidingwindow.window_size
label:

Window size

type:

basic:integer

description:

Specifies the number of bases to average across. This field as well as ‘Required quality’ are needed to perform a ‘Sliding window’ trimming (cutting once the average quality within the window falls below a threshold).

required:

False

slidingwindow.required_quality
label:

Required quality

type:

basic:integer

description:

Specifies the average quality required within the window. This field as well as ‘Window size’ are needed to perform a ‘Sliding window’ trimming (cutting once the average quality within the window falls below a threshold).

required:

False
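The ‘Sliding window’ trimming described above can be illustrated with a minimal sketch. It is a simplification and not Trimmomatic's actual implementation: the function name `sliding_window_keep_length` is hypothetical, the read is cut at the start of the first window whose average quality falls below the threshold, and reads shorter than the window are assumed to be kept whole.

```python
from typing import Sequence


def sliding_window_keep_length(
    qualities: Sequence[int], window_size: int, required_quality: float
) -> int:
    """Return how many leading bases to keep (simplified sliding-window trim)."""
    if window_size <= 0:
        raise ValueError("window_size must be positive")
    n = len(qualities)
    # Slide a fixed-size window from the 5' end towards the 3' end.
    for start in range(0, max(n - window_size + 1, 0)):
        window = qualities[start:start + window_size]
        if sum(window) / window_size < required_quality:
            # Assumption: cut at the start of the first failing window.
            return start
    # No window failed (or the read is shorter than the window): keep all bases.
    return n


quals = [30, 30, 30, 30, 10, 10, 10, 10]
print(sliding_window_keep_length(quals, window_size=4, required_quality=20))
```

With a window size of 4 and a required quality of 20, the first window averaging below 20 starts at index 3, so three bases are kept in this example.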

maxinfo.target_length
label:

Target length

type:

basic:integer

description:

This specifies the read length which is likely to allow the location of the read within the target sequence to be determined. This field as well as ‘Strictness’ are needed to perform ‘Maxinfo’ feature (an adaptive quality trimmer which balances read length and error rate to maximise the value of each read).

required:

False

maxinfo.strictness
label:

Strictness

type:

basic:decimal

description:

This value, which should be set between 0 and 1, specifies the balance between preserving as much read length as possible vs. removal of incorrect bases. A low value of this parameter (<0.2) favours longer reads, while a high value (>0.8) favours read correctness. This field as well as ‘Target length’ are needed to perform ‘Maxinfo’ feature (an adaptive quality trimmer which balances read length and error rate to maximise the value of each read).

required:

False

trim_bases.leading
label:

Leading quality

type:

basic:integer

description:

Remove low quality bases from the beginning, if below a threshold quality.

required:

False

trim_bases.trailing
label:

Trailing quality

type:

basic:integer

description:

Remove low quality bases from the end, if below a threshold quality.

required:

False

trim_bases.crop
label:

Crop

type:

basic:integer

description:

Cut the read to a specified length by removing bases from the end.

required:

False

trim_bases.headcrop
label:

Headcrop

type:

basic:integer

description:

Cut the specified number of bases from the start of the read.

required:

False
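The four ‘trim_bases’ options above can be sketched as follows. This is an illustrative sketch only: the function name `trim_bases` is hypothetical, and the order in which the steps are applied here (leading, trailing, crop, headcrop) is an assumption, not necessarily the order used by the process.

```python
from typing import List, Optional, Sequence, Tuple


def trim_bases(
    seq: str,
    quals: Sequence[int],
    leading: Optional[int] = None,
    trailing: Optional[int] = None,
    crop: Optional[int] = None,
    headcrop: Optional[int] = None,
) -> Tuple[str, List[int]]:
    """Apply leading/trailing quality trimming, then crop and headcrop."""
    start, end = 0, len(seq)
    if leading is not None:
        # Remove bases from the start while their quality is below the threshold.
        while start < end and quals[start] < leading:
            start += 1
    if trailing is not None:
        # Remove bases from the end while their quality is below the threshold.
        while end > start and quals[end - 1] < trailing:
            end -= 1
    seq, quals = seq[start:end], list(quals[start:end])
    if crop is not None:
        # Keep at most `crop` bases, removing the rest from the end.
        seq, quals = seq[:crop], quals[:crop]
    if headcrop is not None:
        # Remove `headcrop` bases from the start.
        seq, quals = seq[headcrop:], quals[headcrop:]
    return seq, quals


q = [2, 2, 30, 30, 30, 30, 2, 2]
print(trim_bases("ACGTACGT", q, leading=3, trailing=3))
```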

reads_filtering.minlen
label:

Minimum length

type:

basic:integer

description:

Drop the read if it is below a specified length.

required:

False

reads_filtering.average_quality
label:

Average quality

type:

basic:integer

description:

Drop the read if the average quality is below the specified level.

required:

False
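The two ‘reads_filtering’ options above amount to a per-read keep-or-drop decision, which might be sketched like this. The function name `keep_read` is hypothetical, and treating an empty read as failing the average-quality check is an assumption of this sketch.

```python
from typing import Optional, Sequence


def keep_read(
    quals: Sequence[int],
    minlen: Optional[int] = None,
    average_quality: Optional[float] = None,
) -> bool:
    """Return True if the read passes both optional filters."""
    if minlen is not None and len(quals) < minlen:
        # Drop the read if it is below the specified length.
        return False
    if average_quality is not None:
        # Drop the read if its average quality is below the specified level.
        if not quals or sum(quals) / len(quals) < average_quality:
            return False
    return True


print(keep_read([30] * 50, minlen=36, average_quality=20))
```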

Output results

fastq
label:

Reads file

type:

list:basic:file

fastqc_url
label:

Quality control with FastQC

type:

list:basic:file:html

fastqc_archive
label:

Download FastQC archive

type:

list:basic:file

UMI-tools dedup

data:alignment:bam:umitools:dedup:umi-tools-dedup (data:alignment:bam  alignment)[Source: v1.5.1]

Deduplicate reads using UMI and mapping coordinates.
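The idea of deduplicating on UMI and mapping coordinates can be illustrated with a minimal sketch. It is not the UMI-tools algorithm: it uses exact UMI matching only, whereas UMI-tools also offers error-aware grouping methods that this sketch does not attempt to reproduce. The `Read` record, the `dedup_exact` function and the rule of keeping the read with the highest mapping quality are all assumptions made for illustration.

```python
from typing import Dict, Iterable, List, NamedTuple, Tuple


class Read(NamedTuple):
    """Hypothetical minimal read record used only for this sketch."""
    name: str
    chrom: str
    pos: int
    strand: str
    umi: str
    mapq: int


def dedup_exact(reads: Iterable[Read]) -> List[Read]:
    """Keep one read per (chrom, pos, strand, UMI), preferring higher MAPQ."""
    best: Dict[Tuple[str, int, str, str], Read] = {}
    for read in reads:
        key = (read.chrom, read.pos, read.strand, read.umi)
        # Reads sharing coordinates and UMI are treated as duplicates.
        if key not in best or read.mapq > best[key].mapq:
            best[key] = read
    return list(best.values())


reads = [
    Read("r1", "chr1", 100, "+", "AAAA", 30),
    Read("r2", "chr1", 100, "+", "AAAA", 40),  # duplicate of r1, higher MAPQ
    Read("r3", "chr1", 100, "+", "CCCC", 30),  # same position, different UMI
    Read("r4", "chr1", 200, "+", "AAAA", 30),  # same UMI, different position
]
print([r.name for r in dedup_exact(reads)])
```

Reads at the same position with different UMIs are kept as separate molecules, which is what distinguishes UMI-aware deduplication from purely coordinate-based duplicate removal.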

Input arguments

alignment
label:

Alignment

type:

data:alignment:bam

required:

True

disabled:

False

hidden:

False

Output results

bam
label:

Clipped BAM file

type:

basic:file

required:

True

disabled:

False

hidden:

False

bai
label:

Index of clipped BAM file

type:

basic:file

required:

True

disabled:

False

hidden:

False

stats
label:

Alignment statistics

type:

basic:file

required:

True

disabled:

False

hidden:

False

dedup_log
label:

Deduplication log

type:

basic:file

required:

True

disabled:

False

hidden:

False

dedup_stats
label:

Deduplication stats

type:

basic:file

required:

True

disabled:

False

hidden:

False

species
label:

Species

type:

basic:string

required:

True

disabled:

False

hidden:

False

build
label:

Build

type:

basic:string

required:

True

disabled:

False

hidden:

False

Upload microarray expression (unmapped)

data:microarray:normalized:upload-microarray-expression (basic:file  exp, basic:string  exp_type, basic:string  platform, basic:string  platform_id, basic:string  species)[Source: v1.1.1]

Import unmapped microarray expression data.

Input arguments

exp
label:

Normalized expression

type:

basic:file

description:

Normalized expression file with the original probe IDs. Supported file extensions are .tab.*, .tsv.*, .txt.*

required:

True

disabled:

False

hidden:

False

exp_type
label:

Normalization type

type:

basic:string

required:

True

disabled:

False

hidden:

False

platform
label:

Microarray platform name

type:

basic:string

required:

True

disabled:

False

hidden:

False

platform_id
label:

GEO platform ID

type:

basic:string

description:

Platform ID according to the GEO database. This can be used in following steps to automatically map probe IDs to genes.

required:

False

disabled:

False

hidden:

False

species
label:

Species

type:

basic:string

description:

Select a species name from the dropdown menu or write a custom species name in the species field.

required:

True

disabled:

False

hidden:

False

choices:

  • Homo sapiens: Homo sapiens

  • Mus musculus: Mus musculus

  • Rattus norvegicus: Rattus norvegicus

  • Macaca mulatta: Macaca mulatta

  • Dictyostelium discoideum: Dictyostelium discoideum

Output results

exp
label:

Uploaded normalized expression

type:

basic:file

required:

True

disabled:

False

hidden:

False

exp_type
label:

Normalization type

type:

basic:string

required:

True

disabled:

False

hidden:

False

platform
label:

Microarray platform type

type:

basic:string

required:

True

disabled:

False

hidden:

False

platform_id
label:

GEO platform ID

type:

basic:string

required:

False

disabled:

False

hidden:

False

species
label:

Species

type:

basic:string

required:

True

disabled:

False

hidden:

False

Upload proteomics sample

data:proteomics:massspectrometry:upload-proteomics-sample (basic:file  src, basic:string  species, basic:string  source)[Source: v1.2.1]

Upload a mass spectrometry proteomics sample data file. The input 5-column tab-delimited file with the .txt suffix is expected to contain a header line with the following meta-data column names: “Uniprot ID”, “Gene symbol”, “Protein name” and “Number of peptides”. The fifth column contains the sample data.
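The expected header layout can be checked before upload with a short sketch such as the one below. It is not part of the process itself: the function name `check_sample_header` is hypothetical, and the sketch assumes that the four meta-data columns appear in the order listed above, followed by a single sample column.

```python
import csv
import io

# Meta-data column names as listed in the process description.
EXPECTED_META = ["Uniprot ID", "Gene symbol", "Protein name", "Number of peptides"]


def check_sample_header(text: str) -> str:
    """Return the sample column name, or raise ValueError if the header is off."""
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    header = next(reader, None)
    if header is None:
        raise ValueError("file is empty")
    if len(header) != 5:
        raise ValueError(f"expected 5 columns, found {len(header)}")
    if header[:4] != EXPECTED_META:
        raise ValueError(f"unexpected meta-data columns: {header[:4]}")
    # The fifth column holds the sample data.
    return header[4]


example = (
    "Uniprot ID\tGene symbol\tProtein name\tNumber of peptides\tsample_1\n"
    "P12345\tABC1\tExample protein\t3\t12.5\n"
)
print(check_sample_header(example))
```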

Input arguments

src
label:

Table containing mass spectrometry data (.txt)

type:

basic:file

required:

True

disabled:

False

hidden:

False

species
label:

Species

type:

basic:string

description:

Select a species name from the dropdown menu or write a custom species name in the species field.

required:

True

disabled:

False

hidden:

False

choices:

  • Homo sapiens: Homo sapiens

  • Mus musculus: Mus musculus

  • Rattus norvegicus: Rattus norvegicus

source
label:

Protein ID database source

type:

basic:string

required:

True

disabled:

False

hidden:

False

default:

UniProtKB

choices:

  • UniProtKB: UniProtKB

Output results

table
label:

Uploaded table

type:

basic:file

required:

True

disabled:

False

hidden:

False

species
label:

Species

type:

basic:string

required:

True

disabled:

False

hidden:

False

source
label:

Source

type:

basic:string

required:

True

disabled:

False

hidden:

False

Upload proteomics sample set

data:proteomics:sampleset:upload-proteomics-sample-set (basic:file  src, basic:string  species, basic:string  source)[Source: v1.2.1]

Upload a mass spectrometry proteomics sample set file. The input multi-sample tab-delimited file with the .txt suffix is expected to contain a header line with the following meta-data column names: “Uniprot ID”, “Gene symbol”, “Protein name” and “Number of peptides”. Each additional column in the input file should contain data for a single sample.

Input arguments

src
label:

Table containing mass spectrometry data (.txt)

type:

basic:file

required:

True

disabled:

False

hidden:

False

species
label:

Species

type:

basic:string

description:

Select a species name from the dropdown menu or write a custom species name in the species field.

required:

True

disabled:

False

hidden:

False

choices:

  • Homo sapiens: Homo sapiens

  • Mus musculus: Mus musculus

  • Rattus norvegicus: Rattus norvegicus

source
label:

Protein ID database source

type:

basic:string

required:

True

disabled:

False

hidden:

False

default:

UniProtKB

choices:

  • UniProtKB: UniProtKB

Output results

table
label:

Uploaded table

type:

basic:file

required:

True

disabled:

False

hidden:

False

species
label:

Species

type:

basic:string

required:

True

disabled:

False

hidden:

False

source
label:

Source

type:

basic:string

required:

True

disabled:

False

hidden:

False

VCF file

data:variants:vcfupload-variants-vcf (basic:file  src, basic:string  species, basic:string  build)[Source: v2.3.0]

Upload variants in VCF format.

Input arguments

src
label:

Variants (VCF)

type:

basic:file

description:

Variants in VCF format.

required:

True

validate_regex:

\.(vcf)(|\.gz|\.bz2|\.tgz|\.tar\.gz|\.tar\.bz2|\.zip|\.rar|\.7z)$
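To see which file names the pattern above accepts, it can be tried in a short Python sketch. The helper name `is_accepted` is hypothetical, and using `re.search` (rather than a full match) is an assumption about how the platform applies the pattern; the regular expression itself is copied verbatim.

```python
import re

# Pattern copied from the validate_regex field above.
VCF_NAME = re.compile(
    r"\.(vcf)(|\.gz|\.bz2|\.tgz|\.tar\.gz|\.tar\.bz2|\.zip|\.rar|\.7z)$"
)


def is_accepted(filename: str) -> bool:
    """Return True if the file name ends with an accepted VCF extension."""
    return VCF_NAME.search(filename) is not None


for name in ["calls.vcf", "calls.vcf.gz", "calls.vcf.tar.bz2", "calls.bcf", "calls.vcf.xz"]:
    print(name, is_accepted(name))
```

Note that the pattern is case-sensitive as written, so a name such as `calls.VCF` would not match under these assumptions.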

species
label:

Species

type:

basic:string

description:

Latin name of the species.

choices:

  • Homo sapiens: Homo sapiens

  • Mus musculus: Mus musculus

  • Rattus norvegicus: Rattus norvegicus

  • Dictyostelium discoideum: Dictyostelium discoideum

  • Odocoileus virginianus texanus: Odocoileus virginianus texanus

  • Solanum tuberosum: Solanum tuberosum

build
label:

Genome build

type:

basic:string

Output results

vcf
label:

Uploaded file

type:

basic:file

tbi
label:

Tabix index

type:

basic:file

species
label:

Species

type:

basic:string

build
label:

Build

type:

basic:string

Variant calling (CheMut)

data:variants:vcf:chemut:vc-chemut (data:seq:nucleotide  genome, list:data:alignment:bam  parental_strains, list:data:alignment:bam  mutant_strains, basic:boolean  base_recalibration, data:variants:vcf  known_sites, list:data:variants:vcf  known_indels, basic:string  PL, basic:string  LB, basic:string  PU, basic:string  CN, basic:date  DT, data:bed  intervals, basic:integer  ploidy, basic:integer  stand_call_conf, basic:integer  mbq, basic:integer  max_reads, basic:integer  java_gc_threads, basic:integer  max_heap_size)[Source: v3.0.1]

CheMut variant calling using multiple BAM input files.

Input arguments

genome
label:

Reference genome

type:

data:seq:nucleotide

required:

True

disabled:

False

hidden:

False

parental_strains
label:

Parental strains

type:

list:data:alignment:bam

required:

True

disabled:

False

hidden:

False

mutant_strains
label:

Mutant strains

type:

list:data:alignment:bam

required:

True

disabled:

False

hidden:

False

base_recalibration
label:

Do variant base recalibration

type:

basic:boolean

required:

True

disabled:

False

hidden:

False

default:

False

known_sites
label:

dbSNP file

type:

data:variants:vcf

description:

Database of known polymorphic sites.

required:

False

disabled:

False

hidden:

False

known_indels
label:

Known indels

type:

list:data:variants:vcf

required:

False

disabled:

False

hidden:

!base_recalibration

reads_info.PL
label:

Platform/technology

type:

basic:string

description:

Platform/technology used to produce the reads.

required:

True

disabled:

False

hidden:

False

default:

Illumina

choices:

  • Capillary: Capillary

  • Ls454: Ls454

  • Illumina: Illumina

  • SOLiD: SOLiD

  • Helicos: Helicos

  • IonTorrent: IonTorrent

  • Pacbio: Pacbio

reads_info.LB
label:

Library

type:

basic:string

required:

True

disabled:

False

hidden:

False

default:

x

reads_info.PU
label:

Platform unit

type:

basic:string

description:

Platform unit (e.g. flowcell-barcode.lane for Illumina or slide for SOLiD). Unique identifier.

required:

True

disabled:

False

hidden:

False

default:

x

reads_info.CN
label:

Sequencing center

type:

basic:string

description:

Name of sequencing center producing the read.

required:

True

disabled:

False

hidden:

False

default:

x

reads_info.DT
label:

Date

type:

basic:date

description:

Date the run was produced.

required:

True

disabled:

False

hidden:

False

default:

2017-01-01

hc.intervals
label:

Intervals (from BED file)

type:

data:bed

description:

Use this option to perform the analysis over only part of the genome.

required:

False

disabled:

False

hidden:

False

hc.ploidy
label:

Sample ploidy

type:

basic:integer

description:

Ploidy (number of chromosomes) per sample. For pooled data, set to (Number of samples in each pool * Sample Ploidy).

required:

True

disabled:

False

hidden:

False

default:

2

hc.stand_call_conf
label:

Min call confidence threshold

type:

basic:integer

description:

The minimum phred-scaled confidence threshold at which variants should be called.

required:

True

disabled:

False

hidden:

False

default:

30

hc.mbq
label:

Min Base Quality

type:

basic:integer

description:

Minimum base quality required to consider a base for calling.

required:

True

disabled:

False

hidden:

False

default:

10

hc.max_reads
label:

Max reads per alignment start site

type:

basic:integer

description:

Maximum number of reads to retain per alignment start position. Reads above this threshold will be downsampled. Set to 0 to disable.

required:

True

disabled:

False

hidden:

False

default:

50

advanced.java_gc_threads
label:

Java ParallelGCThreads

type:

basic:integer

description:

Sets the number of threads used during parallel phases of the garbage collectors.

required:

True

disabled:

False

hidden:

False

default:

2

advanced.max_heap_size
label:

Java maximum heap size (Xmx)

type:

basic:integer

description:

Set the maximum Java heap size (in GB).

required:

True

disabled:

False

hidden:

False

default:

12

Output results

vcf
label:

Called variants file

type:

basic:file

required:

True

disabled:

False

hidden:

False

tbi
label:

Tabix index

type:

basic:file

required:

True

disabled:

False

hidden:

False

species
label:

Species

type:

basic:string

required:

True

disabled:

False

hidden:

False

build
label:

Build

type:

basic:string

required:

True

disabled:

False

hidden:

False

Variant filtering (CheMut)

data:variants:vcf:filtering:filtering-chemut (data:variants:vcf  variants, basic:string  analysis_type, basic:string  parental_strain, basic:string  mutant_strain, data:seq:nucleotide  genome, basic:integer  read_depth)[Source: v1.8.2]

Filtering and annotation of Variant calling (CheMut) data from chemical mutagenesis in _Dictyostelium discoideum_.

Input arguments

variants
label:

Variants file (VCF)

type:

data:variants:vcf

required:

True

disabled:

False

hidden:

False

analysis_type
label:

Analysis type

type:

basic:string

description:

Choice of the analysis type. Use ‘SNV’ or ‘INDEL’ options. Choose options SNV_CHR2 or INDEL_CHR2 to run the GATK analysis only on the diploid portion of CHR2 (-ploidy 2 -L chr2:2263132-3015703).

required:

True

disabled:

False

hidden:

False

default:

snv

choices:

  • SNV: snv

  • INDEL: indel

  • SNV_CHR2: snv_chr2

  • INDEL_CHR2: indel_chr2

parental_strain
label:

Parental strain prefix

type:

basic:string

required:

True

disabled:

False

hidden:

False

default:

parental

mutant_strain
label:

Mutant strain prefix

type:

basic:string

required:

True

disabled:

False

hidden:

False

default:

mut

genome
label:

Reference genome

type:

data:seq:nucleotide

required:

True

disabled:

False

hidden:

False

read_depth
label:

Read Depth Cutoff

type:

basic:integer

required:

True

disabled:

False

hidden:

False

default:

5

Output results

summary
label:

Summary

type:

basic:file

description:

Summarize the input parameters and results.

required:

True

disabled:

False

hidden:

False

vcf
label:

Variants

type:

basic:file

description:

A genome VCF file of variants that passed the filters.

required:

True

disabled:

False

hidden:

False

tbi
label:

Tabix index

type:

basic:file

required:

True

disabled:

False

hidden:

False

variants_filtered
label:

Variants filtered

type:

basic:file

description:

A data frame of variants that passed the filters.

required:

False

disabled:

False

hidden:

False

variants_filtered_alt
label:

Variants filtered (multiple alt. alleles)

type:

basic:file

description:

A data frame of variants that contain more than two alternative alleles. These variants are likely to be false positives.

required:

False

disabled:

False

hidden:

False

gene_list_all
label:

Gene list (all)

type:

basic:file

description:

Genes that are mutated at least once.

required:

False

disabled:

False

hidden:

False

gene_list_top
label:

Gene list (top)

type:

basic:file

description:

Genes that are mutated at least twice.

required:

False

disabled:

False

hidden:

False

mut_chr
label:

Mutations (by chr)

type:

basic:file

description:

List mutations in individual chromosomes.

required:

False

disabled:

False

hidden:

False

mut_strain
label:

Mutations (by strain)

type:

basic:file

description:

List mutations in individual strains.

required:

False

disabled:

False

hidden:

False

strain_by_gene
label:

Strain (by gene)

type:

basic:file

description:

List mutants that carry mutations in individual genes.

required:

False

disabled:

False

hidden:

False

species
label:

Species

type:

basic:string

required:

True

disabled:

False

hidden:

False

build
label:

Build

type:

basic:string

required:

True

disabled:

False

hidden:

False

WALT

data:alignment:bam:waltwalt (data:index:walt  genome, data:reads:fastq  reads, basic:boolean  rm_dup, basic:integer  optical_distance, basic:integer  mismatch, basic:integer  number, basic:string  spikein_name, basic:boolean  filter_spikein)[Source: v3.7.2]

WALT (Wildcard ALignment Tool) is a read mapping program for bisulfite sequencing in DNA methylation studies.

Input arguments

genome
label:

Reference genome

type:

data:index:walt

reads
label:

Reads

type:

data:reads:fastq

rm_dup
label:

Remove duplicates

type:

basic:boolean

default:

True

optical_distance
label:

Optical duplicate distance

type:

basic:integer

description:

The maximum offset between two duplicate clusters in order to consider them optical duplicates. Suggested settings of 100 for HiSeq style platforms or about 2500 for NovaSeq ones. Default is 0 to not look for optical duplicates.

disabled:

!rm_dup

default:

0

mismatch
label:

Maximum allowed mismatches

type:

basic:integer

required:

False

number
label:

Number of reads to map in one loop

type:

basic:integer

description:

Sets the number of reads to map in each loop. A larger number makes the program use more memory. This is especially evident in paired-end mapping.

required:

False

spikein_options.spikein_name
label:

Chromosome name of unmethylated control sequence

type:

basic:string

description:

Specifies the name of the unmethylated control sequence, which is output as a separate alignment file. It is recommended to remove duplicates to reduce any bias introduced by incomplete conversion on PCR duplicate reads.

required:

False

spikein_options.filter_spikein
label:

Remove control/spike-in sequences.

type:

basic:boolean

description:

Remove unmethylated control reads from the final alignment based on the provided name. It is recommended to remove any reads that are not naturally occurring in the sample (e.g. lambda virus spike-in).

disabled:

!spikein_options.spikein_name

default:

False

Output results

bam
label:

Alignment file (BAM)

type:

basic:file

description:

Position sorted alignment in .bam format

bai
label:

Index BAI

type:

basic:file

stats
label:

Statistics

type:

basic:file

mr
label:

Alignment file (MR)

type:

basic:file

description:

Position sorted alignment in .mr format.

duplicates_report
label:

Removed duplicates statistics

type:

basic:file

required:

False

unmapped
label:

Unmapped reads

type:

basic:file

required:

False

spikein_mr
label:

Alignment file of unmethylated control reads

type:

basic:file

required:

False

species
label:

Species

type:

basic:string

build
label:

Build

type:

basic:string

WALT genome index

data:index:walt:walt-index (data:seq:nucleotide  ref_seq)[Source: v1.2.1]

Create WALT genome index.

Input arguments

ref_seq
label:

Reference sequence (nucleotide FASTA)

type:

data:seq:nucleotide

required:

True

disabled:

False

hidden:

False

Output results

index
label:

WALT index

type:

basic:dir

required:

True

disabled:

False

hidden:

False

fastagz
label:

FASTA file (compressed)

type:

basic:file

required:

True

disabled:

False

hidden:

False

fasta
label:

FASTA file

type:

basic:file

required:

True

disabled:

False

hidden:

False

fai
label:

FASTA file index

type:

basic:file

required:

True

disabled:

False

hidden:

False

species
label:

Species

type:

basic:string

required:

True

disabled:

False

hidden:

False

build
label:

Build

type:

basic:string

required:

True

disabled:

False

hidden:

False

WGBS (paired-end)

data:workflow:wgbsworkflow-wgbs-paired (data:reads:fastq:paired  reads, data:index:walt  walt_index, data:seq:nucleotide  ref_seq, basic:string  validation_stringency, data:seq:nucleotide  adapters, basic:integer  seed_mismatches, basic:integer  simple_clip_threshold, basic:integer  min_adapter_length, basic:integer  palindrome_clip_threshold, basic:boolean  keep_both_reads, basic:integer  leading, basic:integer  trailing, basic:integer  crop, basic:integer  headcrop, basic:integer  minlen, basic:boolean  rm_dup, basic:integer  optical_distance, basic:integer  mismatch, basic:integer  number, basic:string  spikein_name, basic:boolean  filter_spikein, basic:boolean  skip, data:seq:nucleotide  sequence, basic:boolean  count_all, basic:integer  read_length, basic:decimal  max_mismatch, basic:boolean  a_rich, basic:boolean  cpgs, basic:boolean  symmetric_cpgs, data:seq:nucleotide  adapters, basic:integer  insert_size, basic:string  pair_orientation, basic:integer  read_length, basic:integer  min_map_quality, basic:integer  min_quality, basic:integer  coverage_cap, basic:integer  accumulation_cap, basic:integer  sample_size, basic:integer  min_quality, basic:integer  next_base_quality, basic:integer  min_lenght, basic:decimal  mismatch_rate, basic:decimal  minimum_fraction, basic:boolean  include_duplicates, basic:decimal  deviations)[Source: v2.2.0]

This WGBS pipeline comprises trimming, alignment, computation of methylation levels, identification of hypo-methylated regions (HMRs) and additional QC steps. First, reads are trimmed to remove adapters or kit-specific artifacts. Reads are then aligned with the __WALT__ aligner. [WALT (Wildcard ALignment Tool)](https://github.com/smithlabcode/walt) is a fast and accurate read mapper for bisulfite sequencing. Then, the methylation level at each genomic cytosine is calculated using __methcounts__. Finally, hypo-methylated regions are identified using __hmr__. Both methcounts and hmr are part of the [MethPipe](http://smithlabresearch.org/software/methpipe/) package. QC steps are based on [Picard](http://broadinstitute.github.io/picard/) and include high-level metrics about the alignment, WGS performance and summary statistics from bisulfite sequencing. Final QC reports are summarized by MultiQC.

Input arguments

reads
label:

Select sample(s)

type:

data:reads:fastq:paired

walt_index
label:

Walt index

type:

data:index:walt

ref_seq
label:

Reference sequence

type:

data:seq:nucleotide

validation_stringency
label:

Validation stringency

type:

basic:string

description:

Validation stringency for all SAM files read by this program. Setting stringency to SILENT can improve performance when processing a BAM file in which variable-length data (read, qualities, tags) do not otherwise need to be decoded. Default is STRICT.

default:

STRICT

choices:

  • STRICT: STRICT

  • LENIENT: LENIENT

  • SILENT: SILENT

adapter_trimming.adapters
label:

Adapter sequences

type:

data:seq:nucleotide

description:

Adapter sequence in FASTA format that will be removed from the read. This field as well as ‘Seed mismatches’, ‘Simple clip threshold’ and ‘Palindrome clip threshold’ parameters are needed to perform adapter trimming. ‘Minimum adapter length’ and ‘Keep both reads’ are optional parameters.

required:

False

adapter_trimming.seed_mismatches
label:

Seed mismatches

type:

basic:integer

description:

Specifies the maximum mismatch count which will still allow a full match to be performed. This field is required to perform adapter trimming.

required:

False

disabled:

!adapter_trimming.adapters

adapter_trimming.simple_clip_threshold
label:

Simple clip threshold

type:

basic:integer

description:

Specifies how accurate the match between any adapter etc. sequence must be against a read. This field is required to perform adapter trimming.

required:

False

disabled:

!adapter_trimming.adapters

adapter_trimming.min_adapter_length
label:

Minimum adapter length

type:

basic:integer

description:

In addition to the alignment score, palindrome mode can verify that a minimum length of adapter has been detected. If unspecified, this defaults to 8 bases, for historical reasons. However, since palindrome mode has a very low false positive rate, this can be safely reduced, even down to 1, to allow shorter adapter fragments to be removed.

disabled:

!adapter_trimming.seed_mismatches && !adapter_trimming.simple_clip_threshold && !adapter_trimming.palindrome_clip_threshold

default:

8

adapter_trimming.palindrome_clip_threshold
label:

Palindrome clip threshold

type:

basic:integer

description:

Specifies how accurate the match between the two ‘adapter ligated’ reads must be for PE palindrome read alignment. This field is required to perform adapter trimming.

required:

False

disabled:

!adapter_trimming.adapters

adapter_trimming.keep_both_reads
label:

Keep both reads

type:

basic:boolean

description:

After read-through has been detected by palindrome mode, and the adapter sequence removed, the reverse read contains the same sequence information as the forward read, albeit in reverse complement. For this reason, the default behaviour is to entirely drop the reverse read. By specifying this parameter, the reverse read will also be retained, which may be useful e.g. if the downstream tools cannot handle a combination of paired and unpaired reads. This field is optional for performing adapter trimming.

required:

False

disabled:

!adapter_trimming.seed_mismatches && !adapter_trimming.simple_clip_threshold && !adapter_trimming.palindrome_clip_threshold && !adapter_trimming.min_adapter_length

trimming_filtering.leading
label:

Leading quality

type:

basic:integer

description:

Remove low quality bases from the beginning, if below a threshold quality.

required:

False

trimming_filtering.trailing
label:

Trailing quality

type:

basic:integer

description:

Remove low quality bases from the end, if below a threshold quality.

required:

False

trimming_filtering.crop
label:

Crop

type:

basic:integer

description:

Cut the read to a specified length by removing bases from the end.

required:

False

trimming_filtering.headcrop
label:

Headcrop

type:

basic:integer

description:

Cut the specified number of bases from the start of the read.

required:

False

trimming_filtering.minlen
label:

Minimum length

type:

basic:integer

description:

Drop the read if it is below a specified length.

required:

False

alignment.rm_dup
label:

Remove duplicates

type:

basic:boolean

default:

True

alignment.optical_distance
label:

Optical duplicate distance

type:

basic:integer

description:

The maximum offset between two duplicate clusters in order to consider them optical duplicates. Suggested settings of 100 for HiSeq style platforms or about 2500 for NovaSeq ones. Default is 0 to not look for optical duplicates.

disabled:

!alignment.rm_dup

default:

0

alignment.mismatch
label:

Maximum allowed mismatches

type:

basic:integer

default:

6

alignment.number
label:

Number of reads to map in one loop

type:

basic:integer

description:

Sets the number of reads to map in each loop. A larger number makes the program use more memory. This is especially evident in paired-end mapping.

required:

False

alignment.spikein_name
label:

Chromosome name of unmethylated control sequence

type:

basic:string

description:

Specifies the name of the unmethylated control sequence, which is output as a separate alignment file. It is recommended to remove duplicates to reduce any bias introduced by incomplete conversion on PCR duplicate reads.

required:

False

alignment.filter_spikein
label:

Remove control/spike-in sequences.

type:

basic:boolean

description:

Remove unmethylated control reads from the final alignment based on the provided name. It is recommended to remove any reads that are not naturally occurring in the sample (e.g. lambda virus spike-in).

disabled:

!alignment.spikein_name

default:

False

bsrate.skip
label:

Skip Bisulfite conversion rate step

type:

basic:boolean

description:

The bisulfite conversion rate step can be skipped. If a separate alignment file for the unmethylated control sequence is not produced during the alignment, this process will fail.

disabled:

!alignment.spikein_name

default:

True

bsrate.sequence
label:

Unmethylated control sequence

type:

data:seq:nucleotide

required:

False

disabled:

bsrate.skip

bsrate.count_all
label:

Count all cytosines including CpGs

type:

basic:boolean

disabled:

bsrate.skip

default:

True

bsrate.read_length
label:

Average read length

type:

basic:integer

default:

150

bsrate.max_mismatch
label:

Maximum fraction of mismatches

type:

basic:decimal

required:

False

disabled:

bsrate.skip

bsrate.a_rich
label:

Reads are A-rich

type:

basic:boolean

disabled:

bsrate.skip

default:

False

methcounts.cpgs
label:

Only CpG context sites

type:

basic:boolean

description:

Output file will contain methylation data for CpG context sites only. Choosing this option will result in CpG content report only.

disabled:

methcounts.symmetric_cpgs

default:

False

methcounts.symmetric_cpgs
label:

Merge CpG pairs

type:

basic:boolean

description:

Merging CpG pairs results in symmetric methylation levels. Methylation is usually symmetric (cytosines at CpG sites are methylated on both DNA strands). Choosing this option will only keep the CpG sites data.

disabled:

methcounts.cpgs

default:

True

summary.adapters
label:

Adapter sequences

type:

data:seq:nucleotide

required:

False

summary.insert_size
label:

Maximum insert size

type:

basic:integer

default:

100000

summary.pair_orientation
label:

Pair orientation

type:

basic:string

default:

null

choices:

  • Unspecified: null

  • FR: FR

  • RF: RF

  • TANDEM: TANDEM

wgs_metrics.read_length
label:

Average read length

type:

basic:integer

default:

150

wgs_metrics.min_map_quality
label:

Minimum mapping quality for a read to contribute coverage

type:

basic:integer

default:

20

wgs_metrics.min_quality
label:

Minimum base quality for a base to contribute coverage

type:

basic:integer

description:

N bases will be treated as having a base quality of negative infinity and will therefore be excluded from coverage regardless of the value of this parameter.

default:

20

wgs_metrics.coverage_cap
label:

Maximum coverage cap

type:

basic:integer

description:

Treat positions with coverage exceeding this value as if they had coverage at this set value.

default:

250
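The effect of the coverage cap on a summary statistic can be illustrated with a small sketch. It is not Picard's implementation: the function name `capped_mean_coverage` and the per-position depth list are assumptions made for illustration.

```python
from typing import Sequence


def capped_mean_coverage(depths: Sequence[int], coverage_cap: int = 250) -> float:
    """Mean coverage with every position's depth capped at coverage_cap."""
    if not depths:
        raise ValueError("depths must not be empty")
    # Positions above the cap are treated as if they had coverage equal to the cap.
    return sum(min(d, coverage_cap) for d in depths) / len(depths)


print(capped_mean_coverage([10, 300, 250, 0], coverage_cap=250))
```

In this example the position with depth 300 contributes 250, so a few extremely deep positions cannot dominate the mean.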

wgs_metrics.accumulation_cap
label:

Ignore positions with coverage above this value

type:

basic:integer

description:

At positions with coverage exceeding this value, completely ignore reads that accumulate beyond this value.

default:

100000

wgs_metrics.sample_size
label:

Sample Size used for Theoretical Het Sensitivity sampling

type:

basic:integer

default:

10000

rrbs_metrics.min_quality
label:

Threshold for base quality of a C base before it is considered

type:

basic:integer

default:

20

rrbs_metrics.next_base_quality
label:

Threshold for quality of a base next to a C before the C base is considered

type:

basic:integer

default:

10

rrbs_metrics.min_lenght
label:

Minimum read length

type:

basic:integer

default:

5

rrbs_metrics.mismatch_rate
label:

Maximum fraction of mismatches in a read to be considered (Between 0 and 1)

type:

basic:decimal

default:

0.1

insert.minimum_fraction
label:

Minimum fraction of reads in a category to be considered

type:

basic:decimal

description:

When generating the histogram, discard any data categories (out of FR, TANDEM, RF) that have fewer than this fraction of overall reads (range: 0 to 0.5).

default:

0.05
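
The filter described above can be sketched in Python as follows. This is an illustration under stated assumptions, not Picard's code: the function name `categories_to_keep` and the read counts are hypothetical, and a category exactly at the threshold is assumed to be kept.

```python
def categories_to_keep(read_counts, minimum_fraction=0.05):
    """Return the orientation categories holding at least `minimum_fraction` of all reads."""
    if not 0 <= minimum_fraction <= 0.5:
        raise ValueError("minimum_fraction must be between 0 and 0.5")
    total = sum(read_counts.values())
    if total == 0:
        return {}
    return {
        category: count
        for category, count in read_counts.items()
        if count / total >= minimum_fraction
    }


# RF (3%) and TANDEM (1%) fall below the 5% default and are dropped.
kept = categories_to_keep({"FR": 960, "RF": 30, "TANDEM": 10})
```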

insert.include_duplicates
label:

Include reads marked as duplicates in the insert size histogram

type:

basic:boolean

default:

False

insert.deviations
label:

Deviations limit

type:

basic:decimal

description:

Generate mean, standard deviation and plots by trimming the data down to MEDIAN + DEVIATIONS*MEDIAN_ABSOLUTE_DEVIATION. This is done because insert size data typically includes enough anomalous values from chimeras and other artifacts to make the mean and standard deviation grossly misleading regarding the real distribution.

default:

10.0
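
As a rough illustration of the trimming rule in the description, the Python sketch below computes the cutoff MEDIAN + DEVIATIONS * MEDIAN_ABSOLUTE_DEVIATION and then the mean and standard deviation of the values at or below it. It is a simplified stand-in, not Picard's implementation; the function name `trimmed_insert_stats` and the sample insert sizes are hypothetical, and the population standard deviation is used as an assumption.

```python
import statistics


def trimmed_insert_stats(insert_sizes, deviations=10.0):
    """Return (mean, standard deviation, cutoff) after trimming at median + deviations * MAD."""
    median = statistics.median(insert_sizes)
    # Median absolute deviation: median of the distances from the median.
    mad = statistics.median([abs(size - median) for size in insert_sizes])
    cutoff = median + deviations * mad
    kept = [size for size in insert_sizes if size <= cutoff]
    return statistics.mean(kept), statistics.pstdev(kept), cutoff


# A single chimeric pair (5000) would otherwise inflate both mean and deviation.
mean, sd, cutoff = trimmed_insert_stats([190, 195, 200, 205, 210, 5000])
```

In this example the median is 202.5 and the median absolute deviation is 7.5, so the cutoff is 277.5 and the outlier at 5000 is excluded.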

Output results

WGBS (single-end)

data:workflow:wgbsworkflow-wgbs-single (data:reads:fastq:single  reads, data:index:walt  walt_index, data:seq:nucleotide  ref_seq, basic:string  validation_stringency, data:seq:nucleotide  adapters, basic:integer  seed_mismatches, basic:integer  simple_clip_threshold, basic:integer  leading, basic:integer  trailing, basic:integer  crop, basic:integer  headcrop, basic:integer  minlen, basic:boolean  rm_dup, basic:integer  optical_distance, basic:integer  mismatch, basic:integer  number, basic:string  spikein_name, basic:boolean  filter_spikein, basic:boolean  skip, data:seq:nucleotide  sequence, basic:boolean  count_all, basic:integer  read_length, basic:decimal  max_mismatch, basic:boolean  a_rich, basic:boolean  cpgs, basic:boolean  symmetric_cpgs, data:seq:nucleotide  adapters, basic:integer  insert_size, basic:string  pair_orientation, basic:integer  read_length, basic:integer  min_map_quality, basic:integer  min_quality, basic:integer  coverage_cap, basic:integer  accumulation_cap, basic:integer  sample_size, basic:integer  min_quality, basic:integer  next_base_quality, basic:integer  min_lenght, basic:decimal  mismatch_rate)[Source: v2.2.0]

This WGBS pipeline consists of trimming, alignment, computation of methylation levels, identification of hypo-methylated regions (HMRs) and additional QC steps. First, reads are trimmed to remove adapters or kit-specific artifacts. Reads are then aligned with the __WALT__ aligner. [WALT (Wildcard ALignment Tool)](https://github.com/smithlabcode/walt) is a fast and accurate read mapper for bisulfite sequencing. Next, the methylation level at each genomic cytosine is calculated using __methcounts__. Finally, hypo-methylated regions are identified using __hmr__. Both methcounts and hmr are part of the [MethPipe](http://smithlabresearch.org/software/methpipe/) package. QC steps are based on [Picard](http://broadinstitute.github.io/picard/) and include high-level metrics about the alignment, WGS performance and summary statistics from bisulfite sequencing. Final QC reports are summarized by MultiQC.

Input arguments

reads
label:

Select sample(s)

type:

data:reads:fastq:single

walt_index
label:

Walt index

type:

data:index:walt

ref_seq
label:

Reference sequence

type:

data:seq:nucleotide

validation_stringency
label:

Validation stringency

type:

basic:string

description:

Validation stringency for all SAM files read by this program. Setting stringency to SILENT can improve performance when processing a BAM file in which variable-length data (read, qualities, tags) do not otherwise need to be decoded. Default is STRICT.

default:

STRICT

choices:

  • STRICT: STRICT

  • LENIENT: LENIENT

  • SILENT: SILENT

adapter_trimming.adapters
label:

Adapter sequences

type:

data:seq:nucleotide

description:

Adapter sequence in FASTA format that will be removed from the read. This field as well as ‘Seed mismatches’ and ‘Simple clip threshold’ parameters are needed to perform adapter trimming.

required:

False

adapter_trimming.seed_mismatches
label:

Seed mismatches

type:

basic:integer

description:

Specifies the maximum mismatch count which will still allow a full match to be performed. This field is required to perform adapter trimming.

required:

False

disabled:

!adapter_trimming.adapters

adapter_trimming.simple_clip_threshold
label:

Simple clip threshold

type:

basic:integer

description:

Specifies how accurate the match between any adapter sequence must be against a read. This field is required to perform adapter trimming.

required:

False

disabled:

!adapter_trimming.adapters

trimming_filtering.leading
label:

Leading quality

type:

basic:integer

description:

Remove low quality bases from the beginning, if below a threshold quality.

required:

False

trimming_filtering.trailing
label:

Trailing quality

type:

basic:integer

description:

Remove low quality bases from the end, if below a threshold quality.

required:

False

trimming_filtering.crop
label:

Crop

type:

basic:integer

description:

Cut the read to a specified length by removing bases from the end.

required:

False

trimming_filtering.headcrop
label:

Headcrop

type:

basic:integer

description:

Cut the specified number of bases from the start of the read.

required:

False

trimming_filtering.minlen
label:

Minimum length

type:

basic:integer

description:

Drop the read if it is below a specified length.

required:

False

alignment.rm_dup
label:

Remove duplicates

type:

basic:boolean

default:

True

alignment.optical_distance
label:

Optical duplicate distance

type:

basic:integer

description:

The maximum offset between two duplicate clusters in order to consider them optical duplicates. Suggested settings of 100 for HiSeq style platforms or about 2500 for NovaSeq ones. Default is 0 to not look for optical duplicates.

disabled:

!alignment.rm_dup

default:

0

alignment.mismatch
label:

Maximum allowed mismatches

type:

basic:integer

default:

6

alignment.number
label:

Number of reads to map in one loop

type:

basic:integer

description:

Sets the number of reads to map in each loop. A larger number results in the program using more memory. This is especially evident in paired-end mapping.

required:

False

alignment.spikein_name
label:

Chromosome name of unmethylated control sequence

type:

basic:string

description:

Specifies the name of the unmethylated control sequence, which is output as a separate alignment file. It is recommended to remove duplicates to reduce any bias introduced by incomplete conversion on PCR duplicate reads.

required:

False

alignment.filter_spikein
label:

Remove control/spike-in sequences.

type:

basic:boolean

description:

Remove unmethylated control reads from the final alignment based on the provided name. It is recommended to remove any reads that are not naturally occurring in the sample (e.g. lambda virus spike-in).

disabled:

!alignment.spikein_name

default:

False

bsrate.skip
label:

Skip Bisulfite conversion rate step

type:

basic:boolean

description:

The bisulfite conversion rate step can be skipped. If a separate alignment file for the unmethylated control sequence is not produced during alignment, this process will fail.

disabled:

!alignment.spikein_name

default:

True

bsrate.sequence
label:

Unmethylated control sequence

type:

data:seq:nucleotide

required:

False

disabled:

bsrate.skip

bsrate.count_all
label:

Count all cytosines including CpGs

type:

basic:boolean

disabled:

bsrate.skip

default:

True

bsrate.read_length
label:

Average read length

type:

basic:integer

default:

150

bsrate.max_mismatch
label:

Maximum fraction of mismatches

type:

basic:decimal

required:

False

disabled:

bsrate.skip

bsrate.a_rich
label:

Reads are A-rich

type:

basic:boolean

disabled:

bsrate.skip

default:

False

methcounts.cpgs
label:

Only CpG context sites

type:

basic:boolean

description:

Output file will contain methylation data for CpG context sites only. Choosing this option will result in CpG content report only.

disabled:

methcounts.symmetric_cpgs

default:

False

methcounts.symmetric_cpgs
label:

Merge CpG pairs

type:

basic:boolean

description:

Merging CpG pairs results in symmetric methylation levels. Methylation is usually symmetric (cytosines at CpG sites are methylated on both DNA strands). Choosing this option will keep only the CpG sites data.

disabled:

methcounts.cpgs

default:

True
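
One way to picture what merging a CpG pair means is the Python sketch below, which combines the two strand-specific observations of a CpG into a single coverage-weighted methylation level. This is an assumption-laden illustration and not MethPipe's code: the function name `merge_cpg_pair` and its arguments (per-strand methylation level and read coverage) are hypothetical.

```python
def merge_cpg_pair(level_plus, coverage_plus, level_minus, coverage_minus):
    """Combine the C on the plus strand and the C on the minus strand of one CpG.

    Returns (merged methylation level, merged coverage).
    """
    total_coverage = coverage_plus + coverage_minus
    if total_coverage == 0:
        return 0.0, 0
    # Weight each strand's level by the number of reads that support it.
    methylated = level_plus * coverage_plus + level_minus * coverage_minus
    return methylated / total_coverage, total_coverage


# Two strands with 10 reads each, at 80% and 50% methylation.
level, coverage = merge_cpg_pair(0.8, 10, 0.5, 10)
```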

summary.adapters
label:

Adapter sequences

type:

data:seq:nucleotide

required:

False

summary.insert_size
label:

Maximum insert size

type:

basic:integer

default:

100000

summary.pair_orientation
label:

Pair orientation

type:

basic:string

default:

null

choices:

  • Unspecified: null

  • FR: FR

  • RF: RF

  • TANDEM: TANDEM

wgs_metrics.read_length
label:

Average read length

type:

basic:integer

default:

150

wgs_metrics.min_map_quality
label:

Minimum mapping quality for a read to contribute coverage

type:

basic:integer

default:

20

wgs_metrics.min_quality
label:

Minimum base quality for a base to contribute coverage

type:

basic:integer

description:

N bases will be treated as having a base quality of negative infinity and will therefore be excluded from coverage regardless of the value of this parameter.

default:

20

wgs_metrics.coverage_cap
label:

Maximum coverage cap

type:

basic:integer

description:

Treat positions with coverage exceeding this value as if they had coverage at this set value.

default:

250

wgs_metrics.accumulation_cap
label:

Ignore positions with coverage above this value

type:

basic:integer

description:

At positions with coverage exceeding this value, completely ignore reads that accumulate beyond this value.

default:

100000

wgs_metrics.sample_size
label:

Sample Size used for Theoretical Het Sensitivity sampling

type:

basic:integer

default:

10000

rrbs_metrics.min_quality
label:

Threshold for base quality of a C base before it is considered

type:

basic:integer

default:

20

rrbs_metrics.next_base_quality
label:

Threshold for quality of a base next to a C before the C base is considered

type:

basic:integer

default:

10

rrbs_metrics.min_lenght
label:

Minimum read length

type:

basic:integer

default:

5

rrbs_metrics.mismatch_rate
label:

Maximum fraction of mismatches in a read to be considered (Between 0 and 1)

type:

basic:decimal

default:

0.1

Output results

WGS (paired-end) analysis

data:workflow:wgsworkflow-wgs-paired (data:reads:fastq:paired  reads, data:index:bwa  bwa_index, data:seq:nucleotide  ref_seq, list:data:variants:vcf  known_sites, data:variants:vcf  hc_dbsnp, basic:string  validation_stringency, data:seq:nucleotide  adapters, basic:integer  seed_mismatches, basic:integer  simple_clip_threshold, basic:integer  min_adapter_length, basic:integer  palindrome_clip_threshold, basic:integer  leading, basic:integer  trailing, basic:integer  minlen, basic:integer  seed_l, basic:integer  band_w, basic:decimal  re_seeding, basic:boolean  m, basic:integer  match, basic:integer  mismatch, basic:integer  gap_o, basic:integer  gap_e, basic:integer  clipping, basic:integer  unpaired_p, basic:integer  report_tr, basic:boolean  skip, basic:boolean  remove_duplicates, basic:string  assume_sort_order, basic:string  read_group, data:seq:nucleotide  adapters, basic:integer  max_insert_size, basic:string  pair_orientation, basic:integer  read_length, basic:integer  min_map_quality, basic:integer  min_quality, basic:integer  coverage_cap, basic:integer  accumulation_cap, basic:integer  sample_size, basic:decimal  minimum_fraction, basic:boolean  include_duplicates, basic:decimal  deviations, basic:integer  stand_call_conf, basic:integer  mbq)[Source: v2.1.0]

The whole genome sequencing pipeline analyzes paired-end whole genome sequencing data. It consists of trimming, alignment, marking of duplicates, Picard metrics, recalibration of base quality scores and, finally, variant calling. Trimming is performed with Trimmomatic and alignment with BWA (mem). Marking of duplicates (MarkDuplicates), Picard metrics (AlignmentSummaryMetrics, CollectWgsMetrics and InsertSizeMetrics), recalibration of base quality scores (ApplyBQSR) and variant calling (HaplotypeCaller) are done using the GATK4 bundle of bioinformatics tools. The result is a file of called variants (VCF).

Input arguments

reads
label:

Raw untrimmed reads

type:

data:reads:fastq:paired

description:

Raw paired-end reads.

bwa_index
label:

Genome index (BWA)

type:

data:index:bwa

description:

BWA genome index.

ref_seq
label:

Reference genome sequence

type:

data:seq:nucleotide

known_sites
label:

Known sites of variation used in BQSR

type:

list:data:variants:vcf

description:

Known sites of variation as a VCF file.

hc_dbsnp
label:

dbSNP for GATK4’s HaplotypeCaller

type:

data:variants:vcf

description:

dbSNP database of variants for variant calling.

validation_stringency
label:

Validation stringency

type:

basic:string

description:

Validation stringency for all BAM files read by this program. Setting stringency to SILENT can improve performance when processing a BAM file in which variable-length data (read, qualities, tags) do not otherwise need to be decoded. Default is STRICT.

default:

STRICT

choices:

  • STRICT: STRICT

  • LENIENT: LENIENT

  • SILENT: SILENT

advanced.trimming.adapters
label:

Adapter sequences

type:

data:seq:nucleotide

description:

Adapter sequence in FASTA format that will be removed from the read. This field as well as ‘Seed mismatches’, ‘Simple clip threshold’ and ‘Palindrome clip threshold’ parameters are needed to perform adapter trimming. ‘Minimum adapter length’ and ‘Keep both reads’ are optional parameters.

required:

False

advanced.trimming.seed_mismatches
label:

Seed mismatches

type:

basic:integer

description:

Specifies the maximum mismatch count which will still allow a full match to be performed. This field is required to perform adapter trimming.

required:

False

disabled:

!advanced.trimming.adapters

advanced.trimming.simple_clip_threshold
label:

Simple clip threshold

type:

basic:integer

description:

Specifies how accurate the match between any adapter sequence must be against a read. This field is required to perform adapter trimming.

required:

False

disabled:

!advanced.trimming.adapters

advanced.trimming.min_adapter_length
label:

Minimum adapter length

type:

basic:integer

description:

In addition to the alignment score, palindrome mode can verify that a minimum length of adapter has been detected. If unspecified, this defaults to 8 bases, for historical reasons. However, since palindrome mode has a very low false positive rate, this can be safely reduced, even down to 1, to allow shorter adapter fragments to be removed.

disabled:

!advanced.trimming.seed_mismatches && !advanced.trimming.simple_clip_threshold && !advanced.trimming.palindrome_clip_threshold

default:

8

advanced.trimming.palindrome_clip_threshold
label:

Palindrome clip threshold

type:

basic:integer

description:

Specifies how accurate the match between the two ‘adapter ligated’ reads must be for PE palindrome read alignment. This field is required to perform adapter trimming.

required:

False

disabled:

!advanced.trimming.adapters

advanced.trimming.leading
label:

Leading quality

type:

basic:integer

description:

Remove low quality bases from the beginning, if below a threshold quality.

required:

False

advanced.trimming.trailing
label:

Trailing quality

type:

basic:integer

description:

Remove low quality bases from the end, if below a threshold quality.

required:

False

advanced.trimming.minlen
label:

Minimum length

type:

basic:integer

description:

Drop the read if it is below a specified length.

required:

False
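
The trimming fields above correspond to Trimmomatic's ILLUMINACLIP, LEADING, TRAILING and MINLEN steps. The Python sketch below shows one plausible way to turn them into step strings; how the workflow actually assembles its command line is not shown in this document, so the function name `trimmomatic_steps` and the example file name are hypothetical.

```python
def trimmomatic_steps(adapters=None, seed_mismatches=None,
                      palindrome_clip_threshold=None, simple_clip_threshold=None,
                      min_adapter_length=8, leading=None, trailing=None, minlen=None):
    """Build Trimmomatic step strings from the optional trimming fields."""
    steps = []
    if adapters is not None:
        thresholds = (seed_mismatches, palindrome_clip_threshold, simple_clip_threshold)
        # Adapter trimming needs all three thresholds, as the field descriptions state.
        if any(value is None for value in thresholds):
            raise ValueError("adapter trimming requires seed mismatches and both clip thresholds")
        steps.append(
            f"ILLUMINACLIP:{adapters}:{seed_mismatches}:"
            f"{palindrome_clip_threshold}:{simple_clip_threshold}:{min_adapter_length}"
        )
    if leading is not None:
        steps.append(f"LEADING:{leading}")
    if trailing is not None:
        steps.append(f"TRAILING:{trailing}")
    if minlen is not None:
        steps.append(f"MINLEN:{minlen}")
    return steps


steps = trimmomatic_steps("adapters.fa", 2, 30, 10, leading=3, trailing=3, minlen=36)
```

Fields left unset simply produce no step, which mirrors the fact that every trimming field here is optional.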

advanced.align.seed_l
label:

Minimum seed length

type:

basic:integer

description:

Minimum seed length. Matches shorter than minimum seed length will be missed. The alignment speed is usually insensitive to this value unless it significantly deviates from 20.

default:

19

advanced.align.band_w
label:

Band width

type:

basic:integer

description:

Gaps longer than this will not be found.

default:

100

advanced.align.re_seeding
label:

Re-seeding factor

type:

basic:decimal

description:

Trigger re-seeding for a MEM longer than minSeedLen*FACTOR. This is a key heuristic parameter for tuning the performance. Larger value yields fewer seeds, which leads to faster alignment speed but lower accuracy.

default:

1.5
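
As a small worked example of the threshold minSeedLen*FACTOR, the Python sketch below uses the defaults listed here (minimum seed length 19, factor 1.5). The function names are hypothetical and the code only reproduces the arithmetic, not BWA's seeding logic.

```python
def reseed_threshold(seed_l=19, re_seeding=1.5):
    """Length above which a MEM triggers re-seeding: minSeedLen * FACTOR."""
    return seed_l * re_seeding


def triggers_reseeding(mem_length, seed_l=19, re_seeding=1.5):
    """True when a maximal exact match is longer than the re-seeding threshold."""
    return mem_length > reseed_threshold(seed_l, re_seeding)


threshold = reseed_threshold()  # 19 * 1.5
```

With the defaults the threshold is 28.5, so a 29-base MEM is re-seeded while a 28-base MEM is not. Raising the factor raises the threshold, which is why larger values yield fewer seeds.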

advanced.align.m
label:

Mark shorter split hits as secondary

type:

basic:boolean

description:

Mark shorter split hits as secondary (for Picard compatibility)

default:

False

advanced.align.scoring.match
label:

Score of a match

type:

basic:integer

default:

1

advanced.align.scoring.mismatch
label:

Mismatch penalty

type:

basic:integer

default:

4

advanced.align.scoring.gap_o
label:

Gap open penalty

type:

basic:integer

default:

6

advanced.align.scoring.gap_e
label:

Gap extension penalty

type:

basic:integer

default:

1

advanced.align.scoring.clipping
label:

Clipping penalty

type:

basic:integer

description:

During Smith-Waterman extension, BWA-MEM keeps track of the best score reaching the end of the query. Clipping is not applied if this score is larger than (best local alignment score) - (Clipping penalty).

default:

5

advanced.align.scoring.unpaired_p
label:

Penalty for an unpaired read pair

type:

basic:integer

description:

Affinity to force pairing. Score: scoreRead1 + scoreRead2 - Penalty

default:

9
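
The score formula above can be read as a comparison between two candidate scores. According to the BWA-MEM manual, a properly paired placement is scored as scoreRead1 + scoreRead2 minus an insert penalty, and the two results are compared to decide whether to force pairing. The Python sketch below illustrates that comparison only: `insert_penalty` is supplied by the caller here, the function name is hypothetical, and resolving ties in favour of the unpaired placement is an assumption.

```python
def prefers_pairing(score_read1, score_read2, insert_penalty, unpaired_p=9):
    """Compare the paired and unpaired scores of one read pair."""
    unpaired_score = score_read1 + score_read2 - unpaired_p
    paired_score = score_read1 + score_read2 - insert_penalty
    # Assumption: a tie does not force pairing.
    return paired_score > unpaired_score


# A small insert penalty (5) beats the unpaired penalty (9); a large one (20) does not.
forced = prefers_pairing(100, 90, insert_penalty=5)
```

A larger unpaired penalty lowers the unpaired score and therefore makes pairing more likely.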

advanced.align.report_tr
label:

Report threshold score

type:

basic:integer

description:

Do not output alignments with a score lower than this value. This option only affects output.

default:

30

advanced.markduplicates.skip
label:

Skip GATK’s MarkDuplicates step

type:

basic:boolean

default:

False

advanced.markduplicates.remove_duplicates
label:

Remove found duplicates

type:

basic:boolean

default:

False

advanced.markduplicates.assume_sort_order
label:

Assume sort order

type:

basic:string

default:

choices:

  • as in BAM header (default):

  • unsorted: unsorted

  • queryname: queryname

  • coordinate: coordinate

  • duplicate: duplicate

  • unknown: unknown

advanced.bqsr.read_group
label:

Read group (@RG)

type:

basic:string

description:

This argument enables the user to replace all read groups in the INPUT file with a single new read group and assign all reads to this read group in the OUTPUT BAM file. Addition or replacement is performed using Picard’s AddOrReplaceReadGroups tool. Input should take the form of -name=value delimited by a \t, e.g. “-ID=1\t-PL=Illumina\t-SM=sample_1”. See AddOrReplaceReadGroups documentation for more information on tag names. Note that PL, LB, PU and SM are required fields.

default:

-LB=NA;-PL=NA;-PU=NA;-SM=sample
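
The description above gives a tab-delimited example while the default value is delimited by semicolons, so the Python sketch below accepts either delimiter when splitting a read group string into tags. It is a hypothetical helper for checking such a string before submitting it, not the parser the process uses; the names `parse_read_group` and `REQUIRED_TAGS` are assumptions, and the required tags are taken from the note in the description.

```python
import re

# Tags the description lists as required.
REQUIRED_TAGS = ("LB", "PL", "PU", "SM")


def parse_read_group(spec):
    """Split a '-name=value' read group string into a dict and check required tags."""
    # Treat a literal backslash-t the same as a real tab character.
    normalized = spec.replace("\\t", "\t")
    tags = {}
    for token in re.split(r"[;\t]", normalized):
        token = token.strip()
        if not token:
            continue
        name, separator, value = token.lstrip("-").partition("=")
        if not separator or not name:
            raise ValueError(f"Expected -name=value, got {token!r}")
        tags[name] = value
    missing = [tag for tag in REQUIRED_TAGS if tag not in tags]
    if missing:
        raise ValueError("Missing required read group tags: " + ", ".join(missing))
    return tags


tags = parse_read_group("-LB=NA;-PL=NA;-PU=NA;-SM=sample")
```

Under this check the default value passes, while a string that omits LB or PU is rejected.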

advanced.summary.adapters
label:

Adapter sequences

type:

data:seq:nucleotide

required:

False

advanced.summary.max_insert_size
label:

Maximum insert size

type:

basic:integer

default:

100000

advanced.summary.pair_orientation
label:

Pair orientation

type:

basic:string

default:

null

choices:

  • Unspecified: null

  • FR: FR

  • RF: RF

  • TANDEM: TANDEM

advanced.wgs_metrics.read_length
label:

Average read length

type:

basic:integer

default:

150

advanced.wgs_metrics.min_map_quality
label:

Minimum mapping quality for a read to contribute coverage

type:

basic:integer

default:

20

advanced.wgs_metrics.min_quality
label:

Minimum base quality for a base to contribute coverage

type:

basic:integer

description:

N bases will be treated as having a base quality of negative infinity and will therefore be excluded from coverage regardless of the value of this parameter.

default:

20

advanced.wgs_metrics.coverage_cap
label:

Maximum coverage cap

type:

basic:integer

description:

Treat positions with coverage exceeding this value as if they had coverage at this set value.

default:

250

advanced.wgs_metrics.accumulation_cap
label:

Ignore positions with coverage above this value

type:

basic:integer

description:

At positions with coverage exceeding this value, completely ignore reads that accumulate beyond this value.

default:

100000

advanced.wgs_metrics.sample_size
label:

Sample Size used for Theoretical Het Sensitivity sampling

type:

basic:integer

default:

10000

advanced.insert_size.minimum_fraction
label:

Minimum fraction of reads in a category to be considered

type:

basic:decimal

description:

When generating the histogram, discard any data categories (out of FR, TANDEM, RF) that have fewer than this fraction of overall reads (range: 0 to 0.5).

default:

0.05

advanced.insert_size.include_duplicates
label:

Include reads marked as duplicates in the insert size histogram

type:

basic:boolean

default:

False

advanced.insert_size.deviations
label:

Deviations limit

type:

basic:decimal

description:

Generate mean, standard deviation and plots by trimming the data down to MEDIAN + DEVIATIONS * MEDIAN_ABSOLUTE_DEVIATION. This is done because insert size data typically includes enough anomalous values from chimeras and other artifacts to make the mean and standard deviation grossly misleading regarding the real distribution.

default:

10.0

advanced.hc.stand_call_conf
label:

Min call confidence threshold

type:

basic:integer

description:

The minimum phred-scaled confidence threshold at which variants should be called.

default:

20

advanced.hc.mbq
label:

Min Base Quality

type:

basic:integer

description:

Minimum base quality required to consider a base for calling.

default:

20

Output results

WGS analysis (GVCF)

data:workflow:wgs:gvcf:workflow-wgs-gvcf (data:reads:fastq:paired  reads, data:alignment:bam  aligned_reads, data:seq:nucleotide  ref_seq, data:index:bwamem2  bwa_index, list:data:variants:vcf  known_sites, basic:boolean  enable_trimming, data:seq:nucleotide  adapters, basic:integer  seed_mismatches, basic:integer  simple_clip_threshold, basic:integer  min_adapter_length, basic:integer  palindrome_clip_threshold, basic:integer  leading, basic:integer  trailing, basic:integer  minlen, data:bed  intervals, basic:integer  contamination, data:seq:nucleotide  adapters, basic:integer  max_insert_size, basic:string  pair_orientation, basic:integer  read_length, basic:integer  min_map_quality, basic:integer  min_quality, basic:integer  coverage_cap, basic:integer  accumulation_cap, basic:integer  sample_size, basic:decimal  minimum_fraction, basic:boolean  include_duplicates, basic:decimal  deviations)[Source: v2.3.0]

Whole genome sequencing pipeline (GATK GVCF). The pipeline follows GATK best practices recommendations and prepares single-sample paired-end sequencing data for a joint-genotyping step. The pipeline steps include read trimming (Trimmomatic), read alignment (BWA-MEM2), marking of duplicates (Picard MarkDuplicates), recalibration of base quality scores (ApplyBQSR) and calling of variants (GATK HaplotypeCaller in GVCF mode). The QC reports (FASTQC report, Picard AlignmentSummaryMetrics, CollectWgsMetrics and InsertSizeMetrics) are summarized using MultiQC.

Input arguments

reads
label:

Input sample (FASTQ)

type:

data:reads:fastq:paired

description:

Input data in FASTQ format. This input type allows for optional read trimming procedure and is mutually exclusive with the BAM input file type.

required:

False

disabled:

aligned_reads

hidden:

False

aligned_reads
label:

Input sample (BAM)

type:

data:alignment:bam

description:

Input data in BAM format. This input file type is mutually exclusive with the FASTQ input file type and does not allow for read trimming procedure.

required:

False

disabled:

reads

hidden:

False
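
Because the FASTQ and BAM inputs are mutually exclusive and only the FASTQ input allows trimming, a caller may want to validate the combination before submitting it. The Python sketch below is one hypothetical way to do so; the function name `select_input_type` is an assumption, as is treating a submission with neither input as an error.

```python
def select_input_type(reads=None, aligned_reads=None, enable_trimming=False):
    """Return 'fastq' or 'bam' after checking that exactly one input is given."""
    if (reads is None) == (aligned_reads is None):
        raise ValueError("Provide exactly one of 'reads' (FASTQ) or 'aligned_reads' (BAM)")
    if aligned_reads is not None and enable_trimming:
        # The BAM input does not allow the read trimming procedure.
        raise ValueError("Read trimming is only available for FASTQ input")
    return "fastq" if reads is not None else "bam"


input_type = select_input_type(reads="sample_fastq_id", enable_trimming=True)
```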

ref_seq
label:

Reference sequence

type:

data:seq:nucleotide

required:

True

disabled:

False

hidden:

False

bwa_index
label:

BWA genome index

type:

data:index:bwamem2

required:

True

disabled:

False

hidden:

False

known_sites
label:

Known sites of variation (VCF)

type:

list:data:variants:vcf

required:

True

disabled:

False

hidden:

False

trimming_options.enable_trimming
label:

Trim and quality filter input data

type:

basic:boolean

description:

Enable or disable adapter trimming and QC filtering procedure.

required:

True

disabled:

False

hidden:

False

default:

False

trimming_options.adapters
label:

Adapter sequences

type:

data:seq:nucleotide

description:

Adapter sequences in FASTA format that will be removed from the reads.

required:

False

disabled:

!trimming_options.enable_trimming

hidden:

False

trimming_options.seed_mismatches
label:

Seed mismatches

type:

basic:integer

description:

Specifies the maximum mismatch count which will still allow a full match to be performed. This field is required to perform adapter trimming.

required:

False

disabled:

!trimming_options.adapters

hidden:

False

trimming_options.simple_clip_threshold
label:

Simple clip threshold

type:

basic:integer

description:

Specifies how accurate the match between any adapter sequence must be against a read. This field is required to perform adapter trimming.

required:

False

disabled:

!trimming_options.adapters

hidden:

False

trimming_options.min_adapter_length
label:

Minimum adapter length

type:

basic:integer

description:

In addition to the alignment score, palindrome mode can verify that a minimum length of adapter has been detected. If unspecified, this defaults to 8 bases, for historical reasons. However, since palindrome mode has a very low false positive rate, this can be safely reduced, even down to 1, to allow shorter adapter fragments to be removed.

required:

True

disabled:

!trimming_options.seed_mismatches && !trimming_options.simple_clip_threshold && !trimming_options.palindrome_clip_threshold

hidden:

False

default:

8

trimming_options.palindrome_clip_threshold
label:

Palindrome clip threshold

type:

basic:integer

description:

Specifies how accurate the match between the two adapter ligated reads must be for PE palindrome read alignment. This field is required to perform adapter trimming.

required:

False

disabled:

!trimming_options.adapters

hidden:

False

trimming_options.leading
label:

Leading quality

type:

basic:integer

description:

Remove low quality bases from the beginning, if below a threshold quality.

required:

False

disabled:

!trimming_options.enable_trimming

hidden:

False

trimming_options.trailing
label:

Trailing quality

type:

basic:integer

description:

Remove low quality bases from the end, if below a threshold quality.

required:

False

disabled:

!trimming_options.enable_trimming

hidden:

False

trimming_options.minlen
label:

Minimum length

type:

basic:integer

description:

Drop the read if it is below a specified length.

required:

False

disabled:

!trimming_options.enable_trimming

hidden:

False

gatk_options.intervals
label:

Intervals BED file

type:

data:bed

description:

Use intervals BED file to limit the analysis to the specified parts of the genome.

required:

False

disabled:

False

hidden:

False

gatk_options.contamination
label:

Contamination fraction

type:

basic:integer

description:

Fraction of contamination in sequencing data (for all samples) to aggressively remove.

required:

True

disabled:

False

hidden:

False

default:

0

alignment_summary.adapters
label:

Adapter sequences

type:

data:seq:nucleotide

required:

False

disabled:

False

hidden:

False

alignment_summary.max_insert_size
label:

Maximum insert size

type:

basic:integer

required:

True

disabled:

False

hidden:

False

default:

100000

alignment_summary.pair_orientation
label:

Pair orientation

type:

basic:string

required:

True

disabled:

False

hidden:

False

default:

null

choices:

  • Unspecified: null

  • FR: FR

  • RF: RF

  • TANDEM: TANDEM

wgs_metrics.read_length
label:

Average read length

type:

basic:integer

required:

True

disabled:

False

hidden:

False

default:

150

wgs_metrics.min_map_quality
label:

Minimum mapping quality for a read to contribute coverage

type:

basic:integer

required:

True

disabled:

False

hidden:

False

default:

20

wgs_metrics.min_quality
label:

Minimum base quality for a base to contribute coverage

type:

basic:integer

description:

N bases will be treated as having a base quality of negative infinity and will therefore be excluded from coverage regardless of the value of this parameter.

required:

True

disabled:

False

hidden:

False

default:

20

wgs_metrics.coverage_cap
label:

Maximum coverage cap

type:

basic:integer

description:

Treat positions with coverage exceeding this value as if they had coverage at this set value.

required:

True

disabled:

False

hidden:

False

default:

250

wgs_metrics.accumulation_cap
label:

Ignore positions with coverage above this value

type:

basic:integer

description:

At positions with coverage exceeding this value, completely ignore reads that accumulate beyond this value.

required:

True

disabled:

False

hidden:

False

default:

100000

wgs_metrics.sample_size
label:

Sample size used for Theoretical Het Sensitivity sampling

type:

basic:integer

required:

True

disabled:

False

hidden:

False

default:

10000

insert_size.minimum_fraction
label:

Minimum fraction of reads in a category to be considered

type:

basic:decimal

description:

When generating the histogram, discard any data categories (out of FR, TANDEM, RF) that have fewer than this fraction of overall reads (range: 0 to 0.5).

required:

True

disabled:

False

hidden:

False

default:

0.05

insert_size.include_duplicates
label:

Include reads marked as duplicates in the insert size histogram

type:

basic:boolean

required:

True

disabled:

False

hidden:

False

default:

False

insert_size.deviations
label:

Deviations limit

type:

basic:decimal

description:

Generate mean, standard deviation and plots by trimming the data down to MEDIAN + DEVIATIONS * MEDIAN_ABSOLUTE_DEVIATION. This is done because insert size data typically includes enough anomalous values from chimeras and other artifacts to make the mean and standard deviation grossly misleading regarding the real distribution.

required:

True

disabled:

False

hidden:

False

default:

10.0

Output results

WGS preprocess data with bwa-mem2

data:alignment:bam:wgsbwa2:wgs-preprocess-bwa2 (data:reads:fastq:paired  reads, data:alignment:bam  aligned_reads, data:seq:nucleotide  ref_seq, data:index:bwamem2  bwa_index, list:data:variants:vcf  known_sites, basic:integer  pixel_distance, basic:integer  n_jobs)[Source: v1.4.0]

Prepare an analysis-ready BAM file. This process follows the GATK best practices procedure for preparing an analysis-ready BAM file. The steps included are read alignment using BWA-MEM2, marking of duplicates (Picard MarkDuplicates), BAM sorting, read-group assignment and base quality score recalibration (BQSR).

Input arguments

reads
label:

Input sample (FASTQ)

type:

data:reads:fastq:paired

required:

False

disabled:

False

hidden:

False

aligned_reads
label:

Input sample (BAM)

type:

data:alignment:bam

required:

False

disabled:

False

hidden:

False

ref_seq
label:

Reference sequence

type:

data:seq:nucleotide

required:

True

disabled:

False

hidden:

False

bwa_index
label:

BWA-MEM2 genome index

type:

data:index:bwamem2

required:

True

disabled:

False

hidden:

False

known_sites
label:

Known sites of variation (VCF)

type:

list:data:variants:vcf

required:

True

disabled:

False

hidden:

False

advanced_options.pixel_distance
label:

--OPTICAL_DUPLICATE_PIXEL_DISTANCE

type:

basic:integer

description:

Set the optical pixel distance, i.e. the maximum distance between clusters for them to be considered optical duplicates. Modify this parameter to ensure compatibility with older Illumina platforms.

required:

True

disabled:

False

hidden:

False

default:

2500

advanced_options.n_jobs
label:

Number of concurrent jobs

type:

basic:integer

description:

Use a fixed number of jobs for quality score recalibration instead of determining it based on the number of available cores.

required:

False

disabled:

False

hidden:

False

Output results

bam
label:

Analysis ready BAM file

type:

basic:file

required:

True

disabled:

False

hidden:

False

bai
label:

BAM file index

type:

basic:file

required:

True

disabled:

False

hidden:

False

stats
label:

Alignment statistics

type:

basic:file

required:

True

disabled:

False

hidden:

False

species
label:

Species

type:

basic:string

required:

True

disabled:

False

hidden:

False

build
label:

Build

type:

basic:string

required:

True

disabled:

False

hidden:

False

metrics_file
label:

Metrics from the MarkDuplicates process

type:

basic:file

required:

True

disabled:

False

hidden:

False

Whole exome sequencing (WES) analysis

data:workflow:wesworkflow-wes (data:reads:fastq:paired  reads, data:index:bwa  bwa_index, data:seq:nucleotide  ref_seq, list:data:variants:vcf  known_sites, data:bed  intervals, data:variants:vcf  hc_dbsnp, basic:string  validation_stringency, data:seq:nucleotide  adapters, basic:integer  seed_mismatches, basic:integer  simple_clip_threshold, basic:integer  min_adapter_length, basic:integer  palindrome_clip_threshold, basic:integer  leading, basic:integer  trailing, basic:integer  minlen, basic:integer  seed_l, basic:integer  band_w, basic:boolean  m, basic:decimal  re_seeding, basic:integer  match, basic:integer  mismatch, basic:integer  gap_o, basic:integer  gap_e, basic:integer  clipping, basic:integer  unpaired_p, basic:integer  report_tr, data:bedpe  bedpe, basic:boolean  skip, basic:boolean  md_skip, basic:boolean  md_remove_duplicates, basic:string  md_assume_sort_order, basic:string  read_group, basic:integer  stand_call_conf, basic:integer  mbq)[Source: v3.1.0]

The whole exome sequencing pipeline analyzes Illumina panel data. It consists of trimming, aligning, soft clipping, (optional) marking of duplicates, recalibration of base quality scores and, finally, calling of variants. Trimming is performed with Trimmomatic and alignment with BWA (mem). Soft clipping of Illumina primer sequences is done with the bamclipper tool. Marking of duplicates (MarkDuplicates), recalibration of base quality scores (ApplyBQSR) and calling of variants (HaplotypeCaller) are done with the GATK4 bundle of bioinformatics tools. To run this pipeline successfully, you will need a genome (FASTA), paired-end reads (FASTQ), a BEDPE file for bamclipper, known sites of variation (dbSNP) (VCF), a dbSNP database of variations (can be the same as the known sites of variation), the intervals on which target capture was done (BED) and Illumina adapter sequences (FASTA). Make sure that the specified resources match the genome used in the alignment step. The result is a file of called variants (VCF).

Input arguments

reads
label:

Raw untrimmed reads

type:

data:reads:fastq:paired

description:

Raw paired-end reads.

bwa_index
label:

BWA genome index

type:

data:index:bwa

description:

Genome index used for the BWA alignment step.

ref_seq
label:

Genome FASTA

type:

data:seq:nucleotide

description:

The selection of Genome FASTA should match the BWA index species and genome build type.

known_sites
label:

Known sites of variation used in BQSR

type:

list:data:variants:vcf

description:

Known sites of variation as a VCF file.

intervals
label:

Intervals

type:

data:bed

description:

Use intervals to narrow the analysis to defined regions. This usually helps cut down on processing time.

hc_dbsnp
label:

dbSNP for GATK4’s HaplotypeCaller

type:

data:variants:vcf

description:

dbSNP database of variants for variant calling.

validation_stringency
label:

Validation stringency for all SAM files read by this program. Setting stringency to SILENT can improve performance when processing a BAM file in which variable-length data (read, qualities, tags) do not otherwise need to be decoded. Default is STRICT. This setting is used in BaseRecalibrator and ApplyBQSR processes.

type:

basic:string

default:

STRICT

choices:

  • STRICT: STRICT

  • SILENT: SILENT

  • LENIENT: LENIENT

advanced.trimming.adapters
label:

Adapter sequences

type:

data:seq:nucleotide

description:

Adapter sequences in FASTA format that will be removed from the read. This field as well as the ‘Seed mismatches’, ‘Simple clip threshold’ and ‘Palindrome clip threshold’ parameters are needed to perform Illuminaclipping. ‘Minimum adapter length’ and ‘Keep both reads’ are optional parameters.

required:

False

advanced.trimming.seed_mismatches
label:

Seed mismatches

type:

basic:integer

description:

Specifies the maximum mismatch count which will still allow a full match to be performed. This field as well as the ‘Adapter sequences’, ‘Simple clip threshold’ and ‘Palindrome clip threshold’ parameters are needed to perform Illuminaclipping.

required:

False

disabled:

!advanced.trimming.adapters

advanced.trimming.simple_clip_threshold
label:

Simple clip threshold

type:

basic:integer

description:

Specifies how accurate the match between any adapter (or similar) sequence and a read must be. This field as well as the ‘Adapter sequences’ and ‘Seed mismatches’ parameters are needed to perform Illuminaclipping.

required:

False

disabled:

!advanced.trimming.adapters

advanced.trimming.min_adapter_length
label:

Minimum adapter length

type:

basic:integer

description:

In addition to the alignment score, palindrome mode can verify that a minimum length of adapter has been detected. If unspecified, this defaults to 8 bases, for historical reasons. However, since palindrome mode has a very low false positive rate, this can be safely reduced, even down to 1, to allow shorter adapter fragments to be removed. This field is optional for performing Illuminaclipping. ‘Adapter sequences’, ‘Seed mismatches’, ‘Simple clip threshold’ and ‘Palindrome clip threshold’ are also needed in order to use this parameter.

disabled:

!advanced.trimming.seed_mismatches && !advanced.trimming.simple_clip_threshold && !advanced.trimming.palindrome_clip_threshold

default:

8

advanced.trimming.palindrome_clip_threshold
label:

Palindrome clip threshold

type:

basic:integer

description:

Specifies how accurate the match between the two ‘adapter ligated’ reads must be for PE palindrome read alignment. This field as well as ‘Adapter sequence’, ‘Simple clip threshold’ and ‘Seed mismatches’ parameters are needed to perform Illuminaclipping.

required:

False

disabled:

!advanced.trimming.adapters

advanced.trimming.leading
label:

Leading quality

type:

basic:integer

description:

Remove low quality bases from the beginning, if below a threshold quality.

required:

False

advanced.trimming.trailing
label:

Trailing quality

type:

basic:integer

description:

Remove low quality bases from the end, if below a threshold quality.

required:

False

advanced.trimming.minlen
label:

Minimum length

type:

basic:integer

description:

Drop the read if it is below a specified length.

required:

False

advanced.align.seed_l
label:

Minimum seed length

type:

basic:integer

description:

Minimum seed length. Matches shorter than the minimum seed length will be missed. The alignment speed is usually insensitive to this value unless it deviates significantly from 20.

default:

19

advanced.align.band_w
label:

Band width

type:

basic:integer

description:

Gaps longer than this will not be found.

default:

100

advanced.align.m
label:

Mark shorter split hits as secondary

type:

basic:boolean

description:

Mark shorter split hits as secondary (for Picard compatibility)

default:

False

advanced.align.re_seeding
label:

Re-seeding factor

type:

basic:decimal

description:

Trigger re-seeding for a MEM longer than minSeedLen*FACTOR. This is a key heuristic parameter for tuning the performance. Larger value yields fewer seeds, which leads to faster alignment speed but lower accuracy.

default:

1.5
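
As a small worked example of the rule above, the sketch below computes the MEM length beyond which re-seeding is triggered. The function names are hypothetical; the defaults mirror the ‘Minimum seed length’ (19) and ‘Re-seeding factor’ (1.5) defaults listed in this section.

```python
def reseeding_threshold(seed_l=19, re_seeding=1.5):
    """MEM length above which re-seeding is triggered (minSeedLen * FACTOR)."""
    return seed_l * re_seeding


def triggers_reseeding(mem_length, seed_l=19, re_seeding=1.5):
    """True if a MEM of the given length is longer than the threshold."""
    return mem_length > reseeding_threshold(seed_l, re_seeding)
```

With the defaults the threshold is 28.5, so a MEM of 29 bases is re-seeded while one of 28 bases is not.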

advanced.align.scoring.match
label:

Score of a match

type:

basic:integer

default:

1

advanced.align.scoring.mismatch
label:

Mismatch penalty

type:

basic:integer

default:

4

advanced.align.scoring.gap_o
label:

Gap open penalty

type:

basic:integer

default:

6

advanced.align.scoring.gap_e
label:

Gap extension penalty

type:

basic:integer

default:

1
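
To show how the four scoring parameters combine, here is a minimal sketch that scores an alignment summarized as counts of matches, mismatches and gap lengths. It assumes an affine gap model in which a gap of length k costs gap_o + k * gap_e; the function name `alignment_score` and the summary-style input are illustrative assumptions, not part of the aligner.

```python
def alignment_score(matches, mismatches, gap_lengths,
                    match=1, mismatch=4, gap_o=6, gap_e=1):
    """Score an alignment from its match/mismatch counts and gap lengths.

    Assumption: each gap of length k is charged gap_o + k * gap_e.
    """
    score = matches * match - mismatches * mismatch
    for k in gap_lengths:
        score -= gap_o + k * gap_e
    return score
```

For example, 95 matches, 3 mismatches and one 2-base gap give 95 - 12 - 8 = 75 under the default values.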

advanced.align.scoring.clipping
label:

Clipping penalty

type:

basic:integer

description:

Clipping is applied if final alignment score is smaller than (best score reaching the end of query) - (Clipping penalty)

default:

5

advanced.align.scoring.unpaired_p
label:

Penalty for an unpaired read pair

type:

basic:integer

description:

Affinity to force pair. Score: scoreRead1+scoreRead2-Penalty

default:

9
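
The formula in the description above can be written out as a one-line helper. The function name `unpaired_pair_score` is hypothetical and the sketch only restates the stated formula.

```python
def unpaired_pair_score(score_read1, score_read2, unpaired_p=9):
    """Score of a read pair treated as unpaired: scoreRead1 + scoreRead2 - Penalty."""
    return score_read1 + score_read2 - unpaired_p
```

A larger penalty lowers the score of the unpaired interpretation, which makes the aligner more inclined to report the reads as a proper pair.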

advanced.align.report_tr
label:

Report threshold score

type:

basic:integer

description:

Don’t output alignment with score lower than defined number. This option only affects output.

default:

30

advanced.bamclipper.bedpe
label:

BEDPE file used for clipping using Bamclipper

type:

data:bedpe

description:

BEDPE file used for clipping using Bamclipper tool.

required:

False

advanced.bamclipper.skip
label:

Skip Bamclipper step

type:

basic:boolean

description:

Use this option to skip Bamclipper step.

default:

False

advanced.markduplicates.md_skip
label:

Skip GATK’s MarkDuplicates step

type:

basic:boolean

default:

False

advanced.markduplicates.md_remove_duplicates
label:

Remove found duplicates

type:

basic:boolean

default:

False

advanced.markduplicates.md_assume_sort_order
label:

Assume sort order

type:

basic:string

default:

choices:

  • as in BAM header (default):

  • unsorted: unsorted

  • queryname: queryname

  • coordinate: coordinate

  • duplicate: duplicate

  • unknown: unknown

advanced.bqsr.read_group
label:

Read group (@RG)

type:

basic:string

description:

If BAM file has not been prepared using a @RG tag, you can add it here. This argument enables the user to replace all read groups in the INPUT file with a single new read group and assign all reads to this read group in the OUTPUT BAM file. Addition or replacement is performed using Picard’s AddOrReplaceReadGroups tool. Input should take the form of -name=value delimited by a \t, e.g. “-ID=1\t-PL=Illumina\t-SM=sample_1”. See AddOrReplaceReadGroups documentation for more information on tag names. Note that PL, LB, PU and SM are required fields. See caveats of rewriting read groups in the documentation linked above.

required:

False
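
A minimal sketch of assembling the read group string in the form described above. The helper name `build_read_group` is hypothetical, and it assumes the delimiter is the literal two-character sequence `\t` (as typed in the example) rather than a tab character.

```python
REQUIRED_TAGS = ("PL", "LB", "PU", "SM")


def build_read_group(tags):
    """Build a '-name=value' string delimited by a literal backslash-t.

    Raises ValueError if any of the required tags (PL, LB, PU, SM) is missing.
    """
    missing = [tag for tag in REQUIRED_TAGS if tag not in tags]
    if missing:
        raise ValueError(f"missing required read group tags: {missing}")
    # Assumption: the field expects the two characters '\' and 't', not a tab.
    return "\\t".join(f"-{name}={value}" for name, value in tags.items())
```

Calling it with ID, PL, LB, PU and SM entries yields a string such as `-ID=1\t-PL=Illumina\t-LB=lib1\t-PU=unit1\t-SM=sample_1`.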

advanced.hc.stand_call_conf
label:

Min call confidence threshold

type:

basic:integer

description:

The minimum phred-scaled confidence threshold at which variants should be called.

default:

20

advanced.hc.mbq
label:

Min Base Quality

type:

basic:integer

description:

Minimum base quality required to consider a base for calling.

default:

20

Output results

Xengsort classify

data:xengsort:classification:xengsort-classify (data:reads:fastq  reads, data:xengsort:index  index, basic:string  upload_reads, basic:boolean  merge_both, basic:decimal  chunksize)[Source: v1.0.0]

Classify xenograft reads with Xengsort. Xengsort is an alignment free method for sorting reads from xenograft experiments. It classifies sequencing reads into five categories based on their origin: host, graft, both, neither, and ambiguous. Categories “host” and “graft” are for reads that can be clearly assigned to one of the species. Category “both” is for reads that match equally well to both references. Category “neither” is for reads that contain many k-mers that cannot be found in the key-value store; these could point to technical problems (primer dimers) or contamination of the sample with other species. Finally, category “ambiguous” is for reads that provide conflicting information. Such reads should not usually be seen; they could result from PCR hybrids between host and graft during library preparation. Description of the method and evaluation on several datasets is provided in the [article](https://doi.org/10.1186/s13015-021-00181-w).

Input arguments

reads
label:

Reads

type:

data:reads:fastq

required:

True

disabled:

False

hidden:

False

index
label:

Xengsort genome index

type:

data:xengsort:index

required:

True

disabled:

False

hidden:

False

upload_reads
label:

Select reads to upload

type:

basic:string

description:

All read categories are returned in this process but only the ones selected are uploaded as separate FASTQ files. This should be used for categories of reads that will be used in further analyses.

required:

True

disabled:

False

hidden:

False

default:

none

choices:

  • none: none

  • all: all

  • graft: graft

  • graft, both: graft, both

  • graft, host: graft, host

  • graft, host, both: graft, host, both

merge_both
label:

Upload merged graft and both reads

type:

basic:boolean

description:

Merge graft reads with the reads that can originate from both genomes and upload them as graft reads. In any workflow, reads classified as both may pose problems, because one may not be able to decide on the species of origin due to ultra-conserved regions between species.

required:

True

disabled:

False

hidden:

upload_reads == 'none'

default:

False

advanced.chunksize
label:

Chunk size in MB [--chunksize]

type:

basic:decimal

description:

Control memory usage by setting the chunk size per thread.

required:

True

disabled:

False

hidden:

False

default:

16.0

Output results

stats
label:

Xengsort classification statistics

type:

basic:file

required:

True

disabled:

False

hidden:

False

host1
label:

Host reads (mate 1)

type:

basic:file

required:

True

disabled:

False

hidden:

False

host2
label:

Host reads (mate 2)

type:

basic:file

required:

False

disabled:

False

hidden:

False

graft1
label:

Graft reads (mate 1)

type:

basic:file

required:

True

disabled:

False

hidden:

False

graft2
label:

Graft reads (mate 2)

type:

basic:file

required:

False

disabled:

False

hidden:

False

both1
label:

Both reads (mate 1)

type:

basic:file

required:

True

disabled:

False

hidden:

False

both2
label:

Both reads (mate 2)

type:

basic:file

required:

False

disabled:

False

hidden:

False

neither1
label:

Neither reads (mate 1)

type:

basic:file

required:

True

disabled:

False

hidden:

False

neither2
label:

Neither reads (mate 2)

type:

basic:file

required:

False

disabled:

False

hidden:

False

ambiguous1
label:

Ambiguous reads (mate 1)

type:

basic:file

required:

True

disabled:

False

hidden:

False

ambiguous2
label:

Ambiguous reads (mate 2)

type:

basic:file

required:

False

disabled:

False

hidden:

False

graft_species
label:

Graft species

type:

basic:string

required:

True

disabled:

False

hidden:

False

graft_build
label:

Graft build

type:

basic:string

required:

True

disabled:

False

hidden:

False

host_species
label:

Host species

type:

basic:string

required:

True

disabled:

False

hidden:

False

host_build
label:

Host build

type:

basic:string

required:

True

disabled:

False

hidden:

False

Xengsort index

data:xengsort:index:xengsort-index (list:data:seq:nucleotide  graft_refs, list:data:seq:nucleotide  host_refs, basic:integer  n_kmer, basic:integer  kmer_size, basic:boolean  aligned_cache, basic:boolean  fixed_hashing, basic:integer  page_size, basic:decimal  fill)[Source: v1.0.1]

Build an index for sorting xenograft reads with Xengsort. Xengsort is an alignment free method for sorting reads from xenograft experiments. Description of the method and evaluation on several datasets is provided in the [article](https://doi.org/10.1186/s13015-021-00181-w).

Input arguments

graft_refs
label:

Graft reference sequences (nucleotide FASTA)

type:

list:data:seq:nucleotide

required:

True

disabled:

False

hidden:

False

host_refs
label:

Host reference sequences (nucleotide FASTA)

type:

list:data:seq:nucleotide

required:

True

disabled:

False

hidden:

False

n_kmer
label:

Number of distinct k-mers [--nobjects]

type:

basic:integer

description:

The number of k-mers that will be stored in the hash table. This depends on the reference genomes used. If the number of distinct k-mers is known beforehand, it should be specified. For all 25-mers in the human and mouse genome and transcriptome, this number is roughly 4,500,000,000. If this is not set, the number is estimated with the ntCard tool and increased by two percent to account for errors.

required:

False

disabled:

False

hidden:

False

advanced.kmer_size
label:

k-mer size [--kmersize]

type:

basic:integer

required:

True

disabled:

False

hidden:

False

default:

25

advanced.aligned_cache
label:

Use power-of-two aligned pages [--aligned]

type:

basic:boolean

description:

Indicates whether each bucket should consume a number of bits that is a power of 2. Using --aligned ensures that each bucket stays within the same cache line, but may waste space (padding bits), yielding faster speed but larger space requirements. By default no bits are used for padding and buckets may cross cache line boundaries [--unaligned]. This is slightly slower, but may save a little or a lot of space.

required:

True

disabled:

False

hidden:

False

default:

False

advanced.fixed_hashing
label:

Use fixed hash function [--hashfunctions]

type:

basic:boolean

description:

The hash function used to store the key-value pairs is defined by the --hashfunctions parameter. With this option selected, a fixed hash function (linear945:linear9123641:linear349341847) is used. When it is not selected, different random functions are chosen each time. It is recommended to have them chosen randomly unless you need strictly reproducible behavior.

required:

True

disabled:

False

hidden:

False

default:

True

advanced.page_size
label:

Number of elements stored in one bucket (or page) [--pagesize]

type:

basic:integer

required:

True

disabled:

False

hidden:

False

default:

4

advanced.fill
label:

Fill rate of the hash table [--fill]

type:

basic:decimal

description:

This determines the desired fill rate or load factor of the hash table. It should be set between 0.0 and 1.0. It is beneficial to leave part of the hash table empty for faster lookups. Together with the number of distinct k-mers [--nobjects], the number of slots in the table is calculated as ceil(nobjects/fill).

required:

True

disabled:

False

hidden:

False

default:

0.88
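
The slot calculation ceil(nobjects/fill) from the description above, written out as a short sketch. The function name `table_slots` is hypothetical, and rejecting fill values outside (0, 1] is an added assumption.

```python
import math


def table_slots(nobjects, fill=0.88):
    """Number of hash table slots, computed as ceil(nobjects / fill)."""
    if not 0.0 < fill <= 1.0:
        raise ValueError("fill must be greater than 0.0 and at most 1.0")
    return math.ceil(nobjects / fill)
```

For the roughly 4,500,000,000 distinct 25-mers mentioned for human and mouse, the default fill rate of 0.88 gives about 5.11 billion slots.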

Output results

index
label:

Xengsort index

type:

basic:file

required:

True

disabled:

False

hidden:

False

stats
label:

Xengsort statistics

type:

basic:file

required:

True

disabled:

False

hidden:

False

graft_species
label:

Graft species

type:

basic:string

required:

True

disabled:

False

hidden:

False

graft_build
label:

Graft build

type:

basic:string

required:

True

disabled:

False

hidden:

False

host_species
label:

Host species

type:

basic:string

required:

True

disabled:

False

hidden:

False

host_build
label:

Host build

type:

basic:string

required:

True

disabled:

False

hidden:

False

alignmentSieve

data:alignment:bam:sieve:alignmentsieve (data:alignment:bam  alignment, basic:integer  min_fragment_length, basic:integer  max_fragment_length)[Source: v1.5.3]

Filter alignments of BAM files according to specified parameters. The program is bundled with deeptools. See [documentation](https://deeptools.readthedocs.io/en/develop/content/tools/alignmentSieve.html) for more details.

Input arguments

alignment
label:

Alignment BAM file

type:

data:alignment:bam

required:

True

disabled:

False

hidden:

False

min_fragment_length
label:

--minFragmentLength

type:

basic:integer

description:

The minimum fragment length needed for read/pair inclusion. This option is primarily useful in ATACseq experiments, for filtering mono- or di-nucleosome fragments. (Default: 0)

required:

True

disabled:

False

hidden:

False

default:

0

max_fragment_length
label:

--maxFragmentLength

type:

basic:integer

description:

The maximum fragment length needed for read/pair inclusion. A value of 0 indicates no limit. (Default: 0)

required:

True

disabled:

False

hidden:

False

default:

0
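
The two fragment length limits can be illustrated with a small predicate. This is a sketch under stated assumptions: the function name `keep_fragment` is hypothetical, both bounds are treated as inclusive, and it is not how alignmentSieve itself is implemented.

```python
def keep_fragment(length, min_fragment_length=0, max_fragment_length=0):
    """Return True if a fragment of the given length passes both limits.

    A max_fragment_length of 0 means no upper limit.
    """
    if length < min_fragment_length:
        return False
    if max_fragment_length > 0 and length > max_fragment_length:
        return False
    return True
```

With both defaults left at 0 every fragment is kept; setting only the maximum keeps short (for example nucleosome-free) fragments.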

Output results

bam
label:

Sieved BAM file

type:

basic:file

required:

True

disabled:

False

hidden:

False

bai
label:

Index of sieved BAM file

type:

basic:file

required:

True

disabled:

False

hidden:

False

stats
label:

Alignment statistics

type:

basic:file

required:

True

disabled:

False

hidden:

False

species
label:

Species

type:

basic:string

required:

True

disabled:

False

hidden:

False

build
label:

Build

type:

basic:string

required:

True

disabled:

False

hidden:

False

edgeR

data:differentialexpression:edger:differentialexpression-edger (list:data:expression  case, list:data:expression  control, basic:integer  count_filter, basic:boolean  create_sets, basic:decimal  logfc, basic:decimal  fdr)[Source: v1.7.0]

Run EdgeR analysis. Empirical Analysis of Digital Gene Expression Data in R (edgeR). Differential expression analysis of RNA-seq expression profiles with biological replication. Implements a range of statistical methodology based on the negative binomial distributions, including empirical Bayes estimation, exact tests, generalized linear models and quasi-likelihood tests. As well as RNA-seq, it can be applied to differential signal analysis of other types of genomic data that produce counts, including ChIP-seq, Bisulfite-seq, SAGE and CAGE. See [here](https://www.bioconductor.org/packages/devel/bioc/vignettes/edgeR/inst/doc/edgeRUsersGuide.pdf) for more information.

Input arguments

case
label:

Case

type:

list:data:expression

description:

Case samples (replicates)

required:

True

disabled:

False

hidden:

False

control
label:

Control

type:

list:data:expression

description:

Control samples (replicates)

required:

True

disabled:

False

hidden:

False

count_filter
label:

Raw counts filtering threshold

type:

basic:integer

description:

Filter genes in the expression matrix input. Remove genes where the number of counts in all samples is below the threshold.

required:

True

disabled:

False

hidden:

False

default:

10

create_sets
label:

Create gene sets

type:

basic:boolean

description:

After calculating differential gene expressions create gene sets for up-regulated genes, down-regulated genes and all genes.

required:

True

disabled:

False

hidden:

False

default:

False

logfc
label:

Log2 fold change threshold for gene sets

type:

basic:decimal

description:

Genes above Log2FC are considered as up-regulated and genes below -Log2FC as down-regulated.

required:

True

disabled:

False

hidden:

!create_sets

default:

1.0

fdr
label:

FDR threshold for gene sets

type:

basic:decimal

required:

True

disabled:

False

hidden:

!create_sets

default:

0.05
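
A sketch of how the two gene set thresholds might be applied together. The function name `classify_gene` is hypothetical; combining the FDR and Log2FC thresholds with a logical AND, and treating the FDR cutoff as strict, are assumptions rather than documented behavior.

```python
def classify_gene(log2fc, fdr, logfc_threshold=1.0, fdr_threshold=0.05):
    """Return 'up', 'down' or None for a single gene.

    Assumption: a gene must pass the FDR threshold before its
    fold change is compared against +/- logfc_threshold.
    """
    if fdr >= fdr_threshold:
        return None
    if log2fc > logfc_threshold:
        return "up"
    if log2fc < -logfc_threshold:
        return "down"
    return None
```

Genes that return None would appear only in the set of all genes.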

Output results

raw
label:

Differential expression

type:

basic:file

required:

True

disabled:

False

hidden:

False

de_json
label:

Results table (JSON)

type:

basic:json

required:

True

disabled:

False

hidden:

False

de_file
label:

Results table (file)

type:

basic:file

required:

True

disabled:

False

hidden:

False

source
label:

Gene ID database

type:

basic:string

required:

True

disabled:

False

hidden:

False

species
label:

Species

type:

basic:string

required:

True

disabled:

False

hidden:

False

build
label:

Build

type:

basic:string

required:

True

disabled:

False

hidden:

False

feature_type
label:

Feature type

type:

basic:string

required:

True

disabled:

False

hidden:

False

methcounts

data:wgbs:methcountsmethcounts (data:seq:nucleotide  genome, data:alignment:bam:walt  alignment, basic:boolean  cpgs, basic:boolean  symmetric_cpgs)[Source: v3.3.0]

The methcounts program takes the mapped reads and produces the methylation level at each genomic cytosine, with the option to produce only levels for CpG-context cytosines.

Input arguments

genome
label:

Reference genome

type:

data:seq:nucleotide

alignment
label:

Mapped reads

type:

data:alignment:bam:walt

description:

WGBS alignment file in Mapped Read (.mr) format.

cpgs
label:

Only CpG context sites

type:

basic:boolean

description:

Output file will contain methylation data for CpG context sites only. Choosing this option will result in CpG content report only.

disabled:

symmetric_cpgs

default:

False

symmetric_cpgs
label:

Merge CpG pairs

type:

basic:boolean

description:

Merging CpG pairs results in symmetric methylation levels. Methylation is usually symmetric (cytosines at CpG sites were methylated on both DNA strands). Choosing this option will only keep the CpG sites data.

disabled:

cpgs

default:

True

Output results

meth
label:

Methylation levels

type:

basic:file

stats
label:

Statistics

type:

basic:file

bigwig
label:

Methylation levels BigWig file

type:

basic:file

species
label:

Species

type:

basic:string

build
label:

Build

type:

basic:string

miRNA pipeline

data:workflow:mirnaworkflow-mirna (data:reads:fastq:single  reads, data:seq:nucleotide  up_primers_file, data:seq:nucleotide  down_primers_file, list:basic:string  up_primers_seq, list:basic:string  down_primers_seq, basic:integer  min_overlap, basic:boolean  show_advanced, basic:integer  leading, basic:integer  trailing, basic:integer  minlen, basic:integer  maxlen, basic:integer  max_n, basic:boolean  match_read_wildcards, basic:boolean  no_indels, basic:decimal  error_rate, data:index:bowtie2  genome, basic:boolean  show_alignment_options, basic:string  mode, basic:string  speed, basic:integer  N, basic:integer  L, basic:string  rep_mode, basic:integer  k_reports, data:annotation  annotation, basic:string  id_attribute, basic:string  feature_class, basic:string  normalization_type, basic:boolean  allow_multi_overlap, basic:boolean  count_multi_mapping_reads, basic:string  assay_type)[Source: v3.1.0]

Input arguments

preprocessing.reads
label:

Input miRNA reads.

type:

data:reads:fastq:single

preprocessing.adapters.up_primers_file
label:

5 prime adapter file

type:

data:seq:nucleotide

required:

False

preprocessing.adapters.down_primers_file
label:

3 prime adapter file

type:

data:seq:nucleotide

required:

False

preprocessing.adapters.up_primers_seq
label:

5 prime adapter sequence

type:

list:basic:string

required:

False

preprocessing.adapters.down_primers_seq
label:

3 prime adapter sequence

type:

list:basic:string

required:

False

preprocessing.adapters.min_overlap
label:

Minimal overlap

type:

basic:integer

description:

Minimum overlap for an adapter match. Default 5.

default:

5

preprocessing.show_advanced
label:

Show advanced preprocessing parameters

type:

basic:boolean

default:

False

preprocessing.trimming.leading
label:

Quality on 5 prime

type:

basic:integer

description:

Remove low quality bases from the 5 prime end. Specifies the minimum quality required to keep a base. Default: 28.

hidden:

!preprocessing.show_advanced

default:

28

preprocessing.trimming.trailing
label:

Quality on 3 prime

type:

basic:integer

description:

Remove low quality bases from the 3 prime end. Specifies the minimum quality required to keep a base. Default: 28.

hidden:

!preprocessing.show_advanced

default:

28

preprocessing.filtering.minlen
label:

Min length

type:

basic:integer

description:

Drop the read if it is below a specified length. Default: 15.

hidden:

!preprocessing.show_advanced

default:

15

preprocessing.filtering.maxlen
label:

Max length

type:

basic:integer

description:

Drop the read if it is above a specified length. Default: 35.

hidden:

!preprocessing.show_advanced

default:

35

preprocessing.filtering.max_n
label:

Max number of N-s

type:

basic:integer

description:

Discard reads having more ‘N’ bases than specified. Default: 1.

hidden:

!preprocessing.show_advanced

default:

1
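
The three read filters above (minimum length, maximum length and maximum number of N bases) can be sketched as a single predicate. The function name `passes_filters` is hypothetical and the sketch only mirrors the rules stated in the field descriptions, using their listed defaults.

```python
def passes_filters(seq, minlen=15, maxlen=35, max_n=1):
    """Return True if a read survives the length and N-content filters."""
    # Drop reads shorter than minlen or longer than maxlen.
    if len(seq) < minlen or len(seq) > maxlen:
        return False
    # Discard reads having more 'N' bases than max_n.
    return seq.upper().count("N") <= max_n
```

A 22-nt read with at most one N passes with the defaults, while a 12-nt or 40-nt read is dropped.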

preprocessing.filtering.match_read_wildcards
label:

Match read wildcards

type:

basic:boolean

description:

Interpret IUPAC wildcards in reads.

hidden:

!preprocessing.show_advanced

default:

True

preprocessing.filtering.no_indels
label:

No indels

type:

basic:boolean

description:

Disable (disallow) insertions and deletions in adapters.

hidden:

!preprocessing.show_advanced

default:

True

preprocessing.filtering.error_rate
label:

Error rate

type:

basic:decimal

description:

Maximum allowed error rate (no. of errors divided by the length of the matching region). Default: 0.2.

hidden:

!preprocessing.show_advanced

default:

0.2
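
The error rate definition above (number of errors divided by the length of the matching region) can be checked with a short sketch. The function name `adapter_match_allowed` is hypothetical, and treating an empty matching region as a non-match is an assumption.

```python
def adapter_match_allowed(errors, match_length, error_rate=0.2):
    """Return True if errors / match_length does not exceed error_rate."""
    if match_length <= 0:
        # Assumption: an empty matching region never counts as a match.
        return False
    return errors / match_length <= error_rate
```

With the default of 0.2, one error in a 10-base match is accepted and three errors are not.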

alignment.genome
label:

Genome reference

type:

data:index:bowtie2

description:

Choose the genome reference against which to align reads.

alignment.show_alignment_options
label:

Show alignment options

type:

basic:boolean

default:

False

alignment.alignment_options.mode
label:

Alignment mode

type:

basic:string

description:

End to end: Bowtie 2 requires that the entire read align from one end to the other, without any trimming (or “soft clipping”) of characters from either end. Local: Bowtie 2 does not require that the entire read align from one end to the other. Rather, some characters may be omitted (“soft clipped”) from the ends in order to achieve the greatest possible alignment score. Default: --local (with sensitivity set to ‘--very-sensitive’ for both options).

hidden:

!alignment.show_alignment_options

default:

--local

choices:

  • local: --local

  • end to end mode: --end-to-end

alignment.alignment_options.speed
label:

Sensitivity

type:

basic:string

description:

A preset for accurate alignment. This option is a shortcut for the following parameters. For both alignment modes: --very-sensitive, which is the same as -D 20 -R 3 -N 0 -L 20 -i S,1,0.50.

hidden:

!alignment.show_alignment_options

default:

--very-sensitive

alignment.alignment_options.N
label:

Number of mismatches allowed in seed alignment (N)

type:

basic:integer

description:

Sets the number of mismatches allowed in seed. Can be set to 0 or 1. Default: 0

hidden:

!alignment.show_alignment_options

default:

0

alignment.alignment_options.L
label:

Length of seed substrings (L)

type:

basic:integer

description:

Sets the length of the seed substrings to align during multiseed alignment. The --very-sensitive preset sets -L to 20 in --end-to-end and in --local mode. For miRNA, a shorter seed length is recommended. Default: -L 8

hidden:

!alignment.show_alignment_options

default:

8

alignment.alignment_options.rep_mode
label:

Report mode

type:

basic:string

description:

Tool default mode: search for multiple alignments, report the best one; -k mode: search for one or more alignments, report each; -a mode: search for and report all alignments. Default: -k

hidden:

!alignment.show_alignment_options

default:

k

choices:

  • Tool default mode: def

  • -k mode: k

  • -a mode (very slow): a

alignment.alignment_options.k_reports
label:

Number of reports (for -k mode only)

type:

basic:integer

description:

Searches for at most X distinct, valid alignments for each read. The search terminates when it can’t find more distinct valid alignments, or when it finds X, whichever happens first. Default: 5

hidden:

!alignment.show_alignment_options

default:

5

quant_options.annotation
label:

Annotation (GTF/GFF3)

type:

data:annotation

quant_options.id_attribute
label:

ID attribute

type:

basic:string

description:

GTF/GFF3 attribute to be used as feature ID. Several GTF/GFF3 lines with the same feature ID will be considered as parts of the same feature. The feature ID is used to identify the counts in the output table. In GTF files this is usually ‘gene_id’, in GFF3 files this is often ‘ID’, and ‘transcript_id’ is frequently a valid choice for both annotation formats. miRNA name refers to the miRBase GFF3 ‘Name’ field and is the default option.

default:

Name

choices:

  • miRNA name: Name

  • gene_id: gene_id

  • transcript_id: transcript_id

  • ID: ID

  • geneid: geneid
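
To show what selecting an ID attribute means in practice, the sketch below pulls one attribute out of the ninth column of a GFF3 line. The function name and the sample attribute string are made up for illustration; the process itself may read the annotation differently.

```python
def gff3_attribute(attributes, key="Name"):
    """Return the value of `key` from a GFF3 attribute column, or None."""
    for field in attributes.strip().split(";"):
        if "=" not in field:
            continue
        name, value = field.split("=", 1)
        if name.strip() == key:
            return value.strip()
    return None


# Illustrative attribute column, not a real miRBase record
column_9 = "ID=example_id_1;Alias=example_id_1;Name=example-miR-1"
print(gff3_attribute(column_9, "Name"))
print(gff3_attribute(column_9, "ID"))
```

Lines that return the same value for the chosen attribute would be treated as parts of the same feature.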

quant_options.feature_class
label:

Feature class

type:

basic:string

description:

Feature class (3rd column in GFF file) to be used; features of all other types are ignored. Default: miRNA.

default:

miRNA

quant_options.normalization_type
label:

Normalization type

type:

basic:string

description:

The default expression normalization type.

default:

CPM
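
CPM stands for counts per million: each feature count is scaled by the total count in the sample. The following is a minimal sketch of that arithmetic only; the function name and the feature names are illustrative, and the exact implementation used by the pipeline is not shown in this reference.

```python
def cpm(counts):
    """Scale raw counts to counts per million of the sample total."""
    total = sum(counts.values())
    if total == 0:
        # Avoid division by zero for an empty sample
        return {name: 0.0 for name in counts}
    return {name: count * 1_000_000 / total for name, count in counts.items()}


raw = {"feature_a": 1, "feature_b": 3}
print(cpm(raw))
```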

quant_options.allow_multi_overlap
label:

Count multi-overlapping reads

type:

basic:boolean

description:

Assign reads to all their overlapping features or meta-features.

default:

True

quant_options.count_multi_mapping_reads
label:

Count multi-mapping reads

type:

basic:boolean

description:

For a multi-mapping read, all its reported alignments will be counted. The ‘NH’ tag in BAM input is used to detect multi-mapping reads.

default:

True

assay_type
label:

Assay type

type:

basic:string

description:

Indicate if strand-specific read counting should be performed. In a strand non-specific assay, a read is considered overlapping with a feature regardless of whether it is mapped to the same or the opposite strand as the feature. In a strand-specific forward assay, the read has to be mapped to the same strand as the feature. In a strand-specific reverse assay, the read has to be mapped to the opposite strand.

choices:

  • Strand non-specific: non_specific

  • Strand-specific forward: forward

  • Strand-specific reverse: reverse
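
The strand rules described for the assay type can be sketched as a small predicate. This is only an illustration of the stated rules, assuming the read already overlaps the feature by position; the function name is hypothetical.

```python
def strand_compatible(read_strand, feature_strand, assay_type):
    """Decide whether an overlapping read counts for a feature."""
    if assay_type == "non_specific":
        return True
    if assay_type == "forward":
        return read_strand == feature_strand
    if assay_type == "reverse":
        return read_strand != feature_strand
    raise ValueError(f"unknown assay type: {assay_type!r}")


print(strand_compatible("+", "-", "non_specific"))
print(strand_compatible("+", "-", "forward"))
print(strand_compatible("+", "-", "reverse"))
```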

Output results

shRNA quantification

data:workflow:trimalquantworkflow-trim-align-quant (data:reads:fastq:single  reads, list:basic:string  up_primers_seq, list:basic:string  down_primers_seq, basic:decimal  error_rate_5end, basic:decimal  error_rate_3end, data:index:bowtie2  genome, basic:string  mode, basic:integer  N, basic:integer  L, basic:integer  gbar, basic:string  mp, basic:string  rdg, basic:string  rfg, basic:string  score_min, basic:integer  readlengths, basic:integer  alignscores)[Source: v1.1.0]

Input arguments

reads
label:

Untrimmed reads.

type:

data:reads:fastq:single

description:

First stage of shRNA pipeline. Trims 5’ adapters, then 3’ adapters using the same error rate setting, aligns reads to a reference library and quantifies species.

trimming_options.up_primers_seq
label:

5’ adapter sequence

type:

list:basic:string

description:

A string of 5’ adapter sequence.

required:

True

trimming_options.down_primers_seq
label:

3’ adapter sequence

type:

list:basic:string

description:

A string of 3’ adapter sequence.

required:

True

trimming_options.error_rate_5end
label:

Error rate for 5’

type:

basic:decimal

description:

Maximum allowed error rate (no. of errors divided by the length of the matching region) for 5’ trimming.

required:

False

default:

0.1

trimming_options.error_rate_3end
label:

Error rate for 3’

type:

basic:decimal

description:

Maximum allowed error rate (no. of errors divided by the length of the matching region) for 3’ trimming.

required:

False

default:

0.1
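
The error rate defined above (number of errors divided by the length of the matching region) can be checked with a one-line comparison. The sketch below only restates that definition; the function name is hypothetical and the trimming tool may apply additional rules.

```python
def within_error_rate(errors, match_length, max_error_rate=0.1):
    """True if errors / match_length does not exceed the allowed rate."""
    if match_length <= 0:
        raise ValueError("match_length must be positive")
    return errors / match_length <= max_error_rate


print(within_error_rate(1, 10))  # one error in a 10-base match
print(within_error_rate(2, 10))  # two errors in a 10-base match
```

With the default rate of 0.1, a 20-base matching region tolerates up to two errors, while a 10-base region tolerates one.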

alignment_options.genome
label:

Reference library

type:

data:index:bowtie2

description:

Choose the reference library against which to align reads.

alignment_options.mode
label:

Alignment mode

type:

basic:string

description:

End to end: Bowtie 2 requires that the entire read align from one end to the other, without any trimming (or “soft clipping”) of characters from either end. local: Bowtie 2 does not require that the entire read align from one end to the other. Rather, some characters may be omitted (“soft clipped”) from the ends in order to achieve the greatest possible alignment score.

default:

--end-to-end

choices:

  • end to end mode: --end-to-end

  • local: --local

alignment_options.N
label:

Number of mismatches allowed in seed alignment (N)

type:

basic:integer

description:

Sets the number of mismatches allowed in a seed alignment during multiseed alignment. Can be set to 0 or 1. Setting this higher makes alignment slower (often much slower) but increases sensitivity. Default: 0.

required:

False

alignment_options.L
label:

Length of seed substrings (L)

type:

basic:integer

description:

Sets the length of the seed substrings to align during multiseed alignment. Smaller values make alignment slower but more sensitive. By default, the --sensitive preset is used for end-to-end alignment and --sensitive-local for local alignment. See documentation for details.

required:

False

alignment_options.gbar
label:

Disallow gaps within positions (gbar)

type:

basic:integer

description:

Disallow gaps within <int> positions of the beginning or end of the read. Default: 4.

required:

False

alignment_options.mp
label:

Maximal and minimal mismatch penalty (mp)

type:

basic:string

description:

Sets the maximum (MX) and minimum (MN) mismatch penalties, both integers. A number less than or equal to MX and greater than or equal to MN is subtracted from the alignment score for each position where a read character aligns to a reference character, the characters do not match, and neither is an N. If --ignore-quals is specified, the number subtracted equals MX. Otherwise, the number subtracted is MN + floor((MX-MN) * (MIN(Q, 40.0)/40.0)) where Q is the Phred quality value. Default for MX, MN: 6,2.

required:

False
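
The mismatch penalty formula in the description can be written out directly. This is a sketch of the stated formula with the default MX,MN of 6,2; the function name and argument names are chosen here for illustration.

```python
import math


def mismatch_penalty(q, mx=6, mn=2, ignore_quals=False):
    """Penalty subtracted for one mismatch at Phred quality q."""
    if ignore_quals:
        # With --ignore-quals the penalty is always MX
        return mx
    return mn + math.floor((mx - mn) * (min(q, 40.0) / 40.0))


for quality in (0, 10, 20, 40):
    print(quality, mismatch_penalty(quality))
```

Low-quality mismatches cost MN, and the penalty rises to MX once the quality reaches 40 or more.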

alignment_options.rdg
label:

Set read gap open and extend penalties (rdg)

type:

basic:string

description:

Sets the read gap open (<int1>) and extend (<int2>) penalties. A read gap of length N gets a penalty of <int1> + N * <int2>. Default: 5,3.

required:

False

alignment_options.rfg
label:

Set reference gap open and extend penalties (rfg)

type:

basic:string

description:

Sets the reference gap open (<int1>) and extend (<int2>) penalties. A reference gap of length N gets a penalty of <int1> + N * <int2>. Default: 5,3.

required:

False
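
Both the read gap (rdg) and reference gap (rfg) settings use the same penalty rule, <int1> + N * <int2>. A minimal sketch of that arithmetic, assuming the setting is passed as a comma-separated string such as the default 5,3 (the function name is hypothetical):

```python
def gap_penalty(gap_length, setting="5,3"):
    """Penalty for a gap of gap_length given an 'open,extend' setting."""
    open_penalty, extend_penalty = (int(value) for value in setting.split(","))
    return open_penalty + gap_length * extend_penalty


print(gap_penalty(1))  # 5 + 1 * 3
print(gap_penalty(3))  # 5 + 3 * 3
```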

alignment_options.score_min
label:

Minimum alignment score needed for “valid” alignment (score-min)

type:

basic:string

description:

Sets a function governing the minimum alignment score needed for an alignment to be considered “valid” (i.e. good enough to report). This is a function of read length. For instance, specifying L,0,-0.6 sets the minimum-score function to f(x) = 0 + -0.6 * x, where x is the read length. The default in --end-to-end mode is L,-0.6,-0.6 and the default in --local mode is G,20,8.

required:

False
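
To make the score-min settings concrete, the sketch below evaluates the two function types that appear in the description: L (linear) and G (natural log, per the Bowtie 2 manual). The function name is hypothetical and other function types are deliberately left unsupported.

```python
import math


def min_score(setting, read_length):
    """Evaluate a score-min setting such as 'L,0,-0.6' for a read length."""
    kind, constant, coefficient = setting.split(",")
    a, b = float(constant), float(coefficient)
    if kind == "L":
        # Linear: f(x) = a + b * x
        return a + b * read_length
    if kind == "G":
        # Natural log: f(x) = a + b * ln(x)
        return a + b * math.log(read_length)
    raise ValueError(f"function type {kind!r} is not covered by this sketch")


print(min_score("L,-0.6,-0.6", 50))  # end-to-end default
print(min_score("G,20,8", 50))       # local default
```

An alignment is considered valid only if its score is at least the value returned for the read's length.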

quant_options.readlengths
label:

Species lengths threshold

type:

basic:integer

description:

Species with read lengths below specified threshold will be removed from final output. Default is no removal.

quant_options.alignscores
label:

Align scores filter threshold

type:

basic:integer

description:

Species with align score below specified threshold will be removed from final output. Default is no removal.

Output results

snpEff (General variant annotation) (multi-sample)

data:variants:vcf:snpeff:snpeff (data:variants:vcf  variants, basic:string  database, data:variants:vcf  dbsnp, basic:string  filtering_options, list:data:geneset  sets, list:basic:string  extract_fields, basic:boolean  one_per_line)[Source: v1.1.1]

Annotate variants with SnpEff. SnpEff is a variant annotation and effect prediction tool. It annotates and predicts the effects of genetic variants (such as amino acid changes). This process also allows filtering of variants with ``SnpSift filter`` command and extracting specific fields from the VCF file with ``SnpSift extractFields`` command. This tool works with multi-sample VCF file as an input.

Input arguments

variants
label:

Variants (VCF)

type:

data:variants:vcf

required:

True

disabled:

False

hidden:

False

database
label:

snpEff database

type:

basic:string

required:

True

disabled:

False

hidden:

False

default:

GRCh38.99

choices:

  • GRCh37.75: GRCh37.75

  • GRCh38.99: GRCh38.99

dbsnp
label:

Known variants

type:

data:variants:vcf

description:

List of known variants for annotation.

required:

False

disabled:

False

hidden:

False

filtering_options
label:

Filtering expressions

type:

basic:string

description:

Filter VCF file using arbitrary expressions. Examples of filtering expressions: `(ANN[*].GENE = 'PSD3')` or `( REF = 'A' )` or `(countHom() > 3) | (( exists INDEL ) & (QUAL >= 20)) | (QUAL >= 30 )`. For more information, check out the official documentation of [SnpSift](https://pcingola.github.io/SnpEff/ss_filter/).

required:

False

disabled:

False

hidden:

False
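
As a rough sketch of how a filtering expression might be handed to the ``SnpSift filter`` command, the helper below builds an argument list. The helper name, the `SnpSift` executable name and the input file name are assumptions for illustration; how the process actually invokes SnpSift is not shown in this reference.

```python
import shlex


def snpsift_filter_command(expression, vcf_path):
    """Hypothetical helper: argument list for a SnpSift filter call."""
    if not expression.strip():
        raise ValueError("filtering expression is empty")
    # Passing the expression as a single list item avoids shell quoting issues
    return ["SnpSift", "filter", expression, vcf_path]


command = snpsift_filter_command("(QUAL >= 30)", "variants.vcf")
print(shlex.join(command))
```

Keeping the expression as one list element means quotes inside it, such as those around a gene name, reach SnpSift unchanged.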

sets
label:

Files with list of genes

type:

list:data:geneset

description:

Use lists of genes if you only want variants reported for those genes. Each file must have one string per line.

required:

False

disabled:

False

hidden:

!filtering_options

extract_fields
label:

Fields to extract

type:

list:basic:string

description:

Enter the fields you want to extract from the annotated VCF file, pressing Enter after each one. Example of fields: `CHROM POS REF ALT 'ANN[*].GENE'`. For more information follow this [link](https://pcingola.github.io/SnpEff/ss_extractfields/).

required:

False

disabled:

False

hidden:

False
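
The list of fields could be passed to the ``SnpSift extractFields`` command along these lines. This is a sketch under assumptions: the helper name, the `SnpSift` executable name and the file name are illustrative, and the process may construct the call differently.

```python
def snpsift_extract_command(vcf_path, fields):
    """Hypothetical helper: argument list for a SnpSift extractFields call."""
    cleaned = [field.strip() for field in fields if field.strip()]
    if not cleaned:
        raise ValueError("no fields to extract")
    return ["SnpSift", "extractFields", vcf_path, *cleaned]


fields = ["CHROM", "POS", "REF", "ALT", "ANN[*].GENE"]
print(snpsift_extract_command("annotated.vcf", fields))
```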

advanced.one_per_line
label:

One effect per line

type:

basic:boolean

description:

If there is more than one effect per variant, write them to separate lines.

required:

True

disabled:

False

hidden:

False

default:

False

Output results

vcf
label:

Annotated variants (VCF)

type:

basic:file

required:

True

disabled:

False

hidden:

False

tbi
label:

Index of annotated variants

type:

basic:file

required:

True

disabled:

False

hidden:

False

vcf_extracted
label:

Extracted annotated variants (VCF)

type:

basic:file

required:

False

disabled:

False

hidden:

False

tbi_extracted
label:

Index of extracted variants

type:

basic:file

required:

False

disabled:

False

hidden:

False

species
label:

Species

type:

basic:string

required:

True

disabled:

False

hidden:

False

build
label:

Build

type:

basic:string

required:

True

disabled:

False

hidden:

False

genes
label:

SnpEff genes

type:

basic:file

required:

True

disabled:

False

hidden:

False

summary
label:

Summary

type:

basic:file:html

required:

True

disabled:

False

hidden:

False

snpEff (General variant annotation) (single-sample)

data:variants:vcf:snpeff:single:snpeff-single (data:variants:vcf  variants, basic:string  database, data:variants:vcf  dbsnp, basic:string  filtering_options, list:data:geneset  sets, list:basic:string  extract_fields, basic:boolean  one_per_line)[Source: v1.0.1]

Annotate variants with SnpEff. SnpEff is a variant annotation and effect prediction tool. It annotates and predicts the effects of genetic variants (such as amino acid changes). This process also allows filtering of variants with ``SnpSift filter`` command and extracting specific fields from the VCF file with ``SnpSift extractFields`` command. This tool works with single-sample VCF file as an input.

Input arguments

variants
label:

Variants (VCF)

type:

data:variants:vcf

required:

True

disabled:

False

hidden:

False

database
label:

snpEff database

type:

basic:string

required:

True

disabled:

False

hidden:

False

default:

GRCh38.99

choices:

  • GRCh37.75: GRCh37.75

  • GRCh38.99: GRCh38.99

dbsnp
label:

Known variants

type:

data:variants:vcf

description:

List of known variants for annotation.

required:

False

disabled:

False

hidden:

False

filtering_options
label:

Filtering expressions

type:

basic:string

description:

Filter VCF file using arbitrary expressions. Examples of filtering expressions: `(ANN[*].GENE = 'PSD3')` or `( REF = 'A' )` or `(countHom() > 3) | (( exists INDEL ) & (QUAL >= 20)) | (QUAL >= 30 )`. For more information, check out the official documentation of [SnpSift](https://pcingola.github.io/SnpEff/ss_filter/).

required:

False

disabled:

False

hidden:

False

sets
label:

Files with list of genes

type:

list:data:geneset

description:

Use lists of genes if you only want variants reported for those genes. Each file must have one string per line.

required:

False

disabled:

False

hidden:

!filtering_options

extract_fields
label:

Fields to extract

type:

list:basic:string

description:

Enter the fields you want to extract from the annotated VCF file, pressing Enter after each one. Example of fields: `CHROM POS REF ALT 'ANN[*].GENE'`. For more information follow this [link](https://pcingola.github.io/SnpEff/ss_extractfields/).

required:

False

disabled:

False

hidden:

False

advanced.one_per_line
label:

One effect per line

type:

basic:boolean

description:

If there is more than one effect per variant, write them to separate lines.

required:

True

disabled:

False

hidden:

False

default:

False

Output results

vcf
label:

Annotated variants (VCF)

type:

basic:file

required:

True

disabled:

False

hidden:

False

tbi
label:

Index of annotated variants

type:

basic:file

required:

True

disabled:

False

hidden:

False

vcf_extracted
label:

Extracted annotated variants (VCF)

type:

basic:file

required:

False

disabled:

False

hidden:

False

tbi_extracted
label:

Index of extracted variants

type:

basic:file

required:

False

disabled:

False

hidden:

False

species
label:

Species

type:

basic:string

required:

True

disabled:

False

hidden:

False

build
label:

Build

type:

basic:string

required:

True

disabled:

False

hidden:

False

genes
label:

SnpEff genes

type:

basic:file

required:

True

disabled:

False

hidden:

False

summary
label:

Summary

type:

basic:file:html

required:

True

disabled:

False

hidden:

False