Process definitions

ATAC-Seq

data:workflow:atacseq workflow-atac-seq (data:reads:fastq reads, data:index:bowtie2 genome, data:bed promoter, basic:string mode, basic:string speed, basic:boolean use_se, basic:boolean discordantly, basic:boolean rep_se, basic:integer minins, basic:integer maxins, basic:integer trim_5, basic:integer trim_3, basic:integer trim_iter, basic:integer trim_nucl, basic:string rep_mode, basic:integer k_reports, basic:integer q_threshold, basic:integer n_sub, basic:boolean tn5, basic:integer shift, basic:boolean tagalign, basic:string duplicates, basic:string duplicates_prepeak, basic:decimal qvalue, basic:decimal pvalue, basic:decimal pvalue_prepeak, basic:integer cap_num, basic:integer mfold_lower, basic:integer mfold_upper, basic:integer slocal, basic:integer llocal, basic:integer extsize, basic:integer shift, basic:integer band_width, basic:boolean nolambda, basic:boolean fix_bimodal, basic:boolean nomodel, basic:boolean nomodel_prepeak, basic:boolean down_sample, basic:boolean bedgraph, basic:boolean spmr, basic:boolean call_summits, basic:boolean broad, basic:decimal broad_cutoff)[Source: v3.1.1]

This ATAC-seq pipeline closely follows the official ENCODE DCC pipeline. It consists of three steps: alignment, pre-peakcall QC, and peak calling (with post-peakcall QC). First, reads are aligned to a genome using the [Bowtie2](http://bowtie-bio.sourceforge.net/index.shtml) aligner. Next, pre-peakcall QC metrics are calculated. The QC report contains the QC metrics proposed by ENCODE 3: [NRF](https://www.encodeproject.org/data-standards/terms/), [PBC bottlenecking coefficients, NSC, and RSC](https://genome.ucsc.edu/ENCODE/qualityMetrics.html#chipSeq). Finally, peaks are called using [MACS2](https://github.com/taoliu/MACS/). The post-peakcall QC report includes additional metrics: the number of peaks, the fraction of reads in peaks (FRiP), and the number of reads in peaks. If a promoter regions BED file is provided, it also includes the number of reads in promoter regions, the fraction of reads in promoter regions, the number of peaks in promoter regions, and the fraction of peaks in promoter regions.
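To make the library-complexity and peak metrics above more concrete, here is a minimal Python sketch that computes NRF, PBC1, PBC2 and FRiP from in-memory read coordinates. It assumes the commonly used ENCODE definitions (NRF = distinct positions / total reads, PBC1 = positions with exactly one read / distinct positions, PBC2 = positions with exactly one read / positions with exactly two reads). The function names and tuple layouts are hypothetical; the pipeline itself computes these metrics with its own tools on alignment files.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical layouts used only for this sketch (0-based, half-open coordinates).
Read = Tuple[str, int, int, str]   # (chrom, start, end, strand)
Peak = Tuple[str, int, int]        # (chrom, start, end)


def library_complexity(reads: List[Read]) -> Dict[str, float]:
    """Return NRF, PBC1 and PBC2 computed from read positions."""
    counts = Counter(reads)                      # reads per distinct position
    total = len(reads)
    distinct = len(counts)
    m1 = sum(1 for c in counts.values() if c == 1)
    m2 = sum(1 for c in counts.values() if c == 2)
    return {
        "NRF": distinct / total if total else 0.0,
        "PBC1": m1 / distinct if distinct else 0.0,
        # PBC2 is undefined when no position has exactly two reads.
        "PBC2": m1 / m2 if m2 else float("inf"),
    }


def frip(reads: List[Read], peaks: List[Peak]) -> float:
    """Fraction of reads overlapping at least one peak (naive O(reads * peaks) scan)."""
    def overlaps(read: Read) -> bool:
        chrom, start, end, _ = read
        return any(c == chrom and start < p_end and end > p_start
                   for c, p_start, p_end in peaks)

    return sum(1 for r in reads if overlaps(r)) / len(reads) if reads else 0.0


reads = [("chr1", 100, 150, "+")] * 2 + [("chr1", 200, 250, "+"), ("chr1", 300, 350, "-")]
print(library_complexity(reads))  # {'NRF': 0.75, 'PBC1': 0.6666666666666666, 'PBC2': 2.0}
print(frip(reads, [("chr1", 90, 210)]))  # 0.75
```

A real implementation would stream sorted reads and use an interval index for FRiP; the naive scan keeps the definitions visible.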

reads

label:: Select sample(s)
type:: data:reads:fastq

genome

label:: Genome
type:: data:index:bowtie2

promoter

label:: Promoter regions BED file
type:: data:bed
description:: BED file containing promoter regions (for example, TSS ± 1000 bp). Required to compute the number of peaks and reads mapped to promoter regions.
required:: False
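A promoter BED file like the one described above can be derived from transcription start sites. This sketch assumes a hypothetical list of (chrom, TSS, strand, name) records with 0-based TSS positions and writes TSS ± 1000 bp windows, clamping the start at 0. Clamping at chromosome ends is omitted because it requires chromosome sizes.

```python
from typing import Iterable, List, Tuple

Tss = Tuple[str, int, str, str]  # hypothetical record: (chrom, tss_0based, strand, name)


def promoter_bed_lines(tss_records: Iterable[Tss], flank: int = 1000) -> List[str]:
    """Return 6-column BED lines for TSS +/- flank windows."""
    lines = []
    for chrom, tss, strand, name in tss_records:
        start = max(0, tss - flank)   # BED starts cannot be negative
        end = tss + flank + 1         # half-open end, so the window is symmetric around the TSS base
        lines.append(f"{chrom}\t{start}\t{end}\t{name}\t0\t{strand}")
    return lines


for line in promoter_bed_lines([("chr1", 500, "+", "geneA"), ("chr2", 5000, "-", "geneB")]):
    print(line)
```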

alignment.mode

label:

Alignment mode

type:

basic:string

description:

End to end: Bowtie 2 requires that the entire read align from one end to the other, without any trimming (or “soft clipping”) of characters from either end. Local: Bowtie 2 does not require that the entire read align from one end to the other. Rather, some characters may be omitted (“soft clipped”) from the ends in order to achieve the greatest possible alignment score.

default:

--local

choices:

end to end mode: --end-to-end
local: --local

alignment.speed

label:

Speed vs. Sensitivity

type:

basic:string

default:

--sensitive

choices:

Very fast: --very-fast
Fast: --fast
Sensitive: --sensitive
Very sensitive: --very-sensitive

alignment.PE_options.use_se

label:: Map as single-ended (for paired-end reads only)
type:: basic:boolean
description:: If this option is selected, paired-end reads will be mapped as single-ended and the other paired-end options are ignored.
default:: False

alignment.PE_options.discordantly

label:: Report discordantly matched reads
type:: basic:boolean
description:: If both mates have unique alignments but the alignments do not match paired-end expectations (orientation and relative distance), the alignment is still reported. Useful for detecting structural variations.
default:: True

alignment.PE_options.rep_se

label:: Report single ended
type:: basic:boolean
description:: If a paired alignment cannot be found, Bowtie2 tries to find alignments for the individual mates.
default:: True

alignment.PE_options.minins

label:: Minimal distance
type:: basic:integer
description:: The minimum fragment length for valid paired-end alignments. 0 imposes no minimum.
default:: 0

alignment.PE_options.maxins

label:: Maximal distance
type:: basic:integer
description:: The maximum fragment length for valid paired-end alignments.
default:: 2000
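The paired-end options above correspond to standard Bowtie2 flags (`-I`/`--minins`, `-X`/`--maxins`, `--no-discordant`, `--no-mixed`). The sketch below shows one plausible mapping; the helper name is hypothetical and whether this process passes the flags in exactly this way is an assumption.

```python
from typing import List


def paired_end_args(use_se: bool = False, discordantly: bool = True, rep_se: bool = True,
                    minins: int = 0, maxins: int = 2000) -> List[str]:
    """Return Bowtie2 paired-end flags; empty when mates are mapped as single-ended."""
    if use_se:
        return []  # the other paired-end options are ignored in single-ended mode
    args = ["-I", str(minins), "-X", str(maxins)]
    if not discordantly:
        args.append("--no-discordant")  # Bowtie2 reports discordant pairs by default
    if not rep_se:
        args.append("--no-mixed")       # Bowtie2 falls back to single-mate alignments by default
    return args
```

Because Bowtie2 already reports discordant and mixed-mode alignments by default, only the negated flags need to be emitted.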

alignment.start_trimming.trim_5

label:: Bases to trim from 5’
type:: basic:integer
description:: Number of bases to trim from the 5’ (left) end of each read before alignment.
default:: 0

alignment.start_trimming.trim_3

label:: Bases to trim from 3’
type:: basic:integer
description:: Number of bases to trim from the 3’ (right) end of each read before alignment.
default:: 0
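Trimming a fixed number of bases from each end before alignment (the equivalent of Bowtie2's `-5`/`--trim5` and `-3`/`--trim3`) amounts to slicing the read and its quality string. A minimal sketch with a hypothetical helper:

```python
from typing import Tuple


def trim_read(seq: str, qual: str, trim_5: int = 0, trim_3: int = 0) -> Tuple[str, str]:
    """Drop trim_5 bases from the 5' (left) end and trim_3 bases from the 3' (right) end."""
    if trim_5 < 0 or trim_3 < 0:
        raise ValueError("trim lengths must be non-negative")
    end = max(trim_5, len(seq) - trim_3)  # never let the slice run backwards
    return seq[trim_5:end], qual[trim_5:end]
```

If the combined trim exceeds the read length, the read becomes empty rather than raising an error.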

alignment.trimming.trim_iter

label:: Iterations
type:: basic:integer
description:: Number of 3’ trimming iterations.
default:: 0

alignment.trimming.trim_nucl

label:: Bases to trim
type:: basic:integer
description:: Number of bases to trim from the 3’ end in each iteration.
default:: 2

alignment.reporting.rep_mode

label:

Report mode

type:

basic:string

description:

Default mode: search for multiple alignments, report the best one; -k mode: search for one or more alignments, report each; -a mode: search for and report all alignments

default:

def

choices:

Default mode: def
-k mode: k
-a mode (very slow): a

alignment.reporting.k_reports

label:: Number of reports (for -k mode only)
type:: basic:integer
description:: Searches for at most this many distinct, valid alignments for each read. The search terminates when it cannot find more distinct valid alignments or when it has found this many, whichever happens first.
default:: 5
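The report-mode choices map onto Bowtie2's reporting flags: no extra flag for the default mode, `-k <N>` for -k mode, and `-a` for -a mode. A sketch of that mapping; the helper name is hypothetical:

```python
from typing import List


def reporting_args(rep_mode: str = "def", k_reports: int = 5) -> List[str]:
    """Map the report-mode choice to Bowtie2 reporting flags."""
    if rep_mode == "def":
        return []                          # default: search for multiple, report the best
    if rep_mode == "k":
        if k_reports < 1:
            raise ValueError("k_reports must be at least 1")
        return ["-k", str(k_reports)]      # report up to k_reports alignments per read
    if rep_mode == "a":
        return ["-a"]                      # report all alignments (very slow)
    raise ValueError(f"unknown report mode: {rep_mode}")
```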

prepeakqc_settings.q_threshold

label:: Quality filtering threshold
type:: basic:integer
default:: 30

prepeakqc_settings.n_sub

label:: Number of reads to subsample
type:: basic:integer
default:: 25000000

prepeakqc_settings.tn5

label:: Tn5 shifting
type:: basic:boolean
description:: Tn5 transposon shifting. Shift reads on the “+” strand by +4 bp and reads on the “-” strand by -5 bp.
default:: True
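The Tn5 shift is usually applied to 6-column tagAlign/BED records. This sketch follows the widely used ENCODE convention of moving the whole read by +4 bp on the "+" strand and -5 bp on the "-" strand; clamping the start at 0 is an added assumption, not necessarily what this process does.

```python
from typing import Tuple

# (chrom, start, end, name, score, strand): the 6-column tagAlign/BED layout
TagAlign = Tuple[str, int, int, str, int, str]


def tn5_shift(rec: TagAlign) -> TagAlign:
    """Shift '+' strand reads by +4 bp and '-' strand reads by -5 bp."""
    chrom, start, end, name, score, strand = rec
    offset = 4 if strand == "+" else -5
    # Clamp the start so shifted reads near the chromosome start stay valid.
    return (chrom, max(0, start + offset), end + offset, name, score, strand)
```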

prepeakqc_settings.shift

label:: User-defined cross-correlation peak strandshift
type:: basic:integer
description:: If defined, the SPP tool will not try to estimate the fragment length but will use the given value as the fragment length.
default:: 0

settings.tagalign

label:: Use tagAlign files
type:: basic:boolean
description:: Use filtered tagAlign files as case (treatment) and control (background) samples. If the extsize parameter is not set, MACS is run using the input’s estimated fragment length.
default:: True

settings.duplicates

label:

Number of duplicates

type:

basic:string

description:

It controls the MACS behavior towards duplicate tags at the exact same location (the same coordinates and the same strand). The ‘auto’ option makes MACS calculate the maximum number of tags at the exact same location based on a binomial distribution using 1e-5 as the p-value cutoff, and the ‘all’ option keeps all the tags. If an integer is given, at most this number of tags will be kept at the same location. By default, MACS keeps one tag at the same location.

required:

False

hidden:

settings.tagalign

choices:

1: 1
auto: auto
all: all

settings.duplicates_prepeak

label:

Number of duplicates

type:

basic:string

description:

It controls the MACS behavior towards duplicate tags at the exact same location (the same coordinates and the same strand). The ‘auto’ option makes MACS calculate the maximum number of tags at the exact same location based on a binomial distribution using 1e-5 as the p-value cutoff, and the ‘all’ option keeps all the tags. If an integer is given, at most this number of tags will be kept at the same location. By default, MACS keeps one tag at the same location.

required:

False

hidden:

!settings.tagalign

default:

all

choices:

1: 1
auto: auto
all: all

settings.qvalue

label:: Q-value cutoff
type:: basic:decimal
description:: The q-value (minimum FDR) cutoff to call significant regions. Q-values are calculated from p-values using the Benjamini-Hochberg procedure.
required:: False
disabled:: settings.pvalue && settings.pvalue_prepeak
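The Benjamini-Hochberg step mentioned above converts p-values into q-values: each p-value is multiplied by m / rank, and a running minimum from the largest p-value downwards keeps the q-values monotone. This is a generic sketch of the procedure, not MACS2's internal code.

```python
from typing import List


def bh_qvalues(pvalues: List[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values (q-values), returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i], reverse=True)  # largest p first
    qvalues = [0.0] * m
    running_min = 1.0
    for idx, i in enumerate(order):
        rank = m - idx                  # 1-based rank in ascending p-value order
        running_min = min(running_min, pvalues[i] * m / rank)
        qvalues[i] = running_min
    return qvalues
```

For example, the p-values `[0.01, 0.04, 0.03, 0.5]` give q-values of about `[0.04, 0.0533, 0.0533, 0.5]`.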

settings.pvalue

label:: P-value cutoff
type:: basic:decimal
description:: The p-value cutoff. If specified, MACS2 will use p-value instead of q-value cutoff.
required:: False
disabled:: settings.qvalue
hidden:: settings.tagalign

settings.pvalue_prepeak

label:: P-value cutoff
type:: basic:decimal
description:: The p-value cutoff. If specified, MACS2 will use p-value instead of q-value cutoff.
disabled:: settings.qvalue
hidden:: !settings.tagalign || settings.qvalue
default:: 0.01

settings.cap_num

label:: Cap number of peaks by taking top N peaks
type:: basic:integer
description:: To keep all peaks, set the value to 0.
disabled:: settings.broad
default:: 300000
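Capping peaks means sorting by a significance score and keeping the first N, with 0 meaning "keep all". This sketch assumes a hypothetical (chrom, start, end, score) layout; which narrowPeak column the process ranks by (for example, the -log10 p-value) is an assumption.

```python
from typing import List, Tuple

Peak = Tuple[str, int, int, float]  # (chrom, start, end, score)


def cap_peaks(peaks: List[Peak], cap_num: int = 300000) -> List[Peak]:
    """Keep the top cap_num peaks by score; cap_num == 0 keeps all peaks."""
    if cap_num < 0:
        raise ValueError("cap_num must be >= 0")
    ranked = sorted(peaks, key=lambda p: p[3], reverse=True)
    return ranked if cap_num == 0 else ranked[:cap_num]
```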

settings.mfold_lower

label:: MFOLD range (lower limit)
type:: basic:integer
description:: This parameter is used to select the regions within the MFOLD range of high-confidence enrichment ratio against background to build the model. The selected regions must have a fold enrichment higher than the lower limit and lower than the upper limit. The MACS default (10, 30) means using all regions that are not too low (>10) and not too high (<30) to build the paired-peaks model. If MACS cannot find more than 100 regions to build the model, it will use the --extsize parameter to continue the peak detection ONLY if --fix-bimodal is set.
required:: False

settings.mfold_upper

label:: MFOLD range (upper limit)
type:: basic:integer
description:: This parameter is used to select the regions within the MFOLD range of high-confidence enrichment ratio against background to build the model. The selected regions must have a fold enrichment higher than the lower limit and lower than the upper limit. The MACS default (10, 30) means using all regions that are not too low (>10) and not too high (<30) to build the paired-peaks model. If MACS cannot find more than 100 regions to build the model, it will use the --extsize parameter to continue the peak detection ONLY if --fix-bimodal is set.
required:: False
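The MFOLD selection described above is a strict range filter on fold enrichment followed by a check that more than 100 regions survived. A minimal sketch over hypothetical (chrom, start, end, fold_enrichment) tuples; how MACS finds candidate regions in the first place is not modeled.

```python
from typing import List, Tuple

Region = Tuple[str, int, int, float]  # (chrom, start, end, fold_enrichment)


def select_model_regions(regions: List[Region], mfold_lower: float = 10,
                         mfold_upper: float = 30,
                         min_regions: int = 100) -> Tuple[List[Region], bool]:
    """Return regions with lower < fold < upper, and whether there are enough to build a model."""
    selected = [r for r in regions if mfold_lower < r[3] < mfold_upper]
    return selected, len(selected) > min_regions
```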

settings.slocal

label:: Small local region
type:: basic:integer
description:: The slocal and llocal parameters control which two levels of regions will be checked around the peak regions to calculate the maximum lambda as local lambda. By default, MACS considers 1000 bp for the small local region (--slocal) and 10000 bp for the large local region (--llocal), which captures the bias from a long-range effect like an open chromatin domain. You can tweak these according to your project. Remember that if the region is set too small, a sharp spike in the input data may kill a significant peak.
required:: False

settings.llocal

label:: Large local region
type:: basic:integer
description:: The slocal and llocal parameters control which two levels of regions will be checked around the peak regions to calculate the maximum lambda as local lambda. By default, MACS considers 1000 bp for the small local region (--slocal) and 10000 bp for the large local region (--llocal), which captures the bias from a long-range effect like an open chromatin domain. You can tweak these according to your project. Remember that if the region is set too small, a sharp spike in the input data may kill a significant peak.
required:: False
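The "maximum lambda as local lambda" idea can be illustrated by rescaling control counts from each window to a fragment-sized (d) window and taking the maximum together with the genome-wide background. This is a simplified sketch under stated assumptions: real MACS also considers the peak-sized window and scales control to treatment depth. The `nolambda` switch mirrors the "Use background lambda as local lambda" option.

```python
def local_lambda(control_total: int, genome_size: int, d: int,
                 count_slocal: int, count_llocal: int,
                 slocal: int = 1000, llocal: int = 10000,
                 nolambda: bool = False) -> float:
    """Simplified MACS-style local lambda for one candidate peak."""
    # Genome-wide background: expected control reads in a d-sized window.
    lambda_bg = control_total * d / genome_size
    if nolambda:
        return lambda_bg  # background lambda used as local lambda
    # Control reads counted in the small/large windows, rescaled to a d-sized window.
    lambda_s = count_slocal * d / slocal
    lambda_l = count_llocal * d / llocal
    return max(lambda_bg, lambda_s, lambda_l)
```

Taking the maximum is what makes a very small `slocal` risky: a short spike in the control inflates `lambda_s` and can suppress an otherwise significant peak.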

settings.extsize

label:: extsize
type:: basic:integer
description:: While ‘--nomodel’ is set, MACS uses this parameter to extend reads in the 5’->3’ direction to fixed-size fragments. For example, if the size of the binding region for your transcription factor is 200 bp and you want to bypass the model building by MACS, this parameter can be set to 200. This option is only valid when --nomodel is set or when MACS fails to build the model and --fix-bimodal is on.
default:: 150

settings.shift

label:: Shift
type:: basic:integer
description:: Note that this is NOT the legacy --shiftsize option, which is replaced by --extsize! You can set an arbitrary shift in bp here. Use discretion when setting it to a value other than 0. When --nomodel is set, MACS will use this value to move cutting ends (5’), then apply --extsize in the 5’ to 3’ direction to extend them to fragments. When this value is negative, ends will be moved in the 3’->5’ direction, otherwise in the 5’->3’ direction. It is recommended to keep it at 0 for ChIP-Seq datasets, or at -1 * half of EXTSIZE together with the --extsize option for detecting enriched cutting loci such as certain DNAseI-Seq datasets. Note that you can’t set values other than 0 if the format is BAMPE for paired-end data. The MACS default is 0; this process defaults to -75, which is -1 * half of the default extsize (150).
default:: -75
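With `--nomodel`, shift and extsize together define the fragment built from each read's 5′ end. A sketch of that arithmetic for a single read in 0-based, half-open coordinates (the helper name is hypothetical). With the defaults shift = -75 and extsize = 150, each fragment is centered on the 5′ cut site.

```python
from typing import Tuple


def macs_fragment(start: int, end: int, strand: str,
                  shift: int = -75, extsize: int = 150) -> Tuple[int, int]:
    """Fragment obtained by moving the 5' end by `shift`, then extending `extsize` bp towards 3'."""
    if strand == "+":
        five_prime = start + shift        # positive shift moves the + strand 5' end rightwards
        return max(0, five_prime), five_prime + extsize
    five_prime = end - shift              # on the - strand the 5' end is the read end
    return max(0, five_prime - extsize), five_prime
```

For a "+" read starting at 1000 this gives (925, 1075), and for a "-" read ending at 1050 it gives (975, 1125), both centered on the cut site.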

settings.band_width

label:: Band width
type:: basic:integer
description:: The band width which is used to scan the genome ONLY for model building. You can set this parameter as the sonication fragment size expected from wet experiment. The previous side effect on the peak detection process has been removed. So this parameter only affects the model building.
required:: False

settings.nolambda

label:: Use background lambda as local lambda
type:: basic:boolean
description:: With this flag on, MACS will use the background lambda as local lambda. This means MACS will not consider the local bias at peak candidate regions.
default:: False

settings.fix_bimodal

label:: Turn on the auto paired-peak model process
type:: basic:boolean
description:: Turn on the auto paired-peak model process. If it is set and MACS fails to build the paired model, MACS falls back to the nomodel settings and uses the ‘--extsize’ parameter to extend each tag. If it is not set, MACS terminates when the paired-peak model fails.
default:: False

settings.nomodel

label:: Bypass building the shifting model
type:: basic:boolean
description:: While on, MACS will bypass building the shifting model.
hidden:: settings.tagalign
default:: False

settings.nomodel_prepeak

label:: Bypass building the shifting model
type:: basic:boolean
description:: While on, MACS will bypass building the shifting model.
hidden:: !settings.tagalign
default:: True

settings.down_sample

label:: Down-sample
type:: basic:boolean
description:: When set to true, MACS scales down the bigger sample by random sampling instead of the default linear scaling. This makes the results unstable and irreproducible: different random reads are selected on each run, so the numbers (pileup, p-value, q-value) change between runs.
default:: False

settings.bedgraph

label:: Save fragment pileup and control lambda
type:: basic:boolean
description:: If this flag is on, MACS will store the fragment pileup, control lambda, -log10pvalue and -log10qvalue scores in bedGraph files. The bedGraph files will be stored in current directory named NAME+’_treat_pileup.bdg’ for treatment data, NAME+’_control_lambda.bdg’ for local lambda values from control, NAME+’_treat_pvalue.bdg’ for Poisson pvalue scores (in -log10(pvalue) form), and NAME+’_treat_qvalue.bdg’ for q-value scores from Benjamini-Hochberg-Yekutieli procedure.
default:: True

settings.spmr

label:: Save signal per million reads for fragment pileup profiles
type:: basic:boolean
disabled:: settings.bedgraph === false
default:: True
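Signal per million reads (SPMR) divides the pileup by the number of reads in millions. MACS applies this while building the pileup; the post-hoc conversion of bedGraph lines below is only an illustration of the scaling, with a hypothetical helper and an assumed 4-column bedGraph input.

```python
from typing import Iterable, List


def to_spmr(bedgraph_lines: Iterable[str], total_reads: int) -> List[str]:
    """Scale the value column of bedGraph lines to signal per million reads."""
    factor = 1_000_000 / total_reads
    out = []
    for line in bedgraph_lines:
        chrom, start, end, value = line.rstrip("\n").split("\t")
        out.append(f"{chrom}\t{start}\t{end}\t{float(value) * factor:.5f}")
    return out
```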

settings.call_summits

label:: Call summits
type:: basic:boolean
description:: MACS will reanalyze the shape of the signal profile (p- or q-score, depending on the cutoff setting) to deconvolve subpeaks within each peak called by the general procedure. This is highly recommended for detecting adjacent binding events. When used, the output subpeaks of a big peak region will have the same peak boundaries but different scores and peak summit positions.
default:: True

settings.broad

label:: Composite broad regions
type:: basic:boolean
description:: When this flag is on, MACS will try to composite broad regions in BED12 (a gene-model-like format) by putting nearby highly enriched regions into a broad region with a loose cutoff. The broad region is controlled by another cutoff through --broad-cutoff. The maximum length of a broad region is 4 times the d value estimated by MACS.
disabled:: settings.call_summits === true
default:: False

settings.broad_cutoff

label:: Broad cutoff
type:: basic:decimal
description:: Cutoff for broad regions. This option is not available unless --broad is set. If -p is set, this is a p-value cutoff; otherwise, it is a q-value cutoff. The default is 0.1.
required:: False
disabled:: settings.call_summits === true || settings.broad !== true

Abstract alignment process

data:alignment abstract-alignment ()[Source: v1.0.1]

bam

label:: Alignment file
type:: basic:file

bai

label:: Alignment index BAI
type:: basic:file

species

label:: Species
type:: basic:string

build

label:: Build
type:: basic:string

Abstract annotation process

data:annotation abstract-annotation ()[Source: v1.0.1]

annot

label:: Uploaded file
type:: basic:file

source

label:: Gene ID source
type:: basic:string

species

label:: Species
type:: basic:string

build

label:: Build
type:: basic:string

Abstract bed process

data:bed abstract-bed ()[Source: v1.0.2]

bed

label:: BED
type:: basic:file

species

label:: Species
type:: basic:string

build

label:: Build
type:: basic:string

Abstract differential expression process

data:differentialexpression abstract-differentialexpression ()[Source: v1.0.1]

raw

label:: Differential expression (gene level)
type:: basic:file

de_json

label:: Results table (JSON)
type:: basic:json

de_file

label:: Results table (file)
type:: basic:file

source

label:: Gene ID source
type:: basic:string

species

label:: Species
type:: basic:string

build

label:: Build
type:: basic:string

feature_type

label:: Feature type
type:: basic:string

Abstract expression process

data:expression abstract-expression ()[Source: v1.0.1]

exp

label:: Normalized expression
type:: basic:file

rc

label:: Read counts
type:: basic:file
required:: False

exp_json

label:: Expression (json)
type:: basic:json

exp_type

label:: Expression type
type:: basic:string

source

label:: Gene ID source
type:: basic:string

species

label:: Species
type:: basic:string

build

label:: Build
type:: basic:string

feature_type

label:: Feature type
type:: basic:string

Annotate novel splice junctions (regtools)

data:junctions:regtools regtools-junctions-annotate (data:seq:nucleotide genome, data:annotation:gtf annotation, data:alignment:bam:star alignment_star, data:alignment:bam alignment, data:bed input_bed_junctions)[Source: v1.3.1]

Identify novel splice junctions by using regtools to annotate against a reference. The process accepts a reference genome, a reference genome annotation (GTF), and an input with read information (a STAR alignment, reads aligned by any other aligner, or junctions in BED12 format). If STAR aligner data is given as input, the process calculates a BED12 file from the STAR ‘SJ.out.tab’ file and annotates all junctions with the ‘regtools junctions annotate’ command. When reads are aligned by another aligner, junctions are extracted with the ‘regtools junctions extract’ tool and then annotated with the ‘regtools junctions annotate’ command. The third option allows the user to directly provide a BED12 file with junctions, which are then annotated. Finally, annotated novel junctions are filtered into a separate output file. More information can be found in the [regtools manual](https://regtools.readthedocs.io/en/latest/).
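To illustrate the SJ.out.tab to BED12 step, the sketch below converts one STAR junction line into a two-block BED12 record whose blocks are the anchors flanking the intron. It relies on STAR's documented columns (1-based first and last intron base, strand code 0/1/2, unique read count in column 7, maximum overhang in column 9). Using the maximum overhang as anchor length, the name, the score and the color are assumptions; the process's actual conversion may differ.

```python
STRAND = {"0": ".", "1": "+", "2": "-"}  # STAR strand codes


def sj_to_bed12(line: str, name: str = "junction") -> str:
    """Convert one STAR SJ.out.tab line to a two-block BED12 junction record (sketch)."""
    f = line.rstrip("\n").split("\t")
    chrom = f[0]
    first, last = int(f[1]), int(f[2])   # 1-based first/last intron base
    strand = STRAND[f[3]]
    unique_reads, overhang = int(f[6]), int(f[8])

    intron_start = first - 1             # 0-based; the intron is [intron_start, last)
    left = min(overhang, intron_start)   # the left anchor cannot extend past position 0
    chrom_start = intron_start - left
    chrom_end = last + overhang
    record = [
        chrom, chrom_start, chrom_end, name, unique_reads, strand,
        chrom_start, chrom_end, "255,0,0", 2,
        f"{left},{overhang}",            # blockSizes: left and right anchors
        f"0,{last - chrom_start}",       # blockStarts relative to chromStart
    ]
    return "\t".join(str(x) for x in record)
```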

genome

label:: Reference genome
type:: data:seq:nucleotide

annotation

label:: Reference genome annotation (GTF)
type:: data:annotation:gtf

alignment_star

label:: STAR alignment
type:: data:alignment:bam:star
description:: Splice junctions detected by STAR aligner (SJ.out.tab STAR output file). Provide exactly one of the inputs: ‘STAR alignment’, ‘Alignment’ (from any aligner), or ‘Junctions in BED12 format’.
required:: False

alignment

label:: Alignment
type:: data:alignment:bam
description:: Aligned reads from which splice junctions are going to be extracted. Provide exactly one of the inputs: ‘STAR alignment’, ‘Alignment’ (from any aligner), or ‘Junctions in BED12 format’.
required:: False

input_bed_junctions

label:: Junctions in BED12 format
type:: data:bed
description:: Splice junctions in BED12 format. Provide exactly one of the inputs: ‘STAR alignment’, ‘Alignment’ (from any aligner), or ‘Junctions in BED12 format’.
required:: False

novel_splice_junctions

label:: Table of annotated novel splice junctions
type:: basic:file

splice_junctions

label:: Table of annotated splice junctions
type:: basic:file

novel_sj_bed

label:: Novel splice junctions in BED format
type:: basic:file

bed

label:: Splice junctions in BED format
type:: basic:file

novel_sj_bigbed_igv_ucsc

label:: Novel splice junctions in BigBed format
type:: basic:file
required:: False

bigbed_igv_ucsc

label:: Splice junctions in BigBed format
type:: basic:file
required:: False

novel_sj_tbi_jbrowse

label:: Novel splice junctions BED tbi index for JBrowse
type:: basic:file

tbi_jbrowse

label:: BED tbi index for JBrowse
type:: basic:file

species

label:: Species
type:: basic:string

build

label:: Build
type:: basic:string

Archive samples

data:archive:samples archive-samples (list:data data, list:basic:string fields, basic:boolean j)[Source: v0.5.2]

Create an archive of output files. The output folder structure is organized by sample slug and data object’s output-field names.
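The folder layout described above (sample slug, then output-field name) and the "junk paths" option can be sketched with Python's standard `zipfile` module. The input mapping and helper name are hypothetical; the real process collects files from data objects.

```python
import os
import zipfile
from typing import Dict, List


def build_archive(archive_path: str, files_by_sample: Dict[str, Dict[str, List[str]]],
                  junk_paths: bool = False) -> List[str]:
    """Zip files as <sample_slug>/<field_name>/<file>, or by base name only when junk_paths is set."""
    names = []
    with zipfile.ZipFile(archive_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for slug, fields in files_by_sample.items():
            for field, paths in fields.items():
                for path in paths:
                    base = os.path.basename(path)
                    arcname = base if junk_paths else f"{slug}/{field}/{base}"
                    zf.write(path, arcname)
                    names.append(arcname)
    return names
```

Junking paths keeps the archive flat, but files with the same base name from different samples then collide, which is one reason the slug/field layout is the default.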

data

label:: Data list
type:: list:data

fields

label:: Output file fields
type:: list:basic:string

j

label:: Junk paths
type:: basic:boolean
description:: Store just the names of saved files (junk the path).
default:: False

archive

label:: Archive
type:: basic:file

BAM file

data:alignment:bam:upload upload-bam (basic:file src, basic:string species, basic:string build)[Source: v1.8.0]

Import a BAM file (.bam), which is the binary format for storing sequence alignment data. This format is described on the [SAM Tools web site](http://samtools.github.io/hts-specs/).

src

label:: Mapping (BAM)
type:: basic:file
description:: A mapping file in BAM format. The file will be indexed on upload, so additional BAI files are not required.
validate_regex:: \.(bam)$

species

label:

Species

type:

basic:string

description:

Latin name of the species.

choices:

Homo sapiens: Homo sapiens
Mus musculus: Mus musculus
Rattus norvegicus: Rattus norvegicus
Dictyostelium discoideum: Dictyostelium discoideum
Odocoileus virginianus texanus: Odocoileus virginianus texanus
Solanum tuberosum: Solanum tuberosum

build

label:: Build
type:: basic:string

bam

label:: Uploaded file
type:: basic:file

bai

label:: Index BAI
type:: basic:file

stats

label:: Alignment statistics
type:: basic:file

species

label:: Species
type:: basic:string

build

label:: Build
type:: basic:string

BAM file and index

data:alignment:bam:upload upload-bam-indexed (basic:file src, basic:file src2, basic:string species, basic:string build)[Source: v1.8.0]

Import a BAM file (.bam) and a BAM index (.bam.bai). BAM is the binary format for storing sequence alignment data. This format is described on the [SAM Tools web site](http://samtools.github.io/hts-specs/).

src

label:: Mapping (BAM)
type:: basic:file
description:: A mapping file in BAM format.
validate_regex:: \.(bam)$

src2

label:: BAM index (*.bam.bai file)
type:: basic:file
description:: An index file of a BAM mapping file (ending with .bam.bai).
validate_regex:: \.(bam.bai)$
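The `validate_regex` patterns above are regular expressions matched against the uploaded file name. This sketch checks how they behave with Python's `re` module. Note that the unescaped `.` in `\.(bam.bai)$` matches any character; the stricter pattern shown is an illustrative alternative, not the one the process uses.

```python
import re

BAM_RE = re.compile(r"\.(bam)$")
BAI_RE = re.compile(r"\.(bam.bai)$")          # as written in the process definition
BAI_STRICT_RE = re.compile(r"\.bam\.bai$")    # hypothetical stricter alternative


def is_valid(name: str, pattern: "re.Pattern[str]") -> bool:
    """Return True if the file name matches the validation pattern."""
    return pattern.search(name) is not None
```

For example, `is_valid("sample.bam_bai", BAI_RE)` is true because `.` matches `_`, while the stricter pattern rejects that name.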

species

label:

Species

type:

basic:string

description:

Latin name of the species.

choices:

Homo sapiens: Homo sapiens
Mus musculus: Mus musculus
Rattus norvegicus: Rattus norvegicus
Dictyostelium discoideum: Dictyostelium discoideum
Odocoileus virginianus texanus: Odocoileus virginianus texanus
Solanum tuberosum: Solanum tuberosum

build

label:: Build
type:: basic:string

bam

label:: Uploaded file
type:: basic:file

bai

label:: Index BAI
type:: basic:file

stats

label:: Alignment statistics
type:: basic:file

species

label:: Species
type:: basic:string

build

label:: Build
type:: basic:string

BBDuk (paired-end)

data:reads:fastq:paired:bbduk: bbduk-paired (data:reads:fastq:paired reads, basic:integer min_length, list:data:seq:nucleotide sequences, list:basic:string literal_sequences, basic:integer kmer_length, basic:boolean check_reverse_complements, basic:boolean mask_middle_base, basic:integer min_kmer_hits, basic:decimal min_kmer_fraction, basic:decimal min_coverage_fraction, basic:integer hamming_distance, basic:integer query_hamming_distance, basic:integer edit_distance, basic:integer hamming_distance2, basic:integer query_hamming_distance2, basic:integer edit_distance2, basic:boolean forbid_N, basic:boolean find_best_match, basic:boolean remove_if_either_bad, basic:boolean perform_error_correction, basic:string k_trim, basic:string k_mask, basic:boolean mask_fully_covered, basic:integer min_k, basic:string quality_trim, basic:integer trim_quality, basic:string quality_encoding_offset, basic:boolean ignore_bad_quality, basic:integer trim_poly_A, basic:decimal min_length_fraction, basic:integer max_length, basic:integer min_average_quality, basic:integer min_average_quality_bases, basic:integer min_base_quality, basic:integer min_consecutive_bases, basic:integer trim_pad, basic:boolean trim_by_overlap, basic:boolean strict_overlap, basic:integer min_overlap, basic:integer min_insert, basic:boolean trim_pairs_evenly, basic:integer force_trim_left, basic:integer force_trim_right, basic:integer force_trim_right2, basic:integer force_trim_mod, basic:integer restrict_left, basic:integer restrict_right, basic:decimal min_GC, basic:decimal max_GC, basic:integer maxns, basic:boolean toss_junk, basic:boolean chastity_filter, basic:boolean barcode_filter, list:data:seq:nucleotide barcode_files, list:basic:string barcode_sequences, basic:integer x_min, basic:integer y_min, basic:integer x_max, basic:integer y_max, basic:decimal entropy, basic:integer entropy_window, basic:integer entropy_k, basic:boolean entropy_mask, basic:integer min_base_frequency, basic:boolean nogroup)[Source: v3.1.2]

Run BBDuk on paired-end reads. BBDuk combines the most common data-quality-related trimming, filtering, and masking operations into a single high-performance tool. It is capable of quality-trimming and filtering, adapter-trimming, contaminant-filtering via kmer matching, sequence masking, GC-filtering, length filtering, entropy-filtering, format conversion, histogram generation, subsampling, quality-score recalibration, kmer cardinality estimation, and various other operations in a single pass. See [here](https://jgi.doe.gov/data-and-tools/bbtools/bb-tools-user-guide/bbduk-guide/) for more information.

reads

label:: Reads
type:: data:reads:fastq:paired
required:: True
disabled:: False
hidden:: False

min_length

label:: Minimum length
type:: basic:integer
description:: Reads shorter than the minimum length will be discarded after trimming.
required:: True
disabled:: False
hidden:: False
default:: 10

reference.sequences

label:: Sequences
type:: list:data:seq:nucleotide
description:: Reference sequences include adapters, contaminants, and degenerate sequences. They can be provided in a multi-sequence FASTA file or as a set of literal sequences below.
required:: False
disabled:: False
hidden:: False

reference.literal_sequences

label:: Literal sequences
type:: list:basic:string
description:: Literal sequences can be specified by inputting them one by one and pressing Enter after each sequence.
required:: False
disabled:: False
hidden:: False
default:: []

processing.kmer_length

label:: Kmer length
type:: basic:integer
description:: Kmer length used for finding contaminants. Contaminants shorter than kmer length will not be found. Kmer length must be at least 1.
required:: True
disabled:: False
hidden:: False
default:: 27

processing.check_reverse_complements

label:: Check reverse complements
type:: basic:boolean
description:: Look for reverse complements of kmers in addition to forward kmers.
required:: True
disabled:: False
hidden:: False
default:: True

processing.mask_middle_base

label:: Mask the middle base of a kmer
type:: basic:boolean
description:: Treat the middle base of a kmer as a wildcard to increase sensitivity in the presence of errors.
required:: True
disabled:: False
hidden:: False
default:: True

processing.min_kmer_hits

label:: Minimum number of kmer hits
type:: basic:integer
description:: Reads need at least this many matching kmers to be considered matching the reference.
required:: True
disabled:: False
hidden:: False
default:: 1

processing.min_kmer_fraction

label:: Minimum kmer fraction
type:: basic:decimal
description:: A read needs at least this fraction of its total kmers to hit a reference in order to be considered a match. If this and ‘Minimum number of kmer hits’ are set, the greater is used.
required:: True
disabled:: False
hidden:: False
default:: 0.0
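The kmer-matching options above (kmer length, reverse complements, minimum kmer hits and minimum kmer fraction) can be illustrated with exact kmer lookups. This is a simplified sketch: BBDuk's middle-base masking, Hamming and edit distances are not modeled, and the helper names are hypothetical. As the description states, the larger of the two thresholds wins.

```python
from math import ceil
from typing import Iterable, Set

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def reference_kmers(refs: Iterable[str], k: int = 27, rcomp: bool = True) -> Set[str]:
    """All kmers of the reference sequences, optionally with their reverse complements."""
    kmers: Set[str] = set()
    for ref in refs:
        ref = ref.upper()
        for i in range(len(ref) - k + 1):
            kmer = ref[i:i + k]
            kmers.add(kmer)
            if rcomp:
                kmers.add(revcomp(kmer))
    return kmers


def matches_reference(read: str, kmers: Set[str], k: int = 27,
                      min_kmer_hits: int = 1, min_kmer_fraction: float = 0.0) -> bool:
    """True if the read shares enough exact kmers with the reference."""
    read = read.upper()
    total = max(0, len(read) - k + 1)
    if total == 0:
        return False  # reads shorter than k cannot match
    hits = sum(1 for i in range(total) if read[i:i + k] in kmers)
    threshold = max(min_kmer_hits, ceil(min_kmer_fraction * total))
    return hits >= threshold
```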

processing.min_coverage_fraction

label:: Minimum coverage fraction
type:: basic:decimal
description:: A read needs at least this fraction of its total bases to be covered by reference kmers to be considered a match. If specified, ‘Minimum coverage fraction’ overrides ‘Minimum number of kmer hits’ and ‘Minimum kmer fraction’.
required:: True
disabled:: False
hidden:: False
default:: 0.0

processing.hamming_distance

label:: Maximum Hamming distance for kmers (substitutions only)
type:: basic:integer
description:: Hamming distance, i.e. the number of mismatches allowed in the kmer.
required:: True
disabled:: False
hidden:: False
default:: 0

processing.query_hamming_distance

label:: Hamming distance for query kmers
type:: basic:integer
description:: Set a Hamming distance for query kmers instead of the read kmers. This makes read processing much slower but does not use additional memory.
required:: True
disabled:: False
hidden:: False
default:: 0

processing.edit_distance

label:: Maximum edit distance from reference kmers (substitutions and indels)
type:: basic:integer
required:: True
disabled:: False
hidden:: False
default:: 0

processing.hamming_distance2

label:: Hamming distance for short kmers when looking for shorter kmers
type:: basic:integer
required:: True
disabled:: False
hidden:: False
default:: 0

processing.query_hamming_distance2

label:: Hamming distance for short query kmers when looking for shorter kmers
type:: basic:integer
required:: True
disabled:: False
hidden:: False
default:: 0

processing.edit_distance2

label:: Maximum edit distance from short reference kmers (substitutions and indels) when looking for shorter kmers
type:: basic:integer
required:: True
disabled:: False
hidden:: False
default:: 0

processing.forbid_N

label:: Forbid matching of read kmers containing N
type:: basic:boolean
description:: By default, these will match a reference ‘A’ if ‘Maximum Hamming distance for kmers’ > 0 or ‘Maximum edit distance from reference kmers’ > 0, to increase sensitivity.
required:: True
disabled:: False
hidden:: False
default:: False

processing.find_best_match

label:: Find best match
type:: basic:boolean
description:: If a read matches multiple reference sequences, associate it with the sequence sharing the most kmers.
required:: True
disabled:: False
hidden:: False
default:: True

processing.remove_if_either_bad

label:: Remove both sequences of a paired-end read if either of them is to be removed
type:: basic:boolean
required:: True
disabled:: False
hidden:: False
default:: True

processing.perform_error_correction

label:: Perform error correction with BBMerge prior to kmer operations
type:: basic:boolean
required:: True
disabled:: False
hidden:: False
default:: False

operations.k_trim

label:

Trimming protocol to remove bases matching reference kmers from reads

type:

basic:string

required:

True

disabled:

False

hidden:

False

default:

f

choices:

Don’t trim: f
Trim to the right: r
Trim to the left: l
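The trimming choices can be read as follows: "Trim to the right" removes the matching kmer and everything to its right, and "Trim to the left" removes the matching kmer and everything to its left. A minimal sketch using exact kmer matches only; which match BBDuk uses when a read contains several is an assumption here (first match for right trimming, last match for left trimming).

```python
from typing import Set


def ktrim(read: str, kmers: Set[str], k: int, mode: str = "r") -> str:
    """Kmer-trim a read: 'f' = no trimming, 'r' = trim to the right, 'l' = trim to the left."""
    if mode == "f":
        return read
    positions = [i for i in range(len(read) - k + 1) if read[i:i + k] in kmers]
    if not positions:
        return read
    if mode == "r":
        return read[:positions[0]]        # drop the first matching kmer and everything after it
    if mode == "l":
        return read[positions[-1] + k:]   # drop the last matching kmer and everything before it
    raise ValueError(f"unknown k_trim mode: {mode}")
```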

operations.k_mask

label:: Symbol to replace bases matching reference kmers
type:: basic:string
description:: Allows any non-whitespace character other than t or f. Processes short kmers on both ends.
required:: True
disabled:: False
hidden:: False
default:: f

operations.mask_fully_covered

label:: Only mask bases that are fully covered by kmers
type:: basic:boolean
required:: True
disabled:: False
hidden:: False
default:: False

operations.min_k

label:: Look for shorter kmers at read tips down to this length when k-trimming or masking
type:: basic:integer
description:: -1 means disabled. Enabling this will disable treating the middle base of a kmer as a wildcard to increase sensitivity in the presence of errors.
required:: True
disabled:: False
hidden:: False
default:: -1

operations.quality_trim

label:: Trimming protocol to remove bases with quality below the minimum average region quality from read ends
type:: basic:string
description:: Performed after looking for kmers. If enabled, also set ‘Average quality below which to trim region’.
required:: True
disabled:: False
hidden:: False
default:: f
choices::
Trim neither end: f
Trim both ends: rl
Trim only right end: r
Trim only left end: l
Use sliding window: w

operations.trim_quality

label:: Average quality below which to trim region
type:: basic:integer
description:: Set trimming protocol to enable this parameter.
required:: True
disabled:: operations.quality_trim === ‘f’
hidden:: False
default:: 6
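The right-end, left-end and both-ends protocols cut a low-quality end rather than individual bases. The following minimal sketch illustrates the idea using a BWA-style running-sum rule as a stand-in for BBDuk's Phred-based trimming; BBDuk's actual implementation may differ, the function names are hypothetical, and the sliding-window protocol (w) is not modeled.

```python
def keep_length(quals: list[int], trimq: int) -> int:
    """Number of leading bases to keep after trimming the right (3') end.

    Walks from the 3' end adding (trimq - q) to a running score and cuts where
    the score peaks; stops as soon as the score turns negative.
    """
    best, cut, score = 0, len(quals), 0
    for i in range(len(quals) - 1, -1, -1):
        score += trimq - quals[i]
        if score < 0:
            break
        if score > best:
            best, cut = score, i
    return cut


def quality_trim(seq: str, quals: list[int], trimq: int, mode: str = "rl"):
    """Apply the protocol codes f, r, l or rl (w is not modeled here)."""
    start, end = 0, len(seq)
    if "r" in mode:
        end = keep_length(quals, trimq)
    if "l" in mode:
        # Left trimming is right trimming applied to the reversed qualities.
        start = len(quals) - keep_length(quals[::-1], trimq)
    if start >= end:
        return "", []
    return seq[start:end], quals[start:end]


quals = [2, 2] + [30] * 6 + [2, 2]
print(quality_trim("ACGTACGTAC", quals, trimq=10))  # ('GTACGT', [30, 30, 30, 30, 30, 30])
```

Because bases are judged by a cumulative score, a single good base inside a poor tail does not stop trimming, which is the advantage over naive base-by-base trimming.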

operations.quality_encoding_offset

label:: Quality encoding offset
type:: basic:string
description:: Quality encoding offset for input FASTQ files.
required:: True
disabled:: False
hidden:: False
default:: auto
choices::
Sanger / Illumina 1.8+ (33): 33
Illumina up to 1.3+, 1.5+ (64): 64
Auto: auto
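The offset is subtracted from each character's ASCII code to obtain the Phred score. The sketch below decodes qualities and guesses the offset with a simple heuristic: any character below ‘@’ (ASCII 64) implies Phred+33. The heuristic and the function names are assumptions for illustration. BBDuk's own auto-detection may use different rules, and old Solexa-scaled files are ignored here.

```python
def detect_offset(quality_strings: list[str]) -> int:
    """Guess 33 or 64 from the lowest quality character seen.

    Heuristic (assumption): Phred+64 never uses characters below '@',
    so any such character implies Phred+33. Empty input defaults to 33.
    """
    lowest = min((ord(c) for q in quality_strings for c in q), default=ord("!"))
    return 33 if lowest < ord("@") else 64


def decode(quality: str, offset: int) -> list[int]:
    """Convert an encoded quality string into Phred scores."""
    return [ord(c) - offset for c in quality]


print(decode("II#", detect_offset(["II#"])))  # [40, 40, 2]
print(decode("hhB", detect_offset(["hhB"])))  # [40, 40, 2]
```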

operations.ignore_bad_quality

label:: Don’t crash if quality values appear to be incorrect
type:: basic:boolean
required:: True
disabled:: False
hidden:: False
default:: False

operations.trim_poly_A

label:: Minimum length of poly-A or poly-T tails to trim on either end of reads
type:: basic:integer
required:: True
disabled:: False
hidden:: False
default:: 0

operations.min_length_fraction

label:: Minimum length fraction
type:: basic:decimal
description:: Reads shorter than this fraction of original length after trimming will be discarded.
required:: True
disabled:: False
hidden:: False
default:: 0.0

operations.max_length

label:: Maximum length
type:: basic:integer
description:: Reads longer than this after trimming will be discarded.
required:: False
disabled:: False
hidden:: False

operations.min_average_quality

label:: Minimum average quality
type:: basic:integer
description:: Reads with average quality (after trimming) below this will be discarded.
required:: True
disabled:: False
hidden:: False
default:: 0

operations.min_average_quality_bases

label:: Number of initial bases to calculate minimum average quality from
type:: basic:integer
description:: If positive, calculate minimum average quality from this many initial bases.
required:: True
disabled:: False
hidden:: False
default:: 0
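‘Minimum average quality’ and ‘Number of initial bases to calculate minimum average quality from’ work together. The sketch below averages error probabilities rather than raw Phred scores. We believe BBTools does something similar, but treat this as an assumption. An arithmetic mean of Phred scores would be a more lenient rule: for qualities 40 and 10 it gives 25, while the probability-based average is about 13. The function names are hypothetical.

```python
import math


def average_quality(quals: list[int], first_bases: int = 0) -> float:
    """Average quality computed by averaging per-base error probabilities.

    If first_bases > 0, only that many initial bases are used.
    """
    if first_bases > 0:
        quals = quals[:first_bases]
    if not quals:
        return 0.0
    mean_error = sum(10 ** (-q / 10) for q in quals) / len(quals)
    return -10 * math.log10(mean_error)


def passes_min_average_quality(quals: list[int], maq: int = 0, maq_bases: int = 0) -> bool:
    """A threshold of 0 (the default) disables the filter."""
    return maq <= 0 or average_quality(quals, maq_bases) >= maq


print(round(average_quality([40, 10]), 1))  # 13.0
```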

operations.min_base_quality

label:: Minimum base quality below which reads are discarded after trimming
type:: basic:integer
required:: True
disabled:: False
hidden:: False
default:: 0

operations.min_consecutive_bases

label:: Minimum number of consecutive called bases
type:: basic:integer
required:: True
disabled:: False
hidden:: False
default:: 0

operations.trim_pad

label:: Number of bases to trim around matching kmers
type:: basic:integer
required:: True
disabled:: False
hidden:: False
default:: 0

operations.trim_by_overlap

label:: Trim adapters based on where paired-end reads overlap
type:: basic:boolean
required:: True
disabled:: False
hidden:: False
default:: False

operations.strict_overlap

label:: Adjust sensitivity in ‘Trim adapters based on where paired-end reads overlap’ mode
type:: basic:boolean
required:: True
disabled:: False
hidden:: False
default:: True

operations.min_overlap

label:: Minimum number of overlapping bases
type:: basic:integer
description:: Require this many bases of overlap for detection.
required:: True
disabled:: False
hidden:: False
default:: 14

operations.min_insert

label:: Minimum insert size
type:: basic:integer
description:: Require insert size of at least this for overlap. Should be reduced to 16 for small RNA sequencing.
required:: True
disabled:: False
hidden:: False
default:: 40
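Trimming by overlap works because both mates come from the same insert. Once read 2 is reverse-complemented, it must agree with read 1 where they overlap. The following simplified sketch handles the two layouts with exact matching only. Either the insert is shorter than the reads (adapter read-through: read 1 starts with the insert and revcomp(read 2) ends with it), or the reads overlap only partially. BBDuk's real overlap detection is more elaborate and tolerates mismatches. Treating an implied insert below ‘Minimum insert size’ as untrustworthy is our reading of that parameter. All function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def insert_size(r1: str, r2: str, min_overlap: int = 14):
    """Insert size implied by an exact overlap of r1 and revcomp(r2), or None."""
    rc2 = revcomp(r2)
    l1, l2 = len(r1), len(rc2)
    # Insert no longer than the reads: r1 starts with it, rc2 ends with it.
    for size in range(min(l1, l2), min_overlap - 1, -1):
        if r1[:size] == rc2[l2 - size:]:
            return size
    # Partial overlap: a suffix of r1 equals a prefix of rc2.
    for ov in range(min(l1, l2), min_overlap - 1, -1):
        if r1[l1 - ov:] == rc2[:ov]:
            return l1 + l2 - ov
    return None


def trim_by_overlap(r1: str, r2: str, min_overlap: int = 14, min_insert: int = 40):
    size = insert_size(r1, r2, min_overlap)
    if size is None or size < min_insert:
        return r1, r2  # no trustworthy overlap: leave reads untouched
    return r1[:size], r2[:size]  # no-op when the insert is longer than the reads


insert = "ACGTTGCATGCCATAGGCTT"  # 20 bp insert followed by adapter bases
print(trim_by_overlap(insert + "AGATC", revcomp(insert) + "AGATC", min_insert=16))
```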

operations.trim_pairs_evenly

label:: Trim both sequences of paired-end reads to the minimum length of either sequence
type:: basic:boolean
required:: True
disabled:: False
hidden:: False
default:: False

operations.force_trim_left

label:: Position from which to trim bases to the left
type:: basic:integer
required:: True
disabled:: False
hidden:: False
default:: 0

operations.force_trim_right

label:: Position from which to trim bases to the right
type:: basic:integer
required:: True
disabled:: False
hidden:: False
default:: 0

operations.force_trim_right2

label:: Number of bases to trim from the right end
type:: basic:integer
required:: True
disabled:: False
hidden:: False
default:: 0

operations.force_trim_mod

label:: Modulo to right-trim reads
type:: basic:integer
description:: Trim reads to the largest multiple of modulo.
required:: True
disabled:: False
hidden:: False
default:: 0
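The four force-trimming parameters can be illustrated together. The following sketch follows our reading of BBDuk's documentation. Positions are 0-based. ‘Position from which to trim bases to the left’ and ‘…to the right’ keep the named position itself. The modulo is applied to the length left after the other cuts. Both the application order and the function name are assumptions.

```python
def force_trim(seq: str, ftl: int = 0, ftr: int = 0, ftr2: int = 0, ftm: int = 0) -> str:
    """Hypothetical sketch of left, right, right-end and modulo force trimming."""
    start, end = 0, len(seq)
    if ftl > 0:
        start = ftl  # drop bases left of 0-based position ftl
    if ftr > 0:
        end = min(end, ftr + 1)  # keep position ftr, drop everything after it
    if ftr2 > 0:
        end = min(end, len(seq) - ftr2)  # drop the last ftr2 bases
    if ftm > 0:
        end -= (end - start) % ftm  # shorten to the largest multiple of ftm
    return seq[start:end] if end > start else ""


print(len(force_trim("A" * 151, ftm=5)))  # 150
```

A common use of the modulo option is trimming a 151 bp read, whose last base is often of poor quality, down to 150 bp.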

operations.restrict_left

label:: Number of leftmost bases to look in for kmer matches
type:: basic:integer
required:: True
disabled:: False
hidden:: False
default:: 0

operations.restrict_right

label:: Number of rightmost bases to look in for kmer matches
type:: basic:integer
required:: True
disabled:: False
hidden:: False
default:: 0

operations.min_GC

label:: Minimum GC content
type:: basic:decimal
description:: Discard reads with lower GC content.
required:: True
disabled:: False
hidden:: False
default:: 0.0

operations.max_GC

label:: Maximum GC content
type:: basic:decimal
description:: Discard reads with higher GC content.
required:: True
disabled:: False
hidden:: False
default:: 1.0

operations.maxns

label:: Max Ns after trimming
type:: basic:integer
description:: If non-negative, reads with more Ns than this (after trimming) will be discarded.
required:: True
disabled:: False
hidden:: False
default:: -1
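The GC and N filters are simple per-read tests. In the sketch below, GC content is computed over called bases (A, C, G, T) only, so Ns do not dilute it. This is an assumption, since BBDuk may divide by the full read length. The function names are hypothetical. With the defaults (0.0, 1.0 and -1) every read passes.

```python
def gc_fraction(seq: str) -> float:
    """GC fraction over called bases; Ns are ignored (assumption)."""
    s = seq.upper()
    called = sum(s.count(b) for b in "ACGT")
    return (s.count("G") + s.count("C")) / called if called else 0.0


def passes_content_filters(seq: str, min_gc: float = 0.0, max_gc: float = 1.0,
                           maxns: int = -1) -> bool:
    # A negative maxns disables the N filter.
    if maxns >= 0 and seq.upper().count("N") > maxns:
        return False
    return min_gc <= gc_fraction(seq) <= max_gc


print(passes_content_filters("GGCCAATT", min_gc=0.6))  # False: GC is 0.5
```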

operations.toss_junk

label:: Discard reads with invalid characters as bases
type:: basic:boolean
required:: True
disabled:: False
hidden:: False
default:: False

header_parsing.chastity_filter

label:: Discard reads that fail Illumina chastity filtering
type:: basic:boolean
description:: Discard reads with id containing ‘ 1:Y:’ or ‘ 2:Y:’.
required:: True
disabled:: False
hidden:: False
default:: False
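In Casava 1.8+ FASTQ headers, the part after the space has the form `<read>:<is filtered>:<control number>:<index>`, where `Y` marks a read that failed chastity filtering. The filter therefore reduces to a substring test. The following sketch uses a hypothetical function name and an example header.

```python
def fails_chastity(header: str) -> bool:
    """True if the header marks mate 1 or mate 2 as filtered (Y)."""
    return " 1:Y:" in header or " 2:Y:" in header


print(fails_chastity("@M001:12:FC:1:1101:15589:1331 1:Y:0:ACGTAC"))  # True
```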

header_parsing.barcode_filter

label:: Remove reads with unexpected barcodes
type:: basic:boolean
description:: Remove reads with unexpected barcodes if barcodes are set, or barcodes containing ‘N’ otherwise. A barcode must be the last part of the read header.
required:: True
disabled:: False
hidden:: False
default:: False

header_parsing.barcode_files

label:: Barcode sequences
type:: list:data:seq:nucleotide
description:: FASTA file(s) with barcode sequences.
required:: False
disabled:: False
hidden:: False

header_parsing.barcode_sequences

label:: Literal barcode sequences
type:: list:basic:string
description:: Literal barcode sequences can be specified by inputting them one by one and pressing Enter after each sequence.
required:: False
disabled:: False
hidden:: False
default:: []

header_parsing.x_min

label:: Minimum X coordinate
type:: basic:integer
description:: If positive, discard reads with a smaller X coordinate.
required:: True
disabled:: False
hidden:: False
default:: -1

header_parsing.y_min

label:: Minimum Y coordinate
type:: basic:integer
description:: If positive, discard reads with a smaller Y coordinate.
required:: True
disabled:: False
hidden:: False
default:: -1

header_parsing.x_max

label:: Maximum X coordinate
type:: basic:integer
description:: If positive, discard reads with a larger X coordinate.
required:: True
disabled:: False
hidden:: False
default:: -1

header_parsing.y_max

label:: Maximum Y coordinate
type:: basic:integer
description:: If positive, discard reads with a larger Y coordinate.
required:: True
disabled:: False
hidden:: False
default:: -1
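The coordinate filters rely on the tile X and Y positions stored in the read name. In Casava 1.8+ headers, the read name is `<instrument>:<run>:<flowcell>:<lane>:<tile>:<x>:<y>`. The following sketch assumes that layout (older header formats differ) and uses hypothetical function names. As in the parameter descriptions, a non-positive bound disables the corresponding check.

```python
def tile_xy(header: str) -> tuple[int, int]:
    """X and Y are the 6th and 7th ':'-separated fields of the read name."""
    name = header.lstrip("@").split()[0]
    fields = name.split(":")
    return int(fields[5]), int(fields[6])


def passes_xy(header: str, x_min: int = -1, y_min: int = -1,
              x_max: int = -1, y_max: int = -1) -> bool:
    x, y = tile_xy(header)
    if x_min > 0 and x < x_min:
        return False
    if y_min > 0 and y < y_min:
        return False
    if x_max > 0 and x > x_max:
        return False
    if y_max > 0 and y > y_max:
        return False
    return True


print(tile_xy("@M001:12:000000000-A1B2C:1:1101:15589:1331 1:N:0:1"))  # (15589, 1331)
```

Filtering on coordinates can be used, for example, to drop reads from the edges of a tile where image quality is often worse.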

complexity.entropy

label:: Minimum entropy
type:: basic:decimal
description:: Set between 0 and 1 to filter reads with entropy below that value. Higher is more stringent.
required:: True
disabled:: False
hidden:: False
default:: -1.0

complexity.entropy_window

label:: Length of sliding window used to calculate entropy
type:: basic:integer
description:: To use the sliding window, set ‘Minimum entropy’ to a value between 0.0 and 1.0.
required:: True
disabled:: False
hidden:: False
default:: 50

complexity.entropy_k

label:: Length of kmers used to calculate entropy
type:: basic:integer
required:: True
disabled:: False
hidden:: False
default:: 5
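The entropy filter combines ‘Minimum entropy’, the window length and the kmer length. The following sketch computes the Shannon entropy of the kmer counts in each window, scales it to 0..1 by dividing by the entropy of a window whose kmers are all distinct, and keeps a read if its lowest-entropy window reaches the threshold. This is one plausible formulation; BBDuk's exact formula may differ, and the function names are hypothetical.

```python
import math
from collections import Counter


def window_entropy(seq: str, k: int) -> float:
    """Shannon entropy of the kmer counts in seq, scaled to 0..1."""
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    n = len(kmers)
    if n <= 1:
        return 0.0
    counts = Counter(kmers)
    h = -sum(c / n * math.log(c / n) for c in counts.values())
    return h / math.log(n)  # log(n) is the entropy when all kmers are distinct


def read_entropy(seq: str, window: int = 50, k: int = 5) -> float:
    """Lowest entropy over all windows (the whole read if it is shorter)."""
    if len(seq) <= window:
        return window_entropy(seq, k)
    return min(window_entropy(seq[i:i + window], k)
               for i in range(len(seq) - window + 1))


def passes_entropy(seq: str, min_entropy: float = -1.0, window: int = 50, k: int = 5) -> bool:
    # A negative threshold (the default) disables the filter.
    return min_entropy < 0 or read_entropy(seq, window, k) >= min_entropy


print(round(read_entropy("AC" * 30), 2))  # 0.18: a dinucleotide repeat is low-complexity
```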

complexity.entropy_mask

label:: Mask low-entropy parts of sequences with N instead of discarding
type:: basic:boolean
required:: True
disabled:: False
hidden:: False
default:: False

complexity.min_base_frequency

label:: Minimum base frequency
type:: basic:integer
required:: True
disabled:: False
hidden:: False
default:: 0

fastqc.nogroup

label:: Disable grouping of bases for reads >50bp
type:: basic:boolean
description:: All reports will show data for every base in the read. Using this option on very long reads can cause FastQC to crash.
required:: True
disabled:: False
hidden:: False
default:: False

fastq

label:: Remaining upstream reads
type:: list:basic:file
required:: True
disabled:: False
hidden:: False

fastq2

label:: Remaining downstream reads
type:: list:basic:file
required:: True
disabled:: False
hidden:: False

statistics

label:: Statistics
type:: list:basic:file
required:: True
disabled:: False
hidden:: False

fastqc_url

label:: Upstream quality control with FastQC
type:: list:basic:file:html
required:: True
disabled:: False
hidden:: False

fastqc_url2

label:: Downstream quality control with FastQC
type:: list:basic:file:html
required:: True
disabled:: False
hidden:: False

fastqc_archive

label:: Download upstream FastQC archive
type:: list:basic:file
required:: True
disabled:: False
hidden:: False

fastqc_archive2

label:: Download downstream FastQC archive
type:: list:basic:file
required:: True
disabled:: False
hidden:: False

BBDuk (single-end)

data:reads:fastq:single:bbduk:bbduk-single (data:reads:fastq:single reads, basic:integer min_length, list:data:seq:nucleotide sequences, list:basic:string literal_sequences, basic:integer kmer_length, basic:boolean check_reverse_complements, basic:boolean mask_middle_base, basic:integer min_kmer_hits, basic:decimal min_kmer_fraction, basic:decimal min_coverage_fraction, basic:integer hamming_distance, basic:integer query_hamming_distance, basic:integer edit_distance, basic:integer hamming_distance2, basic:integer query_hamming_distance2, basic:integer edit_distance2, basic:boolean forbid_N, basic:boolean find_best_match, basic:string k_trim, basic:string k_mask, basic:boolean mask_fully_covered, basic:integer min_k, basic:string quality_trim, basic:integer trim_quality, basic:string quality_encoding_offset, basic:boolean ignore_bad_quality, basic:integer trim_poly_A, basic:decimal min_length_fraction, basic:integer max_length, basic:integer min_average_quality, basic:integer min_average_quality_bases, basic:integer min_base_quality, basic:integer min_consecutive_bases, basic:integer trim_pad, basic:integer min_overlap, basic:integer min_insert, basic:integer force_trim_left, basic:integer force_trim_right, basic:integer force_trim_right2, basic:integer force_trim_mod, basic:integer restrict_left, basic:integer restrict_right, basic:decimal min_GC, basic:decimal max_GC, basic:integer maxns, basic:boolean toss_junk, basic:boolean chastity_filter, basic:boolean barcode_filter, list:data:seq:nucleotide barcode_files, list:basic:string barcode_sequences, basic:integer x_min, basic:integer y_min, basic:integer x_max, basic:integer y_max, basic:decimal entropy, basic:integer entropy_window, basic:integer entropy_k, basic:boolean entropy_mask, basic:integer min_base_frequency, basic:boolean nogroup)[Source: v3.1.2]

Run BBDuk on single-end reads. BBDuk combines the most common data-quality-related trimming, filtering, and masking operations into a single high-performance tool. It is capable of quality-trimming and filtering, adapter-trimming, contaminant-filtering via kmer matching, sequence masking, GC-filtering, length filtering, entropy-filtering, format conversion, histogram generation, subsampling, quality-score recalibration, kmer cardinality estimation, and various other operations in a single pass. For more information, see the [BBDuk guide](https://jgi.doe.gov/data-and-tools/bbtools/bb-tools-user-guide/bbduk-guide/).

reads

label:: Reads
type:: data:reads:fastq:single
required:: True
disabled:: False
hidden:: False

min_length

label:: Minimum length
type:: basic:integer
description:: Reads shorter than the minimum length will be discarded after trimming.
required:: True
disabled:: False
hidden:: False
default:: 10

reference.sequences

label:: Sequences
type:: list:data:seq:nucleotide
description:: Reference sequences include adapters, contaminants, and degenerate sequences. They can be provided in a multi-sequence FASTA file or as a set of literal sequences below.
required:: False
disabled:: False
hidden:: False

reference.literal_sequences

label:: Literal sequences
type:: list:basic:string
description:: Literal sequences can be specified by inputting them one by one and pressing Enter after each sequence.
required:: False
disabled:: False
hidden:: False
default:: []

processing.kmer_length

label:: Kmer length
type:: basic:integer
description:: Kmer length used for finding contaminants. Contaminants shorter than kmer length will not be found. Kmer length must be at least 1.
required:: True
disabled:: False
hidden:: False
default:: 27

processing.check_reverse_complements

label:: Check reverse complements
type:: basic:boolean
description:: Look for reverse complements of kmers in addition to forward kmers.
required:: True
disabled:: False
hidden:: False
default:: True

processing.mask_middle_base

label:: Mask the middle base of a kmer
type:: basic:boolean
description:: Treat the middle base of a kmer as a wildcard to increase sensitivity in the presence of errors.
required:: True
disabled:: False
hidden:: False
default:: True
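‘Kmer length’, ‘Check reverse complements’ and ‘Mask the middle base of a kmer’ all affect which read kmers count as hits. The following sketch stores reference kmers (optionally with their reverse complements) in a set and replaces the middle base with N on both sides, so a read kmer with an error at that position still matches. The set-based lookup and the function names are assumptions for illustration. For even k the choice of "middle" base is also an assumption, so the example uses an odd k.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def kmers(seq: str, k: int) -> list[str]:
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def mask_middle(kmer: str) -> str:
    m = len(kmer) // 2
    return kmer[:m] + "N" + kmer[m + 1:]


def reference_keys(refs: list[str], k: int, rcomp: bool = True, mm: bool = True) -> set[str]:
    keys = set()
    for ref in refs:
        for km in kmers(ref, k):
            for variant in ((km, revcomp(km)) if rcomp else (km,)):
                keys.add(mask_middle(variant) if mm else variant)
    return keys


def count_hits(read: str, keys: set[str], k: int, mm: bool = True) -> int:
    return sum((mask_middle(km) if mm else km) in keys for km in kmers(read, k))


# "GCTAC" is the reverse complement of reference kmer "GTTGC" with its middle base changed.
print(count_hits("GCTAC", reference_keys(["ACGTTGCATGC"], 5), 5))  # 1
```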

processing.min_kmer_hits

label:: Minimum number of kmer hits
type:: basic:integer
description:: Reads need at least this many matching kmers to be considered matching the reference.
required:: True
disabled:: False
hidden:: False
default:: 1

processing.min_kmer_fraction

label:: Minimum kmer fraction
type:: basic:decimal
description:: A read needs at least this fraction of its total kmers to hit a reference in order to be considered a match. If this and ‘Minimum number of kmer hits’ are set, the greater is used.
required:: True
disabled:: False
hidden:: False
default:: 0.0
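The three matching criteria interact as described above. When both are set, the greater of ‘Minimum number of kmer hits’ and ‘Minimum kmer fraction’ applies, and ‘Minimum coverage fraction’ overrides both when it is specified. The following sketch encodes those rules. Rounding the fraction up with `ceil` and the function names are assumptions.

```python
import math


def coverage_fraction(read_len: int, hit_starts: list[int], k: int) -> float:
    """Fraction of read bases covered by at least one matching kmer."""
    covered = set()
    for s in hit_starts:
        covered.update(range(s, min(s + k, read_len)))
    return len(covered) / read_len if read_len else 0.0


def is_match(hit_starts: list[int], read_len: int, k: int, min_hits: int = 1,
             min_kmer_fraction: float = 0.0, min_coverage_fraction: float = 0.0) -> bool:
    if min_coverage_fraction > 0:  # overrides the other two criteria
        return coverage_fraction(read_len, hit_starts, k) >= min_coverage_fraction
    total_kmers = max(read_len - k + 1, 0)
    required = max(min_hits, math.ceil(min_kmer_fraction * total_kmers))
    return len(hit_starts) >= required


# A 50 bp read has 24 kmers of length 27, so a fraction of 0.5 requires 12 hits.
print(is_match([0, 1], 50, 27, min_kmer_fraction=0.5))  # False
```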

processing.min_coverage_fraction

label:: Minimum coverage fraction
type:: basic:decimal
description:: A read needs at least this fraction of its total bases to be covered by reference kmers to be considered a match. If specified, ‘Minimum coverage fraction’ overrides ‘Minimum number of kmer hits’ and ‘Minimum kmer fraction’.
required:: True
disabled:: False
hidden:: False
default:: 0.0

processing.hamming_distance

label:: Maximum Hamming distance for kmers (substitutions only)
type:: basic:integer
description:: Hamming distance, i.e. the number of mismatches allowed in the kmer.
required:: True
disabled:: False
hidden:: False
default:: 0
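The Hamming distance counts substitutions between equal-length kmers. The following sketch shows two ways to tolerate mismatches, which mirror the trade-off described for ‘Hamming distance for query kmers’. Expanding every reference kmer into all of its neighbors up front costs memory but keeps lookups fast. Expanding each read kmer at lookup time needs no extra memory but is slower. The set-based implementation and the function names are assumptions for illustration.

```python
from itertools import combinations, product


def hamming(a: str, b: str) -> int:
    if len(a) != len(b):
        raise ValueError("kmers must have equal length")
    return sum(x != y for x, y in zip(a, b))


def neighbors(kmer: str, d: int) -> set[str]:
    """All strings over ACGT within Hamming distance d of kmer (itself included)."""
    result = {kmer}
    for n in range(1, d + 1):
        for positions in combinations(range(len(kmer)), n):
            choices = [[b for b in "ACGT" if b != kmer[p]] for p in positions]
            for subs in product(*choices):
                variant = list(kmer)
                for p, b in zip(positions, subs):
                    variant[p] = b
                result.add("".join(variant))
    return result


ref = {"ACGTA"}
# Reference-side expansion: larger table, exact lookups afterwards.
ref_table = set().union(*(neighbors(r, 1) for r in ref))


def query_hit(kmer: str, ref_kmers: set[str], d: int) -> bool:
    """Query-side expansion: the reference table stays small."""
    return any(v in ref_kmers for v in neighbors(kmer, d))


print(len(neighbors("ACGTA", 1)))  # 16 = 1 + 3 * 5
```

The table grows quickly with the distance (106 entries per 5-mer at distance 2), which is why larger distances are expensive.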

processing.query_hamming_distance

label:: Hamming distance for query kmers
type:: basic:integer
description:: Set a Hamming distance for query kmers instead of the read kmers. This makes read processing much slower, but does not use additional memory.
required:: True
disabled:: False
hidden:: False
default:: 0

processing.edit_distance

label:: Maximum edit distance from reference kmers (substitutions and indels)
type:: basic:integer
required:: True
disabled:: False
hidden:: False
default:: 0

processing.hamming_distance2

label:: Hamming distance for short kmers when looking for shorter kmers
type:: basic:integer
required:: True
disabled:: False
hidden:: False
default:: 0

processing.query_hamming_distance2

label:: Hamming distance for short query kmers when looking for shorter kmers
type:: basic:integer
required:: True
disabled:: False
hidden:: False
default:: 0

processing.edit_distance2

label:: Maximum edit distance from short reference kmers (substitutions and indels) when looking for shorter kmers
type:: basic:integer
required:: True
disabled:: False
hidden:: False
default:: 0

processing.forbid_N

label:: Forbid matching of read kmers containing N
type:: basic:boolean
description:: By default, read kmers containing N will match a reference ‘A’ if ‘Maximum Hamming distance for kmers’ > 0 or ‘Maximum edit distance from reference kmers’ > 0, which increases sensitivity.
required:: True
disabled:: False
hidden:: False
default:: False

processing.find_best_match

label:: Find best match
type:: basic:boolean
description:: If multiple matches, associate read with sequence sharing most kmers.
required:: True
disabled:: False
hidden:: False
default:: True

operations.k_trim

label:: Trimming protocol to remove bases matching reference kmers from reads
type:: basic:string
required:: True
disabled:: False
hidden:: False
default:: f
choices::
Don’t trim: f
Trim to the right: r
Trim to the left: l

operations.k_mask

label:: Symbol to replace bases matching reference kmers
type:: basic:string
description:: Allows any non-whitespace character other than t or f. Processes short kmers on both ends.
required:: True
disabled:: False
hidden:: False
default:: f

operations.mask_fully_covered

label:: Only mask bases that are fully covered by kmers
type:: basic:boolean
required:: True
disabled:: False
hidden:: False
default:: False

operations.min_k

label:: Look for shorter kmers at read tips down to this length when k-trimming or masking
type:: basic:integer
description:: -1 means disabled. Enabling this will disable treating the middle base of a kmer as a wildcard to increase sensitivity in the presence of errors.
required:: True
disabled:: False
hidden:: False
default:: -1

operations.quality_trim

label:: Trimming protocol to remove bases with quality below the minimum average region quality from read ends
type:: basic:string
description:: Performed after looking for kmers. If enabled, also set ‘Average quality below which to trim region’.
required:: True
disabled:: False
hidden:: False
default:: f
choices::
Trim neither end: f
Trim both ends: rl
Trim only right end: r
Trim only left end: l
Use sliding window: w

operations.trim_quality

label:: Average quality below which to trim region
type:: basic:integer
description:: Set trimming protocol to enable this parameter.
required:: True
disabled:: operations.quality_trim === ‘f’
hidden:: False
default:: 6

operations.quality_encoding_offset

label:: Quality encoding offset
type:: basic:string
description:: Quality encoding offset for input FASTQ files.
required:: True
disabled:: False
hidden:: False
default:: auto
choices::
Sanger / Illumina 1.8+ (33): 33
Illumina up to 1.3+, 1.5+ (64): 64
Auto: auto

operations.ignore_bad_quality

label:: Don’t crash if quality values appear to be incorrect
type:: basic:boolean
required:: True
disabled:: False
hidden:: False
default:: False

operations.trim_poly_A

label:: Minimum length of poly-A or poly-T tails to trim on either end of reads
type:: basic:integer
required:: True
disabled:: False
hidden:: False
default:: 0

operations.min_length_fraction

label:: Minimum length fraction
type:: basic:decimal
description:: Reads shorter than this fraction of original length after trimming will be discarded.
required:: True
disabled:: False
hidden:: False
default:: 0.0

operations.max_length

label:: Maximum length
type:: basic:integer
description:: Reads longer than this after trimming will be discarded.
required:: False
disabled:: False
hidden:: False

operations.min_average_quality

label:: Minimum average quality
type:: basic:integer
description:: Reads with average quality (after trimming) below this will be discarded.
required:: True
disabled:: False
hidden:: False
default:: 0

operations.min_average_quality_bases

label:: Number of initial bases to calculate minimum average quality from
type:: basic:integer
description:: If positive, calculate minimum average quality from this many initial bases.
required:: True
disabled:: False
hidden:: False
default:: 0

operations.min_base_quality

label:: Minimum base quality below which reads are discarded after trimming
type:: basic:integer
required:: True
disabled:: False
hidden:: False
default:: 0

operations.min_consecutive_bases

label:: Minimum number of consecutive called bases
type:: basic:integer
required:: True
disabled:: False
hidden:: False
default:: 0

operations.trim_pad

label:: Number of bases to trim around matching kmers
type:: basic:integer
required:: True
disabled:: False
hidden:: False
default:: 0

operations.min_overlap

label:: Minimum number of overlapping bases
type:: basic:integer
description:: Require this many bases of overlap for detection.
required:: True
disabled:: False
hidden:: False
default:: 14

operations.min_insert

label:: Minimum insert size
type:: basic:integer
description:: Require insert size of at least this for overlap. Should be reduced to 16 for small RNA sequencing.
required:: True
disabled:: False
hidden:: False
default:: 40

operations.force_trim_left

label:: Position from which to trim bases to the left
type:: basic:integer
required:: True
disabled:: False
hidden:: False
default:: 0

operations.force_trim_right

label:: Position from which to trim bases to the right
type:: basic:integer
required:: True
disabled:: False
hidden:: False
default:: 0

operations.force_trim_right2

label:: Number of bases to trim from the right end
type:: basic:integer
required:: True
disabled:: False
hidden:: False
default:: 0

operations.force_trim_mod

label:: Modulo to right-trim reads
type:: basic:integer
description:: Trim reads to the largest multiple of modulo.
required:: True
disabled:: False
hidden:: False
default:: 0

operations.restrict_left

label:: Number of leftmost bases to look in for kmer matches
type:: basic:integer
required:: True
disabled:: False
hidden:: False
default:: 0

operations.restrict_right

label:: Number of rightmost bases to look in for kmer matches
type:: basic:integer
required:: True
disabled:: False
hidden:: False
default:: 0

operations.min_GC

label:: Minimum GC content
type:: basic:decimal
description:: Discard reads with lower GC content.
required:: True
disabled:: False
hidden:: False
default:: 0.0

operations.max_GC

label:: Maximum GC content
type:: basic:decimal
description:: Discard reads with higher GC content.
required:: True
disabled:: False
hidden:: False
default:: 1.0

operations.maxns

label:: Max Ns after trimming
type:: basic:integer
description:: If non-negative, reads with more Ns than this (after trimming) will be discarded.
required:: True
disabled:: False
hidden:: False
default:: -1

operations.toss_junk

label:: Discard reads with invalid characters as bases
type:: basic:boolean
required:: True
disabled:: False
hidden:: False
default:: False

header_parsing.chastity_filter

label:: Discard reads that fail Illumina chastity filtering
type:: basic:boolean
description:: Discard reads with id containing ‘ 1:Y:’ or ‘ 2:Y:’.
required:: True
disabled:: False
hidden:: False
default:: False

header_parsing.barcode_filter

label:: Remove reads with unexpected barcodes
type:: basic:boolean
description:: Remove reads with unexpected barcodes if barcodes are set, or barcodes containing ‘N’ otherwise. A barcode must be the last part of the read header.
required:: True
disabled:: False
hidden:: False
default:: False

header_parsing.barcode_files

label:: Barcode sequences
type:: list:data:seq:nucleotide
description:: FASTA file(s) with barcode sequences.
required:: False
disabled:: False
hidden:: False

header_parsing.barcode_sequences

label:: Literal barcode sequences
type:: list:basic:string
description:: Literal barcode sequences can be specified by inputting them one by one and pressing Enter after each sequence.
required:: False
disabled:: False
hidden:: False
default:: []

header_parsing.x_min

label:: Minimum X coordinate
type:: basic:integer
description:: If positive, discard reads with a smaller X coordinate.
required:: True
disabled:: False
hidden:: False
default:: -1

header_parsing.y_min

label:: Minimum Y coordinate
type:: basic:integer
description:: If positive, discard reads with a smaller Y coordinate.
required:: True
disabled:: False
hidden:: False
default:: -1

header_parsing.x_max

label:: Maximum X coordinate
type:: basic:integer
description:: If positive, discard reads with a larger X coordinate.
required:: True
disabled:: False
hidden:: False
default:: -1

header_parsing.y_max

label:: Maximum Y coordinate
type:: basic:integer
description:: If positive, discard reads with a larger Y coordinate.
required:: True
disabled:: False
hidden:: False
default:: -1

complexity.entropy

label:: Minimum entropy
type:: basic:decimal
description:: Set between 0 and 1 to filter reads with entropy below that value. Higher is more stringent.
required:: True
disabled:: False
hidden:: False
default:: -1.0

complexity.entropy_window

label:: Length of sliding window used to calculate entropy
type:: basic:integer
description:: To use the sliding window, set ‘Minimum entropy’ to a value between 0.0 and 1.0.
required:: True
disabled:: False
hidden:: False
default:: 50

complexity.entropy_k

label:: Length of kmers used to calculate entropy
type:: basic:integer
required:: True
disabled:: False
hidden:: False
default:: 5

complexity.entropy_mask

label:: Mask low-entropy parts of sequences with N instead of discarding
type:: basic:boolean
required:: True
disabled:: False
hidden:: False
default:: False

complexity.min_base_frequency

label:: Minimum base frequency
type:: basic:integer
required:: True
disabled:: False
hidden:: False
default:: 0

fastqc.nogroup

label:: Disable grouping of bases for reads >50bp
type:: basic:boolean
description:: All reports will show data for every base in the read. Using this option on very long reads can cause FastQC to crash.
required:: True
disabled:: False
hidden:: False
default:: False

fastq

label:: Remaining reads
type:: list:basic:file
required:: True
disabled:: False
hidden:: False

statistics

label:: Statistics
type:: list:basic:file
required:: True
disabled:: False
hidden:: False

fastqc_url

label:: Quality control with FastQC
type:: list:basic:file:html
required:: True
disabled:: False
hidden:: False

fastqc_archive

label:: Download FastQC archive
type:: list:basic:file
required:: True
disabled:: False
hidden:: False

BBDuk - STAR - featureCounts - QC

data:workflow:rnaseq:featurecounts:qc:workflow-bbduk-star-featurecounts-qc (data:reads:fastq reads, data:index:star genome, data:annotation annotation, basic:string assay_type, data:index:salmon cdna_index, data:index:star rrna_reference, data:index:star globin_reference, list:data:seq:nucleotide adapters, list:basic:string custom_adapter_sequences, basic:integer kmer_length, basic:integer min_k, basic:integer hamming_distance, basic:integer maxns, basic:integer trim_quality, basic:integer min_length, basic:string quality_encoding_offset, basic:boolean ignore_bad_quality, basic:boolean unstranded, basic:boolean noncannonical, basic:boolean chimeric, basic:integer chim_segment_min, basic:boolean quant_mode, basic:boolean single_end, basic:string out_filter_type, basic:integer out_multimap_max, basic:integer out_mismatch_max, basic:decimal out_mismatch_nl_max, basic:integer out_score_min, basic:decimal out_mismatch_nrl_max, basic:integer align_overhang_min, basic:integer align_sjdb_overhang_min, basic:integer align_intron_size_min, basic:integer align_intron_size_max, basic:integer align_gap_max, basic:string align_end_alignment, basic:boolean out_unmapped, basic:string out_sam_attributes, basic:string out_rg_line, basic:integer n_reads, basic:string feature_class, basic:string feature_type, basic:string id_attribute, basic:boolean by_read_group, basic:integer n_reads, basic:integer seed, basic:decimal fraction, basic:boolean two_pass)[Source: v6.2.0]

RNA-seq pipeline composed of preprocessing, alignment, and quantification. First, reads are preprocessed by __BBDuk__, which removes adapters, trims reads for quality from the 3’-end, and discards reads that are too short after trimming. Compared to similar tools, BBDuk is known for its computational efficiency. Next, preprocessed reads are aligned by the __STAR__ aligner. At the time of implementation, STAR was considered a state-of-the-art tool that consistently produces accurate results from diverse sets of reads and performs well even with default settings. For more information, see [this comparison of RNA-seq aligners](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5792058/). Finally, aligned reads are summarized to genes by __featureCounts__. Widely adopted by the bioinformatics community, featureCounts computes gene expression counts in a computationally efficient manner. All three tools in this workflow support parallelization to accelerate the analysis. The rRNA contamination rate of the sample is determined using the STAR aligner: quality-trimmed reads are down-sampled (using the __Seqtk__ tool) and aligned to the rRNA reference sequences. The alignment rate indicates the percentage of reads in the sample that are derived from rRNA sequences.

reads

label:: Reads (FASTQ)
type:: data:reads:fastq
description:: Reads in FASTQ file, single or paired end.
required:: True
disabled:: False
hidden:: False

genome

label:: Indexed reference genome
type:: data:index:star
description:: Genome index prepared by STAR aligner indexing tool.
required:: True
disabled:: False
hidden:: False

annotation

label:: Annotation
type:: data:annotation
description:: GTF and GFF3 annotation formats are supported.
required:: True
disabled:: False
hidden:: False

assay_type

label:: Assay type
type:: basic:string
description:: In a strand non-specific assay, a read is considered overlapping with a feature regardless of whether it is mapped to the same or the opposite strand as the feature. In a strand-specific forward assay with single-end reads, the read has to be mapped to the same strand as the feature. For paired-end reads, the first read has to be on the same strand as the feature and the second read on the opposite strand. In a strand-specific reverse assay, these rules are reversed.
required:: True
disabled:: False
hidden:: False
default:: non_specific
choices::
Strand non-specific: non_specific
Strand-specific forward: forward
Strand-specific reverse: reverse
Detect automatically: auto
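The strand rules for the assay types can be written as a small decision function. These rules correspond to featureCounts' `-s 0`, `-s 1` and `-s 2` strandedness settings. The following sketch only restates the rules above, and the function name is hypothetical. The ‘Detect automatically’ option is assumed to be resolved to one of the three concrete types before counting, so it is rejected here.

```python
def counts_for_feature(assay: str, read_strand: str, feature_strand: str, mate: int = 1) -> bool:
    """Does a read count toward a feature?

    mate is 1 for single-end reads and first mates, 2 for second mates.
    """
    if assay == "non_specific":
        return True
    same_strand = read_strand == feature_strand
    if assay == "forward":
        return same_strand if mate == 1 else not same_strand
    if assay == "reverse":
        return not same_strand if mate == 1 else same_strand
    raise ValueError(f"unsupported assay type: {assay!r}")


print(counts_for_feature("reverse", "-", "+"))  # True, e.g. dUTP-based libraries
```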

cdna_index

label:: cDNA index file
type:: data:index:salmon
description:: Transcriptome index file created using the Salmon indexing tool. The cDNA (transcriptome) sequences used to create the index must be derived from the same species as the input sequencing reads to obtain reliable analysis results.
required:: False
disabled:: False
hidden:: assay_type != ‘auto’

rrna_reference

label:: Indexed rRNA reference sequence
type:: data:index:star
description:: Reference sequence index prepared by STAR aligner indexing tool.
required:: True
disabled:: False
hidden:: False

globin_reference

label:: Indexed Globin reference sequence
type:: data:index:star
description:: Reference sequence index prepared by STAR aligner indexing tool.
required:: True
disabled:: False
hidden:: False

preprocessing.adapters

label:: Adapters
type:: list:data:seq:nucleotide
description:: FASTA file(s) with adapters.
required:: False
disabled:: False
hidden:: False

preprocessing.custom_adapter_sequences

label:: Custom adapter sequences
type:: list:basic:string
description:: Custom adapter sequences can be specified by inputting them one by one and pressing Enter after each sequence.
required:: False
disabled:: False
hidden:: False
default:: []

preprocessing.kmer_length

label:: K-mer length [k=]
type:: basic:integer
description:: Kmer length used for finding contaminants. Contaminants shorter than kmer length will not be found. Kmer length must be at least 1.
required:: True
disabled:: False
hidden:: False
default:: 23

preprocessing.min_k

label:: Minimum k-mer length at right end of reads used for trimming [mink=]
type:: basic:integer
required:: True
disabled:: preprocessing.adapters.length === 0 && preprocessing.custom_adapter_sequences.length === 0
hidden:: False
default:: 11

preprocessing.hamming_distance

label:: Maximum Hamming distance for k-mers [hammingdistance=]
type:: basic:integer
description:: Hamming distance, i.e. the number of mismatches allowed in the k-mer.
required:: True
disabled:: False
hidden:: False
default:: 1
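
How k-mer length and Hamming distance interact can be sketched with a simplified right-end adapter trimmer. This is not BBDuk's implementation: the helpers below are hypothetical, ignore the shorter `mink` k-mers at the read end, and simply cut the read at the first k-mer that matches any adapter k-mer within the allowed number of mismatches.

```python
from typing import List, Set


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def adapter_kmers(adapters: List[str], k: int) -> Set[str]:
    """All k-mers of length k taken from the adapter sequences."""
    return {ad[i:i + k] for ad in adapters for i in range(len(ad) - k + 1)}


def trim_right_at_adapter(read: str, adapters: List[str], k: int,
                          max_hamming: int) -> str:
    """Cut the read at the first k-mer matching an adapter k-mer
    with at most max_hamming mismatches (simplified illustration)."""
    kmers = adapter_kmers(adapters, k)
    for start in range(len(read) - k + 1):
        window = read[start:start + k]
        if any(hamming(window, km) <= max_hamming for km in kmers):
            return read[:start]  # drop the match and everything to its right
    return read
```

With a small k of 5 (chosen only to keep the example short), the read `ACGTACGTAGATCGGAAG` is trimmed to `ACGTACGT` when no mismatches are allowed, but to `ACGTAC` with a Hamming distance of 1, because the read k-mer `GTAGA` is one mismatch away from the adapter k-mer `GAAGA`. Higher tolerance increases sensitivity at the cost of spurious matches, which is why longer k-mers are used in practice.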

preprocessing.maxns

label:: Max Ns after trimming [maxns=]
type:: basic:integer
description:: If non-negative, reads with more Ns than this (after trimming) will be discarded.
required:: True
disabled:: False
hidden:: False
default:: -1

preprocessing.trim_quality

label:: Average quality below which to trim region [trimq=]
type:: basic:integer
description:: The Phred algorithm is used, which is more accurate than naive trimming.
required:: True
disabled:: False
hidden:: False
default:: 10

preprocessing.min_length

label:: Minimum read length [minlength=]
type:: basic:integer
description:: Reads shorter than minimum read length after trimming are discarded.
required:: True
disabled:: False
hidden:: False
default:: 20

preprocessing.quality_encoding_offset

label:: Quality encoding offset [qin=]
type:: basic:string
description:: Quality encoding offset for input FASTQ files.
required:: True
disabled:: False
hidden:: False
default:: auto
choices::
Sanger / Illumina 1.8+: 33
Illumina up to 1.3+, 1.5+: 64
Auto: auto

preprocessing.ignore_bad_quality

label:: Ignore bad quality [ignorebadquality]
type:: basic:boolean
description:: Don’t crash if quality values appear to be incorrect.
required:: True
disabled:: False
hidden:: False
default:: False

alignment.unstranded

label:: The data is unstranded [--outSAMstrandField intronMotif]
type:: basic:boolean
description:: For unstranded RNA-seq data, Cufflinks/Cuffdiff require spliced alignments with the XS strand attribute, which STAR generates with the --outSAMstrandField intronMotif option. The XS strand attribute is generated for all alignments that contain splice junctions. Spliced alignments that have an undefined strand (i.e. containing only non-canonical unannotated junctions) are suppressed. If you have stranded RNA-seq data, you do not need to use any specific STAR options. Instead, run Cufflinks with the --library-type option. For example, cufflinks --library-type fr-firststrand should be used for the standard dUTP protocol, including Illumina’s stranded Tru-Seq. The --library-type option applies only to Cufflinks runs, not to STAR runs.
required:: True
disabled:: False
hidden:: False
default:: False

alignment.noncannonical

label:: Remove non-canonical junctions (Cufflinks compatibility)
type:: basic:boolean
description:: For Cufflinks runs, it is recommended to remove non-canonical junctions using --outFilterIntronMotifs RemoveNoncanonical.
required:: True
disabled:: False
hidden:: False
default:: False

alignment.chimeric_reads.chimeric

label:: Detect chimeric and circular alignments [--chimOutType SeparateSAMold]
type:: basic:boolean
description:: To switch on detection of chimeric (fusion) alignments (in addition to normal mapping), --chimSegmentMin should be set to a positive value. Each chimeric alignment consists of two segments. Each segment is non-chimeric on its own, but the segments are chimeric to each other (i.e. the segments belong to different chromosomes, or different strands, or are far from each other). Both segments may contain splice junctions, and one of the segments may contain portions of both mates. The --chimSegmentMin parameter controls the minimum mapped length of the two segments that is allowed. For example, if you have 2x75 reads and used --chimSegmentMin 20, a chimeric alignment with 130b on one chromosome and 20b on the other will be output, while 135 + 15 won’t be.
required:: True
disabled:: False
hidden:: False
default:: False

alignment.chimeric_reads.chim_segment_min

label:: Minimum length of chimeric segment [--chimSegmentMin]
type:: basic:integer
required:: True
disabled:: !alignment.chimeric_reads.chimeric
hidden:: False
default:: 20
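
The 2x75 example in the chimeric detection description reduces to a simple check: both segments must reach the minimum segment length, and a value of 0 leaves chimeric detection off. The helper name below is hypothetical; it restates the documented rule and is not STAR code.

```python
from typing import Tuple


def chimeric_pair_reported(segment_lengths: Tuple[int, int],
                           chim_segment_min: int) -> bool:
    """True if both chimeric segments are at least chim_segment_min long.

    A non-positive chim_segment_min means chimeric detection is switched off.
    """
    return chim_segment_min > 0 and min(segment_lengths) >= chim_segment_min
```

With `chim_segment_min=20`, a 130 + 20 split of a 150-base read pair is reported and a 135 + 15 split is not.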

alignment.transcript_output.quant_mode

label:: Output in transcript coordinates [--quantMode]
type:: basic:boolean
description:: With the --quantMode TranscriptomeSAM option, STAR will output alignments translated into transcript coordinates in the Aligned.toTranscriptome.out.bam file (in addition to alignments in genomic coordinates in Aligned.*.sam/bam files). These transcriptomic alignments can be used with various transcript quantification software that require reads to be mapped to transcriptome, such as RSEM or eXpress.
required:: True
disabled:: False
hidden:: False
default:: False

alignment.transcript_output.single_end

label:: Allow soft-clipping and indels [--quantTranscriptomeBan Singleend]
type:: basic:boolean
description:: By default, the output satisfies RSEM requirements: soft-clipping and indels are not allowed. Use --quantTranscriptomeBan Singleend to allow insertions, deletions and soft-clips in the transcriptomic alignments, which can be used by some expression quantification software (e.g. eXpress).
required:: True
disabled:: !t_coordinates.quant_mode
hidden:: False
default:: False

alignment.filtering_options.out_filter_type

label:: Type of filtering [--outFilterType]
type:: basic:string
description:: Normal: standard filtering using only current alignment; BySJout: keep only those reads that contain junctions that passed filtering into SJ.out.tab.
required:: True
disabled:: False
hidden:: False
default:: Normal
choices::
Normal: Normal
BySJout: BySJout

alignment.filtering_options.out_multimap_max

label:: Maximum number of loci [--outFilterMultimapNmax]
type:: basic:integer
description:: Maximum number of loci the read is allowed to map to. Alignments (all of them) will be output only if the read maps to no more loci than this value. Otherwise no alignments will be output, and the read will be counted as ‘mapped to too many loci’ (default: 10).
required:: False
disabled:: False
hidden:: False

alignment.filtering_options.out_mismatch_max

label:: Maximum number of mismatches [--outFilterMismatchNmax]
type:: basic:integer
description:: Alignment will be output only if it has no more mismatches than this value (default: 10). A large number (e.g. 999) switches off this filter.
required:: False
disabled:: False
hidden:: False

alignment.filtering_options.out_mismatch_nl_max

label:: Maximum no. of mismatches (map length) [--outFilterMismatchNoverLmax]
type:: basic:decimal
description:: Alignment will be output only if its ratio of mismatches to *mapped* length is less than or equal to this value (default: 0.3). The value should be between 0.0 and 1.0.
required:: False
disabled:: False
hidden:: False

alignment.filtering_options.out_score_min

label:: Minimum alignment score [--outFilterScoreMin]
type:: basic:integer
description:: Alignment will be output only if its score is higher than or equal to this value (default: 0).
required:: False
disabled:: False
hidden:: False

alignment.filtering_options.out_mismatch_nrl_max

label:: Maximum no. of mismatches (read length) [--outFilterMismatchNoverReadLmax]
type:: basic:decimal
description:: Alignment will be output only if its ratio of mismatches to *read* length is less than or equal to this value (default: 1.0). For example, with a value of 0.04 and 2x100bp reads, the maximum number of mismatches for the read pair is 0.04*200 = 8. The value should be between 0.0 and 1.0.
required:: False
disabled:: False
hidden:: False
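
The two ratio-based mismatch filters can be sketched as one predicate that follows the "less than or equal to" wording above. The function is hypothetical and only restates the documented rules; STAR applies these filters internally together with its other filters.

```python
def passes_mismatch_ratio_filters(n_mismatches: int, mapped_length: int,
                                  read_length: int,
                                  max_over_mapped: float = 0.3,
                                  max_over_read: float = 1.0) -> bool:
    """Apply the mismatch-to-mapped-length and mismatch-to-read-length
    ratio filters (defaults 0.3 and 1.0, as documented above)."""
    return (n_mismatches / mapped_length <= max_over_mapped
            and n_mismatches / read_length <= max_over_read)
```

For a 2x100bp read pair (read length 200) mapped over 190 bases with `max_over_read=0.04`, 8 mismatches pass (8/200 = 0.04) while 9 mismatches are filtered out.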

alignment.alignment_options.align_overhang_min

label:: Minimum overhang [--alignSJoverhangMin]
type:: basic:integer
description:: Minimum overhang (i.e. block size) for spliced alignments (default: 5).
required:: False
disabled:: False
hidden:: False

alignment.alignment_options.align_sjdb_overhang_min

label:: Minimum overhang (sjdb) [--alignSJDBoverhangMin]
type:: basic:integer
description:: Minimum overhang (i.e. block size) for annotated (sjdb) spliced alignments (default: 3).
required:: False
disabled:: False
hidden:: False

alignment.alignment_options.align_intron_size_min

label:: Minimum intron size [--alignIntronMin]
type:: basic:integer
description:: Minimum intron size: a genomic gap is considered an intron if its length >= alignIntronMin; otherwise it is considered a deletion (default: 21).
required:: False
disabled:: False
hidden:: False

alignment.alignment_options.align_intron_size_max

label:: Maximum intron size [--alignIntronMax]
type:: basic:integer
description:: Maximum intron size. If 0, the maximum intron size is determined by 2^winBinNbits * winAnchorDistNbins (default: 0).
required:: False
disabled:: False
hidden:: False
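
The intron-size rules can be illustrated with two small hypothetical helpers. The first restates the intron-versus-deletion rule from the minimum intron size description. The second evaluates 2^winBinNbits * winAnchorDistNbins, assuming STAR's default values of winBinNbits = 16 and winAnchorDistNbins = 9; check these defaults against the STAR version in use.

```python
def classify_gap(gap_length: int, align_intron_min: int = 21) -> str:
    """Label a genomic gap as an intron or a deletion (alignIntronMin rule)."""
    return "intron" if gap_length >= align_intron_min else "deletion"


def implied_max_intron(win_bin_nbits: int = 16,
                       win_anchor_dist_nbins: int = 9) -> int:
    """Maximum intron size used when alignIntronMax is 0.

    The default arguments are assumed STAR defaults, not values set by this process.
    """
    return (2 ** win_bin_nbits) * win_anchor_dist_nbins
```

Under those assumed defaults, leaving the maximum intron size at 0 allows introns up to 65536 * 9 = 589824 bases.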

alignment.alignment_options.align_gap_max

label:: Maximum gap between mates [--alignMatesGapMax]
type:: basic:integer
description:: Maximum gap between two mates. If 0, the maximum gap is determined by 2^winBinNbits * winAnchorDistNbins (default: 0).
required:: False
disabled:: False
hidden:: False

alignment.alignment_options.align_end_alignment

label:: Read ends alignment [--alignEndsType]
type:: basic:string
description:: Type of read ends alignment (default: Local). Local: standard local alignment with soft-clipping allowed. EndToEnd: force end-to-end read alignment, do not soft-clip. Extend5pOfRead1: fully extend only the 5’ end of read 1; all other ends use local alignment. Extend5pOfReads12: fully extend only the 5’ ends of both read 1 and read 2; all other ends use local alignment.
required:: True
disabled:: False
hidden:: False
default:: Local
choices::
Local: Local
EndToEnd: EndToEnd
Extend5pOfRead1: Extend5pOfRead1
Extend5pOfReads12: Extend5pOfReads12

alignment.output_options.out_unmapped

label:: Output unmapped reads (SAM) [--outSAMunmapped Within]
type:: basic:boolean
description:: Output of unmapped reads in the SAM format.
required:: True
disabled:: False
hidden:: False
default:: False

alignment.output_options.out_sam_attributes

label:: Desired SAM attributes [--outSAMattributes]
type:: basic:string
description:: A string of desired SAM attributes, in the order desired for the output SAM.
required:: True
disabled:: False
hidden:: False
default:: Standard
choices::
Standard: Standard
All: All
NH HI NM MD: NH HI NM MD
None: None

alignment.output_options.out_rg_line

label:: SAM/BAM read group line [--outSAMattrRGline]
type:: basic:string
description:: The first word contains the read group identifier and must start with ID:, e.g. --outSAMattrRGline ID:xxx CN:yy "DS:z z z". Here xxx will be added as the RG tag to each output alignment. Any spaces in the tag values have to be double quoted. Comma-separated RG lines correspond to different (comma-separated) input files in --readFilesIn. Commas have to be surrounded by spaces, e.g. --outSAMattrRGline ID:xxx , ID:zzz "DS:z z" , ID:yyy DS:yyyy.
required:: False
disabled:: False
hidden:: False
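
The quoting and separator rules for the read group line can be sketched as a small string builder. The helper `build_rg_line` is hypothetical and not part of STAR or this process; it takes one dictionary of RG tags per input file and reproduces the examples above.

```python
from typing import Dict, List


def build_rg_line(read_groups: List[Dict[str, str]]) -> str:
    """Build a value for --outSAMattrRGline, one dict of RG tags per input file."""
    lines = []
    for rg in read_groups:
        if "ID" not in rg:
            raise ValueError("each read group needs an ID")
        fields = [f"ID:{rg['ID']}"]  # the ID field must come first
        for tag, value in rg.items():
            if tag == "ID":
                continue
            field = f"{tag}:{value}"
            # Tag values containing spaces must be double quoted.
            fields.append(f'"{field}"' if " " in value else field)
        lines.append(" ".join(fields))
    # Lines for different input files are separated by a comma surrounded by spaces.
    return " , ".join(lines)
```

For example, `build_rg_line([{"ID": "xxx", "CN": "yy", "DS": "z z z"}])` returns `ID:xxx CN:yy "DS:z z z"`.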

quantification.n_reads

label:: Number of reads in subsampled alignment file
type:: basic:integer
description:: Alignment (.bam) file subsample size. Increase the number of reads to make automatic detection more reliable. Decrease the number of reads to make automatic detection run faster.
required:: True
disabled:: False
hidden:: assay_type != 'auto'
default:: 5000000

quantification.feature_class

label:: Feature class [-t]
type:: basic:string
description:: Feature class (3rd column in GTF/GFF3 file) to be used. All other features will be ignored.
required:: True
disabled:: False
hidden:: False
default:: exon

quantification.feature_type

label:: Feature type
type:: basic:string
description:: The type of feature the quantification program summarizes over (e.g. gene or transcript-level analysis). The value of this parameter needs to be chosen in line with ‘ID attribute’ below.
required:: True
disabled:: False
hidden:: False
default:: gene
choices::
gene: gene
transcript: transcript

quantification.id_attribute

label:: ID attribute [-g]
type:: basic:string
description:: GTF/GFF3 attribute to be used as feature ID. Several GTF/GFF3 lines with the same feature ID will be considered as parts of the same feature. The feature ID is used to identify the counts in the output table. In GTF files this is usually ‘gene_id’, in GFF3 files this is often ‘ID’, and ‘transcript_id’ is frequently a valid choice for both annotation formats.
required:: True
disabled:: False
hidden:: False
default:: gene_id
choices::
gene_id: gene_id
transcript_id: transcript_id
ID: ID
geneid: geneid
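
How the feature class and ID attribute select and group annotation lines can be sketched for GTF input. This is a simplified illustration, not the quantification tool's parser: it handles only GTF-style `key "value";` attributes (not GFF3), and the function and sample records are hypothetical.

```python
import re
from collections import Counter
from typing import Iterable, Iterator, Tuple

# Matches key "value" pairs in the GTF attribute column (9th column).
ATTR_RE = re.compile(r'(\S+) "([^"]*)"')


def features_by_id(gtf_lines: Iterable[str], feature_class: str = "exon",
                   id_attribute: str = "gene_id") -> Iterator[Tuple[str, int, int]]:
    """Yield (feature_id, start, end) for GTF lines of one feature class."""
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments
        cols = line.rstrip("\n").split("\t")
        if cols[2] != feature_class:  # 3rd column, as selected by -t
            continue
        attrs = dict(ATTR_RE.findall(cols[8]))
        yield attrs[id_attribute], int(cols[3]), int(cols[4])


gtf = [
    'chr1\tsrc\texon\t100\t200\t.\t+\t.\tgene_id "g1"; transcript_id "t1";',
    'chr1\tsrc\texon\t300\t400\t.\t+\t.\tgene_id "g1"; transcript_id "t2";',
    'chr1\tsrc\tCDS\t120\t180\t.\t+\t0\tgene_id "g1"; transcript_id "t1";',
    'chr2\tsrc\texon\t50\t90\t.\t-\t.\tgene_id "g2"; transcript_id "t3";',
]
# Exon lines sharing a feature ID are parts of the same feature.
per_gene = Counter(fid for fid, _, _ in features_by_id(gtf))
per_transcript = Counter(
    fid for fid, _, _ in features_by_id(gtf, id_attribute="transcript_id"))
```

With `gene_id`, the two exons of g1 are merged into one feature while the CDS line is ignored; switching to `transcript_id` yields one feature per transcript.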

quantification.by_read_group

label:: Assign reads by read group
type:: basic:boolean
description:: RG tag is required to be present in the input BAM files.
required:: True
disabled:: False
hidden:: False
default:: True

downsampling.n_reads

label:: Number of reads
type:: basic:integer
description:: Number of reads to include in subsampling.
required:: True
disabled:: False
hidden:: False
default:: 1000000

downsampling.advanced.seed

label:: Seed [-s]
type:: basic:integer
description:: Using the same random seed makes reads subsampling more reproducible in different environments.
required:: True
disabled:: False
hidden:: False
default:: 11

downsampling.advanced.fraction

label:: Fraction of reads used
type:: basic:decimal
description:: Use the fraction of reads [0.0 - 1.0] from the original input file instead of the absolute number of reads. If set, this will override the ‘Number of reads’ input parameter.
required:: False
disabled:: False
hidden:: False

downsampling.advanced.two_pass

label:: 2-pass mode [-2]
type:: basic:boolean
description:: Enable two-pass mode when down-sampling. Two-pass mode is twice as slow but uses much less memory.
required:: True
disabled:: False
hidden:: False
default:: False
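
Why a fixed seed makes subsampling reproducible can be shown with two generic sampling schemes: reservoir sampling for an absolute number of reads and per-read Bernoulli sampling for a fraction. This is a sketch of the general techniques, not seqtk's code, and the function names are hypothetical.

```python
import random
from typing import Iterable, List, TypeVar

T = TypeVar("T")


def reservoir_sample(records: Iterable[T], n: int, seed: int = 11) -> List[T]:
    """Uniformly sample n records in a single pass (reservoir sampling)."""
    rng = random.Random(seed)  # seeded generator: same input -> same sample
    sample: List[T] = []
    for i, rec in enumerate(records):
        if i < n:
            sample.append(rec)
        else:
            j = rng.randint(0, i)  # inclusive bounds
            if j < n:
                sample[j] = rec
    return sample


def fraction_sample(records: Iterable[T], fraction: float, seed: int = 11) -> List[T]:
    """Keep each record independently with probability `fraction`."""
    rng = random.Random(seed)
    return [rec for rec in records if rng.random() < fraction]
```

Running either function twice with the same seed on the same input returns the same reads. The reservoir holds `n` records in memory, which is the kind of cost a two-pass approach is designed to reduce.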

BBDuk - Salmon - QC

data:workflow:rnaseq:salmon:workflow-bbduk-salmon-qc (data:reads:fastq reads, data:index:salmon salmon_index, data:index:star genome, data:annotation annotation, data:index:star rrna_reference, data:index:star globin_reference, list:data:seq:nucleotide adapters, list:basic:string custom_adapter_sequences, basic:integer kmer_length, basic:integer min_k, basic:integer hamming_distance, basic:integer maxns, basic:integer trim_quality, basic:integer min_length, basic:string quality_encoding_offset, basic:boolean ignore_bad_quality, basic:boolean seq_bias, basic:boolean gc_bias, basic:decimal consensus_slack, basic:decimal min_score_fraction, basic:integer range_factorization_bins, basic:integer min_assigned_frag, basic:integer num_bootstraps, basic:integer num_gibbs_samples, basic:integer n_reads, basic:integer seed, basic:decimal fraction, basic:boolean two_pass)[Source: v4.3.1]

Alignment-free RNA-Seq pipeline. The Salmon tool and the tximport package are used in the quantification step to produce gene-level abundance estimates. The rRNA and globin-sequence contamination rate in the sample is determined using the STAR aligner. Quality-trimmed reads are down-sampled (using the Seqtk tool) and aligned to the genome, rRNA and globin reference sequences. The rRNA and globin-sequence alignment rates indicate the percentage of the reads in the sample that are of rRNA and globin origin, respectively. Alignment of down-sampled data to a whole genome reference sequence is used to produce an alignment file suitable for Samtools and QoRTs QC analysis. Per-sample analysis results and QC data are summarized by the MultiQC tool.

reads

label:: Select sample(s) (FASTQ)
type:: data:reads:fastq
description:: Reads in FASTQ file, single or paired end.
required:: True
disabled:: False
hidden:: False

salmon_index

label:: Salmon index
type:: data:index:salmon
description:: Transcriptome index file created using the Salmon indexing tool.
required:: True
disabled:: False
hidden:: False

genome

label:: Indexed reference genome
type:: data:index:star
description:: Genome index prepared by the STAR aligner indexing tool.
required:: True
disabled:: False
hidden:: False

annotation

label:: Annotation
type:: data:annotation
description:: GTF and GFF3 annotation formats are supported.
required:: True
disabled:: False
hidden:: False

rrna_reference

label:: Indexed rRNA reference sequence
type:: data:index:star
description:: Reference sequence index prepared by the STAR aligner indexing tool.
required:: True
disabled:: False
hidden:: False

globin_reference

label:: Indexed Globin reference sequence
type:: data:index:star
description:: Reference sequence index prepared by the STAR aligner indexing tool.
required:: True
disabled:: False
hidden:: False

preprocessing.adapters

label:: Adapters
type:: list:data:seq:nucleotide
description:: FASTA file(s) with adapters.
required:: False
disabled:: False
hidden:: False

preprocessing.custom_adapter_sequences

label:: Custom adapter sequences
type:: list:basic:string
description:: Custom adapter sequences can be specified by inputting them one by one and pressing Enter after each sequence.
required:: False
disabled:: False
hidden:: False
default:: []

preprocessing.kmer_length

label:: K-mer length
type:: basic:integer
description:: K-mer length must be smaller than or equal to the length of the adapters.
required:: True
disabled:: False
hidden:: False
default:: 23

preprocessing.min_k

label:: Minimum k-mer length at right end of reads used for trimming
type:: basic:integer
required:: True
disabled:: preprocessing.adapters.length === 0 && preprocessing.custom_adapter_sequences.length === 0
hidden:: False
default:: 11

preprocessing.hamming_distance

label:: Maximum Hamming distance for k-mers
type:: basic:integer
required:: True
disabled:: False
hidden:: False
default:: 1

preprocessing.maxns

label:: Max Ns after trimming
type:: basic:integer
description:: If non-negative, reads with more Ns than this (after trimming) will be discarded.
required:: True
disabled:: False
hidden:: False
default:: -1

preprocessing.trim_quality

label:: Quality below which to trim reads from the right end
type:: basic:integer
description:: The Phred algorithm is used, which is more accurate than naive trimming.
required:: True
disabled:: False
hidden:: False
default:: 10

preprocessing.min_length

label:: Minimum read length
type:: basic:integer
description:: Reads shorter than minimum read length after trimming are discarded.
required:: True
disabled:: False
hidden:: False
default:: 20

preprocessing.quality_encoding_offset

label:: Quality encoding offset
type:: basic:string
description:: Quality encoding offset for input FASTQ files.
required:: True
disabled:: False
hidden:: False
default:: auto
choices::
Sanger / Illumina 1.8+: 33
Illumina up to 1.3+, 1.5+: 64
Auto: auto

preprocessing.ignore_bad_quality

label:: Ignore bad quality
type:: basic:boolean
description:: Don’t crash if quality values appear to be incorrect.
required:: True
disabled:: False
hidden:: False
default:: False

quantification.seq_bias

label:: Perform sequence-specific bias correction
type:: basic:boolean
description:: Perform sequence-specific bias correction.
required:: True
disabled:: False
hidden:: False
default:: True

quantification.gc_bias

label:: Perform fragment GC bias correction
type:: basic:boolean
description:: Perform fragment GC bias correction. If single-end reads are selected as input in this workflow, it is recommended that you set this option to False. If you selected paired-end reads as input in this workflow, it is recommended that you set this option to True.
required:: False
disabled:: False
hidden:: False

quantification.consensus_slack

label:: Consensus slack
type:: basic:decimal
description:: The amount of slack allowed in the quasi-mapping consensus mechanism. Normally, a transcript must cover all hits to be considered for mapping. If this is set to a fraction, X, greater than 0 (and in [0,1)), then a transcript can fail to cover up to (100 * X)% of the hits before it is discounted as a mapping candidate. The default value of this option is 0.2 in selective alignment mode and 0 otherwise.
required:: False
disabled:: False
hidden:: False

quantification.min_score_fraction

label:: Minimum alignment score fraction
type:: basic:decimal
description:: The fraction of the optimal possible alignment score that a mapping must achieve in order to be considered valid - should be in (0,1].
required:: True
disabled:: False
hidden:: False
default:: 0.65

quantification.range_factorization_bins

label:: Range factorization bins
type:: basic:integer
description:: Factorizes the likelihood used in quantification by adopting a new notion of equivalence classes based on the conditional probabilities with which fragments are generated from different transcripts. This is a more fine-grained factorization than the normal rich equivalence classes. The default value (4) corresponds to the default used in Zakeri et al. 2017 and larger values imply a more fine-grained factorization. If range factorization is enabled, a common value to select for this parameter is 4. A value of 0 signifies the use of basic rich equivalence classes.
required:: True
disabled:: False
hidden:: False
default:: 4

quantification.min_assigned_frag

label:: Minimum number of assigned fragments
type:: basic:integer
description:: The minimum number of fragments that must be assigned to the transcriptome for quantification to proceed.
required:: True
disabled:: False
hidden:: False
default:: 10

quantification.num_bootstraps

label:: --numBootstraps
type:: basic:integer
description:: Salmon has the ability to optionally compute bootstrapped abundance estimates. This is done by resampling (with replacement) from the counts assigned to the fragment equivalence classes, and then re-running the optimization procedure, either the EM or VBEM, for each such sample. The values of these different bootstraps allow us to assess technical variance in the main abundance estimates we produce. Such estimates can be useful for downstream (e.g. differential expression) tools that can make use of such uncertainty estimates. This option takes a positive integer that dictates the number of bootstrap samples to compute. The more samples computed, the better the estimates of variance, but the more computation (and time) required.
required:: False
disabled:: quantification.num_gibbs_samples
hidden:: False

quantification.num_gibbs_samples

label:: --numGibbsSamples
type:: basic:integer
description:: Just as with the bootstrap procedure above, this option produces samples that allow us to estimate the variance in abundance estimates. However, in this case the samples are generated using posterior Gibbs sampling over the fragment equivalence classes rather than bootstrapping. We are currently analyzing these different approaches to assess the potential trade-offs in time / accuracy. The --numBootstraps and --numGibbsSamples options are mutually exclusive (i.e. in a given run, you must set at most one of these options to a positive integer.)
required:: False
disabled:: quantification.num_bootstraps
hidden:: False
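
The mutual exclusion between the two uncertainty options can be enforced when assembling Salmon arguments. The helper below is hypothetical; only the `--numBootstraps` and `--numGibbsSamples` flag names come from the descriptions above.

```python
from typing import List, Optional


def salmon_uncertainty_args(num_bootstraps: Optional[int] = None,
                            num_gibbs_samples: Optional[int] = None) -> List[str]:
    """Return Salmon arguments, allowing at most one sampling option."""
    if num_bootstraps and num_gibbs_samples:
        raise ValueError("--numBootstraps and --numGibbsSamples are mutually exclusive")
    if num_bootstraps:
        return ["--numBootstraps", str(num_bootstraps)]
    if num_gibbs_samples:
        return ["--numGibbsSamples", str(num_gibbs_samples)]
    return []  # neither option set: no uncertainty samples are requested
```

This mirrors the `disabled::` expressions above, where setting one field disables the other.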

downsampling.n_reads

label:: Number of reads
type:: basic:integer
description:: Number of reads to include in subsampling.
required:: True
disabled:: False
hidden:: False
default:: 10000000

downsampling.seed

label:: Seed
type:: basic:integer
description:: Using the same random seed makes reads subsampling reproducible in different environments.
required:: True
disabled:: False
hidden:: False
default:: 11

downsampling.fraction

label:: Fraction of reads
type:: basic:decimal
description:: Use the fraction of reads [0.0 - 1.0] from the original input file instead of the absolute number of reads. If set, this will override the ‘Number of reads’ input parameter.
required:: False
disabled:: False
hidden:: False

downsampling.two_pass

label:: 2-pass mode
type:: basic:boolean
description:: Enable two-pass mode when down-sampling. Two-pass mode is twice as slow but with much reduced memory usage.
required:: True
disabled:: False
hidden:: False
default:: False

BED file

data:bedupload-bed (basic:file src, basic:string species, basic:string build)[Source: v1.5.0]

Import a BED file, a tab-delimited text file that defines a feature track. The file name must end in .bed or .narrowPeak (see validate_regex below). The BED file format is described on the [UCSC Genome Bioinformatics web site](http://genome.ucsc.edu/FAQ/FAQformat#format1).

src

label:: BED file
type:: basic:file
description:: Upload BED file annotation track. The first three required BED fields are chrom, chromStart and chromEnd.
required:: True
validate_regex:: \.(bed|narrowPeak)$
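
The file name check and the three required BED fields can be sketched as follows. The sketch assumes Python `re` semantics for `validate_regex` and treats chromStart as 0-based and chromEnd as exclusive, as in the UCSC format description. `parse_bed_interval` is a hypothetical helper, not part of this process.

```python
import re
from typing import Tuple

# Same pattern as validate_regex above.
BED_NAME_RE = re.compile(r"\.(bed|narrowPeak)$")


def parse_bed_interval(line: str) -> Tuple[str, int, int]:
    """Return (chrom, chromStart, chromEnd) from the first three BED fields."""
    chrom, start, end = line.rstrip("\n").split("\t")[:3]
    start_i, end_i = int(start), int(end)
    if start_i < 0 or end_i < start_i:
        raise ValueError(f"invalid interval: {line!r}")
    return chrom, start_i, end_i
```

A file named `peaks.narrowPeak` passes the check, while `peaks.bed.gz` does not. The line `chr1	100	200	peak1` parses to `("chr1", 100, 200)`, an interval of 100 bases.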

species

label:: Species
type:: basic:string
description:: Latin name of the species.
choices::
Homo sapiens: Homo sapiens
Mus musculus: Mus musculus
Rattus norvegicus: Rattus norvegicus
Dictyostelium discoideum: Dictyostelium discoideum
Odocoileus virginianus texanus: Odocoileus virginianus texanus
Solanum tuberosum: Solanum tuberosum

build

label:: Genome build
type:: basic:string

bed

label:: BED file
type:: basic:file

bed_jbrowse

label:: Bgzipped BED file for JBrowse
type:: basic:file

tbi_jbrowse

label:: BED file index for JBrowse
type:: basic:file

species

label:: Species
type:: basic:string

build

label:: Build
type:: basic:string

BEDPE file

data:bedpe:upload-bedpe (basic:file src, basic:string species, basic:string build)[Source: v1.3.1]

Upload BEDPE files.

src

label:: Select BEDPE file to upload
type:: basic:file
required:: True
disabled:: False
hidden:: False

species

label:: Species
type:: basic:string
required:: True
disabled:: False
hidden:: False
choices::
Homo sapiens: Homo sapiens
Mus musculus: Mus musculus
Rattus norvegicus: Rattus norvegicus
Dictyostelium discoideum: Dictyostelium discoideum
Odocoileus virginianus texanus: Odocoileus virginianus texanus
Solanum tuberosum: Solanum tuberosum

build

label:: Build
type:: basic:string
required:: True
disabled:: False
hidden:: False

bedpe

label:: BEDPE file
type:: basic:file
required:: True
disabled:: False
hidden:: False

species

label:: Species
type:: basic:string
required:: True
disabled:: False
hidden:: False

build

label:: Build
type:: basic:string
required:: True
disabled:: False
hidden:: False

BWA ALN

data:alignment:bam:bwaalnalignment-bwa-aln (data:index:bwa genome, data:reads:fastq reads, basic:integer q, basic:boolean use_edit, basic:integer edit_value, basic:decimal fraction, basic:boolean seeds, basic:integer seed_length, basic:integer seed_dist)[Source: v2.6.2]

Read aligner for mapping low-divergent sequences against a large reference genome. Designed for Illumina sequence reads up to 100bp.

genome

label:: Reference genome
type:: data:index:bwa

reads

label:: Reads
type:: data:reads:fastq

q

label:: Quality threshold
type:: basic:integer
description:: Parameter for dynamic read trimming.
default:: 0
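
The quality threshold drives BWA's dynamic 3' trimming. The BWA manual describes this as keeping the prefix that maximizes the running sum of (threshold minus base quality) taken from the read end. The sketch below implements that running-sum rule in the form popularized by cutadapt. It is an illustration under that assumption, not BWA's source, and `bwa_style_trim_length` is a hypothetical name.

```python
from typing import Sequence


def bwa_style_trim_length(quals: Sequence[int], threshold: int) -> int:
    """Return how many bases to keep after running-sum 3' quality trimming."""
    running = 0
    best = 0
    keep = len(quals)
    for i in reversed(range(len(quals))):
        running += threshold - quals[i]
        if running < 0:
            break  # the remaining 5' part is of good enough quality
        if running > best:
            best = running
            keep = i  # trimming from position i maximizes the sum so far
    return keep
```

With qualities `[40, 40, 40, 10, 5]` and a threshold of 20, the last two bases are trimmed and 3 are kept. With the default threshold of 0, nothing is trimmed.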

use_edit

label:: Use maximum edit distance (excludes fraction of missing alignments)
type:: basic:boolean
default:: False

edit_value

label:: Maximum edit distance
type:: basic:integer
hidden:: !use_edit
default:: 5

fraction

label:: Fraction of missing alignments
type:: basic:decimal
description:: The fraction of missing alignments given a 2% uniform base error rate. The maximum edit distance is automatically chosen for different read lengths.
hidden:: use_edit
default:: 0.04
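
One way to turn the fraction of missing alignments into a per-length maximum edit distance is to model the number of sequencing errors in a read as Poisson-distributed with mean read_length * 0.02. The sketch then picks the smallest k whose tail probability falls below the fraction. The Poisson model is an assumption meant to mirror BWA's approach; the code is not BWA's source.

```python
import math


def max_edit_distance(read_length: int, fraction: float = 0.04,
                      base_error_rate: float = 0.02) -> int:
    """Smallest k with P(errors > k) < fraction, errors ~ Poisson(length * rate)."""
    lam = read_length * base_error_rate
    term = math.exp(-lam)  # P(X = 0)
    cumulative = term
    for k in range(1, 1000):
        term *= lam / k  # P(X = k) computed from P(X = k - 1)
        cumulative += term
        if 1.0 - cumulative < fraction:
            return k
    return 2  # fallback if the loop never converges
```

Under this model, 36bp reads get a maximum edit distance of 2 and 100bp reads get 5 with the default fraction of 0.04.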

seeds

label:: Use seeds
type:: basic:boolean
default:: False

seed_length

label:: Seed length
type:: basic:integer
description:: Take the first X bases of the read as the seed. If X is larger than the query sequence, seeding will be disabled. For long reads, this option typically ranges from 25 to 35 when the seed maximum edit distance is 2.
hidden:: !seeds
default:: 35

seed_dist

label:: Seed maximum edit distance
type:: basic:integer
hidden:: !seeds
default:: 2

bam

label:: Alignment file
type:: basic:file
description:: Position-sorted alignment

bai

label:: Index BAI
type:: basic:file

unmapped

label:: Unmapped reads
type:: basic:file
required:: False

stats

label:: Statistics
type:: basic:file

species

label:: Species
type:: basic:string

build

label:: Build
type:: basic:string

BWA MEM

data:alignment:bam:bwamemalignment-bwa-mem (data:index:bwa genome, data:reads:fastq reads, basic:integer seed_l, basic:integer band_w, basic:decimal re_seeding, basic:boolean m, basic:integer match, basic:integer missmatch, basic:integer gap_o, basic:integer gap_e, basic:integer clipping, basic:integer unpaired_p, basic:boolean report_all, basic:integer report_tr)[Source: v3.6.0]

BWA MEM is a read aligner for mapping low-divergent sequences against a large reference genome. It is designed for longer sequences ranging from 70bp to 1Mbp. The algorithm works by seeding alignments with maximal exact matches (MEMs) and then extending seeds with the affine-gap Smith-Waterman algorithm (SW). See [here](http://bio-bwa.sourceforge.net/) for more information.

genome

label:: Reference genome
type:: data:index:bwa

reads

label:: Reads
type:: data:reads:fastq

seed_l

label:: Minimum seed length
type:: basic:integer
description:: Minimum seed length. Matches shorter than minimum seed length will be missed. The alignment speed is usually insensitive to this value unless it significantly deviates from 20.
default:: 19

band_w

label:: Band width
type:: basic:integer
description:: Gaps longer than this will not be found.
default:: 100

re_seeding

label:: Re-seeding factor
type:: basic:decimal
description:: Trigger re-seeding for a MEM longer than minSeedLen*FACTOR. This is a key heuristic parameter for tuning the performance. Larger value yields fewer seeds, which leads to faster alignment speed but lower accuracy.
default:: 1.5

m

label:: Mark shorter split hits as secondary
type:: basic:boolean
description:: Mark shorter split hits as secondary (for Picard compatibility)
default:: False

scoring.match

label:: Score of a match
type:: basic:integer
default:: 1

scoring.missmatch

label:: Mismatch penalty
type:: basic:integer
default:: 4

scoring.gap_o

label:: Gap open penalty
type:: basic:integer
default:: 6

scoring.gap_e

label:: Gap extension penalty
type:: basic:integer
default:: 1

scoring.clipping

label:: Clipping penalty
type:: basic:integer
description:: Clipping is applied if final alignment score is smaller than (best score reaching the end of query) - (Clipping penalty)
default:: 5

scoring.unpaired_p

label:: Penalty for an unpaired read pair
type:: basic:integer
description:: Affinity to force pair. Score: scoreRead1+scoreRead2-Penalty
default:: 9
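
The scoring parameters combine as in an affine-gap scheme: each match adds the match score, each mismatch subtracts the mismatch penalty, and a gap of length k costs the gap open penalty plus k times the gap extension penalty. An unpaired read pair is scored as scoreRead1 + scoreRead2 - Penalty. The helpers below are hypothetical. They ignore clipping and the insert-size penalty for properly paired reads and only illustrate the arithmetic with the defaults listed above.

```python
from typing import Sequence


def alignment_score(matches: int, mismatches: int, gap_lengths: Sequence[int],
                    match: int = 1, mismatch: int = 4,
                    gap_open: int = 6, gap_ext: int = 1) -> int:
    """Affine-gap score: a gap of length k costs gap_open + gap_ext * k."""
    score = matches * match - mismatches * mismatch
    for k in gap_lengths:
        score -= gap_open + gap_ext * k
    return score


def unpaired_pair_score(score_read1: int, score_read2: int,
                        unpaired_penalty: int = 9) -> int:
    """Score of a read pair treated as unpaired."""
    return score_read1 + score_read2 - unpaired_penalty
```

With the defaults, an alignment with 100 matches, 2 mismatches and one 3-base gap scores 100 - 8 - (6 + 3) = 83. Treating it as unpaired alongside a mate scoring 90 gives 83 + 90 - 9 = 164.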

reporting.report_all

label:: Report all found alignments
type:: basic:boolean
description:: Output all found alignments for single-end or unpaired paired-end reads. These alignments will be flagged as secondary alignments.
default:: False

reporting.report_tr

label:: Report threshold score
type:: basic:integer
description:: Don’t output alignments with a score lower than this value. This option only affects output.
default:: 30

bam

label:: Alignment file
type:: basic:file
description:: Position-sorted alignment

bai

label:: Index BAI
type:: basic:file

unmapped

label:: Unmapped reads
type:: basic:file
required:: False

stats

label:: Statistics
type:: basic:file

species

label:: Species
type:: basic:string

build

label:: Build
type:: basic:string

BWA MEM2

data:alignment:bam:bwamem2alignment-bwa-mem2 (data:index:bwamem2 genome, data:reads:fastq reads, basic:integer seed_l, basic:integer band_w, basic:decimal re_seeding, basic:boolean m, basic:integer match, basic:integer missmatch, basic:integer gap_o, basic:integer gap_e, basic:integer clipping, basic:integer unpaired_p, basic:boolean report_all, basic:integer report_tr)[Source: v1.3.0]

Bwa-mem2 is the next version of the bwa-mem algorithm in bwa. It produces alignments identical to bwa and is ~1.3-3.1x faster depending on the use-case, dataset and the running machine. See [here](https://github.com/bwa-mem2/bwa-mem2) for more information.

genome

label:: Reference genome
type:: data:index:bwamem2

reads

label:: Reads
type:: data:reads:fastq

seed_l

label:: Minimum seed length
type:: basic:integer
description:: Minimum seed length. Matches shorter than minimum seed length will be missed. The alignment speed is usually insensitive to this value unless it significantly deviates from 20.
default:: 19

band_w

label:: Band width
type:: basic:integer
description:: Gaps longer than this will not be found.
default:: 100

re_seeding

label:: Re-seeding factor
type:: basic:decimal
description:: Trigger re-seeding for a MEM longer than minSeedLen*FACTOR. This is a key heuristic parameter for tuning the performance. Larger value yields fewer seeds, which leads to faster alignment speed but lower accuracy.
default:: 1.5
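The re-seeding threshold is the product of the minimum seed length and the re-seeding factor. A minimal sketch of the comparison, using a hypothetical helper name:

```python
def triggers_reseeding(mem_length: int, min_seed_len: int = 19, factor: float = 1.5) -> bool:
    """True if a maximal exact match (MEM) is long enough to trigger re-seeding."""
    return mem_length > min_seed_len * factor


# With the defaults the threshold is 19 * 1.5 = 28.5, so MEMs of 29 bp or more re-seed.
print(triggers_reseeding(29))  # True
print(triggers_reseeding(28))  # False
# A larger factor raises the threshold and yields fewer seeds.
print(triggers_reseeding(29, factor=2.0))  # False
```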

m

label:: Mark shorter split hits as secondary
type:: basic:boolean
description:: Mark shorter split hits as secondary (for Picard compatibility)
default:: False

scoring.match

label:: Score of a match
type:: basic:integer
default:: 1

scoring.missmatch

label:: Mismatch penalty
type:: basic:integer
default:: 4

scoring.gap_o

label:: Gap open penalty
type:: basic:integer
default:: 6

scoring.gap_e

label:: Gap extension penalty
type:: basic:integer
default:: 1

scoring.clipping

label:: Clipping penalty
type:: basic:integer
description:: Clipping is applied unless the best score reaching the end of the query is larger than the best local alignment score minus the clipping penalty.
default:: 5

scoring.unpaired_p

label:: Penalty for an unpaired read pair
type:: basic:integer
description:: Affinity to force pairing. An unpaired read pair is scored as scoreRead1 + scoreRead2 - Penalty.
default:: 9

reporting.report_all

label:: Report all found alignments
type:: basic:boolean
description:: Output all found alignments for single-end or unpaired paired-end reads. These alignments will be flagged as secondary alignments.
default:: False

reporting.report_tr

label:: Report threshold score
type:: basic:integer
description:: Do not output alignments with a score lower than this threshold. This option only affects output.
default:: 30

bam

label:: Alignment file
type:: basic:file
description:: Position sorted alignment

bai

label:: Index BAI
type:: basic:file

unmapped

label:: Unmapped reads
type:: basic:file
required:: False

stats

label:: Statistics
type:: basic:file

species

label:: Species
type:: basic:string

build

label:: Build
type:: basic:string

BWA SW

data:alignment:bam:bwaswalignment-bwa-sw (data:index:bwa genome, data:reads:fastq reads, basic:integer match, basic:integer missmatch, basic:integer gap_o, basic:integer gap_e)[Source: v2.5.2]

Read aligner for mapping low-divergent sequences against a large reference genome. Designed for longer sequences ranging from 70 bp to 1 Mbp. The paired-end mode only works for reads from Illumina short-insert libraries.

genome

label:: Reference genome
type:: data:index:bwa

reads

label:: Reads
type:: data:reads:fastq

match

label:: Score of a match
type:: basic:integer
default:: 1

missmatch

label:: Mismatch penalty
type:: basic:integer
default:: 3

gap_o

label:: Gap open penalty
type:: basic:integer
default:: 5

gap_e

label:: Gap extension penalty
type:: basic:integer
default:: 2

bam

label:: Alignment file
type:: basic:file
description:: Position sorted alignment

bai

label:: Index BAI
type:: basic:file

unmapped

label:: Unmapped reads
type:: basic:file
required:: False

stats

label:: Statistics
type:: basic:file

species

label:: Species
type:: basic:string

build

label:: Build
type:: basic:string

BWA genome index

data:index:bwa:bwa-index (data:seq:nucleotide ref_seq)[Source: v1.2.0]

Create BWA genome index.

ref_seq

label:: Reference sequence (nucleotide FASTA)
type:: data:seq:nucleotide
required:: True
disabled:: False
hidden:: False

index

label:: BWA index
type:: basic:dir
required:: True
disabled:: False
hidden:: False

fastagz

label:: FASTA file (compressed)
type:: basic:file
required:: True
disabled:: False
hidden:: False

fasta

label:: FASTA file
type:: basic:file
required:: True
disabled:: False
hidden:: False

fai

label:: FASTA file index
type:: basic:file
required:: True
disabled:: False
hidden:: False

species

label:: Species
type:: basic:string
required:: True
disabled:: False
hidden:: False

build

label:: Build
type:: basic:string
required:: True
disabled:: False
hidden:: False

BWA-MEM2 genome index

data:index:bwamem2:bwamem2-index (data:seq:nucleotide ref_seq)[Source: v1.1.0]

Create BWA-MEM2 genome index.

ref_seq

label:: Reference sequence (nucleotide FASTA)
type:: data:seq:nucleotide
required:: True
disabled:: False
hidden:: False

index

label:: BWA-MEM2 index
type:: basic:dir
required:: True
disabled:: False
hidden:: False

fastagz

label:: FASTA file (compressed)
type:: basic:file
required:: True
disabled:: False
hidden:: False

fasta

label:: FASTA file
type:: basic:file
required:: True
disabled:: False
hidden:: False

fai

label:: FASTA file index
type:: basic:file
required:: True
disabled:: False
hidden:: False

species

label:: Species
type:: basic:string
required:: True
disabled:: False
hidden:: False

build

label:: Build
type:: basic:string
required:: True
disabled:: False
hidden:: False

BWA-MEM2 index files

data:index:bwamem2:upload-bwamem2-index (basic:file ref_seq, basic:file index_name, basic:string species, basic:string build)[Source: v1.0.0]

Import BWA-MEM2 index files.

ref_seq

label:: Reference sequence (nucleotide FASTA)
type:: basic:file
required:: True
disabled:: False
hidden:: False

index_name

label:: BWA-MEM2 index files
type:: basic:file
required:: True
disabled:: False
hidden:: False

species

label:: Species
type:: basic:string
description:: Select a species name from the dropdown menu or write a custom species name in the species field. For sequences that are not related to any particular species (e.g. adapters file), you can select the value Other.
required:: True
disabled:: False
hidden:: False
choices::
Homo sapiens: Homo sapiens
Mus musculus: Mus musculus
Rattus norvegicus: Rattus norvegicus
Macaca mulatta: Macaca mulatta
Dictyostelium discoideum: Dictyostelium discoideum
Other: Other

build

label:: Genome build
type:: basic:string
required:: True
disabled:: False
hidden:: False

index

label:: BWA-MEM2 index
type:: basic:dir
required:: True
disabled:: False
hidden:: False

fastagz

label:: FASTA file (compressed)
type:: basic:file
required:: True
disabled:: False
hidden:: False

fasta

label:: FASTA file
type:: basic:file
required:: True
disabled:: False
hidden:: False

fai

label:: FASTA file index
type:: basic:file
required:: True
disabled:: False
hidden:: False

species

label:: Species
type:: basic:string
required:: True
disabled:: False
hidden:: False

build

label:: Build
type:: basic:string
required:: True
disabled:: False
hidden:: False

Bam split

data:alignment:bam:primarybam-split (data:alignment:bam bam, data:sam:header header, data:sam:header header2)[Source: v0.9.1]

Split hybrid bam file into two bam files.

bam

label:: Hybrid alignment bam
type:: data:alignment:bam

header

label:: Primary header sam file (optional)
type:: data:sam:header
description:: If no header file is provided, the headers will be extracted from the hybrid alignment bam file.
required:: False

header2

label:: Secondary header sam file (optional)
type:: data:sam:header
description:: If no header file is provided, the headers will be extracted from the hybrid alignment bam file.
required:: False

bam

label:: Uploaded file
type:: basic:file

bai

label:: Index BAI
type:: basic:file

species

label:: Species
type:: basic:string

build

label:: Build
type:: basic:string

Bamclipper

data:alignment:bam:bamclipped:bamclipper (data:alignment:bam alignment, data:bedpe bedpe, basic:boolean skip)[Source: v1.5.1]

Remove primer sequences from BAM alignments by soft-clipping. This process is a wrapper for bamclipper, which can be found at https://github.com/tommyau/bamclipper.

alignment

label:: Alignment BAM file
type:: data:alignment:bam
required:: True
disabled:: False
hidden:: False

bedpe

label:: BEDPE file
type:: data:bedpe
required:: False
disabled:: False
hidden:: False

skip

label:: Skip Bamclipper step
type:: basic:boolean
description:: Use this option to skip Bamclipper step.
required:: True
disabled:: False
hidden:: False
default:: False

bam

label:: Clipped BAM file
type:: basic:file
required:: True
disabled:: False
hidden:: False

bai

label:: Index of clipped BAM file
type:: basic:file
required:: True
disabled:: False
hidden:: False

stats

label:: Alignment statistics
type:: basic:file
required:: True
disabled:: False
hidden:: False

species

label:: Species
type:: basic:string
required:: True
disabled:: False
hidden:: False

build

label:: Build
type:: basic:string
required:: True
disabled:: False
hidden:: False

Bamliquidator

data:bam:plot:bamliquidatorbamliquidator (basic:string analysis_type, list:data:alignment:bam bam, basic:string cell_type, basic:integer bin_size, data:annotation:gtf regions_gtf, data:bed regions_bed, basic:integer extension, basic:string sense, basic:boolean skip_plot, list:basic:string black_list, basic:integer threads)[Source: v0.3.3]

Set of tools for analyzing the density of short DNA sequence read alignments in the BAM file format.

analysis_type

label:: Analysis type
type:: basic:string
default:: bin
choices::
Bin mode: bin
Region mode: region
BED mode: bed

bam

label:: BAM File
type:: list:data:alignment:bam

cell_type

label:: Cell type
type:: basic:string
default:: cell_type

bin_size

label:: Bin size
type:: basic:integer
description:: Number of base pairs in each bin. The smaller the bin size, the longer the runtime and the larger the data files. Default is 100000.
required:: False
hidden:: analysis_type != 'bin'
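To see why smaller bins cost more, count the bins that cover one chromosome. This sketch assumes contiguous, non-overlapping bins where the last bin may be shorter; it is an illustration, not Bamliquidator's code, and the chromosome length is hypothetical.

```python
def bin_count(chrom_length: int, bin_size: int = 100_000) -> int:
    """Number of fixed-size, non-overlapping bins covering a chromosome."""
    if bin_size <= 0:
        raise ValueError("bin_size must be a positive number of base pairs")
    # Ceiling division with integers avoids floating-point rounding.
    return -(-chrom_length // bin_size)


print(bin_count(1_000_001))          # 11
print(bin_count(1_000_001, 10_000))  # 101
```

Dividing the bin size by ten multiplies the number of bins, and roughly the output size, by ten.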

regions_gtf

label:: Region gff file / Annotation file (.gff|.gtf)
type:: data:annotation:gtf
required:: False
hidden:: analysis_type != 'region'

regions_bed

label:: Region bed file / Annotation file (.bed)
type:: data:bed
required:: False
hidden:: analysis_type != 'bed'

extension

label:: Extension
type:: basic:integer
description:: Extend reads by this number of bp.
default:: 200

sense

label:: Mapping strand to gff file
type:: basic:string
default:: .
choices::
Forward: +
Reverse: -
Both: .

skip_plot

label:: Skip plot
type:: basic:boolean
required:: False

black_list

label:: Black list
type:: list:basic:string
description:: One or more chromosome patterns to skip during bin liquidation. By default, any chromosome whose name contains one of the following substrings is skipped: chrUn, _random, Zv9_, _hap.
required:: False
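The black list works by plain substring matching, as the description states. The sketch below filters chromosome names with the default patterns. Whether user-supplied patterns replace or extend the defaults is not stated, so here they simply replace them. The chromosome names are hypothetical.

```python
from typing import Iterable, List, Sequence

DEFAULT_BLACK_LIST = ("chrUn", "_random", "Zv9_", "_hap")


def kept_chromosomes(chromosomes: Iterable[str],
                     black_list: Sequence[str] = DEFAULT_BLACK_LIST) -> List[str]:
    """Drop chromosomes whose names contain any black-listed substring."""
    return [name for name in chromosomes
            if not any(pattern in name for pattern in black_list)]


names = ["chr1", "chrUn_gl000220", "chr1_gl000191_random", "chr6_apd_hap1", "chrX"]
print(kept_chromosomes(names))  # ['chr1', 'chrX']
```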

threads

label:: Threads
type:: basic:integer
description:: Number of threads to run concurrently during liquidation.
default:: 1

analysis_type

label:: Analysis type
type:: basic:string
hidden:: True

output_dir

label:: Output directory
type:: basic:file

counts

label:: Counts HDF5 file
type:: basic:file

matrix

label:: Matrix file
type:: basic:file
required:: False
hidden:: analysis_type != 'region'

summary

label:: Summary file
type:: basic:file:html
required:: False
hidden:: analysis_type != 'bin'

Bamplot

data:bam:plot:bamplotbamplot (basic:string genome, data:annotation:gtf input_gff, basic:string input_region, list:data:alignment:bam bam, basic:integer stretch_input, basic:string color, basic:string sense, basic:integer extension, basic:boolean rpm, basic:string yscale, list:basic:string names, basic:string plot, basic:string title, basic:string scale, list:data:bed bed, basic:boolean multi_page)[Source: v1.4.3]

Plot a single locus from a BAM file.

genome

label:: Genome
type:: basic:string
choices::
HG19: HG19
HG18: HG18
MM8: MM8
MM9: MM9
MM10: MM10
RN6: RN6
RN4: RN4

input_gff

label:: Region string
type:: data:annotation:gtf
description:: Enter .gff file.
required:: False

input_region

label:: Region string
type:: basic:string
description:: Enter genomic region e.g. chr1:+:1-1000.
required:: False
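The region string follows the pattern in the example `chr1:+:1-1000`, which is chromosome, strand and start-end coordinates. This parsing sketch infers that format from the example alone. Accepting `.` as a strand and the `Region` type are assumptions.

```python
import re
from typing import NamedTuple


class Region(NamedTuple):
    chrom: str
    strand: str
    start: int
    end: int


# chromosome : strand (+, - or .) : start-end
_REGION = re.compile(r"^(?P<chrom>[^:]+):(?P<strand>[+\-.]):(?P<start>\d+)-(?P<end>\d+)$")


def parse_region(text: str) -> Region:
    """Parse a 'chrom:strand:start-end' region string."""
    m = _REGION.match(text.strip())
    if m is None:
        raise ValueError(f"not a chrom:strand:start-end region: {text!r}")
    start, end = int(m["start"]), int(m["end"])
    if start > end:
        raise ValueError("start must not exceed end")
    return Region(m["chrom"], m["strand"], start, end)


print(parse_region("chr1:+:1-1000"))
# Region(chrom='chr1', strand='+', start=1, end=1000)
```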

bam

label:: Bam
type:: list:data:alignment:bam
description:: BAM files to plot.
required:: False

stretch_input

label:: Stretch-input
type:: basic:integer
description:: Stretch the input regions to a minimum length in bp, e.g. 10000 (for 10kb).
required:: False

color

label:: Color
type:: basic:string
description:: Enter a colon-separated list of colors, e.g. 255,0,0:255,125,0. By default, colors are sampled from the rainbow.
default:: 255,0,0:255,125,0
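Each colon-separated item looks like an R,G,B triplet with values from 0 to 255. The sketch below parses the string under that assumption. How bamplot assigns the colors to the input BAM files, presumably in order, is not stated.

```python
from typing import List, Tuple


def parse_colors(text: str) -> List[Tuple[int, int, int]]:
    """Parse colon-separated R,G,B triplets such as '255,0,0:255,125,0'."""
    colors = []
    for chunk in text.split(":"):
        parts = chunk.split(",")
        if len(parts) != 3:
            raise ValueError(f"expected R,G,B but got {chunk!r}")
        r, g, b = (int(p) for p in parts)
        if not all(0 <= v <= 255 for v in (r, g, b)):
            raise ValueError(f"color values must be 0-255: {chunk!r}")
        colors.append((r, g, b))
    return colors


print(parse_colors("255,0,0:255,125,0"))  # [(255, 0, 0), (255, 125, 0)]
```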

sense

label:: Sense
type:: basic:string
description:: Map to forward, reverse or both strands. Default maps to both.
default:: both
choices::
Forward: forward
Reverse: reverse
Both: both

extension

label:: Extension
type:: basic:integer
description:: Extends reads by n bp. Default value is 200bp.
default:: 200

rpm

label:: rpm
type:: basic:boolean
description:: Normalizes density to reads per million (rpm). Default is False.
required:: False
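Reads-per-million normalization divides a count by the total number of reads and multiplies by one million. Which total bamplot uses, such as all mapped reads in the BAM file, is an assumption here.

```python
def reads_per_million(read_count: float, total_reads: int) -> float:
    """Scale a raw read count to reads per million (RPM)."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return read_count * 1_000_000 / total_reads


print(reads_per_million(50, 25_000_000))  # 2.0
```

RPM makes densities from libraries of different depths comparable on the same y axis.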

yscale

label:: y scale
type:: basic:string
description:: Choose either relative or uniform y axis scaling. Default is relative scaling.
default:: relative
choices::
relative: relative
uniform: uniform

names

label:: Names
type:: list:basic:string
description:: Enter a comma-separated list of names for your BAM files.
required:: False

plot

label:: Single or multiple plots
type:: basic:string
description:: Choose either all lines on a single plot or multiple plots.
default:: merge
choices::
single: single
multiple: multiple
merge: merge

title

label:: Title
type:: basic:string
description:: Specify a title for the output plot(s), default will be the coordinate region.
default:: output

scale

label:: Scale
type:: basic:string
description:: Enter a comma-separated list of multiplicative scaling factors for your BAM files. Default is none.
required:: False

bed

label:: Bed
type:: list:data:bed
description:: Add a space-delimited list of bed files to plot.
required:: False

multi_page

label:: Multi page
type:: basic:boolean
description:: If enabled, a new PDF is created for each region.
default:: False

plot

label:: region plot
type:: basic:file

BaseQualityScoreRecalibrator

data:alignment:bam:bqsr:bqsr (data:alignment:bam bam, data:seq:nucleotide reference, list:data:variants:vcf known_sites, data:bed intervals, basic:string read_group, basic:string validation_stringency, basic:boolean use_original_qualities, basic:integer java_gc_threads, basic:integer max_heap_size)[Source: v2.5.1]

A two-pass process of BaseRecalibrator and ApplyBQSR from GATK. See the GATK website for more information on BaseRecalibrator. Read groups can be modified with GATK’s AddOrReplaceReadGroups through the Replace read groups in BAM (``read_group``) input field.

bam

label:: BAM file containing reads
type:: data:alignment:bam
required:: True
disabled:: False
hidden:: False

reference

label:: Reference genome file
type:: data:seq:nucleotide
required:: True
disabled:: False
hidden:: False

known_sites

label:: List of known sites of variation
type:: list:data:variants:vcf
required:: True
disabled:: False
hidden:: False

intervals

label:: One or more genomic intervals over which to operate.
type:: data:bed
description:: This field is optional, but it can speed up the process by restricting calculations to specific genome regions.
required:: False
disabled:: False
hidden:: False

read_group

label:: Replace read groups in BAM
type:: basic:string
description:: Replace read groups in a BAM file. This argument enables the user to replace all read groups in the INPUT file with a single new read group and assign all reads to this read group in the OUTPUT BAM file. Addition or replacement is performed using Picard’s AddOrReplaceReadGroups tool. Input should take the form of -name=value delimited by a “;”, e.g. “-ID=1;-LB=GENIALIS;-PL=ILLUMINA;-PU=BARCODE;-SM=SAMPLENAME1”. See the tool’s documentation for more information on tag names. Note that PL, LB, PU and SM are required fields. See caveats of rewriting read groups in the documentation.
required:: True
disabled:: False
hidden:: False
default:
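The read group string is a `;`-delimited list of `-NAME=value` items, and PL, LB, PU and SM are required. The hypothetical helper below parses such a string and checks the required fields. It is not the process's own code, and handing the tags on to AddOrReplaceReadGroups is not shown.

```python
from typing import Dict

REQUIRED_TAGS = {"PL", "LB", "PU", "SM"}


def parse_read_group(text: str) -> Dict[str, str]:
    """Parse '-ID=1;-LB=...' into a tag dictionary and check required fields."""
    tags: Dict[str, str] = {}
    for item in text.split(";"):
        item = item.strip()
        if not item:
            continue  # tolerate a trailing ';'
        if not item.startswith("-") or "=" not in item:
            raise ValueError(f"expected -NAME=value, got {item!r}")
        name, value = item[1:].split("=", 1)
        tags[name] = value
    missing = REQUIRED_TAGS.difference(tags)
    if missing:
        raise ValueError(f"missing required read group fields: {sorted(missing)}")
    return tags


print(parse_read_group("-ID=1;-LB=GENIALIS;-PL=ILLUMINA;-PU=BARCODE;-SM=SAMPLENAME1"))
# {'ID': '1', 'LB': 'GENIALIS', 'PL': 'ILLUMINA', 'PU': 'BARCODE', 'SM': 'SAMPLENAME1'}
```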

validation_stringency

label:: Validation stringency
type:: basic:string
description:: Validation stringency for all SAM files read by this program. Setting stringency to SILENT can improve performance when processing a BAM file in which variable-length data (read, qualities, tags) do not otherwise need to be decoded. Default is STRICT. This setting is used in BaseRecalibrator and ApplyBQSR processes.
required:: True
disabled:: False
hidden:: False
default:: STRICT
choices::
STRICT: STRICT
LENIENT: LENIENT
SILENT: SILENT

advanced.use_original_qualities

label:: Use the base quality scores from the OQ tag
type:: basic:boolean
description:: This flag tells GATK to use the original base qualities (that were in the data before BQSR/recalibration) which are stored in the OQ tag, if they are present, rather than use the post-recalibration quality scores. If no OQ tag is present for a read, the standard qual score will be used.
required:: True
disabled:: False
hidden:: False
default:: False

advanced.java_gc_threads

label:: Java ParallelGCThreads
type:: basic:integer
description:: Sets the number of threads used during parallel phases of the garbage collectors.
required:: True
disabled:: False
hidden:: False
default:: 2

advanced.max_heap_size

label:: Java maximum heap size (Xmx)
type:: basic:integer
description:: Set the maximum Java heap size (in GB).
required:: True
disabled:: False
hidden:: False
default:: 12
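The two advanced Java settings correspond to standard JVM flags: `-Xmx` for the maximum heap size and `-XX:ParallelGCThreads` for the parallel garbage collector threads. The sketch below builds those flags. How the process actually forwards them, for example through GATK's `--java-options` argument, is an assumption.

```python
from typing import List


def java_options(max_heap_gb: int = 12, gc_threads: int = 2) -> List[str]:
    """Build JVM flags for the maximum heap size (in GB) and parallel GC threads."""
    return [f"-Xmx{max_heap_gb}g", f"-XX:ParallelGCThreads={gc_threads}"]


print(" ".join(java_options()))  # -Xmx12g -XX:ParallelGCThreads=2
```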

bam

label:: Base quality score recalibrated BAM file
type:: basic:file
required:: True
disabled:: False
hidden:: False

bai

label:: Index of base quality score recalibrated BAM file
type:: basic:file
required:: True
disabled:: False
hidden:: False

stats

label:: Alignment statistics
type:: basic:file
required:: True
disabled:: False
hidden:: False

species

label:: Species
type:: basic:string
required:: True
disabled:: False
hidden:: False

build

label:: Build
type:: basic:string
required:: True
disabled:: False
hidden:: False

recal_table

label:: Recalibration table
type:: basic:file
required:: True
disabled:: False
hidden:: False

BaseSpace file

data:file:basespace-file-import (basic:string file_id, basic:secret access_token_secret, basic:string output, basic:integer tries, basic:boolean verbose)[Source: v1.5.1]

Import a file from Illumina BaseSpace.

file_id

label:: BaseSpace file ID
type:: basic:string
required:: True
disabled:: False
hidden:: False

access_token_secret

label:: BaseSpace access token
type:: basic:secret
description:: BaseSpace access token secret handle needed to download the file.
required:: True
disabled:: False
hidden:: False

advanced.output

label:: Output
type:: basic:string
description:: Sets what is printed to standard output. ‘Full’ outputs everything, while ‘Filename’ outputs only the names of downloaded files.
required:: True
disabled:: False
hidden:: False
default:: filename
choices::
Full: full
Filename: filename

advanced.tries

label:: Tries
type:: basic:integer
description:: Number of tries to download a file before giving up.
required:: True
disabled:: False
hidden:: False
default:: 3
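The Tries setting caps how many download attempts are made before the import fails. Below is a generic retry sketch under stated assumptions: the download callable is hypothetical, every exception counts as a failed try, and no delay is added between attempts. The process's real retry behavior is not documented here.

```python
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def with_retries(download: Callable[[], T], tries: int = 3) -> T:
    """Call download() up to `tries` times and return the first successful result."""
    if tries < 1:
        raise ValueError("tries must be at least 1")
    last_error: Optional[Exception] = None
    for _ in range(tries):
        try:
            return download()
        except Exception as error:  # any exception counts as a failed try in this sketch
            last_error = error
    raise RuntimeError(f"giving up after {tries} tries") from last_error


attempts = {"count": 0}


def flaky_download() -> str:
    # Hypothetical stand-in for a BaseSpace download that fails twice, then succeeds.
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise ConnectionError("temporary network error")
    return "reads.fastq.gz"


print(with_retries(flaky_download))  # reads.fastq.gz
```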

advanced.verbose

label:: Verbose
type:: basic:boolean
description:: Print detailed exception information to standard output when an error occurs. The Output argument has no effect on this argument.
required:: True
disabled:: False
hidden:: False
default:: False

file

label:: File with reads
type:: basic:file
required:: True
disabled:: False
hidden:: False

Bedtools bamtobed

data:bedpe:bedtools-bamtobed (data:alignment:bam alignment)[Source: v1.3.1]

Takes a BAM file, sorts it with Samtools and converts it to BEDPE format with Bedtools. The resulting BEDPE file is used to calculate a normalization factor.

alignment

label:: Alignment BAM file
type:: data:alignment:bam
required:: True
disabled:: False
hidden:: False

bedpe

label:: BEDPE file
type:: basic:file
required:: True
disabled:: False
hidden:: False

species

label:: Species
type:: basic:string
required:: True
disabled:: False
hidden:: False

build

label:: Build
type:: basic:string
required:: True
disabled:: False
hidden:: False

Beta Cut & Run workflow

data:workflow:cutnrun:workflow-cutnrun-beta (data:reads:fastq:paired reads, basic:integer quality, basic:integer nextseq, basic:integer min_length, list:basic:string adapter_1, list:basic:string adapter_2, data:seq:nucleotide adapter_file_1, data:seq:nucleotide adapter_file_2, basic:string universal_adapter, basic:integer stringency, basic:decimal error_rate, data:index:bowtie2 genome, data:index:bowtie2 spikein_genome, basic:string alignment_mode, basic:string speed, basic:boolean dovetail, basic:boolean rep_se, basic:integer minins, basic:integer maxins, basic:boolean discordantly, basic:boolean no_unal, basic:boolean skip_norm, basic:decimal scale, basic:boolean downsample_reads, basic:integer n_reads, basic:boolean remove_duplicates)[Source: v2.0.0]

Beta Cut & Run workflow. Analysis of samples processed for high-resolution mapping of DNA binding sites using a targeted nuclease strategy. The process is named CUT&RUN, which stands for Cleavage Under Targets and Release Using Nuclease. The workflow trims reads with Trim Galore and aligns them with Bowtie2 to the target species genome and, optionally, to a spike-in genome. Aligned reads are processed to produce BigWig files that can be viewed in a genome browser.

reads

label:: Input Reads (FASTQ)
type:: data:reads:fastq:paired
description:: Paired-end reads in FASTQ file.
required:: True
disabled:: False
hidden:: False

trimming_options.quality

label:: Quality cutoff
type:: basic:integer
description:: Trim low-quality ends from reads based on Phred score. Default: 20.
required:: True
disabled:: False
hidden:: False
default:: 20
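Trim Galore hands quality trimming to Cutadapt, which documents a BWA-style algorithm. The Python sketch below reimplements that algorithm for illustration. Starting from the 3' end, it sums (cutoff - quality) and cuts where that sum peaks. The NextSeq variant is also shown: it treats every G as sitting just below the cutoff, so its reported quality is ignored. This is not the exact Cutadapt code, and the function name is hypothetical.

```python
from typing import Sequence


def quality_trim_length(bases: str, qualities: Sequence[int], cutoff: int = 20,
                        nextseq: bool = False) -> int:
    """Return how many bases to keep after trimming the 3' end (BWA-style algorithm)."""
    running = 0
    best = 0
    keep = len(qualities)
    # Scan from the 3' end, accumulating (cutoff - quality) until the sum turns negative.
    for i in reversed(range(len(qualities))):
        q = qualities[i]
        if nextseq and bases[i] == "G":
            q = cutoff - 1  # ignore the reported quality of G (possible dark cycle)
        running += cutoff - q
        if running < 0:
            break
        if running > best:
            best = running
            keep = i
    return keep


print(quality_trim_length("ACGTA", [40, 40, 40, 10, 10]))                # 3
print(quality_trim_length("ACTGG", [40, 40, 40, 40, 40]))                # 5
print(quality_trim_length("ACTGG", [40, 40, 40, 40, 40], nextseq=True))  # 3
```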

trimming_options.nextseq

label:: NextSeq/NovaSeq trim cutoff
type:: basic:integer
description:: NextSeq/NovaSeq-specific quality trimming. Also trims dark cycles that appear as high-quality G bases. This sets a specific quality cutoff, but the qualities of G bases are ignored. This cannot be used together with Quality cutoff and overrides it.
required:: False
disabled:: False
hidden:: False

trimming_options.min_length

label:: Minimum length after trimming
type:: basic:integer
description:: Discard reads that became shorter than selected length because of either quality or adapter trimming. Both reads of a read-pair need to be longer than the specified length to be printed out to validated paired-end files. A value of 0 disables filtering based on length. Default: 20.
required:: True
disabled:: False
hidden:: False
default:: 20

trimming_options.adapter_options.adapter_1

label:: Read 1 adapter sequence
type:: list:basic:string
description:: Adapter sequences to be trimmed. Also see universal adapters field for predefined adapters. This is mutually exclusive with Read 1 adapters file and Universal adapters.
required:: False
disabled:: False
hidden:: False
default:: []

trimming_options.adapter_options.adapter_2

label:: Read 2 adapter sequence
type:: list:basic:string
description:: Optional adapter sequence to be trimmed off read 2 of paired-end files. This is mutually exclusive with Read 2 adapters file and Universal adapters.
required:: False
disabled:: False
hidden:: False
default:: []

trimming_options.adapter_options.adapter_file_1

label:: Read 1 adapters file
type:: data:seq:nucleotide
description:: This is mutually exclusive with Read 1 adapters and Universal adapters.
required:: False
disabled:: False
hidden:: False

trimming_options.adapter_options.adapter_file_2

label:: Read 2 adapters file
type:: data:seq:nucleotide
description:: This is mutually exclusive with Read 2 adapters and Universal adapters.
required:: False
disabled:: False
hidden:: False

trimming_options.adapter_options.universal_adapter

label:: Universal adapters
type:: basic:string
description:: Instead of default detection, use specific adapters: 13 bp of the Illumina universal adapter, 12 bp of the Nextera adapter or 12 bp of the Illumina Small RNA 3’ Adapter. Selecting to trim small RNA adapters also lowers the minimum length value to 18 bp. If small RNA libraries are paired-end, the Read 2 adapter is automatically set to the Illumina small RNA 5’ adapter (GATCGTCGGACT) unless defined explicitly. This is mutually exclusive with manually defined adapters and adapter files.
required:: False
disabled:: False
hidden:: False
choices::
Illumina: --illumina
Nextera: --nextera
Illumina small RNA: --small_rna

trimming_options.adapter_options.stringency

label:: Overlap with adapter sequence required to trim
type:: basic:integer
description:: Defaults to a very stringent setting of 1, i.e. even a single base pair of overlapping sequence will be trimmed off the 3’ end of any read.
required:: True
disabled:: False
hidden:: False
default:: 1

trimming_options.adapter_options.error_rate

label:: Maximum allowed error rate
type:: basic:decimal
description:: Number of errors divided by the length of the matching region. Default: 0.1.
required:: True
disabled:: False
hidden:: False
default:: 0.1
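Together, the overlap (stringency) and error-rate settings decide whether a 3' adapter match is accepted. The sketch below assumes the number of tolerated errors is the error rate times the match length, rounded down. That is a simplification of Cutadapt's matching, and the helper names are hypothetical.

```python
def max_errors(match_length: int, error_rate: float = 0.1) -> int:
    """Errors tolerated in an adapter match: error rate times match length, rounded down."""
    return int(match_length * error_rate)


def adapter_match_accepted(match_length: int, errors: int, stringency: int = 1,
                           error_rate: float = 0.1) -> bool:
    """Accept a 3' adapter match if it is long enough and has few enough errors."""
    return match_length >= stringency and errors <= max_errors(match_length, error_rate)


print(adapter_match_accepted(1, 0))   # True: a single matching base is enough by default
print(adapter_match_accepted(5, 1))   # False: 5 * 0.1 rounds down to 0 allowed errors
print(adapter_match_accepted(10, 1))  # True: 10 * 0.1 allows 1 error
```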

alignment_options.genome

label:: Species genome
type:: data:index:bowtie2
required:: True
disabled:: False
hidden:: False

alignment_options.spikein_genome

label:: Spike-in genome
type:: data:index:bowtie2
required:: False
disabled:: normalization_options.skip_norm == true
hidden:: False

alignment_options.alignment_mode

label:: Alignment mode
type:: basic:string
description:: Local: some characters may be omitted (‘soft clipped’) from the ends in order to achieve the greatest possible alignment score. End-to-end: no trimming (or ‘soft clipping’) of bases from either end. End-to-end is the default and is suitable if reads have been clipped beforehand.
required:: True
disabled:: False
hidden:: False
default:: --end-to-end
choices::
Local: --local
End-to-end: --end-to-end

alignment_options.speed

label:: Speed vs. Sensitivity
type:: basic:string
description:: Setting for aligning fast or accurately. Default: Very sensitive.
required:: True
disabled:: False
hidden:: False
default:: --very-sensitive
choices::
Very fast: --very-fast
Fast: --fast
Sensitive: --sensitive
Very sensitive: --very-sensitive

alignment_options.pe_options.dovetail

label:: Dovetail
type:: basic:boolean
description:: Mates dovetail when the alignment of one mate extends past the beginning of the other, so that the wrong mate begins upstream. If enabled, such alignments are considered concordant. Default: True.
required:: True
disabled:: False
hidden:: False
default:: True

alignment_options.pe_options.rep_se

label:: Report single ended
type:: basic:boolean
description:: If paired alignment cannot be found, Bowtie2 tries to find alignments for the individual mates. Default: False.
required:: True
disabled:: False
hidden:: False
default:: False

alignment_options.pe_options.minins

label:: Minimal distance
type:: basic:integer
description:: The minimum fragment length (--minins) for valid paired-end alignments. Default: 10.
required:: True
disabled:: False
hidden:: False
default:: 10

alignment_options.pe_options.maxins

label:: Maximal distance
type:: basic:integer
description:: The maximum fragment length (--maxins) for valid paired-end alignments. Default: 700.
required:: True
disabled:: False
hidden:: False
default:: 700

alignment_options.pe_options.discordantly

label:: Report discordantly matched read
type:: basic:boolean
description:: If both mates have unique alignments, but the alignments do not match paired-end expectations (orientation and relative distance), alignment will still be reported. Useful for detecting structural variations. Default: False.
required:: True
disabled:: False
hidden:: False
default:: False

alignment_options.output_options.no_unal

label:: Suppress SAM records for unaligned reads
type:: basic:boolean
description:: When enabled, suppress SAM records for unaligned reads. Default: True.
required:: True
disabled:: False
hidden:: False
default:: True

normalization_options.skip_norm

label:: Skip normalization
type:: basic:boolean
description:: Skip the spike-in normalization step of BigWig output. Use this if you don’t provide a spike-in. Default: False.
required:: True
disabled:: False
hidden:: False
default:: False

normalization_options.scale

label:: Scale factor
type:: basic:decimal
description:: Magnitude of the scale factor. The scaling factor is calculated by dividing the scale by the number of features in the BEDPE file (scale/(number of features)). Default: 10000.
required:: True
disabled:: normalization_options.skip_norm == true
hidden:: False
default:: 10000
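The scaling factor is the scale divided by the number of BEDPE features. The sketch below shows that arithmetic. It assumes the features come from the spike-in alignment, and the feature counts are hypothetical.

```python
def spikein_scale_factor(n_features: int, scale: float = 10_000) -> float:
    """Scaling factor = scale / number of features in the BEDPE file."""
    if n_features <= 0:
        raise ValueError("the BEDPE file must contain at least one feature")
    return scale / n_features


print(spikein_scale_factor(2_000_000))  # 0.005
print(spikein_scale_factor(500_000))    # 0.02
```

Fewer features yield a larger factor, which scales the BigWig signal up.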

downsampling_options.downsample_reads

label:: Downsample reads
type:: basic:boolean
description:: Option to downsample reads before trimming. Default: True
required:: True
disabled:: False
hidden:: False
default:: True

downsampling_options.n_reads

label:: Number of reads to downsample
type:: basic:integer
description:: Number of reads to downsample from the input FASTQ file. Default: 10M.
required:: True
disabled:: downsampling_options.downsample_reads == false
hidden:: False
default:: 10000000

deduplication_options.remove_duplicates

label:: Remove duplicates
type:: basic:boolean
description:: Option on how to handle duplicate reads. True: Mark and remove duplicate reads. False: Only mark duplicate reads. Note that this option is only available for the species genome. For the spike-in genome, duplicate reads are always removed. Default: False.
required:: True
disabled:: False
hidden:: False
default:: False

Bisulfite conversion rate

data:wgbs:bsrate:bs-conversion-rate (data:alignment:bam:walt mr, basic:boolean skip, data:seq:nucleotide sequence, basic:boolean count_all, basic:integer read_length, basic:decimal max_mismatch, basic:boolean a_rich)[Source: v1.3.1]

Estimate the bisulfite conversion rate in a control set using the program bsrate, which is included in [Methpipe](https://github.com/smithlabcode/methpipe).
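In an unmethylated control, complete conversion would make every reference cytosine read as T. The textbook conversion rate is therefore T / (T + C) over those positions. The sketch below shows that arithmetic with hypothetical counts. It is not bsrate's implementation, which also handles filtering such as mismatch limits.

```python
def conversion_rate(converted: int, unconverted: int) -> float:
    """Fraction of reference cytosines read as T (converted) rather than C."""
    total = converted + unconverted
    if total == 0:
        raise ValueError("no cytosine observations")
    return converted / total


# Hypothetical counts from an unmethylated control: 990 T and 10 C observations.
print(conversion_rate(990, 10))  # 0.99
```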

mr

label:: Aligned reads from bisulfite sequencing
type:: data:alignment:bam:walt
description:: A bisulfite-specific alignment such as WALT is required, as the .mr file type is used. Duplicates should be removed to reduce any bias introduced by incomplete conversion on PCR duplicate reads.
required:: True
disabled:: False
hidden:: False

skip

label:: Skip Bisulfite conversion rate step
type:: basic:boolean
description:: Bisulfite conversion rate step can be skipped.
required:: True
disabled:: False
hidden:: False
default:: False

sequence

label:: Unmethylated control sequence
type:: data:seq:nucleotide
description:: A separate unmethylated control sequence FASTA file is required to estimate the bisulfite conversion rate.
required:: False
disabled:: False
hidden:: False

count_all

label:: Count all cytosines including CpGs
type:: basic:boolean
required:: True
disabled:: False
hidden:: False
default:: True

read_length

label:: Average read length
type:: basic:integer
required:: True
disabled:: False
hidden:: False
default:: 150

max_mismatch

label:: Maximum fraction of mismatches
type:: basic:decimal
required:: False
disabled:: False
hidden:: False

a_rich

label:: Reads are A-rich
type:: basic:boolean
required:: True
disabled:: False
hidden:: False
default:: False

report

label:: Bisulfite conversion rate report
type:: basic:file
required:: True
disabled:: False
hidden:: False

Bowtie (Dicty)

data:alignment:bam:bowtie1alignment-bowtie (data:index:bowtie genome, data:reads:fastq reads, basic:string mode, basic:integer m, basic:integer l, basic:boolean use_se, basic:integer trim_5, basic:integer trim_3, basic:integer trim_nucl, basic:integer trim_iter, basic:string r)[Source: v2.5.2]

An ultrafast memory-efficient short read aligner.

genome

label:: Reference genome
type:: data:index:bowtie

reads

label:: Reads
type:: data:reads:fastq

mode

label:: Alignment mode
type:: basic:string
description:: When the -n option is specified (the default), Bowtie determines which alignments are valid according to the following policy, which is similar to Maq’s default policy. 1. Alignments may have no more than N mismatches (where N is a number 0-3, set with -n) in the first L bases (where L is a number 5 or greater, set with -l) on the high-quality (left) end of the read. The first L bases are called the “seed”. 2. The sum of the Phred quality values at all mismatched positions (not just in the seed) may not exceed E (set with -e). Where qualities are unavailable (e.g. if the reads are from a FASTA file), the Phred quality defaults to 40. In -v mode, alignments may have no more than V mismatches, where V may be a number from 0 through 3 set using the -v option. Quality values are ignored. The -v option is mutually exclusive with the -n option.
default:: -n
choices::

Use qualities (-n): -n
Use mismatches (-v): -v
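The Python sketch below makes the two validity policies concrete. It is not Bowtie's code. It assumes the mismatch positions of one candidate alignment are already known as 0-based offsets from the high-quality (left) end. The helpers `valid_n_mode` and `valid_v_mode` are hypothetical names. The `e=70` default mirrors Bowtie's default -e ceiling.

```python
from typing import Sequence


def valid_n_mode(
    mismatch_positions: Sequence[int],
    qualities: Sequence[int],
    n: int = 2,
    l: int = 28,
    e: int = 70,
) -> bool:
    """Apply the -n policy to one candidate alignment (hypothetical helper).

    mismatch_positions: 0-based offsets from the high-quality (left) end.
    qualities: Phred quality per read position (40 when unavailable, e.g. FASTA).
    """
    # Rule 1: at most n mismatches inside the seed (the first l bases).
    seed_mismatches = sum(1 for pos in mismatch_positions if pos < l)
    if seed_mismatches > n:
        return False
    # Rule 2: the quality sum over ALL mismatched positions must not exceed e.
    quality_sum = sum(qualities[pos] for pos in mismatch_positions)
    return quality_sum <= e


def valid_v_mode(mismatch_positions: Sequence[int], v: int = 2) -> bool:
    """Apply the -v policy: only the total mismatch count matters, qualities are ignored."""
    return len(mismatch_positions) <= v
```

With every quality at 40, as for FASTA input, two mismatches already sum to 80 and exceed the default ceiling. A read with mismatches at offsets 3 and 30 is therefore rejected in -n mode, even though only one of them falls inside the seed.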

m

label:: Allowed mismatches
type:: basic:integer
description:: When used with “Use qualities (-n)”, it is the maximum number of mismatches permitted in the “seed”, i.e. the first L base pairs of the read (where L is set with -l/--seedlen). This may be 0, 1, 2 or 3, and the default is 2. When used with “Use mismatches (-v)”, report alignments with at most <int> mismatches.
default:: 2

l

label:: Seed length (for -n only)
type:: basic:integer
description:: Only for “Use qualities (-n)”. Seed length (-l) is the number of bases on the high-quality end of the read to which the -n ceiling applies. The lowest permitted setting is 5 and the default is 28. Bowtie is faster for larger values of -l.
default:: 28

use_se

label:: Map as single-ended (for paired end reads only)
type:: basic:boolean
description:: If this option is selected, paired-end reads will be mapped as single-ended.
default:: False

start_trimming.trim_5

label:: Bases to trim from 5’
type:: basic:integer
description:: Number of bases to trim from the 5’ (left) end of each read before alignment.
default:: 0

start_trimming.trim_3

label:: Bases to trim from 3’
type:: basic:integer
description:: Number of bases to trim from the 3’ (right) end of each read before alignment.
default:: 0

trimming.trim_nucl

label:: Bases to trim
type:: basic:integer
description:: Number of bases to trim from 3’ end in each iteration.
default:: 2

trimming.trim_iter

label:: Iterations
type:: basic:integer
description:: Number of iterations.
default:: 0

reporting.r

label:: Reporting mode
type:: basic:string
description:: Report up to <int> valid alignments per read or pair (-k) (default: 1). Validity of alignments is determined by the alignment policy (combined effects of -n, -v, -l, and -e). If more than one valid alignment exists and the --best and --strata options are specified, then only those alignments belonging to the best alignment “stratum” will be reported. Bowtie is designed to be very fast for small -k, but it can become significantly slower as -k increases. If you would like to use Bowtie for larger values of -k, consider building an index with a denser suffix-array sample, i.e. specify a smaller -o/--offrate when invoking bowtie-build for the relevant index (see the Performance tuning section of the Bowtie manual for details).
default:: -a -m 1 --best --strata
choices::

Report unique alignments: -a -m 1 --best --strata
Report all alignments: -a --best
Report all alignments in the best stratum: -a --best --strata

bam

label:: Alignment file
type:: basic:file
description:: Position sorted alignment

bai

label:: Index BAI
type:: basic:file

unmapped

label:: Unmapped reads
type:: basic:file
required:: False

stats

label:: Statistics
type:: basic:file

species

label:: Species
type:: basic:string

build

label:: Build
type:: basic:string

Bowtie genome index

data:index:bowtie:bowtie-index (data:seq:nucleotide ref_seq)[Source: v1.2.1]

Create Bowtie genome index.

ref_seq

label:: Reference sequence (nucleotide FASTA)
type:: data:seq:nucleotide
required:: True
disabled:: False
hidden:: False

index

label:: Bowtie index
type:: basic:dir
required:: True
disabled:: False
hidden:: False

fastagz

label:: FASTA file (compressed)
type:: basic:file
required:: True
disabled:: False
hidden:: False

fasta

label:: FASTA file
type:: basic:file
required:: True
disabled:: False
hidden:: False

fai

label:: FASTA file index
type:: basic:file
required:: True
disabled:: False
hidden:: False

species

label:: Species
type:: basic:string
required:: True
disabled:: False
hidden:: False

build

label:: Build
type:: basic:string
required:: True
disabled:: False
hidden:: False

Bowtie2

data:alignment:bam:bowtie2alignment-bowtie2 (data:index:bowtie2 genome, data:reads:fastq reads, basic:string mode, basic:string speed, basic:boolean use_se, basic:boolean discordantly, basic:boolean rep_se, basic:integer minins, basic:integer maxins, basic:boolean no_overlap, basic:boolean dovetail, basic:integer N, basic:integer L, basic:integer gbar, basic:string mp, basic:string rdg, basic:string rfg, basic:string score_min, basic:integer trim_5, basic:integer trim_3, basic:integer trim_iter, basic:integer trim_nucl, basic:string rep_mode, basic:integer k_reports, basic:boolean no_unal)[Source: v2.8.2]

Bowtie is an ultrafast, memory-efficient short read aligner. It aligns short DNA sequences (reads) to the human genome at a rate of over 25 million 35-bp reads per hour. Bowtie indexes the genome with a Burrows-Wheeler index to keep its memory footprint small, typically about 2.2 GB for the human genome (2.9 GB for paired-end). See [here](http://bowtie-bio.sourceforge.net/index.shtml) for more information.

genome

label:: Reference genome
type:: data:index:bowtie2

reads

label:: Reads
type:: data:reads:fastq

mode

label:: Alignment mode
type:: basic:string
description:: End to end: Bowtie 2 requires that the entire read align from one end to the other, without any trimming (or “soft clipping”) of characters from either end. Local: Bowtie 2 does not require that the entire read align from one end to the other. Rather, some characters may be omitted (“soft clipped”) from the ends in order to achieve the greatest possible alignment score.
default:: --end-to-end
choices::

end to end mode: --end-to-end
local: --local

speed

label:: Speed vs. Sensitivity
type:: basic:string
description:: A quick setting for aligning fast or accurately. Each preset is a shortcut for the following parameters:

For --end-to-end:
--very-fast: -D 5 -R 1 -N 0 -L 22 -i S,0,2.50
--fast: -D 10 -R 2 -N 0 -L 22 -i S,0,2.50
--sensitive (default): -D 15 -R 2 -N 0 -L 22 -i S,1,1.15
--very-sensitive: -D 20 -R 3 -N 0 -L 20 -i S,1,0.50

For --local:
--very-fast-local: -D 5 -R 1 -N 0 -L 25 -i S,1,2.00
--fast-local: -D 10 -R 2 -N 0 -L 22 -i S,1,1.75
--sensitive-local (default): -D 15 -R 2 -N 0 -L 20 -i S,1,0.75
--very-sensitive-local: -D 20 -R 3 -N 0 -L 20 -i S,1,0.50

required:: False
choices::

Very fast: --very-fast
Fast: --fast
Sensitive: --sensitive
Very sensitive: --very-sensitive

PE_options.use_se

label:: Map as single-ended (for paired-end reads only)
type:: basic:boolean
description:: If this option is selected, paired-end reads will be mapped as single-ended and the other paired-end options are ignored.
default:: False

PE_options.discordantly

label:: Report discordantly matched read
type:: basic:boolean
description:: If both mates have unique alignments but the alignments do not match paired-end expectations (orientation and relative distance), the alignment is still reported. Useful for detecting structural variations.
default:: True

PE_options.rep_se

label:: Report single ended
type:: basic:boolean
description:: If a paired alignment cannot be found, Bowtie2 tries to find alignments for the individual mates.
default:: True

PE_options.minins

label:: Minimal distance
type:: basic:integer
description:: The minimum fragment length for valid paired-end alignments. 0 imposes no minimum.
default:: 0

PE_options.maxins

label:: Maximal distance
type:: basic:integer
description:: The maximum fragment length for valid paired-end alignments.
default:: 500

PE_options.no_overlap

label:: Not concordant when mates overlap
type:: basic:boolean
description:: When true, mates that overlap at all are not considered concordant. Default is false.
default:: False

PE_options.dovetail

label:: Dovetail
type:: basic:boolean
description:: If the mates “dovetail”, that is if one mate alignment extends past the beginning of the other such that the wrong mate begins upstream, consider that to be concordant. Default: mates cannot dovetail in a concordant alignment.
default:: False

alignment_options.N

label:: Number of mismatches allowed in seed alignment (N)
type:: basic:integer
description:: Sets the number of mismatches allowed in a seed alignment during multiseed alignment. Can be set to 0 or 1. Setting this higher makes alignment slower (often much slower) but increases sensitivity. Default: 0.
required:: False

alignment_options.L

label:: Length of seed substrings (L)
type:: basic:integer
description:: Sets the length of the seed substrings to align during multiseed alignment. Smaller values make alignment slower but more sensitive. Default: taken from the --sensitive preset for end-to-end alignment and from the --sensitive-local preset for local alignment. See documentation for details.
required:: False

alignment_options.gbar

label:: Disallow gaps within positions (gbar)
type:: basic:integer
description:: Disallow gaps within <int> positions of the beginning or end of the read. Default: 4.
required:: False

alignment_options.mp

label:: Maximal and minimal mismatch penalty (mp)
type:: basic:string
description:: Sets the maximum (MX) and minimum (MN) mismatch penalties, both integers. A number less than or equal to MX and greater than or equal to MN is subtracted from the alignment score for each position where a read character aligns to a reference character, the characters do not match, and neither is an N. If --ignore-quals is specified, the number subtracted equals MX. Otherwise, the number subtracted is MN + floor((MX-MN)(MIN(Q, 40.0)/40.0)) where Q is the Phred quality value. Default for MX, MN: 6,2.
required:: False
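The quality-scaled mismatch penalty above can be written as a small Python function. This is an illustrative sketch of the stated formula with the default MX,MN of 6,2. `mismatch_penalty` is a hypothetical helper, not part of Bowtie 2.

```python
import math


def mismatch_penalty(q: float, mx: int = 6, mn: int = 2, ignore_quals: bool = False) -> int:
    """Penalty subtracted for one mismatched position (hypothetical helper)."""
    if ignore_quals:
        # With --ignore-quals every mismatch costs the maximum penalty.
        return mx
    # Qualities above 40 are capped at 40 before scaling between MN and MX.
    return mn + math.floor((mx - mn) * (min(q, 40.0) / 40.0))
```

With the defaults, a Q40 mismatch costs 6, a Q20 mismatch costs 4, and a Q0 mismatch costs the minimum of 2.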

alignment_options.rdg

label:: Set read gap open and extend penalties (rdg)
type:: basic:string
description:: Sets the read gap open (<int1>) and extend (<int2>) penalties. A read gap of length N gets a penalty of <int1> + N * <int2>. Default: 5,3.
required:: False

alignment_options.rfg

label:: Set reference gap open and extend penalties (rfg)
type:: basic:string
description:: Sets the reference gap open (<int1>) and extend (<int2>) penalties. A reference gap of length N gets a penalty of <int1> + N * <int2>. Default: 5,3.
required:: False
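The rdg and rfg inputs are both '<int1>,<int2>' strings, and the gap penalty grows linearly with gap length. The sketch below assumes the string format shown in the defaults. The helper names are hypothetical.

```python
from typing import Tuple


def parse_penalty_pair(value: str) -> Tuple[int, int]:
    """Parse an '<int1>,<int2>' string such as the rdg/rfg default '5,3'."""
    first, second = value.split(",")
    return int(first), int(second)


def gap_penalty(length: int, setting: str = "5,3") -> int:
    """Penalty for a read or reference gap of `length` positions: open + length * extend."""
    open_penalty, extend_penalty = parse_penalty_pair(setting)
    return open_penalty + length * extend_penalty
```

With the default 5,3, a 1-bp gap costs 8 and a 3-bp gap costs 14.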

alignment_options.score_min

label:: Minimum alignment score needed for “valid” alignment (score_min)
type:: basic:string
description:: Sets a function governing the minimum alignment score needed for an alignment to be considered “valid” (i.e. good enough to report). This is a function of read length. For instance, specifying L,0,-0.6 sets the minimum-score function to f(x) = 0 + -0.6 * x, where x is the read length. The default in --end-to-end mode is L,-0.6,-0.6 and the default in --local mode is G,20,8.
required:: False
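The score_min specification can be evaluated with a short Python function. This sketch handles the linear (L), square-root (S), and natural-log (G) function types from the Bowtie 2 manual and deliberately leaves out the constant type. `min_score` is a hypothetical helper.

```python
import math

# g(x) for the function types handled in this sketch (constant type omitted).
_SHAPES = {
    "L": lambda x: x,  # linear
    "S": math.sqrt,    # square root
    "G": math.log,     # natural logarithm
}


def min_score(spec: str, read_length: int) -> float:
    """Evaluate a spec such as 'L,-0.6,-0.6' as constant + coefficient * g(read_length)."""
    kind, constant, coefficient = spec.split(",")
    if kind not in _SHAPES:
        raise ValueError(f"unsupported function type: {kind}")
    return float(constant) + float(coefficient) * _SHAPES[kind](read_length)
```

For a 100-bp read, the end-to-end default L,-0.6,-0.6 gives a minimum score of -60.6. The local default G,20,8 gives about 56.8.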

start_trimming.trim_5

label:: Bases to trim from 5’
type:: basic:integer
description:: Number of bases to trim from the 5’ (left) end of each read before alignment.
default:: 0

start_trimming.trim_3

label:: Bases to trim from 3’
type:: basic:integer
description:: Number of bases to trim from the 3’ (right) end of each read before alignment.
default:: 0

trimming.trim_iter

label:: Iterations
type:: basic:integer
description:: Number of iterations.
default:: 0

trimming.trim_nucl

label:: Bases to trim
type:: basic:integer
description:: Number of bases to trim from 3’ end in each iteration.
default:: 2

reporting.rep_mode

label:: Report mode
type:: basic:string
description:: Default mode: search for multiple alignments, report the best one; -k mode: search for one or more alignments, report each; -a mode: search for and report all alignments.
default:: def
choices::

Default mode: def
-k mode: k
-a mode (very slow): a

reporting.k_reports

label:: Number of reports (for -k mode only)
type:: basic:integer
description:: Searches for at most X distinct, valid alignments for each read. The search terminates when it can’t find more distinct valid alignments, or when it finds X, whichever happens first. default: 5
default:: 5

output_opts.no_unal

label:: Suppress SAM records for unaligned reads
type:: basic:boolean
description:: When true, suppress SAM records for unaligned reads. Default is false.
default:: False

bam

label:: Alignment file
type:: basic:file
description:: Position sorted alignment

bai

label:: Index BAI
type:: basic:file

unmapped

label:: Unmapped reads
type:: basic:file
required:: False

stats

label:: Statistics
type:: basic:file

species

label:: Species
type:: basic:string

build

label:: Build
type:: basic:string

Bowtie2 genome index

data:index:bowtie2:bowtie2-index (data:seq:nucleotide ref_seq)[Source: v1.2.1]

Create Bowtie2 genome index.

ref_seq

label:: Reference sequence (nucleotide FASTA)
type:: data:seq:nucleotide
required:: True
disabled:: False
hidden:: False

index

label:: Bowtie2 index
type:: basic:dir
required:: True
disabled:: False
hidden:: False

fastagz

label:: FASTA file (compressed)
type:: basic:file
required:: True
disabled:: False
hidden:: False

fasta

label:: FASTA file
type:: basic:file
required:: True
disabled:: False
hidden:: False

fai

label:: FASTA file index
type:: basic:file
required:: True
disabled:: False
hidden:: False

species

label:: Species
type:: basic:string
required:: True
disabled:: False
hidden:: False

build

label:: Build
type:: basic:string
required:: True
disabled:: False
hidden:: False

Calculate coverage (bamCoverage)

data:coverage:bigwig:calculate-bigwig (data:alignment:bam alignment, data:bedpe bedpe, basic:decimal scale, basic:integer bin_size)[Source: v2.0.1]

Calculate bigWig coverage track. deepTools bamCoverage takes an alignment of reads or fragments as input (BAM file) and generates a coverage track (bigWig) as output. The coverage is calculated as the number of reads per bin, where bins are short consecutive counting windows of a defined size. More information is available in the [bamCoverage documentation](https://deeptools.readthedocs.io/en/latest/content/tools/bamCoverage.html).

alignment

label:: Alignment BAM file
type:: data:alignment:bam
required:: True
disabled:: False
hidden:: False

bedpe

label:: BEDPE Normalization factor
type:: data:bedpe
description:: The BEDPE file describes disjoint genome features, such as structural variations or paired-end sequence alignments. It is used to estimate the scale factor [--scaleFactor].
required:: False
disabled:: False
hidden:: False

scale

label:: Scale for the normalization factor
type:: basic:decimal
description:: Magnitude of the scale factor. The scale factor [--scaleFactor] is calculated by dividing the scale by the number of features in the BEDPE file (scale/(number of features)).
required:: True
disabled:: !bedpe
hidden:: False
default:: 10000
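The normalization factor is a single division. The sketch below counts every non-empty line that is not a comment or a track/browser header as one BEDPE feature. That counting rule is an assumption, and the real process may count features differently. The helper names are hypothetical. The resulting value is what would be passed to bamCoverage as --scaleFactor.

```python
from typing import Iterable


def count_bedpe_features(lines: Iterable[str]) -> int:
    """Count feature lines, skipping blanks, '#' comments and track/browser headers."""
    skip_prefixes = ("#", "track", "browser")
    return sum(1 for line in lines if line.strip() and not line.startswith(skip_prefixes))


def scale_factor(lines: Iterable[str], scale: float = 10000.0) -> float:
    """Return scale / (number of features); `lines` may be an open file handle."""
    n_features = count_bedpe_features(lines)
    if n_features == 0:
        raise ValueError("BEDPE file contains no features")
    return scale / n_features
```

For example, a BEDPE file with 2,000,000 features and the default scale of 10000 yields a scale factor of 0.005.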

bin_size

label:: Bin size [--binSize]
type:: basic:integer
description:: Size of the bins (in bp) for the output bigWig file. A smaller bin size value will result in a higher resolution of the coverage track but also in a larger file size.
required:: True
disabled:: False
hidden:: False
default:: 50

bigwig

label:: Coverage file (bigWig)
type:: basic:file
required:: True
disabled:: False
hidden:: False

species

label:: Species
type:: basic:string
required:: True
disabled:: False
hidden:: False

build

label:: Build
type:: basic:string
required:: True
disabled:: False
hidden:: False

Cell Ranger Count

data:scexpression:10x:cellranger-count (data:screads:10x: reads, data:genomeindex:10x: genome_index, basic:string chemistry, basic:integer trim_r1, basic:integer trim_r2, basic:integer expected_cells, basic:integer force_cells)[Source: v1.2.2]

Perform gene expression analysis. Generate single cell feature counts for a single library. https://support.10xgenomics.com/single-cell-gene-expression/software/pipelines/latest/using/count

reads

label:: 10x reads data object
type:: data:screads:10x:
required:: True
disabled:: False
hidden:: False

genome_index

label:: 10x genome index data object
type:: data:genomeindex:10x:
required:: True
disabled:: False
hidden:: False

chemistry

label:: Chemistry
type:: basic:string
description:: Assay configuration. By default, the assay configuration is detected automatically, which is the recommended mode. Specify the chemistry only if automatic detection produces an error.
required:: False
disabled:: False
hidden:: False
default:: auto
choices::

auto: auto
threeprime: Single Cell 3'
fiveprime: Single Cell 5'
SC3Pv1: Single Cell 3' v1
SC3Pv2: Single Cell 3' v2
SC3Pv3: Single Cell 3' v3
SC5P-PE: Single Cell 5' paired-end
SC5P-R2: Single Cell 5' R2-only

trim_r1

label:: Trim R1
type:: basic:integer
description:: Hard-trim the input R1 sequence to this length. Note that the length includes the barcode and UMI sequences, so do not set this below 26 for Single Cell 3’ v2 or Single Cell 5’. This and “Trim R2” are useful for determining the optimal read length for sequencing.
required:: False
disabled:: False
hidden:: False

trim_r2

label:: Trim R2
type:: basic:integer
description:: Hard-trim the input R2 sequence to this length.
required:: False
disabled:: False
hidden:: False

expected_cells

label:: Expected number of recovered cells
type:: basic:integer
required:: True
disabled:: False
hidden:: False
default:: 3000

force_cells

label:: Force cell number
type:: basic:integer
description:: Force pipeline to use this number of cells, bypassing the cell detection algorithm. Use this if the number of cells estimated by Cell Ranger is not consistent with the barcode rank plot.
required:: False
disabled:: False
hidden:: False

matrix_filtered

label:: Matrix (filtered)
type:: basic:file
required:: True
disabled:: False
hidden:: False

genes_filtered

label:: Genes (filtered)
type:: basic:file
required:: True
disabled:: False
hidden:: False

barcodes_filtered

label:: Barcodes (filtered)
type:: basic:file
required:: True
disabled:: False
hidden:: False

matrix_raw

label:: Matrix (raw)
type:: basic:file
required:: True
disabled:: False
hidden:: False

genes_raw

label:: Genes (raw)
type:: basic:file
required:: True
disabled:: False
hidden:: False

barcodes_raw

label:: Barcodes (raw)
type:: basic:file
required:: True
disabled:: False
hidden:: False

report

label:: Report
type:: basic:file:html
required:: True
disabled:: False
hidden:: False

build

label:: Build
type:: basic:string
required:: True
disabled:: False
hidden:: False

species

label:: Species
type:: basic:string
required:: True
disabled:: False
hidden:: False

source

label:: Gene ID source
type:: basic:string
required:: True
disabled:: False
hidden:: False

Cell Ranger Mkref

data:genomeindex:10x:cellranger-mkref (data:seq:nucleotide: genome, data:annotation:gtf: annotation)[Source: v2.1.3]

Reference preparation tool for 10x Genomics Cell Ranger. Build a Cell Ranger-compatible reference from genome FASTA and gene GTF files. https://support.10xgenomics.com/single-cell-gene-expression/software/pipelines/latest/advanced/references

genome

label:: Reference genome
type:: data:seq:nucleotide:
required:: True
disabled:: False
hidden:: False

annotation

label:: Annotation
type:: data:annotation:gtf:
required:: True
disabled:: False
hidden:: False

genome_index

label:: Indexed genome
type:: basic:dir
required:: True
disabled:: False
hidden:: False

build

label:: Build
type:: basic:string
required:: True
disabled:: False
hidden:: False

species

label:: Species
type:: basic:string
required:: True
disabled:: False
hidden:: False

source

label:: Gene ID source
type:: basic:string
required:: True
disabled:: False
hidden:: False

ChIP-Seq (Gene Score)

data:chipseq:genescorechipseq-genescore (data:chipseq:peakscore peakscore, basic:decimal fdr, basic:decimal pval, basic:decimal logratio)[Source: v1.3.1]

ChIP-Seq analysis: Gene Score (BCM)

peakscore

label:: PeakScore file
type:: data:chipseq:peakscore
description:: PeakScore file

fdr

label:: FDR threshold
type:: basic:decimal
description:: FDR threshold value (default = 0.00005).
default:: 5e-05

pval

label:: Pval threshold
type:: basic:decimal
description:: Pval threshold value (default = 0.00005).
default:: 5e-05

logratio

label:: Log-ratio threshold
type:: basic:decimal
description:: Log-ratio threshold value (default = 2).
default:: 2.0

genescore

label:: Gene Score
type:: basic:file

ChIP-Seq (Peak Score)

data:chipseq:peakscorechipseq-peakscore (data:chipseq:callpeak:macs2 peaks, data:bed bed)[Source: v2.3.1]

ChIP-Seq analysis: Peak Score (BCM)

peaks

label:: MACS2 results
type:: data:chipseq:callpeak:macs2
description:: MACS2 results file (NarrowPeak)

bed

label:: BED file
type:: data:bed

peak_score

label:: Peak Score
type:: basic:file

ChIP-seq (MACS2)

data:chipseq:batch:macs2macs2-batch (list:data:alignment:bam alignments, data:bed promoter, basic:boolean tagalign, basic:integer q_threshold, basic:integer n_sub, basic:boolean tn5, basic:integer shift, basic:string duplicates, basic:string duplicates_prepeak, basic:decimal qvalue, basic:decimal pvalue, basic:decimal pvalue_prepeak, basic:integer cap_num, basic:integer mfold_lower, basic:integer mfold_upper, basic:integer slocal, basic:integer llocal, basic:integer extsize, basic:integer shift, basic:integer band_width, basic:boolean nolambda, basic:boolean fix_bimodal, basic:boolean nomodel, basic:boolean nomodel_prepeak, basic:boolean down_sample, basic:boolean bedgraph, basic:boolean spmr, basic:boolean call_summits, basic:boolean broad, basic:decimal broad_cutoff, data:bed blacklist, basic:boolean calculate_enrichment, basic:integer profile_window, basic:string shift_size)[Source: v1.5.1]

This process runs MACS2 in batch mode. MACS2 analysis is triggered for pairs of samples as defined by treatment-background sample relations. If no sample relations are defined, each sample is analyzed individually. Model-based Analysis of ChIP-Seq (MACS 2.0) is used to identify transcription factor binding sites. MACS 2.0 captures the influence of genome complexity to evaluate the significance of enriched ChIP regions, and MACS improves the spatial resolution of binding sites by combining information on both sequencing tag position and orientation. It also has an option to link nearby peaks together in order to call broad peaks. See [here](https://github.com/taoliu/MACS/) for more information. In addition to peak calling, this process computes ChIP-Seq and ATAC-Seq QC metrics. The process returns a QC metrics report, a fragment length estimation, and a deduplicated tagAlign file. The QC report contains ENCODE 3 proposed QC metrics: [NRF](https://www.encodeproject.org/data-standards/terms/), [PBC bottlenecking coefficients, NSC, and RSC](https://genome.ucsc.edu/ENCODE/qualityMetrics.html#chipSeq).

alignments

label:: Aligned reads
type:: list:data:alignment:bam
description:: Select multiple treatment/background samples.

promoter

label:: Promoter regions BED file
type:: data:bed
description:: BED file containing promoter regions (for example, TSS ± 1000 bp). Needed to get the number of peaks and reads mapped to promoter regions.
required:: False

tagalign

label:: Use tagAlign files
type:: basic:boolean
description:: Use filtered tagAlign files as case (treatment) and control (background) samples. If the extsize parameter is not set, MACS is run using the input’s estimated fragment length.
default:: True

prepeakqc_settings.q_threshold

label:: Quality filtering threshold
type:: basic:integer
default:: 30

prepeakqc_settings.n_sub

label:: Number of reads to subsample
type:: basic:integer
default:: 15000000

prepeakqc_settings.tn5

label:: Tn5 shifting
type:: basic:boolean
description:: Tn5 transposon shifting. Shift reads on “+” strand by 4 bp and reads on “-” strand by 5 bp.
default:: False
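The sketch below applies Tn5 shifting to one tagAlign (BED6) record. It assumes the ENCODE-style convention of moving only the 5' end of each read: the start coordinate on the “+” strand and the end coordinate on the “-” strand. `tn5_shift` is a hypothetical helper, not this process's code.

```python
from typing import List


def tn5_shift(fields: List[str]) -> List[str]:
    """Shift one tagAlign/BED6 record split into columns (hypothetical helper).

    "+" strand: the 5' end is `start`, so it moves 4 bp downstream.
    "-" strand: the 5' end is `end`, so it moves 5 bp upstream.
    """
    shifted = list(fields)
    if shifted[5] == "+":
        shifted[1] = str(int(shifted[1]) + 4)
    elif shifted[5] == "-":
        shifted[2] = str(int(shifted[2]) - 5)
    return shifted
```

For example, `tn5_shift("chr1 100 136 N 1000 +".split())` moves the start to 104. The same read on the “-” strand keeps its start and has its end moved to 131.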

prepeakqc_settings.shift

label:: User-defined cross-correlation peak strandshift
type:: basic:integer
description:: If defined, SPP tool will not try to estimate fragment length but will use the given value as fragment length.
required:: False

settings.duplicates

label:: Number of duplicates
type:: basic:string
description:: Controls the MACS behavior towards duplicate tags at the exact same location, i.e. the same coordinate and the same strand. The ‘auto’ option makes MACS calculate the maximum number of tags at the exact same location based on the binomial distribution using 1e-5 as the p-value cutoff, and the ‘all’ option keeps all the tags. If an integer is given, at most this number of tags will be kept at the same location. The default is to keep one tag at the same location.
required:: False
hidden:: tagalign
choices::

1: 1
auto: auto
all: all
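The ‘auto’ option can be sketched as a binomial quantile. As a simplification, this sketch assumes tags land uniformly over an effective genome of `genome_size` bases, so each position receives a given tag with probability 1/genome_size. `max_dup_tags` is a hypothetical helper, not the MACS2 implementation. It clamps the result to at least one tag, which is a choice of this sketch.

```python
import math


def max_dup_tags(total_tags: int, genome_size: float, pvalue: float = 1e-5) -> int:
    """Smallest k with P(X <= k) >= 1 - pvalue, X ~ Binomial(total_tags, 1/genome_size)."""
    p = 1.0 / genome_size
    log_n_fact = math.lgamma(total_tags + 1)
    cdf = 0.0
    k = 0
    while k <= total_tags:
        # log of the binomial probability mass at k, computed in log space for stability
        log_pmf = (
            log_n_fact
            - math.lgamma(k + 1)
            - math.lgamma(total_tags - k + 1)
            + k * math.log(p)
            + (total_tags - k) * math.log1p(-p)
        )
        cdf += math.exp(log_pmf)
        if cdf >= 1.0 - pvalue:
            break
        k += 1
    # Keep at least one tag per position (an assumption of this sketch).
    return max(k, 1)
```

For roughly 20 million tags on a 2.7 Gb effective genome this gives 2. For 200 million tags it gives 3.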

settings.duplicates_prepeak

label:: Number of duplicates
type:: basic:string
description:: Controls the MACS behavior towards duplicate tags at the exact same location, i.e. the same coordinate and the same strand. The ‘auto’ option makes MACS calculate the maximum number of tags at the exact same location based on the binomial distribution using 1e-5 as the p-value cutoff, and the ‘all’ option keeps all the tags. If an integer is given, at most this number of tags will be kept at the same location. The default is to keep one tag at the same location.
required:: False
hidden:: !tagalign
default:: all
choices::

1: 1
auto: auto
all: all

settings.qvalue

label:: Q-value cutoff
type:: basic:decimal
description:: The q-value (minimum FDR) cutoff to call significant regions. Q-values are calculated from p-values using Benjamini-Hochberg procedure.
required:: False
disabled:: settings.pvalue && settings.pvalue_prepeak
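For reference, the textbook Benjamini-Hochberg adjustment on a list of p-values looks like this in Python. It is a sketch of the general procedure, not MACS2's internal implementation, which works on its own score distribution. `bh_qvalues` is a hypothetical helper.

```python
from typing import List, Sequence


def bh_qvalues(pvalues: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values (q-values), returned in input order."""
    m = len(pvalues)
    # Walk from the largest p-value to the smallest, keeping a running minimum
    # so that q-values never decrease as p-values increase.
    order = sorted(range(m), key=lambda i: pvalues[i], reverse=True)
    qvalues = [0.0] * m
    running_min = 1.0
    for position, i in enumerate(order):
        rank = m - position  # 1-based rank in ascending p-value order
        running_min = min(running_min, pvalues[i] * m / rank)
        qvalues[i] = running_min
    return qvalues
```

A region would pass a Q-value cutoff of 0.05 when its adjusted value is at most 0.05.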

settings.pvalue

label:: P-value cutoff
type:: basic:decimal
description:: The p-value cutoff. If specified, MACS2 will use p-value instead of q-value cutoff.
required:: False
disabled:: settings.qvalue
hidden:: tagalign

settings.pvalue_prepeak

label:: P-value cutoff
type:: basic:decimal
description:: The p-value cutoff. If specified, MACS2 will use p-value instead of q-value cutoff.
disabled:: settings.qvalue
hidden:: !tagalign || settings.qvalue
default:: 1e-05

settings.cap_num

label:: Cap number of peaks by taking top N peaks
type:: basic:integer
description:: To keep all peaks set value to 0.
disabled:: settings.broad
default:: 500000
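Capping the peak list amounts to ranking peaks by significance and keeping the top N. The sketch below assumes each peak is a (name, score) tuple, with higher scores meaning more significant. The tuple layout and `cap_peaks` are hypothetical.

```python
from typing import List, Sequence, Tuple

Peak = Tuple[str, float]  # (peak name, significance score), hypothetical layout


def cap_peaks(peaks: Sequence[Peak], cap_num: int = 500000) -> List[Peak]:
    """Keep the cap_num highest-scoring peaks; cap_num == 0 keeps all peaks."""
    ranked = sorted(peaks, key=lambda peak: peak[1], reverse=True)
    return ranked if cap_num == 0 else ranked[:cap_num]
```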

settings.mfold_lower

label:: MFOLD range (lower limit)
type:: basic:integer
description:: This parameter is used to select the regions within the MFOLD range of high-confidence enrichment ratio against background to build the model. The fold enrichment of a selected region must be lower than the upper limit and higher than the lower limit. The default of 10,30 means using all regions that are not too low (>10) and not too high (<30) to build the paired-peaks model. If MACS cannot find more than 100 regions to build the model, it will use the --extsize parameter to continue the peak detection ONLY if --fix-bimodal is set.
required:: False

settings.mfold_upper

label:: MFOLD range (upper limit)
type:: basic:integer
description:: This parameter is used to select the regions within the MFOLD range of high-confidence enrichment ratio against background to build the model. The fold enrichment of a selected region must be lower than the upper limit and higher than the lower limit. The default of 10,30 means using all regions that are not too low (>10) and not too high (<30) to build the paired-peaks model. If MACS cannot find more than 100 regions to build the model, it will use the --extsize parameter to continue the peak detection ONLY if --fix-bimodal is set.
required:: False
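The MFOLD selection rule is a strict range filter on fold enrichment values. The sketch below uses the 10,30 default and the more-than-100-regions requirement quoted above. The helper names are hypothetical.

```python
from typing import List, Sequence


def model_regions(fold_enrichments: Sequence[float], lower: float = 10, upper: float = 30) -> List[int]:
    """Indices of candidate regions whose fold enrichment lies strictly inside (lower, upper)."""
    return [i for i, fold in enumerate(fold_enrichments) if lower < fold < upper]


def can_build_model(selected: Sequence[int], minimum: int = 100) -> bool:
    """The paired-peak model needs more than `minimum` selected regions."""
    return len(selected) > minimum
```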

settings.slocal

label:: Small local region
type:: basic:integer
description:: Slocal and llocal parameters control which two levels of regions will be checked around the peak regions to calculate the maximum lambda as local lambda. By default, MACS considers 1000 bp for the small local region (--slocal) and 10000 bp for the large local region (--llocal), which captures the bias from a long-range effect like an open chromatin domain. You can tweak these according to your project. Remember that if the region is set too small, a sharp spike in the input data may kill a significant peak.
required:: False

settings.llocal

label:: Large local region
type:: basic:integer
description:: Slocal and llocal parameters control which two levels of regions will be checked around the peak regions to calculate the maximum lambda as local lambda. By default, MACS considers 1000 bp for the small local region (--slocal) and 10000 bp for the large local region (--llocal), which captures the bias from a long-range effect like an open chromatin domain. You can tweak these according to your project. Remember that if the region is set too small, a sharp spike in the input data may kill a significant peak.
required:: False
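The local lambda idea can be sketched as a maximum over a genome-wide background and window-based estimates. This is a simplification that rests on several assumptions. Window counts are taken as control tags in a window centred on the candidate peak. Each estimate is rescaled to the fragment size d. The control-to-treatment depth scaling that MACS performs is ignored. `local_lambda` is a hypothetical helper.

```python
from typing import Mapping


def local_lambda(
    control_tags: int,
    genome_size: float,
    d: int,
    window_counts: Mapping[int, int],
    nolambda: bool = False,
) -> float:
    """Simplified local lambda (window_counts maps window size in bp to control tags in it)."""
    # Expected tags in a d-sized region if control tags were spread evenly over the genome.
    lambda_bg = control_tags * d / genome_size
    if nolambda:
        return lambda_bg
    # Rescale each window count to a d-sized region, then take the maximum.
    window_lambdas = [count * d / size for size, count in window_counts.items()]
    return max([lambda_bg, *window_lambdas])
```

With 10 million control tags on a 2 Gb genome and d = 200, the background is 1.0. Ten tags in the 1000-bp window raise the local lambda to 2.0. Setting `nolambda=True` returns the background value alone, which corresponds to the nolambda option.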

settings.extsize

label:: extsize
type:: basic:integer
description:: While ‘--nomodel’ is set, MACS uses this parameter to extend reads in the 5’->3’ direction to fixed-size fragments. For example, if the size of the binding region for your transcription factor is 200 bp and you want to bypass model building by MACS, this parameter can be set to 200. This option is only valid when --nomodel is set or when MACS fails to build a model and --fix-bimodal is on.
required:: False

settings.shift

label:: Shift
type:: basic:integer
description:: Note, this is NOT the legacy --shiftsize option, which is replaced by --extsize! You can set an arbitrary shift in bp here. Use discretion when setting it to a value other than the default (0). When --nomodel is set, MACS will use this value to move cutting ends (5’) and then apply --extsize from the 5’ to 3’ direction to extend them to fragments. When this value is negative, ends will be moved in the 3’->5’ direction, otherwise in the 5’->3’ direction. It is recommended to keep the default of 0 for ChIP-Seq datasets, or -1 * half of EXTSIZE together with the --extsize option for detecting enriched cutting loci such as certain DNAseI-Seq datasets. Note that you can’t set values other than 0 if the format is BAMPE for paired-end data. Default is 0.
required:: False
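The sketch below shows how shift and extsize combine for a single read in 0-based, half-open BED coordinates. The coordinate convention is an assumption, and `tag_to_fragment` is a hypothetical helper. The 5' end is moved by shift, then extended by extsize in the 5'->3' direction of the read's strand.

```python
from typing import Tuple


def tag_to_fragment(start: int, end: int, strand: str, extsize: int, shift: int = 0) -> Tuple[int, int]:
    """Turn one read (0-based, half-open BED coordinates) into a fixed-size fragment."""
    if strand == "+":
        # 5' end is `start`; a positive shift moves it 5'->3', i.e. to higher coordinates.
        five_prime = start + shift
        return five_prime, five_prime + extsize
    # On the minus strand the 5' end is `end` and 5'->3' points to lower coordinates.
    five_prime = end - shift
    return five_prime - extsize, five_prime
```

Setting shift to -1 * half of extsize (for example shift=-100, extsize=200) centres the fragment on the original 5' end. This is the cutting-loci setup mentioned in the Shift description.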

settings.band_width

label:: Band width
type:: basic:integer
description:: The band width which is used to scan the genome ONLY for model building. You can set this parameter as the sonication fragment size expected from wet experiment. The previous side effect on the peak detection process has been removed. So this parameter only affects the model building.
required:: False

settings.nolambda

label:: Use background lambda as local lambda
type:: basic:boolean
description:: With this flag on, MACS will use the background lambda as local lambda. This means MACS will not consider the local bias at peak candidate regions.
default:: False

settings.fix_bimodal

label:: Turn on the auto paired-peak model process
type:: basic:boolean
description:: Turn on the auto paired-peak model process. If it is set and MACS fails to build the paired model, MACS will use the nomodel settings and the ‘--extsize’ parameter to extend each tag. If it is not set, MACS will be terminated if building the paired-peak model fails.
default:: False

settings.nomodel

label:: Bypass building the shifting model
type:: basic:boolean
description:: While on, MACS will bypass building the shifting model.
hidden:: tagalign
default:: False

settings.nomodel_prepeak

label:: Bypass building the shifting model
type:: basic:boolean
description:: While on, MACS will bypass building the shifting model.
hidden:: !tagalign
default:: True

settings.down_sample

label:: Down-sample
type:: basic:boolean
description:: When set to true, a random sampling method will scale down the bigger sample. By default, MACS uses linear scaling. This option makes the results unstable and irreproducible, since different random reads are selected on each run; in particular, the numbers (pileup, p-value, q-value) will change.
default:: False

settings.bedgraph

label:: Save fragment pileup and control lambda
type:: basic:boolean
description:: If this flag is on, MACS will store the fragment pileup, control lambda, -log10pvalue and -log10qvalue scores in bedGraph files. The bedGraph files will be stored in the current directory, named NAME+’_treat_pileup.bdg’ for treatment data, NAME+’_control_lambda.bdg’ for local lambda values from control, NAME+’_treat_pvalue.bdg’ for Poisson p-value scores (in -log10(pvalue) form), and NAME+’_treat_qvalue.bdg’ for q-value scores from the Benjamini-Hochberg-Yekutieli procedure.
default:: True

settings.spmr

label:: Save signal per million reads for fragment pileup profiles
type:: basic:boolean
disabled:: settings.bedgraph === false
default:: True
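Signal per million reads rescales the fragment pileup by sequencing depth. The sketch below assumes the normalization is a plain division by the number of reads in millions. `spmr` is a hypothetical helper, not the MACS2 implementation.

```python
from typing import List, Sequence


def spmr(pileup: Sequence[float], total_reads: int) -> List[float]:
    """Scale raw fragment pileup values to signal per million reads."""
    reads_in_millions = total_reads / 1_000_000
    return [value / reads_in_millions for value in pileup]
```

With 2 million reads, a raw pileup of 10 becomes 5.0.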

settings.call_summits

label:: Call summits
type:: basic:boolean
description:: MACS reanalyzes the shape of the signal profile (p- or q-score, depending on the cutoff setting) to deconvolve subpeaks within each peak called by the general procedure. This is highly recommended for detecting adjacent binding events. When used, the subpeaks reported for a large peak region share the same peak boundaries but have different scores and summit positions.
default:: False

settings.broad

label:: Composite broad regions
type:: basic:boolean
description:: When this flag is on, MACS tries to composite broad regions in BED12 (a gene-model-like format) by merging nearby highly enriched regions into a broad region with a looser cutoff. The broad region is controlled by a separate cutoff set with --broad-cutoff. The maximum length of a broad region is 4 times d, the fragment size estimated by MACS.
disabled:: settings.call_summits === true
default:: False

settings.broad_cutoff

label:: Broad cutoff
type:: basic:decimal
description:: Cutoff for broad regions. This option is only available when --broad is set. If -p is set, this is a p-value cutoff; otherwise, it is a q-value cutoff. Default: 0.1.
required:: False
disabled:: settings.call_summits === true || settings.broad !== true

chipqc_settings.blacklist

label:: Blacklist regions
type:: data:bed
description:: BED file containing genomic regions that should be excluded from the analysis.
required:: False

chipqc_settings.calculate_enrichment

label:: Calculate enrichment
type:: basic:boolean
description:: Calculate enrichment of the signal in known genomic annotations. By default, the annotation is taken from the TranscriptDB package that matches the genome build, which must be one of the supported annotations (hg19, hg38, hg18, mm10, mm9, rn4, ce6, dm3). If the annotation is not supported, the analysis is skipped.
default:: False

chipqc_settings.profile_window

label:: Window size
type:: basic:integer
description:: An integer indicating the width of the window used for peak profiles. Peaks will be centered on their summits and include half of the window size upstream and half downstream of this point.
default:: 400
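The geometry of the profile window can be sketched as follows. This is a hypothetical helper that assumes 0-based, half-open coordinates; ChIPQC's own implementation may treat chromosome edges differently.

```python
from typing import Tuple


def profile_interval(summit: int, window: int = 400) -> Tuple[int, int]:
    """Return a half-open [start, end) interval centred on a peak summit:
    half the window upstream and half downstream."""
    half = window // 2
    start = max(0, summit - half)  # do not run off the chromosome start
    return start, summit + half


print(profile_interval(10_000))  # (9800, 10200)
```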

chipqc_settings.shift_size

label:: Shift size
type:: basic:string
description:: Vector of values to try when computing optimal shift sizes. It should be specified as a vector of consecutive numbers in the form start:end.
default:: 1:300
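The value reads like an R range expression, so both ends are included. Assuming it is interpreted like R's `start:end` operator, a hypothetical parser shows how `1:300` expands into candidate shifts:

```python
from typing import List


def parse_shift_sizes(spec: str) -> List[int]:
    """Expand a 'start:end' range (both ends inclusive) into shift sizes."""
    start_text, end_text = spec.split(":")
    start, end = int(start_text), int(end_text)
    if start > end:
        raise ValueError(f"start ({start}) must not exceed end ({end})")
    return list(range(start, end + 1))


shifts = parse_shift_sizes("1:300")
print(len(shifts), shifts[0], shifts[-1])  # 300 1 300
```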

ChIP-seq (MACS2-ROSE2)

data:chipseq:batch:macs2 macs2-rose2-batch (list:data:alignment:bam alignments, data:bed promoter, basic:boolean tagalign, basic:integer q_threshold, basic:integer n_sub, basic:boolean tn5, basic:integer shift, basic:string duplicates, basic:string duplicates_prepeak, basic:decimal qvalue, basic:decimal pvalue, basic:decimal pvalue_prepeak, basic:integer cap_num, basic:integer mfold_lower, basic:integer mfold_upper, basic:integer slocal, basic:integer llocal, basic:integer extsize, basic:integer shift, basic:integer band_width, basic:boolean nolambda, basic:boolean fix_bimodal, basic:boolean nomodel, basic:boolean nomodel_prepeak, basic:boolean down_sample, basic:boolean bedgraph, basic:boolean spmr, basic:boolean call_summits, basic:boolean broad, basic:decimal broad_cutoff, basic:boolean use_filtered_bam, basic:integer tss, basic:integer stitch, data:bed mask, data:bed blacklist, basic:boolean calculate_enrichment, basic:integer profile_window, basic:string shift_size)[Source: v1.5.1]

This process runs MACS2 in batch mode. MACS2 analysis is triggered for pairs of samples as defined by treatment-background sample relations. If no sample relations are defined, each sample is analyzed individually. Model-based Analysis of ChIP-Seq (MACS 2.0) is used to identify transcription factor binding sites. MACS 2.0 captures the influence of genome complexity to evaluate the significance of enriched ChIP regions, and it improves the spatial resolution of binding sites by combining information on both sequencing tag position and orientation. It also has an option to link nearby peaks together in order to call broad peaks. See [here](https://github.com/taoliu/MACS/) for more information. In addition to peak calling, this process computes ChIP-Seq and ATAC-Seq QC metrics. The process returns a QC metrics report, a fragment length estimate, and a deduplicated tagAlign file. The QC report contains the ENCODE 3 proposed QC metrics: [NRF](https://www.encodeproject.org/data-standards/terms/), [PBC bottlenecking coefficients, NSC, and RSC](https://genome.ucsc.edu/ENCODE/qualityMetrics.html#chipSeq).

For identification of super-enhancers, R2 uses the Rank Ordering of Super-Enhancers algorithm (ROSE2). ROSE2 takes the peaks called by RSEG for acetylation and calculates the distances between them to judge whether they can be considered super-enhancers. The ranked values can be plotted, and super-enhancers are assigned by locating the inflection point in the resulting graph. ROSE2 can also be used with peaks called by MACS. See [here](http://younglab.wi.mit.edu/super_enhancer_code.html) for more information.
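The inflection-point step can be sketched as below. This follows the tangent-line idea used in the original ROSE code (sort the region signals, slide a line whose slope is (max - min) / n, and take the point where it touches the curve). It is an illustrative reimplementation, not ROSE2 itself, and the function names are hypothetical.

```python
from typing import List, Sequence


def super_enhancer_cutoff(signals: Sequence[float]) -> float:
    """Signal value at which a line with slope (max - min) / n is tangent
    to the ranked-signal curve (the "inflection point")."""
    if not signals:
        raise ValueError("need at least one region")
    values = sorted(max(0.0, s) for s in signals)  # negative signal -> 0
    n = len(values)
    slope = (values[-1] - values[0]) / n
    # A line through point i has no points below it when
    # values[i] - slope * rank is minimal, i.e. the line is tangent there.
    best = min(range(n), key=lambda i: values[i] - slope * (i + 1))
    return values[best]


def call_super_enhancers(signals: Sequence[float]) -> List[int]:
    """Indices of regions whose signal lies above the cutoff."""
    cutoff = super_enhancer_cutoff(signals)
    return [i for i, s in enumerate(signals) if s > cutoff]


signals = [1, 1, 1, 50, 1, 1, 1, 2, 1, 1]
print(call_super_enhancers(signals))  # [3]
```

Regions strictly above the tangent point are reported as super-enhancers; everything at or below it is treated as a typical enhancer.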

alignments

label:: Aligned reads
type:: list:data:alignment:bam
description:: Select multiple treatment/background samples.

promoter

label:: Promoter regions BED file
type:: data:bed
description:: BED file containing promoter regions (for example, TSS ± 1000 bp). Needed to get the number of peaks and reads mapped to promoter regions.
required:: False

tagalign

label:: Use tagAlign files
type:: basic:boolean
description:: Use filtered tagAlign files as case (treatment) and control (background) samples. If the extsize parameter is not set, MACS is run with the input’s estimated fragment length.
default:: True

prepeakqc_settings.q_threshold

label:: Quality filtering threshold
type:: basic:integer
default:: 30

prepeakqc_settings.n_sub

label:: Number of reads to subsample
type:: basic:integer
default:: 15000000

prepeakqc_settings.tn5

label:: Tn5 shifting
type:: basic:boolean
description:: Tn5 transposon shifting. Shift reads on “+” strand by 4 bp and reads on “-” strand by 5 bp.
default:: False
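A minimal sketch of the shift on tagAlign (BED6) records follows. It assumes the ENCODE ATAC-seq convention of moving only the 5' end of each tag (start + 4 on "+", end - 5 on "-") and keeping at least 1 bp. The `Tag` type and `tn5_shift` helper are hypothetical.

```python
from typing import NamedTuple


class Tag(NamedTuple):
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # exclusive
    strand: str


def tn5_shift(tag: Tag) -> Tag:
    """Move the 5' end of a tag by +4 bp ('+' strand) or -5 bp ('-' strand)."""
    start, end = tag.start, tag.end
    if tag.strand == "+":
        start += 4
    elif tag.strand == "-":
        end -= 5
    if start >= end:  # very short tag: keep a 1 bp record instead of an empty one
        if tag.strand == "+":
            start = end - 1
        else:
            end = start + 1
    return tag._replace(start=start, end=end)


print(tn5_shift(Tag("chr1", 100, 150, "+")))  # Tag(chrom='chr1', start=104, end=150, strand='+')
```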

prepeakqc_settings.shift

label:: User-defined cross-correlation peak strandshift
type:: basic:integer
description:: If defined, the SPP tool will not estimate the fragment length but will use the given value instead.
required:: False

settings.duplicates

label:: Number of duplicates
type:: basic:string
description:: It controls the MACS behavior towards duplicate tags at the exact same location (the same coordinates and the same strand). The ‘auto’ option makes MACS calculate the maximum number of tags at the exact same location based on the binomial distribution, using 1e-5 as the p-value cutoff, and the ‘all’ option keeps all the tags. If an integer is given, at most this number of tags is kept at the same location. The default is to keep one tag at the same location.
required:: False
hidden:: tagalign
choices::
1: 1
auto: auto
all: all
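The three choices can be sketched as below. `filter_duplicates` keeps at most `keep` tags per (chromosome, 5' position, strand). `auto_max_dup` picks the smallest count k for which a Binomial(n, 1/genome_size) cumulative probability reaches 1 - 1e-5; this is our reading of the ‘auto’ rule above, not MACS2's code. Names and the tuple layout are assumptions.

```python
import math
from collections import defaultdict
from typing import DefaultDict, Iterable, List, Optional, Tuple

Tag = Tuple[str, int, str]  # (chromosome, 5' position, strand)


def auto_max_dup(n_tags: int, genome_size: int, pvalue: float = 1e-5) -> int:
    """Smallest k with P(X <= k) >= 1 - pvalue for X ~ Binomial(n_tags, 1/genome_size)."""
    if genome_size < 2:
        raise ValueError("genome_size must be at least 2")
    p = 1.0 / genome_size
    pmf = math.exp(n_tags * math.log1p(-p))  # P(X = 0)
    cdf, k = pmf, 0
    while cdf < 1.0 - pvalue:
        # Recurrence: P(X = k + 1) = P(X = k) * (n - k) / (k + 1) * p / (1 - p)
        pmf *= (n_tags - k) / (k + 1) * p / (1.0 - p)
        k += 1
        cdf += pmf
    return k


def filter_duplicates(tags: Iterable[Tag], keep: Optional[int]) -> List[Tag]:
    """Keep at most `keep` tags at each exact location; keep=None means 'all'."""
    if keep is None:
        return list(tags)
    seen: DefaultDict[Tag, int] = defaultdict(int)
    kept = []
    for tag in tags:
        if seen[tag] < keep:
            seen[tag] += 1
            kept.append(tag)
    return kept


tags = [("chr1", 100, "+")] * 3 + [("chr1", 100, "-")]
print(len(filter_duplicates(tags, 1)))  # 2
print(auto_max_dup(20_000_000, 2_700_000_000))  # 2
```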

settings.duplicates_prepeak

label:: Number of duplicates
type:: basic:string
description:: It controls the MACS behavior towards duplicate tags at the exact same location (the same coordinates and the same strand). The ‘auto’ option makes MACS calculate the maximum number of tags at the exact same location based on the binomial distribution, using 1e-5 as the p-value cutoff, and the ‘all’ option keeps all the tags. If an integer is given, at most this number of tags is kept at the same location. The default is to keep one tag at the same location.
required:: False
hidden:: !tagalign
default:: all
choices::
1: 1
auto: auto
all: all

settings.qvalue

label:: Q-value cutoff
type:: basic:decimal
description:: The q-value (minimum FDR) cutoff to call significant regions. Q-values are calculated from p-values using the Benjamini-Hochberg procedure.
required:: False
disabled:: settings.pvalue && settings.pvalue_prepeak
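For reference, the standard Benjamini-Hochberg adjustment can be sketched in a few lines. This is the textbook procedure applied to a plain list of p-values; how MACS2 organises the computation internally is not reproduced here.

```python
from typing import List, Sequence


def bh_qvalues(pvalues: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values (q-values), in input order."""
    m = len(pvalues)
    # Walk from the largest p-value to the smallest so that q-values can be
    # kept monotone with a running minimum.
    order = sorted(range(m), key=lambda i: pvalues[i], reverse=True)
    qvalues = [0.0] * m
    running_min = 1.0
    for position, i in enumerate(order):
        rank = m - position  # 1-based rank in ascending p-value order
        running_min = min(running_min, pvalues[i] * m / rank)
        qvalues[i] = running_min
    return qvalues


q = bh_qvalues([0.01, 0.04, 0.03, 0.5])
print([round(x, 4) for x in q])  # [0.04, 0.0533, 0.0533, 0.5]
```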

settings.pvalue

label:: P-value cutoff
type:: basic:decimal
description:: The p-value cutoff. If specified, MACS2 will use p-value instead of q-value cutoff.
required:: False
disabled:: settings.qvalue
hidden:: tagalign

settings.pvalue_prepeak

label:: P-value cutoff
type:: basic:decimal
description:: The p-value cutoff. If specified, MACS2 will use p-value instead of q-value cutoff.
disabled:: settings.qvalue
hidden:: !tagalign || settings.qvalue
default:: 1e-05

settings.cap_num

label:: Cap number of peaks by taking top N peaks
type:: basic:integer
description:: To keep all peaks, set the value to 0.
disabled:: settings.broad
default:: 500000

settings.mfold_lower

label:: MFOLD range (lower limit)
type:: basic:integer
description:: This parameter is used to select regions within the MFOLD range of high-confidence enrichment ratios against the background to build the model. The regions must have a fold enrichment higher than the lower limit and lower than the upper limit. DEFAULT:10,30 means using all regions that are not too low (>10) and not too high (<30) to build the paired-peaks model. If MACS cannot find more than 100 regions to build the model, it continues peak detection using the --extsize parameter ONLY if --fix-bimodal is set.
required:: False

settings.mfold_upper

label:: MFOLD range (upper limit)
type:: basic:integer
description:: This parameter is used to select regions within the MFOLD range of high-confidence enrichment ratios against the background to build the model. The regions must have a fold enrichment higher than the lower limit and lower than the upper limit. DEFAULT:10,30 means using all regions that are not too low (>10) and not too high (<30) to build the paired-peaks model. If MACS cannot find more than 100 regions to build the model, it continues peak detection using the --extsize parameter ONLY if --fix-bimodal is set.
required:: False
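A simplified reading of the MFOLD selection and the fallback rule is sketched below. The limits and the 100-region threshold come from the description above; MACS2's actual model building (which works on paired peak positions) is not reproduced, and the function names are hypothetical.

```python
from typing import List, Sequence


def select_model_regions(
    fold_enrichment: Sequence[float], lower: float = 10, upper: float = 30
) -> List[int]:
    """Indices of regions whose fold enrichment lies strictly inside (lower, upper)."""
    return [i for i, fe in enumerate(fold_enrichment) if lower < fe < upper]


def fragment_strategy(n_regions: int, fix_bimodal: bool, min_regions: int = 100) -> str:
    """Decide how the fragment size is obtained, following the rules above."""
    if n_regions > min_regions:
        return "model"    # enough regions: build the paired-peak model
    if fix_bimodal:
        return "extsize"  # fall back to extending tags by --extsize
    raise RuntimeError("too few regions to build the paired-peak model")


print(select_model_regions([5, 12, 29.9, 30, 40]))  # [1, 2]
```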

settings.slocal

label:: Small local region
type:: basic:integer
description:: The slocal and llocal parameters control the two levels of regions checked around the peak regions to calculate the maximum lambda as the local lambda. By default, MACS considers 1000 bp for the small local region (--slocal) and 10000 bp for the large local region (--llocal), which captures bias from long-range effects such as an open chromatin domain. You can tweak these according to your project. Note that if the region is set too small, a sharp spike in the input data may kill a significant peak.
required:: False

settings.llocal

label:: Large local region
type:: basic:integer
description:: The slocal and llocal parameters control the two levels of regions checked around the peak regions to calculate the maximum lambda as the local lambda. By default, MACS considers 1000 bp for the small local region (--slocal) and 10000 bp for the large local region (--llocal), which captures bias from long-range effects such as an open chromatin domain. You can tweak these according to your project. Note that if the region is set too small, a sharp spike in the input data may kill a significant peak.
required:: False
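The "maximum lambda" rule can be sketched as follows. This is a simplified illustration, assuming each control count is rescaled linearly to a d-bp window and that the background lambda is already expressed per d bp; it omits MACS2's treatment/control depth scaling. The helper name and arguments are hypothetical.

```python
from typing import Mapping


def local_lambda(
    d: int,
    background_lambda: float,
    control_counts: Mapping[int, float],
    nolambda: bool = False,
) -> float:
    """Local lambda for one candidate peak.

    background_lambda: genome-wide expected tag count in a window of d bp.
    control_counts: control tag counts in windows of the given widths (bp)
    centred on the candidate, e.g. {d: ..., slocal: ..., llocal: ...}.
    """
    if nolambda:
        return background_lambda  # ignore local bias entirely
    # Rescale each window's count to an expected count per d bp, then take
    # the most conservative (largest) estimate.
    estimates = [count * d / width for width, count in control_counts.items()]
    return max([background_lambda, *estimates])


print(local_lambda(200, 0.5, {200: 1, 1000: 10, 10000: 30}))  # 2.0
```

Taking the maximum means a local bump in the control raises the bar for calling a peak there, which is why an overly small window lets a single control spike suppress a real peak.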

settings.extsize

label:: extsize
type:: basic:integer
description:: When --nomodel is set, MACS uses this parameter to extend reads in the 5’->3’ direction to fixed-size fragments. For example, if the size of the binding region for your transcription factor is 200 bp and you want to bypass model building by MACS, this parameter can be set to 200. This option is only valid when --nomodel is set or when MACS fails to build the model and --fix-bimodal is on.
required:: False

settings.shift

label:: Shift
type:: basic:integer
description:: Note that this is NOT the legacy --shiftsize option, which has been replaced by --extsize. You can set an arbitrary shift in bp here; use discretion when setting it to anything other than the default value (0). When --nomodel is set, MACS uses this value to move cutting ends (5’) and then applies --extsize from 5’ to 3’ to extend them to fragments. When this value is negative, ends are moved in the 3’->5’ direction; otherwise, in the 5’->3’ direction. It is recommended to keep the default of 0 for ChIP-Seq datasets, or to use -1 * half of EXTSIZE together with the --extsize option for detecting enriched cutting loci, as in certain DNase-seq datasets. Note that you cannot set values other than 0 if the format is BAMPE (paired-end data). Default is 0.
required:: False
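A minimal sketch of how shift and extsize combine for a single read, assuming 0-based half-open coordinates. Clipping at chromosome ends, which a real implementation needs, is omitted, and the helper name is hypothetical. With shift = -extsize/2 the fragment is centred on the original 5' cut site, which is the DNase-seq-style setting mentioned above.

```python
from typing import Tuple


def tag_to_fragment(
    strand: str, start: int, end: int, extsize: int, shift: int = 0
) -> Tuple[int, int]:
    """Turn one read [start, end) into a fragment.

    The 5' end is moved by `shift` in the 5'->3' direction (negative values
    move it 3'->5'), then extended by `extsize` bp in the 5'->3' direction.
    """
    if strand == "+":
        five_prime = start + shift  # 5' boundary of a '+' read is its start
        return five_prime, five_prime + extsize
    if strand == "-":
        five_prime = end - shift  # 5' boundary of a '-' read is its end
        return five_prime - extsize, five_prime
    raise ValueError(f"unknown strand {strand!r}")


print(tag_to_fragment("+", 1000, 1050, 200, shift=-100))  # (900, 1100)
```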

settings.band_width

label:: Band width
type:: basic:integer
description:: The band width used to scan the genome ONLY for model building. You can set this parameter to the sonication fragment size expected from the wet-lab experiment. In earlier MACS versions this parameter also affected peak detection; that side effect has been removed, so it now only affects model building.
required:: False

settings.nolambda

label:: Use background lambda as local lambda
type:: basic:boolean
description:: With this flag on, MACS will use the background lambda as local lambda. This means MACS will not consider the local bias at peak candidate regions.
default:: False

settings.fix_bimodal

label:: Turn on the auto paired-peak model process
type:: basic:boolean
description:: Turn on the auto paired-peak model process. If it is set and MACS fails to build the paired model, MACS falls back to the nomodel settings and uses the --extsize parameter to extend each tag. If it is not set, MACS terminates when the paired-peak model fails.
default:: False

settings.nomodel

label:: Bypass building the shifting model
type:: basic:boolean
description:: While on, MACS will bypass building the shifting model.
hidden:: tagalign
default:: False

settings.nomodel_prepeak

label:: Bypass building the shifting model
type:: basic:boolean
description:: While on, MACS will bypass building the shifting model.
hidden:: !tagalign
default:: True

settings.down_sample

label:: Down-sample
type:: basic:boolean
description:: When set to true, the larger sample is scaled down by random sampling instead of the default linear scaling. Because a different random subset of reads is selected on every run, this option makes the results unstable and irreproducible; in particular, the pileup, p-value, and q-value scores change between runs.
default:: False

settings.bedgraph

label:: Save fragment pileup and control lambda
type:: basic:boolean
description:: If this flag is on, MACS stores the fragment pileup, control lambda, -log10pvalue and -log10qvalue scores in bedGraph files. The bedGraph files are stored in the current directory and named NAME + '_treat_pileup.bdg' for treatment data, NAME + '_control_lambda.bdg' for local lambda values from the control, NAME + '_treat_pvalue.bdg' for Poisson p-value scores (in -log10(pvalue) form), and NAME + '_treat_qvalue.bdg' for q-value scores from the Benjamini-Hochberg-Yekutieli procedure.
default:: True

settings.spmr

label:: Save signal per million reads for fragment pileup profiles
type:: basic:boolean
disabled:: settings.bedgraph === false
default:: True

settings.call_summits

label:: Call summits
type:: basic:boolean
description:: MACS reanalyzes the shape of the signal profile (p- or q-score, depending on the cutoff setting) to deconvolve subpeaks within each peak called by the general procedure. This is highly recommended for detecting adjacent binding events. When used, the subpeaks reported for a large peak region share the same peak boundaries but have different scores and summit positions.
default:: False

settings.broad

label:: Composite broad regions
type:: basic:boolean
description:: When this flag is on, MACS tries to composite broad regions in BED12 (a gene-model-like format) by merging nearby highly enriched regions into a broad region with a looser cutoff. The broad region is controlled by a separate cutoff set with --broad-cutoff. The maximum length of a broad region is 4 times d, the fragment size estimated by MACS.
disabled:: settings.call_summits === true
default:: False

settings.broad_cutoff

label:: Broad cutoff
type:: basic:decimal
description:: Cutoff for broad regions. This option is only available when --broad is set. If -p is set, this is a p-value cutoff; otherwise, it is a q-value cutoff. Default: 0.1.
required:: False
disabled:: settings.call_summits === true || settings.broad !== true

rose_settings.use_filtered_bam

label:: Use Filtered BAM File
type:: basic:boolean
description:: Use filtered BAM file from a MACS2 object to rank enhancers by.
default:: True

rose_settings.tss

label:: TSS exclusion
type:: basic:integer
description:: Distance from the TSS within which regions are excluded. 0 = no TSS exclusion.
default:: 0

rose_settings.stitch

label:: Stitch
type:: basic:integer
description:: Maximum linking distance for stitching. If not given, the optimal stitching parameter is determined automatically.
required:: False
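Stitching is essentially interval merging. The sketch below assumes that regions on the same chromosome are merged whenever the gap between them is at most the stitching distance; ROSE2's own handling (TSS exclusion, automatic choice of the distance) is not reproduced, and the names are hypothetical.

```python
from typing import Iterable, List, Tuple

Region = Tuple[str, int, int]  # (chromosome, start, end), half-open


def stitch_regions(regions: Iterable[Region], max_gap: int) -> List[Region]:
    """Merge regions on the same chromosome whose gap is at most max_gap bp."""
    stitched: List[Region] = []
    for chrom, start, end in sorted(regions):
        if stitched and stitched[-1][0] == chrom and start - stitched[-1][2] <= max_gap:
            _, prev_start, prev_end = stitched[-1]
            stitched[-1] = (chrom, prev_start, max(prev_end, end))
        else:
            stitched.append((chrom, start, end))
    return stitched


peaks = [("chr1", 300, 400), ("chr1", 100, 200), ("chr1", 20_000, 20_100), ("chr2", 0, 10)]
print(stitch_regions(peaks, 12_500))
# [('chr1', 100, 400), ('chr1', 20000, 20100), ('chr2', 0, 10)]
```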

rose_settings.mask

label:: Masking BED file
type:: data:bed
description:: Mask a set of regions from analysis. Provide a BED of masking regions.
required:: False

chipqc_settings.blacklist

label:: Blacklist regions
type:: data:bed
description:: BED file containing genomic regions that should be excluded from the analysis.
required:: False

chipqc_settings.calculate_enrichment

label:: Calculate enrichment
type:: basic:boolean
description:: Calculate enrichment of the signal in known genomic annotations. By default, the annotation is taken from the TranscriptDB package that matches the genome build, which must be one of the supported annotations (hg19, hg38, hg18, mm10, mm9, rn4, ce6, dm3). If the annotation is not supported, the analysis is skipped.
default:: False

chipqc_settings.profile_window

label:: Window size
type:: basic:integer
description:: An integer indicating the width of the window used for peak profiles. Peaks will be centered on their summits and include half of the window size upstream and half downstream of this point.
default:: 400

chipqc_settings.shift_size

label:: Shift size
type:: basic:string
description:: Vector of values to try when computing optimal shift sizes. It should be specified as a vector of consecutive numbers in the form start:end.
default:: 1:300

Chemical Mutagenesis

data:workflow:chemut workflow-chemut (basic:string analysis_type, data:seq:nucleotide genome, list:data:alignment:bam parental_strains, list:data:alignment:bam mutant_strains, basic:boolean base_recalibration, data:variants:vcf known_sites, list:data:variants:vcf known_indels, basic:integer stand_call_conf, basic:integer mbq, basic:integer read_depth)[Source: v2.1.0]

analysis_type

label:: Analysis type
type:: basic:string
description:: Choice of the analysis type. Use the “SNV” or “INDEL” options to run the GATK analysis only on the haploid portion of the dicty genome. Choose SNV_CHR2 or INDEL_CHR2 to run the analysis only on the diploid portion of chromosome 2 (-ploidy 2 -L chr2:2263132-3015703).
default:: snv
choices::
SNV: snv
INDEL: indel
SNV_CHR2: snv_chr2
INDEL_CHR2: indel_chr2

genome

label:: Reference genome
type:: data:seq:nucleotide

parental_strains

label:: Parental strains
type:: list:data:alignment:bam

mutant_strains

label:: Mutant strains
type:: list:data:alignment:bam

Vc.base_recalibration

label:: Do variant base recalibration
type:: basic:boolean
default:: False

Vc.known_sites

label:: Known sites (dbSNP)
type:: data:variants:vcf
required:: False

Vc.known_indels

label:: Known indels
type:: list:data:variants:vcf
required:: False
hidden:: Vc.base_recalibration === false

Vc.stand_call_conf

label:: Calling confidence threshold
type:: basic:integer
description:: The minimum confidence threshold (phred-scaled) at which the program should emit variant sites as called. If a site’s associated genotype has a confidence score lower than the calling threshold, the program will emit the site as filtered and will annotate it as LowQual. This threshold separates high confidence calls from low confidence calls.
default:: 30
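Phred scaling maps a confidence Q to an error probability of 10^(-Q/10), so the default of 30 corresponds to a 1-in-1000 chance that the call is wrong. A hypothetical helper restating the rule above (it does not reproduce GATK's genotype-likelihood calculation):

```python
def phred_to_error_probability(qual: float) -> float:
    """Convert a phred-scaled confidence Q into an error probability 10^(-Q/10)."""
    return 10 ** (-qual / 10)


def site_status(qual: float, stand_call_conf: float = 30) -> str:
    """Sites below the calling threshold are emitted as filtered (LowQual)."""
    return "called" if qual >= stand_call_conf else "LowQual"


print(site_status(25), site_status(30))  # LowQual called
```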

Vc.mbq

label:: Min base quality
type:: basic:integer
description:: Minimum base quality required to consider a base for calling.
default:: 10

Vf.read_depth

label:: Read depth cutoff
type:: basic:integer
description:: The minimum number of replicate reads required for a variant site to be included.
default:: 5

ChipQC

data:chipqc:chipqc (data:alignment:bam alignment, data:chipseq:callpeak peaks, data:bed blacklist, basic:boolean calculate_enrichment, basic:integer quality_threshold, basic:integer profile_window, basic:string shift_size)[Source: v1.4.2]

Calculate quality control metrics for ChIP-seq samples. The analysis is based on the ChIPQC package, which computes a variety of quality control metrics and statistics and provides plots and a report for assessing experimental data before further analysis.

alignment

label:: Aligned reads
type:: data:alignment:bam
required:: True
disabled:: False
hidden:: False

peaks

label:: Called peaks
type:: data:chipseq:callpeak
required:: True
disabled:: False
hidden:: False

blacklist

label:: Blacklist regions
type:: data:bed
description:: BED file containing genomic regions that should be excluded from the analysis.
required:: False
disabled:: False
hidden:: False

calculate_enrichment

label:: Calculate enrichment
type:: basic:boolean
description:: Calculate enrichment of the signal in known genomic annotations. By default, the annotation is taken from the TranscriptDB package that matches the genome build, which must be one of the supported annotations (hg19, hg38, hg18, mm10, mm9, rn4, ce6, dm3). If the annotation is not supported, the analysis is skipped.
required:: True
disabled:: False
hidden:: False
default:: False

advanced.quality_threshold

label:: Mapping quality threshold
type:: basic:integer
description:: Only reads with mapping quality scores above this threshold will be used for some statistics.
required:: True
disabled:: False
hidden:: False
default:: 15

advanced.profile_window

label:: Window size
type:: basic:integer
description:: An integer indicating the width of the window used for peak profiles. Peaks will be centered on their summits and include half of the window size upstream and half downstream of this point.
required:: True
disabled:: False
hidden:: False
default:: 400

advanced.shift_size

label:: Shift size
type:: basic:string
description:: Vector of values to try when computing optimal shift sizes. It should be specified as a vector of consecutive numbers in the form start:end.
required:: True
disabled:: False
hidden:: False
default:: 1:300

report_folder

label:: ChipQC report folder
type:: basic:dir
required:: True
disabled:: False
hidden:: False

ccplot

label:: Cross coverage score plot
type:: basic:file
required:: True
disabled:: False
hidden:: False

coverage_histogram

label:: SSD metric plot
type:: basic:file
required:: True
disabled:: False
hidden:: False

peak_profile

label:: Peak profile plot
type:: basic:file
required:: True
disabled:: False
hidden:: False

peaks_barplot

label:: Barplot of reads in peaks
type:: basic:file
required:: True
disabled:: False
hidden:: False

peaks_density_plot

label:: Density plot of reads in peaks
type:: basic:file
required:: True
disabled:: False
hidden:: False

enrichment_heatmap

label:: Heatmap of reads in genomic features
type:: basic:file
required:: False
disabled:: False
hidden:: False

species

label:: Species
type:: basic:string
required:: True
disabled:: False
hidden:: False

build

label:: Build
type:: basic:string
required:: True
disabled:: False
hidden:: False

Convert GFF3 to GTF

data:annotation:gtf gff-to-gtf (data:annotation:gff3 annotation)[Source: v0.6.0]

Convert GFF3 file to GTF format.

annotation

label:: Annotation (GFF3)
type:: data:annotation:gff3
description:: Annotation in GFF3 format.

annot

label:: Converted GTF file
type:: basic:file

annot_sorted

label:: Sorted GTF file
type:: basic:file

annot_sorted_idx_igv

label:: IGV index for sorted GTF file
type:: basic:file

annot_sorted_track_jbrowse

label:: JBrowse track for sorted GTF
type:: basic:file

source

label:: Gene ID database
type:: basic:string

species

label:: Species
type:: basic:string

build

label:: Build
type:: basic:string

Convert files to reads (paired-end)

data:reads:fastq:paired:files-to-fastq-paired (list:data:file src1, list:data:file src2, basic:boolean merge_lanes)[Source: v1.6.0]

Convert FASTQ files to paired-end reads.

src1

label:: Mate1
type:: list:data:file
required:: True
disabled:: False
hidden:: False

src2

label:: Mate2
type:: list:data:file
required:: True
disabled:: False
hidden:: False

merge_lanes

label:: Merge lanes
type:: basic:boolean
description:: Merge sample data split into multiple sequencing lanes into a single FASTQ file.
required:: True
disabled:: False
hidden:: False
default:: False

fastq

label:: Reads file (mate 1)
type:: list:basic:file
required:: True
disabled:: False
hidden:: False

fastq2

label:: Reads file (mate 2)
type:: list:basic:file
required:: True
disabled:: False
hidden:: False

fastqc_url

label:: Quality control with FastQC (Upstream)
type:: list:basic:file:html
required:: True
disabled:: False
hidden:: False

fastqc_url2

label:: Quality control with FastQC (Downstream)
type:: list:basic:file:html
required:: True
disabled:: False
hidden:: False

fastqc_archive

label:: Download FastQC archive (Upstream)
type:: list:basic:file
required:: True
disabled:: False
hidden:: False

fastqc_archive2

label:: Download FastQC archive (Downstream)
type:: list:basic:file
required:: True
disabled:: False
hidden:: False

Convert files to reads (single-end)

data:reads:fastq:single:files-to-fastq-single (list:data:file src, basic:boolean merge_lanes)[Source: v1.6.0]

Convert FASTQ files to single-end reads.

src

label:: Reads
type:: list:data:file
description:: Sequencing reads in FASTQ format
required:: True
disabled:: False
hidden:: False

merge_lanes

label:: Merge lanes
type:: basic:boolean
description:: Merge sample data split into multiple sequencing lanes into a single FASTQ file.
required:: True
disabled:: False
hidden:: False
default:: False

fastq

label:: Reads file
type:: list:basic:file
required:: True
disabled:: False
hidden:: False

fastqc_url

label:: Quality control with FastQC
type:: list:basic:file:html
required:: True
disabled:: False
hidden:: False

fastqc_archive

label:: Download FastQC archive
type:: list:basic:file
required:: True
disabled:: False
hidden:: False

Cuffdiff 2.2

data:differentialexpression:cuffdiff:cuffdiff (list:data:cufflinks:cuffquant case, list:data:cufflinks:cuffquant control, list:basic:string labels, data:annotation annotation, data:seq:nucleotide genome, basic:boolean multi_read_correct, basic:boolean create_sets, basic:decimal gene_logfc, basic:decimal gene_fdr, basic:decimal fdr, basic:string library_type, basic:string library_normalization, basic:string dispersion_method)[Source: v3.4.0]

Run Cuffdiff 2.2 analysis. Cuffdiff finds significant changes in transcript expression, splicing, and promoter use. You can use it to find differentially expressed genes and transcripts, as well as genes that are being differentially regulated at the transcriptional and post-transcriptional level. See [here](http://cole-trapnell-lab.github.io/cufflinks/cuffdiff/) and [here](https://software.broadinstitute.org/cancer/software/genepattern/modules/docs/Cuffdiff/7) for more information.

case

label:: Case samples
type:: list:data:cufflinks:cuffquant
required:: True
disabled:: False
hidden:: False

control

label:: Control samples
type:: list:data:cufflinks:cuffquant
required:: True
disabled:: False
hidden:: False

labels

label:: Group labels
type:: list:basic:string
description:: Define labels for each sample group.
required:: True
disabled:: False
hidden:: False
default:: ['control', 'case']

annotation

label:: Annotation (GTF/GFF3)
type:: data:annotation
description:: A transcript annotation file produced by cufflinks, cuffcompare, or other tool.
required:: True
disabled:: False
hidden:: False

genome

label:: Run bias detection and correction algorithm
type:: data:seq:nucleotide
description:: Provide Cufflinks with a multifasta file (genome file) via this option to instruct it to run a bias detection and correction algorithm which can significantly improve accuracy of transcript abundance estimates.
required:: False
disabled:: False
hidden:: False

multi_read_correct

label:: Do initial estimation procedure to more accurately weight reads with multiple genome mappings
type:: basic:boolean
required:: True
disabled:: False
hidden:: False
default:: False

create_sets

label:: Create gene sets
type:: basic:boolean
description:: After calculating differential gene expressions create gene sets for up-regulated genes, down-regulated genes and all genes.
required:: True
disabled:: False
hidden:: False
default:: False

gene_logfc

label:: Log2 fold change threshold for gene sets
type:: basic:decimal
description:: Genes above Log2FC are considered as up-regulated and genes below -Log2FC as down-regulated.
required:: True
disabled:: False
hidden:: !create_sets
default:: 1.0

gene_fdr

label:: FDR threshold for gene sets
type:: basic:decimal
required:: True
disabled:: False
hidden:: !create_sets
default:: 0.05
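How the two thresholds combine can be sketched as follows. This is a hypothetical helper; it assumes a gene must have FDR at or below the threshold to be included, and it takes "all" to mean every gene passing the thresholds in either direction.

```python
from typing import Dict, List, Sequence, Tuple

Result = Tuple[str, float, float]  # (gene id, log2 fold change, FDR)


def make_gene_sets(
    results: Sequence[Result], logfc: float = 1.0, fdr: float = 0.05
) -> Dict[str, List[str]]:
    """Split differential expression results into up, down and combined sets."""
    significant = [(gene, lfc) for gene, lfc, q in results if q <= fdr]
    return {
        "up": [g for g, lfc in significant if lfc > logfc],
        "down": [g for g, lfc in significant if lfc < -logfc],
        "all": [g for g, lfc in significant if abs(lfc) > logfc],
    }


results = [("A", 2.5, 0.01), ("B", -1.8, 0.03), ("C", 0.4, 0.001), ("D", 3.0, 0.2)]
print(make_gene_sets(results))  # {'up': ['A'], 'down': ['B'], 'all': ['A', 'B']}
```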

fdr

label:: Allowed FDR
type:: basic:decimal
description:: The allowed false discovery rate. The default is 0.05.
required:: True
disabled:: False
hidden:: False
default:: 0.05

library_type

label:: Library type
type:: basic:string
description:: In cases where Cufflinks cannot determine the platform and protocol used to generate input reads, you can supply this information manually, which allows Cufflinks to infer source strand information with certain protocols. The available options are listed below. For paired-end data, only protocols where reads point towards each other are currently supported: fr-unstranded - Reads from the left-most end of the fragment (in transcript coordinates) map to the transcript strand and the right-most end maps to the opposite strand; fr-firststrand - Same as above except we enforce the rule that the right-most end of the fragment (in transcript coordinates) is the first sequenced (or only sequenced for single-end reads). Equivalently, it is assumed that only the strand generated during first strand synthesis is sequenced; fr-secondstrand - Same as above except we enforce the rule that the left-most end of the fragment (in transcript coordinates) is the first sequenced (or only sequenced for single-end reads). Equivalently, it is assumed that only the strand generated during second strand synthesis is sequenced.
required:: True
disabled:: False
hidden:: False
default:: fr-unstranded
choices::
fr-unstranded: fr-unstranded
fr-firststrand: fr-firststrand
fr-secondstrand: fr-secondstrand
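The practical consequence of each option can be summarised as a small lookup: given the strand that read 1 (or the only read, for single-end data) aligns to, what is the transcript strand? A hypothetical helper that restates the rules above:

```python
from typing import Optional


def transcript_strand(library_type: str, read1_strand: str) -> Optional[str]:
    """Infer the transcript strand from the strand read 1 aligns to.
    Returns None when the protocol carries no strand information."""
    flip = {"+": "-", "-": "+"}
    if library_type == "fr-unstranded":
        return None
    if library_type == "fr-firststrand":
        return flip[read1_strand]  # read 1 is the right-most end: opposite strand
    if library_type == "fr-secondstrand":
        return read1_strand  # read 1 is the left-most end: transcript strand
    raise ValueError(f"unsupported library type: {library_type!r}")


print(transcript_strand("fr-firststrand", "+"))  # -
```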

library_normalization

label:: Library normalization method
type:: basic:string
description:: You can control how library sizes (i.e. sequencing depths) are normalized in Cufflinks and Cuffdiff. Cuffdiff has several methods that require multiple libraries in order to work. Library normalization methods supported by Cufflinks work on one library at a time.
required:: True
disabled:: False
hidden:: False
default:: geometric
choices::
geometric: geometric
classic-fpkm: classic-fpkm
quartile: quartile
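As an illustration of the geometric option, here is a sketch of median-of-ratios size factors in the style of DESeq (Anders and Huber, 2010). It assumes that the geometric method follows this scheme; it is not Cuffdiff's code, and the helper name is hypothetical.

```python
import math
from statistics import median
from typing import List, Sequence


def geometric_size_factors(counts: Sequence[Sequence[float]]) -> List[float]:
    """Median-of-ratios size factors; counts[g][s] is the count of gene g in sample s."""
    n_samples = len(counts[0])
    log_ratios: List[List[float]] = [[] for _ in range(n_samples)]
    for row in counts:
        if any(c <= 0 for c in row):
            continue  # the geometric mean would be zero; skip this gene
        log_geo_mean = sum(math.log(c) for c in row) / n_samples
        for s, c in enumerate(row):
            log_ratios[s].append(math.log(c) - log_geo_mean)
    if not log_ratios[0]:
        raise ValueError("no gene has a positive count in every sample")
    return [math.exp(median(r)) for r in log_ratios]


counts = [[10, 20], [20, 40], [30, 60], [0, 5]]
print([round(f, 4) for f in geometric_size_factors(counts)])  # [0.7071, 1.4142]
```

Each library's counts are then divided by its size factor, so a library sequenced twice as deeply is scaled down relative to the other.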

dispersion_method

label:: Dispersion method
type:: basic:string
description:: Cuffdiff works by modeling the variance in fragment counts across replicates as a function of the mean fragment count across replicates. Strictly speaking, it models a quantity called dispersion: the variance present in a group of samples beyond what is expected from a simple Poisson model of RNA-Seq. You can control how Cuffdiff constructs its model of dispersion in locus fragment counts. Each condition that has replicates can receive its own model, or Cuffdiff can use a global model for all conditions. All of these policies are identical to those used by DESeq (Anders and Huber, Genome Biology, 2010).
required:: True
disabled:: False
hidden:: False
default:: pooled
choices::
pooled: pooled
per-condition: per-condition
blind: blind
poisson: poisson

raw

label:: Differential expression
type:: basic:file
required:: True
disabled:: False
hidden:: False

de_json

label:: Results table (JSON)
type:: basic:json
required:: True
disabled:: False
hidden:: False

de_file

label:: Results table (file)
type:: basic:file
required:: True
disabled:: False
hidden:: False

transcript_diff_exp

label:: Differential expression (transcript level)
type:: basic:file
required:: True
disabled:: False
hidden:: False

tss_group_diff_exp

label:: Differential expression (primary transcript)
type:: basic:file
required:: True
disabled:: False
hidden:: False

cds_diff_exp

label:: Differential expression (coding sequence)
type:: basic:file
required:: True
disabled:: False
hidden:: False

cuffdiff_output

label:: Cuffdiff output
type:: basic:file
required:: True
disabled:: False
hidden:: False

source

label:: Gene ID database
type:: basic:string
required:: True
disabled:: False
hidden:: False

species

label:: Species
type:: basic:string
required:: True
disabled:: False
hidden:: False

build

label:: Build
type:: basic:string
required:: True
disabled:: False
hidden:: False

feature_type

label:: Feature type
type:: basic:string
required:: True
disabled:: False
hidden:: False

Cufflinks 2.2

data:cufflinks:cufflinkscufflinks (data:alignment:bam alignment, data:annotation annotation, data:seq:nucleotide genome, data:annotation:gtf mask_file, basic:string library_type, basic:string annotation_usage, basic:boolean multi_read_correct)[Source: v3.2.1]

Cufflinks assembles transcripts, estimates their abundances, and tests for differential expression and regulation in RNA-Seq samples. It accepts aligned RNA-Seq reads and assembles the alignments into a parsimonious set of transcripts. Cufflinks then estimates the relative abundances of these transcripts based on how many reads support each one, taking into account biases in library preparation protocols. See [here](http://cole-trapnell-lab.github.io/cufflinks/) for more information.

alignment

label:: Aligned reads
type:: data:alignment:bam

annotation

label:: Annotation (GTF/GFF3)
type:: data:annotation
required:: False

genome

label:: Run bias detection and correction algorithm
type:: data:seq:nucleotide
description:: Provide Cufflinks with a multifasta file (genome file) via this option to instruct it to run a bias detection and correction algorithm which can significantly improve accuracy of transcript abundance estimates.
required:: False

mask_file

label:: Mask file
type:: data:annotation:gtf
description:: Ignore all reads that could have come from transcripts in this GTF file. We recommend including any annotated rRNA, mitochondrial transcripts, and other abundant transcripts you wish to ignore in your analysis in this file. Due to variable efficiency of mRNA enrichment methods and rRNA depletion kits, masking these transcripts often improves the overall robustness of transcript abundance estimates.
required:: False

library_type

label:

Library type

type:

basic:string

description:

In cases where Cufflinks cannot determine the platform and protocol used to generate input reads, you can supply this information manually, which will allow Cufflinks to infer source strand information with certain protocols. The available options are listed below. For paired-end data, we currently only support protocols where reads point towards each other: fr-unstranded - Reads from the left-most end of the fragment (in transcript coordinates) map to the transcript strand, and the right-most end maps to the opposite strand; fr-firststrand - Same as above except we enforce the rule that the right-most end of the fragment (in transcript coordinates) is the first sequenced (or only sequenced for single-end reads). Equivalently, it is assumed that only the strand generated during first strand synthesis is sequenced; fr-secondstrand - Same as above except we enforce the rule that the left-most end of the fragment (in transcript coordinates) is the first sequenced (or only sequenced for single-end reads). Equivalently, it is assumed that only the strand generated during second strand synthesis is sequenced.

default:

fr-unstranded

choices:

fr-unstranded: fr-unstranded
fr-firststrand: fr-firststrand
fr-secondstrand: fr-secondstrand

annotation_usage

label:

Instruct Cufflinks how to use the provided annotation (GFF/GTF) file

type:

basic:string

description:

--GTF-guide tells Cufflinks to use the supplied reference annotation (GFF) to guide RABT assembly. Reference transcripts will be tiled with faux-reads to provide additional information in assembly. Output will include all reference transcripts as well as any novel genes and isoforms that are assembled. --GTF tells Cufflinks to use the supplied reference annotation (a GFF file) to estimate isoform expression. It will not assemble novel transcripts, and the program will ignore alignments not structurally compatible with any reference transcript.

default:

--GTF-guide

choices:

Use supplied reference annotation to guide RABT assembly (--GTF-guide): --GTF-guide
Use supplied reference annotation to estimate isoform expression (--GTF): --GTF

multi_read_correct

label:: Do initial estimation procedure to more accurately weight reads with multiple genome mappings
type:: basic:boolean
description:: Run an initial estimation procedure that weights reads mapping to multiple locations more accurately.
default:: False
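The inputs above map onto Cufflinks command-line options. The sketch below shows one plausible mapping using option names from the Cufflinks manual (`--library-type`, `-g/--GTF-guide`, `-G/--GTF`, `-b/--frag-bias-correct`, `-M/--mask-file`, `-u/--multi-read-correct`). The function `cufflinks_args` is hypothetical, and the actual process may assemble its command differently, for example by adding thread or output-directory options.

```python
from typing import List, Optional


def cufflinks_args(
    bam: str,
    annotation: Optional[str] = None,
    annotation_usage: str = "--GTF-guide",
    genome: Optional[str] = None,
    mask_file: Optional[str] = None,
    library_type: str = "fr-unstranded",
    multi_read_correct: bool = False,
) -> List[str]:
    """Build a Cufflinks argument list from this process's inputs."""
    if annotation_usage not in ("--GTF-guide", "--GTF"):
        raise ValueError(f"unsupported annotation usage: {annotation_usage}")
    args = ["cufflinks", "--library-type", library_type]
    if annotation:
        args += [annotation_usage, annotation]  # guide assembly or quantify only
    if genome:
        args += ["--frag-bias-correct", genome]  # enables bias detection and correction
    if mask_file:
        args += ["--mask-file", mask_file]
    if multi_read_correct:
        args.append("--multi-read-correct")
    args.append(bam)  # the alignment file is the positional argument
    return args
```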

transcripts

label:: Assembled transcript isoforms
type:: basic:file

isoforms_fpkm_tracking

label:: Isoforms FPKM tracking
type:: basic:file

genes_fpkm_tracking

label:: Genes FPKM tracking
type:: basic:file

skipped_loci

label:: Skipped loci
type:: basic:file

source

label:: Gene ID database
type:: basic:string

species

label:: Species
type:: basic:string

build

label:: Build
type:: basic:string

Cuffmerge

data:annotation:cuffmergecuffmerge (list:data:cufflinks:cufflinks expressions, list:data:annotation:gtf gtf, data:annotation gff, data:seq:nucleotide genome, basic:integer threads)[Source: v2.2.0]

Cufflinks includes a script called Cuffmerge that you can use to merge together several Cufflinks assemblies. It also handles running Cuffcompare for you, and automatically filters a number of transfrags that are probably artifacts. The main purpose of Cuffmerge is to make it easier to make an assembly GTF file suitable for use with Cuffdiff. See [here](http://cole-trapnell-lab.github.io/cufflinks/cuffmerge/) for more information.

expressions

label:: Cufflinks transcripts (GTF)
type:: list:data:cufflinks:cufflinks
required:: False

gtf

label:: Annotation files (GTF)
type:: list:data:annotation:gtf
description:: Annotation files you wish to merge with the Cufflinks-produced annotation files (e.g. an uploaded Cufflinks annotation GTF file).
required:: False

gff

label:: Reference annotation (GTF/GFF3)
type:: data:annotation
description:: An optional “reference” annotation GTF. The input assemblies are merged together with the reference GTF and included in the final output.
required:: False

genome

label:: Reference genome
type:: data:seq:nucleotide
description:: This argument should point to the genomic DNA sequences for the reference. If a directory, it should contain one fasta file per contig. If a multifasta file, all contigs should be present. The merge script will pass this option to cuffcompare, which will use the sequences to assist in classifying transfrags and excluding artifacts (e.g. repeats). For example, Cufflinks transcripts consisting mostly of lower-case bases are classified as repeats. Note that <seq_dir> must contain one fasta file per reference chromosome, and each file must be named after the chromosome, and have a .fa or .fasta extension
required:: False

threads

label:: Use this many processor threads
type:: basic:integer
description:: Use this many threads to merge assemblies. The default is 1.
default:: 1
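Cuffmerge reads the assemblies to merge from a text file that lists one GTF path per line. A minimal sketch of preparing that file and the matching command, assuming the Cuffmerge options `-g/--ref-gtf`, `-s/--ref-sequence`, and `-p/--num-threads` from the Cuffmerge manual; `cuffmerge_command` and the file names are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple


def cuffmerge_command(assemblies: Sequence[str], list_path: str,
                      ref_gtf: Optional[str] = None, genome: Optional[str] = None,
                      threads: int = 1) -> Tuple[List[str], str]:
    """Return (argument list, contents of the assembly list file)."""
    if not assemblies:
        raise ValueError("at least one assembly GTF is required")
    if threads < 1:
        raise ValueError("threads must be at least 1")
    list_text = "\n".join(assemblies) + "\n"  # one GTF path per line
    args = ["cuffmerge", "-p", str(threads)]
    if ref_gtf:
        args += ["-g", ref_gtf]  # reference annotation merged into the output
    if genome:
        args += ["-s", genome]  # passed on to cuffcompare for classifying transfrags
    args.append(list_path)
    return args, list_text
```

The caller writes `list_text` to `list_path` before running the command.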

annot

label:: Merged GTF file
type:: basic:file

source

label:: Gene ID database
type:: basic:string

species

label:: Species
type:: basic:string

build

label:: Build
type:: basic:string

Cuffnorm

data:cuffnormcuffnorm (list:data:cufflinks:cuffquant cuffquant, data:annotation annotation, basic:boolean useERCC)[Source: v2.5.0]

Cufflinks includes a program, Cuffnorm, that you can use to generate tables of expression values that are properly normalized for library size. Cuffnorm takes a GTF2/GFF3 file of transcripts as input, along with two or more SAM, BAM, or CXB files for two or more samples. See [here](http://cole-trapnell-lab.github.io/cufflinks/cuffnorm/) for more information. Replicate relation needs to be defined for Cuffnorm to account for replicates. If the replicate relation is not defined, each sample will be treated individually.

cuffquant

label:: Cuffquant expression file
type:: list:data:cufflinks:cuffquant

annotation

label:: Annotation (GTF/GFF3)
type:: data:annotation
description:: A transcript annotation file produced by cufflinks, cuffcompare, or other source.

useERCC

label:: ERCC spike-in normalization
type:: basic:boolean
description:: Use ERCC spike-in controls for normalization.
default:: False
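Cuffnorm expects the replicates of one sample as a single comma-separated argument and different samples as separate arguments, with optional sample labels passed via `-L/--labels`. The sketch below shows how a replicate relation could be turned into that layout. The `groups` mapping and `cuffnorm_args` are hypothetical; as the description states, files without a replicate relation are treated as individual samples.

```python
from typing import Dict, List, Optional, Sequence


def cuffnorm_args(annotation: str, cxb_files: Sequence[str],
                  groups: Optional[Dict[str, str]] = None) -> List[str]:
    """Build Cuffnorm arguments, merging replicates that share a sample label.

    groups maps a CXB file to its sample label; files missing from it (or all
    files, when groups is None) become samples of their own, labelled by path.
    """
    samples: Dict[str, List[str]] = {}
    for path in cxb_files:
        label = (groups or {}).get(path, path)
        samples.setdefault(label, []).append(path)  # dicts keep insertion order
    if len(samples) < 2:
        raise ValueError("Cuffnorm needs files for at least two samples")
    args = ["cuffnorm", "-L", ",".join(samples), annotation]
    # Replicates of one sample form one comma-separated argument.
    args += [",".join(files) for files in samples.values()]
    return args
```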

genes_count

label:: Genes count
type:: basic:file

genes_fpkm

label:: Genes FPKM
type:: basic:file

genes_attr

label:: Genes attr table
type:: basic:file

isoform_count

label:: Isoform count
type:: basic:file

isoform_fpkm

label:: Isoform FPKM
type:: basic:file

isoform_attr

label:: Isoform attr table
type:: basic:file

cds_count

label:: CDS count
type:: basic:file

cds_fpkm

label:: CDS FPKM
type:: basic:file

cds_attr

label:: CDS attr table
type:: basic:file

tss_groups_count

label:: TSS groups count
type:: basic:file

tss_groups_fpkm

label:: TSS groups FPKM
type:: basic:file

tss_attr

label:: TSS attr table
type:: basic:file

run_info

label:: Run info
type:: basic:file

raw_scatter

label:: FPKM exp scatter plot
type:: basic:file

boxplot

label:: Boxplot
type:: basic:file

fpkm_exp_raw

label:: FPKM exp raw
type:: basic:file

replicate_correlations

label:: Replicate correlations plot
type:: basic:file

fpkm_means

label:: FPKM means
type:: basic:file

exp_fpkm_means

label:: Exp FPKM means
type:: basic:file

norm_scatter

label:: FPKM exp scatter normalized plot
type:: basic:file
required:: False

fpkm_exp_norm

label:: FPKM exp normalized
type:: basic:file
required:: False

spike_raw

label:: Spike raw
type:: basic:file
required:: False

spike_norm

label:: Spike normalized
type:: basic:file
required:: False

R_data

label:: All R normalization data
type:: basic:file

source

label:: Gene ID database
type:: basic:string

species

label:: Species
type:: basic:string

build

label:: Build
type:: basic:string

Cuffquant 2.2

data:cufflinks:cuffquantcuffquant (data:alignment:bam alignment, data:annotation annotation, data:seq:nucleotide genome, data:annotation:gtf mask_file, basic:string library_type, basic:boolean multi_read_correct)[Source: v2.3.1]

Cuffquant allows you to compute the gene and transcript expression profiles and save these profiles to files that you can analyze later with Cuffdiff or Cuffnorm. See [here](http://cole-trapnell-lab.github.io/cufflinks/manual/) for more information.

alignment

label:: Aligned reads
type:: data:alignment:bam

annotation

label:: Annotation (GTF/GFF3)
type:: data:annotation

genome

label:: Run bias detection and correction algorithm
type:: data:seq:nucleotide
description:: Provide Cufflinks with a multifasta file (genome file) via this option to instruct it to run a bias detection and correction algorithm which can significantly improve accuracy of transcript abundance estimates.
required:: False

mask_file

label:: Mask file
type:: data:annotation:gtf
description:: Ignore all reads that could have come from transcripts in this GTF file. We recommend including any annotated rRNA, mitochondrial transcripts, and other abundant transcripts you wish to ignore in your analysis in this file. Due to variable efficiency of mRNA enrichment methods and rRNA depletion kits, masking these transcripts often improves the overall robustness of transcript abundance estimates.
required:: False

library_type

label:

Library type

type:

basic:string

description:

In cases where Cufflinks cannot determine the platform and protocol used to generate input reads, you can supply this information manually, which will allow Cufflinks to infer source strand information with certain protocols. The available options are listed below. For paired-end data, we currently only support protocols where reads point towards each other: fr-unstranded - Reads from the left-most end of the fragment (in transcript coordinates) map to the transcript strand, and the right-most end maps to the opposite strand; fr-firststrand - Same as above except we enforce the rule that the right-most end of the fragment (in transcript coordinates) is the first sequenced (or only sequenced for single-end reads). Equivalently, it is assumed that only the strand generated during first strand synthesis is sequenced; fr-secondstrand - Same as above except we enforce the rule that the left-most end of the fragment (in transcript coordinates) is the first sequenced (or only sequenced for single-end reads). Equivalently, it is assumed that only the strand generated during second strand synthesis is sequenced.

default:

fr-unstranded

choices:

fr-unstranded: fr-unstranded
fr-firststrand: fr-firststrand
fr-secondstrand: fr-secondstrand

multi_read_correct

label:: Do initial estimation procedure to more accurately weight reads with multiple genome mappings
type:: basic:boolean
description:: Run an initial estimation procedure that weights reads mapping to multiple locations more accurately.
default:: False

cxb

label:: Abundances (.cxb)
type:: basic:file

source

label:: Gene ID database
type:: basic:string

species

label:: Species
type:: basic:string

build

label:: Build
type:: basic:string

Cuffquant results

data:cufflinks:cuffquantupload-cxb (basic:file src, basic:string source, basic:string species, basic:string build, basic:string feature_type)[Source: v1.3.3]

Upload Cuffquant results file (.cxb)

src

label:: Cuffquant file
type:: basic:file
description:: Upload Cuffquant results file. Supported extension: *.cxb
required:: True
validate_regex:: \.(cxb)$
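The `validate_regex` value above is a regular expression applied to the uploaded file name. Assuming it is evaluated with Python's `re.search` (an assumption about the platform), the check behaves as sketched below. Note that the pattern is case-sensitive as written. `is_valid_cxb_name` is a hypothetical helper.

```python
import re

# Pattern copied from the validate_regex field above.
CXB_PATTERN = re.compile(r"\.(cxb)$")


def is_valid_cxb_name(filename: str) -> bool:
    """Return True if the file name ends with the .cxb extension."""
    return CXB_PATTERN.search(filename) is not None
```

Under this reading, `sample.cxb` is accepted, while `sample.cxb.gz` and `sample.CXB` are rejected.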

source

label:

Gene ID database

type:

basic:string

choices:

AFFY: AFFY
DICTYBASE: DICTYBASE
ENSEMBL: ENSEMBL
NCBI: NCBI
UCSC: UCSC

species

label:

Species

type:

basic:string

description:

Latin name of the species.

choices:

Homo sapiens: Homo sapiens
Mus musculus: Mus musculus
Rattus norvegicus: Rattus norvegicus
Dictyostelium discoideum: Dictyostelium discoideum
Odocoileus virginianus texanus: Odocoileus virginianus texanus
Solanum tuberosum: Solanum tuberosum

build

label:: Build
type:: basic:string

feature_type

label:

Feature type

type:

basic:string

default:

gene

choices:

gene: gene
transcript: transcript
exon: exon

cxb

label:: Cuffquant results
type:: basic:file

source

label:: Gene ID database
type:: basic:string

species

label:: Species
type:: basic:string

build

label:: Build
type:: basic:string

feature_type

label:: Feature type
type:: basic:string

Cut & Run

data:workflow:cutnrunworkflow-cutnrun (data:reads:fastq:paired reads, basic:integer quality, basic:integer nextseq, basic:string phred, basic:integer min_length, basic:integer max_n, basic:boolean retain_unpaired, basic:integer unpaired_len_1, basic:integer unpaired_len_2, basic:integer clip_r1, basic:integer clip_r2, basic:integer three_prime_r1, basic:integer three_prime_r2, list:basic:string adapter, list:basic:string adapter_2, data:seq:nucleotide adapter_file_1, data:seq:nucleotide adapter_file_2, basic:string universal_adapter, basic:integer stringency, basic:decimal error_rate, basic:integer trim_5, basic:integer trim_3, data:index:bowtie2 genome, basic:string mode, basic:string speed, basic:boolean discordantly, basic:boolean rep_se, basic:integer minins, basic:integer maxins, basic:boolean no_overlap, basic:boolean dovetail, basic:boolean no_unal, data:index:bowtie2 genome, basic:string mode, basic:string speed, basic:boolean discordantly, basic:boolean rep_se, basic:integer minins, basic:integer maxins, basic:boolean no_overlap, basic:boolean dovetail, basic:boolean no_unal, basic:string format, basic:decimal pvalue, basic:string duplicates, basic:boolean bedgraph, basic:integer min_frag_length, basic:integer max_frag_length, basic:decimal scale)[Source: v1.6.0]

Analysis of samples processed for high-resolution mapping of DNA binding sites using a targeted nuclease strategy. The method is called CUT&RUN, which stands for Cleavage Under Targets and Release Using Nuclease. The workflow trims reads with Trim Galore and aligns them with Bowtie2 to the target species genome as well as to a spike-in genome. Aligned reads are processed to produce bigWig files for viewing in a genome browser. Peaks are called using MACS2. Length selection of reads is performed using the alignmentSieve tool from the deepTools package.

reads

label:: Input reads
type:: data:reads:fastq:paired

options_trimming.quality_trim.quality

label:: Quality cutoff
type:: basic:integer
description:: Trim low-quality ends from reads based on Phred score.
required:: False

options_trimming.quality_trim.nextseq

label:: NextSeq/NovaSeq trim cutoff
type:: basic:integer
description:: NextSeq/NovaSeq-specific quality trimming. Also trims dark cycles appearing as high-quality G bases. This will set a specific quality cutoff, but qualities of G bases are ignored. This cannot be used with Quality cutoff and will override it.
required:: False
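Trim Galore performs quality trimming through Cutadapt, which documents its 3' quality trimming as the BWA algorithm: scanning from the 3' end, it accumulates cutoff minus quality for each base and cuts where that running sum peaks. For NextSeq/NovaSeq trimming the qualities of G bases are ignored. The sketch below models this by treating every G as having a quality just below the cutoff, which is an assumption about how "ignored" is implemented. `quality_trim_index` is a hypothetical name.

```python
def quality_trim_index(bases: str, qualities: str, cutoff: int,
                       nextseq: bool = False, offset: int = 33) -> int:
    """Return how many 5' bases to keep after 3' quality trimming.

    BWA-style: walk from the 3' end accumulating (cutoff - quality) and cut
    where the running sum peaks. With nextseq=True, each G is treated as if
    its quality were cutoff - 1, so trailing G runs are trimmed even when
    they are reported with high quality.
    """
    if len(bases) != len(qualities):
        raise ValueError("bases and qualities must have the same length")
    running = 0
    best = 0
    keep = len(qualities)
    for i in range(len(qualities) - 1, -1, -1):
        q = ord(qualities[i]) - offset
        if nextseq and bases[i] == "G":
            q = cutoff - 1
        running += cutoff - q
        if running < 0:
            break  # quality has recovered; nothing further upstream is trimmed
        if running > best:
            best = running
            keep = i
    return keep
```

With a cutoff of 20, a read ending in GGGG reported at quality 40 keeps those bases under plain quality trimming but loses them under NextSeq trimming.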

options_trimming.quality_trim.phred

label:

Phred score encoding

type:

basic:string

description:

Use either ASCII+33 quality scores as Phred scores (Sanger/Illumina 1.9+ encoding) or ASCII+64 quality scores (Illumina 1.5 encoding) for quality trimming.

default:

--phred33

choices:

ASCII+33: --phred33
ASCII+64: --phred64
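The two encodings differ only in the ASCII offset subtracted from each quality character, so choosing the wrong one shifts every score by 31. A minimal sketch; `decode_qualities` is a hypothetical helper, not part of Trim Galore.

```python
from typing import List


def decode_qualities(quality_string: str, offset: int = 33) -> List[int]:
    """Convert a FASTQ quality string into Phred scores.

    offset is 33 for --phred33 (Sanger/Illumina 1.9+) and 64 for --phred64
    (Illumina 1.5).
    """
    if offset not in (33, 64):
        raise ValueError("offset must be 33 or 64")
    scores = [ord(ch) - offset for ch in quality_string]
    if any(score < 0 for score in scores):
        raise ValueError("quality character lies below the encoding offset")
    return scores
```

For instance, `I` decodes to 40 with `offset=33` but to 9 with `offset=64`, which would make good bases look poor to the trimmer.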

options_trimming.quality_trim.min_length

label:: Minimum length after trimming
type:: basic:integer
description:: Discard reads that became shorter than selected length because of either quality or adapter trimming. Both reads of a read-pair need to be longer than specified length to be printed out to validated paired-end files. If only one read became too short there is the possibility of keeping such unpaired single-end reads with Retain unpaired. A value of 0 disables filtering based on length.
default:: 20

options_trimming.quality_trim.max_n

label:: Maximum number of Ns
type:: basic:integer
description:: A read exceeding this number of Ns will result in the entire pair being removed from the trimmed output files.
required:: False

options_trimming.quality_trim.retain_unpaired

label:: Retain unpaired reads after trimming
type:: basic:boolean
description:: If only one of the two paired-end reads became too short, the longer read will be written.
default:: False

options_trimming.quality_trim.unpaired_len_1

label:: Unpaired read length cutoff for mate 1
type:: basic:integer
hidden:: !quality_trim.retain_unpaired
default:: 35

options_trimming.quality_trim.unpaired_len_2

label:: Unpaired read length cutoff for mate 2
type:: basic:integer
hidden:: !quality_trim.retain_unpaired
default:: 35
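Taken together, the length, N, and unpaired-read options decide where each trimmed pair ends up. A minimal sketch of that routing, assuming the behavior described above (a read passes a cutoff when it is at least that long; defaults as listed). `route_pair` is hypothetical and simplifies Trim Galore's actual handling.

```python
from typing import Optional, Tuple

Routed = Tuple[Optional[Tuple[str, str]], Optional[str], Optional[str]]


def route_pair(r1: str, r2: str, min_length: int = 20, max_n: Optional[int] = None,
               retain_unpaired: bool = False, unpaired_len_1: int = 35,
               unpaired_len_2: int = 35) -> Routed:
    """Return (validated pair, unpaired mate 1, unpaired mate 2); None means not written."""
    # Either mate exceeding the N limit removes the entire pair.
    if max_n is not None and (r1.count("N") > max_n or r2.count("N") > max_n):
        return None, None, None
    # Both mates must pass the minimum length; min_length=0 lets every pair through.
    if len(r1) >= min_length and len(r2) >= min_length:
        return (r1, r2), None, None
    if not retain_unpaired:
        return None, None, None
    # Otherwise a mate long enough for its own unpaired cutoff is kept on its own.
    keep_1 = r1 if len(r1) >= unpaired_len_1 else None
    keep_2 = r2 if len(r2) >= unpaired_len_2 else None
    return None, keep_1, keep_2
```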

options_trimming.quality_trim.clip_r1

label:: Trim bases from 5’ end of read 1
type:: basic:integer
description:: This may be useful if the qualities were very poor, or if there is some sort of unwanted bias at the 5’ end.
required:: False

options_trimming.quality_trim.clip_r2

label:: Trim bases from 5’ end of read 2
type:: basic:integer
description:: This may be useful if the qualities were very poor, or if there is some sort of unwanted bias at the 5’ end. For paired-end bisulfite sequencing, it is recommended to remove the first few bp because the end-repair reaction may introduce a bias towards low methylation.
required:: False

options_trimming.quality_trim.three_prime_r1

label:: Trim bases from 3’ end of read 1
type:: basic:integer
description:: Remove bases from the 3’ end of read 1 after adapter/quality trimming has been performed. This may remove some unwanted bias from the 3’ end that is not directly related to adapter sequence or basecall quality.
required:: False

options_trimming.quality_trim.three_prime_r2

label:: Trim bases from 3’ end of read 2
type:: basic:integer
description:: Remove bases from the 3’ end of read 2 after adapter/quality trimming has been performed. This may remove some unwanted bias from the 3’ end that is not directly related to adapter sequence or basecall quality.
required:: False

options_trimming.adapter_trim.adapter

label:: Read 1 adapter sequence
type:: list:basic:string
description:: Adapter sequences to be trimmed. Also see universal adapters field for predefined adapters. This is mutually exclusive with read 1 adapters file and universal adapters.
required:: False

options_trimming.adapter_trim.adapter_2

label:: Read 2 adapter sequence
type:: list:basic:string
description:: Optional adapter sequence to be trimmed off read 2 of paired-end files. This is mutually exclusive with read 2 adapters file and universal adapters.
required:: False

options_trimming.adapter_trim.adapter_file_1

label:: Read 1 adapters file
type:: data:seq:nucleotide
description:: This is mutually exclusive with read 1 adapters and universal adapters.
required:: False

options_trimming.adapter_trim.adapter_file_2

label:: Read 2 adapters file
type:: data:seq:nucleotide
description:: This is mutually exclusive with read 2 adapters and universal adapters.
required:: False

options_trimming.adapter_trim.universal_adapter

label:

Universal adapters

type:

basic:string

description:

Instead of automatic adapter detection, use a specific adapter: 13 bp of the Illumina universal adapter, 12 bp of the Nextera adapter, or 12 bp of the Illumina Small RNA 3’ adapter. Selecting to trim smallRNA adapters will also lower the length value to 18 bp. If the smallRNA libraries are paired-end, the read 2 adapter will be set to the Illumina small RNA 5’ adapter automatically (GATCGTCGGACT) unless defined explicitly. This is mutually exclusive with manually defined adapters and adapter files.

required:

False

choices:

Illumina: --illumina
Nextera: --nextera
Illumina small RNA: --small_rna

options_trimming.adapter_trim.stringency

label:: Overlap with adapter sequence required to trim
type:: basic:integer
description:: Defaults to a very stringent setting of 1, i.e. even a single base pair of overlapping sequence will be trimmed off the 3’ end of any read.
default:: 1

options_trimming.adapter_trim.error_rate

label:: Maximum allowed error rate
type:: basic:decimal
description:: Number of errors divided by the length of the matching region. Default value of 0.1.
default:: 0.1
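The stringency and error-rate settings interact as follows: an adapter is accepted where it overlaps the read by at least the stringency value, and the number of allowed errors is the error rate times the length of the matching region, rounded down. The sketch below is a deliberately simplified model of the Cutadapt-style 3' adapter search that Trim Galore relies on. It counts only mismatches, whereas Cutadapt also allows insertions and deletions, and it returns the leftmost acceptable match instead of Cutadapt's scored best match. `find_adapter_start` is hypothetical.

```python
def find_adapter_start(read: str, adapter: str, min_overlap: int = 1,
                       error_rate: float = 0.1) -> int:
    """Return the position where a 3' adapter starts in read, or len(read) if none.

    The adapter may lie fully inside the read or run off its 3' end, in which
    case only the overlapping prefix of the adapter is compared.
    """
    for start in range(len(read)):
        overlap = min(len(adapter), len(read) - start)
        if overlap < min_overlap:
            break  # later start positions only shrink the overlap further
        mismatches = sum(1 for a, b in zip(read[start:start + overlap], adapter) if a != b)
        # Allowed errors scale with the length of the matching region.
        if mismatches <= int(error_rate * overlap):
            return start
    return len(read)
```

With the 13 bp Illumina universal adapter `AGATCGGAAGAGC` and a stringency of 1, the read `CCCCA` loses its final base because the single `A` matches the first adapter base; a stringency of 3 leaves it intact.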

options_trimming.hard_trim.trim_5

label:: Hard trim sequences from 3’ end
type:: basic:integer
description:: Instead of performing adapter-/quality trimming, this option will simply hard-trim sequences to the given length (in bp) by removing bases from the 3’ end. This is incompatible with other hard trimming options.
required:: False

options_trimming.hard_trim.trim_3

label:: Hard trim sequences from 5’ end
type:: basic:integer
description:: Instead of performing adapter-/quality trimming, this option will simply hard-trim sequences to the given length (in bp) by removing bases from the 5’ end. This is incompatible with other hard trimming options.
required:: False
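Read together with the labels, `trim_5` keeps the given number of bases counted from the 5’ end (removing the rest from the 3’ end), and `trim_3` keeps them counted from the 3’ end. A minimal sketch under that reading. `hard_trim` is hypothetical, and reads already shorter than the target length are returned unchanged here because the description does not say how they are handled.

```python
from typing import Optional


def hard_trim(seq: str, trim_5: Optional[int] = None, trim_3: Optional[int] = None) -> str:
    """Hard-trim a read to a fixed length (the two options are mutually exclusive)."""
    if trim_5 is not None and trim_3 is not None:
        raise ValueError("hard trimming options are incompatible with each other")
    if (trim_5 or 0) < 0 or (trim_3 or 0) < 0:
        raise ValueError("trim lengths must be non-negative")
    if trim_5 is not None:
        return seq[:trim_5]  # keep the 5' bases, removing from the 3' end
    if trim_3 is not None:
        # Keep the 3' bases, removing from the 5' end.
        return seq[len(seq) - trim_3:] if trim_3 < len(seq) else seq
    return seq
```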

options_aln_species.genome

label:: Species genome
type:: data:index:bowtie2

options_aln_species.mode

label:

Alignment mode

type:

basic:string

description:

End to end: Bowtie 2 requires that the entire read align from one end to the other, without any trimming (or “soft clipping”) of characters from either end. Local: Bowtie 2 does not require that the entire read align from one end to the other. Rather, some characters may be omitted (“soft clipped”) from the ends in order to achieve the greatest possible alignment score.

default:

--local

choices:

end to end mode: --end-to-end
local: --local

options_aln_species.speed

label:

Speed vs. Sensitivity

type:

basic:string

description:

A quick setting for aligning fast or accurately. This option is a shortcut for the following parameters (the Bowtie 2 defaults are --sensitive and --sensitive-local):
For --end-to-end:
--very-fast: -D 5 -R 1 -N 0 -L 22 -i S,0,2.50
--fast: -D 10 -R 2 -N 0 -L 22 -i S,0,2.50
--sensitive: -D 15 -R 2 -N 0 -L 22 -i S,1,1.15
--very-sensitive: -D 20 -R 3 -N 0 -L 20 -i S,1,0.50
For --local:
--very-fast-local: -D 5 -R 1 -N 0 -L 25 -i S,1,2.00
--fast-local: -D 10 -R 2 -N 0 -L 22 -i S,1,1.75
--sensitive-local: -D 15 -R 2 -N 0 -L 20 -i S,1,0.75
--very-sensitive-local: -D 20 -R 3 -N 0 -L 20 -i S,1,0.50

default:

--very-sensitive

choices:

Very fast: --very-fast
Fast: --fast
Sensitive: --sensitive
Very sensitive: --very-sensitive
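The preset table in the description can be expressed as a lookup, together with the seed interval function that the `-i` values encode. Bowtie 2 documents `-i S,a,b` as the function f(x) = a + b·√x of the read length x, with `C`, `L`, and `G` for constant, linear, and natural-log forms, and rounds results below 1 up to 1. The sketch assumes that in `--local` mode the `-local` variant of the chosen preset applies. `PRESETS`, `preset_parameters`, and `seed_interval` are hypothetical names.

```python
import math
from typing import Dict, Tuple

# (mode, preset) -> parameters, copied from the description above.
PRESETS: Dict[Tuple[str, str], str] = {
    ("--end-to-end", "--very-fast"): "-D 5 -R 1 -N 0 -L 22 -i S,0,2.50",
    ("--end-to-end", "--fast"): "-D 10 -R 2 -N 0 -L 22 -i S,0,2.50",
    ("--end-to-end", "--sensitive"): "-D 15 -R 2 -N 0 -L 22 -i S,1,1.15",
    ("--end-to-end", "--very-sensitive"): "-D 20 -R 3 -N 0 -L 20 -i S,1,0.50",
    ("--local", "--very-fast"): "-D 5 -R 1 -N 0 -L 25 -i S,1,2.00",
    ("--local", "--fast"): "-D 10 -R 2 -N 0 -L 22 -i S,1,1.75",
    ("--local", "--sensitive"): "-D 15 -R 2 -N 0 -L 20 -i S,1,0.75",
    ("--local", "--very-sensitive"): "-D 20 -R 3 -N 0 -L 20 -i S,1,0.50",
}


def preset_parameters(mode: str, speed: str) -> str:
    """Return the parameters that a speed preset stands for in a given mode."""
    try:
        return PRESETS[(mode, speed)]
    except KeyError:
        raise ValueError(f"unknown mode/preset combination: {mode} {speed}") from None


def seed_interval(func: str, read_length: int) -> float:
    """Evaluate a function string such as 'S,1,1.15' (three fields assumed)."""
    kind, a_text, b_text = func.split(",")
    a, b, x = float(a_text), float(b_text), float(read_length)
    if x <= 0:
        raise ValueError("read length must be positive")
    forms = {"C": a, "L": a + b * x, "S": a + b * math.sqrt(x), "G": a + b * math.log(x)}
    if kind not in forms:
        raise ValueError(f"unknown function type: {kind}")
    return max(1.0, forms[kind])  # results below 1 are rounded up to 1
```

For a 100 bp read, the end-to-end --very-sensitive setting (S,1,0.50) gives a seed interval of 1 + 0.5·10 = 6.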

options_aln_species.discordantly

label:: Report discordantly matched read
type:: basic:boolean
description:: If both mates have unique alignments, but the alignments do not match paired-end expectations (orientation and relative distance) then alignment will be reported. Useful for detecting structural variations.
default:: True

options_aln_species.rep_se

label:: Report single ended
type:: basic:boolean
description:: If a paired alignment cannot be found, Bowtie2 tries to find alignments for the individual mates. Default is true. The related Bowtie2 option is --no-mixed, which disables this behavior.
default:: True

options_aln_species.minins

label:: Minimal distance
type:: basic:integer
description:: The minimum fragment length (--minins) for valid paired-end alignments. Value 0 imposes no minimum.
default:: 10

options_aln_species.maxins

label:: Maximal distance
type:: basic:integer
description:: The maximum fragment length (--maxins) for valid paired-end alignments.
default:: 700

options_aln_species.no_overlap

label:: Not concordant when mates overlap
type:: basic:boolean
description:: When true, mates that overlap at all are not considered concordant (Bowtie2 option --no-overlap).
default:: False

options_aln_species.dovetail

label:: Dovetail
type:: basic:boolean
description:: If the mates “dovetail”, that is if one mate alignment extends past the beginning of the other such that the wrong mate begins upstream, consider that to be concordant. Default: mates cannot dovetail in a concordant alignment. If true, parameter --dovetail is turned on.
default:: False

options_aln_species.no_unal

label:: Suppress SAM records for unaligned reads
type:: basic:boolean
description:: When true, suppress SAM records for unaligned reads. Default is true (--no-unal).
default:: True
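The distance, overlap, and dovetail options jointly decide whether a pair counts as concordant. The sketch below is a simplified model that assumes Bowtie 2's default `--fr` mate orientation and 0-based, half-open alignment intervals. It ignores containment options and other details of Bowtie 2's actual logic, and `is_concordant` is hypothetical.

```python
from typing import Tuple

Interval = Tuple[int, int]  # 0-based, half-open [start, end) on the reference


def is_concordant(fwd: Interval, rev: Interval, minins: int = 10, maxins: int = 700,
                  no_overlap: bool = False, dovetail: bool = False) -> bool:
    """Simplified --fr concordance test; fwd is the '+' mate, rev the '-' mate."""
    if fwd[0] >= fwd[1] or rev[0] >= rev[1]:
        raise ValueError("alignment intervals must be non-empty")
    # Fragment length spans the outermost ends of both mates.
    length = max(fwd[1], rev[1]) - min(fwd[0], rev[0])
    if not minins <= length <= maxins:
        return False
    if rev[1] <= fwd[0]:
        return False  # reverse mate lies wholly upstream: wrong orientation for --fr
    if no_overlap and fwd[0] < rev[1] and rev[0] < fwd[1]:
        return False  # any overlap disqualifies the pair
    # Dovetailing: the reverse ("wrong") mate begins upstream of the forward mate.
    wrong_mate_first = rev[0] < fwd[0] and rev[1] < fwd[1]
    return dovetail or not wrong_mate_first
```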

options_aln_spikein.genome

label:: Spike-in genome
type:: data:index:bowtie2

options_aln_spikein.mode

label:

Alignment mode

type:

basic:string

description:

End to end: Bowtie 2 requires that the entire read align from one end to the other, without any trimming (or “soft clipping”) of characters from either end. Local: Bowtie 2 does not require that the entire read align from one end to the other. Rather, some characters may be omitted (“soft clipped”) from the ends in order to achieve the greatest possible alignment score.

default:

--local

choices:

end to end mode: --end-to-end
local: --local

options_aln_spikein.speed

label:

Speed vs. Sensitivity

type:

basic:string

description:

A quick setting for aligning fast or accurately. This option is a shortcut for the following parameters (the Bowtie 2 defaults are --sensitive and --sensitive-local):
For --end-to-end:
--very-fast: -D 5 -R 1 -N 0 -L 22 -i S,0,2.50
--fast: -D 10 -R 2 -N 0 -L 22 -i S,0,2.50
--sensitive: -D 15 -R 2 -N 0 -L 22 -i S,1,1.15
--very-sensitive: -D 20 -R 3 -N 0 -L 20 -i S,1,0.50
For --local:
--very-fast-local: -D 5 -R 1 -N 0 -L 25 -i S,1,2.00
--fast-local: -D 10 -R 2 -N 0 -L 22 -i S,1,1.75
--sensitive-local: -D 15 -R 2 -N 0 -L 20 -i S,1,0.75
--very-sensitive-local: -D 20 -R 3 -N 0 -L 20 -i S,1,0.50

default:

--very-sensitive

choices:

Very fast: --very-fast
Fast: --fast
Sensitive: --sensitive
Very sensitive: --very-sensitive

options_aln_spikein.discordantly

label:: Report discordantly matched read
type:: basic:boolean
description:: If both mates have unique alignments, but the alignments do not match paired-end expectations (orientation and relative distance) then alignment will be reported. Useful for detecting structural variations.
default:: True

options_aln_spikein.rep_se

label:: Report single ended
type:: basic:boolean
description:: If a paired alignment cannot be found, Bowtie2 tries to find alignments for the individual mates. Default is true. The related Bowtie2 option is --no-mixed, which disables this behavior.
default:: True

options_aln_spikein.minins

label:: Minimal distance
type:: basic:integer
description:: The minimum fragment length (--minins) for valid paired-end alignments. Value 0 imposes no minimum.
default:: 10

options_aln_spikein.maxins

label:: Maximal distance
type:: basic:integer
description:: The maximum fragment length (--maxins) for valid paired-end alignments.
default:: 700

options_aln_spikein.no_overlap

label:: Not concordant when mates overlap
type:: basic:boolean
description:: When true, mates that overlap at all are not considered concordant (Bowtie2 option --no-overlap).
default:: True

options_aln_spikein.dovetail

label:: Dovetail
type:: basic:boolean
description:: If the mates “dovetail”, that is if one mate alignment extends past the beginning of the other such that the wrong mate begins upstream, consider that to be concordant. Default: mates cannot dovetail in a concordant alignment. If true, parameter --dovetail is turned on.
default:: False

options_aln_spikein.no_unal

label:: Suppress SAM records for unaligned reads
type:: basic:boolean
description:: When true, suppress SAM records for unaligned reads. Default is true (--no-unal).
default:: True

options_pc.format

label:

Format of tag file

type:

basic:string

description:

This specifies the format of input files. For paired-end data the format dictates how MACS2 will treat mates. If the selected format is BAM, MACS2 will only keep the left mate (5’ end) tag. However, when format BAMPE is selected, MACS2 will use actual insert sizes of pairs of reads to build the fragment pileup, instead of building a bimodal distribution of plus- and minus-strand reads to predict fragment size.

required:

False

default:

BAMPE

choices:

BAM: BAM
BAMPE: BAMPE

options_pc.pvalue

label:: P-value cutoff
type:: basic:decimal
description:: The p-value cutoff.
required:: False
default:: 0.001

options_pc.duplicates

label:

Number of duplicates

type:

basic:string

description:

It controls the MACS behavior towards duplicate tags at the exact same location (the same coordinate and the same strand). The ‘auto’ option makes MACS calculate the maximum number of tags at the exact same location based on a binomial distribution using 1e-5 as the p-value cutoff, and the ‘all’ option keeps all the tags. If an integer is given, at most this number of tags will be kept at the same location. MACS's own default is to keep one tag at the same location.

default:

all

choices:

1: 1
auto: auto
all: all
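The sketch below shows how an ‘auto’ threshold can be derived from such a binomial model. It assumes the number of tags at one position follows a binomial distribution with n equal to the total number of tags and p equal to one over the effective genome size; these modeling details are assumptions for illustration rather than taken from this process. `max_dup_tags` is hypothetical.

```python
import math


def max_dup_tags(genome_size: float, total_tags: int, pvalue: float = 1e-5) -> int:
    """Maximum number of duplicate tags to keep at one position ('auto' sketch).

    Models the tag count at a single position as Binomial(n=total_tags,
    p=1/genome_size) and returns the smallest k whose cumulative probability
    exceeds 1 - pvalue.
    """
    if genome_size <= 1 or total_tags < 0:
        raise ValueError("genome_size must exceed 1 and total_tags be non-negative")
    p = 1.0 / genome_size
    target = 1.0 - pvalue
    pmf = math.exp(total_tags * math.log1p(-p))  # P(X = 0)
    cdf = pmf
    k = 0
    while cdf <= target and k < total_tags:
        # P(X = k + 1) = P(X = k) * (n - k) / (k + 1) * p / (1 - p)
        pmf *= (total_tags - k) / (k + 1) * p / (1.0 - p)
        k += 1
        cdf += pmf
    return k
```

Under this model, 20 million tags on a 2.7 Gb genome give a threshold of 2 tags per position.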

options_pc.bedgraph

label:: Save fragment pileup and control lambda
type:: basic:boolean
description:: If this flag is on, MACS will store the fragment pileup, control lambda, -log10(pvalue) and -log10(qvalue) scores in bedGraph files. The bedGraph files will be stored in current directory named NAME+’_treat_pileup.bdg’ for treatment data, NAME+’_control_lambda.bdg’ for local lambda values from control, NAME+’_treat_pvalue.bdg’ for Poisson pvalue scores (in -log10(pvalue) form), and NAME+’_treat_qvalue.bdg’ for q-value scores from Benjamini-Hochberg-Yekutieli procedure.
default:: True

options_sieve.min_frag_length

label:: Minimum fragment length
type:: basic:integer
description:: The minimum fragment length needed for read/pair inclusion. This option is primarily useful in ATAC-seq experiments, for filtering mono- or di-nucleosome fragments. Default is 0.
default:: 0

options_sieve.max_frag_length

label:: Maximum fragment length
type:: basic:integer
description:: The maximum fragment length needed for read/pair inclusion. A value of 0 indicates no limit. Default is 0.
default:: 0
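The two length bounds combine into a simple filter in which 0 disables a bound. A minimal sketch, assuming inclusive bounds and a fragment length taken from the absolute value of the SAM/BAM template length (TLEN), which is negative for the rightmost mate. `keep_fragment` is hypothetical and not deepTools code.

```python
def keep_fragment(tlen: int, min_frag_length: int = 0, max_frag_length: int = 0) -> bool:
    """Return True if a read pair's fragment passes the length filter."""
    length = abs(tlen)  # TLEN of the rightmost mate is negative
    if min_frag_length and length < min_frag_length:
        return False
    if max_frag_length and length > max_frag_length:
        return False
    return True  # a bound of 0 imposes no limit
```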

options_scale.scale

label:: Scale factor
type:: basic:decimal
description:: Magnitude of the scale factor. The scaling factor is calculated by dividing the scale by the number of features in BEDPE (scale/(number of features)).
default:: 10000
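As a worked example of the formula, the sketch below computes scale / (number of features in BEDPE). It assumes each non-empty line of the BEDPE file is one feature and that lines starting with `#` are comments, which is not stated above. `scale_factor` is hypothetical.

```python
from typing import Iterable


def scale_factor(bedpe_lines: Iterable[str], scale: float = 10000.0) -> float:
    """Return scale / (number of BEDPE features)."""
    features = sum(1 for line in bedpe_lines if line.strip() and not line.startswith("#"))
    if features == 0:
        raise ValueError("BEDPE file contains no features")
    return scale / features
```

With the default scale of 10000 and 2,000,000 fragments in the BEDPE file, the factor is 10000 / 2,000,000 = 0.005.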

Cutadapt (3’ mRNA-seq, single-end)

data:reads:fastq:single:cutadapt:cutadapt-3prime-single (data:reads:fastq:single reads, basic:integer nextseq_trim, basic:integer quality_cutoff, basic:integer min_len, basic:integer min_overlap, basic:integer times)[Source: v1.4.2]

Process 3’ mRNA-seq datasets using the Cutadapt tool.

reads

label:: Select sample(s)
type:: data:reads:fastq:single
required:: True
disabled:: False
hidden:: False

options.nextseq_trim

label:: NextSeq/NovaSeq trim
type:: basic:integer
description:: NextSeq/NovaSeq-specific quality trimming. Also trims dark cycles that appear as high-quality G bases. This option is mutually exclusive with standard quality-cutoff trimming and is suited to data generated by recent Illumina machines that use two-color chemistry to encode the four bases.
required:: True
disabled:: False
hidden:: False
default:: 10

options.quality_cutoff

label:: Quality cutoff
type:: basic:integer
description:: Trim low-quality bases from the 3’ end of each read before adapter removal. Setting this option overrides the NextSeq/NovaSeq trim option.
required:: False
disabled:: False
hidden:: False
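Both trimming modes can be sketched in Python, modeled on the approach Cutadapt describes: BWA-style 3’ trimming that accumulates (cutoff - quality) from the 3’ end and cuts where the running sum peaks, with NextSeq/NovaSeq mode treating every G as if its quality were just below the cutoff. This mirrors our reading of Cutadapt’s algorithm and is not its code; `quality_trim_index` is a hypothetical name.

```python
def quality_trim_index(bases: str, qualities: str, cutoff: int,
                       nextseq: bool = False, phred_offset: int = 33) -> int:
    """Return the index at which to cut the 3' end of a read.

    Scan from the 3' end, accumulating (cutoff - q); stop when the sum
    turns negative and cut where it was largest. With nextseq=True, each
    G is given quality cutoff - 1, so high-quality-looking G runs
    (dark cycles) are trimmed as well.
    """
    running = 0
    best = 0
    cut = len(qualities)
    for i in reversed(range(len(qualities))):
        q = ord(qualities[i]) - phred_offset
        if nextseq and bases[i] == "G":
            q = cutoff - 1
        running += cutoff - q
        if running < 0:
            break
        if running > best:
            best = running
            cut = i
    return cut


read, qual = "ACGTGGGG", "IIIIIIII"  # 'I' encodes Phred 40
print(read[:quality_trim_index(read, qual, cutoff=10, nextseq=True)])  # ACGT
```

With `nextseq=False` the same read is kept whole, because every base is above the cutoff; this is why a plain quality cutoff misses dark-cycle G runs.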

options.min_len

label:: Discard reads shorter than the specified minimum length.
type:: basic:integer
required:: True
disabled:: False
hidden:: False
default:: 20

options.min_overlap

label:: Minimum overlap
type:: basic:integer
description:: Minimum overlap between adapter and read for an adapter to be found.
required:: True
disabled:: False
hidden:: False
default:: 20

options.times

label:: Remove up to a specified number of adapters from each read.
type:: basic:integer
required:: True
disabled:: False
hidden:: False
default:: 2

fastq

label:: Reads file.
type:: list:basic:file
required:: True
disabled:: False
hidden:: False

report

label:: Cutadapt report
type:: basic:file
required:: True
disabled:: False
hidden:: False

fastqc_url

label:: Quality control with FastQC.
type:: list:basic:file:html
required:: True
disabled:: False
hidden:: False

fastqc_archive

label:: Download FastQC archive.
type:: list:basic:file
required:: True
disabled:: False
hidden:: False

Cutadapt (Corall RNA-Seq, paired-end)

data:reads:fastq:paired:cutadapt:cutadapt-corall-paired (data:reads:fastq:paired reads, basic:integer nextseq_trim, basic:integer quality_cutoff, basic:integer min_len, basic:integer min_overlap)[Source: v1.3.2]

Pre-process reads obtained using CORALL Total RNA-Seq Library Prep Kit. Trim UMI-tags from input reads and use Cutadapt to remove adapters and run QC filtering steps.

reads

label:: Select sample(s)
type:: data:reads:fastq:paired
required:: True
disabled:: False
hidden:: False

options.nextseq_trim

label:: NextSeq/NovaSeq trim
type:: basic:integer
description:: NextSeq/NovaSeq-specific quality trimming. Also trims dark cycles that appear as high-quality G bases. This option is mutually exclusive with standard quality-cutoff trimming and is suited to data generated by recent Illumina machines that use two-color chemistry to encode the four bases.
required:: True
disabled:: False
hidden:: False
default:: 10

options.quality_cutoff

label:: Quality cutoff
type:: basic:integer
description:: Trim low-quality bases from the 3’ end of each read before adapter removal. Setting this option overrides the NextSeq/NovaSeq trim option.
required:: False
disabled:: False
hidden:: False

options.min_len

label:: Minimum read length
type:: basic:integer
required:: True
disabled:: False
hidden:: False
default:: 20

options.min_overlap

label:: Minimum overlap
type:: basic:integer
description:: Minimum overlap between adapter and read for an adapter to be found.
required:: True
disabled:: False
hidden:: False
default:: 20

fastq

label:: Remaining mate1 reads
type:: list:basic:file
required:: True
disabled:: False
hidden:: False

fastq2

label:: Remaining mate2 reads
type:: list:basic:file
required:: True
disabled:: False
hidden:: False

report

label:: Cutadapt report
type:: basic:file
required:: True
disabled:: False
hidden:: False

fastqc_url

label:: Mate1 quality control with FastQC
type:: list:basic:file:html
required:: True
disabled:: False
hidden:: False

fastqc_url2

label:: Mate2 quality control with FastQC
type:: list:basic:file:html
required:: True
disabled:: False
hidden:: False

fastqc_archive

label:: Download mate1 FastQC archive
type:: list:basic:file
required:: True
disabled:: False
hidden:: False

fastqc_archive2

label:: Download mate2 FastQC archive
type:: list:basic:file
required:: True
disabled:: False
hidden:: False

Cutadapt (Corall RNA-Seq, single-end)

data:reads:fastq:single:cutadapt:cutadapt-corall-single (data:reads:fastq:single reads, basic:integer nextseq_trim, basic:integer quality_cutoff, basic:integer min_len, basic:integer min_overlap)[Source: v1.4.2]

Pre-process reads obtained using CORALL Total RNA-Seq Library Prep Kit. Trim UMI-tags from input reads and use Cutadapt to remove adapters and run QC filtering steps.

reads

label:: Select sample(s)
type:: data:reads:fastq:single
required:: True
disabled:: False
hidden:: False

options.nextseq_trim

label:: NextSeq/NovaSeq trim
type:: basic:integer
description:: NextSeq/NovaSeq-specific quality trimming. Also trims dark cycles that appear as high-quality G bases. This option is mutually exclusive with standard quality-cutoff trimming and is suited to data generated by recent Illumina machines that use two-color chemistry to encode the four bases.
required:: True
disabled:: False
hidden:: False
default:: 10

options.quality_cutoff

label:: Quality cutoff
type:: basic:integer
description:: Trim low-quality bases from the 3’ end of each read before adapter removal. Setting this option overrides the NextSeq/NovaSeq trim option.
required:: False
disabled:: False
hidden:: False

options.min_len

label:: Minimum read length
type:: basic:integer
required:: True
disabled:: False
hidden:: False
default:: 20

options.min_overlap

label:: Minimum overlap
type:: basic:integer
description:: Minimum overlap between adapter and read for an adapter to be found.
required:: True
disabled:: False
hidden:: False
default:: 20

fastq

label:: Reads file
type:: list:basic:file
required:: True
disabled:: False
hidden:: False

report

label:: Cutadapt report
type:: basic:file
required:: True
disabled:: False
hidden:: False

fastqc_url

label:: Quality control with FastQC
type:: list:basic:file:html
required:: True
disabled:: False
hidden:: False

fastqc_archive

label:: Download FastQC archive
type:: list:basic:file
required:: True
disabled:: False
hidden:: False

Cutadapt (paired-end)

data:reads:fastq:paired:cutadaptcutadapt-paired (data:reads:fastq:paired reads, data:seq:nucleotide mate1_5prime_file, data:seq:nucleotide mate1_3prime_file, data:seq:nucleotide mate2_5prime_file, data:seq:nucleotide mate2_3prime_file, list:basic:string mate1_5prime_seq, list:basic:string mate1_3prime_seq, list:basic:string mate2_5prime_seq, list:basic:string mate2_3prime_seq, basic:integer times, basic:decimal error_rate, basic:integer min_overlap, basic:boolean match_read_wildcards, basic:boolean no_indels, basic:integer nextseq_trim, basic:integer leading, basic:integer trailing, basic:integer crop, basic:integer headcrop, basic:integer minlen, basic:integer maxlen, basic:integer max_n, basic:string pair_filter)[Source: v2.7.2]

Cutadapt finds and removes adapter sequences, primers, poly-A tails and other types of unwanted sequence from high-throughput sequencing reads. More information about Cutadapt can be found [here](http://cutadapt.readthedocs.io/en/stable/).

reads

label:: Select sample(s)
type:: data:reads:fastq:paired

adapters.mate1_5prime_file

label:: 5 prime adapter file for Mate 1
type:: data:seq:nucleotide
required:: False

adapters.mate1_3prime_file

label:: 3 prime adapter file for Mate 1
type:: data:seq:nucleotide
required:: False

adapters.mate2_5prime_file

label:: 5 prime adapter file for Mate 2
type:: data:seq:nucleotide
required:: False

adapters.mate2_3prime_file

label:: 3 prime adapter file for Mate 2
type:: data:seq:nucleotide
required:: False

adapters.mate1_5prime_seq

label:: 5 prime adapter sequence for Mate 1
type:: list:basic:string
required:: False

adapters.mate1_3prime_seq

label:: 3 prime adapter sequence for Mate 1
type:: list:basic:string
required:: False

adapters.mate2_5prime_seq

label:: 5 prime adapter sequence for Mate 2
type:: list:basic:string
required:: False

adapters.mate2_3prime_seq

label:: 3 prime adapter sequence for Mate 2
type:: list:basic:string
required:: False

adapters.times

label:: Times
type:: basic:integer
description:: Remove up to this number of adapters from each read.
default:: 1

adapters.error_rate

label:: Error rate
type:: basic:decimal
description:: Maximum allowed error rate (no. of errors divided by the length of the matching region).
default:: 0.1

adapters.min_overlap

label:: Minimal overlap
type:: basic:integer
description:: Minimum overlap for an adapter match.
default:: 3

adapters.match_read_wildcards

label:: Match read wildcards
type:: basic:boolean
description:: Interpret IUPAC wildcards in reads.
default:: False

adapters.no_indels

label:: No indels
type:: basic:boolean
description:: Disable (disallow) insertions and deletions in adapters.
default:: False
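How error rate and minimal overlap interact can be shown with a simplified, mismatch-only 3’ adapter search (comparable to enabling “No indels”; read wildcards are not modeled). A match of length L is accepted when L is at least the minimal overlap and it has at most int(error rate × L) mismatches. This is a hypothetical sketch, not Cutadapt’s alignment algorithm; the function name and sequences are illustrative.

```python
from typing import Optional


def find_3prime_adapter(read: str, adapter: str, error_rate: float = 0.1,
                        min_overlap: int = 3) -> Optional[int]:
    """Return the start of the leftmost acceptable 3' adapter match, or None.

    The adapter may lie fully inside the read or run off its 3' end.
    """
    for start in range(len(read)):
        overlap = min(len(adapter), len(read) - start)
        if overlap < min_overlap:
            break  # later starts only give shorter overlaps
        window = read[start:start + overlap]
        mismatches = sum(1 for r, a in zip(window, adapter) if r != a)
        if mismatches <= int(error_rate * overlap):
            return start
    return None


read = "ACGTACGTAGATCGGAAG"
cut = find_3prime_adapter(read, "AGATCGGAAGAGC")
print(read[:cut] if cut is not None else read)  # ACGTACGT
```

Here only the first 10 adapter bases fit in the read, so up to one mismatch (int(0.1 × 10)) would still be tolerated.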

modify_reads.nextseq_trim

label:: NextSeq-specific quality trimming
type:: basic:integer
description:: NextSeq-specific quality trimming (each read). Also trims dark cycles that appear as high-quality G bases. This option is mutually exclusive with regular (-q) quality trimming.
required:: False

modify_reads.leading

label:: Quality on 5 prime
type:: basic:integer
description:: Remove low-quality bases from the 5’ end. Specifies the minimum quality required to keep a base.
required:: False

modify_reads.trailing

label:: Quality on 3 prime
type:: basic:integer
description:: Remove low-quality bases from the 3’ end. Specifies the minimum quality required to keep a base.
required:: False

modify_reads.crop

label:: Crop
type:: basic:integer
description:: Cut the specified number of bases from the end of the reads.
required:: False

modify_reads.headcrop

label:: Headcrop
type:: basic:integer
description:: Cut the specified number of bases from the start of the reads.
required:: False

filtering.minlen

label:: Min length
type:: basic:integer
description:: Drop the read if it is shorter than the specified length.
required:: False

filtering.maxlen

label:: Max length
type:: basic:integer
description:: Drop the read if it is above a specified length.
required:: False

filtering.max_n

label:: Max number of Ns
type:: basic:integer
description:: Discard reads having more ‘N’ bases than specified.
required:: False

filtering.pair_filter

label:

Which of the reads have to match the filtering criterion

type:

basic:string

description:

Which of the two reads in a pair must match the filtering criterion for the pair to be filtered out.

default:

any

choices:

Any of the reads in a paired-end read have to match the filtering criterion: any
Both of the reads in a paired-end read have to match the filtering criterion: both
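The interplay of the per-read filters (minimum/maximum length, maximum number of Ns) with the ‘any’ and ‘both’ pair filters can be sketched as follows. The helper names are hypothetical, and `None` stands in for an unset option.

```python
from typing import Optional


def read_fails(seq: str, minlen: Optional[int] = None,
               maxlen: Optional[int] = None, max_n: Optional[int] = None) -> bool:
    """True if a single read matches any enabled filtering criterion."""
    if minlen is not None and len(seq) < minlen:
        return True
    if maxlen is not None and len(seq) > maxlen:
        return True
    if max_n is not None and seq.upper().count("N") > max_n:
        return True
    return False


def discard_pair(mate1: str, mate2: str, pair_filter: str = "any",
                 minlen: Optional[int] = None, maxlen: Optional[int] = None,
                 max_n: Optional[int] = None) -> bool:
    """Decide whether a read pair is filtered out under 'any' or 'both'."""
    fails1 = read_fails(mate1, minlen, maxlen, max_n)
    fails2 = read_fails(mate2, minlen, maxlen, max_n)
    if pair_filter == "any":
        return fails1 or fails2
    if pair_filter == "both":
        return fails1 and fails2
    raise ValueError(f"unknown pair_filter: {pair_filter!r}")


# Mate 1 is too short for minlen=5: 'any' discards the pair, 'both' keeps it.
print(discard_pair("ACGT", "ACGTACGTAC", "any", minlen=5))   # True
print(discard_pair("ACGT", "ACGTACGTAC", "both", minlen=5))  # False
```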

fastq

label:: Reads file (forward)
type:: list:basic:file

fastq2

label:: Reads file (reverse)
type:: list:basic:file

report

label:: Cutadapt report
type:: basic:file

fastqc_url

label:: Quality control with FastQC (forward)
type:: list:basic:file:html

fastqc_url2

label:: Quality control with FastQC (reverse)
type:: list:basic:file:html

fastqc_archive

label:: Download FastQC archive (forward)
type:: list:basic:file

fastqc_archive2

label:: Download FastQC archive (reverse)
type:: list:basic:file

Cutadapt (single-end)

data:reads:fastq:single:cutadaptcutadapt-single (data:reads:fastq:single reads, data:seq:nucleotide up_primers_file, data:seq:nucleotide down_primers_file, list:basic:string up_primers_seq, list:basic:string down_primers_seq, basic:integer polya_tail, basic:integer min_overlap, basic:integer nextseq_trim, basic:integer leading, basic:integer trailing, basic:integer crop, basic:integer headcrop, basic:integer minlen, basic:integer maxlen, basic:integer max_n, basic:boolean match_read_wildcards, basic:boolean no_indels, basic:integer times, basic:decimal error_rate)[Source: v2.5.2]

Cutadapt finds and removes adapter sequences, primers, poly-A tails and other types of unwanted sequence from high-throughput sequencing reads. More information about Cutadapt can be found [here](http://cutadapt.readthedocs.io/en/stable/).

reads

label:: Select sample(s)
type:: data:reads:fastq:single

adapters.up_primers_file

label:: 5 prime adapter file
type:: data:seq:nucleotide
required:: False

adapters.down_primers_file

label:: 3 prime adapter file
type:: data:seq:nucleotide
required:: False

adapters.up_primers_seq

label:: 5 prime adapter sequence
type:: list:basic:string
required:: False

adapters.down_primers_seq

label:: 3 prime adapter sequence
type:: list:basic:string
required:: False

adapters.polya_tail

label:: Poly-A tail
type:: basic:integer
description:: Length of the poly-A tail, for example AAAN -> 3, AAAAAN -> 5.
required:: False

adapters.min_overlap

label:: Minimal overlap
type:: basic:integer
description:: Minimum overlap for an adapter match
default:: 3

modify_reads.nextseq_trim

label:: NextSeq-specific quality trimming
type:: basic:integer
description:: NextSeq-specific quality trimming (each read). Also trims dark cycles that appear as high-quality G bases. This option is mutually exclusive with regular (-q) quality trimming.
required:: False

modify_reads.leading

label:: Quality on 5 prime
type:: basic:integer
description:: Remove low-quality bases from the 5’ end. Specifies the minimum quality required to keep a base. This option is mutually exclusive with NextSeq-specific quality trimming.
required:: False

modify_reads.trailing

label:: Quality on 3 prime
type:: basic:integer
description:: Remove low-quality bases from the 3’ end. Specifies the minimum quality required to keep a base. This option is mutually exclusive with NextSeq-specific quality trimming.
required:: False

modify_reads.crop

label:: Crop
type:: basic:integer
description:: Cut the read to a specified length by removing bases from the end.
required:: False

modify_reads.headcrop

label:: Headcrop
type:: basic:integer
description:: Cut the specified number of bases from the start of the read.
required:: False

filtering.minlen

label:: Min length
type:: basic:integer
description:: Drop the read if it is shorter than the specified length.
required:: False

filtering.maxlen

label:: Max length
type:: basic:integer
description:: Drop the read if it is above a specified length.
required:: False

filtering.max_n

label:: Max number of Ns
type:: basic:integer
description:: Discard reads having more ‘N’ bases than specified.
required:: False

filtering.match_read_wildcards

label:: Match read wildcards
type:: basic:boolean
description:: Interpret IUPAC wildcards in reads.
required:: False
default:: False

filtering.no_indels

label:: No indels
type:: basic:boolean
description:: Disable (disallow) insertions and deletions in adapters.
default:: False

filtering.times

label:: Times
type:: basic:integer
description:: Remove up to this number of adapters from each read.
default:: 1

filtering.error_rate

label:: Error rate
type:: basic:decimal
description:: Maximum allowed error rate (no. of errors divided by the length of the matching region).
default:: 0.1

fastq

label:: Reads file
type:: list:basic:file

report

label:: Cutadapt report
type:: basic:file

fastqc_url

label:: Quality control with FastQC
type:: list:basic:file:html

fastqc_archive

label:: Download FastQC archive
type:: list:basic:file

Cutadapt - STAR - StringTie (Corall, paired-end)

data:workflow:rnaseq:corallworkflow-corall-paired (data:reads:fastq:paired reads, data:index:star star_index, data:annotation annotation, data:index:star rrna_reference, data:index:star globin_reference, basic:integer quality_cutoff, basic:integer n_reads, basic:integer seed, basic:decimal fraction, basic:boolean two_pass, basic:string feature_class, basic:string id_attribute)[Source: v5.2.0]

RNA-seq pipeline optimized for the Lexogen Corall Total RNA-Seq Library Prep Kit. UMI sequences are extracted from the raw reads before the reads are trimmed and quality-filtered using Cutadapt. Preprocessed reads are aligned with the STAR aligner and de-duplicated using UMI-tools. Gene abundance estimates are reported by the featureCounts tool. QC operates on downsampled reads and includes alignment of input reads to the rRNA/globin reference sequences; the reported alignment rate is used to assess the rRNA/globin sequence depletion rate. The analysis results and QC reports are summarized by MultiQC.

reads

label:: Select sample(s)
type:: data:reads:fastq:paired

star_index

label:: Genome
type:: data:index:star
description:: Genome index prepared by the STAR aligner indexing tool.

annotation

label:: Annotation
type:: data:annotation
description:: Genome annotation file (GTF).

rrna_reference

label:: Indexed rRNA reference sequence
type:: data:index:star
description:: Reference sequence index prepared by the STAR aligner indexing tool.

globin_reference

label:: Indexed Globin reference sequence
type:: data:index:star
description:: Reference sequence index prepared by the STAR aligner indexing tool.

cutadapt.quality_cutoff

label:: Reads quality cutoff
type:: basic:integer
description:: Trim low-quality bases from the 3’ end of each read before adapter removal. Use this option when processing data generated by older Illumina machines. Setting this option overrides the NextSeq/NovaSeq-specific trimming procedure, which is enabled by default and is recommended for Illumina machines that use 2-color chemistry to encode the four bases.
required:: False

downsampling.n_reads

label:: Number of reads
type:: basic:integer
default:: 1000000

downsampling.seed

label:: Seed
type:: basic:integer
default:: 11

downsampling.fraction

label:: Fraction
type:: basic:decimal
description:: Use the fraction of reads in range [0.0, 1.0] from the original input file instead of the absolute number of reads. If set, this will override the “Number of reads” input parameter.
required:: False

downsampling.two_pass

label:: 2-pass mode
type:: basic:boolean
description:: Enable two-pass mode when down-sampling. Two-pass mode is twice as slow but uses much less memory.
default:: False
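The process description does not name the sampling tool, so the following is only a sketch of seeded downsampling under that uncertainty: reservoir sampling keeps exactly `n_reads` reads, while setting `fraction` keeps each read independently with that probability. `downsample` is a hypothetical helper that works on an in-memory list of reads.

```python
import random
from typing import List, Optional, Sequence


def downsample(records: Sequence[str], n_reads: int = 1_000_000, seed: int = 11,
               fraction: Optional[float] = None) -> List[str]:
    """Seeded subsampling of reads (one list entry per read).

    If fraction is set, each read is kept with that probability, so the
    output size is only approximately fraction * N. Otherwise reservoir
    sampling keeps exactly min(n_reads, N) reads, holding up to n_reads
    records in memory.
    """
    rng = random.Random(seed)
    if fraction is not None:
        if not 0.0 <= fraction <= 1.0:
            raise ValueError("fraction must be in [0.0, 1.0]")
        return [rec for rec in records if rng.random() < fraction]
    reservoir: List[str] = []
    for i, rec in enumerate(records):
        if i < n_reads:
            reservoir.append(rec)
        else:
            j = rng.randint(0, i)  # uniform over 0..i inclusive
            if j < n_reads:
                reservoir[j] = rec
    return reservoir
```

Because the random draws depend only on each read’s position, running the same seed over the mate 1 and mate 2 files of a paired-end sample selects the same pairs, provided both files list reads in the same order.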

quantification.feature_class

label:: Feature class
type:: basic:string
description:: Feature class (3rd column in GTF/GFF3 file) to be used. All other features will be ignored.
default:: exon

quantification.id_attribute

label:

ID attribute

type:

basic:string

description:

GTF/GFF3 attribute to be used as feature ID. Several GTF/GFF3 lines with the same feature ID will be considered as parts of the same feature. The feature ID is used to identify the counts in the output table. In GTF files this is usually ‘gene_id’, in GFF3 files this is often ‘ID’, and ‘transcript_id’ is frequently a valid choice for both annotation formats.

default:

gene_id

choices:

gene_id: gene_id
transcript_id: transcript_id
ID: ID
geneid: geneid
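The roles of the feature class and the ID attribute can be sketched for GTF input: lines whose 3rd column is not the feature class are ignored, and lines that share an ID count toward the same feature. `gtf_feature_id` is a hypothetical helper; it parses only GTF-style attributes (`key "value";`), not GFF3’s `key=value` form.

```python
from collections import Counter
from typing import Optional


def gtf_feature_id(line: str, feature_class: str = "exon",
                   id_attribute: str = "gene_id") -> Optional[str]:
    """Return the feature ID of a GTF line, or None if the line is skipped."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9 or fields[2] != feature_class:
        return None
    attributes = {}
    for item in fields[8].split(";"):
        key, _, value = item.strip().partition(" ")
        if key:
            attributes[key] = value.strip().strip('"')
    return attributes.get(id_attribute)


gtf = [
    'chr1\tsrc\tgene\t100\t900\t.\t+\t.\tgene_id "G1";',
    'chr1\tsrc\texon\t100\t200\t.\t+\t.\tgene_id "G1"; transcript_id "T1";',
    'chr1\tsrc\texon\t700\t900\t.\t+\t.\tgene_id "G1"; transcript_id "T1";',
]
# The gene line is skipped; both exon lines share gene_id "G1".
print(Counter(fid for fid in map(gtf_feature_id, gtf) if fid is not None))
```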

Cutadapt - STAR - StringTie (Corall, single-end)

data:workflow:rnaseq:corallworkflow-corall-single (data:reads:fastq:single reads, data:index:star star_index, data:annotation annotation, data:index:star rrna_reference, data:index:star globin_reference, basic:integer quality_cutoff, basic:integer n_reads, basic:integer seed, basic:decimal fraction, basic:boolean two_pass, basic:string feature_class, basic:string id_attribute)[Source: v5.2.0]

RNA-seq pipeline optimized for the Lexogen Corall Total RNA-Seq Library Prep Kit. UMI sequences are extracted from the raw reads before the reads are trimmed and quality-filtered using Cutadapt. Preprocessed reads are aligned with the STAR aligner and de-duplicated using UMI-tools. Gene abundance estimates are reported by the featureCounts tool. QC operates on downsampled reads and includes alignment of input reads to the rRNA/globin reference sequences; the reported alignment rate is used to assess the rRNA/globin sequence depletion rate. The analysis results and QC reports are summarized by MultiQC.

reads

label:: Select sample(s)
type:: data:reads:fastq:single

star_index

label:: Genome
type:: data:index:star
description:: Genome index prepared by the STAR aligner indexing tool.

annotation

label:: Annotation
type:: data:annotation
description:: Genome annotation file (GTF).

rrna_reference

label:: Indexed rRNA reference sequence
type:: data:index:star
description:: Reference sequence index prepared by the STAR aligner indexing tool.

globin_reference

label:: Indexed Globin reference sequence
type:: data:index:star
description:: Reference sequence index prepared by the STAR aligner indexing tool.

cutadapt.quality_cutoff

label:: Reads quality cutoff
type:: basic:integer
description:: Trim low-quality bases from the 3’ end of each read before adapter removal. Use this option when processing data generated by older Illumina machines. Setting this option overrides the NextSeq/NovaSeq-specific trimming procedure, which is enabled by default and is recommended for Illumina machines that use 2-color chemistry to encode the four bases.
required:: False

downsampling.n_reads

label:: Number of reads
type:: basic:integer
default:: 1000000

downsampling.seed

label:: Seed
type:: basic:integer
default:: 11

downsampling.fraction

label:: Fraction
type:: basic:decimal
description:: Use the fraction of reads in range [0.0, 1.0] from the original input file instead of the absolute number of reads. If set, this will override the “Number of reads” input parameter.
required:: False

downsampling.two_pass

label:: 2-pass mode
type:: basic:boolean
description:: Enable two-pass mode when down-sampling. Two-pass mode is twice as slow but uses much less memory.
default:: False

quantification.feature_class

label:: Feature class
type:: basic:string
description:: Feature class (3rd column in GTF/GFF3 file) to be used. All other features will be ignored.
default:: exon

quantification.id_attribute

label:

ID attribute

type:

basic:string

description:

GTF/GFF3 attribute to be used as feature ID. Several GTF/GFF3 lines with the same feature ID will be considered as parts of the same feature. The feature ID is used to identify the counts in the output table. In GTF files this is usually ‘gene_id’, in GFF3 files this is often ‘ID’, and ‘transcript_id’ is frequently a valid choice for both annotation formats.

default:

gene_id

choices:

gene_id: gene_id
transcript_id: transcript_id
ID: ID
geneid: geneid

DESeq2

data:differentialexpression:deseq2:differentialexpression-deseq2 (list:data:expression case, list:data:expression control, basic:boolean create_sets, basic:decimal logfc, basic:decimal fdr, basic:boolean beta_prior, basic:boolean count, basic:integer min_count_sum, basic:boolean cook, basic:decimal cooks_cutoff, basic:boolean independent, basic:decimal alpha)[Source: v3.6.0]

Run DESeq2 analysis. The DESeq2 package estimates variance-mean dependence in count data from high-throughput sequencing assays and tests for differential expression based on a model using the negative binomial distribution. See [here](https://www.bioconductor.org/packages/release/bioc/manuals/DESeq2/man/DESeq2.pdf) and [here](http://bioconductor.org/packages/devel/bioc/vignettes/DESeq2/inst/doc/DESeq2.html) for more information.

case

label:: Case
type:: list:data:expression
description:: Case samples (replicates)
required:: True
disabled:: False
hidden:: False

control

label:: Control
type:: list:data:expression
description:: Control samples (replicates)
required:: True
disabled:: False
hidden:: False

create_sets

label:: Create gene sets
type:: basic:boolean
description:: After calculating differential gene expression, create gene sets for up-regulated genes, down-regulated genes and all genes.
required:: True
disabled:: False
hidden:: False
default:: False

logfc

label:: Log2 fold change threshold for gene sets
type:: basic:decimal
description:: Genes with a log2 fold change above Log2FC are considered up-regulated, and genes below -Log2FC are considered down-regulated.
required:: True
disabled:: False
hidden:: !create_sets
default:: 1.0

fdr

label:: FDR threshold for gene sets
type:: basic:decimal
required:: True
disabled:: False
hidden:: !create_sets
default:: 0.05
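A sketch of how the two thresholds could split DESeq2 results into up- and down-regulated sets. Strict comparison on log2 fold change and `padj <= fdr` are assumptions, genes with a missing (NA) adjusted p-value are skipped, and the contents of the “all genes” set are not modeled. `gene_sets` is a hypothetical helper.

```python
from typing import Iterable, List, Optional, Tuple


def gene_sets(results: Iterable[Tuple[str, float, Optional[float]]],
              logfc: float = 1.0, fdr: float = 0.05) -> Tuple[List[str], List[str]]:
    """Split (gene, log2FC, adjusted p-value) rows into up/down gene lists."""
    up, down = [], []
    for gene, log2fc, padj in results:
        if padj is None or padj > fdr:
            continue  # NA or not significant
        if log2fc > logfc:
            up.append(gene)
        elif log2fc < -logfc:
            down.append(gene)
    return up, down


rows = [("A", 2.0, 0.01), ("B", -1.5, 0.001), ("C", 3.0, 0.2), ("D", 0.5, 1e-4), ("E", 1.2, None)]
print(gene_sets(rows))  # (['A'], ['B'])
```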

options.beta_prior

label:: Beta prior
type:: basic:boolean
description:: Whether or not to put a zero-mean normal prior on the non-intercept coefficients.
required:: True
disabled:: False
hidden:: False
default:: False

filter_options.count

label:: Filter genes based on expression count
type:: basic:boolean
required:: True
disabled:: False
hidden:: False
default:: True

filter_options.min_count_sum

label:: Minimum gene expression count summed over all samples
type:: basic:integer
description:: Filter genes in the expression matrix input. Remove genes where the expression count sum over all samples is below the threshold.
required:: True
disabled:: False
hidden:: !filter_options.count
default:: 10
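The count filter amounts to dropping rows whose total count falls below the threshold. A minimal sketch on a dictionary of per-gene counts (a hypothetical stand-in for the expression matrix):

```python
from typing import Dict, List


def filter_low_counts(counts: Dict[str, List[int]],
                      min_count_sum: int = 10) -> Dict[str, List[int]]:
    """Keep genes whose count summed over all samples is at least min_count_sum."""
    return {gene: row for gene, row in counts.items() if sum(row) >= min_count_sum}


counts = {"g1": [0, 1, 2], "g2": [5, 5, 5], "g3": [10, 0, 0]}
print(sorted(filter_low_counts(counts)))  # ['g2', 'g3']
```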

filter_options.cook

label:: Filter genes based on Cook’s distance
type:: basic:boolean
required:: True
disabled:: False
hidden:: False
default:: False

filter_options.cooks_cutoff

label:: Threshold on Cook’s distance
type:: basic:decimal
description:: If one or more samples have a Cook’s distance larger than the threshold set here, the p-value for the row is set to NA. If left empty, the default threshold, the 0.99 quantile of the F(p, m-p) distribution, is used, where p is the number of coefficients being fitted and m is the number of samples. Samples belonging to experimental groups with only two samples are excluded from this test.
required:: False
disabled:: False
hidden:: !filter_options.cook

filter_options.independent

label:: Apply independent gene filtering
type:: basic:boolean
required:: True
disabled:: False
hidden:: False
default:: True

filter_options.alpha

label:: Significance cut-off used for optimizing independent gene filtering
type:: basic:decimal
description:: Set this value to the adjusted p-value (FDR) cut-off.
required:: True
disabled:: False
hidden:: !filter_options.independent
default:: 0.1

raw

label:: Differential expression
type:: basic:file
required:: True
disabled:: False
hidden:: False

de_json

label:: Results table (JSON)
type:: basic:json
required:: True
disabled:: False
hidden:: False

de_file

label:: Results table (file)
type:: basic:file
required:: True
disabled:: False
hidden:: False

count_matrix

label:: Count matrix
type:: basic:file
required:: True
disabled:: False
hidden:: False

count_matrix_normalized

label:: Normalized count matrix (median of ratios)
type:: basic:file
required:: True
disabled:: False
hidden:: False

source

label:: Gene ID database
type:: basic:string
required:: True
disabled:: False
hidden:: False

species

label:: Species
type:: basic:string
required:: True
disabled:: False
hidden:: False

build

label:: Build
type:: basic:string
required:: True
disabled:: False
hidden:: False

feature_type

label:: Feature type
type:: basic:string
required:: True
disabled:: False
hidden:: False

Detect library strandedness

data:strandednesslibrary-strandedness (data:reads:fastq reads, basic:integer read_number, data:index:salmon salmon_index)[Source: v0.6.2]

This process uses the Salmon transcript quantification tool to automatically infer the NGS library strandedness. For more details, please see the Salmon [documentation](https://salmon.readthedocs.io/en/latest/library_type.html).

reads

label:: Sequencing reads
type:: data:reads:fastq
description:: Sequencing reads in .fastq format. Both single-end and paired-end libraries are supported.

read_number

label:: Number of input reads
type:: basic:integer
description:: Number of sequencing reads subsampled from each of the original .fastq files before library strandedness detection.
default:: 50000

salmon_index

label:: Transcriptome index file
type:: data:index:salmon
description:: Transcriptome index file created using the Salmon indexing tool. The cDNA (transcriptome) sequences used to create the index must come from the same species as the input sequencing reads to obtain reliable results.

strandedness

label:: Library strandedness type
type:: basic:string
description:: The predicted library strandedness type. The codes U and IU indicate a ‘strand non-specific’ library for single-end or paired-end reads, respectively. Codes SF and ISF correspond to a ‘strand-specific forward’ library for single-end or paired-end reads, respectively. For a ‘strand-specific reverse’ library, the corresponding codes are SR and ISR. For more details, please see the Salmon [documentation](https://salmon.readthedocs.io/en/latest/library_type.html).

fragment_ratio

label:: Compatible fragment ratio
type:: basic:decimal
description:: The ratio of fragments that support the predicted library strandedness type
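The six codes listed above can be decoded with a small lookup table. This sketch covers only those codes; Salmon defines further library types (for example for outward-facing mates) that are not included. The names are hypothetical.

```python
from typing import Dict, Tuple

# Codes as described for the strandedness output above.
LIBRARY_TYPES: Dict[str, Tuple[str, str]] = {
    "U": ("single-end", "strand non-specific"),
    "SF": ("single-end", "strand-specific forward"),
    "SR": ("single-end", "strand-specific reverse"),
    "IU": ("paired-end", "strand non-specific"),
    "ISF": ("paired-end", "strand-specific forward"),
    "ISR": ("paired-end", "strand-specific reverse"),
}


def describe_strandedness(code: str, fragment_ratio: float) -> str:
    """Human-readable summary of a predicted library type."""
    layout, strandedness = LIBRARY_TYPES[code]
    return f"{layout}, {strandedness} ({fragment_ratio:.1%} of fragments compatible)"


print(describe_strandedness("ISR", 0.95))
# paired-end, strand-specific reverse (95.0% of fragments compatible)
```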

log

label:: Log file
type:: basic:file
description:: Analysis log file.

Dictyostelium expressions

data:expression:polyaexpression-dicty (data:alignment:bam alignment, data:annotation:gff3 gff, data:mappability:bcm mappable)[Source: v1.4.2]

Dictyostelium-specific pipeline. Developed by Bioinformatics Laboratory, Faculty of Computer and Information Science, University of Ljubljana, Slovenia and Shaulsky Lab, Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA.

alignment

label:: Aligned sequence
type:: data:alignment:bam

gff

label:: Features (GFF3)
type:: data:annotation:gff3

mappable

label:: Mappability
type:: data:mappability:bcm

exp

label:: Expression RPKUM (polyA)
type:: basic:file
description:: mRNA reads scaled by the uniquely mappable part of exons.

rpkmpolya

label:: Expression RPKM (polyA)
type:: basic:file
description:: mRNA reads scaled by exon length.

rc

label:: Read counts (polyA)
type:: basic:file
description:: mRNA reads uniquely mapped to gene exons.

rpkum

label:: Expression RPKUM
type:: basic:file
description:: Reads scaled by the uniquely mappable part of exons.

rpkm

label:: Expression RPKM
type:: basic:file
description:: Reads scaled by exon length.

rc_raw

label:: Read counts (raw)
type:: basic:file
description:: Reads uniquely mapped to gene exons.
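The difference between RPKM and RPKUM is the length used for scaling: the full exon length versus only its uniquely mappable part. This worked example assumes the conventional RPKM definition (reads per kilobase per million reads); which total read count this pipeline normalizes by is not stated here, and the numbers are made up.

```python
def reads_per_kb_per_million(reads: float, length_bp: float, total_reads: float) -> float:
    """Reads per kilobase of feature length per million reads."""
    return reads * 1e9 / (length_bp * total_reads)


reads, total = 500, 10_000_000
rpkm = reads_per_kb_per_million(reads, 2_000, total)   # full exon length: 25.0
rpkum = reads_per_kb_per_million(reads, 1_500, total)  # uniquely mappable part: about 33.3
```

Since the uniquely mappable length is never longer than the exon length, RPKUM is at least as large as RPKM for the same gene.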

exp_json

label:: Expression RPKUM (polyA) (json)
type:: basic:json

exp_type

label:: Expression Type (default output)
type:: basic:string

source

label:: Gene ID database
type:: basic:string

species

label:: Species
type:: basic:string

build

label:: Build
type:: basic:string

feature_type

label:: Feature type
type:: basic:string

Differential Expression (table)

data:differentialexpression:uploadupload-diffexp (basic:file src, basic:string gene_id, basic:string logfc, basic:string fdr, basic:string logodds, basic:string fwer, basic:string pvalue, basic:string stat, basic:string source, basic:string species, basic:string build, basic:string feature_type, list:data:expression case, list:data:expression control)[Source: v1.5.1]

Upload Differential Expression table.

src

label:: Differential expression file
type:: basic:file
description:: Differential expression file. Supported file types: *.xls, *.xlsx, *.tab (tab-delimited file), *.diff, and their gzip-compressed variants *.tab.gz and *.diff.gz. The DE file must contain a header row with column names and must include columns with log2(fold change) and FDR or p-value information. DESeq, DESeq2, edgeR and CuffDiff output files are accepted.
validate_regex:: \.(xls|xlsx|tab|tab.gz|diff|diff.gz)$
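The file-name check can be tried locally with Python’s `re` module, reusing the pattern above verbatim. This only reproduces the extension check; it does not validate the file contents.

```python
import re

# Pattern copied from validate_regex above.
SUPPORTED = re.compile(r"\.(xls|xlsx|tab|tab.gz|diff|diff.gz)$")

for name in ["deseq2_results.tab.gz", "edger_results.xlsx", "results.csv"]:
    print(name, "accepted" if SUPPORTED.search(name) else "rejected")
```

Because the dot inside `tab.gz` and `diff.gz` is not escaped, it matches any character, so a name such as `results.tab_gz` also passes the check.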

gene_id

label:: Gene ID label
type:: basic:string

logfc

label:: LogFC label
type:: basic:string

fdr

label:: FDR label
type:: basic:string
required:: False

logodds

label:: LogOdds label
type:: basic:string
required:: False

fwer

label:: FWER label
type:: basic:string
required:: False

pvalue

label:: Pvalue label
type:: basic:string
required:: False

stat

label:: Statistics label
type:: basic:string
required:: False

source

label:: Gene ID database
type:: basic:string
choices::
AFFY: AFFY
DICTYBASE: DICTYBASE
ENSEMBL: ENSEMBL
NCBI: NCBI
UCSC: UCSC

species

label:: Species
type:: basic:string
description:: Latin name of the species.
choices::
Homo sapiens: Homo sapiens
Mus musculus: Mus musculus
Rattus norvegicus: Rattus norvegicus
Dictyostelium discoideum: Dictyostelium discoideum
Odocoileus virginianus texanus: Odocoileus virginianus texanus
Solanum tuberosum: Solanum tuberosum

build

label:: Build
type:: basic:string
description:: Genome build or annotation version.

feature_type

label:: Feature type
type:: basic:string
default:: gene
choices::
gene: gene
transcript: transcript
exon: exon

case

label:: Case
type:: list:data:expression
description:: Case samples (replicates)
required:: False

control

label:: Control
type:: list:data:expression
description:: Control samples (replicates)
required:: False

raw

label:: Differential expression
type:: basic:file

de_json

label:: Results table (JSON)
type:: basic:json

de_file

label:: Results table (file)
type:: basic:file

source

label:: Gene ID database
type:: basic:string

species

label:: Species
type:: basic:string

build

label:: Build
type:: basic:string

feature_type

label:: Feature type
type:: basic:string

Differential expression of shRNA

data:shrna:differentialexpression:differentialexpression-shrna (data:file parameter_file, list:data:expression:shrna2quant: expression_data)[Source: v1.3.0]

Perform differential expression analysis on a list of objects. The analysis takes a set of expression files (count matrices) and a parameter file as input. The parameter file is an .xlsx file with the following tabs:

- `sample_key`: should have a `sample` column with sample names exactly matching the input expression file(s), columns defining the treatment, and finally a column indicating the replicate.
- `contrasts`: defines the groups used in the differential expression analysis. The DE model uses these contrasts and the replicate number; in R formula notation, this is ` ~ 1 + group + replicate`. The table should have two columns, named `group_1` and `group_2`.
- `overall_contrasts`: a layer “above” `contrasts`, where results from two contrasts are compared to classify species as lethal, beneficial or neutral. Thresholds governing the classification are set in the `classification_parameters` tab.
- `classification_parameters`: holds three columns, `threshold`, `value` and `description`. Only the first two are used in the workflow; `description` is for your reference.

This process outputs DESeq2 results, results classified based on the provided thresholds, and counts of beneficial and lethal species.
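To make the relationship between the `sample_key` and `contrasts` tabs concrete, here is a small pre-flight check: every sample must match an input expression file, and every group named in `contrasts` must exist among the treatments. It operates on plain Python rows rather than an .xlsx workbook, and the `treatment` and `replicate` column names are assumptions; the linked template defines the actual headers.

```python
from typing import Dict, Iterable, List, Set

DESIGN = "~ 1 + group + replicate"  # model formula applied to each contrast


def check_parameters(
    sample_key: Iterable[Dict], contrasts: Iterable[Dict], expression_samples: Set[str]
) -> List[str]:
    """Return a list of problems found in hypothetical sample_key/contrasts rows."""
    problems = []
    sample_key = list(sample_key)
    treatments = {row["treatment"] for row in sample_key}
    # Sample names must match the input expression files exactly.
    for row in sample_key:
        if row["sample"] not in expression_samples:
            problems.append(f"unknown sample: {row['sample']}")
    # Each contrast compares two groups that must be defined in sample_key.
    for row in contrasts:
        for column in ("group_1", "group_2"):
            if row[column] not in treatments:
                problems.append(f"{column} refers to unknown group: {row[column]}")
    return problems


sample_key = [
    {"sample": "s1", "treatment": "ctrl", "replicate": 1},
    {"sample": "s2", "treatment": "drug", "replicate": 1},
]
contrasts = [
    {"group_1": "drug", "group_2": "ctrl"},
    {"group_1": "drug", "group_2": "dmso"},
]
print(check_parameters(sample_key, contrasts, {"s1", "s2"}))
```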

parameter_file

label:: Excel parameter file (.xlsx)
type:: data:file
description:: Select the .xlsx file that holds the analysis parameters. See [here](https://github.com/genialis/shRNAde/blob/master/inst/extdata/template_doDE_inputs.xlsx) for a template.
required:: True
disabled:: False
hidden:: False

expression_data

label:: List of expression files from shrna2quant
type:: list:data:expression:shrna2quant:
required:: True
disabled:: False
hidden:: False

deseq_results

label:: DESeq2 results
type:: basic:file
required:: True
disabled:: False
hidden:: False

class_results

label:: Results classified based on thresholds provided by the user
type:: basic:file
required:: True
disabled:: False
hidden:: False

beneficial_counts

label:: shRNAs considered as beneficial based on user input
type:: basic:file
required:: True
disabled:: False
hidden:: False

lethal_counts

label:: shRNAs considered as lethal based on user input
type:: basic:file
required:: True
disabled:: False
hidden:: False

Ensembl Variant Effect Predictor

data:variants:vcf:vep:ensembl-vep (data:variants:vcf vcf, data:vep:cache cache, data:seq:nucleotide ref_seq, basic:integer n_forks)[Source: v2.1.0]

Run Ensembl-VEP. VEP (Variant Effect Predictor) determines the effect of your variants (SNPs, insertions, deletions, CNVs or structural variants) on genes, transcripts and protein sequence, as well as on regulatory regions. This process accepts a VCF file and a VEP cache directory and produces a VCF file with annotated variants, its index and a summary of the process.

vcf

label:: Input VCF file
type:: data:variants:vcf
required:: True
disabled:: False
hidden:: False

cache

label:: Cache directory for Ensembl-VEP
type:: data:vep:cache
required:: True
disabled:: False
hidden:: False

ref_seq

label:: Reference sequence
type:: data:seq:nucleotide
required:: True
disabled:: False
hidden:: False

n_forks

label:: Number of forks
type:: basic:integer
description:: Using forking enables VEP to run multiple parallel threads, with each thread processing a subset of your input. Forking can dramatically improve runtime.
required:: True
disabled:: False
hidden:: False
default:: 2

vcf

label:: Annotated VCF file
type:: basic:file
required:: True
disabled:: False
hidden:: False

tbi

label:: Tabix index
type:: basic:file
required:: True
disabled:: False
hidden:: False

summary

label:: Summary of the analysis
type:: basic:file:html
required:: True
disabled:: False
hidden:: False

species

label:: Species
type:: basic:string
required:: True
disabled:: False
hidden:: False

build

label:: Build
type:: basic:string
required:: True
disabled:: False
hidden:: False

Ensembl-VEP cache directory

data:vep:cache:upload-vep-cache (basic:file cache_file, basic:string species, basic:string build, basic:string release)[Source: v1.1.0]

Import VEP cache directory.

cache_file

label:: Compressed cache directory
type:: basic:file
required:: True
disabled:: False
hidden:: False

species

label:: Species
type:: basic:string
description:: Select a species name from the dropdown menu.
required:: True
disabled:: False
hidden:: False
default:: Homo sapiens
choices::
Homo sapiens: Homo sapiens
Mus musculus: Mus musculus
Rattus norvegicus: Rattus norvegicus

build

label:: Genome build
type:: basic:string
required:: True
disabled:: False
hidden:: False

release

label:: Cache release
type:: basic:string
required:: True
disabled:: False
hidden:: False

cache

label:: Cache directory
type:: basic:dir
required:: True
disabled:: False
hidden:: False

species

label:: Species
type:: basic:string
required:: True
disabled:: False
hidden:: False

build

label:: Build
type:: basic:string
required:: True
disabled:: False
hidden:: False

release

label:: Cache release
type:: basic:string
required:: True
disabled:: False
hidden:: False

Expression Time Course

data:etc:etc-bcm (list:data:expression expressions, basic:boolean avg)[Source: v1.2.2]

Select gene expression data and form a time course.

expressions

label:: RPKM expression profile
type:: list:data:expression
required:: True

avg

label:: Average by time
type:: basic:boolean
default:: True
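With `avg` enabled, replicate measurements taken at the same time point are presumably collapsed into one value per time point. The sketch below shows that grouping for a single gene, assuming each expression is reduced to a `(time, value)` pair and that the arithmetic mean is used; both are assumptions, not documented behavior of this process.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, Tuple


def average_by_time(measurements: Iterable[Tuple[float, float]]) -> Dict[float, float]:
    """Collapse (time, value) replicate pairs into {time: mean value}, sorted by time."""
    by_time = defaultdict(list)
    for time, value in measurements:
        by_time[time].append(value)
    return {time: mean(values) for time, values in sorted(by_time.items())}


# Two replicates at 0 h and 4 h, one sample at 8 h.
print(average_by_time([(0, 1.0), (4, 3.0), (0, 2.0), (4, 5.0), (8, 7.0)]))
```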

etcfile

label:: Expression time course file
type:: basic:file

etc

label:: Expression time course
type:: basic:json

Expression aggregator

data:aggregator:expression:expression-aggregator (list:data:expression exps, basic:string group_by, data:aggregator:expression expr_aggregator)[Source: v0.5.1]

Collect expression data from samples grouped by sample descriptor field. The Expression aggregator process should not be run in Batch Mode, as this will create redundant outputs. Rather, select multiple samples below for which you wish to aggregate the expression matrix.
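Grouping by a sample descriptor field amounts to bucketing samples by the value stored under `group_by` in each sample's descriptor. A minimal sketch, assuming descriptors are nested dictionaries addressed with a dotted path; the path `general.organ` and the sample data are hypothetical.

```python
from collections import defaultdict
from typing import Any, Dict, List


def descriptor_value(descriptor: Dict[str, Any], path: str) -> Any:
    """Follow a dotted path (e.g. 'general.organ') into a nested descriptor."""
    value: Any = descriptor
    for key in path.split("."):
        if not isinstance(value, dict) or key not in value:
            return None  # samples without the field end up in the None group
        value = value[key]
    return value


def group_samples(samples: Dict[str, Dict[str, Any]], group_by: str) -> Dict[Any, List[str]]:
    """Map each descriptor value to the sorted names of samples that carry it."""
    groups = defaultdict(list)
    for name, descriptor in samples.items():
        groups[descriptor_value(descriptor, group_by)].append(name)
    return {key: sorted(names) for key, names in groups.items()}


samples = {
    "s1": {"general": {"organ": "liver"}},
    "s2": {"general": {"organ": "brain"}},
    "s3": {"general": {"organ": "liver"}},
}
print(group_samples(samples, "general.organ"))
```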

exps

label:: Expressions
type:: list:data:expression

group_by

label:: Sample descriptor field
type:: basic:string

expr_aggregator

label:: Expression aggregator
type:: data:aggregator:expression
required:: False

exp_matrix

label:: Expression matrix
type:: basic:file

box_plot

label:: Box plot
type:: basic:json

log_box_plot

label:: Log box plot
type:: basic:json

source

label:: Gene ID database
type:: basic:string

species

label:: Species
type:: basic:string

exp_type

label:: Expression type
type:: basic:string

Expression matrix

data:expressionset:mergeexpressions (list:data:expression exps, list:basic:string genes)[Source: v1.4.2]

Merge expression data to create an expression matrix where each column represents all the gene expression levels from a single experiment, and each row represents the expression of a gene across all experiments.
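The matrix layout described above (genes as rows, experiments as columns) can be sketched with plain dictionaries. This is an illustration under two assumptions: genes missing from an experiment are filled with `None`, and the optional `genes` argument, named after the `genes` filter input, restricts and orders the rows.

```python
from typing import Dict, List, Optional, Sequence, Tuple


def merge_expressions(
    expressions: Dict[str, Dict[str, float]], genes: Optional[Sequence[str]] = None
) -> Tuple[List[str], List[str], List[List[Optional[float]]]]:
    """Build (gene rows, sample columns, matrix) from {sample: {gene: value}}."""
    samples = sorted(expressions)
    if genes is None:
        genes = sorted({gene for values in expressions.values() for gene in values})
    # One row per gene, one column per experiment; missing values become None.
    matrix = [[expressions[sample].get(gene) for sample in samples] for gene in genes]
    return list(genes), samples, matrix


exps = {"B": {"g1": 2.0, "g2": 0.5}, "A": {"g1": 1.0, "g3": 4.0}}
rows, columns, matrix = merge_expressions(exps)
print(columns)
for gene, values in zip(rows, matrix):
    print(gene, values)
```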

exps

label:: Gene expressions
type:: list:data:expression

genes

label:: Filter genes
type:: list:basic:string
required:: False

expset

label:: Expression set
type:: basic:file

expset_type

label:: Expression set type
type:: basic:string

Expression time course

data:etc:upload-etc (basic:file src)[Source: v1.4.1]

Upload Expression time course.

src

label:: Expression time course file (xls or tab)
type:: basic:file
description:: Expression time course
required:: True
validate_regex:: \.(xls|xlsx|tab)$

etcfile

label:: Expression time course file
type:: basic:file

etc

label:: Expression time course
type:: basic:json

FASTA file

data:seq:nucleotide:upload-fasta-nucl (basic:file src, basic:string species, basic:string build)[Source: v3.2.0]

Import a nucleotide sequence file in FASTA format. FASTA is a text-based format for representing nucleotide sequences, in which nucleotides are represented using single-letter codes. The uploaded FASTA file can hold multiple nucleotide sequences.

src

label:: Sequence file (FASTA)
type:: basic:file
required:: True
disabled:: False
hidden:: False

species

label:: Species
type:: basic:string
description:: Select a species name from the dropdown menu or write a custom species name in the species field. For sequences that are not related to any particular species (e.g. adapters file), you can select the value Other.
required:: True
disabled:: False
hidden:: False
choices::
Homo sapiens: Homo sapiens
Mus musculus: Mus musculus
Rattus norvegicus: Rattus norvegicus
Macaca mulatta: Macaca mulatta
Dictyostelium discoideum: Dictyostelium discoideum
Other: Other

build

label:: Genome build
type:: basic:string
description:: Enter the genome build associated with the uploaded sequence(s).
required:: True
disabled:: False
hidden:: False

fastagz

label:: FASTA file (compressed)
type:: basic:file
required:: True
disabled:: False
hidden:: False

fasta

label:: FASTA file
type:: basic:file
required:: True
disabled:: False
hidden:: False

fai

label:: FASTA file index
type:: basic:file
required:: True
disabled:: False
hidden:: False

fasta_dict

label:: FASTA dictionary
type:: basic:file
required:: True
disabled:: False
hidden:: False

num_seqs

label:: Number of sequences
type:: basic:integer
required:: True
disabled:: False
hidden:: False

species

label:: Species
type:: basic:string
required:: True
disabled:: False
hidden:: False

build

label:: Build
type:: basic:string
required:: True
disabled:: False
hidden:: False

FASTQ file (paired-end)

data:reads:fastq:paired:upload-fastq-paired (list:basic:file src1, list:basic:file src2, basic:boolean merge_lanes)[Source: v2.6.0]

Import paired-end reads in FASTQ format, which is a text-based format for storing both a biological sequence (usually a nucleotide sequence) and its corresponding quality scores.

src1

label:: Mate1
type:: list:basic:file
description:: Sequencing reads in FASTQ format. Supported extensions: .fastq.gz (preferred), .fq.* or .fastq.*
required:: True
disabled:: False
hidden:: False

src2

label:: Mate2
type:: list:basic:file
description:: Sequencing reads in FASTQ format. Supported extensions: .fastq.gz (preferred), .fq.* or .fastq.*
required:: True
disabled:: False
hidden:: False

merge_lanes

label:: Merge lanes
type:: basic:boolean
description:: Merge sample data split into multiple sequencing lanes into a single FASTQ file.
required:: True
disabled:: False
hidden:: False
default:: False
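One way lane merging can work for gzip-compressed input: a gzip file may consist of several concatenated members, so lane files can be merged by appending their bytes in lane order without decompressing them. The sketch below illustrates that approach; it is an assumption, not a description of this process's implementation. For paired-end data, mate 1 and mate 2 lanes must be merged in the same order so that read pairs stay in step.

```python
import gzip
import os
import shutil
import tempfile
from typing import Sequence


def merge_gzip_lanes(lane_paths: Sequence[str], output_path: str) -> None:
    """Append gzip-compressed FASTQ lanes, in the given order, into one file."""
    with open(output_path, "wb") as out:
        for path in lane_paths:
            with open(path, "rb") as lane:
                shutil.copyfileobj(lane, out)  # each lane becomes one gzip member


# Demo: two single-read lanes are merged, then read back as one FASTQ stream.
workdir = tempfile.mkdtemp()
lanes = []
for number, record in enumerate(["@r1\nACGT\n+\nIIII\n", "@r2\nTTGA\n+\nIIII\n"], start=1):
    path = os.path.join(workdir, f"sample_L00{number}.fastq.gz")
    with gzip.open(path, "wt") as handle:
        handle.write(record)
    lanes.append(path)

merged = os.path.join(workdir, "sample.fastq.gz")
merge_gzip_lanes(lanes, merged)
with gzip.open(merged, "rt") as handle:
    print(handle.read())
```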

fastq

label:: Reads file (mate 1)
type:: list:basic:file
required:: True
disabled:: False
hidden:: False

fastq2

label:: Reads file (mate 2)
type:: list:basic:file
required:: True
disabled:: False
hidden:: False

fastqc_url

label:: Quality control with FastQC (Upstream)
type:: list:basic:file:html
required:: True
disabled:: False
hidden:: False

fastqc_url2

label:: Quality control with FastQC (Downstream)
type:: list:basic:file:html
required:: True
disabled:: False
hidden:: False

fastqc_archive

label:: Download FastQC archive (Upstream)
type:: list:basic:file
required:: True
disabled:: False
hidden:: False

fastqc_archive2

label:: Download FastQC archive (Downstream)
type:: list:basic:file
required:: True
disabled:: False
hidden:: False

FASTQ file (single-end)

data:reads:fastq:single:upload-fastq-single (list:basic:file src, basic:boolean merge_lanes)[Source: v2.6.0]

Import single-end reads in FASTQ format, which is a text-based format for storing both a biological sequence (usually a nucleotide sequence) and its corresponding quality scores.

src

label:: Reads
type:: list:basic:file
description:: Sequencing reads in FASTQ format. Supported extensions: .fastq.gz (preferred), .fq.* or .fastq.*
required:: True
disabled:: False
hidden:: False

merge_lanes

label:: Merge lanes
type:: basic:boolean
description:: Merge sample data split into multiple sequencing lanes into a single FASTQ file.
required:: True
disabled:: False
hidden:: False
default:: False

fastq

label:: Reads file
type:: list:basic:file
required:: True
disabled:: False
hidden:: False

fastqc_url

label:: Quality control with FastQC
type:: list:basic:file:html
required:: True
disabled:: False
hidden:: False

fastqc_archive

label:: Download FastQC archive
type:: list:basic:file
required:: True
disabled:: False
hidden:: False

Find similar genes

data:similarexpression:find-similar (list:data:expression expressions, basic:string gene, basic:string distance)[Source: v1.3.1]

Find genes with a similar expression profile, that is, genes whose expression over time is similar to that of the query gene.

expressions

label:: Time series relation
type:: list:data:expression
description:: Select the time course to which the expressions belong.
required:: True
disabled:: False
hidden:: False

gene

label:: Query gene
type:: basic:string
description:: Select a gene to which others are compared.
required:: True
disabled:: False
hidden:: False

distance

label:: Distance metric
type:: basic:string
required:: True
disabled:: False
hidden:: False
default:: spearman
choices::
Euclidean: euclidean
Spearman: spearman
Pearson: pearson
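The three metrics rank genes differently: Euclidean distance compares absolute expression levels (smaller is more similar), Pearson correlation compares linear shape, and Spearman correlation compares rank order (larger is more similar for both). The pure-Python sketch below ranks genes against a query under each metric; the function names and the toy profiles are illustrative, not this process's code.

```python
import math
from typing import Dict, List, Sequence


def ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for k in range(start, end + 1):
            result[order[k]] = (start + end) / 2 + 1
        start = end + 1
    return result


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    norm = math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))
    return cov / norm if norm else 0.0  # constant profile: treated as uncorrelated


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    # Spearman correlation is the Pearson correlation of the ranks.
    return pearson(ranks(x), ranks(y))


def similar_genes(profiles: Dict[str, List[float]], gene: str, distance: str = "spearman") -> List[str]:
    """Order the other genes from most to least similar to the query gene."""
    query = profiles[gene]
    scores = {}
    for other, profile in profiles.items():
        if other == gene:
            continue
        if distance == "euclidean":
            scores[other] = math.dist(query, profile)
        elif distance == "pearson":
            scores[other] = pearson(query, profile)
        elif distance == "spearman":
            scores[other] = spearman(query, profile)
        else:
            raise ValueError(f"unknown distance: {distance}")
    return sorted(scores, key=scores.get, reverse=distance != "euclidean")


profiles = {
    "A": [1.0, 2.0, 3.0, 4.0],
    "B": [2.0, 4.0, 6.0, 8.0],
    "C": [4.0, 3.0, 2.0, 1.0],
    "D": [1.0, 3.0, 2.0, 4.0],
}
print(similar_genes(profiles, "A"))               # ['B', 'D', 'C']
print(similar_genes(profiles, "A", "euclidean"))  # ['D', 'C', 'B']
```

The second call shows the design trade-off: `B` has the same shape as `A` and ranks first by correlation, but it is the farthest gene by Euclidean distance because its values are twice as large.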

similar_genes

label:: Similar genes
type:: basic:json
required:: True
disabled:: False
hidden:: False

source

label:: Gene ID database
type:: basic:string
required:: True
disabled:: False
hidden:: False

species

label:: Species
type:: basic:string
required:: True
disabled:: False
hidden:: False

build

label:: Build
type:: basic:string
required:: True
disabled:: False
hidden:: False

feature_type

label:: Feature type
type:: basic:string
required:: True
disabled:: False
hidden:: False

GAF file

data:gaf:2:0:upload-gaf (basic:file src, basic:string source, basic:string species)[Source: v1.4.0]

GO annotation file (GAF v2.0) relating gene IDs to their associated GO terms.

src

label:: GO annotation file (GAF v2.0)
type:: basic:file
description:: Upload a GO annotation file (GAF v2.0) relating gene IDs to their associated GO terms.

source

label:: Gene ID database
type:: basic:string
choices::
AFFY: AFFY
DICTYBASE: DICTYBASE
ENSEMBL: ENSEMBL
MGI: MGI
NCBI: NCBI
UCSC: UCSC
UniProtKB: UniProtKB

species

label:: Species
type:: basic:string

gaf

label:: GO annotation file (GAF v2.0)
type:: basic:file

gaf_obj

label:: GAF object
type:: basic:file

source

label:: Gene ID database
type:: basic:string

species

label:: Species
type:: basic:string

GATK GenomicsDBImport

data:genomicsdb:gatk-genomicsdb-import (list:data:variants:gvcf gvcfs, data:bed intervals, basic:boolean use_existing, data:genomicsdb existing_db, basic:integer batch_size, basic:boolean consolidate, basic:integer max_heap_size, basic:boolean use_cms_gc)[Source: v1.3.0]

Import single-sample GVCFs into GenomicsDB before joint genotyping.

gvcfs

label:: Input data (GVCF)
type:: list:data:variants:gvcf
required:: True
disabled:: False
hidden:: False

intervals

label:: Intervals file (.bed)
type:: data:bed
description:: Intervals file is required if a new database will be created.
required:: False
disabled:: False
hidden:: False

use_existing

label:: Add new samples to an existing GenomicsDB workspace
type:: basic:boolean
required:: True
disabled:: False
hidden:: False
default:: False

existing_db

label:: Select a GATK GenomicsDB object
type:: data:genomicsdb
description:: Instead of creating a new database, the GVCFs are added to this database and a new GenomicsDB object is created.
required:: False
disabled:: False
hidden:: !use_existing

advanced_options.batch_size

label:: Batch size
type:: basic:integer
description:: Batch size controls the number of samples for which readers are open at once and therefore provides a way to minimize memory consumption. However, it can take longer to complete. Use the consolidate flag if more than a hundred batches were used. This will improve feature read time. batchSize=0 means no batching (i.e. readers for all samples will be opened at once).
required:: True
disabled:: False
hidden:: False
default:: 0
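The batching rule in the description can be made concrete: samples are imported `batch_size` at a time, a value of 0 imports everything in one batch, and each batch creates one fragment, so consolidation becomes worthwhile once batches number in the hundreds. A minimal sketch of that arithmetic; the helper names are hypothetical and GATK performs the actual batching internally.

```python
from typing import List, Sequence


def plan_batches(samples: Sequence[str], batch_size: int) -> List[List[str]]:
    """Split samples into import batches; 0 means one batch with all readers open."""
    if batch_size < 0:
        raise ValueError("batch_size must be 0 or positive")
    if batch_size == 0:
        return [list(samples)]
    return [list(samples[i:i + batch_size]) for i in range(0, len(samples), batch_size)]


def should_consolidate(batches: List[List[str]], limit: int = 100) -> bool:
    """Each batch adds a fragment; consolidation is advised beyond about a hundred."""
    return len(batches) > limit


batches = plan_batches([f"sample{i}" for i in range(250)], batch_size=2)
print(len(batches), should_consolidate(batches))  # 125 True
```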

advanced_options.consolidate

label:: Consolidate
type:: basic:boolean
description:: Boolean flag to enable consolidation. If importing data in batches, a new fragment is created for each batch. In case thousands of fragments are created, GenomicsDB feature readers will try to open ~20x as many files. Also, internally GenomicsDB would consume more memory to maintain bookkeeping data from all fragments. Use this flag to merge all fragments into one. Merging can potentially improve read performance, however overall benefit might not be noticeable as the top Java layers have significantly higher overheads. This flag has no effect if only one batch is used.
required:: True
disabled:: False
hidden:: False
default:: False

advanced_options.max_heap_size

label:: Java maximum heap size in GB (Xmx)
type:: basic:integer
description:: Set the maximum Java heap size.
required:: True
disabled:: False
hidden:: False
default:: 28

advanced_options.use_cms_gc

label:: Use CMS Garbage Collector in Java
type:: basic:boolean
description:: The Concurrent Mark Sweep (CMS) implementation uses multiple garbage collector threads for garbage collection.
required:: True
disabled:: False
hidden:: False
default:: True

database

label:: GenomicsDB workspace
type:: basic:dir
required:: True
disabled:: False
hidden:: False

intervals

label:: Intervals file
type:: basic:file
required:: True
disabled:: False
hidden:: False

species

label:: Species
type:: basic:string
required:: True
disabled:: False
hidden:: False

build

label:: Build
type:: basic:string
required:: True
disabled:: False
hidden:: False

GATK GenotypeGVCFs

data:variants:vcf:genotypegvcfs:gatk-genotype-gvcfs (data:genomicsdb database, data:seq:nucleotide ref_seq, data:variants:vcf dbsnp, basic:integer n_jobs, basic:integer max_heap_size)[Source: v2.3.0]

Consolidate GVCFs and run joint calling using the GenotypeGVCFs tool.

database

label:: GATK GenomicsDB
type:: data:genomicsdb
required:: True
disabled:: False
hidden:: False

ref_seq

label:: Reference sequence
type:: data:seq:nucleotide
required:: True
disabled:: False
hidden:: False

dbsnp

label:: dbSNP file
type:: data:variants:vcf
required:: True
disabled:: False
hidden:: False

advanced_options.n_jobs

label:: Number of concurrent jobs
type:: basic:integer
description:: Use a fixed number of jobs for genotyping instead of determining it based on the number of available cores.
required:: False
disabled:: False
hidden:: False

advanced_options.max_heap_size

label:: Java maximum heap size in GB (Xmx)
type:: basic:integer
description:: Set the maximum Java heap size.
required:: True
disabled:: False
hidden:: False
default:: 28

vcf

label:: GVCF file
type:: basic:file
required:: True
disabled:: False
hidden:: False

vcf_dir

label:: Folder with split GVCFs
type:: basic:dir
required:: True
disabled:: False
hidden:: False

tbi

label:: Tabix index
type:: basic:file
required:: True
disabled:: False
hidden:: False

species

label:: Species
type:: basic:string
required:: True
disabled:: False
hidden:: False

build

label:: Build
type:: basic:string
required:: True
disabled:: False
hidden:: False

GATK HaplotypeCaller (GVCF)

data:variants:gvcf:gatk-haplotypecaller-gvcf (data:alignment:bam bam, data:seq:nucleotide ref_seq, data:bed intervals, basic:decimal contamination)[Source: v1.3.0]

Run GATK HaplotypeCaller in GVCF mode.

bam

label:: Analysis ready BAM file
type:: data:alignment:bam
required:: True
disabled:: False
hidden:: False

ref_seq

label:: Reference sequence
type:: data:seq:nucleotide
required:: True
disabled:: False
hidden:: False

options.intervals

label:: Use intervals BED file to limit the analysis to the specified parts of the genome.
type:: data:bed
required:: False
disabled:: False
hidden:: False

options.contamination

label:: Contamination fraction
type:: basic:decimal
description:: Fraction of contamination in sequencing data (for all samples) to aggressively remove.
required:: True
disabled:: False
hidden:: False
default:: 0

vcf

label:: GVCF file
type:: basic:file
required:: True
disabled:: False
hidden:: False

tbi

label:: Tabix index
type:: basic:file
required:: True
disabled:: False
hidden:: False

species

label:: Species
type:: basic:string
required:: True
disabled:: False
hidden:: False

build

label:: Build
type:: basic:string
required:: True
disabled:: False
hidden:: False

GATK MergeVcfs

data:variants:vcf:mergevcfs:gatk-merge-vcfs (list:data:variants:vcf vcfs, data:seq:nucleotide ref_seq, basic:integer java_gc_threads, basic:integer max_heap_size)[Source: v1.2.0]

Combine multiple variant files into a single variant file using GATK MergeVcfs.

vcfs

label:: Input data (VCFs)
type:: list:data:variants:vcf
required:: True
disabled:: False
hidden:: False

advanced_options.ref_seq

label:: Reference sequence
type:: data:seq:nucleotide
description:: Optionally use a sequence dictionary file (.dict) if the input VCF does not contain a complete contig list.
required:: False
disabled:: False
hidden:: False

advanced_options.java_gc_threads

label:: Java ParallelGCThreads
type:: basic:integer
description:: Sets the number of threads used during parallel phases of the garbage collectors.
required:: True
disabled:: False
hidden:: False
default:: 2

advanced_options.max_heap_size

label:: Java maximum heap size (Xmx)
type:: basic:integer
description:: Set the maximum Java heap size (in GB).
required:: True
disabled:: False
hidden:: False
default:: 12

vcf

label:: Merged VCF
type:: basic:file
required:: True
disabled:: False
hidden:: False

tbi

label:: Tabix index
type:: basic:file
required:: True
disabled:: False
hidden:: False

species

label:: Species
type:: basic:string
required:: True
disabled:: False
hidden:: False

build

label:: Build
type:: basic:string
required:: True
disabled:: False
hidden:: False

GATK SelectVariants (multi-sample)

data:variants:vcf:selectvariants:gatk-select-variants (data:variants:vcf vcf, data:bed intervals, list:basic:string select_type, basic:boolean exclude_filtered, data:seq:nucleotide ref_seq, basic:integer java_gc_threads, basic:integer max_heap_size)[Source: v1.2.0]

Select a subset of variants based on various criteria using GATK SelectVariants. This tool accepts a multi-sample VCF file as input.

vcf

label:: Input data (VCF)
type:: data:variants:vcf
required:: True
disabled:: False
hidden:: False

intervals

label:: Intervals file (.bed)
type:: data:bed
description:: One or more genomic intervals over which to operate. This can also be used to get data from a specific interval.
required:: False
disabled:: False
hidden:: False

select_type

label:: Select only a certain type of variants from the input file
type:: list:basic:string
description:: This argument selects particular kinds of variants out of a list. If left empty, there is no type selection and all variant types are considered for other selection criteria. Valid types are INDEL, SNP, MIXED, MNP, SYMBOLIC, NO_VARIATION. Can be specified multiple times.
required:: False
disabled:: False
hidden:: False

exclude_filtered

label:: Don’t include filtered sites
type:: basic:boolean
description:: If this flag is enabled, sites that have been marked as filtered (i.e. have anything other than `.` or `PASS` in the FILTER field) will be excluded from the output.
required:: True
disabled:: False
hidden:: False
default:: False
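The `select_type` and `exclude_filtered` options can be illustrated with a small record filter. The type classification below is a simplified approximation of the variant types GATK uses (single-base substitution as SNP, equal-length multi-base substitution as MNP, length change as INDEL, angle-bracket or breakend alleles as SYMBOLIC, mixed types across alternate alleles as MIXED); records are plain dictionaries rather than parsed VCF lines, and edge cases such as the `*` allele are ignored.

```python
from typing import Dict, Iterable, Iterator, Sequence


def variant_type(ref: str, alts: Sequence[str]) -> str:
    """Simplified classification of a VCF record into GATK-style variant types."""
    types = set()
    for alt in alts:
        if alt == ".":
            continue  # no alternate allele at this site
        if alt.startswith("<") or "[" in alt or "]" in alt:
            types.add("SYMBOLIC")
        elif len(ref) == len(alt) == 1:
            types.add("SNP")
        elif len(ref) == len(alt):
            types.add("MNP")
        else:
            types.add("INDEL")
    if not types:
        return "NO_VARIATION"
    return types.pop() if len(types) == 1 else "MIXED"


def is_filtered(filter_field: str) -> bool:
    """A site is filtered if FILTER holds anything other than '.' or 'PASS'."""
    return filter_field not in (".", "PASS")


def select_variants(
    records: Iterable[Dict], select_type: Sequence[str] = (), exclude_filtered: bool = False
) -> Iterator[Dict]:
    """Yield records of the requested types; an empty select_type keeps all types."""
    wanted = set(select_type)
    for record in records:
        if exclude_filtered and is_filtered(record["FILTER"]):
            continue
        if wanted and variant_type(record["REF"], record["ALT"]) not in wanted:
            continue
        yield record


records = [
    {"ID": "v1", "REF": "A", "ALT": ["G"], "FILTER": "PASS"},     # SNP
    {"ID": "v2", "REF": "AT", "ALT": ["A"], "FILTER": "PASS"},    # INDEL
    {"ID": "v3", "REF": "C", "ALT": ["T"], "FILTER": "LowQual"},  # filtered SNP
    {"ID": "v4", "REF": "G", "ALT": ["A", "GT"], "FILTER": "."},  # MIXED
]
print([r["ID"] for r in select_variants(records, ["SNP", "MIXED"], exclude_filtered=True)])
```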

advanced_options.ref_seq

label:: Reference sequence
type:: data:seq:nucleotide
required:: False
disabled:: False
hidden:: False

advanced_options.java_gc_threads

label:: Java ParallelGCThreads
type:: basic:integer
description:: Sets the number of threads used during parallel phases of the garbage collectors.
required:: True
disabled:: False
hidden:: False
default:: 2

advanced_options.max_heap_size

label:: Java maximum heap size (Xmx)
type:: basic:integer
description:: Set the maximum Java heap size (in GB).
required:: True
disabled:: False
hidden:: False
default:: 12

vcf

label:: Selected variants (VCF)
type:: basic:file
required:: True
disabled:: False
hidden:: False

tbi

label:: Tabix index
type:: basic:file
required:: True
disabled:: False
hidden:: False

species

label:: Species
type:: basic:string
required:: True
disabled:: False
hidden:: False

build

label:: Build
type:: basic:string
required:: True
disabled:: False
hidden:: False

GATK SelectVariants (single-sample)

data:variants:vcf:selectvariants:single:gatk-select-variants-single (data:variants:vcf vcf, data:bed intervals, list:basic:string select_type, basic:boolean exclude_filtered, data:seq:nucleotide ref_seq, basic:integer java_gc_threads, basic:integer max_heap_size)[Source: v1.1.0]

Select a subset of variants based on various criteria using GATK SelectVariants. This tool accepts a single-sample VCF file as input.

vcf

label:: Input data (VCF)
type:: data:variants:vcf
required:: True
disabled:: False
hidden:: False

intervals

label:: Intervals file (.bed)
type:: data:bed
description:: One or more genomic intervals over which to operate. This can also be used to get data from a specific interval.
required:: False
disabled:: False
hidden:: False

select_type

label:: Select only a certain type of variants from the input file
type:: list:basic:string
description:: This argument selects particular kinds of variants out of a list. If left empty, there is no type selection and all variant types are considered for other selection criteria. Valid types are INDEL, SNP, MIXED, MNP, SYMBOLIC, NO_VARIATION. Can be specified multiple times.
required:: False
disabled:: False
hidden:: False

exclude_filtered

label:: Don’t include filtered sites
type:: basic:boolean
description:: If this flag is enabled, sites that have been marked as filtered (i.e. have anything other than `.` or `PASS` in the FILTER field) will be excluded from the output.
required:: True
disabled:: False
hidden:: False
default:: False

advanced_options.ref_seq

label:: Reference sequence
type:: data:seq:nucleotide
required:: False
disabled:: False
hidden:: False

advanced_options.java_gc_threads

label:: Java ParallelGCThreads
type:: basic:integer
description:: Sets the number of threads used during parallel phases of the garbage collectors.
required:: True
disabled:: False
hidden:: False
default:: 2

advanced_options.max_heap_size

label:: Java maximum heap size (Xmx)
type:: basic:integer
description:: Set the maximum Java heap size (in GB).
required:: True
disabled:: False
hidden:: False
default:: 12

vcf

label:: Selected variants (VCF)
type:: basic:file
required:: True
disabled:: False
hidden:: False

tbi

label:: Tabix index
type:: basic:file
required:: True
disabled:: False
hidden:: False

species

label:: Species
type:: basic:string
required:: True
disabled:: False
hidden:: False

build

label:: Build
type:: basic:string
required:: True
disabled:: False
hidden:: False

GATK SplitNCigarReads

data:alignment:bam:splitncigar:gatk-split-ncigar (data:alignment:bam bam, data:seq:nucleotide ref_seq, basic:integer java_gc_threads, basic:integer max_heap_size)[Source: v1.2.0]

Splits reads that contain Ns in their CIGAR string. The tool identifies all N CIGAR elements and creates k+1 new reads (where k is the number of N CIGAR elements). The first read includes the bases to the left of the first N element, while the part of the read to the right of the N (including the Ns) is hard-clipped; the same applies to each of the remaining new reads. Used for post-processing RNA-seq reads aligned against the full reference.
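The splitting rule can be sketched at the CIGAR level: each N element closes the current piece, and the next piece starts on the reference after the skipped region, giving k+1 pieces for k N elements. This sketch reports only the aligned part of each new read and its reference start; it omits the hard-clip bookkeeping and the other details GATK handles, and the function name is hypothetical.

```python
import re
from typing import List, Tuple

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
REFERENCE_CONSUMING = set("MDN=X")  # operations that advance the reference position


def split_at_n(cigar: str, start: int) -> List[Tuple[int, str]]:
    """Split an alignment at every N element into (reference start, CIGAR) pieces."""
    pieces: List[Tuple[int, str]] = []
    current: List[str] = []
    position = piece_start = start
    for length_text, op in CIGAR_OP.findall(cigar):
        length = int(length_text)
        if op == "N":
            # Close the current piece; the next one starts after the skipped region.
            pieces.append((piece_start, "".join(current)))
            current = []
            position += length
            piece_start = position
            continue
        current.append(f"{length}{op}")
        if op in REFERENCE_CONSUMING:
            position += length
    pieces.append((piece_start, "".join(current)))
    return pieces


print(split_at_n("10M100N20M", 1000))      # [(1000, '10M'), (1110, '20M')]
print(split_at_n("5S10M50N10M3D5M", 100))  # [(100, '5S10M'), (160, '10M3D5M')]
```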

bam

label:: Alignment BAM file
type:: data:alignment:bam
required:: True
disabled:: False
hidden:: False

ref_seq

label:: Reference sequence FASTA file
type:: data:seq:nucleotide
required:: True
disabled:: False
hidden:: False

advanced.java_gc_threads

label:: Java ParallelGCThreads
type:: basic:integer
description:: Sets the number of threads used during parallel phases of the garbage collectors.
required:: True
disabled:: False
hidden:: False
default:: 2

advanced.max_heap_size

label:: Java maximum heap size (Xmx)
type:: basic:integer
description:: Set the maximum Java heap size (in GB).
required:: True
disabled:: False
hidden:: False
default:: 12

bam

label:: BAM file with reads split at N CIGAR elements
type:: basic:file
required:: True
disabled:: False
hidden:: False

bai

label:: Index of BAM file
type:: basic:file
required:: True
disabled:: False
hidden:: False

stats

label:: Alignment statistics
type:: basic:file
required:: True
disabled:: False
hidden:: False

species

label:: Species
type:: basic:string
required:: True
disabled:: False
hidden:: False

build

label:: Build
type:: basic:string
required:: True
disabled:: False
hidden:: False

GATK VariantFiltration (multi-sample)

data:variants:vcf:variantfiltration:gatk-variant-filtration (data:variants:vcf vcf, data:seq:nucleotide ref_seq, list:basic:string filter_expressions, list:basic:string filter_name, list:basic:string genotype_filter_expressions, list:basic:string genotype_filter_name, data:variants:vcf mask, basic:string mask_name, basic:integer cluster, basic:integer window, basic:integer java_gc_threads, basic:integer max_heap_size)[Source: v1.3.0]

Filter multi-sample variant calls based on INFO and/or FORMAT annotations. This tool is designed for hard-filtering variant calls based on certain criteria. Records are hard-filtered by changing the value in the FILTER field to something other than PASS. Passing variants are annotated as PASS and failing variants are annotated with the name(s) of the filter(s) they failed. To remove failing variants, use the GATK SelectVariants process.

vcf

label:: Input data (VCF)
type:: data:variants:vcf
required:: True
disabled:: False
hidden:: False

ref_seq

label:: Reference sequence
type:: data:seq:nucleotide
required:: True
disabled:: False
hidden:: False

filter_expressions

label:: Expressions used with INFO fields to filter
type:: list:basic:string
description:: VariantFiltration accepts any number of JEXL expressions (so you can have two named filters by using `--filter-name One --filter-expression 'X < 1' --filter-name Two --filter-expression 'X > 2'`). It is preferable to use multiple expressions, each specifying an individual filter criterion, rather than a single compound expression that specifies multiple filter criteria. Input expressions one by one and press ENTER after each expression. Examples of filter expressions: `FS > 30`, `DP > 10`.
required:: False
disabled:: False
hidden:: False

filter_name

label:: Names to use for the list of filters
type:: list:basic:string
description:: This name is put in the FILTER field for variants that get filtered. Note that there must be a 1-to-1 mapping between filter expressions and filter names. Input names one by one and press ENTER after each name. Warning: filter names should be in the same order as filter expressions. Example: if you specified the filter expressions `FS > 30` and `DP > 10`, specify the filter names `FS` and `DP`.
required:: False
disabled:: False
hidden:: False
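
Filter names are matched to filter expressions by position. If the two lists differ in length or order, filters end up silently mislabeled. The sketch below uses a hypothetical helper named `build_filter_args`, which is not part of this process. It checks the pairing and emits GATK4 VariantFiltration `--filter-name`/`--filter-expression` arguments in the same order. How the process itself assembles its command line is an assumption here.

```python
def build_filter_args(expressions, names):
    """Pair each INFO filter expression with its name, preserving order.

    Hypothetical helper: enforces the 1-to-1 mapping described above and
    returns GATK4 VariantFiltration arguments as a list of strings.
    """
    if len(expressions) != len(names):
        raise ValueError(
            f"{len(expressions)} filter expressions but {len(names)} filter names"
        )
    args = []
    for expression, name in zip(expressions, names):
        # Each named filter is given as a --filter-name / --filter-expression pair.
        args += ["--filter-name", name, "--filter-expression", expression]
    return args


print(build_filter_args(["FS > 30", "DP > 10"], ["FS", "DP"]))
```
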

genotype_filter_expressions

label:: Expressions used with FORMAT field to filter
type:: list:basic:string
description:: Similar to the INFO field based expressions, but used on the FORMAT (genotype) fields instead. VariantFiltration will add the sample-level FT tag to the FORMAT field of filtered samples (this does not affect the record’s FILTER tag). One can filter normally based on most fields (e.g. ‘GQ < 5.0’), but the GT (genotype) field is an exception. We have put in convenience methods so that one can now filter out hets (‘isHet == 1’), refs (‘isHomRef == 1’), or homs (‘isHomVar == 1’). Also available are expressions isCalled, isNoCall, isMixed, and isAvailable, in accordance with the methods of the Genotype object. To filter by alternative allele depth, use the expression: ‘AD.1 < 5’. This filter expression will filter all the samples in the multi-sample VCF file.
required:: False
disabled:: False
hidden:: False

genotype_filter_name

label:: Names to use for the list of genotype filters
type:: list:basic:string
description:: Similar to the INFO filter names, but used for the FORMAT (genotype) filters instead; the name is written to the sample-level FT tag of filtered samples. Warning: genotype filter names should be in the same order as genotype filter expressions.
required:: False
disabled:: False
hidden:: False

mask

label:: Input mask
type:: data:variants:vcf
description:: Any variant which overlaps entries from the provided mask file will be filtered.
required:: False
disabled:: False
hidden:: False

mask_name

label:: The text to put in the FILTER field if a ‘mask’ is provided
type:: basic:string
description:: When using the mask file, the mask name will be annotated in the variant record.
required:: False
disabled:: !mask
hidden:: False

advanced.cluster

label:: Cluster size
type:: basic:integer
description:: The number of SNPs which make up a cluster. Must be at least 2.
required:: True
disabled:: False
hidden:: False
default:: 3

advanced.window

label:: Window size
type:: basic:integer
description:: The window size (in bases) in which to evaluate clustered SNPs.
required:: True
disabled:: False
hidden:: False
default:: 0
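
The sketch below shows how cluster size and window size interact by flagging runs of SNPs on one chromosome. It is a simplified illustration, not GATK's implementation. The function name `clustered_snps`, the inclusive distance comparison, and the treatment of a window of 0 as disabling the check are all assumptions.

```python
def clustered_snps(positions, cluster_size=3, window=0):
    """Return SNP positions that belong to a cluster (simplified illustration).

    A SNP is flagged if it is part of some group of `cluster_size`
    consecutive SNPs whose first and last positions are at most `window`
    bases apart. All positions are assumed to lie on the same chromosome.
    """
    if cluster_size < 2:
        raise ValueError("cluster size must be at least 2")
    if window <= 0:
        # Assumption: a window of 0 switches cluster detection off.
        return set()
    pos = sorted(positions)
    flagged = set()
    for i in range(len(pos) - cluster_size + 1):
        j = i + cluster_size - 1
        if pos[j] - pos[i] <= window:
            flagged.update(pos[i : j + 1])
    return flagged


print(sorted(clustered_snps([100, 105, 112, 500, 900], cluster_size=3, window=20)))
```
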

advanced.java_gc_threads

label:: Java ParallelGCThreads
type:: basic:integer
description:: Sets the number of threads used during parallel phases of the garbage collectors.
required:: True
disabled:: False
hidden:: False
default:: 2

advanced.max_heap_size

label:: Java maximum heap size (Xmx)
type:: basic:integer
description:: Set the maximum Java heap size (in GB).
required:: True
disabled:: False
hidden:: False
default:: 12
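
The two Java settings above correspond to standard HotSpot JVM flags. This minimal sketch assumes the values are passed as `-XX:ParallelGCThreads` and `-Xmx`, with the heap given in gigabytes. The helper name `java_options` is hypothetical.

```python
def java_options(gc_threads=2, max_heap_gb=12):
    """Build a JVM options string from the two advanced parameters.

    Assumption: the values map onto -XX:ParallelGCThreads and -Xmx (in GB).
    """
    if gc_threads < 1 or max_heap_gb < 1:
        raise ValueError("thread count and heap size must be positive")
    return f"-XX:ParallelGCThreads={gc_threads} -Xmx{max_heap_gb}g"


print(java_options())
```

The resulting string could be passed to GATK4 through its `--java-options` argument, for example `gatk --java-options "-XX:ParallelGCThreads=2 -Xmx12g" VariantFiltration ...`.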

vcf

label:: Filtered variants (VCF)
type:: basic:file
required:: True
disabled:: False
hidden:: False

tbi

label:: Tabix index
type:: basic:file
required:: True
disabled:: False
hidden:: False

species

label:: Species
type:: basic:string
required:: True
disabled:: False
hidden:: False

build

label:: Build
type:: basic:string
required:: True
disabled:: False
hidden:: False

GATK VariantFiltration (single-sample)

data:variants:vcf:variantfiltration:single:gatk-variant-filtration-single (data:variants:vcf vcf, data:seq:nucleotide ref_seq, list:basic:string filter_expressions, list:basic:string filter_name, list:basic:string genotype_filter_expressions, list:basic:string genotype_filter_name, data:variants:vcf mask, basic:string mask_name, basic:integer cluster, basic:integer window, basic:integer java_gc_threads, basic:integer max_heap_size)[Source: v1.3.0]

Filter single-sample variant calls based on INFO and/or FORMAT annotations. This tool is designed for hard-filtering variant calls based on certain criteria. Records are hard-filtered by changing the value in the FILTER field to something other than PASS: passing variants are annotated as PASS, and failing variants are annotated with the name(s) of the filter(s) they failed. To remove failing variants, use the GATK SelectVariants process.

vcf

label:: Input data (VCF)
type:: data:variants:vcf
required:: True
disabled:: False
hidden:: False

ref_seq

label:: Reference sequence
type:: data:seq:nucleotide
required:: True
disabled:: False
hidden:: False

filter_expressions

label:: Expressions used with INFO fields to filter
type:: list:basic:string
description:: VariantFiltration accepts any number of JEXL expressions (so you can have two named filters by using --filter-name One --filter-expression ‘X < 1’ --filter-name Two --filter-expression ‘X > 2’). It is preferable to use multiple expressions, each specifying an individual filter criterion, rather than a single compound expression that specifies multiple filter criteria. Input expressions one by one and press ENTER after each expression. Examples of filter expressions: ‘FS > 30’, ‘DP > 10’.
required:: False
disabled:: False
hidden:: False

filter_name

label:: Names to use for the list of filters
type:: list:basic:string
description:: This name is put in the FILTER field for variants that get filtered. Note that there must be a 1-to-1 mapping between filter expressions and filter names. Input names one by one and press ENTER after each name. Warning: filter names should be in the same order as filter expressions. Example: if you specified filter expressions ‘FS > 30’ and ‘DP > 10’, specify filter names ‘FS’ and ‘DP’.
required:: False
disabled:: False
hidden:: False

genotype_filter_expressions

label:: Expressions used with FORMAT field to filter
type:: list:basic:string
description:: Similar to the INFO field based expressions, but used on the FORMAT (genotype) fields instead. VariantFiltration will add the sample-level FT tag to the FORMAT field of filtered samples (this does not affect the record’s FILTER tag). One can filter normally based on most fields (e.g. ‘GQ < 5.0’), but the GT (genotype) field is an exception. We have put in convenience methods so that one can now filter out hets (‘isHet == 1’), refs (‘isHomRef == 1’), or homs (‘isHomVar == 1’). Also available are expressions isCalled, isNoCall, isMixed, and isAvailable, in accordance with the methods of the Genotype object. To filter by alternative allele depth, use the expression: ‘AD.1 < 5’.
required:: False
disabled:: False
hidden:: False

genotype_filter_name

label:: Names to use for the list of genotype filters
type:: list:basic:string
description:: Similar to the INFO filter names, but used for the FORMAT (genotype) filters instead; the name is written to the sample-level FT tag of filtered samples. Warning: genotype filter names should be in the same order as genotype filter expressions.
required:: False
disabled:: False
hidden:: False

mask

label:: Input mask
type:: data:variants:vcf
description:: Any variant which overlaps entries from the provided mask file will be filtered.
required:: False
disabled:: False
hidden:: False

mask_name

label:: The text to put in the FILTER field if a ‘mask’ is provided
type:: basic:string
description:: When using the mask file, the mask name will be annotated in the variant record.
required:: False
disabled:: !mask
hidden:: False

advanced.cluster

label:: Cluster size
type:: basic:integer
description:: The number of SNPs which make up a cluster. Must be at least 2.
required:: True
disabled:: False
hidden:: False
default:: 3

advanced.window

label:: Window size
type:: basic:integer
description:: The window size (in bases) in which to evaluate clustered SNPs.
required:: True
disabled:: False
hidden:: False
default:: 0

advanced.java_gc_threads

label:: Java ParallelGCThreads
type:: basic:integer
description:: Sets the number of threads used during parallel phases of the garbage collectors.
required:: True
disabled:: False
hidden:: False
default:: 2

advanced.max_heap_size

label:: Java maximum heap size (Xmx)
type:: basic:integer
description:: Set the maximum Java heap size (in GB).
required:: True
disabled:: False
hidden:: False
default:: 12

vcf

label:: Filtered variants (VCF)
type:: basic:file
required:: True
disabled:: False
hidden:: False

tbi

label:: Tabix index
type:: basic:file
required:: True
disabled:: False
hidden:: False

species

label:: Species
type:: basic:string
required:: True
disabled:: False
hidden:: False

build

label:: Build
type:: basic:string
required:: True
disabled:: False
hidden:: False

GATK VariantsToTable

data:variantstable:variants-to-table (data:variants:vcf vcf, list:basic:string vcf_fields, list:basic:string gf_fields, basic:boolean split_alleles)[Source: v1.2.0]

Run GATK VariantsToTable. This tool extracts specified fields for each variant in a VCF file to a tab-delimited table, which may be easier to work with than a VCF. For additional information, please see the [manual page](https://gatk.broadinstitute.org/hc/en-us/articles/360036711531-VariantsToTable).

vcf

label:: Input VCF file
type:: data:variants:vcf
required:: True
disabled:: False
hidden:: False

vcf_fields

label:: Select VCF fields
type:: list:basic:string
description:: The name of a standard VCF field or an INFO field to include in the output table. The field can be any standard VCF column (e.g. CHROM, ID, QUAL) or any annotation name in the INFO field (e.g. AC, AF).
required:: True
disabled:: False
hidden:: False
default:: ['CHROM', 'POS', 'ID', 'REF', 'ALT']

advanced_options.gf_fields

label:: Include FORMAT/sample-level fields
type:: list:basic:string
required:: True
disabled:: False
hidden:: False
default:: ['GT', 'GQ']

advanced_options.split_alleles

label:: Split multi-allelic records into multiple lines
type:: basic:boolean
description:: By default, a variant record with multiple ALT alleles will be summarized in one line, with per alt-allele fields (e.g. allele depth) separated by commas. This may cause difficulty when the table is loaded by an R script, for example. Use this flag to write multi-allelic records on separate lines of output.
required:: True
disabled:: False
hidden:: False
default:: True
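
The effect of splitting is easiest to see on a single table row. The sketch below uses a hypothetical helper, `split_multiallelic`, as a simplified illustration. It assumes each per-allele column holds exactly one comma-separated value per ALT allele, as AF does. It does not handle fields such as AD, which also carry a value for the REF allele.

```python
def split_multiallelic(record, per_allele_fields=("AF",)):
    """Return one row per ALT allele (simplified illustration).

    `record` maps column names to string values; each field named in
    `per_allele_fields` holds one comma-separated value per ALT allele.
    """
    alts = record["ALT"].split(",")
    rows = []
    for index, alt in enumerate(alts):
        # Copy the shared columns, then keep only this allele's values.
        row = dict(record, ALT=alt)
        for field in per_allele_fields:
            row[field] = record[field].split(",")[index]
        rows.append(row)
    return rows


record = {"CHROM": "chr1", "POS": "1000", "REF": "G", "ALT": "A,T", "AF": "0.25,0.10"}
for row in split_multiallelic(record):
    print(row)
```
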

tsv

label:: Tab-delimited file with variants
type:: basic:file
required:: True
disabled:: False
hidden:: False

species

label:: Species
type:: basic:string
required:: True
disabled:: False
hidden:: False

build

label:: Build
type:: basic:string
required:: True
disabled:: False
hidden:: False

GATK filter variants (VQSR)

data:variants:vcf:vqsr:gatk-vqsr (data:variants:vcf vcf, data:variants:vcf dbsnp, data:variants:vcf mills, data:variants:vcf axiom_poly, data:variants:vcf hapmap, data:variants:vcf omni, data:variants:vcf thousand_genomes, basic:boolean use_as_anno, list:basic:string indel_anno_fields, list:basic:string snp_anno_fields, basic:decimal indel_filter_level, basic:decimal snp_filter_level, basic:integer max_gaussians_indels, basic:integer max_gaussians_snps)[Source: v1.2.0]

Filter WGS variants using the Variant Quality Score Recalibration (VQSR) procedure.

vcf

label:: Input data (VCF)
type:: data:variants:vcf
required:: True
disabled:: False
hidden:: False

resource_files.dbsnp

label:: dbSNP file
type:: data:variants:vcf
required:: True
disabled:: False
hidden:: False

resource_files.mills

label:: Mills and 1000G gold standard indels
type:: data:variants:vcf
required:: False
disabled:: False
hidden:: False

resource_files.axiom_poly

label:: 1000G Axiom genotype data
type:: data:variants:vcf
required:: False
disabled:: False
hidden:: False

resource_files.hapmap

label:: HapMap variants
type:: data:variants:vcf
required:: False
disabled:: False
hidden:: False

resource_files.omni

label:: 1000G Omni variants
type:: data:variants:vcf
required:: False
disabled:: False
hidden:: False

resource_files.thousand_genomes

label:: 1000G high confidence SNPs
type:: data:variants:vcf
required:: False
disabled:: False
hidden:: False

advanced_options.use_as_anno

label:: --use-allele-specific-annotations
type:: basic:boolean
required:: True
disabled:: False
hidden:: False
default:: False

advanced_options.indel_anno_fields

label:: Annotation fields (INDEL filtering)
type:: list:basic:string
required:: True
disabled:: False
hidden:: False
default:: ['FS', 'ReadPosRankSum', 'MQRankSum', 'QD', 'SOR', 'DP']

advanced_options.snp_anno_fields

label:: Annotation fields (SNP filtering)
type:: list:basic:string
required:: True
disabled:: False
hidden:: False
default:: ['QD', 'MQRankSum', 'ReadPosRankSum', 'FS', 'MQ', 'SOR', 'DP']

advanced_options.indel_filter_level

label:: --truth-sensitivity-filter-level (INDELs)
type:: basic:decimal
required:: True
disabled:: False
hidden:: False
default:: 99.0

advanced_options.snp_filter_level

label:: --truth-sensitivity-filter-level (SNPs)
type:: basic:decimal
required:: True
disabled:: False
hidden:: False
default:: 99.7
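
The truth-sensitivity filter level is the percentage of truth-set variants to retain. The sketch below illustrates the idea in simplified form and is not GATK's tranche implementation. Given the VQSLOD scores of truth-set sites, it finds the lowest score that still keeps the requested percentage; variants scoring below that cutoff would be filtered. The function name `vqslod_cutoff` and the ranking rule are assumptions, and the scores in the example are made up.

```python
import math


def vqslod_cutoff(truth_scores, sensitivity=99.7):
    """Lowest VQSLOD that still retains `sensitivity` percent of truth sites."""
    if not truth_scores or not 0 < sensitivity <= 100:
        raise ValueError("need truth scores and a sensitivity in (0, 100]")
    # Rank truth-set scores from best to worst.
    ranked = sorted(truth_scores, reverse=True)
    # Number of truth sites that must score at or above the cutoff.
    keep = math.ceil(len(ranked) * sensitivity / 100)
    return ranked[keep - 1]


scores = [5.0, 4.2, 3.1, 2.0, 1.5, 0.9, 0.3, -0.5, -1.2, -3.0]  # made-up values
print(vqslod_cutoff(scores, sensitivity=90))
```

Raising the level (for example from 99.0 to 99.7) moves the cutoff down and lets more variants pass, at the cost of admitting more false positives.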

advanced_options.max_gaussians_indels

label:: --max-gaussians (INDELs)
type:: basic:integer
description:: This parameter determines the maximum number of Gaussians that should be used when building a positive model using the variational Bayes algorithm. This parameter sets the expected number of clusters in modeling. If a dataset gives fewer distinct clusters, e.g. as can happen for smaller data, then the tool will report insufficient data with a "No data found" error message. In this case, try decrementing the --max-gaussians value.
required:: True
disabled:: False
hidden:: False
default:: 4

advanced_options.max_gaussians_snps

label:: --max-gaussians (SNPs)
type:: basic:integer
description:: This parameter determines the maximum number of Gaussians that should be used when building a positive model using the variational Bayes algorithm. This parameter sets the expected number of clusters in modeling. If a dataset gives fewer distinct clusters, e.g. as can happen for smaller data, then the tool will report insufficient data with a "No data found" error message. In this case, try decrementing the --max-gaussians value.
required:: True
disabled:: False
hidden:: False
default:: 6
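
The advice above, to lower `--max-gaussians` when the model cannot be built, can be automated with a retry loop. This minimal sketch assumes a hypothetical callable, `run_model`, that raises `RuntimeError` when fitting fails. Neither that callable nor `fit_with_fallback` is part of this process.

```python
def fit_with_fallback(run_model, max_gaussians, minimum=1):
    """Try max_gaussians, then one fewer after each failure, down to minimum.

    `run_model` is a hypothetical callable that takes a --max-gaussians value
    and raises RuntimeError when the data cannot support that many clusters.
    Returns a (value_used, result) tuple.
    """
    last_error = None
    for value in range(max_gaussians, minimum - 1, -1):
        try:
            return value, run_model(value)
        except RuntimeError as error:
            last_error = error  # e.g. a "No data found" failure; try fewer Gaussians
    raise RuntimeError(f"model could not be built with {minimum} or more Gaussians") from last_error


def fake_run(value):
    # Stand-in for a real VQSR run: pretend the data supports only 3 clusters.
    if value > 3:
        raise RuntimeError("No data found")
    return f"model with {value} Gaussians"


print(fit_with_fallback(fake_run, 6))
```
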

vcf

label:: GVCF file
type:: basic:file
required:: True
disabled:: False
hidden:: False

tbi

label:: Tabix index
type:: basic:file
required:: True
disabled:: False
hidden:: False

species

label:: Species
type:: basic:string
required:: True
disabled:: False
hidden:: False

build

label:: Build
type:: basic:string
required:: True
disabled:: False
hidden:: False

GATK refine variants

data:variants:vcf:refinevariants:gatk-refine-variants (data:variants:vcf vcf, data:seq:nucleotide ref_seq, data:variants:vcf vcf_pop)[Source: v1.1.1]

Run GATK Genotype Refinement. The goal of the Genotype Refinement workflow is to use additional data to improve the accuracy of genotype calls and to filter genotype calls that are not reliable enough for downstream analysis. In this sense it serves as an optional extension of the variant calling workflow, intended for researchers whose work requires high-quality identification of individual genotypes. For additional information, please see the [manual page](https://gatk.broadinstitute.org/hc/en-us/articles/360035531432-Genotype-Refinement-workflow-for-germline-short-variants).

vcf

label:: The main input, as produced in the GATK VQSR process
type:: data:variants:vcf
required:: True
disabled:: False
hidden:: False

ref_seq

label:: Reference sequence
type:: data:seq:nucleotide
required:: True
disabled:: False
hidden:: False

vcf_pop

label:: Population-level variant set (VCF)
type:: data:variants:vcf
required:: False
disabled:: False
hidden:: False

vcf

label:: Refined multi-sample VCF
type:: basic:file
required:: True
disabled:: False
hidden:: False

tbi

label:: Tabix index
type:: basic:file
required:: True
disabled:: False
hidden:: False

species

label:: Species
type:: basic:string
required:: True
disabled:: False
hidden:: False

build

label:: Build
type:: basic:string
required:: True
disabled:: False
hidden:: False

GATK4 (HaplotypeCaller)

data:variants:vcf:gatk:hc:vc-gatk4-hc (data:alignment:bam alignment, data:seq:nucleotide genome, data:bed intervals_bed, data:variants:vcf dbsnp, basic:integer stand_call_conf, basic:integer mbq, basic:integer max_reads, basic:integer interval_padding, basic:boolean soft_clipped, basic:integer java_gc_threads, basic:integer max_heap_size)[Source: v1.5.0]

GATK HaplotypeCaller Variant Calling. Call germline SNPs and indels via local re-assembly of haplotypes. The HaplotypeCaller is capable of calling SNPs and indels simultaneously via local de-novo assembly of haplotypes in an active region. In other words, whenever the program encounters a region showing signs of variation, it discards the existing mapping information and completely reassembles the reads in that region. This allows the HaplotypeCaller to be more accurate when calling regions that are traditionally difficult to call, for example when they contain different types of variants close to each other. It also makes the HaplotypeCaller much better at calling indels than position-based callers like UnifiedGenotyper.

alignment

label:: Analysis ready BAM file
type:: data:alignment:bam
required:: True
disabled:: False
hidden:: False

genome

label:: Reference genome
type:: data:seq:nucleotide
required:: True
disabled:: False
hidden:: False

intervals_bed

label:: Intervals (from BED file)
type:: data:bed
description:: Use this option to perform the analysis over only part of the genome.
required:: False
disabled:: False
hidden:: False

dbsnp

label:: dbSNP file
type:: data:variants:vcf
description:: Database of known polymorphic sites.
required:: True
disabled:: False
hidden:: False

stand_call_conf

label:: Min call confidence threshold
type:: basic:integer
description:: The minimum phred-scaled confidence threshold at which variants should be called.
required:: True
disabled:: False
hidden:: False
default:: 30

mbq

label:: Min Base Quality
type:: basic:integer
description:: Minimum base quality required to consider a base for calling.
required:: True
disabled:: False
hidden:: False
default:: 20
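
Both thresholds above are phred-scaled. A score Q corresponds to an error probability of 10^(-Q/10). The default call confidence of 30 therefore means roughly a 1 in 1,000 chance that the call is wrong, and a base quality of 20 means roughly a 1 in 100 chance that the base is wrong. The small conversion helper below (a hypothetical name) makes this concrete.

```python
def phred_to_error_probability(q):
    """Error probability for a phred-scaled score: P = 10 ** (-Q / 10)."""
    return 10 ** (-q / 10)


for q in (20, 30):
    print(q, phred_to_error_probability(q))
```
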

max_reads

label:: Max reads per alignment start site
type:: basic:integer
description:: Maximum number of reads to retain per alignment start position. Reads above this threshold will be downsampled. Set to 0 to disable.
required:: True
disabled:: False
hidden:: False
default:: 50
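
Per-start-site downsampling can be sketched as follows. This is a simplified illustration, not HaplotypeCaller's own downsampler. Reads are represented as hypothetical `(name, start)` pairs, and wherever more than `max_reads` reads share a start position, a seeded random subset is kept.

```python
import random
from collections import defaultdict


def downsample_by_start(reads, max_reads=50, seed=0):
    """Keep at most `max_reads` reads per alignment start position.

    `reads` is an iterable of (read_name, start) pairs; max_reads == 0
    disables downsampling. Output is grouped by start position.
    """
    reads = list(reads)
    if max_reads == 0:
        return reads
    by_start = defaultdict(list)
    for read in reads:
        by_start[read[1]].append(read)
    rng = random.Random(seed)  # seeded so the subset is reproducible
    kept = []
    for start in sorted(by_start):
        group = by_start[start]
        kept += group if len(group) <= max_reads else rng.sample(group, max_reads)
    return kept


reads = [(f"r{i}", 100) for i in range(80)] + [("x", 200)]
print(len(downsample_by_start(reads, max_reads=50)))
```
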

advanced.interval_padding

label:: Interval padding
type:: basic:integer
description:: Amount of padding (in bp) to add to each interval you are including. The recommended value is 100.
required:: False
disabled:: False
hidden:: !intervals_bed
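
Interval padding widens each target on both sides. This minimal sketch works on BED-style `(chrom, start, end)` tuples (0-based, half-open) and assumes starts are clamped at 0. It does not clamp at chromosome ends or merge overlapping padded intervals, and the helper name is hypothetical.

```python
def pad_intervals(intervals, padding=100):
    """Widen (chrom, start, end) BED intervals by `padding` bp on each side."""
    # BED starts are 0-based, so a padded start cannot go below 0.
    return [
        (chrom, max(0, start - padding), end + padding)
        for chrom, start, end in intervals
    ]


print(pad_intervals([("chr1", 50, 200), ("chr2", 1000, 1100)]))
```
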

advanced.soft_clipped

label:: Do not analyze soft clipped bases in the reads
type:: basic:boolean
description:: Suitable option for RNA-seq variant calling.
required:: True
disabled:: False
hidden:: False
default:: False

advanced.java_gc_threads

label:: Java ParallelGCThreads
type:: basic:integer
description:: Sets the number of threads used during parallel phases of the garbage collectors.
required:: True
disabled:: False
hidden:: False
default:: 2

advanced.max_heap_size

label:: Java maximum heap size (Xmx)
type:: basic:integer
description:: Set the maximum Java heap size (in GB).
required:: True
disabled:: False
hidden:: False
default:: 12

vcf

label:: VCF file
type:: basic:file
required:: True
disabled:: False
hidden:: False

tbi

label:: Tabix index
type:: basic:file
required:: True
disabled:: False
hidden:: False

species

label:: Species
type:: basic:string
required:: True
disabled:: False
hidden:: False

build

label:: Build
type:: basic:string
required:: True
disabled:: False
hidden:: False

GEO import

data:geo:geo-import (basic:string gse_accession, basic:boolean prefetch, basic:string max_size_prefetch, basic:integer min_spot_id, basic:integer max_spot_id, basic:integer min_read_len, basic:boolean clip, basic:boolean aligned, basic:boolean unaligned, basic:file mapping_file, basic:string source, basic:string build)[Source: v2.7.2]

Import all runs from a GEO Series. WARNING: Additional costs for storage and processing may be incurred if a very large data set is selected. RNA-Seq, ChIP-Seq, ATAC-Seq and expression microarray data sets can be uploaded. For RNA-Seq data sets, this runs the SRA import process for each experiment (SRX) from the selected RNA-Seq GEO Series; the same procedure is followed for ChIP-Seq and ATAC-Seq data sets. If the GSE contains microarray data, the process downloads individual samples and uploads them as microarray expression objects. Probe IDs can be mapped to Ensembl IDs if the corresponding GPL platform is supported; otherwise, a custom mapping file should be provided. Currently supported platforms are: GPL74, GPL201, GPL96, GPL571, GPL97, GPL570, GPL91, GPL8300, GPL92, GPL93, GPL94, GPL95, GPL17586, GPL5175, GPL80, GPL6244, GPL16686, GPL15207, GPL1352, GPL11068, GPL26966, GPL6848, GPL14550, GPL17077, GPL16981, GPL13497, GPL6947, GPL10558, GPL6883, GPL13376, GPL6884, GPL6254. In addition, a metadata table with sample information is created and uploaded to the same collection.

gse_accession

label:: GEO accession
type:: basic:string
description:: Enter a GEO series accession number.
required:: True
disabled:: False
hidden:: False

advanced.prefetch

label:: Prefetch SRA file
type:: basic:boolean
required:: True
disabled:: False
hidden:: False
default:: True

advanced.max_size_prefetch

label:: Maximum file size to download in KB
type:: basic:string
description:: A unit prefix can be used instead of a value in KB (e.g. 1024M or 1G).
required:: True
disabled:: False
hidden:: False
default:: 20G
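
The size field accepts both plain KB values and unit-suffixed values. The minimal parser below assumes K, M, G and T suffixes that denote binary multiples (1M = 1024 KB). The helper name `size_in_kb` is hypothetical.

```python
_UNITS_IN_KB = {"K": 1, "M": 1024, "G": 1024 ** 2, "T": 1024 ** 3}


def size_in_kb(value):
    """Convert '20G', '1024M' or a plain number (already in KB) to KB.

    Assumption: unit prefixes are binary multiples of a kilobyte.
    """
    text = value.strip().upper()
    if text and text[-1] in _UNITS_IN_KB:
        return int(text[:-1]) * _UNITS_IN_KB[text[-1]]
    return int(text)


print(size_in_kb("20G"), size_in_kb("1024M"), size_in_kb("500"))
```
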

advanced.min_spot_id

label:: Minimum spot ID
type:: basic:integer
required:: False
disabled:: False
hidden:: False

advanced.max_spot_id

label:: Maximum spot ID
type:: basic:integer
required:: False
disabled:: False
hidden:: False

advanced.min_read_len

label:: Minimum read length
type:: basic:integer
required:: False
disabled:: False
hidden:: False

advanced.clip

label:: Clip adapter sequences
type:: basic:boolean
required:: True
disabled:: False
hidden:: False
default:: False

advanced.aligned

label:: Dump only aligned sequences
type:: basic:boolean
required:: True
disabled:: False
hidden:: False
default:: False

advanced.unaligned

label:: Dump only unaligned sequences
type:: basic:boolean
required:: True
disabled:: False
hidden:: False
default:: False

advanced.mapping_file

label:: File with probe ID mappings
type:: basic:file
description:: The file should be tab-separated and contain two columns with their column names. The first column should contain Gene IDs and the second one should contain probe names. Supported file extensions are .tab.*, .tsv.*, .txt.*
required:: False
disabled:: False
hidden:: False
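
A mapping file in the layout described above can be read as shown below. The sketch assumes a single header row, gene IDs in column 1, and probe names in column 2. The column names and IDs in the example are made up, and `read_probe_mapping` is a hypothetical helper. A compressed file such as `.tsv.gz` could be opened with `gzip.open(path, "rt")` to obtain the handle.

```python
import csv
import io


def read_probe_mapping(handle):
    """Map probe names to gene IDs from a two-column, tab-separated file.

    Assumes a header row, gene IDs in the first column and probe names in
    the second column.
    """
    reader = csv.reader(handle, delimiter="\t")
    next(reader, None)  # skip the header row
    mapping = {}
    for row in reader:
        if len(row) < 2:
            continue  # ignore blank or malformed lines
        gene_id, probe = row[0], row[1]
        mapping[probe] = gene_id
    return mapping


example = io.StringIO("Gene ID\tProbe\nGENE_A\tprobe_001\nGENE_B\tprobe_002\n")
print(read_probe_mapping(example))
```
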

advanced.source

label:: Gene ID source
type:: basic:string
description:: Gene ID source used for probe mapping; required when using a custom mapping file.
required:: False
disabled:: False
hidden:: False
choices::

AFFY: AFFY
DICTYBASE: DICTYBASE
ENSEMBL: ENSEMBL
NCBI: NCBI
UCSC: UCSC

advanced.build

label:: Genome build
type:: basic:string
description:: Genome build of the mapping file; required when using a custom mapping file.
required:: False
disabled:: False
hidden:: False

GFF3 file

data:annotation:gff3upload-gff3 (basic:file src, basic:string source, basic:string species, basic:string build)[Source: v3.5.0]

Import a General Feature Format (GFF) file, a file format used for describing genes and other features of DNA, RNA and protein sequences. See [here](https://useast.ensembl.org/info/website/upload/gff3.html) and [here](https://en.wikipedia.org/wiki/General_feature_format) for more information.

src

label:: Annotation (GFF3)
type:: basic:file
description:: Annotation in GFF3 format. Supported extensions are: .gff, .gff3 and .gtf
validate_regex:: \.(gff|gff3|gtf)(|\.gz|\.bz2|\.tgz|\.tar\.gz|\.tar\.bz2|\.zip|\.rar|\.7z)$
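
The `validate_regex` above can be used directly with Python's `re` module to check a file name before uploading. The pattern is copied verbatim, and the helper name is hypothetical.

```python
import re

# Pattern copied verbatim from the validate_regex field above.
GFF3_PATTERN = re.compile(
    r"\.(gff|gff3|gtf)(|\.gz|\.bz2|\.tgz|\.tar\.gz|\.tar\.bz2|\.zip|\.rar|\.7z)$"
)


def is_accepted_gff3_name(filename):
    """Return True if the file name ends in an accepted annotation extension."""
    return GFF3_PATTERN.search(filename) is not None


for name in ("genes.gff3", "genes.gff3.gz", "genes.gtf.tar.gz", "genes.bed", "genes.gff3.xz"):
    print(name, is_accepted_gff3_name(name))
```

As written, the pattern is case-sensitive, so a name such as `genes.GFF3` would not match.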

source

label:: Gene ID database
type:: basic:string
choices::

DICTYBASE: DICTYBASE
ENSEMBL: ENSEMBL
NCBI: NCBI
UCSC: UCSC

species

label:: Species
type:: basic:string
description:: Species latin name.
choices::

Homo sapiens: Homo sapiens
Mus musculus: Mus musculus
Rattus norvegicus: Rattus norvegicus
Dictyostelium discoideum: Dictyostelium discoideum

build

label:: Build
type:: basic:string

annot

label:: Uploaded GFF3 file
type:: basic:file

annot_sorted

label:: Sorted GFF3 file
type:: basic:file

annot_sorted_idx_igv

label:: IGV index for sorted GFF3
type:: basic:file

annot_sorted_track_jbrowse

label:: Jbrowse track for sorted GFF3
type:: basic:file

source

label:: Gene ID database
type:: basic:string

species

label:: Species
type:: basic:string

build

label:: Build
type:: basic:string

GTF file

data:annotation:gtfupload-gtf (basic:file src, basic:string source, basic:string species, basic:string build)[Source: v3.5.0]

Import a Gene Transfer Format (GTF) file. It is a file format used to hold information about gene structure. It is a tab-delimited text format based on the general feature format (GFF), but contains some additional conventions specific to gene information. See [here](https://en.wikipedia.org/wiki/General_feature_format) for differences between GFF and GTF files.

src

label:: Annotation (GTF)
type:: basic:file
description:: Annotation in GTF format.
validate_regex:: \.(gtf|gff)(|\.gz|\.bz2|\.tgz|\.tar\.gz|\.tar\.bz2|\.zip|\.rar|\.7z)$

source

label:: Gene ID database
type:: basic:string
choices::

DICTYBASE: DICTYBASE
ENSEMBL: ENSEMBL
NCBI: NCBI
UCSC: UCSC

species

label:: Species
type:: basic:string
description:: Species latin name.
choices::

Homo sapiens: Homo sapiens
Mus musculus: Mus musculus
Rattus norvegicus: Rattus norvegicus
Dictyostelium discoideum: Dictyostelium discoideum

build

label:: Build
type:: basic:string

annot

label:: Uploaded GTF file
type:: basic:file

annot_sorted

label:: Sorted GTF file
type:: basic:file

annot_sorted_idx_igv

label:: IGV index for sorted GTF file
type:: basic:file
required:: False

annot_sorted_track_jbrowse

label:: Jbrowse track for sorted GTF
type:: basic:file
required:: False

source

label:: Gene ID database
type:: basic:string

species

label:: Species
type:: basic:string

build

label:: Build
type:: basic:string

Gene set

data:geneset:upload-geneset (basic:file src, basic:string source, basic:string species)[Source: v1.3.2]

Upload a set of genes. Provide one gene ID per line in a .tab, .tab.gz, or .txt file format.

src

label:: Gene set
type:: basic:file
description:: List of genes (.tab/.txt extension), one gene ID per line.
required:: True
disabled:: False
hidden:: False
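
A gene set file in this layout can be parsed in a few lines of Python. The sketch assumes there is no header line, ignores blank lines, and keeps only the first occurrence of a repeated ID. Whether the process itself deduplicates IDs is not stated, and `read_gene_set` is a hypothetical helper. A `.tab.gz` file could be opened with `gzip.open(path, "rt")` and its handle passed in as `lines`.

```python
def read_gene_set(lines):
    """Collect gene IDs from one-ID-per-line text, keeping first-seen order.

    Assumptions: no header line; blank lines and surrounding whitespace are
    ignored; repeated IDs are kept once.
    """
    genes = []
    seen = set()
    for line in lines:
        gene = line.strip()
        if gene and gene not in seen:
            seen.add(gene)
            genes.append(gene)
    return genes


print(read_gene_set(["TP53\n", "\n", "BRCA1\n", "TP53\n"]))
```
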

source

label:: Gene ID source
type:: basic:string
required:: True
disabled:: False
hidden:: False
choices::

AFFY: AFFY
DICTYBASE: DICTYBASE
ENSEMBL: ENSEMBL
NCBI: NCBI
UCSC: UCSC

species

label:: Species
type:: basic:string
description:: Species latin name.
required:: True
disabled:: False
hidden:: False
choices::

Homo sapiens: Homo sapiens
Mus musculus: Mus musculus
Rattus norvegicus: Rattus norvegicus
Dictyostelium discoideum: Dictyostelium discoideum
Odocoileus virginianus texanus: Odocoileus virginianus texanus
Solanum tuberosum: Solanum tuberosum

geneset

label:: Gene set
type:: basic:file
required:: True
disabled:: False
hidden:: False

geneset_json

label:: Gene set (JSON)
type:: basic:json
required:: True
disabled:: False
hidden:: False

source

label:: Gene ID source
type:: basic:string
required:: True
disabled:: False
hidden:: False

species

label:: Species
type:: basic:string
required:: True
disabled:: False
hidden:: False

Gene set (create from Venn diagram)

data:geneset:venn:create-geneset-venn (list:basic:string genes, basic:string source, basic:string species, basic:file venn)[Source: v1.3.2]

Create a gene set from a Venn diagram.

genes

label:: Genes
type:: list:basic:string
description:: List of genes.
required:: True
disabled:: False
hidden:: False

source

label:: Gene ID source
type:: basic:string
required:: True
disabled:: False
hidden:: False
choices::

AFFY: AFFY
DICTYBASE: DICTYBASE
ENSEMBL: ENSEMBL
NCBI: NCBI
UCSC: UCSC

species

label:: Species
type:: basic:string
description:: Species latin name.
required:: True
disabled:: False
hidden:: False
choices::

Homo sapiens: Homo sapiens
Mus musculus: Mus musculus
Rattus norvegicus: Rattus norvegicus
Dictyostelium discoideum: Dictyostelium discoideum
Odocoileus virginianus texanus: Odocoileus virginianus texanus
Solanum tuberosum: Solanum tuberosum

venn

label:: Venn diagram
type:: basic:file
description:: JSON file of Venn diagram.
required:: True
disabled:: False
hidden:: False

geneset

label:: Gene set
type:: basic:file
required:: True
disabled:: False
hidden:: False

geneset_json

label:: Gene set (JSON)
type:: basic:json
required:: True
disabled:: False
hidden:: False

source

label:: Gene ID source
type:: basic:string
required:: True
disabled:: False
hidden:: False

species

label:: Species
type:: basic:string
required:: True
disabled:: False
hidden:: False

venn

label:: Venn diagram
type:: basic:json
required:: True
disabled:: False
hidden:: False

Gene set (create)

data:geneset:create-geneset (list:basic:string genes, basic:string source, basic:string species)[Source: v1.3.2]

Create a gene set from a list of genes.

genes

label:: Genes
type:: list:basic:string
description:: List of genes.
required:: True
disabled:: False
hidden:: False

source

label:: Gene ID source
type:: basic:string
required:: True
disabled:: False
hidden:: False
choices::

AFFY: AFFY
DICTYBASE: DICTYBASE
ENSEMBL: ENSEMBL
NCBI: NCBI
UCSC: UCSC

species

label:: Species
type:: basic:string
description:: Species latin name.
required:: True
disabled:: False
hidden:: False
choices::

Homo sapiens: Homo sapiens
Mus musculus: Mus musculus
Rattus norvegicus: Rattus norvegicus
Dictyostelium discoideum: Dictyostelium discoideum
Odocoileus virginianus texanus: Odocoileus virginianus texanus
Solanum tuberosum: Solanum tuberosum

geneset

label:: Gene set
type:: basic:file
required:: True
disabled:: False
hidden:: False

geneset_json

label:: Gene set (JSON)
type:: basic:json
required:: True
disabled:: False
hidden:: False

source

label:: Gene ID source
type:: basic:string
required:: True
disabled:: False
hidden:: False

species

label:: Species
type:: basic:string
required:: True
disabled:: False
hidden:: False

HISAT2

data:alignment:bam:hisat2alignment-hisat2 (data:index:hisat2 genome, data:reads:fastq reads, basic:boolean softclip, basic:integer noncansplice, basic:boolean cufflinks)[Source: v2.6.1]

HISAT2 is a fast and sensitive alignment program for mapping next-generation sequencing reads (both DNA and RNA) to a population of genomes (as well as to a single reference genome). See [here](https://ccb.jhu.edu/software/hisat2/index.shtml) for more information.

genome

label:: Reference genome
type:: data:index:hisat2

reads

label:: Reads
type:: data:reads:fastq

softclip

label:: Disallow soft clipping
type:: basic:boolean
default:: False

spliced_alignments.noncansplice

label:: Non-canonical splice sites penalty (optional)
type:: basic:integer
description:: Sets the penalty for each pair of non-canonical splice sites (e.g. non-GT/AG).
required:: False

spliced_alignments.cufflinks

label:: Report alignments tailored specifically for Cufflinks
type:: basic:boolean
description:: With this option, HISAT2 looks for novel splice sites with three signals (GT/AG, GC/AG, AT/AC), but all user-provided splice sites are used irrespective of their signals. HISAT2 produces an optional field, XS:A:[+-], for every spliced alignment.
default:: False
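
Downstream tools can read the strand that HISAT2 reports in the XS:A tag. This minimal sketch relies only on the SAM convention that optional fields follow the 11 mandatory columns in TAG:TYPE:VALUE form. The helper `xs_strand` and the example record are hypothetical.

```python
def xs_strand(sam_line):
    """Return '+' or '-' from a SAM record's XS:A tag, or None if absent.

    Optional fields start at the 12th tab-separated column and have the
    form TAG:TYPE:VALUE.
    """
    for field in sam_line.rstrip("\n").split("\t")[11:]:
        if field.startswith("XS:A:"):
            return field[5:]
    return None


mandatory = ["read1", "0", "chr1", "100", "60", "50M1000N50M", "*", "0", "0", "A" * 100, "I" * 100]
record = "\t".join(mandatory + ["NH:i:1", "XS:A:+"])
print(xs_strand(record))
```
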

bam

label:: Alignment file
type:: basic:file
description:: Position-sorted alignment

bai

label:: Index BAI
type:: basic:file

stats

label:: Statistics
type:: basic:file

splice_junctions

label:: Splice junctions
type:: basic:file

unmapped_f

label:: Unmapped reads (mate 1)
type:: basic:file
required:: False

unmapped_r

label:: Unmapped reads (mate 2)
type:: basic:file
required:: False

species

label:: Species
type:: basic:string

build

label:: Build
type:: basic:string

HISAT2 genome index

data:index:hisat2:hisat2-index (data:seq:nucleotide ref_seq)[Source: v1.2.1]

Create HISAT2 genome index.

ref_seq

label:: Reference sequence (nucleotide FASTA)
type:: data:seq:nucleotide
required:: True
disabled:: False
hidden:: False

index

label:: HISAT2 index
type:: basic:dir
required:: True
disabled:: False
hidden:: False

fastagz

label:: FASTA file (compressed)
type:: basic:file
required:: True
disabled:: False
hidden:: False

fasta

label:: FASTA file
type:: basic:file
required:: True
disabled:: False
hidden:: False

fai

label:: FASTA file index
type:: basic:file
required:: True
disabled:: False
hidden:: False

species

label:: Species
type:: basic:string
required:: True
disabled:: False
hidden:: False

build

label:: Build
type:: basic:string
required:: True
disabled:: False
hidden:: False

HMR

data:wgbs:hmrhmr (data:wgbs:methcounts methcounts)[Source: v1.4.0]

Identify hypo-methylated regions.

methcounts

label:: Methylation levels
type:: data:wgbs:methcounts
description:: Methylation levels data calculated using methcounts.

hmr

label:: Hypo-methylated regions
type:: basic:file

tbi_jbrowse

label:: Bed file index for Jbrowse
type:: basic:file

species

label:: Species
type:: basic:string

build

label:: Build
type:: basic:string

Hierarchical clustering of time courses

data:clustering:hierarchical:etc:clustering-hierarchical-etc (list:data:expression expressions, list:basic:string genes, basic:string gene_species, basic:string gene_source, basic:string distance, basic:string linkage, basic:boolean ordering)[Source: v1.3.1]

Cluster gene expression time courses using hierarchical clustering.

expressions

label:: Time series relation
type:: list:data:expression
description:: Select the time course to which the expressions belong.
required:: True
disabled:: False
hidden:: False

genes

label:: Gene subset
type:: list:basic:string
description:: Select at least two genes or leave this field empty.
required:: False
disabled:: False
hidden:: False

gene_species

label:: Species
type:: basic:string
description:: Species to which the selected genes belong. This field is required if a gene subset is set.
required:: False
disabled:: False
hidden:: !genes
choices::
Dictyostelium discoideum: Dictyostelium discoideum
Homo sapiens: Homo sapiens
Macaca mulatta: Macaca mulatta
Mus musculus: Mus musculus
Rattus norvegicus: Rattus norvegicus

gene_source

label:: Gene ID database of selected genes
type:: basic:string
description:: This field is required if gene subset is set.
required:: False
disabled:: False
hidden:: !genes

distance

label:: Distance metric
type:: basic:string
required:: True
disabled:: False
hidden:: False
default:: spearman
choices::
Euclidean: euclidean
Spearman: spearman
Pearson: pearson

linkage

label:: Linkage method
type:: basic:string
required:: True
disabled:: False
hidden:: False
default:: average
choices::
single: single
average: average
complete: complete
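
To make the `spearman` distance and `average` linkage options concrete, here is a minimal stdlib-only sketch. It computes 1 - Spearman correlation between expression profiles and runs naive average-linkage agglomeration. It is not this process's implementation (which presumably relies on an optimized clustering library), it does not perform optimal leaf ordering, and it assumes no profile is constant, since a constant profile makes the correlation undefined. All function names are hypothetical.

```python
from itertools import combinations


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; assumes neither x nor y is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman_distance(x, y):
    """1 - Spearman correlation (the Pearson correlation of the ranks)."""
    return 1.0 - pearson(average_ranks(x), average_ranks(y))


def average_linkage(profiles, distance=spearman_distance):
    """Naive agglomerative clustering; returns merges as (cluster_a, cluster_b, height)."""
    clusters = {i: [i] for i in range(len(profiles))}
    pair_dist = {(i, j): distance(profiles[i], profiles[j])
                 for i, j in combinations(range(len(profiles)), 2)}

    def linkage(a, b):
        # Average linkage: mean distance over all cross-cluster member pairs.
        pairs = [(min(p, q), max(p, q)) for p in clusters[a] for q in clusters[b]]
        return sum(pair_dist[pair] for pair in pairs) / len(pairs)

    merges = []
    next_id = len(profiles)  # new clusters get ids n, n + 1, ...
    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2), key=lambda ab: linkage(*ab))
        merges.append((a, b, linkage(a, b)))
        clusters[next_id] = clusters.pop(a) + clusters.pop(b)
        next_id += 1
    return merges
```

Because Spearman distance uses ranks only, two genes rising over time at different magnitudes (for example `[1, 2, 3, 4]` and `[2, 4, 6, 8]`) are at distance 0 and merge first.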

ordering

label:: Use optimal ordering
type:: basic:boolean
description:: Results in a more intuitive tree structure, but may slow down clustering on large datasets.
required:: True
disabled:: False
hidden:: False
default:: False

cluster

label:: Hierarchical clustering
type:: basic:json
required:: True
disabled:: False
hidden:: False

source

label:: Gene ID database
type:: basic:string
required:: True
disabled:: False
hidden:: False

species

label:: Species
type:: basic:string
required:: True
disabled:: False
hidden:: False

build

label:: Build
type:: basic:string
required:: True
disabled:: False
hidden:: False

feature_type

label:: Feature type
type:: basic:string
required:: True
disabled:: False
hidden:: False

IDAT file

data:methylationarray:idat:upload-idat (basic:file red_channel, basic:file green_channel, basic:string species, basic:string platform)[Source: v1.1.1]

Upload Illumina methylation array raw IDAT data. This import process accepts Illumina methylation array BeadChip raw files in IDAT format. Two input files are expected, one for each of the Green and Red signal channels. Uploads of human (HM27, HM450, EPIC) and mouse (MM285) array types are supported.

red_channel

label:: Red channel IDAT file (*_Red.idat)
type:: basic:file
required:: True
disabled:: False
hidden:: False

green_channel

label:: Green channel IDAT file (*_Grn.idat)
type:: basic:file
required:: True
disabled:: False
hidden:: False

species

label:: Species
type:: basic:string
description:: Select a species name from the dropdown menu.
required:: True
disabled:: False
hidden:: False
default:: Homo sapiens
choices::
Homo sapiens: Homo sapiens
Mus musculus: Mus musculus

platform

label:: Protein ID database source
type:: basic:string
description:: Select a methylation array platform for human (HM450, HM27, EPIC) or mouse (MM285) samples.
required:: True
disabled:: False
hidden:: False
default:: HM450
choices::
HM450: HM450
HM27: HM27
EPIC: EPIC
MM285: MM285

red_channel

label:: Red channel IDAT file
type:: basic:file
required:: True
disabled:: False
hidden:: False

green_channel

label:: Green channel IDAT file
type:: basic:file
required:: True
disabled:: False
hidden:: False

species

label:: Species
type:: basic:string
required:: True
disabled:: False
hidden:: False

platform

label:: Platform
type:: basic:string
required:: True
disabled:: False
hidden:: False

MACS 1.4

data:chipseq:callpeak:macs14macs14 (data:alignment:bam treatment, data:alignment:bam control, basic:string pvalue)[Source: v3.5.1]

Model-based Analysis of ChIP-Seq (MACS 1.4) empirically models the length of the sequenced ChIP fragments, which tends to be shorter than sonication or library construction size estimates, and uses it to improve the spatial resolution of predicted binding sites. MACS also uses a dynamic Poisson distribution to effectively capture local biases in the genome sequence, allowing for more sensitive and robust prediction. See the [original paper](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2592715/) for more information.

treatment

label:: BAM File
type:: data:alignment:bam

control

label:: BAM Background File
type:: data:alignment:bam
required:: False

pvalue

label:: P-value
type:: basic:string
default:: 1e-9
choices::
1e-9: 1e-9
1e-6: 1e-6

peaks_bed

label:: Peaks (BED)
type:: basic:file

summits_bed

label:: Summits (BED)
type:: basic:file

peaks_xls

label:: Peaks (XLS)
type:: basic:file

wiggle

label:: Wiggle
type:: basic:file

control_bigwig

label:: Control (bigWig)
type:: basic:file
required:: False

treat_bigwig

label:: Treat (bigWig)
type:: basic:file

peaks_bigbed_igv_ucsc

label:: Peaks (bigBed)
type:: basic:file
required:: False

summits_bigbed_igv_ucsc

label:: Summits (bigBed)
type:: basic:file
required:: False

peaks_tbi_jbrowse

label:: JBrowse track peaks file
type:: basic:file

summits_tbi_jbrowse

label:: JBrowse track summits file
type:: basic:file

model

label:: Model
type:: basic:file
required:: False

neg_peaks

label:: Negative peaks (XLS)
type:: basic:file
required:: False

species

label:: Species
type:: basic:string

build

label:: Build
type:: basic:string

MACS 2.0

data:chipseq:callpeak:macs2:macs2-callpeak (data:alignment:bam case, data:alignment:bam control, data:bed promoter, basic:boolean tagalign, basic:integer q_threshold, basic:integer n_sub, basic:boolean tn5, basic:integer shift, basic:string format, basic:string duplicates, basic:string duplicates_prepeak, basic:decimal qvalue, basic:decimal pvalue, basic:decimal pvalue_prepeak, basic:integer cap_num, basic:integer mfold_lower, basic:integer mfold_upper, basic:integer slocal, basic:integer llocal, basic:integer extsize, basic:integer shift, basic:integer band_width, basic:boolean nolambda, basic:boolean fix_bimodal, basic:boolean nomodel, basic:boolean nomodel_prepeak, basic:boolean down_sample, basic:boolean bedgraph, basic:boolean spmr, basic:boolean call_summits, basic:boolean broad, basic:decimal broad_cutoff)[Source: v4.8.1]

Call ChIP-Seq peaks with MACS 2.0. Model-based Analysis of ChIP-Seq (MACS 2.0) is used to identify transcription factor binding sites. MACS 2.0 captures the influence of genome complexity to evaluate the significance of enriched ChIP regions, and it improves the spatial resolution of binding sites by combining information on both sequencing tag position and orientation. It also has an option to link nearby peaks together in order to call broad peaks. See [here](https://github.com/taoliu/MACS/) for more information. In addition to peak calling, this process computes ChIP-Seq and ATAC-Seq QC metrics. The process returns a QC metrics report, a fragment length estimation, and a deduplicated tagAlign file. The QC report contains the ENCODE 3 proposed QC metrics: [NRF](https://www.encodeproject.org/data-standards/terms/), [PBC bottlenecking coefficients, NSC, and RSC](https://genome.ucsc.edu/ENCODE/qualityMetrics.html#chipSeq).
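
As an illustration of the library complexity part of the QC report, the sketch below computes NRF, PBC1 and PBC2 using the ENCODE definitions: NRF = distinct locations / total reads, PBC1 = M1 / M_DISTINCT and PBC2 = M1 / M2, where M1 and M2 count locations with exactly one and exactly two reads. It assumes reads are already filtered to uniquely mapped reads and reduced to hashable locations; the process's own implementation is not shown here and may treat paired-end data differently. The function name and the choice to return infinity when M2 is 0 are assumptions of this sketch.

```python
from collections import Counter


def library_complexity(read_locations):
    """Compute (NRF, PBC1, PBC2) from one location per uniquely mapped read.

    A location is any hashable value, e.g. (chrom, five_prime_position, strand).
    """
    reads_per_location = Counter(read_locations)
    total_reads = len(read_locations)
    distinct = len(reads_per_location)               # M_DISTINCT
    histogram = Counter(reads_per_location.values())  # reads-per-location -> #locations
    m1 = histogram.get(1, 0)                          # locations with exactly one read
    m2 = histogram.get(2, 0)                          # locations with exactly two reads
    nrf = distinct / total_reads
    pbc1 = m1 / distinct
    pbc2 = m1 / m2 if m2 else float("inf")
    return nrf, pbc1, pbc2
```

Lower values of all three metrics indicate a more bottlenecked (less complex) library.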

case

label:: Case (treatment)
type:: data:alignment:bam
required:: True
disabled:: False
hidden:: False

control

label:: Control (background)
type:: data:alignment:bam
required:: False
disabled:: False
hidden:: False

promoter

label:: Promoter regions BED file
type:: data:bed
description:: BED file containing promoter regions (for example, TSS ± 1000 bp). Required to compute the number of peaks and reads mapped to promoter regions.
required:: False
disabled:: False
hidden:: False
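
A promoter BED file of the kind described above can be derived from transcription start sites. The sketch below builds TSS ± 1000 bp intervals (clipped at 0, half-open BED coordinates) and counts peaks that overlap at least one promoter. The function names are hypothetical, and the quadratic overlap scan is only for illustration; it is not how this process computes its promoter statistics.

```python
def promoters_from_tss(tss_list, flank=1000):
    """Build promoter intervals (chrom, start, end) as TSS +/- flank, clipped at 0."""
    return [(chrom, max(0, tss - flank), tss + flank) for chrom, tss in tss_list]


def count_peaks_in_promoters(peaks, promoters):
    """Count peaks overlapping at least one promoter (half-open BED intervals)."""
    by_chrom = {}
    for chrom, start, end in promoters:
        by_chrom.setdefault(chrom, []).append((start, end))
    count = 0
    for chrom, start, end in peaks:
        # Two half-open intervals overlap when each starts before the other ends.
        if any(start < p_end and p_start < end for p_start, p_end in by_chrom.get(chrom, [])):
            count += 1
    return count
```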

tagalign

label:: Use tagAlign files
type:: basic:boolean
description:: Use filtered tagAlign files as case (treatment) and control (background) samples. If the extsize parameter is not set, MACS is run using the input’s estimated fragment length.
required:: True
disabled:: False
hidden:: False
default:: False

prepeakqc_settings.q_threshold

label:: Quality filtering threshold
type:: basic:integer
required:: True
disabled:: False
hidden:: False
default:: 30

prepeakqc_settings.n_sub

label:: Number of reads to subsample
type:: basic:integer
required:: True
disabled:: False
hidden:: False
default:: 15000000

prepeakqc_settings.tn5

label:: Tn5 shifting
type:: basic:boolean
description:: Tn5 transposon shifting. Shift reads on the ‘+’ strand by 4 bp and reads on the ‘-’ strand by 5 bp.
required:: True
disabled:: False
hidden:: False
default:: False
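
The Tn5 shift can be sketched on BED-style tagAlign records. This assumes the ENCODE ATAC-seq convention, in which ‘+’ strand reads are moved downstream by 4 bp and ‘-’ strand reads upstream by 5 bp (both interval ends move together); the function name is hypothetical and no clipping at chromosome boundaries is done.

```python
def tn5_shift(chrom, start, end, strand):
    """Shift a BED-style read to the Tn5 insertion offset: +4 bp on '+', -5 bp on '-'."""
    if strand == "+":
        return chrom, start + 4, end + 4
    return chrom, start - 5, end - 5
```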

prepeakqc_settings.shift

label:: User-defined cross-correlation peak strandshift
type:: basic:integer
description:: If defined, the SPP tool will not estimate the fragment length but will use the given value as the fragment length.
required:: False
disabled:: False
hidden:: False
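
SPP estimates fragment length from strand cross-correlation, and this parameter bypasses that estimate. The toy sketch below shows the idea on a single chromosome: build per-base counts of ‘+’ read 5' ends and ‘-’ read 5' ends, correlate them at each strand shift, and take the shift with the highest correlation. It is a simplification (no phantom-peak handling, no NSC/RSC), not SPP's algorithm as implemented; function names are hypothetical, and ‘-’ reads are given by their exclusive BED end so that the best shift equals the fragment length.

```python
def strand_cross_correlation(plus_starts, minus_ends, chrom_length, max_shift):
    """Return {shift: Pearson correlation} of '+' and '-' strand 5' end counts.

    plus_starts: 0-based start coordinates of '+' strand reads.
    minus_ends:  exclusive BED end coordinates of '-' strand reads.
    """
    plus = [0] * chrom_length
    minus = [0] * chrom_length
    for p in plus_starts:
        plus[p] += 1
    for m in minus_ends:
        minus[m] += 1

    def corr(x, y):
        n = len(x)
        mx, my = sum(x) / n, sum(y) / n
        cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
        sx = sum((a - mx) ** 2 for a in x) ** 0.5
        sy = sum((b - my) ** 2 for b in y) ** 0.5
        return cov / (sx * sy) if sx and sy else 0.0

    scores = {}
    for shift in range(1, max_shift + 1):
        # Compare plus[i] with minus[i + shift].
        scores[shift] = corr(plus[: chrom_length - shift], minus[shift:])
    return scores


def estimate_fragment_length(plus_starts, minus_ends, chrom_length, max_shift):
    scores = strand_cross_correlation(plus_starts, minus_ends, chrom_length, max_shift)
    return max(scores, key=scores.get)
```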

settings.format

label:: Format of tag file
type:: basic:string
description:: This specifies the format of the input files. For paired-end data, the format dictates how MACS2 treats mates. If the selected format is BAM, MACS2 will only keep the left mate (5’ end) tag. However, when format BAMPE is selected, MACS2 will use the actual insert sizes of read pairs to build the fragment pileup, instead of building a bimodal distribution of plus- and minus-strand reads to predict the fragment size.
required:: True
disabled:: False
hidden:: tagalign
default:: BAM
choices::
BAM: BAM
BAMPE: BAMPE
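
In BAMPE mode, each fragment comes directly from a read pair rather than from a model. A minimal sketch, assuming each properly paired fragment is summarized by the leftmost mapped position of the pair and the SAM TLEN (template length) field, whose sign depends on which mate the record describes:

```python
def fragments_from_pairs(pairs):
    """Turn (leftmost_start, tlen) pairs into BED-style fragments (start, end).

    TLEN is negative on the rightmost mate's record, so its absolute value is used.
    """
    return [(start, start + abs(tlen)) for start, tlen in pairs]


def mean_fragment_length(fragments):
    return sum(end - start for start, end in fragments) / len(fragments)
```

The function names are hypothetical; MACS2 reads these values from the BAM records itself.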

settings.duplicates

label:: Number of duplicates
type:: basic:string
description:: Controls how MACS handles duplicate tags at the exact same location (the same coordinates and the same strand). The ‘auto’ option makes MACS calculate the maximum number of tags at the exact same location based on a binomial distribution using 1e-5 as the p-value cutoff, and the ‘all’ option keeps all the tags. If an integer is given, at most this number of tags will be kept at the same location. MACS's own default is to keep one tag at the same location.
required:: False
disabled:: False
hidden:: tagalign
choices::
1: 1
auto: auto
all: all
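
The ‘auto’ rule can be sketched as a binomial quantile: with N tags each landing on a given position with probability 1/genome size, take the smallest k whose cumulative probability reaches 1 - 1e-5. This sketch assumes that model, as described above; MACS2's exact code (including any minimum it enforces) is not reproduced, and the function name is hypothetical.

```python
import math


def max_duplicate_tags(total_tags, genome_size, pvalue=1e-5):
    """Smallest k with P(X <= k) >= 1 - pvalue, X ~ Binomial(total_tags, 1 / genome_size)."""
    p = 1.0 / genome_size
    pmf = math.exp(total_tags * math.log1p(-p))  # P(X = 0), computed stably
    cdf = pmf
    k = 0
    while cdf < 1.0 - pvalue and k < total_tags:
        # Recurrence: P(X = k + 1) = P(X = k) * (n - k) / (k + 1) * p / (1 - p)
        pmf *= (total_tags - k) / (k + 1) * p / (1.0 - p)
        k += 1
        cdf += pmf
    return k
```

For example, 20 million tags on a 2.7 Gb genome give a cutoff of 2 tags per location under this model.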

settings.duplicates_prepeak

label:: Number of duplicates
type:: basic:string
description:: Controls how MACS handles duplicate tags at the exact same location (the same coordinates and the same strand). The ‘auto’ option makes MACS calculate the maximum number of tags at the exact same location based on a binomial distribution using 1e-5 as the p-value cutoff, and the ‘all’ option keeps all the tags. If an integer is given, at most this number of tags will be kept at the same location. MACS's own default is to keep one tag at the same location.
required:: True
disabled:: False
hidden:: !tagalign
default:: all
choices::
1: 1
auto: auto
all: all

settings.qvalue

label:: Q-value cutoff
type:: basic:decimal
description:: The q-value (minimum FDR) cutoff to call significant regions. Q-values are calculated from p-values using the Benjamini-Hochberg procedure.
required:: False
disabled:: settings.pvalue && settings.pvalue_prepeak
hidden:: False
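
For reference, the textbook Benjamini-Hochberg adjustment that turns p-values into q-values is sketched below: each p-value is multiplied by n / rank, and a running minimum from the largest p-value downward keeps q-values monotone. MACS2 applies the procedure to its own score distribution internally, so its implementation details may differ; the function name is hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return BH-adjusted q-values in the original order of `pvalues`."""
    n = len(pvalues)
    # Walk from the largest p-value to the smallest.
    order = sorted(range(n), key=lambda i: pvalues[i], reverse=True)
    qvalues = [0.0] * n
    running_min = 1.0
    for position, i in enumerate(order):
        rank = n - position  # 1-based rank in ascending order
        running_min = min(running_min, pvalues[i] * n / rank)
        qvalues[i] = running_min
    return qvalues
```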

settings.pvalue

label:: P-value cutoff
type:: basic:decimal
description:: The p-value cutoff. If specified, MACS2 will use the p-value cutoff instead of the q-value cutoff.
required:: False
disabled:: settings.qvalue
hidden:: tagalign

settings.pvalue_prepeak

label:: P-value cutoff
type:: basic:decimal
description:: The p-value cutoff. If specified, MACS2 will use the p-value cutoff instead of the q-value cutoff.
required:: True
disabled:: settings.qvalue
hidden:: !tagalign || settings.qvalue
default:: 1e-05

settings.cap_num

label:: Cap number of peaks by taking top N peaks
type:: basic:integer
description:: To keep all peaks, set the value to 0.
required:: True
disabled:: settings.broad
hidden:: False
default:: 500000

settings.mfold_lower

label:: MFOLD range (lower limit)
type:: basic:integer
description:: This parameter is used to select the regions within the MFOLD range of high-confidence enrichment ratio against background to build the model. The fold enrichment of a region must be higher than the lower limit and lower than the upper limit. The default of 10,30 means using all regions with fold enrichment neither too low (>10) nor too high (<30) to build the paired-peaks model. If MACS cannot find more than 100 regions to build the model, it will use the --extsize parameter to continue peak detection ONLY if --fix-bimodal is set.
required:: False
disabled:: False
hidden:: False

settings.mfold_upper

label:: MFOLD range (upper limit)
type:: basic:integer
description:: This parameter is used to select the regions within the MFOLD range of high-confidence enrichment ratio against background to build the model. The fold enrichment of a region must be higher than the lower limit and lower than the upper limit. The default of 10,30 means using all regions with fold enrichment neither too low (>10) nor too high (<30) to build the paired-peaks model. If MACS cannot find more than 100 regions to build the model, it will use the --extsize parameter to continue peak detection ONLY if --fix-bimodal is set.
required:: False
disabled:: False
hidden:: False
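
The model-building rule in the MFOLD description can be expressed as a small decision function. This is a sketch of the logic as described above, not MACS code: regions strictly inside the (lower, upper) fold-enrichment range are counted, the paired-peak model is built only with more than 100 of them, and otherwise the run falls back to --extsize if --fix-bimodal is set or stops. The function name and the `extsize=None` convention are hypothetical.

```python
def plan_model_building(fold_enrichments, mfold_lower=10, mfold_upper=30,
                        fix_bimodal=False, extsize=None, min_regions=100):
    """Return ("model", n_regions) or ("extsize", extsize); raise if neither applies."""
    # Keep regions that are neither too weak (> lower) nor too strong (< upper).
    paired_regions = [f for f in fold_enrichments if mfold_lower < f < mfold_upper]
    if len(paired_regions) > min_regions:
        return "model", len(paired_regions)
    if fix_bimodal and extsize is not None:
        # Fall back to the nomodel settings: extend every tag by extsize.
        return "extsize", extsize
    raise RuntimeError("too few regions to build the paired-peak model")
```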

settings.slocal

label:: Small local region
type:: basic:integer
description:: The slocal and llocal parameters control which two levels of regions are checked around the peak regions to calculate the maximum lambda as the local lambda. By default, MACS considers 1000 bp for the small local region (--slocal) and 10000 bp for the large local region (--llocal), which captures the bias from long-range effects such as an open chromatin domain. You can tweak these according to your project. Note that if the region is set too small, a sharp spike in the input data may kill a significant peak.
required:: False
disabled:: False
hidden:: False

settings.llocal

label:: Large local region
type:: basic:integer
description:: The slocal and llocal parameters control which two levels of regions are checked around the peak regions to calculate the maximum lambda as the local lambda. By default, MACS considers 1000 bp for the small local region (--slocal) and 10000 bp for the large local region (--llocal), which captures the bias from long-range effects such as an open chromatin domain. You can tweak these according to your project. Note that if the region is set too small, a sharp spike in the input data may kill a significant peak.
required:: False
disabled:: False
hidden:: False
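
The "maximum lambda as local lambda" idea can be sketched as follows: count control reads in windows of width d, slocal and llocal centered on a position, rescale each count to the expected count in a d-sized window, and take the maximum together with the genome-wide background lambda. This is a simplified sketch following the MACS papers' description; `scale` (control-to-treatment depth ratio), the window centering and the function name are assumptions, not MACS2's implementation.

```python
def local_lambda(control_starts, center, d, genome_lambda,
                 slocal=1000, llocal=10000, scale=1.0):
    """Return max(lambda_BG, lambda_d, lambda_slocal, lambda_llocal) at `center`."""
    def window_lambda(width):
        half = width // 2
        count = sum(1 for s in control_starts if center - half <= s < center + half)
        # Rescale the window count to the expected count in a d-sized window.
        return count * scale * d / width

    return max(genome_lambda, window_lambda(d), window_lambda(slocal), window_lambda(llocal))
```

A narrow spike of control reads dominates the d-sized window, which is why too small a window can suppress a real peak.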

settings.extsize

label:: Extension size [--extsize]
type:: basic:integer
description:: When ‘--nomodel’ is set, MACS uses this parameter to extend reads in the 5’->3’ direction to fixed-size fragments. For example, if the size of the binding region for your transcription factor is 200 bp and you want to bypass model building by MACS, this parameter can be set to 200. This option is only valid when --nomodel is set or when MACS fails to build the model and --fix-bimodal is on.
required:: False
disabled:: False
hidden:: False

settings.shift

label:: Shift
type:: basic:integer
description:: Note: this is NOT the legacy --shiftsize option, which has been replaced by --extsize. You can set an arbitrary shift in bp here; use discretion when setting a value other than the default (0). When --nomodel is set, MACS uses this value to move the cutting ends (5’) and then applies --extsize in the 5’->3’ direction to extend them to fragments. When this value is negative, ends are moved in the 3’->5’ direction; otherwise, in the 5’->3’ direction. It is recommended to keep the default of 0 for ChIP-Seq datasets, or to use -1 * half of EXTSIZE together with --extsize for detecting enriched cutting loci, as in certain DNase-Seq datasets. Note that values other than 0 cannot be set if the format is BAMPE for paired-end data. Default is 0.
required:: False
disabled:: False
hidden:: settings.format == 'BAMPE'
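
The interaction of shift and extsize described above can be sketched for a single read: the 5' cut end is moved by `shift` (positive toward the read's 3' end, negative toward its 5' side), then extended by `extsize` bp in the 5'->3' direction. The function name and the BED coordinate convention (a ‘-’ read's 5' end given as its exclusive end) are assumptions of this sketch.

```python
def cut_end_to_fragment(five_prime, strand, extsize, shift=0):
    """Return a BED-style (start, end) fragment built from one read's 5' cut end."""
    if strand == "+":
        start = five_prime + shift       # 5'->3' is rightward on '+'
        return start, start + extsize
    end = five_prime - shift             # 5'->3' is leftward on '-'
    return end - extsize, end
```

With `shift = -extsize / 2` (the setting recommended above for cutting-site data), the fragment is centered on the cut site regardless of strand, e.g. a cut at 1000 with `extsize=200, shift=-100` yields `(900, 1100)` on either strand.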

settings.band_width

label:: Band width
type:: basic:integer
description:: The band width used to scan the genome ONLY for model building. You can set this parameter to the sonication fragment size expected from the wet-lab experiment. The previous side effect on the peak detection process has been removed, so this parameter only affects model building.
required:: False
disabled:: False
hidden:: False

settings.nolambda

label:: Use background lambda as local lambda
type:: basic:boolean
description:: With this flag on, MACS will use the background lambda as local lambda. This means MACS will not consider the local bias at peak candidate regions.
required:: True
disabled:: False
hidden:: False
default:: False

settings.fix_bimodal

label:: Turn on the auto paired-peak model process
type:: basic:boolean
description:: Turn on the auto paired-peak model process. If set, when MACS fails to build the paired model, it will use the nomodel settings, i.e. the ‘--extsize’ parameter, to extend each tag. If not set, MACS will terminate if the paired-peak model fails.
required:: True
disabled:: False
hidden:: False
default:: False

settings.nomodel

label:: Bypass building the shifting model [--nomodel]
type:: basic:boolean
description:: While on, MACS will bypass building the shifting model.
required:: True
disabled:: False
hidden:: tagalign
default:: False

settings.nomodel_prepeak

label:: Bypass building the shifting model [--nomodel]
type:: basic:boolean
description:: While on, MACS will bypass building the shifting model.
required:: True
disabled:: False
hidden:: !tagalign
default:: True

settings.down_sample

label:: Down-sample
type:: basic:boolean
description:: When set to true, a random sampling method is used to scale down the bigger sample. By default, MACS uses linear scaling. This option makes the results unstable and irreproducible, since different random reads are selected on each run; in particular, the numbers (pileup, p-value, q-value) will change.
required:: True
disabled:: False
hidden:: False
default:: False
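
The difference between the default linear scaling and random down-sampling can be sketched as follows. This assumes MACS's default of scaling the larger library toward the smaller one; both function names are hypothetical, and `random_downsample` only illustrates why results change between runs unless the random seed is fixed.

```python
import random


def linear_scaling_factors(treatment_reads, control_reads):
    """Deterministic: scale the larger library down to the smaller one."""
    if treatment_reads > control_reads:
        return control_reads / treatment_reads, 1.0
    return 1.0, treatment_reads / control_reads


def random_downsample(reads, target_size, seed=None):
    """Randomly keep target_size reads; without a fixed seed, each run differs."""
    rng = random.Random(seed)
    return rng.sample(reads, target_size)
```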

settings.bedgraph

label:: Save fragment pileup and control lambda
type:: basic:boolean
description:: If this flag is on, MACS will store the fragment pileup, control lambda, -log10pvalue and -log10qvalue scores in bedGraph files. The bedGraph files will be stored in the current directory, named NAME+'_treat_pileup.bdg' for treatment data, NAME+'_control_lambda.bdg' for local lambda values from control, NAME+'_treat_pvalue.bdg' for Poisson p-value scores (in -log10(pvalue) form), and NAME+'_treat_qvalue.bdg' for q-value scores from the Benjamini-Hochberg-Yekutieli procedure.
required:: True
disabled:: False
hidden:: False
default:: True

settings.spmr

label:: Save fragment pileup and control lambda
type:: basic:boolean
required:: True
disabled:: settings.bedgraph === false
hidden:: False
default:: True

settings.call_summits

label:: Call summits [--call-summits]
type:: basic:boolean
description:: MACS will reanalyze the shape of the signal profile (p- or q-score, depending on the cutoff setting) to deconvolve subpeaks within each peak called by the general procedure. This is highly recommended for detecting adjacent binding events. When used, the output subpeaks of a big peak region will have the same peak boundaries but different scores and peak summit positions.
required:: True
disabled:: False
hidden:: False
default:: False

settings.broad

label:: Composite broad regions [--broad]
type:: basic:boolean
description:: When this flag is on, MACS will try to composite broad regions in BED12 (a gene-model-like format) by putting nearby highly enriched regions into a broad region with a loose cutoff. The broad region is controlled by another cutoff set through --broad-cutoff. The maximum length of a broad region is 4 times d from MACS.
required:: True
disabled:: settings.call_summits === true
hidden:: False
default:: False

settings.broad_cutoff

label:: Broad cutoff
type:: basic:decimal
description:: Cutoff for broad regions. This option is not available unless --broad is set. If -p is set, this is a p-value cutoff; otherwise, it is a q-value cutoff. Default: 0.1.
required:: False
disabled:: settings.call_summits === true || settings.broad !== true
hidden:: False

called_peaks

label:: Called peaks
type:: basic:file
required:: True
disabled:: False
hidden:: False

narrow_peaks

label:: Narrow peaks
type:: basic:file
required:: False
disabled:: False
hidden:: False

chip_qc

label:: QC report
type:: basic:file
required:: False
disabled:: False
hidden:: False

case_prepeak_qc

label:: Pre-peak QC report (case)
type:: basic:file
required:: True
disabled:: False
hidden:: False

case_tagalign

label:: Filtered tagAlign (case)
type:: basic:file
required:: True
disabled:: False
hidden:: False

case_bam

label:: Filtered BAM (case)
type:: basic:file
required:: True
disabled:: False
hidden:: False

case_bai

label:: Filtered BAM index (case)
type:: basic:file
required:: True
disabled:: False
hidden:: False

control_prepeak_qc

label:: Pre-peak QC report (control)
type:: basic:file
required:: False
disabled:: False
hidden:: False

control_tagalign

label:: Filtered tagAlign (control)
type:: basic:file
required:: False
disabled:: False
hidden:: False

control_bam

label:: Filtered BAM (control)
type:: basic:file
required:: False
disabled:: False
hidden:: False

control_bai

label:: Filtered BAM index (control)
type:: basic:file
required:: False
disabled:: False
hidden:: False

narrow_peaks_bigbed_igv_ucsc

label:: Narrow peaks (bigBed)
type:: basic:file
required:: False
disabled:: False
hidden:: False

summits

label:: Peak summits
type:: basic:file
required:: False
disabled:: False
hidden:: False

summits_tbi_jbrowse

label:: Peak summits tbi index for JBrowse
type:: basic:file
required:: False
disabled:: False
hidden:: False

summits_bigbed_igv_ucsc

label:: Summits (bigBed)
type:: basic:file
required:: False
disabled:: False
hidden:: False

broad_peaks

label:: Broad peaks
type:: basic:file
required:: False
disabled:: False
hidden:: False

gappedPeak

label:: Broad peaks (bed12/gappedPeak)
type:: basic:file
required:: False
disabled:: False
hidden:: False

treat_pileup

label:: Treatment pileup (bedGraph)
type:: basic:file
required:: False
disabled:: False
hidden:: False

treat_pileup_bigwig

label:: Treatment pileup (bigWig)
type:: basic:file
required:: False
disabled:: False
hidden:: False

control_lambda

label:: Control lambda (bedGraph)
type:: basic:file
required:: False
disabled:: False
hidden:: False

control_lambda_bigwig

label:: Control lambda (bigWig)
type:: basic:file
required:: False
disabled:: False
hidden:: False

model

label:: Model
type:: basic:file
required:: False
disabled:: False
hidden:: False

species

label:: Species
type:: basic:string
required:: True
disabled:: False
hidden:: False

build

label:: Build
type:: basic:string
required:: True
disabled:: False
hidden:: False

MACS2

data:workflow:chipseq:macs2rose2workflow-macs2 (data:alignment:bam case, data:alignment:bam control, data:bed promoter, basic:boolean tagalign, basic:integer q_threshold, basic:integer n_sub, basic:boolean tn5, basic:integer shift, basic:string duplicates, basic:string duplicates_prepeak, basic:decimal qvalue, basic:decimal pvalue, basic:decimal pvalue_prepeak, basic:integer cap_num, basic:integer mfold_lower, basic:integer mfold_upper, basic:integer slocal, basic:integer llocal, basic:integer extsize, basic:integer shift, basic:integer band_width, basic:boolean nolambda, basic:boolean fix_bimodal, basic:boolean nomodel, basic:boolean nomodel_prepeak, basic:boolean down_sample, basic:boolean bedgraph, basic:boolean spmr, basic:boolean call_summits, basic:boolean broad, basic:decimal broad_cutoff, data:bed blacklist, basic:boolean calculate_enrichment, basic:integer profile_window, basic:string shift_size)[Source: v1.2.0]

case

label:: Case (treatment)
type:: data:alignment:bam

control

label:: Control (background)
type:: data:alignment:bam
required:: False

promoter

label:: Promoter regions BED file
type:: data:bed
description:: BED file containing promoter regions (for example, TSS ± 1000 bp). Required to compute the number of peaks and reads mapped to promoter regions.
required:: False

tagalign

label:: Use tagAlign files
type:: basic:boolean
description:: Use filtered tagAlign files as case (treatment) and control (background) samples. If the extsize parameter is not set, MACS is run using the input’s estimated fragment length.
default:: False

prepeakqc_settings.q_threshold

label:: Quality filtering threshold
type:: basic:integer
default:: 30

prepeakqc_settings.n_sub

label:: Number of reads to subsample
type:: basic:integer
default:: 15000000

prepeakqc_settings.tn5

label:: Tn5 shifting
type:: basic:boolean
description:: Tn5 transposon shifting. Shift reads on “+” strand by 4 bp and reads on “-” strand by 5 bp.
default:: False

prepeakqc_settings.shift

label:: User-defined cross-correlation peak strandshift
type:: basic:integer
description:: If defined, the SPP tool will not estimate the fragment length but will use the given value as the fragment length.
required:: False

settings.duplicates

label:: Number of duplicates
type:: basic:string
description:: Controls how MACS handles duplicate tags at the exact same location (the same coordinates and the same strand). The ‘auto’ option makes MACS calculate the maximum number of tags at the exact same location based on a binomial distribution using 1e-5 as the p-value cutoff, and the ‘all’ option keeps all the tags. If an integer is given, at most this number of tags will be kept at the same location. MACS's own default is to keep one tag at the same location.
required:: False
hidden:: tagalign
choices::
1: 1
auto: auto
all: all

settings.duplicates_prepeak

label:: Number of duplicates
type:: basic:string
description:: Controls how MACS handles duplicate tags at the exact same location (the same coordinates and the same strand). The ‘auto’ option makes MACS calculate the maximum number of tags at the exact same location based on a binomial distribution using 1e-5 as the p-value cutoff, and the ‘all’ option keeps all the tags. If an integer is given, at most this number of tags will be kept at the same location. MACS's own default is to keep one tag at the same location.
required:: False
hidden:: !tagalign
default:: all
choices::
1: 1
auto: auto
all: all

settings.qvalue

label:: Q-value cutoff
type:: basic:decimal
description:: The q-value (minimum FDR) cutoff to call significant regions. Q-values are calculated from p-values using the Benjamini-Hochberg procedure.
required:: False
disabled:: settings.pvalue && settings.pvalue_prepeak

settings.pvalue

label:: P-value cutoff
type:: basic:decimal
description:: The p-value cutoff. If specified, MACS2 will use the p-value cutoff instead of the q-value cutoff.
required:: False
disabled:: settings.qvalue
hidden:: tagalign

settings.pvalue_prepeak

label:: P-value cutoff
type:: basic:decimal
description:: The p-value cutoff. If specified, MACS2 will use the p-value cutoff instead of the q-value cutoff.
disabled:: settings.qvalue
hidden:: !tagalign || settings.qvalue
default:: 1e-05

settings.cap_num

label:: Cap number of peaks by taking top N peaks
type:: basic:integer
description:: To keep all peaks, set the value to 0.
disabled:: settings.broad
default:: 500000

settings.mfold_lower

label:: MFOLD range (lower limit)
type:: basic:integer
description:: This parameter is used to select the regions within the MFOLD range of high-confidence enrichment ratio against background to build the model. The fold enrichment of a region must be higher than the lower limit and lower than the upper limit. The default of 10,30 means using all regions with fold enrichment neither too low (>10) nor too high (<30) to build the paired-peaks model. If MACS cannot find more than 100 regions to build the model, it will use the --extsize parameter to continue peak detection ONLY if --fix-bimodal is set.
required:: False

settings.mfold_upper

label:: MFOLD range (upper limit)
type:: basic:integer
description:: This parameter is used to select the regions within the MFOLD range of high-confidence enrichment ratio against background to build the model. The fold enrichment of a region must be higher than the lower limit and lower than the upper limit. The default of 10,30 means using all regions with fold enrichment neither too low (>10) nor too high (<30) to build the paired-peaks model. If MACS cannot find more than 100 regions to build the model, it will use the --extsize parameter to continue peak detection ONLY if --fix-bimodal is set.
required:: False

settings.slocal

label:: Small local region
type:: basic:integer
description:: The slocal and llocal parameters control which two levels of regions are checked around the peak regions to calculate the maximum lambda as the local lambda. By default, MACS considers 1000 bp for the small local region (--slocal) and 10000 bp for the large local region (--llocal), which captures the bias from long-range effects such as an open chromatin domain. You can tweak these according to your project. Note that if the region is set too small, a sharp spike in the input data may kill a significant peak.
required:: False

settings.llocal

label:: Large local region
type:: basic:integer
description:: The slocal and llocal parameters control which two levels of regions are checked around the peak regions to calculate the maximum lambda as the local lambda. By default, MACS considers 1000 bp for the small local region (--slocal) and 10000 bp for the large local region (--llocal), which captures the bias from long-range effects such as an open chromatin domain. You can tweak these according to your project. Note that if the region is set too small, a sharp spike in the input data may kill a significant peak.
required:: False

settings.extsize

label:: extsize
type:: basic:integer
description:: When ‘--nomodel’ is set, MACS uses this parameter to extend reads in the 5’->3’ direction to fixed-size fragments. For example, if the size of the binding region for your transcription factor is 200 bp and you want to bypass model building by MACS, this parameter can be set to 200. This option is only valid when --nomodel is set or when MACS fails to build the model and --fix-bimodal is on.
required:: False

settings.shift

label:: Shift
type:: basic:integer
description:: Note: this is NOT the legacy --shiftsize option, which has been replaced by --extsize. You can set an arbitrary shift in bp here; use discretion when setting a value other than the default (0). When --nomodel is set, MACS uses this value to move the cutting ends (5’) and then applies --extsize in the 5’->3’ direction to extend them to fragments. When this value is negative, ends are moved in the 3’->5’ direction; otherwise, in the 5’->3’ direction. It is recommended to keep the default of 0 for ChIP-Seq datasets, or to use -1 * half of EXTSIZE together with --extsize for detecting enriched cutting loci, as in certain DNase-Seq datasets. Note that values other than 0 cannot be set if the format is BAMPE for paired-end data. Default is 0.
required:: False

settings.band_width

label:: Band width
type:: basic:integer
description:: The band width used to scan the genome, ONLY for model building. You can set this parameter to the sonication fragment size expected from the wet experiment. The previous side effect on the peak detection process has been removed, so this parameter only affects model building.
required:: False

settings.nolambda

label:: Use background lambda as local lambda
type:: basic:boolean
description:: With this flag on, MACS will use the background lambda as local lambda. This means MACS will not consider the local bias at peak candidate regions.
default:: False

settings.fix_bimodal

label:: Turn on the auto paired-peak model process
type:: basic:boolean
description:: Turn on the auto paired-peak model process. If set, when MACS fails to build the paired model, it falls back to the nomodel settings and uses the ‘--extsize’ parameter to extend each tag. If not set, MACS terminates when the paired-peak model fails.
default:: False

settings.nomodel

label:: Bypass building the shifting model
type:: basic:boolean
description:: While on, MACS will bypass building the shifting model.
hidden:: tagalign
default:: False

settings.nomodel_prepeak

label:: Bypass building the shifting model
type:: basic:boolean
description:: While on, MACS will bypass building the shifting model.
hidden:: !tagalign
default:: True

settings.down_sample

label:: Down-sample
type:: basic:boolean
description:: When set to true, a random sampling method scales down the larger sample. By default, MACS uses linear scaling. This option makes the results unstable and irreproducible, because different random reads are selected on each run, so the numbers (pileup, p-value, q-value) in particular will change.
default:: False
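
The difference between the default linear scaling and random down-sampling can be sketched in Python. The helpers are hypothetical; the optional seed only shows what repeatable random sampling would require and is not a setting exposed by this process.

```python
import random


def linear_scale(pileup, larger_total, smaller_total):
    """Default behaviour: multiply the larger sample's pileup by the depth ratio."""
    ratio = smaller_total / larger_total
    return [value * ratio for value in pileup]


def random_downsample(reads, target_size, seed=None):
    """Down-sample option: keep a random subset of reads from the larger sample.

    Without a fixed seed, a different subset is drawn on every run, which is why
    pileup, p-values and q-values can change between runs.
    """
    rng = random.Random(seed)
    return rng.sample(reads, target_size)
```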

settings.bedgraph

label:: Save fragment pileup and control lambda
type:: basic:boolean
description:: If this flag is on, MACS stores the fragment pileup, control lambda, -log10pvalue and -log10qvalue scores in bedGraph files. The bedGraph files are stored in the current directory and named NAME+’_treat_pileup.bdg’ for treatment data, NAME+’_control_lambda.bdg’ for local lambda values from the control, NAME+’_treat_pvalue.bdg’ for Poisson p-value scores (in -log10(pvalue) form), and NAME+’_treat_qvalue.bdg’ for q-value scores from the Benjamini-Hochberg-Yekutieli procedure.
default:: True

settings.spmr

label:: Save signal per million reads for fragment pileup profiles
type:: basic:boolean
disabled:: settings.bedgraph === false
default:: True
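
The bedGraph naming scheme and the spmr setting can be sketched together. The spmr field has no description here; this sketch assumes, as with MACS2's --SPMR option, that the saved fragment pileup is divided by the number of treatment reads in millions. Both helper names are hypothetical.

```python
def bedgraph_paths(name):
    """File names listed in the bedgraph description, keyed by content."""
    return {
        "treat_pileup": f"{name}_treat_pileup.bdg",
        "control_lambda": f"{name}_control_lambda.bdg",
        "treat_pvalue": f"{name}_treat_pvalue.bdg",
        "treat_qvalue": f"{name}_treat_qvalue.bdg",
    }


def spmr(pileup, total_treatment_reads):
    """Scale a fragment pileup to signal per million reads.

    Assumption: normalisation uses the total number of treatment reads.
    """
    if total_treatment_reads <= 0:
        raise ValueError("total_treatment_reads must be positive")
    factor = 1_000_000 / total_treatment_reads
    return [value * factor for value in pileup]
```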

settings.call_summits

label:: Call summits
type:: basic:boolean
description:: MACS reanalyzes the shape of the signal profile (p- or q-score, depending on the cutoff setting) to deconvolve subpeaks within each peak called by the general procedure. This is highly recommended for detecting adjacent binding events. When used, the output subpeaks of a large peak region share the same peak boundaries but have different scores and peak summit positions.
default:: False
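
A heavily simplified stand-in for summit calling is to report every local maximum of the signal inside a peak as a subpeak. MACS2 uses its own deconvolution, so treat this only as an illustration of the output shape: every subpeak keeps the parent peak’s boundaries but has its own summit and score. `call_subpeak_summits` and `min_score` are hypothetical.

```python
def call_subpeak_summits(start, end, signal, min_score):
    """Report local maxima of signal as subpeaks sharing the parent boundaries.

    signal[i] is the (p- or q-) score at coordinate start + i. This is a
    simplified stand-in, not MACS2's deconvolution algorithm.
    """
    if len(signal) != end - start:
        raise ValueError("signal must cover the peak region exactly")
    subpeaks = []
    for i in range(1, len(signal) - 1):
        # A plateau reports its leftmost position only.
        is_local_max = signal[i] > signal[i - 1] and signal[i] >= signal[i + 1]
        if is_local_max and signal[i] >= min_score:
            subpeaks.append({"start": start, "end": end,
                             "summit": start + i, "score": signal[i]})
    return subpeaks
```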

settings.broad

label:: Composite broad regions
type:: basic:boolean
description:: When this flag is on, MACS tries to composite broad regions in BED12 (a gene-model-like format) by putting nearby highly enriched regions into a broad region with a loose cutoff. The broad region is controlled by a separate cutoff set through --broad-cutoff. The maximum broad region length is 4 times the d estimated by MACS.
disabled:: settings.call_summits === true
default:: False

settings.broad_cutoff

label:: Broad cutoff
type:: basic:decimal
description:: Cutoff for broad regions. This option is not available unless --broad is set. If -p is set, this is a p-value cutoff; otherwise, it is a q-value cutoff. Default is 0.1.
required:: False
disabled:: settings.call_summits === true || settings.broad !== true

chipqc_settings.blacklist

label:: Blacklist regions
type:: data:bed
description:: BED file containing genomic regions that should be excluded from the analysis.
required:: False
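
Blacklist filtering amounts to an interval-overlap test. The sketch below drops peaks that overlap any blacklisted region, assuming BED-style half-open coordinates; applying it to peaks rather than reads is an illustrative choice, and the helper names are hypothetical.

```python
def overlaps(a_start, a_end, b_start, b_end):
    """True if two half-open intervals share at least one base."""
    return a_start < b_end and b_start < a_end


def remove_blacklisted(peaks, blacklist):
    """Drop (chrom, start, end) peaks overlapping any blacklist region on the same chromosome."""
    by_chrom = {}
    for chrom, start, end in blacklist:
        by_chrom.setdefault(chrom, []).append((start, end))
    kept = []
    for chrom, start, end in peaks:
        regions = by_chrom.get(chrom, [])
        if not any(overlaps(start, end, b_start, b_end) for b_start, b_end in regions):
            kept.append((chrom, start, end))
    return kept
```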

chipqc_settings.calculate_enrichment

label:: Calculate enrichment
type:: basic:boolean
description:: Calculate signal enrichment in known genomic annotations. By default, the annotation is provided by the TranscriptDB package for the specified genome build, which must match one of the supported annotations (hg19, hg38, hg18, mm10, mm9, rn4, ce6, dm3). If the annotation is not supported, the analysis is skipped.
default:: False

chipqc_settings.profile_window

label:: Window size
type:: basic:integer
description:: An integer indicating the width of the window used for peak profiles. Peaks will be centered on their summits and include half of the window size upstream and half downstream of this point.
default:: 400
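
The profile window arithmetic is simple enough to show directly. This hypothetical helper assumes integer summit coordinates and clips the window at 0; for an odd window size the half-width is rounded down.

```python
def profile_window(summit, window=400):
    """Return the (start, end) interval centered on a peak summit."""
    if window <= 0:
        raise ValueError("window must be positive")
    half = window // 2
    # Clip at the chromosome start so coordinates never go negative.
    return max(0, summit - half), summit + half
```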

chipqc_settings.shift_size

label:: Shift size
type:: basic:string
description:: Vector of values to try when computing the optimal shift size. Specify it as a range of consecutive numbers in the form start:end.
default:: 1:300
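
The start:end notation follows R’s `:` operator, which yields an inclusive sequence of consecutive integers. A hypothetical Python equivalent for expanding the setting:

```python
def parse_shift_size(spec):
    """Expand an R-style 'start:end' range into an inclusive list of integers."""
    start_text, sep, end_text = spec.partition(":")
    if not sep:
        raise ValueError("expected a value of the form start:end")
    start, end = int(start_text), int(end_text)
    # Like R, count downwards when end is smaller than start.
    step = 1 if end >= start else -1
    return list(range(start, end + step, step))
```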

MACS2 - ROSE2

data:workflow:chipseq:macs2rose2workflow-macs-rose (data:alignment:bam case, data:alignment:bam control, data:bed promoter, basic:boolean tagalign, basic:integer q_threshold, basic:integer n_sub, basic:boolean tn5, basic:integer shift, basic:string duplicates, basic:string duplicates_prepeak, basic:decimal qvalue, basic:decimal pvalue, basic:decimal pvalue_prepeak, basic:integer cap_num, basic:integer mfold_lower, basic:integer mfold_upper, basic:integer slocal, basic:integer llocal, basic:integer extsize, basic:integer shift, basic:integer band_width, basic:boolean nolambda, basic:boolean fix_bimodal, basic:boolean nomodel, basic:boolean nomodel_prepeak, basic:boolean down_sample, basic:boolean bedgraph, basic:boolean spmr, basic:boolean call_summits, basic:boolean broad, basic:decimal broad_cutoff, basic:boolean use_filtered_bam, basic:integer tss, basic:integer stitch, data:bed mask, data:bed blacklist, basic:boolean calculate_enrichment, basic:integer profile_window, basic:string shift_size)[Source: v1.4.0]

case

label:: Case (treatment)
type:: data:alignment:bam

control

label:: Control (background)
type:: data:alignment:bam
required:: False

promoter

label:: Promoter regions BED file
type:: data:bed
description:: BED file containing promoter regions (for example, TSS ±1000 bp). Required to get the number of peaks and reads mapped to promoter regions.
required:: False

tagalign

label:: Use tagAlign files
type:: basic:boolean
description:: Use filtered tagAlign files as case (treatment) and control (background) samples. If the extsize parameter is not set, MACS is run using the input’s estimated fragment length.
default:: False

prepeakqc_settings.q_threshold

label:: Quality filtering threshold
type:: basic:integer
default:: 30

prepeakqc_settings.n_sub

label:: Number of reads to subsample
type:: basic:integer
default:: 15000000
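
The q_threshold and n_sub fields carry no description. Assuming they filter alignments by mapping quality and then subsample the survivors, the logic can be sketched as follows; the >= comparison matches the semantics of `samtools view -q`, while the read dictionaries and the seed are hypothetical.

```python
import random


def filter_and_subsample(reads, q_threshold=30, n_sub=15_000_000, seed=0):
    """Keep reads with MAPQ >= q_threshold, then keep at most n_sub of them.

    Assumption: each read is a dict with a 'mapq' key (hypothetical layout).
    """
    passing = [read for read in reads if read["mapq"] >= q_threshold]
    if len(passing) <= n_sub:
        return passing
    rng = random.Random(seed)  # fixed seed so the subsample is repeatable
    return rng.sample(passing, n_sub)
```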

prepeakqc_settings.tn5

label:: Tn5 shifting
type:: basic:boolean
description:: Tn5 transposon shifting. Shift reads on “+” strand by 4 bp and reads on “-” strand by 5 bp.
default:: False
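
The Tn5 shift can be sketched on tagAlign-like records (chromosome, start, end, strand). Shifting both coordinates by the same offset, and clamping at 0, are assumptions of this sketch rather than documented behavior of this process; `tn5_shift` is hypothetical.

```python
def tn5_shift(record):
    """Shift a (chrom, start, end, strand) record: + strand by +4 bp, - strand by -5 bp.

    Assumption: both coordinates move by the same offset.
    """
    chrom, start, end, strand = record
    if strand == "+":
        offset = 4
    elif strand == "-":
        offset = -5
    else:
        raise ValueError(f"unknown strand: {strand!r}")
    new_start = max(0, start + offset)        # never go below the chromosome start
    new_end = max(new_start, end + offset)    # keep end >= start after clamping
    return chrom, new_start, new_end, strand
```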

prepeakqc_settings.shift

label:: User-defined cross-correlation peak strandshift
type:: basic:integer
description:: If defined, the SPP tool will not estimate the fragment length but will use the given value as the fragment length.
required:: False
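
When no shift is given, the fragment length is estimated from strand cross-correlation. The sketch below is a textbook version, not SPP's implementation (which, among other things, accounts for the phantom read-length peak): it correlates + strand 5’ end counts with - strand counts shifted by each candidate distance and picks the best shift. Inputs are hypothetical per-base count lists of equal length.

```python
import math


def strand_correlation(plus, minus, shift):
    """Pearson correlation of + strand counts with - strand counts shifted by `shift`."""
    n = len(plus) - shift
    if n < 2:
        return 0.0
    x, y = plus[:n], minus[shift:shift + n]
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def estimate_fragment_length(plus, minus, max_shift, user_shift=None):
    """Use the user-defined shift if given; otherwise pick the best-correlated shift."""
    if user_shift is not None:
        return user_shift
    return max(range(1, max_shift + 1),
               key=lambda s: strand_correlation(plus, minus, s))
```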

settings.duplicates

label:: Number of duplicates
type:: basic:string
description:: Controls the MACS behavior towards duplicate tags at the exact same location, that is, the same coordinate and the same strand. The ‘auto’ option makes MACS calculate the maximum number of tags at the exact same location based on the binomial distribution, using 1e-5 as the p-value cutoff, and the ‘all’ option keeps all the tags. If an integer is given, at most this number of tags is kept at the same location. The default is to keep one tag at the same location.
required:: False
hidden:: tagalign
choices::
1: 1
auto: auto
all: all

settings.duplicates_prepeak

label:: Number of duplicates
type:: basic:string
description:: Controls the MACS behavior towards duplicate tags at the exact same location, that is, the same coordinate and the same strand. The ‘auto’ option makes MACS calculate the maximum number of tags at the exact same location based on the binomial distribution, using 1e-5 as the p-value cutoff, and the ‘all’ option keeps all the tags. If an integer is given, at most this number of tags is kept at the same location. The default is to keep one tag at the same location.
required:: False
hidden:: !tagalign
default:: all
choices::
1: 1
auto: auto
all: all
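
The three duplicate policies can be sketched in Python. For ‘auto’, this sketch assumes the limit is the smallest k whose cumulative Binomial(n_tags, 1/genome_size) probability reaches 1 - 1e-5, and it additionally clamps the limit to at least 1 so that tiny inputs are not emptied; that clamp is a choice of the sketch. Tags are hypothetical (chromosome, position, strand) tuples.

```python
import math


def max_dup_tags(genome_size, n_tags, pvalue=1e-5):
    """Smallest k with Binomial(n_tags, 1/genome_size) CDF(k) >= 1 - pvalue."""
    p = 1.0 / genome_size
    log_p, log_q = math.log(p), math.log1p(-p)
    log_n_fact = math.lgamma(n_tags + 1)
    cdf = 0.0
    for k in range(n_tags + 1):
        log_pmf = (log_n_fact - math.lgamma(k + 1) - math.lgamma(n_tags - k + 1)
                   + k * log_p + (n_tags - k) * log_q)
        cdf += math.exp(log_pmf)
        if cdf >= 1.0 - pvalue:
            return k
    return n_tags


def filter_duplicates(tags, keep="1", genome_size=None):
    """Keep at most N identical (chrom, position, strand) tags.

    keep is 'all', 'auto' or an integer given as a string, matching the choices.
    """
    if keep == "all":
        return list(tags)
    if keep == "auto":
        if genome_size is None:
            raise ValueError("genome_size is required for keep='auto'")
        limit = max(1, max_dup_tags(genome_size, len(tags)))  # clamp is a sketch choice
    else:
        limit = int(keep)
    seen = {}
    kept = []
    for tag in tags:
        count = seen.get(tag, 0)
        if count < limit:
            kept.append(tag)
            seen[tag] = count + 1
    return kept
```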

settings.qvalue

label:: Q-value cutoff
type:: basic:decimal
description:: The q-value (minimum FDR) cutoff for calling significant regions. Q-values are calculated from p-values using the Benjamini-Hochberg procedure.
required:: False
disabled:: settings.pvalue && settings.pvalue_prepeak
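
The Benjamini-Hochberg step that turns p-values into q-values can be sketched as follows. This is the textbook procedure applied to a plain list of p-values, not MACS2’s internal computation over pileup scores; the helper names are hypothetical.

```python
def bh_qvalues(pvalues):
    """Benjamini-Hochberg adjusted p-values (q-values), returned in input order."""
    n = len(pvalues)
    # Walk from the largest p-value down, keeping a running minimum so q-values are monotone.
    order = sorted(range(n), key=lambda i: pvalues[i], reverse=True)
    qvalues = [0.0] * n
    running_min = 1.0
    for offset, i in enumerate(order):
        rank = n - offset  # 1-based rank in ascending order
        running_min = min(running_min, pvalues[i] * n / rank)
        qvalues[i] = running_min
    return qvalues


def significant(pvalues, qvalue_cutoff=0.05):
    """Indices of tests whose q-value is at or below the cutoff."""
    return [i for i, q in enumerate(bh_qvalues(pvalues)) if q <= qvalue_cutoff]
```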

settings.pvalue

label:: P-value cutoff
type:: basic:decimal
description:: The p-value cutoff. If specified, MACS2 uses the p-value cutoff instead of the q-value cutoff.
required:: False
disabled:: settings.qvalue
hidden:: tagalign

settings.pvalue_prepeak

label:: P-value cutoff
type:: basic:decimal
description:: The p-value cutoff. If specified, MACS2 uses the p-value cutoff instead of the q-value cutoff.
disabled:: settings.qvalue
hidden:: !tagalign || settings.qvalue
default:: 1e-05

settings.cap_num

label:: Cap number of peaks by taking top N peaks
type:: basic:integer
description:: To keep all peaks, set the value to 0.
disabled:: settings.broad
default:: 500000
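
Capping is a sort-and-truncate step. This sketch assumes each peak is a dictionary with a hypothetical `score` key used for ranking; which column the process actually ranks by is not stated here.

```python
def cap_peaks(peaks, cap_num=500_000):
    """Return the top cap_num peaks by score; cap_num == 0 keeps all peaks.

    Assumption: each peak is a dict with a hypothetical 'score' key.
    """
    if cap_num < 0:
        raise ValueError("cap_num must be 0 or positive")
    ranked = sorted(peaks, key=lambda peak: peak["score"], reverse=True)
    return ranked if cap_num == 0 else ranked[:cap_num]
```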

settings.mfold_lower

label:: MFOLD range (lower limit)
type:: basic:integer
description:: This parameter selects the regions within the MFOLD range of high-confidence enrichment ratio against background to build the model. The fold enrichment of these regions must be lower than the upper limit and higher than the lower limit. The default of 10,30 means using all regions whose fold enrichment is not too low (>10) and not too high (<30) to build the paired-peak model. If MACS cannot find more than 100 regions to build the model, it uses the --extsize parameter to continue peak detection ONLY if --fix-bimodal is set.
required:: False

settings.mfold_upper

label:: MFOLD range (upper limit)
type:: basic:integer
description:: This parameter selects the regions within the MFOLD range of high-confidence enrichment ratio against background to build the model. The fold enrichment of these regions must be lower than the upper limit and higher than the lower limit. The default of 10,30 means using all regions whose fold enrichment is not too low (>10) and not too high (<30) to build the paired-peak model. If MACS cannot find more than 100 regions to build the model, it uses the --extsize parameter to continue peak detection ONLY if --fix-bimodal is set.
required:: False
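
The MFOLD selection and the fallback governed by fix_bimodal can be sketched together. The 10,30 defaults and the 100-region threshold come from the description above; the function names and return values are hypothetical.

```python
def select_model_regions(fold_enrichments, mfold_lower=10, mfold_upper=30):
    """Indices of regions whose fold enrichment lies strictly between the limits."""
    return [i for i, fe in enumerate(fold_enrichments) if mfold_lower < fe < mfold_upper]


def fragment_strategy(n_regions, fix_bimodal, extsize, min_regions=100):
    """Build the model if enough regions exist; otherwise fall back or stop."""
    if n_regions > min_regions:
        return "model"
    if fix_bimodal:
        return f"nomodel with extsize={extsize}"
    raise RuntimeError(f"only {n_regions} regions found; cannot build the paired-peak model")
```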

settings.slocal

label:: Small local region
type:: basic:integer
description:: The slocal and llocal parameters set the two region sizes checked around each peak region when computing the maximum lambda, which is used as the local lambda. By default, MACS uses 1000 bp for the small local region (--slocal) and 10000 bp for the large local region (--llocal), which captures bias from long-range effects such as an open chromatin domain. Adjust these values to suit your project. Note that if a region is set too small, a sharp spike in the input data may suppress a significant peak.
required:: False

settings.llocal

label:: Large local region
type:: basic:integer
description:: The slocal and llocal parameters set the two region sizes checked around each peak region when computing the maximum lambda, which is used as the local lambda. By default, MACS uses 1000 bp for the small local region (--slocal) and 10000 bp for the large local region (--llocal), which captures bias from long-range effects such as an open chromatin domain. Adjust these values to suit your project. Note that if a region is set too small, a sharp spike in the input data may suppress a significant peak.
required:: False

settings.extsize

label:: extsize
type:: basic:integer
description:: When ‘--nomodel’ is set, MACS uses this parameter to extend reads in the 5’->3’ direction to fixed-size fragments. For example, if the binding region of your transcription factor is 200 bp and you want MACS to bypass model building, set this parameter to 200. This option is only valid when --nomodel is set, or when MACS fails to build a model and --fix-bimodal is on.
required:: False

settings.shift

label:: Shift
type:: basic:integer
description:: Note that this is NOT the legacy --shiftsize option, which has been replaced by --extsize. You can set an arbitrary shift in bp here; use discretion when setting a value other than the default (0). When --nomodel is set, MACS uses this value to move the cutting ends (5’) and then applies --extsize in the 5’->3’ direction to extend them to fragments. A negative value moves ends in the 3’->5’ direction; a positive value moves them in the 5’->3’ direction. We recommend keeping the default of 0 for ChIP-Seq datasets, or using -1 * half of EXTSIZE together with the --extsize option to detect enriched cutting loci, as in certain DNase I-Seq datasets. Values other than 0 cannot be used when the format is BAMPE (paired-end data). Default is 0.
required:: False

settings.band_width

label:: Band width
type:: basic:integer
description:: The band width used to scan the genome, ONLY for model building. You can set this parameter to the sonication fragment size expected from the wet experiment. The previous side effect on the peak detection process has been removed, so this parameter only affects model building.
required:: False

settings.nolambda

label:: Use background lambda as local lambda
type:: basic:boolean
description:: With this flag on, MACS will use the background lambda as local lambda. This means MACS will not consider the local bias at peak candidate regions.
default:: False

settings.fix_bimodal

label:: Turn on the auto paired-peak model process
type:: basic:boolean
description:: Turn on the auto paired-peak model process. If set, when MACS fails to build the paired model, it falls back to the nomodel settings and uses the ‘--extsize’ parameter to extend each tag. If not set, MACS terminates when the paired-peak model fails.
default:: False

settings.nomodel

label:: Bypass building the shifting model
type:: basic:boolean
description:: While on, MACS will bypass building the shifting model.
hidden:: tagalign
default:: False

settings.nomodel_prepeak

label:: Bypass building the shifting model
type:: basic:boolean
description:: While on, MACS will bypass building the shifting model.
hidden:: !tagalign
default:: True

settings.down_sample

label:: Down-sample
type:: basic:boolean
description:: When set to true, a random sampling method scales down the larger sample. By default, MACS uses linear scaling. This option makes the results unstable and irreproducible, because different random reads are selected on each run, so the numbers (pileup, p-value, q-value) in particular will change.
default:: False

settings.bedgraph

label:: Save fragment pileup and control lambda
type:: basic:boolean
description:: If this flag is on, MACS stores the fragment pileup, control lambda, -log10pvalue and -log10qvalue scores in bedGraph files. The bedGraph files are stored in the current directory and named NAME+’_treat_pileup.bdg’ for treatment data, NAME+’_control_lambda.bdg’ for local lambda values from the control, NAME+’_treat_pvalue.bdg’ for Poisson p-value scores (in -log10(pvalue) form), and NAME+’_treat_qvalue.bdg’ for q-value scores from the Benjamini-Hochberg-Yekutieli procedure.
default:: True

settings.spmr

label:: Save signal per million reads for fragment pileup profiles
type:: basic:boolean
disabled:: settings.bedgraph === false
default:: True

settings.call_summits

label:: Call summits
type:: basic:boolean
description:: MACS reanalyzes the shape of the signal profile (p- or q-score, depending on the cutoff setting) to deconvolve subpeaks within each peak called by the general procedure. This is highly recommended for detecting adjacent binding events. When used, the output subpeaks of a large peak region share the same peak boundaries but have different scores and peak summit positions.
default:: False

settings.broad

label:: Composite broad regions
type:: basic:boolean
description:: When this flag is on, MACS tries to composite broad regions in BED12 (a gene-model-like format) by putting nearby highly enriched regions into a broad region with a loose cutoff. The broad region is controlled by a separate cutoff set through --broad-cutoff. The maximum broad region length is 4 times the d estimated by MACS.
disabled:: settings.call_summits === true
default:: False

settings.broad_cutoff

label:: Broad cutoff
type:: basic:decimal
description:: Cutoff for broad regions. This option is not available unless --broad is set. If -p is set, this is a p-value cutoff; otherwise, it is a q-value cutoff. Default is 0.1.
required:: False
disabled:: settings.call_summits === true || settings.broad !== true

rose_settings.use_filtered_bam

label:: Use Filtered BAM File
type:: basic:boolean
description:: Rank enhancers using the filtered BAM file from a MACS2 object.
default:: False

rose_settings.tss

label:: TSS exclusion
type:: basic:integer
description:: Distance from the TSS to exclude. Set to 0 for no TSS exclusion.
default:: 0
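
TSS exclusion can be sketched as a containment test. This assumes a region is excluded when it lies entirely within ±tss bp of a TSS on the same chromosome; ROSE2’s exact rule may differ, and the helper is hypothetical.

```python
def exclude_tss_regions(regions, tss_sites, tss):
    """Drop (chrom, start, end) regions lying entirely within +/- tss bp of a TSS.

    tss_sites is a list of (chrom, position) pairs; tss == 0 disables exclusion.
    """
    if tss == 0:
        return list(regions)
    kept = []
    for chrom, start, end in regions:
        inside = any(site_chrom == chrom and pos - tss <= start and end <= pos + tss
                     for site_chrom, pos in tss_sites)
        if not inside:
            kept.append((chrom, start, end))
    return kept
```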

rose_settings.stitch

label:: Stitch
type:: basic:integer
description:: Maximum linking distance for stitching. If not given, the optimal stitching parameter is determined automatically.
required:: False
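
Stitching merges regions whose gap is at most the linking distance. The sketch below shows that merge on BED-style intervals; it does not attempt the automatic selection of an optimal stitching distance, and the helper name is hypothetical.

```python
def stitch_regions(regions, stitch):
    """Merge (chrom, start, end) regions on the same chromosome with gaps <= stitch bp."""
    merged = []
    for chrom, start, end in sorted(regions):
        if merged and merged[-1][0] == chrom and start - merged[-1][2] <= stitch:
            prev_chrom, prev_start, prev_end = merged[-1]
            merged[-1] = (prev_chrom, prev_start, max(prev_end, end))
        else:
            merged.append((chrom, start, end))
    return merged
```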

rose_settings.mask

label:: Masking BED file
type:: data:bed
description:: BED file of regions to mask from the analysis.
required:: False

chipqc_settings.blacklist

label:: Blacklist regions
type:: data:bed
description:: BED file containing genomic regions that should be excluded from the analysis.
required:: False

chipqc_settings.calculate_enrichment

label:: Calculate enrichment
type:: basic:boolean
description:: Calculate signal enrichment in known genomic annotations. By default, the annotation is provided by the TranscriptDB package for the specified genome build, which must match one of the supported annotations (hg19, hg38, hg18, mm10, mm9, rn4, ce6, dm3). If the annotation is not supported, the analysis is skipped.
default:: False

chipqc_settings.profile_window

label:: Window size
type:: basic:integer
description:: An integer indicating the width of the window used for peak profiles. Peaks will be centered on their summits and include half of the window size upstream and half downstream of this point.
default:: 400

chipqc_settings.shift_size

label:: Shift size
type:: basic:string
description:: Vector of values to try when computing the optimal shift size. Specify it as a range of consecutive numbers in the form start:end.
default:: 1:300

ML-ready expression

data:ml:table:expressions:upload-ml-expression (basic:file exp, basic:string source, basic:string species, data:ml:space reference_space)[Source: v1.0.2]

Upload ML-ready expression matrix.

exp

label:: Transformed expressions
type:: basic:file
description:: A tab-separated file containing transformed expression values, with sample IDs as the index (first column, labeled sample_id) and ENSEMBL IDs (recommended but not required) as column names.
required:: True
disabled:: False
hidden:: False
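
A minimal reader for the expected layout, assuming a header row whose first cell is `sample_id`, one row per sample and numeric values elsewhere. It uses only the standard library; the function name is hypothetical and the process’s own validation may be stricter or looser.

```python
import csv


def read_ml_expressions(lines):
    """Parse tab-separated lines into (feature_names, {sample_id: {feature: value}})."""
    reader = csv.reader(lines, delimiter="\t")
    header = next(reader)
    if not header or header[0] != "sample_id":
        raise ValueError("first column must be labeled 'sample_id'")
    features = header[1:]
    matrix = {}
    for row in reader:
        if len(row) != len(header):
            sample = row[0] if row else "?"
            raise ValueError(f"row for {sample} has {len(row)} fields, expected {len(header)}")
        matrix[row[0]] = dict(zip(features, map(float, row[1:])))
    return features, matrix
```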

source

label:: Feature source
type:: basic:string
required:: True
disabled:: False
hidden:: False
choices::
AFFY: AFFY
DICTYBASE: DICTYBASE
ENSEMBL: ENSEMBL
NCBI: NCBI
UCSC: UCSC

species

label:: Species
type:: basic:string
description:: Latin name of the species.
required:: True
disabled:: False
hidden:: False
choices::
Homo sapiens: Homo sapiens
Mus musculus: Mus musculus
Rattus norvegicus: Rattus norvegicus
Dictyostelium discoideum: Dictyostelium discoideum

reference_space

label:: Reference space of ML-ready data
type:: data:ml:space
required:: True
disabled:: False
hidden:: False

exp

label:: Transformed expressions
type:: basic:file
required:: True
disabled:: False
hidden:: False

source

label:: Feature source
type:: basic:string
required:: True
disabled:: False
hidden:: False

species

label:: Species
type:: basic:string
required:: True
disabled:: False
hidden:: False

Map microarray probes

data:microarray:mapping:map-microarray-probes (list:data:microarray:normalized expressions, basic:file mapping_file, basic:string source, basic:string build)[Source: v1.1.1]

Map microarray probes to Gene IDs. Mapping can be done automatically or with a custom mapping file. For automatic probe mapping, all ‘Normalized expression’ objects must have a GEO platform ID. If the platform is supported, the provided probe IDs are mapped to the corresponding Ensembl IDs. Currently supported platforms are: GPL74, GPL201, GPL96, GPL571, GPL97, GPL570, GPL91, GPL8300, GPL92, GPL93, GPL94, GPL95, GPL17586, GPL5175, GPL80, GPL6244, GPL16686, GPL15207, GPL1352, GPL11068, GPL26966, GPL6848, GPL14550, GPL17077, GPL16981, GPL13497, GPL6947, GPL10558, GPL6883, GPL13376, GPL6884, GPL6254.

expressions

label:: Normalized expressions
type:: list:data:microarray:normalized
required:: True
disabled:: False
hidden:: False

mapping_file

label:: File with probe ID mappings
type:: basic:file
description:: The file should be tab-separated and contain two columns with a header row. The first column should contain Gene IDs and the second probe names. Supported file extensions are .tab.*, .tsv.* and .txt.*.
required:: False
disabled:: False
hidden:: False
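
Applying a custom mapping file can be sketched as a dictionary lookup. The sketch assumes the two-column layout described above (Gene ID first, probe second, with a header row) and collects every value mapped to a gene in a list; how the process combines several probes per gene is not stated here, and the helper names are hypothetical.

```python
import csv


def read_probe_mapping(lines):
    """Read tab-separated (Gene ID, probe) lines with a header row into {probe: gene}."""
    reader = csv.reader(lines, delimiter="\t")
    next(reader)  # skip the header row
    return {probe: gene for gene, probe in (row[:2] for row in reader if row)}


def map_probes(expression_by_probe, probe_to_gene):
    """Re-key probe-level values by Gene ID, dropping probes without a mapping."""
    mapped = {}
    for probe, value in expression_by_probe.items():
        gene = probe_to_gene.get(probe)
        if gene is not None:
            mapped.setdefault(gene, []).append(value)
    return mapped
```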

source

label:: Gene ID source
type:: basic:string
description:: Gene ID source used for probe mapping. Required when using a custom mapping file.
required:: False
disabled:: False
hidden:: False
choices::
AFFY: AFFY
DICTYBASE: DICTYBASE
ENSEMBL: ENSEMBL
NCBI: NCBI
UCSC: UCSC

build

label:: Genome build
type:: basic:string
description:: Genome build of the mapping file. Required when using a custom mapping file.
required:: False
disabled:: False
hidden:: False

mapped_exp

label:: Mapped expressions
type:: basic:file
required:: True
disabled:: False
hidden:: False

probe_mapping

label:: Probe to transcript mapping used
type:: basic:string
required:: True
disabled:: False
hidden:: False

mapping

label:: Mapping file
type:: basic:file
required:: True
disabled:: False
hidden:: False

platform

label:: Microarray platform type
type:: basic:string
required:: True
disabled:: False
hidden:: False

platform_id

label:: GEO platform ID
type:: basic:string
required:: False
disabled:: False
hidden:: False

Mappability

data:mappability:bcmmappability-bcm (data:index:bowtie genome, data:annotation:gff3 gff, basic:integer length)[Source: v3.1.2]

Compute genome mappability. Developed by Bioinformatics Laboratory, Faculty of Computer and Information Science, University of Ljubljana, Slovenia and Shaulsky’s Lab, Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA.

genome

label:: Reference genome
type:: data:index:bowtie

gff

label:: General feature format
type:: data:annotation:gff3

length

label:: Read length
type:: basic:integer
default:: 50

mappability

label:: Mappability
type:: basic:file

Mappability info

data:mappability:bcmupload-mappability (basic:file src)[Source: v1.2.3]

Upload mappability information.

src

label:: Mappability file
type:: basic:file
description:: Mappability file: two-column, tab-separated.
validate_regex:: \.(tab)(|\.gz|\.bz2|\.tgz|\.tar\.gz|\.tar\.bz2|\.zip|\.rar|\.7z)$
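
The validate_regex pattern can be checked with Python’s `re` module. This assumes the pattern is searched against the uploaded file name; that matching convention is an assumption of the sketch, and the helper name is hypothetical.

```python
import re

# Pattern copied from the validate_regex field above.
MAPPABILITY_REGEX = re.compile(
    r"\.(tab)(|\.gz|\.bz2|\.tgz|\.tar\.gz|\.tar\.bz2|\.zip|\.rar|\.7z)$"
)


def is_valid_mappability_name(filename):
    """True if the file name ends in .tab, optionally followed by an archive suffix."""
    return MAPPABILITY_REGEX.search(filename) is not None
```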

mappability

label:: Uploaded mappability
type:: basic:file

MarkDuplicates

data:alignment:bam:markduplicate:markduplicates (data:alignment:bam bam, basic:boolean skip, basic:boolean remove_duplicates, basic:string validation_stringency, basic:string assume_sort_order, basic:integer java_gc_threads, basic:integer max_heap_size)[Source: v1.7.0]

Mark duplicate reads in a BAM file and optionally remove them. Tool from Picard, wrapped by GATK4. See GATK MarkDuplicates for more information.

bam

label:: Alignment BAM file
type:: data:alignment:bam
required:: True
disabled:: False
hidden:: False

skip

label:: Skip MarkDuplicates step
type:: basic:boolean
description:: MarkDuplicates step can be skipped.
required:: True
disabled:: False
hidden:: False
default:: False

remove_duplicates

label:: Remove duplicates
type:: basic:boolean
description:: If true, duplicates are not written to the output file; otherwise, they are written with the appropriate flags set.
required:: True
disabled:: False
hidden:: False
default:: False
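
In SAM, duplicates are marked with FLAG bit 0x400 (“PCR or optical duplicate”). The sketch below shows the two output policies of remove_duplicates on hypothetical (name, flag) pairs, assuming the set of duplicate read names is already known; identifying duplicates is MarkDuplicates’ job and is not reproduced here.

```python
DUPLICATE_FLAG = 0x400  # SAM FLAG bit: PCR or optical duplicate


def apply_duplicate_policy(reads, duplicate_names, remove_duplicates=False):
    """Either drop duplicate reads or keep them with the duplicate bit set.

    reads is a list of (name, flag) pairs (hypothetical simplified records).
    """
    out = []
    for name, flag in reads:
        if name in duplicate_names:
            if remove_duplicates:
                continue  # not written to the output
            flag |= DUPLICATE_FLAG
        out.append((name, flag))
    return out
```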

validation_stringency

label:: Validation stringency
type:: basic:string
description:: Validation stringency for all SAM files read by this program. Setting stringency to SILENT can improve performance when processing a BAM file in which variable-length data (read, qualities, tags) do not otherwise need to be decoded. Default is STRICT.
required:: True
disabled:: False
hidden:: False
default:: STRICT
choices::
STRICT: STRICT
LENIENT: LENIENT
SILENT: SILENT

assume_sort_order

label:: Assume sort order
type:: basic:string
description:: If set, assume that the input file has this sort order even if the header says otherwise. By default (null), the sort order in the BAM header is used. Possible values are unsorted, queryname, coordinate, duplicate and unknown.
required:: True
disabled:: False
hidden:: False
default::
choices::
as in BAM header (default):
unsorted: unsorted
queryname: queryname
coordinate: coordinate
duplicate: duplicate
unknown: unknown

advanced.java_gc_threads

label:: Java ParallelGCThreads
type:: basic:integer
description:: Sets the number of threads used during parallel phases of the garbage collectors.
required:: True
disabled:: False
hidden:: False
default:: 2

advanced.max_heap_size

label:: Java maximum heap size (Xmx)
type:: basic:integer
description:: Set the maximum Java heap size (in GB).
required:: True
disabled:: False
hidden:: False
default:: 12
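
The settings above roughly correspond to a GATK4 MarkDuplicates invocation. The builder below is a hypothetical sketch, not the command this process runs; it passes the heap size (-Xmx) and the HotSpot -XX:ParallelGCThreads flag through the GATK wrapper’s --java-options, and uses Picard-style argument names for the tool itself.

```python
def markduplicates_command(bam, out_bam, metrics, remove_duplicates=False,
                           validation_stringency="STRICT", assume_sort_order="",
                           java_gc_threads=2, max_heap_size=12):
    """Build an argument list for gatk MarkDuplicates (hypothetical helper)."""
    java_options = f"-XX:ParallelGCThreads={java_gc_threads} -Xmx{max_heap_size}g"
    cmd = [
        "gatk", "--java-options", java_options, "MarkDuplicates",
        "--INPUT", bam,
        "--OUTPUT", out_bam,
        "--METRICS_FILE", metrics,
        "--REMOVE_DUPLICATES", str(remove_duplicates).lower(),
        "--VALIDATION_STRINGENCY", validation_stringency,
    ]
    # An empty value means "as in BAM header", so the option is left out.
    if assume_sort_order:
        cmd += ["--ASSUME_SORT_ORDER", assume_sort_order]
    return cmd
```

The list form can be passed directly to `subprocess.run(cmd, check=True)`, which avoids shell quoting issues around the --java-options value.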

bam

label:: Marked duplicates BAM file
type:: basic:file
required:: True
disabled:: False
hidden:: False

bai

label:: Index of marked duplicates BAM file
type:: basic:file
required:: True
disabled:: False
hidden:: False

stats

label:: Alignment statistics
type:: basic:file
required:: True
disabled:: False
hidden:: False

species

label:: Species
type:: basic:string
required:: True
disabled:: False
hidden:: False

build

label:: Build
type:: basic:string
required:: True
disabled:: False
hidden:: False

metrics_file

label:: Metrics from MarkDuplicates process
type:: basic:file
required:: True
disabled:: False
hidden:: False

Merge Expressions (ETC)

data:expressionset:etcmergeetc (list:data:etc exps, list:basic:string genes)[Source: v1.2.4]

Merge Expression Time Course (ETC) data.

exps

label:: Expression Time Course (ETC)
type:: list:data:etc

genes

label:: Filter genes
type:: list:basic:string
required:: False

expset

label:: Expression set
type:: basic:file

expset_type

label:: Expression set type
type:: basic:string

Merge FASTQ (paired-end)

data:mergereads:paired:merge-fastq-paired (list:data:reads:fastq:paired: reads)[Source: v2.2.2]

Merge paired-end FASTQs into one sample. Samples are merged based on the defined replicate group relations and then uploaded as separate samples.

reads

label:: Select relations
type:: list:data:reads:fastq:paired:
description:: Define and select replicate relations.
required:: True
disabled:: False
hidden:: False

Merge FASTQ (single-end)

data:mergereads:single:merge-fastq-single (list:data:reads:fastq:single: reads)[Source: v2.2.2]

Merge single-end FASTQs into one sample. Samples are merged based on the defined replicate group relations and then uploaded as separate samples.

reads

label:: Select relations
type:: list:data:reads:fastq:single:
description:: Define and select replicate relations.
required:: True
disabled:: False
hidden:: False

Metadata table

data:metadata:upload-metadata (basic:file src)[Source: v1.1.1]

Upload a metadata file in which more than one row can match a single sample. The uploaded metadata table represents a one-to-many (1:n) relation to samples in the working collection. The metadata table must contain a column with one of the following headers: “Sample ID”, “Sample name” or “Sample slug”.

src

label:: Table with metadata
type:: basic:file
description:: The metadata table should use one of the following extensions: .csv, .tab, .tsv, .xlsx, .xls
required:: True
disabled:: False
hidden:: False

table

label:: Uploaded table
type:: basic:file
required:: True
disabled:: False
hidden:: False

n_samples

label:: Number of samples
type:: basic:integer
required:: True
disabled:: False
hidden:: False

Metadata table (one-to-one)

data:metadata:unique:upload-metadata-unique (basic:file src)[Source: v1.1.1]

Upload a metadata file in which each row corresponds to a single sample. The uploaded metadata table represents a one-to-one (1:1) relation to samples in the working collection. The metadata table must contain a column with one of the following headers: “Sample ID”, “Sample name” or “Sample slug”.

src

label:: Table with metadata
type:: basic:file
description:: The metadata table should use one of the following extensions: .csv, .tab, .tsv, .xlsx, .xls
required:: True
disabled:: False
hidden:: False

table

label:: Uploaded table
type:: basic:file
required:: True
disabled:: False
hidden:: False

n_samples

label:: Number of samples
type:: basic:integer
required:: True
disabled:: False
hidden:: False
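
A minimal check of the ID-column requirement and of the one-to-one versus one-to-many distinction, assuming a tab-separated table and that n_samples counts distinct sample identifiers (an assumption; .csv and Excel inputs would need a different reader). The helper is hypothetical.

```python
import csv

ID_COLUMNS = ("Sample ID", "Sample name", "Sample slug")


def count_metadata_samples(lines, one_to_one=False, delimiter="\t"):
    """Return the number of distinct samples referenced by a metadata table."""
    reader = csv.DictReader(lines, delimiter=delimiter)
    header = reader.fieldnames or []
    id_column = next((c for c in ID_COLUMNS if c in header), None)
    if id_column is None:
        raise ValueError("table needs a 'Sample ID', 'Sample name' or 'Sample slug' column")
    sample_ids = [row[id_column] for row in reader]
    # One-to-one tables may not repeat a sample; one-to-many tables may.
    if one_to_one and len(sample_ids) != len(set(sample_ids)):
        raise ValueError("one-to-one metadata must have exactly one row per sample")
    return len(set(sample_ids))
```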

MultiQC

data:multiqc:multiqc (list:data: data, basic:boolean dirs, basic:integer dirs_depth, basic:boolean fullnames, basic:boolean config, basic:string cl_config)[Source: v1.22.0]

Aggregate results from bioinformatics analyses across many samples into a single report. [MultiQC](http://www.multiqc.info) searches a given directory for analysis logs and compiles an HTML report. It is a general-purpose tool, well suited to summarising the output from numerous bioinformatics tools.

data

label:: Input data
type:: list:data:
required:: True
disabled:: False
hidden:: False

advanced.dirs

label:: --dirs
type:: basic:boolean
description:: Prepend directory to sample names.
required:: True
disabled:: False
hidden:: False
default:: True

advanced.dirs_depth

label:: --dirs-depth
type:: basic:integer
description:: Prepend a specified number of directories to sample names. Enter a negative number (default) to take directories from the start of the path.
required:: True
disabled:: False
hidden:: False
default:: -1

advanced.fullnames

label:: --fullnames
type:: basic:boolean
description:: Disable the sample name cleaning (leave as full file name).
required:: True
disabled:: False
hidden:: False
default:: False

advanced.config

label:: Use configuration file
type:: basic:boolean
description:: Use Genialis configuration file for MultiQC report.
required:: True
disabled:: False
hidden:: False
default:: True

advanced.cl_config

label:: --cl-config
type:: basic:string
description:: Enter text with command-line configuration options to override the defaults (e.g. custom_logo_url: https://www.genialis.com).
required:: False
disabled:: False
hidden:: False
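
The advanced options map onto MultiQC command-line flags. This hypothetical builder passes --dirs-depth only together with --dirs, which is a design choice of the sketch; the configuration file path is a placeholder, since the Genialis configuration file is not shown here.

```python
def multiqc_command(input_dir, dirs=True, dirs_depth=-1, fullnames=False,
                    config_file=None, cl_config=None):
    """Build a MultiQC argument list from the advanced settings (hypothetical helper)."""
    cmd = ["multiqc", input_dir]
    if dirs:
        # In this sketch --dirs-depth is only passed alongside --dirs.
        cmd += ["--dirs", "--dirs-depth", str(dirs_depth)]
    if fullnames:
        cmd.append("--fullnames")
    if config_file:
        cmd += ["--config", config_file]
    if cl_config:
        cmd += ["--cl-config", cl_config]
    return cmd
```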

report

label:: MultiQC report
type:: basic:file:html
required:: True
disabled:: False
hidden:: False

report_data

label:: Report data
type:: basic:dir
required:: True
disabled:: False
hidden:: False

OBO file

data:ontology:oboupload-obo (basic:file src)[Source: v1.4.0]

Upload gene ontology in OBO format.

src

label:: Gene ontology (OBO)
type:: basic:file
description:: Gene ontology in OBO format.
required:: True
validate_regex:: \.obo(|\.gz|\.bz2|\.tgz|\.tar\.gz|\.tar\.bz2|\.zip|\.rar|\.7z)$
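
OBO files are organised into stanzas such as [Term], each holding tag: value lines (for example id and name). A minimal parser for [Term] stanzas that ignores header lines and other stanza types; it does not handle trailing ! comments or escaped characters, and the function name is hypothetical.

```python
def parse_obo_terms(lines):
    """Return a list of {tag: [values]} dicts, one per [Term] stanza."""
    terms = []
    current = None
    for raw in lines:
        line = raw.strip()
        if line.startswith("["):
            # Only [Term] stanzas are collected; [Typedef] and others are skipped.
            current = {} if line == "[Term]" else None
            if current is not None:
                terms.append(current)
        elif current is not None and ":" in line:
            tag, _, value = line.partition(":")
            current.setdefault(tag.strip(), []).append(value.strip())
    return terms
```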

obo

label:: Ontology file
type:: basic:file

obo_obj

label:: OBO object
type:: basic:file

PCA

data:pcapca (list:data:expression exps, list:basic:string genes, basic:string source, basic:string species)[Source: v2.4.2]

Principal component analysis (PCA)

exps

label:: Expressions
type:: list:data:expression

genes

label:: Gene subset
type:: list:basic:string
required:: False

source

label:: Gene ID database of selected genes
type:: basic:string
description:: This field is required if a gene subset is set.
required:: False

species

label:: Species
type:: basic:string
description:: Latin name of the species. This field is required if a gene subset is set.
required:: False
choices::
Homo sapiens: Homo sapiens
Mus musculus: Mus musculus
Rattus norvegicus: Rattus norvegicus
Dictyostelium discoideum: Dictyostelium discoideum
Odocoileus virginianus texanus: Odocoileus virginianus texanus
Solanum tuberosum: Solanum tuberosum

pca

label:: PCA
type:: basic:json

Picard AlignmentSummary

data:picard:summary: alignment-summary (data:alignment:bam bam, data:seq:nucleotide genome, data:seq:nucleotide adapters, basic:string validation_stringency, basic:integer insert_size, basic:string pair_orientation, basic:boolean bisulfite, basic:boolean assume_sorted)[Source: v2.3.0]

Produce a summary of alignment metrics from a BAM file. Tool from Picard, wrapped by GATK4. See GATK CollectAlignmentSummaryMetrics for more information.

bam

label:: Alignment BAM file
type:: data:alignment:bam
required:: True
disabled:: False
hidden:: False

genome

label:: Genome
type:: data:seq:nucleotide
required:: True
disabled:: False
hidden:: False

adapters

label:: Adapter sequences
type:: data:seq:nucleotide
required:: False
disabled:: False
hidden:: False

validation_stringency

label:: Validation stringency
type:: basic:string
description:: Validation stringency for all SAM files read by this program. Setting stringency to SILENT can improve performance when processing a BAM file in which variable-length data (read, qualities, tags) do not otherwise need to be decoded. Default is STRICT.
required:: True
disabled:: False
hidden:: False
default:: STRICT
choices::
STRICT: STRICT
LENIENT: LENIENT
SILENT: SILENT

insert_size

label:: Maximum insert size
type:: basic:integer
required:: True
disabled:: False
hidden:: False
default:: 100000

pair_orientation

label:: Pair orientation
type:: basic:string
required:: True
disabled:: False
hidden:: False
default:: null
choices::
Unspecified: null
FR: FR
RF: RF
TANDEM: TANDEM

bisulfite

label:: BAM file consists of bisulfite sequenced reads
type:: basic:boolean
required:: True
disabled:: False
hidden:: False
default:: False

assume_sorted

label:: Sorted BAM file
type:: basic:boolean
description:: If true, the sort order in the header file will be ignored.
required:: True
disabled:: False
hidden:: False
default:: False

report

label:: Alignment metrics
type:: basic:file
required:: True
disabled:: False
hidden:: False

species

label:: Species
type:: basic:string
required:: True
disabled:: False
hidden:: False

build

label:: Build
type:: basic:string
required:: True
disabled:: False
hidden:: False

Picard CollectRrbsMetrics

data:picard:rrbs: rrbs-metrics (data:alignment:bam bam, data:seq:nucleotide genome, basic:integer min_quality, basic:integer next_base_quality, basic:integer min_lenght, basic:decimal mismatch_rate, basic:string validation_stringency, basic:boolean assume_sorted)[Source: v2.3.0]

Produce metrics for RRBS data based on the methylation status. This tool uses reduced representation bisulfite sequencing (RRBS) data to determine cytosine methylation status across all reads of a genomic DNA sequence. The tool is wrapped by GATK4. See GATK CollectRrbsMetrics for more information.

bam

label:: Alignment BAM file
type:: data:alignment:bam
required:: True
disabled:: False
hidden:: False

genome

label:: Genome
type:: data:seq:nucleotide
required:: True
disabled:: False
hidden:: False

min_quality

label:: Threshold for base quality of a C base before it is considered
type:: basic:integer
required:: True
disabled:: False
hidden:: False
default:: 20

next_base_quality

label:: Threshold for quality of a base next to a C before the C base is considered
type:: basic:integer
required:: True
disabled:: False
hidden:: False
default:: 10

min_lenght

label:: Minimum read length
type:: basic:integer
required:: True
disabled:: False
hidden:: False
default:: 5

mismatch_rate

label:: Maximum fraction of mismatches in a read to be considered (range: 0 to 1)
type:: basic:decimal
required:: True
disabled:: False
hidden:: False
default:: 0.1
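The four numeric thresholds above combine into two checks: whether a read is used at all, and whether a given C in that read is evaluated. A minimal Python sketch of that logic; the function and class names are hypothetical, and treating every threshold as an inclusive bound is an assumption, not documented Picard behavior:

```python
from dataclasses import dataclass


@dataclass
class RrbsThresholds:
    min_quality: int = 20         # min_quality: base quality of the C itself
    next_base_quality: int = 10   # next_base_quality: quality of the base after the C
    min_length: int = 5           # min_lenght input: minimum read length
    mismatch_rate: float = 0.1    # mismatch_rate: maximum fraction of mismatching bases


def read_is_used(read_length: int, mismatches: int, t: RrbsThresholds) -> bool:
    """Hypothetical read-level filter: long enough and not too many mismatches."""
    if read_length <= 0 or read_length < t.min_length:
        return False
    return mismatches / read_length <= t.mismatch_rate


def c_is_evaluated(c_quality: int, next_quality: int, t: RrbsThresholds) -> bool:
    """Hypothetical base-level filter applied to each C in a read that passed."""
    return c_quality >= t.min_quality and next_quality >= t.next_base_quality


if __name__ == "__main__":
    t = RrbsThresholds()
    print(read_is_used(100, 10, t), read_is_used(100, 11, t), read_is_used(4, 0, t))
    print(c_is_evaluated(20, 10, t), c_is_evaluated(19, 30, t))
```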

validation_stringency

label:: Validation stringency
type:: basic:string
description:: Validation stringency for all SAM files read by this program. Setting stringency to SILENT can improve performance when processing a BAM file in which variable-length data (read, qualities, tags) do not otherwise need to be decoded. Default is STRICT.
required:: True
disabled:: False
hidden:: False
default:: STRICT
choices::
STRICT: STRICT
LENIENT: LENIENT
SILENT: SILENT

assume_sorted

label:: Sorted BAM file
type:: basic:boolean
description:: If true, the sort order in the header file will be ignored.
required:: True
disabled:: False
hidden:: False
default:: False

report

label:: RRBS summary metrics
type:: basic:file
required:: True
disabled:: False
hidden:: False

detailed_report

label:: Detailed RRBS report
type:: basic:file
required:: True
disabled:: False
hidden:: False

plot

label:: QC plots
type:: basic:file
required:: True
disabled:: False
hidden:: False

species

label:: Species
type:: basic:string
required:: True
disabled:: False
hidden:: False

build

label:: Build
type:: basic:string
required:: True
disabled:: False
hidden:: False

Picard InsertSizeMetrics

data:picard:insert: insert-size (data:alignment:bam bam, data:seq:nucleotide genome, basic:decimal minimum_fraction, basic:boolean include_duplicates, basic:decimal deviations, basic:string validation_stringency, basic:boolean assume_sorted)[Source: v2.3.0]

Collect metrics about the insert size of a paired-end library. Tool from Picard, wrapped by GATK4. See GATK CollectInsertSizeMetrics for more information.

bam

label:: Alignment BAM file
type:: data:alignment:bam
required:: True
disabled:: False
hidden:: False

genome

label:: Genome
type:: data:seq:nucleotide
required:: True
disabled:: False
hidden:: False

minimum_fraction

label:: Minimum fraction of reads in a category to be considered
type:: basic:decimal
description:: When generating the histogram, discard any data categories (out of FR, TANDEM, RF) that have fewer than this fraction of overall reads (range: 0 to 0.5).
required:: True
disabled:: False
hidden:: False
default:: 0.05
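The minimum_fraction rule can be read as a filter over per-orientation read counts. A Python sketch, assuming the counts per category are already tallied and that a category sitting exactly at the threshold is kept (the description only says categories with fewer reads are discarded):

```python
def categories_to_plot(read_counts, minimum_fraction=0.05):
    """Return orientation categories (FR, RF, TANDEM) holding at least minimum_fraction of reads."""
    total = sum(read_counts.values())
    if total == 0:
        return []
    return [cat for cat, n in read_counts.items() if n / total >= minimum_fraction]


if __name__ == "__main__":
    # RF (3 %) and TANDEM (2 %) fall below the default 5 % and are dropped.
    print(categories_to_plot({"FR": 9500, "RF": 300, "TANDEM": 200}))  # ['FR']
```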

include_duplicates

label:: Include reads marked as duplicates in the insert size histogram
type:: basic:boolean
required:: True
disabled:: False
hidden:: False
default:: False

deviations

label:: Deviations limit
type:: basic:decimal
description:: Generate mean, standard deviation and plots by trimming the data down to MEDIAN + DEVIATIONS*MEDIAN_ABSOLUTE_DEVIATION. This is done because insert size data typically includes enough anomalous values from chimeras and other artifacts to make the mean and standard deviation grossly misleading regarding the real distribution.
required:: True
disabled:: False
hidden:: False
default:: 10.0
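The deviations rule in the description can be sketched with the Python standard library: compute the median, the median absolute deviation (MAD), drop sizes above MEDIAN + DEVIATIONS * MAD, then summarize what remains. This follows the description only; whether Picard scales the MAD or computes a sample rather than population standard deviation is not assumed here:

```python
import statistics


def trimmed_insert_stats(sizes, deviations=10.0):
    """Mean and SD of insert sizes after dropping values above MEDIAN + DEVIATIONS * MAD."""
    median = statistics.median(sizes)
    mad = statistics.median(abs(s - median) for s in sizes)
    cutoff = median + deviations * mad
    kept = [s for s in sizes if s <= cutoff]
    # Population SD is used here for simplicity; Picard may compute it differently.
    return statistics.mean(kept), statistics.pstdev(kept)


if __name__ == "__main__":
    sizes = [190, 195, 200, 205, 210, 5000]  # one chimeric outlier
    print(statistics.mean(sizes))            # 1000: the outlier drags the raw mean up
    print(trimmed_insert_stats(sizes))       # mean 200 after trimming at 277.5
```

Here the median is 202.5 and the MAD is 7.5, so the cutoff is 277.5 and only the 5000 bp artifact is removed.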

validation_stringency

label:: Validation stringency
type:: basic:string
description:: Validation stringency for all SAM files read by this program. Setting stringency to SILENT can improve performance when processing a BAM file in which variable-length data (read, qualities, tags) do not otherwise need to be decoded. Default is STRICT.
required:: True
disabled:: False
hidden:: False
default:: STRICT
choices::
STRICT: STRICT
LENIENT: LENIENT
SILENT: SILENT

assume_sorted

label:: Sorted BAM file
type:: basic:boolean
description:: If True, the sort order in the header file will be ignored.
required:: True
disabled:: False
hidden:: False
default:: False

report

label:: Insert size metrics
type:: basic:file
required:: True
disabled:: False
hidden:: False

plot

label:: Insert size histogram
type:: basic:file
required:: True
disabled:: False
hidden:: False

species

label:: Species
type:: basic:string
required:: True
disabled:: False
hidden:: False

build

label:: Build
type:: basic:string
required:: True
disabled:: False
hidden:: False

Picard WGS Metrics

data:picard:wgsmetrics: wgs-metrics (data:alignment:bam bam, data:seq:nucleotide genome, basic:integer read_length, basic:boolean create_histogram, basic:integer min_map_quality, basic:integer min_quality, basic:integer coverage_cap, basic:integer accumulation_cap, basic:boolean count_unpaired, basic:integer sample_size, basic:string validation_stringency)[Source: v2.4.0]

Collect metrics about coverage of whole genome sequencing. Tool from Picard, wrapped by GATK4. See GATK CollectWgsMetrics for more information.

bam

label:: Alignment BAM file
type:: data:alignment:bam
required:: True
disabled:: False
hidden:: False

genome

label:: Genome
type:: data:seq:nucleotide
required:: True
disabled:: False
hidden:: False

read_length

label:: Average read length
type:: basic:integer
required:: True
disabled:: False
hidden:: False
default:: 150

create_histogram

label:: Include data for base quality histogram in the metrics file
type:: basic:boolean
required:: True
disabled:: False
hidden:: False
default:: False

options.min_map_quality

label:: Minimum mapping quality for a read to contribute coverage
type:: basic:integer
required:: True
disabled:: False
hidden:: False
default:: 20

options.min_quality

label:: Minimum base quality for a base to contribute coverage
type:: basic:integer
description:: N bases will be treated as having a base quality of negative infinity and will therefore be excluded from coverage regardless of the value of this parameter.
required:: True
disabled:: False
hidden:: False
default:: 20

options.coverage_cap

label:: Maximum coverage cap
type:: basic:integer
description:: Treat positions with coverage exceeding this value as if they had coverage at this set value.
required:: True
disabled:: False
hidden:: False
default:: 250
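Capping limits how much a few extremely deep positions (for example, collapsed repeats) move the summary statistics. A Python sketch of the coverage_cap rule applied to a list of per-position depths; it deliberately ignores accumulation_cap and the base and mapping quality filters:

```python
def capped_mean_coverage(depths, coverage_cap=250):
    """Mean depth after treating any depth above coverage_cap as exactly coverage_cap."""
    if not depths:
        raise ValueError("no positions given")
    capped = [min(d, coverage_cap) for d in depths]
    return sum(capped) / len(capped)


if __name__ == "__main__":
    depths = [10, 30, 1000, 20]          # one collapsed-repeat position
    print(sum(depths) / len(depths))     # 265.0 without the cap
    print(capped_mean_coverage(depths))  # 77.5 with the default cap of 250
```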

options.accumulation_cap

label:: Ignore positions with coverage above this value
type:: basic:integer
description:: At positions with coverage exceeding this value, completely ignore reads that accumulate beyond this value.
required:: True
disabled:: False
hidden:: False
default:: 100000

options.count_unpaired

label:: Count unpaired reads and paired reads with one end unmapped
type:: basic:boolean
required:: True
disabled:: False
hidden:: False
default:: False

options.sample_size

label:: Sample Size used for Theoretical Het Sensitivity sampling
type:: basic:integer
required:: True
disabled:: False
hidden:: False
default:: 10000

options.validation_stringency

label:: Validation stringency
type:: basic:string
description:: Validation stringency for all SAM files read by this program. Setting stringency to SILENT can improve performance when processing a BAM file in which variable-length data (read, qualities, tags) do not otherwise need to be decoded. Default is STRICT.
required:: True
disabled:: False
hidden:: False
default:: STRICT
choices::
STRICT: STRICT
LENIENT: LENIENT
SILENT: SILENT

report

label:: WGS metrics report
type:: basic:file
required:: True
disabled:: False
hidden:: False

species

label:: Species
type:: basic:string
required:: True
disabled:: False
hidden:: False

build

label:: Build
type:: basic:string
required:: True
disabled:: False
hidden:: False

Pre-peakcall QC

data:prepeakqc qc-prepeak (data:alignment:bam alignment, basic:integer q_treshold, basic:integer n_sub, basic:boolean tn5, basic:integer shift)[Source: v0.5.2]

ChIP-Seq and ATAC-Seq QC metrics. The process returns a QC metrics report, a fragment length estimate, and a deduplicated tagAlign file. Both the fragment length estimate and the tagAlign file can be used as inputs to MACS2. The QC report contains the QC metrics proposed by ENCODE 3: [NRF, PBC bottlenecking coefficients](https://www.encodeproject.org/data-standards/terms/), [NSC and RSC](https://genome.ucsc.edu/ENCODE/qualityMetrics.html#chipSeq).
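The library complexity metrics follow the ENCODE definitions: NRF is the number of distinct read locations divided by the total number of reads, PBC1 is the number of locations with exactly one read divided by the number of distinct locations, and PBC2 is the number of locations with exactly one read divided by the number with exactly two. A Python sketch, assuming reads have already been reduced to hashable location keys such as (chrom, start, end, strand); it is not the process's actual implementation:

```python
from collections import Counter


def library_complexity(read_locations):
    """Compute (NRF, PBC1, PBC2) from an iterable of read location keys."""
    reads_per_location = Counter(read_locations)
    total = sum(reads_per_location.values())
    if total == 0:
        raise ValueError("no reads given")
    distinct = len(reads_per_location)
    m1 = sum(1 for n in reads_per_location.values() if n == 1)  # seen exactly once
    m2 = sum(1 for n in reads_per_location.values() if n == 2)  # seen exactly twice
    nrf = distinct / total
    pbc1 = m1 / distinct
    pbc2 = m1 / m2 if m2 else float("inf")
    return nrf, pbc1, pbc2


if __name__ == "__main__":
    reads = (
        [("chr1", 100, 150, "+")]
        + [("chr1", 200, 250, "+")]
        + [("chr1", 300, 350, "-")] * 2
        + [("chr2", 10, 60, "+")] * 3
    )
    print(library_complexity(reads))  # 7 reads at 4 distinct locations
```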

alignment

label:: Aligned reads
type:: data:alignment:bam

q_treshold

label:: Quality filtering threshold
type:: basic:integer
default:: 30

n_sub

label:: Number of reads to subsample
type:: basic:integer
default:: 15000000

tn5

label:: Tn5 shifting
type:: basic:boolean
description:: Tn5 transposon shifting. Shift reads on the “+” strand by 4 bp and reads on the “-” strand by 5 bp.
default:: False
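A common way to apply the Tn5 shift to tagAlign (BED-like) records, as in the ENCODE ATAC-seq pipeline, is to move the start of “+” strand reads 4 bp to the right and the end of “-” strand reads 5 bp to the left. A Python sketch assuming that convention; whether this process edits the coordinates in exactly this way is an assumption:

```python
def tn5_shift(chrom, start, end, strand):
    """Shift one tagAlign interval for Tn5 insertion (ENCODE-style +4/-5 convention)."""
    if strand == "+":
        start += 4  # move the 5' end of a plus-strand read 4 bp toward higher coordinates
    elif strand == "-":
        end -= 5    # move the 5' end of a minus-strand read 5 bp toward lower coordinates
    return chrom, start, end, strand


if __name__ == "__main__":
    print(tn5_shift("chr1", 100, 150, "+"))  # ('chr1', 104, 150, '+')
    print(tn5_shift("chr1", 100, 150, "-"))  # ('chr1', 100, 145, '-')
```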

shift

label:: User-defined cross-correlation peak strandshift
type:: basic:integer
description:: If defined, the SPP tool will not estimate the fragment length and will use the given value instead.
required:: False

chip_qc

label:: QC report
type:: basic:file

tagalign

label:: Filtered tagAlign
type:: basic:file

fraglen

label:: Fragment length
type:: basic:integer

species

label:: Species
type:: basic:string

build

label:: Build
type:: basic:string

Prepare GEO - ChIP-Seq

data:other:geo:chipseq prepare-geo-chipseq (list:data:reads:fastq reads, list:data:chipseq:callpeak macs, basic:string name)[Source: v2.1.3]

Prepare ChIP-seq data for GEO upload.

reads

label:: Reads
type:: list:data:reads:fastq
description:: List of reads objects. Fastq files will be used.

macs

label:: MACS
type:: list:data:chipseq:callpeak
description:: List of MACS2 or MACS14 objects. BedGraph (MACS2) or Wiggle (MACS14) files will be used.

name

label:: Collection name
type:: basic:string

tarball

label:: GEO folder
type:: basic:file

table

label:: Annotation table
type:: basic:file

Prepare GEO - RNA-Seq

data:other:geo:rnaseq prepare-geo-rnaseq (list:data:reads:fastq reads, list:data:expression expressions, basic:string name)[Source: v0.2.3]

Prepare RNA-Seq data for GEO upload.

reads

label:: Reads
type:: list:data:reads:fastq
description:: List of reads objects. Fastq files will be used.

expressions

label:: Expressions
type:: list:data:expression
description:: Cuffnorm data object. Expression table will be used.

name

label:: Collection name
type:: basic:string

tarball

label:: GEO folder
type:: basic:file

table

label:: Annotation table
type:: basic:file

QoRTs QC

data:qorts:qc: qorts-qc (data:alignment:bam alignment, data:annotation:gtf annotation, basic:string stranded, data:index:salmon cdna_index, basic:integer n_reads, basic:integer maxPhredScore, basic:integer adjustPhredScore)[Source: v1.8.0]

QoRTs QC analysis.

alignment

label:: Alignment
type:: data:alignment:bam
required:: True
disabled:: False
hidden:: False

annotation

label:: GTF annotation
type:: data:annotation:gtf
required:: True
disabled:: False
hidden:: False

options.stranded

label:: Assay type
type:: basic:string
required:: True
disabled:: False
hidden:: False
default:: non_specific
choices::
Strand non-specific: non_specific
Strand-specific forward: forward
Strand-specific reverse: reverse
Detect automatically: auto

options.cdna_index

label:: cDNA index file
type:: data:index:salmon
required:: False
disabled:: False
hidden:: options.stranded != 'auto'

options.n_reads

label:: Number of reads in subsampled alignment file
type:: basic:integer
required:: True
disabled:: False
hidden:: options.stranded != 'auto'
default:: 5000000

options.maxPhredScore

label:: Max Phred Score
type:: basic:integer
required:: False
disabled:: False
hidden:: False

options.adjustPhredScore

label:: Adjust Phred Score
type:: basic:integer
required:: False
disabled:: False
hidden:: False

plot

label:: QC multiplot
type:: basic:file
required:: False
disabled:: False
hidden:: False

summary

label:: QC summary
type:: basic:file
required:: True
disabled:: False
hidden:: False

qorts_data

label:: QoRTs report data
type:: basic:file
required:: True
disabled:: False
hidden:: False

QuantSeq workflow

data:workflow:quant:featurecounts: workflow-quantseq (basic:string trimming_tool, data:reads:fastq reads, data:index:star genome, list:data:seq:nucleotide adapters, data:annotation annotation, basic:string assay_type, data:index:star rrna_reference, data:index:star globin_reference, basic:integer quality_cutoff, basic:integer n_reads, basic:integer seed, basic:decimal fraction, basic:boolean two_pass, basic:string quality_encoding_offset, basic:boolean ignore_bad_quality)[Source: v5.1.0]

3’ mRNA-Seq pipeline. Reads are preprocessed by __BBDuk__ or __Cutadapt__, which remove adapters, trim reads for quality from the 3’ end, and discard reads that are too short after trimming. Preprocessed reads are aligned by the __STAR__ aligner. For read-count quantification, the __FeatureCounts__ tool is used. QoRTs QC and Samtools idxstats are used to report alignment QC metrics. QC steps include downsampling, QoRTs QC analysis, and alignment of input reads to the rRNA/globin reference sequences. The reported alignment rate is used to assess the rRNA/globin sequence depletion rate.

trimming_tool

label:: Trimming tool
type:: basic:string
description:: Select the trimming tool. If you select BBDuk, provide adapter sequences in FASTA file(s). If you select Cutadapt, predetermined adapter sequences are removed.
required:: True
disabled:: False
hidden:: False
choices::
BBDuk: bbduk
Cutadapt: cutadapt

reads

label:: Input reads (FASTQ)
type:: data:reads:fastq
description:: Reads in FASTQ file, single or paired end.
required:: True
disabled:: False
hidden:: False

genome

label:: Indexed reference genome
type:: data:index:star
description:: Genome index prepared by STAR aligner indexing tool.
required:: True
disabled:: False
hidden:: False

adapters

label:: Adapters
type:: list:data:seq:nucleotide
description:: Provide a list of sequencing adapter files (.fasta) to be removed by BBDuk.
required:: False
disabled:: False
hidden:: trimming_tool != 'bbduk'

annotation

label:: Annotation
type:: data:annotation
description:: GTF and GFF3 annotation formats are supported.
required:: True
disabled:: False
hidden:: False

assay_type

label:: Assay type
type:: basic:string
description:: In strand-specific forward assay and single reads, the read has to be mapped to the same strand as the feature. For paired-end reads, the first read has to be on the same strand and the second read on the opposite strand. In strand-specific reverse assay these rules are reversed.
required:: False
disabled:: False
hidden:: False
choices::
Strand-specific forward: forward
Strand-specific reverse: reverse
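The strand rules for assay_type reduce to one comparison: in the forward assay a single-end read or first mate must match the feature strand and a second mate must be opposite, and the reverse assay flips that expectation. A Python sketch of the rule as described; the function name is hypothetical and it does not reproduce FeatureCounts internals:

```python
def read_counts_for_feature(read_strand, feature_strand, assay_type, mate=1):
    """Return True if a read may be assigned to a feature under the given assay type.

    read_strand, feature_strand: "+" or "-"
    assay_type: "forward" or "reverse"
    mate: 1 for single-end reads or the first mate of a pair, 2 for the second mate
    """
    if assay_type not in ("forward", "reverse"):
        raise ValueError(f"unsupported assay type: {assay_type!r}")
    if mate not in (1, 2):
        raise ValueError("mate must be 1 or 2")
    expect_same_strand = mate == 1
    if assay_type == "reverse":
        expect_same_strand = not expect_same_strand  # the rules are reversed
    return (read_strand == feature_strand) == expect_same_strand


if __name__ == "__main__":
    print(read_counts_for_feature("+", "+", "forward"))          # True
    print(read_counts_for_feature("+", "+", "forward", mate=2))  # False
    print(read_counts_for_feature("-", "+", "reverse"))          # True
```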

rrna_reference

label:: Indexed rRNA reference sequence
type:: data:index:star
description:: Reference sequence index prepared by STAR aligner indexing tool.
required:: False
disabled:: False
hidden:: False

globin_reference

label:: Indexed Globin reference sequence
type:: data:index:star
description:: Reference sequence index prepared by STAR aligner indexing tool.
required:: False
disabled:: False
hidden:: False

cutadapt.quality_cutoff

label:: Reads quality cutoff
type:: basic:integer
description:: Trim low-quality bases from the 3’ end of each read before adapter removal. Using this option overrides the NextSeq/NovaSeq-specific trim option.
required:: False
disabled:: False
hidden:: False

downsampling.n_reads

label:: Number of reads
type:: basic:integer
description:: Number of reads to include in subsampling.
required:: True
disabled:: False
hidden:: False
default:: 1000000

downsampling.advanced.seed

label:: Seed
type:: basic:integer
description:: Using the same random seed makes reads subsampling reproducible in different environments.
required:: True
disabled:: False
hidden:: False
default:: 11

downsampling.advanced.fraction

label:: Fraction
type:: basic:decimal
description:: Use a fraction of reads [0 - 1.0] from the original input file instead of an absolute number of reads. If set, this overrides the ‘Number of reads’ input parameter.
required:: False
disabled:: False
hidden:: False
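The seed, fraction, and two-pass options match those of `seqtk sample`, which may be the underlying tool (an assumption). The idea behind them can be sketched in Python: a seeded random generator makes the subset reproducible, a fraction keeps each read independently, and a fixed number of reads is drawn in a single pass with reservoir sampling. This illustrates the technique only; it will not pick the same reads as the actual process:

```python
import random


def subsample(records, n_reads=1_000_000, seed=11, fraction=None):
    """Reproducibly subsample records (for example, FASTQ entries).

    If fraction is given, each record is kept independently with that probability
    and n_reads is ignored; otherwise min(n_reads, total) records are kept using
    reservoir sampling.
    """
    rng = random.Random(seed)  # a fixed seed gives the same subset on every run
    if fraction is not None:
        if not 0.0 <= fraction <= 1.0:
            raise ValueError("fraction must be between 0 and 1")
        return [r for r in records if rng.random() < fraction]
    reservoir = []
    for i, record in enumerate(records):
        if i < n_reads:
            reservoir.append(record)
        else:
            j = rng.randint(0, i)  # inclusive on both ends
            if j < n_reads:
                reservoir[j] = record
    return reservoir


if __name__ == "__main__":
    first = subsample(range(1000), n_reads=10, seed=11)
    second = subsample(range(1000), n_reads=10, seed=11)
    print(first == second, len(first))  # True 10
```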

downsampling.advanced.two_pass

label:: 2-pass mode
type:: basic:boolean
description:: Enable two-pass mode when downsampling. Two-pass mode is twice as slow but uses much less memory.
required:: True
disabled:: False
hidden:: False
default:: False

preprocessing.quality_encoding_offset

label:: Quality encoding offset
type:: basic:string
description:: Quality encoding offset for input FASTQ files.
required:: True
disabled:: False
hidden:: False
default:: auto
choices::
Sanger / Illumina 1.8+: 33
Illumina up to 1.3+, 1.5+: 64
Auto: auto
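One common heuristic for telling the two offsets apart looks at the lowest and highest quality characters: anything below ';' (ASCII 59) cannot occur in the +64 encodings, and anything above 'J' (Q41 in Phred+33) suggests +64. A Python sketch of that heuristic; it is not necessarily how the Auto option is implemented, and it returns None when the data fit both encodings:

```python
def guess_phred_offset(quality_strings, max_records=10_000):
    """Guess the FASTQ quality offset (33 or 64), or None if undetermined."""
    lowest, highest = None, None
    for i, qual in enumerate(quality_strings):
        if i >= max_records:
            break
        if not qual:
            continue
        codes = [ord(c) for c in qual]
        lowest = min(codes) if lowest is None else min(lowest, min(codes))
        highest = max(codes) if highest is None else max(highest, max(codes))
    if lowest is None:
        raise ValueError("no quality strings to inspect")
    if lowest < 59:   # below ';' is impossible in the +64 encodings
        return 33
    if highest > 74:  # above 'J' is unusual for Phred+33 data
        return 64
    return None


if __name__ == "__main__":
    print(guess_phred_offset(["IIIIHH##"]))  # 33
    print(guess_phred_offset(["hhhhgggB"]))  # 64
    print(guess_phred_offset(["IIII"]))      # None: fits both encodings
```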

preprocessing.ignore_bad_quality

label:: Ignore bad quality
type:: basic:boolean
description:: Don’t crash if quality values appear to be incorrect.
required:: True
disabled:: False
hidden:: False
default:: False

Quantify shRNA species using bowtie2

data:expression:shrna2quant shrna-quant (data:alignment:bam alignment, basic:integer readlengths, basic:integer alignscores)[Source: v1.4.0]

Calculate the number of mapped species based on `bowtie2` output (.bam file). Input is limited to results from `bowtie2`, since the `YT:Z:` tag used to fetch aligned species is specific to this aligner. The result is a count matrix of successfully mapped reads, where species are in rows and columns contain read specifics (count, species name, sequence, `AS:i:` tag value).

alignment

label:: Alignment
type:: data:alignment:bam
required:: True

readlengths

label:: Species lengths threshold
type:: basic:integer
description:: Species with read lengths below specified threshold will be removed from final output. Default is no removal.

alignscores

label:: Align scores filter threshold
type:: basic:integer
description:: Species with an alignment score below the specified threshold will be removed from the final output. Default is no removal.
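The two filters can be pictured on SAM text records. A simplified Python sketch, assuming that the reference name (RNAME) is the shRNA species, that the unmapped FLAG bit (0x4) stands in for the `YT:Z:` logic the process actually uses, and that readlengths and alignscores are applied per read; all function names are hypothetical:

```python
from collections import Counter


def parse_optional_tags(fields):
    """Parse SAM optional fields (TAG:TYPE:VALUE) into a dict; integer tags become int."""
    tags = {}
    for field in fields:
        tag, tag_type, value = field.split(":", 2)
        tags[tag] = int(value) if tag_type == "i" else value
    return tags


def count_species(sam_lines, readlengths=None, alignscores=None):
    """Count mapped reads per reference name, with optional length and AS:i filters."""
    counts = Counter()
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank and header lines
        cols = line.rstrip("\n").split("\t")
        flag, species, seq = int(cols[1]), cols[2], cols[9]
        if flag & 0x4:  # FLAG bit 0x4: read unmapped
            continue
        tags = parse_optional_tags(cols[11:])
        if readlengths is not None and len(seq) < readlengths:
            continue
        if alignscores is not None and tags.get("AS", float("-inf")) < alignscores:
            continue
        counts[species] += 1
    return counts
```

With both thresholds left unset, every mapped read is counted, which matches the documented default of no removal.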

exp

label:: Normalized expression
type:: basic:file

rc

label:: Read counts
type:: basic:file
required:: False

exp_json

label:: Expression (json)
type:: basic:json

exp_type

label:: Expression type
type:: basic:string

source

label:: Gene ID source
type:: basic:string

species

label:: Species
type:: basic:string

build

label:: Build
type:: basic:string

feature_type

label:: Feature type
type:: basic:string

mapped_species

label:: Mapped species
type:: basic:file

RNA-SeQC

data:rnaseqc:qc: rnaseqc-qc (data:alignment:bam alignment, data:annotation:gtf annotation, basic:integer mapping_quality, basic:integer base_mismatch, basic:integer offset, basic:integer window_size, basic:integer gene_length, basic:integer detection_threshold, basic:boolean exclude_chimeric, basic:string stranded, data:index:salmon cdna_index, basic:integer n_reads)[Source: v2.0.0]

RNA-SeQC QC analysis. An efficient new version of RNA-SeQC that computes a comprehensive set of metrics for characterizing samples processed by a wide range of protocols. It also quantifies gene- and exon-level expression, enabling effective quality control of large-scale RNA-seq datasets. More information can be found in the [GitHub repository](https://github.com/getzlab/rnaseqc) and in the [original paper](https://academic.oup.com/bioinformatics/article/37/18/3048/6156810?login=false).

alignment

label:: Input aligned reads (BAM file)
type:: data:alignment:bam
required:: True
disabled:: False
hidden:: False

annotation

label:: Annotation file (GTF)
type:: data:annotation:gtf
description:: The input GTF file containing the features to check the BAM file against. The file should include gene_id in the attributes column for all entries. During the process, the file is formatted so that the transcript_id matches the gene_id. Exons are merged to remove overlaps, and the exon_id field is then matched with the gene_id including the consecutive exon number.
required:: True
disabled:: False
hidden:: False

rnaseqc_options.mapping_quality

label:: Mapping quality [--mapping-quality]
type:: basic:integer
description:: Set the lower bound on read quality for exon coverage counting. Reads below this number are excluded from coverage metrics.
required:: True
disabled:: False
hidden:: False
default:: 255

rnaseqc_options.base_mismatch

label:: Base mismatch [--base-mismatch]
type:: basic:integer
description:: Set the maximum number of allowed mismatches between a read and the reference sequence. Reads with more than this number of mismatches are excluded from coverage metrics.
required:: True
disabled:: False
hidden:: False
default:: 6

rnaseqc_options.offset

label:: Offset [--offset]
type:: basic:integer
description:: Set the offset into the gene for the 3’ and 5’ windows in bias calculation. A positive value shifts the 3’ and 5’ windows towards each other, while a negative value shifts them apart.
required:: True
disabled:: False
hidden:: False
default:: 150

rnaseqc_options.window_size

label:: Window size [--window-size]
type:: basic:integer
description:: Set the size of the 3’ and 5’ windows in bias calculation.
required:: True
disabled:: False
hidden:: False
default:: 100

rnaseqc_options.gene_length

label:: Gene length [--gene-length]
type:: basic:integer
description:: Set the minimum size of a gene for bias calculation. Genes below this size are ignored in the calculation.
required:: True
disabled:: False
hidden:: False
default:: 600
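The offset, window size, and minimum gene length describe where the 5’ and 3’ windows sit for the bias calculation. A Python sketch of one reading of that geometry, assuming 0-based half-open coordinates measured along the merged exonic sequence of the gene; RNA-SeQC’s own implementation may place the windows differently:

```python
def bias_windows(gene_length, offset=150, window_size=100, min_gene_length=600):
    """Return ((5' start, 5' end), (3' start, 3' end)) in gene coordinates,
    or None if the gene is too short to be used in bias calculation."""
    if gene_length < min_gene_length:
        return None
    five_prime = (offset, offset + window_size)
    three_prime = (gene_length - offset - window_size, gene_length - offset)
    return five_prime, three_prime


if __name__ == "__main__":
    print(bias_windows(1000))  # ((150, 250), (750, 850))
    print(bias_windows(500))   # None: shorter than the 600 bp default
```

A larger positive offset moves both windows toward the middle of the gene, as the offset description states; this simplified sketch does not clip windows that a negative offset would push past the gene ends.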

rnaseqc_options.detection_threshold

label:: Detection threshold [--detection-threshold]
type:: basic:integer
description:: Number of counts on a gene to consider the gene ‘detected’. Additionally, genes below this limit are excluded from 3’ bias computation.
required:: True
disabled:: False
hidden:: False
default:: 5

rnaseqc_options.exclude_chimeric

label:: Exclude chimeric reads [--exclude-chimeric]
type:: basic:boolean
description:: Exclude chimeric reads from the read counts.
required:: True
disabled:: False
hidden:: False
default:: False

strand_detection_options.stranded

label:: Assay type [--stranded]
type:: basic:string
required:: True
disabled:: False
hidden:: False
default:: non_specific
choices::
Strand non-specific: non_specific
Strand-specific reverse then forward: reverse
Strand-specific forward then reverse: forward
Detect automatically: auto

strand_detection_options.cdna_index

label:: cDNA index file
type:: data:index:salmon
required:: False
disabled:: False
hidden:: strand_detection_options.stranded != 'auto'

strand_detection_options.n_reads

label:: Number of reads in subsampled alignment file. Subsampled reads will be used in strandedness detection
type:: basic:integer
required:: True
disabled:: False
hidden:: strand_detection_options.stranded != 'auto'
default:: 5000000

metrics

label:: metrics
type:: basic:file
required:: True
disabled:: False
hidden:: False

RNA-Seq (Cuffquant)

data:workflow:rnaseq:cuffquant workflow-rnaseq-cuffquant (data:reads:fastq reads, data:index:hisat2 genome, data:annotation annotation)[Source: v2.1.0]

reads

label:: Input reads
type:: data:reads:fastq

genome

label:: genome
type:: data:index:hisat2

annotation

label:: Annotation file
type:: data:annotation

RNA-seq Variant Calling Workflow

data:workflow:rnaseq:variants: workflow-rnaseq-variantcalling (data:alignment:bam:star bam, data:reads:fastq reads, basic:boolean preprocessing, data:seq:nucleotide ref_seq, data:index:star genome, data:variants:vcf dbsnp, list:data:variants:vcf indels, data:bed intervals, data:variants:vcf clinvar, data:geneset geneset, list:basic:string mutations, list:data:seq:nucleotide adapters, list:basic:string custom_adapter_sequences, basic:integer kmer_length, basic:integer min_k, basic:integer hamming_distance, basic:integer maxns, basic:integer trim_quality, basic:integer min_length, basic:string quality_encoding_offset, basic:boolean ignore_bad_quality, basic:boolean two_pass_mode, basic:boolean out_unmapped, basic:string align_end_alignment, basic:string read_group, basic:integer stand_call_conf, basic:boolean soft_clipped, basic:integer interval_padding, list:basic:string filter_expressions, list:basic:string filter_name, list:basic:string genotype_filter_expressions, list:basic:string genotype_filter_name, data:variants:vcf mask, basic:string mask_name, basic:string filtering_options, list:basic:string vcf_fields, list:basic:string ann_fields, basic:boolean split_alleles, basic:boolean show_filtered, list:basic:string gf_fields, basic:boolean multiqc, basic:integer java_gc_threads, basic:integer max_heap_size)[Source: v2.4.0]

Identify variants in RNA-seq data. This pipeline follows the GATK best practices recommendations for variant calling with RNA-seq data. The pipeline steps include read alignment (STAR), data cleanup (MarkDuplicates), splitting reads that contain Ns in their CIGAR string (SplitNCigarReads), base quality recalibration (BaseRecalibrator, ApplyBQSR), variant calling (HaplotypeCaller), variant filtering (VariantFiltration) and variant annotation (SnpEff). The last step of the pipeline is the Mutations table process, which prepares variants for ReSDK VariantTables. The pipeline can also be run directly from a BAM file. In this case, it is recommended that the BAM file was produced by STAR in two-pass mode with the ‘--outSAMunmapped Within’ option turned on.

bam

label:: Input BAM file
type:: data:alignment:bam:star
description:: Input BAM file that was computed with the STAR aligner. If you use a BAM file as input, it is highly recommended that the alignment was run in two-pass mode with the ‘--outSAMunmapped Within’ option.
required:: False
disabled:: reads
hidden:: False

reads

label:: Input sample (FASTQ)
type:: data:reads:fastq
description:: Input data in FASTQ format.
required:: False
disabled:: bam
hidden:: False

preprocessing

label:: Perform reads processing with BBDuk
type:: basic:boolean
description:: If your reads have not been processed, set this to True.
required:: True
disabled:: bam
hidden:: False
default:: True

ref_seq

label:: Reference FASTA sequence
type:: data:seq:nucleotide
required:: True
disabled:: False
hidden:: False

genome

label:: Indexed reference genome
type:: data:index:star
description:: Genome index prepared by STAR aligner indexing tool.
required:: False
disabled:: bam
hidden:: False

dbsnp

label:: dbSNP file
type:: data:variants:vcf
description:: File with known variants.
required:: True
disabled:: False
hidden:: False

indels

label:: Known INDEL sites
type:: list:data:variants:vcf
required:: False
disabled:: False
hidden:: False

intervals

label:: Intervals (from BED file)
type:: data:bed
description:: Use this option to perform the analysis over only part of the genome.
required:: False
disabled:: False
hidden:: False

clinvar

label:: ClinVar VCF file
type:: data:variants:vcf
description:: [ClinVar](https://www.ncbi.nlm.nih.gov/clinvar/) is a freely available, public archive of human genetic variants and interpretations of their significance to disease.
required:: False
disabled:: False
hidden:: False

geneset

label:: Gene set
type:: data:geneset
description:: Select a gene set with genes you are interested in. Only variants of genes in the selected gene set will be in the output.
required:: False
disabled:: mutations
hidden:: False

mutations

label:: Gene and its mutations
type:: list:basic:string
description:: Insert the gene you are interested in, together with mutations. First enter the name of the gene and then the mutations. Separate the gene from the mutations with ‘:’ and the mutations from each other with ‘,’. Example of an input: ‘KRAS: Gly12, Gly61’. Press Enter after each input (gene + mutations). NOTE: This field only accepts three-letter amino acid symbols. If you use this option, the selected gene set will not be used for the Mutations table process.
required:: False
disabled:: geneset
hidden:: False
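The input format for the mutations field (‘GENE: Mut1, Mut2’, three-letter amino acid symbols) can be checked before submission. A hypothetical Python validator, assuming the accepted symbols are the 20 standard amino acids and that each mutation starts with a reference residue followed by a position; the platform’s own validation may be stricter or looser:

```python
import re

# The 20 standard amino acids in three-letter form (assumed accepted set).
THREE_LETTER_AA = {
    "Ala", "Arg", "Asn", "Asp", "Cys", "Gln", "Glu", "Gly", "His", "Ile",
    "Leu", "Lys", "Met", "Phe", "Pro", "Ser", "Thr", "Trp", "Tyr", "Val",
}
# Reference residue followed by a position, e.g. "Gly12"; trailing text is allowed.
MUTATION_RE = re.compile(r"([A-Z][a-z]{2})(\d+)")


def parse_gene_mutations(entry):
    """Split one 'GENE: Mut1, Mut2' entry into (gene, [mutations])."""
    gene, sep, rest = entry.partition(":")
    gene = gene.strip()
    if not sep or not gene:
        raise ValueError(f"expected 'GENE: mutations', got {entry!r}")
    mutations = [m.strip() for m in rest.split(",") if m.strip()]
    for mutation in mutations:
        match = MUTATION_RE.match(mutation)
        if match is None or match.group(1) not in THREE_LETTER_AA:
            raise ValueError(f"not a three-letter amino acid mutation: {mutation!r}")
    return gene, mutations


if __name__ == "__main__":
    print(parse_gene_mutations("KRAS: Gly12, Gly61"))  # ('KRAS', ['Gly12', 'Gly61'])
```

A one-letter form such as ‘KRAS: G12’ is rejected, in line with the note that only three-letter symbols are accepted.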

bbduk.adapters

label:: Adapters
type:: list:data:seq:nucleotide
description:: Provide a list of sequencing adapter files (.fasta) to be removed by BBDuk.
required:: False
disabled:: False
hidden:: False

bbduk.custom_adapter_sequences

label:: Custom adapter sequences
type:: list:basic:string
description:: Custom adapter sequences can be specified by inputting them one by one and pressing Enter after each sequence.
required:: False
disabled:: False
hidden:: False
default:: []

bbduk.kmer_length

label:: K-mer length [k=]
type:: basic:integer
description:: Kmer length used for finding contaminants. Contaminants shorter than kmer length will not be found. Kmer length must be at least 1.
required:: True
disabled:: False
hidden:: False
default:: 23

bbduk.min_k

label:: Minimum k-mer length at right end of reads used for trimming [mink=]
type:: basic:integer
required:: True
disabled:: bbduk.adapters.length === 0 && bbduk.custom_adapter_sequences.length === 0
hidden:: False
default:: 11

bbduk.hamming_distance

label:: Maximum Hamming distance for k-mers [hammingdistance=]
type:: basic:integer
description:: Hamming distance, i.e. the number of mismatches allowed in the k-mer.
required:: True
disabled:: False
hidden:: False
default:: 1
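The interplay of `kmer_length` and `hamming_distance` can be illustrated with a brute-force sketch: a read is flagged when any of its k-mers is within the allowed number of mismatches of a contaminant k-mer. This is not how BBDuk is implemented (it uses hashed k-mer tables and, by default, also matches reverse complements); the function names and the adapter string below are illustrative.

```python
def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs equal-length strings")
    return sum(x != y for x, y in zip(a, b))


def kmers(seq: str, k: int) -> set[str]:
    """All k-mers of seq (empty if seq is shorter than k)."""
    if k < 1:
        raise ValueError("k-mer length must be at least 1")
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def matches_contaminant(read: str, contaminant: str, k: int, max_mismatches: int) -> bool:
    """True if some read k-mer is within max_mismatches of some contaminant k-mer."""
    reference = kmers(contaminant, k)
    return any(
        hamming(query, ref) <= max_mismatches
        for query in kmers(read, k)
        for ref in reference
    )


adapter = "AGATCGGAAGAGC"
read = "TTTT" + "AGATCGGTAGAGC" + "TTTT"  # adapter copy with one substitution
print(matches_contaminant(read, adapter, k=11, max_mismatches=0))  # False
print(matches_contaminant(read, adapter, k=11, max_mismatches=1))  # True
```

Note that a contaminant shorter than `k` yields no k-mers, so it can never be found, as the description above states.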

bbduk.maxns

label:: Max Ns after trimming [maxns=]
type:: basic:integer
description:: If non-negative, reads with more Ns than this (after trimming) will be discarded.
required:: True
disabled:: False
hidden:: False
default:: -1

bbduk.trim_quality

label:: Average quality below which to trim region [trimq=]
type:: basic:integer
description:: The Phred algorithm is used, which is more accurate than naive trimming.
required:: True
disabled:: False
hidden:: False
default:: 28

bbduk.min_length

label:: Minimum read length [minlength=]
type:: basic:integer
description:: Reads shorter than minimum read length after trimming are discarded.
required:: True
disabled:: False
hidden:: False
default:: 30
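The trimming and filtering parameters above (`maxns`, `trimq`, `minlength`) act on each read in turn. The sketch below uses the widely used BWA-style running-sum algorithm for 3’ quality trimming as a stand-in; BBDuk’s own trimming (including which read ends it trims) may differ in detail, and the function names are hypothetical.

```python
def quality_trim_3prime(quals: list[int], cutoff: int) -> int:
    """Return how many leading bases to keep after BWA-style 3' quality trimming.

    Walking from the 3' end, accumulate (cutoff - quality); the read is cut
    where this running sum peaks, i.e. where the low-quality tail ends.
    """
    running, best, keep = 0, 0, len(quals)
    for i in range(len(quals) - 1, -1, -1):
        running += cutoff - quals[i]
        if running < 0:
            break
        if running > best:
            best, keep = running, i
    return keep


def keep_read(seq: str, min_length: int = 30, maxns: int = -1) -> bool:
    """Post-trimming filters: minimum length and, if maxns >= 0, maximum N count."""
    if len(seq) < min_length:
        return False
    if maxns >= 0 and seq.upper().count("N") > maxns:
        return False
    return True


seq = "ACGT" * 9 + "ACGTA"    # 41 bases
quals = [35] * 36 + [12] * 5  # low-quality 3' tail
trimmed = seq[:quality_trim_3prime(quals, cutoff=28)]
print(len(trimmed), keep_read(trimmed))  # 36 True
```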

bbduk.quality_encoding_offset

label:: Quality encoding offset
type:: basic:string
description:: Quality encoding offset for input FASTQ files.
required:: True
disabled:: False
hidden:: False
default:: auto
choices::
Sanger / Illumina 1.8+: 33
Illumina up to 1.3+, 1.5+: 64
Auto: auto
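The offset only changes how quality characters map to Phred scores: with offset 33 the character ‘I’ (ASCII 73) encodes Q40, while with offset 64 the same score is written as ‘h’. One plausible way to implement the `auto` choice is a heuristic on the lowest character seen, sketched below; the detection rule is an assumption, not necessarily what the process does.

```python
def decode_qualities(qual: str, offset: int) -> list[int]:
    """Convert a FASTQ quality string into Phred scores."""
    return [ord(ch) - offset for ch in qual]


def guess_offset(quality_strings: list[str], default: int = 33) -> int:
    """Heuristic sketch: infer the offset from the lowest quality character seen."""
    lowest = min((min(q) for q in quality_strings if q), default=None)
    if lowest is None:
        return default
    if ord(lowest) < 59:  # below ';': impossible for Phred+64 (and Solexa+64) data
        return 33
    if ord(lowest) >= 64:  # '@' or higher: consistent with Phred+64, but could be very high-quality Phred+33
        return 64
    return default  # ambiguous range ';' to '?'


print(decode_qualities("II#", 33), guess_offset(["II#"]))  # [40, 40, 2] 33
print(decode_qualities("hhB", 64), guess_offset(["hhB"]))  # [40, 40, 2] 64
```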

bbduk.ignore_bad_quality

label:: Ignore bad quality
type:: basic:boolean
description:: Don’t crash if quality values appear to be incorrect.
required:: True
disabled:: False
hidden:: False
default:: False

alignment.two_pass_mode

label:: Use two pass mode [--twopassMode]
type:: basic:boolean
description:: Use two-pass mapping instead of first-pass only. In two-pass mode, STAR first performs first-pass mapping, extracts junctions, inserts them into the genome index, and re-maps all reads in the second mapping pass.
required:: True
disabled:: False
hidden:: False
default:: True

alignment.out_unmapped

label:: Output unmapped reads (SAM) [--outSAMunmapped Within]
type:: basic:boolean
description:: Output of unmapped reads in the SAM format.
required:: True
disabled:: False
hidden:: False
default:: True

alignment.align_end_alignment

label:: Read ends alignment [--alignEndsType]
type:: basic:string
description:: Type of read ends alignment (default: Local). Local: standard local alignment with soft-clipping allowed. EndToEnd: force end-to-end read alignment, do not soft-clip. Extend5pOfRead1: fully extend only the 5’ end of read1; all other ends use local alignment. Extend5pOfReads12: fully extend only the 5’ ends of both read1 and read2; all other ends use local alignment.
required:: True
disabled:: False
hidden:: False
default:: Local
choices::
Local: Local
EndToEnd: EndToEnd
Extend5pOfRead1: Extend5pOfRead1
Extend5pOfReads12: Extend5pOfReads12

bam_processing.read_group

label:: Replace read groups in BAM
type:: basic:string
description:: Replace read groups in a BAM file. This argument enables the user to replace all read groups in the INPUT file with a single new read group and assign all reads to this read group in the OUTPUT BAM file. Addition or replacement is performed using Picard’s AddOrReplaceReadGroups tool. Input should take the form of -name=value delimited by a “;”, e.g. “-ID=1;-LB=GENIALIS;-PL=ILLUMINA;-PU=BARCODE;-SM=SAMPLENAME1”. See the tool’s documentation for more information on tag names. Note that PL, LB, PU and SM are required fields. See the caveats of rewriting read groups in the documentation.
required:: True
disabled:: False
hidden:: False
default:: -ID=1;-LB=GENIALIS;-PL=ILLUMINA;-PU=BARCODE;-SM=SAMPLENAME1
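The read-group string is a `;`-separated list of `-NAME=value` pairs. A sketch of how it can be parsed, checked for the required PL, LB, PU and SM fields, and rendered as a SAM `@RG` header line; the helper names are hypothetical, and how the process actually passes the values to Picard is not shown.

```python
REQUIRED_TAGS = ("PL", "LB", "PU", "SM")


def parse_read_group(spec: str) -> dict[str, str]:
    """Parse '-ID=1;-LB=...;-SM=...' into {'ID': '1', 'LB': ..., ...}."""
    tags: dict[str, str] = {}
    for item in spec.split(";"):
        item = item.strip()
        if not item:
            continue
        if not item.startswith("-") or "=" not in item:
            raise ValueError(f"expected '-NAME=value', got {item!r}")
        name, value = item[1:].split("=", 1)
        tags[name] = value
    missing = [tag for tag in REQUIRED_TAGS if tag not in tags]
    if missing:
        raise ValueError(f"missing required read-group fields: {', '.join(missing)}")
    return tags


def rg_header_line(tags: dict[str, str]) -> str:
    """Render tags as a tab-separated SAM @RG header line, in input order."""
    return "\t".join(["@RG"] + [f"{name}:{value}" for name, value in tags.items()])


tags = parse_read_group("-ID=1;-LB=GENIALIS;-PL=ILLUMINA;-PU=BARCODE;-SM=SAMPLENAME1")
print(rg_header_line(tags).replace("\t", " | "))
# @RG | ID:1 | LB:GENIALIS | PL:ILLUMINA | PU:BARCODE | SM:SAMPLENAME1
```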

haplotype_caller.stand_call_conf

label:: Min call confidence threshold
type:: basic:integer
description:: The minimum phred-scaled confidence threshold at which variants should be called.
required:: True
disabled:: False
hidden:: False
default:: 20

haplotype_caller.soft_clipped

label:: Do not analyze soft clipped bases in the reads
type:: basic:boolean
description:: Suitable option for RNA-seq variant calling.
required:: True
disabled:: False
hidden:: False
default:: True

haplotype_caller.interval_padding

label:: Interval padding
type:: basic:integer
description:: Amount of padding (in bp) to add to each interval you are including. The recommended value is 100. Set to 0 if you want to turn it off.
required:: True
disabled:: False
hidden:: !intervals
default:: 100
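Interval padding widens each target interval on both sides. A minimal sketch using 0-based, half-open coordinates (an assumption), clamping at the chromosome start and, if sizes are known, at the chromosome end; merging of intervals that overlap after padding, which GATK may perform, is not shown.

```python
def pad_intervals(intervals, padding=100, chrom_sizes=None):
    """Widen (chrom, start, end) intervals by `padding` bp on each side."""
    if padding < 0:
        raise ValueError("padding must be non-negative")
    padded = []
    for chrom, start, end in intervals:
        new_start = max(0, start - padding)
        new_end = end + padding
        if chrom_sizes and chrom in chrom_sizes:
            new_end = min(new_end, chrom_sizes[chrom])
        padded.append((chrom, new_start, new_end))
    return padded


print(pad_intervals([("chr1", 50, 200), ("chr2", 1000, 1100)]))
# [('chr1', 0, 300), ('chr2', 900, 1200)]
```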

variant_filtration.filter_expressions

label:: Expressions used with INFO fields to filter
type:: list:basic:string
description:: VariantFiltration accepts any number of JEXL expressions (so you can have two named filters by using --filter-name One --filter-expression ‘X < 1’ --filter-name Two --filter-expression ‘X > 2’). It is preferable to use multiple expressions, each specifying an individual filter criterion, rather than a single compound expression that specifies multiple filter criteria. Input expressions one by one and press ENTER after each expression. Examples of filter expressions: ‘FS > 30’, ‘DP > 10’.
required:: True
disabled:: False
hidden:: False
default:: ['FS > 30.0', 'QD < 2.0']

variant_filtration.filter_name

label:: Names to use for the list of filters
type:: list:basic:string
description:: This name is put in the FILTER field for variants that get filtered. Note that there must be a 1-to-1 mapping between filter expressions and filter names. Input names one by one and press ENTER after each name. Warning: filter names should be in the same order as filter expressions. Example: if you specified filter expressions ‘FS > 30’ and ‘DP > 10’, specify filter names ‘FS’ and ‘DP’.
required:: True
disabled:: False
hidden:: False
default:: ['FS', 'QD']

variant_filtration.genotype_filter_expressions

label:: Expressions used with FORMAT field to filter
type:: list:basic:string
description:: Similar to the INFO field based expressions, but used on the FORMAT (genotype) fields instead. VariantFiltration will add the sample-level FT tag to the FORMAT field of filtered samples (this does not affect the record’s FILTER tag). One can filter normally based on most fields (e.g. ‘GQ < 5.0’), but the GT (genotype) field is an exception. We have put in convenience methods so that one can now filter out hets (‘isHet == 1’), refs (‘isHomRef == 1’), or homs (‘isHomVar == 1’). Also available are expressions isCalled, isNoCall, isMixed, and isAvailable, in accordance with the methods of the Genotype object. To filter by alternative allele depth, use the expression: ‘AD.1 < 5’. This filter expression will filter all the samples in the multi-sample VCF file.
required:: True
disabled:: False
hidden:: False
default:: ['AD.1 < 5.0']
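In the default genotype filter ‘AD.1 < 5.0’, `AD` holds the read depth for the REF allele followed by each ALT allele, so `AD.1` is the depth of the first ALT allele. A plain-Python sketch of that one check on a single sample’s FORMAT values; this is not GATK’s JEXL engine, and treating a missing AD as unfiltered is an assumption.

```python
def fails_alt_depth(sample: dict[str, str], min_alt_depth: int = 5) -> bool:
    """Mimic 'AD.1 < 5.0': True when the first ALT allele has too few supporting reads."""
    ad = sample.get("AD", ".")
    if ad in ("", "."):
        return False  # assumption: samples without AD are left unfiltered
    depths = ad.split(",")  # [REF depth, ALT1 depth, ALT2 depth, ...]
    if len(depths) < 2 or depths[1] == ".":
        return False
    return int(depths[1]) < min_alt_depth


print(fails_alt_depth({"GT": "0/1", "AD": "10,3"}))  # True: only 3 ALT reads
print(fails_alt_depth({"GT": "0/1", "AD": "10,7"}))  # False
```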

variant_filtration.genotype_filter_name

label:: Names to use for the list of genotype filters
type:: list:basic:string
description:: Names for the genotype (FORMAT field) filter expressions, analogous to the filter names used for INFO field expressions. Warning: filter names should be in the same order as filter expressions.
required:: True
disabled:: False
hidden:: False
default:: ['AD']

variant_filtration.mask

label:: Input mask
type:: data:variants:vcf
description:: Any variant which overlaps entries from the provided mask file will be filtered.
required:: False
disabled:: False
hidden:: False

variant_filtration.mask_name

label:: The text to put in the FILTER field if a ‘mask’ is provided
type:: basic:string
description:: When using the mask file, the mask name will be annotated in the variant record.
required:: False
disabled:: !variant_filtration.mask
hidden:: False
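Filter expressions and names are paired by position, which is why the two lists must have the same length and order. A sketch of how the inputs above could be turned into GATK 4 `VariantFiltration` arguments; the long option names (`--filter-expression`, `--filter-name`, `--genotype-filter-expression`, `--genotype-filter-name`, `--mask`, `--mask-name`) are those documented for GATK 4, but how this process assembles its command line is an assumption, and `mask.vcf` is a placeholder.

```python
def variant_filtration_args(expressions, names, gt_expressions=(), gt_names=(),
                            mask=None, mask_name=None):
    """Build VariantFiltration arguments, pairing each expression with its name."""
    if len(expressions) != len(names):
        raise ValueError("each filter expression needs exactly one filter name")
    if len(gt_expressions) != len(gt_names):
        raise ValueError("each genotype filter expression needs exactly one name")
    args = []
    for name, expression in zip(names, expressions):
        args += ["--filter-name", name, "--filter-expression", expression]
    for name, expression in zip(gt_names, gt_expressions):
        args += ["--genotype-filter-name", name, "--genotype-filter-expression", expression]
    if mask:
        args += ["--mask", mask]
        if mask_name:
            args += ["--mask-name", mask_name]
    return args


args = variant_filtration_args(["FS > 30.0", "QD < 2.0"], ["FS", "QD"],
                               ["AD.1 < 5.0"], ["AD"])
print(args[:4])  # ['--filter-name', 'FS', '--filter-expression', 'FS > 30.0']
```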

snpeff.filtering_options

label:: SnpEff filtering expressions
type:: basic:string
description:: Filter the annotated VCF file using arbitrary expressions. Examples of filtering expressions: ‘(ANN[*].GENE = ‘PSD3’)’ or ‘( REF = ‘A’ )’ or ‘(countHom() > 3) | (( exists INDEL ) & (QUAL >= 20)) | (QUAL >= 30 )’. For more information, check out the official documentation of [SnpSift](https://pcingola.github.io/SnpEff/ss_filter/).
required:: False
disabled:: False
hidden:: False

mutations_table.vcf_fields

label:: Select VCF fields
type:: list:basic:string
description:: The name of a standard VCF field or an INFO field to include in the output table. The field can be any standard VCF column (e.g. CHROM, ID, QUAL) or any annotation name in the INFO field (e.g. AC, AF). Required fields are CHROM, POS, ID, REF and ANN. If your variants file was annotated with ClinVar information, the fields CLNDN, CLNSIG and CLNSIGCONF might be of interest.
required:: True
disabled:: False
hidden:: False
default:: ['CHROM', 'POS', 'ID', 'QUAL', 'REF', 'ALT', 'FILTER', 'ANN', 'CLNDN', 'CLNSIG']

mutations_table.ann_fields

label:: ANN fields to use
type:: list:basic:string
description:: Only use specific fields from the SnpEff ANN field. All available fields: Allele | Annotation | Annotation_Impact | Gene_Name | Gene_ID | Feature_Type | Feature_ID | Transcript_BioType | Rank | HGVS.c | HGVS.p | cDNA.pos / cDNA.length | CDS.pos / CDS.length | AA.pos / AA.length | Distance | ERRORS / WARNINGS / INFO. Fields are separated by ‘|’. For more information, follow this [link](https://pcingola.github.io/SnpEff/se_inputoutput/#ann-field-vcf-output-files).
required:: True
disabled:: False
hidden:: False
default:: ['Allele', 'Annotation', 'Annotation_Impact', 'Gene_Name', 'Feature_ID', 'HGVS.p']
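Each ANN value can hold several comma-separated annotations, and each annotation is a `|`-separated list in the fixed field order given above. A sketch of selecting the requested fields; the helper name and the example annotation are made up for illustration.

```python
ANN_FIELDS = [
    "Allele", "Annotation", "Annotation_Impact", "Gene_Name", "Gene_ID",
    "Feature_Type", "Feature_ID", "Transcript_BioType", "Rank", "HGVS.c",
    "HGVS.p", "cDNA.pos / cDNA.length", "CDS.pos / CDS.length",
    "AA.pos / AA.length", "Distance", "ERRORS / WARNINGS / INFO",
]


def select_ann_fields(ann_value: str, wanted: list[str]) -> list[dict[str, str]]:
    """Return one dict per annotation in an ANN value, keeping only `wanted` fields."""
    unknown = [field for field in wanted if field not in ANN_FIELDS]
    if unknown:
        raise ValueError(f"unknown ANN fields: {unknown}")
    rows = []
    for annotation in ann_value.split(","):
        record = dict(zip(ANN_FIELDS, annotation.split("|")))
        rows.append({field: record.get(field, "") for field in wanted})
    return rows


# Made-up single annotation with all 16 fields (the last two are empty).
example = ("A|missense_variant|MODERATE|GENE1|GENEID1|transcript|TX1|protein_coding|"
           "2/6|c.35G>A|p.Gly12Asp|35/1000|35/570|12/189||")
print(select_ann_fields(example, ["Gene_Name", "HGVS.p"]))
# [{'Gene_Name': 'GENE1', 'HGVS.p': 'p.Gly12Asp'}]
```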

mutations_table.split_alleles

label:: Split multi-allelic records into multiple lines
type:: basic:boolean
description:: By default, a variant record with multiple ALT alleles is summarized in one line, with per-ALT-allele fields (e.g. allele depth) separated by commas. This may cause difficulty when the table is loaded by an R script, for example. Use this flag to write multi-allelic records on separate lines of output.
required:: True
disabled:: False
hidden:: False
default:: True
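What splitting a multi-allelic record means can be shown on a single record: each ALT allele gets its own line, per-ALT values (VCF `Number=A` fields such as AF) are indexed, and per-allele values (`Number=R` fields such as AD, which start with the REF depth) keep the REF entry plus the matching ALT entry. This is a sketch of the idea, not a reproduction of GATK VariantsToTable’s exact output.

```python
def split_multiallelic(record: dict[str, str], a_fields=("AF",), r_fields=("AD",)) -> list[dict[str, str]]:
    """Return one record per ALT allele.

    a_fields: one value per ALT allele (VCF Number=A).
    r_fields: REF value followed by one value per ALT allele (VCF Number=R).
    """
    alts = record["ALT"].split(",")
    rows = []
    for i, alt in enumerate(alts):
        row = dict(record, ALT=alt)
        for field in a_fields:
            if field in record:
                row[field] = record[field].split(",")[i]
        for field in r_fields:
            if field in record:
                values = record[field].split(",")
                row[field] = f"{values[0]},{values[i + 1]}"
        rows.append(row)
    return rows


record = {"CHROM": "chr1", "POS": "100", "REF": "A", "ALT": "T,G", "AF": "0.2,0.1", "AD": "10,3,2"}
for row in split_multiallelic(record):
    print(row["ALT"], row["AF"], row["AD"])
# T 0.2 10,3
# G 0.1 10,2
```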

mutations_table.show_filtered

label:: Include filtered records in the output
type:: basic:boolean
description:: Include filtered records in the output of the GATK VariantsToTable.
required:: True
disabled:: False
hidden:: False
default:: True

mutations_table.gf_fields

label:: Include FORMAT/sample-level fields. Note: If you specify DP from genotype field, it will overwrite the original DP field. By default fields GT (genotype), AD (allele depth), DP (depth at the sample level), FT (sample-level filter) are included in the analysis.
type:: list:basic:string
required:: True
disabled:: False
hidden:: False
default:: ['GT', 'AD', 'DP', 'FT']

advanced.multiqc

label:: Trigger MultiQC
type:: basic:boolean
description:: If the input for the pipeline is a BAM file computed by the RNA-seq gene expression pipeline, a MultiQC object already exists for this sample, so there is no need for an additional MultiQC process. If the input for this pipeline is FASTQ, MultiQC cannot be disabled.
required:: True
disabled:: False
hidden:: !bam
default:: False

advanced.java_gc_threads

label:: Java ParallelGCThreads
type:: basic:integer
description:: Sets the number of threads used during parallel phases of the garbage collectors.
required:: True
disabled:: False
hidden:: False
default:: 2

advanced.max_heap_size

label:: Java maximum heap size (Xmx)
type:: basic:integer
description:: Set the maximum Java heap size (in GB).
required:: True
disabled:: False
hidden:: False
default:: 12
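The two advanced Java settings correspond to standard JVM options. A sketch of how they might be rendered, for example to pass through the GATK launcher’s `--java-options` argument; the helper name is hypothetical.

```python
def java_options(gc_threads: int = 2, max_heap_gb: int = 12) -> str:
    """Render the ParallelGCThreads and maximum heap settings as JVM flags."""
    if gc_threads < 1 or max_heap_gb < 1:
        raise ValueError("both settings must be positive integers")
    return f"-XX:ParallelGCThreads={gc_threads} -Xmx{max_heap_gb}g"


print(java_options())  # -XX:ParallelGCThreads=2 -Xmx12g
```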

RNA-seq variant calling preprocess

data:alignment:bam:rnaseqvc:rnaseq-vc-preprocess (data:alignment:bam bam, data:seq:nucleotide ref_seq, list:data:variants:vcf known_sites, basic:string read_group, basic:integer java_gc_threads, basic:integer max_heap_size)[Source: v1.3.0]

Prepare BAM file from STAR aligner for HaplotypeCaller. This process includes steps MarkDuplicates, SplitNCigarReads, read-group assignment and base quality recalibration (BQSR).
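The four preprocessing steps can be sketched as a sequence of GATK 4 commands, in the order listed above. File names are placeholders, the options shown are the commonly documented GATK 4 ones, and the actual process may order or parameterize the steps differently; the sketch only builds the argument lists and runs nothing.

```python
def preprocess_commands(bam: str, ref: str, known_sites: list[str],
                        read_group: dict[str, str]) -> list[list[str]]:
    """Build (but do not run) the GATK commands for the preprocessing steps."""
    cmds = [
        ["gatk", "MarkDuplicates", "-I", bam, "-O", "markdup.bam", "-M", "markdup_metrics.txt"],
        ["gatk", "SplitNCigarReads", "-R", ref, "-I", "markdup.bam", "-O", "split.bam"],
    ]
    rg = ["gatk", "AddOrReplaceReadGroups", "-I", "split.bam", "-O", "rg.bam"]
    for tag, value in read_group.items():
        rg += [f"--RG{tag}", value]  # e.g. --RGID 1, --RGSM SAMPLENAME1
    cmds.append(rg)
    recal = ["gatk", "BaseRecalibrator", "-I", "rg.bam", "-R", ref, "-O", "recal.table"]
    for vcf in known_sites:
        recal += ["--known-sites", vcf]
    cmds.append(recal)
    cmds.append(["gatk", "ApplyBQSR", "-I", "rg.bam", "-R", ref,
                 "--bqsr-recal-file", "recal.table", "-O", "preprocessed.bam"])
    return cmds


for cmd in preprocess_commands("star.bam", "genome.fasta", ["dbsnp.vcf.gz"],
                               {"ID": "1", "LB": "GENIALIS", "PL": "ILLUMINA",
                                "PU": "BARCODE", "SM": "SAMPLENAME1"}):
    print(" ".join(cmd))
```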

bam

label:: Alignment BAM file from STAR alignment
type:: data:alignment:bam
required:: True
disabled:: False
hidden:: False

ref_seq

label:: Reference sequence FASTA file
type:: data:seq:nucleotide
required:: True
disabled:: False
hidden:: False

known_sites

label:: List of known sites of variation
type:: list:data:variants:vcf
description:: One or more databases of known polymorphic sites used to exclude regions around known polymorphisms from analysis.
required:: True
disabled:: False
hidden:: False

read_group

label:: Replace read groups in BAM
type:: basic:string
description:: Replace read groups in a BAM file. This argument enables the user to replace all read groups in the INPUT file with a single new read group and assign all reads to this read group in the OUTPUT BAM file. Addition or replacement is performed using the GATK AddOrReplaceReadGroups tool. Input should take the form of -name=value delimited by a “;”, e.g. “-ID=1;-LB=GENIALIS;-PL=ILLUMINA;-PU=BARCODE;-SM=SAMPLENAME1”. See the tool’s documentation for more information on tag names. Note that PL, LB, PU and SM are required fields. See the caveats of rewriting read groups in the documentation.
required:: True
disabled:: False
hidden:: False
default:: -ID=1;-LB=GENIALIS;-PL=ILLUMINA;-PU=BARCODE;-SM=SAMPLENAME1

advanced.java_gc_threads

label:: Java ParallelGCThreads
type:: basic:integer
description:: Sets the number of threads used during parallel phases of the garbage collectors.
required:: True
disabled:: False
hidden:: False
default:: 2

advanced.max_heap_size

label:: Java maximum heap size (Xmx)
type:: basic:integer
description:: Set the maximum Java heap size (in GB).
required:: True
disabled:: False
hidden:: False
default:: 12

bam

label:: Preprocessed BAM file
type:: basic:file
required:: True
disabled:: False
hidden:: False

bai

label:: Index of BAM file
type:: basic:file
required:: True
disabled:: False
hidden:: False

stats

label:: Alignment statistics
type:: basic:file
required:: True
disabled:: False
hidden:: False

species

label:: Species
type:: basic:string
required:: True
disabled:: False
hidden:: False

build

label:: Build
type:: basic:string
required:: True
disabled:: False
hidden:: False

metrics_file

label:: Metrics from MarkDuplicate process
type:: basic:file
required:: True
disabled:: False
hidden:: False

ROSE2

data:chipseq:rose2:rose2 (data:chipseq:callpeak input_macs, data:bed input_upload, basic:boolean use_filtered_bam, data:alignment:bam rankby, data:alignment:bam control, basic:integer tss, basic:integer stitch, data:bed mask)[Source: v5.2.1]

Run ROSE2. Rank Ordering of Super-Enhancers algorithm (ROSE2) takes the acetylation peaks called by a peak caller (MACS, MACS2…) and based on the in-between distances and the acetylation signal at the peaks judges whether they can be considered super-enhancers. The ranked values are plotted and by locating the inflection point in the resulting graph, super-enhancers are assigned. See [here](http://younglab.wi.mit.edu/super_enhancer_code.html) for more information.
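The inflection-point step can be sketched as follows: sort the enhancer signals, take a line whose slope is (max - min) / n, and slide it until it is tangent to the curve; the tangent point, i.e. the index minimizing signal[i] - slope * i, gives the cutoff, and regions strictly above it are called super-enhancers. This is a simplified discrete version of the approach described for ROSE, written for illustration; it is not ROSE2’s code.

```python
def super_enhancer_cutoff(signal: list[float]) -> float:
    """Signal value where a line of slope (max - min) / n touches the sorted curve."""
    if not signal:
        raise ValueError("need at least one region")
    values = sorted(max(0.0, v) for v in signal)  # negative (control-dominated) signal set to 0
    n = len(values)
    slope = (values[-1] - values[0]) / n
    # Points below a line through (i, values[i]) are those with a smaller
    # values[j] - slope * j, so the tangent point minimizes that quantity.
    knee = min(range(n), key=lambda i: values[i] - slope * i)
    return values[knee]


def call_super_enhancers(signal: list[float]) -> list[int]:
    """Indices of regions whose signal lies strictly above the cutoff."""
    cutoff = super_enhancer_cutoff(signal)
    return [i for i, v in enumerate(signal) if v > cutoff]


print(call_super_enhancers([1, 1, 1, 1, 1, 1, 1, 1, 50, 100]))  # [8, 9]
```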

input_macs

label:: BED/narrowPeak file (MACS results)
type:: data:chipseq:callpeak
required:: False
disabled:: False
hidden:: input_upload

input_upload

label:: BED file (Upload)
type:: data:bed
required:: False
disabled:: False
hidden:: input_macs || use_filtered_bam

use_filtered_bam

label:: Use Filtered BAM File
type:: basic:boolean
description:: Use filtered BAM file from a MACS2 object to rank enhancers by. Only applicable if input is MACS2.
required:: True
disabled:: False
hidden:: input_upload
default:: False

rankby

label:: BAM file
type:: data:alignment:bam
description:: BAM file to rank enhancers by.
required:: False
disabled:: False
hidden:: use_filtered_bam

control

label:: Control BAM File
type:: data:alignment:bam
description:: Control BAM file used as background for the BAM file that enhancers are ranked by.
required:: False
disabled:: False
hidden:: use_filtered_bam

tss

label:: TSS exclusion
type:: basic:integer
description:: Enter a distance from TSS to exclude. 0 = no TSS exclusion.
required:: True
disabled:: False
hidden:: False
default:: 0

stitch

label:: Stitch
type:: basic:integer
description:: Enter a max linking distance for stitching. If not given, optimal stitching parameter will be determined automatically.
required:: False
disabled:: False
hidden:: False

mask

label:: Masking BED file
type:: data:bed
description:: Mask a set of regions from analysis. Provide a BED of masking regions.
required:: False
disabled:: False
hidden:: False

all_enhancers

label:: All enhancers table
type:: basic:file
required:: True
disabled:: False
hidden:: False

enhancers_with_super

label:: Super enhancers table
type:: basic:file
required:: True
disabled:: False
hidden:: False

plot_points

label:: Plot points
type:: basic:file
required:: True
disabled:: False
hidden:: False

plot_panel

label:: Plot panel
type:: basic:file
required:: True
disabled:: False
hidden:: False

enhancer_gene

label:: Enhancer to gene
type:: basic:file
required:: True
disabled:: False
hidden:: False

enhancer_top_gene

label:: Enhancer to top gene
type:: basic:file
required:: True
disabled:: False
hidden:: False

gene_enhancer

label:: Gene to Enhancer
type:: basic:file
required:: True
disabled:: False
hidden:: False

stitch_parameter

label:: Stitch parameter
type:: basic:file
required:: False
disabled:: False
hidden:: False

all_output

label:: All output
type:: basic:file
required:: True
disabled:: False
hidden:: False

scatter_plot

label:: Super-Enhancer plot
type:: basic:json
required:: True
disabled:: False
hidden:: False

species

label:: Species
type:: basic:string
required:: True
disabled:: False
hidden:: False

build

label:: Build
type:: basic:string
required:: True
disabled:: False
hidden:: False

Reads (QSEQ multiplexed, paired)

data:multiplexed:qseq:pairedupload-multiplexed-paired (basic:file reads, basic:file reads2, basic:file barcodes, basic:file annotation)[Source: v1.4.1]

Upload multiplexed NGS reads in QSEQ format.

reads

label:: Multiplexed upstream reads
type:: basic:file
description:: NGS reads in QSeq format. Supported extensions: .qseq.txt.bz2 (preferred), .qseq.* or .qseq.txt.*.
required:: True
validate_regex:: ((\.qseq|\.qseq\.txt)(\.gz|\.bz2|\.tgz|\.tar\.gz|\.tar\.bz2|\.zip|\.rar|\.7z))|(\.bz2)$

reads2

label:: Multiplexed downstream reads
type:: basic:file
description:: NGS reads in QSeq format. Supported extensions: .qseq.txt.bz2 (preferred), .qseq.* or .qseq.txt.*.
required:: True
validate_regex:: ((\.qseq|\.qseq\.txt)(\.gz|\.bz2|\.tgz|\.tar\.gz|\.tar\.bz2|\.zip|\.rar|\.7z))|(\.bz2)$

barcodes

label:: NGS barcodes
type:: basic:file
description:: Barcodes in QSeq format. Supported extensions: .qseq.txt.bz2 (preferred), .qseq.* or .qseq.txt.*.
required:: True
validate_regex:: ((\.qseq|\.qseq\.txt)(\.gz|\.bz2|\.tgz|\.tar\.gz|\.tar\.bz2|\.zip|\.rar|\.7z))|(\.bz2)$

annotation

label:: Barcode mapping
type:: basic:file
description:: A tsv file mapping barcodes to experiment name, e.g. “TCGCAGG\tHr00”.
required:: True
validate_regex:: (\.tsv)$
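The barcode mapping is a two-column tab-separated file with one barcode and experiment name per line. A sketch of loading it and assigning barcode reads to experiments by exact matching; error-tolerant matching and the quality checks behind the ‘Bad quality’ and ‘Skipped’ counts are not modeled, and all names are hypothetical.

```python
import csv
import io
from collections import Counter


def load_barcode_map(handle) -> dict[str, str]:
    """Read 'BARCODE<TAB>experiment' lines into a dict, rejecting malformed or duplicate rows."""
    mapping: dict[str, str] = {}
    for lineno, row in enumerate(csv.reader(handle, delimiter="\t"), start=1):
        if not row or not row[0].strip():
            continue  # skip blank lines
        if len(row) < 2:
            raise ValueError(f"line {lineno}: expected barcode and experiment name")
        barcode, name = row[0].strip().upper(), row[1].strip()
        if barcode in mapping:
            raise ValueError(f"line {lineno}: duplicate barcode {barcode}")
        mapping[barcode] = name
    return mapping


def tally(barcodes: list[str], mapping: dict[str, str]) -> Counter:
    """Count reads per experiment; unknown barcodes go to 'notmatched'."""
    return Counter(mapping.get(bc.upper(), "notmatched") for bc in barcodes)


mapping = load_barcode_map(io.StringIO("TCGCAGG\tHr00\nAACCGGT\tHr01\n"))
print(tally(["TCGCAGG", "AACCGGT", "TCGCAGG", "GGGGGGG"], mapping))
# Counter({'Hr00': 2, 'Hr01': 1, 'notmatched': 1})
```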

qseq_reads

label:: Multiplexed upstream reads
type:: basic:file

qseq_reads2

label:: Multiplexed downstream reads
type:: basic:file

qseq_barcodes

label:: NGS barcodes
type:: basic:file

annotation

label:: Barcode mapping
type:: basic:file

matched

label:: Matched
type:: basic:string

notmatched

label:: Not matched
type:: basic:string

badquality

label:: Bad quality
type:: basic:string

skipped

label:: Skipped
type:: basic:string

Reads (QSEQ multiplexed, single)

data:multiplexed:qseq:singleupload-multiplexed-single (basic:file reads, basic:file barcodes, basic:file annotation)[Source: v1.4.1]

Upload multiplexed NGS reads in QSEQ format.

reads

label:: Multiplexed NGS reads
type:: basic:file
description:: NGS reads in QSeq format. Supported extensions: .qseq.txt.bz2 (preferred), .qseq.* or .qseq.txt.*.
required:: True
validate_regex:: (\.(qseq)(|\.txt)(|\.gz|\.bz2|\.tgz|\.tar\.gz|\.tar\.bz2|\.zip|\.rar|\.7z))|(\.bz2)$

barcodes

label:: NGS barcodes
type:: basic:file
description:: Barcodes in QSeq format. Supported extensions: .qseq.txt.bz2 (preferred), .qseq.* or .qseq.txt.*.
required:: True
validate_regex:: (\.(qseq)(|\.txt)(|\.gz|\.bz2|\.tgz|\.tar\.gz|\.tar\.bz2|\.zip|\.rar|\.7z))|(\.bz2)$

annotation

label:: Barcode mapping
type:: basic:file
description:: A tsv file mapping barcodes to experiment name, e.g. “TCGCAGG\tHr00”.
required:: True
validate_regex:: (\.tsv)$
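The `validate_regex` patterns of the two QSEQ upload processes can be checked directly with Python’s `re` module, assuming file names are validated with search (unanchored) semantics. Because `$` binds only to the last alternative, `(\.bz2)$`, the first alternative can match anywhere in the name, and the paired-end pattern, unlike the single-end one, requires a compression suffix. Wrapping the pattern in a group before `$` anchors every alternative; that wrapped form is only an illustration, not part of the process.

```python
import re

# Patterns copied from the paired-end and single-end upload processes.
PAIRED = r"((\.qseq|\.qseq\.txt)(\.gz|\.bz2|\.tgz|\.tar\.gz|\.tar\.bz2|\.zip|\.rar|\.7z))|(\.bz2)$"
SINGLE = r"(\.(qseq)(|\.txt)(|\.gz|\.bz2|\.tgz|\.tar\.gz|\.tar\.bz2|\.zip|\.rar|\.7z))|(\.bz2)$"


def accepted(pattern: str, filename: str) -> bool:
    """Unanchored search, as assumed for validate_regex."""
    return re.search(pattern, filename) is not None


def accepted_strict(pattern: str, filename: str) -> bool:
    """Same pattern with every alternative anchored to the end of the name."""
    return re.search(f"(?:{pattern})$", filename) is not None


for name in ["reads.qseq.txt.bz2", "reads.qseq", "reads.qseq.gz.bak"]:
    print(name, accepted(PAIRED, name), accepted(SINGLE, name), accepted_strict(SINGLE, name))
# reads.qseq.txt.bz2 True True True
# reads.qseq False True True
# reads.qseq.gz.bak True True False
```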

qseq_reads

label:: Multiplexed NGS reads
type:: basic:file

qseq_barcodes

label:: NGS barcodes
type:: basic:file

annotation

label:: Barcode mapping
type:: basic:file

matched

label:: Matched
type:: basic:string

notmatched

label:: Not matched
type:: basic:string

badquality

label:: Bad quality
type:: basic:string

skipped

label:: Skipped
type:: basic:string

Reads (scRNA 10x)

data:screads:10x:upload-sc-10x (list:basic:file barcodes, list:basic:file reads)[Source: v1.4.1]

Import 10x scRNA reads in FASTQ format.

barcodes

label:: Barcodes (.fastq.gz)
type:: list:basic:file
required:: True
disabled:: False
hidden:: False

reads

label:: Reads (.fastq.gz)
type:: list:basic:file
required:: True
disabled:: False
hidden:: False

barcodes

label:: Barcodes
type:: list:basic:file
required:: True
disabled:: False
hidden:: False

reads

label:: Reads
type:: list:basic:file
required:: True
disabled:: False
hidden:: False

fastqc_url_barcodes

label:: Quality control with FastQC (Barcodes)
type:: list:basic:file:html
required:: True
disabled:: False
hidden:: False

fastqc_url_reads

label:: Quality control with FastQC (Reads)
type:: list:basic:file:html
required:: True
disabled:: False
hidden:: False

Reverse complement FASTQ (paired-end)

data:reads:fastq:paired:seqtk:seqtk-rev-complement-paired (data:reads:fastq:paired reads, basic:string select_mate)[Source: v1.2.2]

Reverse complement paired-end FASTQ reads file using Seqtk.

reads

label:: Reads
type:: data:reads:fastq:paired
required:: True
disabled:: False
hidden:: False

select_mate

label:: Select mate
type:: basic:string
description:: Select which mate should be reverse complemented.
required:: True
disabled:: False
hidden:: False
default:: Mate 1
choices::
Mate 1: Mate 1
Mate 2: Mate 2
Both: Both
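Reverse complementing a FASTQ record reverses and complements the sequence and reverses the quality string, leaving the header unchanged. A pure-Python sketch of that transformation and of the `select_mate` choice, assuming uncompressed four-line records; it illustrates what Seqtk does rather than reproducing its implementation, and the helper names are hypothetical.

```python
# Complement table for A/C/G/T/N in both cases; other IUPAC codes are left unchanged here.
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")

MATES = {"Mate 1": (True, False), "Mate 2": (False, True), "Both": (True, True)}


def revcomp_fastq(text: str) -> str:
    """Reverse-complement every record of a 4-line-per-record FASTQ string."""
    lines = text.rstrip("\n").split("\n")
    if len(lines) % 4:
        raise ValueError("expected 4 lines per FASTQ record")
    out = []
    for i in range(0, len(lines), 4):
        header, seq, plus, qual = lines[i:i + 4]
        # Complement, then reverse, the sequence; qualities are only reversed.
        out += [header, seq.translate(COMPLEMENT)[::-1], plus, qual[::-1]]
    return "\n".join(out) + "\n"


def process_pair(mate1: str, mate2: str, select_mate: str = "Mate 1") -> tuple[str, str]:
    """Reverse-complement the mate(s) chosen by select_mate; leave the other unchanged."""
    flip1, flip2 = MATES[select_mate]
    return (revcomp_fastq(mate1) if flip1 else mate1,
            revcomp_fastq(mate2) if flip2 else mate2)


print(revcomp_fastq("@r1\nACGTN\n+\nABCDE\n"), end="")
# @r1
# NACGT
# +
# EDCBA
```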

fastq

label:: Reverse complemented FASTQ file
type:: list:basic:file
required:: True
disabled:: False
hidden:: False

fastq2

label:: Remaining mate
type:: list:basic:file
required:: True
disabled:: False
hidden:: False

fastqc_url

label:: Quality control with FastQC (Mate 1)
type:: list:basic:file:html
required:: True
disabled:: False
hidden:: False

fastqc_archive

label:: Download FastQC archive (Mate 1)
type:: list:basic:file
required:: True
disabled:: False
hidden:: False

fastqc_url2

label:: Quality control with FastQC (Mate 2)
type:: list:basic:file:html
required:: True
disabled:: False
hidden:: False

fastqc_archive2

label:: Download FastQC archive (Mate 2)
type:: list:basic:file
required:: True
disabled:: False
hidden:: False

Reverse complement FASTQ (single-end)

data:reads:fastq:single:seqtk:seqtk-rev-complement-single (data:reads:fastq:single reads)[Source: v1.3.2]

Reverse complement single-end FASTQ reads file using Seqtk.

reads

label:: Reads
type:: data:reads:fastq:single
required:: True
disabled:: False
hidden:: False

fastq

label:: Reverse complemented FASTQ file
type:: list:basic:file
required:: True
disabled:: False
hidden:: False

fastqc_url

label:: Quality control with FastQC
type:: list:basic:file:html
required:: True
disabled:: False
hidden:: False

fastqc_archive

label:: Download FastQC archive
type:: list:basic:file
required:: True
disabled:: False
hidden:: False

SAM header

data:sam:headerupload-header-sam (basic:file src)[Source: v1.2.3]

Upload a mapping file header in SAM format.

src

label:: Header (SAM)
type:: basic:file
description:: A mapping file header in SAM format.
validate_regex:: \.(sam)$
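In a SAM header, every line starts with `@` and a two-letter record type (for example `@HD`, `@SQ`, `@RG`, `@PG`), followed by tab-separated `TAG:VALUE` fields; `@CO` lines carry free text. A minimal validation-and-parsing sketch; it checks only this basic layout, not the per-record rules of the SAM specification, and the function name is hypothetical.

```python
def parse_sam_header(text: str) -> list[tuple[str, dict[str, str]]]:
    """Parse header lines into (record_type, fields) pairs, rejecting non-header lines."""
    records = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        if not line:
            continue
        if not line.startswith("@") or len(line) < 3:
            raise ValueError(f"line {lineno}: header lines must start with '@' and a record type")
        record_type = line[1:3]
        if record_type == "CO":
            records.append((record_type, {"text": line[4:]}))  # free-text comment
            continue
        fields = {}
        for field in line.split("\t")[1:]:
            tag, sep, value = field.partition(":")
            if not sep or len(tag) != 2:
                raise ValueError(f"line {lineno}: malformed field {field!r}")
            fields[tag] = value
        records.append((record_type, fields))
    return records


header = "@HD\tVN:1.6\tSO:coordinate\n@SQ\tSN:chr1\tLN:1000\n@CO\tuploaded by hand\n"
print([fields["SN"] for rtype, fields in parse_sam_header(header) if rtype == "SQ"])  # ['chr1']
```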

sam

label:: Uploaded file
type:: basic:file

SRA data

data:sra:import-sra (list:basic:string sra_accession, basic:boolean prefetch, basic:string max_size_prefetch, basic:integer min_spot_id, basic:integer max_spot_id, basic:integer min_read_len, basic:boolean clip, basic:boolean aligned, basic:boolean unaligned)[Source: v1.5.1]

Import reads from SRA. Import single or paired-end reads from Sequence Read Archive (SRA) via an SRA accession number. SRA stores raw sequencing data and alignment information from high-throughput sequencing platforms.

sra_accession

label:: SRA accession(s)
type:: list:basic:string
required:: True
disabled:: False
hidden:: False

advanced.prefetch

label:: Prefetch SRA file
type:: basic:boolean
required:: True
disabled:: False
hidden:: False
default:: True

advanced.max_size_prefetch

label:: Maximum file size to download in KB
type:: basic:string
description:: A unit prefix can be used instead of a value in KB (e.g. 1024M or 1G).
required:: True
disabled:: False
hidden:: False
default:: 20G
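The maximum download size accepts either a plain number of kilobytes or a value with a unit letter such as `20G`. A sketch of normalizing both forms to KB; treating the letters as binary multiples (1G = 1024 × 1024 KB) is an assumption, and only single-letter units are handled.

```python
# Assumed binary multiples, expressed in KB.
UNIT_IN_KB = {"K": 1, "M": 1024, "G": 1024 ** 2, "T": 1024 ** 3}


def size_to_kb(value: str) -> int:
    """Convert '500', '1024M' or '20G' into a number of kilobytes."""
    text = value.strip().upper()
    if text and text[-1] in UNIT_IN_KB:
        number, unit = text[:-1], text[-1]
    else:
        number, unit = text, "K"  # a bare number is already in KB
    if not number.isdigit():
        raise ValueError(f"cannot parse size {value!r}")
    return int(number) * UNIT_IN_KB[unit]


print(size_to_kb("20G"), size_to_kb("1024M") == size_to_kb("1G"))  # 20971520 True
```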

advanced.min_spot_id

label:: Minimum spot ID
type:: basic:integer
required:: False
disabled:: False
hidden:: False

advanced.max_spot_id

label:: Maximum spot ID
type:: basic:integer
required:: False
disabled:: False
hidden:: False

advanced.min_read_len

label:: Minimum read length
type:: basic:integer
required:: False
disabled:: False
hidden:: False

advanced.clip

label:: Clip adapter sequences
type:: basic:boolean
required:: True
disabled:: False
hidden:: False
default:: False

advanced.aligned

label:: Dump only aligned sequences
type:: basic:boolean
required:: True
disabled:: False
hidden:: False
default:: False

advanced.unaligned

label:: Dump only unaligned sequences
type:: basic:boolean
required:: True
disabled:: False
hidden:: False
default:: False
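The advanced options above map naturally onto SRA Toolkit command-line switches. A sketch that builds `prefetch` and `fastq-dump` argument lists, assuming the process uses these two tools (the documentation does not say which SRA Toolkit commands are run); `--split-files` is added for paired-end data, rejecting aligned-only together with unaligned-only is an assumption, and the accession is a placeholder.

```python
def sra_commands(accession: str, prefetch: bool = True, max_size: str = "20G",
                 min_spot_id=None, max_spot_id=None, min_read_len=None,
                 clip: bool = False, aligned: bool = False, unaligned: bool = False,
                 paired: bool = False) -> list[list[str]]:
    """Build (but do not run) prefetch / fastq-dump argument lists for one accession."""
    if aligned and unaligned:
        raise ValueError("choose at most one of aligned-only and unaligned-only")
    cmds = []
    if prefetch:
        cmds.append(["prefetch", "--max-size", max_size, accession])
    dump = ["fastq-dump"]
    if paired:
        dump.append("--split-files")  # write mate 1 and mate 2 to separate files
    if min_spot_id is not None:
        dump += ["--minSpotId", str(min_spot_id)]
    if max_spot_id is not None:
        dump += ["--maxSpotId", str(max_spot_id)]
    if min_read_len is not None:
        dump += ["--minReadLen", str(min_read_len)]
    if clip:
        dump.append("--clip")
    if aligned:
        dump.append("--aligned")
    if unaligned:
        dump.append("--unaligned")
    dump.append(accession)
    cmds.append(dump)
    return cmds


# "SRR000001" is only a placeholder accession.
for cmd in sra_commands("SRR000001", min_read_len=30, clip=True, paired=True):
    print(" ".join(cmd))
# prefetch --max-size 20G SRR000001
# fastq-dump --split-files --minReadLen 30 --clip SRR000001
```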

SRA data (paired-end)

data:reads:fastq:paired:import-sra-paired (list:basic:string sra_accession, basic:boolean prefetch, basic:string max_size_prefetch, basic:integer min_spot_id, basic:integer max_spot_id, basic:integer min_read_len, basic:boolean clip, basic:boolean aligned, basic:boolean unaligned)[Source: v1.6.1]

Import paired-end reads from SRA. Import paired-end reads from Sequence Read Archive (SRA) via an SRA accession number. SRA stores raw sequencing data and alignment information from high-throughput sequencing platforms.

sra_accession

label:: SRA accession(s)
type:: list:basic:string
required:: True
disabled:: False
hidden:: False

advanced.prefetch

label:: Prefetch SRA file
type:: basic:boolean
required:: True
disabled:: False
hidden:: False
default:: True

advanced.max_size_prefetch

label:: Maximum file size to download in KB
type:: basic:string
description:: A unit prefix can be used instead of a value in KB (e.g. 1024M or 1G).
required:: True
disabled:: False
hidden:: False
default:: 20G

advanced.min_spot_id

label:: Minimum spot ID
type:: basic:integer
required:: False
disabled:: False
hidden:: False

advanced.max_spot_id

label:: Maximum spot ID
type:: basic:integer
required:: False
disabled:: False
hidden:: False

advanced.min_read_len

label:: Minimum read length
type:: basic:integer
required:: False
disabled:: False
hidden:: False

advanced.clip

label:: Clip adapter sequences
type:: basic:boolean
required:: True
disabled:: False
hidden:: False
default:: False

advanced.aligned

label:: Dump only aligned sequences
type:: basic:boolean
required:: True
disabled:: False
hidden:: False
default:: False

advanced.unaligned

label:: Dump only unaligned sequences
type:: basic:boolean
required:: True
disabled:: False
hidden:: False
default:: False

fastq

label:: Reads file (mate 1)
type:: list:basic:file
required:: True
disabled:: False
hidden:: False

fastq2

label:: Reads file (mate 2)
type:: list:basic:file
required:: True
disabled:: False
hidden:: False

fastqc_url

label:: Quality control with FastQC (mate 1)
type:: list:basic:file:html
required:: True
disabled:: False
hidden:: False

fastqc_url2

label:: Quality control with FastQC (mate 2)
type:: list:basic:file:html
required:: True
disabled:: False
hidden:: False

fastqc_archive

label:: Download FastQC archive (mate 1)
type:: list:basic:file
required:: True
disabled:: False
hidden:: False

fastqc_archive2

label:: Download FastQC archive (mate 2)
type:: list:basic:file
required:: True
disabled:: False
hidden:: False

SRA data (single-end)

data:reads:fastq:single:import-sra-single (list:basic:string sra_accession, basic:boolean prefetch, basic:string max_size_prefetch, basic:integer min_spot_id, basic:integer max_spot_id, basic:integer min_read_len, basic:boolean clip, basic:boolean aligned, basic:boolean unaligned)[Source: v1.6.1]

Import single-end reads from SRA. Import single-end reads from Sequence Read Archive (SRA) via an SRA accession number. SRA stores raw sequencing data and alignment information from high-throughput sequencing platforms.

sra_accession

label:: SRA accession(s)
type:: list:basic:string
required:: True
disabled:: False
hidden:: False

advanced.prefetch

label:: Prefetch SRA file
type:: basic:boolean
required:: True
disabled:: False
hidden:: False
default:: True

advanced.max_size_prefetch

label:: Maximum file size to download in KB
type:: basic:string
description:: A unit prefix can be used instead of a value in KB (e.g. 1024M or 1G).
required:: True
disabled:: False
hidden:: False
default:: 20G

advanced.min_spot_id

label:: Minimum spot ID
type:: basic:integer
required:: False
disabled:: False
hidden:: False

advanced.max_spot_id

label:: Maximum spot ID
type:: basic:integer
required:: False
disabled:: False
hidden:: False

advanced.min_read_len

label:: Minimum read length
type:: basic:integer
required:: False
disabled:: False
hidden:: False

advanced.clip

label:: Clip adapter sequences
type:: basic:boolean
required:: True
disabled:: False
hidden:: False
default:: False

advanced.aligned

label:: Dump only aligned sequences
type:: basic:boolean
required:: True
disabled:: False
hidden:: False
default:: False

advanced.unaligned

label:: Dump only unaligned sequences
type:: basic:boolean
required:: True
disabled:: False
hidden:: False
default:: False

fastq

label:: Reads file
type:: list:basic:file
required:: True
disabled:: False
hidden:: False

fastqc_url

label:: Quality control with FastQC
type:: list:basic:file:html
required:: True
disabled:: False
hidden:: False

fastqc_archive

label:: Download FastQC archive
type:: list:basic:file
required:: True
disabled:: False
hidden:: False

STAR

data:alignment:bam:star:alignment-star (data:reads:fastq reads, data:index:star genome, data:annotation annotation, basic:boolean unstranded, basic:boolean noncannonical, basic:boolean gene_counts, basic:string feature_exon, basic:integer sjdb_overhang, basic:boolean chimeric, basic:integer chim_segment_min, basic:boolean quant_mode, basic:boolean single_end, basic:string out_filter_type, basic:integer out_multimap_max, basic:integer out_mismatch_max, basic:decimal out_mismatch_nl_max, basic:integer out_score_min, basic:decimal out_mismatch_nrl_max, basic:integer align_overhang_min, basic:integer align_sjdb_overhang_min, basic:integer align_intron_size_min, basic:integer align_intron_size_max, basic:integer align_gap_max, basic:string align_end_alignment, basic:boolean two_pass_mode, basic:boolean out_unmapped, basic:string out_sam_attributes, basic:string out_rg_line, list:basic:integer limit_buffer_size, basic:integer limit_sam_records, basic:integer limit_junction_reads, basic:integer limit_collapsed_junctions, basic:integer limit_inserted_junctions)[Source: v5.1.0]

Align reads with STAR aligner. Spliced Transcripts Alignment to a Reference (STAR) software is based on an alignment algorithm that uses sequential maximum mappable seed search in uncompressed suffix arrays followed by seed clustering and stitching procedure. In addition to unbiased de novo detection of canonical junctions, STAR can discover non-canonical splices and chimeric (fusion) transcripts, and is also capable of mapping full-length RNA sequences. More information can be found in the [STAR manual](https://github.com/alexdobin/STAR/blob/master/doc/STARmanual.pdf) and in the [original paper](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3530905/). The current version of STAR is 2.7.10b.

reads

label:: Input reads (FASTQ)
type:: data:reads:fastq
required:: True
disabled:: False
hidden:: False

genome

label:: Indexed reference genome
type:: data:index:star
description:: Genome index prepared by STAR aligner indexing tool.
required:: True
disabled:: False
hidden:: False

annotation

label:: Annotation file (GTF/GFF3)
type:: data:annotation
description:: Insert known annotations into genome indices at the mapping stage.
required:: False
disabled:: False
hidden:: False

unstranded

label:: The data is unstranded [--outSAMstrandField intronMotif]
type:: basic:boolean
description:: For unstranded RNA-seq data, Cufflinks/Cuffdiff require spliced alignments with the XS strand attribute, which STAR generates with the --outSAMstrandField intronMotif option. The XS strand attribute will be generated for all alignments that contain splice junctions. Spliced alignments with an undefined strand (i.e. containing only non-canonical unannotated junctions) will be suppressed. If you have stranded RNA-seq data, you do not need any specific STAR options. Instead, run Cufflinks with the --library-type option. For example, cufflinks --library-type fr-firststrand should be used for the standard dUTP protocol, including Illumina's stranded TruSeq. The --library-type option applies only to Cufflinks runs, not to STAR runs.
required:: True
disabled:: False
hidden:: False
default:: False

noncannonical

label:: Remove non-canonical junctions (Cufflinks compatibility)
type:: basic:boolean
description:: It is recommended to remove the non-canonical junctions for Cufflinks runs using --outFilterIntronMotifs RemoveNoncanonical.
required:: True
disabled:: False
hidden:: False
default:: False

gene_counts

label:: Gene count [--quantMode GeneCounts]
type:: basic:boolean
description:: With this option set to True, STAR counts the number of reads per gene while mapping. A read is counted if it overlaps (1 nt or more) one and only one gene. Both ends of a paired-end read are checked for overlaps. The counts coincide with those produced by htseq-count with default parameters.
required:: True
disabled:: False
hidden:: False
default:: False
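
The "one and only one gene" rule can be illustrated with a minimal sketch. It assumes each read (or read pair) has already been reduced to a list of aligned blocks on one chromosome and that genes are simple half-open intervals; strand is ignored, as in the unstranded count column. The function names are hypothetical and this is not STAR's implementation.

```python
from typing import Dict, List, Optional, Tuple

Interval = Tuple[int, int]  # half-open [start, end) on the same chromosome


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two half-open intervals share at least 1 nt."""
    return a[0] < b[1] and b[0] < a[1]


def assign_read(blocks: List[Interval], genes: Dict[str, Interval]) -> Optional[str]:
    """Return the gene a read is counted for, or None if it is not counted.

    All aligned blocks of the read (both mates for paired-end data) are
    checked; the read counts only if it overlaps exactly one gene.
    """
    hits = {
        name
        for name, gene in genes.items()
        if any(overlaps(block, gene) for block in blocks)
    }
    return hits.pop() if len(hits) == 1 else None
```

For example, with `genes = {"geneA": (100, 500), "geneB": (450, 900)}`, a read with blocks `[(120, 170), (300, 350)]` is assigned to `geneA`, while a read at `(470, 520)` overlaps both genes and is not counted.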

annotation_options.feature_exon

label:: Feature type [--sjdbGTFfeatureExon]
type:: basic:string
description:: Feature type in GTF file to be used as exons for building transcripts.
required:: True
disabled:: False
hidden:: False
default:: exon

annotation_options.sjdb_overhang

label:: Junction length [--sjdbOverhang]
type:: basic:integer
description:: This parameter specifies the length of the genomic sequence around the annotated junction to be used in constructing the splice junction database. Ideally, this length should be equal to the ReadLength-1, where ReadLength is the length of the reads. For instance, for Illumina 2x100b paired-end reads, the ideal value is 100-1=99. In the case of reads of varying length, the ideal value is max(ReadLength)-1. In most cases, the default value of 100 will work as well as the ideal value.
required:: True
disabled:: False
hidden:: False
default:: 100
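
The max(ReadLength)-1 rule can be computed directly from the reads. This is a minimal sketch assuming an uncompressed FASTQ with exactly four lines per record (no wrapped sequences); the helper names are hypothetical.

```python
from typing import Iterable, Iterator


def read_lengths(fastq_lines: Iterable[str]) -> Iterator[int]:
    """Yield the sequence length of each record in a 4-line FASTQ."""
    for i, line in enumerate(fastq_lines):
        if i % 4 == 1:  # the sequence is the 2nd line of every record
            yield len(line.rstrip("\n"))


def suggested_sjdb_overhang(lengths: Iterable[int]) -> int:
    """Return max(ReadLength) - 1, the ideal --sjdbOverhang value."""
    return max(lengths) - 1
```

An open file object can be passed straight to `read_lengths`. For 2x100b reads this gives 99; as noted above, the default of 100 usually works as well.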

detect_chimeric.chimeric

label:: Detect chimeric and circular alignments [--chimOutType SeparateSAMold]
type:: basic:boolean
description:: To switch on detection of chimeric (fusion) alignments (in addition to normal mapping), --chimSegmentMin should be set to a positive value. Each chimeric alignment consists of two segments. Each segment is non-chimeric on its own, but the segments are chimeric to each other (i.e. the segments belong to different chromosomes, or different strands, or are far from each other). Both segments may contain splice junctions, and one of the segments may contain portions of both mates. The --chimSegmentMin parameter controls the minimum allowed mapped length of each of the two segments. For example, with 2x75 reads and --chimSegmentMin 20, a chimeric alignment with 130b on one chromosome and 20b on the other will be output, while 135 + 15 will not.
required:: True
disabled:: False
hidden:: False
default:: False
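
The segment-length rule from the example can be sketched as a small check. Only the --chimSegmentMin length condition is modeled; STAR applies further scoring and filtering to chimeric candidates that is not shown here, and the function name is hypothetical.

```python
from typing import Sequence


def chimeric_alignment_reported(segment_lengths: Sequence[int],
                                chim_segment_min: int = 20) -> bool:
    """Check the --chimSegmentMin rule for a two-segment chimeric alignment.

    Detection is only active for a positive chim_segment_min, and each
    segment must have at least that many mapped bases.
    """
    if chim_segment_min <= 0:
        return False  # chimeric detection switched off
    return all(length >= chim_segment_min for length in segment_lengths)
```

With the default of 20, `[130, 20]` passes and `[135, 15]` does not, matching the 2x75 example.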

detect_chimeric.chim_segment_min

label:: Minimum length of chimeric segment [--chimSegmentMin]
type:: basic:integer
required:: True
disabled:: !detect_chimeric.chimeric
hidden:: False
default:: 20

t_coordinates.quant_mode

label:: Output in transcript coordinates [--quantMode TranscriptomeSAM]
type:: basic:boolean
description:: With the --quantMode TranscriptomeSAM option, STAR will output alignments translated into transcript coordinates in the Aligned.toTranscriptome.out.bam file (in addition to alignments in genomic coordinates in Aligned.*.sam/bam files). These transcriptomic alignments can be used with transcript quantification software that requires reads to be mapped to the transcriptome, such as RSEM or eXpress.
required:: True
disabled:: False
hidden:: False
default:: False

t_coordinates.single_end

label:: Allow soft-clipping and indels [--quantTranscriptomeBan Singleend]
type:: basic:boolean
description:: By default, the output satisfies RSEM requirements: soft-clipping and indels are not allowed. Use --quantTranscriptomeBan Singleend to allow insertions, deletions and soft-clips in the transcriptomic alignments, which some expression quantification software (e.g. eXpress) can use.
required:: True
disabled:: !t_coordinates.quant_mode
hidden:: False
default:: False

filtering.out_filter_type

label:: Type of filtering [--outFilterType]
type:: basic:string
description:: Normal: standard filtering using only current alignment; BySJout: keep only those reads that contain junctions that passed filtering into SJ.out.tab.
required:: True
disabled:: False
hidden:: False
default:: Normal
choices::
Normal: Normal
BySJout: BySJout

filtering.out_multimap_max

label:: Maximum number of loci [--outFilterMultimapNmax]
type:: basic:integer
description:: Maximum number of loci the read is allowed to map to. Alignments (all of them) will be output only if the read maps to no more loci than this value. Otherwise no alignments will be output, and the read will be counted as ‘mapped to too many loci’ (default: 10).
required:: False
disabled:: False
hidden:: False

filtering.out_mismatch_max

label:: Maximum number of mismatches [--outFilterMismatchNmax]
type:: basic:integer
description:: Alignment will be output only if it has no more mismatches than this value (default: 10). A large number (e.g. 999) switches off this filter.
required:: False
disabled:: False
hidden:: False

filtering.out_mismatch_nl_max

label:: Maximum no. of mismatches (map length) [--outFilterMismatchNoverLmax]
type:: basic:decimal
description:: Alignment will be output only if its ratio of mismatches to *mapped* length is less than or equal to this value (default: 0.3). The value should be between 0.0 and 1.0.
required:: False
disabled:: False
hidden:: False

filtering.out_score_min

label:: Minimum alignment score [--outFilterScoreMin]
type:: basic:integer
description:: Alignment will be output only if its score is higher than or equal to this value (default: 0).
required:: False
disabled:: False
hidden:: False

filtering.out_mismatch_nrl_max

label:: Maximum no. of mismatches (read length) [--outFilterMismatchNoverReadLmax]
type:: basic:decimal
description:: Alignment will be output only if its ratio of mismatches to *read* length is less than or equal to this value (default: 1.0). For example, with a value of 0.04 and 2x100bp reads, the maximum number of mismatches for the paired read is 0.04*200 = 8. The value should be between 0.0 and 1.0.
required:: False
disabled:: False
hidden:: False
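
The five filters above can be combined into one predicate. This is a simplified per-alignment sketch, not STAR's code: it treats every maximum as inclusive (as the STAR manual words these options), uses the defaults listed in the descriptions, and the class and function names are hypothetical. `read_length` is the sum of both mates for paired-end reads.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FilterParams:
    """Filter thresholds; defaults follow the field descriptions above."""
    multimap_max: int = 10                 # --outFilterMultimapNmax
    mismatch_max: int = 10                 # --outFilterMismatchNmax
    mismatch_over_mapped_max: float = 0.3  # --outFilterMismatchNoverLmax
    score_min: int = 0                     # --outFilterScoreMin
    mismatch_over_read_max: float = 1.0    # --outFilterMismatchNoverReadLmax


def alignment_passes(n_loci: int, mismatches: int, mapped_length: int,
                     read_length: int, score: int,
                     params: FilterParams = FilterParams()) -> bool:
    """Return True if an alignment survives all five output filters."""
    return (
        n_loci <= params.multimap_max
        and mismatches <= params.mismatch_max
        and mismatches / mapped_length <= params.mismatch_over_mapped_max
        and score >= params.score_min
        and mismatches / read_length <= params.mismatch_over_read_max
    )
```

With `FilterParams(mismatch_over_read_max=0.04)` and a fully mapped 2x100bp pair, 8 mismatches pass and 9 do not, as in the example above.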

alignment.align_overhang_min

label:: Minimum overhang [--alignSJoverhangMin]
type:: basic:integer
description:: Minimum overhang (i.e. block size) for spliced alignments (default: 5).
required:: False
disabled:: False
hidden:: False

alignment.align_sjdb_overhang_min

label:: Minimum overhang (sjdb) [--alignSJDBoverhangMin]
type:: basic:integer
description:: Minimum overhang (i.e. block size) for annotated (sjdb) spliced alignments (default: 3).
required:: False
disabled:: False
hidden:: False

alignment.align_intron_size_min

label:: Minimum intron size [--alignIntronMin]
type:: basic:integer
description:: Minimum intron size: a genomic gap is considered an intron if its length >= alignIntronMin, otherwise it is considered a deletion (default: 21).
required:: False
disabled:: False
hidden:: False

alignment.align_intron_size_max

label:: Maximum intron size [--alignIntronMax]
type:: basic:integer
description:: Maximum intron size. If 0, the maximum intron size will be determined by (2^winBinNbits)*winAnchorDistNbins (default: 0).
required:: False
disabled:: False
hidden:: False
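
How the minimum and maximum intron sizes classify a genomic gap can be sketched as follows. The window defaults (winBinNbits = 16, winAnchorDistNbins = 9) are assumed from the STAR manual and are not set by this process; the function names and the "exceeds maximum" label are hypothetical, and the boundary at exactly the maximum is an assumption.

```python
def effective_intron_max(align_intron_max: int = 0,
                         win_bin_nbits: int = 16,
                         win_anchor_dist_nbins: int = 9) -> int:
    """Maximum intron size; 0 means derive it from the window parameters."""
    if align_intron_max > 0:
        return align_intron_max
    return (2 ** win_bin_nbits) * win_anchor_dist_nbins


def classify_gap(gap_length: int, align_intron_min: int = 21,
                 align_intron_max: int = 0) -> str:
    """Classify a genomic gap within an alignment."""
    if gap_length < align_intron_min:
        return "deletion"
    if gap_length <= effective_intron_max(align_intron_max):
        return "intron"
    return "exceeds maximum"
```

Under the assumed window defaults, a value of 0 corresponds to a maximum of 2^16 * 9 = 589,824 bases.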

alignment.align_gap_max

label:: Maximum gap between mates [--alignMatesGapMax]
type:: basic:integer
description:: Maximum gap between two mates. If 0, the maximum gap will be determined by (2^winBinNbits)*winAnchorDistNbins (default: 0).
required:: False
disabled:: False
hidden:: False

alignment.align_end_alignment

label:: Read ends alignment [--alignEndsType]
type:: basic:string
description:: Type of read ends alignment (default: Local). Local: standard local alignment with soft-clipping allowed. EndToEnd: force end-to-end read alignment, do not soft-clip. Extend5pOfRead1: fully extend only the 5' end of read 1, all other ends use local alignment. Extend5pOfReads12: fully extend only the 5' ends of both read 1 and read 2, all other ends use local alignment.
required:: False
disabled:: False
hidden:: False
choices::
Local: Local
EndToEnd: EndToEnd
Extend5pOfRead1: Extend5pOfRead1
Extend5pOfReads12: Extend5pOfReads12

two_pass_mapping.two_pass_mode

label:: Use two pass mode [--twopassMode]
type:: basic:boolean
description:: Use two-pass mapping instead of first-pass only. In two-pass mode, STAR first performs first-pass mapping, extracts junctions, inserts them into the genome index, and re-maps all reads in the second mapping pass.
required:: True
disabled:: False
hidden:: False
default:: False

output_options.out_unmapped

label:: Output unmapped reads (SAM) [--outSAMunmapped Within]
type:: basic:boolean
description:: Output of unmapped reads in the SAM format.
required:: True
disabled:: False
hidden:: False
default:: False

output_options.out_sam_attributes

label:: Desired SAM attributes [--outSAMattributes]
type:: basic:string
description:: A string of desired SAM attributes, in the order desired for the output SAM.
required:: True
disabled:: False
hidden:: False
default:: Standard
choices::
Standard: Standard
All: All
NH HI NM MD: NH HI NM MD
None: None

output_options.out_rg_line

label:: SAM/BAM read group line [--outSAMattrRGline]
type:: basic:string
description:: The first word contains the read group identifier and must start with ID:, e.g. --outSAMattrRGline ID:xxx CN:yy "DS:z z z". The identifier xxx will be added as the RG tag to each output alignment. Any spaces in the tag values have to be double-quoted. Comma-separated RG lines correspond to different (comma-separated) input files in --readFilesIn. Commas have to be surrounded by spaces, e.g. --outSAMattrRGline ID:xxx , ID:zzz "DS:z z" , ID:yyy DS:yyyy.
required:: False
disabled:: False
hidden:: False
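
The quoting and separator rules for the read group value can be captured in a small formatter. This is a hedged sketch: the helper names are hypothetical, it assumes the field takes the value without the --outSAMattrRGline flag itself, and it quotes a whole TAG:value token whenever it contains a space, mirroring the examples above.

```python
from typing import Dict, List


def rg_entry(fields: Dict[str, str]) -> str:
    """Format one read group with ID first; tokens containing spaces are double-quoted."""
    if "ID" not in fields:
        raise ValueError("a read group must define ID")
    parts = []
    for key in ["ID"] + [k for k in fields if k != "ID"]:
        token = f"{key}:{fields[key]}"
        parts.append(f'"{token}"' if " " in token else token)
    return " ".join(parts)


def rg_line(read_groups: List[Dict[str, str]]) -> str:
    """Join one read group per input file with ' , ' (commas surrounded by spaces)."""
    return " , ".join(rg_entry(rg) for rg in read_groups)
```

For one input file, `rg_line([{"ID": "xxx", "CN": "yy", "DS": "z z z"}])` yields `ID:xxx CN:yy "DS:z z z"`.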

limits.limit_buffer_size

label:: Buffer size [--limitIObufferSize]
type:: list:basic:integer
description:: Maximum available buffer size (bytes) for input/output, per thread. The parameter requires two numbers: separate sizes for the input and output buffers.
required:: True
disabled:: False
hidden:: False
default:: [30000000, 50000000]

limits.limit_sam_records

label:: Maximum size of the SAM record [--limitOutSAMoneReadBytes]
type:: basic:integer
description:: Maximum size of the SAM record (bytes) for one read. Recommended value: > 2*(LengthMate1+LengthMate2+100)*outFilterMultimapNmax.
required:: True
disabled:: False
hidden:: False
default:: 100000
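
The recommendation is easy to check before raising --outFilterMultimapNmax. A minimal sketch with hypothetical helper names; for single-end reads it assumes LengthMate2 is 0.

```python
def min_sam_record_bytes(len_mate1: int, len_mate2: int,
                         multimap_nmax: int = 10) -> int:
    """Lower bound 2*(LengthMate1+LengthMate2+100)*outFilterMultimapNmax."""
    return 2 * (len_mate1 + len_mate2 + 100) * multimap_nmax


def limit_is_sufficient(limit: int, len_mate1: int, len_mate2: int,
                        multimap_nmax: int = 10) -> bool:
    """The recommended --limitOutSAMoneReadBytes is strictly above the bound."""
    return limit > min_sam_record_bytes(len_mate1, len_mate2, multimap_nmax)
```

For 2x100bp reads with the default of 10 loci the bound is 6,000 bytes, far below the default of 100000. For 2x150bp reads with outFilterMultimapNmax set to 500, the bound is 400,000 bytes, so the default limit would be too small.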

limits.limit_junction_reads

label:: Maximum number of junctions [--limitOutSJoneRead]
type:: basic:integer
description:: Maximum number of junctions for one read (including all multi-mappers).
required:: True
disabled:: False
hidden:: False
default:: 1000

limits.limit_collapsed_junctions

label:: Maximum number of collapsed junctions [--limitOutSJcollapsed]
type:: basic:integer
required:: True
disabled:: False
hidden:: False
default:: 1000000

limits.limit_inserted_junctions

label:: Maximum number of junctions to be inserted [--limitSjdbInsertNsj]
type:: basic:integer
description:: Maximum number of junctions to be inserted into the genome on the fly at the mapping stage, including those from annotations and those detected in the first pass of a two-pass run.
required:: True
disabled:: False
hidden:: False
default:: 1000000

bam

label:: Alignment file
type:: basic:file
required:: True
disabled:: False
hidden:: False

bai

label:: BAM file index
type:: basic:file
required:: True
disabled:: False
hidden:: False

unmapped_1

label:: Unmapped reads (mate 1)
type:: basic:file
required:: False
disabled:: False
hidden:: False

unmapped_2

label:: Unmapped reads (mate 2)
type:: basic:file
required:: False
disabled:: False
hidden:: False

sj

label:: Splice junctions
type:: basic:file
required:: True
disabled:: False
hidden:: False

chimeric

label:: Chimeric alignments
type:: basic:file
required:: False
disabled:: False
hidden:: False

alignment_transcriptome

label:: Alignment (transcriptome coordinates)
type:: basic:file
required:: False
disabled:: False
hidden:: False

gene_counts

label:: Gene counts
type:: basic:file
required:: False
disabled:: False
hidden:: False

stats

label:: Statistics
type:: basic:file
required:: True
disabled:: False
hidden:: False

species

label:: Species
type:: basic:string
required:: True
disabled:: False
hidden:: False

build

label:: Build
type:: basic:string
required:: True
disabled:: False
hidden:: False

STAR genome index

data:index:star:alignment-star-index (data:seq:nucleotide ref_seq, data:annotation annotation, basic:string source, basic:string feature_exon, basic:integer sjdb_overhang, basic:integer genome_sa_string_len, basic:integer genome_chr_bin_size, basic:integer genome_sa_sparsity)[Source: v4.0.0]

Generate STAR genome index. Generate genome index files from the supplied reference genome sequence and GTF files. The STAR version used is 2.7.10b.

ref_seq

label:: Reference sequence (nucleotide FASTA)
type:: data:seq:nucleotide
required:: True
disabled:: False
hidden:: False

annotation

label:: Annotation file (GTF/GFF3)
type:: data:annotation
description:: Insert known annotations into genome indices at the indexing stage.
required:: False
disabled:: False
hidden:: False

source

label:: Gene ID Database Source
type:: basic:string
required:: False
disabled:: annotation
hidden:: False
choices::
ENSEMBL: ENSEMBL
NCBI: NCBI
UCSC: UCSC

annotation_options.feature_exon

label:: Feature type [--sjdbGTFfeatureExon]
type:: basic:string
description:: Feature type in GTF file to be used as exons for building transcripts.
required:: True
disabled:: False
hidden:: False
default:: exon

annotation_options.sjdb_overhang

label:: Junction length [--sjdbOverhang]
type:: basic:integer
description:: This parameter specifies the length of the genomic sequence around the annotated junction to be used in constructing the splice junction database. Ideally, this length should be equal to ReadLength-1, where ReadLength is the length of the reads. For instance, for Illumina 2x100b paired-end reads, the ideal value is 100-1=99. In the case of reads of varying length, the ideal value is max(ReadLength)-1. In most cases, the default value of 100 will work as well as the ideal value.
required:: True
disabled:: False
hidden:: False
default:: 100

advanced.genome_sa_string_len

label:: Small genome adjustment [--genomeSAindexNbases]
type:: basic:integer
description:: For small genomes, the parameter --genomeSAindexNbases needs to be scaled down, with a typical value of min(14, log2(GenomeLength)/2 - 1). For example, this is 9 for a 1 megabase genome and 7 for a 100 kilobase genome.
required:: False
disabled:: False
hidden:: False
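
The scaling formula can be evaluated with a one-line helper. Rounding to the nearest integer is an assumption inferred from the worked examples (log2(10^6)/2 - 1 is about 8.97, quoted as 9); the function name is hypothetical.

```python
import math


def genome_sa_index_nbases(genome_length: int) -> int:
    """min(14, log2(GenomeLength)/2 - 1), rounded to the nearest integer."""
    return min(14, round(math.log2(genome_length) / 2 - 1))
```

For a 1 Mb genome this returns 9, for 100 kb it returns 7, and for a human-sized genome the cap of 14 applies.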

advanced.genome_chr_bin_size

label:: Bin size for genome storage [--genomeChrBinNbits]
type:: basic:integer
description:: If you are using a genome with a large (>5,000) number of references (chromosomes/scaffolds), you may need to reduce --genomeChrBinNbits to reduce RAM consumption. The following scaling is recommended: --genomeChrBinNbits = min(18, log2(GenomeLength / NumberOfReferences)). For example, for a 3 gigabase genome with 100,000 chromosomes/scaffolds, this is equal to 15.
required:: False
disabled:: False
hidden:: False
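
The recommended scaling can likewise be computed. As above, rounding to the nearest integer is an assumption based on the example (log2(3*10^9 / 10^5) is about 14.87, quoted as 15), and the function name is hypothetical.

```python
import math


def genome_chr_bin_nbits(genome_length: int, n_references: int) -> int:
    """min(18, log2(GenomeLength / NumberOfReferences)), rounded to the nearest integer."""
    return min(18, round(math.log2(genome_length / n_references)))
```

A 3 Gb assembly with only a few dozen chromosomes hits the cap of 18, so the adjustment only matters for highly fragmented assemblies.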

advanced.genome_sa_sparsity

label:: Suffix array sparsity [--genomeSAsparseD]
type:: basic:integer
description:: Suffix array sparsity, i.e. distance between indices: use bigger numbers to decrease the needed RAM at the cost of reduced mapping speed (integer > 0, default = 1).
required:: False
disabled:: False
hidden:: False

index

label:: Indexed genome
type:: basic:dir
required:: True
disabled:: False
hidden:: False

fastagz

label:: FASTA file (compressed)
type:: basic:file
required:: True
disabled:: False
hidden:: False

fasta

label:: FASTA file
type:: basic:file
required:: True
disabled:: False
hidden:: False

fai

label:: FASTA file index
type:: basic:file
required:: True
disabled:: False
hidden:: False

source

label:: Gene ID source
type:: basic:string
required:: True
disabled:: False
hidden:: False

species

label:: Species
type:: basic:string
required:: True
disabled:: False
hidden:: False

build

label:: Build
type:: basic:string
required:: True
disabled:: False
hidden:: False

STAR-based gene quantification workflow

data:workflow:rnaseq:star:qc:workflow-bbduk-star-qc (data:reads:fastq reads, data:index:star genome, data:annotation annotation, basic:string assay_type, data:index:salmon cdna_index, data:index:star rrna_reference, data:index:star globin_reference, list:data:seq:nucleotide adapters, list:basic:string custom_adapter_sequences, basic:integer kmer_length, basic:integer min_k, basic:integer hamming_distance, basic:integer maxns, basic:integer trim_quality, basic:integer min_length, basic:string quality_encoding_offset, basic:boolean ignore_bad_quality, basic:boolean unstranded, basic:boolean noncannonical, basic:boolean chimeric, basic:integer chim_segment_min, basic:boolean quant_mode, basic:boolean single_end, basic:string out_filter_type, basic:integer out_multimap_max, basic:integer out_mismatch_max, basic:decimal out_mismatch_nl_max, basic:integer out_score_min, basic:decimal out_mismatch_nrl_max, basic:integer align_overhang_min, basic:integer align_sjdb_overhang_min, basic:integer align_intron_size_min, basic:integer align_intron_size_max, basic:integer align_gap_max, basic:string align_end_alignment, basic:boolean two_pass_mode, basic:boolean out_unmapped, basic:string out_sam_attributes, basic:string out_rg_line, basic:integer n_reads, basic:integer n_reads, basic:integer seed, basic:decimal fraction, basic:boolean two_pass)[Source: v1.4.0]

STAR-based RNA-seq pipeline. First, reads are preprocessed by __BBDuk__, which removes adapters, trims reads for quality from the 3'-end, and discards reads that are too short after trimming. Compared to similar tools, BBDuk is regarded for its computational efficiency. Next, preprocessed reads are aligned by the __STAR__ aligner. At the time of implementation, STAR was considered a state-of-the-art tool that consistently produces accurate results from diverse sets of reads and performs well even with default settings. STAR also counts and reports the number of aligned reads per gene while mapping. The STAR version used is 2.7.10b. For more information, see [this comparison of RNA-seq aligners](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5792058/). The rRNA contamination rate of the sample is determined using the STAR aligner: quality-trimmed reads are downsampled (using the __Seqtk__ tool) and aligned to the rRNA reference sequences. The alignment rate indicates the percentage of reads in the sample that are derived from rRNA sequences. The final step of the workflow is QoRTs QC analysis with downsampled reads.
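
As a hedged illustration of the rRNA contamination rate, the sketch below parses a STAR Log.final.out-style summary from the rRNA alignment and reports aligned reads as a percentage of input reads. The label strings are assumed from STAR's summary log, counting multi-mapping reads toward the rate is an assumption, and the function names are hypothetical; this is not necessarily how the workflow computes its reported value.

```python
from typing import Dict


def parse_star_final_log(text: str) -> Dict[str, str]:
    """Parse 'label | value' lines; section headers without '|' are skipped."""
    stats: Dict[str, str] = {}
    for line in text.splitlines():
        if "|" in line:
            key, value = line.split("|", 1)
            stats[key.strip()] = value.strip()
    return stats


def rrna_rate(stats: Dict[str, str]) -> float:
    """Percentage of input reads aligned (uniquely or to multiple loci) to rRNA."""
    total = int(stats["Number of input reads"])
    aligned = (int(stats["Uniquely mapped reads number"])
               + int(stats["Number of reads mapped to multiple loci"]))
    return 100.0 * aligned / total if total else 0.0
```

Because the reads are downsampled first, the rate is an estimate for the whole sample rather than an exact count.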

reads

label:: Reads (FASTQ)
type:: data:reads:fastq
description:: Reads in FASTQ file, single or paired end.
required:: True
disabled:: False
hidden:: False

genome

label:: Indexed reference genome
type:: data:index:star
description:: Genome index prepared by STAR aligner indexing tool.
required:: True
disabled:: False
hidden:: False

annotation

label:: Annotation
type:: data:annotation
description:: GTF and GFF3 annotation formats are supported.
required:: True
disabled:: False
hidden:: False

assay_type

label:: Assay type
type:: basic:string
description:: In a strand non-specific assay, a read is considered overlapping with a feature regardless of whether it is mapped to the same or the opposite strand as the feature. In a strand-specific forward assay with single-end reads, the read has to be mapped to the same strand as the feature. For paired-end reads, the first read has to be on the same strand and the second read on the opposite strand. In a strand-specific reverse assay, these rules are reversed.
required:: True
disabled:: False
hidden:: False
default:: non_specific
choices::
Strand non-specific: non_specific
Strand-specific forward: forward
Strand-specific reverse: reverse
Detect automatically: auto
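
The strand rules for each assay type can be written as one predicate. This is a minimal sketch with a hypothetical function name: `mate` is 1 for single-end reads and first mates and 2 for second mates. The `auto` choice is not modeled, since the workflow first detects the strandedness and then applies one of the three rules.

```python
def strand_compatible(assay: str, read_strand: str, feature_strand: str,
                      mate: int = 1) -> bool:
    """Decide whether a read may be counted for a feature given the assay type."""
    if assay == "non_specific":
        return True
    expect_same_strand = mate == 1  # forward: mate 1 same strand, mate 2 opposite
    if assay == "reverse":
        expect_same_strand = not expect_same_strand
    elif assay != "forward":
        raise ValueError(f"unsupported assay type: {assay}")
    return (read_strand == feature_strand) == expect_same_strand
```

For a forward assay, a second mate on the "-" strand is compatible with a "+" strand feature; in a reverse assay the same holds for a first mate.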

cdna_index

label:: Indexed cDNA reference sequence
type:: data:index:salmon
description:: Transcriptome index file created using the Salmon indexing tool. The cDNA (transcriptome) sequences used to create the index must be derived from the same species as the input sequencing reads to obtain reliable analysis results.
required:: False
disabled:: False
hidden:: assay_type != 'auto'

rrna_reference

label:: Indexed rRNA reference sequence
type:: data:index:star
description:: Reference sequence index prepared by STAR aligner indexing tool.
required:: True
disabled:: False
hidden:: False

globin_reference

label:: Indexed Globin reference sequence
type:: data:index:star
description:: Reference sequence index prepared by STAR aligner indexing tool.
required:: True
disabled:: False
hidden:: False

preprocessing.adapters

label:: Adapters
type:: list:data:seq:nucleotide
description:: FASTA file(s) with adapters.
required:: False
disabled:: False
hidden:: False

preprocessing.custom_adapter_sequences

label:: Custom adapter sequences
type:: list:basic:string
description:: Custom adapter sequences can be specified by inputting them one by one and pressing Enter after each sequence.
required:: False
disabled:: False
hidden:: False
default:: []

preprocessing.kmer_length

label:: K-mer length [k=]
type:: basic:integer
description:: K-mer length used for finding contaminants. Contaminants shorter than k-mer length will not be found. K-mer length must be at least 1.
required:: True
disabled:: False
hidden:: False
default:: 23

preprocessing.min_k

label:: Minimum k-mer length at right end of reads used for trimming [mink=]
type:: basic:integer
required:: True
disabled:: preprocessing.adapters.length === 0 && preprocessing.custom_adapter_sequences.length === 0
hidden:: False
default:: 11

preprocessing.hamming_distance

label:: Maximum Hamming distance for k-mers [hammingdistance=]
type:: basic:integer
description:: Hamming distance, i.e. the number of mismatches allowed in the k-mer.
required:: True
disabled:: False
hidden:: False
default:: 1
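
How `k` and the Hamming distance interact can be shown with a brute-force sketch: every k-mer of every adapter is compared with every k-mer window of the read, allowing up to `max_dist` mismatches. BBDuk itself uses hashed k-mer tables and is far faster, and mink-based trimming at the read end is not modeled; the function names are hypothetical.

```python
from typing import Iterable


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def has_contaminant_kmer(read: str, adapters: Iterable[str],
                         k: int = 23, max_dist: int = 1) -> bool:
    """True if any k-mer window of the read is within max_dist of an adapter k-mer."""
    # Adapters shorter than k contribute no k-mers, so they can never be found.
    kmers = {ad[i:i + k] for ad in adapters for i in range(len(ad) - k + 1)}
    for i in range(len(read) - k + 1):
        window = read[i:i + k]
        if any(hamming(window, kmer) <= max_dist for kmer in kmers):
            return True
    return False
```

With the default `max_dist=1`, a read carrying an adapter k-mer with a single sequencing error is still detected, while an adapter shorter than `k` is never matched.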

preprocessing.maxns

label:: Max Ns after trimming [maxns=]
type:: basic:integer
description:: If non-negative, reads with more Ns than this (after trimming) will be discarded.
required:: True
disabled:: False
hidden:: False
default:: -1

preprocessing.trim_quality

label:: Average quality below which to trim region [trimq=]
type:: basic:integer
description:: Phred algorithm is used, which is more accurate than naive trimming.
required:: True
disabled:: False
hidden:: False
default:: 10

preprocessing.min_length

label:: Minimum read length [minlength=]
type:: basic:integer
description:: Reads shorter than minimum read length after trimming are discarded.
required:: True
disabled:: False
hidden:: False
default:: 20

preprocessing.quality_encoding_offset

label:: Quality encoding offset [qin=]
type:: basic:string
description:: Quality encoding offset for input FASTQ files.
required:: True
disabled:: False
hidden:: False
default:: auto
choices::
Sanger / Illumina 1.8+: 33
Illumina up to 1.3+, 1.5+: 64
Auto: auto
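
The difference between the two offsets, and a deliberately simple way to guess between them, can be sketched as follows. The `guess_offset` heuristic is an assumption for illustration and is not BBDuk's auto-detection: Phred+33 data in which every quality is Q31 or higher would be misclassified as Phred+64.

```python
from typing import Iterable, List


def decode_qualities(qual: str, offset: int) -> List[int]:
    """Convert a FASTQ quality string to Phred scores for the given offset."""
    return [ord(c) - offset for c in qual]


def guess_offset(qual_lines: Iterable[str]) -> int:
    """Heuristic: any character below '@' (ASCII 64) implies Phred+33."""
    lowest = min(ord(c) for line in qual_lines for c in line)
    return 33 if lowest < 64 else 64
```

The same Q40 base is written as `I` with offset 33 and as `h` with offset 64.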

preprocessing.ignore_bad_quality

label:: Ignore bad quality [ignorebadquality]
type:: basic:boolean
description:: Don’t crash if quality values appear to be incorrect.
required:: True
disabled:: False
hidden:: False
default:: False

alignment.unstranded

label:: The data is unstranded [--outSAMstrandField intronMotif]
type:: basic:boolean
description:: For unstranded RNA-seq data, Cufflinks/Cuffdiff require spliced alignments with the XS strand attribute, which STAR generates with the --outSAMstrandField intronMotif option. The XS strand attribute will be generated for all alignments that contain splice junctions. Spliced alignments with an undefined strand (i.e. containing only non-canonical unannotated junctions) will be suppressed. If you have stranded RNA-seq data, you do not need any specific STAR options. Instead, run Cufflinks with the --library-type option. For example, cufflinks --library-type fr-firststrand should be used for the standard dUTP protocol, including Illumina's stranded TruSeq. The --library-type option applies only to Cufflinks runs, not to STAR runs.
required:: True
disabled:: False
hidden:: False
default:: False

alignment.noncannonical

label:: Remove non-canonical junctions (Cufflinks compatibility)
type:: basic:boolean
description:: It is recommended to remove the non-canonical junctions for Cufflinks runs using --outFilterIntronMotifs RemoveNoncanonical.
required:: True
disabled:: False
hidden:: False
default:: False

alignment.chimeric_reads.chimeric

label:: Detect chimeric and circular alignments [--chimOutType SeparateSAMold]
type:: basic:boolean
description:: To switch on detection of chimeric (fusion) alignments (in addition to normal mapping), --chimSegmentMin should be set to a positive value. Each chimeric alignment consists of two segments. Each segment is non-chimeric on its own, but the segments are chimeric to each other (i.e. the segments belong to different chromosomes, or different strands, or are far from each other). Both segments may contain splice junctions, and one of the segments may contain portions of both mates. The --chimSegmentMin parameter controls the minimum allowed mapped length of each of the two segments. For example, with 2x75 reads and --chimSegmentMin 20, a chimeric alignment with 130b on one chromosome and 20b on the other will be output, while 135 + 15 will not.
required:: True
disabled:: False
hidden:: False
default:: False

alignment.chimeric_reads.chim_segment_min

label:: Minimum length of chimeric segment [--chimSegmentMin]
type:: basic:integer
required:: True
disabled:: !alignment.chimeric_reads.chimeric
hidden:: False
default:: 20

alignment.transcript_output.quant_mode

label:: Output in transcript coordinates [--quantMode]
type:: basic:boolean
description:: With the --quantMode TranscriptomeSAM option, STAR will output alignments translated into transcript coordinates in the Aligned.toTranscriptome.out.bam file (in addition to alignments in genomic coordinates in Aligned.*.sam/bam files). These transcriptomic alignments can be used with transcript quantification software that requires reads to be mapped to the transcriptome, such as RSEM or eXpress.
required:: True
disabled:: False
hidden:: False
default:: False

alignment.transcript_output.single_end

label:: Allow soft-clipping and indels [--quantTranscriptomeBan Singleend]
type:: basic:boolean
description:: By default, the output satisfies RSEM requirements: soft-clipping and indels are not allowed. Use --quantTranscriptomeBan Singleend to allow insertions, deletions and soft-clips in the transcriptomic alignments, which some expression quantification software (e.g. eXpress) can use.
required:: True
disabled:: !t_coordinates.quant_mode
hidden:: False
default:: False

alignment.filtering_options.out_filter_type

label:: Type of filtering [--outFilterType]
type:: basic:string
description:: Normal: standard filtering using only current alignment; BySJout: keep only those reads that contain junctions that passed filtering into SJ.out.tab.
required:: True
disabled:: False
hidden:: False
default:: Normal
choices::
Normal: Normal
BySJout: BySJout

alignment.filtering_options.out_multimap_max

label:: Maximum number of loci [--outFilterMultimapNmax]
type:: basic:integer
description:: Maximum number of loci the read is allowed to map to. Alignments (all of them) will be output only if the read maps to no more loci than this value. Otherwise no alignments will be output, and the read will be counted as ‘mapped to too many loci’ (default: 10).
required:: False
disabled:: False
hidden:: False

alignment.filtering_options.out_mismatch_max

label:: Maximum number of mismatches [--outFilterMismatchNmax]
type:: basic:integer
description:: Alignment will be output only if it has no more mismatches than this value (default: 10). A large number (e.g. 999) switches off this filter.
required:: False
disabled:: False
hidden:: False

alignment.filtering_options.out_mismatch_nl_max

label:: Maximum no. of mismatches (map length) [--outFilterMismatchNoverLmax]
type:: basic:decimal
description:: Alignment will be output only if its ratio of mismatches to *mapped* length is less than or equal to this value (default: 0.3). The value should be between 0.0 and 1.0.
required:: False
disabled:: False
hidden:: False

alignment.filtering_options.out_score_min

label:: Minimum alignment score [--outFilterScoreMin]
type:: basic:integer
description:: Alignment will be output only if its score is higher than or equal to this value (default: 0).
required:: False
disabled:: False
hidden:: False

alignment.filtering_options.out_mismatch_nrl_max

label:: Maximum no. of mismatches (read length) [--outFilterMismatchNoverReadLmax]
type:: basic:decimal
description:: Alignment will be output only if its ratio of mismatches to *read* length is less than or equal to this value (default: 1.0). For example, with a value of 0.04 and 2x100bp reads, the maximum number of mismatches for the paired read is 0.04*200 = 8. The value should be between 0.0 and 1.0.
required:: False
disabled:: False
hidden:: False

alignment.alignment_options.align_overhang_min

label:: Minimum overhang [--alignSJoverhangMin]
type:: basic:integer
description:: Minimum overhang (i.e. block size) for spliced alignments (default: 5).
required:: False
disabled:: False
hidden:: False

alignment.alignment_options.align_sjdb_overhang_min

label:: Minimum overhang (sjdb) [--alignSJDBoverhangMin]
type:: basic:integer
description:: Minimum overhang (i.e. block size) for annotated (sjdb) spliced alignments (default: 3).
required:: False
disabled:: False
hidden:: False

alignment.alignment_options.align_intron_size_min

label:: Minimum intron size [--alignIntronMin]
type:: basic:integer
description:: Minimum intron size: a genomic gap is considered an intron if its length >= alignIntronMin, otherwise it is considered a deletion (default: 21).
required:: False
disabled:: False
hidden:: False

alignment.alignment_options.align_intron_size_max

label:: Maximum intron size [--alignIntronMax]
type:: basic:integer
description:: Maximum intron size. If 0, the maximum intron size will be determined by (2^winBinNbits)*winAnchorDistNbins (default: 0).
required:: False
disabled:: False
hidden:: False

alignment.alignment_options.align_gap_max

label:: Maximum gap between mates [--alignMatesGapMax]
type:: basic:integer
description:: Maximum gap between two mates. If 0, the maximum gap will be determined by (2^winBinNbits)*winAnchorDistNbins (default: 0).
required:: False
disabled:: False
hidden:: False

alignment.alignment_options.align_end_alignment

label:

Read ends alignment [–alignEndsType]

type:

basic:string

description:

Type of read ends alignment (default: Local). Local: standard local alignment with soft-clipping allowed. EndToEnd: force end-to-end read alignment, do not soft-clip. Extend5pOfRead1: fully extend only the 5' end of read 1; all other ends use local alignment. Extend5pOfReads12: fully extend only the 5' ends of both read 1 and read 2; all other ends use local alignment.

required:

True

disabled:

False

hidden:

False

default:

Local

choices:

Local: Local
EndToEnd: EndToEnd
Extend5pOfRead1: Extend5pOfRead1
Extend5pOfReads12: Extend5pOfReads12

alignment.two_pass_mapping.two_pass_mode

label:: Use two pass mode [--twopassMode]
type:: basic:boolean
description:: Use two-pass mapping instead of first-pass only. In two-pass mode, first-pass mapping is performed first; the detected junctions are then extracted and inserted into the genome index, and all reads are re-mapped in the second mapping pass.
required:: True
disabled:: False
hidden:: False
default:: True

alignment.output_options.out_unmapped

label:: Output unmapped reads (SAM) [--outSAMunmapped Within]
type:: basic:boolean
description:: Output unmapped reads within the main SAM output.
required:: True
disabled:: False
hidden:: False
default:: True

alignment.output_options.out_sam_attributes

label:

Desired SAM attributes [--outSAMattributes]

type:

basic:string

description:

A string of desired SAM attributes, in the order desired for the output SAM.

required:

True

disabled:

False

hidden:

False

default:

Standard

choices:

Standard: Standard
All: All
NH HI NM MD: NH HI NM MD
None: None

alignment.output_options.out_rg_line

label:: SAM/BAM read group line [--outSAMattrRGline]
type:: basic:string
description:: The first word contains the read group identifier and must start with ID:, e.g. --outSAMattrRGline ID:xxx CN:yy "DS:z z z". Here xxx will be added as the RG tag to each output alignment. Any spaces in the tag values have to be double quoted. Comma-separated RG lines correspond to different (comma-separated) input files in --readFilesIn. Commas have to be surrounded by spaces, e.g. --outSAMattrRGline ID:xxx , ID:zzz "DS:z z" , ID:yyy DS:yyyy.
required:: False
disabled:: False
hidden:: False
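The quoting and comma rules above are easy to get wrong by hand. The hypothetical helper below is not part of STAR or resolwe-bio. It assembles an --outSAMattrRGline value from one list of (tag, value) pairs per input file, double-quotes any field that contains a space, and puts spaces around the separating commas.

```python
from typing import List, Sequence, Tuple


def rg_line(read_groups: Sequence[Sequence[Tuple[str, str]]]) -> str:
    """Build a --outSAMattrRGline value, one read group per input file."""
    groups: List[str] = []
    for fields in read_groups:
        if not fields or fields[0][0] != "ID":
            raise ValueError("each read group must start with an ID field")
        tokens = []
        for tag, value in fields:
            token = f"{tag}:{value}"
            if " " in token:
                token = f'"{token}"'  # values with spaces must be double quoted
            tokens.append(token)
        groups.append(" ".join(tokens))
    return " , ".join(groups)  # commas separating files need spaces around them


print(rg_line([
    [("ID", "xxx")],
    [("ID", "zzz"), ("DS", "z z")],
    [("ID", "yyy"), ("DS", "yyyy")],
]))
# prints: ID:xxx , ID:zzz "DS:z z" , ID:yyy DS:yyyy
```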

quantification.n_reads

label:: Number of reads in subsampled alignment file for strandedness detection
type:: basic:integer
description:: Alignment (.bam) file subsample size. Increase the number of reads to make automatic detection more reliable. Decrease the number of reads to make automatic detection run faster.
required:: True
disabled:: False
hidden:: assay_type != 'auto'
default:: 5000000

downsampling.n_reads

label:: Number of reads
type:: basic:integer
description:: Number of reads to include in downsampling.
required:: True
disabled:: False
hidden:: False
default:: 1000000

downsampling.advanced.seed

label:: Seed [-s]
type:: basic:integer
description:: Using the same random seed makes read downsampling more reproducible across different environments.
required:: True
disabled:: False
hidden:: False
default:: 11

downsampling.advanced.fraction

label:: Fraction of reads used
type:: basic:decimal
description:: Use the fraction of reads [0.0 - 1.0] from the original input file instead of the absolute number of reads. If set, this will override the ‘Number of reads’ input parameter.
required:: False
disabled:: False
hidden:: False

downsampling.advanced.two_pass

label:: 2-pass mode [-2]
type:: basic:boolean
description:: Enable two-pass mode when downsampling. Two-pass mode is twice as slow but uses much less memory.
required:: True
disabled:: False
hidden:: False
default:: False
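The memory trade-off of two-pass mode can be illustrated with a generic two-pass sampler. The first pass only counts records. A seeded random generator then picks which record indices to keep, and the second pass re-reads the input and emits those records. This is a minimal Python sketch of that general strategy, not seqtk's actual implementation; `open_records` is a hypothetical stand-in for re-opening the FASTQ file.

```python
import random
from typing import Callable, Iterable, Iterator, TypeVar

T = TypeVar("T")


def two_pass_sample(open_records: Callable[[], Iterable[T]], n: int, seed: int = 11) -> Iterator[T]:
    """Pick n records at random while holding only their indices in memory."""
    # Pass 1: count the records without storing them.
    total = sum(1 for _ in open_records())
    # Choose which positions to keep; the seed makes the choice reproducible.
    rng = random.Random(seed)
    keep = set(rng.sample(range(total), min(n, total)))
    # Pass 2: read the input again and emit the chosen records in input order.
    for index, record in enumerate(open_records()):
        if index in keep:
            yield record


reads = [f"read{i}" for i in range(100)]
subset = list(two_pass_sample(lambda: reads, 10, seed=11))
print(len(subset))  # prints 10
```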

Salmon Index

data:index:salmon:salmon-index (data:seq:nucleotide nucl, data:file decoys, basic:boolean gencode, basic:boolean keep_duplicates, basic:string source, basic:string species, basic:string build, basic:integer kmerlen)[Source: v2.2.1]

Generate index files for Salmon transcript quantification tool.

nucl

label:: Nucleotide sequence
type:: data:seq:nucleotide
description:: A CDS sequence file in .FASTA format.

decoys

label:: Decoys
type:: data:file
description:: Treat these sequences as decoys that may have sequences homologous to some known transcripts.
required:: False

gencode

label:: Gencode
type:: basic:boolean
description:: This flag will expect the input transcript FASTA to be in GENCODE format, and will split the transcript name at the first ‘|’ character. These reduced names will be used in the output and when looking for these transcripts in a gene to transcript GTF.
default:: False
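The name reduction described for the Gencode flag amounts to keeping everything before the first '|'. The Python sketch below illustrates that behaviour. It is not Salmon's code, and the header is an illustrative GENCODE-style example.

```python
def gencode_transcript_id(fasta_header: str) -> str:
    """Reduce a GENCODE-style FASTA header to the part before the first '|'."""
    name = fasta_header.lstrip(">")  # drop the FASTA '>' marker if present
    return name.split("|", 1)[0]


header = ">ENST00000456328.2|ENSG00000290825.1|-|-|DDX11L2-202|DDX11L2|1657|lncRNA|"
print(gencode_transcript_id(header))  # prints ENST00000456328.2
```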

keep_duplicates

label:: Keep duplicates
type:: basic:boolean
description:: This flag will disable the default indexing behavior of discarding sequence-identical duplicate transcripts. If this flag is passed, then duplicate transcripts that appear in the input will be retained and quantified separately.
default:: False

source

label:

Source of attribute ID

type:

basic:string

choices:

DICTYBASE: DICTYBASE
ENSEMBL: ENSEMBL
NCBI: NCBI
UCSC: UCSC

species

label:

Species

type:

basic:string

description:

Species latin name.

choices:

Homo sapiens: Homo sapiens
Mus musculus: Mus musculus
Rattus norvegicus: Rattus norvegicus
Dictyostelium discoideum: Dictyostelium discoideum

build

label:: Genome build
type:: basic:string

kmerlen

label:: Size of k-mers
type:: basic:integer
description:: The size of k-mers that should be used for the quasi index. We find that a k of 31 seems to work well for reads of 75bp or longer, but you might consider a smaller k if you plan to deal with shorter reads.
default:: 31
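A read of length L contains L - k + 1 overlapping k-mers, and none at all when L < k, which is why shorter reads call for a smaller k. The sketch below counts them for a made-up toy sequence.

```python
def kmers(seq: str, k: int = 31) -> list:
    """All overlapping k-mers of seq; empty if the read is shorter than k."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


read = "ACGT" * 19  # 76 bp toy read
print(len(kmers(read)))       # prints 46
print(len(kmers(read[:30])))  # prints 0
```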

index

label:: Salmon index
type:: basic:dir

source

label:: Source of attribute ID
type:: basic:string

species

label:: Species
type:: basic:string

build

label:: Build
type:: basic:string

Samtools bedcov

data:bedcov:samtools-bedcov (data:alignment:bam bam, data:bed bedfile, basic:integer min_read_qual, basic:boolean rm_del_ref_skips, basic:string output_option)[Source: v1.2.0]

Samtools bedcov. Reports the total read base count (i.e. the sum of per base read depths) for each genomic region specified in the supplied BED file. The regions are output as they appear in the BED file and are 0-based. The output is formatted as tab-delimited data, where the initial three columns indicate the chromosome, start, and end positions of the region. The subsequent column provides either the cumulative read base counts or the normalized sum of read base counts based on the length of each individual region (mean coverage). For more information about samtools bedcov, click [here](https://www.htslib.org/doc/samtools-bedcov.html).
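The Mean output option of this process divides the summed base count by the region length. A minimal Python sketch follows. It assumes the length is end - start (BED coordinates are 0-based and half-open) and that the count is the last column of a line produced from a single BAM file; the resolwe-bio process may compute it differently.

```python
def bedcov_mean(line: str) -> tuple:
    """Convert one samtools bedcov output line into (chrom, start, end, mean coverage)."""
    fields = line.rstrip("\n").split("\t")
    chrom, start, end = fields[0], int(fields[1]), int(fields[2])
    base_count = int(fields[-1])  # sum of per-base depths for one BAM
    length = end - start
    mean = base_count / length if length > 0 else 0.0
    return chrom, start, end, mean


print(bedcov_mean("chr1\t100\t200\t5000"))  # prints ('chr1', 100, 200, 50.0)
```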

bam

label:: Input BAM file
type:: data:alignment:bam
required:: True
disabled:: False
hidden:: False

bedfile

label:: Target BED file
type:: data:bed
description:: Target BED file with regions to extract.
required:: True
disabled:: False
hidden:: False

advanced.min_read_qual

label:: Minimum read mapping quality
type:: basic:integer
description:: Only count reads with mapping quality greater than or equal to this value [-Q].
required:: False
disabled:: False
hidden:: False

advanced.rm_del_ref_skips

label:: Skip deletions and ref skips
type:: basic:boolean
description:: Do not include deletions (D) and ref skips (N) in bedcov computation. [-j]
required:: True
disabled:: False
hidden:: False
default:: False

advanced.output_option

label:

Metric by which to output coverage

type:

basic:string

description:

Output either the cumulative read base counts (sum) or the read base counts normalized by the length of each region (mean). The mean option is not part of samtools but is implemented within the resolwe-bio process.

required:

False

disabled:

False

hidden:

False

default:

sum

choices:

Sum (default): sum
Mean: mean

coverage_report

label:: Output coverage report
type:: basic:file
required:: True
disabled:: False
hidden:: False

Samtools coverage (multi-sample)

data:samtoolscoverage:multi:samtools-coverage-multi (list:data:alignment:bam bam, basic:string region, basic:integer min_read_length, basic:integer min_mq, basic:integer min_bq, list:basic:string excl_flags, basic:integer depth, basic:boolean no_header)[Source: v1.0.0]

Samtools coverage for multiple BAM files. Computes the depth at each position or region and creates tabulated text. For more information about samtools coverage, click [here](https://www.htslib.org/doc/samtools-coverage.html).

bam

label:: Input BAM files
type:: list:data:alignment:bam
description:: Select BAM file(s) for the analysis. Coverage information will be calculated from the merged alignments.
required:: True
disabled:: False
hidden:: False

region

label:: Region
type:: basic:string
description:: Region can be specified as RNAME:STARTPOS-ENDPOS, where RNAME is the name of the contig and all position coordinates are 1-based. If the input BAM file was generated by the General RNA-seq pipeline, use only chromosome numbers to subset the input file, e.g. 3:30293-39103.
required:: False
disabled:: False
hidden:: False

advanced.min_read_length

label:: Minimum read length
type:: basic:integer
description:: Ignore reads shorter than the specified number of base pairs.
required:: False
disabled:: False
hidden:: False

advanced.min_mq

label:: Minimum mapping quality
type:: basic:integer
description:: Minimum mapping quality for an alignment to be used.
required:: False
disabled:: False
hidden:: False

advanced.min_bq

label:: Minimum base quality
type:: basic:integer
description:: Minimum base quality for a base to be considered.
required:: False
disabled:: False
hidden:: False

advanced.excl_flags

label:: Filter flags
type:: list:basic:string
description:: Filter flags: skip reads with mask bits set. Press ENTER after each flag.
required:: True
disabled:: False
hidden:: False
default:: ['UNMAP', 'SECONDARY', 'QCFAIL', 'DUP']
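Each flag name stands for one SAM FLAG bit, and a read is skipped when any of its bits overlaps the combined mask. The Python sketch below uses the flag names and bit values from the SAM specification and samtools. The helper names are hypothetical.

```python
# SAM FLAG bits, named as in samtools.
SAM_FLAGS = {
    "PAIRED": 0x1,
    "PROPER_PAIR": 0x2,
    "UNMAP": 0x4,
    "MUNMAP": 0x8,
    "REVERSE": 0x10,
    "MREVERSE": 0x20,
    "READ1": 0x40,
    "READ2": 0x80,
    "SECONDARY": 0x100,
    "QCFAIL": 0x200,
    "DUP": 0x400,
    "SUPPLEMENTARY": 0x800,
}


def flag_mask(names) -> int:
    """Combine flag names into a single integer mask."""
    mask = 0
    for name in names:
        mask |= SAM_FLAGS[name]
    return mask


def is_skipped(read_flag: int, mask: int) -> bool:
    """A read is skipped if any of its FLAG bits is set in the mask."""
    return bool(read_flag & mask)


default_mask = flag_mask(["UNMAP", "SECONDARY", "QCFAIL", "DUP"])
print(default_mask)                    # prints 1796
print(is_skipped(99, default_mask))    # prints False (mapped, properly paired read 1)
print(is_skipped(1123, default_mask))  # prints True (the same read marked as duplicate)
```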

advanced.depth

label:: Maximum allowed coverage depth
type:: basic:integer
description:: If 0, depth is set to the maximum integer value, effectively removing any depth limit.
required:: True
disabled:: False
hidden:: False
default:: 1000000

advanced.no_header

label:: No header
type:: basic:boolean
description:: Do not output header.
required:: True
disabled:: False
hidden:: False
default:: False
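The inputs above correspond to samtools coverage command-line options (--region, --min-read-len, --min-MQ, --min-BQ, --excl-flags, --depth, --no-header). The hypothetical Python helper below only builds the argument list and does not run samtools. It is a sketch of the mapping, not the resolwe-bio implementation.

```python
import shlex
from typing import List, Optional, Sequence


def coverage_command(
    bams: Sequence[str],
    region: Optional[str] = None,
    min_read_length: Optional[int] = None,
    min_mq: Optional[int] = None,
    min_bq: Optional[int] = None,
    excl_flags: Sequence[str] = ("UNMAP", "SECONDARY", "QCFAIL", "DUP"),
    depth: int = 1000000,
    no_header: bool = False,
) -> List[str]:
    """Assemble a samtools coverage command line from the process inputs."""
    cmd = ["samtools", "coverage"]
    if region:
        cmd += ["--region", region]
    if min_read_length is not None:
        cmd += ["--min-read-len", str(min_read_length)]
    if min_mq is not None:
        cmd += ["--min-MQ", str(min_mq)]
    if min_bq is not None:
        cmd += ["--min-BQ", str(min_bq)]
    cmd += ["--excl-flags", ",".join(excl_flags)]
    cmd += ["--depth", str(depth)]
    if no_header:
        cmd.append("--no-header")
    cmd += list(bams)  # several BAM files are combined into one coverage table
    return cmd


print(shlex.join(coverage_command(["a.bam", "b.bam"], region="3:30293-39103", min_mq=20)))
```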

table

label:: Output coverage table
type:: basic:file
required:: True
disabled:: False
hidden:: False

build

label:: Build
type:: basic:string
required:: True
disabled:: False
hidden:: False

species

label:: Species
type:: basic:string
required:: True
disabled:: False
hidden:: False

Samtools coverage (single-sample)

data:samtoolscoverage:single:samtools-coverage-single (data:alignment:bam bam, basic:string region, basic:integer min_read_length, basic:integer min_mq, basic:integer min_bq, list:basic:string excl_flags, basic:integer depth, basic:boolean no_header)[Source: v1.0.0]

Samtools coverage for a single BAM file. Computes the depth at each position or region and creates tabulated text. For more information about samtools coverage, click [here](https://www.htslib.org/doc/samtools-coverage.html).

bam

label:: Input BAM file
type:: data:alignment:bam
description:: Select the BAM file for the analysis.
required:: True
disabled:: False
hidden:: False

region

label:: Region
type:: basic:string
description:: Region can be specified as RNAME:STARTPOS-ENDPOS, where RNAME is the name of the contig and all position coordinates are 1-based. If the input BAM file was generated by the General RNA-seq pipeline, use only chromosome numbers to subset the input file, e.g. 3:30293-39103.
required:: False
disabled:: False
hidden:: False

advanced.min_read_length

label:: Minimum read length
type:: basic:integer
description:: Ignore reads shorter than the specified number of base pairs.
required:: False
disabled:: False
hidden:: False

advanced.min_mq

label:: Minimum mapping quality
type:: basic:integer
description:: Minimum mapping quality for an alignment to be used.
required:: False
disabled:: False
hidden:: False

advanced.min_bq

label:: Minimum base quality
type:: basic:integer
description:: Minimum base quality for a base to be considered.
required:: False
disabled:: False
hidden:: False

advanced.excl_flags

label:: Filter flags
type:: list:basic:string
description:: Filter flags: skip reads with mask bits set. Press ENTER after each flag.
required:: True
disabled:: False
hidden:: False
default:: ['UNMAP', 'SECONDARY', 'QCFAIL', 'DUP']

advanced.depth

label:: Maximum allowed coverage depth
type:: basic:integer
description:: If 0, depth is set to the maximum integer value, effectively removing any depth limit.
required:: True
disabled:: False
hidden:: False
default:: 1000000

advanced.no_header

label:: No header
type:: basic:boolean
description:: Do not output header.
required:: True
disabled:: False
hidden:: False
default:: False

table

label:: Output coverage table
type:: basic:file
required:: True
disabled:: False
hidden:: False

build

label:: Build
type:: basic:string
required:: True
disabled:: False
hidden:: False

species

label:: Species
type:: basic:string
required:: True
disabled:: False
hidden:: False

Samtools fastq (paired-end)

data:reads:fastq:paired:bamtofastq:bamtofastq-paired (data:alignment:bam bam)[Source: v1.3.2]

Convert aligned reads in BAM format to paired-end FASTQ format.

bam

label:: BAM file
type:: data:alignment:bam
required:: True
disabled:: False
hidden:: False

fastq

label:: Remaining mate1 reads
type:: list:basic:file
required:: True
disabled:: False
hidden:: False

fastq2

label:: Remaining mate2 reads
type:: list:basic:file
required:: True
disabled:: False
hidden:: False

fastqc_url

label:: Mate1 quality control with FastQC
type:: list:basic:file:html
required:: True
disabled:: False
hidden:: False

fastqc_url2

label:: Mate2 quality control with FastQC
type:: list:basic:file:html
required:: True
disabled:: False
hidden:: False

fastqc_archive

label:: Download mate1 FastQC archive
type:: list:basic:file
required:: True
disabled:: False
hidden:: False

fastqc_archive2

label:: Download mate2 FastQC archive
type:: list:basic:file
required:: True
disabled:: False
hidden:: False

Samtools idxstats

data:samtools:idxstats:samtools-idxstats (data:alignment:bam alignment)[Source: v1.4.2]

Retrieve and print stats in the index file.
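The samtools idxstats report is a tab-separated table with one line per reference sequence: sequence name, sequence length, number of mapped read segments and number of unmapped read segments, with a final `*` line for unmapped reads that have no assigned reference. A small Python sketch that totals such a report follows; the sample text is made up.

```python
def summarize_idxstats(report: str) -> dict:
    """Sum mapped and unmapped read-segment counts over all idxstats lines."""
    totals = {"mapped": 0, "unmapped": 0}
    for line in report.strip().splitlines():
        name, length, mapped, unmapped = line.split("\t")
        totals["mapped"] += int(mapped)
        totals["unmapped"] += int(unmapped)
    return totals


example = "chr1\t248956422\t1500\t10\nchr2\t242193529\t900\t5\n*\t0\t0\t40\n"
print(summarize_idxstats(example))  # prints {'mapped': 2400, 'unmapped': 55}
```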

alignment

label:: Alignment
type:: data:alignment:bam
required:: True
disabled:: False
hidden:: False

report

label:: Samtools idxstats report
type:: basic:file
required:: True
disabled:: False
hidden:: False

Samtools view

data:alignment:bam:samtools:samtools-view (data:alignment:bam bam, basic:string region, data:bed bedfile, basic:boolean include_header, basic:boolean only_header, basic:decimal subsample, basic:integer subsample_seed, basic:integer threads)[Source: v1.0.1]

Samtools view. With no options or regions specified, all alignments in the input BAM file are saved to the output, also in BAM format. You may specify one or more space-separated region specifications to restrict output to only those alignments which overlap the specified region(s). For more information about samtools view, click [here](https://www.htslib.org/doc/samtools-view.html).

bam

label:: Input BAM file
type:: data:alignment:bam
required:: True
disabled:: False
hidden:: False

region

label:: Region
type:: basic:string
description:: Region can be specified as RNAME:STARTPOS-ENDPOS, where RNAME is the name of the contig and all position coordinates are 1-based. If the input BAM file was generated by the General RNA-seq pipeline, use only chromosome numbers to subset the input file, e.g. 3:30293-39103.
required:: False
disabled:: False
hidden:: bedfile

bedfile

label:: Target BED file
type:: data:bed
description:: Target BED file with regions to extract. If the input BAM file was generated by the General RNA-seq pipeline, use only chromosome numbers in the first column of the BED file, e.g. 3 30292 39103 (tab-separated).
required:: False
disabled:: False
hidden:: region
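The Region field uses 1-based, inclusive coordinates, while BED files are 0-based and half-open, so the same interval starts one position lower in BED. A short Python sketch of the conversion follows (the function name is hypothetical).

```python
import re


def region_to_bed(region: str) -> str:
    """Convert a 1-based, inclusive RNAME:STARTPOS-ENDPOS region to a 0-based, half-open BED line."""
    match = re.fullmatch(r"(.+):(\d+)-(\d+)", region)
    if match is None:
        raise ValueError(f"not a RNAME:STARTPOS-ENDPOS region: {region!r}")
    rname, start, end = match.group(1), int(match.group(2)), int(match.group(3))
    return f"{rname}\t{start - 1}\t{end}"


print(region_to_bed("3:30293-39103"))  # prints 3, 30292 and 39103 separated by tabs
```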

advanced.include_header

label:: Include the header in the output
type:: basic:boolean
required:: True
disabled:: advanced.only_header
hidden:: False
default:: True

advanced.only_header

label:: Output the header only
type:: basic:boolean
description:: Selecting this option overrides all other options.
required:: True
disabled:: advanced.include_header
hidden:: False
default:: False

advanced.subsample

label:: Fraction of the input alignments
type:: basic:decimal
description:: Output only a proportion of the input alignments, as specified by 0.0 ≤ FLOAT ≤ 1.0, which gives the fraction of templates/pairs to be kept. This subsampling acts in the same way on all of the alignment records in the same template or read pair, so it never keeps a read but not its mate.
required:: False
disabled:: False
hidden:: False

advanced.subsample_seed

label:: Subsampling seed
type:: basic:integer
description:: Subsampling seed used to influence which subset of reads is kept. When subsampling data that has previously been subsampled, be sure to use a different seed value from those used previously; otherwise more reads will be retained than expected.
required:: True
disabled:: False
hidden:: !advanced.subsample
default:: 11
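Keeping mates together, and the advice about changing the seed, both follow from making the keep/drop decision a deterministic function of the read name and the seed. The Python sketch below illustrates that idea with SHA-256. samtools uses its own hash function, so the exact reads selected will differ.

```python
import hashlib


def keep_template(read_name: str, fraction: float, seed: int = 11) -> bool:
    """Deterministic keep/drop decision shared by all records with this read name."""
    digest = hashlib.sha256(f"{seed}:{read_name}".encode()).digest()
    u = (int.from_bytes(digest[:8], "big") >> 11) / 2**53  # 53 bits mapped exactly into [0, 1)
    return u < fraction


names = [f"read{i}" for i in range(10000)]
once = [n for n in names if keep_template(n, 0.5, seed=11)]
# Re-subsampling with the same seed removes nothing: each name gets the same value again.
same_seed = [n for n in once if keep_template(n, 0.5, seed=11)]
# A different seed gives independent values, so about half of the reads are dropped.
new_seed = [n for n in once if keep_template(n, 0.5, seed=12)]
print(len(once), len(same_seed), len(new_seed))
```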

advanced.threads

label:: Number of threads
type:: basic:integer
description:: Number of BAM compression threads to use in addition to the main thread.
required:: True
disabled:: False
hidden:: False
default:: 2

bam

label:: Output BAM file
type:: basic:file
required:: True
disabled:: False
hidden:: False

bai

label:: Output index file
type:: basic:file
required:: True
disabled:: False
hidden:: False

stats

label:: Alignment statistics
type:: basic:file
required:: False
disabled:: False
hidden:: False

build

label:: Build
type:: basic:string
required:: True
disabled:: False
hidden:: False

species

label:: Species
type:: basic:string
required:: True
disabled:: False
hidden:: False

Secondary hybrid BAM file

data:alignment:bam:secondary:upload-bam-secondary (data:alignment:bam bam, basic:file src, basic:string species, basic:string build)[Source: v0.10.0]

Upload a secondary mapping file in BAM format.

bam

label:: Hybrid bam
type:: data:alignment:bam
description:: The secondary BAM will be appended to the same sample as the hybrid BAM.
required:: False

src

label:: Mapping (BAM)
type:: basic:file
description:: A mapping file in BAM format. The file will be indexed on upload, so additional BAI files are not required.
validate_regex:: \.(bam)$
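The validate_regex entry above accepts only file names that end in `.bam`. The sketch below evaluates the same expression with Python's re module to show which names pass. The upload form's own regex engine may differ in details such as case handling.

```python
import re

# Same expression as the validate_regex entry above.
BAM_NAME = re.compile(r"\.(bam)$")


def is_accepted(filename: str) -> bool:
    """True if the file name matches the BAM upload pattern."""
    return BAM_NAME.search(filename) is not None


for name in ["sample.bam", "sample.bam.bai", "sample.sam"]:
    print(name, is_accepted(name))
```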

species

label:

Species

type:

basic:string

description:

Species latin name.

choices:

Drosophila melanogaster: Drosophila melanogaster
Mus musculus: Mus musculus

build

label:: Build
type:: basic:string

bam

label:: Uploaded file
type:: basic:file

bai

label:: Index BAI
type:: basic:file

stats

label:: Alignment statistics
type:: basic:file

species

label:: Species
type:: basic:string

build

label:: Build
type:: basic:string

Single cell BAM file and index

data:alignment:bam:scseq:upload-bam-scseq-indexed (basic:file src, basic:file src2, data:screads: reads, basic:string species, basic:string build)[Source: v1.4.1]

Import scSeq BAM file and index.

src

label:: Mapping (BAM)
type:: basic:file
description:: A mapping file in BAM format.
required:: True
disabled:: False
hidden:: False

src2

label:: BAM index (*.bam.bai file)
type:: basic:file
description:: An index file of a BAM mapping file (ending with bam.bai).
required:: True
disabled:: False
hidden:: False

reads

label:: Single cell fastq reads
type:: data:screads:
required:: True
disabled:: False
hidden:: False

species

label:: Species
type:: basic:string
description:: Species latin name.
required:: True
disabled:: False
hidden:: False

build

label:: Build
type:: basic:string
required:: True
disabled:: False
hidden:: False

bam

label:: Uploaded BAM
type:: basic:file
required:: True
disabled:: False
hidden:: False

bai

label:: Index BAI
type:: basic:file
required:: True
disabled:: False
hidden:: False

stats

label:: Alignment statistics
type:: basic:file
required:: True
disabled:: False
hidden:: False

build

label:: Build
type:: basic:string
required:: True
disabled:: False
hidden:: False

species

label:: Species
type:: basic:string
required:: True
disabled:: False
hidden:: False

Spike-ins quality control

data:spikeins:spikein-qc (list:data:expression samples, basic:string mix)[Source: v1.4.1]

Plot the measured abundances of spike-ins for sample quality control. The process outputs graphs showing the correlation between the known concentrations of ERCC spike-ins and each sample's measured abundances.
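One common way to summarise such a comparison is the Pearson correlation between log-transformed known concentrations and measured abundances. The Python sketch below assumes that choice and skips spike-ins with non-positive values, where the logarithm is undefined. The process's own plots may use a different transformation, and the numbers are invented.

```python
import math
from typing import Sequence


def log_correlation(known: Sequence[float], measured: Sequence[float]) -> float:
    """Pearson correlation of log10(known) vs log10(measured), skipping non-positive values."""
    pairs = [(math.log10(k), math.log10(m)) for k, m in zip(known, measured) if k > 0 and m > 0]
    if len(pairs) < 2:
        raise ValueError("need at least two spike-ins with positive values")
    xs = [x for x, _ in pairs]
    ys = [y for _, y in pairs]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


known_conc = [0.5, 5.0, 50.0, 500.0]  # invented example values
measured = [0.0, 9.0, 120.0, 950.0]   # the zero is skipped
print(round(log_correlation(known_conc, measured), 3))
```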

samples

label:: Expressions with spike-ins
type:: list:data:expression

mix

label:

Spike-ins mix

type:

basic:string

description:

Select spike-ins mix.

choices:

ERCC Mix 1: ercc_mix1
ERCC Mix 2: ercc_mix2
SIRV-Set 3: sirv_set3

plots

label:: Plot figures
type:: list:basic:file
required:: False

report

label:: HTML report with results
type:: basic:file:html
required:: False
hidden:: True

report_zip

label:: ZIP file containing HTML report with results
type:: basic:file
required:: False

Subsample FASTQ (paired-end)

data:reads:fastq:paired:seqtk:seqtk-sample-paired (data:reads:fastq:paired reads, basic:integer n_reads, basic:integer seed, basic:decimal fraction, basic:boolean two_pass)[Source: v1.5.2]

Subsample reads from FASTQ files (paired-end). [Seqtk](https://github.com/lh3/seqtk) is a fast and lightweight tool for processing sequences in the FASTA or FASTQ format. The Seqtk “sample” command enables subsampling of large FASTQ files.
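Sampling a fixed number of reads from a stream of unknown length is classically done with reservoir sampling. Keeping mates paired then only requires sampling both files with the same seed, so that the same record positions are chosen. The Python sketch below illustrates the technique with Python's random module. It is not seqtk's code and will not reproduce seqtk's exact selection.

```python
import random
from typing import Iterable, List, TypeVar

T = TypeVar("T")


def reservoir_sample(records: Iterable[T], k: int, seed: int = 11) -> List[T]:
    """Uniformly sample k records from a stream of unknown length (Algorithm R)."""
    rng = random.Random(seed)
    reservoir: List[T] = []
    for i, record in enumerate(records):
        if i < k:
            reservoir.append(record)  # fill the reservoir with the first k records
        else:
            j = rng.randint(0, i)  # uniform over 0..i inclusive
            if j < k:
                reservoir[j] = record  # replace with probability k / (i + 1)
    return reservoir


mate1 = [f"pair{i}/1" for i in range(1000)]
mate2 = [f"pair{i}/2" for i in range(1000)]
# The random draws depend only on the seed and record positions, so using the
# same seed for both mate files selects the same pairs.
sub1 = reservoir_sample(mate1, 50, seed=11)
sub2 = reservoir_sample(mate2, 50, seed=11)
print(all(a[:-2] == b[:-2] for a, b in zip(sub1, sub2)))  # prints True
```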

reads

label:: Reads
type:: data:reads:fastq:paired
required:: True
disabled:: False
hidden:: False

n_reads

label:: Number of reads
type:: basic:integer
required:: True
disabled:: False
hidden:: False
default:: 1000000

advanced.seed

label:: Seed
type:: basic:integer
required:: True
disabled:: False
hidden:: False
default:: 11

advanced.fraction

label:: Fraction
type:: basic:decimal
description:: Use the fraction of reads [0 - 1.0] from the original input file instead of the absolute number of reads. If set, this will override the ‘Number of reads’ input parameter.
required:: False
disabled:: False
hidden:: False

advanced.two_pass

label:: 2-pass mode
type:: basic:boolean
description:: Enable two-pass mode when down-sampling. Two-pass mode is twice as slow but uses much less memory.
required:: True
disabled:: False
hidden:: False
default:: False

fastq

label:: Remaining mate 1 reads
type:: list:basic:file
required:: True
disabled:: False
hidden:: False

fastq2

label:: Remaining mate 2 reads
type:: list:basic:file
required:: True
disabled:: False
hidden:: False

fastqc_url

label:: Mate 1 quality control with FastQC
type:: list:basic:file:html
required:: True
disabled:: False
hidden:: False

fastqc_url2

label:: Mate 2 quality control with FastQC
type:: list:basic:file:html
required:: True
disabled:: False
hidden:: False

fastqc_archive

label:: Download mate 1 FastQC archive
type:: list:basic:file
required:: True
disabled:: False
hidden:: False

fastqc_archive2

label:: Download mate 2 FastQC archive
type:: list:basic:file
required:: True
disabled:: False
hidden:: False

Subsample FASTQ (single-end)

data:reads:fastq:single:seqtk:seqtk-sample-single (data:reads:fastq:single reads, basic:integer n_reads, basic:integer seed, basic:decimal fraction, basic:boolean two_pass)[Source: v1.5.2]

Subsample reads from a FASTQ file (single-end). [Seqtk](https://github.com/lh3/seqtk) is a fast and lightweight tool for processing sequences in the FASTA or FASTQ format. The Seqtk “sample” command enables subsampling of large FASTQ files.

reads

label:: Reads
type:: data:reads:fastq:single
required:: True
disabled:: False
hidden:: False

n_reads

label:: Number of reads
type:: basic:integer
required:: True
disabled:: False
hidden:: False
default:: 1000000

advanced.seed

label:: Seed
type:: basic:integer
required:: True
disabled:: False
hidden:: False
default:: 11

advanced.fraction

label:: Fraction
type:: basic:decimal
description:: Use the fraction of reads [0 - 1.0] from the original input file instead of the absolute number of reads. If set, this will override the ‘Number of reads’ input parameter.
required:: False
disabled:: False
hidden:: False

advanced.two_pass

label:: 2-pass mode
type:: basic:boolean
description:: Enable two-pass mode when down-sampling. Two-pass mode is twice as slow but uses much less memory.
required:: True
disabled:: False
hidden:: False
default:: False

fastq

label:: Remaining reads
type:: list:basic:file
required:: True
disabled:: False
hidden:: False

fastqc_url

label:: Quality control with FastQC
type:: list:basic:file:html
required:: True
disabled:: False
hidden:: False

fastqc_archive

label:: Download FastQC archive
type:: list:basic:file
required:: True
disabled:: False
hidden:: False

Subsample FASTQ and BWA Aln (paired-end)

data:workflow:chipseq:seqtkbwaaln:workflow-subsample-bwa-aln-paired (data:reads:fastq:paired reads, data:index:bwa genome, basic:integer n_reads, basic:integer seed, basic:decimal fraction, basic:boolean two_pass, basic:integer q, basic:boolean use_edit, basic:integer edit_value, basic:decimal fraction, basic:boolean seeds, basic:integer seed_length, basic:integer seed_dist)[Source: v1.1.0]

reads

label:: Reads
type:: data:reads:fastq:paired

genome

label:: Reference genome
type:: data:index:bwa

downsampling.n_reads

label:: Number of reads
type:: basic:integer
default:: 10000000

downsampling.advanced.seed

label:: Seed
type:: basic:integer
default:: 11

downsampling.advanced.fraction

label:: Fraction
type:: basic:decimal
description:: Use the fraction of reads [0 - 1.0] from the original input file instead of the absolute number of reads. If set, this will override the “Number of reads” input parameter.
required:: False

downsampling.advanced.two_pass

label:: 2-pass mode
type:: basic:boolean
description:: Enable two-pass mode when down-sampling. Two-pass mode is twice as slow but uses much less memory.
default:: True

alignment.q

label:: Quality threshold
type:: basic:integer
description:: Parameter for dynamic read trimming.
default:: 5
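The BWA manual describes the quality threshold q as trimming a read of length l down to argmax_x sum_{i=x+1}^{l} (q - q_i). The trim is applied only when the last base quality q_l is below q. The Python sketch below evaluates that formula directly on decoded Phred scores. BWA's own implementation may add details, such as a minimum retained length, that are left out here.

```python
from typing import Sequence


def bwa_trim_length(quals: Sequence[int], q: int) -> int:
    """Length kept after trimming to argmax_x sum_{i=x+1}^{l} (q - q_i).

    quals are decoded Phred scores (not ASCII); q is the quality threshold.
    """
    l = len(quals)
    # Trimming is applied only when the last base is below the threshold.
    if q <= 0 or l == 0 or quals[-1] >= q:
        return l
    best_x, best_sum, running = l, 0, 0
    # Walk from the 3' end; after processing 0-based index x, running equals
    # the 1-based sum over positions x+1..l.
    for x in range(l - 1, -1, -1):
        running += q - quals[x]
        if running > best_sum:  # strict '>' keeps the longer read on ties
            best_x, best_sum = x, running
    return best_x


print(bwa_trim_length([30, 30, 30, 30, 2, 2], q=5))  # prints 4
```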

alignment.use_edit

label:: Use maximum edit distance (excludes fraction of missing alignments)
type:: basic:boolean
default:: False

alignment.edit_value

label:: Maximum edit distance
type:: basic:integer
hidden:: !use_edit
default:: 5

alignment.fraction

label:: Fraction of missing alignments
type:: basic:decimal
description:: The fraction of missing alignments given a 2% uniform base error rate. The maximum edit distance is automatically chosen for different read lengths.
hidden:: use_edit
default:: 0.04
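When a fraction is used instead of a fixed edit distance, the maximum edit distance for a read of length L can be chosen as the smallest k whose probability of being exceeded, under a Poisson model with mean L x 0.02, falls below the fraction. The Python sketch below follows that reasoning. It is modelled on BWA's internal calculation as understood here (an assumption, not something this documentation states), so treat edge cases as approximate.

```python
import math


def max_edit_distance(read_length: int, fraction: float = 0.04, error_rate: float = 0.02) -> int:
    """Smallest k >= 1 with P(X > k) < fraction, where X ~ Poisson(read_length * error_rate)."""
    lam = read_length * error_rate
    term = math.exp(-lam)  # P(X = 0)
    cumulative = term
    for k in range(1, 1000):
        term *= lam / k  # P(X = k) from P(X = k - 1)
        cumulative += term
        if 1.0 - cumulative < fraction:
            return k
    return 2  # fallback, not reached for realistic inputs


for length in (36, 100):
    print(length, max_edit_distance(length))  # 36 -> 2, 100 -> 5
```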

alignment.seeds

label:: Use seeds
type:: basic:boolean
default:: True

alignment.seed_length

label:: Seed length
type:: basic:integer
description:: Take the first X bases of the read as the seed. If X is larger than the query sequence, seeding will be disabled. For long reads, this value typically ranges from 25 to 35 when the seed maximum edit distance is 2.
hidden:: !seeds
default:: 32

alignment.seed_dist

label:: Seed maximum edit distance
type:: basic:integer
hidden:: !seeds
default:: 2

Subsample FASTQ and BWA Aln (single-end)

data:workflow:chipseq:seqtkbwaaln:workflow-subsample-bwa-aln-single (data:reads:fastq:single reads, data:index:bwa genome, basic:integer n_reads, basic:integer seed, basic:decimal fraction, basic:boolean two_pass, basic:integer q, basic:boolean use_edit, basic:integer edit_value, basic:decimal fraction, basic:boolean seeds, basic:integer seed_length, basic:integer seed_dist)[Source: v1.1.0]

reads

label:: Reads
type:: data:reads:fastq:single

genome

label:: Reference genome
type:: data:index:bwa

downsampling.n_reads

label:: Number of reads
type:: basic:integer
default:: 10000000

downsampling.advanced.seed

label:: Seed
type:: basic:integer
default:: 11

downsampling.advanced.fraction

label:: Fraction
type:: basic:decimal
description:: Use the fraction of reads [0 - 1.0] from the original input file instead of the absolute number of reads. If set, this will override the “Number of reads” input parameter.
required:: False

downsampling.advanced.two_pass

label:: 2-pass mode
type:: basic:boolean
description:: Enable two-pass mode when down-sampling. Two-pass mode is twice as slow but uses much less memory.
default:: True

alignment.q

label:: Quality threshold
type:: basic:integer
description:: Parameter for dynamic read trimming.
default:: 5

alignment.use_edit

label:: Use maximum edit distance (excludes fraction of missing alignments)
type:: basic:boolean
default:: False

alignment.edit_value

label:: Maximum edit distance
type:: basic:integer
hidden:: !use_edit
default:: 5

alignment.fraction

label:: Fraction of missing alignments
type:: basic:decimal
description:: The fraction of missing alignments given a 2% uniform base error rate. The maximum edit distance is automatically chosen for different read lengths.
hidden:: use_edit
default:: 0.04

alignment.seeds

label:: Use seeds
type:: basic:boolean
default:: True

alignment.seed_length

label:: Seed length
type:: basic:integer
description:: Take the first X bases of the read as the seed. If X is larger than the query sequence, seeding will be disabled. For long reads, this value typically ranges from 25 to 35 when the seed maximum edit distance is 2.
hidden:: !seeds
default:: 32

alignment.seed_dist

label:: Seed maximum edit distance
type:: basic:integer
hidden:: !seeds
default:: 2

Test basic fields

data:test:fields:test-basic-fields (basic:boolean boolean, basic:date date, basic:datetime datetime, basic:decimal decimal, basic:integer integer, basic:string string, basic:text text, basic:url:download url_download, basic:url:view url_view, basic:string string2, basic:string string3, basic:string string4, basic:string string5, basic:string string6, basic:string string7, basic:string tricky2)[Source: v1.2.4]

Test with all basic input fields whose values are printed by the processor and returned unmodified as output fields.

boolean

label:: Boolean
type:: basic:boolean
default:: True

date

label:: Date
type:: basic:date
default:: 2013-12-31

datetime

label:: Date and time
type:: basic:datetime
default:: 2013-12-31 23:59:59

decimal

label:: Decimal
type:: basic:decimal
default:: -123.456

integer

label:: Integer
type:: basic:integer
default:: -123

string

label:: String
type:: basic:string
default:: Foo b-a-r.gz 1.23

text

label:: Text
type:: basic:text
default:: Foo bar in 3 lines.

url_download

label:: URL download
type:: basic:url:download
default:: {'url': 'http://www.w3.org/TR/1998/REC-html40-19980424/html40.pdf'}

url_view

label:: URL view
type:: basic:url:view
default:: {'name': 'Something', 'url': 'http://www.something.com/'}

group.string2

label:: String 2 required
type:: basic:string
description:: String 2 description.
required:: True
disabled:: false
hidden:: false
placeholder:: Enter string

group.string3

label:: String 3 disabled
type:: basic:string
description:: String 3 description.
disabled:: true
default:: disabled

group.string4

label:: String 4 hidden
type:: basic:string
description:: String 4 description.
hidden:: True
default:: hidden

group.string5

label:

String 5 choices

type:

basic:string

description:

String 5 description.

hidden:

False

default:

choice_2

choices:

Choice 1: choice_1
Choice 2: choice_2
Choice 3: choice_3

group.string6

label:: String 6 regex only “Aa”
type:: basic:string
default:: AAaAaaa
validate_regex:: ^[aA]*$

group.string7

label:

String 7 optional choices

type:

basic:string

description:

String 7 description.

required:

False

hidden:

False

default:

choice_2

choices:

Choice 1: choice_1
Choice 2: choice_2
Choice 3: choice_3

tricky.tricky1.tricky2

label:: Tricky 2
type:: basic:string
default:: true

output

label:: Result
type:: basic:url:view

out_boolean

label:: Boolean
type:: basic:boolean

out_date

label:: Date
type:: basic:date

out_datetime

label:: Date and time
type:: basic:datetime

out_decimal

label:: Decimal
type:: basic:decimal

out_integer

label:: Integer
type:: basic:integer

out_string

label:: String
type:: basic:string

out_text

label:: Text
type:: basic:text

out_url_download

label:: URL download
type:: basic:url:download

out_url_view

label:: URL view
type:: basic:url:view

out_group.string2

label:: String 2 required
type:: basic:string
description:: String 2 description.

out_group.string3

label:: String 3 disabled
type:: basic:string
description:: String 3 description.

out_group.string4

label:: String 4 hidden
type:: basic:string
description:: String 4 description.

out_group.string5

label:: String 5 choices
type:: basic:string
description:: String 5 description.

out_group.string6

label:: String 6 regex only “Aa”
type:: basic:string

out_group.string7

label:: String 7 optional choices
type:: basic:string

out_tricky.tricky1.tricky2

label:: Tricky 2
type:: basic:string

Test disabled inputs

data:test:disabledtest-disabled (basic:boolean broad, basic:integer broad_width, basic:string width_label, basic:integer if_and_condition)[Source: v1.2.4]

Test disabled input fields.

broad

label:: Broad peaks
type:: basic:boolean
default:: False

broad_width

label:: Width of peaks
type:: basic:integer
disabled:: broad === false
default:: 5

width_label

label:: Width label
type:: basic:string
disabled:: broad === false
default:: FD

if_and_condition

label:: If width is 5 and label FDR
type:: basic:integer
disabled:: broad_width == 5 && width_label == 'FDR'
default:: 5
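The `disabled` values in this process are boolean expressions over other inputs. The Python sketch below restates two of them, assuming they are evaluated with ordinary boolean semantics against the current form values; both helper functions are hypothetical and not part of Resolwe. Note that with the default inputs the `if_and_condition` field stays enabled, because the default label is "FD", not "FDR".

```python
def is_broad_width_disabled(inputs: dict) -> bool:
    # Mirrors "broad === false" (strict comparison, hence `is False`).
    return inputs["broad"] is False


def is_if_and_condition_disabled(inputs: dict) -> bool:
    # Mirrors "broad_width == 5 && width_label == 'FDR'".
    return inputs["broad_width"] == 5 and inputs["width_label"] == "FDR"


defaults = {"broad": False, "broad_width": 5, "width_label": "FD"}
print(is_broad_width_disabled(defaults))       # True
print(is_if_and_condition_disabled(defaults))  # False
```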

output

label:: Result
type:: basic:string

Test hidden inputs

data:test:hiddentest-hidden (basic:boolean broad, basic:integer broad_width, basic:integer parameter1, basic:integer parameter2, basic:integer broad_width2)[Source: v1.2.4]

Test hidden input fields.

broad

label:: Broad peaks
type:: basic:boolean
default:: False

broad_width

label:: Width of peaks
type:: basic:integer
hidden:: broad === false
default:: 5

parameters_broad_f.parameter1

label:: parameter1
type:: basic:integer
default:: 10

parameters_broad_f.parameter2

label:: parameter2
type:: basic:integer
default:: 10

parameters_broad_t.broad_width2

label:: Width of peaks2
type:: basic:integer
default:: 5

output

label:: Result
type:: basic:string

Test select controller

data:test:resulttest-list (data:test:result single, list:data:test:result multiple)[Source: v1.2.4]

Test with all basic input fields whose values are printed by the processor and returned unmodified as output fields.

single

label:: Single
type:: data:test:result

multiple

label:: Multiple
type:: list:data:test:result

output

label:: Result
type:: basic:string

Test sleep progress

data:test:resulttest-sleep-progress (basic:integer t)[Source: v1.2.4]

Test for the progress bar by sleeping 5 times for the specified amount of time.
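The behaviour described above amounts to a loop of five sleeps with a progress report after each one. This is a minimal sketch only: the `report` callback is a hypothetical stand-in, since the mechanism the process actually uses to report progress is not shown here.

```python
import time
from typing import Callable


def sleep_with_progress(t: float, report: Callable[[float], None], steps: int = 5) -> None:
    """Sleep `steps` times for `t` seconds, reporting the completed fraction after each sleep."""
    for step in range(1, steps + 1):
        time.sleep(t)
        report(step / steps)


progress = []
sleep_with_progress(0, progress.append)
print(progress)  # [0.2, 0.4, 0.6, 0.8, 1.0]
```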

t

label:: Sleep time
type:: basic:integer
default:: 5

output

label:: Result
type:: basic:string

Trim Galore (paired-end)

data:reads:fastq:paired:trimgalore:trimgalore-paired (data:reads:fastq:paired reads, list:basic:string adapter, list:basic:string adapter_2, data:seq:nucleotide adapter_file_1, data:seq:nucleotide adapter_file_2, basic:string universal_adapter, basic:integer stringency, basic:decimal error_rate, basic:integer quality, basic:integer nextseq, basic:string phred, basic:integer min_length, basic:integer max_n, basic:boolean retain_unpaired, basic:integer unpaired_len_1, basic:integer unpaired_len_2, basic:integer clip_r1, basic:integer clip_r2, basic:integer three_prime_r1, basic:integer three_prime_r2, basic:integer trim_5, basic:integer trim_3)[Source: v1.3.2]

Process paired-end sequencing reads with Trim Galore. Trim Galore is a wrapper script around the publicly available adapter trimming tool Cutadapt, and it runs FastQC for quality control once trimming has completed. Low-quality ends are trimmed and adapters are removed in a single pass. If no adapter sequence is supplied, Trim Galore attempts to auto-detect the adapter that was used: it analyses the first 1 million sequences of the first specified file and searches for the first 12 or 13 bp of the following standard adapters: Illumina: AGATCGGAAGAGC, Small RNA: TGGAATTCTCGG, Nextera: CTGTCTCTTATA. If no adapter contamination is detected within the first 1 million sequences, or if several adapters are tied, Trim Galore defaults to the Illumina adapter. For additional information, see the official [user guide](https://github.com/FelixKrueger/TrimGalore/blob/master/Docs/Trim_Galore_User_Guide.md).
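The auto-detection rule above can be sketched as a simple count of adapter prefixes over the first reads, with a fallback to Illumina when nothing is found or the top counts are tied. This is an illustration under that reading of the description, not Trim Galore's own code; `detect_adapter` and the short names used as dictionary keys are hypothetical.

```python
from collections import Counter
from itertools import islice
from typing import Iterable

# Adapter prefixes quoted in the process description.
ADAPTERS = {
    "illumina": "AGATCGGAAGAGC",
    "small_rna": "TGGAATTCTCGG",
    "nextera": "CTGTCTCTTATA",
}


def detect_adapter(sequences: Iterable[str], max_reads: int = 1_000_000) -> str:
    """Guess the adapter type from the first `max_reads` sequences."""
    counts = Counter({name: 0 for name in ADAPTERS})
    for seq in islice(sequences, max_reads):
        for name, prefix in ADAPTERS.items():
            if prefix in seq:
                counts[name] += 1
    best = max(counts.values())
    leaders = [name for name, count in counts.items() if count == best]
    # No contamination found, or a tie between adapters: fall back to Illumina.
    if best == 0 or len(leaders) > 1:
        return "illumina"
    return leaders[0]


print(detect_adapter(["ACCTGTCTCTTATAG", "CTGTCTCTTATAAA", "ACGTACGT"]))  # nextera
```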

reads

label:: Select paired-end reads
type:: data:reads:fastq:paired
required:: True
disabled:: False
hidden:: False

adapter_trim.adapter

label:: Read 1 adapter sequence
type:: list:basic:string
description:: Adapter sequences to be trimmed. Also see universal adapters field for predefined adapters. This is mutually exclusive with read 1 adapters file and universal adapters.
required:: False
disabled:: False
hidden:: False
default:: []

adapter_trim.adapter_2

label:: Read 2 adapter sequence
type:: list:basic:string
description:: Optional adapter sequence to be trimmed off read 2 of paired-end files. This is mutually exclusive with read 2 adapters file and universal adapters.
required:: False
disabled:: False
hidden:: False
default:: []

adapter_trim.adapter_file_1

label:: Read 1 adapters file
type:: data:seq:nucleotide
description:: This is mutually exclusive with read 1 adapters and universal adapters.
required:: False
disabled:: False
hidden:: False

adapter_trim.adapter_file_2

label:: Read 2 adapters file
type:: data:seq:nucleotide
description:: This is mutually exclusive with read 2 adapters and universal adapters.
required:: False
disabled:: False
hidden:: False

adapter_trim.universal_adapter

label:: Universal adapters
type:: basic:string
description:: Instead of the default auto-detection, use a specific adapter: the first 13 bp of the Illumina universal adapter, 12 bp of the Nextera adapter, or 12 bp of the Illumina Small RNA 3’ adapter. Selecting small RNA adapters also lowers the minimum length after trimming to 18 bp. If small RNA libraries are paired-end, the read 2 adapter is automatically set to the Illumina small RNA 5’ adapter (GATCGTCGGACT) unless defined explicitly. This is mutually exclusive with manually defined adapters and adapter files.
required:: False
disabled:: False
hidden:: False
choices:: Illumina: --illumina; Nextera: --nextera; Illumina small RNA: --small_rna

adapter_trim.stringency

label:: Overlap with adapter sequence required to trim
type:: basic:integer
description:: Defaults to a very stringent setting of 1, i.e. even a single base pair of overlapping adapter sequence will be trimmed off the 3’ end of any read.
required:: True
disabled:: False
hidden:: False
default:: 1

adapter_trim.error_rate

label:: Maximum allowed error rate
type:: basic:decimal
description:: Number of errors divided by the length of the matching region.
required:: True
disabled:: False
hidden:: False
default:: 0.1
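Because the error rate is relative to the length of the match, short adapter overlaps tolerate no mismatches at all. A small sketch, assuming Cutadapt's documented behaviour of multiplying the rate by the match length and rounding down; `max_errors` is a hypothetical helper.

```python
import math


def max_errors(match_length: int, error_rate: float = 0.1) -> int:
    """Number of errors allowed in an adapter match of the given length (rounded down)."""
    return math.floor(match_length * error_rate)


for length in (5, 9, 10, 13, 25):
    print(length, max_errors(length))
```

With the default rate of 0.1, overlaps shorter than 10 bp must match exactly, and a full 13 bp Illumina prefix allows one error.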

quality_trim.quality

label:: Quality cutoff
type:: basic:integer
description:: Trim low-quality ends from reads based on phred score.
required:: True
disabled:: False
hidden:: False
default:: 20

quality_trim.nextseq

label:: NextSeq/NovaSeq trim cutoff
type:: basic:integer
description:: NextSeq/NovaSeq-specific quality trimming. Also trims dark cycles, which appear as high-quality G bases. This sets a specific quality cutoff, but the qualities of G bases are ignored. This cannot be combined with Quality cutoff and will override it.
required:: False
disabled:: False
hidden:: False

quality_trim.phred

label:: Phred score encoding
type:: basic:string
description:: Use either ASCII+33 quality scores as Phred scores (Sanger/Illumina 1.9+ encoding) or ASCII+64 quality scores (Illumina 1.5 encoding) for quality trimming.
required:: True
disabled:: False
hidden:: False
default:: --phred33
choices:: ASCII+33: --phred33; ASCII+64: --phred64
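The two encodings differ only in the offset subtracted from each character's ASCII code. The hypothetical helper below decodes a FASTQ quality string under either offset.

```python
from typing import List


def decode_qualities(quality_string: str, offset: int = 33) -> List[int]:
    """Convert a FASTQ quality string into Phred scores."""
    if offset not in (33, 64):
        raise ValueError("offset must be 33 (--phred33) or 64 (--phred64)")
    return [ord(char) - offset for char in quality_string]


print(decode_qualities("II#"))      # [40, 40, 2]
print(decode_qualities("hhB", 64))  # [40, 40, 2]
```

Decoding ASCII+64 data as ASCII+33 overstates every score by 31, so low-quality ends would not be trimmed. This is why the encoding must match the data.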

quality_trim.min_length

label:: Minimum length after trimming
type:: basic:integer
description:: Discard reads that became shorter than selected length because of either quality or adapter trimming. Both reads of a read-pair need to be longer than specified length to be printed out to validated paired-end files. If only one read became too short there is the possibility of keeping such unpaired single-end reads with Retain unpaired. A value of 0 disables filtering based on length.
required:: True
disabled:: False
hidden:: False
default:: 20

quality_trim.max_n

label:: Maximum number of Ns
type:: basic:integer
description:: If either read of a pair contains more Ns than this limit, the entire pair is removed from the trimmed output files.
required:: False
disabled:: False
hidden:: False

quality_trim.retain_unpaired

label:: Retain unpaired reads after trimming
type:: basic:boolean
description:: If only one of the two paired-end reads became too short, the longer read will be written.
required:: True
disabled:: False
hidden:: False
default:: False

quality_trim.unpaired_len_1

label:: Unpaired read length cutoff for mate 1
type:: basic:integer
required:: True
disabled:: False
hidden:: !quality_trim.retain_unpaired
default:: 35

quality_trim.unpaired_len_2

label:: Unpaired read length cutoff for mate 2
type:: basic:integer
required:: True
disabled:: False
hidden:: !quality_trim.retain_unpaired
default:: 35
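Taken together, the length settings above decide where a trimmed pair ends up. The sketch below restates the described rules. `filter_pair` and its output names are hypothetical, and the assumption that each mate is checked independently against its own unpaired cutoff is an illustration of the description, not a quote of Trim Galore's code.

```python
from typing import List


def filter_pair(len1: int, len2: int, min_length: int = 20, retain_unpaired: bool = False,
                unpaired_len_1: int = 35, unpaired_len_2: int = 35) -> List[str]:
    """Return the (hypothetical) outputs a trimmed read pair would be written to."""
    # A value of 0 disables length filtering; otherwise both mates must be long enough.
    if min_length == 0 or (len1 >= min_length and len2 >= min_length):
        return ["paired"]
    kept = []
    if retain_unpaired:
        # Assumption: each mate is kept as an unpaired read if it meets its own cutoff.
        if len1 >= unpaired_len_1:
            kept.append("unpaired_1")
        if len2 >= unpaired_len_2:
            kept.append("unpaired_2")
    return kept


print(filter_pair(50, 10))                        # []
print(filter_pair(50, 10, retain_unpaired=True))  # ['unpaired_1']
```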

quality_trim.clip_r1

label:: Trim bases from 5’ end of read 1
type:: basic:integer
description:: This may be useful if the qualities were very poor, or if there is some sort of unwanted bias at the 5’ end.
required:: False
disabled:: False
hidden:: False

quality_trim.clip_r2

label:: Trim bases from 5’ end of read 2
type:: basic:integer
description:: This may be useful if the qualities were very poor, or if there is some sort of unwanted bias at the 5’ end. For paired-end bisulfite sequencing, it is recommended to remove the first few bp because the end-repair reaction may introduce a bias towards low methylation.
required:: False
disabled:: False
hidden:: False

quality_trim.three_prime_r1

label:: Trim bases from 3’ end of read 1
type:: basic:integer
description:: Remove bases from the 3’ end of read 1 after adapter/quality trimming has been performed. This may remove some unwanted bias from the 3’ end that is not directly related to adapter sequence or basecall quality.
required:: False
disabled:: False
hidden:: False

quality_trim.three_prime_r2

label:: Trim bases from 3’ end of read 2
type:: basic:integer
description:: Remove bases from the 3’ end of read 2 after adapter/quality trimming has been performed. This may remove some unwanted bias from the 3’ end that is not directly related to adapter sequence or basecall quality.
required:: False
disabled:: False
hidden:: False

hard_trim.trim_5

label:: Hard trim sequences from 3’ end
type:: basic:integer
description:: Instead of performing adapter/quality trimming, this option simply hard-trims sequences to the specified number of bp by removing bases from the 3’ end. This is incompatible with other hard trimming options.
required:: False
disabled:: False
hidden:: False

hard_trim.trim_3

label:: Hard trim sequences from 5’ end
type:: basic:integer
description:: Instead of performing adapter/quality trimming, this option simply hard-trims sequences to the specified number of bp by removing bases from the 5’ end. This is incompatible with other hard trimming options.
required:: False
disabled:: False
hidden:: False
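The two hard-trim options are easy to confuse, because the field name refers to the end that is kept while the label refers to the end that is removed. A minimal sketch, assuming `hard_trim.trim_5` keeps the first n bases and `hard_trim.trim_3` keeps the last n bases, as the labels suggest; both helper functions are hypothetical.

```python
def hardtrim5(seq: str, n: int) -> str:
    """Keep the first n bases; everything beyond them is removed from the 3' end."""
    return seq[:n]


def hardtrim3(seq: str, n: int) -> str:
    """Keep the last n bases; everything before them is removed from the 5' end."""
    return seq[max(len(seq) - n, 0):]


read = "CCTAAGGAAACAAGTACACTCCACACATGC"
print(hardtrim5(read, 10))  # CCTAAGGAAA
print(hardtrim3(read, 10))  # CCACACATGC
```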

fastq

label:: Remaining mate 1 reads
type:: list:basic:file
required:: True
disabled:: False
hidden:: False

fastq2

label:: Remaining mate 2 reads
type:: list:basic:file
required:: True
disabled:: False
hidden:: False

report

label:: Trim Galore report
type:: basic:file
required:: False
disabled:: False
hidden:: False

fastqc_url

label:: Mate 1 quality control with FastQC
type:: list:basic:file:html
required:: True
disabled:: False
hidden:: False

fastqc_url2

label:: Mate 2 quality control with FastQC
type:: list:basic:file:html
required:: True
disabled:: False
hidden:: False

fastqc_archive

label:: Download mate 1 FastQC archive
type:: list:basic:file
required:: True
disabled:: False
hidden:: False

fastqc_archive2

label:: Download mate 2 FastQC archive
type:: list:basic:file
required:: True
disabled:: False
hidden:: False

Trimmomatic (paired-end)

data:reads:fastq:paired:trimmomatictrimmomatic-paired (data:reads:fastq:paired reads, data:seq:nucleotide adapters, basic:integer seed_mismatches, basic:integer simple_clip_threshold, basic:integer palindrome_clip_threshold, basic:integer min_adapter_length, basic:boolean keep_both_reads, basic:integer window_size, basic:integer required_quality, basic:integer target_length, basic:decimal strictness, basic:integer leading, basic:integer trailing, basic:integer crop, basic:integer headcrop, basic:integer minlen, basic:integer average_quality)[Source: v2.5.2]

Trimmomatic performs a variety of useful trimming tasks including removing adapters for Illumina paired-end and single-end data. FastQC is performed for quality control checks on trimmed raw sequence data, which are the output of Trimmomatic. See [Trimmomatic official website](http://www.usadellab.org/cms/?page=trimmomatic), the [introductory paper](https://www.ncbi.nlm.nih.gov/pubmed/24695404), and the [FastQC official website](https://www.bioinformatics.babraham.ac.uk/projects/fastqc/) for more information.

reads

label:: Reads
type:: data:reads:fastq:paired

illuminaclip.adapters

label:: Adapter sequences
type:: data:seq:nucleotide
description:: Adapter sequences in FASTA format that will be removed from the reads. This field, together with the ‘Seed mismatches’, ‘Simple clip threshold’ and ‘Palindrome clip threshold’ parameters, is required to perform Illuminaclipping. ‘Minimum adapter length’ and ‘Keep both reads’ are optional parameters.
required:: False

illuminaclip.seed_mismatches

label:: Seed mismatches
type:: basic:integer
description:: Specifies the maximum mismatch count that will still allow a full match to be performed. This field, together with ‘Adapter sequences’, ‘Simple clip threshold’ and ‘Palindrome clip threshold’, is required to perform Illuminaclipping.
required:: False
disabled:: !illuminaclip.adapters

illuminaclip.simple_clip_threshold

label:: Simple clip threshold
type:: basic:integer
description:: Specifies how accurate the match between an adapter (or other) sequence and a read must be. This field, together with ‘Adapter sequences’, ‘Seed mismatches’ and ‘Palindrome clip threshold’, is required to perform Illuminaclipping.
required:: False
disabled:: !illuminaclip.adapters

illuminaclip.palindrome_clip_threshold

label:: Palindrome clip threshold
type:: basic:integer
description:: Specifies how accurate the match between the two ‘adapter ligated’ reads must be for PE palindrome read alignment. This field, together with ‘Adapter sequences’, ‘Seed mismatches’ and ‘Simple clip threshold’, is required to perform Illuminaclipping.
required:: False
disabled:: !illuminaclip.adapters

illuminaclip.min_adapter_length

label:: Minimum adapter length
type:: basic:integer
description:: In addition to the alignment score, palindrome mode can verify that a minimum length of adapter has been detected. If unspecified, this defaults to 8 bases, for historical reasons. However, since palindrome mode has a very low false positive rate, this can be safely reduced, even down to 1, to allow shorter adapter fragments to be removed. This field is optional for performing Illuminaclipping. ‘Adapter sequences’, ‘Seed mismatches’, ‘Simple clip threshold’ and ‘Palindrome clip threshold’ are also required in order to use this parameter.
disabled:: !illuminaclip.seed_mismatches && !illuminaclip.simple_clip_threshold && !illuminaclip.palindrome_clip_threshold
default:: 8

illuminaclip.keep_both_reads

label:: Keep both reads
type:: basic:boolean
description:: After read-through has been detected by palindrome mode and the adapter sequence removed, the reverse read contains the same sequence information as the forward read, albeit in reverse complement. For this reason, the default behaviour is to drop the reverse read entirely. By specifying this parameter, the reverse read will also be retained, which may be useful e.g. if the downstream tools cannot handle a combination of paired and unpaired reads. This field is optional for performing Illuminaclipping. ‘Adapter sequences’, ‘Seed mismatches’, ‘Simple clip threshold’, ‘Palindrome clip threshold’ and ‘Minimum adapter length’ are also required in order to use this parameter.
required:: False
disabled:: !illuminaclip.seed_mismatches && !illuminaclip.simple_clip_threshold && !illuminaclip.palindrome_clip_threshold && !illuminaclip.min_adapter_length

slidingwindow.window_size

label:: Window size
type:: basic:integer
description:: Specifies the number of bases to average across. This field as well as ‘Required quality’ are needed to perform a ‘Sliding window’ trimming (cutting once the average quality within the window falls below a threshold).
required:: False

slidingwindow.required_quality

label:: Required quality
type:: basic:integer
description:: Specifies the average quality required. This field as well as ‘Window size’ are needed to perform a ‘Sliding window’ trimming (cutting once the average quality within the window falls below a threshold).
required:: False
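Sliding-window trimming scans the read from the 5’ end and cuts at the first window whose average quality drops below the required quality. The sketch below is a simplified version of that idea: it cuts at the start of the failing window. Trimmomatic's own implementation may keep some high-quality bases inside that window, so treat the exact cut position as an approximation. `sliding_window_cut` is a hypothetical helper.

```python
from typing import List


def sliding_window_cut(quals: List[int], window_size: int, required_quality: int) -> int:
    """Return how many leading bases to keep (simplified sliding-window trimming)."""
    # Compare window sums with required_quality * window_size to avoid float averages.
    threshold = required_quality * window_size
    for start in range(len(quals) - window_size + 1):
        if sum(quals[start:start + window_size]) < threshold:
            return start
    # No window failed (or the read is shorter than one window): keep the whole read.
    return len(quals)


quals = [30, 30, 30, 30, 10, 10, 10, 10]
print(sliding_window_cut(quals, window_size=4, required_quality=20))  # 3
```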

maxinfo.target_length

label:: Target length
type:: basic:integer
description:: This specifies the read length which is likely to allow the location of the read within the target sequence to be determined. This field as well as ‘Strictness’ are needed to perform ‘Maxinfo’ feature (an adaptive quality trimmer which balances read length and error rate to maximise the value of each read).
required:: False

maxinfo.strictness

label:: Strictness
type:: basic:decimal
description:: This value, which should be set between 0 and 1, specifies the balance between preserving as much read length as possible vs. removal of incorrect bases. A low value of this parameter (<0.2) favours longer reads, while a high value (>0.8) favours read correctness. This field as well as ‘Target length’ are needed to perform ‘Maxinfo’ feature (an adaptive quality trimmer which balances read length and error rate to maximise the value of each read).
required:: False

trim_bases.leading

label:: Leading quality
type:: basic:integer
description:: Remove low quality bases from the beginning. Specifies the minimum quality required to keep a base.
required:: False

trim_bases.trailing

label:: Trailing
type:: basic:integer
description:: Remove low quality bases from the end. Specifies the minimum quality required to keep a base.
required:: False

trim_bases.crop

label:: Crop
type:: basic:integer
description:: Cut the read to a specified length by removing bases from the end.
required:: False

trim_bases.headcrop

label:: Headcrop
type:: basic:integer
description:: Cut the specified number of bases from the start of the read.
required:: False
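Each of the four base-trimming options above is a simple operation on a read's sequence and qualities. The sketch below implements them as separate hypothetical functions on a `(sequence, qualities)` pair. The order in which the process applies them is not stated here, so the chaining in the example is only an assumption.

```python
from typing import List, Tuple

Read = Tuple[str, List[int]]  # (sequence, Phred qualities)


def leading(read: Read, min_q: int) -> Read:
    """Drop bases from the start while their quality is below min_q."""
    seq, quals = read
    i = 0
    while i < len(quals) and quals[i] < min_q:
        i += 1
    return seq[i:], quals[i:]


def trailing(read: Read, min_q: int) -> Read:
    """Drop bases from the end while their quality is below min_q."""
    seq, quals = read
    j = len(quals)
    while j > 0 and quals[j - 1] < min_q:
        j -= 1
    return seq[:j], quals[:j]


def crop(read: Read, length: int) -> Read:
    """Cut the read to `length` bases by removing bases from the end."""
    seq, quals = read
    return seq[:length], quals[:length]


def headcrop(read: Read, n: int) -> Read:
    """Remove `n` bases from the start of the read."""
    seq, quals = read
    return seq[n:], quals[n:]


read = ("NACGTN", [2, 30, 30, 30, 30, 2])
print(trailing(leading(read, 3), 3))  # ('ACGT', [30, 30, 30, 30])
```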

reads_filtering.minlen

label:: Minimum length
type:: basic:integer
description:: Drop the read if it is below a specified length.
required:: False

reads_filtering.average_quality

label:: Average quality
type:: basic:integer
description:: Drop the read if the average quality is below the specified level.
required:: False
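The two read-level filters drop a whole read rather than trimming it. A minimal sketch of both checks, using a hypothetical `passes_filters` helper where `None` means "option not set". Treating an empty read as failing is an assumption of this sketch.

```python
from typing import List, Optional


def passes_filters(quals: List[int], minlen: Optional[int] = None,
                   average_quality: Optional[int] = None) -> bool:
    """Return True if a read survives the minimum-length and average-quality filters."""
    if not quals:
        # Assumption: reads trimmed down to nothing are always dropped.
        return False
    if minlen is not None and len(quals) < minlen:
        return False
    if average_quality is not None and sum(quals) / len(quals) < average_quality:
        return False
    return True


print(passes_filters([30] * 50, minlen=36))          # True
print(passes_filters([10] * 50, average_quality=20))  # False
```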

fastq

label:: Reads file (mate 1)
type:: list:basic:file

fastq_unpaired

label:: Reads file
type:: basic:file
required:: False

fastq2

label:: Reads file (mate 2)
type:: list:basic:file

fastq2_unpaired

label:: Reads file
type:: basic:file
required:: False

fastqc_url

label:: Quality control with FastQC (Upstream)
type:: list:basic:file:html

fastqc_url2

label:: Quality control with FastQC (Downstream)
type:: list:basic:file:html

fastqc_archive

label:: Download FastQC archive (Upstream)
type:: list:basic:file

fastqc_archive2

label:: Download FastQC archive (Downstream)
type:: list:basic:file

Trimmomatic (single-end)

data:reads:fastq:single:trimmomatictrimmomatic-single (data:reads:fastq:single reads, data:seq:nucleotide adapters, basic:integer seed_mismatches, basic:integer simple_clip_threshold, basic:integer window_size, basic:integer required_quality, basic:integer target_length, basic:decimal strictness, basic:integer leading, basic:integer trailing, basic:integer crop, basic:integer headcrop, basic:integer minlen, basic:integer average_quality)[Source: v2.5.2]

Trimmomatic performs a variety of useful trimming tasks including removing adapters for Illumina paired-end and single-end data. FastQC is performed for quality control checks on trimmed raw sequence data, which are the output of Trimmomatic. See [Trimmomatic official website](http://www.usadellab.org/cms/?page=trimmomatic), the [introductory paper](https://www.ncbi.nlm.nih.gov/pubmed/24695404), and the [FastQC official website](https://www.bioinformatics.babraham.ac.uk/projects/fastqc/) for more information.

reads

label:: Reads
type:: data:reads:fastq:single

illuminaclip.adapters

label:: Adapter sequences
type:: data:seq:nucleotide
description:: Adapter sequences in FASTA format that will be removed from the reads. This field, together with the ‘Seed mismatches’ and ‘Simple clip threshold’ parameters, is required to perform Illuminaclipping.
required:: False

illuminaclip.seed_mismatches

label:: Seed mismatches
type:: basic:integer
description:: Specifies the maximum mismatch count that will still allow a full match to be performed. This field, together with the ‘Adapter sequences’ and ‘Simple clip threshold’ parameters, is required to perform Illuminaclipping.
required:: False
disabled:: !illuminaclip.adapters

illuminaclip.simple_clip_threshold

label:: Simple clip threshold
type:: basic:integer
description:: Specifies how accurate the match between an adapter (or other) sequence and a read must be. This field, together with the ‘Adapter sequences’ and ‘Seed mismatches’ parameters, is required to perform Illuminaclipping.
required:: False
disabled:: !illuminaclip.adapters

slidingwindow.window_size

label:: Window size
type:: basic:integer
description:: Specifies the number of bases to average across. This field as well as ‘Required quality’ are needed to perform a ‘Sliding window’ trimming (cutting once the average quality within the window falls below a threshold).
required:: False

slidingwindow.required_quality

label:: Required quality
type:: basic:integer
description:: Specifies the average quality required in window size. This field as well as ‘Window size’ are needed to perform a ‘Sliding window’ trimming (cutting once the average quality within the window falls below a threshold).
required:: False

maxinfo.target_length

label:: Target length
type:: basic:integer
description:: This specifies the read length which is likely to allow the location of the read within the target sequence to be determined. This field as well as ‘Strictness’ are needed to perform ‘Maxinfo’ feature (an adaptive quality trimmer which balances read length and error rate to maximise the value of each read).
required:: False

maxinfo.strictness

label:: Strictness
type:: basic:decimal
description:: This value, which should be set between 0 and 1, specifies the balance between preserving as much read length as possible vs. removal of incorrect bases. A low value of this parameter (<0.2) favours longer reads, while a high value (>0.8) favours read correctness. This field as well as ‘Target length’ are needed to perform ‘Maxinfo’ feature (an adaptive quality trimmer which balances read length and error rate to maximise the value of each read).
required:: False

trim_bases.leading

label:: Leading quality
type:: basic:integer
description:: Remove low quality bases from the beginning, if below a threshold quality.
required:: False

trim_bases.trailing

label:: Trailing quality
type:: basic:integer
description:: Remove low quality bases from the end, if below a threshold quality.
required:: False

trim_bases.crop

label:: Crop
type:: basic:integer
description:: Cut the read to a specified length by removing bases from the end.
required:: False

trim_bases.headcrop

label:: Headcrop
type:: basic:integer
description:: Cut the specified number of bases from the start of the read.
required:: False

reads_filtering.minlen

label:: Minimum length
type:: basic:integer
description:: Drop the read if it is below a specified length.
required:: False

reads_filtering.average_quality

label:: Average quality
type:: basic:integer
description:: Drop the read if the average quality is below the specified level.
required:: False

fastq

label:: Reads file
type:: list:basic:file

fastqc_url

label:: Quality control with FastQC
type:: list:basic:file:html

fastqc_archive

label:: Download FastQC archive
type:: list:basic:file

UMI-tools dedup

data:alignment:bam:umitools:dedup:umi-tools-dedup (data:alignment:bam alignment)[Source: v1.5.1]

Deduplicate reads using UMI and mapping coordinates.
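In its simplest form, UMI-based deduplication keeps one read per combination of mapping position, strand and UMI. The sketch below shows that exact-match grouping only. UMI-tools' default `directional` method also merges UMIs that differ by sequencing errors and chooses the representative read differently, so this is an illustration of the idea rather than what the process runs. `AlignedRead` and `dedup_unique` are hypothetical.

```python
from typing import Iterable, List, NamedTuple


class AlignedRead(NamedTuple):
    name: str
    chrom: str
    pos: int     # 5' mapping position of the read
    strand: str  # "+" or "-"
    umi: str


def dedup_unique(reads: Iterable[AlignedRead]) -> List[AlignedRead]:
    """Keep the first read seen for each (chrom, pos, strand, UMI) combination."""
    seen = set()
    kept = []
    for read in reads:
        key = (read.chrom, read.pos, read.strand, read.umi)
        if key not in seen:
            seen.add(key)
            kept.append(read)
    return kept


reads = [
    AlignedRead("r1", "chr1", 100, "+", "ACGT"),
    AlignedRead("r2", "chr1", 100, "+", "ACGT"),  # PCR duplicate of r1
    AlignedRead("r3", "chr1", 100, "+", "ACGA"),  # same position, different UMI
    AlignedRead("r4", "chr1", 100, "-", "ACGT"),  # opposite strand
]
print([read.name for read in dedup_unique(reads)])  # ['r1', 'r3', 'r4']
```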

alignment

label:: Alignment
type:: data:alignment:bam
required:: True
disabled:: False
hidden:: False

bam

label:: Clipped BAM file
type:: basic:file
required:: True
disabled:: False
hidden:: False

bai

label:: Index of clipped BAM file
type:: basic:file
required:: True
disabled:: False
hidden:: False

stats

label:: Alignment statistics
type:: basic:file
required:: True
disabled:: False
hidden:: False

dedup_log

label:: Deduplication log
type:: basic:file
required:: True
disabled:: False
hidden:: False

dedup_stats

label:: Deduplication stats
type:: basic:file
required:: True
disabled:: False
hidden:: False

species

label:: Species
type:: basic:string
required:: True
disabled:: False
hidden:: False

build

label:: Build
type:: basic:string
required:: True
disabled:: False
hidden:: False

Upload microarray expression (unmapped)

data:microarray:normalized:upload-microarray-expression (basic:file exp, basic:string exp_type, basic:string platform, basic:string platform_id, basic:string species)[Source: v1.1.1]

Import unmapped microarray expression data.

exp

label:: Normalized expression
type:: basic:file
description:: Normalized expression file with the original probe IDs. Supported file extensions are .tab.*, .tsv.*, .txt.*
required:: True
disabled:: False
hidden:: False

exp_type

label:: Normalization type
type:: basic:string
required:: True
disabled:: False
hidden:: False

platform

label:: Microarray platform name
type:: basic:string
required:: True
disabled:: False
hidden:: False

platform_id

label:: GEO platform ID
type:: basic:string
description:: Platform ID according to the GEO database. This can be used in following steps to automatically map probe IDs to genes.
required:: False
disabled:: False
hidden:: False

species

label:: Species
type:: basic:string
description:: Select a species name from the dropdown menu or write a custom species name in the species field.
required:: True
disabled:: False
hidden:: False
choices:: Homo sapiens: Homo sapiens; Mus musculus: Mus musculus; Rattus norvegicus: Rattus norvegicus; Macaca mulatta: Macaca mulatta; Dictyostelium discoideum: Dictyostelium discoideum

exp

label:: Uploaded normalized expression
type:: basic:file
required:: True
disabled:: False
hidden:: False

exp_type

label:: Normalization type
type:: basic:string
required:: True
disabled:: False
hidden:: False

platform

label:: Microarray platform type
type:: basic:string
required:: True
disabled:: False
hidden:: False

platform_id

label:: GEO platform ID
type:: basic:string
required:: False
disabled:: False
hidden:: False

species

label:: Species
type:: basic:string
required:: True
disabled:: False
hidden:: False

Upload proteomics sample

data:proteomics:massspectrometry:upload-proteomics-sample (basic:file src, basic:string species, basic:string source)[Source: v1.2.1]

Upload a mass spectrometry proteomics sample data file. The input 5-column tab-delimited file with the .txt suffix is expected to contain a header line with the following meta-data column names: “Uniprot ID”, “Gene symbol”, “Protein name” and “Number of peptides”. The fifth column contains the sample data.
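The expected layout can be checked with the standard `csv` module. This is a sketch only: `read_proteomics_sample` is hypothetical, the example row uses made-up values, and the sketch assumes the sample column holds numeric values.

```python
import csv
import io
from typing import Dict, TextIO

# Required metadata columns, as listed in the process description.
META_COLUMNS = ["Uniprot ID", "Gene symbol", "Protein name", "Number of peptides"]


def read_proteomics_sample(handle: TextIO) -> Dict[str, float]:
    """Validate a 5-column sample table and map each Uniprot ID to its sample value."""
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    if len(header) != 5 or header[:4] != META_COLUMNS:
        raise ValueError(f"Unexpected header: {header}")
    # Assumption: the fifth (sample) column holds numeric values.
    return {row[0]: float(row[4]) for row in reader if row}


example = (
    "Uniprot ID\tGene symbol\tProtein name\tNumber of peptides\tSample_1\n"
    "P04637\tTP53\tCellular tumor antigen p53\t12\t3.5\n"
)
print(read_proteomics_sample(io.StringIO(example)))  # {'P04637': 3.5}
```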

src

label:: Table containing mass spectrometry data (.txt)
type:: basic:file
required:: True
disabled:: False
hidden:: False

species

label:: Species
type:: basic:string
description:: Select a species name from the dropdown menu or write a custom species name in the species field.
required:: True
disabled:: False
hidden:: False
choices:: Homo sapiens: Homo sapiens; Mus musculus: Mus musculus; Rattus norvegicus: Rattus norvegicus

source

label:: Protein ID database source
type:: basic:string
required:: True
disabled:: False
hidden:: False
default:: UniProtKB
choices:: UniProtKB: UniProtKB

table

label:: Uploaded table
type:: basic:file
required:: True
disabled:: False
hidden:: False

species

label:: Species
type:: basic:string
required:: True
disabled:: False
hidden:: False

source

label:: Source
type:: basic:string
required:: True
disabled:: False
hidden:: False

Upload proteomics sample set

data:proteomics:sampleset:upload-proteomics-sample-set (basic:file src, basic:string species, basic:string source)[Source: v1.2.1]

Upload a mass spectrometry proteomics sample set file. The input multi-sample tab-delimited file with the .txt suffix is expected to contain a header line with the following meta-data column names: “Uniprot ID”, “Gene symbol”, “Protein name” and “Number of peptides”. Each additional column in the input file should contain data for a single sample.
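In a sample set, every column after the four metadata columns is one sample. The hypothetical helper below reads the sample names from a header line and rejects headers without the metadata columns; the column names in the example are made up.

```python
from typing import List

# Required metadata columns, as listed in the process description.
META_COLUMNS = ["Uniprot ID", "Gene symbol", "Protein name", "Number of peptides"]


def sample_names(header_line: str) -> List[str]:
    """Return the sample column names from a tab-delimited sample-set header line."""
    columns = header_line.rstrip("\r\n").split("\t")
    if columns[:4] != META_COLUMNS or len(columns) < 5:
        raise ValueError("Header must start with the four metadata columns and list at least one sample")
    return columns[4:]


header = "Uniprot ID\tGene symbol\tProtein name\tNumber of peptides\tCtrl_1\tCtrl_2\tTreated_1\n"
print(sample_names(header))  # ['Ctrl_1', 'Ctrl_2', 'Treated_1']
```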

src

label:: Table containing mass spectrometry data (.txt)
type:: basic:file
required:: True
disabled:: False
hidden:: False

species

label:: Species
type:: basic:string
description:: Select a species name from the dropdown menu or write a custom species name in the species field.
required:: True
disabled:: False
hidden:: False
choices:: Homo sapiens: Homo sapiens; Mus musculus: Mus musculus; Rattus norvegicus: Rattus norvegicus

source

label:: Protein ID database source
type:: basic:string
required:: True
disabled:: False
hidden:: False
default:: UniProtKB
choices:: UniProtKB: UniProtKB

table

label:: Uploaded table
type:: basic:file
required:: True
disabled:: False
hidden:: False

species

label:: Species
type:: basic:string
required:: True
disabled:: False
hidden:: False

source

label:: Source
type:: basic:string
required:: True
disabled:: False
hidden:: False

VCF file

data:variants:vcfupload-variants-vcf (basic:file src, basic:string species, basic:string build)[Source: v2.3.0]

Upload variants in VCF format.

src

label:: Variants (VCF)
type:: basic:file
description:: Variants in VCF format.
required:: True
validate_regex:: \.(vcf)(|\.gz|\.bz2|\.tgz|\.tar\.gz|\.tar\.bz2|\.zip|\.rar|\.7z)$
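The `validate_regex` above accepts a `.vcf` extension, optionally followed by one compression or archive suffix. The sketch checks a few file names against the same pattern with Python's `re`, assuming the uploader applies the pattern case-sensitively as a search (the trailing `$` anchors it to the end of the name).

```python
import re

# Pattern copied from the validate_regex field above.
VCF_NAME = re.compile(r"\.(vcf)(|\.gz|\.bz2|\.tgz|\.tar\.gz|\.tar\.bz2|\.zip|\.rar|\.7z)$")

for name in ["sample.vcf", "sample.vcf.gz", "sample.vcf.tar.bz2", "sample.bcf", "sample.vcf.gz.tbi"]:
    print(name, bool(VCF_NAME.search(name)))
# sample.vcf True, sample.vcf.gz True, sample.vcf.tar.bz2 True,
# sample.bcf False, sample.vcf.gz.tbi False
```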

species

label:: Species
type:: basic:string
description:: Latin name of the species.
choices:: Homo sapiens: Homo sapiens; Mus musculus: Mus musculus; Rattus norvegicus: Rattus norvegicus; Dictyostelium discoideum: Dictyostelium discoideum; Odocoileus virginianus texanus: Odocoileus virginianus texanus; Solanum tuberosum: Solanum tuberosum

build

label:: Genome build
type:: basic:string

vcf

label:: Uploaded file
type:: basic:file

tbi

label:: Tabix index
type:: basic:file

species

label:: Species
type:: basic:string

build

label:: Build
type:: basic:string

Variant calling (CheMut)

data:variants:vcf:chemut:vc-chemut (data:seq:nucleotide genome, list:data:alignment:bam parental_strains, list:data:alignment:bam mutant_strains, basic:boolean base_recalibration, data:variants:vcf known_sites, list:data:variants:vcf known_indels, basic:string PL, basic:string LB, basic:string PU, basic:string CN, basic:date DT, data:bed intervals, basic:integer ploidy, basic:integer stand_call_conf, basic:integer mbq, basic:integer max_reads, basic:integer java_gc_threads, basic:integer max_heap_size)[Source: v3.0.1]

CheMut variant calling using multiple BAM input files.

genome

label:: Reference genome
type:: data:seq:nucleotide
required:: True
disabled:: False
hidden:: False

parental_strains

label:: Parental strains
type:: list:data:alignment:bam
required:: True
disabled:: False
hidden:: False

mutant_strains

label:: Mutant strains
type:: list:data:alignment:bam
required:: True
disabled:: False
hidden:: False

base_recalibration

label:: Do variant base recalibration
type:: basic:boolean
required:: True
disabled:: False
hidden:: False
default:: False

known_sites

label:: dbSNP file
type:: data:variants:vcf
description:: Database of known polymorphic sites.
required:: False
disabled:: False
hidden:: False

known_indels

label:: Known indels
type:: list:data:variants:vcf
required:: False
disabled:: False
hidden:: !base_recalibration

reads_info.PL

label:

Platform/technology

type:

basic:string

description:

Platform/technology used to produce the reads.

required:

True

disabled:

False

hidden:

False

default:

Illumina

choices:

Capillary: Capillary
Ls454: Ls454
Illumina: Illumina
SOLiD: SOLiD
Helicos: Helicos
IonTorrent: IonTorrent
Pacbio: Pacbio

reads_info.LB

label:: Library
type:: basic:string
required:: True
disabled:: False
hidden:: False
default:: x

reads_info.PU

label:: Platform unit
type:: basic:string
description:: Platform unit (e.g. flowcell-barcode.lane for Illumina or slide for SOLiD). Unique identifier.
required:: True
disabled:: False
hidden:: False
default:: x

reads_info.CN

label:: Sequencing center
type:: basic:string
description:: Name of sequencing center producing the read.
required:: True
disabled:: False
hidden:: False
default:: x

reads_info.DT

label:: Date
type:: basic:date
description:: Date the run was produced.
required:: True
disabled:: False
hidden:: False
default:: 2017-01-01
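
In the SAM format, PL, LB, PU, CN and DT are tags of the read group (`@RG`) header line, which consists of tab-separated `TAG:value` pairs and requires an `ID` tag. A minimal Python sketch of such a line built from the defaults above. The `ID` and `SM` values are hypothetical (`SM` is added because GATK tools expect a sample name); this page does not show how the process sets them:

```python
from datetime import date


def read_group_line(rg_id: str, sample: str, pl: str = "Illumina", lb: str = "x",
                    pu: str = "x", cn: str = "x", dt: date = date(2017, 1, 1)) -> str:
    """Build a tab-separated SAM @RG header line from the reads_info fields.

    rg_id and sample are hypothetical inputs; the form above does not expose them.
    """
    fields = {
        "ID": rg_id,
        "SM": sample,
        "PL": pl,
        "LB": lb,
        "PU": pu,
        "CN": cn,
        "DT": dt.isoformat(),  # SAM expects an ISO 8601 date
    }
    return "@RG\t" + "\t".join(f"{tag}:{value}" for tag, value in fields.items())


print(read_group_line("rg1", "mutant_1"))
```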

hc.intervals

label:: Intervals (from BED file)
type:: data:bed
description:: Use this option to perform the analysis over only part of the genome.
required:: False
disabled:: False
hidden:: False

hc.ploidy

label:: Sample ploidy
type:: basic:integer
description:: Ploidy (number of chromosomes) per sample. For pooled data, set to (Number of samples in each pool * Sample Ploidy).
required:: True
disabled:: False
hidden:: False
default:: 2
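
A worked example of the pooled-data rule in the description, using a hypothetical pool of five diploid samples:

```python
def pooled_ploidy(samples_per_pool: int, sample_ploidy: int = 2) -> int:
    """Ploidy to enter for pooled data: samples per pool times per-sample ploidy."""
    if samples_per_pool < 1 or sample_ploidy < 1:
        raise ValueError("both values must be positive")
    return samples_per_pool * sample_ploidy


print(pooled_ploidy(5))  # 10
```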

hc.stand_call_conf

label:: Min call confidence threshold
type:: basic:integer
description:: The minimum phred-scaled confidence threshold at which variants should be called.
required:: True
disabled:: False
hidden:: False
default:: 30
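
The call confidence (and the base quality below) are Phred-scaled: a value Q corresponds to an estimated probability p = 10^(-Q/10) that the call is wrong, i.e. Q = -10·log10(p). A small Python helper that makes the defaults concrete:

```python
import math


def phred_to_error_probability(q: float) -> float:
    """Invert the Phred scale: Q = -10 * log10(p)  =>  p = 10 ** (-Q / 10)."""
    return 10 ** (-q / 10)


def error_probability_to_phred(p: float) -> float:
    """Convert an error probability into a Phred-scaled value."""
    return -10 * math.log10(p)


print(phred_to_error_probability(30))
print(phred_to_error_probability(10))
```

The default call confidence of 30 thus corresponds to a 1 in 1000 chance of a wrong call, and the default minimum base quality of 10 to a 1 in 10 chance that the base is miscalled.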

hc.mbq

label:: Min Base Quality
type:: basic:integer
description:: Minimum base quality required to consider a base for calling.
required:: True
disabled:: False
hidden:: False
default:: 10

hc.max_reads

label:: Max reads per alignment start site
type:: basic:integer
description:: Maximum number of reads to retain per alignment start position. Reads above this threshold will be downsampled. Set to 0 to disable.
required:: True
disabled:: False
hidden:: False
default:: 50
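
The effect of the per-start-site cap can be sketched in a few lines of Python. This is a simplification under a hypothetical `(contig, start, name)` record layout: it keeps the first reads seen, whereas GATK chooses the retained reads by sampling rather than by order:

```python
from collections import defaultdict
from typing import Iterable, List, Tuple

Read = Tuple[str, int, str]  # (contig, alignment start, read name); hypothetical layout


def cap_reads_per_start(reads: Iterable[Read], max_reads: int = 50) -> List[Read]:
    """Keep at most max_reads reads per (contig, start); 0 disables the cap.

    Simplification: the first reads encountered are kept, not a random sample.
    """
    if max_reads == 0:
        return list(reads)
    kept: List[Read] = []
    seen = defaultdict(int)
    for read in reads:
        key = (read[0], read[1])
        if seen[key] < max_reads:
            kept.append(read)
            seen[key] += 1
    return kept


reads = [("chr1", 100, f"r{i}") for i in range(5)] + [("chr1", 200, "s1")]
print(len(cap_reads_per_start(reads, 3)))  # 4
```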

advanced.java_gc_threads

label:: Java ParallelGCThreads
type:: basic:integer
description:: Sets the number of threads used during parallel phases of the garbage collectors.
required:: True
disabled:: False
hidden:: False
default:: 2

advanced.max_heap_size

label:: Java maximum heap size (Xmx)
type:: basic:integer
description:: Set the maximum Java heap size (in GB).
required:: True
disabled:: False
hidden:: False
default:: 12
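
The two advanced fields correspond to standard HotSpot JVM options. A sketch of the resulting flags; how the process actually hands them to GATK (for example through the `gatk` launcher's `--java-options`) is an assumption:

```python
from typing import List


def java_options(gc_threads: int = 2, max_heap_gb: int = 12) -> List[str]:
    """Translate the advanced fields into JVM flags (heap size given in GB)."""
    return [f"-XX:ParallelGCThreads={gc_threads}", f"-Xmx{max_heap_gb}g"]


print(" ".join(java_options()))  # -XX:ParallelGCThreads=2 -Xmx12g
```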

vcf

label:: Called variants file
type:: basic:file
required:: True
disabled:: False
hidden:: False

tbi

label:: Tabix index
type:: basic:file
required:: True
disabled:: False
hidden:: False

species

label:: Species
type:: basic:string
required:: True
disabled:: False
hidden:: False

build

label:: Build
type:: basic:string
required:: True
disabled:: False
hidden:: False

Variant filtering (CheMut)

data:variants:vcf:filtering:filtering-chemut (data:variants:vcf variants, basic:string analysis_type, basic:string parental_strain, basic:string mutant_strain, data:seq:nucleotide genome, basic:integer read_depth)[Source: v1.8.2]

Filtering and annotation of variant calling data (CheMut): chemical mutagenesis in _Dictyostelium discoideum_.

variants

label:: Variants file (VCF)
type:: data:variants:vcf
required:: True
disabled:: False
hidden:: False

analysis_type

label:

Analysis type

type:

basic:string

description:

Choice of the analysis type. Use ‘SNV’ or ‘INDEL’ options. Choose options SNV_CHR2 or INDEL_CHR2 to run the GATK analysis only on the diploid portion of CHR2 (-ploidy 2 -L chr2:2263132-3015703).

required:

True

disabled:

False

hidden:

False

default:

snv

choices:

SNV: snv
INDEL: indel
SNV_CHR2: snv_chr2
INDEL_CHR2: indel_chr2
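
The value stored for `analysis_type` is the lower-case string on the right of each choice. A hypothetical Python helper that maps those values to the GATK restriction quoted in the description; only the `-ploidy 2 -L chr2:2263132-3015703` arguments come from this page, and the way the process really assembles its command line is not shown here:

```python
from typing import List

# Restriction stated in the analysis_type description for the *_chr2 choices.
CHR2_ARGS = ["-ploidy", "2", "-L", "chr2:2263132-3015703"]
VALID_TYPES = {"snv", "indel", "snv_chr2", "indel_chr2"}


def extra_gatk_args(analysis_type: str = "snv") -> List[str]:
    """Return the extra GATK arguments implied by the chosen analysis type."""
    if analysis_type not in VALID_TYPES:
        raise ValueError(f"unknown analysis type: {analysis_type}")
    return list(CHR2_ARGS) if analysis_type.endswith("_chr2") else []


print(extra_gatk_args("snv_chr2"))
```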

parental_strain

label:: Parental strain prefix
type:: basic:string
required:: True
disabled:: False
hidden:: False
default:: parental

mutant_strain

label:: Mutant strain prefix
type:: basic:string
required:: True
disabled:: False
hidden:: False
default:: mut

genome

label:: Reference genome
type:: data:seq:nucleotide
required:: True
disabled:: False
hidden:: False

read_depth

label:: Read Depth Cutoff
type:: basic:integer
required:: True
disabled:: False
hidden:: False
default:: 5

summary

label:: Summary
type:: basic:file
description:: Summarize the input parameters and results.
required:: True
disabled:: False
hidden:: False

vcf

label:: Variants
type:: basic:file
description:: A genome VCF file of variants that passed the filters.
required:: True
disabled:: False
hidden:: False

tbi

label:: Tabix index
type:: basic:file
required:: True
disabled:: False
hidden:: False

variants_filtered

label:: Variants filtered
type:: basic:file
description:: A data frame of variants that passed the filters.
required:: False
disabled:: False
hidden:: False

variants_filtered_alt

label:: Variants filtered (multiple alt. alleles)
type:: basic:file
description:: A data frame of variants that contain more than two alternative alleles. These variants are likely to be false positives.
required:: False
disabled:: False
hidden:: False

gene_list_all

label:: Gene list (all)
type:: basic:file
description:: Genes that are mutated at least once.
required:: False
disabled:: False
hidden:: False

gene_list_top

label:: Gene list (top)
type:: basic:file
description:: Genes that are mutated at least twice.
required:: False
disabled:: False
hidden:: False
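
The difference between the two gene lists is only a count threshold: genes hit by at least one filtered mutation versus at least two. A minimal Python sketch, assuming a hypothetical input with one gene ID per filtered mutation:

```python
from collections import Counter
from typing import Iterable, List, Tuple


def gene_lists(mutated_genes: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Return (all, top): genes mutated at least once and at least twice.

    mutated_genes holds one gene ID per filtered mutation (hypothetical format).
    """
    counts = Counter(mutated_genes)
    all_genes = sorted(counts)
    top_genes = sorted(gene for gene, n in counts.items() if n >= 2)
    return all_genes, top_genes


all_genes, top_genes = gene_lists(["geneA", "geneB", "geneA", "geneC", "geneC", "geneC"])
print(all_genes)  # ['geneA', 'geneB', 'geneC']
print(top_genes)  # ['geneA', 'geneC']
```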

mut_chr

label:: Mutations (by chr)
type:: basic:file
description:: List mutations in individual chromosomes.
required:: False
disabled:: False
hidden:: False

mut_strain

label:: Mutations (by strain)
type:: basic:file
description:: List mutations in individual strains.
required:: False
disabled:: False
hidden:: False

strain_by_gene

label:: Strain (by gene)
type:: basic:file
description:: List mutants that carry mutations in individual genes.
required:: False
disabled:: False
hidden:: False

species

label:: Species
type:: basic:string
required:: True
disabled:: False
hidden:: False

build

label:: Build
type:: basic:string
required:: True
disabled:: False
hidden:: False

WALT

data:alignment:bam:walt:walt (data:index:walt genome, data:reads:fastq reads, basic:boolean rm_dup, basic:integer optical_distance, basic:integer mismatch, basic:integer number, basic:string spikein_name, basic:boolean filter_spikein)[Source: v3.7.2]

WALT (Wildcard ALignment Tool) is a read mapping program for bisulfite sequencing in DNA methylation studies.

genome

label:: Reference genome
type:: data:index:walt

reads

label:: Reads
type:: data:reads:fastq

rm_dup

label:: Remove duplicates
type:: basic:boolean
default:: True

optical_distance

label:: Optical duplicate distance
type:: basic:integer
description:: The maximum offset between two duplicate clusters for them to be considered optical duplicates. Suggested values are 100 for HiSeq-style platforms and about 2500 for NovaSeq. The default of 0 disables the search for optical duplicates.
disabled:: !rm_dup
default:: 0
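
Optical duplicates are duplicate reads whose clusters sit physically close together on the same flow cell tile. A sketch of a Picard-style proximity test, assuming Illumina read names that end in `flowcell:lane:tile:x:y`; it only illustrates what the distance means and is not this process's code:

```python
from typing import Tuple


def cluster_position(read_name: str) -> Tuple[str, int, int]:
    """Parse an Illumina-style name 'INSTR:RUN:FLOWCELL:LANE:TILE:X:Y' into (tile key, x, y)."""
    parts = read_name.split(" ")[0].split(":")
    tile_key = ":".join(parts[-5:-2])  # flowcell, lane and tile
    return tile_key, int(parts[-2]), int(parts[-1])


def are_optical_duplicates(name_a: str, name_b: str, optical_distance: int) -> bool:
    """True if two duplicate reads come from clusters within optical_distance pixels.

    A distance of 0 means optical duplicates are not looked for.
    """
    if optical_distance <= 0:
        return False
    tile_a, xa, ya = cluster_position(name_a)
    tile_b, xb, yb = cluster_position(name_b)
    return (tile_a == tile_b
            and abs(xa - xb) <= optical_distance
            and abs(ya - yb) <= optical_distance)
```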

mismatch

label:: Maximum allowed mismatches
type:: basic:integer
required:: False

number

label:: Number of reads to map in one loop
type:: basic:integer
description:: Sets the number of reads to map in each loop. A larger number makes the program use more memory, which is especially evident in paired-end mapping.
required:: False

spikein_options.spikein_name

label:: Chromosome name of unmethylated control sequence
type:: basic:string
description:: Specifies the name of the unmethylated control sequence, which is output as a separate alignment file. It is recommended to remove duplicates to reduce any bias introduced by incomplete conversion on PCR duplicate reads.
required:: False

spikein_options.filter_spikein

label:: Remove control/spike-in sequences.
type:: basic:boolean
description:: Remove unmethylated control reads from the final alignment based on the provided name. It is recommended to remove any reads that are not naturally occurring in the sample (e.g. lambda virus spike-in).
disabled:: !spikein_options.spikein_name
default:: False
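
Taken together, the two spike-in options split the alignments by chromosome name. A minimal Python sketch over a hypothetical `(read_name, chromosome)` record layout, assuming that spike-in reads stay in the main alignment unless ‘Remove control/spike-in sequences’ is set; the chromosome name `lambda` is only an example:

```python
from typing import Iterable, List, Tuple

Alignment = Tuple[str, str]  # (read name, chromosome); hypothetical record layout


def split_spikein(alignments: Iterable[Alignment], spikein_name: str,
                  filter_spikein: bool = False) -> Tuple[List[Alignment], List[Alignment]]:
    """Return (main, spike-in) alignments.

    Spike-in records always go to the separate output; they are also kept
    in the main output unless filter_spikein is True.
    """
    main: List[Alignment] = []
    spikein: List[Alignment] = []
    for aln in alignments:
        if aln[1] == spikein_name:
            spikein.append(aln)
            if not filter_spikein:
                main.append(aln)
        else:
            main.append(aln)
    return main, spikein
```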

bam

label:: Alignment file (BAM)
type:: basic:file
description:: Position-sorted alignment in .bam format.

bai

label:: Index BAI
type:: basic:file

stats

label:: Statistics
type:: basic:file

mr

label:: Alignment file (MR)
type:: basic:file
description:: Position-sorted alignment in .mr format.

duplicates_report

label:: Removed duplicates statistics
type:: basic:file
required:: False

unmapped

label:: Unmapped reads
type:: basic:file
required:: False

spikein_mr

label:: Alignment file of unmethylated control reads
type:: basic:file
required:: False

species

label:: Species
type:: basic:string

build

label:: Build
type:: basic:string

WALT genome index

data:index:walt:walt-index (data:seq:nucleotide ref_seq)[Source: v1.2.1]

Create WALT genome index.

ref_seq

label:: Reference sequence (nucleotide FASTA)
type:: data:seq:nucleotide
required:: True
disabled:: False
hidden:: False

index

label:: WALT index
type:: basic:dir
required:: True
disabled:: False
hidden:: False

fastagz

label:: FASTA file (compressed)
type:: basic:file
required:: True
disabled:: False
hidden:: False

fasta

label:: FASTA file
type:: basic:file
required:: True
disabled:: False
hidden:: False

fai

label:: FASTA file index
type:: basic:file
required:: True
disabled:: False
hidden:: False

species

label:: Species
type:: basic:string
required:: True
disabled:: False
hidden:: False

build

label:: Build
type:: basic:string
required:: True
disabled:: False
hidden:: False

WGBS (paired-end)

data:workflow:wgbs:workflow-wgbs-paired (data:reads:fastq:paired reads, data:index:walt walt_index, data:seq:nucleotide ref_seq, basic:string validation_stringency, data:seq:nucleotide adapters, basic:integer seed_mismatches, basic:integer simple_clip_threshold, basic:integer min_adapter_length, basic:integer palindrome_clip_threshold, basic:boolean keep_both_reads, basic:integer leading, basic:integer trailing, basic:integer crop, basic:integer headcrop, basic:integer minlen, basic:boolean rm_dup, basic:integer optical_distance, basic:integer mismatch, basic:integer number, basic:string spikein_name, basic:boolean filter_spikein, basic:boolean skip, data:seq:nucleotide sequence, basic:boolean count_all, basic:integer read_length, basic:decimal max_mismatch, basic:boolean a_rich, basic:boolean cpgs, basic:boolean symmetric_cpgs, data:seq:nucleotide adapters, basic:integer insert_size, basic:string pair_orientation, basic:integer read_length, basic:integer min_map_quality, basic:integer min_quality, basic:integer coverage_cap, basic:integer accumulation_cap, basic:integer sample_size, basic:integer min_quality, basic:integer next_base_quality, basic:integer min_lenght, basic:decimal mismatch_rate, basic:decimal minimum_fraction, basic:boolean include_duplicates, basic:decimal deviations)[Source: v2.2.0]

This WGBS pipeline comprises trimming, alignment, computation of methylation levels, identification of hypo-methylated regions (HMRs) and additional QC steps. First, reads are trimmed to remove adapters or kit-specific artifacts. Reads are then aligned with the __WALT__ aligner. [WALT (Wildcard ALignment Tool)](https://github.com/smithlabcode/walt) is a fast and accurate read mapper for bisulfite sequencing. Next, the methylation level at each genomic cytosine is calculated using __methcounts__. Finally, hypo-methylated regions are identified using __hmr__. Both methcounts and hmr are part of the [MethPipe](http://smithlabresearch.org/software/methpipe/) package. QC steps are based on [Picard](http://broadinstitute.github.io/picard/) and include high-level alignment metrics, WGS performance metrics and summary statistics from bisulfite sequencing. Final QC reports are summarized by MultiQC.
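
In bisulfite sequencing, unmethylated cytosines are converted and read as T, while methylated cytosines still read as C. The methylation level of a cytosine can therefore be understood as the fraction of covering reads that still show a C. A simplified Python sketch of that ratio; MethPipe's actual implementation and output format are not reproduced here:

```python
def methylation_level(c_reads: int, t_reads: int) -> float:
    """Fraction of reads supporting methylation at one reference cytosine.

    After bisulfite treatment an unmethylated C reads as T, so
    level = C / (C + T). By convention here, an uncovered site returns 0.0.
    """
    total = c_reads + t_reads
    return c_reads / total if total else 0.0


print(methylation_level(3, 1))  # 0.75
```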

reads

label:: Select sample(s)
type:: data:reads:fastq:paired

walt_index

label:: Walt index
type:: data:index:walt

ref_seq

label:: Reference sequence
type:: data:seq:nucleotide

validation_stringency

label:

Validation stringency

type:

basic:string

description:

Validation stringency for all SAM files read by this program. Setting stringency to SILENT can improve performance when processing a BAM file in which variable-length data (read, qualities, tags) do not otherwise need to be decoded. Default is STRICT.

default:

STRICT

choices:

STRICT: STRICT
LENIENT: LENIENT
SILENT: SILENT

adapter_trimming.adapters

label:: Adapter sequences
type:: data:seq:nucleotide
description:: Adapter sequence in FASTA format that will be removed from the read. This field, together with the ‘Seed mismatches’, ‘Simple clip threshold’ and ‘Palindrome clip threshold’ parameters, is required to perform adapter trimming. ‘Minimum adapter length’ and ‘Keep both reads’ are optional parameters.
required:: False

adapter_trimming.seed_mismatches

label:: Seed mismatches
type:: basic:integer
description:: Specifies the maximum mismatch count which will still allow a full match to be performed. This field is required to perform adapter trimming.
required:: False
disabled:: !adapter_trimming.adapters

adapter_trimming.simple_clip_threshold

label:: Simple clip threshold
type:: basic:integer
description:: Specifies how accurate the match between any adapter etc. sequence must be against a read. This field is required to perform adapter trimming.
required:: False
disabled:: !adapter_trimming.adapters

adapter_trimming.min_adapter_length

label:: Minimum adapter length
type:: basic:integer
description:: In addition to the alignment score, palindrome mode can verify that a minimum length of adapter has been detected. If unspecified, this defaults to 8 bases, for historical reasons. However, since palindrome mode has a very low false positive rate, this can be safely reduced, even down to 1, to allow shorter adapter fragments to be removed.
disabled:: !adapter_trimming.seed_mismatches && !adapter_trimming.simple_clip_threshold && !adapter_trimming.palindrome_clip_threshold
default:: 8

adapter_trimming.palindrome_clip_threshold

label:: Palindrome clip threshold
type:: basic:integer
description:: Specifies how accurate the match between the two ‘adapter ligated’ reads must be for PE palindrome read alignment. This field is required to perform adapter trimming.
required:: False
disabled:: !adapter_trimming.adapters

adapter_trimming.keep_both_reads

label:: Keep both reads
type:: basic:boolean
description:: After read-through has been detected by palindrome mode and the adapter sequence removed, the reverse read contains the same sequence information as the forward read, albeit in reverse complement. For this reason, the default behaviour is to drop the reverse read entirely. Specifying this parameter retains the reverse read as well, which may be useful e.g. if the downstream tools cannot handle a combination of paired and unpaired reads. This field is optional for performing adapter trimming.
required:: False
disabled:: !adapter_trimming.seed_mismatches && !adapter_trimming.simple_clip_threshold && !adapter_trimming.palindrome_clip_threshold && !adapter_trimming.min_adapter_length
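
These fields correspond to Trimmomatic's `ILLUMINACLIP` step, whose positional order is `seedMismatches:palindromeClipThreshold:simpleClipThreshold[:minAdapterLength[:keepBothReads]]`, which differs from the order of the fields in this form. A sketch of how the step string could be assembled; the adapter file name is hypothetical and the process's own command construction is not shown here:

```python
from typing import Optional


def illuminaclip_step(adapters_fasta: str, seed_mismatches: int,
                      palindrome_clip_threshold: int, simple_clip_threshold: int,
                      min_adapter_length: Optional[int] = None,
                      keep_both_reads: bool = False) -> str:
    """Build a Trimmomatic ILLUMINACLIP step from the adapter_trimming fields."""
    parts = [adapters_fasta, str(seed_mismatches), str(palindrome_clip_threshold),
             str(simple_clip_threshold)]
    if min_adapter_length is not None or keep_both_reads:
        # keepBothReads is positional, so a minAdapterLength (default 8) must precede it.
        parts.append(str(min_adapter_length if min_adapter_length is not None else 8))
    if keep_both_reads:
        parts.append("true")
    return "ILLUMINACLIP:" + ":".join(parts)


print(illuminaclip_step("adapters.fa", 2, 30, 10))  # ILLUMINACLIP:adapters.fa:2:30:10
```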

trimming_filtering.leading

label:: Leading quality
type:: basic:integer
description:: Remove low quality bases from the beginning, if below a threshold quality.
required:: False

trimming_filtering.trailing

label:: Trailing quality
type:: basic:integer
description:: Remove low quality bases from the end, if below a threshold quality.
required:: False

trimming_filtering.crop

label:: Crop
type:: basic:integer
description:: Cut the read to a specified length by removing bases from the end.
required:: False

trimming_filtering.headcrop

label:: Headcrop
type:: basic:integer
description:: Cut the specified number of bases from the start of the read.
required:: False

trimming_filtering.minlen

label:: Minimum length
type:: basic:integer
description:: Drop the read if it is below a specified length.
required:: False
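
The trimming and filtering fields can be illustrated on a read's list of base qualities. A simplified Python sketch that applies the steps in the order the fields are listed above; that order is an assumption, since Trimmomatic runs its steps in the order they appear on its command line:

```python
from typing import List, Optional


def trim_read(quals: List[int], leading: Optional[int] = None, trailing: Optional[int] = None,
              crop: Optional[int] = None, headcrop: Optional[int] = None,
              minlen: Optional[int] = None) -> Optional[List[int]]:
    """Apply LEADING, TRAILING, CROP, HEADCROP and MINLEN-style rules to base qualities.

    Returns the kept qualities, or None if the read is dropped. Unset (None) steps are skipped.
    """
    start, end = 0, len(quals)
    if leading is not None:  # drop low-quality bases from the start
        while start < end and quals[start] < leading:
            start += 1
    if trailing is not None:  # drop low-quality bases from the end
        while end > start and quals[end - 1] < trailing:
            end -= 1
    if crop is not None:  # keep at most `crop` bases
        end = min(end, start + crop)
    if headcrop is not None:  # remove a fixed number of bases from the start
        start = min(end, start + headcrop)
    kept = quals[start:end]
    if minlen is not None and len(kept) < minlen:
        return None
    return kept


print(trim_read([2, 2, 30, 30, 30, 30, 2], leading=3, trailing=3))  # [30, 30, 30, 30]
```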

alignment.rm_dup

label:: Remove duplicates
type:: basic:boolean
default:: True

alignment.optical_distance

label:: Optical duplicate distance
type:: basic:integer
description:: The maximum offset between two duplicate clusters for them to be considered optical duplicates. Suggested values are 100 for HiSeq-style platforms and about 2500 for NovaSeq. The default of 0 disables the search for optical duplicates.
disabled:: !alignment.rm_dup
default:: 0

alignment.mismatch

label:: Maximum allowed mismatches
type:: basic:integer
default:: 6

alignment.number

label:: Number of reads to map in one loop
type:: basic:integer
description:: Sets the number of reads to map in each loop. A larger number makes the program use more memory, which is especially evident in paired-end mapping.
required:: False

alignment.spikein_name

label:: Chromosome name of unmethylated control sequence
type:: basic:string
description:: Specifies the name of the unmethylated control sequence, which is output as a separate alignment file. It is recommended to remove duplicates to reduce any bias introduced by incomplete conversion on PCR duplicate reads.
required:: False

alignment.filter_spikein

label:: Remove control/spike-in sequences.
type:: basic:boolean
description:: Remove unmethylated control reads from the final alignment based on the provided name. It is recommended to remove any reads that are not naturally occurring in the sample (e.g. lambda virus spike-in).
disabled:: !alignment.spikein_name
default:: False

bsrate.skip

label:: Skip Bisulfite conversion rate step
type:: basic:boolean
description:: The bisulfite conversion rate step can be skipped. If a separate alignment file for the unmethylated control sequence is not produced during alignment, this process will fail.
disabled:: !alignment.spikein_name
default:: True

bsrate.sequence

label:: Unmethylated control sequence
type:: data:seq:nucleotide
required:: False
disabled:: bsrate.skip

bsrate.count_all

label:: Count all cytosines including CpGs
type:: basic:boolean
disabled:: bsrate.skip
default:: True

bsrate.read_length

label:: Average read length
type:: basic:integer
default:: 150

bsrate.max_mismatch

label:: Maximum fraction of mismatches
type:: basic:decimal
required:: False
disabled:: bsrate.skip

bsrate.a_rich

label:: Reads are A-rich
type:: basic:boolean
disabled:: bsrate.skip
default:: False
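
On an unmethylated control sequence every cytosine should be converted, so the conversion rate can be estimated as the fraction of T among C and T reads at reference cytosines. A simplified Python sketch, assuming forward-strand counts in a hypothetical `{position: (c_reads, t_reads)}` layout; `count_all=False` skips CpG cytosines, mirroring ‘Count all cytosines including CpGs’. The A-rich and maximum-mismatch options are not modelled, and this is not MethPipe's bsrate code:

```python
from typing import Dict, Tuple


def conversion_rate(ref: str, counts: Dict[int, Tuple[int, int]], count_all: bool = True) -> float:
    """Estimate the bisulfite conversion rate as T / (C + T) over reference C positions.

    counts maps a 0-based position to (c_reads, t_reads); non-C positions are ignored.
    With count_all=False, a C followed by G (a CpG) is skipped.
    """
    converted = total = 0
    for pos, (c_reads, t_reads) in counts.items():
        if ref[pos].upper() != "C":
            continue
        if not count_all and pos + 1 < len(ref) and ref[pos + 1].upper() == "G":
            continue
        converted += t_reads
        total += c_reads + t_reads
    return converted / total if total else float("nan")
```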

methcounts.cpgs

label:: Only CpG context sites
type:: basic:boolean
description:: Output file will contain methylation data for CpG context sites only. Choosing this option will result in CpG content report only.
disabled:: methcounts.symmetric_cpgs
default:: False

methcounts.symmetric_cpgs

label:: Merge CpG pairs
type:: basic:boolean
description:: Merging CpG pairs results in symmetric methylation levels. Methylation is usually symmetric (cytosines at CpG sites are methylated on both DNA strands). Choosing this option keeps only the CpG site data.
disabled:: methcounts.cpgs
default:: True
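
On the reverse strand, the cytosine of a CpG sits one position to the right of the forward-strand C, so merging a pair means pooling the read counts of both cytosines. A minimal Python sketch of that pooling under a hypothetical `{position: (methylated, total)}` layout; it is not MethPipe's implementation:

```python
from typing import Dict, Tuple


def merge_cpg_pairs(ref: str, plus: Dict[int, Tuple[int, int]],
                    minus: Dict[int, Tuple[int, int]]) -> Dict[int, float]:
    """Merge strand-specific CpG counts into one symmetric level per CpG.

    plus[i] and minus[i + 1] hold (methylated, total) read counts for the forward C
    at i and the reverse-strand C at i + 1. Uncovered CpGs are omitted.
    """
    merged: Dict[int, float] = {}
    for i in range(len(ref) - 1):
        if ref[i].upper() == "C" and ref[i + 1].upper() == "G":
            m1, t1 = plus.get(i, (0, 0))
            m2, t2 = minus.get(i + 1, (0, 0))
            total = t1 + t2
            if total:
                merged[i] = (m1 + m2) / total
    return merged


print(merge_cpg_pairs("ACGA", {1: (3, 4)}, {2: (1, 4)}))  # {1: 0.5}
```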

summary.adapters

label:: Adapter sequences
type:: data:seq:nucleotide
required:: False

summary.insert_size

label:: Maximum insert size
type:: basic:integer
default:: 100000

summary.pair_orientation

label:

Pair orientation

type:

basic:string

default:

null

choices:

Unspecified: null
FR: FR
RF: RF
TANDEM: TANDEM

wgs_metrics.read_length

label:: Average read length
type:: basic:integer
default:: 150

wgs_metrics.min_map_quality

label:: Minimum mapping quality for a read to contribute coverage
type:: basic:integer
default:: 20

wgs_metrics.min_quality

label:: Minimum base quality for a base to contribute coverage
type:: basic:integer
description:: N bases will be treated as having a base quality of negative infinity and will therefore be excluded from coverage regardless of the value of this parameter.
default:: 20

wgs_metrics.coverage_cap

label:: Maximum coverage cap
type:: basic:integer
description:: Treat positions with coverage exceeding this value as if they had coverage at this set value.
default:: 250
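
The coverage cap clips rather than discards: a position covered 300 times is counted as if it were covered 250 times. A small Python illustration of that clipping on hypothetical per-position depths (the separate accumulation cap is not modelled):

```python
from collections import Counter
from typing import Dict, Iterable


def coverage_histogram(depths: Iterable[int], coverage_cap: int = 250) -> Dict[int, int]:
    """Histogram of per-position depth, with depths above the cap counted at the cap."""
    return dict(Counter(min(depth, coverage_cap) for depth in depths))


print(coverage_histogram([10, 300, 250, 10]))  # {10: 2, 250: 2}
```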

wgs_metrics.accumulation_cap

label:: Ignore positions with coverage above this value
type:: basic:integer
description:: At positions with coverage exceeding this value, completely ignore reads that accumulate beyond this value.
default:: 100000

wgs_metrics.sample_size

label:: Sample Size used for Theoretical Het Sensitivity sampling
type:: basic:integer
default:: 10000

rrbs_metrics.min_quality

label:: Threshold for base quality of a C base before it is considered
type:: basic:integer
default:: 20

rrbs_metrics.next_base_quality

label:: Threshold for quality of a base next to a C before the C base is considered
type:: basic:integer
default:: 10

rrbs_metrics.min_lenght

label:: Minimum read length
type:: basic:integer
default:: 5

rrbs_metrics.mismatch_rate

label:: Maximum fraction of mismatches in a read to be considered (Between 0 and 1)
type:: basic:decimal
default:: 0.1

insert.minimum_fraction

label:: Minimum fraction of reads in a category to be considered
type:: basic:decimal
description:: When generating the histogram, discard any data categories (out of FR, TANDEM, RF) that have fewer than this fraction of overall reads (range: 0 to 0.5).
default:: 0.05
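
The minimum fraction acts as a per-category filter on read-pair orientation counts. A Python sketch with hypothetical counts, where a category is kept when its share is at least the threshold:

```python
from typing import Dict


def kept_orientations(pair_counts: Dict[str, int], minimum_fraction: float = 0.05) -> Dict[str, int]:
    """Drop pair-orientation categories holding less than minimum_fraction of all pairs."""
    if not 0 <= minimum_fraction <= 0.5:
        raise ValueError("minimum_fraction must be between 0 and 0.5")
    total = sum(pair_counts.values())
    return {k: n for k, n in pair_counts.items() if total and n / total >= minimum_fraction}


print(kept_orientations({"FR": 950, "RF": 40, "TANDEM": 10}))  # {'FR': 950}
```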

insert.include_duplicates

label:: Include reads marked as duplicates in the insert size histogram
type:: basic:boolean
default:: False

insert.deviations

label:: Deviations limit
type:: basic:decimal
description:: Generate mean, standard deviation and plots by trimming the data down to MEDIAN + DEVIATIONS*MEDIAN_ABSOLUTE_DEVIATION. This is done because insert size data typically includes enough anomalous values from chimeras and other artifacts to make the mean and standard deviation grossly misleading regarding the real distribution.
default:: 10.0
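
The trimming rule in the description can be written out directly: compute the median and the median absolute deviation (MAD), drop sizes above MEDIAN + DEVIATIONS*MAD, and summarize the rest. A Python sketch on hypothetical insert sizes; it uses the population standard deviation for simplicity, which may differ from Picard's exact computation:

```python
import statistics
from typing import List, Tuple


def trimmed_insert_stats(sizes: List[int], deviations: float = 10.0) -> Tuple[float, float]:
    """Mean and standard deviation after dropping sizes above MEDIAN + deviations * MAD."""
    med = statistics.median(sizes)
    mad = statistics.median(abs(s - med) for s in sizes)
    limit = med + deviations * mad
    kept = [s for s in sizes if s <= limit]
    return float(statistics.mean(kept)), float(statistics.pstdev(kept))


# The chimeric 5000 bp outlier lies far above 202.5 + 10 * 7.5 and is dropped.
print(trimmed_insert_stats([200, 210, 190, 205, 195, 5000]))
```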

WGBS (single-end)

data:workflow:wgbs:workflow-wgbs-single (data:reads:fastq:single reads, data:index:walt walt_index, data:seq:nucleotide ref_seq, basic:string validation_stringency, data:seq:nucleotide adapters, basic:integer seed_mismatches, basic:integer simple_clip_threshold, basic:integer leading, basic:integer trailing, basic:integer crop, basic:integer headcrop, basic:integer minlen, basic:boolean rm_dup, basic:integer optical_distance, basic:integer mismatch, basic:integer number, basic:string spikein_name, basic:boolean filter_spikein, basic:boolean skip, data:seq:nucleotide sequence, basic:boolean count_all, basic:integer read_length, basic:decimal max_mismatch, basic:boolean a_rich, basic:boolean cpgs, basic:boolean symmetric_cpgs, data:seq:nucleotide adapters, basic:integer insert_size, basic:string pair_orientation, basic:integer read_length, basic:integer min_map_quality, basic:integer min_quality, basic:integer coverage_cap, basic:integer accumulation_cap, basic:integer sample_size, basic:integer min_quality, basic:integer next_base_quality, basic:integer min_lenght, basic:decimal mismatch_rate)[Source: v2.2.0]

This WGBS pipeline comprises trimming, alignment, computation of methylation levels, identification of hypo-methylated regions (HMRs) and additional QC steps. First, reads are trimmed to remove adapters or kit-specific artifacts. Reads are then aligned with the __WALT__ aligner. [WALT (Wildcard ALignment Tool)](https://github.com/smithlabcode/walt) is a fast and accurate read mapper for bisulfite sequencing. Next, the methylation level at each genomic cytosine is calculated using __methcounts__. Finally, hypo-methylated regions are identified using __hmr__. Both methcounts and hmr are part of the [MethPipe](http://smithlabresearch.org/software/methpipe/) package. QC steps are based on [Picard](http://broadinstitute.github.io/picard/) and include high-level alignment metrics, WGS performance metrics and summary statistics from bisulfite sequencing. Final QC reports are summarized by MultiQC.

reads

label:: Select sample(s)
type:: data:reads:fastq:single

walt_index

label:: Walt index
type:: data:index:walt

ref_seq

label:: Reference sequence
type:: data:seq:nucleotide

validation_stringency

label:

Validation stringency

type:

basic:string

description:

Validation stringency for all SAM files read by this program. Setting stringency to SILENT can improve performance when processing a BAM file in which variable-length data (read, qualities, tags) do not otherwise need to be decoded. Default is STRICT.

default:

STRICT

choices:

STRICT: STRICT
LENIENT: LENIENT
SILENT: SILENT

adapter_trimming.adapters

label:: Adapter sequences
type:: data:seq:nucleotide
description:: Adapter sequence in FASTA format that will be removed from the read. This field, together with the ‘Seed mismatches’ and ‘Simple clip threshold’ parameters, is required to perform adapter trimming.
required:: False

adapter_trimming.seed_mismatches

label:: Seed mismatches
type:: basic:integer
description:: Specifies the maximum mismatch count which will still allow a full match to be performed. This field is required to perform adapter trimming.
required:: False
disabled:: !adapter_trimming.adapters

adapter_trimming.simple_clip_threshold

label:: Simple clip threshold
type:: basic:integer
description:: Specifies how accurate the match between any adapter etc. sequence must be against a read. This field is required to perform adapter trimming.
required:: False
disabled:: !adapter_trimming.adapters

trimming_filtering.leading

label:: Leading quality
type:: basic:integer
description:: Remove low quality bases from the beginning, if below a threshold quality.
required:: False

trimming_filtering.trailing

label:: Trailing quality
type:: basic:integer
description:: Remove low quality bases from the end, if below a threshold quality.
required:: False

trimming_filtering.crop

label:: Crop
type:: basic:integer
description:: Cut the read to a specified length by removing bases from the end.
required:: False

trimming_filtering.headcrop

label:: Headcrop
type:: basic:integer
description:: Cut the specified number of bases from the start of the read.
required:: False

trimming_filtering.minlen

label:: Minimum length
type:: basic:integer
description:: Drop the read if it is below a specified length.
required:: False

alignment.rm_dup

label:: Remove duplicates
type:: basic:boolean
default:: True

alignment.optical_distance

label:: Optical duplicate distance
type:: basic:integer
description:: The maximum offset between two duplicate clusters for them to be considered optical duplicates. Suggested values are 100 for HiSeq-style platforms and about 2500 for NovaSeq. The default of 0 disables the search for optical duplicates.
disabled:: !alignment.rm_dup
default:: 0

alignment.mismatch

label:: Maximum allowed mismatches
type:: basic:integer
default:: 6

alignment.number

label:: Number of reads to map in one loop
type:: basic:integer
description:: Sets the number of reads to map in each loop. A larger number makes the program use more memory, which is especially evident in paired-end mapping.
required:: False

alignment.spikein_name

label:: Chromosome name of unmethylated control sequence
type:: basic:string
description:: Specifies the name of the unmethylated control sequence, which is output as a separate alignment file. It is recommended to remove duplicates to reduce any bias introduced by incomplete conversion on PCR duplicate reads.
required:: False

alignment.filter_spikein

label:: Remove control/spike-in sequences.
type:: basic:boolean
description:: Remove unmethylated control reads from the final alignment based on the provided name. It is recommended to remove any reads that are not naturally occurring in the sample (e.g. lambda virus spike-in).
disabled:: !alignment.spikein_name
default:: False

bsrate.skip

label:: Skip Bisulfite conversion rate step
type:: basic:boolean
description:: The bisulfite conversion rate step can be skipped. If a separate alignment file for the unmethylated control sequence is not produced during alignment, this process will fail.
disabled:: !alignment.spikein_name
default:: True

bsrate.sequence

label:: Unmethylated control sequence
type:: data:seq:nucleotide
required:: False
disabled:: bsrate.skip

bsrate.count_all

label:: Count all cytosines including CpGs
type:: basic:boolean
disabled:: bsrate.skip
default:: True

bsrate.read_length

label:: Average read length
type:: basic:integer
default:: 150

bsrate.max_mismatch

label:: Maximum fraction of mismatches
type:: basic:decimal
required:: False
disabled:: bsrate.skip

bsrate.a_rich

label:: Reads are A-rich
type:: basic:boolean
disabled:: bsrate.skip
default:: False

methcounts.cpgs

label:: Only CpG context sites
type:: basic:boolean
description:: Output file will contain methylation data for CpG context sites only. Choosing this option will result in CpG content report only.
disabled:: methcounts.symmetric_cpgs
default:: False

methcounts.symmetric_cpgs

label:: Merge CpG pairs
type:: basic:boolean
description:: Merging CpG pairs results in symmetric methylation levels. Methylation is usually symmetric (cytosines at CpG sites are methylated on both DNA strands). Choosing this option keeps only the CpG site data.
disabled:: methcounts.cpgs
default:: True

summary.adapters

label:: Adapter sequences
type:: data:seq:nucleotide
required:: False

summary.insert_size

label:: Maximum insert size
type:: basic:integer
default:: 100000

summary.pair_orientation

label:

Pair orientation

type:

basic:string

default:

null

choices:

Unspecified: null
FR: FR
RF: RF
TANDEM: TANDEM

wgs_metrics.read_length

label:: Average read length
type:: basic:integer
default:: 150

wgs_metrics.min_map_quality

label:: Minimum mapping quality for a read to contribute coverage
type:: basic:integer
default:: 20

wgs_metrics.min_quality

label:: Minimum base quality for a base to contribute coverage
type:: basic:integer
description:: N bases will be treated as having a base quality of negative infinity and will therefore be excluded from coverage regardless of the value of this parameter.
default:: 20

wgs_metrics.coverage_cap

label:: Maximum coverage cap
type:: basic:integer
description:: Treat positions with coverage exceeding this value as if they had coverage at this set value.
default:: 250

wgs_metrics.accumulation_cap

label:: Ignore positions with coverage above this value
type:: basic:integer
description:: At positions with coverage exceeding this value, completely ignore reads that accumulate beyond this value.
default:: 100000

wgs_metrics.sample_size

label:: Sample Size used for Theoretical Het Sensitivity sampling
type:: basic:integer
default:: 10000

rrbs_metrics.min_quality

label:: Threshold for base quality of a C base before it is considered
type:: basic:integer
default:: 20

rrbs_metrics.next_base_quality

label:: Threshold for quality of a base next to a C before the C base is considered
type:: basic:integer
default:: 10

rrbs_metrics.min_lenght

label:: Minimum read length
type:: basic:integer
default:: 5

rrbs_metrics.mismatch_rate

label:: Maximum fraction of mismatches in a read to be considered (Between 0 and 1)
type:: basic:decimal
default:: 0.1

WGS (paired-end) analysis

data:workflow:wgs:workflow-wgs-paired (data:reads:fastq:paired reads, data:index:bwa bwa_index, data:seq:nucleotide ref_seq, list:data:variants:vcf known_sites, data:variants:vcf hc_dbsnp, basic:string validation_stringency, data:seq:nucleotide adapters, basic:integer seed_mismatches, basic:integer simple_clip_threshold, basic:integer min_adapter_length, basic:integer palindrome_clip_threshold, basic:integer leading, basic:integer trailing, basic:integer minlen, basic:integer seed_l, basic:integer band_w, basic:decimal re_seeding, basic:boolean m, basic:integer match, basic:integer mismatch, basic:integer gap_o, basic:integer gap_e, basic:integer clipping, basic:integer unpaired_p, basic:integer report_tr, basic:boolean skip, basic:boolean remove_duplicates, basic:string assume_sort_order, basic:string read_group, data:seq:nucleotide adapters, basic:integer max_insert_size, basic:string pair_orientation, basic:integer read_length, basic:integer min_map_quality, basic:integer min_quality, basic:integer coverage_cap, basic:integer accumulation_cap, basic:integer sample_size, basic:decimal minimum_fraction, basic:boolean include_duplicates, basic:decimal deviations, basic:integer stand_call_conf, basic:integer mbq)[Source: v2.1.0]

This whole genome sequencing pipeline analyzes paired-end whole genome sequencing data. It consists of read trimming, alignment, duplicate marking, Picard metrics, base quality score recalibration and, finally, variant calling. Trimming is performed with Trimmomatic and alignment with BWA (mem). Duplicate marking (MarkDuplicates), Picard metrics (AlignmentSummaryMetrics, CollectWgsMetrics and InsertSizeMetrics), base quality score recalibration (ApplyBQSR) and variant calling (HaplotypeCaller) are done using the GATK4 bundle of bioinformatics tools. The result is a file of called variants (VCF).

reads

label:: Raw untrimmed reads
type:: data:reads:fastq:paired
description:: Raw paired-end reads.

bwa_index

label:: Genome index (BWA)
type:: data:index:bwa
description:: BWA genome index.

ref_seq

label:: Reference genome sequence
type:: data:seq:nucleotide

known_sites

label:: Known sites of variation used in BQSR
type:: list:data:variants:vcf
description:: Known sites of variation as a VCF file.

hc_dbsnp

label:: dbSNP for GATK4’s HaplotypeCaller
type:: data:variants:vcf
description:: dbSNP database of variants for variant calling.

validation_stringency

label:: Validation stringency
type:: basic:string
description:: Validation stringency for all BAM files read by this program. Setting stringency to SILENT can improve performance when processing a BAM file in which variable-length data (read, qualities, tags) do not otherwise need to be decoded. Default is STRICT.
default:: STRICT
choices::
STRICT: STRICT
LENIENT: LENIENT
SILENT: SILENT

advanced.trimming.adapters

label:: Adapter sequences
type:: data:seq:nucleotide
description:: Adapter sequences in FASTA format that will be removed from the reads. This field, together with the ‘Seed mismatches’, ‘Simple clip threshold’ and ‘Palindrome clip threshold’ parameters, is required to perform adapter trimming. ‘Minimum adapter length’ and ‘Keep both reads’ are optional parameters.
required:: False

advanced.trimming.seed_mismatches

label:: Seed mismatches
type:: basic:integer
description:: Specifies the maximum mismatch count which will still allow a full match to be performed. This field is required to perform adapter trimming.
required:: False
disabled:: !advanced.trimming.adapters

advanced.trimming.simple_clip_threshold

label:: Simple clip threshold
type:: basic:integer
description:: Specifies how accurate the match between any adapter sequence and a read must be. This field is required to perform adapter trimming.
required:: False
disabled:: !advanced.trimming.adapters

advanced.trimming.min_adapter_length

label:: Minimum adapter length
type:: basic:integer
description:: In addition to the alignment score, palindrome mode can verify that a minimum length of adapter has been detected. If unspecified, this defaults to 8 bases, for historical reasons. However, since palindrome mode has a very low false positive rate, this can be safely reduced, even down to 1, to allow shorter adapter fragments to be removed.
disabled:: !advanced.trimming.seed_mismatches && !advanced.trimming.simple_clip_threshold && !advanced.trimming.palindrome_clip_threshold
default:: 8

advanced.trimming.palindrome_clip_threshold

label:: Palindrome clip threshold
type:: basic:integer
description:: Specifies how accurate the match between the two ‘adapter ligated’ reads must be for PE palindrome read alignment. This field is required to perform adapter trimming.
required:: False
disabled:: !advanced.trimming.adapters

advanced.trimming.leading

label:: Leading quality
type:: basic:integer
description:: Remove low quality bases from the beginning, if below a threshold quality.
required:: False

advanced.trimming.trailing

label:: Trailing quality
type:: basic:integer
description:: Remove low quality bases from the end, if below a threshold quality.
required:: False

advanced.trimming.minlen

label:: Minimum length
type:: basic:integer
description:: Drop the read if it is below a specified length.
required:: False
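
As an illustration of how the trimming parameters above fit together, the sketch below assembles Trimmomatic step arguments from them. It assumes the values are passed straight through to Trimmomatic's ILLUMINACLIP, LEADING, TRAILING and MINLEN steps, and that adapter clipping is only added when the adapters and all three required clipping parameters are set. The helper name trimmomatic_steps is hypothetical; the command the process actually builds (step order, keepBothReads handling) may differ.

```python
from typing import List, Optional


def trimmomatic_steps(
    adapters: Optional[str] = None,
    seed_mismatches: Optional[int] = None,
    palindrome_clip_threshold: Optional[int] = None,
    simple_clip_threshold: Optional[int] = None,
    min_adapter_length: int = 8,
    leading: Optional[int] = None,
    trailing: Optional[int] = None,
    minlen: Optional[int] = None,
) -> List[str]:
    """Build Trimmomatic step arguments (hypothetical helper, for illustration)."""
    steps: List[str] = []
    clip_fields = (adapters, seed_mismatches, palindrome_clip_threshold, simple_clip_threshold)
    # Adapter clipping needs the FASTA file and all three clipping thresholds.
    if all(field is not None for field in clip_fields):
        # Trimmomatic order: fasta:seedMismatches:palindromeClip:simpleClip:minAdapterLength
        steps.append(
            f"ILLUMINACLIP:{adapters}:{seed_mismatches}:"
            f"{palindrome_clip_threshold}:{simple_clip_threshold}:{min_adapter_length}"
        )
    if leading is not None:
        steps.append(f"LEADING:{leading}")  # trim low-quality bases from the start
    if trailing is not None:
        steps.append(f"TRAILING:{trailing}")  # trim low-quality bases from the end
    if minlen is not None:
        steps.append(f"MINLEN:{minlen}")  # drop reads shorter than minlen
    return steps
```

For example, trimmomatic_steps("adapters.fa", 2, 30, 10, leading=3, minlen=36) yields ILLUMINACLIP:adapters.fa:2:30:10:8, LEADING:3 and MINLEN:36, while supplying only the adapters and seed mismatches adds no clipping step at all.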

advanced.align.seed_l

label:: Minimum seed length
type:: basic:integer
description:: Minimum seed length. Matches shorter than minimum seed length will be missed. The alignment speed is usually insensitive to this value unless it significantly deviates from 20.
default:: 19

advanced.align.band_w

label:: Band width
type:: basic:integer
description:: Gaps longer than this will not be found.
default:: 100

advanced.align.re_seeding

label:: Re-seeding factor
type:: basic:decimal
description:: Trigger re-seeding for a MEM longer than minSeedLen*FACTOR. This is a key heuristic parameter for tuning the performance. Larger value yields fewer seeds, which leads to faster alignment speed but lower accuracy.
default:: 1.5

advanced.align.m

label:: Mark shorter split hits as secondary
type:: basic:boolean
description:: Mark shorter split hits as secondary (for Picard compatibility)
default:: False

advanced.align.scoring.match

label:: Score of a match
type:: basic:integer
default:: 1

advanced.align.scoring.mismatch

label:: Mismatch penalty
type:: basic:integer
default:: 4

advanced.align.scoring.gap_o

label:: Gap open penalty
type:: basic:integer
default:: 6

advanced.align.scoring.gap_e

label:: Gap extension penalty
type:: basic:integer
default:: 1

advanced.align.scoring.clipping

label:: Clipping penalty
type:: basic:integer
description:: BWA-MEM keeps track of the best score reaching the end of the query. Clipping is not applied if this score is larger than the best local alignment score minus the clipping penalty.
default:: 5

advanced.align.scoring.unpaired_p

label:: Penalty for an unpaired read pair
type:: basic:integer
description:: Affinity to force pairing. An unpaired read pair is scored as scoreRead1 + scoreRead2 - penalty.
default:: 9
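
The clipping and pairing penalties are easiest to read as score comparisons. The sketch below restates the rules from the BWA-MEM manual in Python. The function names are hypothetical, and the insert-size penalty, which BWA-MEM derives internally from the observed insert-size distribution, is passed in as a plain number for illustration; treating a tie as "do not force pairing" is also an assumption.

```python
def clipping_applied(end_to_end_score: int, best_local_score: int, clipping_penalty: int = 5) -> bool:
    """Return True if the clipped (local) alignment is kept.

    Clipping is skipped when the best score reaching the end of the query
    is larger than the best local score minus the clipping penalty.
    """
    return not end_to_end_score > best_local_score - clipping_penalty


def force_pairing(
    pair_score1: int,
    pair_score2: int,
    insert_penalty: int,
    best_score1: int,
    best_score2: int,
    unpaired_penalty: int = 9,
) -> bool:
    """Return True if the best paired placement beats treating the mates as unpaired."""
    paired = pair_score1 + pair_score2 - insert_penalty
    unpaired = best_score1 + best_score2 - unpaired_penalty
    return paired > unpaired
```

With the default clipping penalty of 5 and a best local score of 50, an end-to-end score of 48 keeps the full alignment, while 40 leads to clipping.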

advanced.align.report_tr

label:: Report threshold score
type:: basic:integer
description:: Do not output alignments with a score lower than this value. This option only affects output.
default:: 30
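
The alignment parameters above correspond to standard bwa mem options. A minimal sketch of that mapping, assuming the values are forwarded unchanged to bwa mem; bwa_mem_args is a hypothetical helper, not part of the process.

```python
from typing import List


def bwa_mem_args(
    seed_l: int = 19,
    band_w: int = 100,
    re_seeding: float = 1.5,
    m: bool = False,
    match: int = 1,
    mismatch: int = 4,
    gap_o: int = 6,
    gap_e: int = 1,
    clipping: int = 5,
    unpaired_p: int = 9,
    report_tr: int = 30,
) -> List[str]:
    """Translate the alignment parameters into bwa mem command-line options."""
    args = [
        "-k", str(seed_l),      # minimum seed length
        "-w", str(band_w),      # band width
        "-r", str(re_seeding),  # re-seed MEMs longer than seed_l * re_seeding
        "-A", str(match),       # score of a match
        "-B", str(mismatch),    # mismatch penalty
        "-O", str(gap_o),       # gap open penalty
        "-E", str(gap_e),       # gap extension penalty
        "-L", str(clipping),    # clipping penalty
        "-U", str(unpaired_p),  # penalty for an unpaired read pair
        "-T", str(report_tr),   # minimum score of alignments to output
    ]
    if m:
        args.append("-M")  # mark shorter split hits as secondary
    return args
```

With the defaults, re-seeding is triggered for MEMs longer than 19 * 1.5 = 28.5 bases; raising re_seeding therefore produces fewer seeds and faster but less accurate alignment.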

advanced.markduplicates.skip

label:: Skip GATK’s MarkDuplicates step
type:: basic:boolean
default:: False

advanced.markduplicates.remove_duplicates

label:: Remove found duplicates
type:: basic:boolean
default:: False

advanced.markduplicates.assume_sort_order

label:: Assume sort order
type:: basic:string
default:: (as in BAM header)
choices::
as in BAM header (default):
unsorted: unsorted
queryname: queryname
coordinate: coordinate
duplicate: duplicate
unknown: unknown

advanced.bqsr.read_group

label:: Read group (@RG)
type:: basic:string
description:: This argument enables the user to replace all read groups in the INPUT file with a single new read group and assign all reads to this read group in the OUTPUT BAM file. Addition or replacement is performed using Picard’s AddOrReplaceReadGroups tool. Input should take the form of -name=value delimited by a \t, e.g. “-ID=1\t-PL=Illumina\t-SM=sample_1”. See AddOrReplaceReadGroups documentation for more information on tag names. Note that PL, LB, PU and SM are required fields.
default:: -LB=NA;-PL=NA;-PU=NA;-SM=sample
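
The description asks for -name=value entries separated by \t, while the default value uses ;. The sketch below, a hypothetical parse_read_group helper, accepts a tab, a literal \t or a semicolon as separator (an assumption about how the process splits the string) and checks the required PL, LB, PU and SM fields before the values would be handed to AddOrReplaceReadGroups.

```python
import re
from typing import Dict

# Fields the description lists as required.
REQUIRED_TAGS = ("PL", "LB", "PU", "SM")


def parse_read_group(value: str) -> Dict[str, str]:
    """Parse '-NAME=value' entries into a dict of read group tags."""
    tags: Dict[str, str] = {}
    # Split on a real tab, a literal backslash-t, or a semicolon.
    for entry in re.split(r"\t|\\t|;", value):
        entry = entry.strip()
        if not entry:
            continue
        if not entry.startswith("-") or "=" not in entry:
            raise ValueError(f"Malformed read group entry: {entry!r}")
        name, tag_value = entry[1:].split("=", 1)
        tags[name] = tag_value
    missing = [tag for tag in REQUIRED_TAGS if tag not in tags]
    if missing:
        raise ValueError(f"Missing required read group fields: {', '.join(missing)}")
    return tags
```

Parsing the default value returns LB, PL and PU set to NA and SM set to sample; an entry list without LB or PU is rejected.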

advanced.summary.adapters

label:: Adapter sequences
type:: data:seq:nucleotide
required:: False

advanced.summary.max_insert_size

label:: Maximum insert size
type:: basic:integer
default:: 100000

advanced.summary.pair_orientation

label:: Pair orientation
type:: basic:string
default:: null
choices::
Unspecified: null
FR: FR
RF: RF
TANDEM: TANDEM

advanced.wgs_metrics.read_length

label:: Average read length
type:: basic:integer
default:: 150

advanced.wgs_metrics.min_map_quality

label:: Minimum mapping quality for a read to contribute coverage
type:: basic:integer
default:: 20

advanced.wgs_metrics.min_quality

label:: Minimum base quality for a base to contribute coverage
type:: basic:integer
description:: N bases will be treated as having a base quality of negative infinity and will therefore be excluded from coverage regardless of the value of this parameter.
default:: 20

advanced.wgs_metrics.coverage_cap

label:: Maximum coverage cap
type:: basic:integer
description:: Treat positions with coverage exceeding this value as if they had coverage at this set value.
default:: 250

advanced.wgs_metrics.accumulation_cap

label:: Ignore positions with coverage above this value
type:: basic:integer
description:: At positions with coverage exceeding this value, completely ignore reads that accumulate beyond this value.
default:: 100000
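
The coverage settings above can be pictured as filters applied at a single genomic position. The sketch below is a simplified model of the behaviour these fields describe, not Picard's CollectWgsMetrics implementation. The helper name and the (base, base quality, mapping quality) input layout are hypothetical, and stopping after accumulation_cap reads is an assumed reading of "ignore reads that accumulate beyond this value".

```python
from typing import Iterable, Tuple


def counted_coverage(
    bases: Iterable[Tuple[str, int, int]],
    min_map_quality: int = 20,
    min_quality: int = 20,
    coverage_cap: int = 250,
    accumulation_cap: int = 100000,
) -> int:
    """Return the capped coverage at one position.

    Each item in bases is (base, base_quality, mapping_quality) for one read.
    """
    depth = 0
    for seen, (base, base_quality, mapping_quality) in enumerate(bases):
        if seen >= accumulation_cap:
            break  # reads beyond the accumulation cap are ignored entirely
        if base == "N":
            continue  # N bases never contribute coverage
        if mapping_quality < min_map_quality or base_quality < min_quality:
            continue  # read or base quality too low
        depth += 1
    # Coverage above the cap is reported as the cap itself.
    return min(depth, coverage_cap)
```

For example, 300 high-quality reads at one position are reported as coverage 250 with the default cap.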

advanced.wgs_metrics.sample_size

label:: Sample Size used for Theoretical Het Sensitivity sampling
type:: basic:integer
default:: 10000

advanced.insert_size.minimum_fraction

label:: Minimum fraction of reads in a category to be considered
type:: basic:decimal
description:: When generating the histogram, discard any data categories (out of FR, TANDEM, RF) that have fewer than this fraction of overall reads (range: 0 to 0.5).
default:: 0.05

advanced.insert_size.include_duplicates

label:: Include reads marked as duplicates in the insert size histogram
type:: basic:boolean
default:: False

advanced.insert_size.deviations

label:: Deviations limit
type:: basic:decimal
description:: Generate mean, standard deviation and plots by trimming the data down to MEDIAN + DEVIATIONS * MEDIAN_ABSOLUTE_DEVIATION. This is done because insert size data typically includes enough anomalous values from chimeras and other artifacts to make the mean and standard deviation grossly misleading regarding the real distribution.
default:: 10.0
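
The minimum fraction and deviations settings act as two filters on the insert-size data. A simplified sketch of both, following the descriptions above. The helper names are hypothetical, and Picard's own implementation (including which standard deviation estimator it reports) may differ; the population standard deviation used here is an assumption.

```python
import statistics
from typing import Dict, List, Tuple


def keep_categories(counts: Dict[str, int], minimum_fraction: float = 0.05) -> List[str]:
    """Keep pair-orientation categories (FR, RF, TANDEM) with enough reads."""
    total = sum(counts.values())
    return [name for name, n in counts.items() if total and n / total >= minimum_fraction]


def trimmed_mean_sd(sizes: List[int], deviations: float = 10.0) -> Tuple[float, float]:
    """Mean and SD after dropping sizes above MEDIAN + DEVIATIONS * MAD."""
    median = statistics.median(sizes)
    mad = statistics.median(abs(size - median) for size in sizes)
    limit = median + deviations * mad
    kept = [size for size in sizes if size <= limit]
    return statistics.mean(kept), statistics.pstdev(kept)
```

For insert sizes 290, 295, 300, 305, 310 and one chimeric 5000, the median is 302.5 and the MAD is 7.5, so the limit is 377.5: the outlier is dropped and the mean is 300 instead of being pulled above 1000.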

advanced.hc.stand_call_conf

label:: Min call confidence threshold
type:: basic:integer
description:: The minimum phred-scaled confidence threshold at which variants should be called.
default:: 20

advanced.hc.mbq

label:: Min Base Quality
type:: basic:integer
description:: Minimum base quality required to consider a base for calling.
default:: 20

WGS analysis (GVCF)

data:workflow:wgs:gvcf: workflow-wgs-gvcf (data:reads:fastq:paired reads, data:alignment:bam aligned_reads, data:seq:nucleotide ref_seq, data:index:bwamem2 bwa_index, list:data:variants:vcf known_sites, basic:boolean enable_trimming, data:seq:nucleotide adapters, basic:integer seed_mismatches, basic:integer simple_clip_threshold, basic:integer min_adapter_length, basic:integer palindrome_clip_threshold, basic:integer leading, basic:integer trailing, basic:integer minlen, data:bed intervals, basic:integer contamination, data:seq:nucleotide adapters, basic:integer max_insert_size, basic:string pair_orientation, basic:integer read_length, basic:integer min_map_quality, basic:integer min_quality, basic:integer coverage_cap, basic:integer accumulation_cap, basic:integer sample_size, basic:decimal minimum_fraction, basic:boolean include_duplicates, basic:decimal deviations)[Source: v2.3.0]

Whole genome sequencing pipeline (GATK GVCF). The pipeline follows GATK best practices recommendations and prepares single-sample paired-end sequencing data for a joint-genotyping step. The pipeline steps include read trimming (Trimmomatic), read alignment (BWA-MEM2), marking of duplicates (Picard MarkDuplicates), recalibration of base quality scores (ApplyBQSR) and calling of variants (GATK HaplotypeCaller in GVCF mode). The QC reports (FastQC report, Picard AlignmentSummaryMetrics, CollectWgsMetrics and InsertSizeMetrics) are summarized using MultiQC.

reads

label:: Input sample (FASTQ)
type:: data:reads:fastq:paired
description:: Input data in FASTQ format. This input type allows for optional read trimming procedure and is mutually exclusive with the BAM input file type.
required:: False
disabled:: aligned_reads
hidden:: False

aligned_reads

label:: Input sample (BAM)
type:: data:alignment:bam
description:: Input data in BAM format. This input file type is mutually exclusive with the FASTQ input file type and does not allow for read trimming procedure.
required:: False
disabled:: reads
hidden:: False

ref_seq

label:: Reference sequence
type:: data:seq:nucleotide
required:: True
disabled:: False
hidden:: False

bwa_index

label:: BWA genome index
type:: data:index:bwamem2
required:: True
disabled:: False
hidden:: False

known_sites

label:: Known sites of variation (VCF)
type:: list:data:variants:vcf
required:: True
disabled:: False
hidden:: False

trimming_options.enable_trimming

label:: Trim and quality filter input data
type:: basic:boolean
description:: Enable or disable adapter trimming and QC filtering procedure.
required:: True
disabled:: False
hidden:: False
default:: False

trimming_options.adapters

label:: Adapter sequences
type:: data:seq:nucleotide
description:: Adapter sequences in FASTA format that will be removed from the reads.
required:: False
disabled:: !trimming_options.enable_trimming
hidden:: False

trimming_options.seed_mismatches

label:: Seed mismatches
type:: basic:integer
description:: Specifies the maximum mismatch count which will still allow a full match to be performed. This field is required to perform adapter trimming.
required:: False
disabled:: !trimming_options.adapters
hidden:: False

trimming_options.simple_clip_threshold

label:: Simple clip threshold
type:: basic:integer
description:: Specifies how accurate the match between any adapter sequence and a read must be. This field is required to perform adapter trimming.
required:: False
disabled:: !trimming_options.adapters
hidden:: False

trimming_options.min_adapter_length

label:: Minimum adapter length
type:: basic:integer
description:: In addition to the alignment score, palindrome mode can verify that a minimum length of adapter has been detected. If unspecified, this defaults to 8 bases, for historical reasons. However, since palindrome mode has a very low false positive rate, this can be safely reduced, even down to 1, to allow shorter adapter fragments to be removed.
required:: True
disabled:: !trimming_options.seed_mismatches && !trimming_options.simple_clip_threshold && !trimming_options.palindrome_clip_threshold
hidden:: False
default:: 8

trimming_options.palindrome_clip_threshold

label:: Palindrome clip threshold
type:: basic:integer
description:: Specifies how accurate the match between the two adapter ligated reads must be for PE palindrome read alignment. This field is required to perform adapter trimming.
required:: False
disabled:: !trimming_options.adapters
hidden:: False

trimming_options.leading

label:: Leading quality
type:: basic:integer
description:: Remove low quality bases from the beginning, if below a threshold quality.
required:: False
disabled:: !trimming_options.enable_trimming
hidden:: False

trimming_options.trailing

label:: Trailing quality
type:: basic:integer
description:: Remove low quality bases from the end, if below a threshold quality.
required:: False
disabled:: !trimming_options.enable_trimming
hidden:: False

trimming_options.minlen

label:: Minimum length
type:: basic:integer
description:: Drop the read if it is below a specified length.
required:: False
disabled:: !trimming_options.enable_trimming
hidden:: False

gatk_options.intervals

label:: Intervals BED file
type:: data:bed
description:: Use intervals BED file to limit the analysis to the specified parts of the genome.
required:: False
disabled:: False
hidden:: False

gatk_options.contamination

label:: Contamination fraction
type:: basic:integer
description:: Fraction of contamination in sequencing data (for all samples) to aggressively remove.
required:: True
disabled:: False
hidden:: False
default:: 0

alignment_summary.adapters

label:: Adapter sequences
type:: data:seq:nucleotide
required:: False
disabled:: False
hidden:: False

alignment_summary.max_insert_size

label:: Maximum insert size
type:: basic:integer
required:: True
disabled:: False
hidden:: False
default:: 100000

alignment_summary.pair_orientation

label:: Pair orientation
type:: basic:string
required:: True
disabled:: False
hidden:: False
default:: null
choices::
Unspecified: null
FR: FR
RF: RF
TANDEM: TANDEM

wgs_metrics.read_length

label:: Average read length
type:: basic:integer
required:: True
disabled:: False
hidden:: False
default:: 150

wgs_metrics.min_map_quality

label:: Minimum mapping quality for a read to contribute coverage
type:: basic:integer
required:: True
disabled:: False
hidden:: False
default:: 20

wgs_metrics.min_quality

label:: Minimum base quality for a base to contribute coverage
type:: basic:integer
description:: N bases will be treated as having a base quality of negative infinity and will therefore be excluded from coverage regardless of the value of this parameter.
required:: True
disabled:: False
hidden:: False
default:: 20

wgs_metrics.coverage_cap

label:: Maximum coverage cap
type:: basic:integer
description:: Treat positions with coverage exceeding this value as if they had coverage at this set value.
required:: True
disabled:: False
hidden:: False
default:: 250

wgs_metrics.accumulation_cap

label:: Ignore positions with coverage above this value
type:: basic:integer
description:: At positions with coverage exceeding this value, completely ignore reads that accumulate beyond this value.
required:: True
disabled:: False
hidden:: False
default:: 100000

wgs_metrics.sample_size

label:: Sample size used for Theoretical Het Sensitivity sampling
type:: basic:integer
required:: True
disabled:: False
hidden:: False
default:: 10000

insert_size.minimum_fraction

label:: Minimum fraction of reads in a category to be considered
type:: basic:decimal
description:: When generating the histogram, discard any data categories (out of FR, TANDEM, RF) that have fewer than this fraction of overall reads (range: 0 to 0.5).
required:: True
disabled:: False
hidden:: False
default:: 0.05

insert_size.include_duplicates

label:: Include reads marked as duplicates in the insert size histogram
type:: basic:boolean
required:: True
disabled:: False
hidden:: False
default:: False

insert_size.deviations

label:: Deviations limit
type:: basic:decimal
description:: Generate mean, standard deviation and plots by trimming the data down to MEDIAN + DEVIATIONS * MEDIAN_ABSOLUTE_DEVIATION. This is done because insert size data typically includes enough anomalous values from chimeras and other artifacts to make the mean and standard deviation grossly misleading regarding the real distribution.
required:: True
disabled:: False
hidden:: False
default:: 10.0

WGS preprocess data with bwa-mem2

data:alignment:bam:wgsbwa2: wgs-preprocess-bwa2 (data:reads:fastq:paired reads, data:alignment:bam aligned_reads, data:seq:nucleotide ref_seq, data:index:bwamem2 bwa_index, list:data:variants:vcf known_sites, basic:integer pixel_distance, basic:integer n_jobs)[Source: v1.4.0]

Prepare an analysis-ready BAM file following the GATK best practices procedure. The steps include read alignment using BWA-MEM2, marking of duplicates (Picard MarkDuplicates), BAM sorting, read-group assignment and base quality score recalibration (BQSR).

reads

label:: Input sample (FASTQ)
type:: data:reads:fastq:paired
required:: False
disabled:: False
hidden:: False

aligned_reads

label:: Input sample (BAM)
type:: data:alignment:bam
required:: False
disabled:: False
hidden:: False

ref_seq

label:: Reference sequence
type:: data:seq:nucleotide
required:: True
disabled:: False
hidden:: False

bwa_index

label:: BWA-MEM2 genome index
type:: data:index:bwamem2
required:: True
disabled:: False
hidden:: False

known_sites

label:: Known sites of variation (VCF)
type:: list:data:variants:vcf
required:: True
disabled:: False
hidden:: False

advanced_options.pixel_distance

label:: --OPTICAL_DUPLICATE_PIXEL_DISTANCE
type:: basic:integer
description:: Set the optical duplicate pixel distance, i.e. the maximum offset between two duplicate clusters for them to be considered optical duplicates. Modify this parameter to ensure compatibility with older Illumina platforms.
required:: True
disabled:: False
hidden:: False
default:: 2500
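
Picard treats two duplicate reads as optical duplicates when their clusters lie within the pixel distance of each other on the same tile. A simplified sketch, assuming tile and x/y coordinates have already been parsed from the read names; the ReadLocation type and the helper are hypothetical, and Picard applies further checks (for example on read groups).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReadLocation:
    """Flowcell position of a read's cluster (hypothetical container)."""
    tile: int
    x: int
    y: int


def is_optical_duplicate(a: ReadLocation, b: ReadLocation, pixel_distance: int = 2500) -> bool:
    """True if two duplicate reads come from nearby clusters on the same tile."""
    return (
        a.tile == b.tile
        and abs(a.x - b.x) <= pixel_distance
        and abs(a.y - b.y) <= pixel_distance
    )
```

With the default of 2500, clusters 2000 pixels apart in x and y on the same tile count as optical duplicates; lowering the distance to 100 would classify the same pair as ordinary duplicates.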

advanced_options.n_jobs

label:: Number of concurrent jobs
type:: basic:integer
description:: Use a fixed number of jobs for quality score recalibration instead of determining it based on the number of available cores.
required:: False
disabled:: False
hidden:: False
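
The job-count rule described above, sketched with a hypothetical helper that falls back to the number of CPU cores reported by the operating system when no fixed value is given. Whether the process counts cores this way is an assumption.

```python
import os
from typing import Optional


def resolve_n_jobs(n_jobs: Optional[int] = None) -> int:
    """Use the fixed job count if given, otherwise one job per available core."""
    if n_jobs is not None:
        if n_jobs < 1:
            raise ValueError("n_jobs must be a positive integer")
        return n_jobs
    # os.cpu_count() can return None on some platforms; fall back to one job.
    return os.cpu_count() or 1
```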

bam

label:: Analysis ready BAM file
type:: basic:file
required:: True
disabled:: False
hidden:: False

bai

label:: BAM file index
type:: basic:file
required:: True
disabled:: False
hidden:: False

stats

label:: Alignment statistics
type:: basic:file
required:: True
disabled:: False
hidden:: False

species

label:: Species
type:: basic:string
required:: True
disabled:: False
hidden:: False

build

label:: Build
type:: basic:string
required:: True
disabled:: False
hidden:: False

metrics_file

label:: Metrics from MarkDuplicates process
type:: basic:file
required:: True
disabled:: False
hidden:: False

Whole exome sequencing (WES) analysis

data:workflow:wes workflow-wes (data:reads:fastq:paired reads, data:index:bwa bwa_index, data:seq:nucleotide ref_seq, list:data:variants:vcf known_sites, data:bed intervals, data:variants:vcf hc_dbsnp, basic:string validation_stringency, data:seq:nucleotide adapters, basic:integer seed_mismatches, basic:integer simple_clip_threshold, basic:integer min_adapter_length, basic:integer palindrome_clip_threshold, basic:integer leading, basic:integer trailing, basic:integer minlen, basic:integer seed_l, basic:integer band_w, basic:boolean m, basic:decimal re_seeding, basic:integer match, basic:integer mismatch, basic:integer gap_o, basic:integer gap_e, basic:integer clipping, basic:integer unpaired_p, basic:integer report_tr, data:bedpe bedpe, basic:boolean skip, basic:boolean md_skip, basic:boolean md_remove_duplicates, basic:string md_assume_sort_order, basic:string read_group, basic:integer stand_call_conf, basic:integer mbq)[Source: v3.1.0]

This whole exome sequencing pipeline analyzes Illumina panel data. It consists of read trimming, alignment, soft clipping, (optional) duplicate marking, base quality score recalibration and, finally, variant calling. Trimming is performed with Trimmomatic and alignment with BWA (mem). Soft clipping of Illumina primer sequences is done with the bamclipper tool. Duplicate marking (MarkDuplicates), base quality score recalibration (ApplyBQSR) and variant calling (HaplotypeCaller) are done using the GATK4 bundle of bioinformatics tools. To run this pipeline, you need a genome (FASTA), paired-end reads (FASTQ), a BEDPE file for bamclipper, known sites of variation (VCF, e.g. dbSNP), a dbSNP database of variants (can be the same as the known sites of variation), the intervals on which target capture was done (BED) and Illumina adapter sequences (FASTA). Make sure that all specified resources match the genome used in the alignment step. The result is a file of called variants (VCF).

reads

label:: Raw untrimmed reads
type:: data:reads:fastq:paired
description:: Raw paired-end reads.

bwa_index

label:: BWA genome index
type:: data:index:bwa
description:: Genome index used for the BWA alignment step.

ref_seq

label:: Genome FASTA
type:: data:seq:nucleotide
description:: The selection of Genome FASTA should match the BWA index species and genome build type.

known_sites

label:: Known sites of variation used in BQSR
type:: list:data:variants:vcf
description:: Known sites of variation as a VCF file.

intervals

label:: Intervals
type:: data:bed
description:: Use intervals to narrow the analysis to defined regions. This usually helps reduce processing time.

hc_dbsnp

label:: dbSNP for GATK4’s HaplotypeCaller
type:: data:variants:vcf
description:: dbSNP database of variants for variant calling.

validation_stringency

label:: Validation stringency for all SAM files read by this program. Setting stringency to SILENT can improve performance when processing a BAM file in which variable-length data (read, qualities, tags) do not otherwise need to be decoded. Default is STRICT. This setting is used in the BaseRecalibrator and ApplyBQSR processes.
type:: basic:string
default:: STRICT
choices::
STRICT: STRICT
SILENT: SILENT
LENIENT: LENIENT

advanced.trimming.adapters

label:: Adapter sequences
type:: data:seq:nucleotide
description:: Adapter sequences in FASTA format that will be removed from the reads. This field, together with the ‘Seed mismatches’, ‘Simple clip threshold’ and ‘Palindrome clip threshold’ parameters, is required to perform adapter clipping (ILLUMINACLIP). ‘Minimum adapter length’ and ‘Keep both reads’ are optional parameters.
required:: False

advanced.trimming.seed_mismatches

label:: Seed mismatches
type:: basic:integer
description:: Specifies the maximum mismatch count which will still allow a full match to be performed. This field, together with the ‘Adapter sequences’, ‘Simple clip threshold’ and ‘Palindrome clip threshold’ parameters, is required to perform adapter clipping (ILLUMINACLIP).
required:: False
disabled:: !advanced.trimming.adapters

advanced.trimming.simple_clip_threshold

label:: Simple clip threshold
type:: basic:integer
description:: Specifies how accurate the match between any adapter sequence and a read must be. This field, together with the ‘Adapter sequences’ and ‘Seed mismatches’ parameters, is required to perform adapter clipping (ILLUMINACLIP).
required:: False
disabled:: !advanced.trimming.adapters

advanced.trimming.min_adapter_length

label:: Minimum adapter length
type:: basic:integer
description:: In addition to the alignment score, palindrome mode can verify that a minimum length of adapter has been detected. If unspecified, this defaults to 8 bases, for historical reasons. However, since palindrome mode has a very low false positive rate, this can be safely reduced, even down to 1, to allow shorter adapter fragments to be removed. This field is optional for performing adapter clipping (ILLUMINACLIP). ‘Adapter sequences’, ‘Seed mismatches’, ‘Simple clip threshold’ and ‘Palindrome clip threshold’ are also needed in order to use this parameter.
disabled:: !advanced.trimming.seed_mismatches && !advanced.trimming.simple_clip_threshold && !advanced.trimming.palindrome_clip_threshold
default:: 8

advanced.trimming.palindrome_clip_threshold

label:: Palindrome clip threshold
type:: basic:integer
description:: Specifies how accurate the match between the two ‘adapter ligated’ reads must be for PE palindrome read alignment. This field, together with the ‘Adapter sequences’, ‘Simple clip threshold’ and ‘Seed mismatches’ parameters, is required to perform adapter clipping (ILLUMINACLIP).
required:: False
disabled:: !advanced.trimming.adapters

advanced.trimming.leading

label:: Leading quality
type:: basic:integer
description:: Remove low quality bases from the beginning, if below a threshold quality.
required:: False

advanced.trimming.trailing

label:: Trailing quality
type:: basic:integer
description:: Remove low quality bases from the end, if below a threshold quality.
required:: False

advanced.trimming.minlen

label:: Minimum length
type:: basic:integer
description:: Drop the read if it is below a specified length.
required:: False

advanced.align.seed_l

label:: Minimum seed length
type:: basic:integer
description:: Minimum seed length. Matches shorter than minimum seed length will be missed. The alignment speed is usually insensitive to this value unless it significantly deviates from 20.
default:: 19

advanced.align.band_w

label:: Band width
type:: basic:integer
description:: Gaps longer than this will not be found.
default:: 100

advanced.align.m

label:: Mark shorter split hits as secondary
type:: basic:boolean
description:: Mark shorter split hits as secondary (for Picard compatibility)
default:: False

advanced.align.re_seeding

label:: Re-seeding factor
type:: basic:decimal
description:: Trigger re-seeding for a MEM longer than minSeedLen*FACTOR. This is a key heuristic parameter for tuning the performance. Larger value yields fewer seeds, which leads to faster alignment speed but lower accuracy.
default:: 1.5

advanced.align.scoring.match

label:: Score of a match
type:: basic:integer
default:: 1

advanced.align.scoring.mismatch

label:: Mismatch penalty
type:: basic:integer
default:: 4

advanced.align.scoring.gap_o

label:: Gap open penalty
type:: basic:integer
default:: 6

advanced.align.scoring.gap_e

label:: Gap extension penalty
type:: basic:integer
default:: 1

advanced.align.scoring.clipping

label:: Clipping penalty
type:: basic:integer
description:: BWA-MEM keeps track of the best score reaching the end of the query. Clipping is not applied if this score is larger than the best local alignment score minus the clipping penalty.
default:: 5

advanced.align.scoring.unpaired_p

label:: Penalty for an unpaired read pair
type:: basic:integer
description:: Affinity to force pairing. An unpaired read pair is scored as scoreRead1 + scoreRead2 - penalty.
default:: 9

advanced.align.report_tr

label:: Report threshold score
type:: basic:integer
description:: Do not output alignments with a score lower than this value. This option only affects output.
default:: 30

advanced.bamclipper.bedpe

label:: BEDPE file used for clipping using Bamclipper
type:: data:bedpe
description:: BEDPE file used for clipping using Bamclipper tool.
required:: False

advanced.bamclipper.skip

label:: Skip Bamclipper step
type:: basic:boolean
description:: Use this option to skip Bamclipper step.
default:: False

advanced.markduplicates.md_skip

label:: Skip GATK’s MarkDuplicates step
type:: basic:boolean
default:: False

advanced.markduplicates.md_remove_duplicates

label:: Remove found duplicates
type:: basic:boolean
default:: False

advanced.markduplicates.md_assume_sort_order

label:: Assume sort order
type:: basic:string
default:: (as in BAM header)
choices::
as in BAM header (default):
unsorted: unsorted
queryname: queryname
coordinate: coordinate
duplicate: duplicate
unknown: unknown

advanced.bqsr.read_group

label:: Read group (@RG)
type:: basic:string
description:: If the BAM file does not contain an @RG tag, you can add one here. This argument enables the user to replace all read groups in the INPUT file with a single new read group and assign all reads to this read group in the OUTPUT BAM file. Addition or replacement is performed using Picard’s AddOrReplaceReadGroups tool. Input should take the form of -name=value delimited by a \t, e.g. “-ID=1\t-PL=Illumina\t-SM=sample_1”. Note that PL, LB, PU and SM are required fields. See the AddOrReplaceReadGroups documentation for more information on tag names and for the caveats of rewriting read groups.
required:: False

advanced.hc.stand_call_conf

label:: Min call confidence threshold
type:: basic:integer
description:: The minimum phred-scaled confidence threshold at which variants should be called.
default:: 20

advanced.hc.mbq

label:: Min Base Quality
type:: basic:integer
description:: Minimum base quality required to consider a base for calling.
default:: 20

Xengsort classify

data:xengsort:classification: xengsort-classify (data:reads:fastq reads, data:xengsort:index index, basic:string upload_reads, basic:boolean merge_both, basic:decimal chunksize)[Source: v1.0.0]

Classify xenograft reads with Xengsort. Xengsort is an alignment-free method for sorting reads from xenograft experiments. It classifies sequencing reads into five categories based on their origin: host, graft, both, neither, and ambiguous. Categories “host” and “graft” are for reads that can be clearly assigned to one of the species. Category “both” is for reads that match equally well to both references. Category “neither” is for reads that contain many k-mers that cannot be found in the key-value store; these could point to technical problems (primer dimers) or contamination of the sample with other species. Finally, category “ambiguous” is for reads that provide conflicting information. Such reads should not usually be seen; they could result from PCR hybrids between host and graft during library preparation. A description of the method and its evaluation on several datasets is provided in the [article](https://doi.org/10.1186/s13015-021-00181-w).
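
To make the five categories concrete, the toy classifier below votes with k-mers looked up in two reference sets. It is not Xengsort's algorithm: the k-mer length, the "neither" threshold and the voting rule are hypothetical simplifications chosen for illustration, and the real method and its weighting are described in the linked article.

```python
from typing import Set


def classify_read(
    read: str,
    host_kmers: Set[str],
    graft_kmers: Set[str],
    k: int = 5,
    neither_fraction: float = 0.5,
) -> str:
    """Toy k-mer vote assigning a read to host, graft, both, neither or ambiguous."""
    kmers = [read[i:i + k] for i in range(len(read) - k + 1)]
    if not kmers:
        return "neither"
    host_only = graft_only = unknown = 0
    for kmer in kmers:
        in_host, in_graft = kmer in host_kmers, kmer in graft_kmers
        if in_host and not in_graft:
            host_only += 1
        elif in_graft and not in_host:
            graft_only += 1
        elif not in_host and not in_graft:
            unknown += 1
    if unknown / len(kmers) > neither_fraction:
        return "neither"  # mostly k-mers found in neither reference
    if host_only and graft_only:
        return "ambiguous"  # conflicting species-specific evidence
    if host_only:
        return "host"
    if graft_only:
        return "graft"
    return "both"  # only k-mers shared by both references
```

For the read ACGTACGT with k = 5, placing its four k-mers only in the host set yields host, placing them in both sets yields both, and splitting them between the two sets yields ambiguous.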

reads

label:: Reads
type:: data:reads:fastq
required:: True
disabled:: False
hidden:: False

index

label:: Xengsort genome index
type:: data:xengsort:index
required:: True
disabled:: False
hidden:: False

upload_reads

label:: Select reads to upload
type:: basic:string
description:: All read categories are returned by this process, but only the selected ones are uploaded as separate FASTQ files. Select the categories of reads that will be used in further analyses.
required:: True
disabled:: False
hidden:: False
default:: none
choices::
none: none
all: all
graft: graft
graft, both: graft, both
graft, host: graft, host
graft, host, both: graft, host, both

merge_both

label:: Upload merged graft and both reads
type:: basic:boolean
description:: Merge graft reads with the reads that can originate from both genomes and upload them as graft reads. In any workflow, reads classified as “both” may pose problems, because the species of origin may not be decidable due to ultra-conserved regions between species.
required:: True
disabled:: False
hidden:: upload_reads == ‘none’
default:: False

advanced.chunksize

label:: Chunk size in MB [--chunksize]
type:: basic:decimal
description:: Control memory usage by setting the chunk size per thread.
required:: True
disabled:: False
hidden:: False
default:: 16.0

stats

label:: Xengsort classification statistics
type:: basic:file
required:: True
disabled:: False
hidden:: False

host1

label:: Host reads (mate 1)
type:: basic:file
required:: True
disabled:: False
hidden:: False

host2

label:: Host reads (mate 2)
type:: basic:file
required:: False
disabled:: False
hidden:: False

graft1

label:: Graft reads (mate 1)
type:: basic:file
required:: True
disabled:: False
hidden:: False

graft2

label:: Graft reads (mate 2)
type:: basic:file
required:: False
disabled:: False
hidden:: False

both1

label:: Both reads (mate 1)
type:: basic:file
required:: True
disabled:: False
hidden:: False

both2

label:: Both reads (mate 2)
type:: basic:file
required:: False
disabled:: False
hidden:: False

neither1

label:: Neither reads (mate 1)
type:: basic:file
required:: True
disabled:: False
hidden:: False

neither2

label:: Neither reads (mate 2)
type:: basic:file
required:: False
disabled:: False
hidden:: False

ambiguous1

label:: Ambiguous reads (mate 1)
type:: basic:file
required:: True
disabled:: False
hidden:: False

ambiguous2

label:: Ambiguous reads (mate 2)
type:: basic:file
required:: False
disabled:: False
hidden:: False

graft_species

label:: Graft species
type:: basic:string
required:: True
disabled:: False
hidden:: False

graft_build

label:: Graft build
type:: basic:string
required:: True
disabled:: False
hidden:: False

host_species

label:: Host species
type:: basic:string
required:: True
disabled:: False
hidden:: False

host_build

label:: Host build
type:: basic:string
required:: True
disabled:: False
hidden:: False

Xengsort index

data:xengsort:index:xengsort-index (list:data:seq:nucleotide graft_refs, list:data:seq:nucleotide host_refs, basic:integer n_kmer, basic:integer kmer_size, basic:boolean aligned_cache, basic:boolean fixed_hashing, basic:integer page_size, basic:decimal fill)[Source: v1.0.1]

Build an index for sorting xenograft reads with Xengsort. Xengsort is an alignment-free method for sorting reads from xenograft experiments. A description of the method and its evaluation on several datasets are provided in the [article](https://doi.org/10.1186/s13015-021-00181-w).

graft_refs

label:: Graft reference sequences (nucleotide FASTA)
type:: list:data:seq:nucleotide
required:: True
disabled:: False
hidden:: False

host_refs

label:: Host reference sequences (nucleotide FASTA)
type:: list:data:seq:nucleotide
required:: True
disabled:: False
hidden:: False

n_kmer

label:: Number of distinct k-mers [--nobjects]
type:: basic:integer
description:: The number of k-mers that will be stored in the hash table. This depends on the reference genomes used. If the number of distinct k-mers is known beforehand, it should be specified; for all 25-mers in the human and mouse genomes and transcriptomes, this number is roughly 4,500,000,000. If it is not set, the number is estimated with the ntCard tool and increased by two percent to account for errors.
required:: False
disabled:: False
hidden:: False

advanced.kmer_size

label:: k-mer size [--kmersize]
type:: basic:integer
required:: True
disabled:: False
hidden:: False
default:: 25

advanced.aligned_cache

label:: Use power-of-two aligned pages [--aligned]
type:: basic:boolean
description:: Indicates whether each bucket should consume a number of bits that is a power of 2. Using --aligned ensures that each bucket stays within the same cache line, but may waste space (padding bits), yielding faster lookups at the cost of larger space requirements. By default, no bits are used for padding and buckets may cross cache line boundaries [--unaligned]. This is slightly slower, but may save a little or a lot of space.
required:: True
disabled:: False
hidden:: False
default:: False

advanced.fixed_hashing

label:: Use fixed hash function [--hashfunctions]
type:: basic:boolean
description:: The hash functions used to store the key-value pairs are defined by the --hashfunctions parameter. With this option selected, a fixed set of hash functions (linear945:linear9123641:linear349341847) is used. When it is not selected, different random functions are chosen each time. Random hash functions are recommended unless you need strictly reproducible behavior.
required:: True
disabled:: False
hidden:: False
default:: True

advanced.page_size

label:: Number of elements stored in one bucket (or page) [--pagesize]
type:: basic:integer
required:: True
disabled:: False
hidden:: False
default:: 4

advanced.fill

label:: Fill rate of the hash table [--fill]
type:: basic:decimal
description:: This determines the desired fill rate or load factor of the hash table. It should be set between 0.0 and 1.0. It is beneficial to leave part of the hash table empty for faster lookups. Together with the number of distinct k-mers [--nobjects], the number of slots in the table is calculated as ceil(nobjects/fill).
required:: True
disabled:: False
hidden:: False
default:: 0.88
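The table size arithmetic described for [--fill] and [--nobjects] can be checked with a few lines of Python. This is a minimal sketch, not Xengsort code: the helper names are hypothetical, exact fractions are used to avoid floating-point rounding, and rounding the ntCard-based estimate up after the two percent increase is an assumption.

```python
import math
from fractions import Fraction


def padded_kmer_estimate(estimated_kmers: int) -> int:
    """Increase an estimated k-mer count by two percent (rounding up is assumed)."""
    return math.ceil(Fraction(estimated_kmers) * Fraction(102, 100))


def hash_table_slots(nobjects: int, fill: float) -> int:
    """Number of hash table slots, computed exactly as ceil(nobjects / fill)."""
    if not 0.0 < fill <= 1.0:
        raise ValueError("fill must be greater than 0.0 and at most 1.0")
    # Fraction(str(fill)) turns e.g. 0.88 into exactly 22/25.
    return math.ceil(Fraction(nobjects) / Fraction(str(fill)))


if __name__ == "__main__":
    # Roughly 4.5 billion distinct 25-mers (human + mouse) at the default fill rate.
    print(hash_table_slots(4_500_000_000, 0.88))  # 5113636364
    print(padded_kmer_estimate(1_000_000))        # 1020000
```

With the default fill rate of 0.88, about 12% of the slots stay empty, which is the headroom that keeps lookups fast.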

index

label:: Xengsort index
type:: basic:file
required:: True
disabled:: False
hidden:: False

stats

label:: Xengsort statistics
type:: basic:file
required:: True
disabled:: False
hidden:: False

graft_species

label:: Graft species
type:: basic:string
required:: True
disabled:: False
hidden:: False

graft_build

label:: Graft build
type:: basic:string
required:: True
disabled:: False
hidden:: False

host_species

label:: Host species
type:: basic:string
required:: True
disabled:: False
hidden:: False

host_build

label:: Host build
type:: basic:string
required:: True
disabled:: False
hidden:: False

alignmentSieve

data:alignment:bam:sieve:alignmentsieve (data:alignment:bam alignment, basic:integer min_fragment_length, basic:integer max_fragment_length)[Source: v1.5.3]

Filter alignments in BAM files according to the specified parameters. The alignmentSieve program is part of deepTools. See the [documentation](https://deeptools.readthedocs.io/en/develop/content/tools/alignmentSieve.html) for more details.

alignment

label:: Alignment BAM file
type:: data:alignment:bam
required:: True
disabled:: False
hidden:: False

min_fragment_length

label:: --minFragmentLength
type:: basic:integer
description:: The minimum fragment length needed for read/pair inclusion. This option is primarily useful in ATAC-seq experiments, for filtering mono- or di-nucleosome fragments. (Default: 0)
required:: True
disabled:: False
hidden:: False
default:: 0

max_fragment_length

label:: --maxFragmentLength
type:: basic:integer
description:: The maximum fragment length needed for read/pair inclusion. A value of 0 indicates no limit. (Default: 0)
required:: True
disabled:: False
hidden:: False
default:: 0
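The effect of the two fragment-length limits can be illustrated on a list of template lengths. This is a minimal sketch, not deepTools code: it assumes the fragment length is the absolute value of the BAM TLEN field and that a limit of 0 disables that side of the filter, as the defaults above suggest. The 150 bp cut-off in the example is purely illustrative.

```python
def keep_fragment(template_length: int, min_len: int = 0, max_len: int = 0) -> bool:
    """Return True if a read/pair passes the fragment-length limits.

    The absolute value is used because TLEN is negative for the mate on the
    reverse end of the fragment. A limit of 0 means "no limit".
    """
    length = abs(template_length)
    if min_len > 0 and length < min_len:
        return False
    if max_len > 0 and length > max_len:
        return False
    return True


if __name__ == "__main__":
    tlens = [80, -120, 200, 350]
    print([t for t in tlens if keep_fragment(t, max_len=150)])  # [80, -120]
    print([t for t in tlens if keep_fragment(t, min_len=150)])  # [200, 350]
```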

bam

label:: Sieved BAM file
type:: basic:file
required:: True
disabled:: False
hidden:: False

bai

label:: Index of sieved BAM file
type:: basic:file
required:: True
disabled:: False
hidden:: False

stats

label:: Alignment statistics
type:: basic:file
required:: True
disabled:: False
hidden:: False

species

label:: Species
type:: basic:string
required:: True
disabled:: False
hidden:: False

build

label:: Build
type:: basic:string
required:: True
disabled:: False
hidden:: False

edgeR

data:differentialexpression:edger:differentialexpression-edger (list:data:expression case, list:data:expression control, basic:integer count_filter, basic:boolean create_sets, basic:decimal logfc, basic:decimal fdr)[Source: v1.7.0]

Run edgeR analysis. edgeR (Empirical Analysis of Digital Gene Expression Data in R) performs differential expression analysis of RNA-seq expression profiles with biological replication. It implements a range of statistical methods based on the negative binomial distribution, including empirical Bayes estimation, exact tests, generalized linear models and quasi-likelihood tests. Besides RNA-seq, it can be applied to differential signal analysis of other types of genomic data that produce counts, including ChIP-seq, Bisulfite-seq, SAGE and CAGE. See [here](https://www.bioconductor.org/packages/devel/bioc/vignettes/edgeR/inst/doc/edgeRUsersGuide.pdf) for more information.

case

label:: Case
type:: list:data:expression
description:: Case samples (replicates)
required:: True
disabled:: False
hidden:: False

control

label:: Control
type:: list:data:expression
description:: Control samples (replicates)
required:: True
disabled:: False
hidden:: False

count_filter

label:: Raw counts filtering threshold
type:: basic:integer
description:: Filter genes in the expression matrix input. Remove genes where the number of counts in all samples is below the threshold.
required:: True
disabled:: False
hidden:: False
default:: 10

create_sets

label:: Create gene sets
type:: basic:boolean
description:: After calculating differential expression, create gene sets for up-regulated genes, down-regulated genes and all genes.
required:: True
disabled:: False
hidden:: False
default:: False

logfc

label:: Log2 fold change threshold for gene sets
type:: basic:decimal
description:: Genes with a log2 fold change above this threshold are considered up-regulated, and genes with a log2 fold change below its negative value (-Log2FC) are considered down-regulated.
required:: True
disabled:: False
hidden:: !create_sets
default:: 1.0

fdr

label:: FDR threshold for gene sets
type:: basic:decimal
required:: True
disabled:: False
hidden:: !create_sets
default:: 0.05
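The up/down split controlled by the two thresholds above can be sketched as follows. The `DEResult` record and `split_gene_sets` helper are hypothetical, and two details are assumptions rather than documented behaviour: genes qualify only when their FDR is strictly below the threshold, and a positive log2 fold change means higher expression in case than in control. The "all genes" set is not modelled here.

```python
from dataclasses import dataclass
from typing import Iterable, Set, Tuple


@dataclass(frozen=True)
class DEResult:
    gene_id: str
    log2fc: float  # log2 fold change (case vs. control assumed)
    fdr: float     # FDR-adjusted p-value


def split_gene_sets(
    results: Iterable[DEResult], logfc: float = 1.0, fdr: float = 0.05
) -> Tuple[Set[str], Set[str]]:
    """Return (up-regulated, down-regulated) gene IDs."""
    up: Set[str] = set()
    down: Set[str] = set()
    for r in results:
        if r.fdr >= fdr:  # assumption: strict FDR < threshold
            continue
        if r.log2fc > logfc:
            up.add(r.gene_id)
        elif r.log2fc < -logfc:
            down.add(r.gene_id)
    return up, down
```

For example, with the defaults a gene with log2FC = 2.0 and FDR = 0.01 lands in the up-regulated set, while a gene with log2FC = 3.0 but FDR = 0.2 is excluded from both sets.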

raw

label:: Differential expression
type:: basic:file
required:: True
disabled:: False
hidden:: False

de_json

label:: Results table (JSON)
type:: basic:json
required:: True
disabled:: False
hidden:: False

de_file

label:: Results table (file)
type:: basic:file
required:: True
disabled:: False
hidden:: False

source

label:: Gene ID database
type:: basic:string
required:: True
disabled:: False
hidden:: False

species

label:: Species
type:: basic:string
required:: True
disabled:: False
hidden:: False

build

label:: Build
type:: basic:string
required:: True
disabled:: False
hidden:: False

feature_type

label:: Feature type
type:: basic:string
required:: True
disabled:: False
hidden:: False

methcounts

data:wgbs:methcountsmethcounts (data:seq:nucleotide genome, data:alignment:bam:walt alignment, basic:boolean cpgs, basic:boolean symmetric_cpgs)[Source: v3.3.0]

The methcounts program takes the mapped reads and produces the methylation level at each genomic cytosine, with the option to produce only levels for CpG-context cytosines.

genome

label:: Reference genome
type:: data:seq:nucleotide

alignment

label:: Mapped reads
type:: data:alignment:bam:walt
description:: WGBS alignment file in Mapped Read (.mr) format.

cpgs

label:: Only CpG context sites
type:: basic:boolean
description:: The output file will contain methylation data for CpG context sites only. Choosing this option will result in a CpG context report only.
disabled:: symmetric_cpgs
default:: False

symmetric_cpgs

label:: Merge CpG pairs
type:: basic:boolean
description:: Merging CpG pairs results in symmetric methylation levels. Methylation is usually symmetric (cytosines at CpG sites are methylated on both DNA strands). Choosing this option will only keep the CpG sites data.
disabled:: cpgs
default:: True
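One way to picture merging a CpG pair is to pool the read counts of the two cytosines (one per strand) before computing the methylation level. This is a minimal sketch under that assumption; it is not the methcounts implementation, and the `CpGSite` record and helper names are hypothetical.

```python
from typing import NamedTuple


class CpGSite(NamedTuple):
    methylated: int  # reads supporting a methylated cytosine
    total: int       # all reads covering the cytosine


def merge_cpg_pair(plus: CpGSite, minus: CpGSite) -> CpGSite:
    """Pool counts from the + strand C and the - strand C of one CpG."""
    return CpGSite(plus.methylated + minus.methylated, plus.total + minus.total)


def methylation_level(site: CpGSite) -> float:
    """Fraction of methylated reads; 0.0 for uncovered sites."""
    return site.methylated / site.total if site.total else 0.0


if __name__ == "__main__":
    merged = merge_cpg_pair(CpGSite(8, 10), CpGSite(2, 10))
    print(merged, methylation_level(merged))  # CpGSite(methylated=10, total=20) 0.5
```

Pooling weights each strand by its coverage, so a well-covered strand contributes more to the merged level than a sparsely covered one.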

meth

label:: Methylation levels
type:: basic:file

stats

label:: Statistics
type:: basic:file

bigwig

label:: Methylation levels BigWig file
type:: basic:file

species

label:: Species
type:: basic:string

build

label:: Build
type:: basic:string

miRNA pipeline

data:workflow:mirnaworkflow-mirna (data:reads:fastq:single reads, data:seq:nucleotide up_primers_file, data:seq:nucleotide down_primers_file, list:basic:string up_primers_seq, list:basic:string down_primers_seq, basic:integer min_overlap, basic:boolean show_advanced, basic:integer leading, basic:integer trailing, basic:integer minlen, basic:integer maxlen, basic:integer max_n, basic:boolean match_read_wildcards, basic:boolean no_indels, basic:decimal error_rate, data:index:bowtie2 genome, basic:boolean show_alignment_options, basic:string mode, basic:string speed, basic:integer N, basic:integer L, basic:string rep_mode, basic:integer k_reports, data:annotation annotation, basic:string id_attribute, basic:string feature_class, basic:string normalization_type, basic:boolean allow_multi_overlap, basic:boolean count_multi_mapping_reads, basic:string assay_type)[Source: v3.1.0]

preprocessing.reads

label:: Input miRNA reads.
type:: data:reads:fastq:single

preprocessing.adapters.up_primers_file

label:: 5 prime adapter file
type:: data:seq:nucleotide
required:: False

preprocessing.adapters.down_primers_file

label:: 3 prime adapter file
type:: data:seq:nucleotide
required:: False

preprocessing.adapters.up_primers_seq

label:: 5 prime adapter sequence
type:: list:basic:string
required:: False

preprocessing.adapters.down_primers_seq

label:: 3 prime adapter sequence
type:: list:basic:string
required:: False

preprocessing.adapters.min_overlap

label:: Minimal overlap
type:: basic:integer
description:: Minimum overlap for an adapter match. Default 5.
default:: 5

preprocessing.show_advanced

label:: Show advanced preprocessing parameters
type:: basic:boolean
default:: False

preprocessing.trimming.leading

label:: Quality on 5 prime
type:: basic:integer
description:: Remove low quality bases from the 5 prime end. Specifies the minimum quality required to keep a base. Default: 28.
hidden:: !preprocessing.show_advanced
default:: 28

preprocessing.trimming.trailing

label:: Quality on 3 prime
type:: basic:integer
description:: Remove low quality bases from the 3 prime end. Specifies the minimum quality required to keep a base. Default: 28.
hidden:: !preprocessing.show_advanced
default:: 28

preprocessing.filtering.minlen

label:: Min length
type:: basic:integer
description:: Drop the read if it is below a specified length. Default: 15.
hidden:: !preprocessing.show_advanced
default:: 15

preprocessing.filtering.maxlen

label:: Max length
type:: basic:integer
description:: Drop the read if it is above a specified length. Default: 35.
hidden:: !preprocessing.show_advanced
default:: 35

preprocessing.filtering.max_n

label:: Max number of Ns
type:: basic:integer
description:: Discard reads having more ‘N’ bases than specified. Default: 1.
hidden:: !preprocessing.show_advanced
default:: 1
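The three read filters above (Min length, Max length, Max number of Ns) amount to a simple per-read check. This is a minimal sketch with a hypothetical `passes_filters` helper. Following the "below"/"above" wording, it assumes that reads exactly at a length threshold are kept. It does not reproduce the trimming tool used by the pipeline.

```python
def passes_filters(seq: str, minlen: int = 15, maxlen: int = 35, max_n: int = 1) -> bool:
    """Return True if a trimmed read survives the length and N filters."""
    length = len(seq)
    if length < minlen or length > maxlen:  # drop reads below minlen or above maxlen
        return False
    return seq.upper().count("N") <= max_n  # drop reads with more Ns than allowed


if __name__ == "__main__":
    print(passes_filters("TGAGGTAGTAGGTTGTATAGTT"))  # True: 22 nt, no Ns
    print(passes_filters("A" * 14))                  # False: shorter than 15
    print(passes_filters("NN" + "A" * 18))           # False: 2 Ns > 1
```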

preprocessing.filtering.match_read_wildcards

label:: Match read wildcards
type:: basic:boolean
description:: Interpret IUPAC wildcards in reads.
hidden:: !preprocessing.show_advanced
default:: True

preprocessing.filtering.no_indels

label:: No indels
type:: basic:boolean
description:: Disable (disallow) insertions and deletions in adapters.
hidden:: !preprocessing.show_advanced
default:: True

preprocessing.filtering.error_rate

label:: Error rate
type:: basic:decimal
description:: Maximum allowed error rate (no. of errors divided by the length of the matching region). Default: 0.2.
hidden:: !preprocessing.show_advanced
default:: 0.2
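The error rate is a ratio, so the number of mismatches tolerated grows with the length of the adapter match. This is a minimal sketch assuming the allowed count is rounded down; the helper name is hypothetical, and exact fractions avoid floating-point surprises.

```python
import math
from fractions import Fraction


def max_allowed_errors(match_length: int, error_rate: float = 0.2) -> int:
    """Errors tolerated in a match: floor(length * error_rate) (rounding down assumed)."""
    return math.floor(match_length * Fraction(str(error_rate)))


if __name__ == "__main__":
    print(max_allowed_errors(5))   # 1: a minimal 5-nt overlap tolerates one error
    print(max_allowed_errors(14))  # 2: 14 * 0.2 = 2.8, rounded down
    print(max_allowed_errors(20))  # 4
```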

alignment.genome

label:: Genome reference
type:: data:index:bowtie2
description:: Choose the genome reference against which to align reads.

alignment.show_alignment_options

label:: Show alignment options
type:: basic:boolean
default:: False

alignment.alignment_options.mode

label:: Alignment mode
type:: basic:string
description:: End to end: Bowtie 2 requires that the entire read align from one end to the other, without any trimming (or “soft clipping”) of characters from either end. Local: Bowtie 2 does not require that the entire read align from one end to the other. Rather, some characters may be omitted (“soft clipped”) from the ends in order to achieve the greatest possible alignment score. Default: --local (with sensitivity set to --very-sensitive for both options).
hidden:: !alignment.show_alignment_options
default:: --local
choices::
local: --local
end to end mode: --end-to-end

alignment.alignment_options.speed

label:: Sensitivity
type:: basic:string
description:: A parameter preset for accurate alignment. For both alignment modes, the --very-sensitive preset is used, which is equivalent to: -D 20 -R 3 -N 0 -L 20 -i S,1,0.50
hidden:: !alignment.show_alignment_options
default:: --very-sensitive

alignment.alignment_options.N

label:: Number of mismatches allowed in seed alignment (N)
type:: basic:integer
description:: Sets the number of mismatches allowed in a seed alignment. Can be set to 0 or 1. Default: 0
hidden:: !alignment.show_alignment_options
default:: 0

alignment.alignment_options.L

label:: Length of seed substrings (L)
type:: basic:integer
description:: Sets the length of the seed substrings to align during multiseed alignment. The --very-sensitive preset sets -L to 20 in both --end-to-end and --local mode. For miRNA, a shorter seed length is recommended. Default: -L 8
hidden:: !alignment.show_alignment_options
default:: 8

alignment.alignment_options.rep_mode

label:: Report mode
type:: basic:string
description:: Tool default mode: search for multiple alignments, report the best one; -k mode: search for one or more alignments, report each; -a mode: search for and report all alignments. Default: -k
hidden:: !alignment.show_alignment_options
default:: k
choices::
Tool default mode: def
-k mode: k
-a mode (very slow): a

alignment.alignment_options.k_reports

label:: Number of reports (for -k mode only)
type:: basic:integer
description:: Searches for at most X distinct, valid alignments for each read. The search terminates when it can’t find more distinct valid alignments, or when it finds X, whichever happens first. Default: 5
hidden:: !alignment.show_alignment_options
default:: 5

quant_options.annotation

label:: Annotation (GTF/GFF3)
type:: data:annotation

quant_options.id_attribute

label:: ID attribute
type:: basic:string
description:: GTF/GFF3 attribute to be used as feature ID. Several GTF/GFF3 lines with the same feature ID will be considered as parts of the same feature. The feature ID is used to identify the counts in the output table. In GTF files this is usually ‘gene_id’, in GFF3 files this is often ‘ID’, and ‘transcript_id’ is frequently a valid choice for both annotation formats. miRNA name refers to the miRBase GFF3 ‘Name’ field and is the default option.
default:: Name
choices::
miRNA name: Name
gene_id: gene_id
transcript_id: transcript_id
ID: ID
geneid: geneid

quant_options.feature_class

label:: Feature class
type:: basic:string
description:: Feature class (3rd column in the GFF file) to be used; all features of other types are ignored. Default: miRNA.
default:: miRNA

quant_options.normalization_type

label:: Normalization type
type:: basic:string
description:: The default expression normalization type.
default:: CPM

quant_options.allow_multi_overlap

label:: Count multi-overlapping reads
type:: basic:boolean
description:: Assign reads to all their overlapping features or meta-features.
default:: True

quant_options.count_multi_mapping_reads

label:: Count multi-mapping reads
type:: basic:boolean
description:: For a multi-mapping read, all its reported alignments will be counted. The ‘NH’ tag in BAM input is used to detect multi-mapping reads.
default:: True

assay_type

label:: Assay type
type:: basic:string
description:: Indicate if strand-specific read counting should be performed. In strand non-specific assay a read is considered overlapping with a feature regardless of whether it is mapped to the same or the opposite strand as the feature. In strand-specific forward assay, the read has to be mapped to the same strand as the feature. In strand-specific reverse assay these rules are reversed.
choices::
Strand non-specific: non_specific
Strand-specific forward: forward
Strand-specific reverse: reverse

shRNA quantification

data:workflow:trimalquantworkflow-trim-align-quant (data:reads:fastq:single reads, list:basic:string up_primers_seq, list:basic:string down_primers_seq, basic:decimal error_rate_5end, basic:decimal error_rate_3end, data:index:bowtie2 genome, basic:string mode, basic:integer N, basic:integer L, basic:integer gbar, basic:string mp, basic:string rdg, basic:string rfg, basic:string score_min, basic:integer readlengths, basic:integer alignscores)[Source: v1.1.0]

reads

label:: Untrimmed reads.
type:: data:reads:fastq:single
description:: First stage of the shRNA pipeline. Trims 5’ adapters, then 3’ adapters (each with its own error rate setting), aligns reads to a reference library and quantifies species.

trimming_options.up_primers_seq

label:: 5’ adapter sequence
type:: list:basic:string
description:: A string of 5’ adapter sequence.
required:: True

trimming_options.down_primers_seq

label:: 3’ adapter sequence
type:: list:basic:string
description:: A string of 3’ adapter sequence.
required:: True

trimming_options.error_rate_5end

label:: Error rate for 5’
type:: basic:decimal
description:: Maximum allowed error rate (no. of errors divided by the length of the matching region) for 5’ trimming.
required:: False
default:: 0.1

trimming_options.error_rate_3end

label:: Error rate for 3’
type:: basic:decimal
description:: Maximum allowed error rate (no. of errors divided by the length of the matching region) for 3’ trimming.
required:: False
default:: 0.1

alignment_options.genome

label:: Reference library
type:: data:index:bowtie2
description:: Choose the reference library against which to align reads.

alignment_options.mode

label:: Alignment mode
type:: basic:string
description:: End to end: Bowtie 2 requires that the entire read align from one end to the other, without any trimming (or “soft clipping”) of characters from either end. Local: Bowtie 2 does not require that the entire read align from one end to the other. Rather, some characters may be omitted (“soft clipped”) from the ends in order to achieve the greatest possible alignment score.
default:: --end-to-end
choices::
end to end mode: --end-to-end
local: --local

alignment_options.N

label:: Number of mismatches allowed in seed alignment (N)
type:: basic:integer
description:: Sets the number of mismatches allowed in a seed alignment during multiseed alignment. Can be set to 0 or 1. Setting this higher makes alignment slower (often much slower) but increases sensitivity. Default: 0.
required:: False

alignment_options.L

label:: Length of seed substrings (L)
type:: basic:integer
description:: Sets the length of the seed substrings to align during multiseed alignment. Smaller values make alignment slower but more sensitive. Default: set by the --sensitive preset for end-to-end alignment and by --sensitive-local for local alignment. See the Bowtie 2 documentation for details.
required:: False

alignment_options.gbar

label:: Disallow gaps within positions (gbar)
type:: basic:integer
description:: Disallow gaps within <int> positions of the beginning or end of the read. Default: 4.
required:: False

alignment_options.mp

label:: Maximal and minimal mismatch penalty (mp)
type:: basic:string
description:: Sets the maximum (MX) and minimum (MN) mismatch penalties, both integers. A number less than or equal to MX and greater than or equal to MN is subtracted from the alignment score for each position where a read character aligns to a reference character, the characters do not match, and neither is an N. If --ignore-quals is specified, the number subtracted equals MX. Otherwise, the number subtracted is MN + floor((MX-MN)(MIN(Q, 40.0)/40.0)) where Q is the Phred quality value. Default for MX, MN: 6,2.
required:: False
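The quality-aware mismatch penalty described above is easy to evaluate by hand. This is a minimal sketch that transcribes the stated formula; the function name and keyword arguments are illustrative, not Bowtie 2 internals.

```python
import math


def mismatch_penalty(q: float, mx: int = 6, mn: int = 2, ignore_quals: bool = False) -> int:
    """Penalty for one mismatch at a base with Phred quality q."""
    if ignore_quals:
        return mx
    # Qualities above 40 are capped, so high-quality mismatches cost MX.
    return mn + math.floor((mx - mn) * (min(q, 40.0) / 40.0))


if __name__ == "__main__":
    for q in (0, 10, 20, 30, 40, 45):
        print(q, mismatch_penalty(q))  # 2, 3, 4, 5, 6, 6
```

A mismatch at a low-quality base is penalized less because it is more likely to be a sequencing error than a true difference from the reference.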

alignment_options.rdg

label:: Set read gap open and extend penalties (rdg)
type:: basic:string
description:: Sets the read gap open (<int1>) and extend (<int2>) penalties. A read gap of length N gets a penalty of <int1> + N * <int2>. Default: 5,3.
required:: False

alignment_options.rfg

label:: Set reference gap open and extend penalties (rfg)
type:: basic:string
description:: Sets the reference gap open (<int1>) and extend (<int2>) penalties. A reference gap of length N gets a penalty of <int1> + N * <int2>. Default: 5,3.
required:: False
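Both gap options take an "open,extend" string and apply the same affine formula. This is a minimal sketch with hypothetical helper names: it parses such a string and computes the penalty for a gap of a given length.

```python
from typing import Tuple


def parse_penalties(value: str) -> Tuple[int, int]:
    """Parse an 'open,extend' string such as '5,3' into two integers."""
    open_str, extend_str = value.split(",")
    return int(open_str), int(extend_str)


def gap_penalty(length: int, penalties: str = "5,3") -> int:
    """Penalty for a gap of `length` positions: open + length * extend."""
    if length < 1:
        raise ValueError("gap length must be at least 1")
    gap_open, gap_extend = parse_penalties(penalties)
    return gap_open + length * gap_extend


if __name__ == "__main__":
    print(gap_penalty(1))  # 8  = 5 + 1 * 3
    print(gap_penalty(3))  # 14 = 5 + 3 * 3
```

Because the open penalty is paid once, one gap of length 3 (14) costs less than three separate 1-base gaps (3 * 8 = 24), which favours alignments with fewer, longer gaps.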

alignment_options.score_min

label:: Minimum alignment score needed for “valid” alignment (score-min)
type:: basic:string
description:: Sets a function governing the minimum alignment score needed for an alignment to be considered “valid” (i.e. good enough to report). This is a function of read length. For instance, specifying L,0,-0.6 sets the minimum-score function to f(x) = 0 + -0.6 * x, where x is the read length. The default in --end-to-end mode is L,-0.6,-0.6 and the default in --local mode is G,20,8.
required:: False
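A score-min specification can be evaluated for a given read length as shown below. This is a minimal sketch: L (linear) and G (natural log) are the types used in the defaults above, S (square root) is included from the Bowtie 2 manual, and the constant type is omitted. The function name is illustrative.

```python
import math
from typing import Callable, Dict

# Function types: L = linear, S = square root, G = natural logarithm.
_TRANSFORMS: Dict[str, Callable[[float], float]] = {
    "L": lambda x: x,
    "S": math.sqrt,
    "G": math.log,
}


def min_score(spec: str, read_length: int) -> float:
    """Evaluate a score-min spec 'TYPE,a,b' as a + b * g(read_length)."""
    kind, a, b = spec.split(",")
    return float(a) + float(b) * _TRANSFORMS[kind](read_length)


if __name__ == "__main__":
    print(round(min_score("L,-0.6,-0.6", 22), 2))  # -13.8 (end-to-end default)
    print(round(min_score("G,20,8", 22), 2))       # 44.73 (local default)
```

For a 22-nt read, for example, an end-to-end alignment is reported only if its score is at least -13.8.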

quant_options.readlengths

label:: Species lengths threshold
type:: basic:integer
description:: Species with read lengths below specified threshold will be removed from final output. Default is no removal.

quant_options.alignscores

label:: Align scores filter threshold
type:: basic:integer
description:: Species with an alignment score below the specified threshold will be removed from the final output. Default is no removal.

snpEff (General variant annotation) (multi-sample)

data:variants:vcf:snpeff:snpeff (data:variants:vcf variants, basic:string database, data:variants:vcf dbsnp, basic:string filtering_options, list:data:geneset sets, list:basic:string extract_fields, basic:boolean one_per_line)[Source: v1.1.1]

Annotate variants with SnpEff. SnpEff is a variant annotation and effect prediction tool. It annotates and predicts the effects of genetic variants (such as amino acid changes). This process also allows filtering of variants with the ``SnpSift filter`` command and extracting specific fields from the VCF file with the ``SnpSift extractFields`` command. This tool takes a multi-sample VCF file as input.

variants

label:: Variants (VCF)
type:: data:variants:vcf
required:: True
disabled:: False
hidden:: False

database

label:: snpEff database
type:: basic:string
required:: True
disabled:: False
hidden:: False
default:: GRCh38.99
choices::
GRCh37.75: GRCh37.75
GRCh38.99: GRCh38.99

dbsnp

label:: Known variants
type:: data:variants:vcf
description:: List of known variants for annotation.
required:: False
disabled:: False
hidden:: False

filtering_options

label:: Filtering expressions
type:: basic:string
description:: Filter the VCF file using arbitrary expressions. Examples of filtering expressions: `(ANN[*].GENE = 'PSD3')`, `( REF = 'A' )` or `(countHom() > 3) | (( exists INDEL ) & (QUAL >= 20)) | (QUAL >= 30 )`. For more information, check out the official documentation of [SnpSift](https://pcingola.github.io/SnpEff/ss_filter/).
required:: False
disabled:: False
hidden:: False

sets

label:: Files with list of genes
type:: list:data:geneset
description:: Use lists of genes if you only want variants reported for those genes. Each file must contain one gene per line.
required:: False
disabled:: False
hidden:: !filtering_options

extract_fields

label:: Fields to extract
type:: list:basic:string
description:: Enter the fields you want to extract from the annotated VCF file, pressing Enter after each one. Example of fields: `CHROM POS REF ALT 'ANN[*].GENE'`. For more information, follow this [link](https://pcingola.github.io/SnpEff/ss_extractfields/).
required:: False
disabled:: False
hidden:: False
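The filtering and extraction steps correspond to two SnpSift command lines. This is a minimal sketch of how such commands could be assembled. It assumes a `SnpSift` launcher script on PATH; with a bare JAR the prefix would be `java -jar SnpSift.jar`. The helper names and file names are hypothetical, not the process's own code.

```python
from typing import List, Sequence


def snpsift_filter_cmd(expression: str, vcf_path: str) -> List[str]:
    """argv for: SnpSift filter <expression> <input.vcf>"""
    return ["SnpSift", "filter", expression, vcf_path]


def snpsift_extract_cmd(vcf_path: str, fields: Sequence[str]) -> List[str]:
    """argv for: SnpSift extractFields <input.vcf> <field> [<field> ...]"""
    return ["SnpSift", "extractFields", vcf_path, *fields]


if __name__ == "__main__":
    print(snpsift_filter_cmd("( QUAL >= 30 )", "annotated.vcf"))
    print(snpsift_extract_cmd("annotated.vcf", ["CHROM", "POS", "REF", "ALT", "ANN[*].GENE"]))
```

Passing each field as a separate list element (for example, to `subprocess.run(cmd, check=True, stdout=out)`) avoids shell quoting, so `ANN[*].GENE` is not expanded as a glob.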

advanced.one_per_line

label:: One effect per line
type:: basic:boolean
description:: If there is more than one effect per variant, write them to separate lines.
required:: True
disabled:: False
hidden:: False
default:: False

vcf

label:: Annotated variants (VCF)
type:: basic:file
required:: True
disabled:: False
hidden:: False

tbi

label:: Index of annotated variants
type:: basic:file
required:: True
disabled:: False
hidden:: False

vcf_extracted

label:: Extracted annotated variants (VCF)
type:: basic:file
required:: False
disabled:: False
hidden:: False

tbi_extracted

label:: Index of extracted variants
type:: basic:file
required:: False
disabled:: False
hidden:: False

species

label:: Species
type:: basic:string
required:: True
disabled:: False
hidden:: False

build

label:: Build
type:: basic:string
required:: True
disabled:: False
hidden:: False

genes

label:: SnpEff genes
type:: basic:file
required:: True
disabled:: False
hidden:: False

summary

label:: Summary
type:: basic:file:html
required:: True
disabled:: False
hidden:: False

snpEff (General variant annotation) (single-sample)

data:variants:vcf:snpeff:single:snpeff-single (data:variants:vcf variants, basic:string database, data:variants:vcf dbsnp, basic:string filtering_options, list:data:geneset sets, list:basic:string extract_fields, basic:boolean one_per_line)[Source: v1.0.1]

Annotate variants with SnpEff. SnpEff is a variant annotation and effect prediction tool. It annotates and predicts the effects of genetic variants (such as amino acid changes). This process also allows filtering of variants with the ``SnpSift filter`` command and extracting specific fields from the VCF file with the ``SnpSift extractFields`` command. This tool takes a single-sample VCF file as input.

variants

label:: Variants (VCF)
type:: data:variants:vcf
required:: True
disabled:: False
hidden:: False

database

label:: snpEff database
type:: basic:string
required:: True
disabled:: False
hidden:: False
default:: GRCh38.99
choices::
GRCh37.75: GRCh37.75
GRCh38.99: GRCh38.99

dbsnp

label:: Known variants
type:: data:variants:vcf
description:: List of known variants for annotation.
required:: False
disabled:: False
hidden:: False

filtering_options

label:: Filtering expressions
type:: basic:string
description:: Filter the VCF file using arbitrary expressions. Examples of filtering expressions: `(ANN[*].GENE = 'PSD3')`, `( REF = 'A' )` or `(countHom() > 3) | (( exists INDEL ) & (QUAL >= 20)) | (QUAL >= 30 )`. For more information, check out the official documentation of [SnpSift](https://pcingola.github.io/SnpEff/ss_filter/).
required:: False
disabled:: False
hidden:: False

sets

label:: Files with list of genes
type:: list:data:geneset
description:: Use lists of genes if you only want variants reported for those genes. Each file must contain one gene per line.
required:: False
disabled:: False
hidden:: !filtering_options

extract_fields

label:: Fields to extract
type:: list:basic:string
description:: Enter the fields you want to extract from the annotated VCF file, pressing Enter after each one. Example of fields: `CHROM POS REF ALT 'ANN[*].GENE'`. For more information, follow this [link](https://pcingola.github.io/SnpEff/ss_extractfields/).
required:: False
disabled:: False
hidden:: False

advanced.one_per_line

label:: One effect per line
type:: basic:boolean
description:: If there is more than one effect per variant, write them to separate lines.
required:: True
disabled:: False
hidden:: False
default:: False

vcf

label:: Annotated variants (VCF)
type:: basic:file
required:: True
disabled:: False
hidden:: False

tbi

label:: Index of annotated variants
type:: basic:file
required:: True
disabled:: False
hidden:: False

vcf_extracted

label:: Extracted annotated variants (VCF)
type:: basic:file
required:: False
disabled:: False
hidden:: False

tbi_extracted

label:: Index of extracted variants
type:: basic:file
required:: False
disabled:: False
hidden:: False

species

label:: Species
type:: basic:string
required:: True
disabled:: False
hidden:: False

build

label:: Build
type:: basic:string
required:: True
disabled:: False
hidden:: False

genes

label:: SnpEff genes
type:: basic:file
required:: True
disabled:: False
hidden:: False

summary

label:: Summary
type:: basic:file:html
required:: True
disabled:: False
hidden:: False