Process definitions
ATAC-Seq
- data:workflow:atacseqworkflow-atac-seq (data:reads:fastq reads, data:index:bowtie2 genome, data:bed promoter, basic:string mode, basic:string speed, basic:boolean use_se, basic:boolean discordantly, basic:boolean rep_se, basic:integer minins, basic:integer maxins, basic:integer trim_5, basic:integer trim_3, basic:integer trim_iter, basic:integer trim_nucl, basic:string rep_mode, basic:integer k_reports, basic:integer q_threshold, basic:integer n_sub, basic:boolean tn5, basic:integer shift, basic:boolean tagalign, basic:string duplicates, basic:string duplicates_prepeak, basic:decimal qvalue, basic:decimal pvalue, basic:decimal pvalue_prepeak, basic:integer cap_num, basic:integer mfold_lower, basic:integer mfold_upper, basic:integer slocal, basic:integer llocal, basic:integer extsize, basic:integer shift, basic:integer band_width, basic:boolean nolambda, basic:boolean fix_bimodal, basic:boolean nomodel, basic:boolean nomodel_prepeak, basic:boolean down_sample, basic:boolean bedgraph, basic:boolean spmr, basic:boolean call_summits, basic:boolean broad, basic:decimal broad_cutoff)[Source: v3.1.1]
This ATAC-seq pipeline closely follows the official ENCODE DCC pipeline. It consists of three steps: alignment, pre-peakcall QC, and peak calling (with post-peakcall QC). First, reads are aligned to a genome using the [Bowtie2](http://bowtie-bio.sourceforge.net/index.shtml) aligner. Next, pre-peakcall QC metrics are calculated. The QC report contains the QC metrics proposed for ENCODE 3: [NRF](https://www.encodeproject.org/data-standards/terms/), [PBC bottlenecking coefficients, NSC, and RSC](https://genome.ucsc.edu/ENCODE/qualityMetrics.html#chipSeq). Finally, peaks are called using [MACS2](https://github.com/taoliu/MACS/). The post-peakcall QC report includes additional QC metrics: number of peaks, fraction of reads in peaks (FRiP), number of reads in peaks, and, if a promoter regions BED file is provided, number of reads in promoter regions, fraction of reads in promoter regions, number of peaks in promoter regions, and fraction of peaks in promoter regions.
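The library-complexity metrics named in the description can be illustrated with a minimal sketch. It assumes the ENCODE definitions (NRF = distinct positions / total reads, PBC1 = positions with exactly one read / distinct positions, PBC2 = positions with exactly one read / positions with exactly two reads). The `library_complexity` helper and the tuple layout of the input are hypothetical; this is not the workflow's actual implementation.

```python
from collections import Counter


def library_complexity(read_positions):
    """Compute NRF, PBC1 and PBC2 from mapped read positions.

    read_positions: iterable of hashable keys such as (chrom, start, strand),
    one per uniquely mapped read (hypothetical input layout).
    """
    counts = Counter(read_positions)
    total_reads = sum(counts.values())
    distinct = len(counts)                          # positions with >= 1 read
    m1 = sum(1 for n in counts.values() if n == 1)  # positions with exactly 1 read
    m2 = sum(1 for n in counts.values() if n == 2)  # positions with exactly 2 reads

    nrf = distinct / total_reads if total_reads else float("nan")
    pbc1 = m1 / distinct if distinct else float("nan")
    pbc2 = m1 / m2 if m2 else float("inf")          # undefined when m2 == 0
    return {"NRF": nrf, "PBC1": pbc1, "PBC2": pbc2}


reads = [
    ("chr1", 100, "+"),
    ("chr1", 100, "+"),  # duplicate of the previous read
    ("chr1", 250, "-"),
    ("chr2", 40, "+"),
]
metrics = library_complexity(reads)
print(metrics)
```

Values close to 1 for NRF and PBC1 indicate a complex library; heavy duplication pushes both toward 0.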
Input arguments
- label:
Select sample(s)
- type:
data:reads:fastq
- label:
Genome
- type:
data:index:bowtie2
- label:
Promoter regions BED file
- type:
data:bed
- description:
BED file containing promoter regions (for example, TSS ± 1000 bp). Needed to get the number of peaks and reads mapped to promoter regions.
- required:
False
- label:
Alignment mode
- type:
basic:string
- description:
End to end: Bowtie 2 requires that the entire read align from one end to the other, without any trimming (or “soft clipping”) of characters from either end. Local: Bowtie 2 does not require that the entire read align from one end to the other. Rather, some characters may be omitted (“soft clipped”) from the ends in order to achieve the greatest possible alignment score.
- default:
--local
- choices:
end to end mode:
--end-to-end
local:
--local
- label:
Speed vs. Sensitivity
- type:
basic:string
- default:
--sensitive
- choices:
Very fast:
--very-fast
Fast:
--fast
Sensitive:
--sensitive
Very sensitive:
--very-sensitive
- label:
Map as single-ended (for paired-end reads only)
- type:
basic:boolean
- description:
If this option is selected, paired-end reads will be mapped as single-ended and other paired-end options will be ignored.
- default:
False
- label:
Report discordantly matched reads
- type:
basic:boolean
- description:
If both mates have unique alignments, but the alignments do not match paired-end expectations (orientation and relative distance), the alignment will still be reported. Useful for detecting structural variations.
- default:
True
- label:
Report single ended
- type:
basic:boolean
- description:
If a paired alignment cannot be found, Bowtie2 tries to find alignments for the individual mates.
- default:
True
- label:
Minimal distance
- type:
basic:integer
- description:
The minimum fragment length for valid paired-end alignments. 0 imposes no minimum.
- default:
0
- label:
Maximal distance
- type:
basic:integer
- description:
The maximum fragment length for valid paired-end alignments.
- default:
2000
- label:
Bases to trim from 5’
- type:
basic:integer
- description:
Number of bases to trim from the 5’ (left) end of each read before alignment.
- default:
0
- label:
Bases to trim from 3’
- type:
basic:integer
- description:
Number of bases to trim from the 3’ (right) end of each read before alignment.
- default:
0
- label:
Iterations
- type:
basic:integer
- description:
Number of iterations.
- default:
0
- label:
Bases to trim
- type:
basic:integer
- description:
Number of bases to trim from 3’ end in each iteration.
- default:
2
- label:
Report mode
- type:
basic:string
- description:
Default mode: search for multiple alignments, report the best one; -k mode: search for one or more alignments, report each; -a mode: search for and report all alignments
- default:
def
- choices:
Default mode:
def
-k mode:
k
-a mode (very slow):
a
- label:
Number of reports (for -k mode only)
- type:
basic:integer
- description:
Searches for at most X distinct, valid alignments for each read. The search terminates when it can’t find more distinct valid alignments, or when it finds X, whichever happens first.
- default:
5
- label:
Quality filtering threshold
- type:
basic:integer
- default:
30
- label:
Number of reads to subsample
- type:
basic:integer
- default:
25000000
- label:
Tn5 shifting
- type:
basic:boolean
- description:
Tn5 transposon shifting. Shift reads on “+” strand by 4 bp and reads on “-” strand by 5 bp.
- default:
True
- label:
User-defined cross-correlation peak strandshift
- type:
basic:integer
- description:
If defined, the SPP tool will not try to estimate the fragment length but will use the given value as the fragment length.
- default:
0
- label:
Use tagAlign files
- type:
basic:boolean
- description:
Use filtered tagAlign files as case (treatment) and control (background) samples. If extsize parameter is not set, run MACS using input’s estimated fragment length.
- default:
True
- label:
Number of duplicates
- type:
basic:string
- description:
It controls the MACS behavior towards duplicate tags at the exact same location: the same coordinates and the same strand. The ‘auto’ option makes MACS calculate the maximum number of tags at the exact same location based on the binomial distribution using 1e-5 as the p-value cutoff, and the ‘all’ option keeps all the tags. If an integer is given, at most this number of tags will be kept at the same location. The default is to keep one tag at the same location.
- required:
False
- hidden:
settings.tagalign
- choices:
1:
1
auto:
auto
all:
all
- label:
Number of duplicates
- type:
basic:string
- description:
It controls the MACS behavior towards duplicate tags at the exact same location: the same coordinates and the same strand. The ‘auto’ option makes MACS calculate the maximum number of tags at the exact same location based on the binomial distribution using 1e-5 as the p-value cutoff, and the ‘all’ option keeps all the tags. If an integer is given, at most this number of tags will be kept at the same location. The default is to keep one tag at the same location.
- required:
False
- hidden:
!settings.tagalign
- default:
all
- choices:
1:
1
auto:
auto
all:
all
- label:
Q-value cutoff
- type:
basic:decimal
- description:
The q-value (minimum FDR) cutoff to call significant regions. Q-values are calculated from p-values using the Benjamini-Hochberg procedure.
- required:
False
- disabled:
settings.pvalue && settings.pvalue_prepeak
- label:
P-value cutoff
- type:
basic:decimal
- description:
The p-value cutoff. If specified, MACS2 will use p-value instead of q-value cutoff.
- required:
False
- disabled:
settings.qvalue
- hidden:
settings.tagalign
- label:
P-value cutoff
- type:
basic:decimal
- description:
The p-value cutoff. If specified, MACS2 will use p-value instead of q-value cutoff.
- disabled:
settings.qvalue
- hidden:
!settings.tagalign || settings.qvalue
- default:
0.01
- label:
Cap number of peaks by taking top N peaks
- type:
basic:integer
- description:
To keep all peaks set value to 0.
- disabled:
settings.broad
- default:
300000
- label:
MFOLD range (lower limit)
- type:
basic:integer
- description:
This parameter is used to select the regions within the MFOLD range of high-confidence enrichment ratio against background to build the model. The regions must be lower than the upper limit and higher than the lower limit of fold enrichment. DEFAULT:10,30 means using all regions not too low (>10) and not too high (<30) to build the paired-peaks model. If MACS cannot find more than 100 regions to build the model, it will use the --extsize parameter to continue the peak detection ONLY if --fix-bimodal is set.
- required:
False
- label:
MFOLD range (upper limit)
- type:
basic:integer
- description:
This parameter is used to select the regions within the MFOLD range of high-confidence enrichment ratio against background to build the model. The regions must be lower than the upper limit and higher than the lower limit of fold enrichment. DEFAULT:10,30 means using all regions not too low (>10) and not too high (<30) to build the paired-peaks model. If MACS cannot find more than 100 regions to build the model, it will use the --extsize parameter to continue the peak detection ONLY if --fix-bimodal is set.
- required:
False
- label:
Small local region
- type:
basic:integer
- description:
Slocal and llocal parameters control which two levels of regions will be checked around the peak regions to calculate the maximum lambda as local lambda. By default, MACS considers 1000 bp for small local region (--slocal), and 10000 bp for large local region (--llocal) which captures the bias from a long range effect like an open chromatin domain. You can tweak these according to your project. Remember that if the region is set too small, a sharp spike in the input data may kill the significant peak.
- required:
False
- label:
Large local region
- type:
basic:integer
- description:
Slocal and llocal parameters control which two levels of regions will be checked around the peak regions to calculate the maximum lambda as local lambda. By default, MACS considers 1000 bp for small local region (--slocal), and 10000 bp for large local region (--llocal) which captures the bias from a long range effect like an open chromatin domain. You can tweak these according to your project. Remember that if the region is set too small, a sharp spike in the input data may kill the significant peak.
- required:
False
- label:
extsize
- type:
basic:integer
- description:
When ‘--nomodel’ is set, MACS uses this parameter to extend reads in the 5’->3’ direction to fixed-size fragments. For example, if the size of the binding region for your transcription factor is 200 bp, and you want to bypass the model building by MACS, this parameter can be set to 200. This option is only valid when --nomodel is set or when MACS fails to build the model and --fix-bimodal is on.
- default:
150
- label:
Shift
- type:
basic:integer
- description:
Note, this is NOT the legacy --shiftsize option, which is replaced by --extsize! You can set an arbitrary shift in bp here. Please use discretion when setting it to a value other than the MACS default (0). When --nomodel is set, MACS will use this value to move cutting ends (5’) and then apply --extsize in the 5’ to 3’ direction to extend them to fragments. When this value is negative, ends will be moved in the 3’->5’ direction, otherwise in the 5’->3’ direction. It is recommended to keep it at 0 for ChIP-Seq datasets, or to use -1 * half of EXTSIZE together with the --extsize option for detecting enriched cutting loci, such as in certain DNAseI-Seq datasets. Note that you can’t set values other than 0 if the format is BAMPE for paired-end data. The MACS default is 0.
- default:
-75
- label:
Band width
- type:
basic:integer
- description:
The band width which is used to scan the genome ONLY for model building. You can set this parameter as the sonication fragment size expected from wet experiment. The previous side effect on the peak detection process has been removed. So this parameter only affects the model building.
- required:
False
- label:
Use background lambda as local lambda
- type:
basic:boolean
- description:
With this flag on, MACS will use the background lambda as local lambda. This means MACS will not consider the local bias at peak candidate regions.
- default:
False
- label:
Turn on the auto paired-peak model process
- type:
basic:boolean
- description:
Turn on the auto paired-peak model process. If set, when MACS fails to build the paired model, it will use the nomodel settings and the ‘--extsize’ parameter to extend each tag. If not set, MACS will be terminated if the paired-peak model fails.
- default:
False
- label:
Bypass building the shifting model
- type:
basic:boolean
- description:
While on, MACS will bypass building the shifting model.
- hidden:
settings.tagalign
- default:
False
- label:
Bypass building the shifting model
- type:
basic:boolean
- description:
While on, MACS will bypass building the shifting model.
- hidden:
!settings.tagalign
- default:
True
- label:
Down-sample
- type:
basic:boolean
- description:
When set to true, a random sampling method scales down the larger sample. By default, MACS uses linear scaling. This option makes the results unstable and irreproducible because different reads are randomly selected on each run; in particular, the numbers (pileup, p-value, q-value) will change.
- default:
False
- label:
Save fragment pileup and control lambda
- type:
basic:boolean
- description:
If this flag is on, MACS will store the fragment pileup, control lambda, -log10pvalue and -log10qvalue scores in bedGraph files. The bedGraph files will be stored in the current directory, named NAME+’_treat_pileup.bdg’ for treatment data, NAME+’_control_lambda.bdg’ for local lambda values from control, NAME+’_treat_pvalue.bdg’ for Poisson p-value scores (in -log10(pvalue) form), and NAME+’_treat_qvalue.bdg’ for q-value scores from the Benjamini-Hochberg-Yekutieli procedure.
- default:
True
- label:
Save signal per million reads for fragment pileup profiles
- type:
basic:boolean
- disabled:
settings.bedgraph === false
- default:
True
- label:
Call summits
- type:
basic:boolean
- description:
MACS will reanalyze the shape of the signal profile (p or q-score depending on the cutoff setting) to deconvolve subpeaks within each peak called from the general procedure. It’s highly recommended for detecting adjacent binding events. When used, the output subpeaks of a big peak region will have the same peak boundaries, and different scores and peak summit positions.
- default:
True
- label:
Composite broad regions
- type:
basic:boolean
- description:
When this flag is on, MACS will try to composite broad regions in BED12 (a gene-model-like format) by putting nearby highly enriched regions into a broad region with a loose cutoff. The broad region is controlled by another cutoff through --broad-cutoff. The maximum length of a broad region is 4 times d from MACS.
- disabled:
settings.call_summits === true
- default:
False
- label:
Broad cutoff
- type:
basic:decimal
- description:
Cutoff for broad region. This option is not available unless --broad is set. If -p is set, this is a p-value cutoff, otherwise, it’s a q-value cutoff. DEFAULT = 0.1
- required:
False
- disabled:
settings.call_summits === true || settings.broad !== true
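The ‘Tn5 shifting’ option listed among the input arguments can be sketched as follows. The function name and the tuple layout are hypothetical, and the direction of the “-” strand shift (toward lower coordinates) follows the common ATAC-seq convention rather than anything stated in the option text. The sketch moves both interval coordinates; some implementations adjust only the 5’ end of each read, so treat this as an illustration, not the workflow's actual code.

```python
def tn5_shift(chrom, start, end, strand, plus_shift=4, minus_shift=5):
    """Shift a read interval (0-based, half-open) to account for Tn5 insertion.

    "+" strand reads move plus_shift bp downstream, "-" strand reads move
    minus_shift bp toward lower coordinates (assumed convention).
    """
    if strand == "+":
        start, end = start + plus_shift, end + plus_shift
    elif strand == "-":
        start, end = start - minus_shift, end - minus_shift
    else:
        raise ValueError(f"unknown strand: {strand!r}")
    start = max(start, 0)   # never shift past the chromosome start
    end = max(end, start)
    return chrom, start, end, strand


reads = [("chr1", 100, 150, "+"), ("chr1", 300, 350, "-")]
shifted = [tn5_shift(*read) for read in reads]
print(shifted)
```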
Output results
Abstract alignment process
- data:alignmentabstract-alignment ()[Source: v1.0.1]
Input arguments
Output results
- label:
Alignment file
- type:
basic:file
- label:
Alignment index BAI
- type:
basic:file
- label:
Species
- type:
basic:string
- label:
Build
- type:
basic:string
Abstract annotation process
- data:annotationabstract-annotation ()[Source: v1.0.1]
Input arguments
Output results
- label:
Uploaded file
- type:
basic:file
- label:
Gene ID source
- type:
basic:string
- label:
Species
- type:
basic:string
- label:
Build
- type:
basic:string
Abstract bed process
- data:bedabstract-bed ()[Source: v1.0.2]
Input arguments
Output results
- label:
BED
- type:
basic:file
- label:
Species
- type:
basic:string
- label:
Build
- type:
basic:string
Abstract differential expression process
- data:differentialexpressionabstract-differentialexpression ()[Source: v1.0.1]
Input arguments
Output results
- label:
Differential expression (gene level)
- type:
basic:file
- label:
Results table (JSON)
- type:
basic:json
- label:
Results table (file)
- type:
basic:file
- label:
Gene ID source
- type:
basic:string
- label:
Species
- type:
basic:string
- label:
Build
- type:
basic:string
- label:
Feature type
- type:
basic:string
Abstract expression process
- data:expressionabstract-expression ()[Source: v1.0.1]
Input arguments
Output results
- label:
Normalized expression
- type:
basic:file
- label:
Read counts
- type:
basic:file
- required:
False
- label:
Expression (json)
- type:
basic:json
- label:
Expression type
- type:
basic:string
- label:
Gene ID source
- type:
basic:string
- label:
Species
- type:
basic:string
- label:
Build
- type:
basic:string
- label:
Feature type
- type:
basic:string
Annotate novel splice junctions (regtools)
- data:junctions:regtoolsregtools-junctions-annotate (data:seq:nucleotide genome, data:annotation:gtf annotation, data:alignment:bam:star alignment_star, data:alignment:bam alignment, data:bed input_bed_junctions)[Source: v1.3.1]
Identify novel splice junctions by using regtools to annotate against a reference. The process accepts a reference genome, a reference genome annotation (GTF), and an input with read information (a STAR alignment, reads aligned by any other aligner, or junctions in BED12 format). If STAR aligner data is given as input, the process calculates a BED12 file from the STAR ‘SJ.out.tab’ file and annotates all junctions with the ‘regtools junctions annotate’ command. When reads are aligned by another aligner, junctions are extracted with the ‘regtools junctions extract’ tool and then annotated with the ‘regtools junctions annotate’ command. The third option allows the user to provide a BED12 file with junctions directly, which are then annotated. Finally, annotated novel junctions are filtered into a separate output file. More information can be found in the [regtools manual](https://regtools.readthedocs.io/en/latest/).
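The conversion from a STAR ‘SJ.out.tab’ row to a BED12 junction record might look like the sketch below. The source does not state how the process derives the flanking anchors, so using the maximum spliced alignment overhang (column 9) as the anchor size, the record name, and the colour value are all assumptions; `sj_to_bed12` is a hypothetical helper, not the process's code.

```python
def sj_to_bed12(line, name="junction"):
    """Convert one STAR SJ.out.tab row into a BED12 junction line.

    SJ.out.tab columns used here: chromosome, first intron base (1-based),
    last intron base (1-based), strand (0/1/2), unique read count (column 7),
    maximum spliced alignment overhang (column 9).
    """
    fields = line.rstrip("\n").split("\t")
    chrom = fields[0]
    intron_start = int(fields[1])
    intron_end = int(fields[2])
    strand = {"0": ".", "1": "+", "2": "-"}[fields[3]]
    unique_reads = int(fields[6])
    overhang = int(fields[8])

    # BED is 0-based, half-open; the two blocks are the anchors flanking the intron.
    left_size = min(overhang, intron_start - 1)   # do not run past position 0
    chrom_start = intron_start - 1 - left_size
    chrom_end = intron_end + overhang             # not clipped to chromosome length
    block_sizes = f"{left_size},{overhang}"
    block_starts = f"0,{intron_end - chrom_start}"

    record = [chrom, chrom_start, chrom_end, name, unique_reads, strand,
              chrom_start, chrom_end, "255,0,0", 2, block_sizes, block_starts]
    return "\t".join(str(value) for value in record)


row = "chr1\t1001\t2000\t1\t1\t1\t12\t0\t20"
bed_line = sj_to_bed12(row)
print(bed_line)
```

In this layout the intron is the gap between the two blocks, which is the junction representation that BED12 junction files conventionally use.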
Input arguments
- label:
Reference genome
- type:
data:seq:nucleotide
- label:
Reference genome annotation (GTF)
- type:
data:annotation:gtf
- label:
STAR alignment
- type:
data:alignment:bam:star
- description:
Splice junctions detected by STAR aligner (SJ.out.tab STAR output file). Provide one of the following inputs: ‘STAR alignment’, ‘Alignment’ (from any aligner), or ‘Junctions in BED12 format’.
- required:
False
- label:
Alignment
- type:
data:alignment:bam
- description:
Aligned reads from which splice junctions are going to be extracted. Provide one of the following inputs: ‘STAR alignment’, ‘Alignment’ (from any aligner), or ‘Junctions in BED12 format’.
- required:
False
- label:
Junctions in BED12 format
- type:
data:bed
- description:
Splice junctions in BED12 format. Provide one of the following inputs: ‘STAR alignment’, ‘Alignment’ (from any aligner), or ‘Junctions in BED12 format’.
- required:
False
Output results
- label:
Table of annotated novel splice junctions
- type:
basic:file
- label:
Table of annotated splice junctions
- type:
basic:file
- label:
Novel splice junctions in BED format
- type:
basic:file
- label:
Splice junctions in BED format
- type:
basic:file
- label:
Novel splice junctions in BigBed format
- type:
basic:file
- required:
False
- label:
Splice junctions in BigBed format
- type:
basic:file
- required:
False
- label:
Novel splice junctions bed tbi index for JBrowse
- type:
basic:file
- label:
Bed tbi index for JBrowse
- type:
basic:file
- label:
Species
- type:
basic:string
- label:
Build
- type:
basic:string
Archive samples
- data:archive:samplesarchive-samples (list:data data, list:basic:string fields, basic:boolean j)[Source: v0.5.2]
Create an archive of output files. The output folder structure is organized by sample slug and data object’s output-field names.
Input arguments
- label:
Data list
- type:
list:data
- label:
Output file fields
- type:
list:basic:string
- label:
Junk paths
- type:
basic:boolean
- description:
Store just names of saved files (junk the path)
- default:
False
Output results
- label:
Archive
- type:
basic:file
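As an illustration of the folder structure and the ‘Junk paths’ option described for this process, the sketch below builds the name a file could get inside the archive. The exact layout (`<sample slug>/<output field>/<file name>`) and the `archive_member_name` helper are assumptions made for illustration; the source only states that the structure is organized by sample slug and output-field names.

```python
import posixpath


def archive_member_name(sample_slug, field_name, file_path, junk_paths=False):
    """Return the path of a file inside the archive (hypothetical layout)."""
    file_name = posixpath.basename(file_path)
    if junk_paths:
        # Junk the path: store only the file name.
        return file_name
    return posixpath.join(sample_slug, field_name, file_name)


nested = archive_member_name("sample-1", "bam", "data/123/reads.bam")
flat = archive_member_name("sample-1", "bam", "data/123/reads.bam", junk_paths=True)
print(nested)
print(flat)
```

Note that junking paths places all files at the top level of the archive, so files with identical names from different samples would collide.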
BAM file
- data:alignment:bam:uploadupload-bam (basic:file src, basic:string species, basic:string build)[Source: v1.8.0]
Import a BAM file (.bam), which is the binary format for storing sequence alignment data. This format is described on the [SAM Tools web site](http://samtools.github.io/hts-specs/).
Input arguments
- label:
Mapping (BAM)
- type:
basic:file
- description:
A mapping file in BAM format. The file will be indexed on upload, so additional BAI files are not required.
- validate_regex:
\.(bam)$
- label:
Species
- type:
basic:string
- description:
Species Latin name.
- choices:
Homo sapiens:
Homo sapiens
Mus musculus:
Mus musculus
Rattus norvegicus:
Rattus norvegicus
Dictyostelium discoideum:
Dictyostelium discoideum
Odocoileus virginianus texanus:
Odocoileus virginianus texanus
Solanum tuberosum:
Solanum tuberosum
- label:
Build
- type:
basic:string
Output results
- label:
Uploaded file
- type:
basic:file
- label:
Index BAI
- type:
basic:file
- label:
Alignment statistics
- type:
basic:file
- label:
Species
- type:
basic:string
- label:
Build
- type:
basic:string
BAM file and index
- data:alignment:bam:uploadupload-bam-indexed (basic:file src, basic:file src2, basic:string species, basic:string build)[Source: v1.8.0]
Import a BAM file (.bam) and BAM index (.bam.bai). BAM file is the binary format for storing sequence alignment data. This format is described on the [SAM Tools web site](http://samtools.github.io/hts-specs/).
Input arguments
- label:
Mapping (BAM)
- type:
basic:file
- description:
A mapping file in BAM format.
- validate_regex:
\.(bam)$
- label:
bam index (*.bam.bai file)
- type:
basic:file
- description:
An index file of a BAM mapping file (ending with bam.bai).
- validate_regex:
\.(bam.bai)$
- label:
Species
- type:
basic:string
- description:
Species Latin name.
- choices:
Homo sapiens:
Homo sapiens
Mus musculus:
Mus musculus
Rattus norvegicus:
Rattus norvegicus
Dictyostelium discoideum:
Dictyostelium discoideum
Odocoileus virginianus texanus:
Odocoileus virginianus texanus
Solanum tuberosum:
Solanum tuberosum
- label:
Build
- type:
basic:string
Output results
- label:
Uploaded file
- type:
basic:file
- label:
Index BAI
- type:
basic:file
- label:
Alignment statistics
- type:
basic:file
- label:
Species
- type:
basic:string
- label:
Build
- type:
basic:string
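The `validate_regex` patterns listed for the two BAM upload processes can be tried out with a short sketch. It assumes the patterns are applied with search semantics to the file name (how the platform applies them is not stated here), and `is_valid` is a hypothetical helper.

```python
import re

# Patterns copied verbatim from the validate_regex fields above.
BAM_RE = re.compile(r"\.(bam)$")
BAI_RE = re.compile(r"\.(bam.bai)$")


def is_valid(pattern, file_name):
    """Return True if the file name satisfies the validation pattern."""
    return pattern.search(file_name) is not None


checks = {
    "sample.bam as BAM": is_valid(BAM_RE, "sample.bam"),
    "sample.bam.bai as BAM": is_valid(BAM_RE, "sample.bam.bai"),
    "sample.bam.bai as BAI": is_valid(BAI_RE, "sample.bam.bai"),
    "sample.sam as BAM": is_valid(BAM_RE, "sample.sam"),
}
for label, ok in checks.items():
    print(f"{label}: {ok}")
```

Because the dot between `bam` and `bai` in the index pattern is unescaped, it matches any single character, so a name such as `sample.bamXbai` would also pass under these assumptions.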
BBDuk (paired-end)
- data:reads:fastq:paired:bbduk:bbduk-paired (data:reads:fastq:paired reads, basic:integer min_length, list:data:seq:nucleotide sequences, list:basic:string literal_sequences, basic:integer kmer_length, basic:boolean check_reverse_complements, basic:boolean mask_middle_base, basic:integer min_kmer_hits, basic:decimal min_kmer_fraction, basic:decimal min_coverage_fraction, basic:integer hamming_distance, basic:integer query_hamming_distance, basic:integer edit_distance, basic:integer hamming_distance2, basic:integer query_hamming_distance2, basic:integer edit_distance2, basic:boolean forbid_N, basic:boolean find_best_match, basic:boolean remove_if_either_bad, basic:boolean perform_error_correction, basic:string k_trim, basic:string k_mask, basic:boolean mask_fully_covered, basic:integer min_k, basic:string quality_trim, basic:integer trim_quality, basic:string quality_encoding_offset, basic:boolean ignore_bad_quality, basic:integer trim_poly_A, basic:decimal min_length_fraction, basic:integer max_length, basic:integer min_average_quality, basic:integer min_average_quality_bases, basic:integer min_base_quality, basic:integer min_consecutive_bases, basic:integer trim_pad, basic:boolean trim_by_overlap, basic:boolean strict_overlap, basic:integer min_overlap, basic:integer min_insert, basic:boolean trim_pairs_evenly, basic:integer force_trim_left, basic:integer force_trim_right, basic:integer force_trim_right2, basic:integer force_trim_mod, basic:integer restrict_left, basic:integer restrict_right, basic:decimal min_GC, basic:decimal max_GC, basic:integer maxns, basic:boolean toss_junk, basic:boolean chastity_filter, basic:boolean barcode_filter, list:data:seq:nucleotide barcode_files, list:basic:string barcode_sequences, basic:integer x_min, basic:integer y_min, basic:integer x_max, basic:integer y_max, basic:decimal entropy, basic:integer entropy_window, basic:integer entropy_k, basic:boolean entropy_mask, basic:integer min_base_frequency, basic:boolean 
nogroup)[Source: v3.1.2]
Run BBDuk on paired-end reads. BBDuk combines the most common data-quality-related trimming, filtering, and masking operations into a single high-performance tool. It is capable of quality-trimming and filtering, adapter-trimming, contaminant-filtering via kmer matching, sequence masking, GC-filtering, length filtering, entropy-filtering, format conversion, histogram generation, subsampling, quality-score recalibration, kmer cardinality estimation, and various other operations in a single pass. See [here](https://jgi.doe.gov/data-and-tools/bbtools/bb-tools-user-guide/bbduk-guide/) for more information.
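The idea behind contaminant filtering via kmer matching can be shown with a minimal sketch. It mirrors the ‘Kmer length’, ‘Check reverse complements’, and ‘Minimum number of kmer hits’ options listed under the input arguments, but it uses exact matching only (no Hamming or edit distance, no middle-base masking), and all function names, the example adapter, and the short `k=8` are illustrative assumptions. It is not BBDuk's implementation.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(str.maketrans("ACGTN", "TGCAN"))[::-1]


def kmers(seq, k):
    """Yield all overlapping kmers of length k."""
    return (seq[i:i + k] for i in range(len(seq) - k + 1))


def build_reference_kmers(references, k, check_reverse_complements=True):
    """Collect the kmers of all reference sequences into a set."""
    ref = set()
    for seq in references:
        seq = seq.upper()
        ref.update(kmers(seq, k))
        if check_reverse_complements:
            ref.update(kmers(reverse_complement(seq), k))
    return ref


def matches_reference(read, ref_kmers, k, min_kmer_hits=1):
    """A read matches if at least min_kmer_hits of its kmers are in the reference set."""
    hits = sum(1 for kmer in kmers(read.upper(), k) if kmer in ref_kmers)
    return hits >= min_kmer_hits


adapter = "AGATCGGAAGAGCACACGTCTGAACTCCAGTCA"  # example adapter sequence
ref_kmers = build_reference_kmers([adapter], k=8)

reads = ["TTTTAGATCGGAAGAGCTTTT", "ACGTACGTACGTACGTACGT"]
kept = [read for read in reads if not matches_reference(read, ref_kmers, k=8)]
print(kept)
```

A reference shorter than the kmer length contributes no kmers in this sketch, which is consistent with the note that contaminants shorter than the kmer length will not be found.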
Input arguments
- label:
Reads
- type:
data:reads:fastq:paired
- required:
True
- disabled:
False
- hidden:
False
- label:
Minimum length
- type:
basic:integer
- description:
Reads shorter than the minimum length will be discarded after trimming.
- required:
True
- disabled:
False
- hidden:
False
- default:
10
- label:
Sequences
- type:
list:data:seq:nucleotide
- description:
Reference sequences include adapters, contaminants, and degenerate sequences. They can be provided in a multi-sequence FASTA file or as a set of literal sequences below.
- required:
False
- disabled:
False
- hidden:
False
- label:
Literal sequences
- type:
list:basic:string
- description:
Literal sequences can be specified by inputting them one by one and pressing Enter after each sequence.
- required:
False
- disabled:
False
- hidden:
False
- default:
[]
- label:
Kmer length
- type:
basic:integer
- description:
Kmer length used for finding contaminants. Contaminants shorter than kmer length will not be found. Kmer length must be at least 1.
- required:
True
- disabled:
False
- hidden:
False
- default:
27
- label:
Check reverse complements
- type:
basic:boolean
- description:
Look for reverse complements of kmers in addition to forward kmers.
- required:
True
- disabled:
False
- hidden:
False
- default:
True
- label:
Mask the middle base of a kmer
- type:
basic:boolean
- description:
Treat the middle base of a kmer as a wildcard to increase sensitivity in the presence of errors.
- required:
True
- disabled:
False
- hidden:
False
- default:
True
- label:
Minimum number of kmer hits
- type:
basic:integer
- description:
Reads need at least this many matching kmers to be considered matching the reference.
- required:
True
- disabled:
False
- hidden:
False
- default:
1
- label:
Minimum kmer fraction
- type:
basic:decimal
- description:
A read needs at least this fraction of its total kmers to hit a reference in order to be considered a match. If this and ‘Minimum number of kmer hits’ are set, the greater is used.
- required:
True
- disabled:
False
- hidden:
False
- default:
0.0
- label:
Minimum coverage fraction
- type:
basic:decimal
- description:
A read needs at least this fraction of its total bases to be covered by reference kmers to be considered a match. If specified, ‘Minimum coverage fraction’ overrides ‘Minimum number of kmer hits’ and ‘Minimum kmer fraction’.
- required:
True
- disabled:
False
- hidden:
False
- default:
0.0
- label:
Maximum Hamming distance for kmers (substitutions only)
- type:
basic:integer
- description:
Hamming distance i.e. the number of mismatches allowed in the kmer.
- required:
True
- disabled:
False
- hidden:
False
- default:
0
- label:
Hamming distance for query kmers
- type:
basic:integer
- description:
Set a Hamming distance for query kmers instead of the read kmers. This makes the read processing much slower, but does not use additional memory.
- required:
True
- disabled:
False
- hidden:
False
- default:
0
- label:
Maximum edit distance from reference kmers (substitutions and indels)
- type:
basic:integer
- required:
True
- disabled:
False
- hidden:
False
- default:
0
- label:
Hamming distance for short kmers when looking for shorter kmers
- type:
basic:integer
- required:
True
- disabled:
False
- hidden:
False
- default:
0
- label:
Hamming distance for short query kmers when looking for shorter kmers
- type:
basic:integer
- required:
True
- disabled:
False
- hidden:
False
- default:
0
- label:
Maximum edit distance from short reference kmers (substitutions and indels) when looking for shorter kmers
- type:
basic:integer
- required:
True
- disabled:
False
- hidden:
False
- default:
0
- label:
Forbid matching of read kmers containing N
- type:
basic:boolean
- description:
By default, these will match a reference ‘A’ if ‘Maximum Hamming distance for kmers’ > 0 or ‘Maximum edit distance from reference kmers’ > 0, to increase sensitivity.
- required:
True
- disabled:
False
- hidden:
False
- default:
False
- label:
Find best match
- type:
basic:boolean
- description:
If there are multiple matches, associate the read with the sequence sharing the most kmers.
- required:
True
- disabled:
False
- hidden:
False
- default:
True
- label:
Remove both sequences of a paired-end read, if either of them is to be removed
- type:
basic:boolean
- required:
True
- disabled:
False
- hidden:
False
- default:
True
- label:
Perform error correction with BBMerge prior to kmer operations
- type:
basic:boolean
- required:
True
- disabled:
False
- hidden:
False
- default:
False
- label:
Trimming protocol to remove bases matching reference kmers from reads
- type:
basic:string
- required:
True
- disabled:
False
- hidden:
False
- default:
f
- choices:
Don’t trim:
f
Trim to the right:
r
Trim to the left:
l
- label:
Symbol to replace bases matching reference kmers
- type:
basic:string
- description:
Allows any non-whitespace character other than t or f. Processes short kmers on both ends.
- required:
True
- disabled:
False
- hidden:
False
- default:
f
- label:
Only mask bases that are fully covered by kmers
- type:
basic:boolean
- required:
True
- disabled:
False
- hidden:
False
- default:
False
- label:
Look for shorter kmers at read tips down to this length when k-trimming or masking
- type:
basic:integer
- description:
-1 means disabled. Enabling this will disable treating the middle base of a kmer as a wildcard to increase sensitivity in the presence of errors.
- required:
True
- disabled:
False
- hidden:
False
- default:
-1
- label:
Trimming protocol to remove bases with quality below the minimum average region quality from read ends
- type:
basic:string
- description:
Performed after looking for kmers. If enabled, also set ‘Average quality below which to trim region’.
- required:
True
- disabled:
False
- hidden:
False
- default:
f
- choices:
Trim neither end:
f
Trim both ends:
rl
Trim only right end:
r
Trim only left end:
l
Use sliding window:
w
- label:
Average quality below which to trim region
- type:
basic:integer
- description:
Set trimming protocol to enable this parameter.
- required:
True
- disabled:
operations.quality_trim === 'f'
- hidden:
False
- default:
6
- label:
Quality encoding offset
- type:
basic:string
- description:
Quality encoding offset for input FASTQ files.
- required:
True
- disabled:
False
- hidden:
False
- default:
auto
- choices:
Sanger / Illumina 1.8+ (33):
33
Illumina up to 1.3+, 1.5+ (64):
64
Auto:
auto
- label:
Don’t crash if quality values appear to be incorrect
- type:
basic:boolean
- required:
True
- disabled:
False
- hidden:
False
- default:
False
- label:
Minimum length of poly-A or poly-T tails to trim on either end of reads
- type:
basic:integer
- required:
True
- disabled:
False
- hidden:
False
- default:
0
- label:
Minimum length fraction
- type:
basic:decimal
- description:
Reads shorter than this fraction of original length after trimming will be discarded.
- required:
True
- disabled:
False
- hidden:
False
- default:
0.0
- label:
Maximum length
- type:
basic:integer
- description:
Reads longer than this after trimming will be discarded.
- required:
False
- disabled:
False
- hidden:
False
- label:
Minimum average quality
- type:
basic:integer
- description:
Reads with average quality (after trimming) below this will be discarded.
- required:
True
- disabled:
False
- hidden:
False
- default:
0
- label:
Number of initial bases to calculate minimum average quality from
- type:
basic:integer
- description:
If positive, calculate minimum average quality from this many initial bases.
- required:
True
- disabled:
False
- hidden:
False
- default:
0
- label:
Minimum base quality below which reads are discarded after trimming
- type:
basic:integer
- required:
True
- disabled:
False
- hidden:
False
- default:
0
- label:
Minimum number of consecutive called bases
- type:
basic:integer
- required:
True
- disabled:
False
- hidden:
False
- default:
0
- label:
Number of bases to trim around matching kmers
- type:
basic:integer
- required:
True
- disabled:
False
- hidden:
False
- default:
0
- label:
Trim adapters based on where paired-end reads overlap
- type:
basic:boolean
- required:
True
- disabled:
False
- hidden:
False
- default:
False
- label:
Adjust sensitivity in ‘Trim adapters based on where paired-end reads overlap’ mode
- type:
basic:boolean
- required:
True
- disabled:
False
- hidden:
False
- default:
True
- label:
Minimum number of overlapping bases
- type:
basic:integer
- description:
Require this many bases of overlap for detection.
- required:
True
- disabled:
False
- hidden:
False
- default:
14
- label:
Minimum insert size
- type:
basic:integer
- description:
Require insert size of at least this for overlap. Should be reduced to 16 for small RNA sequencing.
- required:
True
- disabled:
False
- hidden:
False
- default:
40
- label:
Trim both sequences of paired-end reads to the minimum length of either sequence
- type:
basic:boolean
- required:
True
- disabled:
False
- hidden:
False
- default:
False
- label:
Position from which to trim bases to the left
- type:
basic:integer
- required:
True
- disabled:
False
- hidden:
False
- default:
0
- label:
Position from which to trim bases to the right
- type:
basic:integer
- required:
True
- disabled:
False
- hidden:
False
- default:
0
- label:
Number of bases to trim from the right end
- type:
basic:integer
- required:
True
- disabled:
False
- hidden:
False
- default:
0
- label:
Modulo to right-trim reads
- type:
basic:integer
- description:
Trim reads to the largest multiple of modulo.
- required:
True
- disabled:
False
- hidden:
False
- default:
0
- label:
Number of leftmost bases to look in for kmer matches
- type:
basic:integer
- required:
True
- disabled:
False
- hidden:
False
- default:
0
- label:
Number of rightmost bases to look in for kmer matches
- type:
basic:integer
- required:
True
- disabled:
False
- hidden:
False
- default:
0
- label:
Minimum GC content
- type:
basic:decimal
- description:
Discard reads with lower GC content.
- required:
True
- disabled:
False
- hidden:
False
- default:
0.0
- label:
Maximum GC content
- type:
basic:decimal
- description:
Discard reads with higher GC content.
- required:
True
- disabled:
False
- hidden:
False
- default:
1.0
- label:
Max Ns after trimming
- type:
basic:integer
- description:
If non-negative, reads with more Ns than this (after trimming) will be discarded.
- required:
True
- disabled:
False
- hidden:
False
- default:
-1
- label:
Discard reads with invalid characters as bases
- type:
basic:boolean
- required:
True
- disabled:
False
- hidden:
False
- default:
False
- label:
Discard reads that fail Illumina chastity filtering
- type:
basic:boolean
- description:
Discard reads with id containing ‘ 1:Y:’ or ‘ 2:Y:’.
- required:
True
- disabled:
False
- hidden:
False
- default:
False
- label:
Remove reads with unexpected barcodes
- type:
basic:boolean
- description:
Remove reads with unexpected barcodes if barcodes are set, or barcodes containing ‘N’ otherwise. A barcode must be the last part of the read header.
- required:
True
- disabled:
False
- hidden:
False
- default:
False
- label:
Barcode sequences
- type:
list:data:seq:nucleotide
- description:
FASTA file(s) with barcode sequences.
- required:
False
- disabled:
False
- hidden:
False
- label:
Literal barcode sequences
- type:
list:basic:string
- description:
Literal barcode sequences can be specified by inputting them one by one and pressing Enter after each sequence.
- required:
False
- disabled:
False
- hidden:
False
- default:
[]
- label:
Minimum X coordinate
- type:
basic:integer
- description:
If positive, discard reads with a smaller X coordinate.
- required:
True
- disabled:
False
- hidden:
False
- default:
-1
- label:
Minimum Y coordinate
- type:
basic:integer
- description:
If positive, discard reads with a smaller Y coordinate.
- required:
True
- disabled:
False
- hidden:
False
- default:
-1
- label:
Maximum X coordinate
- type:
basic:integer
- description:
If positive, discard reads with a larger X coordinate.
- required:
True
- disabled:
False
- hidden:
False
- default:
-1
- label:
Maximum Y coordinate
- type:
basic:integer
- description:
If positive, discard reads with a larger Y coordinate.
- required:
True
- disabled:
False
- hidden:
False
- default:
-1
- label:
Minimum entropy
- type:
basic:decimal
- description:
Set between 0 and 1 to filter reads with entropy below that value. Higher is more stringent.
- required:
True
- disabled:
False
- hidden:
False
- default:
-1.0
- label:
Length of sliding window used to calculate entropy
- type:
basic:integer
- description:
To use the sliding window, set the minimum entropy to a value between 0.0 and 1.0.
- required:
True
- disabled:
False
- hidden:
False
- default:
50
- label:
Length of kmers used to calculate entropy
- type:
basic:integer
- required:
True
- disabled:
False
- hidden:
False
- default:
5
- label:
Mask low-entropy parts of sequences with N instead of discarding
- type:
basic:boolean
- required:
True
- disabled:
False
- hidden:
False
- default:
False
- label:
Minimum base frequency
- type:
basic:integer
- required:
True
- disabled:
False
- hidden:
False
- default:
0
- label:
Disable grouping of bases for reads >50bp
- type:
basic:boolean
- description:
All reports will show data for every base in the read. Using this option on very long reads will cause FastQC to crash.
- required:
True
- disabled:
False
- hidden:
False
- default:
False
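The quality-related inputs above (‘Quality encoding offset’, ‘Minimum average quality’, and ‘Number of initial bases to calculate minimum average quality from’) can be illustrated with a minimal Python sketch. It is not BBDuk's implementation: the function names are hypothetical, the offset guess is a crude heuristic, and using a plain arithmetic mean of Phred scores is an assumption.

```python
def decode_qualities(quality_string, offset=33):
    """Convert a FASTQ quality string to Phred scores using the given offset."""
    scores = [ord(ch) - offset for ch in quality_string]
    if any(score < 0 for score in scores):
        # A negative score usually means the wrong offset was chosen.
        raise ValueError("negative Phred score; wrong quality encoding offset?")
    return scores


def guess_offset(quality_string):
    """Crude heuristic: characters below ASCII 64 can only occur with offset 33."""
    return 33 if any(ord(ch) < 64 for ch in quality_string) else 64


def passes_min_average_quality(quality_string, min_average_quality,
                               offset=33, initial_bases=0):
    """Return True if the (assumed arithmetic) mean quality meets the threshold.

    If initial_bases is positive, only that many leading bases are averaged.
    """
    scores = decode_qualities(quality_string, offset)
    if initial_bases > 0:
        scores = scores[:initial_bases]
    if not scores:
        return False
    return sum(scores) / len(scores) >= min_average_quality
```

For example, `passes_min_average_quality("IIII!!!!", 30, initial_bases=4)` only averages the first four bases, so the low-quality tail does not affect the decision.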
Output results
- label:
Remaining upstream reads
- type:
list:basic:file
- required:
True
- disabled:
False
- hidden:
False
- label:
Remaining downstream reads
- type:
list:basic:file
- required:
True
- disabled:
False
- hidden:
False
- label:
Statistics
- type:
list:basic:file
- required:
True
- disabled:
False
- hidden:
False
- label:
Upstream quality control with FastQC
- type:
list:basic:file:html
- required:
True
- disabled:
False
- hidden:
False
- label:
Downstream quality control with FastQC
- type:
list:basic:file:html
- required:
True
- disabled:
False
- hidden:
False
- label:
Download upstream FastQC archive
- type:
list:basic:file
- required:
True
- disabled:
False
- hidden:
False
- label:
Download downstream FastQC archive
- type:
list:basic:file
- required:
True
- disabled:
False
- hidden:
False
BBDuk (single-end)
- data:reads:fastq:single:bbduk:bbduk-single (data:reads:fastq:single reads, basic:integer min_length, list:data:seq:nucleotide sequences, list:basic:string literal_sequences, basic:integer kmer_length, basic:boolean check_reverse_complements, basic:boolean mask_middle_base, basic:integer min_kmer_hits, basic:decimal min_kmer_fraction, basic:decimal min_coverage_fraction, basic:integer hamming_distance, basic:integer query_hamming_distance, basic:integer edit_distance, basic:integer hamming_distance2, basic:integer query_hamming_distance2, basic:integer edit_distance2, basic:boolean forbid_N, basic:boolean find_best_match, basic:string k_trim, basic:string k_mask, basic:boolean mask_fully_covered, basic:integer min_k, basic:string quality_trim, basic:integer trim_quality, basic:string quality_encoding_offset, basic:boolean ignore_bad_quality, basic:integer trim_poly_A, basic:decimal min_length_fraction, basic:integer max_length, basic:integer min_average_quality, basic:integer min_average_quality_bases, basic:integer min_base_quality, basic:integer min_consecutive_bases, basic:integer trim_pad, basic:integer min_overlap, basic:integer min_insert, basic:integer force_trim_left, basic:integer force_trim_right, basic:integer force_trim_right2, basic:integer force_trim_mod, basic:integer restrict_left, basic:integer restrict_right, basic:decimal min_GC, basic:decimal max_GC, basic:integer maxns, basic:boolean toss_junk, basic:boolean chastity_filter, basic:boolean barcode_filter, list:data:seq:nucleotide barcode_files, list:basic:string barcode_sequences, basic:integer x_min, basic:integer y_min, basic:integer x_max, basic:integer y_max, basic:decimal entropy, basic:integer entropy_window, basic:integer entropy_k, basic:boolean entropy_mask, basic:integer min_base_frequency, basic:boolean nogroup)[Source: v3.1.2]
Run BBDuk on single-end reads. BBDuk combines the most common data-quality-related trimming, filtering, and masking operations into a single high-performance tool. It is capable of quality-trimming and filtering, adapter-trimming, contaminant-filtering via kmer matching, sequence masking, GC-filtering, length filtering, entropy-filtering, format conversion, histogram generation, subsampling, quality-score recalibration, kmer cardinality estimation, and various other operations in a single pass. See [here](https://jgi.doe.gov/data-and-tools/bbtools/bb-tools-user-guide/bbduk-guide/) for more information.
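As a rough illustration of contaminant filtering via kmer matching, the sketch below builds a set of reference kmers and decides whether a read matches. It is a simplified model, not BBDuk's algorithm: the function names are hypothetical, middle-base masking and Hamming or edit distances are not modelled, and the threshold rule only mirrors the documented behaviour that the greater of `min_kmer_hits` and `min_kmer_fraction` applies.

```python
import math

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def reference_kmers(references, k, check_reverse_complements=True):
    """Collect all kmers of length k from the reference sequences."""
    kmers = set()
    for ref in references:
        ref = ref.upper()
        for i in range(len(ref) - k + 1):
            kmer = ref[i:i + k]
            kmers.add(kmer)
            if check_reverse_complements:
                kmers.add(reverse_complement(kmer))
    return kmers


def count_hits(read, kmers, k):
    """Count read kmers that occur in the reference kmer set."""
    read = read.upper()
    return sum(1 for i in range(len(read) - k + 1) if read[i:i + k] in kmers)


def matches_reference(read, kmers, k, min_kmer_hits=1, min_kmer_fraction=0.0):
    """Apply the greater of the absolute and fractional hit thresholds."""
    total = max(len(read) - k + 1, 0)
    if total == 0:
        return False  # reads shorter than k contain no kmers
    needed = max(min_kmer_hits, math.ceil(min_kmer_fraction * total))
    return count_hits(read, kmers, k) >= needed
```

With `k=4` and the reference `ACGTACGT`, the read `TTTTACGTTTTT` contains two matching kmers out of nine, so it matches with the default thresholds but not with `min_kmer_fraction=0.5`.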
Input arguments
- label:
Reads
- type:
data:reads:fastq:single
- required:
True
- disabled:
False
- hidden:
False
- label:
Minimum length
- type:
basic:integer
- description:
Reads shorter than the minimum length will be discarded after trimming.
- required:
True
- disabled:
False
- hidden:
False
- default:
10
- label:
Sequences
- type:
list:data:seq:nucleotide
- description:
Reference sequences include adapters, contaminants, and degenerate sequences. They can be provided in a multi-sequence FASTA file or as a set of literal sequences below.
- required:
False
- disabled:
False
- hidden:
False
- label:
Literal sequences
- type:
list:basic:string
- description:
Literal sequences can be specified by inputting them one by one and pressing Enter after each sequence.
- required:
False
- disabled:
False
- hidden:
False
- default:
[]
- label:
Kmer length
- type:
basic:integer
- description:
Kmer length used for finding contaminants. Contaminants shorter than kmer length will not be found. Kmer length must be at least 1.
- required:
True
- disabled:
False
- hidden:
False
- default:
27
- label:
Check reverse complements
- type:
basic:boolean
- description:
Look for reverse complements of kmers in addition to forward kmers.
- required:
True
- disabled:
False
- hidden:
False
- default:
True
- label:
Mask the middle base of a kmer
- type:
basic:boolean
- description:
Treat the middle base of a kmer as a wildcard to increase sensitivity in the presence of errors.
- required:
True
- disabled:
False
- hidden:
False
- default:
True
- label:
Minimum number of kmer hits
- type:
basic:integer
- description:
Reads need at least this many matching kmers to be considered matching the reference.
- required:
True
- disabled:
False
- hidden:
False
- default:
1
- label:
Minimum kmer fraction
- type:
basic:decimal
- description:
A read needs at least this fraction of its total kmers to hit a reference in order to be considered a match. If this and ‘Minimum number of kmer hits’ are set, the greater is used.
- required:
True
- disabled:
False
- hidden:
False
- default:
0.0
- label:
Minimum coverage fraction
- type:
basic:decimal
- description:
A read needs at least this fraction of its total bases to be covered by reference kmers to be considered a match. If specified, ‘Minimum coverage fraction’ overrides ‘Minimum number of kmer hits’ and ‘Minimum kmer fraction’.
- required:
True
- disabled:
False
- hidden:
False
- default:
0.0
- label:
Maximum Hamming distance for kmers (substitutions only)
- type:
basic:integer
- description:
Hamming distance i.e. the number of mismatches allowed in the kmer.
- required:
True
- disabled:
False
- hidden:
False
- default:
0
- label:
Hamming distance for query kmers
- type:
basic:integer
- description:
Set a Hamming distance for query kmers instead of the read kmers. This makes the read processing much slower, but does not use additional memory.
- required:
True
- disabled:
False
- hidden:
False
- default:
0
- label:
Maximum edit distance from reference kmers (substitutions and indels)
- type:
basic:integer
- required:
True
- disabled:
False
- hidden:
False
- default:
0
- label:
Hamming distance for short kmers when looking for shorter kmers
- type:
basic:integer
- required:
True
- disabled:
False
- hidden:
False
- default:
0
- label:
Hamming distance for short query kmers when looking for shorter kmers
- type:
basic:integer
- required:
True
- disabled:
False
- hidden:
False
- default:
0
- label:
Maximum edit distance from short reference kmers (substitutions and indels) when looking for shorter kmers
- type:
basic:integer
- required:
True
- disabled:
False
- hidden:
False
- default:
0
- label:
Forbid matching of read kmers containing N
- type:
basic:boolean
- description:
By default, these will match a reference ‘A’ if ‘Maximum Hamming distance for kmers’ > 0 or ‘Maximum edit distance from reference kmers’ > 0, to increase sensitivity.
- required:
True
- disabled:
False
- hidden:
False
- default:
False
- label:
Find best match
- type:
basic:boolean
- description:
If a read has multiple matches, associate it with the sequence sharing the most kmers.
- required:
True
- disabled:
False
- hidden:
False
- default:
True
- label:
Trimming protocol to remove bases matching reference kmers from reads
- type:
basic:string
- required:
True
- disabled:
False
- hidden:
False
- default:
f
- choices:
Don’t trim:
f
Trim to the right:
r
Trim to the left:
l
- label:
Symbol to replace bases matching reference kmers
- type:
basic:string
- description:
Allows any non-whitespace character other than t or f. Processes short kmers on both ends.
- required:
True
- disabled:
False
- hidden:
False
- default:
f
- label:
Only mask bases that are fully covered by kmers
- type:
basic:boolean
- required:
True
- disabled:
False
- hidden:
False
- default:
False
- label:
Look for shorter kmers at read tips down to this length when k-trimming or masking
- type:
basic:integer
- description:
-1 means disabled. Enabling this will disable treating the middle base of a kmer as a wildcard to increase sensitivity in the presence of errors.
- required:
True
- disabled:
False
- hidden:
False
- default:
-1
- label:
Trimming protocol to remove bases with quality below the minimum average region quality from read ends
- type:
basic:string
- description:
Performed after looking for kmers. If enabled, also set ‘Average quality below which to trim region’.
- required:
True
- disabled:
False
- hidden:
False
- default:
f
- choices:
Trim neither end:
f
Trim both ends:
rl
Trim only right end:
r
Trim only left end:
l
Use sliding window:
w
- label:
Average quality below which to trim region
- type:
basic:integer
- description:
Set trimming protocol to enable this parameter.
- required:
True
- disabled:
operations.quality_trim === 'f'
- hidden:
False
- default:
6
- label:
Quality encoding offset
- type:
basic:string
- description:
Quality encoding offset for input FASTQ files.
- required:
True
- disabled:
False
- hidden:
False
- default:
auto
- choices:
Sanger / Illumina 1.8+ (33):
33
Illumina up to 1.3+, 1.5+ (64):
64
Auto:
auto
- label:
Don’t crash if quality values appear to be incorrect
- type:
basic:boolean
- required:
True
- disabled:
False
- hidden:
False
- default:
False
- label:
Minimum length of poly-A or poly-T tails to trim on either end of reads
- type:
basic:integer
- required:
True
- disabled:
False
- hidden:
False
- default:
0
- label:
Minimum length fraction
- type:
basic:decimal
- description:
Reads shorter than this fraction of original length after trimming will be discarded.
- required:
True
- disabled:
False
- hidden:
False
- default:
0.0
- label:
Maximum length
- type:
basic:integer
- description:
Reads longer than this after trimming will be discarded.
- required:
False
- disabled:
False
- hidden:
False
- label:
Minimum average quality
- type:
basic:integer
- description:
Reads with average quality (after trimming) below this will be discarded.
- required:
True
- disabled:
False
- hidden:
False
- default:
0
- label:
Number of initial bases to calculate minimum average quality from
- type:
basic:integer
- description:
If positive, calculate minimum average quality from this many initial bases.
- required:
True
- disabled:
False
- hidden:
False
- default:
0
- label:
Minimum base quality below which reads are discarded after trimming
- type:
basic:integer
- required:
True
- disabled:
False
- hidden:
False
- default:
0
- label:
Minimum number of consecutive called bases
- type:
basic:integer
- required:
True
- disabled:
False
- hidden:
False
- default:
0
- label:
Number of bases to trim around matching kmers
- type:
basic:integer
- required:
True
- disabled:
False
- hidden:
False
- default:
0
- label:
Minimum number of overlapping bases
- type:
basic:integer
- description:
Require this many bases of overlap for detection.
- required:
True
- disabled:
False
- hidden:
False
- default:
14
- label:
Minimum insert size
- type:
basic:integer
- description:
Require insert size of at least this for overlap. Should be reduced to 16 for small RNA sequencing.
- required:
True
- disabled:
False
- hidden:
False
- default:
40
- label:
Position from which to trim bases to the left
- type:
basic:integer
- required:
True
- disabled:
False
- hidden:
False
- default:
0
- label:
Position from which to trim bases to the right
- type:
basic:integer
- required:
True
- disabled:
False
- hidden:
False
- default:
0
- label:
Number of bases to trim from the right end
- type:
basic:integer
- required:
True
- disabled:
False
- hidden:
False
- default:
0
- label:
Modulo to right-trim reads
- type:
basic:integer
- description:
Trim reads to the largest multiple of modulo.
- required:
True
- disabled:
False
- hidden:
False
- default:
0
- label:
Number of leftmost bases to look in for kmer matches
- type:
basic:integer
- required:
True
- disabled:
False
- hidden:
False
- default:
0
- label:
Number of rightmost bases to look in for kmer matches
- type:
basic:integer
- required:
True
- disabled:
False
- hidden:
False
- default:
0
- label:
Minimum GC content
- type:
basic:decimal
- description:
Discard reads with lower GC content.
- required:
True
- disabled:
False
- hidden:
False
- default:
0.0
- label:
Maximum GC content
- type:
basic:decimal
- description:
Discard reads with higher GC content.
- required:
True
- disabled:
False
- hidden:
False
- default:
1.0
- label:
Max Ns after trimming
- type:
basic:integer
- description:
If non-negative, reads with more Ns than this (after trimming) will be discarded.
- required:
True
- disabled:
False
- hidden:
False
- default:
-1
- label:
Discard reads with invalid characters as bases
- type:
basic:boolean
- required:
True
- disabled:
False
- hidden:
False
- default:
False
- label:
Discard reads that fail Illumina chastity filtering
- type:
basic:boolean
- description:
Discard reads with id containing ‘ 1:Y:’ or ‘ 2:Y:’.
- required:
True
- disabled:
False
- hidden:
False
- default:
False
- label:
Remove reads with unexpected barcodes
- type:
basic:boolean
- description:
Remove reads with unexpected barcodes if barcodes are set, or barcodes containing ‘N’ otherwise. A barcode must be the last part of the read header.
- required:
True
- disabled:
False
- hidden:
False
- default:
False
- label:
Barcode sequences
- type:
list:data:seq:nucleotide
- description:
FASTA file(s) with barcode sequences.
- required:
False
- disabled:
False
- hidden:
False
- label:
Literal barcode sequences
- type:
list:basic:string
- description:
Literal barcode sequences can be specified by inputting them one by one and pressing Enter after each sequence.
- required:
False
- disabled:
False
- hidden:
False
- default:
[]
- label:
Minimum X coordinate
- type:
basic:integer
- description:
If positive, discard reads with a smaller X coordinate.
- required:
True
- disabled:
False
- hidden:
False
- default:
-1
- label:
Minimum Y coordinate
- type:
basic:integer
- description:
If positive, discard reads with a smaller Y coordinate.
- required:
True
- disabled:
False
- hidden:
False
- default:
-1
- label:
Maximum X coordinate
- type:
basic:integer
- description:
If positive, discard reads with a larger X coordinate.
- required:
True
- disabled:
False
- hidden:
False
- default:
-1
- label:
Maximum Y coordinate
- type:
basic:integer
- description:
If positive, discard reads with a larger Y coordinate.
- required:
True
- disabled:
False
- hidden:
False
- default:
-1
- label:
Minimum entropy
- type:
basic:decimal
- description:
Set between 0 and 1 to filter reads with entropy below that value. Higher is more stringent.
- required:
True
- disabled:
False
- hidden:
False
- default:
-1.0
- label:
Length of sliding window used to calculate entropy
- type:
basic:integer
- description:
To use the sliding window, set the minimum entropy to a value between 0.0 and 1.0.
- required:
True
- disabled:
False
- hidden:
False
- default:
50
- label:
Length of kmers used to calculate entropy
- type:
basic:integer
- required:
True
- disabled:
False
- hidden:
False
- default:
5
- label:
Mask low-entropy parts of sequences with N instead of discarding
- type:
basic:boolean
- required:
True
- disabled:
False
- hidden:
False
- default:
False
- label:
Minimum base frequency
- type:
basic:integer
- required:
True
- disabled:
False
- hidden:
False
- default:
0
- label:
Disable grouping of bases for reads >50bp
- type:
basic:boolean
- description:
All reports will show data for every base in the read. Using this option on very long reads will cause FastQC to crash.
- required:
True
- disabled:
False
- hidden:
False
- default:
False
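The entropy inputs above (`entropy`, `entropy_window`, `entropy_k`) can be pictured with the following minimal Python sketch. It computes a normalized Shannon entropy of kmer frequencies in each sliding window and averages over windows. This is an illustrative assumption about how such a filter could work, not BBDuk's exact calculation; the function names are hypothetical, and treating a negative threshold (the default of -1.0) as "disabled" is inferred from the parameter description.

```python
import math
from collections import Counter


def window_entropy(window, k):
    """Normalized Shannon entropy (0 to 1) of the kmer counts in one window."""
    n = len(window) - k + 1
    if n <= 1:
        return 0.0
    counts = Counter(window[i:i + k] for i in range(n))
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
    # The maximum is reached when every kmer in the window is distinct.
    max_entropy = math.log2(min(n, 4 ** k))
    return entropy / max_entropy


def mean_entropy(read, window_len=50, k=5):
    """Average the window entropy over all sliding windows of the read."""
    if len(read) <= window_len:
        return window_entropy(read, k)
    values = [window_entropy(read[i:i + window_len], k)
              for i in range(len(read) - window_len + 1)]
    return sum(values) / len(values)


def passes_entropy_filter(read, min_entropy=-1.0, window_len=50, k=5):
    """Keep the read if filtering is off (assumed: negative threshold) or entropy is high enough."""
    if min_entropy < 0:
        return True
    return mean_entropy(read, window_len, k) >= min_entropy
```

A homopolymer such as `"A" * 60` has zero entropy and is rejected by any positive threshold, while a window in which every kmer is distinct scores close to 1.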
Output results
- label:
Remaining reads
- type:
list:basic:file
- required:
True
- disabled:
False
- hidden:
False
- label:
Statistics
- type:
list:basic:file
- required:
True
- disabled:
False
- hidden:
False
- label:
Quality control with FastQC
- type:
list:basic:file:html
- required:
True
- disabled:
False
- hidden:
False
- label:
Download FastQC archive
- type:
list:basic:file
- required:
True
- disabled:
False
- hidden:
False
BBDuk - STAR - featureCounts - QC
- data:workflow:rnaseq:featurecounts:qc:workflow-bbduk-star-featurecounts-qc (data:reads:fastq reads, data:index:star genome, data:annotation annotation, basic:string assay_type, data:index:salmon cdna_index, data:index:star rrna_reference, data:index:star globin_reference, list:data:seq:nucleotide adapters, list:basic:string custom_adapter_sequences, basic:integer kmer_length, basic:integer min_k, basic:integer hamming_distance, basic:integer maxns, basic:integer trim_quality, basic:integer min_length, basic:string quality_encoding_offset, basic:boolean ignore_bad_quality, basic:boolean unstranded, basic:boolean noncannonical, basic:boolean chimeric, basic:integer chim_segment_min, basic:boolean quant_mode, basic:boolean single_end, basic:string out_filter_type, basic:integer out_multimap_max, basic:integer out_mismatch_max, basic:decimal out_mismatch_nl_max, basic:integer out_score_min, basic:decimal out_mismatch_nrl_max, basic:integer align_overhang_min, basic:integer align_sjdb_overhang_min, basic:integer align_intron_size_min, basic:integer align_intron_size_max, basic:integer align_gap_max, basic:string align_end_alignment, basic:boolean out_unmapped, basic:string out_sam_attributes, basic:string out_rg_line, basic:integer n_reads, basic:string feature_class, basic:string feature_type, basic:string id_attribute, basic:boolean by_read_group, basic:integer n_reads, basic:integer seed, basic:decimal fraction, basic:boolean two_pass)[Source: v6.2.0]
RNA-seq pipeline composed of preprocessing, alignment, and quantification. First, reads are preprocessed by __BBDuk__, which removes adapters, trims reads for quality from the 3’-end, and discards reads that are too short after trimming. Compared to similar tools, BBDuk is well regarded for its computational efficiency. Next, preprocessed reads are aligned by the __STAR__ aligner. At the time of implementation, STAR is considered a state-of-the-art tool that consistently produces accurate results from diverse sets of reads and performs well even with default settings. For more information see [this comparison of RNA-seq aligners](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5792058/). Finally, aligned reads are summarized to genes by __featureCounts__. Widely adopted in the bioinformatics community, featureCounts yields expression values in a computationally efficient manner. All three tools in this workflow support parallelization to accelerate the analysis. The rRNA contamination rate in the sample is determined using the STAR aligner: quality-trimmed reads are down-sampled (using the __Seqtk__ tool) and aligned to the rRNA reference sequences. The alignment rate indicates the percentage of reads in the sample that are derived from rRNA sequences.
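The rRNA contamination estimate described above combines seeded down-sampling with an alignment rate. The Python sketch below shows the idea using reservoir sampling; it is only an illustration under stated assumptions, not the Seqtk or STAR implementation. The function names and the example seed are hypothetical, and the count of aligned reads is assumed to come from the aligner's own report.

```python
import random


def reservoir_sample(records, n_reads, seed=0):
    """Pick n_reads records uniformly at random in one pass (seeded, reproducible)."""
    rng = random.Random(seed)
    sample = []
    for index, record in enumerate(records):
        if index < n_reads:
            sample.append(record)
        else:
            # Replace an existing element with probability n_reads / (index + 1).
            j = rng.randint(0, index)
            if j < n_reads:
                sample[j] = record
    return sample


def alignment_rate(n_aligned, n_sampled):
    """Percentage of sampled reads that aligned to the rRNA reference."""
    if n_sampled == 0:
        return 0.0
    return 100.0 * n_aligned / n_sampled
```

For instance, if 50 of 1000 down-sampled reads align to the rRNA reference, `alignment_rate(50, 1000)` gives 5.0, that is, an estimated 5% rRNA content. Fixing the seed keeps the sampled subset, and therefore the estimate, reproducible between runs.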
Input arguments
- label:
Reads (FASTQ)
- type:
data:reads:fastq
- description:
Reads in FASTQ file, single or paired end.
- required:
True
- disabled:
False
- hidden:
False
- label:
Indexed reference genome
- type:
data:index:star
- description:
Genome index prepared by STAR aligner indexing tool.
- required:
True
- disabled:
False
- hidden:
False
- label:
Annotation
- type:
data:annotation
- description:
GTF and GFF3 annotation formats are supported.
- required:
True
- disabled:
False
- hidden:
False
- label:
Assay type
- type:
basic:string
- description:
In strand non-specific assay a read is considered overlapping with a feature regardless of whether it is mapped to the same or the opposite strand as the feature. In strand-specific forward assay and single reads, the read has to be mapped to the same strand as the feature. For paired-end reads, the first read has to be on the same strand and the second read on the opposite strand. In strand-specific reverse assay these rules are reversed.
- required:
True
- disabled:
False
- hidden:
False
- default:
non_specific
- choices:
Strand non-specific:
non_specific
Strand-specific forward:
forward
Strand-specific reverse:
reverse
Detect automatically:
auto
- label:
cDNA index file
- type:
data:index:salmon
- description:
Transcriptome index file created using the Salmon indexing tool. cDNA (transcriptome) sequences used for index file creation must be derived from the same species as the input sequencing reads to obtain reliable analysis results.
- required:
False
- disabled:
False
- hidden:
assay_type != 'auto'
- label:
Indexed rRNA reference sequence
- type:
data:index:star
- description:
Reference sequence index prepared by STAR aligner indexing tool.
- required:
True
- disabled:
False
- hidden:
False
- label:
Indexed Globin reference sequence
- type:
data:index:star
- description:
Reference sequence index prepared by STAR aligner indexing tool.
- required:
True
- disabled:
False
- hidden:
False
- label:
Adapters
- type:
list:data:seq:nucleotide
- description:
FASTA file(s) with adapters.
- required:
False
- disabled:
False
- hidden:
False
- label:
Custom adapter sequences
- type:
list:basic:string
- description:
Custom adapter sequences can be specified by inputting them one by one and pressing Enter after each sequence.
- required:
False
- disabled:
False
- hidden:
False
- default:
[]
- label:
K-mer length [k=]
- type:
basic:integer
- description:
Kmer length used for finding contaminants. Contaminants shorter than kmer length will not be found. Kmer length must be at least 1.
- required:
True
- disabled:
False
- hidden:
False
- default:
23
- label:
Minimum k-mer length at right end of reads used for trimming [mink=]
- type:
basic:integer
- required:
True
- disabled:
preprocessing.adapters.length === 0 && preprocessing.custom_adapter_sequences.length === 0
- hidden:
False
- default:
11
- label:
Maximum Hamming distance for k-mers [hammingdistance=]
- type:
basic:integer
- description:
Hamming distance i.e. the number of mismatches allowed in the kmer.
- required:
True
- disabled:
False
- hidden:
False
- default:
1
- label:
Max Ns after trimming [maxns=]
- type:
basic:integer
- description:
If non-negative, reads with more Ns than this (after trimming) will be discarded.
- required:
True
- disabled:
False
- hidden:
False
- default:
-1
- label:
Average quality below which to trim region [trimq=]
- type:
basic:integer
- description:
Phred algorithm is used, which is more accurate than naive trimming.
- required:
True
- disabled:
False
- hidden:
False
- default:
10
- label:
Minimum read length [minlength=]
- type:
basic:integer
- description:
Reads shorter than minimum read length after trimming are discarded.
- required:
True
- disabled:
False
- hidden:
False
- default:
20
- label:
Quality encoding offset [qin=]
- type:
basic:string
- description:
Quality encoding offset for input FASTQ files.
- required:
True
- disabled:
False
- hidden:
False
- default:
auto
- choices:
Sanger / Illumina 1.8+:
33
Illumina up to 1.3+, 1.5+:
64
Auto:
auto
- label:
Ignore bad quality [ignorebadquality]
- type:
basic:boolean
- description:
Don’t crash if quality values appear to be incorrect.
- required:
True
- disabled:
False
- hidden:
False
- default:
False
- label:
The data is unstranded [--outSAMstrandField intronMotif]
- type:
basic:boolean
- description:
For unstranded RNA-seq data, Cufflinks/Cuffdiff require spliced alignments with XS strand attribute, which STAR will generate with --outSAMstrandField intronMotif option. As required, the XS strand attribute will be generated for all alignments that contain splice junctions. The spliced alignments that have undefined strand (i.e. containing only non-canonical unannotated junctions) will be suppressed. If you have stranded RNA-seq data, you do not need to use any specific STAR options. Instead, you need to run Cufflinks with the library option --library-type options. For example, cufflinks --library-type fr-firststrand should be used for the standard dUTP protocol, including Illumina’s stranded Tru-Seq. This option has to be used only for Cufflinks runs and not for STAR runs.
- required:
True
- disabled:
False
- hidden:
False
- default:
False
- label:
Remove non-canonical junctions (Cufflinks compatibility)
- type:
basic:boolean
- description:
It is recommended to remove the non-canonical junctions for Cufflinks runs using --outFilterIntronMotifs RemoveNoncanonical.
- required:
True
- disabled:
False
- hidden:
False
- default:
False
- label:
Detect chimeric and circular alignments [--chimOutType SeparateSAMold]
- type:
basic:boolean
- description:
To switch on detection of chimeric (fusion) alignments (in addition to normal mapping), --chimSegmentMin should be set to a positive value. Each chimeric alignment consists of two segments. Each segment is non-chimeric on its own, but the segments are chimeric to each other (i.e. the segments belong to different chromosomes, or different strands, or are far from each other). Both segments may contain splice junctions, and one of the segments may contain portions of both mates. --chimSegmentMin parameter controls the minimum mapped length of the two segments that is allowed. For example, if you have 2x75 reads and used --chimSegmentMin 20, a chimeric alignment with 130b on one chromosome and 20b on the other will be output, while 135 + 15 won’t be.
- required:
True
- disabled:
False
- hidden:
False
- default:
False
- label:
Minimum length of chimeric segment [--chimSegmentMin]
- type:
basic:integer
- required:
True
- disabled:
!alignment.chimeric_reads.chimeric
- hidden:
False
- default:
20
- label:
Output in transcript coordinates [--quantMode]
- type:
basic:boolean
- description:
With the --quantMode TranscriptomeSAM option, STAR will output alignments translated into transcript coordinates in the Aligned.toTranscriptome.out.bam file (in addition to alignments in genomic coordinates in Aligned.*.sam/bam files). These transcriptomic alignments can be used with various transcript quantification software that requires reads to be mapped to a transcriptome, such as RSEM or eXpress.
- required:
True
- disabled:
False
- hidden:
False
- default:
False
- label:
Allow soft-clipping and indels [--quantTranscriptomeBan Singleend]
- type:
basic:boolean
- description:
By default, the output satisfies RSEM requirements: soft-clipping and indels are not allowed. Use --quantTranscriptomeBan Singleend to allow insertions, deletions and soft-clips in the transcriptomic alignments, which can be used by some expression quantification software (e.g. eXpress).
- required:
True
- disabled:
!t_coordinates.quant_mode
- hidden:
False
- default:
False
- label:
Type of filtering [--outFilterType]
- type:
basic:string
- description:
Normal: standard filtering using only current alignment; BySJout: keep only those reads that contain junctions that passed filtering into SJ.out.tab.
- required:
True
- disabled:
False
- hidden:
False
- default:
Normal
- choices:
Normal:
Normal
BySJout:
BySJout
- label:
Maximum number of loci [--outFilterMultimapNmax]
- type:
basic:integer
- description:
Maximum number of loci the read is allowed to map to. Alignments (all of them) will be output only if the read maps to no more loci than this value. Otherwise no alignments will be output, and the read will be counted as ‘mapped to too many loci’ (default: 10).
- required:
False
- disabled:
False
- hidden:
False
- label:
Maximum number of mismatches [--outFilterMismatchNmax]
- type:
basic:integer
- description:
Alignment will be output only if it has no more mismatches than this value (default: 10). A large number (e.g. 999) switches off this filter.
- required:
False
- disabled:
False
- hidden:
False
- label:
Maximum no. of mismatches (map length) [--outFilterMismatchNoverLmax]
- type:
basic:decimal
- description:
Alignment will be output only if its ratio of mismatches to *mapped* length is less than or equal to this value (default: 0.3). The value should be between 0.0 and 1.0.
- required:
False
- disabled:
False
- hidden:
False
- label:
Minimum alignment score [--outFilterScoreMin]
- type:
basic:integer
- description:
Alignment will be output only if its score is higher than or equal to this value (default: 0).
- required:
False
- disabled:
False
- hidden:
False
- label:
Maximum no. of mismatches (read length) [--outFilterMismatchNoverReadLmax]
- type:
basic:decimal
- description:
Alignment will be output only if its ratio of mismatches to *read* length is less than or equal to this value (default: 1.0). Using 0.04 for 2x100bp, the max number of mismatches is calculated as 0.04*200=8 for the paired read. The value should be between 0.0 and 1.0.
- required:
False
- disabled:
False
- hidden:
False
- label:
Minimum overhang [--alignSJoverhangMin]
- type:
basic:integer
- description:
Minimum overhang (i.e. block size) for spliced alignments (default: 5).
- required:
False
- disabled:
False
- hidden:
False
- label:
Minimum overhang (sjdb) [--alignSJDBoverhangMin]
- type:
basic:integer
- description:
Minimum overhang (i.e. block size) for annotated (sjdb) spliced alignments (default: 3).
- required:
False
- disabled:
False
- hidden:
False
- label:
Minimum intron size [--alignIntronMin]
- type:
basic:integer
- description:
Minimum intron size: a genomic gap is considered an intron if its length is >= alignIntronMin, otherwise it is considered a deletion (default: 21).
- required:
False
- disabled:
False
- hidden:
False
- label:
Maximum intron size [--alignIntronMax]
- type:
basic:integer
- description:
Maximum intron size. If 0, the maximum intron size is determined by (2^winBinNbits)*winAnchorDistNbins (default: 0).
- required:
False
- disabled:
False
- hidden:
False
- label:
Maximum gap between mates [--alignMatesGapMax]
- type:
basic:integer
- description:
Maximum gap between two mates. If 0, the maximum gap is determined by (2^winBinNbits)*winAnchorDistNbins (default: 0).
- required:
False
- disabled:
False
- hidden:
False
- label:
Read ends alignment [--alignEndsType]
- type:
basic:string
- description:
Type of read ends alignment (default: Local). Local: standard local alignment with soft-clipping allowed. EndToEnd: force end-to-end read alignment, do not soft-clip. Extend5pOfRead1: fully extend only the 5’ end of read1, all other ends use local alignment. Extend5pOfReads12: fully extend only the 5’ ends of both read1 and read2, all other ends use local alignment.
- required:
True
- disabled:
False
- hidden:
False
- default:
Local
- choices:
Local:
Local
EndToEnd:
EndToEnd
Extend5pOfRead1:
Extend5pOfRead1
Extend5pOfReads12:
Extend5pOfReads12
- label:
Output unmapped reads (SAM) [--outSAMunmapped Within]
- type:
basic:boolean
- description:
Output of unmapped reads in the SAM format.
- required:
True
- disabled:
False
- hidden:
False
- default:
False
- label:
Desired SAM attributes [--outSAMattributes]
- type:
basic:string
- description:
A string of desired SAM attributes, in the order desired for the output SAM.
- required:
True
- disabled:
False
- hidden:
False
- default:
Standard
- choices:
Standard:
Standard
All:
All
NH HI NM MD:
NH HI NM MD
None:
None
- label:
SAM/BAM read group line [--outSAMattrRGline]
- type:
basic:string
- description:
The first word contains the read group identifier and must start with ID:, e.g. --outSAMattrRGline ID:xxx CN:yy "DS:z z z". xxx will be added as the RG tag to each output alignment. Any spaces in the tag values have to be double quoted. Comma-separated RG lines correspond to different (comma-separated) input files in --readFilesIn. Commas have to be surrounded by spaces, e.g. --outSAMattrRGline ID:xxx , ID:zzz "DS:z z" , ID:yyy DS:yyyy.
- required:
False
- disabled:
False
- hidden:
False
- label:
Number of reads in subsampled alignment file
- type:
basic:integer
- description:
Alignment (.bam) file subsample size. Increase the number of reads to make automatic detection more reliable. Decrease the number of reads to make automatic detection run faster.
- required:
True
- disabled:
False
- hidden:
assay_type != 'auto'
- default:
5000000
- label:
Feature class [-t]
- type:
basic:string
- description:
Feature class (3rd column in GTF/GFF3 file) to be used. All other features will be ignored.
- required:
True
- disabled:
False
- hidden:
False
- default:
exon
- label:
Feature type
- type:
basic:string
- description:
The type of feature the quantification program summarizes over (e.g. gene or transcript-level analysis). The value of this parameter needs to be chosen in line with ‘ID attribute’ below.
- required:
True
- disabled:
False
- hidden:
False
- default:
gene
- choices:
gene:
gene
transcript:
transcript
- label:
ID attribute [-g]
- type:
basic:string
- description:
GTF/GFF3 attribute to be used as feature ID. Several GTF/GFF3 lines with the same feature ID will be considered as parts of the same feature. The feature ID is used to identify the counts in the output table. In GTF files this is usually ‘gene_id’, in GFF3 files this is often ‘ID’, and ‘transcript_id’ is frequently a valid choice for both annotation formats.
- required:
True
- disabled:
False
- hidden:
False
- default:
gene_id
- choices:
gene_id:
gene_id
transcript_id:
transcript_id
ID:
ID
geneid:
geneid
- label:
Assign reads by read group
- type:
basic:boolean
- description:
RG tag is required to be present in the input BAM files.
- required:
True
- disabled:
False
- hidden:
False
- default:
True
- label:
Number of reads
- type:
basic:integer
- description:
Number of reads to include in subsampling.
- required:
True
- disabled:
False
- hidden:
False
- default:
1000000
- label:
Seed [-s]
- type:
basic:integer
- description:
Using the same random seed makes read subsampling reproducible across different environments.
- required:
True
- disabled:
False
- hidden:
False
- default:
11
- label:
Fraction of reads used
- type:
basic:decimal
- description:
Use the fraction of reads [0.0 - 1.0] from the original input file instead of the absolute number of reads. If set, this will override the ‘Number of reads’ input parameter.
- required:
False
- disabled:
False
- hidden:
False
- label:
2-pass mode [-2]
- type:
basic:boolean
- description:
Enable two-pass mode when down-sampling. Two-pass mode is twice as slow but uses much less memory.
- required:
True
- disabled:
False
- hidden:
False
- default:
False
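The numeric examples quoted in the descriptions above for --chimSegmentMin (2x75 reads, minimum 20) and --outFilterMismatchNoverReadLmax (0.04 for 2x100 bp) can be reproduced with a short Python sketch. It only mirrors the arithmetic stated in those descriptions and is not STAR's implementation; the function names are hypothetical.

```python
import math


def chimeric_segments_ok(segment_lengths, chim_segment_min):
    """True if every chimeric segment has at least chim_segment_min mapped bases."""
    return all(length >= chim_segment_min for length in segment_lengths)


def max_allowed_mismatches(ratio, total_read_length):
    """Largest mismatch count that keeps mismatches / read length <= ratio."""
    # round() guards against floating-point noise before taking the floor.
    return math.floor(round(ratio * total_read_length, 9))


# 2x75 reads (150 bases in total) with a minimum segment length of 20.
print(chimeric_segments_ok([130, 20], 20))  # the 130 + 20 split
print(chimeric_segments_ok([135, 15], 20))  # the 135 + 15 split

# Ratio 0.04 for a 2x100 bp pair (200 bases in total).
print(max_allowed_mismatches(0.04, 200))
```

The first call corresponds to the alignment that the description says is output, the second to the one that is not.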
Output results
BBDuk - Salmon - QC
- data:workflow:rnaseq:salmon:workflow-bbduk-salmon-qc (data:reads:fastq reads, data:index:salmon salmon_index, data:index:star genome, data:annotation annotation, data:index:star rrna_reference, data:index:star globin_reference, list:data:seq:nucleotide adapters, list:basic:string custom_adapter_sequences, basic:integer kmer_length, basic:integer min_k, basic:integer hamming_distance, basic:integer maxns, basic:integer trim_quality, basic:integer min_length, basic:string quality_encoding_offset, basic:boolean ignore_bad_quality, basic:boolean seq_bias, basic:boolean gc_bias, basic:decimal consensus_slack, basic:decimal min_score_fraction, basic:integer range_factorization_bins, basic:integer min_assigned_frag, basic:integer num_bootstraps, basic:integer num_gibbs_samples, basic:integer n_reads, basic:integer seed, basic:decimal fraction, basic:boolean two_pass)[Source: v4.3.1]
Alignment-free RNA-Seq pipeline. The Salmon tool and the tximport package are used in the quantification step to produce gene-level abundance estimates. The rRNA and globin-sequence contamination rate in the sample is determined using the STAR aligner. Quality-trimmed reads are down-sampled (using the Seqtk tool) and aligned to the genome, rRNA and globin reference sequences. The rRNA and globin-sequence alignment rates indicate the percentage of the reads in the sample that are of rRNA and globin origin, respectively. Alignment of down-sampled data to a whole-genome reference sequence is used to produce an alignment file suitable for Samtools and QoRTs QC analysis. Per-sample analysis results and QC data are summarized by the MultiQC tool.
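The rRNA and globin alignment rates described above are percentages of the down-sampled reads. A minimal Python sketch of that calculation follows; the helper name and the read counts are made up for illustration, and the workflow itself obtains the rates from the STAR alignments.

```python
def alignment_rate(mapped_reads, total_reads):
    """Percentage of down-sampled reads that aligned to a reference."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return 100.0 * mapped_reads / total_reads


# Hypothetical counts for one down-sampled sample (not real data).
downsampled_reads = 1_000_000
rates = {
    "rRNA": alignment_rate(25_000, downsampled_reads),
    "globin": alignment_rate(4_000, downsampled_reads),
}
for reference, rate in rates.items():
    print(f"{reference}: {rate:.2f}% of down-sampled reads")
```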
Input arguments
- label:
Select sample(s) (FASTQ)
- type:
data:reads:fastq
- description:
Reads in FASTQ file, single or paired end.
- required:
True
- disabled:
False
- hidden:
False
- label:
Salmon index
- type:
data:index:salmon
- description:
Transcriptome index file created using the Salmon indexing tool.
- required:
True
- disabled:
False
- hidden:
False
- label:
Indexed reference genome
- type:
data:index:star
- description:
Genome index prepared by STAR aligner indexing tool.
- required:
True
- disabled:
False
- hidden:
False
- label:
Annotation
- type:
data:annotation
- description:
GTF and GFF3 annotation formats are supported.
- required:
True
- disabled:
False
- hidden:
False
- label:
Indexed rRNA reference sequence
- type:
data:index:star
- description:
Reference sequence index prepared by STAR aligner indexing tool.
- required:
True
- disabled:
False
- hidden:
False
- label:
Indexed Globin reference sequence
- type:
data:index:star
- description:
Reference sequence index prepared by STAR aligner indexing tool.
- required:
True
- disabled:
False
- hidden:
False
- label:
Adapters
- type:
list:data:seq:nucleotide
- description:
FASTA file(s) with adapters.
- required:
False
- disabled:
False
- hidden:
False
- label:
Custom adapter sequences
- type:
list:basic:string
- description:
Custom adapter sequences can be specified by inputting them one by one and pressing Enter after each sequence.
- required:
False
- disabled:
False
- hidden:
False
- default:
[]
- label:
K-mer length
- type:
basic:integer
- description:
K-mer length must be smaller than or equal to the length of the adapters.
- required:
True
- disabled:
False
- hidden:
False
- default:
23
- label:
Minimum k-mer length at right end of reads used for trimming
- type:
basic:integer
- required:
True
- disabled:
preprocessing.adapters.length === 0 && preprocessing.custom_adapter_sequences.length === 0
- hidden:
False
- default:
11
- label:
Maximum Hamming distance for k-mers
- type:
basic:integer
- required:
True
- disabled:
False
- hidden:
False
- default:
1
- label:
Max Ns after trimming
- type:
basic:integer
- description:
If non-negative, reads with more Ns than this (after trimming) will be discarded.
- required:
True
- disabled:
False
- hidden:
False
- default:
-1
- label:
Quality below which to trim reads from the right end
- type:
basic:integer
- description:
Phred algorithm is used, which is more accurate than naive trimming.
- required:
True
- disabled:
False
- hidden:
False
- default:
10
- label:
Minimum read length
- type:
basic:integer
- description:
Reads shorter than minimum read length after trimming are discarded.
- required:
True
- disabled:
False
- hidden:
False
- default:
20
- label:
Quality encoding offset
- type:
basic:string
- description:
Quality encoding offset for input FASTQ files.
- required:
True
- disabled:
False
- hidden:
False
- default:
auto
- choices:
Sanger / Illumina 1.8+:
33
Illumina up to 1.3+, 1.5+:
64
Auto:
auto
- label:
Ignore bad quality
- type:
basic:boolean
- description:
Don’t crash if quality values appear to be incorrect.
- required:
True
- disabled:
False
- hidden:
False
- default:
False
- label:
Perform sequence-specific bias correction
- type:
basic:boolean
- description:
Perform sequence-specific bias correction.
- required:
True
- disabled:
False
- hidden:
False
- default:
True
- label:
Perform fragment GC bias correction
- type:
basic:boolean
- description:
Perform fragment GC bias correction. If single-end reads are selected as input in this workflow, it is recommended that you set this option to False. If you selected paired-end reads as input in this workflow, it is recommended that you set this option to True.
- required:
False
- disabled:
False
- hidden:
False
- label:
Consensus slack
- type:
basic:decimal
- description:
The amount of slack allowed in the quasi-mapping consensus mechanism. Normally, a transcript must cover all hits to be considered for mapping. If this is set to a fraction, X, greater than 0 (and in [0,1)), then a transcript can fail to cover up to (100 * X)% of the hits before it is discounted as a mapping candidate. The default value of this option is 0.2 in selective alignment mode and 0 otherwise.
- required:
False
- disabled:
False
- hidden:
False
- label:
Minimum alignment score fraction
- type:
basic:decimal
- description:
The fraction of the optimal possible alignment score that a mapping must achieve in order to be considered valid - should be in (0,1].
- required:
True
- disabled:
False
- hidden:
False
- default:
0.65
- label:
Range factorization bins
- type:
basic:integer
- description:
Factorizes the likelihood used in quantification by adopting a new notion of equivalence classes based on the conditional probabilities with which fragments are generated from different transcripts. This is a more fine-grained factorization than the normal rich equivalence classes. The default value (4) corresponds to the default used in Zakeri et al. 2017 and larger values imply a more fine-grained factorization. If range factorization is enabled, a common value to select for this parameter is 4. A value of 0 signifies the use of basic rich equivalence classes.
- required:
True
- disabled:
False
- hidden:
False
- default:
4
- label:
Minimum number of assigned fragments
- type:
basic:integer
- description:
The minimum number of fragments that must be assigned to the transcriptome for quantification to proceed.
- required:
True
- disabled:
False
- hidden:
False
- default:
10
- label:
--numBootstraps
- type:
basic:integer
- description:
Salmon has the ability to optionally compute bootstrapped abundance estimates. This is done by resampling (with replacement) from the counts assigned to the fragment equivalence classes, and then re-running the optimization procedure, either the EM or VBEM, for each such sample. The values of these different bootstraps allow us to assess technical variance in the main abundance estimates we produce. Such estimates can be useful for downstream (e.g. differential expression) tools that can make use of such uncertainty estimates. This option takes a positive integer that dictates the number of bootstrap samples to compute. The more samples computed, the better the estimates of variance, but the more computation (and time) required.
- required:
False
- disabled:
quantification.num_gibbs_samples
- hidden:
False
- label:
--numGibbsSamples
- type:
basic:integer
- description:
Just as with the bootstrap procedure above, this option produces samples that allow us to estimate the variance in abundance estimates. However, in this case the samples are generated using posterior Gibbs sampling over the fragment equivalence classes rather than bootstrapping. We are currently analyzing these different approaches to assess the potential trade-offs in time / accuracy. The --numBootstraps and --numGibbsSamples options are mutually exclusive (i.e. in a given run, you must set at most one of these options to a positive integer).
- required:
False
- disabled:
quantification.num_bootstraps
- hidden:
False
- label:
Number of reads
- type:
basic:integer
- description:
Number of reads to include in subsampling.
- required:
True
- disabled:
False
- hidden:
False
- default:
10000000
- label:
Seed
- type:
basic:integer
- description:
Using the same random seed makes read subsampling reproducible across different environments.
- required:
True
- disabled:
False
- hidden:
False
- default:
11
- label:
Fraction of reads
- type:
basic:decimal
- description:
Use the fraction of reads [0.0 - 1.0] from the original input file instead of the absolute number of reads. If set, this will override the ‘Number of reads’ input parameter.
- required:
False
- disabled:
False
- hidden:
False
- label:
2-pass mode
- type:
basic:boolean
- description:
Enable two-pass mode when down-sampling. Two-pass mode is twice as slow but with much reduced memory usage.
- required:
True
- disabled:
False
- hidden:
False
- default:
False
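The resampling idea behind --numBootstraps can be illustrated in plain Python. The sketch below draws bootstrap replicates from a toy vector of equivalence-class counts and reports the per-class variance. It omits the EM/VBEM re-optimization that Salmon runs on each replicate, and the function names, counts, and seed are assumptions made for illustration. It also checks the rule that --numBootstraps and --numGibbsSamples may not both be positive.

```python
import random
from statistics import pvariance


def check_sampling_options(num_bootstraps=0, num_gibbs_samples=0):
    """Reject settings where both sampling options are positive."""
    if num_bootstraps > 0 and num_gibbs_samples > 0:
        raise ValueError("set at most one of num_bootstraps and num_gibbs_samples")


def bootstrap_counts(class_counts, num_bootstraps, seed=11):
    """Resample equivalence-class counts with replacement."""
    rng = random.Random(seed)
    total = sum(class_counts)
    class_ids = range(len(class_counts))
    replicates = []
    for _ in range(num_bootstraps):
        # Draw `total` fragments; each lands in a class with probability
        # proportional to that class's observed count.
        draws = rng.choices(class_ids, weights=class_counts, k=total)
        resampled = [0] * len(class_counts)
        for class_id in draws:
            resampled[class_id] += 1
        replicates.append(resampled)
    return replicates


check_sampling_options(num_bootstraps=20, num_gibbs_samples=0)
replicates = bootstrap_counts([50, 30, 20], num_bootstraps=20)
# Variance of each class count across the bootstrap replicates.
variances = [pvariance(column) for column in zip(*replicates)]
print(variances)
```

More replicates give a steadier variance estimate at the cost of more computation, which is the trade-off the parameter description mentions.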
Output results
BED file
- data:bedupload-bed (basic:file src, basic:string species, basic:string build)[Source: v1.5.0]
Import a BED file (.bed) which is a tab-delimited text file that defines a feature track. It can have any file extension, but .bed is recommended. The BED file format is described on the [UCSC Genome Bioinformatics web site](http://genome.ucsc.edu/FAQ/FAQformat#format1).
Input arguments
- label:
BED file
- type:
basic:file
- description:
Upload BED file annotation track. The first three required BED fields are chrom, chromStart and chromEnd.
- required:
True
- validate_regex:
\.(bed|narrowPeak)$
- label:
Species
- type:
basic:string
- description:
Latin name of the species.
- choices:
Homo sapiens:
Homo sapiens
Mus musculus:
Mus musculus
Rattus norvegicus:
Rattus norvegicus
Dictyostelium discoideum:
Dictyostelium discoideum
Odocoileus virginianus texanus:
Odocoileus virginianus texanus
Solanum tuberosum:
Solanum tuberosum
- label:
Genome build
- type:
basic:string
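The file-name check and the three required BED fields listed above can be tried out locally before uploading. This is a minimal Python sketch: the regular expression is the validate_regex shown for the BED file input, while the helper names and the use of a search-style match are assumptions, not the platform's actual validation code.

```python
import re

# Same pattern as the validate_regex listed for the BED file input.
BED_NAME_RE = re.compile(r"\.(bed|narrowPeak)$")


def is_accepted_bed_name(filename):
    """True if the file name ends in .bed or .narrowPeak."""
    return BED_NAME_RE.search(filename) is not None


def parse_bed_line(line):
    """Return (chrom, chromStart, chromEnd) from one tab-delimited BED line."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 3:
        raise ValueError("a BED line needs chrom, chromStart and chromEnd")
    chrom, start, end = fields[0], int(fields[1]), int(fields[2])
    if start < 0 or end < start:
        raise ValueError("chromStart must be >= 0 and chromEnd >= chromStart")
    return chrom, start, end


print(is_accepted_bed_name("peaks.narrowPeak"))
print(parse_bed_line("chr1\t100\t200\tfeature_1\n"))
```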
Output results
- label:
BED file
- type:
basic:file
- label:
Bgzipped BED file for JBrowse
- type:
basic:file
- label:
BED file index for JBrowse
- type:
basic:file
- label:
Species
- type:
basic:string
- label:
Build
- type:
basic:string
BEDPE file
- data:bedpe:upload-bedpe (basic:file src, basic:string species, basic:string build)[Source: v1.3.1]
Upload BEDPE files.
Input arguments
- label:
Select BEDPE file to upload
- type:
basic:file
- required:
True
- disabled:
False
- hidden:
False
- label:
Species
- type:
basic:string
- required:
True
- disabled:
False
- hidden:
False
- choices:
Homo sapiens:
Homo sapiens
Mus musculus:
Mus musculus
Rattus norvegicus:
Rattus norvegicus
Dictyostelium discoideum:
Dictyostelium discoideum
Odocoileus virginianus texanus:
Odocoileus virginianus texanus
Solanum tuberosum:
Solanum tuberosum
- label:
Build
- type:
basic:string
- required:
True
- disabled:
False
- hidden:
False
Output results
- label:
BEDPE file
- type:
basic:file
- required:
True
- disabled:
False
- hidden:
False
- label:
Species
- type:
basic:string
- required:
True
- disabled:
False
- hidden:
False
- label:
Build
- type:
basic:string
- required:
True
- disabled:
False
- hidden:
False
BWA ALN
- data:alignment:bam:bwaalnalignment-bwa-aln (data:index:bwa genome, data:reads:fastq reads, basic:integer q, basic:boolean use_edit, basic:integer edit_value, basic:decimal fraction, basic:boolean seeds, basic:integer seed_length, basic:integer seed_dist)[Source: v2.6.2]
Read aligner for mapping low-divergent sequences against a large reference genome. Designed for Illumina sequence reads up to 100bp.
Input arguments
- label:
Reference genome
- type:
data:index:bwa
- label:
Reads
- type:
data:reads:fastq
- label:
Quality threshold
- type:
basic:integer
- description:
Parameter for dynamic read trimming.
- default:
0
- label:
Use maximum edit distance (excludes fraction of missing alignments)
- type:
basic:boolean
- default:
False
- label:
Maximum edit distance
- type:
basic:integer
- hidden:
!use_edit
- default:
5
- label:
Fraction of missing alignments
- type:
basic:decimal
- description:
The fraction of missing alignments given 2% uniform base error rate. The maximum edit distance is automatically chosen for different read lengths.
- hidden:
use_edit
- default:
0.04
- label:
Use seeds
- type:
basic:boolean
- default:
False
- label:
Seed length
- type:
basic:integer
- description:
Use the first X bases of the read as the seed. If X is larger than the query sequence, seeding will be disabled. For long reads, this option typically ranges from 25 to 35 when the seed maximum edit distance is 2.
- hidden:
!seeds
- default:
35
- label:
Seed maximum edit distance
- type:
basic:integer
- hidden:
!seeds
- default:
2
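The hidden conditions above (!use_edit, use_edit, !seeds) imply that only some inputs take effect in a given run. The Python sketch below shows one way these inputs could translate into bwa aln options. The flags -q, -n, -l and -k come from the bwa aln manual, where -n accepts either an integer edit distance or a fractional missing-alignment rate; the mapping itself and the function name are assumptions and not necessarily what this process does internally.

```python
def build_bwa_aln_options(q=0, use_edit=False, edit_value=5, fraction=0.04,
                          seeds=False, seed_length=35, seed_dist=2):
    """Assemble bwa aln options from the process inputs (illustrative only)."""
    options = ["-q", str(q)]  # quality threshold for read trimming
    if use_edit:
        # Integer value: maximum edit distance.
        options += ["-n", str(int(edit_value))]
    else:
        # Float value: fraction of missing alignments.
        options += ["-n", str(float(fraction))]
    if seeds:
        # Seed length and maximum edit distance within the seed.
        options += ["-l", str(seed_length), "-k", str(seed_dist)]
    return options


print(build_bwa_aln_options())
print(build_bwa_aln_options(use_edit=True, seeds=True))
```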
Output results
- label:
Alignment file
- type:
basic:file
- description:
Position sorted alignment
- label:
Index BAI
- type:
basic:file
- label:
Unmapped reads
- type:
basic:file
- required:
False
- label:
Statistics
- type:
basic:file
- label:
Species
- type:
basic:string
- label:
Build
- type:
basic:string
BWA MEM
- data:alignment:bam:bwamemalignment-bwa-mem (data:index:bwa genome, data:reads:fastq reads, basic:integer seed_l, basic:integer band_w, basic:decimal re_seeding, basic:boolean m, basic:integer match, basic:integer missmatch, basic:integer gap_o, basic:integer gap_e, basic:integer clipping, basic:integer unpaired_p, basic:boolean report_all, basic:integer report_tr)[Source: v3.6.0]
BWA MEM is a read aligner for mapping low-divergent sequences against a large reference genome. It is designed for longer sequences ranging from 70 bp to 1 Mbp. The algorithm works by seeding alignments with maximal exact matches (MEMs) and then extending seeds with the affine-gap Smith-Waterman algorithm (SW). See [here](http://bio-bwa.sourceforge.net/) for more information.
Input arguments
- label:
Reference genome
- type:
data:index:bwa
- label:
Reads
- type:
data:reads:fastq
- label:
Minimum seed length
- type:
basic:integer
- description:
Minimum seed length. Matches shorter than minimum seed length will be missed. The alignment speed is usually insensitive to this value unless it significantly deviates from 20.
- default:
19
- label:
Band width
- type:
basic:integer
- description:
Gaps longer than this will not be found.
- default:
100
- label:
Re-seeding factor
- type:
basic:decimal
- description:
Trigger re-seeding for a MEM longer than minSeedLen*FACTOR. This is a key heuristic parameter for tuning the performance. Larger value yields fewer seeds, which leads to faster alignment speed but lower accuracy.
- default:
1.5
- label:
Mark shorter split hits as secondary
- type:
basic:boolean
- description:
Mark shorter split hits as secondary (for Picard compatibility)
- default:
False
- label:
Score of a match
- type:
basic:integer
- default:
1
- label:
Mismatch penalty
- type:
basic:integer
- default:
4
- label:
Gap open penalty
- type:
basic:integer
- default:
6
- label:
Gap extension penalty
- type:
basic:integer
- default:
1
- label:
Clipping penalty
- type:
basic:integer
- description:
BWA-MEM keeps track of the best score reaching the end of the query. Clipping is not applied if this score is larger than (best Smith-Waterman score) - (Clipping penalty).
- default:
5
- label:
Penalty for an unpaired read pair
- type:
basic:integer
- description:
Affinity to force pairing. An unpaired read pair is scored as scoreRead1 + scoreRead2 - Penalty, so a larger penalty makes pairing more aggressive.
- default:
9
- label:
Report all found alignments
- type:
basic:boolean
- description:
Output all found alignments for single-end or unpaired paired-end reads. These alignments will be flagged as secondary alignments.
- default:
False
- label:
Report threshold score
- type:
basic:integer
- description:
Do not output alignments with a score lower than this value. This option only affects output.
- default:
30
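Several of the defaults above combine into simple arithmetic. The Python sketch below evaluates the re-seeding threshold (minSeedLen*FACTOR), the score of an unpaired read pair (scoreRead1+scoreRead2-Penalty) and an affine gap cost with the default values. The gap-cost formula (open + k * extension for a gap of length k) follows the BWA manual; the function names and the example scores are hypothetical.

```python
def reseed_threshold(min_seed_len=19, reseed_factor=1.5):
    """MEMs longer than this length trigger re-seeding."""
    return min_seed_len * reseed_factor


def unpaired_pair_score(score_read1, score_read2, unpaired_penalty=9):
    """Score given to a read pair when it is treated as unpaired."""
    return score_read1 + score_read2 - unpaired_penalty


def gap_cost(gap_length, gap_open=6, gap_extension=1):
    """Affine gap penalty for a gap of gap_length bases."""
    return gap_open + gap_length * gap_extension


print(reseed_threshold())
print(unpaired_pair_score(60, 50))
print(gap_cost(3))
```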
Output results
- label:
Alignment file
- type:
basic:file
- description:
Position sorted alignment
- label:
Index BAI
- type:
basic:file
- label:
Unmapped reads
- type:
basic:file
- required:
False
- label:
Statistics
- type:
basic:file
- label:
Species
- type:
basic:string
- label:
Build
- type:
basic:string
BWA MEM2
- data:alignment:bam:bwamem2alignment-bwa-mem2 (data:index:bwamem2 genome, data:reads:fastq reads, basic:integer seed_l, basic:integer band_w, basic:decimal re_seeding, basic:boolean m, basic:integer match, basic:integer missmatch, basic:integer gap_o, basic:integer gap_e, basic:integer clipping, basic:integer unpaired_p, basic:boolean report_all, basic:integer report_tr)[Source: v1.3.0]
Bwa-mem2 is the next version of the bwa-mem algorithm in bwa. It produces alignment identical to bwa and is ~1.3-3.1x faster depending on the use-case, dataset and the running machine. See [here](https://github.com/bwa-mem2/bwa-mem2) for more information.
Input arguments
- label:
Reference genome
- type:
data:index:bwamem2
- label:
Reads
- type:
data:reads:fastq
- label:
Minimum seed length
- type:
basic:integer
- description:
Minimum seed length. Matches shorter than minimum seed length will be missed. The alignment speed is usually insensitive to this value unless it significantly deviates from 20.
- default:
19
- label:
Band width
- type:
basic:integer
- description:
Gaps longer than this will not be found.
- default:
100
- label:
Re-seeding factor
- type:
basic:decimal
- description:
Trigger re-seeding for a MEM longer than minSeedLen*FACTOR. This is a key heuristic parameter for tuning the performance. Larger value yields fewer seeds, which leads to faster alignment speed but lower accuracy.
- default:
1.5
- label:
Mark shorter split hits as secondary
- type:
basic:boolean
- description:
Mark shorter split hits as secondary (for Picard compatibility)
- default:
False
- label:
Score of a match
- type:
basic:integer
- default:
1
- label:
Mismatch penalty
- type:
basic:integer
- default:
4
- label:
Gap open penalty
- type:
basic:integer
- default:
6
- label:
Gap extension penalty
- type:
basic:integer
- default:
1
- label:
Clipping penalty
- type:
basic:integer
- description:
BWA-MEM keeps track of the best score reaching the end of the query. Clipping is not applied if this score is larger than (best Smith-Waterman score) - (Clipping penalty).
- default:
5
- label:
Penalty for an unpaired read pair
- type:
basic:integer
- description:
Affinity to force pairing. An unpaired read pair is scored as scoreRead1 + scoreRead2 - Penalty, so a larger penalty makes pairing more aggressive.
- default:
9
- label:
Report all found alignments
- type:
basic:boolean
- description:
Output all found alignments for single-end or unpaired paired-end reads. These alignments will be flagged as secondary alignments.
- default:
False
- label:
Report threshold score
- type:
basic:integer
- description:
Do not output alignments with a score lower than this value. This option only affects output.
- default:
30
Output results
- label:
Alignment file
- type:
basic:file
- description:
Position sorted alignment
- label:
Index BAI
- type:
basic:file
- label:
Unmapped reads
- type:
basic:file
- required:
False
- label:
Statistics
- type:
basic:file
- label:
Species
- type:
basic:string
- label:
Build
- type:
basic:string
BWA SW
- data:alignment:bam:bwaswalignment-bwa-sw (data:index:bwa genome, data:reads:fastq reads, basic:integer match, basic:integer missmatch, basic:integer gap_o, basic:integer gap_e)[Source: v2.5.2]
Read aligner for mapping low-divergent sequences against a large reference genome. Designed for longer sequences ranging from 70 bp to 1 Mbp. The paired-end mode only works for reads from Illumina short-insert libraries.
Input arguments
- label:
Reference genome
- type:
data:index:bwa
- label:
Reads
- type:
data:reads:fastq
- label:
Score of a match
- type:
basic:integer
- default:
1
- label:
Mismatch penalty
- type:
basic:integer
- default:
3
- label:
Gap open penalty
- type:
basic:integer
- default:
5
- label:
Gap extension penalty
- type:
basic:integer
- default:
2
Output results
- label:
Alignment file
- type:
basic:file
- description:
Position sorted alignment
- label:
Index BAI
- type:
basic:file
- label:
Unmapped reads
- type:
basic:file
- required:
False
- label:
Statistics
- type:
basic:file
- label:
Species
- type:
basic:string
- label:
Build
- type:
basic:string
BWA genome index
- data:index:bwa:bwa-index (data:seq:nucleotide ref_seq)[Source: v1.2.0]
Create BWA genome index.
Input arguments
- label:
Reference sequence (nucleotide FASTA)
- type:
data:seq:nucleotide
- required:
True
- disabled:
False
- hidden:
False
Output results
- label:
BWA index
- type:
basic:dir
- required:
True
- disabled:
False
- hidden:
False
- label:
FASTA file (compressed)
- type:
basic:file
- required:
True
- disabled:
False
- hidden:
False
- label:
FASTA file
- type:
basic:file
- required:
True
- disabled:
False
- hidden:
False
- label:
FASTA file index
- type:
basic:file
- required:
True
- disabled:
False
- hidden:
False
- label:
Species
- type:
basic:string
- required:
True
- disabled:
False
- hidden:
False
- label:
Build
- type:
basic:string
- required:
True
- disabled:
False
- hidden:
False
BWA-MEM2 genome index
- data:index:bwamem2:bwamem2-index (data:seq:nucleotide ref_seq)[Source: v1.1.0]
Create BWA-MEM2 genome index.
Input arguments
- label:
Reference sequence (nucleotide FASTA)
- type:
data:seq:nucleotide
- required:
True
- disabled:
False
- hidden:
False
Output results
- label:
BWA-MEM2 index
- type:
basic:dir
- required:
True
- disabled:
False
- hidden:
False
- label:
FASTA file (compressed)
- type:
basic:file
- required:
True
- disabled:
False
- hidden:
False
- label:
FASTA file
- type:
basic:file
- required:
True
- disabled:
False
- hidden:
False
- label:
FASTA file index
- type:
basic:file
- required:
True
- disabled:
False
- hidden:
False
- label:
Species
- type:
basic:string
- required:
True
- disabled:
False
- hidden:
False
- label:
Build
- type:
basic:string
- required:
True
- disabled:
False
- hidden:
False
BWA-MEM2 index files
- data:index:bwamem2:upload-bwamem2-index (basic:file ref_seq, basic:file index_name, basic:string species, basic:string build)[Source: v1.0.0]
Import BWA-MEM2 index files.
Input arguments
- label:
Reference sequence (nucleotide FASTA)
- type:
basic:file
- required:
True
- disabled:
False
- hidden:
False
- label:
BWA-MEM2 index files
- type:
basic:file
- required:
True
- disabled:
False
- hidden:
False
- label:
Species
- type:
basic:string
- description:
Select a species name from the dropdown menu or write a custom species name in the species field. For sequences that are not related to any particular species (e.g. adapters file), you can select the value Other.
- required:
True
- disabled:
False
- hidden:
False
- choices:
Homo sapiens:
Homo sapiens
Mus musculus:
Mus musculus
Rattus norvegicus:
Rattus norvegicus
Macaca mulatta:
Macaca mulatta
Dictyostelium discoideum:
Dictyostelium discoideum
Other:
Other
- label:
Genome build
- type:
basic:string
- required:
True
- disabled:
False
- hidden:
False
Output results
- label:
BWA-MEM2 index
- type:
basic:dir
- required:
True
- disabled:
False
- hidden:
False
- label:
FASTA file (compressed)
- type:
basic:file
- required:
True
- disabled:
False
- hidden:
False
- label:
FASTA file
- type:
basic:file
- required:
True
- disabled:
False
- hidden:
False
- label:
FASTA file index
- type:
basic:file
- required:
True
- disabled:
False
- hidden:
False
- label:
Species
- type:
basic:string
- required:
True
- disabled:
False
- hidden:
False
- label:
Build
- type:
basic:string
- required:
True
- disabled:
False
- hidden:
False
Bam split
- data:alignment:bam:primarybam-split (data:alignment:bam bam, data:sam:header header, data:sam:header header2)[Source: v0.9.1]
Split hybrid bam file into two bam files.
Input arguments
- label:
Hybrid alignment bam
- type:
data:alignment:bam
- label:
Primary header sam file (optional)
- type:
data:sam:header
- description:
If no header file is provided, the headers will be extracted from the hybrid alignment bam file.
- required:
False
- label:
Secondary header sam file (optional)
- type:
data:sam:header
- description:
If no header file is provided, the headers will be extracted from the hybrid alignment bam file.
- required:
False
Output results
- label:
Uploaded file
- type:
basic:file
- label:
Index BAI
- type:
basic:file
- label:
Species
- type:
basic:string
- label:
Build
- type:
basic:string
Bamclipper
- data:alignment:bam:bamclipped:bamclipper (data:alignment:bam alignment, data:bedpe bedpe, basic:boolean skip)[Source: v1.5.1]
Remove primer sequence from BAM alignments by soft-clipping. This process is a wrapper for bamclipper which can be found at https://github.com/tommyau/bamclipper.
Input arguments
- label:
Alignment BAM file
- type:
data:alignment:bam
- required:
True
- disabled:
False
- hidden:
False
- label:
BEDPE file
- type:
data:bedpe
- required:
False
- disabled:
False
- hidden:
False
- label:
Skip Bamclipper step
- type:
basic:boolean
- description:
Use this option to skip Bamclipper step.
- required:
True
- disabled:
False
- hidden:
False
- default:
False
Output results
- label:
Clipped BAM file
- type:
basic:file
- required:
True
- disabled:
False
- hidden:
False
- label:
Index of clipped BAM file
- type:
basic:file
- required:
True
- disabled:
False
- hidden:
False
- label:
Alignment statistics
- type:
basic:file
- required:
True
- disabled:
False
- hidden:
False
- label:
Species
- type:
basic:string
- required:
True
- disabled:
False
- hidden:
False
- label:
Build
- type:
basic:string
- required:
True
- disabled:
False
- hidden:
False
Bamliquidator
- data:bam:plot:bamliquidatorbamliquidator (basic:string analysis_type, list:data:alignment:bam bam, basic:string cell_type, basic:integer bin_size, data:annotation:gtf regions_gtf, data:bed regions_bed, basic:integer extension, basic:string sense, basic:boolean skip_plot, list:basic:string black_list, basic:integer threads)[Source: v0.3.3]
Set of tools for analyzing the density of short DNA sequence read alignments in the BAM file format.
Input arguments
- label:
Analysis type
- type:
basic:string
- default:
bin
- choices:
Bin mode:
bin
Region mode:
region
BED mode:
bed
- label:
BAM File
- type:
list:data:alignment:bam
- label:
Cell type
- type:
basic:string
- default:
cell_type
- label:
Bin size
- type:
basic:integer
- description:
Number of base pairs in each bin. The smaller the bin size the longer the runtime and the larger the data files. Default is 100000.
- required:
False
- hidden:
analysis_type != ‘bin’
- label:
Region gff file / Annotation file (.gff|.gtf)
- type:
data:annotation:gtf
- required:
False
- hidden:
analysis_type != ‘region’
- label:
Region bed file / Annotation file (.bed)
- type:
data:bed
- required:
False
- hidden:
analysis_type != ‘bed’
- label:
Extension
- type:
basic:integer
- description:
Extends reads by number of bp
- default:
200
- label:
Mapping strand to gff file
- type:
basic:string
- default:
.
- choices:
Forward:
+
Reverse:
-
Both:
.
- label:
Skip plot
- type:
basic:boolean
- required:
False
- label:
Black list
- type:
list:basic:string
- description:
One or more chromosome patterns to skip during bin liquidation. Default is to skip any chromosomes that contain any of the following substrings chrUn _random Zv9_ _hap.
- required:
False
- label:
Threads
- type:
basic:integer
- description:
Number of threads to run concurrently during liquidation.
- default:
1
Output results
- label:
Analysis type
- type:
basic:string
- hidden:
True
- label:
Output directory
- type:
basic:file
- label:
Counts HDF5 file
- type:
basic:file
- label:
Matrix file
- type:
basic:file
- required:
False
- hidden:
analysis_type != ‘region’
- label:
Summary file
- type:
basic:file:html
- required:
False
- hidden:
analysis_type != ‘bin’
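The default black-list behaviour described for the Black list input (skip any chromosome whose name contains one of the listed substrings) can be sketched as below. This is only an illustration of substring matching, not Bamliquidator's own code; the function name `keep_chromosome` and the sample chromosome names are hypothetical.

```python
# Default patterns quoted in the "Black list" description above.
DEFAULT_BLACK_LIST = ["chrUn", "_random", "Zv9_", "_hap"]


def keep_chromosome(name, black_list=DEFAULT_BLACK_LIST):
    """Return True if no black-list pattern occurs as a substring of the name."""
    return not any(pattern in name for pattern in black_list)


# Hypothetical chromosome names, used only for illustration.
chromosomes = ["chr1", "chrUn_gl000220", "chr4_random", "chr6_apd_hap1", "chrX"]
kept = [name for name in chromosomes if keep_chromosome(name)]
print(kept)
```

Passing a custom `black_list` replaces the default patterns in this sketch rather than extending them.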
Bamplot
- data:bam:plot:bamplotbamplot (basic:string genome, data:annotation:gtf input_gff, basic:string input_region, list:data:alignment:bam bam, basic:integer stretch_input, basic:string color, basic:string sense, basic:integer extension, basic:boolean rpm, basic:string yscale, list:basic:string names, basic:string plot, basic:string title, basic:string scale, list:data:bed bed, basic:boolean multi_page)[Source: v1.4.3]
Plot a single locus from a bam.
Input arguments
- label:
Genome
- type:
basic:string
- choices:
HG19:
HG19
HG18:
HG18
MM8:
MM8
MM9:
MM9
MM10:
MM10
RN6:
RN6
RN4:
RN4
- label:
Region string
- type:
data:annotation:gtf
- description:
Enter .gff file.
- required:
False
- label:
Region string
- type:
basic:string
- description:
Enter genomic region e.g. chr1:+:1-1000.
- required:
False
- label:
Bam
- type:
list:data:alignment:bam
- description:
bam to plot from
- required:
False
- label:
Stretch-input
- type:
basic:integer
- description:
Stretch the input regions to a minimum length in bp, e.g. 10000 (for 10kb).
- required:
False
- label:
Color
- type:
basic:string
- description:
Enter a colon separated list of colors e.g. 255,0,0:255,125,0, default samples the rainbow.
- default:
255,0,0:255,125,0
- label:
Sense
- type:
basic:string
- description:
Map to forward, reverse or both strands. Default maps to both.
- default:
both
- choices:
Forward:
forward
Reverse:
reverse
Both:
both
- label:
Extension
- type:
basic:integer
- description:
Extends reads by n bp. Default value is 200bp.
- default:
200
- label:
rpm
- type:
basic:boolean
- description:
Normalizes density to reads per million (rpm). Default is False.
- required:
False
- label:
y scale
- type:
basic:string
- description:
Choose either relative or uniform y axis scaling. Default is relative scaling.
- default:
relative
- choices:
relative:
relative
uniform:
uniform
- label:
Names
- type:
list:basic:string
- description:
Enter a comma separated list of names for your bams.
- required:
False
- label:
Single or multiple plot
- type:
basic:string
- description:
Choose either all lines on a single plot or multiple plots.
- default:
merge
- choices:
single:
single
multiple:
multiple
merge:
merge
- label:
Title
- type:
basic:string
- description:
Specify a title for the output plot(s), default will be the coordinate region.
- default:
output
- label:
Scale
- type:
basic:string
- description:
Enter a comma separated list of multiplicative scaling factors for your bams. Default is none.
- required:
False
- label:
Bed
- type:
list:data:bed
- description:
Add a space-delimited list of bed files to plot.
- required:
False
- label:
Multi page
- type:
basic:boolean
- description:
If flagged will create a new pdf for each region.
- default:
False
Output results
- label:
region plot
- type:
basic:file
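The string formats used by the Region string and Color inputs above (for example `chr1:+:1-1000` and `255,0,0:255,125,0`) can be unpacked as in the following sketch. The helper names `parse_region` and `parse_colors` are hypothetical and are not part of Bamplot; the sketch only shows how the documented formats decompose.

```python
def parse_region(region):
    """Split 'chrom:strand:start-end' into its four components."""
    chrom, strand, coords = region.split(":")
    start, end = (int(value) for value in coords.split("-"))
    return chrom, strand, start, end


def parse_colors(colors):
    """Split a colon-separated list of 'R,G,B' triplets into integer tuples."""
    return [tuple(int(v) for v in rgb.split(",")) for rgb in colors.split(":")]


print(parse_region("chr1:+:1-1000"))
print(parse_colors("255,0,0:255,125,0"))
```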
BaseQualityScoreRecalibrator
- data:alignment:bam:bqsr:bqsr (data:alignment:bam bam, data:seq:nucleotide reference, list:data:variants:vcf known_sites, data:bed intervals, basic:string read_group, basic:string validation_stringency, basic:boolean use_original_qualities, basic:integer java_gc_threads, basic:integer max_heap_size)[Source: v2.5.1]
A two-pass process of BaseRecalibrator and ApplyBQSR from GATK. See the GATK website for more information on BaseRecalibrator. Read groups can be modified with GATK’s AddOrReplaceReadGroups through the Replace read groups in BAM (``read_group``) input field.
Input arguments
- label:
BAM file containing reads
- type:
data:alignment:bam
- required:
True
- disabled:
False
- hidden:
False
- label:
Reference genome file
- type:
data:seq:nucleotide
- required:
True
- disabled:
False
- hidden:
False
- label:
List of known sites of variation
- type:
list:data:variants:vcf
- required:
True
- disabled:
False
- hidden:
False
- label:
One or more genomic intervals over which to operate.
- type:
data:bed
- description:
This field is optional, but it can speed up the process by restricting calculations to specific genome regions.
- required:
False
- disabled:
False
- hidden:
False
- label:
Replace read groups in BAM
- type:
basic:string
- description:
Replace read groups in a BAM file. This argument enables the user to replace all read groups in the INPUT file with a single new read group and assign all reads to this read group in the OUTPUT BAM file. Addition or replacement is performed using Picard’s AddOrReplaceReadGroups tool. Input should take the form of -name=value delimited by a “;”, e.g. “-ID=1;-LB=GENIALIS;-PL=ILLUMINA;-PU=BARCODE;-SM=SAMPLENAME1”. See the tool’s documentation for more information on tag names. Note that PL, LB, PU and SM are required fields. See caveats of rewriting read groups in the documentation.
- required:
True
- disabled:
False
- hidden:
False
- default:
- label:
Validation stringency
- type:
basic:string
- description:
Validation stringency for all SAM files read by this program. Setting stringency to SILENT can improve performance when processing a BAM file in which variable-length data (read, qualities, tags) do not otherwise need to be decoded. Default is STRICT. This setting is used in BaseRecalibrator and ApplyBQSR processes.
- required:
True
- disabled:
False
- hidden:
False
- default:
STRICT
- choices:
STRICT:
STRICT
LENIENT:
LENIENT
SILENT:
SILENT
- label:
Use the base quality scores from the OQ tag
- type:
basic:boolean
- description:
This flag tells GATK to use the original base qualities (that were in the data before BQSR/recalibration) which are stored in the OQ tag, if they are present, rather than use the post-recalibration quality scores. If no OQ tag is present for a read, the standard qual score will be used.
- required:
True
- disabled:
False
- hidden:
False
- default:
False
- label:
Java ParallelGCThreads
- type:
basic:integer
- description:
Sets the number of threads used during parallel phases of the garbage collectors.
- required:
True
- disabled:
False
- hidden:
False
- default:
2
- label:
Java maximum heap size (Xmx)
- type:
basic:integer
- description:
Set the maximum Java heap size (in GB).
- required:
True
- disabled:
False
- hidden:
False
- default:
12
Output results
- label:
Base quality score recalibrated BAM file
- type:
basic:file
- required:
True
- disabled:
False
- hidden:
False
- label:
Index of base quality score recalibrated BAM file
- type:
basic:file
- required:
True
- disabled:
False
- hidden:
False
- label:
Alignment statistics
- type:
basic:file
- required:
True
- disabled:
False
- hidden:
False
- label:
Species
- type:
basic:string
- required:
True
- disabled:
False
- hidden:
False
- label:
Build
- type:
basic:string
- required:
True
- disabled:
False
- hidden:
False
- label:
Recalibration table
- type:
basic:file
- required:
True
- disabled:
False
- hidden:
False
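The format expected by the Replace read groups in BAM (``read_group``) input, `-name=value` entries delimited by `;` with PL, LB, PU and SM required, can be checked before submission with a sketch like the one below. The function `parse_read_group` is a hypothetical helper, not part of GATK, Picard or this process; it only mirrors the rules stated in the field description.

```python
# Tags the field description lists as required.
REQUIRED_TAGS = {"PL", "LB", "PU", "SM"}


def parse_read_group(value):
    """Parse '-ID=1;-LB=...;...' into a dict and check the required tags."""
    tags = {}
    for item in value.split(";"):
        item = item.strip()
        if not item:
            continue  # tolerate a trailing delimiter
        if not item.startswith("-") or "=" not in item:
            raise ValueError(f"Malformed read group entry: {item!r}")
        name, tag_value = item[1:].split("=", 1)
        tags[name] = tag_value
    missing = REQUIRED_TAGS - set(tags)
    if missing:
        raise ValueError(f"Missing required tags: {sorted(missing)}")
    return tags


example = "-ID=1;-LB=GENIALIS;-PL=ILLUMINA;-PU=BARCODE;-SM=SAMPLENAME1"
print(parse_read_group(example))
```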
BaseSpace file
- data:file:basespace-file-import (basic:string file_id, basic:secret access_token_secret, basic:string output, basic:integer tries, basic:boolean verbose)[Source: v1.5.1]
Import a file from Illumina BaseSpace.
Input arguments
- label:
BaseSpace file ID
- type:
basic:string
- required:
True
- disabled:
False
- hidden:
False
- label:
BaseSpace access token
- type:
basic:secret
- description:
BaseSpace access token secret handle needed to download the file.
- required:
True
- disabled:
False
- hidden:
False
- label:
Output
- type:
basic:string
- description:
Sets what is printed to standard output. Argument ‘Full’ outputs everything, argument ‘Filename’ outputs only file names of downloaded files.
- required:
True
- disabled:
False
- hidden:
False
- default:
filename
- choices:
Full:
full
Filename:
filename
- label:
Tries
- type:
basic:integer
- description:
Number of tries to download a file before giving up.
- required:
True
- disabled:
False
- hidden:
False
- default:
3
- label:
Verbose
- type:
basic:boolean
- description:
Print detailed exception information to standard output when an error occurs. The Output argument has no effect on this argument.
- required:
True
- disabled:
False
- hidden:
False
- default:
False
Output results
- label:
File with reads
- type:
basic:file
- required:
True
- disabled:
False
- hidden:
False
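The Tries input above bounds how many download attempts are made before giving up. A minimal sketch of that retry pattern follows; `download_with_retries` and the `fetch` callable are hypothetical stand-ins and do not reflect how the BaseSpace import process is actually implemented.

```python
def download_with_retries(fetch, tries=3):
    """Call fetch() up to `tries` times and return its first successful result."""
    last_error = None
    for _attempt in range(tries):
        try:
            return fetch()
        except OSError as error:
            # Remember the failure and try again until the attempts run out.
            last_error = error
    raise RuntimeError(f"Download failed after {tries} tries") from last_error
```

Here `fetch` would be any zero-argument callable that performs one download attempt and raises `OSError` on failure.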
Bedtools bamtobed
- data:bedpe:bedtools-bamtobed (data:alignment:bam alignment)[Source: v1.3.1]
Takes a BAM file and calculates a normalization factor in BEDPE format. The BAM file is sorted with Samtools and transformed with Bedtools.
Input arguments
- label:
Alignment BAM file
- type:
data:alignment:bam
- required:
True
- disabled:
False
- hidden:
False
Output results
- label:
BEDPE file
- type:
basic:file
- required:
True
- disabled:
False
- hidden:
False
- label:
Species
- type:
basic:string
- required:
True
- disabled:
False
- hidden:
False
- label:
Build
- type:
basic:string
- required:
True
- disabled:
False
- hidden:
False
Beta Cut & Run workflow
- data:workflow:cutnrun:workflow-cutnrun-beta (data:reads:fastq:paired reads, basic:integer quality, basic:integer nextseq, basic:integer min_length, list:basic:string adapter_1, list:basic:string adapter_2, data:seq:nucleotide adapter_file_1, data:seq:nucleotide adapter_file_2, basic:string universal_adapter, basic:integer stringency, basic:decimal error_rate, data:index:bowtie2 genome, data:index:bowtie2 spikein_genome, basic:string alignment_mode, basic:string speed, basic:boolean dovetail, basic:boolean rep_se, basic:integer minins, basic:integer maxins, basic:boolean discordantly, basic:boolean no_unal, basic:boolean skip_norm, basic:decimal scale, basic:boolean downsample_reads, basic:integer n_reads, basic:boolean remove_duplicates)[Source: v2.0.0]
Beta Cut & Run workflow. Analysis of samples processed for high-resolution mapping of DNA binding sites using a targeted nuclease strategy. The process is named CUT&RUN, which stands for Cleavage Under Target and Release Using Nuclease. The workflow trims the reads with trimgalore and aligns them with bowtie2 to the target species genome and, optionally, to a spike-in genome. Aligned reads are processed to produce bigwig files to be viewed in a genome browser.
Input arguments
- label:
Input Reads (FASTQ)
- type:
data:reads:fastq:paired
- description:
Paired-end reads in FASTQ file.
- required:
True
- disabled:
False
- hidden:
False
- label:
Quality cutoff
- type:
basic:integer
- description:
Trim low-quality ends from reads based on Phred score. Default: 20.
- required:
True
- disabled:
False
- hidden:
False
- default:
20
- label:
NextSeq/NovaSeq trim cutoff
- type:
basic:integer
- description:
NextSeq/NovaSeq-specific quality trimming. Also trims dark cycles appearing as high-quality G bases. This will set a specific quality cutoff, but qualities of G bases are ignored. This cannot be used with Quality cutoff and will override it.
- required:
False
- disabled:
False
- hidden:
False
- label:
Minimum length after trimming
- type:
basic:integer
- description:
Discard reads that became shorter than selected length because of either quality or adapter trimming. Both reads of a read-pair need to be longer than the specified length to be printed out to validated paired-end files. A value of 0 disables filtering based on length. Default: 20.
- required:
True
- disabled:
False
- hidden:
False
- default:
20
- label:
Read 1 adapter sequence
- type:
list:basic:string
- description:
Adapter sequences to be trimmed. Also see universal adapters field for predefined adapters. This is mutually exclusive with Read 1 adapters file and Universal adapters.
- required:
False
- disabled:
False
- hidden:
False
- default:
[]
- label:
Read 2 adapter sequence
- type:
list:basic:string
- description:
Optional adapter sequence to be trimmed off read 2 of paired-end files. This is mutually exclusive with Read 2 adapters file and Universal adapters.
- required:
False
- disabled:
False
- hidden:
False
- default:
[]
- label:
Read 1 adapters file
- type:
data:seq:nucleotide
- description:
This is mutually exclusive with Read 1 adapters and Universal adapters.
- required:
False
- disabled:
False
- hidden:
False
- label:
Read 2 adapters file
- type:
data:seq:nucleotide
- description:
This is mutually exclusive with Read 2 adapters and Universal adapters.
- required:
False
- disabled:
False
- hidden:
False
- label:
Universal adapters
- type:
basic:string
- description:
Instead of default detection use specific adapters. Use 13bp of the Illumina universal adapter, 12bp of the Nextera adapter or 12bp of the Illumina Small RNA 3’ Adapter. Selecting to trim smallRNA adapters will also lower the min length value to 18bp. If smallRNA libraries are paired-end, then Read 2 adapter will be set to the Illumina small RNA 5’ adapter automatically (GATCGTCGGACT) unless defined explicitly. This is mutually exclusive with manually defined adapters and adapter files.
- required:
False
- disabled:
False
- hidden:
False
- choices:
Illumina:
--illumina
Nextera:
--nextera
Illumina small RNA:
--small_rna
- label:
Overlap with adapter sequence required to trim
- type:
basic:integer
- description:
Defaults to a very stringent setting of 1, i.e. even a single base pair of overlapping sequence will be trimmed off the 3’ end of any read.
- required:
True
- disabled:
False
- hidden:
False
- default:
1
- label:
Maximum allowed error rate
- type:
basic:decimal
- description:
Number of errors divided by the length of the matching region. Default: 0.1.
- required:
True
- disabled:
False
- hidden:
False
- default:
0.1
- label:
Species genome
- type:
data:index:bowtie2
- required:
True
- disabled:
False
- hidden:
False
- label:
Spike-in genome
- type:
data:index:bowtie2
- required:
False
- disabled:
normalization_options.skip_norm == true
- hidden:
False
- label:
Alignment mode
- type:
basic:string
- description:
Local: Some characters may be omitted (‘soft clipped’) from the ends in order to achieve the greatest possible alignment score. End-to-end: Option without any trimming (or ‘soft clipping’) of bases from either end. This option is enabled by default and is suitable if reads have been clipped beforehand.
- required:
True
- disabled:
False
- hidden:
False
- default:
--end-to-end
- choices:
Local:
--local
End-to-end:
--end-to-end
- label:
Speed vs. Sensitivity
- type:
basic:string
- description:
Setting for aligning fast or accurately. Default: Very sensitive.
- required:
True
- disabled:
False
- hidden:
False
- default:
--very-sensitive
- choices:
Very fast:
--very-fast
Fast:
--fast
Sensitive:
--sensitive
Very sensitive:
--very-sensitive
- label:
Dovetail
- type:
basic:boolean
- description:
Mates dovetail when the alignment of one mate extends past the beginning of the other, so that the wrong mate begins upstream. When this option is enabled, dovetailing mates are considered concordant. Default: True.
- required:
True
- disabled:
False
- hidden:
False
- default:
True
- label:
Report single ended
- type:
basic:boolean
- description:
If paired alignment cannot be found, Bowtie2 tries to find alignments for the individual mates. Default: False.
- required:
True
- disabled:
False
- hidden:
False
- default:
False
- label:
Minimal distance
- type:
basic:integer
- description:
The minimum fragment length (--minins) for valid paired-end alignments. Default: 10.
- required:
True
- disabled:
False
- hidden:
False
- default:
10
- label:
Maximal distance
- type:
basic:integer
- description:
The maximum fragment length (--maxins) for valid paired-end alignments. Default: 700.
- required:
True
- disabled:
False
- hidden:
False
- default:
700
- label:
Report discordantly matched read
- type:
basic:boolean
- description:
If both mates have unique alignments, but the alignments do not match paired-end expectations (orientation and relative distance), alignment will still be reported. Useful for detecting structural variations. Default: False.
- required:
True
- disabled:
False
- hidden:
False
- default:
False
- label:
Suppress SAM records for unaligned reads
- type:
basic:boolean
- description:
When enabled, suppress SAM records for unaligned reads. Default: True.
- required:
True
- disabled:
False
- hidden:
False
- default:
True
- label:
Skip normalization
- type:
basic:boolean
- description:
Skip the spike-in normalization step of BigWig output. Use this if you don’t provide a spike-in. Default: False.
- required:
True
- disabled:
False
- hidden:
False
- default:
False
- label:
Scale factor
- type:
basic:decimal
- description:
Magnitude of the scale factor. The scaling factor is calculated by dividing the scale with the number of features in BEDPE (scale/(number of features)). Default: 10000.
- required:
True
- disabled:
normalization_options.skip_norm == true
- hidden:
False
- default:
10000
- label:
Downsample reads
- type:
basic:boolean
- description:
Option to downsample reads before trimming. Default: True
- required:
True
- disabled:
False
- hidden:
False
- default:
True
- label:
Number of reads to downsample
- type:
basic:integer
- description:
Number of reads to downsample from the input FASTQ file. Default: 10M.
- required:
True
- disabled:
downsampling_options.downsample_reads == false
- hidden:
False
- default:
10000000
- label:
Remove duplicates
- type:
basic:boolean
- description:
Option on how to handle duplicate reads. True: Mark and remove duplicate reads. False: Only mark duplicate reads. Note that this option is only available for species genome. In case of spike-in genome, duplicate reads are always removed. Default: False.
- required:
True
- disabled:
False
- hidden:
False
- default:
False
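The Scale factor description above gives the formula scale/(number of features). A worked sketch of that arithmetic follows; the function name `scaling_factor` and the feature count in the example are hypothetical, and the workflow's own implementation may differ in details such as rounding.

```python
def scaling_factor(scale, n_features):
    """Return scale divided by the number of features in the BEDPE file."""
    if n_features <= 0:
        raise ValueError("The number of features must be positive")
    return scale / n_features


# With the default scale of 10000 and a hypothetical 2,500,000 BEDPE features:
factor = scaling_factor(10000, 2_500_000)
print(factor)
```

Under this formula, a sample with more features (for example, more spike-in fragments) receives a smaller factor.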
Output results
Bisulfite conversion rate
- data:wgbs:bsrate:bs-conversion-rate (data:alignment:bam:walt mr, basic:boolean skip, data:seq:nucleotide sequence, basic:boolean count_all, basic:integer read_length, basic:decimal max_mismatch, basic:boolean a_rich)[Source: v1.3.1]
Estimate bisulfite conversion rate in a control set. The program bsrate included in [Methpipe](https://github.com/smithlabcode/methpipe) will estimate the bisulfite conversion rate.
Input arguments
- label:
Aligned reads from bisulfite sequencing
- type:
data:alignment:bam:walt
- description:
A bisulfite-specific alignment such as WALT is required because the .mr file type is used. Duplicates should be removed to reduce any bias introduced by incomplete conversion on PCR duplicate reads.
- required:
True
- disabled:
False
- hidden:
False
- label:
Skip Bisulfite conversion rate step
- type:
basic:boolean
- description:
Bisulfite conversion rate step can be skipped.
- required:
True
- disabled:
False
- hidden:
False
- default:
False
- label:
Unmethylated control sequence
- type:
data:seq:nucleotide
- description:
A separate unmethylated control sequence FASTA file is required to estimate the bisulfite conversion rate.
- required:
False
- disabled:
False
- hidden:
False
- label:
Count all cytosines including CpGs
- type:
basic:boolean
- required:
True
- disabled:
False
- hidden:
False
- default:
True
- label:
Average read length
- type:
basic:integer
- required:
True
- disabled:
False
- hidden:
False
- default:
150
- label:
Maximum fraction of mismatches
- type:
basic:decimal
- required:
False
- disabled:
False
- hidden:
False
- label:
Reads are A-rich
- type:
basic:boolean
- required:
True
- disabled:
False
- hidden:
False
- default:
False
Output results
- label:
Bisulfite conversion rate report
- type:
basic:file
- required:
True
- disabled:
False
- hidden:
False
Bowtie (Dicty)
- data:alignment:bam:bowtie1alignment-bowtie (data:index:bowtie genome, data:reads:fastq reads, basic:string mode, basic:integer m, basic:integer l, basic:boolean use_se, basic:integer trim_5, basic:integer trim_3, basic:integer trim_nucl, basic:integer trim_iter, basic:string r)[Source: v2.5.2]
An ultrafast memory-efficient short read aligner.
Input arguments
- label:
Reference genome
- type:
data:index:bowtie
- label:
Reads
- type:
data:reads:fastq
- label:
Alignment mode
- type:
basic:string
- description:
When the -n option is specified (which is the default), bowtie determines which alignments are valid according to the following policy, which is similar to Maq’s default policy. 1. Alignments may have no more than N mismatches (where N is a number 0-3, set with -n) in the first L bases (where L is a number 5 or greater, set with -l) on the high-quality (left) end of the read. The first L bases are called the “seed”. 2. The sum of the Phred quality values at all mismatched positions (not just in the seed) may not exceed E (set with -e). Where qualities are unavailable (e.g. if the reads are from a FASTA file), the Phred quality defaults to 40. In -v mode, alignments may have no more than V mismatches, where V may be a number from 0 through 3 set using the -v option. Quality values are ignored. The -v option is mutually exclusive with the -n option.
- default:
-n
- choices:
Use qualities (-n):
-n
Use mismatches (-v):
-v
- label:
Allowed mismatches
- type:
basic:integer
- description:
When used with “Use qualities (-n)” it is the maximum number of mismatches permitted in the “seed”, i.e. the first L base pairs of the read (where L is set with -l/--seedlen). This may be 0, 1, 2 or 3 and the default is 2. When used with “Use mismatches (-v)”, report alignments with at most <int> mismatches.
- default:
2
- label:
Seed length (for -n only)
- type:
basic:integer
- description:
Only for “Use qualities (-n)”. Seed length (-l) is the number of bases on the high-quality end of the read to which the -n ceiling applies. The lowest permitted setting is 5 and the default is 28. bowtie is faster for larger values of -l.
- default:
28
- label:
Map as single-ended (for paired end reads only)
- type:
basic:boolean
- description:
If this option is selected paired-end reads will be mapped as single-ended.
- default:
False
- label:
Bases to trim from 5’
- type:
basic:integer
- description:
Number of bases to trim from the 5’ (left) end of each read before alignment.
- default:
0
- label:
Bases to trim from 3’
- type:
basic:integer
- description:
Number of bases to trim from the 3’ (right) end of each read before alignment.
- default:
0
- label:
Bases to trim
- type:
basic:integer
- description:
Number of bases to trim from 3’ end in each iteration.
- default:
2
- label:
Iterations
- type:
basic:integer
- description:
Number of iterations.
- default:
0
- label:
Reporting mode
- type:
basic:string
- description:
Report up to <int> valid alignments per read or pair (-k) (default: 1). Validity of alignments is determined by the alignment policy (combined effects of -n, -v, -l, and -e). If more than one valid alignment exists and the --best and --strata options are specified, then only those alignments belonging to the best alignment “stratum” will be reported. Bowtie is designed to be very fast for small -k but bowtie can become significantly slower as -k increases. If you would like to use Bowtie for larger values of -k, consider building an index with a denser suffix-array sample, i.e. specify a smaller -o/--offrate when invoking bowtie-build for the relevant index (see the Performance tuning section for details).
- default:
-a -m 1 --best --strata
- choices:
Report unique alignments:
-a -m 1 --best --strata
Report all alignments:
-a --best
Report all alignments in the best stratum:
-a --best --strata
Output results
- label:
Alignment file
- type:
basic:file
- description:
Position sorted alignment
- label:
Index BAI
- type:
basic:file
- label:
Unmapped reads
- type:
basic:file
- required:
False
- label:
Statistics
- type:
basic:file
- label:
Species
- type:
basic:string
- label:
Build
- type:
basic:string
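The two alignment policies described for the Alignment mode input above can be summarised in a simplified sketch. It is not Bowtie's implementation: the function names are hypothetical, positions are assumed to be 0-based from the left end of the read, the default of 70 for the quality-sum ceiling (-e) is an assumption not stated in this document, and any internal rounding of quality values is ignored.

```python
def is_valid_n_mode(mismatch_positions, quals, n=2, seed_len=28, max_qual_sum=70):
    """Simplified -n policy: limit seed mismatches and the summed mismatch quality.

    mismatch_positions: 0-based read positions that mismatch the reference.
    quals: Phred quality for every read position.
    max_qual_sum: ceiling set with -e (70 is an assumed default).
    """
    seed_mismatches = sum(1 for pos in mismatch_positions if pos < seed_len)
    qual_sum = sum(quals[pos] for pos in mismatch_positions)
    return seed_mismatches <= n and qual_sum <= max_qual_sum


def is_valid_v_mode(mismatch_positions, v=2):
    """Simplified -v policy: only the number of mismatches counts."""
    return len(mismatch_positions) <= v


# A 36-bp read with mismatches at positions 3 (inside the seed) and 30 (outside).
print(is_valid_n_mode([3, 30], [30] * 36))
print(is_valid_v_mode([3, 30]))
```

The sketch makes the difference between the modes visible: in -n mode a read with few mismatches can still be rejected when the mismatched bases have high quality, while -v mode ignores qualities entirely.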
Bowtie genome index
- data:index:bowtie:bowtie-index (data:seq:nucleotide ref_seq)[Source: v1.2.1]
Create Bowtie genome index.
Input arguments
- label:
Reference sequence (nucleotide FASTA)
- type:
data:seq:nucleotide
- required:
True
- disabled:
False
- hidden:
False
Output results
- label:
Bowtie index
- type:
basic:dir
- required:
True
- disabled:
False
- hidden:
False
- label:
FASTA file (compressed)
- type:
basic:file
- required:
True
- disabled:
False
- hidden:
False
- label:
FASTA file
- type:
basic:file
- required:
True
- disabled:
False
- hidden:
False
- label:
FASTA file index
- type:
basic:file
- required:
True
- disabled:
False
- hidden:
False
- label:
Species
- type:
basic:string
- required:
True
- disabled:
False
- hidden:
False
- label:
Build
- type:
basic:string
- required:
True
- disabled:
False
- hidden:
False
Bowtie2
- data:alignment:bam:bowtie2alignment-bowtie2 (data:index:bowtie2 genome, data:reads:fastq reads, basic:string mode, basic:string speed, basic:boolean use_se, basic:boolean discordantly, basic:boolean rep_se, basic:integer minins, basic:integer maxins, basic:boolean no_overlap, basic:boolean dovetail, basic:integer N, basic:integer L, basic:integer gbar, basic:string mp, basic:string rdg, basic:string rfg, basic:string score_min, basic:integer trim_5, basic:integer trim_3, basic:integer trim_iter, basic:integer trim_nucl, basic:string rep_mode, basic:integer k_reports, basic:boolean no_unal)[Source: v2.8.2]
Bowtie is an ultrafast, memory-efficient short read aligner. It aligns short DNA sequences (reads) to the human genome at a rate of over 25 million 35-bp reads per hour. Bowtie indexes the genome with a Burrows-Wheeler index to keep its memory footprint small, typically about 2.2 GB for the human genome (2.9 GB for paired-end). See [here](http://bowtie-bio.sourceforge.net/index.shtml) for more information.
Input arguments
- label:
Reference genome
- type:
data:index:bowtie2
- label:
Reads
- type:
data:reads:fastq
- label:
Alignment mode
- type:
basic:string
- description:
End to end: Bowtie 2 requires that the entire read align from one end to the other, without any trimming (or “soft clipping”) of characters from either end. local: Bowtie 2 does not require that the entire read align from one end to the other. Rather, some characters may be omitted (“soft clipped”) from the ends in order to achieve the greatest possible alignment score.
- default:
--end-to-end
- choices:
end to end mode:
--end-to-end
local:
--local
- label:
Speed vs. Sensitivity
- type:
basic:string
- description:
A quick setting for aligning fast or accurately. This option is a shortcut for the following parameters. For --end-to-end: --very-fast = -D 5 -R 1 -N 0 -L 22 -i S,0,2.50; --fast = -D 10 -R 2 -N 0 -L 22 -i S,0,2.50; --sensitive = -D 15 -R 2 -N 0 -L 22 -i S,1,1.15 (default); --very-sensitive = -D 20 -R 3 -N 0 -L 20 -i S,1,0.50. For --local: --very-fast-local = -D 5 -R 1 -N 0 -L 25 -i S,1,2.00; --fast-local = -D 10 -R 2 -N 0 -L 22 -i S,1,1.75; --sensitive-local = -D 15 -R 2 -N 0 -L 20 -i S,1,0.75 (default); --very-sensitive-local = -D 20 -R 3 -N 0 -L 20 -i S,1,0.50.
- required:
False
- choices:
Very fast:
--very-fast
Fast:
--fast
Sensitive:
--sensitive
Very sensitive:
--very-sensitive
- label:
Map as single-ended (for paired-end reads only)
- type:
basic:boolean
- description:
If this option is selected paired-end reads will be mapped as single-ended and other paired-end options are ignored.
- default:
False
- label:
Report discordantly matched read
- type:
basic:boolean
- description:
If both mates have unique alignments, but the alignments do not match paired-end expectations (orientation and relative distance) then alignment will be reported. Useful for detecting structural variations.
- default:
True
- label:
Report single ended
- type:
basic:boolean
- description:
If paired alignment can not be found Bowtie2 tries to find alignments for the individual mates.
- default:
True
- label:
Minimal distance
- type:
basic:integer
- description:
The minimum fragment length for valid paired-end alignments. 0 imposes no minimum.
- default:
0
- label:
Maximal distance
- type:
basic:integer
- description:
The maximum fragment length for valid paired-end alignments.
- default:
500
- label:
Not concordant when mates overlap
- type:
basic:boolean
- description:
When true, mates that overlap at all are considered not concordant. Default is false.
- default:
False
- label:
Dovetail
- type:
basic:boolean
- description:
If the mates “dovetail”, that is if one mate alignment extends past the beginning of the other such that the wrong mate begins upstream, consider that to be concordant. Default: mates cannot dovetail in a concordant alignment.
- default:
False
- label:
Number of mismatches allowed in seed alignment (N)
- type:
basic:integer
- description:
Sets the number of mismatches allowed in a seed alignment during multiseed alignment. Can be set to 0 or 1. Setting this higher makes alignment slower (often much slower) but increases sensitivity. Default: 0.
- required:
False
- label:
Length of seed substrings (L)
- type:
basic:integer
- description:
Sets the length of the seed substrings to align during multiseed alignment. Smaller values make alignment slower but more sensitive. By default, the --sensitive preset is used for end-to-end alignment and --sensitive-local for local alignment. See documentation for details.
- required:
False
- label:
Disallow gaps within positions (gbar)
- type:
basic:integer
- description:
Disallow gaps within <int> positions of the beginning or end of the read. Default: 4.
- required:
False
- label:
Maximal and minimal mismatch penalty (mp)
- type:
basic:string
- description:
Sets the maximum (MX) and minimum (MN) mismatch penalties, both integers. A number less than or equal to MX and greater than or equal to MN is subtracted from the alignment score for each position where a read character aligns to a reference character, the characters do not match, and neither is an N. If --ignore-quals is specified, the number subtracted equals MX. Otherwise, the number subtracted is MN + floor((MX-MN)(MIN(Q, 40.0)/40.0)), where Q is the Phred quality value. Default for MX, MN: 6,2.
- required:
False
- label:
Set read gap open and extend penalties (rdg)
- type:
basic:string
- description:
Sets the read gap open (<int1>) and extend (<int2>) penalties. A read gap of length N gets a penalty of <int1> + N * <int2>. Default: 5,3.
- required:
False
- label:
Set reference gap open and extend penalties (rfg)
- type:
basic:string
- description:
Sets the reference gap open (<int1>) and extend (<int2>) penalties. A reference gap of length N gets a penalty of <int1> + N * <int2>. Default: 5,3.
- required:
False
- label:
Minimum alignment score needed for “valid” alignment (score_min)
- type:
basic:string
- description:
Sets a function governing the minimum alignment score needed for an alignment to be considered “valid” (i.e. good enough to report). This is a function of read length. For instance, specifying L,0,-0.6 sets the minimum-score function to f(x) = 0 + -0.6 * x, where x is the read length. The default in --end-to-end mode is L,-0.6,-0.6 and the default in --local mode is G,20,8.
- required:
False
- label:
Bases to trim from 5’
- type:
basic:integer
- description:
Number of bases to trim from the 5’ (left) end of each read before alignment.
- default:
0
- label:
Bases to trim from 3’
- type:
basic:integer
- description:
Number of bases to trim from the 3’ (right) end of each read before alignment.
- default:
0
- label:
Iterations
- type:
basic:integer
- description:
Number of iterations.
- default:
0
- label:
Bases to trim
- type:
basic:integer
- description:
Number of bases to trim from 3’ end in each iteration.
- default:
2
- label:
Report mode
- type:
basic:string
- description:
Default mode: search for multiple alignments, report the best one; -k mode: search for one or more alignments, report each; -a mode: search for and report all alignments
- default:
def
- choices:
Default mode:
def
-k mode:
k
-a mode (very slow):
a
- label:
Number of reports (for -k mode only)
- type:
basic:integer
- description:
Searches for at most X distinct, valid alignments for each read. The search terminates when it can’t find more distinct valid alignments, or when it finds X, whichever happens first. default: 5
- default:
5
- label:
Suppress SAM records for unaligned reads
- type:
basic:boolean
- description:
When true, suppress SAM records for unaligned reads. Default is false.
- default:
False
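The scoring rules quoted in the mp, rdg, rfg and score_min descriptions above can be sketched in a few lines of Python. This is only an illustration of the formulas as written in those descriptions, not Bowtie 2's implementation. The function names are hypothetical, and only the linear (L) form of the minimum-score function is covered because it is the only one spelled out in the text.

```python
import math


def mismatch_penalty(q, mx=6, mn=2, ignore_quals=False):
    """Penalty for one mismatched, non-N position with Phred quality q.

    Follows the formula in the mp description:
    MN + floor((MX - MN) * (min(Q, 40.0) / 40.0)), or MX with --ignore-quals.
    """
    if ignore_quals:
        return mx
    return mn + math.floor((mx - mn) * (min(q, 40.0) / 40.0))


def gap_penalty(length, gap_open=5, gap_extend=3):
    """Penalty for a read or reference gap: <int1> + N * <int2>."""
    return gap_open + length * gap_extend


def min_score_linear(read_length, constant, coefficient):
    """Linear minimum-score function, e.g. L,0,-0.6 -> 0 + -0.6 * x."""
    return constant + coefficient * read_length


if __name__ == "__main__":
    # High-quality mismatches cost more than low-quality ones.
    for q in (10, 20, 30, 40):
        print("Q =", q, "penalty =", mismatch_penalty(q))
    print("gap of length 2:", gap_penalty(2))
    print("min score, 50 bp read, L,0,-0.6:", min_score_linear(50, 0, -0.6))
```

With the default penalties, a mismatch at a base of quality 40 or higher costs the full MX, while a mismatch at quality 0 costs only MN.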
Output results
- label:
Alignment file
- type:
basic:file
- description:
Position sorted alignment
- label:
Index BAI
- type:
basic:file
- label:
Unmapped reads
- type:
basic:file
- required:
False
- label:
Statistics
- type:
basic:file
- label:
Species
- type:
basic:string
- label:
Build
- type:
basic:string
Bowtie2 genome index
- data:index:bowtie2:bowtie2-index (data:seq:nucleotide ref_seq)[Source: v1.2.1]
Create Bowtie2 genome index.
Input arguments
- label:
Reference sequence (nucleotide FASTA)
- type:
data:seq:nucleotide
- required:
True
- disabled:
False
- hidden:
False
Output results
- label:
Bowtie2 index
- type:
basic:dir
- required:
True
- disabled:
False
- hidden:
False
- label:
FASTA file (compressed)
- type:
basic:file
- required:
True
- disabled:
False
- hidden:
False
- label:
FASTA file
- type:
basic:file
- required:
True
- disabled:
False
- hidden:
False
- label:
FASTA file index
- type:
basic:file
- required:
True
- disabled:
False
- hidden:
False
- label:
Species
- type:
basic:string
- required:
True
- disabled:
False
- hidden:
False
- label:
Build
- type:
basic:string
- required:
True
- disabled:
False
- hidden:
False
Calculate coverage (bamCoverage)
- data:coverage:bigwig:calculate-bigwig (data:alignment:bam alignment, data:bedpe bedpe, basic:decimal scale, basic:integer bin_size)[Source: v2.0.1]
Calculate bigWig coverage track. deepTools bamCoverage takes an alignment of reads or fragments as input (BAM file) and generates a coverage track (bigWig) as output. The coverage is calculated as the number of reads per bin, where bins are short consecutive counting windows of a defined size. More information is available in the [bamCoverage documentation](https://deeptools.readthedocs.io/en/latest/content/tools/bamCoverage.html).
Input arguments
- label:
Alignment BAM file
- type:
data:alignment:bam
- required:
True
- disabled:
False
- hidden:
False
- label:
BEDPE Normalization factor
- type:
data:bedpe
- description:
The BEDPE file describes disjoint genome features, such as structural variations or paired-end sequence alignments. It is used to estimate the scale factor [--scaleFactor].
- required:
False
- disabled:
False
- hidden:
False
- label:
Scale for the normalization factor
- type:
basic:decimal
- description:
Magnitude of the scale factor. The scaling factor [--scaleFactor] is calculated by dividing the scale by the number of features in BEDPE (scale/(number of features)).
- required:
True
- disabled:
!bedpe
- hidden:
False
- default:
10000
- label:
Bin size [--binSize]
- type:
basic:integer
- description:
Size of the bins (in bp) for the output bigWig file. A smaller bin size value will result in a higher resolution of the coverage track but also in a larger file size.
- required:
True
- disabled:
False
- hidden:
False
- default:
50
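The two numeric rules described above (scale factor = scale / number of BEDPE features, and coverage counted as reads per fixed-size bin) can be sketched as follows. This is a toy illustration, not the deepTools implementation. The function names are hypothetical, and the code assumes reads are given as half-open (start, end) intervals on one chromosome and that a read is counted in every bin it overlaps.

```python
def scale_factor(scale, n_features):
    """Scale factor as described above: scale / (number of BEDPE features)."""
    if n_features <= 0:
        raise ValueError("number of features must be positive")
    return scale / n_features


def binned_coverage(reads, chrom_length, bin_size=50):
    """Count reads per consecutive bin of bin_size bp.

    reads: iterable of half-open (start, end) intervals (an assumption).
    A read is counted once in every bin it overlaps.
    """
    n_bins = -(-chrom_length // bin_size)  # ceiling division
    counts = [0] * n_bins
    for start, end in reads:
        first = start // bin_size
        last = (end - 1) // bin_size
        for b in range(first, min(last, n_bins - 1) + 1):
            counts[b] += 1
    return counts


if __name__ == "__main__":
    factor = scale_factor(10000, 2_000_000)
    counts = binned_coverage([(0, 30), (40, 60), (120, 130)], 150, bin_size=50)
    # Scaled coverage per bin, as a bigWig track would store it.
    print([c * factor for c in counts])
```

A smaller bin_size yields more bins for the same chromosome length, which is why the resolution and the output file size both grow.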
Output results
- label:
Coverage file (bigWig)
- type:
basic:file
- required:
True
- disabled:
False
- hidden:
False
- label:
Species
- type:
basic:string
- required:
True
- disabled:
False
- hidden:
False
- label:
Build
- type:
basic:string
- required:
True
- disabled:
False
- hidden:
False
Cell Ranger Count
- data:scexpression:10x:cellranger-count (data:screads:10x: reads, data:genomeindex:10x: genome_index, basic:string chemistry, basic:integer trim_r1, basic:integer trim_r2, basic:integer expected_cells, basic:integer force_cells)[Source: v1.2.2]
Perform gene expression analysis. Generate single cell feature counts for a single library. https://support.10xgenomics.com/single-cell-gene-expression/software/pipelines/latest/using/count
Input arguments
- label:
10x reads data object
- type:
data:screads:10x:
- required:
True
- disabled:
False
- hidden:
False
- label:
10x genome index data object
- type:
data:genomeindex:10x:
- required:
True
- disabled:
False
- hidden:
False
- label:
Chemistry
- type:
basic:string
- description:
Assay configuration. By default the assay configuration is detected automatically, which is the recommended mode. You should only specify chemistry if there is an error in automatic detection.
- required:
False
- disabled:
False
- hidden:
False
- default:
auto
- choices:
auto:
auto
threeprime:
Single Cell 3'
fiveprime:
Single Cell 5'
SC3Pv1:
Single Cell 3' v1
SC3Pv2:
Single Cell 3' v2
SC3Pv3:
Single Cell 3' v3
SC5P-PE:
Single Cell 5' paired-end
SC5P-R2:
Single Cell 5' R2-only
- label:
Trim R1
- type:
basic:integer
- description:
Hard-trim the input R1 sequence to this length. Note that the length includes the Barcode and UMI sequences so do not set this below 26 for Single Cell 3’ v2 or Single Cell 5’. This and “Trim R2” are useful for determining the optimal read length for sequencing.
- required:
False
- disabled:
False
- hidden:
False
- label:
Trim R2
- type:
basic:integer
- description:
Hard-trim the input R2 sequence to this length.
- required:
False
- disabled:
False
- hidden:
False
- label:
Expected number of recovered cells
- type:
basic:integer
- required:
True
- disabled:
False
- hidden:
False
- default:
3000
- label:
Force cell number
- type:
basic:integer
- description:
Force pipeline to use this number of cells, bypassing the cell detection algorithm. Use this if the number of cells estimated by Cell Ranger is not consistent with the barcode rank plot.
- required:
False
- disabled:
False
- hidden:
False
Output results
- label:
Matrix (filtered)
- type:
basic:file
- required:
True
- disabled:
False
- hidden:
False
- label:
Genes (filtered)
- type:
basic:file
- required:
True
- disabled:
False
- hidden:
False
- label:
Barcodes (filtered)
- type:
basic:file
- required:
True
- disabled:
False
- hidden:
False
- label:
Matrix (raw)
- type:
basic:file
- required:
True
- disabled:
False
- hidden:
False
- label:
Genes (raw)
- type:
basic:file
- required:
True
- disabled:
False
- hidden:
False
- label:
Barcodes (raw)
- type:
basic:file
- required:
True
- disabled:
False
- hidden:
False
- label:
Report
- type:
basic:file:html
- required:
True
- disabled:
False
- hidden:
False
- label:
Build
- type:
basic:string
- required:
True
- disabled:
False
- hidden:
False
- label:
Species
- type:
basic:string
- required:
True
- disabled:
False
- hidden:
False
- label:
Gene ID source
- type:
basic:string
- required:
True
- disabled:
False
- hidden:
False
Cell Ranger Mkref
- data:genomeindex:10x:cellranger-mkref (data:seq:nucleotide: genome, data:annotation:gtf: annotation)[Source: v2.1.3]
Reference preparation tool for 10x Genomics Cell Ranger. Build a Cell Ranger-compatible reference from genome FASTA and gene GTF files. https://support.10xgenomics.com/single-cell-gene-expression/software/pipelines/latest/advanced/references
Input arguments
- label:
Reference genome
- type:
data:seq:nucleotide:
- required:
True
- disabled:
False
- hidden:
False
- label:
Annotation
- type:
data:annotation:gtf:
- required:
True
- disabled:
False
- hidden:
False
Output results
- label:
Indexed genome
- type:
basic:dir
- required:
True
- disabled:
False
- hidden:
False
- label:
Build
- type:
basic:string
- required:
True
- disabled:
False
- hidden:
False
- label:
Species
- type:
basic:string
- required:
True
- disabled:
False
- hidden:
False
- label:
Gene ID source
- type:
basic:string
- required:
True
- disabled:
False
- hidden:
False
ChIP-Seq (Gene Score)
- data:chipseq:genescorechipseq-genescore (data:chipseq:peakscore peakscore, basic:decimal fdr, basic:decimal pval, basic:decimal logratio)[Source: v1.3.1]
ChIP-Seq analysis - Gene Score (BCM)
Input arguments
- label:
PeakScore file
- type:
data:chipseq:peakscore
- description:
PeakScore file
- label:
FDR threshold
- type:
basic:decimal
- description:
FDR threshold value (default = 0.00005).
- default:
5e-05
- label:
Pval threshold
- type:
basic:decimal
- description:
Pval threshold value (default = 0.00005).
- default:
5e-05
- label:
Log-ratio threshold
- type:
basic:decimal
- description:
Log-ratio threshold value (default = 2).
- default:
2.0
Output results
- label:
Gene Score
- type:
basic:file
ChIP-Seq (Peak Score)
- data:chipseq:peakscorechipseq-peakscore (data:chipseq:callpeak:macs2 peaks, data:bed bed)[Source: v2.3.1]
ChIP-Seq analysis - Peak Score (BCM)
Input arguments
- label:
MACS2 results
- type:
data:chipseq:callpeak:macs2
- description:
MACS2 results file (NarrowPeak)
- label:
BED file
- type:
data:bed
Output results
- label:
Peak Score
- type:
basic:file
ChIP-seq (MACS2)
- data:chipseq:batch:macs2macs2-batch (list:data:alignment:bam alignments, data:bed promoter, basic:boolean tagalign, basic:integer q_threshold, basic:integer n_sub, basic:boolean tn5, basic:integer shift, basic:string duplicates, basic:string duplicates_prepeak, basic:decimal qvalue, basic:decimal pvalue, basic:decimal pvalue_prepeak, basic:integer cap_num, basic:integer mfold_lower, basic:integer mfold_upper, basic:integer slocal, basic:integer llocal, basic:integer extsize, basic:integer shift, basic:integer band_width, basic:boolean nolambda, basic:boolean fix_bimodal, basic:boolean nomodel, basic:boolean nomodel_prepeak, basic:boolean down_sample, basic:boolean bedgraph, basic:boolean spmr, basic:boolean call_summits, basic:boolean broad, basic:decimal broad_cutoff, data:bed blacklist, basic:boolean calculate_enrichment, basic:integer profile_window, basic:string shift_size)[Source: v1.5.1]
This process runs MACS2 in batch mode. MACS2 analysis is triggered for pairs of samples as defined by treatment-background sample relations. If no sample relations are defined, each sample is analyzed individually. Model-based Analysis of ChIP-Seq (MACS 2.0) is used to identify transcription factor binding sites. MACS 2.0 captures the influence of genome complexity to evaluate the significance of enriched ChIP regions, and it improves the spatial resolution of binding sites by combining the information of both sequencing tag position and orientation. It also has an option to link nearby peaks together in order to call broad peaks. See [here](https://github.com/taoliu/MACS/) for more information. In addition to peak calling, this process computes ChIP-Seq and ATAC-Seq QC metrics. The process returns a QC metrics report, fragment length estimation, and a deduplicated tagAlign file. The QC report contains ENCODE 3 proposed QC metrics: [NRF](https://www.encodeproject.org/data-standards/terms/), [PBC bottlenecking coefficients, NSC, and RSC](https://genome.ucsc.edu/ENCODE/qualityMetrics.html#chipSeq).
Input arguments
- label:
Aligned reads
- type:
list:data:alignment:bam
- description:
Select multiple treatment/background samples.
- label:
Promoter regions BED file
- type:
data:bed
- description:
BED file containing promoter regions (TSS ± 1000 bp, for example). Needed to get the number of peaks and reads mapped to promoter regions.
- required:
False
- label:
Use tagAlign files
- type:
basic:boolean
- description:
Use filtered tagAlign files as case (treatment) and control (background) samples. If extsize parameter is not set, run MACS using input’s estimated fragment length.
- default:
True
- label:
Quality filtering threshold
- type:
basic:integer
- default:
30
- label:
Number of reads to subsample
- type:
basic:integer
- default:
15000000
- label:
Tn5 shifting
- type:
basic:boolean
- description:
Tn5 transposon shifting. Shift reads on “+” strand by 4 bp and reads on “-” strand by 5 bp.
- default:
False
- label:
User-defined cross-correlation peak strandshift
- type:
basic:integer
- description:
If defined, SPP tool will not try to estimate fragment length but will use the given value as fragment length.
- required:
False
- label:
Number of duplicates
- type:
basic:string
- description:
Controls how MACS handles duplicate tags at the exact same location (the same coordinates and the same strand). The ‘auto’ option makes MACS calculate the maximum number of tags at the exact same location based on a binomial distribution using 1e-5 as the p-value cutoff, and the ‘all’ option keeps all the tags. If an integer is given, at most this number of tags will be kept at the same location. The MACS default is to keep one tag at the same location.
- required:
False
- hidden:
tagalign
- choices:
1:
1
auto:
auto
all:
all
- label:
Number of duplicates
- type:
basic:string
- description:
Controls how MACS handles duplicate tags at the exact same location (the same coordinates and the same strand). The ‘auto’ option makes MACS calculate the maximum number of tags at the exact same location based on a binomial distribution using 1e-5 as the p-value cutoff, and the ‘all’ option keeps all the tags. If an integer is given, at most this number of tags will be kept at the same location. The MACS default is to keep one tag at the same location.
- required:
False
- hidden:
!tagalign
- default:
all
- choices:
1:
1
auto:
auto
all:
all
- label:
Q-value cutoff
- type:
basic:decimal
- description:
The q-value (minimum FDR) cutoff to call significant regions. Q-values are calculated from p-values using Benjamini-Hochberg procedure.
- required:
False
- disabled:
settings.pvalue && settings.pvalue_prepeak
- label:
P-value cutoff
- type:
basic:decimal
- description:
The p-value cutoff. If specified, MACS2 will use p-value instead of q-value cutoff.
- required:
False
- disabled:
settings.qvalue
- hidden:
tagalign
- label:
P-value cutoff
- type:
basic:decimal
- description:
The p-value cutoff. If specified, MACS2 will use p-value instead of q-value cutoff.
- disabled:
settings.qvalue
- hidden:
!tagalign || settings.qvalue
- default:
1e-05
- label:
Cap number of peaks by taking top N peaks
- type:
basic:integer
- description:
To keep all peaks set value to 0.
- disabled:
settings.broad
- default:
500000
- label:
MFOLD range (lower limit)
- type:
basic:integer
- description:
This parameter is used to select the regions within the MFOLD range of high-confidence enrichment ratio against background to build the model. The regions must be lower than the upper limit and higher than the lower limit of fold enrichment. DEFAULT:10,30 means using all regions not too low (>10) and not too high (<30) to build the paired-peaks model. If MACS cannot find more than 100 regions to build the model, it will use the --extsize parameter to continue the peak detection ONLY if --fix-bimodal is set.
- required:
False
- label:
MFOLD range (upper limit)
- type:
basic:integer
- description:
This parameter is used to select the regions within the MFOLD range of high-confidence enrichment ratio against background to build the model. The regions must be lower than the upper limit and higher than the lower limit of fold enrichment. DEFAULT:10,30 means using all regions not too low (>10) and not too high (<30) to build the paired-peaks model. If MACS cannot find more than 100 regions to build the model, it will use the --extsize parameter to continue the peak detection ONLY if --fix-bimodal is set.
- required:
False
- label:
Small local region
- type:
basic:integer
- description:
The slocal and llocal parameters control which two levels of regions will be checked around the peak regions to calculate the maximum lambda as local lambda. By default, MACS considers 1000 bp for the small local region (--slocal) and 10000 bp for the large local region (--llocal), which captures the bias from a long-range effect like an open chromatin domain. You can tweak these according to your project. Remember that if the region is set too small, a sharp spike in the input data may kill the significant peak.
- required:
False
- label:
Large local region
- type:
basic:integer
- description:
The slocal and llocal parameters control which two levels of regions will be checked around the peak regions to calculate the maximum lambda as local lambda. By default, MACS considers 1000 bp for the small local region (--slocal) and 10000 bp for the large local region (--llocal), which captures the bias from a long-range effect like an open chromatin domain. You can tweak these according to your project. Remember that if the region is set too small, a sharp spike in the input data may kill the significant peak.
- required:
False
- label:
extsize
- type:
basic:integer
- description:
While --nomodel is set, MACS uses this parameter to extend reads in 5’->3’ direction to fix-sized fragments. For example, if the size of the binding region for your transcription factor is 200 bp, and you want to bypass the model building by MACS, this parameter can be set to 200. This option is only valid when --nomodel is set or when MACS fails to build the model and --fix-bimodal is on.
- required:
False
- label:
Shift
- type:
basic:integer
- description:
Note, this is NOT the legacy --shiftsize option, which is replaced by --extsize! You can set an arbitrary shift in bp here. Please use discretion when setting it to anything other than the default value (0). When --nomodel is set, MACS will use this value to move cutting ends (5’) and then apply --extsize from 5’ to 3’ direction to extend them to fragments. When this value is negative, ends will be moved in the 3’->5’ direction, otherwise in the 5’->3’ direction. It is recommended to keep the default of 0 for ChIP-Seq datasets, or to use -1 * half of EXTSIZE together with the --extsize option for detecting enriched cutting loci such as certain DNaseI-Seq datasets. Note, you can’t set values other than 0 if the format is BAMPE for paired-end data. Default is 0.
- required:
False
- label:
Band width
- type:
basic:integer
- description:
The band width which is used to scan the genome ONLY for model building. You can set this parameter as the sonication fragment size expected from wet experiment. The previous side effect on the peak detection process has been removed. So this parameter only affects the model building.
- required:
False
- label:
Use background lambda as local lambda
- type:
basic:boolean
- description:
With this flag on, MACS will use the background lambda as local lambda. This means MACS will not consider the local bias at peak candidate regions.
- default:
False
- label:
Turn on the auto paired-peak model process
- type:
basic:boolean
- description:
Turn on the auto paired-peak model process. If set, when MACS fails to build the paired model, it will use the nomodel settings and the --extsize parameter to extend each tag. If not set, MACS will be terminated if the paired-peak model fails.
- default:
False
- label:
Bypass building the shifting model
- type:
basic:boolean
- description:
While on, MACS will bypass building the shifting model.
- hidden:
tagalign
- default:
False
- label:
Bypass building the shifting model
- type:
basic:boolean
- description:
While on, MACS will bypass building the shifting model.
- hidden:
!tagalign
- default:
True
- label:
Down-sample
- type:
basic:boolean
- description:
When set to true, random sampling method will scale down the bigger sample. By default, MACS uses linear scaling. This option will make the results unstable and irreproducible since each time, random reads would be selected, especially the numbers (pileup, pvalue, qvalue) would change.
- default:
False
- label:
Save fragment pileup and control lambda
- type:
basic:boolean
- description:
If this flag is on, MACS will store the fragment pileup, control lambda, -log10pvalue and -log10qvalue scores in bedGraph files. The bedGraph files will be stored in current directory named NAME+’_treat_pileup.bdg’ for treatment data, NAME+’_control_lambda.bdg’ for local lambda values from control, NAME+’_treat_pvalue.bdg’ for Poisson pvalue scores (in -log10(pvalue) form), and NAME+’_treat_qvalue.bdg’ for q-value scores from Benjamini-Hochberg-Yekutieli procedure.
- default:
True
- label:
Save signal per million reads for fragment pileup profiles
- type:
basic:boolean
- disabled:
settings.bedgraph === false
- default:
True
- label:
Call summits
- type:
basic:boolean
- description:
MACS will now reanalyze the shape of signal profile (p or q-score depending on cutoff setting) to deconvolve subpeaks within each peak called from general procedure. It’s highly recommended to detect adjacent binding events. While used, the output subpeaks of a big peak region will have the same peak boundaries, and different scores and peak summit positions.
- default:
False
- label:
Composite broad regions
- type:
basic:boolean
- description:
When this flag is on, MACS will try to composite broad regions in BED12 (a gene-model-like format) by putting nearby highly enriched regions into a broad region with a loose cutoff. The broad region is controlled by another cutoff through --broad-cutoff. The maximum length of a broad region is 4 times d from MACS.
- disabled:
settings.call_summits === true
- default:
False
- label:
Broad cutoff
- type:
basic:decimal
- description:
Cutoff for broad region. This option is not available unless --broad is set. If -p is set, this is a p-value cutoff, otherwise, it’s a q-value cutoff. DEFAULT = 0.1
- required:
False
- disabled:
settings.call_summits === true || settings.broad !== true
- label:
Blacklist regions
- type:
data:bed
- description:
BED file containing genomic regions that should be excluded from the analysis.
- required:
False
- label:
Calculate enrichment
- type:
basic:boolean
- description:
Calculate enrichment of signal in known genomic annotation. By default, annotation is provided from the TranscriptDB package specified by the genome build, which should match one of the supported annotations (hg19, hg38, hg18, mm10, mm9, rn4, ce6, dm3). If the annotation is not supported, the analysis is skipped.
- default:
False
- label:
Window size
- type:
basic:integer
- description:
An integer indicating the width of the window used for peak profiles. Peaks will be centered on their summits and include half of the window size upstream and half downstream of this point.
- default:
400
- label:
Shift size
- type:
basic:string
- description:
Vector of values to try when computing optimal shift sizes. It should be specified as a vector of consecutive numbers in the form start:end.
- default:
1:300
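Three of the rules stated in the parameter descriptions above (Tn5 shifting, capping duplicate tags, and the summit-centered profile window) are simple enough to sketch in Python. This is a hedged illustration only: the function names and the (chrom, start, end, strand) tag layout are hypothetical, the direction of the Tn5 shift (moving the start of “+” reads forward and the end of “-” reads backward) is an assumption the text does not spell out, and the ‘auto’ duplicate mode is deliberately left unimplemented.

```python
from collections import defaultdict


def tn5_shift(start, end, strand):
    """Shift a read for Tn5 insertion: 4 bp on '+', 5 bp on '-'.

    Assumption: '+' reads have their start moved forward, '-' reads have
    their end moved backward.
    """
    if strand == "+":
        return start + 4, end
    if strand == "-":
        return start, end - 5
    raise ValueError(f"unknown strand: {strand!r}")


def cap_duplicates(tags, keep):
    """Keep at most `keep` tags per identical (chrom, start, end, strand).

    keep: 'all' or an integer (or integer string such as '1').
    """
    if keep == "all":
        return list(tags)
    if keep == "auto":
        raise NotImplementedError("binomial-based limit is not sketched here")
    limit = int(keep)
    seen = defaultdict(int)
    kept = []
    for tag in tags:
        seen[tag] += 1
        if seen[tag] <= limit:
            kept.append(tag)
    return kept


def summit_window(summit, window=400):
    """Window centered on a summit: half upstream, half downstream.

    Clamping the start at 0 is an addition of this sketch.
    """
    half = window // 2
    return max(0, summit - half), summit + half


if __name__ == "__main__":
    tags = [("chr1", 100, 150, "+")] * 3 + [("chr1", 100, 150, "-")]
    print(cap_duplicates(tags, "1"))
    print(tn5_shift(100, 150, "+"), tn5_shift(100, 150, "-"))
    print(summit_window(1000))
```

Tags on opposite strands at the same coordinates are treated as distinct locations, matching the "same coordinates and same strand" wording of the duplicates description.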
Output results
ChIP-seq (MACS2-ROSE2)
- data:chipseq:batch:macs2macs2-rose2-batch (list:data:alignment:bam alignments, data:bed promoter, basic:boolean tagalign, basic:integer q_threshold, basic:integer n_sub, basic:boolean tn5, basic:integer shift, basic:string duplicates, basic:string duplicates_prepeak, basic:decimal qvalue, basic:decimal pvalue, basic:decimal pvalue_prepeak, basic:integer cap_num, basic:integer mfold_lower, basic:integer mfold_upper, basic:integer slocal, basic:integer llocal, basic:integer extsize, basic:integer shift, basic:integer band_width, basic:boolean nolambda, basic:boolean fix_bimodal, basic:boolean nomodel, basic:boolean nomodel_prepeak, basic:boolean down_sample, basic:boolean bedgraph, basic:boolean spmr, basic:boolean call_summits, basic:boolean broad, basic:decimal broad_cutoff, basic:boolean use_filtered_bam, basic:integer tss, basic:integer stitch, data:bed mask, data:bed blacklist, basic:boolean calculate_enrichment, basic:integer profile_window, basic:string shift_size)[Source: v1.5.1]
This process runs MACS2 in batch mode. MACS2 analysis is triggered for pairs of samples as defined by treatment-background sample relations. If no sample relations are defined, each sample is analyzed individually. Model-based Analysis of ChIP-Seq (MACS 2.0) is used to identify transcription factor binding sites. MACS 2.0 captures the influence of genome complexity to evaluate the significance of enriched ChIP regions, and it improves the spatial resolution of binding sites by combining the information of both sequencing tag position and orientation. It also has an option to link nearby peaks together in order to call broad peaks. See [here](https://github.com/taoliu/MACS/) for more information. In addition to peak calling, this process computes ChIP-Seq and ATAC-Seq QC metrics. The process returns a QC metrics report, fragment length estimation, and a deduplicated tagAlign file. The QC report contains ENCODE 3 proposed QC metrics: [NRF](https://www.encodeproject.org/data-standards/terms/), [PBC bottlenecking coefficients, NSC, and RSC](https://genome.ucsc.edu/ENCODE/qualityMetrics.html#chipSeq). To identify super-enhancers, R2 uses the Rank Ordering of Super-Enhancers algorithm (ROSE2). It takes the peaks called by RSEG for acetylation and calculates the distances between them to judge whether they can be considered super-enhancers. The ranked values can be plotted, and super-enhancers can be assigned by locating the inflection point in the resulting graph. It can also be used with MACS-calculated data. See [here](http://younglab.wi.mit.edu/super_enhancer_code.html) for more information.
Input arguments
- label:
Aligned reads
- type:
list:data:alignment:bam
- description:
Select multiple treatment/background samples.
- label:
Promoter regions BED file
- type:
data:bed
- description:
BED file containing promoter regions (TSS ± 1000 bp, for example). Needed to get the number of peaks and reads mapped to promoter regions.
- required:
False
- label:
Use tagAlign files
- type:
basic:boolean
- description:
Use filtered tagAlign files as case (treatment) and control (background) samples. If extsize parameter is not set, run MACS using input’s estimated fragment length.
- default:
True
- label:
Quality filtering threshold
- type:
basic:integer
- default:
30
- label:
Number of reads to subsample
- type:
basic:integer
- default:
15000000
- label:
Tn5 shifting
- type:
basic:boolean
- description:
Tn5 transposon shifting. Shift reads on “+” strand by 4 bp and reads on “-” strand by 5 bp.
- default:
False
- label:
User-defined cross-correlation peak strandshift
- type:
basic:integer
- description:
If defined, SPP tool will not try to estimate fragment length but will use the given value as fragment length.
- required:
False
- label:
Number of duplicates
- type:
basic:string
- description:
Controls how MACS handles duplicate tags at the exact same location (the same coordinates and the same strand). The ‘auto’ option makes MACS calculate the maximum number of tags at the exact same location based on a binomial distribution using 1e-5 as the p-value cutoff, and the ‘all’ option keeps all the tags. If an integer is given, at most this number of tags will be kept at the same location. The MACS default is to keep one tag at the same location.
- required:
False
- hidden:
tagalign
- choices:
1:
1
auto:
auto
all:
all
- label:
Number of duplicates
- type:
basic:string
- description:
Controls how MACS handles duplicate tags at the exact same location (the same coordinates and the same strand). The ‘auto’ option makes MACS calculate the maximum number of tags at the exact same location based on a binomial distribution using 1e-5 as the p-value cutoff, and the ‘all’ option keeps all the tags. If an integer is given, at most this number of tags will be kept at the same location. The MACS default is to keep one tag at the same location.
- required:
False
- hidden:
!tagalign
- default:
all
- choices:
1:
1
auto:
auto
all:
all
- label:
Q-value cutoff
- type:
basic:decimal
- description:
The q-value (minimum FDR) cutoff to call significant regions. Q-values are calculated from p-values using Benjamini-Hochberg procedure.
- required:
False
- disabled:
settings.pvalue && settings.pvalue_prepeak
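As a worked illustration of how q-values follow from p-values under the Benjamini-Hochberg procedure named above, here is a minimal sketch. It shows the textbook procedure on a plain list of p-values and is not MACS2's internal implementation; `benjamini_hochberg` is a hypothetical helper.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values (q-values) in input order."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    qvalues = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down so the q-values stay monotone.
    for rank in range(n, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * n / rank)
        qvalues[idx] = running_min
    return qvalues


qvalues = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
# Regions whose q-value does not exceed the cutoff are called significant.
significant = [q <= 0.05 for q in qvalues]
print(qvalues, significant)
```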
- label:
P-value cutoff
- type:
basic:decimal
- description:
The p-value cutoff. If specified, MACS2 will use p-value instead of q-value cutoff.
- required:
False
- disabled:
settings.qvalue
- hidden:
tagalign
- label:
P-value cutoff
- type:
basic:decimal
- description:
The p-value cutoff. If specified, MACS2 will use p-value instead of q-value cutoff.
- disabled:
settings.qvalue
- hidden:
!tagalign || settings.qvalue
- default:
1e-05
- label:
Cap number of peaks by taking top N peaks
- type:
basic:integer
- description:
To keep all peaks set value to 0.
- disabled:
settings.broad
- default:
500000
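The capping rule above (keep the top N peaks, with 0 meaning no cap) might be sketched as follows. The ranking key is an assumption: this description does not say which peak column is used for ranking, so the `score` field and the `cap_peaks` helper are hypothetical.

```python
def cap_peaks(peaks, cap_num=500000, score_key="score"):
    """Keep the cap_num highest-scoring peaks; cap_num == 0 keeps all peaks."""
    if cap_num == 0:
        return list(peaks)
    # Assumption: peaks are ranked by a numeric score, highest first.
    ranked = sorted(peaks, key=lambda peak: peak[score_key], reverse=True)
    return ranked[:cap_num]


peaks = [
    {"name": "peak_a", "score": 5.0},
    {"name": "peak_b", "score": 9.0},
    {"name": "peak_c", "score": 7.0},
]
print([p["name"] for p in cap_peaks(peaks, cap_num=2)])  # ['peak_b', 'peak_c']
```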
- label:
MFOLD range (lower limit)
- type:
basic:integer
- description:
This parameter is used to select the regions within the MFOLD range of high-confidence enrichment ratio against background to build the model. The fold enrichment of the regions must be lower than the upper limit and higher than the lower limit. DEFAULT:10,30 means using all regions not too low (>10) and not too high (<30) to build the paired-peaks model. If MACS cannot find more than 100 regions to build the model, it will use the --extsize parameter to continue the peak detection ONLY if --fix-bimodal is set.
- required:
False
- label:
MFOLD range (upper limit)
- type:
basic:integer
- description:
This parameter is used to select the regions within the MFOLD range of high-confidence enrichment ratio against background to build the model. The fold enrichment of the regions must be lower than the upper limit and higher than the lower limit. DEFAULT:10,30 means using all regions not too low (>10) and not too high (<30) to build the paired-peaks model. If MACS cannot find more than 100 regions to build the model, it will use the --extsize parameter to continue the peak detection ONLY if --fix-bimodal is set.
- required:
False
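The MFOLD selection rule described for the two limits can be sketched as a simple filter. This is a simplification under stated assumptions: regions are represented only by their fold-enrichment value, whereas MACS works on candidate paired peaks, and `select_model_regions` is a hypothetical helper.

```python
def select_model_regions(fold_enrichments, lower=10, upper=30, min_regions=100):
    """Select regions inside the MFOLD range and report if a model can be built."""
    # Keep regions that are not too low (> lower) and not too high (< upper).
    selected = [fe for fe in fold_enrichments if lower < fe < upper]
    # Per the description, more than 100 regions are needed to build the model.
    can_build_model = len(selected) > min_regions
    return selected, can_build_model


selected, ok = select_model_regions([5, 12, 29.5, 30, 45])
print(selected, ok)  # [12, 29.5] False
```

When `can_build_model` is false, the description above implies that peak detection continues with the fixed extension size only if the bimodal fallback is enabled.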
- label:
Small local region
- type:
basic:integer
- description:
Slocal and llocal parameters control which two levels of regions will be checked around the peak regions to calculate the maximum lambda as local lambda. By default, MACS considers 1000 bp for small local region (--slocal), and 10000 bp for large local region (--llocal) which captures the bias from a long range effect like an open chromatin domain. You can tweak these according to your project. Remember that if the region is set too small, a sharp spike in the input data may kill the significant peak.
- required:
False
- label:
Large local region
- type:
basic:integer
- description:
Slocal and llocal parameters control which two levels of regions will be checked around the peak regions to calculate the maximum lambda as local lambda. By default, MACS considers 1000 bp for small local region (--slocal), and 10000 bp for large local region (--llocal) which captures the bias from a long range effect like an open chromatin domain. You can tweak these according to your project. Remember that if the region is set too small, a sharp spike in the input data may kill the significant peak.
- required:
False
- label:
extsize
- type:
basic:integer
- description:
While ‘--nomodel’ is set, MACS uses this parameter to extend reads in the 5’->3’ direction to fixed-size fragments. For example, if the size of the binding region for your transcription factor is 200 bp, and you want to bypass the model building by MACS, this parameter can be set to 200. This option is only valid when --nomodel is set or when MACS fails to build the model and --fix-bimodal is on.
- required:
False
- label:
Shift
- type:
basic:integer
- description:
Note, this is NOT the legacy --shiftsize option, which is replaced by --extsize! You can set an arbitrary shift in bp here. Please use discretion when setting it to anything other than the default value (0). When --nomodel is set, MACS will use this value to move cutting ends (5’) and then apply --extsize in the 5’ to 3’ direction to extend them to fragments. When this value is negative, ends will be moved in the 3’->5’ direction, otherwise in the 5’->3’ direction. It is recommended to keep the default of 0 for ChIP-Seq datasets, or to use -1 * half of EXTSIZE together with the --extsize option for detecting enriched cutting loci such as certain DNAseI-Seq datasets. Note, you can’t set values other than 0 if the format is BAMPE for paired-end data. Default is 0.
- required:
False
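The interplay of the shift and extension size described above can be made concrete with a small sketch. It assumes each read is reduced to its 5' cutting end plus a strand, and that the 5'->3' direction means increasing coordinates on the "+" strand and decreasing coordinates on the "-" strand; `extend_cut_site` is a hypothetical helper, not MACS code.

```python
def extend_cut_site(five_prime, strand, shift=0, extsize=200):
    """Return (start, end) of the fragment built from a read's 5' end."""
    if strand == "+":
        # A positive shift moves the end 5'->3', i.e. to higher coordinates.
        start = five_prime + shift
        return start, start + extsize
    if strand == "-":
        # On the minus strand 5'->3' points toward lower coordinates.
        end = five_prime - shift
        return end - extsize, end
    raise ValueError(f"unknown strand: {strand!r}")


# shift = -1 * half of extsize centers the fragment on the cutting site.
print(extend_cut_site(1000, "+", shift=-100, extsize=200))  # (900, 1100)
print(extend_cut_site(1000, "-", shift=-100, extsize=200))  # (900, 1100)
```

With the recommended cutting-loci setting, both strands produce a fragment centered on the cut position, which is why that combination suits enrichment of cutting sites rather than fragment bodies.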
- label:
Band width
- type:
basic:integer
- description:
The band width which is used to scan the genome ONLY for model building. You can set this parameter as the sonication fragment size expected from wet experiment. The previous side effect on the peak detection process has been removed. So this parameter only affects the model building.
- required:
False
- label:
Use background lambda as local lambda
- type:
basic:boolean
- description:
With this flag on, MACS will use the background lambda as local lambda. This means MACS will not consider the local bias at peak candidate regions.
- default:
False
- label:
Turn on the auto paired-peak model process
- type:
basic:boolean
- description:
Turn on the auto paired-peak model process. If set, when MACS fails to build the paired-peak model, it falls back to the nomodel settings and uses the ‘--extsize’ parameter to extend each tag. If not set, MACS terminates when the paired-peak model fails.
- default:
False
- label:
Bypass building the shifting model
- type:
basic:boolean
- description:
While on, MACS will bypass building the shifting model.
- hidden:
tagalign
- default:
False
- label:
Bypass building the shifting model
- type:
basic:boolean
- description:
While on, MACS will bypass building the shifting model.
- hidden:
!tagalign
- default:
True
- label:
Down-sample
- type:
basic:boolean
- description:
When set to true, random sampling method will scale down the bigger sample. By default, MACS uses linear scaling. This option will make the results unstable and irreproducible since each time, random reads would be selected, especially the numbers (pileup, pvalue, qvalue) would change.
- default:
False
- label:
Save fragment pileup and control lambda
- type:
basic:boolean
- description:
If this flag is on, MACS will store the fragment pileup, control lambda, -log10pvalue and -log10qvalue scores in bedGraph files. The bedGraph files will be stored in current directory named NAME+’_treat_pileup.bdg’ for treatment data, NAME+’_control_lambda.bdg’ for local lambda values from control, NAME+’_treat_pvalue.bdg’ for Poisson pvalue scores (in -log10(pvalue) form), and NAME+’_treat_qvalue.bdg’ for q-value scores from Benjamini-Hochberg-Yekutieli procedure.
- default:
True
- label:
Save signal per million reads for fragment pileup profiles
- type:
basic:boolean
- disabled:
settings.bedgraph === false
- default:
True
- label:
Call summits
- type:
basic:boolean
- description:
MACS will now reanalyze the shape of signal profile (p or q-score depending on cutoff setting) to deconvolve subpeaks within each peak called from general procedure. It’s highly recommended to detect adjacent binding events. While used, the output subpeaks of a big peak region will have the same peak boundaries, and different scores and peak summit positions.
- default:
False
- label:
Composite broad regions
- type:
basic:boolean
- description:
When this flag is on, MACS will try to composite broad regions in BED12 (a gene-model-like format) by putting nearby highly enriched regions into a broad region with a loose cutoff. The broad region is controlled by another cutoff through --broad-cutoff. The maximum length of a broad region is 4 times d from MACS.
- disabled:
settings.call_summits === true
- default:
False
- label:
Broad cutoff
- type:
basic:decimal
- description:
Cutoff for broad region. This option is not available unless --broad is set. If -p is set, this is a p-value cutoff, otherwise, it’s a q-value cutoff. DEFAULT = 0.1
- required:
False
- disabled:
settings.call_summits === true || settings.broad !== true
- label:
Use Filtered BAM File
- type:
basic:boolean
- description:
Rank enhancers using the filtered BAM file from a MACS2 object.
- default:
True
- label:
TSS exclusion
- type:
basic:integer
- description:
Enter a distance from TSS to exclude. 0 = no TSS exclusion
- default:
0
- label:
Stitch
- type:
basic:integer
- description:
Enter a max linking distance for stitching. If not given, optimal stitching parameter will be determined automatically.
- required:
False
- label:
Masking BED file
- type:
data:bed
- description:
Mask a set of regions from analysis. Provide a BED of masking regions.
- required:
False
- label:
Blacklist regions
- type:
data:bed
- description:
BED file containing genomic regions that should be excluded from the analysis.
- required:
False
- label:
Calculate enrichment
- type:
basic:boolean
- description:
Calculate enrichment of signal in known genomic annotation. By default, annotation is provided from the TranscriptDB package specified by the genome build, which should match one of the supported annotations (hg19, hg38, hg18, mm10, mm9, rn4, ce6, dm3). If the annotation is not supported, the analysis is skipped.
- default:
False
- label:
Window size
- type:
basic:integer
- description:
An integer indicating the width of the window used for peak profiles. Peaks will be centered on their summits and include half of the window size upstream and half downstream of this point.
- default:
400
- label:
Shift size
- type:
basic:string
- description:
Vector of values to try when computing optimal shift sizes. It should be specified as a vector of consecutive numbers in the form start:end.
- default:
1:300
Output results
Chemical Mutagenesis
- data:workflow:chemutworkflow-chemut (basic:string analysis_type, data:seq:nucleotide genome, list:data:alignment:bam parental_strains, list:data:alignment:bam mutant_strains, basic:boolean base_recalibration, data:variants:vcf known_sites, list:data:variants:vcf known_indels, basic:integer stand_call_conf, basic:integer mbq, basic:integer read_depth)[Source: v2.1.0]
Input arguments
- label:
Analysis type
- type:
basic:string
- description:
Choice of the analysis type. Use “SNV” or “INDEL” options to run the GATK analysis only on the haploid portion of the dicty genome. Choose options SNV_CHR2 or INDEL_CHR2 to run the analysis only on the diploid portion of CHR2 (-ploidy 2 -L chr2:2263132-3015703).
- default:
snv
- choices:
SNV:
snv
INDEL:
indel
SNV_CHR2:
snv_chr2
INDEL_CHR2:
indel_chr2
- label:
Reference genome
- type:
data:seq:nucleotide
- label:
Parental strains
- type:
list:data:alignment:bam
- label:
Mutant strains
- type:
list:data:alignment:bam
- label:
Do variant base recalibration
- type:
basic:boolean
- default:
False
- label:
Known sites (dbSNP)
- type:
data:variants:vcf
- required:
False
- label:
Known indels
- type:
list:data:variants:vcf
- required:
False
- hidden:
Vc.base_recalibration === false
- label:
Calling confidence threshold
- type:
basic:integer
- description:
The minimum confidence threshold (phred-scaled) at which the program should emit variant sites as called. If a site’s associated genotype has a confidence score lower than the calling threshold, the program will emit the site as filtered and will annotate it as LowQual. This threshold separates high confidence calls from low confidence calls.
- default:
30
- label:
Min base quality
- type:
basic:integer
- description:
Minimum base quality required to consider a base for calling.
- default:
10
- label:
Read depth cutoff
- type:
basic:integer
- description:
The minimum number of replicate reads required for a variant site to be included.
- default:
5
Output results
ChipQC
- data:chipqc:chipqc (data:alignment:bam alignment, data:chipseq:callpeak peaks, data:bed blacklist, basic:boolean calculate_enrichment, basic:integer quality_threshold, basic:integer profile_window, basic:string shift_size)[Source: v1.4.2]
Calculate quality control metrics for ChIP-seq samples. The analysis is based on the ChIPQC package, which computes a variety of quality control metrics and statistics, and provides plots and a report for assessing experimental data before further analysis.
Input arguments
- label:
Aligned reads
- type:
data:alignment:bam
- required:
True
- disabled:
False
- hidden:
False
- label:
Called peaks
- type:
data:chipseq:callpeak
- required:
True
- disabled:
False
- hidden:
False
- label:
Blacklist regions
- type:
data:bed
- description:
BED file containing genomic regions that should be excluded from the analysis.
- required:
False
- disabled:
False
- hidden:
False
- label:
Calculate enrichment
- type:
basic:boolean
- description:
Calculate enrichment of signal in known genomic annotation. By default, annotation is provided from the TranscriptDB package specified by the genome build, which should match one of the supported annotations (hg19, hg38, hg18, mm10, mm9, rn4, ce6, dm3). If the annotation is not supported, the analysis is skipped.
- required:
True
- disabled:
False
- hidden:
False
- default:
False
- label:
Mapping quality threshold
- type:
basic:integer
- description:
Only reads with mapping quality scores above this threshold will be used for some statistics.
- required:
True
- disabled:
False
- hidden:
False
- default:
15
- label:
Window size
- type:
basic:integer
- description:
An integer indicating the width of the window used for peak profiles. Peaks will be centered on their summits and include half of the window size upstream and half downstream of this point.
- required:
True
- disabled:
False
- hidden:
False
- default:
400
- label:
Shift size
- type:
basic:string
- description:
Vector of values to try when computing optimal shift sizes. It should be specified as a vector of consecutive numbers in the form start:end.
- required:
True
- disabled:
False
- hidden:
False
- default:
1:300
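The window size and shift size inputs above can be illustrated with a short sketch. It assumes that the `start:end` specification is inclusive on both ends, as in R's colon operator; `peak_window` and `parse_shift_size` are hypothetical helpers and not part of the ChIPQC package.

```python
def peak_window(summit, window_size=400):
    """Return the (start, end) window centered on a peak summit."""
    half = window_size // 2
    # Half of the window lies upstream and half downstream of the summit.
    return summit - half, summit + half


def parse_shift_size(spec="1:300"):
    """Parse a 'start:end' string into the consecutive shift values to try."""
    start, end = (int(part) for part in spec.split(":"))
    if start > end:
        raise ValueError(f"start must not exceed end: {spec!r}")
    # Assumption: the end value is included in the vector.
    return range(start, end + 1)


print(peak_window(1000))               # (800, 1200)
print(len(parse_shift_size("1:300")))  # 300
```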
Output results
- label:
ChipQC report folder
- type:
basic:dir
- required:
True
- disabled:
False
- hidden:
False
- label:
Cross coverage score plot
- type:
basic:file
- required:
True
- disabled:
False
- hidden:
False
- label:
SSD metric plot
- type:
basic:file
- required:
True
- disabled:
False
- hidden:
False
- label:
Peak profile plot
- type:
basic:file
- required:
True
- disabled:
False
- hidden:
False
- label:
Barplot of reads in peaks
- type:
basic:file
- required:
True
- disabled:
False
- hidden:
False
- label:
Density plot of reads in peaks
- type:
basic:file
- required:
True
- disabled:
False
- hidden:
False
- label:
Heatmap of reads in genomic features
- type:
basic:file
- required:
False
- disabled:
False
- hidden:
False
- label:
Species
- type:
basic:string
- required:
True
- disabled:
False
- hidden:
False
- label:
Build
- type:
basic:string
- required:
True
- disabled:
False
- hidden:
False
Convert GFF3 to GTF
- data:annotation:gtfgff-to-gtf (data:annotation:gff3 annotation)[Source: v0.6.0]
Convert GFF3 file to GTF format.
Input arguments
- label:
Annotation (GFF3)
- type:
data:annotation:gff3
- description:
Annotation in GFF3 format.
Output results
- label:
Converted GTF file
- type:
basic:file
- label:
Sorted GTF file
- type:
basic:file
- label:
Igv index for sorted GTF file
- type:
basic:file
- label:
Jbrowse track for sorted GTF
- type:
basic:file
- label:
Gene ID database
- type:
basic:string
- label:
Species
- type:
basic:string
- label:
Build
- type:
basic:string
Convert files to reads (paired-end)
- data:reads:fastq:paired:files-to-fastq-paired (list:data:file src1, list:data:file src2, basic:boolean merge_lanes)[Source: v1.6.0]
Convert FASTQ files to paired-end reads.
Input arguments
- label:
Mate1
- type:
list:data:file
- required:
True
- disabled:
False
- hidden:
False
- label:
Mate2
- type:
list:data:file
- required:
True
- disabled:
False
- hidden:
False
- label:
Merge lanes
- type:
basic:boolean
- description:
Merge sample data split into multiple sequencing lanes into a single FASTQ file.
- required:
True
- disabled:
False
- hidden:
False
- default:
False
Output results
- label:
Reads file (mate 1)
- type:
list:basic:file
- required:
True
- disabled:
False
- hidden:
False
- label:
Reads file (mate 2)
- type:
list:basic:file
- required:
True
- disabled:
False
- hidden:
False
- label:
Quality control with FastQC (Upstream)
- type:
list:basic:file:html
- required:
True
- disabled:
False
- hidden:
False
- label:
Quality control with FastQC (Downstream)
- type:
list:basic:file:html
- required:
True
- disabled:
False
- hidden:
False
- label:
Download FastQC archive (Upstream)
- type:
list:basic:file
- required:
True
- disabled:
False
- hidden:
False
- label:
Download FastQC archive (Downstream)
- type:
list:basic:file
- required:
True
- disabled:
False
- hidden:
False
Convert files to reads (single-end)
- data:reads:fastq:single:files-to-fastq-single (list:data:file src, basic:boolean merge_lanes)[Source: v1.6.0]
Convert FASTQ files to single-end reads.
Input arguments
- label:
Reads
- type:
list:data:file
- description:
Sequencing reads in FASTQ format
- required:
True
- disabled:
False
- hidden:
False
- label:
Merge lanes
- type:
basic:boolean
- description:
Merge sample data split into multiple sequencing lanes into a single FASTQ file.
- required:
True
- disabled:
False
- hidden:
False
- default:
False
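Lane merging as described above amounts to concatenating the per-lane FASTQ files of one sample. The following is a minimal sketch of that idea, not the process's actual implementation; `merge_lanes` is a hypothetical helper. Byte-wise concatenation also works for gzip-compressed inputs, provided every lane file uses the same compression.

```python
import shutil
from pathlib import Path


def merge_lanes(lane_files, merged_path):
    """Concatenate per-lane FASTQ files, in the given order, into one file."""
    with open(merged_path, "wb") as merged:
        for lane in lane_files:
            with open(lane, "rb") as source:
                # Stream the lane file into the merged file without loading it.
                shutil.copyfileobj(source, merged)
    return Path(merged_path)
```

For paired-end data, the same lane order would have to be used for both mates so that read pairs stay aligned between the two merged files.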
Output results
- label:
Reads file
- type:
list:basic:file
- required:
True
- disabled:
False
- hidden:
False
- label:
Quality control with FastQC
- type:
list:basic:file:html
- required:
True
- disabled:
False
- hidden:
False
- label:
Download FastQC archive
- type:
list:basic:file
- required:
True
- disabled:
False
- hidden:
False
Cuffdiff 2.2
- data:differentialexpression:cuffdiff:cuffdiff (list:data:cufflinks:cuffquant case, list:data:cufflinks:cuffquant control, list:basic:string labels, data:annotation annotation, data:seq:nucleotide genome, basic:boolean multi_read_correct, basic:boolean create_sets, basic:decimal gene_logfc, basic:decimal gene_fdr, basic:decimal fdr, basic:string library_type, basic:string library_normalization, basic:string dispersion_method)[Source: v3.4.0]
Run Cuffdiff 2.2 analysis. Cuffdiff finds significant changes in transcript expression, splicing, and promoter use. You can use it to find differentially expressed genes and transcripts, as well as genes that are being differentially regulated at the transcriptional and post-transcriptional level. See [here](http://cole-trapnell-lab.github.io/cufflinks/cuffdiff/) and [here](https://software.broadinstitute.org/cancer/software/genepattern/modules/docs/Cuffdiff/7) for more information.
Input arguments
- label:
Case samples
- type:
list:data:cufflinks:cuffquant
- required:
True
- disabled:
False
- hidden:
False
- label:
Control samples
- type:
list:data:cufflinks:cuffquant
- required:
True
- disabled:
False
- hidden:
False
- label:
Group labels
- type:
list:basic:string
- description:
Define labels for each sample group.
- required:
True
- disabled:
False
- hidden:
False
- default:
['control', 'case']
- label:
Annotation (GTF/GFF3)
- type:
data:annotation
- description:
A transcript annotation file produced by cufflinks, cuffcompare, or other tool.
- required:
True
- disabled:
False
- hidden:
False
- label:
Run bias detection and correction algorithm
- type:
data:seq:nucleotide
- description:
Provide Cufflinks with a multifasta file (genome file) via this option to instruct it to run a bias detection and correction algorithm which can significantly improve accuracy of transcript abundance estimates.
- required:
False
- disabled:
False
- hidden:
False
- label:
Do initial estimation procedure to more accurately weight reads with multiple genome mappings
- type:
basic:boolean
- required:
True
- disabled:
False
- hidden:
False
- default:
False
- label:
Create gene sets
- type:
basic:boolean
- description:
After calculating differential gene expressions create gene sets for up-regulated genes, down-regulated genes and all genes.
- required:
True
- disabled:
False
- hidden:
False
- default:
False
- label:
Log2 fold change threshold for gene sets
- type:
basic:decimal
- description:
Genes above Log2FC are considered as up-regulated and genes below -Log2FC as down-regulated.
- required:
True
- disabled:
False
- hidden:
!create_sets
- default:
1.0
- label:
FDR threshold for gene sets
- type:
basic:decimal
- required:
True
- disabled:
False
- hidden:
!create_sets
- default:
0.05
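The gene set thresholds above can be sketched as a simple classification. This is a hedged illustration: the input layout of `(gene_id, log2_fold_change, fdr)` tuples, the inclusive FDR comparison, and the `gene_sets` helper are assumptions, not the process's actual code.

```python
def gene_sets(results, gene_logfc=1.0, gene_fdr=0.05):
    """Split genes into up-regulated, down-regulated and all-genes sets."""
    up, down, all_genes = [], [], []
    for gene_id, log2_fc, fdr in results:
        all_genes.append(gene_id)
        # Assumption: genes with FDR at or below the threshold are significant.
        if fdr > gene_fdr:
            continue
        if log2_fc > gene_logfc:
            up.append(gene_id)
        elif log2_fc < -gene_logfc:
            down.append(gene_id)
    return {"up": up, "down": down, "all": all_genes}


results = [
    ("gene1", 2.0, 0.01),   # significant, above Log2FC
    ("gene2", -1.5, 0.04),  # significant, below -Log2FC
    ("gene3", 3.0, 0.20),   # large change but fails the FDR threshold
    ("gene4", 0.5, 0.001),  # significant but change is too small
]
print(gene_sets(results))
```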
- label:
Allowed FDR
- type:
basic:decimal
- description:
The allowed false discovery rate. The default is 0.05.
- required:
True
- disabled:
False
- hidden:
False
- default:
0.05
- label:
Library type
- type:
basic:string
- description:
In cases where Cufflinks cannot determine the platform and protocol used to generate input reads, you can supply this information manually, which will allow Cufflinks to infer source strand information with certain protocols. The available options are listed below. For paired-end data, we currently only support protocols where reads point towards each other: fr-unstranded - Reads from the left-most end of the fragment (in transcript coordinates) map to the transcript strand and the right-most end maps to the opposite strand; fr-firststrand - Same as above except we enforce the rule that the right-most end of the fragment (in transcript coordinates) is the first sequenced (or only sequenced for single-end reads). Equivalently, it is assumed that only the strand generated during first strand synthesis is sequenced; fr-secondstrand - Same as above except we enforce the rule that the left-most end of the fragment (in transcript coordinates) is the first sequenced (or only sequenced for single-end reads). Equivalently, it is assumed that only the strand generated during second strand synthesis is sequenced.
- required:
True
- disabled:
False
- hidden:
False
- default:
fr-unstranded
- choices:
fr-unstranded:
fr-unstranded
fr-firststrand:
fr-firststrand
fr-secondstrand:
fr-secondstrand
- label:
Library normalization method
- type:
basic:string
- description:
You can control how library sizes (i.e. sequencing depths) are normalized in Cufflinks and Cuffdiff. Cuffdiff has several methods that require multiple libraries in order to work. Library normalization methods supported by Cufflinks work on one library at a time.
- required:
True
- disabled:
False
- hidden:
False
- default:
geometric
- choices:
geometric:
geometric
classic-fpkm:
classic-fpkm
quartile:
quartile
- label:
Dispersion method
- type:
basic:string
- description:
Cuffdiff works by modeling the variance in fragment counts across replicates as a function of the mean fragment count across replicates. Strictly speaking, it models a quantity called dispersion: the variance present in a group of samples beyond what is expected from a simple Poisson model of RNA-Seq. You can control how Cuffdiff constructs its model of dispersion in locus fragment counts. Each condition that has replicates can receive its own model, or Cuffdiff can use a global model for all conditions. All of these policies are identical to those used by DESeq (Anders and Huber, Genome Biology, 2010).
- required:
True
- disabled:
False
- hidden:
False
- default:
pooled
- choices:
pooled:
pooled
per-condition:
per-condition
blind:
blind
poisson:
poisson
Output results
- label:
Differential expression
- type:
basic:file
- required:
True
- disabled:
False
- hidden:
False
- label:
Results table (JSON)
- type:
basic:json
- required:
True
- disabled:
False
- hidden:
False
- label:
Results table (file)
- type:
basic:file
- required:
True
- disabled:
False
- hidden:
False
- label:
Differential expression (transcript level)
- type:
basic:file
- required:
True
- disabled:
False
- hidden:
False
- label:
Differential expression (primary transcript)
- type:
basic:file
- required:
True
- disabled:
False
- hidden:
False
- label:
Differential expression (coding sequence)
- type:
basic:file
- required:
True
- disabled:
False
- hidden:
False
- label:
Cuffdiff output
- type:
basic:file
- required:
True
- disabled:
False
- hidden:
False
- label:
Gene ID database
- type:
basic:string
- required:
True
- disabled:
False
- hidden:
False
- label:
Species
- type:
basic:string
- required:
True
- disabled:
False
- hidden:
False
- label:
Build
- type:
basic:string
- required:
True
- disabled:
False
- hidden:
False
- label:
Feature type
- type:
basic:string
- required:
True
- disabled:
False
- hidden:
False
Cufflinks 2.2
- data:cufflinks:cufflinkscufflinks (data:alignment:bam alignment, data:annotation annotation, data:seq:nucleotide genome, data:annotation:gtf mask_file, basic:string library_type, basic:string annotation_usage, basic:boolean multi_read_correct)[Source: v3.2.1]
Cufflinks assembles transcripts, estimates their abundances, and tests for differential expression and regulation in RNA-Seq samples. It accepts aligned RNA-Seq reads and assembles the alignments into a parsimonious set of transcripts. Cufflinks then estimates the relative abundances of these transcripts based on how many reads support each one, taking into account biases in library preparation protocols. See [here](http://cole-trapnell-lab.github.io/cufflinks/) for more information.
Input arguments
- label:
Aligned reads
- type:
data:alignment:bam
- label:
Annotation (GTF/GFF3)
- type:
data:annotation
- required:
False
- label:
Run bias detection and correction algorithm
- type:
data:seq:nucleotide
- description:
Provide Cufflinks with a multifasta file (genome file) via this option to instruct it to run a bias detection and correction algorithm which can significantly improve accuracy of transcript abundance estimates.
- required:
False
- label:
Mask file
- type:
data:annotation:gtf
- description:
Ignore all reads that could have come from transcripts in this GTF file. We recommend including in this file any annotated rRNA, mitochondrial transcripts, and other abundant transcripts you wish to ignore in your analysis. Due to the variable efficiency of mRNA enrichment methods and rRNA depletion kits, masking these transcripts often improves the overall robustness of transcript abundance estimates.
- required:
False
- label:
Library type
- type:
basic:string
- description:
In cases where Cufflinks cannot determine the platform and protocol used to generate input reads, you can supply this information manually, which will allow Cufflinks to infer source strand information with certain protocols. The available options are listed below. For paired-end data, we currently only support protocols where reads point towards each other: fr-unstranded - Reads from the left-most end of the fragment (in transcript coordinates) map to the transcript strand, and the right-most end maps to the opposite strand; fr-firststrand - Same as above except we enforce the rule that the right-most end of the fragment (in transcript coordinates) is the first sequenced (or only sequenced for single-end reads). Equivalently, it is assumed that only the strand generated during first strand synthesis is sequenced; fr-secondstrand - Same as above except we enforce the rule that the left-most end of the fragment (in transcript coordinates) is the first sequenced (or only sequenced for single-end reads). Equivalently, it is assumed that only the strand generated during second strand synthesis is sequenced.
- default:
fr-unstranded
- choices:
fr-unstranded:
fr-unstranded
fr-firststrand:
fr-firststrand
fr-secondstrand:
fr-secondstrand
- label:
Instruct Cufflinks how to use the provided annotation (GFF/GTF) file
- type:
basic:string
- description:
--GTF-guide - tells Cufflinks to use the supplied reference annotation (GFF) to guide RABT assembly. Reference transcripts will be tiled with faux-reads to provide additional information in assembly. Output will include all reference transcripts as well as any novel genes and isoforms that are assembled. --GTF - tells Cufflinks to use the supplied reference annotation (a GFF file) to estimate isoform expression. It will not assemble novel transcripts, and the program will ignore alignments not structurally compatible with any reference transcript.
- default:
--GTF-guide
- choices:
Use supplied reference annotation to guide RABT assembly (--GTF-guide):
--GTF-guide
Use supplied reference annotation to estimate isoform expression (--GTF):
--GTF
- label:
Do initial estimation procedure to more accurately weight reads with multiple genome mappings
- type:
basic:boolean
- description:
Run an initial estimation procedure that weights reads mapping to multiple locations more accurately.
- default:
False
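To show how the inputs above could map onto a Cufflinks invocation, here is a minimal sketch that assembles an argument list. The flags used (`--library-type`, `--GTF-guide`, `--GTF`, `--frag-bias-correct`, `--mask-file`, `--multi-read-correct`) are documented Cufflinks options, but the mapping itself and the `cufflinks_command` helper are assumptions and not necessarily how this process builds its command.

```python
def cufflinks_command(alignment, annotation=None, genome=None, mask_file=None,
                      library_type="fr-unstranded",
                      annotation_usage="--GTF-guide",
                      multi_read_correct=False):
    """Build a Cufflinks argument list from the process inputs (sketch)."""
    if annotation_usage not in ("--GTF-guide", "--GTF"):
        raise ValueError(f"unsupported annotation usage: {annotation_usage!r}")
    cmd = ["cufflinks", "--library-type", library_type]
    if annotation is not None:
        # Either guide RABT assembly or restrict estimation to the annotation.
        cmd += [annotation_usage, annotation]
    if genome is not None:
        # Supplying the genome enables bias detection and correction.
        cmd += ["--frag-bias-correct", genome]
    if mask_file is not None:
        cmd += ["--mask-file", mask_file]
    if multi_read_correct:
        cmd.append("--multi-read-correct")
    cmd.append(alignment)
    return cmd


print(" ".join(cufflinks_command("sample.bam", annotation="genes.gtf")))
```

Returning a list rather than a single string keeps file names with spaces intact if the result is later passed to `subprocess.run`.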
Output results
- label:
Assembled transcript isoforms
- type:
basic:file
- label:
Isoforms FPKM tracking
- type:
basic:file
- label:
Genes FPKM tracking
- type:
basic:file
- label:
Skipped loci
- type:
basic:file
- label:
Gene ID database
- type:
basic:string
- label:
Species
- type:
basic:string
- label:
Build
- type:
basic:string
Cuffmerge
- data:annotation:cuffmergecuffmerge (list:data:cufflinks:cufflinks expressions, list:data:annotation:gtf gtf, data:annotation gff, data:seq:nucleotide genome, basic:integer threads)[Source: v2.2.0]
Cufflinks includes a script called Cuffmerge that you can use to merge together several Cufflinks assemblies. It also handles running Cuffcompare for you, and automatically filters a number of transfrags that are probably artifacts. The main purpose of Cuffmerge is to make it easier to make an assembly GTF file suitable for use with Cuffdiff. See [here](http://cole-trapnell-lab.github.io/cufflinks/cuffmerge/) for more information.
Input arguments
- label:
Cufflinks transcripts (GTF)
- type:
list:data:cufflinks:cufflinks
- required:
False
- label:
Annotation files (GTF)
- type:
list:data:annotation:gtf
- description:
Annotation files you wish to merge together with Cufflinks-produced annotation files (e.g. upload a Cufflinks annotation GTF file).
- required:
False
- label:
Reference annotation (GTF/GFF3)
- type:
data:annotation
- description:
An optional “reference” annotation GTF. The input assemblies are merged together with the reference GTF and included in the final output.
- required:
False
- label:
Reference genome
- type:
data:seq:nucleotide
- description:
This argument should point to the genomic DNA sequences for the reference. If a directory, it should contain one fasta file per contig. If a multifasta file, all contigs should be present. The merge script will pass this option to cuffcompare, which will use the sequences to assist in classifying transfrags and excluding artifacts (e.g. repeats). For example, Cufflinks transcripts consisting mostly of lower-case bases are classified as repeats. Note that <seq_dir> must contain one fasta file per reference chromosome, and each file must be named after the chromosome, and have a .fa or .fasta extension
- required:
False
- label:
Use this many processor threads
- type:
basic:integer
- description:
Use this many threads to merge assemblies. The default is 1.
- default:
1
Output results
- label:
Merged GTF file
- type:
basic:file
- label:
Gene ID database
- type:
basic:string
- label:
Species
- type:
basic:string
- label:
Build
- type:
basic:string
Cuffnorm
- data:cuffnormcuffnorm (list:data:cufflinks:cuffquant cuffquant, data:annotation annotation, basic:boolean useERCC)[Source: v2.5.0]
Cufflinks includes a program, Cuffnorm, that you can use to generate tables of expression values that are properly normalized for library size. Cuffnorm takes a GTF2/GFF3 file of transcripts as input, along with two or more SAM, BAM, or CXB files for two or more samples. See [here](http://cole-trapnell-lab.github.io/cufflinks/cuffnorm/) for more information. Replicate relation needs to be defined for Cuffnorm to account for replicates. If the replicate relation is not defined, each sample will be treated individually.
Input arguments
- label:
Cuffquant expression file
- type:
list:data:cufflinks:cuffquant
- label:
Annotation (GTF/GFF3)
- type:
data:annotation
- description:
A transcript annotation file produced by cufflinks, cuffcompare, or other source.
- label:
ERCC spike-in normalization
- type:
basic:boolean
- description:
Use ERCC spike-in controls for normalization.
- default:
False
Output results
- label:
Genes count
- type:
basic:file
- label:
Genes FPKM
- type:
basic:file
- label:
Genes attr table
- type:
basic:file
- label:
Isoform count
- type:
basic:file
- label:
Isoform FPKM
- type:
basic:file
- label:
Isoform attr table
- type:
basic:file
- label:
CDS count
- type:
basic:file
- label:
CDS FPKM
- type:
basic:file
- label:
CDS attr table
- type:
basic:file
- label:
TSS groups count
- type:
basic:file
- label:
TSS groups FPKM
- type:
basic:file
- label:
TSS attr table
- type:
basic:file
- label:
Run info
- type:
basic:file
- label:
FPKM exp scatter plot
- type:
basic:file
- label:
Boxplot
- type:
basic:file
- label:
FPKM exp raw
- type:
basic:file
- label:
Replicate correlations plot
- type:
basic:file
- label:
FPKM means
- type:
basic:file
- label:
Exp FPKM means
- type:
basic:file
- label:
FPKM exp scatter normalized plot
- type:
basic:file
- required:
False
- label:
FPKM exp normalized
- type:
basic:file
- required:
False
- label:
Spike raw
- type:
basic:file
- required:
False
- label:
Spike normalized
- type:
basic:file
- required:
False
- label:
All R normalization data
- type:
basic:file
- label:
Gene ID database
- type:
basic:string
- label:
Species
- type:
basic:string
- label:
Build
- type:
basic:string
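The FPKM tables listed above report expression values normalized for transcript length and library size. As a rough illustration only, the sketch below computes FPKM from its conventional definition. It is not Cuffnorm's implementation (Cuffnorm applies its own library-size normalization), and the function name and example numbers are hypothetical.

```python
def fpkm(fragments: float, transcript_length_bp: int, total_mapped_fragments: float) -> float:
    """Fragments Per Kilobase of transcript per Million mapped fragments.

    Conventional definition: fragments / (length in kb) / (library size in millions).
    """
    if transcript_length_bp <= 0 or total_mapped_fragments <= 0:
        raise ValueError("length and library size must be positive")
    return fragments * 1e9 / (transcript_length_bp * total_mapped_fragments)


# Hypothetical numbers: 500 fragments on a 2 kb transcript in a 10 M fragment library.
example_value = fpkm(500, 2000, 10_000_000)
```

Doubling the library size while keeping the fragment count fixed halves the value, which is the sense in which the tables are normalized for library size.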
Cuffquant 2.2
- data:cufflinks:cuffquantcuffquant (data:alignment:bam alignment, data:annotation annotation, data:seq:nucleotide genome, data:annotation:gtf mask_file, basic:string library_type, basic:boolean multi_read_correct)[Source: v2.3.1]
Cuffquant allows you to compute the gene and transcript expression profiles and save these profiles to files that you can analyze later with Cuffdiff or Cuffnorm. See [here](http://cole-trapnell-lab.github.io/cufflinks/manual/) for more information.
Input arguments
- label:
Aligned reads
- type:
data:alignment:bam
- label:
Annotation (GTF/GFF3)
- type:
data:annotation
- label:
Run bias detection and correction algorithm
- type:
data:seq:nucleotide
- description:
Provide Cufflinks with a multifasta file (genome file) via this option to instruct it to run a bias detection and correction algorithm which can significantly improve accuracy of transcript abundance estimates.
- required:
False
- label:
Mask file
- type:
data:annotation:gtf
- description:
Ignore all reads that could have come from transcripts in this GTF file. We recommend including any annotated rRNA, mitochondrial transcripts, and other abundant transcripts you wish to ignore in your analysis in this file. Due to variable efficiency of mRNA enrichment methods and rRNA depletion kits, masking these transcripts often improves the overall robustness of transcript abundance estimates.
- required:
False
- label:
Library type
- type:
basic:string
- description:
In cases where Cufflinks cannot determine the platform and protocol used to generate input reads, you can supply this information manually, which will allow Cufflinks to infer source strand information with certain protocols. The available options are listed below. For paired-end data, we currently only support protocols where reads point towards each other: fr-unstranded - Reads from the left-most end of the fragment (in transcript coordinates) map to the transcript strand, and the right-most end maps to the opposite strand; fr-firststrand - Same as above except we enforce the rule that the right-most end of the fragment (in transcript coordinates) is the first sequenced (or only sequenced for single-end reads). Equivalently, it is assumed that only the strand generated during first strand synthesis is sequenced; fr-secondstrand - Same as above except we enforce the rule that the left-most end of the fragment (in transcript coordinates) is the first sequenced (or only sequenced for single-end reads). Equivalently, it is assumed that only the strand generated during second strand synthesis is sequenced.
- default:
fr-unstranded
- choices:
fr-unstranded:
fr-unstranded
fr-firststrand:
fr-firststrand
fr-secondstrand:
fr-secondstrand
- label:
Do initial estimation procedure to more accurately weight reads with multiple genome mappings
- type:
basic:boolean
- description:
Run an initial estimation procedure that weights reads mapping to multiple locations more accurately.
- default:
False
Output results
- label:
Abundances (.cxb)
- type:
basic:file
- label:
Gene ID database
- type:
basic:string
- label:
Species
- type:
basic:string
- label:
Build
- type:
basic:string
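The library type options described above determine how the strand of the first sequenced read relates to the strand of the originating transcript. The sketch below restates those rules as a small lookup. The function name and the '+', '-', '.' strand notation are assumptions made for illustration and are not part of Cuffquant.

```python
def transcript_strand(library_type: str, first_read_strand: str) -> str:
    """Infer the transcript strand from the strand of the first (or only) sequenced read.

    Returns '+' or '-', or '.' when the protocol carries no strand information.
    """
    if first_read_strand not in ("+", "-"):
        raise ValueError("first_read_strand must be '+' or '-'")
    opposite = {"+": "-", "-": "+"}
    if library_type == "fr-unstranded":
        # Unstranded protocol: the read orientation says nothing about the transcript strand.
        return "."
    if library_type == "fr-firststrand":
        # The right-most fragment end is sequenced first, so the first read is antisense.
        return opposite[first_read_strand]
    if library_type == "fr-secondstrand":
        # The left-most fragment end is sequenced first, so the first read is sense.
        return first_read_strand
    raise ValueError(f"unknown library type: {library_type}")
```

For example, with fr-firststrand a first read aligned to the '+' strand implies a transcript on the '-' strand.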
Cuffquant results
- data:cufflinks:cuffquantupload-cxb (basic:file src, basic:string source, basic:string species, basic:string build, basic:string feature_type)[Source: v1.3.3]
Upload Cuffquant results file (.cxb)
Input arguments
- label:
Cuffquant file
- type:
basic:file
- description:
Upload Cuffquant results file. Supported extension: *.cxb
- required:
True
- validate_regex:
\.(cxb)$
- label:
Gene ID database
- type:
basic:string
- choices:
AFFY:
AFFY
DICTYBASE:
DICTYBASE
ENSEMBL:
ENSEMBL
NCBI:
NCBI
UCSC:
UCSC
- label:
Species
- type:
basic:string
- description:
Species Latin name.
- choices:
Homo sapiens:
Homo sapiens
Mus musculus:
Mus musculus
Rattus norvegicus:
Rattus norvegicus
Dictyostelium discoideum:
Dictyostelium discoideum
Odocoileus virginianus texanus:
Odocoileus virginianus texanus
Solanum tuberosum:
Solanum tuberosum
- label:
Build
- type:
basic:string
- label:
Feature type
- type:
basic:string
- default:
gene
- choices:
gene:
gene
transcript:
transcript
exon:
exon
Output results
- label:
Cuffquant results
- type:
basic:file
- label:
Gene ID database
- type:
basic:string
- label:
Species
- type:
basic:string
- label:
Build
- type:
basic:string
- label:
Feature type
- type:
basic:string
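The validate_regex value given for the Cuffquant file input, \.(cxb)$, accepts only names ending in .cxb. The sketch below shows how that pattern behaves on a few file names. Using re.search and the helper name is_valid_cxb_name are assumptions for illustration; how the platform itself applies the pattern is not stated here.

```python
import re

# Pattern copied from the validate_regex field of the Cuffquant file input.
CXB_PATTERN = re.compile(r"\.(cxb)$")


def is_valid_cxb_name(file_name: str) -> bool:
    """Return True if the file name ends with the .cxb extension (case-sensitive)."""
    return CXB_PATTERN.search(file_name) is not None


for name in ("abundances.cxb", "abundances.cxb.gz", "abundances.CXB"):
    print(name, is_valid_cxb_name(name))
```

Only the first name passes: the pattern is anchored at the end of the string and is case-sensitive, so compressed or upper-case variants are rejected.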
Cut & Run
- data:workflow:cutnrunworkflow-cutnrun (data:reads:fastq:paired reads, basic:integer quality, basic:integer nextseq, basic:string phred, basic:integer min_length, basic:integer max_n, basic:boolean retain_unpaired, basic:integer unpaired_len_1, basic:integer unpaired_len_2, basic:integer clip_r1, basic:integer clip_r2, basic:integer three_prime_r1, basic:integer three_prime_r2, list:basic:string adapter, list:basic:string adapter_2, data:seq:nucleotide adapter_file_1, data:seq:nucleotide adapter_file_2, basic:string universal_adapter, basic:integer stringency, basic:decimal error_rate, basic:integer trim_5, basic:integer trim_3, data:index:bowtie2 genome, basic:string mode, basic:string speed, basic:boolean discordantly, basic:boolean rep_se, basic:integer minins, basic:integer maxins, basic:boolean no_overlap, basic:boolean dovetail, basic:boolean no_unal, data:index:bowtie2 genome, basic:string mode, basic:string speed, basic:boolean discordantly, basic:boolean rep_se, basic:integer minins, basic:integer maxins, basic:boolean no_overlap, basic:boolean dovetail, basic:boolean no_unal, basic:string format, basic:decimal pvalue, basic:string duplicates, basic:boolean bedgraph, basic:integer min_frag_length, basic:integer max_frag_length, basic:decimal scale)[Source: v1.6.0]
Analysis of samples processed for high-resolution mapping of DNA binding sites using a targeted nuclease strategy. The process is named CUT&RUN, which stands for Cleavage Under Targets and Release Using Nuclease. The workflow trims the reads with Trim Galore and aligns them with Bowtie2 to the target species genome as well as to a spike-in genome. Aligned reads are processed to produce bigWig files that can be viewed in a genome browser. Peaks are called using MACS2. Length selection of reads is performed using the alignmentSieve tool from the deepTools package.
Input arguments
- label:
Input reads
- type:
data:reads:fastq:paired
- label:
Quality cutoff
- type:
basic:integer
- description:
Trim low-quality ends from reads based on Phred score.
- required:
False
- label:
NextSeq/NovaSeq trim cutoff
- type:
basic:integer
- description:
NextSeq/NovaSeq-specific quality trimming. Also trims dark cycles appearing as high-quality G bases. This sets a specific quality cutoff, but the qualities of G bases are ignored. This cannot be used together with Quality cutoff and will override it.
- required:
False
- label:
Phred score encoding
- type:
basic:string
- description:
Use either ASCII+33 quality scores as Phred scores (Sanger/Illumina 1.9+ encoding) or ASCII+64 quality scores (Illumina 1.5 encoding) for quality trimming.
- default:
--phred33
- choices:
ASCII+33:
--phred33
ASCII+64:
--phred64
- label:
Minimum length after trimming
- type:
basic:integer
- description:
Discard reads that became shorter than selected length because of either quality or adapter trimming. Both reads of a read-pair need to be longer than specified length to be printed out to validated paired-end files. If only one read became too short there is the possibility of keeping such unpaired single-end reads with Retain unpaired. A value of 0 disables filtering based on length.
- default:
20
- label:
Maximum number of Ns
- type:
basic:integer
- description:
A read exceeding this limit will result in the entire pair being removed from the trimmed output files.
- required:
False
- label:
Retain unpaired reads after trimming
- type:
basic:boolean
- description:
If only one of the two paired-end reads became too short, the longer read will be written.
- default:
False
- label:
Unpaired read length cutoff of mate 1
- type:
basic:integer
- hidden:
!quality_trim.retain_unpaired
- default:
35
- label:
Unpaired read length cutoff for mate 2
- type:
basic:integer
- hidden:
!quality_trim.retain_unpaired
- default:
35
- label:
Trim bases from 5’ end of read 1
- type:
basic:integer
- description:
This may be useful if the qualities were very poor, or if there is some sort of unwanted bias at the 5’ end.
- required:
False
- label:
Trim bases from 5’ end of read 2
- type:
basic:integer
- description:
This may be useful if the qualities were very poor, or if there is some sort of unwanted bias at the 5’ end. For paired-end bisulfite sequencing, it is recommended to remove the first few bp because the end-repair reaction may introduce a bias towards low methylation.
- required:
False
- label:
Trim bases from 3’ end of read 1
- type:
basic:integer
- description:
Remove bases from the 3’ end of read 1 after adapter/quality trimming has been performed. This may remove some unwanted bias from the 3’ end that is not directly related to adapter sequence or basecall quality.
- required:
False
- label:
Trim bases from 3’ end of read 2
- type:
basic:integer
- description:
Remove bases from the 3’ end of read 2 after adapter/quality trimming has been performed. This may remove some unwanted bias from the 3’ end that is not directly related to adapter sequence or basecall quality.
- required:
False
- label:
Read 1 adapter sequence
- type:
list:basic:string
- description:
Adapter sequences to be trimmed. Also see universal adapters field for predefined adapters. This is mutually exclusive with read 1 adapters file and universal adapters.
- required:
False
- label:
Read 2 adapter sequence
- type:
list:basic:string
- description:
Optional adapter sequence to be trimmed off read 2 of paired-end files. This is mutually exclusive with read 2 adapters file and universal adapters.
- required:
False
- label:
Read 1 adapters file
- type:
data:seq:nucleotide
- description:
This is mutually exclusive with read 1 adapters and universal adapters.
- required:
False
- label:
Read 2 adapters file
- type:
data:seq:nucleotide
- description:
This is mutually exclusive with read 2 adapters and universal adapters.
- required:
False
- label:
Universal adapters
- type:
basic:string
- description:
Instead of default detection use specific adapters. Use 13bp of the Illumina universal adapter, 12bp of the Nextera adapter or 12bp of the Illumina Small RNA 3’ Adapter. Selecting to trim smallRNA adapters will also lower the length value to 18bp. If the smallRNA libraries are paired-end then read 2 adapter will be set to the Illumina small RNA 5’ adapter automatically (GATCGTCGGACT) unless defined explicitly. This is mutually exclusive with manually defined adapters and adapter files.
- required:
False
- choices:
Illumina:
--illumina
Nextera:
--nextera
Illumina small RNA:
--small_rna
- label:
Overlap with adapter sequence required to trim
- type:
basic:integer
- description:
Defaults to a very stringent setting of 1, i.e. even a single base pair of overlapping sequence will be trimmed off the 3’ end of any read.
- default:
1
- label:
Maximum allowed error rate
- type:
basic:decimal
- description:
Number of errors divided by the length of the matching region. Default value of 0.1.
- default:
0.1
- label:
Hard trim sequence from 3’ end
- type:
basic:integer
- description:
Instead of performing adapter-/quality trimming, this option will simply hard-trim sequences to the specified number of bp from the 3’ end. This is incompatible with other hard trimming options.
- required:
False
- label:
Hard trim sequences from 5’ end
- type:
basic:integer
- description:
Instead of performing adapter-/quality trimming, this option will simply hard-trim sequences to the specified number of bp from the 5’ end. This is incompatible with other hard trimming options.
- required:
False
- label:
Species genome
- type:
data:index:bowtie2
- label:
Alignment mode
- type:
basic:string
- description:
End to end: Bowtie 2 requires that the entire read align from one end to the other, without any trimming (or “soft clipping”) of characters from either end. Local: Bowtie 2 does not require that the entire read align from one end to the other. Rather, some characters may be omitted (“soft clipped”) from the ends in order to achieve the greatest possible alignment score.
- default:
--local
- choices:
end to end mode:
--end-to-end
local:
--local
- label:
Speed vs. Sensitivity
- type:
basic:string
- description:
A quick setting for aligning fast or accurately. This option is a shortcut for the following parameters. For --end-to-end: --very-fast (-D 5 -R 1 -N 0 -L 22 -i S,0,2.50); --fast (-D 10 -R 2 -N 0 -L 22 -i S,0,2.50); --sensitive (-D 15 -R 2 -N 0 -L 22 -i S,1,1.15; Bowtie 2 default); --very-sensitive (-D 20 -R 3 -N 0 -L 20 -i S,1,0.50). For --local: --very-fast-local (-D 5 -R 1 -N 0 -L 25 -i S,1,2.00); --fast-local (-D 10 -R 2 -N 0 -L 22 -i S,1,1.75); --sensitive-local (-D 15 -R 2 -N 0 -L 20 -i S,1,0.75; Bowtie 2 default); --very-sensitive-local (-D 20 -R 3 -N 0 -L 20 -i S,1,0.50).
- default:
--very-sensitive
- choices:
Very fast:
--very-fast
Fast:
--fast
Sensitive:
--sensitive
Very sensitive:
--very-sensitive
- label:
Report discordantly matched read
- type:
basic:boolean
- description:
If both mates have unique alignments, but the alignments do not match paired-end expectations (orientation and relative distance) then alignment will be reported. Useful for detecting structural variations.
- default:
True
- label:
Report single ended
- type:
basic:boolean
- description:
If a paired alignment cannot be found, Bowtie2 tries to find alignments for the individual mates. Default is true (--no-mixed).
- default:
True
- label:
Minimal distance
- type:
basic:integer
- description:
The minimum fragment length (--minins) for valid paired-end alignments. Value 0 imposes no minimum.
- default:
10
- label:
Maximal distance
- type:
basic:integer
- description:
The maximum fragment length (--maxins) for valid paired-end alignments.
- default:
700
- label:
Not concordant when mates overlap
- type:
basic:boolean
- description:
When true, it is considered not concordant when mates overlap at all. Default is true (--no-overlap).
- default:
False
- label:
Dovetail
- type:
basic:boolean
- description:
If the mates “dovetail”, that is if one mate alignment extends past the beginning of the other such that the wrong mate begins upstream, consider that to be concordant. Default: mates cannot dovetail in a concordant alignment. If true, parameter --dovetail is turned on.
- default:
False
- label:
Suppress SAM records for unaligned reads
- type:
basic:boolean
- description:
When true, suppress SAM records for unaligned reads. Default is true (--no-unal).
- default:
True
- label:
Spike-in genome
- type:
data:index:bowtie2
- label:
Alignment mode
- type:
basic:string
- description:
End to end: Bowtie 2 requires that the entire read align from one end to the other, without any trimming (or “soft clipping”) of characters from either end. Local: Bowtie 2 does not require that the entire read align from one end to the other. Rather, some characters may be omitted (“soft clipped”) from the ends in order to achieve the greatest possible alignment score.
- default:
--local
- choices:
end to end mode:
--end-to-end
local:
--local
- label:
Speed vs. Sensitivity
- type:
basic:string
- description:
A quick setting for aligning fast or accurately. This option is a shortcut for the following parameters. For --end-to-end: --very-fast (-D 5 -R 1 -N 0 -L 22 -i S,0,2.50); --fast (-D 10 -R 2 -N 0 -L 22 -i S,0,2.50); --sensitive (-D 15 -R 2 -N 0 -L 22 -i S,1,1.15; Bowtie 2 default); --very-sensitive (-D 20 -R 3 -N 0 -L 20 -i S,1,0.50). For --local: --very-fast-local (-D 5 -R 1 -N 0 -L 25 -i S,1,2.00); --fast-local (-D 10 -R 2 -N 0 -L 22 -i S,1,1.75); --sensitive-local (-D 15 -R 2 -N 0 -L 20 -i S,1,0.75; Bowtie 2 default); --very-sensitive-local (-D 20 -R 3 -N 0 -L 20 -i S,1,0.50).
- default:
--very-sensitive
- choices:
Very fast:
--very-fast
Fast:
--fast
Sensitive:
--sensitive
Very sensitive:
--very-sensitive
- label:
Report discordantly matched read
- type:
basic:boolean
- description:
If both mates have unique alignments, but the alignments do not match paired-end expectations (orientation and relative distance) then alignment will be reported. Useful for detecting structural variations.
- default:
True
- label:
Report single ended
- type:
basic:boolean
- description:
If a paired alignment cannot be found, Bowtie2 tries to find alignments for the individual mates. Default is true (--no-mixed).
- default:
True
- label:
Minimal distance
- type:
basic:integer
- description:
The minimum fragment length (--minins) for valid paired-end alignments. Value 0 imposes no minimum.
- default:
10
- label:
Maximal distance
- type:
basic:integer
- description:
The maximum fragment length (--maxins) for valid paired-end alignments.
- default:
700
- label:
Not concordant when mates overlap
- type:
basic:boolean
- description:
When true, it is considered not concordant when mates overlap at all. Default is true (--no-overlap).
- default:
True
- label:
Dovetail
- type:
basic:boolean
- description:
If the mates “dovetail”, that is if one mate alignment extends past the beginning of the other such that the wrong mate begins upstream, consider that to be concordant. Default: mates cannot dovetail in a concordant alignment. If true, parameter --dovetail is turned on.
- default:
False
- label:
Suppress SAM records for unaligned reads
- type:
basic:boolean
- description:
When true, suppress SAM records for unaligned reads. Default is true (--no-unal).
- default:
True
- label:
Format of tag file
- type:
basic:string
- description:
This specifies the format of input files. For paired-end data the format dictates how MACS2 will treat mates. If the selected format is BAM, MACS2 will only keep the left mate (5’ end) tag. However, when format BAMPE is selected, MACS2 will use the actual insert sizes of read pairs to build the fragment pileup, instead of building a bimodal distribution of plus and minus strand reads to predict fragment size.
- required:
False
- default:
BAMPE
- choices:
BAM:
BAM
BAMPE:
BAMPE
- label:
P-value cutoff
- type:
basic:decimal
- description:
The p-value cutoff.
- required:
False
- default:
0.001
- label:
Number of duplicates
- type:
basic:string
- description:
It controls the MACS behavior towards duplicate tags at the exact same location – the same coordinates and the same strand. The ‘auto’ option makes MACS calculate the maximum tags at the exact same location based on binomial distribution using 1e-5 as pvalue cutoff and the ‘all’ option keeps all the tags. If an integer is given, at most this number of tags will be kept at the same location. The default is to keep one tag at the same location.
- default:
all
- choices:
1:
1
auto:
auto
all:
all
- label:
Save fragment pileup and control lambda
- type:
basic:boolean
- description:
If this flag is on, MACS will store the fragment pileup, control lambda, -log10(pvalue) and -log10(qvalue) scores in bedGraph files. The bedGraph files will be stored in current directory named NAME+’_treat_pileup.bdg’ for treatment data, NAME+’_control_lambda.bdg’ for local lambda values from control, NAME+’_treat_pvalue.bdg’ for Poisson pvalue scores (in -log10(pvalue) form), and NAME+’_treat_qvalue.bdg’ for q-value scores from Benjamini-Hochberg-Yekutieli procedure.
- default:
True
- label:
Minimum fragment length
- type:
basic:integer
- description:
The minimum fragment length needed for read/pair inclusion. This option is primarily useful in ATACseq experiments, for filtering mono- or di-nucleosome fragments. Default is 0.
- default:
0
- label:
Maximum fragment length
- type:
basic:integer
- description:
The maximum fragment length needed for read/pair inclusion. A value of 0 indicates no limit. Default is 0.
- default:
0
- label:
Scale factor
- type:
basic:decimal
- description:
Magnitude of the scale factor. The scaling factor is calculated by dividing the scale with the number of features in BEDPE (scale/(number of features)).
- default:
10000
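The Scale factor description above defines the scaling factor as the scale divided by the number of features in the BEDPE file. A minimal sketch of that arithmetic follows. The function name and the example feature count are hypothetical, and how the workflow counts features or applies the result is not shown here.

```python
def scaling_factor(scale: float, n_features: int) -> float:
    """Compute scale / (number of features), as described for the Scale factor input."""
    if n_features <= 0:
        raise ValueError("the number of features must be positive")
    return scale / n_features


# Default scale of 10000 with a hypothetical 250000 BEDPE features.
factor = scaling_factor(10000, 250_000)
```

With the default scale of 10000, a sample with more features receives a proportionally smaller factor.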
Output results
Cutadapt (3’ mRNA-seq, single-end)
- data:reads:fastq:single:cutadapt:cutadapt-3prime-single (data:reads:fastq:single reads, basic:integer nextseq_trim, basic:integer quality_cutoff, basic:integer min_len, basic:integer min_overlap, basic:integer times)[Source: v1.4.2]
Process 3’ mRNA-seq datasets using Cutadapt tool.
Input arguments
- label:
Select sample(s)
- type:
data:reads:fastq:single
- required:
True
- disabled:
False
- hidden:
False
- label:
NextSeq/NovaSeq trim
- type:
basic:integer
- description:
NextSeq/NovaSeq-specific quality trimming. Trims also dark cycles appearing as high-quality G bases. This option is mutually exclusive with the use of standard quality-cutoff trimming and is suitable for the use with data generated by the recent Illumina machines that utilize two-color chemistry to encode the four bases.
- required:
True
- disabled:
False
- hidden:
False
- default:
10
- label:
Quality cutoff
- type:
basic:integer
- description:
Trim low-quality bases from 3’ end of each read before adapter removal. The use of this option will override the use of NextSeq/NovaSeq trim option.
- required:
False
- disabled:
False
- hidden:
False
- label:
Discard reads shorter than specified minimum length.
- type:
basic:integer
- required:
True
- disabled:
False
- hidden:
False
- default:
20
- label:
Minimum overlap
- type:
basic:integer
- description:
Minimum overlap between adapter and read for an adapter to be found.
- required:
True
- disabled:
False
- hidden:
False
- default:
20
- label:
Remove up to a specified number of adapters from each read.
- type:
basic:integer
- required:
True
- disabled:
False
- hidden:
False
- default:
2
Output results
- label:
Reads file.
- type:
list:basic:file
- required:
True
- disabled:
False
- hidden:
False
- label:
Cutadapt report
- type:
basic:file
- required:
True
- disabled:
False
- hidden:
False
- label:
Quality control with FastQC.
- type:
list:basic:file:html
- required:
True
- disabled:
False
- hidden:
False
- label:
Download FastQC archive.
- type:
list:basic:file
- required:
True
- disabled:
False
- hidden:
False
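The NextSeq/NovaSeq trim and Quality cutoff inputs above both trim low-quality bases from the 3’ end of a read. The sketch below is modeled on the partial-sum quality-trimming algorithm described in the Cutadapt documentation, with the NextSeq variant treating every G as if its quality were just below the cutoff. It is an illustrative reimplementation under those assumptions, not Cutadapt's code, and the function name is hypothetical.

```python
from typing import Sequence


def quality_trim_3prime(bases: str, qualities: Sequence[int], cutoff: int,
                        nextseq: bool = False) -> int:
    """Return the number of bases to keep after 3' quality trimming."""
    if len(bases) != len(qualities):
        raise ValueError("bases and qualities must have the same length")
    running = 0          # running sum of (cutoff - quality), accumulated from the 3' end
    best = 0             # largest running sum seen so far
    keep = len(bases)    # default: keep the whole read
    for i in range(len(bases) - 1, -1, -1):
        q = qualities[i]
        if nextseq and bases[i] == "G":
            # Dark cycles show up as high-quality G bases; ignore their reported quality.
            q = cutoff - 1
        running += cutoff - q
        if running < 0:
            break        # the read has become good enough; stop scanning
        if running > best:
            best = running
            keep = i     # cut here: everything from position i onwards is removed
    return keep
```

For a read ACGTGGGG with uniformly high qualities and a cutoff of 10, standard trimming keeps all eight bases, while the NextSeq variant keeps only ACGT.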
Cutadapt (Corall RNA-Seq, paired-end)
- data:reads:fastq:paired:cutadapt:cutadapt-corall-paired (data:reads:fastq:paired reads, basic:integer nextseq_trim, basic:integer quality_cutoff, basic:integer min_len, basic:integer min_overlap)[Source: v1.3.2]
Pre-process reads obtained using CORALL Total RNA-Seq Library Prep Kit. Trim UMI-tags from input reads and use Cutadapt to remove adapters and run QC filtering steps.
Input arguments
- label:
Select sample(s)
- type:
data:reads:fastq:paired
- required:
True
- disabled:
False
- hidden:
False
- label:
NextSeq/NovaSeq trim
- type:
basic:integer
- description:
NextSeq/NovaSeq-specific quality trimming. Trims also dark cycles appearing as high-quality G bases. This option is mutually exclusive with the use of standard quality-cutoff trimming and is suitable for the use with data generated by the recent Illumina machines that utilize two-color chemistry to encode the four bases.
- required:
True
- disabled:
False
- hidden:
False
- default:
10
- label:
Quality cutoff
- type:
basic:integer
- description:
Trim low-quality bases from 3’ end of each read before adapter removal. The use of this option will override the use of NextSeq/NovaSeq trim option.
- required:
False
- disabled:
False
- hidden:
False
- label:
Minimum read length
- type:
basic:integer
- required:
True
- disabled:
False
- hidden:
False
- default:
20
- label:
Minimum overlap
- type:
basic:integer
- description:
Minimum overlap between adapter and read for an adapter to be found.
- required:
True
- disabled:
False
- hidden:
False
- default:
20
Output results
- label:
Remaining mate1 reads
- type:
list:basic:file
- required:
True
- disabled:
False
- hidden:
False
- label:
Remaining mate2 reads
- type:
list:basic:file
- required:
True
- disabled:
False
- hidden:
False
- label:
Cutadapt report
- type:
basic:file
- required:
True
- disabled:
False
- hidden:
False
- label:
Mate1 quality control with FastQC
- type:
list:basic:file:html
- required:
True
- disabled:
False
- hidden:
False
- label:
Mate2 quality control with FastQC
- type:
list:basic:file:html
- required:
True
- disabled:
False
- hidden:
False
- label:
Download mate1 FastQC archive
- type:
list:basic:file
- required:
True
- disabled:
False
- hidden:
False
- label:
Download mate2 FastQC archive
- type:
list:basic:file
- required:
True
- disabled:
False
- hidden:
False
Cutadapt (Corall RNA-Seq, single-end)
- data:reads:fastq:single:cutadapt:cutadapt-corall-single (data:reads:fastq:single reads, basic:integer nextseq_trim, basic:integer quality_cutoff, basic:integer min_len, basic:integer min_overlap)[Source: v1.4.2]
Pre-process reads obtained using CORALL Total RNA-Seq Library Prep Kit. Trim UMI-tags from input reads and use Cutadapt to remove adapters and run QC filtering steps.
Input arguments
- label:
Select sample(s)
- type:
data:reads:fastq:single
- required:
True
- disabled:
False
- hidden:
False
- label:
NextSeq/NovaSeq trim
- type:
basic:integer
- description:
NextSeq/NovaSeq-specific quality trimming. Trims also dark cycles appearing as high-quality G bases. This option is mutually exclusive with the use of standard quality-cutoff trimming and is suitable for the use with data generated by the recent Illumina machines that utilize two-color chemistry to encode the four bases.
- required:
True
- disabled:
False
- hidden:
False
- default:
10
- label:
Quality cutoff
- type:
basic:integer
- description:
Trim low-quality bases from 3’ end of each read before adapter removal. The use of this option will override the use of NextSeq/NovaSeq trim option.
- required:
False
- disabled:
False
- hidden:
False
- label:
Minimum read length
- type:
basic:integer
- required:
True
- disabled:
False
- hidden:
False
- default:
20
- label:
Minimum overlap
- type:
basic:integer
- description:
Minimum overlap between adapter and read for an adapter to be found.
- required:
True
- disabled:
False
- hidden:
False
- default:
20
Output results
- label:
Reads file
- type:
list:basic:file
- required:
True
- disabled:
False
- hidden:
False
- label:
Cutadapt report
- type:
basic:file
- required:
True
- disabled:
False
- hidden:
False
- label:
Quality control with FastQC
- type:
list:basic:file:html
- required:
True
- disabled:
False
- hidden:
False
- label:
Download FastQC archive
- type:
list:basic:file
- required:
True
- disabled:
False
- hidden:
False
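The Minimum overlap input above sets how many bases of the adapter must match the read before the adapter counts as found. The sketch below illustrates the idea for a 3’ adapter with exact matching only. Cutadapt itself also tolerates errors up to a maximum error rate, which is left out here, and the function name and sequences are hypothetical.

```python
def trim_3prime_adapter(read: str, adapter: str, min_overlap: int) -> str:
    """Remove a 3' adapter (full or partial at the read end) using exact matching.

    The adapter is only considered found if at least min_overlap bases overlap.
    """
    for start in range(len(read)):
        # Number of adapter bases that fit between this position and the read end.
        overlap = min(len(adapter), len(read) - start)
        if overlap < min_overlap:
            break  # the overlap only shrinks further along the read
        if read[start:start + overlap] == adapter[:overlap]:
            return read[:start]
    return read


read = "ACGTACGTAGATCGG"          # insert followed by the first 7 adapter bases
adapter = "AGATCGGAAGAGC"
print(trim_3prime_adapter(read, adapter, min_overlap=3))
print(trim_3prime_adapter(read, adapter, min_overlap=20))
```

With a minimum overlap of 3 the partial adapter is removed. With 20, which is longer than the adapter itself, the read is left unchanged, showing how a high setting makes trimming more conservative.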
Cutadapt (paired-end)
- data:reads:fastq:paired:cutadaptcutadapt-paired (data:reads:fastq:paired reads, data:seq:nucleotide mate1_5prime_file, data:seq:nucleotide mate1_3prime_file, data:seq:nucleotide mate2_5prime_file, data:seq:nucleotide mate2_3prime_file, list:basic:string mate1_5prime_seq, list:basic:string mate1_3prime_seq, list:basic:string mate2_5prime_seq, list:basic:string mate2_3prime_seq, basic:integer times, basic:decimal error_rate, basic:integer min_overlap, basic:boolean match_read_wildcards, basic:boolean no_indels, basic:integer nextseq_trim, basic:integer leading, basic:integer trailing, basic:integer crop, basic:integer headcrop, basic:integer minlen, basic:integer maxlen, basic:integer max_n, basic:string pair_filter)[Source: v2.7.2]
Cutadapt finds and removes adapter sequences, primers, poly-A tails and other types of unwanted sequence from high-throughput sequencing reads. More information about Cutadapt can be found [here](http://cutadapt.readthedocs.io/en/stable/).
Input arguments
- label:
Select sample(s)
- type:
data:reads:fastq:paired
- label:
5 prime adapter file for Mate 1
- type:
data:seq:nucleotide
- required:
False
- label:
3 prime adapter file for Mate 1
- type:
data:seq:nucleotide
- required:
False
- label:
5 prime adapter file for Mate 2
- type:
data:seq:nucleotide
- required:
False
- label:
3 prime adapter file for Mate 2
- type:
data:seq:nucleotide
- required:
False
- label:
5 prime adapter sequence for Mate 1
- type:
list:basic:string
- required:
False
- label:
3 prime adapter sequence for Mate 1
- type:
list:basic:string
- required:
False
- label:
5 prime adapter sequence for Mate 2
- type:
list:basic:string
- required:
False
- label:
3 prime adapter sequence for Mate 2
- type:
list:basic:string
- required:
False
- label:
Times
- type:
basic:integer
- description:
Remove up to COUNT adapters from each read.
- default:
1
- label:
Error rate
- type:
basic:decimal
- description:
Maximum allowed error rate (no. of errors divided by the length of the matching region).
- default:
0.1
- label:
Minimal overlap
- type:
basic:integer
- description:
Minimum overlap for an adapter match.
- default:
3
- label:
Match read wildcards
- type:
basic:boolean
- description:
Interpret IUPAC wildcards in reads.
- default:
False
- label:
No indels
- type:
basic:boolean
- description:
Disable (disallow) insertions and deletions in adapters.
- default:
False
- label:
NextSeq-specific quality trimming
- type:
basic:integer
- description:
NextSeq-specific quality trimming (each read). Also trims dark cycles appearing as high-quality G bases. This option is mutually exclusive with the use of regular (-q) quality trimming.
- required:
False
- label:
Quality on 5 prime
- type:
basic:integer
- description:
Remove low quality bases from 5 prime. Specifies the minimum quality required to keep a base.
- required:
False
- label:
Quality on 3 prime
- type:
basic:integer
- description:
Remove low quality bases from the 3 prime. Specifies the minimum quality required to keep a base.
- required:
False
- label:
Crop
- type:
basic:integer
- description:
Cut the specified number of bases from the end of the reads.
- required:
False
- label:
Headcrop
- type:
basic:integer
- description:
Cut the specified number of bases from the start of the reads.
- required:
False
- label:
Min length
- type:
basic:integer
- description:
Drop the read if it is below a specified length.
- required:
False
- label:
Max length
- type:
basic:integer
- description:
Drop the read if it is above a specified length.
- required:
False
- label:
Max number of Ns
- type:
basic:integer
- description:
Discard reads having more ‘N’ bases than specified.
- required:
False
- label:
Which of the reads have to match the filtering criterion
- type:
basic:string
- description:
Which of the reads in a paired-end read have to match the filtering criterion in order for the pair to be filtered.
- default:
any
- choices:
Any of the reads in a paired-end read have to match the filtering criterion:
any
Both of the reads in a paired-end read have to match the filtering criterion:
both
Output results
- label:
Reads file (forward)
- type:
list:basic:file
- label:
Reads file (reverse)
- type:
list:basic:file
- label:
Cutadapt report
- type:
basic:file
- label:
Quality control with FastQC (forward)
- type:
list:basic:file:html
- label:
Quality control with FastQC (reverse)
- type:
list:basic:file:html
- label:
Download FastQC archive (forward)
- type:
list:basic:file
- label:
Download FastQC archive (reverse)
- type:
list:basic:file
Cutadapt (single-end)
- data:reads:fastq:single:cutadaptcutadapt-single (data:reads:fastq:single reads, data:seq:nucleotide up_primers_file, data:seq:nucleotide down_primers_file, list:basic:string up_primers_seq, list:basic:string down_primers_seq, basic:integer polya_tail, basic:integer min_overlap, basic:integer nextseq_trim, basic:integer leading, basic:integer trailing, basic:integer crop, basic:integer headcrop, basic:integer minlen, basic:integer maxlen, basic:integer max_n, basic:boolean match_read_wildcards, basic:boolean no_indels, basic:integer times, basic:decimal error_rate)[Source: v2.5.2]
Cutadapt finds and removes adapter sequences, primers, poly-A tails and other types of unwanted sequence from high-throughput sequencing reads. More information about Cutadapt can be found [here](http://cutadapt.readthedocs.io/en/stable/).
Input arguments
- label:
Select sample(s)
- type:
data:reads:fastq:single
- label:
5 prime adapter file
- type:
data:seq:nucleotide
- required:
False
- label:
3 prime adapter file
- type:
data:seq:nucleotide
- required:
False
- label:
5 prime adapter sequence
- type:
list:basic:string
- required:
False
- label:
3 prime adapter sequence
- type:
list:basic:string
- required:
False
- label:
Poly-A tail
- type:
basic:integer
- description:
Length of poly-A tail, example - AAAN -> 3, AAAAAN -> 5
- required:
False
- label:
Minimal overlap
- type:
basic:integer
- description:
Minimum overlap for an adapter match
- default:
3
- label:
NextSeq-specific quality trimming
- type:
basic:integer
- description:
NextSeq-specific quality trimming (each read). Also trims dark cycles appearing as high-quality G bases. This option is mutually exclusive with the use of regular (-q) quality trimming.
- required:
False
- label:
Quality on 5 prime
- type:
basic:integer
- description:
Remove low-quality bases from the 5 prime end. Specifies the minimum quality required to keep a base. This option is mutually exclusive with the use of NextSeq-specific quality trimming.
- required:
False
- label:
Quality on 3 prime
- type:
basic:integer
- description:
Remove low-quality bases from the 3 prime end. Specifies the minimum quality required to keep a base. This option is mutually exclusive with the use of NextSeq-specific quality trimming.
- required:
False
- label:
Crop
- type:
basic:integer
- description:
Cut the read to a specified length by removing bases from the end
- required:
False
- label:
Headcrop
- type:
basic:integer
- description:
Cut the specified number of bases from the start of the read
- required:
False
- label:
Min length
- type:
basic:integer
- description:
Drop the read if it is below a specified length
- required:
False
- label:
Max length
- type:
basic:integer
- description:
Drop the read if it is above a specified length.
- required:
False
- label:
Max number of N-s
- type:
basic:integer
- description:
Discard reads having more ‘N’ bases than specified.
- required:
False
- label:
Match read wildcards
- type:
basic:boolean
- description:
Interpret IUPAC wildcards in reads.
- required:
False
- default:
False
- label:
No indels
- type:
basic:boolean
- description:
Disable (disallow) insertions and deletions in adapters.
- default:
False
- label:
Times
- type:
basic:integer
- description:
Remove up to COUNT adapters from each read.
- default:
1
- label:
Error rate
- type:
basic:decimal
- description:
Maximum allowed error rate (no. of errors divided by the length of the matching region).
- default:
0.1
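The relationship between the Error rate and Minimal overlap settings can be sketched in Python. The sketch assumes that the number of tolerated errors is the error rate multiplied by the length of the matching region, rounded down, and that matches shorter than the minimal overlap are ignored. The function names are hypothetical and this is not Cutadapt's own code.

```python
import math
from decimal import Decimal


def max_errors(match_length, error_rate=0.1):
    """Largest error count whose rate does not exceed error_rate."""
    # Decimal avoids binary floating-point surprises such as 0.1 * 3.
    return math.floor(Decimal(str(error_rate)) * match_length)


def match_is_accepted(match_length, errors, min_overlap=3, error_rate=0.1):
    """Accept an adapter match if it is long enough and has few enough errors."""
    if match_length < min_overlap:
        return False
    return errors <= max_errors(match_length, error_rate)


# Tolerated errors for a few match lengths at the default error rate.
for length in (3, 9, 10, 20):
    print(length, max_errors(length))
```

Under these assumptions and the default error rate of 0.1, a 10-base match tolerates one error while a 9-base match tolerates none.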
Output results
- label:
Reads file
- type:
list:basic:file
- label:
Cutadapt report
- type:
basic:file
- label:
Quality control with FastQC
- type:
list:basic:file:html
- label:
Download FastQC archive
- type:
list:basic:file
Cutadapt - STAR - StringTie (Corall, paired-end)
- data:workflow:rnaseq:corallworkflow-corall-paired (data:reads:fastq:paired reads, data:index:star star_index, data:annotation annotation, data:index:star rrna_reference, data:index:star globin_reference, basic:integer quality_cutoff, basic:integer n_reads, basic:integer seed, basic:decimal fraction, basic:boolean two_pass, basic:string feature_class, basic:string id_attribute)[Source: v5.2.0]
RNA-seq pipeline optimized for the Lexogen Corall Total RNA-Seq Library Prep Kit. UMI sequences are extracted from the raw reads before the reads are trimmed and quality filtered using Cutadapt. Preprocessed reads are aligned by the STAR aligner and de-duplicated using UMI-tools. Gene abundance estimates are reported by the featureCounts tool. QC operates on downsampled reads and includes alignment of input reads to the rRNA/globin reference sequences. The reported alignment rate is used to assess the rRNA/globin sequence depletion rate. The analysis results and QC reports are summarized by MultiQC.
Input arguments
- label:
Select sample(s)
- type:
data:reads:fastq:paired
- label:
Genome
- type:
data:index:star
- description:
Genome index prepared by STAR aligner indexing tool.
- label:
Annotation
- type:
data:annotation
- description:
Genome annotation file (GTF).
- label:
Indexed rRNA reference sequence
- type:
data:index:star
- description:
Reference sequence index prepared by STAR aligner indexing tool.
- label:
Indexed Globin reference sequence
- type:
data:index:star
- description:
Reference sequence index prepared by STAR aligner indexing tool.
- label:
Reads quality cutoff
- type:
basic:integer
- description:
Trim low-quality bases from 3’ end of each read before adapter removal. Use this option when processing the data generated by older Illumina machines. The use of this option will override the NextSeq/NovaSeq-specific trimming procedure which is enabled by default and is recommended for Illumina machines that utilize 2-color chemistry to encode the four bases.
- required:
False
- label:
Number of reads
- type:
basic:integer
- default:
1000000
- label:
Seed
- type:
basic:integer
- default:
11
- label:
Fraction
- type:
basic:decimal
- description:
Use the fraction of reads in range [0.0, 1.0] from the original input file instead of the absolute number of reads. If set, this will override the “Number of reads” input parameter.
- required:
False
- label:
2-pass mode
- type:
basic:boolean
- description:
Enable two-pass mode when down-sampling. Two-pass mode is twice as slow but uses much less memory.
- default:
False
- label:
Feature class
- type:
basic:string
- description:
Feature class (3rd column in GTF/GFF3 file) to be used. All other features will be ignored.
- default:
exon
- label:
ID attribute
- type:
basic:string
- description:
GTF/GFF3 attribute to be used as feature ID. Several GTF/GFF3 lines with the same feature ID will be considered as parts of the same feature. The feature ID is used to identify the counts in the output table. In GTF files this is usually ‘gene_id’, in GFF3 files this is often ‘ID’, and ‘transcript_id’ is frequently a valid choice for both annotation formats.
- default:
gene_id
- choices:
gene_id:
gene_id
transcript_id:
transcript_id
ID:
ID
geneid:
geneid
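The down-sampling inputs used for QC (Number of reads, Seed, Fraction) interact as described above: a fraction, when set, takes precedence over the absolute number of reads, and a fixed seed makes the selection reproducible. A minimal Python sketch of that logic follows. The `downsample` function is hypothetical and does not reproduce the tool used by the pipeline.

```python
import random


def downsample(reads, n_reads=1_000_000, seed=11, fraction=None):
    """Return a reproducible subsample of `reads`, preserving input order."""
    rng = random.Random(seed)
    if fraction is not None:
        # A fraction, when given, overrides the absolute number of reads.
        if not 0.0 <= fraction <= 1.0:
            raise ValueError("fraction must be in the range [0.0, 1.0]")
        return [read for read in reads if rng.random() < fraction]
    if n_reads >= len(reads):
        return list(reads)
    keep = sorted(rng.sample(range(len(reads)), n_reads))
    return [reads[i] for i in keep]


reads = [f"read{i}" for i in range(100)]
subset = downsample(reads, n_reads=10)
print(len(subset))
print(subset == downsample(reads, n_reads=10))  # same seed, same subset
```

Because the random generator is re-seeded on every call, repeating the call with the same seed yields the same subset.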
Output results
Cutadapt - STAR - StringTie (Corall, single-end)
- data:workflow:rnaseq:corallworkflow-corall-single (data:reads:fastq:single reads, data:index:star star_index, data:annotation annotation, data:index:star rrna_reference, data:index:star globin_reference, basic:integer quality_cutoff, basic:integer n_reads, basic:integer seed, basic:decimal fraction, basic:boolean two_pass, basic:string feature_class, basic:string id_attribute)[Source: v5.2.0]
RNA-seq pipeline optimized for the Lexogen Corall Total RNA-Seq Library Prep Kit. UMI sequences are extracted from the raw reads before the reads are trimmed and quality filtered using Cutadapt. Preprocessed reads are aligned by the STAR aligner and de-duplicated using UMI-tools. Gene abundance estimates are reported by the featureCounts tool. QC operates on downsampled reads and includes alignment of input reads to the rRNA/globin reference sequences. The reported alignment rate is used to assess the rRNA/globin sequence depletion rate. The analysis results and QC reports are summarized by MultiQC.
Input arguments
- label:
Select sample(s)
- type:
data:reads:fastq:single
- label:
Genome
- type:
data:index:star
- description:
Genome index prepared by STAR aligner indexing tool.
- label:
Annotation
- type:
data:annotation
- description:
Genome annotation file (GTF).
- label:
Indexed rRNA reference sequence
- type:
data:index:star
- description:
Reference sequence index prepared by STAR aligner indexing tool.
- label:
Indexed Globin reference sequence
- type:
data:index:star
- description:
Reference sequence index prepared by STAR aligner indexing tool.
- label:
Reads quality cutoff
- type:
basic:integer
- description:
Trim low-quality bases from 3’ end of each read before adapter removal. Use this option when processing the data generated by older Illumina machines. The use of this option will override the NextSeq/NovaSeq-specific trimming procedure which is enabled by default and is recommended for Illumina machines that utilize 2-color chemistry to encode the four bases.
- required:
False
- label:
Number of reads
- type:
basic:integer
- default:
1000000
- label:
Seed
- type:
basic:integer
- default:
11
- label:
Fraction
- type:
basic:decimal
- description:
Use the fraction of reads in range [0.0, 1.0] from the original input file instead of the absolute number of reads. If set, this will override the “Number of reads” input parameter.
- required:
False
- label:
2-pass mode
- type:
basic:boolean
- description:
Enable two-pass mode when down-sampling. Two-pass mode is twice as slow but uses much less memory.
- default:
False
- label:
Feature class
- type:
basic:string
- description:
Feature class (3rd column in GTF/GFF3 file) to be used. All other features will be ignored.
- default:
exon
- label:
ID attribute
- type:
basic:string
- description:
GTF/GFF3 attribute to be used as feature ID. Several GTF/GFF3 lines with the same feature ID will be considered as parts of the same feature. The feature ID is used to identify the counts in the output table. In GTF files this is usually ‘gene_id’, in GFF3 files this is often ‘ID’, and ‘transcript_id’ is frequently a valid choice for both annotation formats.
- default:
gene_id
- choices:
gene_id:
gene_id
transcript_id:
transcript_id
ID:
ID
geneid:
geneid
Output results
DESeq2
- data:differentialexpression:deseq2:differentialexpression-deseq2 (list:data:expression case, list:data:expression control, basic:boolean create_sets, basic:decimal logfc, basic:decimal fdr, basic:boolean beta_prior, basic:boolean count, basic:integer min_count_sum, basic:boolean cook, basic:decimal cooks_cutoff, basic:boolean independent, basic:decimal alpha)[Source: v3.6.0]
Run DESeq2 analysis. The DESeq2 package estimates variance-mean dependence in count data from high-throughput sequencing assays and tests for differential expression based on a model using the negative binomial distribution. See [here](https://www.bioconductor.org/packages/release/bioc/manuals/DESeq2/man/DESeq2.pdf) and [here](http://bioconductor.org/packages/devel/bioc/vignettes/DESeq2/inst/doc/DESeq2.html) for more information.
Input arguments
- label:
Case
- type:
list:data:expression
- description:
Case samples (replicates)
- required:
True
- disabled:
False
- hidden:
False
- label:
Control
- type:
list:data:expression
- description:
Control samples (replicates)
- required:
True
- disabled:
False
- hidden:
False
- label:
Create gene sets
- type:
basic:boolean
- description:
After calculating differential gene expression, create gene sets for up-regulated genes, down-regulated genes and all genes.
- required:
True
- disabled:
False
- hidden:
False
- default:
False
- label:
Log2 fold change threshold for gene sets
- type:
basic:decimal
- description:
Genes above Log2FC are considered as up-regulated and genes below -Log2FC as down-regulated.
- required:
True
- disabled:
False
- hidden:
!create_sets
- default:
1.0
- label:
FDR threshold for gene sets
- type:
basic:decimal
- required:
True
- disabled:
False
- hidden:
!create_sets
- default:
0.05
- label:
Beta prior
- type:
basic:boolean
- description:
Whether or not to put a zero-mean normal prior on the non-intercept coefficients.
- required:
True
- disabled:
False
- hidden:
False
- default:
False
- label:
Filter genes based on expression count
- type:
basic:boolean
- required:
True
- disabled:
False
- hidden:
False
- default:
True
- label:
Minimum gene expression count summed over all samples
- type:
basic:integer
- description:
Filter genes in the expression matrix input. Remove genes where the expression count sum over all samples is below the threshold.
- required:
True
- disabled:
False
- hidden:
!filter_options.count
- default:
10
- label:
Filter genes based on Cook’s distance
- type:
basic:boolean
- required:
True
- disabled:
False
- hidden:
False
- default:
False
- label:
Threshold on Cook’s distance
- type:
basic:decimal
- description:
If one or more samples have Cook’s distance larger than the threshold set here, the p-value for the row is set to NA. If left empty, the default threshold of 0.99 quantile of the F(p, m-p) distribution is used, where p is the number of coefficients being fitted and m is the number of samples. This test excludes Cook’s distance of samples belonging to experimental groups with only two samples.
- required:
False
- disabled:
False
- hidden:
!filter_options.cook
- label:
Apply independent gene filtering
- type:
basic:boolean
- required:
True
- disabled:
False
- hidden:
False
- default:
True
- label:
Significance cut-off used for optimizing independent gene filtering
- type:
basic:decimal
- description:
The value should be set to adjusted p-value cut-off (FDR).
- required:
True
- disabled:
False
- hidden:
!filter_options.independent
- default:
0.1
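The count filter and the gene-set thresholds described above can be illustrated in plain Python. This is only a sketch of the thresholding logic, not DESeq2 itself: the function names and input structures are hypothetical, and the strict comparison against the FDR threshold and the contents of the "all" set are assumptions.

```python
def filter_by_count_sum(counts, min_count_sum=10):
    """Keep genes whose count summed over all samples reaches the threshold."""
    return {
        gene: values
        for gene, values in counts.items()
        if sum(values) >= min_count_sum
    }


def make_gene_sets(results, logfc=1.0, fdr=0.05):
    """Split genes into up- and down-regulated sets.

    `results` maps gene ID -> (log2 fold change, adjusted p-value or None).
    """
    significant = {
        gene: lfc
        for gene, (lfc, padj) in results.items()
        if padj is not None and padj < fdr  # assumption: strict comparison
    }
    return {
        "up": sorted(g for g, lfc in significant.items() if lfc > logfc),
        "down": sorted(g for g, lfc in significant.items() if lfc < -logfc),
        "all": sorted(results),  # assumption: every tested gene
    }


counts = {"geneA": [5, 6, 7, 8], "geneB": [0, 1, 0, 2], "geneC": [100, 90, 10, 12]}
print(sorted(filter_by_count_sum(counts)))

results = {"geneA": (2.3, 0.001), "geneB": (-1.8, 0.01), "geneC": (0.4, 0.0001)}
print(make_gene_sets(results))
```

In this toy input, geneC is highly significant but falls inside the fold-change band, so it lands in neither the up nor the down set.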
Output results
- label:
Differential expression
- type:
basic:file
- required:
True
- disabled:
False
- hidden:
False
- label:
Results table (JSON)
- type:
basic:json
- required:
True
- disabled:
False
- hidden:
False
- label:
Results table (file)
- type:
basic:file
- required:
True
- disabled:
False
- hidden:
False
- label:
Count matrix
- type:
basic:file
- required:
True
- disabled:
False
- hidden:
False
- label:
Normalized count matrix (median of ratios)
- type:
basic:file
- required:
True
- disabled:
False
- hidden:
False
- label:
Gene ID database
- type:
basic:string
- required:
True
- disabled:
False
- hidden:
False
- label:
Species
- type:
basic:string
- required:
True
- disabled:
False
- hidden:
False
- label:
Build
- type:
basic:string
- required:
True
- disabled:
False
- hidden:
False
- label:
Feature type
- type:
basic:string
- required:
True
- disabled:
False
- hidden:
False
Detect library strandedness
- data:strandednesslibrary-strandedness (data:reads:fastq reads, basic:integer read_number, data:index:salmon salmon_index)[Source: v0.6.2]
This process uses the Salmon transcript quantification tool to automatically infer the NGS library strandedness. For more details, please see the Salmon [documentation](https://salmon.readthedocs.io/en/latest/library_type.html)
Input arguments
- label:
Sequencing reads
- type:
data:reads:fastq
- description:
Sequencing reads in .fastq format. Both single and paired-end libraries are supported
- label:
Number of input reads
- type:
basic:integer
- description:
Number of sequencing reads that are subsampled from each of the original .fastq files before library strand detection
- default:
50000
- label:
Transcriptome index file
- type:
data:index:salmon
- description:
Transcriptome index file created using the Salmon indexing tool. cDNA (transcriptome) sequences used for index file creation must be derived from the same species as the input sequencing reads to obtain reliable analysis results
Output results
- label:
Library strandedness type
- type:
basic:string
- description:
The predicted library strandedness type. The codes U and IU indicate ‘strand non-specific’ library for single or paired-end reads, respectively. Codes SF and ISF correspond to the ‘strand-specific forward’ library, for the single or paired-end reads, respectively. For ‘strand-specific reverse’ library, the corresponding codes are SR and ISR. For more details, please see the Salmon [documentation](https://salmon.readthedocs.io/en/latest/library_type.html)
- label:
Compatible fragment ratio
- type:
basic:decimal
- description:
The ratio of fragments that support the predicted library strandedness type
- label:
Log file
- type:
basic:file
- description:
Analysis log file.
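The strandedness codes listed in the output description can be captured in a small lookup table, which is convenient when downstream steps need to branch on the detected library type. The sketch below only restates the six codes given above; the `describe_library_type` helper is hypothetical.

```python
# Library type codes as described in the "Library strandedness type" output.
LIBRARY_TYPES = {
    "U": ("single-end", "strand non-specific"),
    "IU": ("paired-end", "strand non-specific"),
    "SF": ("single-end", "strand-specific forward"),
    "ISF": ("paired-end", "strand-specific forward"),
    "SR": ("single-end", "strand-specific reverse"),
    "ISR": ("paired-end", "strand-specific reverse"),
}


def describe_library_type(code):
    """Translate a strandedness code into a readable description."""
    try:
        layout, strandedness = LIBRARY_TYPES[code]
    except KeyError:
        raise ValueError(f"unknown library type code: {code!r}") from None
    return f"{strandedness} ({layout} reads)"


print(describe_library_type("ISR"))
```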
Dictyostelium expressions
- data:expression:polyaexpression-dicty (data:alignment:bam alignment, data:annotation:gff3 gff, data:mappability:bcm mappable)[Source: v1.4.2]
Dictyostelium-specific pipeline. Developed by Bioinformatics Laboratory, Faculty of Computer and Information Science, University of Ljubljana, Slovenia and Shaulsky Lab, Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA.
Input arguments
- label:
Aligned sequence
- type:
data:alignment:bam
- label:
Features (GFF3)
- type:
data:annotation:gff3
- label:
Mappability
- type:
data:mappability:bcm
Output results
- label:
Expression RPKUM (polyA)
- type:
basic:file
- description:
mRNA reads scaled by uniquely mappable part of exons.
- label:
Expression RPKM (polyA)
- type:
basic:file
- description:
mRNA reads scaled by exon length.
- label:
Read counts (polyA)
- type:
basic:file
- description:
mRNA reads uniquely mapped to gene exons.
- label:
Expression RPKUM
- type:
basic:file
- description:
Reads scaled by uniquely mappable part of exons.
- label:
Expression RPKM
- type:
basic:file
- description:
Reads scaled by exon length.
- label:
Read counts (raw)
- type:
basic:file
- description:
Reads uniquely mapped to gene exons.
- label:
Expression RPKUM (polyA) (json)
- type:
basic:json
- label:
Expression Type (default output)
- type:
basic:string
- label:
Gene ID database
- type:
basic:string
- label:
Species
- type:
basic:string
- label:
Build
- type:
basic:string
- label:
Feature type
- type:
basic:string
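The difference between the RPKM and RPKUM outputs lies in the length used for scaling: exon length for RPKM and the uniquely mappable part of the exons for RPKUM. The sketch below uses the standard RPKM formula (reads per kilobase per million mapped reads) and assumes that RPKUM simply substitutes the uniquely mappable length. The function names are hypothetical and the pipeline's exact normalization may differ.

```python
def rpkm(read_count, exon_length, total_reads):
    """Reads per kilobase of exon per million mapped reads."""
    if exon_length <= 0 or total_reads <= 0:
        raise ValueError("exon_length and total_reads must be positive")
    return read_count * 1e9 / (exon_length * total_reads)


def rpkum(read_count, mappable_length, total_reads):
    """Like RPKM, but scaled by the uniquely mappable exon length (assumption)."""
    return rpkm(read_count, mappable_length, total_reads)


# A gene with 1000 exonic bases, of which only 500 are uniquely mappable.
print(rpkm(100, 1000, 1_000_000))
print(rpkum(100, 500, 1_000_000))
```

For a gene whose exons are only partly uniquely mappable, RPKUM is higher than RPKM because the same reads are divided by a shorter effective length.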
Differential Expression (table)
- data:differentialexpression:uploadupload-diffexp (basic:file src, basic:string gene_id, basic:string logfc, basic:string fdr, basic:string logodds, basic:string fwer, basic:string pvalue, basic:string stat, basic:string source, basic:string species, basic:string build, basic:string feature_type, list:data:expression case, list:data:expression control)[Source: v1.5.1]
Upload Differential Expression table.
Input arguments
- label:
Differential expression file
- type:
basic:file
- description:
Differential expression file. Supported file types: *.xls, *.xlsx, *.tab (tab-delimited file), *.diff. DE file must include columns with log2(fold change) and FDR or pval information. DE file must contain header row with column names. Accepts DESeq, DESeq2, edgeR and CuffDiff output files.
- validate_regex:
\.(xls|xlsx|tab|tab.gz|diff|diff.gz)$
- label:
Gene ID label
- type:
basic:string
- label:
LogFC label
- type:
basic:string
- label:
FDR label
- type:
basic:string
- required:
False
- label:
LogOdds label
- type:
basic:string
- required:
False
- label:
FWER label
- type:
basic:string
- required:
False
- label:
Pvalue label
- type:
basic:string
- required:
False
- label:
Statistics label
- type:
basic:string
- required:
False
- label:
Gene ID database
- type:
basic:string
- choices:
AFFY:
AFFY
DICTYBASE:
DICTYBASE
ENSEMBL:
ENSEMBL
NCBI:
NCBI
UCSC:
UCSC
- label:
Species
- type:
basic:string
- description:
Latin name of the species.
- choices:
Homo sapiens:
Homo sapiens
Mus musculus:
Mus musculus
Rattus norvegicus:
Rattus norvegicus
Dictyostelium discoideum:
Dictyostelium discoideum
Odocoileus virginianus texanus:
Odocoileus virginianus texanus
Solanum tuberosum:
Solanum tuberosum
- label:
Build
- type:
basic:string
- description:
Genome build or annotation version.
- label:
Feature type
- type:
basic:string
- default:
gene
- choices:
gene:
gene
transcript:
transcript
exon:
exon
- label:
Case
- type:
list:data:expression
- description:
Case samples (replicates)
- required:
False
- label:
Control
- type:
list:data:expression
- description:
Control samples (replicates)
- required:
False
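The `validate_regex` pattern of the Differential expression file input can be checked against candidate file names before upload. The sketch below applies the pattern exactly as shown above, using Python's `re.search`; how the platform itself applies the pattern is an assumption, and `is_accepted` is a hypothetical helper.

```python
import re

# Pattern copied verbatim from the validate_regex field.
VALIDATE_REGEX = re.compile(r"\.(xls|xlsx|tab|tab.gz|diff|diff.gz)$")


def is_accepted(filename):
    """Return True if the file name ends with a supported extension."""
    return VALIDATE_REGEX.search(filename) is not None


for name in ("de.tab", "de.tab.gz", "de.xlsx", "de.diff.gz", "de.csv", "de.tab.bz2"):
    print(name, is_accepted(name))
```

Note that the dot inside `tab.gz` and `diff.gz` is not escaped in the pattern, so it matches any single character at that position, not only a literal dot.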
Output results
- label:
Differential expression
- type:
basic:file
- label:
Results table (JSON)
- type:
basic:json
- label:
Results table (file)
- type:
basic:file
- label:
Gene ID database
- type:
basic:string
- label:
Species
- type:
basic:string
- label:
Build
- type:
basic:string
- label:
Feature type
- type:
basic:string
Differential expression of shRNA
- data:shrna:differentialexpression:differentialexpression-shrna (data:file parameter_file, list:data:expression:shrna2quant: expression_data)[Source: v1.3.0]
Perform differential expression analysis on a list of objects. The analysis takes a set of expression files (count matrices) and a parameter file. The parameter file is an xlsx file consisting of the following tabs:
- `sample_key`: Should have a column sample with the exact sample name as in the input expression file(s), columns defining the treatment, and lastly a column which indicates the replicate.
- `contrasts`: Defines the groups used to perform differential expression analysis. The DE model uses these contrasts and the replicate number; in R notation, this would be ` ~ 1 + group + replicate`. The table should have two columns named `group_1` and `group_2`.
- `overall_contrasts`: A layer “above” `contrasts`, where results from two contrasts are compared for lethal, beneficial and neutral species. Thresholds governing the classification can be found in the `classification_parameters` tab.
- `classification_parameters`: Holds three columns, `threshold`, `value` and `description`. Only the first two are used in the workflow; the description is for your benefit.
This process outputs DESeq2 results, classified results based on the provided thresholds, and counts of beneficial and lethal species.
Input arguments
- label:
Excel parameter file (.xlsx)
- type:
data:file
- description:
Select .xlsx file which holds parameters for analysis. See [here](https://github.com/genialis/shRNAde/blob/master/inst/extdata/template_doDE_inputs.xlsx) for a template.
- required:
True
- disabled:
False
- hidden:
False
- label:
List of expression files from shrna2quant
- type:
list:data:expression:shrna2quant:
- required:
True
- disabled:
False
- hidden:
False
Output results
- label:
DESeq2 results
- type:
basic:file
- required:
True
- disabled:
False
- hidden:
False
- label:
Results classified based on thresholds provided by the user
- type:
basic:file
- required:
True
- disabled:
False
- hidden:
False
- label:
shRNAs considered as beneficial based on user input
- type:
basic:file
- required:
True
- disabled:
False
- hidden:
False
- label:
shRNAs considered as lethal based on user input
- type:
basic:file
- required:
True
- disabled:
False
- hidden:
False
Ensembl Variant Effect Predictor
- data:variants:vcf:vep:ensembl-vep (data:variants:vcf vcf, data:vep:cache cache, data:seq:nucleotide ref_seq, basic:integer n_forks)[Source: v2.1.0]
Run Ensembl-VEP. VEP (Variant Effect Predictor) determines the effect of your variants (SNPs, insertions, deletions, CNVs or structural variants) on genes, transcripts, and protein sequence, as well as regulatory regions. This process accepts a VCF file and a VEP cache directory and produces a VCF file with annotated variants, its index, and a summary of the process.
Input arguments
- label:
Input VCF file
- type:
data:variants:vcf
- required:
True
- disabled:
False
- hidden:
False
- label:
Cache directory for Ensembl-VEP
- type:
data:vep:cache
- required:
True
- disabled:
False
- hidden:
False
- label:
Reference sequence
- type:
data:seq:nucleotide
- required:
True
- disabled:
False
- hidden:
False
- label:
Number of forks
- type:
basic:integer
- description:
Using forking enables VEP to run multiple parallel threads, with each thread processing a subset of your input. Forking can dramatically improve runtime.
- required:
True
- disabled:
False
- hidden:
False
- default:
2
Output results
- label:
Annotated VCF file
- type:
basic:file
- required:
True
- disabled:
False
- hidden:
False
- label:
Tabix index
- type:
basic:file
- required:
True
- disabled:
False
- hidden:
False
- label:
Summary of the analysis
- type:
basic:file:html
- required:
True
- disabled:
False
- hidden:
False
- label:
Species
- type:
basic:string
- required:
True
- disabled:
False
- hidden:
False
- label:
Build
- type:
basic:string
- required:
True
- disabled:
False
- hidden:
False
Ensembl-VEP cache directory
- data:vep:cache:upload-vep-cache (basic:file cache_file, basic:string species, basic:string build, basic:string release)[Source: v1.1.0]
Import VEP cache directory.
Input arguments
- label:
Compressed cache directory
- type:
basic:file
- required:
True
- disabled:
False
- hidden:
False
- label:
Species
- type:
basic:string
- description:
Select a species name from the dropdown menu.
- required:
True
- disabled:
False
- hidden:
False
- default:
Homo sapiens
- choices:
Homo sapiens:
Homo sapiens
Mus musculus:
Mus musculus
Rattus norvegicus:
Rattus norvegicus
- label:
Genome build
- type:
basic:string
- required:
True
- disabled:
False
- hidden:
False
- label:
Cache release
- type:
basic:string
- required:
True
- disabled:
False
- hidden:
False
Output results
- label:
Cache directory
- type:
basic:dir
- required:
True
- disabled:
False
- hidden:
False
- label:
Species
- type:
basic:string
- required:
True
- disabled:
False
- hidden:
False
- label:
Build
- type:
basic:string
- required:
True
- disabled:
False
- hidden:
False
- label:
Cache release
- type:
basic:string
- required:
True
- disabled:
False
- hidden:
False
Expression Time Course
- data:etcetc-bcm (list:data:expression expressions, basic:boolean avg)[Source: v1.2.2]
Select gene expression data and form a time course.
Input arguments
- label:
RPKM expression profile
- type:
list:data:expression
- required:
True
- label:
Average by time
- type:
basic:boolean
- default:
True
Output results
- label:
Expression time course file
- type:
basic:file
- label:
Expression time course
- type:
basic:json
Expression aggregator
- data:aggregator:expressionexpression-aggregator (list:data:expression exps, basic:string group_by, data:aggregator:expression expr_aggregator)[Source: v0.5.1]
Collect expression data from samples grouped by sample descriptor field. The Expression aggregator process should not be run in Batch Mode, as this will create redundant outputs. Rather, select multiple samples below for which you wish to aggregate the expression matrix.
Input arguments
- label:
Expressions
- type:
list:data:expression
- label:
Sample descriptor field
- type:
basic:string
- label:
Expression aggregator
- type:
data:aggregator:expression
- required:
False
Output results
- label:
Expression matrix
- type:
basic:file
- label:
Box plot
- type:
basic:json
- label:
Log box plot
- type:
basic:json
- label:
Gene ID database
- type:
basic:string
- label:
Species
- type:
basic:string
- label:
Expression type
- type:
basic:string
Expression matrix
- data:expressionsetmergeexpressions (list:data:expression exps, list:basic:string genes)[Source: v1.4.2]
Merge expression data to create an expression matrix where each column represents all the gene expression levels from a single experiment, and each row represents the expression of a gene across all experiments.
Input arguments
- label:
Gene expressions
- type:
list:data:expression
- label:
Filter genes
- type:
list:basic:string
- required:
False
Output results
- label:
Expression set
- type:
basic:file
- label:
Expression set type
- type:
basic:string
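The matrix layout described above (one column per experiment, one row per gene) can be sketched in a few lines of Python. The `merge_expressions` function, its input structure, and the use of `None` for genes missing from an experiment are assumptions made for illustration, not the process's actual implementation.

```python
def merge_expressions(experiments, genes=None):
    """Merge per-experiment expression values into a gene-by-experiment matrix.

    `experiments` maps experiment name -> {gene ID: expression value}.
    `genes`, if given, restricts the rows to the listed gene IDs.
    """
    all_genes = sorted(set().union(*(exp.keys() for exp in experiments.values())))
    if genes:
        wanted = set(genes)
        all_genes = [gene for gene in all_genes if gene in wanted]
    header = ["Gene"] + list(experiments)
    rows = [
        [gene] + [exp.get(gene) for exp in experiments.values()]
        for gene in all_genes
    ]
    return header, rows


experiments = {
    "exp1": {"g1": 1.0, "g2": 2.0},
    "exp2": {"g1": 3.0, "g3": 4.0},
}
header, rows = merge_expressions(experiments)
print("\t".join(header))
for row in rows:
    print("\t".join(str(value) for value in row))
```

Passing `genes=["g1"]` corresponds to the optional Filter genes input and keeps only the matching rows.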
Expression time course
- data:etcupload-etc (basic:file src)[Source: v1.4.1]
Upload Expression time course.
Input arguments
- label:
Expression time course file (xls or tab)
- type:
basic:file
- description:
Expression time course
- required:
True
- validate_regex:
\.(xls|xlsx|tab)$
Output results
- label:
Expression time course file
- type:
basic:file
- label:
Expression time course
- type:
basic:json
FASTA file
- data:seq:nucleotide:upload-fasta-nucl (basic:file src, basic:string species, basic:string build)[Source: v3.2.0]
Import a nucleotide sequence file in FASTA format. FASTA is a text-based format for representing nucleotide sequences, in which nucleotides are represented using single-letter codes. The uploaded FASTA file can hold multiple nucleotide sequences.
Input arguments
- label:
Sequence file (FASTA)
- type:
basic:file
- required:
True
- disabled:
False
- hidden:
False
- label:
Species
- type:
basic:string
- description:
Select a species name from the dropdown menu or write a custom species name in the species field. For sequences that are not related to any particular species (e.g. adapters file), you can select the value Other.
- required:
True
- disabled:
False
- hidden:
False
- choices:
Homo sapiens:
Homo sapiens
Mus musculus:
Mus musculus
Rattus norvegicus:
Rattus norvegicus
Macaca mulatta:
Macaca mulatta
Dictyostelium discoideum:
Dictyostelium discoideum
Other:
Other
- label:
Genome build
- type:
basic:string
- description:
Enter the genome build information associated with the uploaded sequence(s).
- required:
True
- disabled:
False
- hidden:
False
Output results
- label:
FASTA file (compressed)
- type:
basic:file
- required:
True
- disabled:
False
- hidden:
False
- label:
FASTA file
- type:
basic:file
- required:
True
- disabled:
False
- hidden:
False
- label:
FASTA file index
- type:
basic:file
- required:
True
- disabled:
False
- hidden:
False
- label:
FASTA dictionary
- type:
basic:file
- required:
True
- disabled:
False
- hidden:
False
- label:
Number of sequences
- type:
basic:integer
- required:
True
- disabled:
False
- hidden:
False
- label:
Species
- type:
basic:string
- required:
True
- disabled:
False
- hidden:
False
- label:
Build
- type:
basic:string
- required:
True
- disabled:
False
- hidden:
False
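Since a FASTA file can hold several sequences, the Number of sequences output corresponds to the number of header lines (lines starting with `>`). A minimal Python reader, written as a sketch with hypothetical function names, shows how such a file is structured:

```python
import io


def read_fasta(handle):
    """Parse FASTA text into {sequence name: sequence}."""
    records = {}
    name = None
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # The name is the first word after ">"; the rest is a description.
            fields = line[1:].split()
            name = fields[0] if fields else ""
            records[name] = []
        elif name is not None:
            records[name].append(line)
    return {key: "".join(parts) for key, parts in records.items()}


def count_sequences(handle):
    """Count the header lines in FASTA text."""
    return sum(1 for line in handle if line.startswith(">"))


# Two short made-up sequences, the second one wrapped over two lines.
example = ">seq1 first example\nACGTAC\n>seq2\nGGGTTT\nAAACCC\n"
print(count_sequences(io.StringIO(example)))
print(read_fasta(io.StringIO(example)))
```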
FASTQ file (paired-end)
- data:reads:fastq:paired:upload-fastq-paired (list:basic:file src1, list:basic:file src2, basic:boolean merge_lanes)[Source: v2.6.0]
Import paired-end reads in FASTQ format, which is a text-based format for storing both a biological sequence (usually nucleotide sequence) and its corresponding quality scores.
Input arguments
- label:
Mate1
- type:
list:basic:file
- description:
Sequencing reads in FASTQ format. Supported extensions: .fastq.gz (preferred), .fq.* or .fastq.*
- required:
True
- disabled:
False
- hidden:
False
- label:
Mate2
- type:
list:basic:file
- description:
Sequencing reads in FASTQ format. Supported extensions: .fastq.gz (preferred), .fq.* or .fastq.*
- required:
True
- disabled:
False
- hidden:
False
- label:
Merge lanes
- type:
basic:boolean
- description:
Merge sample data split into multiple sequencing lanes into a single FASTQ file.
- required:
True
- disabled:
False
- hidden:
False
- default:
False
Output results
- label:
Reads file (mate 1)
- type:
list:basic:file
- required:
True
- disabled:
False
- hidden:
False
- label:
Reads file (mate 2)
- type:
list:basic:file
- required:
True
- disabled:
False
- hidden:
False
- label:
Quality control with FastQC (Upstream)
- type:
list:basic:file:html
- required:
True
- disabled:
False
- hidden:
False
- label:
Quality control with FastQC (Downstream)
- type:
list:basic:file:html
- required:
True
- disabled:
False
- hidden:
False
- label:
Download FastQC archive (Upstream)
- type:
list:basic:file
- required:
True
- disabled:
False
- hidden:
False
- label:
Download FastQC archive (Downstream)
- type:
list:basic:file
- required:
True
- disabled:
False
- hidden:
False
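To make the FASTQ description concrete: each record spans four lines (header, sequence, separator, quality string), and every quality character encodes one score. The sketch below assumes uncompressed four-line records and the common Phred+33 encoding; the function names are hypothetical.

```python
import io


def read_fastq(handle):
    """Yield (name, sequence, quality) tuples from four-line FASTQ records."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            break
        sequence = handle.readline().rstrip("\n")
        handle.readline()  # separator line starting with "+"
        quality = handle.readline().rstrip("\n")
        yield header[1:], sequence, quality


def phred_scores(quality, offset=33):
    """Decode a quality string into integer Phred scores (Phred+33 assumed)."""
    return [ord(char) - offset for char in quality]


example = "@read1\nACGT\n+\nII5!\n"
for name, sequence, quality in read_fastq(io.StringIO(example)):
    print(name, sequence, phred_scores(quality))
```

Gzip-compressed input (the preferred .fastq.gz extension) could be handled the same way by opening the file with `gzip.open(path, "rt")` before passing the handle to `read_fastq`.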
FASTQ file (single-end)
- data:reads:fastq:single:upload-fastq-single (list:basic:file src, basic:boolean merge_lanes)[Source: v2.6.0]
Import single-end reads in FASTQ format, which is a text-based format for storing both a biological sequence (usually nucleotide sequence) and its corresponding quality scores.
Input arguments
- label:
Reads
- type:
list:basic:file
- description:
Sequencing reads in FASTQ format. Supported extensions: .fastq.gz (preferred), .fq.* or .fastq.*
- required:
True
- disabled:
False
- hidden:
False
- label:
Merge lanes
- type:
basic:boolean
- description:
Merge sample data split into multiple sequencing lanes into a single FASTQ file.
- required:
True
- disabled:
False
- hidden:
False
- default:
False
Output results
- label:
Reads file
- type:
list:basic:file
- required:
True
- disabled:
False
- hidden:
False
- label:
Quality control with FastQC
- type:
list:basic:file:html
- required:
True
- disabled:
False
- hidden:
False
- label:
Download FastQC archive
- type:
list:basic:file
- required:
True
- disabled:
False
- hidden:
False
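The "Merge lanes" option above can be pictured as concatenating the per-lane files into one. A minimal sketch, assuming gzip-compressed inputs (concatenated gzip members form a valid gzip file); the helper `merge_lanes` and the file names are hypothetical, and the process itself may merge lanes differently.

```python
import gzip
import os
import shutil
import tempfile


def merge_lanes(lane_paths, merged_path):
    """Concatenate gzip-compressed lane files into one file, in the given order."""
    with open(merged_path, "wb") as out:
        for path in lane_paths:
            with open(path, "rb") as lane:
                shutil.copyfileobj(lane, out)


# Self-check with two small, made-up lane files.
workdir = tempfile.mkdtemp()
lanes = []
for i, record in enumerate(["@r1\nACGT\n+\nIIII\n", "@r2\nTTGA\n+\nFFFF\n"], start=1):
    path = os.path.join(workdir, "sample_L00%d.fastq.gz" % i)
    with gzip.open(path, "wt") as fh:
        fh.write(record)
    lanes.append(path)

merged = os.path.join(workdir, "sample.fastq.gz")
merge_lanes(lanes, merged)
with gzip.open(merged, "rt") as fh:
    merged_text = fh.read()
```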
Find similar genes
- data:similarexpression:find-similar (list:data:expression expressions, basic:string gene, basic:string distance)[Source: v1.3.1]
Find genes with similar expression profile. Find genes that have similar expression over time to the query gene.
Input arguments
- label:
Time series relation
- type:
list:data:expression
- description:
Select the time course to which the expressions belong.
- required:
True
- disabled:
False
- hidden:
False
- label:
Query gene
- type:
basic:string
- description:
Select a gene to which others are compared.
- required:
True
- disabled:
False
- hidden:
False
- label:
Distance metric
- type:
basic:string
- required:
True
- disabled:
False
- hidden:
False
- default:
spearman
- choices:
Euclidean:
euclidean
Spearman:
spearman
Pearson:
pearson
Output results
- label:
Similar genes
- type:
basic:json
- required:
True
- disabled:
False
- hidden:
False
- label:
Gene ID database
- type:
basic:string
- required:
True
- disabled:
False
- hidden:
False
- label:
Species
- type:
basic:string
- required:
True
- disabled:
False
- hidden:
False
- label:
Build
- type:
basic:string
- required:
True
- disabled:
False
- hidden:
False
- label:
Feature type
- type:
basic:string
- required:
True
- disabled:
False
- hidden:
False
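To make the three distance metric choices concrete, the sketch below ranks genes by similarity to a query gene using plain Python. Turning a correlation `r` into a distance as `1 - r` is an assumption made for illustration, as are the function names and the toy expression profiles; the process may compute distances differently.

```python
import math


def euclidean(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def pearson(x, y):
    """Pearson correlation; undefined (division by zero) for constant profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def find_similar(query, profiles, metric="spearman"):
    """Return (gene, distance) pairs sorted from most to least similar."""
    target = profiles[query]
    distances = []
    for gene, profile in profiles.items():
        if gene == query:
            continue
        if metric == "euclidean":
            d = euclidean(target, profile)
        elif metric == "pearson":
            d = 1 - pearson(target, profile)  # assumed conversion to a distance
        elif metric == "spearman":
            d = 1 - spearman(target, profile)  # assumed conversion to a distance
        else:
            raise ValueError("unknown metric: " + metric)
        distances.append((gene, d))
    return sorted(distances, key=lambda item: item[1])


# Toy time-course profiles (made-up values).
profiles = {"geneA": [1, 2, 3, 4], "geneB": [2, 4, 6, 9], "geneC": [4, 3, 2, 1]}
result = find_similar("geneA", profiles, metric="spearman")
```

Spearman works on ranks, so `geneB` is treated as identical in shape to `geneA` even though its values grow faster; Euclidean distance would instead penalize the difference in magnitude.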
GAF file
- data:gaf:2:0upload-gaf (basic:file src, basic:string source, basic:string species)[Source: v1.4.0]
GO annotation file (GAF v2.0) relating gene ID and associated GO terms
Input arguments
- label:
GO annotation file (GAF v2.0)
- type:
basic:file
- description:
Upload GO annotation file (GAF v2.0) relating gene ID and associated GO terms
- label:
Gene ID database
- type:
basic:string
- choices:
AFFY:
AFFY
DICTYBASE:
DICTYBASE
ENSEMBL:
ENSEMBL
MGI:
MGI
NCBI:
NCBI
UCSC:
UCSC
UniProtKB:
UniProtKB
- label:
Species
- type:
basic:string
Output results
- label:
GO annotation file (GAF v2.0)
- type:
basic:file
- label:
GAF object
- type:
basic:file
- label:
Gene ID database
- type:
basic:string
- label:
Species
- type:
basic:string
GATK GenomicsDBImport
- data:genomicsdb:gatk-genomicsdb-import (list:data:variants:gvcf gvcfs, data:bed intervals, basic:boolean use_existing, data:genomicsdb existing_db, basic:integer batch_size, basic:boolean consolidate, basic:integer max_heap_size, basic:boolean use_cms_gc)[Source: v1.3.0]
Import single-sample GVCFs into GenomicsDB before joint genotyping.
Input arguments
- label:
Input data (GVCF)
- type:
list:data:variants:gvcf
- required:
True
- disabled:
False
- hidden:
False
- label:
Intervals file (.bed)
- type:
data:bed
- description:
Intervals file is required if a new database will be created.
- required:
False
- disabled:
False
- hidden:
False
- label:
Add new samples to an existing GenomicsDB workspace
- type:
basic:boolean
- required:
True
- disabled:
False
- hidden:
False
- default:
False
- label:
Select a GATK GenomicsDB object
- type:
data:genomicsdb
- description:
Instead of creating a new database the GVCFs are added to this database and a new GenomicsDB object is created.
- required:
False
- disabled:
False
- hidden:
!use_existing
- label:
Batch size
- type:
basic:integer
- description:
Batch size controls the number of samples for which readers are open at once and therefore provides a way to minimize memory consumption. However, it can take longer to complete. Use the consolidate flag if more than a hundred batches were used. This will improve feature read time. batchSize=0 means no batching (i.e. readers for all samples will be opened at once).
- required:
True
- disabled:
False
- hidden:
False
- default:
0
- label:
Consolidate
- type:
basic:boolean
- description:
Boolean flag to enable consolidation. If importing data in batches, a new fragment is created for each batch. In case thousands of fragments are created, GenomicsDB feature readers will try to open ~20x as many files. Also, internally GenomicsDB would consume more memory to maintain bookkeeping data from all fragments. Use this flag to merge all fragments into one. Merging can potentially improve read performance, however overall benefit might not be noticeable as the top Java layers have significantly higher overheads. This flag has no effect if only one batch is used.
- required:
True
- disabled:
False
- hidden:
False
- default:
False
- label:
Java maximum heap size in GB (Xmx)
- type:
basic:integer
- description:
Set the maximum Java heap size.
- required:
True
- disabled:
False
- hidden:
False
- default:
28
- label:
Use CMS Garbage Collector in Java
- type:
basic:boolean
- description:
The Concurrent Mark Sweep (CMS) implementation uses multiple garbage collector threads for garbage collection.
- required:
True
- disabled:
False
- hidden:
False
- default:
True
Output results
- label:
GenomicsDB workspace
- type:
basic:dir
- required:
True
- disabled:
False
- hidden:
False
- label:
Intervals file
- type:
basic:file
- required:
True
- disabled:
False
- hidden:
False
- label:
Species
- type:
basic:string
- required:
True
- disabled:
False
- hidden:
False
- label:
Build
- type:
basic:string
- required:
True
- disabled:
False
- hidden:
False
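The relationship between the inputs above and a GATK command line can be sketched as follows. The flags used (`-V`, `-L`, `--genomicsdb-workspace-path`, `--genomicsdb-update-workspace-path`, `--batch-size`, `--consolidate`) are GATK GenomicsDBImport options, but the helper names, the file names, and the exact way the process assembles its command are assumptions.

```python
import math


def count_batches(n_samples, batch_size):
    """Number of import batches; batch_size == 0 means a single batch."""
    if batch_size < 0:
        raise ValueError("batch_size must not be negative")
    if batch_size == 0:
        return 1
    return math.ceil(n_samples / batch_size)


def genomicsdb_import_args(gvcfs, workspace, intervals=None, use_existing=False,
                           batch_size=0, consolidate=False,
                           max_heap_size=28, use_cms_gc=True):
    """Build an argument list for GenomicsDBImport (illustrative only)."""
    java_options = "-Xmx%dg" % max_heap_size
    if use_cms_gc:
        java_options += " -XX:+UseConcMarkSweepGC"
    args = ["gatk", "--java-options", java_options, "GenomicsDBImport"]
    for gvcf in gvcfs:
        args += ["-V", gvcf]
    if use_existing:
        # Add samples to an existing workspace.
        args += ["--genomicsdb-update-workspace-path", workspace]
    else:
        # A new database needs the intervals file.
        if intervals is None:
            raise ValueError("intervals are required when creating a new database")
        args += ["--genomicsdb-workspace-path", workspace, "-L", intervals]
    args += ["--batch-size", str(batch_size)]
    if consolidate:
        args.append("--consolidate")
    return args


n_batches = count_batches(250, 2)
# Consolidation is suggested above once more than a hundred batches are used.
args = genomicsdb_import_args(
    ["a.g.vcf.gz", "b.g.vcf.gz"], "database", intervals="targets.bed",
    batch_size=2, consolidate=n_batches > 100,
)
```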
GATK GenotypeGVCFs
- data:variants:vcf:genotypegvcfs:gatk-genotype-gvcfs (data:genomicsdb database, data:seq:nucleotide ref_seq, data:variants:vcf dbsnp, basic:integer n_jobs, basic:integer max_heap_size)[Source: v2.3.0]
Consolidate GVCFs and run joint calling using GenotypeGVCFs tool.
Input arguments
- label:
GATK GenomicsDB
- type:
data:genomicsdb
- required:
True
- disabled:
False
- hidden:
False
- label:
Reference sequence
- type:
data:seq:nucleotide
- required:
True
- disabled:
False
- hidden:
False
- label:
dbSNP file
- type:
data:variants:vcf
- required:
True
- disabled:
False
- hidden:
False
- label:
Number of concurrent jobs
- type:
basic:integer
- description:
Use a fixed number of jobs for genotyping instead of determining it based on the number of available cores.
- required:
False
- disabled:
False
- hidden:
False
- label:
Java maximum heap size in GB (Xmx)
- type:
basic:integer
- description:
Set the maximum Java heap size.
- required:
True
- disabled:
False
- hidden:
False
- default:
28
Output results
- label:
GVCF file
- type:
basic:file
- required:
True
- disabled:
False
- hidden:
False
- label:
Folder with split GVCFs
- type:
basic:dir
- required:
True
- disabled:
False
- hidden:
False
- label:
Tabix index
- type:
basic:file
- required:
True
- disabled:
False
- hidden:
False
- label:
Species
- type:
basic:string
- required:
True
- disabled:
False
- hidden:
False
- label:
Build
- type:
basic:string
- required:
True
- disabled:
False
- hidden:
False
GATK HaplotypeCaller (GVCF)
- data:variants:gvcf:gatk-haplotypecaller-gvcf (data:alignment:bam bam, data:seq:nucleotide ref_seq, data:bed intervals, basic:decimal contamination)[Source: v1.3.0]
Run GATK HaplotypeCaller in GVCF mode.
Input arguments
- label:
Analysis ready BAM file
- type:
data:alignment:bam
- required:
True
- disabled:
False
- hidden:
False
- label:
Reference sequence
- type:
data:seq:nucleotide
- required:
True
- disabled:
False
- hidden:
False
- label:
Use intervals BED file to limit the analysis to the specified parts of the genome.
- type:
data:bed
- required:
False
- disabled:
False
- hidden:
False
- label:
Contamination fraction
- type:
basic:decimal
- description:
Fraction of contamination in sequencing data (for all samples) to aggressively remove.
- required:
True
- disabled:
False
- hidden:
False
- default:
0
Output results
- label:
GVCF file
- type:
basic:file
- required:
True
- disabled:
False
- hidden:
False
- label:
Tabix index
- type:
basic:file
- required:
True
- disabled:
False
- hidden:
False
- label:
Species
- type:
basic:string
- required:
True
- disabled:
False
- hidden:
False
- label:
Build
- type:
basic:string
- required:
True
- disabled:
False
- hidden:
False
GATK MergeVcfs
- data:variants:vcf:mergevcfs:gatk-merge-vcfs (list:data:variants:vcf vcfs, data:seq:nucleotide ref_seq, basic:integer java_gc_threads, basic:integer max_heap_size)[Source: v1.2.0]
Combine multiple variant files into a single variant file using GATK MergeVcfs.
Input arguments
- label:
Input data (VCFs)
- type:
list:data:variants:vcf
- required:
True
- disabled:
False
- hidden:
False
- label:
Reference sequence
- type:
data:seq:nucleotide
- description:
Optionally use a sequence dictionary file (.dict) if the input VCF does not contain a complete contig list.
- required:
False
- disabled:
False
- hidden:
False
- label:
Java ParallelGCThreads
- type:
basic:integer
- description:
Sets the number of threads used during parallel phases of the garbage collectors.
- required:
True
- disabled:
False
- hidden:
False
- default:
2
- label:
Java maximum heap size (Xmx)
- type:
basic:integer
- description:
Set the maximum Java heap size (in GB).
- required:
True
- disabled:
False
- hidden:
False
- default:
12
Output results
- label:
Merged VCF
- type:
basic:file
- required:
True
- disabled:
False
- hidden:
False
- label:
Tabix index
- type:
basic:file
- required:
True
- disabled:
False
- hidden:
False
- label:
Species
- type:
basic:string
- required:
True
- disabled:
False
- hidden:
False
- label:
Build
- type:
basic:string
- required:
True
- disabled:
False
- hidden:
False
GATK SelectVariants (multi-sample)
- data:variants:vcf:selectvariants:gatk-select-variants (data:variants:vcf vcf, data:bed intervals, list:basic:string select_type, basic:boolean exclude_filtered, data:seq:nucleotide ref_seq, basic:integer java_gc_threads, basic:integer max_heap_size)[Source: v1.2.0]
Select a subset of variants based on various criteria using GATK SelectVariants. This tool works with multi-sample VCF file as an input.
Input arguments
- label:
Input data (VCF)
- type:
data:variants:vcf
- required:
True
- disabled:
False
- hidden:
False
- label:
Intervals file (.bed)
- type:
data:bed
- description:
One or more genomic intervals over which to operate. This can also be used to get data from a specific interval.
- required:
False
- disabled:
False
- hidden:
False
- label:
Select only a certain type of variants from the input file
- type:
list:basic:string
- description:
This argument selects particular kinds of variants out of a list. If left empty, there is no type selection and all variant types are considered for other selection criteria. Valid types are INDEL, SNP, MIXED, MNP, SYMBOLIC, NO_VARIATION. Can be specified multiple times.
- required:
False
- disabled:
False
- hidden:
False
- label:
Don’t include filtered sites
- type:
basic:boolean
- description:
If this flag is enabled, sites that have been marked as filtered (i.e. have anything other than `.` or `PASS` in the FILTER field) will be excluded from the output.
- required:
True
- disabled:
False
- hidden:
False
- default:
False
- label:
Reference sequence
- type:
data:seq:nucleotide
- required:
False
- disabled:
False
- hidden:
False
- label:
Java ParallelGCThreads
- type:
basic:integer
- description:
Sets the number of threads used during parallel phases of the garbage collectors.
- required:
True
- disabled:
False
- hidden:
False
- default:
2
- label:
Java maximum heap size (Xmx)
- type:
basic:integer
- description:
Set the maximum Java heap size (in GB).
- required:
True
- disabled:
False
- hidden:
False
- default:
12
Output results
- label:
Selected variants (VCF)
- type:
basic:file
- required:
True
- disabled:
False
- hidden:
False
- label:
Tabix index
- type:
basic:file
- required:
True
- disabled:
False
- hidden:
False
- label:
Species
- type:
basic:string
- required:
True
- disabled:
False
- hidden:
False
- label:
Build
- type:
basic:string
- required:
True
- disabled:
False
- hidden:
False
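The "Don't include filtered sites" rule above (exclude records whose FILTER field is anything other than `.` or `PASS`) can be sketched in a few lines. This is a simplified stand-in working on raw VCF text lines, not what GATK SelectVariants does internally; the function names and the example records are hypothetical.

```python
def is_filtered(filter_field):
    """True if the FILTER column holds anything other than '.' or 'PASS'."""
    return filter_field not in (".", "PASS")


def select_records(vcf_lines, exclude_filtered=False):
    """Keep header lines; drop filtered records when exclude_filtered is set."""
    kept = []
    for line in vcf_lines:
        if line.startswith("#"):
            kept.append(line)
            continue
        fields = line.rstrip("\n").split("\t")
        # FILTER is the seventh VCF column (index 6).
        if exclude_filtered and is_filtered(fields[6]):
            continue
        kept.append(line)
    return kept


vcf_lines = [
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr1\t100\t.\tG\tA\t50\tPASS\t.",
    "chr1\t200\t.\tC\tT\t12\tLowQual\t.",
    "chr1\t300\t.\tT\tG\t40\t.\t.",
]
selected = select_records(vcf_lines, exclude_filtered=True)
```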
GATK SelectVariants (single-sample)
- data:variants:vcf:selectvariants:single:gatk-select-variants-single (data:variants:vcf vcf, data:bed intervals, list:basic:string select_type, basic:boolean exclude_filtered, data:seq:nucleotide ref_seq, basic:integer java_gc_threads, basic:integer max_heap_size)[Source: v1.1.0]
Select a subset of variants based on various criteria using GATK SelectVariants. This tool works with single-sample VCF file as an input.
Input arguments
- label:
Input data (VCF)
- type:
data:variants:vcf
- required:
True
- disabled:
False
- hidden:
False
- label:
Intervals file (.bed)
- type:
data:bed
- description:
One or more genomic intervals over which to operate. This can also be used to get data from a specific interval.
- required:
False
- disabled:
False
- hidden:
False
- label:
Select only a certain type of variants from the input file
- type:
list:basic:string
- description:
This argument selects particular kinds of variants out of a list. If left empty, there is no type selection and all variant types are considered for other selection criteria. Valid types are INDEL, SNP, MIXED, MNP, SYMBOLIC, NO_VARIATION. Can be specified multiple times.
- required:
False
- disabled:
False
- hidden:
False
- label:
Don’t include filtered sites
- type:
basic:boolean
- description:
If this flag is enabled, sites that have been marked as filtered (i.e. have anything other than `.` or `PASS` in the FILTER field) will be excluded from the output.
- required:
True
- disabled:
False
- hidden:
False
- default:
False
- label:
Reference sequence
- type:
data:seq:nucleotide
- required:
False
- disabled:
False
- hidden:
False
- label:
Java ParallelGCThreads
- type:
basic:integer
- description:
Sets the number of threads used during parallel phases of the garbage collectors.
- required:
True
- disabled:
False
- hidden:
False
- default:
2
- label:
Java maximum heap size (Xmx)
- type:
basic:integer
- description:
Set the maximum Java heap size (in GB).
- required:
True
- disabled:
False
- hidden:
False
- default:
12
Output results
- label:
Selected variants (VCF)
- type:
basic:file
- required:
True
- disabled:
False
- hidden:
False
- label:
Tabix index
- type:
basic:file
- required:
True
- disabled:
False
- hidden:
False
- label:
Species
- type:
basic:string
- required:
True
- disabled:
False
- hidden:
False
- label:
Build
- type:
basic:string
- required:
True
- disabled:
False
- hidden:
False
GATK SplitNCigarReads
- data:alignment:bam:splitncigar:gatk-split-ncigar (data:alignment:bam bam, data:seq:nucleotide ref_seq, basic:integer java_gc_threads, basic:integer max_heap_size)[Source: v1.2.0]
Splits reads that contain Ns in their CIGAR string. Identifies all N CIGAR elements and creates k+1 new reads (where k is the number of N CIGAR elements). The first read includes the bases that are to the left of the first N element, while the part of the read that is to the right of the N (including the Ns) is hard clipped, and so on for the rest of the new reads. Used for post-processing RNA reads aligned against the full reference.
Input arguments
- label:
Alignment BAM file
- type:
data:alignment:bam
- required:
True
- disabled:
False
- hidden:
False
- label:
Reference sequence FASTA file
- type:
data:seq:nucleotide
- required:
True
- disabled:
False
- hidden:
False
- label:
Java ParallelGCThreads
- type:
basic:integer
- description:
Sets the number of threads used during parallel phases of the garbage collectors.
- required:
True
- disabled:
False
- hidden:
False
- default:
2
- label:
Java maximum heap size (Xmx)
- type:
basic:integer
- description:
Set the maximum Java heap size (in GB).
- required:
True
- disabled:
False
- hidden:
False
- default:
12
Output results
- label:
BAM file with reads split at N CIGAR elements
- type:
basic:file
- required:
True
- disabled:
False
- hidden:
False
- label:
Index of BAM file
- type:
basic:file
- required:
True
- disabled:
False
- hidden:
False
- label:
Alignment statistics
- type:
basic:file
- required:
True
- disabled:
False
- hidden:
False
- label:
Species
- type:
basic:string
- required:
True
- disabled:
False
- hidden:
False
- label:
Build
- type:
basic:string
- required:
True
- disabled:
False
- hidden:
False
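The k+1 splitting described for this process can be illustrated on a CIGAR string alone. The sketch below is a simplification under stated assumptions: it returns the reference start and the CIGAR segment of each new read but does not emit the hard clips that the real tool adds, and the function name `split_at_n` is hypothetical.

```python
import re

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
REF_CONSUMING = set("MDN=X")  # operations that advance the reference position


def split_at_n(cigar, start):
    """Split an alignment at its N elements.

    Returns a list of (reference_start, cigar_segment) pairs, one per
    stretch between N elements, so k N elements give k + 1 segments.
    """
    segments = []
    current = []
    seg_start = start
    pos = start
    for length, op in CIGAR_OP.findall(cigar):
        length = int(length)
        if op == "N":
            # Close the current segment and skip over the N gap.
            segments.append((seg_start, "".join(current)))
            current = []
            pos += length
            seg_start = pos
        else:
            current.append("%d%s" % (length, op))
            if op in REF_CONSUMING:
                pos += length
    segments.append((seg_start, "".join(current)))
    return segments


# A spliced read with two N elements yields three segments.
segments = split_at_n("10M100N15M200N5M", start=1000)
```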
GATK VariantFiltration (multi-sample)
- data:variants:vcf:variantfiltration:gatk-variant-filtration (data:variants:vcf vcf, data:seq:nucleotide ref_seq, list:basic:string filter_expressions, list:basic:string filter_name, list:basic:string genotype_filter_expressions, list:basic:string genotype_filter_name, data:variants:vcf mask, basic:string mask_name, basic:integer cluster, basic:integer window, basic:integer java_gc_threads, basic:integer max_heap_size)[Source: v1.3.0]
Filter multi-sample variant calls based on INFO and/or FORMAT annotations. This tool is designed for hard-filtering variant calls based on certain criteria. Records are hard-filtered by changing the value in the FILTER field to something other than PASS. Passing variants are annotated as PASS and failing variants are annotated with the name(s) of the filter(s) they failed. If you want to remove failing variants, use GATK SelectVariants process.
Input arguments
- label:
Input data (VCF)
- type:
data:variants:vcf
- required:
True
- disabled:
False
- hidden:
False
- label:
Reference sequence
- type:
data:seq:nucleotide
- required:
True
- disabled:
False
- hidden:
False
- label:
Expressions used with INFO fields to filter
- type:
list:basic:string
- description:
VariantFiltration accepts any number of JEXL expressions (so you can have two named filters by using --filter-name One --filter-expression 'X < 1' --filter-name Two --filter-expression 'X > 2'). It is preferable to use multiple expressions, each specifying an individual filter criterion, rather than a single compound expression that specifies multiple filter criteria. Input expressions one by one and press ENTER after each expression. Examples of filter expressions: ‘FS > 30’, ‘DP > 10’.
- required:
False
- disabled:
False
- hidden:
False
- label:
Names to use for the list of filters
- type:
list:basic:string
- description:
This name is put in the FILTER field for variants that get filtered. Note that there must be a 1-to-1 mapping between filter expressions and filter names. Input names one by one and press ENTER after each name. Warning: filter names should be in the same order as filter expressions. Example: if you specified filter expressions ‘FS > 30’ and ‘DP > 10’, now specify filter names ‘FS’ and ‘DP’.
- required:
False
- disabled:
False
- hidden:
False
- label:
Expressions used with FORMAT field to filter
- type:
list:basic:string
- description:
Similar to the INFO field based expressions, but used on the FORMAT (genotype) fields instead. VariantFiltration will add the sample-level FT tag to the FORMAT field of filtered samples (this does not affect the record’s FILTER tag). One can filter normally based on most fields (e.g. ‘GQ < 5.0’), but the GT (genotype) field is an exception. We have put in convenience methods so that one can now filter out hets (‘isHet == 1’), refs (‘isHomRef == 1’), or homs (‘isHomVar == 1’). Also available are expressions isCalled, isNoCall, isMixed, and isAvailable, in accordance with the methods of the Genotype object. To filter by alternative allele depth, use the expression: ‘AD.1 < 5’. This filter expression will filter all the samples in the multi-sample VCF file.
- required:
False
- disabled:
False
- hidden:
False
- label:
Names to use for the list of genotype filters
- type:
list:basic:string
- description:
Similar to the INFO field based expressions, but used on the FORMAT (genotype) fields instead. Warning: filter names should be in the same order as filter expressions.
- required:
False
- disabled:
False
- hidden:
False
- label:
Input mask
- type:
data:variants:vcf
- description:
Any variant which overlaps entries from the provided mask file will be filtered.
- required:
False
- disabled:
False
- hidden:
False
- label:
The text to put in the FILTER field if a ‘mask’ is provided
- type:
basic:string
- description:
When using the mask file, the mask name will be annotated in the variant record.
- required:
False
- disabled:
!mask
- hidden:
False
- label:
Cluster size
- type:
basic:integer
- description:
The number of SNPs which make up a cluster. Must be at least 2.
- required:
True
- disabled:
False
- hidden:
False
- default:
3
- label:
Window size
- type:
basic:integer
- description:
The window size (in bases) in which to evaluate clustered SNPs.
- required:
True
- disabled:
False
- hidden:
False
- default:
0
- label:
Java ParallelGCThreads
- type:
basic:integer
- description:
Sets the number of threads used during parallel phases of the garbage collectors.
- required:
True
- disabled:
False
- hidden:
False
- default:
2
- label:
Java maximum heap size (Xmx)
- type:
basic:integer
- description:
Set the maximum Java heap size (in GB).
- required:
True
- disabled:
False
- hidden:
False
- default:
12
Output results
- label:
Filtered variants (VCF)
- type:
basic:file
- required:
True
- disabled:
False
- hidden:
False
- label:
Tabix index
- type:
basic:file
- required:
True
- disabled:
False
- hidden:
False
- label:
Species
- type:
basic:string
- required:
True
- disabled:
False
- hidden:
False
- label:
Build
- type:
basic:string
- required:
True
- disabled:
False
- hidden:
False
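The 1-to-1 mapping between filter expressions and filter names described above can be sketched as a small helper that pairs them up in order. The flag names are GATK VariantFiltration options; the helper `filter_args` is hypothetical, and the process may assemble its command differently.

```python
def filter_args(expressions, names, genotype=False):
    """Pair each filter expression with its name, in order.

    With genotype=True the FORMAT-level (genotype) flags are used instead
    of the INFO-level ones.
    """
    if len(expressions) != len(names):
        raise ValueError("each filter expression needs exactly one filter name")
    expr_flag = "--genotype-filter-expression" if genotype else "--filter-expression"
    name_flag = "--genotype-filter-name" if genotype else "--filter-name"
    args = []
    for expression, name in zip(expressions, names):
        args += [name_flag, name, expr_flag, expression]
    return args


# Expressions and names taken from the examples in the field descriptions.
info_args = filter_args(["FS > 30", "DP > 10"], ["FS", "DP"])
genotype_args = filter_args(["GQ < 5.0"], ["LowGQ"], genotype=True)  # "LowGQ" is a made-up name
```

Passing lists of unequal length raises an error, which mirrors the warning that names must be given in the same order as the expressions.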
GATK VariantFiltration (single-sample)
- data:variants:vcf:variantfiltration:single:gatk-variant-filtration-single (data:variants:vcf vcf, data:seq:nucleotide ref_seq, list:basic:string filter_expressions, list:basic:string filter_name, list:basic:string genotype_filter_expressions, list:basic:string genotype_filter_name, data:variants:vcf mask, basic:string mask_name, basic:integer cluster, basic:integer window, basic:integer java_gc_threads, basic:integer max_heap_size)[Source: v1.3.0]
Filter single-sample variant calls based on INFO and/or FORMAT annotations. This tool is designed for hard-filtering variant calls based on certain criteria. Records are hard-filtered by changing the value in the FILTER field to something other than PASS. Passing variants are annotated as PASS and failing variants are annotated with the name(s) of the filter(s) they failed. If you want to remove failing variants, use GATK SelectVariants process.
Input arguments
- label:
Input data (VCF)
- type:
data:variants:vcf
- required:
True
- disabled:
False
- hidden:
False
- label:
Reference sequence
- type:
data:seq:nucleotide
- required:
True
- disabled:
False
- hidden:
False
- label:
Expressions used with INFO fields to filter
- type:
list:basic:string
- description:
VariantFiltration accepts any number of JEXL expressions (so you can have two named filters by using --filter-name One --filter-expression 'X < 1' --filter-name Two --filter-expression 'X > 2'). It is preferable to use multiple expressions, each specifying an individual filter criterion, rather than a single compound expression that specifies multiple filter criteria. Input expressions one by one and press ENTER after each expression. Examples of filter expressions: ‘FS > 30’, ‘DP > 10’.
- required:
False
- disabled:
False
- hidden:
False
- label:
Names to use for the list of filters
- type:
list:basic:string
- description:
This name is put in the FILTER field for variants that get filtered. Note that there must be a 1-to-1 mapping between filter expressions and filter names. Input names one by one and press ENTER after each name. Warning: filter names should be in the same order as filter expressions. Example: if you specified filter expressions ‘FS > 30’ and ‘DP > 10’, now specify filter names ‘FS’ and ‘DP’.
- required:
False
- disabled:
False
- hidden:
False
- label:
Expressions used with FORMAT field to filter
- type:
list:basic:string
- description:
Similar to the INFO field based expressions, but used on the FORMAT (genotype) fields instead. VariantFiltration will add the sample-level FT tag to the FORMAT field of filtered samples (this does not affect the record’s FILTER tag). One can filter normally based on most fields (e.g. ‘GQ < 5.0’), but the GT (genotype) field is an exception. We have put in convenience methods so that one can now filter out hets (‘isHet == 1’), refs (‘isHomRef == 1’), or homs (‘isHomVar == 1’). Also available are expressions isCalled, isNoCall, isMixed, and isAvailable, in accordance with the methods of the Genotype object. To filter by alternative allele depth, use the expression: ‘AD.1 < 5’.
- required:
False
- disabled:
False
- hidden:
False
- label:
Names to use for the list of genotype filters
- type:
list:basic:string
- description:
Similar to the INFO field based expressions, but used on the FORMAT (genotype) fields instead. Warning: filter names should be in the same order as filter expressions.
- required:
False
- disabled:
False
- hidden:
False
- label:
Input mask
- type:
data:variants:vcf
- description:
Any variant which overlaps entries from the provided mask file will be filtered.
- required:
False
- disabled:
False
- hidden:
False
- label:
The text to put in the FILTER field if a ‘mask’ is provided
- type:
basic:string
- description:
When using the mask file, the mask name will be annotated in the variant record.
- required:
False
- disabled:
!mask
- hidden:
False
- label:
Cluster size
- type:
basic:integer
- description:
The number of SNPs which make up a cluster. Must be at least 2.
- required:
True
- disabled:
False
- hidden:
False
- default:
3
- label:
Window size
- type:
basic:integer
- description:
The window size (in bases) in which to evaluate clustered SNPs.
- required:
True
- disabled:
False
- hidden:
False
- default:
0
- label:
Java ParallelGCThreads
- type:
basic:integer
- description:
Sets the number of threads used during parallel phases of the garbage collectors.
- required:
True
- disabled:
False
- hidden:
False
- default:
2
- label:
Java maximum heap size (Xmx)
- type:
basic:integer
- description:
Set the maximum Java heap size (in GB).
- required:
True
- disabled:
False
- hidden:
False
- default:
12
Output results
- label:
Filtered variants (VCF)
- type:
basic:file
- required:
True
- disabled:
False
- hidden:
False
- label:
Tabix index
- type:
basic:file
- required:
True
- disabled:
False
- hidden:
False
- label:
Species
- type:
basic:string
- required:
True
- disabled:
False
- hidden:
False
- label:
Build
- type:
basic:string
- required:
True
- disabled:
False
- hidden:
False
GATK VariantsToTable
- data:variantstable:variants-to-table (data:variants:vcf vcf, list:basic:string vcf_fields, list:basic:string gf_fields, basic:boolean split_alleles)[Source: v1.2.0]
Run GATK VariantsToTable. This tool extracts specified fields for each variant in a VCF file to a tab-delimited table, which may be easier to work with than a VCF. For additional information, please see the [manual page](https://gatk.broadinstitute.org/hc/en-us/articles/360036711531-VariantsToTable).
Input arguments
- label:
Input VCF file
- type:
data:variants:vcf
- required:
True
- disabled:
False
- hidden:
False
- label:
Select VCF fields
- type:
list:basic:string
- description:
The name of a standard VCF field or an INFO field to include in the output table. The field can be any standard VCF column (e.g. CHROM, ID, QUAL) or any annotation name in the INFO field (e.g. AC, AF).
- required:
True
- disabled:
False
- hidden:
False
- default:
['CHROM', 'POS', 'ID', 'REF', 'ALT']
- label:
Include FORMAT/sample-level fields
- type:
list:basic:string
- required:
True
- disabled:
False
- hidden:
False
- default:
['GT', 'GQ']
- label:
Split multi-allelic records into multiple lines
- type:
basic:boolean
- description:
By default, a variant record with multiple ALT alleles will be summarized in one line, with per alt-allele fields (e.g. allele depth) separated by commas. This may cause difficulty when the table is loaded by an R script, for example. Use this flag to write multi-allelic records on separate lines of output.
- required:
True
- disabled:
False
- hidden:
False
- default:
True
Output results
- label:
Tab-delimited file with variants
- type:
basic:file
- required:
True
- disabled:
False
- hidden:
False
- label:
Species
- type:
basic:string
- required:
True
- disabled:
False
- hidden:
False
- label:
Build
- type:
basic:string
- required:
True
- disabled:
False
- hidden:
False
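As a small illustration of why comma-separated multi-allelic fields can be awkward downstream, the sketch below loads a tab-delimited table with the default site-level columns and flags rows that still carry several ALT alleles. The table contents are made up, and the helper name `read_variant_table` is hypothetical.

```python
import csv
import io

# Made-up table in the shape produced for the default VCF fields,
# written without splitting multi-allelic records.
table_text = (
    "CHROM\tPOS\tID\tREF\tALT\n"
    "chr1\t100\t.\tG\tA\n"
    "chr1\t250\t.\tC\tA,T\n"
)


def read_variant_table(handle):
    """Read a tab-delimited variant table into a list of dictionaries."""
    return list(csv.DictReader(handle, delimiter="\t"))


rows = read_variant_table(io.StringIO(table_text))
# Rows with a comma in ALT describe more than one alternative allele.
multi_allelic = [row for row in rows if "," in row["ALT"]]
```

With the split option enabled, such a row would instead be written as separate lines, one per ALT allele.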
GATK filter variants (VQSR)
- data:variants:vcf:vqsr:gatk-vqsr (data:variants:vcf vcf, data:variants:vcf dbsnp, data:variants:vcf mills, data:variants:vcf axiom_poly, data:variants:vcf hapmap, data:variants:vcf omni, data:variants:vcf thousand_genomes, basic:boolean use_as_anno, list:basic:string indel_anno_fields, list:basic:string snp_anno_fields, basic:decimal indel_filter_level, basic:decimal snp_filter_level, basic:integer max_gaussians_indels, basic:integer max_gaussians_snps)[Source: v1.2.0]
Filter WGS variants using Variant Quality Score Recalibration (VQSR) procedure.
Input arguments
- label:
Input data (VCF)
- type:
data:variants:vcf
- required:
True
- disabled:
False
- hidden:
False
- label:
dbSNP file
- type:
data:variants:vcf
- required:
True
- disabled:
False
- hidden:
False
- label:
Mills and 1000G gold standard indels
- type:
data:variants:vcf
- required:
False
- disabled:
False
- hidden:
False
- label:
1000G Axiom genotype data
- type:
data:variants:vcf
- required:
False
- disabled:
False
- hidden:
False
- label:
HapMap variants
- type:
data:variants:vcf
- required:
False
- disabled:
False
- hidden:
False
- label:
1000G Omni variants
- type:
data:variants:vcf
- required:
False
- disabled:
False
- hidden:
False
- label:
1000G high confidence SNPs
- type:
data:variants:vcf
- required:
False
- disabled:
False
- hidden:
False
- label:
--use-allele-specific-annotations
- type:
basic:boolean
- required:
True
- disabled:
False
- hidden:
False
- default:
False
- label:
Annotation fields (INDEL filtering)
- type:
list:basic:string
- required:
True
- disabled:
False
- hidden:
False
- default:
['FS', 'ReadPosRankSum', 'MQRankSum', 'QD', 'SOR', 'DP']
- label:
Annotation fields (SNP filtering)
- type:
list:basic:string
- required:
True
- disabled:
False
- hidden:
False
- default:
['QD', 'MQRankSum', 'ReadPosRankSum', 'FS', 'MQ', 'SOR', 'DP']
- label:
--truth-sensitivity-filter-level (INDELs)
- type:
basic:decimal
- required:
True
- disabled:
False
- hidden:
False
- default:
99.0
- label:
--truth-sensitivity-filter-level (SNPs)
- type:
basic:decimal
- required:
True
- disabled:
False
- hidden:
False
- default:
99.7
- label:
--max-gaussians (INDELs)
- type:
basic:integer
- description:
This parameter determines the maximum number of Gaussians that should be used when building a positive model using the variational Bayes algorithm. This parameter sets the expected number of clusters in modeling. If a dataset gives fewer distinct clusters, as can happen for smaller datasets, the tool reports that there is insufficient data with a "No data found" error message. In this case, try decrementing the --max-gaussians value.
- required:
True
- disabled:
False
- hidden:
False
- default:
4
- label:
--max-gaussians (SNPs)
- type:
basic:integer
- description:
This parameter determines the maximum number of Gaussians that should be used when building a positive model using the variational Bayes algorithm. This parameter sets the expected number of clusters in modeling. If a dataset gives fewer distinct clusters, as can happen for smaller datasets, the tool reports that there is insufficient data with a "No data found" error message. In this case, try decrementing the --max-gaussians value.
- required:
True
- disabled:
False
- hidden:
False
- default:
6
Output results
- label:
GVCF file
- type:
basic:file
- required:
True
- disabled:
False
- hidden:
False
- label:
Tabix index
- type:
basic:file
- required:
True
- disabled:
False
- hidden:
False
- label:
Species
- type:
basic:string
- required:
True
- disabled:
False
- hidden:
False
- label:
Build
- type:
basic:string
- required:
True
- disabled:
False
- hidden:
False
GATK refine variants
- data:variants:vcf:refinevariants:gatk-refine-variants (data:variants:vcf vcf, data:seq:nucleotide ref_seq, data:variants:vcf vcf_pop)[Source: v1.1.1]
Run GATK Genotype Refinement. The goal of the Genotype Refinement workflow is to use additional data to improve the accuracy of genotype calls and to filter genotype calls that are not reliable enough for downstream analysis. In this sense it serves as an optional extension of the variant calling workflow, intended for researchers whose work requires high-quality identification of individual genotypes. For additional information, see the [manual page](https://gatk.broadinstitute.org/hc/en-us/articles/360035531432-Genotype-Refinement-workflow-for-germline-short-variants).
Input arguments
- label:
The main input, as produced in the GATK VQSR process
- type:
data:variants:vcf
- required:
True
- disabled:
False
- hidden:
False
- label:
Reference sequence
- type:
data:seq:nucleotide
- required:
True
- disabled:
False
- hidden:
False
- label:
Population-level variant set (VCF)
- type:
data:variants:vcf
- required:
False
- disabled:
False
- hidden:
False
Output results
- label:
Refined multi-sample vcf
- type:
basic:file
- required:
True
- disabled:
False
- hidden:
False
- label:
Tabix index
- type:
basic:file
- required:
True
- disabled:
False
- hidden:
False
- label:
Species
- type:
basic:string
- required:
True
- disabled:
False
- hidden:
False
- label:
Build
- type:
basic:string
- required:
True
- disabled:
False
- hidden:
False
GATK4 (HaplotypeCaller)
- data:variants:vcf:gatk:hc:vc-gatk4-hc (data:alignment:bam alignment, data:seq:nucleotide genome, data:bed intervals_bed, data:variants:vcf dbsnp, basic:integer stand_call_conf, basic:integer mbq, basic:integer max_reads, basic:integer interval_padding, basic:boolean soft_clipped, basic:integer java_gc_threads, basic:integer max_heap_size)[Source: v1.5.0]
GATK HaplotypeCaller Variant Calling. Call germline SNPs and indels via local re-assembly of haplotypes. The HaplotypeCaller is capable of calling SNPs and indels simultaneously via local de-novo assembly of haplotypes in an active region. In other words, whenever the program encounters a region showing signs of variation, it discards the existing mapping information and completely reassembles the reads in that region. This allows the HaplotypeCaller to be more accurate when calling regions that are traditionally difficult to call, for example when they contain different types of variants close to each other. It also makes the HaplotypeCaller much better at calling indels than position-based callers like UnifiedGenotyper.
Input arguments
- label:
Analysis ready BAM file
- type:
data:alignment:bam
- required:
True
- disabled:
False
- hidden:
False
- label:
Reference genome
- type:
data:seq:nucleotide
- required:
True
- disabled:
False
- hidden:
False
- label:
Intervals (from BED file)
- type:
data:bed
- description:
Use this option to perform the analysis over only part of the genome.
- required:
False
- disabled:
False
- hidden:
False
- label:
dbSNP file
- type:
data:variants:vcf
- description:
Database of known polymorphic sites.
- required:
True
- disabled:
False
- hidden:
False
- label:
Min call confidence threshold
- type:
basic:integer
- description:
The minimum phred-scaled confidence threshold at which variants should be called.
- required:
True
- disabled:
False
- hidden:
False
- default:
30
- label:
Min Base Quality
- type:
basic:integer
- description:
Minimum base quality required to consider a base for calling.
- required:
True
- disabled:
False
- hidden:
False
- default:
20
- label:
Max reads per alignment start site
- type:
basic:integer
- description:
Maximum number of reads to retain per alignment start position. Reads above this threshold will be downsampled. Set to 0 to disable.
- required:
True
- disabled:
False
- hidden:
False
- default:
50
- label:
Interval padding
- type:
basic:integer
- description:
Amount of padding (in bp) to add to each interval you are including. The recommended value is 100.
- required:
False
- disabled:
False
- hidden:
!intervals_bed
- label:
Do not analyze soft clipped bases in the reads
- type:
basic:boolean
- description:
Suitable option for RNA-seq variant calling.
- required:
True
- disabled:
False
- hidden:
False
- default:
False
- label:
Java ParallelGCThreads
- type:
basic:integer
- description:
Sets the number of threads used during parallel phases of the garbage collectors.
- required:
True
- disabled:
False
- hidden:
False
- default:
2
- label:
Java maximum heap size (Xmx)
- type:
basic:integer
- description:
Set the maximum Java heap size (in GB).
- required:
True
- disabled:
False
- hidden:
False
- default:
12
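To make the relationship between the inputs above and the underlying tool more concrete, here is a minimal sketch that assembles a GATK4 HaplotypeCaller command line from them. The option names are standard GATK4 arguments, but the mapping from process inputs to options is an assumption made for illustration, and the function name and file names are hypothetical; the process itself may build its command differently.

```python
from typing import List, Optional


def haplotypecaller_command(
    bam: str,
    reference: str,
    dbsnp: str,
    output: str,
    intervals_bed: Optional[str] = None,
    stand_call_conf: int = 30,
    mbq: int = 20,
    max_reads: int = 50,
    interval_padding: Optional[int] = None,
    soft_clipped: bool = False,
    java_gc_threads: int = 2,
    max_heap_size: int = 12,
) -> List[str]:
    """Build an argument list; defaults mirror the documented input defaults."""
    java_options = f"-XX:ParallelGCThreads={java_gc_threads} -Xmx{max_heap_size}g"
    cmd = [
        "gatk", "--java-options", java_options, "HaplotypeCaller",
        "-R", reference,
        "-I", bam,
        "-O", output,
        "--dbsnp", dbsnp,
        "--standard-min-confidence-threshold-for-calling", str(stand_call_conf),
        "--min-base-quality-score", str(mbq),
        "--max-reads-per-alignment-start", str(max_reads),
    ]
    if intervals_bed:
        cmd += ["-L", intervals_bed]
        # Padding only makes sense when intervals are given (assumed here to
        # match the hidden condition `!intervals_bed` on the padding input).
        if interval_padding is not None:
            cmd += ["--interval-padding", str(interval_padding)]
    if soft_clipped:
        cmd.append("--dont-use-soft-clipped-bases")
    return cmd
```

Returning a list rather than a single string keeps the Java options as one argument and avoids shell quoting issues if the list is later passed to `subprocess.run`.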
Output results
- label:
VCF file
- type:
basic:file
- required:
True
- disabled:
False
- hidden:
False
- label:
Tabix index
- type:
basic:file
- required:
True
- disabled:
False
- hidden:
False
- label:
Species
- type:
basic:string
- required:
True
- disabled:
False
- hidden:
False
- label:
Build
- type:
basic:string
- required:
True
- disabled:
False
- hidden:
False
GEO import
- data:geo:geo-import (basic:string gse_accession, basic:boolean prefetch, basic:string max_size_prefetch, basic:integer min_spot_id, basic:integer max_spot_id, basic:integer min_read_len, basic:boolean clip, basic:boolean aligned, basic:boolean unaligned, basic:file mapping_file, basic:string source, basic:string build)[Source: v2.7.2]
Import all runs from a GEO Series. WARNING: Additional costs for storage and processing may be incurred if a very large data set is selected. RNA-Seq, ChIP-Seq, ATAC-Seq and expression microarray datasets can be uploaded. For RNA-Seq data sets, this runs the SRA import process for each experiment (SRX) from the selected RNA-Seq GEO Series. The same procedure is followed for ChIP-Seq and ATAC-Seq data sets. If the GSE contains microarray data, the process downloads individual samples and uploads them as microarray expression objects. Probe IDs can be mapped to Ensembl IDs if the corresponding GPL platform is supported; otherwise, a custom mapping file should be provided. Currently supported platforms are: GPL74, GPL201, GPL96, GPL571, GPL97, GPL570, GPL91, GPL8300, GPL92, GPL93, GPL94, GPL95, GPL17586, GPL5175, GPL80, GPL6244, GPL16686, GPL15207, GPL1352, GPL11068, GPL26966, GPL6848, GPL14550, GPL17077, GPL16981, GPL13497, GPL6947, GPL10558, GPL6883, GPL13376, GPL6884, GPL6254. In addition, a metadata table with sample information is created and uploaded to the same collection.
Input arguments
- label:
GEO accession
- type:
basic:string
- description:
Enter a GEO series accession number.
- required:
True
- disabled:
False
- hidden:
False
- label:
Prefetch SRA file
- type:
basic:boolean
- required:
True
- disabled:
False
- hidden:
False
- default:
True
- label:
Maximum file size to download in KB
- type:
basic:string
- description:
A unit prefix can be used instead of a value in KB (e.g. 1024M or 1G).
- required:
True
- disabled:
False
- hidden:
False
- default:
20G
- label:
Minimum spot ID
- type:
basic:integer
- required:
False
- disabled:
False
- hidden:
False
- label:
Maximum spot ID
- type:
basic:integer
- required:
False
- disabled:
False
- hidden:
False
- label:
Minimum read length
- type:
basic:integer
- required:
False
- disabled:
False
- hidden:
False
- label:
Clip adapter sequences
- type:
basic:boolean
- required:
True
- disabled:
False
- hidden:
False
- default:
False
- label:
Dump only aligned sequences
- type:
basic:boolean
- required:
True
- disabled:
False
- hidden:
False
- default:
False
- label:
Dump only unaligned sequences
- type:
basic:boolean
- required:
True
- disabled:
False
- hidden:
False
- default:
False
- label:
File with probe ID mappings
- type:
basic:file
- description:
The file should be tab-separated and contain two named columns. The first column should contain gene IDs and the second one should contain probe names. Supported file extensions are .tab.*, .tsv.* and .txt.*.
- required:
False
- disabled:
False
- hidden:
False
- label:
Gene ID source
- type:
basic:string
- description:
Gene ID source used for probe mapping is required when using a custom file.
- required:
False
- disabled:
False
- hidden:
False
- choices:
AFFY:
AFFY
DICTYBASE:
DICTYBASE
ENSEMBL:
ENSEMBL
NCBI:
NCBI
UCSC:
UCSC
- label:
Genome build
- type:
basic:string
- description:
Genome build of mapping file is required when using a custom file.
- required:
False
- disabled:
False
- hidden:
False
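The "Maximum file size to download in KB" input above takes either a plain number of kilobytes or a value with a unit suffix (default 20G). A minimal sketch of how such a value could be normalized to kilobytes follows. Treating the suffixes as binary multiples is an assumption suggested by the "1024M or 1G" example in the description; the function name is hypothetical and the actual parsing is done by the underlying SRA tooling, not by this code.

```python
import re

# Multipliers relative to one kilobyte (binary multiples: an assumption).
_UNITS_IN_KB = {"": 1, "K": 1, "M": 1024, "G": 1024 ** 2, "T": 1024 ** 3}


def size_to_kb(value: str) -> int:
    """Convert a size such as '500', '1024M' or '20G' to kilobytes."""
    match = re.fullmatch(r"(\d+)([KMGT]?)", value.strip().upper())
    if match is None:
        raise ValueError(f"unrecognized size: {value!r}")
    number, unit = match.groups()
    return int(number) * _UNITS_IN_KB[unit]
```

Under this assumption `size_to_kb("1024M")` and `size_to_kb("1G")` give the same result, which matches the equivalence implied by the description.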
Output results
GFF3 file
- data:annotation:gff3upload-gff3 (basic:file src, basic:string source, basic:string species, basic:string build)[Source: v3.5.0]
Import a General Feature Format (GFF) file, a format used for describing genes and other features of DNA, RNA and protein sequences. See [here](https://useast.ensembl.org/info/website/upload/gff3.html) and [here](https://en.wikipedia.org/wiki/General_feature_format) for more information.
Input arguments
- label:
Annotation (GFF3)
- type:
basic:file
- description:
Annotation in GFF3 format. Supported extensions are: .gff, .gff3 and .gtf
- validate_regex:
\.(gff|gff3|gtf)(|\.gz|\.bz2|\.tgz|\.tar\.gz|\.tar\.bz2|\.zip|\.rar|\.7z)$
- label:
Gene ID database
- type:
basic:string
- choices:
DICTYBASE:
DICTYBASE
ENSEMBL:
ENSEMBL
NCBI:
NCBI
UCSC:
UCSC
- label:
Species
- type:
basic:string
- description:
Species Latin name.
- choices:
Homo sapiens:
Homo sapiens
Mus musculus:
Mus musculus
Rattus norvegicus:
Rattus norvegicus
Dictyostelium discoideum:
Dictyostelium discoideum
- label:
Build
- type:
basic:string
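The `validate_regex` of the annotation input above decides which file names are accepted. The short sketch below applies that exact pattern to a few names. Using `re.search` is an assumption about how the platform evaluates the pattern (it is anchored only at the end of the name), and `is_accepted` is a hypothetical helper.

```python
import re

# Pattern copied verbatim from the validate_regex field above.
GFF3_NAME = re.compile(
    r"\.(gff|gff3|gtf)(|\.gz|\.bz2|\.tgz|\.tar\.gz|\.tar\.bz2|\.zip|\.rar|\.7z)$"
)


def is_accepted(filename: str) -> bool:
    """Return True if the file name ends with a supported extension."""
    return GFF3_NAME.search(filename) is not None


for name in ["genes.gff3", "genes.gff3.gz", "genes.gtf.tar.bz2", "genes.bed", "genes.GFF3"]:
    print(name, is_accepted(name))
```

Note that the pattern is case-sensitive, so an upper-case extension such as `.GFF3` is not matched.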
Output results
- label:
Uploaded GFF3 file
- type:
basic:file
- label:
Sorted GFF3 file
- type:
basic:file
- label:
IGV index for sorted GFF3
- type:
basic:file
- label:
Jbrowse track for sorted GFF3
- type:
basic:file
- label:
Gene ID database
- type:
basic:string
- label:
Species
- type:
basic:string
- label:
Build
- type:
basic:string
GTF file
- data:annotation:gtfupload-gtf (basic:file src, basic:string source, basic:string species, basic:string build)[Source: v3.5.0]
Import a Gene Transfer Format (GTF) file, a tab-delimited text format used to hold information about gene structure. It is based on the General Feature Format (GFF) but contains some additional conventions specific to gene information. See [here](https://en.wikipedia.org/wiki/General_feature_format) for differences between GFF and GTF files.
Input arguments
- label:
Annotation (GTF)
- type:
basic:file
- description:
Annotation in GTF format.
- validate_regex:
\.(gtf|gff)(|\.gz|\.bz2|\.tgz|\.tar\.gz|\.tar\.bz2|\.zip|\.rar|\.7z)$
- label:
Gene ID database
- type:
basic:string
- choices:
DICTYBASE:
DICTYBASE
ENSEMBL:
ENSEMBL
NCBI:
NCBI
UCSC:
UCSC
- label:
Species
- type:
basic:string
- description:
Species Latin name.
- choices:
Homo sapiens:
Homo sapiens
Mus musculus:
Mus musculus
Rattus norvegicus:
Rattus norvegicus
Dictyostelium discoideum:
Dictyostelium discoideum
- label:
Build
- type:
basic:string
Output results
- label:
Uploaded GTF file
- type:
basic:file
- label:
Sorted GTF file
- type:
basic:file
- label:
IGV index for sorted GTF file
- type:
basic:file
- required:
False
- label:
Jbrowse track for sorted GTF
- type:
basic:file
- required:
False
- label:
Gene ID database
- type:
basic:string
- label:
Species
- type:
basic:string
- label:
Build
- type:
basic:string
Gene set
- data:geneset:upload-geneset (basic:file src, basic:string source, basic:string species)[Source: v1.3.2]
Upload a set of genes. Provide one gene ID per line in a .tab, .tab.gz, or .txt file.
Input arguments
- label:
Gene set
- type:
basic:file
- description:
List of genes (.tab/.txt extension), one gene ID per line.
- required:
True
- disabled:
False
- hidden:
False
- label:
Gene ID source
- type:
basic:string
- required:
True
- disabled:
False
- hidden:
False
- choices:
AFFY:
AFFY
DICTYBASE:
DICTYBASE
ENSEMBL:
ENSEMBL
NCBI:
NCBI
UCSC:
UCSC
- label:
Species
- type:
basic:string
- description:
Species Latin name.
- required:
True
- disabled:
False
- hidden:
False
- choices:
Homo sapiens:
Homo sapiens
Mus musculus:
Mus musculus
Rattus norvegicus:
Rattus norvegicus
Dictyostelium discoideum:
Dictyostelium discoideum
Odocoileus virginianus texanus:
Odocoileus virginianus texanus
Solanum tuberosum:
Solanum tuberosum
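The gene set input expects a plain list with one gene ID per line. As an illustration, the sketch below writes such a file from a Python list. The helper name is hypothetical, and dropping blank entries and duplicates is a choice made for this example rather than a documented requirement.

```python
import gzip
from typing import Iterable


def write_gene_set(gene_ids: Iterable[str], path: str) -> int:
    """Write gene IDs one per line and return how many were written.

    Blank entries are skipped and duplicates are dropped, keeping first-seen order.
    """
    unique = dict.fromkeys(g.strip() for g in gene_ids if g.strip())
    # Use gzip for names ending in .gz (for example genes.tab.gz), plain text otherwise.
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "wt", encoding="utf-8") as handle:
        for gene_id in unique:
            handle.write(f"{gene_id}\n")
    return len(unique)
```

The resulting file can then be uploaded together with the matching gene ID source and species.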
Output results
- label:
Gene set
- type:
basic:file
- required:
True
- disabled:
False
- hidden:
False
- label:
Gene set (JSON)
- type:
basic:json
- required:
True
- disabled:
False
- hidden:
False
- label:
Gene ID source
- type:
basic:string
- required:
True
- disabled:
False
- hidden:
False
- label:
Species
- type:
basic:string
- required:
True
- disabled:
False
- hidden:
False
Gene set (create from Venn diagram)
- data:geneset:venn:create-geneset-venn (list:basic:string genes, basic:string source, basic:string species, basic:file venn)[Source: v1.3.2]
Create a gene set from a Venn diagram.
Input arguments
- label:
Genes
- type:
list:basic:string
- description:
List of genes.
- required:
True
- disabled:
False
- hidden:
False
- label:
Gene ID source
- type:
basic:string
- required:
True
- disabled:
False
- hidden:
False
- choices:
AFFY:
AFFY
DICTYBASE:
DICTYBASE
ENSEMBL:
ENSEMBL
NCBI:
NCBI
UCSC:
UCSC
- label:
Species
- type:
basic:string
- description:
Species Latin name.
- required:
True
- disabled:
False
- hidden:
False
- choices:
Homo sapiens:
Homo sapiens
Mus musculus:
Mus musculus
Rattus norvegicus:
Rattus norvegicus
Dictyostelium discoideum:
Dictyostelium discoideum
Odocoileus virginianus texanus:
Odocoileus virginianus texanus
Solanum tuberosum:
Solanum tuberosum
- label:
Venn diagram
- type:
basic:file
- description:
JSON file of Venn diagram.
- required:
True
- disabled:
False
- hidden:
False
Output results
- label:
Gene set
- type:
basic:file
- required:
True
- disabled:
False
- hidden:
False
- label:
Gene set (JSON)
- type:
basic:json
- required:
True
- disabled:
False
- hidden:
False
- label:
Gene ID source
- type:
basic:string
- required:
True
- disabled:
False
- hidden:
False
- label:
Species
- type:
basic:string
- required:
True
- disabled:
False
- hidden:
False
- label:
Venn diagram
- type:
basic:json
- required:
True
- disabled:
False
- hidden:
False
Gene set (create)
- data:geneset:create-geneset (list:basic:string genes, basic:string source, basic:string species)[Source: v1.3.2]
Create a gene set from a list of genes.
Input arguments
- label:
Genes
- type:
list:basic:string
- description:
List of genes.
- required:
True
- disabled:
False
- hidden:
False
- label:
Gene ID source
- type:
basic:string
- required:
True
- disabled:
False
- hidden:
False
- choices:
AFFY:
AFFY
DICTYBASE:
DICTYBASE
ENSEMBL:
ENSEMBL
NCBI:
NCBI
UCSC:
UCSC
- label:
Species
- type:
basic:string
- description:
Species Latin name.
- required:
True
- disabled:
False
- hidden:
False
- choices:
Homo sapiens:
Homo sapiens
Mus musculus:
Mus musculus
Rattus norvegicus:
Rattus norvegicus
Dictyostelium discoideum:
Dictyostelium discoideum
Odocoileus virginianus texanus:
Odocoileus virginianus texanus
Solanum tuberosum:
Solanum tuberosum
Output results
- label:
Gene set
- type:
basic:file
- required:
True
- disabled:
False
- hidden:
False
- label:
Gene set (JSON)
- type:
basic:json
- required:
True
- disabled:
False
- hidden:
False
- label:
Gene ID source
- type:
basic:string
- required:
True
- disabled:
False
- hidden:
False
- label:
Species
- type:
basic:string
- required:
True
- disabled:
False
- hidden:
False
HISAT2
- data:alignment:bam:hisat2alignment-hisat2 (data:index:hisat2 genome, data:reads:fastq reads, basic:boolean softclip, basic:integer noncansplice, basic:boolean cufflinks)[Source: v2.6.1]
HISAT2 is a fast and sensitive alignment program for mapping next-generation sequencing reads (both DNA and RNA) to a population of genomes (as well as to a single reference genome). See [here](https://ccb.jhu.edu/software/hisat2/index.shtml) for more information.
Input arguments
- label:
Reference genome
- type:
data:index:hisat2
- label:
Reads
- type:
data:reads:fastq
- label:
Disallow soft clipping
- type:
basic:boolean
- default:
False
- label:
Non-canonical splice sites penalty (optional)
- type:
basic:integer
- description:
Sets the penalty for each pair of non-canonical splice sites (e.g. non-GT/AG).
- required:
False
- label:
Report alignments tailored specifically for Cufflinks
- type:
basic:boolean
- description:
With this option, HISAT2 looks for novel splice sites with three signals (GT/AG, GC/AG, AT/AC), but all user-provided splice sites are used irrespective of their signals. HISAT2 produces an optional field, XS:A:[+-], for every spliced alignment.
- default:
False
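The three optional inputs above correspond naturally to HISAT2 command-line options. The sketch below shows one plausible translation. The flags (`--no-softclip`, `--pen-noncansplice`, `--dta-cufflinks`) are real HISAT2 options, but the mapping from process inputs to flags is an assumption made for illustration, and `hisat2_options` is a hypothetical helper.

```python
from typing import List, Optional


def hisat2_options(
    softclip: bool = False,
    noncansplice: Optional[int] = None,
    cufflinks: bool = False,
) -> List[str]:
    """Translate the optional inputs into extra HISAT2 arguments."""
    options: List[str] = []
    if softclip:
        # "Disallow soft clipping"
        options.append("--no-softclip")
    if noncansplice is not None:
        # Penalty for each pair of non-canonical splice sites
        options += ["--pen-noncansplice", str(noncansplice)]
    if cufflinks:
        # Report alignments tailored specifically for Cufflinks
        options.append("--dta-cufflinks")
    return options
```

With all inputs left at their defaults the function returns an empty list, so HISAT2 would run with its own default settings.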
Output results
- label:
Alignment file
- type:
basic:file
- description:
Position sorted alignment
- label:
Index BAI
- type:
basic:file
- label:
Statistics
- type:
basic:file
- label:
Splice junctions
- type:
basic:file
- label:
Unmapped reads (mate 1)
- type:
basic:file
- required:
False
- label:
Unmapped reads (mate 2)
- type:
basic:file
- required:
False
- label:
Species
- type:
basic:string
- label:
Build
- type:
basic:string
HISAT2 genome index
- data:index:hisat2:hisat2-index (data:seq:nucleotide ref_seq)[Source: v1.2.1]
Create HISAT2 genome index.
Input arguments
- label:
Reference sequence (nucleotide FASTA)
- type:
data:seq:nucleotide
- required:
True
- disabled:
False
- hidden:
False
Output results
- label:
HISAT2 index
- type:
basic:dir
- required:
True
- disabled:
False
- hidden:
False
- label:
FASTA file (compressed)
- type:
basic:file
- required:
True
- disabled:
False
- hidden:
False
- label:
FASTA file
- type:
basic:file
- required:
True
- disabled:
False
- hidden:
False
- label:
FASTA file index
- type:
basic:file
- required:
True
- disabled:
False
- hidden:
False
- label:
Species
- type:
basic:string
- required:
True
- disabled:
False
- hidden:
False
- label:
Build
- type:
basic:string
- required:
True
- disabled:
False
- hidden:
False
HMR
- data:wgbs:hmrhmr (data:wgbs:methcounts methcounts)[Source: v1.4.0]
Identify hypo-methylated regions.
Input arguments
- label:
Methylation levels
- type:
data:wgbs:methcounts
- description:
Methylation levels data calculated using methcounts.
Output results
- label:
Hypo-methylated regions
- type:
basic:file
- label:
Bed file index for Jbrowse
- type:
basic:file
- label:
Species
- type:
basic:string
- label:
Build
- type:
basic:string
Hierarchical clustering of time courses
- data:clustering:hierarchical:etc:clustering-hierarchical-etc (list:data:expression expressions, list:basic:string genes, basic:string gene_species, basic:string gene_source, basic:string distance, basic:string linkage, basic:boolean ordering)[Source: v1.3.1]
Cluster gene expression time courses. Hierarchical clustering of expression time courses.
Input arguments
- label:
Time series relation
- type:
list:data:expression
- description:
Select the time course to which the expressions belong.
- required:
True
- disabled:
False
- hidden:
False
- label:
Gene subset
- type:
list:basic:string
- description:
Select at least two genes or leave this field empty.
- required:
False
- disabled:
False
- hidden:
False
- label:
Species
- type:
basic:string
- description:
Species to which the selected genes belong. This field is required if gene subset is set.
- required:
False
- disabled:
False
- hidden:
!genes
- choices:
Dictyostelium discoideum:
Dictyostelium discoideum
Homo sapiens:
Homo sapiens
Macaca mulatta:
Macaca mulatta
Mus musculus:
Mus musculus
Rattus norvegicus:
Rattus norvegicus
- label:
Gene ID database of selected genes
- type:
basic:string
- description:
This field is required if gene subset is set.
- required:
False
- disabled:
False
- hidden:
!genes
- label:
Distance metric
- type:
basic:string
- required:
True
- disabled:
False
- hidden:
False
- default:
spearman
- choices:
Euclidean:
euclidean
Spearman:
spearman
Pearson:
pearson
- label:
Linkage method
- type:
basic:string
- required:
True
- disabled:
False
- hidden:
False
- default:
average
- choices:
single:
single
average:
average
complete:
complete
- label:
Use optimal ordering
- type:
basic:boolean
- description:
Results in a more intuitive tree structure, but may slow down the clustering on large datasets.
- required:
True
- disabled:
False
- hidden:
False
- default:
False
Output results
- label:
Hierarchical clustering
- type:
basic:json
- required:
True
- disabled:
False
- hidden:
False
- label:
Gene ID database
- type:
basic:string
- required:
True
- disabled:
False
- hidden:
False
- label:
Species
- type:
basic:string
- required:
True
- disabled:
False
- hidden:
False
- label:
Build
- type:
basic:string
- required:
True
- disabled:
False
- hidden:
False
- label:
Feature type
- type:
basic:string
- required:
True
- disabled:
False
- hidden:
False
IDAT file
- data:methylationarray:idat:upload-idat (basic:file red_channel, basic:file green_channel, basic:string species, basic:string platform)[Source: v1.1.1]
Upload Illumina methylation array raw IDAT data. This import process accepts Illumina methylation array BeadChip raw files in IDAT format. Two input files, one for each of the Green and Red signal channels, are expected. The uploads of human (HM27, HM450, EPIC) and mouse (MM285) array types are supported.
Input arguments
- label:
Red channel IDAT file (*_Red.idat)
- type:
basic:file
- required:
True
- disabled:
False
- hidden:
False
- label:
Green channel IDAT file (*_Grn.idat)
- type:
basic:file
- required:
True
- disabled:
False
- hidden:
False
- label:
Species
- type:
basic:string
- description:
Select a species name from the dropdown menu.
- required:
True
- disabled:
False
- hidden:
False
- default:
Homo sapiens
- choices:
Homo sapiens:
Homo sapiens
Mus musculus:
Mus musculus
- label:
Protein ID database source
- type:
basic:string
- description:
Select a methylation array platform for human (HM450, HM27, EPIC) or mouse (MM285) samples.
- required:
True
- disabled:
False
- hidden:
False
- default:
HM450
- choices:
HM450:
HM450
HM27:
HM27
EPIC:
EPIC
MM285:
MM285
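Because each upload needs one Red and one Green channel file, it can help to pair files by their shared prefix before importing. The sketch below does this based on the `*_Red.idat` and `*_Grn.idat` naming patterns shown in the input labels. The helper is hypothetical and handles only uncompressed names that follow exactly those patterns.

```python
from typing import Dict, Iterable, Tuple

RED_SUFFIX = "_Red.idat"
GREEN_SUFFIX = "_Grn.idat"


def pair_idat_files(names: Iterable[str]) -> Dict[str, Tuple[str, str]]:
    """Map each sample prefix to its (red, green) file names.

    Raises ValueError if any prefix has only one of the two channels.
    """
    red: Dict[str, str] = {}
    green: Dict[str, str] = {}
    for name in names:
        if name.endswith(RED_SUFFIX):
            red[name[: -len(RED_SUFFIX)]] = name
        elif name.endswith(GREEN_SUFFIX):
            green[name[: -len(GREEN_SUFFIX)]] = name
    unpaired = sorted(set(red) ^ set(green))
    if unpaired:
        raise ValueError(f"missing channel file for: {', '.join(unpaired)}")
    return {prefix: (red[prefix], green[prefix]) for prefix in sorted(red)}
```

Each resulting pair supplies the Red channel and Green channel inputs for one upload.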
Output results
- label:
Red channel IDAT file
- type:
basic:file
- required:
True
- disabled:
False
- hidden:
False
- label:
Green channel IDAT file
- type:
basic:file
- required:
True
- disabled:
False
- hidden:
False
- label:
Species
- type:
basic:string
- required:
True
- disabled:
False
- hidden:
False
- label:
Platform
- type:
basic:string
- required:
True
- disabled:
False
- hidden:
False
MACS 1.4
- data:chipseq:callpeak:macs14macs14 (data:alignment:bam treatment, data:alignment:bam control, basic:string pvalue)[Source: v3.5.1]
Model-based Analysis of ChIP-Seq (MACS 1.4) empirically models the length of the sequenced ChIP fragments, which tends to be shorter than sonication or library construction size estimates, and uses it to improve the spatial resolution of predicted binding sites. MACS also uses a dynamic Poisson distribution to effectively capture local biases in the genome sequence, allowing for more sensitive and robust prediction. See the [original paper](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2592715/) for more information.
Input arguments
- label:
BAM File
- type:
data:alignment:bam
- label:
BAM Background File
- type:
data:alignment:bam
- required:
False
- label:
P-value
- type:
basic:string
- default:
1e-9
- choices:
1e-9:
1e-9
1e-6:
1e-6
Output results
- label:
Peaks (BED)
- type:
basic:file
- label:
Summits (BED)
- type:
basic:file
- label:
Peaks (XLS)
- type:
basic:file
- label:
Wiggle
- type:
basic:file
- label:
Control (bigWig)
- type:
basic:file
- required:
False
- label:
Treat (bigWig)
- type:
basic:file
- label:
Peaks (bigBed)
- type:
basic:file
- required:
False
- label:
Summits (bigBed)
- type:
basic:file
- required:
False
- label:
JBrowse track peaks file
- type:
basic:file
- label:
JBrowse track summits file
- type:
basic:file
- label:
Model
- type:
basic:file
- required:
False
- label:
Negative peaks (XLS)
- type:
basic:file
- required:
False
- label:
Species
- type:
basic:string
- label:
Build
- type:
basic:string
MACS 2.0
- data:chipseq:callpeak:macs2:macs2-callpeak (data:alignment:bam case, data:alignment:bam control, data:bed promoter, basic:boolean tagalign, basic:integer q_threshold, basic:integer n_sub, basic:boolean tn5, basic:integer shift, basic:string format, basic:string duplicates, basic:string duplicates_prepeak, basic:decimal qvalue, basic:decimal pvalue, basic:decimal pvalue_prepeak, basic:integer cap_num, basic:integer mfold_lower, basic:integer mfold_upper, basic:integer slocal, basic:integer llocal, basic:integer extsize, basic:integer shift, basic:integer band_width, basic:boolean nolambda, basic:boolean fix_bimodal, basic:boolean nomodel, basic:boolean nomodel_prepeak, basic:boolean down_sample, basic:boolean bedgraph, basic:boolean spmr, basic:boolean call_summits, basic:boolean broad, basic:decimal broad_cutoff)[Source: v4.8.1]
Call ChIP-Seq peaks with MACS 2.0. Model-based Analysis of ChIP-Seq (MACS 2.0) is used to identify transcription factor binding sites. MACS 2.0 captures the influence of genome complexity to evaluate the significance of enriched ChIP regions, and it improves the spatial resolution of binding sites by combining the information of both sequencing tag position and orientation. It also has an option to link nearby peaks together in order to call broad peaks. See [here](https://github.com/taoliu/MACS/) for more information. In addition to peak-calling, this process computes ChIP-Seq and ATAC-Seq QC metrics. The process returns a QC metrics report, fragment length estimation, and a deduplicated tagAlign file. The QC report contains ENCODE 3 proposed QC metrics: [NRF](https://www.encodeproject.org/data-standards/terms/), [PBC bottlenecking coefficients, NSC, and RSC](https://genome.ucsc.edu/ENCODE/qualityMetrics.html#chipSeq).
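The library complexity metrics named above can be illustrated with a few lines of Python. Following the ENCODE definitions, NRF is the number of distinct read positions divided by the total number of reads, PBC1 is the number of positions covered by exactly one read divided by the number of distinct positions, and PBC2 is the number of positions covered by exactly one read divided by the number covered by exactly two. The function below is a simplified sketch: it takes positions as arbitrary hashable keys (for example chromosome, start, strand) and is not the code used by this process.

```python
from collections import Counter
from typing import Dict, Hashable, Iterable


def library_complexity(positions: Iterable[Hashable]) -> Dict[str, float]:
    """Compute NRF, PBC1 and PBC2 from one position key per uniquely mapped read."""
    counts = Counter(positions)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no reads given")
    distinct = len(counts)
    m1 = sum(1 for c in counts.values() if c == 1)  # positions with exactly one read
    m2 = sum(1 for c in counts.values() if c == 2)  # positions with exactly two reads
    return {
        "NRF": distinct / total,
        "PBC1": m1 / distinct,
        "PBC2": m1 / m2 if m2 else float("inf"),
    }
```

Values of NRF and PBC1 close to 1 indicate that most reads map to distinct positions, in other words a complex library with few duplicates.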
Input arguments
- label:
Case (treatment)
- type:
data:alignment:bam
- required:
True
- disabled:
False
- hidden:
False
- label:
Control (background)
- type:
data:alignment:bam
- required:
False
- disabled:
False
- hidden:
False
- label:
Promoter regions BED file
- type:
data:bed
- description:
BED file containing promoter regions (for example, TSS ± 1000 bp). Needed to get the number of peaks and reads mapped to promoter regions.
- required:
False
- disabled:
False
- hidden:
False
- label:
Use tagAlign files
- type:
basic:boolean
- description:
Use filtered tagAlign files as case (treatment) and control (background) samples. If extsize parameter is not set, run MACS using input’s estimated fragment length.
- required:
True
- disabled:
False
- hidden:
False
- default:
False
- label:
Quality filtering threshold
- type:
basic:integer
- required:
True
- disabled:
False
- hidden:
False
- default:
30
- label:
Number of reads to subsample
- type:
basic:integer
- required:
True
- disabled:
False
- hidden:
False
- default:
15000000
- label:
Tn5 shifting
- type:
basic:boolean
- description:
Tn5 transposon shifting. Shift reads on ‘+’ strand by 4bp and reads on ‘-’ strand by 5bp.
- required:
True
- disabled:
False
- hidden:
False
- default:
False
- label:
User-defined cross-correlation peak strandshift
- type:
basic:integer
- description:
If defined, the SPP tool will not try to estimate fragment length but will use the given value as fragment length.
- required:
False
- disabled:
False
- hidden:
False
- label:
Format of tag file
- type:
basic:string
- description:
This specifies the format of input files. For paired-end data the format dictates how MACS2 will treat mates. If the selected format is BAM, MACS2 will only keep the left mate (5’ end) tag. However, when format BAMPE is selected, MACS2 will use the actual insert sizes of read pairs to build the fragment pileup, instead of building a bimodal distribution of plus and minus strand reads to predict fragment size.
- required:
True
- disabled:
False
- hidden:
tagalign
- default:
BAM
- choices:
BAM:
BAM
BAMPE:
BAMPE
- label:
Number of duplicates
- type:
basic:string
- description:
It controls the MACS behavior towards duplicate tags at the exact same location, that is, the same coordinates and the same strand. The ‘auto’ option makes MACS calculate the maximum number of tags at the exact same location based on a binomial distribution using 1e-5 as the p-value cutoff, and the ‘all’ option keeps all the tags. If an integer is given, at most this number of tags will be kept at the same location. The default is to keep one tag at the same location.
- required:
False
- disabled:
False
- hidden:
tagalign
- choices:
1:
1
auto:
auto
all:
all
- label:
Number of duplicates
- type:
basic:string
- description:
It controls the MACS behavior towards duplicate tags at the exact same location, that is, the same coordinates and the same strand. The ‘auto’ option makes MACS calculate the maximum number of tags at the exact same location based on a binomial distribution using 1e-5 as the p-value cutoff, and the ‘all’ option keeps all the tags. If an integer is given, at most this number of tags will be kept at the same location. The default is to keep one tag at the same location.
- required:
True
- disabled:
False
- hidden:
!tagalign
- default:
all
- choices:
1:
1
auto:
auto
all:
all
- label:
Q-value cutoff
- type:
basic:decimal
- description:
The q-value (minimum FDR) cutoff to call significant regions. Q-values are calculated from p-values using the Benjamini-Hochberg procedure.
- required:
False
- disabled:
settings.pvalue && settings.pvalue_prepeak
- hidden:
False
- label:
P-value cutoff
- type:
basic:decimal
- description:
The p-value cutoff. If specified, MACS2 will use p-value instead of q-value cutoff.
- required:
False
- disabled:
settings.qvalue
- hidden:
tagalign
- label:
P-value cutoff
- type:
basic:decimal
- description:
The p-value cutoff. If specified, MACS2 will use p-value instead of q-value cutoff.
- required:
True
- disabled:
settings.qvalue
- hidden:
!tagalign || settings.qvalue
- default:
1e-05
- label:
Cap number of peaks by taking top N peaks
- type:
basic:integer
- description:
To keep all peaks set value to 0.
- required:
True
- disabled:
settings.broad
- hidden:
False
- default:
500000
- label:
MFOLD range (lower limit)
- type:
basic:integer
- description:
This parameter is used to select the regions within the MFOLD range of high-confidence enrichment ratio against background to build the model. The fold enrichment of the regions must be lower than the upper limit and higher than the lower limit. DEFAULT:10,30 means using all regions not too low (>10) and not too high (<30) to build the paired-peaks model. If MACS cannot find more than 100 regions to build the model, it will use the --extsize parameter to continue the peak detection ONLY if --fix-bimodal is set.
- required:
False
- disabled:
False
- hidden:
False
- label:
MFOLD range (upper limit)
- type:
basic:integer
- description:
This parameter is used to select the regions within the MFOLD range of high-confidence enrichment ratio against background to build the model. The fold enrichment of the regions must be lower than the upper limit and higher than the lower limit. DEFAULT:10,30 means using all regions not too low (>10) and not too high (<30) to build the paired-peaks model. If MACS cannot find more than 100 regions to build the model, it will use the --extsize parameter to continue the peak detection ONLY if --fix-bimodal is set.
- required:
False
- disabled:
False
- hidden:
False
- label:
Small local region
- type:
basic:integer
- description:
The slocal and llocal parameters control which two levels of regions will be checked around the peak regions to calculate the maximum lambda as local lambda. By default, MACS considers 1000 bp for the small local region (--slocal) and 10000 bp for the large local region (--llocal), which captures the bias from a long-range effect like an open chromatin domain. You can tweak these according to your project. Remember that if the region is set too small, a sharp spike in the input data may kill the significant peak.
- required:
False
- disabled:
False
- hidden:
False
- label:
Large local region
- type:
basic:integer
- description:
The slocal and llocal parameters control which two levels of regions around the peak regions are checked to calculate the maximum lambda, which is used as the local lambda. By default, MACS considers 1000 bp for the small local region (--slocal) and 10000 bp for the large local region (--llocal), which captures the bias from a long-range effect such as an open chromatin domain. You can tweak these according to your project. Remember that if the region is set too small, a sharp spike in the input data may kill a significant peak.
- required:
False
- disabled:
False
- hidden:
False
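Taking "the maximum lambda as local lambda" can be illustrated with a small sketch. The way window read counts are scaled to a fragment-sized bin here is an assumption made for illustration, and the function names are hypothetical; MACS itself may compute these rates differently.

```python
def window_lambda(read_count, window_size, fragment_size):
    """Expected reads per fragment-sized bin, given reads counted in a window.

    This scaling is an assumption of the sketch, not taken from MACS.
    """
    return read_count * fragment_size / window_size


def local_lambda(lambda_bg, count_slocal, count_llocal,
                 fragment_size=200, slocal=1000, llocal=10000, nolambda=False):
    """Return the lambda used at a candidate peak."""
    if nolambda:
        # With the background-lambda flag on, local bias is ignored.
        return lambda_bg
    return max(
        lambda_bg,
        window_lambda(count_slocal, slocal, fragment_size),
        window_lambda(count_llocal, llocal, fragment_size),
    )


with_local_bias = local_lambda(0.5, count_slocal=10, count_llocal=40)
background_only = local_lambda(0.5, count_slocal=10, count_llocal=40, nolambda=True)
```

In this example the small window dominates, which is the situation the description warns about: a sharp spike in a very small window raises the local lambda and can suppress an otherwise significant peak.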
- label:
Extension size [--extsize]
- type:
basic:integer
- description:
When --nomodel is set, MACS uses this parameter to extend reads in the 5'->3' direction to fixed-size fragments. For example, if the size of the binding region for your transcription factor is 200 bp and you want to bypass model building by MACS, this parameter can be set to 200. This option is only valid when --nomodel is set or when MACS fails to build the model and --fix-bimodal is on.
- required:
False
- disabled:
False
- hidden:
False
- label:
Shift
- type:
basic:integer
- description:
Note that this is NOT the legacy --shiftsize option, which has been replaced by --extsize. You can set an arbitrary shift in bp here; use discretion when setting it to anything other than the default value (0). When --nomodel is set, MACS will use this value to move cutting ends (5') and then apply --extsize in the 5'->3' direction to extend them to fragments. When this value is negative, ends are moved in the 3'->5' direction; otherwise they are moved in the 5'->3' direction. It is recommended to keep the default of 0 for ChIP-Seq datasets, or to use -1 * half of EXTSIZE together with the --extsize option for detecting enriched cutting loci, such as in certain DNaseI-Seq datasets. Note that you cannot set values other than 0 if the format is BAMPE for paired-end data. Default is 0.
- required:
False
- disabled:
False
- hidden:
settings.format == 'BAMPE'
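The interaction between the shift and extension size can be made concrete with a short sketch. It assumes a read is represented only by its 5' position and strand; the `fragment_interval` helper is hypothetical and is not part of MACS.

```python
def fragment_interval(five_prime, strand, shift=0, extsize=200):
    """Return (start, end) of the fragment built from one read's 5' end.

    A positive shift moves the 5' end in the 5'->3' direction, which means
    towards higher coordinates on "+" and lower coordinates on "-".
    The shifted end is then extended by extsize in the 5'->3' direction.
    """
    if strand == "+":
        start = five_prime + shift
        return (start, start + extsize)
    end = five_prime - shift
    return (end - extsize, end)


# shift = -1 * half of extsize centers the fragment on the cutting site.
plus_centered = fragment_interval(1000, "+", shift=-100, extsize=200)
minus_centered = fragment_interval(1000, "-", shift=-100, extsize=200)
```

With the default shift of 0, the same read yields a fragment that starts at the 5' end and extends downstream only.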
- label:
Band width
- type:
basic:integer
- description:
The band width used to scan the genome ONLY for model building. You can set this parameter to the sonication fragment size expected from the wet-lab experiment. The previous side effect on the peak detection process has been removed, so this parameter affects only model building.
- required:
False
- disabled:
False
- hidden:
False
- label:
Use background lambda as local lambda
- type:
basic:boolean
- description:
With this flag on, MACS will use the background lambda as local lambda. This means MACS will not consider the local bias at peak candidate regions.
- required:
True
- disabled:
False
- hidden:
False
- default:
False
- label:
Turn on the auto paired-peak model process
- type:
basic:boolean
- description:
Turn on the auto paired-peak model process. If set, when MACS fails to build the paired-peak model, it will fall back to the nomodel settings and use the --extsize parameter to extend each tag. If not set, MACS will terminate when building the paired-peak model fails.
- required:
True
- disabled:
False
- hidden:
False
- default:
False
- label:
Bypass building the shifting model [--nomodel]
- type:
basic:boolean
- description:
While on, MACS will bypass building the shifting model.
- required:
True
- disabled:
False
- hidden:
tagalign
- default:
False
- label:
Bypass building the shifting model [--nomodel]
- type:
basic:boolean
- description:
While on, MACS will bypass building the shifting model.
- required:
True
- disabled:
False
- hidden:
!tagalign
- default:
True
- label:
Down-sample
- type:
basic:boolean
- description:
When set to true, a random sampling method is used to scale down the bigger sample. By default, MACS uses linear scaling. This option makes the results unstable and irreproducible because different random reads are selected on each run; in particular, the numbers (pileup, p-value, q-value) will change.
- required:
True
- disabled:
False
- hidden:
False
- default:
False
- label:
Save fragment pileup and control lambda
- type:
basic:boolean
- description:
If this flag is on, MACS will store the fragment pileup, control lambda, -log10(p-value) and -log10(q-value) scores in bedGraph files. The bedGraph files are stored in the current directory and named NAME+'_treat_pileup.bdg' for treatment data, NAME+'_control_lambda.bdg' for local lambda values from control, NAME+'_treat_pvalue.bdg' for Poisson p-value scores (in -log10(p-value) form), and NAME+'_treat_qvalue.bdg' for q-value scores from the Benjamini-Hochberg-Yekutieli procedure.
- required:
True
- disabled:
False
- hidden:
False
- default:
True
- label:
Save signal per million reads for fragment pileup profiles
- type:
basic:boolean
- required:
True
- disabled:
settings.bedgraph === false
- hidden:
False
- default:
True
- label:
Call summits [--call-summits]
- type:
basic:boolean
- description:
MACS will reanalyze the shape of the signal profile (p- or q-score, depending on the cutoff setting) to deconvolve subpeaks within each peak called by the general procedure. This is highly recommended for detecting adjacent binding events. When used, the output subpeaks of a big peak region will have the same peak boundaries but different scores and peak summit positions.
- required:
True
- disabled:
False
- hidden:
False
- default:
False
- label:
Composite broad regions [--broad]
- type:
basic:boolean
- description:
When this flag is on, MACS will try to composite broad regions in BED12 (a gene-model-like format) by putting nearby highly enriched regions into a broad region with a loose cutoff. The broad region is controlled by another cutoff through --broad-cutoff. The maximum length of a broad region is 4 times d from MACS.
- required:
True
- disabled:
settings.call_summits === true
- hidden:
False
- default:
False
- label:
Broad cutoff
- type:
basic:decimal
- description:
Cutoff for broad regions. This option is not available unless --broad is set. If -p is set, this is a p-value cutoff; otherwise, it is a q-value cutoff. DEFAULT = 0.1
- required:
False
- disabled:
settings.call_summits === true || settings.broad !== true
- hidden:
False
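The peak-calling fields above correspond to `macs2 callpeak` command-line options. The sketch below shows one way a settings dictionary could be turned into an option list. The `macs2_options` helper and the field-to-flag mapping are assumptions for illustration; the workflow's actual command construction is not shown in this document.

```python
def macs2_options(settings):
    """Translate a settings dict into a list of macs2 callpeak options.

    Hypothetical helper: only a subset of fields is handled here.
    """
    if settings.get("broad") and settings.get("call_summits"):
        # Mirrors the 'disabled' conditions: the two modes are exclusive.
        raise ValueError("broad and call_summits cannot be combined")

    options = []
    if settings.get("pvalue") is not None:
        # A p-value cutoff takes precedence over the q-value cutoff.
        options += ["--pvalue", str(settings["pvalue"])]
    elif settings.get("qvalue") is not None:
        options += ["--qvalue", str(settings["qvalue"])]

    if settings.get("nomodel"):
        options.append("--nomodel")
    for name in ("extsize", "shift", "slocal", "llocal"):
        if settings.get(name) is not None:
            options += ["--" + name, str(settings[name])]

    if settings.get("broad"):
        options.append("--broad")
        if settings.get("broad_cutoff") is not None:
            options += ["--broad-cutoff", str(settings["broad_cutoff"])]
    if settings.get("call_summits"):
        options.append("--call-summits")
    return options


cut_site_options = macs2_options({"nomodel": True, "extsize": 200, "shift": -100})
broad_options = macs2_options({"broad": True, "broad_cutoff": 0.1})
```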
Output results
- label:
Called peaks
- type:
basic:file
- required:
True
- disabled:
False
- hidden:
False
- label:
Narrow peaks
- type:
basic:file
- required:
False
- disabled:
False
- hidden:
False
- label:
QC report
- type:
basic:file
- required:
False
- disabled:
False
- hidden:
False
- label:
Pre-peak QC report (case)
- type:
basic:file
- required:
True
- disabled:
False
- hidden:
False
- label:
Filtered tagAlign (case)
- type:
basic:file
- required:
True
- disabled:
False
- hidden:
False
- label:
Filtered BAM (case)
- type:
basic:file
- required:
True
- disabled:
False
- hidden:
False
- label:
Filtered BAM index (case)
- type:
basic:file
- required:
True
- disabled:
False
- hidden:
False
- label:
Pre-peak QC report (control)
- type:
basic:file
- required:
False
- disabled:
False
- hidden:
False
- label:
Filtered tagAlign (control)
- type:
basic:file
- required:
False
- disabled:
False
- hidden:
False
- label:
Filtered BAM (control)
- type:
basic:file
- required:
False
- disabled:
False
- hidden:
False
- label:
Filtered BAM index (control)
- type:
basic:file
- required:
False
- disabled:
False
- hidden:
False
- label:
Narrow peaks (BigBed)
- type:
basic:file
- required:
False
- disabled:
False
- hidden:
False
- label:
Peak summits
- type:
basic:file
- required:
False
- disabled:
False
- hidden:
False
- label:
Peak summits tbi index for JBrowse
- type:
basic:file
- required:
False
- disabled:
False
- hidden:
False
- label:
Summits (bigBed)
- type:
basic:file
- required:
False
- disabled:
False
- hidden:
False
- label:
Broad peaks
- type:
basic:file
- required:
False
- disabled:
False
- hidden:
False
- label:
Broad peaks (bed12/gappedPeak)
- type:
basic:file
- required:
False
- disabled:
False
- hidden:
False
- label:
Treatment pileup (bedGraph)
- type:
basic:file
- required:
False
- disabled:
False
- hidden:
False
- label:
Treatment pileup (bigWig)
- type:
basic:file
- required:
False
- disabled:
False
- hidden:
False
- label:
Control lambda (bedGraph)
- type:
basic:file
- required:
False
- disabled:
False
- hidden:
False
- label:
Control lambda (bigWig)
- type:
basic:file
- required:
False
- disabled:
False
- hidden:
False
- label:
Model
- type:
basic:file
- required:
False
- disabled:
False
- hidden:
False
- label:
Species
- type:
basic:string
- required:
True
- disabled:
False
- hidden:
False
- label:
Build
- type:
basic:string
- required:
True
- disabled:
False
- hidden:
False
MACS2
- data:workflow:chipseq:macs2rose2workflow-macs2 (data:alignment:bam case, data:alignment:bam control, data:bed promoter, basic:boolean tagalign, basic:integer q_threshold, basic:integer n_sub, basic:boolean tn5, basic:integer shift, basic:string duplicates, basic:string duplicates_prepeak, basic:decimal qvalue, basic:decimal pvalue, basic:decimal pvalue_prepeak, basic:integer cap_num, basic:integer mfold_lower, basic:integer mfold_upper, basic:integer slocal, basic:integer llocal, basic:integer extsize, basic:integer shift, basic:integer band_width, basic:boolean nolambda, basic:boolean fix_bimodal, basic:boolean nomodel, basic:boolean nomodel_prepeak, basic:boolean down_sample, basic:boolean bedgraph, basic:boolean spmr, basic:boolean call_summits, basic:boolean broad, basic:decimal broad_cutoff, data:bed blacklist, basic:boolean calculate_enrichment, basic:integer profile_window, basic:string shift_size)[Source: v1.2.0]
Input arguments
- label:
Case (treatment)
- type:
data:alignment:bam
- label:
Control (background)
- type:
data:alignment:bam
- required:
False
- label:
Promoter regions BED file
- type:
data:bed
- description:
BED file containing promoter regions (TSS ± 1000 bp, for example). Needed to get the number of peaks and reads mapped to promoter regions.
- required:
False
- label:
Use tagAlign files
- type:
basic:boolean
- description:
Use filtered tagAlign files as case (treatment) and control (background) samples. If the extsize parameter is not set, MACS is run using the input's estimated fragment length.
- default:
False
- label:
Quality filtering threshold
- type:
basic:integer
- default:
30
- label:
Number of reads to subsample
- type:
basic:integer
- default:
15000000
- label:
Tn5 shifting
- type:
basic:boolean
- description:
Tn5 transposon shifting. Shift reads on “+” strand by 4 bp and reads on “-” strand by 5 bp.
- default:
False
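A minimal sketch of the Tn5 shift on a single tagAlign-style interval follows. The direction of each shift (moving the start forward on "+" and the end backward on "-") is an assumption based on the common ATAC-seq convention; the description above only states the shift sizes.

```python
def tn5_shift(start, end, strand):
    """Apply the Tn5 offset to one (start, end, strand) interval.

    Assumed convention: "+" reads move the start by +4 bp,
    "-" reads move the end by -5 bp.
    """
    if strand == "+":
        return (start + 4, end, strand)
    return (start, end - 5, strand)


shifted_plus = tn5_shift(100, 150, "+")
shifted_minus = tn5_shift(100, 150, "-")
```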
- label:
User-defined cross-correlation peak strandshift
- type:
basic:integer
- description:
If defined, SPP tool will not try to estimate fragment length but will use the given value as fragment length.
- required:
False
- label:
Number of duplicates
- type:
basic:string
- description:
Controls how MACS handles duplicate tags at exactly the same location, that is, the same coordinates and the same strand. The 'auto' option makes MACS calculate the maximum number of tags at the exact same location based on a binomial distribution using 1e-5 as the p-value cutoff, and the 'all' option keeps all the tags. If an integer is given, at most this number of tags will be kept at the same location. The default is to keep one tag at the same location.
- required:
False
- hidden:
tagalign
- choices:
1:
1
auto:
auto
all:
all
- label:
Number of duplicates
- type:
basic:string
- description:
Controls how MACS handles duplicate tags at exactly the same location, that is, the same coordinates and the same strand. The 'auto' option makes MACS calculate the maximum number of tags at the exact same location based on a binomial distribution using 1e-5 as the p-value cutoff, and the 'all' option keeps all the tags. If an integer is given, at most this number of tags will be kept at the same location. The default is to keep one tag at the same location.
- required:
False
- hidden:
!tagalign
- default:
all
- choices:
1:
1
auto:
auto
all:
all
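The integer and 'all' choices for duplicate handling can be sketched as below. Tags are modelled as `(chromosome, position, strand)` tuples, which is an assumption of this example; the 'auto' choice is deliberately left unimplemented because its binomial calculation is not detailed here.

```python
from collections import Counter


def filter_duplicates(tags, keep_dup="1"):
    """Keep at most keep_dup tags per (chromosome, position, strand)."""
    if keep_dup == "all":
        return list(tags)
    if keep_dup == "auto":
        # The binomial-based limit is not reproduced in this sketch.
        raise NotImplementedError("'auto' is outside the scope of this example")
    limit = int(keep_dup)
    seen = Counter()
    kept = []
    for tag in tags:
        if seen[tag] < limit:
            seen[tag] += 1
            kept.append(tag)
    return kept


tags = [("chr1", 100, "+")] * 3 + [("chr1", 100, "-")]
one_per_location = filter_duplicates(tags, keep_dup="1")
```

Note that the two strands at position 100 count as different locations, so one tag from each is kept.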
- label:
Q-value cutoff
- type:
basic:decimal
- description:
The q-value (minimum FDR) cutoff to call significant regions. Q-values are calculated from p-values using Benjamini-Hochberg procedure.
- required:
False
- disabled:
settings.pvalue && settings.pvalue_prepeak
- label:
P-value cutoff
- type:
basic:decimal
- description:
The p-value cutoff. If specified, MACS2 will use p-value instead of q-value cutoff.
- required:
False
- disabled:
settings.qvalue
- hidden:
tagalign
- label:
P-value cutoff
- type:
basic:decimal
- description:
The p-value cutoff. If specified, MACS2 will use p-value instead of q-value cutoff.
- disabled:
settings.qvalue
- hidden:
!tagalign || settings.qvalue
- default:
1e-05
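For readers unfamiliar with how q-values relate to p-values, the textbook Benjamini-Hochberg adjustment is sketched below. This is a generic implementation for illustration only; it is not MACS2's own code, which may organize the calculation differently.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values (q-values), in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that q-values stay monotone.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, pvalues[index] * m / rank)
        qvalues[index] = running_min
    return qvalues


qvalues = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
```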
- label:
Cap number of peaks by taking top N peaks
- type:
basic:integer
- description:
To keep all peaks set value to 0.
- disabled:
settings.broad
- default:
500000
- label:
MFOLD range (lower limit)
- type:
basic:integer
- description:
Selects the regions used to build the model: those whose high-confidence enrichment ratio against background falls within the MFOLD range, that is, above the lower limit and below the upper limit. DEFAULT:10,30 means using all regions not too low (>10) and not too high (<30) to build the paired-peaks model. If MACS cannot find more than 100 regions to build the model, it will use the --extsize parameter to continue the peak detection ONLY if --fix-bimodal is set.
- required:
False
- label:
MFOLD range (upper limit)
- type:
basic:integer
- description:
Selects the regions used to build the model: those whose high-confidence enrichment ratio against background falls within the MFOLD range, that is, above the lower limit and below the upper limit. DEFAULT:10,30 means using all regions not too low (>10) and not too high (<30) to build the paired-peaks model. If MACS cannot find more than 100 regions to build the model, it will use the --extsize parameter to continue the peak detection ONLY if --fix-bimodal is set.
- required:
False
- label:
Small local region
- type:
basic:integer
- description:
The slocal and llocal parameters control which two levels of regions around the peak regions are checked to calculate the maximum lambda, which is used as the local lambda. By default, MACS considers 1000 bp for the small local region (--slocal) and 10000 bp for the large local region (--llocal), which captures the bias from a long-range effect such as an open chromatin domain. You can tweak these according to your project. Remember that if the region is set too small, a sharp spike in the input data may kill a significant peak.
- required:
False
- label:
Large local region
- type:
basic:integer
- description:
The slocal and llocal parameters control which two levels of regions around the peak regions are checked to calculate the maximum lambda, which is used as the local lambda. By default, MACS considers 1000 bp for the small local region (--slocal) and 10000 bp for the large local region (--llocal), which captures the bias from a long-range effect such as an open chromatin domain. You can tweak these according to your project. Remember that if the region is set too small, a sharp spike in the input data may kill a significant peak.
- required:
False
- label:
extsize
- type:
basic:integer
- description:
When --nomodel is set, MACS uses this parameter to extend reads in the 5'->3' direction to fixed-size fragments. For example, if the size of the binding region for your transcription factor is 200 bp and you want to bypass model building by MACS, this parameter can be set to 200. This option is only valid when --nomodel is set or when MACS fails to build the model and --fix-bimodal is on.
- required:
False
- label:
Shift
- type:
basic:integer
- description:
Note that this is NOT the legacy --shiftsize option, which has been replaced by --extsize. You can set an arbitrary shift in bp here; use discretion when setting it to anything other than the default value (0). When --nomodel is set, MACS will use this value to move cutting ends (5') and then apply --extsize in the 5'->3' direction to extend them to fragments. When this value is negative, ends are moved in the 3'->5' direction; otherwise they are moved in the 5'->3' direction. It is recommended to keep the default of 0 for ChIP-Seq datasets, or to use -1 * half of EXTSIZE together with the --extsize option for detecting enriched cutting loci, such as in certain DNaseI-Seq datasets. Note that you cannot set values other than 0 if the format is BAMPE for paired-end data. Default is 0.
- required:
False
- label:
Band width
- type:
basic:integer
- description:
The band width used to scan the genome ONLY for model building. You can set this parameter to the sonication fragment size expected from the wet-lab experiment. The previous side effect on the peak detection process has been removed, so this parameter affects only model building.
- required:
False
- label:
Use background lambda as local lambda
- type:
basic:boolean
- description:
With this flag on, MACS will use the background lambda as local lambda. This means MACS will not consider the local bias at peak candidate regions.
- default:
False
- label:
Turn on the auto paired-peak model process
- type:
basic:boolean
- description:
Turn on the auto paired-peak model process. If set, when MACS fails to build the paired-peak model, it will fall back to the nomodel settings and use the --extsize parameter to extend each tag. If not set, MACS will terminate when building the paired-peak model fails.
- default:
False
- label:
Bypass building the shifting model
- type:
basic:boolean
- description:
While on, MACS will bypass building the shifting model.
- hidden:
tagalign
- default:
False
- label:
Bypass building the shifting model
- type:
basic:boolean
- description:
While on, MACS will bypass building the shifting model.
- hidden:
!tagalign
- default:
True
- label:
Down-sample
- type:
basic:boolean
- description:
When set to true, a random sampling method is used to scale down the bigger sample. By default, MACS uses linear scaling. This option makes the results unstable and irreproducible because different random reads are selected on each run; in particular, the numbers (pileup, p-value, q-value) will change.
- default:
False
- label:
Save fragment pileup and control lambda
- type:
basic:boolean
- description:
If this flag is on, MACS will store the fragment pileup, control lambda, -log10(p-value) and -log10(q-value) scores in bedGraph files. The bedGraph files are stored in the current directory and named NAME+'_treat_pileup.bdg' for treatment data, NAME+'_control_lambda.bdg' for local lambda values from control, NAME+'_treat_pvalue.bdg' for Poisson p-value scores (in -log10(p-value) form), and NAME+'_treat_qvalue.bdg' for q-value scores from the Benjamini-Hochberg-Yekutieli procedure.
- default:
True
- label:
Save signal per million reads for fragment pileup profiles
- type:
basic:boolean
- disabled:
settings.bedgraph === false
- default:
True
- label:
Call summits
- type:
basic:boolean
- description:
MACS will reanalyze the shape of the signal profile (p- or q-score, depending on the cutoff setting) to deconvolve subpeaks within each peak called by the general procedure. This is highly recommended for detecting adjacent binding events. When used, the output subpeaks of a big peak region will have the same peak boundaries but different scores and peak summit positions.
- default:
False
- label:
Composite broad regions
- type:
basic:boolean
- description:
When this flag is on, MACS will try to composite broad regions in BED12 (a gene-model-like format) by putting nearby highly enriched regions into a broad region with a loose cutoff. The broad region is controlled by another cutoff through --broad-cutoff. The maximum length of a broad region is 4 times d from MACS.
- disabled:
settings.call_summits === true
- default:
False
- label:
Broad cutoff
- type:
basic:decimal
- description:
Cutoff for broad regions. This option is not available unless --broad is set. If -p is set, this is a p-value cutoff; otherwise, it is a q-value cutoff. DEFAULT = 0.1
- required:
False
- disabled:
settings.call_summits === true || settings.broad !== true
- label:
Blacklist regions
- type:
data:bed
- description:
BED file containing genomic regions that should be excluded from the analysis.
- required:
False
- label:
Calculate enrichment
- type:
basic:boolean
- description:
Calculate enrichment of signal in known genomic annotation. By default annotation is provided from the TranscriptDB package specified by genome build which should match one of the supported annotations (hg19, hg38, hg18, mm10, mm9, rn4, ce6, dm3). If annotation is not supported the analysis is skipped.
- default:
False
- label:
Window size
- type:
basic:integer
- description:
An integer indicating the width of the window used for peak profiles. Peaks will be centered on their summits and include half of the window size upstream and half downstream of this point.
- default:
400
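The window placement described above is simple arithmetic, sketched here with a hypothetical helper. How an odd window size is split is not specified in the description, so integer halving is an assumption.

```python
def profile_window_bounds(summit, profile_window=400):
    """Return (start, end) of a window centered on a peak summit."""
    half = profile_window // 2  # assumption: odd sizes are rounded down
    return (summit - half, summit + half)


bounds = profile_window_bounds(1000)
```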
- label:
Shift size
- type:
basic:string
- description:
Vector of values to try when computing optimal shift sizes. It should be specified as a vector of consecutive numbers in the form start:end.
- default:
1:300
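The `start:end` notation can be expanded as sketched below. Treating the end value as inclusive is an assumption, borrowed from the usual meaning of colon ranges in R; the `parse_shift_size` helper is hypothetical.

```python
def parse_shift_size(shift_size="1:300"):
    """Expand a 'start:end' string into a list of consecutive integers."""
    start_text, end_text = shift_size.split(":")
    start, end = int(start_text), int(end_text)
    if start > end:
        raise ValueError("start must not be greater than end")
    return list(range(start, end + 1))  # end is assumed to be inclusive


shifts = parse_shift_size("1:300")
```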
Output results
MACS2 - ROSE2
- data:workflow:chipseq:macs2rose2workflow-macs-rose (data:alignment:bam case, data:alignment:bam control, data:bed promoter, basic:boolean tagalign, basic:integer q_threshold, basic:integer n_sub, basic:boolean tn5, basic:integer shift, basic:string duplicates, basic:string duplicates_prepeak, basic:decimal qvalue, basic:decimal pvalue, basic:decimal pvalue_prepeak, basic:integer cap_num, basic:integer mfold_lower, basic:integer mfold_upper, basic:integer slocal, basic:integer llocal, basic:integer extsize, basic:integer shift, basic:integer band_width, basic:boolean nolambda, basic:boolean fix_bimodal, basic:boolean nomodel, basic:boolean nomodel_prepeak, basic:boolean down_sample, basic:boolean bedgraph, basic:boolean spmr, basic:boolean call_summits, basic:boolean broad, basic:decimal broad_cutoff, basic:boolean use_filtered_bam, basic:integer tss, basic:integer stitch, data:bed mask, data:bed blacklist, basic:boolean calculate_enrichment, basic:integer profile_window, basic:string shift_size)[Source: v1.4.0]
Input arguments
- label:
Case (treatment)
- type:
data:alignment:bam
- label:
Control (background)
- type:
data:alignment:bam
- required:
False
- label:
Promoter regions BED file
- type:
data:bed
- description:
BED file containing promoter regions (TSS ± 1000 bp, for example). Needed to get the number of peaks and reads mapped to promoter regions.
- required:
False
- label:
Use tagAlign files
- type:
basic:boolean
- description:
Use filtered tagAlign files as case (treatment) and control (background) samples. If the extsize parameter is not set, MACS is run using the input's estimated fragment length.
- default:
False
- label:
Quality filtering threshold
- type:
basic:integer
- default:
30
- label:
Number of reads to subsample
- type:
basic:integer
- default:
15000000
- label:
Tn5 shifting
- type:
basic:boolean
- description:
Tn5 transposon shifting. Shift reads on “+” strand by 4 bp and reads on “-” strand by 5 bp.
- default:
False
- label:
User-defined cross-correlation peak strandshift
- type:
basic:integer
- description:
If defined, SPP tool will not try to estimate fragment length but will use the given value as fragment length.
- required:
False
- label:
Number of duplicates
- type:
basic:string
- description:
Controls how MACS handles duplicate tags at exactly the same location, that is, the same coordinates and the same strand. The 'auto' option makes MACS calculate the maximum number of tags at the exact same location based on a binomial distribution using 1e-5 as the p-value cutoff, and the 'all' option keeps all the tags. If an integer is given, at most this number of tags will be kept at the same location. The default is to keep one tag at the same location.
- required:
False
- hidden:
tagalign
- choices:
1:
1
auto:
auto
all:
all
- label:
Number of duplicates
- type:
basic:string
- description:
Controls how MACS handles duplicate tags at exactly the same location, that is, the same coordinates and the same strand. The 'auto' option makes MACS calculate the maximum number of tags at the exact same location based on a binomial distribution using 1e-5 as the p-value cutoff, and the 'all' option keeps all the tags. If an integer is given, at most this number of tags will be kept at the same location. The default is to keep one tag at the same location.
- required:
False
- hidden:
!tagalign
- default:
all
- choices:
1:
1
auto:
auto
all:
all
- label:
Q-value cutoff
- type:
basic:decimal
- description:
The q-value (minimum FDR) cutoff to call significant regions. Q-values are calculated from p-values using Benjamini-Hochberg procedure.
- required:
False
- disabled:
settings.pvalue && settings.pvalue_prepeak
- label:
P-value cutoff
- type:
basic:decimal
- description:
The p-value cutoff. If specified, MACS2 will use p-value instead of q-value cutoff.
- required:
False
- disabled:
settings.qvalue
- hidden:
tagalign
- label:
P-value cutoff
- type:
basic:decimal
- description:
The p-value cutoff. If specified, MACS2 will use p-value instead of q-value cutoff.
- disabled:
settings.qvalue
- hidden:
!tagalign || settings.qvalue
- default:
1e-05
- label:
Cap number of peaks by taking top N peaks
- type:
basic:integer
- description:
To keep all peaks set value to 0.
- disabled:
settings.broad
- default:
500000
- label:
MFOLD range (lower limit)
- type:
basic:integer
- description:
Selects the regions used to build the model: those whose high-confidence enrichment ratio against background falls within the MFOLD range, that is, above the lower limit and below the upper limit. DEFAULT:10,30 means using all regions not too low (>10) and not too high (<30) to build the paired-peaks model. If MACS cannot find more than 100 regions to build the model, it will use the --extsize parameter to continue the peak detection ONLY if --fix-bimodal is set.
- required:
False
- label:
MFOLD range (upper limit)
- type:
basic:integer
- description:
Selects the regions used to build the model: those whose high-confidence enrichment ratio against background falls within the MFOLD range, that is, above the lower limit and below the upper limit. DEFAULT:10,30 means using all regions not too low (>10) and not too high (<30) to build the paired-peaks model. If MACS cannot find more than 100 regions to build the model, it will use the --extsize parameter to continue the peak detection ONLY if --fix-bimodal is set.
- required:
False
- label:
Small local region
- type:
basic:integer
- description:
The slocal and llocal parameters control which two levels of regions around the peak regions are checked to calculate the maximum lambda, which is used as the local lambda. By default, MACS considers 1000 bp for the small local region (--slocal) and 10000 bp for the large local region (--llocal), which captures the bias from a long-range effect such as an open chromatin domain. You can tweak these according to your project. Remember that if the region is set too small, a sharp spike in the input data may kill a significant peak.
- required:
False
- label:
Large local region
- type:
basic:integer
- description:
The slocal and llocal parameters control which two levels of regions around the peak regions are checked to calculate the maximum lambda, which is used as the local lambda. By default, MACS considers 1000 bp for the small local region (--slocal) and 10000 bp for the large local region (--llocal), which captures the bias from a long-range effect such as an open chromatin domain. You can tweak these according to your project. Remember that if the region is set too small, a sharp spike in the input data may kill a significant peak.
- required:
False
- label:
extsize
- type:
basic:integer
- description:
When --nomodel is set, MACS uses this parameter to extend reads in the 5'->3' direction to fixed-size fragments. For example, if the size of the binding region for your transcription factor is 200 bp and you want to bypass model building by MACS, this parameter can be set to 200. This option is only valid when --nomodel is set or when MACS fails to build the model and --fix-bimodal is on.
- required:
False
- label:
Shift
- type:
basic:integer
- description:
Note that this is NOT the legacy --shiftsize option, which has been replaced by --extsize. You can set an arbitrary shift in bp here; use discretion when setting it to anything other than the default value (0). When --nomodel is set, MACS will use this value to move cutting ends (5') and then apply --extsize in the 5'->3' direction to extend them to fragments. When this value is negative, ends are moved in the 3'->5' direction; otherwise they are moved in the 5'->3' direction. It is recommended to keep the default of 0 for ChIP-Seq datasets, or to use -1 * half of EXTSIZE together with the --extsize option for detecting enriched cutting loci, such as in certain DNaseI-Seq datasets. Note that you cannot set values other than 0 if the format is BAMPE for paired-end data. Default is 0.
- required:
False
- label:
Band width
- type:
basic:integer
- description:
The band width used to scan the genome ONLY for model building. You can set this parameter to the sonication fragment size expected from the wet-lab experiment. The previous side effect on the peak detection process has been removed, so this parameter affects only model building.
- required:
False
- label:
Use background lambda as local lambda
- type:
basic:boolean
- description:
With this flag on, MACS will use the background lambda as local lambda. This means MACS will not consider the local bias at peak candidate regions.
- default:
False
- label:
Turn on the auto paired-peak model process
- type:
basic:boolean
- description:
Turn on the auto paired-peak model process. If set, when MACS fails to build the paired-peak model, it will fall back to the nomodel settings and use the --extsize parameter to extend each tag. If not set, MACS will terminate when building the paired-peak model fails.
- default:
False
- label:
Bypass building the shifting model
- type:
basic:boolean
- description:
While on, MACS will bypass building the shifting model.
- hidden:
tagalign
- default:
False
- label:
Bypass building the shifting model
- type:
basic:boolean
- description:
While on, MACS will bypass building the shifting model.
- hidden:
!tagalign
- default:
True
- label:
Down-sample
- type:
basic:boolean
- description:
When set to true, a random sampling method is used to scale down the bigger sample. By default, MACS uses linear scaling. This option makes the results unstable and irreproducible because different random reads are selected on each run; in particular, the numbers (pileup, p-value, q-value) will change.
- default:
False
- label:
Save fragment pileup and control lambda
- type:
basic:boolean
- description:
If this flag is on, MACS will store the fragment pileup, control lambda, -log10(p-value) and -log10(q-value) scores in bedGraph files. The bedGraph files are stored in the current directory and named NAME+'_treat_pileup.bdg' for treatment data, NAME+'_control_lambda.bdg' for local lambda values from control, NAME+'_treat_pvalue.bdg' for Poisson p-value scores (in -log10(p-value) form), and NAME+'_treat_qvalue.bdg' for q-value scores from the Benjamini-Hochberg-Yekutieli procedure.
- default:
True
- label:
Save signal per million reads for fragment pileup profiles
- type:
basic:boolean
- disabled:
settings.bedgraph === false
- default:
True
- label:
Call summits
- type:
basic:boolean
- description:
MACS will reanalyze the shape of the signal profile (p- or q-score, depending on the cutoff setting) to deconvolve subpeaks within each peak called by the general procedure. This is highly recommended for detecting adjacent binding events. When used, the output subpeaks of a big peak region will have the same peak boundaries but different scores and peak summit positions.
- default:
False
- label:
Composite broad regions
- type:
basic:boolean
- description:
When this flag is on, MACS will try to composite broad regions in BED12 (a gene-model-like format) by putting nearby highly enriched regions into a broad region with a loose cutoff. The broad region is controlled by another cutoff through --broad-cutoff. The maximum length of a broad region is 4 times d from MACS.
- disabled:
settings.call_summits === true
- default:
False
- label:
Broad cutoff
- type:
basic:decimal
- description:
Cutoff for broad region. This option is not available unless --broad is set. If -p is set, this is a p-value cutoff; otherwise, it is a q-value cutoff. DEFAULT = 0.1
- required:
False
- disabled:
settings.call_summits === true || settings.broad !== true
- label:
Use Filtered BAM File
- type:
basic:boolean
- description:
Use filtered BAM file from a MACS2 object to rank enhancers by.
- default:
False
- label:
TSS exclusion
- type:
basic:integer
- description:
Enter a distance from the TSS to exclude. A value of 0 means no TSS exclusion.
- default:
0
- label:
Stitch
- type:
basic:integer
- description:
Enter a max linking distance for stitching. If not given, optimal stitching parameter will be determined automatically.
- required:
False
- label:
Masking BED file
- type:
data:bed
- description:
Mask a set of regions from analysis. Provide a BED of masking regions.
- required:
False
- label:
Blacklist regions
- type:
data:bed
- description:
BED file containing genomic regions that should be excluded from the analysis.
- required:
False
- label:
Calculate enrichment
- type:
basic:boolean
- description:
Calculate enrichment of signal in known genomic annotation. By default annotation is provided from the TranscriptDB package specified by genome build which should match one of the supported annotations (hg19, hg38, hg18, mm10, mm9, rn4, ce6, dm3). If annotation is not supported the analysis is skipped.
- default:
False
- label:
Window size
- type:
basic:integer
- description:
An integer indicating the width of the window used for peak profiles. Peaks will be centered on their summits and include half of the window size upstream and half downstream of this point.
- default:
400
- label:
Shift size
- type:
basic:string
- description:
Vector of values to try when computing optimal shift sizes. Specify it as a range of consecutive integers in the form start:end.
- default:
1:300
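As a minimal sketch of how the window size and shift size values can be interpreted, the following Python snippet expands a start:end specification into consecutive integers and computes a summit-centered window. The function names are hypothetical and not part of the process. Treating the end value as inclusive is an assumption.

```python
def parse_shift_range(spec):
    """Expand a 'start:end' string into consecutive integers.

    Assumption: the end value is inclusive.
    """
    start_text, end_text = spec.split(":")
    start, end = int(start_text), int(end_text)
    if start > end:
        raise ValueError("start must not exceed end")
    return list(range(start, end + 1))


def summit_window(summit, window_size=400):
    """Return (start, end) spanning half the window on each side of the summit."""
    half = window_size // 2
    return summit - half, summit + half


shifts = parse_shift_range("1:300")   # default shift size specification
window = summit_window(10_000)        # default window size of 400
```

With the defaults, a peak is profiled over 200 positions upstream and 200 positions downstream of its summit.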
Output results
ML-ready expression
- data:ml:table:expressions:upload-ml-expression (basic:file exp, basic:string source, basic:string species, data:ml:space reference_space)[Source: v1.0.2]
Upload ML-ready expression matrix.
Input arguments
- label:
Transformed expressions
- type:
basic:file
- description:
A TAB separated file containing transformed expression values with sample IDs for index (first column with label sample_id) and ENSEMBL IDs (recommended but not required) for the column names.
- required:
True
- disabled:
False
- hidden:
False
- label:
Feature source
- type:
basic:string
- required:
True
- disabled:
False
- hidden:
False
- choices:
AFFY:
AFFY
DICTYBASE:
DICTYBASE
ENSEMBL:
ENSEMBL
NCBI:
NCBI
UCSC:
UCSC
- label:
Species
- type:
basic:string
- description:
Species Latin name.
- required:
True
- disabled:
False
- hidden:
False
- choices:
Homo sapiens:
Homo sapiens
Mus musculus:
Mus musculus
Rattus norvegicus:
Rattus norvegicus
Dictyostelium discoideum:
Dictyostelium discoideum
- label:
Reference space of ML-ready data
- type:
data:ml:space
- required:
True
- disabled:
False
- hidden:
False
Output results
- label:
Transformed expressions
- type:
basic:file
- required:
True
- disabled:
False
- hidden:
False
- label:
Feature source
- type:
basic:string
- required:
True
- disabled:
False
- hidden:
False
- label:
Species
- type:
basic:string
- required:
True
- disabled:
False
- hidden:
False
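The expected layout of the transformed expressions file can be illustrated with a short Python sketch. It reads a TAB-separated matrix whose first column is labelled sample_id and whose remaining columns are feature IDs. The reader function and the example values are hypothetical and serve only to show the file shape; this is not the process's own loader.

```python
import csv
import io


def read_expression_table(handle):
    """Read a tab-separated matrix with 'sample_id' as the first column label."""
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    if header[0] != "sample_id":
        raise ValueError("first column must be labelled 'sample_id'")
    features = header[1:]
    table = {}
    for row in reader:
        # Each row: sample ID followed by one value per feature column.
        table[row[0]] = {f: float(v) for f, v in zip(features, row[1:])}
    return features, table


# Illustrative content only: two samples, two ENSEMBL gene IDs.
example = (
    "sample_id\tENSG00000141510\tENSG00000012048\n"
    "S1\t0.5\t1.25\n"
    "S2\t-0.75\t2.0\n"
)
features, table = read_expression_table(io.StringIO(example))
```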
Map microarray probes
- data:microarray:mapping:map-microarray-probes (list:data:microarray:normalized expressions, basic:file mapping_file, basic:string source, basic:string build)[Source: v1.1.1]
Map microarray probes to Gene IDs. Mapping can be done automatically or using a custom mapping file. For automatic probe mapping all ‘Normalized expression’ objects should have a GEO platform ID. If the platform is supported the provided probe IDs will be mapped to the corresponding Ensembl IDs. Currently supported platforms are: GPL74, GPL201, GPL96, GPL571, GPL97, GPL570, GPL91, GPL8300, GPL92, GPL93, GPL94, GPL95, GPL17586, GPL5175, GPL80, GPL6244, GPL16686, GPL15207, GPL1352, GPL11068, GPL26966, GPL6848, GPL14550, GPL17077, GPL16981, GPL13497, GPL6947, GPL10558, GPL6883, GPL13376, GPL6884, GPL6254.
Input arguments
- label:
Normalized expressions
- type:
list:data:microarray:normalized
- required:
True
- disabled:
False
- hidden:
False
- label:
File with probe ID mappings
- type:
basic:file
- description:
The file should be tab-separated and contain two columns with their column names. The first column should contain Gene IDs and the second one should contain probe names. Supported file extensions are .tab.*, .tsv.*, .txt.*
- required:
False
- disabled:
False
- hidden:
False
- label:
Gene ID source
- type:
basic:string
- description:
Gene ID source used for probe mapping is required when using a custom file.
- required:
False
- disabled:
False
- hidden:
False
- choices:
AFFY:
AFFY
DICTYBASE:
DICTYBASE
ENSEMBL:
ENSEMBL
NCBI:
NCBI
UCSC:
UCSC
- label:
Genome build
- type:
basic:string
- description:
Genome build of mapping file is required when using a custom file.
- required:
False
- disabled:
False
- hidden:
False
Output results
- label:
Mapped expressions
- type:
basic:file
- required:
True
- disabled:
False
- hidden:
False
- label:
Probe to transcript mapping used
- type:
basic:string
- required:
True
- disabled:
False
- hidden:
False
- label:
Mapping file
- type:
basic:file
- required:
True
- disabled:
False
- hidden:
False
- label:
Microarray platform type
- type:
basic:string
- required:
True
- disabled:
False
- hidden:
False
- label:
GEO platform ID
- type:
basic:string
- required:
False
- disabled:
False
- hidden:
False
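The custom mapping file format described above (two tab-separated columns with a header row, Gene IDs first and probe names second) can be sketched in Python as follows. The function names, probe names and values are hypothetical. How the process handles several probes that map to one gene is not stated in this reference, so the sketch simply returns one pair per mapped probe.

```python
import csv
import io


def load_probe_mapping(handle):
    """Read a two-column, tab-separated mapping: gene ID first, probe name second."""
    reader = csv.reader(handle, delimiter="\t")
    next(reader)  # skip the header row with column names
    mapping = {}
    for row in reader:
        if len(row) != 2:
            raise ValueError(f"expected 2 columns, got {len(row)}")
        gene_id, probe = row
        mapping[probe] = gene_id
    return mapping


def map_probes(expressions, mapping):
    """Return (gene ID, value) pairs for probes that are present in the mapping."""
    return [(mapping[p], v) for p, v in expressions.items() if p in mapping]


# Illustrative content only; probe names are placeholders.
mapping_text = (
    "gene_id\tprobe\n"
    "ENSG00000141510\tprobe_A\n"
    "ENSG00000012048\tprobe_B\n"
)
mapping = load_probe_mapping(io.StringIO(mapping_text))
mapped = map_probes({"probe_A": 7.1, "probe_C": 3.3}, mapping)
```

Probes without an entry in the mapping (probe_C here) are dropped in this sketch.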
Mappability
- data:mappability:bcmmappability-bcm (data:index:bowtie genome, data:annotation:gff3 gff, basic:integer length)[Source: v3.1.2]
Compute genome mappability. Developed by Bioinformatics Laboratory, Faculty of Computer and Information Science, University of Ljubljana, Slovenia and Shaulsky’s Lab, Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA.
Input arguments
- label:
Reference genome
- type:
data:index:bowtie
- label:
General feature format
- type:
data:annotation:gff3
- label:
Read length
- type:
basic:integer
- default:
50
Output results
- label:
Mappability
- type:
basic:file
Mappability info
- data:mappability:bcmupload-mappability (basic:file src)[Source: v1.2.3]
Upload mappability information.
Input arguments
- label:
Mappability file
- type:
basic:file
- description:
Mappability file with two tab-separated columns.
- validate_regex:
\.(tab)(|\.gz|\.bz2|\.tgz|\.tar\.gz|\.tar\.bz2|\.zip|\.rar|\.7z)$
Output results
- label:
Uploaded mappability
- type:
basic:file
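To see which file names the validate_regex above accepts, the pattern can be tried in Python. This is only a sketch: the pattern is copied from the field definition, while the use of re.search and the example file names are assumptions about how such a check might be applied.

```python
import re

# Pattern copied from the validate_regex field of the mappability upload.
MAPPABILITY_PATTERN = re.compile(
    r"\.(tab)(|\.gz|\.bz2|\.tgz|\.tar\.gz|\.tar\.bz2|\.zip|\.rar|\.7z)$"
)


def is_valid_mappability_name(filename):
    """Return True if the file name ends in .tab, optionally followed by a listed archive suffix."""
    return MAPPABILITY_PATTERN.search(filename) is not None


results = {
    name: is_valid_mappability_name(name)
    for name in ("mappability.tab", "mappability.tab.gz", "mappability.txt", "mappability.tab.xz")
}
```

Plain .tab files and .tab files with one of the listed compression suffixes pass; other extensions do not.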
MarkDuplicates
- data:alignment:bam:markduplicate:markduplicates (data:alignment:bam bam, basic:boolean skip, basic:boolean remove_duplicates, basic:string validation_stringency, basic:string assume_sort_order, basic:integer java_gc_threads, basic:integer max_heap_size)[Source: v1.7.0]
Remove duplicate reads from BAM file. Tool from Picard, wrapped by GATK4. See GATK MarkDuplicates for more information.
Input arguments
- label:
Alignment BAM file
- type:
data:alignment:bam
- required:
True
- disabled:
False
- hidden:
False
- label:
Skip MarkDuplicates step
- type:
basic:boolean
- description:
MarkDuplicates step can be skipped.
- required:
True
- disabled:
False
- hidden:
False
- default:
False
- label:
Remove duplicates
- type:
basic:boolean
- description:
If true, duplicates are not written to the output file; otherwise, they are written with the appropriate flags set.
- required:
True
- disabled:
False
- hidden:
False
- default:
False
- label:
Validation stringency
- type:
basic:string
- description:
Validation stringency for all SAM files read by this program. Setting stringency to SILENT can improve performance when processing a BAM file in which variable-length data (read, qualities, tags) do not otherwise need to be decoded. Default is STRICT.
- required:
True
- disabled:
False
- hidden:
False
- default:
STRICT
- choices:
STRICT:
STRICT
LENIENT:
LENIENT
SILENT:
SILENT
- label:
Assume sort order
- type:
basic:string
- description:
If not null (default), assume that the input file has this order even if the header says otherwise. Possible values are unsorted, queryname, coordinate, duplicate and unknown.
- required:
True
- disabled:
False
- hidden:
False
- default:
- choices:
as in BAM header (default):
unsorted:
unsorted
queryname:
queryname
coordinate:
coordinate
duplicate:
duplicate
unknown:
unknown
- label:
Java ParallelGCThreads
- type:
basic:integer
- description:
Sets the number of threads used during parallel phases of the garbage collectors.
- required:
True
- disabled:
False
- hidden:
False
- default:
2
- label:
Java maximum heap size (Xmx)
- type:
basic:integer
- description:
Set the maximum Java heap size (in GB).
- required:
True
- disabled:
False
- hidden:
False
- default:
12
Output results
- label:
Marked duplicates BAM file
- type:
basic:file
- required:
True
- disabled:
False
- hidden:
False
- label:
Index of marked duplicates BAM file
- type:
basic:file
- required:
True
- disabled:
False
- hidden:
False
- label:
Alignment statistics
- type:
basic:file
- required:
True
- disabled:
False
- hidden:
False
- label:
Species
- type:
basic:string
- required:
True
- disabled:
False
- hidden:
False
- label:
Build
- type:
basic:string
- required:
True
- disabled:
False
- hidden:
False
- label:
Metrics from MarkDuplicate process
- type:
basic:file
- required:
True
- disabled:
False
- hidden:
False
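The inputs above correspond to options of the GATK4 MarkDuplicates tool and to JVM settings. The Python sketch below shows one way they could be assembled into a command line. The helper name and file paths are hypothetical, and the exact invocation used by this process is an assumption; only the gatk --java-options wrapper argument, the -Xmx and -XX:ParallelGCThreads JVM options, and the named MarkDuplicates arguments are taken from the respective tools.

```python
def build_markduplicates_command(
    bam,
    out_bam,
    metrics,
    remove_duplicates=False,
    validation_stringency="STRICT",
    assume_sort_order="",
    java_gc_threads=2,
    max_heap_size=12,
):
    """Assemble a GATK4 MarkDuplicates command as an argument list (not executed here)."""
    # JVM settings: garbage-collector threads and maximum heap size in GB.
    java_options = f"-XX:ParallelGCThreads={java_gc_threads} -Xmx{max_heap_size}g"
    command = [
        "gatk", "--java-options", java_options,
        "MarkDuplicates",
        "--INPUT", bam,
        "--OUTPUT", out_bam,
        "--METRICS_FILE", metrics,
        "--REMOVE_DUPLICATES", str(remove_duplicates).lower(),
        "--VALIDATION_STRINGENCY", validation_stringency,
    ]
    # An empty value means "as in BAM header", so the option is left out.
    if assume_sort_order:
        command += ["--ASSUME_SORT_ORDER", assume_sort_order]
    return command


command = build_markduplicates_command("in.bam", "marked.bam", "metrics.txt")
```

The resulting list can be passed to subprocess.run when GATK4 is installed.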
Merge Expressions (ETC)
- data:expressionset:etcmergeetc (list:data:etc exps, list:basic:string genes)[Source: v1.2.4]
Merge Expression Time Course (ETC) data.
Input arguments
- label:
Expression Time Course (ETC)
- type:
list:data:etc
- label:
Filter genes
- type:
list:basic:string
- required:
False
Output results
- label:
Expression set
- type:
basic:file
- label:
Expression set type
- type:
basic:string
Merge FASTQ (paired-end)
- data:mergereads:paired:merge-fastq-paired (list:data:reads:fastq:paired: reads)[Source: v2.2.2]
Merge paired-end FASTQs into one sample. Samples are merged based on the defined replicate group relations and then uploaded as separate samples.
Input arguments
- label:
Select relations
- type:
list:data:reads:fastq:paired:
- description:
Define and select replicate relations.
- required:
True
- disabled:
False
- hidden:
False
Output results
Merge FASTQ (single-end)
- data:mergereads:single:merge-fastq-single (list:data:reads:fastq:single: reads)[Source: v2.2.2]
Merge single-end FASTQs into one sample. Samples are merged based on the defined replicate group relations and then uploaded as separate samples.
Input arguments
- label:
Select relations
- type:
list:data:reads:fastq:single:
- description:
Define and select replicate relations.
- required:
True
- disabled:
False
- hidden:
False
Output results
Metadata table
- data:metadata:upload-metadata (basic:file src)[Source: v1.1.1]
Upload metadata file where more than one row can match to a single sample. The uploaded metadata table represents one-to-many (1:n) relation to samples in the working collection. Metadata table must contain a column with one of the following headers: “Sample ID”, “Sample name” or “Sample slug”.
Input arguments
- label:
Table with metadata
- type:
basic:file
- description:
The metadata table should use one of the following extensions: .csv, .tab, .tsv, .xlsx, .xls
- required:
True
- disabled:
False
- hidden:
False
Output results
- label:
Uploaded table
- type:
basic:file
- required:
True
- disabled:
False
- hidden:
False
- label:
Number of samples
- type:
basic:integer
- required:
True
- disabled:
False
- hidden:
False
Metadata table (one-to-one)
- data:metadata:unique:upload-metadata-unique (basic:file src)[Source: v1.1.1]
Upload metadata file where each row corresponds to a single sample. The uploaded metadata table represents one-to-one (1:1) relation to samples in the working collection. Metadata table must contain a column with one of the following headers: “Sample ID”, “Sample name” or “Sample slug”.
Input arguments
- label:
Table with metadata
- type:
basic:file
- description:
The metadata table should use one of the following extensions: .csv, .tab, .tsv, .xlsx, .xls
- required:
True
- disabled:
False
- hidden:
False
Output results
- label:
Uploaded table
- type:
basic:file
- required:
True
- disabled:
False
- hidden:
False
- label:
Number of samples
- type:
basic:integer
- required:
True
- disabled:
False
- hidden:
False
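The difference between the one-to-many and one-to-one metadata tables can be checked before upload. The following Python sketch locates the required sample column and counts rows per sample. The function names and example table are hypothetical, and the order in which the three accepted headers are searched is an assumption.

```python
import csv
import io
from collections import Counter

# Headers accepted by both metadata upload processes.
SAMPLE_HEADERS = ("Sample ID", "Sample name", "Sample slug")


def find_sample_column(header):
    """Return the first accepted sample header found in the table header."""
    for name in SAMPLE_HEADERS:
        if name in header:
            return name
    raise ValueError("table needs one of: " + ", ".join(SAMPLE_HEADERS))


def rows_per_sample(handle, delimiter="\t"):
    """Count how many rows refer to each sample."""
    reader = csv.DictReader(handle, delimiter=delimiter)
    column = find_sample_column(reader.fieldnames or [])
    return Counter(row[column] for row in reader)


def is_one_to_one(counts):
    """True when every sample appears in exactly one row."""
    return all(n == 1 for n in counts.values())


# Illustrative table: sample S1 has two rows, so the relation is one-to-many.
table_text = "Sample name\tTimepoint\nS1\t0h\nS1\t24h\nS2\t0h\n"
counts = rows_per_sample(io.StringIO(table_text))
unique = is_one_to_one(counts)
```

A table like this one would fit the one-to-many upload but not the one-to-one upload.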
MultiQC
- data:multiqc:multiqc (list:data: data, basic:boolean dirs, basic:integer dirs_depth, basic:boolean fullnames, basic:boolean config, basic:string cl_config)[Source: v1.22.0]
Aggregate results from bioinformatics analyses across many samples into a single report. [MultiQC](http://www.multiqc.info) searches a given directory for analysis logs and compiles an HTML report. It’s a general purpose tool, perfect for summarising the output from numerous bioinformatics tools.
Input arguments
- label:
Input data
- type:
list:data:
- required:
True
- disabled:
False
- hidden:
False
- label:
--dirs
- type:
basic:boolean
- description:
Prepend directory to sample names.
- required:
True
- disabled:
False
- hidden:
False
- default:
True
- label:
--dirs-depth
- type:
basic:integer
- description:
Prepend a specified number of directories to sample names. Enter a negative number (default) to take from start of path.
- required:
True
- disabled:
False
- hidden:
False
- default:
-1
- label:
--fullnames
- type:
basic:boolean
- description:
Disable the sample name cleaning (leave as full file name).
- required:
True
- disabled:
False
- hidden:
False
- default:
False
- label:
Use configuration file
- type:
basic:boolean
- description:
Use Genialis configuration file for MultiQC report.
- required:
True
- disabled:
False
- hidden:
False
- default:
True
- label:
--cl-config
- type:
basic:string
- description:
Enter text with command-line configuration options to override the defaults (e.g. custom_logo_url: https://www.genialis.com).
- required:
False
- disabled:
False
- hidden:
False
Output results
- label:
MultiQC report
- type:
basic:file:html
- required:
True
- disabled:
False
- hidden:
False
- label:
Report data
- type:
basic:dir
- required:
True
- disabled:
False
- hidden:
False
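The effect of the directory options on sample names can be illustrated with a small Python sketch. It follows the descriptions of the two directory inputs: a positive depth keeps that many directories from the end of the path, and a negative depth keeps that many from the start. The function name is hypothetical, the " | " separator is an assumption, and this is not MultiQC's own code.

```python
def prepend_dirs(path, sample_name, depth=-1, sep=" | "):
    """Prepend directory names from `path` to `sample_name`.

    depth > 0 keeps the last `depth` directories, depth < 0 keeps the
    first `abs(depth)` directories, and depth == 0 keeps all of them.
    """
    dirs = [d for d in path.split("/") if d not in ("", ".")]
    if depth > 0:
        dirs = dirs[-depth:]
    elif depth < 0:
        dirs = dirs[:-depth]
    return sep.join(dirs + [sample_name])


from_start = prepend_dirs("run1/star/logs", "S1", depth=-1)
from_end = prepend_dirs("run1/star/logs", "S1", depth=2)
```

With the default depth of -1, only the first directory of the path is prepended.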
OBO file
- data:ontology:oboupload-obo (basic:file src)[Source: v1.4.0]
Upload gene ontology in OBO format.
Input arguments
- label:
Gene ontology (OBO)
- type:
basic:file
- description:
Gene ontology in OBO format.
- required:
True
- validate_regex:
\.obo(|\.gz|\.bz2|\.tgz|\.tar\.gz|\.tar\.bz2|\.zip|\.rar|\.7z)$
Output results
- label:
Ontology file
- type:
basic:file
- label:
OBO object
- type:
basic:file
PCA
- data:pcapca (list:data:expression exps, list:basic:string genes, basic:string source, basic:string species)[Source: v2.4.2]
Principal component analysis (PCA)
Input arguments
- label:
Expressions
- type:
list:data:expression
- label:
Gene subset
- type:
list:basic:string
- required:
False
- label:
Gene ID database of selected genes
- type:
basic:string
- description:
This field is required if gene subset is set.
- required:
False
- label:
Species
- type:
basic:string
- description:
Species Latin name. This field is required if gene subset is set.
- required:
False
- choices:
Homo sapiens:
Homo sapiens
Mus musculus:
Mus musculus
Rattus norvegicus:
Rattus norvegicus
Dictyostelium discoideum:
Dictyostelium discoideum
Odocoileus virginianus texanus:
Odocoileus virginianus texanus
Solanum tuberosum:
Solanum tuberosum
Output results
- label:
PCA
- type:
basic:json
Picard AlignmentSummary
- data:picard:summary:alignment-summary (data:alignment:bam bam, data:seq:nucleotide genome, data:seq:nucleotide adapters, basic:string validation_stringency, basic:integer insert_size, basic:string pair_orientation, basic:boolean bisulfite, basic:boolean assume_sorted)[Source: v2.3.0]
Produce a summary of alignment metrics from BAM file. Tool from Picard, wrapped by GATK4. See GATK CollectAlignmentSummaryMetrics for more information.
Input arguments
- label:
Alignment BAM file
- type:
data:alignment:bam
- required:
True
- disabled:
False
- hidden:
False
- label:
Genome
- type:
data:seq:nucleotide
- required:
True
- disabled:
False
- hidden:
False
- label:
Adapter sequences
- type:
data:seq:nucleotide
- required:
False
- disabled:
False
- hidden:
False
- label:
Validation stringency
- type:
basic:string
- description:
Validation stringency for all SAM files read by this program. Setting stringency to SILENT can improve performance when processing a BAM file in which variable-length data (read, qualities, tags) do not otherwise need to be decoded. Default is STRICT.
- required:
True
- disabled:
False
- hidden:
False
- default:
STRICT
- choices:
STRICT:
STRICT
LENIENT:
LENIENT
SILENT:
SILENT
- label:
Maximum insert size
- type:
basic:integer
- required:
True
- disabled:
False
- hidden:
False
- default:
100000
- label:
Pair orientation
- type:
basic:string
- required:
True
- disabled:
False
- hidden:
False
- default:
null
- choices:
Unspecified:
null
FR:
FR
RF:
RF
TANDEM:
TANDEM
- label:
BAM file consists of bisulfite sequenced reads
- type:
basic:boolean
- required:
True
- disabled:
False
- hidden:
False
- default:
False
- label:
Sorted BAM file
- type:
basic:boolean
- description:
If true, the sort order in the header file will be ignored.
- required:
True
- disabled:
False
- hidden:
False
- default:
False
Output results
- label:
Alignment metrics
- type:
basic:file
- required:
True
- disabled:
False
- hidden:
False
- label:
Species
- type:
basic:string
- required:
True
- disabled:
False
- hidden:
False
- label:
Build
- type:
basic:string
- required:
True
- disabled:
False
- hidden:
False
Picard CollectRrbsMetrics
- data:picard:rrbs:rrbs-metrics (data:alignment:bam bam, data:seq:nucleotide genome, basic:integer min_quality, basic:integer next_base_quality, basic:integer min_lenght, basic:decimal mismatch_rate, basic:string validation_stringency, basic:boolean assume_sorted)[Source: v2.3.0]
Produce metrics for RRBS data based on the methylation status. This tool uses reduced representation bisulfite sequencing (RRBS) data to determine cytosine methylation status across all reads of a genomic DNA sequence. Tool is wrapped by GATK4. See GATK CollectRrbsMetrics for more information.
Input arguments
- label:
Alignment BAM file
- type:
data:alignment:bam
- required:
True
- disabled:
False
- hidden:
False
- label:
Genome
- type:
data:seq:nucleotide
- required:
True
- disabled:
False
- hidden:
False
- label:
Threshold for base quality of a C base before it is considered
- type:
basic:integer
- required:
True
- disabled:
False
- hidden:
False
- default:
20
- label:
Threshold for quality of a base next to a C before the C base is considered
- type:
basic:integer
- required:
True
- disabled:
False
- hidden:
False
- default:
10
- label:
Minimum read length
- type:
basic:integer
- required:
True
- disabled:
False
- hidden:
False
- default:
5
- label:
Maximum fraction of mismatches in a read to be considered (range: 0 to 1)
- type:
basic:decimal
- required:
True
- disabled:
False
- hidden:
False
- default:
0.1
- label:
Validation stringency
- type:
basic:string
- description:
Validation stringency for all SAM files read by this program. Setting stringency to SILENT can improve performance when processing a BAM file in which variable-length data (read, qualities, tags) do not otherwise need to be decoded. Default is STRICT.
- required:
True
- disabled:
False
- hidden:
False
- default:
STRICT
- choices:
STRICT:
STRICT
LENIENT:
LENIENT
SILENT:
SILENT
- label:
Sorted BAM file
- type:
basic:boolean
- description:
If true, the sort order in the header file will be ignored.
- required:
True
- disabled:
False
- hidden:
False
- default:
False
Output results
- label:
RRBS summary metrics
- type:
basic:file
- required:
True
- disabled:
False
- hidden:
False
- label:
Detailed RRBS report
- type:
basic:file
- required:
True
- disabled:
False
- hidden:
False
- label:
QC plots
- type:
basic:file
- required:
True
- disabled:
False
- hidden:
False
- label:
Species
- type:
basic:string
- required:
True
- disabled:
False
- hidden:
False
- label:
Build
- type:
basic:string
- required:
True
- disabled:
False
- hidden:
False
Picard InsertSizeMetrics
- data:picard:insert:insert-size (data:alignment:bam bam, data:seq:nucleotide genome, basic:decimal minimum_fraction, basic:boolean include_duplicates, basic:decimal deviations, basic:string validation_stringency, basic:boolean assume_sorted)[Source: v2.3.0]
Collect metrics about the insert size of a paired-end library. Tool from Picard, wrapped by GATK4. See GATK CollectInsertSizeMetrics for more information.
Input arguments
- label:
Alignment BAM file
- type:
data:alignment:bam
- required:
True
- disabled:
False
- hidden:
False
- label:
Genome
- type:
data:seq:nucleotide
- required:
True
- disabled:
False
- hidden:
False
- label:
Minimum fraction of reads in a category to be considered
- type:
basic:decimal
- description:
When generating the histogram, discard any data categories (out of FR, TANDEM, RF) that have fewer than this fraction of overall reads (range: 0 to 0.5).
- required:
True
- disabled:
False
- hidden:
False
- default:
0.05
- label:
Include reads marked as duplicates in the insert size histogram
- type:
basic:boolean
- required:
True
- disabled:
False
- hidden:
False
- default:
False
- label:
Deviations limit
- type:
basic:decimal
- description:
Generate mean, standard deviation and plots by trimming the data down to MEDIAN + DEVIATIONS*MEDIAN_ABSOLUTE_DEVIATION. This is done because insert size data typically includes enough anomalous values from chimeras and other artifacts to make the mean and standard deviation grossly misleading regarding the real distribution.
- required:
True
- disabled:
False
- hidden:
False
- default:
10.0
- label:
Validation stringency
- type:
basic:string
- description:
Validation stringency for all SAM files read by this program. Setting stringency to SILENT can improve performance when processing a BAM file in which variable-length data (read, qualities, tags) do not otherwise need to be decoded. Default is STRICT.
- required:
True
- disabled:
False
- hidden:
False
- default:
STRICT
- choices:
STRICT:
STRICT
LENIENT:
LENIENT
SILENT:
SILENT
- label:
Sorted BAM file
- type:
basic:boolean
- description:
If True, the sort order in the header file will be ignored.
- required:
True
- disabled:
False
- hidden:
False
- default:
False
Output results
- label:
Insert size metrics
- type:
basic:file
- required:
True
- disabled:
False
- hidden:
False
- label:
Insert size histogram
- type:
basic:file
- required:
True
- disabled:
False
- hidden:
False
- label:
Species
- type:
basic:string
- required:
True
- disabled:
False
- hidden:
False
- label:
Build
- type:
basic:string
- required:
True
- disabled:
False
- hidden:
False
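The trimming rule behind the deviations limit, MEDIAN + DEVIATIONS*MEDIAN_ABSOLUTE_DEVIATION, can be worked through on a toy data set. The Python sketch below shows only the arithmetic on a plain list of insert sizes; the function name and values are hypothetical, and the tool's own implementation, which works on its internal histogram, may differ in detail.

```python
from statistics import mean, median, stdev


def trimmed_insert_stats(insert_sizes, deviations=10.0):
    """Trim values above median + deviations * MAD, then summarize the rest."""
    med = median(insert_sizes)
    # Median absolute deviation: median of distances from the median.
    mad = median(abs(size - med) for size in insert_sizes)
    limit = med + deviations * mad
    kept = [size for size in insert_sizes if size <= limit]
    return limit, mean(kept), stdev(kept)


# Five typical inserts and one chimeric outlier.
sizes = [180, 190, 200, 210, 220, 5000]
limit, trimmed_mean, trimmed_sd = trimmed_insert_stats(sizes)
untrimmed_mean = mean(sizes)
```

Here the median is 205 and the median absolute deviation is 15, so the limit is 355 and the 5000 bp outlier is excluded. The trimmed mean is 200, whereas the untrimmed mean is 1000.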
Picard WGS Metrics
- data:picard:wgsmetrics:wgs-metrics (data:alignment:bam bam, data:seq:nucleotide genome, basic:integer read_length, basic:boolean create_histogram, basic:integer min_map_quality, basic:integer min_quality, basic:integer coverage_cap, basic:integer accumulation_cap, basic:boolean count_unpaired, basic:integer sample_size, basic:string validation_stringency)[Source: v2.4.0]
Collect metrics about coverage of whole genome sequencing. Tool from Picard, wrapped by GATK4. See GATK CollectWgsMetrics for more information.
Input arguments
- label:
Alignment BAM file
- type:
data:alignment:bam
- required:
True
- disabled:
False
- hidden:
False
- label:
Genome
- type:
data:seq:nucleotide
- required:
True
- disabled:
False
- hidden:
False
- label:
Average read length
- type:
basic:integer
- required:
True
- disabled:
False
- hidden:
False
- default:
150
- label:
Include data for base quality histogram in the metrics file
- type:
basic:boolean
- required:
True
- disabled:
False
- hidden:
False
- default:
False
- label:
Minimum mapping quality for a read to contribute coverage
- type:
basic:integer
- required:
True
- disabled:
False
- hidden:
False
- default:
20
- label:
Minimum base quality for a base to contribute coverage
- type:
basic:integer
- description:
N bases will be treated as having a base quality of negative infinity and will therefore be excluded from coverage regardless of the value of this parameter.
- required:
True
- disabled:
False
- hidden:
False
- default:
20
- label:
Maximum coverage cap
- type:
basic:integer
- description:
Treat positions with coverage exceeding this value as if they had coverage at this set value.
- required:
True
- disabled:
False
- hidden:
False
- default:
250
- label:
Ignore positions with coverage above this value
- type:
basic:integer
- description:
At positions with coverage exceeding this value, completely ignore reads that accumulate beyond this value.
- required:
True
- disabled:
False
- hidden:
False
- default:
100000
- label:
Count unpaired reads and paired reads with one end unmapped
- type:
basic:boolean
- required:
True
- disabled:
False
- hidden:
False
- default:
False
- label:
Sample Size used for Theoretical Het Sensitivity sampling
- type:
basic:integer
- required:
True
- disabled:
False
- hidden:
False
- default:
10000
- label:
Validation stringency
- type:
basic:string
- description:
Validation stringency for all SAM files read by this program. Setting stringency to SILENT can improve performance when processing a BAM file in which variable-length data (read, qualities, tags) do not otherwise need to be decoded. Default is STRICT.
- required:
True
- disabled:
False
- hidden:
False
- default:
STRICT
- choices:
STRICT:
STRICT
LENIENT:
LENIENT
SILENT:
SILENT
Output results
- label:
WGS metrics report
- type:
basic:file
- required:
True
- disabled:
False
- hidden:
False
- label:
Species
- type:
basic:string
- required:
True
- disabled:
False
- hidden:
False
- label:
Build
- type:
basic:string
- required:
True
- disabled:
False
- hidden:
False
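The effect of the maximum coverage cap can be shown with a short Python sketch. Positions whose coverage exceeds the cap are counted as if their coverage were equal to the cap. The function name and depth values are hypothetical, and the snippet illustrates the capping rule only, not how the tool computes its full set of metrics.

```python
def mean_capped_coverage(depths, coverage_cap=250):
    """Average per-position coverage after clipping each depth at the cap."""
    capped = [min(depth, coverage_cap) for depth in depths]
    return sum(capped) / len(capped)


# One position sits in a pile-up region with very high coverage.
depths = [10, 30, 1000, 20]
capped_mean = mean_capped_coverage(depths)
uncapped_mean = sum(depths) / len(depths)
```

With the default cap of 250, the pile-up position contributes 250 instead of 1000, so the mean drops from 265 to 77.5.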
Pre-peakcall QC
- data:prepeakqcqc-prepeak (data:alignment:bam alignment, basic:integer q_treshold, basic:integer n_sub, basic:boolean tn5, basic:integer shift)[Source: v0.5.2]
ChIP-Seq and ATAC-Seq QC metrics. Process returns a QC metrics report, fragment length estimation, and a deduplicated tagAlign file. Both fragment length estimation and the tagAlign file can be used as inputs in MACS 2.0. QC report contains ENCODE 3 proposed QC metrics – [NRF, PBC bottlenecking coefficients](https://www.encodeproject.org/data-standards/terms/), [NSC, and RSC](https://genome.ucsc.edu/ENCODE/qualityMetrics.html#chipSeq).
Input arguments
- label:
Aligned reads
- type:
data:alignment:bam
- label:
Quality filtering threshold
- type:
basic:integer
- default:
30
- label:
Number of reads to subsample
- type:
basic:integer
- default:
15000000
- label:
Tn5 shifting
- type:
basic:boolean
- description:
Tn5 transposon shifting. Shift reads on “+” strand by 4bp and reads on “-” strand by 5bp.
- default:
False
- label:
User-defined cross-correlation peak strandshift
- type:
basic:integer
- description:
If defined, SPP tool will not try to estimate fragment length but will use the given value as fragment length.
- required:
False
Output results
- label:
QC report
- type:
basic:file
- label:
Filtered tagAlign
- type:
basic:file
- label:
Fragment length
- type:
basic:integer
- label:
Species
- type:
basic:string
- label:
Build
- type:
basic:string
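The Tn5 shifting option described above can be sketched in Python. The source states only the offsets (4 bp for the “+” strand and 5 bp for the “-” strand). Which coordinate is moved and in which direction is an assumption here, following the convention common in ENCODE-style ATAC-seq processing: the start of “+” strand reads moves downstream by 4 and the end of “-” strand reads moves upstream by 5. The function name is hypothetical.

```python
def tn5_shift(start, end, strand):
    """Shift a read interval to account for the Tn5 insertion offset.

    Assumption: '+' reads have their start moved by +4 bp and
    '-' reads have their end moved by -5 bp.
    """
    if strand == "+":
        return start + 4, end
    if strand == "-":
        return start, end - 5
    raise ValueError(f"unknown strand: {strand!r}")


plus_read = tn5_shift(100, 150, "+")
minus_read = tn5_shift(100, 150, "-")
```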
Prepare GEO - ChIP-Seq
- data:other:geo:chipseqprepare-geo-chipseq (list:data:reads:fastq reads, list:data:chipseq:callpeak macs, basic:string name)[Source: v2.1.3]
Prepare ChIP-seq data for GEO upload.
Input arguments
- label:
Reads
- type:
list:data:reads:fastq
- description:
List of reads objects. Fastq files will be used.
- label:
MACS
- type:
list:data:chipseq:callpeak
- description:
List of MACS2 or MACS14 objects. BedGraph (MACS2) or Wiggle (MACS14) files will be used.
- label:
Collection name
- type:
basic:string
Output results
- label:
GEO folder
- type:
basic:file
- label:
Annotation table
- type:
basic:file
Prepare GEO - RNA-Seq
- data:other:geo:rnaseqprepare-geo-rnaseq (list:data:reads:fastq reads, list:data:expression expressions, basic:string name)[Source: v0.2.3]
Prepare RNA-Seq data for GEO upload.
Input arguments
- label:
Reads
- type:
list:data:reads:fastq
- description:
List of reads objects. Fastq files will be used.
- label:
Expressions
- type:
list:data:expression
- description:
Cuffnorm data object. Expression table will be used.
- label:
Collection name
- type:
basic:string
Output results
- label:
GEO folder
- type:
basic:file
- label:
Annotation table
- type:
basic:file
QoRTs QC
- data:qorts:qc:qorts-qc (data:alignment:bam alignment, data:annotation:gtf annotation, basic:string stranded, data:index:salmon cdna_index, basic:integer n_reads, basic:integer maxPhredScore, basic:integer adjustPhredScore)[Source: v1.8.0]
QoRTs QC analysis.
Input arguments
- label:
Alignment
- type:
data:alignment:bam
- required:
True
- disabled:
False
- hidden:
False
- label:
GTF annotation
- type:
data:annotation:gtf
- required:
True
- disabled:
False
- hidden:
False
- label:
Assay type
- type:
basic:string
- required:
True
- disabled:
False
- hidden:
False
- default:
non_specific
- choices:
Strand non-specific:
non_specific
Strand-specific forward:
forward
Strand-specific reverse:
reverse
Detect automatically:
auto
- label:
cDNA index file
- type:
data:index:salmon
- required:
False
- disabled:
False
- hidden:
options.stranded != 'auto'
- label:
Number of reads in subsampled alignment file
- type:
basic:integer
- required:
True
- disabled:
False
- hidden:
options.stranded != 'auto'
- default:
5000000
- label:
Max Phred Score
- type:
basic:integer
- required:
False
- disabled:
False
- hidden:
False
- label:
Adjust Phred Score
- type:
basic:integer
- required:
False
- disabled:
False
- hidden:
False
Output results
- label:
QC multiplot
- type:
basic:file
- required:
False
- disabled:
False
- hidden:
False
- label:
QC summary
- type:
basic:file
- required:
True
- disabled:
False
- hidden:
False
- label:
QoRTs report data
- type:
basic:file
- required:
True
- disabled:
False
- hidden:
False
QuantSeq workflow
- data:workflow:quant:featurecounts:workflow-quantseq (basic:string trimming_tool, data:reads:fastq reads, data:index:star genome, list:data:seq:nucleotide adapters, data:annotation annotation, basic:string assay_type, data:index:star rrna_reference, data:index:star globin_reference, basic:integer quality_cutoff, basic:integer n_reads, basic:integer seed, basic:decimal fraction, basic:boolean two_pass, basic:string quality_encoding_offset, basic:boolean ignore_bad_quality)[Source: v5.1.0]
3’ mRNA-Seq pipeline. Reads are preprocessed by __BBDuk__ or __Cutadapt__ which removes adapters, trims reads for quality from the 3’-end, and discards reads that are too short after trimming. Preprocessed reads are aligned by __STAR__ aligner. For read-count quantification, the __FeatureCounts__ tool is used. QoRTs QC and Samtools idxstats tools are used to report alignment QC metrics. QC steps include downsampling, QoRTs QC analysis and alignment of input reads to the rRNA/globin reference sequences. The reported alignment rate is used to assess the rRNA/globin sequence depletion rate.
Input arguments
- label:
Trimming tool
- type:
basic:string
- description:
Select the trimming tool. If you select BBDuk then please provide adapter sequences in fasta file(s). If you select Cutadapt as a trimming tool, pre-determined adapter sequences will be removed.
- required:
True
- disabled:
False
- hidden:
False
- choices:
BBDuk:
bbduk
Cutadapt:
cutadapt
- label:
Input reads (FASTQ)
- type:
data:reads:fastq
- description:
Reads in FASTQ file, single or paired end.
- required:
True
- disabled:
False
- hidden:
False
- label:
Indexed reference genome
- type:
data:index:star
- description:
Genome index prepared by STAR aligner indexing tool.
- required:
True
- disabled:
False
- hidden:
False
- label:
Adapters
- type:
list:data:seq:nucleotide
- description:
Provide a list of sequencing adapters files (.fasta) to be removed by BBDuk.
- required:
False
- disabled:
False
- hidden:
trimming_tool != ‘bbduk’
- label:
Annotation
- type:
data:annotation
- description:
GTF and GFF3 annotation formats are supported.
- required:
True
- disabled:
False
- hidden:
False
- label:
Assay type
- type:
basic:string
- description:
In strand-specific forward assay and single reads, the read has to be mapped to the same strand as the feature. For paired-end reads, the first read has to be on the same strand and the second read on the opposite strand. In strand-specific reverse assay these rules are reversed.
- required:
False
- disabled:
False
- hidden:
False
- choices:
Strand-specific forward:
forward
Strand-specific reverse:
reverse
- label:
Indexed rRNA reference sequence
- type:
data:index:star
- description:
Reference sequence index prepared by STAR aligner indexing tool.
- required:
False
- disabled:
False
- hidden:
False
- label:
Indexed Globin reference sequence
- type:
data:index:star
- description:
Reference sequence index prepared by STAR aligner indexing tool.
- required:
False
- disabled:
False
- hidden:
False
- label:
Reads quality cutoff
- type:
basic:integer
- description:
Trim low-quality bases from 3’ end of each read before adapter removal. The use of this option will override the use of NextSeq/NovaSeq-specific trim option.
- required:
False
- disabled:
False
- hidden:
False
- label:
Number of reads
- type:
basic:integer
- description:
Number of reads to include in subsampling.
- required:
True
- disabled:
False
- hidden:
False
- default:
1000000
- label:
Seed
- type:
basic:integer
- description:
Using the same random seed makes reads subsampling reproducible in different environments.
- required:
True
- disabled:
False
- hidden:
False
- default:
11
- label:
Fraction
- type:
basic:decimal
- description:
Use the fraction of reads [0 - 1.0] from the original input file instead of the absolute number of reads. If set, this will override the ‘Number of reads’ input parameter.
- required:
False
- disabled:
False
- hidden:
False
- label:
2-pass mode
- type:
basic:boolean
- description:
Enable two-pass mode when down-sampling. Two-pass mode is twice as slow but uses much less memory.
- required:
True
- disabled:
False
- hidden:
False
- default:
False
- label:
Quality encoding offset
- type:
basic:string
- description:
Quality encoding offset for input FASTQ files.
- required:
True
- disabled:
False
- hidden:
False
- default:
auto
- choices:
Sanger / Illumina 1.8+:
33
Illumina up to 1.3+, 1.5+:
64
Auto:
auto
- label:
Ignore bad quality
- type:
basic:boolean
- description:
Don’t crash if quality values appear to be incorrect.
- required:
True
- disabled:
False
- hidden:
False
- default:
False
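The interaction between the `n_reads`, `seed` and `fraction` inputs described above can be sketched in Python. This is only an illustration under assumed semantics, not the workflow's actual down-sampling code; the helper name `subsample_reads` and the in-memory list of reads are hypothetical.

```python
import random


def subsample_reads(reads, n_reads=1_000_000, seed=11, fraction=None):
    """Return a reproducible random subset of reads.

    Hypothetical helper: if `fraction` is given it overrides `n_reads`,
    mirroring the description of the 'Fraction' input.
    """
    if fraction is not None:
        if not 0 <= fraction <= 1:
            raise ValueError("fraction must be within [0, 1.0]")
        target = int(len(reads) * fraction)
    else:
        target = n_reads
    # Never ask for more reads than are available.
    target = min(target, len(reads))
    # A dedicated generator seeded with the same value yields the same subset.
    rng = random.Random(seed)
    return rng.sample(reads, target)


reads = [f"read_{i}" for i in range(100)]
first = subsample_reads(reads, n_reads=10, seed=11)
second = subsample_reads(reads, n_reads=10, seed=11)
quarter = subsample_reads(reads, n_reads=10, seed=11, fraction=0.25)
```

With the same seed, `first` and `second` contain the same reads, while `quarter` ignores `n_reads` and keeps a quarter of the input.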
Output results
Quantify shRNA species using bowtie2
- data:expression:shrna2quantshrna-quant (data:alignment:bam alignment, basic:integer readlengths, basic:integer alignscores)[Source: v1.4.0]
Based on `bowtie2` output (.bam file), calculate the number of mapped species. Input is limited to results from `bowtie2` since the `YT:Z:` tag used to fetch aligned species is specific to this process. The result is a count matrix (successfully mapped reads) where species are in rows and columns contain read specifics (count, species name, sequence, `AS:i:` tag value).
Input arguments
- label:
Alignment
- type:
data:alignment:bam
- required:
True
- label:
Species lengths threshold
- type:
basic:integer
- description:
Species with read lengths below specified threshold will be removed from final output. Default is no removal.
- label:
Align scores filter threshold
- type:
basic:integer
- description:
Species with align score below specified threshold will be removed from final output. Default is no removal.
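The two optional thresholds can be pictured as simple row filters on the count matrix. The sketch below assumes each row is a dictionary with hypothetical keys `species`, `sequence`, `count` and `align_score`, and that "read length" means the length of the stored sequence; the real process may organise its data differently.

```python
def filter_species(rows, readlengths=None, alignscores=None):
    """Drop species below the optional thresholds.

    A threshold of None means no removal, matching the documented default.
    """
    kept = []
    for row in rows:
        # Remove species whose read is shorter than the length threshold.
        if readlengths is not None and len(row["sequence"]) < readlengths:
            continue
        # Remove species whose alignment score is below the score threshold.
        if alignscores is not None and row["align_score"] < alignscores:
            continue
        kept.append(row)
    return kept


rows = [
    {"species": "sh_A", "sequence": "ACGTACGTACGTACGTACGT", "count": 12, "align_score": 0},
    {"species": "sh_B", "sequence": "ACGTACGTAC", "count": 3, "align_score": -6},
]
unfiltered = filter_species(rows)
long_only = filter_species(rows, readlengths=15)
high_score = filter_species(rows, alignscores=-5)
```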
Output results
- label:
Normalized expression
- type:
basic:file
- label:
Read counts
- type:
basic:file
- required:
False
- label:
Expression (json)
- type:
basic:json
- label:
Expression type
- type:
basic:string
- label:
Gene ID source
- type:
basic:string
- label:
Species
- type:
basic:string
- label:
Build
- type:
basic:string
- label:
Feature type
- type:
basic:string
- label:
Mapped species
- type:
basic:file
RNA-SeQC
- data:rnaseqc:qc:rnaseqc-qc (data:alignment:bam alignment, data:annotation:gtf annotation, basic:integer mapping_quality, basic:integer base_mismatch, basic:integer offset, basic:integer window_size, basic:integer gene_length, basic:integer detection_threshold, basic:boolean exclude_chimeric, basic:string stranded, data:index:salmon cdna_index, basic:integer n_reads)[Source: v2.0.0]
RNA-SeQC QC analysis. An efficient new version of RNA-SeQC that computes a comprehensive set of metrics for characterizing samples processed by a wide range of protocols. It also quantifies gene- and exon-level expression, enabling effective quality control of large-scale RNA-seq datasets. More information can be found in the [GitHub repository](https://github.com/getzlab/rnaseqc) and in the [original paper](https://academic.oup.com/bioinformatics/article/37/18/3048/6156810?login=false).
Input arguments
- label:
Input aligned reads (BAM file)
- type:
data:alignment:bam
- required:
True
- disabled:
False
- hidden:
False
- label:
Annotation file (GTF)
- type:
data:annotation:gtf
- description:
The input GTF file containing features to check the bam against. The file should include gene_id in the attributes column for all entries. During the process the file is formatted so the transcript_id matches the gene_id. Exons are merged to remove overlaps and exon_id field is then matched with gene_id including the consecutive exon number.
- required:
True
- disabled:
False
- hidden:
False
- label:
Mapping quality [–mapping-quality]
- type:
basic:integer
- description:
Set the lower bound on read quality for exon coverage counting. Reads below this number are excluded from coverage metrics.
- required:
True
- disabled:
False
- hidden:
False
- default:
255
- label:
Base mismatch [–base-mismatch]
- type:
basic:integer
- description:
Set the maximum number of allowed mismatches between a read and the reference sequence. Reads with more than this number of mismatches are excluded from coverage metrics.
- required:
True
- disabled:
False
- hidden:
False
- default:
6
- label:
Offset [–offset]
- type:
basic:integer
- description:
Set the offset into the gene for the 3’ and 5’ windows in bias calculation. A positive value shifts the 3’ and 5’ windows towards each other, while a negative value shifts them apart.
- required:
True
- disabled:
False
- hidden:
False
- default:
150
- label:
Window size [–window-size]
- type:
basic:integer
- description:
Set the size of the 3’ and 5’ windows in bias calculation.
- required:
True
- disabled:
False
- hidden:
False
- default:
100
- label:
Gene length [–gene-length]
- type:
basic:integer
- description:
Set the minimum size of a gene for bias calculation. Genes below this size are ignored in the calculation.
- required:
True
- disabled:
False
- hidden:
False
- default:
600
- label:
Detection threshold [–detection-threshold]
- type:
basic:integer
- description:
Number of counts on a gene to consider the gene ‘detected’. Additionally, genes below this limit are excluded from 3’ bias computation.
- required:
True
- disabled:
False
- hidden:
False
- default:
5
- label:
Exclude chimeric reads [–exclude-chimeric]
- type:
basic:boolean
- description:
Exclude chimeric reads from the read counts.
- required:
True
- disabled:
False
- hidden:
False
- default:
False
- label:
Assay type [–stranded]
- type:
basic:string
- required:
True
- disabled:
False
- hidden:
False
- default:
non_specific
- choices:
Strand non-specific:
non_specific
Strand-specific reverse then forward:
reverse
Strand-specific forward then reverse:
forward
Detect automatically:
auto
- label:
cDNA index file
- type:
data:index:salmon
- required:
False
- disabled:
False
- hidden:
strand_detection_options.stranded != ‘auto’
- label:
Number of reads in subsampled alignment file. Subsampled reads will be used in strandedness detection
- type:
basic:integer
- required:
True
- disabled:
False
- hidden:
strand_detection_options.stranded != ‘auto’
- default:
5000000
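To show how the inputs above relate to the bracketed RNA-SeQC options, the following sketch assembles a command line from the documented defaults. The function name and file names are hypothetical, the options are written with a double hyphen as on a real command line, and the mapping of the `forward`/`reverse` choices to `FR`/`RF` is an assumption based on the choice labels. The `auto` choice is resolved by the process itself and is not handled here.

```python
def build_rnaseqc_args(
    gtf,
    bam,
    out_dir,
    *,
    mapping_quality=255,
    base_mismatch=6,
    offset=150,
    window_size=100,
    gene_length=600,
    detection_threshold=5,
    exclude_chimeric=False,
    stranded="non_specific",
):
    """Assemble an argument list for the rnaseqc executable (illustrative)."""
    args = [
        "rnaseqc", gtf, bam, out_dir,
        f"--mapping-quality={mapping_quality}",
        f"--base-mismatch={base_mismatch}",
        f"--offset={offset}",
        f"--window-size={window_size}",
        f"--gene-length={gene_length}",
        f"--detection-threshold={detection_threshold}",
    ]
    if exclude_chimeric:
        args.append("--exclude-chimeric")
    # Assumed mapping from the documented choices to RNA-SeQC values.
    strand_values = {"forward": "FR", "reverse": "RF"}
    if stranded in strand_values:
        args.append(f"--stranded={strand_values[stranded]}")
    elif stranded != "non_specific":
        raise ValueError(f"unsupported strandedness in this sketch: {stranded}")
    return args


default_args = build_rnaseqc_args("genes.gtf", "sample.bam", "qc_out")
reverse_args = build_rnaseqc_args("genes.gtf", "sample.bam", "qc_out", stranded="reverse")
```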
Output results
- label:
metrics
- type:
basic:file
- required:
True
- disabled:
False
- hidden:
False
RNA-Seq (Cuffquant)
- data:workflow:rnaseq:cuffquantworkflow-rnaseq-cuffquant (data:reads:fastq reads, data:index:hisat2 genome, data:annotation annotation)[Source: v2.1.0]
Input arguments
- label:
Input reads
- type:
data:reads:fastq
- label:
genome
- type:
data:index:hisat2
- label:
Annotation file
- type:
data:annotation
Output results
RNA-seq Variant Calling Workflow
- data:workflow:rnaseq:variants:workflow-rnaseq-variantcalling (data:alignment:bam:star bam, data:reads:fastq reads, basic:boolean preprocessing, data:seq:nucleotide ref_seq, data:index:star genome, data:variants:vcf dbsnp, list:data:variants:vcf indels, data:bed intervals, data:variants:vcf clinvar, data:geneset geneset, list:basic:string mutations, list:data:seq:nucleotide adapters, list:basic:string custom_adapter_sequences, basic:integer kmer_length, basic:integer min_k, basic:integer hamming_distance, basic:integer maxns, basic:integer trim_quality, basic:integer min_length, basic:string quality_encoding_offset, basic:boolean ignore_bad_quality, basic:boolean two_pass_mode, basic:boolean out_unmapped, basic:string align_end_alignment, basic:string read_group, basic:integer stand_call_conf, basic:boolean soft_clipped, basic:integer interval_padding, list:basic:string filter_expressions, list:basic:string filter_name, list:basic:string genotype_filter_expressions, list:basic:string genotype_filter_name, data:variants:vcf mask, basic:string mask_name, basic:string filtering_options, list:basic:string vcf_fields, list:basic:string ann_fields, basic:boolean split_alleles, basic:boolean show_filtered, list:basic:string gf_fields, basic:boolean multiqc, basic:integer java_gc_threads, basic:integer max_heap_size)[Source: v2.4.0]
Identify variants in RNA-seq data. This pipeline follows GATK best practices recommendations for variant calling with RNA-seq data. The pipeline steps include read alignment (STAR), data cleanup (MarkDuplicates), splitting reads that contain Ns in their CIGAR string (SplitNCigarReads), base quality recalibration (BaseRecalibrator, ApplyBQSR), variant calling (HaplotypeCaller), variant filtering (VariantFiltration) and variant annotation (SnpEff). The last step of the pipeline is the Mutations table process, which prepares variants for ReSDK VariantTables. It is also possible to run the pipeline directly from a BAM file. In this case, it is recommended that you use two-pass mode in STAR alignment and turn on the option ‘–outSAMunmapped Within’.
Input arguments
- label:
Input BAM file
- type:
data:alignment:bam:star
- description:
Input BAM file that was computed with STAR aligner. It is highly recommended that two-pass mode was used for the alignment as well as ‘–outSAMunmapped Within’ option if you want to use BAM file as an input.
- required:
False
- disabled:
reads
- hidden:
False
- label:
Input sample (FASTQ)
- type:
data:reads:fastq
- description:
Input data in FASTQ format.
- required:
False
- disabled:
bam
- hidden:
False
- label:
Perform reads processing with BBDuk
- type:
basic:boolean
- description:
If your reads have not been processed, set this to True.
- required:
True
- disabled:
bam
- hidden:
False
- default:
True
- label:
Reference FASTA sequence
- type:
data:seq:nucleotide
- required:
True
- disabled:
False
- hidden:
False
- label:
Indexed reference genome
- type:
data:index:star
- description:
Genome index prepared by STAR aligner indexing tool.
- required:
False
- disabled:
bam
- hidden:
False
- label:
dbSNP file
- type:
data:variants:vcf
- description:
File with known variants.
- required:
True
- disabled:
False
- hidden:
False
- label:
Known INDEL sites
- type:
list:data:variants:vcf
- required:
False
- disabled:
False
- hidden:
False
- label:
Intervals (from BED file)
- type:
data:bed
- description:
Use this option to perform the analysis over only part of the genome.
- required:
False
- disabled:
False
- hidden:
False
- label:
ClinVar VCF file
- type:
data:variants:vcf
- description:
[ClinVar](https://www.ncbi.nlm.nih.gov/clinvar/) is a freely available, public archive of human genetic variants and interpretations of their significance to disease.
- required:
False
- disabled:
False
- hidden:
False
- label:
Gene set
- type:
data:geneset
- description:
Select a gene set with genes you are interested in. Only variants of genes in the selected gene set will be in the output.
- required:
False
- disabled:
mutations
- hidden:
False
- label:
Gene and its mutations
- type:
list:basic:string
- description:
Insert the gene you are interested in, together with mutations. First enter the name of the gene and then the mutations. Separate gene from mutations with ‘:’ and mutations with ‘,’. Example of an input: ‘KRAS: Gly12, Gly61’. Press enter after each input (gene + mutations). NOTE: Field only accepts three character amino acid symbols. If you use this option, the selected geneset will not be used for Mutations table process.
- required:
False
- disabled:
geneset
- hidden:
False
- label:
Adapters
- type:
list:data:seq:nucleotide
- description:
Provide a list of sequencing adapters files (.fasta) to be removed by BBDuk.
- required:
False
- disabled:
False
- hidden:
False
- label:
Custom adapter sequences
- type:
list:basic:string
- description:
Custom adapter sequences can be specified by inputting them one by one and pressing Enter after each sequence.
- required:
False
- disabled:
False
- hidden:
False
- default:
[]
- label:
K-mer length [k=]
- type:
basic:integer
- description:
Kmer length used for finding contaminants. Contaminants shorter than kmer length will not be found. Kmer length must be at least 1.
- required:
True
- disabled:
False
- hidden:
False
- default:
23
- label:
Minimum k-mer length at right end of reads used for trimming [mink=]
- type:
basic:integer
- required:
True
- disabled:
bbduk.adapters.length === 0 && bbduk.custom_adapter_sequences.length === 0
- hidden:
False
- default:
11
- label:
Maximum Hamming distance for k-mers [hammingdistance=]
- type:
basic:integer
- description:
Hamming distance i.e. the number of mismatches allowed in the kmer.
- required:
True
- disabled:
False
- hidden:
False
- default:
1
- label:
Max Ns after trimming [maxns=]
- type:
basic:integer
- description:
If non-negative, reads with more Ns than this (after trimming) will be discarded.
- required:
True
- disabled:
False
- hidden:
False
- default:
-1
- label:
Average quality below which to trim region [trimq=]
- type:
basic:integer
- description:
Phred algorithm is used, which is more accurate than naive trimming.
- required:
True
- disabled:
False
- hidden:
False
- default:
28
- label:
Minimum read length [minlength=]
- type:
basic:integer
- description:
Reads shorter than minimum read length after trimming are discarded.
- required:
True
- disabled:
False
- hidden:
False
- default:
30
- label:
Quality encoding offset
- type:
basic:string
- description:
Quality encoding offset for input FASTQ files.
- required:
True
- disabled:
False
- hidden:
False
- default:
auto
- choices:
Sanger / Illumina 1.8+:
33
Illumina up to 1.3+, 1.5+:
64
Auto:
auto
- label:
Ignore bad quality
- type:
basic:boolean
- description:
Don’t crash if quality values appear to be incorrect.
- required:
True
- disabled:
False
- hidden:
False
- default:
False
- label:
Use two pass mode [–twopassMode]
- type:
basic:boolean
- description:
Use two-pass mapping instead of first-pass only. In two-pass mode we first perform first-pass mapping, extract junctions, insert them into genome index, and re-map all reads in the second mapping pass.
- required:
True
- disabled:
False
- hidden:
False
- default:
True
- label:
Output unmapped reads (SAM) [–outSAMunmapped Within]
- type:
basic:boolean
- description:
Output of unmapped reads in the SAM format.
- required:
True
- disabled:
False
- hidden:
False
- default:
True
- label:
Read ends alignment [–alignEndsType]
- type:
basic:string
- description:
Type of read ends alignment (default: Local). Local: standard local alignment with soft-clipping allowed. EndToEnd: force end-to-end read alignment, do not soft-clip. Extend5pOfRead1: fully extend only the 5’ end of read1; all other ends use local alignment. Extend5pOfReads12: fully extend only the 5’ ends of both read1 and read2; all other ends use local alignment.
- required:
True
- disabled:
False
- hidden:
False
- default:
Local
- choices:
Local:
Local
EndToEnd:
EndToEnd
Extend5pOfRead1:
Extend5pOfRead1
Extend5pOfReads12:
Extend5pOfReads12
- label:
Replace read groups in BAM
- type:
basic:string
- description:
Replace read groups in a BAM file. This argument enables the user to replace all read groups in the INPUT file with a single new read group and assign all reads to this read group in the OUTPUT BAM file. Addition or replacement is performed using Picard’s AddOrReplaceReadGroups tool. Input should take the form of -name=value delimited by a “;”, e.g. “-ID=1;-LB=GENIALIS;-PL=ILLUMINA;-PU=BARCODE;-SM=SAMPLENAME1”. See tool’s documentation for more information on tag names. Note that PL, LB, PU and SM are required fields. See caveats of rewriting read groups in the documentation.
- required:
True
- disabled:
False
- hidden:
False
- default:
-ID=1;-LB=GENIALIS;-PL=ILLUMINA;-PU=BARCODE;-SM=SAMPLENAME1
- label:
Min call confidence threshold
- type:
basic:integer
- description:
The minimum phred-scaled confidence threshold at which variants should be called.
- required:
True
- disabled:
False
- hidden:
False
- default:
20
- label:
Do not analyze soft clipped bases in the reads
- type:
basic:boolean
- description:
Suitable option for RNA-seq variant calling.
- required:
True
- disabled:
False
- hidden:
False
- default:
True
- label:
Interval padding
- type:
basic:integer
- description:
Amount of padding (in bp) to add to each interval you are including. The recommended value is 100. Set to 0 if you want to turn it off.
- required:
True
- disabled:
False
- hidden:
!intervals
- default:
100
- label:
Expressions used with INFO fields to filter
- type:
list:basic:string
- description:
VariantFiltration accepts any number of JEXL expressions (so you can have two named filters by using –filter-name One –filter-expression ‘X < 1’ –filter-name Two –filter-expression ‘X > 2’). It is preferable to use multiple expressions, each specifying an individual filter criteria, to a single compound expression that specifies multiple filter criteria. Input expressions one by one and press ENTER after each expression. Examples of filter expression: ‘FS > 30’, ‘DP > 10’.
- required:
True
- disabled:
False
- hidden:
False
- default:
['FS > 30.0', 'QD < 2.0']
- label:
Names to use for the list of filters
- type:
list:basic:string
- description:
This name is put in the FILTER field for variants that get filtered. Note that there must be a 1-to-1 mapping between filter expressions and filter names. Input names one by one and press ENTER after each name. Warning: filter names should be in the same order as filter expressions. Example: you specified filter expressions ‘FS > 30’ and ‘DP > 10’, now specify filter names ‘FS’ and ‘DP’.
- required:
True
- disabled:
False
- hidden:
False
- default:
['FS', 'QD']
- label:
Expressions used with FORMAT field to filter
- type:
list:basic:string
- description:
Similar to the INFO field based expressions, but used on the FORMAT (genotype) fields instead. VariantFiltration will add the sample-level FT tag to the FORMAT field of filtered samples (this does not affect the record’s FILTER tag). One can filter normally based on most fields (e.g. ‘GQ < 5.0’), but the GT (genotype) field is an exception. We have put in convenience methods so that one can now filter out hets (‘isHet == 1’), refs (‘isHomRef == 1’), or homs (‘isHomVar == 1’). Also available are expressions isCalled, isNoCall, isMixed, and isAvailable, in accordance with the methods of the Genotype object. To filter by alternative allele depth, use the expression: ‘AD.1 < 5’. This filter expression will filter all the samples in the multi-sample VCF file.
- required:
True
- disabled:
False
- hidden:
False
- default:
['AD.1 < 5.0']
- label:
Names to use for the list of genotype filters
- type:
list:basic:string
- description:
Similar to the INFO field based expressions, but used on the FORMAT (genotype) fields instead. Warning: filter names should be in the same order as filter expressions.
- required:
True
- disabled:
False
- hidden:
False
- default:
['AD']
- label:
Input mask
- type:
data:variants:vcf
- description:
Any variant which overlaps entries from the provided mask file will be filtered.
- required:
False
- disabled:
False
- hidden:
False
- label:
The text to put in the FILTER field if a ‘mask’ is provided
- type:
basic:string
- description:
When using the mask file, the mask name will be annotated in the variant record.
- required:
False
- disabled:
!variant_filtration.mask
- hidden:
False
- label:
SnpEff filtering expressions
- type:
basic:string
- description:
Filter annotated VCF file using arbitrary expressions. Examples of filtering expressions: ‘(ANN[*].GENE = ‘PSD3’)’ or ‘( REF = ‘A’ )’ or ‘(countHom() > 3) | (( exists INDEL ) & (QUAL >= 20)) | (QUAL >= 30 )’. For more information, check out the official documentation of [SnpSift](https://pcingola.github.io/SnpEff/ss_filter/).
- required:
False
- disabled:
False
- hidden:
False
- label:
Select VCF fields
- type:
list:basic:string
- description:
The name of a standard VCF field or an INFO field to include in the output table. The field can be any standard VCF column (e.g. CHROM, ID, QUAL) or any annotation name in the INFO field (e.g. AC, AF). Required fields are CHROM, POS, ID, REF and ANN. If your variants file was annotated with ClinVar information, then fields CLNDN, CLNSIG and CLNSIGCONF might be of interest.
- required:
True
- disabled:
False
- hidden:
False
- default:
['CHROM', 'POS', 'ID', 'QUAL', 'REF', 'ALT', 'FILTER', 'ANN', 'CLNDN', 'CLNSIG']
- label:
ANN fields to use
- type:
list:basic:string
- description:
Only use specific fields from the SnpEff ANN field. All available fields: ‘Allele | Annotation | Annotation_Impact | Gene_Name | Gene_ID | Feature_Type | Feature_ID | Transcript_BioType | Rank | HGVS.c | HGVS.p | cDNA.pos / cDNA.length | CDS.pos / CDS.length | AA.pos / AA.length | Distance | ERRORS / WARNINGS / INFO’. Fields are separated by ‘|’. For more information, follow this [link](https://pcingola.github.io/SnpEff/se_inputoutput/#ann-field-vcf-output-files).
- required:
True
- disabled:
False
- hidden:
False
- default:
['Allele', 'Annotation', 'Annotation_Impact', 'Gene_Name', 'Feature_ID', 'HGVS.p']
- label:
Split multi-allelic records into multiple lines
- type:
basic:boolean
- description:
By default, a variant record with multiple ALT alleles will be summarized in one line, with per alt-allele fields (e.g. allele depth) separated by commas. This may cause difficulty when the table is loaded by an R script, for example. Use this flag to write multi-allelic records on separate lines of output.
- required:
True
- disabled:
False
- hidden:
False
- default:
True
- label:
Include filtered records in the output
- type:
basic:boolean
- description:
Include filtered records in the output of the GATK VariantsToTable.
- required:
True
- disabled:
False
- hidden:
False
- default:
True
- label:
Include FORMAT/sample-level fields. Note: If you specify DP from genotype field, it will overwrite the original DP field. By default fields GT (genotype), AD (allele depth), DP (depth at the sample level), FT (sample-level filter) are included in the analysis.
- type:
list:basic:string
- required:
True
- disabled:
False
- hidden:
False
- default:
['GT', 'AD', 'DP', 'FT']
- label:
Trigger MultiQC
- type:
basic:boolean
- description:
If the input for the pipeline is a BAM file that has been computed by the RNA-seq gene expression pipeline, then a MultiQC object already exists for this sample, so there is no need for an additional MultiQC process. If the input for this pipeline is FASTQ, then MultiQC cannot be disabled.
- required:
True
- disabled:
False
- hidden:
!bam
- default:
False
- label:
Java ParallelGCThreads
- type:
basic:integer
- description:
Sets the number of threads used during parallel phases of the garbage collectors.
- required:
True
- disabled:
False
- hidden:
False
- default:
2
- label:
Java maximum heap size (Xmx)
- type:
basic:integer
- description:
Set the maximum Java heap size (in GB).
- required:
True
- disabled:
False
- hidden:
False
- default:
12
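Two of the list-valued inputs above follow small textual conventions: the ‘gene: mutation, mutation’ entries and the 1-to-1 pairing of filter expressions with filter names. The sketch below illustrates both under the documented formats; the function names are hypothetical and the workflow's own parsing may differ in details such as error handling.

```python
def parse_mutation_entry(entry):
    """Split an entry such as 'KRAS: Gly12, Gly61' into gene and mutations."""
    gene, _, mutations = entry.partition(":")
    gene = gene.strip()
    parsed = [m.strip() for m in mutations.split(",") if m.strip()]
    if not gene or not parsed:
        raise ValueError(f"expected 'GENE: Mut1, Mut2', got {entry!r}")
    return gene, parsed


def build_filter_args(expressions, names):
    """Pair filter names with expressions in order, as VariantFiltration options."""
    if len(expressions) != len(names):
        raise ValueError("each filter expression needs exactly one filter name")
    args = []
    for name, expression in zip(names, expressions):
        args += ["--filter-name", name, "--filter-expression", expression]
    return args


gene, mutations = parse_mutation_entry("KRAS: Gly12, Gly61")
filter_args = build_filter_args(["FS > 30.0", "QD < 2.0"], ["FS", "QD"])
```

Keeping names and expressions in the same order matters because the pairing is purely positional.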
Output results
RNA-seq variant calling preprocess
- data:alignment:bam:rnaseqvc:rnaseq-vc-preprocess (data:alignment:bam bam, data:seq:nucleotide ref_seq, list:data:variants:vcf known_sites, basic:string read_group, basic:integer java_gc_threads, basic:integer max_heap_size)[Source: v1.3.0]
Prepare BAM file from STAR aligner for HaplotypeCaller. This process includes steps MarkDuplicates, SplitNCigarReads, read-group assignment and base quality recalibration (BQSR).
Input arguments
- label:
Alignment BAM file from STAR alignment
- type:
data:alignment:bam
- required:
True
- disabled:
False
- hidden:
False
- label:
Reference sequence FASTA file
- type:
data:seq:nucleotide
- required:
True
- disabled:
False
- hidden:
False
- label:
List of known sites of variation
- type:
list:data:variants:vcf
- description:
One or more databases of known polymorphic sites used to exclude regions around known polymorphisms from analysis.
- required:
True
- disabled:
False
- hidden:
False
- label:
Replace read groups in BAM
- type:
basic:string
- description:
Replace read groups in a BAM file. This argument enables the user to replace all read groups in the INPUT file with a single new read group and assign all reads to this read group in the OUTPUT BAM file. Addition or replacement is performed using GATK AddOrReplaceReadGroups tool. Input should take the form of -name=value delimited by a “;”, e.g. “-ID=1;-LB=GENIALIS;-PL=ILLUMINA;-PU=BARCODE;-SM=SAMPLENAME1”. See tool’s documentation for more information on tag names. Note that PL, LB, PU and SM are required fields. See caveats of rewriting read groups in the documentation.
- required:
True
- disabled:
False
- hidden:
False
- default:
-ID=1;-LB=GENIALIS;-PL=ILLUMINA;-PU=BARCODE;-SM=SAMPLENAME1
- label:
Java ParallelGCThreads
- type:
basic:integer
- description:
Sets the number of threads used during parallel phases of the garbage collectors.
- required:
True
- disabled:
False
- hidden:
False
- default:
2
- label:
Java maximum heap size (Xmx)
- type:
basic:integer
- description:
Set the maximum Java heap size (in GB).
- required:
True
- disabled:
False
- hidden:
False
- default:
12
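The read group string format described above (‘-name=value’ items delimited by ‘;’, with PL, LB, PU and SM required) can be checked before a run with a few lines of Python. This is a minimal sketch with a hypothetical helper name, not the validation the process itself performs.

```python
REQUIRED_TAGS = ("PL", "LB", "PU", "SM")


def parse_read_group(value):
    """Parse '-ID=1;-LB=...;...' into a dict and check the required tags."""
    tags = {}
    for item in value.split(";"):
        item = item.strip()
        if not item:
            continue
        if not item.startswith("-") or "=" not in item:
            raise ValueError(f"malformed read group item: {item!r}")
        # Drop the leading '-' and split on the first '=' only.
        name, tag_value = item[1:].split("=", 1)
        tags[name] = tag_value
    missing = [tag for tag in REQUIRED_TAGS if tag not in tags]
    if missing:
        raise ValueError(f"missing required read group tags: {missing}")
    return tags


read_group = parse_read_group(
    "-ID=1;-LB=GENIALIS;-PL=ILLUMINA;-PU=BARCODE;-SM=SAMPLENAME1"
)
```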
Output results
- label:
Preprocessed BAM file
- type:
basic:file
- required:
True
- disabled:
False
- hidden:
False
- label:
Index of BAM file
- type:
basic:file
- required:
True
- disabled:
False
- hidden:
False
- label:
Alignment statistics
- type:
basic:file
- required:
True
- disabled:
False
- hidden:
False
- label:
Species
- type:
basic:string
- required:
True
- disabled:
False
- hidden:
False
- label:
Build
- type:
basic:string
- required:
True
- disabled:
False
- hidden:
False
- label:
Metrics from MarkDuplicate process
- type:
basic:file
- required:
True
- disabled:
False
- hidden:
False
ROSE2
- data:chipseq:rose2:rose2 (data:chipseq:callpeak input_macs, data:bed input_upload, basic:boolean use_filtered_bam, data:alignment:bam rankby, data:alignment:bam control, basic:integer tss, basic:integer stitch, data:bed mask)[Source: v5.2.1]
Run ROSE2. Rank Ordering of Super-Enhancers algorithm (ROSE2) takes the acetylation peaks called by a peak caller (MACS, MACS2…) and based on the in-between distances and the acetylation signal at the peaks judges whether they can be considered super-enhancers. The ranked values are plotted and by locating the inflection point in the resulting graph, super-enhancers are assigned. See [here](http://younglab.wi.mit.edu/super_enhancer_code.html) for more information.
Input arguments
- label:
BED/narrowPeak file (MACS results)
- type:
data:chipseq:callpeak
- required:
False
- disabled:
False
- hidden:
input_upload
- label:
BED file (Upload)
- type:
data:bed
- required:
False
- disabled:
False
- hidden:
input_macs || use_filtered_bam
- label:
Use Filtered BAM File
- type:
basic:boolean
- description:
Use filtered BAM file from a MACS2 object to rank enhancers by. Only applicable if input is MACS2.
- required:
True
- disabled:
False
- hidden:
input_upload
- default:
False
- label:
BAM file
- type:
data:alignment:bam
- description:
BAM file to rank enhancers by.
- required:
False
- disabled:
False
- hidden:
use_filtered_bam
- label:
Control BAM File
- type:
data:alignment:bam
- description:
BAM file to rank enhancers by.
- required:
False
- disabled:
False
- hidden:
use_filtered_bam
- label:
TSS exclusion
- type:
basic:integer
- description:
Enter a distance from TSS to exclude. 0 = no TSS exclusion.
- required:
True
- disabled:
False
- hidden:
False
- default:
0
- label:
Stitch
- type:
basic:integer
- description:
Enter a max linking distance for stitching. If not given, optimal stitching parameter will be determined automatically.
- required:
False
- disabled:
False
- hidden:
False
- label:
Masking BED file
- type:
data:bed
- description:
Mask a set of regions from analysis. Provide a BED of masking regions.
- required:
False
- disabled:
False
- hidden:
False
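The idea of separating super-enhancers at the inflection point of the ranked signal curve can be illustrated with a simple heuristic: rescale rank and signal to the range [0, 1] and take the point where a line of slope one touches the curve from below. This is a stand-in chosen for illustration with made-up signal values; ROSE2's own cutoff procedure may differ.

```python
def slope_one_cutoff(signals):
    """Return the signal value where a slope-one line touches the ranked curve."""
    ordered = sorted(signals)
    n = len(ordered)
    if n < 2:
        raise ValueError("need at least two signal values")
    top = ordered[-1]
    if top <= 0:
        raise ValueError("signals must contain a positive value")
    # On rescaled axes (x = rank, y = signal), the slope-one tangent point
    # is where y - x is smallest.
    best_index = min(
        range(n),
        key=lambda i: ordered[i] / top - i / (n - 1),
    )
    return ordered[best_index]


signals = [1, 1, 2, 2, 3, 4, 6, 10, 30, 100]
cutoff = slope_one_cutoff(signals)
super_enhancers = [s for s in signals if s > cutoff]
```

Regions with signal above `cutoff` would be the candidates labelled as super-enhancers in this toy example.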
Output results
- label:
All enhancers table
- type:
basic:file
- required:
True
- disabled:
False
- hidden:
False
- label:
Super enhancers table
- type:
basic:file
- required:
True
- disabled:
False
- hidden:
False
- label:
Plot points
- type:
basic:file
- required:
True
- disabled:
False
- hidden:
False
- label:
Plot panel
- type:
basic:file
- required:
True
- disabled:
False
- hidden:
False
- label:
Enhancer to gene
- type:
basic:file
- required:
True
- disabled:
False
- hidden:
False
- label:
Enhancer to top gene
- type:
basic:file
- required:
True
- disabled:
False
- hidden:
False
- label:
Gene to Enhancer
- type:
basic:file
- required:
True
- disabled:
False
- hidden:
False
- label:
Stitch parameter
- type:
basic:file
- required:
False
- disabled:
False
- hidden:
False
- label:
All output
- type:
basic:file
- required:
True
- disabled:
False
- hidden:
False
- label:
Super-Enhancer plot
- type:
basic:json
- required:
True
- disabled:
False
- hidden:
False
- label:
Species
- type:
basic:string
- required:
True
- disabled:
False
- hidden:
False
- label:
Build
- type:
basic:string
- required:
True
- disabled:
False
- hidden:
False
Reads (QSEQ multiplexed, paired)
- data:multiplexed:qseq:pairedupload-multiplexed-paired (basic:file reads, basic:file reads2, basic:file barcodes, basic:file annotation)[Source: v1.4.1]
Upload multiplexed NGS reads in QSEQ format.
Input arguments
- label:
Multiplexed upstream reads
- type:
basic:file
- description:
NGS reads in QSeq format. Supported extensions: .qseq.txt.bz2 (preferred), .qseq.* or .qseq.txt.*.
- required:
True
- validate_regex:
((\.qseq|\.qseq\.txt)(\.gz|\.bz2|\.tgz|\.tar\.gz|\.tar\.bz2|\.zip|\.rar|\.7z))|(\.bz2)$
- label:
Multiplexed downstream reads
- type:
basic:file
- description:
NGS reads in QSeq format. Supported extensions: .qseq.txt.bz2 (preferred), .qseq.* or .qseq.txt.*.
- required:
True
- validate_regex:
((\.qseq|\.qseq\.txt)(\.gz|\.bz2|\.tgz|\.tar\.gz|\.tar\.bz2|\.zip|\.rar|\.7z))|(\.bz2)$
- label:
NGS barcodes
- type:
basic:file
- description:
Barcodes in QSeq format. Supported extensions: .qseq.txt.bz2 (preferred), .qseq.* or .qseq.txt.*.
- required:
True
- validate_regex:
((\.qseq|\.qseq\.txt)(\.gz|\.bz2|\.tgz|\.tar\.gz|\.tar\.bz2|\.zip|\.rar|\.7z))|(\.bz2)$
- label:
Barcode mapping
- type:
basic:file
- description:
A tsv file mapping barcodes to experiment name, e.g. “TCGCAGG\tHr00”.
- required:
True
- validate_regex:
(\.tsv)$
Output results
- label:
Multiplexed upstream reads
- type:
basic:file
- label:
Multiplexed downstream reads
- type:
basic:file
- label:
NGS barcodes
- type:
basic:file
- label:
Barcode mapping
- type:
basic:file
- label:
Matched
- type:
basic:string
- label:
Not matched
- type:
basic:string
- label:
Bad quality
- type:
basic:string
- label:
Skipped
- type:
basic:string
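The `validate_regex` values above can be tried out on candidate file names before uploading. The following is a minimal sketch, assuming the patterns are applied with search semantics (this page does not state how the platform evaluates them); the helper name `accepted` and the sample file names are hypothetical.

```python
import re

# Patterns copied verbatim from the validate_regex fields above.
QSEQ_PATTERN = re.compile(
    r"((\.qseq|\.qseq\.txt)(\.gz|\.bz2|\.tgz|\.tar\.gz|\.tar\.bz2|\.zip|\.rar|\.7z))|(\.bz2)$"
)
MAPPING_PATTERN = re.compile(r"(\.tsv)$")


def accepted(filename, pattern):
    """Return True if the pattern is found anywhere in the file name.

    Assumption: search semantics; the real validator may behave differently.
    """
    return pattern.search(filename) is not None


if __name__ == "__main__":
    for name in ("lane1.qseq.txt.bz2", "lane1.qseq.gz", "lane1.qseq.txt", "lane1.fastq.gz"):
        print(name, accepted(name, QSEQ_PATTERN))
    print("barcodes.tsv", accepted("barcodes.tsv", MAPPING_PATTERN))
```

Under this assumption, an uncompressed `lane1.qseq.txt` is rejected, while any name ending in `.bz2` is accepted because the `$` anchor binds only to the last alternative of the pattern.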
Reads (QSEQ multiplexed, single)
- data:multiplexed:qseq:singleupload-multiplexed-single (basic:file reads, basic:file barcodes, basic:file annotation)[Source: v1.4.1]
Upload multiplexed NGS reads in QSEQ format.
Input arguments
- label:
Multiplexed NGS reads
- type:
basic:file
- description:
NGS reads in QSeq format. Supported extensions: .qseq.txt.bz2 (preferred), .qseq.* or .qseq.txt.*.
- required:
True
- validate_regex:
(\.(qseq)(|\.txt)(|\.gz|\.bz2|\.tgz|\.tar\.gz|\.tar\.bz2|\.zip|\.rar|\.7z))|(\.bz2)$
- label:
NGS barcodes
- type:
basic:file
- description:
Barcodes in QSeq format. Supported extensions: .qseq.txt.bz2 (preferred), .qseq.* or .qseq.txt.*.
- required:
True
- validate_regex:
(\.(qseq)(|\.txt)(|\.gz|\.bz2|\.tgz|\.tar\.gz|\.tar\.bz2|\.zip|\.rar|\.7z))|(\.bz2)$
- label:
Barcode mapping
- type:
basic:file
- description:
A tsv file mapping barcodes to experiment name, e.g. “TCGCAGG\tHr00”.
- required:
True
- validate_regex:
(\.tsv)$
Output results
- label:
Multiplexed NGS reads
- type:
basic:file
- label:
NGS barcodes
- type:
basic:file
- label:
Barcode mapping
- type:
basic:file
- label:
Matched
- type:
basic:string
- label:
Not matched
- type:
basic:string
- label:
Bad quality
- type:
basic:string
- label:
Skipped
- type:
basic:string
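The barcode mapping input is described above as a tab-separated file with one barcode and one experiment name per line (e.g. “TCGCAGG\tHr00”). A minimal sketch of reading such a file is given below; the function name `read_barcode_mapping`, the duplicate check, and the second sample row are illustrative assumptions, not part of the process.

```python
import csv
import io


def read_barcode_mapping(handle):
    """Parse a two-column TSV (barcode, experiment name) into a dict."""
    mapping = {}
    for row in csv.reader(handle, delimiter="\t"):
        # Skip blank lines.
        if not row or not row[0].strip():
            continue
        if len(row) < 2:
            raise ValueError(f"Expected two columns, got: {row!r}")
        barcode, name = row[0].strip(), row[1].strip()
        if barcode in mapping:
            raise ValueError(f"Duplicate barcode: {barcode}")
        mapping[barcode] = name
    return mapping


if __name__ == "__main__":
    # The second row is a made-up example.
    sample = io.StringIO("TCGCAGG\tHr00\nACGTTGA\tHr02\n")
    print(read_barcode_mapping(sample))
```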
Reads (scRNA 10x)
- data:screads:10x:upload-sc-10x (list:basic:file barcodes, list:basic:file reads)[Source: v1.4.1]
Import 10x scRNA reads in FASTQ format.
Input arguments
- label:
Barcodes (.fastq.gz)
- type:
list:basic:file
- required:
True
- disabled:
False
- hidden:
False
- label:
Reads (.fastq.gz)
- type:
list:basic:file
- required:
True
- disabled:
False
- hidden:
False
Output results
- label:
Barcodes
- type:
list:basic:file
- required:
True
- disabled:
False
- hidden:
False
- label:
Reads
- type:
list:basic:file
- required:
True
- disabled:
False
- hidden:
False
- label:
Quality control with FastQC (Barcodes)
- type:
list:basic:file:html
- required:
True
- disabled:
False
- hidden:
False
- label:
Quality control with FastQC (Reads)
- type:
list:basic:file:html
- required:
True
- disabled:
False
- hidden:
False
Reverse complement FASTQ (paired-end)
- data:reads:fastq:paired:seqtk:seqtk-rev-complement-paired (data:reads:fastq:paired reads, basic:string select_mate)[Source: v1.2.2]
Reverse complement paired-end FASTQ reads file using Seqtk.
Input arguments
- label:
Reads
- type:
data:reads:fastq:paired
- required:
True
- disabled:
False
- hidden:
False
- label:
Select mate
- type:
basic:string
- description:
Select which mate should be reverse complemented.
- required:
True
- disabled:
False
- hidden:
False
- default:
Mate 1
- choices:
Mate 1:
Mate 1
Mate 2:
Mate 2
Both:
Both
Output results
- label:
Reverse complemented FASTQ file
- type:
list:basic:file
- required:
True
- disabled:
False
- hidden:
False
- label:
Remaining mate
- type:
list:basic:file
- required:
True
- disabled:
False
- hidden:
False
- label:
Quality control with FastQC (Mate 1)
- type:
list:basic:file:html
- required:
True
- disabled:
False
- hidden:
False
- label:
Download FastQC archive (Mate 1)
- type:
list:basic:file
- required:
True
- disabled:
False
- hidden:
False
- label:
Quality control with FastQC (Mate 2)
- type:
list:basic:file:html
- required:
True
- disabled:
False
- hidden:
False
- label:
Download FastQC archive (Mate 2)
- type:
list:basic:file
- required:
True
- disabled:
False
- hidden:
False
Reverse complement FASTQ (single-end)
- data:reads:fastq:single:seqtk:seqtk-rev-complement-single (data:reads:fastq:single reads)[Source: v1.3.2]
Reverse complement single-end FASTQ reads file using Seqtk.
Input arguments
- label:
Reads
- type:
data:reads:fastq:single
- required:
True
- disabled:
False
- hidden:
False
Output results
- label:
Reverse complemented FASTQ file
- type:
list:basic:file
- required:
True
- disabled:
False
- hidden:
False
- label:
Quality control with FastQC
- type:
list:basic:file:html
- required:
True
- disabled:
False
- hidden:
False
- label:
Download FastQC archive
- type:
list:basic:file
- required:
True
- disabled:
False
- hidden:
False
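For readers unfamiliar with the operation, the sketch below shows what reverse complementing a single FASTQ record amounts to: the sequence is complemented and reversed, and the quality string is reversed so that each score stays attached to its base. This is an illustration in plain Python, not the Seqtk implementation; the function name and the sample record are hypothetical.

```python
# Translation table for the standard bases plus N (upper and lower case).
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement_record(header, sequence, quality):
    """Return the record with a reverse-complemented sequence.

    The quality string is reversed to keep scores aligned with bases.
    """
    if len(sequence) != len(quality):
        raise ValueError("Sequence and quality lengths differ")
    return header, sequence.translate(_COMPLEMENT)[::-1], quality[::-1]


if __name__ == "__main__":
    print(reverse_complement_record("@read1", "ACGTN", "IIHG#"))
```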
SAM header
- data:sam:headerupload-header-sam (basic:file src)[Source: v1.2.3]
Upload a mapping file header in SAM format.
Input arguments
- label:
Header (SAM)
- type:
basic:file
- description:
A mapping file header in SAM format.
- validate_regex:
\.(sam)$
Output results
- label:
Uploaded file
- type:
basic:file
SRA data
- data:sra:import-sra (list:basic:string sra_accession, basic:boolean prefetch, basic:string max_size_prefetch, basic:integer min_spot_id, basic:integer max_spot_id, basic:integer min_read_len, basic:boolean clip, basic:boolean aligned, basic:boolean unaligned)[Source: v1.5.1]
Import reads from SRA. Import single or paired-end reads from Sequence Read Archive (SRA) via an SRA accession number. SRA stores raw sequencing data and alignment information from high-throughput sequencing platforms.
Input arguments
- label:
SRA accession(s)
- type:
list:basic:string
- required:
True
- disabled:
False
- hidden:
False
- label:
Prefetch SRA file
- type:
basic:boolean
- required:
True
- disabled:
False
- hidden:
False
- default:
True
- label:
Maximum file size to download in KB
- type:
basic:string
- description:
A unit prefix can be used instead of a value in KB (e.g. 1024M or 1G).
- required:
True
- disabled:
False
- hidden:
False
- default:
20G
- label:
Minimum spot ID
- type:
basic:integer
- required:
False
- disabled:
False
- hidden:
False
- label:
Maximum spot ID
- type:
basic:integer
- required:
False
- disabled:
False
- hidden:
False
- label:
Minimum read length
- type:
basic:integer
- required:
False
- disabled:
False
- hidden:
False
- label:
Clip adapter sequences
- type:
basic:boolean
- required:
True
- disabled:
False
- hidden:
False
- default:
False
- label:
Dump only aligned sequences
- type:
basic:boolean
- required:
True
- disabled:
False
- hidden:
False
- default:
False
- label:
Dump only unaligned sequences
- type:
basic:boolean
- required:
True
- disabled:
False
- hidden:
False
- default:
False
Output results
SRA data (paired-end)
- data:reads:fastq:paired:import-sra-paired (list:basic:string sra_accession, basic:boolean prefetch, basic:string max_size_prefetch, basic:integer min_spot_id, basic:integer max_spot_id, basic:integer min_read_len, basic:boolean clip, basic:boolean aligned, basic:boolean unaligned)[Source: v1.6.1]
Import paired-end reads from SRA. Import paired-end reads from Sequence Read Archive (SRA) via an SRA accession number. SRA stores raw sequencing data and alignment information from high-throughput sequencing platforms.
Input arguments
- label:
SRA accession(s)
- type:
list:basic:string
- required:
True
- disabled:
False
- hidden:
False
- label:
Prefetch SRA file
- type:
basic:boolean
- required:
True
- disabled:
False
- hidden:
False
- default:
True
- label:
Maximum file size to download in KB
- type:
basic:string
- description:
A unit prefix can be used instead of a value in KB (e.g. 1024M or 1G).
- required:
True
- disabled:
False
- hidden:
False
- default:
20G
- label:
Minimum spot ID
- type:
basic:integer
- required:
False
- disabled:
False
- hidden:
False
- label:
Maximum spot ID
- type:
basic:integer
- required:
False
- disabled:
False
- hidden:
False
- label:
Minimum read length
- type:
basic:integer
- required:
False
- disabled:
False
- hidden:
False
- label:
Clip adapter sequences
- type:
basic:boolean
- required:
True
- disabled:
False
- hidden:
False
- default:
False
- label:
Dump only aligned sequences
- type:
basic:boolean
- required:
True
- disabled:
False
- hidden:
False
- default:
False
- label:
Dump only unaligned sequences
- type:
basic:boolean
- required:
True
- disabled:
False
- hidden:
False
- default:
False
Output results
- label:
Reads file (mate 1)
- type:
list:basic:file
- required:
True
- disabled:
False
- hidden:
False
- label:
Reads file (mate 2)
- type:
list:basic:file
- required:
True
- disabled:
False
- hidden:
False
- label:
Quality control with FastQC (mate 1)
- type:
list:basic:file:html
- required:
True
- disabled:
False
- hidden:
False
- label:
Quality control with FastQC (mate 2)
- type:
list:basic:file:html
- required:
True
- disabled:
False
- hidden:
False
- label:
Download FastQC archive (mate 1)
- type:
list:basic:file
- required:
True
- disabled:
False
- hidden:
False
- label:
Download FastQC archive (mate 2)
- type:
list:basic:file
- required:
True
- disabled:
False
- hidden:
False
SRA data (single-end)
- data:reads:fastq:single:import-sra-single (list:basic:string sra_accession, basic:boolean prefetch, basic:string max_size_prefetch, basic:integer min_spot_id, basic:integer max_spot_id, basic:integer min_read_len, basic:boolean clip, basic:boolean aligned, basic:boolean unaligned)[Source: v1.6.1]
Import single-end reads from SRA. Import single-end reads from Sequence Read Archive (SRA) via an SRA accession number. SRA stores raw sequencing data and alignment information from high-throughput sequencing platforms.
Input arguments
- label:
SRA accession(s)
- type:
list:basic:string
- required:
True
- disabled:
False
- hidden:
False
- label:
Prefetch SRA file
- type:
basic:boolean
- required:
True
- disabled:
False
- hidden:
False
- default:
True
- label:
Maximum file size to download in KB
- type:
basic:string
- description:
A unit prefix can be used instead of a value in KB (e.g. 1024M or 1G).
- required:
True
- disabled:
False
- hidden:
False
- default:
20G
- label:
Minimum spot ID
- type:
basic:integer
- required:
False
- disabled:
False
- hidden:
False
- label:
Maximum spot ID
- type:
basic:integer
- required:
False
- disabled:
False
- hidden:
False
- label:
Minimum read length
- type:
basic:integer
- required:
False
- disabled:
False
- hidden:
False
- label:
Clip adapter sequences
- type:
basic:boolean
- required:
True
- disabled:
False
- hidden:
False
- default:
False
- label:
Dump only aligned sequences
- type:
basic:boolean
- required:
True
- disabled:
False
- hidden:
False
- default:
False
- label:
Dump only unaligned sequences
- type:
basic:boolean
- required:
True
- disabled:
False
- hidden:
False
- default:
False
Output results
- label:
Reads file
- type:
list:basic:file
- required:
True
- disabled:
False
- hidden:
False
- label:
Quality control with FastQC
- type:
list:basic:file:html
- required:
True
- disabled:
False
- hidden:
False
- label:
Download FastQC archive
- type:
list:basic:file
- required:
True
- disabled:
False
- hidden:
False
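The SRA import processes above take the maximum prefetch size as a string that is either a plain number of KB or a number with a unit suffix (e.g. 1024M or 1G, default 20G). The sketch below converts such a value to KB. It assumes binary multiples (1G = 1024M = 1048576 KB), which this page does not confirm; the function name `max_size_kb` is hypothetical.

```python
# Assumed multipliers, expressed in KB (binary multiples are an assumption).
_UNITS_IN_KB = {"K": 1, "M": 1024, "G": 1024 ** 2, "T": 1024 ** 3}


def max_size_kb(value):
    """Convert a size string such as '20G', '1024M' or '500' to KB."""
    text = value.strip().upper()
    if not text:
        raise ValueError("Empty size value")
    if text[-1] in _UNITS_IN_KB:
        return int(text[:-1]) * _UNITS_IN_KB[text[-1]]
    # No suffix: the value is already in KB.
    return int(text)


if __name__ == "__main__":
    for size in ("20G", "1024M", "1G", "500"):
        print(size, max_size_kb(size))
```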
STAR
- data:alignment:bam:star:alignment-star (data:reads:fastq reads, data:index:star genome, data:annotation annotation, basic:boolean unstranded, basic:boolean noncannonical, basic:boolean gene_counts, basic:string feature_exon, basic:integer sjdb_overhang, basic:boolean chimeric, basic:integer chim_segment_min, basic:boolean quant_mode, basic:boolean single_end, basic:string out_filter_type, basic:integer out_multimap_max, basic:integer out_mismatch_max, basic:decimal out_mismatch_nl_max, basic:integer out_score_min, basic:decimal out_mismatch_nrl_max, basic:integer align_overhang_min, basic:integer align_sjdb_overhang_min, basic:integer align_intron_size_min, basic:integer align_intron_size_max, basic:integer align_gap_max, basic:string align_end_alignment, basic:boolean two_pass_mode, basic:boolean out_unmapped, basic:string out_sam_attributes, basic:string out_rg_line, list:basic:integer limit_buffer_size, basic:integer limit_sam_records, basic:integer limit_junction_reads, basic:integer limit_collapsed_junctions, basic:integer limit_inserted_junctions)[Source: v5.1.0]
Align reads with STAR aligner. Spliced Transcripts Alignment to a Reference (STAR) software is based on an alignment algorithm that uses sequential maximum mappable seed search in uncompressed suffix arrays followed by seed clustering and stitching procedure. In addition to unbiased de novo detection of canonical junctions, STAR can discover non-canonical splices and chimeric (fusion) transcripts, and is also capable of mapping full-length RNA sequences. More information can be found in the [STAR manual](https://github.com/alexdobin/STAR/blob/master/doc/STARmanual.pdf) and in the [original paper](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3530905/). The current version of STAR is 2.7.10b.
Input arguments
- label:
Input reads (FASTQ)
- type:
data:reads:fastq
- required:
True
- disabled:
False
- hidden:
False
- label:
Indexed reference genome
- type:
data:index:star
- description:
Genome index prepared by STAR aligner indexing tool.
- required:
True
- disabled:
False
- hidden:
False
- label:
Annotation file (GTF/GFF3)
- type:
data:annotation
- description:
Insert known annotations into genome indices at the mapping stage.
- required:
False
- disabled:
False
- hidden:
False
- label:
The data is unstranded [–outSAMstrandField intronMotif]
- type:
basic:boolean
- description:
For unstranded RNA-seq data, Cufflinks/Cuffdiff require spliced alignments with the XS strand attribute, which STAR will generate with the –outSAMstrandField intronMotif option. As required, the XS strand attribute will be generated for all alignments that contain splice junctions. The spliced alignments that have undefined strand (i.e. containing only non-canonical unannotated junctions) will be suppressed. If you have stranded RNA-seq data, you do not need to use any specific STAR options. Instead, you need to run Cufflinks with the –library-type option. For example, cufflinks –library-type fr-firststrand should be used for the standard dUTP protocol, including Illumina’s stranded Tru-Seq. This option has to be used only for Cufflinks runs and not for STAR runs.
- required:
True
- disabled:
False
- hidden:
False
- default:
False
- label:
Remove non-canonical junctions (Cufflinks compatibility)
- type:
basic:boolean
- description:
It is recommended to remove the non-canonical junctions for Cufflinks runs using –outFilterIntronMotifs RemoveNoncanonical.
- required:
True
- disabled:
False
- hidden:
False
- default:
False
- label:
Gene count [–quantMode GeneCounts]
- type:
basic:boolean
- description:
With this option set to True STAR will count the number of reads per gene while mapping. A read is counted if it overlaps (1nt or more) one and only one gene. Both ends of the paired-end read are checked for overlaps. The counts coincide with those produced by htseq-count with default parameters.
- required:
True
- disabled:
False
- hidden:
False
- default:
False
- label:
Feature type [–sjdbGTFfeatureExon]
- type:
basic:string
- description:
Feature type in GTF file to be used as exons for building transcripts.
- required:
True
- disabled:
False
- hidden:
False
- default:
exon
- label:
Junction length [–sjdbOverhang]
- type:
basic:integer
- description:
This parameter specifies the length of the genomic sequence around the annotated junction to be used in constructing the splice junction database. Ideally, this length should be equal to the ReadLength-1, where ReadLength is the length of the reads. For instance, for Illumina 2x100b paired-end reads, the ideal value is 100-1=99. In the case of reads of varying length, the ideal value is max(ReadLength)-1. In most cases, the default value of 100 will work as well as the ideal value.
- required:
True
- disabled:
False
- hidden:
False
- default:
100
- label:
Detect chimeric and circular alignments [–chimOutType SeparateSAMold]
- type:
basic:boolean
- description:
To switch on detection of chimeric (fusion) alignments (in addition to normal mapping), –chimSegmentMin should be set to a positive value. Each chimeric alignment consists of two segments. Each segment is non-chimeric on its own, but the segments are chimeric to each other (i.e. the segments belong to different chromosomes, or different strands, or are far from each other). Both segments may contain splice junctions, and one of the segments may contain portions of both mates. The –chimSegmentMin parameter controls the minimum mapped length of the two segments that is allowed. For example, if you have 2x75 reads and used –chimSegmentMin 20, a chimeric alignment with 130b on one chromosome and 20b on the other will be output, while 135 + 15 won’t be.
- required:
True
- disabled:
False
- hidden:
False
- default:
False
- label:
Minimum length of chimeric segment [–chimSegmentMin]
- type:
basic:integer
- required:
True
- disabled:
!detect_chimeric.chimeric
- hidden:
False
- default:
20
- label:
Output in transcript coordinates [–quantMode TranscriptomeSAM]
- type:
basic:boolean
- description:
With –quantMode TranscriptomeSAM option STAR will output alignments translated into transcript coordinates in the Aligned.toTranscriptome.out.bam file (in addition to alignments in genomic coordinates in Aligned.*.sam/bam files). These transcriptomic alignments can be used with various transcript quantification software that require reads to be mapped to transcriptome, such as RSEM or eXpress.
- required:
True
- disabled:
False
- hidden:
False
- default:
False
- label:
Allow soft-clipping and indels [–quantTranscriptomeBan Singleend]
- type:
basic:boolean
- description:
By default, the output satisfies RSEM requirements: soft-clipping or indels are not allowed. Use –quantTranscriptomeBan Singleend to allow insertions, deletions and soft-clips in the transcriptomic alignments, which can be used by some expression quantification software (e.g. eXpress).
- required:
True
- disabled:
!t_coordinates.quant_mode
- hidden:
False
- default:
False
- label:
Type of filtering [–outFilterType]
- type:
basic:string
- description:
Normal: standard filtering using only current alignment; BySJout: keep only those reads that contain junctions that passed filtering into SJ.out.tab.
- required:
True
- disabled:
False
- hidden:
False
- default:
Normal
- choices:
Normal:
Normal
BySJout:
BySJout
- label:
Maximum number of loci [–outFilterMultimapNmax]
- type:
basic:integer
- description:
Maximum number of loci the read is allowed to map to. Alignments (all of them) will be output only if the read maps to no more loci than this value. Otherwise no alignments will be output, and the read will be counted as ‘mapped to too many loci’ (default: 10).
- required:
False
- disabled:
False
- hidden:
False
- label:
Maximum number of mismatches [–outFilterMismatchNmax]
- type:
basic:integer
- description:
Alignment will be output only if it has no more mismatches than this value (default: 10). Large number (e.g. 999) switches off this filter.
- required:
False
- disabled:
False
- hidden:
False
- label:
Maximum no. of mismatches (map length) [–outFilterMismatchNoverLmax]
- type:
basic:decimal
- description:
Alignment will be output only if its ratio of mismatches to *mapped* length is less than or equal to this value (default: 0.3). The value should be between 0.0 and 1.0.
- required:
False
- disabled:
False
- hidden:
False
- label:
Minimum alignment score [–outFilterScoreMin]
- type:
basic:integer
- description:
Alignment will be output only if its score is higher than or equal to this value (default: 0).
- required:
False
- disabled:
False
- hidden:
False
- label:
Maximum no. of mismatches (read length) [–outFilterMismatchNoverReadLmax]
- type:
basic:decimal
- description:
Alignment will be output only if its ratio of mismatches to *read* length is less than or equal to this value (default: 1.0). Using 0.04 for 2x100bp, the max number of mismatches is calculated as 0.04*200=8 for the paired read. The value should be between 0.0 and 1.0.
- required:
False
- disabled:
False
- hidden:
False
- label:
Minimum overhang [–alignSJoverhangMin]
- type:
basic:integer
- description:
Minimum overhang (i.e. block size) for spliced alignments (default: 5).
- required:
False
- disabled:
False
- hidden:
False
- label:
Minimum overhang (sjdb) [–alignSJDBoverhangMin]
- type:
basic:integer
- description:
Minimum overhang (i.e. block size) for annotated (sjdb) spliced alignments (default: 3).
- required:
False
- disabled:
False
- hidden:
False
- label:
Minimum intron size [–alignIntronMin]
- type:
basic:integer
- description:
Minimum intron size: the genomic gap is considered an intron if its length >= alignIntronMin, otherwise it is considered Deletion (default: 21).
- required:
False
- disabled:
False
- hidden:
False
- label:
Maximum intron size [–alignIntronMax]
- type:
basic:integer
- description:
Maximum intron size; if 0, max intron size will be determined by (2^winBinNbits)*winAnchorDistNbins (default: 0).
- required:
False
- disabled:
False
- hidden:
False
- label:
Maximum gap between mates [–alignMatesGapMax]
- type:
basic:integer
- description:
Maximum gap between two mates; if 0, max intron gap will be determined by (2^winBinNbits)*winAnchorDistNbins (default: 0).
- required:
False
- disabled:
False
- hidden:
False
- label:
Read ends alignment [–alignEndsType]
- type:
basic:string
- description:
Type of read ends alignment (default: Local). Local: standard local alignment with soft-clipping allowed. EndToEnd: force end-to-end read alignment, do not soft-clip. Extend5pOfRead1: fully extend only the 5’ end of read1, all other ends use local alignment. Extend5pOfReads12: fully extend only the 5’ ends of both read1 and read2, all other ends use local alignment.
- required:
False
- disabled:
False
- hidden:
False
- choices:
Local:
Local
EndToEnd:
EndToEnd
Extend5pOfRead1:
Extend5pOfRead1
Extend5pOfReads12:
Extend5pOfReads12
- label:
Use two pass mode [–twopassMode]
- type:
basic:boolean
- description:
Use two-pass mapping instead of first-pass only. In two-pass mode we first perform first-pass mapping, extract junctions, insert them into genome index, and re-map all reads in the second mapping pass.
- required:
True
- disabled:
False
- hidden:
False
- default:
False
- label:
Output unmapped reads (SAM) [–outSAMunmapped Within]
- type:
basic:boolean
- description:
Output of unmapped reads in the SAM format.
- required:
True
- disabled:
False
- hidden:
False
- default:
False
- label:
Desired SAM attributes [–outSAMattributes]
- type:
basic:string
- description:
A string of desired SAM attributes, in the order desired for the output SAM.
- required:
True
- disabled:
False
- hidden:
False
- default:
Standard
- choices:
Standard:
Standard
All:
All
NH HI NM MD:
NH HI NM MD
None:
None
- label:
SAM/BAM read group line [–outSAMattrRGline]
- type:
basic:string
- description:
The first word contains the read group identifier and must start with ID:, e.g. –outSAMattrRGline ID:xxx CN:yy ”DS:z z z”. xxx will be added as the RG tag to each output alignment. Any spaces in the tag values have to be double quoted. Comma-separated RG lines correspond to different (comma-separated) input files in –readFilesIn. Commas have to be surrounded by spaces, e.g. –outSAMattrRGline ID:xxx , ID:zzz ”DS:z z” , ID:yyy DS:yyyy.
- required:
False
- disabled:
False
- hidden:
False
- label:
Buffer size [–limitIObufferSize]
- type:
list:basic:integer
- description:
Maximum available buffers size (bytes) for input/output, per thread. Parameter requires two numbers - separate sizes for input and output buffers.
- required:
True
- disabled:
False
- hidden:
False
- default:
[30000000, 50000000]
- label:
Maximum size of the SAM record [–limitOutSAMoneReadBytes]
- type:
basic:integer
- description:
Maximum size of the SAM record (bytes) for one read. Recommended value: > 2*(LengthMate1+LengthMate2+100)*outFilterMultimapNmax.
- required:
True
- disabled:
False
- hidden:
False
- default:
100000
- label:
Maximum number of junctions [–limitOutSJoneRead]
- type:
basic:integer
- description:
Maximum number of junctions for one read (including all multi-mappers).
- required:
True
- disabled:
False
- hidden:
False
- default:
1000
- label:
Maximum number of collapsed junctions [–limitOutSJcollapsed]
- type:
basic:integer
- required:
True
- disabled:
False
- hidden:
False
- default:
1000000
- label:
Maximum number of junctions to be inserted [–limitSjdbInsertNsj]
- type:
basic:integer
- description:
Maximum number of junctions to be inserted to the genome on the fly at the mapping stage, including those from annotations and those detected in the 1st step of the 2-pass run.
- required:
True
- disabled:
False
- hidden:
False
- default:
1000000
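Several of the descriptions above contain small worked calculations (junction length, mismatch ratio, SAM record size, chimeric segment length). The sketch below restates them as plain Python so that values for other read lengths can be checked quickly. The function names are hypothetical, and the formulas are taken only from the descriptions above, not from the STAR source code.

```python
import math


def ideal_sjdb_overhang(read_lengths):
    """Ideal junction length: max(ReadLength) - 1."""
    return max(read_lengths) - 1


def max_mismatches(ratio, mate_lengths):
    """Mismatch limit implied by a mismatches-to-read-length ratio."""
    # A small epsilon guards against floating-point round-off before flooring.
    return math.floor(ratio * sum(mate_lengths) + 1e-9)


def min_sam_record_bytes(len_mate1, len_mate2, multimap_nmax):
    """Lower bound for the SAM record size; the setting should exceed it."""
    return 2 * (len_mate1 + len_mate2 + 100) * multimap_nmax


def chimeric_segments_pass(segment_a, segment_b, chim_segment_min):
    """Both chimeric segments must reach the minimum mapped length."""
    return min(segment_a, segment_b) >= chim_segment_min


if __name__ == "__main__":
    print(ideal_sjdb_overhang([100, 100]))      # 2x100 reads
    print(max_mismatches(0.04, (100, 100)))     # ratio 0.04 on 2x100 reads
    print(min_sam_record_bytes(100, 100, 10))   # compare with the default 100000
    print(chimeric_segments_pass(130, 20, 20), chimeric_segments_pass(135, 15, 20))
```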
Output results
- label:
Alignment file
- type:
basic:file
- required:
True
- disabled:
False
- hidden:
False
- label:
BAM file index
- type:
basic:file
- required:
True
- disabled:
False
- hidden:
False
- label:
Unmapped reads (mate 1)
- type:
basic:file
- required:
False
- disabled:
False
- hidden:
False
- label:
Unmapped reads (mate 2)
- type:
basic:file
- required:
False
- disabled:
False
- hidden:
False
- label:
Splice junctions
- type:
basic:file
- required:
True
- disabled:
False
- hidden:
False
- label:
Chimeric alignments
- type:
basic:file
- required:
False
- disabled:
False
- hidden:
False
- label:
Alignment (transcriptome coordinates)
- type:
basic:file
- required:
False
- disabled:
False
- hidden:
False
- label:
Gene counts
- type:
basic:file
- required:
False
- disabled:
False
- hidden:
False
- label:
Statistics
- type:
basic:file
- required:
True
- disabled:
False
- hidden:
False
- label:
Species
- type:
basic:string
- required:
True
- disabled:
False
- hidden:
False
- label:
Build
- type:
basic:string
- required:
True
- disabled:
False
- hidden:
False
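The bracketed flags in the input labels above indicate which STAR option each boolean input corresponds to. The sketch below shows one way such inputs could be turned into a STAR argument list. It is an assumption about the mapping, not the process's actual code: the function name `star_arguments`, the dictionary layout, and the file paths are hypothetical, and the flags are written with the double hyphen that STAR expects on the command line.

```python
def star_arguments(genome_dir, read_files, options):
    """Build a STAR argument list from a dict of process-style inputs."""
    args = ["STAR", "--genomeDir", genome_dir, "--readFilesIn", *read_files]
    if options.get("unstranded"):
        args += ["--outSAMstrandField", "intronMotif"]
    if options.get("noncannonical"):
        args += ["--outFilterIntronMotifs", "RemoveNoncanonical"]
    # --quantMode accepts one or more modes after a single flag.
    quant_modes = []
    if options.get("gene_counts"):
        quant_modes.append("GeneCounts")
    if options.get("quant_mode"):
        quant_modes.append("TranscriptomeSAM")
    if quant_modes:
        args += ["--quantMode", *quant_modes]
    if options.get("chimeric"):
        args += [
            "--chimOutType", "SeparateSAMold",
            "--chimSegmentMin", str(options.get("chim_segment_min", 20)),
        ]
    if options.get("out_unmapped"):
        args += ["--outSAMunmapped", "Within"]
    return args


if __name__ == "__main__":
    # Paths below are placeholders.
    print(" ".join(star_arguments(
        "star_index", ["mate1.fastq.gz", "mate2.fastq.gz"],
        {"gene_counts": True, "chimeric": True},
    )))
```

Returning a list rather than a single string keeps the arguments safe to pass to `subprocess.run` without shell quoting.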
STAR genome index
- data:index:star:alignment-star-index (data:seq:nucleotide ref_seq, data:annotation annotation, basic:string source, basic:string feature_exon, basic:integer sjdb_overhang, basic:integer genome_sa_string_len, basic:integer genome_chr_bin_size, basic:integer genome_sa_sparsity)[Source: v4.0.0]
Generate STAR genome index. Generate genome index files from the supplied reference genome sequence and GTF files. The current version of STAR is 2.7.10b.
Input arguments
- label:
Reference sequence (nucleotide FASTA)
- type:
data:seq:nucleotide
- required:
True
- disabled:
False
- hidden:
False
- label:
Annotation file (GTF/GFF3)
- type:
data:annotation
- description:
Insert known annotations into genome indices at the indexing stage.
- required:
False
- disabled:
False
- hidden:
False
- label:
Gene ID Database Source
- type:
basic:string
- required:
False
- disabled:
annotation
- hidden:
False
- choices:
ENSEMBL:
ENSEMBL
NCBI:
NCBI
UCSC:
UCSC
- label:
Feature type [–sjdbGTFfeatureExon]
- type:
basic:string
- description:
Feature type in GTF file to be used as exons for building transcripts.
- required:
True
- disabled:
False
- hidden:
False
- default:
exon
- label:
Junction length [–sjdbOverhang]
- type:
basic:integer
- description:
This parameter specifies the length of the genomic sequence around the annotated junction to be used in constructing the splice junction database. Ideally, this length should be equal to the ReadLength-1, where ReadLength is the length of the reads. For instance, for Illumina 2x100b paired-end reads, the ideal value is 100-1=99. In case of reads of varying length, the ideal value is max(ReadLength)-1. In most cases, the default value of 100 will work as well as the ideal value.
- required:
True
- disabled:
False
- hidden:
False
- default:
100
- label:
Small genome adjustment [–genomeSAindexNbases]
- type:
basic:integer
- description:
For small genomes, the parameter –genomeSAindexNbases needs to be scaled down, with a typical value of min(14, log2(GenomeLength)/2 - 1). For example, for 1 megaBase genome, this is equal to 9, for 100 kiloBase genome, this is equal to 7.
- required:
False
- disabled:
False
- hidden:
False
- label:
Bin size for genome storage [–genomeChrBinNbits]
- type:
basic:integer
- description:
If you are using a genome with a large (>5,000) number of references (chromosomes/scaffolds), you may need to reduce the –genomeChrBinNbits to reduce RAM consumption. The following scaling is recommended: –genomeChrBinNbits = min(18, log2(GenomeLength / NumberOfReferences)). For example, for 3 gigaBase genome with 100,000 chromosomes/scaffolds, this is equal to 15.
- required:
False
- disabled:
False
- hidden:
False
- label:
Suffix array sparsity [–genomeSAsparseD]
- type:
basic:integer
- description:
Suffix array sparsity, i.e. distance between indices: use bigger numbers to decrease needed RAM at the cost of mapping speed reduction (integer > 0, default = 1).
- required:
False
- disabled:
False
- hidden:
False
Output results
- label:
Indexed genome
- type:
basic:dir
- required:
True
- disabled:
False
- hidden:
False
- label:
FASTA file (compressed)
- type:
basic:file
- required:
True
- disabled:
False
- hidden:
False
- label:
FASTA file
- type:
basic:file
- required:
True
- disabled:
False
- hidden:
False
- label:
FASTA file index
- type:
basic:file
- required:
True
- disabled:
False
- hidden:
False
- label:
Gene ID source
- type:
basic:string
- required:
True
- disabled:
False
- hidden:
False
- label:
Species
- type:
basic:string
- required:
True
- disabled:
False
- hidden:
False
- label:
Build
- type:
basic:string
- required:
True
- disabled:
False
- hidden:
False
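The two scaling formulas given in the input descriptions above can be evaluated directly. The sketch below does so in Python; rounding to the nearest integer is an assumption chosen because it reproduces the worked examples in the descriptions (9, 7 and 15), and the function names are hypothetical.

```python
import math


def genome_sa_index_nbases(genome_length):
    """min(14, log2(GenomeLength)/2 - 1), rounded to the nearest integer."""
    return min(14, round(math.log2(genome_length) / 2 - 1))


def genome_chr_bin_nbits(genome_length, number_of_references):
    """min(18, log2(GenomeLength / NumberOfReferences)), rounded."""
    return min(18, round(math.log2(genome_length / number_of_references)))


if __name__ == "__main__":
    print(genome_sa_index_nbases(1_000_000))               # 1 megabase genome
    print(genome_sa_index_nbases(100_000))                 # 100 kilobase genome
    print(genome_chr_bin_nbits(3_000_000_000, 100_000))    # 3 gigabases, 100,000 scaffolds
```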
STAR-based gene quantification workflow
- data:workflow:rnaseq:star:qc:workflow-bbduk-star-qc (data:reads:fastq reads, data:index:star genome, data:annotation annotation, basic:string assay_type, data:index:salmon cdna_index, data:index:star rrna_reference, data:index:star globin_reference, list:data:seq:nucleotide adapters, list:basic:string custom_adapter_sequences, basic:integer kmer_length, basic:integer min_k, basic:integer hamming_distance, basic:integer maxns, basic:integer trim_quality, basic:integer min_length, basic:string quality_encoding_offset, basic:boolean ignore_bad_quality, basic:boolean unstranded, basic:boolean noncannonical, basic:boolean chimeric, basic:integer chim_segment_min, basic:boolean quant_mode, basic:boolean single_end, basic:string out_filter_type, basic:integer out_multimap_max, basic:integer out_mismatch_max, basic:decimal out_mismatch_nl_max, basic:integer out_score_min, basic:decimal out_mismatch_nrl_max, basic:integer align_overhang_min, basic:integer align_sjdb_overhang_min, basic:integer align_intron_size_min, basic:integer align_intron_size_max, basic:integer align_gap_max, basic:string align_end_alignment, basic:boolean two_pass_mode, basic:boolean out_unmapped, basic:string out_sam_attributes, basic:string out_rg_line, basic:integer n_reads, basic:integer n_reads, basic:integer seed, basic:decimal fraction, basic:boolean two_pass)[Source: v1.4.0]
STAR-based RNA-seq pipeline. First, reads are preprocessed by __BBDuk__, which removes adapters, trims reads for quality from the 3’-end, and discards reads that are too short after trimming. Compared to similar tools, BBDuk is well regarded for its computational efficiency. Next, preprocessed reads are aligned by the __STAR__ aligner. At the time of implementation, STAR is considered a state-of-the-art tool that consistently produces accurate results from diverse sets of reads and performs well even with default settings. The STAR aligner counts and reports the number of aligned reads per gene while mapping. The STAR version used is 2.7.10b. For more information see [this comparison of RNA-seq aligners](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5792058/). The rRNA contamination rate in the sample is determined using the STAR aligner: quality-trimmed reads are downsampled (using the __Seqtk__ tool) and aligned to the rRNA reference sequences. The alignment rate indicates the percentage of the reads in the sample that are derived from the rRNA sequences. The final step of the workflow is QoRTs QC analysis with downsampled reads.
Input arguments
- label:
Reads (FASTQ)
- type:
data:reads:fastq
- description:
Reads in FASTQ file, single or paired end.
- required:
True
- disabled:
False
- hidden:
False
- label:
Indexed reference genome
- type:
data:index:star
- description:
Genome index prepared by STAR aligner indexing tool.
- required:
True
- disabled:
False
- hidden:
False
- label:
Annotation
- type:
data:annotation
- description:
GTF and GFF3 annotation formats are supported.
- required:
True
- disabled:
False
- hidden:
False
- label:
Assay type
- type:
basic:string
- description:
In strand non-specific assay a read is considered overlapping with a feature regardless of whether it is mapped to the same or the opposite strand as the feature. In strand-specific forward assay and single reads, the read has to be mapped to the same strand as the feature. For paired-end reads, the first read has to be on the same strand and the second read on the opposite strand. In strand-specific reverse assay these rules are reversed.
- required:
True
- disabled:
False
- hidden:
False
- default:
non_specific
- choices:
Strand non-specific:
non_specific
Strand-specific forward:
forward
Strand-specific reverse:
reverse
Detect automatically:
auto
- label:
Indexed cDNA reference sequence
- type:
data:index:salmon
- description:
Transcriptome index file created using the Salmon indexing tool. The cDNA (transcriptome) sequences used to create the index must be derived from the same species as the input sequencing reads to obtain reliable analysis results.
- required:
False
- disabled:
False
- hidden:
assay_type != 'auto'
- label:
Indexed rRNA reference sequence
- type:
data:index:star
- description:
Reference sequence index prepared by STAR aligner indexing tool.
- required:
True
- disabled:
False
- hidden:
False
- label:
Indexed Globin reference sequence
- type:
data:index:star
- description:
Reference sequence index prepared by STAR aligner indexing tool.
- required:
True
- disabled:
False
- hidden:
False
- label:
Adapters
- type:
list:data:seq:nucleotide
- description:
FASTA file(s) with adapters.
- required:
False
- disabled:
False
- hidden:
False
- label:
Custom adapter sequences
- type:
list:basic:string
- description:
Custom adapter sequences can be specified by inputting them one by one and pressing Enter after each sequence.
- required:
False
- disabled:
False
- hidden:
False
- default:
[]
- label:
K-mer length [k=]
- type:
basic:integer
- description:
K-mer length used for finding contaminants. Contaminants shorter than k-mer length will not be found. K-mer length must be at least 1.
- required:
True
- disabled:
False
- hidden:
False
- default:
23
- label:
Minimum k-mer length at right end of reads used for trimming [mink=]
- type:
basic:integer
- required:
True
- disabled:
preprocessing.adapters.length === 0 && preprocessing.custom_adapter_sequences.length === 0
- hidden:
False
- default:
11
- label:
Maximum Hamming distance for k-mers [hammingdistance=]
- type:
basic:integer
- description:
Hamming distance i.e. the number of mismatches allowed in the k-mer.
- required:
True
- disabled:
False
- hidden:
False
- default:
1
- label:
Max Ns after trimming [maxns=]
- type:
basic:integer
- description:
If non-negative, reads with more Ns than this (after trimming) will be discarded.
- required:
True
- disabled:
False
- hidden:
False
- default:
-1
- label:
Average quality below which to trim region [trimq=]
- type:
basic:integer
- description:
Phred algorithm is used, which is more accurate than naive trimming.
- required:
True
- disabled:
False
- hidden:
False
- default:
10
- label:
Minimum read length [minlength=]
- type:
basic:integer
- description:
Reads shorter than minimum read length after trimming are discarded.
- required:
True
- disabled:
False
- hidden:
False
- default:
20
- label:
Quality encoding offset [qin=]
- type:
basic:string
- description:
Quality encoding offset for input FASTQ files.
- required:
True
- disabled:
False
- hidden:
False
- default:
auto
- choices:
Sanger / Illumina 1.8+:
33
Illumina up to 1.3+, 1.5+:
64
Auto:
auto
- label:
Ignore bad quality [ignorebadquality]
- type:
basic:boolean
- description:
Don’t crash if quality values appear to be incorrect.
- required:
True
- disabled:
False
- hidden:
False
- default:
False
- label:
The data is unstranded [--outSAMstrandField intronMotif]
- type:
basic:boolean
- description:
For unstranded RNA-seq data, Cufflinks/Cuffdiff require spliced alignments with XS strand attribute, which STAR will generate with the --outSAMstrandField intronMotif option. As required, the XS strand attribute will be generated for all alignments that contain splice junctions. The spliced alignments that have undefined strand (i.e. containing only non-canonical unannotated junctions) will be suppressed. If you have stranded RNA-seq data, you do not need to use any specific STAR options. Instead, you need to run Cufflinks with the --library-type option. For example, cufflinks --library-type fr-firststrand should be used for the standard dUTP protocol, including Illumina’s stranded TruSeq. This option has to be used only for Cufflinks runs and not for STAR runs.
- required:
True
- disabled:
False
- hidden:
False
- default:
False
- label:
Remove non-canonical junctions (Cufflinks compatibility)
- type:
basic:boolean
- description:
It is recommended to remove the non-canonical junctions for Cufflinks runs using --outFilterIntronMotifs RemoveNoncanonical.
- required:
True
- disabled:
False
- hidden:
False
- default:
False
- label:
Detect chimeric and circular alignments [--chimOutType SeparateSAMold]
- type:
basic:boolean
- description:
To switch on detection of chimeric (fusion) alignments (in addition to normal mapping), --chimSegmentMin should be set to a positive value. Each chimeric alignment consists of two segments. Each segment is non-chimeric on its own, but the segments are chimeric to each other (i.e. the segments belong to different chromosomes, or different strands, or are far from each other). Both segments may contain splice junctions, and one of the segments may contain portions of both mates. --chimSegmentMin parameter controls the minimum mapped length of the two segments that is allowed. For example, if you have 2x75 reads and used --chimSegmentMin 20, a chimeric alignment with 130b on one chromosome and 20b on the other will be output, while 135 + 15 won’t be.
- required:
True
- disabled:
False
- hidden:
False
- default:
False
- label:
Minimum length of chimeric segment [--chimSegmentMin]
- type:
basic:integer
- required:
True
- disabled:
!alignment.chimeric_reads.chimeric
- hidden:
False
- default:
20
- label:
Output in transcript coordinates [--quantMode]
- type:
basic:boolean
- description:
With --quantMode TranscriptomeSAM option STAR will output alignments translated into transcript coordinates in the Aligned.toTranscriptome.out.bam file (in addition to alignments in genomic coordinates in Aligned.*.sam/bam files). These transcriptomic alignments can be used with various transcript quantification software that require reads to be mapped to transcriptome, such as RSEM or eXpress.
- required:
True
- disabled:
False
- hidden:
False
- default:
False
- label:
Allow soft-clipping and indels [--quantTranscriptomeBan Singleend]
- type:
basic:boolean
- description:
By default, the output satisfies RSEM requirements: soft-clipping or indels are not allowed. Use --quantTranscriptomeBan Singleend to allow insertions, deletions and soft-clips in the transcriptomic alignments, which can be used by some expression quantification software (e.g. eXpress).
- required:
True
- disabled:
!t_coordinates.quant_mode
- hidden:
False
- default:
False
- label:
Type of filtering [--outFilterType]
- type:
basic:string
- description:
Normal: standard filtering using only current alignment; BySJout: keep only those reads that contain junctions that passed filtering into SJ.out.tab.
- required:
True
- disabled:
False
- hidden:
False
- default:
Normal
- choices:
Normal:
Normal
BySJout:
BySJout
- label:
Maximum number of loci [--outFilterMultimapNmax]
- type:
basic:integer
- description:
Maximum number of loci the read is allowed to map to. Alignments (all of them) will be output only if the read maps to no more loci than this value. Otherwise no alignments will be output, and the read will be counted as ‘mapped to too many loci’ (default: 10).
- required:
False
- disabled:
False
- hidden:
False
- label:
Maximum number of mismatches [--outFilterMismatchNmax]
- type:
basic:integer
- description:
Alignment will be output only if it has no more mismatches than this value (default: 10). A large number (e.g. 999) switches off this filter.
- required:
False
- disabled:
False
- hidden:
False
- label:
Maximum no. of mismatches (map length) [--outFilterMismatchNoverLmax]
- type:
basic:decimal
- description:
Alignment will be output only if its ratio of mismatches to *mapped* length is less than or equal to this value (default: 0.3). The value should be between 0.0 and 1.0.
- required:
False
- disabled:
False
- hidden:
False
- label:
Minimum alignment score [--outFilterScoreMin]
- type:
basic:integer
- description:
Alignment will be output only if its score is higher than or equal to this value (default: 0).
- required:
False
- disabled:
False
- hidden:
False
- label:
Maximum no. of mismatches (read length) [--outFilterMismatchNoverReadLmax]
- type:
basic:decimal
- description:
Alignment will be output only if its ratio of mismatches to *read* length is less than or equal to this value (default: 1.0). Using 0.04 for 2x100bp, the max number of mismatches is calculated as 0.04*200=8 for the paired read. The value should be between 0.0 and 1.0.
- required:
False
- disabled:
False
- hidden:
False
- label:
Minimum overhang [--alignSJoverhangMin]
- type:
basic:integer
- description:
Minimum overhang (i.e. block size) for spliced alignments (default: 5).
- required:
False
- disabled:
False
- hidden:
False
- label:
Minimum overhang (sjdb) [--alignSJDBoverhangMin]
- type:
basic:integer
- description:
Minimum overhang (i.e. block size) for annotated (sjdb) spliced alignments (default: 3).
- required:
False
- disabled:
False
- hidden:
False
- label:
Minimum intron size [--alignIntronMin]
- type:
basic:integer
- description:
Minimum intron size: the genomic gap is considered an intron if its length >= alignIntronMin; otherwise it is considered a deletion (default: 21).
- required:
False
- disabled:
False
- hidden:
False
- label:
Maximum intron size [--alignIntronMax]
- type:
basic:integer
- description:
Maximum intron size. If 0, the maximum intron size will be determined by (2^winBinNbits)*winAnchorDistNbins (default: 0).
- required:
False
- disabled:
False
- hidden:
False
- label:
Maximum gap between mates [--alignMatesGapMax]
- type:
basic:integer
- description:
Maximum gap between two mates. If 0, the maximum gap will be determined by (2^winBinNbits)*winAnchorDistNbins (default: 0).
- required:
False
- disabled:
False
- hidden:
False
- label:
Read ends alignment [--alignEndsType]
- type:
basic:string
- description:
Type of read ends alignment (default: Local). Local: standard local alignment with soft-clipping allowed. EndToEnd: force end-to-end read alignment, do not soft-clip. Extend5pOfRead1: fully extend only the 5’ end of read1; all other ends use local alignment. Extend5pOfReads12: fully extend only the 5’ ends of both read1 and read2; all other ends use local alignment.
- required:
True
- disabled:
False
- hidden:
False
- default:
Local
- choices:
Local:
Local
EndToEnd:
EndToEnd
Extend5pOfRead1:
Extend5pOfRead1
Extend5pOfReads12:
Extend5pOfReads12
- label:
Use two pass mode [--twopassMode]
- type:
basic:boolean
- description:
Use two-pass mapping instead of first-pass only. In two-pass mode we first perform first-pass mapping, extract junctions, insert them into the genome index, and re-map all reads in the second mapping pass.
- required:
True
- disabled:
False
- hidden:
False
- default:
True
- label:
Output unmapped reads (SAM) [--outSAMunmapped Within]
- type:
basic:boolean
- description:
Output of unmapped reads in the SAM format.
- required:
True
- disabled:
False
- hidden:
False
- default:
True
- label:
Desired SAM attributes [--outSAMattributes]
- type:
basic:string
- description:
A string of desired SAM attributes, in the order desired for the output SAM.
- required:
True
- disabled:
False
- hidden:
False
- default:
Standard
- choices:
Standard:
Standard
All:
All
NH HI NM MD:
NH HI NM MD
None:
None
- label:
SAM/BAM read group line [--outSAMattrRGline]
- type:
basic:string
- description:
The first word contains the read group identifier and must start with ID:, e.g. --outSAMattrRGline ID:xxx CN:yy "DS:z z z". xxx will be added as RG tag to each output alignment. Any spaces in the tag values have to be double quoted. Comma-separated RG lines correspond to different (comma-separated) input files in --readFilesIn. Commas have to be surrounded by spaces, e.g. --outSAMattrRGline ID:xxx , ID:zzz "DS:z z" , ID:yyy DS:yyyy.
- required:
False
- disabled:
False
- hidden:
False
- label:
Number of reads in subsampled alignment file for strandedness detection
- type:
basic:integer
- description:
Alignment (.bam) file subsample size. Increase the number of reads to make automatic detection more reliable. Decrease the number of reads to make automatic detection run faster.
- required:
True
- disabled:
False
- hidden:
assay_type != 'auto'
- default:
5000000
- label:
Number of reads
- type:
basic:integer
- description:
Number of reads to include in downsampling.
- required:
True
- disabled:
False
- hidden:
False
- default:
1000000
- label:
Seed [-s]
- type:
basic:integer
- description:
Using the same random seed makes reads downsampling more reproducible in different environments.
- required:
True
- disabled:
False
- hidden:
False
- default:
11
- label:
Fraction of reads used
- type:
basic:decimal
- description:
Use the fraction of reads [0.0 - 1.0] from the original input file instead of the absolute number of reads. If set, this will override the ‘Number of reads’ input parameter.
- required:
False
- disabled:
False
- hidden:
False
- label:
2-pass mode [-2]
- type:
basic:boolean
- description:
Enable two-pass mode when downsampling. Two-pass mode is twice as slow but uses much less memory.
- required:
True
- disabled:
False
- hidden:
False
- default:
False
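Several of the STAR alignment inputs above include small worked examples (the 0.04*200=8 mismatch limit, the 130 + 20 versus 135 + 15 chimeric segments, and the window formula used when the maximum intron size is 0). The sketch below reproduces that arithmetic. The function names are hypothetical, and the values 16 and 9 passed for winBinNbits and winAnchorDistNbins are assumed STAR defaults that should be checked against the STAR manual for the version in use.

```python
import math


def max_mismatches(ratio: float, total_read_length: int) -> int:
    """Largest mismatch count whose ratio to read length is <= ratio."""
    # A small epsilon guards against floating-point round-off below an integer.
    return math.floor(ratio * total_read_length + 1e-9)


def chimeric_alignment_reported(segment_a: int, segment_b: int,
                                chim_segment_min: int) -> bool:
    """Both segments must reach the minimum mapped length."""
    if chim_segment_min <= 0:
        return False  # chimeric detection is switched off
    return min(segment_a, segment_b) >= chim_segment_min


def default_window(win_bin_nbits: int, win_anchor_dist_nbins: int) -> int:
    """(2^winBinNbits) * winAnchorDistNbins, used when the limit is set to 0."""
    return (2 ** win_bin_nbits) * win_anchor_dist_nbins


print(max_mismatches(0.04, 200))                 # 2x100bp pair -> 8
print(chimeric_alignment_reported(130, 20, 20))  # True
print(chimeric_alignment_reported(135, 15, 20))  # False
print(default_window(16, 9))                     # assumed defaults -> 589824
```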
Output results
Salmon Index
- data:index:salmon:salmon-index (data:seq:nucleotide nucl, data:file decoys, basic:boolean gencode, basic:boolean keep_duplicates, basic:string source, basic:string species, basic:string build, basic:integer kmerlen)[Source: v2.2.1]
Generate index files for Salmon transcript quantification tool.
Input arguments
- label:
Nucleotide sequence
- type:
data:seq:nucleotide
- description:
A CDS sequence file in .FASTA format.
- label:
Decoys
- type:
data:file
- description:
Treat these sequences as decoys that may have sequence homologous to some known transcript.
- required:
False
- label:
Gencode
- type:
basic:boolean
- description:
This flag will expect the input transcript FASTA to be in GENCODE format, and will split the transcript name at the first ‘|’ character. These reduced names will be used in the output and when looking for these transcripts in a gene to transcript GTF.
- default:
False
- label:
Keep duplicates
- type:
basic:boolean
- description:
This flag will disable the default indexing behavior of discarding sequence-identical duplicate transcripts. If this flag is passed, then duplicate transcripts that appear in the input will be retained and quantified separately.
- default:
False
- label:
Source of attribute ID
- type:
basic:string
- choices:
DICTYBASE:
DICTYBASE
ENSEMBL:
ENSEMBL
NCBI:
NCBI
UCSC:
UCSC
- label:
Species
- type:
basic:string
- description:
Species Latin name.
- choices:
Homo sapiens:
Homo sapiens
Mus musculus:
Mus musculus
Rattus norvegicus:
Rattus norvegicus
Dictyostelium discoideum:
Dictyostelium discoideum
- label:
Genome build
- type:
basic:string
- label:
Size of k-mers
- type:
basic:integer
- description:
The size of k-mers that should be used for the quasi index. We find that a k of 31 seems to work well for reads of 75bp or longer, but you might consider a smaller k if you plan to deal with shorter reads.
- default:
31
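The Gencode option above shortens transcript names by cutting them at the first '|' character. A minimal sketch of that name reduction is shown below; the helper name and the example header are hypothetical, and the real splitting is done by Salmon itself.

```python
def reduce_gencode_name(header: str) -> str:
    """Return the part of a FASTA header before the first '|' character."""
    # Drop the leading '>' of a FASTA header line if it is present.
    name = header[1:] if header.startswith(">") else header
    return name.split("|", 1)[0].strip()


# Hypothetical GENCODE-style header with pipe-separated fields.
header = ">TRANSCRIPT0001.1|GENE0001.1|EXAMPLE-201|EXAMPLE|1500|"
print(reduce_gencode_name(header))  # TRANSCRIPT0001.1
```

Names without a '|' character are returned unchanged, so the helper is safe to apply to non-GENCODE headers as well.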
Output results
- label:
Salmon index
- type:
basic:dir
- label:
Source of attribute ID
- type:
basic:string
- label:
Species
- type:
basic:string
- label:
Build
- type:
basic:string
Samtools bedcov
- data:bedcov:samtools-bedcov (data:alignment:bam bam, data:bed bedfile, basic:integer min_read_qual, basic:boolean rm_del_ref_skips, basic:string output_option)[Source: v1.2.0]
Samtools bedcov. Reports the total read base count (i.e. the sum of per base read depths) for each genomic region specified in the supplied BED file. The regions are output as they appear in the BED file and are 0-based. The output is formatted as tab-delimited data, where the initial three columns indicate the chromosome, start, and end positions of the region. The subsequent column provides either the cumulative read base counts or the normalized sum of read base counts based on the length of each individual region (mean coverage). For more information about samtools bedcov, click [here](https://www.htslib.org/doc/samtools-bedcov.html).
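The mean coverage mentioned above is the summed read base count divided by the region length. The sketch below shows that normalization for one output line. It assumes a three-column BED input, so the count sits in the fourth column; the function name and the example line are hypothetical and do not come from the process code.

```python
def mean_coverage(bedcov_line: str) -> float:
    """Convert one 'chrom, start, end, sum' line to mean per-base coverage."""
    fields = bedcov_line.rstrip("\n").split("\t")
    start, end, base_count = int(fields[1]), int(fields[2]), int(fields[3])
    # BED intervals are 0-based and half-open, so the length is end - start.
    region_length = end - start
    if region_length <= 0:
        raise ValueError("region end must be greater than region start")
    return base_count / region_length


# Hypothetical line: 5000 read bases summed over a 100 bp region.
print(mean_coverage("chr1\t100\t200\t5000"))  # 50.0
```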
Input arguments
- label:
Input BAM file
- type:
data:alignment:bam
- required:
True
- disabled:
False
- hidden:
False
- label:
Target BED file
- type:
data:bed
- description:
Target BED file with regions to extract.
- required:
True
- disabled:
False
- hidden:
False
- label:
Minimum read mapping quality
- type:
basic:integer
- description:
Only count reads with mapping quality greater than or equal to this value. [-Q]
- required:
False
- disabled:
False
- hidden:
False
- label:
Skip deletions and ref skips
- type:
basic:boolean
- description:
Do not include deletions (D) and ref skips (N) in bedcov computation. [-j]
- required:
True
- disabled:
False
- hidden:
False
- default:
False
- label:
Metric by which to output coverage
- type:
basic:string
- description:
Opt for either displaying the cumulative read base counts or the normalized read base counts based on the length of each region. The latter approach is not part of samtools but implemented within the resolwe-bio process.
- required:
False
- disabled:
False
- hidden:
False
- default:
sum
- choices:
Sum (default):
sum
Mean:
mean
Output results
- label:
Output coverage report
- type:
basic:file
- required:
True
- disabled:
False
- hidden:
False
Samtools coverage (multi-sample)
- data:samtoolscoverage:multi:samtools-coverage-multi (list:data:alignment:bam bam, basic:string region, basic:integer min_read_length, basic:integer min_mq, basic:integer min_bq, list:basic:string excl_flags, basic:integer depth, basic:boolean no_header)[Source: v1.0.0]
Samtools coverage for multiple BAM files. Computes the depth at each position or region and creates tabulated text. For more information about samtools coverage, click [here](https://www.htslib.org/doc/samtools-coverage.html).
Input arguments
- label:
Input BAM files
- type:
list:data:alignment:bam
- description:
Select BAM file(s) for the analysis. Coverage information will be calculated from the merged alignments.
- required:
True
- disabled:
False
- hidden:
False
- label:
Region
- type:
basic:string
- description:
Region can be specified as: RNAME:STARTPOS-ENDPOS and all position coordinates are 1-based, where RNAME is the name of the contig. If the input BAM file was generated by General RNA-seq pipeline, you should use only chromosome numbers to subset the input file, e.g. 3:30293-39103.
- required:
False
- disabled:
False
- hidden:
False
- label:
Minimum read length
- type:
basic:integer
- description:
Ignore reads shorter than specified number of base pairs.
- required:
False
- disabled:
False
- hidden:
False
- label:
Minimum mapping quality
- type:
basic:integer
- description:
Minimum mapping quality for an alignment to be used.
- required:
False
- disabled:
False
- hidden:
False
- label:
Minimum base quality
- type:
basic:integer
- description:
Minimum base quality for a base to be considered.
- required:
False
- disabled:
False
- hidden:
False
- label:
Filter flags
- type:
list:basic:string
- description:
Filter flags: skip reads with mask bits set. Press ENTER after each flag.
- required:
True
- disabled:
False
- hidden:
False
- default:
['UNMAP', 'SECONDARY', 'QCFAIL', 'DUP']
- label:
Maximum allowed coverage depth
- type:
basic:integer
- description:
If 0, depth is set to the maximum integer value, effectively removing any depth limit.
- required:
True
- disabled:
False
- hidden:
False
- default:
1000000
- label:
No header
- type:
basic:boolean
- description:
Do not output header.
- required:
True
- disabled:
False
- hidden:
False
- default:
False
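The Region input above uses the RNAME:STARTPOS-ENDPOS form with 1-based coordinates. The following sketch parses such a string and converts it to a 0-based, half-open interval as used in BED files. The names are hypothetical, and samtools accepts more region syntaxes than this pattern covers.

```python
import re
from typing import NamedTuple, Tuple


class Region(NamedTuple):
    rname: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive


# The greedy contig group lets the last ':' separate name and coordinates.
_REGION_RE = re.compile(r"^(?P<rname>.+):(?P<start>\d+)-(?P<end>\d+)$")


def parse_region(text: str) -> Region:
    """Parse 'RNAME:STARTPOS-ENDPOS' into its three components."""
    match = _REGION_RE.match(text.strip())
    if match is None:
        raise ValueError(f"not in RNAME:STARTPOS-ENDPOS form: {text!r}")
    start, end = int(match["start"]), int(match["end"])
    if start < 1 or end < start:
        raise ValueError("coordinates must satisfy 1 <= start <= end")
    return Region(match["rname"], start, end)


def to_bed_interval(region: Region) -> Tuple[str, int, int]:
    """Convert a 1-based inclusive region to 0-based half-open coordinates."""
    return (region.rname, region.start - 1, region.end)


region = parse_region("3:30293-39103")
print(region)                   # Region(rname='3', start=30293, end=39103)
print(to_bed_interval(region))  # ('3', 30292, 39103)
```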
Output results
- label:
Output coverage table
- type:
basic:file
- required:
True
- disabled:
False
- hidden:
False
- label:
Build
- type:
basic:string
- required:
True
- disabled:
False
- hidden:
False
- label:
Species
- type:
basic:string
- required:
True
- disabled:
False
- hidden:
False
Samtools coverage (single-sample)
- data:samtoolscoverage:single:samtools-coverage-single (data:alignment:bam bam, basic:string region, basic:integer min_read_length, basic:integer min_mq, basic:integer min_bq, list:basic:string excl_flags, basic:integer depth, basic:boolean no_header)[Source: v1.0.0]
Samtools coverage for a single BAM file. Computes the depth at each position or region and creates tabulated text. For more information about samtools coverage, click [here](https://www.htslib.org/doc/samtools-coverage.html).
Input arguments
- label:
Input BAM file
- type:
data:alignment:bam
- description:
Select BAM file for the analysis.
- required:
True
- disabled:
False
- hidden:
False
- label:
Region
- type:
basic:string
- description:
Region can be specified as: RNAME:STARTPOS-ENDPOS and all position coordinates are 1-based, where RNAME is the name of the contig. If the input BAM file was generated by General RNA-seq pipeline, you should use only chromosome numbers to subset the input file, e.g. 3:30293-39103.
- required:
False
- disabled:
False
- hidden:
False
- label:
Minimum read length
- type:
basic:integer
- description:
Ignore reads shorter than specified number of base pairs.
- required:
False
- disabled:
False
- hidden:
False
- label:
Minimum mapping quality
- type:
basic:integer
- description:
Minimum mapping quality for an alignment to be used.
- required:
False
- disabled:
False
- hidden:
False
- label:
Minimum base quality
- type:
basic:integer
- description:
Minimum base quality for a base to be considered.
- required:
False
- disabled:
False
- hidden:
False
- label:
Filter flags
- type:
list:basic:string
- description:
Filter flags: skip reads with mask bits set. Press ENTER after each flag.
- required:
True
- disabled:
False
- hidden:
False
- default:
['UNMAP', 'SECONDARY', 'QCFAIL', 'DUP']
- label:
Maximum allowed coverage depth
- type:
basic:integer
- description:
If 0, depth is set to the maximum integer value, effectively removing any depth limit.
- required:
True
- disabled:
False
- hidden:
False
- default:
1000000
- label:
No header
- type:
basic:boolean
- description:
Do not output header.
- required:
True
- disabled:
False
- hidden:
False
- default:
False
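The Filter flags input above takes symbolic SAM flag names and skips reads that have any of the corresponding mask bits set. A minimal sketch of that check follows. The bit values come from the SAM format specification; the function names are hypothetical and do not belong to samtools or to the process.

```python
from typing import Iterable

# Flag bits as defined in the SAM format specification.
SAM_FLAGS = {
    "PAIRED": 0x1, "PROPER_PAIR": 0x2, "UNMAP": 0x4, "MUNMAP": 0x8,
    "REVERSE": 0x10, "MREVERSE": 0x20, "READ1": 0x40, "READ2": 0x80,
    "SECONDARY": 0x100, "QCFAIL": 0x200, "DUP": 0x400,
    "SUPPLEMENTARY": 0x800,
}


def flags_to_mask(names: Iterable[str]) -> int:
    """Combine symbolic flag names into a single integer bit mask."""
    mask = 0
    for name in names:
        mask |= SAM_FLAGS[name]  # raises KeyError for an unknown name
    return mask


def is_skipped(read_flag: int, mask: int) -> bool:
    """A read is skipped when it has at least one of the mask bits set."""
    return bool(read_flag & mask)


default_mask = flags_to_mask(["UNMAP", "SECONDARY", "QCFAIL", "DUP"])
print(default_mask)                    # 1796
print(is_skipped(1024, default_mask))  # duplicate read -> True
print(is_skipped(99, default_mask))    # properly paired first mate -> False
```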
Output results
- label:
Output coverage table
- type:
basic:file
- required:
True
- disabled:
False
- hidden:
False
- label:
Build
- type:
basic:string
- required:
True
- disabled:
False
- hidden:
False
- label:
Species
- type:
basic:string
- required:
True
- disabled:
False
- hidden:
False
Samtools fastq (paired-end)
- data:reads:fastq:paired:bamtofastq:bamtofastq-paired (data:alignment:bam bam)[Source: v1.3.2]
Convert aligned reads in BAM format to paired-end FASTQ files.
Input arguments
- label:
BAM file
- type:
data:alignment:bam
- required:
True
- disabled:
False
- hidden:
False
Output results
- label:
Remaining mate1 reads
- type:
list:basic:file
- required:
True
- disabled:
False
- hidden:
False
- label:
Remaining mate2 reads
- type:
list:basic:file
- required:
True
- disabled:
False
- hidden:
False
- label:
Mate1 quality control with FastQC
- type:
list:basic:file:html
- required:
True
- disabled:
False
- hidden:
False
- label:
Mate2 quality control with FastQC
- type:
list:basic:file:html
- required:
True
- disabled:
False
- hidden:
False
- label:
Download mate1 FastQC archive
- type:
list:basic:file
- required:
True
- disabled:
False
- hidden:
False
- label:
Download mate2 FastQC archive
- type:
list:basic:file
- required:
True
- disabled:
False
- hidden:
False
Samtools idxstats
- data:samtools:idxstats:samtools-idxstats (data:alignment:bam alignment)[Source: v1.4.2]
Retrieve and print stats in the index file.
Input arguments
- label:
Alignment
- type:
data:alignment:bam
- required:
True
- disabled:
False
- hidden:
False
Output results
- label:
Samtools idxstats report
- type:
basic:file
- required:
True
- disabled:
False
- hidden:
False
Samtools view
- data:alignment:bam:samtools:samtools-view (data:alignment:bam bam, basic:string region, data:bed bedfile, basic:boolean include_header, basic:boolean only_header, basic:decimal subsample, basic:integer subsample_seed, basic:integer threads)[Source: v1.0.1]
Samtools view. With no options or regions specified, saves all alignments from the input BAM file to the output, also in BAM format. You may specify one or more space-separated region specifications to restrict output to only those alignments which overlap the specified region(s). For more information about samtools view, click [here](https://www.htslib.org/doc/samtools-view.html).
Input arguments
- label:
Input BAM file
- type:
data:alignment:bam
- required:
True
- disabled:
False
- hidden:
False
- label:
Region
- type:
basic:string
- description:
Region can be specified as: RNAME:STARTPOS-ENDPOS and all position coordinates are 1-based, where RNAME is the name of the contig. If the input BAM file was generated by General RNA-seq pipeline, you should use only chromosome numbers to subset the input file, e.g. 3:30293-39103.
- required:
False
- disabled:
False
- hidden:
bedfile
- label:
Target BED file
- type:
data:bed
- description:
Target BED file with regions to extract. If the input BAM file was generated by General RNA-seq pipeline, you should use only chromosome numbers to subset the input file, e.g. 3:30292-39103.
- required:
False
- disabled:
False
- hidden:
region
- label:
Include the header in the output
- type:
basic:boolean
- required:
True
- disabled:
advanced.only_header
- hidden:
False
- default:
True
- label:
Output the header only
- type:
basic:boolean
- description:
Selecting this option overrides all other options.
- required:
True
- disabled:
advanced.include_header
- hidden:
False
- default:
False
- label:
Fraction of the input alignments
- type:
basic:decimal
- description:
Output only a proportion of the input alignments, as specified by 0.0 ≤ FLOAT ≤ 1.0, which gives the fraction of templates/pairs to be kept. This subsampling acts in the same way on all of the alignment records in the same template or read pair, so it never keeps a read but not its mate.
- required:
False
- disabled:
False
- hidden:
False
- label:
Subsampling seed
- type:
basic:integer
- description:
Subsampling seed used to influence which subset of reads is kept. When subsampling data that has previously been subsampled, be sure to use a different seed value from those used previously; otherwise more reads will be retained than expected.
- required:
True
- disabled:
False
- hidden:
!advanced.subsample
- default:
11
- label:
Number of threads
- type:
basic:integer
- description:
Number of BAM compression threads to use in addition to main thread.
- required:
True
- disabled:
False
- hidden:
False
- default:
2
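The subsampling inputs above state that a read and its mate are always kept or dropped together, and that reusing a seed on already subsampled data retains more reads than expected. The sketch below illustrates both properties with a decision derived from the seed and the read name. It is only an illustration: the SHA-256 hash and the function name are assumptions, not the hash or API that samtools actually uses.

```python
import hashlib


def keep_template(read_name: str, fraction: float, seed: int) -> bool:
    """Decide per template (read name), so mates share the same outcome."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0.0 and 1.0")
    digest = hashlib.sha256(f"{seed}:{read_name}".encode()).digest()
    # Map the first 8 bytes of the digest to a number in [0, 1).
    value = int.from_bytes(digest[:8], "big") / 2**64
    return value < fraction


names = [f"read{i}" for i in range(10_000)]
first_pass = [n for n in names if keep_template(n, 0.5, seed=11)]

# Same seed again: every surviving template passes, nothing is removed.
same_seed = [n for n in first_pass if keep_template(n, 0.5, seed=11)]
# Different seed: roughly half of the survivors are removed as intended.
new_seed = [n for n in first_pass if keep_template(n, 0.5, seed=12)]

print(len(first_pass), len(same_seed), len(new_seed))
```

Because the decision depends only on the seed and the read name, both mates of a pair (which share a name) always receive the same outcome.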
Output results
- label:
Output BAM file
- type:
basic:file
- required:
True
- disabled:
False
- hidden:
False
- label:
Output index file
- type:
basic:file
- required:
True
- disabled:
False
- hidden:
False
- label:
Alignment statistics
- type:
basic:file
- required:
False
- disabled:
False
- hidden:
False
- label:
Build
- type:
basic:string
- required:
True
- disabled:
False
- hidden:
False
- label:
Species
- type:
basic:string
- required:
True
- disabled:
False
- hidden:
False
Secondary hybrid BAM file
- data:alignment:bam:secondary:upload-bam-secondary (data:alignment:bam bam, basic:file src, basic:string species, basic:string build)[Source: v0.10.0]
Upload a secondary mapping file in BAM format.
Input arguments
- label:
Hybrid bam
- type:
data:alignment:bam
- description:
The secondary BAM will be added to the same sample as the hybrid BAM.
- required:
False
- label:
Mapping (BAM)
- type:
basic:file
- description:
A mapping file in BAM format. The file will be indexed on upload, so additional BAI files are not required.
- validate_regex:
\.(bam)$
- label:
Species
- type:
basic:string
- description:
Species Latin name.
- choices:
Drosophila melanogaster:
Drosophila melanogaster
Mus musculus:
Mus musculus
- label:
Build
- type:
basic:string
Output results
- label:
Uploaded file
- type:
basic:file
- label:
Index BAI
- type:
basic:file
- label:
Alignment statistics
- type:
basic:file
- label:
Species
- type:
basic:string
- label:
Build
- type:
basic:string
Single cell BAM file and index
- data:alignment:bam:scseq:upload-bam-scseq-indexed (basic:file src, basic:file src2, data:screads: reads, basic:string species, basic:string build)[Source: v1.4.1]
Import scSeq BAM file and index.
Input arguments
- label:
Mapping (BAM)
- type:
basic:file
- description:
A mapping file in BAM format.
- required:
True
- disabled:
False
- hidden:
False
- label:
BAM index (*.bam.bai file)
- type:
basic:file
- description:
An index file of a BAM mapping file (ending with bam.bai).
- required:
True
- disabled:
False
- hidden:
False
- label:
Single cell fastq reads
- type:
data:screads:
- required:
True
- disabled:
False
- hidden:
False
- label:
Species
- type:
basic:string
- description:
Species Latin name.
- required:
True
- disabled:
False
- hidden:
False
- label:
Build
- type:
basic:string
- required:
True
- disabled:
False
- hidden:
False
Output results
- label:
Uploaded BAM
- type:
basic:file
- required:
True
- disabled:
False
- hidden:
False
- label:
Index BAI
- type:
basic:file
- required:
True
- disabled:
False
- hidden:
False
- label:
Alignment statistics
- type:
basic:file
- required:
True
- disabled:
False
- hidden:
False
- label:
Build
- type:
basic:string
- required:
True
- disabled:
False
- hidden:
False
- label:
Species
- type:
basic:string
- required:
True
- disabled:
False
- hidden:
False
Spike-ins quality control
- data:spikeins:spikein-qc (list:data:expression samples, basic:string mix)[Source: v1.4.1]
Plot measured spike-in abundances for sample quality control. The process outputs graphs showing the correlation between the known concentrations of ERCC spike-ins and their measured abundances in each sample.
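The correlation described above can be illustrated with a Pearson coefficient computed on log2-transformed values. This is a sketch under stated assumptions: the log2 scale, the exclusion of non-positive values, the function names, and the example numbers are all hypothetical and may differ from what the process actually plots.

```python
import math
from typing import Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


def spikein_log_correlation(known: Sequence[float],
                            measured: Sequence[float]) -> float:
    """Correlate log2 known concentration with log2 measured abundance."""
    # Spike-ins with a non-positive value cannot be log-transformed.
    pairs = [(math.log2(k), math.log2(m))
             for k, m in zip(known, measured) if k > 0 and m > 0]
    return pearson([p[0] for p in pairs], [p[1] for p in pairs])


# Hypothetical concentrations and measured abundances for four spike-ins.
known = [0.5, 2.0, 8.0, 32.0]
measured = [1.1, 3.9, 17.0, 60.5]
print(round(spikein_log_correlation(known, measured), 3))
```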
Input arguments
- label:
Expressions with spike-ins
- type:
list:data:expression
- label:
Spike-ins mix
- type:
basic:string
- description:
Select spike-ins mix.
- choices:
ERCC Mix 1:
ercc_mix1
ERCC Mix 2:
ercc_mix2
SIRV-Set 3:
sirv_set3
Output results
- label:
Plot figures
- type:
list:basic:file
- required:
False
- label:
HTML report with results
- type:
basic:file:html
- required:
False
- hidden:
True
- label:
ZIP file containing HTML report with results
- type:
basic:file
- required:
False
Subsample FASTQ (paired-end)
- data:reads:fastq:paired:seqtk:seqtk-sample-paired (data:reads:fastq:paired reads, basic:integer n_reads, basic:integer seed, basic:decimal fraction, basic:boolean two_pass)[Source: v1.5.2]
Subsample reads from FASTQ files (paired-end). [Seqtk](https://github.com/lh3/seqtk) is a fast and lightweight tool for processing sequences in the FASTA or FASTQ format. The Seqtk “sample” command enables subsampling of the large FASTQ file(s).
Input arguments
- label:
Reads
- type:
data:reads:fastq:paired
- required:
True
- disabled:
False
- hidden:
False
- label:
Number of reads
- type:
basic:integer
- required:
True
- disabled:
False
- hidden:
False
- default:
1000000
- label:
Seed
- type:
basic:integer
- required:
True
- disabled:
False
- hidden:
False
- default:
11
- label:
Fraction
- type:
basic:decimal
- description:
Use the fraction of reads [0 - 1.0] from the original input file instead of the absolute number of reads. If set, this will override the ‘Number of reads’ input parameter.
- required:
False
- disabled:
False
- hidden:
False
- label:
2-pass mode
- type:
basic:boolean
- description:
Enable two-pass mode when down-sampling. Two-pass mode is twice as slow but uses much less memory.
- required:
True
- disabled:
False
- hidden:
False
- default:
False
Output results
- label:
Remaining mate 1 reads
- type:
list:basic:file
- required:
True
- disabled:
False
- hidden:
False
- label:
Remaining mate 2 reads
- type:
list:basic:file
- required:
True
- disabled:
False
- hidden:
False
- label:
Mate 1 quality control with FastQC
- type:
list:basic:file:html
- required:
True
- disabled:
False
- hidden:
False
- label:
Mate 2 quality control with FastQC
- type:
list:basic:file:html
- required:
True
- disabled:
False
- hidden:
False
- label:
Download mate 1 FastQC archive
- type:
list:basic:file
- required:
True
- disabled:
False
- hidden:
False
- label:
Download mate 2 FastQC archive
- type:
list:basic:file
- required:
True
- disabled:
False
- hidden:
False
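The interaction between ‘Number of reads’, ‘Fraction’ and ‘Seed’ for paired-end data can be sketched as below. This is an illustrative sketch only: Seqtk streams the files rather than loading them into lists, and the function names here are hypothetical. The point it shows is that the same seed selects the same read positions in both mates, so pairs stay in sync, and that a fraction, when given, overrides the absolute number.

```python
import random


def choose_indices(total_reads, n_reads=1000000, fraction=None, seed=11):
    """Pick the (sorted) indices of the reads to keep.

    If `fraction` is given it overrides `n_reads`.
    """
    rng = random.Random(seed)
    if fraction is not None:
        if not 0.0 <= fraction <= 1.0:
            raise ValueError("fraction must be in [0, 1]")
        target = int(total_reads * fraction)
    else:
        target = min(n_reads, total_reads)
    return sorted(rng.sample(range(total_reads), target))


def subsample_pairs(mate1, mate2, **kwargs):
    """Subsample two equally long lists of reads with identical indices."""
    if len(mate1) != len(mate2):
        raise ValueError("mate files must contain the same number of reads")
    keep = choose_indices(len(mate1), **kwargs)
    return [mate1[i] for i in keep], [mate2[i] for i in keep]


# Toy input: 100 read pairs identified by name only.
mate1 = [f"read{i}/1" for i in range(100)]
mate2 = [f"read{i}/2" for i in range(100)]
sub1, sub2 = subsample_pairs(mate1, mate2, n_reads=10, seed=11)
```

Running the function twice with the same seed returns the same pairs, which is why the seed is exposed as an input.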
Subsample FASTQ (single-end)
- data:reads:fastq:single:seqtk:seqtk-sample-single (data:reads:fastq:single reads, basic:integer n_reads, basic:integer seed, basic:decimal fraction, basic:boolean two_pass)[Source: v1.5.2]
Subsample reads from FASTQ file (single-end). [Seqtk](https://github.com/lh3/seqtk) is a fast and lightweight tool for processing sequences in the FASTA or FASTQ format. The Seqtk “sample” command enables subsampling of the large FASTQ file(s).
Input arguments
- label:
Reads
- type:
data:reads:fastq:single
- required:
True
- disabled:
False
- hidden:
False
- label:
Number of reads
- type:
basic:integer
- required:
True
- disabled:
False
- hidden:
False
- default:
1000000
- label:
Seed
- type:
basic:integer
- required:
True
- disabled:
False
- hidden:
False
- default:
11
- label:
Fraction
- type:
basic:decimal
- description:
Use the fraction of reads [0 - 1.0] from the original input file instead of the absolute number of reads. If set, this will override the ‘Number of reads’ input parameter.
- required:
False
- disabled:
False
- hidden:
False
- label:
2-pass mode
- type:
basic:boolean
- description:
Enable two-pass mode when down-sampling. Two-pass mode is twice as slow but uses much less memory.
- required:
True
- disabled:
False
- hidden:
False
- default:
False
Output results
- label:
Remaining reads
- type:
list:basic:file
- required:
True
- disabled:
False
- hidden:
False
- label:
Quality control with FastQC
- type:
list:basic:file:html
- required:
True
- disabled:
False
- hidden:
False
- label:
Download FastQC archive
- type:
list:basic:file
- required:
True
- disabled:
False
- hidden:
False
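As a rough sketch of how these inputs could translate into a Seqtk call, the helper below assembles a `seqtk sample` argument list. The helper name and the mapping of inputs to arguments are assumptions for illustration; only the `-s` (seed) and `-2` (two-pass) options and the positional file and amount arguments are taken from Seqtk's own usage text.

```python
def seqtk_sample_args(fastq, n_reads=1000000, seed=11, fraction=None, two_pass=False):
    """Build the argument list for `seqtk sample` (hypothetical helper)."""
    args = ["seqtk", "sample", "-s", str(seed)]
    if two_pass:
        # Two-pass mode trades speed for lower memory use.
        args.append("-2")
    # A fraction, when set, overrides the absolute number of reads.
    amount = fraction if fraction is not None else n_reads
    args += [fastq, str(amount)]
    return args


default_args = seqtk_sample_args("reads.fastq")
fraction_args = seqtk_sample_args("reads.fastq", fraction=0.1, two_pass=True)
```

The resulting list could be passed to `subprocess.run`, with standard output redirected to the subsampled FASTQ file.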
Subsample FASTQ and BWA Aln (paired-end)
- data:workflow:chipseq:seqtkbwaalnworkflow-subsample-bwa-aln-paired (data:reads:fastq:paired reads, data:index:bwa genome, basic:integer n_reads, basic:integer seed, basic:decimal fraction, basic:boolean two_pass, basic:integer q, basic:boolean use_edit, basic:integer edit_value, basic:decimal fraction, basic:boolean seeds, basic:integer seed_length, basic:integer seed_dist)[Source: v1.1.0]
Input arguments
- label:
Reads
- type:
data:reads:fastq:paired
- label:
Reference genome
- type:
data:index:bwa
- label:
Number of reads
- type:
basic:integer
- default:
10000000
- label:
Seed
- type:
basic:integer
- default:
11
- label:
Fraction
- type:
basic:decimal
- description:
Use the fraction of reads [0 - 1.0] from the original input file instead of the absolute number of reads. If set, this will override the “Number of reads” input parameter.
- required:
False
- label:
2-pass mode
- type:
basic:boolean
- description:
Enable two-pass mode when down-sampling. Two-pass mode is twice as slow but uses much less memory.
- default:
True
- label:
Quality threshold
- type:
basic:integer
- description:
Parameter for dynamic read trimming.
- default:
5
- label:
Use maximum edit distance (excludes fraction of missing alignments)
- type:
basic:boolean
- default:
False
- label:
Maximum edit distance
- type:
basic:integer
- hidden:
!use_edit
- default:
5
- label:
Fraction of missing alignments
- type:
basic:decimal
- description:
The fraction of missing alignments given a 2% uniform base error rate. The maximum edit distance is automatically chosen for different read lengths.
- hidden:
use_edit
- default:
0.04
- label:
Use seeds
- type:
basic:boolean
- default:
True
- label:
Seed length
- type:
basic:integer
- description:
Use the first X bases of the read as the seed. If X is larger than the query sequence, seeding is disabled. For long reads, this option typically ranges from 25 to 35 when the seed maximum edit distance is 2.
- hidden:
!seeds
- default:
32
- label:
Seed maximum edit distance
- type:
basic:integer
- hidden:
!seeds
- default:
2
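The alignment inputs above interact through their `hidden` conditions: either a maximum edit distance or a fraction of missing alignments is used, and the seed options apply only when seeding is enabled. A minimal sketch of that logic, assuming the inputs map onto the `bwa aln` options `-q`, `-n`, `-l` and `-k` (the mapping and the helper name are assumptions; the workflow's actual command is not shown in this document):

```python
def bwa_aln_args(index, fastq, q=5, use_edit=False, edit_value=5,
                 fraction=0.04, seeds=True, seed_length=32, seed_dist=2):
    """Build a `bwa aln` argument list from the workflow inputs (hypothetical)."""
    args = ["bwa", "aln", "-q", str(q)]
    # `-n` accepts either an integer edit distance or a float fraction of
    # missing alignments, so only one of the two inputs is ever passed.
    args += ["-n", str(edit_value if use_edit else fraction)]
    if seeds:
        args += ["-l", str(seed_length), "-k", str(seed_dist)]
    args += [index, fastq]
    return args


default_args = bwa_aln_args("genome", "reads.fastq")
edit_args = bwa_aln_args("genome", "reads.fastq", use_edit=True, seeds=False)
```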
Output results
Subsample FASTQ and BWA Aln (single-end)
- data:workflow:chipseq:seqtkbwaalnworkflow-subsample-bwa-aln-single (data:reads:fastq:single reads, data:index:bwa genome, basic:integer n_reads, basic:integer seed, basic:decimal fraction, basic:boolean two_pass, basic:integer q, basic:boolean use_edit, basic:integer edit_value, basic:decimal fraction, basic:boolean seeds, basic:integer seed_length, basic:integer seed_dist)[Source: v1.1.0]
Input arguments
- label:
Reads
- type:
data:reads:fastq:single
- label:
Reference genome
- type:
data:index:bwa
- label:
Number of reads
- type:
basic:integer
- default:
10000000
- label:
Seed
- type:
basic:integer
- default:
11
- label:
Fraction
- type:
basic:decimal
- description:
Use the fraction of reads [0 - 1.0] from the original input file instead of the absolute number of reads. If set, this will override the “Number of reads” input parameter.
- required:
False
- label:
2-pass mode
- type:
basic:boolean
- description:
Enable two-pass mode when down-sampling. Two-pass mode is twice as slow but uses much less memory.
- default:
True
- label:
Quality threshold
- type:
basic:integer
- description:
Parameter for dynamic read trimming.
- default:
5
- label:
Use maximum edit distance (excludes fraction of missing alignments)
- type:
basic:boolean
- default:
False
- label:
Maximum edit distance
- type:
basic:integer
- hidden:
!use_edit
- default:
5
- label:
Fraction of missing alignments
- type:
basic:decimal
- description:
The fraction of missing alignments given a 2% uniform base error rate. The maximum edit distance is automatically chosen for different read lengths.
- hidden:
use_edit
- default:
0.04
- label:
Use seeds
- type:
basic:boolean
- default:
True
- label:
Seed length
- type:
basic:integer
- description:
Use the first X bases of the read as the seed. If X is larger than the query sequence, seeding is disabled. For long reads, this option typically ranges from 25 to 35 when the seed maximum edit distance is 2.
- hidden:
!seeds
- default:
32
- label:
Seed maximum edit distance
- type:
basic:integer
- hidden:
!seeds
- default:
2
Output results
Test basic fields
- data:test:fieldstest-basic-fields (basic:boolean boolean, basic:date date, basic:datetime datetime, basic:decimal decimal, basic:integer integer, basic:string string, basic:text text, basic:url:download url_download, basic:url:view url_view, basic:string string2, basic:string string3, basic:string string4, basic:string string5, basic:string string6, basic:string string7, basic:string tricky2)[Source: v1.2.4]
Test with all basic input fields whose values are printed by the processor and returned unmodified as output fields.
Input arguments
- label:
Boolean
- type:
basic:boolean
- default:
True
- label:
Date
- type:
basic:date
- default:
2013-12-31
- label:
Date and time
- type:
basic:datetime
- default:
2013-12-31 23:59:59
- label:
Decimal
- type:
basic:decimal
- default:
-123.456
- label:
Integer
- type:
basic:integer
- default:
-123
- label:
String
- type:
basic:string
- default:
Foo b-a-r.gz 1.23
- label:
Text
- type:
basic:text
- default:
Foo bar in 3 lines.
- label:
URL download
- type:
basic:url:download
- default:
{'url': 'http://www.w3.org/TR/1998/REC-html40-19980424/html40.pdf'}
- label:
URL view
- type:
basic:url:view
- default:
{'name': 'Something', 'url': 'http://www.something.com/'}
- label:
String 2 required
- type:
basic:string
- description:
String 2 description.
- required:
True
- disabled:
false
- hidden:
false
- placeholder:
Enter string
- label:
String 3 disabled
- type:
basic:string
- description:
String 3 description.
- disabled:
true
- default:
disabled
- label:
String 4 hidden
- type:
basic:string
- description:
String 4 description.
- hidden:
True
- default:
hidden
- label:
String 5 choices
- type:
basic:string
- description:
String 5 description.
- hidden:
False
- default:
choice_2
- choices:
Choice 1:
choice_1
Choice 2:
choice_2
Choice 3:
choice_3
- label:
String 6 regex only “Aa”
- type:
basic:string
- default:
AAaAaaa
- validate_regex:
^[aA]*$
- label:
String 7 optional choices
- type:
basic:string
- description:
String 7 description.
- required:
False
- hidden:
False
- default:
choice_2
- choices:
Choice 1:
choice_1
Choice 2:
choice_2
Choice 3:
choice_3
- label:
Tricky 2
- type:
basic:string
- default:
true
Output results
- label:
Result
- type:
basic:url:view
- label:
Boolean
- type:
basic:boolean
- label:
Date
- type:
basic:date
- label:
Date and time
- type:
basic:datetime
- label:
Decimal
- type:
basic:decimal
- label:
Integer
- type:
basic:integer
- label:
String
- type:
basic:string
- label:
Text
- type:
basic:text
- label:
URL download
- type:
basic:url:download
- label:
URL view
- type:
basic:url:view
- label:
String 2 required
- type:
basic:string
- description:
String 2 description.
- label:
String 3 disabled
- type:
basic:string
- description:
String 3 description.
- label:
String 4 hidden
- type:
basic:string
- description:
String 4 description.
- label:
String 5 choices
- type:
basic:string
- description:
String 5 description.
- label:
String 6 regex only “Aa”
- type:
basic:string
- label:
String 7 optional choices
- type:
basic:string
- label:
Tricky 2
- type:
basic:string
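Two of the inputs above carry validation rules: ‘String 6’ must match the regular expression `^[aA]*$`, and ‘String 5’ must be one of three choices. The sketch below shows how such rules might be checked; the function names are hypothetical and the platform's own validation code may differ.

```python
import re

# Pattern and choices copied from the field definitions above.
STRING6_PATTERN = re.compile(r"^[aA]*$")
STRING5_CHOICES = {"choice_1", "choice_2", "choice_3"}


def is_valid_string6(value):
    """True if the value consists only of the letters 'a' and 'A'."""
    return STRING6_PATTERN.fullmatch(value) is not None


def is_valid_string5(value):
    """True if the value is one of the declared choices."""
    return value in STRING5_CHOICES


default_ok = is_valid_string6("AAaAaaa") and is_valid_string5("choice_2")
```

Both defaults pass their own rules, which is what one would expect from a well-formed process definition.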
Test disabled inputs
- data:test:disabledtest-disabled (basic:boolean broad, basic:integer broad_width, basic:string width_label, basic:integer if_and_condition)[Source: v1.2.4]
Test disabled input fields.
Input arguments
- label:
Broad peaks
- type:
basic:boolean
- default:
False
- label:
Width of peaks
- type:
basic:integer
- disabled:
broad === false
- default:
5
- label:
Width label
- type:
basic:string
- disabled:
broad === false
- default:
FD
- label:
If width is 5 and label FDR
- type:
basic:integer
- disabled:
broad_width == 5 && width_label == 'FDR'
- default:
5
Output results
- label:
Result
- type:
basic:string
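The `disabled` expressions above are evaluated against the current input values. A Python translation of the three expressions, written as a sketch for illustration (the platform evaluates the original JavaScript-style expressions; the function name is hypothetical):

```python
def disabled_fields(values):
    """Return which inputs are disabled for the given input values."""
    return {
        # broad === false
        "broad_width": values["broad"] is False,
        "width_label": values["broad"] is False,
        # broad_width == 5 && width_label == 'FDR'
        "if_and_condition": values["broad_width"] == 5
        and values["width_label"] == "FDR",
    }


# Defaults taken from the input definitions above.
defaults = {"broad": False, "broad_width": 5, "width_label": "FD"}
with_defaults = disabled_fields(defaults)
with_fdr = disabled_fields({"broad": True, "broad_width": 5, "width_label": "FDR"})
```

With the defaults, the two width fields are disabled because `broad` is false, while the last field stays enabled because the default label is `FD` rather than `FDR`.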
Test select controller
- data:test:resulttest-list (data:test:result single, list:data:test:result multiple)[Source: v1.2.4]
Test with all basic input fields whose values are printed by the processor and returned unmodified as output fields.
Input arguments
- label:
Single
- type:
data:test:result
- label:
Multiple
- type:
list:data:test:result
Output results
- label:
Result
- type:
basic:string
Test sleep progress
- data:test:resulttest-sleep-progress (basic:integer t)[Source: v1.2.4]
Test for the progress bar by sleeping 5 times for the specified amount of time.
Input arguments
- label:
Sleep time
- type:
basic:integer
- default:
5
Output results
- label:
Result
- type:
basic:string
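The behaviour described above (five sleeps, with progress reported after each) can be sketched as a generator. This is an assumed shape, not the process's source: the function name, the injectable `sleep` argument and the fractional progress values are all illustrative.

```python
import time


def sleep_with_progress(t, steps=5, sleep=time.sleep):
    """Sleep `steps` times for `t` seconds, yielding progress in (0, 1]."""
    for i in range(1, steps + 1):
        sleep(t)
        yield i / steps


# A no-op sleep keeps the example instantaneous.
progress = list(sleep_with_progress(5, sleep=lambda seconds: None))
```

Passing `time.sleep` (the default) instead of the no-op would make the loop take `5 * t` seconds in total.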
Trim Galore (paired-end)
- data:reads:fastq:paired:trimgalore:trimgalore-paired (data:reads:fastq:paired reads, list:basic:string adapter, list:basic:string adapter_2, data:seq:nucleotide adapter_file_1, data:seq:nucleotide adapter_file_2, basic:string universal_adapter, basic:integer stringency, basic:decimal error_rate, basic:integer quality, basic:integer nextseq, basic:string phred, basic:integer min_length, basic:integer max_n, basic:boolean retain_unpaired, basic:integer unpaired_len_1, basic:integer unpaired_len_2, basic:integer clip_r1, basic:integer clip_r2, basic:integer three_prime_r1, basic:integer three_prime_r2, basic:integer trim_5, basic:integer trim_3)[Source: v1.3.2]
Process paired-end sequencing reads with Trim Galore. Trim Galore is a wrapper script that uses the publicly available adapter trimming tool Cutadapt, and runs FastQC for quality control once trimming has completed. Low-quality ends are trimmed from reads in addition to adapter removal in a single pass. If no adapter sequence is supplied, Trim Galore attempts to auto-detect the adapter that was used. To do so, it analyses the first 1 million sequences of the first specified file and attempts to find the first 12 or 13 bp of the following standard adapters: Illumina: AGATCGGAAGAGC, Small RNA: TGGAATTCTCGG, Nextera: CTGTCTCTTATA. If no adapter contamination can be detected within the first 1 million sequences, or in case of a tie between several different adapters, Trim Galore defaults to Illumina adapters. For additional information see the official [user guide](https://github.com/FelixKrueger/TrimGalore/blob/master/Docs/Trim_Galore_User_Guide.md).
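The auto-detection rule described above can be sketched as a simple count of adapter occurrences. This is a simplified illustration under stated assumptions, not Trim Galore's code: it uses plain substring matching, and the function name is hypothetical. The adapter sequences and the fallback to Illumina on no hit or a tie come from the description.

```python
ADAPTERS = {
    "Illumina": "AGATCGGAAGAGC",
    "Small RNA": "TGGAATTCTCGG",
    "Nextera": "CTGTCTCTTATA",
}


def detect_adapter(sequences, max_reads=1_000_000):
    """Guess the adapter type from the first `max_reads` sequences."""
    counts = dict.fromkeys(ADAPTERS, 0)
    for i, seq in enumerate(sequences):
        if i >= max_reads:
            break
        for name, adapter in ADAPTERS.items():
            if adapter in seq:
                counts[name] += 1
    best = max(counts.values())
    winners = [name for name, count in counts.items() if count == best]
    # No contamination found, or a tie: fall back to Illumina.
    if best == 0 or len(winners) > 1:
        return "Illumina"
    return winners[0]


nextera_reads = ["AAA" + ADAPTERS["Nextera"] + "GGG", "ACGTACGTACGT"]
detected = detect_adapter(nextera_reads)
```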
Input arguments
- label:
Select paired-end reads
- type:
data:reads:fastq:paired
- required:
True
- disabled:
False
- hidden:
False
- label:
Read 1 adapter sequence
- type:
list:basic:string
- description:
Adapter sequences to be trimmed. Also see universal adapters field for predefined adapters. This is mutually exclusive with read 1 adapters file and universal adapters.
- required:
False
- disabled:
False
- hidden:
False
- default:
[]
- label:
Read 2 adapter sequence
- type:
list:basic:string
- description:
Optional adapter sequence to be trimmed off read 2 of paired-end files. This is mutually exclusive with read 2 adapters file and universal adapters.
- required:
False
- disabled:
False
- hidden:
False
- default:
[]
- label:
Read 1 adapters file
- type:
data:seq:nucleotide
- description:
This is mutually exclusive with read 1 adapters and universal adapters.
- required:
False
- disabled:
False
- hidden:
False
- label:
Read 2 adapters file
- type:
data:seq:nucleotide
- description:
This is mutually exclusive with read 2 adapters and universal adapters.
- required:
False
- disabled:
False
- hidden:
False
- label:
Universal adapters
- type:
basic:string
- description:
Instead of default detection use specific adapters. Use 13bp of the Illumina universal adapter, 12bp of the Nextera adapter or 12bp of the Illumina Small RNA 3’ Adapter. Selecting to trim smallRNA adapters will also lower the length value to 18bp. If the smallRNA libraries are paired-end then read 2 adapter will be set to the Illumina small RNA 5’ adapter automatically (GATCGTCGGACT) unless defined explicitly. This is mutually exclusive with manually defined adapters and adapter files.
- required:
False
- disabled:
False
- hidden:
False
- choices:
Illumina:
--illumina
Nextera:
--nextera
Illumina small RNA:
--small_rna
- label:
Overlap with adapter sequence required to trim
- type:
basic:integer
- description:
Defaults to a very stringent setting of 1, i.e. even a single base pair of overlapping sequence will be trimmed off the 3’ end of any read.
- required:
True
- disabled:
False
- hidden:
False
- default:
1
- label:
Maximum allowed error rate
- type:
basic:decimal
- description:
Number of errors divided by the length of the matching region.
- required:
True
- disabled:
False
- hidden:
False
- default:
0.1
- label:
Quality cutoff
- type:
basic:integer
- description:
Trim low-quality ends from reads based on phred score.
- required:
True
- disabled:
False
- hidden:
False
- default:
20
- label:
NextSeq/NovaSeq trim cutoff
- type:
basic:integer
- description:
NextSeq/NovaSeq-specific quality trimming. Also trims dark cycles, which appear as high-quality G bases. This sets a specific quality cutoff, but the qualities of G bases are ignored. This cannot be used together with Quality cutoff and will override it.
- required:
False
- disabled:
False
- hidden:
False
- label:
Phred score encoding
- type:
basic:string
- description:
Use either ASCII+33 quality scores as Phred scores (Sanger/Illumina 1.9+ encoding) or ASCII+64 quality scores (Illumina 1.5 encoding) for quality trimming.
- required:
True
- disabled:
False
- hidden:
False
- default:
--phred33
- choices:
ASCII+33:
--phred33
ASCII+64:
--phred64
- label:
Minimum length after trimming
- type:
basic:integer
- description:
Discard reads that became shorter than the selected length because of either quality or adapter trimming. Both reads of a read pair need to be longer than the specified length to be written to the validated paired-end files. If only one read became too short, such unpaired single-end reads can be kept with Retain unpaired. A value of 0 disables filtering based on length.
- required:
True
- disabled:
False
- hidden:
False
- default:
20
- label:
Maximum number of Ns
- type:
basic:integer
- description:
A read exceeding this limit will result in the entire pair being removed from the trimmed output files.
- required:
False
- disabled:
False
- hidden:
False
- label:
Retain unpaired reads after trimming
- type:
basic:boolean
- description:
If only one of the two paired-end reads became too short, the longer read will be written.
- required:
True
- disabled:
False
- hidden:
False
- default:
False
- label:
Unpaired read length cutoff for mate 1
- type:
basic:integer
- required:
True
- disabled:
False
- hidden:
!quality_trim.retain_unpaired
- default:
35
- label:
Unpaired read length cutoff for mate 2
- type:
basic:integer
- required:
True
- disabled:
False
- hidden:
!quality_trim.retain_unpaired
- default:
35
- label:
Trim bases from 5’ end of read 1
- type:
basic:integer
- description:
This may be useful if the qualities were very poor, or if there is some sort of unwanted bias at the 5’ end.
- required:
False
- disabled:
False
- hidden:
False
- label:
Trim bases from 5’ end of read 2
- type:
basic:integer
- description:
This may be useful if the qualities were very poor, or if there is some sort of unwanted bias at the 5’ end. For paired-end bisulfite sequencing, it is recommended to remove the first few bp because the end-repair reaction may introduce a bias towards low methylation.
- required:
False
- disabled:
False
- hidden:
False
- label:
Trim bases from 3’ end of read 1
- type:
basic:integer
- description:
Remove bases from the 3’ end of read 1 after adapter/quality trimming has been performed. This may remove some unwanted bias from the 3’ end that is not directly related to adapter sequence or basecall quality.
- required:
False
- disabled:
False
- hidden:
False
- label:
Trim bases from 3’ end of read 2
- type:
basic:integer
- description:
Remove bases from the 3’ end of read 2 after adapter/quality trimming has been performed. This may remove some unwanted bias from the 3’ end that is not directly related to adapter sequence or basecall quality.
- required:
False
- disabled:
False
- hidden:
False
- label:
Hard trim sequences from 3’ end
- type:
basic:integer
- description:
Instead of performing adapter-/quality trimming, this option will simply hard-trim sequences to the specified number of bp from the 3’ end. This is incompatible with other hard trimming options.
- required:
False
- disabled:
False
- hidden:
False
- label:
Hard trim sequences from 5’ end
- type:
basic:integer
- description:
Instead of performing adapter-/quality trimming, this option will simply hard-trim sequences to the specified number of bp from the 5’ end. This is incompatible with other hard trimming options.
- required:
False
- disabled:
False
- hidden:
False
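Several adapter inputs above are described as mutually exclusive. A minimal sketch of a check that enforces those rules is shown below; the function name and error messages are hypothetical, and the process itself may report conflicts differently.

```python
def adapter_conflicts(adapter=(), adapter_2=(), adapter_file_1=None,
                      adapter_file_2=None, universal_adapter=None):
    """Return a list of human-readable conflicts between adapter inputs."""
    errors = []
    if adapter and adapter_file_1 is not None:
        errors.append("Read 1: give adapter sequences or an adapters file, not both.")
    if adapter_2 and adapter_file_2 is not None:
        errors.append("Read 2: give adapter sequences or an adapters file, not both.")
    custom = bool(adapter or adapter_2) or (
        adapter_file_1 is not None or adapter_file_2 is not None
    )
    if universal_adapter is not None and custom:
        errors.append(
            "Universal adapters cannot be combined with manually defined "
            "adapters or adapter files."
        )
    return errors


no_conflict = adapter_conflicts(adapter=["AGATCGGAAGAGC"])
conflict = adapter_conflicts(adapter=["AGATCGGAAGAGC"], universal_adapter="--nextera")
```

An empty list means the combination of inputs is consistent with the descriptions above.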
Output results
- label:
Remaining mate 1 reads
- type:
list:basic:file
- required:
True
- disabled:
False
- hidden:
False
- label:
Remaining mate 2 reads
- type:
list:basic:file
- required:
True
- disabled:
False
- hidden:
False
- label:
Trim galore report
- type:
basic:file
- required:
False
- disabled:
False
- hidden:
False
- label:
Mate 1 quality control with FastQC
- type:
list:basic:file:html
- required:
True
- disabled:
False
- hidden:
False
- label:
Mate 2 quality control with FastQC
- type:
list:basic:file:html
- required:
True
- disabled:
False
- hidden:
False
- label:
Download mate 1 FastQC archive
- type:
list:basic:file
- required:
True
- disabled:
False
- hidden:
False
- label:
Download mate 2 FastQC archive
- type:
list:basic:file
- required:
True
- disabled:
False
- hidden:
False
Trimmomatic (paired-end)
- data:reads:fastq:paired:trimmomatictrimmomatic-paired (data:reads:fastq:paired reads, data:seq:nucleotide adapters, basic:integer seed_mismatches, basic:integer simple_clip_threshold, basic:integer palindrome_clip_threshold, basic:integer min_adapter_length, basic:boolean keep_both_reads, basic:integer window_size, basic:integer required_quality, basic:integer target_length, basic:decimal strictness, basic:integer leading, basic:integer trailing, basic:integer crop, basic:integer headcrop, basic:integer minlen, basic:integer average_quality)[Source: v2.5.2]
Trimmomatic performs a variety of useful trimming tasks including removing adapters for Illumina paired-end and single-end data. FastQC is performed for quality control checks on trimmed raw sequence data, which are the output of Trimmomatic. See [Trimmomatic official website](http://www.usadellab.org/cms/?page=trimmomatic), the [introductory paper](https://www.ncbi.nlm.nih.gov/pubmed/24695404), and the [FastQC official website](https://www.bioinformatics.babraham.ac.uk/projects/fastqc/) for more information.
Input arguments
- label:
Reads
- type:
data:reads:fastq:paired
- label:
Adapter sequences
- type:
data:seq:nucleotide
- description:
Adapter sequence in FASTA format that will be removed from the read. This field as well as the ‘Seed mismatches’, ‘Simple clip threshold’ and ‘Palindrome clip threshold’ parameters are needed to perform Illuminaclip. ‘Minimum adapter length’ and ‘Keep both reads’ are optional parameters.
- required:
False
- label:
Seed mismatches
- type:
basic:integer
- description:
Specifies the maximum mismatch count which will still allow a full match to be performed. This field as well as the ‘Adapter sequences’, ‘Simple clip threshold’ and ‘Palindrome clip threshold’ parameters are needed to perform Illuminaclip.
- required:
False
- disabled:
!illuminaclip.adapters
- label:
Simple clip threshold
- type:
basic:integer
- description:
Specifies how accurate the match between any adapter etc. sequence must be against a read. This field as well as the ‘Adapter sequences’, ‘Seed mismatches’ and ‘Palindrome clip threshold’ parameters are needed to perform Illuminaclip.
- required:
False
- disabled:
!illuminaclip.adapters
- label:
Palindrome clip threshold
- type:
basic:integer
- description:
Specifies how accurate the match between the two ‘adapter ligated’ reads must be for PE palindrome read alignment. This field as well as the ‘Adapter sequences’, ‘Simple clip threshold’ and ‘Seed mismatches’ parameters are needed to perform Illuminaclip.
- required:
False
- disabled:
!illuminaclip.adapters
- label:
Minimum adapter length
- type:
basic:integer
- description:
In addition to the alignment score, palindrome mode can verify that a minimum length of adapter has been detected. If unspecified, this defaults to 8 bases, for historical reasons. However, since palindrome mode has a very low false positive rate, this can be safely reduced, even down to 1, to allow shorter adapter fragments to be removed. This field is optional for performing Illuminaclip. ‘Adapter sequences’, ‘Seed mismatches’, ‘Simple clip threshold’ and ‘Palindrome clip threshold’ are also needed in order to use this parameter.
- disabled:
!illuminaclip.seed_mismatches && !illuminaclip.simple_clip_threshold && !illuminaclip.palindrome_clip_threshold
- default:
8
- label:
Keep both reads
- type:
basic:boolean
- description:
After read-through has been detected by palindrome mode, and the adapter sequence removed, the reverse read contains the same sequence information as the forward read, albeit in reverse complement. For this reason, the default behaviour is to entirely drop the reverse read. By specifying this parameter, the reverse read will also be retained, which may be useful e.g. if the downstream tools cannot handle a combination of paired and unpaired reads. This field is optional for performing Illuminaclip. ‘Adapter sequences’, ‘Seed mismatches’, ‘Simple clip threshold’, ‘Palindrome clip threshold’ and also ‘Minimum adapter length’ are needed in order to use this parameter.
- required:
False
- disabled:
!illuminaclip.seed_mismatches && !illuminaclip.simple_clip_threshold && !illuminaclip.palindrome_clip_threshold && !illuminaclip.min_adapter_length
- label:
Window size
- type:
basic:integer
- description:
Specifies the number of bases to average across. This field as well as ‘Required quality’ are needed to perform a ‘Sliding window’ trimming (cutting once the average quality within the window falls below a threshold).
- required:
False
- label:
Required quality
- type:
basic:integer
- description:
Specifies the average quality required. This field as well as ‘Window size’ are needed to perform a ‘Sliding window’ trimming (cutting once the average quality within the window falls below a threshold).
- required:
False
- label:
Target length
- type:
basic:integer
- description:
This specifies the read length which is likely to allow the location of the read within the target sequence to be determined. This field as well as ‘Strictness’ are needed to perform ‘Maxinfo’ feature (an adaptive quality trimmer which balances read length and error rate to maximise the value of each read).
- required:
False
- label:
Strictness
- type:
basic:decimal
- description:
This value, which should be set between 0 and 1, specifies the balance between preserving as much read length as possible vs. removal of incorrect bases. A low value of this parameter (<0.2) favours longer reads, while a high value (>0.8) favours read correctness. This field as well as ‘Target length’ are needed to perform ‘Maxinfo’ feature (an adaptive quality trimmer which balances read length and error rate to maximise the value of each read).
- required:
False
- label:
Leading quality
- type:
basic:integer
- description:
Remove low quality bases from the beginning. Specifies the minimum quality required to keep a base.
- required:
False
- label:
Trailing
- type:
basic:integer
- description:
Remove low quality bases from the end. Specifies the minimum quality required to keep a base.
- required:
False
- label:
Crop
- type:
basic:integer
- description:
Cut the read to a specified length by removing bases from the end.
- required:
False
- label:
Headcrop
- type:
basic:integer
- description:
Cut the specified number of bases from the start of the read.
- required:
False
- label:
Minimum length
- type:
basic:integer
- description:
Drop the read if it is below a specified length.
- required:
False
- label:
Average quality
- type:
basic:integer
- description:
Drop the read if the average quality is below the specified level.
- required:
False
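The Illuminaclip inputs above depend on one another: the adapter file and the three thresholds are required together, while ‘Minimum adapter length’ and ‘Keep both reads’ are optional extras. The sketch below assembles a Trimmomatic `ILLUMINACLIP` step string from them. The helper name is hypothetical, and how the process builds its command line is an assumption; the colon-separated field order follows the Trimmomatic manual.

```python
def illuminaclip_step(adapters, seed_mismatches, palindrome_clip_threshold,
                      simple_clip_threshold, min_adapter_length=None,
                      keep_both_reads=None):
    """Build an ILLUMINACLIP step string (hypothetical helper)."""
    parts = [
        "ILLUMINACLIP",
        adapters,
        str(seed_mismatches),
        str(palindrome_clip_threshold),
        str(simple_clip_threshold),
    ]
    # `keep_both_reads` is positional, so the minimum adapter length must be
    # present before it; fall back to the documented default of 8.
    if keep_both_reads is not None and min_adapter_length is None:
        min_adapter_length = 8
    if min_adapter_length is not None:
        parts.append(str(min_adapter_length))
    if keep_both_reads is not None:
        parts.append("true" if keep_both_reads else "false")
    return ":".join(parts)


basic_step = illuminaclip_step("adapters.fa", 2, 30, 10)
full_step = illuminaclip_step("adapters.fa", 2, 30, 10, keep_both_reads=True)
```

The threshold values used here are placeholders chosen for the example, not recommendations.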
Output results
- label:
Reads file (mate 1)
- type:
list:basic:file
- label:
Reads file
- type:
basic:file
- required:
False
- label:
Reads file (mate 2)
- type:
list:basic:file
- label:
Reads file
- type:
basic:file
- required:
False
- label:
Quality control with FastQC (Upstream)
- type:
list:basic:file:html
- label:
Quality control with FastQC (Downstream)
- type:
list:basic:file:html
- label:
Download FastQC archive (Upstream)
- type:
list:basic:file
- label:
Download FastQC archive (Downstream)
- type:
list:basic:file
Trimmomatic (single-end)
- data:reads:fastq:single:trimmomatictrimmomatic-single (data:reads:fastq:single reads, data:seq:nucleotide adapters, basic:integer seed_mismatches, basic:integer simple_clip_threshold, basic:integer window_size, basic:integer required_quality, basic:integer target_length, basic:decimal strictness, basic:integer leading, basic:integer trailing, basic:integer crop, basic:integer headcrop, basic:integer minlen, basic:integer average_quality)[Source: v2.5.2]
Trimmomatic performs a variety of useful trimming tasks including removing adapters for Illumina paired-end and single-end data. FastQC is performed for quality control checks on trimmed raw sequence data, which are the output of Trimmomatic. See [Trimmomatic official website](http://www.usadellab.org/cms/?page=trimmomatic), the [introductory paper](https://www.ncbi.nlm.nih.gov/pubmed/24695404), and the [FastQC official website](https://www.bioinformatics.babraham.ac.uk/projects/fastqc/) for more information.
Input arguments
- label:
Reads
- type:
data:reads:fastq:single
- label:
Adapter sequences
- type:
data:seq:nucleotide
- description:
Adapter sequence in FASTA format that will be removed from the read. This field as well as the ‘Seed mismatches’ and ‘Simple clip threshold’ parameters are needed to perform Illuminaclip.
- required:
False
- label:
Seed mismatches
- type:
basic:integer
- description:
Specifies the maximum mismatch count which will still allow a full match to be performed. This field as well as the ‘Adapter sequences’ and ‘Simple clip threshold’ parameters are needed to perform Illuminaclip.
- required:
False
- disabled:
!illuminaclip.adapters
- label:
Simple clip threshold
- type:
basic:integer
- description:
Specifies how accurate the match between any adapter etc. sequence must be against a read. This field as well as the ‘Adapter sequences’ and ‘Seed mismatches’ parameters are needed to perform Illuminaclip.
- required:
False
- disabled:
!illuminaclip.adapters
- label:
Window size
- type:
basic:integer
- description:
Specifies the number of bases to average across. This field as well as ‘Required quality’ are needed to perform a ‘Sliding window’ trimming (cutting once the average quality within the window falls below a threshold).
- required:
False
- label:
Required quality
- type:
basic:integer
- description:
Specifies the average quality required in window size. This field as well as ‘Window size’ are needed to perform a ‘Sliding window’ trimming (cutting once the average quality within the window falls below a threshold).
- required:
False
- label:
Target length
- type:
basic:integer
- description:
This specifies the read length which is likely to allow the location of the read within the target sequence to be determined. This field as well as ‘Strictness’ are needed to perform ‘Maxinfo’ feature (an adaptive quality trimmer which balances read length and error rate to maximise the value of each read).
- required:
False
- label:
Strictness
- type:
basic:decimal
- description:
This value, which should be set between 0 and 1, specifies the balance between preserving as much read length as possible vs. removal of incorrect bases. A low value of this parameter (<0.2) favours longer reads, while a high value (>0.8) favours read correctness. This field as well as ‘Target length’ are needed to perform ‘Maxinfo’ feature (an adaptive quality trimmer which balances read length and error rate to maximise the value of each read).
- required:
False
- label:
Leading quality
- type:
basic:integer
- description:
Remove low quality bases from the beginning, if below a threshold quality.
- required:
False
- label:
Trailing quality
- type:
basic:integer
- description:
Remove low quality bases from the end, if below a threshold quality.
- required:
False
- label:
Crop
- type:
basic:integer
- description:
Cut the read to a specified length by removing bases from the end.
- required:
False
- label:
Headcrop
- type:
basic:integer
- description:
Cut the specified number of bases from the start of the read.
- required:
False
- label:
Minimum length
- type:
basic:integer
- description:
Drop the read if it is below a specified length.
- required:
False
- label:
Average quality
- type:
basic:integer
- description:
Drop the read if the average quality is below the specified level.
- required:
False
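The ‘Sliding window’ and ‘Minimum length’ options described above can be illustrated with a minimal Python sketch. It is a simplified model under stated assumptions, not the trimming tool's actual implementation: the function names `sliding_window_trim` and `trim_read` are hypothetical, qualities are plain Phred integers, and reads shorter than the window are left untouched.

```python
def sliding_window_trim(qualities, window_size, required_quality):
    """Return how many bases to keep from the start of the read.

    Assumption (not from the source): the read is cut at the start of the
    first window whose average quality is below required_quality, and reads
    shorter than the window are kept whole.
    """
    for start in range(len(qualities) - window_size + 1):
        window = qualities[start:start + window_size]
        if sum(window) / window_size < required_quality:
            return start
    return len(qualities)


def trim_read(sequence, qualities, window_size, required_quality, min_length=0):
    """Apply the sliding window cut, then the 'Minimum length' filter."""
    keep = sliding_window_trim(qualities, window_size, required_quality)
    if keep < min_length:
        return None  # the read is dropped
    return sequence[:keep], qualities[:keep]
```

With a window size of 4 and a required quality of 20, a read whose qualities are four 30s followed by four 10s keeps its first three bases, because the window starting at the fourth base is the first one that averages below 20.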
Output results
- label:
Reads file
- type:
list:basic:file
- label:
Quality control with FastQC
- type:
list:basic:file:html
- label:
Download FastQC archive
- type:
list:basic:file
UMI-tools dedup
- data:alignment:bam:umitools:dedup:umi-tools-dedup (data:alignment:bam alignment)[Source: v1.5.1]
Deduplicate reads using UMI and mapping coordinates.
Input arguments
- label:
Alignment
- type:
data:alignment:bam
- required:
True
- disabled:
False
- hidden:
False
Output results
- label:
Clipped BAM file
- type:
basic:file
- required:
True
- disabled:
False
- hidden:
False
- label:
Index of clipped BAM file
- type:
basic:file
- required:
True
- disabled:
False
- hidden:
False
- label:
Alignment statistics
- type:
basic:file
- required:
True
- disabled:
False
- hidden:
False
- label:
Deduplication log
- type:
basic:file
- required:
True
- disabled:
False
- hidden:
False
- label:
Deduplication stats
- type:
basic:file
- required:
True
- disabled:
False
- hidden:
False
- label:
Species
- type:
basic:string
- required:
True
- disabled:
False
- hidden:
False
- label:
Build
- type:
basic:string
- required:
True
- disabled:
False
- hidden:
False
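As a rough illustration of deduplicating by UMI and mapping coordinates, the sketch below groups reads by chromosome, position, strand and UMI, and keeps one read per group. The `Read` record and the `dedup_by_umi` function are hypothetical, and only exact UMI matches are grouped; tools such as UMI-tools can additionally merge UMIs that differ by sequencing errors, which is not shown here.

```python
from collections import namedtuple

# Hypothetical minimal read record; real BAM records carry many more fields.
Read = namedtuple("Read", ["name", "chrom", "pos", "strand", "umi", "mapq"])


def dedup_by_umi(reads):
    """Keep one read per (chrom, pos, strand, UMI) group.

    Assumption: within a group, the read with the highest mapping quality
    is retained; ties keep the read seen first.
    """
    best = {}
    for read in reads:
        key = (read.chrom, read.pos, read.strand, read.umi)
        kept = best.get(key)
        if kept is None or read.mapq > kept.mapq:
            best[key] = read
    return list(best.values())
```

Reads that share a position but carry different UMIs are treated as distinct molecules and are all retained.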
Upload microarray expression (unmapped)
- data:microarray:normalized:upload-microarray-expression (basic:file exp, basic:string exp_type, basic:string platform, basic:string platform_id, basic:string species)[Source: v1.1.1]
Import unmapped microarray expression data.
Input arguments
- label:
Normalized expression
- type:
basic:file
- description:
Normalized expression file with the original probe IDs. Supported file extensions are .tab.*, .tsv.*, .txt.*
- required:
True
- disabled:
False
- hidden:
False
- label:
Normalization type
- type:
basic:string
- required:
True
- disabled:
False
- hidden:
False
- label:
Microarray platform name
- type:
basic:string
- required:
True
- disabled:
False
- hidden:
False
- label:
GEO platform ID
- type:
basic:string
- description:
Platform ID according to the GEO database. This can be used in following steps to automatically map probe IDs to genes.
- required:
False
- disabled:
False
- hidden:
False
- label:
Species
- type:
basic:string
- description:
Select a species name from the dropdown menu or write a custom species name in the species field.
- required:
True
- disabled:
False
- hidden:
False
- choices:
Homo sapiens:
Homo sapiens
Mus musculus:
Mus musculus
Rattus norvegicus:
Rattus norvegicus
Macaca mulatta:
Macaca mulatta
Dictyostelium discoideum:
Dictyostelium discoideum
Output results
- label:
Uploaded normalized expression
- type:
basic:file
- required:
True
- disabled:
False
- hidden:
False
- label:
Normalization type
- type:
basic:string
- required:
True
- disabled:
False
- hidden:
False
- label:
Microarray platform type
- type:
basic:string
- required:
True
- disabled:
False
- hidden:
False
- label:
GEO platform ID
- type:
basic:string
- required:
False
- disabled:
False
- hidden:
False
- label:
Species
- type:
basic:string
- required:
True
- disabled:
False
- hidden:
False
Upload proteomics sample
- data:proteomics:massspectrometry:upload-proteomics-sample (basic:file src, basic:string species, basic:string source)[Source: v1.2.1]
Upload a mass spectrometry proteomics sample data file. The input 5-column tab-delimited file with the .txt suffix is expected to contain a header line with the following meta-data column names: “Uniprot ID”, “Gene symbol”, “Protein name” and “Number of peptides”. The fifth column contains the sample data.
Input arguments
- label:
Table containing mass spectrometry data (.txt)
- type:
basic:file
- required:
True
- disabled:
False
- hidden:
False
- label:
Species
- type:
basic:string
- description:
Select a species name from the dropdown menu or write a custom species name in the species field.
- required:
True
- disabled:
False
- hidden:
False
- choices:
Homo sapiens:
Homo sapiens
Mus musculus:
Mus musculus
Rattus norvegicus:
Rattus norvegicus
- label:
Protein ID database source
- type:
basic:string
- required:
True
- disabled:
False
- hidden:
False
- default:
UniProtKB
- choices:
UniProtKB:
UniProtKB
Output results
- label:
Uploaded table
- type:
basic:file
- required:
True
- disabled:
False
- hidden:
False
- label:
Species
- type:
basic:string
- required:
True
- disabled:
False
- hidden:
False
- label:
Source
- type:
basic:string
- required:
True
- disabled:
False
- hidden:
False
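The expected input layout can be checked before upload with a short Python sketch. The helper names `sample_columns` and `check_single_sample` are hypothetical, and whether the process itself validates the header in this way is an assumption; the column names are taken from the description above.

```python
import csv
import io

# Meta-data column names listed in the process description.
META_COLUMNS = ("Uniprot ID", "Gene symbol", "Protein name", "Number of peptides")


def sample_columns(handle):
    """Return the header names that are not meta-data columns."""
    header = next(csv.reader(handle, delimiter="\t"))
    missing = [name for name in META_COLUMNS if name not in header]
    if missing:
        raise ValueError("missing meta-data columns: " + ", ".join(missing))
    return [name for name in header if name not in META_COLUMNS]


def check_single_sample(handle):
    """Require exactly one sample column, as in the 5-column layout."""
    samples = sample_columns(handle)
    if len(samples) != 1:
        raise ValueError("expected exactly one sample column, found %d" % len(samples))
    return samples[0]
```

For example, `check_single_sample(io.StringIO(text))` returns the name of the fifth column when `text` starts with a valid tab-delimited header line.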
Upload proteomics sample set
- data:proteomics:sampleset:upload-proteomics-sample-set (basic:file src, basic:string species, basic:string source)[Source: v1.2.1]
Upload a mass spectrometry proteomics sample set file. The input multi-sample tab-delimited file with the .txt suffix is expected to contain a header line with the following meta-data column names: “Uniprot ID”, “Gene symbol”, “Protein name” and “Number of peptides”. Each additional column in the input file should contain data for a single sample.
Input arguments
- label:
Table containing mass spectrometry data (.txt)
- type:
basic:file
- required:
True
- disabled:
False
- hidden:
False
- label:
Species
- type:
basic:string
- description:
Select a species name from the dropdown menu or write a custom species name in the species field.
- required:
True
- disabled:
False
- hidden:
False
- choices:
Homo sapiens:
Homo sapiens
Mus musculus:
Mus musculus
Rattus norvegicus:
Rattus norvegicus
- label:
Protein ID database source
- type:
basic:string
- required:
True
- disabled:
False
- hidden:
False
- default:
UniProtKB
- choices:
UniProtKB:
UniProtKB
Output results
- label:
Uploaded table
- type:
basic:file
- required:
True
- disabled:
False
- hidden:
False
- label:
Species
- type:
basic:string
- required:
True
- disabled:
False
- hidden:
False
- label:
Source
- type:
basic:string
- required:
True
- disabled:
False
- hidden:
False
VCF file
- data:variants:vcfupload-variants-vcf (basic:file src, basic:string species, basic:string build)[Source: v2.3.0]
Upload variants in VCF format.
Input arguments
- label:
Variants (VCF)
- type:
basic:file
- description:
Variants in VCF format.
- required:
True
- validate_regex:
\.(vcf)(|\.gz|\.bz2|\.tgz|\.tar\.gz|\.tar\.bz2|\.zip|\.rar|\.7z)$
- label:
Species
- type:
basic:string
- description:
Species Latin name.
- choices:
Homo sapiens:
Homo sapiens
Mus musculus:
Mus musculus
Rattus norvegicus:
Rattus norvegicus
Dictyostelium discoideum:
Dictyostelium discoideum
Odocoileus virginianus texanus:
Odocoileus virginianus texanus
Solanum tuberosum:
Solanum tuberosum
- label:
Genome build
- type:
basic:string
Output results
- label:
Uploaded file
- type:
basic:file
- label:
Tabix index
- type:
basic:file
- label:
Species
- type:
basic:string
- label:
Build
- type:
basic:string
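The `validate_regex` pattern listed for the ‘Variants (VCF)’ input can be tried out locally. The sketch below applies the same pattern with Python's `re` module; the function name `is_accepted_vcf_name` is hypothetical, and the use of search semantics with case-sensitive matching is an assumption about how the platform applies the pattern.

```python
import re

# Pattern copied from the validate_regex field above.
VCF_NAME = re.compile(
    r"\.(vcf)(|\.gz|\.bz2|\.tgz|\.tar\.gz|\.tar\.bz2|\.zip|\.rar|\.7z)$"
)


def is_accepted_vcf_name(filename):
    """Return True if the file name ends in .vcf, optionally compressed."""
    return VCF_NAME.search(filename) is not None
```

Under these assumptions, names such as `variants.vcf` and `variants.vcf.gz` are accepted, while `variants.bcf` and `variants.vcf.txt` are rejected.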
Variant calling (CheMut)
- data:variants:vcf:chemut:vc-chemut (data:seq:nucleotide genome, list:data:alignment:bam parental_strains, list:data:alignment:bam mutant_strains, basic:boolean base_recalibration, data:variants:vcf known_sites, list:data:variants:vcf known_indels, basic:string PL, basic:string LB, basic:string PU, basic:string CN, basic:date DT, data:bed intervals, basic:integer ploidy, basic:integer stand_call_conf, basic:integer mbq, basic:integer max_reads, basic:integer java_gc_threads, basic:integer max_heap_size)[Source: v3.0.1]
CheMut variant calling using multiple BAM input files.
Input arguments
- label:
Reference genome
- type:
data:seq:nucleotide
- required:
True
- disabled:
False
- hidden:
False
- label:
Parental strains
- type:
list:data:alignment:bam
- required:
True
- disabled:
False
- hidden:
False
- label:
Mutant strains
- type:
list:data:alignment:bam
- required:
True
- disabled:
False
- hidden:
False
- label:
Do variant base recalibration
- type:
basic:boolean
- required:
True
- disabled:
False
- hidden:
False
- default:
False
- label:
dbSNP file
- type:
data:variants:vcf
- description:
Database of known polymorphic sites.
- required:
False
- disabled:
False
- hidden:
False
- label:
Known indels
- type:
list:data:variants:vcf
- required:
False
- disabled:
False
- hidden:
!base_recalibration
- label:
Platform/technology
- type:
basic:string
- description:
Platform/technology used to produce the reads.
- required:
True
- disabled:
False
- hidden:
False
- default:
Illumina
- choices:
Capillary:
Capillary
Ls454:
Ls454
Illumina:
Illumina
SOLiD:
SOLiD
Helicos:
Helicos
IonTorrent:
IonTorrent
Pacbio:
Pacbio
- label:
Library
- type:
basic:string
- required:
True
- disabled:
False
- hidden:
False
- default:
x
- label:
Platform unit
- type:
basic:string
- description:
Platform unit (e.g. flowcell-barcode.lane for Illumina or slide for SOLiD). Unique identifier.
- required:
True
- disabled:
False
- hidden:
False
- default:
x
- label:
Sequencing center
- type:
basic:string
- description:
Name of sequencing center producing the read.
- required:
True
- disabled:
False
- hidden:
False
- default:
x
- label:
Date
- type:
basic:date
- description:
Date the run was produced.
- required:
True
- disabled:
False
- hidden:
False
- default:
2017-01-01
- label:
Intervals (from BED file)
- type:
data:bed
- description:
Use this option to perform the analysis over only part of the genome.
- required:
False
- disabled:
False
- hidden:
False
- label:
Sample ploidy
- type:
basic:integer
- description:
Ploidy (number of chromosomes) per sample. For pooled data, set to (Number of samples in each pool * Sample Ploidy).
- required:
True
- disabled:
False
- hidden:
False
- default:
2
- label:
Min call confidence threshold
- type:
basic:integer
- description:
The minimum phred-scaled confidence threshold at which variants should be called.
- required:
True
- disabled:
False
- hidden:
False
- default:
30
- label:
Min Base Quality
- type:
basic:integer
- description:
Minimum base quality required to consider a base for calling.
- required:
True
- disabled:
False
- hidden:
False
- default:
10
- label:
Max reads per alignment start site
- type:
basic:integer
- description:
Maximum number of reads to retain per alignment start position. Reads above this threshold will be downsampled. Set to 0 to disable.
- required:
True
- disabled:
False
- hidden:
False
- default:
50
- label:
Java ParallelGCThreads
- type:
basic:integer
- description:
Sets the number of threads used during parallel phases of the garbage collectors.
- required:
True
- disabled:
False
- hidden:
False
- default:
2
- label:
Java maximum heap size (Xmx)
- type:
basic:integer
- description:
Set the maximum Java heap size (in GB).
- required:
True
- disabled:
False
- hidden:
False
- default:
12
Output results
- label:
Called variants file
- type:
basic:file
- required:
True
- disabled:
False
- hidden:
False
- label:
Tabix index
- type:
basic:file
- required:
True
- disabled:
False
- hidden:
False
- label:
Species
- type:
basic:string
- required:
True
- disabled:
False
- hidden:
False
- label:
Build
- type:
basic:string
- required:
True
- disabled:
False
- hidden:
False
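Two of the inputs above lend themselves to a small worked example: the pooled-ploidy rule and the read group fields (PL, LB, PU, CN, DT). The Python sketch below is illustrative only. The function names, the `ID` and `SM` values, and the idea that the fields end up in a SAM `@RG` header line in exactly this form are assumptions, not a description of what the process does internally.

```python
def pooled_ploidy(samples_per_pool, sample_ploidy=2):
    """Ploidy to enter for pooled data: samples in each pool * sample ploidy."""
    return samples_per_pool * sample_ploidy


def read_group_line(rg_id, sample, PL="Illumina", LB="x", PU="x", CN="x",
                    DT="2017-01-01"):
    """Format a SAM @RG header line from the read group inputs.

    Defaults mirror the defaults listed above; rg_id and sample are
    hypothetical values supplied by the caller.
    """
    fields = [("ID", rg_id), ("SM", sample), ("PL", PL), ("LB", LB),
              ("PU", PU), ("CN", CN), ("DT", DT)]
    return "@RG\t" + "\t".join(f"{tag}:{value}" for tag, value in fields)
```

For instance, a pool of 10 diploid samples would be entered with a ploidy of `pooled_ploidy(10, 2)`, that is 20.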
Variant filtering (CheMut)
- data:variants:vcf:filtering:filtering-chemut (data:variants:vcf variants, basic:string analysis_type, basic:string parental_strain, basic:string mutant_strain, data:seq:nucleotide genome, basic:integer read_depth)[Source: v1.8.2]
Filtering and annotation of Variant Calling (CheMut) data: chemical mutagenesis in _Dictyostelium discoideum_.
Input arguments
- label:
Variants file (VCF)
- type:
data:variants:vcf
- required:
True
- disabled:
False
- hidden:
False
- label:
Analysis type
- type:
basic:string
- description:
Choice of the analysis type. Use ‘SNV’ or ‘INDEL’ options. Choose options SNV_CHR2 or INDEL_CHR2 to run the GATK analysis only on the diploid portion of CHR2 (-ploidy 2 -L chr2:2263132-3015703).
- required:
True
- disabled:
False
- hidden:
False
- default:
snv
- choices:
SNV:
snv
INDEL:
indel
SNV_CHR2:
snv_chr2
INDEL_CHR2:
indel_chr2
- label:
Parental strain prefix
- type:
basic:string
- required:
True
- disabled:
False
- hidden:
False
- default:
parental
- label:
Mutant strain prefix
- type:
basic:string
- required:
True
- disabled:
False
- hidden:
False
- default:
mut
- label:
Reference genome
- type:
data:seq:nucleotide
- required:
True
- disabled:
False
- hidden:
False
- label:
Read Depth Cutoff
- type:
basic:integer
- required:
True
- disabled:
False
- hidden:
False
- default:
5
Output results
- label:
Summary
- type:
basic:file
- description:
Summarize the input parameters and results.
- required:
True
- disabled:
False
- hidden:
False
- label:
Variants
- type:
basic:file
- description:
A genome VCF file of variants that passed the filters.
- required:
True
- disabled:
False
- hidden:
False
- label:
Tabix index
- type:
basic:file
- required:
True
- disabled:
False
- hidden:
False
- label:
Variants filtered
- type:
basic:file
- description:
A data frame of variants that passed the filters.
- required:
False
- disabled:
False
- hidden:
False
- label:
Variants filtered (multiple alt. alleles)
- type:
basic:file
- description:
A data frame of variants that contain more than two alternative alleles. These variants are likely to be false positives.
- required:
False
- disabled:
False
- hidden:
False
- label:
Gene list (all)
- type:
basic:file
- description:
Genes that are mutated at least once.
- required:
False
- disabled:
False
- hidden:
False
- label:
Gene list (top)
- type:
basic:file
- description:
Genes that are mutated at least twice.
- required:
False
- disabled:
False
- hidden:
False
- label:
Mutations (by chr)
- type:
basic:file
- description:
List mutations in individual chromosomes.
- required:
False
- disabled:
False
- hidden:
False
- label:
Mutations (by strain)
- type:
basic:file
- description:
List mutations in individual strains.
- required:
False
- disabled:
False
- hidden:
False
- label:
Strain (by gene)
- type:
basic:file
- description:
List mutants that carry mutations in individual genes.
- required:
False
- disabled:
False
- hidden:
False
- label:
Species
- type:
basic:string
- required:
True
- disabled:
False
- hidden:
False
- label:
Build
- type:
basic:string
- required:
True
- disabled:
False
- hidden:
False
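The relationship between the ‘Read Depth Cutoff’ input and the two gene list outputs can be sketched in Python. This is a hedged illustration, not the process's filtering code: the record layout (dictionaries with `gene` and `depth` keys), the function name `gene_lists`, the inclusive comparison with the cutoff, and counting each passing variant as one mutation are all assumptions.

```python
from collections import Counter


def gene_lists(variants, read_depth=5):
    """Return (all_genes, top_genes) for variants that pass the depth cutoff.

    all_genes: genes mutated at least once.
    top_genes: genes mutated at least twice.
    """
    passed = [v for v in variants if v["depth"] >= read_depth]
    counts = Counter(v["gene"] for v in passed)
    all_genes = sorted(counts)
    top_genes = sorted(gene for gene, n in counts.items() if n >= 2)
    return all_genes, top_genes
```

A gene whose only supporting variant falls below the cutoff appears in neither list.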
WALT
- data:alignment:bam:waltwalt (data:index:walt genome, data:reads:fastq reads, basic:boolean rm_dup, basic:integer optical_distance, basic:integer mismatch, basic:integer number, basic:string spikein_name, basic:boolean filter_spikein)[Source: v3.7.2]
WALT (Wildcard ALignment Tool) is a read mapping program for bisulfite sequencing in DNA methylation studies.
Input arguments
- label:
Reference genome
- type:
data:index:walt
- label:
Reads
- type:
data:reads:fastq
- label:
Remove duplicates
- type:
basic:boolean
- default:
True
- label:
Optical duplicate distance
- type:
basic:integer
- description:
The maximum offset between two duplicate clusters in order to consider them optical duplicates. Suggested settings are 100 for HiSeq-style platforms or about 2500 for NovaSeq ones. The default of 0 disables the search for optical duplicates.
- disabled:
!rm_dup
- default:
0
- label:
Maximum allowed mismatches
- type:
basic:integer
- required:
False
- label:
Number of reads to map in one loop
- type:
basic:integer
- description:
Sets the number of reads to map in each loop. A larger number results in the program using more memory. This is especially evident in paired-end mapping.
- required:
False
- label:
Chromosome name of unmethylated control sequence
- type:
basic:string
- description:
Specifies the name of the unmethylated control sequence, which is output as a separate alignment file. It is recommended to remove duplicates to reduce any bias introduced by incomplete conversion on PCR duplicate reads.
- required:
False
- label:
Remove control/spike-in sequences.
- type:
basic:boolean
- description:
Remove unmethylated control reads from the final alignment based on the provided name. It is recommended to remove any reads that are not naturally occurring in the sample (e.g. lambda virus spike-in).
- disabled:
!spikein_options.spikein_name
- default:
False
Output results
- label:
Alignment file (BAM)
- type:
basic:file
- description:
Position sorted alignment in .bam format
- label:
Index BAI
- type:
basic:file
- label:
Statistics
- type:
basic:file
- label:
Alignment file (MR)
- type:
basic:file
- description:
Position sorted alignment in .mr format.
- label:
Removed duplicates statistics
- type:
basic:file
- required:
False
- label:
Unmapped reads
- type:
basic:file
- required:
False
- label:
Alignment file of unmethylated control reads
- type:
basic:file
- required:
False
- label:
Species
- type:
basic:string
- label:
Build
- type:
basic:string
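The ‘Optical duplicate distance’ input can be pictured with a small Python sketch. It assumes that each read is described by a hypothetical `(tile, x, y)` tuple parsed from its name, that the two reads have already been identified as duplicates, and that closeness is tested separately on each axis; the duplicate-marking tool used by the process may define closeness differently.

```python
def is_optical_duplicate(first, second, optical_distance=0):
    """Decide whether two duplicate reads count as optical duplicates.

    first, second: (tile, x, y) tuples (hypothetical layout).
    optical_distance: maximum offset; 0 means optical duplicates are not
    looked for, matching the default described above.
    """
    if optical_distance <= 0:
        return False
    tile_a, x_a, y_a = first
    tile_b, x_b, y_b = second
    return (
        tile_a == tile_b
        and abs(x_a - x_b) <= optical_distance
        and abs(y_a - y_b) <= optical_distance
    )
```

With a distance of 100, two duplicates on the same tile that are 50 and 80 units apart on the two axes would be classed as optical duplicates, while the same pair with the default of 0 would not.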
WALT genome index
- data:index:walt:walt-index (data:seq:nucleotide ref_seq)[Source: v1.2.1]
Create WALT genome index.
Input arguments
- label:
Reference sequence (nucleotide FASTA)
- type:
data:seq:nucleotide
- required:
True
- disabled:
False
- hidden:
False
Output results
- label:
WALT index
- type:
basic:dir
- required:
True
- disabled:
False
- hidden:
False
- label:
FASTA file (compressed)
- type:
basic:file
- required:
True
- disabled:
False
- hidden:
False
- label:
FASTA file
- type:
basic:file
- required:
True
- disabled:
False
- hidden:
False
- label:
FASTA file index
- type:
basic:file
- required:
True
- disabled:
False
- hidden:
False
- label:
Species
- type:
basic:string
- required:
True
- disabled:
False
- hidden:
False
- label:
Build
- type:
basic:string
- required:
True
- disabled:
False
- hidden:
False
WGBS (paired-end)
- data:workflow:wgbsworkflow-wgbs-paired (data:reads:fastq:paired reads, data:index:walt walt_index, data:seq:nucleotide ref_seq, basic:string validation_stringency, data:seq:nucleotide adapters, basic:integer seed_mismatches, basic:integer simple_clip_threshold, basic:integer min_adapter_length, basic:integer palindrome_clip_threshold, basic:boolean keep_both_reads, basic:integer leading, basic:integer trailing, basic:integer crop, basic:integer headcrop, basic:integer minlen, basic:boolean rm_dup, basic:integer optical_distance, basic:integer mismatch, basic:integer number, basic:string spikein_name, basic:boolean filter_spikein, basic:boolean skip, data:seq:nucleotide sequence, basic:boolean count_all, basic:integer read_length, basic:decimal max_mismatch, basic:boolean a_rich, basic:boolean cpgs, basic:boolean symmetric_cpgs, data:seq:nucleotide adapters, basic:integer insert_size, basic:string pair_orientation, basic:integer read_length, basic:integer min_map_quality, basic:integer min_quality, basic:integer coverage_cap, basic:integer accumulation_cap, basic:integer sample_size, basic:integer min_quality, basic:integer next_base_quality, basic:integer min_lenght, basic:decimal mismatch_rate, basic:decimal minimum_fraction, basic:boolean include_duplicates, basic:decimal deviations)[Source: v2.2.0]
This WGBS pipeline consists of trimming, alignment, computation of methylation levels, identification of hypo-methylated regions (HMRs) and additional QC steps. First, reads are trimmed to remove adapters or kit-specific artifacts. Reads are then aligned by the __WALT__ aligner. [WALT (Wildcard ALignment Tool)](https://github.com/smithlabcode/walt) is a fast and accurate read mapper for bisulfite sequencing. Then, the methylation level at each genomic cytosine is calculated using __methcounts__. Finally, hypo-methylated regions are identified using __hmr__. Both methcounts and hmr are part of the [MethPipe](http://smithlabresearch.org/software/methpipe/) package. QC steps are based on [Picard](http://broadinstitute.github.io/picard/) and include high-level metrics about the alignment, WGS performance and summary statistics from bisulfite sequencing. Final QC reports are summarized by MultiQC.
Input arguments
- label:
Select sample(s)
- type:
data:reads:fastq:paired
- label:
Walt index
- type:
data:index:walt
- label:
Reference sequence
- type:
data:seq:nucleotide
- label:
Validation stringency
- type:
basic:string
- description:
Validation stringency for all SAM files read by this program. Setting stringency to SILENT can improve performance when processing a BAM file in which variable-length data (read, qualities, tags) do not otherwise need to be decoded. Default is STRICT.
- default:
STRICT
- choices:
STRICT:
STRICT
LENIENT:
LENIENT
SILENT:
SILENT
- label:
Adapter sequences
- type:
data:seq:nucleotide
- description:
Adapter sequence in FASTA format that will be removed from the read. This field as well as ‘Seed mismatches’, ‘Simple clip threshold’ and ‘Palindrome clip threshold’ parameters are needed to perform adapter trimming. ‘Minimum adapter length’ and ‘Keep both reads’ are optional parameters.
- required:
False
- label:
Seed mismatches
- type:
basic:integer
- description:
Specifies the maximum mismatch count which will still allow a full match to be performed. This field is required to perform adapter trimming.
- required:
False
- disabled:
!adapter_trimming.adapters
- label:
Simple clip threshold
- type:
basic:integer
- description:
Specifies how accurate the match between any adapter sequence and a read must be. This field is required to perform adapter trimming.
- required:
False
- disabled:
!adapter_trimming.adapters
- label:
Minimum adapter length
- type:
basic:integer
- description:
In addition to the alignment score, palindrome mode can verify that a minimum length of adapter has been detected. If unspecified, this defaults to 8 bases, for historical reasons. However, since palindrome mode has a very low false positive rate, this can be safely reduced, even down to 1, to allow shorter adapter fragments to be removed.
- disabled:
!adapter_trimming.seed_mismatches && !adapter_trimming.simple_clip_threshold && !adapter_trimming.palindrome_clip_threshold
- default:
8
- label:
Palindrome clip threshold
- type:
basic:integer
- description:
Specifies how accurate the match between the two ‘adapter ligated’ reads must be for PE palindrome read alignment. This field is required to perform adapter trimming.
- required:
False
- disabled:
!adapter_trimming.adapters
- label:
Keep both reads
- type:
basic:boolean
- description:
After read-through has been detected by palindrome mode, and the adapter sequence removed, the reverse read contains the same sequence information as the forward read, albeit in reverse complement. For this reason, the default behaviour is to entirely drop the reverse read. By specifying this parameter, the reverse read will also be retained, which may be useful e.g. if the downstream tools cannot handle a combination of paired and unpaired reads. This field is optional for performing adapter trimming.
- required:
False
- disabled:
!adapter_trimming.seed_mismatches && !adapter_trimming.simple_clip_threshold && !adapter_trimming.palindrome_clip_threshold && !adapter_trimming.min_adapter_length
- label:
Leading quality
- type:
basic:integer
- description:
Remove low quality bases from the beginning, if below a threshold quality.
- required:
False
- label:
Trailing quality
- type:
basic:integer
- description:
Remove low quality bases from the end, if below a threshold quality.
- required:
False
- label:
Crop
- type:
basic:integer
- description:
Cut the read to a specified length by removing bases from the end.
- required:
False
- label:
Headcrop
- type:
basic:integer
- description:
Cut the specified number of bases from the start of the read.
- required:
False
- label:
Minimum length
- type:
basic:integer
- description:
Drop the read if it is below a specified length.
- required:
False
- label:
Remove duplicates
- type:
basic:boolean
- default:
True
- label:
Optical duplicate distance
- type:
basic:integer
- description:
The maximum offset between two duplicate clusters in order to consider them optical duplicates. Suggested settings are 100 for HiSeq-style platforms or about 2500 for NovaSeq ones. The default of 0 disables the search for optical duplicates.
- disabled:
!alignment.rm_dup
- default:
0
- label:
Maximum allowed mismatches
- type:
basic:integer
- default:
6
- label:
Number of reads to map in one loop
- type:
basic:integer
- description:
Sets the number of reads to map in each loop. A larger number results in the program using more memory. This is especially evident in paired-end mapping.
- required:
False
- label:
Chromosome name of unmethylated control sequence
- type:
basic:string
- description:
Specifies the name of the unmethylated control sequence, which is output as a separate alignment file. It is recommended to remove duplicates to reduce any bias introduced by incomplete conversion on PCR duplicate reads.
- required:
False
- label:
Remove control/spike-in sequences.
- type:
basic:boolean
- description:
Remove unmethylated control reads from the final alignment based on the provided name. It is recommended to remove any reads that are not naturally occurring in the sample (e.g. lambda virus spike-in).
- disabled:
!alignment.spikein_name
- default:
False
- label:
Skip Bisulfite conversion rate step
- type:
basic:boolean
- description:
The bisulfite conversion rate step can be skipped. If a separate alignment file for the unmethylated control sequence is not produced during the alignment, this process will fail.
- disabled:
!alignment.spikein_name
- default:
True
- label:
Unmethylated control sequence
- type:
data:seq:nucleotide
- required:
False
- disabled:
bsrate.skip
- label:
Count all cytosines including CpGs
- type:
basic:boolean
- disabled:
bsrate.skip
- default:
True
- label:
Average read length
- type:
basic:integer
- default:
150
- label:
Maximum fraction of mismatches
- type:
basic:decimal
- required:
False
- disabled:
bsrate.skip
- label:
Reads are A-rich
- type:
basic:boolean
- disabled:
bsrate.skip
- default:
False
- label:
Only CpG context sites
- type:
basic:boolean
- description:
Output file will contain methylation data for CpG context sites only. Choosing this option will result in CpG content report only.
- disabled:
methcounts.symmetric_cpgs
- default:
False
- label:
Merge CpG pairs
- type:
basic:boolean
- description:
Merging CpG pairs results in symmetric methylation levels. Methylation is usually symmetric (cytosines at CpG sites are methylated on both DNA strands). Choosing this option will keep only the CpG site data.
- disabled:
methcounts.cpgs
- default:
True
- label:
Adapter sequences
- type:
data:seq:nucleotide
- required:
False
- label:
Maximum insert size
- type:
basic:integer
- default:
100000
- label:
Pair orientation
- type:
basic:string
- default:
null
- choices:
Unspecified:
null
FR:
FR
RF:
RF
TANDEM:
TANDEM
- label:
Average read length
- type:
basic:integer
- default:
150
- label:
Minimum mapping quality for a read to contribute coverage
- type:
basic:integer
- default:
20
- label:
Minimum base quality for a base to contribute coverage
- type:
basic:integer
- description:
N bases will be treated as having a base quality of negative infinity and will therefore be excluded from coverage regardless of the value of this parameter.
- default:
20
- label:
Maximum coverage cap
- type:
basic:integer
- description:
Treat positions with coverage exceeding this value as if they had coverage at this set value.
- default:
250
- label:
Ignore positions with coverage above this value
- type:
basic:integer
- description:
At positions with coverage exceeding this value, completely ignore reads that accumulate beyond this value.
- default:
100000
- label:
Sample Size used for Theoretical Het Sensitivity sampling
- type:
basic:integer
- default:
10000
- label:
Threshold for base quality of a C base before it is considered
- type:
basic:integer
- default:
20
- label:
Threshold for quality of a base next to a C before the C base is considered
- type:
basic:integer
- default:
10
- label:
Minimum read length
- type:
basic:integer
- default:
5
- label:
Maximum fraction of mismatches in a read to be considered (Between 0 and 1)
- type:
basic:decimal
- default:
0.1
- label:
Minimum fraction of reads in a category to be considered
- type:
basic:decimal
- description:
When generating the histogram, discard any data categories (out of FR, TANDEM, RF) that have fewer than this fraction of overall reads (range: 0 to 0.5).
- default:
0.05
- label:
Include reads marked as duplicates in the insert size histogram
- type:
basic:boolean
- default:
False
- label:
Deviations limit
- type:
basic:decimal
- description:
Generate mean, standard deviation and plots by trimming the data down to MEDIAN + DEVIATIONS*MEDIAN_ABSOLUTE_DEVIATION. This is done because insert size data typically includes enough anomalous values from chimeras and other artifacts to make the mean and standard deviation grossly misleading regarding the real distribution.
- default:
10.0
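The ‘Deviations limit’ rule described above, trimming the data down to MEDIAN + DEVIATIONS*MEDIAN_ABSOLUTE_DEVIATION before computing the mean and standard deviation, can be worked through in a short Python sketch. The function name `trimmed_insert_stats` is hypothetical, and the use of the population standard deviation and an inclusive upper limit are assumptions rather than details confirmed by the source.

```python
from statistics import mean, median, pstdev


def trimmed_insert_stats(insert_sizes, deviations=10.0):
    """Return (limit, mean, standard deviation) after trimming outliers.

    Values above median + deviations * median_absolute_deviation are
    discarded before the mean and standard deviation are computed.
    """
    med = median(insert_sizes)
    mad = median([abs(size - med) for size in insert_sizes])
    limit = med + deviations * mad
    kept = [size for size in insert_sizes if size <= limit]
    return limit, mean(kept), pstdev(kept)
```

For the insert sizes 100, 110, 120, 130, 140 and 5000, the median is 125 and the median absolute deviation is 15, so with the default of 10 deviations the limit is 275. The chimeric-looking value of 5000 is excluded and the mean of the remaining values is 120, instead of roughly 933 for the untrimmed data.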
Output results
WGBS (single-end)
- data:workflow:wgbsworkflow-wgbs-single (data:reads:fastq:single reads, data:index:walt walt_index, data:seq:nucleotide ref_seq, basic:string validation_stringency, data:seq:nucleotide adapters, basic:integer seed_mismatches, basic:integer simple_clip_threshold, basic:integer leading, basic:integer trailing, basic:integer crop, basic:integer headcrop, basic:integer minlen, basic:boolean rm_dup, basic:integer optical_distance, basic:integer mismatch, basic:integer number, basic:string spikein_name, basic:boolean filter_spikein, basic:boolean skip, data:seq:nucleotide sequence, basic:boolean count_all, basic:integer read_length, basic:decimal max_mismatch, basic:boolean a_rich, basic:boolean cpgs, basic:boolean symmetric_cpgs, data:seq:nucleotide adapters, basic:integer insert_size, basic:string pair_orientation, basic:integer read_length, basic:integer min_map_quality, basic:integer min_quality, basic:integer coverage_cap, basic:integer accumulation_cap, basic:integer sample_size, basic:integer min_quality, basic:integer next_base_quality, basic:integer min_lenght, basic:decimal mismatch_rate)[Source: v2.2.0]
This WGBS pipeline consists of trimming, alignment, computation of methylation levels, identification of hypo-methylated regions (HMRs) and additional QC steps. First, reads are trimmed to remove adapters or kit-specific artifacts. Reads are then aligned by the __WALT__ aligner. [WALT (Wildcard ALignment Tool)](https://github.com/smithlabcode/walt) is a fast and accurate read mapper for bisulfite sequencing. Then, the methylation level at each genomic cytosine is calculated using __methcounts__. Finally, hypo-methylated regions are identified using __hmr__. Both methcounts and hmr are part of the [MethPipe](http://smithlabresearch.org/software/methpipe/) package. QC steps are based on [Picard](http://broadinstitute.github.io/picard/) and include high-level metrics about the alignment, WGS performance and summary statistics from bisulfite sequencing. Final QC reports are summarized by MultiQC.
Input arguments
- label:
Select sample(s)
- type:
data:reads:fastq:single
- label:
Walt index
- type:
data:index:walt
- label:
Reference sequence
- type:
data:seq:nucleotide
- label:
Validation stringency
- type:
basic:string
- description:
Validation stringency for all SAM files read by this program. Setting stringency to SILENT can improve performance when processing a BAM file in which variable-length data (read, qualities, tags) do not otherwise need to be decoded. Default is STRICT.
- default:
STRICT
- choices:
STRICT:
STRICT
LENIENT:
LENIENT
SILENT:
SILENT
- label:
Adapter sequences
- type:
data:seq:nucleotide
- description:
Adapter sequence in FASTA format that will be removed from the read. This field as well as ‘Seed mismatches’ and ‘Simple clip threshold’ parameters are needed to perform adapter trimming.
- required:
False
- label:
Seed mismatches
- type:
basic:integer
- description:
Specifies the maximum mismatch count which will still allow a full match to be performed. This field is required to perform adapter trimming.
- required:
False
- disabled:
!adapter_trimming.adapters
- label:
Simple clip threshold
- type:
basic:integer
- description:
Specifies how accurate the match between any adapter sequence and a read must be. This field is required to perform adapter trimming.
- required:
False
- disabled:
!adapter_trimming.adapters
- label:
Leading quality
- type:
basic:integer
- description:
Remove low quality bases from the beginning, if below a threshold quality.
- required:
False
- label:
Trailing quality
- type:
basic:integer
- description:
Remove low quality bases from the end, if below a threshold quality.
- required:
False
- label:
Crop
- type:
basic:integer
- description:
Cut the read to a specified length by removing bases from the end.
- required:
False
- label:
Headcrop
- type:
basic:integer
- description:
Cut the specified number of bases from the start of the read.
- required:
False
- label:
Minimum length
- type:
basic:integer
- description:
Drop the read if it is below a specified length.
- required:
False
- label:
Remove duplicates
- type:
basic:boolean
- default:
True
- label:
Optical duplicate distance
- type:
basic:integer
- description:
The maximum offset between two duplicate clusters in order to consider them optical duplicates. Suggested settings of 100 for HiSeq style platforms or about 2500 for NovaSeq ones. Default is 0 to not look for optical duplicates.
- disabled:
!alignment.rm_dup
- default:
0
- label:
Maximum allowed mismatches
- type:
basic:integer
- default:
6
- label:
Number of reads to map in one loop
- type:
basic:integer
- description:
Sets the number of reads to map in each loop. A larger number results in the program using more memory. This is especially evident in paired-end mapping.
- required:
False
- label:
Chromosome name of unmethylated control sequence
- type:
basic:string
- description:
Specifies the name of the unmethylated control sequence, which is output as a separate alignment file. It is recommended to remove duplicates to reduce any bias introduced by incomplete conversion on PCR duplicate reads.
- required:
False
- label:
Remove control/spike-in sequences.
- type:
basic:boolean
- description:
Remove unmethylated control reads from the final alignment based on the provided name. It is recommended to remove any reads that do not naturally occur in the sample (e.g. lambda virus spike-in).
- disabled:
!alignment.spikein_name
- default:
False
- label:
Skip Bisulfite conversion rate step
- type:
basic:boolean
- description:
The bisulfite conversion rate step can be skipped. If a separate alignment file for the unmethylated control sequence is not produced during the alignment, this process will fail.
- disabled:
!alignment.spikein_name
- default:
True
- label:
Unmethylated control sequence
- type:
data:seq:nucleotide
- required:
False
- disabled:
bsrate.skip
- label:
Count all cytosines including CpGs
- type:
basic:boolean
- disabled:
bsrate.skip
- default:
True
- label:
Average read length
- type:
basic:integer
- default:
150
- label:
Maximum fraction of mismatches
- type:
basic:decimal
- required:
False
- disabled:
bsrate.skip
- label:
Reads are A-rich
- type:
basic:boolean
- disabled:
bsrate.skip
- default:
False
- label:
Only CpG context sites
- type:
basic:boolean
- description:
Output file will contain methylation data for CpG context sites only. Choosing this option will result in CpG content report only.
- disabled:
methcounts.symmetric_cpgs
- default:
False
- label:
Merge CpG pairs
- type:
basic:boolean
- description:
Merging CpG pairs results in symmetric methylation levels. Methylation is usually symmetric (cytosines at CpG sites were methylated on both DNA strands). Choosing this option will only keep the CpG sites data.
- disabled:
methcounts.cpgs
- default:
True
- label:
Adapter sequences
- type:
data:seq:nucleotide
- required:
False
- label:
Maximum insert size
- type:
basic:integer
- default:
100000
- label:
Pair orientation
- type:
basic:string
- default:
null
- choices:
Unspecified:
null
FR:
FR
RF:
RF
TANDEM:
TANDEM
- label:
Average read length
- type:
basic:integer
- default:
150
- label:
Minimum mapping quality for a read to contribute coverage
- type:
basic:integer
- default:
20
- label:
Minimum base quality for a base to contribute coverage
- type:
basic:integer
- description:
N bases will be treated as having a base quality of negative infinity and will therefore be excluded from coverage regardless of the value of this parameter.
- default:
20
- label:
Maximum coverage cap
- type:
basic:integer
- description:
Treat positions with coverage exceeding this value as if they had coverage at this set value.
- default:
250
- label:
Ignore positions with coverage above this value
- type:
basic:integer
- description:
At positions with coverage exceeding this value, completely ignore reads that accumulate beyond this value.
- default:
100000
- label:
Sample Size used for Theoretical Het Sensitivity sampling
- type:
basic:integer
- default:
10000
- label:
Threshold for base quality of a C base before it is considered
- type:
basic:integer
- default:
20
- label:
Threshold for quality of a base next to a C before the C base is considered
- type:
basic:integer
- default:
10
- label:
Minimum read length
- type:
basic:integer
- default:
5
- label:
Maximum fraction of mismatches in a read to be considered (Between 0 and 1)
- type:
basic:decimal
- default:
0.1
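The 'Maximum coverage cap' option listed above can be pictured with a small sketch: depths above the cap are counted as if they were equal to the cap before a summary statistic is taken. This only illustrates the idea stated in the option's description; it is not Picard's code, and the function name and depth values are made up for the example.

```python
from statistics import mean


def capped_mean_coverage(depths, coverage_cap=250):
    """Mean depth after treating every depth above the cap as the cap."""
    return mean(min(depth, coverage_cap) for depth in depths)


# Hypothetical per-position depths with one extreme pile-up.
depths = [10, 30, 1000, 20]
print(mean(depths))                  # uncapped mean: 265
print(capped_mean_coverage(depths))  # capped mean: 77.5
```

A single pile-up position no longer dominates the mean once the cap is applied.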
Output results
WGS (paired-end) analysis
- data:workflow:wgsworkflow-wgs-paired (data:reads:fastq:paired reads, data:index:bwa bwa_index, data:seq:nucleotide ref_seq, list:data:variants:vcf known_sites, data:variants:vcf hc_dbsnp, basic:string validation_stringency, data:seq:nucleotide adapters, basic:integer seed_mismatches, basic:integer simple_clip_threshold, basic:integer min_adapter_length, basic:integer palindrome_clip_threshold, basic:integer leading, basic:integer trailing, basic:integer minlen, basic:integer seed_l, basic:integer band_w, basic:decimal re_seeding, basic:boolean m, basic:integer match, basic:integer mismatch, basic:integer gap_o, basic:integer gap_e, basic:integer clipping, basic:integer unpaired_p, basic:integer report_tr, basic:boolean skip, basic:boolean remove_duplicates, basic:string assume_sort_order, basic:string read_group, data:seq:nucleotide adapters, basic:integer max_insert_size, basic:string pair_orientation, basic:integer read_length, basic:integer min_map_quality, basic:integer min_quality, basic:integer coverage_cap, basic:integer accumulation_cap, basic:integer sample_size, basic:decimal minimum_fraction, basic:boolean include_duplicates, basic:decimal deviations, basic:integer stand_call_conf, basic:integer mbq)[Source: v2.1.0]
The whole genome sequencing pipeline analyzes paired-end whole genome sequencing data. It consists of trimming, alignment, marking of duplicates, Picard metrics, recalibration of base quality scores and, finally, variant calling. Trimming is performed with Trimmomatic and alignment with BWA (mem). Marking of duplicates (MarkDuplicates), Picard metrics (AlignmentSummaryMetrics, CollectWgsMetrics and InsertSizeMetrics), recalibration of base quality scores (ApplyBQSR) and variant calling (HaplotypeCaller) are done using the GATK4 bundle of bioinformatics tools. The result is a file of called variants (VCF).
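Two of the BWA (mem) options of this workflow are defined by simple arithmetic in their descriptions: re-seeding is triggered for a MEM longer than minSeedLen*FACTOR, and an unpaired read pair is scored as scoreRead1+scoreRead2-Penalty. The sketch below only evaluates these two formulas with the workflow defaults (19, 1.5 and 9); the function names and the read scores are hypothetical, and this is not BWA code.

```python
def reseed_threshold(min_seed_len=19, reseed_factor=1.5):
    """MEM length above which re-seeding is triggered (minSeedLen * FACTOR)."""
    return min_seed_len * reseed_factor


def unpaired_pair_score(score_read1, score_read2, unpaired_penalty=9):
    """Score of a read pair treated as unpaired: scoreRead1 + scoreRead2 - Penalty."""
    return score_read1 + score_read2 - unpaired_penalty


print(reseed_threshold())            # 28.5 with the default settings
print(unpaired_pair_score(100, 90))  # 181 for two made-up read scores
```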
Input arguments
- label:
Raw untrimmed reads
- type:
data:reads:fastq:paired
- description:
Raw paired-end reads.
- label:
Genome index (BWA)
- type:
data:index:bwa
- description:
BWA genome index.
- label:
Reference genome sequence
- type:
data:seq:nucleotide
- label:
Known sites of variation used in BQSR
- type:
list:data:variants:vcf
- description:
Known sites of variation as a VCF file.
- label:
dbSNP for GATK4’s HaplotypeCaller
- type:
data:variants:vcf
- description:
dbSNP database of variants for variant calling.
- label:
Validation stringency
- type:
basic:string
- description:
Validation stringency for all BAM files read by this program. Setting stringency to SILENT can improve performance when processing a BAM file in which variable-length data (read, qualities, tags) do not otherwise need to be decoded. Default is STRICT.
- default:
STRICT
- choices:
STRICT:
STRICT
LENIENT:
LENIENT
SILENT:
SILENT
- label:
Adapter sequences
- type:
data:seq:nucleotide
- description:
Adapter sequence in FASTA format that will be removed from the read. This field as well as ‘Seed mismatches’, ‘Simple clip threshold’ and ‘Palindrome clip threshold’ parameters are needed to perform adapter trimming. ‘Minimum adapter length’ and ‘Keep both reads’ are optional parameters.
- required:
False
- label:
Seed mismatches
- type:
basic:integer
- description:
Specifies the maximum mismatch count which will still allow a full match to be performed. This field is required to perform adapter trimming.
- required:
False
- disabled:
!advanced.trimming.adapters
- label:
Simple clip threshold
- type:
basic:integer
- description:
Specifies how accurate the match between any adapter sequence and a read must be. This field is required to perform adapter trimming.
- required:
False
- disabled:
!advanced.trimming.adapters
- label:
Minimum adapter length
- type:
basic:integer
- description:
In addition to the alignment score, palindrome mode can verify that a minimum length of adapter has been detected. If unspecified, this defaults to 8 bases, for historical reasons. However, since palindrome mode has a very low false positive rate, this can be safely reduced, even down to 1, to allow shorter adapter fragments to be removed.
- disabled:
!advanced.trimming.seed_mismatches && !advanced.trimming.simple_clip_threshold && !advanced.trimming.palindrome_clip_threshold
- default:
8
- label:
Palindrome clip threshold
- type:
basic:integer
- description:
Specifies how accurate the match between the two ‘adapter ligated’ reads must be for PE palindrome read alignment. This field is required to perform adapter trimming.
- required:
False
- disabled:
!advanced.trimming.adapters
- label:
Leading quality
- type:
basic:integer
- description:
Remove low quality bases from the beginning, if below a threshold quality.
- required:
False
- label:
Trailing quality
- type:
basic:integer
- description:
Remove low quality bases from the end, if below a threshold quality.
- required:
False
- label:
Minimum length
- type:
basic:integer
- description:
Drop the read if it is below a specified length.
- required:
False
- label:
Minimum seed length
- type:
basic:integer
- description:
Minimum seed length. Matches shorter than minimum seed length will be missed. The alignment speed is usually insensitive to this value unless it significantly deviates from 20.
- default:
19
- label:
Band width
- type:
basic:integer
- description:
Gaps longer than this will not be found.
- default:
100
- label:
Re-seeding factor
- type:
basic:decimal
- description:
Trigger re-seeding for a MEM longer than minSeedLen*FACTOR. This is a key heuristic parameter for tuning the performance. Larger value yields fewer seeds, which leads to faster alignment speed but lower accuracy.
- default:
1.5
- label:
Mark shorter split hits as secondary
- type:
basic:boolean
- description:
Mark shorter split hits as secondary (for Picard compatibility)
- default:
False
- label:
Score of a match
- type:
basic:integer
- default:
1
- label:
Mismatch penalty
- type:
basic:integer
- default:
4
- label:
Gap open penalty
- type:
basic:integer
- default:
6
- label:
Gap extension penalty
- type:
basic:integer
- default:
1
- label:
Clipping penalty
- type:
basic:integer
- description:
Clipping is applied if final alignment score is smaller than (best score reaching the end of query) - (Clipping penalty)
- default:
5
- label:
Penalty for an unpaired read pair
- type:
basic:integer
- description:
Affinity to force pair. Score: scoreRead1+scoreRead2-Penalty
- default:
9
- label:
Report threshold score
- type:
basic:integer
- description:
Don’t output alignment with score lower than defined number. This option only affects output.
- default:
30
- label:
Skip GATK’s MarkDuplicates step
- type:
basic:boolean
- default:
False
- label:
Remove found duplicates
- type:
basic:boolean
- default:
False
- label:
Assume sort order
- type:
basic:string
- default:
- choices:
as in BAM header (default):
unsorted:
unsorted
queryname:
queryname
coordinate:
coordinate
duplicate:
duplicate
unknown:
unknown
- label:
Read group (@RG)
- type:
basic:string
- description:
This argument enables the user to replace all read groups in the INPUT file with a single new read group and assign all reads to this read group in the OUTPUT BAM file. Addition or replacement is performed using Picard’s AddOrReplaceReadGroups tool. Input should take the form of -name=value delimited by a \t, e.g. “-ID=1\t-PL=Illumina\t-SM=sample_1”. See AddOrReplaceReadGroups documentation for more information on tag names. Note that PL, LB, PU and SM are required fields.
- default:
-LB=NA;-PL=NA;-PU=NA;-SM=sample
- label:
Adapter sequences
- type:
data:seq:nucleotide
- required:
False
- label:
Maximum insert size
- type:
basic:integer
- default:
100000
- label:
Pair orientation
- type:
basic:string
- default:
null
- choices:
Unspecified:
null
FR:
FR
RF:
RF
TANDEM:
TANDEM
- label:
Average read length
- type:
basic:integer
- default:
150
- label:
Minimum mapping quality for a read to contribute coverage
- type:
basic:integer
- default:
20
- label:
Minimum base quality for a base to contribute coverage
- type:
basic:integer
- description:
N bases will be treated as having a base quality of negative infinity and will therefore be excluded from coverage regardless of the value of this parameter.
- default:
20
- label:
Maximum coverage cap
- type:
basic:integer
- description:
Treat positions with coverage exceeding this value as if they had coverage at this set value.
- default:
250
- label:
Ignore positions with coverage above this value
- type:
basic:integer
- description:
At positions with coverage exceeding this value, completely ignore reads that accumulate beyond this value.
- default:
100000
- label:
Sample Size used for Theoretical Het Sensitivity sampling
- type:
basic:integer
- default:
10000
- label:
Minimum fraction of reads in a category to be considered
- type:
basic:decimal
- description:
When generating the histogram, discard any data categories (out of FR, TANDEM, RF) that have fewer than this fraction of overall reads (range: 0 to 0.5).
- default:
0.05
- label:
Include reads marked as duplicates in the insert size histogram
- type:
basic:boolean
- default:
False
- label:
Deviations limit
- type:
basic:decimal
- description:
Generate mean, standard deviation and plots by trimming the data down to MEDIAN + DEVIATIONS * MEDIAN_ABSOLUTE_DEVIATION. This is done because insert size data typically includes enough anomalous values from chimeras and other artifacts to make the mean and standard deviation grossly misleading regarding the real distribution.
- default:
10.0
- label:
Min call confidence threshold
- type:
basic:integer
- description:
The minimum phred-scaled confidence threshold at which variants should be called.
- default:
20
- label:
Min Base Quality
- type:
basic:integer
- description:
Minimum base quality required to consider a base for calling.
- default:
20
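The 'Read group (@RG)' input above takes a string of -name=value fields and requires the PL, LB, PU and SM tags. A minimal sketch of how such a string could be checked before submitting it is shown below. It assumes that fields may be separated by a tab (real or typed as the two characters `\t`, as in the description) or by `;` (as in the default value); the helper names are hypothetical and are not part of the workflow or of Picard.

```python
import re

REQUIRED_TAGS = {"PL", "LB", "PU", "SM"}


def parse_read_group(value):
    """Split a '-name=value' read-group string into a dict of tags."""
    tags = {}
    # Assumption: accept a literal backslash-t, a real tab or ';' as delimiter.
    for field in re.split(r"\\t|\t|;", value):
        field = field.strip()
        if not field:
            continue
        name, sep, tag_value = field.lstrip("-").partition("=")
        if not sep:
            raise ValueError(f"Malformed read-group field: {field!r}")
        tags[name] = tag_value
    return tags


def missing_required_tags(tags):
    """Return the required tags that are absent, sorted by name."""
    return sorted(REQUIRED_TAGS - set(tags))


default = parse_read_group("-LB=NA;-PL=NA;-PU=NA;-SM=sample")
print(default, missing_required_tags(default))

example = parse_read_group("-ID=1\t-PL=Illumina\t-SM=sample_1")
print(example, missing_required_tags(example))
```

With these inputs the default value passes the check, while the example string from the description lacks the LB and PU tags.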
Output results
WGS analysis (GVCF)
- data:workflow:wgs:gvcf:workflow-wgs-gvcf (data:reads:fastq:paired reads, data:alignment:bam aligned_reads, data:seq:nucleotide ref_seq, data:index:bwamem2 bwa_index, list:data:variants:vcf known_sites, basic:boolean enable_trimming, data:seq:nucleotide adapters, basic:integer seed_mismatches, basic:integer simple_clip_threshold, basic:integer min_adapter_length, basic:integer palindrome_clip_threshold, basic:integer leading, basic:integer trailing, basic:integer minlen, data:bed intervals, basic:integer contamination, data:seq:nucleotide adapters, basic:integer max_insert_size, basic:string pair_orientation, basic:integer read_length, basic:integer min_map_quality, basic:integer min_quality, basic:integer coverage_cap, basic:integer accumulation_cap, basic:integer sample_size, basic:decimal minimum_fraction, basic:boolean include_duplicates, basic:decimal deviations)[Source: v2.3.0]
Whole genome sequencing pipeline (GATK GVCF). The pipeline follows GATK best practices recommendations and prepares single-sample paired-end sequencing data for a joint-genotyping step. The pipeline steps include read trimming (Trimmomatic), read alignment (BWA-MEM2), marking of duplicates (Picard MarkDuplicates), recalibration of base quality scores (ApplyBQSR) and calling of variants (GATK HaplotypeCaller in GVCF mode). The QC reports (FASTQC report, Picard AlignmentSummaryMetrics, CollectWgsMetrics and InsertSizeMetrics) are summarized using MultiQC.
Input arguments
- label:
Input sample (FASTQ)
- type:
data:reads:fastq:paired
- description:
Input data in FASTQ format. This input type allows for optional read trimming procedure and is mutually exclusive with the BAM input file type.
- required:
False
- disabled:
aligned_reads
- hidden:
False
- label:
Input sample (BAM)
- type:
data:alignment:bam
- description:
Input data in BAM format. This input file type is mutually exclusive with the FASTQ input file type and does not allow for read trimming procedure.
- required:
False
- disabled:
reads
- hidden:
False
- label:
Reference sequence
- type:
data:seq:nucleotide
- required:
True
- disabled:
False
- hidden:
False
- label:
BWA genome index
- type:
data:index:bwamem2
- required:
True
- disabled:
False
- hidden:
False
- label:
Known sites of variation (VCF)
- type:
list:data:variants:vcf
- required:
True
- disabled:
False
- hidden:
False
- label:
Trim and quality filter input data
- type:
basic:boolean
- description:
Enable or disable adapter trimming and QC filtering procedure.
- required:
True
- disabled:
False
- hidden:
False
- default:
False
- label:
Adapter sequences
- type:
data:seq:nucleotide
- description:
Adapter sequences in FASTA format that will be removed from the reads.
- required:
False
- disabled:
!trimming_options.enable_trimming
- hidden:
False
- label:
Seed mismatches
- type:
basic:integer
- description:
Specifies the maximum mismatch count which will still allow a full match to be performed. This field is required to perform adapter trimming.
- required:
False
- disabled:
!trimming_options.adapters
- hidden:
False
- label:
Simple clip threshold
- type:
basic:integer
- description:
Specifies how accurate the match between any adapter sequence must be against a read. This field is required to perform adapter trimming.
- required:
False
- disabled:
!trimming_options.adapters
- hidden:
False
- label:
Minimum adapter length
- type:
basic:integer
- description:
In addition to the alignment score, palindrome mode can verify that a minimum length of adapter has been detected. If unspecified, this defaults to 8 bases, for historical reasons. However, since palindrome mode has a very low false positive rate, this can be safely reduced, even down to 1, to allow shorter adapter fragments to be removed.
- required:
True
- disabled:
!trimming_options.seed_mismatches && !trimming_options.simple_clip_threshold && !trimming_options.palindrome_clip_threshold
- hidden:
False
- default:
8
- label:
Palindrome clip threshold
- type:
basic:integer
- description:
Specifies how accurate the match between the two adapter ligated reads must be for PE palindrome read alignment. This field is required to perform adapter trimming.
- required:
False
- disabled:
!trimming_options.adapters
- hidden:
False
- label:
Leading quality
- type:
basic:integer
- description:
Remove low quality bases from the beginning, if below a threshold quality.
- required:
False
- disabled:
!trimming_options.enable_trimming
- hidden:
False
- label:
Trailing quality
- type:
basic:integer
- description:
Remove low quality bases from the end, if below a threshold quality.
- required:
False
- disabled:
!trimming_options.enable_trimming
- hidden:
False
- label:
Minimum length
- type:
basic:integer
- description:
Drop the read if it is below a specified length.
- required:
False
- disabled:
!trimming_options.enable_trimming
- hidden:
False
- label:
Intervals BED file
- type:
data:bed
- description:
Use intervals BED file to limit the analysis to the specified parts of the genome.
- required:
False
- disabled:
False
- hidden:
False
- label:
Contamination fraction
- type:
basic:integer
- description:
Fraction of contamination in sequencing data (for all samples) to aggressively remove.
- required:
True
- disabled:
False
- hidden:
False
- default:
0
- label:
Adapter sequences
- type:
data:seq:nucleotide
- required:
False
- disabled:
False
- hidden:
False
- label:
Maximum insert size
- type:
basic:integer
- required:
True
- disabled:
False
- hidden:
False
- default:
100000
- label:
Pair orientation
- type:
basic:string
- required:
True
- disabled:
False
- hidden:
False
- default:
null
- choices:
Unspecified:
null
FR:
FR
RF:
RF
TANDEM:
TANDEM
- label:
Average read length
- type:
basic:integer
- required:
True
- disabled:
False
- hidden:
False
- default:
150
- label:
Minimum mapping quality for a read to contribute coverage
- type:
basic:integer
- required:
True
- disabled:
False
- hidden:
False
- default:
20
- label:
Minimum base quality for a base to contribute coverage
- type:
basic:integer
- description:
N bases will be treated as having a base quality of negative infinity and will therefore be excluded from coverage regardless of the value of this parameter.
- required:
True
- disabled:
False
- hidden:
False
- default:
20
- label:
Maximum coverage cap
- type:
basic:integer
- description:
Treat positions with coverage exceeding this value as if they had coverage at this set value.
- required:
True
- disabled:
False
- hidden:
False
- default:
250
- label:
Ignore positions with coverage above this value
- type:
basic:integer
- description:
At positions with coverage exceeding this value, completely ignore reads that accumulate beyond this value.
- required:
True
- disabled:
False
- hidden:
False
- default:
100000
- label:
Sample size used for Theoretical Het Sensitivity sampling
- type:
basic:integer
- required:
True
- disabled:
False
- hidden:
False
- default:
10000
- label:
Minimum fraction of reads in a category to be considered
- type:
basic:decimal
- description:
When generating the histogram, discard any data categories (out of FR, TANDEM, RF) that have fewer than this fraction of overall reads (range: 0 to 0.5).
- required:
True
- disabled:
False
- hidden:
False
- default:
0.05
- label:
Include reads marked as duplicates in the insert size histogram
- type:
basic:boolean
- required:
True
- disabled:
False
- hidden:
False
- default:
False
- label:
Deviations limit
- type:
basic:decimal
- description:
Generate mean, standard deviation and plots by trimming the data down to MEDIAN + DEVIATIONS * MEDIAN_ABSOLUTE_DEVIATION. This is done because insert size data typically includes enough anomalous values from chimeras and other artifacts to make the mean and standard deviation grossly misleading regarding the real distribution.
- required:
True
- disabled:
False
- hidden:
False
- default:
10.0
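The 'Deviations limit' rule above (trim the data down to MEDIAN + DEVIATIONS * MEDIAN_ABSOLUTE_DEVIATION before computing the mean and standard deviation) can be sketched as follows. This is an illustration of the formula only, not Picard's implementation; the function name and insert sizes are hypothetical, and keeping values equal to the cutoff is an assumption.

```python
from statistics import mean, median, stdev


def trimmed_insert_stats(insert_sizes, deviations=10.0):
    """Mean and standard deviation after dropping extreme insert sizes.

    Values above MEDIAN + DEVIATIONS * MEDIAN_ABSOLUTE_DEVIATION are removed
    before the mean and standard deviation are computed.
    """
    med = median(insert_sizes)
    mad = median(abs(size - med) for size in insert_sizes)
    cutoff = med + deviations * mad
    # Assumption: values exactly at the cutoff are kept.
    kept = [size for size in insert_sizes if size <= cutoff]
    return mean(kept), stdev(kept), cutoff


# Hypothetical insert sizes with one chimeric outlier.
sizes = [290, 300, 310, 305, 295, 50000]
print(mean(sizes), stdev(sizes))    # grossly inflated by the outlier
print(trimmed_insert_stats(sizes))  # outlier removed before summarizing
```

Using the median and the median absolute deviation to set the cutoff matters because both are barely affected by the outliers they are meant to remove.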
Output results
WGS preprocess data with bwa-mem2
- data:alignment:bam:wgsbwa2:wgs-preprocess-bwa2 (data:reads:fastq:paired reads, data:alignment:bam aligned_reads, data:seq:nucleotide ref_seq, data:index:bwamem2 bwa_index, list:data:variants:vcf known_sites, basic:integer pixel_distance, basic:integer n_jobs)[Source: v1.4.0]
Prepare an analysis-ready BAM file. This process follows the GATK best practices procedure. The steps included are read alignment using BWA-MEM2, marking of duplicates (Picard MarkDuplicates), BAM sorting, read-group assignment and base quality score recalibration (BQSR).
Input arguments
- label:
Input sample (FASTQ)
- type:
data:reads:fastq:paired
- required:
False
- disabled:
False
- hidden:
False
- label:
Input sample (BAM)
- type:
data:alignment:bam
- required:
False
- disabled:
False
- hidden:
False
- label:
Reference sequence
- type:
data:seq:nucleotide
- required:
True
- disabled:
False
- hidden:
False
- label:
BWA-MEM2 genome index
- type:
data:index:bwamem2
- required:
True
- disabled:
False
- hidden:
False
- label:
Known sites of variation (VCF)
- type:
list:data:variants:vcf
- required:
True
- disabled:
False
- hidden:
False
- label:
--OPTICAL_DUPLICATE_PIXEL_DISTANCE
- type:
basic:integer
- description:
Set the optical pixel distance, i.e. the distance between clusters. Modify this parameter to ensure compatibility with older Illumina platforms.
- required:
True
- disabled:
False
- hidden:
False
- default:
2500
- label:
Number of concurrent jobs
- type:
basic:integer
- description:
Use a fixed number of jobs for quality score recalibration instead of determining it based on the number of available cores.
- required:
False
- disabled:
False
- hidden:
False
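The behaviour described for 'Number of concurrent jobs' (use the given value, otherwise derive it from the available cores) can be sketched as below. This is a hypothetical helper, not the process's code; using `os.cpu_count()` as the measure of available cores is an assumption.

```python
import os


def resolve_n_jobs(n_jobs=None):
    """Use the requested job count, else fall back to the available cores."""
    if n_jobs is not None:
        if n_jobs < 1:
            raise ValueError("n_jobs must be a positive integer")
        return n_jobs
    # os.cpu_count() may return None, so fall back to a single job.
    return os.cpu_count() or 1


print(resolve_n_jobs(4))  # fixed number of jobs
print(resolve_n_jobs())   # derived from the machine running the code
```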
Output results
- label:
Analysis ready BAM file
- type:
basic:file
- required:
True
- disabled:
False
- hidden:
False
- label:
BAM file index
- type:
basic:file
- required:
True
- disabled:
False
- hidden:
False
- label:
Alignment statistics
- type:
basic:file
- required:
True
- disabled:
False
- hidden:
False
- label:
Species
- type:
basic:string
- required:
True
- disabled:
False
- hidden:
False
- label:
Build
- type:
basic:string
- required:
True
- disabled:
False
- hidden:
False
- label:
Metrics from MarkDuplicate process
- type:
basic:file
- required:
True
- disabled:
False
- hidden:
False
Whole exome sequencing (WES) analysis
- data:workflow:wesworkflow-wes (data:reads:fastq:paired reads, data:index:bwa bwa_index, data:seq:nucleotide ref_seq, list:data:variants:vcf known_sites, data:bed intervals, data:variants:vcf hc_dbsnp, basic:string validation_stringency, data:seq:nucleotide adapters, basic:integer seed_mismatches, basic:integer simple_clip_threshold, basic:integer min_adapter_length, basic:integer palindrome_clip_threshold, basic:integer leading, basic:integer trailing, basic:integer minlen, basic:integer seed_l, basic:integer band_w, basic:boolean m, basic:decimal re_seeding, basic:integer match, basic:integer mismatch, basic:integer gap_o, basic:integer gap_e, basic:integer clipping, basic:integer unpaired_p, basic:integer report_tr, data:bedpe bedpe, basic:boolean skip, basic:boolean md_skip, basic:boolean md_remove_duplicates, basic:string md_assume_sort_order, basic:string read_group, basic:integer stand_call_conf, basic:integer mbq)[Source: v3.1.0]
The whole exome sequencing pipeline analyzes Illumina panel data. It consists of trimming, alignment, soft clipping, (optional) marking of duplicates, recalibration of base quality scores and, finally, variant calling. Trimming is performed with Trimmomatic and alignment with BWA (mem). Soft clipping of Illumina primer sequences is done using the bamclipper tool. Marking of duplicates (MarkDuplicates), recalibration of base quality scores (ApplyBQSR) and variant calling (HaplotypeCaller) are done using the GATK4 bundle of bioinformatics tools. To successfully run this pipeline, you will need a genome (FASTA), paired-end reads (FASTQ), a BEDPE file for bamclipper, known sites of variation (dbSNP) (VCF), a dbSNP database of variations (can be the same as known sites of variation), the intervals on which target capture was done (BED) and Illumina adapter sequences (FASTA). Make sure that the specified resources match the genome used in the alignment step. The result is a file of called variants (VCF).
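The trimming fields of this workflow correspond to Trimmomatic steps, whose command-line syntax is `ILLUMINACLIP:<fasta>:<seed mismatches>:<palindrome clip threshold>:<simple clip threshold>:<min adapter length>`, `LEADING:<quality>`, `TRAILING:<quality>` and `MINLEN:<length>`. The sketch below shows how the field values could be turned into such step strings. How the workflow itself assembles the command is an assumption; the function name, the file name and the threshold values are hypothetical.

```python
def trimmomatic_steps(adapters=None, seed_mismatches=None,
                      palindrome_clip_threshold=None,
                      simple_clip_threshold=None, min_adapter_length=8,
                      leading=None, trailing=None, minlen=None):
    """Build Trimmomatic step strings from the trimming fields."""
    steps = []
    if adapters is not None:
        thresholds = (seed_mismatches, palindrome_clip_threshold,
                      simple_clip_threshold)
        # Adapter trimming needs all three thresholds, as the field
        # descriptions state.
        if any(value is None for value in thresholds):
            raise ValueError(
                "Adapter trimming needs seed mismatches and both clip thresholds."
            )
        steps.append(
            f"ILLUMINACLIP:{adapters}:{seed_mismatches}:"
            f"{palindrome_clip_threshold}:{simple_clip_threshold}:"
            f"{min_adapter_length}"
        )
    # Quality and length steps are optional and independent of adapters.
    for name, value in (("LEADING", leading), ("TRAILING", trailing),
                        ("MINLEN", minlen)):
        if value is not None:
            steps.append(f"{name}:{value}")
    return steps


# Hypothetical adapter file and thresholds.
print(trimmomatic_steps("adapters.fa", 2, 30, 10,
                        leading=3, trailing=3, minlen=36))
```

Raising an error when adapters are given without the thresholds mirrors the note that these fields are needed together to perform adapter clipping.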
Input arguments
- label:
Raw untrimmed reads
- type:
data:reads:fastq:paired
- description:
Raw paired-end reads.
- label:
BWA genome index
- type:
data:index:bwa
- description:
Genome index used for the BWA alignment step.
- label:
Genome FASTA
- type:
data:seq:nucleotide
- description:
The selection of Genome FASTA should match the BWA index species and genome build type.
- label:
Known sites of variation used in BQSR
- type:
list:data:variants:vcf
- description:
Known sites of variation as a VCF file.
- label:
Intervals
- type:
data:bed
- description:
Use intervals to narrow the analysis to defined regions. This usually helps cut down on processing time.
- label:
dbSNP for GATK4’s HaplotypeCaller
- type:
data:variants:vcf
- description:
dbSNP database of variants for variant calling.
- label:
Validation stringency for all SAM files read by this program. Setting stringency to SILENT can improve performance when processing a BAM file in which variable-length data (read, qualities, tags) do not otherwise need to be decoded. Default is STRICT. This setting is used in BaseRecalibrator and ApplyBQSR processes.
- type:
basic:string
- default:
STRICT
- choices:
STRICT:
STRICT
SILENT:
SILENT
LENIENT:
LENIENT
- label:
Adapter sequences
- type:
data:seq:nucleotide
- description:
Adapter sequence in FASTA format that will be removed from the read. This field as well as the ‘Seed mismatches’, ‘Simple clip threshold’ and ‘Palindrome clip threshold’ parameters are needed to perform Illuminaclip. ‘Minimum adapter length’ and ‘Keep both reads’ are optional parameters.
- required:
False
- label:
Seed mismatches
- type:
basic:integer
- description:
Specifies the maximum mismatch count which will still allow a full match to be performed. This field as well as the ‘Adapter sequences’, ‘Simple clip threshold’ and ‘Palindrome clip threshold’ parameters are needed to perform Illuminaclip.
- required:
False
- disabled:
!advanced.trimming.adapters
- label:
Simple clip threshold
- type:
basic:integer
- description:
Specifies how accurate the match between any adapter sequence and a read must be. This field as well as the ‘Adapter sequences’ and ‘Seed mismatches’ parameters are needed to perform Illuminaclip.
- required:
False
- disabled:
!advanced.trimming.adapters
- label:
Minimum adapter length
- type:
basic:integer
- description:
In addition to the alignment score, palindrome mode can verify that a minimum length of adapter has been detected. If unspecified, this defaults to 8 bases, for historical reasons. However, since palindrome mode has a very low false positive rate, this can be safely reduced, even down to 1, to allow shorter adapter fragments to be removed. This field is optional for performing Illuminaclip. ‘Adapter sequences’, ‘Seed mismatches’, ‘Simple clip threshold’ and ‘Palindrome clip threshold’ are also needed in order to use this parameter.
- disabled:
!advanced.trimming.seed_mismatches && !advanced.trimming.simple_clip_threshold && !advanced.trimming.palindrome_clip_threshold
- default:
8
- label:
Palindrome clip threshold
- type:
basic:integer
- description:
Specifies how accurate the match between the two ‘adapter ligated’ reads must be for PE palindrome read alignment. This field as well as the ‘Adapter sequences’, ‘Simple clip threshold’ and ‘Seed mismatches’ parameters are needed to perform Illuminaclip.
- required:
False
- disabled:
!advanced.trimming.adapters
- label:
Leading quality
- type:
basic:integer
- description:
Remove low quality bases from the beginning, if below a threshold quality.
- required:
False
- label:
Trailing quality
- type:
basic:integer
- description:
Remove low quality bases from the end, if below a threshold quality.
- required:
False
- label:
Minimum length
- type:
basic:integer
- description:
Drop the read if it is below a specified length.
- required:
False
- label:
Minimum seed length
- type:
basic:integer
- description:
Minimum seed length. Matches shorter than minimum seed length will be missed. The alignment speed is usually insensitive to this value unless it significantly deviates from 20.
- default:
19
- label:
Band width
- type:
basic:integer
- description:
Gaps longer than this will not be found.
- default:
100
- label:
Mark shorter split hits as secondary
- type:
basic:boolean
- description:
Mark shorter split hits as secondary (for Picard compatibility)
- default:
False
- label:
Re-seeding factor
- type:
basic:decimal
- description:
Trigger re-seeding for a MEM longer than minSeedLen*FACTOR. This is a key heuristic parameter for tuning the performance. Larger value yields fewer seeds, which leads to faster alignment speed but lower accuracy.
- default:
1.5
- label:
Score of a match
- type:
basic:integer
- default:
1
- label:
Mismatch penalty
- type:
basic:integer
- default:
4
- label:
Gap open penalty
- type:
basic:integer
- default:
6
- label:
Gap extension penalty
- type:
basic:integer
- default:
1
- label:
Clipping penalty
- type:
basic:integer
- description:
Clipping is applied if final alignment score is smaller than (best score reaching the end of query) - (Clipping penalty)
- default:
5
- label:
Penalty for an unpaired read pair
- type:
basic:integer
- description:
Affinity to force pair. Score: scoreRead1+scoreRead2-Penalty
- default:
9
- label:
Report threshold score
- type:
basic:integer
- description:
Don’t output alignment with score lower than defined number. This option only affects output.
- default:
30
- label:
BEDPE file used for clipping using Bamclipper
- type:
data:bedpe
- description:
BEDPE file used for clipping using Bamclipper tool.
- required:
False
- label:
Skip Bamclipper step
- type:
basic:boolean
- description:
Use this option to skip Bamclipper step.
- default:
False
- label:
Skip GATK’s MarkDuplicates step
- type:
basic:boolean
- default:
False
- label:
Remove found duplicates
- type:
basic:boolean
- default:
False
- label:
Assume sort order
- type:
basic:string
- default:
- choices:
as in BAM header (default):
unsorted:
unsorted
queryname:
queryname
coordinate:
coordinate
duplicate:
duplicate
unknown:
unknown
- label:
Read group (@RG)
- type:
basic:string
- description:
If BAM file has not been prepared using a @RG tag, you can add it here. This argument enables the user to replace all read groups in the INPUT file with a single new read group and assign all reads to this read group in the OUTPUT BAM file. Addition or replacement is performed using Picard’s AddOrReplaceReadGroups tool. Input should take the form of -name=value delimited by a \t, e.g. “-ID=1\t-PL=Illumina\t-SM=sample_1”. See AddOrReplaceReadGroups documentation for more information on tag names. Note that PL, LB, PU and SM are required fields. See caveats of rewriting read groups in the documentation linked above.
- required:
False
- label:
Min call confidence threshold
- type:
basic:integer
- description:
The minimum phred-scaled confidence threshold at which variants should be called.
- default:
20
- label:
Min Base Quality
- type:
basic:integer
- description:
Minimum base quality required to consider a base for calling.
- default:
20
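The read group format described above (‘-name=value’ pairs separated by \t, with PL, LB, PU and SM required) can be checked before submitting a value. The following is a minimal sketch, not the process's own validation code: the helper names `parse_read_group` and `missing_required_tags` are hypothetical, and accepting both a literal `\t` and a real tab character is an assumption.

```python
import re

# Tags that the description above lists as required.
REQUIRED_TAGS = ("PL", "LB", "PU", "SM")


def parse_read_group(value):
    """Parse a '-name=value' read group string into a dict of tag -> value."""
    # Assumption: accept a real tab character or the literal two characters "\t".
    parts = [p for p in re.split(r"\t|\\t", value) if p]
    tags = {}
    for part in parts:
        if not part.startswith("-") or "=" not in part:
            raise ValueError(f"Expected '-name=value', got {part!r}")
        name, tag_value = part[1:].split("=", 1)
        tags[name] = tag_value
    return tags


def missing_required_tags(tags):
    """Return the required tags that are absent, in a fixed order."""
    return [name for name in REQUIRED_TAGS if name not in tags]


# The example string from the description, written with literal "\t".
example = r"-ID=1\t-PL=Illumina\t-SM=sample_1"
tags = parse_read_group(example)
print(tags)
print(missing_required_tags(tags))
```

Applied to the example string from the description, this sketch reports LB and PU as missing, so a complete value would also carry those two tags.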
Output results
Xengsort classify
- data:xengsort:classification:xengsort-classify (data:reads:fastq reads, data:xengsort:index index, basic:string upload_reads, basic:boolean merge_both, basic:decimal chunksize)[Source: v1.0.0]
Classify xenograft reads with Xengsort. Xengsort is an alignment free method for sorting reads from xenograft experiments. It classifies sequencing reads into five categories based on their origin: host, graft, both, neither, and ambiguous. Categories “host” and “graft” are for reads that can be clearly assigned to one of the species. Category “both” is for reads that match equally well to both references. Category “neither” is for reads that contain many k-mers that cannot be found in the key-value store; these could point to technical problems (primer dimers) or contamination of the sample with other species. Finally, category “ambiguous” is for reads that provide conflicting information. Such reads should not usually be seen; they could result from PCR hybrids between host and graft during library preparation. Description of the method and evaluation on several datasets is provided in the [article](https://doi.org/10.1186/s13015-021-00181-w).
Input arguments
- label:
Reads
- type:
data:reads:fastq
- required:
True
- disabled:
False
- hidden:
False
- label:
Xengsort genome index
- type:
data:xengsort:index
- required:
True
- disabled:
False
- hidden:
False
- label:
Select reads to upload
- type:
basic:string
- description:
All read categories are returned in this process but only the ones selected are uploaded as separate FASTQ files. This should be used for categories of reads that will be used in further analyses.
- required:
True
- disabled:
False
- hidden:
False
- default:
none
- choices:
none:
none
all:
all
graft:
graft
graft, both:
graft, both
graft, host:
graft, host
graft, host, both:
graft, host, both
- label:
Upload merged graft and both reads
- type:
basic:boolean
- description:
Merge graft reads with the reads that can originate from both genomes and upload them as graft reads. In any workflow, reads classified as “both” may pose problems, because it may be impossible to decide on the species of origin due to ultra-conserved regions between species.
- required:
True
- disabled:
False
- hidden:
upload_reads == ‘none’
- default:
False
- label:
Chunk size in MB [--chunksize]
- type:
basic:decimal
- description:
Control the memory usage by setting the chunk size per thread.
- required:
True
- disabled:
False
- hidden:
False
- default:
16.0
Output results
- label:
Xengsort classification statistics
- type:
basic:file
- required:
True
- disabled:
False
- hidden:
False
- label:
Host reads (mate 1)
- type:
basic:file
- required:
True
- disabled:
False
- hidden:
False
- label:
Host reads (mate 2)
- type:
basic:file
- required:
False
- disabled:
False
- hidden:
False
- label:
Graft reads (mate 1)
- type:
basic:file
- required:
True
- disabled:
False
- hidden:
False
- label:
Graft reads (mate 2)
- type:
basic:file
- required:
False
- disabled:
False
- hidden:
False
- label:
Both reads (mate 1)
- type:
basic:file
- required:
True
- disabled:
False
- hidden:
False
- label:
Both reads (mate 2)
- type:
basic:file
- required:
False
- disabled:
False
- hidden:
False
- label:
Neither reads (mate 1)
- type:
basic:file
- required:
True
- disabled:
False
- hidden:
False
- label:
Neither reads (mate 2)
- type:
basic:file
- required:
False
- disabled:
False
- hidden:
False
- label:
Ambiguous reads (mate 1)
- type:
basic:file
- required:
True
- disabled:
False
- hidden:
False
- label:
Ambiguous reads (mate 2)
- type:
basic:file
- required:
False
- disabled:
False
- hidden:
False
- label:
Graft species
- type:
basic:string
- required:
True
- disabled:
False
- hidden:
False
- label:
Graft build
- type:
basic:string
- required:
True
- disabled:
False
- hidden:
False
- label:
Host species
- type:
basic:string
- required:
True
- disabled:
False
- hidden:
False
- label:
Host build
- type:
basic:string
- required:
True
- disabled:
False
- hidden:
False
Xengsort index
- data:xengsort:index:xengsort-index (list:data:seq:nucleotide graft_refs, list:data:seq:nucleotide host_refs, basic:integer n_kmer, basic:integer kmer_size, basic:boolean aligned_cache, basic:boolean fixed_hashing, basic:integer page_size, basic:decimal fill)[Source: v1.0.1]
Build an index for sorting xenograft reads with Xengsort. Xengsort is an alignment free method for sorting reads from xenograft experiments. Description of the method and evaluation on several datasets is provided in the [article](https://doi.org/10.1186/s13015-021-00181-w).
Input arguments
- label:
Graft reference sequences (nucleotide FASTA)
- type:
list:data:seq:nucleotide
- required:
True
- disabled:
False
- hidden:
False
- label:
Host reference sequences (nucleotide FASTA)
- type:
list:data:seq:nucleotide
- required:
True
- disabled:
False
- hidden:
False
- label:
Number of distinct k-mers [--nobjects]
- type:
basic:integer
- description:
The number of k-mers that will be stored in the hash table. This depends on the reference genomes used and should be specified if it is known beforehand. For all 25-mers in the human and mouse genome and transcriptome, this number is roughly 4,500,000,000. If this is not set, the number is estimated with the ntCard tool and increased by two percent to account for errors.
- required:
False
- disabled:
False
- hidden:
False
- label:
k-mer size [--kmersize]
- type:
basic:integer
- required:
True
- disabled:
False
- hidden:
False
- default:
25
- label:
Use power-of-two aligned pages [--aligned]
- type:
basic:boolean
- description:
Indicates whether each bucket should consume a number of bits that is a power of 2. Using --aligned ensures that each bucket stays within the same cache line, but may waste space (padding bits), yielding faster speed but larger space requirements. By default no bits are used for padding and buckets may cross cache line boundaries [--unaligned]. This is slightly slower, but may save a little or a lot of space.
- required:
True
- disabled:
False
- hidden:
False
- default:
False
- label:
Use fixed hash function [--hashfunctions]
- type:
basic:boolean
- description:
The hash functions used to store the key-value pairs are defined by the --hashfunctions parameter. With this option selected, a fixed set of hash functions (linear945:linear9123641:linear349341847) is used. When this is not selected, different random functions are chosen each time. It is recommended to have them chosen randomly unless you need strictly reproducible behavior.
- required:
True
- disabled:
False
- hidden:
False
- default:
True
- label:
Number of elements stored in one bucket (or page) [--pagesize]
- type:
basic:integer
- required:
True
- disabled:
False
- hidden:
False
- default:
4
- label:
Fill rate of the hash table [--fill]
- type:
basic:decimal
- description:
This determines the desired fill rate or load factor of the hash table. It should be set between 0.0 and 1.0. It is beneficial to leave part of the hash table empty for faster lookups. Together with the number of distinct k-mers [--nobjects], the number of slots in the table is calculated as ceil(nobjects/fill).
- required:
True
- disabled:
False
- hidden:
False
- default:
0.88
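The sizing rule given for the fill rate, ceil(nobjects/fill), can be worked through numerically. The sketch below is illustrative only and is not part of Xengsort: the function names `table_slots` and `padded_estimate` are hypothetical, and rounding the two percent margin up to a whole k-mer is an assumption.

```python
import math


def table_slots(nobjects, fill):
    """Number of hash table slots for a given k-mer count and fill rate."""
    if not 0.0 < fill <= 1.0:
        raise ValueError("fill must be greater than 0.0 and at most 1.0")
    return math.ceil(nobjects / fill)


def padded_estimate(estimate):
    """Increase an estimated k-mer count by two percent (1/50).

    Assumption: the margin is rounded up to a whole number of k-mers.
    """
    return estimate + math.ceil(estimate / 50)


# Roughly 4.5 billion distinct 25-mers at the default fill rate of 0.88.
print(table_slots(4_500_000_000, 0.88))

# A hypothetical estimated count, padded before sizing the table.
print(table_slots(padded_estimate(1000), 0.5))
```

A lower fill rate leaves more of the table empty, which trades memory for faster lookups.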
Output results
- label:
Xengsort index
- type:
basic:file
- required:
True
- disabled:
False
- hidden:
False
- label:
Xengsort statistics
- type:
basic:file
- required:
True
- disabled:
False
- hidden:
False
- label:
Graft species
- type:
basic:string
- required:
True
- disabled:
False
- hidden:
False
- label:
Graft build
- type:
basic:string
- required:
True
- disabled:
False
- hidden:
False
- label:
Host species
- type:
basic:string
- required:
True
- disabled:
False
- hidden:
False
- label:
Host build
- type:
basic:string
- required:
True
- disabled:
False
- hidden:
False
alignmentSieve
- data:alignment:bam:sieve:alignmentsieve (data:alignment:bam alignment, basic:integer min_fragment_length, basic:integer max_fragment_length)[Source: v1.5.3]
Filter alignments of BAM files according to specified parameters. The program is bundled with deepTools. See the [documentation](https://deeptools.readthedocs.io/en/develop/content/tools/alignmentSieve.html) for more details.
Input arguments
- label:
Alignment BAM file
- type:
data:alignment:bam
- required:
True
- disabled:
False
- hidden:
False
- label:
--minFragmentLength
- type:
basic:integer
- description:
The minimum fragment length needed for read/pair inclusion. This option is primarily useful in ATAC-seq experiments, for filtering mono- or di-nucleosome fragments. (Default: 0)
- required:
True
- disabled:
False
- hidden:
False
- default:
0
- label:
--maxFragmentLength
- type:
basic:integer
- description:
The maximum fragment length needed for read/pair inclusion. A value of 0 indicates no limit. (Default: 0)
- required:
True
- disabled:
False
- hidden:
False
- default:
0
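The two fragment length bounds above combine into a simple inclusion test, with a maximum of 0 meaning no upper limit. The following is a minimal sketch of that rule, not deepTools code: the function name `keep_fragment` is hypothetical, and treating both bounds as inclusive is an assumption.

```python
def keep_fragment(length, min_fragment_length=0, max_fragment_length=0):
    """Decide whether a fragment of the given length passes the filter."""
    # Assumption: both bounds are inclusive.
    if length < min_fragment_length:
        return False
    # A maximum of 0 indicates no upper limit.
    if max_fragment_length > 0 and length > max_fragment_length:
        return False
    return True


# Hypothetical fragment lengths filtered with an upper bound of 120.
lengths = [45, 98, 120, 180, 310]
kept = [n for n in lengths if keep_fragment(n, max_fragment_length=120)]
print(kept)
```

With both parameters left at their default of 0, every fragment passes.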
Output results
- label:
Sieved BAM file
- type:
basic:file
- required:
True
- disabled:
False
- hidden:
False
- label:
Index of sieved BAM file
- type:
basic:file
- required:
True
- disabled:
False
- hidden:
False
- label:
Alignment statistics
- type:
basic:file
- required:
True
- disabled:
False
- hidden:
False
- label:
Species
- type:
basic:string
- required:
True
- disabled:
False
- hidden:
False
- label:
Build
- type:
basic:string
- required:
True
- disabled:
False
- hidden:
False
edgeR
- data:differentialexpression:edger:differentialexpression-edger (list:data:expression case, list:data:expression control, basic:integer count_filter, basic:boolean create_sets, basic:decimal logfc, basic:decimal fdr)[Source: v1.7.0]
Run edgeR analysis. Empirical Analysis of Digital Gene Expression Data in R (edgeR). Differential expression analysis of RNA-seq expression profiles with biological replication. Implements a range of statistical methodology based on the negative binomial distributions, including empirical Bayes estimation, exact tests, generalized linear models and quasi-likelihood tests. As well as RNA-seq, it can be applied to differential signal analysis of other types of genomic data that produce counts, including ChIP-seq, Bisulfite-seq, SAGE and CAGE. See [here](https://www.bioconductor.org/packages/devel/bioc/vignettes/edgeR/inst/doc/edgeRUsersGuide.pdf) for more information.
Input arguments
- label:
Case
- type:
list:data:expression
- description:
Case samples (replicates)
- required:
True
- disabled:
False
- hidden:
False
- label:
Control
- type:
list:data:expression
- description:
Control samples (replicates)
- required:
True
- disabled:
False
- hidden:
False
- label:
Raw counts filtering threshold
- type:
basic:integer
- description:
Filter genes in the expression matrix input. Remove genes where the number of counts in all samples is below the threshold.
- required:
True
- disabled:
False
- hidden:
False
- default:
10
- label:
Create gene sets
- type:
basic:boolean
- description:
After calculating differential gene expressions create gene sets for up-regulated genes, down-regulated genes and all genes.
- required:
True
- disabled:
False
- hidden:
False
- default:
False
- label:
Log2 fold change threshold for gene sets
- type:
basic:decimal
- description:
Genes above Log2FC are considered as up-regulated and genes below -Log2FC as down-regulated.
- required:
True
- disabled:
False
- hidden:
!create_sets
- default:
1.0
- label:
FDR threshold for gene sets
- type:
basic:decimal
- required:
True
- disabled:
False
- hidden:
!create_sets
- default:
0.05
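The gene set thresholds above can be read as a classification rule: genes above Log2FC are up-regulated and genes below -Log2FC are down-regulated, subject to the FDR threshold. The sketch below illustrates that rule under stated assumptions and is not the process's own code. The function name `build_gene_sets`, the gene names and the values are hypothetical; requiring FDR strictly below the threshold, and taking “all genes” to mean every tested gene, are both assumptions.

```python
def build_gene_sets(results, logfc_threshold=1.0, fdr_threshold=0.05):
    """Split genes into up-regulated, down-regulated and all-gene sets.

    `results` maps a gene name to a (log2 fold change, FDR) pair.
    """
    up, down = [], []
    for gene, (logfc, fdr) in results.items():
        # Assumption: a gene must have FDR strictly below the threshold.
        if fdr >= fdr_threshold:
            continue
        if logfc > logfc_threshold:
            up.append(gene)
        elif logfc < -logfc_threshold:
            down.append(gene)
    # Assumption: the "all genes" set contains every tested gene.
    return {"up": up, "down": down, "all": list(results)}


# Hypothetical differential expression results.
results = {
    "geneA": (2.3, 0.001),   # large positive change, significant
    "geneB": (-1.8, 0.01),   # large negative change, significant
    "geneC": (3.0, 0.4),     # large change, not significant
    "geneD": (0.2, 0.001),   # significant, but small change
}
gene_sets = build_gene_sets(results)
print(gene_sets["up"], gene_sets["down"], len(gene_sets["all"]))
```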
Output results
- label:
Differential expression
- type:
basic:file
- required:
True
- disabled:
False
- hidden:
False
- label:
Results table (JSON)
- type:
basic:json
- required:
True
- disabled:
False
- hidden:
False
- label:
Results table (file)
- type:
basic:file
- required:
True
- disabled:
False
- hidden:
False
- label:
Gene ID database
- type:
basic:string
- required:
True
- disabled:
False
- hidden:
False
- label:
Species
- type:
basic:string
- required:
True
- disabled:
False
- hidden:
False
- label:
Build
- type:
basic:string
- required:
True
- disabled:
False
- hidden:
False
- label:
Feature type
- type:
basic:string
- required:
True
- disabled:
False
- hidden:
False
methcounts
- data:wgbs:methcountsmethcounts (data:seq:nucleotide genome, data:alignment:bam:walt alignment, basic:boolean cpgs, basic:boolean symmetric_cpgs)[Source: v3.3.0]
The methcounts program takes the mapped reads and produces the methylation level at each genomic cytosine, with the option to produce only levels for CpG-context cytosines.
Input arguments
- label:
Reference genome
- type:
data:seq:nucleotide
- label:
Mapped reads
- type:
data:alignment:bam:walt
- description:
WGBS alignment file in Mapped Read (.mr) format.
- label:
Only CpG context sites
- type:
basic:boolean
- description:
Output file will contain methylation data for CpG context sites only. Choosing this option will result in CpG content report only.
- disabled:
symmetric_cpgs
- default:
False
- label:
Merge CpG pairs
- type:
basic:boolean
- description:
Merging CpG pairs results in symmetric methylation levels. Methylation is usually symmetric (cytosines at CpG sites were methylated on both DNA strands). Choosing this option will only keep the CpG sites data.
- disabled:
cpgs
- default:
True
Output results
- label:
Methylation levels
- type:
basic:file
- label:
Statistics
- type:
basic:file
- label:
Methylation levels BigWig file
- type:
basic:file
- label:
Species
- type:
basic:string
- label:
Build
- type:
basic:string
miRNA pipeline
- data:workflow:mirnaworkflow-mirna (data:reads:fastq:single reads, data:seq:nucleotide up_primers_file, data:seq:nucleotide down_primers_file, list:basic:string up_primers_seq, list:basic:string down_primers_seq, basic:integer min_overlap, basic:boolean show_advanced, basic:integer leading, basic:integer trailing, basic:integer minlen, basic:integer maxlen, basic:integer max_n, basic:boolean match_read_wildcards, basic:boolean no_indels, basic:decimal error_rate, data:index:bowtie2 genome, basic:boolean show_alignment_options, basic:string mode, basic:string speed, basic:integer N, basic:integer L, basic:string rep_mode, basic:integer k_reports, data:annotation annotation, basic:string id_attribute, basic:string feature_class, basic:string normalization_type, basic:boolean allow_multi_overlap, basic:boolean count_multi_mapping_reads, basic:string assay_type)[Source: v3.1.0]
Input arguments
- label:
Input miRNA reads.
- type:
data:reads:fastq:single
- label:
5 prime adapter file
- type:
data:seq:nucleotide
- required:
False
- label:
3 prime adapter file
- type:
data:seq:nucleotide
- required:
False
- label:
5 prime adapter sequence
- type:
list:basic:string
- required:
False
- label:
3 prime adapter sequence
- type:
list:basic:string
- required:
False
- label:
Minimal overlap
- type:
basic:integer
- description:
Minimum overlap for an adapter match. Default 5.
- default:
5
- label:
Show advanced preprocessing parameters
- type:
basic:boolean
- default:
False
- label:
Quality on 5 prime
- type:
basic:integer
- description:
Remove low quality bases from 5 prime. Specifies the minimum quality required to keep a base. Default: 28.
- hidden:
!preprocessing.show_advanced
- default:
28
- label:
Quality on 3 prime
- type:
basic:integer
- description:
Remove low quality bases from the 3 prime. Specifies the minimum quality required to keep a base. Default: 28.
- hidden:
!preprocessing.show_advanced
- default:
28
- label:
Min length
- type:
basic:integer
- description:
Drop the read if it is below a specified length. Default: 15.
- hidden:
!preprocessing.show_advanced
- default:
15
- label:
Max length
- type:
basic:integer
- description:
Drop the read if it is above a specified length. Default: 35.
- hidden:
!preprocessing.show_advanced
- default:
35
- label:
Max number of N-s
- type:
basic:integer
- description:
Discard reads having more ‘N’ bases than specified. Default: 1.
- hidden:
!preprocessing.show_advanced
- default:
1
- label:
Match read wildcards
- type:
basic:boolean
- description:
Interpret IUPAC wildcards in reads.
- hidden:
!preprocessing.show_advanced
- default:
True
- label:
No indels
- type:
basic:boolean
- description:
Disable (disallow) insertions and deletions in adapters.
- hidden:
!preprocessing.show_advanced
- default:
True
- label:
Error rate
- type:
basic:decimal
- description:
Maximum allowed error rate (no. of errors divided by the length of the matching region). Default: 0.2.
- hidden:
!preprocessing.show_advanced
- default:
0.2
- label:
Genome reference
- type:
data:index:bowtie2
- description:
Choose the genome reference against which to align reads.
- label:
Show alignment options
- type:
basic:boolean
- default:
False
- label:
Alignment mode
- type:
basic:string
- description:
End to end: Bowtie 2 requires that the entire read align from one end to the other, without any trimming (or “soft clipping”) of characters from either end. Local: Bowtie 2 does not require that the entire read align from one end to the other. Rather, some characters may be omitted (“soft clipped”) from the ends in order to achieve the greatest possible alignment score. Default: --local (with sensitivity set to ‘--very-sensitive’ for both options).
- hidden:
!alignment.show_alignment_options
- default:
--local
- choices:
local:
--local
end to end mode:
--end-to-end
- label:
Sensitivity
- type:
basic:string
- description:
A preset that tunes the alignment for accuracy. In both alignment modes, --very-sensitive is a shortcut for: -D 20 -R 3 -N 0 -L 20 -i S,1,0.50
- hidden:
!alignment.show_alignment_options
- default:
--very-sensitive
- label:
Number of mismatches allowed in seed alignment (N)
- type:
basic:integer
- description:
Sets the number of mismatches allowed in seed. Can be set to 0 or 1. Default: 0
- hidden:
!alignment.show_alignment_options
- default:
0
- label:
Length of seed substrings (L)
- type:
basic:integer
- description:
Sets the length of the seed substrings to align during multiseed alignment. The --very-sensitive preset sets -L to 20 in both --end-to-end and --local mode. For miRNA, a shorter seed length is recommended. Default: -L 8
- hidden:
!alignment.show_alignment_options
- default:
8
- label:
Report mode
- type:
basic:string
- description:
Tool default mode: search for multiple alignments, report the best one; -k mode: search for one or more alignments, report each; -a mode: search for and report all alignments. Default: -k
- hidden:
!alignment.show_alignment_options
- default:
k
- choices:
Tool default mode:
def
-k mode:
k
-a mode (very slow):
a
- label:
Number of reports (for -k mode only)
- type:
basic:integer
- description:
Searches for at most X distinct, valid alignments for each read. The search terminates when it can’t find more distinct valid alignments, or when it finds X, whichever happens first. Default: 5
- hidden:
!alignment.show_alignment_options
- default:
5
- label:
Annotation (GTF/GFF3)
- type:
data:annotation
- label:
ID attribute
- type:
basic:string
- description:
GTF/GFF3 attribute to be used as feature ID. Several GTF/GFF3 lines with the same feature ID will be considered as parts of the same feature. The feature ID is used to identify the counts in the output table. In GTF files this is usually ‘gene_id’, in GFF3 files this is often ‘ID’, and ‘transcript_id’ is frequently a valid choice for both annotation formats. miRNA name refers to the miRBase GFF3 ‘Name’ field and is the default option.
- default:
Name
- choices:
miRNA name:
Name
gene_id:
gene_id
transcript_id:
transcript_id
ID:
ID
geneid:
geneid
- label:
Feature class
- type:
basic:string
- description:
Feature class (3rd column in GFF file) to be used, all features of other types are ignored. Default: miRNA.
- default:
miRNA
- label:
Normalization type
- type:
basic:string
- description:
The default expression normalization type.
- default:
CPM
- label:
Count multi-overlapping reads
- type:
basic:boolean
- description:
Assign reads to all their overlapping features or meta-features.
- default:
True
- label:
Count multi-mapping reads
- type:
basic:boolean
- description:
For a multi-mapping read, all its reported alignments will be counted. The ‘NH’ tag in BAM input is used to detect multi-mapping reads.
- default:
True
- label:
Assay type
- type:
basic:string
- description:
Indicate if strand-specific read counting should be performed. In strand non-specific assay a read is considered overlapping with a feature regardless of whether it is mapped to the same or the opposite strand as the feature. In strand-specific forward assay, the read has to be mapped to the same strand as the feature. In strand-specific reverse assay these rules are reversed.
- choices:
Strand non-specific:
non_specific
Strand-specific forward:
forward
Strand-specific reverse:
reverse
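The advanced preprocessing parameters above (minimum length, maximum length, maximum number of N bases, and adapter error rate) amount to a few simple per-read checks. The following is a minimal sketch of those checks using the defaults listed above. It is not the pipeline's trimming code, and the function names `passes_read_filters` and `within_error_rate` are hypothetical.

```python
def passes_read_filters(seq, minlen=15, maxlen=35, max_n=1):
    """Apply the length and N-count filters to one read sequence."""
    # Drop the read if it is below the minimum or above the maximum length.
    if len(seq) < minlen or len(seq) > maxlen:
        return False
    # Discard reads having more 'N' bases than specified.
    return seq.upper().count("N") <= max_n


def within_error_rate(errors, match_length, error_rate=0.2):
    """Check an adapter match against the maximum allowed error rate.

    The error rate is the number of errors divided by the length of
    the matching region.
    """
    return errors / match_length <= error_rate


# Hypothetical reads: 22 nt, 10 nt (too short), and 22 nt with two N bases.
reads = ["TGAGGTAGTAGGTTGTATAGTT", "TGAGGTAGTA", "TGANGTAGTAGGTTGTNTAGTT"]
print([passes_read_filters(r) for r in reads])
print(within_error_rate(1, 5), within_error_rate(2, 5))
```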
Output results
shRNA quantification
- data:workflow:trimalquantworkflow-trim-align-quant (data:reads:fastq:single reads, list:basic:string up_primers_seq, list:basic:string down_primers_seq, basic:decimal error_rate_5end, basic:decimal error_rate_3end, data:index:bowtie2 genome, basic:string mode, basic:integer N, basic:integer L, basic:integer gbar, basic:string mp, basic:string rdg, basic:string rfg, basic:string score_min, basic:integer readlengths, basic:integer alignscores)[Source: v1.1.0]
Input arguments
- label:
Untrimmed reads.
- type:
data:reads:fastq:single
- description:
First stage of shRNA pipeline. Trims 5’ adapters, then 3’ adapters using the same error rate setting, aligns reads to a reference library and quantifies species.
- label:
5’ adapter sequence
- type:
list:basic:string
- description:
A string of 5’ adapter sequence.
- required:
True
- label:
3’ adapter sequence
- type:
list:basic:string
- description:
A string of 3’ adapter sequence.
- required:
True
- label:
Error rate for 5’
- type:
basic:decimal
- description:
Maximum allowed error rate (no. of errors divided by the length of the matching region) for 5’ trimming.
- required:
False
- default:
0.1
- label:
Error rate for 3’
- type:
basic:decimal
- description:
Maximum allowed error rate (no. of errors divided by the length of the matching region) for 3’ trimming.
- required:
False
- default:
0.1
- label:
Reference library
- type:
data:index:bowtie2
- description:
Choose the reference library against which to align reads.
- label:
Alignment mode
- type:
basic:string
- description:
End to end: Bowtie 2 requires that the entire read align from one end to the other, without any trimming (or “soft clipping”) of characters from either end. local: Bowtie 2 does not require that the entire read align from one end to the other. Rather, some characters may be omitted (“soft clipped”) from the ends in order to achieve the greatest possible alignment score.
- default:
--end-to-end
- choices:
end to end mode:
--end-to-end
local:
--local
- label:
Number of mismatches allowed in seed alignment (N)
- type:
basic:integer
- description:
Sets the number of mismatches allowed in a seed alignment during multiseed alignment. Can be set to 0 or 1. Setting this higher makes alignment slower (often much slower) but increases sensitivity. Default: 0.
- required:
False
- label:
Length of seed substrings (L)
- type:
basic:integer
- description:
Sets the length of the seed substrings to align during multiseed alignment. Smaller values make alignment slower but more sensitive. Default: the --sensitive preset is used by default for end-to-end alignment and --sensitive-local for local alignment. See documentation for details.
- required:
False
- label:
Disallow gaps within positions (gbar)
- type:
basic:integer
- description:
Disallow gaps within <int> positions of the beginning or end of the read. Default: 4.
- required:
False
- label:
Maximal and minimal mismatch penalty (mp)
- type:
basic:string
- description:
Sets the maximum (MX) and minimum (MN) mismatch penalties, both integers. A number less than or equal to MX and greater than or equal to MN is subtracted from the alignment score for each position where a read character aligns to a reference character, the characters do not match, and neither is an N. If --ignore-quals is specified, the number subtracted equals MX. Otherwise, the number subtracted is MN + floor((MX-MN) * (MIN(Q, 40.0)/40.0)) where Q is the Phred quality value. Default for MX, MN: 6,2.
- required:
False
- label:
Set read gap open and extend penalties (rdg)
- type:
basic:string
- description:
Sets the read gap open (<int1>) and extend (<int2>) penalties. A read gap of length N gets a penalty of <int1> + N * <int2>. Default: 5,3.
- required:
False
- label:
Set reference gap open and extend penalties (rfg)
- type:
basic:string
- description:
Sets the reference gap open (<int1>) and extend (<int2>) penalties. A reference gap of length N gets a penalty of <int1> + N * <int2>. Default: 5,3.
- required:
False
- label:
Minimum alignment score needed for “valid” alignment (score-min)
- type:
basic:string
- description:
Sets a function governing the minimum alignment score needed for an alignment to be considered “valid” (i.e. good enough to report). This is a function of read length. For instance, specifying L,0,-0.6 sets the minimum-score function to f(x) = 0 + -0.6 * x, where x is the read length. The default in --end-to-end mode is L,-0.6,-0.6 and the default in --local mode is G,20,8.
- required:
False
- label:
Species lengths threshold
- type:
basic:integer
- description:
Species with read lengths below specified threshold will be removed from final output. Default is no removal.
- label:
Align scores filter threshold
- type:
basic:integer
- description:
Species with align score below specified threshold will be removed from final output. Default is no removal.
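The scoring formulas quoted in the mp, rdg/rfg and score-min descriptions above can be evaluated directly. The sketch below is an illustrative reading of those formulas, not Bowtie 2 code, and the function names are hypothetical. The L (linear) case comes from the description above. The G case, f(x) = B + A * ln(x), follows the function types in the Bowtie 2 manual rather than this document.

```python
import math


def mismatch_penalty(q, mx=6, mn=2, ignore_quals=False):
    """Penalty subtracted for one mismatch at Phred quality q."""
    if ignore_quals:
        return mx
    return mn + math.floor((mx - mn) * (min(q, 40.0) / 40.0))


def gap_penalty(length, open_penalty=5, extend_penalty=3):
    """Penalty for a gap of the given length: <int1> + N * <int2>."""
    return open_penalty + length * extend_penalty


def min_score(spec, read_length):
    """Evaluate a score-min function such as 'L,0,-0.6' for a read length."""
    kind, constant, coefficient = spec.split(",")
    constant, coefficient = float(constant), float(coefficient)
    if kind == "L":  # linear: f(x) = constant + coefficient * x
        return constant + coefficient * read_length
    if kind == "G":  # natural log: f(x) = constant + coefficient * ln(x)
        return constant + coefficient * math.log(read_length)
    raise ValueError(f"Function type not handled in this sketch: {kind!r}")


print(mismatch_penalty(40), mismatch_penalty(20), mismatch_penalty(0))
print(gap_penalty(2))
print(min_score("L,0,-0.6", 50))
```

Under the default mp of 6,2, a mismatch at a high-quality base costs more than one at a low-quality base, which is why the penalty is tied to Q.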
Output results
snpEff (General variant annotation) (multi-sample)
- data:variants:vcf:snpeff:snpeff (data:variants:vcf variants, basic:string database, data:variants:vcf dbsnp, basic:string filtering_options, list:data:geneset sets, list:basic:string extract_fields, basic:boolean one_per_line)[Source: v1.1.1]
Annotate variants with SnpEff. SnpEff is a variant annotation and effect prediction tool. It annotates and predicts the effects of genetic variants (such as amino acid changes). This process also allows filtering of variants with ``SnpSift filter`` command and extracting specific fields from the VCF file with ``SnpSift extractFields`` command. This tool works with multi-sample VCF file as an input.
Input arguments
- label:
Variants (VCF)
- type:
data:variants:vcf
- required:
True
- disabled:
False
- hidden:
False
- label:
snpEff database
- type:
basic:string
- required:
True
- disabled:
False
- hidden:
False
- default:
GRCh38.99
- choices:
GRCh37.75:
GRCh37.75
GRCh38.99:
GRCh38.99
- label:
Known variants
- type:
data:variants:vcf
- description:
List of known variants for annotation.
- required:
False
- disabled:
False
- hidden:
False
- label:
Filtering expressions
- type:
basic:string
- description:
Filter the VCF file using arbitrary expressions. Examples of filtering expressions: ‘(ANN[*].GENE = ‘PSD3’)’ or ‘( REF = ‘A’ )’ or ‘(countHom() > 3) | (( exists INDEL ) & (QUAL >= 20)) | (QUAL >= 30 )’. For more information, check out the official documentation of [SnpSift](https://pcingola.github.io/SnpEff/ss_filter/).
- required:
False
- disabled:
False
- hidden:
False
- label:
Files with list of genes
- type:
list:data:geneset
- description:
Use list of genes, if you only want variants reported for them. Each file must have one string per line.
- required:
False
- disabled:
False
- hidden:
!filtering_options
- label:
Fields to extract
- type:
list:basic:string
- description:
Write the fields you want to extract from the annotated VCF file and press Enter after each one. Example of fields: `CHROM POS REF ALT ‘ANN[*].GENE’`. For more information follow this [link](https://pcingola.github.io/SnpEff/ss_extractfields/).
- required:
False
- disabled:
False
- hidden:
False
- label:
One effect per line
- type:
basic:boolean
- description:
If there is more than one effect per variant, write them to separate lines.
- required:
True
- disabled:
False
- hidden:
False
- default:
False
Output results
- label:
Annotated variants (VCF)
- type:
basic:file
- required:
True
- disabled:
False
- hidden:
False
- label:
Index of annotated variants
- type:
basic:file
- required:
True
- disabled:
False
- hidden:
False
- label:
Extracted annotated variants (VCF)
- type:
basic:file
- required:
False
- disabled:
False
- hidden:
False
- label:
Index of extracted variants
- type:
basic:file
- required:
False
- disabled:
False
- hidden:
False
- label:
Species
- type:
basic:string
- required:
True
- disabled:
False
- hidden:
False
- label:
Build
- type:
basic:string
- required:
True
- disabled:
False
- hidden:
False
- label:
SnpEff genes
- type:
basic:file
- required:
True
- disabled:
False
- hidden:
False
- label:
Summary
- type:
basic:file:html
- required:
True
- disabled:
False
- hidden:
False
snpEff (General variant annotation) (single-sample)
- data:variants:vcf:snpeff:single:snpeff-single (data:variants:vcf variants, basic:string database, data:variants:vcf dbsnp, basic:string filtering_options, list:data:geneset sets, list:basic:string extract_fields, basic:boolean one_per_line)[Source: v1.0.1]
Annotate variants with SnpEff. SnpEff is a variant annotation and effect prediction tool. It annotates and predicts the effects of genetic variants (such as amino acid changes). This process also allows filtering of variants with ``SnpSift filter`` command and extracting specific fields from the VCF file with ``SnpSift extractFields`` command. This tool works with single-sample VCF file as an input.
Input arguments
- label:
Variants (VCF)
- type:
data:variants:vcf
- required:
True
- disabled:
False
- hidden:
False
- label:
snpEff database
- type:
basic:string
- required:
True
- disabled:
False
- hidden:
False
- default:
GRCh38.99
- choices:
GRCh37.75:
GRCh37.75
GRCh38.99:
GRCh38.99
- label:
Known variants
- type:
data:variants:vcf
- description:
List of known variants for annotation.
- required:
False
- disabled:
False
- hidden:
False
- label:
Filtering expressions
- type:
basic:string
- description:
Filter the VCF file using arbitrary expressions. Examples of filtering expressions: `(ANN[*].GENE = 'PSD3')` or `( REF = 'A' )` or `(countHom() > 3) | (( exists INDEL ) & (QUAL >= 20)) | (QUAL >= 30 )`. For more information, check out the official [SnpSift filter documentation](https://pcingola.github.io/SnpEff/ss_filter/).
- required:
False
- disabled:
False
- hidden:
False
- label:
Files with list of genes
- type:
list:data:geneset
- description:
Provide lists of genes if you want variants reported only for those genes. Each file must have one string per line.
- required:
False
- disabled:
False
- hidden:
!filtering_options
- label:
Fields to extract
- type:
list:basic:string
- description:
Enter the fields you want to extract from the annotated VCF file, pressing Enter after each one. Example fields: `CHROM POS REF ALT 'ANN[*].GENE'`. For more information, see the [SnpSift extractFields documentation](https://pcingola.github.io/SnpEff/ss_extractfields/).
- required:
False
- disabled:
False
- hidden:
False
- label:
One effect per line
- type:
basic:boolean
- description:
If there is more than one effect per variant, write them to separate lines.
- required:
True
- disabled:
False
- hidden:
False
- default:
False
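To illustrate how the filtering expression, gene-list files, and extracted fields described above could map onto SnpSift command lines, here is a minimal sketch. It is not the implementation used by this process. It assumes a `SnpSift` wrapper executable is available on the PATH (some installations instead use `java -jar SnpSift.jar`), and the helper names `snpsift_filter_cmd` and `snpsift_extract_cmd` are hypothetical. The sketch only builds argument lists and does not run anything.

```python
from typing import List, Sequence


def snpsift_filter_cmd(
    vcf: str, expression: str, gene_sets: Sequence[str] = ()
) -> List[str]:
    """Build an argv list for `SnpSift filter` (hypothetical helper).

    Each gene-list file is passed with `-s`; SnpSift exposes such files
    to the expression as SET[0], SET[1], ... in the order given.
    """
    if not expression.strip():
        raise ValueError("a filtering expression is required")
    cmd = ["SnpSift", "filter"]  # assumed wrapper name on PATH
    for path in gene_sets:
        cmd += ["-s", path]
    cmd += [expression, vcf]
    return cmd


def snpsift_extract_cmd(vcf: str, fields: Sequence[str]) -> List[str]:
    """Build an argv list for `SnpSift extractFields` (hypothetical helper)."""
    if not fields:
        raise ValueError("at least one field is required")
    # Fields are passed as separate arguments, so no shell quoting is
    # needed around names such as ANN[*].GENE.
    return ["SnpSift", "extractFields", vcf, *fields]


if __name__ == "__main__":
    print(snpsift_filter_cmd("annotated.vcf", "(QUAL >= 30)"))
    print(snpsift_extract_cmd(
        "annotated.vcf", ["CHROM", "POS", "REF", "ALT", "ANN[*].GENE"]
    ))
```

Passing the resulting list to `subprocess.run` without `shell=True` avoids having to escape the brackets, asterisks, and quotes that appear in SnpSift expressions and field names.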
Output results
- label:
Annotated variants (VCF)
- type:
basic:file
- required:
True
- disabled:
False
- hidden:
False
- label:
Index of annotated variants
- type:
basic:file
- required:
True
- disabled:
False
- hidden:
False
- label:
Extracted annotated variants (VCF)
- type:
basic:file
- required:
False
- disabled:
False
- hidden:
False
- label:
Index of extracted variants
- type:
basic:file
- required:
False
- disabled:
False
- hidden:
False
- label:
Species
- type:
basic:string
- required:
True
- disabled:
False
- hidden:
False
- label:
Build
- type:
basic:string
- required:
True
- disabled:
False
- hidden:
False
- label:
SnpEff genes
- type:
basic:file
- required:
True
- disabled:
False
- hidden:
False
- label:
Summary
- type:
basic:file:html
- required:
True
- disabled:
False
- hidden:
False