Process definitions¶
ATAC-Seq¶
-
data:workflow:atacseq
workflow-atac-seq
(data:reads:fastq reads, data:index:bowtie2 genome, data:bed promoter, basic:string mode, basic:string speed, basic:boolean use_se, basic:boolean discordantly, basic:boolean rep_se, basic:integer minins, basic:integer maxins, basic:integer trim_5, basic:integer trim_3, basic:integer trim_iter, basic:integer trim_nucl, basic:string rep_mode, basic:integer k_reports, basic:integer q_threshold, basic:integer n_sub, basic:boolean tn5, basic:integer shift, basic:boolean tagalign, basic:string duplicates, basic:string duplicates_prepeak, basic:decimal qvalue, basic:decimal pvalue, basic:decimal pvalue_prepeak, basic:integer cap_num, basic:integer mfold_lower, basic:integer mfold_upper, basic:integer slocal, basic:integer llocal, basic:integer extsize, basic:integer shift, basic:integer band_width, basic:boolean nolambda, basic:boolean fix_bimodal, basic:boolean nomodel, basic:boolean nomodel_prepeak, basic:boolean down_sample, basic:boolean bedgraph, basic:boolean spmr, basic:boolean call_summits, basic:boolean broad, basic:decimal broad_cutoff)[Source: v3.0.1]
This ATAC-seq pipeline closely follows the official ENCODE DCC pipeline. It comprises three steps: alignment, pre-peakcall QC, and peak calling (with post-peakcall QC). First, reads are aligned to a genome using the [Bowtie2](http://bowtie-bio.sourceforge.net/index.shtml) aligner. Next, pre-peakcall QC metrics are calculated. The QC report contains the QC metrics proposed by ENCODE 3: [NRF](https://www.encodeproject.org/data-standards/terms/), [PBC bottlenecking coefficients, NSC, and RSC](https://genome.ucsc.edu/ENCODE/qualityMetrics.html#chipSeq). Finally, peaks are called using [MACS2](https://github.com/taoliu/MACS/). The post-peakcall QC report includes additional QC metrics: number of peaks, fraction of reads in peaks (FRiP), number of reads in peaks, and, if a promoter regions BED file is provided, number of reads in promoter regions, fraction of reads in promoter regions, number of peaks in promoter regions, and fraction of peaks in promoter regions.
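As an illustration of the FRiP metric mentioned above, the following is a minimal sketch, not the pipeline's actual implementation. It assumes reads are reduced to a single 0-based position on one chromosome and that peaks are non-overlapping half-open intervals; the function name `fraction_of_reads_in_peaks` is hypothetical.

```python
from bisect import bisect_right


def fraction_of_reads_in_peaks(read_positions, peaks):
    """Return the share of reads whose position falls inside any peak.

    read_positions: iterable of 0-based positions on one chromosome
                    (a simplification; real reads are intervals).
    peaks: list of non-overlapping (start, end) half-open intervals.
    """
    peaks = sorted(peaks)
    starts = [start for start, _ in peaks]
    total = 0
    in_peaks = 0
    for pos in read_positions:
        total += 1
        # Index of the last peak that starts at or before this position.
        idx = bisect_right(starts, pos) - 1
        if idx >= 0 and pos < peaks[idx][1]:
            in_peaks += 1
    return in_peaks / total if total else 0.0


reads = [5, 120, 130, 480, 900]
peaks = [(100, 200), (450, 500)]
frip = fraction_of_reads_in_peaks(reads, peaks)  # 3 of 5 reads lie in peaks
```

The same counting approach could be applied to promoter regions by passing the promoter intervals instead of the peak intervals.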
Input arguments
- label
Select sample(s)
- type
data:reads:fastq
- label
Genome
- type
data:index:bowtie2
- label
Promoter regions BED file
- type
data:bed
- description
BED file containing promoter regions (for example, TSS ± 1000 bp). Needed to get the number of peaks and reads mapped to promoter regions.
- required
False
- label
Alignment mode
- type
basic:string
- description
End to end: Bowtie 2 requires that the entire read align from one end to the other, without any trimming (or “soft clipping”) of characters from either end. Local: Bowtie 2 does not require that the entire read align from one end to the other. Rather, some characters may be omitted (“soft clipped”) from the ends in order to achieve the greatest possible alignment score.
- default
--local
- choices
end to end mode:
--end-to-end
local:
--local
- label
Speed vs. Sensitivity
- type
basic:string
- default
--sensitive
- choices
Very fast:
--very-fast
Fast:
--fast
Sensitive:
--sensitive
Very sensitive:
--very-sensitive
- label
Map as single-ended (for paired-end reads only)
- type
basic:boolean
- description
If this option is selected, paired-end reads will be mapped as single-ended and other paired-end options will be ignored.
- default
False
- label
Report discordantly matched read
- type
basic:boolean
- description
If both mates have unique alignments, but the alignments do not match paired-end expectations (orientation and relative distance), the alignment will be reported. Useful for detecting structural variations.
- default
True
- label
Report single ended
- type
basic:boolean
- description
If a paired alignment cannot be found, Bowtie2 tries to find alignments for the individual mates.
- default
True
- label
Minimal distance
- type
basic:integer
- description
The minimum fragment length for valid paired-end alignments. 0 imposes no minimum.
- default
0
- label
Maximal distance
- type
basic:integer
- description
The maximum fragment length for valid paired-end alignments.
- default
2000
- label
Bases to trim from 5’
- type
basic:integer
- description
Number of bases to trim from the 5’ (left) end of each read before alignment.
- default
0
- label
Bases to trim from 3’
- type
basic:integer
- description
Number of bases to trim from the 3’ (right) end of each read before alignment.
- default
0
- label
Iterations
- type
basic:integer
- description
Number of iterations.
- default
0
- label
Bases to trim
- type
basic:integer
- description
Number of bases to trim from 3’ end in each iteration.
- default
2
- label
Report mode
- type
basic:string
- description
Default mode: search for multiple alignments, report the best one; -k mode: search for one or more alignments, report each; -a mode: search for and report all alignments
- default
def
- choices
Default mode:
def
-k mode:
k
-a mode (very slow):
a
- label
Number of reports (for -k mode only)
- type
basic:integer
- description
Searches for at most X distinct, valid alignments for each read. The search terminates when it can’t find more distinct valid alignments, or when it finds X, whichever happens first.
- default
5
- label
Quality filtering threshold
- type
basic:integer
- default
30
- label
Number of reads to subsample
- type
basic:integer
- default
25000000
- label
Tn5 shifting
- type
basic:boolean
- description
Tn5 transposon shifting. Shift reads on “+” strand by 4 bp and reads on “-” strand by 5 bp.
- default
True
- label
User-defined cross-correlation peak strandshift
- type
basic:integer
- description
If defined, the SPP tool will not try to estimate the fragment length but will use the given value as the fragment length.
- default
0
- label
Use tagAlign files
- type
basic:boolean
- description
Use filtered tagAlign files as case (treatment) and control (background) samples. If extsize parameter is not set, run MACS using input’s estimated fragment length.
- default
True
- label
Number of duplicates
- type
basic:string
- description
Controls the MACS behavior towards duplicate tags at the exact same location (the same coordinates and the same strand). The ‘auto’ option makes MACS calculate the maximum number of tags at the exact same location based on a binomial distribution using 1e-5 as the p-value cutoff, and the ‘all’ option keeps all the tags. If an integer is given, at most this number of tags will be kept at the same location. The MACS default is to keep one tag at the same location.
- required
False
- hidden
settings.tagalign
- choices
1:
1
auto:
auto
all:
all
- label
Number of duplicates
- type
basic:string
- description
Controls the MACS behavior towards duplicate tags at the exact same location (the same coordinates and the same strand). The ‘auto’ option makes MACS calculate the maximum number of tags at the exact same location based on a binomial distribution using 1e-5 as the p-value cutoff, and the ‘all’ option keeps all the tags. If an integer is given, at most this number of tags will be kept at the same location. The MACS default is to keep one tag at the same location.
- required
False
- hidden
!settings.tagalign
- default
all
- choices
1:
1
auto:
auto
all:
all
- label
Q-value cutoff
- type
basic:decimal
- description
The q-value (minimum FDR) cutoff to call significant regions. Q-values are calculated from p-values using the Benjamini-Hochberg procedure.
- required
False
- disabled
settings.pvalue && settings.pvalue_prepeak
- label
P-value cutoff
- type
basic:decimal
- description
The p-value cutoff. If specified, MACS2 will use p-value instead of q-value cutoff.
- required
False
- disabled
settings.qvalue
- hidden
settings.tagalign
- label
P-value cutoff
- type
basic:decimal
- description
The p-value cutoff. If specified, MACS2 will use p-value instead of q-value cutoff.
- disabled
settings.qvalue
- hidden
!settings.tagalign || settings.qvalue
- default
0.01
- label
Cap number of peaks by taking top N peaks
- type
basic:integer
- description
To keep all peaks set value to 0.
- disabled
settings.broad
- default
300000
- label
MFOLD range (lower limit)
- type
basic:integer
- description
This parameter is used to select the regions within the MFOLD range of high-confidence enrichment ratio against background to build the model. The fold enrichment of the regions must be lower than the upper limit and higher than the lower limit. DEFAULT:10,30 means using all regions not too low (>10) and not too high (<30) to build the paired-peaks model. If MACS cannot find more than 100 regions to build the model, it will use the --extsize parameter to continue the peak detection ONLY if --fix-bimodal is set.
- required
False
- label
MFOLD range (upper limit)
- type
basic:integer
- description
This parameter is used to select the regions within the MFOLD range of high-confidence enrichment ratio against background to build the model. The fold enrichment of the regions must be lower than the upper limit and higher than the lower limit. DEFAULT:10,30 means using all regions not too low (>10) and not too high (<30) to build the paired-peaks model. If MACS cannot find more than 100 regions to build the model, it will use the --extsize parameter to continue the peak detection ONLY if --fix-bimodal is set.
- required
False
- label
Small local region
- type
basic:integer
- description
The slocal and llocal parameters control which two levels of regions will be checked around the peak regions to calculate the maximum lambda as local lambda. By default, MACS considers 1000 bp for the small local region (--slocal), and 10000 bp for the large local region (--llocal), which captures the bias from a long-range effect like an open chromatin domain. You can tweak these according to your project. Remember that if the region is set too small, a sharp spike in the input data may kill the significant peak.
- required
False
- label
Large local region
- type
basic:integer
- description
The slocal and llocal parameters control which two levels of regions will be checked around the peak regions to calculate the maximum lambda as local lambda. By default, MACS considers 1000 bp for the small local region (--slocal), and 10000 bp for the large local region (--llocal), which captures the bias from a long-range effect like an open chromatin domain. You can tweak these according to your project. Remember that if the region is set too small, a sharp spike in the input data may kill the significant peak.
- required
False
- label
extsize
- type
basic:integer
- description
While --nomodel is set, MACS uses this parameter to extend reads in the 5’->3’ direction to fix-sized fragments. For example, if the size of the binding region for your transcription factor is 200 bp, and you want to bypass the model building by MACS, this parameter can be set to 200. This option is only valid when --nomodel is set or when MACS fails to build the model and --fix-bimodal is on.
- default
150
- label
Shift
- type
basic:integer
- description
Note, this is NOT the legacy --shiftsize option, which is replaced by --extsize! You can set an arbitrary shift in bp here. Please use discretion when setting it to a value other than the MACS default (0). When --nomodel is set, MACS will use this value to move cutting ends (5’) and then apply --extsize from the 5’ to 3’ direction to extend them to fragments. When this value is negative, ends will be moved in the 3’->5’ direction, otherwise in the 5’->3’ direction. It is recommended to keep it at 0 for ChIP-Seq datasets, or to use -1 * half of EXTSIZE together with the --extsize option for detecting enriched cutting loci such as certain DNaseI-Seq datasets. Note, you can’t set values other than 0 if the format is BAMPE for paired-end data. The MACS default is 0.
- default
-75
- label
Band width
- type
basic:integer
- description
The band width which is used to scan the genome ONLY for model building. You can set this parameter as the sonication fragment size expected from wet experiment. The previous side effect on the peak detection process has been removed. So this parameter only affects the model building.
- required
False
- label
Use background lambda as local lambda
- type
basic:boolean
- description
With this flag on, MACS will use the background lambda as local lambda. This means MACS will not consider the local bias at peak candidate regions.
- default
False
- label
Turn on the auto paired-peak model process
- type
basic:boolean
- description
Turn on the auto paired-peak model process. If set, when MACS fails to build the paired model, it will use the nomodel settings and the --extsize parameter to extend each tag. If not set, MACS will be terminated if the paired-peak model fails.
- default
False
- label
Bypass building the shifting model
- type
basic:boolean
- description
While on, MACS will bypass building the shifting model.
- hidden
settings.tagalign
- default
False
- label
Bypass building the shifting model
- type
basic:boolean
- description
While on, MACS will bypass building the shifting model.
- hidden
!settings.tagalign
- default
True
- label
Down-sample
- type
basic:boolean
- description
When set to true, a random sampling method will scale down the bigger sample. By default, MACS uses linear scaling. This option makes the results unstable and irreproducible, since random reads are selected each time; in particular, the numbers (pileup, p-value, q-value) will change.
- default
False
- label
Save fragment pileup and control lambda
- type
basic:boolean
- description
If this flag is on, MACS will store the fragment pileup, control lambda, -log10pvalue and -log10qvalue scores in bedGraph files. The bedGraph files will be stored in current directory named NAME+’_treat_pileup.bdg’ for treatment data, NAME+’_control_lambda.bdg’ for local lambda values from control, NAME+’_treat_pvalue.bdg’ for Poisson pvalue scores (in -log10(pvalue) form), and NAME+’_treat_qvalue.bdg’ for q-value scores from Benjamini-Hochberg-Yekutieli procedure.
- default
True
- label
Save signal per million reads for fragment pileup profiles
- type
basic:boolean
- disabled
settings.bedgraph === false
- default
True
- label
Call summits
- type
basic:boolean
- description
MACS will now reanalyze the shape of signal profile (p or q-score depending on cutoff setting) to deconvolve subpeaks within each peak called from general procedure. It’s highly recommended to detect adjacent binding events. While used, the output subpeaks of a big peak region will have the same peak boundaries, and different scores and peak summit positions.
- default
True
- label
Composite broad regions
- type
basic:boolean
- description
When this flag is on, MACS will try to composite broad regions in BED12 (a gene-model-like format) by putting nearby highly enriched regions into a broad region with a loose cutoff. The broad region is controlled by another cutoff through --broad-cutoff. The maximum length of a broad region is 4 times d (the fragment size estimated by MACS).
- disabled
settings.call_summits === true
- default
False
- label
Broad cutoff
- type
basic:decimal
- description
Cutoff for broad region. This option is not available unless --broad is set. If -p is set, this is a p-value cutoff, otherwise, it’s a q-value cutoff. DEFAULT = 0.1
- required
False
- disabled
settings.call_summits === true || settings.broad !== true
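The Tn5 shifting option described above can be illustrated with a short sketch. It assumes the commonly used convention of moving the 5’ end of each read: reads on the “+” strand have their start moved by +4 bp, and reads on the “-” strand have their end moved by -5 bp. The direction of the “-” strand shift and the function name `tn5_shift` are assumptions, not taken from the process source.

```python
def tn5_shift(start, end, strand, plus_shift=4, minus_shift=5):
    """Shift the 5' end of a read interval to the assumed Tn5 cut site.

    start, end: 0-based half-open coordinates of the aligned read.
    strand: "+" or "-".
    """
    if strand == "+":
        # The 5' end of a forward read is its start coordinate.
        return start + plus_shift, end
    if strand == "-":
        # The 5' end of a reverse read is its end coordinate.
        return start, end - minus_shift
    raise ValueError(f"unknown strand: {strand!r}")


forward = tn5_shift(100, 150, "+")
reverse = tn5_shift(100, 150, "-")
```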
Output results
Abstract alignment process¶
-
data:alignment
abstract-alignment
()[Source: v1.0.0]
Input arguments
Output results
- label
Alignment file
- type
basic:file
- label
Alignment index BAI
- type
basic:file
- label
Species
- type
basic:string
- label
Build
- type
basic:string
Abstract annotation process¶
-
data:annotation
abstract-annotation
()[Source: v1.0.0]
Input arguments
Output results
- label
Uploaded file
- type
basic:file
- label
Gene ID source
- type
basic:string
- label
Species
- type
basic:string
- label
Build
- type
basic:string
Abstract bed process¶
-
data:bed
abstract-bed
()[Source: v1.0.1]
Input arguments
Output results
- label
BED
- type
basic:file
- label
Species
- type
basic:string
- label
Build
- type
basic:string
Abstract differential expression process¶
-
data:differentialexpression
abstract-differentialexpression
()[Source: v1.0.0]
Input arguments
Output results
- label
Differential expression (gene level)
- type
basic:file
- label
Results table (JSON)
- type
basic:json
- label
Results table (file)
- type
basic:file
- label
Gene ID source
- type
basic:string
- label
Species
- type
basic:string
- label
Build
- type
basic:string
- label
Feature type
- type
basic:string
Abstract expression process¶
-
data:expression
abstract-expression
()[Source: v1.0.0]
Input arguments
Output results
- label
Normalized expression
- type
basic:file
- label
Read counts
- type
basic:file
- required
False
- label
Expression (json)
- type
basic:json
- label
Expression type
- type
basic:string
- label
Gene ID source
- type
basic:string
- label
Species
- type
basic:string
- label
Build
- type
basic:string
- label
Feature type
- type
basic:string
Accel Amplicon Pipeline¶
-
data:workflow:amplicon
workflow-accel
(data:reads:fastq:paired reads, data:seq:nucleotide genome, data:index:bwa bwa_index, data:masterfile:amplicon master_file, data:seq:nucleotide adapters, list:data:variants:vcf known_indels, list:data:variants:vcf known_vars, data:variants:vcf dbsnp, basic:integer mbq, basic:integer stand_call_conf, basic:integer min_bq, basic:integer min_alt_bq, list:data:variants:vcf known_vars_db, basic:decimal af_threshold)[Source: v5.0.1]
Processing pipeline to analyse the Accel-Amplicon NGS panel data. The raw amplicon sequencing reads are quality trimmed using Trimmomatic. The quality of the raw and trimmed data is assessed using the FASTQC tool. Quality trimmed reads are aligned to a reference genome using BWA mem. Sequencing primers are removed from the aligned reads using Primerclip. Amplicon performance stats are calculated using Bedtools coveragebed and Picard CollectTargetedPcrMetrics programs. Prior to variant calling, the alignment file is preprocessed using the GATK IndelRealigner and BaseRecalibrator tools. GATK HaplotypeCaller and Lofreq tools are used to call germline variants. Called variants are annotated using the SnpEff tool. Finally, the amplicon performance metrics and identified variants data are used to generate the PDF analysis report.
Input arguments
- label
Input reads
- type
data:reads:fastq:paired
- label
Genome sequence (FASTA)
- type
data:seq:nucleotide
- label
Genome index (BWA)
- type
data:index:bwa
- label
Experiment Master file
- type
data:masterfile:amplicon
- label
Adapters
- type
data:seq:nucleotide
- description
Provide an Illumina sequencing adapters file (.fasta) with adapters to be removed by Trimmomatic.
- label
Known indels
- type
list:data:variants:vcf
- label
Known variants
- type
list:data:variants:vcf
- label
dbSNP
- type
data:variants:vcf
- label
Min Base Quality
- type
basic:integer
- description
Minimum base quality required to consider a base for calling.
- default
20
- label
Min call confidence threshold
- type
basic:integer
- description
The minimum phred-scaled confidence threshold at which variants should be called.
- default
20
- label
Min baseQ
- type
basic:integer
- description
Skip any base with baseQ smaller than the default value.
- default
20
- label
Min alternate baseQ
- type
basic:integer
- description
Skip alternate bases with baseQ smaller than the default value.
- default
20
- label
Known variants
- type
list:data:variants:vcf
- label
Allele frequency threshold
- type
basic:decimal
- default
0.01
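The quality and confidence thresholds above are expressed on the phred scale, where Q = -10 · log10(P) and P is the error probability. The following sketch shows what the default value of 20 means in terms of error probability; the helper names are hypothetical and are not part of the pipeline.

```python
import math


def phred_to_error_probability(q):
    """Convert a phred-scaled quality score to an error probability."""
    return 10 ** (-q / 10)


def error_probability_to_phred(p):
    """Convert an error probability to a phred-scaled quality score."""
    return -10 * math.log10(p)


# A threshold of 20 corresponds to an error probability of 1 in 100.
p_at_q20 = phred_to_error_probability(20)
q_at_one_in_thousand = error_probability_to_phred(0.001)
```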
Output results
Align (BWA) and trim adapters¶
-
data:alignment:bam:bwatrim
align-bwa-trim
(data:masterfile:amplicon master_file, data:index:bwa genome, data:reads:fastq reads, basic:integer seed_l, basic:integer band_w, basic:decimal re_seeding, basic:boolean m, basic:integer match, basic:integer missmatch, basic:integer gap_o, basic:integer gap_e, basic:integer clipping, basic:integer unpaired_p, basic:boolean report_all, basic:integer report_tr)[Source: v2.1.1]
Align with BWA mem and trim the SAM output. The process uses the memory-optimized Primertrim tool.
Input arguments
- label
Master file
- type
data:masterfile:amplicon
- description
Amplicon experiment design file that holds the information about the primers to be removed.
- label
Reference genome
- type
data:index:bwa
- label
Reads
- type
data:reads:fastq
- label
Minimum seed length
- type
basic:integer
- description
Minimum seed length. Matches shorter than minimum seed length will be missed. The alignment speed is usually insensitive to this value unless it significantly deviates from 20.
- default
19
- label
Band width
- type
basic:integer
- description
Gaps longer than this will not be found.
- default
100
- label
Re-seeding factor
- type
basic:decimal
- description
Trigger re-seeding for a MEM longer than minSeedLen*FACTOR. This is a key heuristic parameter for tuning the performance. Larger value yields fewer seeds, which leads to faster alignment speed but lower accuracy.
- default
1.5
- label
Mark shorter split hits as secondary
- type
basic:boolean
- description
Mark shorter split hits as secondary (for Picard compatibility)
- default
False
- label
Score of a match
- type
basic:integer
- default
1
- label
Mismatch penalty
- type
basic:integer
- default
4
- label
Gap open penalty
- type
basic:integer
- default
6
- label
Gap extension penalty
- type
basic:integer
- default
1
- label
Clipping penalty
- type
basic:integer
- description
Clipping is applied if final alignment score is smaller than (best score reaching the end of query) - (Clipping penalty)
- default
5
- label
Penalty for an unpaired read pair
- type
basic:integer
- description
Affinity to force pair. Score: scoreRead1+scoreRead2-Penalty
- default
9
- label
Report all found alignments
- type
basic:boolean
- description
Output all found alignments for single-end or unpaired paired-end reads. These alignments will be flagged as secondary alignments.
- default
False
- label
Report threshold score
- type
basic:integer
- description
Do not output alignments with a score lower than the defined number. This option only affects output.
- default
30
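The input arguments above correspond closely to options of `bwa mem`. The sketch below shows one way the defaults could be turned into a command line. The mapping from field names to flags (`-k`, `-w`, `-r`, `-M`, `-A`, `-B`, `-O`, `-E`, `-L`, `-U`, `-a`, `-T`) is an assumption based on matching the descriptions to the BWA manual, not taken from the process source, and the function `bwa_mem_args` is hypothetical.

```python
# Defaults copied from the input arguments listed above.
DEFAULTS = {
    "seed_l": 19, "band_w": 100, "re_seeding": 1.5, "m": False,
    "match": 1, "missmatch": 4, "gap_o": 6, "gap_e": 1,
    "clipping": 5, "unpaired_p": 9, "report_all": False, "report_tr": 30,
}

# Assumed mapping of process fields to bwa mem options.
VALUE_FLAGS = {
    "seed_l": "-k", "band_w": "-w", "re_seeding": "-r", "match": "-A",
    "missmatch": "-B", "gap_o": "-O", "gap_e": "-E", "clipping": "-L",
    "unpaired_p": "-U", "report_tr": "-T",
}
BOOL_FLAGS = {"m": "-M", "report_all": "-a"}


def bwa_mem_args(index, reads, **overrides):
    """Build a bwa mem argument list from defaults plus overrides."""
    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        raise ValueError(f"unknown parameters: {sorted(unknown)}")
    params = {**DEFAULTS, **overrides}
    args = ["bwa", "mem"]
    for name, flag in VALUE_FLAGS.items():
        args += [flag, str(params[name])]
    for name, flag in BOOL_FLAGS.items():
        if params[name]:  # boolean options are passed only when enabled
            args.append(flag)
    return args + [index, *reads]


args = bwa_mem_args("genome.fa", ["r1.fq", "r2.fq"], m=True)
```

The resulting list can be passed to `subprocess.run` without a shell, which avoids quoting problems with file names.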
Output results
- label
Alignment file
- type
basic:file
- description
Position sorted alignment
- label
Index BAI
- type
basic:file
- label
Statistics
- type
basic:file
- label
BigWig file
- type
basic:file
- required
False
- label
Species
- type
basic:string
- label
Build
- type
basic:string
Alleyoop UTR Rates¶
-
data:alleyoop:utrrates:
alleyoop-utr-rates
(data:seq:nucleotide ref_seq, data:bed regions, data:alignment:bam:slamdunk slamdunk, basic:integer read_length)[Source: v1.2.1]
Run Alleyoop utrrates.
Input arguments
- label
FASTA file containing sequences for aligning
- type
data:seq:nucleotide
- required
True
- hidden
False
- label
BED file with coordinates of regions of interest
- type
data:bed
- required
True
- hidden
False
- label
Slamdunk results
- type
data:alignment:bam:slamdunk
- required
True
- hidden
False
- label
Maximum read length
- type
basic:integer
- description
Maximum length of reads in the input FASTQ file
- required
True
- hidden
False
- default
150
Output results
- label
Tab-separated file containing conversion rates on each region of interest
- type
basic:file
- required
True
- hidden
False
- label
Region of interest conversion rate plot
- type
basic:file
- required
True
- hidden
False
- label
Species
- type
basic:string
- required
True
- hidden
False
- label
Build
- type
basic:string
- required
True
- hidden
False
Alleyoop collapse¶
-
data:alleyoop:collapse:
alleyoop-collapse
(data:alignment:bam:slamdunk slamdunk, basic:string source)[Source: v1.2.1]
Run Alleyoop collapse tool on Slamdunk results.
Input arguments
- label
Slamdunk results
- type
data:alignment:bam:slamdunk
- required
True
- hidden
False
- label
Gene ID source
- type
basic:string
- required
True
- hidden
False
- default
ENSEMBL
- choices
ENSEMBL:
ENSEMBL
UCSC:
UCSC
Output results
- label
Count report containing SLAMSeq statistics
- type
basic:file
- required
True
- hidden
False
- label
Species
- type
basic:string
- required
True
- hidden
False
- label
Build
- type
basic:string
- required
True
- hidden
False
Alleyoop rates¶
-
data:alleyoop:rates:
alleyoop-rates
(data:seq:nucleotide ref_seq, data:alignment:bam:slamdunk slamdunk)[Source: v1.1.1]
Run Alleyoop rates.
Input arguments
- label
FASTA file containing sequences for aligning
- type
data:seq:nucleotide
- required
True
- hidden
False
- label
Slamdunk results
- type
data:alignment:bam:slamdunk
- required
True
- hidden
False
Output results
- label
Tab-separated file containing the overall conversion rates
- type
basic:file
- required
True
- hidden
False
- label
Overall conversion rate plot file
- type
basic:file
- required
True
- hidden
False
- label
Species
- type
basic:string
- required
True
- hidden
False
- label
Build
- type
basic:string
- required
True
- hidden
False
Alleyoop snpeval¶
-
data:alleyoop:snpeval:
alleyoop-snpeval
(data:seq:nucleotide ref_seq, data:bed regions, data:alignment:bam:slamdunk slamdunk, basic:integer read_length)[Source: v1.2.1]
Run Alleyoop snpeval.
Input arguments
- label
FASTA file containing sequences for aligning
- type
data:seq:nucleotide
- required
True
- hidden
False
- label
BED file with coordinates of regions of interest
- type
data:bed
- required
True
- hidden
False
- label
Slamdunk results
- type
data:alignment:bam:slamdunk
- required
True
- hidden
False
- label
Maximum read length
- type
basic:integer
- description
Maximum length of reads in the input FASTQ file
- required
True
- hidden
False
- default
150
Output results
- label
Tab-separated file with read counts, T>C read counts and SNP indication
- type
basic:file
- required
True
- hidden
False
- label
SNP evaluation plot
- type
basic:file
- required
True
- hidden
False
- label
Species
- type
basic:string
- required
True
- hidden
False
- label
Build
- type
basic:string
- required
True
- hidden
False
Alleyoop summary¶
-
data:alleyoop:summary:
alleyoop-summary
(list:data:alignment:bam:slamdunk slamdunk)[Source: v1.1.1]
Run Alleyoop summary.
Input arguments
- label
Slamdunk results
- type
list:data:alignment:bam:slamdunk
- required
True
- hidden
False
Output results
- label
Tab-separated file with mapping statistics
- type
basic:file
- required
True
- hidden
False
- label
PCA values of the samples based on T>C read counts in regions of interest.
- type
basic:file
- required
False
- hidden
False
- label
PCA plot of the samples based on T>C read counts in regions of interest.
- type
basic:file
- required
False
- hidden
False
- label
Species
- type
basic:string
- required
True
- hidden
False
- label
Build
- type
basic:string
- required
True
- hidden
False
Amplicon report¶
-
data:report:amplicon
amplicon-report
(data:picard:coverage pcr_metrics, data:coverage coverage, data:masterfile:amplicon master_file, list:data:snpeff annot_vars, basic:decimal af_threshold)[Source: v1.1.1]
Create amplicon report.
Input arguments
- label
Picard TargetedPcrMetrics
- type
data:picard:coverage
- label
Coverage
- type
data:coverage
- label
Amplicon master file
- type
data:masterfile:amplicon
- label
Annotated variants (snpEff)
- type
list:data:snpeff
- label
Allele frequency threshold
- type
basic:decimal
- default
0.01
Output results
- label
Report
- type
basic:file
- label
Panel name
- type
basic:string
- label
File with sample statistics
- type
basic:file
- label
Amplicon coverage file (nomergebed)
- type
basic:file
- label
Variant tables (snpEff)
- type
list:basic:file
Amplicon table¶
-
data:varianttable:amplicon
amplicon-table
(data:masterfile:amplicon master_file, data:coverage coverage, list:data:snpeff annot_vars, basic:boolean all_amplicons, basic:string table_name)[Source: v1.2.1]
Create variant table for use together with the genome browser.
Input arguments
- label
Master file
- type
data:masterfile:amplicon
- label
Amplicon coverage
- type
data:coverage
- label
Annotated variants
- type
list:data:snpeff
- label
Report all amplicons
- type
basic:boolean
- default
False
- label
Amplicon table name
- type
basic:string
- default
Amplicons containing variants
Output results
- label
Variant table
- type
basic:json
Annotate novel splice junctions (regtools)¶
-
data:junctions:regtools
regtools-junctions-annotate
(data:seq:nucleotide genome, data:annotation:gtf annotation, data:alignment:bam:star alignment_star, data:alignment:bam alignment, data:bed input_bed_junctions)[Source: v1.1.1]
Identify novel splice junctions by using regtools to annotate against a reference. The process accepts a reference genome, a reference genome annotation (GTF), and an input with reads information (a STAR alignment, reads aligned by any other aligner, or junctions in BED12 format). If STAR aligner data is given as input, the process calculates a BED12 file from the STAR ‘SJ.out.tab’ file and annotates all junctions with the ‘regtools junctions annotate’ command. When reads are aligned by another aligner, junctions are extracted with the ‘regtools junctions extract’ tool and then annotated with the ‘regtools junctions annotate’ command. The third option allows the user to provide a BED12 file with junctions directly, which are then annotated. Finally, annotated novel junctions are filtered into a separate output file. More information can be found in the [regtools manual](https://regtools.readthedocs.io/en/latest/).
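The three input options described above can be summarized as a small decision sketch. It assumes that exactly one of the three optional inputs must be given (the descriptions only say to "provide one input") and lists the steps for each case as plain strings. The function `select_junction_steps` is hypothetical and does not call regtools.

```python
def select_junction_steps(alignment_star=None, alignment=None,
                          input_bed_junctions=None):
    """Return the processing steps for whichever single input is given."""
    candidates = {
        "alignment_star": alignment_star,
        "alignment": alignment,
        "input_bed_junctions": input_bed_junctions,
    }
    provided = [name for name, value in candidates.items() if value is not None]
    if len(provided) != 1:
        # Assumption: the process expects exactly one junction source.
        raise ValueError(f"expected exactly one input, got {len(provided)}")
    steps = {
        "alignment_star": ["convert SJ.out.tab to BED12",
                           "regtools junctions annotate"],
        "alignment": ["regtools junctions extract",
                      "regtools junctions annotate"],
        "input_bed_junctions": ["regtools junctions annotate"],
    }
    return steps[provided[0]]


bam_steps = select_junction_steps(alignment="sample.bam")
```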
Input arguments
- label
Reference genome
- type
data:seq:nucleotide
- label
Reference genome annotation (GTF)
- type
data:annotation:gtf
- label
STAR alignment
- type
data:alignment:bam:star
- description
Splice junctions detected by STAR aligner (SJ.out.tab STAR output file). Provide one of the following inputs: ‘STAR alignment’, ‘Alignment’ (from any aligner), or ‘Junctions in BED12 format’.
- required
False
- label
Alignment
- type
data:alignment:bam
- description
Aligned reads from which splice junctions are going to be extracted. Provide one of the following inputs: ‘STAR alignment’, ‘Alignment’ (from any aligner), or ‘Junctions in BED12 format’.
- required
False
- label
Junctions in BED12 format
- type
data:bed
- description
Splice junctions in BED12 format. Provide one of the following inputs: ‘STAR alignment’, ‘Alignment’ (from any aligner), or ‘Junctions in BED12 format’.
- required
False
Output results
- label
Table of annotated novel splice junctions
- type
basic:file
- label
Table of annotated splice junctions
- type
basic:file
- label
Novel splice junctions in BED format
- type
basic:file
- label
Splice junctions in BED format
- type
basic:file
- label
Novel splice junctions in BigBed format
- type
basic:file
- required
False
- label
Splice junctions in BigBed format
- type
basic:file
- required
False
- label
Novel splice junctions bed tbi index for JBrowse
- type
basic:file
- label
Bed tbi index for JBrowse
- type
basic:file
- label
Species
- type
basic:string
- label
Build
- type
basic:string
Archive and make multi-sample report for amplicon data¶
-
data:archive:samples:amplicon
amplicon-archive-multi-report
(list:data data, list:basic:string fields, basic:boolean j)[Source: v0.3.1]
Create an archive of output files. The output folder structure is organized by sample slug and data object’s output-field names. Additionally, create a multi-sample report for selected samples.
Input arguments
- label
Data list
- type
list:data
- label
Output file fields
- type
list:basic:string
- label
Junk paths
- type
basic:boolean
- description
Store just names of saved files (junk the path)
- default
False
Output results
- label
Archive of selected samples and a heatmap comparing them
- type
basic:file
Archive samples¶
-
data:archive:samples
archive-samples
(list:data data, list:basic:string fields, basic:boolean j)[Source: v0.4.1]
Create an archive of output files. The output folder structure is organized by sample slug and data object’s output-field names.
Input arguments
- label
Data list
- type
list:data
- label
Output file fields
- type
list:basic:string
- label
Junk paths
- type
basic:boolean
- description
Store just names of saved files (junk the path)
- default
False
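The effect of the "Junk paths" option can be shown with a generic sketch using Python's `zipfile` module. This is not the process's implementation: the archive format, the example paths, and the function `build_archive` are all assumptions made for illustration.

```python
import io
import posixpath
import zipfile


def build_archive(files, junk_paths=False):
    """Write files into an in-memory zip and return the stored names.

    files: mapping of path -> bytes content.
    junk_paths: if True, store only the file name and drop the directories.
    """
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        for path, content in files.items():
            name = posixpath.basename(path) if junk_paths else path
            archive.writestr(name, content)
    buffer.seek(0)
    with zipfile.ZipFile(buffer) as archive:
        return archive.namelist()


files = {
    "sample-1/bam/aln.bam": b"",
    "sample-1/stats/report.txt": b"",
}
with_paths = build_archive(files)
junked = build_archive(files, junk_paths=True)
```

Note that junking paths can cause name collisions when files from different samples share the same file name.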
Output results
- label
Archive
- type
basic:file
BAM file¶
-
data:alignment:bam:upload
upload-bam
(basic:file src, basic:string species, basic:string build)[Source: v1.6.1]
Import a BAM file (.bam), which is the binary format for storing sequence alignment data. This format is described on the [SAM Tools web site](http://samtools.github.io/hts-specs/).
Input arguments
- label
Mapping (BAM)
- type
basic:file
- description
A mapping file in BAM format. The file will be indexed on upload, so additional BAI files are not required.
- validate_regex
\.(bam)$
- label
Species
- type
basic:string
- description
Species Latin name.
- choices
Homo sapiens:
Homo sapiens
Mus musculus:
Mus musculus
Rattus norvegicus:
Rattus norvegicus
Dictyostelium discoideum:
Dictyostelium discoideum
Odocoileus virginianus texanus:
Odocoileus virginianus texanus
Solanum tuberosum:
Solanum tuberosum
- label
Build
- type
basic:string
Output results
- label
Uploaded file
- type
basic:file
- label
Index BAI
- type
basic:file
- label
Alignment statistics
- type
basic:file
- label
BigWig file
- type
basic:file
- required
False
- label
Species
- type
basic:string
- label
Build
- type
basic:string
BAM file and index¶
-
data:alignment:bam:upload
upload-bam-indexed
(basic:file src, basic:file src2, basic:string species, basic:string build)[Source: v1.6.1]
Import a BAM file (.bam) and its BAM index (.bam.bai). BAM is the binary format for storing sequence alignment data. This format is described on the [SAM Tools web site](http://samtools.github.io/hts-specs/).
Input arguments
- label
Mapping (BAM)
- type
basic:file
- description
A mapping file in BAM format.
- validate_regex
\.(bam)$
- label
BAM index (*.bam.bai file)
- type
basic:file
- description
An index file of a BAM mapping file (ending with bam.bai).
- validate_regex
\.(bam.bai)$
- label
Species
- type
basic:string
- description
Species Latin name.
- choices
Homo sapiens:
Homo sapiens
Mus musculus:
Mus musculus
Rattus norvegicus:
Rattus norvegicus
Dictyostelium discoideum:
Dictyostelium discoideum
Odocoileus virginianus texanus:
Odocoileus virginianus texanus
Solanum tuberosum:
Solanum tuberosum
- label
Build
- type
basic:string
Output results
- label
Uploaded file
- type
basic:file
- label
Index BAI
- type
basic:file
- label
Alignment statistics
- type
basic:file
- label
BigWig file
- type
basic:file
- required
False
- label
Species
- type
basic:string
- label
Build
- type
basic:string
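The `validate_regex` values listed for the two file inputs can be tried out with Python's `re` module. The sketch below assumes the pattern is searched anywhere in the file name (the `$` anchors it to the end); how the platform actually applies the pattern is not stated here, and `passes_validation` is a hypothetical helper.

```python
import re

# Patterns copied from the validate_regex fields above.
BAM_REGEX = r"\.(bam)$"
BAI_REGEX = r"\.(bam.bai)$"


def passes_validation(file_name, pattern):
    """Return True if the file name satisfies the pattern (search semantics assumed)."""
    return re.search(pattern, file_name) is not None


print(passes_validation("reads.bam", BAM_REGEX))       # True
print(passes_validation("reads.bam.bai", BAM_REGEX))   # False
print(passes_validation("reads.bam.bai", BAI_REGEX))   # True
```

Note that the dot between `bam` and `bai` in the second pattern is unescaped, so under standard regex rules it matches any single character, not only a literal dot.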
BBDuk (paired-end)¶
-
data:reads:fastq:paired:bbduk
bbduk-paired
(data:reads:fastq:paired reads, basic:integer min_length, basic:boolean show_advanced, list:data:seq:nucleotide sequences, list:basic:string literal_sequences, basic:integer kmer_length, basic:boolean check_reverse_complements, basic:boolean mask_middle_base, basic:integer min_kmer_hits, basic:decimal min_kmer_fraction, basic:decimal min_coverage_fraction, basic:integer hamming_distance, basic:integer query_hamming_distance, basic:integer edit_distance, basic:integer hamming_distance2, basic:integer query_hamming_distance2, basic:integer edit_distance2, basic:boolean forbid_N, basic:boolean remove_if_either_bad, basic:boolean find_best_match, basic:boolean perform_error_correction, basic:string k_trim, basic:string k_mask, basic:boolean mask_fully_covered, basic:integer min_k, basic:string quality_trim, basic:integer trim_quality, basic:integer trim_poly_A, basic:decimal min_length_fraction, basic:integer max_length, basic:integer min_average_quality, basic:integer min_average_quality_bases, basic:integer min_base_quality, basic:integer min_consecutive_bases, basic:integer trim_pad, basic:boolean trim_by_overlap, basic:boolean strict_overlap, basic:integer min_overlap, basic:integer min_insert, basic:boolean trim_pairs_evenly, basic:integer force_trim_left, basic:integer force_trim_right, basic:integer force_trim_right2, basic:integer force_trim_mod, basic:integer restrict_left, basic:integer restrict_right, basic:decimal min_GC, basic:decimal max_GC, basic:integer maxns, basic:boolean toss_junk, basic:boolean chastity_filter, basic:boolean barcode_filter, list:data:seq:nucleotide barcode_files, list:basic:string barcode_sequences, basic:integer x_min, basic:integer y_min, basic:integer x_max, basic:integer y_max, basic:decimal entropy, basic:integer entropy_window, basic:integer entropy_k, basic:boolean entropy_mask, basic:integer min_base_frequency, basic:boolean nogroup)[Source: v2.4.1]
BBDuk combines the most common data-quality-related trimming, filtering, and masking operations into a single high-performance tool. It is capable of quality-trimming and filtering, adapter-trimming, contaminant-filtering via kmer matching, sequence masking, GC-filtering, length filtering, entropy-filtering, format conversion, histogram generation, subsampling, quality-score recalibration, kmer cardinality estimation, and various other operations in a single pass. See [here](https://jgi.doe.gov/data-and-tools/bbtools/bb-tools-user-guide/bbduk-guide/) for more information.
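As a rough illustration of contaminant detection via kmer matching, the sketch below counts how many kmers of a read occur in a set of reference sequences, optionally including their reverse complements. It is a simplified model under stated assumptions (exact matches only, plain Python sets), not BBDuk's implementation, and all function names are hypothetical.

```python
# Translation table for complementing DNA bases (N stays N).
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def kmers(seq, k):
    """Return the set of all substrings of length k in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def count_kmer_hits(read, references, k, rcomp=True):
    """Count read kmers that exactly match a reference kmer."""
    ref_kmers = set()
    for ref in references:
        ref_kmers |= kmers(ref, k)
        if rcomp:
            # Also index the reverse complement of each reference.
            ref_kmers |= kmers(reverse_complement(ref), k)
    return sum(1 for i in range(len(read) - k + 1) if read[i:i + k] in ref_kmers)


adapter = "AGATCGGAAGAGC"
read = "TTTT" + adapter + "TTTT"
print(count_kmer_hits(read, [adapter], k=5))
```

A reference shorter than `k` contributes no kmers in this sketch, which mirrors the note below that contaminants shorter than the kmer length will not be found.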
Input arguments
- label
Reads
- type
data:reads:fastq:paired
- label
Minimum length [minlength=10]
- type
basic:integer
- description
Reads shorter than the minimum length will be discarded after trimming.
- default
10
- label
Show advanced parameters
- type
basic:boolean
- default
False
- label
Sequences [ref]
- type
list:data:seq:nucleotide
- description
Reference sequences include adapters, contaminants, and degenerate sequences. They can be provided in a multi-sequence FASTA file or as a set of literal sequences below.
- required
False
- label
Literal sequences [literal]
- type
list:basic:string
- description
Literal sequences can be specified by inputting them one by one and pressing Enter after each sequence.
- required
False
- default
[]
- label
Kmer length [k=27]
- type
basic:integer
- description
Kmer length used for finding contaminants. Contaminants shorter than kmer length will not be found. Kmer length must be at least 1.
- default
27
- label
Look for reverse complements of kmers in addition to forward kmers [rcomp=t]
- type
basic:boolean
- default
True
- label
Treat the middle base of a kmer as a wildcard to increase sensitivity in the presence of errors [maskmiddle=t]
- type
basic:boolean
- default
True
- label
Minimum number of kmer hits [minkmerhits=1]
- type
basic:integer
- description
Reads need at least this many matching kmers to be considered as matching the reference.
- default
1
- label
Minimum kmer fraction [minkmerfraction=0.0]
- type
basic:decimal
- description
A read needs at least this fraction of its total kmers to hit a reference in order to be considered a match. If this and ‘Minimum number of kmer hits’ are set, the greater is used.
- default
0.0
- label
Minimum coverage fraction [mincovfraction=0.0]
- type
basic:decimal
- description
A read needs at least this fraction of its total bases to be covered by reference kmers to be considered a match. If specified, ‘Minimum coverage fraction’ overrides ‘Minimum number of kmer hits’ and ‘Minimum kmer fraction’.
- default
0.0
- label
Maximum Hamming distance for kmers (substitutions only) [hammingdistance=0]
- type
basic:integer
- default
0
- label
Hamming distance for query kmers [qhdist=0]
- type
basic:integer
- default
0
- label
Maximum edit distance from reference kmers (substitutions and indels) [editdistance=0]
- type
basic:integer
- default
0
- label
Hamming distance for short kmers when looking for shorter kmers [hammingdistance2=0]
- type
basic:integer
- default
0
- label
Hamming distance for short query kmers when looking for shorter kmers [qhdist2=0]
- type
basic:integer
- default
0
- label
Maximum edit distance from short reference kmers (substitutions and indels) when looking for shorter kmers [editdistance2=0]
- type
basic:integer
- default
0
- label
Forbid matching of read kmers containing N [forbidn=f]
- type
basic:boolean
- description
By default, these will match a reference ‘A’ if ‘Maximum Hamming distance for kmers’ > 0 or ‘Maximum edit distance from reference kmers’ > 0, to increase sensitivity.
- default
False
- label
Remove both sequences of a paired-end read, if either of them is to be removed [removeifeitherbad=t]
- type
basic:boolean
- default
True
- label
If multiple matches, associate read with sequence sharing most kmers [findbestmatch=t]
- type
basic:boolean
- default
True
- label
Perform error correction with BBMerge prior to kmer operations [ecco=f]
- type
basic:boolean
- default
False
- label
Trimming protocol to remove bases matching reference kmers from reads [ktrim=f]
- type
basic:string
- default
f
- choices
Don’t trim:
f
Trim to the right:
r
Trim to the left:
l
- label
Symbol to replace bases matching reference kmers [kmask=f]
- type
basic:string
- description
Allows any non-whitespace character other than t or f. Processes short kmers on both ends.
- default
f
- label
Only mask bases that are fully covered by kmers [maskfullycovered=f]
- type
basic:boolean
- default
False
- label
Look for shorter kmers at read tips down to this length when k-trimming or masking [mink=0]
- type
basic:integer
- description
-1 means disabled. Enabling this will disable treating the middle base of a kmer as a wildcard to increase sensitivity in the presence of errors.
- default
-1
- label
Trimming protocol to remove bases with quality below the minimum average region quality from read ends [qtrim=f]
- type
basic:string
- description
Performed after looking for kmers. If enabled, also set ‘Average quality below which to trim region’.
- default
f
- choices
Trim neither end:
f
Trim both ends:
rl
Trim only right end:
r
Trim only left end:
l
Use sliding window:
w
- label
Average quality below which to trim region [trimq=6]
- type
basic:integer
- description
Set trimming protocol to enable this parameter.
- disabled
operations.quality_trim == ‘f’
- default
6
- label
Minimum length of poly-A or poly-T tails to trim on either end of reads [trimpolya=0]
- type
basic:integer
- default
0
- label
Minimum length fraction [mlf=0.0]
- type
basic:decimal
- description
Reads shorter than this fraction of original length after trimming will be discarded.
- default
0.0
- label
Maximum length [maxlength]
- type
basic:integer
- description
Reads longer than this after trimming will be discarded.
- required
False
- label
Minimum average quality [minavgquality=0]
- type
basic:integer
- description
Reads with average quality (after trimming) below this will be discarded.
- default
0
- label
Number of initial bases to calculate minimum average quality from [maqb=0]
- type
basic:integer
- description
Used only if positive.
- default
0
- label
Minimum base quality below which reads are discarded after trimming [minbasequality=0]
- type
basic:integer
- default
0
- label
Minimum number of consecutive called bases [mcb=0]
- type
basic:integer
- default
0
- label
Number of bases to trim around matching kmers [tp=0]
- type
basic:integer
- default
0
- label
Trim adapters based on where paired-end reads overlap [tbo=f]
- type
basic:boolean
- default
False
- label
Adjust sensitivity in ‘Trim adapters based on where paired-end reads overlap’ mode [strictoverlap=t]
- type
basic:boolean
- default
True
- label
Minimum number of overlapping bases [minoverlap=14]
- type
basic:integer
- description
Require this many bases of overlap for detection.
- default
14
- label
Minimum insert size [mininsert=40]
- type
basic:integer
- description
Require insert size of at least this for overlap. Should be reduced to 16 for small RNA sequencing.
- default
40
- label
Trim both sequences of paired-end reads to the minimum length of either sequence [tpe=f]
- type
basic:boolean
- default
False
- label
Position from which to trim bases to the left [forcetrimleft=0]
- type
basic:integer
- default
0
- label
Position from which to trim bases to the right [forcetrimright=0]
- type
basic:integer
- default
0
- label
Number of bases to trim from the right end [forcetrimright2=0]
- type
basic:integer
- default
0
- label
Modulo to right-trim reads [forcetrimmod=0]
- type
basic:integer
- description
Trim reads to the largest multiple of modulo.
- default
0
- label
Number of leftmost bases to look in for kmer matches [restrictleft=0]
- type
basic:integer
- default
0
- label
Number of rightmost bases to look in for kmer matches [restrictright=0]
- type
basic:integer
- default
0
- label
Minimum GC content [mingc=0.0]
- type
basic:decimal
- description
Discard reads with lower GC content.
- default
0.0
- label
Maximum GC content [maxgc=1.0]
- type
basic:decimal
- description
Discard reads with higher GC content.
- default
1.0
- label
Max Ns after trimming [maxns=-1]
- type
basic:integer
- description
If non-negative, reads with more Ns than this (after trimming) will be discarded.
- default
-1
- label
Discard reads with invalid characters as bases [tossjunk=f]
- type
basic:boolean
- default
False
- label
Discard reads that fail Illumina chastity filtering [chastityfilter=f]
- type
basic:boolean
- description
Discard reads with id containing ‘ 1:Y:’ or ‘ 2:Y:’.
- default
False
- label
Remove reads with unexpected barcodes if barcodes are set, or barcodes containing ‘N’ otherwise [barcodefilter=f]
- type
basic:boolean
- description
A barcode must be the last part of the read header.
- default
False
- label
Barcode sequences [barcodes]
- type
list:data:seq:nucleotide
- required
False
- label
Literal barcode sequences [barcodes]
- type
list:basic:string
- description
Literal barcode sequences can be specified by inputting them one by one and pressing Enter after each sequence.
- required
False
- default
[]
- label
Minimum X coordinate [xmin=-1]
- type
basic:integer
- description
If positive, discard reads with a smaller X coordinate.
- default
-1
- label
Minimum Y coordinate [ymin=-1]
- type
basic:integer
- description
If positive, discard reads with a smaller Y coordinate.
- default
-1
- label
Maximum X coordinate [xmax=-1]
- type
basic:integer
- description
If positive, discard reads with a larger X coordinate.
- default
-1
- label
Maximum Y coordinate [ymax=-1]
- type
basic:integer
- description
If positive, discard reads with a larger Y coordinate.
- default
-1
- label
Minimum entropy [entropy=-1.0]
- type
basic:decimal
- description
Set between 0 and 1 to filter reads with entropy below that value. Higher is more stringent.
- default
-1.0
- label
Length of sliding window used to calculate entropy [entropywindow=50]
- type
basic:integer
- description
To use the sliding window set minimum entropy in range between 0.0 and 1.0.
- default
50
- label
Length of kmers used to calculate entropy [entropyk=5]
- type
basic:integer
- default
5
- label
Mask low-entropy parts of sequences with N instead of discarding [entropymask=f]
- type
basic:boolean
- default
False
- label
Minimum base frequency [minbasefrequency=0]
- type
basic:integer
- default
0
- label
Disable grouping of bases for reads >50bp [nogroup]
- type
basic:boolean
- description
All reports will show data for every base in the read. Using this option on very long reads will cause FastQC to crash.
- default
False
Output results
- label
Remaining upstream reads
- type
list:basic:file
- label
Remaining downstream reads
- type
list:basic:file
- label
Statistics
- type
list:basic:file
- label
Upstream quality control with FastQC
- type
list:basic:file:html
- label
Downstream quality control with FastQC
- type
list:basic:file:html
- label
Download upstream FastQC archive
- type
list:basic:file
- label
Download downstream FastQC archive
- type
list:basic:file
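The interplay of ‘Minimum number of kmer hits’, ‘Minimum kmer fraction’, and ‘Minimum coverage fraction’ described in the inputs above can be sketched as a single decision function. This is an interpretation of those descriptions, not BBDuk's code: it assumes that a coverage fraction greater than 0 counts as "specified" and that the fractional threshold is rounded up. The function name `is_match` is hypothetical.

```python
import math


def is_match(read_length, k, kmer_hits, covered_bases,
             min_kmer_hits=1, min_kmer_fraction=0.0, min_coverage_fraction=0.0):
    """Decide whether a read counts as matching the reference (illustrative)."""
    if min_coverage_fraction > 0:
        # Coverage fraction, when specified, overrides the two kmer thresholds.
        return covered_bases >= min_coverage_fraction * read_length
    total_kmers = max(read_length - k + 1, 0)
    # The greater of the absolute and the fractional threshold is used.
    required = max(min_kmer_hits, math.ceil(min_kmer_fraction * total_kmers))
    return kmer_hits >= required


# A 100 bp read has 74 kmers of length 27; 10% of them rounds up to 8 hits.
print(is_match(100, 27, kmer_hits=5, covered_bases=40))
print(is_match(100, 27, kmer_hits=5, covered_bases=40, min_kmer_fraction=0.1))
```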
BBDuk (single-end)¶
-
data:reads:fastq:single:bbduk
bbduk-single
(data:reads:fastq:single reads, basic:integer min_length, basic:boolean show_advanced, list:data:seq:nucleotide sequences, list:basic:string literal_sequences, basic:integer kmer_length, basic:boolean check_reverse_complements, basic:boolean mask_middle_base, basic:integer min_kmer_hits, basic:decimal min_kmer_fraction, basic:decimal min_coverage_fraction, basic:integer hamming_distance, basic:integer query_hamming_distance, basic:integer edit_distance, basic:integer hamming_distance2, basic:integer query_hamming_distance2, basic:integer edit_distance2, basic:boolean forbid_N, basic:boolean find_best_match, basic:string k_trim, basic:string k_mask, basic:boolean mask_fully_covered, basic:integer min_k, basic:string quality_trim, basic:integer trim_quality, basic:integer trim_poly_A, basic:decimal min_length_fraction, basic:integer max_length, basic:integer min_average_quality, basic:integer min_average_quality_bases, basic:integer min_base_quality, basic:integer min_consecutive_bases, basic:integer trim_pad, basic:integer min_overlap, basic:integer min_insert, basic:integer force_trim_left, basic:integer force_trim_right, basic:integer force_trim_right2, basic:integer force_trim_mod, basic:integer restrict_left, basic:integer restrict_right, basic:decimal min_GC, basic:decimal max_GC, basic:integer maxns, basic:boolean toss_junk, basic:boolean chastity_filter, basic:boolean barcode_filter, list:data:seq:nucleotide barcode_files, list:basic:string barcode_sequences, basic:integer x_min, basic:integer y_min, basic:integer x_max, basic:integer y_max, basic:decimal entropy, basic:integer entropy_window, basic:integer entropy_k, basic:boolean entropy_mask, basic:integer min_base_frequency, basic:boolean nogroup)[Source: v2.4.1]
BBDuk combines the most common data-quality-related trimming, filtering, and masking operations into a single high-performance tool. It is capable of quality-trimming and filtering, adapter-trimming, contaminant-filtering via kmer matching, sequence masking, GC-filtering, length filtering, entropy-filtering, format conversion, histogram generation, subsampling, quality-score recalibration, kmer cardinality estimation, and various other operations in a single pass. See [here](https://jgi.doe.gov/data-and-tools/bbtools/bb-tools-user-guide/bbduk-guide/) for more information.
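Two of the sensitivity options listed below, the Hamming distance and the middle-base wildcard, can be illustrated with a short sketch. It compares a single read kmer with a single reference kmer; the helper names are hypothetical, the choice of middle position for even-length kmers is an assumption, and real tools use indexed lookups rather than pairwise comparison.

```python
def mask_middle(kmer, wildcard="N"):
    """Replace the middle base of a kmer with a wildcard symbol."""
    mid = len(kmer) // 2
    return kmer[:mid] + wildcard + kmer[mid + 1:]


def hamming_distance(a, b):
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("kmers must have the same length")
    return sum(x != y for x, y in zip(a, b))


def kmers_match(read_kmer, ref_kmer, max_hamming=0, maskmiddle=True):
    """Return True if the kmers match within the allowed substitutions."""
    if maskmiddle:
        # Masking both kmers makes the middle position always agree.
        read_kmer, ref_kmer = mask_middle(read_kmer), mask_middle(ref_kmer)
    return hamming_distance(read_kmer, ref_kmer) <= max_hamming


# The two kmers differ only at the middle base.
print(kmers_match("ACGTA", "ACTTA", maskmiddle=True))   # True
print(kmers_match("ACGTA", "ACTTA", maskmiddle=False))  # False
```

Raising the allowed Hamming distance has a similar effect to the wildcard but applies to any position, at the cost of more spurious matches.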
Input arguments
- label
Reads
- type
data:reads:fastq:single
- label
Minimum length [minlength=10]
- type
basic:integer
- description
Reads shorter than the minimum length will be discarded after trimming.
- default
10
- label
Show advanced parameters
- type
basic:boolean
- default
False
- label
Sequences [ref]
- type
list:data:seq:nucleotide
- description
Reference sequences include adapters, contaminants, and degenerate sequences. They can be provided in a multi-sequence FASTA file or as a set of literal sequences below.
- required
False
- label
Literal sequences [literal]
- type
list:basic:string
- description
Literal sequences can be specified by inputting them one by one and pressing Enter after each sequence.
- required
False
- default
[]
- label
Kmer length [k=27]
- type
basic:integer
- description
Kmer length used for finding contaminants. Contaminants shorter than kmer length will not be found. Kmer length must be at least 1.
- default
27
- label
Look for reverse complements of kmers in addition to forward kmers [rcomp=t]
- type
basic:boolean
- default
True
- label
Treat the middle base of a kmer as a wildcard to increase sensitivity in the presence of errors [maskmiddle=t]
- type
basic:boolean
- default
True
- label
Minimum number of kmer hits [minkmerhits=1]
- type
basic:integer
- description
Reads need at least this many matching kmers to be considered as matching the reference.
- default
1
- label
Minimum kmer fraction [minkmerfraction=0.0]
- type
basic:decimal
- description
A read needs at least this fraction of its total kmers to hit a reference in order to be considered a match. If this and ‘Minimum number of kmer hits’ are set, the greater is used.
- default
0.0
- label
Minimum coverage fraction [mincovfraction=0.0]
- type
basic:decimal
- description
A read needs at least this fraction of its total bases to be covered by reference kmers to be considered a match. If specified, ‘Minimum coverage fraction’ overrides ‘Minimum number of kmer hits’ and ‘Minimum kmer fraction’.
- default
0.0
- label
Maximum Hamming distance for kmers (substitutions only) [hammingdistance=0]
- type
basic:integer
- default
0
- label
Hamming distance for query kmers [qhdist=0]
- type
basic:integer
- default
0
- label
Maximum edit distance from reference kmers (substitutions and indels) [editdistance=0]
- type
basic:integer
- default
0
- label
Hamming distance for short kmers when looking for shorter kmers [hammingdistance2=0]
- type
basic:integer
- default
0
- label
Hamming distance for short query kmers when looking for shorter kmers [qhdist2=0]
- type
basic:integer
- default
0
- label
Maximum edit distance from short reference kmers (substitutions and indels) when looking for shorter kmers [editdistance2=0]
- type
basic:integer
- default
0
- label
Forbid matching of read kmers containing N [forbidn=f]
- type
basic:boolean
- description
By default, these will match a reference ‘A’ if ‘Maximum Hamming distance for kmers’ > 0 or ‘Maximum edit distance from reference kmers’ > 0, to increase sensitivity.
- default
False
- label
If multiple matches, associate read with sequence sharing most kmers [findbestmatch=f]
- type
basic:boolean
- default
True
- label
Trimming protocol to remove bases matching reference kmers from reads [ktrim=f]
- type
basic:string
- default
f
- choices
Don’t trim:
f
Trim to the right:
r
Trim to the left:
l
- label
Symbol to replace bases matching reference kmers [kmask=f]
- type
basic:string
- description
Allows any non-whitespace character other than t or f. Processes short kmers on both ends.
- default
f
- label
Only mask bases that are fully covered by kmers [maskfullycovered=f]
- type
basic:boolean
- default
False
- label
Look for shorter kmers at read tips down to this length when k-trimming or masking [mink=0]
- type
basic:integer
- description
-1 means disabled. Enabling this will disable treating the middle base of a kmer as a wildcard to increase sensitivity in the presence of errors.
- default
-1
- label
Trimming protocol to remove bases with quality below the minimum average region quality from read ends [qtrim=f]
- type
basic:string
- description
Performed after looking for kmers. If enabled, also set ‘Average quality below which to trim region’.
- default
f
- choices
Trim neither end:
f
Trim both ends:
rl
Trim only right end:
r
Trim only left end:
l
Use sliding window:
w
- label
Average quality below which to trim region [trimq=6]
- type
basic:integer
- description
Set trimming protocol to enable this parameter.
- disabled
operations.quality_trim == ‘f’
- default
6
- label
Minimum length of poly-A or poly-T tails to trim on either end of reads [trimpolya=0]
- type
basic:integer
- default
0
- label
Minimum length fraction [mlf=0]
- type
basic:decimal
- description
Reads shorter than this fraction of original length after trimming will be discarded.
- default
0.0
- label
Maximum length [maxlength]
- type
basic:integer
- description
Reads longer than this after trimming will be discarded.
- required
False
- label
Minimum average quality [minavgquality=0]
- type
basic:integer
- description
Reads with average quality (after trimming) below this will be discarded.
- default
0
- label
Number of initial bases to calculate minimum average quality from [maqb=0]
- type
basic:integer
- description
Used only if positive.
- default
0
- label
Minimum base quality below which reads are discarded after trimming [minbasequality=0]
- type
basic:integer
- default
0
- label
Minimum number of consecutive called bases [mcb=0]
- type
basic:integer
- default
0
- label
Number of bases to trim around matching kmers [tp=0]
- type
basic:integer
- default
0
- label
Minimum number of overlapping bases [minoverlap=14]
- type
basic:integer
- description
Require this many bases of overlap for detection.
- default
14
- label
Minimum insert size [mininsert=40]
- type
basic:integer
- description
Require insert size of at least this for overlap. Should be reduced to 16 for small RNA sequencing.
- default
40
- label
Position from which to trim bases to the left [forcetrimleft=0]
- type
basic:integer
- default
0
- label
Position from which to trim bases to the right [forcetrimright=0]
- type
basic:integer
- default
0
- label
Number of bases to trim from the right end [forcetrimright2=0]
- type
basic:integer
- default
0
- label
Modulo to right-trim reads [forcetrimmod=0]
- type
basic:integer
- description
Trim reads to the largest multiple of modulo.
- default
0
- label
Number of leftmost bases to look in for kmer matches [restrictleft=0]
- type
basic:integer
- default
0
- label
Number of rightmost bases to look in for kmer matches [restrictright=0]
- type
basic:integer
- default
0
- label
Minimum GC content [mingc=0.0]
- type
basic:decimal
- description
Discard reads with lower GC content.
- default
0.0
- label
Maximum GC content [maxgc=1.0]
- type
basic:decimal
- description
Discard reads with higher GC content.
- default
1.0
- label
Max Ns after trimming [maxns=-1]
- type
basic:integer
- description
If non-negative, reads with more Ns than this (after trimming) will be discarded.
- default
-1
- label
Discard reads with invalid characters as bases [tossjunk=f]
- type
basic:boolean
- default
False
- label
Discard reads that fail Illumina chastity filtering [chastityfilter=f]
- type
basic:boolean
- description
Discard reads with id containing ‘ 1:Y:’ or ‘ 2:Y:’.
- default
False
- label
Remove reads with unexpected barcodes if barcodes are set, or barcodes containing ‘N’ otherwise [barcodefilter=f]
- type
basic:boolean
- description
A barcode must be the last part of the read header.
- default
False
- label
Barcode sequences [barcodes]
- type
list:data:seq:nucleotide
- required
False
- label
Literal barcode sequences [barcodes]
- type
list:basic:string
- description
Literal barcode sequences can be specified by inputting them one by one and pressing Enter after each sequence.
- required
False
- default
[]
- label
Minimum X coordinate [xmin=-1]
- type
basic:integer
- description
If positive, discard reads with a smaller X coordinate.
- default
-1
- label
Minimum Y coordinate [ymin=-1]
- type
basic:integer
- description
If positive, discard reads with a smaller Y coordinate.
- default
-1
- label
Maximum X coordinate [xmax=-1]
- type
basic:integer
- description
If positive, discard reads with a larger X coordinate.
- default
-1
- label
Maximum Y coordinate [ymax=-1]
- type
basic:integer
- description
If positive, discard reads with a larger Y coordinate.
- default
-1
- label
Minimum entropy [entropy=-1]
- type
basic:decimal
- description
Set between 0 and 1 to filter reads with entropy below that value. Higher is more stringent.
- default
-1.0
- label
Length of sliding window used to calculate entropy [entropywindow=50]
- type
basic:integer
- description
To use the sliding window set minimum entropy in range between 0.0 and 1.0.
- default
50
- label
Length of kmers used to calculate entropy [entropyk=5]
- type
basic:integer
- default
5
- label
Mask low-entropy parts of sequences with N instead of discarding [entropymask=f]
- type
basic:boolean
- default
False
- label
Minimum base frequency [minbasefrequency=0]
- type
basic:integer
- default
0
- label
Disable grouping of bases for reads >50bp [nogroup]
- type
basic:boolean
- description
All reports will show data for every base in the read. Using this option on very long reads will cause FastQC to crash.
- default
False
Output results
- label
Remaining reads
- type
list:basic:file
- label
Statistics
- type
list:basic:file
- label
Quality control with FastQC
- type
list:basic:file:html
- label
Download FastQC archive
- type
list:basic:file
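Several of the read-level filters listed in the inputs above (minimum length, minimum length fraction, maximum length, GC content, maximum Ns, and the right-trim modulo) are simple enough to sketch directly. The functions below are an illustrative reading of the parameter descriptions with hypothetical names; they do not reproduce BBDuk's order of operations or edge-case handling.

```python
def gc_fraction(seq):
    """Fraction of G and C bases in a sequence (0.0 for an empty sequence)."""
    if not seq:
        return 0.0
    return sum(base in "GCgc" for base in seq) / len(seq)


def force_trim_mod(seq, modulo):
    """Right-trim a read to the largest multiple of modulo; 0 disables trimming."""
    if modulo <= 0:
        return seq
    return seq[:len(seq) - len(seq) % modulo]


def keep_read(seq, original_length, min_length=10, min_length_fraction=0.0,
              max_length=None, min_gc=0.0, max_gc=1.0, max_ns=-1):
    """Return True if a trimmed read passes the length, N, and GC filters."""
    if len(seq) < min_length:
        return False
    if len(seq) < min_length_fraction * original_length:
        return False
    if max_length is not None and len(seq) > max_length:
        return False
    if max_ns >= 0 and seq.upper().count("N") > max_ns:
        return False
    return min_gc <= gc_fraction(seq) <= max_gc


print(force_trim_mod("ACGTACGTAC", 4))          # ACGTACGT
print(keep_read("ACGTACGTACGT", original_length=12))
```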
BBDuk - STAR - FeatureCounts (3’ mRNA-Seq, paired-end)¶
-
data:workflow:quant:featurecounts:paired
workflow-bbduk-star-fc-quant-paired
(data:reads:fastq:paired reads, data:index:star star_index, list:data:seq:nucleotide adapters, data:annotation annotation, basic:string stranded, basic:integer n_reads, basic:integer seed, basic:decimal fraction, basic:boolean two_pass, data:index:star rrna_reference, data:index:star globin_reference)[Source: v2.0.1]
This 3’ mRNA-Seq pipeline comprises QC, preprocessing, alignment, and quantification steps. Reads are preprocessed by __BBDuk__, which removes adapters, trims reads for quality from the 3’-end, and discards reads that are too short after trimming. Preprocessed reads are aligned by the __STAR__ aligner. For read-count quantification, the __FeatureCounts__ tool is used. QC steps include downsampling, QoRTs QC analysis, and alignment of input reads to the rRNA/globin reference sequences. The reported alignment rate is used to assess the rRNA/globin sequence depletion rate.
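The downsampling step is controlled by the ‘Number of reads’, ‘Seed’, and ‘Fraction’ inputs of this workflow. The sketch below shows one way such inputs could interact: a fixed seed makes the selection reproducible, and a fraction, when given, overrides the absolute number of reads. It is an in-memory illustration with a hypothetical `downsample` function, not the tool the workflow actually runs.

```python
import random


def downsample(reads, n_reads=1_000_000, seed=11, fraction=None):
    """Return a reproducible random subset of reads, preserving input order."""
    rng = random.Random(seed)
    if fraction is not None:
        if not 0.0 <= fraction <= 1.0:
            raise ValueError("fraction must be in the range [0.0, 1.0]")
        # A fraction, if set, overrides the absolute number of reads.
        target = round(fraction * len(reads))
    else:
        target = min(n_reads, len(reads))
    keep = sorted(rng.sample(range(len(reads)), target))
    return [reads[i] for i in keep]


reads = [f"read{i}" for i in range(100)]
print(len(downsample(reads, n_reads=10)))
print(len(downsample(reads, n_reads=10, fraction=0.25)))
```

Holding every read in memory is what a two-pass approach avoids: a first pass can choose which reads to keep and a second pass can write them out, trading run time for memory.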
Input arguments
- label
Paired-end reads
- type
data:reads:fastq:paired
- label
Star index
- type
data:index:star
- description
Genome index prepared by STAR aligner indexing tool.
- label
Adapters
- type
list:data:seq:nucleotide
- description
Provide a list of sequencing adapter files (.fasta) to be removed by BBDuk.
- required
False
- label
Annotation
- type
data:annotation
- label
Select the type of kit used for library preparation.
- type
basic:string
- choices
Strand-specific forward:
forward
Strand-specific reverse:
reverse
- label
Number of reads
- type
basic:integer
- default
1000000
- label
Seed
- type
basic:integer
- default
11
- label
Fraction
- type
basic:decimal
- description
Use the fraction of reads in range [0.0, 1.0] from the original input file instead of the absolute number of reads. If set, this will override the “Number of reads” input parameter.
- required
False
- label
2-pass mode
- type
basic:boolean
- description
Enable two-pass mode when down-sampling. Two-pass mode is twice as slow but uses much less memory.
- default
False
- label
Indexed rRNA reference sequence
- type
data:index:star
- description
Reference sequence index prepared by STAR aligner indexing tool.
- label
Indexed Globin reference sequence
- type
data:index:star
- description
Reference sequence index prepared by STAR aligner indexing tool.
Output results
BBDuk - STAR - FeatureCounts (3’ mRNA-Seq, single-end)¶
-
data:workflow:quant:featurecounts:single
workflow-bbduk-star-fc-quant-single
(data:reads:fastq:single reads, data:index:star star_index, list:data:seq:nucleotide adapters, data:annotation annotation, basic:string stranded, basic:integer n_reads, basic:integer seed, basic:decimal fraction, basic:boolean two_pass, data:index:star rrna_reference, data:index:star globin_reference)[Source: v2.0.1]
This 3’ mRNA-Seq pipeline comprises QC, preprocessing, alignment, and quantification steps. Reads are preprocessed by __BBDuk__, which removes adapters, trims reads for quality from the 3’-end, and discards reads that are too short after trimming. Preprocessed reads are aligned by the __STAR__ aligner. For read-count quantification, the __FeatureCounts__ tool is used. QC steps include downsampling, QoRTs QC analysis, and alignment of input reads to the rRNA/globin reference sequences. The reported alignment rate is used to assess the rRNA/globin sequence depletion rate.
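The alignment rate mentioned above is simply the share of input reads that align to a given reference. A minimal sketch, assuming the mapped and total read counts are already known (the function name and the example counts are hypothetical):

```python
def alignment_rate(mapped_reads, total_reads):
    """Fraction of reads that aligned to a reference such as rRNA or globin."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    if not 0 <= mapped_reads <= total_reads:
        raise ValueError("mapped_reads must be between 0 and total_reads")
    return mapped_reads / total_reads


# Hypothetical counts from a downsampled set of 1,000,000 reads.
for name, mapped in {"rRNA": 50_000, "globin": 2_000}.items():
    print(f"{name}: {alignment_rate(mapped, 1_000_000):.2%} of reads aligned")
```

A low rate against the rRNA or globin reference suggests that few such sequences remain in the library.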
Input arguments
- label
Input single-end reads
- type
data:reads:fastq:single
- label
Star index
- type
data:index:star
- description
Genome index prepared by STAR aligner indexing tool.
- label
Adapters
- type
list:data:seq:nucleotide
- description
Provide a list of sequencing adapter files (.fasta) to be removed by BBDuk.
- required
False
- label
Annotation
- type
data:annotation
- label
Select the type of kit used for library preparation.
- type
basic:string
- choices
Strand-specific forward:
forward
Strand-specific reverse:
reverse
- label
Number of reads
- type
basic:integer
- default
1000000
- label
Seed
- type
basic:integer
- default
11
- label
Fraction
- type
basic:decimal
- description
Use the fraction of reads in range [0.0, 1.0] from the original input file instead of the absolute number of reads. If set, this will override the “Number of reads” input parameter.
- required
False
- label
2-pass mode
- type
basic:boolean
- description
Enable two-pass mode when down-sampling. Two-pass mode is twice as slow but uses much less memory.
- default
False
- label
Indexed rRNA reference sequence
- type
data:index:star
- description
Reference sequence index prepared by STAR aligner indexing tool.
- label
Indexed Globin reference sequence
- type
data:index:star
- description
Reference sequence index prepared by STAR aligner indexing tool.
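The down-sampling inputs above interact in a specific way: Fraction, when set, overrides Number of reads, and Seed makes the selection reproducible. The following is a minimal Python sketch of that logic only. The helper names `resolve_sample_size` and `sample_indices` are hypothetical, and the workflow itself delegates sampling to an external tool, so this illustrates the parameter semantics rather than the actual implementation.

```python
import random


def resolve_sample_size(total_reads, n_reads=1_000_000, fraction=None):
    """Return how many reads to keep; a set fraction overrides n_reads."""
    if fraction is not None:
        if not 0.0 <= fraction <= 1.0:
            raise ValueError("fraction must lie in [0.0, 1.0]")
        return int(total_reads * fraction)
    # Never request more reads than the input contains.
    return min(n_reads, total_reads)


def sample_indices(total_reads, size, seed=11):
    """Pick `size` read indices; the same seed always yields the same subset."""
    rng = random.Random(seed)
    return sorted(rng.sample(range(total_reads), size))


# Default behaviour: keep 1,000,000 reads out of 5,000,000.
default_size = resolve_sample_size(5_000_000)
# With a fraction set, the "Number of reads" default is ignored.
fraction_size = resolve_sample_size(5_000_000, fraction=0.25)
```

Keeping the seed fixed (11 by default) means repeated runs on the same input select the same reads, which makes QC results comparable between runs.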
Output results
BBDuk - STAR - HTSeq-count (paired-end)¶
-
data:workflow:rnaseq:htseq:paired
workflow-bbduk-star-htseq-paired
(data:reads:fastq:paired reads, data:index:star star_index, list:data:seq:nucleotide adapters, data:annotation annotation, basic:string stranded)[Source: v2.0.1]
This RNA-seq pipeline is comprised of three steps, preprocessing, alignment, and quantification. First, reads are preprocessed by __BBDuk__ which removes adapters, trims reads for quality from the 3’-end, and discards reads that are too short after trimming. Compared to similar tools, BBDuk is regarded for its computational efficiency. Next, preprocessed reads are aligned by __STAR__ aligner. At the time of implementation, STAR is considered a state-of-the-art tool that consistently produces accurate results from diverse sets of reads, and performs well even with default settings. For more information see [this comparison of RNA-seq aligners](https://www.nature.com/articles/nmeth.4106). Finally, aligned reads are summarized to genes by __HTSeq-count__. Compared to featureCounts, HTSeq-count is not as computationally efficient. All three tools in this workflow support parallelization to accelerate the analysis.
Input arguments
- label
Paired-end reads
- type
data:reads:fastq:paired
- label
Star index
- type
data:index:star
- description
Genome index prepared by STAR aligner indexing tool.
- label
Adapters
- type
list:data:seq:nucleotide
- description
Provide a list of sequencing adapters files (.fasta) to be removed by BBDuk.
- required
False
- label
Annotation
- type
data:annotation
- label
Select the QuantSeq kit used for library preparation.
- type
basic:string
- choices
QuantSeq FWD:
yes
QuantSeq REV:
reverse
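As a sketch of how the `stranded` choice could reach the quantification step: the example below assumes the selected value (`yes` or `reverse`) is passed unchanged to the `--stranded` option of `htseq-count`. The choice values suggest this, but the listing above does not state it. The function name and file names are hypothetical.

```python
def htseq_count_command(bam, gtf, stranded):
    """Build an htseq-count argument list for a coordinate-sorted BAM file."""
    allowed = {"yes", "reverse"}  # values offered by the 'stranded' input
    if stranded not in allowed:
        raise ValueError(f"stranded must be one of {sorted(allowed)}")
    return [
        "htseq-count",
        "--format", "bam",
        "--order", "pos",
        "--stranded", stranded,
        bam,
        gtf,
    ]


cmd = htseq_count_command("aligned.bam", "annotation.gtf", "reverse")
print(" ".join(cmd))
```

Building the command as a list rather than a single string avoids shell-quoting problems if it is later handed to `subprocess.run`.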
Output results
BBDuk - STAR - HTSeq-count (single-end)¶
-
data:workflow:rnaseq:htseq:single
workflow-bbduk-star-htseq
(data:reads:fastq:single reads, data:index:star star_index, list:data:seq:nucleotide adapters, data:annotation annotation, basic:string stranded)[Source: v2.0.1]
This RNA-seq pipeline is comprised of three steps, preprocessing, alignment, and quantification. First, reads are preprocessed by __BBDuk__ which removes adapters, trims reads for quality from the 3’-end, and discards reads that are too short after trimming. Compared to similar tools, BBDuk is regarded for its computational efficiency. Next, preprocessed reads are aligned by __STAR__ aligner. At the time of implementation, STAR is considered a state-of-the-art tool that consistently produces accurate results from diverse sets of reads, and performs well even with default settings. For more information see [this comparison of RNA-seq aligners](https://www.nature.com/articles/nmeth.4106). Finally, aligned reads are summarized to genes by __HTSeq-count__. Compared to featureCounts, HTSeq-count is not as computationally efficient. All three tools in this workflow support parallelization to accelerate the analysis.
Input arguments
- label
Input single-end reads
- type
data:reads:fastq:single
- label
Star index
- type
data:index:star
- description
Genome index prepared by STAR aligner indexing tool.
- label
Adapters
- type
list:data:seq:nucleotide
- description
Provide a list of sequencing adapters files (.fasta) to be removed by BBDuk.
- required
False
- label
Annotation
- type
data:annotation
- label
Select the QuantSeq kit used for library preparation.
- type
basic:string
- choices
QuantSeq FWD:
yes
QuantSeq REV:
reverse
Output results
BBDuk - STAR - featureCounts - QC (paired-end)¶
-
data:workflow:rnaseq:featurecounts:qc
workflow-bbduk-star-featurecounts-qc-paired
(data:reads:fastq:paired reads, list:data:seq:nucleotide adapters, basic:boolean show_advanced, list:basic:string custom_adapter_sequences, basic:integer kmer_length, basic:integer min_k, basic:integer hamming_distance, basic:integer maxns, basic:integer trim_quality, basic:integer min_length, data:index:star genome, basic:boolean show_advanced, basic:boolean unstranded, basic:boolean noncannonical, basic:boolean chimeric, basic:integer chimSegmentMin, basic:boolean quantmode, basic:boolean singleend, basic:boolean gene_counts, basic:string outFilterType, basic:integer outFilterMultimapNmax, basic:integer outFilterMismatchNmax, basic:decimal outFilterMismatchNoverLmax, basic:integer outFilterScoreMin, basic:integer alignSJoverhangMin, basic:integer alignSJDBoverhangMin, basic:integer alignIntronMin, basic:integer alignIntronMax, basic:integer alignMatesGapMax, basic:string alignEndsType, basic:string outSAMunmapped, basic:string outSAMattributes, basic:string outSAMattrRGline, data:annotation annotation, basic:boolean show_advanced, basic:string assay_type, data:index:salmon cdna_index, basic:integer n_reads, basic:string feature_class, basic:string feature_type, basic:string id_attribute, basic:integer n_reads, basic:integer seed, basic:decimal fraction, basic:boolean two_pass, data:index:star rrna_reference, data:index:star globin_reference)[Source: v2.0.1]
This RNA-seq pipeline is comprised of three steps: preprocessing, alignment, and quantification. First, reads are preprocessed by __BBDuk__, which removes adapters, trims reads for quality from the 3’-end, and discards reads that are too short after trimming. Compared to similar tools, BBDuk is regarded for its computational efficiency. Next, preprocessed reads are aligned by the __STAR__ aligner. At the time of implementation, STAR is considered a state-of-the-art tool that consistently produces accurate results from diverse sets of reads, and performs well even with default settings. For more information see [this comparison of RNA-seq aligners](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5792058/). Finally, aligned reads are summarized to genes by __featureCounts__. Gaining wide adoption among the bioinformatics community, featureCounts yields expressions in a computationally efficient manner. All three tools in this workflow support parallelization to accelerate the analysis. The rRNA contamination rate in the sample is determined using the STAR aligner. Quality-trimmed reads are down-sampled (using the Seqtk tool) and aligned to the rRNA reference sequences. The alignment rate indicates the percentage of the reads in the sample that are derived from the rRNA sequences.
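The rRNA alignment rate described above reduces to a simple ratio. A minimal sketch follows, assuming the rate is defined as aligned reads divided by down-sampled input reads. The exact counts the workflow uses are not specified here, and the function name and example numbers are hypothetical.

```python
def alignment_rate(aligned_reads, input_reads):
    """Percentage of down-sampled reads that aligned to the reference."""
    if input_reads <= 0:
        raise ValueError("input_reads must be positive")
    return 100.0 * aligned_reads / input_reads


# Example: 1,000,000 down-sampled reads, 32,500 of which align to rRNA.
rate = alignment_rate(32_500, 1_000_000)
print(f"rRNA alignment rate: {rate:.2f}%")
```

A low rate suggests effective rRNA depletion; a high rate suggests that a large share of the library is of rRNA origin.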
Input arguments
- label
Reads
- type
data:reads:fastq:paired
- label
Adapters
- type
list:data:seq:nucleotide
- required
False
- label
Show advanced parameters
- type
basic:boolean
- default
False
- label
Custom adapter sequences [literal]
- type
list:basic:string
- description
Custom adapter sequences can be specified by inputting them one by one and pressing Enter after each sequence.
- required
False
- hidden
!preprocessing.show_advanced
- default
[]
- label
K-mer length
- type
basic:integer
- description
K-mer length must be smaller than or equal to the length of adapters.
- hidden
!preprocessing.show_advanced
- default
23
- label
Minimum k-mer length at right end of reads used for trimming
- type
basic:integer
- disabled
preprocessing.adapters.length === 0 && preprocessing.custom_adapter_sequences.length === 0
- hidden
!preprocessing.show_advanced
- default
11
- label
Maximum Hamming distance for k-mers
- type
basic:integer
- hidden
!preprocessing.show_advanced
- default
1
- label
Max Ns after trimming [maxns=-1]
- type
basic:integer
- description
If non-negative, reads with more Ns than this (after trimming) will be discarded.
- hidden
!preprocessing.show_advanced
- default
-1
- label
Quality below which to trim reads from the right end
- type
basic:integer
- description
Phred algorithm is used, which is more accurate than naive trimming.
- hidden
!preprocessing.show_advanced
- default
10
- label
Minimum read length
- type
basic:integer
- description
Reads shorter than minimum read length after trimming are discarded.
- hidden
!preprocessing.show_advanced
- default
20
- label
Indexed reference genome
- type
data:index:star
- description
Genome index prepared by STAR aligner indexing tool.
- label
Show advanced parameters
- type
basic:boolean
- default
False
- label
The data is unstranded
- type
basic:boolean
- description
For unstranded RNA-seq data, Cufflinks/Cuffdiff require spliced alignments with the XS strand attribute, which STAR will generate with the –outSAMstrandField intronMotif option. As required, the XS strand attribute will be generated for all alignments that contain splice junctions. The spliced alignments that have undefined strand (i.e. containing only non-canonical unannotated junctions) will be suppressed. If you have stranded RNA-seq data, you do not need to use any specific STAR options. Instead, you need to run Cufflinks with the –library-type option. For example, cufflinks –library-type fr-firststrand should be used for the standard dUTP protocol, including Illumina’s stranded Tru-Seq. This option has to be used only for Cufflinks runs and not for STAR runs.
- hidden
!alignment.show_advanced
- default
False
- label
Remove non-canonical junctions (Cufflinks compatibility)
- type
basic:boolean
- description
It is recommended to remove the non-canonical junctions for Cufflinks runs using –outFilterIntronMotifs RemoveNoncanonical.
- hidden
!alignment.show_advanced
- default
False
- label
Detect chimeric and circular alignments
- type
basic:boolean
- description
To switch on detection of chimeric (fusion) alignments (in addition to normal mapping), –chimSegmentMin should be set to a positive value. Each chimeric alignment consists of two “segments”. Each segment is non-chimeric on its own, but the segments are chimeric to each other (i.e. the segments belong to different chromosomes, or different strands, or are far from each other). Both segments may contain splice junctions, and one of the segments may contain portions of both mates. –chimSegmentMin parameter controls the minimum mapped length of the two segments that is allowed. For example, if you have 2x75 reads and used –chimSegmentMin 20, a chimeric alignment with 130b on one chromosome and 20b on the other will be output, while 135 + 15 won’t be.
- default
False
- label
–chimSegmentMin
- type
basic:integer
- disabled
detect_chimeric.chimeric != true
- default
20
- label
Output in transcript coordinates
- type
basic:boolean
- description
With –quantMode TranscriptomeSAM option STAR will output alignments translated into transcript coordinates in the Aligned.toTranscriptome.out.bam file (in addition to alignments in genomic coordinates in Aligned.*.sam/bam files). These transcriptomic alignments can be used with various transcript quantification software that require reads to be mapped to transcriptome, such as RSEM or eXpress.
- default
False
- label
Allow soft-clipping and indels
- type
basic:boolean
- description
By default, the output satisfies RSEM requirements: soft-clipping or indels are not allowed. Use –quantTranscriptomeBan Singleend to allow insertions, deletions and soft-clips in the transcriptomic alignments, which can be used by some expression quantification software (e.g. eXpress).
- disabled
t_coordinates.quantmode != true
- default
False
- label
Count reads
- type
basic:boolean
- description
With the –quantMode GeneCounts option, STAR will count the number of reads per gene while mapping. A read is counted if it overlaps (1nt or more) one and only one gene. Both ends of the paired-end read are checked for overlaps. The counts coincide with those produced by htseq-count with default parameters. The ReadsPerGene.out.tab file has 4 columns which correspond to different strandedness options: column 1: gene ID; column 2: counts for unstranded RNA-seq; column 3: counts for the 1st read strand aligned with RNA (htseq-count option -s yes); column 4: counts for the 2nd read strand aligned with RNA (htseq-count option -s reverse).
- disabled
t_coordinates.quantmode != true
- default
False
- label
Type of filtering
- type
basic:string
- description
Normal: standard filtering using only current alignment; BySJout: keep only those reads that contain junctions that passed filtering into SJ.out.tab
- default
Normal
- choices
Normal:
Normal
BySJout:
BySJout
- label
–outFilterMultimapNmax
- type
basic:integer
- description
Read alignments will be output only if the read maps to no more loci than this value; otherwise no alignments will be output (default: 10).
- required
False
- label
–outFilterMismatchNmax
- type
basic:integer
- description
Alignment will be output only if it has no more mismatches than this value (default: 10).
- required
False
- label
–outFilterMismatchNoverLmax
- type
basic:decimal
- description
Maximum number of mismatches per pair relative to read length: for 2x100b, the maximum number of mismatches is 0.04*200=8 for the paired read.
- required
False
- label
–outFilterScoreMin
- type
basic:integer
- description
Alignment will be output only if its score is higher than or equal to this value (default: 0).
- required
False
- label
–alignSJoverhangMin
- type
basic:integer
- description
Minimum overhang (i.e. block size) for spliced alignments (default: 5).
- required
False
- label
–alignSJDBoverhangMin
- type
basic:integer
- description
Minimum overhang (i.e. block size) for annotated (sjdb) spliced alignments (default: 3).
- required
False
- label
–alignIntronMin
- type
basic:integer
- description
Minimum intron size: genomic gap is considered intron if its length >= alignIntronMin, otherwise it is considered Deletion (default: 21).
- required
False
- label
–alignIntronMax
- type
basic:integer
- description
Maximum intron size; if 0, max intron size will be determined by (2^winBinNbits)*winAnchorDistNbins (default: 0).
- required
False
- label
–alignMatesGapMax
- type
basic:integer
- description
Maximum gap between two mates; if 0, max intron gap will be determined by (2^winBinNbits)*winAnchorDistNbins (default: 0).
- required
False
- label
–alignEndsType
- type
basic:string
- description
Type of read ends alignment (default: Local).
- required
False
- default
Local
- choices
Local:
Local
EndToEnd:
EndToEnd
Extend5pOfRead1:
Extend5pOfRead1
Extend5pOfReads12:
Extend5pOfReads12
- label
–outSAMunmapped
- type
basic:string
- description
Output of unmapped reads in the SAM format.
- required
False
- default
None
- choices
None:
None
Within:
Within
- label
–outSAMattributes
- type
basic:string
- description
A string of desired SAM attributes, in the order desired for the output SAM.
- required
False
- default
Standard
- choices
None:
None
Standard:
Standard
All:
All
- label
–outSAMattrRGline
- type
basic:string
- description
SAM/BAM read group line. The first word contains the read group identifier and must start with “ID:”, e.g. –outSAMattrRGline ID:xxx CN:yy “DS:z z z”
- required
False
- label
Annotation
- type
data:annotation
- label
Show advanced parameters
- type
basic:boolean
- default
False
- label
Assay type
- type
basic:string
- description
In strand non-specific assay a read is considered overlapping with a feature regardless of whether it is mapped to the same or the opposite strand as the feature. In strand-specific forward assay and single reads, the read has to be mapped to the same strand as the feature. For paired-end reads, the first read has to be on the same strand and the second read on the opposite strand. In strand-specific reverse assay these rules are reversed.
- hidden
!quantification.show_advanced
- default
non_specific
- choices
Strand non-specific:
non_specific
Strand-specific forward:
forward
Strand-specific reverse:
reverse
Detect automatically:
auto
- label
cDNA index file
- type
data:index:salmon
- description
Transcriptome index file created using the Salmon indexing tool. cDNA (transcriptome) sequences used for index file creation must be derived from the same species as the input sequencing reads to obtain reliable analysis results.
- required
False
- hidden
quantification.assay_type != 'auto'
- label
Number of reads in subsampled alignment file
- type
basic:integer
- description
Alignment (.bam) file subsample size. Increase the number of reads to make automatic detection more reliable. Decrease the number of reads to make automatic detection run faster.
- hidden
quantification.assay_type != 'auto'
- default
5000000
- label
Feature class
- type
basic:string
- description
Feature class (3rd column in GTF/GFF3 file) to be used. All other features will be ignored.
- hidden
!quantification.show_advanced
- default
exon
- label
Feature type
- type
basic:string
- description
The type of feature the quantification program summarizes over (e.g. gene or transcript-level analysis). The value of this parameter needs to be chosen in line with ‘ID attribute’ below.
- hidden
!quantification.show_advanced
- default
gene
- choices
gene:
gene
transcript:
transcript
- label
ID attribute
- type
basic:string
- description
GTF/GFF3 attribute to be used as feature ID. Several GTF/GFF3 lines with the same feature ID are considered as parts of the same feature. The feature ID is used to identify the counts in the output table. In GTF files this is usually ‘gene_id’, in GFF3 files this is often ‘ID’, and ‘transcript_id’ is frequently a valid choice for both annotation formats.
- hidden
!quantification.show_advanced
- default
gene_id
- choices
gene_id:
gene_id
transcript_id:
transcript_id
ID:
ID
geneid:
geneid
- label
Number of reads
- type
basic:integer
- default
1000000
- label
Seed
- type
basic:integer
- default
11
- label
Fraction
- type
basic:decimal
- description
Use the fraction of reads in range [0.0, 1.0] from the original input file instead of the absolute number of reads. If set, this will override the “Number of reads” input parameter.
- required
False
- label
2-pass mode
- type
basic:boolean
- description
Enable two-pass mode when down-sampling. Two-pass mode is twice as slow but uses much less memory.
- default
False
- label
Indexed rRNA reference sequence
- type
data:index:star
- description
Reference sequence index prepared by STAR aligner indexing tool.
- label
Indexed Globin reference sequence
- type
data:index:star
- description
Reference sequence index prepared by STAR aligner indexing tool.
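Several of the STAR parameter descriptions above contain arithmetic that is easy to check by hand. The sketch below reproduces three of them in Python. The function names are hypothetical. The values `winBinNbits=16` and `winAnchorDistNbins=9` are STAR's documented defaults rather than inputs of this workflow, and the chimeric check applies only the minimum-segment-length rule, not the other conditions STAR evaluates.

```python
def max_mismatches(ratio, read_length, mates=2):
    """Mismatch ceiling implied by a mismatch-to-read-length ratio."""
    # A small epsilon guards against floating-point results such as 7.999...
    return int(ratio * read_length * mates + 1e-9)


def passes_chim_segment_min(segment_lengths, chim_segment_min=20):
    """True if every chimeric segment is at least chim_segment_min bases long."""
    return all(length >= chim_segment_min for length in segment_lengths)


def default_max_intron(win_bin_nbits=16, win_anchor_dist_nbins=9):
    """Limit used when alignIntronMax is 0: (2^winBinNbits) * winAnchorDistNbins."""
    return (2 ** win_bin_nbits) * win_anchor_dist_nbins


# A ratio of 0.04 on 2x100 reads allows 8 mismatches for the pair.
pair_limit = max_mismatches(0.04, 100)

# 2x75 reads with a minimum segment length of 20:
kept = passes_chim_segment_min((130, 20))      # both segments long enough
dropped = passes_chim_segment_min((135, 15))   # 15 is below the minimum

# With STAR's default window settings the intron limit is 589824 bases.
intron_limit = default_max_intron()
```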
Output results
BBDuk - STAR - featureCounts - QC (single-end)¶
-
data:workflow:rnaseq:featurecounts:qc
workflow-bbduk-star-featurecounts-qc-single
(data:reads:fastq:single reads, list:data:seq:nucleotide adapters, basic:boolean show_advanced, list:basic:string custom_adapter_sequences, basic:integer kmer_length, basic:integer min_k, basic:integer hamming_distance, basic:integer maxns, basic:integer trim_quality, basic:integer min_length, data:index:star genome, basic:boolean show_advanced, basic:boolean unstranded, basic:boolean noncannonical, basic:boolean chimeric, basic:integer chimSegmentMin, basic:boolean quantmode, basic:boolean singleend, basic:boolean gene_counts, basic:string outFilterType, basic:integer outFilterMultimapNmax, basic:integer outFilterMismatchNmax, basic:decimal outFilterMismatchNoverLmax, basic:integer outFilterScoreMin, basic:integer alignSJoverhangMin, basic:integer alignSJDBoverhangMin, basic:integer alignIntronMin, basic:integer alignIntronMax, basic:integer alignMatesGapMax, basic:string alignEndsType, basic:string outSAMunmapped, basic:string outSAMattributes, basic:string outSAMattrRGline, data:annotation annotation, basic:boolean show_advanced, basic:string assay_type, data:index:salmon cdna_index, basic:integer n_reads, basic:string feature_class, basic:string feature_type, basic:string id_attribute, basic:integer n_reads, basic:integer seed, basic:decimal fraction, basic:boolean two_pass, data:index:star rrna_reference, data:index:star globin_reference)[Source: v2.0.1]
This RNA-seq pipeline is comprised of three steps: preprocessing, alignment, and quantification. First, reads are preprocessed by __BBDuk__, which removes adapters, trims reads for quality from the 3’-end, and discards reads that are too short after trimming. Compared to similar tools, BBDuk is regarded for its computational efficiency. Next, preprocessed reads are aligned by the __STAR__ aligner. At the time of implementation, STAR is considered a state-of-the-art tool that consistently produces accurate results from diverse sets of reads, and performs well even with default settings. For more information see [this comparison of RNA-seq aligners](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5792058/). Finally, aligned reads are summarized to genes by __featureCounts__. Gaining wide adoption among the bioinformatics community, featureCounts yields expressions in a computationally efficient manner. All three tools in this workflow support parallelization to accelerate the analysis. The rRNA contamination rate in the sample is determined using the STAR aligner. Quality-trimmed reads are down-sampled (using the Seqtk tool) and aligned to the rRNA reference sequences. The alignment rate indicates the percentage of the reads in the sample that are derived from the rRNA sequences.
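To illustrate the preprocessing step, the sketch below assembles a BBDuk argument list from the default values given in the input arguments of this workflow. The mapping from workflow inputs to BBDuk options (`k`, `mink`, `hdist`, `maxns`, `trimq`, `minlength`) is an assumption inferred from the input labels, not taken from the workflow source. The function name and file names are placeholders.

```python
def bbduk_args(reads_in, reads_out, adapters=None, kmer_length=23, min_k=11,
               hamming_distance=1, maxns=-1, trim_quality=10, min_length=20):
    """Assemble a bbduk.sh argument list (assumed option mapping)."""
    args = ["bbduk.sh", f"in={reads_in}", f"out={reads_out}"]
    if adapters:
        # Adapter trimming from the right (3') end using k-mer matching.
        args += [
            f"ref={','.join(adapters)}",
            "ktrim=r",
            f"k={kmer_length}",
            f"mink={min_k}",
            f"hdist={hamming_distance}",
        ]
    # Quality trimming from the right end, then length and N-content filters.
    args += [
        "qtrim=r",
        f"trimq={trim_quality}",
        f"minlength={min_length}",
        f"maxns={maxns}",
    ]
    return args


with_adapters = bbduk_args("reads.fastq.gz", "trimmed.fastq.gz", adapters=["adapters.fasta"])
without_adapters = bbduk_args("reads.fastq.gz", "trimmed.fastq.gz")
```

The k-mer options are only added when adapters are supplied, which mirrors the way the minimum k-mer length input is disabled when no adapter sequences are given.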
Input arguments
- label
Reads
- type
data:reads:fastq:single
- label
Adapters
- type
list:data:seq:nucleotide
- required
False
- label
Show advanced parameters
- type
basic:boolean
- default
False
- label
Custom adapter sequences [literal]
- type
list:basic:string
- description
Custom adapter sequences can be specified by inputting them one by one and pressing Enter after each sequence.
- required
False
- hidden
!preprocessing.show_advanced
- default
[]
- label
K-mer length
- type
basic:integer
- description
K-mer length must be smaller than or equal to the length of adapters.
- hidden
!preprocessing.show_advanced
- default
23
- label
Minimum k-mer length at right end of reads used for trimming
- type
basic:integer
- disabled
preprocessing.adapters.length === 0 && preprocessing.custom_adapter_sequences.length === 0
- hidden
!preprocessing.show_advanced
- default
11
- label
Maximum Hamming distance for k-mers
- type
basic:integer
- hidden
!preprocessing.show_advanced
- default
1
- label
Max Ns after trimming [maxns=-1]
- type
basic:integer
- description
If non-negative, reads with more Ns than this (after trimming) will be discarded.
- hidden
!preprocessing.show_advanced
- default
-1
- label
Quality below which to trim reads from the right end
- type
basic:integer
- description
Phred algorithm is used, which is more accurate than naive trimming.
- hidden
!preprocessing.show_advanced
- default
10
- label
Minimum read length
- type
basic:integer
- description
Reads shorter than minimum read length after trimming are discarded.
- hidden
!preprocessing.show_advanced
- default
20
- label
Indexed reference genome
- type
data:index:star
- description
Genome index prepared by STAR aligner indexing tool.
- label
Show advanced parameters
- type
basic:boolean
- default
False
- label
The data is unstranded
- type
basic:boolean
- description
For unstranded RNA-seq data, Cufflinks/Cuffdiff require spliced alignments with the XS strand attribute, which STAR will generate with the –outSAMstrandField intronMotif option. As required, the XS strand attribute will be generated for all alignments that contain splice junctions. The spliced alignments that have undefined strand (i.e. containing only non-canonical unannotated junctions) will be suppressed. If you have stranded RNA-seq data, you do not need to use any specific STAR options. Instead, you need to run Cufflinks with the –library-type option. For example, cufflinks –library-type fr-firststrand should be used for the standard dUTP protocol, including Illumina’s stranded Tru-Seq. This option has to be used only for Cufflinks runs and not for STAR runs.
- hidden
!alignment.show_advanced
- default
False
- label
Remove non-canonical junctions (Cufflinks compatibility)
- type
basic:boolean
- description
It is recommended to remove the non-canonical junctions for Cufflinks runs using –outFilterIntronMotifs RemoveNoncanonical.
- hidden
!alignment.show_advanced
- default
False
- label
Detect chimeric and circular alignments
- type
basic:boolean
- description
To switch on detection of chimeric (fusion) alignments (in addition to normal mapping), –chimSegmentMin should be set to a positive value. Each chimeric alignment consists of two “segments”. Each segment is non-chimeric on its own, but the segments are chimeric to each other (i.e. the segments belong to different chromosomes, or different strands, or are far from each other). Both segments may contain splice junctions, and one of the segments may contain portions of both mates. –chimSegmentMin parameter controls the minimum mapped length of the two segments that is allowed. For example, if you have 2x75 reads and used –chimSegmentMin 20, a chimeric alignment with 130b on one chromosome and 20b on the other will be output, while 135 + 15 won’t be.
- default
False
- label
–chimSegmentMin
- type
basic:integer
- disabled
detect_chimeric.chimeric != true
- default
20
- label
Output in transcript coordinates
- type
basic:boolean
- description
With –quantMode TranscriptomeSAM option STAR will output alignments translated into transcript coordinates in the Aligned.toTranscriptome.out.bam file (in addition to alignments in genomic coordinates in Aligned.*.sam/bam files). These transcriptomic alignments can be used with various transcript quantification software that require reads to be mapped to transcriptome, such as RSEM or eXpress.
- default
False
- label
Allow soft-clipping and indels
- type
basic:boolean
- description
By default, the output satisfies RSEM requirements: soft-clipping or indels are not allowed. Use –quantTranscriptomeBan Singleend to allow insertions, deletions and soft-clips in the transcriptomic alignments, which can be used by some expression quantification software (e.g. eXpress).
- disabled
t_coordinates.quantmode != true
- default
False
- label
Count reads
- type
basic:boolean
- description
With the –quantMode GeneCounts option, STAR will count the number of reads per gene while mapping. A read is counted if it overlaps (1nt or more) one and only one gene. Both ends of the paired-end read are checked for overlaps. The counts coincide with those produced by htseq-count with default parameters. The ReadsPerGene.out.tab file has 4 columns which correspond to different strandedness options: column 1: gene ID; column 2: counts for unstranded RNA-seq; column 3: counts for the 1st read strand aligned with RNA (htseq-count option -s yes); column 4: counts for the 2nd read strand aligned with RNA (htseq-count option -s reverse).
- disabled
t_coordinates.quantmode != true
- default
False
- label
Type of filtering
- type
basic:string
- description
Normal: standard filtering using only current alignment; BySJout: keep only those reads that contain junctions that passed filtering into SJ.out.tab
- default
Normal
- choices
Normal:
Normal
BySJout:
BySJout
- label
–outFilterMultimapNmax
- type
basic:integer
- description
Read alignments will be output only if the read maps to no more loci than this value; otherwise no alignments will be output (default: 10).
- required
False
- label
–outFilterMismatchNmax
- type
basic:integer
- description
Alignment will be output only if it has no more mismatches than this value (default: 10).
- required
False
- label
–outFilterMismatchNoverLmax
- type
basic:decimal
- description
Maximum number of mismatches per pair relative to read length: for 2x100b, the maximum number of mismatches is 0.04*200=8 for the paired read.
- required
False
- label
–outFilterScoreMin
- type
basic:integer
- description
Alignment will be output only if its score is higher than or equal to this value (default: 0).
- required
False
- label
–alignSJoverhangMin
- type
basic:integer
- description
Minimum overhang (i.e. block size) for spliced alignments (default: 5).
- required
False
- label
–alignSJDBoverhangMin
- type
basic:integer
- description
Minimum overhang (i.e. block size) for annotated (sjdb) spliced alignments (default: 3).
- required
False
- label
–alignIntronMin
- type
basic:integer
- description
Minimum intron size: genomic gap is considered intron if its length >= alignIntronMin, otherwise it is considered Deletion (default: 21).
- required
False
- label
–alignIntronMax
- type
basic:integer
- description
Maximum intron size; if 0, max intron size will be determined by (2^winBinNbits)*winAnchorDistNbins (default: 0).
- required
False
- label
–alignMatesGapMax
- type
basic:integer
- description
Maximum gap between two mates; if 0, max intron gap will be determined by (2^winBinNbits)*winAnchorDistNbins (default: 0).
- required
False
- label
–alignEndsType
- type
basic:string
- description
Type of read ends alignment (default: Local).
- required
False
- default
Local
- choices
Local:
Local
EndToEnd:
EndToEnd
Extend5pOfRead1:
Extend5pOfRead1
Extend5pOfReads12:
Extend5pOfReads12
- label
–outSAMunmapped
- type
basic:string
- description
Output of unmapped reads in the SAM format.
- required
False
- default
None
- choices
None:
None
Within:
Within
- label
–outSAMattributes
- type
basic:string
- description
A string of desired SAM attributes, in the order desired for the output SAM.
- required
False
- default
Standard
- choices
None:
None
Standard:
Standard
All:
All
- label
–outSAMattrRGline
- type
basic:string
- description
SAM/BAM read group line. The first word contains the read group identifier and must start with “ID:”, e.g. –outSAMattrRGline ID:xxx CN:yy “DS:z z z”
- required
False
- label
Annotation
- type
data:annotation
- label
Show advanced parameters
- type
basic:boolean
- default
False
- label
Assay type
- type
basic:string
- description
In strand non-specific assay a read is considered overlapping with a feature regardless of whether it is mapped to the same or the opposite strand as the feature. In strand-specific forward assay and single reads, the read has to be mapped to the same strand as the feature. For paired-end reads, the first read has to be on the same strand and the second read on the opposite strand. In strand-specific reverse assay these rules are reversed.
- hidden
!quantification.show_advanced
- default
non_specific
- choices
Strand non-specific:
non_specific
Strand-specific forward:
forward
Strand-specific reverse:
reverse
Detect automatically:
auto
- label
cDNA index file
- type
data:index:salmon
- description
Transcriptome index file created using the Salmon indexing tool. cDNA (transcriptome) sequences used for index file creation must be derived from the same species as the input sequencing reads to obtain reliable analysis results.
- required
False
- hidden
quantification.assay_type != 'auto'
- label
Number of reads in subsampled alignment file
- type
basic:integer
- description
Alignment (.bam) file subsample size. Increase the number of reads to make automatic detection more reliable. Decrease the number of reads to make automatic detection run faster.
- hidden
quantification.assay_type != 'auto'
- default
5000000
- label
Feature class
- type
basic:string
- description
Feature class (3rd column in GTF/GFF3 file) to be used. All other features will be ignored.
- hidden
!quantification.show_advanced
- default
exon
- label
Feature type
- type
basic:string
- description
The type of feature the quantification program summarizes over (e.g. gene or transcript-level analysis). The value of this parameter needs to be chosen in line with ‘ID attribute’ below.
- hidden
!quantification.show_advanced
- default
gene
- choices
gene:
gene
transcript:
transcript
- label
ID attribute
- type
basic:string
- description
GTF/GFF3 attribute to be used as feature ID. Several GTF/GFF3 lines with the same feature ID will be considered as parts of the same feature. The feature ID is used to identify the counts in the output table. In GTF files this is usually ‘gene_id’, in GFF3 files this is often ‘ID’, and ‘transcript_id’ is frequently a valid choice for both annotation formats.
- hidden
!quantification.show_advanced
- default
gene_id
- choices
gene_id:
gene_id
transcript_id:
transcript_id
ID:
ID
geneid:
geneid
- label
Number of reads
- type
basic:integer
- default
1000000
- label
Seed
- type
basic:integer
- default
11
- label
Fraction
- type
basic:decimal
- description
Use the fraction of reads [0 - 1.0] from the original input file instead of the absolute number of reads. If set, this will override the “Number of reads” input parameter.
- required
False
- label
2-pass mode
- type
basic:boolean
- description
Enable two-pass mode when down-sampling. Two-pass mode is twice as slow but uses much less memory.
- default
False
- label
Indexed rRNA reference sequence
- type
data:index:star
- description
Reference sequence index prepared by STAR aligner indexing tool.
- label
Indexed Globin reference sequence
- type
data:index:star
- description
Reference sequence index prepared by STAR aligner indexing tool.
Output results
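The strandedness rules described for the assay type input above can be sketched as a small Python function. This is only an illustration of the stated rules, not the workflow's implementation; the function name `read_matches_feature` and its arguments are hypothetical, and the `auto` choice is deliberately not handled here.

```python
def read_matches_feature(assay_type, read_strand, feature_strand, is_second_mate=False):
    """Return True if the read strand is compatible with the feature strand.

    assay_type: 'non_specific', 'forward' or 'reverse'.
    read_strand, feature_strand: '+' or '-'.
    is_second_mate: True for the second read of a paired-end fragment.
    """
    if assay_type == "non_specific":
        # Strand is ignored entirely.
        return True
    if assay_type not in ("forward", "reverse"):
        raise ValueError(f"Unsupported assay type: {assay_type!r}")

    same_strand = read_strand == feature_strand
    # Forward assay: single reads and first mates must be on the feature
    # strand, second mates on the opposite strand.
    expected_same = not is_second_mate
    if assay_type == "reverse":
        # Reverse assay: the rules are flipped.
        expected_same = not expected_same
    return same_strand == expected_same
```

For example, in a forward assay a second mate on the `-` strand is compatible with a feature on the `+` strand, while a first mate on the `-` strand is not.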
BBDuk - Salmon - QC (paired-end)¶
-
data:workflow:rnaseq:salmon
workflow-bbduk-salmon-qc-paired
(data:reads:fastq:paired reads, data:index:salmon salmon_index, data:index:star genome, data:annotation annotation, data:index:star rrna_reference, data:index:star globin_reference, basic:boolean show_advanced, list:data:seq:nucleotide adapters, list:basic:string custom_adapter_sequences, basic:integer kmer_length, basic:integer min_k, basic:integer hamming_distance, basic:integer maxns, basic:integer trim_quality, basic:integer min_length, basic:boolean seq_bias, basic:boolean gc_bias, basic:decimal consensus_slack, basic:decimal min_score_fraction, basic:integer range_factorization_bins, basic:integer min_assigned_frag, basic:integer n_reads, basic:integer seed, basic:decimal fraction, basic:boolean two_pass)[Source: v3.0.1]
Alignment-free RNA-seq pipeline. The Salmon tool and tximport package are used in the quantification step to produce gene-level abundance estimates. The rRNA and globin-sequence contamination rate in the sample is determined using the STAR aligner. Quality-trimmed reads are down-sampled (using the Seqtk tool) and aligned to the genome, rRNA and globin reference sequences. The rRNA and globin-sequence alignment rates indicate the percentage of the reads in the sample that are of rRNA and globin origin, respectively. Alignment of down-sampled data to a whole genome reference sequence is used to produce an alignment file suitable for Samtools and QoRTs QC analysis. Per-sample analysis results and QC data are summarized by the MultiQC tool.
Input arguments
- label
Select sample(s)
- type
data:reads:fastq:paired
- label
Salmon index
- type
data:index:salmon
- label
Indexed reference genome
- type
data:index:star
- description
Genome index prepared by STAR aligner indexing tool.
- label
Annotation
- type
data:annotation
- label
Indexed rRNA reference sequence
- type
data:index:star
- description
Reference sequence index prepared by STAR aligner indexing tool.
- label
Indexed Globin reference sequence
- type
data:index:star
- description
Reference sequence index prepared by STAR aligner indexing tool.
- label
Show advanced parameters
- type
basic:boolean
- default
False
- label
Adapters
- type
list:data:seq:nucleotide
- required
False
- label
Custom adapter sequences [literal]
- type
list:basic:string
- description
Custom adapter sequences can be specified by inputting them one by one and pressing Enter after each sequence.
- required
False
- default
[]
- label
K-mer length
- type
basic:integer
- description
K-mer length must be smaller than or equal to the length of the adapters.
- default
23
- label
Minimum k-mer length at right end of reads used for trimming
- type
basic:integer
- disabled
preprocessing.adapters.length === 0 && preprocessing.custom_adapter_sequences.length === 0
- default
11
- label
Maximum Hamming distance for k-mers
- type
basic:integer
- default
1
- label
Max Ns after trimming [maxns=-1]
- type
basic:integer
- description
If non-negative, reads with more Ns than this (after trimming) will be discarded.
- default
-1
- label
Quality below which to trim reads from the right end
- type
basic:integer
- description
Phred algorithm is used, which is more accurate than naive trimming.
- default
10
- label
Minimum read length
- type
basic:integer
- description
Reads shorter than minimum read length after trimming are discarded.
- default
20
- label
Perform sequence-specific bias correction
- type
basic:boolean
- default
True
- label
Perform fragment GC bias correction.
- type
basic:boolean
- default
True
- label
Consensus slack
- type
basic:decimal
- description
The amount of slack allowed in the quasi-mapping consensus mechanism. Normally, a transcript must cover all hits to be considered for mapping. If this is set to a fraction, X, greater than 0 (and in [0,1)), then a transcript can fail to cover up to (100 * X)% of the hits before it is discounted as a mapping candidate. The default value of this option is 0.2 in selective alignment mode and 0 otherwise.
- required
False
- label
Minimum alignment score fraction
- type
basic:decimal
- description
The fraction of the optimal possible alignment score that a mapping must achieve in order to be considered valid - should be in (0,1].
- default
0.65
- label
Range factorization bins
- type
basic:integer
- description
Factorizes the likelihood used in quantification by adopting a new notion of equivalence classes based on the conditional probabilities with which fragments are generated from different transcripts. This is a more fine-grained factorization than the normal rich equivalence classes. The default value (4) corresponds to the default used in Zakeri et al. 2017 and larger values imply a more fine-grained factorization. If range factorization is enabled, a common value to select for this parameter is 4. A value of 0 signifies the use of basic rich equivalence classes.
- default
4
- label
Minimum number of assigned fragments
- type
basic:integer
- description
The minimum number of fragments that must be assigned to the transcriptome for quantification to proceed.
- default
10
- label
Number of reads
- type
basic:integer
- default
10000000
- label
Seed
- type
basic:integer
- default
11
- label
Fraction
- type
basic:decimal
- description
Use the fraction of reads [0 - 1.0] from the original input file instead of the absolute number of reads. If set, this will override the “Number of reads” input parameter.
- required
False
- label
2-pass mode
- type
basic:boolean
- description
Enable two-pass mode when down-sampling. Two-pass mode is twice as slow but uses much less memory.
- default
False
Output results
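The interplay of the down-sampling inputs listed above (Number of reads, Seed, Fraction and 2-pass mode) can be sketched as follows. The `-s` and `-2` options belong to `seqtk sample`, but how the workflow actually assembles its command line is an assumption, and the helper name `seqtk_sample_args` is hypothetical.

```python
def seqtk_sample_args(fastq, n_reads=10_000_000, seed=11, fraction=None, two_pass=False):
    """Build an argument list for `seqtk sample` (illustrative sketch only)."""
    if fraction is not None:
        if not 0 <= fraction <= 1.0:
            raise ValueError("fraction must be in the range [0, 1.0]")
        # When set, the fraction overrides the absolute number of reads.
        amount = str(fraction)
    else:
        amount = str(n_reads)

    args = ["seqtk", "sample", "-s", str(seed)]
    if two_pass:
        # Two-pass mode trades speed for lower memory use.
        args.append("-2")
    args += [fastq, amount]
    return args
```

Using the same seed for both mates of a paired-end sample is what keeps the pairs in sync when each file is sampled separately.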
BBDuk - Salmon - QC (single-end)¶
-
data:workflow:rnaseq:salmon
workflow-bbduk-salmon-qc-single
(data:reads:fastq:single reads, data:index:salmon salmon_index, data:index:star genome, data:annotation annotation, data:index:star rrna_reference, data:index:star globin_reference, basic:boolean show_advanced, list:data:seq:nucleotide adapters, list:basic:string custom_adapter_sequences, basic:integer kmer_length, basic:integer min_k, basic:integer hamming_distance, basic:integer maxns, basic:integer trim_quality, basic:integer min_length, basic:boolean seq_bias, basic:boolean gc_bias, basic:decimal consensus_slack, basic:decimal min_score_fraction, basic:integer range_factorization_bins, basic:integer min_assigned_frag, basic:integer n_reads, basic:integer seed, basic:decimal fraction, basic:boolean two_pass)[Source: v3.0.1]
Alignment-free RNA-seq pipeline. The Salmon tool and tximport package are used in the quantification step to produce gene-level abundance estimates. The rRNA and globin-sequence contamination rate in the sample is determined using the STAR aligner. Quality-trimmed reads are down-sampled (using the Seqtk tool) and aligned to the genome, rRNA and globin reference sequences. The rRNA and globin-sequence alignment rates indicate the percentage of the reads in the sample that are of rRNA and globin origin, respectively. Alignment of down-sampled data to a whole genome reference sequence is used to produce an alignment file suitable for Samtools and QoRTs QC analysis. Per-sample analysis results and QC data are summarized by the MultiQC tool.
Input arguments
- label
Select sample(s)
- type
data:reads:fastq:single
- label
Salmon index
- type
data:index:salmon
- label
Indexed reference genome
- type
data:index:star
- description
Genome index prepared by STAR aligner indexing tool.
- label
Annotation
- type
data:annotation
- label
Indexed rRNA reference sequence
- type
data:index:star
- description
Reference sequence index prepared by STAR aligner indexing tool.
- label
Indexed Globin reference sequence
- type
data:index:star
- description
Reference sequence index prepared by STAR aligner indexing tool.
- label
Show advanced parameters
- type
basic:boolean
- default
False
- label
Adapters
- type
list:data:seq:nucleotide
- required
False
- label
Custom adapter sequences [literal]
- type
list:basic:string
- description
Custom adapter sequences can be specified by inputting them one by one and pressing Enter after each sequence.
- required
False
- default
[]
- label
K-mer length
- type
basic:integer
- description
K-mer length must be smaller than or equal to the length of the adapters.
- default
23
- label
Minimum k-mer length at right end of reads used for trimming
- type
basic:integer
- disabled
preprocessing.adapters.length === 0 && preprocessing.custom_adapter_sequences.length === 0
- default
11
- label
Maximum Hamming distance for k-mers
- type
basic:integer
- default
1
- label
Max Ns after trimming [maxns=-1]
- type
basic:integer
- description
If non-negative, reads with more Ns than this (after trimming) will be discarded.
- default
-1
- label
Quality below which to trim reads from the right end
- type
basic:integer
- description
Phred algorithm is used, which is more accurate than naive trimming.
- default
10
- label
Minimum read length
- type
basic:integer
- description
Reads shorter than minimum read length after trimming are discarded.
- default
20
- label
Perform sequence-specific bias correction
- type
basic:boolean
- default
True
- label
Perform fragment GC bias correction.
- type
basic:boolean
- default
False
- label
Consensus slack
- type
basic:decimal
- description
The amount of slack allowed in the quasi-mapping consensus mechanism. Normally, a transcript must cover all hits to be considered for mapping. If this is set to a fraction, X, greater than 0 (and in [0,1)), then a transcript can fail to cover up to (100 * X)% of the hits before it is discounted as a mapping candidate. The default value of this option is 0.2 in selective alignment mode and 0 otherwise.
- required
False
- label
Minimum alignment score fraction
- type
basic:decimal
- description
The fraction of the optimal possible alignment score that a mapping must achieve in order to be considered valid - should be in (0,1].
- default
0.65
- label
Range factorization bins
- type
basic:integer
- description
Factorizes the likelihood used in quantification by adopting a new notion of equivalence classes based on the conditional probabilities with which fragments are generated from different transcripts. This is a more fine-grained factorization than the normal rich equivalence classes. The default value (4) corresponds to the default used in Zakeri et al. 2017 and larger values imply a more fine-grained factorization. If range factorization is enabled, a common value to select for this parameter is 4. A value of 0 signifies the use of basic rich equivalence classes.
- default
4
- label
Minimum number of assigned fragments
- type
basic:integer
- description
The minimum number of fragments that must be assigned to the transcriptome for quantification to proceed.
- default
10
- label
Number of reads
- type
basic:integer
- default
10000000
- label
Seed
- type
basic:integer
- default
11
- label
Fraction
- type
basic:decimal
- description
Use the fraction of reads [0 - 1.0] from the original input file instead of the absolute number of reads. If set, this will override the “Number of reads” input parameter.
- required
False
- label
2-pass mode
- type
basic:boolean
- description
Enable two-pass mode when down-sampling. Two-pass mode is twice as slow but uses much less memory.
- default
False
Output results
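The two adapter-related constraints in the inputs above (the k-mer length may not exceed the adapter length, and the minimum k-mer length only applies when at least one adapter is supplied) can be checked with a short sketch. The helper name `check_kmer_settings` is hypothetical, and it assumes that all adapter sequences, uploaded or custom, have already been collected into one list of strings.

```python
def check_kmer_settings(adapter_sequences, kmer_length=23):
    """Return (min_k_enabled, too_short) for a list of adapter sequences.

    min_k_enabled mirrors the "disabled" condition of the minimum k-mer
    length input: it is only meaningful when adapters are present.
    too_short lists adapters shorter than the requested k-mer length.
    """
    min_k_enabled = len(adapter_sequences) > 0
    too_short = [seq for seq in adapter_sequences if len(seq) < kmer_length]
    return min_k_enabled, too_short
```

With the default k-mer length of 23, a 4-base custom sequence such as `ACGT` would be reported as too short.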
BED file¶
-
data:bed
upload-bed
(basic:file src, basic:string species, basic:string build)[Source: v1.4.1]
Import a BED file (.bed) which is a tab-delimited text file that defines a feature track. It can have any file extension, but .bed is recommended. The BED file format is described on the [UCSC Genome Bioinformatics web site](http://genome.ucsc.edu/FAQ/FAQformat#format1).
Input arguments
- label
BED file
- type
basic:file
- description
Upload BED file annotation track. The first three required BED fields are chrom, chromStart and chromEnd.
- required
True
- validate_regex
\.(bed|narrowPeak)$
- label
Species
- type
basic:string
- description
Species Latin name.
- choices
Homo sapiens:
Homo sapiens
Mus musculus:
Mus musculus
Rattus norvegicus:
Rattus norvegicus
Dictyostelium discoideum:
Dictyostelium discoideum
Odocoileus virginianus texanus:
Odocoileus virginianus texanus
Solanum tuberosum:
Solanum tuberosum
- label
Genome build
- type
basic:string
Output results
- label
BED file
- type
basic:file
- label
Bgzip bed file for JBrowse
- type
basic:file
- label
Bed file index for JBrowse
- type
basic:file
- label
Species
- type
basic:string
- label
Build
- type
basic:string
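As an illustration of the `validate_regex` pattern and the three required BED fields mentioned above, the following sketch checks a file name and parses one BED line. Whether the platform applies the pattern with a search or a full match is an assumption; the helper names are hypothetical.

```python
import re

# Pattern copied from the validate_regex field of the BED file input.
BED_NAME = re.compile(r"\.(bed|narrowPeak)$")


def is_accepted_name(filename):
    """Return True if the file name ends in .bed or .narrowPeak."""
    return BED_NAME.search(filename) is not None


def parse_bed_line(line):
    """Return (chrom, chromStart, chromEnd) from a tab-delimited BED line."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 3:
        raise ValueError("A BED line needs at least chrom, chromStart and chromEnd")
    return fields[0], int(fields[1]), int(fields[2])
```

Note that under this pattern a compressed name such as `regions.bed.gz` is not accepted.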
BEDPE file¶
-
data:bedpe:
upload-bedpe
(basic:file src, basic:string species, basic:string build)[Source: v1.2.1]
Upload BEDPE files.
Input arguments
- label
Select BEDPE file to upload
- type
basic:file
- required
True
- hidden
False
- label
Species
- type
basic:string
- required
True
- hidden
False
- choices
Homo sapiens:
Homo sapiens
Mus musculus:
Mus musculus
Rattus norvegicus:
Rattus norvegicus
Dictyostelium discoideum:
Dictyostelium discoideum
Odocoileus virginianus texanus:
Odocoileus virginianus texanus
Solanum tuberosum:
Solanum tuberosum
- label
Build
- type
basic:string
- required
True
- hidden
False
Output results
- label
BEDPE file
- type
basic:file
- required
True
- hidden
False
- label
Species
- type
basic:string
- required
True
- hidden
False
- label
Build
- type
basic:string
- required
True
- hidden
False
BWA ALN¶
-
data:alignment:bam:bwaaln
alignment-bwa-aln
(data:index:bwa genome, data:reads:fastq reads, basic:integer q, basic:boolean use_edit, basic:integer edit_value, basic:decimal fraction, basic:boolean seeds, basic:integer seed_length, basic:integer seed_dist)[Source: v2.3.1]
Read aligner for mapping low-divergent sequences against a large reference genome. Designed for Illumina sequence reads up to 100bp.
Input arguments
- label
Reference genome
- type
data:index:bwa
- label
Reads
- type
data:reads:fastq
- label
Quality threshold
- type
basic:integer
- description
Parameter for dynamic read trimming.
- default
0
- label
Use maximum edit distance (excludes fraction of missing alignments)
- type
basic:boolean
- default
False
- label
Maximum edit distance
- type
basic:integer
- hidden
!use_edit
- default
5
- label
Fraction of missing alignments
- type
basic:decimal
- description
The fraction of missing alignments given 2% uniform base error rate. The maximum edit distance is automatically chosen for different read lengths.
- hidden
use_edit
- default
0.04
- label
Use seeds
- type
basic:boolean
- default
False
- label
Seed length
- type
basic:integer
- description
Take the first X bases of the read as the seed. If X is larger than the query sequence, seeding will be disabled. For long reads, this option typically ranges from 25 to 35 when the seed maximum edit distance is 2.
- hidden
!seeds
- default
35
- label
Seed maximum edit distance
- type
basic:integer
- hidden
!seeds
- default
2
Output results
- label
Alignment file
- type
basic:file
- description
Position sorted alignment
- label
Index BAI
- type
basic:file
- label
Unmapped reads
- type
basic:file
- required
False
- label
Statistics
- type
basic:file
- label
BigWig file
- type
basic:file
- required
False
- label
Species
- type
basic:string
- label
Build
- type
basic:string
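The hidden conditions above imply that the maximum edit distance and the fraction of missing alignments are alternatives, and that the seed options only apply when seeding is enabled. A minimal sketch of that logic follows. The flags `-q`, `-n`, `-l` and `-k` are `bwa aln` options, but the mapping from the process inputs to these flags is an assumption, and the helper name `bwa_aln_options` is hypothetical.

```python
def bwa_aln_options(q=0, use_edit=False, edit_value=5, fraction=0.04,
                    seeds=False, seed_length=35, seed_dist=2):
    """Build `bwa aln` options from the process inputs (illustrative sketch)."""
    opts = ["-q", str(q)]
    # -n accepts either an integer edit distance or a float fraction of
    # missing alignments, so only one of the two inputs is passed on.
    opts += ["-n", str(edit_value if use_edit else fraction)]
    if seeds:
        # Seed length and seed maximum edit distance are only used together.
        opts += ["-l", str(seed_length), "-k", str(seed_dist)]
    return opts
```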
BWA MEM¶
-
data:alignment:bam:bwamem
alignment-bwa-mem
(data:index:bwa genome, data:reads:fastq reads, basic:integer seed_l, basic:integer band_w, basic:decimal re_seeding, basic:boolean m, basic:integer match, basic:integer missmatch, basic:integer gap_o, basic:integer gap_e, basic:integer clipping, basic:integer unpaired_p, basic:boolean report_all, basic:integer report_tr)[Source: v3.3.2]
BWA MEM is a read aligner for mapping low-divergent sequences against a large reference genome. Designed for longer sequences ranging from 70bp to 1Mbp. The algorithm works by seeding alignments with maximal exact matches (MEMs) and then extending seeds with the affine-gap Smith-Waterman algorithm (SW). See [here](http://bio-bwa.sourceforge.net/) for more information.
Input arguments
- label
Reference genome
- type
data:index:bwa
- label
Reads
- type
data:reads:fastq
- label
Minimum seed length
- type
basic:integer
- description
Minimum seed length. Matches shorter than minimum seed length will be missed. The alignment speed is usually insensitive to this value unless it significantly deviates from 20.
- default
19
- label
Band width
- type
basic:integer
- description
Gaps longer than this will not be found.
- default
100
- label
Re-seeding factor
- type
basic:decimal
- description
Trigger re-seeding for a MEM longer than minSeedLen*FACTOR. This is a key heuristic parameter for tuning the performance. Larger value yields fewer seeds, which leads to faster alignment speed but lower accuracy.
- default
1.5
- label
Mark shorter split hits as secondary
- type
basic:boolean
- description
Mark shorter split hits as secondary (for Picard compatibility)
- default
False
- label
Score of a match
- type
basic:integer
- default
1
- label
Mismatch penalty
- type
basic:integer
- default
4
- label
Gap open penalty
- type
basic:integer
- default
6
- label
Gap extension penalty
- type
basic:integer
- default
1
- label
Clipping penalty
- type
basic:integer
- description
Clipping is applied if the best score reaching the end of the query is not larger than (best Smith-Waterman score) - (Clipping penalty).
- default
5
- label
Penalty for an unpaired read pair
- type
basic:integer
- description
Affinity to force pair. Score: scoreRead1+scoreRead2-Penalty
- default
9
- label
Report all found alignments
- type
basic:boolean
- description
Output all found alignments for single-end or unpaired paired-end reads. These alignments will be flagged as secondary alignments.
- default
False
- label
Report threshold score
- type
basic:integer
- description
Do not output alignments with a score lower than the defined number. This option only affects output.
- default
30
Output results
- label
Alignment file
- type
basic:file
- description
Position sorted alignment
- label
Index BAI
- type
basic:file
- label
Unmapped reads
- type
basic:file
- required
False
- label
Statistics
- type
basic:file
- label
BigWig file
- type
basic:file
- required
False
- label
Species
- type
basic:string
- label
Build
- type
basic:string
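Three of the numeric rules described in the inputs above lend themselves to a short worked example: the re-seeding threshold (minSeedLen*FACTOR), the score of an unpaired read pair (scoreRead1+scoreRead2-Penalty), and the report threshold. The function names are hypothetical and the code only restates the documented arithmetic; it is not BWA's implementation.

```python
def reseed_threshold(seed_l=19, re_seeding=1.5):
    """MEMs longer than this length trigger re-seeding."""
    return seed_l * re_seeding


def unpaired_pair_score(score_read1, score_read2, unpaired_p=9):
    """Score assigned to a read pair when it is treated as unpaired."""
    return score_read1 + score_read2 - unpaired_p


def is_reported(score, report_tr=30):
    """Alignments scoring below the report threshold are not output."""
    return score >= report_tr
```

With the defaults, re-seeding is triggered for MEMs longer than 28.5 bases, and two reads scoring 60 and 55 receive an unpaired score of 106.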
BWA SW¶
-
data:alignment:bam:bwasw
alignment-bwa-sw
(data:index:bwa genome, data:reads:fastq reads, basic:integer match, basic:integer missmatch, basic:integer gap_o, basic:integer gap_e)[Source: v2.3.1]
Read aligner for mapping low-divergent sequences against a large reference genome. Designed for longer sequences ranging from 70bp to 1Mbp. The paired-end mode only works for reads from Illumina short-insert libraries.
Input arguments
- label
Reference genome
- type
data:index:bwa
- label
Reads
- type
data:reads:fastq
- label
Score of a match
- type
basic:integer
- default
1
- label
Mismatch penalty
- type
basic:integer
- default
3
- label
Gap open penalty
- type
basic:integer
- default
5
- label
Gap extension penalty
- type
basic:integer
- default
2
Output results
- label
Alignment file
- type
basic:file
- description
Position sorted alignment
- label
Index BAI
- type
basic:file
- label
Unmapped reads
- type
basic:file
- required
False
- label
Statistics
- type
basic:file
- label
BigWig file
- type
basic:file
- required
False
- label
Species
- type
basic:string
- label
Build
- type
basic:string
BWA genome index¶
-
data:index:bwa:
bwa-index
(data:seq:nucleotide ref_seq)[Source: v1.1.1]
Create BWA genome index.
Input arguments
- label
Reference sequence (nucleotide FASTA)
- type
data:seq:nucleotide
- required
True
- hidden
False
Output results
- label
BWA index
- type
basic:dir
- required
True
- hidden
False
- label
FASTA file (compressed)
- type
basic:file
- required
True
- hidden
False
- label
FASTA file
- type
basic:file
- required
True
- hidden
False
- label
FASTA file index
- type
basic:file
- required
True
- hidden
False
- label
Species
- type
basic:string
- required
True
- hidden
False
- label
Build
- type
basic:string
- required
True
- hidden
False
Bam split¶
-
data:alignment:bam:primary
bam-split
(data:alignment:bam bam, data:sam:header header, data:sam:header header2)[Source: v0.6.1]
Split hybrid bam file into two bam files.
Input arguments
- label
Hybrid alignment bam
- type
data:alignment:bam
- label
Primary header sam file (optional)
- type
data:sam:header
- description
If no header file is provided, the headers will be extracted from the hybrid alignment bam file.
- required
False
- label
Secondary header sam file (optional)
- type
data:sam:header
- description
If no header file is provided, the headers will be extracted from the hybrid alignment bam file.
- required
False
Output results
- label
Uploaded file
- type
basic:file
- label
Index BAI
- type
basic:file
- label
BigWig file
- type
basic:file
- required
False
- label
Species
- type
basic:string
- label
Build
- type
basic:string
Bamclipper¶
-
data:alignment:bam:bamclipped:
bamclipper
(data:alignment:bam alignment, data:bedpe bedpe, basic:boolean skip)[Source: v1.2.1]
Remove primer sequence from BAM alignments by soft-clipping. This process is a wrapper for bamclipper which can be found at https://github.com/tommyau/bamclipper.
Input arguments
- label
Alignment BAM file
- type
data:alignment:bam
- required
True
- hidden
False
- label
BEDPE file
- type
data:bedpe
- required
False
- hidden
False
- label
Skip Bamclipper step
- type
basic:boolean
- description
Use this option to skip Bamclipper step.
- required
True
- hidden
False
- default
False
Output results
- label
Clipped BAM file
- type
basic:file
- required
True
- hidden
False
- label
Index of clipped BAM file
- type
basic:file
- required
True
- hidden
False
- label
Alignment statistics
- type
basic:file
- required
True
- hidden
False
- label
BigWig file
- type
basic:file
- required
False
- hidden
False
- label
Species
- type
basic:string
- required
True
- hidden
False
- label
Build
- type
basic:string
- required
True
- hidden
False
Bamliquidator¶
-
data:bam:plot:bamliquidator
bamliquidator
(basic:string analysis_type, list:data:alignment:bam bam, basic:string cell_type, basic:integer bin_size, data:annotation:gtf regions_gtf, data:bed regions_bed, basic:integer extension, basic:string sense, basic:boolean skip_plot, list:basic:string black_list, basic:integer threads)[Source: v0.3.1]
Set of tools for analyzing the density of short DNA sequence read alignments in the BAM file format.
Input arguments
- label
Analysis type
- type
basic:string
- default
bin
- choices
Bin mode:
bin
Region mode:
region
BED mode:
bed
- label
BAM File
- type
list:data:alignment:bam
- label
Cell type
- type
basic:string
- default
cell_type
- label
Bin size
- type
basic:integer
- description
Number of base pairs in each bin. The smaller the bin size the longer the runtime and the larger the data files. Default is 100000.
- required
False
- hidden
analysis_type != 'bin'
- label
Region gff file / Annotation file (.gff|.gtf)
- type
data:annotation:gtf
- required
False
- hidden
analysis_type != 'region'
- label
Region bed file / Annotation file (.bed)
- type
data:bed
- required
False
- hidden
analysis_type != 'bed'
- label
Extension
- type
basic:integer
- description
Extends reads by number of bp
- default
200
- label
Mapping strand to gff file
- type
basic:string
- default
.
- choices
Forward:
+
Reverse:
-
Both:
.
- label
Skip plot
- type
basic:boolean
- required
False
- label
Black list
- type
list:basic:string
- description
One or more chromosome patterns to skip during bin liquidation. Default is to skip any chromosomes that contain any of the following substrings chrUn _random Zv9_ _hap.
- required
False
- label
Threads
- type
basic:integer
- description
Number of threads to run concurrently during liquidation.
- default
1
Output results
- label
Analysis type
- type
basic:string
- hidden
True
- label
Output directory
- type
basic:file
- label
Counts HDF5 file
- type
basic:file
- label
Matrix file
- type
basic:file
- required
False
- hidden
analysis_type != 'region'
- label
Summary file
- type
basic:file:html
- required
False
- hidden
analysis_type != 'bin'
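The default black list behaviour described in the inputs above (skip any chromosome whose name contains one of the listed substrings) can be sketched as a simple filter. This only illustrates the documented rule; the helper name `keep_chromosomes` is hypothetical and the example chromosome names are made up for illustration.

```python
# Default substrings quoted in the Black list description.
DEFAULT_BLACK_LIST = ("chrUn", "_random", "Zv9_", "_hap")


def keep_chromosomes(chromosomes, black_list=DEFAULT_BLACK_LIST):
    """Return the chromosomes that contain none of the black-listed substrings."""
    return [
        chrom for chrom in chromosomes
        if not any(pattern in chrom for pattern in black_list)
    ]
```

For instance, from `chr1`, `chrUn_gl000220`, `chr4_random`, `chr6_apd_hap1` and `chrX`, only `chr1` and `chrX` would be kept.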
Bamplot¶
-
data:bam:plot:bamplot
bamplot
(basic:string genome, data:annotation:gtf input_gff, basic:string input_region, list:data:alignment:bam bam, basic:integer stretch_input, basic:string color, basic:string sense, basic:integer extension, basic:boolean rpm, basic:string yscale, list:basic:string names, basic:string plot, basic:string title, basic:string scale, list:data:bed bed, basic:boolean multi_page)[Source: v1.4.1]
Plot a single locus from a bam.
Input arguments
- label
Genome
- type
basic:string
- choices
HG19:
HG19
HG18:
HG18
MM8:
MM8
MM9:
MM9
MM10:
MM10
RN6:
RN6
RN4:
RN4
- label
Region string
- type
data:annotation:gtf
- description
Enter .gff file.
- required
False
- label
Region string
- type
basic:string
- description
Enter genomic region e.g. chr1:+:1-1000.
- required
False
- label
Bam
- type
list:data:alignment:bam
- description
bam to plot from
- required
False
- label
Stretch-input
- type
basic:integer
- description
Stretch the input regions to a minimum length in bp, e.g. 10000 (for 10kb).
- required
False
- label
Color
- type
basic:string
- description
Enter a colon separated list of colors e.g. 255,0,0:255,125,0, default samples the rainbow.
- default
255,0,0:255,125,0
- label
Sense
- type
basic:string
- description
Map to forward, reverse or both strands. Default maps to both.
- default
both
- choices
Forward:
forward
Reverse:
reverse
Both:
both
- label
Extension
- type
basic:integer
- description
Extends reads by n bp. Default value is 200bp.
- default
200
- label
rpm
- type
basic:boolean
- description
Normalizes density to reads per million (rpm). Default is False.
- required
False
- label
y scale
- type
basic:string
- description
Choose either relative or uniform y axis scaling. Default is relative scaling.
- default
relative
- choices
relative:
relative
uniform:
uniform
- label
Names
- type
list:basic:string
- description
Enter a comma separated list of names for your bams.
- required
False
- label
Single or multiple plot
- type
basic:string
- description
Choose either all lines on a single plot or multiple plots.
- default
merge
- choices
single:
single
multiple:
multiple
merge:
merge
- label
Title
- type
basic:string
- description
Specify a title for the output plot(s), default will be the coordinate region.
- default
output
- label
Scale
- type
basic:string
- description
Enter a comma separated list of multiplicative scaling factors for your bams. Default is none.
- required
False
- label
Bed
- type
list:data:bed
- description
Add a space-delimited list of bed files to plot.
- required
False
- label
Multi page
- type
basic:boolean
- description
If flagged will create a new pdf for each region.
- default
False
Output results
- label
region plot
- type
basic:file
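The region string and color list formats given in the inputs above (`chr1:+:1-1000` and `255,0,0:255,125,0`) can be unpacked as shown in this sketch. It only illustrates the two string formats; the helper names are hypothetical and bamplot's own parsing may differ.

```python
def parse_region(region):
    """Split 'chrom:strand:start-end' into (chrom, strand, start, end)."""
    chrom, strand, span = region.split(":")
    start, end = (int(value) for value in span.split("-"))
    return chrom, strand, start, end


def parse_colors(colors):
    """Split a colon-separated list of 'R,G,B' triples into integer tuples."""
    return [
        tuple(int(channel) for channel in rgb.split(","))
        for rgb in colors.split(":")
    ]
```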
BaseQualityScoreRecalibrator¶
-
data:alignment:bam:bqsr:
bqsr
(data:alignment:bam bam, data:seq:nucleotide reference, list:data:variants:vcf known_sites, data:bed intervals, basic:string read_group, basic:string validation_stringency)[Source: v2.1.1]
A two-pass process of BaseRecalibrator and ApplyBQSR from GATK. See the GATK website for more information on BaseRecalibrator. It is possible to modify the read group using GATK’s AddOrReplaceReadGroups through the Replace read groups in BAM (``read_group``) input field.
Input arguments
- label
BAM file containing reads
- type
data:alignment:bam
- required
True
- hidden
False
- label
Reference genome file
- type
data:seq:nucleotide
- required
True
- hidden
False
- label
List of known sites of variation
- type
list:data:variants:vcf
- required
True
- hidden
False
- label
One or more genomic intervals over which to operate.
- type
data:bed
- description
This field is optional, but it can speed up the process by restricting calculations to specific genome regions.
- required
False
- hidden
False
- label
Replace read groups in BAM
- type
basic:string
- description
Replace read groups in a BAM file. This argument enables the user to replace all read groups in the INPUT file with a single new read group and assign all reads to this read group in the OUTPUT BAM file. Addition or replacement is performed using Picard’s AddOrReplaceReadGroups tool. Input should take the form of -name=value delimited by a “;”, e.g. “-ID=1;-LB=GENIALIS;-PL=ILLUMINA;-PU=BARCODE;-SM=SAMPLENAME1”. See the tool’s documentation for more information on tag names. Note that PL, LB, PU and SM are required fields. See caveats of rewriting read groups in the documentation.
- required
True
- hidden
False
- default
- label
Validation stringency
- type
basic:string
- description
Validation stringency for all SAM files read by this program. Setting stringency to SILENT can improve performance when processing a BAM file in which variable-length data (read, qualities, tags) do not otherwise need to be decoded. Default is STRICT. This setting is used in BaseRecalibrator and ApplyBQSR processes.
- required
True
- hidden
False
- default
STRICT
- choices
STRICT:
STRICT
LENIENT:
LENIENT
SILENT:
SILENT
Output results
- label
Base quality score recalibrated BAM file
- type
basic:file
- required
True
- hidden
False
- label
Index of base quality score recalibrated BAM file
- type
basic:file
- required
True
- hidden
False
- label
Alignment statistics
- type
basic:file
- required
True
- hidden
False
- label
BigWig file
- type
basic:file
- required
False
- hidden
False
- label
Species
- type
basic:string
- required
True
- hidden
False
- label
Build
- type
basic:string
- required
True
- hidden
False
- label
Recalibration table
- type
basic:file
- required
True
- hidden
False
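The `read_group` string format described in the inputs above (`-name=value` entries delimited by `;`, with PL, LB, PU and SM required) can be validated with a sketch like the one below. The function name `parse_read_group` is hypothetical, and the process itself may validate the input differently.

```python
# Tags the description lists as required.
REQUIRED_TAGS = {"PL", "LB", "PU", "SM"}


def parse_read_group(value):
    """Parse '-ID=1;-LB=...;...' into a dict and check the required tags."""
    tags = {}
    for item in value.split(";"):
        if not item:
            continue  # tolerate a trailing delimiter
        name, _, tag_value = item.partition("=")
        if not name.startswith("-") or not tag_value:
            raise ValueError(f"Malformed read group entry: {item!r}")
        tags[name[1:]] = tag_value

    missing = REQUIRED_TAGS - tags.keys()
    if missing:
        raise ValueError("Missing required tags: " + ", ".join(sorted(missing)))
    return tags
```

The example string from the description parses into five tags (ID, LB, PL, PU and SM), while a string containing only `-ID=1` is rejected.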
BaseSpace file¶
-
data:file
basespace-file-import
(basic:string file_id, basic:secret access_token_secret)[Source: v1.2.1]
Import a file from Illumina BaseSpace.
Input arguments
- label
BaseSpace file ID
- type
basic:string
- label
BaseSpace access token
- type
basic:secret
- description
BaseSpace access token secret handle needed to download the file.
Output results
- label
File
- type
basic:file
Bedtools bamtobed¶
-
data:bedpe:
bedtools-bamtobed
(data:alignment:bam alignment)[Source: v1.1.1]
Takes in a BAM file and calculates a normalization factor in BEDPE format. The BAM file is sorted with Samtools and then transformed with Bedtools.
Input arguments
- label
Alignment BAM file
- type
data:alignment:bam
- required
True
- hidden
False
Output results
- label
BEDPE file
- type
basic:file
- required
True
- hidden
False
- label
Species
- type
basic:string
- required
True
- hidden
False
- label
Build
- type
basic:string
- required
True
- hidden
False
Bisulfite conversion rate¶
-
data:wgbs:bsrate:
bs-conversion-rate
(data:alignment:bam:walt mr, basic:boolean skip, data:seq:nucleotide sequence, basic:boolean count_all, basic:integer read_length, basic:decimal max_mismatch, basic:boolean a_rich)[Source: v1.1.1]
Estimate bisulfite conversion rate in a control set. The program bsrate included in [Methpipe](https://github.com/smithlabcode/methpipe) will estimate the bisulfite conversion rate.
Input arguments
- label
Aligned reads from bisulfite sequencing
- type
data:alignment:bam:walt
- description
Bisulfite-specific alignment such as WALT is required because the .mr file type is used. Duplicates should be removed to reduce any bias introduced by incomplete conversion on PCR duplicate reads.
- required
True
- hidden
False
- label
Skip Bisulfite conversion rate step
- type
basic:boolean
- description
Bisulfite conversion rate step can be skipped.
- required
True
- hidden
False
- default
False
- label
Unmethylated control sequence
- type
data:seq:nucleotide
- description
A separate unmethylated control sequence FASTA file is required to estimate the bisulfite conversion rate.
- required
False
- hidden
False
- label
Count all cytosines including CpGs
- type
basic:boolean
- required
True
- hidden
False
- default
True
- label
Average read length
- type
basic:integer
- required
True
- hidden
False
- default
150
- label
Maximum fraction of mismatches
- type
basic:decimal
- required
False
- hidden
False
- label
Reads are A-rich
- type
basic:boolean
- required
True
- hidden
False
- default
False
Output results
- label
Bisulfite conversion rate report
- type
basic:file
- required
True
- hidden
False
Bowtie (Dicty)¶
-
data:alignment:bam:bowtie1
alignment-bowtie
(data:index:bowtie genome, data:reads:fastq reads, basic:string mode, basic:integer m, basic:integer l, basic:boolean use_se, basic:integer trim_5, basic:integer trim_3, basic:integer trim_nucl, basic:integer trim_iter, basic:string r)[Source: v2.3.1]
An ultrafast memory-efficient short read aligner.
Input arguments
- label
Reference genome
- type
data:index:bowtie
- label
Reads
- type
data:reads:fastq
- label
Alignment mode
- type
basic:string
- description
When the -n option is specified (which is the default), bowtie determines which alignments are valid according to the following policy, which is similar to Maq’s default policy. 1. Alignments may have no more than N mismatches (where N is a number 0-3, set with -n) in the first L bases (where L is a number 5 or greater, set with -l) on the high-quality (left) end of the read. The first L bases are called the “seed”. 2. The sum of the Phred quality values at all mismatched positions (not just in the seed) may not exceed E (set with -e). Where qualities are unavailable (e.g. if the reads are from a FASTA file), the Phred quality defaults to 40. In -v mode, alignments may have no more than V mismatches, where V may be a number from 0 through 3 set using the -v option. Quality values are ignored. The -v option is mutually exclusive with the -n option.
- default
-n
- choices
Use qualities (-n):
-n
Use mismatches (-v):
-v
- label
Allowed mismatches
- type
basic:integer
- description
When used with “Use qualities (-n)” it is the maximum number of mismatches permitted in the “seed”, i.e. the first L base pairs of the read (where L is set with -l/--seedlen). This may be 0, 1, 2 or 3 and the default is 2. When used with “Use mismatches (-v)” report alignments with at most <int> mismatches.
- default
2
- label
Seed length (for -n only)
- type
basic:integer
- description
Only for “Use qualities (-n)”. Seed length (-l) is the number of bases on the high-quality end of the read to which the -n ceiling applies. The lowest permitted setting is 5 and the default is 28. bowtie is faster for larger values of -l.
- default
28
- label
Map as single-ended (for paired end reads only)
- type
basic:boolean
- description
If this option is selected paired-end reads will be mapped as single-ended.
- default
False
- label
Bases to trim from 5’
- type
basic:integer
- description
Number of bases to trim from the 5’ (left) end of each read before alignment.
- default
0
- label
Bases to trim from 3’
- type
basic:integer
- description
Number of bases to trim from the 3’ (right) end of each read before alignment.
- default
0
- label
Bases to trim
- type
basic:integer
- description
Number of bases to trim from 3’ end in each iteration.
- default
2
- label
Iterations
- type
basic:integer
- description
Number of iterations.
- default
0
- label
Reporting mode
- type
basic:string
- description
Report up to <int> valid alignments per read or pair (-k) (default: 1). Validity of alignments is determined by the alignment policy (combined effects of -n, -v, -l, and -e). If more than one valid alignment exists and the --best and --strata options are specified, then only those alignments belonging to the best alignment “stratum” will be reported. Bowtie is designed to be very fast for small -k but bowtie can become significantly slower as -k increases. If you would like to use Bowtie for larger values of -k, consider building an index with a denser suffix-array sample, i.e. specify a smaller -o/--offrate when invoking bowtie-build for the relevant index (see the Performance tuning section for details).
- default
-a -m 1 --best --strata
- choices
Report unique alignments:
-a -m 1 --best --strata
Report all alignments:
-a --best
Report all alignments in the best stratum:
-a --best --strata
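The two alignment policies described for the Alignment mode, Allowed mismatches and Seed length inputs can be sketched in Python. This is a minimal illustration of the stated rules only, not Bowtie's implementation; the function names, the 0-based offset convention and the value of 70 used for E in the usage note are assumptions made for the example.

```python
def is_valid_n_mode(mismatch_positions, phred_quals, n, seed_len, e):
    """Check the -n policy for one ungapped alignment.

    mismatch_positions: 0-based read offsets, counted from the
        high-quality (left) end, where read and reference differ.
    phred_quals: per-base Phred qualities, or None when qualities are
        unavailable (each mismatch then counts as Phred 40).
    n, seed_len, e: stand-ins for the -n, -l and -e settings.
    """
    # Rule 1: at most n mismatches inside the seed (first seed_len bases).
    seed_mismatches = sum(1 for pos in mismatch_positions if pos < seed_len)
    if seed_mismatches > n:
        return False
    # Rule 2: summed quality over ALL mismatched positions may not exceed e.
    quality_sum = sum(
        40 if phred_quals is None else phred_quals[pos]
        for pos in mismatch_positions
    )
    return quality_sum <= e


def is_valid_v_mode(mismatch_positions, v):
    """Check the -v policy: at most v mismatches, qualities ignored."""
    return len(mismatch_positions) <= v
```

For a 36-base read with Phred 30 at every position, `is_valid_n_mode([3, 30], [30] * 36, n=2, seed_len=28, e=70)` passes both rules (one seed mismatch, quality sum 60), while adding a third mismatch at offset 33 raises the quality sum to 90 and fails the second rule.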
Output results
- label
Alignment file
- type
basic:file
- description
Position sorted alignment
- label
Index BAI
- type
basic:file
- label
Unmapped reads
- type
basic:file
- required
False
- label
Statistics
- type
basic:file
- label
BigWig file
- type
basic:file
- required
False
- label
Species
- type
basic:string
- label
Build
- type
basic:string
Bowtie genome index¶
-
data:index:bowtie:
bowtie-index
(data:seq:nucleotide ref_seq)[Source: v1.1.1]
Create Bowtie genome index.
Input arguments
- label
Reference sequence (nucleotide FASTA)
- type
data:seq:nucleotide
- required
True
- hidden
False
Output results
- label
Bowtie index
- type
basic:dir
- required
True
- hidden
False
- label
FASTA file (compressed)
- type
basic:file
- required
True
- hidden
False
- label
FASTA file
- type
basic:file
- required
True
- hidden
False
- label
FASTA file index
- type
basic:file
- required
True
- hidden
False
- label
Species
- type
basic:string
- required
True
- hidden
False
- label
Build
- type
basic:string
- required
True
- hidden
False
Bowtie2¶
-
data:alignment:bam:bowtie2
alignment-bowtie2
(data:index:bowtie2 genome, data:reads:fastq reads, basic:string mode, basic:string speed, basic:boolean use_se, basic:boolean discordantly, basic:boolean rep_se, basic:integer minins, basic:integer maxins, basic:boolean no_overlap, basic:boolean dovetail, basic:integer N, basic:integer L, basic:integer gbar, basic:string mp, basic:string rdg, basic:string rfg, basic:string score_min, basic:integer trim_5, basic:integer trim_3, basic:integer trim_iter, basic:integer trim_nucl, basic:string rep_mode, basic:integer k_reports, basic:boolean no_unal, basic:integer bw_binsize, basic:integer bw_timeout)[Source: v2.5.1]
Bowtie is an ultrafast, memory-efficient short read aligner. It aligns short DNA sequences (reads) to the human genome at a rate of over 25 million 35-bp reads per hour. Bowtie indexes the genome with a Burrows-Wheeler index to keep its memory footprint small: typically about 2.2 GB for the human genome (2.9 GB for paired-end). See [here](http://bowtie-bio.sourceforge.net/index.shtml) for more information.
Input arguments
- label
Reference genome
- type
data:index:bowtie2
- label
Reads
- type
data:reads:fastq
- label
Alignment mode
- type
basic:string
- description
End to end: Bowtie 2 requires that the entire read align from one end to the other, without any trimming (or “soft clipping”) of characters from either end. Local: Bowtie 2 does not require that the entire read align from one end to the other. Rather, some characters may be omitted (“soft clipped”) from the ends in order to achieve the greatest possible alignment score.
- default
--end-to-end
- choices
end to end mode:
--end-to-end
local:
--local
- label
Speed vs. Sensitivity
- type
basic:string
- description
A quick setting for aligning fast or accurately. This option is a shortcut for the following parameters. For --end-to-end: --very-fast is -D 5 -R 1 -N 0 -L 22 -i S,0,2.50; --fast is -D 10 -R 2 -N 0 -L 22 -i S,0,2.50; --sensitive is -D 15 -R 2 -N 0 -L 22 -i S,1,1.15 (default); --very-sensitive is -D 20 -R 3 -N 0 -L 20 -i S,1,0.50. For --local: --very-fast-local is -D 5 -R 1 -N 0 -L 25 -i S,1,2.00; --fast-local is -D 10 -R 2 -N 0 -L 22 -i S,1,1.75; --sensitive-local is -D 15 -R 2 -N 0 -L 20 -i S,1,0.75 (default); --very-sensitive-local is -D 20 -R 3 -N 0 -L 20 -i S,1,0.50.
- required
False
- choices
Very fast:
--very-fast
Fast:
--fast
Sensitive:
--sensitive
Very sensitive:
--very-sensitive
- label
Map as single-ended (for paired-end reads only)
- type
basic:boolean
- description
If this option is selected paired-end reads will be mapped as single-ended and other paired-end options are ignored.
- default
False
- label
Report discordantly matched read
- type
basic:boolean
- description
If both mates have unique alignments, but the alignments do not match paired-end expectations (orientation and relative distance) then alignment will be reported. Useful for detecting structural variations.
- default
True
- label
Report single ended
- type
basic:boolean
- description
If paired alignment can not be found Bowtie2 tries to find alignments for the individual mates.
- default
True
- label
Minimal distance
- type
basic:integer
- description
The minimum fragment length for valid paired-end alignments. 0 imposes no minimum.
- default
0
- label
Maximal distance
- type
basic:integer
- description
The maximum fragment length for valid paired-end alignments.
- default
500
- label
Not concordant when mates overlap
- type
basic:boolean
- description
When true, mates that overlap at all are considered not concordant. Default is false.
- default
False
- label
Dovetail
- type
basic:boolean
- description
If the mates “dovetail”, that is if one mate alignment extends past the beginning of the other such that the wrong mate begins upstream, consider that to be concordant. Default: mates cannot dovetail in a concordant alignment.
- default
False
- label
Number of mismatches allowed in seed alignment (N)
- type
basic:integer
- description
Sets the number of mismatches allowed in a seed alignment during multiseed alignment. Can be set to 0 or 1. Setting this higher makes alignment slower (often much slower) but increases sensitivity. Default: 0.
- required
False
- label
Length of seed substrings (L)
- type
basic:integer
- description
Sets the length of the seed substrings to align during multiseed alignment. Smaller values make alignment slower but more sensitive. Default: the --sensitive preset is used by default for end-to-end alignment and --sensitive-local for local alignment. See documentation for details.
- required
False
- label
Disallow gaps within positions (gbar)
- type
basic:integer
- description
Disallow gaps within <int> positions of the beginning or end of the read. Default: 4.
- required
False
- label
Maximal and minimal mismatch penalty (mp)
- type
basic:string
- description
Sets the maximum (MX) and minimum (MN) mismatch penalties, both integers. A number less than or equal to MX and greater than or equal to MN is subtracted from the alignment score for each position where a read character aligns to a reference character, the characters do not match, and neither is an N. If --ignore-quals is specified, the number subtracted equals MX. Otherwise, the number subtracted is MN + floor((MX-MN)(MIN(Q, 40.0)/40.0)) where Q is the Phred quality value. Default for MX, MN: 6,2.
- required
False
- label
Set read gap open and extend penalties (rdg)
- type
basic:string
- description
Sets the read gap open (<int1>) and extend (<int2>) penalties. A read gap of length N gets a penalty of <int1> + N * <int2>. Default: 5,3.
- required
False
- label
Set reference gap open and extend penalties (rfg)
- type
basic:string
- description
Sets the reference gap open (<int1>) and extend (<int2>) penalties. A reference gap of length N gets a penalty of <int1> + N * <int2>. Default: 5,3.
- required
False
- label
Minimum alignment score needed for “valid” alignment (score_min)
- type
basic:string
- description
Sets a function governing the minimum alignment score needed for an alignment to be considered “valid” (i.e. good enough to report). This is a function of read length. For instance, specifying L,0,-0.6 sets the minimum-score function to f(x) = 0 + -0.6 * x, where x is the read length. The default in --end-to-end mode is L,-0.6,-0.6 and the default in --local mode is G,20,8.
- required
False
- label
Bases to trim from 5’
- type
basic:integer
- description
Number of bases to trim from the 5’ (left) end of each read before alignment.
- default
0
- label
Bases to trim from 3’
- type
basic:integer
- description
Number of bases to trim from the 3’ (right) end of each read before alignment.
- default
0
- label
Iterations
- type
basic:integer
- description
Number of iterations.
- default
0
- label
Bases to trim
- type
basic:integer
- description
Number of bases to trim from 3’ end in each iteration.
- default
2
- label
Report mode
- type
basic:string
- description
Default mode: search for multiple alignments, report the best one; -k mode: search for one or more alignments, report each; -a mode: search for and report all alignments
- default
def
- choices
Default mode:
def
-k mode:
k
-a mode (very slow):
a
- label
Number of reports (for -k mode only)
- type
basic:integer
- description
Searches for at most X distinct, valid alignments for each read. The search terminates when it can’t find more distinct valid alignments, or when it finds X, whichever happens first. default: 5
- default
5
- label
Suppress SAM records for unaligned reads
- type
basic:boolean
- description
When true, suppress SAM records for unaligned reads. Default is false.
- default
False
- label
BigWig bin size
- type
basic:integer
- description
Size of the bins, in bases, for the output of the bigwig/bedgraph file. Default is 50.
- default
50
- label
BigWig timeout (s)
- type
basic:integer
- description
Time, in seconds, before creation of BigWig file is stopped. Default is 480 seconds.
- default
480
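The scoring formulas quoted in the mp, rdg, rfg and score_min descriptions can be checked with a short Python sketch. It is only an illustration of the arithmetic, not Bowtie 2's code: the function names are hypothetical, and the reading of the `G` function type as a natural logarithm comes from the Bowtie 2 manual's function-option notation rather than from this page.

```python
import math


def mismatch_penalty(q, mx=6, mn=2, ignore_quals=False):
    """Penalty subtracted for one mismatch at a base with Phred quality q."""
    if ignore_quals:
        return mx
    # MN + floor((MX - MN) * (MIN(Q, 40.0) / 40.0))
    return mn + math.floor((mx - mn) * (min(q, 40.0) / 40.0))


def gap_penalty(length, open_pen=5, extend_pen=3):
    """Penalty for a read or reference gap of the given length."""
    # <int1> + N * <int2>
    return open_pen + length * extend_pen


def min_score(spec, read_length):
    """Evaluate a score_min specification such as 'L,0,-0.6' or 'G,20,8'."""
    kind, const, coeff = spec.split(",")
    const, coeff = float(const), float(coeff)
    if kind == "L":  # linear in read length
        return const + coeff * read_length
    if kind == "G":  # natural log of read length (per the Bowtie 2 manual)
        return const + coeff * math.log(read_length)
    raise ValueError(f"function type not covered by this sketch: {kind}")
```

With the defaults shown above, a mismatch at a Phred 20 base costs 2 + floor(4 * 0.5) = 4, a gap of length 2 costs 5 + 2 * 3 = 11, and `min_score("L,0,-0.6", 100)` evaluates the example function f(x) = 0 + -0.6 * x for a 100-base read.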
Output results
- label
Alignment file
- type
basic:file
- description
Position sorted alignment
- label
Index BAI
- type
basic:file
- label
Unmapped reads
- type
basic:file
- required
False
- label
Statistics
- type
basic:file
- label
BigWig file
- type
basic:file
- required
False
- label
Species
- type
basic:string
- label
Build
- type
basic:string
Bowtie2 genome index¶
-
data:index:bowtie2:
bowtie2-index
(data:seq:nucleotide ref_seq)[Source: v1.1.1]
Create Bowtie2 genome index.
Input arguments
- label
Reference sequence (nucleotide FASTA)
- type
data:seq:nucleotide
- required
True
- hidden
False
Output results
- label
Bowtie2 index
- type
basic:dir
- required
True
- hidden
False
- label
FASTA file (compressed)
- type
basic:file
- required
True
- hidden
False
- label
FASTA file
- type
basic:file
- required
True
- hidden
False
- label
FASTA file index
- type
basic:file
- required
True
- hidden
False
- label
Species
- type
basic:string
- required
True
- hidden
False
- label
Build
- type
basic:string
- required
True
- hidden
False
Cell Ranger Count¶
-
data:scexpression:10x:
cellranger-count
(data:screads:10x: reads, data:genomeindex:10x: genome_index, basic:string chemistry, basic:integer trim_r1, basic:integer trim_r2, basic:integer expected_cells, basic:integer force_cells)[Source: v1.1.1]
Perform gene expression analysis. Generate single cell feature counts for a single library. https://support.10xgenomics.com/single-cell-gene-expression/software/pipelines/latest/using/count
Input arguments
- label
10x reads data object
- type
data:screads:10x:
- required
True
- hidden
False
- label
10x genome index data object
- type
data:genomeindex:10x:
- required
True
- hidden
False
- label
Chemistry
- type
basic:string
- description
Assay configuration. By default the assay configuration is detected automatically, which is the recommended mode. You should only specify chemistry if there is an error in automatic detection.
- required
False
- hidden
False
- default
auto
- choices
auto:
auto
threeprime:
Single Cell 3'
fiveprime:
Single Cell 5'
SC3Pv1:
Single Cell 3' v1
SC3Pv2:
Single Cell 3' v2
SC3Pv3:
Single Cell 3' v3
SC5P-PE:
Single Cell 5' paired-end
SC5P-R2:
Single Cell 5' R2-only
- label
Trim R1
- type
basic:integer
- description
Hard-trim the input R1 sequence to this length. Note that the length includes the Barcode and UMI sequences so do not set this below 26 for Single Cell 3’ v2 or Single Cell 5’. This and “Trim R2” are useful for determining the optimal read length for sequencing.
- required
False
- hidden
False
- label
Trim R2
- type
basic:integer
- description
Hard-trim the input R2 sequence to this length.
- required
False
- hidden
False
- label
Expected number of recovered cells
- type
basic:integer
- required
True
- hidden
False
- default
3000
- label
Force cell number
- type
basic:integer
- description
Force pipeline to use this number of cells, bypassing the cell detection algorithm. Use this if the number of cells estimated by Cell Ranger is not consistent with the barcode rank plot.
- required
False
- hidden
False
Output results
- label
Matrix (filtered)
- type
basic:file
- required
True
- hidden
False
- label
Genes (filtered)
- type
basic:file
- required
True
- hidden
False
- label
Barcodes (filtered)
- type
basic:file
- required
True
- hidden
False
- label
Matrix (raw)
- type
basic:file
- required
True
- hidden
False
- label
Genes (raw)
- type
basic:file
- required
True
- hidden
False
- label
Barcodes (raw)
- type
basic:file
- required
True
- hidden
False
- label
Report
- type
basic:file:html
- required
True
- hidden
False
- label
Build
- type
basic:string
- required
True
- hidden
False
- label
Species
- type
basic:string
- required
True
- hidden
False
- label
Gene ID source
- type
basic:string
- required
True
- hidden
False
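As a rough illustration of how the inputs above might translate into a `cellranger count` invocation, the sketch below assembles an argument list in Python. The flags used are Cell Ranger command-line options, but the mapping from process inputs to flags, the function name and the example paths are assumptions; the process itself may build its command differently.

```python
def build_cellranger_count_args(
    run_id,
    fastq_dir,
    transcriptome_dir,
    chemistry="auto",
    trim_r1=None,
    trim_r2=None,
    expected_cells=3000,
    force_cells=None,
):
    """Assemble an argv list for `cellranger count` (hypothetical mapping)."""
    args = [
        "cellranger",
        "count",
        f"--id={run_id}",
        f"--fastqs={fastq_dir}",
        f"--transcriptome={transcriptome_dir}",
        f"--chemistry={chemistry}",
        f"--expect-cells={expected_cells}",
    ]
    # Optional inputs are only passed when the user supplied a value.
    if trim_r1 is not None:
        args.append(f"--r1-length={trim_r1}")
    if trim_r2 is not None:
        args.append(f"--r2-length={trim_r2}")
    if force_cells is not None:
        args.append(f"--force-cells={force_cells}")
    return args
```

Calling `build_cellranger_count_args("sample1", "fastqs/", "refdata/")` yields only the required arguments plus the default chemistry and expected cell count, which mirrors the recommendation to leave chemistry on automatic detection unless detection fails.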
Cell Ranger Mkref¶
-
data:genomeindex:10x:
cellranger-mkref
(data:seq:nucleotide: genome, data:annotation:gtf: annotation)[Source: v2.1.1]
Reference preparation tool for 10x Genomics Cell Ranger. Build a Cell Ranger-compatible reference from genome FASTA and gene GTF files. https://support.10xgenomics.com/single-cell-gene-expression/software/pipelines/latest/advanced/references
Input arguments
- label
Reference genome
- type
data:seq:nucleotide:
- required
True
- hidden
False
- label
Annotation
- type
data:annotation:gtf:
- required
True
- hidden
False
Output results
- label
Indexed genome
- type
basic:dir
- required
True
- hidden
False
- label
Build
- type
basic:string
- required
True
- hidden
False
- label
Species
- type
basic:string
- required
True
- hidden
False
- label
Gene ID source
- type
basic:string
- required
True
- hidden
False
ChIP-Seq (Gene Score)¶
-
data:chipseq:genescore
chipseq-genescore
(data:chipseq:peakscore peakscore, basic:decimal fdr, basic:decimal pval, basic:decimal logratio)[Source: v1.2.1]
ChIP-Seq analysis - Gene Score (BCM)
Input arguments
- label
PeakScore file
- type
data:chipseq:peakscore
- description
PeakScore file
- label
FDR threshold
- type
basic:decimal
- description
FDR threshold value (default = 0.00005).
- default
5e-05
- label
Pval threshold
- type
basic:decimal
- description
Pval threshold value (default = 0.00005).
- default
5e-05
- label
Log-ratio threshold
- type
basic:decimal
- description
Log-ratio threshold value (default = 2).
- default
2.0
Output results
- label
Gene Score
- type
basic:file
ChIP-Seq (Peak Score)¶
-
data:chipseq:peakscore
chipseq-peakscore
(data:chipseq:callpeak:macs2 peaks, data:bed bed)[Source: v2.2.1]
ChIP-Seq analysis - Peak Score (BCM)
Input arguments
- label
MACS2 results
- type
data:chipseq:callpeak:macs2
- description
MACS2 results file (NarrowPeak)
- label
BED file
- type
data:bed
Output results
- label
Peak Score
- type
basic:file
ChIP-seq (MACS2)¶
-
data:chipseq:batch:macs2
macs2-batch
(list:data:alignment:bam alignments, data:bed promoter, basic:boolean advanced, basic:boolean tagalign, basic:integer q_threshold, basic:integer n_sub, basic:boolean tn5, basic:integer shift, basic:string duplicates, basic:string duplicates_prepeak, basic:decimal qvalue, basic:decimal pvalue, basic:decimal pvalue_prepeak, basic:integer cap_num, basic:integer mfold_lower, basic:integer mfold_upper, basic:integer slocal, basic:integer llocal, basic:integer extsize, basic:integer shift, basic:integer band_width, basic:boolean nolambda, basic:boolean fix_bimodal, basic:boolean nomodel, basic:boolean nomodel_prepeak, basic:boolean down_sample, basic:boolean bedgraph, basic:boolean spmr, basic:boolean call_summits, basic:boolean broad, basic:decimal broad_cutoff, data:bed blacklist, basic:boolean calculate_enrichment, basic:integer profile_window, basic:string shift_size)[Source: v1.4.2]
This process runs MACS2 in batch mode. MACS2 analysis is triggered for pairs of samples as defined using treatment-background sample relations. If there are no sample relations defined, each sample is treated individually for the MACS analysis. Model-based Analysis of ChIP-Seq (MACS 2.0) is used to identify transcription factor binding sites. MACS 2.0 captures the influence of genome complexity to evaluate the significance of enriched ChIP regions, and MACS improves the spatial resolution of binding sites by combining the information of both sequencing tag position and orientation. It also has an option to link nearby peaks together in order to call broad peaks. See [here](https://github.com/taoliu/MACS/) for more information. In addition to peak-calling, this process computes ChIP-Seq and ATAC-Seq QC metrics. The process returns a QC metrics report, fragment length estimation, and a deduplicated tagAlign file. The QC report contains ENCODE 3 proposed QC metrics – [NRF](https://www.encodeproject.org/data-standards/terms/), [PBC bottlenecking coefficients, NSC, and RSC](https://genome.ucsc.edu/ENCODE/qualityMetrics.html#chipSeq).
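The library-complexity metrics named above (NRF and the PBC bottlenecking coefficients) can be illustrated with a small Python sketch. The definitions used here follow ENCODE's published terms (NRF as distinct over total uniquely mapped reads, PBC1 as M1 over distinct locations, PBC2 as M1 over M2) and are not spelled out on this page; the function name and the input representation are assumptions, and this is not the code the process runs.

```python
from collections import Counter


def library_complexity(read_positions):
    """Compute NRF, PBC1 and PBC2 from uniquely mapped read positions.

    read_positions: iterable of hashable keys, e.g. (chrom, start, strand),
        one per uniquely mapped read.
    """
    counts = Counter(read_positions)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no reads given")
    distinct = len(counts)  # locations covered by at least one read
    m1 = sum(1 for c in counts.values() if c == 1)  # exactly one read
    m2 = sum(1 for c in counts.values() if c == 2)  # exactly two reads
    return {
        "NRF": distinct / total,
        "PBC1": m1 / distinct,
        # PBC2 is undefined when no location has exactly two reads.
        "PBC2": m1 / m2 if m2 else None,
    }
```

For six reads at positions a, a, b, c, d, d there are four distinct locations, two of them covered once and two covered twice, so the sketch reports NRF = 4/6, PBC1 = 2/4 and PBC2 = 2/2.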
Input arguments
- label
Aligned reads
- type
list:data:alignment:bam
- description
Select multiple treatment/background samples.
- label
Promoter regions BED file
- type
data:bed
- description
BED file containing promoter regions (TSS+-1000 bp for example). Needed to get the number of peaks and reads mapped to promoter regions.
- required
False
- label
Show advanced options
- type
basic:boolean
- description
Inspect and modify parameters.
- default
False
- label
Use tagAlign files
- type
basic:boolean
- description
Use filtered tagAlign files as case (treatment) and control (background) samples. If extsize parameter is not set, run MACS using input’s estimated fragment length.
- hidden
!advanced
- default
True
- label
Quality filtering threshold
- type
basic:integer
- default
30
- label
Number of reads to subsample
- type
basic:integer
- default
15000000
- label
Tn5 shifting
- type
basic:boolean
- description
Tn5 transposon shifting. Shift reads on “+” strand by 4 bp and reads on “-” strand by 5 bp.
- default
False
- label
User-defined cross-correlation peak strandshift
- type
basic:integer
- description
If defined, SPP tool will not try to estimate fragment length but will use the given value as fragment length.
- required
False
- label
Number of duplicates
- type
basic:string
- description
It controls the MACS behavior towards duplicate tags at the exact same location – the same coordinates and the same strand. The ‘auto’ option makes MACS calculate the maximum tags at the exact same location based on binomial distribution using 1e-5 as pvalue cutoff and the ‘all’ option keeps all the tags. If an integer is given, at most this number of tags will be kept at the same location. The default is to keep one tag at the same location.
- required
False
- hidden
tagalign
- choices
1:
1
auto:
auto
all:
all
- label
Number of duplicates
- type
basic:string
- description
It controls the MACS behavior towards duplicate tags at the exact same location – the same coordinates and the same strand. The ‘auto’ option makes MACS calculate the maximum tags at the exact same location based on binomial distribution using 1e-5 as pvalue cutoff and the ‘all’ option keeps all the tags. If an integer is given, at most this number of tags will be kept at the same location. The default is to keep one tag at the same location.
- required
False
- hidden
!tagalign
- default
all
- choices
1:
1
auto:
auto
all:
all
- label
Q-value cutoff
- type
basic:decimal
- description
The q-value (minimum FDR) cutoff to call significant regions. Q-values are calculated from p-values using Benjamini-Hochberg procedure.
- required
False
- disabled
settings.pvalue && settings.pvalue_prepeak
- label
P-value cutoff
- type
basic:decimal
- description
The p-value cutoff. If specified, MACS2 will use p-value instead of q-value cutoff.
- required
False
- disabled
settings.qvalue
- hidden
tagalign
- label
P-value cutoff
- type
basic:decimal
- description
The p-value cutoff. If specified, MACS2 will use p-value instead of q-value cutoff.
- disabled
settings.qvalue
- hidden
!tagalign || settings.qvalue
- default
1e-05
- label
Cap number of peaks by taking top N peaks
- type
basic:integer
- description
To keep all peaks set value to 0.
- disabled
settings.broad
- default
500000
- label
MFOLD range (lower limit)
- type
basic:integer
- description
This parameter is used to select the regions within MFOLD range of high-confidence enrichment ratio against background to build model. The regions must be lower than upper limit, and higher than the lower limit of fold enrichment. DEFAULT:10,30 means using all regions not too low (>10) and not too high (<30) to build paired-peaks model. If MACS cannot find more than 100 regions to build model, it will use the --extsize parameter to continue the peak detection ONLY if --fix-bimodal is set.
- required
False
- label
MFOLD range (upper limit)
- type
basic:integer
- description
This parameter is used to select the regions within MFOLD range of high-confidence enrichment ratio against background to build model. The regions must be lower than upper limit, and higher than the lower limit of fold enrichment. DEFAULT:10,30 means using all regions not too low (>10) and not too high (<30) to build paired-peaks model. If MACS cannot find more than 100 regions to build model, it will use the --extsize parameter to continue the peak detection ONLY if --fix-bimodal is set.
- required
False
- label
Small local region
- type
basic:integer
- description
Slocal and llocal parameters control which two levels of regions will be checked around the peak regions to calculate the maximum lambda as local lambda. By default, MACS considers 1000 bp for small local region (--slocal), and 10000 bp for large local region (--llocal) which captures the bias from a long range effect like an open chromatin domain. You can tweak these according to your project. Remember that if the region is set too small, a sharp spike in the input data may kill the significant peak.
- required
False
- label
Large local region
- type
basic:integer
- description
Slocal and llocal parameters control which two levels of regions will be checked around the peak regions to calculate the maximum lambda as local lambda. By default, MACS considers 1000 bp for small local region (--slocal), and 10000 bp for large local region (--llocal) which captures the bias from a long range effect like an open chromatin domain. You can tweak these according to your project. Remember that if the region is set too small, a sharp spike in the input data may kill the significant peak.
- required
False
- label
extsize
- type
basic:integer
- description
While ‘--nomodel’ is set, MACS uses this parameter to extend reads in 5’->3’ direction to fixed-size fragments. For example, if the size of binding region for your transcription factor is 200 bp, and you want to bypass the model building by MACS, this parameter can be set as 200. This option is only valid when --nomodel is set or when MACS fails to build model and --fix-bimodal is on.
- required
False
- label
Shift
- type
basic:integer
- description
Note, this is NOT the legacy --shiftsize option which is replaced by --extsize! You can set an arbitrary shift in bp here. Please use discretion when setting it to anything other than the default value (0). When --nomodel is set, MACS will use this value to move cutting ends (5’) then apply --extsize from 5’ to 3’ direction to extend them to fragments. When this value is negative, ends will be moved toward 3’->5’ direction, otherwise 5’->3’ direction. Recommended to keep it as default 0 for ChIP-Seq datasets, or -1 * half of EXTSIZE together with --extsize option for detecting enriched cutting loci such as certain DNaseI-Seq datasets. Note, you can’t set values other than 0 if format is BAMPE for paired-end data. Default is 0.
- required
False
- label
Band width
- type
basic:integer
- description
The band width which is used to scan the genome ONLY for model building. You can set this parameter as the sonication fragment size expected from wet experiment. The previous side effect on the peak detection process has been removed. So this parameter only affects the model building.
- required
False
- label
Use background lambda as local lambda
- type
basic:boolean
- description
With this flag on, MACS will use the background lambda as local lambda. This means MACS will not consider the local bias at peak candidate regions.
- default
False
- label
Turn on the auto paired-peak model process
- type
basic:boolean
- description
Turn on the auto paired-peak model process. If set, when MACS fails to build the paired model, it will use the nomodel settings and the ‘--extsize’ parameter to extend each tag. If not set, MACS will be terminated if the paired-peak model fails.
- default
False
- label
Bypass building the shifting model
- type
basic:boolean
- description
While on, MACS will bypass building the shifting model.
- hidden
tagalign
- default
False
- label
Bypass building the shifting model
- type
basic:boolean
- description
While on, MACS will bypass building the shifting model.
- hidden
!tagalign
- default
True
- label
Down-sample
- type
basic:boolean
- description
When set to true, random sampling method will scale down the bigger sample. By default, MACS uses linear scaling. This option will make the results unstable and irreproducible since each time, random reads would be selected, especially the numbers (pileup, pvalue, qvalue) would change.
- default
False
- label
Save fragment pileup and control lambda
- type
basic:boolean
- description
If this flag is on, MACS will store the fragment pileup, control lambda, -log10pvalue and -log10qvalue scores in bedGraph files. The bedGraph files will be stored in current directory named NAME+’_treat_pileup.bdg’ for treatment data, NAME+’_control_lambda.bdg’ for local lambda values from control, NAME+’_treat_pvalue.bdg’ for Poisson pvalue scores (in -log10(pvalue) form), and NAME+’_treat_qvalue.bdg’ for q-value scores from Benjamini-Hochberg-Yekutieli procedure.
- default
True
- label
Save signal per million reads for fragment pileup profiles
- type
basic:boolean
- disabled
settings.bedgraph === false
- default
True
- label
Call summits
- type
basic:boolean
- description
MACS will now reanalyze the shape of signal profile (p or q-score depending on cutoff setting) to deconvolve subpeaks within each peak called from general procedure. It’s highly recommended to detect adjacent binding events. While used, the output subpeaks of a big peak region will have the same peak boundaries, and different scores and peak summit positions.
- default
False
- label
Composite broad regions
- type
basic:boolean
- description
When this flag is on, MACS will try to composite broad regions in BED12 (a gene-model-like format) by putting nearby highly enriched regions into a broad region with loose cutoff. The broad region is controlled by another cutoff through --broad-cutoff. The maximum length of a broad region is 4 times d from MACS.
- disabled
settings.call_summits === true
- default
False
- label
Broad cutoff
- type
basic:decimal
- description
Cutoff for broad region. This option is not available unless --broad is set. If -p is set, this is a p-value cutoff, otherwise, it’s a q-value cutoff. DEFAULT = 0.1
- required
False
- disabled
settings.call_summits === true || settings.broad !== true
- label
Blacklist regions
- type
data:bed
- description
BED file containing genomic regions that should be excluded from the analysis.
- required
False
- label
Calculate enrichment
- type
basic:boolean
- description
Calculate enrichment of signal in known genomic annotation. By default, annotation is provided from the TranscriptDB package specified by the genome build, which should match one of the supported annotations (hg19, hg38, hg18, mm10, mm9, rn4, ce6, dm3). If the annotation is not supported, the analysis is skipped.
- default
False
- label
Window size
- type
basic:integer
- description
An integer indicating the width of the window used for peak profiles. Peaks will be centered on their summits and include half of the window size upstream and half downstream of this point.
- default
400
- label
Shift size
- type
basic:string
- description
Vector of values to try when computing optimal shift sizes. It should be specified as a vector of consecutive numbers in the form start:end.
- default
1:300
Output results
ChIP-seq (MACS2-ROSE2)¶
-
data:chipseq:batch:macs2
macs2-rose2-batch
(list:data:alignment:bam alignments, data:bed promoter, basic:boolean advanced, basic:boolean tagalign, basic:integer q_threshold, basic:integer n_sub, basic:boolean tn5, basic:integer shift, basic:string duplicates, basic:string duplicates_prepeak, basic:decimal qvalue, basic:decimal pvalue, basic:decimal pvalue_prepeak, basic:integer cap_num, basic:integer mfold_lower, basic:integer mfold_upper, basic:integer slocal, basic:integer llocal, basic:integer extsize, basic:integer shift, basic:integer band_width, basic:boolean nolambda, basic:boolean fix_bimodal, basic:boolean nomodel, basic:boolean nomodel_prepeak, basic:boolean down_sample, basic:boolean bedgraph, basic:boolean spmr, basic:boolean call_summits, basic:boolean broad, basic:decimal broad_cutoff, basic:boolean use_filtered_bam, basic:integer tss, basic:integer stitch, data:bed mask, data:bed blacklist, basic:boolean calculate_enrichment, basic:integer profile_window, basic:string shift_size)[Source: v1.4.2]
This process runs MACS2 in batch mode. MACS2 analysis is triggered for pairs of samples as defined using treatment-background sample relations. If there are no sample relations defined, each sample is treated individually for the MACS analysis. Model-based Analysis of ChIP-Seq (MACS 2.0) is used to identify transcription factor binding sites. MACS 2.0 captures the influence of genome complexity to evaluate the significance of enriched ChIP regions, and it improves the spatial resolution of binding sites by combining the information of both sequencing tag position and orientation. It also has an option to link nearby peaks together in order to call broad peaks. See [here](https://github.com/taoliu/MACS/) for more information. In addition to peak-calling, this process computes ChIP-Seq and ATAC-Seq QC metrics. The process returns a QC metrics report, fragment length estimation, and a deduplicated tagAlign file. The QC report contains ENCODE 3 proposed QC metrics – [NRF](https://www.encodeproject.org/data-standards/terms/), [PBC bottlenecking coefficients, NSC, and RSC](https://genome.ucsc.edu/ENCODE/qualityMetrics.html#chipSeq). For identification of super enhancers R2 uses the Rank Ordering of Super-Enhancers algorithm (ROSE2). This takes the peaks called by RSEG for acetylation and calculates the distances in-between to judge whether they can be considered super-enhancers. The ranked values can be plotted and by locating the inflection point in the resulting graph, super-enhancers can be assigned. It can also be used with the MACS calculated data. See [here](http://younglab.wi.mit.edu/super_enhancer_code.html) for more information.
Input arguments
- label
Aligned reads
- type
list:data:alignment:bam
- description
Select multiple treatment/background samples.
- label
Promoter regions BED file
- type
data:bed
- description
BED file containing promoter regions (TSS+-1000 bp for example). Needed to get the number of peaks and reads mapped to promoter regions.
- required
False
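As an illustration of the "TSS+-1000 bp" convention mentioned in the description, the following minimal sketch builds promoter intervals from TSS coordinates. The function names and the tuple layout are assumptions made for this example and are not part of the process.

```python
def promoter_intervals(tss_sites, flank=1000):
    """Return BED-like (chrom, start, end, name) tuples spanning TSS +/- flank.

    tss_sites: iterable of (chrom, tss_position, name) with 0-based positions.
    Starts are clipped at 0 so intervals never run past the chromosome start.
    """
    intervals = []
    for chrom, tss, name in tss_sites:
        start = max(0, tss - flank)
        end = tss + flank
        intervals.append((chrom, start, end, name))
    return intervals


def to_bed_lines(intervals):
    """Format intervals as tab-separated BED lines."""
    return ["\t".join(str(field) for field in interval) for interval in intervals]
```

The resulting lines can be written to a file and uploaded as the promoter regions BED input.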
- label
Show advanced options
- type
basic:boolean
- description
Inspect and modify parameters.
- default
False
- label
Use tagAlign files
- type
basic:boolean
- description
Use filtered tagAlign files as case (treatment) and control (background) samples. If extsize parameter is not set, run MACS using input’s estimated fragment length.
- hidden
!advanced
- default
True
- label
Quality filtering threshold
- type
basic:integer
- default
30
- label
Number of reads to subsample
- type
basic:integer
- default
15000000
- label
Tn5 shifting
- type
basic:boolean
- description
Tn5 transposon shifting. Shift reads on “+” strand by 4 bp and reads on “-” strand by 5 bp.
- default
False
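The shifting rule can be sketched as follows. This is an illustration only: it assumes the common convention in which the start of a "+" strand read moves 4 bp downstream and the end of a "-" strand read moves 5 bp upstream, on tagAlign-style records. The function name `tn5_shift` is hypothetical.

```python
def tn5_shift(chrom, start, end, strand):
    """Apply a Tn5 offset to one tagAlign-style record.

    Assumption: "+" reads have their start moved by +4 bp and "-" reads
    have their end moved by -5 bp.
    """
    if strand == "+":
        start += 4
    elif strand == "-":
        end -= 5
    else:
        raise ValueError(f"unexpected strand: {strand!r}")
    return chrom, start, end, strand
```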
- label
User-defined cross-correlation peak strandshift
- type
basic:integer
- description
If defined, SPP tool will not try to estimate fragment length but will use the given value as fragment length.
- required
False
- label
Number of duplicates
- type
basic:string
- description
It controls the MACS behavior towards duplicate tags at the exact same location – the same coordinates and the same strand. The ‘auto’ option makes MACS calculate the maximum tags at the exact same location based on binomial distribution using 1e-5 as pvalue cutoff and the ‘all’ option keeps all the tags. If an integer is given, at most this number of tags will be kept at the same location. The default is to keep one tag at the same location.
- required
False
- hidden
tagalign
- choices
1:
1
auto:
auto
all:
all
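A minimal sketch of the integer and ‘all’ choices is shown below. It is not the MACS implementation; the tag representation `(chrom, position, strand)` and the function name are assumptions, and the ‘auto’ choice (binomial estimate) is deliberately left out.

```python
from collections import Counter


def filter_duplicates(tags, keep_dup="1"):
    """Keep at most `keep_dup` tags per identical (chrom, position, strand).

    keep_dup: "all" keeps every tag; an integer string keeps that many
    tags per location. "auto" is not covered by this sketch.
    """
    if keep_dup == "all":
        return list(tags)
    if keep_dup == "auto":
        raise NotImplementedError("the binomial 'auto' estimate is not sketched here")
    limit = int(keep_dup)
    seen = Counter()
    kept = []
    for tag in tags:
        seen[tag] += 1
        if seen[tag] <= limit:
            kept.append(tag)
    return kept
```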
- label
Number of duplicates
- type
basic:string
- description
It controls the MACS behavior towards duplicate tags at the exact same location – the same coordinates and the same strand. The ‘auto’ option makes MACS calculate the maximum tags at the exact same location based on binomial distribution using 1e-5 as pvalue cutoff and the ‘all’ option keeps all the tags. If an integer is given, at most this number of tags will be kept at the same location. The default is to keep one tag at the same location.
- required
False
- hidden
!tagalign
- default
all
- choices
1:
1
auto:
auto
all:
all
- label
Q-value cutoff
- type
basic:decimal
- description
The q-value (minimum FDR) cutoff to call significant regions. Q-values are calculated from p-values using Benjamini-Hochberg procedure.
- required
False
- disabled
settings.pvalue && settings.pvalue_prepeak
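For reference, the textbook Benjamini-Hochberg adjustment named in the description can be sketched as below. This is a generic illustration, not the code MACS2 runs internally; the function name is an assumption.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values (q-values), in input order."""
    n = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(n), key=lambda i: pvalues[i])
    qvalues = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonic q-values.
    for rank in range(n, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * n / rank)
        qvalues[idx] = running_min
    return qvalues
```

A region is then called significant when its q-value does not exceed the chosen cutoff.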
- label
P-value cutoff
- type
basic:decimal
- description
The p-value cutoff. If specified, MACS2 will use p-value instead of q-value cutoff.
- required
False
- disabled
settings.qvalue
- hidden
tagalign
- label
P-value cutoff
- type
basic:decimal
- description
The p-value cutoff. If specified, MACS2 will use p-value instead of q-value cutoff.
- disabled
settings.qvalue
- hidden
!tagalign || settings.qvalue
- default
1e-05
- label
Cap number of peaks by taking top N peaks
- type
basic:integer
- description
To keep all peaks set value to 0.
- disabled
settings.broad
- default
500000
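The capping behaviour, including the special value 0, might look like the sketch below. The ranking key (a `score` field on each peak) is an assumption; the description does not state which column is used for ranking.

```python
def cap_peaks(peaks, cap_num=500000):
    """Return the top `cap_num` peaks by score; 0 keeps all peaks.

    peaks: list of dicts with a numeric "score" key (assumed ranking key).
    """
    if cap_num == 0:
        return list(peaks)
    ranked = sorted(peaks, key=lambda peak: peak["score"], reverse=True)
    return ranked[:cap_num]
```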
- label
MFOLD range (lower limit)
- type
basic:integer
- description
This parameter is used to select the regions within the MFOLD range of high-confidence enrichment ratio against background to build the model. The fold enrichment of the regions must be lower than the upper limit and higher than the lower limit. DEFAULT:10,30 means using all regions not too low (>10) and not too high (<30) to build the paired-peaks model. If MACS cannot find more than 100 regions to build the model, it will use the --extsize parameter to continue the peak detection ONLY if --fix-bimodal is set.
- required
False
- label
MFOLD range (upper limit)
- type
basic:integer
- description
This parameter is used to select the regions within the MFOLD range of high-confidence enrichment ratio against background to build the model. The fold enrichment of the regions must be lower than the upper limit and higher than the lower limit. DEFAULT:10,30 means using all regions not too low (>10) and not too high (<30) to build the paired-peaks model. If MACS cannot find more than 100 regions to build the model, it will use the --extsize parameter to continue the peak detection ONLY if --fix-bimodal is set.
- required
False
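The selection rule and the fallback described for the MFOLD range can be sketched as follows. Function names and return labels are hypothetical, and the real MACS logic may differ in detail.

```python
def select_model_regions(fold_enrichments, mfold_lower=10, mfold_upper=30):
    """Keep fold-enrichment values strictly inside (mfold_lower, mfold_upper)."""
    return [fe for fe in fold_enrichments if mfold_lower < fe < mfold_upper]


def model_strategy(n_regions, fix_bimodal=False, min_regions=100):
    """Decide how peak detection continues, following the description above.

    More than `min_regions` regions: build the paired-peaks model.
    Otherwise fall back to extsize only when fix_bimodal is set.
    """
    if n_regions > min_regions:
        return "build_model"
    if fix_bimodal:
        return "use_extsize"
    return "fail"
```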
- label
Small local region
- type
basic:integer
- description
Slocal and llocal parameters control which two levels of regions will be checked around the peak regions to calculate the maximum lambda as local lambda. By default, MACS considers 1000 bp for small local region (--slocal), and 10000 bp for large local region (--llocal), which captures the bias from a long-range effect like an open chromatin domain. You can tweak these according to your project. Remember that if the region is set too small, a sharp spike in the input data may kill the significant peak.
- required
False
- label
Large local region
- type
basic:integer
- description
Slocal and llocal parameters control which two levels of regions will be checked around the peak regions to calculate the maximum lambda as local lambda. By default, MACS considers 1000 bp for small local region (--slocal), and 10000 bp for large local region (--llocal), which captures the bias from a long-range effect like an open chromatin domain. You can tweak these according to your project. Remember that if the region is set too small, a sharp spike in the input data may kill the significant peak.
- required
False
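The "maximum lambda as local lambda" idea can be illustrated with a simplified sketch. It assumes that each lambda is the control tag count in a region rescaled to a common window width; the `window` parameter, the argument names, and the rescaling are assumptions and do not reproduce the exact MACS computation.

```python
def local_lambda(bg_lambda, slocal_count, llocal_count,
                 window=200, slocal=1000, llocal=10000):
    """Return the largest of the background, small-region and large-region lambdas.

    slocal_count / llocal_count: control tags counted in the small / large
    region around a candidate peak (assumed inputs).
    """
    lambda_small = slocal_count * window / slocal
    lambda_large = llocal_count * window / llocal
    return max(bg_lambda, lambda_small, lambda_large)
```

With these assumptions, a spike of 50 control tags inside the small region raises the local lambda to 10.0 (`local_lambda(0.5, 50, 40)`), which shows how a sharp spike in the input can suppress a peak.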
- label
extsize
- type
basic:integer
- description
When ‘--nomodel’ is set, MACS uses this parameter to extend reads in the 5’->3’ direction to fixed-size fragments. For example, if the size of the binding region for your transcription factor is 200 bp, and you want to bypass the model building by MACS, this parameter can be set to 200. This option is only valid when --nomodel is set or when MACS fails to build the model and --fix-bimodal is on.
- required
False
- label
Shift
- type
basic:integer
- description
Note, this is NOT the legacy --shiftsize option, which is replaced by --extsize! You can set an arbitrary shift in bp here. Please use discretion when setting it to anything other than the default value (0). When --nomodel is set, MACS will use this value to move cutting ends (5’) and then apply --extsize from the 5’ to 3’ direction to extend them to fragments. When this value is negative, ends will be moved in the 3’->5’ direction; otherwise, in the 5’->3’ direction. It is recommended to keep the default of 0 for ChIP-Seq datasets, or to use -1 * half of EXTSIZE together with the --extsize option for detecting enriched cutting loci, such as in certain DNAseI-Seq datasets. Note, you can’t set values other than 0 if the format is BAMPE for paired-end data. Default is 0.
- required
False
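How shift and extsize combine can be sketched as below: the 5’ end is first moved by the shift along the read direction, then extended by extsize towards 3’. The function name and the 0-based half-open interval convention are assumptions made for this illustration.

```python
def extend_tag(five_prime, strand, shift=0, extsize=200):
    """Return the (start, end) fragment for one tag after shift and extension.

    A positive shift moves the 5' end in the 5'->3' direction, a negative
    shift in the 3'->5' direction; extension always goes 5'->3'.
    """
    if strand == "+":
        start = five_prime + shift
        end = start + extsize
    elif strand == "-":
        end = five_prime - shift
        start = end - extsize
    else:
        raise ValueError(f"unexpected strand: {strand!r}")
    return max(0, start), end
```

With a shift of -1 * half of extsize (for example shift=-100, extsize=200), the fragment is centred on the cutting end for both strands, which matches the recommendation for detecting enriched cutting loci.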
- label
Band width
- type
basic:integer
- description
The band width which is used to scan the genome ONLY for model building. You can set this parameter as the sonication fragment size expected from wet experiment. The previous side effect on the peak detection process has been removed. So this parameter only affects the model building.
- required
False
- label
Use background lambda as local lambda
- type
basic:boolean
- description
With this flag on, MACS will use the background lambda as local lambda. This means MACS will not consider the local bias at peak candidate regions.
- default
False
- label
Turn on the auto paired-peak model process
- type
basic:boolean
- description
Turn on the auto paired-peak model process. If it’s set, when MACS fails to build the paired model, it will use the nomodel settings and the ‘--extsize’ parameter to extend each tag. If not set, MACS will be terminated if the paired-peak model has failed.
- default
False
- label
Bypass building the shifting model
- type
basic:boolean
- description
While on, MACS will bypass building the shifting model.
- hidden
tagalign
- default
False
- label
Bypass building the shifting model
- type
basic:boolean
- description
While on, MACS will bypass building the shifting model.
- hidden
!tagalign
- default
True
- label
Down-sample
- type
basic:boolean
- description
When set to true, a random sampling method is used to scale down the bigger sample. By default, MACS uses linear scaling. This option makes the results unstable and irreproducible because different reads are selected at random on each run; in particular, the reported numbers (pileup, p-value, q-value) will change.
- default
False
- label
Save fragment pileup and control lambda
- type
basic:boolean
- description
If this flag is on, MACS will store the fragment pileup, control lambda, -log10pvalue and -log10qvalue scores in bedGraph files. The bedGraph files will be stored in current directory named NAME+’_treat_pileup.bdg’ for treatment data, NAME+’_control_lambda.bdg’ for local lambda values from control, NAME+’_treat_pvalue.bdg’ for Poisson pvalue scores (in -log10(pvalue) form), and NAME+’_treat_qvalue.bdg’ for q-value scores from Benjamini-Hochberg-Yekutieli procedure.
- default
True
- label
Save signal per million reads for fragment pileup profiles
- type
basic:boolean
- disabled
settings.bedgraph === false
- default
True
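Signal per million reads is a plain rescaling of the pileup by sequencing depth. The sketch below assumes the pileup is available as a list of per-position values; the function name is hypothetical.

```python
def signal_per_million(pileup, total_reads):
    """Scale pileup values to signal per million reads."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    scale = 1_000_000 / total_reads
    return [value * scale for value in pileup]
```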
- label
Call summits
- type
basic:boolean
- description
MACS will reanalyze the shape of the signal profile (p- or q-score, depending on the cutoff setting) to deconvolve subpeaks within each peak called by the general procedure. This is highly recommended for detecting adjacent binding events. When used, the output subpeaks of a big peak region will have the same peak boundaries but different scores and peak summit positions.
- default
False
- label
Composite broad regions
- type
basic:boolean
- description
When this flag is on, MACS will try to composite broad regions in BED12 (a gene-model-like format) by putting nearby highly enriched regions into a broad region with a loose cutoff. The broad region is controlled by another cutoff through --broad-cutoff. The maximum length of a broad region is 4 times d from MACS.
- disabled
settings.call_summits === true
- default
False
- label
Broad cutoff
- type
basic:decimal
- description
Cutoff for broad region. This option is not available unless --broad is set. If -p is set, this is a p-value cutoff; otherwise, it is a q-value cutoff. DEFAULT = 0.1
- required
False
- disabled
settings.call_summits === true || settings.broad !== true
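The "disabled" expressions for the two broad-peak options can be read as simple predicates over the settings. The sketch below mirrors them in Python for illustration; the function names are hypothetical and a missing key is treated as "not true".

```python
def is_broad_disabled(settings):
    """Mirror of `settings.call_summits === true`."""
    return settings.get("call_summits") is True


def is_broad_cutoff_disabled(settings):
    """Mirror of `settings.call_summits === true || settings.broad !== true`."""
    return settings.get("call_summits") is True or settings.get("broad") is not True
```

In other words, the broad cutoff can only be edited when broad regions are requested and summit calling is off.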
- label
Use Filtered BAM File
- type
basic:boolean
- description
Use filtered BAM file from a MACS2 object to rank enhancers by.
- default
True
- label
TSS exclusion
- type
basic:integer
- description
Enter a distance from TSS to exclude. 0 = no TSS exclusion
- default
0
- label
Stitch
- type
basic:integer
- description
Enter a max linking distance for stitching. If not given, optimal stitching parameter will be determined automatically.
- required
False
- label
Masking BED file
- type
data:bed
- description
Mask a set of regions from analysis. Provide a BED of masking regions.
- required
False
- label
Blacklist regions
- type
data:bed
- description
BED file containing genomic regions that should be excluded from the analysis.
- required
False
- label
Calculate enrichment
- type
basic:boolean
- description
Calculate enrichment of signal in known genomic annotation. By default, annotation is provided from the TranscriptDB package specified by the genome build, which should match one of the supported annotations (hg19, hg38, hg18, mm10, mm9, rn4, ce6, dm3). If the annotation is not supported, the analysis is skipped.
- default
False
- label
Window size
- type
basic:integer
- description
An integer indicating the width of the window used for peak profiles. Peaks will be centered on their summits and include half of the window size upstream and half downstream of this point.
- default
400
- label
Shift size
- type
basic:string
- description
Vector of values to try when computing optimal shift sizes. It should be specified as a vector of consecutive numbers in the form start:end.
- default
1:300
Output results
Chemical Mutagenesis¶
-
data:workflow:chemut
workflow-chemut
(basic:string analysis_type, data:seq:nucleotide genome, list:data:alignment:bam parental_strains, list:data:alignment:bam mutant_strains, basic:boolean advanced, basic:boolean br_and_ind_ra, basic:boolean dbsnp, data:variants:vcf known_sites, list:data:variants:vcf known_indels, basic:integer stand_emit_conf, basic:integer stand_call_conf, basic:boolean rf, basic:boolean advanced, basic:integer read_depth)[Source: v1.0.2]
Input arguments
- label
Analysis type
- type
basic:string
- description
Choice of the analysis type. Use “SNV” or “INDEL” options to run the GATK analysis only on the haploid portion of the dicty genome. Choose options SNV_CHR2 or INDEL_CHR2 to run the analysis only on the diploid portion of CHR2 (-ploidy 2 -L chr2:2263132-3015703).
- default
snv
- choices
SNV:
snv
INDEL:
indel
SNV_CHR2:
snv_chr2
INDEL_CHR2:
indel_chr2
- label
Reference genome
- type
data:seq:nucleotide
- label
Parental strains
- type
list:data:alignment:bam
- label
Mutant strains
- type
list:data:alignment:bam
- label
Advanced options
- type
basic:boolean
- required
False
- default
False
- label
Do variant base recalibration and indel realignment
- type
basic:boolean
- required
False
- hidden
Vc.advanced === false
- default
False
- label
Use dbSNP file
- type
basic:boolean
- description
rsIDs from this file are used to populate the ID column of the output. Also, the DB INFO flag will be set when appropriate. dbSNP is not used in any way for the calculations themselves.
- required
False
- hidden
Vc.advanced === false
- default
False
- label
Known sites (dbSNP)
- type
data:variants:vcf
- required
False
- hidden
Vc.advanced === false || Vc.br_and_ind_ra === false && Vc.dbsnp === false
- label
Known indels
- type
list:data:variants:vcf
- required
False
- hidden
Vc.advanced === false || Vc.br_and_ind_ra === false
- label
Emission confidence threshold
- type
basic:integer
- description
The minimum confidence threshold (phred-scaled) at which the program should emit sites that appear to be possibly variant.
- required
False
- hidden
Vc.advanced === false
- default
10
- label
Calling confidence threshold
- type
basic:integer
- description
The minimum confidence threshold (phred-scaled) at which the program should emit variant sites as called. If a site’s associated genotype has a confidence score lower than the calling threshold, the program will emit the site as filtered and will annotate it as LowQual. This threshold separates high confidence calls from low confidence calls.
- required
False
- hidden
Vc.advanced === false
- default
30
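The interplay of the emission and calling thresholds can be sketched as a three-way classification on the phred-scaled confidence. The function names and return labels are hypothetical; only the threshold semantics follow the descriptions above.

```python
def phred_to_error_probability(qual):
    """Convert a phred-scaled confidence to an error probability."""
    return 10 ** (-qual / 10)


def classify_site(qual, stand_emit_conf=10, stand_call_conf=30):
    """Classify a site by its phred-scaled confidence.

    Below the emission threshold the site is not emitted; between the two
    thresholds it is emitted but flagged LowQual; otherwise it is called.
    """
    if qual < stand_emit_conf:
        return "not emitted"
    if qual < stand_call_conf:
        return "LowQual"
    return "called"
```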
- label
ReassignOneMappingQuality Filter
- type
basic:boolean
- description
This read transformer will change a certain read mapping quality to a different value without affecting reads that have other mapping qualities. This is intended primarily for users of RNA-Seq data handling programs such as TopHat, which use MAPQ = 255 to designate uniquely aligned reads. According to convention, 255 normally designates “unknown” quality, and most GATK tools automatically ignore such reads. By reassigning a different mapping quality to those specific reads, users of TopHat and other tools can circumvent this problem without affecting the rest of their dataset.
- required
False
- hidden
Vc.advanced === false
- default
False
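The effect of this read transformer can be sketched on a plain list of mapping qualities. The source value 255 comes from the description; the target value 60 is an assumption chosen for illustration, as is the function name.

```python
def reassign_one_mapping_quality(mapqs, from_mapq=255, to_mapq=60):
    """Replace one specific MAPQ value and leave all other values untouched."""
    return [to_mapq if mapq == from_mapq else mapq for mapq in mapqs]
```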
- label
Advanced options
- type
basic:boolean
- required
False
- default
False
- label
Read depth cutoff
- type
basic:integer
- description
The minimum number of replicate reads required for a variant site to be included.
- required
False
- hidden
Vf.advanced === false
- default
5
Output results
ChipQC¶
-
data:chipqc:
chipqc
(data:alignment:bam alignment, data:chipseq:callpeak peaks, data:bed blacklist, basic:boolean calculate_enrichment, basic:integer quality_threshold, basic:integer profile_window, basic:string shift_size)[Source: v1.1.1]
Calculate quality control metrics for ChIP-seq samples. The analysis is based on the ChIPQC package, which computes a variety of quality control metrics and statistics, and provides plots and a report for assessment of experimental data for further analysis.
Input arguments
- label
Aligned reads
- type
data:alignment:bam
- required
True
- hidden
False
- label
Called peaks
- type
data:chipseq:callpeak
- required
True
- hidden
False
- label
Blacklist regions
- type
data:bed
- description
BED file containing genomic regions that should be excluded from the analysis.
- required
False
- hidden
False
- label
Calculate enrichment
- type
basic:boolean
- description
Calculate enrichment of signal in known genomic annotation. By default, annotation is provided from the TranscriptDB package specified by the genome build, which should match one of the supported annotations (hg19, hg38, hg18, mm10, mm9, rn4, ce6, dm3). If the annotation is not supported, the analysis is skipped.
- required
True
- hidden
False
- default
False
- label
Mapping quality threshold
- type
basic:integer
- description
Only reads with mapping quality scores above this threshold will be used for some statistics.
- required
True
- hidden
False
- default
15
- label
Window size
- type
basic:integer
- description
An integer indicating the width of the window used for peak profiles. Peaks will be centered on their summits and include half of the window size upstream and half downstream of this point.
- required
True
- hidden
False
- default
400
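The window described above is centred on the peak summit. A minimal sketch, with a hypothetical function name and 0-based coordinates assumed:

```python
def peak_window(summit, window_size=400):
    """Return (start, end) covering half the window on each side of the summit."""
    half = window_size // 2
    return max(0, summit - half), summit + half
```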
- label
Shift size
- type
basic:string
- description
Vector of values to try when computing optimal shift sizes. It should be specified as a vector of consecutive numbers in the form start:end.
- required
True
- hidden
False
- default
1:300
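The start:end notation denotes a run of consecutive integers with both ends included, as in R. The sketch below expands such a string; the function name is hypothetical.

```python
def parse_shift_size(spec="1:300"):
    """Expand a "start:end" string into the inclusive list of shift sizes."""
    start_text, end_text = spec.split(":")
    start, end = int(start_text), int(end_text)
    if start > end:
        raise ValueError("start must not exceed end")
    return list(range(start, end + 1))
```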
Output results
- label
ChipQC report folder
- type
basic:dir
- required
True
- hidden
False
- label
Cross coverage score plot
- type
basic:file
- required
True
- hidden
False
- label
SSD metric plot
- type
basic:file
- required
True
- hidden
False
- label
Peak profile plot
- type
basic:file
- required
True
- hidden
False
- label
Barplot of reads in peaks
- type
basic:file
- required
True
- hidden
False
- label
Density plot of reads in peaks
- type
basic:file
- required
True
- hidden
False
- label
Heatmap of reads in genomic features
- type
basic:file
- required
False
- hidden
False
- label
Species
- type
basic:string
- required
True
- hidden
False
- label
Build
- type
basic:string
- required
True
- hidden
False
Convert GFF3 to GTF¶
-
data:annotation:gtf
gff-to-gtf
(data:annotation:gff3 annotation)[Source: v0.5.1]
Convert GFF3 file to GTF format.
Input arguments
- label
Annotation (GFF3)
- type
data:annotation:gff3
- description
Annotation in GFF3 format.
Output results
- label
Converted GTF file
- type
basic:file
- label
Sorted GTF file
- type
basic:file
- label
IGV index for sorted GTF file
- type
basic:file
- label
JBrowse track for sorted GTF
- type
basic:file
- label
Gene ID database
- type
basic:string
- label
Species
- type
basic:string
- label
Build
- type
basic:string
Convert files to reads (paired-end)¶
-
data:reads:fastq:paired
files-to-fastq-paired
(list:data:file src1, list:data:file src2, basic:boolean merge_lanes)[Source: v1.4.1]
Convert FASTQ files to paired-end reads.
Input arguments
- label
Mate1
- type
list:data:file
- label
Mate2
- type
list:data:file
- label
Merge lanes
- type
basic:boolean
- description
Merge paired-end sample data split into multiple sequencing lanes into a single pair of FASTQ files.
- default
False
Output results
- label
Reads file (mate 1)
- type
list:basic:file
- label
Reads file (mate 2)
- type
list:basic:file
- label
Quality control with FastQC (Upstream)
- type
list:basic:file:html
- label
Quality control with FastQC (Downstream)
- type
list:basic:file:html
- label
Download FastQC archive (Upstream)
- type
list:basic:file
- label
Download FastQC archive (Downstream)
- type
list:basic:file
Convert files to reads (single-end)¶
-
data:reads:fastq:single
files-to-fastq-single
(list:data:file src, basic:boolean merge_lanes)[Source: v1.4.1]
Convert FASTQ files to single-end reads.
Input arguments
- label
Reads
- type
list:data:file
- description
Sequencing reads in FASTQ format
- label
Merge lanes
- type
basic:boolean
- description
Merge sample data split into multiple sequencing lanes into a single FASTQ file.
- default
False
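Lane merging amounts to concatenating the per-lane files in a fixed order. The sketch below is an illustration rather than the process's own code. It relies on the fact that concatenated gzip files form a valid multi-member gzip file; the function names are hypothetical.

```python
import gzip
import shutil


def merge_lanes(lane_paths, merged_path):
    """Concatenate gzip-compressed FASTQ lane files into one file, in order."""
    with open(merged_path, "wb") as merged:
        for path in lane_paths:
            with open(path, "rb") as lane:
                shutil.copyfileobj(lane, merged)
    return merged_path


def count_fastq_records(path):
    """Count records in a gzip-compressed FASTQ file (4 lines per record)."""
    with gzip.open(path, "rt") as handle:
        return sum(1 for _ in handle) // 4
```

For paired-end data, the same lane order would have to be used for both mates so that read pairs stay aligned.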
Output results
- label
Reads file
- type
list:basic:file
- label
Quality control with FastQC
- type
list:basic:file:html
- label
Download FastQC archive
- type
list:basic:file
Cuffdiff 2.2¶
-
data:differentialexpression:cuffdiff:
cuffdiff
(list:data:cufflinks:cuffquant case, list:data:cufflinks:cuffquant control, list:basic:string labels, data:annotation annotation, data:seq:nucleotide genome, basic:boolean multi_read_correct, basic:boolean create_sets, basic:decimal gene_logfc, basic:decimal gene_fdr, basic:decimal fdr, basic:string library_type, basic:string library_normalization, basic:string dispersion_method)[Source: v3.3.2]
Run Cuffdiff 2.2 analysis. Cuffdiff finds significant changes in transcript expression, splicing, and promoter use. You can use it to find differentially expressed genes and transcripts, as well as genes that are being differentially regulated at the transcriptional and post-transcriptional level. See [here](http://cole-trapnell-lab.github.io/cufflinks/cuffdiff/) and [here](https://software.broadinstitute.org/cancer/software/genepattern/modules/docs/Cuffdiff/7) for more information.
Input arguments
- label
Case samples
- type
list:data:cufflinks:cuffquant
- required
True
- hidden
False
- label
Control samples
- type
list:data:cufflinks:cuffquant
- required
True
- hidden
False
- label
Group labels
- type
list:basic:string
- description
Define labels for each sample group.
- required
True
- hidden
False
- default
['control', 'case']
- label
Annotation (GTF/GFF3)
- type
data:annotation
- description
A transcript annotation file produced by cufflinks, cuffcompare, or other tool.
- required
True
- hidden
False
- label
Run bias detection and correction algorithm
- type
data:seq:nucleotide
- description
Provide Cufflinks with a multifasta file (genome file) via this option to instruct it to run a bias detection and correction algorithm which can significantly improve accuracy of transcript abundance estimates.
- required
False
- hidden
False
- label
Do initial estimation procedure to more accurately weight reads with multiple genome mappings
- type
basic:boolean
- required
True
- hidden
False
- default
False
- label
Create gene sets
- type
basic:boolean
- description
After calculating differential gene expressions create gene sets for up-regulated genes, down-regulated genes and all genes.
- required
True
- hidden
False
- default
False
- label
Log2 fold change threshold for gene sets
- type
basic:decimal
- description
Genes above Log2FC are considered as up-regulated and genes below -Log2FC as down-regulated.
- required
True
- hidden
!create_sets
- default
1.0
- label
FDR threshold for gene sets
- type
basic:decimal
- required
True
- hidden
!create_sets
- default
0.05
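How the two gene-set thresholds might combine is sketched below. The input layout `(gene, log2fc, fdr)`, the inclusive FDR comparison, and the choice to put every tested gene into the "all" set are assumptions made for illustration.

```python
def build_gene_sets(results, gene_logfc=1.0, gene_fdr=0.05):
    """Split genes into up-regulated, down-regulated and all genes.

    results: iterable of (gene, log2_fold_change, fdr) tuples.
    """
    gene_sets = {"up": [], "down": [], "all": []}
    for gene, log2fc, fdr in results:
        gene_sets["all"].append(gene)
        if fdr > gene_fdr:
            continue  # not significant at the chosen FDR threshold
        if log2fc > gene_logfc:
            gene_sets["up"].append(gene)
        elif log2fc < -gene_logfc:
            gene_sets["down"].append(gene)
    return gene_sets
```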
- label
Allowed FDR
- type
basic:decimal
- description
The allowed false discovery rate. The default is 0.05.
- required
True
- hidden
False
- default
0.05
- label
Library type
- type
basic:string
- description
In cases where Cufflinks cannot determine the platform and protocol used to generate input reads, you can supply this information manually, which will allow Cufflinks to infer source strand information with certain protocols. The available options are listed below. For paired-end data, we currently only support protocols where reads point towards each other: fr-unstranded - Reads from the left-most end of the fragment (in transcript coordinates) map to the transcript strand and the right-most end maps to the opposite strand; fr-firststrand - Same as above except we enforce the rule that the right-most end of the fragment (in transcript coordinates) is the first sequenced (or only sequenced for single-end reads). Equivalently, it is assumed that only the strand generated during first strand synthesis is sequenced; fr-secondstrand - Same as above except we enforce the rule that the left-most end of the fragment (in transcript coordinates) is the first sequenced (or only sequenced for single-end reads). Equivalently, it is assumed that only the strand generated during second strand synthesis is sequenced.
- required
True
- hidden
False
- default
fr-unstranded
- choices
fr-unstranded:
fr-unstranded
fr-firststrand:
fr-firststrand
fr-secondstrand:
fr-secondstrand
- label
Library normalization method
- type
basic:string
- description
You can control how library sizes (i.e. sequencing depths) are normalized in Cufflinks and Cuffdiff. Cuffdiff has several methods that require multiple libraries in order to work. Library normalization methods supported by Cufflinks work on one library at a time.
- required
True
- hidden
False
- default
geometric
- choices
geometric:
geometric
classic-fpkm:
classic-fpkm
quartile:
quartile
- label
Dispersion method
- type
basic:string
- description
Cuffdiff works by modeling the variance in fragment counts across replicates as a function of the mean fragment count across replicates. Strictly speaking, it models a quantity called dispersion - the variance present in a group of samples beyond what is expected from a simple Poisson model of RNA-Seq. You can control how Cuffdiff constructs its model of dispersion in locus fragment counts. Each condition that has replicates can receive its own model, or Cuffdiff can use a global model for all conditions. All of these policies are identical to those used by DESeq (Anders and Huber, Genome Biology, 2010).
- required
True
- hidden
False
- default
pooled
- choices
pooled:
pooled
per-condition:
per-condition
blind:
blind
poisson:
poisson
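To make the notion of dispersion concrete, the sketch below uses the negative binomial parametrisation associated with DESeq, in which the variance equals the mean plus the dispersion times the squared mean. The method-of-moments estimate is a textbook simplification and is not Cuffdiff's fitting procedure; both function names are hypothetical.

```python
from statistics import mean, variance


def nb_variance(mu, dispersion):
    """Variance under a negative binomial model: Poisson part plus excess."""
    return mu + dispersion * mu ** 2


def moment_dispersion(counts):
    """Rough method-of-moments dispersion estimate from replicate counts.

    Requires at least two replicates; negative estimates are clipped to 0.
    """
    mu = mean(counts)
    if mu == 0:
        return 0.0
    excess = variance(counts) - mu
    return max(0.0, excess / mu ** 2)
```

A dispersion of 0 reduces the model to the Poisson case, where the variance equals the mean.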
Output results
- label
Differential expression
- type
basic:file
- required
True
- hidden
False
- label
Results table (JSON)
- type
basic:json
- required
True
- hidden
False
- label
Results table (file)
- type
basic:file
- required
True
- hidden
False
- label
Differential expression (transcript level)
- type
basic:file
- required
True
- hidden
False
- label
Differential expression (primary transcript)
- type
basic:file
- required
True
- hidden
False
- label
Differential expression (coding sequence)
- type
basic:file
- required
True
- hidden
False
- label
Cuffdiff output
- type
basic:file
- required
True
- hidden
False
- label
Gene ID database
- type
basic:string
- required
True
- hidden
False
- label
Species
- type
basic:string
- required
True
- hidden
False
- label
Build
- type
basic:string
- required
True
- hidden
False
- label
Feature type
- type
basic:string
- required
True
- hidden
False
Cufflinks 2.2¶
-
data:cufflinks:cufflinks
cufflinks
(data:alignment:bam alignment, data:annotation annotation, data:seq:nucleotide genome, data:annotation:gtf mask_file, basic:string library_type, basic:string annotation_usage, basic:boolean multi_read_correct)[Source: v3.1.1]
Cufflinks assembles transcripts, estimates their abundances, and tests for differential expression and regulation in RNA-Seq samples. It accepts aligned RNA-Seq reads and assembles the alignments into a parsimonious set of transcripts. Cufflinks then estimates the relative abundances of these transcripts based on how many reads support each one, taking into account biases in library preparation protocols. See [here](http://cole-trapnell-lab.github.io/cufflinks/) for more information.
Input arguments
- label
Aligned reads
- type
data:alignment:bam
- label
Annotation (GTF/GFF3)
- type
data:annotation
- required
False
- label
Run bias detection and correction algorithm
- type
data:seq:nucleotide
- description
Provide Cufflinks with a multifasta file (genome file) via this option to instruct it to run a bias detection and correction algorithm which can significantly improve accuracy of transcript abundance estimates.
- required
False
- label
Mask file
- type
data:annotation:gtf
- description
Ignore all reads that could have come from transcripts in this GTF file. We recommend including any annotated rRNA, mitochondrial transcripts, and other abundant transcripts you wish to ignore in your analysis in this file. Due to variable efficiency of mRNA enrichment methods and rRNA depletion kits, masking these transcripts often improves the overall robustness of transcript abundance estimates.
- required
False
- label
Library type
- type
basic:string
- description
In cases where Cufflinks cannot determine the platform and protocol used to generate input reads, you can supply this information manually, which will allow Cufflinks to infer source strand information with certain protocols. The available options are listed below. For paired-end data, we currently only support protocols where reads point towards each other: fr-unstranded - Reads from the left-most end of the fragment (in transcript coordinates) map to the transcript strand, and the right-most end maps to the opposite strand; fr-firststrand - Same as above except we enforce the rule that the right-most end of the fragment (in transcript coordinates) is the first sequenced (or only sequenced for single-end reads). Equivalently, it is assumed that only the strand generated during first strand synthesis is sequenced; fr-secondstrand - Same as above except we enforce the rule that the left-most end of the fragment (in transcript coordinates) is the first sequenced (or only sequenced for single-end reads). Equivalently, it is assumed that only the strand generated during second strand synthesis is sequenced.
- default
fr-unstranded
- choices
fr-unstranded:
fr-unstranded
fr-firststrand:
fr-firststrand
fr-secondstrand:
fr-secondstrand
- label
Instruct Cufflinks how to use the provided annotation (GFF/GTF) file
- type
basic:string
- description
--GTF-guide - tells Cufflinks to use the supplied reference annotation (GFF) to guide RABT assembly. Reference transcripts will be tiled with faux-reads to provide additional information in assembly. Output will include all reference transcripts as well as any novel genes and isoforms that are assembled. --GTF - tells Cufflinks to use the supplied reference annotation (a GFF file) to estimate isoform expression. It will not assemble novel transcripts, and the program will ignore alignments not structurally compatible with any reference transcript.
- default
--GTF-guide
- choices
Use supplied reference annotation to guide RABT assembly (--GTF-guide):
--GTF-guide
Use supplied reference annotation to estimate isoform expression (--GTF):
--GTF
- label
Do initial estimation procedure to more accurately weight reads with multiple genome mappings
- type
basic:boolean
- description
Run an initial estimation procedure that weights reads mapping to multiple locations more accurately.
- default
False
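The input arguments above map naturally onto Cufflinks command-line options. The following is a minimal sketch of that mapping, not the process's actual implementation: the function name `build_cufflinks_args` and the file paths are hypothetical, and it is assumed that the optional inputs are simply omitted from the command when not provided.

```python
from typing import List, Optional

# Choices listed in the process definition above.
LIBRARY_TYPES = ("fr-unstranded", "fr-firststrand", "fr-secondstrand")
ANNOTATION_USAGE = ("--GTF-guide", "--GTF")


def build_cufflinks_args(
    alignment: str,
    annotation: Optional[str] = None,
    genome: Optional[str] = None,
    mask_file: Optional[str] = None,
    library_type: str = "fr-unstranded",
    annotation_usage: str = "--GTF-guide",
    multi_read_correct: bool = False,
) -> List[str]:
    """Assemble a Cufflinks argument list (hypothetical helper)."""
    if library_type not in LIBRARY_TYPES:
        raise ValueError(f"Unsupported library type: {library_type}")
    if annotation_usage not in ANNOTATION_USAGE:
        raise ValueError(f"Unsupported annotation usage: {annotation_usage}")

    args = ["cufflinks", "--library-type", library_type]
    if annotation is not None:
        # --GTF-guide guides RABT assembly; --GTF only quantifies known isoforms.
        args += [annotation_usage, annotation]
    if genome is not None:
        # Supplying a genome enables bias detection and correction.
        args += ["--frag-bias-correct", genome]
    if mask_file is not None:
        args += ["--mask-file", mask_file]
    if multi_read_correct:
        args.append("--multi-read-correct")
    args.append(alignment)
    return args


if __name__ == "__main__":
    print(" ".join(build_cufflinks_args("sample.bam", annotation="genes.gtf")))
```

Building the command as a list rather than a single string avoids shell quoting problems when file names contain spaces.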
Output results
- label
Assembled transcript isoforms
- type
basic:file
- label
Isoforms FPKM tracking
- type
basic:file
- label
Genes FPKM tracking
- type
basic:file
- label
Skipped loci
- type
basic:file
- label
Gene ID database
- type
basic:string
- label
Species
- type
basic:string
- label
Build
- type
basic:string
Cuffmerge¶
-
data:annotation:cuffmerge
cuffmerge
(list:data:cufflinks:cufflinks expressions, list:data:annotation:gtf gtf, data:annotation gff, data:seq:nucleotide genome, basic:integer threads)[Source: v2.1.1]
Cufflinks includes a script called Cuffmerge that you can use to merge together several Cufflinks assemblies. It also handles running Cuffcompare for you, and automatically filters a number of transfrags that are probably artifacts. The main purpose of Cuffmerge is to make it easier to make an assembly GTF file suitable for use with Cuffdiff. See [here](http://cole-trapnell-lab.github.io/cufflinks/cuffmerge/) for more information.
Input arguments
- label
Cufflinks transcripts (GTF)
- type
list:data:cufflinks:cufflinks
- required
False
- label
Annotation files (GTF)
- type
list:data:annotation:gtf
- description
Annotation files you wish to merge together with Cufflinks-produced annotation files (e.g. upload Cufflinks annotation GTF file).
- required
False
- label
Reference annotation (GTF/GFF3)
- type
data:annotation
- description
An optional “reference” annotation GTF. The input assemblies are merged together with the reference GTF and included in the final output.
- required
False
- label
Reference genome
- type
data:seq:nucleotide
- description
This argument should point to the genomic DNA sequences for the reference. If a directory, it should contain one fasta file per contig. If a multifasta file, all contigs should be present. The merge script will pass this option to cuffcompare, which will use the sequences to assist in classifying transfrags and excluding artifacts (e.g. repeats). For example, Cufflinks transcripts consisting mostly of lower-case bases are classified as repeats. Note that <seq_dir> must contain one fasta file per reference chromosome, and each file must be named after the chromosome, and have a .fa or .fasta extension.
- required
False
- label
Use this many processor threads
- type
basic:integer
- description
Use this many threads to align reads. The default is 1.
- default
1
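Cuffmerge reads the assemblies to merge from a plain-text list with one GTF path per line. The sketch below shows how the two list inputs above could be combined into such a list and how the remaining inputs could map onto Cuffmerge options. The helper names and file paths are hypothetical, and whether the process combines the inputs in exactly this way is an assumption.

```python
from typing import List, Optional, Sequence


def assembly_list_text(
    cufflinks_gtfs: Sequence[str], annotation_gtfs: Sequence[str] = ()
) -> str:
    """Return the contents of a Cuffmerge assembly list, one GTF path per line."""
    paths = list(cufflinks_gtfs) + list(annotation_gtfs)
    if not paths:
        raise ValueError("At least one GTF file is needed to merge.")
    return "\n".join(paths) + "\n"


def build_cuffmerge_args(
    assembly_list: str,
    ref_gtf: Optional[str] = None,
    ref_sequence: Optional[str] = None,
    threads: int = 1,
) -> List[str]:
    """Assemble a Cuffmerge argument list (hypothetical helper)."""
    if threads < 1:
        raise ValueError("threads must be a positive integer")
    args = ["cuffmerge", "--num-threads", str(threads)]
    if ref_gtf is not None:
        args += ["--ref-gtf", ref_gtf]
    if ref_sequence is not None:
        args += ["--ref-sequence", ref_sequence]
    args.append(assembly_list)
    return args


if __name__ == "__main__":
    print(assembly_list_text(["s1/transcripts.gtf", "s2/transcripts.gtf"]), end="")
    print(" ".join(build_cuffmerge_args("assemblies.txt", ref_gtf="genes.gtf", threads=4)))
```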
Output results
- label
Merged GTF file
- type
basic:file
- label
Gene ID database
- type
basic:string
- label
Species
- type
basic:string
- label
Build
- type
basic:string
Cuffnorm¶
-
data:cuffnorm
cuffnorm
(list:data:cufflinks:cuffquant cuffquant, data:annotation annotation, basic:boolean useERCC)[Source: v2.3.1]
Cufflinks includes a program, Cuffnorm, that you can use to generate tables of expression values that are properly normalized for library size. Cuffnorm takes a GTF2/GFF3 file of transcripts as input, along with two or more SAM, BAM, or CXB files for two or more samples. See [here](http://cole-trapnell-lab.github.io/cufflinks/cuffnorm/) for more information. Replicate relation needs to be defined for Cuffnorm to account for replicates. If the replicate relation is not defined, each sample will be treated individually.
Input arguments
- label
Cuffquant expression file
- type
list:data:cufflinks:cuffquant
- label
Annotation (GTF/GFF3)
- type
data:annotation
- description
A transcript annotation file produced by cufflinks, cuffcompare, or other source.
- label
ERCC spike-in normalization
- type
basic:boolean
- description
Use ERCC spike-in controls for normalization.
- default
False
Output results
- label
Genes count
- type
basic:file
- label
Genes FPKM
- type
basic:file
- label
Genes attr table
- type
basic:file
- label
Isoform count
- type
basic:file
- label
Isoform FPKM
- type
basic:file
- label
Isoform attr table
- type
basic:file
- label
CDS count
- type
basic:file
- label
CDS FPKM
- type
basic:file
- label
CDS attr table
- type
basic:file
- label
TSS groups count
- type
basic:file
- label
TSS groups FPKM
- type
basic:file
- label
TSS attr table
- type
basic:file
- label
Run info
- type
basic:file
- label
FPKM exp scatter plot
- type
basic:file
- label
Boxplot
- type
basic:file
- label
FPKM exp raw
- type
basic:file
- label
Replicate correlations plot
- type
basic:file
- label
FPKM means
- type
basic:file
- label
Exp FPKM means
- type
basic:file
- label
FPKM exp scatter normalized plot
- type
basic:file
- required
False
- label
FPKM exp normalized
- type
basic:file
- required
False
- label
Spike raw
- type
basic:file
- required
False
- label
Spike normalized
- type
basic:file
- required
False
- label
All R normalization data
- type
basic:file
- label
Gene ID database
- type
basic:string
- label
Species
- type
basic:string
- label
Build
- type
basic:string
Cuffquant 2.2¶
-
data:cufflinks:cuffquant
cuffquant
(data:alignment:bam alignment, data:annotation annotation, data:seq:nucleotide genome, data:annotation:gtf mask_file, basic:string library_type, basic:boolean multi_read_correct)[Source: v2.1.1]
Cuffquant allows you to compute the gene and transcript expression profiles and save these profiles to files that you can analyze later with Cuffdiff or Cuffnorm. See [here](http://cole-trapnell-lab.github.io/cufflinks/manual/) for more information.
Input arguments
- label
Aligned reads
- type
data:alignment:bam
- label
Annotation (GTF/GFF3)
- type
data:annotation
- label
Run bias detection and correction algorithm
- type
data:seq:nucleotide
- description
Provide Cufflinks with a multifasta file (genome file) via this option to instruct it to run a bias detection and correction algorithm which can significantly improve accuracy of transcript abundance estimates.
- required
False
- label
Mask file
- type
data:annotation:gtf
- description
Ignore all reads that could have come from transcripts in this GTF file. We recommend including any annotated rRNA, mitochondrial transcripts, and other abundant transcripts you wish to ignore in your analysis in this file. Due to variable efficiency of mRNA enrichment methods and rRNA depletion kits, masking these transcripts often improves the overall robustness of transcript abundance estimates.
- required
False
- label
Library type
- type
basic:string
- description
In cases where Cufflinks cannot determine the platform and protocol used to generate input reads, you can supply this information manually, which will allow Cufflinks to infer source strand information with certain protocols. The available options are listed below. For paired-end data, we currently only support protocols where reads point towards each other: fr-unstranded - Reads from the left-most end of the fragment (in transcript coordinates) map to the transcript strand, and the right-most end maps to the opposite strand; fr-firststrand - Same as above except we enforce the rule that the right-most end of the fragment (in transcript coordinates) is the first sequenced (or only sequenced for single-end reads). Equivalently, it is assumed that only the strand generated during first strand synthesis is sequenced; fr-secondstrand - Same as above except we enforce the rule that the left-most end of the fragment (in transcript coordinates) is the first sequenced (or only sequenced for single-end reads). Equivalently, it is assumed that only the strand generated during second strand synthesis is sequenced.
- default
fr-unstranded
- choices
fr-unstranded:
fr-unstranded
fr-firststrand:
fr-firststrand
fr-secondstrand:
fr-secondstrand
- label
Do initial estimation procedure to more accurately weight reads with multiple genome mappings
- type
basic:boolean
- description
Run an initial estimation procedure that weights reads mapping to multiple locations more accurately.
- default
False
Output results
- label
Abundances (.cxb)
- type
basic:file
- label
Gene ID database
- type
basic:string
- label
Species
- type
basic:string
- label
Build
- type
basic:string
Cuffquant results¶
-
data:cufflinks:cuffquant
upload-cxb
(basic:file src, basic:string source, basic:string species, basic:string build, basic:string feature_type)[Source: v1.3.2]
Upload Cuffquant results file (.cxb)
Input arguments
- label
Cuffquant file
- type
basic:file
- description
Upload Cuffquant results file. Supported extension: *.cxb
- required
True
- validate_regex
\.(cxb)$
- label
Gene ID database
- type
basic:string
- choices
AFFY:
AFFY
DICTYBASE:
DICTYBASE
ENSEMBL:
ENSEMBL
NCBI:
NCBI
UCSC:
UCSC
- label
Species
- type
basic:string
- description
Species Latin name.
- choices
Homo sapiens:
Homo sapiens
Mus musculus:
Mus musculus
Rattus norvegicus:
Rattus norvegicus
Dictyostelium discoideum:
Dictyostelium discoideum
Odocoileus virginianus texanus:
Odocoileus virginianus texanus
Solanum tuberosum:
Solanum tuberosum
- label
Build
- type
basic:string
- label
Feature type
- type
basic:string
- default
gene
- choices
gene:
gene
transcript:
transcript
exon:
exon
Output results
- label
Cuffquant results
- type
basic:file
- label
Gene ID database
- type
basic:string
- label
Species
- type
basic:string
- label
Build
- type
basic:string
- label
Feature type
- type
basic:string
Custom master file¶
-
data:masterfile:amplicon
upload-master-file
(basic:file src, basic:string panel_name)[Source: v1.2.1]
This should be a tab delimited file (*.txt). Please check the [example](http://genial.is/amplicon-masterfile) file for details.
Input arguments
- label
Master file
- type
basic:file
- validate_regex
\.txt(|\.gz|\.bz2|\.tgz|\.tar\.gz|\.tar\.bz2|\.zip|\.rar|\.7z)$
- label
Panel name
- type
basic:string
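The `validate_regex` value above accepts a `.txt` file name, optionally followed by one of several compression or archive suffixes. A small sketch of how such a pattern behaves is shown below. The pattern is copied from the field definition, while the helper name and the use of `re.search` against the file name are assumptions about how validation is applied.

```python
import re

# Pattern copied verbatim from the "Master file" field definition.
MASTER_FILE_REGEX = re.compile(
    r"\.txt(|\.gz|\.bz2|\.tgz|\.tar\.gz|\.tar\.bz2|\.zip|\.rar|\.7z)$"
)


def is_valid_master_file(filename: str) -> bool:
    """Return True if the file name ends in .txt, optionally compressed."""
    return MASTER_FILE_REGEX.search(filename) is not None


if __name__ == "__main__":
    for name in ("panel.txt", "panel.txt.gz", "panel.txt.tar.bz2", "panel.tsv", "panel.txt.xz"):
        print(name, is_valid_master_file(name))
```

The empty first alternative in the group is what allows an uncompressed `.txt` file to pass.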
Output results
- label
Master file
- type
basic:file
- label
BED file (merged targets)
- type
basic:file
- label
BED file (nonmerged targets)
- type
basic:file
- label
BED file (overlap-free targets)
- type
basic:file
- label
Primers
- type
basic:file
- label
Panel name
- type
basic:string
Cut & Run¶
-
data:workflow:cutnrun
workflow-cutnrun
(data:reads:fastq:paired reads, basic:integer quality, basic:integer nextseq, basic:string phred, basic:integer min_length, basic:integer max_n, basic:boolean retain_unpaired, basic:integer unpaired_len_1, basic:integer unpaired_len_2, basic:integer clip_r1, basic:integer clip_r2, basic:integer three_prime_r1, basic:integer three_prime_r2, list:basic:string adapter, list:basic:string adapter_2, data:seq:nucleotide adapter_file_1, data:seq:nucleotide adapter_file_2, basic:string universal_adapter, basic:integer stringency, basic:decimal error_rate, basic:integer trim_5, basic:integer trim_3, data:index:bowtie2 genome, basic:string mode, basic:string speed, basic:boolean discordantly, basic:boolean rep_se, basic:integer minins, basic:integer maxins, basic:boolean no_overlap, basic:boolean dovetail, basic:boolean no_unal, data:index:bowtie2 genome, basic:string mode, basic:string speed, basic:boolean discordantly, basic:boolean rep_se, basic:integer minins, basic:integer maxins, basic:boolean no_overlap, basic:boolean dovetail, basic:boolean no_unal, basic:string format, basic:decimal pvalue, basic:string duplicates, basic:boolean bedgraph, basic:integer min_frag_length, basic:integer max_frag_length, basic:decimal scale, basic:integer bw_binsize, basic:integer bw_timeout)[Source: v1.2.1]
Analysis of samples processed for high-resolution mapping of DNA binding sites using a targeted nuclease strategy. The process is named CUT&RUN, which stands for Cleavage Under Target and Release Using Nuclease. The workflow trims the reads with Trim Galore and aligns them with Bowtie2 to the target species genome as well as to a spike-in genome. Aligned reads are processed to produce bigWig files that can be viewed in a genome browser. Peaks are called using MACS2. Fragmenting of reads is performed using alignmentSieve from the deepTools package.
Input arguments
- label
Input reads
- type
data:reads:fastq:paired
- label
Quality cutoff
- type
basic:integer
- description
Trim low-quality ends from reads based on Phred score.
- required
False
- label
NextSeq/NovaSeq trim cutoff
- type
basic:integer
- description
NextSeq/NovaSeq-specific quality trimming. Trims also dark cycles appearing as high-quality G bases. This will set a specific quality cutoff, but qualities of G bases are ignored. This can not be used with Quality cutoff and will override it.
- required
False
- label
Phred score encoding
- type
basic:string
- description
Use either ASCII+33 quality scores as Phred scores (Sanger/Illumina 1.9+ encoding) or ASCII+64 quality scores (Illumina 1.5 encoding) for quality trimming.
- default
--phred33
- choices
ASCII+33:
--phred33
ASCII+64:
--phred64
- label
Minimum length after trimming
- type
basic:integer
- description
Discard reads that became shorter than selected length because of either quality or adapter trimming. Both reads of a read-pair need to be longer than specified length to be printed out to validated paired-end files. If only one read became too short there is the possibility of keeping such unpaired single-end reads with Retain unpaired. A value of 0 disables filtering based on length.
- default
20
- label
Maximum number of Ns
- type
basic:integer
- description
A read exceeding this limit will result in the entire pair being removed from the trimmed output files.
- required
False
- label
Retain unpaired reads after trimming
- type
basic:boolean
- description
If only one of the two paired-end reads became too short, the longer read will be written.
- default
False
- label
Unpaired read length cutoff of mate 1
- type
basic:integer
- hidden
!quality_trim.retain_unpaired
- default
35
- label
Unpaired read length cutoff for mate 2
- type
basic:integer
- hidden
!quality_trim.retain_unpaired
- default
35
- label
Trim bases from 5’ end of read 1
- type
basic:integer
- description
This may be useful if the qualities were very poor, or if there is some sort of unwanted bias at the 5’ end.
- required
False
- label
Trim bases from 5’ end of read 2
- type
basic:integer
- description
This may be useful if the qualities were very poor, or if there is some sort of unwanted bias at the 5’ end. For paired-end bisulfite sequencing, it is recommended to remove the first few bp because the end-repair reaction may introduce a bias towards low methylation.
- required
False
- label
Trim bases from 3’ end of read 1
- type
basic:integer
- description
Remove bases from the 3’ end of read 1 after adapter/quality trimming has been performed. This may remove some unwanted bias from the 3’ end that is not directly related to adapter sequence or basecall quality.
- required
False
- label
Trim bases from 3’ end of read 2
- type
basic:integer
- description
Remove bases from the 3’ end of read 2 after adapter/quality trimming has been performed. This may remove some unwanted bias from the 3’ end that is not directly related to adapter sequence or basecall quality.
- required
False
- label
Read 1 adapter sequence
- type
list:basic:string
- description
Adapter sequences to be trimmed. Also see universal adapters field for predefined adapters. This is mutually exclusive with read 1 adapters file and universal adapters.
- required
False
- label
Read 2 adapter sequence
- type
list:basic:string
- description
Optional adapter sequence to be trimmed off read 2 of paired-end files. This is mutually exclusive with read 2 adapters file and universal adapters.
- required
False
- label
Read 1 adapters file
- type
data:seq:nucleotide
- description
This is mutually exclusive with read 1 adapters and universal adapters.
- required
False
- label
Read 2 adapters file
- type
data:seq:nucleotide
- description
This is mutually exclusive with read 2 adapters and universal adapters.
- required
False
- label
Universal adapters
- type
basic:string
- description
Instead of default detection use specific adapters. Use 13bp of the Illumina universal adapter, 12bp of the Nextera adapter or 12bp of the Illumina Small RNA 3’ Adapter. Selecting to trim smallRNA adapters will also lower the length value to 18bp. If the smallRNA libraries are paired-end then read 2 adapter will be set to the Illumina small RNA 5’ adapter automatically (GATCGTCGGACT) unless defined explicitly. This is mutually exclusive with manually defined adapters and adapter files.
- required
False
- choices
Illumina:
--illumina
Nextera:
--nextera
Illumina small RNA:
--small_rna
- label
Overlap with adapter sequence required to trim
- type
basic:integer
- description
Defaults to a very stringent setting of 1, i.e. even a single base pair of overlapping sequence will be trimmed off the 3’ end of any read.
- default
1
- label
Maximum allowed error rate
- type
basic:decimal
- description
Number of errors divided by the length of the matching region. Default value of 0.1.
- default
0.1
- label
Hard trim sequence from 3’ end
- type
basic:integer
- description
Instead of performing adapter-/quality trimming, this option will simply hard-trim sequences to the given number of bp from the 3’ end. This is incompatible with other hard trimming options.
- required
False
- label
Hard trim sequences from 5’ end
- type
basic:integer
- description
Instead of performing adapter-/quality trimming, this option will simply hard-trim sequences to the given number of bp from the 5’ end. This is incompatible with other hard trimming options.
- required
False
- label
Species genome
- type
data:index:bowtie2
- label
Alignment mode
- type
basic:string
- description
End to end: Bowtie 2 requires that the entire read align from one end to the other, without any trimming (or “soft clipping”) of characters from either end. Local: Bowtie 2 does not require that the entire read align from one end to the other. Rather, some characters may be omitted (“soft clipped”) from the ends in order to achieve the greatest possible alignment score.
- default
--local
- choices
end to end mode:
--end-to-end
local:
--local
- label
Speed vs. Sensitivity
- type
basic:string
- description
A quick setting for aligning fast or accurately. This option is a shortcut for the following parameters. For --end-to-end: --very-fast: -D 5 -R 1 -N 0 -L 22 -i S,0,2.50; --fast: -D 10 -R 2 -N 0 -L 22 -i S,0,2.50; --sensitive: -D 15 -R 2 -N 0 -L 22 -i S,1,1.15 (default); --very-sensitive: -D 20 -R 3 -N 0 -L 20 -i S,1,0.50. For --local: --very-fast-local: -D 5 -R 1 -N 0 -L 25 -i S,1,2.00; --fast-local: -D 10 -R 2 -N 0 -L 22 -i S,1,1.75; --sensitive-local: -D 15 -R 2 -N 0 -L 20 -i S,1,0.75 (default); --very-sensitive-local: -D 20 -R 3 -N 0 -L 20 -i S,1,0.50.
- default
--very-sensitive
- choices
Very fast:
--very-fast
Fast:
--fast
Sensitive:
--sensitive
Very sensitive:
--very-sensitive
- label
Report discordantly matched read
- type
basic:boolean
- description
If both mates have unique alignments, but the alignments do not match paired-end expectations (orientation and relative distance) then alignment will be reported. Useful for detecting structural variations.
- default
True
- label
Report single ended
- type
basic:boolean
- description
If a paired alignment cannot be found, Bowtie2 tries to find alignments for the individual mates. Default is true (--no-mixed).
- default
True
- label
Minimal distance
- type
basic:integer
- description
The minimum fragment length (--minins) for valid paired-end alignments. Value 0 imposes no minimum.
- default
10
- label
Maximal distance
- type
basic:integer
- description
The maximum fragment length (--maxins) for valid paired-end alignments.
- default
700
- label
Not concordant when mates overlap
- type
basic:boolean
- description
When true, it is considered not concordant when mates overlap at all. Default is true (--no-overlap).
- default
False
- label
Dovetail
- type
basic:boolean
- description
If the mates “dovetail”, that is if one mate alignment extends past the beginning of the other such that the wrong mate begins upstream, consider that to be concordant. Default: mates cannot dovetail in a concordant alignment. If true, parameter --dovetail is turned on.
- default
False
- label
Suppress SAM records for unaligned reads
- type
basic:boolean
- description
When true, suppress SAM records for unaligned reads. Default is true (--no-unal).
- default
True
- label
Spike-in genome
- type
data:index:bowtie2
- label
Alignment mode
- type
basic:string
- description
End to end: Bowtie 2 requires that the entire read align from one end to the other, without any trimming (or “soft clipping”) of characters from either end. Local: Bowtie 2 does not require that the entire read align from one end to the other. Rather, some characters may be omitted (“soft clipped”) from the ends in order to achieve the greatest possible alignment score.
- default
--local
- choices
end to end mode:
--end-to-end
local:
--local
- label
Speed vs. Sensitivity
- type
basic:string
- description
A quick setting for aligning fast or accurately. This option is a shortcut for the following parameters. For --end-to-end: --very-fast: -D 5 -R 1 -N 0 -L 22 -i S,0,2.50; --fast: -D 10 -R 2 -N 0 -L 22 -i S,0,2.50; --sensitive: -D 15 -R 2 -N 0 -L 22 -i S,1,1.15 (default); --very-sensitive: -D 20 -R 3 -N 0 -L 20 -i S,1,0.50. For --local: --very-fast-local: -D 5 -R 1 -N 0 -L 25 -i S,1,2.00; --fast-local: -D 10 -R 2 -N 0 -L 22 -i S,1,1.75; --sensitive-local: -D 15 -R 2 -N 0 -L 20 -i S,1,0.75 (default); --very-sensitive-local: -D 20 -R 3 -N 0 -L 20 -i S,1,0.50.
- default
--very-sensitive
- choices
Very fast:
--very-fast
Fast:
--fast
Sensitive:
--sensitive
Very sensitive:
--very-sensitive
- label
Report discordantly matched read
- type
basic:boolean
- description
If both mates have unique alignments, but the alignments do not match paired-end expectations (orientation and relative distance) then alignment will be reported. Useful for detecting structural variations.
- default
True
- label
Report single ended
- type
basic:boolean
- description
If a paired alignment cannot be found, Bowtie2 tries to find alignments for the individual mates. Default is true (--no-mixed).
- default
True
- label
Minimal distance
- type
basic:integer
- description
The minimum fragment length (--minins) for valid paired-end alignments. Value 0 imposes no minimum.
- default
10
- label
Maximal distance
- type
basic:integer
- description
The maximum fragment length (--maxins) for valid paired-end alignments.
- default
700
- label
Not concordant when mates overlap
- type
basic:boolean
- description
When true, it is considered not concordant when mates overlap at all. Default is true (--no-overlap).
- default
True
- label
Dovetail
- type
basic:boolean
- description
If the mates “dovetail”, that is if one mate alignment extends past the beginning of the other such that the wrong mate begins upstream, consider that to be concordant. Default: mates cannot dovetail in a concordant alignment. If true, parameter --dovetail is turned on.
- default
False
- label
Suppress SAM records for unaligned reads
- type
basic:boolean
- description
When true, suppress SAM records for unaligned reads. Default is true (--no-unal).
- default
True
- label
Format of tag file
- type
basic:string
- description
This specifies the format of input files. For paired-end data the format dictates how MACS2 will treat mates. If the selected format is BAM, MACS2 will only keep the left mate (5’ end) tag. However, when format BAMPE is selected, MACS2 will use the actual insert sizes of pairs of reads to build the fragment pileup, instead of building a bimodal distribution of plus and minus strand reads to predict fragment size.
- required
False
- default
BAMPE
- choices
BAM:
BAM
BAMPE:
BAMPE
- label
P-value cutoff
- type
basic:decimal
- description
The p-value cutoff.
- required
False
- default
0.001
- label
Number of duplicates
- type
basic:string
- description
It controls the MACS behavior towards duplicate tags at the exact same location – the same coordinates and the same strand. The ‘auto’ option makes MACS calculate the maximum tags at the exact same location based on binomial distribution using 1e-5 as pvalue cutoff and the ‘all’ option keeps all the tags. If an integer is given, at most this number of tags will be kept at the same location. The default is to keep one tag at the same location.
- default
all
- choices
1:
1
auto:
auto
all:
all
- label
Save fragment pileup and control lambda
- type
basic:boolean
- description
If this flag is on, MACS will store the fragment pileup, control lambda, -log10(pvalue) and -log10(qvalue) scores in bedGraph files. The bedGraph files will be stored in current directory named NAME+’_treat_pileup.bdg’ for treatment data, NAME+’_control_lambda.bdg’ for local lambda values from control, NAME+’_treat_pvalue.bdg’ for Poisson pvalue scores (in -log10(pvalue) form), and NAME+’_treat_qvalue.bdg’ for q-value scores from Benjamini-Hochberg-Yekutieli procedure.
- default
True
- label
Minimum fragment length
- type
basic:integer
- description
The minimum fragment length needed for read/pair inclusion. This option is primarily useful in ATACseq experiments, for filtering mono- or di-nucleosome fragments. Default is 0.
- default
0
- label
Maximum fragment length
- type
basic:integer
- description
The maximum fragment length needed for read/pair inclusion. A value of 0 indicates no limit. Default is 0.
- default
0
- label
Scale factor
- type
basic:decimal
- description
Magnitude of the scale factor. The scaling factor is calculated by dividing the scale with the number of features in BEDPE (scale/(number of features)).
- default
10000
- label
BigWig bin size
- type
basic:integer
- description
Size of the bins, in bases, for the output of the bigwig/bedgraph file. Default is 50.
- default
50
- label
BigWig timeout
- type
basic:integer
- description
Number of seconds before calculation of BigWig file is aborted. Default is 3600 seconds (1 hour).
- default
3600
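Two of the numeric inputs above have simple arithmetic behind them: the scale factor (scale divided by the number of features in the BEDPE file) and the fragment length limits (where a maximum of 0 means no limit). The following sketch illustrates both. The function names are hypothetical, and treating the length bounds as inclusive is an assumption, not something stated in the process definition.

```python
def scaling_factor(scale: float, n_features: int) -> float:
    """Compute scale / (number of features), as described for 'Scale factor'."""
    if n_features <= 0:
        raise ValueError("Number of features must be positive.")
    return scale / n_features


def keep_fragment(length: int, min_len: int = 0, max_len: int = 0) -> bool:
    """Decide whether a fragment passes the length limits.

    A max_len of 0 means no upper limit. Bounds are assumed inclusive.
    """
    if length < min_len:
        return False
    if max_len != 0 and length > max_len:
        return False
    return True


if __name__ == "__main__":
    # With the default scale of 10000 and two million spike-in features:
    print(scaling_factor(10000, 2_000_000))
    # With both limits at their default of 0, every fragment is kept:
    print(keep_fragment(150))
    # An upper limit of 120 bp excludes a 150 bp fragment:
    print(keep_fragment(150, max_len=120))
```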
Output results
Cutadapt (3’ mRNA-seq, single-end)¶
-
data:reads:fastq:single:cutadapt:
cutadapt-3prime-single
(data:reads:fastq:single reads, basic:integer nextseq_trim, basic:integer quality_cutoff, basic:integer min_len, basic:integer min_overlap, basic:integer times)[Source: v1.2.1]
Process 3’ mRNA-seq datasets using Cutadapt tool.
Input arguments
- label
Select sample(s)
- type
data:reads:fastq:single
- required
True
- hidden
False
- label
NextSeq/NovaSeq trim
- type
basic:integer
- description
NextSeq/NovaSeq-specific quality trimming. Trims also dark cycles appearing as high-quality G bases. This option is mutually exclusive with the use of standard quality-cutoff trimming and is suitable for the use with data generated by the recent Illumina machines that utilize two-color chemistry to encode the four bases.
- required
True
- hidden
False
- default
10
- label
Quality cutoff
- type
basic:integer
- description
Trim low-quality bases from 3’ end of each read before adapter removal. The use of this option will override the use of NextSeq/NovaSeq trim option.
- required
False
- hidden
False
- label
Discard reads shorter than specified minimum length.
- type
basic:integer
- required
True
- hidden
False
- default
20
- label
Minimum overlap
- type
basic:integer
- description
Minimum overlap between adapter and read for an adapter to be found.
- required
True
- hidden
False
- default
20
- label
Remove up to a specified number of adapters from each read.
- type
basic:integer
- required
True
- hidden
False
- default
2
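The interaction between the two trimming inputs above (a quality cutoff, when set, takes the place of NextSeq/NovaSeq trimming) can be sketched as a mapping onto Cutadapt options. This is only an illustration: the function name is hypothetical, adapter sequences and input/output paths are deliberately left out because the process definition does not list them, and the real process may assemble its command differently.

```python
from typing import List, Optional


def build_cutadapt_args(
    nextseq_trim: int = 10,
    quality_cutoff: Optional[int] = None,
    min_len: int = 20,
    min_overlap: int = 20,
    times: int = 2,
) -> List[str]:
    """Assemble the trimming-related Cutadapt options (hypothetical helper)."""
    args = ["cutadapt"]
    if quality_cutoff is not None:
        # Standard quality trimming overrides NextSeq/NovaSeq trimming.
        args += ["--quality-cutoff", str(quality_cutoff)]
    else:
        args.append(f"--nextseq-trim={nextseq_trim}")
    args += [
        "--minimum-length", str(min_len),
        "--overlap", str(min_overlap),
        "--times", str(times),
    ]
    return args


if __name__ == "__main__":
    print(" ".join(build_cutadapt_args()))
    print(" ".join(build_cutadapt_args(quality_cutoff=25)))
```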
Output results
- label
Reads file.
- type
list:basic:file
- required
True
- hidden
False
- label
Cutadapt report
- type
basic:file
- required
True
- hidden
False
- label
Quality control with FastQC.
- type
list:basic:file:html
- required
True
- hidden
False
- label
Download FastQC archive.
- type
list:basic:file
- required
True
- hidden
False
Cutadapt (Corall RNA-Seq, paired-end)¶
-
data:reads:fastq:paired:cutadapt:
cutadapt-corall-paired
(data:reads:fastq:paired reads, basic:integer nextseq_trim, basic:integer quality_cutoff, basic:integer min_len, basic:integer min_overlap)[Source: v1.1.2]
Pre-process reads obtained using CORALL Total RNA-Seq Library Prep Kit. Trim UMI-tags from input reads and use Cutadapt to remove adapters and run QC filtering steps.
Input arguments
- label
Select sample(s)
- type
data:reads:fastq:paired
- required
True
- hidden
False
- label
NextSeq/NovaSeq trim
- type
basic:integer
- description
NextSeq/NovaSeq-specific quality trimming. Trims also dark cycles appearing as high-quality G bases. This option is mutually exclusive with the use of standard quality-cutoff trimming and is suitable for the use with data generated by the recent Illumina machines that utilize two-color chemistry to encode the four bases.
- required
True
- hidden
False
- default
10
- label
Quality cutoff
- type
basic:integer
- description
Trim low-quality bases from 3’ end of each read before adapter removal. The use of this option will override the use of NextSeq/NovaSeq trim option.
- required
False
- hidden
False
- label
Minimum read length
- type
basic:integer
- required
True
- hidden
False
- default
20
- label
Minimum overlap
- type
basic:integer
- description
Minimum overlap between adapter and read for an adapter to be found.
- required
True
- hidden
False
- default
20
Output results
- label
Remaining mate1 reads
- type
list:basic:file
- required
True
- hidden
False
- label
Remaining mate2 reads
- type
list:basic:file
- required
True
- hidden
False
- label
Cutadapt report
- type
basic:file
- required
True
- hidden
False
- label
Mate1 quality control with FastQC
- type
list:basic:file:html
- required
True
- hidden
False
- label
Mate2 quality control with FastQC
- type
list:basic:file:html
- required
True
- hidden
False
- label
Download mate1 FastQC archive
- type
list:basic:file
- required
True
- hidden
False
- label
Download mate2 FastQC archive
- type
list:basic:file
- required
True
- hidden
False
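The interplay between the NextSeq/NovaSeq trim and Quality cutoff inputs can be sketched as follows. This is only an illustration under stated assumptions, not the process's actual implementation: the function name `build_cutadapt_args` is hypothetical, and the only names taken from Cutadapt itself are the flags `--nextseq-trim`, `-q`, `-m`, and `-O`.

```python
def build_cutadapt_args(nextseq_trim=10, quality_cutoff=None, min_len=20, min_overlap=20):
    """Return a hypothetical Cutadapt argument list for the inputs described above."""
    # Minimum read length and minimum adapter overlap are always passed.
    args = ["-m", str(min_len), "-O", str(min_overlap)]
    if quality_cutoff is not None:
        # A user-supplied quality cutoff overrides NextSeq/NovaSeq trimming.
        args += ["-q", str(quality_cutoff)]
    else:
        # Otherwise fall back to two-color-chemistry-aware trimming.
        args += ["--nextseq-trim", str(nextseq_trim)]
    return args


print(build_cutadapt_args())
print(build_cutadapt_args(quality_cutoff=25))
```

With the defaults, only `--nextseq-trim` is emitted; as soon as a quality cutoff is supplied, `-q` replaces it, so the two trimming modes are never combined.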
Cutadapt (Corall RNA-Seq, single-end)¶
-
data:reads:fastq:single:cutadapt:
cutadapt-corall-single
(data:reads:fastq:single reads, basic:integer nextseq_trim, basic:integer quality_cutoff, basic:integer min_len, basic:integer min_overlap)[Source: v1.2.1]
Pre-process reads obtained using CORALL Total RNA-Seq Library Prep Kit. Trim UMI-tags from input reads and use Cutadapt to remove adapters and run QC filtering steps.
Input arguments
- label
Select sample(s)
- type
data:reads:fastq:single
- required
True
- hidden
False
- label
NextSeq/NovaSeq trim
- type
basic:integer
- description
NextSeq/NovaSeq-specific quality trimming. Also trims dark cycles that appear as high-quality G bases. This option is mutually exclusive with standard quality-cutoff trimming and is suited to data generated by recent Illumina machines that use two-color chemistry to encode the four bases.
- required
True
- hidden
False
- default
10
- label
Quality cutoff
- type
basic:integer
- description
Trim low-quality bases from the 3’ end of each read before adapter removal. Setting this option overrides the NextSeq/NovaSeq trim option.
- required
False
- hidden
False
- label
Minimum read length
- type
basic:integer
- required
True
- hidden
False
- default
20
- label
Minimum overlap
- type
basic:integer
- description
Minimum overlap between adapter and read for an adapter to be found.
- required
True
- hidden
False
- default
20
Output results
- label
Reads file
- type
list:basic:file
- required
True
- hidden
False
- label
Cutadapt report
- type
basic:file
- required
True
- hidden
False
- label
Quality control with FastQC
- type
list:basic:file:html
- required
True
- hidden
False
- label
Download FastQC archive
- type
list:basic:file
- required
True
- hidden
False
Cutadapt (Diagenode CATS, paired-end)¶
-
data:reads:fastq:paired:cutadapt
cutadapt-custom-paired
(data:reads:fastq:paired reads)[Source: v1.3.1]
Cutadapt process configured to be used with the Diagenode CATS kits.
Input arguments
- label
NGS reads
- type
data:reads:fastq:paired
Output results
- label
Reads file (forward)
- type
list:basic:file
- label
Reads file (reverse)
- type
list:basic:file
- label
Cutadapt report
- type
basic:file
- label
Quality control with FastQC (forward)
- type
list:basic:file:html
- label
Quality control with FastQC (reverse)
- type
list:basic:file:html
- label
Download FastQC archive (forward)
- type
list:basic:file
- label
Download FastQC archive (reverse)
- type
list:basic:file
Cutadapt (Diagenode CATS, single-end)¶
-
data:reads:fastq:single:cutadapt
cutadapt-custom-single
(data:reads:fastq:single reads)[Source: v1.3.1]
Cutadapt process configured to be used with the Diagenode CATS kits.
Input arguments
- label
NGS reads
- type
data:reads:fastq:single
Output results
- label
Reads file
- type
list:basic:file
- label
Cutadapt report
- type
basic:file
- label
Quality control with FastQC
- type
list:basic:file:html
- label
Download FastQC archive
- type
list:basic:file
Cutadapt (paired-end)¶
-
data:reads:fastq:paired:cutadapt
cutadapt-paired
(data:reads:fastq:paired reads, data:seq:nucleotide mate1_5prime_file, data:seq:nucleotide mate1_3prime_file, data:seq:nucleotide mate2_5prime_file, data:seq:nucleotide mate2_3prime_file, list:basic:string mate1_5prime_seq, list:basic:string mate1_3prime_seq, list:basic:string mate2_5prime_seq, list:basic:string mate2_3prime_seq, basic:integer times, basic:decimal error_rate, basic:integer min_overlap, basic:boolean match_read_wildcards, basic:integer nextseq_trim, basic:integer leading, basic:integer trailing, basic:integer crop, basic:integer headcrop, basic:integer minlen, basic:integer max_n, basic:string pair_filter)[Source: v2.4.1]
Cutadapt finds and removes adapter sequences, primers, poly-A tails and other types of unwanted sequence from high-throughput sequencing reads. More information about Cutadapt can be found [here](http://cutadapt.readthedocs.io/en/stable/).
Input arguments
- label
Select sample(s)
- type
data:reads:fastq:paired
- label
5 prime adapter file for Mate 1
- type
data:seq:nucleotide
- required
False
- label
3 prime adapter file for Mate 1
- type
data:seq:nucleotide
- required
False
- label
5 prime adapter file for Mate 2
- type
data:seq:nucleotide
- required
False
- label
3 prime adapter file for Mate 2
- type
data:seq:nucleotide
- required
False
- label
5 prime adapter sequence for Mate 1
- type
list:basic:string
- required
False
- label
3 prime adapter sequence for Mate 1
- type
list:basic:string
- required
False
- label
5 prime adapter sequence for Mate 2
- type
list:basic:string
- required
False
- label
3 prime adapter sequence for Mate 2
- type
list:basic:string
- required
False
- label
Times
- type
basic:integer
- description
Remove up to COUNT adapters from each read.
- default
1
- label
Error rate
- type
basic:decimal
- description
Maximum allowed error rate (no. of errors divided by the length of the matching region).
- default
0.1
- label
Minimal overlap
- type
basic:integer
- description
Minimum overlap for an adapter match.
- default
3
- label
Match read wildcards
- type
basic:boolean
- description
Interpret IUPAC wildcards in reads.
- default
False
- label
NextSeq-specific quality trimming
- type
basic:integer
- description
NextSeq-specific quality trimming (each read). Also trims dark cycles that appear as high-quality G bases. This option is mutually exclusive with regular (-q) quality trimming.
- required
False
- label
Quality on 5 prime
- type
basic:integer
- description
Remove low-quality bases from the 5 prime end. Specifies the minimum quality required to keep a base.
- required
False
- label
Quality on 3 prime
- type
basic:integer
- description
Remove low-quality bases from the 3 prime end. Specifies the minimum quality required to keep a base.
- required
False
- label
Crop
- type
basic:integer
- description
Cut the specified number of bases from the end of the reads.
- required
False
- label
Headcrop
- type
basic:integer
- description
Cut the specified number of bases from the start of the reads.
- required
False
- label
Min length
- type
basic:integer
- description
Drop the read if it is below a specified length.
- required
False
- label
Max number of N-s
- type
basic:integer
- description
Discard reads having more ‘N’ bases than specified.
- required
False
- label
Which of the reads have to match the filtering criterion
- type
basic:string
- description
Which of the reads in a paired-end read have to match the filtering criterion in order for the pair to be filtered.
- default
any
- choices
Any of the reads in a paired-end read have to match the filtering criterion:
any
Both of the reads in a paired-end read have to match the filtering criterion:
both
Output results
- label
Reads file (forward)
- type
list:basic:file
- label
Reads file (reverse)
- type
list:basic:file
- label
Cutadapt report
- type
basic:file
- label
Quality control with FastQC (forward)
- type
list:basic:file:html
- label
Quality control with FastQC (reverse)
- type
list:basic:file:html
- label
Download FastQC archive (forward)
- type
list:basic:file
- label
Download FastQC archive (reverse)
- type
list:basic:file
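The `pair_filter` choices above can be illustrated with a small sketch. The function name `pair_is_filtered` and its boolean inputs are hypothetical; they only model the documented meaning of `any` and `both`, not Cutadapt's internals.

```python
def pair_is_filtered(mate1_matches, mate2_matches, pair_filter="any"):
    """Decide whether a read pair is filtered, given per-mate filter matches."""
    if pair_filter == "any":
        # One matching mate is enough to filter the whole pair.
        return mate1_matches or mate2_matches
    if pair_filter == "both":
        # The pair is filtered only if both mates match the criterion.
        return mate1_matches and mate2_matches
    raise ValueError(f"unknown pair_filter: {pair_filter!r}")


# Mate 1 is too short after trimming, mate 2 is fine.
print(pair_is_filtered(True, False, "any"))
print(pair_is_filtered(True, False, "both"))
```

With the default `any`, a pair is filtered as soon as one mate matches a criterion such as the minimum length, whereas `both` filters the pair only when both mates match.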
Cutadapt (single-end)¶
-
data:reads:fastq:single:cutadapt
cutadapt-single
(data:reads:fastq:single reads, data:seq:nucleotide up_primers_file, data:seq:nucleotide down_primers_file, list:basic:string up_primers_seq, list:basic:string down_primers_seq, basic:integer polya_tail, basic:integer min_overlap, basic:integer nextseq_trim, basic:integer leading, basic:integer trailing, basic:integer crop, basic:integer headcrop, basic:integer minlen, basic:integer max_n, basic:boolean match_read_wildcards, basic:integer times, basic:decimal error_rate)[Source: v2.2.1]
Cutadapt finds and removes adapter sequences, primers, poly-A tails and other types of unwanted sequence from high-throughput sequencing reads. More information about Cutadapt can be found [here](http://cutadapt.readthedocs.io/en/stable/).
Input arguments
- label
Select sample(s)
- type
data:reads:fastq:single
- label
5 prime adapter file
- type
data:seq:nucleotide
- required
False
- label
3 prime adapter file
- type
data:seq:nucleotide
- required
False
- label
5 prime adapter sequence
- type
list:basic:string
- required
False
- label
3 prime adapter sequence
- type
list:basic:string
- required
False
- label
Poly-A tail
- type
basic:integer
- description
Length of the poly-A tail, for example AAAN -> 3, AAAAAN -> 5.
- required
False
- label
Minimal overlap
- type
basic:integer
- description
Minimum overlap for an adapter match
- default
3
- label
NextSeq-specific quality trimming
- type
basic:integer
- description
NextSeq-specific quality trimming (each read). Also trims dark cycles that appear as high-quality G bases. This option is mutually exclusive with regular (-q) quality trimming.
- required
False
- label
Quality on 5 prime
- type
basic:integer
- description
Remove low-quality bases from the 5 prime end. Specifies the minimum quality required to keep a base. This option is mutually exclusive with NextSeq-specific quality trimming.
- required
False
- label
Quality on 3 prime
- type
basic:integer
- description
Remove low-quality bases from the 3 prime end. Specifies the minimum quality required to keep a base. This option is mutually exclusive with NextSeq-specific quality trimming.
- required
False
- label
Crop
- type
basic:integer
- description
Cut the read to a specified length by removing bases from the end
- required
False
- label
Headcrop
- type
basic:integer
- description
Cut the specified number of bases from the start of the read
- required
False
- label
Min length
- type
basic:integer
- description
Drop the read if it is below a specified length
- required
False
- label
Max number of N-s
- type
basic:integer
- description
Discard reads having more ‘N’ bases than specified.
- required
False
- label
Match read wildcards
- type
basic:boolean
- description
Interpret IUPAC wildcards in reads.
- required
False
- default
False
- label
Times
- type
basic:integer
- description
Remove up to COUNT adapters from each read.
- default
1
- label
Error rate
- type
basic:decimal
- description
Maximum allowed error rate (no. of errors divided by the length of the matching region).
- default
0.1
Output results
- label
Reads file
- type
list:basic:file
- label
Cutadapt report
- type
basic:file
- label
Quality control with FastQC
- type
list:basic:file:html
- label
Download FastQC archive
- type
list:basic:file
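The Error rate input translates into an absolute number of tolerated errors that depends on the length of the matching region. The sketch below assumes the product is rounded down, as Cutadapt's documentation describes; the helper name `max_allowed_errors` is hypothetical.

```python
def max_allowed_errors(error_rate, match_length):
    """Number of errors tolerated in a match of the given length (rounded down)."""
    return int(error_rate * match_length)


# With the default error rate of 0.1:
print(max_allowed_errors(0.1, 20))  # 20-base match
print(max_allowed_errors(0.1, 9))   # 9-base match
```

Under this assumption a 20-base match tolerates two errors at the default rate, while matches shorter than 10 bases must be exact.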
Cutadapt - STAR - FeatureCounts (3’ mRNA-Seq, single-end)¶
-
data:workflow:quant:featurecounts:single
workflow-cutadapt-star-fc-quant-single
(data:reads:fastq:single reads, data:index:star star_index, data:annotation annotation, data:index:star rrna_reference, data:index:star globin_reference, basic:boolean show_advanced, basic:integer quality_cutoff, basic:integer n_reads, basic:integer seed, basic:decimal fraction, basic:boolean two_pass)[Source: v2.0.1]
This 3’ mRNA-Seq pipeline is comprised of QC, preprocessing, alignment and quantification steps. Reads are preprocessed by __Cutadapt__ which removes adapters, trims reads for quality from the 3’-end, and discards reads that are too short after trimming. Preprocessed reads are aligned by __STAR__ aligner. For read-count quantification, the __FeatureCounts__ tool is used. QoRTs QC and Samtools idxstats tools are used to report alignment QC metrics. Additional QC steps operate on downsampled reads and include an alignment of input reads to the rRNA/globin reference sequences. The reported alignment rate is used to assess the rRNA/globin sequence depletion rate.
Input arguments
- label
Select sample(s)
- type
data:reads:fastq:single
- label
Genome
- type
data:index:star
- description
Genome index prepared by STAR aligner indexing tool.
- label
Annotation
- type
data:annotation
- description
Genome annotation file (GTF).
- label
Indexed rRNA reference sequence
- type
data:index:star
- description
Reference sequence index prepared by STAR aligner indexing tool.
- label
Indexed Globin reference sequence
- type
data:index:star
- description
Reference sequence index prepared by STAR aligner indexing tool.
- label
Show advanced parameters
- type
basic:boolean
- default
False
- label
Reads quality cutoff
- type
basic:integer
- description
Trim low-quality bases from the 3’ end of each read before adapter removal. Setting this option overrides the NextSeq/NovaSeq-specific trim option.
- required
False
- label
Number of reads
- type
basic:integer
- default
1000000
- label
Seed
- type
basic:integer
- default
11
- label
Fraction
- type
basic:decimal
- description
Use the fraction of reads in range [0.0, 1.0] from the original input file instead of the absolute number of reads. If set, this will override the “Number of reads” input parameter.
- required
False
- label
2-pass mode
- type
basic:boolean
- description
Enable two-pass mode when down-sampling. Two-pass mode is twice as slow but uses much less memory.
- default
False
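How the Number of reads, Seed, and Fraction inputs interact can be sketched in a few lines. This is an in-memory illustration only; the function name `downsample` is hypothetical, and the tool the workflow actually uses for down-sampling is not shown here.

```python
import random


def downsample(reads, n_reads=1_000_000, seed=11, fraction=None):
    """Pick a reproducible random subset of reads."""
    rng = random.Random(seed)  # a fixed seed makes the subset reproducible
    if fraction is not None:
        if not 0.0 <= fraction <= 1.0:
            raise ValueError("fraction must be in the range [0.0, 1.0]")
        # If set, the fraction overrides the absolute number of reads.
        target = int(len(reads) * fraction)
    else:
        target = min(n_reads, len(reads))
    return rng.sample(reads, target)


reads = [f"read{i}" for i in range(100)]
print(len(downsample(reads, n_reads=10)))
print(len(downsample(reads, fraction=0.25)))
```

Calling the function twice with the same seed returns the same subset, which is the purpose of exposing the seed as an input.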
Output results
Cutadapt - STAR - FeatureCounts - basic QC (3’ mRNA-Seq, single-end)¶
-
data:workflow:quant:featurecounts:single
workflow-cutadapt-star-fc-quant-wo-depletion-single
(data:reads:fastq:single reads, data:index:star star_index, data:annotation annotation, basic:boolean show_advanced, basic:integer quality_cutoff)[Source: v2.0.1]
This 3’ mRNA-Seq pipeline is comprised of QC, preprocessing, alignment and quantification steps. Reads are preprocessed by __Cutadapt__ which removes adapters, trims reads for quality from the 3’-end, and discards reads that are too short after trimming. Preprocessed reads are aligned by __STAR__ aligner. For read-count quantification, the __FeatureCounts__ tool is used. QoRTs QC and Samtools idxstats tools are used to report alignment QC metrics.
Input arguments
- label
Select sample(s)
- type
data:reads:fastq:single
- label
Genome
- type
data:index:star
- description
Genome index prepared by STAR aligner indexing tool.
- label
Annotation
- type
data:annotation
- description
Genome annotation file (GTF).
- label
Show advanced parameters
- type
basic:boolean
- default
False
- label
Reads quality cutoff
- type
basic:integer
- description
Trim low-quality bases from the 3’ end of each read before adapter removal. Setting this option overrides the NextSeq/NovaSeq-specific trim option.
- required
False
Output results
Cutadapt - STAR - HTSeq-count (paired-end)¶
-
data:workflow:rnaseq:htseq
workflow-custom-cutadapt-star-htseq-paired
(data:reads:fastq:paired reads, data:index:star genome, data:annotation:gtf gff, basic:string stranded, basic:boolean advanced, basic:boolean noncannonical, basic:boolean chimeric, basic:integer chimSegmentMin, basic:boolean quantmode, basic:boolean singleend, basic:boolean gene_counts, basic:string outFilterType, basic:integer outFilterMultimapNmax, basic:integer outFilterMismatchNmax, basic:decimal outFilterMismatchNoverLmax, basic:integer alignSJoverhangMin, basic:integer alignSJDBoverhangMin, basic:integer alignIntronMin, basic:integer alignIntronMax, basic:integer alignMatesGapMax, basic:string mode, basic:string feature_class, basic:string id_attribute, basic:boolean name_ordered)[Source: v2.0.1]
This RNA-seq pipeline is comprised of three steps, preprocessing, alignment, and quantification. First, reads are preprocessed by __cutadapt__ which finds and removes adapter sequences, primers, poly-A tails and other types of unwanted sequence from high-throughput sequencing reads. Next, preprocessed reads are aligned by __STAR__ aligner. At the time of implementation, STAR is considered a state-of-the-art tool that consistently produces accurate results from diverse sets of reads, and performs well even with default settings. For more information see [this comparison of RNA-seq aligners](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5792058/). Finally, aligned reads are summarized to genes by __HTSeq-count__. Compared to featureCounts, HTSeq-count is not as computationally efficient. All three tools in this workflow support parallelization to accelerate the analysis.
Input arguments
- label
NGS reads
- type
data:reads:fastq:paired
- label
Indexed reference genome
- type
data:index:star
- description
Genome index prepared by STAR aligner indexing tool
- label
Annotation (GFF)
- type
data:annotation:gtf
- label
Assay type
- type
basic:string
- description
In strand non-specific assay a read is considered overlapping with a feature regardless of whether it is mapped to the same or the opposite strand as the feature. In strand-specific forward assay and single reads, the read has to be mapped to the same strand as the feature. For paired-end reads, the first read has to be on the same strand and the second read on the opposite strand. In strand-specific reverse assay these rules are reversed.
- default
no
- choices
Strand non-specific:
no
Strand-specific forward:
yes
Strand-specific reverse:
reverse
- label
Advanced
- type
basic:boolean
- default
False
- label
Remove non-canonical junctions (Cufflinks compatibility)
- type
basic:boolean
- description
It is recommended to remove the non-canonical junctions for Cufflinks runs using --outFilterIntronMotifs RemoveNoncanonical.
- default
False
- label
Detect chimeric and circular alignments
- type
basic:boolean
- description
To switch on detection of chimeric (fusion) alignments (in addition to normal mapping), --chimSegmentMin should be set to a positive value. Each chimeric alignment consists of two “segments”. Each segment is non-chimeric on its own, but the segments are chimeric to each other (i.e. the segments belong to different chromosomes, or different strands, or are far from each other). Both segments may contain splice junctions, and one of the segments may contain portions of both mates. The --chimSegmentMin parameter controls the minimum mapped length of the two segments that is allowed. For example, if you have 2x75 reads and used --chimSegmentMin 20, a chimeric alignment with 130b on one chromosome and 20b on the other will be output, while 135 + 15 won’t be.
- default
False
- label
--chimSegmentMin
- type
basic:integer
- disabled
!star.detect_chimeric.chimeric
- default
20
- label
Output in transcript coordinates
- type
basic:boolean
- description
With the --quantMode TranscriptomeSAM option, STAR will output alignments translated into transcript coordinates in the Aligned.toTranscriptome.out.bam file (in addition to alignments in genomic coordinates in Aligned.*.sam/bam files). These transcriptomic alignments can be used with various transcript quantification software that require reads to be mapped to transcriptome, such as RSEM or eXpress.
- default
False
- label
Allow soft-clipping and indels
- type
basic:boolean
- description
By default, the output satisfies RSEM requirements: soft-clipping or indels are not allowed. Use --quantTranscriptomeBan Singleend to allow insertions, deletions, and soft-clips in the transcriptomic alignments, which can be used by some expression quantification software (e.g. eXpress).
- disabled
!star.t_coordinates.quantmode
- default
False
- label
Count reads
- type
basic:boolean
- description
With the --quantMode GeneCounts option, STAR counts the number of reads per gene while mapping. A read is counted if it overlaps (1nt or more) one and only one gene. Both ends of a paired-end read are checked for overlaps. The counts coincide with those produced by htseq-count with default parameters. The ReadsPerGene.out.tab file has 4 columns, which correspond to different strandedness options: column 1: gene ID; column 2: counts for unstranded RNA-seq; column 3: counts for the 1st read strand aligned with RNA (htseq-count option -s yes); column 4: counts for the 2nd read strand aligned with RNA (htseq-count option -s reverse).
- disabled
!star.t_coordinates.quantmode
- default
False
- label
Type of filtering
- type
basic:string
- description
Normal: standard filtering using only current alignment; BySJout: keep only those reads that contain junctions that passed filtering into SJ.out.tab
- default
Normal
- choices
Normal:
Normal
BySJout:
BySJout
- label
--outFilterMultimapNmax
- type
basic:integer
- description
Read alignments will be output only if the read maps to no more loci than this value; otherwise no alignments will be output (default: 10).
- required
False
- label
--outFilterMismatchNmax
- type
basic:integer
- description
Alignment will be output only if it has no more mismatches than this value (default: 10).
- required
False
- label
--outFilterMismatchNoverLmax
- type
basic:decimal
- description
Max number of mismatches per pair relative to read length: for 2x100b, max number of mismatches is 0.04*200=8 for the paired read.
- required
False
- label
--alignSJoverhangMin
- type
basic:integer
- description
Minimum overhang (i.e. block size) for spliced alignments (default: 5).
- required
False
- label
--alignSJDBoverhangMin
- type
basic:integer
- description
Minimum overhang (i.e. block size) for annotated (sjdb) spliced alignments (default: 3).
- required
False
- label
--alignIntronMin
- type
basic:integer
- description
Minimum intron size: genomic gap is considered intron if its length >= alignIntronMin, otherwise it is considered Deletion (default: 21).
- required
False
- label
--alignIntronMax
- type
basic:integer
- description
Maximum intron size; if 0, max intron size will be determined by (2^winBinNbits)*winAnchorDistNbins (default: 0).
- required
False
- label
--alignMatesGapMax
- type
basic:integer
- description
Maximum gap between two mates; if 0, max intron gap will be determined by (2^winBinNbits)*winAnchorDistNbins (default: 0).
- required
False
- label
Mode
- type
basic:string
- description
Mode to handle reads overlapping more than one feature. Possible values for <mode> are union, intersection-strict and intersection-nonempty
- default
union
- choices
union:
union
intersection-strict:
intersection-strict
intersection-nonempty:
intersection-nonempty
- label
Feature class
- type
basic:string
- description
Feature class (3rd column in GFF file) to be used. All other features will be ignored.
- default
exon
- label
ID attribute
- type
basic:string
- description
GFF attribute to be used as feature ID. Several GFF lines with the same feature ID will be considered as parts of the same feature. The feature ID is used to identify the counts in the output table.
- default
gene_id
- label
Use name-ordered BAM file for counting reads
- type
basic:boolean
- description
Use name-sorted BAM file for reads quantification. Improves compatibility with larger BAM files, but requires more computational time.
- required
False
- default
False
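The 2x75 example given for chimeric detection can be checked with a short sketch. The function name `chimeric_alignment_reported` is hypothetical and models only the segment-length rule described above, not STAR's full chimeric detection.

```python
def chimeric_alignment_reported(segment_lengths, chim_segment_min=20):
    """Return True if every chimeric segment meets the minimum mapped length."""
    if chim_segment_min <= 0:
        # Chimeric detection is switched off unless the value is positive.
        return False
    return all(length >= chim_segment_min for length in segment_lengths)


# 2x75 reads give 150 mapped bases split across two segments.
print(chimeric_alignment_reported((130, 20)))
print(chimeric_alignment_reported((135, 15)))
```

The 130 + 20 split passes because both segments reach the minimum of 20 bases, while 135 + 15 fails on the shorter segment.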
Output results
Cutadapt - STAR - HTSeq-count (single-end)¶
-
data:workflow:rnaseq:htseq
workflow-custom-cutadapt-star-htseq-single
(data:reads:fastq:single reads, data:index:star genome, data:annotation:gtf gff, basic:string stranded, basic:boolean advanced, basic:boolean noncannonical, basic:boolean chimeric, basic:integer chimSegmentMin, basic:boolean quantmode, basic:boolean singleend, basic:boolean gene_counts, basic:string outFilterType, basic:integer outFilterMultimapNmax, basic:integer outFilterMismatchNmax, basic:decimal outFilterMismatchNoverLmax, basic:integer alignSJoverhangMin, basic:integer alignSJDBoverhangMin, basic:integer alignIntronMin, basic:integer alignIntronMax, basic:integer alignMatesGapMax, basic:string mode, basic:string feature_class, basic:string id_attribute, basic:boolean name_ordered)[Source: v2.0.1]
This RNA-seq pipeline is comprised of three steps, preprocessing, alignment, and quantification. First, reads are preprocessed by __cutadapt__ which finds and removes adapter sequences, primers, poly-A tails and other types of unwanted sequence from high-throughput sequencing reads. Next, preprocessed reads are aligned by __STAR__ aligner. At the time of implementation, STAR is considered a state-of-the-art tool that consistently produces accurate results from diverse sets of reads, and performs well even with default settings. For more information see [this comparison of RNA-seq aligners](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5792058/). Finally, aligned reads are summarized to genes by __HTSeq-count__. Compared to featureCounts, HTSeq-count is not as computationally efficient. All three tools in this workflow support parallelization to accelerate the analysis.
Input arguments
- label
NGS reads
- type
data:reads:fastq:single
- label
Indexed reference genome
- type
data:index:star
- description
Genome index prepared by STAR aligner indexing tool
- label
Annotation (GFF)
- type
data:annotation:gtf
- label
Assay type
- type
basic:string
- description
In strand non-specific assay a read is considered overlapping with a feature regardless of whether it is mapped to the same or the opposite strand as the feature. In strand-specific forward assay and single reads, the read has to be mapped to the same strand as the feature. For paired-end reads, the first read has to be on the same strand and the second read on the opposite strand. In strand-specific reverse assay these rules are reversed.
- default
no
- choices
Strand non-specific:
no
Strand-specific forward:
yes
Strand-specific reverse:
reverse
- label
Advanced
- type
basic:boolean
- default
False
- label
Remove non-canonical junctions (Cufflinks compatibility)
- type
basic:boolean
- description
It is recommended to remove the non-canonical junctions for Cufflinks runs using --outFilterIntronMotifs RemoveNoncanonical.
- default
False
- label
Detect chimeric and circular alignments
- type
basic:boolean
- description
To switch on detection of chimeric (fusion) alignments (in addition to normal mapping), --chimSegmentMin should be set to a positive value. Each chimeric alignment consists of two “segments”. Each segment is non-chimeric on its own, but the segments are chimeric to each other (i.e. the segments belong to different chromosomes, or different strands, or are far from each other). Both segments may contain splice junctions, and one of the segments may contain portions of both mates. The --chimSegmentMin parameter controls the minimum mapped length of the two segments that is allowed. For example, if you have 2x75 reads and used --chimSegmentMin 20, a chimeric alignment with 130b on one chromosome and 20b on the other will be output, while 135 + 15 won’t be.
- default
False
- label
--chimSegmentMin
- type
basic:integer
- disabled
!star.detect_chimeric.chimeric
- default
20
- label
Output in transcript coordinates
- type
basic:boolean
- description
With the --quantMode TranscriptomeSAM option, STAR will output alignments translated into transcript coordinates in the Aligned.toTranscriptome.out.bam file (in addition to alignments in genomic coordinates in Aligned.*.sam/bam files). These transcriptomic alignments can be used with various transcript quantification software that require reads to be mapped to transcriptome, such as RSEM or eXpress.
- default
False
- label
Allow soft-clipping and indels
- type
basic:boolean
- description
By default, the output satisfies RSEM requirements: soft-clipping or indels are not allowed. Use --quantTranscriptomeBan Singleend to allow insertions, deletions, and soft-clips in the transcriptomic alignments, which can be used by some expression quantification software (e.g. eXpress).
- disabled
!star.t_coordinates.quantmode
- default
False
- label
Count reads
- type
basic:boolean
- description
With the --quantMode GeneCounts option, STAR counts the number of reads per gene while mapping. A read is counted if it overlaps (1nt or more) one and only one gene. Both ends of a paired-end read are checked for overlaps. The counts coincide with those produced by htseq-count with default parameters. The ReadsPerGene.out.tab file has 4 columns, which correspond to different strandedness options: column 1: gene ID; column 2: counts for unstranded RNA-seq; column 3: counts for the 1st read strand aligned with RNA (htseq-count option -s yes); column 4: counts for the 2nd read strand aligned with RNA (htseq-count option -s reverse).
- disabled
!star.t_coordinates.quantmode
- default
False
- label
Type of filtering
- type
basic:string
- description
Normal: standard filtering using only current alignment; BySJout: keep only those reads that contain junctions that passed filtering into SJ.out.tab
- default
Normal
- choices
Normal:
Normal
BySJout:
BySJout
- label
--outFilterMultimapNmax
- type
basic:integer
- description
Read alignments will be output only if the read maps to no more loci than this value; otherwise no alignments will be output (default: 10).
- required
False
- label
--outFilterMismatchNmax
- type
basic:integer
- description
Alignment will be output only if it has no more mismatches than this value (default: 10).
- required
False
- label
--outFilterMismatchNoverLmax
- type
basic:decimal
- description
Max number of mismatches per pair relative to read length: for 2x100b, max number of mismatches is 0.04*200=8 for the paired read.
- required
False
- label
--alignSJoverhangMin
- type
basic:integer
- description
Minimum overhang (i.e. block size) for spliced alignments (default: 5).
- required
False
- label
--alignSJDBoverhangMin
- type
basic:integer
- description
Minimum overhang (i.e. block size) for annotated (sjdb) spliced alignments (default: 3).
- required
False
- label
--alignIntronMin
- type
basic:integer
- description
Minimum intron size: genomic gap is considered intron if its length >= alignIntronMin, otherwise it is considered Deletion (default: 21).
- required
False
- label
--alignIntronMax
- type
basic:integer
- description
Maximum intron size; if 0, max intron size will be determined by (2^winBinNbits)*winAnchorDistNbins (default: 0).
- required
False
- label
--alignMatesGapMax
- type
basic:integer
- description
Maximum gap between two mates; if 0, max intron gap will be determined by (2^winBinNbits)*winAnchorDistNbins (default: 0).
- required
False
- label
Mode
- type
basic:string
- description
Mode to handle reads overlapping more than one feature. Possible values for <mode> are union, intersection-strict and intersection-nonempty
- default
union
- choices
union:
union
intersection-strict:
intersection-strict
intersection-nonempty:
intersection-nonempty
- label
Feature class
- type
basic:string
- description
Feature class (3rd column in GFF file) to be used. All other features will be ignored.
- default
exon
- label
ID attribute
- type
basic:string
- description
GFF attribute to be used as feature ID. Several GFF lines with the same feature ID will be considered as parts of the same feature. The feature ID is used to identify the counts in the output table.
- default
gene_id
- label
Use name-ordered BAM file for counting reads
- type
basic:boolean
- description
Use name-sorted BAM file for reads quantification. Improves compatibility with larger BAM files, but requires more computational time.
- required
False
- default
False
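The three overlap-resolution modes listed above (union, intersection-strict, intersection-nonempty) can be sketched with plain sets. This is a simplified model of the rules in the HTSeq-count documentation, not the tool's implementation: the function name `assign_read` is hypothetical, and each element of `feature_sets` stands for the set of features overlapping one position of the read.

```python
def assign_read(feature_sets, mode="union"):
    """Resolve a read to a feature ID, '__ambiguous', or '__no_feature'."""
    if mode == "union":
        # All features overlapping any position of the read.
        result = set().union(*feature_sets)
    elif mode == "intersection-strict":
        # Only features overlapping every position of the read.
        result = set.intersection(*feature_sets) if feature_sets else set()
    elif mode == "intersection-nonempty":
        # As above, but positions without any feature are ignored.
        nonempty = [s for s in feature_sets if s]
        result = set.intersection(*nonempty) if nonempty else set()
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    if not result:
        return "__no_feature"
    if len(result) > 1:
        return "__ambiguous"
    return next(iter(result))


# A read that lies partly in geneA only and partly where geneA and geneB overlap.
positions = [{"geneA"}, {"geneA", "geneB"}]
for mode in ("union", "intersection-strict", "intersection-nonempty"):
    print(mode, assign_read(positions, mode))
```

In this example the default union mode reports the read as ambiguous, while both intersection modes assign it to geneA; the labels `__ambiguous` and `__no_feature` mirror htseq-count's special counters.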
Output results
Cutadapt - STAR - RSEM (Diagenode CATS, paired-end)¶
-
data:workflow:rnaseq:rsem
workflow-custom-cutadapt-star-rsem-paired
(data:reads:fastq:paired reads, data:index:star star_index, data:index:expression expression_index, basic:string stranded, basic:boolean advanced, basic:boolean noncannonical, basic:boolean chimeric, basic:integer chimSegmentMin, basic:boolean quantmode, basic:boolean singleend, basic:boolean gene_counts, basic:string outFilterType, basic:integer outFilterMultimapNmax, basic:integer outFilterMismatchNmax, basic:decimal outFilterMismatchNoverLmax, basic:integer alignSJoverhangMin, basic:integer alignSJDBoverhangMin, basic:integer alignIntronMin, basic:integer alignIntronMax, basic:integer alignMatesGapMax)[Source: v2.0.1]
This RNA-seq pipeline is configured to be used with the Diagenode CATS RNA-seq kits. It is comprised of three steps, preprocessing, alignment, and quantification. First, reads are preprocessed by cutadapt which finds and removes adapter sequences, primers, poly-A tails and other types of unwanted sequence from high-throughput sequencing reads. Next, preprocessed reads are aligned by STAR aligner. Finally, RSEM estimates gene and isoform expression levels from the aligned reads.
Input arguments
- label
NGS reads
- type
data:reads:fastq:paired
- label
STAR genome index
- type
data:index:star
- label
Gene expression indices
- type
data:index:expression
- label
Assay type
- type
basic:string
- description
In strand non-specific assay a read is considered overlapping with a feature regardless of whether it is mapped to the same or the opposite strand as the feature. In strand-specific forward assay and single reads, the read has to be mapped to the same strand as the feature. For paired-end reads, the first read has to be on the same strand and the second read on the opposite strand. In strand-specific reverse assay these rules are reversed.
- default
no
- choices
Strand non-specific:
no
Strand-specific forward:
yes
Strand-specific reverse:
reverse
- label
Advanced
- type
basic:boolean
- default
False
- label
Remove non-canonical junctions (Cufflinks compatibility)
- type
basic:boolean
- description
It is recommended to remove the non-canonical junctions for Cufflinks runs using --outFilterIntronMotifs RemoveNoncanonical.
- default
False
- label
Detect chimeric and circular alignments
- type
basic:boolean
- description
To switch on detection of chimeric (fusion) alignments (in addition to normal mapping), --chimSegmentMin should be set to a positive value. Each chimeric alignment consists of two “segments”. Each segment is non-chimeric on its own, but the segments are chimeric to each other (i.e. the segments belong to different chromosomes, or different strands, or are far from each other). Both segments may contain splice junctions, and one of the segments may contain portions of both mates. The --chimSegmentMin parameter controls the minimum mapped length of the two segments that is allowed. For example, if you have 2x75 reads and used --chimSegmentMin 20, a chimeric alignment with 130b on one chromosome and 20b on the other will be output, while 135 + 15 won’t be.
- default
False
- label
--chimSegmentMin
- type
basic:integer
- disabled
!star.detect_chimeric.chimeric
- default
20
- label
Output in transcript coordinates
- type
basic:boolean
- description
With the --quantMode TranscriptomeSAM option, STAR will output alignments translated into transcript coordinates in the Aligned.toTranscriptome.out.bam file (in addition to alignments in genomic coordinates in Aligned.*.sam/bam files). These transcriptomic alignments can be used with various transcript quantification software that require reads to be mapped to transcriptome, such as RSEM or eXpress.
- default
True
- label
Allow soft-clipping and indels
- type
basic:boolean
- description
By default, the output satisfies RSEM requirements: soft-clipping or indels are not allowed. Use --quantTranscriptomeBan Singleend to allow insertions, deletions and soft-clips in the transcriptomic alignments, which can be used by some expression quantification software (e.g. eXpress).
- disabled
!star.t_coordinates.quantmode
- default
False
- label
Count reads
- type
basic:boolean
- description
With the --quantMode GeneCounts option, STAR will count the number of reads per gene while mapping. A read is counted if it overlaps (1nt or more) one and only one gene. Both ends of the paired-end read are checked for overlaps. The counts coincide with those produced by htseq-count with default parameters. The ReadsPerGene.out.tab file has 4 columns, which correspond to different strandedness options: column 1: gene ID; column 2: counts for unstranded RNA-seq; column 3: counts for the 1st read strand aligned with RNA (htseq-count option -s yes); column 4: counts for the 2nd read strand aligned with RNA (htseq-count option -s reverse).
- disabled
!star.t_coordinates.quantmode
- default
False
- label
Type of filtering
- type
basic:string
- description
Normal: standard filtering using only current alignment; BySJout: keep only those reads that contain junctions that passed filtering into SJ.out.tab
- default
Normal
- choices
Normal:
Normal
BySJout:
BySJout
- label
--outFilterMultimapNmax
- type
basic:integer
- description
Read alignments will be output only if the read maps to no more loci than this value; otherwise no alignments will be output (default: 10).
- required
False
- label
--outFilterMismatchNmax
- type
basic:integer
- description
Alignment will be output only if it has no more mismatches than this value (default: 10).
- required
False
- label
--outFilterMismatchNoverLmax
- type
basic:decimal
- description
Max number of mismatches per pair relative to read length. For example, with a value of 0.04 and 2x100b reads, the max number of mismatches is 0.04*200=8 for the paired read.
- required
False
- label
--alignSJoverhangMin
- type
basic:integer
- description
Minimum overhang (i.e. block size) for spliced alignments (default: 5).
- required
False
- label
--alignSJDBoverhangMin
- type
basic:integer
- description
Minimum overhang (i.e. block size) for annotated (sjdb) spliced alignments (default: 3).
- required
False
- label
--alignIntronMin
- type
basic:integer
- description
Minimum intron size: genomic gap is considered intron if its length >= alignIntronMin, otherwise it is considered Deletion (default: 21).
- required
False
- label
--alignIntronMax
- type
basic:integer
- description
Maximum intron size. If 0, max intron size will be determined by (2^winBinNbits)*winAnchorDistNbins (default: 0).
- required
False
- label
--alignMatesGapMax
- type
basic:integer
- description
Maximum gap between two mates. If 0, max intron gap will be determined by (2^winBinNbits)*winAnchorDistNbins (default: 0).
- required
False
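The chimeric-segment rule and the window formula quoted in the descriptions above reduce to simple arithmetic. The following is a minimal sketch for checking values by hand, not part of the workflow itself. The function names are hypothetical, and the values 16 and 9 used for winBinNbits and winAnchorDistNbins are assumed defaults that should be checked against the STAR manual for the version in use.

```python
def chimeric_alignment_reported(segment_lengths, chim_segment_min):
    """Return True if every chimeric segment reaches the minimum mapped length.

    A non-positive chim_segment_min means chimeric detection is switched off.
    """
    if chim_segment_min <= 0:
        return False
    return all(length >= chim_segment_min for length in segment_lengths)


def max_alignment_window(win_bin_nbits=16, win_anchor_dist_nbins=9):
    """Compute (2^winBinNbits) * winAnchorDistNbins.

    The default arguments are assumptions used only for illustration.
    """
    return (2 ** win_bin_nbits) * win_anchor_dist_nbins


# The 2x75 example from the description, with --chimSegmentMin 20.
print(chimeric_alignment_reported((130, 20), 20))  # True
print(chimeric_alignment_reported((135, 15), 20))  # False
print(max_alignment_window())  # 589824
```

With these assumed values, leaving the maximum intron size or mate gap at 0 corresponds to a window of 589824 bases.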
Output results
Cutadapt - STAR - RSEM (Diagenode CATS, single-end)¶
-
data:workflow:rnaseq:rsem
workflow-custom-cutadapt-star-rsem-single
(data:reads:fastq:single reads, data:index:star star_index, data:index:expression expression_index, basic:string stranded, basic:boolean advanced, basic:boolean noncannonical, basic:boolean chimeric, basic:integer chimSegmentMin, basic:boolean quantmode, basic:boolean singleend, basic:boolean gene_counts, basic:string outFilterType, basic:integer outFilterMultimapNmax, basic:integer outFilterMismatchNmax, basic:decimal outFilterMismatchNoverLmax, basic:integer alignSJoverhangMin, basic:integer alignSJDBoverhangMin, basic:integer alignIntronMin, basic:integer alignIntronMax, basic:integer alignMatesGapMax)[Source: v2.0.1]
This RNA-seq pipeline is configured to be used with the Diagenode CATS RNA-seq kits. It comprises three steps: preprocessing, alignment, and quantification. First, reads are preprocessed by cutadapt, which finds and removes adapter sequences, primers, poly-A tails and other types of unwanted sequence from high-throughput sequencing reads. Next, preprocessed reads are aligned by the STAR aligner. Finally, RSEM estimates gene and isoform expression levels from the aligned reads.
Input arguments
- label
NGS reads
- type
data:reads:fastq:single
- label
STAR genome index
- type
data:index:star
- label
Gene expression indices
- type
data:index:expression
- label
Assay type
- type
basic:string
- description
In strand non-specific assay a read is considered overlapping with a feature regardless of whether it is mapped to the same or the opposite strand as the feature. In strand-specific forward assay and single reads, the read has to be mapped to the same strand as the feature. For paired-end reads, the first read has to be on the same strand and the second read on the opposite strand. In strand-specific reverse assay these rules are reversed.
- default
no
- choices
Strand non-specific:
no
Strand-specific forward:
yes
Strand-specific reverse:
reverse
- label
Advanced
- type
basic:boolean
- default
False
- label
Remove non-canonical junctions (Cufflinks compatibility)
- type
basic:boolean
- description
It is recommended to remove the non-canonical junctions for Cufflinks runs using --outFilterIntronMotifs RemoveNoncanonical.
- default
False
- label
Detect chimeric and circular alignments
- type
basic:boolean
- description
To switch on detection of chimeric (fusion) alignments (in addition to normal mapping), --chimSegmentMin should be set to a positive value. Each chimeric alignment consists of two “segments”. Each segment is non-chimeric on its own, but the segments are chimeric to each other (i.e. the segments belong to different chromosomes, or different strands, or are far from each other). Both segments may contain splice junctions, and one of the segments may contain portions of both mates. The --chimSegmentMin parameter controls the minimum mapped length of the two segments that is allowed. For example, if you have 2x75 reads and used --chimSegmentMin 20, a chimeric alignment with 130b on one chromosome and 20b on the other will be output, while 135 + 15 won’t be.
- default
False
- label
--chimSegmentMin
- type
basic:integer
- disabled
!star.detect_chimeric.chimeric
- default
20
- label
Output in transcript coordinates
- type
basic:boolean
- description
With the --quantMode TranscriptomeSAM option, STAR will output alignments translated into transcript coordinates in the Aligned.toTranscriptome.out.bam file (in addition to alignments in genomic coordinates in Aligned.*.sam/bam files). These transcriptomic alignments can be used with various transcript quantification software that require reads to be mapped to transcriptome, such as RSEM or eXpress.
- default
True
- label
Allow soft-clipping and indels
- type
basic:boolean
- description
By default, the output satisfies RSEM requirements: soft-clipping or indels are not allowed. Use --quantTranscriptomeBan Singleend to allow insertions, deletions and soft-clips in the transcriptomic alignments, which can be used by some expression quantification software (e.g. eXpress).
- disabled
!star.t_coordinates.quantmode
- default
False
- label
Count reads
- type
basic:boolean
- description
With the --quantMode GeneCounts option, STAR will count the number of reads per gene while mapping. A read is counted if it overlaps (1nt or more) one and only one gene. Both ends of the paired-end read are checked for overlaps. The counts coincide with those produced by htseq-count with default parameters. The ReadsPerGene.out.tab file has 4 columns, which correspond to different strandedness options: column 1: gene ID; column 2: counts for unstranded RNA-seq; column 3: counts for the 1st read strand aligned with RNA (htseq-count option -s yes); column 4: counts for the 2nd read strand aligned with RNA (htseq-count option -s reverse).
- disabled
!star.t_coordinates.quantmode
- default
False
- label
Type of filtering
- type
basic:string
- description
Normal: standard filtering using only current alignment; BySJout: keep only those reads that contain junctions that passed filtering into SJ.out.tab
- default
Normal
- choices
Normal:
Normal
BySJout:
BySJout
- label
--outFilterMultimapNmax
- type
basic:integer
- description
Read alignments will be output only if the read maps to no more loci than this value; otherwise no alignments will be output (default: 10).
- required
False
- label
--outFilterMismatchNmax
- type
basic:integer
- description
Alignment will be output only if it has no more mismatches than this value (default: 10).
- required
False
- label
--outFilterMismatchNoverLmax
- type
basic:decimal
- description
Max number of mismatches per pair relative to read length. For example, with a value of 0.04 and 2x100b reads, the max number of mismatches is 0.04*200=8 for the paired read.
- required
False
- label
--alignSJoverhangMin
- type
basic:integer
- description
Minimum overhang (i.e. block size) for spliced alignments (default: 5).
- required
False
- label
--alignSJDBoverhangMin
- type
basic:integer
- description
Minimum overhang (i.e. block size) for annotated (sjdb) spliced alignments (default: 3).
- required
False
- label
--alignIntronMin
- type
basic:integer
- description
Minimum intron size: genomic gap is considered intron if its length >= alignIntronMin, otherwise it is considered Deletion (default: 21).
- required
False
- label
--alignIntronMax
- type
basic:integer
- description
Maximum intron size. If 0, max intron size will be determined by (2^winBinNbits)*winAnchorDistNbins (default: 0).
- required
False
- label
--alignMatesGapMax
- type
basic:integer
- description
Maximum gap between two mates. If 0, max intron gap will be determined by (2^winBinNbits)*winAnchorDistNbins (default: 0).
- required
False
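The "Count reads" description lists one count column per strandedness option, so the "Assay type" choice determines which column of ReadsPerGene.out.tab is relevant. The sketch below shows one way to read that file under this mapping. The function name and the sample content are made up for illustration, and skipping summary rows whose first field starts with `N_` is an assumption about the file layout.

```python
import csv
import io

# Zero-based column index in ReadsPerGene.out.tab for each "Assay type" choice.
COLUMN_FOR_ASSAY = {"no": 1, "yes": 2, "reverse": 3}


def read_gene_counts(handle, assay_type):
    """Return {gene_id: count} using the column that matches the assay type."""
    column = COLUMN_FOR_ASSAY[assay_type]
    counts = {}
    for row in csv.reader(handle, delimiter="\t"):
        # Assumption: summary rows are prefixed with "N_" and are not genes.
        if not row or row[0].startswith("N_"):
            continue
        counts[row[0]] = int(row[column])
    return counts


# Made-up sample content, for illustration only.
SAMPLE = (
    "N_unmapped\t10\t10\t10\n"
    "N_noFeature\t5\t40\t6\n"
    "geneA\t100\t2\t98\n"
    "geneB\t50\t49\t1\n"
)

print(read_gene_counts(io.StringIO(SAMPLE), "reverse"))
```

For a real run, the handle would be an open ReadsPerGene.out.tab file rather than an in-memory string.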
Output results
Cutadapt - STAR - StringTie (Corall, paired-end)¶
-
data:workflow:rnaseq:corall
workflow-corall-paired
(data:reads:fastq:paired reads, data:index:star star_index, data:annotation annotation, data:index:star rrna_reference, data:index:star globin_reference, basic:boolean show_advanced, basic:integer quality_cutoff, basic:integer n_reads, basic:integer seed, basic:decimal fraction, basic:boolean two_pass, basic:string feature_class, basic:string id_attribute)[Source: v3.0.1]
RNA-seq pipeline optimized for the Lexogen Corall Total RNA-Seq Library Prep Kit. UMI sequences are extracted from the raw reads before the reads are trimmed and quality filtered using Cutadapt. Preprocessed reads are aligned by the STAR aligner and de-duplicated using UMI-tools. Gene abundance estimates are reported by the featureCounts tool. QC operates on downsampled reads and includes alignment of input reads to the rRNA/globin reference sequences. The reported alignment rate is used to assess the rRNA/globin sequence depletion rate. The analysis results and QC reports are summarized by MultiQC.
Input arguments
- label
Select sample(s)
- type
data:reads:fastq:paired
- label
Genome
- type
data:index:star
- description
Genome index prepared by STAR aligner indexing tool.
- label
Annotation
- type
data:annotation
- description
Genome annotation file (GTF).
- label
Indexed rRNA reference sequence
- type
data:index:star
- description
Reference sequence index prepared by STAR aligner indexing tool.
- label
Indexed Globin reference sequence
- type
data:index:star
- description
Reference sequence index prepared by STAR aligner indexing tool.
- label
Show advanced parameters
- type
basic:boolean
- default
False
- label
Reads quality cutoff
- type
basic:integer
- description
Trim low-quality bases from 3’ end of each read before adapter removal. Use this option when processing the data generated by older Illumina machines. The use of this option will override the NextSeq/NovaSeq-specific trimming procedure which is enabled by default and is recommended for Illumina machines that utilize 2-color chemistry to encode the four bases.
- required
False
- label
Number of reads
- type
basic:integer
- default
1000000
- label
Seed
- type
basic:integer
- default
11
- label
Fraction
- type
basic:decimal
- description
Use the fraction of reads in range [0.0, 1.0] from the original input file instead of the absolute number of reads. If set, this will override the “Number of reads” input parameter.
- required
False
- label
2-pass mode
- type
basic:boolean
- description
Enable two-pass mode when down-sampling. Two-pass mode is twice as slow but uses much less memory.
- default
False
- label
Feature class
- type
basic:string
- description
Feature class (3rd column in GTF/GFF3 file) to be used. All other features will be ignored.
- default
exon
- label
ID attribute
- type
basic:string
- description
GTF/GFF3 attribute to be used as feature ID. Several GTF/GFF3 lines with the same feature ID will be considered as parts of the same feature. The feature ID is used to identify the counts in the output table. In GTF files this is usually ‘gene_id’, in GFF3 files this is often ‘ID’, and ‘transcript_id’ is frequently a valid choice for both annotation formats.
- default
gene_id
- choices
gene_id:
gene_id
transcript_id:
transcript_id
ID:
ID
geneid:
geneid
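The down-sampling inputs above ("Number of reads", "Seed", "Fraction") interact in a way that is easy to get wrong: a fixed seed makes the subsample reproducible, and a fraction, when set, takes precedence over the absolute read count. The sketch below illustrates only these semantics. It is not the tool the workflow runs, the function name is hypothetical, and the per-read random draw used for the fraction case is an assumed strategy.

```python
import random


def downsample(reads, n_reads=1000000, seed=11, fraction=None):
    """Return a reproducible subsample of reads, kept in the original order."""
    rng = random.Random(seed)
    if fraction is not None:
        if not 0.0 <= fraction <= 1.0:
            raise ValueError("fraction must be in [0.0, 1.0]")
        # A fraction, when set, takes precedence over n_reads.
        return [read for read in reads if rng.random() < fraction]
    reads = list(reads)
    if n_reads >= len(reads):
        return reads
    # Draw n_reads distinct positions, then restore the input order.
    keep = sorted(rng.sample(range(len(reads)), n_reads))
    return [reads[i] for i in keep]


example_reads = [f"read{i}" for i in range(10)]
print(len(downsample(example_reads, n_reads=3)))
```

Calling the function twice with the same seed returns the same reads, which is the property the "Seed" input exists to provide.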
Output results
Cutadapt - STAR - StringTie (Corall, single-end)¶
-
data:workflow:rnaseq:corall
workflow-corall-single
(data:reads:fastq:single reads, data:index:star star_index, data:annotation annotation, data:index:star rrna_reference, data:index:star globin_reference, basic:boolean show_advanced, basic:integer quality_cutoff, basic:integer n_reads, basic:integer seed, basic:decimal fraction, basic:boolean two_pass, basic:string feature_class, basic:string id_attribute)[Source: v3.0.1]
RNA-seq pipeline optimized for the Lexogen Corall Total RNA-Seq Library Prep Kit. UMI sequences are extracted from the raw reads before the reads are trimmed and quality filtered using Cutadapt. Preprocessed reads are aligned by the STAR aligner and de-duplicated using UMI-tools. Gene abundance estimates are reported by the featureCounts tool. QC operates on downsampled reads and includes alignment of input reads to the rRNA/globin reference sequences. The reported alignment rate is used to assess the rRNA/globin sequence depletion rate. The analysis results and QC reports are summarized by MultiQC.
Input arguments
- label
Select sample(s)
- type
data:reads:fastq:single
- label
Genome
- type
data:index:star
- description
Genome index prepared by STAR aligner indexing tool.
- label
Annotation
- type
data:annotation
- description
Genome annotation file (GTF).
- label
Indexed rRNA reference sequence
- type
data:index:star
- description
Reference sequence index prepared by STAR aligner indexing tool.
- label
Indexed Globin reference sequence
- type
data:index:star
- description
Reference sequence index prepared by STAR aligner indexing tool.
- label
Show advanced parameters
- type
basic:boolean
- default
False
- label
Reads quality cutoff
- type
basic:integer
- description
Trim low-quality bases from 3’ end of each read before adapter removal. Use this option when processing the data generated by older Illumina machines. The use of this option will override the NextSeq/NovaSeq-specific trimming procedure which is enabled by default and is recommended for Illumina machines that utilize 2-color chemistry to encode the four bases.
- required
False
- label
Number of reads
- type
basic:integer
- default
1000000
- label
Seed
- type
basic:integer
- default
11
- label
Fraction
- type
basic:decimal
- description
Use the fraction of reads in range [0.0, 1.0] from the original input file instead of the absolute number of reads. If set, this will override the “Number of reads” input parameter.
- required
False
- label
2-pass mode
- type
basic:boolean
- description
Enable two-pass mode when down-sampling. Two-pass mode is twice as slow but uses much less memory.
- default
False
- label
Feature class
- type
basic:string
- description
Feature class (3rd column in GTF/GFF3 file) to be used. All other features will be ignored.
- default
exon
- label
ID attribute
- type
basic:string
- description
GTF/GFF3 attribute to be used as feature ID. Several GTF/GFF3 lines with the same feature ID will be considered as parts of the same feature. The feature ID is used to identify the counts in the output table. In GTF files this is usually ‘gene_id’, in GFF3 files this is often ‘ID’, and ‘transcript_id’ is frequently a valid choice for both annotation formats.
- default
gene_id
- choices
gene_id:
gene_id
transcript_id:
transcript_id
ID:
ID
geneid:
geneid
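The "Feature class" and "ID attribute" inputs together decide which annotation lines are counted and how they are grouped. The sketch below shows that selection on GTF-style lines: keep lines whose 3rd column equals the feature class, then group them by the chosen attribute. It is an illustration only, with a hypothetical function name and made-up annotation lines, and it does not handle GFF3-style `key=value` attributes.

```python
import re
from collections import defaultdict

# Matches GTF-style attributes such as: gene_id "g1";
ATTRIBUTE_RE = re.compile(r'(\S+) "([^"]*)"')


def group_features(gtf_lines, feature_class="exon", id_attribute="gene_id"):
    """Group (start, end) intervals of matching GTF lines by feature ID."""
    groups = defaultdict(list)
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        # Column 3 holds the feature class; all other classes are ignored.
        if len(fields) < 9 or fields[2] != feature_class:
            continue
        attributes = dict(ATTRIBUTE_RE.findall(fields[8]))
        feature_id = attributes.get(id_attribute)
        if feature_id is not None:
            groups[feature_id].append((int(fields[3]), int(fields[4])))
    return dict(groups)


# Made-up annotation lines, for illustration only.
GTF = [
    'chr1\tsrc\tgene\t1\t100\t.\t+\t.\tgene_id "g1";',
    'chr1\tsrc\texon\t1\t40\t.\t+\t.\tgene_id "g1"; transcript_id "t1";',
    'chr1\tsrc\texon\t60\t100\t.\t+\t.\tgene_id "g1"; transcript_id "t1";',
    'chr1\tsrc\texon\t200\t250\t.\t-\t.\tgene_id "g2"; transcript_id "t2";',
]

print(group_features(GTF))
```

Switching `id_attribute` to `transcript_id` groups the same exon lines per transcript instead of per gene.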
Output results
DESeq2¶
-
data:differentialexpression:deseq2:
differentialexpression-deseq2
(list:data:expression case, list:data:expression control, basic:boolean create_sets, basic:decimal logfc, basic:decimal fdr, basic:boolean beta_prior, basic:boolean count, basic:integer min_count_sum, basic:boolean cook, basic:decimal cooks_cutoff, basic:boolean independent, basic:decimal alpha)[Source: v3.2.2]
Run DESeq2 analysis. The DESeq2 package estimates variance-mean dependence in count data from high-throughput sequencing assays and tests for differential expression based on a model using the negative binomial distribution. See [here](https://www.bioconductor.org/packages/release/bioc/manuals/DESeq2/man/DESeq2.pdf) and [here](http://bioconductor.org/packages/devel/bioc/vignettes/DESeq2/inst/doc/DESeq2.html) for more information.
Input arguments
- label
Case
- type
list:data:expression
- description
Case samples (replicates)
- required
True
- hidden
False
- label
Control
- type
list:data:expression
- description
Control samples (replicates)
- required
True
- hidden
False
- label
Create gene sets
- type
basic:boolean
- description
After calculating differential gene expression, create gene sets for up-regulated genes, down-regulated genes and all genes.
- required
True
- hidden
False
- default
False
- label
Log2 fold change threshold for gene sets
- type
basic:decimal
- description
Genes above Log2FC are considered as up-regulated and genes below -Log2FC as down-regulated.
- required
True
- hidden
!create_sets
- default
1.0
- label
FDR threshold for gene sets
- type
basic:decimal
- required
True
- hidden
!create_sets
- default
0.05
- label
Beta prior
- type
basic:boolean
- description
Whether or not to put a zero-mean normal prior on the non-intercept coefficients.
- required
True
- hidden
False
- default
False
- label
Filter genes based on expression count
- type
basic:boolean
- required
True
- hidden
False
- default
True
- label
Minimum gene expression count summed over all samples
- type
basic:integer
- description
Filter genes in the expression matrix input. Remove genes where the expression count sum over all samples is below the threshold.
- required
True
- hidden
!filter_options.count
- default
10
- label
Filter genes based on Cook’s distance
- type
basic:boolean
- required
True
- hidden
False
- default
False
- label
Threshold on Cook’s distance
- type
basic:decimal
- description
If one or more samples have Cook’s distance larger than the threshold set here, the p-value for the row is set to NA. If left empty, the default threshold of 0.99 quantile of the F(p, m-p) distribution is used, where p is the number of coefficients being fitted and m is the number of samples. This test excludes Cook’s distance of samples belonging to experimental groups with only two samples.
- required
False
- hidden
!filter_options.cook
- label
Apply independent gene filtering
- type
basic:boolean
- required
True
- hidden
False
- default
False
- label
Significance cut-off used for optimizing independent gene filtering
- type
basic:decimal
- description
The value should be set to adjusted p-value cut-off (FDR).
- required
True
- hidden
!filter_options.independent
- default
0.1
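Two of the inputs above describe plain threshold rules: the minimum count sum used to pre-filter genes, and the Log2FC and FDR thresholds used to build gene sets. The sketch below restates them in code. It is not the DESeq2 process implementation; the function names are hypothetical, and treating the FDR threshold as a strict "less than" that applies to both the up and down sets is an assumption.

```python
def filter_by_count_sum(count_matrix, min_count_sum=10):
    """Keep genes whose counts summed over all samples reach the threshold."""
    return {
        gene: counts
        for gene, counts in count_matrix.items()
        if sum(counts) >= min_count_sum
    }


def create_gene_sets(results, logfc=1.0, fdr=0.05):
    """Split DE results into up-regulated, down-regulated and all genes.

    results maps a gene ID to a (log2 fold change, FDR) pair.
    """
    # Assumption: a gene must also pass the FDR threshold (strictly below).
    significant = {g: lfc for g, (lfc, q) in results.items() if q < fdr}
    return {
        "up": sorted(g for g, lfc in significant.items() if lfc > logfc),
        "down": sorted(g for g, lfc in significant.items() if lfc < -logfc),
        "all": sorted(results),
    }


# Made-up values, for illustration only.
counts = {"a": [5, 6], "b": [1, 2], "c": [10, 0]}
results = {"a": (2.0, 0.01), "b": (-1.5, 0.001), "c": (3.0, 0.2), "d": (0.5, 0.01)}

print(sorted(filter_by_count_sum(counts)))
print(create_gene_sets(results))
```

In this example gene `c` has a large fold change but fails the FDR threshold, so it appears only in the set of all genes.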
Output results
- label
Differential expression
- type
basic:file
- required
True
- hidden
False
- label
Results table (JSON)
- type
basic:json
- required
True
- hidden
False
- label
Results table (file)
- type
basic:file
- required
True
- hidden
False
- label
Count matrix
- type
basic:file
- required
True
- hidden
False
- label
Gene ID database
- type
basic:string
- required
True
- hidden
False
- label
Species
- type
basic:string
- required
True
- hidden
False
- label
Build
- type
basic:string
- required
True
- hidden
False
- label
Feature type
- type
basic:string
- required
True
- hidden
False
Deeptools bamCoverage¶
-
data:coverage:bigwig:
scale-bigwig
(data:alignment:bam alignment, data:bedpe bedpe, basic:decimal scale)[Source: v1.1.1]
Creates a scaled BigWig file.
Input arguments
- label
Alignment BAM file
- type
data:alignment:bam
- required
True
- hidden
False
- label
BEDPE Normalization factor
- type
data:bedpe
- description
The BEDPE file describes disjoint genome features, such as structural variations or paired-end sequence alignments. It is used to estimate the scale factor.
- required
True
- hidden
False
- label
Scale for the normalization factor
- type
basic:decimal
- description
Magnitude of the scale factor. The scaling factor is calculated by dividing the scale with the number of features in BEDPE (scale/(number of features)).
- required
True
- hidden
False
- default
10000
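The scaling factor described above is a single division, scale/(number of features). A minimal sketch follows, assuming that every non-empty, non-comment line of the BEDPE file counts as one feature. The function name is hypothetical, and the handling of comment lines and empty input is not confirmed by the process description.

```python
def bedpe_scale_factor(bedpe_lines, scale=10000):
    """Return scale / (number of features in the BEDPE input)."""
    # Assumption: every non-empty line that is not a comment is one feature.
    n_features = sum(
        1 for line in bedpe_lines if line.strip() and not line.startswith("#")
    )
    if n_features == 0:
        raise ValueError("BEDPE input contains no features")
    return scale / n_features


# Four made-up BEDPE records give a factor of 10000 / 4.
records = [
    "chr1\t100\t150\tchr1\t300\t350",
    "chr1\t500\t550\tchr1\t700\t750",
    "chr2\t100\t150\tchr2\t300\t350",
    "chr2\t500\t550\tchr2\t700\t750",
]
print(bedpe_scale_factor(records))
```

A larger BEDPE file therefore yields a smaller factor, so coverage is scaled down in proportion to the number of features.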
Output results
- label
bigwig file
- type
basic:file
- required
True
- hidden
False
- label
Species
- type
basic:string
- required
True
- hidden
False
- label
Build
- type
basic:string
- required
True
- hidden
False
Detect library strandedness¶
-
data:strandedness
library-strandedness
(data:reads:fastq reads, basic:integer read_number, data:index:salmon salmon_index)[Source: v0.4.1]
This process uses the Salmon transcript quantification tool to automatically infer the NGS library strandedness. For more details, please see the Salmon [documentation](https://salmon.readthedocs.io/en/latest/library_type.html)
Input arguments
- label
Sequencing reads
- type
data:reads:fastq
- description
Sequencing reads in .fastq format. Both single and paired-end libraries are supported.
- label
Number of input reads
- type
basic:integer
- description
Number of sequencing reads that are subsampled from each of the original .fastq files before library strand detection
- default
50000
- label
Transcriptome index file
- type
data:index:salmon
- description
Transcriptome index file created using the Salmon indexing tool. cDNA (transcriptome) sequences used for index file creation must be derived from the same species as the input sequencing reads to obtain reliable analysis results.
Output results
- label
Library strandedness type
- type
basic:string
- description
The predicted library strandedness type. The codes U and IU indicate ‘strand non-specific’ library for single or paired-end reads, respectively. Codes SF and ISF correspond to the ‘strand-specific forward’ library, for the single or paired-end reads, respectively. For ‘strand-specific reverse’ library, the corresponding codes are SR and ISR. For more details, please see the Salmon [documentation](https://salmon.readthedocs.io/en/latest/library_type.html)
- label
Compatible fragment ratio
- type
basic:decimal
- description
The ratio of fragments that support the predicted library strandedness type
- label
Log file
- type
basic:file
- description
Analysis log file.
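The six codes listed under "Library strandedness type" can be kept in a small lookup table when post-processing this output. The sketch below encodes only the codes named in that description; the function name is hypothetical, and any other code is rejected rather than guessed.

```python
# Codes as listed in the "Library strandedness type" description.
STRANDEDNESS_BY_CODE = {
    "U": ("strand non-specific", "single-end"),
    "IU": ("strand non-specific", "paired-end"),
    "SF": ("strand-specific forward", "single-end"),
    "ISF": ("strand-specific forward", "paired-end"),
    "SR": ("strand-specific reverse", "single-end"),
    "ISR": ("strand-specific reverse", "paired-end"),
}


def describe_library(code):
    """Translate a strandedness code into (library type, read layout)."""
    try:
        return STRANDEDNESS_BY_CODE[code]
    except KeyError:
        raise ValueError(f"unrecognized strandedness code: {code!r}") from None


print(describe_library("ISR"))
```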
Dictyostelium expressions¶
-
data:expression:polya
expression-dicty
(data:alignment:bam alignment, data:annotation:gff3 gff, data:mappability:bcm mappable)[Source: v1.4.1]
Dictyostelium-specific pipeline. Developed by Bioinformatics Laboratory, Faculty of Computer and Information Science, University of Ljubljana, Slovenia and Shaulsky Lab, Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA.
Input arguments
- label
Aligned sequence
- type
data:alignment:bam
- label
Features (GFF3)
- type
data:annotation:gff3
- label
Mappability
- type
data:mappability:bcm
Output results
- label
Expression RPKUM (polyA)
- type
basic:file
- description
mRNA reads scaled by uniquely mappable part of exons.
- label
Expression RPKM (polyA)
- type
basic:file
- description
mRNA reads scaled by exon length.
- label
Read counts (polyA)
- type
basic:file
- description
mRNA reads uniquely mapped to gene exons.
- label
Expression RPKUM
- type
basic:file
- description
Reads scaled by uniquely mappable part of exons.
- label
Expression RPKM
- type
basic:file
- description
Reads scaled by exon length.
- label
Read counts (raw)
- type
basic:file
- description
Reads uniquely mapped to gene exons.
- label
Expression RPKUM (polyA) (json)
- type
basic:json
- label
Expression Type (default output)
- type
basic:string
- label
Gene ID database
- type
basic:string
- label
Species
- type
basic:string
- label
Build
- type
basic:string
- label
Feature type
- type
basic:string
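The RPKM and RPKUM outputs above differ only in the length used for scaling: exon length for RPKM, and the uniquely mappable part of the exons for RPKUM. The sketch below uses the conventional reads-per-kilobase-per-million formula for both. The exact normalization used by this pipeline is not stated here, so the formula, the function names and the example numbers are all assumptions.

```python
def reads_per_kilobase_per_million(read_count, length_bases, total_reads):
    """Scale a read count by a length in bases and by sequencing depth."""
    if length_bases <= 0 or total_reads <= 0:
        raise ValueError("length_bases and total_reads must be positive")
    return read_count * 1e9 / (length_bases * total_reads)


def rpkm(read_count, exon_length, total_reads):
    """Reads scaled by exon length."""
    return reads_per_kilobase_per_million(read_count, exon_length, total_reads)


def rpkum(read_count, mappable_length, total_reads):
    """Reads scaled by the uniquely mappable part of the exons."""
    return reads_per_kilobase_per_million(read_count, mappable_length, total_reads)


# Made-up gene: 500 reads, 2000 exon bases of which 1000 are uniquely mappable.
print(rpkm(500, 2000, 10_000_000))
print(rpkum(500, 1000, 10_000_000))
```

Because the uniquely mappable length can never exceed the exon length, RPKUM is at least as large as RPKM for the same gene under this formula.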
Differential Expression (table)¶
-
data:differentialexpression:upload
upload-diffexp
(basic:file src, basic:string gene_id, basic:string logfc, basic:string fdr, basic:string logodds, basic:string fwer, basic:string pvalue, basic:string stat, basic:string source, basic:string species, basic:string build, basic:string feature_type, list:data:expression case, list:data:expression control)[Source: v1.4.1]
Upload Differential Expression table.
Input arguments
- label
Differential expression file
- type
basic:file
- description
Differential expression file. Supported file types: *.xls, *.xlsx, *.tab (tab-delimited file), *.diff. DE file must include columns with log2(fold change) and FDR or pval information. DE file must contain header row with column names. Accepts DESeq, DESeq2, edgeR and CuffDiff output files.
- validate_regex
\.(xls|xlsx|tab|tab.gz|diff|diff.gz)$
- label
Gene ID label
- type
basic:string
- label
LogFC label
- type
basic:string
- label
FDR label
- type
basic:string
- required
False
- label
LogOdds label
- type
basic:string
- required
False
- label
FWER label
- type
basic:string
- required
False
- label
Pvalue label
- type
basic:string
- required
False
- label
Statistics label
- type
basic:string
- required
False
- label
Gene ID database
- type
basic:string
- choices
AFFY:
AFFY
DICTYBASE:
DICTYBASE
ENSEMBL:
ENSEMBL
NCBI:
NCBI
UCSC:
UCSC
- label
Species
- type
basic:string
- description
Species Latin name.
- choices
Homo sapiens:
Homo sapiens
Mus musculus:
Mus musculus
Rattus norvegicus:
Rattus norvegicus
Dictyostelium discoideum:
Dictyostelium discoideum
Odocoileus virginianus texanus:
Odocoileus virginianus texanus
Solanum tuberosum:
Solanum tuberosum
- label
Build
- type
basic:string
- description
Genome build or annotation version.
- label
Feature type
- type
basic:string
- default
gene
- choices
gene:
gene
transcript:
transcript
exon:
exon
- label
Case
- type
list:data:expression
- description
Case samples (replicates)
- required
False
- label
Control
- type
list:data:expression
- description
Control samples (replicates)
- required
False
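The `validate_regex` shown for the differential expression file can be tried out locally before uploading. The sketch below applies the pattern exactly as listed, using Python's `re.search`. Whether the platform uses search or full-match semantics is an assumption, and the helper name is hypothetical.

```python
import re

# Pattern copied verbatim from the validate_regex field.
VALIDATE_REGEX = re.compile(r"\.(xls|xlsx|tab|tab.gz|diff|diff.gz)$")


def is_supported(filename):
    """Return True if the file name ends with one of the accepted extensions."""
    return VALIDATE_REGEX.search(filename) is not None


for name in ("results.tab.gz", "results.xlsx", "results.csv", "results.tabXgz"):
    print(name, is_supported(name))
```

Note that the dots inside `tab.gz` and `diff.gz` are unescaped in the pattern, so they match any character. A name such as `results.tabXgz` is accepted by the regular expression even though it is not a gzip-compressed table.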
Output results
- label
Differential expression
- type
basic:file
- label
Results table (JSON)
- type
basic:json
- label
Results table (file)
- type
basic:file
- label
Gene ID database
- type
basic:string
- label
Species
- type
basic:string
- label
Build
- type
basic:string
- label
Feature type
- type
basic:string
Differential expression of shRNA¶
-
data:shrna:differentialexpression:
differentialexpression-shrna
(data:file parameter_file, list:data:expression:shrna2quant: expression_data)[Source: v1.2.1]
Performing differential expression on a list of objects. Analysis starts by inputting a set of expression files (count matrices) and a parameter file. Parameter file is an xlsx file and consists of tabs: - `sample_key`: Should have column sample with exact sample name as input expression file(s), columns defining treatment and lastly a column which indicates replicate. - `contrasts`: Define groups which will be used to perform differential expression analysis. Model for DE uses these contrasts and replicate number. In R annotation, this would be ` ~ 1