Process definitions

ATAC-Seq

data:workflow:atacseq workflow-atac-seq (data:reads:fastq reads, data:index:bowtie2 genome, data:bed promoter, basic:string mode, basic:string speed, basic:boolean use_se, basic:boolean discordantly, basic:boolean rep_se, basic:integer minins, basic:integer maxins, basic:integer trim_5, basic:integer trim_3, basic:integer trim_iter, basic:integer trim_nucl, basic:string rep_mode, basic:integer k_reports, basic:integer q_threshold, basic:integer n_sub, basic:boolean tn5, basic:integer shift, basic:boolean tagalign, basic:string duplicates, basic:string duplicates_prepeak, basic:decimal qvalue, basic:decimal pvalue, basic:decimal pvalue_prepeak, basic:integer cap_num, basic:integer mfold_lower, basic:integer mfold_upper, basic:integer slocal, basic:integer llocal, basic:integer extsize, basic:integer shift, basic:integer band_width, basic:boolean nolambda, basic:boolean fix_bimodal, basic:boolean nomodel, basic:boolean nomodel_prepeak, basic:boolean down_sample, basic:boolean bedgraph, basic:boolean spmr, basic:boolean call_summits, basic:boolean broad, basic:decimal broad_cutoff)[Source: v3.0.1]

This ATAC-seq pipeline closely follows the official ENCODE DCC pipeline. It consists of three steps: alignment, pre-peakcall QC, and peak calling (with post-peakcall QC). First, reads are aligned to a genome using the [Bowtie2](http://bowtie-bio.sourceforge.net/index.shtml) aligner. Next, pre-peakcall QC metrics are calculated. The QC report contains the QC metrics proposed by ENCODE 3: [NRF](https://www.encodeproject.org/data-standards/terms/), [PBC bottlenecking coefficients, NSC, and RSC](https://genome.ucsc.edu/ENCODE/qualityMetrics.html#chipSeq). Finally, peaks are called using [MACS2](https://github.com/taoliu/MACS/). The post-peakcall QC report includes additional QC metrics: the number of peaks, the fraction of reads in peaks (FRiP), the number of reads in peaks, and, if a promoter regions BED file is provided, the number of reads in promoter regions, the fraction of reads in promoter regions, the number of peaks in promoter regions, and the fraction of peaks in promoter regions.
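The library-complexity and enrichment metrics named above reduce to simple ratios. The sketch below follows the ENCODE definitions (NRF = distinct read locations / total reads, PBC1 = M1 / M_DISTINCT, PBC2 = M1 / M2, where M1 and M2 are the numbers of locations covered by exactly one and exactly two reads). It assumes reads arrive as hypothetical `(chrom, start, end, strand)` tuples and is an illustration of the definitions, not the code this process runs.

```python
from collections import Counter
from typing import Iterable, Tuple

Read = Tuple[str, int, int, str]  # (chrom, start, end, strand); hypothetical format


def library_complexity(reads: Iterable[Read]) -> Tuple[float, float, float]:
    """Return (NRF, PBC1, PBC2) for a collection of mapped reads."""
    counts = Counter(reads)  # number of reads per distinct genomic location
    total = sum(counts.values())
    distinct = len(counts)
    m1 = sum(1 for c in counts.values() if c == 1)  # locations with exactly one read
    m2 = sum(1 for c in counts.values() if c == 2)  # locations with exactly two reads
    nrf = distinct / total
    pbc1 = m1 / distinct
    pbc2 = m1 / m2 if m2 else float("inf")  # undefined when no location has two reads
    return nrf, pbc1, pbc2


def frip(reads_in_peaks: int, total_reads: int) -> float:
    """Fraction of reads in peaks."""
    return reads_in_peaks / total_reads


a = ("chr1", 100, 150, "+")
b = ("chr1", 200, 250, "-")
c = ("chr2", 10, 60, "+")
d = ("chr2", 500, 550, "-")
print(library_complexity([a, a, b, c, c, c, d]))  # 4 distinct locations, 7 reads
print(frip(30, 100))
```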

Input arguments

reads
label

Select sample(s)

type

data:reads:fastq

genome
label

Genome

type

data:index:bowtie2

promoter
label

Promoter regions BED file

type

data:bed

description

BED file containing promoter regions (for example, TSS ± 1000 bp). Required to compute the number of peaks and reads mapped to promoter regions.

required

False
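A promoter BED file of the kind described above can be derived from transcription start sites. This minimal sketch assumes TSS positions are already available as hypothetical `(chrom, tss, strand, name)` records with 0-based coordinates and uses the ±1000 bp window from the example; clipping the end at the chromosome length is left out.

```python
from typing import Iterable, List, Tuple

Tss = Tuple[str, int, str, str]  # (chrom, 0-based TSS, strand, name); hypothetical


def promoter_bed_lines(tss_records: Iterable[Tss], flank: int = 1000) -> List[str]:
    """Return BED6 lines covering TSS - flank .. TSS + flank."""
    lines = []
    for chrom, tss, strand, name in tss_records:
        start = max(0, tss - flank)  # BED starts cannot be negative
        end = tss + flank            # not clipped to the chromosome length here
        lines.append(f"{chrom}\t{start}\t{end}\t{name}\t0\t{strand}")
    return lines


for line in promoter_bed_lines([("chr1", 5000, "+", "geneA"), ("chr2", 300, "-", "geneB")]):
    print(line)
```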

alignment.mode
label

Alignment mode

type

basic:string

description

End to end: Bowtie 2 requires that the entire read align from one end to the other, without any trimming (or “soft clipping”) of characters from either end. Local: Bowtie 2 does not require that the entire read align from one end to the other. Rather, some characters may be omitted (“soft clipped”) from the ends in order to achieve the greatest possible alignment score.

default

--local

choices

  • end to end mode: --end-to-end

  • local: --local

alignment.speed
label

Speed vs. Sensitivity

type

basic:string

default

--sensitive

choices

  • Very fast: --very-fast

  • Fast: --fast

  • Sensitive: --sensitive

  • Very sensitive: --very-sensitive

alignment.PE_options.use_se
label

Map as single-ended (for paired-end reads only)

type

basic:boolean

description

If this option is selected, paired-end reads are mapped as single-ended and the other paired-end options are ignored.

default

False

alignment.PE_options.discordantly
label

Report discordantly matched reads

type

basic:boolean

description

If both mates have unique alignments but the alignments do not match paired-end expectations (orientation and relative distance), the alignment is still reported. Useful for detecting structural variations.

default

True

alignment.PE_options.rep_se
label

Report single ended

type

basic:boolean

description

If a paired alignment cannot be found, Bowtie2 tries to find alignments for the individual mates.

default

True

alignment.PE_options.minins
label

Minimal distance

type

basic:integer

description

The minimum fragment length for valid paired-end alignments. 0 imposes no minimum.

default

0

alignment.PE_options.maxins
label

Maximal distance

type

basic:integer

description

The maximum fragment length for valid paired-end alignments.

default

2000

alignment.start_trimming.trim_5
label

Bases to trim from 5’

type

basic:integer

description

Number of bases to trim from the 5’ (left) end of each read before alignment.

default

0

alignment.start_trimming.trim_3
label

Bases to trim from 3’

type

basic:integer

description

Number of bases to trim from the 3’ (right) end of each read before alignment.

default

0

alignment.trimming.trim_iter
label

Iterations

type

basic:integer

description

Number of trimming iterations.

default

0

alignment.trimming.trim_nucl
label

Bases to trim

type

basic:integer

description

Number of bases to trim from 3’ end in each iteration.

default

2
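One way to read the two iterative-trimming parameters above: reads that fail to align are shortened by `trim_nucl` bases at the 3’ end and retried, up to `trim_iter` times. The sketch below encodes that reading as an assumption; `align` is a hypothetical callback standing in for a Bowtie2 run that returns the IDs of reads that aligned.

```python
from typing import Callable, List, Set, Tuple

Read = Tuple[str, str]  # (read ID, sequence)


def iterative_trim_align(
    reads: List[Read],
    align: Callable[[List[Read]], Set[str]],  # hypothetical aligner wrapper
    trim_iter: int = 0,
    trim_nucl: int = 2,
) -> Set[str]:
    """Return IDs of reads aligned in the first pass or after 3' trimming (assumed scheme)."""
    aligned = set(align(reads))
    remaining = [r for r in reads if r[0] not in aligned]
    for _ in range(trim_iter):
        # Trim trim_nucl bases from the 3' end; drop reads that would become empty.
        remaining = [(rid, seq[: len(seq) - trim_nucl]) for rid, seq in remaining if len(seq) > trim_nucl]
        if not remaining:
            break
        newly_aligned = set(align(remaining))
        aligned |= newly_aligned
        remaining = [r for r in remaining if r[0] not in newly_aligned]
    return aligned


def toy_align(batch: List[Read]) -> Set[str]:
    """Toy aligner: pretend only reads of length <= 4 align."""
    return {rid for rid, seq in batch if len(seq) <= 4}


reads = [("r1", "ACGT"), ("r2", "ACGTAC"), ("r3", "ACGTACGTAC")]
print(sorted(iterative_trim_align(reads, toy_align, trim_iter=1, trim_nucl=2)))  # ['r1', 'r2']
```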

alignment.reporting.rep_mode
label

Report mode

type

basic:string

description

Default mode: search for multiple alignments and report the best one; -k mode: search for one or more alignments and report each; -a mode: search for and report all alignments.

default

def

choices

  • Default mode: def

  • -k mode: k

  • -a mode (very slow): a

alignment.reporting.k_reports
label

Number of reports (for -k mode only)

type

basic:integer

description

Searches for at most X distinct, valid alignments for each read. The search terminates when it can’t find more distinct valid alignments, or when it finds X, whichever happens first.

default

5
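The alignment options above correspond to standard Bowtie2 command-line flags (`--trim5`, `--trim3`, `--minins`, `--maxins`, `--no-discordant`, `--no-mixed`, `-k`, `-a`). The function below is a hypothetical helper showing one plausible mapping from these input arguments to flags; the command line this process actually builds may differ, and the index and read arguments are omitted.

```python
from typing import List


def bowtie2_args(
    mode: str = "--local",
    speed: str = "--sensitive",
    use_se: bool = False,
    discordantly: bool = True,
    rep_se: bool = True,
    minins: int = 0,
    maxins: int = 2000,
    trim_5: int = 0,
    trim_3: int = 0,
    rep_mode: str = "def",
    k_reports: int = 5,
) -> List[str]:
    """Translate the input arguments above into Bowtie2 option flags (hypothetical helper)."""
    args = [mode, speed, "--trim5", str(trim_5), "--trim3", str(trim_3)]
    if not use_se:
        # Paired-end options only matter when mates are aligned as pairs.
        args += ["--minins", str(minins), "--maxins", str(maxins)]
        if not discordantly:
            args.append("--no-discordant")
        if not rep_se:
            args.append("--no-mixed")  # do not fall back to aligning mates individually
    if rep_mode == "k":
        args += ["-k", str(k_reports)]
    elif rep_mode == "a":
        args.append("-a")
    # rep_mode == "def": Bowtie2's default reporting, no extra flag.
    return args


print(" ".join(bowtie2_args(rep_mode="k", discordantly=False)))
```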

prepeakqc_settings.q_threshold
label

Quality filtering threshold

type

basic:integer

default

30

prepeakqc_settings.n_sub
label

Number of reads to subsample

type

basic:integer

default

25000000

prepeakqc_settings.tn5
label

Tn5 shifting

type

basic:boolean

description

Tn5 transposon shifting: shift reads on the “+” strand by +4 bp and reads on the “-” strand by -5 bp.

default

True
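A common way to apply the Tn5 shift is to move the 5’ end of each read onto the transposase cut site: the start coordinate of a “+” strand read moves by +4 bp and the end coordinate of a “-” strand read moves by -5 bp. This is a minimal sketch under that assumption (0-based, half-open BED-style coordinates); this process may instead shift whole reads, and reads collapsing to zero length are not handled here.

```python
from typing import Tuple

Interval = Tuple[str, int, int, str]  # (chrom, start, end, strand), 0-based half-open


def tn5_shift(read: Interval) -> Interval:
    """Move the 5' end of a read to the Tn5 cut site (+4 bp on "+", -5 bp on "-")."""
    chrom, start, end, strand = read
    if strand == "+":
        return chrom, start + 4, end, strand  # 5' end is the start coordinate
    if strand == "-":
        return chrom, start, end - 5, strand  # 5' end is the end coordinate
    return read  # unknown strand: leave unchanged


print(tn5_shift(("chr1", 100, 150, "+")))  # ('chr1', 104, 150, '+')
print(tn5_shift(("chr1", 100, 150, "-")))  # ('chr1', 100, 145, '-')
```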

prepeakqc_settings.shift
label

User-defined cross-correlation peak strandshift

type

basic:integer

description

If defined, SPP tool will not try to estimate fragment length but will use the given value as fragment length.

default

0

settings.tagalign
label

Use tagAlign files

type

basic:boolean

description

Use filtered tagAlign files as case (treatment) and control (background) samples. If the extsize parameter is not set, MACS is run using the input’s estimated fragment length.

default

True

settings.duplicates
label

Number of duplicates

type

basic:string

description

Controls the MACS behavior towards duplicate tags at the exact same location, i.e., the same coordinates and the same strand. The ‘auto’ option makes MACS calculate the maximum number of tags at the exact same location based on the binomial distribution, using 1e-5 as the p-value cutoff, and the ‘all’ option keeps all tags. If an integer is given, at most this number of tags is kept at the same location. MACS’s own default is to keep one tag at the same location.

required

False

hidden

settings.tagalign

choices

  • 1: 1

  • auto: auto

  • all: all

settings.duplicates_prepeak
label

Number of duplicates

type

basic:string

description

Controls the MACS behavior towards duplicate tags at the exact same location, i.e., the same coordinates and the same strand. The ‘auto’ option makes MACS calculate the maximum number of tags at the exact same location based on the binomial distribution, using 1e-5 as the p-value cutoff, and the ‘all’ option keeps all tags. If an integer is given, at most this number of tags is kept at the same location. MACS’s own default is to keep one tag at the same location.

required

False

hidden

!settings.tagalign

default

all

choices

  • 1: 1

  • auto: auto

  • all: all
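The integer and ‘all’ settings of the two duplicate options above amount to capping how many tags share the same (chromosome, position, strand) key. The sketch below illustrates only those two settings on hypothetical `(chrom, pos, strand)` tuples; the binomial ‘auto’ mode is computed inside MACS and is not reproduced here.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Tag = Tuple[str, int, str]  # (chrom, 5' position, strand); hypothetical format


def filter_duplicate_tags(tags: List[Tag], keep: str = "1") -> List[Tag]:
    """Keep at most `keep` tags per location, or every tag when keep == "all"."""
    if keep == "all":
        return list(tags)
    if keep == "auto":
        raise NotImplementedError("'auto' uses MACS's binomial model; not sketched here")
    limit = int(keep)
    seen: Dict[Tag, int] = defaultdict(int)
    kept = []
    for tag in tags:
        if seen[tag] < limit:  # same coordinates and same strand count as duplicates
            seen[tag] += 1
            kept.append(tag)
    return kept


tags = [("chr1", 10, "+")] * 3 + [("chr1", 10, "-")]
print(len(filter_duplicate_tags(tags, "1")), len(filter_duplicate_tags(tags, "all")))  # 2 4
```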

settings.qvalue
label

Q-value cutoff

type

basic:decimal

description

The q-value (minimum FDR) cutoff to call significant regions. Q-values are calculated from p-values using the Benjamini-Hochberg procedure.

required

False

disabled

settings.pvalue && settings.pvalue_prepeak
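For reference, the Benjamini-Hochberg adjustment mentioned in the q-value description can be written in a few lines. This is the textbook step-up procedure applied to a plain list of p-values, shown only to illustrate the definition; MACS2 computes its q-values internally from its own pileup statistics.

```python
from typing import List


def bh_qvalues(pvalues: List[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values (q-values), returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p-value
    qvalues = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonic q-values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        qvalues[i] = running_min
    return qvalues


q = bh_qvalues([0.01, 0.04, 0.03, 0.2])
print([round(x, 4) for x in q])  # [0.04, 0.0533, 0.0533, 0.2]
```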

settings.pvalue
label

P-value cutoff

type

basic:decimal

description

The p-value cutoff. If specified, MACS2 uses the p-value cutoff instead of the q-value cutoff.

required

False

disabled

settings.qvalue

hidden

settings.tagalign

settings.pvalue_prepeak
label

P-value cutoff

type

basic:decimal

description

The p-value cutoff. If specified, MACS2 uses the p-value cutoff instead of the q-value cutoff.

disabled

settings.qvalue

hidden

!settings.tagalign || settings.qvalue

default

0.01

settings.cap_num
label

Cap number of peaks by taking top N peaks

type

basic:integer

description

To keep all peaks, set the value to 0.

disabled

settings.broad

default

300000
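Capping keeps the N strongest peaks. The sketch below assumes peaks are ranked by the -log10 p-value in column 8 of MACS2 narrowPeak output, which is a common choice but an assumption about this process; a value of 0 keeps everything, as stated above.

```python
from typing import List

Row = List[str]  # one tab-split narrowPeak line


def cap_peaks(peaks: List[Row], cap_num: int = 300000) -> List[Row]:
    """Return the top `cap_num` peaks by -log10(p-value); 0 keeps all peaks."""
    if cap_num == 0:
        return list(peaks)
    # narrowPeak column 8 (index 7) holds -log10(p-value); higher is stronger.
    ranked = sorted(peaks, key=lambda row: float(row[7]), reverse=True)
    return ranked[:cap_num]


peaks = [
    ["chr1", "100", "300", "p1", "50", ".", "4.1", "5.0", "3.2", "90"],
    ["chr1", "900", "1200", "p2", "80", ".", "6.3", "9.0", "7.0", "120"],
    ["chr2", "40", "260", "p3", "20", ".", "2.2", "2.0", "1.1", "60"],
]
print([row[3] for row in cap_peaks(peaks, cap_num=2)])  # ['p2', 'p1']
```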

settings.mfold_lower
label

MFOLD range (lower limit)

type

basic:integer

description

This parameter is used to select the regions within the MFOLD range of high-confidence enrichment ratio against background to build the model. The fold enrichment of the regions must be lower than the upper limit and higher than the lower limit. For example, a range of 10,30 means using all regions not too low (>10) and not too high (<30) to build the paired-peaks model. If MACS cannot find more than 100 regions to build the model, it uses the --extsize parameter to continue peak detection ONLY if --fix-bimodal is set.

required

False

settings.mfold_upper
label

MFOLD range (upper limit)

type

basic:integer

description

This parameter is used to select the regions within the MFOLD range of high-confidence enrichment ratio against background to build the model. The fold enrichment of the regions must be lower than the upper limit and higher than the lower limit. For example, a range of 10,30 means using all regions not too low (>10) and not too high (<30) to build the paired-peaks model. If MACS cannot find more than 100 regions to build the model, it uses the --extsize parameter to continue peak detection ONLY if --fix-bimodal is set.

required

False
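In other words, only regions whose fold enrichment falls strictly between the two MFOLD limits are used to build the paired-peak model. A minimal sketch of that selection, using the 10,30 range from the description and hypothetical `(name, fold_enrichment)` pairs:

```python
from typing import List, Tuple

Region = Tuple[str, float]  # (region name, fold enrichment over background); hypothetical


def model_building_regions(regions: List[Region], mfold_lower: float, mfold_upper: float) -> List[Region]:
    """Keep regions with mfold_lower < fold enrichment < mfold_upper."""
    return [r for r in regions if mfold_lower < r[1] < mfold_upper]


regions = [("a", 8.0), ("b", 12.5), ("c", 30.0), ("d", 25.0)]
print(model_building_regions(regions, 10, 30))  # [('b', 12.5), ('d', 25.0)]
```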

settings.slocal
label

Small local region

type

basic:integer

description

The slocal and llocal parameters control which two levels of regions are checked around the peak regions to calculate the maximum lambda as the local lambda. By default, MACS considers 1000 bp for the small local region (--slocal) and 10000 bp for the large local region (--llocal), which captures the bias from long-range effects such as an open chromatin domain. You can tweak these according to your project. Note that if the region is set too small, a sharp spike in the input data may suppress a significant peak.

required

False

settings.llocal
label

Large local region

type

basic:integer

description

The slocal and llocal parameters control which two levels of regions are checked around the peak regions to calculate the maximum lambda as the local lambda. By default, MACS considers 1000 bp for the small local region (--slocal) and 10000 bp for the large local region (--llocal), which captures the bias from long-range effects such as an open chromatin domain. You can tweak these according to your project. Note that if the region is set too small, a sharp spike in the input data may suppress a significant peak.

required

False
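The local lambda is the largest of several background estimates. The sketch below assumes the MACS-style convention of scaling each window's control read count to the fragment length d (count × d / window size) and taking the maximum together with the genome-wide background lambda. Window counts are passed in as hypothetical inputs rather than computed from reads.

```python
from typing import Dict


def local_lambda(window_counts: Dict[int, int], d: int, lambda_bg: float) -> float:
    """Return max(lambda_bg, count * d / window) over the given windows.

    window_counts maps a window size in bp (for example slocal=1000 and
    llocal=10000) to the control read count in that window around the candidate peak.
    """
    lambdas = [lambda_bg]
    for window, count in window_counts.items():
        lambdas.append(count * d / window)  # expected reads per fragment-length bin
    return max(lambdas)


print(local_lambda({1000: 20, 10000: 100}, d=200, lambda_bg=1.5))  # 4.0
```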

settings.extsize
label

extsize

type

basic:integer

description

While ‘--nomodel’ is set, MACS uses this parameter to extend reads in the 5’->3’ direction to fixed-size fragments. For example, if the size of the binding region for your transcription factor is 200 bp and you want to bypass model building by MACS, this parameter can be set to 200. This option is only valid when --nomodel is set or when MACS fails to build the model and --fix-bimodal is on.

default

150

settings.shift
label

Shift

type

basic:integer

description

Note that this is NOT the legacy --shiftsize option, which has been replaced by --extsize. You can set an arbitrary shift in bp here; use discretion when setting it to a value other than 0. When --nomodel is set, MACS uses this value to move cutting ends (5’) and then applies --extsize from 5’ to 3’ to extend them to fragments. When this value is negative, ends are moved in the 3’->5’ direction, otherwise in the 5’->3’ direction. It is recommended to keep it at 0 for ChIP-Seq datasets, or to set it to -1 * half of EXTSIZE together with the --extsize option for detecting enriched cutting loci, such as in certain DNaseI-Seq datasets. Note that you cannot set values other than 0 if the format is BAMPE for paired-end data. The MACS default is 0; this process defaults to -75, i.e., -1 * half of the default extsize (150).

default

-75
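The interplay between shift and extsize can be checked with a little arithmetic. This sketch treats a tag as a single 5’ cut position (ignoring BED off-by-one details), moves it by `shift` along the read direction, and then extends it by `extsize` in the 5’->3’ direction; with this process's defaults (shift -75, extsize 150) the resulting fragment is centered on the cut site. The function name is hypothetical.

```python
from typing import Tuple


def fragment_from_cut(cut: int, strand: str, shift: int = -75, extsize: int = 150) -> Tuple[int, int]:
    """Return (start, end) of the pseudo-fragment built from a 5' cut position."""
    if strand == "+":
        start = cut + shift  # positive shift moves 5'->3', i.e. rightwards on "+"
        return start, start + extsize
    end = cut - shift        # on "-", 5'->3' runs leftwards
    return end - extsize, end


print(fragment_from_cut(1000, "+"))  # (925, 1075)
print(fragment_from_cut(1000, "-"))  # (925, 1075)
print(fragment_from_cut(1000, "+", shift=0, extsize=200))  # (1000, 1200)
```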

settings.band_width
label

Band width

type

basic:integer

description

The band width which is used to scan the genome ONLY for model building. You can set this parameter as the sonication fragment size expected from wet experiment. The previous side effect on the peak detection process has been removed. So this parameter only affects the model building.

required

False

settings.nolambda
label

Use background lambda as local lambda

type

basic:boolean

description

With this flag on, MACS will use the background lambda as local lambda. This means MACS will not consider the local bias at peak candidate regions.

default

False

settings.fix_bimodal
label

Turn on the auto paired-peak model process

type

basic:boolean

description

Turn on the auto paired-peak model process. If set, when MACS fails to build the paired model, it uses the nomodel settings and the --extsize parameter to extend each tag. If not set, MACS terminates when building the paired-peak model fails.

default

False

settings.nomodel
label

Bypass building the shifting model

type

basic:boolean

description

While on, MACS will bypass building the shifting model.

hidden

settings.tagalign

default

False

settings.nomodel_prepeak
label

Bypass building the shifting model

type

basic:boolean

description

While on, MACS will bypass building the shifting model.

hidden

!settings.tagalign

default

True

settings.down_sample
label

Down-sample

type

basic:boolean

description

When set to true, random sampling is used to scale down the bigger sample. By default, MACS uses linear scaling. This option makes the results unstable and irreproducible, since different random reads are selected in each run; in particular, the numbers (pileup, p-value, q-value) will change.

default

False
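The difference between the two scaling strategies can be sketched as follows: linear scaling multiplies the larger sample's signal by a constant ratio, while down-sampling keeps a random subset of its reads. The helper names are hypothetical; the explicit `seed` only shows that random down-sampling is reproducible solely when the random state is pinned, which is why the description warns about unstable results.

```python
import random
from typing import List, Optional, Sequence


def linear_scale_factor(larger_total: int, smaller_total: int) -> float:
    """Factor applied to the larger sample under linear scaling."""
    return smaller_total / larger_total


def downsample(reads: Sequence[str], target: int, seed: Optional[int] = None) -> List[str]:
    """Randomly keep `target` reads; results differ between runs unless seed is fixed."""
    rng = random.Random(seed)
    return rng.sample(list(reads), target)


treatment = [f"read{i}" for i in range(200)]
print(linear_scale_factor(len(treatment), 50))  # 0.25
print(downsample(treatment, 50, seed=1) == downsample(treatment, 50, seed=1))  # True
```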

settings.bedgraph
label

Save fragment pileup and control lambda

type

basic:boolean

description

If this flag is on, MACS stores the fragment pileup, control lambda, -log10pvalue and -log10qvalue scores in bedGraph files. The bedGraph files are stored in the current directory and named NAME+’_treat_pileup.bdg’ for treatment data, NAME+’_control_lambda.bdg’ for local lambda values from control, NAME+’_treat_pvalue.bdg’ for Poisson p-value scores (in -log10(pvalue) form), and NAME+’_treat_qvalue.bdg’ for q-value scores from the Benjamini-Hochberg-Yekutieli procedure.

default

True

settings.spmr
label

Save signal per million reads for fragment pileup profiles

type

basic:boolean

disabled

settings.bedgraph === false

default

True

settings.call_summits
label

Call summits

type

basic:boolean

description

MACS reanalyzes the shape of the signal profile (p- or q-score, depending on the cutoff setting) to deconvolve subpeaks within each peak called by the general procedure. This is highly recommended for detecting adjacent binding events. When used, the output subpeaks of a big peak region have the same peak boundaries but different scores and peak summit positions.

default

True

settings.broad
label

Composite broad regions

type

basic:boolean

description

When this flag is on, MACS tries to composite broad regions in BED12 (a gene-model-like format) by putting nearby highly enriched regions into a broad region with a loose cutoff. The broad region is controlled by another cutoff through --broad-cutoff. The maximum length of a broad region is 4 times d, the fragment size estimated by MACS.

disabled

settings.call_summits === true

default

False

settings.broad_cutoff
label

Broad cutoff

type

basic:decimal

description

Cutoff for broad regions. This option is not available unless --broad is set. If -p is set, this is a p-value cutoff; otherwise, it is a q-value cutoff. DEFAULT = 0.1

required

False

disabled

settings.call_summits === true || settings.broad !== true

Output results

Abstract alignment process

data:alignment abstract-alignment ()[Source: v1.0.0]

Input arguments

Output results

bam
label

Alignment file

type

basic:file

bai
label

Alignment index BAI

type

basic:file

species
label

Species

type

basic:string

build
label

Build

type

basic:string

Abstract annotation process

data:annotation abstract-annotation ()[Source: v1.0.0]

Input arguments

Output results

annot
label

Uploaded file

type

basic:file

source
label

Gene ID source

type

basic:string

species
label

Species

type

basic:string

build
label

Build

type

basic:string

Abstract bed process

data:bed abstract-bed ()[Source: v1.0.1]

Input arguments

Output results

bed
label

BED

type

basic:file

species
label

Species

type

basic:string

build
label

Build

type

basic:string

Abstract differential expression process

data:differentialexpression abstract-differentialexpression ()[Source: v1.0.0]

Input arguments

Output results

raw
label

Differential expression (gene level)

type

basic:file

de_json
label

Results table (JSON)

type

basic:json

de_file
label

Results table (file)

type

basic:file

source
label

Gene ID source

type

basic:string

species
label

Species

type

basic:string

build
label

Build

type

basic:string

feature_type
label

Feature type

type

basic:string

Abstract expression process

data:expression abstract-expression ()[Source: v1.0.0]

Input arguments

Output results

exp
label

Normalized expression

type

basic:file

rc
label

Read counts

type

basic:file

required

False

exp_json
label

Expression (json)

type

basic:json

exp_type
label

Expression type

type

basic:string

source
label

Gene ID source

type

basic:string

species
label

Species

type

basic:string

build
label

Build

type

basic:string

feature_type
label

Feature type

type

basic:string

Accel Amplicon Pipeline

data:workflow:amplicon workflow-accel (data:reads:fastq:paired reads, data:seq:nucleotide genome, data:index:bwa bwa_index, data:masterfile:amplicon master_file, data:seq:nucleotide adapters, list:data:variants:vcf known_indels, list:data:variants:vcf known_vars, data:variants:vcf dbsnp, basic:integer mbq, basic:integer stand_call_conf, basic:integer min_bq, basic:integer min_alt_bq, list:data:variants:vcf known_vars_db, basic:decimal af_threshold)[Source: v5.0.1]

Processing pipeline to analyse the Accel-Amplicon NGS panel data. The raw amplicon sequencing reads are quality trimmed using Trimmomatic. The quality of the raw and trimmed data is assessed using the FASTQC tool. Quality trimmed reads are aligned to a reference genome using BWA mem. Sequencing primers are removed from the aligned reads using Primerclip. Amplicon performance stats are calculated using Bedtools coveragebed and Picard CollectTargetedPcrMetrics programs. Prior to variant calling, the alignment file is preprocessed using the GATK IndelRealigner and BaseRecalibrator tools. GATK HaplotypeCaller and Lofreq tools are used to call germline variants. Called variants are annotated using the SnpEff tool. Finally, the amplicon performance metrics and identified variants data are used to generate the PDF analysis report.

Input arguments

reads
label

Input reads

type

data:reads:fastq:paired

genome
label

Genome sequence (FASTA)

type

data:seq:nucleotide

bwa_index
label

Genome index (BWA)

type

data:index:bwa

master_file
label

Experiment Master file

type

data:masterfile:amplicon

adapters
label

Adapters

type

data:seq:nucleotide

description

Provide an Illumina sequencing adapters file (.fasta) with adapters to be removed by Trimmomatic.

preprocess_bam.known_indels
label

Known indels

type

list:data:variants:vcf

preprocess_bam.known_vars
label

Known variants

type

list:data:variants:vcf

gatk.dbsnp
label

dbSNP

type

data:variants:vcf

gatk.mbq
label

Min Base Quality

type

basic:integer

description

Minimum base quality required to consider a base for calling.

default

20

gatk.stand_call_conf
label

Min call confidence threshold

type

basic:integer

description

The minimum phred-scaled confidence threshold at which variants should be called.

default

20

lofreq.min_bq
label

Min baseQ

type

basic:integer

description

Skip any base with baseQ smaller than this value.

default

20

lofreq.min_alt_bq
label

Min alternate baseQ

type

basic:integer

description

Skip alternate bases with baseQ smaller than this value.

default

20

var_annot.known_vars_db
label

Known variants

type

list:data:variants:vcf

report.af_threshold
label

Allele frequency threshold

type

basic:decimal

default

0.01

Output results

Align (BWA) and trim adapters

data:alignment:bam:bwatrim align-bwa-trim (data:masterfile:amplicon master_file, data:index:bwa genome, data:reads:fastq reads, basic:integer seed_l, basic:integer band_w, basic:decimal re_seeding, basic:boolean m, basic:integer match, basic:integer missmatch, basic:integer gap_o, basic:integer gap_e, basic:integer clipping, basic:integer unpaired_p, basic:boolean report_all, basic:integer report_tr)[Source: v2.1.1]

Align with BWA mem and trim the SAM output. The process uses the memory-optimized Primertrim tool.

Input arguments

master_file
label

Master file

type

data:masterfile:amplicon

description

Amplicon experiment design file that holds the information about the primers to be removed.

genome
label

Reference genome

type

data:index:bwa

reads
label

Reads

type

data:reads:fastq

seed_l
label

Minimum seed length

type

basic:integer

description

Minimum seed length. Matches shorter than the minimum seed length will be missed. The alignment speed is usually insensitive to this value unless it significantly deviates from 20.

default

19

band_w
label

Band width

type

basic:integer

description

Gaps longer than this will not be found.

default

100

re_seeding
label

Re-seeding factor

type

basic:decimal

description

Trigger re-seeding for a MEM longer than minSeedLen*FACTOR. This is a key heuristic parameter for tuning the performance. A larger value yields fewer seeds, which leads to faster alignment but lower accuracy.

default

1.5

m
label

Mark shorter split hits as secondary

type

basic:boolean

description

Mark shorter split hits as secondary (for Picard compatibility)

default

False

scoring.match
label

Score of a match

type

basic:integer

default

1

scoring.missmatch
label

Mismatch penalty

type

basic:integer

default

4

scoring.gap_o
label

Gap open penalty

type

basic:integer

default

6

scoring.gap_e
label

Gap extension penalty

type

basic:integer

default

1

scoring.clipping
label

Clipping penalty

type

basic:integer

description

While performing the alignment extension, the best score reaching the end of the query is tracked. If this score is larger than the best local alignment score minus the clipping penalty, clipping is not applied.

default

5
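Following the BWA-MEM manual's description of the clipping penalty, the decision is a comparison between the best local (Smith-Waterman) score and the best score that reaches the end of the query. This is a simplified single-end illustration with hypothetical argument names, not BWA's code.

```python
def apply_clipping(best_local_score: int, end_to_end_score: int, clipping_penalty: int = 5) -> bool:
    """Return True if the clipped (local) alignment is kept.

    Clipping is skipped when the best score reaching the end of the query is
    larger than the best local score minus the clipping penalty.
    """
    return not (end_to_end_score > best_local_score - clipping_penalty)


print(apply_clipping(60, 50))  # True: 50 is not larger than 60 - 5
print(apply_clipping(60, 57))  # False: 57 > 55, so the read is not clipped
```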

scoring.unpaired_p
label

Penalty for an unpaired read pair

type

basic:integer

description

Affinity to force pairing. An unpaired read pair is scored as scoreRead1+scoreRead2-Penalty.

default

9
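Per the BWA-MEM manual, an unpaired read pair is scored as scoreRead1 + scoreRead2 - penalty and a paired alignment as scoreRead1 + scoreRead2 - insertPenalty, and the two scores are compared to decide whether to force pairing. The sketch uses hypothetical argument names and, as a simplifying assumption, resolves ties in favor of pairing.

```python
from typing import Tuple


def prefer_pairing(
    paired: Tuple[int, int],
    unpaired: Tuple[int, int],
    insert_penalty: int,
    unpaired_penalty: int = 9,
) -> bool:
    """Return True if the properly paired placement scores at least as well.

    paired: alignment scores of read 1 and read 2 in the best paired placement.
    unpaired: their best scores when each mate is aligned independently.
    """
    paired_score = paired[0] + paired[1] - insert_penalty
    unpaired_score = unpaired[0] + unpaired[1] - unpaired_penalty
    return paired_score >= unpaired_score  # tie-breaking here is an assumption


print(prefer_pairing((50, 48), (50, 50), insert_penalty=3))   # True: 95 vs 91
print(prefer_pairing((50, 48), (50, 50), insert_penalty=12))  # False: 86 vs 91
```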

reporting.report_all
label

Report all found alignments

type

basic:boolean

description

Output all found alignments for single-end or unpaired paired-end reads. These alignments will be flagged as secondary alignments.

default

False

reporting.report_tr
label

Report threshold score

type

basic:integer

description

Do not output alignments with a score lower than this value. This option only affects output.

default

30

Output results

bam
label

Alignment file

type

basic:file

description

Position sorted alignment

bai
label

Index BAI

type

basic:file

stats
label

Statistics

type

basic:file

bigwig
label

BigWig file

type

basic:file

required

False

species
label

Species

type

basic:string

build
label

Build

type

basic:string

Alleyoop UTR Rates

data:alleyoop:utrrates:alleyoop-utr-rates (data:seq:nucleotide  ref_seq, data:bed  regions, data:alignment:bam:slamdunk  slamdunk, basic:integer  read_length)[Source: v1.2.1]

Run Alleyoop utrrates.

Input arguments

ref_seq
label

FASTA file containing sequences for aligning

type

data:seq:nucleotide

required

True

hidden

False

regions
label

BED file with coordinates of regions of interest

type

data:bed

required

True

hidden

False

slamdunk
label

Slamdunk results

type

data:alignment:bam:slamdunk

required

True

hidden

False

read_length
label

Maximum read length

type

basic:integer

description

Maximum length of reads in the input FASTQ file

required

True

hidden

False

default

150

Output results

report
label

Tab-separated file containing conversion rates on each region of interest

type

basic:file

required

True

hidden

False

plot
label

Region of interest conversion rate plot

type

basic:file

required

True

hidden

False

species
label

Species

type

basic:string

required

True

hidden

False

build
label

Build

type

basic:string

required

True

hidden

False

Alleyoop collapse

data:alleyoop:collapse:alleyoop-collapse (data:alignment:bam:slamdunk  slamdunk, basic:string  source)[Source: v1.2.1]

Run Alleyoop collapse tool on Slamdunk results.

Input arguments

slamdunk
label

Slamdunk results

type

data:alignment:bam:slamdunk

required

True

hidden

False

source
label

Gene ID source

type

basic:string

required

True

hidden

False

default

ENSEMBL

choices

  • ENSEMBL: ENSEMBL

  • UCSC: UCSC

Output results

tcount
label

Count report containing SLAMSeq statistics

type

basic:file

required

True

hidden

False

species
label

Species

type

basic:string

required

True

hidden

False

build
label

Build

type

basic:string

required

True

hidden

False

Alleyoop rates

data:alleyoop:rates:alleyoop-rates (data:seq:nucleotide  ref_seq, data:alignment:bam:slamdunk  slamdunk)[Source: v1.1.1]

Run Alleyoop rates.

Input arguments

ref_seq
label

FASTA file containing sequences for aligning

type

data:seq:nucleotide

required

True

hidden

False

slamdunk
label

Slamdunk results

type

data:alignment:bam:slamdunk

required

True

hidden

False

Output results

report
label

Tab-separated file containing the overall conversion rates

type

basic:file

required

True

hidden

False

plot
label

Overall conversion rate plot file

type

basic:file

required

True

hidden

False

species
label

Species

type

basic:string

required

True

hidden

False

build
label

Build

type

basic:string

required

True

hidden

False

Alleyoop snpeval

data:alleyoop:snpeval:alleyoop-snpeval (data:seq:nucleotide  ref_seq, data:bed  regions, data:alignment:bam:slamdunk  slamdunk, basic:integer  read_length)[Source: v1.2.1]

Run Alleyoop snpeval.

Input arguments

ref_seq
label

FASTA file containing sequences for aligning

type

data:seq:nucleotide

required

True

hidden

False

regions
label

BED file with coordinates of regions of interest

type

data:bed

required

True

hidden

False

slamdunk
label

Slamdunk results

type

data:alignment:bam:slamdunk

required

True

hidden

False

read_length
label

Maximum read length

type

basic:integer

description

Maximum length of reads in the input FASTQ file

required

True

hidden

False

default

150

Output results

report
label

Tab-separated file with read counts, T>C read counts and SNP indication

type

basic:file

required

True

hidden

False

plot
label

SNP evaluation plot

type

basic:file

required

True

hidden

False

species
label

Species

type

basic:string

required

True

hidden

False

build
label

Build

type

basic:string

required

True

hidden

False

Alleyoop summary

data:alleyoop:summary:alleyoop-summary (list:data:alignment:bam:slamdunk  slamdunk)[Source: v1.1.1]

Run Alleyoop summary.

Input arguments

slamdunk
label

Slamdunk results

type

list:data:alignment:bam:slamdunk

required

True

hidden

False

Output results

report
label

Tab-separated file with mapping statistics

type

basic:file

required

True

hidden

False

plot_data
label

PCA values of the samples based on T>C read counts in regions of interest.

type

basic:file

required

False

hidden

False

plot
label

PCA plot of the samples based on T>C read counts in regions of interest.

type

basic:file

required

False

hidden

False

species
label

Species

type

basic:string

required

True

hidden

False

build
label

Build

type

basic:string

required

True

hidden

False

Amplicon report

data:report:amplicon amplicon-report (data:picard:coverage pcr_metrics, data:coverage coverage, data:masterfile:amplicon master_file, list:data:snpeff annot_vars, basic:decimal af_threshold)[Source: v1.1.1]

Create amplicon report.

Input arguments

pcr_metrics
label

Picard TargetedPcrMetrics

type

data:picard:coverage

coverage
label

Coverage

type

data:coverage

master_file
label

Amplicon master file

type

data:masterfile:amplicon

annot_vars
label

Annotated variants (snpEff)

type

list:data:snpeff

af_threshold
label

Allele frequency threshold

type

basic:decimal

default

0.01

Output results

report
label

Report

type

basic:file

panel_name
label

Panel name

type

basic:string

stats
label

File with sample statistics

type

basic:file

amplicon_cov
label

Amplicon coverage file (nomergebed)

type

basic:file

variant_tables
label

Variant tables (snpEff)

type

list:basic:file

Amplicon table

data:varianttable:amplicon amplicon-table (data:masterfile:amplicon master_file, data:coverage coverage, list:data:snpeff annot_vars, basic:boolean all_amplicons, basic:string table_name)[Source: v1.2.1]

Create variant table for use together with the genome browser.

Input arguments

master_file
label

Master file

type

data:masterfile:amplicon

coverage
label

Amplicon coverage

type

data:coverage

annot_vars
label

Annotated variants

type

list:data:snpeff

all_amplicons
label

Report all amplicons

type

basic:boolean

default

False

table_name
label

Amplicon table name

type

basic:string

default

Amplicons containing variants

Output results

variant_table
label

Variant table

type

basic:json

Annotate novel splice junctions (regtools)

data:junctions:regtools regtools-junctions-annotate (data:seq:nucleotide genome, data:annotation:gtf annotation, data:alignment:bam:star alignment_star, data:alignment:bam alignment, data:bed input_bed_junctions)[Source: v1.1.1]

Identify novel splice junctions by using regtools to annotate against a reference. The process accepts a reference genome, a reference genome annotation (GTF), and an input with read information (a STAR alignment, reads aligned by any other aligner, or junctions in BED12 format). If STAR aligner data is given as input, the process calculates a BED12 file from the STAR ‘SJ.out.tab’ file and annotates all junctions with the ‘regtools junctions annotate’ command. When reads are aligned by another aligner, junctions are extracted with the ‘regtools junctions extract’ tool and then annotated with the ‘regtools junctions annotate’ command. The third option allows the user to provide a BED12 file with junctions directly, which are then annotated. Finally, annotated novel junctions are filtered into a separate output file. More information can be found in the [regtools manual](https://regtools.readthedocs.io/en/latest/).
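The STAR-to-BED12 step mentioned above can be pictured as follows. STAR's SJ.out.tab lists, per junction, the chromosome, the first and last intron bases (1-based), the strand (0 undefined, 1 +, 2 -), the intron motif, the annotation flag, unique and multi-mapping read counts, and the maximum spliced-alignment overhang. This sketch turns one such line into a BED12 junction whose two blocks flank the intron; using the maximum overhang as the block size and the unique-read count as the score are assumptions, not necessarily what this process does.

```python
from typing import List

STRANDS = {"0": ".", "1": "+", "2": "-"}  # SJ.out.tab strand codes


def sj_line_to_bed12(line: str, name: str) -> List[str]:
    """Convert one STAR SJ.out.tab line to BED12 fields (two anchors flank the intron)."""
    chrom, first, last, strand, _motif, _annot, unique, _multi, overhang = line.rstrip("\n").split("\t")
    intron_start = int(first) - 1  # 0-based start of the intron
    intron_end = int(last)         # 0-based exclusive end of the intron
    ov = int(overhang)
    chrom_start = intron_start - ov  # may need clipping at 0 near chromosome starts
    chrom_end = intron_end + ov
    return [
        chrom, str(chrom_start), str(chrom_end), name, unique, STRANDS[strand],
        str(chrom_start), str(chrom_end), "0", "2",
        f"{ov},{ov}", f"0,{intron_end - chrom_start}",
    ]


print("\t".join(sj_line_to_bed12("chr1\t1001\t2000\t1\t1\t0\t12\t0\t30", "JUNC1")))
```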

Input arguments

genome
label

Reference genome

type

data:seq:nucleotide

annotation
label

Reference genome annotation (GTF)

type

data:annotation:gtf

alignment_star
label

STAR alignment

type

data:alignment:bam:star

description

Splice junctions detected by the STAR aligner (SJ.out.tab STAR output file). Provide only one input: ‘STAR alignment’, ‘Alignment’ (from any aligner), or ‘Junctions in BED12 format’.

required

False

alignment
label

Alignment

type

data:alignment:bam

description

Aligned reads from which splice junctions are extracted. Provide only one input: ‘STAR alignment’, ‘Alignment’ (from any aligner), or ‘Junctions in BED12 format’.

required

False

input_bed_junctions
label

Junctions in BED12 format

type

data:bed

description

Splice junctions in BED12 format. Provide only one of the inputs: ‘STAR alignment’, ‘Alignment’ from any aligner, or ‘Junctions in BED12 format’.

required

False
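All three read inputs are optional, but the descriptions ask for exactly one of them. The sketch below shows how that rule could be checked before choosing a branch. The documentation does not say how the process reacts when several inputs (or none) are given, so rejecting those cases here is an assumption, and `select_junction_source` is a hypothetical helper.

```python
from typing import Optional


def select_junction_source(alignment_star: Optional[str] = None,
                           alignment: Optional[str] = None,
                           input_bed_junctions: Optional[str] = None) -> str:
    """Return the name of the single provided input; reject zero or several."""
    provided = {
        "alignment_star": alignment_star,          # STAR branch: SJ.out.tab -> BED12
        "alignment": alignment,                    # any aligner: regtools junctions extract
        "input_bed_junctions": input_bed_junctions,  # BED12 given directly
    }
    given = [name for name, value in provided.items() if value is not None]
    if len(given) != 1:
        raise ValueError(
            "Provide exactly one of 'STAR alignment', 'Alignment' or "
            f"'Junctions in BED12 format' (got {len(given)})."
        )
    return given[0]
```

Returning the input name keeps branch selection in one place, so the annotation step that follows can be shared by all three branches.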

Output results

novel_splice_junctions
label

Table of annotated novel splice junctions

type

basic:file

splice_junctions
label

Table of annotated splice junctions

type

basic:file

novel_sj_bed
label

Novel splice junctions in BED format

type

basic:file

bed
label

Splice junctions in BED format

type

basic:file

novel_sj_bigbed_igv_ucsc
label

Novel splice junctions in BigBed format

type

basic:file

required

False

bigbed_igv_ucsc
label

Splice junctions in BigBed format

type

basic:file

required

False

novel_sj_tbi_jbrowse
label

Novel splice junctions bed tbi index for JBrowse

type

basic:file

tbi_jbrowse
label

Bed tbi index for JBrowse

type

basic:file

species
label

Species

type

basic:string

build
label

Build

type

basic:string

Archive and make multi-sample report for amplicon data

data:archive:samples:ampliconamplicon-archive-multi-report (list:data  data, list:basic:string  fields, basic:boolean  j)[Source: v0.3.1]

Create an archive of output files. The output folder structure is organized by sample slug and by data object’s output-field names. Additionally, create a multi-sample report for the selected samples.

Input arguments

data
label

Data list

type

list:data

fields
label

Output file fields

type

list:basic:string

j
label

Junk paths

type

basic:boolean

description

Store only the names of saved files (junk the paths).

default

False
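The folder layout described above and the effect of ‘Junk paths’ can be sketched with Python's `zipfile` module. The `<sample slug>/<output field>/<file name>` layout is inferred from the description and the helper `add_to_archive` is hypothetical; the real process may name folders differently.

```python
import os
import zipfile


def add_to_archive(zf: zipfile.ZipFile, path: str, sample_slug: str,
                   field: str, junk_paths: bool = False) -> str:
    """Write `path` into the archive and return the name used inside it."""
    filename = os.path.basename(path)
    if junk_paths:
        # j=True: store only the file name, without sample/field folders.
        arcname = filename
    else:
        # Assumed layout: <sample slug>/<output field>/<file name>.
        arcname = f"{sample_slug}/{field}/{filename}"
    zf.write(path, arcname=arcname)
    return arcname
```

With junk paths enabled, files with the same name from different samples end up under the same archive name, so the option is most useful when file names are already unique.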

Output results

archive
label

Archive of selected samples and a heatmap comparing them

type

basic:file

Archive samples

data:archive:samplesarchive-samples (list:data  data, list:basic:string  fields, basic:boolean  j)[Source: v0.4.1]

Create an archive of output files. The output folder structure is organized by sample slug and by data object’s output-field names.

Input arguments

data
label

Data list

type

list:data

fields
label

Output file fields

type

list:basic:string

j
label

Junk paths

type

basic:boolean

description

Store only the names of saved files (junk the paths).

default

False

Output results

archive
label

Archive

type

basic:file

BAM file

data:alignment:bam:uploadupload-bam (basic:file  src, basic:string  species, basic:string  build)[Source: v1.6.1]

Import a BAM file (.bam), which is the binary format for storing sequence alignment data. This format is described on the [SAM Tools web site](http://samtools.github.io/hts-specs/).

Input arguments

src
label

Mapping (BAM)

type

basic:file

description

A mapping file in BAM format. The file will be indexed on upload, so additional BAI files are not required.

validate_regex

\.(bam)$

species
label

Species

type

basic:string

description

Latin name of the species.

choices

  • Homo sapiens: Homo sapiens

  • Mus musculus: Mus musculus

  • Rattus norvegicus: Rattus norvegicus

  • Dictyostelium discoideum: Dictyostelium discoideum

  • Odocoileus virginianus texanus: Odocoileus virginianus texanus

  • Solanum tuberosum: Solanum tuberosum

build
label

Build

type

basic:string

Output results

bam
label

Uploaded file

type

basic:file

bai
label

Index BAI

type

basic:file

stats
label

Alignment statistics

type

basic:file

bigwig
label

BigWig file

type

basic:file

required

False

species
label

Species

type

basic:string

build
label

Build

type

basic:string

BAM file and index

data:alignment:bam:uploadupload-bam-indexed (basic:file  src, basic:file  src2, basic:string  species, basic:string  build)[Source: v1.6.1]

Import a BAM file (.bam) and BAM index (.bam.bai). BAM file is the binary format for storing sequence alignment data. This format is described on the [SAM Tools web site](http://samtools.github.io/hts-specs/).

Input arguments

src
label

Mapping (BAM)

type

basic:file

description

A mapping file in BAM format.

validate_regex

\.(bam)$

src2
label

BAM index (*.bam.bai file)

type

basic:file

description

An index file of a BAM mapping file (ending with .bam.bai).

validate_regex

\.(bam.bai)$
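The `validate_regex` values are regular expressions checked against the uploaded file name. The sketch below tests the two documented patterns with Python's `re.search`; whether the platform uses search or full-match semantics is an assumption here. Note that the dot inside `bam.bai` is unescaped, so it matches any character; `STRICT_BAI_REGEX` is a hypothetical tighter variant shown only for comparison.

```python
import re

BAM_REGEX = r"\.(bam)$"            # src, as documented above
BAI_REGEX = r"\.(bam.bai)$"        # src2, as documented above
STRICT_BAI_REGEX = r"\.bam\.bai$"  # hypothetical stricter variant


def is_valid(filename: str, pattern: str) -> bool:
    """Return True if the file name matches the validation pattern."""
    return re.search(pattern, filename) is not None


for name in ["sample.bam", "sample.bam.bai", "sample.bamXbai", "sample.bai"]:
    print(name, is_valid(name, BAM_REGEX), is_valid(name, BAI_REGEX),
          is_valid(name, STRICT_BAI_REGEX))
```

A name such as `sample.bamXbai` passes the documented index pattern but not the stricter one; in practice this rarely matters, because real index files end in `.bam.bai`.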

species
label

Species

type

basic:string

description

Latin name of the species.

choices

  • Homo sapiens: Homo sapiens

  • Mus musculus: Mus musculus

  • Rattus norvegicus: Rattus norvegicus

  • Dictyostelium discoideum: Dictyostelium discoideum

  • Odocoileus virginianus texanus: Odocoileus virginianus texanus

  • Solanum tuberosum: Solanum tuberosum

build
label

Build

type

basic:string

Output results

bam
label

Uploaded file

type

basic:file

bai
label

Index BAI

type

basic:file

stats
label

Alignment statistics

type

basic:file

bigwig
label

BigWig file

type

basic:file

required

False

species
label

Species

type

basic:string

build
label

Build

type

basic:string

BBDuk (paired-end)

data:reads:fastq:paired:bbdukbbduk-paired (data:reads:fastq:paired  reads, basic:integer  min_length, basic:boolean  show_advanced, list:data:seq:nucleotide  sequences, list:basic:string  literal_sequences, basic:integer  kmer_length, basic:boolean  check_reverse_complements, basic:boolean  mask_middle_base, basic:integer  min_kmer_hits, basic:decimal  min_kmer_fraction, basic:decimal  min_coverage_fraction, basic:integer  hamming_distance, basic:integer  query_hamming_distance, basic:integer  edit_distance, basic:integer  hamming_distance2, basic:integer  query_hamming_distance2, basic:integer  edit_distance2, basic:boolean  forbid_N, basic:boolean  remove_if_either_bad, basic:boolean  find_best_match, basic:boolean  perform_error_correction, basic:string  k_trim, basic:string  k_mask, basic:boolean  mask_fully_covered, basic:integer  min_k, basic:string  quality_trim, basic:integer  trim_quality, basic:integer  trim_poly_A, basic:decimal  min_length_fraction, basic:integer  max_length, basic:integer  min_average_quality, basic:integer  min_average_quality_bases, basic:integer  min_base_quality, basic:integer  min_consecutive_bases, basic:integer  trim_pad, basic:boolean  trim_by_overlap, basic:boolean  strict_overlap, basic:integer  min_overlap, basic:integer  min_insert, basic:boolean  trim_pairs_evenly, basic:integer  force_trim_left, basic:integer  force_trim_right, basic:integer  force_trim_right2, basic:integer  force_trim_mod, basic:integer  restrict_left, basic:integer  restrict_right, basic:decimal  min_GC, basic:decimal  max_GC, basic:integer  maxns, basic:boolean  toss_junk, basic:boolean  chastity_filter, basic:boolean  barcode_filter, list:data:seq:nucleotide  barcode_files, list:basic:string  barcode_sequences, basic:integer  x_min, basic:integer  y_min, basic:integer  x_max, basic:integer  y_max, basic:decimal  entropy, basic:integer  entropy_window, basic:integer  entropy_k, basic:boolean  entropy_mask, basic:integer  min_base_frequency, basic:boolean  nogroup)[Source: v2.4.1]

BBDuk combines the most common data-quality-related trimming, filtering, and masking operations into a single high-performance tool. It is capable of quality-trimming and filtering, adapter-trimming, contaminant-filtering via kmer matching, sequence masking, GC-filtering, length filtering, entropy-filtering, format conversion, histogram generation, subsampling, quality-score recalibration, kmer cardinality estimation, and various other operations in a single pass. See the [BBDuk guide](https://jgi.doe.gov/data-and-tools/bbtools/bb-tools-user-guide/bbduk-guide/) for more information.
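Each parameter label below ends with the corresponding `bbduk.sh` argument in brackets, for example `[k=27]` or `[ktrim=f]`. The sketch below shows how a subset of the process inputs could be turned into a `bbduk.sh` command line for paired-end reads. The helper `bbduk_args`, the file names, and the selection of parameters are illustrative assumptions; this is not how the process itself assembles the command.

```python
def bbduk_args(params: dict, in1: str, in2: str, out1: str, out2: str) -> list:
    """Build a bbduk.sh argument list for paired-end reads (sketch)."""
    # Process input name -> BBDuk argument, taken from the bracketed labels.
    mapping = {
        "min_length": "minlength",
        "kmer_length": "k",
        "hamming_distance": "hammingdistance",
        "k_trim": "ktrim",
        "min_k": "mink",
        "quality_trim": "qtrim",
        "trim_quality": "trimq",
        "trim_by_overlap": "tbo",
        "trim_pairs_evenly": "tpe",
    }
    args = ["bbduk.sh", f"in={in1}", f"in2={in2}", f"out={out1}", f"out2={out2}"]
    for key, bbduk_name in mapping.items():
        if key not in params:
            continue  # leave BBDuk's own default in place
        value = params[key]
        if isinstance(value, bool):
            value = "t" if value else "f"  # BBDuk writes booleans as t/f
        args.append(f"{bbduk_name}={value}")
    return args


cmd = bbduk_args({"kmer_length": 23, "k_trim": "r", "min_k": 11, "trim_by_overlap": True},
                 "r1.fq.gz", "r2.fq.gz", "clean1.fq.gz", "clean2.fq.gz")
print(" ".join(cmd))
```

Building a list rather than a single string lets it be passed straight to `subprocess.run` without shell quoting issues.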

Input arguments

reads
label

Reads

type

data:reads:fastq:paired

min_length
label

Minimum length [minlength=10]

type

basic:integer

description

Reads shorter than the minimum length will be discarded after trimming.

default

10

show_advanced
label

Show advanced parameters

type

basic:boolean

default

False

reference.sequences
label

Sequences [ref]

type

list:data:seq:nucleotide

description

Reference sequences include adapters, contaminants, and degenerate sequences. They can be provided in a multi-sequence FASTA file or as a set of literal sequences below.

required

False

reference.literal_sequences
label

Literal sequences [literal]

type

list:basic:string

description

Literal sequences can be specified by inputting them one by one and pressing Enter after each sequence.

required

False

default

[]

processing.kmer_length
label

Kmer length [k=27]

type

basic:integer

description

Kmer length used for finding contaminants. Contaminants shorter than kmer length will not be found. Kmer length must be at least 1.

default

27

processing.check_reverse_complements
label

Look for reverse complements of kmers in addition to forward kmers [rcomp=t]

type

basic:boolean

default

True

processing.mask_middle_base
label

Treat the middle base of a kmer as a wildcard to increase sensitivity in the presence of errors [maskmiddle=t]

type

basic:boolean

default

True
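The two options above can be illustrated with a tiny kmer index. In the sketch below, ‘rcomp’ adds kmers from the reverse-complement strand, and ‘maskmiddle’ replaces the middle base of every reference and read kmer with a wildcard, so a single mismatch at that position still counts as a hit. BBDuk's internal hashing works differently; the helpers here are hypothetical and assume an odd kmer length.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def mask_middle(kmer: str) -> str:
    """Replace the middle base of an odd-length kmer with a wildcard symbol."""
    mid = len(kmer) // 2
    return kmer[:mid] + "N" + kmer[mid + 1:]


def build_index(reference: str, k: int, rcomp: bool = True,
                maskmiddle: bool = True) -> set:
    """Collect reference kmers, optionally from both strands and middle-masked."""
    strands = [reference, reverse_complement(reference)] if rcomp else [reference]
    index = set()
    for strand in strands:
        for i in range(len(strand) - k + 1):
            kmer = strand[i:i + k]
            index.add(mask_middle(kmer) if maskmiddle else kmer)
    return index


def count_hits(read: str, index: set, k: int, maskmiddle: bool = True) -> int:
    """Count read kmers that are present in the reference index."""
    hits = 0
    for i in range(len(read) - k + 1):
        kmer = read[i:i + k]
        if (mask_middle(kmer) if maskmiddle else kmer) in index:
            hits += 1
    return hits
```

For the reference `AAAAACCCCC` and k=5, the read kmer `AAGAA` matches only when the middle base is masked, and `GGGGG` matches only when reverse complements are indexed.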

processing.min_kmer_hits
label

Minimum number of kmer hits [minkmerhits=1]

type

basic:integer

description

Reads need at least this many matching kmers to be considered as matching the reference.

default

1

processing.min_kmer_fraction
label

Minimum kmer fraction [minkmerfraction=0.0]

type

basic:decimal

description

A read needs at least this fraction of its total kmers to hit a reference in order to be considered a match. If this and ‘Minimum number of kmer hits’ are set, the greater is used.

default

0.0
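When both ‘Minimum number of kmer hits’ and ‘Minimum kmer fraction’ are set, the greater requirement applies. A read of length L contains L - k + 1 kmers, so the fraction translates into a hit count as sketched below. Rounding the fractional requirement up is an assumption of this sketch, and `required_hits` is a hypothetical helper.

```python
import math


def required_hits(read_length: int, k: int, min_kmer_hits: int = 1,
                  min_kmer_fraction: float = 0.0) -> int:
    """Matching kmers a read needs when both thresholds apply (the greater wins)."""
    total_kmers = max(read_length - k + 1, 0)  # kmers contained in the read
    from_fraction = math.ceil(min_kmer_fraction * total_kmers)  # assumed rounding
    return max(min_kmer_hits, from_fraction)
```

For a 100-base read with k=27 there are 74 kmers, so a fraction of 0.5 requires 37 hits, which overrides the default of 1 hit.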

processing.min_coverage_fraction
label

Minimum coverage fraction [mincovfraction=0.0]

type

basic:decimal

description

A read needs at least this fraction of its total bases to be covered by reference kmers to be considered a match. If specified, ‘Minimum coverage fraction’ overrides ‘Minimum number of kmer hits’ and ‘Minimum kmer fraction’.

default

0.0

processing.hamming_distance
label

Maximum Hamming distance for kmers (substitutions only) [hammingdistance=0]

type

basic:integer

default

0

processing.query_hamming_distance
label

Hamming distance for query kmers [qhdist=0]

type

basic:integer

default

0

processing.edit_distance
label

Maximum edit distance from reference kmers (substitutions and indels) [editdistance=0]

type

basic:integer

default

0
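The difference between the Hamming distance (substitutions only) and the edit distance (substitutions and indels) matters when a read carries an insertion or deletion. The sketch below computes both metrics for a kmer shifted by one base: every position differs under Hamming distance, while two edits suffice. BBDuk does not compare kmers pairwise like this; the functions only illustrate the two distance measures.

```python
def hamming(a: str, b: str) -> int:
    """Number of substitutions between two equal-length kmers."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs equal-length strings")
    return sum(x != y for x, y in zip(a, b))


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: substitutions, insertions and deletions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution or match
        prev = cur
    return prev[-1]


ref, read = "ACGTACG", "CGTACGT"  # read = ref with one base deleted and one added
print(hamming(ref, read), edit_distance(ref, read))
```

This is why a small ‘Maximum edit distance from reference kmers’ can recover kmers that no Hamming threshold below the kmer length would match.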

processing.hamming_distance2
label

Hamming distance for short kmers when looking for shorter kmers [hammingdistance2=0]

type

basic:integer

default

0

processing.query_hamming_distance2
label

Hamming distance for short query kmers when looking for shorter kmers [qhdist2=0]

type

basic:integer

default

0

processing.edit_distance2
label

Maximum edit distance from short reference kmers (substitutions and indels) when looking for shorter kmers [editdistance2=0]

type

basic:integer

default

0

processing.forbid_N
label

Forbid matching of read kmers containing N [forbidn=f]

type

basic:boolean

description

By default, these will match a reference ‘A’ if ‘Maximum Hamming distance for kmers’ > 0 or ‘Maximum edit distance from reference kmers’ > 0, to increase sensitivity.

default

False

processing.remove_if_either_bad
label

Remove both sequences of a paired-end read, if either of them is to be removed [removeifeitherbad=t]

type

basic:boolean

default

True

processing.find_best_match
label

If multiple matches, associate read with sequence sharing most kmers [findbestmatch=t]

type

basic:boolean

default

True

processing.perform_error_correction
label

Perform error correction with BBMerge prior to kmer operations [ecco=f]

type

basic:boolean

default

False

operations.k_trim
label

Trimming protocol to remove bases matching reference kmers from reads [ktrim=f]

type

basic:string

default

f

choices

  • Don’t trim: f

  • Trim to the right: r

  • Trim to the left: l
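The ‘ktrim’ choices can be sketched with exact kmer matching. With ‘r’, the first matching reference kmer and everything to its right are removed; with ‘l’, the last matching kmer and everything to its left are removed. This sketch ignores ‘mink’, Hamming and edit distances, and reverse complements, and the helper `ktrim` is hypothetical.

```python
def ktrim(read: str, ref_kmers: set, k: int, mode: str = "r") -> str:
    """Trim a read at matching reference kmers (exact matches only)."""
    hits = [i for i in range(len(read) - k + 1) if read[i:i + k] in ref_kmers]
    if mode == "f" or not hits:
        return read
    if mode == "r":
        return read[:hits[0]]       # drop the first hit and all bases to its right
    if mode == "l":
        return read[hits[-1] + k:]  # drop the last hit and all bases to its left
    raise ValueError(f"unknown ktrim mode: {mode}")


adapter = "AGATCGGAAG"
kmers = {adapter[i:i + 5] for i in range(len(adapter) - 5 + 1)}
print(ktrim("TTTTTTTTTT" + adapter, kmers, 5, "r"))
print(ktrim(adapter + "CCCCC", kmers, 5, "l"))
```

Right-trimming is the usual choice for 3’ adapters, which read into the adapter sequence at the end of the read.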

operations.k_mask
label

Symbol to replace bases matching reference kmers [kmask=f]

type

basic:string

description

Allows any non-whitespace character other than t or f. Processes short kmers on both ends.

default

f

operations.mask_fully_covered
label

Only mask bases that are fully covered by kmers [maskfullycovered=f]

type

basic:boolean

default

False

operations.min_k
label

Look for shorter kmers at read tips down to this length when k-trimming or masking [mink=0]

type

basic:integer

description

-1 means disabled. Enabling this will disable treating the middle base of a kmer as a wildcard to increase sensitivity in the presence of errors.

default

-1

operations.quality_trim
label

Trimming protocol to remove bases with quality below the minimum average region quality from read ends [qtrim=f]

type

basic:string

description

Performed after looking for kmers. If enabled, also set ‘Average quality below which to trim region’.

default

f

choices

  • Trim neither end: f

  • Trim both ends: rl

  • Trim only right end: r

  • Trim only left end: l

  • Use sliding window: w

operations.trim_quality
label

Average quality below which to trim region [trimq=6]

type

basic:integer

description

Set trimming protocol to enable this parameter.

disabled

operations.quality_trim == ‘f’

default

6
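End trimming against ‘trimq’ can be sketched with a running-sum (Mott-style) algorithm: walking in from a read end, add `trimq - quality` for each base and cut where that sum peaks. The BBDuk guide describes its quality trimming as using the Phred algorithm, so results may differ from this sketch; the sliding-window mode ‘w’ is not covered, and all helper names are hypothetical.

```python
def phred33(quality_line: str) -> list:
    """Decode a FASTQ quality string (Phred+33) into integer scores."""
    return [ord(c) - 33 for c in quality_line]


def keep_length_right(quals: list, trimq: int) -> int:
    """Running-sum trimming from the right end; returns how many bases to keep."""
    best, running, cut = 0, 0, len(quals)
    for i in range(len(quals) - 1, -1, -1):
        running += trimq - quals[i]  # grows while bases are below trimq
        if running < 0:
            break
        if running > best:
            best, cut = running, i
    return cut


def quality_trim(seq: str, quals: list, trimq: int, mode: str = "rl") -> str:
    """Apply qtrim-style end trimming for modes f, r, l and rl."""
    if mode == "f":
        return seq
    if mode not in ("r", "l", "rl"):
        raise NotImplementedError(f"mode {mode!r} is not covered by this sketch")
    start, end = 0, len(seq)
    if "r" in mode:
        end = keep_length_right(quals, trimq)
    if "l" in mode:
        start = len(quals) - keep_length_right(quals[::-1], trimq)
    return seq[start:end] if start < end else ""


quals = phred33("#???#")  # scores 2, 30, 30, 30, 2
print(quality_trim("ACGTA", quals, 6, "rl"))
```

With ‘trimq’ at 6, the low-quality first and last bases are removed and `CGT` remains.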

operations.trim_poly_A
label

Minimum length of poly-A or poly-T tails to trim on either end of reads [trimpolya=0]

type

basic:integer

default

0

operations.min_length_fraction
label

Minimum length fraction [mlf=0.0]

type

basic:decimal

description

Reads shorter than this fraction of original length after trimming will be discarded.

default

0.0

operations.max_length
label

Maximum length [maxlength]

type

basic:integer

description

Reads longer than this after trimming will be discarded.

required

False

operations.min_average_quality
label

Minimum average quality [minavgquality=0]

type

basic:integer

description

Reads with average quality (after trimming) below this will be discarded.

default

0

operations.min_average_quality_bases
label

Number of initial bases to calculate minimum average quality from [maqb=0]

type

basic:integer

description

Used only if positive.

default

0

operations.min_base_quality
label

Minimum base quality below which reads are discarded after trimming [minbasequality=0]

type

basic:integer

default

0

operations.min_consecutive_bases
label

Minimum number of consecutive called bases [mcb=0]

type

basic:integer

default

0

operations.trim_pad
label

Number of bases to trim around matching kmers [tp=0]

type

basic:integer

default

0

operations.trim_by_overlap
label

Trim adapters based on where paired-end reads overlap [tbo=f]

type

basic:boolean

default

False

operations.strict_overlap
label

Adjust sensitivity in ‘Trim adapters based on where paired-end reads overlap’ mode [strictoverlap=t]

type

basic:boolean

default

True

operations.min_overlap
label

Minimum number of overlapping bases [minoverlap=14]

type

basic:integer

description

Require this many bases of overlap for detection.

default

14

operations.min_insert
label

Minimum insert size [mininsert=40]

type

basic:integer

description

Require insert size of at least this for overlap. Should be reduced to 16 for small RNA sequencing.

default

40

operations.trim_pairs_evenly
label

Trim both sequences of paired-end reads to the minimum length of either sequence [tpe=f]

type

basic:boolean

default

False

operations.force_trim_left
label

Position from which to trim bases to the left [forcetrimleft=0]

type

basic:integer

default

0

operations.force_trim_right
label

Position from which to trim bases to the right [forcetrimright=0]

type

basic:integer

default

0

operations.force_trim_right2
label

Number of bases to trim from the right end [forcetrimright2=0]

type

basic:integer

default

0

operations.force_trim_mod
label

Modulo to right-trim reads [forcetrimmod=0]

type

basic:integer

description

Trim reads to the largest multiple of modulo.

default

0
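The four forced-trimming options can be combined as in the sketch below. Positions are treated as 0-based, following our reading of the BBDuk documentation, and the order in which they are applied here (position trims first, then ‘forcetrimright2’, then ‘forcetrimmod’) is an assumption. `force_trim` is a hypothetical helper.

```python
def force_trim(seq: str, left: int = 0, right: int = 0,
               right2: int = 0, mod: int = 0) -> str:
    """Sketch of forcetrimleft/forcetrimright/forcetrimright2/forcetrimmod."""
    start = left if left > 0 else 0  # keep bases from 0-based position `left` on
    end = min(len(seq), right + 1) if right > 0 else len(seq)  # keep up to position `right`
    seq = seq[start:end]
    if right2 > 0:
        seq = seq[:max(len(seq) - right2, 0)]  # drop `right2` bases from the right end
    if mod > 0:
        seq = seq[:len(seq) - len(seq) % mod]  # length becomes a multiple of `mod`
    return seq


print(force_trim("ACGTACGTAC", mod=4))
```

A 10-base read trimmed with ‘forcetrimmod’ of 4 keeps its first 8 bases.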

operations.restrict_left
label

Number of leftmost bases to look in for kmer matches [restrictleft=0]

type

basic:integer

default

0

operations.restrict_right
label

Number of rightmost bases to look in for kmer matches [restrictright=0]

type

basic:integer

default

0

operations.min_GC
label

Minimum GC content [mingc=0.0]

type

basic:decimal

description

Discard reads with lower GC content.

default

0.0

operations.max_GC
label

Maximum GC content [maxgc=1.0]

type

basic:decimal

description

Discard reads with higher GC content.

default

1.0

operations.maxns
label

Max Ns after trimming [maxns=-1]

type

basic:integer

description

If non-negative, reads with more Ns than this (after trimming) will be discarded.

default

-1
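The GC and N filters above act on the read after trimming. The sketch below applies ‘mingc’, ‘maxgc’, and ‘maxns’ together. Computing the GC fraction over called bases only (A, C, G, T, ignoring N) is an assumption, and `passes_composition_filters` is a hypothetical helper.

```python
def passes_composition_filters(seq: str, min_gc: float = 0.0,
                               max_gc: float = 1.0, maxns: int = -1) -> bool:
    """Sketch of mingc/maxgc/maxns filtering on an already-trimmed read."""
    seq = seq.upper()
    # maxns is only active when non-negative.
    if maxns >= 0 and seq.count("N") > maxns:
        return False
    # Assumption: GC fraction over called bases (A, C, G, T) only.
    called = sum(seq.count(base) for base in "ACGT")
    gc = (seq.count("G") + seq.count("C")) / called if called else 0.0
    return min_gc <= gc <= max_gc
```

With the defaults (0.0, 1.0, -1) every read passes, which matches the process defaults leaving these filters disabled.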

operations.toss_junk
label

Discard reads with invalid characters as bases [tossjunk=f]

type

basic:boolean

default

False

header_parsing.chastity_filter
label

Discard reads that fail Illumina chastity filtering [chastityfilter=f]

type

basic:boolean

description

Discard reads with id containing ‘ 1:Y:’ or ‘ 2:Y:’.

default

False

header_parsing.barcode_filter
label

Remove reads with unexpected barcodes if barcodes are set, or barcodes containing ‘N’ otherwise [barcodefilter=f]

type

basic:boolean

description

A barcode must be the last part of the read header.

default

False

header_parsing.barcode_files
label

Barcode sequences [barcodes]

type

list:data:seq:nucleotide

required

False

header_parsing.barcode_sequences
label

Literal barcode sequences [barcodes]

type

list:basic:string

description

Literal barcode sequences can be specified by inputting them one by one and pressing Enter after each sequence.

required

False

default

[]
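The chastity and barcode filters read information from the FASTQ header. The sketch below assumes the Illumina Casava 1.8+ header layout (`@instrument:run:flowcell:lane:tile:x:y read:filtered:control:index`), where `Y` in the second field after the space marks a read that failed filtering and the index (barcode) is the last colon-separated field. The helper names are hypothetical.

```python
def fails_chastity(header: str) -> bool:
    """True if the header marks the read as filtered (' 1:Y:' or ' 2:Y:')."""
    return " 1:Y:" in header or " 2:Y:" in header


def barcode_of(header: str) -> str:
    """Take the barcode as the text after the last ':' in the header."""
    return header.rsplit(":", 1)[-1].strip()


def passes_barcode_filter(header: str, expected: set) -> bool:
    """With expected barcodes, require a match; otherwise reject barcodes with 'N'."""
    barcode = barcode_of(header)
    if expected:
        return barcode in expected
    return "N" not in barcode


h1 = "@M00123:12:000000000-A1B2C:1:1101:15589:1331 1:N:0:ACGTACGT"
h2 = "@M00123:12:000000000-A1B2C:1:1101:15589:1332 1:Y:0:ACGTNCGT"
print(fails_chastity(h1), fails_chastity(h2), barcode_of(h1))
```

Headers that do not end with the barcode, such as older Illumina formats, would be misread by this sketch, which is why the description above requires the barcode to be the last part of the header.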

header_parsing.x_min
label

Minimum X coordinate [xmin=-1]

type

basic:integer

description

If positive, discard reads with a smaller X coordinate.

default

-1

header_parsing.y_min
label

Minimum Y coordinate [ymin=-1]

type

basic:integer

description

If positive, discard reads with a smaller Y coordinate.

default

-1

header_parsing.x_max
label

Maximum X coordinate [xmax=-1]

type

basic:integer

description

If positive, discard reads with a larger X coordinate.

default

-1

header_parsing.y_max
label

Maximum Y coordinate [ymax=-1]

type

basic:integer

description

If positive, discard reads with a larger Y coordinate.

default

-1
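The coordinate filters use the X and Y cluster positions stored in the read header. The sketch below assumes the Casava 1.8+ read ID with seven colon-separated fields (`instrument:run:flowcell:lane:tile:x:y`) and treats each bound as active only when positive, as the descriptions state. The helpers are hypothetical.

```python
def tile_xy(header: str) -> tuple:
    """Extract X and Y from '@instrument:run:flowcell:lane:tile:x:y ...'."""
    read_id = header.lstrip("@").split(" ", 1)[0]
    fields = read_id.split(":")
    return int(fields[5]), int(fields[6])


def passes_xy(header: str, xmin: int = -1, ymin: int = -1,
              xmax: int = -1, ymax: int = -1) -> bool:
    """Apply xmin/ymin/xmax/ymax; a bound is ignored unless it is positive."""
    x, y = tile_xy(header)
    if xmin > 0 and x < xmin:
        return False
    if ymin > 0 and y < ymin:
        return False
    if xmax > 0 and x > xmax:
        return False
    if ymax > 0 and y > ymax:
        return False
    return True


header = "@M00123:12:000000000-A1B2C:1:1101:15589:1331 1:N:0:ACGTACGT"
print(tile_xy(header))
```

These filters are typically used to drop reads from a damaged or bubbled region of a flow-cell tile.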

complexity.entropy
label

Minimum entropy [entropy=-1.0]

type

basic:decimal

description

Set between 0 and 1 to filter reads with entropy below that value. Higher is more stringent.

default

-1.0

complexity.entropy_window
label

Length of sliding window used to calculate entropy [entropywindow=50]

type

basic:integer

description

To use the sliding window, set ‘Minimum entropy’ to a value between 0.0 and 1.0.

default

50

complexity.entropy_k
label

Length of kmers used to calculate entropy [entropyk=5]

type

basic:integer

default

5
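The entropy filter scores sequence complexity from the kmer composition of a sliding window. The sketch below uses Shannon entropy of kmer counts, normalized by the entropy the window would have if all its kmers were distinct, so a homopolymer scores 0 and a maximally diverse window scores 1. This normalization and the rule that a read is judged by its least complex window are assumptions; BBDuk's exact formula may differ.

```python
import math
from collections import Counter


def window_entropy(seq: str, k: int = 5) -> float:
    """Normalized Shannon entropy of the kmer composition of `seq`."""
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    n = len(kmers)
    if n <= 1:
        return 0.0
    counts = Counter(kmers)
    h = sum((c / n) * math.log(n / c) for c in counts.values())
    h_max = math.log(min(n, 4 ** k))  # entropy if every kmer were distinct
    return h / h_max


def min_window_entropy(seq: str, window: int = 50, k: int = 5) -> float:
    """Lowest window entropy along the read (assumed filtering criterion)."""
    if len(seq) <= window:
        return window_entropy(seq, k)
    return min(window_entropy(seq[i:i + window], k)
               for i in range(len(seq) - window + 1))
```

Under this sketch, a read would be discarded (or masked, with ‘entropymask’) when `min_window_entropy` falls below the ‘Minimum entropy’ value.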

complexity.entropy_mask
label

Mask low-entropy parts of sequences with N instead of discarding [entropymask=f]

type

basic:boolean

default

False

complexity.min_base_frequency
label

Minimum base frequency [minbasefrequency=0]

type

basic:integer

default

0

fastqc.nogroup
label

Disable grouping of bases for reads >50bp [nogroup]

type

basic:boolean

description

All reports will show data for every base in the read. Using this option on very long reads can cause FastQC to run out of resources and crash.

default

False

Output results

fastq
label

Remaining upstream reads

type

list:basic:file

fastq2
label

Remaining downstream reads

type

list:basic:file

statistics
label

Statistics

type

list:basic:file

fastqc_url
label

Upstream quality control with FastQC

type

list:basic:file:html

fastqc_url2
label

Downstream quality control with FastQC

type

list:basic:file:html

fastqc_archive
label

Download upstream FastQC archive

type

list:basic:file

fastqc_archive2
label

Download downstream FastQC archive

type

list:basic:file

BBDuk (single-end)

data:reads:fastq:single:bbdukbbduk-single (data:reads:fastq:single  reads, basic:integer  min_length, basic:boolean  show_advanced, list:data:seq:nucleotide  sequences, list:basic:string  literal_sequences, basic:integer  kmer_length, basic:boolean  check_reverse_complements, basic:boolean  mask_middle_base, basic:integer  min_kmer_hits, basic:decimal  min_kmer_fraction, basic:decimal  min_coverage_fraction, basic:integer  hamming_distance, basic:integer  query_hamming_distance, basic:integer  edit_distance, basic:integer  hamming_distance2, basic:integer  query_hamming_distance2, basic:integer  edit_distance2, basic:boolean  forbid_N, basic:boolean  find_best_match, basic:string  k_trim, basic:string  k_mask, basic:boolean  mask_fully_covered, basic:integer  min_k, basic:string  quality_trim, basic:integer  trim_quality, basic:integer  trim_poly_A, basic:decimal  min_length_fraction, basic:integer  max_length, basic:integer  min_average_quality, basic:integer  min_average_quality_bases, basic:integer  min_base_quality, basic:integer  min_consecutive_bases, basic:integer  trim_pad, basic:integer  min_overlap, basic:integer  min_insert, basic:integer  force_trim_left, basic:integer  force_trim_right, basic:integer  force_trim_right2, basic:integer  force_trim_mod, basic:integer  restrict_left, basic:integer  restrict_right, basic:decimal  min_GC, basic:decimal  max_GC, basic:integer  maxns, basic:boolean  toss_junk, basic:boolean  chastity_filter, basic:boolean  barcode_filter, list:data:seq:nucleotide  barcode_files, list:basic:string  barcode_sequences, basic:integer  x_min, basic:integer  y_min, basic:integer  x_max, basic:integer  y_max, basic:decimal  entropy, basic:integer  entropy_window, basic:integer  entropy_k, basic:boolean  entropy_mask, basic:integer  min_base_frequency, basic:boolean  nogroup)[Source: v2.4.1]

BBDuk combines the most common data-quality-related trimming, filtering, and masking operations into a single high-performance tool. It is capable of quality-trimming and filtering, adapter-trimming, contaminant-filtering via kmer matching, sequence masking, GC-filtering, length filtering, entropy-filtering, format conversion, histogram generation, subsampling, quality-score recalibration, kmer cardinality estimation, and various other operations in a single pass. See the [BBDuk guide](https://jgi.doe.gov/data-and-tools/bbtools/bb-tools-user-guide/bbduk-guide/) for more information.

Input arguments

reads
label

Reads

type

data:reads:fastq:single

min_length
label

Minimum length [minlength=10]

type

basic:integer

description

Reads shorter than the minimum length will be discarded after trimming.

default

10

show_advanced
label

Show advanced parameters

type

basic:boolean

default

False

reference.sequences
label

Sequences [ref]

type

list:data:seq:nucleotide

description

Reference sequences include adapters, contaminants, and degenerate sequences. They can be provided in a multi-sequence FASTA file or as a set of literal sequences below.

required

False

reference.literal_sequences
label

Literal sequences [literal]

type

list:basic:string

description

Literal sequences can be specified by inputting them one by one and pressing Enter after each sequence.

required

False

default

[]

processing.kmer_length
label

Kmer length [k=27]

type

basic:integer

description

Kmer length used for finding contaminants. Contaminants shorter than kmer length will not be found. Kmer length must be at least 1.

default

27

processing.check_reverse_complements
label

Look for reverse complements of kmers in addition to forward kmers [rcomp=t]

type

basic:boolean

default

True

processing.mask_middle_base
label

Treat the middle base of a kmer as a wildcard to increase sensitivity in the presence of errors [maskmiddle=t]

type

basic:boolean

default

True

processing.min_kmer_hits
label

Minimum number of kmer hits [minkmerhits=1]

type

basic:integer

description

Reads need at least this many matching kmers to be considered matching the reference.

default

1

processing.min_kmer_fraction
label

Minimum kmer fraction [minkmerfraction=0.0]

type

basic:decimal

description

A read needs at least this fraction of its total kmers to hit a reference in order to be considered a match. If this and ‘Minimum number of kmer hits’ are set, the greater is used.

default

0.0

processing.min_coverage_fraction
label

Minimum coverage fraction [mincovfraction=0.0]

type

basic:decimal

description

A read needs at least this fraction of its total bases to be covered by reference kmers to be considered a match. If specified, ‘Minimum coverage fraction’ overrides ‘Minimum number of kmer hits’ and ‘Minimum kmer fraction’.

default

0.0

processing.hamming_distance
label

Maximum Hamming distance for kmers (substitutions only) [hammingdistance=0]

type

basic:integer

default

0

processing.query_hamming_distance
label

Hamming distance for query kmers [qhdist=0]

type

basic:integer

default

0

processing.edit_distance
label

Maximum edit distance from reference kmers (substitutions and indels) [editdistance=0]

type

basic:integer

default

0

processing.hamming_distance2
label

Hamming distance for short kmers when looking for shorter kmers [hammingdistance2=0]

type

basic:integer

default

0

processing.query_hamming_distance2
label

Hamming distance for short query kmers when looking for shorter kmers [qhdist2=0]

type

basic:integer

default

0

processing.edit_distance2
label

Maximum edit distance from short reference kmers (substitutions and indels) when looking for shorter kmers [editdistance2=0]

type

basic:integer

default

0

processing.forbid_N
label

Forbid matching of read kmers containing N [forbidn=f]

type

basic:boolean

description

By default, these will match a reference ‘A’ if ‘Maximum Hamming distance for kmers’ > 0 or ‘Maximum edit distance from reference kmers’ > 0, to increase sensitivity.

default

False

processing.find_best_match
label

If multiple matches, associate read with sequence sharing most kmers [findbestmatch=t]

type

basic:boolean

default

True

operations.k_trim
label

Trimming protocol to remove bases matching reference kmers from reads [ktrim=f]

type

basic:string

default

f

choices

  • Don’t trim: f

  • Trim to the right: r

  • Trim to the left: l

operations.k_mask
label

Symbol to replace bases matching reference kmers [kmask=f]

type

basic:string

description

Allows any non-whitespace character other than t or f. Processes short kmers on both ends.

default

f

operations.mask_fully_covered
label

Only mask bases that are fully covered by kmers [maskfullycovered=f]

type

basic:boolean

default

False

operations.min_k
label

Look for shorter kmers at read tips down to this length when k-trimming or masking [mink=0]

type

basic:integer

description

-1 means disabled. Enabling this will disable treating the middle base of a kmer as a wildcard to increase sensitivity in the presence of errors.

default

-1

operations.quality_trim
label

Trimming protocol to remove bases with quality below the minimum average region quality from read ends [qtrim=f]

type

basic:string

description

Performed after looking for kmers. If enabled, also set ‘Average quality below which to trim region’.

default

f

choices

  • Trim neither end: f

  • Trim both ends: rl

  • Trim only right end: r

  • Trim only left end: l

  • Use sliding window: w

operations.trim_quality
label

Average quality below which to trim region [trimq=6]

type

basic:integer

description

Set trimming protocol to enable this parameter.

disabled

operations.quality_trim == ‘f’

default

6

operations.trim_poly_A
label

Minimum length of poly-A or poly-T tails to trim on either end of reads [trimpolya=0]

type

basic:integer

default

0

operations.min_length_fraction
label

Minimum length fraction [mlf=0]

type

basic:decimal

description

Reads shorter than this fraction of original length after trimming will be discarded.

default

0.0

operations.max_length
label

Maximum length [maxlength]

type

basic:integer

description

Reads longer than this after trimming will be discarded.

required

False

operations.min_average_quality
label

Minimum average quality [minavgquality=0]

type

basic:integer

description

Reads with average quality (after trimming) below this will be discarded.

default

0

operations.min_average_quality_bases
label

Number of initial bases to calculate minimum average quality from [maqb=0]

type

basic:integer

description

Used only if positive.

default

0

operations.min_base_quality
label

Minimum base quality below which reads are discarded after trimming [minbasequality=0]

type

basic:integer

default

0

operations.min_consecutive_bases
label

Minimum number of consecutive called bases [mcb=0]

type

basic:integer

default

0

operations.trim_pad
label

Number of bases to trim around matching kmers [tp=0]

type

basic:integer

default

0

operations.min_overlap
label

Minimum number of overlapping bases [minoverlap=14]

type

basic:integer

description

Require this many bases of overlap for detection.

default

14

operations.min_insert
label

Minimum insert size [mininsert=40]

type

basic:integer

description

Require insert size of at least this for overlap. Should be reduced to 16 for small RNA sequencing.

default

40

operations.force_trim_left
label

Position from which to trim bases to the left [forcetrimleft=0]

type

basic:integer

default

0

operations.force_trim_right
label

Position from which to trim bases to the right [forcetrimright=0]

type

basic:integer

default

0

operations.force_trim_right2
label

Number of bases to trim from the right end [forcetrimright2=0]

type

basic:integer

default

0

operations.force_trim_mod
label

Modulo to right-trim reads [forcetrimmod=0]

type

basic:integer

description

Trim reads to the largest multiple of modulo.

default

0

operations.restrict_left
label

Number of leftmost bases to look in for kmer matches [restrictleft=0]

type

basic:integer

default

0

operations.restrict_right
label

Number of rightmost bases to look in for kmer matches [restrictright=0]

type

basic:integer

default

0

operations.min_GC
label

Minimum GC content [mingc=0.0]

type

basic:decimal

description

Discard reads with lower GC content.

default

0.0

operations.max_GC
label

Maximum GC content [maxgc=1.0]

type

basic:decimal

description

Discard reads with higher GC content.

default

1.0

operations.maxns
label

Max Ns after trimming [maxns=-1]

type

basic:integer

description

If non-negative, reads with more Ns than this (after trimming) will be discarded.

default

-1

operations.toss_junk
label

Discard reads with invalid characters as bases [tossjunk=f]

type

basic:boolean

default

False

header_parsing.chastity_filter
label

Discard reads that fail Illumina chastity filtering [chastityfilter=f]

type

basic:boolean

description

Discard reads with id containing ‘ 1:Y:’ or ‘ 2:Y:’.

default

False

header_parsing.barcode_filter
label

Remove reads with unexpected barcodes if barcodes are set, or barcodes containing ‘N’ otherwise [barcodefilter=f]

type

basic:boolean

description

A barcode must be the last part of the read header.

default

False

header_parsing.barcode_files
label

Barcode sequences [barcodes]

type

list:data:seq:nucleotide

required

False

header_parsing.barcode_sequences
label

Literal barcode sequences [barcodes]

type

list:basic:string

description

Literal barcode sequences can be specified by inputting them one by one and pressing Enter after each sequence.

required

False

default

[]

header_parsing.x_min
label

Minimum X coordinate [xmin=-1]

type

basic:integer

description

If positive, discard reads with a smaller X coordinate.

default

-1

header_parsing.y_min
label

Minimum Y coordinate [ymin=-1]

type

basic:integer

description

If positive, discard reads with a smaller Y coordinate.

default

-1

header_parsing.x_max
label

Maximum X coordinate [xmax=-1]

type

basic:integer

description

If positive, discard reads with a larger X coordinate.

default

-1

header_parsing.y_max
label

Maximum Y coordinate [ymax=-1]

type

basic:integer

description

If positive, discard reads with a larger Y coordinate.

default

-1
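In CASAVA 1.8+ headers, the X and Y cluster coordinates are the sixth and seventh colon-separated fields of the read name (`instrument:run:flowcell:lane:tile:x:y`). The following Python sketch implements the four coordinate filters. Following the descriptions above, it assumes a bound applies only when it is positive.

```python
from typing import Tuple


def parse_xy(header: str) -> Tuple[int, int]:
    """Return the (x, y) cluster coordinates from an Illumina CASAVA 1.8+ header."""
    name = header.lstrip("@").split()[0]
    fields = name.split(":")
    return int(fields[5]), int(fields[6])


def passes_xy(header: str, xmin: int = -1, ymin: int = -1,
              xmax: int = -1, ymax: int = -1) -> bool:
    """Apply each coordinate bound only when it is positive (the default -1 disables it)."""
    x, y = parse_xy(header)
    if xmin > 0 and x < xmin:
        return False
    if ymin > 0 and y < ymin:
        return False
    if xmax > 0 and x > xmax:
        return False
    if ymax > 0 and y > ymax:
        return False
    return True


header = "@M00123:45:000000000-ABCDE:1:1101:15589:1331 1:N:0:ATCACG"
print(parse_xy(header))               # (15589, 1331)
print(passes_xy(header, xmax=10000))  # False: x is larger than xmax
```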

complexity.entropy
label

Minimum entropy [entropy=-1]

type

basic:decimal

description

Set between 0 and 1 to filter reads with entropy below that value. Higher is more stringent.

default

-1.0

complexity.entropy_window
label

Length of sliding window used to calculate entropy [entropywindow=50]

type

basic:integer

description

To use the sliding window, set the minimum entropy to a value between 0.0 and 1.0.

default

50

complexity.entropy_k
label

Length of k-mers used to calculate entropy [entropyk=5]

type

basic:integer

default

5
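The three entropy parameters interact as follows: k-mers of length `entropyk` are counted inside each window of `entropywindow` bases, and the read is judged by its least complex window. The Python sketch below uses the normalized Shannon entropy of the k-mer counts. The normalization (dividing by the largest entropy achievable in that window) is an assumption made for illustration and is not necessarily BBDuk's exact formula.

```python
import math
from collections import Counter


def kmer_entropy(seq: str, k: int = 5) -> float:
    """Shannon entropy of the k-mer distribution, scaled to [0, 1] (assumed normalization)."""
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    n = len(kmers)
    if n <= 1:
        return 0.0
    counts = Counter(kmers)
    h = -sum((c / n) * math.log2(c / n) for c in counts.values())
    h_max = math.log2(min(n, 4 ** k))  # all k-mers distinct, capped by 4^k possible k-mers
    return h / h_max


def min_window_entropy(seq: str, window: int = 50, k: int = 5) -> float:
    """Lowest entropy over all sliding windows; short reads use a single window."""
    if len(seq) <= window:
        return kmer_entropy(seq, k)
    return min(kmer_entropy(seq[i:i + window], k) for i in range(len(seq) - window + 1))


def passes_entropy(seq: str, min_entropy: float = -1.0, window: int = 50, k: int = 5) -> bool:
    """A negative threshold (the default -1) disables the filter."""
    if min_entropy < 0:
        return True
    return min_window_entropy(seq, window, k) >= min_entropy


print(passes_entropy("A" * 60, min_entropy=0.5))  # False: a homopolymer has zero entropy
```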

complexity.entropy_mask
label

Mask low-entropy parts of sequences with N instead of discarding [entropymask=f]

type

basic:boolean

default

False

complexity.min_base_frequency
label

Minimum base frequency [minbasefrequency=0]

type

basic:integer

default

0

fastqc.nogroup
label

Disable grouping of bases for reads >50bp [nogroup]

type

basic:boolean

description

All reports will show data for every base in the read. Using this option on very long reads can cause FastQC to crash.

default

False

Output results

fastq
label

Remaining reads

type

list:basic:file

statistics
label

Statistics

type

list:basic:file

fastqc_url
label

Quality control with FastQC

type

list:basic:file:html

fastqc_archive
label

Download FastQC archive

type

list:basic:file

BBDuk - STAR - FeatureCounts (3’ mRNA-Seq, paired-end)

data:workflow:quant:featurecounts:pairedworkflow-bbduk-star-fc-quant-paired (data:reads:fastq:paired  reads, data:index:star  star_index, list:data:seq:nucleotide  adapters, data:annotation  annotation, basic:string  stranded, basic:integer  n_reads, basic:integer  seed, basic:decimal  fraction, basic:boolean  two_pass, data:index:star  rrna_reference, data:index:star  globin_reference)[Source: v2.0.1]

This 3’ mRNA-Seq pipeline consists of QC, preprocessing, alignment, and quantification steps. Reads are preprocessed by __BBDuk__, which removes adapters, trims reads for quality from the 3’-end, and discards reads that are too short after trimming. Preprocessed reads are aligned by the __STAR__ aligner. For read-count quantification, the __FeatureCounts__ tool is used. QC steps include downsampling, QoRTs QC analysis, and alignment of input reads to the rRNA/globin reference sequences. The reported alignment rate is used to assess how efficiently rRNA/globin sequences were depleted.
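STAR writes its alignment summary to `Log.final.out` as `label | value` lines. The Python sketch below parses that file and derives an rRNA/globin alignment rate. Defining the rate as the sum of the unique and multi-mapped percentages is an assumption, because this description does not specify the exact metric the workflow reports. The log excerpt is abbreviated and its numbers are made up.

```python
def parse_star_log(text: str) -> dict:
    """Parse STAR Log.final.out 'label | value' lines into a dictionary."""
    stats = {}
    for line in text.splitlines():
        if "|" not in line:
            continue  # section headers such as 'UNIQUE READS:' have no value
        label, value = line.split("|", 1)
        stats[label.strip()] = value.strip()
    return stats


def percent(stats: dict, label: str) -> float:
    """Convert a value such as '2.50%' to the float 2.5."""
    return float(stats[label].rstrip("%"))


def alignment_rate(stats: dict) -> float:
    """Assumed definition: uniquely mapped % plus multi-mapped %."""
    return percent(stats, "Uniquely mapped reads %") + percent(
        stats, "% of reads mapped to multiple loci"
    )


# Abbreviated, made-up excerpt of a Log.final.out from an rRNA alignment.
log = """
                          Number of input reads |\t1000000
                        Uniquely mapped reads % |\t2.50%
             % of reads mapped to multiple loci |\t0.75%
"""
print(alignment_rate(parse_star_log(log)))  # 3.25
```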

Input arguments

reads
label

Paired-end reads

type

data:reads:fastq:paired

star_index
label

Star index

type

data:index:star

description

Genome index prepared by STAR aligner indexing tool.

adapters
label

Adapters

type

list:data:seq:nucleotide

description

Provide a list of sequencing adapters files (.fasta) to be removed by BBDuk.

required

False

annotation
label

Annotation

type

data:annotation

stranded
label

Select the type of kit used for library preparation.

type

basic:string

choices

  • Strand-specific forward: forward

  • Strand-specific reverse: reverse
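featureCounts encodes strandedness with its `-s` option (`0` unstranded, `1` stranded, `2` reversely stranded). The Python sketch below shows how the two choices above could be translated into a featureCounts command. Whether the workflow builds its command this way is an assumption, and paired-end-specific flags are deliberately left out. The helper name is hypothetical.

```python
from typing import List

# featureCounts -s values: 0 = unstranded, 1 = stranded, 2 = reversely stranded.
STRAND_TO_FC = {"forward": "1", "reverse": "2"}


def featurecounts_cmd(bam: str, annotation: str, stranded: str,
                      out: str = "counts.txt") -> List[str]:
    """Build a featureCounts argument list for the selected kit strandedness."""
    if stranded not in STRAND_TO_FC:
        raise ValueError(f"unsupported strandedness: {stranded!r}")
    return ["featureCounts", "-a", annotation, "-o", out, "-s", STRAND_TO_FC[stranded], bam]


print(" ".join(featurecounts_cmd("sample.bam", "genes.gtf", "reverse")))
# featureCounts -a genes.gtf -o counts.txt -s 2 sample.bam
```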

downsampling.n_reads
label

Number of reads

type

basic:integer

default

1000000

downsampling.advanced.seed
label

Seed

type

basic:integer

default

11

downsampling.advanced.fraction
label

Fraction

type

basic:decimal

description

Use the fraction of reads in range [0.0, 1.0] from the original input file instead of the absolute number of reads. If set, this overrides the “Number of reads” input parameter.

required

False

downsampling.advanced.two_pass
label

2-pass mode

type

basic:boolean

description

Enable two-pass mode when downsampling. Two-pass mode is about twice as slow but uses much less memory.

default

False
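The downsampling parameters follow a common pattern: either a read count or a fraction, plus a random seed. For instance, seqtk's `sample` command also defaults to seed 11. The Python sketch below illustrates the two modes: a per-read coin flip when a fraction is given, and reservoir sampling for a fixed number of reads. It is a conceptual sketch, not the workflow's implementation, and it does not model two-pass mode.

```python
import random
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")


def downsample(reads: Iterable[T], n_reads: int = 1_000_000,
               fraction: Optional[float] = None, seed: int = 11) -> List[T]:
    """Return a random subset of reads; `fraction`, when set, overrides `n_reads`."""
    rng = random.Random(seed)
    if fraction is not None:
        if not 0.0 <= fraction <= 1.0:
            raise ValueError("fraction must be in [0.0, 1.0]")
        # Keep each read independently with probability `fraction`.
        return [read for read in reads if rng.random() < fraction]
    # Reservoir sampling (Algorithm R): a uniform sample of n_reads in one pass.
    reservoir: List[T] = []
    for i, read in enumerate(reads):
        if i < n_reads:
            reservoir.append(read)
        else:
            j = rng.randint(0, i)
            if j < n_reads:
                reservoir[j] = read
    return reservoir


reads = [f"read{i}" for i in range(100)]
print(len(downsample(reads, n_reads=10)))  # 10
```

Because the sampler is driven only by the seed and the record order, running it with the same seed on two mate files with the same number of records selects the same positions in both. This keeps the pairs intact.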

qc.rrna_reference
label

Indexed rRNA reference sequence

type

data:index:star

description

Reference sequence index prepared by STAR aligner indexing tool.

qc.globin_reference
label

Indexed Globin reference sequence

type

data:index:star

description

Reference sequence index prepared by STAR aligner indexing tool.

Output results

BBDuk - STAR - FeatureCounts (3’ mRNA-Seq, single-end)

data:workflow:quant:featurecounts:singleworkflow-bbduk-star-fc-quant-single (data:reads:fastq:single  reads, data:index:star  star_index, list:data:seq:nucleotide  adapters, data:annotation  annotation, basic:string  stranded, basic:integer  n_reads, basic:integer  seed, basic:decimal  fraction, basic:boolean  two_pass, data:index:star  rrna_reference, data:index:star  globin_reference)[Source: v2.0.1]

This 3’ mRNA-Seq pipeline consists of QC, preprocessing, alignment, and quantification steps. Reads are preprocessed by __BBDuk__, which removes adapters, trims reads for quality from the 3’-end, and discards reads that are too short after trimming. Preprocessed reads are aligned by the __STAR__ aligner. For read-count quantification, the __FeatureCounts__ tool is used. QC steps include downsampling, QoRTs QC analysis, and alignment of input reads to the rRNA/globin reference sequences. The reported alignment rate is used to assess how efficiently rRNA/globin sequences were depleted.

Input arguments

reads
label

Input single-end reads

type

data:reads:fastq:single

star_index
label

Star index

type

data:index:star

description

Genome index prepared by STAR aligner indexing tool.

adapters
label

Adapters

type

list:data:seq:nucleotide

description

Provide a list of sequencing adapters files (.fasta) to be removed by BBDuk.

required

False

annotation
label

Annotation

type

data:annotation

stranded
label

Select the type of kit used for library preparation.

type

basic:string

choices

  • Strand-specific forward: forward

  • Strand-specific reverse: reverse

downsampling.n_reads
label

Number of reads

type

basic:integer

default

1000000

downsampling.advanced.seed
label

Seed

type

basic:integer

default

11

downsampling.advanced.fraction
label

Fraction

type

basic:decimal

description

Use the fraction of reads in range [0.0, 1.0] from the original input file instead of the absolute number of reads. If set, this overrides the “Number of reads” input parameter.

required

False

downsampling.advanced.two_pass
label

2-pass mode

type

basic:boolean

description

Enable two-pass mode when downsampling. Two-pass mode is about twice as slow but uses much less memory.

default

False

qc.rrna_reference
label

Indexed rRNA reference sequence

type

data:index:star

description

Reference sequence index prepared by STAR aligner indexing tool.

qc.globin_reference
label

Indexed Globin reference sequence

type

data:index:star

description

Reference sequence index prepared by STAR aligner indexing tool.

Output results

BBDuk - STAR - HTSeq-count (paired-end)

data:workflow:rnaseq:htseq:pairedworkflow-bbduk-star-htseq-paired (data:reads:fastq:paired  reads, data:index:star  star_index, list:data:seq:nucleotide  adapters, data:annotation  annotation, basic:string  stranded)[Source: v2.0.1]

This RNA-seq pipeline consists of three steps: preprocessing, alignment, and quantification. First, reads are preprocessed by __BBDuk__, which removes adapters, trims reads for quality from the 3’-end, and discards reads that are too short after trimming. Compared to similar tools, BBDuk is known for its computational efficiency. Next, preprocessed reads are aligned by the __STAR__ aligner. At the time of implementation, STAR was considered a state-of-the-art tool that consistently produces accurate results from diverse sets of reads and performs well even with default settings. For more information, see [this comparison of RNA-seq aligners](https://www.nature.com/articles/nmeth.4106). Finally, aligned reads are summarized to genes by __HTSeq-count__, which is less computationally efficient than featureCounts. All three tools in this workflow support parallelization to accelerate the analysis.

Input arguments

reads
label

Paired-end reads

type

data:reads:fastq:paired

star_index
label

Star index

type

data:index:star

description

Genome index prepared by STAR aligner indexing tool.

adapters
label

Adapters

type

list:data:seq:nucleotide

description

Provide a list of sequencing adapters files (.fasta) to be removed by BBDuk.

required

False

annotation
label

Annotation

type

data:annotation

stranded
label

Select the QuantSeq kit used for library preparation.

type

basic:string

choices

  • QuantSeq FWD: yes

  • QuantSeq REV: reverse

Output results

BBDuk - STAR - HTSeq-count (single-end)

data:workflow:rnaseq:htseq:singleworkflow-bbduk-star-htseq (data:reads:fastq:single  reads, data:index:star  star_index, list:data:seq:nucleotide  adapters, data:annotation  annotation, basic:string  stranded)[Source: v2.0.1]

This RNA-seq pipeline consists of three steps: preprocessing, alignment, and quantification. First, reads are preprocessed by __BBDuk__, which removes adapters, trims reads for quality from the 3’-end, and discards reads that are too short after trimming. Compared to similar tools, BBDuk is known for its computational efficiency. Next, preprocessed reads are aligned by the __STAR__ aligner. At the time of implementation, STAR was considered a state-of-the-art tool that consistently produces accurate results from diverse sets of reads and performs well even with default settings. For more information, see [this comparison of RNA-seq aligners](https://www.nature.com/articles/nmeth.4106). Finally, aligned reads are summarized to genes by __HTSeq-count__, which is less computationally efficient than featureCounts. All three tools in this workflow support parallelization to accelerate the analysis.
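By default, htseq-count resolves overlaps in `union` mode. In this mode, a read is assigned to a gene only if every feature it overlaps belongs to that one gene. The Python sketch below illustrates that rule on plain intervals. It ignores strand (as with `--stranded=no`) and uses half-open coordinates; both are simplifications for illustration.

```python
from collections import Counter
from typing import List, Tuple

Block = Tuple[int, int]      # aligned read block, half-open [start, end)
Exon = Tuple[str, int, int]  # (gene_id, start, end), half-open


def union_assign(read_blocks: List[Block], exons: List[Exon]) -> str:
    """Assign a read to a gene using union-mode logic (strand ignored)."""
    genes = set()
    for b_start, b_end in read_blocks:
        for gene_id, e_start, e_end in exons:
            if b_start < e_end and e_start < b_end:  # the intervals overlap
                genes.add(gene_id)
    if not genes:
        return "__no_feature"
    if len(genes) > 1:
        return "__ambiguous"
    return genes.pop()


exons = [("geneA", 100, 200), ("geneA", 300, 400), ("geneB", 380, 500)]
reads = [[(150, 190)], [(180, 200), (300, 320)], [(390, 420)], [(600, 650)]]
print(Counter(union_assign(r, exons) for r in reads))
# geneA: 2, __ambiguous: 1, __no_feature: 1
```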

Input arguments

reads
label

Input single-end reads

type

data:reads:fastq:single

star_index
label

Star index

type

data:index:star

description

Genome index prepared by STAR aligner indexing tool.

adapters
label

Adapters

type

list:data:seq:nucleotide

description

Provide a list of sequencing adapters files (.fasta) to be removed by BBDuk.

required

False

annotation
label

Annotation

type

data:annotation

stranded
label

Select the QuantSeq kit used for library preparation.

type

basic:string

choices

  • QuantSeq FWD: yes

  • QuantSeq REV: reverse

Output results

BBDuk - STAR - featureCounts - QC (paired-end)

data:workflow:rnaseq:featurecounts:qcworkflow-bbduk-star-featurecounts-qc-paired (data:reads:fastq:paired  reads, list:data:seq:nucleotide  adapters, basic:boolean  show_advanced, list:basic:string  custom_adapter_sequences, basic:integer  kmer_length, basic:integer  min_k, basic:integer  hamming_distance, basic:integer  maxns, basic:integer  trim_quality, basic:integer  min_length, data:index:star  genome, basic:boolean  show_advanced, basic:boolean  unstranded, basic:boolean  noncannonical, basic:boolean  chimeric, basic:integer  chimSegmentMin, basic:boolean  quantmode, basic:boolean  singleend, basic:boolean  gene_counts, basic:string  outFilterType, basic:integer  outFilterMultimapNmax, basic:integer  outFilterMismatchNmax, basic:decimal  outFilterMismatchNoverLmax, basic:integer  outFilterScoreMin, basic:integer  alignSJoverhangMin, basic:integer  alignSJDBoverhangMin, basic:integer  alignIntronMin, basic:integer  alignIntronMax, basic:integer  alignMatesGapMax, basic:string  alignEndsType, basic:string  outSAMunmapped, basic:string  outSAMattributes, basic:string  outSAMattrRGline, data:annotation  annotation, basic:boolean  show_advanced, basic:string  assay_type, data:index:salmon  cdna_index, basic:integer  n_reads, basic:string  feature_class, basic:string  feature_type, basic:string  id_attribute, basic:integer  n_reads, basic:integer  seed, basic:decimal  fraction, basic:boolean  two_pass, data:index:star  rrna_reference, data:index:star  globin_reference)[Source: v2.0.1]

This RNA-seq pipeline consists of three steps: preprocessing, alignment, and quantification. First, reads are preprocessed by __BBDuk__, which removes adapters, trims reads for quality from the 3’-end, and discards reads that are too short after trimming. Compared to similar tools, BBDuk is known for its computational efficiency. Next, preprocessed reads are aligned by the __STAR__ aligner. At the time of implementation, STAR was considered a state-of-the-art tool that consistently produces accurate results from diverse sets of reads and performs well even with default settings. For more information, see [this comparison of RNA-seq aligners](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5792058/). Finally, aligned reads are summarized to genes by __featureCounts__, a widely adopted tool that computes read counts in a computationally efficient manner. All three tools in this workflow support parallelization to accelerate the analysis. The rRNA contamination rate of the sample is determined using the STAR aligner: quality-trimmed reads are downsampled (using the Seqtk tool) and aligned to the rRNA reference sequences. The alignment rate indicates the percentage of reads in the sample that are derived from rRNA sequences.

Input arguments

preprocessing.reads
label

Reads

type

data:reads:fastq:paired

preprocessing.adapters
label

Adapters

type

list:data:seq:nucleotide

required

False

preprocessing.show_advanced
label

Show advanced parameters

type

basic:boolean

default

False

preprocessing.custom_adapter_sequences
label

Custom adapter sequences [literal]

type

list:basic:string

description

Custom adapter sequences can be specified by inputting them one by one and pressing Enter after each sequence.

required

False

hidden

!preprocessing.show_advanced

default

[]

preprocessing.kmer_length
label

K-mer length

type

basic:integer

description

K-mer length must be smaller than or equal to the length of the adapters.

hidden

!preprocessing.show_advanced

default

23

preprocessing.min_k
label

Minimum k-mer length at right end of reads used for trimming

type

basic:integer

disabled

preprocessing.adapters.length === 0 && preprocessing.custom_adapter_sequences.length === 0

hidden

!preprocessing.show_advanced

default

11

preprocessing.hamming_distance
label

Maximum Hamming distance for k-mers

type

basic:integer

hidden

!preprocessing.show_advanced

default

1
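The k-mer parameters above can be pictured as follows for right-end trimming. Every adapter k-mer is indexed and the read is scanned from the left. The read is then cut at the first k-mer that lies within the allowed Hamming distance of an adapter k-mer. This Python sketch is a simplified model, not BBDuk's implementation. In particular, it omits the shorter k-mers (down to the minimum k-mer length) that BBDuk also checks at the read end.

```python
from typing import Iterable, Set


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def adapter_kmers(adapters: Iterable[str], k: int) -> Set[str]:
    """All k-mers occurring in the adapter sequences."""
    return {a[i:i + k] for a in adapters for i in range(len(a) - k + 1)}


def ktrim_right(read: str, adapters: Iterable[str], k: int = 23, hdist: int = 1) -> str:
    """Trim the read from the first adapter-like k-mer to the right end."""
    kmers = adapter_kmers(adapters, k)
    for i in range(len(read) - k + 1):
        window = read[i:i + k]
        if any(hamming(window, km) <= hdist for km in kmers):
            return read[:i]
    return read


# Toy example with a short k so the effect of the Hamming distance is visible.
print(ktrim_right("ACACACACGGGGGG", ["GGGGGGGG"], k=5, hdist=0))  # ACACACAC
print(ktrim_right("ACACACACGGGGGG", ["GGGGGGGG"], k=5, hdist=1))  # ACACACA
```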

preprocessing.maxns
label

Max Ns after trimming [maxns=-1]

type

basic:integer

description

If non-negative, reads with more Ns than this (after trimming) will be discarded.

hidden

!preprocessing.show_advanced

default

-1

preprocessing.trim_quality
label

Quality below which to trim reads from the right end

type

basic:integer

description

Phred algorithm is used, which is more accurate than naive trimming.

hidden

!preprocessing.show_advanced

default

10
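The exact trimming routine BBDuk uses is not spelled out here. As a stand-in, the Python sketch below uses the widely documented BWA/cutadapt partial-sum algorithm for 3'-end quality trimming. It walks from the 3' end, accumulates `threshold - quality`, and cuts where that sum peaks. We assume BBDuk's method is similar in spirit but not identical.

```python
from typing import List


def quality_trim_3p(quals: List[int], threshold: int = 10) -> int:
    """Return how many bases to keep after BWA/cutadapt-style 3'-end quality trimming."""
    running = 0
    best = 0
    cut = len(quals)
    # Walk from the 3' end, accumulating (threshold - quality).
    for i in range(len(quals) - 1, -1, -1):
        running += threshold - quals[i]
        if running < 0:
            break  # the remaining 5' part is good enough; stop scanning
        if running > best:
            best = running
            cut = i
    return cut


quals = [42, 40, 26, 27, 8, 7, 11, 4, 2, 3]
print(quality_trim_3p(quals))  # 4: keep only the first four bases
```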

preprocessing.min_length
label

Minimum read length

type

basic:integer

description

Reads shorter than minimum read length after trimming are discarded.

hidden

!preprocessing.show_advanced

default

20

alignment.genome
label

Indexed reference genome

type

data:index:star

description

Genome index prepared by STAR aligner indexing tool.

alignment.show_advanced
label

Show advanced parameters

type

basic:boolean

default

False

alignment.unstranded
label

The data is unstranded

type

basic:boolean

description

For unstranded RNA-seq data, Cufflinks/Cuffdiff require spliced alignments with the XS strand attribute, which STAR generates with the --outSAMstrandField intronMotif option. The XS strand attribute is generated for all alignments that contain splice junctions. Spliced alignments that have an undefined strand (i.e. containing only non-canonical unannotated junctions) are suppressed. If you have stranded RNA-seq data, you do not need any specific STAR options. Instead, run Cufflinks with the --library-type option. For example, cufflinks --library-type fr-firststrand should be used for the standard dUTP protocol, including Illumina’s stranded TruSeq. This option is used only for Cufflinks runs, not for STAR runs.

hidden

!alignment.show_advanced

default

False
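The XS strand is inferred from the intron motif: on the plus strand, GT/AG, GC/AG, and AT/AC introns imply a `+` transcript, while CT/AC, CT/GC, and GT/AT imply `-`. The Python sketch below shows that inference. Treating conflicting junction strands as undefined is an assumption, and the helper names are hypothetical.

```python
from typing import Iterable, Optional, Tuple

# Canonical motifs (plus-strand donor, acceptor) and the transcript strand they imply.
MOTIF_STRAND = {
    ("GT", "AG"): "+", ("CT", "AC"): "-",
    ("GC", "AG"): "+", ("CT", "GC"): "-",
    ("AT", "AC"): "+", ("GT", "AT"): "-",
}


def junction_strand(genome: str, intron_start: int, intron_end: int) -> Optional[str]:
    """Strand implied by the intron genome[intron_start:intron_end]; None if non-canonical."""
    donor = genome[intron_start:intron_start + 2].upper()
    acceptor = genome[intron_end - 2:intron_end].upper()
    return MOTIF_STRAND.get((donor, acceptor))


def xs_attribute(genome: str, introns: Iterable[Tuple[int, int]]) -> Optional[str]:
    """XS value for a spliced alignment, or None if the strand is undefined or conflicting."""
    strands = {junction_strand(genome, s, e) for s, e in introns} - {None}
    return strands.pop() if len(strands) == 1 else None


genome = "AAAA" + "GTAAAAAG" + "CCCC"  # intron occupies positions 4..11
print(xs_attribute(genome, [(4, 12)]))  # +
```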

alignment.noncannonical
label

Remove non-canonical junctions (Cufflinks compatibility)

type

basic:boolean

description

It is recommended to remove non-canonical junctions for Cufflinks runs using --outFilterIntronMotifs RemoveNoncanonical.

hidden

!alignment.show_advanced

default

False

alignment.detect_chimeric.chimeric
label

Detect chimeric and circular alignments

type

basic:boolean

description

To switch on detection of chimeric (fusion) alignments (in addition to normal mapping), --chimSegmentMin should be set to a positive value. Each chimeric alignment consists of two “segments”. Each segment is non-chimeric on its own, but the segments are chimeric to each other (i.e. the segments belong to different chromosomes, or different strands, or are far from each other). Both segments may contain splice junctions, and one of the segments may contain portions of both mates. The --chimSegmentMin parameter controls the minimum allowed mapped length of each of the two segments. For example, if you have 2x75 reads and used --chimSegmentMin 20, a chimeric alignment with 130b on one chromosome and 20b on the other will be output, while 135 + 15 won’t be.

default

False
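The 130 + 20 versus 135 + 15 example reduces to checking the shorter segment against the threshold. The minimal Python sketch below models only that length rule. STAR applies further chimeric filters (for example, score thresholds) that are not modelled here, and the function name is hypothetical.

```python
def chimeric_reported(segment1_len: int, segment2_len: int, chim_segment_min: int = 20) -> bool:
    """Report a chimera only if detection is on and both segments reach chimSegmentMin."""
    if chim_segment_min <= 0:
        return False  # chimeric detection is switched off
    return min(segment1_len, segment2_len) >= chim_segment_min


print(chimeric_reported(130, 20))  # True
print(chimeric_reported(135, 15))  # False
```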

alignment.detect_chimeric.chimSegmentMin
label

--chimSegmentMin

type

basic:integer

disabled

detect_chimeric.chimeric != true

default

20

alignment.t_coordinates.quantmode
label

Output in transcript coordinates

type

basic:boolean

description

With the --quantMode TranscriptomeSAM option, STAR outputs alignments translated into transcript coordinates in the Aligned.toTranscriptome.out.bam file (in addition to alignments in genomic coordinates in Aligned.*.sam/bam files). These transcriptomic alignments can be used with transcript quantification tools that require reads to be mapped to the transcriptome, such as RSEM or eXpress.

default

False

alignment.t_coordinates.singleend
label

Allow soft-clipping and indels

type

basic:boolean

description

By default, the output satisfies RSEM requirements: soft-clipping and indels are not allowed. Use --quantTranscriptomeBan Singleend to allow insertions, deletions, and soft-clips in the transcriptomic alignments, which can be used by some expression quantification software (e.g. eXpress).

disabled

t_coordinates.quantmode != true

default

False

alignment.t_coordinates.gene_counts
label

Count reads

type

basic:boolean

description

With the --quantMode GeneCounts option, STAR counts the number of reads per gene while mapping. A read is counted if it overlaps (1 nt or more) one and only one gene. Both ends of a paired-end read are checked for overlaps. The counts coincide with those produced by htseq-count with default parameters. STAR writes the counts to the ReadsPerGene.out.tab file, which has 4 columns corresponding to different strandedness options: column 1: gene ID; column 2: counts for unstranded RNA-seq; column 3: counts for the 1st read strand aligned with RNA (htseq-count option -s yes); column 4: counts for the 2nd read strand aligned with RNA (htseq-count option -s reverse).

disabled

t_coordinates.quantmode != true

default

False
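Choosing the wrong ReadsPerGene.out.tab column is an easy mistake. The Python sketch below selects the column that matches an htseq-count-style strandedness value (`no`, `yes`, `reverse`, following the column description above) and skips STAR's four summary rows. The function name and example rows are hypothetical.

```python
from typing import Dict, Iterable

# 0-based column holding the counts for each strandedness setting.
COLUMN_FOR = {"no": 1, "yes": 2, "reverse": 3}
SUMMARY_ROWS = {"N_unmapped", "N_multimapping", "N_noFeature", "N_ambiguous"}


def read_gene_counts(lines: Iterable[str], stranded: str = "no") -> Dict[str, int]:
    """Pick per-gene counts from ReadsPerGene.out.tab for an htseq-count-style -s value."""
    column = COLUMN_FOR[stranded]
    counts: Dict[str, int] = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if fields[0] in SUMMARY_ROWS:
            continue  # summary rows are not genes
        counts[fields[0]] = int(fields[column])
    return counts


# Made-up example rows.
rows = [
    "N_unmapped\t10\t10\t10",
    "N_multimapping\t5\t5\t5",
    "N_noFeature\t3\t50\t40",
    "N_ambiguous\t2\t0\t1",
    "G1\t100\t95\t5",
    "G2\t40\t2\t38",
]
print(read_gene_counts(rows, "reverse"))  # {'G1': 5, 'G2': 38}
```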

alignment.filtering.outFilterType
label

Type of filtering

type

basic:string

description

Normal: standard filtering using only current alignment; BySJout: keep only those reads that contain junctions that passed filtering into SJ.out.tab

default

Normal

choices

  • Normal: Normal

  • BySJout: BySJout

alignment.filtering.outFilterMultimapNmax
label

--outFilterMultimapNmax

type

basic:integer

description

Read alignments are output only if the read maps to no more loci than this value; otherwise, no alignments are output (default: 10).

required

False

alignment.filtering.outFilterMismatchNmax
label

--outFilterMismatchNmax

type

basic:integer

description

An alignment is output only if it has no more mismatches than this value (default: 10).

required

False

alignment.filtering.outFilterMismatchNoverLmax
label

--outFilterMismatchNoverLmax

type

basic:decimal

description

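Maximum number of mismatches relative to the mapped length: an alignment is output only if its ratio of mismatches to mapped length is less than or equal to this value. For example, with a value of 0.04 and a 2x100b paired read mapped over its full length, the maximum number of mismatches for the pair is 0.04*200=8.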

required

False
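The multimapping and mismatch thresholds act together. An alignment must satisfy all of them. The Python sketch below combines them. The defaults 10 and 10 match the values stated above, and 0.3 is, to our understanding, STAR's default for --outFilterMismatchNoverLmax. The function name is hypothetical.

```python
def passes_star_filters(n_loci: int, n_mismatches: int, mapped_length: int,
                        multimap_nmax: int = 10, mismatch_nmax: int = 10,
                        mismatch_nover_lmax: float = 0.3) -> bool:
    """Combine the multimapping and mismatch thresholds described above."""
    if n_loci > multimap_nmax:
        return False  # read maps to too many loci: no alignments are output
    if n_mismatches > mismatch_nmax:
        return False
    # Mismatches may not exceed the given fraction of the mapped length.
    return n_mismatches <= mismatch_nover_lmax * mapped_length


# 2x100b pair mapped over its full length with --outFilterMismatchNoverLmax 0.04.
print(passes_star_filters(1, 8, 200, mismatch_nover_lmax=0.04))  # True
print(passes_star_filters(1, 9, 200, mismatch_nover_lmax=0.04))  # False
```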

alignment.filtering.outFilterScoreMin
label

--outFilterScoreMin

type

basic:integer

description

Alignment will be output only if its score is higher than or equal to this value (default: 0).

required

False

alignment.alignment.alignSJoverhangMin
label

--alignSJoverhangMin

type

basic:integer

description

Minimum overhang (i.e. block size) for spliced alignments (default: 5).

required

False

alignment.alignment.alignSJDBoverhangMin
label

--alignSJDBoverhangMin

type

basic:integer

description

Minimum overhang (i.e. block size) for annotated (sjdb) spliced alignments (default: 3).

required

False

alignment.alignment.alignIntronMin
label

--alignIntronMin

type

basic:integer

description

Minimum intron size: a genomic gap is considered an intron if its length >= alignIntronMin; otherwise, it is considered a deletion (default: 21).

required

False

alignment.alignment.alignIntronMax
label

--alignIntronMax

type

basic:integer

description

Maximum intron size. If 0, the maximum intron size is determined by (2^winBinNbits)*winAnchorDistNbins (default: 0).

required

False
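The two intron parameters can be read as simple formulas. The Python sketch below computes the effective maximum intron size and classifies a gap. It assumes STAR's documented window defaults, winBinNbits=16 and winAnchorDistNbins=9, as we understand them. The function names are hypothetical.

```python
def max_intron_size(align_intron_max: int = 0, win_bin_nbits: int = 16,
                    win_anchor_dist_nbins: int = 9) -> int:
    """alignIntronMax if positive, else (2^winBinNbits)*winAnchorDistNbins."""
    if align_intron_max > 0:
        return align_intron_max
    return (2 ** win_bin_nbits) * win_anchor_dist_nbins


def classify_gap(gap_length: int, align_intron_min: int = 21) -> str:
    """Classify a genomic gap in an alignment as an intron or a deletion."""
    return "intron" if gap_length >= align_intron_min else "deletion"


print(max_intron_size())                   # 589824 with the assumed window defaults
print(classify_gap(20), classify_gap(21))  # deletion intron
```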

alignment.alignment.alignMatesGapMax
label

--alignMatesGapMax

type

basic:integer

description

Maximum gap between two mates. If 0, the maximum gap is determined by (2^winBinNbits)*winAnchorDistNbins (default: 0).

required

False

alignment.alignment.alignEndsType
label

--alignEndsType

type

basic:string

description

Type of read ends alignment (default: Local).

required

False

default

Local

choices

  • Local: Local

  • EndToEnd: EndToEnd

  • Extend5pOfRead1: Extend5pOfRead1

  • Extend5pOfReads12: Extend5pOfReads12

alignment.output_sam_bam.outSAMunmapped
label

--outSAMunmapped

type

basic:string

description

Output of unmapped reads in the SAM format.

required

False

default

None

choices

  • None: None

  • Within: Within

alignment.output_sam_bam.outSAMattributes
label

--outSAMattributes

type

basic:string

description

A string of desired SAM attributes, in the order desired for the output SAM.

required

False

default

Standard

choices

  • None: None

  • Standard: Standard

  • All: All

alignment.output_sam_bam.outSAMattrRGline
label

--outSAMattrRGline

type

basic:string

description

SAM/BAM read group line. The first word contains the read group identifier and must start with “ID:”, e.g. --outSAMattrRGline ID:xxx CN:yy "DS:z z z"

required

False
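A field value that contains spaces, such as the `DS` example above, must stay a single word when the STAR command is built. The Python sketch below assembles the read group words with the ID first and uses `shlex.join` (Python 3.8+) to quote them for a shell. The helper name is hypothetical.

```python
import shlex
from typing import List


def rg_line_args(read_group_id: str, **fields: str) -> List[str]:
    """Build --outSAMattrRGline arguments; the first word must be the ID field."""
    words = [f"ID:{read_group_id}"] + [f"{tag}:{value}" for tag, value in fields.items()]
    return ["--outSAMattrRGline", *words]


args = rg_line_args("xxx", CN="yy", DS="z z z")
print(shlex.join(args))  # --outSAMattrRGline ID:xxx CN:yy 'DS:z z z'
```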

quantification.annotation
label

Annotation

type

data:annotation

quantification.show_advanced
label

Show advanced parameters

type

basic:boolean

default

False

quantification.assay_type
label

Assay type

type

basic:string

description

In strand non-specific assay a read is considered overlapping with a feature regardless of whether it is mapped to the same or the opposite strand as the feature. In strand-specific forward assay and single reads, the read has to be mapped to the same strand as the feature. For paired-end reads, the first read has to be on the same strand and the second read on the opposite strand. In strand-specific reverse assay these rules are reversed.

hidden

!quantification.show_advanced

default

non_specific

choices

  • Strand non-specific: non_specific

  • Strand-specific forward: forward

  • Strand-specific reverse: reverse

  • Detect automatically: auto
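The assay type rules above can be written as a small compatibility check. This Python sketch covers the three explicit choices and does not model automatic detection (`auto`). Strands are given as `+` or `-`, and the function name is hypothetical.

```python
def read_matches_feature(read_strand: str, feature_strand: str,
                         assay_type: str = "non_specific",
                         paired: bool = False, mate: int = 1) -> bool:
    """Check strand compatibility between an aligned read and a feature ('+' or '-')."""
    if assay_type == "non_specific":
        return True  # either strand counts
    same = read_strand == feature_strand
    if paired and mate == 2:
        # The second mate is expected on the opposite strand, so flip the test.
        same = not same
    if assay_type == "forward":
        return same
    if assay_type == "reverse":
        return not same
    raise ValueError(f"unsupported assay type: {assay_type!r}")


print(read_matches_feature("+", "+", "forward"))  # True
print(read_matches_feature("+", "+", "reverse"))  # False
```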

quantification.cdna_index
label

cDNA index file

type

data:index:salmon

description

Transcriptome index file created using the Salmon indexing tool. The cDNA (transcriptome) sequences used to create the index must be derived from the same species as the input sequencing reads to obtain reliable results.

required

False

hidden

quantification.assay_type != 'auto'

quantification.n_reads
label

Number of reads in subsampled alignment file

type

basic:integer

description

Alignment (.bam) file subsample size. Increase the number of reads to make automatic detection more reliable. Decrease the number of reads to make automatic detection run faster.

hidden

quantification.assay_type != 'auto'

default

5000000

quantification.feature_class
label

Feature class

type

basic:string

description

Feature class (3rd column in GTF/GFF3 file) to be used. All other features will be ignored.

hidden

!quantification.show_advanced

default

exon

quantification.feature_type
label

Feature type

type

basic:string

description

The type of feature the quantification program summarizes over (e.g. gene or transcript-level analysis). The value of this parameter needs to be chosen in line with ‘ID attribute’ below.

hidden

!quantification.show_advanced

default

gene

choices

  • gene: gene

  • transcript: transcript

quantification.id_attribute
label

ID attribute

type

basic:string

description

GTF/GFF3 attribute to be used as feature ID. Several GTF/GFF3 lines with the same feature ID are considered as parts of the same feature. The feature ID is used to identify the counts in the output table. In GTF files this is usually ‘gene_id’, in GFF3 files this is often ‘ID’, and ‘transcript_id’ is frequently a valid choice for both annotation formats.

hidden

!quantification.show_advanced

default

gene_id

choices

  • gene_id: gene_id

  • transcript_id: transcript_id

  • ID: ID

  • geneid: geneid
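Feature class and ID attribute together decide what is counted. Only lines whose third column equals the feature class are kept, and they are grouped by the chosen attribute. The Python sketch below parses GTF (`key "value";`) and GFF3 (`key=value;`) attribute columns and performs that grouping. It does not handle semicolons inside quoted values, and the function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Feature = Tuple[str, int, int, str]  # (seqid, start, end, strand)


def parse_attributes(column9: str) -> Dict[str, str]:
    """Parse a GTF ('key "value";') or GFF3 ('key=value;') attribute column."""
    attrs: Dict[str, str] = {}
    for part in column9.strip().split(";"):
        part = part.strip()
        if not part:
            continue
        if "=" in part and " " not in part.split("=", 1)[0]:  # GFF3 style
            key, value = part.split("=", 1)
        else:  # GTF style
            key, _, value = part.partition(" ")
        attrs[key] = value.strip().strip('"')
    return attrs


def group_features(lines: Iterable[str], feature_class: str = "exon",
                   id_attribute: str = "gene_id") -> Dict[str, List[Feature]]:
    """Group features of the chosen class by their ID attribute."""
    groups: Dict[str, List[Feature]] = defaultdict(list)
    for line in lines:
        if line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9 or cols[2] != feature_class:
            continue  # other feature classes are ignored
        feature_id = parse_attributes(cols[8]).get(id_attribute)
        if feature_id is not None:
            groups[feature_id].append((cols[0], int(cols[3]), int(cols[4]), cols[6]))
    return dict(groups)


gtf = [
    'chr1\tsrc\tgene\t100\t500\t.\t+\t.\tgene_id "G1";',
    'chr1\tsrc\texon\t100\t200\t.\t+\t.\tgene_id "G1"; transcript_id "T1";',
    'chr1\tsrc\texon\t300\t500\t.\t+\t.\tgene_id "G1"; transcript_id "T1";',
]
print(group_features(gtf))  # two exons grouped under 'G1'
```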

downsampling.n_reads
label

Number of reads

type

basic:integer

default

1000000

downsampling.advanced.seed
label

Seed

type

basic:integer

default

11

downsampling.advanced.fraction
label

Fraction

type

basic:decimal

description

Use the fraction of reads in range [0.0, 1.0] from the original input file instead of the absolute number of reads. If set, this overrides the “Number of reads” input parameter.

required

False

downsampling.advanced.two_pass
label

2-pass mode

type

basic:boolean

description

Enable two-pass mode when downsampling. Two-pass mode is about twice as slow but uses much less memory.

default

False

qc.rrna_reference
label

Indexed rRNA reference sequence

type

data:index:star

description

Reference sequence index prepared by STAR aligner indexing tool.

qc.globin_reference
label

Indexed Globin reference sequence

type

data:index:star

description

Reference sequence index prepared by STAR aligner indexing tool.

Output results

BBDuk - STAR - featureCounts - QC (single-end)

data:workflow:rnaseq:featurecounts:qcworkflow-bbduk-star-featurecounts-qc-single (data:reads:fastq:single  reads, list:data:seq:nucleotide  adapters, basic:boolean  show_advanced, list:basic:string  custom_adapter_sequences, basic:integer  kmer_length, basic:integer  min_k, basic:integer  hamming_distance, basic:integer  maxns, basic:integer  trim_quality, basic:integer  min_length, data:index:star  genome, basic:boolean  show_advanced, basic:boolean  unstranded, basic:boolean  noncannonical, basic:boolean  chimeric, basic:integer  chimSegmentMin, basic:boolean  quantmode, basic:boolean  singleend, basic:boolean  gene_counts, basic:string  outFilterType, basic:integer  outFilterMultimapNmax, basic:integer  outFilterMismatchNmax, basic:decimal  outFilterMismatchNoverLmax, basic:integer  outFilterScoreMin, basic:integer  alignSJoverhangMin, basic:integer  alignSJDBoverhangMin, basic:integer  alignIntronMin, basic:integer  alignIntronMax, basic:integer  alignMatesGapMax, basic:string  alignEndsType, basic:string  outSAMunmapped, basic:string  outSAMattributes, basic:string  outSAMattrRGline, data:annotation  annotation, basic:boolean  show_advanced, basic:string  assay_type, data:index:salmon  cdna_index, basic:integer  n_reads, basic:string  feature_class, basic:string  feature_type, basic:string  id_attribute, basic:integer  n_reads, basic:integer  seed, basic:decimal  fraction, basic:boolean  two_pass, data:index:star  rrna_reference, data:index:star  globin_reference)[Source: v2.0.1]

This RNA-seq pipeline consists of three steps: preprocessing, alignment, and quantification. First, reads are preprocessed by __BBDuk__, which removes adapters, trims reads for quality from the 3’-end, and discards reads that are too short after trimming. Compared to similar tools, BBDuk is known for its computational efficiency. Next, preprocessed reads are aligned by the __STAR__ aligner. At the time of implementation, STAR was considered a state-of-the-art tool that consistently produces accurate results from diverse sets of reads and performs well even with default settings. For more information, see [this comparison of RNA-seq aligners](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5792058/). Finally, aligned reads are summarized to genes by __featureCounts__, a widely adopted tool that computes read counts in a computationally efficient manner. All three tools in this workflow support parallelization to accelerate the analysis. The rRNA contamination rate of the sample is determined using the STAR aligner: quality-trimmed reads are downsampled (using the Seqtk tool) and aligned to the rRNA reference sequences. The alignment rate indicates the percentage of reads in the sample that are derived from rRNA sequences.

Input arguments

preprocessing.reads
label

Reads

type

data:reads:fastq:single

preprocessing.adapters
label

Adapters

type

list:data:seq:nucleotide

required

False

preprocessing.show_advanced
label

Show advanced parameters

type

basic:boolean

default

False

preprocessing.custom_adapter_sequences
label

Custom adapter sequences [literal]

type

list:basic:string

description

Custom adapter sequences can be specified by inputting them one by one and pressing Enter after each sequence.

required

False

hidden

!preprocessing.show_advanced

default

[]

preprocessing.kmer_length
label

K-mer length

type

basic:integer

description

K-mer length must be smaller than or equal to the length of the adapters.

hidden

!preprocessing.show_advanced

default

23

preprocessing.min_k
label

Minimum k-mer length at right end of reads used for trimming

type

basic:integer

disabled

preprocessing.adapters.length === 0 && preprocessing.custom_adapter_sequences.length === 0

hidden

!preprocessing.show_advanced

default

11

preprocessing.hamming_distance
label

Maximum Hamming distance for k-mers

type

basic:integer

hidden

!preprocessing.show_advanced

default

1

preprocessing.maxns
label

Max Ns after trimming [maxns=-1]

type

basic:integer

description

If non-negative, reads with more Ns than this (after trimming) will be discarded.

hidden

!preprocessing.show_advanced

default

-1

preprocessing.trim_quality
label

Quality below which to trim reads from the right end

type

basic:integer

description

Phred algorithm is used, which is more accurate than naive trimming.

hidden

!preprocessing.show_advanced

default

10

preprocessing.min_length
label

Minimum read length

type

basic:integer

description

Reads shorter than minimum read length after trimming are discarded.

hidden

!preprocessing.show_advanced

default

20

alignment.genome
label

Indexed reference genome

type

data:index:star

description

Genome index prepared by STAR aligner indexing tool.

alignment.show_advanced
label

Show advanced parameters

type

basic:boolean

default

False

alignment.unstranded
label

The data is unstranded

type

basic:boolean

description

For unstranded RNA-seq data, Cufflinks/Cuffdiff require spliced alignments with the XS strand attribute, which STAR generates with the --outSAMstrandField intronMotif option. The XS strand attribute is generated for all alignments that contain splice junctions. Spliced alignments that have an undefined strand (i.e. containing only non-canonical unannotated junctions) are suppressed. If you have stranded RNA-seq data, you do not need any specific STAR options. Instead, run Cufflinks with the --library-type option. For example, cufflinks --library-type fr-firststrand should be used for the standard dUTP protocol, including Illumina’s stranded TruSeq. This option is used only for Cufflinks runs, not for STAR runs.

hidden

!alignment.show_advanced

default

False

alignment.noncannonical
label

Remove non-canonical junctions (Cufflinks compatibility)

type

basic:boolean

description

It is recommended to remove the non-canonical junctions for Cufflinks runs using --outFilterIntronMotifs RemoveNoncanonical.

hidden

!alignment.show_advanced

default

False

alignment.detect_chimeric.chimeric
label

Detect chimeric and circular alignments

type

basic:boolean

description

To switch on detection of chimeric (fusion) alignments (in addition to normal mapping), --chimSegmentMin should be set to a positive value. Each chimeric alignment consists of two “segments”. Each segment is non-chimeric on its own, but the segments are chimeric to each other (i.e. the segments belong to different chromosomes, or different strands, or are far from each other). Both segments may contain splice junctions, and one of the segments may contain portions of both mates. The --chimSegmentMin parameter controls the minimum mapped length of the two segments that is allowed. For example, if you have 2x75 reads and used --chimSegmentMin 20, a chimeric alignment with 130b on one chromosome and 20b on the other will be output, while 135 + 15 won’t be.

default

False

alignment.detect_chimeric.chimSegmentMin
label

--chimSegmentMin

type

basic:integer

disabled

detect_chimeric.chimeric != true

default

20
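The segment-length rule in the 2x75 example above can be checked with a few lines of Python. This is a minimal sketch of the rule as described, not STAR's implementation; the function name `passes_chim_segment_min` is hypothetical.

```python
def passes_chim_segment_min(segment_lengths, chim_segment_min=20):
    """Return True if both segments of a chimeric alignment are long enough.

    Sketch of the --chimSegmentMin rule: every segment must have a mapped
    length of at least chim_segment_min bases.
    """
    if chim_segment_min <= 0:
        # A non-positive value means chimeric detection is switched off.
        return False
    return all(length >= chim_segment_min for length in segment_lengths)


# The 2x75 example from the description.
print(passes_chim_segment_min((130, 20)))  # True
print(passes_chim_segment_min((135, 15)))  # False
```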

alignment.t_coordinates.quantmode
label

Output in transcript coordinates

type

basic:boolean

description

With the --quantMode TranscriptomeSAM option STAR will output alignments translated into transcript coordinates in the Aligned.toTranscriptome.out.bam file (in addition to alignments in genomic coordinates in Aligned.*.sam/bam files). These transcriptomic alignments can be used with various transcript quantification software that require reads to be mapped to transcriptome, such as RSEM or eXpress.

default

False

alignment.t_coordinates.singleend
label

Allow soft-clipping and indels

type

basic:boolean

description

By default, the output satisfies RSEM requirements: soft-clipping and indels are not allowed. Use --quantTranscriptomeBan Singleend to allow insertions, deletions and soft-clips in the transcriptomic alignments, which can be used by some expression quantification software (e.g. eXpress).

disabled

t_coordinates.quantmode != true

default

False

alignment.t_coordinates.gene_counts
label

Count reads

type

basic:boolean

description

With the --quantMode GeneCounts option, STAR counts the number of reads per gene while mapping. A read is counted if it overlaps (by 1nt or more) one and only one gene. Both ends of a paired-end read are checked for overlaps. The counts coincide with those produced by htseq-count with default parameters. The counts are written to the ReadsPerGene.out.tab file, which has 4 columns corresponding to different strandedness options: column 1: gene ID; column 2: counts for unstranded RNA-seq; column 3: counts for the 1st read strand aligned with RNA (htseq-count option -s yes); column 4: counts for the 2nd read strand aligned with RNA (htseq-count option -s reverse).

disabled

t_coordinates.quantmode != true

default

False
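To pick the right column of ReadsPerGene.out.tab for a given library, a small parser is enough. This is a sketch that assumes the tab-separated four-column layout described above, with summary rows (such as N_unmapped and N_noFeature) whose IDs start with "N_"; the strandedness keys and the function name `read_gene_counts` are hypothetical.

```python
import io

# 0-based column index for each strandedness option, following the
# column layout of ReadsPerGene.out.tab; the key names are hypothetical.
STRAND_COLUMN = {"unstranded": 1, "yes": 2, "reverse": 3}


def read_gene_counts(handle, strandedness="unstranded"):
    """Parse a ReadsPerGene.out.tab stream into ({gene_id: count}, summary).

    Rows whose first field starts with "N_" are STAR summary rows and are
    returned separately so they are not mistaken for genes.
    """
    column = STRAND_COLUMN[strandedness]
    counts, summary = {}, {}
    for line in handle:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 4:
            continue  # skip malformed or empty lines
        target = summary if fields[0].startswith("N_") else counts
        target[fields[0]] = int(fields[column])
    return counts, summary


example = io.StringIO(
    "N_unmapped\t5\t5\t5\n"
    "N_noFeature\t3\t40\t7\n"
    "geneA\t10\t1\t9\n"
    "geneB\t4\t4\t0\n"
)
counts, summary = read_gene_counts(example, "reverse")
print(counts)  # {'geneA': 9, 'geneB': 0}
```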

alignment.filtering.outFilterType
label

Type of filtering

type

basic:string

description

Normal: standard filtering using only the current alignment; BySJout: keep only those reads that contain junctions that passed filtering into SJ.out.tab.

default

Normal

choices

  • Normal: Normal

  • BySJout: BySJout

alignment.filtering.outFilterMultimapNmax
label

--outFilterMultimapNmax

type

basic:integer

description

Read alignments will be output only if the read maps to no more loci than this value; otherwise no alignments will be output (default: 10).

required

False

alignment.filtering.outFilterMismatchNmax
label

--outFilterMismatchNmax

type

basic:integer

description

Alignment will be output only if it has no more mismatches than this value (default: 10).

required

False

alignment.filtering.outFilterMismatchNoverLmax
label

--outFilterMismatchNoverLmax

type

basic:decimal

description

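Maximum ratio of mismatches to mapped length: an alignment will be output only if this ratio is less than or equal to this value. For example, with a value of 0.04 and a 2x100b paired read aligned over its full length, the maximum number of mismatches for the pair is 0.04*200=8.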

required

False

alignment.filtering.outFilterScoreMin
label

--outFilterScoreMin

type

basic:integer

description

Alignment will be output only if its score is higher than or equal to this value (default: 0).

required

False
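The four output filters above combine as a simple conjunction. This sketch models only these four checks (STAR applies further criteria); the `Alignment` record and `passes_star_filters` are hypothetical names, and the 0.3 default for the mismatch ratio is STAR's documented default as we understand it, not a value stated in this document.

```python
from dataclasses import dataclass


@dataclass
class Alignment:
    n_loci: int         # number of loci the read maps to
    n_mismatches: int   # mismatches in the (paired) alignment
    mapped_length: int  # aligned bases, summed over both mates
    score: int


def passes_star_filters(aln, multimap_nmax=10, mismatch_nmax=10,
                        mismatch_nover_lmax=0.3, score_min=0):
    """Sketch of the --outFilter* checks described above."""
    if aln.n_loci > multimap_nmax:
        return False  # maps to too many loci
    if aln.n_mismatches > mismatch_nmax:
        return False  # too many mismatches in absolute terms
    if aln.n_mismatches > mismatch_nover_lmax * aln.mapped_length:
        return False  # too many mismatches relative to mapped length
    return aln.score >= score_min


# 2x100b pair aligned over its full length, ratio limit 0.04 -> at most 8.
pair = Alignment(n_loci=1, n_mismatches=8, mapped_length=200, score=180)
print(passes_star_filters(pair, mismatch_nover_lmax=0.04))  # True
pair.n_mismatches = 9
print(passes_star_filters(pair, mismatch_nover_lmax=0.04))  # False
```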

alignment.alignment.alignSJoverhangMin
label

--alignSJoverhangMin

type

basic:integer

description

Minimum overhang (i.e. block size) for spliced alignments (default: 5).

required

False

alignment.alignment.alignSJDBoverhangMin
label

--alignSJDBoverhangMin

type

basic:integer

description

Minimum overhang (i.e. block size) for annotated (sjdb) spliced alignments (default: 3).

required

False

alignment.alignment.alignIntronMin
label

--alignIntronMin

type

basic:integer

description

Minimum intron size: genomic gap is considered intron if its length >= alignIntronMin, otherwise it is considered Deletion (default: 21).

required

False

alignment.alignment.alignIntronMax
label

--alignIntronMax

type

basic:integer

description

Maximum intron size; if 0, the maximum intron size will be determined by (2^winBinNbits)*winAnchorDistNbins (default: 0).

required

False
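The intron size limits can be read as a classification of genomic gaps. A minimal sketch, assuming STAR's default window settings winBinNbits=16 and winAnchorDistNbins=9 (check these against your STAR version); the function names and the "too long" label are hypothetical.

```python
def effective_intron_max(align_intron_max, win_bin_nbits=16,
                         win_anchor_dist_nbins=9):
    """Max intron size; 0 falls back to (2^winBinNbits)*winAnchorDistNbins."""
    if align_intron_max > 0:
        return align_intron_max
    return (2 ** win_bin_nbits) * win_anchor_dist_nbins


def classify_gap(gap_length, align_intron_min=21, align_intron_max=0):
    """Classify a genomic gap in a read alignment."""
    if gap_length < align_intron_min:
        return "deletion"
    if gap_length <= effective_intron_max(align_intron_max):
        return "intron"
    return "too long"  # hypothetical label: not allowed as a single intron


print(effective_intron_max(0))  # 589824
print(classify_gap(12))         # deletion
print(classify_gap(5000))       # intron
```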

alignment.alignment.alignMatesGapMax
label

–alignMatesGapMax

type

basic:integer

description

Maximum gap between two mates; if 0, the maximum gap will be determined by (2^winBinNbits)*winAnchorDistNbins (default: 0).

required

False

alignment.alignment.alignEndsType
label

--alignEndsType

type

basic:string

description

Type of read ends alignment (default: Local).

required

False

default

Local

choices

  • Local: Local

  • EndToEnd: EndToEnd

  • Extend5pOfRead1: Extend5pOfRead1

  • Extend5pOfReads12: Extend5pOfReads12

alignment.output_sam_bam.outSAMunmapped
label

--outSAMunmapped

type

basic:string

description

Output of unmapped reads in the SAM format.

required

False

default

None

choices

  • None: None

  • Within: Within

alignment.output_sam_bam.outSAMattributes
label

--outSAMattributes

type

basic:string

description

A string of desired SAM attributes, in the order desired for the output SAM.

required

False

default

Standard

choices

  • None: None

  • Standard: Standard

  • All: All

alignment.output_sam_bam.outSAMattrRGline
label

--outSAMattrRGline

type

basic:string

description

SAM/BAM read group line. The first word contains the read group identifier and must start with “ID:”, e.g. --outSAMattrRGline ID:xxx CN:yy “DS:z z z”

required

False

quantification.annotation
label

Annotation

type

data:annotation

quantification.show_advanced
label

Show advanced parameters

type

basic:boolean

default

False

quantification.assay_type
label

Assay type

type

basic:string

description

In strand non-specific assay a read is considered overlapping with a feature regardless of whether it is mapped to the same or the opposite strand as the feature. In strand-specific forward assay and single reads, the read has to be mapped to the same strand as the feature. For paired-end reads, the first read has to be on the same strand and the second read on the opposite strand. In strand-specific reverse assay these rules are reversed.

hidden

!quantification.show_advanced

default

non_specific

choices

  • Strand non-specific: non_specific

  • Strand-specific forward: forward

  • Strand-specific reverse: reverse

  • Detect automatically: auto
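The strand rules for the three fixed assay types fit in one small predicate. A minimal sketch; the function name and the "+"/"-" strand encoding are assumptions, and "auto" is not handled because it only selects one of the other three types.

```python
def read_counts_for_feature(assay_type, read_strand, feature_strand, mate=1):
    """Decide whether a read's strand is compatible with a feature.

    assay_type is "non_specific", "forward" or "reverse". Strands are "+"
    or "-". For paired-end reads mate is 1 or 2; single-end reads behave
    like mate 1.
    """
    if assay_type == "non_specific":
        return True  # strand is ignored
    same_strand = read_strand == feature_strand
    # Forward: mate 1 on the feature strand, mate 2 on the opposite strand.
    expected_same = mate == 1
    if assay_type == "reverse":
        expected_same = not expected_same  # reverse assay flips the rules
    elif assay_type != "forward":
        raise ValueError(f"unsupported assay type: {assay_type}")
    return same_strand == expected_same


print(read_counts_for_feature("forward", "+", "+"))          # True
print(read_counts_for_feature("forward", "-", "+", mate=2))  # True
print(read_counts_for_feature("reverse", "+", "+"))          # False
```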

quantification.cdna_index
label

cDNA index file

type

data:index:salmon

description

Transcriptome index file created using the Salmon indexing tool. The cDNA (transcriptome) sequences used to create the index must be derived from the same species as the input sequencing reads to obtain reliable analysis results.

required

False

hidden

quantification.assay_type != 'auto'

quantification.n_reads
label

Number of reads in subsampled alignment file

type

basic:integer

description

Alignment (.bam) file subsample size. Increase the number of reads to make automatic detection more reliable. Decrease the number of reads to make automatic detection run faster.

hidden

quantification.assay_type != 'auto'

default

5000000

quantification.feature_class
label

Feature class

type

basic:string

description

Feature class (3rd column in GTF/GFF3 file) to be used. All other features will be ignored.

hidden

!quantification.show_advanced

default

exon

quantification.feature_type
label

Feature type

type

basic:string

description

The type of feature the quantification program summarizes over (e.g. gene or transcript-level analysis). The value of this parameter needs to be chosen in line with ‘ID attribute’ below.

hidden

!quantification.show_advanced

default

gene

choices

  • gene: gene

  • transcript: transcript

quantification.id_attribute
label

ID attribute

type

basic:string

description

GTF/GFF3 attribute to be used as feature ID. Several GTF/GFF3 lines with the same feature ID will be considered as parts of the same feature. The feature ID is used to identify the counts in the output table. In GTF files this is usually ‘gene_id’, in GFF3 files this is often ‘ID’, and ‘transcript_id’ is frequently a valid choice for both annotation formats.

hidden

!quantification.show_advanced

default

gene_id

choices

  • gene_id: gene_id

  • transcript_id: transcript_id

  • ID: ID

  • geneid: geneid
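How "Feature class" and "ID attribute" interact can be sketched by grouping GTF lines by their feature ID. This sketch handles only GTF-style `key "value";` attributes (GFF3 uses `key=value` and would need a different parser); the function names are hypothetical and this is not the quantification tool's own code.

```python
import re
from collections import defaultdict

# Matches key "value" pairs in the GTF attribute column (column 9).
GTF_ATTR = re.compile(r'(\S+)\s+"([^"]*)"')


def parse_gtf_attributes(column9):
    return dict(GTF_ATTR.findall(column9))


def group_by_id(gtf_lines, feature_class="exon", id_attribute="gene_id"):
    """Group (start, end) intervals by feature ID for one feature class."""
    features = defaultdict(list)
    for line in gtf_lines:
        if line.startswith("#"):
            continue  # comment/header line
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != feature_class:
            continue  # other feature classes are ignored
        attrs = parse_gtf_attributes(fields[8])
        if id_attribute in attrs:
            features[attrs[id_attribute]].append((int(fields[3]), int(fields[4])))
    return dict(features)


lines = [
    'chr1\tsrc\texon\t100\t200\t.\t+\t.\tgene_id "G1"; transcript_id "T1";',
    'chr1\tsrc\texon\t300\t400\t.\t+\t.\tgene_id "G1"; transcript_id "T2";',
    'chr1\tsrc\tCDS\t120\t180\t.\t+\t0\tgene_id "G1"; transcript_id "T1";',
]
print(group_by_id(lines))  # {'G1': [(100, 200), (300, 400)]}
print(group_by_id(lines, id_attribute="transcript_id"))
# {'T1': [(100, 200)], 'T2': [(300, 400)]}
```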

downsampling.n_reads
label

Number of reads

type

basic:integer

default

1000000

downsampling.advanced.seed
label

Seed

type

basic:integer

default

11

downsampling.advanced.fraction
label

Fraction

type

basic:decimal

description

Use the fraction of reads [0 - 1.0] from the original input file instead of the absolute number of reads. If set, this will override the “Number of reads” input parameter.

required

False

downsampling.advanced.two_pass
label

2-pass mode

type

basic:boolean

description

Enable two-pass mode when down-sampling. Two-pass mode is about twice as slow but uses much less memory.

default

False
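The interplay of seed, number of reads, fraction and two-pass mode can be illustrated with standard sampling techniques: per-read Bernoulli sampling for a fraction, reservoir sampling for a fixed count, and a count-then-select variant for two passes. This is a sketch of the general ideas, not Seqtk's actual code; all function names are hypothetical.

```python
import random


def sample_fraction(records, fraction, seed=11):
    """Keep each record independently with probability `fraction` (one pass)."""
    rng = random.Random(seed)
    return [record for record in records if rng.random() < fraction]


def sample_reservoir(records, n_reads, seed=11):
    """Reservoir sampling: one pass, but keeps up to n_reads records in memory."""
    rng = random.Random(seed)
    reservoir = []
    for i, record in enumerate(records):
        if i < n_reads:
            reservoir.append(record)
        else:
            j = rng.randint(0, i)  # uniform over 0..i inclusive
            if j < n_reads:
                reservoir[j] = record
    return reservoir


def sample_two_pass(open_records, n_reads, seed=11):
    """Two passes over the input; only the chosen indices are held in memory.

    open_records is a callable returning a fresh iterator over the input,
    e.g. a function that reopens a FASTQ file.
    """
    total = sum(1 for _ in open_records())  # pass 1: count records
    rng = random.Random(seed)
    chosen = set(rng.sample(range(total), min(n_reads, total)))
    for i, record in enumerate(open_records()):  # pass 2: emit chosen records
        if i in chosen:
            yield record


def downsample(records, n_reads=1_000_000, fraction=None, seed=11):
    """A set fraction overrides the absolute number of reads."""
    if fraction is not None:
        return sample_fraction(records, fraction, seed)
    return sample_reservoir(records, n_reads, seed)


reads = [f"read{i}" for i in range(1000)]
print(len(downsample(reads, n_reads=100)))                   # 100
print(len(list(sample_two_pass(lambda: iter(reads), 100))))  # 100
```

The fixed seed makes repeated runs return the same subset, which is what the "Seed" parameter is for.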

qc.rrna_reference
label

Indexed rRNA reference sequence

type

data:index:star

description

Reference sequence index prepared by STAR aligner indexing tool.

qc.globin_reference
label

Indexed Globin reference sequence

type

data:index:star

description

Reference sequence index prepared by STAR aligner indexing tool.

Output results

BBDuk - Salmon - QC (paired-end)

data:workflow:rnaseq:salmonworkflow-bbduk-salmon-qc-paired (data:reads:fastq:paired  reads, data:index:salmon  salmon_index, data:index:star  genome, data:annotation  annotation, data:index:star  rrna_reference, data:index:star  globin_reference, basic:boolean  show_advanced, list:data:seq:nucleotide  adapters, list:basic:string  custom_adapter_sequences, basic:integer  kmer_length, basic:integer  min_k, basic:integer  hamming_distance, basic:integer  maxns, basic:integer  trim_quality, basic:integer  min_length, basic:boolean  seq_bias, basic:boolean  gc_bias, basic:decimal  consensus_slack, basic:decimal  min_score_fraction, basic:integer  range_factorization_bins, basic:integer  min_assigned_frag, basic:integer  n_reads, basic:integer  seed, basic:decimal  fraction, basic:boolean  two_pass)[Source: v3.0.1]

Alignment-free RNA-seq pipeline. The Salmon tool and the tximport package are used in the quantification step to produce gene-level abundance estimates. The rRNA and globin-sequence contamination rate of the sample is determined using the STAR aligner: quality-trimmed reads are down-sampled (using the Seqtk tool) and aligned to the genome, rRNA and globin reference sequences. The rRNA and globin-sequence alignment rates indicate the percentage of reads in the sample that are of rRNA and globin origin, respectively. Alignment of the down-sampled data to a whole-genome reference sequence produces an alignment file suitable for Samtools and QoRTs QC analysis. Per-sample analysis results and QC data are summarized by the MultiQC tool.

Input arguments

reads
label

Select sample(s)

type

data:reads:fastq:paired

salmon_index
label

Salmon index

type

data:index:salmon

genome
label

Indexed reference genome

type

data:index:star

description

Genome index prepared by STAR aligner indexing tool.

annotation
label

Annotation

type

data:annotation

rrna_reference
label

Indexed rRNA reference sequence

type

data:index:star

description

Reference sequence index prepared by STAR aligner indexing tool.

globin_reference
label

Indexed Globin reference sequence

type

data:index:star

description

Reference sequence index prepared by STAR aligner indexing tool.

show_advanced
label

Show advanced parameters

type

basic:boolean

default

False

preprocessing.adapters
label

Adapters

type

list:data:seq:nucleotide

required

False

preprocessing.custom_adapter_sequences
label

Custom adapter sequences [literal]

type

list:basic:string

description

Custom adapter sequences can be specified by inputting them one by one and pressing Enter after each sequence.

required

False

default

[]

preprocessing.kmer_length
label

K-mer length

type

basic:integer

description

K-mer length must be smaller than or equal to the length of the adapters.

default

23
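The k-mer length constraint can be enforced before a run with a simple check. A minimal sketch, assuming the constraint applies to the shortest adapter supplied; the function name is hypothetical.

```python
def check_kmer_length(kmer_length, adapters):
    """Raise ValueError if k is longer than the shortest adapter sequence."""
    if not adapters:
        return  # no adapters to match, so no constraint applies
    shortest = min(len(adapter) for adapter in adapters)
    if kmer_length > shortest:
        raise ValueError(
            f"k-mer length {kmer_length} exceeds shortest adapter ({shortest} bases)"
        )


check_kmer_length(23, ["A" * 33])  # passes silently
```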

preprocessing.min_k
label

Minimum k-mer length at right end of reads used for trimming

type

basic:integer

disabled

preprocessing.adapters.length === 0 && preprocessing.custom_adapter_sequences.length === 0

default

11

preprocessing.hamming_distance
label

Maximum Hamming distance for k-mers

type

basic:integer

default

1

preprocessing.maxns
label

Max Ns after trimming [maxns=-1]

type

basic:integer

description

If non-negative, reads with more Ns than this (after trimming) will be discarded.

default

-1

preprocessing.trim_quality
label

Quality below which to trim reads from the right end

type

basic:integer

description

Phred algorithm is used, which is more accurate than naive trimming.

default

10

preprocessing.min_length
label

Minimum read length

type

basic:integer

description

Reads shorter than minimum read length after trimming are discarded.

default

20
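The "Max Ns after trimming" and "Minimum read length" settings act as a post-trimming read filter. A minimal sketch of that logic as described (maxns=-1 disables the N check); the function name is hypothetical and this is not BBDuk's code.

```python
def keep_read(sequence, maxns=-1, min_length=20):
    """Return True if a trimmed read survives the maxns and min_length filters."""
    if len(sequence) < min_length:
        return False  # too short after trimming
    if maxns >= 0 and sequence.upper().count("N") > maxns:
        return False  # too many ambiguous bases
    return True


print(keep_read("ACGTN" * 5))           # True (N check disabled)
print(keep_read("ACGTN" * 5, maxns=4))  # False (5 Ns > 4)
print(keep_read("A" * 19))              # False (shorter than 20)
```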

quantification.seq_bias
label

Perform sequence-specific bias correction

type

basic:boolean

default

True

quantification.gc_bias
label

Perform fragment GC bias correction.

type

basic:boolean

default

True

quantification.consensus_slack
label

Consensus slack

type

basic:decimal

description

The amount of slack allowed in the quasi-mapping consensus mechanism. Normally, a transcript must cover all hits to be considered for mapping. If this is set to a fraction, X, greater than 0 (and in [0,1)), then a transcript can fail to cover up to (100 * X)% of the hits before it is discounted as a mapping candidate. The default value of this option is 0.2 in selective alignment mode and 0 otherwise.

required

False
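The consensus slack rule can be made concrete as a hit-count check. A sketch under the assumption that "up to (100 * X)% of the hits" is rounded down to a whole number of hits; the function names are hypothetical and this is not Salmon's code.

```python
import math


def max_uncovered_hits(total_hits, consensus_slack):
    """Number of hits a transcript may fail to cover and remain a candidate."""
    if not 0 <= consensus_slack < 1:
        raise ValueError("consensus_slack must be in [0, 1)")
    return math.floor(consensus_slack * total_hits)


def is_candidate(covered_hits, total_hits, consensus_slack=0.0):
    missed = total_hits - covered_hits
    return missed <= max_uncovered_hits(total_hits, consensus_slack)


print(is_candidate(8, 10, 0.2))  # True: misses 2 of 10 hits
print(is_candidate(7, 10, 0.2))  # False: misses 3 of 10 hits
```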

quantification.min_score_fraction
label

Minimum alignment score fraction

type

basic:decimal

description

The fraction of the optimal possible alignment score that a mapping must achieve in order to be considered valid - should be in (0,1].

default

0.65
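The minimum alignment score fraction is a threshold relative to the best achievable score. A minimal sketch; how the optimal score is computed (here a hypothetical 200, e.g. a 100-base read at an assumed 2 points per match) depends on the scoring scheme and is an assumption.

```python
def is_valid_mapping(score, optimal_score, min_score_fraction=0.65):
    """True if the mapping reaches the required fraction of the optimal score."""
    if not 0 < min_score_fraction <= 1:
        raise ValueError("min_score_fraction must be in (0, 1]")
    return score >= min_score_fraction * optimal_score


print(is_valid_mapping(131, 200))  # True
print(is_valid_mapping(129, 200))  # False
```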

quantification.range_factorization_bins
label

Range factorization bins

type

basic:integer

description

Factorizes the likelihood used in quantification by adopting a new notion of equivalence classes based on the conditional probabilities with which fragments are generated from different transcripts. This is a more fine-grained factorization than the normal rich equivalence classes. The default value (4) corresponds to the default used in Zakeri et al. 2017 and larger values imply a more fine-grained factorization. If range factorization is enabled, a common value to select for this parameter is 4. A value of 0 signifies the use of basic rich equivalence classes.

default

4

quantification.min_assigned_frag
label

Minimum number of assigned fragments

type

basic:integer

description

The minimum number of fragments that must be assigned to the transcriptome for quantification to proceed.

default

10

downsampling.n_reads
label

Number of reads

type

basic:integer

default

10000000

downsampling.seed
label

Seed

type

basic:integer

default

11

downsampling.fraction
label

Fraction

type

basic:decimal

description

Use the fraction of reads [0 - 1.0] from the original input file instead of the absolute number of reads. If set, this will override the “Number of reads” input parameter.

required

False

downsampling.two_pass
label

2-pass mode

type

basic:boolean

description

Enable two-pass mode when down-sampling. Two-pass mode is about twice as slow but uses much less memory.

default

False

Output results

BBDuk - Salmon - QC (single-end)

data:workflow:rnaseq:salmonworkflow-bbduk-salmon-qc-single (data:reads:fastq:single  reads, data:index:salmon  salmon_index, data:index:star  genome, data:annotation  annotation, data:index:star  rrna_reference, data:index:star  globin_reference, basic:boolean  show_advanced, list:data:seq:nucleotide  adapters, list:basic:string  custom_adapter_sequences, basic:integer  kmer_length, basic:integer  min_k, basic:integer  hamming_distance, basic:integer  maxns, basic:integer  trim_quality, basic:integer  min_length, basic:boolean  seq_bias, basic:boolean  gc_bias, basic:decimal  consensus_slack, basic:decimal  min_score_fraction, basic:integer  range_factorization_bins, basic:integer  min_assigned_frag, basic:integer  n_reads, basic:integer  seed, basic:decimal  fraction, basic:boolean  two_pass)[Source: v3.0.1]

Alignment-free RNA-seq pipeline. The Salmon tool and the tximport package are used in the quantification step to produce gene-level abundance estimates. The rRNA and globin-sequence contamination rate of the sample is determined using the STAR aligner: quality-trimmed reads are down-sampled (using the Seqtk tool) and aligned to the genome, rRNA and globin reference sequences. The rRNA and globin-sequence alignment rates indicate the percentage of reads in the sample that are of rRNA and globin origin, respectively. Alignment of the down-sampled data to a whole-genome reference sequence produces an alignment file suitable for Samtools and QoRTs QC analysis. Per-sample analysis results and QC data are summarized by the MultiQC tool.

Input arguments

reads
label

Select sample(s)

type

data:reads:fastq:single

salmon_index
label

Salmon index

type

data:index:salmon

genome
label

Indexed reference genome

type

data:index:star

description

Genome index prepared by STAR aligner indexing tool.

annotation
label

Annotation

type

data:annotation

rrna_reference
label

Indexed rRNA reference sequence

type

data:index:star

description

Reference sequence index prepared by STAR aligner indexing tool.

globin_reference
label

Indexed Globin reference sequence

type

data:index:star

description

Reference sequence index prepared by STAR aligner indexing tool.

show_advanced
label

Show advanced parameters

type

basic:boolean

default

False

preprocessing.adapters
label

Adapters

type

list:data:seq:nucleotide

required

False

preprocessing.custom_adapter_sequences
label

Custom adapter sequences [literal]

type

list:basic:string

description

Custom adapter sequences can be specified by inputting them one by one and pressing Enter after each sequence.

required

False

default

[]

preprocessing.kmer_length
label

K-mer length

type

basic:integer

description

K-mer length must be smaller than or equal to the length of the adapters.

default

23

preprocessing.min_k
label

Minimum k-mer length at right end of reads used for trimming

type

basic:integer

disabled

preprocessing.adapters.length === 0 && preprocessing.custom_adapter_sequences.length === 0

default

11

preprocessing.hamming_distance
label

Maximum Hamming distance for k-mers

type

basic:integer

default

1

preprocessing.maxns
label

Max Ns after trimming [maxns=-1]

type

basic:integer

description

If non-negative, reads with more Ns than this (after trimming) will be discarded.

default

-1

preprocessing.trim_quality
label

Quality below which to trim reads from the right end

type

basic:integer

description

Phred algorithm is used, which is more accurate than naive trimming.

default

10

preprocessing.min_length
label

Minimum read length

type

basic:integer

description

Reads shorter than minimum read length after trimming are discarded.

default

20

quantification.seq_bias
label

Perform sequence-specific bias correction

type

basic:boolean

default

True

quantification.gc_bias
label

Perform fragment GC bias correction.

type

basic:boolean

default

False

quantification.consensus_slack
label

Consensus slack

type

basic:decimal

description

The amount of slack allowed in the quasi-mapping consensus mechanism. Normally, a transcript must cover all hits to be considered for mapping. If this is set to a fraction, X, greater than 0 (and in [0,1)), then a transcript can fail to cover up to (100 * X)% of the hits before it is discounted as a mapping candidate. The default value of this option is 0.2 in selective alignment mode and 0 otherwise.

required

False

quantification.min_score_fraction
label

Minimum alignment score fraction

type

basic:decimal

description

The fraction of the optimal possible alignment score that a mapping must achieve in order to be considered valid - should be in (0,1].

default

0.65

quantification.range_factorization_bins
label

Range factorization bins

type

basic:integer

description

Factorizes the likelihood used in quantification by adopting a new notion of equivalence classes based on the conditional probabilities with which fragments are generated from different transcripts. This is a more fine-grained factorization than the normal rich equivalence classes. The default value (4) corresponds to the default used in Zakeri et al. 2017 and larger values imply a more fine-grained factorization. If range factorization is enabled, a common value to select for this parameter is 4. A value of 0 signifies the use of basic rich equivalence classes.

default

4

quantification.min_assigned_frag
label

Minimum number of assigned fragments

type

basic:integer

description

The minimum number of fragments that must be assigned to the transcriptome for quantification to proceed.

default

10

downsampling.n_reads
label

Number of reads

type

basic:integer

default

10000000

downsampling.seed
label

Seed

type

basic:integer

default

11

downsampling.fraction
label

Fraction

type

basic:decimal

description

Use the fraction of reads [0 - 1.0] from the original input file instead of the absolute number of reads. If set, this will override the “Number of reads” input parameter.

required

False

downsampling.two_pass
label

2-pass mode

type

basic:boolean

description

Enable two-pass mode when down-sampling. Two-pass mode is about twice as slow but uses much less memory.

default

False

Output results

BED file

data:bedupload-bed (basic:file  src, basic:string  species, basic:string  build)[Source: v1.4.1]

Import a BED file (.bed), a tab-delimited text file that defines a feature track. It can have any file extension, but .bed is recommended. The BED file format is described on the [UCSC Genome Bioinformatics web site](http://genome.ucsc.edu/FAQ/FAQformat#format1).

Input arguments

src
label

BED file

type

basic:file

description

Upload BED file annotation track. The first three required BED fields are chrom, chromStart and chromEnd.

required

True

validate_regex

\.(bed|narrowPeak)$
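The file-name check and the three required BED fields can be sketched as follows. This reuses the validate_regex shown above and assumes plain data lines (no track, browser or comment lines); the function names are hypothetical.

```python
import re

# The validate_regex of the upload form.
BED_NAME = re.compile(r"\.(bed|narrowPeak)$")


def is_valid_bed_name(filename):
    return BED_NAME.search(filename) is not None


def parse_bed_line(line):
    """Return (chrom, chromStart, chromEnd) from a BED data line."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 3:
        raise ValueError("BED records need at least chrom, chromStart, chromEnd")
    chrom, start, end = fields[0], int(fields[1]), int(fields[2])
    if start > end:
        raise ValueError("chromStart must not exceed chromEnd")
    return chrom, start, end


print(is_valid_bed_name("peaks.bed"))     # True
print(is_valid_bed_name("peaks.bed.gz"))  # False
print(parse_bed_line("chr1\t100\t200\tpeak1\n"))  # ('chr1', 100, 200)
```

Note that the pattern only accepts names ending in `.bed` or `.narrowPeak`, so a compressed `peaks.bed.gz` is rejected.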

species
label

Species

type

basic:string

description

Latin name of the species.

choices

  • Homo sapiens: Homo sapiens

  • Mus musculus: Mus musculus

  • Rattus norvegicus: Rattus norvegicus

  • Dictyostelium discoideum: Dictyostelium discoideum

  • Odocoileus virginianus texanus: Odocoileus virginianus texanus

  • Solanum tuberosum: Solanum tuberosum

build
label

Genome build

type

basic:string

Output results

bed
label

BED file

type

basic:file

bed_jbrowse
label

Bgzipped BED file for JBrowse

type

basic:file

tbi_jbrowse
label

BED file index for JBrowse

type

basic:file

species
label

Species

type

basic:string

build
label

Build

type

basic:string

BEDPE file

data:bedpe:upload-bedpe (basic:file  src, basic:string  species, basic:string  build)[Source: v1.2.1]

Upload BEDPE files.

Input arguments

src
label

Select BEDPE file to upload

type

basic:file

required

True

hidden

False

species
label

Species

type

basic:string

required

True

hidden

False

choices

  • Homo sapiens: Homo sapiens

  • Mus musculus: Mus musculus

  • Rattus norvegicus: Rattus norvegicus

  • Dictyostelium discoideum: Dictyostelium discoideum

  • Odocoileus virginianus texanus: Odocoileus virginianus texanus

  • Solanum tuberosum: Solanum tuberosum

build
label

Build

type

basic:string

required

True

hidden

False

Output results

bedpe
label

BEDPE file

type

basic:file

required

True

hidden

False

species
label

Species

type

basic:string

required

True

hidden

False

build
label

Build

type

basic:string

required

True

hidden

False

BWA ALN

data:alignment:bam:bwaalnalignment-bwa-aln (data:index:bwa  genome, data:reads:fastq  reads, basic:integer  q, basic:boolean  use_edit, basic:integer  edit_value, basic:decimal  fraction, basic:boolean  seeds, basic:integer  seed_length, basic:integer  seed_dist)[Source: v2.3.1]

Read aligner for mapping low-divergent sequences against a large reference genome. Designed for Illumina sequence reads up to 100bp.

Input arguments

genome
label

Reference genome

type

data:index:bwa

reads
label

Reads

type

data:reads:fastq

q
label

Quality threshold

type

basic:integer

description

Parameter for dynamic read trimming.

default

0

use_edit
label

Use maximum edit distance (excludes fraction of missing alignments)

type

basic:boolean

default

False

edit_value
label

Maximum edit distance

type

basic:integer

hidden

!use_edit

default

5

fraction
label

Fraction of missing alignments

type

basic:decimal

description

The fraction of missing alignments given a 2% uniform base error rate. The maximum edit distance is chosen automatically for each read length.

hidden

use_edit

default

0.04
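One way to turn the "fraction of missing alignments" into a per-read-length edit distance is to pick the smallest k whose error-tail probability falls below the fraction. This sketch assumes the number of errors per read is Poisson-distributed with mean read_length * 0.02; BWA's exact computation may differ, and the function name is hypothetical.

```python
import math


def max_edit_distance(read_length, fraction=0.04, error_rate=0.02):
    """Smallest k >= 1 such that P(more than k errors) < fraction.

    Assumes a Poisson error count with mean read_length * error_rate.
    """
    lam = read_length * error_rate
    term = math.exp(-lam)  # P(X = 0)
    cumulative = term
    for k in range(1, 1000):
        term *= lam / k  # P(X = k) from P(X = k - 1)
        cumulative += term
        if 1.0 - cumulative < fraction:
            return k
    raise ValueError("no edit distance satisfies the requested fraction")


print(max_edit_distance(36))   # 2
print(max_edit_distance(100))  # 5
```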

seeds
label

Use seeds

type

basic:boolean

default

False

seed_length
label

Seed length

type

basic:integer

description

Take the first X bases of the read as the seed. If X is larger than the query sequence, seeding is disabled. For long reads, this value is typically set between 25 and 35 when the seed maximum edit distance is 2.

hidden

!seeds

default

35

seed_dist
label

Seed maximum edit distance

type

basic:integer

hidden

!seeds

default

2

Output results

bam
label

Alignment file

type

basic:file

description

Position-sorted alignment

bai
label

Index BAI

type

basic:file

unmapped
label

Unmapped reads

type

basic:file

required

False

stats
label

Statistics

type

basic:file

bigwig
label

BigWig file

type

basic:file

required

False

species
label

Species

type

basic:string

build
label

Build

type

basic:string

BWA MEM

data:alignment:bam:bwamemalignment-bwa-mem (data:index:bwa  genome, data:reads:fastq  reads, basic:integer  seed_l, basic:integer  band_w, basic:decimal  re_seeding, basic:boolean  m, basic:integer  match, basic:integer  missmatch, basic:integer  gap_o, basic:integer  gap_e, basic:integer  clipping, basic:integer  unpaired_p, basic:boolean  report_all, basic:integer  report_tr)[Source: v3.3.2]

BWA MEM is a read aligner for mapping low-divergent sequences against a large reference genome. It is designed for longer sequences ranging from 70bp to 1Mbp. The algorithm works by seeding alignments with maximal exact matches (MEMs) and then extending seeds with the affine-gap Smith-Waterman algorithm (SW). See [here](http://bio-bwa.sourceforge.net/) for more information.

Input arguments

genome
label

Reference genome

type

data:index:bwa

reads
label

Reads

type

data:reads:fastq

seed_l
label

Minimum seed length

type

basic:integer

description

Minimum seed length. Matches shorter than minimum seed length will be missed. The alignment speed is usually insensitive to this value unless it significantly deviates from 20.

default

19

band_w
label

Band width

type

basic:integer

description

Gaps longer than this will not be found.

default

100

re_seeding
label

Re-seeding factor

type

basic:decimal

description

Trigger re-seeding for a MEM longer than minSeedLen*FACTOR. This is a key heuristic parameter for tuning the performance. Larger value yields fewer seeds, which leads to faster alignment speed but lower accuracy.

default

1.5
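The re-seeding trigger is a length threshold derived from two parameters. A one-function sketch of the rule as stated (a MEM longer than minSeedLen*FACTOR); the function name is hypothetical.

```python
def triggers_reseeding(mem_length, min_seed_len=19, factor=1.5):
    """True if a maximal exact match is long enough to trigger re-seeding."""
    return mem_length > min_seed_len * factor


# With the defaults the threshold is 19 * 1.5 = 28.5 bases.
print(triggers_reseeding(28))  # False
print(triggers_reseeding(29))  # True
```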

m
label

Mark shorter split hits as secondary

type

basic:boolean

description

Mark shorter split hits as secondary (for Picard compatibility)

default

False

scoring.match
label

Score of a match

type

basic:integer

default

1

scoring.missmatch
label

Mismatch penalty

type

basic:integer

default

4

scoring.gap_o
label

Gap open penalty

type

basic:integer

default

6

scoring.gap_e
label

Gap extension penalty

type

basic:integer

default

1

scoring.clipping
label

Clipping penalty

type

basic:integer

description

When extending an alignment, BWA-MEM tracks the best score that reaches the end of the query. If this score is larger than the best local alignment score minus the clipping penalty, clipping is not applied.

default

5
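The clipping decision reduces to one comparison. This sketch follows the BWA-MEM manual's description of the clipping penalty (skip clipping when the end-to-end score is larger than the best local score minus the penalty); the function name is hypothetical.

```python
def apply_clipping(best_local_score, end_to_end_score, clipping_penalty=5):
    """True if the alignment should be clipped rather than extended to the end."""
    # Clipping is skipped when end_to_end_score > best_local_score - penalty.
    return end_to_end_score <= best_local_score - clipping_penalty


print(apply_clipping(60, 57))  # False: reaching the end costs only 3 points
print(apply_clipping(60, 50))  # True: reaching the end costs 10 points
```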

scoring.unpaired_p
label

Penalty for an unpaired read pair

type

basic:integer

description

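Affinity to force pairing. An unpaired read pair is scored as scoreRead1+scoreRead2-Penalty; this score is compared with the score of the paired alignment to decide whether to force pairing. A larger value leads to more aggressive pairing.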

default

9
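The paired-versus-unpaired comparison can be sketched as follows. This is a simplification: BWA-MEM derives the insert-size penalty from the insert-size distribution and has further pairing logic, so here `insert_penalty` is a hypothetical input and the function name is hypothetical too.

```python
def force_pairing(pair_scores, single_scores, insert_penalty, unpaired_penalty=9):
    """Return True if the proper pair outscores the two unpaired hits.

    pair_scores: scores of read 1 and read 2 in the best proper pair.
    single_scores: scores of the best individual hits of read 1 and read 2.
    """
    paired = sum(pair_scores) - insert_penalty
    unpaired = sum(single_scores) - unpaired_penalty
    return paired > unpaired


print(force_pairing((70, 60), (70, 68), insert_penalty=3))  # False: 127 vs 129
print(force_pairing((70, 60), (70, 68), insert_penalty=3,
                    unpaired_penalty=17))                    # True: 127 vs 121
```

Raising the unpaired penalty makes the unpaired option less attractive, which is why a larger value pairs reads more aggressively.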

reporting.report_all
label

Report all found alignments

type

basic:boolean

description

Output all found alignments for single-end or unpaired paired-end reads. These alignments will be flagged as secondary alignments.

default

False

reporting.report_tr
label

Report threshold score

type

basic:integer

description

Do not output alignments with a score lower than this value. This option only affects output.

default

30

Output results

bam
label

Alignment file

type

basic:file

description

Position-sorted alignment

bai
label

Index BAI

type

basic:file

unmapped
label

Unmapped reads

type

basic:file

required

False

stats
label

Statistics

type

basic:file

bigwig
label

BigWig file

type

basic:file

required

False

species
label

Species

type

basic:string

build
label

Build

type

basic:string

BWA SW

data:alignment:bam:bwaswalignment-bwa-sw (data:index:bwa  genome, data:reads:fastq  reads, basic:integer  match, basic:integer  missmatch, basic:integer  gap_o, basic:integer  gap_e)[Source: v2.3.1]

Read aligner for mapping low-divergent sequences against a large reference genome. Designed for longer sequences ranging from 70bp to 1Mbp. The paired-end mode works only for reads from Illumina short-insert libraries.

Input arguments

genome
label

Reference genome

type

data:index:bwa

reads
label

Reads

type

data:reads:fastq

match
label

Score of a match

type

basic:integer

default

1

missmatch
label

Mismatch penalty

type

basic:integer

default

3

gap_o
label

Gap open penalty

type

basic:integer

default

5

gap_e
label

Gap extension penalty

type

basic:integer

default

2

Output results

bam
label

Alignment file

type

basic:file

description

Position-sorted alignment

bai
label

Index BAI

type

basic:file

unmapped
label

Unmapped reads

type

basic:file

required

False

stats
label

Statistics

type

basic:file

bigwig
label

BigWig file

type

basic:file

required

False

species
label

Species

type

basic:string

build
label

Build

type

basic:string

BWA genome index

data:index:bwa:bwa-index (data:seq:nucleotide  ref_seq)[Source: v1.1.1]

Create BWA genome index.

Input arguments

ref_seq
label

Reference sequence (nucleotide FASTA)

type

data:seq:nucleotide

required

True

hidden

False

Output results

index
label

BWA index

type

basic:dir

required

True

hidden

False

fastagz
label

FASTA file (compressed)

type

basic:file

required

True

hidden

False

fasta
label

FASTA file

type

basic:file

required

True

hidden

False

fai
label

FASTA file index

type

basic:file

required

True

hidden

False

species
label

Species

type

basic:string

required

True

hidden

False

build
label

Build

type

basic:string

required

True

hidden

False

Bam split

data:alignment:bam:primary:bam-split (data:alignment:bam  bam, data:sam:header  header, data:sam:header  header2)[Source: v0.6.1]

Split a hybrid BAM file into two BAM files.

Input arguments

bam
label

Hybrid alignment bam

type

data:alignment:bam

header
label

Primary header sam file (optional)

type

data:sam:header

description

If no header file is provided, the headers will be extracted from the hybrid alignment bam file.

required

False

header2
label

Secondary header sam file (optional)

type

data:sam:header

description

If no header file is provided, the headers will be extracted from the hybrid alignment bam file.

required

False

Output results

bam
label

Uploaded file

type

basic:file

bai
label

Index BAI

type

basic:file

bigwig
label

BigWig file

type

basic:file

required

False

species
label

Species

type

basic:string

build
label

Build

type

basic:string

Bamclipper

data:alignment:bam:bamclipped:bamclipper (data:alignment:bam  alignment, data:bedpe  bedpe, basic:boolean  skip)[Source: v1.2.1]

Remove primer sequences from BAM alignments by soft-clipping. This process is a wrapper for bamclipper, which can be found at https://github.com/tommyau/bamclipper.

Input arguments

alignment
label

Alignment BAM file

type

data:alignment:bam

required

True

hidden

False

bedpe
label

BEDPE file

type

data:bedpe

required

False

hidden

False

skip
label

Skip Bamclipper step

type

basic:boolean

description

Use this option to skip the Bamclipper step.

required

True

hidden

False

default

False

Output results

bam
label

Clipped BAM file

type

basic:file

required

True

hidden

False

bai
label

Index of clipped BAM file

type

basic:file

required

True

hidden

False

stats
label

Alignment statistics

type

basic:file

required

True

hidden

False

bigwig
label

BigWig file

type

basic:file

required

False

hidden

False

species
label

Species

type

basic:string

required

True

hidden

False

build
label

Build

type

basic:string

required

True

hidden

False

Bamliquidator

data:bam:plot:bamliquidator:bamliquidator (basic:string  analysis_type, list:data:alignment:bam  bam, basic:string  cell_type, basic:integer  bin_size, data:annotation:gtf  regions_gtf, data:bed  regions_bed, basic:integer  extension, basic:string  sense, basic:boolean  skip_plot, list:basic:string  black_list, basic:integer  threads)[Source: v0.3.1]

Set of tools for analyzing the density of short DNA sequence read alignments in the BAM file format.

Input arguments

analysis_type
label

Analysis type

type

basic:string

default

bin

choices

  • Bin mode: bin

  • Region mode: region

  • BED mode: bed

bam
label

BAM File

type

list:data:alignment:bam

cell_type
label

Cell type

type

basic:string

default

cell_type

bin_size
label

Bin size

type

basic:integer

description

Number of base pairs in each bin. The smaller the bin size, the longer the runtime and the larger the data files. Default is 100000.

required

False

hidden

analysis_type != 'bin'
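Smaller bins cost more because the number of bins grows inversely with bin size. The sketch below shows this. It assumes bins tile each chromosome from position 0 and that the last partial bin is kept, which is an assumption about how bamliquidator lays out bins. bin_count is a hypothetical helper. The example uses 249,250,621 bp, the length of chr1 in hg19.

```python
def bin_count(chrom_length, bin_size=100000):
    """Number of fixed-size bins needed to tile one chromosome.

    Assumption: bins start at position 0 and a trailing partial bin is kept.
    """
    if bin_size <= 0:
        raise ValueError("bin_size must be positive")
    # Ceiling division using integers only.
    return -(-chrom_length // bin_size)


print(bin_count(249_250_621))          # 2493 bins at the 100 kb default
print(bin_count(249_250_621, 10_000))  # 24926 bins at 10 kb
```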

regions_gtf
label

Region gff file / Annotation file (.gff|.gtf)

type

data:annotation:gtf

required

False

hidden

analysis_type != 'region'

regions_bed
label

Region bed file / Annotation file (.bed)

type

data:bed

required

False

hidden

analysis_type != 'bed'

extension
label

Extension

type

basic:integer

description

Extend reads by this number of bp.

default

200

sense
label

Mapping strand to gff file

type

basic:string

default

.

choices

  • Forward: +

  • Reverse: -

  • Both: .

skip_plot
label

Skip plot

type

basic:boolean

required

False

black_list
label

Black list

type

list:basic:string

description

One or more chromosome patterns to skip during bin liquidation. By default, any chromosome whose name contains one of the following substrings is skipped: chrUn, _random, Zv9_, _hap.

required

False
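The default black list works as a substring match on chromosome names. Below is a minimal sketch of that filtering rule. It assumes a plain `pattern in name` test. is_blacklisted is a hypothetical helper, not part of bamliquidator's interface.

```python
DEFAULT_BLACK_LIST = ["chrUn", "_random", "Zv9_", "_hap"]


def is_blacklisted(chrom, patterns=None):
    """Return True if the chromosome name contains any black-list substring."""
    if patterns is None:
        patterns = DEFAULT_BLACK_LIST
    return any(p in chrom for p in patterns)


chroms = ["chr1", "chrUn_gl000220", "chr17_ctg5_hap1", "chr1_gl000191_random", "chrX"]
print([c for c in chroms if not is_blacklisted(c)])  # ['chr1', 'chrX']
```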

threads
label

Threads

type

basic:integer

description

Number of threads to run concurrently during liquidation.

default

1

Output results

analysis_type
label

Analysis type

type

basic:string

hidden

True

output_dir
label

Output directory

type

basic:file

counts
label

Counts HDF5 file

type

basic:file

matrix
label

Matrix file

type

basic:file

required

False

hidden

analysis_type != 'region'

summary
label

Summary file

type

basic:file:html

required

False

hidden

analysis_type != 'bin'

Bamplot

data:bam:plot:bamplot:bamplot (basic:string  genome, data:annotation:gtf  input_gff, basic:string  input_region, list:data:alignment:bam  bam, basic:integer  stretch_input, basic:string  color, basic:string  sense, basic:integer  extension, basic:boolean  rpm, basic:string  yscale, list:basic:string  names, basic:string  plot, basic:string  title, basic:string  scale, list:data:bed  bed, basic:boolean  multi_page)[Source: v1.4.1]

Plot a single locus from a BAM file.

Input arguments

genome
label

Genome

type

basic:string

choices

  • HG19: HG19

  • HG18: HG18

  • MM8: MM8

  • MM9: MM9

  • MM10: MM10

  • RN6: RN6

  • RN4: RN4

input_gff
label

Region string

type

data:annotation:gtf

description

Enter .gff file.

required

False

input_region
label

Region string

type

basic:string

description

Enter a genomic region, e.g. chr1:+:1-1000.

required

False

bam
label

Bam

type

list:data:alignment:bam

description

BAM files to plot from.

required

False

stretch_input
label

Stretch-input

type

basic:integer

description

Stretch the input regions to a minimum length in bp, e.g. 10000 (for 10kb).

required

False

color
label

Color

type

basic:string

description

Enter a colon-separated list of RGB colors, e.g. 255,0,0:255,125,0. The default samples the rainbow.

default

255,0,0:255,125,0
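The color value is a colon-separated list of comma-separated RGB triplets. The sketch below splits such a string and checks it before it is passed on. parse_colors is a hypothetical helper, and the 0-255 range check is an assumption about what counts as valid input.

```python
def parse_colors(spec):
    """Parse a colon-separated list of R,G,B triplets, e.g. '255,0,0:255,125,0'."""
    colors = []
    for triplet in spec.split(":"):
        rgb = tuple(int(v) for v in triplet.split(","))
        # Each entry must be exactly three channel values in the 0-255 range.
        if len(rgb) != 3 or not all(0 <= v <= 255 for v in rgb):
            raise ValueError(f"invalid RGB triplet: {triplet!r}")
        colors.append(rgb)
    return colors


print(parse_colors("255,0,0:255,125,0"))  # [(255, 0, 0), (255, 125, 0)]
```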

sense
label

Sense

type

basic:string

description

Map to the forward, reverse or both strands. Default maps to both.

default

both

choices

  • Forward: forward

  • Reverse: reverse

  • Both: both

extension
label

Extension

type

basic:integer

description

Extend reads by n bp. Default value is 200 bp.

default

200

rpm
label

rpm

type

basic:boolean

description

Normalizes density to reads per million (rpm). Default is False.

required

False

yscale
label

y scale

type

basic:string

description

Choose either relative or uniform y axis scaling. Default is relative scaling.

default

relative

choices

  • relative: relative

  • uniform: uniform

names
label

Names

type

list:basic:string

description

Enter a comma-separated list of names for your BAM files.

required

False

plot
label

Single or multiple plot

type

basic:string

description

Choose either all lines on a single plot or multiple plots.

default

merge

choices

  • single: single

  • multiple: multiple

  • merge: merge

title
label

Title

type

basic:string

description

Specify a title for the output plot(s), default will be the coordinate region.

default

output

scale
label

Scale

type

basic:string

description

Enter a comma-separated list of multiplicative scaling factors for your BAM files. Default is none.

required

False

bed
label

Bed

type

list:data:bed

description

Add a space-delimited list of bed files to plot.

required

False

multi_page
label

Multi page

type

basic:boolean

description

If set, a new PDF is created for each region.

default

False

Output results

plot
label

region plot

type

basic:file

BaseQualityScoreRecalibrator

data:alignment:bam:bqsr:bqsr (data:alignment:bam  bam, data:seq:nucleotide  reference, list:data:variants:vcf  known_sites, data:bed  intervals, basic:string  read_group, basic:string  validation_stringency)[Source: v2.1.1]

A two-pass process of BaseRecalibrator and ApplyBQSR from GATK. See the GATK website for more information on BaseRecalibrator. Read groups can be modified with GATK’s AddOrReplaceReadGroups through the Replace read groups in BAM (read_group) input field.

Input arguments

bam
label

BAM file containing reads

type

data:alignment:bam

required

True

hidden

False

reference
label

Reference genome file

type

data:seq:nucleotide

required

True

hidden

False

known_sites
label

List of known sites of variation

type

list:data:variants:vcf

required

True

hidden

False

intervals
label

One or more genomic intervals over which to operate.

type

data:bed

description

This field is optional, but it can speed up the process by restricting calculations to specific genome regions.

required

False

hidden

False

read_group
label

Replace read groups in BAM

type

basic:string

description

Replace read groups in a BAM file. This argument enables the user to replace all read groups in the INPUT file with a single new read group and assign all reads to this read group in the OUTPUT BAM file. Addition or replacement is performed using Picard’s AddOrReplaceReadGroups tool. Input should take the form of -name=value delimited by a “;”, e.g. “-ID=1;-LB=GENIALIS;-PL=ILLUMINA;-PU=BARCODE;-SM=SAMPLENAME1”. See the tool’s documentation for more information on tag names. Note that PL, LB, PU and SM are required fields. See caveats of rewriting read groups in the documentation.

required

True

hidden

False

default

validation_stringency
label

Validation stringency

type

basic:string

description

Validation stringency for all SAM files read by this program. Setting stringency to SILENT can improve performance when processing a BAM file in which variable-length data (read, qualities, tags) do not otherwise need to be decoded. Default is STRICT. This setting is used in BaseRecalibrator and ApplyBQSR processes.

required

True

hidden

False

default

STRICT

choices

  • STRICT: STRICT

  • LENIENT: LENIENT

  • SILENT: SILENT
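The read_group field packs several -name=value pairs into one string. Below is a minimal sketch of parsing and validating it. It assumes the ";"-delimited format shown in the read_group description and the four required tags listed there. parse_read_group is a hypothetical helper, not the process's actual code.

```python
REQUIRED_TAGS = ("LB", "PL", "PU", "SM")


def parse_read_group(spec):
    """Parse '-ID=1;-LB=GENIALIS;...' into a dict of read-group tags."""
    tags = {}
    for item in spec.split(";"):
        item = item.strip()
        if not item:
            continue  # tolerate a trailing ';'
        name, sep, value = item.partition("=")
        if not sep or not name.startswith("-"):
            raise ValueError(f"expected -NAME=value, got {item!r}")
        tags[name[1:]] = value
    # Reject specifications that lack any of the required tags.
    missing = [t for t in REQUIRED_TAGS if t not in tags]
    if missing:
        raise ValueError(f"missing required read-group tags: {missing}")
    return tags


rg = parse_read_group("-ID=1;-LB=GENIALIS;-PL=ILLUMINA;-PU=BARCODE;-SM=SAMPLENAME1")
print(rg["SM"])  # SAMPLENAME1
```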

Output results

bam
label

Base quality score recalibrated BAM file

type

basic:file

required

True

hidden

False

bai
label

Index of base quality score recalibrated BAM file

type

basic:file

required

True

hidden

False

stats
label

Alignment statistics

type

basic:file

required

True

hidden

False

bigwig
label

BigWig file

type

basic:file

required

False

hidden

False

species
label

Species

type

basic:string

required

True

hidden

False

build
label

Build

type

basic:string

required

True

hidden

False

recal_table
label

Recalibration table

type

basic:file

required

True

hidden

False

BaseSpace file

data:file:basespace-file-import (basic:string  file_id, basic:secret  access_token_secret)[Source: v1.2.1]

Import a file from Illumina BaseSpace.

Input arguments

file_id
label

BaseSpace file ID

type

basic:string

access_token_secret
label

BaseSpace access token

type

basic:secret

description

BaseSpace access token secret handle needed to download the file.

Output results

file
label

File

type

basic:file

Bedtools bamtobed

data:bedpe:bedtools-bamtobed (data:alignment:bam  alignment)[Source: v1.1.1]

Takes a BAM file and calculates a normalization factor in BEDPE format. The BAM file is sorted with Samtools and then converted with Bedtools.

Input arguments

alignment
label

Alignment BAM file

type

data:alignment:bam

required

True

hidden

False

Output results

bedpe
label

BEDPE file

type

basic:file

required

True

hidden

False

species
label

Species

type

basic:string

required

True

hidden

False

build
label

Build

type

basic:string

required

True

hidden

False

Bisulfite conversion rate

data:wgbs:bsrate:bs-conversion-rate (data:alignment:bam:walt  mr, basic:boolean  skip, data:seq:nucleotide  sequence, basic:boolean  count_all, basic:integer  read_length, basic:decimal  max_mismatch, basic:boolean  a_rich)[Source: v1.1.1]

Estimate the bisulfite conversion rate in a control set. The conversion rate is estimated with the bsrate program included in [Methpipe](https://github.com/smithlabcode/methpipe).
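The control sequence is unmethylated, so every cytosine in it should read as T after complete conversion. The conversion rate can therefore be thought of as the fraction of cytosine calls that were converted. The sketch below computes that ratio. It assumes per-base counts have already been tallied. bsrate itself also applies filtering (for example the maximum fraction of mismatches and the count_all option) that is not modeled here. conversion_rate is a hypothetical helper.

```python
def conversion_rate(converted, unconverted):
    """Fraction of reference cytosines read as T (converted) in control reads.

    converted:   number of T calls at reference C positions
    unconverted: number of C calls at reference C positions
    """
    total = converted + unconverted
    if total == 0:
        raise ValueError("no cytosine positions observed")
    return converted / total


print(conversion_rate(9_950, 50))  # 0.995
```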

Input arguments

mr
label

Aligned reads from bisulfite sequencing

type

data:alignment:bam:walt

description

A bisulfite-specific alignment such as WALT is required, as the .mr file type is used. Duplicates should be removed to reduce any bias introduced by incomplete conversion on PCR duplicate reads.

required

True

hidden

False

skip
label

Skip Bisulfite conversion rate step

type

basic:boolean

description

Bisulfite conversion rate step can be skipped.

required

True

hidden

False

default

False

sequence
label

Unmethylated control sequence

type

data:seq:nucleotide

description

A separate unmethylated control sequence FASTA file is required to estimate the bisulfite conversion rate.

required

False

hidden

False

count_all
label

Count all cytosines including CpGs

type

basic:boolean

required

True

hidden

False

default

True

read_length
label

Average read length

type

basic:integer

required

True

hidden

False

default

150

max_mismatch
label

Maximum fraction of mismatches

type

basic:decimal

required

False

hidden

False

a_rich
label

Reads are A-rich

type

basic:boolean

required

True

hidden

False

default

False

Output results

report
label

Bisulfite conversion rate report

type

basic:file

required

True

hidden

False

Bowtie (Dicty)

data:alignment:bam:bowtie1:alignment-bowtie (data:index:bowtie  genome, data:reads:fastq  reads, basic:string  mode, basic:integer  m, basic:integer  l, basic:boolean  use_se, basic:integer  trim_5, basic:integer  trim_3, basic:integer  trim_nucl, basic:integer  trim_iter, basic:string  r)[Source: v2.3.1]

An ultrafast memory-efficient short read aligner.

Input arguments

genome
label

Reference genome

type

data:index:bowtie

reads
label

Reads

type

data:reads:fastq

mode
label

Alignment mode

type

basic:string

description

When the -n option is specified (which is the default), bowtie determines which alignments are valid according to the following policy, which is similar to Maq’s default policy:

  1. Alignments may have no more than N mismatches (where N is a number 0-3, set with -n) in the first L bases (where L is a number 5 or greater, set with -l) on the high-quality (left) end of the read. The first L bases are called the “seed”.

  2. The sum of the Phred quality values at all mismatched positions (not just in the seed) may not exceed E (set with -e). Where qualities are unavailable (e.g. if the reads are from a FASTA file), the Phred quality defaults to 40.

In -v mode, alignments may have no more than V mismatches, where V may be a number from 0 through 3 set using the -v option. Quality values are ignored. The -v option is mutually exclusive with the -n option.

default

-n

choices

  • Use qualities (-n): -n

  • Use mismatches (-v): -v
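The two -n rules and the -v rule can be written as validity checks on a single alignment. This is a simplified model. It assumes Bowtie's default -e value of 70, ignores any rounding Bowtie applies to quality values, and measures positions from the 5' (left) end. valid_n_mode and valid_v_mode are hypothetical helpers.

```python
def valid_n_mode(mismatch_positions, qualities=None, n=2, seed_len=28, max_qual_sum=70):
    """Check an alignment against the -n policy (simplified).

    mismatch_positions: 0-based offsets from the read's 5' (left) end.
    qualities: per-base Phred values, or None when reads come from FASTA.
    """
    # Rule 1: at most n mismatches inside the seed (the first seed_len bases).
    seed_mismatches = sum(1 for p in mismatch_positions if p < seed_len)
    if seed_mismatches > n:
        return False
    # Rule 2: summed quality over all mismatches must not exceed E (-e).
    # Missing qualities default to Phred 40.
    qual_sum = sum(40 if qualities is None else qualities[p] for p in mismatch_positions)
    return qual_sum <= max_qual_sum


def valid_v_mode(mismatch_positions, v=2):
    """-v policy: count mismatches anywhere in the read, ignoring qualities."""
    return len(mismatch_positions) <= v


print(valid_n_mode([3, 50], [30] * 60))  # True: 1 seed mismatch, quality sum 60
print(valid_n_mode([40, 50]))            # False: quality sum 80 exceeds 70
```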

m
label

Allowed mismatches

type

basic:integer

description

When used with “Use qualities (-n)”, this is the maximum number of mismatches permitted in the “seed”, i.e. the first L base pairs of the read (where L is set with -l/--seedlen). This may be 0, 1, 2 or 3, and the default is 2. When used with “Use mismatches (-v)”, alignments with at most <int> mismatches are reported.

default

2

l
label

Seed length (for -n only)

type

basic:integer

description

Only for “Use qualities (-n)”. Seed length (-l) is the number of bases on the high-quality end of the read to which the -n ceiling applies. The lowest permitted setting is 5 and the default is 28. bowtie is faster for larger values of -l.

default

28

use_se
label

Map as single-ended (for paired end reads only)

type

basic:boolean

description

If this option is selected, paired-end reads will be mapped as single-ended.

default

False

start_trimming.trim_5
label

Bases to trim from 5’

type

basic:integer

description

Number of bases to trim from the 5’ (left) end of each read before alignment.

default

0

start_trimming.trim_3
label

Bases to trim from 3’

type

basic:integer

description

Number of bases to trim from the 3’ (right) end of each read before alignment.

default

0

trimming.trim_nucl
label

Bases to trim

type

basic:integer

description

Number of bases to trim from 3’ end in each iteration.

default

2

trimming.trim_iter
label

Iterations

type

basic:integer

description

Number of iterations.

default

0

reporting.r
label

Reporting mode

type

basic:string

description

Report up to <int> valid alignments per read or pair (-k) (default: 1). Validity of alignments is determined by the alignment policy (combined effects of -n, -v, -l, and -e). If more than one valid alignment exists and the --best and --strata options are specified, then only those alignments belonging to the best alignment “stratum” will be reported. Bowtie is designed to be very fast for small -k, but it can become significantly slower as -k increases. If you would like to use Bowtie for larger values of -k, consider building an index with a denser suffix-array sample, i.e. specify a smaller -o/--offrate when invoking bowtie-build for the relevant index (see the Performance tuning section of the Bowtie manual for details).

default

-a -m 1 --best --strata

choices

  • Report unique alignments: -a -m 1 --best --strata

  • Report all alignments: -a --best

  • Report all alignments in the best stratum: -a --best --strata

Output results

bam
label

Alignment file

type

basic:file

description

Position sorted alignment

bai
label

Index BAI

type

basic:file

unmapped
label

Unmapped reads

type

basic:file

required

False

stats
label

Statistics

type

basic:file

bigwig
label

BigWig file

type

basic:file

required

False

species
label

Species

type

basic:string

build
label

Build

type

basic:string

Bowtie genome index

data:index:bowtie:bowtie-index (data:seq:nucleotide  ref_seq)[Source: v1.1.1]

Create Bowtie genome index.

Input arguments

ref_seq
label

Reference sequence (nucleotide FASTA)

type

data:seq:nucleotide

required

True

hidden

False

Output results

index
label

Bowtie index

type

basic:dir

required

True

hidden

False

fastagz
label

FASTA file (compressed)

type

basic:file

required

True

hidden

False

fasta
label

FASTA file

type

basic:file

required

True

hidden

False

fai
label

FASTA file index

type

basic:file

required

True

hidden

False

species
label

Species

type

basic:string

required

True

hidden

False

build
label

Build

type

basic:string

required

True

hidden

False

Bowtie2

data:alignment:bam:bowtie2:alignment-bowtie2 (data:index:bowtie2  genome, data:reads:fastq  reads, basic:string  mode, basic:string  speed, basic:boolean  use_se, basic:boolean  discordantly, basic:boolean  rep_se, basic:integer  minins, basic:integer  maxins, basic:boolean  no_overlap, basic:boolean  dovetail, basic:integer  N, basic:integer  L, basic:integer  gbar, basic:string  mp, basic:string  rdg, basic:string  rfg, basic:string  score_min, basic:integer  trim_5, basic:integer  trim_3, basic:integer  trim_iter, basic:integer  trim_nucl, basic:string  rep_mode, basic:integer  k_reports, basic:boolean  no_unal, basic:integer  bw_binsize, basic:integer  bw_timeout)[Source: v2.5.1]

Bowtie is an ultrafast, memory-efficient short read aligner. It aligns short DNA sequences (reads) to the human genome at a rate of over 25 million 35-bp reads per hour. Bowtie indexes the genome with a Burrows-Wheeler index to keep its memory footprint small, typically about 2.2 GB for the human genome (2.9 GB for paired-end). See [here](http://bowtie-bio.sourceforge.net/index.shtml) for more information.

Input arguments

genome
label

Reference genome

type

data:index:bowtie2

reads
label

Reads

type

data:reads:fastq

mode
label

Alignment mode

type

basic:string

description

End to end: Bowtie 2 requires that the entire read align from one end to the other, without any trimming (or “soft clipping”) of characters from either end. local: Bowtie 2 does not require that the entire read align from one end to the other. Rather, some characters may be omitted (“soft clipped”) from the ends in order to achieve the greatest possible alignment score.

default

--end-to-end

choices

  • end to end mode: --end-to-end

  • local: --local

speed
label

Speed vs. Sensitivity

type

basic:string

description

A quick setting for trading alignment speed against accuracy. This option is a shortcut for the following parameters.

For --end-to-end:

  • --very-fast: -D 5 -R 1 -N 0 -L 22 -i S,0,2.50

  • --fast: -D 10 -R 2 -N 0 -L 22 -i S,0,2.50

  • --sensitive: -D 15 -R 2 -N 0 -L 22 -i S,1,1.15 (default)

  • --very-sensitive: -D 20 -R 3 -N 0 -L 20 -i S,1,0.50

For --local:

  • --very-fast-local: -D 5 -R 1 -N 0 -L 25 -i S,1,2.00

  • --fast-local: -D 10 -R 2 -N 0 -L 22 -i S,1,1.75

  • --sensitive-local: -D 15 -R 2 -N 0 -L 20 -i S,1,0.75 (default)

  • --very-sensitive-local: -D 20 -R 3 -N 0 -L 20 -i S,1,0.50

required

False

choices

  • Very fast: --very-fast

  • Fast: --fast

  • Sensitive: --sensitive

  • Very sensitive: --very-sensitive

PE_options.use_se
label

Map as single-ended (for paired-end reads only)

type

basic:boolean

description

If this option is selected, paired-end reads will be mapped as single-ended and other paired-end options are ignored.

default

False

PE_options.discordantly
label

Report discordantly matched read

type

basic:boolean

description

If both mates have unique alignments, but the alignments do not match paired-end expectations (orientation and relative distance) then alignment will be reported. Useful for detecting structural variations.

default

True

PE_options.rep_se
label

Report single ended

type

basic:boolean

description

If a paired alignment cannot be found, Bowtie2 tries to find alignments for the individual mates.

default

True

PE_options.minins
label

Minimal distance

type

basic:integer

description

The minimum fragment length for valid paired-end alignments. 0 imposes no minimum.

default

0

PE_options.maxins
label

Maximal distance

type

basic:integer

description

The maximum fragment length for valid paired-end alignments.

default

500

PE_options.no_overlap
label

Not concordant when mates overlap

type

basic:boolean

description

When true, mates that overlap at all are not considered concordant. Default is false.

default

False

PE_options.dovetail
label

Dovetail

type

basic:boolean

description

If the mates “dovetail”, that is if one mate alignment extends past the beginning of the other such that the wrong mate begins upstream, consider that to be concordant. Default: mates cannot dovetail in a concordant alignment.

default

False

alignment_options.N
label

Number of mismatches allowed in seed alignment (N)

type

basic:integer

description

Sets the number of mismatches allowed in a seed alignment during multiseed alignment. Can be set to 0 or 1. Setting this higher makes alignment slower (often much slower) but increases sensitivity. Default: 0.

required

False

alignment_options.L
label

Length of seed substrings (L)

type

basic:integer

description

Sets the length of the seed substrings to align during multiseed alignment. Smaller values make alignment slower but more sensitive. Default: taken from the preset in use; the --sensitive preset is used by default for end-to-end alignment and --sensitive-local for local alignment. See documentation for details.

required

False

alignment_options.gbar
label

Disallow gaps within positions (gbar)

type

basic:integer

description

Disallow gaps within <int> positions of the beginning or end of the read. Default: 4.

required

False

alignment_options.mp
label

Maximal and minimal mismatch penalty (mp)

type

basic:string

description

Sets the maximum (MX) and minimum (MN) mismatch penalties, both integers. A number less than or equal to MX and greater than or equal to MN is subtracted from the alignment score for each position where a read character aligns to a reference character, the characters do not match, and neither is an N. If --ignore-quals is specified, the number subtracted equals MX. Otherwise, the number subtracted is MN + floor((MX-MN) * (MIN(Q, 40.0)/40.0)), where Q is the Phred quality value. Default for MX, MN: 6,2.

required

False
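The quality-aware penalty formula above can be evaluated directly. Below is a minimal sketch using the default MX,MN of 6,2. mismatch_penalty is a hypothetical helper written from the formula in the mp description, not Bowtie 2 source code.

```python
import math


def mismatch_penalty(q, mx=6, mn=2, ignore_quals=False):
    """Penalty for a mismatch at a position with Phred quality q (--mp MX,MN)."""
    if ignore_quals:
        return mx
    # Quality is capped at 40, then scales the penalty linearly from MN up to MX.
    return mn + math.floor((mx - mn) * (min(q, 40.0) / 40.0))


print([mismatch_penalty(q) for q in (0, 10, 20, 30, 40)])  # [2, 3, 4, 5, 6]
```

Low-quality mismatches are penalized less, because they are more likely to be sequencing errors than true differences.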

alignment_options.rdg
label

Set read gap open and extend penalties (rdg)

type

basic:string

description

Sets the read gap open (<int1>) and extend (<int2>) penalties. A read gap of length N gets a penalty of <int1> + N * <int2>. Default: 5,3.

required

False

alignment_options.rfg
label

Set reference gap open and extend penalties (rfg)

type

basic:string

description

Sets the reference gap open (<int1>) and extend (<int2>) penalties. A reference gap of length N gets a penalty of <int1> + N * <int2>. Default: 5,3.

required

False
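rdg and rfg use the same affine rule, so one helper covers either string. The sketch below assumes the "<int1>,<int2>" format and the 5,3 default given in the rdg and rfg descriptions. parse_pair and gap_penalty are hypothetical names.

```python
def parse_pair(spec):
    """Split an '<int1>,<int2>' option string such as '5,3' into two ints."""
    first, second = (int(v) for v in spec.split(","))
    return first, second


def gap_penalty(spec, length):
    """Penalty for a gap of the given length: <int1> + length * <int2>."""
    gap_open, gap_extend = parse_pair(spec)
    return gap_open + length * gap_extend


print([gap_penalty("5,3", n) for n in (1, 2, 3)])  # [8, 11, 14]
```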

alignment_options.score_min
label

Minimum alignment score needed for “valid” alignment (score_min)

type

basic:string

description

Sets a function governing the minimum alignment score needed for an alignment to be considered “valid” (i.e. good enough to report). This is a function of read length. For instance, specifying L,0,-0.6 sets the minimum-score function to f(x) = 0 + -0.6 * x, where x is the read length. The default in --end-to-end mode is L,-0.6,-0.6 and the default in --local mode is G,20,8.

required

False
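Bowtie 2 function strings have the form TYPE,A,B and define f(x) = A + B * g(x), where x is the read length and g depends on TYPE. Below is a minimal evaluator covering the linear (L), square-root (S) and natural-log (G) types. The constant type is left out. min_score is a hypothetical helper.

```python
import math

# g(x) for the function types handled here; x is the read length.
_TRANSFORMS = {
    "L": lambda x: x,  # linear
    "S": math.sqrt,    # square root
    "G": math.log,     # natural logarithm
}


def min_score(spec, read_length):
    """Evaluate a function string such as 'L,-0.6,-0.6' at a given read length."""
    kind, a, b = spec.split(",")
    if kind not in _TRANSFORMS:
        raise ValueError(f"unsupported function type: {kind!r}")
    return float(a) + float(b) * _TRANSFORMS[kind](read_length)


print(round(min_score("L,-0.6,-0.6", 100), 1))  # -60.6 (end-to-end default)
print(round(min_score("G,20,8", 100), 2))       # 56.84 (local default)
```

A 100-bp read aligned end-to-end must therefore score at least about -60.6 to be reported.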

start_trimming.trim_5
label

Bases to trim from 5’

type

basic:integer

description

Number of bases to trim from the 5’ (left) end of each read before alignment.

default

0

start_trimming.trim_3
label

Bases to trim from 3’

type

basic:integer

description

Number of bases to trim from the 3’ (right) end of each read before alignment.

default

0

trimming.trim_iter
label

Iterations

type

basic:integer

description

Number of iterations.

default

0

trimming.trim_nucl
label

Bases to trim

type

basic:integer

description

Number of bases to trim from 3’ end in each iteration.

default

2

reporting.rep_mode
label

Report mode

type

basic:string

description

Default mode: search for multiple alignments, report the best one; -k mode: search for one or more alignments, report each; -a mode: search for and report all alignments

default

def

choices

  • Default mode: def

  • -k mode: k

  • -a mode (very slow): a

reporting.k_reports
label

Number of reports (for -k mode only)

type

basic:integer

description

Searches for at most X distinct, valid alignments for each read. The search terminates when it can’t find more distinct valid alignments, or when it finds X, whichever happens first. default: 5

default

5

output_opts.no_unal
label

Suppress SAM records for unaligned reads

type

basic:boolean

description

When true, suppress SAM records for unaligned reads. Default is false.

default

False

misc_opts.bw_binsize
label

BigWig bin size

type

basic:integer

description

Size of the bins, in bases, for the output of the bigwig/bedgraph file. Default is 50.

default

50

misc_opts.bw_timeout
label

BigWig timeout (s)

type

basic:integer

description

Time, in seconds, before creation of BigWig file is stopped. Default is 480 seconds.

default

480

Output results

bam
label

Alignment file

type

basic:file

description

Position sorted alignment

bai
label

Index BAI

type

basic:file

unmapped
label

Unmapped reads

type

basic:file

required

False

stats
label

Statistics

type

basic:file

bigwig
label

BigWig file

type

basic:file

required

False

species
label

Species

type

basic:string

build
label

Build

type

basic:string

Bowtie2 genome index

data:index:bowtie2:bowtie2-index (data:seq:nucleotide  ref_seq)[Source: v1.1.1]

Create Bowtie2 genome index.

Input arguments

ref_seq
label

Reference sequence (nucleotide FASTA)

type

data:seq:nucleotide

required

True

hidden

False

Output results

index
label

Bowtie2 index

type

basic:dir

required

True

hidden

False

fastagz
label

FASTA file (compressed)

type

basic:file

required

True

hidden

False

fasta
label

FASTA file

type

basic:file

required

True

hidden

False

fai
label

FASTA file index

type

basic:file

required

True

hidden

False

species
label

Species

type

basic:string

required

True

hidden

False

build
label

Build

type

basic:string

required

True

hidden

False

Cell Ranger Count

data:scexpression:10x:cellranger-count (data:screads:10x:  reads, data:genomeindex:10x:  genome_index, basic:string  chemistry, basic:integer  trim_r1, basic:integer  trim_r2, basic:integer  expected_cells, basic:integer  force_cells)[Source: v1.1.1]

Perform gene expression analysis. Generate single cell feature counts for a single library. https://support.10xgenomics.com/single-cell-gene-expression/software/pipelines/latest/using/count

Input arguments

reads
label

10x reads data object

type

data:screads:10x:

required

True

hidden

False

genome_index
label

10x genome index data object

type

data:genomeindex:10x:

required

True

hidden

False

chemistry
label

Chemistry

type

basic:string

description

Assay configuration. By default the assay configuration is detected automatically, which is the recommended mode. You should only specify chemistry if there is an error in automatic detection.

required

False

hidden

False

default

auto

choices

  • auto: auto

  • threeprime: Single Cell 3'

  • fiveprime: Single Cell 5'

  • SC3Pv1: Single Cell 3' v1

  • SC3Pv2: Single Cell 3' v2

  • SC3Pv3: Single Cell 3' v3

  • C5P-PE: Single Cell 5' paired-end

  • SC5P-R2: Single Cell 5' R2-only

trim_r1
label

Trim R1

type

basic:integer

description

Hard-trim the input R1 sequence to this length. Note that the length includes the Barcode and UMI sequences, so do not set this below 26 for Single Cell 3’ v2 or Single Cell 5’. This and “Trim R2” are useful for determining the optimal read length for sequencing.

required

False

hidden

False

trim_r2
label

Trim R2

type

basic:integer

description

Hard-trim the input R2 sequence to this length.

required

False

hidden

False

expected_cells
label

Expected number of recovered cells

type

basic:integer

required

True

hidden

False

default

3000

force_cells
label

Force cell number

type

basic:integer

description

Force pipeline to use this number of cells, bypassing the cell detection algorithm. Use this if the number of cells estimated by Cell Ranger is not consistent with the barcode rank plot.

required

False

hidden

False

Output results

matrix_filtered
label

Matrix (filtered)

type

basic:file

required

True

hidden

False

genes_filtered
label

Genes (filtered)

type

basic:file

required

True

hidden

False

barcodes_filtered
label

Barcodes (filtered)

type

basic:file

required

True

hidden

False

matrix_raw
label

Matrix (raw)

type

basic:file

required

True

hidden

False

genes_raw
label

Genes (raw)

type

basic:file

required

True

hidden

False

barcodes_raw
label

Barcodes (raw)

type

basic:file

required

True

hidden

False

report
label

Report

type

basic:file:html

required

True

hidden

False

build
label

Build

type

basic:string

required

True

hidden

False

species
label

Species

type

basic:string

required

True

hidden

False

source
label

Gene ID source

type

basic:string

required

True

hidden

False

Cell Ranger Mkref

data:genomeindex:10x:cellranger-mkref (data:seq:nucleotide:  genome, data:annotation:gtf:  annotation)[Source: v2.1.1]

Reference preparation tool for 10x Genomics Cell Ranger. Build a Cell Ranger-compatible reference from genome FASTA and gene GTF files. https://support.10xgenomics.com/single-cell-gene-expression/software/pipelines/latest/advanced/references

Input arguments

genome
label

Reference genome

type

data:seq:nucleotide:

required

True

hidden

False

annotation
label

Annotation

type

data:annotation:gtf:

required

True

hidden

False

Output results

genome_index
label

Indexed genome

type

basic:dir

required

True

hidden

False

build
label

Build

type

basic:string

required

True

hidden

False

species
label

Species

type

basic:string

required

True

hidden

False

source
label

Gene ID source

type

basic:string

required

True

hidden

False

ChIP-Seq (Gene Score)

data:chipseq:genescore:chipseq-genescore (data:chipseq:peakscore  peakscore, basic:decimal  fdr, basic:decimal  pval, basic:decimal  logratio)[Source: v1.2.1]

ChIP-Seq analysis: Gene Score (BCM)

Input arguments

peakscore
label

PeakScore file

type

data:chipseq:peakscore

description

PeakScore file

fdr
label

FDR threshold

type

basic:decimal

description

FDR threshold value (default = 0.00005).

default

5e-05

pval
label

Pval threshold

type

basic:decimal

description

Pval threshold value (default = 0.00005).

default

5e-05

logratio
label

Log-ratio threshold

type

basic:decimal

description

Log-ratio threshold value (default = 2).

default

2.0

Output results

genescore
label

Gene Score

type

basic:file

ChIP-Seq (Peak Score)

data:chipseq:peakscorechipseq-peakscore (data:chipseq:callpeak:macs2  peaks, data:bed  bed)[Source: v2.2.1]

ChIP-Seq analysis: Peak Score (BCM)

Input arguments

peaks
label

MACS2 results

type

data:chipseq:callpeak:macs2

description

MACS2 results file (NarrowPeak)

bed
label

BED file

type

data:bed

Output results

peak_score
label

Peak Score

type

basic:file

ChIP-seq (MACS2)

data:chipseq:batch:macs2macs2-batch (list:data:alignment:bam  alignments, data:bed  promoter, basic:boolean  advanced, basic:boolean  tagalign, basic:integer  q_threshold, basic:integer  n_sub, basic:boolean  tn5, basic:integer  shift, basic:string  duplicates, basic:string  duplicates_prepeak, basic:decimal  qvalue, basic:decimal  pvalue, basic:decimal  pvalue_prepeak, basic:integer  cap_num, basic:integer  mfold_lower, basic:integer  mfold_upper, basic:integer  slocal, basic:integer  llocal, basic:integer  extsize, basic:integer  shift, basic:integer  band_width, basic:boolean  nolambda, basic:boolean  fix_bimodal, basic:boolean  nomodel, basic:boolean  nomodel_prepeak, basic:boolean  down_sample, basic:boolean  bedgraph, basic:boolean  spmr, basic:boolean  call_summits, basic:boolean  broad, basic:decimal  broad_cutoff, data:bed  blacklist, basic:boolean  calculate_enrichment, basic:integer  profile_window, basic:string  shift_size)[Source: v1.4.2]

This process runs MACS2 in batch mode. MACS2 analysis is triggered for pairs of samples as defined by treatment-background sample relations. If no sample relations are defined, each sample is analyzed individually. Model-based Analysis of ChIP-Seq (MACS 2.0) is used to identify transcription factor binding sites. MACS 2.0 captures the influence of genome complexity to evaluate the significance of enriched ChIP regions, and it improves the spatial resolution of binding sites by combining information on both sequencing tag position and orientation. It also has an option to link nearby peaks together in order to call broad peaks. See [here](https://github.com/taoliu/MACS/) for more information. In addition to peak calling, this process computes ChIP-Seq and ATAC-Seq QC metrics. The process returns a QC metrics report, a fragment length estimation, and a deduplicated tagAlign file. The QC report contains the ENCODE 3 proposed QC metrics: [NRF](https://www.encodeproject.org/data-standards/terms/), [PBC bottlenecking coefficients, NSC, and RSC](https://genome.ucsc.edu/ENCODE/qualityMetrics.html#chipSeq).
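The library-complexity metrics in the QC report follow the ENCODE definitions: NRF is the number of distinct read locations divided by the total number of reads, PBC1 is the number of locations with exactly one read divided by the number of distinct locations, and PBC2 is the number of locations with exactly one read divided by the number with exactly two. The sketch below computes them from a list of uniquely mapped read locations. Representing a location as a `(chromosome, 5' position, strand)` tuple is an assumption for illustration, not how the process stores reads.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

# A read location: (chromosome, 5' position, strand); hypothetical representation.
Location = Tuple[str, int, str]


def library_complexity(locations: Iterable[Location]) -> Dict[str, float]:
    """NRF, PBC1 and PBC2 from the locations of uniquely mapped reads."""
    reads_per_location = Counter(locations)
    total_reads = sum(reads_per_location.values())
    m_distinct = len(reads_per_location)                    # distinct locations
    locations_by_depth = Counter(reads_per_location.values())
    m1 = locations_by_depth.get(1, 0)                       # locations with exactly one read
    m2 = locations_by_depth.get(2, 0)                       # locations with exactly two reads
    nan = float("nan")                                      # returned when a denominator is zero
    return {
        "NRF": m_distinct / total_reads if total_reads else nan,
        "PBC1": m1 / m_distinct if m_distinct else nan,
        "PBC2": m1 / m2 if m2 else nan,
    }


reads = (
    [("chr1", 100, "+")]
    + [("chr1", 200, "+")] * 2
    + [("chr1", 300, "-")]
    + [("chr2", 50, "-")] * 3
)
print(library_complexity(reads))
```

Here 7 reads fall on 4 distinct locations, two of which hold a single read and one of which holds exactly two reads.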

Input arguments

alignments
label

Aligned reads

type

list:data:alignment:bam

description

Select multiple treatment/background samples.

promoter
label

Promoter regions BED file

type

data:bed

description

BED file containing promoter regions (for example, TSS ± 1000 bp). Required to compute the number of peaks and reads mapped to promoter regions.

required

False

advanced
label

Show advanced options

type

basic:boolean

description

Inspect and modify parameters.

default

False

tagalign
label

Use tagAlign files

type

basic:boolean

description

Use filtered tagAlign files as case (treatment) and control (background) samples. If the extsize parameter is not set, MACS is run using the fragment length estimated from the input.

hidden

!advanced

default

True

prepeakqc_settings.q_threshold
label

Quality filtering threshold

type

basic:integer

default

30

prepeakqc_settings.n_sub
label

Number of reads to subsample

type

basic:integer

default

15000000

prepeakqc_settings.tn5
label

Tn5 shifting

type

basic:boolean

description

Tn5 transposon shifting. Shift reads on “+” strand by 4 bp and reads on “-” strand by 5 bp.

default

False
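Tn5 shifting can be sketched on BED-style tags by moving the start of "+" tags 4 bp downstream and the end of "-" tags 5 bp upstream, which is the convention used for tagAlign files in ENCODE-style ATAC-seq pipelines. The `Tag` tuple and 0-based half-open coordinates are assumptions for illustration. That this process applies the shift exactly this way is also an assumption.

```python
from typing import NamedTuple


class Tag(NamedTuple):
    chrom: str
    start: int   # 0-based, inclusive (BED convention)
    end: int     # 0-based, exclusive
    strand: str


def tn5_shift(tag: Tag) -> Tag:
    """Shift '+' tags by +4 bp at the start and '-' tags by -5 bp at the end."""
    if tag.strand == "+":
        return tag._replace(start=tag.start + 4)
    if tag.strand == "-":
        return tag._replace(end=tag.end - 5)
    return tag  # unknown strand: leave untouched


print(tn5_shift(Tag("chr1", 100, 150, "+")))
print(tn5_shift(Tag("chr1", 100, 150, "-")))
```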

prepeakqc_settings.shift
label

User-defined cross-correlation peak strandshift

type

basic:integer

description

If defined, the SPP tool will not try to estimate the fragment length but will use the given value as the fragment length.

required

False
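To illustrate what the strandshift feeds into, the sketch below picks a fragment length from a strand cross-correlation profile (or uses a user-defined shift) and computes NSC and RSC as defined by ENCODE: NSC = CC(fragment) / min(CC) and RSC = (CC(fragment) - min(CC)) / (CC(read length) - min(CC)). The toy profile, the `exclude` window around the phantom peak, and the function names are hypothetical. SPP's real estimation is more elaborate than a plain arg-max.

```python
from typing import Dict, Optional, Tuple


def estimate_fragment_length(cc: Dict[int, float], read_len: int,
                             exclude: int = 10, user_shift: Optional[int] = None) -> int:
    """Highest-correlation strand shift, skipping shifts near the read length."""
    if user_shift is not None:
        return user_shift  # a user-defined strandshift bypasses estimation
    candidates = {s: v for s, v in cc.items() if abs(s - read_len) > exclude}
    return max(candidates, key=candidates.get)


def nsc_rsc(cc: Dict[int, float], fragment_len: int, read_len: int) -> Tuple[float, float]:
    cc_min = min(cc.values())
    cc_frag = cc[fragment_len]
    cc_read = cc[read_len]  # "phantom" peak at the read length
    nsc = cc_frag / cc_min
    rsc = (cc_frag - cc_min) / (cc_read - cc_min)
    return nsc, rsc


profile = {10: 0.12, 36: 0.30, 60: 0.14, 150: 0.25, 250: 0.10}  # shift -> correlation
frag = estimate_fragment_length(profile, read_len=36)
print(frag, nsc_rsc(profile, frag, read_len=36))
```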

settings.duplicates
label

Number of duplicates

type

basic:string

description

Controls how MACS handles duplicate tags at the exact same location, i.e. the same coordinates and the same strand. The ‘auto’ option makes MACS calculate the maximum number of tags at the exact same location based on a binomial distribution, using 1e-5 as the p-value cutoff, and the ‘all’ option keeps all tags. If an integer is given, at most this number of tags is kept at the same location. The default is to keep one tag at the same location.

required

False

hidden

tagalign

choices

  • 1: 1

  • auto: auto

  • all: all
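The integer and ‘all’ behaviours of the duplicates setting can be sketched as a per-location cap on tags. The tuple representation of a tag is an assumption, and the binomial ‘auto’ mode is not reproduced here.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Tuple

Tag = Tuple[str, int, str]  # (chromosome, 5' position, strand); hypothetical


def filter_duplicates(tags: Iterable[Tag], keep: Optional[int] = 1) -> List[Tag]:
    """Keep at most `keep` tags per location; keep=None mimics the 'all' option."""
    if keep is None:
        return list(tags)
    seen: Dict[Tag, int] = defaultdict(int)
    kept: List[Tag] = []
    for tag in tags:
        if seen[tag] < keep:  # still below the cap for this exact location
            kept.append(tag)
        seen[tag] += 1
    return kept


a, b, c = ("chr1", 10, "+"), ("chr1", 10, "-"), ("chr1", 20, "+")
print(filter_duplicates([a, a, a, b, b, c], keep=1))
```

Note that tags at the same coordinate but on opposite strands (`a` and `b`) are not duplicates of each other.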

settings.duplicates_prepeak
label

Number of duplicates

type

basic:string

description

Controls how MACS handles duplicate tags at the exact same location, i.e. the same coordinates and the same strand. The ‘auto’ option makes MACS calculate the maximum number of tags at the exact same location based on a binomial distribution, using 1e-5 as the p-value cutoff, and the ‘all’ option keeps all tags. If an integer is given, at most this number of tags is kept at the same location. The default is to keep one tag at the same location.

required

False

hidden

!tagalign

default

all

choices

  • 1: 1

  • auto: auto

  • all: all

settings.qvalue
label

Q-value cutoff

type

basic:decimal

description

The q-value (minimum FDR) cutoff to call significant regions. Q-values are calculated from p-values using Benjamini-Hochberg procedure.

required

False

disabled

settings.pvalue && settings.pvalue_prepeak
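As a generic illustration of the Benjamini-Hochberg procedure named above, the sketch below converts a list of p-values into q-values. It is the textbook procedure, not MACS2's internal implementation, which works on pileup positions.

```python
from typing import List, Sequence


def benjamini_hochberg(pvalues: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values (q-values), in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p-value
    qvalues = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so q-values are monotone in p.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        qvalues[i] = running_min
    return qvalues


q = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
print(q)  # regions with q below the cutoff would be called significant
```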

settings.pvalue
label

P-value cutoff

type

basic:decimal

description

The p-value cutoff. If specified, MACS2 will use p-value instead of q-value cutoff.

required

False

disabled

settings.qvalue

hidden

tagalign

settings.pvalue_prepeak
label

P-value cutoff

type

basic:decimal

description

The p-value cutoff. If specified, MACS2 will use p-value instead of q-value cutoff.

disabled

settings.qvalue

hidden

!tagalign || settings.qvalue

default

1e-05

settings.cap_num
label

Cap number of peaks by taking top N peaks

type

basic:integer

description

To keep all peaks, set the value to 0.

disabled

settings.broad

default

500000
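Capping can be sketched as "sort by a ranking score, keep the first N, and treat 0 as no cap". Which peak column the process actually ranks by is not stated here, so the `(name, score)` tuple is a hypothetical stand-in.

```python
from typing import List, Tuple

Peak = Tuple[str, float]  # (peak name, ranking score); hypothetical


def cap_peaks(peaks: List[Peak], cap_num: int) -> List[Peak]:
    """Keep the top `cap_num` peaks by score; 0 keeps every peak."""
    if cap_num == 0:
        return list(peaks)
    return sorted(peaks, key=lambda p: p[1], reverse=True)[:cap_num]


peaks = [("p1", 3.0), ("p2", 9.5), ("p3", 1.2), ("p4", 7.1)]
print(cap_peaks(peaks, 2))
```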

settings.mfold_lower
label

MFOLD range (lower limit)

type

basic:integer

description

This parameter is used to select regions within the MFOLD range of high-confidence enrichment ratio against background to build the model. The fold enrichment of the selected regions must be higher than the lower limit and lower than the upper limit. DEFAULT:10,30 means using all regions that are not too low (>10) and not too high (<30) to build the paired-peaks model. If MACS cannot find more than 100 regions to build the model, it will use the --extsize parameter to continue peak detection ONLY if --fix-bimodal is set.

required

False

settings.mfold_upper
label

MFOLD range (upper limit)

type

basic:integer

description

This parameter is used to select regions within the MFOLD range of high-confidence enrichment ratio against background to build the model. The fold enrichment of the selected regions must be higher than the lower limit and lower than the upper limit. DEFAULT:10,30 means using all regions that are not too low (>10) and not too high (<30) to build the paired-peaks model. If MACS cannot find more than 100 regions to build the model, it will use the --extsize parameter to continue peak detection ONLY if --fix-bimodal is set.

required

False
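The decision described for the MFOLD limits can be sketched as follows: keep regions whose fold enrichment lies strictly between the limits, build the model if more than 100 remain, otherwise fall back to `extsize` only when fix-bimodal is on. The region representation and function name are hypothetical. The real model building (pairing "+" and "-" peaks to estimate d) is not shown.

```python
from typing import List, Tuple, Union

Region = Tuple[str, float]  # (region name, fold enrichment); hypothetical


def choose_fragment_model(regions: List[Region], lower: float = 10, upper: float = 30,
                          fix_bimodal: bool = False, extsize: int = 200,
                          min_regions: int = 100) -> Tuple[str, Union[List[Region], int]]:
    selected = [r for r in regions if lower < r[1] < upper]
    if len(selected) > min_regions:
        return "model", selected        # enough regions to build the paired-peak model
    if fix_bimodal:
        return "extsize", extsize       # fall back to fixed-size extension
    raise RuntimeError("too few paired-peak regions to build the model")


many = [(f"r{i}", 15.0) for i in range(150)]
print(choose_fragment_model(many)[0])
```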

settings.slocal
label

Small local region

type

basic:integer

description

The slocal and llocal parameters control which two levels of regions are checked around the peak regions to calculate the maximum lambda as local lambda. By default, MACS considers 1000 bp for the small local region (--slocal) and 10000 bp for the large local region (--llocal), which captures the bias from long-range effects such as an open chromatin domain. You can tweak these according to your project. Remember that if the region is set too small, a sharp spike in the input data may kill a significant peak.

required

False

settings.llocal
label

Large local region

type

basic:integer

description

The slocal and llocal parameters control which two levels of regions are checked around the peak regions to calculate the maximum lambda as local lambda. By default, MACS considers 1000 bp for the small local region (--slocal) and 10000 bp for the large local region (--llocal), which captures the bias from long-range effects such as an open chromatin domain. You can tweak these according to your project. Remember that if the region is set too small, a sharp spike in the input data may kill a significant peak.

required

False
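The "maximum lambda" idea can be sketched as the largest of the genome-wide background rate and the rates observed in the small and large windows around a candidate peak, each scaled to a fragment of length d. This is a simplified picture under stated assumptions: the control counts per window are given directly, and MACS2's depth normalization between treatment and control is omitted. With `nolambda` the background rate is used alone.

```python
from typing import Dict


def local_lambda(control_total: int, genome_size: int, d: int,
                 window_counts: Dict[int, int], nolambda: bool = False) -> float:
    """Expected control tags per d-sized fragment at a candidate peak (simplified)."""
    lambda_bg = control_total * d / genome_size  # genome-wide background rate
    if nolambda:
        return lambda_bg
    # window width (bp) -> control tags in that window centred on the candidate
    window_lambdas = [count * d / width for width, count in window_counts.items()]
    return max([lambda_bg, *window_lambdas])


# slocal = 1000 bp with 20 control tags, llocal = 10000 bp with 150 control tags
print(local_lambda(10_000, 1_000_000, 200, {1000: 20, 10000: 150}))
```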

settings.extsize
label

extsize

type

basic:integer

description

When --nomodel is set, MACS uses this parameter to extend reads in the 5’->3’ direction to fixed-size fragments. For example, if the size of the binding region for your transcription factor is 200 bp and you want to bypass model building by MACS, this parameter can be set to 200. This option is only valid when --nomodel is set or when MACS fails to build the model and --fix-bimodal is on.

required

False

settings.shift
label

Shift

type

basic:integer

description

Note: this is NOT the legacy --shiftsize option, which has been replaced by --extsize! You can set an arbitrary shift in bp here. Please use discretion when setting it to a value other than the default (0). When --nomodel is set, MACS will use this value to move cutting ends (5’) and then apply --extsize from the 5’ to 3’ direction to extend them to fragments. When this value is negative, ends are moved in the 3’->5’ direction; otherwise, in the 5’->3’ direction. It is recommended to keep the default of 0 for ChIP-Seq datasets, or to use -1 * half of EXTSIZE together with the --extsize option for detecting enriched cutting loci, as in certain DNaseI-Seq datasets. Note that you cannot set values other than 0 if the format is BAMPE for paired-end data. Default is 0.

required

False
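How shift and extsize combine can be sketched for a single tag: move the 5' end by `shift` in the 5'->3' direction, then extend `extsize` bp in the same direction. Coordinates are 0-based half-open, and the function name is hypothetical. For a "-" tag, the 5' end is taken as the end coordinate, so moving 5'->3' means decreasing coordinates. The example with `shift = -extsize/2` shows how the fragment becomes centred on the cutting site.

```python
from typing import Tuple


def tag_to_fragment(pos5: int, strand: str, extsize: int, shift: int = 0) -> Tuple[int, int]:
    """Half-open [start, end) fragment for one tag.

    For '+' tags, pos5 is the start coordinate; for '-' tags it is the end
    coordinate, so moving 5'->3' on '-' means decreasing coordinates.
    """
    if strand == "+":
        start = pos5 + shift
        return start, start + extsize
    if strand == "-":
        end = pos5 - shift
        return end - extsize, end
    raise ValueError(f"unknown strand: {strand!r}")


print(tag_to_fragment(100, "+", extsize=200))             # plain extension
print(tag_to_fragment(100, "+", extsize=200, shift=-100))  # centred on the cut site
```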

settings.band_width
label

Band width

type

basic:integer

description

The band width used to scan the genome ONLY for model building. You can set this parameter to the sonication fragment size expected from the wet-lab experiment. The previous side effect on the peak detection process has been removed, so this parameter only affects model building.

required

False

settings.nolambda
label

Use background lambda as local lambda

type

basic:boolean

description

With this flag on, MACS will use the background lambda as local lambda. This means MACS will not consider the local bias at peak candidate regions.

default

False

settings.fix_bimodal
label

Turn on the auto paired-peak model process

type

basic:boolean

description

Turn on the auto paired-peak model process. If it is set and MACS fails to build the paired model, MACS falls back to the nomodel settings and uses the --extsize parameter to extend each tag. If it is not set, MACS terminates when building the paired-peak model fails.

default

False

settings.nomodel
label

Bypass building the shifting model

type

basic:boolean

description

When enabled, MACS bypasses building the shifting model.

hidden

tagalign

default

False

settings.nomodel_prepeak
label

Bypass building the shifting model

type

basic:boolean

description

When enabled, MACS bypasses building the shifting model.

hidden

!tagalign

default

True

settings.down_sample
label

Down-sample

type

basic:boolean

description

When set to true, random sampling is used to scale down the bigger sample. By default, MACS uses linear scaling. This option makes the results unstable and irreproducible, because different random reads are selected each time; in particular, the numbers (pileup, p-value, q-value) will change.

default

False

settings.bedgraph
label

Save fragment pileup and control lambda

type

basic:boolean

description

If this flag is on, MACS will store the fragment pileup, control lambda, -log10pvalue and -log10qvalue scores in bedGraph files. The bedGraph files are stored in the current directory and named NAME+’_treat_pileup.bdg’ for treatment data, NAME+’_control_lambda.bdg’ for local lambda values from control, NAME+’_treat_pvalue.bdg’ for Poisson p-value scores (in -log10(pvalue) form), and NAME+’_treat_qvalue.bdg’ for q-value scores from the Benjamini-Hochberg-Yekutieli procedure.

default

True

settings.spmr
label

Save signal per million reads for fragment pileup profiles

type

basic:boolean

disabled

settings.bedgraph === false

default

True

settings.call_summits
label

Call summits

type

basic:boolean

description

MACS will reanalyze the shape of the signal profile (p- or q-score, depending on the cutoff setting) to deconvolve subpeaks within each peak called by the general procedure. This is highly recommended for detecting adjacent binding events. When used, the output subpeaks of a big peak region will have the same peak boundaries but different scores and peak summit positions.

default

False

settings.broad
label

Composite broad regions

type

basic:boolean

description

When this flag is on, MACS will try to composite broad regions in BED12 (a gene-model-like format) by putting nearby highly enriched regions into a broad region with a loose cutoff. The broad region is controlled by another cutoff set through --broad-cutoff. The maximum length of a broad region is 4 times d, the fragment size estimated by MACS.

disabled

settings.call_summits === true

default

False

settings.broad_cutoff
label

Broad cutoff

type

basic:decimal

description

Cutoff for broad regions. This option is not available unless --broad is set. If -p is set, this is a p-value cutoff; otherwise, it is a q-value cutoff. DEFAULT = 0.1

required

False

disabled

settings.call_summits === true || settings.broad !== true

chipqc_settings.blacklist
label

Blacklist regions

type

data:bed

description

BED file containing genomic regions that should be excluded from the analysis.

required

False

chipqc_settings.calculate_enrichment
label

Calculate enrichment

type

basic:boolean

description

Calculate enrichment of signal in known genomic annotation. By default, the annotation is provided by the TranscriptDB package specified by the genome build, which should match one of the supported annotations (hg19, hg38, hg18, mm10, mm9, rn4, ce6, dm3). If the annotation is not supported, the analysis is skipped.

default

False

chipqc_settings.profile_window
label

Window size

type

basic:integer

description

An integer indicating the width of the window used for peak profiles. Peaks will be centered on their summits and include half of the window size upstream and half downstream of this point.

default

400
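The profile window can be sketched as an interval of `window` bp centred on the peak summit. The clamp at coordinate 0 and the integer halving (an odd window loses one base) are assumptions of this sketch, not documented behaviour.

```python
from typing import Tuple


def profile_window(summit: int, window: int = 400) -> Tuple[int, int]:
    """Interval of `window` bp centred on a peak summit (clamped at 0)."""
    half = window // 2
    return max(0, summit - half), summit + half


print(profile_window(1000))  # half the window upstream, half downstream
```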

chipqc_settings.shift_size
label

Shift size

type

basic:string

description

Vector of values to try when computing optimal shift sizes. It should be specified as a vector of consecutive numbers in the form start:end.

default

1:300
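The `start:end` form mirrors R's colon operator, which includes both endpoints. The sketch below parses such a string into a Python range. Rejecting descending ranges is a choice made in this sketch only.

```python
def parse_shift_size(spec: str) -> range:
    """Parse an R-style 'start:end' string into an inclusive range of shifts."""
    start_text, end_text = spec.split(":")
    start, end = int(start_text), int(end_text)
    if start > end:
        raise ValueError(f"expected start <= end, got {spec!r}")
    return range(start, end + 1)  # R's start:end includes both ends


shifts = parse_shift_size("1:300")
print(shifts[0], shifts[-1], len(shifts))
```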

Output results

ChIP-seq (MACS2-ROSE2)

data:chipseq:batch:macs2macs2-rose2-batch (list:data:alignment:bam  alignments, data:bed  promoter, basic:boolean  advanced, basic:boolean  tagalign, basic:integer  q_threshold, basic:integer  n_sub, basic:boolean  tn5, basic:integer  shift, basic:string  duplicates, basic:string  duplicates_prepeak, basic:decimal  qvalue, basic:decimal  pvalue, basic:decimal  pvalue_prepeak, basic:integer  cap_num, basic:integer  mfold_lower, basic:integer  mfold_upper, basic:integer  slocal, basic:integer  llocal, basic:integer  extsize, basic:integer  shift, basic:integer  band_width, basic:boolean  nolambda, basic:boolean  fix_bimodal, basic:boolean  nomodel, basic:boolean  nomodel_prepeak, basic:boolean  down_sample, basic:boolean  bedgraph, basic:boolean  spmr, basic:boolean  call_summits, basic:boolean  broad, basic:decimal  broad_cutoff, basic:boolean  use_filtered_bam, basic:integer  tss, basic:integer  stitch, data:bed  mask, data:bed  blacklist, basic:boolean  calculate_enrichment, basic:integer  profile_window, basic:string  shift_size)[Source: v1.4.2]

This process runs MACS2 in batch mode. MACS2 analysis is triggered for pairs of samples as defined by treatment-background sample relations. If no sample relations are defined, each sample is analyzed individually. Model-based Analysis of ChIP-Seq (MACS 2.0) is used to identify transcription factor binding sites. MACS 2.0 captures the influence of genome complexity to evaluate the significance of enriched ChIP regions, and it improves the spatial resolution of binding sites by combining information on both sequencing tag position and orientation. It also has an option to link nearby peaks together in order to call broad peaks. See [here](https://github.com/taoliu/MACS/) for more information. In addition to peak calling, this process computes ChIP-Seq and ATAC-Seq QC metrics. The process returns a QC metrics report, a fragment length estimation, and a deduplicated tagAlign file. The QC report contains the ENCODE 3 proposed QC metrics: [NRF](https://www.encodeproject.org/data-standards/terms/), [PBC bottlenecking coefficients, NSC, and RSC](https://genome.ucsc.edu/ENCODE/qualityMetrics.html#chipSeq). To identify super-enhancers, this process uses the Rank Ordering of Super-Enhancers algorithm (ROSE2). ROSE2 takes the peaks called by RSEG for acetylation and calculates the distances between them to judge whether they can be considered super-enhancers. The ranked values can be plotted, and super-enhancers can be assigned by locating the inflection point in the resulting graph. ROSE2 can also be used with data calculated by MACS. See [here](http://younglab.wi.mit.edu/super_enhancer_code.html) for more information.
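The inflection-point step can be sketched after the approach in the original ROSE R script: sort the region signals, draw a line whose slope is (max - min) / n, and find the rank where a line with that slope through the curve has the fewest points on or below it. Regions with signal above that point's value are called super-enhancers. This sketch replaces R's `optimize` with an exhaustive search, which is an assumption, and the toy signal values are hypothetical.

```python
from typing import Sequence


def rose_cutoff(signal: Sequence[float]) -> float:
    """Signal value at the tangent point of the ranked-signal curve."""
    if not signal:
        raise ValueError("no regions to rank")
    v = sorted(max(0.0, s) for s in signal)  # negative signal is set to 0
    n = len(v)
    slope = (v[-1] - v[0]) / n

    def points_below(x: int) -> int:  # x is a 1-based rank, as in R
        intercept = v[x - 1] - slope * x
        return sum(1 for i, y in enumerate(v, start=1) if y <= slope * i + intercept)

    best_x = min(range(1, n + 1), key=points_below)
    return v[best_x - 1]


signal = [1, 100, 1, 1, 50, 1, 1, 1, 2, 1, 1]
cutoff = rose_cutoff(signal)
print(cutoff, [s for s in signal if s > cutoff])
```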

Input arguments

alignments
label

Aligned reads

type

list:data:alignment:bam

description

Select multiple treatment/background samples.

promoter
label

Promoter regions BED file

type

data:bed

description

BED file containing promoter regions (for example, TSS ± 1000 bp). Required to compute the number of peaks and reads mapped to promoter regions.

required

False

advanced
label

Show advanced options

type

basic:boolean

description

Inspect and modify parameters.

default

False

tagalign
label

Use tagAlign files

type

basic:boolean

description

Use filtered tagAlign files as case (treatment) and control (background) samples. If the extsize parameter is not set, MACS is run using the fragment length estimated from the input.

hidden

!advanced

default

True

prepeakqc_settings.q_threshold
label

Quality filtering threshold

type

basic:integer

default

30

prepeakqc_settings.n_sub
label

Number of reads to subsample

type

basic:integer

default

15000000

prepeakqc_settings.tn5
label

Tn5 shifting

type

basic:boolean

description

Tn5 transposon shifting. Shift reads on “+” strand by 4 bp and reads on “-” strand by 5 bp.

default

False

prepeakqc_settings.shift
label

User-defined cross-correlation peak strandshift

type

basic:integer

description

If defined, the SPP tool will not try to estimate the fragment length but will use the given value as the fragment length.

required

False

settings.duplicates
label

Number of duplicates

type

basic:string

description

Controls how MACS handles duplicate tags at the exact same location, i.e. the same coordinates and the same strand. The ‘auto’ option makes MACS calculate the maximum number of tags at the exact same location based on a binomial distribution, using 1e-5 as the p-value cutoff, and the ‘all’ option keeps all tags. If an integer is given, at most this number of tags is kept at the same location. The default is to keep one tag at the same location.

required

False

hidden

tagalign

choices

  • 1: 1

  • auto: auto

  • all: all

settings.duplicates_prepeak
label

Number of duplicates

type

basic:string

description

Controls how MACS handles duplicate tags at the exact same location, i.e. the same coordinates and the same strand. The ‘auto’ option makes MACS calculate the maximum number of tags at the exact same location based on a binomial distribution, using 1e-5 as the p-value cutoff, and the ‘all’ option keeps all tags. If an integer is given, at most this number of tags is kept at the same location. The default is to keep one tag at the same location.

required

False

hidden

!tagalign

default

all

choices

  • 1: 1

  • auto: auto

  • all: all

settings.qvalue
label

Q-value cutoff

type

basic:decimal

description

The q-value (minimum FDR) cutoff to call significant regions. Q-values are calculated from p-values using Benjamini-Hochberg procedure.

required

False

disabled

settings.pvalue && settings.pvalue_prepeak

settings.pvalue
label

P-value cutoff

type

basic:decimal

description

The p-value cutoff. If specified, MACS2 will use p-value instead of q-value cutoff.

required

False

disabled

settings.qvalue

hidden

tagalign

settings.pvalue_prepeak
label

P-value cutoff

type

basic:decimal

description

The p-value cutoff. If specified, MACS2 will use p-value instead of q-value cutoff.

disabled

settings.qvalue

hidden

!tagalign || settings.qvalue

default

1e-05

settings.cap_num
label

Cap number of peaks by taking top N peaks

type

basic:integer

description

To keep all peaks, set the value to 0.

disabled

settings.broad

default

500000

settings.mfold_lower
label

MFOLD range (lower limit)

type

basic:integer

description

This parameter is used to select regions within the MFOLD range of high-confidence enrichment ratio against background to build the model. The fold enrichment of the selected regions must be higher than the lower limit and lower than the upper limit. DEFAULT:10,30 means using all regions that are not too low (>10) and not too high (<30) to build the paired-peaks model. If MACS cannot find more than 100 regions to build the model, it will use the --extsize parameter to continue peak detection ONLY if --fix-bimodal is set.

required

False

settings.mfold_upper
label

MFOLD range (upper limit)

type

basic:integer

description

This parameter is used to select regions within the MFOLD range of high-confidence enrichment ratio against background to build the model. The fold enrichment of the selected regions must be higher than the lower limit and lower than the upper limit. DEFAULT:10,30 means using all regions that are not too low (>10) and not too high (<30) to build the paired-peaks model. If MACS cannot find more than 100 regions to build the model, it will use the --extsize parameter to continue peak detection ONLY if --fix-bimodal is set.

required

False

settings.slocal
label

Small local region

type

basic:integer

description

The slocal and llocal parameters control which two levels of regions are checked around the peak regions to calculate the maximum lambda as local lambda. By default, MACS considers 1000 bp for the small local region (--slocal) and 10000 bp for the large local region (--llocal), which captures the bias from long-range effects such as an open chromatin domain. You can tweak these according to your project. Remember that if the region is set too small, a sharp spike in the input data may kill a significant peak.

required

False

settings.llocal
label

Large local region

type

basic:integer

description

The slocal and llocal parameters control which two levels of regions are checked around the peak regions to calculate the maximum lambda as local lambda. By default, MACS considers 1000 bp for the small local region (--slocal) and 10000 bp for the large local region (--llocal), which captures the bias from long-range effects such as an open chromatin domain. You can tweak these according to your project. Remember that if the region is set too small, a sharp spike in the input data may kill a significant peak.

required

False

settings.extsize
label

extsize

type

basic:integer

description

When --nomodel is set, MACS uses this parameter to extend reads in the 5’->3’ direction to fixed-size fragments. For example, if the size of the binding region for your transcription factor is 200 bp and you want to bypass model building by MACS, this parameter can be set to 200. This option is only valid when --nomodel is set or when MACS fails to build the model and --fix-bimodal is on.

required

False

settings.shift
label

Shift

type

basic:integer

description

Note: this is NOT the legacy --shiftsize option, which has been replaced by --extsize! You can set an arbitrary shift in bp here. Please use discretion when setting it to a value other than the default (0). When --nomodel is set, MACS will use this value to move cutting ends (5’) and then apply --extsize from the 5’ to 3’ direction to extend them to fragments. When this value is negative, ends are moved in the 3’->5’ direction; otherwise, in the 5’->3’ direction. It is recommended to keep the default of 0 for ChIP-Seq datasets, or to use -1 * half of EXTSIZE together with the --extsize option for detecting enriched cutting loci, as in certain DNaseI-Seq datasets. Note that you cannot set values other than 0 if the format is BAMPE for paired-end data. Default is 0.

required

False

settings.band_width
label

Band width

type

basic:integer

description

The band width used to scan the genome ONLY for model building. You can set this parameter to the sonication fragment size expected from the wet-lab experiment. The previous side effect on the peak detection process has been removed, so this parameter only affects model building.

required

False

settings.nolambda
label

Use background lambda as local lambda

type

basic:boolean

description

With this flag on, MACS will use the background lambda as local lambda. This means MACS will not consider the local bias at peak candidate regions.

default

False

settings.fix_bimodal
label

Turn on the auto paired-peak model process

type

basic:boolean

description

Turn on the auto paired-peak model process. If it is set and MACS fails to build the paired model, MACS falls back to the nomodel settings and uses the --extsize parameter to extend each tag. If it is not set, MACS terminates when building the paired-peak model fails.

default

False

settings.nomodel
label

Bypass building the shifting model

type

basic:boolean

description

When enabled, MACS bypasses building the shifting model.

hidden

tagalign

default

False

settings.nomodel_prepeak
label

Bypass building the shifting model

type

basic:boolean

description

When enabled, MACS bypasses building the shifting model.

hidden

!tagalign

default

True

settings.down_sample
label

Down-sample

type

basic:boolean

description

When set to true, random sampling is used to scale down the bigger sample. By default, MACS uses linear scaling. This option makes the results unstable and irreproducible, because different random reads are selected each time; in particular, the numbers (pileup, p-value, q-value) will change.

default

False

settings.bedgraph
label

Save fragment pileup and control lambda

type

basic:boolean

description

If this flag is on, MACS will store the fragment pileup, control lambda, -log10pvalue and -log10qvalue scores in bedGraph files. The bedGraph files are stored in the current directory and named NAME+’_treat_pileup.bdg’ for treatment data, NAME+’_control_lambda.bdg’ for local lambda values from control, NAME+’_treat_pvalue.bdg’ for Poisson p-value scores (in -log10(pvalue) form), and NAME+’_treat_qvalue.bdg’ for q-value scores from the Benjamini-Hochberg-Yekutieli procedure.

default

True

settings.spmr
label

Save signal per million reads for fragment pileup profiles

type

basic:boolean

disabled

settings.bedgraph === false

default

True

settings.call_summits
label

Call summits

type

basic:boolean

description

MACS will reanalyze the shape of the signal profile (p- or q-score, depending on the cutoff setting) to deconvolve subpeaks within each peak called by the general procedure. This is highly recommended for detecting adjacent binding events. When used, the output subpeaks of a big peak region will have the same peak boundaries but different scores and peak summit positions.

default

False

settings.broad
label

Composite broad regions

type

basic:boolean

description

When this flag is on, MACS will try to composite broad regions in BED12 (a gene-model-like format) by putting nearby highly enriched regions into a broad region with a loose cutoff. The broad region is controlled by another cutoff set through --broad-cutoff. The maximum length of a broad region is 4 times d, the fragment size estimated by MACS.

disabled

settings.call_summits === true

default

False

settings.broad_cutoff
label

Broad cutoff

type

basic:decimal

description

Cutoff for broad regions. This option is not available unless --broad is set. If -p is set, this is a p-value cutoff; otherwise, it is a q-value cutoff. DEFAULT = 0.1

required

False

disabled

settings.call_summits === true || settings.broad !== true

rose_settings.use_filtered_bam
label

Use Filtered BAM File

type

basic:boolean

description

Use filtered BAM file from a MACS2 object to rank enhancers by.

default

True

rose_settings.tss
label

TSS exclusion

type

basic:integer

description

Enter a distance from the TSS to exclude. 0 = no TSS exclusion.

default

0

rose_settings.stitch
label

Stitch

type

basic:integer

description

Enter a maximum linking distance for stitching. If not given, the optimal stitching parameter will be determined automatically.

required

False
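Stitching can be sketched as merging sorted regions on the same chromosome whenever the gap to the previous region is at most the linking distance. The 12500 bp default used here is ROSE's conventional stitching window and is only an illustrative choice. The automatic optimization performed when this parameter is left empty is not reproduced. The tuple layout is hypothetical.

```python
from typing import Iterable, List, Tuple

Region = Tuple[str, int, int]  # (chromosome, start, end); hypothetical


def stitch_regions(regions: Iterable[Region], stitch: int = 12500) -> List[Region]:
    """Merge regions on the same chromosome separated by at most `stitch` bp."""
    merged: List[Region] = []
    for chrom, start, end in sorted(regions):
        if merged and merged[-1][0] == chrom and start - merged[-1][2] <= stitch:
            prev_chrom, prev_start, prev_end = merged[-1]
            merged[-1] = (chrom, prev_start, max(prev_end, end))  # extend previous region
        else:
            merged.append((chrom, start, end))
    return merged


peaks = [("chr1", 5000, 5100), ("chr1", 0, 100), ("chr1", 30000, 30100), ("chr2", 0, 10)]
print(stitch_regions(peaks))
```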

rose_settings.mask
label

Masking BED file

type

data:bed

description

Mask a set of regions from analysis. Provide a BED of masking regions.

required

False

chipqc_settings.blacklist
label

Blacklist regions

type

data:bed

description

BED file containing genomic regions that should be excluded from the analysis.

required

False

chipqc_settings.calculate_enrichment
label

Calculate enrichment

type

basic:boolean

description

Calculate enrichment of signal in known genomic annotation. By default, the annotation is provided by the TranscriptDB package specified by the genome build, which should match one of the supported annotations (hg19, hg38, hg18, mm10, mm9, rn4, ce6, dm3). If the annotation is not supported, the analysis is skipped.

default

False

chipqc_settings.profile_window
label

Window size

type

basic:integer

description

An integer indicating the width of the window used for peak profiles. Peaks will be centered on their summits and include half of the window size upstream and half downstream of this point.

default

400

chipqc_settings.shift_size
label

Shift size

type

basic:string

description

Vector of values to try when computing optimal shift sizes. It should be specified as a vector of consecutive numbers in the form start:end.

default

1:300

Output results

Chemical Mutagenesis

data:workflow:chemutworkflow-chemut (basic:string  analysis_type, data:seq:nucleotide  genome, list:data:alignment:bam  parental_strains, list:data:alignment:bam  mutant_strains, basic:boolean  advanced, basic:boolean  br_and_ind_ra, basic:boolean  dbsnp, data:variants:vcf  known_sites, list:data:variants:vcf  known_indels, basic:integer  stand_emit_conf, basic:integer  stand_call_conf, basic:boolean  rf, basic:boolean  advanced, basic:integer  read_depth)[Source: v1.0.2]

Input arguments

analysis_type
label

Analysis type

type

basic:string

description

Choice of analysis type. Use the “SNV” or “INDEL” options to run the GATK analysis only on the haploid portion of the dicty genome. Choose SNV_CHR2 or INDEL_CHR2 to run the analysis only on the diploid portion of CHR2 (-ploidy 2 -L chr2:2263132-3015703).

default

snv

choices

  • SNV: snv

  • INDEL: indel

  • SNV_CHR2: snv_chr2

  • INDEL_CHR2: indel_chr2
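The choice values can be read as two independent switches: the variant class (SNV or INDEL) and whether the run is restricted to the diploid CHR2 region. The sketch below maps each value to those two pieces. Only the `-ploidy 2 -L chr2:2263132-3015703` arguments come from the description above. The arguments used for the haploid runs are not documented here, so none are added, and the function name is hypothetical.

```python
from typing import List, Tuple

# Ploidy and interval stated in the documentation for the *_CHR2 options.
CHR2_DIPLOID_ARGS = ["-ploidy", "2", "-L", "chr2:2263132-3015703"]
CHOICES = {"snv", "indel", "snv_chr2", "indel_chr2"}


def analysis_plan(analysis_type: str) -> Tuple[str, List[str]]:
    """Return (variant class, extra GATK arguments) for an analysis_type value."""
    if analysis_type not in CHOICES:
        raise ValueError(f"unsupported analysis_type: {analysis_type!r}")
    variant_class = "SNV" if analysis_type.startswith("snv") else "INDEL"
    # Haploid-run arguments are not documented here, so nothing is added for them.
    extra = list(CHR2_DIPLOID_ARGS) if analysis_type.endswith("_chr2") else []
    return variant_class, extra


print(analysis_plan("indel_chr2"))
```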

genome
label

Reference genome

type

data:seq:nucleotide

parental_strains
label

Parental strains

type

list:data:alignment:bam

mutant_strains
label

Mutant strains

type

list:data:alignment:bam

Vc.advanced
label

Advanced options

type

basic:boolean

required

False

default

False

Vc.br_and_ind_ra
label

Do variant base recalibration and indel realignment

type

basic:boolean

required

False

hidden

Vc.advanced === false

default

False

Vc.dbsnp
label

Use dbSNP file

type

basic:boolean

description

rsIDs from this file are used to populate the ID column of the output. Also, the DB INFO flag will be set when appropriate. dbSNP is not used in any way for the calculations themselves.

required

False

hidden

Vc.advanced === false

default

False

Vc.known_sites
label

Known sites (dbSNP)

type

data:variants:vcf

required

False

hidden

Vc.advanced === false || Vc.br_and_ind_ra === false && Vc.dbsnp === false
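In the `hidden` expression above, `&&` binds more tightly than `||`, as in JavaScript. Restated in Python (the function name is hypothetical), the field is shown only when advanced options are enabled and at least one of base recalibration/indel realignment or dbSNP annotation is selected:

```python
def known_sites_hidden(advanced: bool, br_and_ind_ra: bool, dbsnp: bool) -> bool:
    """Evaluate `Vc.advanced === false || Vc.br_and_ind_ra === false && Vc.dbsnp === false`."""
    # `&&` groups first: hidden if not advanced, OR if neither option is selected.
    return (not advanced) or (not br_and_ind_ra and not dbsnp)
```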

Vc.known_indels
label

Known indels

type

list:data:variants:vcf

required

False

hidden

Vc.advanced === false || Vc.br_and_ind_ra === false

Vc.stand_emit_conf
label

Emission confidence threshold

type

basic:integer

description

The minimum confidence threshold (phred-scaled) at which the program should emit sites that appear to be possibly variant.

required

False

hidden

Vc.advanced === false

default

10

Vc.stand_call_conf
label

Calling confidence threshold

type

basic:integer

description

The minimum confidence threshold (phred-scaled) at which the program should emit variant sites as called. If a site’s associated genotype has a confidence score lower than the calling threshold, the program will emit the site as filtered and will annotate it as LowQual. This threshold separates high confidence calls from low confidence calls.

required

False

hidden

Vc.advanced === false

default

30
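Together, the emission and calling thresholds split candidate sites into three groups by their phred-scaled confidence. The sketch below is a simplified illustration of the behavior described above, not GATK code; the function name, the `"PASS"` label, and the use of inclusive lower bounds are assumptions.

```python
from typing import Optional


def classify_site(confidence: float, stand_emit_conf: int = 10,
                  stand_call_conf: int = 30) -> Optional[str]:
    """Classify a candidate site by its phred-scaled confidence (sketch).

    Returns None if the site is not emitted, "LowQual" if it is emitted
    but filtered, and "PASS" if it is emitted as a confident call.
    """
    if confidence < stand_emit_conf:
        return None       # below the emission threshold: not reported
    if confidence < stand_call_conf:
        return "LowQual"  # emitted, but filtered as low confidence
    return "PASS"         # emitted as a called variant
```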

Vc.rf
label

ReassignOneMappingQuality filter

type

basic:boolean

description

This read transformer will change a certain read mapping quality to a different value without affecting reads that have other mapping qualities. This is intended primarily for users of RNA-Seq data handling programs such as TopHat, which use MAPQ = 255 to designate uniquely aligned reads. According to convention, 255 normally designates “unknown” quality, and most GATK tools automatically ignore such reads. By reassigning a different mapping quality to those specific reads, users of TopHat and other tools can circumvent this problem without affecting the rest of their dataset.

required

False

hidden

Vc.advanced === false

default

False
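The transformation this filter applies can be illustrated on plain MAPQ values: only reads with MAPQ 255 are changed, and all other values pass through untouched. The replacement value of 60 is an assumption chosen for illustration (check the GATK documentation for the value actually used), and the function name is hypothetical.

```python
from typing import Iterable, List


def reassign_mapq(mapqs: Iterable[int], from_q: int = 255, to_q: int = 60) -> List[int]:
    """Replace one specific mapping quality with another (sketch; to_q=60 is assumed)."""
    # TopHat-style aligners use 255 for unique hits; GATK would treat 255 as "unknown".
    return [to_q if q == from_q else q for q in mapqs]
```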

Vf.advanced
label

Advanced options

type

basic:boolean

required

False

default

False

Vf.read_depth
label

Read depth cutoff

type

basic:integer

description

The minimum number of replicate reads required for a variant site to be included.

required

False

hidden

Vf.advanced === false

default

5
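A minimal sketch of the read depth cutoff: a site is kept only when it is supported by at least `read_depth` reads. The `(site, depth)` tuple representation and the inclusive comparison are assumptions made for illustration.

```python
from typing import List, Tuple


def filter_by_depth(sites: List[Tuple[str, int]], read_depth: int = 5) -> List[Tuple[str, int]]:
    """Keep (site_id, depth) pairs whose depth meets the minimum read depth (sketch)."""
    return [(site, depth) for site, depth in sites if depth >= read_depth]
```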

Output results

ChipQC

data:chipqc:chipqc (data:alignment:bam  alignment, data:chipseq:callpeak  peaks, data:bed  blacklist, basic:boolean  calculate_enrichment, basic:integer  quality_threshold, basic:integer  profile_window, basic:string  shift_size)[Source: v1.1.1]

Calculate quality control metrics for ChIP-seq samples. The analysis is based on the ChIPQC package, which computes a variety of quality control metrics and statistics and provides plots and a report for assessing the experimental data before further analysis.

Input arguments

alignment
label

Aligned reads

type

data:alignment:bam

required

True

hidden

False

peaks
label

Called peaks

type

data:chipseq:callpeak

required

True

hidden

False

blacklist
label

Blacklist regions

type

data:bed

description

BED file containing genomic regions that should be excluded from the analysis.

required

False

hidden

False

calculate_enrichment
label

Calculate enrichment

type

basic:boolean

description

Calculate enrichment of signal in known genomic annotation. By default, the annotation is provided by the TranscriptDB package specified by the genome build, which should match one of the supported annotations (hg19, hg38, hg18, mm10, mm9, rn4, ce6, dm3). If the annotation is not supported, the analysis is skipped.

required

True

hidden

False

default

False
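The skip behavior amounts to a membership test against the supported builds listed above. The helper name is hypothetical, and the process may perform this check differently.

```python
# Builds listed in the description; sketch only.
SUPPORTED_BUILDS = {"hg19", "hg38", "hg18", "mm10", "mm9", "rn4", "ce6", "dm3"}


def should_calculate_enrichment(calculate_enrichment: bool, build: str) -> bool:
    """Return True only when enrichment is requested and the build is supported."""
    # An unsupported build skips the enrichment analysis instead of failing.
    return calculate_enrichment and build in SUPPORTED_BUILDS
```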

advanced.quality_threshold
label

Mapping quality threshold

type

basic:integer

description

Only reads with mapping quality scores above this threshold will be used for some statistics.

required

True

hidden

False

default

15

advanced.profile_window
label

Window size

type

basic:integer

description

An integer indicating the width of the window used for peak profiles. Peaks will be centered on their summits and include half of the window size upstream and half downstream of this point.

required

True

hidden

False

default

400
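The window placement can be sketched as follows: given a summit position and the window width, the profile spans half the window on each side of the summit. The 0-based coordinates and clipping at 0 are assumptions of this sketch, not necessarily ChIPQC's conventions.

```python
from typing import Tuple


def profile_window(summit: int, window: int = 400) -> Tuple[int, int]:
    """Return the (start, end) interval of a peak profile centered on its summit (sketch)."""
    half = window // 2
    # Clip at the chromosome start so the window never has a negative coordinate.
    return max(0, summit - half), summit + half
```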

advanced.shift_size
label

Shift size

type

basic:string

description

Vector of values to try when computing optimal shift sizes. It should be specified as a vector of consecutive numbers in the form start:end.

required

True

hidden

False

default

1:300
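The `start:end` notation follows R's colon operator, which yields every integer from start to end inclusive, so the default `1:300` means 300 candidate shifts. A Python equivalent might look like this; the function name is hypothetical, and unlike R this sketch rejects descending ranges.

```python
from typing import List


def parse_shift_size(spec: str) -> List[int]:
    """Expand an R-style 'start:end' string into the list of shift sizes to try."""
    start_str, end_str = spec.split(":")
    start, end = int(start_str), int(end_str)
    if start > end:
        raise ValueError("start must not exceed end")
    # R's start:end includes both ends, hence end + 1.
    return list(range(start, end + 1))
```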

Output results

report_folder
label

ChipQC report folder

type

basic:dir

required

True

hidden

False

ccplot
label

Cross coverage score plot

type

basic:file

required

True

hidden

False

coverage_histogram
label

SSD metric plot

type

basic:file

required

True

hidden

False

peak_profile
label

Peak profile plot

type

basic:file

required

True

hidden

False

peaks_barplot
label

Barplot of reads in peaks

type

basic:file

required

True

hidden

False

peaks_density_plot
label

Density plot of reads in peaks

type

basic:file

required

True

hidden

False

enrichment_heatmap
label

Heatmap of reads in genomic features

type

basic:file

required

False

hidden

False

species
label

Species

type

basic:string

required

True

hidden

False

build
label

Build

type

basic:string

required

True

hidden

False

Convert GFF3 to GTF

data:annotation:gtfgff-to-gtf (data:annotation:gff3  annotation)[Source: v0.5.1]

Convert GFF3 file to GTF format.

Input arguments

annotation
label

Annotation (GFF3)

type

data:annotation:gff3

description

Annotation in GFF3 format.

Output results

annot
label

Converted GTF file

type

basic:file

annot_sorted
label

Sorted GTF file

type

basic:file

annot_sorted_idx_igv
label

IGV index for sorted GTF file

type

basic:file

annot_sorted_track_jbrowse
label

JBrowse track for sorted GTF file

type

basic:file

source
label

Gene ID database

type

basic:string

species
label

Species

type

basic:string

build
label

Build

type

basic:string

Convert files to reads (paired-end)

data:reads:fastq:pairedfiles-to-fastq-paired (list:data:file  src1, list:data:file  src2, basic:boolean  merge_lanes)[Source: v1.4.1]

Convert FASTQ files to paired-end reads.

Input arguments

src1
label

Mate1

type

list:data:file

src2
label

Mate2

type

list:data:file

merge_lanes
label

Merge lanes

type

basic:boolean

description

Merge paired-end sample data split into multiple sequencing lanes into a single pair of FASTQ files.

default

False
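Merging lanes comes down to concatenating the per-lane FASTQ files in order. For gzip-compressed input this can be done without decompressing, because a concatenation of gzip members is itself a valid gzip file. The sketch assumes gzip-compressed lanes supplied in the desired order; the function name is hypothetical. For paired-end data, it would be applied separately to the mate 1 and mate 2 lists, keeping the same lane order in both.

```python
import shutil
from pathlib import Path
from typing import Sequence


def merge_lanes(lane_files: Sequence[Path], output: Path) -> None:
    """Concatenate gzip-compressed FASTQ lane files into a single file (sketch)."""
    with open(output, "wb") as out:
        for lane in lane_files:
            with open(lane, "rb") as src:
                # Copy raw compressed bytes; gzip readers handle multi-member files.
                shutil.copyfileobj(src, out)
```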

Output results

fastq
label

Reads file (mate 1)

type

list:basic:file

fastq2
label

Reads file (mate 2)

type

list:basic:file

fastqc_url
label

Quality control with FastQC (Upstream)

type

list:basic:file:html

fastqc_url2
label

Quality control with FastQC (Downstream)

type

list:basic:file:html

fastqc_archive
label

Download FastQC archive (Upstream)

type

list:basic:file

fastqc_archive2
label

Download FastQC archive (Downstream)

type

list:basic:file

Convert files to reads (single-end)

data:reads:fastq:singlefiles-to-fastq-single (list:data:file  src, basic:boolean  merge_lanes)[Source: v1.4.1]

Convert FASTQ files to single-end reads.

Input arguments

src
label

Reads

type

list:data:file

description

Sequencing reads in FASTQ format

merge_lanes
label

Merge lanes

type

basic:boolean

description

Merge sample data split into multiple sequencing lanes into a single FASTQ file.

default

False

Output results

fastq
label

Reads file

type

list:basic:file

fastqc_url
label

Quality control with FastQC

type

list:basic:file:html

fastqc_archive
label

Download FastQC archive

type

list:basic:file

Cuffdiff 2.2

data:differentialexpression:cuffdiff:cuffdiff (list:data:cufflinks:cuffquant  case, list:data:cufflinks:cuffquant  control, list:basic:string  labels, data:annotation  annotation, data:seq:nucleotide  genome, basic:boolean  multi_read_correct, basic:boolean  create_sets, basic:decimal  gene_logfc, basic:decimal  gene_fdr, basic:decimal  fdr, basic:string  library_type, basic:string  library_normalization, basic:string  dispersion_method)[Source: v3.3.2]

Run Cuffdiff 2.2 analysis. Cuffdiff finds significant changes in transcript expression, splicing, and promoter use. You can use it to find differentially expressed genes and transcripts, as well as genes that are being differentially regulated at the transcriptional and post-transcriptional level. See [here](http://cole-trapnell-lab.github.io/cufflinks/cuffdiff/) and [here](https://software.broadinstitute.org/cancer/software/genepattern/modules/docs/Cuffdiff/7) for more information.

Input arguments

case
label

Case samples

type

list:data:cufflinks:cuffquant

required

True

hidden

False

control
label

Control samples

type

list:data:cufflinks:cuffquant

required

True

hidden

False

labels
label

Group labels

type

list:basic:string

description

Define labels for each sample group.

required

True

hidden

False

default

['control', 'case']

annotation
label

Annotation (GTF/GFF3)

type

data:annotation

description

A transcript annotation file produced by Cufflinks, Cuffcompare, or another tool.

required

True

hidden

False

genome
label

Run bias detection and correction algorithm

type

data:seq:nucleotide

description

Provide Cufflinks with a multifasta file (genome file) via this option to instruct it to run a bias detection and correction algorithm which can significantly improve accuracy of transcript abundance estimates.

required

False

hidden

False

multi_read_correct
label

Do initial estimation procedure to more accurately weight reads with multiple genome mappings

type

basic:boolean

required

True

hidden

False

default

False

create_sets
label

Create gene sets

type

basic:boolean

description

After calculating differential gene expression, create gene sets for up-regulated genes, down-regulated genes, and all genes.

required

True

hidden

False

default

False

gene_logfc
label

Log2 fold change threshold for gene sets

type

basic:decimal

description

Genes with a log2 fold change above Log2FC are considered up-regulated, and genes with a log2 fold change below -Log2FC are considered down-regulated.

required

True

hidden

!create_sets

default

1.0

gene_fdr
label

FDR threshold for gene sets

type

basic:decimal

required

True

hidden

!create_sets

default

0.05
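The two gene-set thresholds can be sketched together: a gene passing the FDR cut is up-regulated if its log2 fold change exceeds `gene_logfc` and down-regulated if it is below `-gene_logfc`. The dictionary input format and the `<=` FDR comparison are assumptions of this sketch, and the "all genes" set is not modeled here.

```python
from typing import Dict, List, Tuple


def gene_sets(results: Dict[str, Tuple[float, float]], gene_logfc: float = 1.0,
              gene_fdr: float = 0.05) -> Tuple[List[str], List[str]]:
    """Split genes into up- and down-regulated sets (sketch).

    `results` maps gene ID -> (log2 fold change, FDR).
    """
    up, down = [], []
    for gene, (logfc, fdr) in results.items():
        if fdr > gene_fdr:
            continue  # not significant at the chosen FDR
        if logfc > gene_logfc:
            up.append(gene)
        elif logfc < -gene_logfc:
            down.append(gene)
    return up, down
```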

fdr
label

Allowed FDR

type

basic:decimal

description

The allowed false discovery rate. The default is 0.05.

required

True

hidden

False

default

0.05

library_type
label

Library type

type

basic:string

description

In cases where Cufflinks cannot determine the platform and protocol used to generate input reads, you can supply this information manually, which will allow Cufflinks to infer source strand information with certain protocols. The available options are listed below. For paired-end data, we currently only support protocols where reads point towards each other: fr-unstranded - Reads from the left-most end of the fragment (in transcript coordinates) map to the transcript strand and the right-most end maps to the opposite strand; fr-firststrand - Same as above except we enforce the rule that the right-most end of the fragment (in transcript coordinates) is the first sequenced (or only sequenced for single-end reads). Equivalently, it is assumed that only the strand generated during first strand synthesis is sequenced; fr-secondstrand - Same as above except we enforce the rule that the left-most end of the fragment (in transcript coordinates) is the first sequenced (or only sequenced for single-end reads). Equivalently, it is assumed that only the strand generated during second strand synthesis is sequenced.

required

True

hidden

False

default

fr-unstranded

choices

  • fr-unstranded: fr-unstranded

  • fr-firststrand: fr-firststrand

  • fr-secondstrand: fr-secondstrand

library_normalization
label

Library normalization method

type

basic:string

description

You can control how library sizes (i.e. sequencing depths) are normalized in Cufflinks and Cuffdiff. Cuffdiff has several methods that require multiple libraries in order to work. Library normalization methods supported by Cufflinks work on one library at a time.

required

True

hidden

False

default

geometric

choices

  • geometric: geometric

  • classic-fpkm: classic-fpkm

  • quartile: quartile
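The Cufflinks manual describes the `geometric` method as the DESeq approach of Anders and Huber (2010): each library's size factor is the median, over genes, of the ratio between its count and that gene's geometric mean across all libraries. The sketch below illustrates that median-of-ratios idea on a small count matrix; it is not Cuffdiff's implementation, and genes with a zero count in any library are skipped, as DESeq does.

```python
import math
from statistics import median
from typing import List


def size_factors(counts: List[List[int]]) -> List[float]:
    """Median-of-ratios size factors (sketch); counts[g][j] is gene g in library j."""
    n_libs = len(counts[0])
    # Only genes observed in every library have a finite log geometric mean.
    usable = [row for row in counts if all(c > 0 for c in row)]
    log_geo_means = [sum(math.log(c) for c in row) / n_libs for row in usable]
    factors = []
    for j in range(n_libs):
        log_ratios = [math.log(row[j]) - lgm for row, lgm in zip(usable, log_geo_means)]
        factors.append(math.exp(median(log_ratios)))
    return factors
```

In this toy setting, a library sequenced twice as deeply gets a size factor twice as large, and dividing its counts by that factor puts both libraries on a common scale.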

dispersion_method
label

Dispersion method

type

basic:string

description

Cuffdiff works by modeling the variance in fragment counts across replicates as a function of the mean fragment count across replicates. Strictly speaking, it models a quantity called dispersion: the variance present in a group of samples beyond what is expected from a simple Poisson model of RNA-Seq. You can control how Cuffdiff constructs its model of dispersion in locus fragment counts. Each condition that has replicates can receive its own model, or Cuffdiff can use a global model for all conditions. All of these policies are identical to those used by DESeq (Anders and Huber, Genome Biology, 2010).

required

True

hidden

False

default

pooled

choices

  • pooled: pooled

  • per-condition: per-condition

  • blind: blind

  • poisson: poisson
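To make "variance beyond Poisson" concrete: under a Poisson model the variance of counts equals their mean. A common negative-binomial parameterization writes the variance as mean + alpha × mean², and the sketch below computes a naive method-of-moments estimate of alpha for one locus from replicate counts. This is an illustrative simplification, not the model Cuffdiff actually fits.

```python
from statistics import mean, variance
from typing import Sequence


def raw_dispersion(counts: Sequence[float]) -> float:
    """Method-of-moments dispersion for one locus across replicates (sketch).

    Solves variance = mu + alpha * mu**2 for alpha, clamped at 0.
    """
    mu = mean(counts)
    if mu == 0:
        return 0.0
    var = variance(counts)  # sample variance (n - 1 denominator)
    return max(0.0, (var - mu) / mu ** 2)
```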

Output results

raw
label

Differential expression

type

basic:file

required

True

hidden

False

de_json
label

Results table (JSON)

type

basic:json

required

True

hidden

False

de_file
label

Results table (file)

type

basic:file

required

True

hidden

False

transcript_diff_exp
label

Differential expression (transcript level)

type

basic:file

required

True

hidden

False

tss_group_diff_exp
label

Differential expression (primary transcript)

type

basic:file

required

True

hidden

False

cds_diff_exp
label

Differential expression (coding sequence)

type

basic:file

required

True

hidden

False

cuffdiff_output
label

Cuffdiff output

type

basic:file

required

True

hidden

False

source
label

Gene ID database

type

basic:string

required

True

hidden

False

species
label

Species

type

basic:string

required

True

hidden

False

build
label

Build

type

basic:string

required

True

hidden

False

feature_type
label

Feature type

type

basic:string

required

True

hidden

False

Cuffmerge

data:annotation:cuffmergecuffmerge (list:data:cufflinks:cufflinks  expressions, list:data:annotation:gtf  gtf, data:annotation  gff, data:seq:nucleotide  genome, basic:integer  threads)[Source: v2.1.1]

Cufflinks includes a script called Cuffmerge that you can use to merge several Cufflinks assemblies. It also runs Cuffcompare for you and automatically filters out a number of transfrags that are probably artifacts. The main purpose of Cuffmerge is to make it easier to produce an assembly GTF file suitable for use with Cuffdiff. See [here](http://cole-trapnell-lab.github.io/cufflinks/cuffmerge/) for more information.

Input arguments

expressions
label

Cufflinks transcripts (GTF)

type

list:data:cufflinks:cufflinks

required

False

gtf
label

Annotation files (GTF)

type

list:data:annotation:gtf

description

Annotation files you wish to merge with the Cufflinks-produced annotation files (e.g. an uploaded Cufflinks annotation GTF file).

required

False

gff
label

Reference annotation (GTF/GFF3)

type

data:annotation

description

An optional “reference” annotation GTF. The input assemblies are merged together with the reference GTF and included in the final output.

required

False

genome
label

Reference genome

type

data:seq:nucleotide

description

This argument should point to the genomic DNA sequences for the reference. If a directory, it should contain one FASTA file per contig. If a multifasta file, all contigs should be present. The merge script will pass this option to Cuffcompare, which will use the sequences to assist in classifying transfrags and excluding artifacts (e.g. repeats). For example, Cufflinks transcripts consisting mostly of lower-case bases are classified as repeats. Note that <seq_dir> must contain one FASTA file per reference chromosome, each file must be named after the chromosome, and each must have a .fa or .fasta extension.

required

False
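The naming rule for a sequence directory can be checked before running the tool. The sketch below reports chromosomes that have no `<chromosome>.fa` or `<chromosome>.fasta` file; the function name and the idea of passing an explicit chromosome list are assumptions for illustration.

```python
from pathlib import Path
from typing import Iterable, List

FASTA_SUFFIXES = {".fa", ".fasta"}


def missing_chromosome_files(seq_dir: Path, chromosomes: Iterable[str]) -> List[str]:
    """Return chromosomes with no matching .fa/.fasta file in seq_dir (sketch)."""
    # File stem must equal the chromosome name, e.g. chr1.fa -> "chr1".
    present = {p.stem for p in seq_dir.iterdir()
               if p.is_file() and p.suffix in FASTA_SUFFIXES}
    return [chrom for chrom in chromosomes if chrom not in present]
```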

threads
label

Use this many processor threads

type

basic:integer

description

Use this many threads to merge assemblies. The default is 1.

default

1

Output results

annot
label

Merged GTF file

type

basic:file

source
label

Gene ID database

type

basic:string

species
label

Species

type

basic:string

build
label

Build

type

basic:string

Cuffnorm

data:cuffnormcuffnorm (list:data:cufflinks:cuffquant  cuffquant, data:annotation  annotation, basic:boolean  useERCC)[Source: v2.3.1]

Cufflinks includes a program, Cuffnorm, that you can use to generate tables of expression values that are properly normalized for library size. Cuffnorm takes a GTF2/GFF3 file of transcripts as input, along with two or more SAM, BAM, or CXB files for two or more samples. See [here](http://cole-trapnell-lab.github.io/cufflinks/cuffnorm/) for more information. The replicate relation must be defined for Cuffnorm to account for replicates. If it is not defined, each sample is treated individually.

Input arguments

cuffquant
label

Cuffquant expression file

type

list:data:cufflinks:cuffquant

annotation
label

Annotation (GTF/GFF3)

type

data:annotation

description

A transcript annotation file produced by cufflinks, cuffcompare, or other source.

useERCC
label

ERCC spike-in normalization

type

basic:boolean

description

Use ERCC spike-in controls for normalization.

default

False

Output results

genes_count
label

Genes count

type

basic:file

genes_fpkm
label

Genes FPKM

type

basic:file

genes_attr
label

Genes attr table

type

basic:file

isoform_count
label

Isoform count

type

basic:file

isoform_fpkm
label

Isoform FPKM

type

basic:file

isoform_attr
label

Isoform attr table

type

basic:file

cds_count
label

CDS count

type

basic:file

cds_fpkm
label

CDS FPKM

type

basic:file

cds_attr
label

CDS attr table

type

basic:file

tss_groups_count
label

TSS groups count

type

basic:file

tss_groups_fpkm
label

TSS groups FPKM

type

basic:file

tss_attr
label

TSS attr table

type

basic:file

run_info
label

Run info

type

basic:file

raw_scatter
label

FPKM exp scatter plot

type

basic:file

boxplot
label

Boxplot

type

basic:file

fpkm_exp_raw
label

FPKM exp raw

type

basic:file

replicate_correlations
label

Replicate correlations plot

type

basic:file

fpkm_means
label

FPKM means

type

basic:file

exp_fpkm_means
label

Exp FPKM means

type

basic:file

norm_scatter
label

FPKM exp scatter normalized plot

type

basic:file

required

False

fpkm_exp_norm
label

FPKM exp normalized

type

basic:file

required

False

spike_raw
label

Spike raw

type

basic:file

required

False

spike_norm
label

Spike normalized

type

basic:file

required

False

R_data
label

All R normalization data

type

basic:file

source
label

Gene ID database

type

basic:string

species
label

Species

type

basic:string

build
label

Build

type

basic:string

Cuffquant 2.2

data:cufflinks:cuffquantcuffquant (data:alignment:bam  alignment, data:annotation  annotation, data:seq:nucleotide  genome, data:annotation:gtf  mask_file, basic:string  library_type, basic:boolean  multi_read_correct)[Source: v2.1.1]

Cuffquant allows you to compute the gene and transcript expression profiles and save these profiles to files that you can analyze later with Cuffdiff or Cuffnorm. See [here](http://cole-trapnell-lab.github.io/cufflinks/manual/) for more information.

Input arguments

alignment
label

Aligned reads

type

data:alignment:bam

annotation
label

Annotation (GTF/GFF3)

type

data:annotation

genome
label

Run bias detection and correction algorithm

type

data:seq:nucleotide

description

Provide Cufflinks with a multifasta file (genome file) via this option to instruct it to run a bias detection and correction algorithm which can significantly improve accuracy of transcript abundance estimates.

required

False

mask_file
label

Mask file

type

data:annotation:gtf

description

Ignore all reads that could have come from transcripts in this GTF file. We recommend including any annotated rRNA, mitochondrial transcripts, and other abundant transcripts that you wish to ignore in your analysis in this file. Due to the variable efficiency of mRNA enrichment methods and rRNA depletion kits, masking these transcripts often improves the overall robustness of transcript abundance estimates.

required

False

library_type
label

Library type

type

basic:string

description

In cases where Cufflinks cannot determine the platform and protocol used to generate input reads, you can supply this information manually, which will allow Cufflinks to infer source strand information with certain protocols. The available options are listed below. For paired-end data, we currently only support protocols where reads point towards each other: fr-unstranded - Reads from the left-most end of the fragment (in transcript coordinates) map to the transcript strand, and the right-most end maps to the opposite strand; fr-firststrand - Same as above except we enforce the rule that the right-most end of the fragment (in transcript coordinates) is the first sequenced (or only sequenced for single-end reads). Equivalently, it is assumed that only the strand generated during first strand synthesis is sequenced; fr-secondstrand - Same as above except we enforce the rule that the left-most end of the fragment (in transcript coordinates) is the first sequenced (or only sequenced for single-end reads). Equivalently, it is assumed that only the strand generated during second strand synthesis is sequenced.

default

fr-unstranded

choices

  • fr-unstranded: fr-unstranded

  • fr-firststrand: fr-firststrand

  • fr-secondstrand: fr-secondstrand

multi_read_correct
label

Do initial estimation procedure to more accurately weight reads with multiple genome mappings

type

basic:boolean

description

Run an initial estimation procedure that weights reads mapping to multiple locations more accurately.

default

False

Output results

cxb
label

Abundances (.cxb)

type

basic:file

source
label

Gene ID database

type

basic:string

species
label

Species

type

basic:string

build
label

Build

type

basic:string

Cuffquant results

data:cufflinks:cuffquantupload-cxb (basic:file  src, basic:string  source, basic:string  species, basic:string  build, basic:string  feature_type)[Source: v1.3.2]

Upload Cuffquant results file (.cxb)

Input arguments

src
label

Cuffquant file

type

basic:file

description

Upload the Cuffquant results file. Supported extension: *.cxb

required

True

validate_regex

\.(cxb)$

source
label

Gene ID database

type

basic:string

choices

  • AFFY: AFFY

  • DICTYBASE: DICTYBASE

  • ENSEMBL: ENSEMBL

  • NCBI: NCBI

  • UCSC: UCSC

species
label

Species

type

basic:string

description

Latin name of the species.

choices

  • Homo sapiens: Homo sapiens

  • Mus musculus: Mus musculus

  • Rattus norvegicus: Rattus norvegicus

  • Dictyostelium discoideum: Dictyostelium discoideum

  • Odocoileus virginianus texanus: Odocoileus virginianus texanus

  • Solanum tuberosum: Solanum tuberosum

build
label

Build

type

basic:string

feature_type
label

Feature type

type

basic:string

default

gene

choices

  • gene: gene

  • transcript: transcript

  • exon: exon

Output results

cxb
label

Cuffquant results

type

basic:file

source
label

Gene ID database

type

basic:string

species
label

Species

type

basic:string

build
label

Build

type

basic:string

feature_type
label

Feature type

type

basic:string

Custom master file

data:masterfile:ampliconupload-master-file (basic:file  src, basic:string  panel_name)[Source: v1.2.1]

This should be a tab-delimited file (*.txt). Please see the [example](http://genial.is/amplicon-masterfile) file for details.

Input arguments

src
label

Master file

type

basic:file

validate_regex

\.txt(|\.gz|\.bz2|\.tgz|\.tar\.gz|\.tar\.bz2|\.zip|\.rar|\.7z)$
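The `validate_regex` above accepts a `.txt` file either uncompressed or with one of the listed compression or archive suffixes; the empty first alternative in `(|\.gz|...)` is what allows a plain `.txt`. A quick check with Python's `re` module, assuming the platform's regex semantics match Python's for this pattern (the helper name is hypothetical):

```python
import re

MASTER_FILE_REGEX = re.compile(
    r"\.txt(|\.gz|\.bz2|\.tgz|\.tar\.gz|\.tar\.bz2|\.zip|\.rar|\.7z)$"
)


def is_valid_master_file_name(name: str) -> bool:
    """Return True if the file name ends with an accepted master file extension."""
    # `$` anchors the match at the end of the name, so only the suffix matters.
    return MASTER_FILE_REGEX.search(name) is not None
```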

panel_name
label

Panel name

type

basic:string

Output results

master_file
label

Master file

type

basic:file

bedfile
label

BED file (merged targets)

type

basic:file

nomergebed
label

BED file (nonmerged targets)

type

basic:file

olapfreebed
label

BED file (overlap-free targets)

type

basic:file

primers
label

Primers

type

basic:file

panel_name
label

Panel name

type

basic:string

Cut & Run

data:workflow:cutnrunworkflow-cutnrun (data:reads:fastq:paired  reads, basic:integer  quality, basic:integer  nextseq, basic:string  phred, basic:integer  min_length, basic:integer  max_n, basic:boolean  retain_unpaired, basic:integer  unpaired_len_1, basic:integer  unpaired_len_2, basic:integer  clip_r1, basic:integer  clip_r2, basic:integer  three_prime_r1, basic:integer  three_prime_r2, list:basic:string  adapter, list:basic:string  adapter_2, data:seq:nucleotide  adapter_file_1, data:seq:nucleotide  adapter_file_2, basic:string  universal_adapter, basic:integer  stringency, basic:decimal  error_rate, basic:integer  trim_5, basic:integer  trim_3, data:index:bowtie2  genome, basic:string  mode, basic:string  speed, basic:boolean  discordantly, basic:boolean  rep_se, basic:integer  minins, basic:integer  maxins, basic:boolean  no_overlap, basic:boolean  dovetail, basic:boolean  no_unal, data:index:bowtie2  genome, basic:string  mode, basic:string  speed, basic:boolean  discordantly, basic:boolean  rep_se, basic:integer  minins, basic:integer  maxins, basic:boolean  no_overlap, basic:boolean  dovetail, basic:boolean  no_unal, basic:string  format, basic:decimal  pvalue, basic:string  duplicates, basic:boolean  bedgraph, basic:integer  min_frag_length, basic:integer  max_frag_length, basic:decimal  scale, basic:integer  bw_binsize, basic:integer  bw_timeout)[Source: v1.2.1]

Analysis of samples processed for high-resolution mapping of DNA binding sites using a targeted nuclease strategy. The process is named CUT&RUN, which stands for Cleavage Under Targets and Release Using Nuclease. The workflow trims the reads with Trim Galore and aligns them with Bowtie2 to the target species genome as well as to a spike-in genome. Aligned reads are processed to produce bigWig files for viewing in a genome browser. Peaks are called using MACS2. Fragmenting of reads is performed using alignmentSieve from the deepTools package.

Input arguments

reads
label

Input reads

type

data:reads:fastq:paired

options_trimming.quality_trim.quality
label

Quality cutoff

type

basic:integer

description

Trim low-quality ends from reads based on Phred score.

required

False

options_trimming.quality_trim.nextseq
label

NextSeq/NovaSeq trim cutoff

type

basic:integer

description

NextSeq/NovaSeq-specific quality trimming. Also trims dark cycles appearing as high-quality G bases. This sets a specific quality cutoff, but the qualities of G bases are ignored. This cannot be used together with Quality cutoff and will override it.

required

False
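Trim Galore delegates quality trimming to Cutadapt, which (as we understand its implementation) uses a BWA-style algorithm: scanning from the 3' end, it accumulates `cutoff - quality` and cuts where that running sum peaks. With NextSeq/NovaSeq trimming, G bases are treated as if their quality were just below the cutoff, so trailing poly-G runs from dark cycles are removed. The sketch below re-implements that description on integer Phred scores; it is an illustration, not Cutadapt's code.

```python
from typing import Sequence


def quality_trim_index(bases: str, quals: Sequence[int], cutoff: int,
                       nextseq: bool = False) -> int:
    """Return the index at which to cut the 3' end of a read (BWA-style sketch)."""
    running = 0
    best = 0
    cut = len(quals)
    for i in reversed(range(len(quals))):
        q = quals[i]
        if nextseq and bases[i] == "G":
            q = cutoff - 1  # ignore G quality: dark cycles read as high-quality G
        running += cutoff - q
        if running < 0:
            break  # quality has recovered; stop scanning
        if running > best:
            best = running
            cut = i
    return cut
```

The trimmed read is then `bases[:cut]`.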

options_trimming.quality_trim.phred
label

Phred score encoding

type

basic:string

description

Use either ASCII+33 quality scores as Phred scores (Sanger/Illumina 1.9+ encoding) or ASCII+64 quality scores (Illumina 1.5 encoding) for quality trimming.

default

--phred33

choices

  • ASCII+33: --phred33

  • ASCII+64: --phred64
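The two encodings differ only in the ASCII offset subtracted from each quality character. A minimal decoder (the function name is hypothetical):

```python
from typing import List


def decode_qualities(qual_string: str, offset: int = 33) -> List[int]:
    """Convert a FASTQ quality string to Phred scores (offset 33 or 64)."""
    if offset not in (33, 64):
        raise ValueError("offset must be 33 (--phred33) or 64 (--phred64)")
    return [ord(ch) - offset for ch in qual_string]
```

For example, the character `I` decodes to 40 under ASCII+33, while under ASCII+64 the same score of 40 is written as `h`.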

options_trimming.quality_trim.min_length
label

Minimum length after trimming

type

basic:integer

description

Discard reads that became shorter than the selected length because of either quality or adapter trimming. Both reads of a read pair need to be longer than the specified length to be written to the validated paired-end files. If only one read became too short, such unpaired single-end reads can be kept with Retain unpaired. A value of 0 disables filtering based on length.

default

20

options_trimming.quality_trim.max_n
label

Maximum number of Ns

type

basic:integer

description

The maximum number of Ns a read may contain. A read exceeding this limit results in the entire pair being removed from the trimmed output files.

required

False

options_trimming.quality_trim.retain_unpaired
label

Retain unpaired reads after trimming

type

basic:boolean

description

If only one of the two paired-end reads became too short, the longer read will be written.

default

False

options_trimming.quality_trim.unpaired_len_1
label

Unpaired read length cutoff of mate 1

type

basic:integer

hidden

!quality_trim.retain_unpaired

default

35

options_trimming.quality_trim.unpaired_len_2
label

Unpaired read length cutoff for mate 2

type

basic:integer

hidden

!quality_trim.retain_unpaired

default

35
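The length and N filters for a trimmed read pair can be summarized as a single decision. This is a simplified sketch of the rules described in the fields above: the function, its return labels, and the order in which the checks are applied are assumptions, and `max_n=None` stands for the field being left empty.

```python
from typing import Optional


def filter_pair(seq1: str, seq2: str, min_length: int = 20, max_n: Optional[int] = None,
                retain_unpaired: bool = False, unpaired_len_1: int = 35,
                unpaired_len_2: int = 35) -> str:
    """Decide the fate of a trimmed pair: 'pair', 'mate1', 'mate2' or 'discard' (sketch)."""
    if max_n is not None and (seq1.count("N") > max_n or seq2.count("N") > max_n):
        return "discard"  # either read over the N limit removes the whole pair
    if min_length == 0 or (len(seq1) >= min_length and len(seq2) >= min_length):
        return "pair"
    if retain_unpaired:
        # Keep a single surviving mate if it meets its own unpaired length cutoff.
        if len(seq1) >= unpaired_len_1:
            return "mate1"
        if len(seq2) >= unpaired_len_2:
            return "mate2"
    return "discard"
```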

options_trimming.quality_trim.clip_r1
label

Trim bases from 5’ end of read 1

type

basic:integer

description

This may be useful if the qualities were very poor, or if there is some sort of unwanted bias at the 5’ end.

required

False

options_trimming.quality_trim.clip_r2
label

Trim bases from 5’ end of read 2

type

basic:integer

description

This may be useful if the qualities were very poor, or if there is some sort of unwanted bias at the 5’ end. For paired-end bisulfite sequencing, it is recommended to remove the first few bp because the end-repair reaction may introduce a bias towards low methylation.

required

False

options_trimming.quality_trim.three_prime_r1
label

Trim bases from 3’ end of read 1

type

basic:integer

description

Remove bases from the 3’ end of read 1 after adapter/quality trimming has been performed. This may remove some unwanted bias from the 3’ end that is not directly related to adapter sequence or basecall quality.

required

False

options_trimming.quality_trim.three_prime_r2
label

Trim bases from 3’ end of read 2

type

basic:integer

description

Remove bases from the 3’ end of read 2 after adapter/quality trimming has been performed. This may remove some unwanted bias from the 3’ end that is not directly related to adapter sequence or basecall quality.

required

False
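Setting aside the adapter and quality trimming that happens in between, the fixed clipping options reduce to slicing: `clip_r1`/`clip_r2` remove bases from the 5' end, and `three_prime_r1`/`three_prime_r2` remove bases from the 3' end. A minimal sketch (the function name is hypothetical):

```python
def clip_read(seq: str, clip_5: int = 0, clip_3: int = 0) -> str:
    """Remove a fixed number of bases from the 5' and 3' ends of a read (sketch)."""
    end = len(seq) - clip_3
    # Guard against clipping more bases than the read has.
    return seq[clip_5:end] if end > clip_5 else ""
```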

options_trimming.adapter_trim.adapter
label

Read 1 adapter sequence

type

list:basic:string

description

Adapter sequences to be trimmed. See also the universal adapters field for predefined adapters. This is mutually exclusive with the read 1 adapters file and universal adapters.

required

False

options_trimming.adapter_trim.adapter_2
label

Read 2 adapter sequence

type

list:basic:string

description

Optional adapter sequence to be trimmed off read 2 of paired-end files. This is mutually exclusive with read 2 adapters file and universal adapters.

required

False

options_trimming.adapter_trim.adapter_file_1
label

Read 1 adapters file

type

data:seq:nucleotide

description

This is mutually exclusive with read 1 adapters and universal adapters.

required

False

options_trimming.adapter_trim.adapter_file_2
label

Read 2 adapters file

type

data:seq:nucleotide

description

This is mutually exclusive with read 2 adapters and universal adapters.

required

False

options_trimming.adapter_trim.universal_adapter
label

Universal adapters

type

basic:string

description

Instead of default detection use specific adapters. Use 13bp of the Illumina universal adapter, 12bp of the Nextera adapter or 12bp of the Illumina Small RNA 3’ Adapter. Selecting to trim smallRNA adapters will also lower the length value to 18bp. If the smallRNA libraries are paired-end then read 2 adapter will be set to the Illumina small RNA 5’ adapter automatically (GATCGTCGGACT) unless defined explicitly. This is mutually exclusive with manually defined adapters and adapter files.

required

False

choices

  • Illumina: --illumina

  • Nextera: --nextera

  • Illumina small RNA: --small_rna
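The universal adapter presets can be pictured as a lookup table. The read 1 sequences below are the standard Trim Galore presets (13 bp Illumina, 12 bp Nextera, 12 bp small RNA); the small RNA read 2 adapter and the 18 bp length come from the description above. The table layout and function are hypothetical, and an explicitly defined read 2 adapter (which the description says takes precedence) is not modeled.

```python
from typing import Dict, Optional, Tuple

# choice -> (read 1 adapter, read 2 adapter set by the preset, length override)
UNIVERSAL_ADAPTERS: Dict[str, Tuple[str, Optional[str], Optional[int]]] = {
    "--illumina": ("AGATCGGAAGAGC", None, None),
    "--nextera": ("CTGTCTCTTATA", None, None),
    "--small_rna": ("TGGAATTCTCGG", "GATCGTCGGACT", 18),
}


def adapter_settings(choice: str, min_length: int = 20) -> Tuple[str, Optional[str], int]:
    """Return (adapter 1, adapter 2, effective minimum length) for a preset (sketch)."""
    adapter_1, adapter_2, length_override = UNIVERSAL_ADAPTERS[choice]
    effective_length = length_override if length_override is not None else min_length
    return adapter_1, adapter_2, effective_length
```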

options_trimming.adapter_trim.stringency
label

Overlap with adapter sequence required to trim

type

basic:integer

description

Defaults to a very stringent setting of 1, i.e. even a single base pair of overlapping sequence will be trimmed off the 3’ end of any read.

default

1
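To make the stringency setting concrete, here is a minimal sketch of exact-match 3’ adapter trimming with a minimum-overlap threshold. It assumes no mismatches or indels and a single adapter, so it is an illustration of the idea rather than the trimming tool's implementation; `trim_3prime_adapter` and the adapter string are hypothetical.

```python
def trim_3prime_adapter(read: str, adapter: str, min_overlap: int = 1) -> str:
    """Remove a 3' adapter from `read` using exact matching only.

    A full adapter occurrence anywhere in the read removes it and everything
    after it. Otherwise the longest read suffix that equals an adapter prefix
    of at least `min_overlap` bases is removed (the "stringency" setting).
    """
    min_overlap = max(1, min_overlap)
    pos = read.find(adapter)
    if pos != -1:
        return read[:pos]
    # Try the longest possible partial overlap first.
    longest = min(len(read), len(adapter) - 1)
    for k in range(longest, min_overlap - 1, -1):
        if read.endswith(adapter[:k]):
            return read[:-k]
    return read


adapter = "AGATCGGAAGAGC"  # example sequence chosen for illustration
print(trim_3prime_adapter("ACGTACGTA", adapter, min_overlap=1))  # ACGTACGT
print(trim_3prime_adapter("ACGTACGTA", adapter, min_overlap=3))  # ACGTACGTA
```

With a stringency of 1, the single trailing A is treated as the start of the adapter and removed; raising the threshold to 3 leaves the read intact.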

options_trimming.adapter_trim.error_rate
label

Maximum allowed error rate

type

basic:decimal

description

The number of errors divided by the length of the matching region. The default is 0.1.

default

0.1
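The error-rate definition can be checked directly. This small sketch only applies the ratio stated above (errors divided by match length) against the threshold; how the trimming tool counts errors during alignment is not modeled, and `within_error_rate` is a hypothetical helper.

```python
def within_error_rate(errors: int, match_length: int,
                      max_error_rate: float = 0.1) -> bool:
    """True if errors / match_length does not exceed max_error_rate."""
    if match_length <= 0:
        raise ValueError("match_length must be positive")
    return errors / match_length <= max_error_rate


print(within_error_rate(1, 10))  # True: 1/10 = 0.1
print(within_error_rate(1, 9))   # False: 1/9 is about 0.111
```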

options_trimming.hard_trim.trim_5
label

Hard trim sequence from 3’ end

type

basic:integer

description

Instead of performing adapter-/quality trimming, this option will simply hard-trim sequences to the specified number of bp from the 3’ end. This is incompatible with other hard trimming options.

required

False

options_trimming.hard_trim.trim_3
label

Hard trim sequences from 5’ end

type

basic:integer

description

Instead of performing adapter-/quality trimming, this option will simply hard-trim sequences to the specified number of bp from the 5’ end. This is incompatible with other hard trimming options.

required

False

options_aln_species.genome
label

Species genome

type

data:index:bowtie2

options_aln_species.mode
label

Alignment mode

type

basic:string

description

End to end: Bowtie 2 requires that the entire read align from one end to the other, without any trimming (or “soft clipping”) of characters from either end. Local: Bowtie 2 does not require that the entire read align from one end to the other. Rather, some characters may be omitted (“soft clipped”) from the ends in order to achieve the greatest possible alignment score.

default

--local

choices

  • end to end mode: --end-to-end

  • local: --local

options_aln_species.speed
label

Speed vs. Sensitivity

type

basic:string

description

A quick setting for aligning fast or accurately. Each option is a shortcut for the following Bowtie 2 parameters.

For --end-to-end:

  • --very-fast: -D 5 -R 1 -N 0 -L 22 -i S,0,2.50

  • --fast: -D 10 -R 2 -N 0 -L 22 -i S,0,2.50

  • --sensitive: -D 15 -R 2 -N 0 -L 22 -i S,1,1.15 (Bowtie 2 default)

  • --very-sensitive: -D 20 -R 3 -N 0 -L 20 -i S,1,0.50

For --local:

  • --very-fast-local: -D 5 -R 1 -N 0 -L 25 -i S,1,2.00

  • --fast-local: -D 10 -R 2 -N 0 -L 22 -i S,1,1.75

  • --sensitive-local: -D 15 -R 2 -N 0 -L 20 -i S,1,0.75 (Bowtie 2 default)

  • --very-sensitive-local: -D 20 -R 3 -N 0 -L 20 -i S,1,0.50

default

--very-sensitive

choices

  • Very fast: --very-fast

  • Fast: --fast

  • Sensitive: --sensitive

  • Very sensitive: --very-sensitive

options_aln_species.discordantly
label

Report discordantly matched read

type

basic:boolean

description

If both mates have unique alignments, but the alignments do not match paired-end expectations (orientation and relative distance) then alignment will be reported. Useful for detecting structural variations.

default

True

options_aln_species.rep_se
label

Report single ended

type

basic:boolean

description

If a paired alignment cannot be found, Bowtie2 tries to find alignments for the individual mates. Default is true (--no-mixed).

default

True

options_aln_species.minins
label

Minimal distance

type

basic:integer

description

The minimum fragment length (--minins) for valid paired-end alignments. Value 0 imposes no minimum.

default

10

options_aln_species.maxins
label

Maximal distance

type

basic:integer

description

The maximum fragment length (--maxins) for valid paired-end alignments.

default

700

options_aln_species.no_overlap
label

Not concordant when mates overlap

type

basic:boolean

description

When true, mates that overlap at all are not considered concordant (--no-overlap).

default

False

options_aln_species.dovetail
label

Dovetail

type

basic:boolean

description

If the mates “dovetail”, that is, if one mate alignment extends past the beginning of the other such that the wrong mate begins upstream, consider that to be concordant. Default: mates cannot dovetail in a concordant alignment. If true, parameter --dovetail is turned on.

default

False
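The minimal/maximal distance, overlap and dovetail options jointly decide whether a pair counts as concordant. The sketch below summarises that as a predicate, assuming the caller already knows the fragment length and whether the mates overlap or dovetail; orientation checks and other Bowtie 2 rules are omitted, and `PairLayout` and `is_concordant` are hypothetical names, not Bowtie 2 code.

```python
from dataclasses import dataclass


@dataclass
class PairLayout:
    fragment_length: int  # distance spanned by both mates
    mates_overlap: bool   # the two alignments share at least one position
    dovetails: bool       # one mate extends past the start of the other


def is_concordant(pair: PairLayout, minins: int = 10, maxins: int = 700,
                  no_overlap: bool = False, dovetail: bool = False) -> bool:
    # Fragment length must fall within [minins, maxins]; minins=0 means no minimum.
    if not (minins <= pair.fragment_length <= maxins):
        return False
    # With no_overlap enabled, any overlap between mates breaks concordance.
    if no_overlap and pair.mates_overlap:
        return False
    # Dovetailing mates are concordant only when explicitly allowed.
    if pair.dovetails and not dovetail:
        return False
    return True


print(is_concordant(PairLayout(300, False, False)))  # True
print(is_concordant(PairLayout(800, False, False)))  # False: above maxins
```

The defaults mirror the species-alignment fields above (10, 700, no-overlap off, dovetail off).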

options_aln_species.no_unal
label

Suppress SAM records for unaligned reads

type

basic:boolean

description

When true, suppress SAM records for unaligned reads. Default is true (--no-unal).

default

True

options_aln_spikein.genome
label

Spike-in genome

type

data:index:bowtie2

options_aln_spikein.mode
label

Alignment mode

type

basic:string

description

End to end: Bowtie 2 requires that the entire read align from one end to the other, without any trimming (or “soft clipping”) of characters from either end. Local: Bowtie 2 does not require that the entire read align from one end to the other. Rather, some characters may be omitted (“soft clipped”) from the ends in order to achieve the greatest possible alignment score.

default

--local

choices

  • end to end mode: --end-to-end

  • local: --local

options_aln_spikein.speed
label

Speed vs. Sensitivity

type

basic:string

description

A quick setting for aligning fast or accurately. Each option is a shortcut for the following Bowtie 2 parameters.

For --end-to-end:

  • --very-fast: -D 5 -R 1 -N 0 -L 22 -i S,0,2.50

  • --fast: -D 10 -R 2 -N 0 -L 22 -i S,0,2.50

  • --sensitive: -D 15 -R 2 -N 0 -L 22 -i S,1,1.15 (Bowtie 2 default)

  • --very-sensitive: -D 20 -R 3 -N 0 -L 20 -i S,1,0.50

For --local:

  • --very-fast-local: -D 5 -R 1 -N 0 -L 25 -i S,1,2.00

  • --fast-local: -D 10 -R 2 -N 0 -L 22 -i S,1,1.75

  • --sensitive-local: -D 15 -R 2 -N 0 -L 20 -i S,1,0.75 (Bowtie 2 default)

  • --very-sensitive-local: -D 20 -R 3 -N 0 -L 20 -i S,1,0.50

default

--very-sensitive

choices

  • Very fast: --very-fast

  • Fast: --fast

  • Sensitive: --sensitive

  • Very sensitive: --very-sensitive

options_aln_spikein.discordantly
label

Report discordantly matched read

type

basic:boolean

description

If both mates have unique alignments, but the alignments do not match paired-end expectations (orientation and relative distance) then alignment will be reported. Useful for detecting structural variations.

default

True

options_aln_spikein.rep_se
label

Report single ended

type

basic:boolean

description

If a paired alignment cannot be found, Bowtie2 tries to find alignments for the individual mates. Default is true (--no-mixed).

default

True

options_aln_spikein.minins
label

Minimal distance

type

basic:integer

description

The minimum fragment length (--minins) for valid paired-end alignments. Value 0 imposes no minimum.

default

10

options_aln_spikein.maxins
label

Maximal distance

type

basic:integer

description

The maximum fragment length (--maxins) for valid paired-end alignments.

default

700

options_aln_spikein.no_overlap
label

Not concordant when mates overlap

type

basic:boolean

description

When true, mates that overlap at all are not considered concordant. Default is true (--no-overlap).

default

True

options_aln_spikein.dovetail
label

Dovetail

type

basic:boolean

description

If the mates “dovetail”, that is, if one mate alignment extends past the beginning of the other such that the wrong mate begins upstream, consider that to be concordant. Default: mates cannot dovetail in a concordant alignment. If true, parameter --dovetail is turned on.

default

False

options_aln_spikein.no_unal
label

Suppress SAM records for unaligned reads

type

basic:boolean

description

When true, suppress SAM records for unaligned reads. Default is true (--no-unal).

default

True

options_pc.format
label

Format of tag file

type

basic:string

description

This specifies the format of the input files. For paired-end data, the format dictates how MACS2 treats mates. If the selected format is BAM, MACS2 keeps only the left mate (5’ end) tag. When BAMPE is selected, MACS2 uses the actual insert sizes of read pairs to build the fragment pileup, instead of building a bimodal distribution of plus- and minus-strand reads to predict the fragment size.

required

False

default

BAMPE

choices

  • BAM: BAM

  • BAMPE: BAMPE

options_pc.pvalue
label

P-value cutoff

type

basic:decimal

description

The p-value cutoff.

required

False

default

0.001

options_pc.duplicates
label

Number of duplicates

type

basic:string

description

Controls how MACS treats duplicate tags at the exact same location, i.e. the same coordinates and the same strand. The ‘auto’ option makes MACS calculate the maximum number of tags at the exact same location based on the binomial distribution, using 1e-5 as the p-value cutoff; the ‘all’ option keeps all tags. If an integer is given, at most that number of tags is kept at the same location. MACS’s own default is to keep one tag at the same location; this process defaults to ‘all’.

default

all

choices

  • 1: 1

  • auto: auto

  • all: all

options_pc.bedgraph
label

Save fragment pileup and control lambda

type

basic:boolean

description

If this flag is on, MACS stores the fragment pileup, control lambda, -log10(pvalue) and -log10(qvalue) scores in bedGraph files. The bedGraph files are stored in the current directory and named NAME+’_treat_pileup.bdg’ for treatment data, NAME+’_control_lambda.bdg’ for local lambda values from control, NAME+’_treat_pvalue.bdg’ for Poisson p-value scores (in -log10(pvalue) form), and NAME+’_treat_qvalue.bdg’ for q-value scores from the Benjamini-Hochberg-Yekutieli procedure.

default

True

options_sieve.min_frag_length
label

Minimum fragment length

type

basic:integer

description

The minimum fragment length needed for read/pair inclusion. This option is primarily useful in ATACseq experiments, for filtering mono- or di-nucleosome fragments. Default is 0.

default

0

options_sieve.max_frag_length
label

Maximum fragment length

type

basic:integer

description

The maximum fragment length needed for read/pair inclusion. A value of 0 indicates no limit. Default is 0.

default

0
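The two fragment-length bounds combine into a simple inclusion test. This sketch applies the rules stated in the field descriptions (inclusive bounds, and 0 meaning no upper limit); taking the absolute value reflects that SAM template lengths are negative for one mate. `keep_fragment` is a hypothetical helper, and 120 bp below is only an illustrative cutoff.

```python
def keep_fragment(fragment_length: int, min_frag_length: int = 0,
                  max_frag_length: int = 0) -> bool:
    """Keep a read/pair when its fragment length lies within the bounds.

    A max_frag_length of 0 means there is no upper limit.
    """
    length = abs(fragment_length)  # SAM TLEN is negative for the reverse mate
    if length < min_frag_length:
        return False
    if max_frag_length and length > max_frag_length:
        return False
    return True


print([keep_fragment(f, 0, 120) for f in (80, 150, -100)])  # [True, False, True]
```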

options_scale.scale
label

Scale factor

type

basic:decimal

description

Magnitude of the scale factor. The scaling factor is calculated by dividing the scale by the number of features in the BEDPE file (scale / number of features).

default

10000
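As a worked example of the scaling formula, the sketch below counts BEDPE records and divides the scale by that count. Treating blank lines and lines starting with ‘#’ as non-features is an assumption of this sketch, and both function names are hypothetical.

```python
from typing import Iterable


def count_bedpe_features(lines: Iterable[str]) -> int:
    """Count BEDPE records, skipping blank lines and '#' comment/header lines."""
    return sum(1 for line in lines if line.strip() and not line.startswith("#"))


def scale_factor(scale: float, n_features: int) -> float:
    """Scaling factor for this field: scale / (number of features)."""
    if n_features <= 0:
        raise ValueError("the BEDPE file contains no features")
    return scale / n_features


records = ["chr1\t100\t150\tchr1\t300\t350\n"] * 4
print(scale_factor(10000, count_bedpe_features(records)))  # 2500.0
```

With the default scale of 10000 and, say, two million fragments, the factor would be 0.005.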

options_misc.bw_binsize
label

BigWig bin size

type

basic:integer

description

Size of the bins, in bases, for the output of the bigwig/bedgraph file. Default is 50.

default

50

options_misc.bw_timeout
label

BigWig timeout

type

basic:integer

description

Number of seconds before calculation of BigWig file is aborted. Default is 3600 seconds (1 hour).

default

3600

Output results

Cutadapt (3’ mRNA-seq, single-end)

data:reads:fastq:single:cutadapt:cutadapt-3prime-single (data:reads:fastq:single  reads, basic:integer  nextseq_trim, basic:integer  quality_cutoff, basic:integer  min_len, basic:integer  min_overlap, basic:integer  times)[Source: v1.2.1]

Process 3’ mRNA-seq datasets using the Cutadapt tool.

Input arguments

reads
label

Select sample(s)

type

data:reads:fastq:single

required

True

hidden

False

options.nextseq_trim
label

NextSeq/NovaSeq trim

type

basic:integer

description

NextSeq/NovaSeq-specific quality trimming. Also trims dark cycles appearing as high-quality G bases. This option is mutually exclusive with standard quality-cutoff trimming and is suitable for data generated by recent Illumina machines that use two-color chemistry to encode the four bases.

required

True

hidden

False

default

10
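Cutadapt's documentation describes quality trimming as a BWA-style partial-sum algorithm, and, as we understand it, NextSeq trimming differs only in that G bases are scored just below the cutoff. The sketch below follows that description under the assumption of Phred+33 qualities; it is a simplified illustration, not Cutadapt's source, and `quality_trim_index` is a hypothetical name.

```python
def quality_trim_index(seq: str, quals: str, cutoff: int,
                       nextseq: bool = False, base: int = 33) -> int:
    """Return the index at which to cut the read (keep seq[:index]).

    Scans from the 3' end, summing (cutoff - quality); the read is cut where
    this running sum is largest, stopping once the sum turns negative.
    With nextseq=True every G is scored as quality cutoff - 1, so trailing
    runs of G (dark cycles) are trimmed even when reported as high quality.
    """
    running = 0
    best = 0
    cut = len(quals)
    for i in range(len(quals) - 1, -1, -1):
        q = ord(quals[i]) - base
        if nextseq and seq[i] == "G":
            q = cutoff - 1
        running += cutoff - q
        if running < 0:
            break
        if running > best:
            best = running
            cut = i
    return cut


seq, quals = "ACGTGGGG", "IIIIIIII"  # 'I' encodes Phred 40
print(seq[:quality_trim_index(seq, quals, cutoff=20)])                # ACGTGGGG
print(seq[:quality_trim_index(seq, quals, cutoff=20, nextseq=True)])  # ACGT
```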

options.quality_cutoff
label

Quality cutoff

type

basic:integer

description

Trim low-quality bases from the 3’ end of each read before adapter removal. Using this option overrides the NextSeq/NovaSeq trim option.

required

False

hidden

False

options.min_len
label

Discard reads shorter than specified minimum length.

type

basic:integer

required

True

hidden

False

default

20

options.min_overlap
label

Minimum overlap

type

basic:integer

description

Minimum overlap between adapter and read for an adapter to be found.

required

True

hidden

False

default

20

options.times
label

Remove up to a specified number of adapters from each read.

type

basic:integer

required

True

hidden

False

default

2

Output results

fastq
label

Reads file.

type

list:basic:file

required

True

hidden

False

report
label

Cutadapt report

type

basic:file

required

True

hidden

False

fastqc_url
label

Quality control with FastQC.

type

list:basic:file:html

required

True

hidden

False

fastqc_archive
label

Download FastQC archive.

type

list:basic:file

required

True

hidden

False

Cutadapt (Corall RNA-Seq, paired-end)

data:reads:fastq:paired:cutadapt:cutadapt-corall-paired (data:reads:fastq:paired  reads, basic:integer  nextseq_trim, basic:integer  quality_cutoff, basic:integer  min_len, basic:integer  min_overlap)[Source: v1.1.2]

Pre-process reads obtained using CORALL Total RNA-Seq Library Prep Kit. Trim UMI-tags from input reads and use Cutadapt to remove adapters and run QC filtering steps.

Input arguments

reads
label

Select sample(s)

type

data:reads:fastq:paired

required

True

hidden

False

options.nextseq_trim
label

NextSeq/NovaSeq trim

type

basic:integer

description

NextSeq/NovaSeq-specific quality trimming. Also trims dark cycles appearing as high-quality G bases. This option is mutually exclusive with standard quality-cutoff trimming and is suitable for data generated by recent Illumina machines that use two-color chemistry to encode the four bases.

required

True

hidden

False

default

10

options.quality_cutoff
label

Quality cutoff

type

basic:integer

description

Trim low-quality bases from the 3’ end of each read before adapter removal. Using this option overrides the NextSeq/NovaSeq trim option.

required

False

hidden

False

options.min_len
label

Minimum read length

type

basic:integer

required

True

hidden

False

default

20

options.min_overlap
label

Minimum overlap

type

basic:integer

description

Minimum overlap between adapter and read for an adapter to be found.

required

True

hidden

False

default

20

Output results

fastq
label

Remaining mate1 reads

type

list:basic:file

required

True

hidden

False

fastq2
label

Remaining mate2 reads

type

list:basic:file

required

True

hidden

False

report
label

Cutadapt report

type

basic:file

required

True

hidden

False

fastqc_url
label

Mate1 quality control with FastQC

type

list:basic:file:html

required

True

hidden

False

fastqc_url2
label

Mate2 quality control with FastQC

type

list:basic:file:html

required

True

hidden

False

fastqc_archive
label

Download mate1 FastQC archive

type

list:basic:file

required

True

hidden

False

fastqc_archive2
label

Download mate2 FastQC archive

type

list:basic:file

required

True

hidden

False

Cutadapt (Corall RNA-Seq, single-end)

data:reads:fastq:single:cutadapt:cutadapt-corall-single (data:reads:fastq:single  reads, basic:integer  nextseq_trim, basic:integer  quality_cutoff, basic:integer  min_len, basic:integer  min_overlap)[Source: v1.2.1]

Pre-process reads obtained using CORALL Total RNA-Seq Library Prep Kit. Trim UMI-tags from input reads and use Cutadapt to remove adapters and run QC filtering steps.

Input arguments

reads
label

Select sample(s)

type

data:reads:fastq:single

required

True

hidden

False

options.nextseq_trim
label

NextSeq/NovaSeq trim

type

basic:integer

description

NextSeq/NovaSeq-specific quality trimming. Also trims dark cycles appearing as high-quality G bases. This option is mutually exclusive with standard quality-cutoff trimming and is suitable for data generated by recent Illumina machines that use two-color chemistry to encode the four bases.

required

True

hidden

False

default

10

options.quality_cutoff
label

Quality cutoff

type

basic:integer

description

Trim low-quality bases from the 3’ end of each read before adapter removal. Using this option overrides the NextSeq/NovaSeq trim option.

required

False

hidden

False

options.min_len
label

Minimum read length

type

basic:integer

required

True

hidden

False

default

20

options.min_overlap
label

Minimum overlap

type

basic:integer

description

Minimum overlap between adapter and read for an adapter to be found.

required

True

hidden

False

default

20

Output results

fastq
label

Reads file

type

list:basic:file

required

True

hidden

False

report
label

Cutadapt report

type

basic:file

required

True

hidden

False

fastqc_url
label

Quality control with FastQC

type

list:basic:file:html

required

True

hidden

False

fastqc_archive
label

Download FastQC archive

type

list:basic:file

required

True

hidden

False

Cutadapt (Diagenode CATS, paired-end)

data:reads:fastq:paired:cutadaptcutadapt-custom-paired (data:reads:fastq:paired  reads)[Source: v1.3.1]

Cutadapt process configured to be used with the Diagenode CATS kits.

Input arguments

reads
label

NGS reads

type

data:reads:fastq:paired

Output results

fastq
label

Reads file (forward)

type

list:basic:file

fastq2
label

Reads file (reverse)

type

list:basic:file

report
label

Cutadapt report

type

basic:file

fastqc_url
label

Quality control with FastQC (forward)

type

list:basic:file:html

fastqc_url2
label

Quality control with FastQC (reverse)

type

list:basic:file:html

fastqc_archive
label

Download FastQC archive (forward)

type

list:basic:file

fastqc_archive2
label

Download FastQC archive (reverse)

type

list:basic:file

Cutadapt (Diagenode CATS, single-end)

data:reads:fastq:single:cutadaptcutadapt-custom-single (data:reads:fastq:single  reads)[Source: v1.3.1]

Cutadapt process configured to be used with the Diagenode CATS kits.

Input arguments

reads
label

NGS reads

type

data:reads:fastq:single

Output results

fastq
label

Reads file

type

list:basic:file

report
label

Cutadapt report

type

basic:file

fastqc_url
label

Quality control with FastQC

type

list:basic:file:html

fastqc_archive
label

Download FastQC archive

type

list:basic:file

Cutadapt (paired-end)

data:reads:fastq:paired:cutadaptcutadapt-paired (data:reads:fastq:paired  reads, data:seq:nucleotide  mate1_5prime_file, data:seq:nucleotide  mate1_3prime_file, data:seq:nucleotide  mate2_5prime_file, data:seq:nucleotide  mate2_3prime_file, list:basic:string  mate1_5prime_seq, list:basic:string  mate1_3prime_seq, list:basic:string  mate2_5prime_seq, list:basic:string  mate2_3prime_seq, basic:integer  times, basic:decimal  error_rate, basic:integer  min_overlap, basic:boolean  match_read_wildcards, basic:integer  nextseq_trim, basic:integer  leading, basic:integer  trailing, basic:integer  crop, basic:integer  headcrop, basic:integer  minlen, basic:integer  max_n, basic:string  pair_filter)[Source: v2.4.1]

Cutadapt finds and removes adapter sequences, primers, poly-A tails and other types of unwanted sequence from high-throughput sequencing reads. More information about Cutadapt can be found [here](http://cutadapt.readthedocs.io/en/stable/).

Input arguments

reads
label

Select sample(s)

type

data:reads:fastq:paired

adapters.mate1_5prime_file
label

5 prime adapter file for Mate 1

type

data:seq:nucleotide

required

False

adapters.mate1_3prime_file
label

3 prime adapter file for Mate 1

type

data:seq:nucleotide

required

False

adapters.mate2_5prime_file
label

5 prime adapter file for Mate 2

type

data:seq:nucleotide

required

False

adapters.mate2_3prime_file
label

3 prime adapter file for Mate 2

type

data:seq:nucleotide

required

False

adapters.mate1_5prime_seq
label

5 prime adapter sequence for Mate 1

type

list:basic:string

required

False

adapters.mate1_3prime_seq
label

3 prime adapter sequence for Mate 1

type

list:basic:string

required

False

adapters.mate2_5prime_seq
label

5 prime adapter sequence for Mate 2

type

list:basic:string

required

False

adapters.mate2_3prime_seq
label

3 prime adapter sequence for Mate 2

type

list:basic:string

required

False

adapters.times
label

Times

type

basic:integer

description

Remove up to the specified number of adapters from each read.

default

1

adapters.error_rate
label

Error rate

type

basic:decimal

description

Maximum allowed error rate (no. of errors divided by the length of the matching region).

default

0.1

adapters.min_overlap
label

Minimal overlap

type

basic:integer

description

Minimum overlap for an adapter match.

default

3

adapters.match_read_wildcards
label

Match read wildcards

type

basic:boolean

description

Interpret IUPAC wildcards in reads.

default

False
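The adapter fields map naturally onto Cutadapt's documented options (-g/-a for mate 1 5’/3’ adapters, -G/-A for mate 2, the "file:" prefix for FASTA adapter files, and --times, --error-rate, --overlap, --match-read-wildcards). The sketch below builds that argument list; whether this process assembles exactly this command is an assumption, and `adapter_args` and `ADAPTER_FLAGS` are hypothetical names.

```python
from typing import Dict, List, Optional, Sequence

# Cutadapt options per adapter position: -g / -a trim 5' / 3' adapters from
# mate 1, -G / -A do the same for mate 2.
ADAPTER_FLAGS = {
    "mate1_5prime": "-g",
    "mate1_3prime": "-a",
    "mate2_5prime": "-G",
    "mate2_3prime": "-A",
}


def adapter_args(seqs: Optional[Dict[str, Sequence[str]]] = None,
                 files: Optional[Dict[str, str]] = None,
                 times: int = 1, error_rate: float = 0.1, min_overlap: int = 3,
                 match_read_wildcards: bool = False) -> List[str]:
    """Translate the adapter fields into Cutadapt command-line arguments."""
    seqs = seqs or {}
    files = files or {}
    args: List[str] = []
    for key, flag in ADAPTER_FLAGS.items():
        for seq in seqs.get(key, ()):
            args += [flag, seq]
        if key in files:
            # Cutadapt reads adapter sequences from a FASTA file given as "file:PATH".
            args += [flag, f"file:{files[key]}"]
    args += ["--times", str(times), "--error-rate", str(error_rate),
             "--overlap", str(min_overlap)]
    if match_read_wildcards:
        args.append("--match-read-wildcards")
    return args


print(adapter_args({"mate1_3prime": ["AGATCGGAAGAGC"]}, {"mate2_3prime": "r2_adapters.fa"}))
```

The resulting list could then be combined with -o/-p output paths and the two input FASTQ files and passed to `subprocess.run(["cutadapt", *args, ...])`.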

modify_reads.nextseq_trim
label

NextSeq-specific quality trimming

type

basic:integer

description

NextSeq-specific quality trimming (each read). Also trims dark cycles appearing as high-quality G bases. This option is mutually exclusive with regular (-q) quality trimming.

required

False

modify_reads.leading
label

Quality on 5 prime

type

basic:integer

description

Remove low-quality bases from the 5 prime end. Specifies the minimum quality required to keep a base.

required

False

modify_reads.trailing
label

Quality on 3 prime

type

basic:integer

description

Remove low-quality bases from the 3 prime end. Specifies the minimum quality required to keep a base.

required

False

modify_reads.crop
label

Crop

type

basic:integer

description

Cut the specified number of bases from the end of the reads.

required

False

modify_reads.headcrop
label

Headcrop

type

basic:integer

description

Cut the specified number of bases from the start of the reads.

required

False

filtering.minlen
label

Min length

type

basic:integer

description

Drop the read if it is below a specified length.

required

False

filtering.max_n
label

Max number of Ns

type

basic:integer

description

Discard reads having more ‘N’ bases than specified.

required

False

filtering.pair_filter
label

Which of the reads have to match the filtering criterion

type

basic:string

description

Which of the reads in a paired-end read have to match the filtering criterion in order for the pair to be filtered.

default

any

choices

  • Any of the reads in a paired-end read have to match the filtering criterion: any

  • Both of the reads in a paired-end read have to match the filtering criterion: both
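The pair filter decides how per-read failures propagate to the pair. This sketch applies the minimum-length and maximum-N rules described above to each mate and then combines the results with ‘any’ or ‘both’; it illustrates the semantics only and is not Cutadapt's code. `read_fails` and `discard_pair` are hypothetical helpers.

```python
from typing import Optional


def read_fails(seq: str, minlen: Optional[int] = None,
               max_n: Optional[int] = None) -> bool:
    """True if a single read matches a filtering criterion."""
    if minlen is not None and len(seq) < minlen:
        return True  # shorter than the minimum length
    if max_n is not None and seq.upper().count("N") > max_n:
        return True  # more 'N' bases than allowed
    return False


def discard_pair(r1: str, r2: str, pair_filter: str = "any", **criteria) -> bool:
    """'any' drops the pair if either mate fails; 'both' only if both mates fail."""
    fails = (read_fails(r1, **criteria), read_fails(r2, **criteria))
    if pair_filter == "any":
        return any(fails)
    if pair_filter == "both":
        return all(fails)
    raise ValueError(f"unknown pair_filter: {pair_filter!r}")


print(discard_pair("ACGT", "ACGTACGTAC", minlen=5))          # True
print(discard_pair("ACGT", "ACGTACGTAC", "both", minlen=5))  # False
```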

Output results

fastq
label

Reads file (forward)

type

list:basic:file

fastq2
label

Reads file (reverse)

type

list:basic:file

report
label

Cutadapt report

type

basic:file

fastqc_url
label

Quality control with FastQC (forward)

type

list:basic:file:html

fastqc_url2
label

Quality control with FastQC (reverse)

type

list:basic:file:html

fastqc_archive
label

Download FastQC archive (forward)

type

list:basic:file

fastqc_archive2
label

Download FastQC archive (reverse)

type

list:basic:file

Cutadapt (single-end)

data:reads:fastq:single:cutadaptcutadapt-single (data:reads:fastq:single  reads, data:seq:nucleotide  up_primers_file, data:seq:nucleotide  down_primers_file, list:basic:string  up_primers_seq, list:basic:string  down_primers_seq, basic:integer  polya_tail, basic:integer  min_overlap, basic:integer  nextseq_trim, basic:integer  leading, basic:integer  trailing, basic:integer  crop, basic:integer  headcrop, basic:integer  minlen, basic:integer  max_n, basic:boolean  match_read_wildcards, basic:integer  times, basic:decimal  error_rate)[Source: v2.2.1]

Cutadapt finds and removes adapter sequences, primers, poly-A tails and other types of unwanted sequence from high-throughput sequencing reads. More information about Cutadapt can be found [here](http://cutadapt.readthedocs.io/en/stable/).

Input arguments

reads
label

Select sample(s)

type

data:reads:fastq:single

adapters.up_primers_file
label

5 prime adapter file

type

data:seq:nucleotide

required

False

adapters.down_primers_file
label

3 prime adapter file

type

data:seq:nucleotide

required

False

adapters.up_primers_seq
label

5 prime adapter sequence

type

list:basic:string

required

False

adapters.down_primers_seq
label

3 prime adapter sequence

type

list:basic:string

required

False

adapters.polya_tail
label

Poly-A tail

type

basic:integer

description

Length of the poly-A tail, for example: AAAN -> 3, AAAAAN -> 5.

required

False

adapters.min_overlap
label

Minimal overlap

type

basic:integer

description

Minimum overlap for an adapter match

default

3

modify_reads.nextseq_trim
label

NextSeq-specific quality trimming

type

basic:integer

description

NextSeq-specific quality trimming (each read). Also trims dark cycles appearing as high-quality G bases. This option is mutually exclusive with regular (-q) quality trimming.

required

False

modify_reads.leading
label

Quality on 5 prime

type

basic:integer

description

Remove low-quality bases from the 5 prime end. Specifies the minimum quality required to keep a base. This option is mutually exclusive with NextSeq-specific quality trimming.

required

False

modify_reads.trailing
label

Quality on 3 prime

type

basic:integer

description

Remove low-quality bases from the 3 prime end. Specifies the minimum quality required to keep a base. This option is mutually exclusive with NextSeq-specific quality trimming.

required

False

modify_reads.crop
label

Crop

type

basic:integer

description

Cut the read to a specified length by removing bases from the end

required

False

modify_reads.headcrop
label

Headcrop

type

basic:integer

description

Cut the specified number of bases from the start of the read

required

False
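The four read-modification options of the single-end process can be sketched as index arithmetic on the read. The sketch follows the field descriptions above (quality thresholds on each end, crop to a length from the end, headcrop a number of bases from the start); the order in which the steps are applied and the Phred+33 encoding are assumptions, and `modify_read` is a hypothetical helper.

```python
from typing import Optional, Tuple


def modify_read(seq: str, quals: str, leading: Optional[int] = None,
                trailing: Optional[int] = None, crop: Optional[int] = None,
                headcrop: Optional[int] = None, base: int = 33) -> Tuple[str, str]:
    """Apply leading/trailing quality trimming, then crop, then headcrop."""
    q = [ord(c) - base for c in quals]
    start, end = 0, len(seq)
    if leading is not None:
        # Drop 5' bases whose quality is below the minimum required to keep them.
        while start < end and q[start] < leading:
            start += 1
    if trailing is not None:
        # Same rule applied from the 3' end.
        while end > start and q[end - 1] < trailing:
            end -= 1
    if crop is not None:
        # Cut the read to `crop` bases by removing bases from the end.
        end = min(end, start + crop)
    if headcrop is not None:
        # Remove `headcrop` bases from the start.
        start = min(end, start + headcrop)
    return seq[start:end], quals[start:end]


print(modify_read("ACGTACGTAC", "##IIIIII##", leading=3, trailing=3))  # ('GTACGT', 'IIIIII')
```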

filtering.minlen
label

Min length

type

basic:integer

description

Drop the read if it is below a specified length

required

False

filtering.max_n
label

Max number of Ns

type

basic:integer

description

Discard reads having more ‘N’ bases than specified.

required

False

filtering.match_read_wildcards
label

Match read wildcards

type

basic:boolean

description

Interpret IUPAC wildcards in reads.

required

False

default

False

filtering.times
label

Times

type

basic:integer

description

Remove up to the specified number of adapters from each read.

default

1

filtering.error_rate
label

Error rate

type

basic:decimal

description

Maximum allowed error rate (no. of errors divided by the length of the matching region).

default

0.1

Output results

fastq
label

Reads file

type

list:basic:file

report
label

Cutadapt report

type

basic:file

fastqc_url
label

Quality control with FastQC

type

list:basic:file:html

fastqc_archive
label

Download FastQC archive

type

list:basic:file

Cutadapt - STAR - FeatureCounts (3’ mRNA-Seq, single-end)

data:workflow:quant:featurecounts:singleworkflow-cutadapt-star-fc-quant-single (data:reads:fastq:single  reads, data:index:star  star_index, data:annotation  annotation, data:index:star  rrna_reference, data:index:star  globin_reference, basic:boolean  show_advanced, basic:integer  quality_cutoff, basic:integer  n_reads, basic:integer  seed, basic:decimal  fraction, basic:boolean  two_pass)[Source: v2.0.1]

This 3’ mRNA-Seq pipeline is comprised of QC, preprocessing, alignment and quantification steps. Reads are preprocessed by __Cutadapt__, which removes adapters, trims reads for quality from the 3’ end, and discards reads that are too short after trimming. Preprocessed reads are aligned by the __STAR__ aligner. For read-count quantification, the __FeatureCounts__ tool is used. QoRTs QC and Samtools idxstats tools are used to report alignment QC metrics. Additional QC steps operate on downsampled reads and include an alignment of input reads to the rRNA/globin reference sequences. The reported alignment rate is used to assess the rRNA/globin sequence depletion rate.

Input arguments

reads
label

Select sample(s)

type

data:reads:fastq:single

star_index
label

Genome

type

data:index:star

description

Genome index prepared by STAR aligner indexing tool.

annotation
label

Annotation

type

data:annotation

description

Genome annotation file (GTF).

rrna_reference
label

Indexed rRNA reference sequence

type

data:index:star

description

Reference sequence index prepared by STAR aligner indexing tool.

globin_reference
label

Indexed Globin reference sequence

type

data:index:star

description

Reference sequence index prepared by STAR aligner indexing tool.

show_advanced
label

Show advanced parameters

type

basic:boolean

default

False

cutadapt.quality_cutoff
label

Reads quality cutoff

type

basic:integer

description

Trim low-quality bases from the 3’ end of each read before adapter removal. Using this option overrides the NextSeq/NovaSeq-specific trim option.

required

False

downsampling.n_reads
label

Number of reads

type

basic:integer

default

1000000

downsampling.seed
label

Seed

type

basic:integer

default

11

downsampling.fraction
label

Fraction

type

basic:decimal

description

Use the fraction of reads in range [0.0, 1.0] from the original input file instead of the absolute number of reads. If set, this will override the “Number of reads” input parameter.

required

False

downsampling.two_pass
label

2-pass mode

type

basic:boolean

description

Enable two-pass mode when down-sampling. Two-pass mode is twice as slow but uses much less memory.

default

False
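The "Number of reads" and "Fraction" inputs correspond to two classic sampling schemes: a fixed-size uniform sample (reservoir sampling) or keeping each read independently with a given probability. The sketch below illustrates both with a seeded generator. It will not select the same reads as the pipeline's sampling tool for the same seed, and `downsample` is a hypothetical helper.

```python
import random
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")


def downsample(reads: Iterable[T], n_reads: int = 1_000_000, seed: int = 11,
               fraction: Optional[float] = None) -> List[T]:
    rng = random.Random(seed)
    if fraction is not None:
        # Fraction mode overrides n_reads: keep each read with probability `fraction`.
        if not 0.0 <= fraction <= 1.0:
            raise ValueError("fraction must be in [0.0, 1.0]")
        return [r for r in reads if rng.random() < fraction]
    # Reservoir sampling (Algorithm R): a uniform sample of n_reads in one pass.
    reservoir: List[T] = []
    for i, r in enumerate(reads):
        if i < n_reads:
            reservoir.append(r)
        else:
            j = rng.randint(0, i)  # inclusive on both ends
            if j < n_reads:
                reservoir[j] = r
    return reservoir


sample = downsample(range(100), n_reads=10)
print(len(sample))  # 10
```

This sketch keeps the whole reservoir in memory; the two-pass option above trades speed for lower memory use.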

Output results

Cutadapt - STAR - FeatureCounts - basic QC (3’ mRNA-Seq, single-end)

data:workflow:quant:featurecounts:singleworkflow-cutadapt-star-fc-quant-wo-depletion-single (data:reads:fastq:single  reads, data:index:star  star_index, data:annotation  annotation, basic:boolean  show_advanced, basic:integer  quality_cutoff)[Source: v2.0.1]

This 3’ mRNA-Seq pipeline is comprised of QC, preprocessing, alignment and quantification steps. Reads are preprocessed by __Cutadapt__ which removes adapters, trims reads for quality from the 3’-end, and discards reads that are too short after trimming. Preprocessed reads are aligned by __STAR__ aligner. For read-count quantification, the __FeatureCounts__ tool is used. QoRTs QC and Samtools idxstats tools are used to report alignment QC metrics.

Input arguments

reads
label

Select sample(s)

type

data:reads:fastq:single

star_index
label

Genome

type

data:index:star

description

Genome index prepared by STAR aligner indexing tool.

annotation
label

Annotation

type

data:annotation

description

Genome annotation file (GTF).

show_advanced
label

Show advanced parameters

type

basic:boolean

default

False

cutadapt.quality_cutoff
label

Reads quality cutoff

type

basic:integer

description

Trim low-quality bases from the 3’ end of each read before adapter removal. Using this option overrides the NextSeq/NovaSeq-specific trim option.

required

False

Output results

Cutadapt - STAR - HTSeq-count (paired-end)

data:workflow:rnaseq:htseqworkflow-custom-cutadapt-star-htseq-paired (data:reads:fastq:paired  reads, data:index:star  genome, data:annotation:gtf  gff, basic:string  stranded, basic:boolean  advanced, basic:boolean  noncannonical, basic:boolean  chimeric, basic:integer  chimSegmentMin, basic:boolean  quantmode, basic:boolean  singleend, basic:boolean  gene_counts, basic:string  outFilterType, basic:integer  outFilterMultimapNmax, basic:integer  outFilterMismatchNmax, basic:decimal  outFilterMismatchNoverLmax, basic:integer  alignSJoverhangMin, basic:integer  alignSJDBoverhangMin, basic:integer  alignIntronMin, basic:integer  alignIntronMax, basic:integer  alignMatesGapMax, basic:string  mode, basic:string  feature_class, basic:string  id_attribute, basic:boolean  name_ordered)[Source: v2.0.1]

This RNA-seq pipeline consists of three steps: preprocessing, alignment, and quantification. First, reads are preprocessed by __cutadapt__, which finds and removes adapter sequences, primers, poly-A tails and other types of unwanted sequence from high-throughput sequencing reads. Next, preprocessed reads are aligned by the __STAR__ aligner. At the time of implementation, STAR was considered a state-of-the-art tool that consistently produces accurate results from diverse sets of reads and performs well even with default settings. For more information see [this comparison of RNA-seq aligners](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5792058/). Finally, aligned reads are summarized to genes by __HTSeq-count__. HTSeq-count is less computationally efficient than featureCounts. All three tools in this workflow support parallelization to accelerate the analysis.

Input arguments

reads
label

NGS reads

type

data:reads:fastq:paired

genome
label

Indexed reference genome

type

data:index:star

description

Genome index prepared by STAR aligner indexing tool

gff
label

Annotation (GFF)

type

data:annotation:gtf

stranded
label

Assay type

type

basic:string

description

In a strand non-specific assay, a read is considered to overlap a feature regardless of whether it is mapped to the same strand as the feature or to the opposite one. In a strand-specific forward assay, a single-end read has to be mapped to the same strand as the feature. For paired-end reads, the first read has to be on the same strand as the feature and the second read on the opposite strand. In a strand-specific reverse assay, these rules are reversed.

default

no

choices

  • Strand non-specific: no

  • Strand-specific forward: yes

  • Strand-specific reverse: reverse

advanced
label

Advanced

type

basic:boolean

default

False

star.noncannonical
label

Remove non-canonical junctions (Cufflinks compatibility)

type

basic:boolean

description

It is recommended to remove non-canonical junctions for Cufflinks runs using --outFilterIntronMotifs RemoveNoncanonical.

default

False

star.detect_chimeric.chimeric
label

Detect chimeric and circular alignments

type

basic:boolean

description

To switch on detection of chimeric (fusion) alignments (in addition to normal mapping), --chimSegmentMin should be set to a positive value. Each chimeric alignment consists of two “segments”. Each segment is non-chimeric on its own, but the segments are chimeric to each other (i.e. the segments belong to different chromosomes, are on different strands, or are far from each other). Both segments may contain splice junctions, and one of the segments may contain portions of both mates. The --chimSegmentMin parameter controls the minimum allowed mapped length of each of the two segments. For example, if you have 2x75 reads and use --chimSegmentMin 20, a chimeric alignment with 130b on one chromosome and 20b on the other will be output, while 135 + 15 won’t be.

default

False

star.detect_chimeric.chimSegmentMin
label

--chimSegmentMin

type

basic:integer

disabled

!star.detect_chimeric.chimeric

default

20
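Here is a minimal sketch of the --chimSegmentMin rule from the 2x75 example above. It assumes that only the segment-length criterion matters. STAR applies further chimeric filters that are not modeled here, and the function name is hypothetical.

```python
from typing import Tuple


def chimeric_alignment_reported(segment_lengths: Tuple[int, int], chim_segment_min: int) -> bool:
    """Return True if both segments of a chimeric alignment are long enough.

    Only the --chimSegmentMin length criterion is modeled; any other
    chimeric filters applied by STAR are ignored.
    """
    if chim_segment_min <= 0:
        # Chimeric detection is switched on only by a positive value.
        return False
    # Every segment must be at least chim_segment_min bases long.
    return min(segment_lengths) >= chim_segment_min


# 2x75 reads = 150 mapped bases split between two segments.
print(chimeric_alignment_reported((130, 20), 20))  # True
print(chimeric_alignment_reported((135, 15), 20))  # False
```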

star.t_coordinates.quantmode
label

Output in transcript coordinates

type

basic:boolean

description

With the --quantMode TranscriptomeSAM option, STAR outputs alignments translated into transcript coordinates in the Aligned.toTranscriptome.out.bam file (in addition to alignments in genomic coordinates in Aligned.*.sam/bam files). These transcriptomic alignments can be used with transcript quantification software that requires reads to be mapped to the transcriptome, such as RSEM or eXpress.

default

False

star.t_coordinates.singleend
label

Allow soft-clipping and indels

type

basic:boolean

description

By default, the output satisfies RSEM requirements: soft-clipping and indels are not allowed. Use --quantTranscriptomeBan Singleend to allow insertions, deletions and soft-clips in the transcriptomic alignments, which can be used by some expression quantification software (e.g. eXpress).

disabled

!star.t_coordinates.quantmode

default

False

star.t_coordinates.gene_counts
label

Count reads

type

basic:boolean

description

With the --quantMode GeneCounts option, STAR counts the number of reads per gene while mapping. A read is counted if it overlaps (by 1 nt or more) one and only one gene. Both ends of a paired-end read are checked for overlaps. The counts coincide with those produced by htseq-count with default parameters. The counts are written to the ReadsPerGene.out.tab file, which has 4 columns corresponding to different strandedness options: column 1: gene ID; column 2: counts for unstranded RNA-seq; column 3: counts for the 1st read strand aligned with RNA (htseq-count option -s yes); column 4: counts for the 2nd read strand aligned with RNA (htseq-count option -s reverse).

disabled

!star.t_coordinates.quantmode

default

False
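When read counting is enabled, the column to use from ReadsPerGene.out.tab depends on the assay type. The sketch below shows a reader that picks the matching column. It assumes a tab-separated file whose first rows are STAR's summary counters (N_unmapped, N_multimapping, N_noFeature, N_ambiguous). `read_gene_counts` is a hypothetical name and is not part of the workflow.

```python
import io
from typing import Dict, TextIO

# 0-based column holding the counts for each assay type ("stranded" choice).
COLUMN_FOR_STRANDED = {"no": 1, "yes": 2, "reverse": 3}

# Summary rows that STAR writes at the top of ReadsPerGene.out.tab.
SUMMARY_ROWS = {"N_unmapped", "N_multimapping", "N_noFeature", "N_ambiguous"}


def read_gene_counts(handle: TextIO, stranded: str = "no") -> Dict[str, int]:
    """Return {gene_id: count} for the column matching the assay type."""
    column = COLUMN_FOR_STRANDED[stranded]
    counts: Dict[str, int] = {}
    for line in handle:
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 4 or fields[0] in SUMMARY_ROWS:
            continue  # skip blank/malformed lines and summary counters
        counts[fields[0]] = int(fields[column])
    return counts


example = io.StringIO(
    "N_unmapped\t5\t5\t5\n"
    "GENE1\t10\t1\t9\n"
    "GENE2\t4\t4\t0\n"
)
print(read_gene_counts(example, stranded="reverse"))  # {'GENE1': 9, 'GENE2': 0}
```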

star.filtering.outFilterType
label

Type of filtering

type

basic:string

description

Normal: standard filtering using only the current alignment; BySJout: keep only those reads that contain junctions that passed filtering into SJ.out.tab.

default

Normal

choices

  • Normal: Normal

  • BySJout: BySJout

star.filtering.outFilterMultimapNmax
label

--outFilterMultimapNmax

type

basic:integer

description

Read alignments will be output only if the read maps to no more loci than this value; otherwise, no alignments will be output (default: 10).

required

False

star.filtering.outFilterMismatchNmax
label

--outFilterMismatchNmax

type

basic:integer

description

An alignment will be output only if it has no more mismatches than this value (default: 10).

required

False

star.filtering.outFilterMismatchNoverLmax
label

--outFilterMismatchNoverLmax

type

basic:decimal

description

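Maximum number of mismatches relative to the mapped length: an alignment is output only if its ratio of mismatches to mapped length is less than or equal to this value. For example, for a fully mapped 2x100b read pair and a value of 0.04, the maximum number of mismatches is 0.04*200=8 for the pair.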

required

False
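The three filtering parameters can be read as a chain of checks applied to each read. The sketch below restates them and assumes the STAR manual defaults of 10, 10 and 0.3. `passes_star_filters` is a hypothetical helper, and the exact arithmetic inside STAR (for example, how the mapped length is computed) may differ.

```python
def passes_star_filters(
    n_loci: int,
    n_mismatches: int,
    mapped_length: int,
    multimap_nmax: int = 10,           # --outFilterMultimapNmax
    mismatch_nmax: int = 10,           # --outFilterMismatchNmax
    mismatch_nover_lmax: float = 0.3,  # --outFilterMismatchNoverLmax
) -> bool:
    """Simplified model of STAR's output filters for one read."""
    if n_loci > multimap_nmax:
        # Too many loci: none of the read's alignments are output.
        return False
    if n_mismatches > mismatch_nmax:
        return False
    # The ratio of mismatches to mapped length must not exceed the threshold.
    return n_mismatches / mapped_length <= mismatch_nover_lmax


# A fully mapped 2x100b pair (200 mapped bases) with a 0.04 ratio limit:
print(passes_star_filters(1, 8, 200, mismatch_nover_lmax=0.04))  # True
print(passes_star_filters(1, 9, 200, mismatch_nover_lmax=0.04))  # False
```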

star.alignment.alignSJoverhangMin
label

--alignSJoverhangMin

type

basic:integer

description

Minimum overhang (i.e. block size) for spliced alignments (default: 5).

required

False

star.alignment.alignSJDBoverhangMin
label

--alignSJDBoverhangMin

type

basic:integer

description

Minimum overhang (i.e. block size) for annotated (sjdb) spliced alignments (default: 3).

required

False

star.alignment.alignIntronMin
label

--alignIntronMin

type

basic:integer

description

Minimum intron size: a genomic gap is considered an intron if its length >= alignIntronMin; otherwise, it is considered a deletion (default: 21).

required

False

star.alignment.alignIntronMax
label

--alignIntronMax

type

basic:integer

description

Maximum intron size. If 0, the maximum intron size is determined by (2^winBinNbits)*winAnchorDistNbins (default: 0).

required

False
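The sketch below shows how --alignIntronMin and --alignIntronMax partition genomic gaps. The window parameters default to winBinNbits=16 and winAnchorDistNbins=9, which are assumed to be STAR's defaults. With these values, an alignIntronMax of 0 corresponds to a limit of 2^16 * 9 = 589,824 bases. The function names are hypothetical, and the "too long" outcome simplifies how STAR actually treats such gaps.

```python
def max_intron_size(
    align_intron_max: int = 0,
    win_bin_nbits: int = 16,
    win_anchor_dist_nbins: int = 9,
) -> int:
    """Effective maximum intron size (window defaults are assumptions)."""
    if align_intron_max > 0:
        return align_intron_max
    # 0 means: derive the limit from the windowing parameters.
    return (2 ** win_bin_nbits) * win_anchor_dist_nbins


def classify_gap(gap_length: int, align_intron_min: int = 21, align_intron_max: int = 0) -> str:
    """Classify a genomic gap in an alignment as deletion, intron or too long."""
    if gap_length < align_intron_min:
        return "deletion"
    if gap_length <= max_intron_size(align_intron_max):
        return "intron"
    return "too long"


print(max_intron_size())  # 589824
print(classify_gap(20))   # deletion
print(classify_gap(21))   # intron
```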

star.alignment.alignMatesGapMax
label

--alignMatesGapMax

type

basic:integer

description

Maximum gap between two mates. If 0, the maximum gap is determined by (2^winBinNbits)*winAnchorDistNbins (default: 0).

required

False

htseq.mode
label

Mode

type

basic:string

description

Mode to handle reads overlapping more than one feature. Possible values are union, intersection-strict and intersection-nonempty.

default

union

choices

  • union: union

  • intersection-strict: intersection-strict

  • intersection-nonempty: intersection-nonempty
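The three modes differ only in how the per-position feature sets of a read are combined. The sketch below follows the description of the modes in the HTSeq documentation. Each read position contributes the set of features it overlaps, the sets are combined according to the mode, and the read is counted only if exactly one feature remains. `assign_read` is a hypothetical name. Strandedness, alignment quality and multi-mapping handling are left out.

```python
from functools import reduce
from typing import List, Set


def assign_read(position_features: List[Set[str]], mode: str = "union") -> str:
    """Assign a read to a feature from the features overlapping each of its positions.

    position_features[i] is the set of features overlapping read position i.
    """
    if mode == "union":
        # Any feature touched by any position.
        features = set().union(*position_features)
    elif mode == "intersection-strict":
        # Only features covering every position (an uncovered position empties the set).
        features = reduce(set.intersection, position_features) if position_features else set()
    elif mode == "intersection-nonempty":
        # Like intersection-strict, but positions overlapping no feature are ignored.
        nonempty = [s for s in position_features if s]
        features = reduce(set.intersection, nonempty) if nonempty else set()
    else:
        raise ValueError(f"unknown mode: {mode}")
    if not features:
        return "__no_feature"
    if len(features) > 1:
        return "__ambiguous"
    return next(iter(features))


# Read whose last position also overlaps feature B.
read = [{"A"}, {"A"}, {"A", "B"}]
print(assign_read(read, "union"))                  # __ambiguous
print(assign_read(read, "intersection-strict"))    # A
print(assign_read(read, "intersection-nonempty"))  # A
```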

htseq.feature_class
label

Feature class

type

basic:string

description

Feature class (3rd column in GFF file) to be used. All other features will be ignored.

default

exon

htseq.id_attribute
label

ID attribute

type

basic:string

description

GFF attribute to be used as feature ID. Several GFF lines with the same feature ID will be considered as parts of the same feature. The feature ID is used to identify the counts in the output table.

default

gene_id
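Feature class and ID attribute together determine which GFF/GTF lines are used and how they are grouped into counted features. The sketch below handles GTF-style attributes (`key "value";`) and assumes tab-separated input. The helper names are hypothetical. This sketch simply skips selected lines that lack the ID attribute, which htseq-count itself may handle differently.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Matches one GTF attribute such as: gene_id "G1"
ATTRIBUTE_RE = re.compile(r'\s*(\S+)\s+"([^"]*)"')


def parse_attributes(column9: str) -> Dict[str, str]:
    """Parse the 9th GTF column into a {key: value} dictionary."""
    attributes: Dict[str, str] = {}
    for item in column9.split(";"):
        match = ATTRIBUTE_RE.match(item)
        if match:
            attributes[match.group(1)] = match.group(2)
    return attributes


def group_features(
    gtf_lines: Iterable[str],
    feature_class: str = "exon",
    id_attribute: str = "gene_id",
) -> Dict[str, List[Tuple[str, int, int]]]:
    """Group intervals of the selected feature class by their ID attribute."""
    groups: Dict[str, List[Tuple[str, int, int]]] = defaultdict(list)
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != feature_class:
            continue  # other feature classes (3rd column) are ignored
        feature_id = parse_attributes(fields[8]).get(id_attribute)
        if feature_id is None:
            continue  # this sketch skips lines without the ID attribute
        # Lines sharing an ID become parts of the same counted feature.
        groups[feature_id].append((fields[0], int(fields[3]), int(fields[4])))
    return dict(groups)


gtf = [
    'chr1\tsrc\texon\t100\t200\t.\t+\t.\tgene_id "G1"; transcript_id "T1";',
    'chr1\tsrc\texon\t300\t400\t.\t+\t.\tgene_id "G1"; transcript_id "T2";',
    'chr1\tsrc\tgene\t100\t400\t.\t+\t.\tgene_id "G1";',
]
print(group_features(gtf))  # {'G1': [('chr1', 100, 200), ('chr1', 300, 400)]}
```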

htseq.name_ordered
label

Use name-ordered BAM file for counting reads

type

basic:boolean

description

Use a name-sorted BAM file for read quantification. This improves compatibility with larger BAM files but requires more computational time.

required

False

default

False

Output results

Cutadapt - STAR - HTSeq-count (single-end)

data:workflow:rnaseq:htseqworkflow-custom-cutadapt-star-htseq-single (data:reads:fastq:single  reads, data:index:star  genome, data:annotation:gtf  gff, basic:string  stranded, basic:boolean  advanced, basic:boolean  noncannonical, basic:boolean  chimeric, basic:integer  chimSegmentMin, basic:boolean  quantmode, basic:boolean  singleend, basic:boolean  gene_counts, basic:string  outFilterType, basic:integer  outFilterMultimapNmax, basic:integer  outFilterMismatchNmax, basic:decimal  outFilterMismatchNoverLmax, basic:integer  alignSJoverhangMin, basic:integer  alignSJDBoverhangMin, basic:integer  alignIntronMin, basic:integer  alignIntronMax, basic:integer  alignMatesGapMax, basic:string  mode, basic:string  feature_class, basic:string  id_attribute, basic:boolean  name_ordered)[Source: v2.0.1]

This RNA-seq pipeline consists of three steps: preprocessing, alignment, and quantification. First, reads are preprocessed by __cutadapt__, which finds and removes adapter sequences, primers, poly-A tails, and other types of unwanted sequence from high-throughput sequencing reads. Next, the preprocessed reads are aligned with the __STAR__ aligner. At the time of implementation, STAR was considered a state-of-the-art tool that consistently produces accurate results from diverse sets of reads and performs well even with default settings. For more information, see [this comparison of RNA-seq aligners](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5792058/). Finally, the aligned reads are summarized to genes by __HTSeq-count__. HTSeq-count is less computationally efficient than featureCounts. All three tools in this workflow support parallelization to accelerate the analysis.

Input arguments

reads
label

NGS reads

type

data:reads:fastq:single

genome
label

Indexed reference genome

type

data:index:star

description

Genome index prepared by STAR aligner indexing tool

gff
label

Annotation (GFF)

type

data:annotation:gtf

stranded
label

Assay type

type

basic:string

description

In a strand non-specific assay, a read is considered to overlap a feature regardless of whether it is mapped to the same strand as the feature or to the opposite one. In a strand-specific forward assay, a single-end read has to be mapped to the same strand as the feature. For paired-end reads, the first read has to be on the same strand as the feature and the second read on the opposite strand. In a strand-specific reverse assay, these rules are reversed.

default

no

choices

  • Strand non-specific: no

  • Strand-specific forward: yes

  • Strand-specific reverse: reverse

advanced
label

Advanced

type

basic:boolean

default

False

star.noncannonical
label

Remove non-canonical junctions (Cufflinks compatibility)

type

basic:boolean

description

It is recommended to remove non-canonical junctions for Cufflinks runs using --outFilterIntronMotifs RemoveNoncanonical.

default

False

star.detect_chimeric.chimeric
label

Detect chimeric and circular alignments

type

basic:boolean

description

To switch on detection of chimeric (fusion) alignments (in addition to normal mapping), --chimSegmentMin should be set to a positive value. Each chimeric alignment consists of two “segments”. Each segment is non-chimeric on its own, but the segments are chimeric to each other (i.e. the segments belong to different chromosomes, are on different strands, or are far from each other). Both segments may contain splice junctions, and one of the segments may contain portions of both mates. The --chimSegmentMin parameter controls the minimum allowed mapped length of each of the two segments. For example, if you have 2x75 reads and use --chimSegmentMin 20, a chimeric alignment with 130b on one chromosome and 20b on the other will be output, while 135 + 15 won’t be.

default

False

star.detect_chimeric.chimSegmentMin
label

--chimSegmentMin

type

basic:integer

disabled

!star.detect_chimeric.chimeric

default

20

star.t_coordinates.quantmode
label

Output in transcript coordinates

type

basic:boolean

description

With the --quantMode TranscriptomeSAM option, STAR outputs alignments translated into transcript coordinates in the Aligned.toTranscriptome.out.bam file (in addition to alignments in genomic coordinates in Aligned.*.sam/bam files). These transcriptomic alignments can be used with transcript quantification software that requires reads to be mapped to the transcriptome, such as RSEM or eXpress.

default

False

star.t_coordinates.singleend
label

Allow soft-clipping and indels

type

basic:boolean

description

By default, the output satisfies RSEM requirements: soft-clipping and indels are not allowed. Use --quantTranscriptomeBan Singleend to allow insertions, deletions and soft-clips in the transcriptomic alignments, which can be used by some expression quantification software (e.g. eXpress).

disabled

!star.t_coordinates.quantmode

default

False

star.t_coordinates.gene_counts
label

Count reads

type

basic:boolean

description

With the --quantMode GeneCounts option, STAR counts the number of reads per gene while mapping. A read is counted if it overlaps (by 1 nt or more) one and only one gene. Both ends of a paired-end read are checked for overlaps. The counts coincide with those produced by htseq-count with default parameters. The counts are written to the ReadsPerGene.out.tab file, which has 4 columns corresponding to different strandedness options: column 1: gene ID; column 2: counts for unstranded RNA-seq; column 3: counts for the 1st read strand aligned with RNA (htseq-count option -s yes); column 4: counts for the 2nd read strand aligned with RNA (htseq-count option -s reverse).

disabled

!star.t_coordinates.quantmode

default

False

star.filtering.outFilterType
label

Type of filtering

type

basic:string

description

Normal: standard filtering using only the current alignment; BySJout: keep only those reads that contain junctions that passed filtering into SJ.out.tab.

default

Normal

choices

  • Normal: Normal

  • BySJout: BySJout

star.filtering.outFilterMultimapNmax
label

--outFilterMultimapNmax

type

basic:integer

description

Read alignments will be output only if the read maps to no more loci than this value; otherwise, no alignments will be output (default: 10).

required

False

star.filtering.outFilterMismatchNmax
label

--outFilterMismatchNmax

type

basic:integer

description

An alignment will be output only if it has no more mismatches than this value (default: 10).

required

False

star.filtering.outFilterMismatchNoverLmax
label

--outFilterMismatchNoverLmax

type

basic:decimal

description

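Maximum number of mismatches relative to the mapped length: an alignment is output only if its ratio of mismatches to mapped length is less than or equal to this value. For example, for a fully mapped 2x100b read pair and a value of 0.04, the maximum number of mismatches is 0.04*200=8 for the pair.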

required

False

star.alignment.alignSJoverhangMin
label

--alignSJoverhangMin

type

basic:integer

description

Minimum overhang (i.e. block size) for spliced alignments (default: 5).

required

False

star.alignment.alignSJDBoverhangMin
label

--alignSJDBoverhangMin

type

basic:integer

description

Minimum overhang (i.e. block size) for annotated (sjdb) spliced alignments (default: 3).

required

False

star.alignment.alignIntronMin
label

--alignIntronMin

type

basic:integer

description

Minimum intron size: a genomic gap is considered an intron if its length >= alignIntronMin; otherwise, it is considered a deletion (default: 21).

required

False

star.alignment.alignIntronMax
label

--alignIntronMax

type

basic:integer

description

Maximum intron size. If 0, the maximum intron size is determined by (2^winBinNbits)*winAnchorDistNbins (default: 0).

required

False

star.alignment.alignMatesGapMax
label

--alignMatesGapMax

type

basic:integer

description

Maximum gap between two mates. If 0, the maximum gap is determined by (2^winBinNbits)*winAnchorDistNbins (default: 0).

required

False

htseq.mode
label

Mode

type

basic:string

description

Mode to handle reads overlapping more than one feature. Possible values are union, intersection-strict and intersection-nonempty.

default

union

choices

  • union: union

  • intersection-strict: intersection-strict

  • intersection-nonempty: intersection-nonempty

htseq.feature_class
label

Feature class

type

basic:string

description

Feature class (3rd column in GFF file) to be used. All other features will be ignored.

default

exon

htseq.id_attribute
label

ID attribute

type

basic:string

description

GFF attribute to be used as feature ID. Several GFF lines with the same feature ID will be considered as parts of the same feature. The feature ID is used to identify the counts in the output table.

default

gene_id

htseq.name_ordered
label

Use name-ordered BAM file for counting reads

type

basic:boolean

description

Use a name-sorted BAM file for read quantification. This improves compatibility with larger BAM files but requires more computational time.

required

False

default

False

Output results

Cutadapt - STAR - RSEM (Diagenode CATS, paired-end)

data:workflow:rnaseq:rsemworkflow-custom-cutadapt-star-rsem-paired (data:reads:fastq:paired  reads, data:index:star  star_index, data:index:expression  expression_index, basic:string  stranded, basic:boolean  advanced, basic:boolean  noncannonical, basic:boolean  chimeric, basic:integer  chimSegmentMin, basic:boolean  quantmode, basic:boolean  singleend, basic:boolean  gene_counts, basic:string  outFilterType, basic:integer  outFilterMultimapNmax, basic:integer  outFilterMismatchNmax, basic:decimal  outFilterMismatchNoverLmax, basic:integer  alignSJoverhangMin, basic:integer  alignSJDBoverhangMin, basic:integer  alignIntronMin, basic:integer  alignIntronMax, basic:integer  alignMatesGapMax)[Source: v2.0.1]

This RNA-seq pipeline is configured for use with the Diagenode CATS RNA-seq kits. It consists of three steps: preprocessing, alignment, and quantification. First, reads are preprocessed by cutadapt, which finds and removes adapter sequences, primers, poly-A tails, and other types of unwanted sequence from high-throughput sequencing reads. Next, the preprocessed reads are aligned with the STAR aligner. Finally, RSEM estimates gene and isoform expression levels from the aligned reads.

Input arguments

reads
label

NGS reads

type

data:reads:fastq:paired

star_index
label

STAR genome index

type

data:index:star

expression_index
label

Gene expression indices

type

data:index:expression

stranded
label

Assay type

type

basic:string

description

In a strand non-specific assay, a read is considered to overlap a feature regardless of whether it is mapped to the same strand as the feature or to the opposite one. In a strand-specific forward assay, a single-end read has to be mapped to the same strand as the feature. For paired-end reads, the first read has to be on the same strand as the feature and the second read on the opposite strand. In a strand-specific reverse assay, these rules are reversed.

default

no

choices

  • Strand non-specific: no

  • Strand-specific forward: yes

  • Strand-specific reverse: reverse

advanced
label

Advanced

type

basic:boolean

default

False

star.noncannonical
label

Remove non-canonical junctions (Cufflinks compatibility)

type

basic:boolean

description

It is recommended to remove non-canonical junctions for Cufflinks runs using --outFilterIntronMotifs RemoveNoncanonical.

default

False

star.detect_chimeric.chimeric
label

Detect chimeric and circular alignments

type

basic:boolean

description

To switch on detection of chimeric (fusion) alignments (in addition to normal mapping), --chimSegmentMin should be set to a positive value. Each chimeric alignment consists of two “segments”. Each segment is non-chimeric on its own, but the segments are chimeric to each other (i.e. the segments belong to different chromosomes, are on different strands, or are far from each other). Both segments may contain splice junctions, and one of the segments may contain portions of both mates. The --chimSegmentMin parameter controls the minimum allowed mapped length of each of the two segments. For example, if you have 2x75 reads and use --chimSegmentMin 20, a chimeric alignment with 130b on one chromosome and 20b on the other will be output, while 135 + 15 won’t be.

default

False

star.detect_chimeric.chimSegmentMin
label

--chimSegmentMin

type

basic:integer

disabled

!star.detect_chimeric.chimeric

default

20

star.t_coordinates.quantmode
label

Output in transcript coordinates

type

basic:boolean

description

With the --quantMode TranscriptomeSAM option, STAR outputs alignments translated into transcript coordinates in the Aligned.toTranscriptome.out.bam file (in addition to alignments in genomic coordinates in Aligned.*.sam/bam files). These transcriptomic alignments can be used with transcript quantification software that requires reads to be mapped to the transcriptome, such as RSEM or eXpress.

default

True

star.t_coordinates.singleend
label

Allow soft-clipping and indels

type

basic:boolean

description

By default, the output satisfies RSEM requirements: soft-clipping and indels are not allowed. Use --quantTranscriptomeBan Singleend to allow insertions, deletions and soft-clips in the transcriptomic alignments, which can be used by some expression quantification software (e.g. eXpress).

disabled

!star.t_coordinates.quantmode

default

False

star.t_coordinates.gene_counts
label

Count reads

type

basic:boolean

description

With the --quantMode GeneCounts option, STAR counts the number of reads per gene while mapping. A read is counted if it overlaps (by 1 nt or more) one and only one gene. Both ends of a paired-end read are checked for overlaps. The counts coincide with those produced by htseq-count with default parameters. The counts are written to the ReadsPerGene.out.tab file, which has 4 columns corresponding to different strandedness options: column 1: gene ID; column 2: counts for unstranded RNA-seq; column 3: counts for the 1st read strand aligned with RNA (htseq-count option -s yes); column 4: counts for the 2nd read strand aligned with RNA (htseq-count option -s reverse).

disabled

!star.t_coordinates.quantmode

default

False

star.filtering.outFilterType
label

Type of filtering

type

basic:string

description

Normal: standard filtering using only the current alignment; BySJout: keep only those reads that contain junctions that passed filtering into SJ.out.tab.

default

Normal

choices

  • Normal: Normal

  • BySJout: BySJout

star.filtering.outFilterMultimapNmax
label

--outFilterMultimapNmax

type

basic:integer

description

Read alignments will be output only if the read maps to no more loci than this value; otherwise, no alignments will be output (default: 10).

required

False

star.filtering.outFilterMismatchNmax
label

--outFilterMismatchNmax

type

basic:integer

description

An alignment will be output only if it has no more mismatches than this value (default: 10).

required

False

star.filtering.outFilterMismatchNoverLmax
label

--outFilterMismatchNoverLmax

type

basic:decimal

description

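Maximum number of mismatches relative to the mapped length: an alignment is output only if its ratio of mismatches to mapped length is less than or equal to this value. For example, for a fully mapped 2x100b read pair and a value of 0.04, the maximum number of mismatches is 0.04*200=8 for the pair.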

required

False

star.alignment.alignSJoverhangMin
label

--alignSJoverhangMin

type

basic:integer

description

Minimum overhang (i.e. block size) for spliced alignments (default: 5).

required

False

star.alignment.alignSJDBoverhangMin
label

--alignSJDBoverhangMin

type

basic:integer

description

Minimum overhang (i.e. block size) for annotated (sjdb) spliced alignments (default: 3).

required

False

star.alignment.alignIntronMin
label

--alignIntronMin

type

basic:integer

description

Minimum intron size: a genomic gap is considered an intron if its length >= alignIntronMin; otherwise, it is considered a deletion (default: 21).

required

False

star.alignment.alignIntronMax
label

--alignIntronMax

type

basic:integer

description

Maximum intron size. If 0, the maximum intron size is determined by (2^winBinNbits)*winAnchorDistNbins (default: 0).

required

False

star.alignment.alignMatesGapMax
label

--alignMatesGapMax

type

basic:integer

description

Maximum gap between two mates. If 0, the maximum gap is determined by (2^winBinNbits)*winAnchorDistNbins (default: 0).

required

False

Output results

Cutadapt - STAR - RSEM (Diagenode CATS, single-end)

data:workflow:rnaseq:rsemworkflow-custom-cutadapt-star-rsem-single (data:reads:fastq:single  reads, data:index:star  star_index, data:index:expression  expression_index, basic:string  stranded, basic:boolean  advanced, basic:boolean  noncannonical, basic:boolean  chimeric, basic:integer  chimSegmentMin, basic:boolean  quantmode, basic:boolean  singleend, basic:boolean  gene_counts, basic:string  outFilterType, basic:integer  outFilterMultimapNmax, basic:integer  outFilterMismatchNmax, basic:decimal  outFilterMismatchNoverLmax, basic:integer  alignSJoverhangMin, basic:integer  alignSJDBoverhangMin, basic:integer  alignIntronMin, basic:integer  alignIntronMax, basic:integer  alignMatesGapMax)[Source: v2.0.1]

This RNA-seq pipeline is configured for use with the Diagenode CATS RNA-seq kits. It consists of three steps: preprocessing, alignment, and quantification. First, reads are preprocessed by cutadapt, which finds and removes adapter sequences, primers, poly-A tails, and other types of unwanted sequence from high-throughput sequencing reads. Next, the preprocessed reads are aligned with the STAR aligner. Finally, RSEM estimates gene and isoform expression levels from the aligned reads.

Input arguments

reads
label

NGS reads

type

data:reads:fastq:single

star_index
label

STAR genome index

type

data:index:star

expression_index
label

Gene expression indices

type

data:index:expression

stranded
label

Assay type

type

basic:string

description

In a strand non-specific assay, a read is considered to overlap a feature regardless of whether it is mapped to the same strand as the feature or to the opposite one. In a strand-specific forward assay, a single-end read has to be mapped to the same strand as the feature. For paired-end reads, the first read has to be on the same strand as the feature and the second read on the opposite strand. In a strand-specific reverse assay, these rules are reversed.

default

no

choices

  • Strand non-specific: no

  • Strand-specific forward: yes

  • Strand-specific reverse: reverse

advanced
label

Advanced

type

basic:boolean

default

False

star.noncannonical
label

Remove non-canonical junctions (Cufflinks compatibility)

type

basic:boolean

description

It is recommended to remove non-canonical junctions for Cufflinks runs using --outFilterIntronMotifs RemoveNoncanonical.

default

False

star.detect_chimeric.chimeric
label

Detect chimeric and circular alignments

type

basic:boolean

description

To switch on detection of chimeric (fusion) alignments (in addition to normal mapping), --chimSegmentMin should be set to a positive value. Each chimeric alignment consists of two “segments”. Each segment is non-chimeric on its own, but the segments are chimeric to each other (i.e. the segments belong to different chromosomes, are on different strands, or are far from each other). Both segments may contain splice junctions, and one of the segments may contain portions of both mates. The --chimSegmentMin parameter controls the minimum allowed mapped length of each of the two segments. For example, if you have 2x75 reads and use --chimSegmentMin 20, a chimeric alignment with 130b on one chromosome and 20b on the other will be output, while 135 + 15 won’t be.

default

False

star.detect_chimeric.chimSegmentMin
label

--chimSegmentMin

type

basic:integer

disabled

!star.detect_chimeric.chimeric

default

20

star.t_coordinates.quantmode
label

Output in transcript coordinates

type

basic:boolean

description

With the --quantMode TranscriptomeSAM option, STAR outputs alignments translated into transcript coordinates in the Aligned.toTranscriptome.out.bam file (in addition to alignments in genomic coordinates in Aligned.*.sam/bam files). These transcriptomic alignments can be used with transcript quantification software that requires reads to be mapped to the transcriptome, such as RSEM or eXpress.

default

True

star.t_coordinates.singleend
label

Allow soft-clipping and indels

type

basic:boolean

description

By default, the output satisfies RSEM requirements: soft-clipping and indels are not allowed. Use --quantTranscriptomeBan Singleend to allow insertions, deletions and soft-clips in the transcriptomic alignments, which can be used by some expression quantification software (e.g. eXpress).

disabled

!star.t_coordinates.quantmode

default

False

star.t_coordinates.gene_counts
label

Count reads

type

basic:boolean

description

With the --quantMode GeneCounts option, STAR counts the number of reads per gene while mapping. A read is counted if it overlaps (by 1 nt or more) one and only one gene. Both ends of a paired-end read are checked for overlaps. The counts coincide with those produced by htseq-count with default parameters. The counts are written to the ReadsPerGene.out.tab file, which has 4 columns corresponding to different strandedness options: column 1: gene ID; column 2: counts for unstranded RNA-seq; column 3: counts for the 1st read strand aligned with RNA (htseq-count option -s yes); column 4: counts for the 2nd read strand aligned with RNA (htseq-count option -s reverse).

disabled

!star.t_coordinates.quantmode

default

False

star.filtering.outFilterType
label

Type of filtering

type

basic:string

description

Normal: standard filtering using only the current alignment; BySJout: keep only those reads that contain junctions that passed filtering into SJ.out.tab.

default

Normal

choices

  • Normal: Normal

  • BySJout: BySJout

star.filtering.outFilterMultimapNmax
label

--outFilterMultimapNmax

type

basic:integer

description

Read alignments will be output only if the read maps to no more loci than this value; otherwise, no alignments will be output (default: 10).

required

False

star.filtering.outFilterMismatchNmax
label

--outFilterMismatchNmax

type

basic:integer

description

An alignment will be output only if it has no more mismatches than this value (default: 10).

required

False

star.filtering.outFilterMismatchNoverLmax
label

--outFilterMismatchNoverLmax

type

basic:decimal

description

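Maximum number of mismatches relative to the mapped length: an alignment is output only if its ratio of mismatches to mapped length is less than or equal to this value. For example, for a fully mapped 2x100b read pair and a value of 0.04, the maximum number of mismatches is 0.04*200=8 for the pair.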

required

False

star.alignment.alignSJoverhangMin
label

--alignSJoverhangMin

type

basic:integer

description

Minimum overhang (i.e. block size) for spliced alignments (default: 5).

required

False

star.alignment.alignSJDBoverhangMin
label

--alignSJDBoverhangMin

type

basic:integer

description

Minimum overhang (i.e. block size) for annotated (sjdb) spliced alignments (default: 3).

required

False

star.alignment.alignIntronMin
label

--alignIntronMin

type

basic:integer

description

Minimum intron size: a genomic gap is considered an intron if its length >= alignIntronMin; otherwise, it is considered a deletion (default: 21).

required

False

star.alignment.alignIntronMax
label

--alignIntronMax

type

basic:integer

description

Maximum intron size. If 0, the maximum intron size is determined by (2^winBinNbits)*winAnchorDistNbins (default: 0).

required

False

star.alignment.alignMatesGapMax
label

--alignMatesGapMax

type

basic:integer

description

Maximum gap between two mates. If 0, the maximum gap is determined by (2^winBinNbits)*winAnchorDistNbins (default: 0).

required

False

Output results

Cutadapt - STAR - StringTie (Corall, paired-end)

data:workflow:rnaseq:corallworkflow-corall-paired (data:reads:fastq:paired  reads, data:index:star  star_index, data:annotation  annotation, data:index:star  rrna_reference, data:index:star  globin_reference, basic:boolean  show_advanced, basic:integer  quality_cutoff, basic:integer  n_reads, basic:integer  seed, basic:decimal  fraction, basic:boolean  two_pass, basic:string  feature_class, basic:string  id_attribute)[Source: v3.0.1]

RNA-seq pipeline optimized for the Lexogen Corall Total RNA-Seq Library Prep Kit. UMI sequences are extracted from the raw reads before the reads are trimmed and quality-filtered using Cutadapt. Preprocessed reads are aligned by the STAR aligner and deduplicated using UMI-tools. Gene abundance estimates are reported by the featureCounts tool. QC operates on downsampled reads and includes alignment of input reads to the rRNA/globin reference sequences. The reported alignment rate is used to assess the rRNA/globin sequence depletion rate. The analysis results and QC reports are summarized by MultiQC.

Input arguments

reads
label

Select sample(s)

type

data:reads:fastq:paired

star_index
label

Genome

type

data:index:star

description

Genome index prepared by STAR aligner indexing tool.

annotation
label

Annotation

type

data:annotation

description

Genome annotation file (GTF).

rrna_reference
label

Indexed rRNA reference sequence

type

data:index:star

description

Reference sequence index prepared by STAR aligner indexing tool.

globin_reference
label

Indexed Globin reference sequence

type

data:index:star

description

Reference sequence index prepared by STAR aligner indexing tool.

show_advanced
label

Show advanced parameters

type

basic:boolean

default

False

cutadapt.quality_cutoff
label

Reads quality cutoff

type

basic:integer

description

Trim low-quality bases from the 3’ end of each read before adapter removal. Use this option when processing data generated by older Illumina machines. Using this option overrides the NextSeq/NovaSeq-specific trimming procedure, which is enabled by default and is recommended for Illumina machines that use 2-color chemistry to encode the four bases.

required

False

downsampling.n_reads
label

Number of reads

type

basic:integer

default

1000000

downsampling.seed
label

Seed

type

basic:integer

default

11

downsampling.fraction
label

Fraction

type

basic:decimal

description

Use a fraction of reads in the range [0.0, 1.0] from the original input file instead of an absolute number of reads. If set, this overrides the “Number of reads” input parameter.

required

False

downsampling.two_pass
label

2-pass mode

type

basic:boolean

description

Enable two-pass mode when downsampling. Two-pass mode is twice as slow but uses much less memory.

default

False
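The interplay of the downsampling inputs (a fixed number of reads, an overriding fraction, and a seed for reproducibility) can be sketched as follows. This is a generic illustration, not the workflow's implementation: the parameter names mirror the inputs above, fraction-based sampling is assumed to keep each read independently, and count-based sampling uses single-pass reservoir sampling (Algorithm R), which holds `n_reads` items in memory. A two-pass strategy would instead trade speed for lower memory, as described for the 2-pass mode option.

```python
import random
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")


def downsample(reads: Iterable[T], n_reads: int = 1_000_000,
               fraction: Optional[float] = None, seed: int = 11) -> List[T]:
    """Sample reads either by fraction or by absolute count.

    If `fraction` is given, it overrides `n_reads`. The same seed and the
    same input order always produce the same selection.
    """
    rng = random.Random(seed)
    if fraction is not None:
        if not 0.0 <= fraction <= 1.0:
            raise ValueError("fraction must be in [0.0, 1.0]")
        # Keep each read independently with probability `fraction`.
        return [read for read in reads if rng.random() < fraction]
    reservoir: List[T] = []
    for i, read in enumerate(reads):
        if i < n_reads:
            reservoir.append(read)
        else:
            # Replace a stored read with probability n_reads / (i + 1).
            j = rng.randint(0, i)
            if j < n_reads:
                reservoir[j] = read
    return reservoir
```

Because the random draws depend only on the seed and the number of reads, running the sketch with the same seed on two mate files that list reads in the same order selects the same positions in both, which keeps read pairs in sync.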

quantification.feature_class
label

Feature class

type

basic:string

description

Feature class (3rd column in GTF/GFF3 file) to be used. All other features will be ignored.

default

exon

quantification.id_attribute
label

ID attribute

type

basic:string

description

GTF/GFF3 attribute to be used as feature ID. Several GTF/GFF3 lines with the same feature ID will be considered as parts of the same feature. The feature ID is used to identify the counts in the output table. In GTF files this is usually ‘gene_id’, in GFF3 files this is often ‘ID’, and ‘transcript_id’ is frequently a valid choice for both annotation formats.

default

gene_id

choices

  • gene_id: gene_id

  • transcript_id: transcript_id

  • ID: ID

  • geneid: geneid
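The way `feature_class` and `id_attribute` work together can be illustrated by grouping GTF lines into features. This is a minimal sketch, not featureCounts itself: `feature_lengths` is a hypothetical helper, it only parses GTF-style `key "value";` attributes (not GFF3 `key=value`), and it simply sums segment lengths without merging overlapping exons.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable

# Matches key "value" pairs in the GTF attribute column (9th column).
ATTR_RE = re.compile(r'(\S+)\s+"([^"]*)"')


def feature_lengths(gtf_lines: Iterable[str], feature_class: str = "exon",
                    id_attribute: str = "gene_id") -> Dict[str, int]:
    """Group GTF lines into features and sum their lengths.

    Only lines whose 3rd column equals `feature_class` are kept; lines that
    share the same `id_attribute` value are treated as parts of one feature.
    """
    lengths: Dict[str, int] = defaultdict(int)
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9 or cols[2] != feature_class:
            continue
        attrs = dict(ATTR_RE.findall(cols[8]))
        feature_id = attrs.get(id_attribute)
        if feature_id is None:
            continue
        start, end = int(cols[3]), int(cols[4])
        # GTF coordinates are 1-based and inclusive.
        lengths[feature_id] += end - start + 1
    return dict(lengths)
```

Switching `id_attribute` from `gene_id` to `transcript_id` changes which lines are merged into a single counted feature, while `feature_class` decides which lines are considered at all.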

Output results

Cutadapt - STAR - StringTie (Corall, single-end)

data:workflow:rnaseq:corallworkflow-corall-single (data:reads:fastq:single  reads, data:index:star  star_index, data:annotation  annotation, data:index:star  rrna_reference, data:index:star  globin_reference, basic:boolean  show_advanced, basic:integer  quality_cutoff, basic:integer  n_reads, basic:integer  seed, basic:decimal  fraction, basic:boolean  two_pass, basic:string  feature_class, basic:string  id_attribute)[Source: v3.0.1]

RNA-seq pipeline optimized for the Lexogen Corall Total RNA-Seq Library Prep Kit. UMI sequences are extracted from the raw reads before the reads are trimmed and quality-filtered using Cutadapt. Preprocessed reads are aligned by the STAR aligner and deduplicated using UMI-tools. Gene abundance estimates are reported by the featureCounts tool. QC operates on downsampled reads and includes alignment of input reads to the rRNA/globin reference sequences. The reported alignment rate is used to assess the rRNA/globin sequence depletion rate. The analysis results and QC reports are summarized by MultiQC.

Input arguments

reads
label

Select sample(s)

type

data:reads:fastq:single

star_index
label

Genome

type

data:index:star

description

Genome index prepared by STAR aligner indexing tool.

annotation
label

Annotation

type

data:annotation

description

Genome annotation file (GTF).

rrna_reference
label

Indexed rRNA reference sequence

type

data:index:star

description

Reference sequence index prepared by STAR aligner indexing tool.

globin_reference
label

Indexed Globin reference sequence

type

data:index:star

description

Reference sequence index prepared by STAR aligner indexing tool.

show_advanced
label

Show advanced parameters

type

basic:boolean

default

False

cutadapt.quality_cutoff
label

Reads quality cutoff

type

basic:integer

description

Trim low-quality bases from the 3’ end of each read before adapter removal. Use this option when processing data generated by older Illumina machines. Using this option overrides the NextSeq/NovaSeq-specific trimming procedure, which is enabled by default and is recommended for Illumina machines that use 2-color chemistry to encode the four bases.

required

False

downsampling.n_reads
label

Number of reads

type

basic:integer

default

1000000

downsampling.seed
label

Seed

type

basic:integer

default

11

downsampling.fraction
label

Fraction

type

basic:decimal

description

Use a fraction of reads in the range [0.0, 1.0] from the original input file instead of an absolute number of reads. If set, this overrides the “Number of reads” input parameter.

required

False

downsampling.two_pass
label

2-pass mode

type

basic:boolean

description

Enable two-pass mode when downsampling. Two-pass mode is twice as slow but uses much less memory.

default

False

quantification.feature_class
label

Feature class

type

basic:string

description

Feature class (3rd column in GTF/GFF3 file) to be used. All other features will be ignored.

default

exon

quantification.id_attribute
label

ID attribute

type

basic:string

description

GTF/GFF3 attribute to be used as feature ID. Several GTF/GFF3 lines with the same feature ID will be considered as parts of the same feature. The feature ID is used to identify the counts in the output table. In GTF files this is usually ‘gene_id’, in GFF3 files this is often ‘ID’, and ‘transcript_id’ is frequently a valid choice for both annotation formats.

default

gene_id

choices

  • gene_id: gene_id

  • transcript_id: transcript_id

  • ID: ID

  • geneid: geneid

Output results

DESeq2

data:differentialexpression:deseq2:differentialexpression-deseq2 (list:data:expression  case, list:data:expression  control, basic:boolean  create_sets, basic:decimal  logfc, basic:decimal  fdr, basic:boolean  beta_prior, basic:boolean  count, basic:integer  min_count_sum, basic:boolean  cook, basic:decimal  cooks_cutoff, basic:boolean  independent, basic:decimal  alpha)[Source: v3.2.2]

Run DESeq2 analysis. The DESeq2 package estimates the variance-mean dependence in count data from high-throughput sequencing assays and tests for differential expression based on a model using the negative binomial distribution. For more information, see the DESeq2 [reference manual](https://www.bioconductor.org/packages/release/bioc/manuals/DESeq2/man/DESeq2.pdf) and [vignette](http://bioconductor.org/packages/devel/bioc/vignettes/DESeq2/inst/doc/DESeq2.html).

Input arguments

case
label

Case

type

list:data:expression

description

Case samples (replicates)

required

True

hidden

False

control
label

Control

type

list:data:expression

description

Control samples (replicates)

required

True

hidden

False

create_sets
label

Create gene sets

type

basic:boolean

description

After calculating differential gene expression, create gene sets for up-regulated genes, down-regulated genes and all genes.

required

True

hidden

False

default

False

logfc
label

Log2 fold change threshold for gene sets

type

basic:decimal

description

Genes with a log2 fold change above Log2FC are considered up-regulated, and genes with a log2 fold change below -Log2FC are considered down-regulated.

required

True

hidden

!create_sets

default

1.0

fdr
label

FDR threshold for gene sets

type

basic:decimal

required

True

hidden

!create_sets

default

0.05
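The two thresholds above can be combined into gene sets as in the following sketch. `DERow` and `make_gene_sets` are hypothetical names. The sketch assumes that a gene must pass the FDR threshold (`fdr <= threshold`) and that the "all genes" set is the union of the up- and down-regulated sets; the process may define the comparisons or the "all" set differently.

```python
from typing import Dict, List, NamedTuple


class DERow(NamedTuple):
    gene: str
    log2fc: float
    fdr: float


def make_gene_sets(rows: List[DERow], logfc: float = 1.0,
                   fdr: float = 0.05) -> Dict[str, List[str]]:
    """Split DE results into up-regulated, down-regulated and all DE genes."""
    passing = [r for r in rows if r.fdr <= fdr]
    up = [r.gene for r in passing if r.log2fc > logfc]
    down = [r.gene for r in passing if r.log2fc < -logfc]
    # Assumption: "all" means every gene in either the up or down set.
    return {"up": up, "down": down, "all": up + down}


rows = [DERow("A", 2.0, 0.01), DERow("B", -1.5, 0.001),
        DERow("C", 3.0, 0.2), DERow("D", 0.5, 0.0001)]
print(make_gene_sets(rows))
```

Gene C has a large fold change but fails the FDR threshold, and gene D is significant but changes by less than the default log2 fold change of 1.0, so neither enters a set.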

options.beta_prior
label

Beta prior

type

basic:boolean

description

Whether or not to put a zero-mean normal prior on the non-intercept coefficients.

required

True

hidden

False

default

False

filter_options.count
label

Filter genes based on expression count

type

basic:boolean

required

True

hidden

False

default

True

filter_options.min_count_sum
label

Minimum gene expression count summed over all samples

type

basic:integer

description

Filter genes in the expression matrix input. Remove genes where the expression count sum over all samples is below the threshold.

required

True

hidden

!filter_options.count

default

10
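The count filter described above keeps only genes whose summed count across all samples reaches the threshold. A minimal sketch, with `filter_by_count_sum` as a hypothetical helper and a gene-to-counts dictionary standing in for the expression matrix:

```python
from typing import Dict, List


def filter_by_count_sum(counts: Dict[str, List[int]],
                        min_count_sum: int = 10) -> Dict[str, List[int]]:
    """Drop genes whose counts summed over all samples are below the threshold."""
    return {gene: row for gene, row in counts.items() if sum(row) >= min_count_sum}


counts = {"A": [5, 5, 0], "B": [3, 3, 3], "C": [0, 0, 0]}
print(filter_by_count_sum(counts))
```

With the default threshold of 10, gene A (sum 10) is kept, while B (sum 9) and C (sum 0) are removed.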

filter_options.cook
label

Filter genes based on Cook’s distance

type

basic:boolean

required

True

hidden

False

default

False

filter_options.cooks_cutoff
label

Threshold on Cook’s distance

type

basic:decimal

description

If one or more samples have a Cook’s distance larger than the threshold set here, the p-value for the row is set to NA. If left empty, the default threshold of the 0.99 quantile of the F(p, m-p) distribution is used, where p is the number of coefficients being fitted and m is the number of samples. Cook’s distances of samples belonging to experimental groups with only two samples are excluded from this test.

required

False

hidden

!filter_options.cook
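The Cook's distance rule can be sketched as a post-processing step on p-values. This is an illustration, not DESeq2's code: `apply_cooks_filter` is a hypothetical helper, it takes the cutoff as an explicit argument, and it treats every group with fewer than three samples as excluded. The default cutoff described above could be obtained with `scipy.stats.f.ppf(0.99, p, m - p)`, which is left out here to keep the sketch dependency-free.

```python
import math
from collections import Counter
from typing import List, Sequence


def apply_cooks_filter(pvalues: Sequence[float],
                       cooks: Sequence[Sequence[float]],
                       groups: Sequence[str],
                       cutoff: float) -> List[float]:
    """Set a gene's p-value to NaN if any eligible sample exceeds `cutoff`.

    `cooks` holds one row of per-sample Cook's distances per gene, and
    `groups` gives the experimental group of each sample (column).
    """
    group_sizes = Counter(groups)
    # Only samples from groups with three or more members are checked.
    eligible = [i for i, g in enumerate(groups) if group_sizes[g] > 2]
    filtered: List[float] = []
    for p, row in zip(pvalues, cooks):
        if any(row[i] > cutoff for i in eligible):
            filtered.append(math.nan)
        else:
            filtered.append(p)
    return filtered
```

In this sketch, an outlier in a two-sample group never triggers the filter, whereas the same outlier in a three-sample group sets that gene's p-value to NaN.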

filter_options.independent
label

Apply independent gene filtering

type

basic:boolean

required

True

hidden

False

default

False

filter_options.alpha
label

Significance cut-off used for optimizing independent gene filtering

type

basic:decimal

description

The value should be set to the adjusted p-value cut-off (FDR).

required

True

hidden

!filter_options.independent

default

0.1

Output results

raw
label

Differential expression

type

basic:file

required

True

hidden

False

de_json
label

Results table (JSON)

type

basic:json

required

True

hidden

False

de_file
label

Results table (file)

type

basic:file

required

True

hidden

False

count_matrix
label

Count matrix

type

basic:file

required

True

hidden

False

source
label

Gene ID database

type

basic:string

required

True

hidden

False

species
label

Species

type

basic:string

required

True

hidden

False

build
label

Build

type

basic:string

required

True

hidden

False

feature_type
label

Feature type

type

basic:string

required

True

hidden

False

Deeptools bamCoverage

data:coverage:bigwig:scale-bigwig (data:alignment:bam  alignment, data:bedpe  bedpe, basic:decimal  scale)[Source: v1.1.1]

Creates a scaled BigWig file.

Input arguments

alignment
label

Alignment BAM file

type

data:alignment:bam

required

True

hidden

False

bedpe
label

BEDPE Normalization factor

type

data:bedpe

description

The BEDPE file describes disjoint genome features, such as structural variations or paired-end sequence alignments. It is used to estimate the scale factor.

required

True

hidden

False

scale
label

Scale for the normalization factor

type

basic:decimal

description

Magnitude of the scale factor. The scaling factor is calculated by dividing the scale by the number of features in the BEDPE file (scale / number of features).

required

True

hidden

False

default

10000
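The scaling formula above is simple arithmetic on the BEDPE feature count. The sketch below assumes that every non-empty line not starting with `#`, `track` or `browser` is one feature; `bedpe_scale_factor` is a hypothetical helper. The result is the kind of value that deepTools `bamCoverage` accepts through its `--scaleFactor` option, although how this process passes it on is not stated here.

```python
from typing import Iterable


def bedpe_scale_factor(bedpe_lines: Iterable[str], scale: float = 10000) -> float:
    """Compute scale / (number of features) for a BEDPE file."""
    n_features = sum(
        1 for line in bedpe_lines
        if line.strip() and not line.startswith(("#", "track", "browser"))
    )
    if n_features == 0:
        raise ValueError("BEDPE file contains no features")
    return scale / n_features


lines = ["chr1\t10\t20\tchr1\t50\t60\n"] * 4 + ["# comment\n", "\n"]
print(bedpe_scale_factor(lines))  # 2500.0
```

With the default scale of 10000 and four features, each position's coverage would be multiplied by 2500.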

Output results

bigwig
label

bigwig file

type

basic:file

required

True

hidden

False

species
label

Species

type

basic:string

required

True

hidden

False

build
label

Build

type

basic:string

required

True

hidden

False

Detect library strandedness

data:strandednesslibrary-strandedness (data:reads:fastq  reads, basic:integer  read_number, data:index:salmon  salmon_index)[Source: v0.4.1]

This process uses the Salmon transcript quantification tool to automatically infer the strandedness of an NGS library. For more details, see the Salmon [documentation](https://salmon.readthedocs.io/en/latest/library_type.html).

Input arguments

reads
label

Sequencing reads

type

data:reads:fastq

description

Sequencing reads in .fastq format. Both single-end and paired-end libraries are supported.

read_number
label

Number of input reads

type

basic:integer

description

Number of sequencing reads subsampled from each of the original .fastq files before library strandedness detection.

default

50000

salmon_index
label

Transcriptome index file

type

data:index:salmon

description

Transcriptome index file created using the Salmon indexing tool. To obtain reliable results, the cDNA (transcriptome) sequences used to create the index must be derived from the same species as the input sequencing reads.

Output results

strandedness
label

Library strandedness type

type

basic:string

description

The predicted library strandedness type. The codes U and IU indicate a ‘strand non-specific’ library for single-end or paired-end reads, respectively. Codes SF and ISF correspond to a ‘strand-specific forward’ library for single-end or paired-end reads, respectively. For a ‘strand-specific reverse’ library, the corresponding codes are SR and ISR. For more details, see the Salmon [documentation](https://salmon.readthedocs.io/en/latest/library_type.html).
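The six codes listed above map directly to a read layout and a strandedness description. A minimal lookup sketch follows; `describe_strandedness` is a hypothetical helper, and it deliberately covers only these six codes, not every library type Salmon can report.

```python
LIBRARY_TYPES = {
    "U": ("single-end", "strand non-specific"),
    "SF": ("single-end", "strand-specific forward"),
    "SR": ("single-end", "strand-specific reverse"),
    "IU": ("paired-end", "strand non-specific"),
    "ISF": ("paired-end", "strand-specific forward"),
    "ISR": ("paired-end", "strand-specific reverse"),
}


def describe_strandedness(code: str) -> str:
    """Translate a Salmon library type code into a readable description."""
    try:
        layout, strandedness = LIBRARY_TYPES[code]
    except KeyError:
        raise ValueError(f"Unsupported library type code: {code!r}") from None
    return f"{strandedness} ({layout})"


print(describe_strandedness("ISR"))  # strand-specific reverse (paired-end)
```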

fragment_ratio
label

Compatible fragment ratio

type

basic:decimal

description

The ratio of fragments that support the predicted library strandedness type.

log
label

Log file

type

basic:file

description

Analysis log file.

Dictyostelium expressions

data:expression:polyaexpression-dicty (data:alignment:bam  alignment, data:annotation:gff3  gff, data:mappability:bcm  mappable)[Source: v1.4.1]

Dictyostelium-specific pipeline. Developed by the Bioinformatics Laboratory, Faculty of Computer and Information Science, University of Ljubljana, Slovenia and the Shaulsky Lab, Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA.

Input arguments

alignment
label

Aligned sequence

type

data:alignment:bam

gff
label

Features (GFF3)

type

data:annotation:gff3

mappable
label

Mappability

type

data:mappability:bcm

Output results

exp
label

Expression RPKUM (polyA)

type

basic:file

description

mRNA reads scaled by uniquely mappable part of exons.

rpkmpolya
label

Expression RPKM (polyA)

type

basic:file

description

mRNA reads scaled by exon length.

rc
label

Read counts (polyA)

type

basic:file

description

mRNA reads uniquely mapped to gene exons.

rpkum
label

Expression RPKUM

type

basic:file

description

Reads scaled by uniquely mappable part of exons.

rpkm
label

Expression RPKM

type

basic:file

description

Reads scaled by exon length.
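The difference between the RPKM and RPKUM outputs is the length used in the denominator: the full exon length versus only its uniquely mappable part. The sketch below assumes the usual RPKM convention (reads per kilobase per million mapped reads); the library-size denominator this process actually uses is not stated here, and the read counts and lengths are made-up illustrative values.

```python
def per_kilobase_per_million(read_count: int, length_bp: int,
                             total_mapped_reads: int) -> float:
    """Reads per kilobase of sequence per million mapped reads."""
    if length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("length_bp and total_mapped_reads must be positive")
    # 1e9 = 1e3 (per kilobase) * 1e6 (per million reads)
    return read_count * 1e9 / (length_bp * total_mapped_reads)


count, total = 500, 10_000_000
rpkm = per_kilobase_per_million(count, 2000, total)   # full exon length
rpkum = per_kilobase_per_million(count, 1600, total)  # uniquely mappable exon length
print(rpkm, rpkum)  # 25.0 31.25
```

Because the uniquely mappable length can never exceed the full exon length, RPKUM is always greater than or equal to RPKM for the same gene.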

rc_raw
label

Read counts (raw)

type

basic:file

description

Reads uniquely mapped to gene exons.

exp_json
label

Expression RPKUM (polyA) (json)

type

basic:json

exp_type
label

Expression Type (default output)

type

basic:string

source
label

Gene ID database

type

basic:string

species
label

Species

type

basic:string

build
label

Build

type

basic:string

feature_type
label

Feature type

type

basic:string

Differential Expression (table)

data:differentialexpression:uploadupload-diffexp (basic:file  src, basic:string  gene_id, basic:string  logfc, basic:string  fdr, basic:string  logodds, basic:string  fwer, basic:string  pvalue, basic:string  stat, basic:string  source, basic:string  species, basic:string  build, basic:string  feature_type, list:data:expression  case, list:data:expression  control)[Source: v1.4.1]

Upload a differential expression table.

Input arguments

src
label

Differential expression file

type

basic:file

description

Differential expression file. Supported file types: *.xls, *.xlsx, *.tab (tab-delimited file), *.diff. The DE file must include columns with log2(fold change) and FDR or p-value information, and it must contain a header row with column names. DESeq, DESeq2, edgeR and CuffDiff output files are accepted.

validate_regex

\.(xls|xlsx|tab|tab.gz|diff|diff.gz)$
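The `validate_regex` value can be checked against candidate file names with Python's `re` module. `is_supported_de_file` is a hypothetical helper; the pattern is copied verbatim from the field above.

```python
import re

# Pattern copied verbatim from the validate_regex field above.
DE_FILE_RE = re.compile(r"\.(xls|xlsx|tab|tab.gz|diff|diff.gz)$")


def is_supported_de_file(filename: str) -> bool:
    """Return True if the file name ends with a supported extension."""
    return DE_FILE_RE.search(filename) is not None


for name in ("deseq2_results.tab", "results.diff.gz", "results.csv"):
    print(name, is_supported_de_file(name))
```

The unescaped `.` inside `tab.gz` and `diff.gz` matches any single character, so names such as `results.tab_gz` are also accepted, and the match is case-sensitive.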

gene_id
label

Gene ID label

type

basic:string

logfc
label

LogFC label

type

basic:string

fdr
label

FDR label

type

basic:string

required

False

logodds
label

LogOdds label

type

basic:string

required

False

fwer
label

FWER label

type

basic:string

required

False

pvalue
label

Pvalue label

type

basic:string

required

False

stat
label

Statistics label

type

basic:string

required

False
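The label inputs tell the uploader which header columns hold which values. The sketch below shows one way to pick those columns from a tab-delimited table with the standard `csv` module. It is not the process's parser: `extract_de_columns` is hypothetical, and it only enforces the documented requirement that log2 fold change and either FDR or p-value columns are present. The example labels are DESeq2-style column names chosen for illustration.

```python
import csv
import io
from typing import Dict, List, Optional


def extract_de_columns(tab_text: str, gene_id: str, logfc: str,
                       fdr: Optional[str] = None,
                       pvalue: Optional[str] = None) -> List[Dict[str, str]]:
    """Pick the labelled columns out of a tab-delimited DE table."""
    if fdr is None and pvalue is None:
        raise ValueError("Either an FDR or a p-value column label is required")
    reader = csv.DictReader(io.StringIO(tab_text), delimiter="\t")
    header = reader.fieldnames or []
    wanted = {"gene_id": gene_id, "logfc": logfc}
    if fdr is not None:
        wanted["fdr"] = fdr
    if pvalue is not None:
        wanted["pvalue"] = pvalue
    missing = [label for label in wanted.values() if label not in header]
    if missing:
        raise ValueError(f"Column(s) not found in header: {missing}")
    return [{key: row[label] for key, label in wanted.items()} for row in reader]


text = "Gene\tlog2FoldChange\tpadj\nG1\t1.5\t0.01\nG2\t-0.3\t0.8\n"
print(extract_de_columns(text, gene_id="Gene", logfc="log2FoldChange", fdr="padj"))
```

Values are returned as strings; converting them to numbers and handling missing values such as `NA` is left to the caller.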

source
label

Gene ID database

type

basic:string

choices

  • AFFY: AFFY

  • DICTYBASE: DICTYBASE

  • ENSEMBL: ENSEMBL

  • NCBI: NCBI

  • UCSC: UCSC

species
label

Species

type

basic:string

description

Species Latin name.

choices

  • Homo sapiens: Homo sapiens

  • Mus musculus: Mus musculus

  • Rattus norvegicus: Rattus norvegicus

  • Dictyostelium discoideum: Dictyostelium discoideum

  • Odocoileus virginianus texanus: Odocoileus virginianus texanus

  • Solanum tuberosum: Solanum tuberosum

build
label

Build

type

basic:string

description

Genome build or annotation version.

feature_type
label

Feature type

type

basic:string

default

gene

choices

  • gene: gene

  • transcript: transcript

  • exon: exon

case
label

Case

type

list:data:expression

description

Case samples (replicates)

required

False

control
label

Control

type

list:data:expression

description

Control samples (replicates)

required

False

Output results

raw
label

Differential expression

type

basic:file

de_json
label

Results table (JSON)

type

basic:json

de_file
label

Results table (file)

type

basic:file

source
label

Gene ID database

type

basic:string

species
label

Species

type

basic:string

build
label

Build

type

basic:string

feature_type
label

Feature type

type

basic:string

Differential expression of shRNA

data:shrna:differentialexpression:differentialexpression-shrna (data:file  parameter_file, list:data:expression:shrna2quant:  expression_data)[Source: v1.2.1]

Performs differential expression analysis on a list of objects. The analysis takes a set of expression files (count matrices) and a parameter file as input. The parameter file is an xlsx file consisting of the following tabs: - `sample_key`: should have a column sample with sample names exactly matching the input expression file(s), columns defining the treatment and, lastly, a column indicating the replicate. - `contrasts`: defines the groups that will be used to perform differential expression analysis. The DE model uses these contrasts and the replicate number. In R notation, this would be ` ~ 1