
Pipelines

  • Reference-based mapping: maps short or long reads to a reference genome with bowtie2 or minimap2, respectively. Handles Illumina and Oxford Nanopore data.
  • De novo assembly: assembles short Illumina reads with SPAdes.
  • RNA-Seq: aligns reads with HISAT2, then assembles and quantifies transcripts with StringTie. Also prepares input files for DESeq2.

Scripts

  • Cleaning FASTA headers: cleans FASTA headers using a regular expression.
  • Counting bases: calculates per-base totals and percentages for a FASTA file, distinguishing between repetitive and non-repetitive bases.
  • Plot coverage: creates an interactive coverage plot from a set of BED files. Uses Bokeh.
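
To illustrate the base-counting idea, here is a minimal awk sketch. It is not the script from this repository: `count_bases` is a hypothetical name, and it assumes repetitive regions are soft-masked (lowercase), which the description above does not confirm. Output columns are base, non-repetitive count, repetitive count, and percentage of all bases.

```shell
# Hypothetical helper, NOT the repository's script.
# Assumption: repetitive bases are soft-masked (lowercase); all others are uppercase.
count_bases() {
    awk '
        /^>/ { next }                                    # skip FASTA header lines
        {
            n = length($0)
            for (i = 1; i <= n; i++) {
                c = substr($0, i, 1)
                u = toupper(c)
                if (c == u && c == tolower(c)) continue  # not a letter (e.g. "-" or "*")
                seen[u] = 1
                total++
                if (c == u) non[u]++                     # uppercase: non-repetitive
                else rep[u]++                            # lowercase: repetitive
            }
        }
        END {
            # base, non-repetitive, repetitive, percent of all bases
            for (b in seen)
                printf "%s\t%d\t%d\t%.2f\n", b, non[b], rep[b], 100 * (non[b] + rep[b]) / total
        }
    ' "$1" | sort
}
```

Usage would look like `count_bases sequence.fasta`. If the input marks repeats some other way (for example hard-masking with N), the lowercase test would need to change.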

One-liners

  • Get the number of sequences (header lines) in a FASTA file at a raw content URL
wget -q -O - [url here] | grep -c '^>'
  • Get the length of each sequence in a FASTA file
awk '/^>/ {if (len) print len; len=0; next} {len += length($0)} END {if (len) print len}' sequence.fasta
  • Get basic statistics (quartiles, mean, standard deviation) from a stream of numbers, e.g. FASTA sequence lengths. Requires r-base.
[cmd] | R -q -e 'x <- scan(file("stdin")); summary(x); sd(x)'
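
Where R is not installed, a rough awk-only stand-in for the statistics one-liner above might look like the sketch below. `num_summary` is a hypothetical name. It reports count, minimum, maximum, mean, and sample standard deviation, but unlike R's `summary()` it does not compute quartiles.

```shell
# Hypothetical helper; an awk-only approximation of the R one-liner.
# Reads one number per line from stdin.
num_summary() {
    awk '
        NF == 0 { next }                      # ignore blank lines
        {
            x = $1 + 0
            if (n == 0 || x < min) min = x
            if (n == 0 || x > max) max = x
            n++; sum += x; sumsq += x * x
        }
        END {
            if (n == 0) { print "n=0"; exit }
            mean = sum / n
            # sample variance (n - 1 denominator); reported as 0 when n == 1,
            # whereas sd() in R would return NA
            var = (n > 1) ? (sumsq - sum * sum / n) / (n - 1) : 0
            if (var < 0) var = 0              # guard against rounding error
            printf "n=%d min=%g max=%g mean=%.4f sd=%.4f\n", n, min, max, mean, sqrt(var)
        }
    '
}
```

It is used the same way as the R version: `[cmd] | num_summary`, for instance with the sequence-length awk command above as `[cmd]`.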