Bioinformatics pipelines and scripts


Pipelines

Reference-based mapping: maps short or long reads to a reference genome with bowtie2 or minimap2, respectively. Handles Illumina and Oxford Nanopore data.
De novo assembly: assembles short Illumina reads with SPAdes.
RNA-Seq: performs read alignment with HISAT2, followed by transcript assembly and quantification with StringTie. Also prepares input files for DESeq2.

Scripts

Cleaning FASTA headers: cleans FASTA headers using regular expressions.
Counting bases: calculates per-base totals and percentages for a FASTA file, distinguishing between repetitive and non-repetitive bases.
Plot coverage: creates an interactive coverage plot from a set of BED files. Uses Bokeh.
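
As a rough illustration of the base-counting idea (not the script from this repository), the sketch below assumes repeats are soft-masked, i.e. written in lowercase. That convention, and the function name count_masked_bases, are assumptions made for the example. Every non-lowercase character, including uppercase N, is counted as non-repetitive.

```shell
# Hypothetical helper: split the bases of a FASTA file into
# repetitive (lowercase, soft-masked) and non-repetitive (everything else).
# The soft-masking convention is an assumption, not taken from the script.
count_masked_bases() {
  awk '
    /^>/ { next }                          # skip header lines
    {
      seq = $0
      total += length(seq)
      lower += gsub(/[acgtn]/, "", seq)    # gsub returns the number of replacements
    }
    END {
      if (total == 0) exit                 # nothing to report for an empty file
      upper = total - lower
      printf "non-repetitive\t%d\t%.2f%%\n", upper, 100 * upper / total
      printf "repetitive\t%d\t%.2f%%\n", lower, 100 * lower / total
    }
  ' "$1"
}
```

It would be called as count_masked_bases sequence.fasta and prints one tab-separated line per category: label, count, percentage.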

One-liners

Count the sequences (header lines) in a FASTA file at a raw content URL

wget -q -O - [url here] | grep -c '^>'

Get the length of each sequence in a FASTA file

awk '/^>/ {if (len) print len; len=0; next} {len += length($0)} END {if (len) print len}' sequence.fasta
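
When the sequence names are needed next to the lengths, a small variation of the one-liner above could look like the sketch below. The function name fasta_lengths is hypothetical, and the sequence ID is assumed to be the header text up to the first whitespace. Unlike the one-liner, this version also reports empty records, with a length of 0.

```shell
# Hypothetical helper: print "<sequence ID><TAB><length>" for each FASTA record.
fasta_lengths() {
  awk '
    /^>/ {
      if (name != "") print name "\t" len   # emit the previous record
      name = substr($1, 2)                  # ID = first header field without ">"
      len = 0
      next
    }
    { len += length($0) }                   # add up the wrapped sequence lines
    END { if (name != "") print name "\t" len }
  ' "$1"
}
```

For example, fasta_lengths sequence.fasta | cut -f2 gives the bare lengths again, ready to be piped into a statistics command.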

Get basic statistics (summary and standard deviation) from a stream of numbers, e.g. FASTA sequence lengths. Requires r-base.

[cmd] | R -q -e 'x <-scan(file("stdin")); summary(x); sd(x)'
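
Where R is not installed, an awk-only approximation of the R one-liner above might be enough. The sketch below is an assumption-laden alternative, not part of the repository. The function name num_stats is hypothetical, it reads one number per line, and it uses the n - 1 denominator so the result is comparable to sd() in R. The single-pass variance formula is numerically naive and may lose precision on very large values.

```shell
# Hypothetical helper: count, min, max, mean and sample standard deviation
# of a stream of numbers (one per line) read from stdin.
num_stats() {
  awk '
    NF == 0 { next }                        # ignore blank lines
    {
      x = $1 + 0                            # force numeric interpretation
      if (n == 0 || x < min) min = x
      if (n == 0 || x > max) max = x
      n++; sum += x; sumsq += x * x
    }
    END {
      if (n == 0) exit                      # no input, no output
      mean = sum / n
      var = (n > 1) ? (sumsq - n * mean * mean) / (n - 1) : 0
      if (var < 0) var = 0                  # guard against rounding error
      printf "n=%d min=%s max=%s mean=%.4f sd=%.4f\n", n, min, max, mean, sqrt(var)
    }
  '
}
```

Used as [cmd] | num_stats, in the same position as the R command.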