One large project I am working on involves sequencing the genomes of many hundreds of Eucalyptus individuals across the section Adnataria (subgenus Symphyomyrtus). The Eucalyptus reference genome is that of E. grandis, which sits in an adjacent section of the same subgenus. Molecularly, this equates to something on the order of 5% of sites diverged between most Adnataria and the reference. Care is therefore needed when aligning short reads to the reference: the aligner should be sensitive enough to align short reads to a reasonably diverged reference, without increasing the number of false-positive or otherwise erroneous alignments.
The best JabRef/BetterBibTex citation key
I like citekeys like "murray17_kwip", which seems the most concise representation. This can be achieved by the following JabRef code:
Freebayes uses too much RAM without merged BAMs
I've been running some very large variant calling runs lately, and have run into RAM limitations on the cluster. According to this GitHub issue, freebayes really ought to be given merged BAMs as input when there are many samples. Merging BAMs reduces the memory footprint from >20GB per core to about 2GB per core.
Using getopt twice in one program
Getopt won't run twice? There's an easy fix.
The Arabidopsis Transcriptome under Dynamic Light Conditions
I wrote my honours thesis in 2013, in the lab of Justin Borevitz. I aimed to develop lab growth facilities that mimic natural rhythms of light, humidity and temperature, and use these facilities to investigate the effects of different lighting conditions on the transcriptomes of Arabidopsis.