An update on Bioawk

Bioawk is a great project started by Heng Li some years ago. The aim was to take the awk source code, modify it slightly for use with common biological formats, and add some new functions. Heng's original doesn't accept many pull requests, so to add some features I maintain my own …

Getting data from NCBI assembly using the accession number only

NCBI's assembly database is a great one-stop-shop for genomic data and annotations but it's actually kind of difficult to download data if you only know the accession number of an assembly. The documentation says that the assembly database is integrated with entrez-direct, a great set of command line utilities for accessing NCBI data from the …

Creating huge metabolic overviews for comparative genomics

I love looking at KEGG maps and using them to understand an organism's metabolism, but they have their limitations. For starters, you're obviously stuck with how they are drawn, which in most cases includes many variations on a particular pathway. Secondly, the tools for mapping your own genes onto a pathway are limited to …

Going from a messy supplementary table to good clean data

"Bioinformatics... Or 'advanced file copying' as I like to call it." (Nick Loman, @pathogenomenick, January 29, 2014)

Get ready for some advanced file copying! I recently had to clean up some data from the supplementary material of Pereira et al. 2011, which is a very nice table of manually annotated genes in sulfate …