GenBank reaching 2.5 million COI sequences

At BopCo we use the reference library GenBank to compare the barcodes we produce from unknown samples against the available reference sequences of known species. For animal requests (e.g. bird strikes, insect pests, bushmeat) and related projects, the most popular barcode is the cytochrome c oxidase subunit 1 (COI). One of the main things we consider before presenting a taxonomic assignment by DNA barcoding is the quality of the reference database.

A recent study documents the trends and usability of COI for (meta)barcoding. Over 2.5 million COI sequences are currently available on GenBank. This is without doubt an impressive number, but the usability of these records is not guaranteed because of quality, labelling or clarity issues. The authors found that only half of the 2.5 million COI records were fully identified to the species rank; of those, 92% were at least 500 bp long and 74% had a country annotation. They make a number of suggestions to ensure the quality of GenBank in the future. The paper also mentions the other big reference library, BOLD, which holds more private data but has more built-in features for assessing barcode reliability. The cross-referencing between the two databases is reportedly far from complete, meaning that the total number of COI sequences available online is even higher.
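The three criteria mentioned above (species-level identification, a length of at least 500 bp, and a country annotation) can be sketched as a simple record filter. This is only an illustration and not the method used in the study: the `CoiRecord` class, its field names, the example accessions and the name-based heuristic for recognising a species-level identification are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Optional

MIN_LENGTH_BP = 500  # length threshold reported in the study

# Tokens that usually signal an incomplete identification (assumed list).
UNCERTAIN_TOKENS = {"sp.", "cf.", "aff.", "nr.", "gen.",
                    "environmental", "uncultured", "unidentified"}


@dataclass
class CoiRecord:
    """Hypothetical, simplified view of a reference record."""
    accession: str
    organism: str
    sequence: str
    country: Optional[str] = None


def is_species_level(organism: str) -> bool:
    """Heuristic: a 'Genus epithet' name without uncertainty markers."""
    parts = organism.split()
    if len(parts) < 2:
        return False
    if any(part.lower() in UNCERTAIN_TOKENS for part in parts):
        return False
    genus, epithet = parts[0], parts[1]
    return (genus[0].isupper()
            and epithet.islower()
            and epithet.replace("-", "").isalpha())


def is_usable(record: CoiRecord) -> bool:
    """Apply the three checks: species rank, length, country annotation."""
    return (is_species_level(record.organism)
            and len(record.sequence) >= MIN_LENGTH_BP
            and bool(record.country))


# Made-up records for illustration only.
records = [
    CoiRecord("EXAMPLE1", "Aedes albopictus", "A" * 658, "Belgium"),
    CoiRecord("EXAMPLE2", "Aedes sp. BOLD:123", "A" * 658, "Belgium"),
    CoiRecord("EXAMPLE3", "Culex pipiens", "A" * 300, None),
]
usable = [r.accession for r in records if is_usable(r)]
```

With these made-up records, only the first one passes all three checks; the second fails the species-level check and the third is too short and lacks a country.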

The full paper is available here: https://doi.org/10.1101/353904


Mon, 2018-08-06 11:51 -- BopCo