A SEquence Globally Unique IDentifier (SEGUID) Proteome Database


Each protein sequence stored in UniProt has a CRC64 checksum. This means you can take an arbitrary sequence (with or without a known identifier), calculate its CRC64 checksum, and then search UniProt for that specific sequence. There are two problems with this:

  1. As far as I can tell, UniProt does not provide a web service for querying its database by checksum. I’ll check again, but I don’t see one :-(
  2. There is a small but real chance of collision; that is, two different sequences may produce the same CRC64 checksum.

On Nature Precedings I came across "A SEquence Globally Unique IDentifier (SEGUID) Proteome Database", which addresses both of the problems above.
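
For reference, here is a small Python sketch of how a SEGUID can be computed. As I understand it, a SEGUID is the SHA-1 digest of the uppercased sequence, base64-encoded with the trailing "=" padding removed; the function name `seguid` is my own choice, and this is an illustration rather than the database's actual code.

```python
import base64
import hashlib


def seguid(sequence: str) -> str:
    """Compute a SEGUID-style identifier for a sequence.

    Assumption: SEGUID = base64(SHA-1(uppercase sequence)) with the
    trailing '=' padding stripped.
    """
    digest = hashlib.sha1(sequence.upper().encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii").rstrip("=")


if __name__ == "__main__":
    # Hypothetical example sequence; case does not affect the result.
    print(seguid("MKTAYIAKQR"))
    print(seguid("mktayiakqr"))
```

A SHA-1 digest is 160 bits, so the identifier is always 27 characters long and can be computed locally without asking any server.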
