Description

My Genome Collection with Assigned Taxonomy (My Gee CAT) manages extremely large sequence databases. The database that I currently manage with this package contains nearly 100 million sequence files. The program is written entirely in Perl and uses a MySQL database as its backend. The main strength of MyGCAT is that every record carries a taxonomic assignment, so users can create custom BLAST databases selected by source data table and by taxonomic membership at any taxonomic level. Taxonomic sublevels may also be excluded from the database. For example, a database of the Embryophyta may be created that excludes all members of the Poaceae. This makes it possible to build BLAST databases specific to the hypothesis the user actually wants to test, rather than running a massive search through all of GenBank.

Features

Fully implemented:

Under development:


Screenshots

Log On to MyGCAT

Database security is provided via the user list from the MySQL database. Users must be registered as legitimate MySQL users to be able to access the HTML interface to the database.

Review Table Properties

You can review the properties of all tables in the MySQL database before running a query. In the future, this interface will also allow specific tables to be updated from NCBI. Currently, database updates run as weekly cron jobs.

Generate a Database

The user password must be entered again. The available output formats are WU-BLAST, NCBI-BLAST, and FASTA-formatted text files. A set of taxon IDs to include in the output file can be selected, along with taxonomic groups to exclude from within those included sets. The data tables that contribute to the output file are also selected here. All taxonomic IDs are NCBI taxon ID numbers. The example below will create a database named Embryophyta that includes all of the embryophytes in the selected table but excludes all Poaceae.
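
The include/exclude selection described above can be sketched in a few lines. This is only an illustration of the idea, not MyGCAT's actual code (which is written in Perl against MySQL): the `PARENT` map, the `lineage` and `select_records` functions, and the record names are all hypothetical, and the toy lineage skips the intermediate ranks found in the real NCBI taxonomy.

```python
# Toy parent map (child taxon ID -> parent taxon ID), keyed by NCBI taxon IDs.
# Assumption: intermediate ranks are skipped to keep the example short.
PARENT = {
    33090: 1,      # Viridiplantae -> root
    3193: 33090,   # Embryophyta -> Viridiplantae
    4479: 3193,    # Poaceae -> Embryophyta (simplified)
    4577: 4479,    # Zea mays -> Poaceae (simplified)
    4530: 4479,    # Oryza sativa -> Poaceae (simplified)
    3702: 3193,    # Arabidopsis thaliana -> Embryophyta (simplified)
    9606: 1,       # Homo sapiens -> root (simplified)
}


def lineage(taxon_id):
    """Return the set holding taxon_id and all of its ancestors."""
    seen = set()
    while taxon_id in PARENT and taxon_id not in seen:
        seen.add(taxon_id)
        taxon_id = PARENT[taxon_id]
    seen.add(taxon_id)  # the root (or an unknown ID) has no parent entry
    return seen


def select_records(records, include, exclude):
    """Keep records that fall under an included taxon and no excluded taxon."""
    selected = []
    for record_id, taxon_id in records:
        ancestors = lineage(taxon_id)
        if (ancestors & include) and not (ancestors & exclude):
            selected.append(record_id)
    return selected


# Hypothetical records: (record name, NCBI taxon ID of the source organism).
RECORDS = [("seq1", 4577), ("seq2", 3702), ("seq3", 9606), ("seq4", 4530)]

# Embryophyta (3193) minus Poaceae (4479).
print(select_records(RECORDS, include={3193}, exclude={4479}))
```

With these toy records, only the Arabidopsis sequence survives: the maize and rice records are embryophytes but fall under the excluded Poaceae, and the human record is outside the included set altogether.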

Review Query

The user has the opportunity to review the query before running the program. The names and ranks of the taxa selected for inclusion and exclusion are shown here for clarity.

Databases Available for Download

Once a file has been created, it will be tarred and gzipped before being transferred to a central location for download via a web interface.
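
The packaging step can be sketched as follows. This is a minimal illustration using Python's standard `tarfile` module, not the code MyGCAT itself runs; the function name `archive_database` and the file names are hypothetical, and the transfer to the download location is left out.

```python
import os
import tarfile
import tempfile


def archive_database(db_files, archive_path):
    """Bundle the generated database files into one gzip-compressed tar file."""
    with tarfile.open(archive_path, "w:gz") as tar:
        for path in db_files:
            # Store each file under its base name so the archive unpacks flat.
            tar.add(path, arcname=os.path.basename(path))
    return archive_path


# Demonstration on a throwaway directory with a hypothetical FASTA output file.
work_dir = tempfile.mkdtemp()
fasta_path = os.path.join(work_dir, "Embryophyta.fasta")
with open(fasta_path, "w") as handle:
    handle.write(">seq2\nACGT\n")

archive = archive_database([fasta_path], os.path.join(work_dir, "Embryophyta.tar.gz"))
with tarfile.open(archive, "r:gz") as tar:
    print(tar.getnames())
```

A BLAST-formatted database usually consists of several index files, so passing a list of paths lets one archive carry the whole set.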


Author: James Estill
Last Updated: Sunday, 12 June 2005