Friday 15 April 2011

UPLOADING SEQUENCES TO THE DATABASES/SEQUENCE SUBMISSIONS

Sequences can be submitted to NCBI GenBank using:
  1. Sequin
  2. BankIt

Sequin

Sequin is a stand-alone software tool developed by the National Center for Biotechnology Information (NCBI) for submitting and updating sequences to the GenBank, EMBL, and DDBJ databases. Sequin has the capacity to handle long sequences and sets of sequences (segmented entries, as well as population, phylogenetic, and mutation studies). It also allows sequence editing and updating, and provides complex annotation capabilities. In addition, Sequin contains a number of built-in validation functions for enhanced quality assurance.

File Formats Accepted

Sequin normally expects to read sequence files in FASTA format. Most sequence analysis software packages include FASTA or "raw" as one of the available output formats. Population studies, phylogenetic studies, mutation studies, and environmental samples may be entered in FASTA format or, if you are submitting an alignment, in PHYLIP, NEXUS, MACAW, or FASTA+GAP format.
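Before loading a file into a submission tool, it can help to confirm that it really is well-formed FASTA. The following is a minimal sketch of a FASTA reader in Python; the function name parse_fasta and the example records are made up for illustration and are not part of Sequin.

```python
def parse_fasta(text):
    """Parse FASTA text into a list of (definition_line, sequence) tuples."""
    records = []
    header = None
    chunks = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            # A new definition line closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header = line[1:].strip()
            chunks = []
        elif header is None:
            raise ValueError("sequence data found before the first '>' line")
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


# Made-up example input: two records, the first wrapped over two lines.
example = """>Seq1 example record
ATGCCGTA
GGCTTAAC
>Seq2 another record
TTGACA
"""

records = parse_fasta(example)
for header, sequence in records:
    print(header, len(sequence))
```

Sequence lines belonging to one record are simply joined, so line wrapping in the input file does not matter.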

Creating a Submission

Sequin is organized into a series of forms for entering submitting authors, entering organism and sequences, entering information such as strain, gene, and protein names, viewing the complete submission, and editing and annotating the submission. The goal is to go quickly from raw sequence data to an assembled record that can be viewed, edited, and submitted to your database of choice.

Submitting Authors Form: The pages in the Submitting Authors form ask you to provide the release date, a working title, names and contact information of submitting authors, and affiliation information.

Submission page: This page asks for a tentative title for a manuscript describing the sequence and initially marks the manuscript as unpublished. When the article is published, the database staff will update the sequence record with the new citation. This page also lets you indicate that a record should be held confidential by the database until a specified date, although the preferred policy is to release the record immediately into the public databases. The form also contains pages for contact details, authors, and the authors' affiliation.

Sequence Format Form: This form asks for the submission type and the sequence format.

Submission Type:
  1. Single Sequence, if you have a single contiguous mRNA or genomic DNA sequence.
  2. Segmented Sequence, if you have a single collection of non-overlapping, non-contiguous sequences that cover a specified genetic region from a single source. A standard example is a set of genomic DNA sequences that encode exons from a gene along with fragments of their flanking introns.
  3. Gapped Sequence, if you have a single non-contiguous mRNA or genomic DNA sequence. A gapped sequence contains specified gaps of known or unknown length where the exact nucleotide sequence has not been determined.

Sequence Format: FASTA, FASTA+GAP, NEXUS, PHYLIP, etc.

Next, fill in the Organism page and, optionally, the Annotation page before final submission. The program then supplies an automatic identifier, which is used for deposition in the database and for future correspondence.


BankIt

BankIt is a web-based tool developed by the National Center for Biotechnology Information (NCBI) for submitting and updating sequences in GenBank.


Creating a Submission

Contact Information: The name, address, phone number, fax number, and email address of the submitter must be entered when registering and submitting for the first time.

Release date information: The record is released either immediately after it is processed at NCBI or on a date the submitter specifies.

Reference information:
  1. Sequence authors: the names of the researchers who are credited with the sequence.
  2. Publication information: Unpublished, In-Press, or Published, plus the applicable citation information (paper's title, authors, journal title, volume, issue, year, pages).

Submission Category and Type: The category is either original sequencing or Third Party Annotation. The type is a single sequence, a sequence set (phylogenetic, population, environmental, etc.), or a batch.

Nucleotide sequence(s): Enter single or multiple sequences by cut-and-paste, or upload them as a FASTA file. FASTA files should include the organism names in their definition lines. Sequences must be at least 200 nucleotides long, unless they are complete exons, non-coding RNAs (ncRNAs), microsatellites, or ancient DNA.
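The 200-nucleotide minimum is easy to check before submitting. Below is a minimal Python sketch, assuming the sequences are already held in a dictionary keyed by sequence ID; the function name too_short and the "kind" labels used to mark exempt sequences are hypothetical and are not BankIt terms.

```python
MIN_LENGTH = 200  # minimum length stated in the text

# Hypothetical labels for the exceptions listed in the text.
EXEMPT_KINDS = {"complete exon", "ncRNA", "microsatellite", "ancient DNA"}


def too_short(sequences, kinds=None):
    """Return the IDs of sequences under MIN_LENGTH that are not exempt.

    sequences: dict mapping sequence ID to nucleotide string
    kinds:     optional dict mapping sequence ID to one of EXEMPT_KINDS
    """
    kinds = kinds or {}
    return [
        seq_id
        for seq_id, seq in sequences.items()
        if len(seq) < MIN_LENGTH and kinds.get(seq_id) not in EXEMPT_KINDS
    ]


# Made-up example data.
sequences = {"seqA": "A" * 250, "seqB": "A" * 50, "seqC": "A" * 80}
kinds = {"seqC": "microsatellite"}

print(too_short(sequences, kinds))
```

Here seqB is reported because it is short and has no exemption, while seqC is short but marked as a microsatellite.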

Molecule type: What was sequenced (genomic DNA, mRNA, genomic RNA, cRNA, etc.).

Topology: Linear or circular. A circular sequence must be complete, such as a complete plasmid.

Organism name, applicable source modifiers, location:
  1. Genus and species names (if not previously provided in the FASTA file).
  2. If the name is new or unrecognized, provide the best known taxonomic lineage.
  3. If the genus and/or species names are not known, provide the most specific name known (for example: Bacillus sp., Uncultured bacterium, Uncultured archaeon).
  4. For any synthetic vector, provide the most complete name (for example: Cloning vector pAB234, Transfer vector p789Abc).
  5. Source modifiers include: strain, clone, isolate, specimen-voucher, isolation-source, country.
  6. Location: organelle (mitochondrion, chloroplast, etc.); map and/or chromosome.
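Organism names and source modifiers can be written into the FASTA definition line as bracketed name=value pairs after the sequence ID. The Python sketch below builds such a line. The helper name make_defline and the example values are hypothetical, and the modifier names accepted at submission time should be checked against the current NCBI documentation.

```python
def make_defline(seq_id, organism, modifiers=None):
    """Build a FASTA definition line with bracketed source modifiers.

    seq_id:    sequence identifier (no whitespace)
    organism:  organism name, e.g. "Bacillus sp."
    modifiers: optional dict of source modifier name to value
    """
    if not seq_id or any(ch.isspace() for ch in seq_id):
        raise ValueError("sequence ID must be non-empty and contain no spaces")
    parts = [">" + seq_id, "[organism={}]".format(organism)]
    for name, value in (modifiers or {}).items():
        parts.append("[{}={}]".format(name, value))
    return " ".join(parts)


# Made-up example values.
line = make_defline(
    "Seq1",
    "Bacillus sp.",
    {"strain": "AB12", "isolation-source": "soil"},
)
print(line)
```

The modifiers are emitted in the order they appear in the dictionary, so the same input always gives the same definition line.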

Features of the sequence: Upload files or use input forms to add all applicable features (for example: CDS, gene, rRNA, tRNA, microsatellite, exon, intron)
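One file format commonly used for uploading features is a five-column, tab-delimited feature table: a ">Feature" line naming the sequence, then one line per feature with start, stop, and feature key, followed by qualifier lines indented with three tabs. The Python sketch below writes such a table under that assumption. The function name feature_table and the gene and product names are invented for illustration, and the layout should be verified against the NCBI feature table documentation before use.

```python
def feature_table(seq_id, features):
    """Return a five-column, tab-delimited feature table as a string.

    features: list of (start, stop, key, qualifiers) tuples, where
              qualifiers is a list of (name, value) pairs.
    """
    lines = [">Feature " + seq_id]
    for start, stop, key, qualifiers in features:
        # Columns 1-3: start, stop, feature key.
        lines.append("{}\t{}\t{}".format(start, stop, key))
        for name, value in qualifiers:
            # Columns 4-5: qualifier name and value, after three empty columns.
            lines.append("\t\t\t{}\t{}".format(name, value))
    return "\n".join(lines) + "\n"


# Made-up example: one gene with its coding region.
table = feature_table(
    "Seq1",
    [
        (1, 300, "gene", [("gene", "abcA")]),
        (1, 300, "CDS", [("product", "hypothetical protein")]),
    ],
)
print(table)
```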




