Search Field Descriptions for Sequence Database

Monica Romiti; Peter Cooper

NCBI Bookshelf. A service of the National Library of Medicine, National Institutes of Health.

Entrez Sequences Help [Internet]. Bethesda (MD): National Center for Biotechnology Information (US); 2010-.

Monica Romiti, M.L.S. and Peter Cooper, Ph.D.

Created: December 3, 2010; Last Update: July 5, 2024.

Table 1.

Fields available for Nucleotide and Protein Sequence Databases.

Search Field	Short Field Specifier^†	Definition
[Accession]	[ACCN]	The accession number assigned by NCBI. Examples: AF123456[ACCN] Nucleotide NP_000240[ACCN] Protein
[All Fields]	[ALL]	All terms from all search fields in the database. Example: human[All Fields] Nucleotide Protein (Compare with human[Organism], see [Organism] entry in this table.)
[Author]	[AU] [AUTH]	All authors from all references in the records. The format is last name [space] first initial(s), without punctuation. Example: venter jc[AUTH] Nucleotide Protein
[EC/RN Number]	[ECNO]	Enzyme Commission (EC) number for an enzyme activity. Example: 5.3.1.9[ECNO] Protein Nucleotide (glucose-6-phosphate isomerase)
[Feature Key] (Nucleotide, Protein, GSS)	[FKEY]	Biological features listed in the Feature Table of the sequence records. Examples: 3 utr[FKEY] Nucleotide nonstdres[FKEY] Protein The GenBank feature table definition has more information on available features.
[Filter]	[FILT] [SB]	Filtered subsets of the database. An important kind of filter is based on the presence of links to other records. Other filters create useful subsets of data, such as those set as Filters in the Discovery column of search results. Examples: Links: nucleotide_protein[Filter] Nucleotide protein_structure[Filter] Protein. Organism or properties subsets: all[filter] Nucleotide Protein mrna[filter] Nucleotide refseq[filter] Nucleotide Protein mammals[filter] Nucleotide Protein
[Gene Name]	[GENE]	Gene names annotated on database records. For NCBI Reference Sequences, these names correspond to official nomenclature guidelines when possible. Submitters provide the gene names on GenBank/GenPept records. Gene names on submitted records may be historical names or vary from official guidelines for other reasons. Example: BRCA1[GENE] Nucleotide Protein
[Bioproject]	[BPRJ]	The numeric unique identifier for the BioProject that produced the sequence records. Examples: 13139[Bioproject] Nucleotide Protein (Oryza sativa Japonica) 21117[Bioproject] Nucleotide (Pelagic Microbial Assemblages in the Oligotrophic Ocean)
[Issue]	[ISS]	The issue number of the journals cited on sequence records. Not generally useful in sequence databases.
[Journal]	[JOUR]	The name of the journals cited on sequence records. Journal names are indexed in the database in abbreviated form, although many full titles are mapped to their abbreviations. Journals are also indexed by their International Standard Serial Number (ISSN). Examples: proceedings of the national academy of sciences of the united states of america[Journal] Nucleotide Protein Proc Natl Acad Sci U S A[Journal] Nucleotide Protein 0027-8424[Journal] Nucleotide Protein
[Keyword]	[KYWD]	Keywords applied by submitter or from controlled vocabularies applied by NCBI or other databases. Except for specific kinds of records, such as the examples given below, the terms in this index are not well controlled. This field is unpopulated for many GenBank/GenPept records. Examples: BARCODE[KYWD] Nucleotide Protein HTG[KYWD] Nucleotide RefSeqGene[KYWD] Nucleotide WGS_MASTER[KYWD] Nucleotide
[Modification Date]	[MDAT]	The date of most recent modification of a sequence record. The date format is YYYY/MM/DD. Only the year is required. The Modification Date is often used as a range of dates. The colon ( : ) separates the beginning and end of a date range. Examples: 2023/01/08[MDAT] Nucleotide Protein 1995/09[MDAT] Nucleotide Protein 2022/01:2023/12/31[MDAT] Nucleotide Protein
[Molecular Weight] (Protein only)	[MOLWT]	The molecular weight in Daltons of the protein chain calculated from the amino acids only. This may not correspond to the molecular weight of the protein obtained from biological samples because of incomplete data or post-translational modifications of the protein in living systems. The colon ( : ) separates the beginning and end of a molecular weight range. Examples: 3039[MOLWT] Protein 25000:75000[MOLWT] Protein
[Organism]	[ORGN]	The scientific and common names for the complete taxonomy of organisms that are the source of the sequence records. This vocabulary includes all available nodes in the NCBI taxonomy database. Examples: cellular organisms[ORGN] Nucleotide Protein firmicutes[ORGN] Nucleotide Protein human[ORGN] Nucleotide Protein Escherichia coli O157:H7[ORGN] Nucleotide Protein
[Page Number]	[PAGE]	The page numbers of the articles that are cited on the sequence record. Not generally useful in sequence databases.
[Primary Accession]	[PACC]	The primary accession number of the sequence record. This is the first one appearing on the ACCESSION line in the GenBank/GenPept format. Many records have additional secondary accessions representing records that have been merged. The Accession field indexes both primary and secondary accessions. Examples: U01317[PACC] Nucleotide M18047[PACC] Nucleotide (Compare: M18047[ACCN] Nucleotide, see [Accession] entry in this table.)
[Primary Organism]	[PORGN]	The primary organism when there is more than one source organism. Example: human[PORGN] Nucleotide (Compare with human[ORGN] Nucleotide, see [Organism] entry in this table.)
[Properties]	[PROP]	Molecular type, source database, and other properties of the sequence record. Terms indexed for this field are a useful classification system for sequence records. Examples: Molecule type: biomol_ncrna[PROP] Nucleotide biomol_genomic[PROP] Nucleotide biomol_mrna[PROP] Nucleotide. Cellular location: gene_in_genomic[PROP] Nucleotide Protein gene_in_mitochondrion[PROP] Nucleotide Protein gene_in_plastid[PROP] Nucleotide Protein. GenBank division: gbdiv_htg[PROP] Nucleotide gbdiv_vrt[PROP] Nucleotide Protein (These GenBank division queries must be combined with srcdb_genbank[PROP] to retrieve only GenBank records.) Database source: srcdb_genbank[PROP] Nucleotide Protein srcdb_ddbj/embl/genbank[PROP] Nucleotide Protein srcdb_refseq[PROP] Nucleotide Protein srcdb_pdb[PROP] Nucleotide Protein srcdb_swiss-prot[PROP] Protein
[Protein Name]	[PROT]	The names of protein products as annotated on sequence records. The content of this field is not well controlled for GenBank/GenPept records and may contain inaccurate or incomplete information. Examples: aldolase[Protein Name] Nucleotide Protein
[Publication Date]	[PDAT]	The date that records were made public in Entrez. The date format is YYYY/MM/DD. The colon ( : ) separates the beginning and end of a date range. Examples: 2023/01/08[PDAT] Nucleotide Protein 1995/09[PDAT] Nucleotide Protein 2022/01:2023/12/31[PDAT] Nucleotide Protein
[SeqID String]	[SQID]	The NCBI identifier string for the sequence record. This is a brief structured format used by NCBI software. Example: gnl asm gca 000000215 2 chr3 45328308[SeqID String] Nucleotide
[Sequence Length]	[SLEN]	The total length of the sequence: the number of nucleotides or amino acids in the sequence. The colon ( : ) separates the beginning and end of a length range. Examples: 755[SLEN] Nucleotide Protein 100:1000[SLEN] Nucleotide Protein
[Substance Name]	[SUBS]	The names of chemical substances associated with a record. This field is only populated for sequences extracted from structure records (PDB-derived sequences). The associated residue position is often included. Examples: mg, 1010[Substance Name] Nucleotide atp[Substance Name] Protein
[Text Word]	[WORD]	Text on a sequence record that is not indexed in other fields. Terms indexed here are included in an All Fields search. Not generally useful.
[Title]	[TI] [TITL]	Words and phrases found in the title of the sequence record. The title is the DEFINITION line of the GenBank/GenPept format of the record. This line summarizes the biology of the sequence and includes the organism, product name, gene symbol, molecule type, and sequence completeness. Examples: complete cds[TI] Nucleotide kinesin[TI] Nucleotide Protein liver[TI] Nucleotide Protein uncultured[TI] Nucleotide Protein
[Volume]	[VOL]	The volume number of the journals cited in references on the sequence record. Not generally useful in the sequence databases.

†: Queries using any term followed by the full name of the indexed field in square brackets will only retrieve records with the term indexed in that field. For example, a search with apolipoprotein[Title] finds only records with “apolipoprotein” indexed for their Title field. Some fields have shorter names that can be used instead of the full name. These are listed in the Short Field Specifier column of Table 1 when available.
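
The term[Field] pattern and the colon range syntax described in Table 1 can be assembled programmatically. The following Python sketch is illustrative only: the helper names field_term, range_term, and combine_and are hypothetical and not part of any NCBI tool, and joining terms with the Boolean operator AND is an assumption about Entrez query syntax that Table 1 itself does not state (the table says only that GenBank division queries "must be combined with" srcdb_genbank[PROP]).

```python
def field_term(term: str, field: str) -> str:
    """Build a fielded query such as BRCA1[GENE] (hypothetical helper).

    `field` may be a full field name (e.g. "Title") or a short
    specifier (e.g. "TI") from Table 1.
    """
    return f"{term.strip()}[{field.strip()}]"


def range_term(start, end, field: str) -> str:
    """Build a range query such as 100:1000[SLEN] (hypothetical helper).

    The colon separates the beginning and end of the range, as Table 1
    describes for dates, sequence lengths, and molecular weights.
    """
    return f"{start}:{end}[{field.strip()}]"


def combine_and(*terms: str) -> str:
    """Join fielded terms with AND (assumed Entrez Boolean syntax)."""
    return " AND ".join(terms)


if __name__ == "__main__":
    # GenBank vertebrate-division records only, per the note in the
    # [Properties] row of Table 1.
    print(combine_and(
        field_term("gbdiv_vrt", "PROP"),
        field_term("srcdb_genbank", "PROP"),
    ))
    # Records modified between January 2022 and the end of 2023.
    print(range_term("2022/01", "2023/12/31", "MDAT"))
```

The resulting strings are intended to be pasted into the Nucleotide or Protein search box. Whether a given field is populated in a given database should still be checked against Table 1.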

Bookshelf ID: NBK49540

Cite this Page
Romiti M, Cooper P. Search Field Descriptions for Sequence Database. 2010 Dec 3 [Updated 2024 Jul 5]. In: Entrez Sequences Help [Internet]. Bethesda (MD): National Center for Biotechnology Information (US); 2010-.