RefSeqGene and LRG (Locus Reference Genomic)
How do RefSeqGene and LRG compare?
How RefSeqGene and LRG are similar
When a RefSeqGene record is assigned an LRG accession, that version of the RefSeqGene and the LRG share the same genomic sequence, the same definition and labeling of exons, and the same transcript and protein products. As a result, a variant described in LRG coordinates is equivalent to the same description in RefSeqGene coordinates. Specifically, these key values are identical:
- the genomic sequence
- the location of exons
- the numbering of exons
- the sequence(s) of the reference cDNA(s)
- the sequence(s) of the protein product(s)
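The practical consequence of these identical values can be sketched in Python: a genomic variant description moves from a RefSeqGene to its LRG by swapping the reference accession alone, with position and alleles unchanged. This is a minimal illustration, not an NCBI or LRG tool. The NG_TO_LRG table, the ng_to_lrg function, and the variant g.5000A>G are hypothetical; only the NG_007400.1/LRG_1 pairing is taken from this page. The lookup is keyed on the versioned accession because an LRG matches one specific RefSeqGene version.

```python
import re

# Hypothetical lookup from a versioned RefSeqGene accession to its LRG.
# The NG_007400.1 -> LRG_1 pairing is the example used on this page; a real
# table should come from NCBI's published mapping, not be typed in by hand.
NG_TO_LRG = {"NG_007400.1": "LRG_1"}

# Matches a versioned genomic description such as "NG_007400.1:g.5000A>G".
_GENOMIC = re.compile(r"^(NG_\d+\.\d+):(g\..+)$")


def ng_to_lrg(description: str, mapping: dict[str, str] = NG_TO_LRG) -> str:
    """Re-express a RefSeqGene genomic description against the matching LRG.

    The genomic sequences are identical, so the position and alleles are
    carried over unchanged; only the reference accession is replaced.
    """
    match = _GENOMIC.match(description)
    if match is None:
        raise ValueError(f"not a versioned NG_ genomic description: {description!r}")
    accession, change = match.groups()
    if accession not in mapping:
        raise KeyError(f"{accession} has no LRG in the supplied mapping")
    return f"{mapping[accession]}:{change}"


print(ng_to_lrg("NG_007400.1:g.5000A>G"))  # LRG_1:g.5000A>G
```

Only genomic (g.) descriptions are handled here; transcript or protein descriptions would additionally need the cDNA-to-tN and protein-to-pN cross-references described below.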
How RefSeqGene and LRG differ
The format of the RefSeqGene record differs from that of an LRG. For example, compare NG_007400.1 (viewable in GenBank or Graphics format) with LRG_1. In addition, some RefSeqGene accessions have not yet been assigned an LRG accession.
Which RefSeqGene records have been assigned LRG accessions?
A RefSeqGene record that has been assigned an LRG accession can be recognized in the following ways:
- The LRG identifier appears in parentheses in the title of the RefSeqGene record, e.g. (LRG_1).
- Each RefSeq cDNA carries a cross-reference to its LRG transcript identifier (t1, t2, ...).
- Each RefSeq protein carries a cross-reference to its LRG protein identifier (p1, p2, ...).
- The LRG accession is reported in the Reference Sequences section of the corresponding record in NCBI's Gene database (e.g. http://www.ncbi.nlm.nih.gov/gene/7157#reference-sequences).
- The file ftp://ftp.ncbi.nlm.nih.gov/refseq/H_sapiens/RefSeqGene/LRG_RefSeqGene lists the corresponding RefSeqGene and LRG accessions for genomic regions, transcripts, and proteins.
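Two of the checks above can be automated, as in the Python sketch below: reading the LRG identifier out of a RefSeqGene title, and building a RefSeqGene-to-LRG lookup from the LRG_RefSeqGene file. It assumes the file is tab-delimited with a header row containing columns named RSG and LRG; those column names, and the helpers lrg_from_title and load_rsg_to_lrg, are assumptions for illustration, so check the header of the downloaded file and pass other column names if they differ.

```python
import csv
import re
from typing import Dict, Optional

# The LRG identifier appears in parentheses in a RefSeqGene title, e.g. "(LRG_1)".
_LRG_IN_TITLE = re.compile(r"\((LRG_\d+)\)")


def lrg_from_title(title: str) -> Optional[str]:
    """Return the LRG identifier shown in a RefSeqGene title, or None if absent."""
    match = _LRG_IN_TITLE.search(title)
    return match.group(1) if match else None


def load_rsg_to_lrg(text: str, rsg_col: str = "RSG", lrg_col: str = "LRG") -> Dict[str, str]:
    """Map each RefSeqGene accession to its LRG accession.

    Assumes tab-delimited text whose first line is a header naming the columns
    (a leading '#' is stripped). The column names are an assumption. Rows that
    repeat a genomic pair, e.g. one row per transcript/protein, collapse into
    a single entry.
    """
    lines = text.splitlines()
    if not lines:
        return {}
    lines[0] = lines[0].lstrip("#")  # e.g. "#tax_id" -> "tax_id"
    mapping: Dict[str, str] = {}
    for row in csv.DictReader(lines, delimiter="\t"):
        rsg = (row.get(rsg_col) or "").strip()
        lrg = (row.get(lrg_col) or "").strip()
        if rsg and lrg:  # skip rows lacking either accession
            mapping[rsg] = lrg
    return mapping


# Usage, after downloading the LRG_RefSeqGene file named above:
# with open("LRG_RefSeqGene", encoding="utf-8") as handle:
#     rsg_to_lrg = load_rsg_to_lrg(handle.read())
```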
How can LRG data be found and visualized at NCBI?
- Use the Browse section of the RefSeqGene site.
- Appending the LRG identifier to http://www.ncbi.nlm.nih.gov/nuccore/ (e.g. http://www.ncbi.nlm.nih.gov/nuccore/LRG_1) displays the sequence in GenBank format.
- All LRG sequences can be retrieved from the Nucleotide database with the range query LRG_1:LRG_9999[ACCN].
- Any Gene record that has a public LRG can be retrieved from the Gene database by querying the property has_lrg. On the Gene page, the LRG accession is reported to the right of the RefSeqGene accession in the Reference Sequences section. It also appears in the annotation of the RefSeqGene/LRG alignment to the chromosome, which can be viewed in the sequence viewer embedded in the "Genomic regions, transcripts, and products" section of Gene's full report page.
- An LRG accession can be used to search all NCBI databases at once (e.g. http://www.ncbi.nlm.nih.gov/gquery/?term=LRG_377). At present, results are usually limited to Gene and Nucleotide; ClinVar is expected to begin reporting data in LRG coordinates.
- If you have an LRG accession and know which chromosome the gene is on, you can find the RefSeqGene/LRG alignment on that chromosome by searching for the LRG accession via Tools -> Search in the sequence viewer. The match appears in the Features tab, identified by the gene symbol. If your display is not configured to include alignments, enable them (Configure -> Alignments -> NG Alignments -> Configure) to see the RefSeqGene/LRG placement. Once the alignment is displayed, hover over it until the full tooltip opens; the link in the tooltip leads to the RefSeqGene record.
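The URL-based routes above can also be assembled in code. The Python sketch below, using only the standard library, builds the nuccore and gquery URLs exactly as described and wraps the two Entrez queries in NCBI E-utilities esearch URLs. The helper names are hypothetical, and the [prop] field tag used for has_lrg is an assumption; the range query is copied from this section.

```python
from urllib.parse import quote, urlencode

# Base URLs as given on this page, plus the NCBI E-utilities esearch endpoint.
NUCCORE = "http://www.ncbi.nlm.nih.gov/nuccore/"
GQUERY = "http://www.ncbi.nlm.nih.gov/gquery/"
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def nuccore_url(lrg: str) -> str:
    """GenBank display of an LRG: the identifier appended to the nuccore URL."""
    return NUCCORE + quote(lrg)


def gquery_url(lrg: str) -> str:
    """Search all NCBI databases for an LRG accession."""
    return GQUERY + "?" + urlencode({"term": lrg})


def esearch_url(db: str, term: str) -> str:
    """E-utilities esearch URL that runs an Entrez query against one database."""
    return ESEARCH + "?" + urlencode({"db": db, "term": term})


# Range query over all LRG sequences in Nucleotide (query text from this page).
ALL_LRG_NUCLEOTIDE = esearch_url("nuccore", "LRG_1:LRG_9999[ACCN]")
# Gene records with a public LRG; the "[prop]" field tag is an assumption.
GENES_WITH_LRG = esearch_url("gene", "has_lrg[prop]")

print(gquery_url("LRG_377"))  # http://www.ncbi.nlm.nih.gov/gquery/?term=LRG_377
```

The code only builds URLs; fetching one (for example with urllib.request.urlopen) returns XML by default, and esearch returns at most 20 IDs unless a larger retmax parameter is added.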
Overview/Timeline
The RefSeqGene project began at NCBI early in 2006 at the request of Dr. M. Gulley of the College of American Pathologists, a request reinforced by staff of the CETT program at NIH's Office of Rare Diseases. Discussion soon expanded to include curators of locus-specific databases (LSDBs), in part because one of our early collaborators, Dr. Sue Povey, curated TSC2, a gene that was also of interest to CETT. The first RefSeqGene records were released early in 2007, and more than 1000 were public by the end of 2008.
In April 2008, staff of NCBI (RefSeqGene, dbSNP and dbGaP), GEN2PHEN, and EBI met in Hinxton, UK, to discuss how to establish a stable, internationally accepted sequence standard for reporting the positions of human variants. The experience of the RefSeqGene project was reviewed, and a proposal for implementation was distributed to LSDB curators. The proposal was subsequently published.
The RefSeqGene/LRG collaboration has defined methods for assigning LRG accessions to RefSeqGene sequences. NCBI developed tools to convert RefSeqGene sequences to the LRG format, and a database is in place to track the accessions. The LRG project maintains a web site (http://www.lrg-sequence.org/) with comprehensive documentation.