FAQ about using ClinVar
Please see our submission FAQ for questions about the submission process.
How to search ClinVar
- How do I search ClinVar efficiently?
- How do I retrieve a list of variants in a gene in order of location?
- How can I sort results in gene-specific order?
- How can I retrieve batches of data from ClinVar?
- How can I find other variants at a location, or other variants that cause the same protein change?
Using the web display
- What do SCV and RCV mean?
- Why are there different classifications for the same variant?
- Why are there multiple RCV accessions in ClinVar for the same variant?
- I'm interested in a variant that was submitted with the condition "not specified". What does that mean?
- What does it mean if a ClinVar record is "criteria provided, single submitter" but has more than one submission?
- I think a variant in ClinVar has the wrong classification. What should I do about that?
- I see multiple dates on a ClinVar record. What do they mean?
- How can I tell if a record became public on the ClinVar website after the monthly VCF file was created?
- When a variant may lie in multiple genes, what does ClinVar report?
- A variant in ClinVar is described as a 1-nt deletion in the genomic DNA, but a 60-nt deletion in the mRNA. Is that an error?
- My browser does not display the pdf file documenting Assertion Method. What do I do?
Data sources and processing
- What data sources are included in ClinVar?
- Why don't ClinVar records include HGMD identifiers?
- Why do some variants in ClinVar have no genomic location?
- Is a new version number assigned for any change in an SCV record?
- When there are multiple RefSeqs for a gene, does ClinVar select a subset to use for reporting?
- What is ClinVar's convention for representing the location of variation with length differences when there are multiple options, left or right justified?
- I included specific ages and geographic origins for individuals in my submission; why are they reported as a range and a larger geographic region?
Reports
- Why doesn't the VCF file contain all the data in the XML file?
- Where can I find statistics about the number of ClinVar submissions?
- On the submitter summary page, why is the number of submissions sometimes different from the number of records from that submitter?
Citing ClinVar
How to search ClinVar
1. How do I search ClinVar efficiently?
As documented in more detail here, ClinVar can be searched with terms like
- gene symbols, e.g. PTEN
- HGVS expressions, e.g. NM_000314.4:c.395G>T
- protein changes, e.g. G132V
- rs numbers, e.g. rs180177042
- diseases, e.g. PTEN hamartoma tumor syndrome
- submitters, e.g. Invitae
- location on a chromosome, i.e. a range on a chromosome for a given assembly, e.g. between 89623000 and 89730000 on chromosome 10 based on GRCh37
Note that by default, searching uses the exact search terms provided; for example, searching for "Noonan" finds records that include the word "Noonan" but does not find records with the word "Noonan's". Consider doing a wild-card search like "Noonan*" if you want to expand your search. Also, ClinVar queries search all fields of data by default. More information on how to narrow your query by searching particular fields is available in the ClinVar help document. If you have favorite queries that you run periodically, you can log in to MyNCBI and save your searches. Saved searches can be run on the fly, or you can receive regular email updates with the results of the search.
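The search tips above can also be written down programmatically. The following is a minimal sketch only: it builds a query string and an E-utilities `esearch` URL without sending any request. The helper names `build_term` and `esearch_url` are hypothetical, and the `[gene]` field tag used in the example is an assumption that should be checked against the ClinVar help document.

```python
from urllib.parse import urlencode

# Public E-utilities endpoint for Entrez searches.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_term(words, field=None, wildcard=False):
    """Join search words with OR, optionally adding a wildcard and a field tag."""
    parts = []
    for word in words:
        text = word + "*" if wildcard else word
        # Assumption: a field restriction is written as term[field].
        parts.append(f"{text}[{field}]" if field else text)
    return " OR ".join(parts)


def esearch_url(term, retmax=20):
    """Return an esearch URL for the ClinVar database; no request is sent."""
    params = {"db": "clinvar", "term": term, "retmax": retmax}
    return ESEARCH_URL + "?" + urlencode(params)


if __name__ == "__main__":
    print(build_term(["Noonan"], wildcard=True))
    print(esearch_url(build_term(["PTEN"], field="gene")))
```

Keeping the query builder separate from the URL builder makes it easy to paste the same term into the web search box.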
2. How do I retrieve a list of variants in a gene in order of location?
You can search for the gene symbol in ClinVar:
- results are returned in order of ascending genomic location
- variants in genes on the plus strand of the chromosome are in ascending order
- variants in genes on the minus strand of the chromosome are in descending order; at this time we do not have an option to make these variants sort in the opposite order
- use the "Send to" in the upper right of the page to save the results to a file
Or you can get the data from the FTP site. You can look in the VCF file or the variant_summary.txt file for your gene of interest and parse the protein expressions to order by protein location.
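As a rough illustration of parsing protein expressions to order by protein location, the sketch below extracts the residue number from a three-letter protein HGVS expression embedded in a variant name and sorts on it. The function names are hypothetical, the regular expression only covers simple cases such as `p.Gly132Val`, and the sample names in the usage section are illustrative rather than taken from a ClinVar file.

```python
import re

# Matches three-letter protein HGVS such as "p.Gly132Val" and captures the position.
PROTEIN_POS = re.compile(r"p\.\(?[A-Z][a-z]{2}(\d+)")


def protein_position(name):
    """Return the first protein position found in a variant name, or None."""
    match = PROTEIN_POS.search(name)
    return int(match.group(1)) if match else None


def sort_by_protein_position(names):
    """Sort names by protein position; names without one are placed last."""
    def key(name):
        pos = protein_position(name)
        return (pos is None, pos or 0, name)
    return sorted(names, key=key)


if __name__ == "__main__":
    sample = [
        "NM_000314.4(PTEN):c.395G>T (p.Gly132Val)",
        "NM_000314.4(PTEN):c.80-1G>A",
        "NM_000314.4(PTEN):c.388C>T (p.Arg130Ter)",
    ]
    for name in sort_by_protein_position(sample):
        print(name)
```

Variants with no protein expression (for example, intronic changes) have no protein position, so this sketch simply lists them after the others.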
3. How can I sort results in gene-specific order?
When a gene is on the negative strand, the location of a variation relative to the gene sorts in opposite order to the location on the genome. The position column on ClinVar's tabular display is the chromosome location, so the ordering seems counterintuitive. At present we do not provide a method to sort in the opposite order.
4. How can I retrieve batches of data from ClinVar?
ClinVar does not currently support a batch query interface, but there are several approaches that might still meet your needs:
Use case | Possible solutions |
---|---|
Variant-specific data for a list of genes | (1) Query ClinVar by listing the genes using the Boolean OR, and download the results interactively or using E-utilities. (2) Download the file https://ftp.ncbi.nlm.nih.gov/pub/clinvar/tab_delimited/variant_summary.txt.gz and process to extract gene-specific lines. (3) Download the full data extract https://ftp.ncbi.nlm.nih.gov/pub/clinvar/xml/ClinVarFullRelease_00-latest.xml.gz and process to extract gene-specific records. |
Variant-specific data for a list of conditions | Use the same approach as for genes, but use instead a list of identifiers such as MIM numbers or Concept UIDs (CUI) from MedGen. The file variant_summary.txt reports identifiers for conditions, but not their names. |
Gene-disease relationships | https://ftp.ncbi.nlm.nih.gov/pub/clinvar/gene_condition_source_id reports gene-disease relationships used in ClinVar with attribution to the source. Current sources are OMIM®, GeneReviews®, and some from NCBI staff curation. This report includes all genes with variants asserted to result in a disease and submitted to ClinVar. |
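For the option of downloading variant_summary.txt.gz and extracting gene-specific lines, a minimal sketch might look like the following. It assumes a tab-delimited file whose header line begins with `#` and contains a `GeneSymbol` column, and that multiple gene symbols in one cell are separated by `;`. Both are assumptions to verify against the README on the FTP site, and the function names are hypothetical.

```python
import csv
import gzip


def filter_by_genes(lines, genes, column="GeneSymbol"):
    """Yield rows (as dicts) whose gene column names any of the wanted genes."""
    wanted = set(genes)
    reader = csv.DictReader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    # The header line may start with "#"; strip it so column names are clean.
    reader.fieldnames = [name.lstrip("#") for name in (reader.fieldnames or [])]
    for row in reader:
        # Assumption: several gene symbols in one cell are separated by ";".
        symbols = set((row.get(column) or "").split(";"))
        if symbols & wanted:
            yield row


def filter_gzip_file(path, genes):
    """Open a local gzip-compressed copy of the file and filter it."""
    with gzip.open(path, "rt", encoding="utf-8", newline="") as handle:
        yield from filter_by_genes(handle, genes)


if __name__ == "__main__":
    # Expects a previously downloaded copy in the working directory.
    for row in filter_gzip_file("variant_summary.txt.gz", ["PTEN"]):
        print(row["Name"])
```

Streaming the file row by row avoids loading the whole extract into memory, which matters because the summary file is large.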
5. How can I find other variants at a location, or other variants that cause the same protein change?
Variation Viewer is recommended for searching based on location. From a ClinVar record, click on the Variation Viewer link in the Browser views section on the right-hand side. This browser shows all the variation in ClinVar, dbSNP, and dbVar in the region of the variant of interest. Other variants that cause the same protein change can be searched using the protein HGVS expression, or by querying for the gene symbol and the one- or three-letter protein change.
Using the web display
1. What do SCV and RCV mean?
Read our description of identifiers in ClinVar, including SCV and RCV numbers.
2. Why are there different classifications for the same variant?
ClinVar is an archive for classifications made by our submitters. Submitters may disagree on how a variant should be classified; read more about how ClinVar calculates consensus and conflicts in different types of classifications. ClinVar does not arbitrate or resolve these conflicts. However, if we have a submission for that variant from an expert panel or a professional society, the assertion made by the expert panel or professional society is displayed, and different interpretations from other submitters are not reported as conflicts.
A VCV record may also have different classifications for the same variant for different conditions. Looking at RCV records may help distinguish these cases. See the next question for more information.
3. Why are there multiple RCV accessions in ClinVar for the same variant?
An RCV accession in ClinVar is based on a variant-condition combination, not the variant alone. This representation was selected so that distinct accessions could be assigned to variants that result in distinct disorders. Each submitted classification is assigned an accession of the format SCV000000000.0 and versioned if the submitter updates a record (e.g. SCV000000001.1 would be updated to SCV000000001.2). Each unique combination of variant-condition relationship is aggregated into a ClinVar record with an accession of the format RCV000000000.0.
A persistent issue for clinical genetics is the lack of consensus about how to describe the clinical condition that results from variation in known disease genes. As a result, we recognize there are often multiple RCV accessions assigned to the same variant for conditions that might be considered the same or overlapping. We anticipate this will change over time as expert panels review the data and decide how to describe the condition. When updates result in matching conditions, the RCV accessions will be merged.
Some variants have more than one RCV because the variant is in a gene that is associated with distinct disorders. For example, variants in RYR1 may have different classifications for malignant hyperthermia and for central core myopathy.
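The accession.version format described above can be handled with a small parser. This is a sketch under the assumption that an accession is a three-letter prefix (SCV, RCV, or VCV), a run of digits, a dot, and an integer version; the names `Accession`, `parse_accession`, and `is_newer_version` are hypothetical.

```python
import re
from typing import NamedTuple

# Prefix, numeric part, and version, e.g. "SCV000000001.2".
ACCESSION = re.compile(r"^(SCV|RCV|VCV)(\d+)\.(\d+)$")


class Accession(NamedTuple):
    kind: str
    number: int
    version: int


def parse_accession(text):
    """Split an accession.version string into its three parts."""
    match = ACCESSION.match(text.strip())
    if not match:
        raise ValueError(f"not a versioned ClinVar accession: {text!r}")
    return Accession(match.group(1), int(match.group(2)), int(match.group(3)))


def is_newer_version(old, new):
    """True if `new` is a later version of the same record as `old`."""
    a, b = parse_accession(old), parse_accession(new)
    return (a.kind, a.number) == (b.kind, b.number) and b.version > a.version


if __name__ == "__main__":
    print(parse_accession("SCV000000001.2"))
    print(is_newer_version("SCV000000001.1", "SCV000000001.2"))
```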
4. I'm interested in a variant that was submitted with the condition "not specified". What does that mean?
Some submitters want to assert that a variant is benign with respect to a specific condition, which leaves the possibility that it is clinically relevant for a different condition. Other submitters want to report that the variant is generally benign, in that it does not appear to cause any genetic disorder that should be observable because it is highly penetrant. ClinVar, in collaboration with members of the ClinGen project, requests that submitters provide "not specified" as the condition to indicate that they are not specifying any single condition but rather that the variant is generally benign. Use of this term for this kind of submission may be re-evaluated in the future.
5. What does it mean if a ClinVar record is "criteria provided, single submitter" but has more than one submission?
The review status "criteria provided, single submitter" is based on non-expert panel submissions which include a classification, assertion criteria, and evidence for the variant classification. Some submissions to ClinVar lack assertion criteria, evidence for the classification, or, less frequently, the variant classification itself. These submissions are included in the count of submissions for a variant, but they may not contribute to the variant's review status. Read more about ClinVar’s review status.
6. I think a variant in ClinVar has the wrong classification. What should I do about that?
The goal of the ClinVar database is to represent the classifications provided by our submitters; therefore, ClinVar staff cannot change the classification that is submitted to us. If you think a variant in ClinVar has been classified incorrectly, we encourage you to submit your own classification of the variant along with your evidence, such as recent publications. The ClinVar submission wizard can be used to submit a single variant classification with minimal time commitment. Although your submission will not change the classification from other submitters, it will change the overall classification to indicate that there are conflicting reports of pathogenicity and users should look at all the available evidence. ClinVar records with conflicts may also prompt the previous submitters or expert panels to review the variant classification.
7. I see multiple dates on a ClinVar record. What do they mean?
Read our description of dates on ClinVar records.
8. How can I tell if a record became public on the ClinVar website after the monthly VCF file was created?
At the bottom left side of a ClinVar record is a date called "Last Updated". If this date is after the date on the VCF file, then the web is displaying data that is newer than data in the VCF file. The website is updated weekly, while the VCF file is created monthly.
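The comparison described above can be sketched in a few lines. The example assumes that the VCF header carries a `##fileDate=` line with the date written as YYYY-MM-DD or YYYYMMDD, and that the "Last Updated" date from the web page has been entered by hand; the function names are hypothetical.

```python
from datetime import date, datetime


def vcf_file_date(header_lines):
    """Return the date from a '##fileDate=' header line, or None if absent."""
    for line in header_lines:
        if line.startswith("##fileDate="):
            value = line.strip().split("=", 1)[1]
            # Accept both 2024-01-07 and 20240107 by dropping the dashes.
            return datetime.strptime(value.replace("-", ""), "%Y%m%d").date()
        if not line.startswith("##"):
            break  # end of the meta-information header
    return None


def web_is_newer(last_updated, header_lines):
    """True if the web record was updated after the VCF file was created."""
    file_date = vcf_file_date(header_lines)
    return file_date is not None and last_updated > file_date


if __name__ == "__main__":
    header = ["##fileformat=VCFv4.1\n", "##fileDate=2024-01-07\n"]
    print(web_is_newer(date(2024, 1, 15), header))
```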
9. When a variant affects multiple genes, what does ClinVar report?
There are several situations in which a variation may be considered to have a relationship to more than one gene. The reported gene or genes, and the preferred designations, are selected as follows:
Submitted as | Reported as |
---|---|
location on a cDNA with or without identifying the gene | the gene as submitted or calculated from the cDNA, and preferred name as calculated from the cDNA reference standard for that gene |
genomic location covering multiple non-overlapping genes, with no gene specified | all calculated genes in the region based on the most recent NCBI annotation release. The preferred designation is a genomic HGVS expression without a gene symbol. |
genomic location covering multiple overlapping genes, including those with shared exons | all genes are reported, but the preferred description is based on selection of the RefSeq that corresponds to an exonic location. If the variation is in an exon of more than one gene, then the preferred description will not include a gene symbol, and will be based only on the genomic location. |
10. A variant in ClinVar is described as a 1-nt deletion in the genomic DNA, but a 60-nt deletion in the mRNA. Is that an error?
This variant may represent a 1-nt deletion in genomic DNA that results in exon skipping, and therefore a larger deletion in the mRNA.
11. My browser does not display the file documenting Assertion Method. What do I do?
Settings in some browsers may need to be adjusted to display files provided as PDF. There are several options you can use to display the contents of the file:
- Use Firefox. At the time of this writing, the .pdf files are displayed with no difficulty.
- Download the file, and then use your own tools to display it:
  - Right-click on the name of the assertion method.
  - Select Save link as... and define the location to save the file.
  - Use Acrobat Reader to open the file from the directory where you saved it.
- Alter the settings for your plugins (Chrome):
  - Connect to chrome://settings/content
  - Follow the link to Manage individual plugins
  - This will open chrome://plugins
  - Disable Chrome PDF viewer
  - Enable Adobe Reader
  - Go back to chrome://settings/content and click on Done
- Review compatibility settings (Internet Explorer): https://msdn.microsoft.com/en-us/library/dn321449.aspx
Data sources and processing
1. What data sources are included in ClinVar?
From the ClinVar homepage, click on the Statistics link in the navigation bar at the top of the page, then click "List of submitters". This takes you to a summary of ClinVar's submitters.
2. Why don't ClinVar records include HGMD identifiers?
Allele information from HGMD is not publicly available, so ClinVar is unable to connect variants accurately to the appropriate record in HGMD.
3. Why do some variants in ClinVar have no genomic location?
ClinVar accepts the description of the variant provided by the submitter. We believe it helps our public to find records, even though the location of the variant on a defined sequence may be uncertain. In some cases, the original assay was at the protein level, and the numbering system for that protein is uncertain or the nucleotide change that may generate the protein change is indeterminate. In other cases, deletions have been reported relative to a transcript, and it is not clear whether the nucleotide change in the genome resulted from a corresponding genomic change, or aberrant splicing generated from another genomic location. Many submissions do have supporting citations, but ClinVar does not have the resources to review the literature to establish the precise nucleotide locations. We welcome submissions that would improve these data for us.
In other words, ClinVar does assign accessions to submissions that represent human variation descriptively, rather than based on an explicit public nucleotide sequence. In that situation, the search results table shows no nucleotide location for the variant and no links are provided to viewers. If the location of the allele is determined, the record is updated. This may not require re-submission from the submitter, if NCBI staff are able to establish the location of the variant based on review of the literature.
4. Is a new version number assigned for any change in an SCV record?
Any change that the submitter makes to an SCV record causes the version number to increase. The SCV version number does not increase if there is a change in data that NCBI provides, such as allele frequencies, additional HGVS expressions, or a MedGen ID for a condition. Data that NCBI provides are packaged only in the VCV and RCV accessions, and changes to those data do not cause a VCV or RCV version to increment either. A new version is assigned to a VCV or RCV if there is a new version of an SCV, an SCV is deleted, or more SCVs are aggregated into the VCV or RCV.
5. When there are multiple RefSeqs for a gene, does ClinVar select a subset to use for reporting?
No, ClinVar reports the location of a variant on all RefSeq transcripts for a gene.
6. What is ClinVar's convention for representing the location of variation with length differences when there are multiple options, left or right justified?
To conform to the conventions of HGVS notation, ClinVar represents the location of the sequence change at the right-most possible position. The VCF standard, however, bases the POS coordinate on the leftmost possible position of the variant. For this reason, the location represented by a dbSNP rs number may be to the left of the location represented by the HGVS notation.
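To make the difference between the two conventions concrete, here is a small sketch that shifts a deletion within a repeat to its leftmost and rightmost equivalent positions. It uses 0-based positions on a toy reference string; the sequence and function names are hypothetical and this is not ClinVar's own normalization code.

```python
def apply_deletion(ref, start, length):
    """Return the sequence after deleting `length` bases at 0-based `start`."""
    return ref[:start] + ref[start + length:]


def left_align_deletion(ref, start, length):
    """Return the leftmost 0-based start of an equivalent deletion."""
    # Shifting left is allowed while the base entering the window on the
    # left equals the base leaving it on the right.
    while start > 0 and ref[start - 1] == ref[start + length - 1]:
        start -= 1
    return start


def right_align_deletion(ref, start, length):
    """Return the rightmost 0-based start of an equivalent deletion."""
    # Shifting right is allowed while the base leaving the window on the
    # left equals the base entering it on the right.
    while start + length < len(ref) and ref[start] == ref[start + length]:
        start += 1
    return start


if __name__ == "__main__":
    ref = "GCATTTTGC"  # toy sequence containing a run of four T bases
    print(left_align_deletion(ref, 4, 1), right_align_deletion(ref, 4, 1))
```

Deleting any single T in the run gives the same resulting sequence, so the two conventions simply pick opposite ends of the run. VCF additionally anchors an indel on the base before it, which this sketch leaves out.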
7. I included specific ages and geographic origins for individuals in my submission; why are they reported as a range and a larger geographic region?
Based on concerns of identifiability, when a specific age is submitted for an individual, ClinVar reports the age as the corresponding decade. Similarly, small countries are reported as a larger geographic region, e.g. Costa Rica is publicly reported as Central America.
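The kind of generalization described above could be sketched as follows. This is purely illustrative: the exact label ClinVar uses for a decade is not stated here, so the "30-39" style is an assumption, the lookup table contains only the Costa Rica example from the text, and the function names are hypothetical.

```python
# Only the example given in the text; a real table would be much larger.
REGION_BY_COUNTRY = {"Costa Rica": "Central America"}


def age_to_decade(age):
    """Map an age in whole years to the decade containing it, e.g. 34 -> '30-39'."""
    if age < 0:
        raise ValueError("age must be non-negative")
    low = (age // 10) * 10
    return f"{low}-{low + 9}"


def generalize_origin(country, regions=REGION_BY_COUNTRY):
    """Return the larger region for a listed country, else the country itself."""
    return regions.get(country, country)


if __name__ == "__main__":
    print(age_to_decade(34))
    print(generalize_origin("Costa Rica"))
```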
Reports
1. Why doesn't the VCF file contain all the data in the XML file?
ClinVar's VCF files are currently limited to variants in ClinVar that have a precise genomic location. Variants with imprecise start and stop, such as exon deletions and CNVs detected by microarray, are not included in ClinVar's VCF files at this time.
The ClinVar VCF files can be retrieved from ClinVar's ftp site:
https://ftp.ncbi.nlm.nih.gov/pub/clinvar/
and there is more information about the VCF files in the README file.
2. Where can I find who has submitted to ClinVar and statistics about number of ClinVar submissions?
ClinVar reports global statistics for the number of submissions, genes, and variants from ClinVar submissions. We also provide a list of submitters, with counts per submitter. Note that variation is represented according to the ClinVar data model, in that a variation is represented as a set which may have one or more members. For example, two variations submitted together in cis are members of a single set and are counted in the statistics as one variation.
3. On the submitter summary page, why is the number of submissions sometimes different from the number of records from that submitter?
The count of submissions on the submitter page is a count of the number of submitted variant classifications from that submitter. This may include submissions for the same variant but with different conditions. The number of records returned in a search for that submitter is for the number of variants that the submitter has reported. Because the submitter may have reported multiple classifications for the same variant with different conditions, the number of variants in the search results may be less than the number of submissions on the submitter summary page.
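A toy example may help show why the two numbers differ. The sketch below counts submissions and distinct variants from a list of (variant, condition) pairs; the data and the function name are made up for illustration.

```python
def count_submissions_and_variants(submissions):
    """Return (number of submissions, number of distinct variants)."""
    variants = {variant for variant, _condition in submissions}
    return len(submissions), len(variants)


if __name__ == "__main__":
    # Hypothetical submitter: variant A is classified for two conditions.
    submitted = [
        ("variant A", "condition 1"),
        ("variant A", "condition 2"),
        ("variant B", "condition 1"),
    ]
    print(count_submissions_and_variants(submitted))
```

Here the submitter page would count three submissions, while a search for that submitter would return two variants.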
Citing ClinVar
1. How should I refer to a ClinVar record in written reports?
ClinVar records should be referred to with accession and version numbers. It is important to include the version number because the classification may change over time; the version number allows you to distinguish between previous and current versions. If you need to build URLs based on a ClinVar accession, please note the instructions for constructing links to ClinVar.
2. How should I reference ClinVar?
To cite the original ClinVar paper:
Landrum MJ, Lee JM, Riley GR, Jang W, Rubinstein WS, Church DM, Maglott DR. ClinVar: public archive of relationships among sequence variation and human phenotype. Nucleic Acids Res. 2014 Jan 1;42(1):D980-5. doi: 10.1093/nar/gkt1113. [PubMed PMID: 24234437]
Find links to other publications from the ClinVar team on the introduction page for ClinVar.
If you wish to reference a specific submitted record, please cite the SCV accession and version.
If you wish to reference an aggregate ClinVar record, please cite the VCV or RCV accession and version.