Software
GRAF: GRAF (Genetic Relationship and Fingerprinting) is a C++ program that uses genotypes to quickly find closely related subjects, infer subject ancestry, and determine subject sex, and then compares these results with those reported in the phenotype datasets. It includes the following three features:
- GRAF-rel (included in all versions).
Finds duplicate samples (or identical twins) and closely related subjects using genotypes and compares them with those reported in the pedigree file and the subject-sample mapping (SSM) file.
- GRAF-pop (included in version 2.0 and later).
Infers subject ancestry from genotypes, estimates population structure and uses the results to validate the self-reported populations in the phenotype datasets.
Note: The GRAF-pop feature has been upgraded to the GrafPop software (see below). Although the GRAF-pop feature in GRAF 2.4 can still be used, it is strongly recommended that GrafPop be used instead.
- GRAF-sex (included in version 2.4).
Determines subject sex from genotypes and uses the results to validate the self-reported sex in the phenotype datasets.
For a more detailed description, see GRAF_README.
Click the following link to download GRAF 2.4.
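The idea behind finding duplicate samples from genotypes, as GRAF-rel does, can be illustrated with a simple pairwise comparison. The sketch below is not GRAF's actual statistic or implementation; it assumes genotypes are coded as alternate-allele counts (0, 1, 2, with None for missing calls), and the function names and the 0.05 threshold are hypothetical choices made only for illustration.

```python
from itertools import combinations


def genotype_mismatch_rate(a, b):
    """Fraction of SNPs, called in both samples, where the genotypes differ."""
    compared = 0
    mismatched = 0
    for ga, gb in zip(a, b):
        if ga is None or gb is None:
            continue  # skip SNPs with a missing call in either sample
        compared += 1
        if ga != gb:
            mismatched += 1
    if compared == 0:
        raise ValueError("no SNPs with genotype calls in both samples")
    return mismatched / compared


def find_candidate_duplicates(samples, threshold=0.05):
    """Return (name1, name2, rate) for every pair at or below the threshold.

    `samples` maps a sample name to its list of genotypes. The threshold is
    an arbitrary illustrative value, not one taken from GRAF.
    """
    pairs = []
    for (n1, g1), (n2, g2) in combinations(sorted(samples.items()), 2):
        rate = genotype_mismatch_rate(g1, g2)
        if rate <= threshold:
            pairs.append((n1, n2, rate))
    return pairs


samples = {
    "A": [0, 1, 2, 2, 0, 1],
    "B": [0, 1, 2, 2, 0, 1],  # identical to A: duplicate or identical twin
    "C": [2, 0, 0, 1, 1, 2],
}
print(find_candidate_duplicates(samples))
```

Pairs flagged this way could then be checked against the relationships reported in the pedigree and SSM files. Comparing every pair scales quadratically with the number of samples, so a brute-force loop like this is only suitable for small inputs.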
GrafPop: The GRAF-pop feature of the GRAF software package has been upgraded to a separate software package, GrafPop, which is independent of GRAF. The main changes in GrafPop compared with GRAF-pop are:
- GrafPop uses about 100,000 SNPs for ancestry inference, a 10-fold increase.
- GRAF-pop uses fingerprint SNPs, which are not related to ancestry, to determine subject populations, while GrafPop uses ancestry-related SNPs for ancestry inference.
- GRAF-pop requires that the SNPs in the input genotype dataset be entered with RS IDs, but GrafPop accepts either RS IDs or chromosome positions in Genome Build 37 or 38.
- GRAF accepts only a PLINK set as the input genotype dataset, but GrafPop accepts both PLINK and VCF files (zipped or not).
GrafPop accepts multiple VCF files, e.g., one file for each chromosome.
- Plotting results in graphs and saving results in tables are done by separate scripts in GrafPop.
For a more detailed description, see GrafPop_README.
Download GrafPop 1.0. Source code is available on GitHub: GrafPop source code.
REFERENCES:
- Jin Y, Schäffer AA, Sherry ST, and Feolo M (2017). Quickly identifying identical and closely related subjects in large databases using genotype data. PLoS One. 12(6):e0179106. [Abstract][PDF]
- Jin Y, Schäffer AA, Feolo M, Holmes JB and Kattman BL (2019). GRAF-pop: A Fast Distance-based Method to Infer Subject Ancestry from Multiple Genotype Datasets without Principal Components Analysis. G3: Genes | Genomes | Genetics. DOI: 10.1534/g3.118.200925. [Abstract][PDF]
TransEAV: dbGaP requires that submitted phenotypic datasets be rectangular tables in which each row represents one subject or sample, each column represents a phenotypic trait or attribute (called a variable in dbGaP), and each cell stores one attribute value. However, datasets are sometimes collected and recorded using the Entity-Attribute-Value (EAV) model. In the EAV model, a dataset table usually has three columns: subject (or sample), attribute, and value, and each row stores only one attribute value for one subject or sample. This script converts a dataset in the EAV model to a rectangular table that can be submitted to dbGaP. For a more detailed description, please see README.txt in the package.
Click the following link to download TransEAV.
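The EAV-to-rectangular conversion described above can be sketched in a few lines of Python. This is a minimal illustration, not the TransEAV script itself: the function name, the SUBJECT_ID column label, the empty string used for missing cells, and the choice to raise an error on conflicting values are all assumptions. See README.txt in the package for how TransEAV actually handles these cases.

```python
def eav_to_rectangular(rows, id_column="SUBJECT_ID", missing=""):
    """Pivot (entity, attribute, value) triples into a rectangular table.

    Returns a list of rows; the first row is the header. Entities and
    attributes keep the order in which they first appear in the input.
    """
    attributes = {}  # used as an ordered set of attribute names
    table = {}       # entity -> {attribute: value}
    for entity, attribute, value in rows:
        attributes.setdefault(attribute, None)
        record = table.setdefault(entity, {})
        # Assumption: two different values for the same cell are an error.
        if attribute in record and record[attribute] != value:
            raise ValueError(
                f"conflicting values for {entity!r}, {attribute!r}: "
                f"{record[attribute]!r} vs {value!r}"
            )
        record[attribute] = value

    header = [id_column] + list(attributes)
    body = [
        [entity] + [record.get(name, missing) for name in attributes]
        for entity, record in table.items()
    ]
    return [header] + body


eav_rows = [
    ("S1", "SEX", "F"),
    ("S1", "AGE", "54"),
    ("S2", "SEX", "M"),
    ("S2", "RACE", "Asian"),
]
for row in eav_to_rectangular(eav_rows):
    print("\t".join(row))
```

Each subject ends up on exactly one row and each attribute in exactly one column, which is the layout dbGaP expects. Attributes never recorded for a subject become empty cells.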