GEMINI: Integrative Exploration of Genetic Variation and Genome Annotations
(A) Genetic variants in the VCF format are loaded into the GEMINI database framework using the load sub-command. A PED file describing the sex, phenotype(s), and relatedness of the samples in the VCF may be provided to facilitate downstream analyses such as searches for de novo mutations or variants meeting specific inheritance patterns. (B) Each variant in the VCF file is annotated with information from several genome annotation sources that facilitate variant exploration and prioritization. The variants and their associated annotations are stored in the variants and variant_impacts tables. (C) Researchers may also integrate their own annotations to support custom analyses that rely on annotations not pre-installed with the GEMINI software. (D) Genotype information for all samples is stored as compressed arrays to enable database scalability, and users may access genotype information for individual samples through an enhanced SQL interface. (*) KEGG and HPRD annotations are not stored directly in the variants table, but are instead used in the context of specific GEMINI analysis tools.
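
The storage idea in panel (D) can be illustrated with a minimal sketch. It is not GEMINI's actual implementation: the toy table layout, the column name gt_types, the integer genotype codes, and the helper functions pack_genotypes, unpack_genotypes, and candidate_de_novo are all assumptions made for illustration. The sketch keeps one compressed array of per-sample genotype codes in each variant row, then decompresses it on read to test a simple de novo pattern of the kind that the PED relationships in panel (A) make possible.

```python
import sqlite3
import zlib
from array import array

# Assumed genotype codes for this sketch only.
HOM_REF, HET, UNKNOWN, HOM_ALT = 0, 1, 2, 3


def pack_genotypes(gt_types):
    """Compress a list of per-sample genotype codes into a BLOB."""
    return zlib.compress(array("b", gt_types).tobytes())


def unpack_genotypes(blob):
    """Recover the per-sample genotype codes from a compressed BLOB."""
    codes = array("b")
    codes.frombytes(zlib.decompress(blob))
    return codes.tolist()


def candidate_de_novo(conn, samples, father, mother, child):
    """Return (chrom, pos) for variants that are heterozygous in the child
    and homozygous reference in both parents."""
    idx = {name: i for i, name in enumerate(samples)}
    hits = []
    query = "SELECT chrom, pos, gt_types FROM variants ORDER BY variant_id"
    for chrom, pos, blob in conn.execute(query):
        gts = unpack_genotypes(blob)
        if (gts[idx[father]] == HOM_REF
                and gts[idx[mother]] == HOM_REF
                and gts[idx[child]] == HET):
            hits.append((chrom, pos))
    return hits


# Toy, hypothetical schema: one row per variant, one BLOB for all samples.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE variants ("
    "variant_id INTEGER PRIMARY KEY, chrom TEXT, pos INTEGER, "
    "ref TEXT, alt TEXT, gt_types BLOB)"
)

# Sample order fixes the position of each sample inside the array.
samples = ["father", "mother", "child"]
toy_variants = [
    ("chr1", 1000, "A", "G", [HOM_REF, HOM_REF, HET]),
    ("chr1", 2000, "C", "T", [HET, HOM_REF, HET]),
    ("chr2", 3000, "G", "A", [HOM_REF, HOM_REF, HOM_REF]),
]
for chrom, pos, ref, alt, gts in toy_variants:
    conn.execute(
        "INSERT INTO variants (chrom, pos, ref, alt, gt_types) "
        "VALUES (?, ?, ?, ?, ?)",
        (chrom, pos, ref, alt, pack_genotypes(gts)),
    )

de_novo = candidate_de_novo(conn, samples, "father", "mother", "child")
print(de_novo)  # [('chr1', 1000)]
```

Because each variant occupies a single row regardless of the number of samples, the table width does not grow with cohort size. The cost is that per-sample filters cannot be expressed in plain SQL and must be applied after decompression, which is the gap an enhanced SQL interface would need to fill.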