Capturing the Spectrum of Interaction Effects in Genetic Association Studies by Simulated Evaporative Cooling Network Analysis
Each locus is conceptualized as a discrete-state particle whose available states correspond to its genotypes (e.g., CC, CT, TT). The particles sit in a fictitious potential well that controls the number of SNPs filtered. The information free energy F of each SNP is determined by its relevance to the phenotype: SNPs that are less relevant to the phenotype have higher free energy (more noise) and are positioned near the top of the potential well. The interaction effect score (Relief-F, represented by E) and the independent effect score (Random Forest, represented by S) are coupled in the free energy F by the optimization parameter T, which is analogous to temperature. Initially, the information free energy F is calculated for all SNPs in the data set with the coupling constant T = 1 (step 0). The coupling constant is then varied about unity, and the value is chosen for which removing the highest-free-energy SNPs gives the largest increase in classification accuracy over the previous iteration (step 1). This defines the updated coupling T and yields a new collection of SNPs from which the noisiest SNPs (those least relevant to the phenotype) have been evaporated (step 2). If the target number of SNPs has been reached (step 3), a genetic-association interaction network (GAIN) is generated from the remaining SNPs, which evaporative cooling (EC) has enriched for interactions and relevance to the phenotype. Otherwise, the coupling parameter is varied about its previous value and the evaporation process is repeated. Permutation is used to select the target number of SNPs.
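The evaporation loop described above (steps 0 to 3) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The names `free_energy`, `evaporate`, `score_fn`, `accuracy_fn`, `n_remove`, and `factors` are hypothetical. The functional form of F is not given in the text above, so the sketch assumes F = -(E + T*S), with E and S treated as importance scores, purely so that less relevant SNPs receive a higher F. Relief-F, Random Forest, and the classifier are supplied by the caller as plain callables rather than computed here.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Scores = Dict[str, Tuple[float, float]]  # SNP id -> (E, S)


def free_energy(e: float, s: float, t: float) -> float:
    # ASSUMPTION: illustrative coupling only, not the published formula.
    # E (interaction) and S (independent) are treated as importance scores,
    # so the sign flip gives less relevant SNPs a higher free energy.
    return -(e + t * s)


def evaporate(
    snps: Sequence[str],
    score_fn: Callable[[List[str]], Scores],
    accuracy_fn: Callable[[List[str]], float],
    target: int,
    n_remove: int = 1,
    factors: Sequence[float] = (0.5, 1.0, 2.0),
) -> Tuple[List[str], float]:
    """Remove high-free-energy SNPs until `target` SNPs remain."""
    if target < 1 or not factors:
        raise ValueError("target must be >= 1 and factors must be non-empty")
    current = list(snps)
    t = 1.0  # step 0: start with coupling T = 1
    while len(current) > target:  # step 3: stop once the target is reached
        k = min(n_remove, len(current) - target)
        scores = score_fn(current)  # assumed to be recomputed every iteration
        best = None
        for factor in factors:  # step 1: vary T about its previous value
            t_try = t * factor
            ranked = sorted(
                current,
                key=lambda snp: free_energy(scores[snp][0], scores[snp][1], t_try),
            )
            kept = ranked[: len(ranked) - k]  # drop the k highest-F SNPs
            acc = accuracy_fn(kept)
            if best is None or acc > best[0]:
                best = (acc, t_try, kept)
        _, t, current = best  # step 2: update T and the SNP collection
    return current, t


if __name__ == "__main__":
    # Toy, made-up scores and a stand-in accuracy function for demonstration.
    toy: Scores = {"a": (0.9, 0.1), "b": (0.1, 0.9), "c": (0.05, 0.05)}
    kept, t_final = evaporate(
        list(toy),
        score_fn=lambda ids: {i: toy[i] for i in ids},
        accuracy_fn=lambda ids: sum(toy[i][0] + toy[i][1] for i in ids) / len(ids),
        target=2,
    )
    print(kept, t_final)
```

Choosing the candidate T with the highest accuracy at each iteration is equivalent to choosing the largest increase over the previous iteration, since the previous accuracy is the same for every candidate. Ties are resolved in favor of the first factor tried, which is an arbitrary choice of this sketch.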