ReliefSeq: A Gene-Wise Adaptive-K Nearest-Neighbor Feature Selection Tool for Finding Gene-Gene Interactions and Main Effects in mRNA-Seq Gene Expression Data
First, a subject Ri (green x in the figure) is selected, and its k nearest hits (subjects of the same class) and k nearest misses (subjects of the other class) are found in the space of all genes. The figure shows k = 1 as an example: the nearest hit of Ri (nearest x symbol) and its nearest miss (nearest circle symbol) are indicated by vectors (arrows) in the full space of genes. To estimate how much subject Ri contributes to the ability of each gene to discriminate between cases and controls, we calculate the difference (diff) in the expression of the gene between Ri and its hits and between Ri and its misses. These hit and miss differences are illustrated as projections onto the gene A (blue) and gene B (red) axes. Subject Ri contributes a positive discrimination for gene B because the difference between the miss projection and Ri is greater than the difference between the hit projection and Ri. Subject Ri contributes nearly 0 to the importance of gene A because the miss and hit projected differences are almost the same. The above steps are repeated for each subject Ri, and the resulting per-subject estimates are averaged over all subjects to give each gene's Relief-F score. The Relief-F algorithm is then repeated for a range of k, so that each gene has an array of scores, one per value of k. For each gene, the highest Relief-F score across k is used.
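The procedure described in this caption can be sketched in a few lines of Python. This is a minimal illustration, not the ReliefSeq implementation: the function names `relief_scores` and `adaptive_k_scores`, the range-normalized Manhattan distance, the equal weighting of the k neighbors, and the restriction to two classes are all assumptions made for the example.

```python
import numpy as np

def relief_scores(X, y, k):
    """Relief-F-style score per gene for a single k (two classes assumed).

    X: (subjects x genes) expression matrix; y: class labels.
    Assumption: range-normalized Manhattan distance over all genes.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    n, p = X.shape
    span = X.max(axis=0) - X.min(axis=0)
    span[span == 0] = 1.0                      # avoid division by zero
    scores = np.zeros(p)
    for i in range(n):
        diff = np.abs(X - X[i]) / span         # per-gene diff to every subject
        dist = diff.sum(axis=1)                # distance in the full gene space
        same = (y == y[i])
        same[i] = False                        # a subject is not its own hit
        hit_idx = np.where(same)[0]
        miss_idx = np.where(y != y[i])[0]
        hits = hit_idx[np.argsort(dist[hit_idx])][:k]
        misses = miss_idx[np.argsort(dist[miss_idx])][:k]
        # Positive when the gene differs more from misses than from hits.
        scores += diff[misses].mean(axis=0) - diff[hits].mean(axis=0)
    return scores / n                          # average over subjects

def adaptive_k_scores(X, y, k_values):
    """Run relief_scores for each k and keep, per gene, the highest score."""
    k_values = np.asarray(list(k_values))
    all_scores = np.array([relief_scores(X, y, k) for k in k_values])
    best = all_scores.argmax(axis=0)
    return all_scores.max(axis=0), k_values[best]

# Toy data: column 0 plays the role of gene A, column 1 the role of gene B.
X_demo = np.array([[0.1, 0.00], [0.9, 0.10], [0.5, 0.05],
                   [0.2, 1.00], [0.8, 0.90], [0.4, 0.95]])
y_demo = np.array([0, 0, 0, 1, 1, 1])
best_scores, best_k = adaptive_k_scores(X_demo, y_demo, k_values=(1, 2))
```

In this toy example the second column separates the two classes while the first does not, so the second gene receives the higher score. Taking the per-gene maximum over k is what makes the neighborhood size gene-wise adaptive: different genes may reach their best score at different k.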