Biomedical Discovery Acceleration, with Applications to Craniofacial Development
The choice of probability combination function, both in semantic integration and in the combination of knowledge and data networks, is critical to the utility of the system. This figure shows the global characteristics of a variety of possible combination functions. Probabilities from two sources, P1 and P2 (horizontal plane), are combined. Color indicates the magnitude of the combination (vertical axis), from 0.0 (blue) to 1.0 (red). The application of Fisher's F, Mudholkar-George's T and Liptak-Stouffer's Z has been modified from their treatment in  to emphasize agreement on high probabilities rather than low p-values. The s and v parameters of the logistic function (Logit) were estimated as in .

The probability of a network edge given by the knowledgebase is calculated as described above, using the Noisy-OR function with the CONS reliability: $P_{net} = 1 - \prod_i (1 - r_i)$. The probability from the external expression data source for an edge between two proteins x and y is simply the absolute value of the Pearson correlation coefficient computed between their expression profile vectors: $P_{exp} = |\mathrm{correlation}(x, y)|$. The edge probabilities from the two sources are then combined using either the average of the probabilities, $\mathrm{Average} = (P_{net} + P_{exp})/2$, or the average of logistic functions of the probabilities, $\mathrm{Logit} = [\mathrm{logistic}(P_{net}) + \mathrm{logistic}(P_{exp})]/2$, where $\mathrm{logistic}(X) = 1/(1 + e^{-s(X - v)})$. As in , the parameter v is set to the mean of the corresponding distribution and the parameter s is set to 6/v to yield a moderate slope. The reporting component of the system then uses the values of the combined function to extract sub-networks of high probability, either by including all edges exceeding a given score or by taking the set of top-scoring edges.
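The edge-scoring steps described above can be sketched in a few lines of Python. This is a minimal illustration rather than the system's actual implementation: all function names are hypothetical, the logistic is assumed to take the standard form 1/(1 + e^(-s(X - v))), and v is assumed to be supplied separately for each source as the mean of that source's edge probabilities.

```python
import math

def noisy_or(reliabilities):
    """P_net = 1 - prod_i (1 - r_i) over the sources asserting an edge."""
    prod = 1.0
    for r in reliabilities:
        prod *= 1.0 - r
    return 1.0 - prod

def abs_pearson(x, y):
    """P_exp = |Pearson correlation| between two expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # assumption: a constant profile counts as uncorrelated
    return abs(cov / (sx * sy))

def logistic(p, s, v):
    """Standard logistic with slope s and midpoint v (assumed form)."""
    return 1.0 / (1.0 + math.exp(-s * (p - v)))

def combine_average(p_net, p_exp):
    """Average = (P_net + P_exp) / 2."""
    return (p_net + p_exp) / 2.0

def combine_logit(p_net, p_exp, v_net, v_exp):
    """Average of the logistic-transformed probabilities, with s = 6 / v."""
    l_net = logistic(p_net, 6.0 / v_net, v_net)
    l_exp = logistic(p_exp, 6.0 / v_exp, v_exp)
    return (l_net + l_exp) / 2.0

def extract_subnetwork(scores, threshold=None, top_k=None):
    """Return edges sorted by descending score, keeping either those
    exceeding `threshold` or the `top_k` best (hypothetical interface)."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    if threshold is not None:
        ranked = [edge for edge in ranked if scores[edge] > threshold]
    if top_k is not None:
        ranked = ranked[:top_k]
    return ranked
```

With s = 6/v, the logistic maps a probability equal to the source mean v to exactly 0.5, so the Logit combination rewards edges that are above average in both sources, whereas the plain average treats the two raw scales as directly comparable.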