Information Recovery in Behavioral Networks

In the context of agent-based modeling and network theory, we focus on the problem of recovering behavior-related choice information from origin-destination type data, a topic also known as network tomography. As a basis for predicting agents' choices we emphasize the connection between adaptive intelligent behavior, causal entropy maximization, and self-organized behavior in an open dynamic system. We cast this problem in the form of binary and weighted networks and suggest information-theoretic, entropy-driven methods to recover estimates of the unknown behavioral flow parameters. Our objective is to recover the unknown behavioral values across the ensemble analytically, without explicitly sampling the configuration space. In order to do so, we consider the Cressie-Read family of entropic functionals, enlarging the set of estimators commonly employed to make optimal use of the available information. More specifically, we explicitly work out two cases of particular interest: the Shannon functional and the likelihood functional. We then employ them for the analysis of both univariate and bivariate data sets, comparing their accuracy in reproducing the observed trends.

A. Univariate data sets
As previously noted, eq. 6 induces a distribution on the ensemble of pathways. In other words, eq. 6 allows us to restate the problem of predicting the fluxes on origin-destination networks as a (more) general problem of statistical inference, where the unknown distribution on the pathways $\{p_c\}_{c=1}^C$ must be determined on the basis of partial information, represented by the conditions

$$\sum_{c=1}^C p_c = 1, \qquad \sum_{c=1}^C Q_{\alpha c}\,p_c = Q_\alpha \quad \forall\,\alpha, \tag{I.1}$$

where the second equation in I.1 is nothing else than eq. 6, rephrased in more general terms (with $Q_{\alpha c}$ replacing $A_{\alpha c}$ and $Q_\alpha$ replacing $r_\alpha$). Eq. 7 can thus be rewritten as

$$\mathcal{L} = I(\vec{p}) + \lambda\Big(1 - \sum_{c=1}^C p_c\Big) + \sum_\alpha \theta_\alpha\Big(Q_\alpha - \sum_{c=1}^C Q_{\alpha c}\,p_c\Big)$$

and the probability coefficients are obtained by solving the system

$$\frac{\partial \mathcal{L}}{\partial p_c} = 0 \quad \forall\,c. \tag{I.3}$$

The resolution of the system I.3 gives us the desired coefficients $\{p_c\}_{c=1}^C$ as functions of the Lagrange multipliers, $p_c = p_c(\vec{\theta})$, $\forall\,c$. Once found, the parametric probability coefficients must be substituted back into $\mathcal{L}$, in order to obtain a quantity which is a function of the unknowns solely: $\mathcal{L}(\vec{\theta})$. The last step in the procedure is the optimization of the function $\mathcal{L}(\vec{\theta})$, by finding the values of the parameters $\vec{\theta}^*$ which satisfy the condition

$$\vec{\nabla}_{\vec{\theta}}\,\mathcal{L}\big(\vec{\theta}^*\big) = \vec{0}.$$

For expository purposes, we explicitly demonstrate the analytical derivation of the Shannon functional for univariate data sets. In this case, the probability coefficients given by eq. I.3 have the expression

$$p_c = e^{-1-\lambda}\,e^{-\sum_\alpha \theta_\alpha Q_{\alpha c}} \quad \forall\,c.$$

Imposing the normalization condition fixes $\lambda$, and our probability coefficients can thus be rewritten as

$$p_c = \frac{e^{-\sum_\alpha \theta_\alpha Q_{\alpha c}}}{\sum_{c'=1}^C e^{-\sum_\alpha \theta_\alpha Q_{\alpha c'}}} \quad \forall\,c.$$

Substituting the analytical expression of $p_c$ back into $\mathcal{L}$ produces a quantity which is solely a function of the vector of unknown parameters $\vec{\theta}$, and the function to optimize with respect to the vector $\vec{\theta}$ becomes

$$\mathcal{L}(\vec{\theta}) = \ln\Big(\sum_{c=1}^C e^{-\sum_\alpha \theta_\alpha Q_{\alpha c}}\Big) + \sum_\alpha \theta_\alpha Q_\alpha.$$

Figure A (S1 Information) shows the routing matrix of the Carnegie Mellon University network (black squares represent ones, white squares represent zeros; see [27]), composed of twelve subnetworks communicating via two routers (one with four subnetworks, the second one with the remaining eight; the routers are linked via a single connection). This topology yields 24 observed aggregate traffic volumes and 144 origin-destination traffic volumes to be estimated.
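The procedure above can be sketched numerically. The following is a minimal illustration, not the paper's implementation: the routing matrix `Q` and the observed aggregates `q_obs` are invented toy values, and the concentrated function $\mathcal{L}(\vec{\theta})$ is minimized with a generic quasi-Newton routine.

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical toy problem: 2 aggregate (link) observations, 4 candidate
# pathways. Q[alpha, c] = 1 if pathway c loads link alpha. All values are
# illustrative placeholders, not data from the paper.
Q = np.array([[1.0, 1.0, 0.0, 0.0],
              [0.0, 1.0, 1.0, 1.0]])
q_obs = np.array([0.6, 0.7])  # observed aggregate shares (assumed)

def concentrated_L(theta):
    # L(theta) = log sum_c exp(-sum_a theta_a Q[a, c]) + theta . q_obs
    logits = -theta @ Q
    return np.log(np.exp(logits).sum()) + theta @ q_obs

# Optimize over the multipliers theta (the condition grad L(theta*) = 0).
theta_star = minimize(concentrated_L, x0=np.zeros(2), method="BFGS").x

# Recover the maximum-entropy pathway probabilities p_c.
logits = -theta_star @ Q
p = np.exp(logits) / np.exp(logits).sum()

print(p)      # estimated pathway distribution, sums to one
print(Q @ p)  # reproduces the constrained aggregates q_obs
```

At the optimum the gradient condition forces the constraints to hold, so `Q @ p` matches the observed aggregates to numerical precision.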

B. A second worked-out example concerning univariate data sets
For completeness, we discuss a second example of traffic networks. The data set was collected at the Information Networking Institute of Carnegie Mellon University (see [27]), whose routing matrix is reported in Figure A in S1 Information. The network topology we consider yields 24 observed aggregate traffic volumes and 144 origin-destination traffic volumes, observed every five minutes (473 points in time). This second data set is larger than the first, allowing us to test the scalability of our approach.
The analysis of the Carnegie Mellon University data is illustrated in Figure B in S1 Information. Again, our method captures the chosen temporal trends, implying that our procedure is applicable to problems with higher dimensionality. However, the results concerning the Carnegie Mellon University data present some differences with respect to the Bell Labs ones.
Since a visual inspection of Figure B in S1 Information is not feasible, to quantify the agreement between our estimates and the observations we have calculated the correlation coefficient r between the observed and the estimated traffic volumes. Despite the rather high values of r, the strongly oscillatory character of the observed data set seems to lower the performance of our procedure: our estimates predict a "smoother" behavior than that of the real data, which appear much more irregular (see the lowest panels of Figure B in S1 Information). As for the Bell Labs data set, the net result is that high traffic values are well estimated, while the lower ones (including the zero ones) are generally overestimated.
Quite surprisingly, the differences between the performances of the two functionals are also larger than for the Bell Labs data set: this time the best result (witnessed by higher correlation coefficients at all time points) is obtained by the Shannon functional, which seems to better follow the irregular observed trends; the predictions obtained with the likelihood functional, in fact, show flat regions which in turn lower the numerical correlation value.
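The agreement measure used above is the ordinary Pearson correlation between observed and estimated series. A minimal sketch, with invented placeholder volumes (the names `observed` and `estimated` are illustrative, not the paper's data):

```python
import numpy as np

# Placeholder origin-destination volumes at one time point; note the zero
# entries, which the text reports as typically overestimated.
observed = np.array([10.0, 0.0, 35.0, 2.0, 18.0, 0.0])
estimated = np.array([9.1, 1.2, 33.0, 3.5, 16.8, 0.9])

# Pearson correlation coefficient r between the two series.
r = np.corrcoef(observed, estimated)[0, 1]
print(round(r, 3))
```

A high r can coexist with systematic overestimation of the small entries, which is why the text discusses the smoothness of the estimates separately from the correlation values.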

C. Bivariate data sets
For bivariate problems, the CR family of functionals becomes

$$I(\vec{p}, \vec{q}, \gamma) = \frac{1}{\gamma(\gamma+1)} \sum_{j}\sum_{k} p_{jk}\left[\left(\frac{p_{jk}}{q_{jk}}\right)^{\gamma} - 1\right].$$

In the bivariate case the number of multipliers rises, since the required number of normalization conditions equals the number of matrix rows. Thus, in order to correctly implement our approach, two vectors $\vec{\alpha}$ and $\vec{\beta}$ must be considered. Constraining equation I.8 for bivariate data sets (and again for Shannon entropy) leads to

$$\mathcal{L} = -\sum_{j,k} p_{jk}\ln p_{jk} + \sum_j \alpha_j\Big(1 - \sum_k p_{jk}\Big) + \sum_k \beta_k\Big(Q_k - \sum_j Q_{jk}\,p_{jk}\Big)$$

and maximizing it with respect to $p_{jk}$ implies that the functional form of our coefficients is

$$p_{jk} = e^{-1-\alpha_j}\,e^{-\beta_k Q_{jk}} = \frac{e^{-\beta_k Q_{jk}}}{\sum_{k'} e^{-\beta_{k'} Q_{jk'}}};$$

by substituting back into $\mathcal{L}$ we get

$$\mathcal{L}(\vec{\beta}) = \sum_j \ln\Big(\sum_k e^{-\beta_k Q_{jk}}\Big) + \sum_k \beta_k Q_k.$$

Similar results are obtained for the other functionals.
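The bivariate optimization can be sketched along the same lines as the univariate one. The example below is an illustrative toy, not the paper's code: each row of `p` is normalized to one, only the column aggregates `q_obs` are observed, and the weights `Q` are invented placeholders.

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical toy: 2 rows (groups) whose choice shares p[j, k] each sum
# to one; only the column aggregates sum_j Q[j, k] p[j, k] are observed.
Q = np.array([[0.3, 0.3],      # Q[j, k]: weight of row j in aggregate k
              [0.7, 0.7]])
q_obs = np.array([0.45, 0.55])  # observed column aggregates (assumed)

def concentrated_L(beta):
    # L(beta) = sum_j log sum_k exp(-beta_k Q[j, k]) + beta . q_obs
    logits = -beta[None, :] * Q
    return np.log(np.exp(logits).sum(axis=1)).sum() + beta @ q_obs

beta_star = minimize(concentrated_L, x0=np.zeros(2), method="BFGS").x

# Recover the row-normalized coefficients p_jk.
logits = -beta_star[None, :] * Q
p = np.exp(logits)
p /= p.sum(axis=1, keepdims=True)

print(p)                      # each row sums to one
print((Q * p).sum(axis=0))    # reproduces the observed aggregates
```

Setting the gradient of the concentrated function to zero enforces the aggregate constraints, exactly as in the univariate case.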

D. A second worked-out example concerning bivariate data sets
The second bivariate data set we discuss comes from an application in political science and concerns voter behavior and candidate choice (as reported in Table A in S1 Information; see [26]). The result of applying our method to the election percentages is shown in Table B in S1 Information.
Since privacy issues prevent the percentage of people voting for a given candidate from being available, the second bivariate data set we analyzed provides only aggregate data about the election results: the single matrix entries are thus missing. Nonetheless, our method provides a prediction of the unknown entries, by adopting the same procedure used for the "eggs and bacon" problem. As can be seen from Table B in S1 Information, the Shannon functional and the likelihood functional give compatible estimates of the voting percentages: this similarity is effectively summed up by the "global" Pearson correlation coefficient between the Shannon expected matrix and the likelihood expected matrix (both treated as a single vector of numbers), equal to 0.988716. It should be noted, however, that significant differences can be observed for the percentages referring to the independent candidates. Nonetheless, when interpreted in the light of the previous results, these differences carry important information, signalling that the independent candidates' true percentages are probably not only the lowest ones, but even compatible with zero.

Table B. Estimated precinct-level percentages of Louisiana's 5th CD elections (see [23]).
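The "global" coefficient quoted above is obtained by flattening each estimated matrix into a single vector before correlating. A minimal sketch with invented placeholder matrices (the numbers below are not the paper's estimates and will not reproduce the 0.988716 figure):

```python
import numpy as np

# Placeholder estimated percentage matrices from two functionals.
shannon_est = np.array([[52.1, 30.4, 17.5],
                        [48.9, 33.2, 17.9]])
likelihood_est = np.array([[51.7, 31.0, 17.3],
                           [49.5, 32.8, 17.7]])

# Treat each matrix as a single vector and take the Pearson correlation.
r_global = np.corrcoef(shannon_est.ravel(), likelihood_est.ravel())[0, 1]
print(round(r_global, 4))
```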