Predicting Thermodynamic Properties of PBXTHs with New Quantum Topological Indexes

Novel group quantitative structure-property relationship (QSPR) models for the thermodynamic properties of PBXTHs are presented, built by multiple linear regression (MLR) analysis. Four thermodynamic properties were studied: the entropy (Sθ), the standard enthalpy of formation (ΔfHθ), the standard Gibbs energy of formation (ΔfGθ), and the relative standard Gibbs energy of formation (ΔRGθ). The values predicted by the models agree well with those reported in the literature, and the deviations are within experimental error. To validate the estimation reliability for internal samples and the predictive ability for other samples, leave-one-out (LOO) cross-validation (CV) and external validation were performed, and the results show that the models are satisfactory.


Introduction
Quantitative structure-property relationships (QSPRs) remain the focus of many studies aimed at modeling and predicting the physicochemical properties or biological activities of molecules. They are convenient and important for practical use and molecular design wherever the physicochemical properties or biological activities of compounds are closely related to their structures [1][2][3].
In QSPR studies, developing a topological index is a crucial step. A topological index is a graph-theoretical descriptor obtained by transforming a molecular structure into the corresponding molecular graph [4][5]. Since Wiener proposed the first topological index, W, in 1947, more and more topological indices have been constructed because of their simplicity, speed, and accuracy [6]. Many of them are based on distance matrices, such as the Balaban index, Hyper-Wiener index, Hyper-Detour index, Detour index, Hosoya index, and Pasareti index. However, these distance matrices consist of the shortest distances from vertex i to all other (n - 1) vertices in the molecular graph, and the shortest distance between two adjacent atoms (vertices) is taken as 1. In fact, the actual spatial distance between adjacent atoms is not 1. Most of these indices therefore cannot reveal the real connections among atoms and are not suitable for organic compounds containing heteroatoms or multiple bonds [7][8][9].
Recently, more useful and significant topological indices have been derived from molecular structural information and the chemical environment of atoms, for example, the Lu index, based on the relative electronegativity and relative bond length of vertices [10], and the augmented eccentric connectivity index, based on adjacency-cum-distance [11]. At the same time, our group proposed several new topological descriptors: the PY1 and PY2 indices, based on the space distance matrix, equilibrium electronegativity, and the branching effect [9]; the PX1 and PX2 indices, based on the topological distance matrix, the branch vertex of atoms, and equilibrium electronegativity [12]; and the PE index, based on the distance matrix and equilibrium electronegativity [13].
Xanthone (XTH), the main component of Gentianaceae, Scutellaria, and stonecrop, is a common folk medicine used by the Naxi, Tibetan, and Miao peoples for clearing heat, reducing inflammation, protecting the liver, promoting bile flow, and detoxification [14]. Because of their wide distribution and important applications, xanthone derivatives have attracted the interest of researchers. Polybrominated xanthones (PBXTHs), for example, are important xanthone derivatives [15].
PBXTHs have 135 possible structures, which differ in the number of Br atoms and in their substitution positions on XTH, and these structures are similar to one another. Measuring the physical, chemical, or thermodynamic properties of every PBXTH compound is not realistic in terms of either manpower or material resources. QSPRs have therefore been used extensively for describing the molecular structure and investigating the properties of PBXTHs, and they have demonstrated obvious advantages.
In this work, as a continuation of our earlier work [8-9, 12-13, 16-18], the new quantum topological indices XP1 and XP2 of XTH and the 135 PBXTHs were constructed by combining quantum chemistry with topological chemistry. Multiple linear regression (MLR) analysis was then used to build novel group QSPR models for predicting the thermodynamic properties (Sθ, ΔfHθ, ΔfGθ, and ΔRGθ) of XTH and the 135 PBXTHs.

Data Set
All reference data for the thermodynamic properties (Sθ, ΔfHθ, ΔfGθ, and ΔRGθ) of XTH and the 135 PBXTHs used in this work are the calculated values reported in the literature [19].

Construction of new quantum topological indices XP1 and XP2
The QSPR study of XTH and the 135 PBXTHs was performed in four fundamental stages: (1) selection of the data set; (2) construction of the new quantum topological indices XP1 and XP2; (3) multiple linear regression (MLR) statistical analysis; and (4) model validation. The first and most crucial step is to extract sufficient molecular structure information, in numerical form, from the molecular graph [20].
The structure and atom labeling of XTH are given in Fig 1. First, MOPAC 7.0 was used to optimize the initial geometric parameters of the molecular structures of XTH and the 135 PBXTHs with the AM1 semi-empirical quantum chemistry method. Further geometry optimization and vibrational analysis were then completed with Gaussian03 using density functional theory (DFT) at the B3LYP/6-31+G(d) level. Once a stable molecular configuration was obtained, potential energy surface scans over all possible bond angles and dihedral angles were used to establish the relationship between energy and geometric configuration. On this basis, the spatial topological distances std_ij between the individual atoms of XTH and the 135 PBXTHs were calculated.
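As a rough illustration of this last step, the sketch below builds a symmetric matrix of pairwise interatomic distances from optimized Cartesian coordinates. It assumes that the spatial topological distance std_ij can be approximated by the straight-line (Euclidean) distance between atoms i and j, which the text does not state explicitly. The function name and the three-atom coordinates are hypothetical and are not taken from the Gaussian03 output.

```python
import numpy as np

def spatial_distance_matrix(coords):
    """Return the n x n matrix of pairwise Euclidean distances.

    coords: array-like of shape (n, 3) holding Cartesian coordinates.
    Assumption: std_ij is taken as the straight-line distance between
    atoms i and j; the paper may define it differently.
    """
    xyz = np.asarray(coords, dtype=float)
    diff = xyz[:, None, :] - xyz[None, :, :]   # shape (n, n, 3)
    return np.sqrt((diff ** 2).sum(axis=-1))

# Hypothetical coordinates for three atoms (illustration only).
example_coords = [[0.0, 0.0, 0.0], [3.0, 0.0, 0.0], [0.0, 4.0, 0.0]]
D3 = spatial_distance_matrix(example_coords)
```

The result is square and symmetric with a zero diagonal, which matches the general form of a distance matrix over n atoms.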
To extract the molecular structure information of XTH and the 135 PBXTHs sufficiently, we adopt the distance matrix D and the branching degree matrix V to describe the molecular structure. The distance matrix D of the n atoms in a molecule is a square symmetric matrix that can be expressed as D = [d_ij]_{n×n}, where d_ij is the length of the shortest path between vertices i and j in the molecular skeleton graph. In this paper, d_ij is replaced by the spatial topological distance std_ij, so the resulting distance matrix is the three-dimensional topological distance matrix ³D = [std_ij]_{n×n}.
As one of the main properties of atoms, electronegativity represents the ability of an atom to gain or lose electrons when it is in a compound. The larger the electronegativity of an atom, the stronger its ability to attract electrons. Based on the Pauling electronegativity, the group electronegativity χ_G can be calculated by the method of stepwise addition [21].
The group electronegativity of a group structural tree is illustrated in Fig 2. When the group is a single atom, its group electronegativity is the Pauling electronegativity of that atom. For a group with more than two levels, all the atoms or groups attached to the "anchor atom" are weighted equally, which can be expressed as follows [22]. The equilibrium of the first level: The equilibrium of the second level: ... The equilibrium of the k-th level: Then, the group electronegativity χ_G is defined as: For a molecule with an equilibrium structure, the equilibrium electronegativity of atom i is defined as: where χ_iA is the Pauling electronegativity of atom i, χ_G is the electronegativity of a group directly attached to atom i, calculated by Eq (1), and l is the number of groups directly attached to atom i. For a two-level group such as "=CH2" and "-CHI2", all of the atoms are weighted equally, so that: And: For a group with more than two levels, all of the atoms or groups attached to the "anchor atom" are weighted equally. For example, In this paper, the equilibrium electronegativity matrix E is established to reflect the chemical environment of every atom in a molecule, and it is defined as E = [χ_1 χ_2 ... χ_{n-1} χ_n]^T, where T denotes the transpose of the matrix (the same below).
In addition, the branching degree matrix V is established from the bonding state of each atom and the coupling relationships between atoms, in order to reflect the branching effect of each atom in the molecule. The matrix V is defined as follows: where z_i is the number of valence electrons of atom i and h_i is the number of hydrogen atoms bonded to atom i.
Molecular structure and properties are closely related to the spatial effect of the atoms, the character of the bonded atoms (such as their equilibrium electronegativity), and the branching effect between the atoms. We consider that these three factors jointly affect molecular character and properties. In this paper, we therefore establish a new extension matrix S on the basis of the topological distance matrix ³D, the equilibrium electronegativity matrix E, and the branching degree matrix V. The matrix S is defined as S = ³D × E × V. At the same time, the matrix S is expressed as follows: Then, the correctional matrix Q is established by Eq (3) on the basis of the extension matrix S.
The characteristic values (eigenvalues) λ_{Q,n} of the correctional matrix Q are calculated using MATLAB and arranged in ascending order.
In this paper, the new quantum topological indices XP1 and XP2 are defined as [9]: where λ_{Q,min} is the first eigenvalue λ_{Q,1} of the correctional matrix Q, and λ_{Q,max} is the n-th eigenvalue λ_{Q,n}. For example, the molecular structure of 2,8-DBXTH is given in Fig 3, and the correctional matrix Q_{2,8-DBXTH} is given below. By the same method, the new quantum topological indices XP1 and XP2 of XTH and the 135 PBXTHs were constructed. The calculation results are shown in Table 1.
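The eigenvalue step described above can be sketched in Python as follows. The paper performs this step in MATLAB; NumPy is swapped in here purely for illustration. The sketch assumes that Q is a real symmetric matrix (so that its eigenvalues are real and can be ordered), and the 2×2 matrix is a hypothetical stand-in, not the correctional matrix of any PBXTH. It stops at the smallest and largest eigenvalues and does not compute XP1 or XP2.

```python
import numpy as np

def extreme_eigenvalues(Q):
    """Return (lambda_min, lambda_max) of a real symmetric matrix Q.

    Assumption: Q is symmetric, so eigvalsh applies and returns real
    eigenvalues in ascending order.
    """
    Q = np.asarray(Q, dtype=float)
    if Q.ndim != 2 or Q.shape[0] != Q.shape[1] or not np.allclose(Q, Q.T):
        raise ValueError("this sketch expects a square symmetric matrix")
    eigenvalues = np.linalg.eigvalsh(Q)   # already sorted from small to large
    return eigenvalues[0], eigenvalues[-1]

# Hypothetical 2 x 2 matrix (illustration only).
Q_demo = [[2.0, 1.0], [1.0, 2.0]]
lam_min, lam_max = extreme_eigenvalues(Q_demo)
```

If Q turns out not to be symmetric, a general eigenvalue routine would be needed and the ordering of possibly complex eigenvalues would have to be defined separately.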

Regression analysis
The simplest expression of the fundamental principle of QSPR theory is a linear relationship P = a + bX between a property P and the chosen molecular descriptor X, where a and b are real numbers determined by a standard least-squares procedure [23]. Following this approach, multiple linear regression (MLR) analysis using the new quantum topological indices XP1 and XP2 was performed to obtain the QSPR models of the thermodynamic properties (Sθ, ΔfGθ, and ΔRGθ) of XTH and the 135 PBXTHs. At the same time, to test the stability of the QSPR models, leave-one-out (LOO) cross-validation (CV) was carried out. The final QSPR models are as follows: where n is the number of data points; R is the correlation coefficient; and R_CV, S, and F are the cross-validated correlation coefficient, the standard error of estimate, and the Fisher statistic, respectively. In particular, the higher the correlation coefficient, the Fisher statistic, and the cross-validated correlation coefficient, the better the new quantum topological indices XP1 and XP2 explain the thermodynamic properties (Sθ, ΔfGθ, and ΔRGθ) of XTH and the 135 PBXTHs. In Eqs (5) to (7), the high correlation coefficients and low standard errors indicate a very good correlation between the new indices and the thermodynamic properties (Sθ, ΔfGθ, and ΔRGθ) of XTH and the 135 PBXTHs. The correlation coefficients (R, R_adj, and R_CV) of the three QSPR models are all above 0.99, which is at the optimal level. The high correlation coefficients and cross-validated correlation coefficients also demonstrate that the newly proposed QSPR models are robust and have good predictive power. Table 1 gives the predicted (Pre.) values of the thermodynamic properties (Sθ, ΔfGθ, and ΔRGθ) of XTH and the 135 PBXTHs obtained from Eqs (5) to (7). Plots are also very useful for confirming the quality of a model or detecting anomalies.
The plots of the calculated values in the literature [19] versus the predicted values of the thermodynamic properties (Sθ, ΔfGθ, and ΔRGθ) of XTH and the 135 PBXTHs are presented in Figs 4-6, which show that the two sets of values are very close. The average relative errors are only 0.85%, 1.19%, and 0.79%, respectively. All the results show that the three QSPR models have good predictive power.
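A two-descriptor MLR fit of the form P = a0 + a1·X1 + a2·X2 can be sketched as below. This is a minimal illustration under stated assumptions, not the procedure actually used in this work: the statistics follow their conventional textbook definitions (R from the coefficient of determination, S with n - p - 1 degrees of freedom, F as the overall regression F-ratio), and the descriptor values and property values are synthetic numbers generated only to exercise the code. The names `fit_mlr`, `X_demo`, and `y_demo` are hypothetical.

```python
import numpy as np

def fit_mlr(X, y):
    """Ordinary least-squares fit of y = a0 + a1*x1 + ... + ap*xp.

    Returns the coefficients and the conventional R, S and F statistics.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    A = np.column_stack([np.ones(n), X])            # intercept column + descriptors
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    residuals = y - A @ coef
    ss_res = float(residuals @ residuals)           # residual sum of squares
    ss_tot = float(np.sum((y - y.mean()) ** 2))     # total sum of squares
    r2 = 1.0 - ss_res / ss_tot
    dof = n - p - 1                                 # residual degrees of freedom
    return {
        "coef": coef,
        "R": np.sqrt(r2),
        "S": np.sqrt(ss_res / dof),
        "F": (r2 / p) / ((1.0 - r2) / dof),
    }

# Synthetic data (illustration only): two descriptors, 20 samples.
rng = np.random.default_rng(0)
X_demo = rng.uniform(0.0, 1.0, size=(20, 2))
y_demo = 1.0 + 2.0 * X_demo[:, 0] - 3.0 * X_demo[:, 1] + rng.normal(0.0, 0.01, size=20)
model = fit_mlr(X_demo, y_demo)
```

For the models in this paper, the two columns of X would hold XP1 and XP2 and y would hold one thermodynamic property at a time.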
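The average relative error quoted above is not defined explicitly in the text. A common definition, assumed here, is the mean of |predicted - reference| / |reference| expressed as a percentage. The function name and the two sample values are hypothetical.

```python
import numpy as np

def average_relative_error(reference, predicted):
    """Mean absolute relative deviation, in percent.

    Assumption: ARE = mean(|pred - ref| / |ref|) * 100; reference values
    must be non-zero.
    """
    reference = np.asarray(reference, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.mean(np.abs(predicted - reference) / np.abs(reference)) * 100.0)

# Hypothetical values (illustration only): each prediction is off by 1 %.
are_demo = average_relative_error([100.0, 200.0], [101.0, 198.0])
```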

QSPR models cross-validation
All predictive QSPR models require validation to decide whether they can be used to make predictions. If a QSPR model cannot be used to make a prediction, then it is of no practical use. The goodness-of-fit of the models is quantified by the correlation coefficients (R, R_adj, and R_CV), the standard error (S), the Fisher statistic (F), and the average relative error (ARE). It is worth noting, however, that the model with the best correlation need not have the best predictive ability [24].
Generally, the most popular validation criterion for exploring the robustness of a predictive model is to analyze the influence of each individual object that contributes to the final equation. This procedure is known as cross-validation (CV), or internal validation by leave-one-out (LOO) [25]. Leave-one-out cross-validation was performed on the training set. Each time, one compound is left out of the training set, and the model built on the remaining compounds is used to predict the compound left out; that is, a model is built with n - 1 compounds and the n-th compound is predicted. For the test set, the predicted values are obtained from the model built on the whole training set. The parameters S_S, D_S, and R_CV play important roles in assessing the performance of QSPR models [26].
The correlation coefficient for cross-validation (R_CV) is then calculated by the following equation: where n is the number of compounds included in the QSPR models, y_i,cal and y_i,pre are the calculated values in the literature [19] and the predicted values obtained in this paper using Eqs (5), (6), and (7), respectively, and y_i,avg is the average of the calculated values in the literature [19]. From Eqs (5) to (7), one can see that the quality of the models for the thermodynamic properties (Sθ, ΔfGθ, and ΔRGθ) of XTH and the 135 PBXTHs is satisfactory. The values of R and R_CV are all very close, which shows the good stability and predictive ability of the three QSPR models.
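The leave-one-out procedure described above can be sketched as follows. Because the defining equation is not reproduced here, the sketch assumes the widely used form R_CV² = 1 - Σ(y_i,cal - y_i,pre)² / Σ(y_i,cal - y_avg)², where y_i,pre is the prediction for compound i from a model fitted without it; the paper's own S_S and D_S may be defined differently. The data are synthetic and the names `loo_r_cv`, `X_demo`, and `y_demo` are hypothetical.

```python
import numpy as np

def loo_r_cv(X, y):
    """Leave-one-out cross-validated correlation coefficient for an MLR model.

    Each compound is predicted by a least-squares model fitted to the
    other n - 1 compounds. Returns (R_CV, array of LOO predictions).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = y.size
    A = np.column_stack([np.ones(n), X])            # intercept column + descriptors
    y_loo = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i                    # mask that drops compound i
        coef, *_ = np.linalg.lstsq(A[keep], y[keep], rcond=None)
        y_loo[i] = A[i] @ coef                      # predict the left-out compound
    press = float(np.sum((y - y_loo) ** 2))         # sum of squared LOO errors
    ss_tot = float(np.sum((y - y.mean()) ** 2))
    q2 = 1.0 - press / ss_tot
    # q2 can be negative for a poor model; clip so the square root is defined.
    return float(np.sqrt(max(q2, 0.0))), y_loo

# Synthetic data (illustration only): two descriptors, 20 samples.
rng = np.random.default_rng(0)
X_demo = rng.uniform(0.0, 1.0, size=(20, 2))
y_demo = 1.0 + 2.0 * X_demo[:, 0] - 3.0 * X_demo[:, 1] + rng.normal(0.0, 0.01, size=20)
r_cv, y_loo = loo_r_cv(X_demo, y_demo)
```

A value of R_CV close to the fitted R, as reported for the three models, suggests that no single compound dominates the regression.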
In this study, the calculated values in the literature [19] of the standard enthalpy of formation ΔfHθ were used as the test set, and the QSPR model between the new quantum topological indices XP1 and XP2 and the calculated values of the standard enthalpy of formation ΔfHθ was obtained according to the topological model ΔfHθ = a1 + a2·XP1 + a3·XP2. The result is shown as