Semiempirical Quantum Chemistry Model for the Lanthanides: RM1 (Recife Model 1) Parameters for Dysprosium, Holmium and Erbium

Complexes of dysprosium, holmium, and erbium find many applications as single-molecule magnets, as contrast agents for magnetic resonance imaging, as anti-cancer agents, in optical telecommunications, etc. Therefore, the development of tools that can be proven helpful to complex design is presently an active area of research. In this article, we advance a major improvement to the semiempirical description of lanthanide complexes: the Recife Model 1, RM1, model for the lanthanides, parameterized for the trications of Dy, Ho, and Er. By representing such lanthanide in the RM1 calculation as a three-electron atom with a set of 5 d, 6 s, and 6 p semiempirical orbitals, the accuracy of the previous sparkle models, mainly concentrated on lanthanide-oxygen and lanthanide-nitrogen distances, is extended to other types of bonds in the trication complexes’ coordination polyhedra, such as lanthanide-carbon, lanthanide-chlorine, etc. This is even more important as, for example, lanthanide-carbon atom distances in the coordination polyhedra of the complexes comprise about 30% of all distances for all complexes of Dy, Ho, and Er considered. Our results indicate that the average unsigned mean error for the lanthanide-carbon distances dropped from an average of 0.30 Å, for the sparkle models, to 0.04 Å for the RM1 model for the lanthanides; for a total of 509 such distances for the set of all Dy, Ho, and Er complexes considered. A similar behavior took place for the other distances as well, such as lanthanide-chlorine, lanthanide-bromine, lanthanide, phosphorus and lanthanide-sulfur. Thus, the RM1 model for the lanthanides, being advanced in this article, broadens the range of application of semiempirical models to lanthanide complexes by including comprehensively many other types of bonds not adequately described by the previous models.


Introduction
Lanthanide complexes, as is well known, have a wide range of high technology applications. Of particular importance is the discovery that, due to their slow magnetization relaxation, lanthanide mononuclear complexes may function as singlemolecule magnets [1,2], the ultimate size limit for spin-based devices. Dysprosium complexes, in particular, will be very important in the development of magnetic materials because of recent results leading to the highest relaxation energy barriers for multinuclear clusters [3,4], the highest temperature at which hysteresis has been observed for any single complex [5], and a record magnetic blocking temperature of 8.3 K at a sweep rate of 0.08 Ts-1 [6]. Future research, for example, might be directed towards the design of dysprosium complexes that may operate as single-molecule magnets capable of preserving their magnetization at higher and more practical temperatures [6]. Dysprosium complexes are therefore promising for optical storage and memory.
Not only that, both dysprosium and holmium complexes can also effectively function in magnetic resonance imaging, MRI, as negative contrast agents at high magnetic fields, producing darker images, and as agents for susceptibility-induced enhancement at low magnetic fields [7]. Indeed, they are complementary to gadolinium complexes, which act as positive contrast agents, which brighten the image. Indeed, through the simultaneous applications of gadolinium and dysprosium based contrast agents to the MRI diagnosis of conditions such as ischemic heart disease, unprecedented details can now be revealed [8,9]. Future efforts will likely be intensified towards the design of such MRI contrast agents for the imaging of cellular molecular events involved in normal and pathological processes, including site specific macromolecular and particulate delivery systems [7].
Holmium is also employed in cancer therapeutics due to the characteristics of its 166 Ho isotope and of its complexes, like 166Ho-DOTMP which has been used in combination with chemotherapy in the treatment of myeloma because it concentrates in metastases of the skeleton and irradiates bone marrow [10].
Erbium (III) luminesces at 1.55 mm, essentially at the center of the third telecommunication window at 1.540 nm. Hence, erbium has been used in long-distance optical transmissions, power amplifiers, repeaters, etc. However, inorganic materials doped with erbium, display a very narrow full width at half maximum [11]. In order to increase the band width, erbium complexes have been used in order to both protect the erbium ion from vibrational coupling, at the same time enhancing the absorption of light through the so-called antenna effect. Indeed, erbium complexes have been prepared that exhibit a much broader full width at half maximum of 68 nm [12], a significant broadening when compared to erbium implanted silica which has a typical value of 11 nm for its most intense peak.
Thus, the design of lanthanide complexes towards enhancement of the property of interest, while seeking to avoid eventual side effects to the health of the subject (where applicable) is an active area of research, which may largely benefit from quantum chemical tools that attempt to predict several of the physical, chemical and even pharmacological [13] properties of the conjectural new structures being considered; structures which might be sketched by assembling around the lanthanide ion, ligands, selected from a library of ligands in a combinatorial manner. And the most important information, from which essentially all quantum chemical property predictions derive, is an accurate geometry of the molecular structure of the complex.
Predictions of geometries of lanthanide complexes from ab initio calculations are not so easy due not only to significant relativistic effects, a consequence of their high atomic numbers, but also to the complex manifold of microstates due, not only to a partially filled f-shell, but also from possibly partially filled 5 d 6 s and 6 p shells [14]. Therefore, full geometry optimizations from such first principles calculations are essentially unfeasible for the technologically useful complexes, which usually exhibit sizes of the order of 100 atoms or more. As a consequence, effective core potentials arise as a practical and very efficient manner of circumventing the complexity, while retaining important characteristics of ab initio calculations. Of these, the most widely used are the relativistic pseudopotentials of Dolg [15,16] which represent an excellent compromise between accuracy and usage of computational resources, mainly computing time. So far, the most thorough study of the geometry prediction accuracy of these relativistic potentials has been carried out by our research group in 2006 when full geometry optimizations were carried out on 52 different lanthanide complexes, including complexes of dysprosium (III), holmium (III) and erbium (III). [17] The counterintuitive results obtained indicated that the best combination of method with basis set when using the MWB pseudopotential was RHF/STO-3G when the intent of the calculation was to predict the geometry of the coordination polyhedron -very important for any subsequent ligand field application. Moreover, either increasing the basis set, or adding electron correlation, only worsened the quality of the resulting coordination polyhedron. On the other hand, although the quality of the obtained coordination polyhedron via RH/ STO-3G was very good, that could not be said of the geometry of the attached organic ligands.
In 1994, we introduced the Sparkle Model for the calculation of lanthanide complexes [18,19], a semiempirical model within the framework of the AM1 semiempirical model [20], which replaced the lanthanide ion by a +3e charge, with the corresponding Coulomb field superimposed to a repulsive potential of the form exp(-ar), with a being a parameter designed to somewhat delineate the size of the lanthanide ion, preventing the implosion of the ligands towards it. A very useful method of obtaining absorption spectra of lanthanide complexes was subsequently published [21]. Later [22], Gaussians were added to the core-core repulsion of the sparkle-ligand atom to make the Sparkle Model more consistent with the AM1. In 2005, based on a parameterization scheme employed for europium, gadolinium and terbium [23], the first useful and accurate semiempirical model for dysprosium was defined [24], followed by holmium [25]; and in 2006 for erbium [26]. These models were defined for AM1, and became later available in MOPAC2007 [27], the overall model being called Sparkle/AM1. So far, most applications of the Sparkle Model are related to luminescence research [28][29][30]. But since different semiempirical models possess different accuracies and eventually develop particular niches of applications, it soon became a necessity to extend the Sparkle Model to others, giving rise to Sparkle/PM3 [31,32], Sparkle/PM6 [33], Sparkle/PM7 [34], targeted to solids, and Sparkle/RM1 [35].
However, none of the above mentioned Sparkle Models attaches semiempirical atomic orbitals to the lanthanide ion. Nevertheless, these models are all very accurate to describe lanthanide-ligand atom distances when the coordinating atom of the ligands is another lanthanide, oxygen or nitrogen. By moving towards other types of lanthanide-ligand atom bonds, however, the accuracy of the Sparkle Models starts to wane. All that points out to the fact that there is some degree of overlap between the orbitals All these parameters are as defined in the formalisms and equations of the RM1 model. *Parameters are s, p, and d atomic orbital one-electron one-center integrals U ss , U pp and U dd ; the s, p, and d Slater atomic orbital exponents j s , j p , and j d ; the s, p, and d atomic orbital one-electron two-center resonance integral terms b s , b p , and b d ; the core-core repulsion term a; the two-electron integrals F 0 sd , G 2 sd ; and the additive term r core needed to evaluate core-electron and core-core nuclear interactions; the second set of exponents to compute the one-center integrals j s ', j p ', and j d '; and the six parameters for the two Gaussian functions: height, a i ; inverse broadness, b i ; and displacement, c i . doi:10.1371/journal.pone.0086376.t001 of the lanthanide and those of the coordinating atoms -in short, there is a degree of covalence not taken into account by the Sparkle Model.
In this article, in order to considerably broaden the range of applications of semiempirical methods for lanthanide complexes, we introduce a new model with orbitals for the lanthanide trications of dysprosium, holmium, and erbium, within RM1 [36], which we call simply RM1 model for the lanthanides, a significantly more general model, not to be confused with Sparkle/RM1 [35] which does not have orbitals associated with the lanthanide ion.

Methods
The rationale of the RM1 model for the lanthanides starts with the following electron configuration for the lanthanide atoms: {[Xe]4f n }5d 1 6s 2 , with n = 9, 10, 11 for Dy, Ho and Er, respectively. The semiempirical core of the atoms then becomes {[Xe]4f n }. The semiempirical valence shells will now have three electrons and will be described by 5 d, 6 s and 6 p orbitals, for a total of 9 orbitals. Hence the model will work for trications only, because for dications there would be a need to parameterize another core of the form {[Xe]4f n+1 } and assign two electrons to the valence shells, although they could still be described by another set of 5 d, 6 s, and 6 p orbitals. Since trications are by far the most common form of lanthanide ions, as before, we expect the present parameterization to be able to tackle essentially all cases relevant to technological applications. The next step is to define the universes of complexes, one universe for each of the lanthanide ions under consideration. Accordingly, we selected from the Cambridge Crystallographic Database [37][38][39] all available complexes of high crystallographic quality (R ,0.05), for a total of 61 of Dy(III), 40 of Ho(III), and 50 of Er(III).
We then proceeded to select sub-sets of complexes, the parameterization sets, according to some metric capable of guaranteeing that these sub-sets are representative of the universe of complexes with respect to some accuracy measure. Assuming Table 2. Unsigned mean errors, UME (Dy-L) s and UMEs, for the RM1 model for the lanthanides, as compared to the respective experimental crystallographic values, obtained from the Cambridge Structural Database, [37][38][39] for each of the 61 dysprosium (III) complexes.  that any difficulties Sparkle/AM1 might be having in describing the coordination polyhedron of the complexes is a reasonable first order approximation to the eventual overall difficulty which the present model will encounter, we defined the following R i metric for each one of the i complexes of the universe for each lanthanide trication: where j runs over all types of bonds, e.g. Ln-N, Ln-O, Ln-C, Ln-S, Ln-P, etc; k, runs over all bonds of type j; s dist j is the standard deviation of all crystallographic bond lengths of type j for all complexes of the universe; d CSD i,j,k is the crystallographic k th bond distance of type j for complex i; d Calc i,j,k is the calculated value of the same bond; s angle is the standard deviation of all crystallographic bond angles of the type A-Ln-B, with A,B = O, N, C, S, Cl, and Br; h exp i,l is the crystallographic l th bond angle of complex I; and h Calc i,l is its calculated counterpart. The standard deviations were calculated from the experimental data only. We also found out that there was no need to split the angles into types, as they all formed a homogeneous set. The divisions of the errors by their corresponding standard deviations make sure that the summations in Eq. (1) add comparable terms. To the set of R i values, each one associated with a different complex, we employed a hierarchical clustering analysis DIANA [40]. DIANA starts out with one large cluster containing all complexes. In the subsequent steps, the complexes that are the most dissimilar are split off into smaller clusters -a procedure which continues until each complex forms a cluster of itself. From the resulting dendogram, we chose two sets of complexes as parameterization sets: a smaller and a larger one. For Dy(III) these sets contained 13 and 26 complexes, respectively. The corresponding numbers for Ho(III) were 12 and 20, and for Er(III) 16, and 39.
The parameterization was carried out to minimize the sum of R i s for all complexes of parameterization set, with the difference that the calculated distances and angles in Eq. (1), are now the ones calculated by the model being parameterized. For the parameterization, we used a combination of Simplex and generalized simulated annealing [41] algorithms. We started with the smaller parameterization sets. Once these preliminary optimizations converged, we then expanded the parameterization sets to the larger ones and repeated the process until termination. Table 1 presents the final optimized parameters.

Results and Discussion
In order to evaluate the quality of the optimized parameters, we devised two measures [23,42]. Both are based on the following formula: where UME stands for unsigned mean error; i refers to a given complex; n is the number of distances taken into consideration in the given complex; the superscript CSD indicates that the distance R is an experimental crystallographic distance taken from CSD, and the superscript RM1 means that the distance was calculated from the present model. In the first measure we consider only distances between the central lanthanide ion and its directly coordinating atom distances, which we call UME (Ln-L). In the second measure, which we call simply UME, we consider not only the lanthanide ion-directly coordinating atoms as before, but also all distances between all atoms of the coordinating polyhedron, thus indirectly taking into account the angles within the coordination polyhedron.
Tables 2-4 present UME (Ln-L) s and UMEs for the universe set of complexes for each of the lanthanide trications: Dy(III), Ho(III), and Er(III), identified by their respective CSD codes.
We now proceed to the statistical validation of the model [43]. If the parameterizations captured the essence of the coordinating bonds, then the histograms of both UME (Ln-L) and UME must follow gamma distribution functions since, by definition, the UMEs can only have positive values. The gamma distributions are then adjusted to reproduce the mean and variance of the UME (Ln-L) s, for each of the parameterized trications. Finally, the qualities of the gamma distribution fits of the data were then assessed via the one-sample nonparametric Kolmogorov-Smirnoff test [44]. If the p-value of the test is larger than 0.05, then the gamma distribution fit is justified within a 95% interval and the use of the mean and variance of the data, as accuracy measures, is also statistically justified. Accordingly, Tables 5 and 6 display the mean, variance, and p-value of the test for each of the lanthanide ions, for both the UME (Ln-L) s and UMEs. All p-values are substantially larger than 0.05 and, therefore, the means and variances in Tables 5 and 6 can justifiably be taken as accuracy measures of the models for Dy(III), Ho(III), and Er(III).
We now proceed to analyze the performance of the models with respect to specific types of distances for Dy(III), Ho(III), and            And that is the reason why UMEs for the Ln-O and Ln-N types of bonds in the present article tend to be different, slightly larger, when set side by side to similar Ln-O and Ln-N UMEs of the original sparkle model articles. However, not to unnecessarily crowd the present article, in the tables, we only show numbers computed using the old models, but for the new test set. Dy-Dy distances in dilanthanide complexes of Dy (III), Ho (III), and Er (III) lie in the range from 3.6 Å to 6.6 Å , while lanthanideother ligand atom distances lie on average around 2.5 Å . That is why Ln-Ln UMEs are larger than other Ln-L UMEs. The previous Sparkle Models focused on these Ln-Ln, a also on Ln-O, and Ln-N distances only. Indeed, considering only Dy complexes (Table 7), there are 404 distances of these types, which represent 53% of all distances involving Dy(III) in its universe of complexes. The next most important types are Dy(III)-C distances, for which there are 315 of them making up 41% of the total.
By examining Table 7 and Figure 4, we can see a significant improvement in these next types of distances, with UME (Dy-C) s for the RM1 model for the lanthanides being 0.03 Å , whereas the corresponding average value of the sparkle models is 0.27 Å , a value 9 times larger. That alone would justify the introduction of the RM1 model for the lanthanides because, in the case of dysprosium, almost half the extant Ln-L distances are significantly more accurately described by RM1.
The situation is less dramatic but still significant for the other trications being parameterized, when Ln-C distances represent 24% of the total for Ho(III), and 17% for Er(III). The RM1 model for the lanthanides is even further justified when we compare other types of less common distances like Ln-S, Ln-Cl, and Ln-Br, because it outperforms all previous sparkle models as shown in Tables 7-9 and Figures 5-7. In all these instances, the RM1 UMEs tend to be almost ten times smaller than the corresponding errors of all previous sparkle models.
Finally, we can have an idea of the accuracy of the angles by examining the distances between any two atoms of the coordination polyhedron, the L-L9distances. For all three lanthanide trications, there was a significant reduction of these UMEs by a factor of two when compared to the corresponding UMEs for the previous sparkle models: from 0.26 Å to 0.13 Å , as can be inferred from Figure 8. This is indirect evidence that the angles are much better predicted in the RM1 model for the lanthanides.

Conclusions
The RM1 model for the lanthanides represents a significant improvement in the theoretical semiempirical modeling of lanthanide complexes, which started with the sparkle model in 1994 [18,19] which and attained maturity with the introduction of Sparkle/AM1 in 2005 [23], and was extended to Sparkle/PM3 [31,32], Sparkle/PM6 [33], to Sparkle/PM7 [34] and up to Sparkle/RM1 [35], the last two in 2013.
There is, however, a cost associated with the improvement. The RM1 model for the lanthanides adds nine more orbitals per lanthanide to the calculation, whereas all sparkle models add none. Thus, for a single SCF calculation of a complex of about 60 atoms, for example, an RM1 model for the lanthanides calculation takes about twice the computing time of a Sparkle/RM1 calculation. Such cost can become even weightier if the molecular structure contains more than one lanthanide ion, as is usually the case of metal-organic frameworks. Since the performance of both Sparkle/RM1 and RM1 model for the lanthanides is essentially equivalent for Ln-Ln, Ln-O, and Ln-N, the user may still benefit from the speed of the sparkle models if the structure of interest contains only these types of bonds, as it takes place in the majority of cases.
In conclusion, via the introduction of a set of 5 d, 6 s, and 6 p semiempirical atomic orbitals, the RM1 model for the lanthanides thus extends the Sparkle Models' capabilities of correctly describing Ln-Ln, Ln-O, and Ln-N distances, to other types of distances, such as Ln-C, Ln-S, Ln-P, Ln-Cl, and Ln-Br, while simultaneously improving the coordinating bond angles.

Supporting Information
File S1 Instructions on how to run the RM1 model for the lanthanides in MOPAC12 [45], and MOPAC sample input and output files for complexes of Dy(III), Ho(III), and Er(III). (DOC)