## Anales de la Asociación Química Argentina

*Print version* ISSN 0365-0375

### An. Asoc. Quím. Argent. vol.94 no.4-6 Buenos Aires Aug./Dec. 2006

**REGULAR PAPERS**

**QSPR Evaluation of thermodynamic properties of acyclic and aromatic compounds**

**Duchowicz, P.R.^{1}; Castro, E.A.^{1}; Fernández, F.M.^{1}; Pankratov, A.N.^{2}**

^{1}INIFTA, C.C. 16, Suc. 4, Department of Chemistry, La Plata Nacional University, La Plata (B1906ZAA), Argentina

FAX: +54 221 4254642, E-mail: prduchowicz@yahoo.com.ar

^{2} Department of Chemistry, N. G. Chernyshevskii Saratov State University, 83 Astrakhanskaya Street, Saratov 410012, Russia

Received June 15^{th}, 2006. In final form October 30^{th}, 2006

**Abstract**

*Although parametrized Semiempirical Molecular Orbital Methods were especially designed to obtain enthalpies of formation and other related thermodynamic parameters, in some situations their predictions cannot be compared with experimental results simply because they present serious drawbacks. Therefore, there is room to apply a wide variety of predictive methods for estimating thermodynamic properties, such as the well-known Group Contribution Methods and the Quantitative Structure-Property Relationships. A QSPR study of 163 enthalpies of formation, 37 Gibbs free energy changes, and 40 standard entropies of elements is established for a representative set of acyclic and aromatic compounds, on the basis of fundamental concepts of molecular structure such as the count of atoms and types of chemical bonds. A recent method developed in our group, called the Replacement Method, is employed here to find the best models in a pool containing 33 descriptors. An eight-parameter model was able to correlate the heats of formation with atom and bond types (R=0.9553, R_{l-25%-o}=0.8200) but with great dispersion (S=19.173 kcal/mol). For standard entropies (R=0.9813, R_{l-20%-o}=0.8659) and for free energies (R=0.9869, R_{l-20%-o}=0.9000) the models obtained perform worse than those calculated with Semiempirical Methods.*

**Resumen**

*A pesar de que los métodos parametrizados de Orbitales Moleculares Semiempíricos se diseñaron especialmente para obtener entalpías de formación u otros parámetros termodinámicos, en algunas situaciones las predicciones no pueden compararse con los resultados experimentales simplemente porque presentan serias anomalías. Por tanto, hay lugar para aplicar una amplia variedad de métodos predictivos para estimar propiedades termodinámicas, tales como los bien conocidos Métodos de Contribución de Grupos y las Relaciones Cuantitativas Estructura-Propiedad. Se establece un estudio QSPR para 163 entalpías de formación, 37 cambios de energía libre de Gibbs, y 40 entropías estándar a partir de elementos para un conjunto representativo de compuestos acíclicos y aromáticos, sobre la base de conceptos fundamentales de estructura molecular como lo son la cuenta de átomos y tipos de enlace químico. Se emplea un método reciente descubierto en nuestro grupo llamado Método de Reemplazo para encontrar los mejores modelos a partir de un conjunto total de 33 descriptores. Un modelo de 8 parámetros fue capaz de correlacionar los calores de formación con los átomos y tipo de enlaces (R=0,9553, R _{l-25%-o}=0,8200) pero con gran dispersión (S=19,173 Kcal/mol). Para el caso de entropías estándar (R=0,9813, R _{l-20%-o}=0,8659) y para energías libres (R=0,9869, R _{l-20%-o}=0,9000) los resultados que se obtienen resultan más pobres a los alcanzados con los Métodos Semiempíricos.*

**Introduction**

Thermodynamics is a phenomenological theory of matter. As such, it draws its concepts directly from experiments [1]. Thermodynamic parameters are measurable macroscopic quantities associated with macroscopic systems, such as pressure *P*, volume *V*, temperature *T*, and magnetic field *B*, which are defined experimentally. Macroscopic systems (gases, liquids, or solids) were first investigated systematically from a macroscopic, phenomenological point of view in the nineteenth century, and the laws thus discovered formed the subject of Thermodynamics. The strength of this discipline is its great generality, which allows valid statements to be made from a minimum number of postulates without requiring any detailed assumptions about the microscopic (*i.e.* molecular) properties of the system [2].

Thermodynamics, which makes up a logical subject of great elegance, is a powerful method for studying chemical phenomena and can be developed quite independently of atomic and molecular theory. It has a permanence, comparable to that of Euclid’s theorems in plane geometry, that is not shared by our ever-changing views on the nature of atoms and molecules [3].

The prediction of thermodynamic and physical properties of organic compounds under different conditions (*i.e.* temperature, pressure) is vital for the design of chemical and petrochemical plants. Moreover, measuring some thermodynamic parameters involves experimental difficulties and is not always feasible, and the corresponding methods have real drawbacks [4,5]. It is therefore necessary to resort to theoretical calculation of these parameters. This is now feasible because the modeling and prediction of the physicochemical properties of molecules is an important, fruitful and current field of research in contemporary chemistry [6,7]. Studies of this kind rest on the paradigm that physicochemical and biological properties depend on molecular structure. Consequently, one of the most important points in such research is the selection of adequate descriptors that capture the information stored in the molecular structure [8].

The most common software packages used in chemical engineering design incorporate efficient algorithms for predicting thermodynamic and physical properties of interest by means of Group Contribution Methods (*GCM*) [9-11]. These techniques are easy to apply, relying solely on the sum of the contributions of each molecular structure fragment to a given thermodynamic property. The basic assumption of these methods is that the contribution of a group is transferable between molecules; if this hypothesis does not hold, *GCM* can be corrected with experimental data, when available, to achieve better predictions. This turns *GCM* into a computationally intensive technique.

A drawback of *GCM* is that in its basic form (without corrections) it cannot model isomeric structures. This is not a problem for small organic compounds, but the situation worsens for larger compounds with an increasing number of conformers. Another associated problem is that measured data are not always available to extend these methods to less common compounds, such as molecules containing fused aromatic rings or organometallic compounds. This drawback is also present in semiempirical methods if they are not properly parametrized.

All the limitations of *GCM* discussed above point to the need for a different theoretical framework, such as that of the Quantitative Structure-Property Relationships (*QSPR*). A fundamental difference between the two methodologies is that in *QSPR* the user can represent the molecular structure with theoretically defined molecular descriptors only, without relying on empirical parameters to overcome the model's limitations. The graph-theoretical approach to *QSPR* is based on a well-defined mathematical representation of the chemical structure, relating a property (*P*) to a set of molecular descriptors (*d*) through an arbitrary function (*f*), which usually represents a polynomial relationship. These molecular descriptors are commonly named “*topological indices*” [12,13], and they contain relevant information about the structure. Owing to the complexity of molecular structures, it seems nearly impossible to expect a single set of descriptors to contain all the relevant structural information. Therefore, the search for novel molecular structure descriptors continues and is a field of active research within *QSPR* theory. However, this search should not be carried out at random. Instead, it should follow a regular procedure based on the attributes that a molecular structure descriptor should possess [8].
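For the linear case, which is the form of the models reported later in this paper, the relationship between the property and the descriptors can be written as follows. The coefficient symbols *b*_{0} and *b*_{j} are notation introduced here only for illustration:

$$
P = f(d_1, \dots, d_n) \approx b_0 + \sum_{j=1}^{n} b_j \, d_j
$$

Each *d*_{j} is the value of one descriptor for a given molecule (for instance, its number of carbon atoms), and the coefficients are obtained by least-squares regression against experimental values of *P*.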

The simplest and most obvious graph-theoretical indices, which have a quite direct chemical interpretation, are atom and chemical bond types. Although they have been considered suitable molecular descriptors, they have not been widely employed. Several applications by two of us (P. R. D. and E. A. C.) have demonstrated their usefulness for predicting physicochemical properties and biological activities [14-18]. They can be computed rather easily and have the advantage that they can be applied to quite diverse sets of structures.

In a recent paper, one of the authors (A. N. P.) computed standard values of ΔH_{f}, S° and ΔG_{f} for a representative set of acyclic and aromatic compounds by means of the semiempirical all-valence MNDO, AM1 and PM3 molecular orbital methods [19]. That work demonstrated the existence of quantitative relationships between experimental data and theoretical results, although some inconsistencies were noted.

The aim of this study is to present estimates of these three fundamental thermodynamic parameters by exploring the performance of the elemental atom and bond-type molecular descriptors, and to compare the resulting predictions with those reported for the three semiempirical methods. In contrast to *GCM*, where each structure is dissected into all its constituent fragments (a different number for each molecule) and all of them are used to predict the property value, in *QSPR* one has to select, among all the fragments resulting from the entire set of compounds, the most representative and common ones that contribute most to the property being modeled. For this purpose we resort to the recently proposed Replacement Method (*RM*), which is described in the next section. We then report results for the predictions of the thermodynamic functions and discuss them with respect to previous calculations. Finally, we state the main conclusions derived from this study.

**Materials and Methods**

*Chemical Data*

We represented the molecular structure with a set of 33 atom and bond types for the three thermodynamic properties under study. Although many *QSPR* studies have been proposed to predict thermodynamic properties, most of them rely on homogeneous calibration sets, since the predictions are better that way. In the present report, we use a diverse data set of compounds. The 163 ΔH_{f} values shown in Table 1 were extracted from the MOPAC manual [20], in order to compare the model predictions with the reported MNDO, AM1 and PM3 calculations. We chose the molecules so as to create a balanced calibration set, that is, one that includes approximately the same number of molecules for each functional group. The experimental values of the 40 S° and 37 ΔG_{f} were both obtained from Pankratov’s paper [19], since the MOPAC manual does not report experimental values for them.

**Table 1**. Details of the descriptors appearing in the present study.

*Computer Software*

We computed the best calibration models with the Replacement Method (*RM*) [21], which is a good approximation to the combinatorial search (*FS*) of variables and allows a pool of *D* descriptors, possibly thousands of them, to be explored. The procedure consists of replacing a chosen variable of the regression by another that minimizes *S*. The method is as follows. Choose *d* descriptors {X_{1},X_{2},…,X_{d}} at random and perform a linear regression. Choose one of the descriptors of this set, say X_{i}, and replace it in turn by each of the *D* descriptors of the pool (except itself), keeping the best resulting set. Since the replacement can start with any of the *d* descriptors of the initial model, a regression equation with *d* variables has *d* possible paths to the final result; the choice above, for example, develops into path *i*. Next, choose the variable with the greatest relative error in its coefficient (except the one replaced in the previous step) and replace it by each of the *D* descriptors (except itself), again keeping the best set. Replace all the remaining variables in the same way, bypassing those replaced in previous steps. When finished, start again with the variable having the greatest relative error in its coefficient and repeat the whole process, as many times as necessary, until the set of descriptors remains unchanged. This gives the best model for path *i*. Proceed in exactly the same way for all possible paths *i* = 1,2,…,*d*, compare the resulting models, and keep the best one. Our numerical experiments show that in this way one obtains a model almost as good as the best achieved with *FS*.
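The search described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`fit`, `replace_one`, `run_path`, `replacement_method`) are hypothetical, *S* is assumed to be the residual standard deviation with *N* − *d* − 1 degrees of freedom, the relative error of a coefficient is taken as its standard error divided by its magnitude, and a replacement is accepted only when it lowers *S*.

```python
import numpy as np

def fit(X, y, cols):
    """Linear regression with intercept on the chosen descriptor columns.
    Returns (S, relative errors of the descriptor coefficients)."""
    A = np.column_stack([np.ones(len(y)), X[:, cols]])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    s2 = resid @ resid / (len(y) - A.shape[1])   # assumed definition of S^2
    stderr = np.sqrt(np.diag(s2 * np.linalg.pinv(A.T @ A)))
    rel = stderr[1:] / np.maximum(np.abs(coef[1:]), 1e-12)  # skip intercept
    return float(np.sqrt(s2)), rel

def replace_one(X, y, cols, pos):
    """Try every pool descriptor at position pos; keep the set with lowest S."""
    best_cols, best_s = list(cols), fit(X, y, cols)[0]
    for j in range(X.shape[1]):
        if j in cols:                 # skip itself and descriptors already used
            continue
        trial = list(cols)
        trial[pos] = j
        s = fit(X, y, trial)[0]
        if s < best_s:
            best_cols, best_s = trial, s
    return best_cols

def run_path(X, y, cols, first):
    """Follow one path: sweep over all positions until the set stops changing."""
    cols = list(cols)
    while True:
        start = list(cols)
        cols = replace_one(X, y, cols, first)
        done = {first}
        while len(done) < len(cols):
            rel = fit(X, y, cols)[1]
            # next: the not-yet-replaced variable with greatest relative error
            pos = max((p for p in range(len(cols)) if p not in done),
                      key=lambda p: rel[p])
            cols = replace_one(X, y, cols, pos)
            done.add(pos)
        if cols == start:
            return cols
        first = int(np.argmax(fit(X, y, cols)[1]))  # restart with worst variable

def replacement_method(X, y, d, seed=0):
    """Run all d paths from one random initial set and keep the best model."""
    rng = np.random.default_rng(seed)
    init = [int(i) for i in rng.choice(X.shape[1], size=d, replace=False)]
    paths = [run_path(X, y, init, i) for i in range(d)]
    return min(paths, key=lambda c: fit(X, y, c)[0])

if __name__ == "__main__":
    # Synthetic pool of 10 descriptors; only columns 1, 4 and 7 matter.
    rng = np.random.default_rng(1)
    X = rng.normal(size=(60, 10))
    y = 3 * X[:, 1] - 2 * X[:, 4] + 5 * X[:, 7] + rng.normal(scale=0.01, size=60)
    print(sorted(replacement_method(X, y, d=3)))
```

Each call to `replace_one` costs about *D* regressions, so one sweep over a *d*-variable model costs roughly *d*·*D* regressions, far fewer than the number of *d*-element subsets that an exhaustive search would have to examine.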

**Results and Discussion**

It is very common in *QSPR* theory to build models that do not contain an excessive number of descriptors, so that the established relationship can be interpreted in terms of interaction mechanisms [22].

A simple way to control the model expansion is to plot the correlation coefficients of calibration (*R*) and of leave-one-out cross-validation (*R*_{loo}) as a function of the number of variables in the model. Such a plot shows that the statistical parameters of the model improve up to a certain point, called the “break point”; beyond it, the improvement can be considered negligible because the relative change in the parameters is small. Consequently, the model corresponding to the break point is taken to be the best model for the set of variables analyzed.
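The two curves used to locate the break point can be computed as in the following sketch. The paper does not spell out how *R*_{loo} is defined, so the code assumes it is the correlation between the experimental values and the predictions obtained when each compound is left out of the fit in turn; the function name `r_and_rloo` and the synthetic data are hypothetical.

```python
import numpy as np

def r_and_rloo(X, y):
    """Return (R, R_loo) for a linear model with intercept on descriptors X."""
    n = len(y)
    A = np.column_stack([np.ones(n), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    r = np.corrcoef(y, A @ coef)[0, 1]            # calibration correlation
    pred = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i                  # leave compound i out
        c, *_ = np.linalg.lstsq(A[keep], y[keep], rcond=None)
        pred[i] = A[i] @ c                        # predict the excluded compound
    r_loo = np.corrcoef(y, pred)[0, 1]
    return float(r), float(r_loo)

if __name__ == "__main__":
    # Synthetic example: only the first two of six descriptors carry signal.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(40, 6))
    y = 2 * X[:, 0] - X[:, 1] + rng.normal(scale=0.05, size=40)
    for k in range(1, 7):
        r, r_loo = r_and_rloo(X[:, :k], y)
        print(k, round(r, 4), round(r_loo, 4))
```

In this synthetic case both coefficients rise sharply when the second descriptor is added and barely change afterwards, which is the behavior the break-point criterion looks for.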

Figure 1 shows this type of plot for ΔH_{f}, with a break point appearing at 8 variables included in the model. The equation and its statistical parameters are the following:

$$
\Delta H_f = 24\,(\pm 3) + 19\,(\pm 2)\,\mathit{C} - 12.0\,(\pm 0.8)\,\mathit{H} - 20\,(\pm 2)\,\mathit{Cl} - 48\,(\pm 2)\,\mathit{O} + 44\,(\pm 4)\,\mathit{F\text{-}N} - 7\,(\pm 1)\,\mathit{C\text{-}C}_{arom} - 60\,(\pm 2)\,\mathit{F} + 88\,(\pm 5)\,\mathit{NO}_2 \qquad (1)
$$

$$
N = 163,\quad R = 0.9553,\quad S = 19.173\ \mathrm{kcal\,mol^{-1}},\quad F = 200.924
$$

$$
R_{loo} = 0.9468,\quad S_{loo} = 20.307\ \mathrm{kcal\,mol^{-1}}
$$

$$
R_{l\text{-}25\%\text{-}o} = 0.8200,\quad S_{l\text{-}25\%\text{-}o} = 23.240\ \mathrm{kcal\,mol^{-1}}
$$

**Figure 1**. Model selection for ΔH_{f}.

The leave-many-out statistics were obtained from 100000 randomly generated exclusions of compounds (for each of the three thermodynamic properties considered). Details of the descriptors appearing in all the equations are given in Table 1. Figure 2 plots the dispersion (the difference between the experimental and predicted values of the property) as a function of the experimental property; the deviations are randomly distributed and follow no pattern, which also suggests that data clustering is absent.
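A leave-many-out validation with random exclusions can be sketched as follows. The paper does not state how the statistics are aggregated over the random exclusions, so this sketch makes its own assumptions: in each round a fixed fraction of the compounds is removed, the model is refitted on the rest, *R* and the root-mean-square error are computed on the excluded compounds only, and both are averaged over all rounds. The function name `leave_many_out` and its parameters are hypothetical.

```python
import numpy as np

def leave_many_out(X, y, frac=0.25, n_rounds=1000, seed=0):
    """Average R and RMS error on randomly excluded subsets of compounds."""
    rng = np.random.default_rng(seed)
    n = len(y)
    n_out = max(2, int(round(frac * n)))          # compounds excluded per round
    A = np.column_stack([np.ones(n), X])
    r_vals, s_vals = [], []
    for _ in range(n_rounds):
        out = rng.choice(n, size=n_out, replace=False)
        keep = np.ones(n, dtype=bool)
        keep[out] = False
        c, *_ = np.linalg.lstsq(A[keep], y[keep], rcond=None)
        pred = A[out] @ c                          # predict excluded compounds
        r_vals.append(np.corrcoef(y[out], pred)[0, 1])
        s_vals.append(np.sqrt(np.mean((y[out] - pred) ** 2)))
    return float(np.mean(r_vals)), float(np.mean(s_vals))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(40, 2))
    y = 2 * X[:, 0] - X[:, 1] + rng.normal(scale=0.05, size=40)
    print(leave_many_out(X, y, frac=0.25, n_rounds=200))
```

The default of 1000 rounds only keeps the example fast; the number of rounds is a parameter and can be raised to the 100000 exclusions used in the paper.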

**Figure 2**. Dispersion plot for Eq. (1).

From Figure 3 it is clear that it is possible to correlate the 163 ΔH_{f} values with atoms and bonds, but with great dispersion in the predicted values (*S*). This is seen more clearly in Table 2, where in most cases the absolute errors are much greater than those obtained with any of the three semiempirical methods. For reference, experimental uncertainties for ΔH_{f} are known to be around 2-3 kcal.mol^{-1} [20,23-26].
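To make concrete how a model of this kind turns atom and bond counts into a predicted value, the sketch below evaluates Eq. (1) for two simple molecules. The coefficients are those read from Eq. (1); the descriptor names used as dictionary keys, the assumption that every descriptor is a plain count, and the choice of benzene and methane (with six aromatic C-C bonds counted for benzene) are illustrative and not taken from Table 1.

```python
# Coefficients of Eq. (1), in kcal/mol; the key names are hypothetical labels.
EQ1_INTERCEPT = 24.0
EQ1_COEFFS = {
    "C": 19.0,
    "H": -12.0,
    "Cl": -20.0,
    "O": -48.0,
    "F-N": 44.0,
    "C-C_arom": -7.0,
    "F": -60.0,
    "NO2": 88.0,
}

def predict_dhf(counts):
    """Predicted heat of formation from descriptor counts; absent ones are 0."""
    unknown = set(counts) - set(EQ1_COEFFS)
    if unknown:
        raise KeyError(f"descriptors not in Eq. (1): {sorted(unknown)}")
    return EQ1_INTERCEPT + sum(EQ1_COEFFS[name] * n for name, n in counts.items())

benzene = {"C": 6, "H": 6, "C-C_arom": 6}   # assumed counting convention
methane = {"C": 1, "H": 4}

print(predict_dhf(benzene))  # 24 + 114 - 72 - 42 = 24.0
print(predict_dhf(methane))  # 24 + 19 - 48 = -5.0
```

Because the model is a sum of a few integer counts times coefficients of tens of kcal/mol, molecules with the same counts receive the same prediction whatever their connectivity, which is one way to see why the dispersion *S* is large.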

The second property considered is *S°*; the set of 40 compounds is given in Table 3, and the corresponding plot of *R* and *R*_{loo} in Figure 4. The best regression found has 8 variables and shows no clustering of data (see Figure 5).

**Figure 3**. Linearity of model (1).

**Table 2**. Experimental and predicted ΔH_{f} (kcal.mol^{-1}) with associated errors when using semiempirical methods.

Error = absolute value of the difference between predicted and experimental property.

**Table 3**. Experimental and predicted values of S° (cal.mol^{-1}.K^{-1}) and ΔG_{f} (kcal.mol^{-1}).

(a) Number as in Ref. [19].

**Figure 4**. Model selection for S°.

**Figure 5**. Dispersion plot for Eq. (2).

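$$
S^{\circ} = 46\,(\pm 1) + 3.5\,(\pm 0.2)\,\mathit{C} + 5.3\,(\pm 0.4)\,\mathit{C\text{-}C} + 11\,(\pm 2)\,\mathit{I} + 5.8\,(\pm 0.8)\,\mathit{O} + 12\,(\pm 1)\,\mathit{S} + 4\,(\pm 1)\,\mathit{N\text{-}H} + 7\,(\pm 2)\,\mathit{Cl} + 10\,(\pm 2)\,\mathit{Br} \qquad (2)
$$

$$
N = 40,\quad R = 0.9813,\quad S = 2.946\ \mathrm{cal\,mol^{-1}\,K^{-1}},\quad F = 101.037
$$

$$
R_{loo} = 0.9722,\quad S_{loo} = 3.162\ \mathrm{cal\,mol^{-1}\,K^{-1}}
$$

$$
R_{l\text{-}20\%\text{-}o} = 0.8659,\quad S_{l\text{-}20\%\text{-}o} = 6.925\ \mathrm{cal\,mol^{-1}\,K^{-1}}
$$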

The last thermodynamic quantity studied is ΔG_{f}, with 37 values. According to Figure 7, the best model again has 8 descriptors; its predictions are randomly distributed in Figure 8 and are listed in Table 3.

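$$
\Delta G_f = 4\,(\pm 3) + 9.1\,(\pm 0.5)\,\mathit{C} - 4.1\,(\pm 0.5)\,\mathit{H} + 14\,(\pm 3)\,\mathit{N} - 41\,(\pm 3)\,\mathit{C\text{-}O} - 45\,(\pm 4)\,\mathit{C{=}O} + 32\,(\pm 5)\,\mathit{C{\equiv}C} - 14\,(\pm 4)\,\mathit{Cl} \qquad (3)
$$

$$
N = 37,\quad R = 0.9869,\quad S = 6.754\ \mathrm{kcal\,mol^{-1}},\quad F = 130.681
$$

$$
R_{loo} = 0.9795,\quad S_{loo} = 7.330\ \mathrm{kcal\,mol^{-1}}
$$

$$
R_{l\text{-}20\%\text{-}o} = 0.9000,\quad S_{l\text{-}20\%\text{-}o} = 8.538\ \mathrm{kcal\,mol^{-1}}
$$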

**Figure 6**. Linearity of model (2).

**Figure 7**. Model selection for ΔG_{f}.

**Figure 8**. Dispersion plot for Eq. (3).

**Figure 9**. Linearity of model (3).

Tables 2 and 3 reveal that, as was the case for ΔH_{f}, the constitutional descriptors are not enough to obtain better predictions for *S°* and ΔG_{f} than those achieved with the more sophisticated molecular orbital methods. Additional structural information is needed to describe the behavior of these thermodynamic quantities better, such as that contained in more elaborate topological indices or in geometrical or electronic variables. Despite these results, the naive descriptors proved to be quantitative and able to establish relationships with the properties, although not as good as desired. In the three cases considered, a linear equation was found suitable as the mathematical function of the model, judging by the proximity of *R* to unity.

**Conclusions**

We have presented a rather simple and direct calculation scheme to derive thermodynamic parameters such as the heat of formation, entropy and Gibbs free energy. This approach is quite different from molecular orbital theory, which requires rather involved calculations to build the Fock matrix. The choice of topological descriptors is based on the most intuitive chemical concepts: atoms and chemical bonds. Comparison of the theoretical estimates with experimental data revealed that elementary descriptors of this sort have to be complemented with others derived in a different way, that is, descriptors that take into account geometrical and electronic characteristics of the molecular structure. Research along this line is under way in our laboratories, and the results will be published elsewhere.

**Acknowledgment**

P. R. D. acknowledges financial support from CONICET, RA.

**References**

[1] Huang, K., Statistical Mechanics, Wiley, New York, **1963**.

[2] Reif, F., Fundamentals of Statistical and Thermal Physics, McGraw-Hill, New York, **1965**, pp. 2-4.

[3] Barrow, G.M., Physical Chemistry, McGraw-Hill, New York, **1961**.

[4] Stull, D.R.; Westrum, E.F.; Sinke, G.C., The Chemical Thermodynamics of Organic Compounds, Wiley, New York, **1969**.

[5] Gurvich, L.V.; Karachevtsev, G.V.; Kondrat’ev, V.N.; Lebedev, Yu.A.; Medvedev, V.A.; Potapov, V.K.; Khodeev, Yu.S., Energies of Chemical Bonds Splitting Ionization Potentials and Electron Affinity, V. N. Kondrat’ev, Ed., Nauka, Moscow, **1974**.

[6] Randic, M.; Trinajstic, N., New J. Chem., **1994**, 18, 179.

[7] Randic, M., New J. Chem., **1996**, 20, 10019.

[8] Estrada, E.; Ivanciuc, O.; Gutman, I.; Gutiérrez, A.; Rodríguez, L., New J. Chem., **1998**, 22, 819.

[9] Predict, http://www.mwsoftware.com/dragon/desc.html

[10] ChemEng Software Design, http://www.cesd.com/chempage.htm

[11] Artist, http://www.ddbst.de/new/Win_DDBSP/frame_Artist.htm

[12] Sabljic, A.; Trinajstic, N., Acta Pharm. Jugosl., **1981**, 31, 189.

[13] Hansen, P.J.; Jurs, P.C., J. Chem. Educ., **1988**, 65, 574.

[14] Duchowicz, P.R.; Castro, E.A., Arkivoc, **2001**, 2, 227.

[15] Duchowicz, P.R.; Castro, E.A., J. Indian Chem. Soc., **2001**, 78, 192.

[16] Duchowicz, P.R.; Castro, E.A., J. Korean Chem. Soc., **2000**, 44, 501.

[17] Duchowicz, P.R.; Castro, E.A., Acta Chem. Slov., **2000**, 47, 281.

[18] Duchowicz, P.R.; Castro, E.A., J. Korean Chem. Soc., **1999**, 43, 621.

[19] Pankratov, A.N., Afinidad, **1999**, LVI, 257.

[20] MOPAC, http://www.cachesoftware.com/mopac/Mopac2002manual/table_of_heats.html

[21] Duchowicz, P.R.; Castro, E.A.; Fernández, F.M., Commun. Math. Comput. Chem. (MATCH), **2006**, 55, 179.

[22] Katritzky, A.R.; Fara, D.C.; Karelson, M., Bioorg. Med. Chem., **2004**, 12, 3027.

[23] Mercader, A.; Castro, E.A.; Toropov, A.A., Int. J. Mol. Sci., **2001**, 2, 121.

[24] Mercader, A.; Castro, E.A.; Toropov, A.A., Chem. Phys. Lett., **2000**, 330, 612.

[25] Gavernet, L.; Firpo, M.; Castro, E.A., Rev. Roum. Chim., **1998**, 43, 1079.

[26] Vericat, C.; Castro, E.A., Egyp. J. Chem., **1998**, 41, 109.