
Latin American applied research

Print version ISSN 0327-0793

Lat. Am. appl. res. vol.41 no.4 Bahía Blanca oct. 2011



On the use of continuous distribution models for characterization of crude oils

G. M. Xavier†, K. M. Boaventura† and F.C. Peixoto†

† Centro de Pesquisas e Desenvolvimento Leopoldo A. M. de Mello (CENPES - PETROBRAS),
Horácio de Macedo 950, Cidade Universitária - Ilha do Fundão - Rio de Janeiro, 21941-915, Brasil;

† Departamento de Engenharia Química e de Petróleo, Escola de Engenharia, Universidade Federal Fluminense,
Rua Passo da Pátria, 156 bloco D sala 307 - São Domingos - Niterói, 24210-240, Brasil

Abstract — Crude oil characterization plays a key role in upstream as well as downstream operations of the petroleum supply chain. It is usually carried out by a batch distillation process known as true boiling point (TBP) distillation, which represents a "footprint" of the crude oil composition profile, since its shape depends on the amount and volatility of the components in a given crude oil. In recent decades, crude oil characterization methods based on continuous distribution models have been proposed as an alternative to the classic (discrete) pseudocomponent approach. This work presents the comparative performance of five continuous distribution models (Beta, Gamma, Riazi, Weibull and Weibull extreme) in characterizing the TBP distillation curve of crude oils. A large TBP database of different types of Brazilian crude oils is used to identify the optimal characterization parameters of these models by a least-squares criterion. The modeling performance of each continuous distribution model was measured using statistical estimators. The Weibull extreme model presented the best performance in terms of the root mean squared error (RMSE) for all crude oils. In general, the uncertainties of the model parameters increase with the crude oil API density, the Gamma model showing the reverse behavior.

Keywords — Continuous distribution functions; Crude oil; Characterization; Parameters confidence regions.


In the crude oil supply chain, different crude oil characterization systems are used for several decision processes across upstream operations (i.e. exploration and production) as well as downstream operations (marketing, transport and refining).

Crude oil assays are among the most widely used characterization systems and are the primary data source for petroleum refineries and crude oil marketers. They are used to determine the potential distillation yields of a crude oil and the physico-chemical properties of the fractions it will produce when distilled. Analyses of process operating conditions and economic evaluations of crude oils are often based on these results.

True boiling point (TBP) distillation, which reports experimental cumulative mass or volume distillation yields as a function of boiling point temperature, is widely employed in crude oil assays. Since a distillation curve is built from a finite set of experimental temperature-yield points, these points must be interpolated, using either cubic spline methods or probability distribution functions (Nelson, 1968; Whitson, 1983), the latter offering a more accurate fit.
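For comparison with the distribution-function approach, the spline alternative can be sketched in a few lines. This is a generic textbook natural cubic spline (second derivatives set to zero at both ends, tridiagonal system solved with the Thomas algorithm), not a procedure taken from this paper; the function name is illustrative.

```python
from bisect import bisect_right

def natural_cubic_spline(xs, ys):
    """Return a callable natural cubic spline through (xs, ys); xs strictly increasing."""
    n = len(xs) - 1
    if n < 1:
        raise ValueError("need at least two points")
    h = [xs[i + 1] - xs[i] for i in range(n)]
    # Second derivatives M[0..n]; natural end conditions M[0] = M[n] = 0.
    m = [0.0] * (n + 1)
    if n > 1:
        # Tridiagonal system for M[1..n-1], solved by the Thomas algorithm.
        sub = [h[i - 1] for i in range(1, n)]
        diag = [2.0 * (h[i - 1] + h[i]) for i in range(1, n)]
        sup = [h[i] for i in range(1, n)]
        rhs = [6.0 * ((ys[i + 1] - ys[i]) / h[i] - (ys[i] - ys[i - 1]) / h[i - 1])
               for i in range(1, n)]
        for k in range(1, n - 1):          # forward elimination
            w = sub[k] / diag[k - 1]
            diag[k] -= w * sup[k - 1]
            rhs[k] -= w * rhs[k - 1]
        sol = [0.0] * (n - 1)
        sol[-1] = rhs[-1] / diag[-1]
        for k in range(n - 3, -1, -1):     # back substitution
            sol[k] = (rhs[k] - sup[k] * sol[k + 1]) / diag[k]
        m[1:n] = sol

    def spline(x):
        # Locate the interval [xs[i], xs[i+1]] containing x (clamped at the ends).
        i = min(max(bisect_right(xs, x) - 1, 0), n - 1)
        hi = h[i]
        a, b = xs[i + 1] - x, x - xs[i]
        return (m[i] * a ** 3 / (6 * hi) + m[i + 1] * b ** 3 / (6 * hi)
                + (ys[i] / hi - m[i] * hi / 6) * a
                + (ys[i + 1] / hi - m[i + 1] * hi / 6) * b)

    return spline
```

A spline passes through every measured point but is not guaranteed to be monotone between them, whereas a fitted cumulative distribution function is monotone by construction.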

Many distribution functions have been used for calculations in the petroleum industry. Behrenbruch and Dedigama (2007) used the Gamma distribution to characterize crude oils and related the parameters of the fitted distribution to API gravity. Sánchez et al. (2007) analyzed 25 probability distribution functions on a total of 137 distillation curves and found the Weibull extreme, Kumaraswamy and Weibull functions to be the best for fitting distillation data. Riazi et al. (2004) present different options for characterizing different types of crude oils, together with appropriate methods for estimating the basic parameters of thermodynamic models.

The main objective of the present study was to compare, for Brazilian crude oil characterization, the fitting performance of five probability distribution functions: Beta, Gamma, Riazi (Riazi et al., 2004), Weibull and Weibull extreme (Sánchez et al., 2007).


A mathematical function called a distribution function (DF) can be used to model the frequency of occurrence of entities or events. If the event is discrete, it can be associated one-to-one with a discrete random variable, and the distribution function maps the frequency of occurrence of each value of the variable; if the event is continuous, it can be associated with a continuous random variable, and integrals of the distribution function give the frequency with which the variable falls within given intervals (Abramowitz and Stegun, 1965). Since boiling points are continuous variables, only continuous distribution functions are analyzed in this study.

The probabilities of a continuous random variable are modeled using continuous distribution functions. In this context, the cumulative distribution function (CDF) can be used to map the amount of material whose boiling point is lower than a given temperature. CDFs generally exhibit the same behavior observed in distillation curves, rising monotonically with temperature.

In the present study, five cumulative distribution functions (Beta, Gamma, Riazi, Weibull and Weibull extreme) were fitted to Brazilian crude oil TBP distillation data; they are respectively given by:


where Γ is the gamma function and A, B, C and D are distribution parameters that must be statistically fitted to experimental data. In this work, we assumed A = 0 in Eq. (5), since there is no reason to assume a lower limit for x.
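To illustrate how a CDF can stand in for a distillation curve, the sketch below evaluates two of the five models in common textbook parameterizations: a two-parameter Weibull CDF, F(x) = 1 - exp[-(x/b)^a], and a Gamma CDF written as the regularized lower incomplete gamma function P(a, x/b). The assignment of a (shape) and b (scale) is an assumption and may not match the roles of A and B in the paper's Eqs. (1) to (5); the function names are hypothetical.

```python
import math

def weibull_cdf(x: float, a: float, b: float) -> float:
    """Two-parameter Weibull CDF with shape a and scale b (assumed parameterization)."""
    if x <= 0.0:
        return 0.0
    return 1.0 - math.exp(-((x / b) ** a))

def gamma_cdf(x: float, a: float, b: float,
              tol: float = 1e-14, max_terms: int = 10_000) -> float:
    """Gamma CDF with shape a and scale b, via the series for the regularized
    lower incomplete gamma function P(a, z), with z = x / b."""
    if x <= 0.0:
        return 0.0
    z = x / b
    term = 1.0 / a          # k = 0 term of sum_k z^k / (a (a+1) ... (a+k))
    total = term
    for k in range(1, max_terms):
        term *= z / (a + k)
        total += term
        if term < tol * total:
            break
    # P(a, z) = z^a exp(-z) / Gamma(a) * sum; lgamma keeps this stable.
    return math.exp(a * math.log(z) - z - math.lgamma(a)) * total
```

Evaluated on an increasing grid of dimensionless temperatures, either function yields a non-decreasing "cumulative yield" between 0 and 1, which is the shape the fitted models are meant to reproduce.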


True boiling point distillation data, consisting of 29 narrow cuts for each of 41 Brazilian crude oils, were obtained from the PETROBRAS proprietary database of crude oil assays. The experimental distillation data were obtained at the PETROBRAS R&D center using standardized methods (e.g. ASTM D 5307 and ASTM D 2892).

As suggested by Sánchez et al. (2007), temperature was changed to a dimensionless form through the equation below:

$$x = \frac{T - T_i}{T_f - T_i} \qquad (6)$$

where x is the dimensionless temperature, T is the actual boiling point and Ti and Tf are initial and final reference temperatures, 111.7 K and 1273 K, respectively.
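A minimal sketch of the temperature scaling, assuming the linear form x = (T - Ti)/(Tf - Ti) with the two reference temperatures quoted above; the helper names are hypothetical.

```python
T_I = 111.7    # initial reference temperature, K (value from the text)
T_F = 1273.0   # final reference temperature, K (value from the text)

def to_dimensionless(t_kelvin: float, t_i: float = T_I, t_f: float = T_F) -> float:
    """Map a boiling point in K to x, assuming x = (T - Ti) / (Tf - Ti)."""
    return (t_kelvin - t_i) / (t_f - t_i)

def to_kelvin(x: float, t_i: float = T_I, t_f: float = T_F) -> float:
    """Inverse mapping, from dimensionless x back to a temperature in K."""
    return t_i + x * (t_f - t_i)
```

For example, a cut point of 643.15 K (370 °C) maps to x ≈ 0.458, so the whole TBP range of interest falls inside [0, 1], where the CDFs are defined.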

The most common way to statistically fit parameters such as those in Eqs. (1) to (5) is the maximum likelihood criterion, which yields consistent and asymptotically unbiased estimators. Since the experimental database contained no information on replicate measurements, the data could not be weighted differently in the regression, and the well-known (unweighted) least-squares method was adopted instead (Montgomery et al., 2006); for independent, normally distributed errors of equal variance, it coincides with maximum likelihood. To that end, an objective function comprising the sum of the squared deviations between each of the five CDF models and the corresponding experimental data was built, and the function from MATLAB® was used to fit all parameters.
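The MATLAB® routine used by the authors is not named in the text, so the sketch below swaps in a plain-Python Levenberg-Marquardt iteration that minimizes the same kind of sum-of-squares objective. It assumes the two-parameter Weibull CDF form used for illustration (shape a, scale b) and a forward-difference Jacobian; all names are hypothetical and the test data are synthetic, not from the PETROBRAS database.

```python
import math

def weibull_cdf(x, a, b):
    """Assumed two-parameter Weibull CDF (shape a, scale b)."""
    return 0.0 if x <= 0.0 else 1.0 - math.exp(-((x / b) ** a))

def sse(params, xs, ys):
    """Objective: sum of squared deviations between model and data."""
    a, b = params
    return sum((y - weibull_cdf(x, a, b)) ** 2 for x, y in zip(xs, ys))

def jacobian(params, xs, h=1e-7):
    """Forward-difference Jacobian of the model with respect to (a, b)."""
    a, b = params
    rows = []
    for x in xs:
        f0 = weibull_cdf(x, a, b)
        rows.append(((weibull_cdf(x, a + h, b) - f0) / h,
                     (weibull_cdf(x, a, b + h) - f0) / h))
    return rows

def fit_least_squares(xs, ys, start, lam=1e-3, max_iter=200, tol=1e-14):
    """Levenberg-Marquardt minimization of sse() for the two parameters."""
    p, cost = list(start), sse(start, xs, ys)
    for _ in range(max_iter):
        J = jacobian(p, xs)
        r = [y - weibull_cdf(x, *p) for x, y in zip(xs, ys)]
        k11 = sum(j[0] * j[0] for j in J)
        k12 = sum(j[0] * j[1] for j in J)
        k22 = sum(j[1] * j[1] for j in J)
        g1 = sum(j[0] * ri for j, ri in zip(J, r))
        g2 = sum(j[1] * ri for j, ri in zip(J, r))
        # Damped normal equations: (J^T J + lam * diag(J^T J)) delta = J^T r
        a11, a22 = k11 * (1.0 + lam), k22 * (1.0 + lam)
        det = a11 * a22 - k12 * k12
        d1 = (g1 * a22 - k12 * g2) / det
        d2 = (a11 * g2 - k12 * g1) / det
        trial = [p[0] + d1, p[1] + d2]
        new_cost = sse(trial, xs, ys) if min(trial) > 0 else math.inf
        if new_cost < cost:
            done = cost - new_cost < tol
            p, cost, lam = trial, new_cost, lam / 10.0   # accept, trust model more
            if done:
                break
        else:
            lam *= 10.0        # reject step, move toward gradient descent
            if lam > 1e12:
                break
    return p, cost
```

Fitting noise-free synthetic data generated with a = 2 and b = 0.4 on 29 points, starting from (1.0, 0.5), should recover both parameters; with real TBP data the residual cost is what feeds the RMSE defined next.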

As an example, Fig. 1 compares experimental and predicted TBP values for one crude oil using all five distribution functions analyzed in the present study, and Fig. 2 presents the corresponding residuals.

Figure 1. Experimental and predicted (Beta, Gamma, Riazi, Weibull and Weibull extreme) mass yield values for a crude oil

Figure 2. Residual values for the Beta, Gamma, Riazi, Weibull and Weibull extreme functions

After the parameters were fitted for each CDF and each crude oil, the root mean square error (RMSE) of each fit was calculated as follows:
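
$$\mathrm{RMSE} = \sqrt{\frac{\sum_{i=1}^{n}\left(y_{exp,i} - y_{cal,i}\right)^{2}}{n - p}} \qquad (7)$$
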
where n is the number of experimental data points, p is the number of distribution function parameters, and yexp and ycal are the experimental and calculated yield values, respectively. Its square is the usual estimator of the model error variance, so the RMSE estimates the standard deviation of the model.
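A minimal sketch of the RMSE calculation, assuming the n - p denominator implied by the definitions of n and p above; the function name is hypothetical.

```python
import math

def rmse(y_exp, y_cal, p):
    """Root mean square error with n - p degrees of freedom."""
    n = len(y_exp)
    if n != len(y_cal):
        raise ValueError("y_exp and y_cal must have the same length")
    if n <= p:
        raise ValueError("need more data points than parameters (n > p)")
    ss = sum((ye - yc) ** 2 for ye, yc in zip(y_exp, y_cal))
    return math.sqrt(ss / (n - p))
```

Dividing by n - p instead of n slightly penalizes models with more parameters, which matters when a four-parameter model is compared with two-parameter ones. For 29 cuts and a two-parameter model, the denominator is 27.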

Confidence regions are important tools for analyzing the reliability of a fit. They depict the region of parameter space (in this work, expressed as percent deviations from the fitted values) in which the true parameter values lie with a fixed level of confidence. Parameter confidence regions at the 95% level were therefore constructed as follows:
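
$$\left(\boldsymbol{\beta} - \tilde{\boldsymbol{\beta}}\right)^{T} \mathbf{J}^{T}\mathbf{J} \left(\boldsymbol{\beta} - \tilde{\boldsymbol{\beta}}\right) \le p\,\mathrm{RMSE}^{2}\,F^{95\%}_{p,\,n-p} \qquad (8)$$
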
where β is the vector of distribution parameters and β̃ its fitted value, J is the Jacobian matrix of the model with respect to the parameters (evaluated at β̃), and F95% is the abscissa that exceeds 95% of the values of a Fisher variable with p and n - p degrees of freedom.

It can be shown that the kernel matrix JᵀJ in Eq. (8) is positive-definite whenever J has full column rank, so the region is an ellipse (an ellipsoid for more than two parameters). Additionally, holding all other parameters at their fitted values, the square root of the right-hand side divided by the corresponding diagonal element of the kernel matrix gives lower and upper limits for each parameter at the expected confidence level.
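The sketch below implements a two-parameter region, assuming the usual linearized form (β - β̃)ᵀ JᵀJ (β - β̃) ≤ p·s²·F with s the RMSE. It returns the per-parameter limits described above (ellipse intercepts along each axis), the wider projections of the ellipse onto each axis for comparison, and a membership test. The function names are hypothetical; the critical F value is passed in rather than computed.

```python
import math

def kernel_2x2(J):
    """K = J^T J for a Jacobian with two columns (one row per data point)."""
    k11 = sum(r[0] * r[0] for r in J)
    k12 = sum(r[0] * r[1] for r in J)
    k22 = sum(r[1] * r[1] for r in J)
    return k11, k12, k22

def confidence_region_2p(J, beta_fit, s, f_crit):
    """Linearized joint confidence region for p = 2 parameters:
    (b - b_fit)^T K (b - b_fit) <= p * s**2 * f_crit, with K = J^T J."""
    k11, k12, k22 = kernel_2x2(J)
    det = k11 * k22 - k12 * k12
    if k11 <= 0 or det <= 0:
        raise ValueError("J^T J is not positive-definite; J lacks full column rank")
    rhs = 2 * s ** 2 * f_crit
    # Limits from rhs / K_ii: ellipse intercepts along each axis,
    # with the other parameter held at its fitted value.
    axis_half = (math.sqrt(rhs / k11), math.sqrt(rhs / k22))
    # Projection of the ellipse onto each axis, using (K^-1)_ii = K_jj / det.
    proj_half = (math.sqrt(rhs * k22 / det), math.sqrt(rhs * k11 / det))

    def contains(beta):
        d1, d2 = beta[0] - beta_fit[0], beta[1] - beta_fit[1]
        return k11 * d1 * d1 + 2 * k12 * d1 * d2 + k22 * d2 * d2 <= rhs

    return axis_half, proj_half, contains
```

With 29 cuts and two parameters, n - p = 27, and standard F tables give F ≈ 3.35 at the 95% level. The axis intercepts are never wider than the projections, and the two coincide only when K12 = 0, so reporting the intercepts alone can understate the uncertainty of correlated parameters. The three-parameter Weibull extreme case (A = 0) needs the same construction with a 3 × 3 kernel.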


Figures 1 and 2 show that all five models fit the experimental data adequately, which makes a qualitative choice among them difficult. Further analyses were therefore conducted, mainly devoted to variance and covariance estimation.

Table 1 shows typical values of the RMSE and the correlation coefficient (R²) for each model.
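The text does not spell out how R² was computed; the sketch below assumes the usual coefficient of determination, 1 - SS_res/SS_tot, and uses a hypothetical function name.

```python
def r_squared(y_exp, y_cal):
    """Coefficient of determination, 1 - SS_res / SS_tot (assumed definition)."""
    if len(y_exp) != len(y_cal) or len(y_exp) < 2:
        raise ValueError("need two equal-length sequences with at least two points")
    mean = sum(y_exp) / len(y_exp)
    ss_res = sum((ye - yc) ** 2 for ye, yc in zip(y_exp, y_cal))
    ss_tot = sum((ye - mean) ** 2 for ye in y_exp)
    return 1.0 - ss_res / ss_tot
```

Because a cumulative yield curve spans a wide range of values, SS_tot is large and R² tends to be close to 1 for every reasonable model, which is one reason the RMSE is the more discriminating statistic here.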

Table 1. Distribution Functions Statistical Parameters

Figure 3 shows the RMSE for crude oils with API values below and above 35. Finally, Figs. 4 and 5 depict the parameter confidence regions for different API densities.

Figure 3. Model RMSE versus crude oil API density (Beta, Gamma, Riazi, Weibull and Weibull extreme); the dash-dot line (-.) represents the reproducibility of the distillation method

Figure 4. Parameters confidence regions for Beta, Gamma, Riazi and Weibull functions

Figure 5. Parameters confidence regions for Weibull extreme function (A = 0)


Table 1 shows that the Weibull extreme function provides the highest R² and the lowest RMSE. Figure 3 indicates that the Beta, Riazi, Weibull and Weibull extreme distribution functions fit the experimental data better for crude oils below 35°API, whereas the Gamma distribution function fits adequately over the whole API density range considered.

Figures 4 and 5 show that, in general, the uncertainties of the model parameters increase with API density, as indicated by the increasing area of the corresponding confidence regions. The Gamma model is an exception.

Results obtained for the set of 41 Brazilian crude oils and 1189 distillation data points show that the Weibull extreme model performs best among the models in terms of both correlation coefficient and root mean squared error, which corroborates the findings of Sánchez et al. (2007). One could attribute this behavior to the fact that it is the only model with four adjustable parameters, whereas all the others have only two. However, the corresponding confidence regions indicate that this is not the reason: no prohibitively large areas were observed and, more importantly, no narrow, elongated ellipses, which would indicate high parameter covariance, were found.
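One way to put a number on "sharp ellipses" is sketched below: for a two-parameter region built from a kernel K = JᵀJ, the ratio of the long to the short semi-axis is the square root of the ratio of K's eigenvalues, and the correlation between the two estimates follows from K⁻¹. The paper states no numeric threshold, so this is an illustrative diagnostic only, with a hypothetical function name.

```python
import math

def ellipse_shape(k11, k12, k22):
    """Shape diagnostics for the region d^T K d <= c, K = [[k11, k12], [k12, k22]]."""
    tr = k11 + k22
    disc = math.sqrt((k11 - k22) ** 2 + 4 * k12 ** 2)
    lam_max, lam_min = (tr + disc) / 2, (tr - disc) / 2
    if lam_min <= 0:
        raise ValueError("kernel matrix is not positive-definite")
    # Semi-axes scale as 1 / sqrt(eigenvalue): long axis / short axis.
    elongation = math.sqrt(lam_max / lam_min)
    # Correlation of the two estimates, from a covariance proportional to K^-1.
    corr = -k12 / math.sqrt(k11 * k22)
    return elongation, corr
```

An elongation near 1 and a correlation near 0 describe a round region; as |corr| approaches 1 the ellipse degenerates into a thin sliver, the high-covariance case the text rules out.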

All models fit better for crude oils with API density below 35°API, except the Gamma model, which performs virtually equally well in both API density ranges.

The preliminary results presented here empirically support the use of continuous distribution models for modeling crude oil TBP profiles. Moreover, as pointed out by Nelson (1968), there is also a theoretical reason to expect this.

Although the true mechanism of petroleum formation is unknown, there is some agreement that it involves several chemical or biochemical reactions. The product of an organic chemical reaction tends to be a particular type or size of molecule. Meanwhile, side reactions (or chance) also form other higher-boiling and lower-boiling products in smaller amounts, and the more these side products differ from the primary product of the reaction, the smaller their amount.

Phenomena of this kind are usually well represented by probability distribution models, but further studies are still needed to compare the statistical performance of this approach with others commonly employed in crude oil TBP profile modeling (e.g. interpolation using cubic splines or Hermite polynomials).


A, B, C and D = Distribution parameters
β = Distribution parameters
J = Jacobian matrix
n = Number of experimental data points
p = Number of distribution function parameters
Γ = Gamma function
T = Actual boiling point
Ti, Tf = Initial and final reference temperatures
x = Dimensionless temperature
yexp, ycal = Experimental and calculated yield values
F95% = Fisher distribution abscissa for confidence region calculations
~ = Fitted parameter

Karla Boaventura and Gilberto Xavier gratefully acknowledge PETROBRAS for the information disclosure permission.

1. Abramowitz, M. and I.A. Stegun, Handbook of Mathematical Functions: with Formulas, Graphs, and Mathematical Tables, Dover Publications (1965).
2. Behrenbruch, P. and T. Dedigama, "Classification and characterisation of crude oils based on distillation properties", J. Pet. Sci. Eng., 57, 166-180 (2007).
3. Montgomery, D.C., G.C. Runger and N.F. Hubele, Engineering Statistics, Wiley (2006).
4. Nelson, W.L., "Does crude boil at 1400°F?", The Oil and Gas J., 67, 125-126 (1968).
5. Riazi, M.R., H.A. Al-Adwani and A. Bishara, "The impact of characterization methods on properties of reservoir fluids and crude oils: options and restrictions", J. Pet. Sci. Eng., 42, 195-207 (2004).
6. Sánchez, S., J. Ancheyta and W.C. McCaffrey, "Comparison of Probability Distribution Functions for Fitting Distillation Curves of Petroleum", Energy & Fuels, 21, 2955-2963 (2007).
7. Whitson, C.H., "Characterizing Hydrocarbon Plus Fractions", Soc. Pet. Eng. J., 23, 683-694 (1983).

Received: December 30, 2009
Accepted: August 17, 2010
Recommended by subject editor: Orlando Alfano