Papers in Physics

Online version ISSN 1852-4249

Pap. Phys. vol. 13, La Plata, Jan. 2021

https://doi.org/10.4279/pip.130001

Articles

A method for continuous-range sequence analysis with Jensen-Shannon divergence

M. A. Ré

G. G. Aguirre Varela

1 Centro de Investigación en Informática para la Ingeniería, Universidad Tecnológica Nacional, Facultad Regional Córdoba, Maestro López esq. Cruz Roja Argentina, (5016) Córdoba, Argentina.

2 GFA - Facultad de Matemática, Astronomía, Física y Computación, Universidad Nacional de Córdoba, Av. Medina Allende s/n, Ciudad Universitaria, (5000) Córdoba, Argentina.

3 Instituto de Física Enrique Gaviola (IFEG), Facultad de Matemática, Astronomía, Física y Computación, Universidad Nacional de Córdoba, Ciudad Universitaria, (5000) Córdoba, Argentina.

Abstract

Mutual Information (MI) is a useful Information Theory tool for the recognition of mutual dependence between data sets. Several methods have been developed for the estimation of MI when both data sets are of the discrete type or when both are of the continuous type. However, MI estimation between a discrete-range data set and a continuous-range data set has not received as much attention. We therefore present here a method for the estimation of MI in this case, based on the kernel density approximation. This calculation may be of interest in diverse contexts. Since MI is closely related to the Jensen-Shannon divergence, the method developed here is of particular interest for the problems of sequence segmentation and set comparison.

Keywords: entropic distance; sequence segmentation; Jensen-Shannon divergence

I Introduction

Mutual Information (MI) is a quantity whose theoretical basis originates in Information Theory [1].

Since the MI between two independent random variables (RVs) is zero, a non-null value of MI between two variables gives a measure of their mutual dependence. When analyzing two data sets X and Y (assumed to be realizations of two mutually dependent RVs), MI can give us a measure of the mutual dependence of these sets. Although MI may be straightforwardly calculated when the underlying probability distributions are known, this is not usually the case when only the data sets are available. Therefore, MI must be estimated from the data sets themselves. When X and Y are of the discrete type, MI may be estimated by substituting the joint probability of these variables by the relative frequency of appearance of each pair (x, y) in the data sequence [2, 3]. For real-valued data sets (or discrete ones with a wide range) estimation of MI by frequency of appearance is not applicable. The binning method [4] in turn requires large bins or long sequences in order to produce reasonable results. Alternative proposals have been made for the case when both data sets are of the continuous type [5].
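For reference, the frequency (plug-in) estimator just mentioned takes the standard form (our notation, with \(\hat p\) denoting relative frequencies in the sequence):

\[
\hat I(X;Y) = \sum_{x,y} \hat p(x,y)\,\log\frac{\hat p(x,y)}{\hat p(x)\,\hat p(y)} .
\]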

Estimation of MI between a discrete RV and a continuous one has not been so extensively considered, in spite of being a problem of interest in diverse situations. For instance, we could compare the day of the week (weekday-weekend, discrete) with traffic flow (continuous), quantifying this effect. In a different context we might wish to quantify the effect of a drug (administered or not, discrete) in medical treatment evaluation (electroencephalograms in epilepsy, continuous data). Ross [6] has proposed a scheme for estimating MI based on the nearest neighbour method [4]. Assuming a sequence of (x, y) pairs, with X being discrete and Y continuous, the nearest neighbour method requires the pairs to be ordered by the Y values. This requirement makes the proposal impractical in sequence analysis, for instance. The nearest neighbour method was also considered by Kraskov et al. [7], who suggest two ways of evaluating MI with it. An alternative definition of MI is presented by Gao et al. [8], also based on the distance between the elements of the sequence. In this paper we propose a more direct method for estimating MI between a discrete and a continuous data set, based on the kernel density approximation (KDA) [4] for estimating the probability density function (PDF) of the continuous variable. For the discrete variable we make use of the usual frequency approximation [2, 3]. Finally, MI is computed by Monte Carlo integration.

As shown by Grosse et al. [2], MI can be identified with the Jensen-Shannon divergence (JSD), a measure of dissimilarity between two probability distributions. JSD is a non-negative functional that equals zero when the distributions being compared are the same. This property makes JSD a useful tool for sequence segmentation [2, 3]. Furthermore, in diverse contexts it is of interest to evaluate whether a given sequence matches a particular probability distribution. The most usual case is that of a normal distribution, but this is a more general problem. For instance, in satellite synthetic aperture radar (SAR) images the backscatter presents a multiplicative noise assumed to have an exponential distribution [9]. Also, models for cloud droplet spectra assume a Weibull distribution [10, 11]. Several indirect methods have been proposed for the analysis of continuous-range sequences. Pereyra et al. [12] outlined a method based on the wavelet transform to analyze electroencephalograms. Recently, Mateos et al. [13] have proposed a mapping from continuous-value sequences into discrete-state sequences prior to the JSD calculation. Several other mapping methods have been proposed in the literature to associate a discrete probability distribution with a real-valued series.

Here, by means of the KDA we avoid resorting to any indirect method, approximating the probability densities of the continuous-range variables by this non-parametric method. In section II we present the calculation of MI and the arrangement for sequence segmentation with the JSD. In section III we test the performance of this method through numerical experiments; we also consider the application of the method to edge detection in a satellite synthetic aperture radar (SAR) image. In section IV we discuss the results obtained.

II Method

In this section we present our proposal for estimating MI between discrete and continuous RVs, based on the KDA estimator of a PDF. Let us consider a sequence of pairs (x, y), with x a variable of discrete range and y of continuous range. To calculate MI we resort only to the sequence itself, making use of no extra information. We start from the sequence of data pairs (x, y) and assume that these data are sampled from a joint probability density ρ(x, y), although unknown at first. From this joint PDF the marginal probabilities are obtained.

Note that if the variables X and Y are statistically independent then ρ(x, y) = p(x) f(y), and in this case I(X; Y) = 0. In this way a value I(X; Y) ≠ 0 gives a measure of the mutual dependence of these variables. We may rewrite I(X; Y) in terms of the conditional PDFs.
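In standard notation, consistent with the description above (the symbols p(x) for the discrete marginal, f(y) for the marginal density of the continuous variable and f(y | x) for the conditional density are ours), this rewriting reads

\[
I(X;Y) = \sum_{x} p(x) \int f(y \mid x)\,\log\frac{f(y \mid x)}{f(y)}\,dy .
\]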

Figure 1: Kernel Density Approximation (KDA) for the probability density in (10), calculated from 1000 pairs generated by the Monte Carlo method. For plot A, ym = 1, while for plot B, ym = 5. In both cases σg = 1. Solid lines correspond to the analytic function and dashed lines to the KDA.

i Kernel density approximation

To carry out the calculation in (4), knowledge of the conditional PDFs is necessary. As mentioned, these densities are assumed to be unknown, and have to be estimated from the data themselves. Here we make use of the KDA [4], as summarized in the following. The conditional PDFs in Eq. (3) are estimated by considering separately each data pair with a given value of x. We define the set
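As a reminder (our notation, not taken from the paper's equations), for the n_x continuous values y_j paired with a given x, the Gaussian-kernel approximation of the conditional density has the standard form, with the bandwidth h chosen e.g. by Silverman's rule [4, 14]:

\[
\hat f(y \mid x) = \frac{1}{n_x h \sqrt{2\pi}} \sum_{j=1}^{n_x} \exp\!\left[ -\frac{(y - y_j)^2}{2 h^2} \right].
\]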

We illustrate the results obtained with the KDA by an example: let us consider the joint probability distribution u(x, y)

We sampled 1000 pairs from this distribution for two different values of ym, and from these pairs we estimated the conditional PDFs using the KDA. In Figs. 1A and 1B we plot the probability functions in (10) and (11) for the two values of ym, together with the corresponding approximations.

Figure 2: The segmentation problem. Consider a sequence S made up of two stationary subsequences S1 and S2, with n1 and n2 elements respectively. The problem consists of determining the value of n1, i.e., the point at which the statistical properties change.

ii Monte Carlo integration

After approximating the PDFs we have to compute the integrals in (4) to estimate MI. We recognize in these integrals the expectation value

Here the sum is again restricted to the yj values associated with a particular x value. Note that in this sum we make use of the kernel approximation of the conditional PDFs in (6). Substituting both approximations we finally get
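As an illustration of the estimator described in this section, the following Python sketch combines the frequency approximation for the discrete variable, a Gaussian KDA for the continuous densities, and the sample-average (Monte Carlo) substitution of the integral. It is a minimal sketch under our own naming choices, using scipy's default bandwidth rather than any particular rule from the text.

```python
import numpy as np
from scipy.stats import gaussian_kde

def mutual_information(x, y):
    """Plug-in MI estimate between a discrete sequence x and a continuous
    sequence y of the same length: frequencies for p(x), Gaussian kernel
    density estimates for f(y|x) and f(y), and a sample average over the
    recorded pairs in place of the integral over y."""
    x = np.asarray(x)
    y = np.asarray(y, dtype=float)
    f_marginal = gaussian_kde(y)              # KDA of the marginal density f(y)
    mi = 0.0
    for value in np.unique(x):
        y_x = y[x == value]                   # continuous values paired with this x
        p_x = y_x.size / y.size               # frequency estimate of p(x)
        f_conditional = gaussian_kde(y_x)     # KDA of the conditional density f(y|x)
        # Monte Carlo step: average the log-ratio over the samples with this x
        mi += p_x * np.mean(np.log(f_conditional(y_x) / f_marginal(y_x)))
    return mi

# Example: the discrete variable selects one of two Gaussian components of y
rng = np.random.default_rng(0)
x = rng.integers(0, 2, size=1000)
y = rng.normal(loc=5.0 * x, scale=1.0)
print(mutual_information(x, y))               # well above the value for shuffled x
```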

iii Sequence segmentation

The JSD is a measure of dissimilarity between probability distributions. Originally proposed by Burbea and Rao [16] and by Lin [17] as a symmetrized version of the Kullback-Leibler divergence [1, 18], a generalized weighted JSD between two PDFs f1, f2 is defined as

with πi arbitrary weights satisfying π1 + π2 = 1. Here H is the Gibbs-Shannon entropy, defined for continuous-range variables as
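In our notation (the defining expressions are standard and consistent with Refs. [16, 17]), the weighted divergence of Eq. (15) and the entropy read

\[
JSD_{\pi_1,\pi_2}(f_1, f_2) = H(\pi_1 f_1 + \pi_2 f_2) - \pi_1 H(f_1) - \pi_2 H(f_2),
\qquad
H(f) = -\int f(y)\,\log f(y)\,dy .
\]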

With these identifications, the functionals in (15) and (4) are the same. The JSD and several of its generalizations have been successfully applied to the sequence segmentation problem (the partition of a non-stationary sequence into stationary subsequences) for discrete-range sequences. We propose here the extension of this method to continuous-range sequences without resorting to discrete mapping, wavelet decomposition or any other indirect method of estimating the probability distribution.

The procedure for sequence segmentation may be stated in the following way: let us consider a sequence S with n elements made of two stationary subsequences S1 and S2, with n1 and n2 values respectively (n1 + n2 = n), schematically illustrated

Figure 3: The sliding window method. A sliding window is defined for sequence segmentation. The window is divided into two subwindows of equal size. The center of the window is considered as the window position. The window is displaced along the sequence and the JSD between the subwindows is calculated. The segmentation point is identified as the window position at which the JSD has its maximum value.

Figure 4: Mutual information estimation for the joint distribution in (10). The dots represent the average MI value for 100 data sets of 1000 (x, y) pairs each, with the bars indicating the standard deviation of each set. The black line is the analytical value of MI: A) as a function of the mean value ym in (10b) (the inset shows the distribution of MI for a particular value of ym, for a dependent and an independent set), and B) as a function of σg, the standard deviation in (10b). The inset shows the same plot in log-log scale to highlight the MI value for independent sets.

in Fig. 2. The aim is to determine the value of n1, i.e., the position of the last element in S1. In the algorithm proposed here we define a sliding window of fixed width over the sequence. The window is divided into two segments, each including half of the elements (see Fig. 3). We define the window position as that of the last element in the left section of the window. This window is displaced over the sequence, and the window position where the JSD reaches its maximum value is taken as the segmentation point.
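A sketch of this sliding-window procedure in Python, reusing numpy and the mutual_information function sketched above: since the JSD between the two subwindows, weighted by their sizes, coincides with the MI between the values and a binary label marking the subwindow (the identification discussed in this section), the same estimator can be applied directly. The function name and the default width are ours.

```python
def segmentation_point(sequence, window_width=50):
    """Slide a window over the sequence, split it into two equal halves,
    and compute the JSD between the halves as the MI between the values
    and a half-label; return the window position of the maximum JSD as
    the estimated segmentation point."""
    seq = np.asarray(sequence, dtype=float)
    half = window_width // 2
    labels = np.repeat([0, 1], half)          # discrete label: left or right half
    positions, jsd = [], []
    for start in range(len(seq) - window_width + 1):
        window = seq[start:start + window_width]
        positions.append(start + half - 1)    # position = last element of the left half
        jsd.append(mutual_information(labels, window))
    best = int(np.argmax(jsd))
    return positions[best], positions, jsd
```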

III Assessment results

In this section we present the results of our assessment of the proposed method by considering two applications: the detection of mutual dependence between two RV sequences and the segmentation of a sequence.

In the first case we generate sequences of two jointly distributed variables, one of discrete range and one of continuous range, and then we compute the MI between these variables. In the second case we consider sequences made of two subsequences generated from different distributions, and we detect the segmentation point following the procedure described in the previous section. We also apply the method to detect the edges between homogeneous regions in SAR images.

i Mutual information between a discrete and a continuous variable

We computed the MI between discrete and continuous variables. We generated 100 data sets, sampling 1000 (x, y) pairs from the distribution in (10) with different values of ym or σg, and from the joint distribution

with different values of ym or a. We estimated the MI, I(X; Y), from each set by the method described in the previous section. Given that we are sampling the data pairs from known distributions, we are also able to calculate MI from the analytical expressions. In this way we may compare the results obtained from the approximation with the corresponding analytical results.

Figure 5: Mutual information estimation for the joint distribution in (19). For the distribution in (19) the dots represent the average MI value for 100 data sets of 1000 (x; y) pairs each, with the bars indicating the standard deviation. The black line is the analytical value of MI while the dots represent the Kernel Density Approximation (KDA) values; A) as a function of mean value ym in (19b), and B) changing the width parameter a in (19b). 

In addition, we calculated the MI for samples of statistically independent variables to establish a significance value for the MI of the dependent variables. The analytical value in this case is zero, as already mentioned. The results of the calculation are shown in Fig. 4 for the distribution in (10) and in Fig. 5 for the distribution in (19). We include the average value of MI over the 100 data sets for the different values of the parameters; the bars correspond to the standard deviation in each set. A small underestimation of the MI value can be seen in the latter case. This may be attributed to a shortcoming of the KDA at the borders of the interval of the uniform distribution. Nevertheless, it is still possible to detect mutual dependence between the discrete- and the continuous-value sequences.

To consider the effect of sample size, we repeated the experiment with the distribution in (10) for different values of n, the number of data pairs in each set. We again generated 100 data sets of n data pairs each. The results are shown in Fig. 6 for three sets of parameters. A slightly increasing overestimation of MI can be appreciated as n decreases. Finally, we considered the usual situation in which there is only one sample of data pairs available. We sampled 1000 pairs from the distribution in (10), from the distribution in (19) and from the distribution

Figure 6: Mutual information estimation for the distribution in (10). The dots represent the average value for 100 data sets of different numbers of (x, y) pairs. Bars indicate the standard deviation, and dashed lines represent the analytical values of MI for the different sets of parameters.

Figure 7: Segmentation point in artificial sequences. The JSD average computed for 500 sequences generated from Rayleigh distributions. Each sequence has a length of 500 elements, divided into two subsequences with 250 elements each. The ratio of the mean values of the subsequences is given by rm = ml/mr = 5, where ml and mr are the mean values in the left and right subsequences, respectively. The sequences are analyzed with different window widths (ww). In all cases the window position (wp) of the maximum JSD average is at the segmentation point.

Figure 8: Segmentation point in artificial sequences. The JSD average computed for 200 sequences generated from Rayleigh distributions. Each sequence has a length of 500 elements, divided into two subsequences with 250 elements each. Different values of the mean quotient rm = mr/ml are considered, where mr is the mean value of the right subsequence and ml the mean value of the left subsequence. In all cases a window width of 50 elements was used. Even for the lowest quotient value the window position (wp) of the maximum JSD average is coincident with the segmentation point.

for each sample we generated 100 data sets of 1000 pairs of independent variables. The discrete values were sampled from the distribution

ii Sequence segmentation

To test the sequence segmentation method, we generated sets of 500 sequences of 500 values each, divided into two subsequences with 250 values in each one. The sequences were generated from Rayleigh distributions with a different mean value for each subsequence. The mean value of the first subsequence is denoted by ml and the mean value of the second by mr; we define the ratio of the mean values as rm = ml/mr. Using the sliding window method, we analyzed a set with rm = 5 with several window widths. In Fig. 7 we present the average value across the 500 sequences of the JSD at each window position for the different widths considered. The average JSD has a maximum at position 250, the segmentation point, even for a narrow window of 20 elements (10 elements in each subwindow), although in this case statistical fluctuations are more noticeable. To test the sensitivity of the method we generated sets with rm = 1.2, 1.5, 2, 5 and 10. The results of the algorithm, with a window of 50 elements, are included in Fig. 8. Even for the smallest ratio considered, the segmentation point can be detected.

Finally, we present an example of application of the segmentation algorithm to detect the edge between homogeneous regions in a SAR image. In SAR images the backscatter is affected by speckle noise (a multiplicative noise). This noise in the backscatter amplitude is modelled by a Rayleigh distribution in homogeneous regions. In Fig. 9 we include a section of a SAR image of an Antarctic region, and the boundary detected between water and ice. On the right, a plot of the backscatter amplitude values of the highlighted lines in the image and of the JSD is included. There is good coincidence of the detected boundary with the contour in the image.
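A usage sketch of the artificial-sequence test described above, continuing the Python snippets from section II; the Rayleigh scale values are our own illustrative choice to realize rm = 5 (the Rayleigh mean is proportional to its scale parameter), and the SAR application would apply the same procedure line by line.

```python
# One artificial sequence with two Rayleigh subsequences of 250 elements
# each and mean ratio rm = ml/mr = 5 (scales chosen for illustration).
rng = np.random.default_rng(1)
left = rng.rayleigh(scale=5.0, size=250)
right = rng.rayleigh(scale=1.0, size=250)
sequence = np.concatenate([left, right])

point, positions, jsd = segmentation_point(sequence, window_width=50)
print(point)   # expected near index 249, the last element of the first subsequence
```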

Table 1: Mutual information and significance value. MI of the sampled dependent sequences (see text) and the corresponding significance values computed from the independent sets. 

IV Discussion and conclusions

In this paper we have presented a method for computing Mutual Information (MI) between discrete and continuous data sets, or alternatively, the JSD between continuous range data sets. The algorithm developed gives a measure of dissimilarity without resorting to an indirect method like those proposed in [12, 13]. Neither is it necessary to have the continuous values ordered as in the nearest neighbour method [4, 6]. In fact, the calculation in (14) is based only on the registered data as they were recorded.

The measure may be applied to two related problems. On the one hand we can quantify the mutual dependence between discrete and continuous data sets, and on the other hand we can quantify the dissimilarity between two continuous data sets, as discussed in section II. In section III we applied the method to artificially generated pairs of variables, finding good agreement with the corresponding analytical values, as shown in Figs. 4 and 5, although a systematic underestimation occurs mainly when the difference is given by the width of the uniform distributions (Fig. 5B). We attribute this discrepancy to the abrupt decay of the uniform distribution at the borders of the interval, while the KDA with a Gaussian kernel extends to infinity. The MI values in these cases of mutually dependent variables are clearly distinguishable from the MI values of independent variables. We also considered the dependence of the results of this method on the length of the sequence. In Fig. 6 a slightly increasing overestimation of MI is seen with decreasing length. Nevertheless, there is good agreement for sequences of more than 400 pairs.

In real situations we frequently have only one sequence of (X, Y) pairs. We have proposed a method for establishing a significance value by generating 100 sequences of independent variables, with probability distributions given by the estimated marginal distribution for the discrete variable and by a Gaussian distribution for the continuous variable with the same mean value and variance as the marginal distribution of the original sequence. We have considered sequences generated from three distributions. In all three cases MI establishes a clear difference between dependent and independent sets, as shown in Table 1.
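A sketch of this significance procedure, again reusing the mutual_information function from section II; the function name and return convention are ours.

```python
def mi_significance(x, y, n_surrogates=100, rng=None):
    """MI of the observed (x, y) sequence together with the MI values of
    surrogate sequences that are independent by construction: x resampled
    from its observed frequencies, y drawn from a Gaussian with the sample
    mean and standard deviation of the original continuous values."""
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x)
    y = np.asarray(y, dtype=float)
    values, counts = np.unique(x, return_counts=True)
    surrogate_mi = []
    for _ in range(n_surrogates):
        x_s = rng.choice(values, size=x.size, p=counts / x.size)
        y_s = rng.normal(loc=y.mean(), scale=y.std(), size=y.size)
        surrogate_mi.append(mutual_information(x_s, y_s))
    return mutual_information(x, y), np.asarray(surrogate_mi)
```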

It has been shown that the Jensen-Shannon Divergence (JSD) is equivalent to MI [2]. Therefore, the calculation method developed here is also suitable for computing the JSD between two continuous-range data sets, and in this form the JSD may be applied to the sequence segmentation problem, as proposed in section II-iii. In that section we suggested a method based on a fixed-length sliding window. We considered the segmentation of artificially generated sequences in section III-ii. The JSD average at each position in the sequences exhibits a maximum at the segmentation point, as shown in Fig. 7. As we continue this work we will address the problem of comparing and analyzing electrophysiological signals. The segmentation method may also be of interest for detecting borders in images. Work along these lines will be published elsewhere.

Figure 9: Border detection in SAR images. The segmentation method was applied to detection of the border between homogeneous regions in a SAR image. The image was analyzed line by line and the segmentation point at each line detected. The segmentation points are coincident with the border. 

Acknowledgements

We wish to acknowledge partial support from SCyT - UTN through grant UTI4811 and from SeCyT - UNC through grant 30720150100199CB.

References

1 T Cover, J Thomas, Elements of Information Theory, J. Wiley, New York (2006).

2 I Grosse, P Bernaola-Galván, P Carpena, R Román-Roldán, J Oliver, H E Stanley, Analysis of symbolic sequences using the Jensen-Shannon divergence, Phys. Rev. E 65, 041905 (2002).

3 M A Ré, R K Azad, Generalization of entropy based divergence measures for symbolic sequence analysis, PLoS ONE 9, e93532 (2014).

4 B W Silverman, Density estimation for statistics and data analysis, Chapman and Hall, London (1986).

5 R Steuer, J Kurths, C O Daub, J Weise, J Selbig, The mutual information: Detecting and evaluating dependencies between variables, Bioinformatics 18, S231 (2002).

6 B C Ross, Mutual Information between discrete and continuous data sets, PLoS ONE 9, e87357 (2014).

7 A Kraskov, H Stögbauer, P Grassberger, Estimating mutual information, Phys. Rev. E 69, 066138 (2004).

8 W Gao, S Kannan, S Oh, P Viswanath, Estimating mutual information for discrete-continuous mixtures, 31st Conference on Neural Information Processing Systems (NIPS), 5986 (2017).

9 A Moreira, P Prats-Iraola, M Younis, G Krieger, I Hajnsek, K P Papathanassiou, A tutorial on Synthetic Aperture Radar, IEEE Geosci. Remote S. Magazine 1, 6 (2013).

10 Y Liu, J Hallett, On size distributions of cloud droplets growing by condensation: a new conceptual model, J. Atmos. Sci. 55, 527 (1998).

11 Y Liu, P H Daum, J Hallett, A generalized systems theory for the effect of varying fluctuations on cloud droplet size distributions, J. Atmos. Sci. 59, 2279 (2002).

12 M E Pereyra, P W Lamberti, O A Rosso, Wavelet Jensen-Shannon divergence as a tool for studying the dynamics of frequency band components in EEG epileptic seizures, Phys. A 379, 122 (2007).

13 D M Mateos, L E Riveaud, P W Lamberti, Detecting dynamical changes in time series by using Jensen Shannon divergence, Chaos 27, 083118 (2017).

14 S J Sheather, Density estimation, Stat. Sci. 19, 588 (2004).

15 A Papoulis, Probability, random variables and stochastic processes, McGraw-Hill, New York (1991).

16 J Burbea, C R Rao, On the convexity of some divergence measures based on entropy functions, IEEE T. Inform. Theory 28, 489 (1982).

17 J Lin, Divergence measures based on the Shannon entropy, IEEE T. Inform. Theory 37, 145 (1991).

18 S Kullback, R A Leibler, On information and sufficiency, Ann. Math. Stat. 22, 79 (1951).

Received: September 23, 2020; Accepted: January 11, 2021

This is an open-access article distributed under the terms of the Creative Commons Attribution License.