^{1}C.D. Technologies Ltd., Israel

^{2}Department of Science, Technology and Society, Bar Ilan University, Ramat Gan, Israel

The heterogeneity of parameters is a ubiquitous biological phenomenon, with critical implications for the functioning of biological systems in normal and diseased states. We developed a method to estimate the level of heterogeneity of a set of objects with reference to particular parameters and applied it to type II diabetes and heart disease, as examples of age-related systemic dysfunctions. The Friedman test was used to establish the existence of heterogeneity. The Newman-Keuls multiple comparison method was used to determine clusters. The normalized Shannon entropy was used to provide the quantitative evaluation of heterogeneity. We obtained an estimate of the heterogeneity of the diagnostic parameters in healthy subjects, as well as in heart disease and type II diabetes patients, which was strongly related to their age. With aging, as with the diseases, the level of heterogeneity (entropy) was reduced, indicating a formal analogy between these phenomena. The similarity of the patterns in aging and disease suggested a kind of “early aging” of the diseased subjects, or alternatively a “disease-like” aging process, with reference to these particular parameters. The proposed method and its validation on the chronic age-related disease samples may support a way toward a formal mathematical relation between aging and chronic diseases and a formal definition of aging and disease, as determined by particular heterogeneity (entropy) changes.

An estimate of heterogeneity (contrariness or dissimilarity) of parameters is important for understanding any biological system, and for research of chronic age-related diseases in particular [1–6]. For example, high parameter heterogeneity may completely obscure and confound diagnostic results, while low heterogeneity may reflect a selection bias. Yet, a major difficulty in estimating heterogeneity is that clinical parameters have non-Gaussian distributions. Hence, we propose using the Friedman test, a rank-based test that does not assume a Gaussian distribution, to determine the existence or lack of heterogeneity of the parameters. To separate clusters of objects described by specific clinical parameters, we use the Newman-Keuls method for multiple comparisons. This method has been successfully used for the analysis of biomedical and financial data [7–11]. To evaluate the clustering, i.e. to estimate the degree of partition of the sample into dissimilar groups (the level of heterogeneity), we use the normalized Shannon entropy. This is a dimensionless measure that permits the concurrent comparison of heterogeneity for a wide variety of different parameters, and in a variety of chronic age-related diseases. Some important applications of the normalized Shannon entropy in medical research have been made earlier [12,13]. Using a similar dimensionless value, i.e. the normalized mutual information, interesting results have been obtained in medicine [14], and in particular in oncology [15–17]. The approach proposed in [15] is presented in the monograph [18].

Methods for estimating heterogeneity that assume a Gaussian distribution of input parameters are shown in [19–22]. The present article presents an algorithm for the estimation of heterogeneity that places no restrictions on the distribution of input parameters and permits comparison of the heterogeneity of various objects described by specific parameters. In the present work, this algorithm was applied to investigate the heterogeneity of functional, metabolic and biometric characteristics of healthy subjects and of diabetes and heart disease patients, as examples of chronic age-related diseases. Thereby an attempt was made to relate parameter heterogeneity to disease status and age. Such an estimation is needed insofar as not just particular, averaged and static parameter values, but also the degree of their heterogeneity varies depending on the disease status and the aging process. The estimation of the parameter heterogeneity, essentially reflecting the degree of the system entropy and variability, may thus shed new light on the underlying processes of aging and disease, providing a unified formal framework for their description. The application of this algorithm may consequently help a diagnosing physician to better evaluate the complexity of the system under examination and hence to better appreciate the complexity of diagnostic results and their relation to age.

The proposed method for heterogeneity assessment was validated on datasets for two age-related non-communicable diseases: diabetes and heart disease.

For diabetes, the Pima Indians Diabetes Database of the Johns Hopkins University was analyzed [23]. The representative sample used comprised 161 instances, including healthy subjects and diabetes patients. All the subjects were women at least 21 years old of Pima Indian origin, from Arizona, US. All the cases of diabetes that were included in the database were Type 2 diabetes [24]. The data set included 8 parameters: 1. Number of times pregnant; 2. Plasma glucose concentration at 2 hours in an oral glucose tolerance test; 3. Diastolic blood pressure (mm Hg); 4. Triceps skin fold thickness (mm); 5. 2-hour serum insulin (mu U/ml); 6. Body mass index (weight in kg/(height in m)^2); 7. Diabetes pedigree function; 8. Age (years). The current model included the parameters: age, plasma glucose concentration, body mass index, 2 hour serum insulin, and diastolic blood pressure.

For heart disease assessment, we used the Heart Disease Data Set of the V.A. Medical Center, Long Beach and Cleveland Clinic Foundation [25]. The representative sample used comprised 85 instances, including healthy subjects and heart disease patients, aged 34–77. The data set included the following 14 attributes: 1. Age (years); 2. Sex; 3. Chest pain type; 4. Resting blood pressure (in mm Hg on admission to the hospital); 5. Serum cholesterol in mg/dl; 6. Fasting blood sugar (discretized above and below 120 mg/dl); 7. Resting electrocardiographic results (discretized); 8. Maximum heart rate achieved; 9. Exercise induced angina; 10. ST depression induced by exercise relative to rest; 11. The slope of the peak exercise ST segment; 12. Number of major vessels colored by fluoroscopy (discretized); 13. Thallium heart scan (normal, fixed defect, reversible defect); 14. The predicted attribute: diagnosis of heart disease (angiographic disease status) with value 0 for diameter narrowing < 50% and value 1 for diameter narrowing > 50%, in any major vessel. The current model included the continuous parameters: age, resting blood pressure, serum cholesterol, and maximum heart rate achieved.

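Let us consider the initial data on $n$ objects as an $m \times n$ array $[a_{ij}]$, where each column $j$ represents one object and each of the $m$ rows represents one parameter, so that $a_{ij}$ is the value of parameter $i$ for object $j$.

The algorithm consists of three procedures: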

1) Determining the heterogeneity using the Friedman test;

2) Separating clusters using the Newman-Keuls multiple comparison method;

3) Estimating the heterogeneity of the set using the normalized Shannon entropy.

For each row of the array $[a_{ij}]$ we rank its elements and assign rank 1 to the smallest element of the row. We obtain the $m \times n$ array of ranks $[r_{ij}]$, where each row contains the ranks from 1 to $n$. We then apply the Friedman test to the array $[r_{ij}]$ [26]. If the Friedman test demonstrates the existence of the objects’ heterogeneity, we proceed to the second procedure.

We then represent each object $j$ by the sum of ranks of the corresponding column $j$ of the array $[r_{ij}]$. We compare the objects, represented by their sums of ranks, using the Newman-Keuls method [7] and obtain the clustering of the set of objects.
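The ranking step and the Friedman statistic of the first procedure can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the standard large-sample form $\chi^2 = \frac{12}{mn(n+1)}\sum_j R_j^2 - 3m(n+1)$ without a tie correction, averages the ranks of tied values, and uses hypothetical names (`rank_row`, `friedman_chi2`) and invented toy data.

```python
def rank_row(values):
    """Rank one row: 1 for the smallest value, average ranks for ties."""
    order = sorted(range(len(values)), key=lambda j: values[j])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the run while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        avg_rank = (pos + end) / 2 + 1  # mean of the 1-based positions pos+1..end+1
        for t in range(pos, end + 1):
            ranks[order[t]] = avg_rank
        pos = end + 1
    return ranks


def friedman_chi2(a):
    """Return (rank_sums, chi2) for an m x n array: rows = parameters, columns = objects."""
    m, n = len(a), len(a[0])
    r = [rank_row(row) for row in a]
    # R_j: sum of the ranks in column j (one value per object).
    rank_sums = [sum(r[i][j] for i in range(m)) for j in range(n)]
    # Assumed large-sample Friedman statistic, no tie correction.
    chi2 = 12.0 / (m * n * (n + 1)) * sum(R * R for R in rank_sums) - 3.0 * m * (n + 1)
    return rank_sums, chi2


# Hypothetical toy data (not taken from the paper): 3 parameters x 4 objects.
toy = [
    [10, 20, 30, 40],
    [1, 2, 3, 4],
    [5, 6, 7, 8],
]
sums, chi2 = friedman_chi2(toy)
print(sums, chi2)
```

In this toy case every parameter orders the objects identically, so the statistic reaches its maximum $m(n-1)$; the resulting value would then be compared with the upper 5% point of the $\chi^2$ distribution with $n-1$ degrees of freedom.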

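Let $X_1, X_2, \ldots, X_k$ ($1 \le k \le n$) be the clustering of the set of objects and $|X_l|$ the number of elements in the set $X_l$. We estimate the heterogeneity of the set of objects (the heterogeneity of parameters describing these objects) using the normalized Shannon entropy [27,28]:

$$S = -\frac{1}{\ln n} \sum_{l=1}^{k} \frac{|X_l|}{n} \ln \frac{|X_l|}{n}$$

Properties of the normalized Shannon entropy:

1) $0 \le S \le 1$;

2) $S = 0$ if and only if $k = 1$, i.e. all objects belong to a single cluster;

3) $S = 1$ if and only if $k = n$, i.e. every object forms its own cluster.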

If the Friedman test does not reject the hypothesis of the absence of row (column) effects, i.e. shows the absence of heterogeneity, then we assume S=0.
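The third procedure reduces to a short function of the cluster sizes. The sketch below assumes that the entropy is normalized by $\ln n$, where $n$ is the total number of objects (the ratio does not depend on the logarithm base); the function name `normalized_shannon_entropy` is illustrative.

```python
import math


def normalized_shannon_entropy(cluster_sizes):
    """Normalized Shannon entropy of a clustering, scaled by ln(n) so that 0 <= S <= 1."""
    n = sum(cluster_sizes)
    if n <= 1 or len(cluster_sizes) <= 1:
        return 0.0  # a single cluster (or a single object) carries no heterogeneity
    # Shannon entropy of the cluster-size distribution |X_l| / n.
    h = -sum((x / n) * math.log(x / n) for x in cluster_sizes if x > 0)
    return h / math.log(n)


# Four objects split into two equal clusters: ln(2) / ln(4).
print(normalized_shannon_entropy([2, 2]))
```

When the Friedman test shows no heterogeneity, the function is simply not called and $S$ is set to 0, which coincides with its value for a single cluster.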

We demonstrated the applicability of the proposed method for heterogeneity assessment in two age-related diseases: diabetes and heart disease, showing a common pattern of entropy change in aging and different aging-related diseases.

First, to illustrate the proposed approach, we use partial data from the diabetes data set concerning 10 healthy individuals aged 21–25 years. Table 1 shows the initial results for particular parameter values for the 10 subjects, namely: plasma glucose concentration, 2 hour serum insulin, diastolic blood pressure, and body mass index. Usually, high values of these parameters are considered as risk factors for diabetes.

Next, we perform consecutively the three procedures of the heterogeneity estimation algorithm:

1. *We rank the rows in Table 1* and obtain the array $[r_{ij}]$. The results are shown in Table 2.

The last row of Table 2 is the sum of ranks of the corresponding columns. Thus, Subject 10 had some of the smallest parameter values (the “best” values, or the least associated with diabetes risk): the plasma glucose concentration was 71 (the smallest value of this parameter in this set, giving it rank 1), 2 hour serum insulin was 76 (rank 4 for insulin), diastolic blood pressure was 48 (rank 1) and the body mass index was 20.4 (rank 3). The overall sum of ranks for this subject was 9 (the smallest sum of ranks for this set, indicating some of the best values in terms of diabetes risk). In contrast, Subject 1 had some of the largest (“worst”) parameter values: plasma glucose concentration was 129 (rank 8 for glucose), 2 hour serum insulin was 270 (the largest value for insulin, giving it rank 10), diastolic blood pressure was 86 (rank 10), and the body mass index was 35.1 (rank 9). The total sum of ranks for this subject was 37 (the highest sum of ranks, indicating some of the worst overall risk factor values).

Let us consider Table 2 as the Friedman statistical model [26], and examine the column effect of this table.

Hypotheses:

H_{0}: There is no column effect (“null hypothesis”).

H_{1}: The null hypothesis is invalid.

*Critical range*. The sample is “large”, so the $\chi^2$ approximation is used for the Friedman test, and the critical range is the upper 5% range of the $\chi^2$ criterion.

Let us calculate the $\chi^2$ criterion [26]. We obtain the value $\chi^2 = 18.34$. With $n - 1 = 9$ degrees of freedom, the critical range is $\chi^2 > 16.92$. The obtained value falls within the critical range, so H_{0} is rejected and we proceed to the second procedure.

2. *For multiple comparisons, we use the Newman-Keuls test* [7].

For $\alpha_T = 0.05$ ($\alpha_T$ is the probability of erroneously identifying a difference at least once), we obtain the critical range for the comparison interval 2 equal to 3.92, i.e. $|R_j - R_{j+1}| > 3.92$, where $R_j$ and $R_{j+1}$ are the elements of the column “Sum of ranks” in the $j$-th and $(j+1)$-th rows of Table 3, respectively (in other words, $R_j$ and $R_{j+1}$ are the subjects represented by sums of ranks of the corresponding parameter values).

“By the multiple comparisons, we construct the clustering shown in Table 3. The obtained clustering possesses the following properties: a) for two neighboring clusters of Table 3, the smallest element of one cluster and the greatest element of the adjacent cluster are significantly different ($\alpha_T = 0.05$); b) elements belonging to the same cluster do not differ significantly from each other ($\alpha_T = 0.05$).”

Thus, in the sample of 10 subjects under consideration, 3 clusters are established: 1 subject (Subject 1) is found in the first cluster (with the top sum of ranks for the risk factors), the second cluster (with the intermediate sum of ranks for the risk factors) contains 6 subjects, and the third cluster (with the bottom sum of ranks) contains 3 subjects.
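The critical range of 3.92 can be reproduced under one assumption that the text does not state explicitly: that the Newman-Keuls criterion for rank sums has the form $q\sqrt{m\,l(l+1)/12}$, with $m = 4$ parameters, comparison interval $l = 2$, and $q$ taken at infinite degrees of freedom (for two groups, $q = \sqrt{2}\,z_{1-\alpha/2}$). The sketch below also shows a deliberately simplified clustering that only splits the sorted rank sums at adjacent gaps; the full procedure uses interval-dependent critical ranges. The rank sums in the example are invented for illustration and are not the values of Table 2, apart from the reported extremes 37 and 9.

```python
import math
from statistics import NormalDist


def critical_range_interval2(m, alpha=0.05):
    """Critical difference of rank sums for comparison interval l = 2 with m rows (parameters)."""
    l = 2
    # For two groups and infinite degrees of freedom, q = sqrt(2) * z_(1 - alpha/2).
    q = math.sqrt(2) * NormalDist().inv_cdf(1 - alpha / 2)
    # Assumed form of the rank-sum criterion: q * sqrt(m * l * (l + 1) / 12).
    return q * math.sqrt(m * l * (l + 1) / 12)


def split_at_adjacent_gaps(rank_sums, threshold):
    """Sort rank sums in descending order and start a new cluster at every gap > threshold."""
    ordered = sorted(rank_sums, reverse=True)
    clusters = [[ordered[0]]]
    for prev, cur in zip(ordered, ordered[1:]):
        if prev - cur > threshold:
            clusters.append([cur])
        else:
            clusters[-1].append(cur)
    return clusters


threshold = critical_range_interval2(m=4)
print(round(threshold, 2))  # 3.92

# Hypothetical rank sums, for illustration only (not the contents of Table 2).
example = [37, 29, 27, 26, 24, 23, 21, 13, 11, 9]
print(split_at_adjacent_gaps(example, threshold))
```

With these invented rank sums the split yields clusters of 1, 6 and 3 elements, the same shape as the clustering reported for the 10 subjects.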

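3. According to Table 3, the numbers of elements in each cluster are: $|X_1| = 1$, $|X_2| = 6$, $|X_3| = 3$.

Now we can calculate the normalized Shannon entropy ($S$), which provides the quantitative measure of heterogeneity for this set:

$$S = -\frac{1}{\ln 10}\left(\frac{1}{10}\ln\frac{1}{10} + \frac{6}{10}\ln\frac{6}{10} + \frac{3}{10}\ln\frac{3}{10}\right) \approx 0.390$$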

Next we consider the entire set consisting of 53 healthy women, from the diabetes dataset, aged 21–25, where each woman is represented by 4 parameters, as described in the Materials and Methods section. After performing the procedures of the algorithm, we obtain the following clustering (see Table 4): the set, consisting of 53 elements (women), was partitioned into 14 clusters. The largest number of elements in a single cluster was 20, and 6 clusters contained only 1 element each. The normalized Shannon entropy of the clustering equals 0.527, which demonstrates high heterogeneity.

Next we analyze the slightly older group of 22 healthy women, aged 26–29. The results are shown in Table 5. Nine clusters were found, where the largest number of elements in a single cluster was 9, and 5 clusters contained 1 element each. The normalized Shannon entropy of the clustering equals 0.592, indicating high heterogeneity, about the same or just slightly higher than in the former group.

Then we consider the set consisting of 41 healthy women aged 30–39. After repeating the procedures of the algorithm, we obtain the following clustering (see Table 6): the set, consisting of 41 elements (women), was partitioned into 11 clusters. The largest number of elements in a single cluster was 9, while 2 clusters had only 1 element each. The normalized Shannon entropy of the clustering equals 0.583, which demonstrates high heterogeneity and is about the same as in the former group.

Thereafter the following sets were considered: 18 healthy subjects aged 40–49, 24 diabetes patients aged 21–25, 18 patients aged 26–29, 34 patients aged 30–39, and 26 patients aged 40–49 years. For each of these 5 sets, the Friedman test showed no heterogeneity, and therefore S=0.

We further validated the proposed method on another major age-related disease: heart disease. We considered four groups of subjects who were characterized by the parameters commonly used to diagnose heart disease: resting blood pressure, serum cholesterol, and maximum heart rate achieved. The four groups were: 17 healthy subjects aged 34–49; 21 healthy subjects aged 50–74; 24 heart patients aged 35–49; and 23 heart patients aged 50–77. After applying the algorithm to the set of healthy subjects aged less than 50 years, we obtained a partition of the set into 3 clusters, containing 11, 5 and 1 elements (see Table 7). The entropy value was 0.285, showing a considerable degree of heterogeneity in the younger healthy individuals. For the three other sets (healthy individuals over 50, and heart disease patients both under and over 50), the Friedman test showed no heterogeneity, and the entropy value was S=0.
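As a check on the reported entropy, and assuming that the normalization is by $\ln n$ with $n = 17$ objects, the cluster sizes 11, 5 and 1 give:

```latex
S = -\frac{1}{\ln 17}\left(
      \frac{11}{17}\ln\frac{11}{17}
    + \frac{5}{17}\ln\frac{5}{17}
    + \frac{1}{17}\ln\frac{1}{17}
\right)
\approx \frac{0.808}{2.833} \approx 0.285
```

The result agrees with the value stated for the healthy subjects under 50, which supports reading the normalizing factor as the logarithm of the number of objects rather than of the number of clusters.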

Heterogeneity is an intrinsic phenomenon of any biological system or organism, either unicellular or multicellular. Innumerable causes can contribute to heterogeneity, as biological systems are diverse on every level of organization and in every imaginable parameter. Thus, even within a seemingly homogeneous cell population, variability can be great, deriving from different cell stages, different dynamic reactions to exposures, different genetic expressions, different activities of sub-cellular elements, etc. [15,29]. Within particular organs and tissues, there are different cell compositions, as well as differences in the supply of nutrients to the tissue in different locations (the tissue center and periphery), and a great many other factors. Stemming from the heterogeneity at the lower levels of organization, the heterogeneity of the higher levels, such as individuals in a population, is also high. Thus, even within an apparently genetically homogeneous population, great heterogeneity can be found with reference to any physiological and biometric parameter.

Disregarding this heterogeneity, or failing to appreciate its value, can lead to simplistic, biased and misleading results in diagnosis. Hence, the ability to estimate precisely the level of heterogeneity of biological and physiological parameters is of great practical importance for diagnostics and clinical practice. Yet, often the discussion of heterogeneity is purely qualitative and impressionistic, without any exact quantitative estimate. Here we provide a new convenient method to estimate precisely the value of heterogeneity, based on personal biological and physiological evaluation, applicable to any set of biological and physiological parameters, at any level of organization. The application of such a method can assist biomedical researchers and diagnosing physicians in assessing the complexity and ambiguity of the model systems they have to confront. Its application in the study of aging and aging-related diseases can be of special significance, as it could provide a formal measure precisely and rigorously relating aging and disease in terms of system heterogeneity.

Here the normalized Shannon entropy was chosen as the measure of heterogeneity. As the results for the diabetes dataset show, considerable differences were found between diseased and healthy subjects, as well as between older and younger subjects, with reference to their entropy (heterogeneity) values. Thus, for the healthy women aged 21–25, the normalized Shannon entropy of the clustering was 0.527, for those aged 26–29 it was 0.592, and for those aged 30–39 it was 0.583, which indicated high levels of heterogeneity. In contrast, for the healthy subjects of older age (aged 40–49), and for all the diseased subjects, from the young age of 21 and onward, no heterogeneity was shown (zero normalized Shannon entropy). Thus it can be seen that both aging and the diabetic disease, in this sample, are characterized by a lower degree of heterogeneity (entropy) with reference to the measured parameters. In other words, the similarity of the patterns in aging and disease suggests a kind of “early aging” of the diseased subjects, or alternatively a “disease-like” aging process, with reference to these particular parameters. The same pattern of similarity was reconfirmed with reference to the heart disease dataset. In the heart disease dataset, only healthy subjects younger than 50 years showed a considerable degree of heterogeneity (the entropy value S=0.285). On the other hand, healthy individuals of higher age (over 50), and diseased subjects of any age, showed a lack of heterogeneity (zero entropy value). Thus, also in the case of heart disease, a formal analogy between the aging and disease processes was suggested. Of course, a heterogeneity-based biometric that would utilize age as a predictor of disease, and conversely utilize disease as a biomarker of aging, will yet require a thorough investigation on a much larger sample and range of parameters.

One aspect that may need to be taken into account when developing a heterogeneity-based health biometric is the phenomenon of “heterochrony”, the lack of synchronicity of biological processes. As the results show, the difference between healthy subjects in terms of heterogeneity manifested at different ages: at the age of 40 for the diabetes dataset and 50 for the heart disease dataset. This may be explained by the phenomenon of “heterochrony” or different systems aging at different rates [30,31]. Hence it may be necessary always to delimit the system and set of parameters whose heterogeneity is established. Also, the difference may have arisen due to the specificity of the populations under consideration in the two datasets. Yet, the general trend toward decrease of heterogeneity for aging and disease was shown in both cases of diabetes and heart disease, in both datasets.

The notion of entropy has been used earlier to characterize physiological states generally, and aging and disease in particular. Yet, the physiological and biological interpretations of this term ranged widely. Thus, there has been quite the intuitive “belief that age changes are characterized by increasing entropy, which results in the random loss of molecular fidelity, and accumulates to slowly overwhelm maintenance systems” [32]. Yet, it should be emphasized that for a formal description of an aging and/or diseased system, a formal and calculable entropy value should be used, referring to specific parameters and sub-systems, rather than a general and intuitive concept. There have also been more physicalist interpretations. Thus, in the entropy analysis of the EEG Power Spectrum during sleep in younger and older individuals, the sample entropy of the signal was found to be higher in Stage 2 and REM sleep of the elderly, which was associated with an “alertness promoting” increase of brain wave energy frequencies [33]. In other systems, the interpretations may differ. Thus, in the analysis of body balance, via measurements of the center of pressure, higher entropy was suggested to indicate an increase of automatic (non-volitional) control (greater in young individuals with good balance) as well as an increase in chaotic excursions (greater in older individuals with bad balance). Also in the study of body balance, the interpretations of higher entropy after training included “improved stability”, “increased complexity”, “a more self-organized system”, and more [34]. In view of this diversity, it seems impossible to speak of “changes in entropy” with aging and disease generally, but it always seems necessary to delimit the system under discussion, the formal definition of entropy that is being used and the parameters for which the entropy values are calculated.

Yet despite this variety of interpretations, the dominant (perhaps even counterintuitive) physiological interpretation of entropy change has been that aging and disease are generally associated with a decrease of the system entropy, indicating a loss of the system complexity. Thus, the loss of complexity, shown by lower entropy, was suggested for a wide variety of morbid and age-related conditions, ranging from a more regular (predictable) heart rate in heart diseases, to the periodical (predictable) Parkinsonian tremors [35–37]. The loss of complexity (lower system entropy) has been generally associated with a greater regularity and fixation of the physiological system, in other words with a loss of system variability, and hence the loss of its ability to adapt to the changing environment. This loss of the adaptive capability can be seen as a common feature of both disease and aging. Moreover, the loss of complexity, shown by the lower system entropy, has been suggested as a potentially powerful dynamic biomarker of disease and aging and as a potential metric to test therapeutic interventions (measuring the therapy’s ability to restore the entropy levels) [35]. It is necessary to note, however, that even accepting this interpretation, the loss of system complexity may not necessarily be a detrimental phenomenon, but a protective and adaptive one, diminishing the range of the aged and diseased organism’s operation and thus preventing its “overheating” (as apparently happens in repair mechanisms of sleep which involve a high degree of synchronicity [38]).

In the present work, the patterns of heterogeneity found in the diabetes and heart disease patients generally conformed with the prevalent findings of entropy loss in aging and diseases. Within the present populations, and with reference to the present parameters, the subjects may have become “more similar” to each other (less heterogeneous) as they aged and succumbed to disease, simply due to the loss of their adaptive function, which generally diminishes the range of their reactions and confines it to a narrower framework. It is possible that the heterogeneity diminished due to an execution of some genetic program of development, or due to a protective mechanism to withstand adverse environmental circumstances (“finding safety in a confined space”). Whatever the explanation, in practical terms, the levels of heterogeneity may serve as an additional tool in the epidemiological and nosological arsenal, helping to analyze the incidence of diabetes, heart disease and other age-related diseases and conditions. Following further development, the use of information-theoretical measures, such as entropy, may become a generally applicable formal quantitative tool to describe senescent and diseased states as well as restoration of physiological function.

We have no conflict of interest.

**Reference**