Conceptual Overview of Biological Age Estimation

Chronological age is an imperfect measure of the aging process, which is affected by a wide range of genetic and environmental exposures. Biological age estimates may be derived using mathematical modelling with biomarkers set as predictors and chronological age as the output. The difference between biological and chronological age is denoted the “age gap” and considered a complementary indicator of aging. The utility of the “age gap” metric is assessed through examination of its associations with exposures of interest and the demonstration of additional information provided by this metric over chronological age alone. This paper reviews the key concepts of biological age estimation, the age gap metric, and approaches to assessment of model performance in this context. We further discuss specific challenges for the field, in particular the limited generalisability of effect sizes across studies owing to dependency of the age gap metric on pre-processing and model building methods. The discussion will be centred on brain age estimation, but the concepts are transferable to all biological age estimation.


Introduction
Global population aging has increased the burden from non-communicable diseases of older age. Promotion of healthy aging is a public health priority. Increased age is associated with loss of function across organ systems and increased propensity to disease [1] . However, there is variation in age-related loss of function amongst individuals of the same age. Thus, chronological age is not always a perfect measure of biological age, which is influenced by a wide array of genetic and environmental exposures throughout the lifecourse [2].
Over the last 50 years many researchers have attempted to estimate biological age as an alternate measure of the aging process. In general, the premise of these works is the development of a mathematical model for estimation of biological age using biomarkers as model predictors and chronological age as the model output. The discrepancy between the model estimated biological age and the observed chronological age is denoted as an "age gap" metric, which gives more information about the aging process over and above chronological age alone [3]. Given that aging processes may be differential across organ systems, many researchers advocate organ-specific biological age estimation.
The emergence of biomedical research databases, such as the UK Biobank [4], with availability of highly detailed phenotypic characterisation of very large samples has provided unique opportunities to develop biological age estimation models. In particular, the vast expanse of neuroimaging data has permitted development of biological brain age estimation models using imagederived phenotypes as predictor biomarkers [5]. Existing work has demonstrated the utility of brain age gap derived from these models through demonstration of associations with key environmental and genetic exposures [6]. Whilst biological age estimation is an old endeavour, it has found new applications in the field of brain age estimation [3]. There is growing interest in application of these concepts to understanding aging process of other organ systems [7].
The development and interpretation of biological age estimation models differs somewhat from that of conventional models. In this paper we review the concept of the age gap, assessment of model performance in the context of age estimation, sources of variation in biological age estimates, and limitations and future work in the field. The discussion will be centred on brain age estimation, but the concepts are broadly transferable to any biological organ age estimation.

Brain age gap
Brain age estimates provide a prediction of biological brain age based on neuroimaging phenotypes. A variety of methods can be used to estimate age as a continuous variable, including multiple linear regression [8], principal component analysis [9][10][11], the Hochschild's method [12], the Klemera and Doubal's method [13,14], support vector regression [15], Xgboost regression [16] and Bayesian ridge regression [17]. Non-regression methods may also be used for estimation of brain age, such as deep learning and decision trees. In the context of biological brain age estimation, chronological age plays the role of the dependent variable, and brain phenotypes are the independent variables. These variables are fitted in a regression model as in the following equation 1 (for a single observation).
where yi is the dependent variable, b0 is the intercept, xi is a vector (i.e., brain imaging phenotypes), b is the coefficient value and εi is the error. Once brain age ˆyi has been estimated, the brain age gap is calculated by subtracting chronological age from predicted brain age, ˆyi − yi, this is the equivalent of the residual or the error from the model (though regression residuals are usually computed instead as yi − yˆi).
The next step is a bias correction. Linear regression methods that minimise the mean squared error produce residuals that are uncorrelated with the fitted values (predicted age), but the residuals are correlated with the response variable (chronological age). Since this results in systematic over-prediction of brain age for younger subjects and under-prediction of older subjects, a bias correction is applied. While such bias corrections slightly increase the mean squared error, they remove this systematic structure from the brain age gap (see [18] for details).
The "model predicted brain age" is the conditional population mean, i.e., the mean age prediction given the covariates x. A positive brain age gap may be interpreted as a brain that is "older" and is indicative of increased risk of cognitive impairment and brain diseases. Conversely, a negative brain age gap can be considered as representing healthy or delayed brain aging [19]. The associations of brain age gap with exposures of interest may be examined to evaluate variation explained by biological brain age that is not already explained by chronological age. Thus, such association studies should include chronological age as a covariate. It is important to note, that while the bias correction method aims to ensure the age gap metric is uncorrelated with chronological age, the two metrics may be exactly uncorrelated (e.g., if analysis is performed on a held-out sample or different subset) or have non-zero partial correlation given that they will be conditional on other covariates in the model [18]. The brain age gap extracted and interpreted in this way has been used as a phenotype of interest in phenomwide association studies (PheWAS) and genome wide association studies (GWAS) to investigate associations of demographic, environmental, and genetic exposures on brain health represented by the brain age gap [20]. These studies have contributed to establishing the biological validity of brain age gap through demonstration of associations with cognitive tests, and genetic and environmental exposures. Furthermore, examination of associations with brain age gap can provide insight into novel determinants of brain aging. Figure1 illustrates the general pipeline for brain age estimation and examination of age gap associations.
Thus, biological age estimated in this way can be used to assess aging for any organ (brain, heart, liver, etc.) and provides an easily understandable output which can be compared against chronological age and used to investigate the associations of a wide range of exposures with advanced aging. However, the biological age estimates provided vary by the input data used, modelling methods, and sample size, which limits comparability across studies. Furthermore, in cases where complex modelling methods are used, it can be difficult to interpret the contribution of model input variables to the estimated biological age. Finally, it is challenging to evaluate model performance in the context of biological age estimation using conventional methods. Figure 1. General illustration of the brain age estimation pipeline and GWAS and PheWAS with the derived BAG. The path starts by using either the raw brain MRI or IDPs as model predictors (input variables). The model output is chronological age. The model is developed on a training set and tested on a held-out sample. There is then a bias correction step followed by BAG calculation. BAG is calculated as the model predicted age -chronological age. The BAG can then be used as phenotype of interest (indicator of aging) in PheWAS and GWAS to examine the association of wide range of lifestyle exposures and genetic on BAG. MRI: Magnetic resonance imaging; IDPs: image derived phenotypes; BAG: brain age gap; GWAS: Genome-wide association study; PheWAS: Phenome-wide association study.

Optimal age estimation
Assessment of model performance in the context of biological age estimation requires special considerations. Evaluation of model performance quantifies the accuracy with which the model estimates the value of interest. Mean Absolute Error (MAE) is one of the most commonly used measures of model performance. MAE represents the absolute difference between the predicted value and the observed value averaged over the n subjects in the sample [21]. Hence, MAE quantifies the overall error in the model; it can be calculated as in equation 2 where yi is the actual value, ˆyi is the predicted value and n is the number of subjects to estimate their dependent variables.
In conventional model building, we regard smaller MAE values as indicative of better model performance, in that the model predicted values are close to the observed values. In the context of biological age estimation there is, by construct, no ground truth and the model with the smallest error is not necessarily the one producing the optimal age gap estimate (ϵ) [5].
Fundamentally, it is important to note that conventional approaches to evaluation of model performance are not always useful for assessment of biological age estimation models. The purpose of biological age estimation is to construct a measure of the aging process that would provide incremental information over that captured by chronological age. Perfect age prediction would in fact produce no useful "age gap", in what is known as the "biomarker paradox" [14]. As such, the most useful assessment of model performance lies not in examination of metrics such as the MAE but in the strength and consistency of associations between the extracted age gap metric and key exposures of interest.

Variation in biological age estimates
Many studies have estimated biological brain age and performed PheWAS and GWAS analyses with brain age gap to uncover the effects of genetics variations and environmental exposures on brain aging [22,20]. In these studies, brain age gap was significantly associated with many exposures that advance brain aging such as diabetes, alcohol, and smoking in a biologically consistent direction. In addition, different sets of SNPs in genes with known functional importance in brain diseases have been found to associate with brain age gap [6,20]. The results are clear indications for validity of the method and potential of clinical translation.
Although multiple studies highlight biologically consistent associations with their brain age gap metrics, there is significant variation in the magnitude of estimates reported. For instance, the effect of diabetes on brain aging has been reported by several studies with each reporting different effect sizes. Salih, et. al [22] reported that diabetes advances brain aging by 6 months while James et al. [19] reported that diabetes advances brain age gap by 2 years. Similar differences in effect size estimates were observed for associations with smoking and alcohol consumption. Baecker et al. report increase of 3.4 years and 4.1 years in brain age gap with smoking and greater alcohol consumption, respectively [3]; while Cole et al. report smaller magnitude of associations in their analysis of brain age gap with the same exposures [19].
These differing effect sizes highlight the highly specific interpretation required for brain age gap measures produced by each analysis. There are multiple sources of variation in biological age estimation at all levels of model development and in evaluation of age gap associations. These include characteristics of the population on which the model was built, the predictor IDPs and factors related to their acquisition and post-processing, and mathematical modelling methods deployed. Furthermore, there are likely significant variations in ascertainment and definition of exposure variables which further impact the observed associations with the age gap metric. These observations highlight the importance of comprehensive and transparent reporting of all steps in development and assessment of age estimation models. Ultimately standardisation of approaches to biological age estimation will be required to permit cross comparability of relationships.

Limitations and future work
A key limitation of biological estimation methods is the lack availability of adequate datasets for model development. This is more important in cases where study of a specific cohort is required e.g., young children [23] or individuals with an uncommon illness [24] . In these settings obtaining adequate training data can be extremely challenging. Firstly, a reliably verified disease sample which adequately encompasses the full spectrum of the condition of interest is required. Second, the expected atypicality of the disease cohort means that a simple agematched comparator cohort is likely to be insufficient with need for attention to other sample characteristics. More generally, large extensively phenotyped datasets are available almost exclusively in the context of dedicated biomedical databases such as the UK Biobank [4]. Such datasets provide a valuable platform for development of biological age estimation models as well as provide opportunity to examine patterns and determinants of aging with potential for novel insights into aging pathways. Such analyses may advance our understanding of the biology of aging and highlight exposures that may be tackled at a population level to promote healthy aging. A growing body of literature addresses these questions with regards biological brain aging. More recently, similar approaches have been taken to estimation of biological age across other organs, such as the heart [7]. Further work is required to develop optimal biological age estimation models that may be utilised to better understand the aging process within and across organ systems. A general limitation of such work is that broad generalisability of these models to other settings may be limited as extensive phenotyping in research datasets is unlikely to be widely available. A question that has not been adequately addressed is the potential clinical utility of biological age estimation, which hinges on the predicated age estimates providing information that is incremental to chronological age in terms of disease discrimination and event prediction. Increased availability of large datasets with detailed clinical phenotyping and longitudinal outcome tracking will permit evaluation of these key questions.

Conclusion
Chronological age does not always fully capture the biological age of an individual. Biological age estimation is a method of deriving an alternate measure of the aging process with the aim of providing information that is complementary to chronological age. In brain age estimation neuroimaging biomarkers are used as model predictors to predict chronological age as outcome. The discrepancy between model predicted age and chronological age is calculated to derive a brain age gap. The influence of genetic and environmental exposures on brain aging may be studied through examination of their associations with brain age gap. Optimal model performance in the context of biological age estimation is best assessed through consideration of the consistency and strength of exposure associations with the derived age gap metric. There is substantial variation in brain age estimation models due to differences in samples, predictors, and modelling methods. As such, effect sizes may not be comparable across different studies. These observations underscore the importance of exact reporting of methods and phenotypes in biological age estimation work. The concepts of brain age estimation may be applied to derive biological age estimation for other organs. Further work is required to explore the applications of this method for examining mechanistic pathways of the aging process within and across organ systems. Existing work demonstrates potential for clinical and research utility of biological age estimation. Further work is needed to establish the additional value of biological age estimates over chronological age for prediction of clinical events.