Maximal oxygen uptake is the gold standard for assessing aerobic capacity in children and adolescents. Its interpretation requires robust reference models. A pediatric Z-score model has recently been developed, but external validation in independent cohorts is essential to confirm its applicability.
Methods
We conducted a multicenter validation study including a total of 1046 healthy participants (mean age, 12.0 ± 2.6 years), composed of 2 cohorts: a prospective Spanish validation cohort (n = 175) and a previously published reference cohort (n = 871). The Z-score model was compared with historical weight- and height-based linear equations in terms of correlation, prediction error, and classification of impaired aerobic fitness. Subgroup analyses were performed by sex and body mass index.
Results
The Z-score model showed the strongest correlation between predicted and observed maximal oxygen uptake (R2 = 0.70), outperforming weight-based (R2 = 0.64) and height-based (R2 = 0.66) equations. It yielded smaller prediction errors among body mass index groups and identified 14% of participants as having impaired aerobic fitness, compared with 20% and 22% using weight- and height-based reference models, respectively. The Z-score model showed improved consistency and reduced bias, particularly in overweight participants.
Conclusions
The pediatric Z-score model demonstrated superior accuracy, consistency, and classification reliability compared with linear equations in a large, independent Spanish cohort. Its adoption may enhance the interpretation of pediatric cardiopulmonary exercise testing, support early identification of reduced aerobic fitness, and guide prevention strategies and tailored interventions.
Registered on ClinicalTrials.gov (NCT04876209).
Cardiopulmonary fitness, objectively assessed by cardiopulmonary exercise testing (CPET), has emerged as a critical biomarker in the follow-up of children and adolescents with chronic diseases. Impaired aerobic capacity, primarily reflected by a reduced maximal oxygen uptake (VO2max), contributes significantly to the long-term burden of pediatric conditions and has been associated with adverse cardiovascular outcomes.1–7 In adults, a VO2max reduction exceeding 3.5 mL/kg/min (ie, 1 metabolic equivalent of task) is an independent predictor of all-cause mortality.8 Among children and adolescents, physical deconditioning frequently begins early in the disease trajectory and is compounded by sedentary behavior, ultimately impairing health-related quality of life.9
Accurate interpretation of pediatric CPET requires robust reference values that account for the complex and nonlinear growth patterns of children and adolescents. The widely used VO2max reference equations proposed by Cooper et al.10,11 in 1984 were derived from a small, homogeneous cohort of 109 healthy, normal-weight American children and are based on simple linear relationships with height or weight. However, linear models fail to adequately reflect the variability introduced by the wide range of anthropometric profiles observed in contemporary pediatric populations, particularly in underweight and overweight individuals.12
Recent research has demonstrated the superiority of nonlinear, multivariable Z-score models for constructing pediatric reference norms across a range of clinical measures.13–15 In this context, we recently developed and validated pediatric VO2max Z-score reference equations in a large, multicenter cohort of 1141 healthy children and adolescents, comprising a French development cohort and independent German and US validation cohorts.16,17 This study fulfilled most of the established criteria for high-quality CPET assessment, thereby reinforcing the methodological rigor and clinical relevance of the resulting model.18 The final equations employed logarithmic transformations of VO2max, height, and body mass index (BMI), and showed improved precision and applicability across the full anthropometric spectrum, including children and adolescents with extreme body weights.12
Despite their methodological robustness, the external validity of these reference values among culturally and demographically distinct populations remains to be established. Spain offers a relevant context for such validation, given the regional variability in pediatric physical activity levels, rising prevalence of sedentary behavior, and potential environmental influences, such as climate and heat exposure.19–22 Moreover, Spanish youth demonstrate lower adherence to international physical activity guidelines, with disparities related to socioeconomic status and sex.21
The aim of the present study was to validate the new pediatric VO2max Z-score equations in a healthy Spanish population and to compare their performance with the historical linear equations.
METHODS
Study design and population
This multicenter study included 2 pediatric CPET cohorts: a Spanish validation cohort and a previously published reference cohort.16,17 The Spanish cohort constituted an external validation dataset, collected independently from the model development process.
The Spanish cohort was derived from a prospective cross-sectional study carried out between 2019 and 2024. Participants were enrolled from 3 pediatric CPET laboratories located in geographically and culturally distinct regions of Spain, all within tertiary academic hospitals: Hospital Universitari i Politècnic La Fe (Valencia), Hospital Universitario Donostia (Donostia/San Sebastián), and Hospital Virgen del Rocío (Seville). All CPETs were performed independently by local investigators using standardized clinical protocols.
The reference group was derived from a post-hoc analysis of the previously published development cohort, which served exclusively as a reference for the VO2max Z-score model.16,17 This reference population was originally generated by pooling healthy children and adolescents enrolled in previous pediatric CPET prospective controlled studies conducted between November 2010 and March 2020 in France. The resulting VO2max Z-score model was subsequently externally validated in independent cohorts from Germany and the United States prior to the present Spanish validation.16,17
The same eligibility criteria were applied in both groups. Children and adolescents aged <18 years referred for CPET were selected and consecutively enrolled in the study after informed consent, based on the following criteria: a) normal medical check-up, including physical examination, electrocardiogram, and echocardiography; and b) performance of a complete CPET with test quality rated by a standardized score (≥ 10 points) in accordance with American Thoracic Society/American College of Chest Physicians (ATS/ACCP) criteria.18 We excluded participants with: a) any chronic illness (cardiac, neurological, respiratory, muscular, or renal condition), or ongoing medical treatment, and those requiring any further specialized medical consultation after CPET; b) acute conditions or contraindications for CPET such as fever, uncontrolled asthma, respiratory failure, acute myocarditis or pericarditis, uncontrolled arrhythmia causing symptoms or hemodynamic compromise, uncontrolled heart failure, acute pulmonary embolism or pulmonary infarction, and individuals with mental impairment precluding cooperation; c) significant exercise-induced symptoms or anomalies during CPET requiring further evaluations. Participants with functional or innocent heart murmurs and no structural cardiac abnormalities were considered eligible.
Participants who did not reach a peak heart rate (HR) ≥ 173 bpm during CPET were excluded, regardless of the other test criteria, in order to ensure maximal effort and enhance the physiological homogeneity of the reference population.17 This HR threshold corresponds to the 5th percentile for peak HR across the pediatric age range, as established in the original Z-score development study. It was therefore used as a conservative cutoff to define adequate exertion.
CPET procedures and parameters
Pediatric CPET procedures were standardized among all participating centers. Tests were conducted in controlled pediatric environments equipped with an emergency cart containing a defibrillator, manual ventilation equipment, and essential cardiopulmonary resuscitation drugs. Each CPET was supervised by a qualified pediatric cardiologist assisted by an experienced specialist nurse.
All 3 Spanish centers used the same harmonized pediatric equipment, and CPET devices in the reference cohort were comparable, with technical details provided in the supplementary data.
Anthropometric data (height, weight, BMI) were collected before the exercise test.23 Overweight was defined as a BMI ≥ 85th percentile, and obesity as a BMI ≥ 95th percentile, in accordance with North American growth charts.24 Each exercise test was preceded by resting spirometry to assess the flow volume loop, including forced expiratory volume in 1 second (FEV1, L), forced vital capacity (FVC, L), and the FEV1/FVC ratio (%). This procedure was repeated at least 3 times to ensure reproducibility and was deemed valid if variability between trials remained < 5%. Reference values were calculated using the Global Lung Function Initiative reference equations.15,25
A single cycle ergometer protocol was used across sites, similar to the protocol in the reference development study.16,17 This protocol ensured a total exercise duration of 8 to 12 minutes and included: a 1-minute baseline, a 2-minute warm-up at 20 W, followed by a progressive workload increase of 10, 15, or 20 W/min based on the participant's height, and concluding with a 5-minute recovery. The target pedaling rate was maintained between 60 and 80 rpm.
Throughout the test, HR, workload (W), oxygen saturation, oxygen uptake (VO2), CO2 elimination (VCO2), minute ventilation (VE), end-tidal CO2 pressure (PETCO2), and respiratory exchange ratio (RER) (VCO2/VO2) were measured breath-by-breath and continuously monitored (averaged over 10 seconds), along with a 12-lead electrocardiogram. Blood pressure (systolic and diastolic) was measured every 2 minutes. Peak exercise values were defined as the mean over the final 30 seconds of exercise. Resting HR was recorded after 1 minute of seated rest on the ergometer before the start of the test; maximal HR was defined as the highest value achieved during the test.
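The definition of peak values can be illustrated with a minimal Python sketch. It assumes that the breath-by-breath signal has already been reduced to consecutive 10-second averages ending at the last second of exercise; the function name `peak_value` and the sample values are hypothetical and are not taken from the study data.

```python
from statistics import mean

def peak_value(samples_10s, window_s=30, bin_s=10):
    """Mean of the 10-second averages covering the final `window_s` seconds.

    Assumption: `samples_10s` holds consecutive 10-second averages, the last
    one ending at the end of exercise (recovery data excluded).
    """
    n_bins = window_s // bin_s
    if len(samples_10s) < n_bins:
        raise ValueError("not enough samples to cover the averaging window")
    return mean(samples_10s[-n_bins:])

# Hypothetical VO2 trace (mL/min), one value per 10 seconds of exercise.
vo2_trace = [1500.0, 1620.0, 1710.0, 1760.0, 1790.0, 1780.0]
peak_vo2 = peak_value(vo2_trace)  # averages the last three values

# Maximal HR, by contrast, is simply the highest value recorded.
hr_trace = [150, 168, 181, 188, 186]
max_hr = max(hr_trace)
```

Averaging over the final 30 seconds damps breath-to-breath noise, whereas maximal HR is taken as a single highest reading, as stated in the text.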
CPET was considered maximal if the participant had a peak HR ≥ 173 bpm and met at least 1 of the following criteria: a) volitional exhaustion with inability to sustain 60 rpm despite encouragement; b) RERmax ≥ 1.0; c) plateau in VO2 despite increasing workload.17 VO2max was defined as the plateau in oxygen uptake despite increased workload and was expressed as a Z-score,16 absolute value (in mL/min and mL/kg/min), and percent-predicted value using classic height- and weight-based linear equations.10,11 If no plateau was observed, the VO2peak value was used, as is standard practice in pediatric testing.26,27 Impaired aerobic fitness was defined according to recent practices in pediatric exercise physiology, depending on the reference model: a VO2max Z-score < –1.64 using the nonlinear model, corresponding to the 5th percentile of the reference population, which is a commonly accepted clinical threshold for abnormality4,7,16; or VO2max < 80% of the predicted value using the linear equations, a cutoff widely adopted in pediatric CPET interpretation to indicate clinically relevant functional limitation.10,11,28
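The decision rules in this paragraph can be written as a short Python sketch. The thresholds (173 bpm, RER 1.0, Z-score –1.64, 80% of predicted) come from the text; the class and function names are hypothetical, and the predicted VO2max passed to `impaired_by_percent_predicted` is assumed to come from whichever linear equation is being applied.

```python
from dataclasses import dataclass

PEAK_HR_MIN_BPM = 173          # 5th percentile of peak HR, per the text
RER_MIN = 1.0                  # secondary criterion b)
Z_SCORE_CUTOFF = -1.64         # 5th percentile of the reference population
PERCENT_PREDICTED_CUTOFF = 80  # cutoff used with the linear equations

@dataclass
class CpetResult:
    peak_hr_bpm: float
    rer_max: float
    volitional_exhaustion: bool  # unable to sustain 60 rpm despite encouragement
    vo2_plateau: bool            # plateau in VO2 despite increasing workload

def is_maximal(test: CpetResult) -> bool:
    """Peak HR threshold plus at least one secondary criterion."""
    secondary = (
        test.volitional_exhaustion
        or test.rer_max >= RER_MIN
        or test.vo2_plateau
    )
    return test.peak_hr_bpm >= PEAK_HR_MIN_BPM and secondary

def impaired_by_z_score(z_score: float) -> bool:
    """Impaired aerobic fitness under the Z-score model."""
    return z_score < Z_SCORE_CUTOFF

def impaired_by_percent_predicted(observed_ml_min: float,
                                  predicted_ml_min: float) -> bool:
    """Impaired aerobic fitness under a linear reference equation."""
    return 100.0 * observed_ml_min / predicted_ml_min < PERCENT_PREDICTED_CUTOFF
```

Because the two rules rest on different scales, the same child can be classified differently depending on the reference model, which is what the agreement analyses quantify.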
The ventilatory anaerobic threshold, defined as the point at which VE increases disproportionately relative to VO2, was manually determined using the equivalents method, or, if not applicable, the V-slope method.29 The VE/VCO2 slope was measured from the beginning of incremental exercise to maximal exercise (or respiratory compensation point when present).30
Statistical analyses
Numeric variables are expressed as mean ± standard deviation (SD) or median [interquartile range, IQR], and discrete outcomes as absolute and relative (%) frequencies. Normality and heteroskedasticity of continuous data were assessed with the Shapiro-Wilk and Levene tests, respectively. Continuous outcomes were compared using the unpaired Student t-test, Welch t-test, or Mann-Whitney U test according to data distribution. Discrete outcomes were compared with the chi-square or Fisher exact test.
The following recently published equations based on natural logarithms of VO2max, height, and BMI were used to determine VO2max Z-scores in girls and boys, respectively:16
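The published coefficients are not restated in this sketch. Purely as an illustration of how a log-based Z-score of this type can be evaluated, the Python code below assumes a model of the form ln(VO2max) = a + b·ln(height) + c·ln(BMI) with a residual standard deviation s on the log scale. This functional form, the coefficient names `a`, `b`, `c`, `s`, and the function names are assumptions made for the example; the sex-specific published equations16 should be used in practice.

```python
import math
from statistics import NormalDist

def predicted_vo2max(height_cm, bmi, a, b, c):
    """Predicted VO2max (mL/min) under the assumed log-linear form.

    a, b, c are placeholder coefficients supplied by the caller; they are
    NOT the published values.
    """
    return math.exp(a + b * math.log(height_cm) + c * math.log(bmi))

def vo2max_z_score(vo2max_ml_min, height_cm, bmi, a, b, c, s):
    """Z-score: distance between observed and predicted ln(VO2max), in SD units."""
    predicted_ln = a + b * math.log(height_cm) + c * math.log(bmi)
    return (math.log(vo2max_ml_min) - predicted_ln) / s

def z_to_percentile(z):
    """Percentile of a Z-score under a standard normal distribution."""
    return 100.0 * NormalDist().cdf(z)
```

Under a standard normal distribution, `z_to_percentile(-1.64)` is close to 5, which is why a Z-score below –1.64 corresponds to the 5th percentile of the reference population.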
Differences between observed and predicted VO2max were assessed using the Wilcoxon signed-rank test, and correlations were evaluated using the Spearman coefficient. Model calibration was further assessed using Lin's concordance correlation coefficient and Bland-Altman analyses, evaluating mean bias and limits of agreement between observed and predicted VO2max values for each reference model. Agreement between reference models in the classification of impaired aerobic fitness was assessed using percent agreement and Cohen's kappa, which quantifies concordance beyond chance while preserving model-specific clinically accepted thresholds.
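For readers less familiar with these agreement metrics, the following Python sketch shows one common way to compute them using only the standard library. It is not the EasyMedStat implementation; the function names are hypothetical, Lin's coefficient is computed with population (1/n) moments, the Bland-Altman limits use the usual mean ± 1.96 SD convention, and confidence intervals are omitted.

```python
from statistics import mean, stdev

def lin_ccc(x, y):
    """Lin's concordance correlation coefficient (population moments)."""
    n = len(x)
    mx, my = mean(x), mean(y)
    var_x = sum((a - mx) ** 2 for a in x) / n
    var_y = sum((b - my) ** 2 for b in y) / n
    cov_xy = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    return 2 * cov_xy / (var_x + var_y + (mx - my) ** 2)

def bland_altman(observed, predicted):
    """Mean bias and 95% limits of agreement (bias ± 1.96 SD of differences)."""
    diffs = [o - p for o, p in zip(observed, predicted)]
    bias = mean(diffs)
    sd = stdev(diffs)
    return bias, bias - 1.96 * sd, bias + 1.96 * sd

def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two binary classifications (1 = impaired, 0 = not)."""
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    p_a = sum(labels_a) / n
    p_b = sum(labels_b) / n
    expected = p_a * p_b + (1 - p_a) * (1 - p_b)
    if expected == 1:  # both raters put every case in the same class
        return 1.0
    return (observed - expected) / (1 - expected)
```

Unlike a plain correlation, Lin's coefficient is penalized by a systematic offset between observed and predicted values, and kappa discounts the agreement expected by chance alone, which is why both are reported alongside R2 and percent agreement.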
To explore potential center-related variability, model performance metrics were also computed separately for each participating center. These analyses included correlation coefficients (R2), prediction error, and classification rates for each VO2max reference model. Center-specific results are reported descriptively.
Alpha risk was set to 5% (α = 0.05). Statistical analyses were performed with EasyMedStat (version 3.38).
RESULTS
Population characteristics
A total of 1046 participants (mean age, 12.0 ± 2.6 years) were included in the study: 175 in the Spanish cohort and 871 in the reference cohort. Both cohorts fulfilled at least 11 out of the 14 ATS/ACCP criteria for high-quality CPET assessment.18
Anthropometric characteristics were overall comparable between cohorts (table 1), particularly regarding sex distribution, BMI, and the proportion of overweight or obese participants (BMI ≥ 85th percentile). However, Spanish participants were slightly older (12.3 ± 2.0 vs 11.9 ± 2.7 years; P = .02), taller (155.3 ± 12.2 vs 151.1 ± 15.6 cm; P < .01), and slightly lighter (47.4 ± 11.6 vs 47.7 ± 21.1 kg; P = .02) than their counterparts in the reference cohort.
Table 1. Comparison of main anthropometric data of the Spanish and reference cohorts
| Anthropometric data | Spanish cohort | Reference cohort | p |
|---|---|---|---|
| All | 175 | 871 | |
| Male | 103 (59) | 459 (53) | .16 |
| Age, y | 12.3 ± 2.0 | 11.9 ± 2.6 | .02* |
| Height, cm | 155.3 ± 12.2 | 151.1 ± 15.6 | < .01* |
| Weight, kg | 47.4 ± 11.6 | 47.7 ± 21.1 | .02* |
| BMI (kg/m2) | 19.4 ± 2.9 | 20.2 ± 6.1 | .14 |
| Overweight, BMI ≥ 85th percentile | 42 (24) | 221 (25) | .77 |
| Boys | 103 | 459 | |
| Age, y | 12.3 ± 1.9 | 11.7 ± 2.6 | .01* |
| Height, cm | 156.8 ± 12.8 | 151.2 ± 16.6 | < .01* |
| Weight, kg | 48.2 ± 11.5 | 47.5 ± 22.5 | < .01* |
| BMI (kg/m2) | 19.4 ± 2.7 | 20.0 ± 6.2 | .04* |
| Overweight, BMI ≥ 85th percentile | 26 (25) | 113 (25) | .99 |
| Girls | 72 | 412 | |
| Age, y | 12.3 ± 2.1 | 12.1 ± 2.7 | .38 |
| Height, cm | 153.2 ± 11.1 | 151.0 ± 14.3 | .37 |
| Weight, kg | 46.2 ± 11.7 | 47.9 ± 19.5 | .58 |
| BMI (kg/m2) | 19.4 ± 3.2 | 20.5 ± 6.1 | .86 |
| Overweight, BMI ≥ 85th percentile | 16 (22) | 108 (26) | .57 |
BMI, body mass index.
The data are expressed as mean±standard deviation or No. (%).
Regarding CPET outcomes (table 2), absolute VO2max was similar between the Spanish and reference cohorts (1781 ± 482 vs 1793 ± 553 mL/min; P = .95). However, weight-adjusted VO2max was significantly lower in the Spanish cohort (38.1 ± 7.0 vs 40.2 ± 9.4 mL/kg/min; P < .01), with a reduced mean Z-score (–0.45 ± 0.99 vs 0.02 ± 0.99; P < .01) and a higher proportion of participants classified as having impaired aerobic capacity (Z-score < –1.64: 14% vs 5%; P < .01). Maximal HR was comparable between groups, whereas RERmax was slightly lower in the Spanish cohort (1.10 ± 0.07 vs 1.15 ± 0.10; P < .01).
Table 2. Cardiopulmonary fitness in healthy Spanish children and adolescents vs the reference cohort
| CPET data | Spanish cohort | Reference cohort | p |
|---|---|---|---|
| Overall, No. | 175 | 871 | |
| VO2max, mL/min | 1781 ± 482 | 1793 ± 553 | .95 |
| VO2max, mL/kg/min | 38.1 ± 7.0 | 40.2 ± 9.4 | < .01* |
| VO2max Z-score | –0.45 ± 0.99 | 0.02 ± 0.99 | < .01* |
| VO2max Z-score < –1.64 | 24 (14) | 43 (5) | < .01* |
| HRmax, bpm | 188 ± 10.3 | 189 ± 8.2 | .28 |
| RERmax | 1.10 ± 0.07 | 1.15 ± 0.10 | < .01* |
| VAT, mL/min | 1422 ± 377 | 1253 ± 384 | < .01* |
| VAT, mL/kg/min | 30.4 ± 6.0 | 28.2 ± 7.2 | < .01* |
| VAT Z-score | 0.47 ± 0.92 | 0.01 ± 0.99 | < .01* |
| VE/VCO2 slope, absolute value | 28.5 ± 3.9 | 30.4 ± 4.4 | < .01* |
| VE/VCO2 slope Z-score | –0.44 ± 0.92 | –0.07 ± 1.01 | < .01* |
| Boys, No. | 103 | 459 | |
| VO2max, mL/min | 1955 ± 503 | 1931 ± 612 | .34 |
| VO2max, mL/kg/min | 41.0 ± 6.8 | 43.6 ± 9.3 | < .01* |
| VO2max Z-score | –0.43 ± 1.08 | 0.01 ± 1.00 | < .01* |
| VO2max Z-score < –1.64 | 16 (16) | 22 (5) | < .01* |
| HRmax, bpm | 188 ± 7 | 189 ± 8 | .30 |
| RERmax | 1.09 ± 0.06 | 1.14 ± 0.10 | < .01* |
| VAT, mL/min | 1535 ± 412 | 1349 ± 417 | < .01* |
| VAT, mL/kg/min | 32.3 ± 5.9 | 30.5 ± 7.2 | .05 |
| VAT Z-score | 0.40 ± 0.99 | 0.01 ± 1.01 | < .01* |
| VE/VCO2 slope, absolute value | 28.4 ± 3.7 | 30.2 ± 4.2 | < .01* |
| VE/VCO2 slope Z-score | –0.43 ± 0.91 | –0.09 ± 1.01 | < .01* |
| Girls, No. | 72 | 412 | |
| VO2max, mL/min | 1533 ± 316 | 1639 ± 429 | .07 |
| VO2max, mL/kg/min | 33.9 ± 5.0 | 36.4 ± 7.8 | < .01* |
| VO2max Z-score | –0.48 ± 0.85 | 0.03 ± 0.99 | < .01* |
| VO2max Z-score < –1.64 | 8 (11) | 21 (5) | .06 |
| HRmax, bpm | 189 ± 13 | 189 ± 8 | .68 |
| RERmax | 1.12 ± 0.08 | 1.17 ± 0.11 | < .01* |
| VAT, mL/min | 1276 ± 266 | 1146 ± 312 | < .01* |
| VAT, mL/kg/min | 27.9 ± 5.2 | 25.5 ± 6.3 | < .01* |
| VAT Z-score | 0.56 ± 0.83 | 0.02 ± 0.98 | < .01* |
| VE/VCO2 slope, absolute value | 28.8 ± 4.1 | 30.6 ± 4.5 | < .01* |
| VE/VCO2 slope Z-score | –0.45 ± 0.93 | –0.06 ± 1.01 | < .01* |
HRmax, maximal heart rate; bpm, beats per minute; RERmax, maximal respiratory exchange ratio; VAT, ventilatory anaerobic threshold; VE/VCO2 slope, ventilatory efficiency.
The data are expressed as mean±standard deviation or No. (%).
Ventilatory anaerobic threshold was significantly higher in the Spanish group regardless of expression (absolute value: 1422 ± 377 vs 1253 ± 384 mL/min; weight-adjusted: 30.4 ± 6.0 vs 28.2 ± 7.2 mL/kg/min; Z-score: 0.47 ± 0.92 vs 0.01 ± 0.99; all P < .01). The VE/VCO2 slope was significantly lower both in absolute value (28.5 ± 3.9 vs 30.4 ± 4.4; P < .01) and in Z-score (–0.44 ± 0.92 vs –0.07 ± 1.01; P < .01), suggesting a more efficient ventilatory response during exercise.
These patterns were consistent between the sexes. In boys, absolute VO2max was similar between cohorts, but weight-adjusted VO2max and Z-scores were lower in the Spanish group, with a higher prevalence of impaired aerobic capacity; ventilatory anaerobic threshold was higher in all expressions. In girls, absolute VO2max tended to be lower without reaching statistical significance (P=.07), whereas weight-adjusted VO2max and Z-scores were lower, and ventilatory anaerobic threshold in all expressions was significantly higher in the Spanish cohort. Differences in VE/VCO2 slope mirrored those observed in the overall sample.
Clinical characteristics and CPET-derived variables stratified by participating Spanish centers are provided in table S1.
Z-score model validation and comparison with linear equations in the Spanish cohort
In the Spanish cohort, the Z-score model showed the highest correlation between observed and predicted VO2max (R2 = 0.70), compared with the weight-based (R2 = 0.64) and height-based (R2 = 0.66) linear reference models (table 3, figure 1). Lin's concordance correlation coefficient was 0.80 [0.75-0.85], 0.73 [0.66-0.79], and 0.73 [0.66-0.79] for observed vs predicted VO2max values using the VO2max Z-score model, height-based linear equation, and weight-based linear equation, respectively. Bland-Altman analyses confirmed limited mean bias across the range of VO2max values for the Z-score model, with wider limits of agreement observed for the linear height- and weight-based equations (figure S1).
Table 3. Comparisons between observed and predicted VO2max values in the Spanish cohort according to the different reference models
| Reference model | Variables | Overall | Boys | Girls |
|---|---|---|---|---|
| Z-score model | Mean VO2max Z-score | –0.45 ± 0.99 | –0.43 ± 1.08 | –0.48 ± 0.85 |
| | Impaired VO2max (Z-score < –1.64) | 24 (14) | 16 (16) | 8 (11) |
| | Median VO2max difference, mL/min | –83 [–252 to 71] | –83 [–270 to 120] | –84 [–228 to 6] |
| | R2 | 0.70 | 0.64 | 0.62 |
| Weight-based linear equations | Mean VO2max, % predicted | 92.1 ± 14.8 | 89.1 ± 15.7 | 96.3 ± 12.3 |
| | Impaired VO2max (< 80% predicted) | 35 (20) | 29 (28) | 6 (8) |
| | Median VO2max difference, mL/min | –110 [–344 to 28] | –139 [–486 to 3] | –57 [–183 to 99] |
| | R2 | 0.64 | 0.54 | 0.62 |
| Height-based linear equations | Mean VO2max, % predicted | 90.2 ± 14.5 | 86.5 ± 14.1 | 95.5 ± 13.5 |
| | Impaired VO2max (< 80% predicted) | 39 (22) | 30 (29) | 9 (13) |
| | Median VO2max difference, mL/min | –201 [–385 to 7] | –296 [–555 to –112] | –64 [–248 to 40] |
| | R2 | 0.66 | 0.62 | 0.51 |
VO2max, maximal oxygen uptake.
The data are expressed as mean±standard deviation, median [interquartile range] or No. (%).
Figure 1. Agreement between observed and predicted VO2max in 3 reference models, stratified by sex and BMI in Spanish children and adolescents. Scatter plots comparing observed and predicted VO2max values (mL/min) in the Spanish pediatric cohort by using the Z-score model, height-based linear equation, and weight-based linear equation. Each row corresponds to a reference model, and each column shows the results for the entire cohort, boys only, and girls only. Dots are color-coded by BMI category: blue for underweight, green for normal weight, and red for overweight participants. Linear regression lines (in black) are shown for the total sample. BMI, body mass index.
The linear weight-based model yielded a mean predicted VO2max of 92.1% ± 14.8% of expected, with 20% of participants classified as having impaired aerobic fitness (< 80% predicted). The height-based model predicted 90.2% ± 14.5% of expected, with 22% classified as impaired. In contrast, the Z-score model identified a lower proportion of participants with impaired aerobic fitness (14%) (table 3).
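The percentages above follow from the counts reported in table 3 for the 175 Spanish participants. The short Python check below reproduces that arithmetic; the helper name `percent` is hypothetical.

```python
def percent(count, total):
    """Share of `count` in `total`, rounded to the nearest whole percent."""
    return round(100 * count / total)

N_SPANISH = 175  # size of the Spanish validation cohort

impaired_counts = {
    "Z-score model": 24,
    "Weight-based linear equation": 35,
    "Height-based linear equation": 39,
}

impaired_percent = {
    model: percent(count, N_SPANISH)
    for model, count in impaired_counts.items()
}
# Z-score: 24/175 = 13.7% -> 14; weight: 35/175 = 20.0% -> 20;
# height: 39/175 = 22.3% -> 22
```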
Median differences between observed and predicted VO2max were –83 [–252 to 71] mL/min with the Z-score model, –110 [–344 to 28] mL/min with the weight-based model, and –201 [–385 to 7] mL/min with the height-based model. The median difference was smallest with the Z-score model overall and in boys (–83 vs –139 and –296 mL/min), whereas in girls it was of similar magnitude across the 3 models (–84, –57, and –64 mL/min) (table 3).
In boys, 16% were classified as impaired using the Z-score model, compared with 28% and 29% with the linear weight- and height-based models, respectively. In girls, the proportion was 11% with the Z-score model, compared with 8% and 13% with the linear models.
These results are detailed in table 3 and illustrated in the Venn diagram (figure 2). Only partial overlap was observed between the 3 models in the identification of participants with impaired fitness. The Z-score model showed limited concordance with the linear equations, with a large proportion of children and adolescents being classified as impaired exclusively by 1 of the linear models. Despite the use of model-specific clinical thresholds, percent agreement between reference models was high (81%-89%). Cohen's kappa indicated good agreement between the VO2max Z-score and height-based models (κ = 0.67) and moderate agreement for comparisons involving the weight-based equation (κ = 0.43-0.45) (table S2). Notably, nearly all children classified as having impaired aerobic fitness by the Z-score model were also identified by the height-based equation, whereas the latter classified a larger number of additional participants as impaired.
Figure 2. Distribution of Spanish children and adolescents classified as having impaired aerobic fitness using 3 VO2max reference models. Venn diagram of Spanish children and adolescents classified as having impaired aerobic fitness according to 3 VO2max reference models. Overlaps indicate concordant classification. VO2max, maximal oxygen uptake.
Among the 3 participating centers, the VO2max Z-score model showed consistently good performance, with comparable R2 values and limited median prediction errors (table 1). Although some variability across centers was observed for all reference models, particularly for the linear equations, the Z-score model consistently showed superior performance.
Influence of BMI on VO2max classification and model performance in the Spanish cohort
When stratifying participants by BMI category, the Z-score model maintained consistent classification of impaired aerobic fitness among weight groups. Among children with normal BMI (5th-84th percentile), 10% were classified as impaired by the Z-score model, compared with 11% and 21% using the linear weight- and height-based equations, respectively. In the group with BMI ≥ 85th percentile, the Z-score model identified 20% as impaired, whereas the linear equations classified 50% (weight-based) and 26% (height-based) as impaired.
The findings illustrated in figure 1 and figure 3 highlight the superior performance and robustness of the Z-score model across the full range of BMI values. In both figures, the Z-score model consistently demonstrated the smallest absolute differences between observed and predicted VO2max, along with minimal dispersion and bias. These patterns were observed among all subgroups, including boys, girls, and participants with underweight, normal weight, or overweight. In contrast, the linear weight- and height-based equations exhibited increasing prediction errors as BMI percentile increased, particularly above the 85th percentile. The linear models tended to overestimate VO2max in overweight participants, and underestimate it in underweight individuals, as reflected by the widening gap between observed and predicted values.
Figure 3. Performance of VO2max prediction models among BMI percentiles in Spanish children and adolescents. Scatterplots with locally estimated scatterplot smoothing curves illustrating the relationship between BMI percentile and absolute difference between predicted and observed VO2max (in mL/min) in the Spanish cohort, using 3 reference equations: the Z-score model, weight-based linear model, and height-based linear model. The top row represents data for boys, and the bottom row represents data for girls. The shaded green area indicates normal BMI range (5th-85th percentile), blue indicates underweight (< 5th percentile), and red indicates overweight or obesity (≥ 85th percentile). BMI, body mass index; VO2max, maximal oxygen uptake.
DISCUSSION
This multicenter study validates the applicability of a recently proposed Z-score model for pediatric VO2max assessment in a large, independent Spanish cohort of healthy children and adolescents. This validation was conducted in a national cohort distinct from the development population, providing an external assessment of its clinical applicability. The Z-score model demonstrated the strongest correlation between observed and predicted VO2max values (R2 = 0.70), outperforming the widely used linear height- and weight-based equations. It provided more accurate predictions among all subgroups and a more consistent classification of impaired aerobic fitness, particularly in overweight participants (figure 4).
Central illustration. Overview of the external validation of a pediatric VO2max Z-score model in a multicenter Spanish cohort of healthy children and adolescents, compared with historical linear reference equations. CPET, cardiopulmonary exercise testing; VO2max, maximal oxygen uptake; BMI, body mass index; CO2, carbon dioxide; R2, coefficient of determination.
While historic linear VO2max reference equations are widely used for the clinical interpretation of pediatric CPET, our findings confirm that these linear models introduce systematic biases: they overestimate VO2max reference values in overweight children and adolescents and underestimate them in those who are underweight, thereby increasing the risk of misclassification.12 In contrast, the Z-score model, derived from high-quality CPET data18 and integrating the nonlinear relationships of VO2max with anthropometric parameters, showed greater robustness and consistency across sex, BMI, and age groups. The preservation of these properties in an external national cohort supports the transportability and clinical relevance of the model, in line with the original development study.16 Although different clinically accepted thresholds were used across reference models, agreement analyses showed largely concordant clinical classification, particularly between the Z-score model and the height-based equation, supporting the robustness of the Z-score approach in routine pediatric CPET interpretation.
VO2max is a cornerstone measure in pediatric exercise physiology, yet its interpretation remains highly dependent on the choice of reference standards.31 Our results suggest that using a Z-score approach improves the accuracy and comparability of aerobic fitness assessments in children and adolescents. The clinical relevance is 2-fold: first, better identification of those with reduced functional capacity who may benefit from early intervention, and second, more reliable monitoring in longitudinal follow-up or post-intervention studies. The improved performance of the Z-score model in overweight individuals is particularly relevant given the rising global prevalence of pediatric overweight and obesity.32,33
Although the Spanish and reference cohorts were not statistically different in terms of sex distribution, BMI, and proportion of overweight participants, Spanish participants were slightly older, taller, and lighter. These differences likely reflect regional or secular anthropometric trends and may partly account for the modestly lower absolute VO2max values observed in the Spanish cohort, particularly among boys.23 Such trends may be influenced by subtle environmental, nutritional, or behavioral factors. Recent national surveys have raised concerns regarding Spanish children's lifestyle habits, including insufficient physical activity and high sedentary time. According to the ANIBES study, over 60% of Spanish youth do not meet WHO physical activity recommendations.19 Furthermore, the 2022 National Sports Habits Survey, conducted by the Consejo Superior de Deportes, highlighted persistent sex-based disparities and declining sports participation during adolescence. Despite these contextual and regional differences, the Z-score model retained its discriminative and classificatory performance, further supporting its generalizability beyond the original development context. An important strength of this validation lies in the geographic diversity of the Spanish cohort, encompassing centers from distinct regions with different lifestyle and environmental contexts. Across this heterogeneity, the VO2max Z-score model demonstrated stable performance, reinforcing its robustness and relevance for pediatric CPET interpretation.
Study limitations
This study has several limitations. First, although this was a multicenter study, the validation cohort was limited to Spanish tertiary centers, and the findings may not be generalizable to children and adolescents from other ethnic or geographic backgrounds. Second, this study did not include detailed assessments of external determinants of VO2max such as habitual physical activity, pubertal stage, lean mass, or nutrition, all of which are known to influence aerobic fitness. These variables were not available in the original development cohort and were therefore not incorporated, in order to preserve methodological consistency for external validation. Third, CPETs were performed using cycle ergometry; therefore, the findings cannot be extrapolated to treadmill protocols, which may yield different results in children and adolescents. Moreover, other CPET-derived indices (oxygen pulse, oxygen uptake efficiency slope, etc.) were not analyzed, as validated pediatric Z-score reference models for these parameters have already been published separately.17 Finally, longitudinal predictive value was not assessed, and further studies are needed to confirm the prognostic relevance of the model over time.
Perspectives
Beyond its diagnostic value, CPET should be increasingly recognized in pediatrics as a core component of physical fitness assessment. VO2max is a well-established prognostic marker in adults across multiple diseases. In children and adolescents, although long-term outcome data are limited, VO2max has been shown to correlate with health-related quality of life in various childhood diseases.4,7,9 This highlights the importance of conducting multicenter, prospective longitudinal studies to explore the prognostic value of VO2max in various pediatric conditions.
In this context, CPET should be part of a comprehensive physiological and behavioral profile including muscular strength testing,6 accelerometry-based physical activity monitoring,7 and patient-reported outcomes.9 This integrative approach is in line with the current emphasis on physical activity promotion, shared decision-making, and personalized care. Validated, standardized reference models, such as the Z-score model presented in this study, could support tailored interventions, including early lifestyle counseling and pediatric rehabilitation programs.28
CONCLUSIONS
This multicenter validation study confirms the applicability and performance of the recently developed VO2max Z-score model in a large, independent Spanish cohort of healthy children and adolescents. Compared with traditional linear equations, the Z-score model demonstrated superior accuracy, consistency, and classification reliability across variations in pediatric anthropometric profiles. Its robust performance, including in overweight participants, reinforces its clinical usefulness for interpreting pediatric CPET data. The adoption of standardized, validated reference models is essential for improving diagnostic accuracy, guiding clinical decisions, and supporting longitudinal and multicenter research in pediatric exercise physiology.
Beyond clinical and research applications, reliable VO2max assessment also plays a pivotal role in preventive strategies by enabling early identification of children and adolescents with low aerobic fitness, which is a modifiable risk factor for future cardiometabolic disease. Future efforts should focus on integrating VO2max assessments into comprehensive evaluations of physical fitness and on conducting outcome-based studies among diverse pediatric populations.
DATA AVAILABILITY
The data underlying this article will be shared on reasonable request to the corresponding author.
FUNDING
This research received funding from the Spanish High Council of Sports, through its grants for science, research and technology applied to sports medicine and health-enhancing physical activity. This study belongs to the QUALIREHAB implementation research project in Europe, funded by the “Innovate to Prevent” European Joint Transnational Call (JTC-THCS 2024).
ETHICAL CONSIDERATIONS
The study was conducted in accordance with Good Clinical Practice protocols and the principles of the Declaration of Helsinki, and it was registered on ClinicalTrials.gov (NCT04876209). For the Spanish cohort, ethics approval was obtained from the Ethics Committees of Hospital Virgen del Rocío (Seville), the Health Research Institute of Hospital La Fe (Valencia), and Hospital Universitario Donostia (Donostia/San Sebastián). For the development cohort, all prior prospective studies used to generate the CPET database were approved by relevant ethics committees and registered in international clinical trial registries. Written informed consent was obtained from all participants and/or their legal guardians prior to inclusion, and consent forms were archived in accordance with institutional and ethical requirements. Sex was systematically reported and analyses were stratified by sex when appropriate, in accordance with SAGER recommendations.
STATEMENT ON THE USE OF ARTIFICIAL INTELLIGENCE
No artificial intelligence was used in the preparation of this article.
AUTHORS’ CONTRIBUTIONS
E. Peiró-Molina and A. Gavotto contributed equally to this work. All authors participated meaningfully in the study and approved the final version of the manuscript. E. Peiró-Molina conceived and designed the study, coordinated patient recruitment and cardiopulmonary exercise testing in Spanish centers, contributed to methodology development and data analysis, drafted the first version of the manuscript, participated in subsequent revisions, and shared supervision and project leadership. A. Gavotto contributed equally to study conception and design, participated in patient recruitment and CPET performance, contributed to methodology development, statistical analysis, and manuscript revision, was responsible for data visualization and figure preparation, participated in validation of analyses, and shared supervision and project leadership. J. Robert contributed to study design, statistical methodology, and data analysis, participated in manuscript development and revision, and was involved in validation of the analyses. M. Andrianoely contributed to data analysis, participated in manuscript revision, and provided software and technical support to ensure analytical reproducibility. E. Rezola contributed to study design, patient recruitment, CPET performance, and manuscript development, and critically reviewed the manuscript for important intellectual content. B. Manso contributed to study design, patient recruitment and testing, participated in manuscript development, critically reviewed the manuscript, and secured study funding and institutional support. F.-J. Ferrer-Sargues participated in manuscript development, data visualization conceptualization, and contributed to the critical review of the manuscript for important intellectual content. J.I. Carrasco-Moreno contributed to patient recruitment and CPET performance in Spanish centers and critically reviewed the manuscript. S. Matecki critically reviewed the manuscript for intellectual content. A. Hager critically reviewed the manuscript for intellectual content. N. Gauthier critically reviewed the manuscript for intellectual content. S.M. Yin contributed to data interpretation and critically reviewed the manuscript. S. Guillaumont contributed to study design, patient recruitment and testing, and critically reviewed the manuscript. T. Mura contributed to study design, methodology, and statistical analysis, participated in validation of analyses and results, and critically reviewed the manuscript. J.M. Blanco Borreguero contributed to patient recruitment and testing and critically reviewed the manuscript. P. Amedro contributed to study conception and design, data analysis, manuscript development and revision, project supervision and coordination, and secured study funding and institutional support, while providing senior leadership and coordination of the whole research group.
CONFLICTS OF INTEREST
The authors declare no conflict of interest.
- VO2max is the gold standard for assessing aerobic capacity in children and adolescents, and reduced VO2max is associated with adverse cardiovascular outcomes.
- Historical VO2max linear reference equations, still widely used, were derived from small US cohorts in the 1980s and may misclassify fitness in underweight or overweight children and adolescents.
- Recently developed VO2max Z-score models appear more accurate but have not been validated in Spanish populations.
- This study provides the first external validation of the pediatric VO2max Z-score model in a large Spanish cohort of healthy children and adolescents.
- The Z-score model showed stronger correlation, lower prediction error, and more reliable classification than linear equations across sex and BMI groups.
- Its robustness beyond the original cohorts suggests broad applicability, and its use may improve pediatric CPET interpretation and early detection of impaired aerobic fitness.
We thank all the children and adolescents, and their families, who agreed to participate in the study.
Supplementary data associated with this article can be found in the online version available at https://doi.org/10.1016/j.rec.2026.02.006.


