Introduction
Primary biliary cholangitis (PBC) is a chronic cholestatic liver disease that, if inadequately treated, can progress to end-stage liver disease through progressive destruction of the intrahepatic bile ducts.1 Ursodeoxycholic acid (UDCA) is the established first-line therapy, known to improve biochemical markers, slow fibrosis progression, and extend transplant-free survival.2 However, approximately 30%–40% of patients exhibit an inadequate biochemical response to UDCA, maintaining a heightened risk of liver-related complications and mortality.3,4
For patients with an inadequate response to UDCA, second-line agents such as fibrates (e.g., fenofibrate) are recommended to enhance biochemical response.5 Several binary response criteria have been validated to identify PBC patients who remain at increased risk of death or liver transplantation (LT) despite UDCA therapy. These include the Rochester, Barcelona, Paris I, Paris II, Rotterdam, Toronto, and Ehime criteria.3,4,6–10 Although they vary in treatment duration, cutoff values, and included parameters, alkaline phosphatase (ALP) is a central component in all but one of these models. In the context of drug development, the PBC Obeticholic Acid International Study of Efficacy (POISE) criteria—loosely based on the Toronto model—have been widely adopted. These criteria require a serum ALP < 1.67 × the upper limit of normal (ULN), a reduction of at least 15% from baseline, and a total bilirubin (TBIL) level within the normal range, and have served as surrogate clinical endpoints in therapeutic trials.11,12
However, emerging evidence challenges the adequacy of static biochemical thresholds for risk stratification in PBC. Subgroup analyses have shown that higher TBIL values within the normal range (≤1.0 × ULN) are associated with a worse prognosis.13 Specifically, patients with TBIL levels between 0.6 and 1.0 × ULN had significantly lower 10-year survival rates compared with those with TBIL ≤ 0.6 × ULN. More importantly, complete normalization of ALP (≤1.0 × ULN) has emerged as an independent protective factor. Among patients with mildly increased ALP (<1.67 × ULN), those who failed to achieve normalization exhibited a lower 10-year survival rate. Corpechot et al. further reinforced the prognostic value of ALP normalization, highlighting its role as a key therapeutic target.14 Among patients with ALP levels < 1.67 × ULN, achieving complete normalization (≤1 × ULN) was identified as an independent protective factor for complication-free survival. In contrast, reducing TBIL to <0.6 × ULN did not confer a similar survival benefit. Collectively, these findings highlight the strong association between ALP normalization and improved treatment outcomes in PBC and support a clear dose–response relationship between ALP levels and clinical prognosis.
Furthermore, recent therapeutic advances have reinforced the feasibility of achieving ALP normalization in PBC. Phase III clinical trials of novel agents such as elafibranor and seladelpar have demonstrated significantly higher rates of ALP normalization compared with placebo.15,16 In addition, a randomized controlled trial by Liu et al. evaluating initial combination therapy with fenofibrate and UDCA showed superior biochemical response and ALP normalization rates at 12 months compared with UDCA monotherapy.17
Building on growing evidence linking ALP normalization to improved clinical outcomes, this study aimed to (1) validate ALP normalization as a definitive therapeutic target that surpasses conventional response thresholds and (2) identify predictive risk stratification criteria for early intervention.
Methods
Study population and design
We conducted a multicenter retrospective cohort study using a three-stage progressive validation design, encompassing phases to (1) validate the prognostic value of ALP normalization, (2) identify critical intervention windows, and (3) compare the predictive performance of various response criteria. The study population was divided into an internal development cohort and an external validation cohort. The internal cohort, comprising patients with PBC treated at Xijing Hospital between October 2004 and June 2024, was used for all three stages of analysis. The external validation cohort included independent patients from three additional centers, enrolled between June 2013 and December 2024, and was used specifically to validate the results of the criteria comparison in the third phase. Detailed information on patient sources from each external center is provided in Supplementary Table 1.
The inclusion criteria for patient enrollment were: (1) a confirmed diagnosis of PBC; (2) initiation of daily UDCA therapy at a dosage of 13–15 mg/kg from the date of diagnosis; (3) a follow-up period of more than one year; and (4) absence of liver failure events during the initial 12-month treatment period. Exclusion criteria included: (1) the presence of concurrent liver diseases, such as hepatitis B or C, alcoholic liver disease, primary sclerosing cholangitis, or autoimmune hepatitis; and (2) administration of second-line therapies—including obeticholic acid or fibrate drugs—at any time during the study.
The diagnosis and treatment of PBC followed international guidelines.18 Briefly, a diagnosis of PBC was established when at least two of the following three criteria were met: (1) biochemical evidence of cholestasis indicated by increased ALP levels; (2) positivity for anti-mitochondrial antibodies; and (3) histological findings consistent with PBC on liver biopsy.
In the first phase, which aimed to validate the prognostic significance of ALP normalization, survival analysis endpoints included death, LT, or severe liver-related complications (such as esophageal variceal bleeding, ascites, hepatic encephalopathy, hepatorenal syndrome, or hepatocellular carcinoma). The assessment followed a hierarchical priority: death or LT was considered the endpoint if either occurred; otherwise, severe liver-related complications were used as the endpoint. The baseline for survival analysis was set at 12 months after initiation of UDCA monotherapy, corresponding to the timing of key biochemical assessments and serving as a critical point for evaluating long-term treatment efficacy. Patients who did not experience an endpoint event were censored at their last recorded follow-up.
In the second phase of the study, a Sankey diagram was used to visualize changes in patient risk levels over time. Risk categories were defined based on ALP levels at each time point: high risk as ALP > 1.67 × ULN, medium risk as ALP between 1.0 × ULN and 1.67 × ULN, and low risk as ALP ≤ 1.0 × ULN. These visualizations helped track patient transitions across risk strata during treatment.
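The stratification rule above can be illustrated with a minimal Python sketch. It is not the study's analysis code: the function names, the ULN of 100 IU/L, and the four ALP values are hypothetical and chosen only to show how the strata are assigned and how transition counts (which would set the link widths in a Sankey diagram) might be tallied.

```python
from collections import Counter


def risk_stratum(alp: float, uln: float) -> str:
    """Assign a risk stratum from ALP using the cut-offs stated in the text."""
    ratio = alp / uln
    if ratio > 1.67:
        return "high"
    if ratio > 1.0:
        return "medium"
    return "low"


def count_transitions(earlier, later):
    """Tally (from, to) stratum pairs between two visits."""
    return Counter(zip(earlier, later))


# Hypothetical ALP values (IU/L) for four patients; ULN assumed to be 100 IU/L.
ULN = 100.0
baseline = [250.0, 180.0, 140.0, 90.0]
month_3 = [160.0, 170.0, 95.0, 85.0]

links = count_transitions(
    [risk_stratum(v, ULN) for v in baseline],
    [risk_stratum(v, ULN) for v in month_3],
)
for (src, dst), n in sorted(links.items()):
    print(f"{src} -> {dst}: {n}")
```

Each tallied pair corresponds to one link between adjacent time points; repeating the tally for every consecutive pair of visits would give the full set of flows.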
The third phase of the study evaluated the predictive efficacy of established response criteria, using ALP normalization at 12 months following UDCA initiation as the primary endpoint. Predictive performance for ALP normalization at 12 months was assessed at 3 and 6 months using the following criteria: (1) the Mayo criteria, defined as ALP < 2.0 × ULN; (2) the Paris II criteria, defined as ALP < 1.5 × ULN, AST < 1.5 × ULN, and TBIL < 1 mg/dL; and (3) the Toronto criteria, defined as ALP < 1.67 × ULN.
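As an illustration only, the three criteria as defined in this section can be written as simple predicates. The function and parameter names are hypothetical, ALP and AST are assumed to be expressed as multiples of the ULN, TBIL is assumed to be in mg/dL, and the example patient is invented.

```python
def meets_mayo(alp_xuln: float) -> bool:
    """Mayo criteria as defined in this study: ALP < 2.0 x ULN."""
    return alp_xuln < 2.0


def meets_paris_ii(alp_xuln: float, ast_xuln: float, tbil_mg_dl: float) -> bool:
    """Paris II criteria as defined in this study: all three conditions must hold."""
    return alp_xuln < 1.5 and ast_xuln < 1.5 and tbil_mg_dl < 1.0


def meets_toronto(alp_xuln: float) -> bool:
    """Toronto criteria as defined in this study: ALP < 1.67 x ULN."""
    return alp_xuln < 1.67


# Hypothetical patient: ALP 1.6 x ULN, AST 1.2 x ULN, TBIL 0.8 mg/dL.
patient = {"alp_xuln": 1.6, "ast_xuln": 1.2, "tbil_mg_dl": 0.8}
status = {
    "Mayo": meets_mayo(patient["alp_xuln"]),
    "Paris II": meets_paris_ii(**patient),
    "Toronto": meets_toronto(patient["alp_xuln"]),
}
print(status)
```

The invented patient shows how the criteria can disagree: an ALP of 1.6 × ULN satisfies the Mayo and Toronto thresholds but not the stricter Paris II threshold.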
Clinical data, including demographic characteristics, objective symptoms, and laboratory findings, were extracted from the electronic medical records of the included patients. Cirrhosis was diagnosed based on histological evidence or imaging findings obtained via ultrasound, computed tomography, or magnetic resonance imaging. Liver histology was staged as early (stage I/II) or late (stage III/IV) according to the Ludwig classification.19 Data for the internal development cohort were collected from Xijing Hospital, Fourth Military Medical University (Xi’an, China), while the external validation cohort data were sourced from the Second Affiliated Hospital of Kunming Medical University (Kunming, China), Beijing You’an Hospital affiliated with Capital Medical University (Beijing, China), and the First Affiliated Hospital of China Medical University (Shenyang, China).
Follow-up assessments were conducted at 1, 3, 6, and 12 months after UDCA initiation, and annually thereafter. Liver-related clinical events (including death, LT, and hepatic complications) were ascertained through telephone follow-up.
The study was conducted in accordance with the principles of the Declaration of Helsinki (as revised in 2024) and was approved by the Ethics Committee of Xijing Hospital (approval number: KY20253468-1). Informed consent was obtained from all participants. This study was reported according to the Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) guidelines for cohort studies. A completed STROBE checklist is provided as Supplementary File 1.
Statistical analysis
Statistical analyses were performed using R (version 4.5.0) and Python (version 3.13.3). Categorical variables are presented as frequencies and percentages. Continuous variables are expressed as mean ± standard deviation for normally distributed data, and as median with interquartile range for non-normally distributed data. Comparisons of continuous variables were conducted using the Student’s t-test for normally distributed data and the Mann–Whitney U test for non-normally distributed data. Differences in categorical variables were assessed using the chi-squared test or Fisher’s exact test, as appropriate.
Survival analyses were conducted using the Kaplan–Meier method, with group comparisons assessed via the log-rank test. Multivariate Cox proportional hazards models were applied to evaluate the association between covariates and the primary composite endpoint, with results reported as hazard ratios (HRs) and 95% confidence intervals (CIs). Patient risk stratification over time was visualized using a Sankey diagram, while temporal trends were assessed through segmented Poisson regression. The predictive performance of each response criterion for ALP normalization at 12 months post-UDCA initiation was evaluated by calculating sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV). Regarding missing data, minimal missingness among covariates in the multivariate Cox model (Supplementary Fig. 1) rendered imputation unnecessary during the development phase; affected cases were excluded from the analysis. In contrast, for the longitudinal follow-up data, missing values were addressed using Markov chain Monte Carlo (MCMC) multiple imputation. This method, which relies on observed state transition probabilities, generated five complete datasets, and final estimates were obtained by pooling the results of these analyses. The robustness of the imputation approach was evaluated through sensitivity analyses comparing imputed and non-imputed outcomes (Supplementary Tables 2–3 and Supplementary Figs. 2–3). All statistical tests were two-sided, with a significance threshold of P < 0.05.
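For readers less familiar with the Kaplan–Meier method, the product-limit estimate can be sketched in a few lines of Python. This is a generic illustration rather than the study's code; the function name and the five follow-up times (in months) are hypothetical, and an established survival package would normally be used in practice.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct event time (product-limit estimate).

    times  : follow-up time for each patient
    events : 1 if the endpoint occurred at that time, 0 if censored
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0
        n_leaving = 0
        # Group all patients sharing this follow-up time.
        while i < len(data) and data[i][0] == t:
            n_events += data[i][1]
            n_leaving += 1
            i += 1
        if n_events:
            survival *= 1 - n_events / n_at_risk
            curve.append((t, survival))
        # Events and censored patients both leave the risk set after time t.
        n_at_risk -= n_leaving
    return curve


# Hypothetical follow-up (months) for five patients; 0 marks censoring.
times = [6, 12, 12, 24, 30]
events = [1, 0, 1, 1, 0]
curve = kaplan_meier(times, events)
for t, s in curve:
    print(f"month {t}: S = {s:.2f}")
```

Patients censored at a given time are counted as still at risk at that time, which is the usual convention for tied event and censoring times.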
Results
Study population characteristics
Figure 1 illustrates the study flowchart. Of the 588 patients initially screened in the internal development cohort, 375 met the eligibility criteria. The remaining 213 were excluded based on the predefined exclusion criteria. The external validation cohort included 70 eligible patients from three regional centers. The demographic and clinical characteristics of both cohorts are summarized in Table 1. The mean age was 52.63 ± 9.69 years in the internal development cohort and 48.92 ± 9.95 years in the external validation cohort (P = 0.038). In terms of gender distribution, 88.5% of patients in the development cohort and 81.4% in the validation cohort were female; this difference was not statistically significant (P = 0.147). For histological staging, 87.9% (226/257) of patients in the development cohort were classified as stage I–II, while 12.1% (31/257) were classified as stage III–IV according to the Ludwig system. In the validation cohort, 76.5% (13/17) of patients were classified as stage I–II, and 23.5% (4/17) as stage III–IV. The difference in histological staging between the development and validation cohorts was not statistically significant (P = 0.145). Additionally, the development cohort exhibited lower platelet and red blood cell counts, a lower positive rate of anti-sp100 antibodies, and reduced levels of ALT, AST, and GGT compared with the validation cohort (P = 0.002, P = 0.005, P = 0.012, P = 0.006, P = 0.038, and P = 0.010, respectively).
Table 1. Baseline characteristics of the internal development and external validation cohorts
| Characteristics | Internal development cohort (N = 375) | External validation cohort (N = 70) | P-value |
|---|---|---|---|
| Age, years | 52.63 ± 9.69 | 48.92 ± 9.95 | 0.038 |
| Female (%) | 332 (88.5) | 57 (81.4) | 0.147 |
| RBC, 10¹²/L | 4.00 (3.65–4.29) | 4.28 (3.75–4.57) | 0.005 |
| HGB, g/L | 120.00 (109.50–132.00) | 123.00 (113.25–135.75) | 0.100 |
| PLT, 10⁹/L | 141.00 (87.00–195.50) | 203.50 (138.00–246.50) | 0.002 |
| ALT, IU/L | 50.00 (29.00–85.00) | 65.90 (45.10–97.00) | 0.006 |
| AST, IU/L | 55.00 (36.00–83.00) | 67.00 (42.00–106.00) | 0.038 |
| TBIL, µmol/L | 16.85 (11.85–26.65) | 18.90 (13.70–24.40) | 0.158 |
| ALP, IU/L | 205.00 (127.00–330.25) | 250.00 (152.00–392.00) | 0.158 |
| GGT, IU/L | 209.00 (101.00–384.50) | 312.00 (159.60–548.00) | 0.010 |
| IgG, g/L | 15.50 (13.05–19.55) | 16.00 (13.03–19.91) | 0.399 |
| IgM, g/L | 3.01 (2.11–5.04) | 3.17 (2.19–5.16) | 0.537 |
| Autoantibodies (positive, %) | | | |
| AMA | 240/282 (85.1) | 21/24 (87.5) | 0.876 |
| ANA | 270/291 (92.8) | 32/39 (82.1) | 0.053 |
| AMA-M2 | 176/288 (61.1) | 26/37 (70.3) | 0.234 |
| gp210 | 88/276 (31.9) | 10/21 (47.6) | 0.123 |
| sp100 | 35/276 (12.7) | 7/21 (33.3) | 0.012 |
| Histological stage (%)a | | | |
| Early-stage (I–II) | 226/257 (87.9) | 13/17 (76.5) | 0.145 |
| Late-stage (III–IV) | 31/257 (12.1) | 4/17 (23.5) | 0.145 |
Notably, the internal development cohort was followed for a median duration of 76.9 months. During this period, 69 serious clinical events were recorded, including 22 deaths and two liver transplants. This extensive follow-up provided robust survival data for subsequent analyses.
Prognostic significance of ALP normalization after 12 months of treatment
Kaplan–Meier analyses (Fig. 2) were conducted to assess complication-free survival rates stratified by ALP levels at one year (>1.67 × ULN, 1.0–1.67 × ULN, and ≤1.0 × ULN). Complication-free survival was inversely correlated with ALP levels, with rates of 62.8% in the >1.67 × ULN group, 79.8% in the 1.0–1.67 × ULN group, and 89.8% in the ≤1.0 × ULN group. Patients achieving ALP normalization (≤1.0 × ULN) demonstrated the lowest risk of LT, liver-related mortality, or hepatic complications. Statistically significant differences were found between the normalized group and those with ALP levels between 1.0 and 1.67 × ULN (P = 0.016), as well as between the medium-risk group and those with ALP levels exceeding 1.67 × ULN (P = 0.014).
Following the Kaplan–Meier analysis, a multivariable Cox regression was conducted focusing on patients with ALP levels below 1.67 × ULN after 12 months of treatment (Table 2). Several binary variables were examined, including ALP > 1.0 × ULN, age > 50 years, TBIL > 0.6 × ULN, and ALT > 1.0 × ULN. ALP levels between 1.0 and 1.67 × ULN emerged as a significant predictor of adverse outcomes (HR = 2.27; 95% CI: 1.21–4.26; P = 0.011), as did age over 50 years (HR = 4.27; 95% CI: 1.66–10.96; P = 0.003). In contrast, neither ALT > 1.0 × ULN (HR = 0.45; 95% CI: 0.16–1.27; P = 0.133) nor TBIL > 0.6 × ULN (HR = 2.37; 95% CI: 0.73–7.73; P = 0.152) was significantly associated with increased risk.
Table 2. Time-dependent, multivariable-adjusted Cox regression analysis of poor clinical outcomes in patients with ALP < 1.67 × ULN
| Parameter | Events/N | HR | SE | z | P > \|z\| | 95% CI |
|---|---|---|---|---|---|---|
| Age > 50 y | 38/190 | 4.2686 | 0.4813 | 3.015 | 0.003 | 1.66–10.96 |
| ALP > 1.0 × ULN | 24/119 | 2.2686 | 0.3210 | 2.552 | 0.011 | 1.21–4.26 |
| TBIL > 0.6 × ULN | 39/242 | 2.3703 | 0.6028 | 1.432 | 0.152 | 0.73–7.73 |
| ALT > 1.0 × ULN | 4/54 | 0.4503 | 0.5310 | −1.503 | 0.133 | 0.16–1.27 |
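The columns of Table 2 are linked by the standard Wald relationships for a Cox model: z is the log hazard ratio divided by its standard error, and the 95% CI is obtained by exponentiating the log hazard ratio ± 1.96 standard errors. The short Python sketch below recomputes z and the CI from the reported HR and SE values; the function name is hypothetical, and the normal critical value of 1.96 is an assumption about how the intervals were derived.

```python
import math


def wald_summary(hr: float, se: float, z_crit: float = 1.96):
    """Return (z, lower, upper) for a hazard ratio and the SE of its logarithm."""
    beta = math.log(hr)  # Cox coefficient on the log scale
    z = beta / se
    lower = math.exp(beta - z_crit * se)
    upper = math.exp(beta + z_crit * se)
    return z, lower, upper


# HR and SE values copied from Table 2.
rows = {
    "Age > 50 y": (4.2686, 0.4813),
    "ALP > 1.0 x ULN": (2.2686, 0.3210),
    "TBIL > 0.6 x ULN": (2.3703, 0.6028),
    "ALT > 1.0 x ULN": (0.4503, 0.5310),
}

results = {name: wald_summary(hr, se) for name, (hr, se) in rows.items()}
for name, (z, lower, upper) in results.items():
    print(f"{name}: z = {z:.3f}, 95% CI {lower:.2f} to {upper:.2f}")
```

Under that assumption, the recomputed values agree with the z and 95% CI columns of Table 2 after rounding.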
Analysis of the critical time window for patient risk transitions
A Sankey diagram was employed to visualize the dynamic shifts in risk levels among 375 PBC patients over a 12-month follow-up period (Fig. 3). Risk stratification was based on ALP levels, categorized as high risk (>1.67 × ULN), medium risk (1.0–1.67 × ULN), and low risk (≤1.0 × ULN). The width of the arrows corresponds to the number of patients transitioning between risk groups. Missing data were addressed using MCMC multiple imputation.
The high-to-medium risk transition rate showed a biphasic pattern over the 12-month follow-up period, with an initial rapid decline followed by a plateau (Fig. 4A). Segmented Poisson regression analysis identified a significant joinpoint at 3.73 months (P = 0.003; AIC improvement = 7.8, Supplementary Table 4). The trend was divided into two distinct periods: an initial rapid decline phase from baseline to approximately 3.7 months, followed by a plateau phase in which the transition rate remained stable (monthly percent change (MPC) = −0.09%, 95% CI: −3.52 to 0.45%, Supplementary Table 4), highlighting that the first 3 months following intervention constitute a critical window for transitioning high-risk patients.
The downward trend observed in the medium-risk patient group followed a similar pattern, with segmented Poisson regression identifying a changepoint at approximately 5.5 months (P = 0.043, Fig. 4B). The trend evolved through two periods: a declining phase from baseline to about 5.5 months, after which the rate of decline substantially weakened, forming a plateau (MPC = −0.30%, 95% CI: −10.92 to 0.36%; Supplementary Table 5). These findings suggest that intervention strategies should be adjusted around month 6 to address the observed plateau in treatment efficacy. The conclusions based on non-imputed data were consistent (Supplementary Tables 2–3, Supplementary Figs. 2–3), supporting the robustness and reliability of the results.
Predictive performance of biochemical criteria for ALP normalization
Based on the critical time windows identified at 3 and 6 months for changes in patient risk, we calculated sensitivity, specificity, PPV, and NPV to assess the effectiveness of the three criteria.
The calculation method described by Corpechot et al. was employed, defining biochemical response as a positive test and ALP normalization at 12 months as the event of interest.7
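Under this definition, the four measures follow from a 2 × 2 table of early biochemical response (the test) against ALP normalization at 12 months (the event). The following Python sketch shows the arithmetic; the function name and the patient counts are hypothetical and are not taken from the study cohorts.

```python
def diagnostic_metrics(pairs):
    """Compute sensitivity, specificity, PPV, and NPV.

    pairs: iterable of (responder_early, normalized_at_12m) booleans, where a
    positive test is meeting the response criterion at month 3 or 6 and the
    event is ALP normalization at 12 months.
    """
    pairs = list(pairs)
    tp = sum(1 for test, event in pairs if test and event)
    fp = sum(1 for test, event in pairs if test and not event)
    fn = sum(1 for test, event in pairs if not test and event)
    tn = sum(1 for test, event in pairs if not test and not event)
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }


# Hypothetical cohort of 100 patients: 40 TP, 20 FP, 10 FN, 30 TN.
cohort = (
    [(True, True)] * 40
    + [(True, False)] * 20
    + [(False, True)] * 10
    + [(False, False)] * 30
)
metrics = diagnostic_metrics(cohort)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

In this framing, a high NPV means that patients who fail a criterion early rarely go on to normalize ALP, which is the property used to flag candidates for treatment escalation.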
At month 3, both the Mayo and Toronto criteria demonstrated excellent NPVs (≥0.95) and sensitivity (0.98), indicating that patients failing these criteria were highly unlikely to achieve normalization. Notably, the Toronto criteria exhibited higher specificity than the Mayo criteria (44% vs. 33%), making them more advantageous for early screening to balance timely intervention with the avoidance of overtreatment. By month 6, the Mayo and Toronto criteria reached 100% NPV, but their specificity remained limited. In contrast, the Paris II criteria achieved the highest specificity (73%), providing superior accuracy in identifying high-risk non-responders who require intensified treatment, thereby minimizing the risk of overlooking patients in need of escalation (Table 3).
Table 3. Predictive performance of biochemical response criteria at months 3 and 6 for 12-month ALP normalization in the PBC internal development cohort
| Criterion | Sensitivity | Specificity | PPV | NPV |
|---|---|---|---|---|
| Month 3 | | | | |
| Mayo criteria | 0.98 | 0.33 | 0.60 | 0.96 |
| Paris II criteria | 0.67 | 0.75 | 0.73 | 0.69 |
| Toronto criteria | 0.98 | 0.44 | 0.64 | 0.95 |
| Month 6 | | | | |
| Mayo criteria | 1.00 | 0.31 | 0.60 | 1.00 |
| Paris II criteria | 0.71 | 0.73 | 0.73 | 0.71 |
| Toronto criteria | 1.00 | 0.44 | 0.65 | 1.00 |
External validation results further supported the robustness of our findings: the Toronto criteria achieved an NPV of 0.94 at month 3, and the Paris II criteria demonstrated a specificity of 63% at 6 months, comparable to the 73% specificity observed in the development cohort. These results confirm the consistency of the core findings across different populations (Table 4).
Table 4. Predictive performance of biochemical response criteria at months 3 and 6 for 12-month ALP normalization in the PBC external validation cohort
| Criterion | Sensitivity | Specificity | PPV | NPV |
|---|---|---|---|---|
| Month 3 | | | | |
| Mayo criteria | 0.91 | 0.52 | 0.50 | 0.92 |
| Paris II criteria | 0.58 | 0.87 | 0.70 | 0.80 |
| Toronto criteria | 0.92 | 0.70 | 0.61 | 0.94 |
| Month 6 | | | | |
| Mayo criteria | 0.90 | 0.32 | 0.41 | 0.86 |
| Paris II criteria | 0.50 | 0.63 | 0.42 | 0.71 |
| Toronto criteria | 0.80 | 0.42 | 0.42 | 0.80 |
Discussion
Normalization of serum ALP levels is associated with better long-term outcomes in PBC patients treated with UDCA14 and serves as an important prognostic indicator.20 This study confirms that UDCA-treated PBC patients with ALP levels between 1.0 and 1.67 × ULN remain at risk of poor outcomes. In addition, age over 50 years was identified as an independent risk factor for poor prognosis among patients with ALP < 1.67 × ULN. Importantly, we identified two critical dynamic risk transition windows: (1) after 3 months of treatment, patients remaining at high risk (ALP > 1.67 × ULN) are less likely to transition to medium risk (ALP 1.0–1.67 × ULN); and (2) after 6 months, patients remaining at medium risk exhibit a significant decline in normalization rates. Based on these time windows, we further assessed the potential for ALP normalization in UDCA-treated PBC patients: (1) at 3 months, patients with ALP > 1.67 × ULN have a markedly reduced likelihood of ALP normalization; and (2) at 6 months, non-responders according to the Paris II criteria (ALP < 1.5 × ULN, AST < 1.5 × ULN, and TBIL < 1 mg/dL) similarly have significantly limited potential for normalization.
Data from our study cohort confirmed that patients with normalized ALP levels had better outcomes than those with ALP between 1.0 and 1.67 × ULN.13,14 Murillo-Pérez et al. initially reported that both TBIL ≤ 0.6 × ULN and ALP ≤ 1.0 × ULN were associated with improved survival in the overall PBC population, and that TBIL ≤ 0.6 × ULN conferred a protective effect even among patients with ALP < 1.67 × ULN.13 However, proteomic research by Jones et al. found that, even under the most stringent current response criteria (the Paris II criteria), a subset of patients labeled as “UDCA responders” exhibits persistently elevated disease biomarkers compared to those achieving ALP and bilirubin normalization, with no significant difference observed between patients with TBIL ≤ 0.6 × ULN and 0.6–1 × ULN when ALP levels are in the normal range.21 Our multivariate analysis revealed that, after adjusting for confounding factors such as age, ALP normalization emerged as an independent protective factor within the subgroup with ALP < 1.67 × ULN, whereas TBIL ≤ 0.6 × ULN was not a consistently stable protective indicator among responders. Therefore, complete ALP normalization should be prioritized as the primary treatment goal, with precise TBIL management as a secondary objective, especially in cases where ALP targets are unmet. This dependency likely arises from the distinct pathological roles of the two markers22–24: ALP elevation directly reflects cholestatic activity through damaged cholangiocyte membrane shedding and progressive bile duct damage. Conversely, elevated TBIL indicates impaired bilirubin absorption or conjugation in hepatocytes, or severe excretory failure, reflecting advanced hepatocellular dysfunction. Thus, ALP normalization captures early, modifiable ductal pathology, whereas TBIL reduction may signal irreversible hepatic dysfunction.
Notably, our data indicated that within the subgroup of responders with ALP < 1.67 × ULN, multivariate Cox regression analysis identified age over 50 years as an independent risk factor for poor prognosis (HR = 4.27; 95% CI: 1.66–10.96; P = 0.003), highlighting the need for increased clinical vigilance in older patients.
The duration and magnitude of biochemical abnormalities significantly impact prognosis in PBC patients.25 Therefore, patients who fail to achieve biochemical normalization should be identified and managed promptly. Our longitudinal risk dynamics analysis further identified two critical time windows: 3 months and 6 months after initial UDCA treatment. The minimal direct high-to-low risk transitions (<3%) reinforce the concept that risk reduction in PBC is typically a gradual process rather than an abrupt shift. Thus, ongoing monitoring within the first 6 months after initial UDCA treatment is highly recommended. Our data are also consistent with previous reports of an early rapid decline in biochemical markers following UDCA initiation26; however, our segmented Poisson regression provides a more granular dynamic perspective, indicating that this initial decline phase extends to approximately 3 months before reaching a plateau. Herein, we emphasize the importance of this early period for assessing drug sensitivity in baseline high-risk patients and establish a high NPV threshold (ALP ≥ 1.67 × ULN) to accurately identify those unlikely to achieve biochemical normalization. Zhang et al.27 suggested that predictive efficacy is reached at 6 months, marking the stabilization of patient indicators post-treatment. Through trend analysis, our study highlights the critical significance of this six-month time point for biochemical normalization. We also note that the Paris II criteria allow for effective screening of patients requiring intervention at this stage.
Of note, Corpechot et al. reported that the survival benefit associated with ALP normalization was restricted to patients with liver stiffness measurement > 10 kPa and age < 62 years, suggesting that this benefit may be most pronounced in those with more advanced fibrosis. In contrast, our cohort predominantly comprised patients with early-stage fibrosis (Ludwig stage I–II, 87.9%), yet ALP normalization was still independently associated with improved complication-free survival. This discrepancy may reflect differences in patient selection, disease stage distribution, or follow-up duration between the two studies. Future studies should prospectively examine whether the prognostic value of ALP normalization is modified by fibrosis severity and patient age.
In summary, this study proposes a time-window refinement model grounded in risk evolution, offering an evidence-based framework to guide individualized intensive therapy. However, several limitations remain: (a) the efficacy of this time window–guided approach in improving hard clinical endpoints requires validation in larger prospective cohorts; (b) the limited sample size of the liver pathology repository restricts deeper investigation into the mechanisms underlying the conversion from histological to biochemical response; (c) several baseline differences were observed between the internal and external cohorts, including lower platelet counts and liver enzyme levels in the development cohort, which may partly reflect differences in disease severity or sample collection protocols across centers. Additionally, the relatively small size of the external validation cohort (n = 70) may limit the generalizability of the validation findings. Nevertheless, such an approach will allow the development of a stratified intervention strategy by identifying critical time windows—the third and sixth months of treatment—that balance precision therapy with the avoidance of overtreatment.
Conclusions
This study establishes ALP normalization (≤1.0 × ULN) as a superior therapeutic target in PBC, demonstrating a 10.0 percentage-point improvement in complication-free survival (89.8% vs. 79.8% at median follow-up) compared with patients maintaining ALP levels between 1.0 and 1.67 × ULN. These findings indicate that current biochemical response criteria may inadequately stratify risk, as a substantial proportion of patients classified as “responders” remain at elevated risk for adverse outcomes.
Our longitudinal analysis identified two critical intervention windows for treatment intensification. At 3 months, patients with ALP ≥ 1.67 × ULN show markedly reduced normalization probability (NPV 95% by Toronto criteria), warranting early consideration of second-line therapies. At 6 months, failure to meet Paris II criteria reliably identifies patients unlikely to achieve normalization (specificity 73%). This dual time window–driven strategy enables timely intervention while avoiding premature treatment escalation.
The clinical implications are threefold: treatment goals should prioritize complete ALP normalization; biochemical monitoring at 3 and 6 months provides actionable decision points for therapy intensification; and older patients (>50 years) warrant enhanced surveillance despite meeting conventional response criteria. While prospective validation is needed, this time window–based framework offers a practical approach to implementing precision medicine in PBC management, potentially reducing liver-related complications and improving long-term outcomes.
Supporting information
Supplementary Table 1
Numbers of patients per center (original cohort).
(DOCX)
Supplementary Table 2
Estimated parameters of the segmented Poisson model for high-to-medium risk transition (Pre-Imputation Data).
(DOCX)
Supplementary Table 3
Estimated parameters of the segmented Poisson model for medium-to-low risk transition (Pre-Imputation Data).
(DOCX)
Supplementary Table 4
Estimated parameters of the segmented Poisson model for high-to-medium risk transition (Post-Imputation Data).
(DOCX)
Supplementary Table 5
Estimated parameters of the segmented Poisson model for medium-to-low risk transition (Post-Imputation Data).
(DOCX)
Supplementary File 1
STROBE Statement—Checklist of items that should be included in reports of cohort studies
(DOCX)
Supplementary Figure 1
Histogram of Missing Data Rates for Covariates in Cox Proportional Hazards Regression Analysis.
(DOCX)
Supplementary Figure 2
Analysis of the dynamic migration path of risk levels within one year (Pre-Imputation Data).
The Sankey diagram depicts the transitions between risk categories over 12 months for 318 PBC patients, who underwent six follow-up assessments (baseline and at 1, 3, 6, 9, and 12 months). The width of the arrows corresponds to the number of patients, and the labels indicate absolute counts (N) and percentages (%) relative to the total population of the preceding node. The risk categories are defined as follows: high risk (orange, ALP > 1.67 × ULN), medium risk (pink, ALP 1.0–1.67 × ULN), and low risk (blue, ALP ≤ 1.0 × ULN); gray indicates patients whose follow-up records were missing at that time point. Minor transitions (<5% of total flow) were aggregated as “other pathways” and omitted from the labels. Abbreviations: ALP: alkaline phosphatase; ULN: upper limit of normal.
(DOCX)
Supplementary Figure 3
Segmented Poisson regression analysis of recovery rates over 24-month follow-up (Pre-Imputation Data).
The segmented Poisson regression plots illustrate the temporal trends in transition rates over the 24-month period. Solid lines represent the fitted models; circles indicate observed recovery rates. Vertical dashed lines mark the automatically estimated joinpoints. In panel A, the transition rate from high- to medium-risk showed a decline phase from baseline to the joinpoint at 3.7 months (P = 0.003), followed by a plateau phase thereafter. In panel B, the transition rate from medium- to low-risk also displayed two distinct phases: a decline from baseline to the joinpoint at 5.8 months (P = 0.049), after which the trend stabilized. Follow-up time points (interval starting points) included baseline, 1, 3, 6, 9, 12, 15, 18, and 21 months.
(DOCX)
Declarations
Ethical statement
This study was approved by the Ethics Committee of Xijing Hospital (approval number: KY20253468-1). All procedures were conducted in accordance with the ethical standards of the Declaration of Helsinki (as revised in 2024) and its later amendments. Informed consent was obtained from all participants.
Data sharing statement
The data that support the findings of this study are not publicly available due to privacy and ethical restrictions. The datasets contain sensitive patient information that cannot be shared, even in de-identified form, according to institutional data protection policies.
Funding
This study was supported by Prevention and Control of Emerging and Major Infectious Diseases-National Science and Technology Major Project (2025ZD01906300 & 2025 ZD01906304), the National Natural Science Foundation of China (No. 82270551 to Ying Han, 82200577 to Yansheng Liu), the Innovation Capacity Support Program of Shaanxi Province (No. 2024RS-CXTD-79 to Yulong Shang, 2025ZC-KJXX-109 to Yansheng Liu), and the Key Research and Development Program of Shaanxi (2023-ZDLSF-33 to Yulong Shang).
Conflict of interest
YH has been an Editorial Board Member of Journal of Clinical and Translational Hepatology since 2013. The other authors have no conflicts of interest related to this publication.
Authors’ contributions
Conceptualization (HZ, YsL, YT, NW, YmL, MEG, YS, YH), methodology (HZ, YsL, LZ, RS, XW, JD, GJ, PSCL), formal analysis (HZ, YsL), investigation (HZ, YsL, YT, NW, YmL, YlL, CH, JD, YF), writing - original draft (HZ, YsL, YT, NW, YmL), writing - review and editing (HZ, YsL, MEG, YS, YH), project administration (HZ, YsL, LZ, RS, XW, JD, GJ, PSCL), data curation (HZ, YsL, YT, NW, YmL, YlL, CH, JD, YF, LZ, RS, XW, JD, GJ, PSCL), visualization (HZ, YsL), validation (YT, NW, YmL, YlL, CH, JD, YF), supervision (MEG, YS, YH), resources (YS, YH), and funding acquisition (YS, YH). All authors approved the final manuscript.