Introduction
Primary liver cancer is the sixth most common cancer worldwide and the fourth leading cause of cancer-related death. Hepatocellular carcinoma (HCC) is the most frequent histologic type, accounting for ∼90% of primary liver cancers.1 Surgical resection is the most effective first-line treatment for selected patients and is widely recommended by current guidelines.2,3 However, patients with surgically resected HCC remain at risk of recurrence, with an annual rate of ≥ 10% and a recurrence rate of 70–80% after 5 years.4 In addition, the reasons for postsurgical recurrence and how to prevent it are unresolved. Therefore, identifying potentially curable patients at high risk for postoperative recurrence is critical to improving long-term survival after HCC resection.
HCC recurrence is the main postoperative complication and is generally classified as either early (less than 2 years) or late (more than 2 years).5 Early recurrence occurs in 30–50% of patients, accounts for more than 70% of tumor recurrences, and is the major cause of mortality. Previous studies have shown that early recurrence of HCC is usually related to aggressive pathological features of the tumor, such as large tumor size, multiple tumors, poor cell differentiation, and macroscopic or microscopic vascular invasion.6 Other risk factors for HCC recurrence are cirrhosis, tumor size of > 5 cm, and portal vein invasion.7
The prognosis of HCC has traditionally been assessed by staging systems, such as the tumor-node-metastasis (TNM), Barcelona clinic liver cancer, and Hong Kong liver cancer systems.8–11 However, these staging systems are not applicable to patients after surgical treatment and therefore do not predict postoperative recurrence. A few models, including the Singapore liver cancer recurrence score and the surgery-specific cancer of the liver Italian program (SS-CLIP),12 have been developed specifically to predict tumor recurrence after surgical resection, but none of them has been externally validated.13
Machine learning (ML) algorithms are data-mining techniques that use artificial intelligence to evaluate and analyze data, and they can generate predictive models more efficiently and effectively than conventional methods by detecting hidden patterns within large data sets. Recent advances in ML have helped models learn the features represented in data and have improved model performance in different HCC domains, including disease prediction, disease classification, and clinical practice.14 Various model architectures have been used, such as logistic regression, k-nearest neighbor (K-NN), decision trees, naïve Bayes (NB), and deep neural networks (DNN).15 Several ML-based prognosis prediction methods have been described that use pathological information, micro (mi)RNA expression in exosomes, circulating miRNA information, or radiomics,16–19 but which tumor markers should be included in a surveillance program remains controversial. A more precise model for predicting prognosis and recurrence is urgently needed.
In this study, we enrolled pathologically confirmed HCC patients to investigate the factors that are associated with tumor recurrence and to develop a prognostic model to improve the predictive accuracy for HCC recurrence using ML. We hope the model will provide clinicians with an appropriate surveillance tool for early detection of HCC recurrence and treatment.
Methods
Patient population
Of the 312 HCC patients diagnosed between September 2016 and June 2018 at Shandong Provincial Hospital, 220 were recruited in this retrospective study. Patients were eligible for inclusion if they (1) had HCC diagnosed by liver biopsy, (2) had no other tumors on preoperative CT evaluation, and (3) were receiving initial treatment. Patients were excluded if they (1) had cholangiocarcinoma, (2) had metastasis, (3) had no postsurgical follow-up, (4) were younger than 18 years of age, or (5) had imaging evidence of recurrence within 2 months after treatment. All patients with HCC enrolled in this study were diagnosed by pathological evaluation. The study was approved by the local Hospital Ethics Committee, and patient informed consent was waived when data were collected. Figure 1 is a flow chart of patient selection. Patients were divided into two study groups by HCC recurrence and followed up until recurrence of HCC, death, or study conclusion on August 31, 2019. HCC recurrence was defined by clinical, radiological, and/or pathological diagnosis.
Dataset preparation
We collected patient-related clinical, laboratory, and radiological information from medical records and at follow-up visits (Tables 1 and 2). Thirty-seven patient characteristics were collected, including age, etiology, treatment strategy, degree of tumor differentiation, tumor size, number of tumors, platelet count (PLT), alkaline phosphatase (ALP), total bilirubin, prothrombin time (PT), alpha-fetoprotein (AFP), aspartate aminotransferase (AST), white blood cell (WBC) count, protein induced by vitamin K absence or antagonist-II (PIVKA-II), HBsAg, and others.
Table 1. Patient characteristics
Characteristics | All patients (N=220) | Patients with recurrence (N=89) | Patients without recurrence (N=131) | p-value |
---|---|---|---|---|
Age | 56.65±10.39 | 55.89±10.63 | 57.16±10.23 | 0.37 |
Sex | | | | |
Male | 192 (87.27%) | 76 (85.39%) | 116 (88.55%) | 0.49 |
Female | 28 (12.73%) | 13 (14.61%) | 15 (11.45%) | |
Follow-up time, months | 7.64±8.04 | 9.71±7.97 | 14.0±6.36 | < 0.001 |
Hypertension | 56 (25.45%) | 24 (26.97%) | 32 (24.42%) | 0.67 |
Diabetes | 27 (12.27%) | 12 (13.48%) | 15 (11.45%) | 0.65 |
Fatty liver | 9 (4.09%) | 2 (2.25%) | 7 (5.34%) | 0.25 |
Cirrhosis | 186 (84.55%) | 70 (78.65%) | 96 (73.28%) | 0.36 |
Family history of liver cancer | 14 (6.37%) | 7 (7.86%) | 7 (5.34%) | 0.45 |
Etiology | | | | |
Alcohol | 8 (3.64%) | 3 (3.37%) | 5 (3.82%) | 0.83 |
HBV | 131 (59.55%) | 54 (60.67%) | 77 (58.78%) | |
HCV | 5 (2.28%) | 1 (1.12%) | 4 (3.05%) | |
Alcohol and HBV | 64 (29.09%) | 25 (28.09%) | 39 (29.77%) | |
Others | 12 (5.45%) | 6 (6.74%) | 6 (4.58%) | |
Treatment strategy | | | | |
Tumor resection | 131 (59.55%) | 47 (52.80%) | 84 (64.12%) | 0.09 |
Resection and TACE | 89 (40.45%) | 42 (47.20%) | 47 (35.88%) | |
Portal vein tumor thrombus | | | | |
With | 41 (18.64%) | 26 (29.21%) | 15 (11.45%) | < 0.001 |
Without | 179 (81.36%) | 63 (70.79%) | 116 (88.55%) | |
Degree of tumor differentiation | | | | |
Poorly differentiated | 39 (17.72%) | 22 (24.72%) | 17 (12.98%) | 0.08 |
Moderately differentiated | 162 (73.64%) | 60 (67.42%) | 102 (77.86%) | |
Well differentiated | 19 (8.64%) | 7 (7.86%) | 12 (9.16%) | |
Tumor size | | | | |
≤5cm | 133 (60.45%) | 42 (47.19%) | 91 (69.47%) | < 0.001 |
>5cm | 87 (39.55%) | 47 (52.81%) | 40 (30.53%) | |
Number of tumors | | | | |
Solitary | 186 (84.55%) | 73 (82.02%) | 113 (86.26%) | 0.50 |
2–3 | 34 (15.45%) | 16 (17.98%) | 18 (13.74%) | |
Table 2. Patient laboratory findings
Variables | All patients (N=220) | Patients with recurrence (N=89) | Patients without recurrence (N=131) | p-value |
---|---|---|---|---|
White blood cell count, ×10⁹/L | 5.1 (2–82) | 5.2 (2.1–82) | 5.1 (2–15) | 0.62 |
Red blood cell count, ×10¹²/L | 4.7 (1.7–5.8) | 4.7 (1.7–5.8) | 4.7 (3.1–5.6) | 0.44 |
Hemoglobin, g/L | 14 (6–84) | 15 (10–84) | 14 (6–82) | 0.44 |
Platelet count, ×10⁹/L | 175.30±82.63 | 184.79±81.72 | 168.86±82.93 | 0.16 |
Alanine aminotransferase, U/L | 30.5 (10–581) | 37 (12–581) | 36 (10–209) | 0.64 |
Aspartate aminotransferase, U/L | 38 (9–317) | 38.0 (9–317) | 38.0 (16.00–249.00) | 0.29 |
Alkaline phosphatase, U/L | 76.5 (12–968) | 94 (23–968) | 61 (12–807) | 0.005 |
γ-glutamyl transpeptidase, U/L | 104 (14–619) | 106 (14–427) | 103 (19–619) | 0.06 |
Total bilirubin, µmol/L | 17 (7–74) | 16 (7–47) | 18 (7–74) | 0.06 |
Direct bilirubin, µmol/L | 3 (1–97) | 3 (1–97) | 3 (1–64) | 0.77 |
Indirect bilirubin, µmol/L | 13 (5–61) | 13 (5–61) | 14 (5–56) | 0.06 |
ALB, g/L | 41.59±5.18 | 41.55±4.41 | 41.85±5.65 | 0.38 |
Glucose, mmol/L | 5.0 (2–14) | 5.0 (4–13) | 5.0 (2–14) | 0.41 |
Cholesterol | 4.39±1.39 | 4.60±1.37 | 4.23±1.39 | 0.27 |
Triglycerides, mmol/L | 0.88 (0.3–2.79) | 0.77 (0.3–1.8) | 0.9 (0.42–2.79) | 0.04 |
High-density lipoprotein, mmol/L | 1.21 (0.37–4.06) | 1.25 (0.4–4.06) | 1.20 (0.37–3.19) | 0.95 |
Low-density lipoprotein, mmol/L | 2.59±0.93 | 2.73±0.97 | 2.49±0.89 | 0.30 |
PT, s | 13 (10–18) | 13 (10–17) | 13 (10–18) | 0.58 |
PTA, % | 85.45±13.36 | 85.23±13.62 | 85.61±13.23 | 0.84 |
Alpha-fetoprotein, ng/mL | 27.0 (1.1–998.0) | 59 (1.5–919.0) | 15.0 (1.1–998.0) | 0.001 |
PIVKA-II, ng/mL | 604.81 (9.38–75,000) | 1,519.5 (16.00–75,000) | 355.29 (9.38–75,000) | 0.001 |
Fibrosis-4 (FIB-4) | | | | |
Low (<1.45) | 43 (19.55%) | 20 (22.47%) | 23 (17.56%) | 0.41 |
Intermediate (1.45–3.25) | 110 (50.0%) | 46 (51.69%) | 64 (48.85%) | |
High (>3.25) | 67 (30.45%) | 23 (25.84%) | 44 (33.59%) | |
HBsAg, IU/mL | 5,790.5 (0.39–8,724.0) | 5,828.0 (0.41–8,002.0) | 5,122.0 (0.39–8,724) | 0.78 |
Evaluation metrics
We used logistic regression, K-NN, decision tree, NB, and DNN models to predict the recurrence of HCC from the patient information. The training cohort included 176 of the 220 patients; the testing cohort included the remaining 44. The training set contains the known outcome for each patient, from which the model learns a mapping that it then generalizes to new data. The algorithm flow is shown in Figure 2. The prediction results were evaluated with four metrics: accuracy (Acc), precision (Prc), recall rate (TPR), and standard deviation (SD).
Acc is the ratio of the number of correctly classified samples to the total number of samples:
$$\mathrm{Acc}=\frac{TP+TN}{TP+TN+FP+FN}.$$
In the confusion matrix of classification results, TP represents the positive samples that are predicted to be positive by the model, FP represents the negative samples that are predicted to be positive by the model, FN represents the positive samples that are predicted to be negative by the model, and TN represents the negative samples that are predicted to be negative by the model. Prc is the ratio of the number of correctly classified positive instances to the number of instances classified as positive:
$$\mathrm{Prc}=\frac{TP}{TP+FP}.$$
TPR is the proportion of actual positive cases that are correctly classified:
$$\mathrm{TPR}=\frac{TP}{TP+FN}.$$
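As a minimal sketch of how the three ratios above follow from the confusion-matrix counts, the Python snippet below computes Acc, Prc, and TPR. The function name and the counts are hypothetical and chosen only so that they sum to the 44 patients of the testing cohort; they are not the study's results.

```python
def classification_metrics(tp, tn, fp, fn):
    """Return (Acc, Prc, TPR) computed from confusion-matrix counts."""
    total = tp + tn + fp + fn
    acc = (tp + tn) / total                      # correctly classified / all samples
    prc = tp / (tp + fp) if (tp + fp) else 0.0   # true positives / predicted positives
    tpr = tp / (tp + fn) if (tp + fn) else 0.0   # true positives / actual positives
    return acc, prc, tpr


# Hypothetical counts for a 44-patient test cohort (illustrative only).
acc, prc, tpr = classification_metrics(tp=10, tn=22, fp=4, fn=8)
print(f"Acc={acc:.3f}  Prc={prc:.3f}  TPR={tpr:.3f}")
```

With these illustrative counts, a model can reach a moderate accuracy while still missing a sizeable share of true recurrences, which is why recall is reported alongside accuracy and precision.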
The SD is the extent of dispersion of the accuracy of random tests:
$$\sigma=\sqrt{\frac{1}{N}\sum_{i=1}^{N}(x_i-\mu)^2},$$
where x1, x2, …, xN are real numbers, µ is the arithmetic mean, and σ is the SD.
Statistical analysis
Continuous variables were reported as means (SD) if they were normally distributed or as medians (IQR) if they were not. Categorical variables were reported as numbers and percentages (%). We assessed differences between patients with and without recurrence using two-sample t-tests or the Wilcoxon rank-sum test for continuous variables, depending on whether the data were parametric or nonparametric, and Fisher’s exact test for categorical variables. A two-sided α of less than 0.05 was considered statistically significant. The statistical analysis was performed with SPSS version 26.0 (IBM Corp., Armonk, NY, USA).
In building the predictive models, the Pearson correlation coefficient was used to find the independent predictors of recurrence among the 37 candidate features. The predictive models were built with five ML classification algorithms (logistic regression, K-NN, decision tree, NB, and DNN) using Python version 3.6.5.
Feature selection was performed by univariate analysis with the Pearson correlation coefficient. The Pearson coefficient between each patient characteristic and recurrence was calculated separately, and the characteristics with significant correlations were selected. The specific steps were as follows. To make the characteristics in the dataset D = {x1, x2, …, xm, y} numerically comparable, each was rescaled to [0, 1] using its minimum and maximum:
$$x_i\leftarrow\frac{x_i-\min x_i}{\max x_i-\min x_i},\quad y\leftarrow\frac{y-\min y}{\max y-\min y},\quad D\leftarrow\{x_1,x_2,\cdots,x_m,y\}.$$
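The rescaling step can be illustrated with a short Python sketch. It assumes each characteristic is held as a plain list of numbers; the function name, the variable names, and the tumor sizes are hypothetical, and the handling of a constant column (mapped to zeros) is an assumption not stated in the text.

```python
def min_max_scale(values):
    """Map a list of numbers onto [0, 1] using its own minimum and maximum."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Assumption: a constant characteristic carries no information, so map it to 0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical tumor sizes in cm (illustrative only).
tumor_size_cm = [2.0, 3.5, 5.0, 8.0]
scaled = min_max_scale(tumor_size_cm)
print(scaled)
```

After rescaling, characteristics measured in very different units (for example, tumor size in cm and PIVKA-II in ng/mL) contribute on the same scale to distance-based methods such as K-NN.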
The correlation ρ(xi, y) between each feature and the tag value was then calculated:
$$\rho(x_i,y)=\frac{\sum_{k=1}^{n}(x_{ik}-\bar{x}_i)(y_k-\bar{y})}{\sqrt{\sum_{k=1}^{n}(x_{ik}-\bar{x}_i)^2\,\sum_{k=1}^{n}(y_k-\bar{y})^2}},$$
where xik and yk represent the values of the k-th sample of characteristic xi and of the tag y, x̄i and ȳ represent their sample means, and n represents the total number of samples in the patient data.
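A minimal Python sketch of this correlation-based screening is given below. It computes the Pearson coefficient exactly as in the formula above and ranks characteristics by the absolute value of the coefficient. Ranking by absolute value, the function names, and the toy data are assumptions made for illustration; the text does not specify the selection threshold.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    denom = math.sqrt(var_x * var_y)
    return cov / denom if denom else 0.0  # constant column: no correlation


def rank_features(features, y):
    """Sort characteristics by |rho| with the tag y, strongest first."""
    scores = {name: pearson(col, y) for name, col in features.items()}
    return sorted(scores.items(), key=lambda kv: abs(kv[1]), reverse=True)


# Hypothetical, already rescaled data for four patients (illustrative only).
recurrence = [0, 0, 1, 1]
features = {
    "age": [0.5, 0.1, 0.4, 0.2],
    "tumor_size": [0.0, 0.25, 0.5, 1.0],
}
ranked = rank_features(features, recurrence)
print(ranked)
```

In this toy example, tumor size tracks the recurrence tag closely and is ranked first, whereas age is essentially uncorrelated with it.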
The eigenvectors and eigenvalues of the covariance matrix were then calculated, and the features with the largest influence were selected as the optimal feature subset. The final data set was constructed based on this feature subset.
The K-NN algorithm was constructed as follows:
For data set D, the distance L(di, dj) from each sample to be classified, di = (xi, yi), to all known samples dj was calculated:
$$L(d_i,d_j)=\left(\sum_{l=1}^{m}\left|x_i^{(l)}-x_j^{(l)}\right|^2+\left|y_i-y_j\right|^2\right)^{1/2}.$$
The neighbors of each sample were then sorted in ascending order of distance.
The k-nearest neighbors of each sample were obtained by determining the K value. According to the majority voting rule of the following formula, the sample x to be classified was assigned to the category with the largest number of samples among its neighbors:
$$C_x=\arg\max_{j\in l}\sum_{y\in Y}I(C_y=j),$$
where j represents the tag values of the different categories, Y represents the k-nearest neighbors of the sample x to be classified, and I(·) is the indicator function.
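The procedure above can be sketched in a few lines of Python. This is an illustrative sketch, not the study's implementation: the function names, the value k = 3, and the two-feature toy data are hypothetical, and the distance is computed over the feature values only, on the assumption that the tag of the sample being classified is unknown at prediction time.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two feature vectors of equal length."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def knn_predict(train_x, train_y, query, k=3):
    """Classify `query` by majority vote among its k nearest training samples."""
    # Sort the known samples by distance to the query, nearest first.
    neighbors = sorted(zip(train_x, train_y), key=lambda s: euclidean(s[0], query))
    # Count the tags of the k nearest samples and return the most frequent one.
    votes = Counter(label for _, label in neighbors[:k])
    return votes.most_common(1)[0][0]


# Hypothetical rescaled features (e.g. tumor size, AFP) and recurrence tags.
train_x = [[0.1, 0.2], [0.2, 0.1], [0.8, 0.9], [0.9, 0.8], [0.7, 0.7]]
train_y = [0, 0, 1, 1, 1]

print(knn_predict(train_x, train_y, [0.75, 0.80], k=3))
print(knn_predict(train_x, train_y, [0.15, 0.15], k=3))
```

An odd k avoids tied votes in a two-class problem such as recurrence versus no recurrence; in practice the value of k would be tuned on the training cohort.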
Results
Patient characteristics
The clinical characteristics of the patients are shown in Table 1. Most patients were men (192/220, 87.27%), and the mean age was 56.65 (SD = 10.39) years. Of the 220 HCC patients, 89 (40.5%) had recurrence and 131 (59.5%) did not. The mean follow-up time was 9.71 (SD = 7.97) months in patients with recurrence and 14.0 (SD = 6.36) months in patients without recurrence. Patients with recurrent HCC were more likely to have larger tumors (> 5 cm diameter, 52.81% vs. 30.53%, P < 0.001) and portal vein tumor thrombus (29.21% vs. 11.45%, P < 0.001). Some of the laboratory values obtained on admission also differed significantly between patients with recurrent and nonrecurrent HCC (Table 2).
Performance comparison
In principal component analysis, we found nine key factors affecting the recurrence of HCC: tumor size, tumor differentiation grade, portal vein tumor thrombus, PLT, AFP, PIVKA-II, AST, WBC, and HBsAg (Fig. 3). These nine factors and the recurrence results of the 176 patients in the training cohort were formed into a data set. The data set was input into the different ML algorithms (i.e. logistic regression, K-NN, decision tree, NB, and DNN) to form the ML models. Then the data of the 44 patients in the testing cohort were input into the five ML models for prediction. The prediction results from the different models were evaluated by comparing the model performance metrics. The accuracies of the K-NN (70.6%), NB (60.9%), decision tree (57.5%), logistic regression (67.9%), and DNN (64.9%) models are reported in Figure 4. After comparing the different ML methods, we chose the K-NN model as the optimal prediction model. In terms of accuracy and precision, the K-NN algorithm was superior to the other algorithms, with 70.6% Acc and 70.1% Prc. The TPR was 51.9% and the SD was 0.02.
Discussion
The ideal indication for resection is early solitary HCC, regardless of tumor size, with preserved liver function. Unfortunately, the rate of disease recurrence remains high, with early relapses considered to be "true relapses" and later relapses assumed to be mainly caused by de novo tumors. However, there is no reliable prediction tool for early HCC recurrence. In this study, we retrospectively evaluated 89 patients with early recurrence of HCC, who had different clinical characteristics and laboratory parameters compared with nonrecurrent patients. Using Pearson analysis, we found that early recurrence was mainly determined by aggressive characteristics of the primary (resected) tumor, including size and differentiation grade, and by higher serum AFP, PIVKA-II, PLT, AST, WBC, and HBsAg levels.
Currently, we can only use tumor markers such as AFP and PIVKA-II to determine HCC recurrence, because there is no useful postoperative recurrence marker. AFP is the most commonly used clinical biomarker of HCC, but its sensitivity and specificity are not ideal. AFP is a risk factor for the recurrence of HCC after radical treatment, and has been considered a better prognostic predictor than cancer morphology alone.20,21 PIVKA-II may play a role in the progression of HCC and is associated with HCC size, microvascular invasion, metastatic dissemination, and recurrence after tumor ablation. In fact, AFP levels are high in 40–60% of HCC patients and in 10–20% of early-stage tumors. It may also be elevated in many benign tumors.22–24 Other studies have shown that the performance of PIVKA-II in HBV-related HCC varies across populations, with a sensitivity of 44–91% and specificity of 68–99% at cutoff values between 40 and 150 mAU/mL.25 The evidence supports the need for more sensitive and specific HCC markers; no method to predict the recurrence of surgically resected HCC is currently available.
Given the validated, good discriminatory performance of AFP and PIVKA-II prediction models, we studied a novel predictor of HCC recurrence based on the AFP model and including 36 additional serological, pathological, and radiological patient features. Nine features (tumor size, tumor differentiation grade, portal vein tumor thrombus, PLT, AFP, PIVKA-II, AST, WBC, and HBsAg) were found to influence the recurrence of HCC. The accuracy, recall, and precision of the model were 70.6%, 51.9%, and 70.1%, respectively. The inclusion of more clinical markers might further improve the diagnostic accuracy.
In recent years, ML has developed rapidly and has contributed to outstanding achievements in disease prediction and clinical practice. ML algorithms can be used to predict the outcome of a new observation, based on a training dataset containing previous observations where the outcome is known. They can detect complex nonlinear relationships between numerous variables that are useful in predictive applications.26,27 Many research results show that prediction models based on ML significantly improve the accuracy of cancer diagnosis and prognosis prediction.28–30 In this study, after data training and performance comparisons, we found a novel, sensitive, and stable K-NN model to predict the recurrence of HCC after surgery. We believe that it can help to identify individuals who are at high risk of early recurrence after tumor resection. K-NN algorithms are very effective nonparametric models that are widely used for classification, regression, and pattern recognition. The K-NN method is highly appropriate for predicting HCC recurrence, especially with a large dataset covering chronic liver disease, tumor characteristics, and hepatic function. The K-NN model was the optimal prediction model, with 70.6% accuracy. When developing the model to predict the risk of patient recurrence, we input nine key factors (tumor size, tumor differentiation grade, portal vein tumor thrombus, PLT, AFP, PIVKA-II, AST, WBC, and HBsAg) into the K-NN algorithm, which was then able to automatically estimate the HCC recurrence risk of each patient.
This study has several limitations. It was limited by the small sample size and retrospective method. Some cases had incomplete documentation of laboratory testing, and most of the HCC patients included in our study had chronic hepatitis B infection. These limitations might have resulted in some bias in our general understanding of the disease. In addition, early and late recurrence were not distinguished in this study because of the relatively short follow-up. The two problems mentioned above can be resolved by additional study. A main limitation of ML algorithms is that they are best suited to predicting outcomes in the environment from which they are derived. Conversely, this limitation is also a strength, in that the model is highly specific to the peculiarities of a particular center, enabling the best decision for each individual patient.
In conclusion, we used ML to develop a K-NN model for predicting HCC recurrence that included a comprehensive evaluation of serological, pathological, and radiological features. The accuracy of this model was about 70.6%, which is much better than that of models using only clinical or serological data. This K-NN model was sensitive and stable when used to predict the recurrence of HCC in patients after surgical resection.
Abbreviations
- Acc: accuracy
- AFP: alpha-fetoprotein
- ALP: alkaline phosphatase
- AST: aspartate aminotransferase
- DNN: deep neural networks
- HCC: hepatocellular carcinoma
- JIS: Japan Integrated Staging
- K-NN: k-nearest neighbor
- PIVKA-II: protein induced by vitamin K absence or antagonist-II
- PLT: platelet count
- Prc: precision
- PT: prothrombin time
- RFA: radiofrequency ablation
- TACE: transarterial chemoembolization
- TNM: tumor-node-metastasis
- TPR: recall rate
- WBC: white blood cell
Declarations
Data sharing statement
All data are available upon request.
Funding
National Natural Science Fund (No.81970545; 82170609), Natural Science Foundation of Shandong Province (Major Project) (No. ZR2020KH006) and Ji’nan Science and Technology Development Project (No.2020190790).
Conflict of interest
JL has been an editorial board member of Journal of Clinical and Translational Hepatology since 2021. The other authors have no conflict of interests related to this publication.
Authors’ contributions
Guarantor of article (JL), study concept and study supervision (JL), data collection and/or data interpretation (CL, HY, YF, YC), data analysis (JL), manuscript drafting (CL, HY, YF). All authors read and revised the manuscript.