Introduction
Hepatocellular carcinoma (HCC) is the most common primary liver malignancy and one of the four most common causes of cancer-related death worldwide.1,2 HCC is the fastest-growing cause of cancer-related death in the USA, and it may become the third leading cause by 2030.3 In recent years, hepatitis B vaccination and antiviral treatment have been widely used,4,5 and many treatments for HCC are available, including surgery, ablation, transcatheter arterial chemoembolization (TACE), chemotherapy, targeted therapy, immunotherapy, and others.3,6 However, the symptoms of HCC are not easy to detect, and the overall prognosis is poor.7 HCC therefore requires early detection, accurate prediction, individualized treatment, and follow-up.
Artificial intelligence (AI) is a relatively new discipline that studies and develops theories, methods, technologies, and application systems for simulating, extending, and expanding human intelligence. Machine learning (ML) is the core of AI. ML enables a computer to simulate or realize human learning behavior, acquiring new knowledge or skills and reorganizing existing knowledge to improve its own performance.8 In the past decade, ML has been gradually applied to medical research and has made progress in many areas.9 In particular, ML research on cancers such as lung cancer and breast cancer is increasing, as is HCC-related research. ML studies in HCC address diagnosis, treatment, prognosis, and other aspects, and use a variety of algorithms including decision trees, support vector machines (SVMs), random forest, and deep learning.10 Applying ML to HCC can clarify how AI may contribute to HCC care and can be instrumental in the prevention and treatment of HCC. This review focuses on the application of ML to the diagnosis, treatment, and prognosis of HCC.
ML for the diagnosis of HCC
The diagnosis of HCC depends on pathology; for patients with chronic hepatitis B and liver cirrhosis, radiology can also help with diagnosis.3 However, radiologic diagnosis requires typical imaging features,11 and more than 10% of tumors lack the imaging hallmarks of HCC. If the imaging is not typical, a biopsy or a second contrast-enhanced study should be performed.12 Biopsy is an invasive procedure with a sensitivity of about 70%, and the sensitivity is lower for tumors with a diameter <2 cm. It is sometimes difficult to distinguish well-differentiated HCC from dysplastic nodules. Diagnostic models of HCC built with ML can help clinicians diagnose and treat the disease earlier and more easily. Clinical data are convenient to obtain for this purpose, for example albumin, platelet count (PLT), total bilirubin, alpha-fetoprotein (AFP), alkaline phosphatase (ALP), γ-glutamyl transferase (GGT), aspartate transaminase (AST), and portal vein thrombosis. Phan et al.13 established a convolutional neural network (CNN) model to predict the occurrence of HCC in HBV-infected patients using clinical data selected from a Taiwan health database. The area under the receiver operating characteristic curve (AUC) of the model was 0.886 and the accuracy was 0.980. Nam et al.14 constructed a deep neural network to predict the incidence of HCC in patients with HBV-related cirrhosis who received entecavir antiviral treatment. The c-index of the model was 0.782, which was significantly better than that of six traditional scores (PAGE-B, CU-HCC, HCC-RESCUE, ADRESS-HCC, mPAGE-B, and THRI).
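Because AUC is the metric most often quoted for the models in this review, a minimal sketch of how it can be computed may be useful. The function name `roc_auc` and the toy labels and scores below are illustrative assumptions and are not taken from any cited study. The sketch uses the rank interpretation of AUC: the probability that a randomly chosen patient who developed HCC receives a higher predicted risk than a randomly chosen patient who did not. For a binary outcome without censoring, the c-index reduces to the same quantity.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    labels: 1 for cases (e.g. developed HCC), 0 for controls.
    scores: predicted risk; higher means more likely to be a case.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # pair ranked correctly
            elif p == n:
                wins += 0.5   # tied scores count as half
    return wins / (len(pos) * len(neg))


# Hypothetical toy data, for illustration only.
labels = [1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]
auc = roc_auc(labels, scores)
print(round(auc, 3))
```

In this toy example 11 of the 12 case-control pairs are ranked correctly. The pairwise loop is quadratic in sample size, which is acceptable for a sketch but not for large cohorts.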
In addition, the accuracy of ML models varies considerably. Sato et al.15 collected clinical data from patients diagnosed with HCC at the first visit and from HBV-infected patients who developed HCC during the follow-up period. They used logistic regression for linear classification and SVM, gradient boosting, random forest, neural network, deep learning, and other algorithms for nonlinear classification to establish HCC diagnosis prediction models based on clinical data. All models were then verified in the test set, and the gradient boosting model had the highest accuracy. Similarly, Angelis et al.16 used six algorithms (decision tree, random forest, SVM, k-nearest neighbor (KNN) classification, AdaBoost, and gradient boosting) to build models from the collected clinical data. They also found that gradient boosting performed best, with an accuracy of 84% and a sensitivity of 92%. Kim et al.17 used a gradient boosting machine (GBM), one of the boosting algorithms, to model the follow-up results of patients with HBV-related hepatitis treated with entecavir or tenofovir. The model classified patients as at high or low risk of HCC, and it has been externally validated in Western cohorts. However, Wong et al.,18 studying HCC prediction models for HBV and HCV patients in Hong Kong, reported that among logistic regression, ridge regression, AdaBoost, decision tree, and random forest models, ridge regression [area under the receiver operating characteristic curve (AUROC): 0.844] and random forest (AUROC: 0.837) were stable in accuracy and performed better than traditional scores (CU-HCC, GAG-HCC, REACH-B, PAGE-B, and REAL-B) (Table 1).19–27
Table 1. Details of traditional scores mentioned in this article
Score | Author and year | Function | Indicators used | Research center | Results |
---|---|---|---|---|---|
PAGE-B | Papatheodoridis et al. 201619 | Score for prediction of the 5-year HCC risk in Caucasian CHB patients under entecavir/tenofovir | Age, sex, and platelets | Multicenter | c-index :0.82 |
CU-HCC | Wong et al. 201020 | Clinical score in predicting the risk of HCC among HBV carriers | Age, albumin, bilirubin, HBV DNA, and cirrhosis | Multicenter | Negative predictive value: 97.8% and 97.3% in the training and validation cohorts |
HCC-RESCUE | Sohn et al. 201721 | Prediction model for the development of HCC in treatment-naïve patients receiving oral antiviral treatment for CHB | Age, sex, and cirrhosis. | Multicenter | AUROCs :1 year, 3 years, and 5 years were 0.798, 0.788, and 0.817 in the testing cohort and 0.817, 0.810 and 0.809 in the validation cohort |
ADRESS-HCC | Flemming et al. 201422 | Risk prediction model to estimate the 1-year probability of HCC | Age, diabetes, race, etiology of cirrhosis, sex, and severity of liver dysfunction | Multicenter | |
mPAGE-B | Kim et al. 201823 | Modified PAGE-B scores to improve the predictive performance | Age, sex, platelet counts, and serum albumin levels | Multicenter | c-index 0.704 and 0.691 in the testing cohort and the validation cohort |
THRI | Sharma et al. 201724 | Scoring system to predict HCC risk for patients with cirrhosis | Age, sex, etiology, and platelets | Multicenter | AUROC: 0.82 and 0.72 in the testing cohort and the validation cohort |
GAG-HCC | Yuen et al. 200925 | Score to identify high-risk CHB patients for treatment and screening of HCC | Age, sex, HBV DNA levels, core promoter mutations, and cirrhosis | Multicenter | c-index: 0.77 |
REACH-B | Yang et al. 201126 | Score to estimate the risk of developing HCC at 3, 5, and 10 years in patients with chronic hepatitis B | Sex, age, serum alanine aminotransferase concentration, HBeAg status, and serum HBV DNA level | Multicenter | AUROC: 0.902, 0.783 and 0.806 |
REAL-B | Yang et al. 202027 | HCC risk score using routine clinical variables among a treated Asian cohort | Sex, age, alcohol use, diabetes, baseline cirrhosis, platelet count, and alpha-fetoprotein | Multicenter | AUROC: >0.80 |
For high-risk patients with chronic hepatitis B and cirrhosis, the diagnosis can be established by imaging. However, lesions are difficult to identify when the imaging characteristics are not typical. ML is well suited to image processing, so it has advantages in imaging-based identification. Bharti et al.28 obtained 754 regions of interest (ROI) based on the echotexture and surface roughness of the liver in ultrasound images, and constructed a CNN model to distinguish normal liver, chronic hepatitis, cirrhosis, and HCC. The classification accuracy of the model was 96.6%. Similarly, other studies have suggested that models built on ultrasound imaging features distinguish benign from malignant liver nodules with good accuracy.29,30 Moreover, Brehar et al.31 compared HCC detection models based on ultrasound imaging. They compared a CNN model with conventional ML algorithms (multilayer perceptron, SVM, random forest, and AdaBoost) and found that the CNN had good accuracy, sensitivity, and specificity and was significantly better than the conventional algorithms. Recently, Jin et al.32 established a deep learning model using two-dimensional shear wave elastography and the corresponding ultrasound images that predicts the likelihood of hepatitis B patients developing HCC within 5 years. This provides an important reference for the treatment and follow-up of patients with chronic hepatitis B.
HCC and intrahepatic cholangiocarcinoma (ICC) both occur in the liver, but their biological behavior, treatment, and prognosis are very different. The overall prognosis of ICC is poor. Most patients present with advanced tumors, and only 15% of patients with ICC undergo resection.33 Even for patients who are candidates for surgical resection, one study suggests that the probability of cure is about 10%.34 The resection approach, chemotherapy, and targeted treatment of ICC are very different from those of HCC.35,36 Therefore, it is important to be able to distinguish HCC from ICC noninvasively. Ren et al.37 established an SVM model by selecting the ROI of the lesion on ultrasound images to identify HCC and ICC. The accuracy, specificity, and sensitivity of the model were all above 0.800, and it had good generalization ability.
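Accuracy, sensitivity, and specificity, which are reported for many of the classifiers above, all derive from the same confusion matrix. The following is a minimal sketch under stated assumptions: the function name `classification_metrics`, the coding of one class as 1 and the other as 0, and the toy predictions are hypothetical and do not reproduce any cited result.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, sensitivity and specificity for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in y_true")
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),  # share of true positives detected
        "specificity": tn / (tn + fp),  # share of true negatives detected
    }


# Hypothetical toy data: 1 = HCC, 0 = ICC.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

Reporting all three values matters because accuracy alone can look high when one class dominates the sample.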
Contrast-enhanced CT is of great significance in the diagnosis of HCC. When the nature of a nodule on CT images is in dispute, a good ML model can improve the reliability of diagnosis. The CNN model established by Yasaka et al.38 effectively identified types of liver masses on enhanced CT and divided them into five categories: category A, classic HCCs; category B, malignant liver tumors other than classic and early HCCs; category C, indeterminate masses or mass-like lesions, including early HCCs and dysplastic nodules, and rare benign liver masses other than hemangiomas and cysts; category D, hemangiomas; and category E, cysts. Mokrane et al.39 extracted quantitative imaging features from CT images and used three ML algorithms (KNN, SVM, and random forest) to build candidate models for diagnosing indeterminate liver nodules in patients with liver cirrhosis. They selected the best model using the AUC and the Youden index. The model helped to assess indeterminate liver nodules noninvasively. MRI has a similar role. Hamm et al.40 established a CNN model using MRI images that divided liver lesions into six categories (simple cyst, cavernous hemangioma, focal nodular hyperplasia (FNH), HCC, ICC, and colorectal cancer metastasis). The model discriminated better than radiologists, with specificity and sensitivity greater than 90%. Liu et al.41 built an SVM model to distinguish combined hepatocellular cholangiocarcinoma, ICC, and HCC using the radiological characteristics of MRI and CT. They also found that enhanced-phase MRI and nonenhanced-phase and portal-venous-phase CT were more helpful for differentiation.
With the development of ML, it is also possible to predict the pathological grade of HCC noninvasively from imaging. Mao et al.42 manually extracted radiomics features, selected features using recursive feature elimination, and then established an XGBoost model that predicted HCC pathological grade with an AUC of 0.8014. Nebbia et al.43 established an ML model using multiparametric MRI images to predict microvascular infiltration (MVI) status preoperatively. The researchers also compared extracting features only from the tumor region, only from the peritumoral margin, and from both combined. The results showed that predicting MVI from preoperative MRI is feasible and that multiparametric MRI sequences are complementary.
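The Youden index mentioned above for model selection is defined as J = sensitivity + specificity - 1, and it is commonly used to choose the score cutoff at which a classifier is operated. The sketch below is an illustration only: the function name `youden_threshold` and the toy data are assumptions, and the exact procedure used in the cited study is not described in this review.

```python
def youden_threshold(labels, scores):
    """Return (threshold, J) maximizing J = sensitivity + specificity - 1.

    A case is predicted positive when its score is >= threshold.
    When several thresholds tie, the lowest one is kept.
    """
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = sum(1 for y in labels if y == 0)
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both classes must be present")
    best_t, best_j = None, -1.0
    for t in sorted(set(scores)):
        tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= t)
        tn = sum(1 for y, s in zip(labels, scores) if y == 0 and s < t)
        j = tp / n_pos + tn / n_neg - 1
        if j > best_j:  # strict comparison keeps the first (lowest) maximum
            best_t, best_j = t, j
    return best_t, best_j


# Hypothetical toy data: 1 = malignant nodule, 0 = benign nodule.
labels = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]
scores = [0.1, 0.2, 0.3, 0.4, 0.45, 0.5, 0.6, 0.7, 0.8, 0.9]
threshold, j_stat = youden_threshold(labels, scores)
print(threshold, round(j_stat, 2))
```

J weights sensitivity and specificity equally. When missing a malignant nodule is costlier than a false alarm, a lower cutoff than the Youden optimum may be preferred.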
The results of pathological examination depend to some extent on specimen selection and on the pathologist's judgment. ML can both reduce error and shorten the time to diagnosis. Lin et al.44 established a CNN model using multiphoton microscopic imaging of unstained specimens to judge the degree of HCC differentiation. Chen et al.45 established a CNN model using hematoxylin and eosin (HE)-stained pathological images. The model distinguished benign from malignant tissue with an accuracy of 96.0%, and it predicted mutated genes in HCC (including CTNNB1, FMN2, TP53 and ZFX4) from the images. Kiani et al.46 built a CNN model based on HE-stained specimen images that effectively assisted the pathological differentiation of HCC and ICC. In addition, at the molecular level, training ML models on mutations in related genes can also assist in HCC diagnosis. Zhang et al.47 established an SVM model based on 11 selected genes that distinguished HCC, adjacent noncancerous tissues, and hepatitis cirrhosis.
Chen et al.48 explored the significance of the HBV reverse transcriptase (RT) gene for HCC patients. They used four ML methods to build models that predict HCC from HBV RT sequences. The random forest model based on 10 combined features had the best predictive performance, and the individual HCC risk score it produced distinguished HCC patients from HBV patients. Detection of circulating tumor DNA (ctDNA) makes it possible to detect tumors early and noninvasively and helps to match suitable targeted drugs. Tao et al.49 performed low-depth whole-genome sequencing of plasma samples from HBV-related HCC patients and cancer-free HBV patients, and established a random forest model that distinguished the two groups by somatic copy number aberrations of ctDNA. Diagnosis of HCC is not limited to establishing the presence of a tumor; it also covers etiology and disease severity, which greatly affect the treatment plan and prognosis. In clinical practice especially, disease presentations are often atypical. The application of ML has a large impact on the diagnosis and differential diagnosis of HCC (Table 2).13–15,17,18,29–32,37–49
Table 2. Details of machine learning for the diagnosis of hepatocellular carcinoma
Author and year | Data type | Sample number | Machine learning model/algorithm | Results |
---|---|---|---|---|
Phan et al. 202013 | Clinical data | N: 6,052 (training set: 70%; test set: 30%) | Convolutional neural network | AUC: 0.886 |
Nam et al. 202014 | Clinical data | Training set: 424; validation set (independent external cohort): 316 | Deep neural network | c-index: 0.782 |
Sato et al. 201915 | Clinical data | N: 1,580 (training set: 80%; development set and test set: 20%) | SVM, gradient boosting, random forest, neural network, deep learning, and other algorithms | Gradient boosting model had the highest accuracy (87.34%) AUC: 0.94 |
Kim et al. 202117 | Clinical data | Training set: 6,051; validation set (external validation cohorts): (5,817 patients from Korean centers and 1,640 from Western centers) | GBM | c-index: 0.79 |
Wong et al. 202218 | Clinical data | N: 124,006 (training set: 70%; test set: 30%) | AdaBoost, decision tree and random forest | Accuracy of random forest (AUROC: 0.837) was stable |
Schmauch et al. 201929 | Imaging | Training set: 367; validation set: 177 | Deep learning | Weighted mean ROC-AUC scores of 0.891 |
Li et al. 202130 | Imaging | N: 226 (training set: 80%; test set: 20%) | SVM | AUC: 0.86 |
Brehar et al. 202031 | Imaging | N: 268 (training set: 66%; test set: 20%; validation set: 14%) | CNN, SVM, random forest, and AdaBoost | CNN was the best (accuracy of 91% with AUC of 95%) |
Jin et al. 202132 | Imaging | Training set: 262; validation set: 86; testing set: 86 | Deep learning | AUCs: 0.981, 0.942 and 0.900 in training, validation, and testing cohorts |
Ren et al. 202137 | Imaging | Training set: 149; test set: 38; validation set: 39 | SVM | AUC: 0.936 |
Yasaka et al. 201838 | Imaging | Training set: 460; test set: 100 | CNN | AUC: 0.92 |
Mokrane et al. 202039 | Imaging | Discovery set: 142; validation set: 36 | KNN, SVM, and random forest | AUC: 0.70 and 0.66 in discovery and validation cohorts |
Hamm et al. 201940 | Imaging | Training set: 434; test set: 60 | CNN | AUC: 0.992 |
Liu et al. 202141 | Imaging | N: 86 | SVM | AUC: 0.77 |
Mao et al. 202042 | Imaging | Training set: 237; test set: 60 | XGBoost | AUC: 0.8014 |
Nebbia et al. 202043 | Imaging | N: 99 | SVM | Highest AUC: 0.8669 (multiparametric MRI combination yield) |
Lin et al. 201944 | Pathology | N: 113 | CNN | Accuracy>90% |
Chen et al. 202045 | Pathology | Training set: 261; test set: 50; internal validation set: 155; external validation set: 101 | CNN | Accuracy: 96.0% |
Kiani et al. 202046 | Pathology | Training set: 70; test set: 80; validation set: 26 | CNN | Accuracy: 0.885 |
Zhang et al. 202047 | Gene | Training set: 1,333; test set: 336 | SVM | Sensitivity: 91.93%, specificity: 100%, and AUC: 0.9597 |
Chen et al. 202148 | Genes | Training set: 361; validation set: 183 | Random forest, SVM, KNN | Best predictive performances: random forest (AUC: 0.96; accuracy, 0.90) |
Tao et al. 202049 | Genes | Training set: 209; validation sets: 76/99 | Random forest | AUC>0.800 |
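
Many studies in Table 2 divide a single cohort into training and test sets by a fixed ratio such as 70%/30%. When the outcome is rare, a stratified split keeps the proportion of cases similar in both sets. The following is a minimal sketch of that idea, assuming a hypothetical helper `stratified_split` and a made-up cohort of 100 patients; the table does not state whether the cited studies stratified their splits.

```python
import random


def stratified_split(labels, test_fraction=0.3, seed=0):
    """Split sample indices into train/test sets, class by class."""
    rng = random.Random(seed)  # fixed seed makes the split reproducible
    by_class = {}
    for idx, y in enumerate(labels):
        by_class.setdefault(y, []).append(idx)
    train_idx, test_idx = [], []
    for idxs in by_class.values():
        rng.shuffle(idxs)
        n_test = round(len(idxs) * test_fraction)
        test_idx.extend(idxs[:n_test])
        train_idx.extend(idxs[n_test:])
    return sorted(train_idx), sorted(test_idx)


# Hypothetical cohort: 20 patients with HCC (1) and 80 without (0).
labels = [1] * 20 + [0] * 80
train_idx, test_idx = stratified_split(labels, test_fraction=0.3, seed=42)
print(len(train_idx), len(test_idx), sum(labels[i] for i in test_idx))
```

A random split of one cohort only supports internal validation. Several studies in the table additionally report an independent external cohort, which is a stronger test of generalization.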
ML for the treatment of HCC
The preferred treatment for HCC is surgical resection, and R0 resection should be performed in patients who can undergo surgery. TACE or radiofrequency ablation (RFA) is recommended for patients with unresectable HCC, and targeted therapy, immunotherapy, or other systemic treatment can be used for patients who cannot undergo the above treatments.3 In clinical practice, doctors may encounter patients for whom the treatment is difficult to decide. An individual patient has only one chance to choose, so the choice needs to be made carefully. Properly used, ML can help in choosing the treatment.
Choi et al.50 established a clinical decision support system based on 20 clinical indicators selected using a random forest model. The system recommended the initial treatment plan for HCC patients and predicted overall survival for each treatment option. Liu et al.51 established a radiomics model using ultrasound images of HCC patients to predict the efficacy of TACE. The model AUC was 0.93. It predicted progression-free survival and helped optimize treatment. To predict the response of HCC patients to their first TACE treatment, Dong et al.52 compared six ML models to select the most appropriate one. The random forest model performed best and accurately predicted the early response to the first TACE treatment. With the development of targeted therapy and immunotherapy, the application of ML to treatment may shift toward selecting targeted and immunologic drugs for HCC patients, and ML may provide a reference for that selection in the future (Table 3).50–52
Table 3. Details of machine learning for the treatment of hepatocellular carcinoma
Author and year | Data type | Sample number | Machine learning model/algorithm | Results |
---|---|---|---|---|
Choi et al. 202050 | Clinical data | Training set: 813; validation set: 208 | Random forest | c-index: 0.725 (RFA/PEIT), 0.695 (resection), 0.803 (TACE), 0.676 (TACE + EBRT), 0.684 (sorafenib), 0.710 (supportive care), 0.959 (transplantation), 0.850 (other therapies) |
Liu et al. 202051 | Imaging | N: 419 (training and validation cohorts by a ratio of 2:1) | CNN | AUC: 0.93 |
Dong et al. 202152 | Clinical data & imaging | N: 110 (training set: 80%; validation set: 20%) | XGBoost, decision tree, SVM, random forest, KNN, fully convolutional networks | Best performance: random forest (AUC: 0.802, accuracy: 0.784, sensitivity: 0.904, and specificity: 0.480) |
ML for the prognosis of HCC
Since the beginning of the 21st century, HCC has been the fastest-growing cause of cancer-related death in the USA, and it is expected to become the third leading cause by 2030.53 The long-term prognosis after liver transplantation is better than that after hepatectomy, which has a recurrence rate of 70% and a 10-year survival rate of 7–15%.54 Liver transplantation is an ideal surgical method for HCC patients, but it is limited by the small number of donors and high medical costs. How should one choose between these two treatments for patients with appropriate indications? Schoenberg et al.55 established a random forest model based on clinical data. The model predicted early disease-free survival with an AUC of 0.788. It divides patients into high-risk and low-risk groups. Low-risk patients are suitable for liver resection, and high-risk patients are considered suitable for liver transplantation, which guides the selection of treatment.
Ji et al.56 established a model to predict prognosis after resection in patients with tumors ≤5 cm and no evidence of extrahepatic disease or large-vessel invasion. The model determined a cutoff value using eight clinical characteristics (age, race, AFP, tumor size, tumor number, vascular invasion, histological grade, and fibrosis score) and divided patients into low-, medium-, and high-risk groups. There was no significant difference in prognosis between low-risk patients undergoing tumor resection and those undergoing liver transplantation. The model provided a reference for deciding whether patients should receive neoadjuvant therapy. Huang et al.57 also used clinical data of patients after hepatectomy to establish a model, but their study compared multiple models (DeepSurv, XGBoost, and random survival forest) and found that XGBoost performed best. They used a heat map to individualize the recurrence risk. The study also ranked prognostic variables by time period. Within 1 year after surgery, tumor thrombus was the most important variable. At 1 to 2 years after surgery, the number of tumors was the most important variable, followed by the type of resection, tumor thrombus, and tumor diameter. In the periods of 2 to 3 years and 3 to 5 years, HBV infection was a relatively important variable in addition to the number of tumors. Smoking was also associated with late recurrence. A model established by Jiang et al.58 using CT radiomics features not only predicted the MVI status of patients before surgery but also separated patients into groups with different recurrence-free survival. Regarding RFA, an SVM model established by Liang et al.59 effectively identified HCC patients at relatively high risk of recurrence after ablation, which is helpful for postoperative follow-up and management.
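Ranking prognostic variables by importance, as described above, can be approximated in a model-agnostic way with permutation importance: shuffle one variable at a time and measure how much the model's performance drops. This is a generic sketch and not the method of any cited study; `permutation_importance`, `toy_model`, and the two-column toy data (tumor number, age) are hypothetical.

```python
import random


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(predict, X, y, n_repeats=20, seed=0):
    """Mean drop in accuracy when each feature column is shuffled."""
    rng = random.Random(seed)
    baseline = accuracy(y, [predict(row) for row in X])
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between feature j and outcome
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            permuted = accuracy(y, [predict(row) for row in X_perm])
            drops.append(baseline - permuted)
        importances.append(sum(drops) / n_repeats)
    return importances


# Hypothetical rule standing in for a trained model:
# predict recurrence (1) when tumor number (column 0) exceeds 1.
def toy_model(row):
    return 1 if row[0] > 1 else 0


# Columns: [tumor number, age]; values are made up for illustration.
X = [[1, 30], [3, 45], [2, 60], [1, 52], [4, 38], [1, 70], [2, 41], [1, 66]]
y = [toy_model(row) for row in X]
importance = permutation_importance(toy_model, X, y)
print(importance)
```

Here the age column has zero importance because the toy rule never reads it, while shuffling tumor number degrades accuracy. With a real model the same procedure would be run on held-out data.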
Many other studies have used clinical data, pathological information, radiomics features, and other data to establish ML models.60–65 They effectively predicted patient prognosis and helped with the selection of treatment, the scheduling of postoperative review, and the avoidance of high-risk factors. For patients with Barcelona Clinic Liver Cancer (BCLC) stage B disease, international guidelines recommend TACE. However, patients at that stage are highly heterogeneous, and the efficacy of TACE varies. Lin et al.66 selected the clinical data of patients with BCLC stage B disease and used five indicators (tumor size, tumor number, BCLC-B substage, AFP, and albumin) to establish a random forest model. The model predicted prognosis after TACE and identified the intermediate-stage HCC patients who are suitable for TACE. The CNN model established by Peng et al.67 also effectively predicted the efficacy of TACE. A model established by Jin et al.68 using features extracted from enhanced CT effectively predicted the likelihood of extrahepatic spread or vascular invasion after initial TACE treatment (EVIT).
At the genetic level, many studies have explored models for predicting the prognosis of HCC patients. Chaudhary et al.69 were the first to use deep learning to explore differences in the survival of HCC patients. They established a model using RNA sequencing (RNA-Seq), microRNA sequencing (miRNA-Seq), and methylation data that reliably predicted survival in six different cohorts. Liu et al.70 selected immune genes that differed between normal and HCC tissue, and the model established with those genes predicted the 5-year survival of HCC patients. Bedon et al.71 constructed a model based on methylation profiles that classified HCC patients by progression-free survival into groups at high and low risk of early progression. Prognosis is a common concern of patients and doctors. The extensive application of ML makes prognosis prediction more specific and greatly helps to guide patient follow-up (Table 4).55–59,66,67,69–71
Table 4. Details of machine learning for the prognosis of hepatocellular carcinoma
Author and year | Data type | Sample number | Machine learning model/algorithm | Results |
---|---|---|---|---|
Schoenberg et al. 202055 | Clinical data | Training set: 127; test set: 53 | Random forest | AUC: 0.788 |
Ji et al. 202156 | Clinical data | Training/validation set: 1,899; test set: 879 | GBM | c-index: >0.72 |
Huang et al. 202157 | Clinical data | Training set: 5,928; internal validation set: 1,483; external validation set: 508 | DeepSurv, XGBoost, random survival forest | Best performance: XGBoost (c-index: 0.713) |
Jiang et al. 202158 | Clinical data & imaging | Training set: 324; validation set: 81 | XGBoost, 3D-CNN | AUROCs: training set 0.952 and 0.980; validation: 0.887 and 0.906 |
Liang et al. 201459 | Clinical data | N: 83 | SVM | AUC: 0.69 |
Lin et al. 202166 | Clinical data | Training set: 602; internal validation set: 301; external validation set: 343 | Random forest | c-index: 0.69, AUROC>0.71 |
Peng et al. 202067 | Imaging | Training set: 562; validation sets: 89/138 | CNN | AUC:>0.95 |
Chaudhary et al. 201869 | Gene | Training set: 360; validation sets (5 external datasets): 230/221/166/40/27 | Deep learning | c-index: 0.68 |
Liu et al. 202170 | Gene | N (3 databases): TCGA 365; ICGC 232; GSE14520 209 | Random forest | AUC:>0.7 |
Bedon et al. 202171 | Gene | Training set: 300; test set: 74 | Random forest | Accuracy: 0.80 |
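
Several results in Table 4 are reported as a c-index, which extends the idea of AUC to survival data with censoring. A minimal sketch of Harrell's c-index follows, assuming the hypothetical function name `harrell_c_index` and made-up follow-up data. A pair of patients is comparable only when the one with the shorter follow-up actually had the event. The pair is concordant when that patient also has the higher predicted risk. Tied event times are ignored here for simplicity.

```python
def harrell_c_index(times, events, risk_scores):
    """Harrell's concordance index for right-censored survival data.

    times: follow-up time per patient.
    events: 1 if the event (e.g. recurrence or death) was observed, 0 if censored.
    risk_scores: model output; higher means higher predicted risk.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if events[i] != 1:
            continue  # a censored patient cannot be the earlier member of a pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5  # tied risk counts as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy data: months of follow-up, event indicator, predicted risk.
times = [5, 8, 12, 20, 24]
events = [1, 1, 0, 1, 0]
risk = [0.9, 0.6, 0.7, 0.3, 0.2]
c_index = harrell_c_index(times, events, risk)
print(c_index)
```

In this toy example 7 of the 8 comparable pairs are concordant. A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which gives context for values around 0.7 in the table.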
Conclusion
ML studies in HCC involve a variety of data, such as clinical information, imaging, pathology, and genetic data. ML can provide guidance in the diagnosis of HCC, the selection of treatment, and the prediction of prognosis. It is especially useful for noninvasive diagnosis, where its advantages include accurate image processing. By reducing the need for biopsy, ML-assisted diagnosis can avoid the contraindications and complications of biopsy, including the possibility of tumor rupture and disseminated metastasis. Because HCC and ICC are treated differently, preoperative differentiation of the two by ML can help assess whether surgery can be performed and which surgical procedure to use.
ML has provided valuable guidance for the diagnosis and treatment of HCC in many respects. In particular, it does not rely on subjective assessment and experience to determine the diagnosis and treatment, but provides evidence based on actual data and measured accuracy. As described above, many types of data are currently available for applying ML to HCC, including basic clinical information (sex, hepatitis history, blood biochemistry, and others), imaging data (ultrasound, CT, and MRI), pathology data, and genetic data. Many models and algorithms can also be used, for example random forest, SVM, and deep learning. It is not certain in advance which model suits a given research problem, but ML models can be selected according to the type of research data. SVM, random forest, artificial neural network, boosting, and bagging algorithms are common ML models that follow the traditional “learning mode” and so are more suitable for processing numerical data. Deep learning, including CNN and related models, is more complex in both structure and training, and its algorithms are closer to models of the human brain. Deep learning may be more suitable for processing complex data types. However, ML has shortcomings. The learning process is still a black box. It is hard to understand how a model reaches its output, which may cause harm. The interpretability of AI remains a problem to be solved. In addition, models are always based on a part of the population, so their wide application faces a major test, and they need to be continually improved.
ML algorithms, data types, and research methods are numerous and varied. Nevertheless, researchers continue to explore suitable algorithms and models and have achieved much. With the continued development of AI and ML, ML models for HCC-related research are expected to improve and to benefit HCC patients.
Abbreviations
- AFP: alpha-fetoprotein
- AI: artificial intelligence
- AUROC: area under the receiver operating characteristic
- CNN: convolutional neural network
- GBM: gradient boosting machine
- HCC: hepatocellular carcinoma
- ICC: intrahepatic cholangiocarcinoma
- KNN: k-nearest neighbor
- ML: machine learning
- MVI: microvascular infiltration
- RFA: radiofrequency ablation
- ROI: regions of interest
- SVM: support vector machine
- TACE: transcatheter arterial chemoembolization
Declarations
Funding
This work was supported by the Natural Science Foundation of Hunan Province (2022JJ30939) and The Science and Technology Innovation Leading Project for High-tech Industry of Hunan Province (2020SK2009).
Conflict of interest
The authors have no conflicts of interest related to this publication.
Authors’ contributions
Study conception and design (SF, XY, CL), acquisition of data (SF, JW, LW, QQ, DC, HS, XL), analysis and interpretation of data (SF, JW, LW, QQ, DC, HS), drafting of the manuscript (SF), critical revision of the manuscript for important intellectual content (SF, JW, LW, QQ, DC, HS, XL, XY, CL), project administration (XY, CL), and study supervision (XL, XY, CL). All authors have made significant contributions to this study and have approved the final manuscript.