Introduction
Chronic hepatitis B (CHB)-related liver fibrosis has been recognized as a hallmark of progression from mild hepatitis to decompensation.1 Inaccurate fibrosis staging may delay antiviral therapy and increase the healthcare burden.2 Biopsy is the reference standard for fibrosis staging, but it is costly and invasive, is subject to sampling error, and can be accompanied by complications including infection and hemorrhage.3–7 Biomarkers such as the aspartate transaminase-to-platelet ratio index (APRI) and the fibrosis index based on four factors (FIB-4) remain controversial for the diagnosis of liver fibrosis. Shear wave elastography (SWE) is emerging as a noninvasive tool for fibrosis detection that exploits the biomechanical properties of tissue: an external mechanical wave deforms the liver, and the velocity of shear wave propagation, which is directly related to stiffness, is then measured.
Several guidelines have indicated that liver stiffness (LS) assessed by SWE can serve as an alternative to liver biopsy in patients with chronic viral hepatitis.8–10 Most clinical practice guidelines recommend LS measurement for noninvasive fibrosis staging.11 However, various potential confounding factors (e.g. liver inflammation) may influence LS measurements and lead to false-positive LS values.12–19 Nakano et al. found that the severity of liver fibrosis measured by SWE was influenced by hepatic necroinflammation in chronic hepatitis patients but not in cirrhotic patients.13 Ren et al. showed that LS on SWE and transient elastography correlated significantly with hepatic inflammation grade in HBV patients who had no or mild (F1) fibrosis.19 These confounding factors have hindered elastography from becoming an ideal alternative to liver biopsy for fibrosis assessment. On the other hand, accurate assessment of liver inflammation in patients who do not have prominent liver fibrosis is important for guiding therapy and improving prognosis. Therefore, a model that simultaneously predicts the stages of liver fibrosis and inflammation is of clinical significance.
Recently, several studies that applied machine learning algorithms to two-dimensional (2D)-SWE images for chronic liver disease diagnosis reported satisfactory performance.20–23 Wang et al. described a deep learning radiomics model for assessing liver fibrosis in CHB in a multicenter study that significantly improved diagnostic performance.20 Chen et al. employed four existing classification methods (support vector machine, naïve Bayes, random forest, and k-nearest neighbors) to build a decision-support system to improve the diagnostic performance for fibrosis staging.21 In this study, we developed a dual-task convolutional neural network (DtCNN) for assessment of fibrosis staging and inflammation activity based on 2D-SWE images.
Methods
Patients
The proposed DtCNN model was developed and validated in a retrospective cohort including a total of 966 consecutive CHB patients who underwent liver biopsy and 2D-SWE between March 2015 and November 2018. The study was conducted following the ethical guidelines of the 1975 Declaration of Helsinki and was approved by the institutional review board of Ruijin Hospital. The requirement for informed consent was waived for this retrospective study. An additional 180 consecutive patients with known or suspected chronic liver disease were prospectively included between December 2019 and April 2021. All patients in the prospective study were asked to provide informed consent. The inclusion criteria were (1) HBsAg seropositivity for more than 6 months, (2) liver fibrosis and inflammation stages assessed by liver biopsy, and (3) no prior antiviral treatment. The exclusion criteria were (1) coinfection with another hepatitis virus or autoimmune liver disease, (2) the presence of severe extrahepatic diseases, or (3) pregnancy. A total of 532 CHB patients in the retrospective cohort and 180 in the prospective cohort were included in the analysis. The demographic and clinical data of the patients are shown in Table 1.
Table 1. Baseline characteristics of the overall study cohort
Characteristic | Retrospective cohort: all patients (n=532) | Retrospective cohort: training cohort (n=372) | Retrospective cohort: testing cohort (n=160) | p-value | Prospective cohort: all patients (n=180) |
---|
Demographics | | | | | |
Age, years | 59.3±11.6 | 59.3±11.4 | 60.3±11.8 | 0.45 | 58.6±11.4 |
Male | 290 (54.5) | 206 (55.4) | 84 (52.5) | 0.15 | 153 (85.0) |
Anthropometry | | | | | |
Body mass index (kg/m²) | 23.9±3.2 | 23.6±3.1 | 24.0±3.3 | 0.41 | 23.8±3.1 |
Liver biochemistry | | | | | |
ALT (U/L) | 40.4±55.4 | 41.5±62.4 | 43.2±64.8 | 0.23 | 38.5±48.7 |
AST (U/L) | 43.6±59.1 | 45.6±67.1 | 47.6±69.1 | 0.37 | 42.2±53.8 |
Albumin (g/L) | 39.8±6.0 | 39.8±5.7 | 39.6±6.5 | 0.53 | 39.8±5.6 |
Platelets (×10⁹/L) | 143.6±62.6 | 142.7±64.0 | 139.8±64.6 | 0.19 | 145.4±60.8 |
Tbil (µmol/L) | 19.5±14.1 | 18.4±7.8 | 19.3±15.7 | 0.31 | 19.6±13.0 |
Dbil (µmol/L) | 3.9±3.1 | 3.9±2.6 | 3.7±2.9 | 0.22 | 4.1±3.2 |
Cr (µmol/L) | 76.1±17.4 | 76.5±18.2 | 76.9±15.7 | 0.44 | 76.0±18.0 |
PT (s) | 12.3±1.3 | 12.1±1.4 | 12.5±1.3 | 0.11 | 12.2±1.3 |
INR | 1.0±0.1 | 1.0±0.1 | 1.1±0.1 | 0.25 | 1.0±0.1 |
Scores | | | | | |
APRI | 1.2±2.4 | 1.3±2.7 | 1.4±2.8 | 0.24 | 1.1±2.1 |
FIB-4 | 3.7±3.8 | 3.9±4.2 | 4.1±4.2 | 0.33 | 3.5±3.6 |
Fibrosis stage | | | | | |
F0-1 | 186 (35.0) | 130 (35.0) | 56 (35.0) | | 22 (12.2) |
F2 | 146 (27.4) | 100 (26.9) | 46 (28.8) | | 36 (20.0) |
F3 | 118 (22.2) | 82 (22.0) | 36 (22.5) | | 27 (15.0) |
F4 | 82 (15.4) | 60 (16.1) | 22 (13.7) | | 95 (52.8) |
Inflammation | | | | | |
A0 | 165 (31.0) | 115 (30.9) | 50 (31.3) | | 53 (29.4) |
A1 | 173 (32.5) | 121 (32.5) | 52 (32.5) | | 88 (48.9) |
A2 | 132 (24.8) | 92 (24.7) | 40 (25.0) | | 23 (12.8) |
A3 | 62 (11.7) | 44 (11.8) | 18 (11.2) | | 16 (8.9) |
Serum samples were collected after the patients had fasted overnight (8 h) for measurement of alanine aminotransferase (ALT), aspartate transaminase (AST), platelet, total cholesterol, triglycerides, alkaline phosphatase, serum glucose, insulin levels, and serum uric acid. The two biomarker models24 were:
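$$\mathrm{APRI} = \frac{\mathrm{AST} / \text{AST upper limit of normal}}{\text{platelet count } (10^9/\mathrm{L})} \times 100$$
and
$$\text{FIB-4} = \frac{\text{age (years)} \times \mathrm{AST\ (U/L)}}{\text{platelet count } (10^9/\mathrm{L}) \times \mathrm{ALT\ (U/L)}^{1/2}}.$$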
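As an illustration of the two serum indices defined above, the following minimal Python sketch evaluates them for one patient. The function names and the example values, including the AST upper limit of normal of 40 U/L, are assumptions made for illustration and are not taken from the study data.

```python
import math


def apri(ast_u_per_l: float, ast_uln_u_per_l: float, platelets_1e9_per_l: float) -> float:
    """APRI = (AST / AST upper limit of normal) / platelet count (10^9/L) x 100."""
    return 100.0 * ast_u_per_l / (ast_uln_u_per_l * platelets_1e9_per_l)


def fib4(age_years: float, ast_u_per_l: float, alt_u_per_l: float,
         platelets_1e9_per_l: float) -> float:
    """FIB-4 = age x AST / (platelet count (10^9/L) x sqrt(ALT))."""
    return (age_years * ast_u_per_l) / (platelets_1e9_per_l * math.sqrt(alt_u_per_l))


# Hypothetical patient; all values are invented for illustration only.
example_apri = apri(ast_u_per_l=80.0, ast_uln_u_per_l=40.0, platelets_1e9_per_l=100.0)
example_fib4 = fib4(age_years=50.0, ast_u_per_l=80.0, alt_u_per_l=64.0,
                    platelets_1e9_per_l=100.0)
print(f"APRI = {example_apri:.2f}, FIB-4 = {example_fib4:.2f}")
```

For these invented inputs the sketch gives APRI = 2.00 and FIB-4 = 5.00, which can be checked by hand from the two formulas.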
SWE examination
SWE was performed with an Re7 ultrasound system (Mindray Medical International Co., Ltd, China, SC5-1 probe) by an ultrasonographer with over 15 years of experience in abdominal ultrasonography. The patients fasted for about 4 hours prior to the scan and lay in the supine position with their right arm maximally raised and abducted during scans. The SWE region of interest (ROI) was acquired at a location deeper than 2 cm from the hepatic capsule to avoid reverberation artifacts and was kept away from large vessels. The patients were asked to hold their breath for approximately 7 seconds after quiet breathing. A rectangular electronic ROI was shown on the best static SWE image, in which a circular Q-Box ROI with a diameter of about 2 cm and free of large vessels was set for analysis. Five consecutive 2D-SWE images were obtained for each patient, and the median value was reported as the LS. Special attention was paid to avoid focal lesions, vessels, biliary tracts, and artifacts from nearby lung gas or cardiac movement. Measurement failure was defined as the inability to obtain any color-coded elasticity images after five trials.
Histopathological analysis
Liver biopsy was performed within 1 week of the 2D-SWE examination. The biopsies were performed in the right liver lobe with percutaneous ultrasonographic guidance and processed via formalin fixation, paraffin embedding, hematoxylin-eosin staining, and Masson staining. A pathologist with 12 years of experience in hepatobiliary pathology who was blinded to the laboratory results and LS measurements performed the histopathologic analysis of all liver specimens. Fibrosis staging and inflammation activity were assessed by the Scheuer scoring system.25 The stage of fibrosis was assessed as F0, no fibrosis; F1, periportal fibrosis; F2, few fibrotic septa; F3, numerous septa; and F4, cirrhosis. Activity was graded as A0, no activity; A1, mild activity; A2, moderate activity; and A3, severe activity. Staging of ≥F2, ≥F3, and F4 indicated significant fibrosis, advanced fibrosis, and cirrhosis, respectively.
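The thresholds ≥F2, ≥F3, and F4 turn the five-level Scheuer stage into three binary labels. The short Python sketch below shows one way to derive them; the function and dictionary names are hypothetical and do not come from the study's code.

```python
# Thresholds follow the definitions in the text; the names are illustrative only.
FIBROSIS_TASKS = {
    "significant fibrosis (>=F2)": 2,
    "advanced fibrosis (>=F3)": 3,
    "cirrhosis (F4)": 4,
}


def binarize_stage(stage: int, threshold: int) -> int:
    """Return 1 if the Scheuer fibrosis stage is at or above the threshold, else 0."""
    if not 0 <= stage <= 4:
        raise ValueError("Scheuer fibrosis stage must be between 0 and 4")
    return int(stage >= threshold)


def fibrosis_labels(stage: int) -> dict:
    """Binary labels for the three fibrosis classification tasks."""
    return {name: binarize_stage(stage, thr) for name, thr in FIBROSIS_TASKS.items()}


labels_f3 = fibrosis_labels(3)  # a patient staged F3
print(labels_f3)
```

A patient staged F3 is therefore positive for significant and advanced fibrosis and negative for cirrhosis.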
DtCNN model
As shown in Figure 1, the DtCNN model contained two pathways, one for fibrosis staging (upper) and one for inflammation staging (lower). Each pathway included five convolutional and pooling layers, and each layer contained a 3 × 3 convolution kernel and a rectified linear unit (ReLU) activation function. A 3 × 3 convolution layer with a stride of 2 was connected immediately after the first convolution layer. A batch normalization layer was used after each convolution for faster convergence. Concatenation connections between the two pathways were applied to provide joint features to the two tasks and to improve prediction performance. Three fully connected layers followed the convolution layers for binary classification. Dual-task learning was implemented by introducing several cross-feature units between the two pathways. The cross-feature units learned an optimal linear combination of hidden feature layers. Given two activation maps $x_A^{ij}$ and $x_B^{ij}$ from the first layer of the two tasks, in which A corresponds to fibrosis staging and B to inflammation staging, a linear combination matrix was learned to connect the first layer and the second layer ($\tilde{x}_A^{ij}$, $\tilde{x}_B^{ij}$). Specifically, at location (i, j) in the feature map, the linear combination can be parameterized using α:
$$\begin{bmatrix} \tilde{x}_A^{ij} \\ \tilde{x}_B^{ij} \end{bmatrix} = \begin{bmatrix} \alpha_{AA} & \alpha_{BA} \\ \alpha_{AB} & \alpha_{BB} \end{bmatrix} \begin{bmatrix} x_A^{ij} \\ x_B^{ij} \end{bmatrix}.$$
The cross-feature unit was inserted into the two pathways after each convolution. Combining the two feature maps enforced shared representations, which helped to regularize both tasks. With this network architecture, the DtCNN model can learn an optimal combination of shared and task-specific representations in a supervised way.
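The linear mixing performed by a cross-feature unit can be sketched in a few lines of NumPy. This is only an illustration of the forward pass under stated assumptions: the 2 × 2 matrix is read row by row as (α_AA, α_BA) and (α_AB, α_BB), the α values are fixed by hand here whereas in the model they are learned during training, and the function name is hypothetical.

```python
import numpy as np


def cross_feature_unit(x_a: np.ndarray, x_b: np.ndarray, alpha: np.ndarray):
    """Mix two activation maps of identical shape with a 2x2 matrix alpha.

    alpha[0] = (alpha_AA, alpha_BA) produces the fibrosis-pathway output;
    alpha[1] = (alpha_AB, alpha_BB) produces the inflammation-pathway output.
    The same alpha is applied at every location (i, j) of the feature maps.
    """
    if x_a.shape != x_b.shape:
        raise ValueError("both pathways must produce maps of the same shape")
    stacked = np.stack([x_a, x_b], axis=0)        # shape (2, H, W)
    mixed = np.tensordot(alpha, stacked, axes=1)  # shape (2, H, W)
    return mixed[0], mixed[1]


# Toy 2x2 activation maps and hand-picked mixing weights (illustrative values).
x_a = np.ones((2, 2))
x_b = 2.0 * np.ones((2, 2))
alpha = np.array([[0.9, 0.1],
                  [0.2, 0.8]])
out_a, out_b = cross_feature_unit(x_a, x_b, alpha)
print(out_a[0, 0], out_b[0, 0])
```

With an identity matrix for α the two pathways stay fully task-specific, while larger off-diagonal weights share more information between the fibrosis and inflammation pathways.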
The patients were randomly divided into a training cohort and a testing cohort. 2D-SWE images in the training cohort were augmented through flipping, mirroring, and rotating to reduce overfitting of the proposed DtCNN model. All five images acquired from each patient were used in this model. Because training was performed on a per-image basis, the class with the highest average probability across a patient's images was taken as the patient-level prediction. The corresponding histological results (F scores and A scores) were used as the training labels. Three DtCNN models were trained and tested for binary classification of fibrosis stages in different subgroups, i.e. ≥F2 (significant fibrosis), ≥F3 (advanced fibrosis), and F4 (cirrhosis). Similarly, the inflammatory activity was classified as A0, ≥A1, ≥A2, and A3. For comparison, two independent single-task CNN models were tested: the fibrosis model and the inflammation model. The diagnostic performance of LS measurements for the binary classifications was also compared.
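The per-patient decision rule, averaging the per-image class probabilities and choosing the class with the highest mean, can be sketched as follows. The function name and the probability values are hypothetical and serve only to illustrate the rule.

```python
import numpy as np


def aggregate_patient_prediction(image_probs):
    """Average per-image class probabilities and pick the most likely class.

    image_probs has shape (n_images, n_classes), one row per 2D-SWE image.
    Returns the predicted class index and the mean probability vector.
    """
    probs = np.asarray(image_probs, dtype=float)
    if probs.ndim != 2:
        raise ValueError("expected an array of shape (n_images, n_classes)")
    mean_probs = probs.mean(axis=0)
    return int(np.argmax(mean_probs)), mean_probs


# Five images from one hypothetical patient; columns are (negative, positive).
probs_five_images = [
    [0.40, 0.60],
    [0.70, 0.30],
    [0.20, 0.80],
    [0.45, 0.55],
    [0.35, 0.65],
]
predicted_class, mean_probs = aggregate_patient_prediction(probs_five_images)
print(predicted_class, mean_probs)
```

In this example one image votes for the negative class, but the mean positive probability of 0.58 still yields a positive patient-level prediction.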
The DtCNN model was run on a computer equipped with an Intel® Xeon® Processor E5-2640, 16 GB of memory, and four NVIDIA V100 graphics processing units with 16 GB of memory each.
Statistical analysis
Variables were reported as means±SD. Mann-Whitney U-tests and χ2-tests were performed for the comparative analysis. Receiver operating characteristic (ROC) curves were plotted to evaluate discrimination of the fibrosis stages ≥F2 (significant fibrosis), ≥F3 (advanced fibrosis), and F4 (cirrhosis). The area under the curve (AUC), accuracy, sensitivity, specificity, positive predictive value, and negative predictive value were calculated to evaluate the performance of the DtCNN model. Comparisons between the AUCs were performed with the DeLong test.26 All statistical analyses were performed using SPSS v.20.0 (IBM Corp., Armonk, NY, USA). A p-value <0.05 was considered statistically significant.
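For readers who want to reproduce these performance measures outside SPSS, the sketch below computes them with NumPy. It is not the authors' analysis pipeline: the function names and example data are hypothetical, the AUC uses the rank-based (Mann-Whitney) formulation, and the code assumes that both classes and both predicted labels occur at least once so that no denominator is zero. The DeLong comparison is not implemented here.

```python
import numpy as np


def binary_metrics(y_true, y_pred):
    """Accuracy, sensitivity, specificity, PPV and NPV from binary labels."""
    y_true = np.asarray(y_true, dtype=bool)
    y_pred = np.asarray(y_pred, dtype=bool)
    tp = int(np.sum(y_true & y_pred))
    tn = int(np.sum(~y_true & ~y_pred))
    fp = int(np.sum(~y_true & y_pred))
    fn = int(np.sum(y_true & ~y_pred))
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }


def auc_rank(y_true, scores):
    """AUC as the probability that a positive case outscores a negative one.

    Ties between a positive and a negative score count as one half.
    """
    y_true = np.asarray(y_true, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    pos, neg = scores[y_true], scores[~y_true]
    diff = pos[:, None] - neg[None, :]  # all positive-negative score pairs
    wins = np.sum(diff > 0) + 0.5 * np.sum(diff == 0)
    return float(wins / (pos.size * neg.size))


# Hypothetical labels and model scores for six patients.
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]
y_pred = [int(s >= 0.5) for s in scores]  # illustrative cut-off of 0.5
metrics = binary_metrics(y_true, y_pred)
auc_value = auc_rank(y_true, scores)
print(metrics, round(auc_value, 3))
```

In this toy example eight of the nine positive-negative pairs are ordered correctly, so the AUC is 8/9, and the 0.5 cut-off gives a sensitivity and specificity of 2/3 each.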
Results
A typical case of liver fibrosis and inflammation is shown in Figure 2. The LS values increased significantly with the progression of liver fibrosis and inflammation (overall p < 0.001). Patients with inflammation activity grades >2 had higher 2D-SWE LS measurements than those with grades <2. Both fibrosis and inflammation influenced the LS measurement, especially in patients with higher fibrosis stages or severe inflammation activity. The diagnostic performance of the DtCNN in classifying liver fibrosis and inflammation using 2D-SWE images in the retrospective cohort is shown in Figure 3 and Table 2. For classification of significant fibrosis (≥F2), the liver stiffness measurement (LSM) had an AUC of 0.78 (95% confidence interval [CI]: 0.75–0.82), sensitivity of 0.68, and specificity of 0.81. For classification of advanced fibrosis (≥F3), LSM had an AUC of 0.77 (95% CI: 0.72–0.81), sensitivity of 0.60, and specificity of 0.85. For classification of liver cirrhosis (F4), LSM had an AUC of 0.76 (95% CI: 0.72–0.80), sensitivity of 0.66, and specificity of 0.82. The single-task fibrosis model achieved higher AUCs than LSM across the Scheuer fibrosis stages (significant fibrosis AUC=0.82 [95% CI: 0.80–0.85], advanced fibrosis AUC=0.84 [95% CI: 0.81–0.87], liver cirrhosis AUC=0.82 [95% CI: 0.80–0.84]). The DtCNN outperformed the other classification models of Scheuer fibrosis stage (significant fibrosis AUC=0.89 [95% CI: 0.87–0.92], advanced fibrosis AUC=0.87 [95% CI: 0.84–0.90], liver cirrhosis AUC=0.85 [95% CI: 0.81–0.89]). DtCNN-based inflammation staging had AUCs of 0.82 (95% CI: 0.78–0.86) for ≥A1, 0.88 (95% CI: 0.85–0.90) for ≥A2, and 0.78 (95% CI: 0.75–0.81) for A3, which were significantly higher than those of the single-task groups (Table 3).
Table 2. Comparison of diagnostic performance of the models in assessment of significant fibrosis, advanced fibrosis, and cirrhosis using retrospective data
Classification | Model | AUC | Accuracy | Sensitivity | Specificity |
---|
Significant fibrosis (≥F2) | DtCNN model | 0.89 | 0.81 | 0.82 | 0.80 |
| Fibrosis model | 0.82 | 0.78 | 0.59 | 0.90 |
| LSM | 0.78 | 0.76 | 0.68 | 0.81 |
| APRI | 0.72 | 0.73 | 0.62 | 0.80 |
| FIB-4 | 0.66 | 0.66 | 0.56 | 0.72 |
Advanced fibrosis (≥F3) | DtCNN model | 0.87 | 0.80 | 0.81 | 0.79 |
| Fibrosis model | 0.84 | 0.78 | 0.59 | 0.90 |
| LSM | 0.77 | 0.75 | 0.60 | 0.85 |
| APRI | 0.72 | 0.71 | 0.60 | 0.78 |
| FIB-4 | 0.65 | 0.66 | 0.49 | 0.76 |
Cirrhosis (F4) | DtCNN model | 0.85 | 0.78 | 0.62 | 0.89 |
| Fibrosis model | 0.82 | 0.77 | 0.79 | 0.76 |
| LSM | 0.76 | 0.76 | 0.66 | 0.82 |
| APRI | 0.75 | 0.75 | 0.61 | 0.84 |
| FIB-4 | 0.66 | 0.66 | 0.43 | 0.80 |
Table 3. Comparison of diagnostic performance of the models in assessment of liver inflammation using retrospective data
Classification | Model | AUC | Accuracy | Sensitivity | Specificity |
---|
≥A1 | DtCNN model | 0.82 | 0.75 | 0.69 | 0.80 |
| Inflammation model | 0.72 | 0.70 | 0.54 | 0.80 |
| LSM | 0.66 | 0.67 | 0.53 | 0.75 |
| APRI | 0.63 | 0.65 | 0.46 | 0.76 |
| FIB-4 | 0.59 | 0.63 | 0.26 | 0.85 |
≥A2 | DtCNN model | 0.88 | 0.80 | 0.82 | 0.79 |
| Inflammation model | 0.82 | 0.77 | 0.59 | 0.89 |
| LSM | 0.71 | 0.72 | 0.55 | 0.82 |
| APRI | 0.76 | 0.74 | 0.57 | 0.85 |
| FIB-4 | 0.67 | 0.67 | 0.53 | 0.76 |
A3 | DtCNN model | 0.78 | 0.74 | 0.67 | 0.78 |
| Inflammation model | 0.70 | 0.70 | 0.59 | 0.76 |
| LSM | 0.63 | 0.65 | 0.48 | 0.76 |
| APRI | 0.64 | 0.67 | 0.48 | 0.78 |
| FIB-4 | 0.62 | 0.63 | 0.29 | 0.85 |
For the prospective dataset, the DtCNN had the best diagnostic performance compared with the other classification methods. The AUCs were 0.88 (95% CI: 0.83–0.91) for significant fibrosis, 0.83 (95% CI: 0.80–0.85) for advanced fibrosis, and 0.88 (95% CI: 0.82–0.93) for liver cirrhosis (Figure 4 and Supplementary Table 1). DtCNN-based inflammation staging had AUCs of 0.83 (95% CI: 0.79–0.86) for ≥A1, 0.88 (95% CI: 0.86–0.91) for ≥A2, and 0.77 (95% CI: 0.75–0.79) for A3, which were significantly higher than those for the single-task groups (Figure 4 and Supplementary Table 2). The main findings in the prospective validation dataset were consistent with those obtained with the retrospective data. The dual-task DtCNN model increased the diagnostic performance for both fibrosis and inflammation staging.
The 2D-SWE results for Scheuer fibrosis and inflammation stages are shown in Figure 5. With the DtCNN model, the diagnostic concordance rates of fibrosis staging were 77.3% for F1, 51.3% for F2, 51.4% for F3, and 55.7% for F4; and the rates for inflammation staging were 58.9% for A0, 60.3% for A1, 48.9% for A2, and 33.0% for A3. When both Scheuer fibrosis and inflammation stages were taken into consideration in the proposed DtCNN model, the discordance was significantly less than that observed with the single-task models.
Discussion
This study explored the influence of liver inflammation on fibrosis staging using 2D-SWE. The dual-task predictive model significantly improved staging accuracy. The DtCNN model achieved better AUCs than the single-task (i.e. fibrosis or inflammation) models. Although the diagnostic performance of SWE has been well validated for liver fibrosis staging, the effect of hepatic inflammation on tissue stiffness in the presence of liver fibrosis has not been evaluated. To the best of our knowledge, this is the first deep learning model to assess liver fibrosis and inflammation simultaneously.
LS measurement-based fibrosis staging outperformed APRI and FIB-4 in all groups, and the introduction of the deep learning model further improved diagnostic performance in all study groups. Several studies have reported that the diagnostic performance of 2D-SWE was better than that of transient elastography and point SWE in assessing liver fibrosis.27–29 Although stiffness measurement with thresholding is the most widely used method of fibrosis staging, deep learning has advantages because it can extract texture and pattern features.12–14 Previous studies have reported that hepatic surface nodularity,30 coarseness of the hepatic parenchyma,31 and caudate lobe hypertrophy32 were significant predictors of liver cirrhosis. Deep learning can capture the heterogeneity of intensity and texture in the images.
The assessment of significant and advanced fibrosis was significantly improved by the DtCNN compared with single-task models, which indicates that the bias in 2D-SWE measurement caused by liver inflammation was significant. The findings are consistent with previous studies16–18 that reported a significant association between the extent of inflammation and 2D-SWE-based stiffness, and they indicate that both measurements should be considered in the predictive model. It is known that the velocity of shear waves can be affected by inflammation because tissues are composed of viscoelastic materials.33 Moreover, the extent of inflammation increases the viscosity of liver tissue.34 Our DtCNN model performed two tasks simultaneously and shared the features of the layers for joint learning. It thus overcame the interference of liver inflammation with fibrosis staging in CHB patients, which is essential for elastography diagnosis. In addition to liver inflammation, many other confounding factors for fibrosis staging exist (e.g. steatosis, siderosis, obesity, cholestasis, and ascites). It has been reported that in patients with moderate to severe steatosis, thicker abdominal walls attenuated ultrasound reflection and affected the LS measurements.35 Fatty liver was associated with a significant decrease in the AUC of 2D-SWE.35 Our DtCNN model can easily be extended to a multitask model that includes more confounding factors in the prediction, thus further improving the diagnostic performance in real scenarios. In addition, CNN models have been reported to perform well in the assessment of liver fibrosis using computed tomography (CT)36–38 and magnetic resonance imaging (MRI).39,40 With mild modification, the DtCNN model can also be used with those imaging modalities.
Several limitations exist in this study. First, the study was validated only in a single center. It is necessary to test the performance of the DtCNN using US scanners and systems from different manufacturers. Standard algorithms are also necessary to reduce the variability of US images from different centers. Unlike Wang et al.,12 we developed an automatic tool to select the circular ROI inside the Q-Box as the input of the network, as the quality of those areas was well controlled by the system. This technique can be easily applied to existing US scanners without specific operations for the selection of the scanning planes. Second, the distribution of fibrosis and inflammation stages was uneven, which may bias the training procedure. Resampling algorithms should be introduced into the data augmentation step to improve performance.
Conclusions
The proposed DtCNN improved the diagnostic performance of existing fibrosis staging models by introducing liver inflammation into the model. DtCNN provided more accurate assessments of liver fibrosis and inflammation stages than serum biomarkers in patients with CHB, which supports its potential for clinical application.
Supporting information
Supplementary Table 1
Comparison of diagnostic performance of the models in assessment of significant fibrosis, advanced fibrosis and cirrhosis using prospective data.
(DOCX)
Supplementary Table 2
Comparison of diagnostic performance of the models in the assessment of liver inflammation using prospective data.
(DOCX)
Abbreviations
- ALT: alanine aminotransferase
- APRI: aspartate transaminase-to-platelet ratio index
- AST: aspartate transaminase
- AUC: area under the curve
- CHB: chronic hepatitis B
- DtCNN: dual-task convolutional neural network
- FIB-4: fibrosis index based on four factors
- LS: liver stiffness
- LSM: liver stiffness measurement
- ROC: receiver operating characteristic
- ROI: region of interest
- SWE: shear wave elastography
Declarations
Ethical statement
The study was conducted according to the guidelines of the Declaration of Helsinki, and approved by the Institutional Review Board of Shanghai Jiao Tong University School of Medicine.
Data sharing statement
The data supporting the findings of this study can be requested from the corresponding author.
Funding
This study was funded by the National Natural Science Foundation of China (No. 62001120) and the Shanghai Sailing Program (No. 20YF1402400).
Conflict of interest
The authors have no conflicts of interest related to this publication.
Authors’ contributions
Conceptualization (CW, RL, XR), methodology (CW, LZ, YL), software (JL), validation (CW, SX), formal analysis (CW, XH), investigation (YL), resources (FY, XR), writing-original draft preparation (CW), writing-review and editing (XR, RL), visualization (CW), supervision (RL, XR), project administration (CW, LZ), and funding acquisition (XR). All authors have read and agreed to the published version of the manuscript.