Introduction
With the growing epidemic of obesity and type 2 diabetes mellitus, nonalcoholic fatty liver disease (NAFLD) has become one of the most common chronic liver diseases worldwide.1,2 It is reported that the global prevalence of NAFLD is approximately 25%, and the prevalence in the USA has risen from 20.0% to 31.9% in the past 3 decades.3,4 There is a similar estimated prevalence of 29.62% in Asia.5 NAFLD may progress through various fibrosis stages and has the potential to develop into cirrhosis and hepatocellular carcinoma. Liver fibrosis is closely related to a poor prognosis and is considered a strong prognostic predictor for NAFLD.6–8 Therefore, identifying patients with advanced fibrosis for stratification and early intervention is critical for individualized management of NAFLD.
Liver stiffness measurement (LSM) and controlled attenuation parameter (CAP) using transient elastography are regarded as reliable methods for the diagnosis of liver fibrosis and steatosis in NAFLD.9,10 Liver biopsy, the “gold standard” for diagnosing liver fibrosis, is impractical for wide usage in NAFLD due to its invasiveness, sampling variability, poor acceptability, and the high prevalence of NAFLD.11,12 These limitations highlight the need for reliable noninvasive fibrosis scores. Currently, commonly used noninvasive fibrosis models include the aspartate aminotransferase (AST) to platelet ratio index (APRI),13 body mass index (BMI)-AST/alanine aminotransferase (ALT) ratio and diabetes score (BARD),14 fibrosis 4 index (FIB-4),15 and NAFLD fibrosis score (NFS).16 The formulas for calculating these non-invasive scoring systems are shown in Supplementary Table 1. These models have been tested and perform well in predicting fibrosis in NAFLD.17–19
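As a rough illustration of how these four scores are computed, the sketch below uses the commonly published formulas. It assumes that these match the formulas in Supplementary Table 1, which is not reproduced here. The function names, the AST upper limit of normal of 40 U/L, and the example patient values are illustrative assumptions and are not taken from the study.

```python
import math

def apri(ast, platelets, ast_uln=40.0):
    """AST to platelet ratio index. Platelets in 10^9/L; ast_uln is an assumed upper limit of normal (U/L)."""
    return (ast / ast_uln) / platelets * 100

def fib4(age, ast, alt, platelets):
    """Fibrosis-4 index: age (years), AST and ALT (U/L), platelets (10^9/L)."""
    return (age * ast) / (platelets * math.sqrt(alt))

def nfs(age, bmi, ifg_or_diabetes, ast, alt, platelets, albumin_g_dl):
    """NAFLD fibrosis score; ifg_or_diabetes is True for impaired fasting glucose or diabetes."""
    return (-1.675 + 0.037 * age + 0.094 * bmi
            + 1.13 * (1 if ifg_or_diabetes else 0)
            + 0.99 * (ast / alt)
            - 0.013 * platelets
            - 0.66 * albumin_g_dl)

def bard(bmi, ast, alt, diabetes):
    """BARD score: BMI >= 28 (1 point), AST/ALT >= 0.8 (2 points), diabetes (1 point)."""
    score = 0
    if bmi >= 28:
        score += 1
    if ast / alt >= 0.8:
        score += 2
    if diabetes:
        score += 1
    return score

# Hypothetical patient, for illustration only.
patient = dict(age=50, bmi=32, diabetes=True, ast=20, alt=20, platelets=250, albumin=4.1)
print(apri(patient["ast"], patient["platelets"]))
print(fib4(patient["age"], patient["ast"], patient["alt"], patient["platelets"]))
print(nfs(patient["age"], patient["bmi"], patient["diabetes"], patient["ast"],
          patient["alt"], patient["platelets"], patient["albumin"]))
print(bard(patient["bmi"], patient["ast"], patient["alt"], patient["diabetes"]))
```

Note that APRI depends on the laboratory's AST upper limit of normal, so the value should be set to the local reference range rather than the placeholder used here.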
Metabolic-associated fatty liver disease (MAFLD) is a new concept, proposed in 2020 to revise the term NAFLD.20 Unlike NAFLD, MAFLD does not need to exclude alcohol intake or any other liver diseases. MAFLD will be diagnosed if the patient has hepatic steatosis and any of the following three conditions: overweight/obesity, type 2 diabetes mellitus, or at least two metabolic abnormalities in nonobese individuals.21 Considering the significant difference between MAFLD and NAFLD, the applicability of traditional noninvasive fibrosis scores requires re-evaluation. This study aimed to verify the performance of different noninvasive scores in predicting advanced fibrosis in MAFLD.
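The diagnostic rule described above can be sketched as a small function. This is a minimal illustration, not the consensus definition itself: the function and parameter names are hypothetical, and the individual metabolic abnormalities are not enumerated here, so the caller is assumed to supply their count.

```python
def is_mafld(hepatic_steatosis, overweight_or_obese, type2_diabetes,
             n_metabolic_abnormalities):
    """Return True if the MAFLD criteria summarized in the text are met.

    hepatic_steatosis: evidence of steatosis (imaging, biomarkers, or histology)
    n_metabolic_abnormalities: count of metabolic abnormalities, supplied by the caller
    """
    if not hepatic_steatosis:
        return False
    return bool(overweight_or_obese
                or type2_diabetes
                or n_metabolic_abnormalities >= 2)

# Lean, nondiabetic individual with steatosis and two metabolic abnormalities.
print(is_mafld(True, False, False, 2))
```

Unlike a NAFLD definition, no argument for alcohol intake or other liver diseases is needed, which reflects the point made in the paragraph above.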
Methods
Study population
The study data were obtained from the latest National Health and Nutrition Examination Survey (NHANES) 2017-2018, which is an unbiased survey dataset collected by the National Center for Health Statistics of the Centers for Disease Control and Prevention of the USA. The NHANES database has been frequently used for the study of fatty liver disease.22–24 Currently, NHANES 2017-2018 is the only public database with FibroScan® liver fibrosis assessment, laboratory, and examination data. All NHANES datasets are anonymous and free to access online (https://www.cdc.gov/nchs/nhanes/index.htm).
Additionally, patients with biopsy-proven MAFLD were enrolled from the First Affiliated Hospital of Fujian Medical University in China and Singapore General Hospital in Singapore as an Asian validation cohort. As the hepatitis B virus infection rate is high in Asia, especially among Asian patients who undergo liver biopsy, patients with MAFLD and concurrent hepatitis B were excluded from the Asian cohort. The study protocol was approved by the Ethics Committees of The First Affiliated Hospital of Fujian Medical University and Singapore General Hospital and conformed to the ethical guidelines of the Declaration of Helsinki. All patients provided written informed consent for the use of their data in research studies, such as this one.
Definition of MAFLD and fibrosis
MAFLD was diagnosed based on the updated international expert consensus statement on MAFLD from 2020.21 In the NHANES cohort, hepatic steatosis was measured by FibroScan®, with a criterion of CAP ≥248 dB/m.25 Advanced fibrosis was defined as fibrosis grade ≥F3 (LSM ≥8.2 kPa).26 Examinations with a fasting time <3 h, <10 complete LSMs, or an LSM interquartile range/median LSM ≥30% were considered unsuccessful, and these participants were excluded.
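The FibroScan® criteria above can be expressed as two small checks. The thresholds come from the text; the function and parameter names are hypothetical and only serve the illustration.

```python
def fibroscan_is_valid(fasting_hours, n_complete_lsm, lsm_median_kpa, lsm_iqr_kpa):
    """Apply the exclusion rules from the text to a single examination."""
    if fasting_hours < 3:
        return False
    if n_complete_lsm < 10:
        return False
    if lsm_iqr_kpa / lsm_median_kpa >= 0.30:
        return False
    return True

def classify_fibroscan(cap_db_m, lsm_median_kpa):
    """Return (has_steatosis, has_advanced_fibrosis) using CAP >= 248 dB/m and LSM >= 8.2 kPa."""
    return cap_db_m >= 248, lsm_median_kpa >= 8.2

# Hypothetical examination: 4 h fast, 10 measurements, median 9.0 kPa, IQR 1.8 kPa, CAP 300 dB/m.
if fibroscan_is_valid(4, 10, 9.0, 1.8):
    print(classify_fibroscan(300, 9.0))
```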
All patients in the Asian cohort underwent percutaneous liver biopsy under ultrasonic guidance. When more than 5% of hepatocytes presented steatosis, fatty liver was diagnosed. Advanced fibrosis was defined as stage 3 or 4, according to the Metavir fibrosis stage.27
Statistical analysis
The quantitative variables were expressed as mean±standard deviation or median (interquartile range) and compared by Student’s t-test or Mann-Whitney U-test. The qualitative variables were expressed as counts (percentages) and compared using the χ² test. The receiver operating characteristic (ROC) curve was used to evaluate the performances of noninvasive models. The optimal cutoffs were chosen based on Youden’s index. Statistical analyses were conducted using the SPSS software version 22.0 (IBM Corp., Armonk, NY, USA) and MedCalc software version 20.0 (MedCalc Software Ltd, Ostend, Belgium). A p-value <0.05 was considered statistically significant.
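To make the cutoff selection concrete, the sketch below picks the threshold that maximizes Youden's index (sensitivity + specificity − 1) on a toy dataset. It is a plain-Python illustration of the principle only; the study itself used SPSS and MedCalc, and the function name and toy values are assumptions.

```python
def youden_optimal_cutoff(scores, labels):
    """Return (cutoff, J) maximizing J = sensitivity + specificity - 1.

    A score >= cutoff is treated as a positive prediction.
    labels: 1 for advanced fibrosis, 0 otherwise.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    best_cutoff, best_j = None, -1.0
    for cutoff in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= cutoff)
        tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < cutoff)
        j = tp / n_pos + tn / n_neg - 1
        if j > best_j:
            best_cutoff, best_j = cutoff, j
    return best_cutoff, best_j

# Toy data, for illustration only.
scores = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]
labels = [0, 0, 0, 1, 0, 1]
print(youden_optimal_cutoff(scores, labels))
```

On real data, ties and the choice of candidate thresholds can shift the selected cutoff slightly between software packages.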
Results
Baseline characteristics of participants
The NHANES 2017-2018 dataset contained 9,254 participants. After excluding 3,776 cases with missing data and 405 cases with ineligible FibroScan® data, a total of 5,073 participants were eligible for final analysis (Fig. 1). Among them, 2,622 (51.69%) participants met the criteria for MAFLD. Furthermore, 293 patients with MAFLD were enrolled from The First Affiliated Hospital of Fujian Medical University in China and Singapore General Hospital in Singapore between 2005 and 2021 as an Asian cohort. A total of 356 (13.58%) participants in the NHANES cohort and 86 (29.35%) patients in the Asian cohort had advanced fibrosis (Fig. 1). Patients in the Asian cohort had a lower BMI, a higher prevalence of diabetes mellitus, and higher liver enzyme levels than those in the NHANES cohort (all with a p-value <0.05; Table 1). Baseline characteristics of patients from China and Singapore in the Asian cohort are shown in Supplementary Table 2.
Table 1. Baseline characteristics of the patients with MAFLD
Characteristic | NHANES cohort (n=2,622) | Asian cohort (n=293) | P-value |
---|---|---|---|
Age (years) | 50.70±18.36 | 49.47±13.49 | 0.264 |
Male, n (%) | 1,388 (52.94) | 157 (53.58) | 0.833 |
BMI (kg/m²) | 32.47±6.83 | 29.64±6.89 | <0.001 |
Diabetes mellitus, n (%) | 706 (26.93) | 161 (54.95) | <0.001 |
Hypertension, n (%) | 1,304 (49.73) | 132 (45.05) | 0.190 |
Platelet (×10⁹/L) | 248.62±65.91 | 245.33±83.78 | 0.433 |
Albumin (g/dL) | 4.10 (3.80, 4.30) | 4.16 (3.80, 4.40) | 0.001 |
ALT (U/L) | 20.0 (15.0, 30.0) | 74.0 (40.0, 111.0) | <0.001 |
AST (U/L) | 20.0 (16.0, 25.0) | 52.0 (33.5, 75.5) | <0.001 |
TBIL (µmol/L) | 6.8 (5.1, 8.6) | 13.6 (10.0, 19.0) | <0.001 |
GGT (U/L) | 24.0 (17.0, 37.0) | 82.0 (43.5, 137.5) | <0.001 |
Triglyceride (mmol/L) | 1.45 (1.01, 2.12) | 1.67 (1.23, 2.42) | 0.247 |
HDL-C (mmol/L) | 1.22 (1.03, 1.42) | 1.51 (1.12, 2.00) | 0.025 |
Glycohemoglobin (%) | 6.03±1.21 | 7.68±1.65 | <0.001 |
hs-CRP (mg/L) | 2.52 (1.20, 5.28) | 2.36 (0.82, 6.13) | 0.913 |
HOMA-IR | 3.79 (2.43, 6.38) | 4.54 (2.78, 6.20) | 0.825 |
Performances of APRI, BARD, FIB-4, and NFS in predicting advanced fibrosis in the NHANES cohort
The ROC curves were used to evaluate the performances of traditional noninvasive fibrosis scoring systems for predicting advanced fibrosis in the NHANES cohort (Fig. 2A). NFS had the largest AUROC (0.679; 95% CI: 0.648–0.709), followed by APRI (0.616; 95% CI: 0.583–0.650), FIB-4 (0.601; 95% CI: 0.569–0.633), and BARD (0.589; 95% CI: 0.556–0.621). The optimal cutoff values of the four noninvasive models for predicting advanced fibrosis and the verification of previously reported cutoffs are shown in Table 2. The results showed that the best cutoffs of NFS, APRI, FIB-4, and BARD for diagnosing advanced fibrosis in the NHANES cohort were 0.159, 0.3, 1.02, and 3, respectively. The thresholds for all models, except BARD, were lower than previously reported values.
Table 2. Comparison of the performance among NFS, APRI, FIB-4, and BARD in the NHANES cohort
Score | Cutoff | AUROC | Accuracy (%) | Sensitivity (%) | Specificity (%) | PPV (%) | NPV (%) | PLR | NLR | DOR | Youden’s index |
---|---|---|---|---|---|---|---|---|---|---|---|
NFS | −1.455 | | 42.5 | 84.8 | 35.9 | 17.2 | 93.8 | 1.32 | 0.42 | 3.14 | 0.207 |
| 0.159 | 0.679 | 72.4 | 51.7 | 75.7 | 25.1 | 90.9 | 2.13 | 0.64 | 3.33 | 0.274 |
| 0.676 | | 79.1 | 37.6 | 85.6 | 29.1 | 89.7 | 2.62 | 0.73 | 3.59 | 0.233 |
APRI | 0.3 | 0.616 | 77.6 | 36.5 | 84.0 | 26.4 | 89.4 | 2.29 | 0.76 | 3.01 | 0.205 |
| 0.5 | | 85.2 | 14.3 | 96.3 | 37.8 | 87.7 | 3.86 | 0.89 | 4.34 | 0.106 |
| 1.5 | | 86.6 | 2.3 | 99.9 | 80.0 | 86.7 | 25.46 | 0.98 | 25.98 | 0.022 |
FIB-4 | 1.02 | 0.601 | 58.0 | 58.4 | 57.9 | 17.9 | 89.9 | 1.39 | 0.72 | 1.93 | 0.163 |
| 1.30 | | 68.5 | 37.6 | 73.4 | 18.2 | 88.2 | 1.41 | 0.85 | 1.66 | 0.110 |
| 1.45 | | 73.0 | 32.9 | 79.3 | 20.0 | 88.3 | 1.59 | 0.85 | 1.87 | 0.122 |
| 2.67 | | 86.1 | 9.3 | 98.2 | 44.6 | 87.3 | 5.12 | 0.92 | 5.57 | 0.075 |
| 3.25 | | 86.9 | 6.7 | 99.5 | 68.6 | 87.2 | 13.89 | 0.94 | 14.78 | 0.063 |
BARD | 2 | | 48.0 | 63.8 | 45.5 | 15.5 | 88.9 | 1.17 | 0.80 | 1.46 | 0.093 |
| 3 | 0.589 | 79.8 | 29.2 | 87.7 | 27.2 | 88.7 | 2.37 | 0.81 | 2.93 | 0.169 |
With the newly established cutoffs, the accuracy of the four models ranged from 58.0% to 79.8% (Table 2). The positive likelihood ratio (PLR) and negative likelihood ratio (NLR) of the four models with the new thresholds ranged from 1.39 to 2.37 and from 0.64 to 0.81, respectively, and the diagnostic odds ratios did not exceed 3.5 (Table 2). These scoring systems all had high negative predictive values (NPVs) (>88%), but the positive predictive values (PPVs) were far from ideal (17.9–27.2%). By applying the previously reported cutoff value of NFS for predicting advanced fibrosis (0.676), the sensitivity, specificity, PLR, and NLR were 37.6%, 85.6%, 2.62, and 0.73, respectively. The performances of the other three scoring systems were also unsatisfactory (Table 2).
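The relationships among the measures in Table 2 can be illustrated with the standard definitions, starting from sensitivity, specificity, and the prevalence of advanced fibrosis. The sketch below is an illustration under those textbook definitions; the function name is hypothetical. Using the NFS row at the 0.159 cutoff (sensitivity 51.7%, specificity 75.7%) and the NHANES prevalence of 356/2,622, it gives values consistent with the reported row after rounding.

```python
def diagnostic_metrics(sensitivity, specificity, prevalence):
    """Derive common diagnostic accuracy measures from sensitivity, specificity, and prevalence."""
    # Expected proportions of the whole cohort in each cell of the 2x2 table.
    tp = sensitivity * prevalence
    fn = (1 - sensitivity) * prevalence
    tn = specificity * (1 - prevalence)
    fp = (1 - specificity) * (1 - prevalence)
    plr = sensitivity / (1 - specificity)
    nlr = (1 - sensitivity) / specificity
    return {
        "accuracy": tp + tn,
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "plr": plr,
        "nlr": nlr,
        "dor": plr / nlr,
        "youden": sensitivity + specificity - 1,
    }

# NFS at cutoff 0.159 in the NHANES cohort (values from Table 2 and the Results text).
metrics = diagnostic_metrics(0.517, 0.757, 356 / 2622)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Because the prevalence of advanced fibrosis was low in the NHANES cohort, a modest specificity is enough to keep the NPV high while the PPV stays low, which matches the pattern described above.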
The pairwise comparison of the four noninvasive scores in the NHANES cohort is shown in Supplementary Table 3. The results suggested NFS had the best predictive performance and was statistically significantly better when compared to the other three (NFS vs. APRI, p=0.001; NFS vs. BARD, p<0.001; NFS vs. FIB-4, p<0.001).
Performances of APRI, BARD, FIB-4, and NFS in predicting advanced fibrosis in the Asian cohort
Figure 2B shows the ROC curves of the four noninvasive fibrosis scores when applied to the Asian cohort. The AUROC of NFS was still the largest (0.699; 95% CI: 0.639–0.747), followed by FIB-4, APRI, and BARD (0.683, 0.625, and 0.615, respectively; Table 3). The optimal cutoffs of APRI and FIB-4 in the Asian cohort were the same as or close to those in the NHANES cohort (0.3 vs. 0.3 and 1.21 vs. 1.02, respectively). However, the best cutoffs of NFS and BARD were lower than those in the NHANES cohort (−0.372 vs. 0.159 and 2 vs. 3, respectively). The accuracy of the four models ranged from 49.2% to 72.0%, which was unsatisfactory.
Table 3. Comparison of the performance among NFS, APRI, FIB-4, and BARD in the Asian cohort
Score | Cutoff | AUROC | Accuracy (%) | Sensitivity (%) | Specificity (%) | PPV (%) | NPV (%) | PLR | NLR | DOR | Youden’s index |
---|---|---|---|---|---|---|---|---|---|---|---|
NFS | −1.455 | | 61.7 | 67.4 | 59.4 | 40.8 | 81.5 | 1.66 | 0.55 | 3.02 | 0.269 |
| −0.372 | 0.699 | 72.0 | 53.5 | 79.7 | 52.3 | 80.5 | 2.64 | 0.58 | 4.55 | 0.332 |
| 0.676 | | 71.2 | 17.4 | 94.7 | 57.7 | 73.4 | 3.28 | 0.87 | 3.77 | 0.121 |
FIB-4 | 1.21 | 0.683 | 62.1 | 75.6 | 56.5 | 41.9 | 84.8 | 1.74 | 0.43 | 4.05 | 0.321 |
| 1.30 | | 61.4 | 67.4 | 58.9 | 40.6 | 81.3 | 1.64 | 0.55 | 2.98 | 0.264 |
| 1.45 | | 65.9 | 60.5 | 68.1 | 44.1 | 80.6 | 1.90 | 0.58 | 3.28 | 0.297 |
| 2.67 | | 70.6 | 26.7 | 88.9 | 50.0 | 74.5 | 2.41 | 0.82 | 2.94 | 0.151 |
| 3.25 | | 70.3 | 18.6 | 91.8 | 48.5 | 73.1 | 2.27 | 0.89 | 2.55 | 0.116 |
APRI | 0.3 | 0.625 | 49.2 | 90.7 | 31.9 | 35.6 | 89.2 | 1.33 | 0.29 | 4.59 | 0.226 |
| 0.5 | | 52.9 | 65.1 | 47.8 | 34.1 | 76.7 | 1.25 | 0.73 | 1.71 | 0.130 |
| 1.5 | | 69.6 | 17.4 | 91.3 | 45.5 | 72.7 | 2.01 | 0.90 | 2.23 | 0.097 |
BARD | 2 | 0.615 | 55.2 | 67.4 | 50.2 | 36.0 | 78.8 | 1.36 | 0.65 | 2.09 | 0.177 |
In the Asian cohort, NFS also had the largest AUROC, which was significantly larger than those of APRI and BARD (NFS vs. APRI, p=0.046; NFS vs. BARD, p=0.021; Supplementary Table 4). The AUROC of FIB-4 was larger in the Asian cohort than in the NHANES cohort (0.683 vs. 0.601, p=0.030; Supplementary Table 5). The predictive capabilities of NFS and FIB-4 were not significantly different in the Asian cohort, which had high liver enzyme levels (0.699 vs. 0.683, p=0.519).
Discussion
The main finding of this study was that NFS was more reliable than APRI, BARD, and FIB-4 for predicting advanced fibrosis in patients with MAFLD. Overall, however, the four noninvasive scoring systems did not perform as well in MAFLD as previously reported for NAFLD.
Conventional noninvasive scoring systems calculated from readily available clinical and laboratory parameters are widely used for the assessment of advanced fibrosis in chronic liver disease.28–30 The results of this study suggested that the NFS performed better than the other three non-invasive models in assessing advanced fibrosis for patients with MAFLD. This is probably because NFS includes many metabolism-related parameters, such as BMI, impaired fasting glucose, and diabetes. However, it is also very complex and inconvenient in clinical practice. A novel, simpler, and more accurate noninvasive fibrosis scoring system is urgently required.
FIB-4 was initially developed to assess fibrosis in patients with human immunodeficiency virus/hepatitis C virus.15 Although FIB-4 did not perform well in the NHANES cohort, its performance was better in the Asian cohort. This may be a result of the higher liver enzyme levels and lower BMI among patients in the Asian cohort, because ALT and AST are crucial components of the calculation of FIB-4. Additionally, because BMI is generally lower in Asian patients, the BMI term may contribute less to the accuracy of NFS in the Asian cohort than in the NHANES cohort, whereas FIB-4, which does not include BMI, may be less affected. FIB-4 is also easier to calculate than NFS because it includes only four clinical indicators. Therefore, FIB-4 can be an alternative choice for MAFLD with high liver enzymes when NFS is unavailable.
The APRI score includes only two parameters, AST and platelet count, and BARD has no more than four variables. Both scores are simple to calculate from data that are easy to obtain in clinical practice. APRI and BARD were originally developed to identify fibrosis in patients with hepatitis C and nondiabetic NAFLD, respectively.13,14 However, their performance in predicting advanced fibrosis in patients with MAFLD is not satisfactory. The poor performance of BARD might be caused by the partial overlap between the BARD scoring variables and the MAFLD diagnostic variables.
It is worth mentioning that patients in the Asian cohort differed from those in the NHANES cohort in several respects, such as a higher prevalence of diabetes, a lower BMI, and higher liver enzyme levels. The Asian cohort is composed of populations from China and Singapore, whereas the NHANES cohort is mainly composed of Caucasians from the USA. Moreover, unlike the population-based NHANES survey, the biopsy-proven Asian cohort consisted of patients whose main reason for consultation was elevated liver enzymes. These differences may explain why the cutoffs of NFS and BARD in the Asian cohort were lower than those in the NHANES cohort. This result also suggests that different races and regions may require different thresholds to distinguish advanced fibrosis in MAFLD.
This study is the first large-sample study using FibroScan® and liver biopsy to evaluate the utility of conventional noninvasive fibrosis scoring systems in MAFLD. However, it is necessary to acknowledge the limitations of this study. First, the diagnoses of hepatic steatosis and fibrosis in the NHANES cohort were based on FibroScan® rather than the “gold standard”, liver biopsy. This is because the study data were derived from the latest NHANES, a population-based survey in which liver biopsy could not be performed. Therefore, we validated the results in a biopsy-proven MAFLD population, which supported the findings based on the NHANES cohort. Second, the dataset used in this study is mainly composed of Caucasians in the USA and a smaller number of Asians, and it is unclear whether the results apply to other cohorts. The findings require further verification in more regions and races.
In conclusion, NFS is better than APRI, BARD, and FIB-4 for predicting advanced fibrosis in MAFLD. FIB-4 can be an alternative choice for MAFLD with high liver enzymes when NFS is unavailable. Novel, efficient noninvasive fibrosis scoring systems are urgently needed for patients with MAFLD.
Supporting information
Supplementary Table 1
The four noninvasive scoring systems for detecting fibrosis.
(DOCX)
Supplementary Table 2
Baseline characteristics of patients from China and Singapore in the Asian cohort.
(DOCX)
Supplementary Table 3
Pairwise comparison of ROC curves of different non-invasive scoring systems in the NHANES cohort.
(DOCX)
Supplementary Table 4
Pairwise comparison of ROC curves of different non-invasive scoring systems in the Asian cohort.
(DOCX)
Supplementary Table 5
Comparison of ROC curves for different non-invasive scoring systems between the NHANES and Asian cohorts.
(DOCX)
Abbreviations
- ALT:
alanine aminotransferase
- APRI:
AST to platelet ratio index
- AST:
aspartate aminotransferase
- AUROC:
area under the receiver operating characteristic curve
- BARD:
BMI-AST/ALT ratio and diabetes score
- BMI:
body mass index
- CAP:
controlled attenuation parameter
- FIB-4:
fibrosis-4 index
- LSM:
liver stiffness measurement
- MAFLD:
metabolic-associated fatty liver disease
- NAFLD:
nonalcoholic fatty liver disease
- NFS:
NAFLD fibrosis score
- NHANES:
National Health and Nutrition Examination Survey
- NLR:
negative likelihood ratio
- NPV:
negative predictive value
- PLR:
positive likelihood ratio
- PPV:
positive predictive value
- ROC:
receiver operating characteristic
Declarations
Data sharing statement
All data are available within the submitted article and its supplementary materials.
Funding
This work was supported by the Fujian Province Health Education Joint Project (No. 2019-WJ-16) and the Fujian Province Health Technology Project (No. 2020CXA040).
Conflict of interest
Goh GB and SL have been editorial board members of Journal of Clinical and Translational Hepatology since 2018 and 2021, respectively. The other authors have no conflicts of interest related to this publication.
Authors’ contributions
Study concept and design (YZ, SL), acquisition of data (XC, GBG, MW), analysis and interpretation of data (XC, JH, YW), drafting of the manuscript (XC), critical revision of the manuscript (YZ, SL, GBG, RK), and study supervision (YZ, SL). All authors read and approved the final version of the manuscript.