Original Article
Open Access
Chen-Xia Lu, Chuan-Xi Tian, Yi-Bo Jiao, Hui Zhu, Hai-Yan Yu, Zi-Xin Shu, Ling-Han Zhang, Jia Zhang, Lan Wang, Qi Hao, Wen-Bin Zou, Ming-Zhong Xiao, Cheng-Hai Liu, Qiu-Yang He, Bee Luan Khoo, Xiao-Dong Li
Published online April 8, 2026
Journal of Clinical and Translational Hepatology.
doi:10.14218/JCTH.2025.00631
Abstract
Metabolic dysfunction-associated fatty liver disease (MAFLD) represents a predominant cause of chronic liver disease, underscoring the demand for accessible, non-invasive diagnostic tools. Tongue diagnosis in Traditional Chinese Medicine provides a distinctive perspective on systemic health, though it remains largely subjective. This study aimed to develop an interpretable multimodal deep learning model for MAFLD screening by integrating quantitative tongue image features with routine clinical data.
From 904 screened candidates, 477 subjects (157 healthy, 320 MAFLD) were included and randomly allocated to training, validation, and test sets in an 8:1:1 ratio. All participants underwent standardized tongue imaging (International Commission on Illumination [CIE] L*a*b* color features) and comprehensive clinical evaluation. We constructed a dual-stream deep learning model that combines a ConvNeXt-Tiny network for tongue images with a multilayer perceptron for clinical variables. Feature fusion was achieved via a Dynamic Affine Feature Transformation module, and the model was trained using weighted cross-entropy loss.
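The abstract does not say how the class weights for the weighted cross-entropy loss were chosen. As a minimal sketch, assuming inverse-frequency weights computed from the reported cohort counts (157 healthy, 320 MAFLD), the loss could look like the following. The function names `inverse_frequency_weights` and `weighted_cross_entropy` are hypothetical and are not taken from the article.

```python
import math

# Class counts reported in the abstract (whole cohort, before the 8:1:1 split).
counts = {"healthy": 157, "mafld": 320}


def inverse_frequency_weights(counts):
    """Assumed weighting scheme: w_c = N / (K * n_c), so rarer classes weigh more."""
    total = sum(counts.values())
    k = len(counts)
    return {label: total / (k * n) for label, n in counts.items()}


def weighted_cross_entropy(probs, labels, weights):
    """Weighted mean of -log p(true class), normalized by the sum of the weights used."""
    numerator = 0.0
    denominator = 0.0
    for p, y in zip(probs, labels):
        w = weights[y]
        numerator += -w * math.log(p[y])
        denominator += w
    return numerator / denominator


weights = inverse_frequency_weights(counts)

# Two illustrative (made-up) predictions: one healthy subject, one MAFLD subject.
example_probs = [
    {"healthy": 0.9, "mafld": 0.1},
    {"healthy": 0.2, "mafld": 0.8},
]
example_labels = ["healthy", "mafld"]
example_loss = weighted_cross_entropy(example_probs, example_labels, weights)
```

Under this assumed scheme the minority (healthy) class receives roughly twice the weight of the MAFLD class, so errors on healthy subjects are penalized more heavily during training.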
MAFLD patients showed significant metabolic abnormalities compared with healthy controls. A progressive decrease in tongue yellowness (b* value) was observed with advancing fibrosis. On an independent test set (n = 48), the multimodal model achieved 97.92% accuracy, a quadratic weighted kappa of 0.9538, 96.88% sensitivity, and 100% specificity, outperforming single-modality and serological models. Interpretability analyses confirmed the model’s focus on clinically relevant tongue regions and key metabolic drivers.
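The reported test-set metrics can be checked against one another. The abstract gives no confusion matrix, so the counts below are an inference: with n = 48, the reported percentages are consistent with 32 MAFLD and 16 healthy subjects and a single missed MAFLD case. For a two-class problem the quadratic weighted kappa reduces to Cohen's kappa, which is what this sketch computes. The function name `binary_metrics` is hypothetical.

```python
# Counts inferred from the reported percentages; the abstract does not state them.
tp, fn, tn, fp = 31, 1, 16, 0  # MAFLD is treated as the positive class


def binary_metrics(tp, fn, tn, fp):
    """Accuracy, sensitivity, specificity, and Cohen's kappa from a 2x2 table."""
    n = tp + fn + tn + fp
    accuracy = (tp + tn) / n
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    # Agreement expected by chance, from the row and column totals.
    p_chance = ((tp + fn) * (tp + fp) + (tn + fp) * (tn + fn)) / n**2
    kappa = (accuracy - p_chance) / (1 - p_chance)
    return accuracy, sensitivity, specificity, kappa


accuracy, sensitivity, specificity, kappa = binary_metrics(tp, fn, tn, fp)
print(f"accuracy={accuracy:.4f} sensitivity={sensitivity:.4f} "
      f"specificity={specificity:.4f} kappa={kappa:.4f}")
```

With these inferred counts, accuracy is 47/48, sensitivity is 31/32, specificity is 16/16, and kappa is 992/1040, which agree with the reported 97.92%, 96.88%, 100%, and 0.9538.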
We developed an accurate and interpretable multimodal model that synergizes tongue image features with metabolic indicators for MAFLD screening. This approach presents a promising, low-cost tool potentially well-suited for resource-limited settings.