Introduction
According to 2020 global cancer statistics Hepatocellular carcinoma (HCC) is one of the deadliest cancers, causing more than 830,000 deaths worldwide each year.1 The occurrence of liver cancer is related to risk factors like hepatitis virus infection, excessive alcohol consumption, aflatoxin exposure, and genetic mutations.2–5 HCC is currently treated with surgery, chemotherapy, radiotherapy, targeted therapy, immunotherapy, and liver transplantation.6,7 Liver resection and transplantation are the primary treatment methods for patients with early-stage HCC.8,9 However, surgery carries a high risk of recurrence, and matching suitable organ donors is difficult.10 Therefore, exploring new treatment methods and screening for more effective prognostic markers are necessary and urgent to improve the prognosis of HCC patients.
Serine and glycine are essential amino acids in the human body that not only contribute to the formation of nucleic acids, lipids, amino acids, and coenzymes but also mediate numerous biological synthesis and signal transduction pathways.11–13 Previous studies have shown that the synthesis and metabolism of serine and glycine are pivotal in tumor onset and development.14,15 For example, they play a role in cancer cell nucleotide synthesis,16 lipid metabolism,17 and carbon source supplementing for cancer cell one-carbon metabolism.12 Understanding the metabolic properties of serine and glycine may help to comprehend tumor progression. Recent studies on cancer metabolomics have found that increasing the synthesis of serine and glycine stimulates the synthesis of macromolecules in cancer cells, neutralizes high levels of oxidative stress, modulates methylation and tRNA acetylation, and promotes the occurrence and development of tumors.16,18–21 Downregulation of the synthesis and metabolism of serine and glycine can effectively inhibit the proliferation of cancer cells.22,23 In summary, the synthesis and metabolism of serine and glycine are closely linked to the occurrence and development of tumors. However, the correlation between serine and glycine metabolism-related genes (SGMGs) and the prognosis of HCC, as well as their correlation with immune cells and the tumor immune microenvironment (TIME), is unclear and requires elucidation.
We developed and validated a novel prediction model based on SGMGs using bioinformatics. Additionally, we investigated the relationship between this model and tumor immune cell infiltration to effectively predict the overall survival (OS) rate of HCC patients and generate more accurate risk stratification management. Moreover, we used the TIDE score to analyze the immunotherapy effects of varying risk groups based on the risk model. Our findings can aid in the development of personalized immunotherapy for HCC patients.
Methods
Acquisition of SGMGs in HCC
We searched the LIHC dataset from The Cancer Genome Atlas (TCGA) database as the training cohort, which included clinical data and gene expression profiles from 424 (50 normal and 374 cancer) samples. In addition, we obtained the GSE76427 and GSE14520 datasets from the GEO database as external validation cohorts, which included mRNA expression profiles from 114 HCC samples. The clinical characteristics of patients in the data sets are detailed in Supplementary Table 1. The relevant literature yielded 24 SGMGs (Supplementary Table 2).14 Using the R package edgeR,24 we conducted differential analysis on cancer (n=375) and normal (n=32) samples in the training cohort (|logFC|>1.0, FDR<0.05) to identify differentially expressed genes (DEGs). Subsequently, we took the intersection of these DEGs and SGMGs to identify SGMGs that were differentially expressed in HCC.
Identification of different HCC subtypes based on SGMGs
Based on expression differences of SGMGs in HCC, we performed unsupervised clustering of the training cohort data using the ConsensusClusterPlus package, with 1000 repetitions. We conducted survival analysis on the clustered samples using the R package survival to compare the survival differences. Then, we performed differential analysis (|logFC|>1.0, FDR<0.05) on samples of different subtypes using the R package edgeR. Subsequently, we conducted KEGG and GO enrichment analyses on DEGs using the clusterProfiler R package and visualized the results with the enrichplot R package. Based on these DEGs, we constructed a PPI network in STRING, using interactions with a confidence score greater than 0.9 as the basis for constructing the PPI network.
Establishment of a prognostic model according to DEGs between different subtypes
Based on the DEGs in the PPI network, we established a prognostic model in the TCGA training cohort. Firstly, we identified genes that were significantly linked to survival through univariate Cox regression analysis. Then, we further narrowed the range of genes using the LASSO penalty method. During this process, we adjusted the optimal parameter λ through 10-fold cross-validation and calculated the optimal cutoff value using the “survminer” package. Subsequently, we performed multivariate Cox regression analysis on LASSO-selected genes to determine the final candidate genes and their risk coefficients. Finally, a risk score prognostic model was constructed.
We calculated the risk score value using the formula: Model=∑Coefficient(gene)*Expressionvalue(gene). Based on the median value, we assigned the training cohort into high- and low-risk groups. We then compared the survival rate differences using Kaplan-Meier survival curves, and assessed model sensitivity and specificity by ROC curves. These analyses were performed using the timeROC25 and the survival packages. Additionally, we calculated and presented risk score values and corresponding survival outcomes for each patient in the training and validation cohorts. We plotted expression differences of feature genes between risk groups.
Independent testing of prognostic model for patients
Univariate and multivariate Cox regression analyses were applied to assess independent prognostic value. Based on the prognostic model and clinicopathological features of HCC, a nomogram was generated using the survival R package, regplot R package, and rms R package. To dissect the deviation between the nomogram and the ideal model, calibration curves for 1, 3, and 5-year survival rates were generated using the rms package. ROC curves and decision curve analysis (DCA) curves were plotted to assess the predictive ability of the model.
Analysis of TIME differences
CIBERSORT was used to calculate immune infiltration levels (IILs) of immune cells in high- and low-risk groups, and the Wilcox test was used to evaluate differences in TIME between different risk groups. TIDE is a computational method that predicts the immunotherapy response by simulating the main mechanisms of tumor immune escape: high cytotoxic T lymphocyte (CTL) levels induce T cell dysfunction, while low CTL levels inhibit T cell infiltration.26 In this study, we predicted the response of tumor patients to immunotherapy by calculating the TIDE score. The Imvigor210 dataset included expression data, complete survival data, follow-up information, and immunotherapy response information for 348 cases of urothelial carcinoma downloaded from the Imvigor210CoreBiologies R package. GSE78220 was a melanoma dataset in which patients received anti-PD-1 checkpoint inhibition therapy. Patients in the Imvigor210 and GSE78220 cohorts were assigned into 2 groups according to their response to immunotherapy: response and non-response.
Tumor mutation analysis and heterogeneity analysis
The “maftools” package was used to assay and plot the mutation status of high- and low-risk groups for HCC SNV mutation data. The similarities and differences in mutation types, SNV class, and mutation rates between high- and low-risk groups were assessed, and the top ten genes with the highest mutation rates were selected. Additionally, waterfall plots were drawn to show the mutation status of feature genes.
Prediction of drugs related to feature genes in the model
To find new potential targets and more effective therapeutic drugs, we first downloaded drug/compound-related gene sets (http://dsigdb.tanlab.org/Downloads/DsigDB_All_detailed.txt ) from the DsigDB database (http://tanlab.ucdenver.edu/DsigDB ). Then, the clusterProfiler package and BiocFileCache package were used to perform drug prediction enrichment analysis on the feature genes in the model, and a network diagram and corresponding heat map of drugs and their enriched genes were plotted. Finally, potential small molecule compounds that were significantly associated with both sensitivity and prognostic genes were screened.
Results
Identification of differential SGMGs and different HCC subtypes
In the TCGA training cohort, we identified 4932 DEGs (3892 upregulated; 1040 downregulated) through differential analysis (Supplementary Table 3). Combining the results of literature retrieval, we identified 12 SGMGs that were differentially expressed in HCC by Venn analysis (Fig. 1A). Subsequently, we sorted the TCGA training cohort into 2 stable subtypes based on SGMG levels, with cluster1 containing 255 patients and cluster2 containing 87 patients (Fig. 1B). Survival analysis showed that cluster2 had a worse OS than cluster1 (Fig. 1C). After the patients in the GSE14520 validation cohort were divided into two clusters, the OS of cluster2 was also worse than that of cluster1 (Fig. 1D). In addition, compared to cluster2, we identified 1038 DEGs in cluster1. Enrichment analysis unraveled that DEGs were mainly enriched in KEGG pathways such as hormone metabolism, chemical carcinogenesis-DNA adducts by passive transmembrane transporters, and the cAMP signaling pathway (Fig. 1E). Besides, these DEGs were enriched in GO functions such as regulation of hormone levels, response to exogenous stimulus, passive transmembrane transporter activity, channel activity, and receptor ligand activity (Fig. 1F–G). According to the PPI network, these DEGs may have complex interactions (Fig. 1H). In conclusion, preliminary analysis revealed the role of SGMGs in HCC classification and the influence of SGMGs on HCC prognosis.
Prognostic model construction and validation
To further investigate the role of SGMGs in HCC prognosis, we developed an effective and clinically applicable robust risk feature model. Firstly, we used univariate Cox regression analysis in the TCGA training cohort to screen 109 survival-associated genes (threshold p<0.05). Then, we further screened out superior prognostic features from the 109 genes through LASSO Cox regression analysis (Fig. 2A–B). Finally, we used multivariate regression analysis to identify 12 crucial non-zero coefficient genes to construct a relevant risk score model. Among them, MMP1, S100A9, NPHS2, CDX2, SFRP2, TKTL1, and CHGA were identified as prognostic risk factors, while NCAN, ACE2, SLC5A7, TNNT3, and MYH3 were identified as prognostic protective factors (Fig. 2C). Notably, CHGA, NCAN, TKTL1, S100A9, and SLC5A7 seemed to be statistically insignificant (p>0.05), and we speculated that they may be associated with other markers and outcomes.
Regarding survival analysis, we computed risk scores using a formula. The cases in the TCGA training cohort as well as external validation cohorts GSE76427 and GSE14520 were assigned into high- and low-risk groups according to the median risk score. The Sankey diagram summarized the relationship between clustering and risk grouping (Fig. 3A). In different datasets, Kaplan-Meier survival curves demonstrated differences in survival rates between different risk groups. It could be observed that the low-risk group exhibited better OS in the training and validation cohorts (Fig. 3B–D). The average AUC values for 1, 3, and 5-year prognosis prediction in the TCGA cohort were 0.81, 0.75, and 0.81, respectively (Fig. 3E). The average AUC values for 1, 3, and 5-year survival rates were 0.88, 0.72, and 0.70 in GSE76427, and 0.58, 0.65, 0.67 in GSE14520 (Fig. 3F–G). Additionally, scatter plots showed the distribution of risk scores and survival status in both datasets (Fig. 3H–J). The expression of feature genes in the prognostic model was presented in the form of a heatmap. Among them, ACE2, MYH3, NCAN, TNNT3, and SLC5A7 were highly expressed in the low-risk group, while MMP1, S100A9, NPHS2, CDX2, SFRP2, TKTL1, and CHGA were highly expressed in the high-risk group (Fig. 3K–L). These analyses demonstrated that the risk score was capable of reliably differentiating tumor prognosis differences.
Independent evaluation of prognostic model
To investigate the independence of the prognostic model in predicting patient survival, we performed univariate and multivariate Cox regression analyses based on the clinical factors and risk scores of each sample in the TCGA dataset (Fig. 4A–B). We found that the risk score could serve as an independent prognostic factor in predicting the prognosis of HCC patients. We combined the risk score with prognostic clinical features and constructed a nomogram to predict 1, 3, and 5-year OS (Fig. 4C). The calibration chart indicated that our constructed nomogram performed well compared to the ideal model (Fig. 4D). DCA revealed that the nomogram had favorable predictive performance (Fig. 4E). ROC curves presented that the joint model had the highest AUC value (Fig. 4F), demonstrating that the joint model had the best prognostic predictive efficiency. Taken together, these results indicated that we had successfully constructed a reliable prognostic model with high predictive value.
Evaluation of TIME and immune cell IILs in different risk groups
Considering the relationship between SGMGs and immunity, we applied CIBERSORT to assess the IILs of each HCC patient. The results showed that B cell naïve, B cell memory, Monocytes, Dendritic cells activated, Mast cells resting, and Neutrophils all had higher IILs in the low-risk group, whereas T cell regulatory, Macrophages M0, and Dendritic cells resting had higher IILs in the high-risk group (Fig. 5A). Furthermore, to predict the likelihood of HCC patients responding to immunotherapy, we found that the low-risk group had more immune-responsive patients and a higher Response Percentage (Fig. 5B). In addition, given that patients in varying groups showed varying IILs, we speculated that there may be some differences in the efficacy of ICI treatment between the two groups. Since transcriptome data for HCC patients receiving ICIs were unavailable, data from other malignancies were utilized to support this theory. In order to confirm the link between risk score and immunological response, we examined the IMvigor210 dataset. We discovered that the Response Percentage was greater in the low-risk group (Fig. 5C). At the same time, OS was noticeably better in the low-risk group (Fig. 5D). Additionally, we confirmed the association between risk score and immune response using the GSE78220 dataset, and discovered that Response Percentage was greater in the low-risk group (Fig. 5E). At the same time, OS was notably better in the low-risk group (Fig. 5F). These results suggested that immunotherapy was more effective in low-risk patients. This could explain why HCC patients with low risk had better survival than those with high risk.
Comparing mutation status between high-risk and low-risk groups
Cancer is thought to occur most frequently as a result of gene alterations. To dissect the potential mechanism of using risk score features to evaluate patient prognosis, we studied the gene mutations of patients in different risk groups. Missense mutations had the highest proportion in both groups, followed by frameshift mutations. SNP mutation was the most prevalent type, with C>T and C>A accounting for 50.4% and 46.2% of SNP mutations in high- and low-risk groups, respectively (Fig. 6A–B). The stacked bar chart showed the top 10 mutated genes, with TP53, TTN, MUC16, CTNNB1, RYR2, ALB, PCLO, ARID1A, APOB, and OBSCN in the high-risk group, and TTN, CTNNB1, TP53, MUC16, ALB, PCLO, ABCA13, XIRP2, AXIN1, and APOB in the low-risk group (Fig. 6A–B). For model genes, CHGA, NCAN, and MYH3 were the top 3 mutated genes in the high-risk group, while SLC5A7, ACE2, and CDX2 were the top 3 mutated genes in the low-risk group (Fig. 6C–D).
Drug sensitivity prediction
Furthermore, using the DSigDB database, we selected some potential drugs that were significantly related to both sensitivity and prognosis genes. Streptozocin, norfloxacin, and hydrocotarnine were the top three significantly enriched small molecule compounds (Fig. 7A). In addition, enrichment analysis of target genes showed that streptozocin was enriched in three feature genes (ACE2, NPHS2, and MMP1), while norfloxacin and hydrocotarnine were enriched in two feature genes (Fig. 7B–C). MMP1 was the feature gene that was commonly enriched by all three small molecule compounds, indicating that MMP1 may be a key gene for the action of these potential small molecule compounds.
Discussion
Serine and glycine metabolism is a non-glutamine amino acid metabolism component that is important in energy metabolism. This metabolic pathway has been shown to be critical for cancer cell survival by participating in energy production, biosynthesis, and signal transduction during tumor occurrence and development.13,15,27 As HCC is a molecularly heterogeneous malignancy, its molecular characteristics are linked to tumor biology behavior.28 Therefore, identifying molecular biomarkers linked to serine and glycine metabolism in HCC is of great significance.
Currently, although many researchers have established numerous prognostic models for HCC,29–31 there is no prognostic model related to SGMGs to systematically predict and evaluate the overall prognosis of HCC patients. Therefore, to reveal the potential function and unique prognostic value of SGMGs in HCC, we examined public datasets through bioinformatics analysis to provide new insights for future research.
We successfully built a 12-gene risk score model, in which ACE2, MYH3, NCAN, TNNT3, and SLC5A7 were upregulated in the low-risk group, while MMP1, S100A9, NPHS2, CDX2, SFRP2, TKTL1, and CHGA were upregulated in the high-risk group of HCC. MMP1 is a downstream member of TCONS_00012883 and has been shown to mediate the progression of colorectal cancer with DDX3, YY1, and TCONS_00012883.32 In addition, MMP1 promotes metastasis of ovarian cancer and esophageal squamous cell carcinoma.33,34 S100A9 is a widely studied biomarker in recent years and a pivotal mediator of cancer progression.35–37 It is worth noting that clinical studies have found that downregulation of S100A9 expression in HCC can significantly inhibit cancer cell growth.38 Therefore, targeting S100A9 is a possible strategy for HCC patients. CDX2 is a critical prognostic biomarker for patients with stage I and II colon cancer.39 CDX2 is expressed positively in HCC,40–42 congruous with the findings of this study. Overexpression of SFRP2, TKTL1, and CHGA is implicated in a dismal prognosis in various cancers. For example, Lai et al.’s study43 found that increased SFRP2 was substantially linked to invasive features of urothelial carcinoma and could lead to a poor prognosis. TKTL1 has been shown to be overexpressed in malignant conjunctival tumors and is associated with clinical outcomes (e.g., tumor relapse rate).44 In addition, Allison B Weisbrod et al. ’s study45 found that changes in CHGA expression are related to the phenotype of invasive pancreatic neuroendocrine tumors and can be a useful prognostic marker. Combining the findings of this study, the risk score composed of these genes can be an independent prognosticator for predicting the prognosis of HCC patients.
Furthermore, this study found marked differences in the immune landscape of patients in different risk groups. This may suggest that different expression levels of SGMGs could impact the TIME of HCC patients. He et al.46 found that enhancing extracellular levels of serine significantly inhibited macrophage and neutrophil function. Su et al.47 found that serine and glycine synthesis switch macrophage phenotypes to express immune checkpoint molecule PD-L1 by inducing IL-1β production. Similarly, in this study, we found that macrophages M0 had higher IILs in the high-risk group, while neutrophils had higher IILs in the low-risk group. This finding is not entirely consistent with previous studies. Combining the other results of this study, we found that in addition to macrophages and neutrophils, regulatory T cells, resting dendritic cells, naive B cells, memory B cells, monocytes, and resting mast cells also had different IILs between the two groups. Therefore, we speculated that differences in TIME between varying risk groups may be the result of the combined action of multiple immune cells. However, the molecular mechanisms underlying this require further investigation. Thus, serine and glycine metabolism not only participate in tumorigenesis but also have a connection to TIME.
We screened for potential drugs significantly associated with sensitivity and prognostic genes based on the DSigDB database. Although in vitro and in vivo experiments were not performed to validate results, enrichment analysis revealed that streptozocin was significantly enriched with three feature genes (ACE2, NPHS2, and MMP1) in the model, while norfloxacin and hydrocotarnine were significantly enriched with two feature genes each. Therefore, we speculated that these small molecule drugs may have possible therapeutic effects on HCC. Streptozocin can effectively inhibit tumor cell proliferation.48 Current studies have shown that streptozocin can be widely used in the clinical treatment of pancreatic cancer,49,50 but its role in HCC is still unknown.
We comprehensively assessed RNA sequencing data from HCC patient cohorts from TCGA and GEO, established a risk prognostic model for HCC based on SGMGs, investigated the correlation between SGMGs and the immune landscape of HCC, and predicted the possibility of immunotherapy for patients in different risk groups. Univariate and multivariate Cox regression analyses reported that the risk score model was an independent prognostic factor for HCC. In conclusion, this model had favorable predictive ability for the prognosis of HCC patients, providing the possibility for further improving personalized treatment for HCC patients. However, there are still limitations. Firstly, clinical data collected by bioinformatics analysis are limited and retrospective, and the specificity expression and function of SGMGs in HCC should be validated in prospective designs. Furthermore, the relationship between prognostic model-related genes and HCC is predicted in the bioinformatics analysis stage, which requires further molecular biology experiments for validation. Further research is needed to confirm the efficacy of the potential drugs screened in this study.
Supporting information
Supplementary Table 1
The clinicopathological characteristics of patients in TCGA and GEO.
(XLSX)
Supplementary Table 2
Details of 24 SGMGs.
(XLSX)
Supplementary Table 3
DEGs in TCGA training set
(XLSX)
Abbreviations
- CTL:
cytotoxic T lymphocyte
- DCA:
decision curve analysis
- DEGs:
differentially expressed genes
- GEO:
gene expression omnibus
- HCC:
hepatocellular carcinoma
- IILs:
immune infiltration levels
- OS:
overall survival
- TCGA:
The Cancer Genome Atlas
- TIME:
tumor immune microenvironment
- SGMGs:
serine and glycine metabolism-related genes
Declarations
Ethical statement
This study was carried out in accordance with the Declaration of Helsinki. The individual consent for this retrospective analysis was waived.
Data sharing statement
The data and materials in the current study are available from the corresponding author on reasonable request.
Funding
This work was supported by Postgraduate Science Foundation Project of Zhejiang Chinese Medical University (No. Y202248712).
Conflict of interest
The authors have no conflict of interests related to this publication.
Authors’ contributions
Conception and design: XC. Administrative support: SL. Collection and assembly of data: FX, ZW. Data analysis and interpretation: HC, SL. Manuscript writing: XC, FX. Final approval of manuscript: All authors