Introduction
Globally, hepatocellular carcinoma (HCC) is one of the leading causes of cancer-related mortality. According to the World Health Organization’s estimation, about one million patients will die from HCC in 2030.1,2 Sorafenib and lenvatinib are the standard of therapy for patients with advanced stage HCC, however, the median overall survival is only 13 months.3,4 Although new treatment algorithms and drugs have been constantly developed and tested during the past decades, the 5-year overall survival (OS) rate of HCC patients is as low as 18%.2 Thus, it is urgently required to explore novel targets for HCC treatment.
The immune system can play vital roles in tumor initiation and progression, including for HCC.5 Various studies have demonstrated that aberrantly expressed immune-related genes (IRGs) are related to a high risk of HCC development and a poor clinical outcome.6–8 Abnormal expression of the immune gene programmed death-ligand 1 (PD-L1) was found to be associated with vascular formation in HCC patients.7 An IRG expression pattern in cirrhosis patients has been reported to be correlated with the risk of HCC occurrence. Mice with chronic liver inflammation administrated with nintedanib or aspirin and clopidogrel lost IRG expression pattern and exhibited an attenuation in tumor size.6 Hence, systematic analysis and characterization of the IRGs is of great importance.
Immunotherapy has shed new light on HCC treatment. In a phase 2 trial, nivolumab (PD-1 inhibitor) achieved a median survival of 15.6 months in HCC patients who had been treated with sorafenib.9 Pembrolizumab, another PD-1 inhibitor, also showed similar results in a phase 2 trial.10 However, phrase 3 trials of pembrolizumab as second-line treatment did not demonstrate a longer OS or progression-free survival compared to placebo, indicating the urgent need for accurate biomarkers to predicate the treatment response and prognosis of HCC.1
In the present study, based on the IRGs obtained from the Immunology Database and Analysis Portal database (ImmPort; https://www.immport.org/shared/genelists ),11 we aimed to construct an immune-related prognosis signature (IRPS) with data retrieved from The Cancer Genome Atlas (TCGA) dataset (https://portal.gdc.cancer.gov/ ). The relationship of the prognosis signature with clinicopathological characteristics and immune cell infiltration were then explored. Further, to assess its predictive accuracy and effectiveness, the prognosis signature was validated with the data acquired from the International Cancer Genome Consortium (ICGC) database (https://dcc.icgc.org/releases/current/Projects ).
Methods
Data preprocessing
The messenger RNA (mRNA) expression profiles and clinical data of HCC patients were retrieved from TCGA. The processed RNA-Sep FPKM data of a total of 374 HCC tissues and 50 normal samples were obtained. Another RNA-seq dataset, with 240 HCC tissues and 202 normal samples, was retrieved from ICGC and used as validation cohort for the prognosis signature. Only patients with complete survival information and OS time more than 30 days were selected for further analysis.
In total 2,498 IRGs, containing 17 immune categories according to different molecular functions, were derived from the ImmPort database, while 318 transcription factors (TFs) were obtained from the Cistrome Cancer database (http://cistrome.org/CistromeCancer/CancerTarget/ ).12
All of our data were directly acquired from open-access public databases, and this study was carried out strictly in accordance with the publishing guidelines provided by TCGA and ICGC. Thus, ethical approval was not necessary.
Construction of a TF-immunogene regulatory network
To detect differentially expressed IRGs (DE-IRGs) and TFs in normal and HCC samples, Wilcoxon test method was used in R software (version 3.6.1, https://www.r-project.org/ ). The significant cut-off values were set as log2-foldchange>1 and false discovery rate (FDR)<0.05. Heatmaps and volcano plots were generated with pheatmap package in R. To identify the prognostic value of DE-IRGs, univariate Cox analysis was performed with survival package in R. Only those with p value <0.01 were considered as prognostic IRGs. To assess the regulation of TFs on IRGs, the prognosis-associated TFs (p-value <0.01) were screened out using univariate Cox analysis. The correlation analysis between prognosis-related TFs and prognostic IRGs were performed using Pearson’s test, and the calculations are done using cor.test in R. The significant cut-off values were set as p-value <0.001 and correlation coefficient >0.6. Cytoscape software was employed to build and visualize the TF-IRG regulatory network.13
Establishment of an IRPS for HCC
To establish a prognostic signature, the relationship between prognostic IRG expression and OS was evaluated by Least Absolute Shrinkage and Selection Operator (LASSO) and multivariate Cox regression analyses using the survival and glmnet packages in R. The prognostic signature was exhibited as risk score = (coefficientmRNA1×expression of mRNA1)+ (coefficientmRNA2×expression of mRNA2)+(coefficientmRNA3×expression of mRNA3)+…+ (coefficientmRNAn×expression of mRNAn). The eight-gene based risk scores for HCC patients were calculated via using the survminer R package to find median risk-score as cut-off value, and then the patients were allocated to high- and low-risk groups. The Kaplan-Meier (K-M) survival analysis was implemented to compare the differences in OS rates between the high- and low-risk groups by using the survival package in R. A scatter plot and interactive distribution-based scatter plot for evaluating the performance of the signature were generated to illustrate the gene expression heatmap and K-M survival curves. The receiver operating characteristic (ROC) curve was generated from the survival ROC package in R.
Independence evaluation of the IRPS
To evaluate the independent predictive power of the IRPS based on risk scores, HCC patients with complete demographic and clinical information, such as age, gender, tumor stage, and tumor grade, were included in subsequent analyses. Both univariate and multivariate Cox regression analyses were carried out by a forward stepwise method. A p-value <0.05 was regarded as statistically significant. The results are presented as forest_plots, which were generated via the survival package in R.
Correlation analysis between the IRPS and clinicopathological characteristics
To assess the correlation of prognostic signature and clinicopathological characteristics such as age at diagnosis, TNM stage, and tumor grade, the Chi-squared test was conducted via the beeswarm package in R.
Association analysis between the IRPS and immune cell infiltration
The Tumor Immune Estimation Resource (TIMER; http://timer.cistrome.org/ ), an important database for the systemic analysis of tumor-infiltrating immune cells (e.g., B cells, neutrophils, CD4+ T cells, CD8+ T cells, macrophages, and dendritic cells), was employed to evaluate the association of prognostic signature with immune cells infiltration. The immune cell infiltration data of HCC patients were obtained from TIMER database, and the association between the IRPS and immune cell infiltration was determined in R.
Validation of the IRPS
The validation cohort was retrieved from the ICGC database to validate the predictive values of the IRPS. The same formula was employed to determine the risk scores, and patients were assigned to high- and low-risk groups according to the same cut-off point of the TCGA cohort. Both K-M and ROC curves were generated by the methods described above. Independence evaluation and correlation analysis between IRPS and clinicopathological characteristics were also performed as described above.
External validation of the expression of the IRGs involved in IRPS
The mRNA expression data of the IRGs involved in IRPS were retrieved from both TCGA and ICGC datasets, and then validated using the Oncomine database (https://www.oncomine.org/resource/main.html ). The protein expression levels of the IRPS were also confirmed through the Human Protein Atlas (HPA) database (http://www.proteinatlas.org ).
Gene set enrichment analyses (GSEA)
To disclose the biological information of the IRPS, GSEA were conducted to analyze the enriched Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways based on the C2 in TCGA cohort.14 A p-value <0.05 and FDR <0.05 were deemed as statistically significant.
Establishment of a predictive nomogram
A predictive nomogram was established by including all independent prognostic factors, in order to visualize the personal risk score and survival rate. The graph was generated with the rms package in R according to the multivariate Cox regression data of each independent assessment.
Statistical analysis
All statistical analyses were carried out with R. The relationship between the IRPS and OS were evaluated via log-rank test with survival package in R. The survival curves were generated via the survminer package, while the ROC curves were constructed with the survialROC package. The association analysis between the IRPS and clinicopathological characteristics of HCC was conducted by the beeswarm package in R.
Results
Identification of IRGs with prognostic value in HCC
In total, 329 IRGs (including 62 downregulated and 267 upregulated genes) were identified as differentially expressed genes in HCC samples compared with normal samples. Both heatmap and volcano plots showeds the distribution of DE-IRGs between HCC and normal tissues (Supplementary Fig. 1A, C). Then, the 329 IRGs were evaluated with univariate Cox regression analysis to determine prognostic characteristics. Finally, 62 genes were identified as candidate prognostic IRGs, which are shown in Figure 1.
Construction of TF-immunogene regulatory network
To discover underlying molecular mechanisms of these candidate prognostic IRGs, we explored the interaction between each gene. A total of 118 TFs were differentially expressed between HCC and normal tissues (Supplementary Fig. 1B, D). A regulatory network was built upon these 118 TFs and 62 prognostic IRGs. A correlation score of >0.6 was set as the cut-off value. The constructed TF-immunogene regulatory network revealed the interaction among these IRGs (Fig. 2).
Establishment of the IRPS for HCC prognosis
According to the univariate Cox regression results of the TCGA cohort, LASSO-penalized Cox analysis identified eight genes to establish the IRPS, including retinol binding protein 2 (RBP2), microtubule-associated protein tau (MAPT), baculoviral IAP repeat-containing 5 (BIRC5), roundabout guidance receptor 1 (ROBO1), fidgetin like 2 (FIGNL2), interleukin 17D (IL17D), secreted phosphoprotein 1 (SPP1), and stanniocalcin 2 (STC2). The risk scores were calculated as follows: 0.0132×expression of RBP2+0.1397×expression of MAPT+0.0202×expression of BIRC5+0.0512×expression of ROBO1+0.2357×expression of FIGNL2+0.0560×expression of IL17D+0.0001×expression of SPP1+0.2095×expression of STC2. Based on the median cut-off risk score (0.387082), 343 HCC samples were assigned to the high- (n=171) and low-risk (n=172) groups. K-M survival analysis demonstrated that HCC patients with high-risk scores exhibited remarkably worse OS compared to the low-risk group (p-value=7.731e−08; Fig. 3A). The area under the curve (AUC) for OS was determined to be 0.825 (Fig. 3C), suggesting that this prognostic signature exhibits outstanding sensitivity and specificity. The gene expression heatmap and survival overview are illustrated in Figure 3E.
The prognostic signature was validated in the ICGC cohort. A total of 232 HCC patients were assigned to the high- (n=197) and low-risk groups (n=35) according to the same cut-off value of 0.387082 in the TCGA cohort. Notably, the OS of HCC patients was longer in the low-risk group than in the high-risk group (p-value=3.3632e−02; Fig. 3B). The AUC for OS was determined to be 0.668 (Fig. 3D), indicating moderate sensitivity and specificity of this prognostic signature. Moreover, the gene expression heatmap and survival overview are illustrated in Figure 3F.
Independence evaluation of the IRPS
For the TCGA cohort, univariate analysis demonstrated that tumor stage, metastasis stage, and risk score were independent prognostic factors for OS (Fig. 4). Furthermore, multivariate Cox regression analysis showed that only risk score remained an independent prognostic factor of the survival outcome of HCC patients after adjustment by age, gender, tumor grade and TNM stage (Fig. 4B). For the ICGC cohort, both univariate and multivariate Cox regression analyses revealed that tumor stage and IRPS were the independent prognostic factors for OS (Fig. 4C, D), which were supported by the findings of the TCGA cohort. Taken together, these results confirmed that IRPS could serve as a biomarker for predicting survival outcome of HCC patients.
Relationships between the IRPS and clinical-related factors
To further determine the predictive values of the IRPS, we analyzed the correlation between the IRPS and clinicopathological characteristics. In the TCGA cohort, this prognostic signature was significantly correlated with pathologically poor differentiation (p-value=0.003; Fig. 5A), more advanced tumor stage (p-value=0.004; Fig. 5B), and advanced T stage (Fig. 5C). Similarly, higher risk scores were associated with more advanced tumor stage in the ICGC cohort (Fig. 5D). The clinical significance of each identified IRG is shown in Supplementary Tables 1 and 2.
To analyze whether the IRPS is related to tumor-infiltrating immune cells, the association between the signature and immune cell infiltration were investigated in the TCGA cohort. The results indicated that CD4+ T cells and CD8+ T cells were negatively correlated with the signature, and no correlations were observed between the signature and B cells, dendritic cells, macrophage cells and neutrophil cells. These results are shown in Figure 6.
External validation of the expression of the IRGs involved in IRPS
Compared to the adjacent non-tumor liver tissues, the expression of the eight IRGs involved in IRPS were significantly increased in HCC in TCGA cohort. These results were validated in the ICGC cohort (Supplementary Table 3). Moreover, we confirmed the expression patterns of these eight genes in HCC samples through the Oncomine database. Except for FIGNL2, the mRNA expression levels of RBP2, MAPT, BIRC5, ROBO1, IL17D, SPP1 and STC2 were also increased in HCC tissues by using the Wurmbach liver cohort15 (Fig. 7A). The protein expression of these eight genes was confirmed in the HPA database. Compared to their expression in normal liver tissues, MAPT and ROBO1 were strongly expressed, BIRC5 and SPP1were moderately expressed, and STC2 were weakly expressed in HCC tissues (Fig. 7B, C). However, the expression of RBP2 was not detected in both HCC and normal liver tissues (Fig. 7B, C), the expression of FIGNL2 and IL17D were not found on the website.
GSEA
GSEA were preformed and identified 65 significantly enriched KEGG pathways in the TCGA cohort. The top five KEGG pathways in the high-risk group were: “KEGG_PYRIMIDINE_METABOLISM”, “KEGG_PURINE_METABOLISM”, “KEGG_RNA_DEGRADATION”, “KEGG_BASE_EXCISION_REPAIR”, and “KEGG_CELL_CYCLE”. The top five KEGG pathways in the low-risk group were: “KEGG_COMPLEMENT_AND_COAGULATION_CASCADES”, “KEGG_FATTY _ACID_METABOLISM”, “KEGG_DRUG_METABOLISM_CYTOCHROME_ P450”, “KEGG_PRIMARY_BILE_ACID_BIOSYNTHESIS”, and “KEGG_RETINOL _METABOLISM” (Fig. 8A).
Establishment of a predictive nomogram
A predictive nomogram was established to systematically validate the predictive value of the prognostic signature in HCC patients (Fig. 8B). All independent clinical risk factors, including age, gender, grade, stage, and tumor, node, and metastasis stages combined with the prognostic signature, were involved. Each factor was assigned a score point according to its risk for survival. In the nomogram plot, the influence of different factors on survival were represented by the length of the line. The prognostic signature had the longest line, indicating that it had the greatest influence on the predication of survival probability. Compared to other clinical characteristics, our prognostic signature contributed the highest number of risk points (from 0 to 100) in the nomogram, which was consistent with the results of independence evaluation analyses.
Discussion
HCC is one of the most prevalent and life-threatening cancers around the world, with rapid progression and difficulty in treatment.16 Although radiotherapy, molecular targeted therapy, and other treatment regimens can modestly prolong the survival time of HCC patients, the clinical outcomes of these patients are still unsatisfactory.1,17,18 It has been reported that the immune system is involved in tumor development and progression. However, immune PD-1/PD-L1 checkpoint therapy, such as nivolumab and pembrolizumab, remains to be further investigated due to the unsatisfactory OS.5,18
Thus far, there is no established personal treatment strategy for HCC patients. Individual risk assessment might be an essential step for the successful implementation of personal treatment strategies.19–21 Construction of a gene signature has been proposed to estimate the treatment outcomes of patients with HCC.22–26 Thus, the prognostic signatures based on IRGs might serve as effective targets for HCC treatment.27,28
In the present study, we screened out 62 candidate prognostic IRGs based on the transcriptional data of the TCGA cohort. A TF regulatory network was established to elucidate the interactions among these candidate prognostic IRGs. According to the results of LASSO-penalized Cox analysis, a novel IRPS was constructed and validated with the ICGC cohort. This prognostic signature could efficiently classify the OS of HCC patients into two different cohorts. The predictive performance of the IRPS was excellent in the TCGA and ICGC cohorts, with ROC=0.825 and 0.668, respectively. Furthermore, this IRPS was an independent prognostic factor of HCC and positively correlated with clinicopathological characteristics in the two cohorts, indicating a high effectiveness of the IRPS. The mRNA and protein expression of these genes involved in IRPS were confirmed through the Oncomine and HPA databases, respectively. More importantly, the cut-off value of our signature in both the TCGA and ICGC cohorts are the same, indicating that our prognostic signature is of high clinical practicability. In addition, a nomogram based on our prognostic signature combined with other clinical information was established to predicate the survival probability of patients with HCC, which could provide a convenient and efficient means for clinical use.
Besides, our IRG-based prognostic signature demonstrated a negative correlation with CD4+ T cells and CD8+ T cells. These results implied that HCC patients in the high-risk group might exhibit decreased infiltration levels of CD4+ T and CD8+ T cells. Tumor-infiltrating T cells have been shown to control the progression of HCC, thus contributing to better prognosis.29 Ma and co-workers30 founded that CD4+ T cells loss might be the mechanism underlying hepatocarcinogenesis induced by non-alcoholic fatty liver disease. Fu and co-workers31 reported that the abnormal levels CD4+ cytotoxic T cells was associated with high recurrence and poor survival rates in HCC patients. In addition, CD8+ T cells were linked to prolonged survival in patients with HCC.7,32 Nonetheless, the relationship between IRGs and immune cells in HCC is still indistinct. Our results demonstrated that IRPS could accurately reflect the status of HCC tumor immune microenvironments, which provides new insights into the molecular basis of HCC. However, more research is needed in the future.
To disclose the underlying biological mechanisms of the IRPS, GSEA were performed on the high- and low-risk group. In the high-risk group, the top five enriched KEGG pathways were primarily focusing on metabolic processes and cell proliferation, indicating that these IRGs can play an essential role in HCC prognosis.
Most of the eight IRGs have been identified as candidate biomarkers in various types of cancers. RBP2 is highly expressed in HCC tissues compared to normal tissues, and has been found to be correlated with alpha-fetoprotein, tumor differentiation and TNM stage.33,34 Knocking down of RBP2 significantly suppressed HCC proliferation, and vice versa, indicating that RBP2 may be involved in the pathogenesis of HCC.33 MAPT has been reported to be associated with poor prognosis and decreased sensitivity to taxane-based therapies in several tumor types.35–37 Both BIRC5 and SPP1 have been found to promote HCC cell proliferation.38,39 ROBO1 is overexpressed in HCC samples.40 Overexpression of ROBO1 could promote HCC cell growth and metastasis both in vitro and in vivo.41 STC2 is upregulated in HCC, thereby facilitating HCC proliferation and survival as well as mediating chemotherapeutic resistance.42–44 The potential molecular mechanisms of genes involved in IRPS to regulate HCC progression is illustrated in Supplementary Figure 2. However, further studies are needed to elucidate the regulatory roles of candidate IRGs in HCC.
Nevertheless, some limitations to this study should be highlighted. First, this signature was constructed based on retrospective data, and thus a prospective study is required to validate our findings. Second, the IRGs in this signature do not represent the same biological processes. Further experiments should be carried out to reveal the exact mechanisms both in vitro and in vivo. Finally, the relationship between the IRPS and immunotherapy response has not been studied due to lack of patients treated with immunotherapy.
Conclusions
In summary, we constructed a prognostic signature based on IRGs in HCC patients. This signature may provide novel prognostic and therapeutic biomarkers for HCC.
Supporting information
Supplementary Fig. 1
Differentially expressed IRGs in HCC.
(A) Heatmap of the DE-IRGs in HCC. Heatmap (B) and volcano plot (D) of differentially expressed TFs in HCC. (C) Volcano plot of DE-IRGs in HCC.
(TIF)
Supplementary Fig. 2
Potential mechanisms of IRPS to regulate HCC progression.
Upregulation of the Slit2-Robo1 signaling pathway lead to E-cadherin degradation which reduces the loss of cell-cell contact. OCT4 enhances the expression of BIRC5 via cyclin D1, which promotes the proliferation of HCC cells. IL-17-producing cells are enriched in HCC tissue and their levels correlate with disease progression and poor survival. RBP2 protein may stimulate HIF-2a via activation of the PI3K/Akt signalling pathway and then stimulate VEGF expression, which is involved in HCC angiogenesis. STC2 could regulate the expression of cyclin D1 and activate ERK1/2, so that the expression of STC2 promotes HCC cell proliferation. SPP1 interacts with miR-181c, which indicates a post-transcription regulation in HCC and may function as an enhancer of HCC growth. PRKCA, MARK1, and ABL2 may act on MAPT protein. The report for FIGNL2 is very rare.
JCTH-D-21-00017-s002.tif
(TIF)
Supplementary Table 1
Association of clinicopathologic characteristics with the IRGs and prognostic signature in TCGA cohort.
(DOCX)
Supplementary Table 2
Associations of clinicopathologic characteristics with the IRGs and prognostic signature in ICGC cohort.
(DOCX)
Supplementary Table 3
Differential expression of the IRGs involved in IRPS in HCC.
(DOCX)
Abbreviations
- AUC:
area under the curve
- BIRC5:
baculoviral IAP repeat containing 5
- DE-IRGs:
differentially expressed immune-related genes
- FIGNL2:
fidgetin like 2
- GSEA:
gene set enrichment analyses
- HCC:
hepatocellular carcinoma
- HPA:
Human Protein Altas
- ICGC:
International Cancer Genome Consortium
- IL17D:
interleukin 17D
- ImmPort:
Immunology Database and Analysis Portal database
- IRGs:
immune-related genes
- IRPS:
immune-related prognosis signature
- KEGG:
Kyoto Encyclopedia of Genes and Genomes
- K-M:
Kaplan-Meier
- LASSO:
Least Absolute Shrinkage and Selection Operator
- MAPT:
microtubule associated protein tau
- mRNA:
messenger RNA
- OS:
overall survival
- PD-L1:
immune gene programmed death-ligand 1
- RBP2:
retinol binding protein 2
- ROBO1:
roundabout guidance receptor 1
- ROC:
receiver operating characteristic
- SPP1:
secreted phosphoprotein 1
- STC2:
stanniocalcin 2
- TCGA:
The Cancer Genome Atlas
- TFs:
transcription factors
- TIMER:
Tumor Immune Estimation Resource
Declarations
Acknowledgement
The authors would like to express their gratitude to EditSprings (https://www.editsprings.com/) for the expert linguistic services provided.
Data sharing statement
All data are available upon request.
Funding
This study was supported by the National Natural Science Foundation of China (No.81703916 to YL), and the Natural Science Foundation of Hunan Province (No.2018JJ6042, to YL).
Conflict of interest
The authors have no conflict of interests related to this publication.
Authors’ contributions
Conception and design (HL, YL), administrative support (YL), provision of study materials or patients (XX, HL, PT), Collection and assembly of data (XX, PT), and data analysis and interpretation, manuscript writing, final approval of the manuscript (HL, YL, XX, PT).