Introduction
Hepatocellular carcinoma (HCC) accounts for 90% of liver cancers and is the fourth leading cause of cancer-related mortality worldwide.1,2 Liver resection, liver transplant, ablation, transarterial chemoembolization and systemic drugs have dramatically extended patient life expectancy, but the 5-year relative survival rate of liver cancer still hover at around 20%.1,3 Improvement in progression in prognosis is needed. Immune therapy has challenged traditional HCC therapy. Immune checkpoint inhibitors (ICIs) are the backbone of current immune therapy. The combination of the PD-L1 inhibitor atezolizumab and the anti-vascular endothelial growth factor (VEGF) agent bevacizumab is accepted as a preferred treatment for advanced HCC.4,5 Nevertheless, the overall response rate of immune therapy is less than a third, which limits the survival efficacy.6 More specific prognostic models and subclassifications of patients who might benefit from a particular therapy are needed. With the development in multi-omics profiling, various bioinformatic methods are used for HCC-patient subclassification and prognostic assessment. However, many studies enrolled only parameters from the whole transcriptome profile and overlooked their biological processes. Consequently, the results had an inherent bias and did not reflect the intrinsic feature of HCC.
Aminoacyl-tRNA synthetases (ARSs) are a family of enzymes in the cytoplasm and mitochondria, which catalyze the ligation of amino acids to their corresponding tRNAs in protein synthesis. Some combine with ARS-interacting multifunctional proteins to form multi-synthetase complexes.7 Recent studies have revealed that ARSs are associated with cancer occurrence and progression by their structural metamorphosis and additional motifs and domains including N-terminal helix appendix, leucine zipper, glutathione S-transferase, endothelial monocyte activating polypeptide II (EMAPII) and WHEP domains.8,9 Signaling pathways mediated by ARSs also participate in tumorigenesis of multiple organs.10–12 Several abnormally expressed ARSs have diagnostic and prognostic value in cancer.13–17 Considering their conventional catalytic activity and noncanonical function in tumorigenesis, ARSs may actively participate in initiation and progression of HCC which needs further exploration, and the clinical significance of ARSs in HCC is also unclear.
Herein, we carried out a comprehensive bioinformatic analysis of ARSs in patients from The Cancer Genome Atlas-liver hepatocellular carcinoma (TCGA-LIHC) cohort, constructed a novel prognosis ARS model by Cox regression and least absolute shrinkage and selection operator (Lasso) regression and used it to assess the survival rate of HCC patients. The prognostic value and protein expression level of each enrolled parameter were evaluated in the International Cancer Genome Consortium (ICGC), Gene Expression Omnibus (GEO) and Human Protein Atlas (HPA) databases. To understand the underlying mechanism, genetic mutation profiles were analyzed, and enrichment analyses were performed. Results showed differences in immune-related pathways and led us to conduct single sample gene set enrichment analysis (ssGSEA), comparison of the expression of immunosuppressive molecules, and calculation of the tumor mutation burden (TMB). The study design is shown in Figure 1. The study objective was to explore the prognostic value of ARSs and their function in HCC. We established a novel ARS-related prognosis model and preliminarily explored the underlying mechanisms, which will facilitate the design of therapy that improves patient prognosis.
Methods
Public data acquisition and processing
RNA-seq expression data of TCGA-LIHC cohort and corresponding clinical information were obtained from the TCGA website (https://portal.gdc.cancer.gov/ ). Patients whose survival time was missing or who survived for less than 30 days were excluded, and 343 patients were enrolled in this study. The etiologies of the patients included hepatitis B, hepatitis C, nonalcoholic fatty liver disease, and inherited metabolic diseases such as hemochromatosis and alpha-1 antitrypsin deficiency. We allocated patients randomly to training and test cohorts. Baseline characteristics of patients are shown in Supplementary Table 1. Expression and clinical data of the Liver Cancer-RIKEN Japan (LIRI-JP) cohort and GSE76427 dataset were downloaded from ICGC (https://dcc.icgc.org/ ) and the GEO database (https://www.ncbi.nlm.nih.gov/geo/ ) respectively.
Model construction and validation
Univariate Cox regression analysis of 38 ARS genes was conducted in the training cohort (Supplementary Table 2) and 17 prognostic genes were selected. Next, Lasso ten-fold cross-validation was performed by glmnet and survival R packages, and six genes were screened out. Multivariate Cox regression analysis was then used to construct a prognosis model that was confirmed in the test cohort. We used receiver operating characteristic (ROC) curves to evaluate the model performance, which were drawn by the survivalROC R package.
HPA database analysis
DARS2, YARS1 and CARS2 immunohistochemistry profiles were obtained from the HPA (https://www.proteinatlas.org/ ), an open-access database of protein expression of human normal and tumor tissues.
Clinical characteristics analysis
Patients in the TCGA-LIHC cohort were grouped into different subsets by their clinical characteristics including age, sex, alcohol consumption history, hepatitis B history, hepatitis C history, Child-Pugh score, alpha-fetoprotein (AFP) value, tumor grade, tumor stage, residual tumor, and vascular invasion. Differences between the risk scores of the clinical-feature subgroups were compared with Wilcoxon tests. Results were visualized by the ggplot2 package and a nomogram was constructed with the rms package in R.
Acquisition and analysis of mutation profiles
Mutation profiles (VarScan2 format) of TCGA-LIHC cohort were downloaded from the University of Southern California Xena database (https://xena.ucsc.edu/ ) and analyzed with the R maftools package. Waterfall figures were produced with GenVisR.
Screening and enrichment analysis of differentially expressed genes (DEGs)
DEGs between low-risk and high-risk group were found by the R packages DESeq2, edgeR, and limma. Intersection of three DEG sets was obtained and visualized with the nVennR and VennDiagram packages. Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analysis of common DEGs was carried out with the clusterProfiler package, which was also used for gene set enrichment analysis (GSEA).
Immune-cell infiltration analysis and TMB calculation
The GSVA package was used to estimate the extent of immune-cell infiltration in different groups,18 and the strength and direction of the association between risk scores and immune-cell infiltration were evaluated by Spearman’s correlation. Differences in the expression of immunosuppressive molecules between the low-risk group and high-risk group were tested by the Wilcoxon test. The TMB of each sample was calculated with the TCGA mutations package and differences of TMB between the low-risk and high-risk groups were compared with the Wilcoxon test.
Statistical analysis
Statistical analysis was performed with R software (version 4.1.0). We used log-rank test for Kaplan-Meier survival analysis and calculate hazard ratios and 95% confidence intervals. Two-tailed p-values <0.05 were considered statistically significant.
Results
Construction of a novel prognostic risk score model based on the ARS
Univariate Cox analysis and Lasso regression identified potential independent prognostic ARSs in the training cohort (Fig. 2A–C). A novel prognostic risk score model containing DARS2, YARS1, and CARS2 was constructed by multivariate Cox analysis. The risk score formula was: Risk Score = 0.089 × DARS2 + 0.126 × YARS1 + 0.102 × CARS2.
DARS2, YARS1, and CARS2 predict prognosis and are highly expressed in HCC
To evaluate the prognostic value of DARS2, YARS1, and CARS2, we performed Kaplan-Meier survival analysis in the ICGC LIRI-JP cohort and GEO GSE76427 dataset. Highly expressed DARS2, YARS1, and CARS2 were linked to shorter survival time in the LIRI-JP cohort (Fig. 3A–C). The survival analysis results in GSE76427 were also statistically significant (Supplementary Fig. 1). We also saw increased levels of DARS2, YARS1, and CARS2 protein expression in the HPA database (Fig. 3D).
Assessment of the ARS prognostic risk score model
The risk score of each patient in the TCGA-LIHC cohort was calculated using the derived model. Using the median score of training cohort, the patients in the TCGA-LIHC cohort were divided into high-risk and low-risk groups. The distribution of risk scores, survival time, and gene expression in the training and test cohorts are shown in Figure 4A and B. The prognostic value of model was estimated by survival analysis of the training and test cohorts. A Kaplan-Meier survival curve of the training cohort (Fig. 4C) showed that HCC patients with high-risk scores had a worse prognosis than those with low-risk scores, which was confirmed in the test cohort (Fig. 4D). Performance of the risk score system in the training and test cohorts was evaluated with ROC curves. The areas under the ROC curves of the training and test cohorts were 0.775 and 0.797, respectively (Fig. 4E, F). High-risk scores were also linked to worse recurrence-free survival (Fig. 4G).
ARS risk score is correlated with clinical features and predicts the prognosis of clinical-feature subgroups
The prognostic significance of the model was evaluated by survival analysis within age, sex, tumor grade, stage, and vascular invasion subsets in the TCGA-LIHC cohort. The results (Fig. 5A–E) showed a high-risk score was significantly associated with worse prognosis in patients who were >65-years of age (p<0.001) and ≤65 years of age (p=0.003), female (p=0.032), male (p<0.001), grades 1–2 (p<0.001), stages I–II (p<0.001), stages III–IV (p=0.033), and without vascular invasion (p=0.003). We also found that patients had higher risk score in grade 3–4, stage III–IV, and vascular invasion subgroup than patients in grade 1–2, stage I–II, and no vascular invasion subgroup respectively. There was no evidence that the risk score was correlated with age, sex, tumor size, AFP, Child-Pugh score, residual tumor, or etiology such as viral hepatitis (Supplementary Fig. 2). A prognostic nomogram (Fig. 5F) was constructed using the risk score and clinical features, which provides an easy-to-use tool to predict the 1-, 2-, and 3-year overall survival of HCC patients.
Analysis of somatic mutations based on risk score model
Because of the importance of gene mutations in carcinogenesis and tumor development, we compared genetic mutational profiles of low-risk and high-risk groups to investigate the underlying mechanism. Information related to gene mutations in the TCGA-LIHC cohort is summarized in Supplementary Figure 3. The frequency of TTN mutation was higher in the low-risk than in the high-risk group. TP53 mutations were more frequent in the high-risk group (Fig. 6A, B).
Functional annotation of DEGs in different risk groups
Differential gene expression analysis was conducted to identify DEGs, and 1590 DEGs were selected for later enrichment analysis (Fig. 7A). GO enrichment results showed that DEGs were enriched in pathways such as organelle fission, nuclear division, mitotic nuclear division, nuclear chromosome and diterpenoid metabolic process (Fig. 7B, Supplementary Fig. 4). KEGG enrichment results indicated that top-ranked enrichment pathways were retinol metabolism, metabolism of xenobiotics by cytochrome P450, cell cycle, drug metabolism-cytochrome P450, chemical carcinogenesis-DNA adducts and neuroactive ligand-receptor interaction (Fig. 7C). GSEA was also applied, and we found genes were enriched in pathways related to activation of immune response, cell cycle and activation of ATR signaling in response to replication stress (Fig. 7D-F, Supplementary Fig. 4).
Correlation between risk score and immune-cell infiltration
Driven by the importance of immune cells in hepatic microenvironment and the result showed in Figure 7D that activation of immune response pathway was enriched in GSEA, we performed ssGSEA to analyze immune-cell infiltration levels in each tumor sample. Compared low-risk group with high-risk group, we discovered that infiltration levels of activated B cell, activated CD8+ T cell, effector memory CD8+ T cell, eosinophil, gamma delta T cell, mast cell, memory B cell, and type 1 T helper cell were higher in low-risk group, while infiltration levels of activated CD4+ T cell, central memory CD4+ T cell and type 2 T helper cell were higher in high-risk group (Fig. 8A). The results of Spearman’s correlation analysis showed that risk score was positively correlated with activated CD4+ T cell, central memory CD4+ T cell but inversely correlated with eosinophil, mast cell and memory B cell (Fig. 8B–F). Higher expression of immunosuppressive molecules such as PD-1, TIM3, CTLA-4 and TIGIT was also discovered in high-risk group (Fig. 8G–J). However, differences in PD-L1, PD-L2, and LAG3 expression in the low- and high-risk groups were not significant. (Fig. 8K, Supplementary Fig. 5). We also calculated TMB of each sample in TCGA-LIHC cohort and found a higher TMB in high-risk group (Fig. 8L).
Discussion
Although various technologies and drugs applied in HCC treatment bring an increase in patient life expectancy, HCC is still a big threat for human health and finding the best treatment remains a conundrum. Therefore, a comprehensive understanding of mechanisms in tumor initiation, progression, recurrence, metastasis, and resistance is needed. It has been proved that ARSs are involved in tumor progression through their noncanonical functions but their roles in HCC are unspecific.8 Herein, we conducted in-silico analyses to explore the role ARSs play and their prognostic value in HCC and presented the first ARS family-based prognostic model.
In this study, univariate Cox regression analysis was conducted in training cohort to find ARSs associated with prognosis and Lasso was carried out considering that multivariate collinearity might exist as ARS genes had similar functions. Multivariate Cox regression was used to select independence prognostic ARS genes and DARS2, YARS1, and CARS2 were screened out, which were then enrolled in the prognostic model. We evaluated the prognostic value of three genes in ICGC and GEO database respectively and tested survival prediction capability of the model in test cohort and subgroups with different clinical features. Although survival time of high-risk patient was shorter in grade 3–4 and vascular invasion subsets, results did not meet the predefined threshold of statistical significance, which might be limited by sample size. In addition, results showed higher risk score was correlated with vascular invasion, higher tumor grade, and stage but irrelevant to hepatitis B or hepatitis C history, which are major causes for HCC.19 Therefore, the elevation of risk score might not be etiology driven but a universal reaction during hepatocarcinogenesis.
In this model, DARS2, deficiency of which causes leukoencephalopathy, is involved in mitochondrial unfolded protein response.20,21 Liu et al.,22 have shown that DARS2 is a downstream of NFAT5 and promotes hepatocarcinogenesis through regulating cell cycle and attenuating cell apoptosis, indicating its potential value as a prognosis biomarker. YARS1 was recently potentiated to regulate intracellular signaling and associated with chemotherapeutic response.23,24 It has been reported that upregulated YARS1 aggravates HCC by mediating PI3K/AKT signaling pathway, which supports our result that YARS1 is related to poor prognosis.25,26 Additionally, YARS1 has a C-terminal domain and can be split into two distinct cytokine mimics, one of which is EMAPII.27 EMAPII is a pro-inflammatory cytokine and affects fibronectin disruption and VEGF signaling.28 These results imply that YARS1 could influence prognosis via immune-related mechanism and partially explain why patients with vascular invasion had higher risk score. Although mutation of CARS2 leads to a severe epileptic encephalopathy and complex movement disorder, its role in tumor is still unspecified and further investigations are needed.29 Overall, the three-gene signature successfully predicted the survival of HCC, which drove us to explore the potential underlying mechanisms.
As abnormal and uncontrolled cellular growth in cancer was primarily caused by genetic mutations, we evaluated mutational genes in two distinct groups first and the result showed a higher TP53 mutation frequency in high-risk group.30 Regulating multiple biologic processes, TP53 mutational status affect the immune responses and its functional deletion can lead to centrosome amplification and chromosomal instability.31,32 It also has been reported that TP53 mutation frequencies increase significantly with tumor progression and associate with prognosis of HCC, which can partially account for shorter survival time in high-risk group.33 Next, we conducted the enrichment analysis to figure out other possible underlying mechanisms and found an enrichment in activation of immune response, showing that immune heterogeneity between low- and high-risk groups is a possible reason for the difference in overall survival.
Immunosuppressive microenvironment is crucial for tumor cells to evade immune surveillance.34 Therefore, we performed ssGSEA to evaluate immune-cell infiltration in tumor microenvironment, which presented a result that infiltration levels of antitumor response related cells, including activated B cell, CD8+ T cell, gamma delta T cell, and type 1 T helper cell, were higher in low-risk group, while infiltration levels of cells related to immunosuppression, such as type 2 T helper cell, were higher in high-risk group. We speculated that high-risk score reflected immunosuppression of tumor microenvironment in HCC, which led us to compare expression of immunosuppressive molecules in two distinct groups. Previous studies have shown that PD-1, PD-L1, CTLA-4, TIGIT and TIM-3 are closely related to immunosuppression.35,36 Targeting at these immunosuppressive molecules, ICIs herald a new era of systemic treatment. PD-1 inhibitor pembrolizumab and nivolumab have been accepted as a second-line treatment for patient after sorafenib failure.37,38 Combination of ICIs and other agent such as anti-VEGF agent, tyrosine kinase inhibitors will extend patient survival compared with single ICI agent.39 Our results showed that PD-1, CTLA-4, TIGIT, and TIM-3 were highly expressed in the high-risk group, which supported our speculation. Despite the p-value not reaching significance, PD-L1 expression was also higher in high-risk group, and further validation is needed. Increasing expression of immunosuppressive molecules in high-risk group hinted that ICIs could be used in high-risk patient treatment and ARS-based prognostic model may act as a predictive biomarker for immunotherapy response, which is also supported by the result that TMB was elevated in high-risk group.
Although the ARS-based model constructed in this study successfully predicted prognoses of HCC patients and focused on the underlying TP53 mutation and immune-related mechanism, some limitations remain. First, TCGA cohort did not include all etiological factors for HCC. Although hepatitis B and hepatitis C are not related to risk score, other etiological factors such as nonalcoholic fatty liver disease are not estimated, so whether this model can apply for all etiologies of HCC is remaining for further evaluation. Meanwhile, this model was constructed based on TCGA-LIHC cohort, external validation in prospective cohort and further experimental verification are required.
This study was the first comprehensive analysis about prognostic value of ARS family members in HCC patients. A novel ARS-related prognosis model was established and the possible mechanisms behind the model were preliminarily explored. Our findings provided a novel ARS-based prognostic model which might reflect the immune-suppressive state and contribute to developing precision medicine for HCC.