Introduction
Hepatocellular carcinoma (HCC), which accounts for about 85–90% of primary liver cancer cases, is the fourth most common malignancy and the third leading cause of tumor-related deaths in China.1 Worldwide, HCC is the second and sixth leading cause of cancer-related deaths in males and females, respectively.2 Risk factors for HCC include viral infection, alcoholic cirrhosis and fatty liver, along with chronic inflammatory disorders of the biliary tract, genetic diseases and carcinogens. These factors cause abnormal gene expression, resulting in increased cancer cell proliferation and escape from immune surveillance, which can accelerate progression to HCC.3 Patients with HCC are usually hospitalized at an advanced stage and are prone to recurrence and metastasis. Therefore, it is a priority to explore the potential mechanisms of HCC and to discover sensitive biomarkers for screening high-risk patients.
With advances in molecular biology, bioinformatics and sequencing technologies have been increasingly used in research. Over the past decade, The Cancer Genome Atlas (TCGA) database has accumulated abundant genomic and gene expression profiles for different diseases. Analysis of these data may identify key genes and signaling pathways related to a disease, which may improve understanding of its mechanisms and provide a theoretical basis for clinical applications, including prognostic markers and potential therapeutic targets. Recent studies have shown that non-coding RNAs (ncRNAs), such as microRNAs (miR/miRNAs) and long non-coding RNAs (lncRNAs), are important regulatory molecules involved in various physiological and pathological cellular processes.4 Small nucleolar RNAs (snoRNAs) are a class of nuclear-enriched, intron-derived ncRNAs, 60–300 nucleotides in length, that are found predominantly in the nucleolus and play an important role in chemical RNA modification, pre-rRNA processing and alternative splicing control. Recently, snoRNAs and their host genes (small nucleolar RNA host genes, SNHGs) have been implicated in a wide spectrum of cancers.5–7 SNHG4, which is located in the 5q31.2 region and has five exons, can promote tumor growth in patients with osteosarcoma8 and has been associated with shorter overall survival (OS) in patients with HCC.9 miRNAs are small endogenous RNAs that regulate gene expression post-transcriptionally. SNHG4 was found to promote tumor growth by sponging miR-224-3p in osteosarcoma and miRNA-204-5p in gastric cancer.8,10
Understanding this novel RNA crosstalk can provide significant insight into gene regulatory networks and has implications for disease pathogenesis.11 However, the functional network of SNHG4 and its relationship with HCC have not yet been elucidated. Therefore, the aim of the present study was to investigate the relationship between SNHG4 expression and HCC using online databases, and to predict SNHG4-related pathways, binding proteins and miRNAs (Fig. 1).
Methods
Expression of SNHG4 in HCC
The TCGA portal is the largest and most commonly used public resource providing somatic mutation, gene expression, gene methylation and copy number variation (CNV) datasets for several thousand tumor samples.12 Various computational tools have been developed to help researchers perform specific TCGA data analyses. UALCAN (http://ualcan.path.uab.edu) is an interactive website for in-depth analysis of TCGA gene expression data. It compares the relative expression of query genes between tumor and normal samples, and across tumor subgroups defined by individual cancer stage, tumor grade or other clinicopathological features.13,14 Of the 377 liver hepatocellular carcinoma (LIHC) samples in the TCGA database, SNHG4 expression data were available for only 371. lncLocator 2.0 is a cell-line-specific predictor of lncRNA subcellular localization from sequence, which trains an end-to-end deep model for each cell line. It was used to predict the subcellular distribution of SNHG4 in the HepG2 cell line.15 This prediction was then validated experimentally (see Supplementary Materials).
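As an illustration of the kind of tumor-versus-normal comparison UALCAN reports, the sketch below compares one gene's expression between two groups. It assumes TPM input, a log2(TPM + 1) transformation and Welch's unequal-variance t-test. These are illustrative assumptions, since the exact test UALCAN applies is not described here. The function name compare_tumor_normal and the toy values are hypothetical.

```python
import numpy as np
from scipy import stats


def compare_tumor_normal(tumor_tpm, normal_tpm):
    """Compare one gene's expression between tumor and normal samples.

    Illustrative assumptions: inputs are TPM values, which are log2(TPM + 1)
    transformed and compared with Welch's unequal-variance t-test.
    """
    tumor = np.log2(np.asarray(tumor_tpm, dtype=float) + 1.0)
    normal = np.log2(np.asarray(normal_tpm, dtype=float) + 1.0)
    t_stat, p_value = stats.ttest_ind(tumor, normal, equal_var=False)
    log2_fc = tumor.mean() - normal.mean()  # difference of mean log2 expression
    return log2_fc, t_stat, p_value


# Hypothetical toy values, not TCGA data
tumor = [12.0, 15.5, 9.8, 14.2, 18.1, 11.3]
normal = [5.1, 6.3, 4.8, 7.0, 5.5]
log2_fc, t_stat, p = compare_tumor_normal(tumor, normal)
print(f"log2FC={log2_fc:.2f}, t={t_stat:.2f}, p={p:.2e}")
```

Welch's variant is used in the sketch because tumor and normal groups often differ in size and variance.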
Survival analysis of SNHG4 in HCC
The Kaplan-Meier Plotter (www.kmplot.com) is an online database that can assess the effect of 54,675 genes on survival using 10,461 cancer samples.16 In this study, it was used to analyze the OS and recurrence-free survival (RFS) of patients with HCC. Patients were split into two groups according to whether the expression value of the query gene was above or below the median. The Mantel-Cox (log-rank) test was used to assess the significance of differences between survival curves. The number at risk is indicated below each panel, and the hazard ratio (HR) with 95% confidence interval (CI) and the log-rank p-value are shown in each plot.
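The median-split survival comparison described above can be sketched as follows. This is a minimal illustration, not the Kaplan-Meier Plotter's code. The function name is hypothetical, and ties at the median are assigned to the low group by assumption. The hazard ratio is only approximated from observed/expected event counts, whereas the tool reports its own estimate.

```python
import numpy as np
from scipy.stats import chi2


def logrank_median_split(expression, time, event):
    """Median-split a cohort on one gene and compare survival with a log-rank test.

    expression : expression value of the query gene per patient
    time       : follow-up time per patient (e.g. months)
    event      : 1 if the event (death or recurrence) occurred, 0 if censored
    Returns the log-rank chi-square statistic, its p-value (1 df) and an
    approximate hazard ratio (high vs low) from observed/expected counts.
    """
    expression = np.asarray(expression, dtype=float)
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)
    high = expression > np.median(expression)  # ties at the median go to "low"

    observed_high = expected_high = variance = 0.0
    for t in np.unique(time[event]):  # distinct event times
        at_risk = time >= t
        n = at_risk.sum()
        n_high = (at_risk & high).sum()
        events_now = (time == t) & event
        d = events_now.sum()
        observed_high += (events_now & high).sum()
        expected_high += d * n_high / n
        if n > 1:  # hypergeometric variance contribution at this time
            variance += d * (n_high / n) * (1 - n_high / n) * (n - d) / (n - 1)
    if variance == 0:
        raise ValueError("log-rank variance is zero; groups cannot be compared")

    statistic = (observed_high - expected_high) ** 2 / variance
    p_value = chi2.sf(statistic, df=1)
    observed_low = event.sum() - observed_high
    expected_low = event.sum() - expected_high
    hazard_ratio = (observed_high / expected_high) / (observed_low / expected_low)
    return statistic, p_value, hazard_ratio


# Hypothetical toy cohort (not HCC data): high expressers have events earlier
expr = [1, 2, 3, 4, 5, 6, 7, 8]
time = [10, 12, 9, 11, 3, 4, 2, 5]
event = [1, 0, 1, 1, 1, 1, 1, 1]
stat, p, hr = logrank_median_split(expr, time, event)
print(f"chi2={stat:.2f}, p={p:.4f}, HR~{hr:.2f}")
```

In the toy data the high-expression group has more events than expected under the null hypothesis, so the approximate HR is above 1.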
Functional and signaling pathway analysis
SNHG4-related genes were functionally annotated using Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analyses. GO enrichment analysis is commonly used to identify functional terms over-represented among differentially expressed genes (DEGs) in three categories: molecular function (MF), biological process (BP) and cellular component (CC). KEGG enrichment analysis was used to identify signaling pathways enriched among the DEGs. Both analyses were performed using the R package “clusterProfiler”, and enrichment maps visualizing the results were drawn using R software (http://www.r-project.org/). Gene set enrichment analysis (GSEA; http://www.broad.mit.edu/gsea) was performed to identify SNHG4-related canonical pathways in HCC.
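GO and KEGG over-representation analyses test whether a term's genes appear in a gene list more often than expected by chance. This is typically done with a one-sided hypergeometric test followed by multiple-testing correction. The Python sketch below illustrates that calculation under stated assumptions. It is not clusterProfiler's implementation, and the gene identifiers are hypothetical placeholders. Benjamini-Hochberg adjustment is shown because it is a common default, but this section does not state which correction was applied.

```python
import numpy as np
from scipy.stats import hypergeom


def ora_pvalue(query_genes, term_genes, universe):
    """One-sided hypergeometric p-value for over-representation of a term."""
    universe = set(universe)
    query = set(query_genes) & universe
    term = set(term_genes) & universe
    k = len(query & term)  # query genes annotated to the term
    M = len(universe)      # background size
    K = len(term)          # background genes annotated to the term
    n = len(query)         # number of query genes
    # P(X >= k) for X ~ Hypergeometric(M, K, n)
    return hypergeom.sf(k - 1, M, K, n)


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values (FDR), returned in input order."""
    p = np.asarray(p_values, dtype=float)
    m = p.size
    order = np.argsort(p)
    scaled = p[order] * m / np.arange(1, m + 1)
    # Enforce monotonicity, working back from the largest p-value
    scaled = np.minimum.accumulate(scaled[::-1])[::-1]
    adjusted = np.empty(m)
    adjusted[order] = np.minimum(scaled, 1.0)
    return adjusted


# Hypothetical toy example: 100 background genes, one 10-gene term
universe = [f"G{i}" for i in range(100)]
term = universe[:10]
query = universe[:8] + ["G50", "G51"]
print(ora_pvalue(query, term, universe))
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.5]))
```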
Analysis of genes, miRNA and transcription factor targets related to SNHG4 in HCC
Genes co-expressed with SNHG4 in HCC, together with its miRNA and transcription factor targets, were analyzed using LinkedOmics. The LinkedOmics database (http://www.linkedomics.org/login.php) contains multiomics and clinical data for 32 cancer types and a total of 11,158 patients from TCGA.17 LinkedOmics consists of three analysis modules: LinkFinder, LinkCompare and LinkInterpreter. The LinkFinder module was used to identify genes differentially expressed in correlation with SNHG4. Statistical analysis used Pearson’s correlation coefficient, with p<0.05 considered significant. Results were presented as volcano plots, heat maps or scatter plots. GSEA was then performed using the LinkInterpreter module. The results were further analyzed by KEGG and Wikipathway enrichment in WebGestalt for genes interacting with SNHG4, miRNA targets and transcription factor targets. Gene sets were ranked using a false discovery rate (FDR) cut-off of <0.05, and 500 simulations were performed. To validate the transcription factors obtained from LinkedOmics, we used the Transcription factor Affinity Prediction (TRAP) Web Tools (http://trap.molgen.mpg.de/cgi-bin/home.cgi). These tools were developed at the Max Planck Institute for Molecular Genetics to predict transcription factor (TF) binding affinities to DNA.18
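The LinkFinder step amounts to correlating SNHG4 with every other gene across tumor samples and then controlling the false discovery rate. The Python sketch below shows this under two assumptions: expression vectors are supplied as a dictionary keyed by gene symbol, and Benjamini-Hochberg FDR control is used. LinkedOmics' internal code is not shown here. Function names and toy genes are hypothetical.

```python
import numpy as np
from scipy.stats import pearsonr


def bh_adjust(p_values):
    """Benjamini-Hochberg adjusted p-values (FDR), returned in input order."""
    p = np.asarray(p_values, dtype=float)
    m = p.size
    order = np.argsort(p)
    scaled = p[order] * m / np.arange(1, m + 1)
    scaled = np.minimum.accumulate(scaled[::-1])[::-1]  # keep FDR monotone
    fdr = np.empty(m)
    fdr[order] = np.minimum(scaled, 1.0)
    return fdr


def rank_by_correlation(query_expr, expr_by_gene):
    """Pearson-correlate a query gene (e.g. SNHG4) with every gene across samples.

    expr_by_gene maps a gene symbol to its expression vector over the same samples.
    Returns (gene, r, p, fdr) tuples sorted from most positive to most negative r.
    """
    genes = list(expr_by_gene)
    r_values, p_values = [], []
    for gene in genes:
        r_g, p_g = pearsonr(query_expr, expr_by_gene[gene])
        r_values.append(r_g)
        p_values.append(p_g)
    r = np.asarray(r_values, dtype=float)
    fdr = bh_adjust(p_values)
    order = np.argsort(-r)
    return [(genes[i], float(r[i]), float(p_values[i]), float(fdr[i])) for i in order]


# Hypothetical toy expression data (10 samples), not TCGA data
query = np.arange(1.0, 11.0)
noise = np.array([0.3, -0.2, 0.1, -0.3, 0.2, 0.0, -0.1, 0.3, -0.2, 0.1])
toy = {
    "GENE_POS": 2 * query + noise,
    "GENE_NEG": -query + noise,
    "GENE_FLAT": np.array([5, 3, 8, 1, 9, 2, 7, 4, 6, 5], dtype=float),
}
for gene, r, p, fdr in rank_by_correlation(query, toy):
    print(f"{gene}: r={r:.2f}, p={p:.2e}, FDR={fdr:.2e}")
```

Plotting r against -log10(p) for every gene gives the volcano-plot view described above.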
Oncomine is an online platform that provides data from cancer microarray datasets and can also be used for data mining in multiple cancers, including HCC.19 The Wurmbach Liver dataset was selected from Oncomine to compare the expression of RNF44 and HEATR1 between HCC and normal liver tissues. A p-value <0.05 was considered statistically significant.
Prediction and analysis of RNA-binding proteins (RBPs) related to SNHG4
CirclncRNAnet (http://120.126.1.61/circlnc/index.php) is an integrated web-based resource for mapping functional networks of ncRNAs.20 The Encyclopedia of RNA Interactomes (ENCORI) is an open-source web tool for studying ncRNA interactions using CLIP-seq, degradome-seq, RNA-RNA interactome and RBP data.21 RBPs relevant to SNHG4 were downloaded from CirclncRNAnet and ENCORI. RBPs that were present in both databases and expressed in HepG2 cells (a human hepatoma cell line) were then selected.
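The screening step above is a set intersection across the two databases and a HepG2 expression list. The sketch below assumes each source has been exported as a list of gene symbols. Symbols are normalized (trimmed and upper-cased) in case formatting differs between downloads; this normalization is also an assumption. The function name and all identifiers are hypothetical placeholders, not actual database output.

```python
def shared_expressed_rbps(db_a, db_b, expressed_genes):
    """RBPs listed by both databases and expressed in the chosen cell line.

    Symbols are stripped and upper-cased so that formatting differences
    between downloads do not hide a match (an illustrative assumption).
    """
    def normalize(symbols):
        return {s.strip().upper() for s in symbols}

    return sorted(normalize(db_a) & normalize(db_b) & normalize(expressed_genes))


# Hypothetical placeholder lists, not the actual database downloads
circlncrnanet = ["RBP_A", "RBP_B", "rbp_c "]
encori = ["RBP_B", "RBP_C", "RBP_D"]
hepg2_expressed = ["RBP_C", "RBP_D", "RBP_E"]
print(shared_expressed_rbps(circlncrnanet, encori, hepg2_expressed))
```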
The Human Protein Atlas (HPA; https://www.proteinatlas.org/about) is a Swedish program initiated in 2003. It aims to map all human proteins in cells, tissues and organs by integrating various omics technologies. The HPA consists of six parts: the Tissue Atlas, the Single Cell Type Atlas, the Pathology Atlas, the Brain Atlas, the Blood Atlas and the Cell Atlas. Each part focuses on a particular aspect of the genome-wide analysis of human proteins. The Single Cell Type Atlas presents single-cell RNA sequencing (scRNA-seq) data from 13 different human tissues, together with immunohistochemically stained tissue sections that allow visualization of the corresponding spatial protein expression patterns.
Results
SNHG4 expression in HCC
RNA sequencing (RNA-seq) datasets were obtained from TCGA. SNHG4 was upregulated in HCC samples compared with normal liver samples (Fig. 2A). Using UALCAN, the HCC group was further stratified by sex (Fig. 2B), age (Fig. 2C), individual cancer stage (Fig. 2D), tumor grade (Fig. 2E) and promoter methylation level (Fig. 2F). Moreover, OS and RFS differed significantly between patients with HCC expressing high and low levels of SNHG4 (Fig. 3A and B). lncLocator 2.0 predicted that SNHG4 is distributed mainly in the nucleus of HepG2 cells. We then measured lncRNA SNHG4 expression in the cytoplasm and nucleus in vitro. RNA was extracted from the cytoplasmic and nuclear fractions of two hepatoma cell lines (HepG2 and HuH-7) using the PARIS™ Kit. It was then reverse transcribed and analyzed by quantitative real-time PCR. These results showed that lncRNA SNHG4 was located mainly in the nucleus (for details, see Supplementary Fig. 1).
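After subcellular fractionation followed by qPCR, localization is often expressed as the share of a transcript's signal found in each fraction. The sketch below assumes roughly 100% amplification efficiency, so that one cycle corresponds to a two-fold difference. It also assumes equal cell-equivalent input from the two fractions. The function name and Ct values are hypothetical and are not the data in Supplementary Fig. 1.

```python
def nuclear_percentage(ct_nuclear, ct_cytoplasmic):
    """Percentage of a transcript found in the nuclear fraction, from raw Ct values.

    Assumes ~100% amplification efficiency (one cycle = two-fold) and that the
    nuclear and cytoplasmic cDNA inputs represent equal cell equivalents.
    """
    nuclear = 2.0 ** (-ct_nuclear)          # relative amount in the nucleus
    cytoplasmic = 2.0 ** (-ct_cytoplasmic)  # relative amount in the cytoplasm
    return 100.0 * nuclear / (nuclear + cytoplasmic)


# Hypothetical Ct values for illustration only
print(nuclear_percentage(24.0, 26.0))  # 80.0
```

A two-cycle lower Ct in the nuclear fraction corresponds to four times more transcript there, hence 80% nuclear.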
Functional and signaling pathway analysis
GO annotation showed that SNHG4-related biological processes included RNA splicing, RNA processing and ribonucleoprotein complex biogenesis. Cellular components included the spliceosomal complex and small-subunit processome, and molecular functions included ATPase, helicase and catalytic activity (Fig. 4A, B and C). In addition, KEGG pathway annotation revealed that RNA transport, ribosome biogenesis, the cell cycle and the RNA surveillance pathway were significantly enriched (Fig. 4D). To identify the potential function of SNHG4, GSEA was conducted to search for biological processes enriched in samples with high SNHG4 expression. A total of 96 gene sets were upregulated. Among them, the mitogen-activated protein kinase (MAPK)/ERK, hepatocyte growth factor (HGF)/MET and mTOR gene sets were enriched in samples with high SNHG4 expression (Fig. 5A, B and C).
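GSEA summarizes a gene set with a running-sum enrichment score (ES). Genes are ranked by their correlation with SNHG4. Walking down this ranking, the score rises at gene-set members (weighted by their correlation) and falls at non-members. The ES is the maximum deviation of the running sum from zero. Set members ranked before the peak form the leading edge, the subset that drives the score. The minimal Python sketch below uses weight p = 1 (the standard weighted statistic) and a hypothetical ten-gene ranking; the function name is also hypothetical. It does not show the normalized enrichment score (NES), which additionally divides the ES by the mean of permutation-based ES values of the same sign.

```python
import numpy as np


def enrichment_score(ranked_genes, ranked_metric, gene_set, weight=1.0):
    """Weighted running-sum enrichment score (GSEA-style; weight p = 1 by default).

    ranked_genes  : genes ordered from most positively to most negatively
                    correlated with the phenotype (here, SNHG4 expression)
    ranked_metric : the correlation metric, in the same order
    Returns (ES, index of the peak, leading-edge genes).
    """
    genes = list(ranked_genes)
    metric = np.asarray(ranked_metric, dtype=float)
    in_set = np.array([g in gene_set for g in genes])
    n_hit = in_set.sum()
    n_miss = len(genes) - n_hit
    if n_hit == 0 or n_miss == 0:
        raise ValueError("gene set must be a non-empty, proper subset of the ranking")

    hit_weights = np.where(in_set, np.abs(metric) ** weight, 0.0)
    p_hit = np.cumsum(hit_weights) / hit_weights.sum()  # weighted fraction of hits so far
    p_miss = np.cumsum(~in_set) / n_miss                 # fraction of non-members so far
    running_sum = p_hit - p_miss

    peak = int(np.argmax(np.abs(running_sum)))  # maximum deviation from zero
    es = float(running_sum[peak])
    if es >= 0:  # positive ES: leading edge is members at or before the peak
        leading_edge = [g for g, hit in zip(genes[: peak + 1], in_set[: peak + 1]) if hit]
    else:        # negative ES: leading edge is members at or after the peak
        leading_edge = [g for g, hit in zip(genes[peak:], in_set[peak:]) if hit]
    return es, peak, leading_edge


# Hypothetical ranking of ten genes by correlation with the query gene
genes = [f"G{i}" for i in range(1, 11)]
metric = [3.0, 2.5, 2.0, 1.5, 1.0, -0.5, -1.0, -1.5, -2.0, -2.5]
es, peak, edge = enrichment_score(genes, metric, {"G1", "G2", "G4"})
print(round(es, 3), peak, edge)  # 0.857 3 ['G1', 'G2', 'G4']
```

A gene set with more hits before the peak has a larger leading edge, and those hits account for most of its enrichment score.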
Analysis of genes, miRNA and transcription factor targets related to SNHG4 in HCC
LinkedOmics was used to analyze RNA-seq data from 371 patients with HCC in TCGA LIHC. A total of 8,744 genes were significantly correlated with SNHG4 (FDR <0.01, p<0.05; Fig. 6A). We found 49 positively and 50 negatively correlated genes, shown in the heat maps in Figure 6B and C, respectively. Among the positively correlated genes, RING finger protein 44 (RNF44; Pearson’s correlation coefficient: 0.66, p=3.76×10⁻⁴⁷) and HEAT repeat-containing protein 1 (HEATR1; Pearson’s correlation coefficient: 0.64, p=2.3×10⁻⁴⁴) were significantly correlated with SNHG4. KEGG and Wikipathway cancer enrichment analyses were conducted in WebGestalt on the genes interacting with SNHG4 that had been identified with LinkedOmics LinkFinder. Ribosome biogenesis in eukaryotes and the RNA processing and surveillance pathway were among the top terms (Fig. 6D and E). Analysis of the Wurmbach Liver dataset in Oncomine confirmed that RNF44 and HEATR1 expression differed significantly between HCC and normal liver tissue (Fig. 6F and G), based on gene rank percentile, fold change and p-value. The target transcription factors and miRNAs are shown in Figure 7A and B, respectively. SNHG4-related miRNAs were (ATAACCT) mir-154 [enrichment score (ES): 0.684; normalized enrichment score (NES): 1.834], (ACATTCC) mir-1, mir-206 (ES: 0.595; NES: 1.832) and (ATAGGAA) mir-202 (ES: 0.659; NES: 1.831). SNHG4-related transcription factors included the E2F transcription factor (E2F) family, represented by E2F1DP1_01 (ES: 0.622; NES: 1.896), E2F1DP2_01 (ES: 0.622; NES: 1.896) and E2F4DP2_01 (ES: 0.622; NES: 1.896). Consistent with these findings, the transcription factors related to SNHG4 that were predicted using TRAP Web Tools also included E2F family members.
(See Supplementary Table 1, in which these transcription factors are ranked from highest to lowest according to corrected p-value.)
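The bracketed sequences before the miRNA names (for example, ACATTCC for mir-1 and mir-206) appear to be 3′UTR motifs complementary to the miRNA seed region, as in MSigDB's legacy miRNA-target gene sets. The sketch below assumes the common 7mer-m8 definition, a site that pairs with miRNA nucleotides 2–8. Other site types, such as 7mer-A1, also exist. Under this definition the motif can be derived from a mature miRNA sequence; the function name is hypothetical.

```python
# Complement map for RNA bases; reversing the complemented seed gives the
# target site as read 5'->3' on the mRNA.
_RNA_COMPLEMENT = str.maketrans("ACGU", "UGCA")


def seed_site_7mer_m8(mature_mirna):
    """DNA 7-mer in a target 3'UTR that pairs with miRNA nucleotides 2-8."""
    rna = mature_mirna.upper().replace("T", "U")
    seed = rna[1:8]                               # nucleotides 2-8 (1-based)
    site = seed.translate(_RNA_COMPLEMENT)[::-1]  # reverse complement
    return site.replace("U", "T")                 # report as DNA letters


# hsa-miR-1-3p mature sequence (miRBase); miR-206 shares the same seed
print(seed_site_7mer_m8("UGGAAUGUAAAGAAGUAUGUAU"))  # ACATTCC
```

Applied to hsa-miR-1-3p, the sketch recovers the ACATTCC motif listed above.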
RBPs related to SNHG4
A total of 65 SNHG4-binding proteins were downloaded from CirclncRNAnet and 111 from ENCORI. U2AF2 and BUD13 were present in both databases and expressed in HepG2 cells. Moreover, both U2AF2 and BUD13 had prognostic significance according to the Kaplan–Meier plotter (Fig. 8A and C) and the HPA (Fig. 8B and D). Low expression of either protein was associated with a better prognosis, and U2AF2 correlated more strongly with SNHG4 than BUD13 did (Fig. 8E and F).
Discussion
HCC is one of the main causes of cancer mortality worldwide, with high morbidity and mortality rates. HCC accounts for about 80–90% of liver cancer cases and results in an annual death toll of 600,000 globally.22 Therefore, identifying novel prognostic biomarkers is essential for improving current HCC treatment. LncRNAs, which have recently attracted much attention, are dysregulated in a wide spectrum of cancers and have potential roles in tumorigenesis and tumor progression. Increasing evidence has demonstrated that snoRNAs also play an important role in tumorigenesis.23 Moreover, SNHG4, a snoRNA host gene, has been reported to promote tumor growth and to be a poor prognostic factor in patients with osteosarcoma.8
The role of SNHGs in the development of various cancers has been documented. For instance, SNHG6 has been reported to promote cell proliferation in gastric cancer and tumor progression in HCC.24 Additionally, knockdown of SNHG1 inhibited osteosarcoma cell growth and metastasis in vitro and in vivo.6 Several SNHGs, including SNHG1, SNHG3, SNHG4, SNHG5, SNHG6, SNHG7, SNHG10, SNHG11 and SNHG12, were found to be upregulated in HCC. Increased SNHG4 expression was also associated with shorter OS (HR: 1.319, 95% CI: 1.131–1.537, p<0.001).9
Jiao et al.25 reported significant correlations between SNHG4 expression in liver cancer and histological type, histologic grade, stage, T classification and survival status; however, only GSEA was performed in that study. The present study performed a comprehensive bioinformatics analysis of SNHG4 using TCGA datasets. It found that SNHG4 was significantly upregulated in HCC and that high SNHG4 expression was associated with poor OS and RFS. Functional and pathway enrichment analyses indicated that SNHG4 is mainly involved in cancer-related signaling pathways, potentially facilitating tumorigenesis and progression in HCC. KEGG pathway analysis suggested that SNHG4 may be involved in RNA transport, ribosome biogenesis, the cell cycle, the mRNA surveillance pathway and aminoacyl-tRNA biosynthesis. Among the genes interacting with SNHG4, RNF44 and HEATR1 showed the most significant correlations. Both have previously been reported to be involved in DNA repair, regulation of rRNA synthesis, RNA stability and translation initiation.26,27 KEGG and Wikipathway cancer enrichment analyses of SNHG4-related genes in WebGestalt showed significant enrichment of ribosome biogenesis in eukaryotes and the RNA processing and surveillance pathway. These results are consistent with the KEGG pathway analysis of SNHG4 performed in R.
MiRNAs are short ncRNAs of ∼22 nucleotides that act as important regulators of gene expression. They specifically bind mRNAs and either cleave them or inhibit their translation.28 SNHG4 has been reported to promote tumor growth in patients with osteosarcoma. miRNA-154 is downregulated in liver tissue and, when overexpressed, can inhibit tumorigenesis and the G1/S transition in cancer cells.29 miRNA-206 is downregulated in human HCC tissues, and increased miR-206 expression in HCC cell lines has been reported to reduce cell viability, migration and invasion and to increase apoptosis.30 E2F1 plays a crucial role in the control of the cell cycle and the action of tumor suppressor proteins. It acts as a transcriptional activator of E2F target genes during late G1/S-phase progression of the cell cycle, binding to response elements in their promoters.31 Studies have shown that SNHG3 can be transcriptionally activated by E2F1. Choi et al.32 reported that E2F1 inhibits the life cycle of hepatitis B virus, a cause of HCC. It does so by overcoming the effect of the virus-encoded HBx protein on the host p53 promoter, which E2F1 activates via its E2F1 binding site. Moreover, Palaiologou et al.33 reported that E2F1 is overexpressed in human HCC and has a pro-apoptotic role.
It has long been established that alternative splicing contributes to tumorigenesis by producing splice isoforms. These isoforms can stimulate cell proliferation and migration or induce resistance to apoptosis and anticancer agents.34 Among the RBPs interacting with SNHG4, low expression of BUD13 and U2AF2 was associated with a good prognosis in patients with HCC. U2AF2 expression also correlated well with SNHG4 (r=0.6745, p<0.001). U2AF, composed of a large and a small subunit, is a non-snRNP (small nuclear ribonucleoprotein particle) protein required for the binding of U2 snRNP to the pre-mRNA branch site. Pre-mRNA splicing is a crucial step in eukaryotic gene expression. Splice-site recognition is initiated by U2AF2, which binds the polypyrimidine tract (Py-tract) upstream of exons to assemble the spliceosome. Li et al. found that U2AF2 was related to activation of the AKT/mTOR pathway in non-small cell lung cancer.35 Similarly, our GSEA results suggest that SNHG4-related pathways include MAPK/ERK and mTOR. The MAPK/ERK gene set had markedly more gene hits before the peak of the running enrichment score than the mTOR set (Fig. 5A), and these hits contribute most to the enrichment score of the gene set. This suggests that SNHG4 might promote tumor progression by regulating the MAPK/ERK pathway. Both MAPK/ERK and mTOR have been reported to be major signaling pathways in the genesis and development of HCC.36 Given that numerous studies have found that SNHG4 plays a crucial role in carcinogenesis,8,9 SNHG4 may contribute to the progression of HCC and influence prognosis by regulating the MAPK/ERK or mTOR pathway.
SNHG4 may provide a basis for future drug treatment or gene therapy in HCC. The present findings suggest that SNHG4 may play a critical role in hepatocarcinogenesis by influencing several steps of gene expression, from RNA splicing and ribosome biogenesis to mRNA processing and surveillance. This study has several limitations. First, it lacks functional validation at the experimental level, which we plan to perform in the future. Second, HCC has many different causes, which may lead to different pathogeneses, and we did not explore the role of SNHG4 in HCC of different etiologies. However, based on medical history and HCC risk factors, patients in our study could not be simply subclassified into distinct groups, such as virus-induced and alcohol-related HCC. For most patients, HCC was of mixed etiology, as is often the case in clinical practice. To our knowledge, this limitation applies to any study of HCC using data from the TCGA database. Finally, the findings of the present study are preliminary and purely descriptive.