Introduction
Lung carcinoma remains one of the most prevalent and is estimated to be a primary cause of cancer mortality globally.1,2 Lung adenocarcinoma constitutes 80–85% of lung cancer incidences, also termed non-small cell lung cancer (NSCLC).3 Radiotherapy is the most important nonsurgical strategy for NSCLC patients and is required for approximately 70% of NSCLC patients during treatment.4 Radiation therapy is effective when surgery is not an alternative in the early stages of NSCLC. However, it often recurs after radiation therapy, and the relapsed NSCLC is commonly re-irradiated in this condition, with poor outcomes; as a result, individuals may acquire radioresistance.5 It develops in certain patients due to tumour heterogeneity concerning cell origin, aetiology, and molecular mechanisms, which is the most critical question to address since it results in radiotherapy failure.6 Furthermore, the critical components involved in radioresistance are still unclear.
Long non-coding RNA (lncRNA) is a non-coding RNA with more than 200 nucleotides involved in the progression of diseases and contributes to various cellular and biological processes.7,8 The upregulation of the lncRNAs actively participates in cancer progression by inducing activation of oncogenes and silencing tumour suppressors.9,10 Recently, several functional lncRNAs have been considered critical factors in the induction of radioresistance responses by modulating the transcriptional or post-transcriptional levels of their target gene expression.11 lncRNAs offer tremendous potential for developing novel cancer prognoses and therapeutic strategies. The co-expression networks showed important roles in cell proliferation in cancer and regulating related genes in cancer.12
The rapid progression in personalized and targeted therapies is paving the way for developing next-generation therapeutics. However, only 15% of the proteins are disease-related and druggable.13 The small compounds targeting the proteins demand a deep binding pocket on their spatial structure. The small molecules are also deemed a potential lead for targeting lncRNA, eventually inhibiting cancer and metastasis.14 Moreover, these small molecules offer advantages in enhanced pharmacokinetic attributes, cost, and patient acceptability. Therefore, the authors focused on three selected plant-based volatile ligands recognized by the Joint FAO/WHO Expert Committee on Food Additives as flavouring agents. The reported bioactivities and predicted pharmacokinetic properties of the ligands piqued our interest in exploring their potential in modulating the dysregulated biological processes driving radioresistance in NSCLC pathogenesis. The bioactivities of the selected ligands and their corresponding Chemical Identification numbers (CID) were noted. Syringaldehyde (CID:8655) and chalcone demonstrated anti-breast cancer activity by inhibiting cell proliferation and metastasis.15 Anti-angiogenic and anti-inflammatory properties and engagement in suppressing reactive oxygen species were reported with vanillyl alcohol (CID:62348).16,17 Currently, antimicrobial drugs are being repurposed for anticancer therapy, such as Ethyl 2-benzylacetoacetate (CID:246929) that is associated with novel antimicrobial and anticancer drug synthesis and related activity.18
Advancements in the high throughput screening of lncRNAs are being explored against cancer, and small molecules targeting these oncogenic lncRNAs are expanding rapidly. Therefore, the study focused on identifying and targeting the lncRNA regulating the clinically significant biomarkers in NSCLC pathogenesis. The study demonstrated the binding potential of the volatile ligands to the lncRNA and the ability to exert enhanced pharmacokinetic effects to inhibit the dysregulated cellular and biological processes driving radioresistance in NSCLC cells.
Methodology
The research focuses on discovering a lncRNA that regulates clinically significant genes linked to radioresistance in NSCLC and exploring the possible interaction between small volatile inhibitors and the identified lncRNA. Figure 1 illustrates the methods employed in this study, while Supplementary Table 1 lists the bioinformatic tools used. The study does not involve any human or animal subjects.
Microarray data analysis
The bioinformatics web services based on dry lab approaches are crucial for processing and interpreting microarray data aiding in identifying novel targets for drug development and personalized medicine.19 The Gene Expression Omnibus (GEO) database was used to retrieve a non-coding RNA profiling microarray dataset (GSE197236) of radioresistant A549 cells, which included mRNAs and lncRNAs. Three radioresistant and three sensitive A549 cells were yielded from the dataset. The microarray used a GPL26963: Agilent-085982 Arraystar human lncRNA V5 microarray platform. The differentially expressed (DE) mRNAs and lncRNAs between the Radioresistant and sensitive A549 cells were determined using the GEO2R tool. The log2FC is a measure used to compare the expression levels of a gene in two different conditions, such as a test and a control group. Differentially expressed genes (DEGs) were achieved by using the Benjamini and Hochberg false discovery rate (FDR) < 0.05 to control false positive rates, with the criteria of a log2FC (log2Fold Change) > 1.20
Identification of hub genes
The DEGs were leveraged to the Search Tool for the Retrieval of Interacting Genes (STRING, https://string-db.org/ ) to assess their interactions with medium confidence > 0.4 and FDR stringency at 5% significance. The constructed network was visualised using Cytoscape (V 3.9.1) to determine key nodes in the network by employing Molecular Complex Detection (MCODE) clustering algorithm (V2.0). The cytoHubba plugin (V 0.1.) was used to identify the top 5 hub genes based on Maximum Clique Centrality (MCC) topological algorithm.21,22
Functional enrichment analysis of hub genes
A gene ontology (GO) functional enrichment analysis for the hub genes was performed using ShinyGO V0.76.1 (http://bioinformatics.sdstate.edu/go/ ) to uncover dysregulated biological processes in radioresistant NSCLC cells. The GO biological processes were evaluated at 5% FDR stringency and restricted to Homo sapiens.23
Clinical significance
The KM plotter database (https://kmplot.com/analysis/ ) assessed the association between the hub gene expression cohorts and the overall survival of NSCLC data (n=719). The yielded KM survival plots to assess the number of samples at risk for various time points of 200 months in subgroups with the expression cohorts of individual hub genes with a Hazard Ratio (HR) to compare the risk of an NSCLC progression between two expression cohorts. An HR > 1 indicates that one group has a higher risk of the event than the others.24
mRNA-lncRNA coexpression analysis
A co-expression network of DE lncRNA-mRNA transcripts was constructed to investigate the underlying functions of the DE mRNAs and the interactions between lncRNAs. Pearson correlation is commonly used to assess the strength and direction of the linear relationship between the expression levels of two set genes. The DE mRNA-lncRNA pairs that co-express each other based on average expression and have a Pearson correlation coefficient > 0.99, representing positive correlation, were determined using the “diffcoexp” package of R (V4.1.2).25 Finally, the lncRNAs having a high correlation to the hub genes were chosen to construct a co-expression network and visualised using Cytoscape.26,27
lncRNA structure prediction and molecular docking
The lncRNA’s secondary structure and folding affinity were assessed using the RNAfold web server (http://rna.tbi.univie.ac.at/cgi-bin/RNAWebSuite/RNAfold.cgi ). Further, the three-dimensional structure of the lncRNA was predicted using the RNAComposer (https://rnacomposer.cs.put.poznan.pl/ ).28 The selected volatile ligands are orally bioavailable and have low molecular weight and drug-likeness with Hydrogen bond acceptors > 2, demonstrating the potential of forming hydrogen bonds with the target was determined using SwissADME (http://www.swissadme.ch/ ). They were subjected to knowledge-based molecular docking with the lncRNA using HADDOCK 2.4 (https://wenmr.science.uu.nl/haddock2.4/ ) by automatically specifying active and passive residues. The minimum relative solvent accessibility was set as the default of 15%. Unlike ab initio docking procedures, which solely take the coordinates of the structures into account, it uses experimental data to guide the molecular docking process. The HADDOCK’s docking technique consists of the following steps: randomising orientations preceded by energy minimisation to avert steric clashes, torsion angle dynamics exploiting torsion angles as degrees of freedom, and tweaking the system using an explicit solvent in Cartesian space.29
Molecular dynamics simulation
The molecular dynamics (MD) simulations provide critical information about lncRNA-ligand complexes’ flexibility and stability. It depicts the interaction between the lncRNA and the ligand, including binding sites and conformational changes during the bound and unbound states.30 The MD simulation steps of the lncRNA-ligand complexes were partially adapted from Junaid et al.31 The MD simulation was carried out using GROMACS (2022.1), in contrast to the ligand-free lncRNA, which served as the control. The AMBER force field generated lncRNA and lncRNA-ligand complex topologies. The complexes were solvated using the TIP3P water model and were neutralised by incorporating Na+ and Cl− ions. Additionally, the complexes were relaxed using the steepest descent energy minimisation technique. The minimised complexes were subjected to the equilibration step, maintaining the constant volume and pressure using the Nose-Hoover algorithm. The complexes were then subjected to the production step at 300 K and 1 atm pressure for 20 nanoseconds (ns) to assess the system’s stability, and the resultant trajectory data were used to demonstrate the Root Mean Square Deviation (RMSD) of the complexes.
Results
The study identified overexpression of a lncRNA that regulated clinically significant genes associated with radioresistance in NSCLC. The study’s findings are explained in the following section and illustrated in Supplementary Figure 1 and Supplementary Table 2.
Differential expression of the genes
The RNA microarray dataset contained six samples with sensitive and Radioresistant A549 cells. 1085 DEGs, of which 589 and 496 DE mRNAs and lncRNAs were identified according to the criteria of log2FC > 1 as upregulated and log2FC < −1 as downregulated, represented with red and blue dots, respectively, with adjusted P value < 0.05. The volcano plots (Fig. 2a–c) represent the differential gene expression of control vs test, DE mRNAs and lncRNAs.
PPI network and topological analysis
The protein-coding genes were leveraged to the STRING database for elucidating their physical and functional relationship, restricted to “Homo sapiens”. For further visualisation and analysis, the network was exported to Cytoscape. The network with significant DEGs consists of 589 nodes (DE mRNAs) and 1036 edges (Supplementary Fig. 2) with a clustering coefficient of 0.23, determined through the Analyzer tool of Cytoscape. MCODE determined 19 clusters of densely interconnected nodes from the network. A cluster with 20 nodes and 61 edges and a score of 6.421, cluster 1 was considered significant compared to the other clusters’ scores, possessing 12 downregulated and 8 upregulated nodes (Supplementary Fig. 3a). The cluster uncovered the top 5 hub genes (CKS1B, CDC25C, IRF1, IFNB1, CDT1) using the MCC algorithm (Supplementary Fig. 3b), which offers better accuracy in assessing the topological relevance of nodes when compared to other centrality algorithms.
Biological processes and functional enrichment
The GO biological processes were determined for the hub genes at a 5% FDR cut-off, and the percentage of the hub genes enriched GO was based on total fold enrichment. Notably, the 20 GO terms uncovered the dysregulation of cancer immune response, cell cycle, and post-regulation of the cell cycle. The count of genes implicated with the determined biological process driving radioresistance is depicted in a chart (Fig. 3).
Survival risk of the hub genes in NSCLC
The KM plotter was employed to determine the overall survival of the hub genes. The survival median and log-rank P-value of the KM plot assessed the overall survival between the high and low expression cohorts between the individual hub genes and the overall NSCLC subgroups at 95% confidence. The KM plots explained that the high expression cohorts drive NSCLC pathogenesis with a Hazard ratio >1 (Supplementary Fig. 4, Supplementary Table 3).
Hub gene-lncRNA co-expression network
The co-expression network construction was employed to investigate the functions of lncRNAs. The Pearson correlation coefficient, a co-expression metric, determined the correlation between the hub genes and lncRNAs based on average expression (Fig. 4). Our investigation revealed that ENST00000605056 lncRNA was upregulated and exhibited the maximum interactions with three highly ranked hub genes (CKS1B, IRF1, and IFNB1), rendering it the most active lncRNA. However, few other lncRNAs (ENST00000608624, ENST00000538804, ENST00000435024, and ENST00000584810) possessed interaction with two hub genes (CDC25C and CDT1). Moreover, 7 lncRNAs (ENST00000504474, ENST00000412492, ENST00000431800, ENST00000448804, ENST00000604818, ENST00000523267, and ENST00000621891) were correlated with IRF1 and IFNB1.
Structure prediction of the lncRNA
The obtained dot-bracket annotations from RNAfold were used to predict the three-dimensional structure of the ENST00000605056 on the RNA COMPOSER webserver (Supplementary Fig. 5). The secondary structure, folding energy, dot-bracket notations and chromosome coordinates were noted (Table 1).
Table 1The predicted secondary structure of the identified lncRNA with dot-bracket notations and its chromosome number and coordinates
Identified lncRNA | 2° Structure of lncRNA and MFE | Dot bracket | Chromosome number and coordinates |
---|
ENST00000605056/AC104695.2 | −5.40 Kcal/mol | | Chr2:28425945-28426719[+] |
lncRNA-ligand docking
The potential binding mode of interaction between lncRNA and the volatile ligands was studied employing knowledge-based RNA-Ligand docking using the HADDOCK webserver. The docking findings resulted in the generation of 5 clusters, each including a wide range of structures that depicted 95.5% of water-refined models. A weighted average of the van der Waals intermolecular energy, electrostatic intermolecular energy, desolvation energy, restraints violation energy, and buried surface area is used to calculate the HADDOCK score. The lower the HADDOCK scores, the stronger the complex’s binding affinity. The number of standard deviations from the mean at which a specific cluster is positioned was represented by a Z-score. Negative Z-scores are thought to illustrate a strong HADDOCK cluster better. Based on the Z score and HADDOCK score, the best cluster for each complex was considered, along with other intermolecular energies and restraints (Supplementary Table 4). The best cluster possessed the top 4 poses for each of the complexes of our study and was visualised with PyMOL2 (V2.5.2), and non-covalent interactions at (<3.5 A°) were considered. The best pose of the 8655-lncRNA complex possessed three hydrogen bonds between the nucleotides (A33, G31, and A30) and the ligand, possessing a HADDOCK score of −21.1 ± 1.7. Similarly, the 62348-lncRNA complex was involved in three hydrogen bonds at G32, A5, and A6 with a score of −17.0 ± 1.7. However, the 246929-lncRNA complex possessed two H bonds at G32 with the highest score of −22.6 ± 2.4. The H bond distance between the ligands and the nucleotides was acceptable. Moreover, our docking analysis uncovered the Van der Waals interactions between the ligands and the lncRNA (Fig. 5, Table 2).
Table 2The HADDOCK score and non-covalent bond interactions (<3.5 A°) of the best-docked complexes of the lncRNA and the volatile ligands
Complexes | HADDOCK Score | H-bonds and distance (A°) between lncRNA-ligand complex | Nucleotides participating in Van der Waals interactions |
---|
lncRNA-8655 | −21.1 ± 1.7 | A33 (H62)-O4 (1.6); G31 (H1)-O3 (2.3); A30 (H62)-O3 (2.7) | A30 |
lncRNA-62348 | −17.0 ± 1.7 | G32(H1)-O3 (1.8); G32(H22)-O3 (2.2); A5(O3)-O2HAA (2.2); A6(O1)-O2HAA (2.4) | A34 and A33 |
lncRNA-246929 | −22.6 ± 2.4 | G32(H22)-02 (1.7); G32(H21)-03 (2.1) | A33 |
Binding stability of the complexes
The MD simulation assessed the binding stability of the best-docked lncRNA-ligand complexes. The ligand-free lncRNA served as the control for the simulation. The RMSD calculations were used to calculate the complex’s deviation during the 20 ns trajectory period with a snapshot of the atomic coordinates every 1 picosecond (ps). In contrast to the control, the average RMSD value of 1.63 nm, 1.7 nm for 8655-lncRNA, 1.86 nm for 62348-lncRNA, and 2.08 nm for 246929-lncRNA was determined, respectively (Fig. 6).
Discussion
The study involved identifying and targeting a lncRNA, dysregulating the multiple biological processes driving the radioresistance in NSCLC cells. Since nucleic acid-based techniques often entail big, generally highly charged molecules and pose delivery challenges, small-molecule therapeutics targeting RNA are frequently favoured.32 The authors investigated the RNA microarray dataset and estimated gene expression based on log2FC among radioresistant and sensitive A549 cells. The experimental investigators of the dataset claim 529 DEGs, enriched to impaired immune response and TNF signalling. However, our statistical analysis revealed the presence of 589 DE mRNAs and 496 DE lncRNAs. The DE mRNAs were further leveraged for the downstream analysis, involving the physical and functional relationship through a static network-based approach.33 A noteworthy cluster was identified based on densely interconnected regions within the network by important genes, their relationships, and their significance in regulating biological processes. The log2FC-based retrieved hub genes explain the dysregulation of the cell cycle and evasion of cancer immune responses.
Radiation resistance is associated with immune escape, involving various established mechanisms for evasion of cancer immune responses, such as cytokine release, silencing T cell activation, and other alterations.6 In our study, IFNB1, which involved autophagy and immunomodulatory function, was downregulated.34 The inherent radiosensitivity of cell subpopulations prevalent in low- and high-radiosensitive subsets has been altered. This distinction relies on hypoxia levels, DNA repair ability, the number of dividing and apoptotic cells, and cell cycle stages. Cell cycle regulation plays a significant part in this process.35 CKS1B, and CDC25C are the highly ranked hub genes identified in our study that are crucial in impaired cell cycle regulation and drug resistance, reflecting our results on their expression levels in radioresistance.36,37 Also, overexpression of CDT1 triggers aberrant cell replication activates DNA damage checkpoints and promotes numerous malignancies.38 Furthermore, IRF1, a tumour suppressor gene, was downregulated, indicating apoptosis evasion and possibly negatively regulated by the Wnt/β-catenin signalling pathway.39
It is critical to select biomarkers that may predict therapeutic response and clinical outcomes using Kaplan-Meier plots.40 The survival risk of expression cohorts in the NSCLC patients was explained for the hub genes for 200 months. The lower median of high expression cohorts of the hub genes (CKS1B, CDC25C, and CDT1) revealed their role in impaired cell cycle regulation, increasing the survival risk in NSCLC patients with a hazard ratio greater than 1, which indicates greater risk.41 The tumour suppressor IRF1 possessed a lower median for the high-expression cohort. However, the study by Xu et al.42 revealed that overexpression of IRF1 hindered the development and enhanced radiosensitivity in cancer cells, probably by modulating interferon-induced proteins, which is parallel to the finding.
This study uncovered a co-expression network to acquire a deeper insight into the possible relationships and related processes of the co-expressed lncRNAs and hub genes. The Pearson correlation coefficient was used to determine the co-expression between the hub genes and DE lncRNAs. The Pearson correlation yields an absolute number between 0 and 1. The correlation’s absolute value, closer to 1, was claimed to be significantly co-expressed by the pair of genes (hub gene-lncRNA).43 The co-expression network uncovered a novel lncRNA (ENST00000605056) regulating three highly ranked hub genes, and the authors claim it to be active and involved in driving radioresistance.12
The lncRNA structure was predicted using the most frequently employed method of thermodynamic models. RNAfold leveraged the well-known Zuker algorithm to efficiently predict an optimal secondary structure with the minimum free energy.44 The dot-bracket notations encoded the topology of the lncRNA secondary structure and aided in exploiting its tertiary structure.28 The predicted three-dimensional structure of the lncRNA opened an avenue to study its molecular interaction with the selected volatile ligands.
The plant volatiles possess low molecular weight and high vapour pressure45 and diverse scaffolds and are deemed potential candidates for lead development.46,47 However, poor pharmacokinetics is frequently a stumbling block during drug development.48 Therefore, the pharmacokinetic properties of the ligands were determined (Supplementary Fig. 6, Supplementary Table 5). The selected ligands satisfied the Rule of Five, indicating favourable drug-like properties and are bioavailable.49 The molecular interaction of the ligands with the lncRNA revealed the preponderance of H bonds, Van der Waals, and pi-stacking, promoting high binding affinity and stability.31 Moreover, the HADDOCK cluster revealed an active site at 5–40 nucleotides, possessing the best poses of the complex (ligand-lncRNA) with high binding affinity.14
The molecular interaction analysis of the complexes was further extended to validate the stability of the complexes and facilitate enhanced pharmacokinetic effects to inhibit the dysregulated lncRNA. The MD simulation assessed the stability of the complexes to understand the structural and conformational changes. The conformations and stability were assessed by comparing the distance between the ligand-bound nucleotide atoms and the lncRNA’s unbound states. Therefore, we employed RMSD calculations to determine the distance between nucleotide atoms. An MD simulation study of the miRNA-ligand complex by Junaid et al.31 revealed the average RMSD > 2 for a trajectory period of 10 ns. However, our findings indicated that the complexes, in contrast to the control, acquired conformational stability in the 20 ns trajectory period. The RMSD of the ligand 8655 and 62346 was <2. The ligand 8655 possessed a similar trajectory of structural deviation compared to the control, demonstrating it to be the potential lead for targeting the dysregulated lncRNA, inducing radioresistance in NSCLC patients.
Conclusion
The study aimed to identify and target the lncRNA in inducing radiation resistance in NSCLC cells using selected small volatile ligands with favourable pharmacokinetic attributes. The co-expression network revealed the ENST00000605056 lncRNA to regulate clinically significant genes CKS1B, IRF1 and IFNB1 involved in NSCLC pathogenesis. The molecular interaction study demonstrated a strong binding association of the selected volatile ligands with the lncRNA with a preponderance of non-covalent bonds, promoting binding stability. Moreover, the MD simulation determined the dynamic behaviour and stability of the complexes, and the lead 8655 possessed lower RMSD than the other selected ligands, indicating it to be the potential lead in targeting dysregulated lncRNA. This study filled the gap in understanding the drugging of a small molecule with lncRNAs triggering radioresistance with an impaired cell cycle regulation, immune response and apoptosis evasion using an in-silico approach. Furthermore, upon experimental confirmation, this strategy would assist in developing small molecules to target specific lncRNAs in the NSCLC aetiology with improved pharmacokinetic attributes.
Supporting information
Supplementary material for this article is available at https://doi.org/10.14218/GE.2023.00005S .
Supplementary Table 1
The bioinformatics tools employed in this study and their principles and applications are listed.
(DOCX)
Supplementary Table 2
The significant results obtained in the study are summarized.
(DOCX)
Supplementary Table 3
The median of the expression cohorts of the hub genes determines the overall survival risk for NSCLC patients.
(DOCX)
Supplementary Table 4
The best clusters obtained from the complexes obtained from the HADDOCK with associated intermolecular interaction energies.
(DOCX)
Supplementary Table 5
The drug-likeness of the selected volatile ligands based on the rule of five assessed using SwissADME.
(DOCX)
Supplementary Fig. 1
The outcome of this study is presented in a flowchart.
(TIF)
Supplementary Fig. 2
The interaction of the DE mRNAs dataset is represented using a PPI network.
(TIF)
Supplementary Fig. 3
(a) The top-scored cluster (11.273) was identified from the PPI network using the MCODE plugin of Cytoscape. (b) The top 5 ranked hub genes were determined using the maximum clique centrality (MCC) technique.
(TIF)
Supplementary Fig. 4
The Kaplan Meier plots of the hub genes represent their clinical significance based on the median survival of NSCLC patients (n = 719) for 200 months.
(TIF)
Supplementary Fig. 5
The tertiary structure of the ENST00000605056 lncRNA was predicted using RNA COMPOSER.
(TIF)
Supplementary Fig. 6
The radar plot represents the predicted pharmacokinetic properties of the selected volatile ligands.
(TIF)
Abbreviations
- CID:
identification numbers
- DE:
differentially expressed
- DEGs:
differentially expressed genes
- FDR:
false discovery rate
- GO:
gene ontology
- HR:
hazard ratio
- lncRNA:
long non-coding RNA
- MD:
molecular dynamics
- NSCLC:
non-small cell lung cancer
- RMSD:
Root Mean Square Deviation
Declarations
Acknowledgement
The authors thank the organizers and the management of CompBio22 for providing the Jetstream2 HPC facility to carry out this study.
Data sharing
The dataset is publicly available at https://www.ncbi.nlm.nih.gov/geo/ (GSE197236).
Funding
This research did not receive any specific grant from funding agencies in the public, commercial, or non-profit sectors.
Conflict of interest
The authors have no conflict of interests to declare.
Authors’ contributions
AM: Conceptualization; Methodology; Software; Data curation; Formal analysis; Investigation; Validation; Visualization; Writing-original draft; Writing-review and editing. MKS: Conceptualization; Methodology; Supervision; Writing-review and editing.