Introduction
Gastric cancer (GC) is the sixth most common cancer in the world, with more than 70% of cases occurring in the developing world. GC is the third leading cause of cancer deaths worldwide (source: WHO, 2018). More than 50% of cases occur in Eastern Asia. In Asia, GC is the third most common cancer after breast and lung and is the second most common cause of cancer death after lung cancer.1
The seroprevalence of Helicobacter pylori is closely related to the incidence of GC.2–4 In recent years, other bacteria have been proposed as risk factors for GC, including Propionibacterium acnes and Prevotella copri,5 Fusobacterium nucleatum6,7 and Leptotrichia wadei.8 Prevotella melaninogenica, Streptococcus anginosus and P. acnes have been reported to be enriched in the tumoral microhabitat.9 The centrality of Peptostreptococcus stomatis, S. anginosus, Parvimonas micra, Slackia exigua and Dialister pneumosintes in GC tissue has also been reported.10 Furthermore, P. acnes has been associated with lymphocytic gastritis.11 The association between periodontal pathogens and GC has also been investigated; so far, it has been answered negatively for the gastric microbiome12,13 but positively for the oral microbiome.14
Because a number of these studies have made their raw microbiome sequence reads available, the GC microbiome can be revisited with a uniform bioinformatics approach. This makes it possible to reach a consensus on additional species possibly involved in GC and to explore therapeutic options beyond H. pylori eradication therapy.
Materials and methods
We identified a total of 12 eligible datasets from the literature and the NCBI BioProject repository. Dataset SRP080738 was excluded because its paired-end sequences, as submitted, did not match. Dataset SRP224905 was excluded because the sequenced variable regions were not documented. Dataset SRP109017 was excluded because of an extreme amount of non-specific amplification of human DNA. Most of the nine remaining datasets are from China (Table 1). Publications are available for the following projects: PRJEB21497,15 PRJEB21104,16 PRJEB22107,17 PRJNA428883,9 and PRJNA495436.18 For comparison, we also included all five colorectal cancer (CRC) mucosa biopsy datasets we had previously analyzed (Supplementary Material, Table S1).
Table 1. Gastric mucosa samples used in this study.
BioProject | SRA | n | 16S region | Study metadata | Region |
---|---|---|---|---|---|
PRJEB21104 | ERP023334 | 93 | V1-V2 | disease progress | UK |
PRJEB21497 | ERP023753 | 34 | V4 | disease progress | Malaysia |
PRJEB22107 | ERP024440 | 30 | V1-V2 | Hp+/−, CagA+/− | Austria |
PRJNA313391 | SRP070925 | 119 | V3-V4 | disease progress | China, Qingdao |
PRJNA428883 | SRP128749 | 669 | V3-V4 | disease location | China, Zhejiang |
PRJNA481413 | SRP154244 | 301 | V4 | anatomic location | China, Nanchang |
PRJNA495436 | SRP165213 | 32 | V3-V4 | pre/post-Hp eradication | China, Nanchang |
PRJNA508819 | SRP172818 | 173 | V3-V4 | disease location | China, Zhejiang |
PRJNA545207 | SRP200169 | 63 | V3-V4 | healthy only | China, Nanchang |
Total | | 1,514 | | | |
Data analysis
Amplicon sequence variants (ASVs) were generated with the R Bioconductor package dada2,19 version 1.12.1, as described previously.20 We used the recommended parameters for quality trimming, discarding of sequences containing ambiguous bases (N), merging of forward and reverse reads, and chimera removal. The ASVs of each dataset were then aligned with mafft, version 6.603b.21 An approximately-maximum-likelihood phylogenetic tree was built from each alignment with FastTreeMP, version 2.1.11.22 Both tools were used with default settings.
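The read-level filtering that precedes ASV inference can be illustrated with a small Python sketch. It mirrors the kind of criteria exposed by dada2's filterAndTrim: truncation at a low quality score, a maximum number of ambiguous bases and a maximum number of expected errors. It is a simplified, hypothetical re-implementation, however. The function names and thresholds are illustrative assumptions, not the parameters used in this study, and read merging and chimera removal are not shown.

```python
def expected_errors(quals):
    """Expected number of errors in a read: the sum of 10^(-Q/10) over its Phred scores."""
    return sum(10 ** (-q / 10) for q in quals)


def filter_read(seq, quals, trunc_q=2, max_n=0, max_ee=2.0, min_len=20):
    """Trim and filter one read; return (seq, quals), or None if the read is discarded.

    Thresholds are illustrative assumptions, not the study's settings.
    """
    # Truncate at the first base whose quality is <= trunc_q.
    for i, q in enumerate(quals):
        if q <= trunc_q:
            seq, quals = seq[:i], quals[:i]
            break
    # Discard reads that became too short after truncation.
    if len(seq) < min_len:
        return None
    # Discard reads with more ambiguous bases (N) than allowed.
    if seq.upper().count("N") > max_n:
        return None
    # Discard reads whose expected number of errors is too high.
    if expected_errors(quals) > max_ee:
        return None
    return seq, quals


read = "ACGT" * 7 + "AC"                          # 30 bases
print(filter_read(read, [38] * 30) is not None)   # True: high-quality read kept
print(filter_read(read, [10] * 30))               # None: about 3 expected errors > 2
```

Expected errors are a stricter criterion than mean quality, because a few very poor bases dominate the sum.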
Taxonomic classification of ASVs was performed by an in-house Python and R program using random forest-based supervised learning on RDP release 11.5. The classifier assigns a species-level or higher-level taxonomic identity to each ASV. The resulting classifications are available as R data objects from the GitHub repository https://github.com/GeneCreek/GC-manuscript.
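The in-house classifier is not described further. As a hedged illustration only, the Python sketch below shows two ingredients that such a classifier could plausibly use:

- overlapping k-mer counts as features of an ASV sequence;
- a fallback that reports the most specific taxonomic rank whose top prediction is confident enough, which is one way to obtain "species or higher level" assignments.

The k-mer length, the 0.8 threshold and the per-rank probability input are all assumptions; the probabilities could, for example, be class probabilities from one random forest per rank. The probability values in the example are made up.

```python
from collections import Counter

RANKS = ["kingdom", "phylum", "class", "order", "family", "genus", "species"]


def kmer_profile(seq, k=8):
    """Count overlapping k-mers of a DNA sequence (k = 8 is an assumption)."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def deepest_confident_rank(rank_probs, threshold=0.8):
    """Return (rank, label) for the most specific confidently predicted rank.

    rank_probs maps a rank name to {label: probability}. Ranks are walked from
    kingdom to species, and the walk stops at the first rank that is missing or
    whose best label falls below `threshold`. Returns None if even the kingdom
    is not confident.
    """
    best = None
    for rank in RANKS:
        probs = rank_probs.get(rank)
        if not probs:
            break
        label, p = max(probs.items(), key=lambda kv: kv[1])
        if p < threshold:
            break
        best = (rank, label)
    return best


# Made-up per-rank probabilities for one ASV.
example = {
    "kingdom": {"Bacteria": 1.0},
    "phylum": {"Fusobacteria": 0.99},
    "class": {"Fusobacteriia": 0.99},
    "order": {"Fusobacteriales": 0.98},
    "family": {"Fusobacteriaceae": 0.97},
    "genus": {"Fusobacterium": 0.95},
    "species": {"nucleatum": 0.60, "periodonticum": 0.40},  # not confident
}
print(deepest_confident_rank(example))  # ('genus', 'Fusobacterium')
```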
UniFrac distances were computed on the raw ASVs using the R Bioconductor package phyloseq, version 1.28.0.23 Further analyses used counts and relative abundances summarized at the species level, based on these taxonomic classifications.
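Unweighted UniFrac, used later for multi-dimensional scaling, is the fraction of observed branch length that leads to ASVs present in only one of the two samples. The plain-Python sketch below computes it for two samples on a small rooted tree. The tree encoding (parent and branch-length dictionaries) and the example tree are hypothetical, and phyloseq's implementation is far more efficient.

```python
def unweighted_unifrac(parent, length, sample_a, sample_b):
    """Unweighted UniFrac distance between two samples.

    parent: dict mapping each node to its parent (the root maps to None)
    length: dict mapping each non-root node to the length of the branch above it
    sample_a, sample_b: sets of tip (ASV) names detected in each sample
    """
    # For every branch, record which samples have a descendant tip below it.
    seen = {}
    for label, tips in (("a", sample_a), ("b", sample_b)):
        for tip in tips:
            node = tip
            while parent.get(node) is not None:
                owners = seen.setdefault(node, set())
                if label in owners:
                    break  # the path to the root is already marked for this sample
                owners.add(label)
                node = parent[node]
    unique = sum(length[n] for n, owners in seen.items() if len(owners) == 1)
    observed = sum(length[n] for n in seen)
    return unique / observed if observed else 0.0


# Hypothetical tree: root R -> X (1.0) -> tips A (1.0), B (2.0); R -> tip C (3.0)
parent = {"R": None, "X": "R", "A": "X", "B": "X", "C": "R"}
length = {"X": 1.0, "A": 1.0, "B": 2.0, "C": 3.0}
print(unweighted_unifrac(parent, length, {"A"}, {"B"}))  # 0.75
```

In the example, the shared branch X counts only in the denominator, which is why the distance is below 1 even though the two samples share no ASV.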
Dirichlet multinomial mixtures were computed with the R Bioconductor package DirichletMultinomial, version 1.26.0,24 using default parameters. The required processing steps are provided at https://github.com/GeneCreek/GC-manuscript/blob/master/scripts/dmm_community_types.Rmd.
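Choosing the number of Dirichlet multinomial components relies on the likelihood of the sample count vectors. The Python sketch below is a generic illustration, not the DirichletMultinomial package's code. It evaluates:

- the Dirichlet-multinomial log-likelihood of a count vector;
- the log-likelihood of a k-component mixture;
- an AIC with p = kS + (k - 1) free parameters for S taxa.

Fitting the parameters and the Laplace approximation are not shown, and the package may count parameters differently.

```python
import math


def dm_log_likelihood(counts, alpha):
    """log P(counts | alpha) for a Dirichlet-multinomial distribution."""
    n = sum(counts)
    a = sum(alpha)
    # Multinomial coefficient: log n! - sum log x_i!
    ll = math.lgamma(n + 1) - sum(math.lgamma(x + 1) for x in counts)
    # Dirichlet normalization terms.
    ll += math.lgamma(a) - math.lgamma(n + a)
    ll += sum(math.lgamma(x + al) - math.lgamma(al) for x, al in zip(counts, alpha))
    return ll


def mixture_log_likelihood(samples, weights, alphas):
    """Sum over samples of log(sum_k w_k * DM(x | alpha_k)), computed via log-sum-exp."""
    total = 0.0
    for x in samples:
        terms = [math.log(w) + dm_log_likelihood(x, al) for w, al in zip(weights, alphas)]
        m = max(terms)
        total += m + math.log(sum(math.exp(t - m) for t in terms))
    return total


def aic(log_lik, k, n_taxa):
    """AIC = 2p - 2 log L, with p = k * n_taxa Dirichlet parameters + (k - 1) weights."""
    p = k * n_taxa + (k - 1)
    return 2 * p - 2 * log_lik


# With alpha = (1, 1) and n = 2, the three possible compositions are equally likely.
print(round(math.exp(dm_log_likelihood([1, 1], [1.0, 1.0])), 6))  # 0.333333
```

A lower AIC indicates a better trade-off between fit and model size, so the selected k is the one that minimizes it.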
Classification prediction was performed with the random forest model provided by the R caret package, version 6.0.84. Variable (taxon) importance was estimated as the mean decrease in node impurity. The multiclass area under the curve (AUC)25 was computed with the R package pROC, version 1.15.3.
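pROC's multiclass.roc is documented as computing the multi-class AUC defined by Hand and Till (2001). That AUC is the average, over all pairs of classes, of a symmetrized pairwise AUC. The sketch below implements that definition in plain Python. Its inputs are a list of true labels and per-sample class probabilities, such as random forest vote fractions; this input format is an assumption.

```python
from itertools import combinations


def pairwise_auc(scores_i, scores_j):
    """P(score of a class-i sample > score of a class-j sample); ties count 1/2."""
    wins = 0.0
    for si in scores_i:
        for sj in scores_j:
            wins += 1.0 if si > sj else 0.5 if si == sj else 0.0
    return wins / (len(scores_i) * len(scores_j))


def hand_till_auc(labels, probs):
    """Multiclass AUC of Hand and Till (2001).

    labels: list of true class labels
    probs: list of dicts mapping class -> predicted probability
    """
    classes = sorted(set(labels))
    aucs = []
    for ci, cj in combinations(classes, 2):
        in_i = [p for y, p in zip(labels, probs) if y == ci]
        in_j = [p for y, p in zip(labels, probs) if y == cj]
        # A(i|j) ranks both groups by P(class i); A(j|i) ranks them by P(class j).
        a_ij = pairwise_auc([p[ci] for p in in_i], [p[ci] for p in in_j])
        a_ji = pairwise_auc([p[cj] for p in in_j], [p[cj] for p in in_i])
        aucs.append((a_ij + a_ji) / 2)
    return sum(aucs) / len(aucs)


labels = ["healthy", "gastritis", "cancer"]
onehot = [{"healthy": 1.0, "gastritis": 0.0, "cancer": 0.0},
          {"healthy": 0.0, "gastritis": 1.0, "cancer": 0.0},
          {"healthy": 0.0, "gastritis": 0.0, "cancer": 1.0}]
print(hand_till_auc(labels, onehot))  # 1.0
```

Uninformative, constant probabilities give 0.5, and perfect separation of every class pair gives 1.0.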
Ecological networks were inferred by sparse inverse covariance estimation with SPIEC-EASI,26 as implemented in the R package SpiecEasi, version 1.0.7, using default parameters.
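SPIEC-EASI works in two steps:

1. It applies a centered log-ratio (clr) transform to the counts.
2. It estimates a sparse inverse covariance (precision) matrix by neighborhood selection or the graphical lasso, and links species where the estimated entry is non-zero.

The Python sketch below shows only the clr transform and how edges and their signs can be read off an already estimated precision matrix, via the partial correlation -Ω_ij/√(Ω_ii Ω_jj). The sparse estimation itself is not reproduced. The pseudocount of 1 and the example matrix are assumptions.

```python
import math


def clr(counts, pseudocount=1.0):
    """Centered log-ratio transform of one sample's counts (pseudocount assumed)."""
    logs = [math.log(c + pseudocount) for c in counts]
    mean = sum(logs) / len(logs)
    return [v - mean for v in logs]


def edges_from_precision(precision, names, tol=1e-8):
    """Edges (name_i, name_j, partial correlation) for non-zero off-diagonal entries.

    The partial correlation is -Omega_ij / sqrt(Omega_ii * Omega_jj), so a
    negative precision entry corresponds to a positive association.
    """
    edges = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            w = precision[i][j]
            if abs(w) > tol:
                r = -w / math.sqrt(precision[i][i] * precision[j][j])
                edges.append((names[i], names[j], r))
    return edges


# Hypothetical precision matrix for three taxa; C is not connected.
omega = [[2.0, -1.0, 0.0],
         [-1.0, 2.0, 0.0],
         [0.0, 0.0, 1.0]]
print(edges_from_precision(omega, ["A", "B", "C"]))  # [('A', 'B', 0.5)]
```

The sign flip between precision entries and partial correlations matters when edges are read as cooperative (positive) or competitive (negative).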
A species was considered nitrosating if at least one non-redundant genome of that species carries a UniProt-annotated nitrate reductase alpha subunit gene (narG).27
Prevalence differences across disease progress, disease state and H. pylori eradication state were tested with Pearson's χ2 test, as implemented in the chisq.test function of the R stats package, with p-values computed by Monte Carlo simulation.28
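With simulate.p.value = TRUE, R's chisq.test draws tables with fixed row and column margins. It reports (1 + number of simulated statistics ≥ observed)/(B + 1), where R's default is B = 2000. The Python sketch below follows the same idea by permuting presence labels across samples, which also preserves both margins. B = 999 is an illustrative choice, since the number of replicates used in the study is not stated. The small tolerance for floating-point ties is similar in spirit to R's, but not identical.

```python
import random


def pearson_chi2(table):
    """Pearson chi-square statistic for a contingency table given as a list of rows."""
    row_tot = [sum(r) for r in table]
    col_tot = [sum(c) for c in zip(*table)]
    n = sum(row_tot)
    stat = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            exp = row_tot[i] * col_tot[j] / n
            if exp > 0:  # skip empty margins rather than divide by zero
                stat += (obs - exp) ** 2 / exp
    return stat


def presence_table(groups, present, levels):
    """One row per group level: [samples with the species, samples without it]."""
    return [[sum(1 for g, p in zip(groups, present) if g == lv and p),
             sum(1 for g, p in zip(groups, present) if g == lv and not p)]
            for lv in levels]


def mc_chi2_pvalue(groups, present, n_sim=999, seed=0):
    """Monte Carlo p-value: (1 + #simulated statistics >= observed) / (n_sim + 1).

    Shuffling the presence labels across samples keeps both table margins fixed.
    """
    levels = sorted(set(groups))
    observed = pearson_chi2(presence_table(groups, present, levels))
    rng = random.Random(seed)
    shuffled = list(present)
    hits = 0
    for _ in range(n_sim):
        rng.shuffle(shuffled)
        stat = pearson_chi2(presence_table(groups, shuffled, levels))
        if stat >= observed * (1 - 1e-12):  # tolerance for floating-point ties
            hits += 1
    return (1 + hits) / (n_sim + 1)


# H. pylori counts from Table 4: present in 17/17 samples before and 2/15 after therapy.
groups = ["pre"] * 17 + ["post"] * 15
present = [True] * 17 + [True] * 2 + [False] * 13
print(mc_chi2_pvalue(groups, present))
```

With 999 replicates, the smallest attainable p-value is 1/1000 = 0.001.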
To guide the composition of probiotics, co-exclusion and co-occurrence between species were assessed by χ2 testing of the detectable presence of species. The samples (n = 17,844) came from a set of 30 clinical and crowd-sourced 16S studies, all sequenced on the Illumina platform (Table 1 and Supplementary Material, Table S1).
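The direction of a pairwise association follows from the 2 × 2 presence/absence table that is also used for the χ2 test. Two species co-exclude when they are found together less often than expected under independence, and co-occur when they are found together more often. A minimal Python sketch, assuming one set of detected species per sample; the function name and the example data are hypothetical. Significance would be assessed from the χ2 statistic, for instance with Monte Carlo p-values as described for the prevalence analyses.

```python
def pearson_chi2(table):
    """Pearson chi-square statistic for a contingency table given as a list of rows."""
    row_tot = [sum(r) for r in table]
    col_tot = [sum(c) for c in zip(*table)]
    n = sum(row_tot)
    stat = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            exp = row_tot[i] * col_tot[j] / n
            if exp > 0:
                stat += (obs - exp) ** 2 / exp
    return stat


def pair_association(sample_sets, sp1, sp2):
    """Classify a species pair and return (direction, chi2).

    The direction compares the number of samples containing both species with
    the number expected if their presences were independent.
    """
    n = len(sample_sets)
    both = sum(1 for s in sample_sets if sp1 in s and sp2 in s)
    only1 = sum(1 for s in sample_sets if sp1 in s and sp2 not in s)
    only2 = sum(1 for s in sample_sets if sp2 in s and sp1 not in s)
    neither = n - both - only1 - only2
    expected_both = (both + only1) * (both + only2) / n
    if both > expected_both:
        direction = "co-occurrence"
    elif both < expected_both:
        direction = "co-exclusion"
    else:
        direction = "independent"
    return direction, pearson_chi2([[both, only1], [only2, neither]])


# Hypothetical data: a candidate probiotic X and a pathogen Y never seen together.
samples = [{"X"}] * 5 + [{"Y"}] * 5 + [set()] * 5
print(pair_association(samples, "X", "Y")[0])  # co-exclusion
```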
A full-stack analysis script for dataset SRP128749 is provided on https://github.com/GeneCreek/GC-manuscript as a detailed processing example.
Results
Pathogens in gastric mucosa
Among the species with the highest prevalence in the gastric mucosa of healthy individuals (n = 85), we found a substantial number of opportunistic pathogens, most of them known periodontal pathogens. Figure 1 depicts the prevalence and relative abundance distributions of the top 20 periodontal and other pathogens. The prominent position of H. pylori is no surprise, but the 60% prevalence of the skin pathogen P. acnes (recently renamed Cutibacterium acnes) was unexpected. The position of F. nucleatum, a known CRC-associated pathogen, among the top four pathogens is also remarkable. We found 17 distinct ASVs assigned to P. acnes and 53 distinct ASVs assigned to F. nucleatum in this dataset.
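Prevalence here is the fraction of samples in which a species is detected, and relative abundance is its share of a sample's reads. The Python sketch below computes both summaries, assuming species-level count tables stored as one dictionary per sample. It averages relative abundance over all samples; averaging only over samples where the species is detected would be an equally plausible choice, so this is an assumption.

```python
def prevalence_and_abundance(samples):
    """Per-species prevalence and mean relative abundance.

    samples: list of dicts mapping species -> read count, one dict per sample.
    Mean relative abundance is averaged over all samples (an assumption).
    """
    n = len(samples)
    totals = [sum(s.values()) for s in samples]
    result = {}
    for sp in sorted(set().union(*samples)):
        counts = [s.get(sp, 0) for s in samples]
        prevalence = sum(1 for c in counts if c > 0) / n
        mean_rel = sum(c / t for c, t in zip(counts, totals) if t > 0) / n
        result[sp] = (prevalence, mean_rel)
    return result


samples = [{"A": 3, "B": 1}, {"A": 0, "B": 4}]  # hypothetical counts
print(prevalence_and_abundance(samples))  # {'A': (0.5, 0.375), 'B': (1.0, 0.625)}
```

A species can be highly prevalent yet low in abundance, which is the pattern discussed for P. acnes.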
Gastric mucosa community analysis
We applied unsupervised clustering to investigate the community structure of the gastric mucosa microbiota, irrespective of sample disease status. In brief, using Dirichlet multinomial mixtures, we obtained the optimal goodness of fit at k = 5 communities according to both the Laplace and the Akaike information criteria (Supplementary Material, Fig. S1). After assigning a community type to each sample accordingly, we retrieved the 100 most important species. Each of these species was assigned to the community type to which it contributed most. Putative interactions between these species were retrieved from the SPIEC-EASI ecological network, which was inferred on all 1,514 samples independently of the community structure. Figure S2 in the Supplementary Material depicts the correspondence between species community types and the correlation network.
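Assigning items to community types by maximum contribution amounts to an arg-max over community types. In the Python sketch below, a species' contribution to a type is assumed to be its share of that component's Dirichlet parameters (alpha), which equals its expected relative abundance in the component. The species names and alpha values are invented for illustration. The same arg-max can assign samples to types, using their posterior component probabilities instead.

```python
def contributions(alphas, species):
    """Per-species share of each component's Dirichlet parameters.

    alphas: list of per-component alpha vectors (one value per species).
    Returns dict species -> list of contributions, one per component.
    """
    shares = [[a / sum(vec) for a in vec] for vec in alphas]
    return {sp: [comp[i] for comp in shares] for i, sp in enumerate(species)}


def assign_by_max(scores, labels):
    """Map each item to the label of its highest-scoring community type.

    Ties go to the first community type.
    """
    return {item: labels[max(range(len(s)), key=s.__getitem__)]
            for item, s in scores.items()}


species = ["sp_A", "sp_B", "sp_C"]      # hypothetical species
alphas = [[8.0, 0.5, 0.5],              # hypothetical component 1
          [0.5, 3.0, 2.0]]              # hypothetical component 2
print(assign_by_max(contributions(alphas, species), ["dmm 1", "dmm 2"]))
# {'sp_A': 'dmm 1', 'sp_B': 'dmm 2', 'sp_C': 'dmm 2'}
```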
In community types one and two, the dominant species was H. pylori, with relative abundances exceeding 50% (Supplementary Material, Fig. S3). Community type two had the lowest phylogenetic diversity of all community types (Supplementary Material, Fig. S4). Community type four received the majority of periodontal pathogens, whereas community types three and four harbored the largest numbers of nitrosating species (Table 2).
Table 2. Distribution of periodontal and other pathogens and nitrosating bacteria over community types
Community type | Periodontal pathogens | Other pathogens | Nitrosating species |
---|---|---|---|
dmm 1 | 3 | | 2 |
dmm 2 | | 1 | |
dmm 3 | | 3 | 9 |
dmm 4 | 20 | 5 | 8 |
dmm 5 | | 2 | 1 |
Anatomical locations
Dataset SRP154244 contains samples from different anatomical gastric locations in patients with gastritis, intestinal metaplasia, and GC. Using random forest models and ecological networks, we investigated whether microbial signatures cluster by gastric location (Supplementary Material, Table S5 and Fig. S5). We observed segregation between interacting antral curvature species on the one hand and corpus/antrum species on the other. However, differences in anatomical location alone do not appear to explain the distribution of datasets over the community types.
Disease progress
Dataset SRP070925 contains gastric mucosa samples (n = 119) from patients with gastritis, intestinal metaplasia, early GC and advanced GC. We combined this dataset with dataset SRP200169, which contains gastric mucosa samples (n = 63) from healthy subjects. Both come from Chinese cohorts and were sequenced on the Illumina MiSeq, targeting the combined 16S V3-V4 variable regions. Multi-dimensional scaling on unweighted UniFrac distances showed that the disease stages were well separated (Supplementary Material, Fig. S6).
We trained random forest models of disease progress status on two-thirds of the combined dataset and evaluated them on the remaining third, using relative abundances summarized at the species level as input. Table S6, Supplementary Material provides the classification results. Metaplasia samples were confused with gastritis and early cancer samples, whereas advanced cancer samples were in part classified as early cancer. Healthy, gastritis and early cancer samples were well classified, resulting in an overall multi-class AUC of 0.936.
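The text does not state whether the two-thirds/one-third split was stratified by class. caret's createDataPartition, for example, samples within each outcome level. The Python sketch below shows such a stratified split, under that assumption. The function name, seed and class sizes are illustrative.

```python
import random
from collections import defaultdict


def stratified_split(labels, train_frac=2 / 3, seed=1):
    """Split sample indices into training and test sets within each class.

    Returns (train_indices, test_indices), both sorted.
    """
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    rng = random.Random(seed)
    train, test = [], []
    for idx in by_class.values():
        rng.shuffle(idx)
        n_train = round(len(idx) * train_frac)
        train.extend(idx[:n_train])
        test.extend(idx[n_train:])
    return sorted(train), sorted(test)


labels = ["healthy"] * 63 + ["gastritis"] * 30   # hypothetical class sizes
train, test = stratified_split(labels)
print(len(train), len(test))  # 62 31
```

Stratification keeps small classes, such as metaplasia, represented in both the training and the evaluation sets.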
Sample disease location
Dataset SRP128749 contains gastric mucosa samples (n = 669) from GC patients, comprising triplets of tumor, peripheral and normal samples. We again added biopsies from healthy subjects (dataset SRP200169) to this cohort, to test the assumption that GC normal samples reflect entirely healthy tissue. Multi-dimensional scaling on unweighted UniFrac distances showed separation of the disease locations (Supplementary Material, Fig. S9). We performed two supervised learning experiments. The first used the combined dataset with a two-thirds training, one-third evaluation split. The second used an additional dataset, SRP172818 (n = 173), which also contains triplets, as an independent validation set. All three datasets come from Chinese cohorts and were sequenced on the Illumina MiSeq, targeting the combined 16S V3-V4 variable regions.
Table S7, Supplementary Material provides the classification results on the combined SRP128749 and SRP200169 dataset. The model reached a multi-class AUC of 0.842, and only one normal sample was misclassified as healthy. Performance increased to an AUC of 0.906 when the model was trained on the whole combined dataset and validated on the independent SRP172818 dataset (Supplementary Material, Table S8). None of the GC normal samples were misclassified as samples from healthy donors.
Species relevant in GC
Four datasets had the metadata required to associate species with tumor status, from either a disease progress or a disease location standpoint. In brief, we processed these datasets individually and retrieved the top 50 differentiating species from random forest models, each trained on the whole dataset. We then generated ecological networks from these top species, retaining only connected nodes for display.
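The selection and display steps can be sketched in Python as follows. The sketch assumes a dictionary of random forest importances (for instance, mean decrease in impurity) and an edge list from the network inferred on the selected species. Both inputs, their values and the function names are hypothetical.

```python
def top_species(importance, n=50):
    """The n species with the highest importance (ties keep their input order)."""
    return sorted(importance, key=importance.get, reverse=True)[:n]


def connected_subnetwork(edges, keep):
    """Restrict edges to species in `keep` and drop nodes left without any edge."""
    sub = [(a, b) for a, b in edges if a in keep and b in keep]
    nodes = sorted({node for edge in sub for node in edge})
    return nodes, sub


importance = {"sp1": 0.9, "sp2": 0.7, "sp3": 0.4, "sp4": 0.1}  # hypothetical
edges = [("sp1", "sp2"), ("sp2", "sp4")]                        # hypothetical
print(connected_subnetwork(edges, set(top_species(importance, 3))))
# (['sp1', 'sp2'], [('sp1', 'sp2')])
```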
Figure 2 shows the putative interaction network of the disease location datasets SRP172818 and SRP128749. It reveals reproducible tumor association of, and possible interaction between, the oral species F. nucleatum, P. micra, P. stomatis and Catonella morbi. The positive correlations suggest that these interactions are cooperative. Figures S10 and S11, Supplementary Material provide the same analysis for the disease progress datasets SRP070925 and ERP023334, respectively. In the former, we found P. melaninogenica associated with advanced cancer status; in the latter, F. nucleatum was associated with cancer status.
Prevalence differences
An alternative view of the species differentiating between disease states, based on χ2 testing of prevalence differences, is presented in Tables S9–S13, Supplementary Material. P. acnes was reproducibly found in over 61% of GC tumor samples, P. stomatis in over 54%, P. micra in over 37% and F. nucleatum in over 35%. For all four species, prevalence in tumor samples was roughly double the baseline prevalence in normal samples (Supplementary Material, Tables S9 and S10).
Comparison with CRC
We tested five previously analyzed CRC datasets for the presence and interactions of F. nucleatum, P. micra and P. stomatis. All five datasets have been published: SRP117763 (n = 34, tumor only),29 SRP137015 (n = 211, tumor/peripheral/normal),30,31 SRP076561 (n = 50, tumor/normal),32 ERP005534 (n = 96, tumor/normal)33 and SRP064975 (n = 98, tumor/peripheral/normal).34 We found F. nucleatum interacting with P. stomatis in SRP137015, and P. micra interacting with P. stomatis in SRP117763 and SRP076561 (Supplementary Material, Fig. S12). The prevalence of F. nucleatum in tumor samples was 70% or more in SRP117763 (Supplementary Material, Table S14), 48% in SRP137015 (Supplementary Material, Table S15) and 73% in SRP076561 (Supplementary Material, Table S16). Among the most abundant cancer-associated species in GC and CRC, F. nucleatum, P. micra and P. stomatis were the ones shared by both cancer types (Table 3).
Table 3. Comparison of GC- and CRC-tumor-associated species
Species | GC | CRC |
---|---|---|
Bacteroides fragilis | | 2 |
Bacteroides ovatus | | 3 |
Brevundimonas vesicularis | 2 | |
Escherichia coli | | 2 |
Fusobacterium nucleatum | 3 | 3 |
Gemella morbillorum | | 3 |
Parvimonas micra | 2 | 3 |
Peptostreptococcus stomatis | 2 | 2 |
Prevotella intermedia | | 2 |
Propionibacterium acnes | 2 | |
Eradication therapy
Dataset SRP165213 contains mucosa samples collected before and after bismuth quadruple H. pylori eradication therapy. Using χ2 testing of prevalence differences, we found that several bacteria showed a marked drop in prevalence after therapy, including H. pylori, as expected (Table 4). P. stomatis and P. micra, on the other hand, showed a moderately significant increase in prevalence, whereas the increase for F. nucleatum was not significant (p = 0.46).
Table 4. Pre- and post-eradication therapy prevalence differences, dataset SRP165213
Species | Association | p value | Pre | Post | Total present |
---|---|---|---|---|---|
Helicobacter pylori | pre | 1.0e-03*** | 17/17 (100.0%) | 2/15 (13.3%) | 19 |
Brevundimonas diminuta | pre | 1.0e-03*** | 17/17 (100.0%) | 3/15 (20.0%) | 20 |
Sphingobium yanoikuyae | pre | 1.0e-03** | 13/17 (76.5%) | 2/15 (13.3%) | 15 |
Sphingomonas yabuuchiae | pre | 2.0e-03** | 13/17 (76.5%) | 3/15 (20.0%) | 16 |
Sphingobium xenophagum | pre | 3.0e-03** | 11/17 (64.7%) | 2/15 (13.3%) | 13 |
Propionibacterium acnes | pre | 1.0e+00 | 14/17 (82.4%) | 12/15 (80.0%) | 26 |
Bifidobacterium adolescentis | post | 1.0e-03*** | 2/17 (11.8%) | 13/15 (86.7%) | 15 |
Ruminococcus bromii | post | 1.0e-03*** | 4/17 (23.5%) | 14/15 (93.3%) | 18 |
Dorea longicatena | post | 1.0e-03*** | 1/17 (5.9%) | 11/15 (73.3%) | 12 |
Leptotrichia wadei | post | 2.0e-03** | 0/17 (0.0%) | 7/15 (46.7%) | 7 |
Parvimonas micra | post | 2.8e-02* | 0/17 (0.0%) | 4/15 (26.7%) | 4 |
Peptostreptococcus stomatis | post | 3.0e-02* | 5/17 (29.4%) | 11/15 (73.3%) | 16 |
Fusobacterium nucleatum | post | 4.6e-01 | 5/17 (29.4%) | 7/15 (46.7%) | 12 |
Modulation of the gastric mucosa microbiome
Using prevalence data from 17,844 samples, including the samples used in this study, we searched for species with qualified presumption of safety (QPS) status that show co-exclusion with the panel of species of interest identified above (Fig. 3). Bifidobacterium longum appears to be the most promising QPS species, followed by Streptococcus salivarius. Both are used in probiotic products and are detectable in gastric mucosa samples (see Fig. 2b for B. longum). In the healthy dataset SRP200169, we found 27 ASVs for B. longum but none for S. salivarius, suggesting that the latter may not be a commensal of the healthy stomach.
Discussion
In this study, we revisited public gastric mucosa and CRC datasets, taking advantage of recent advances in amplicon sequence processing35 that enable species-level taxonomic classification.
Limitations
The healthy cohort was processed as a separate batch and came from a different region, so batch and regional effects could not be controlled for in supervised learning. Regional clustering of GC microbiota has been reported previously.36 Our hypothesis that samples from healthy donors are distinct from normal samples of GC patients should therefore be treated with caution. To confirm it, healthy donors need to be recruited from the same population as the GC patients.
Four subspecies of F. nucleatum are known. Our taxonomic classifier does not resolve taxa at the subspecies level, so counts and relative abundances for F. nucleatum may conceal different subspecies. This is all the more likely because multiple subspecies have been isolated from CRC biopsies37 and we detected several tens of distinct ASVs assigned to F. nucleatum.
Low biomass and contamination
P. acnes has been proposed as a possible contaminant in many experiments.38 This is particularly relevant for gastric samples, which have low biomass compared with biopsies from the lower gastrointestinal tract. This does not mean that the bacterium should be discarded altogether, particularly where it shows a significant increase in tumor samples, as in datasets SRP172818 and SRP128749. It may, however, mean that its baseline presence, and hence its status as a gastric mucosa commensal, is overestimated.39 Its position as a prevalent but low-abundance species in healthy subjects lends credence to the contamination hypothesis. However, the number of ASVs assigned to P. acnes suggests that any contamination originates from multiple individuals. The fact that the bacterium never reached high abundance in these experiments suggests that it did not preferentially contaminate low-biomass samples.
H. pylori
In all datasets, we found gastric mucosa samples completely devoid of H. pylori, including normal and peripheral samples, which leaves open the possibility that other pathogens play a role in GC. We did not find H. pylori in any significant interaction. This is unexpected and at odds with findings reported9 for the same dataset, SRP128749. We attribute this discrepancy to our use of a more stringent ecological network inference method.26 On the other hand, H. pylori presence has been reported not to affect microbial community composition.40 Thus, although H. pylori may create oncogenic conditions through interaction with the host, these conditions do not appear to directly benefit or harm other bacteria.
Cohort-specific species
Our results show that the species found in gastric mucosa have a strongly cohort-specific distribution. Nevertheless, within-cohort prediction of sample disease status or location from the microbiome composition performed well (AUCs > 0.8). Despite this diversity, the microbiome composition therefore carries a clear sample status signature.
Nitrosating species
Nitrosating bacteria convert nitrogen compounds in gastric fluid into potentially carcinogenic N-nitroso compounds, which are believed to contribute to GC.41–45 We found that nitrosating bacteria were not uniformly distributed over the gastric mucosa community types. Community type four combines nitrosating species with periodontal pathogens and can be considered the community type with the highest potential GC risk.
Periodontal and CRC pathogens
It has been reported that, among patients with periodontal disease, high levels of colonization by periodontal pathogens are associated with an increased risk of gastric precancerous lesions.13 We found the periodontal pathogens F. nucleatum, P. micra and P. stomatis to be commensal but also associated with tumor status, and in direct interaction in several datasets. These three species were also associated with tumor status in the revisited CRC datasets and correspond to a CRC subtype with a strong immune signature.29 In the CRC datasets, we found some of the same interactions as in GC. Two recent meta-analyses of CRC case-control studies placed F. nucleatum, P. micra and P. stomatis among the top five carcinoma-enriched species.32,46 F. nucleatum and P. stomatis have also been proposed as part of a species panel for the early detection of CRC.33
Virulence
The Gram-negative bacterium F. nucleatum promotes tumor development by inducing inflammation and a host immune response in the CRC microenvironment. Its adhesion to the intestinal epithelium can cause the host to produce inflammatory factors and recruit inflammatory cells, creating an environment that favors tumor growth. Treatment of mice bearing a colon cancer xenograft with the antibiotic metronidazole reduced Fusobacterium load, cancer cell proliferation and overall tumor growth.47 F. nucleatum can induce immune suppression in the gut mucosa, contributing to CRC progression.48 In CRC, F. nucleatum is predicted to produce hydrogen sulfide,30 a metabolite with a dual role, both carcinogenic and anti-inflammatory. Epithelial cells respond to F. nucleatum by activating multiple cell signaling pathways. These pathways lead to collagenase-3 production, increased cell migration, formation of lysosome-related structures and cell survival.49
Furthermore, F. nucleatum infection is predicted to regulate multiple signaling cascades. This could lead to up-regulation of proinflammatory responses and oncogenes, modulation of host immune defense mechanisms, and suppression of the DNA repair system.50 There is no obvious reason why F. nucleatum would not be pathogenic in gastric tissue, given that it is pathogenic in periodontal, respiratory tract, tonsil, appendix, colonic and other tissues.51
The Gram-positive anaerobe P. stomatis has been isolated from a variety of periodontal and endodontic infections, as well as from infections at other body sites.52 The species has been associated with oral squamous cell carcinoma.53 Little is currently known about the specifics of its pathogenicity. The genome of the type strain (DSM 17678) harbors mprF, a gene encoding phosphatidylglycerol lysyltransferase. This enzyme produces lysylphosphatidylglycerol (LPG), a major component of the bacterial membrane with a positive net charge. LPG synthesis contributes to bacterial virulence, as it is involved in resistance against cationic antimicrobial peptides produced by the host's immune system and by competing microorganisms. Unlike other Peptostreptococci, P. stomatis does not produce indole-3-propionic acid or indoleacrylic acid, metabolites that reinforce the intestinal barrier.54
P. micra, previously known as (Pepto)streptococcus micros, is a Gram-positive anaerobe known to be involved in periodontal infections. It has also been isolated from oral squamous cell carcinoma.55 It is a producer of collagenase and exhibits limited elastolytic and hemolytic activity.56 In a mouse CRC model, P. micra elicited increased Th2 and Th17 cells, decreased Th1 cells and increased inflammation.57
The oral cavity as a reservoir
In 6 of 14 cases (43%), identical F. nucleatum strains were recovered from the CRC tissue and saliva of the same patient.58 Furthermore, oral microbiome composition is to a certain extent predictive of CRC disease progress status.59 A similar relationship may be worth exploring for disease progress in GC.
Biofilm formation
F. nucleatum is regarded as a central organism in dental biofilm maturation because of its broad ability to coaggregate with other microorganisms, such as Porphyromonas gingivalis.60 It is considered a bridge bacterium between early and late colonizers in dental plaque.61 The possibility of biofilm formation by H. pylori and by other bacteria in the gastric environment has been raised.62 Our ecological interaction networks suggest that F. nucleatum and other bacteria, but not H. pylori, could indeed engage in gastric mucosa biofilms, and more particularly in GC biofilms.
Antibiotherapy
H. pylori eradication therapy has been shown to have a prophylactic effect against GC.63 First-line therapy consists of a proton pump inhibitor or ranitidine bismuth citrate, combined with two antibiotics among amoxicillin, clarithromycin and metronidazole. In vitro testing has shown that P. stomatis is sensitive to amoxicillin and metronidazole.64 F. nucleatum is sensitive to amoxicillin or amoxicillin/clavulanate combination therapy65 and to metronidazole.47,66 P. micra is sensitive to amoxicillin/clavulanate and metronidazole.67 In vivo sensitivity of these species may differ. In addition, with the oral cavity as a reservoir, periodontal pathogens could recolonize the gastric environment and take advantage of the space cleared by H. pylori eradication, as our data suggest.
Probiotics use
We predicted in silico that several QPS species could be effective against H. pylori and the spectrum of periodontal pathogens discussed above. Our findings are consistent with a report that probiotics including B. longum, Lactobacillus acidophilus and Enterococcus faecalis significantly reduced the abundance of F. nucleatum in CRC surgery patients, by nearly 5-fold, while normalizing dysbiosis.68 In vitro adhesion inhibition of Gram-negative species by B. longum has been reported.69 Besides inhibiting adhesion, Bifidobacteria produce acetate and lactate as well as vitamins, antioxidants, polyphenols and conjugated linoleic acids, which have been proposed to act as a chemical barrier against pathogen proliferation.70 S. salivarius not only inhibits adhesion of pathogens to epithelial cells but also produces bacteriocins.71