Introduction
Numerous single-nucleotide polymorphisms (SNPs) have been discovered and deposited in public databases (e.g. the National Center for Biotechnology Information [http://www.ncbi.nlm.nih.gov], Ensembl [http://asia.ensembl.org/index.html], and the MEXT Integrated Database Project [http://dbcls.rois.ac.jp]) through international SNP discovery projects such as the Human Genome Project,1 the International HapMap project (http://hapmap.ncbi.nlm.nih.gov/index.html.en), and the 1000 Genomes project (www.1000genomes.org). Together with the development of technologies for large-scale SNP genotyping, genome-wide association studies (GWASs) using hundreds of thousands of SNPs allow the identification of candidate genetic loci for multifactorial diseases. Disease-associated SNPs have also been deposited in public databases, such as the database of Genotypes and Phenotypes (www.ncbi.nlm.nih.gov/gap). Moreover, a number of SNPs have been reported to be associated with complex genetic traits, such as body mass index,2 height,3 and hair thickness.4 In the National Human Genome Research Institute (NHGRI) GWAS catalog (www.genome.gov), more than 8,800 trait- or disease-associated SNPs with genome-wide significance (p<5×10^−8) have been archived from a total of 1,551 published GWASs (through March 2013).5
Here, we describe a GWAS strategy to identify disease-associated SNPs, including SNP genotyping technologies for both the GWAS stage and the following replication stage. Based on this GWAS strategy, we have identified associations of genetic variations with diseases related to hepatitis B and C viruses (HBV and HCV), including drug response in patients with chronic HCV infection,6 susceptibility to primary biliary cirrhosis (PBC),7 and HBV-related hepatocellular carcinoma (HCC).8
Technologies for GWAS and replication analysis
A number of SNP typing methods have been used to analyze a single SNP, or SNPs at multiple sites of a template or templates simultaneously. Most of the methods employ single or multiple site-specific amplifications and a genotyping step based on various types of chemical reactions, including Sanger direct sequencing, 5′ exonuclease fluorescence-based assay (TaqMan),9 pyrosequencing,10 DigiTag2 assay,11 single-base extension,12 and matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (MALDI-TOF).13
Together with technology developments in large-scale SNP genotyping, the most recent versions of commercially available genotyping platforms allow the simultaneous analysis of more than one million SNPs across the whole genome in a single experiment. Two platforms are commercially available and widely used for genome-wide SNP typing: Affymetrix SNP GeneChip arrays14 and Illumina BeadArray genotyping technology.15 The number of SNPs embedded in both platforms has been gradually increasing since 2003, when the first commercial genome-wide SNP genotyping platform was released by Affymetrix.16 The first platform, the Affymetrix GeneChip Mapping 10K Array, included 14,548 SNPs, which enabled whole-genome linkage analyses; it was used in a family-based linkage study to identify a missense mutation in the HOXD10 gene associated with Charcot–Marie–Tooth disease.17 The current versions of the commercial platforms from Affymetrix and Illumina include more than 900,000 SNPs (Genome-Wide Human SNP Array 6.0) and 4.3 million SNPs (HumanOmni5-Quad BeadChip), respectively. A newly released genome-wide SNP typing platform, the Affymetrix Axiom Genome-Wide ASI 1 Array, has a probe set for SNPs (including rare and common variants) that is optimized for Asian populations. These platforms allow researchers to perform GWASs with hundreds of thousands of SNPs and thereby identify candidate genetic loci for multifactorial diseases.
In 2002, the first GWAS using 92,788 gene-based SNPs was reported by a Japanese group (RIKEN), which identified the lymphotoxin-α gene as being associated with susceptibility to myocardial infarction.18 The RIKEN group developed its own platform to perform a GWAS based on the Invader assay19 with multiplex polymerase chain reaction (PCR).20 Since 2002, the number of published GWASs reporting associations with genome-wide significance (p<5×10^−8) has increased annually, reaching 1,551 publications in the NHGRI GWAS catalog (through March 2013).5
For a replication study following a GWAS stage, several candidate genetic regions that have been detected in the initial GWAS need to be analyzed. Suitable platforms for replication analyses have the ability to perform multiplex detections in a single reaction, such as the mini-sequencing (SNaPshot) technique,21 chip-based genotyping by mass spectrometry (Sequenom),22 and the DigiTag2 assay.11 The DigiTag2 assay is our own technology for multiplex SNP typing, and represents a simple and cost-effective approach that combines multiplex PCR, to enrich genetic regions including the target SNPs, with an oligonucleotide ligation assay to determine the genotype of the target locus. For a single-locus analysis, the TaqMan assay is more commonly used than conventional Sanger sequencing to determine the genotype of the target locus when a large number of samples need to be analyzed.
Hepatitis research based on GWAS
In a GWAS, two groups of participants are compared to detect the “association(s)” of certain variants with a particular trait by examining differences in the allele and/or genotype frequencies of SNPs distributed across the entire genome. GWAS enables the effective detection of associated variations in strong linkage disequilibrium with the causal variants and genes, and the following replication analysis and high-density mapping identify the causal variants and genes using an independent set of participants with a larger number of samples. However, associations of SNPs with low minor allele frequency (below 1–5%; known as rare variants) are difficult to detect in a SNP-based GWAS, because limited sample numbers give insufficient statistical power.23 Fig. 1 outlines the GWAS strategy from whole-genome SNP typing to replication analysis.
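The per-SNP comparison of allele frequencies described above can be illustrated with a minimal sketch. It assumes a simple allelic test (Pearson chi-square with one degree of freedom on a 2×2 table of allele counts); the function name and the allele counts are hypothetical and are not taken from any of the studies discussed here, and real analyses typically add quality control and adjustment for population structure.

```python
import math

GENOME_WIDE_ALPHA = 5e-8  # genome-wide significance threshold quoted in the text


def allelic_association(case_alt, case_ref, ctrl_alt, ctrl_ref):
    """Pearson chi-square test (1 df) and odds ratio for a 2x2 allele-count table.

    All four counts must be non-zero in this simplified version.
    """
    n = case_alt + case_ref + ctrl_alt + ctrl_ref
    cross = case_alt * ctrl_ref - case_ref * ctrl_alt
    margins = ((case_alt + case_ref) * (ctrl_alt + ctrl_ref)
               * (case_alt + ctrl_alt) * (case_ref + ctrl_ref))
    chi2 = n * cross ** 2 / margins
    # Upper tail of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    odds_ratio = (case_alt * ctrl_ref) / (case_ref * ctrl_alt)
    return chi2, p_value, odds_ratio


# Hypothetical allele counts (two alleles per participant), for illustration only.
chi2, p_value, odds_ratio = allelic_association(600, 400, 500, 500)
print(f"chi2={chi2:.2f}, p={p_value:.2e}, OR={odds_ratio:.2f}")
print("genome-wide significant:", p_value < GENOME_WIDE_ALPHA)
```

In this hypothetical example the odds ratio is 1.5, yet the p-value does not reach the genome-wide threshold with 500 cases and 500 controls, which is why the replication stage uses a larger, independent sample set.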
The emerging strategy of GWAS has revealed disease-causing alleles, or variants that lead to susceptibility to complex polygenic diseases with small additive or multiplicative effects on the disease phenotype. For example, a recent GWAS and subsequent meta-analyses in populations of European descent identified human leukocyte antigen (HLA) and 21 non-HLA susceptibility loci, most of which are involved in interleukin (IL)-12/IL-12 receptor (IL-12R) signaling, tumor necrosis factor (TNF)/toll-like receptor (TLR)–nuclear factor (NF)-κB signaling, and B-cell differentiation in the development of PBC.24–27 PBC is a chronic cholestatic liver disease characterized by chronic non-suppurative destructive cholangitis of the intrahepatic small bile ducts. A high concordance rate in monozygotic twins and familial clustering of patients with PBC indicate the involvement of strong genetic factors in the development of PBC.28 To identify susceptibility loci for PBC in the Japanese population, we conducted a GWAS and subsequent replication study using a total of 1,327 PBC patients and 1,120 healthy controls.7 In addition to the most significant susceptibility region at HLA, two significant susceptibility loci (TNFSF15 and POU2AF1) with p-values <5×10^−8 were identified (Table 1). Moreover, of the 21 non-HLA susceptibility loci that were identified in populations of European descent, three loci (IKZF3, CD80, and IL7R) showed significant associations and two loci (NFKB1 and STAT4) showed suggestive associations with PBC in the Japanese population. Five other loci (CXCR5, TNFAIP2, MAP3K7IP1, rs6974491, and DENND1B) also showed marginal associations (p<0.05) with PBC in the Japanese population (Table 1). These results indicate that additional important disease pathways (via TNFSF15 and POU2AF1) – differentiation to T-helper 1 (Th1) cells (via TNFSF15, CD80, IL12, IL12R, and STAT4; Fig. 2), B-cell differentiation (via POU2AF1, CXCR5, SPIB, IL7R, and IKZF3), and NF-κB signaling – in addition to the previously reported disease pathways have a role in the development of PBC in Japanese populations.
Table 1. Replication analysis of Japanese samples for SNPs associated with PBC in previous studies, and two newly identified loci (TNFSF15 and POU2AF1)
Gene name | SNP | OR | 95% CI (lower–upper) | P-value |
Significant associations with PBC |
TNFSF15 | rs4979462 | 1.57 | 1.40–1.76 | 1.85×10^−14 |
POU2AF1 | rs4938534 | 1.38 | 1.23–1.55 | 3.27×10^−8 |
IKZF3 | rs9303277 | 1.44 | 1.28–1.63 | 3.66×10^−9 |
CD80 | rs2293370 | 1.48 | 1.29–1.68 | 3.04×10^−9 |
IL7R | rs6890853 | 1.47 | 1.28–1.69 | 3.66×10^−8 |
Suggestive associations with PBC |
NFKB1 | rs7665090 | 1.35 | 1.21–1.52 | 1.42×10^−7 |
STAT4 | rs7574865 | 1.35 | 1.19–1.52 | 1.11×10^−6 |
Marginal associations with PBC |
CXCR5 | rs6421571 | 1.42 | 1.16–1.75 | 0.0004 |
TNFAIP2 | rs8017161 | 1.22 | 1.08–1.38 | 0.0006 |
MAP3K7IP1(TAB1) | rs968451 | 1.29 | 1.10–1.52 | 0.0009 |
rs6974491 | rs2717948 | 1.33 | 1.07–1.66 | 0.005 |
DENND1B | rs12134279 | 1.14 | 0.98–1.33 | 0.0405 |
No apparent associations with PBC |
rs11117432 | rs8062669 | 1.21 | 0.96–1.52 | 0.0521 |
IL12RB2/SCHIP1 | rs3790567 | 1.12 | 0.98–1.28 | 0.0540 |
RPS6KA4 | rs538147 | 1.12 | 0.98–1.28 | 0.0554 |
TNFRSF1A | rs1800693 | 1.12 | 0.97–1.30 | 0.0607 |
CLEC16A | rs12924729 | 1.10 | 0.94–1.28 | 0.1197 |
MMEL1 | rs3748816 | 1.07 | 0.95–1.20 | 0.1256 |
PLCL2 | rs1372072 | 1.07 | 0.95–1.20 | 0.1396 |
SPIB | rs3745516 | 1.08 | 0.92–1.27 | 0.1803 |
IRF5/TNPO3 | rs4728142 | 1.08 | 0.90–1.30 | 0.2027 |
RAD51L1 | rs911263 | 1.07 | 0.89–1.30 | 0.2353 |
IL12A | rs6441286 | 1.02 | 0.91–1.15 | 0.3422 |
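The odds ratio, confidence limits, and P-value in each row of Table 1 are linked through the standard error of the log odds ratio. The following sketch shows that relationship under the assumption of a Wald-type 95% confidence interval and a two-sided normal test; the table does not state which test produced the reported P-values, so the function (a hypothetical helper, not part of the original analysis) should only be expected to agree with them in order of magnitude, especially given the rounding of the tabulated values.

```python
import math

Z_95 = 1.959964  # normal quantile for a two-sided 95% confidence interval


def p_from_or_ci(odds_ratio, ci_low, ci_high):
    """Approximate SE, z-score, and two-sided p-value from an OR and its 95% CI."""
    # A Wald interval is symmetric on the log scale: log(OR) +/- Z_95 * SE.
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * Z_95)
    z = math.log(odds_ratio) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail
    return se, z, p_value


# TNFSF15 rs4979462 from Table 1: OR 1.57, confidence limits 1.40 and 1.76.
se, z, p_value = p_from_or_ci(1.57, 1.40, 1.76)
print(f"SE={se:.4f}, z={z:.2f}, p={p_value:.1e}")
```

For the TNFSF15 row this approximation gives a p-value on the order of 10^−14, consistent in magnitude with the tabulated 1.85×10^−14.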
In another study that aimed to identify host genetic factors related to the response to pegylated interferon-α plus ribavirin treatment in HCV-infected patients, a comparatively small number of samples was analyzed in a GWAS: 142 Japanese HCV patients undergoing pegylated interferon-α/ribavirin treatment, comprising 78 null virologic responders and 64 virologic responders.6 Despite the small number of samples in the GWAS in comparison with other studies in populations of European descent (European American,29 Australian,30 and Swiss31), the same disease-causing locus of IL28B was identified with the strongest association in the Japanese population. In general, the number of samples affects the statistical power of detection in a GWAS. Moreover, false-positive associations can increase when low-quality genotype data, presumably caused by accidental errors in genotyping steps or low-quality genomic DNA, are incorporated in the analysis. The Japanese GWAS was able to successfully identify the risk factors in a small number of samples because: (1) IL28B is a strong host risk factor for drug response in Asian and white populations; and (2) quality controls were used in sample collection in terms of clinical characteristics, and the genotype data were checked for quality.14
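The dependence of statistical power on both sample number and effect size can be made concrete with a rough calculation. The sketch below assumes a two-sided comparison of allele frequencies using a normal approximation and ignores the negligible contribution of the opposite tail; the function name, the control allele frequency, and the odds ratios are hypothetical choices for illustration and are not estimates from the IL28B study. Only the group sizes (78 and 64) are taken from the text.

```python
import math
from statistics import NormalDist


def allelic_power(n_cases, n_controls, freq_controls, odds_ratio, alpha=5e-8):
    """Approximate power of a two-sided allelic test (normal approximation)."""
    # Allele frequency in cases implied by the allelic odds ratio.
    odds_cases = odds_ratio * freq_controls / (1 - freq_controls)
    freq_cases = odds_cases / (1 + odds_cases)
    # Each participant contributes two alleles.
    alleles_cases, alleles_controls = 2 * n_cases, 2 * n_controls
    se = math.sqrt(freq_cases * (1 - freq_cases) / alleles_cases
                   + freq_controls * (1 - freq_controls) / alleles_controls)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_effect = abs(freq_cases - freq_controls) / se
    return NormalDist().cdf(z_effect - z_alpha)


# Hypothetical allele frequency (0.1) and odds ratios; group sizes as in the text.
weak_small = allelic_power(78, 64, 0.1, 1.3)       # modest effect, small sample
weak_large = allelic_power(1000, 1000, 0.1, 1.3)   # modest effect, larger sample
strong_small = allelic_power(78, 64, 0.1, 10.0)    # very strong effect, small sample
print(weak_small, weak_large, strong_small)
```

Under these assumptions, a modest effect is essentially undetectable at genome-wide significance with groups of this size, whereas a very strong effect can still be detected, which is in line with the explanation given above for the success of the Japanese GWAS.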
As for HBV-related HCC, a GWAS using chronic HBV carriers with and without HCC in five independent Chinese samples found that one SNP (rs17401966) in KIF1B was associated with susceptibility to HBV-related HCC.32 Moreover, in the most recent report on this topic, genetic variants in the STAT4 and HLA-DQ genes were identified as genetic susceptibility loci for HBV-related HCC in the Chinese population.33 We performed SNP genotyping of rs17401966 in the KIF1B gene in Japanese, Korean, and Hong Kong populations for the purpose of replication analysis of a previous GWAS report.8 We first examined two independent Japanese HBV-related HCC populations and healthy controls, including 179 patients and 769 controls from Biobank Japan, and 142 patients and 251 controls from various hospitals. We did not detect any associations between rs17401966 and HCC in the Japanese population. We also detected no association of the SNP with HBV-related HCC in Korean and Hong Kong populations using 164 patients and 144 controls, and 94 patients and 187 controls, respectively. In a recent report from another group, no significant association of the KIF1B gene was observed in HBV-related HCC patients of Saudi Arabian ethnicity.34 These results may be explained by genetic diversity among the Chinese, Japanese, Korean, Hong Kong, and Saudi Arabian populations. The complexity of multivariate interactions in the pathogenesis of HCC may lead to difficulties in identifying the gene(s) associated with HBV-related HCC.
In a previous report that studied 179 Japanese individuals with chronic HBV infection (CHB) and 934 control participants, a GWAS identified significant associations of CHB with a region including HLA-DPA1 and HLA-DPB1.35 The same group also conducted a second GWAS with a total of 2,667 Japanese patients with persistent HBV and 6,496 controls, which confirmed significant associations between the HLA-DP locus and CHB, in addition to associations with another two SNPs located in a genetic region including the HLA-DQ gene.36 We performed a GWAS using samples from Japanese HBV carriers, healthy controls, and individuals who spontaneously resolved HBV infections (hepatitis B surface antigen [HBsAg] negative and hepatitis B core antibody [anti-HBc] positive), in order to confirm or identify the host genetic factors related to CHB and viral clearance.37 In the subsequent replication analysis, we validated the associated SNPs in the GWAS using two independent sets of Japanese and Korean individuals. In our study, healthy controls with no clinical evidence of HBV exposure were randomly selected; therefore, a separate group of HBV-resolved individuals was included so that the host genetic factors related to CHB or HBV clearance could be clearly identified. An association analysis conducted with the Japanese and Korean data identified the HLA-DPA1 and HLA-DPB1 genes with Pmeta=1.89×10^−12 for rs3077 and Pmeta=9.69×10^−10 for rs9277542. We also found that the HLA-DPA1 and HLA-DPB1 genes were significantly associated with protective effects against CHB in Asian populations including Japanese, Korean, Chinese, and Thai individuals (Pmeta=1.26×10^−42 for rs3077 and Pmeta=1.10×10^−14 for rs9277535) (Fig. 3).35–41 The SNP rs9277535 is located about 4 kb upstream from rs9277542 and shows strong linkage disequilibrium with it (r^2=0.955) in the HapMap JPT (Japanese in Tokyo, Japan) population.
These results suggest that the associations between the HLA-DP locus and the protective effects against persistent HBV infection and with clearance of HBV are widely replicated in East Asian populations; however, there are few reports of GWASs in Caucasian or African populations. Further studies are necessary to clarify the pathogenesis of CHB and the mechanisms of HBV clearance, including functional analyses of the HLA-DP molecule.
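Combined P-values such as the Pmeta values reported above are obtained by pooling per-population effect estimates. The text does not specify the pooling method, so the following is only a sketch of one common choice, a fixed-effect inverse-variance meta-analysis of log odds ratios; the function name and the two input studies are hypothetical and do not reproduce any reported result.

```python
import math

Z_95 = 1.959964  # normal quantile for a two-sided 95% confidence interval


def fixed_effect_meta(studies):
    """Pool (odds_ratio, ci_low, ci_high) tuples by inverse-variance weighting.

    Returns the pooled odds ratio, its 95% CI, and a two-sided p-value.
    """
    total_weight = 0.0
    weighted_sum = 0.0
    for odds_ratio, ci_low, ci_high in studies:
        # Standard error of log(OR) recovered from the 95% confidence limits.
        se = (math.log(ci_high) - math.log(ci_low)) / (2 * Z_95)
        weight = 1.0 / se ** 2
        total_weight += weight
        weighted_sum += weight * math.log(odds_ratio)
    beta = weighted_sum / total_weight
    se_pooled = math.sqrt(1.0 / total_weight)
    p_value = math.erfc(abs(beta / se_pooled) / math.sqrt(2))
    ci = (math.exp(beta - Z_95 * se_pooled), math.exp(beta + Z_95 * se_pooled))
    return math.exp(beta), ci, p_value


# Two hypothetical populations showing the same protective effect (OR < 1).
single = fixed_effect_meta([(0.5, 0.4, 0.625)])
pooled = fixed_effect_meta([(0.5, 0.4, 0.625), (0.5, 0.4, 0.625)])
print(single[2], pooled[2])
```

When populations show consistent effects, pooling narrows the confidence interval and lowers the p-value, which is how replication across several East Asian populations can yield very small combined P-values. A random-effects model would be the usual alternative when effects differ between populations.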
The GWASs described above have successfully identified disease-associated genes or SNPs using different types of genome-wide SNP typing platforms. The embedded SNPs vary among platforms, because each platform selects tagging SNPs and SNPs suited to its own genotyping strategy; however, the genome coverage among platforms revealed no differences over 60% between the HapMap CEU samples and the HapMap JPT+CHB samples.42 Moreover, the genome coverage of the current version of the Affymetrix Genome-Wide Human SNP Array 6.0 platform has been estimated to reach 75% in the Japanese population.14
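Tagging SNPs and genome coverage both rest on pairwise linkage disequilibrium, the same r^2 statistic quoted earlier for rs9277535 and rs9277542. The sketch below computes r^2 from haplotype and allele frequencies and then estimates coverage as the fraction of reference SNPs captured by at least one array SNP. The r^2 threshold of 0.8 is an assumed, commonly used cut-off, and the function names and all frequencies are hypothetical; the cited coverage estimates may have been derived differently.

```python
def ld_r2(freq_ab, freq_a, freq_b):
    """r^2 between two biallelic SNPs.

    freq_ab: frequency of the haplotype carrying allele A and allele B
    freq_a, freq_b: frequencies of allele A and allele B (strictly between 0 and 1)
    """
    d = freq_ab - freq_a * freq_b  # linkage disequilibrium coefficient D
    return d * d / (freq_a * (1 - freq_a) * freq_b * (1 - freq_b))


def genome_coverage(best_r2_per_snp, threshold=0.8):
    """Fraction of reference SNPs whose best r^2 with any array SNP meets the threshold."""
    if not best_r2_per_snp:
        raise ValueError("at least one reference SNP is required")
    captured = sum(1 for r2 in best_r2_per_snp if r2 >= threshold)
    return captured / len(best_r2_per_snp)


# Hypothetical frequencies: two SNPs in strong, but not perfect, disequilibrium.
r2_example = ld_r2(0.39, 0.4, 0.4)
# Hypothetical best-r^2 values for four reference SNPs against an array.
coverage_example = genome_coverage([1.0, 0.9, 0.5, 0.79])
print(round(r2_example, 3), coverage_example)
```

A reference SNP that is not on the array still counts as covered when a genotyped SNP tags it at or above the threshold, which is why platforms with different embedded SNPs can reach similar coverage.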
Conclusions
Together with technology developments, GWASs are a promising strategy with which to identify host genetic factors for multifactorial diseases, including common liver diseases, and various host genetic traits. The GWAS strategy may allow researchers to identify unexpected significant associations. Recently, new strategies and emerging technologies for massively parallel sequencing (also termed next-generation sequencing) have allowed whole-genome analysis to identify single-nucleotide variations and structural variations (including insertion, deletion, duplication, translocation, and transposition events). The costs of using these emerging technologies are currently high; therefore, common SNP-based GWASs using the genome-wide SNP analysis technologies introduced in this paper still have an important potential role in the fields of clinical and basic research.
Abbreviations
- GWAS: genome-wide association study
- SNP: single-nucleotide polymorphism
- HBV: hepatitis B virus
- HCV: hepatitis C virus
- PBC: primary biliary cirrhosis
- HCC: hepatocellular carcinoma
- PCR: polymerase chain reaction
- HLA: human leukocyte antigen
- CHB: chronic hepatitis B
Declarations
Conflict of interest
None
Authors’ contributions
Genotyping and statistical analyses for hepatitis studies (NN), acquisition of genotyping data for hepatitis research (KT), manuscript writing (NN, KT), critical review (MM).