Introduction
Human papillomavirus (HPV) is a double-stranded circular DNA virus with a genome of approximately 7–8 kb.1,2 To date, researchers have identified over 200 genotypes, among which HPV16 is one of the major high-risk subtypes and is closely associated with the occurrence of various malignant tumors, such as cervical and anal cancer.3,4 Currently, the HPV detection methods commonly used in clinical practice, such as the Cobas 4800 detection system, mainly target partial viral fragments for typing.5,6 Although these methods enable rapid and accurate HPV typing, they cannot provide full-genome information. Therefore, current research still relies on other molecular techniques to achieve full-genome amplification and sequencing of HPV16.
However, current methods for amplifying the full HPV16 genome, such as long-fragment polymerase chain reaction (PCR) and rolling circle amplification (RCA), still have several limitations. Long-fragment PCR shows low amplification efficiency when handling low-copy-number templates and often produces non-specific amplification products. In addition, long-fragment PCR requires high-quality template DNA. Degraded or damaged DNA templates can further reduce amplification efficiency or even cause complete amplification failure.7 RCA can achieve efficient DNA amplification; however, its primer binding sites are relatively limited, and it lacks a sequence-specific selection mechanism, which often leads to non-specific products.8 Moreover, RCA requires long reaction times and is costly, which limits its use in high-throughput clinical applications.9
To address these challenges, we established an overlapping extension PCR method for efficient and specific amplification of the full HPV16 genome. The genome was divided into two large fragments (3.9 kb and 5.3 kb) and amplified using a nested PCR strategy. We systematically evaluated the performance of two high-fidelity DNA polymerases under optimized conditions. The high sensitivity of this method enables its application in samples with low viral loads, providing a reliable approach for full-genome sequencing, variant analysis, molecular evolution studies, and clinical diagnostics of HPV16.
Materials and methods
Nucleic acid extraction and plasmid standard curve
HPV16 DNA extraction: All HPV16-positive nucleic acid samples were extracted from cervical exfoliated cells of patients at Peking University First Hospital. The samples were confirmed to be HPV16-positive by clinical HPV genotyping. For sample selection, the inclusion criteria were: 1) samples with complete clinical detection records; 2) nucleic acid purity meeting experimental requirements (A260/280 ratio between 1.7 and 1.9, determined by ultraviolet spectrophotometry); and 3) no obvious DNA degradation (verified by agarose gel electrophoresis). The exclusion criteria were: 1) samples with ambiguous HPV genotyping results; 2) nucleic acid samples contaminated during extraction; and 3) samples with insufficient volume for repeated experiments.
Plasmid construction: The HPV16 E6 standard fragment was synthesized and ligated into the plasmid vector (pUC). The recombinant plasmid was then transformed into Escherichia coli DH5α and cultured. Positive clones were selected and confirmed by PCR and sequencing. Plasmid DNA was extracted from the confirmed clones, and its concentration was measured using a nucleic acid analyzer. The DNA copy number was then calculated, and this preparation served as the standard stock solution. The stock was serially diluted and amplified by real-time quantitative PCR, and the resulting data were used to generate a standard curve equation.
Primer design
Based on the HPV16 whole-genome reference sequence (K02718.1) from the NCBI Papillomavirus Episteme database, nested PCR primers (Table 1) were designed using Primer 5 software. Fout/Rout served as the outer primers for nested PCR, while Fin/Rin were the inner primers. The overlapping regions of the amplified products were designed to be greater than 500 bp to significantly enhance the efficiency of fragment splicing by providing sufficient homologous sequences for overlapping extension PCR. This design also ensured adequate overlapping read lengths during next-generation sequencing (NGS), thereby improving the accuracy of sequence assembly and the reliability of variant detection. All primers were synthesized by TianyiHuiyuan Biotech.
Table 1. HPV16 whole-genome sequence amplification primers
| Primer | Sequence (5′ to 3′) | Location (nt) | Product length (bp) | Tm (°C) | GC content (%) |
|---|---|---|---|---|---|
| 16.1Fout | ATCATCAAGAACACGTAGAGAAACCC | 526–551 | | 59 | 42 |
| 16.1Rout | CGACCCTGTTCCAATTCCTAACCC | 4,403–4,426 | 3,902 | 61 | 54 |
| 16.1Fin | TGTGACTCTACGCTTCGGTTG | 742–762 | | 60 | 52 |
| 16.1Rin | GGTAGCCGATGCACGTT | 4,266–4,282 | 3,542 | 57 | 59 |
| 16.2Fout | CAACATTACTGGCGTGCTTT | 3,874–3,893 | | 56 | 45 |
| 16.2Rout | CCATACCCGCTGTCTTCG | 1,250–1,267 | 5,299 | 57 | 61 |
| 16.2Fin | CCTATTAATACGTCCGCTGCT | 3,925–3,945 | | 57 | 48 |
| 16.2Rin | CAGTAAACAACGCATGTGCT | 1,062–1,081 | 5,060 | 56 | 45 |
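The GC content column of Table 1 can be recomputed directly from the primer sequences. The following is a minimal sketch, not the procedure used by the primer design software; the function name `gc_content` is an illustrative assumption, and only three of the eight primers are shown.

```python
def gc_content(seq: str) -> float:
    """Return the GC percentage (0-100) of a DNA sequence."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    gc = sum(1 for base in seq if base in "GC")
    return 100.0 * gc / len(seq)


# Sequences copied from Table 1 (fragment 1 primers only, for brevity).
primers = {
    "16.1Fout": "ATCATCAAGAACACGTAGAGAAACCC",
    "16.1Rout": "CGACCCTGTTCCAATTCCTAACCC",
    "16.1Fin": "TGTGACTCTACGCTTCGGTTG",
}

for name, seq in primers.items():
    print(f"{name}: {len(seq)} nt, GC = {gc_content(seq):.0f}%")
# 16.1Fout: 26 nt, GC = 42%
# 16.1Rout: 24 nt, GC = 54%
# 16.1Fin: 21 nt, GC = 52%
```

The rounded values agree with the corresponding entries in Table 1. Tm is not recomputed here because it depends on the salt and primer concentration model used by the design software.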
Two-segment primer set validation
To verify the feasibility of the two-segment primer set, amplifications were performed using two high-fidelity DNA polymerases: 2× Platinum™ SuperFi™ II PCR Master Mix (Thermo Fisher) and 2× Phanta® Max Master Mix (Vazyme). The HPV16 full-length genome was amplified using a two-step nested PCR method. The primer positions are shown in Figure 1, and the primer information is detailed in Table 1.
For the first-round PCR using the Thermo Fisher DNA polymerase, the total reaction volume was 20 µL, which included 10 µL of 2× Platinum™ SuperFi™ II PCR Master Mix, 1 µL of forward primer (10 µmol/L), 1 µL of reverse primer (10 µmol/L), 5 µL of template DNA, and 3 µL of ddH2O. The reaction conditions were as follows: initial denaturation at 98°C for 30 s; followed by 30 cycles of denaturation at 98°C for 10 s, annealing at 58°C for 10 s, and extension at 72°C for 3 min; and a final extension at 72°C for 5 min, followed by holding at 4°C. For the second-round PCR, the total reaction volume was 50 µL, which included 25 µL of 2× Platinum™ SuperFi™ II PCR Master Mix, 2.5 µL of forward primer (10 µmol/L), 2.5 µL of reverse primer (10 µmol/L), 5 µL of template DNA, and 15 µL of ddH2O. The reaction conditions were the same as those for the first round.
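As a quick consistency check on the reaction set-up, the ddH2O volume can be derived as the remainder once the other components are summed. This is only an illustrative sketch; the helper name `water_volume` and the dictionary layout are assumptions, while the volumes are those stated above for the Thermo Fisher reactions.

```python
def water_volume(total_ul: float, components_ul: dict) -> float:
    """Volume of ddH2O (µL) needed to bring the listed components up to total_ul."""
    used = sum(components_ul.values())
    if used > total_ul:
        raise ValueError(f"components ({used} µL) exceed total volume ({total_ul} µL)")
    return total_ul - used


# Component volumes (µL) as described in the text, excluding ddH2O.
first_round = {"2x master mix": 10.0, "forward primer": 1.0,
               "reverse primer": 1.0, "template DNA": 5.0}
second_round = {"2x master mix": 25.0, "forward primer": 2.5,
                "reverse primer": 2.5, "template DNA": 5.0}

print(water_volume(20.0, first_round))   # 3.0
print(water_volume(50.0, second_round))  # 15.0
```

Both results match the ddH2O volumes given for the 20 µL and 50 µL reactions.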
For the first-round PCR using the Vazyme DNA polymerase, the total reaction volume was 25 µL, which included 12.5 µL of 2× Phanta® Max Master Mix, 1 µL of forward primer (10 µmol/L), 1 µL of reverse primer (10 µmol/L), 5 µL of template DNA, and 5.5 µL of ddH2O. The reaction conditions were as follows: initial denaturation at 95°C for 3 min; followed by 30 cycles of denaturation at 95°C for 15 s, annealing at 58°C for 15 s, and extension at 72°C for 3 min; and a final extension at 72°C for 5 min, followed by holding at 4°C. For the second-round PCR, the total reaction volume was 50 µL, which included 25 µL of 2× Phanta® Max Master Mix, 2 µL of forward primer (10 µmol/L), 2 µL of reverse primer (10 µmol/L), 5 µL of template DNA, and 16 µL of ddH2O. The reaction conditions were the same as those for the first round.
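The programmed hold times of the two cycling protocols can be summed to estimate the duration of each PCR round. The sketch below is an approximation under stated assumptions: the class `CyclingProgram` and its field names are hypothetical, and instrument ramping time and the final 4°C hold are not included, so actual run times will be somewhat longer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CyclingProgram:
    """Hold times (seconds) of a three-step PCR program; ramping is ignored."""
    initial_denaturation_s: int
    denaturation_s: int
    annealing_s: int
    extension_s: int
    cycles: int
    final_extension_s: int

    def hold_time_s(self) -> int:
        per_cycle = self.denaturation_s + self.annealing_s + self.extension_s
        return (self.initial_denaturation_s
                + self.cycles * per_cycle
                + self.final_extension_s)


# Values taken from the reaction conditions described in the text.
superfi = CyclingProgram(30, 10, 10, 180, 30, 300)   # Thermo Fisher protocol
phanta = CyclingProgram(180, 15, 15, 180, 30, 300)   # Vazyme protocol

for name, prog in (("SuperFi II", superfi), ("Phanta Max", phanta)):
    one_round = prog.hold_time_s()
    print(f"{name}: {one_round / 60:.1f} min per round, "
          f"{2 * one_round / 3600:.2f} h for both nested rounds")
# SuperFi II: 105.5 min per round, 3.52 h for both nested rounds
# Phanta Max: 113.0 min per round, 3.77 h for both nested rounds
```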
In the first-round amplification, two HPV16-positive nucleic acid samples were used as templates. The A260/280 ratio of the samples was measured by ultraviolet spectrophotometry and was found to be between 1.7 and 1.9. The copy numbers of the samples were determined by real-time quantitative PCR to be 2,000 copies/µL and 200 copies/µL, respectively. Nested PCR amplification was performed using the two-segment primers. The second-round amplification used the products from the first round as templates. After amplification, 5 µL of the amplified product was subjected to 1% agarose gel electrophoresis at 120 V for 25 min. After electrophoresis, the results were observed using a gel electrophoresis imaging system.
Annealing temperature optimization for PCR amplification
After validating the nested primer feasibility, HPV16 full-genome nested PCR amplification was performed using Thermo Fisher and Vazyme DNA polymerases at four annealing temperatures (57.2°C, 58.6°C, 59.5°C, and 60°C). The first-round PCR products served as templates for the second-round amplification with inner primers, with a no-template control (ddH2O replacing template DNA). The amplified products were analyzed by 1% agarose gel electrophoresis.
For quantitative analysis, ImageJ software was used to quantify and grade band intensity to screen for the optimal annealing temperature, and the effect of different temperatures on non-specific amplification was also evaluated (see “Statistical analysis” for detailed methods).
Sensitivity assessment for PCR amplification
The pUC-HPV16 E6 standard curve equation (Y = −3.232X + 35.648, where Y represents the Ct value and X represents the log10 of the template concentration in copies/µL) was used to calculate the concentration of the sample template. An HPV16-positive nucleic acid sample with a Ct value of 25 was selected, and its initial copy number was calculated to be approximately 2,000 copies/µL based on the standard curve equation.
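The conversion from Ct value to template concentration follows from inverting the standard curve. A minimal sketch is given below, assuming that X is the base-10 logarithm of the concentration in copies/µL; the function names are illustrative and are not part of any qPCR software.

```python
import math

SLOPE = -3.232       # slope of the standard curve (Ct per log10 unit)
INTERCEPT = 35.648   # Ct at log10(concentration) = 0


def copies_per_ul(ct: float, slope: float = SLOPE, intercept: float = INTERCEPT) -> float:
    """Invert Y = slope * X + intercept, with X = log10(copies/µL)."""
    log10_conc = (ct - intercept) / slope
    return 10 ** log10_conc


def ct_from_copies(copies: float, slope: float = SLOPE, intercept: float = INTERCEPT) -> float:
    """Forward direction: expected Ct for a given concentration in copies/µL."""
    return slope * math.log10(copies) + intercept


# A Ct of 25 gives a value close to the 2,000 copies/µL quoted in the text.
print(f"{copies_per_ul(25.0):.0f} copies/µL")
```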
The sample was then serially diluted using sterile distilled water (ddH2O) as the diluent at dilution factors of 10⁰, 10⁻¹, 10⁻², and 10⁻³, yielding concentrations of 2,000, 200, 20, and 2 copies/µL, respectively. Quadruplicate amplification tests were conducted using both Thermo Fisher and Vazyme DNA polymerases on the serially diluted samples. The optimal annealing temperatures for Fragment 1 and Fragment 2, determined as described under “Annealing temperature optimization for PCR amplification”, were applied for amplification. Using “simultaneous successful amplification of two target fragments” as the criterion for effective amplification, the amplified products were detected by 1% agarose gel electrophoresis. The effective amplification success rate and its 95% confidence interval were calculated, and differences in amplification between the two enzymes in high- and low-concentration groups were compared (see “Statistical analysis” for detailed statistical methods).
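The dilution series and the criterion for effective amplification can be expressed compactly. The sketch below is illustrative only: the function names are assumptions, and the replicate outcomes in `example` are hypothetical values chosen to show the calculation, not experimental data.

```python
def dilution_series(stock_copies_per_ul: float, steps: int, fold: int = 10) -> list:
    """Concentrations obtained by repeated fold-dilution, starting from the undiluted stock."""
    return [stock_copies_per_ul / fold ** k for k in range(steps)]


def effective_rate(replicates) -> float:
    """Fraction of replicates in which BOTH target fragments were amplified.

    `replicates` is a sequence of (fragment1_ok, fragment2_ok) booleans.
    """
    if not replicates:
        raise ValueError("no replicates given")
    effective = sum(1 for f1, f2 in replicates if f1 and f2)
    return effective / len(replicates)


print(dilution_series(2000, 4))  # [2000.0, 200.0, 20.0, 2.0]

# Hypothetical quadruplicate outcome: only replicates 1 and 3 count as effective.
example = [(True, True), (True, False), (True, True), (False, False)]
print(effective_rate(example))   # 0.5
```

A replicate with only one of the two fragments is counted as a failure, which is what the "simultaneous successful amplification" criterion requires.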
Statistical analysis
For annealing temperature optimization, the ImageJ software was used for quantitative analysis of agarose gel electrophoresis bands. First, a fixed-area rectangular frame was set to select target bands, and the integrated density of the bands was measured after background subtraction. Then, band intensities were classified into three levels (“+”, low; “++”, medium; “+++”, high) based on integrated density values. The temperature corresponding to the “+++” intensity was selected as the optimal annealing temperature to ensure the highest amplification efficiency.
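The three-level grading can be pictured as a simple thresholding of the background-subtracted integrated density. The cutoffs are not stated in the text, so in the sketch below `low_cutoff` and `high_cutoff` are hypothetical parameters that would need to be set from the actual ImageJ measurements.

```python
def grade_band(integrated_density: float, low_cutoff: float, high_cutoff: float) -> str:
    """Map a background-subtracted integrated density to "+", "++", or "+++".

    The cutoffs are hypothetical; the text does not specify them.
    """
    if low_cutoff >= high_cutoff:
        raise ValueError("low_cutoff must be smaller than high_cutoff")
    if integrated_density < low_cutoff:
        return "+"
    if integrated_density < high_cutoff:
        return "++"
    return "+++"


# Example with made-up densities and cutoffs (arbitrary units).
for density in (50.0, 150.0, 250.0):
    print(density, grade_band(density, low_cutoff=100.0, high_cutoff=200.0))
```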
For sensitivity assessment, using “simultaneous successful amplification of two HPV16 target fragments (3.9 kb and 5.3 kb)” as the criterion for effective amplification, the success rate of quadruplicate experiments for each sample concentration (2, 20, 200, and 2,000 copies/µL) was calculated (number of effective amplifications/total number of replicates × 100%). The Wilson score method was used to estimate the 95% confidence interval of the success rate. Samples were divided into a high-concentration group (≥200 copies/µL) and a low-concentration group (<200 copies/µL). Fisher’s exact test was used to compare differences in amplification success rates between the two enzymes, with a P-value < 0.05 indicating a statistically significant difference.
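For readers who wish to see how these two statistics are computed, a minimal standard-library sketch follows. It implements the plain Wilson score interval (without continuity correction) and a two-sided Fisher's exact test that sums the probabilities of all tables no more likely than the observed one. These are assumptions about the variants; statistical packages may use other conventions, so values obtained this way need not coincide exactly with those produced by other software. The counts in the usage lines are example inputs.

```python
from math import comb, sqrt

Z_95 = 1.959963984540054  # two-sided 95% quantile of the standard normal


def wilson_interval(successes: int, n: int, z: float = Z_95):
    """Wilson score interval (no continuity correction) for a binomial proportion."""
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need n > 0 and 0 <= successes <= n")
    p = successes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1, n = a + b, c + d, a + c, a + b + c + d

    def prob(x: int) -> float:
        # Hypergeometric probability of x successes in row 1, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    total = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))
    return min(1.0, total)


# Example: 3 effective amplifications out of 4 replicates.
low, high = wilson_interval(3, 4)
print(f"{low:.3f} to {high:.3f}")

# Example: enzyme A succeeds in 8 of 8 replicates, enzyme B in 7 of 8.
print(fisher_exact_two_sided(8, 0, 7, 1))
```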
Results
Validation of amplification primers
Amplifications were performed using Thermo Fisher DNA polymerase and Vazyme DNA polymerase, with two HPV16-positive nucleic acid samples of different copy numbers as templates. The results are shown in Figure 2. When the template copy numbers were 2,000 copies/µL and 200 copies/µL, both DNA polymerases successfully amplified the two fragments, with the product band sizes consistent with the theoretical values. Moreover, PCR products were validated by Illumina sequencing, achieving a Q30 ≥ 96% and >98% identity with the HPV16 reference sequence (K02718.1), confirming accurate and complete amplification (Fig. 3).
Annealing temperature optimization
Amplifications were performed using Thermo Fisher DNA polymerase and Vazyme DNA polymerase at different annealing temperatures, and the results are shown in Figure 4. The no-template control showed no specific bands, indicating the absence of contamination. The amplification results were quantitatively analyzed by ImageJ (Table 2). Based on the summarized data, the annealing temperature that yielded the highest product intensity was selected for amplification. Specifically, for Thermo Fisher DNA polymerase, the annealing temperature for both Fragment 1 and Fragment 2 was determined to be 60°C. For Vazyme DNA polymerase, the annealing temperature for both Fragment 1 and Fragment 2 was determined to be 58.6°C.
Table 2. Effects of Thermo Fisher and Vazyme DNA polymerases at different temperatures
| Annealing temperature | 57.2°C | 58.6°C | 59.5°C | 60°C |
|---|---|---|---|---|
| Thermo Fisher | | | | |
| Fragment 1 | ++ | + | ++ | +++ |
| Fragment 2 | + | +++ | +++ | +++ |
| Vazyme | | | | |
| Fragment 1 | + | +++ | ++ | ++ |
| Fragment 2 | ++ | +++ | + | ++ |
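The choice of a single annealing temperature per enzyme can be reproduced from the grades in Table 2. The sketch below uses one possible selection rule, which is an assumption rather than the stated procedure: pick the temperature at which the weaker of the two fragments has the highest grade. The names `GRADE`, `table2`, and `best_shared_temperature` are illustrative.

```python
GRADE = {"+": 1, "++": 2, "+++": 3}
TEMPERATURES = [57.2, 58.6, 59.5, 60.0]  # °C, column order of Table 2

# Band intensity grades transcribed from Table 2.
table2 = {
    "Thermo Fisher": {
        "Fragment 1": ["++", "+", "++", "+++"],
        "Fragment 2": ["+", "+++", "+++", "+++"],
    },
    "Vazyme": {
        "Fragment 1": ["+", "+++", "++", "++"],
        "Fragment 2": ["++", "+++", "+", "++"],
    },
}


def best_shared_temperature(grades_by_fragment: dict) -> float:
    """Temperature maximizing the weakest fragment grade (first one wins on ties)."""
    scores = [
        min(GRADE[grades[i]] for grades in grades_by_fragment.values())
        for i in range(len(TEMPERATURES))
    ]
    return TEMPERATURES[scores.index(max(scores))]


for enzyme, grades in table2.items():
    print(enzyme, best_shared_temperature(grades))
# Thermo Fisher 60.0
# Vazyme 58.6
```

Under this rule, the selected temperatures are 60°C for Thermo Fisher and 58.6°C for Vazyme, the only temperatures at which both fragments reach "+++" for the respective enzyme.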
Sensitivity assessment
The results showed that Thermo Fisher DNA polymerase could clearly amplify specific bands when the template concentration was as low as 2 copies/µL. Vazyme DNA polymerase could clearly amplify specific bands when the template concentration was as low as 200 copies/µL, while the sample at 20 copies/µL failed to produce a detectable band (Fig. 5).
The statistical analysis of four replicate experiments (Table 3) showed that both enzymes achieved success rates over 75% in samples with concentrations ≥ 200 copies/µL, and their 95% confidence intervals overlapped. Fisher’s exact test revealed no significant difference between the two enzymes in this concentration range (P = 1.000). In contrast, at concentrations < 200 copies/µL, Thermo Fisher enzyme had a success rate of 62.5% (5/8), while Vazyme enzyme had a success rate of 0% (0/8), indicating a statistically significant difference (P = 0.037). Two types of failure were observed in replicate experiments: (1) no bands were amplified (the majority of cases); and (2) non-specific bands (non-target fragments) appeared without the expected target bands. These results indicate that, with the Thermo Fisher enzyme, the method could amplify templates across a concentration range of 2–2,000 copies/µL, although success rates declined at the lowest concentrations. While high-concentration samples could be reliably amplified by both enzymes, Thermo Fisher enzyme was particularly effective for low-concentration samples.
Table 3. Effective amplification success rates and 95% confidence intervals of two DNA polymerases in HPV16 samples of different concentrations
| Sample concentration | Thermo Fisher DNA polymerase, success rate (95% confidence interval) | Vazyme DNA polymerase, success rate (95% confidence interval) |
|---|---|---|
| 2,000 copies/µL | 100% (47.8–100%) | 100% (47.8–100%) |
| 200 copies/µL | 100% (47.8–100%) | 75% (24.9–96.8%) |
| 20 copies/µL | 75% (24.9–96.8%) | 0% (0–39.3%) |
| 2 copies/µL | 50% (11.6–88.4%) | 0% (0–39.3%) |
Discussion
HPV is a highly host-specific virus that primarily infects human skin and mucosal epithelial cells. Because its life cycle depends on the specific microenvironment of host cells, HPV is difficult to propagate in vitro,10,11 which has limited research into its molecular biology. Given the critical role of HPV16 in the development of various malignancies, establishing efficient and reliable whole-genome amplification methods is essential. Such methods will advance our understanding of its oncogenic mechanisms, guide vaccine development, and support precise treatment strategies.
In this study, we established an overlap extension PCR method to amplify the HPV16 whole genome, which is particularly suitable for low viral load samples. This approach effectively avoids the nonspecific amplification and primer competition issues commonly associated with RCA,12 thereby significantly improving amplification specificity and stability. Unlike the partial fragments targeted by Cobas 4800 and similar commercial kits, the full-length HPV16 genome amplified using this method can be used for NGS to comprehensively analyze genomic sequence variations, including single-nucleotide variants, insertions/deletions, and gene recombination. These analyses offer a more comprehensive technical basis for in-depth investigation into the evolution, oncogenic mechanisms, and drug resistance development of HPV16.
Notably, certain regions of the HPV16 genome (e.g., the E4 gene region) contain high guanine–cytosine (GC) content and complex secondary structures, posing additional challenges for whole-genome amplification.13 To ensure amplification efficiency, we specifically addressed these technical difficulties during method optimization. First, we selected DNA polymerases with high elongation capacity and thermostability. Second, primer design avoided regions with extremely high GC content and strong secondary structure formation to ensure primer binding efficiency and amplification specificity. Furthermore, systematic optimization of key parameters (e.g., annealing temperature and extension time) significantly improved amplification efficiency for these complex genomic regions. These optimizations not only resolved technical challenges but also established a foundation for subsequent polymerase performance comparisons.
The sensitivity assessment revealed that both DNA polymerases achieved amplification success rates exceeding 75% in samples with ≥200 copies/µL, with no statistical difference detected (P = 1.000). This underscores the method’s stability for high-viral-load samples. In contrast, for samples with <200 copies/µL, Thermo Fisher DNA polymerase demonstrated a significantly higher success rate (62.5% vs 0%, P = 0.037), highlighting its superiority in low-viral-load scenarios. This advantage likely stems from its stronger exonuclease proofreading activity, which maintains stable amplification even with low template quantities. Additionally, its excellent extension efficiency allows for longer DNA chain synthesis per template binding, reducing polymerase dissociation frequency and thereby enhancing amplification efficiency for low-copy templates.14 By comparison, Vazyme DNA polymerase had lower sensitivity (detection limit of 200 copies/µL) but performed well in specificity and stability for routine clinical sample detection. These findings provide important guidance for polymerase selection in different applications: Thermo Fisher polymerase is preferable for processing low-copy-number samples (e.g., whole-genome sequencing preprocessing), while Vazyme DNA polymerase offers a cost-effective alternative for routine clinical testing.
In terms of clinical applications, our method offers several technical advantages. First, it requires only conventional PCR equipment and completes HPV16 whole-genome amplification within 4 h while maintaining high compatibility with mainstream NGS platforms such as Illumina. Second, the per-sample testing cost is lower than that of commercial kits, providing an economical solution for large-scale clinical screening.15–17 Third, and more importantly, the complete genome data obtained through this method can be directly applied to multiple fields, including clinical genotyping, oncogenic mutation analysis, and vaccine escape monitoring, offering crucial evidence for personalized diagnosis and treatment. Establishing standardized, automated high-throughput detection protocols in the future will further enhance the clinical value of this method.
Limitations
While the HPV16 whole-genome amplification method developed in this study shows several advantages, some limitations require improvement. First, validation was performed only using cervical exfoliated cell samples, without including samples from other anatomical sites (e.g., oropharynx, anus), which may affect its broad applicability. Additionally, the samples did not systematically cover the full disease spectrum from HPV infection to cancer development, limiting the evaluation of its performance across different disease stages. Future studies should validate the method using more diverse sample types (e.g., oral/anal swabs) and clinical cohorts (e.g., patients with cervical lesions at different stages) to assess its potential for diagnostic and precision medicine applications. Second, although amplification conditions for GC-rich regions were optimized, efficiency for strains with extreme GC content still needs improvement. Developing novel thermostable DNA polymerases or optimizing buffer systems may help address this limitation in future work.
Conclusions
The overlapping extension PCR method established in this study demonstrates high sensitivity and specificity for amplifying HPV16 in cervical exfoliated cells. This method provides a robust technical foundation for genomic research, with potential applications in whole-genome sequencing and variation analysis. Additionally, this study provides insights for the optimization of nested PCR amplification systems and serves as a reference for other studies involving the amplification of long DNA fragments.
Declarations
Acknowledgement
We thank all the participants from the Obstetrics and Gynecology Department of Peking University First Hospital for providing us with the specimens.
Ethical statement
This study was conducted in accordance with the ethical standards of the Declaration of Helsinki (as revised in 2024) and was approved by the Ethics Committee of North China University of Science and Technology (Approval Number: 20250616). Informed consent was obtained from all patients, and the confidentiality of patient information was ensured through data anonymization.
Data sharing statement
After publication, all major data sheets will be available upon request.
Funding
This study was supported by the Natural Science Foundation of Hebei Province (No. H2022209048) and the National Key Research and Development Program (No. 121000004008239033–2021YFC2301103/05).
Conflict of interest
The authors declare no potential conflicts of interest with respect to the research, authorship, and/or publication of this article. Informed consent was obtained for this publication.
Authors’ contributions
Study conception and design (ZC, XZ, SL), main data analysis, and manuscript draft (ZC, HL, SZ, HY, XZ, SL). All authors contributed to the article and approved the final version.