Introduction
Human papillomavirus (HPV) infection is the primary cause of cervical, anogenital, and oropharyngeal cancers in the United States, which account for over half a million new cases globally each year.1 These cancers can be prevented through HPV vaccination in adolescents. Vaccination is recommended before the age of 12; however, uptake of the HPV vaccine remains well below the national Healthy People 2030 target, which aims for 80% of adolescents to be up to date with HPV vaccination.2,3 In Oregon, only 55% of adolescents completed the HPV vaccine series in 2020.4 The recommended age for HPV vaccination has recently dropped to age 9, which allows providers to offer the vaccine alongside routine vaccinations. To further increase HPV vaccination rates among high-need groups, research is critically needed to identify new interventions that can promote vaccination.
Multifactorial barriers hinder vaccination completion.5 Some adolescents have low awareness of the benefits of the vaccine, as well as misconceptions about its potential harms.6 Some parents express concerns that the HPV vaccine might promote sexual promiscuity in their daughters.7 Few providers use evidence-based messaging strategies for HPV vaccination, such as employing strong presumptive language, bundling the HPV vaccine with other routine vaccinations, and emphasizing cancer prevention.8,9 Studies show that the effectiveness of messaging can be enhanced by multilevel vaccination interventions that include provider and support staff education, parent/caregiver and patient education materials, and vaccination reminders.10,11 These interventions have been shown to reduce missed opportunities for vaccination.11 However, such interventions can be costly, and many patients will successfully vaccinate without them. Targeting interventions to those most likely to benefit can improve HPV vaccination rates at a lower cost to the healthcare system.
A recent project at Kaiser Permanente Northwest (KPNW) found a strong concordance between HPV and COVID-19 vaccination completion. Adolescents who had initiated the HPV vaccine were nearly five times more likely to have received the COVID-19 vaccine than those who had not (odds ratio = 4.87, 95% confidence interval (CI) = 4.52, 5.03).12 Focused efforts to increase vaccination rates could boost both HPV and COVID-19 vaccination uptake.
Few prior interventions to improve HPV vaccination rates have tailored the intensity of their programs to the adolescent’s or young adult’s current vaccination status, HPV status (due for the initial vaccine or due for series completion), and acceptance of vaccination. However, several studies have evaluated factors that predict an individual’s HPV vaccination status.13 These studies have generally focused on specific patient populations or used different theoretical models as the basis of their frameworks. Prior studies have found that factors such as age, sex, and rural/urban residency are associated with both HPV and COVID-19 vaccination.12
Predictive analytics enables health systems to maximize the utility of healthcare resources through precision delivery of care in our resource-constrained healthcare environment.14 Understanding which patients will benefit most from outreach can help healthcare systems prioritize resources and interventions for those most likely to benefit. Messaging and interventions could be tailored and delivered at multiple levels, including by providers, to parents, or directly to adolescents.
To the best of our knowledge, no prior studies have developed a risk prediction model in which multiple electronic health record factors are considered simultaneously to predict HPV vaccination for clinical use. Our risk prediction model uses a multidimensional approach to predict vaccination uptake.
We assess vaccination status, along with patient, provider, and clinic characteristics that predict vaccination completion. We then develop a predictive model to estimate the likelihood of HPV vaccination completion in individual patients. This innovative model can be used to guide the intensity of interventions based on the likelihood of vaccination.
Materials and methods
We assessed the landscape of vaccination status and the patient, provider, and clinic characteristics that predict vaccination completion. We then developed a risk prediction model at KPNW to stratify patients by their likelihood of HPV vaccination completion, with the goal of optimizing the management of outreach to KPNW patients.
This retrospective, data-only study was conducted at KPNW, an integrated health system in Oregon and southwest Washington that serves over 600,000 members. The primary outcome of interest was HPV vaccination. The study met the Kaiser Permanente Northwest guidelines for the protection of human subjects concerning safety and privacy (KPNW IRB 2019240-1). All study procedures were reviewed and approved by the KPNW Institutional Review Board; informed consent was waived as this was a data-only study.
We first aimed to identify patients who had completed vaccination on time and patients who were due, overdue, or had not yet completed the HPV vaccination series. Patients were eligible if they were members of KPNW, were aged 11–17 years between January 2015 and January 2022, and had at least one year of follow-up. (This risk prediction model was designed prior to the updated recommendation for vaccination to start at age 9; the vaccination practices assessed adhered to the previously recommended age.) Patients were excluded if they were on the “do not contact” list, had a history of adverse events to vaccinations, or were pregnant. Data were retrieved from KPNW data sources, including the Virtual Data Warehouse, which includes provider, clinic, and patient characteristics, as well as community data (census).
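The inclusion and exclusion criteria above can be read as a simple cohort filter. The following is a minimal sketch in Python, not the study's actual extraction code (the analysis used SAS and R); the `Member` class and all of its field names are hypothetical and do not correspond to Virtual Data Warehouse columns.

```python
from dataclasses import dataclass


@dataclass
class Member:
    # All field names are hypothetical, chosen only for illustration.
    age_at_index: int
    followup_years: float
    do_not_contact: bool = False
    prior_vaccine_adverse_event: bool = False
    pregnant: bool = False


def is_eligible(m: Member) -> bool:
    """Apply the inclusion and exclusion criteria described in the text."""
    in_age_range = 11 <= m.age_at_index <= 17
    enough_followup = m.followup_years >= 1.0
    excluded = m.do_not_contact or m.prior_vaccine_adverse_event or m.pregnant
    return in_age_range and enough_followup and not excluded


# Made-up members, not study data.
members = [
    Member(age_at_index=11, followup_years=3.0),
    Member(age_at_index=12, followup_years=0.5),   # too little follow-up
    Member(age_at_index=16, followup_years=2.0, do_not_contact=True),
    Member(age_at_index=18, followup_years=4.0),   # outside the age range
]
cohort = [m for m in members if is_eligible(m)]
print(len(cohort))  # 1
```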
The outcome of interest was a single dose of HPV vaccination, although series completion and age at vaccination initiation and completion were also assessed. Outcomes were assessed over one to seven years, depending on the patient’s age at index and length of membership. Patients were censored at disenrollment or death.
The risk prediction model was developed to identify patients’ likelihood of completing HPV vaccination. Model components were identified based on the literature and our prior research. The model included predictors that were easily accessible through the KPNW databases, including the Virtual Data Warehouse and Census data. The use of common data sources allows for the transferability of results, or recalibration, in other Kaiser Permanente (KP) populations. Predictors were drawn from clinical encounters closest to the index date. The initial list of predictors included patient characteristics, such as demographics (rurality, sex, gender, language, insurance status, body mass index) and utilization (healthcare visits, membership length, and prior vaccination status, including hepatitis B; diphtheria, tetanus, and pertussis; measles, mumps, and rubella; inactivated poliovirus; meningococcal; COVID-19; and influenza vaccines). Predictors also included provider characteristics (having a primary care provider, provider demographics, time in service, provider specialty [Pediatrics vs. Family Practice]). Finally, predictors included clinic characteristics (clinic assignment, clinic size, location [South, Metro, North]) and community-level data (linked to the patient, e.g., travel time to clinic,15 rurality, population density, median household income).
Data used in the model were assessed for the risk of bias and applicability to the research question. To reduce the risk of bias, the Prediction Risk of Bias Assessment Tool (PROBAST) was used, assessing the 20 PROBAST principles.16 All participants who met minimal membership requirements were included. Observed vaccination rates for different groups identified in PROBAST were evaluated for differences (in % vaccinated) from the overall eligible population. The sources of data were assessed for risk of bias and relevance to the research question. The final predictors were evaluated for missingness across all eligible patients in the model, and vaccination completion was determined prior to analysis. For the clinic and clinician clusters, all data were included; if the PCP or clinic data were missing, those patients were retained in the analysis. Imputation was unnecessary as no variables were excluded due to missingness.
The risk model was developed using Cox regression to identify the likelihood of initiating the HPV vaccination series. Model performance was assessed using model statistics; explained variation was measured with an R² statistic, and calibration was assessed using the Integrated Calibration Index (ICI), which assesses the difference between the model’s calibration and perfect calibration.17 First, a full model of patients with complete data was fit. Then, Harrell’s methods guided a step-down approach to manually remove the least predictive characteristics, one covariate at a time, ensuring that the final model retained at least 95% of the variation explained by the full model.18,19
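The step-down logic described above can be sketched as a generic greedy backward elimination. This is an illustration only and is not Harrell’s exact procedure or the SAS code used in the study: the `r2_of` callable is a hypothetical stand-in for refitting the model on a subset of predictors, and the toy R² contributions are made up.

```python
from typing import Callable, FrozenSet, List, Tuple


def step_down(
    predictors: List[str],
    r2_of: Callable[[FrozenSet[str]], float],
    retain: float = 0.95,
) -> Tuple[List[str], List[str]]:
    """Greedy backward elimination.

    At each step, drop the predictor whose removal costs the least R2, and
    stop before the reduced model would explain less than `retain` times the
    R2 of the full model. Returns (kept, removed_in_order).
    """
    kept = list(predictors)
    removed: List[str] = []
    floor = retain * r2_of(frozenset(kept))
    while len(kept) > 1:
        # R2 of each candidate model with one more predictor removed.
        candidates = [(r2_of(frozenset(kept) - {p}), p) for p in kept]
        best_r2, weakest = max(candidates)
        if best_r2 < floor:
            break
        kept.remove(weakest)
        removed.append(weakest)
    return kept, removed


# Toy, made-up additive contributions to R2 (illustration only, not results).
toy_contrib = {
    "age": 0.10,
    "prior_vaccines": 0.06,
    "race": 0.03,
    "provider_sex": 0.004,
    "population_density": 0.003,
}


def toy_r2(subset: FrozenSet[str]) -> float:
    """Hypothetical stand-in for refitting the model on `subset`."""
    return sum(toy_contrib[p] for p in subset)


kept, removed = step_down(list(toy_contrib), toy_r2)
print(kept)     # ['age', 'prior_vaccines', 'race']
print(removed)  # ['population_density', 'provider_sex']
```

In this toy example the two weakest predictors are dropped, and removing a third would push the explained variation below 95% of the full model, so the elimination stops.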
All analyses were conducted by the study analyst using SAS Analytics Software Version 9.4 (bootstrapping, C-statistic, and R²) and R (ICI). No subgroup or sensitivity analyses were performed. The continuous predictors in the risk prediction model were modeled as linear, and no interaction effects were considered. The encounter count variable was capped at 3, and the count of other vaccinations was capped at 5. For variable selection for the reduced model, Harrell’s step-down method was used to retain predictors explaining 95% of the variation. This variable selection was repeated in each bootstrap. Region of care was included in the model to account for heterogeneity across clusters of predictors.
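The capping of count predictors mentioned above amounts to truncating each count at its stated maximum before modeling. A minimal sketch, assuming non-negative integer counts; the helper name and the example values are hypothetical.

```python
def cap(value: int, upper: int) -> int:
    """Truncate a non-negative count at `upper`."""
    return min(value, upper)


ENCOUNTER_CAP = 3       # cap stated in the text
OTHER_VACCINE_CAP = 5   # cap stated in the text

# Hypothetical raw counts for three patients: (encounters, other vaccinations)
raw = [(0, 2), (3, 5), (12, 9)]
capped = [(cap(e, ENCOUNTER_CAP), cap(v, OTHER_VACCINE_CAP)) for e, v in raw]
print(capped)  # [(0, 2), (3, 5), (3, 5)]
```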
Results
Members of KPNW were identified from 2015–2022 as being aged 11–17 years old (n = 119,494, Fig. 1). Members were included in the model if they had at least one year of follow-up, had no history of pregnancy or adverse vaccination events, and were not on the “do not contact” list. The final cohort included 61,788 patients, 65.7% of whom had received at least one dose of the HPV vaccine.
Patients were tracked from the time of their membership. Younger patients were more likely to complete the vaccination series (Table S1) compared to those whose index age was older (54.9% for 11–13-year-olds, compared to 12.4% for 14–15-year-olds, and 2.1% for 16–17-year-olds). Overall, 44.3% of the patients had completed the vaccination series.
Patient characteristics by vaccination status are shown in Table 1. Patients who had received the HPV vaccination were more likely to be younger, Hispanic, and female. White, Asian, and Hawaiian/Pacific Islander patients had higher vaccination rates. Only 44.3% of the patients in the model completed the vaccination series, while 21.3% had received at least one vaccination but had not completed the series (Table S1). Those who completed the vaccination were more likely to be younger at their index date (54.9% of 11–13-year-olds). Patients in the older age groups at index were more likely to have not received any vaccinations (47.8%, 53.7%, and 29.4% for 16–17-, 14–15-, and 11–13-year-olds, respectively).
Table 1. Patient characteristics
Individual characteristic | Without human papillomavirus vaccination (N = 21,218): n | (%) | With human papillomavirus vaccination (N = 40,570): n | (%) |
---|---|---|---|---|
Gender | | | | |
Female | 9,831 | (46.3) | 19,054 | (47.0) |
Male | 11,264 | (53.1) | 21,022 | (51.8) |
Age at index in years | | | | |
11–12 | 12,067 | (56.9) | 35,340 | (87.1) |
13–15 | 6,427 | (30.3) | 4,345 | (10.7) |
16–17 | 2,724 | (12.8) | 885 | (2.2) |
Ethnicity (Hispanic) | 1,722 | (8.1) | 6,182 | (15.2) |
Insurance | | | | |
Medicaid | 3,574 | (16.8) | 6,334 | (15.6) |
Race | | | | |
White | 13,045 | (61.5) | 25,528 | (62.9) |
Asian | 856 | (4.0) | 3,770 | (9.3) |
Black | 767 | (3.6) | 2,152 | (5.3) |
Hawaiian/Pacific Islander | 207 | (1.0) | 611 | (1.5) |
American Indian | 165 | (0.8) | 404 | (1.0) |
Other | 210 | (1.0) | 432 | (1.1) |
Unknown | 5,968 | (28.1) | 7,673 | (18.9) |
The variables in the full risk prediction model included 17 individual characteristics (Table S2). Multilevel predictors included demographics (age, language, race, ethnicity, insurance, sex, gender, membership length), clinical characteristics (other vaccinations, number of visits), provider characteristics (provider sex, provider classification), and community characteristics (region, rurality, time to reach provider, community income rank, population density).
The full model included 98.91% of the population (n = 61,115); 673 patients had missing address data (population density, median household income, travel time to provider) and were removed from the analysis (Table S3). Patients missing Rural-Urban Commuting Area codes (3.64%) were classified as urban, and the 5.99% of patients with missing language data were classified as “unknown”. The vaccination outcome occurred in 65.7% of the population (n = 40,570, Fig. 1). The model was reduced using a step-down process to retain only the variables with predictive value. The least contributory variables were removed one at a time, stopping before the reduced model retained less than 0.95 of the full model’s R² (96%). The five retained characteristics were Hispanic ethnicity, race, language, age at index, and prior vaccinations (Table 2). The final model indicates that patients who were Hispanic, younger, Asian, had non-missing language data, and had prior vaccinations were most likely to obtain an HPV vaccination. The performance of the full model was adequate, with a naive C-statistic of 0.667 and an R² of 0.208 (95% CI 0.202, 0.214) (Table 3).
Table 2. Final risk prediction model: results of the reduced multivariate Cox regression analysis of predictors of human papillomavirus vaccination
Characteristic | Reduced model (standard EHR data): HR | (95% CI) |
---|---|---|
Age | | |
11–12 | ref | |
13–15 | 0.44 | (0.421, 0.449) |
16–17 | 0.35 | (0.325, 0.372) |
Language | | |
English | ref | |
Non-English | 1.06 | (1.019, 1.111) |
Unknown | 0.58 | (0.537, 0.621) |
Race | | |
White | 0.70 | (0.672, 0.72) |
Asian | ref | |
Black | 0.84 | (0.799, 0.889) |
Hawaiian/Pacific Islander | 0.86 | (0.79, 0.938) |
American Indian/Alaska Native | 0.75 | (0.672, 0.826) |
Other | 0.62 | (0.557, 0.683) |
Unknown | 0.61 | (0.587, 0.643) |
Ethnicity | | |
Hispanic | ref | |
Non-Hispanic | 0.69 | (0.663, 0.714) |
Other vaccinations (continuous, 0–5) | 1.61 | (1.585, 1.642) |
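To illustrate how the hazard ratios in Table 2 combine for an individual patient, the sketch below multiplies them on the log scale, as a Cox model without interaction terms does. The hazard ratios are copied from Table 2; the function name is hypothetical, and treating the "other vaccinations" hazard ratio as applying per additional vaccination is an assumption, since the table does not state the unit. The result is a hazard relative to the all-reference patient, not an absolute probability, which would also require the baseline hazard.

```python
import math

# Hazard ratios copied from Table 2 (reference levels have HR = 1.0).
HR = {
    "age": {"11-12": 1.0, "13-15": 0.44, "16-17": 0.35},
    "language": {"English": 1.0, "Non-English": 1.06, "Unknown": 0.58},
    "race": {
        "Asian": 1.0,
        "White": 0.70,
        "Black": 0.84,
        "Hawaiian/Pacific Islander": 0.86,
        "American Indian/Alaska Native": 0.75,
        "Other": 0.62,
        "Unknown": 0.61,
    },
    "ethnicity": {"Hispanic": 1.0, "Non-Hispanic": 0.69},
}
# Assumption: the HR applies per additional vaccination (count capped at 5).
HR_PER_OTHER_VACCINE = 1.61


def relative_hazard(age, language, race, ethnicity, other_vaccines):
    """Hazard relative to the all-reference patient with 0 other vaccinations."""
    n = min(max(other_vaccines, 0), 5)  # the count was capped at 5 in the model
    log_hr = (
        math.log(HR["age"][age])
        + math.log(HR["language"][language])
        + math.log(HR["race"][race])
        + math.log(HR["ethnicity"][ethnicity])
        + n * math.log(HR_PER_OTHER_VACCINE)
    )
    return math.exp(log_hr)


reference = relative_hazard("11-12", "English", "Asian", "Hispanic", 0)
example = relative_hazard("13-15", "English", "White", "Non-Hispanic", 2)
print(round(reference, 3), round(example, 3))
```

Under these assumptions, the example patient (aged 13–15, English-speaking, White, non-Hispanic, two other vaccinations) has a relative hazard of roughly 0.55 compared with the reference patient.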
Table 3. Risk prediction model characteristics
Statistic | Full model | Reduced model |
---|---|---|
Number of observations | 61,115 | 61,115 |
C-statistic | 0.667 | 0.653 |
Bootstrap-corrected C-statistic | 0.666 | 0.653 |
R² (95% CI) | 0.208 (0.202, 0.214) | 0.194 (0.189, 0.200) |
Integrated calibration index (ICI) | 0.53 | 0.525 |
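As background for the C-statistics in Table 3, the following sketch computes Harrell’s concordance index for right-censored data on a small made-up dataset. It is a simplified illustration (pairs with tied event times are ignored) rather than the SAS routine used in the study. Here the "event" is vaccination, and a higher score means the model expects earlier vaccination.

```python
from typing import Sequence


def harrell_c(
    times: Sequence[float], events: Sequence[int], score: Sequence[float]
) -> float:
    """Harrell's concordance index for right-censored data.

    A pair (i, j) is usable when the subject with the shorter time had the
    event. It is concordant when that subject also has the higher score;
    tied scores count as half.
    """
    concordant = 0.0
    usable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            if events[i] == 1 and times[i] < times[j]:
                usable += 1
                if score[i] > score[j]:
                    concordant += 1.0
                elif score[i] == score[j]:
                    concordant += 0.5
    if usable == 0:
        raise ValueError("no usable pairs")
    return concordant / usable


# Made-up follow-up times (years), event flags (1 = vaccinated, 0 = censored),
# and model scores for five patients.
times = [1, 2, 3, 4, 5]
events = [1, 1, 0, 1, 0]
score = [0.9, 0.4, 0.6, 0.5, 0.1]
c_stat = harrell_c(times, events, score)
print(c_stat)  # 0.75
```

A value of 0.5 corresponds to chance-level ranking and 1.0 to perfect ranking of who vaccinates sooner.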
The performance measures used for evaluation were the bootstrapped C-statistic and R², the ICI, and a calibration plot. The model was validated internally using a bootstrapping approach (500 bootstraps). Bootstrapping is an appropriate strategy to determine concordance and predict the fit of a model to a series of hypothetical datasets when other validation techniques are not available.20 The model showed adequate performance, with a bootstrap-corrected C-statistic of 0.653 (Table 3). Calibration was assessed visually by plotting the observed and predicted risks of the reduced model by quintiles of predicted risk (Fig. 2). Calibration was further assessed by calculating the ICI; if the observed and predicted values agreed perfectly, the ICI would be 0.0. The ICI showed inadequate calibration (0.53). However, the calibration plot showed adequate calibration for the top quintiles, and the visual calibration was sufficient overall, with close alignment between observed and predicted risk at all levels.
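The quintile-based calibration check described above can be sketched as follows. This is an illustration on made-up data: it groups patients by predicted risk and compares the mean predicted risk with the observed proportion in each group. The summary gap computed here is a simple binned measure and is not the ICI, which is based on a smoothed calibration curve; the function names are hypothetical.

```python
from statistics import mean
from typing import List, Sequence, Tuple


def calibration_by_quantile(
    predicted: Sequence[float], observed: Sequence[int], groups: int = 5
) -> List[Tuple[float, float]]:
    """Return (mean predicted risk, observed proportion) per risk group."""
    order = sorted(range(len(predicted)), key=lambda i: predicted[i])
    n = len(order)
    rows = []
    for g in range(groups):
        idx = order[g * n // groups:(g + 1) * n // groups]
        if idx:
            rows.append(
                (mean(predicted[i] for i in idx), mean(observed[i] for i in idx))
            )
    return rows


def mean_abs_gap(rows: List[Tuple[float, float]]) -> float:
    """Average absolute difference between predicted and observed, by group."""
    return mean(abs(p - o) for p, o in rows)


# Made-up predicted likelihoods and outcomes (1 = vaccinated) for 10 patients.
predicted = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 0.95]
observed = [0, 0, 0, 1, 1, 0, 1, 1, 1, 1]
rows = calibration_by_quantile(predicted, observed)
for p, o in rows:
    print(f"predicted {p:.3f}  observed {o:.3f}")
print(f"mean absolute gap {mean_abs_gap(rows):.3f}")
```

Plotting the two columns against each other gives the kind of calibration plot referred to in the text; points on the diagonal indicate agreement between observed and predicted risk.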
Discussion
Predictive analytics can be used to identify patients within a health system who may benefit from interventions of varying intensity based on predicted risk. Predictive modeling can identify patients with differing likelihoods of vaccinating on their own. It has been successfully used at KPNW to identify individuals who overuse emergency room services or may benefit from early therapeutic interventions.21 Such models guide the precise delivery of services, improving patient care while reducing the burden on the health system. Further, projects that work directly with clinicians and the health system allow for an assessment of the net benefit of using a risk prediction model to identify patients who may not be harmed by less intensive surveillance.22
This model could be used to target an HPV vaccination intervention based on predicted risk: more intensive interventions could be provided to patients in the lowest two quintiles (35.7% and 62.3% likelihood of vaccinating, respectively), who have the lowest likelihood of completing vaccination on their own. Patients in the top three quintiles, who show a greater than 80% likelihood of vaccination, could be removed from intervention lists entirely, freeing up valuable resources for patients less likely to complete vaccinations. These targeted outreach efforts can benefit both the patient and the health system.
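The targeting rule described above can be sketched as a mapping from predicted likelihood to an outreach tier. This is a minimal illustration: the scores are made up, the function names and tier labels are hypothetical, and the only rule taken from the text is that the bottom two quintiles receive more intensive intervention while the top three are left off intervention lists.

```python
from typing import Sequence


def quintile_of(score: float, all_scores: Sequence[float]) -> int:
    """Quintile by rank (1 = lowest predicted likelihood, 5 = highest)."""
    n = len(all_scores)
    rank = sum(1 for s in all_scores if s <= score)  # scores at or below `score`
    quintile = (rank * 5 + n - 1) // n               # integer ceil(rank * 5 / n)
    return min(5, max(1, quintile))


def outreach_tier(quintile: int) -> str:
    # Bottom two quintiles: more intensive intervention; top three: none added.
    # The tier labels are hypothetical.
    return "intensive outreach" if quintile <= 2 else "no added outreach"


# Made-up predicted likelihoods of vaccination for 10 patients.
scores = [0.20, 0.35, 0.50, 0.62, 0.70, 0.81, 0.85, 0.88, 0.92, 0.97]
tiers = [outreach_tier(quintile_of(s, scores)) for s in scores]
for s, t in zip(scores, tiers):
    print(f"{s:.2f} -> {t}")
```

In practice the quintile cut points would be fixed from the development cohort rather than recomputed for each outreach list, so that a patient's tier does not depend on who else is on the list.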
Existing approaches to HPV vaccination outreach vary across health systems. Recommendations to increase HPV vaccinations include provider and team conversations with caregivers or patients, bundling vaccinations, developing registries, and tracking vaccination rates.23 One approach includes simply identifying patients who have not yet completed the vaccination series. According to KPNW pediatricians, outreach efforts do not include patient stratification by calculated risk or consideration of sociodemographic factors. This project developed a tool to conduct patient stratification for targeted outreach. This is the first model to identify a patient’s likelihood of completing the HPV vaccination.
Stratified medicine can be used to tailor outreach to patient needs and eliminate unnecessary contacts for patients who are likely to vaccinate on their own. Developing personalized care increases patient satisfaction and improves important patient outcomes, such as vaccination completion. The role of predictive or prognostic modeling could inform tailored outreach efforts. Tailored outreach techniques could also be applied to other recommended vaccines, such as COVID-19 and flu vaccinations, or even well-child visits.
The final model includes ethnicity, race, language, age, and prior vaccination history, while community and provider characteristics were not retained. These characteristics collectively identify a patient’s likelihood of vaccination and could also be used to target interventions aimed at closing vaccination gaps.
The likelihood of vaccination is higher at younger ages. If vaccination is not completed at earlier visits, parents and patients are more likely to skip the HPV vaccination or not complete the vaccination series.24 Combining intervention techniques by targeting interventions as early as age 9 for those at risk of not vaccinating will benefit patients and increase vaccination completion.
There are strengths and limitations to this study. First, while this model was created in a large integrated delivery system with highly curated data and access to patient-level and census data, generalizability may be limited in other settings. While data that are available in other regions were intentionally used, there may be unknown applicability issues. The KPNW population is predominantly White, limiting the generalizability to more diverse populations.
Conclusions
HPV vaccination will reduce the cancer burden, but only if it is administered to adolescents on schedule. Risk prediction models can be used to identify the likelihood of vaccination, guiding the implementation of interventions and determining intervention intensity. To improve HPV vaccination rates, risk prediction models could be used to identify patients who should receive evidence-based interventions, such as provider conversations, education, reminders, and scheduling, to increase vaccination.
Supporting information
Supplementary material for this article is available at https://doi.org/10.14218/CSP.2024.00026.
Table S1
Human papillomavirus vaccination completion status.
(DOCX)
Table S2
Results of the full model multivariate Cox regression analysis of predictors of human papillomavirus vaccination.
(DOCX)
Table S3
All variables used in the full model.
(DOCX)
Declarations
Ethical statement
The study met the Kaiser Permanente Northwest guidelines for the protection of human subjects concerning safety and privacy (KPNW IRB 2019240-1). All study procedures were reviewed and approved by the Kaiser Permanente Northwest Institutional Review Board; informed consent was waived as this was a data-only study.
Data sharing statement
The deidentified data used in support of the findings of this study are available from the corresponding author upon request.
Funding
This project was fully supported by Kaiser Permanente’s Garfield Memorial Fund.
Conflict of interest
From June 2022 to May 2024, Dr. Coronado served as the Principal Investigator on an industry-funded study, funded through a contract with the Kaiser Permanente Center for Health Research, to evaluate patient adherence to a commercially available blood test for colorectal cancer. From 2021 to 2022, Dr. Coronado served as a Scientific Advisor for Exact Sciences. All other authors declare no conflicts of interest.
Authors’ contributions
Conception, project design, table design, analysis plan, lead of the writing for the manuscript (AP), expertise in risk prediction, design guidance (EJ), health system perspectives, design guidance (RM), analysis conduction (MS), writing assistance of the background and discussion (MN), guidance of the conception and design (GC). All authors contributed to the writing of the manuscript.