Study design and data source
We performed a retrospective open cohort study of pregnant women identified from primary care records [Clinical Practice Research Datalink (CPRD) GOLD Pregnancy Register] whose delivery was recorded in secondary care [linked Hospital Episode Statistics (HES)] between 1997 and 2020. The aim was to determine the incidence of adverse obstetric outcomes among women with PCOS in comparison to women without PCOS.
CPRD GOLD contains representative data from 7% of the general practices across the UK, covering 20 million patients from 973 practices. It contains pseudo-anonymized patient-level data on demographics, symptoms, diagnoses, drug prescriptions, physical measurements, and laboratory test results. Furthermore, patient-level data can be linked to other data sources such as HES data and deprivation data, via a trusted third party [23]. Linking these databases allowed us to capture information on exposure (PCOS) from primary care, obstetric outcomes from the HES maternity tail and important potential confounders from both primary and secondary care. Symptoms and diagnoses are recorded within CPRD GOLD using Read codes, a hierarchical clinical coding system. Using maternity, antenatal and delivery health records within CPRD GOLD, pregnancy episodes and their outcomes are identified through a validated algorithm [24]. This algorithm generates the CPRD GOLD Pregnancy Register, which formed the source cohort for our study.
Study population
Pregnant women were included from the CPRD GOLD Pregnancy Register if they were registered at a general practice in England and had a record of delivery from linked HES data (containing information on admissions to National Health Service (NHS) hospitals in England).
Deliveries formed the unit of analysis in our study and an index date was assigned to each eligible delivery record. Women with implausible data linkage (where a patient record in HES is linked to more than 20 patient records across 20 different primary care practices) were excluded. Furthermore, delivery records were excluded if they were (1) duplicates or (2) misclassified miscarriage, postnatal or antenatal records. Delivery records were considered misclassified miscarriages if the reported gestational age was less than 23 weeks. If two deliveries were recorded within 180 days of each other for the same patient, one of the delivery records was considered a misclassified antenatal or postnatal record. Finally, delivery records were excluded if women were ineligible or lost to follow-up within primary care at the time of delivery. Patients were considered ineligible within primary care if they (1) did not have an acceptable patient flag within CPRD GOLD (indicating sufficient data quality), (2) did not have a minimum registration period of 1 year with an eligible general practice on the delivery date (practices were considered eligible one year after the “up-to-standard” date, a flag for sufficient practice data quality) or (3) were aged < 15 or > 49 years on the delivery date.
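The record-level exclusions above can be sketched as a simple filter. This is an illustrative sketch only, not the study's code: the `Delivery` fields and the `filter_deliveries` helper are hypothetical names. The text does not say which record of a pair within 180 days was discarded, so keeping the earlier record is an assumption, as is treating a gap of exactly 180 days as "within 180 days".

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Delivery:
    # Hypothetical record layout, for illustration only.
    patient_id: str
    delivery_date: date
    gestational_age_weeks: Optional[float]  # None when not recorded
    maternal_age_years: int


def filter_deliveries(deliveries, min_gap_days=180):
    """Apply the gestational-age, maternal-age and 180-day rules."""
    kept = []
    last_kept_date = {}  # patient_id -> date of the last retained delivery
    ordered = sorted(deliveries, key=lambda d: (d.patient_id, d.delivery_date))
    for d in ordered:
        # Reported gestational age < 23 weeks: misclassified miscarriage.
        if d.gestational_age_weeks is not None and d.gestational_age_weeks < 23:
            continue
        # Maternal age must lie between 15 and 49 years on the delivery date.
        if not 15 <= d.maternal_age_years <= 49:
            continue
        # Assumption: of two records within 180 days, keep the earlier one.
        previous = last_kept_date.get(d.patient_id)
        if previous is not None and (d.delivery_date - previous).days <= min_gap_days:
            continue
        kept.append(d)
        last_kept_date[d.patient_id] = d.delivery_date
    return kept
```

Duplicate removal, linkage plausibility and the registration-period checks are omitted here because they depend on database fields not described in the text.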
Once linked, the mother’s PCOS exposure status for each delivery record was ascertained from primary care prior to the index date (date of delivery). PCOS was defined as a Read code record of PCOS. Because PCOS is underdiagnosed within primary care, we also considered records of polycystic ovaries (PCOs) [20, 25], or a combination of symptom codes indicating a missed PCOS diagnosis based on Rotterdam criteria [(1) anovulation and (2) biochemical or symptomatic presentation of hyperandrogenism; a Read code record of hair loss or hirsutism and a recorded measure of serum testosterone level ≥ 2.0 nmol/L were considered as symptomatic and biochemical presentation of hyperandrogenism, respectively].
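The exposure definition can be written as a small Boolean rule. The sketch below assumes that the inputs have already been restricted to records dated before the index date; the function and parameter names are hypothetical and are not taken from the study's code.

```python
def has_hyperandrogenism(has_hirsutism_or_hair_loss_code, testosterone_nmol_l):
    """Symptomatic (Read code) or biochemical (testosterone >= 2.0 nmol/L)."""
    biochemical = any(value >= 2.0 for value in testosterone_nmol_l)
    return has_hirsutism_or_hair_loss_code or biochemical


def is_pcos_exposed(has_pcos_code, has_pco_code, has_anovulation_code,
                    has_hirsutism_or_hair_loss_code, testosterone_nmol_l):
    """Coded PCOS, coded PCO, or anovulation combined with hyperandrogenism."""
    symptom_based = has_anovulation_code and has_hyperandrogenism(
        has_hirsutism_or_hair_loss_code, testosterone_nmol_l
    )
    return has_pcos_code or has_pco_code or symptom_based
```

Under this rule, hyperandrogenism without a record of anovulation does not count as exposure, and neither does anovulation alone.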
For each delivery record of women with PCOS (in a random order), we randomly selected four control delivery records of women without PCOS from a pool of age-matched (± 1 year) pregnant women without replacement. Cohort selection for this study is described in Fig. 1.
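A minimal sketch of this 1:4 age-matched sampling without replacement is given below. The data layout (tuples of identifier and age for exposed deliveries, a dictionary of identifier to age for the control pool), the function name and the fixed seed are assumptions made for illustration. The text does not state what happens when fewer than four eligible controls remain; this sketch simply takes as many as are available.

```python
import random


def match_controls(exposed, control_pool, ratio=4, age_tolerance=1, seed=0):
    """Match each exposed delivery to up to `ratio` controls within +/- 1 year.

    exposed: list of (delivery_id, maternal_age) tuples.
    control_pool: dict mapping control delivery_id -> maternal_age.
    """
    rng = random.Random(seed)
    order = list(exposed)
    rng.shuffle(order)  # process exposed deliveries in a random order
    available = dict(control_pool)  # copy so the caller's pool is untouched
    matches = {}
    for delivery_id, age in order:
        candidates = [
            control_id
            for control_id, control_age in available.items()
            if abs(control_age - age) <= age_tolerance
        ]
        chosen = rng.sample(candidates, min(ratio, len(candidates)))
        for control_id in chosen:
            del available[control_id]  # without replacement
        matches[delivery_id] = chosen
    return matches
```

Processing the exposed deliveries in a random order avoids systematically giving earlier records first pick of a limited control pool.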
Outcomes
We considered four primary outcomes identified from HES data: (1) preterm birth, (2) mode of delivery, (3) high or low birthweight and (4) stillbirth.
Gestational age recorded within the HES maternity tail at the time of delivery and relevant ICD-10 codes were used to identify preterm birth (gestational age at birth < 37 weeks). Based on Operating Procedure Codes Supplement (OPCS) codes and ICD-10 codes, we classified mode of delivery as a categorical outcome variable with four categories: (1) emergency caesarean section, (2) elective or other unspecified caesarean section, (3) instrumental vaginal delivery and (4) spontaneous or other unspecified vaginal delivery (reference category). Based on birthweight(s) recorded in the maternity tail, we classified a delivery as a high or low birthweight delivery if at least one of the babies born in that delivery weighed above 4000 g or below 2500 g, respectively. In addition, a record of the relevant ICD-10 code was used to identify a high birthweight baby. Stillbirth outcomes were identified using relevant ICD-10 codes and from maternity tail records.
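The delivery-level birthweight rule can be illustrated as follows. The function name and return format are hypothetical, and the ICD-10 route to identifying high birthweight is not modelled.

```python
def classify_delivery_birthweight(birthweights_g):
    """Flag a delivery from the birthweights (in grams) of all babies born."""
    return {
        # At least one baby above 4000 g.
        "high_birthweight": any(w > 4000 for w in birthweights_g),
        # At least one baby below 2500 g.
        "low_birthweight": any(w < 2500 for w in birthweights_g),
    }
```

Because the rule is applied per delivery, a multiple birth can in principle be flagged as both high and low birthweight, and babies weighing exactly 2500 g or 4000 g fall into neither category.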
As secondary outcomes, we further classified gestational age to identify very preterm (< 32 weeks) and extremely preterm (< 28 weeks) deliveries. Small and large for gestational age babies (birthweight < 10th and > 90th centile, respectively) were identified using the software tools of the INTERGROWTH-21st project [26], by comparing the birthweight and gestational age recorded in HES data with the international anthropometric standards.
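The gestational age and centile thresholds can be sketched as below. The centile itself is assumed to have been computed beforehand (for example with the INTERGROWTH-21st tools); no centile calculation is attempted here. The function names and the "AGA" label for babies between the 10th and 90th centiles are hypothetical.

```python
def preterm_flags(gestational_age_weeks):
    """Nested preterm categories: each stricter flag implies the looser ones."""
    return {
        "preterm": gestational_age_weeks < 37,
        "very_preterm": gestational_age_weeks < 32,
        "extremely_preterm": gestational_age_weeks < 28,
    }


def size_for_gestational_age(birthweight_centile):
    """Classify a baby from a precomputed birthweight-for-gestational-age centile."""
    if birthweight_centile < 10:
        return "SGA"  # small for gestational age
    if birthweight_centile > 90:
        return "LGA"  # large for gestational age
    return "AGA"  # hypothetical label for the remaining babies
```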
Explanatory variables
We considered risk factors or features of PCOS that are also obstetric risk factors as possible explanatory variables and adjusted for them in our analysis in a step-by-step manner. These included age, ethnicity, deprivation, impaired glucose regulation based on a diagnosis of type 2 diabetes or prediabetes, diagnosis of hypertension, thyroid disorders, number of babies born within the delivery, and pre-gravid body mass index (BMI). For the outcomes low and high birthweight and mode of delivery, we further considered gestational age as an explanatory variable.
Ethnicity was identified using relevant Read codes from primary care records and was categorized as (1) white Caucasian, (2) South Asian, (3) black Afro-Caribbean, (4) mixed or multiple ethnic group or (5) other ethnic minority groups. Primary care linked English index of multiple deprivation (IMD) data provided a relative measure of deprivation based on seven different domains [27]. Type 2 diabetes was identified from primary care through relevant Read codes, a record of HbA1c ≥ 48 mmol/mol (≥ 6.5%) or fasting blood glucose > 7 mmol/L. Impaired glucose regulation was identified through relevant Read codes, HbA1c ≥ 42 mmol/mol (≥ 6.0%) or fasting blood glucose ≥ 5.5 mmol/L. Diagnoses of hypertension and thyroid disorders were identified from primary care through Read code records. The number of babies born during that delivery was derived from linked HES maternity tail records. Pre-gravid BMI was identified as the latest BMI measured in primary care at least a year before the index date and was categorized according to WHO standards as under/normal weight (< 25 kg/m2), overweight (25–30 kg/m2) and obese (≥ 30 kg/m2). A separate missing category was created for those with missing data on ethnicity, deprivation, number of babies born within the delivery and pre-gravid BMI.
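The pre-gravid BMI derivation can be sketched as follows. Interpreting "at least a year" as 365 days is an assumption, as are the function name and the input layout (a list of date and BMI pairs). Since obesity is defined as ≥ 30 kg/m2, the overweight band is read here as 25 to just under 30.

```python
from datetime import date, timedelta


def pregravid_bmi_category(measurements, index_date):
    """Categorize the latest BMI recorded at least a year before the index date.

    measurements: list of (measurement_date, bmi_kg_m2) tuples.
    """
    cutoff = index_date - timedelta(days=365)  # assumption: one year = 365 days
    eligible = [(d, bmi) for d, bmi in measurements if d <= cutoff]
    if not eligible:
        return "missing"  # separate missing category
    _, bmi = max(eligible, key=lambda m: m[0])  # most recent eligible record
    if bmi < 25:
        return "under/normal weight"
    if bmi < 30:
        return "overweight"
    return "obese"
```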
Statistical analysis
Deliveries were the unit of our analysis. Baseline explanatory variables were described using appropriate summary statistics stratified by exposure to maternal PCOS. Mean with standard deviation (SD) and median with interquartile range (IQR) were provided for continuous variables as appropriate. Frequency and percentage were provided for categorical variables.
Multiple imputation using chained equations was performed to impute missing delivery-related data that were essential to compute outcome variables [28,29,30]. Missing values were imputed 31 times (since gestational age was missing among 31% of the women in the study) using linear (for gestational age and birthweight outcomes), logistic (for stillbirth outcome and sex of the baby) and multinomial logistic (for the categorical delivery method outcome) regression as appropriate, with age, BMI, impaired glucose regulation, deprivation and the number of babies delivered as predictors. Conditional logistic or multinomial logistic regression models were used to provide unadjusted and adjusted odds ratios (ORs) for the binary and nominal categorical (mode of delivery) outcome variables, respectively, among women with PCOS compared to women without PCOS. We estimated robust confidence intervals that account for the intragroup correlation of multiple deliveries by the same woman throughout her reproductive age. We included the explanatory variables in a step-by-step manner in the regression model, resulting in a fully adjusted model.
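The text does not describe how estimates were combined across the imputed datasets. Rubin's rules are the conventional approach, and the sketch below assumes them; it is not taken from the study. The function names are hypothetical, and the normal-approximation interval (z = 1.96) is a simplification of the t-based interval normally used with Rubin's rules.

```python
import math


def pool_rubin(estimates, variances):
    """Pool per-imputation estimates (e.g. log odds ratios) with Rubin's rules."""
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need matching estimates and variances from >= 2 imputations")
    pooled = sum(estimates) / m
    within = sum(variances) / m  # average within-imputation variance
    between = sum((q - pooled) ** 2 for q in estimates) / (m - 1)
    total = within + (1 + 1 / m) * between  # total variance
    return pooled, total


def pooled_odds_ratio(log_ors, variances, z=1.96):
    """Return (OR, lower, upper) on the odds ratio scale."""
    pooled, total = pool_rubin(log_ors, variances)
    se = math.sqrt(total)
    return math.exp(pooled), math.exp(pooled - z * se), math.exp(pooled + z * se)
```

With 31 imputations, `log_ors` and `variances` would each hold 31 values, one per imputed dataset.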
A sensitivity analysis was performed restricting to women with a coded diagnosis of PCOS only and their corresponding matched controls. All analyses were performed in Stata IC version 15. Two-sided P values were obtained for all tests, and a P value < 0.05 was considered statistically significant. Read, ICD-10 and OPCS code lists were selected using an in-house software platform called Code Builder, with systematic searching of existing code lists, and through clinical knowledge and discussion, following methods used in our previous publications [31]. The lists of codes used for exposure and outcome ascertainment are provided in Additional files 1 and 2. The study results are reported as per the RECORD (REporting of studies Conducted using Observational Routinely-collected health Data) statement.