The target population consisted of all females between 40 and 80 years of age who were alive and registered in the GPRD on June 1, 2002. The percentage with symptoms and the time between first relevant symptom and diagnosis code were assessed according to these categories. The code lists used in this paper, including a list used to identify patients who had previously been diagnosed with, or treated for, another type of cancer, are provided in the additional material (additional file). The incidence of ovarian cancer in each year of the study was calculated by dividing the number of first-time definite diagnosis codes for ovarian cancer by the corresponding person-years for that year for the total study population.
From this population, all women with an incident diagnosis of ovarian cancer recorded during June 1, 2002 – May 31, 2007 were identified. Codes for relevant investigations and referrals for ovarian cancer were categorised into 5 groups. The rates were stratified by 5-year age bands and were compared with the "Registrations of cancer diagnosed in 2004, England" and "Registrations of cancer diagnosed in 2005, England" as reported by the Office for National Statistics (ONS).
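As an illustration of the rate calculation described in this section, the following minimal Python sketch divides counts of first-time definite diagnoses by person-years within 5-year age bands. The function names, the per-100,000 scaling and the toy numbers are assumptions made for illustration only; they are not taken from the study.

```python
from collections import defaultdict


def age_band(age, width=5):
    """Return the lower bound of the age band containing `age` (e.g. 43 -> 40)."""
    return (age // width) * width


def incidence_by_age_band(case_ages, person_years, per=100_000):
    """Incidence per `per` person-years, stratified by age band.

    case_ages:    ages at first definite diagnosis, one entry per case.
    person_years: dict mapping band lower bound -> person-years at risk.
    """
    counts = defaultdict(int)
    for age in case_ages:
        counts[age_band(age)] += 1
    # Bands with no cases get a rate of 0; bands with no person-years are skipped.
    return {
        band: per * counts[band] / py
        for band, py in person_years.items()
        if py > 0
    }


# Hypothetical toy numbers, not study data.
rates = incidence_by_age_band(
    case_ages=[41, 44, 52],
    person_years={40: 100_000.0, 45: 50_000.0, 50: 25_000.0},
)
print(rates)
```

Stratified rates of this form can then be set beside published registry rates for the same age bands.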
In recognition of this fact, the main focus of the work reported in this paper is to explore alternative diagnostic dating methods for ovarian cancer using a number of working definitions to develop a generalisable strategy for analysis. The code list for the four categories listed in the section on sensitivity analysis below, and for the category 'cancer from other sites', was created by merging the clinical and referral records for the cases in the defined time periods with a comprehensive list of all cancer codes.
The GPRD dataset was provided under the MRC licence scheme and access to the dataset was approved by the Independent Scientific Advisory Committee (Protocol 07_069). The descriptions for the merged events were then inspected by the authors and assigned to the appropriate category.
Read codes: B440.00 (Malignant neoplasm of ovary), B440.11 (Cancer of ovary) or B44..00 (Malignant neoplasm of ovary and other uterine adnexa); or OXMIS codes: 1830A (Malignant neoplasm ovary), 1830AD (adenocarcinoma ovary), 1830C (Carcinoma ovary), 1830MC (mucinous cystadenocarcinoma ovary).
Category 3. Cancer treatment or referral code: All codes indicating a prior cancer diagnosis, e.g. "cancer care review", "chemotherapy", referral to oncologist.
Category 4. Investigation or referral code for suspected ovarian cancer: This category included codes for a relevant investigation (e.g. ultrasound scan, CA125 test) or diagnostic procedure. This definition was included in order to identify when the GP was first recorded as taking action to investigate the ovarian cancer.
Read codes (which have superseded the OXMIS codes) were specifically developed for use in UK primary care by Dr James Read during the 1980s and are used to record all medical events in clinical practice. Four index dates based on these categories were constructed for each case, in order of increasing inclusivity, beyond the first Read code indicating a definite diagnosis of ovarian cancer.
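One way to read the construction of the four index dates is sketched below in Python. It assumes that each coded event has already been assigned to a category numbered 1 (definite diagnosis) to 4 (most inclusive), and that index date k is the earliest event in any of categories 1 to k. The function name, the data layout and the example dates are hypothetical and are not the authors' implementation.

```python
from datetime import date


def index_dates(events):
    """Return {k: earliest date among events in categories 1..k} for k = 1..4.

    events: list of (event_date, category) tuples with category in 1..4.
    A value of None means no event fell in categories 1..k.
    """
    result = {}
    for k in range(1, 5):
        dates = [d for d, cat in events if cat <= k]
        result[k] = min(dates) if dates else None
    return result


# Hypothetical events for a single case, not study data.
events = [
    (date(2005, 3, 10), 1),  # definite diagnosis code
    (date(2005, 2, 20), 2),  # ambiguous, closely related code
    (date(2005, 2, 1), 4),   # first relevant test or referral
]
idx = index_dates(events)
print(idx)
```

Because each definition includes all events from the narrower ones, the index date can only stay the same or move earlier as the definition becomes more inclusive.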
Effects of possible inaccuracies in dating of diagnosis on the frequencies and timing of the most commonly reported symptoms were investigated using four increasingly inclusive definitions of first diagnosis/suspicion. The most commonly coded symptoms before a definite diagnosis of ovarian cancer were abdominal pain (41%), urogenital problems (25%), abdominal distension (24%) and constipation/change in bowel habits (23%), with 70% of cases reporting at least one of these.
The median time between first reporting each of these symptoms and diagnosis was 13, 21, 9.5 and 8.5 weeks respectively.
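The median interval reported above can be computed along the following lines. This is a minimal Python sketch assuming one (first symptom date, diagnosis date) pair per case; the function names and example dates are illustrative assumptions, and pairs in which the symptom is recorded after the diagnosis are simply dropped here, which may differ from the handling used in the study.

```python
from datetime import date
from statistics import median


def weeks_between(first_symptom, diagnosis):
    """Interval from first symptom report to diagnosis, in weeks."""
    return (diagnosis - first_symptom).days / 7


def median_delay_weeks(pairs):
    """Median symptom-to-diagnosis interval in weeks, or None if no valid pairs."""
    delays = [weeks_between(s, d) for s, d in pairs if s <= d]
    return median(delays) if delays else None


# Hypothetical dates, not study data.
pairs = [
    (date(2005, 1, 1), date(2005, 1, 29)),  # 28 days
    (date(2005, 1, 1), date(2005, 3, 12)),  # 70 days
    (date(2005, 1, 1), date(2005, 2, 19)),  # 49 days
]
print(median_delay_weeks(pairs))
```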
The General Practice Research Database was used to investigate the time between first report of symptom and diagnosis for 344 women diagnosed with ovarian cancer between 01/06/2002 and 31/05/2008. The third and fourth definitions of first diagnosis/suspicion were "First treatment or complication suggesting pre-existing diagnosis" and "First relevant test or referral".
Mapping out routes from first symptom to diagnosis is currently the focus of much effort and is one of the main remits of a National Audit within the National Awareness and Early Diagnosis Initiative (NAEDI). However, many studies are based on small numbers and rely on patient interviews or surveys, which may be subject to recall and non-response bias.
In this study we investigate the potential and pitfalls of using records from a large UK primary care database, the General Practice Research Database (GPRD), for investigating such delays using ovarian cancer as the exemplar.
Women with a previous definite or closely related diagnosis of ovarian cancer (Table ) were excluded from the cohort. The incidence of major categories of commonly reported symptoms was estimated for each time period by dividing the number of patients reporting each symptom at least once in the given time period by the number of patients.
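The per-period symptom estimate described above can be sketched as follows. The Python function assumes the symptom records have already been restricted to the time period of interest and that each record is a (patient identifier, symptom category) pair; both the record layout and the names used are assumptions for illustration.

```python
def symptom_proportions(records, n_patients):
    """Proportion of patients reporting each symptom at least once.

    records:    iterable of (patient_id, symptom) pairs within one time period.
    n_patients: number of patients in the denominator.
    """
    patients_by_symptom = {}
    for patient_id, symptom in records:
        # A set counts each patient once, however often the symptom is coded.
        patients_by_symptom.setdefault(symptom, set()).add(patient_id)
    return {s: len(ids) / n_patients for s, ids in patients_by_symptom.items()}


# Hypothetical records, not study data.
records = [
    (1, "abdominal pain"),
    (1, "abdominal pain"),  # repeat report by the same patient
    (2, "abdominal pain"),
    (3, "constipation"),
]
props = symptom_proportions(records, n_patients=4)
print(props)
```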
A medical diagnosis of ovarian cancer was defined by a Read or OXMIS code for this condition recorded in the patient's clinical or referral record.
Software: Data management was undertaken using MySQL.
In order to determine the possible effects of inaccurate dating on the estimates of percentage of symptoms and delays, a sensitivity analysis was carried out using 4 alternative categories of Read codes indicating a diagnosis of, or investigation for, ovarian cancer.
Category 1. Definite diagnostic code only: Read codes for a case of ovarian cancer or malignant primary ovarian neoplasm as defined above ("definite diagnosis" in Table ).
Category 2. More general "ambiguous" code which could indicate diagnosis of ovarian cancer: This category included ambiguous but very closely related Read codes indicating possible ovarian cancer ("very closely related diagnosis" in Table ) together with at least one more general code such as "Cancer", "Secondary neoplasm of other specified sites" and "Carcinomatosis".
The GPRD contains anonymised longitudinal data on a representative sample of about 6% of the UK population – 3 million currently registered patients and over 8 million historic patients.