Cohort profile: The Singapore Breast Cancer Cohort (SGBCC), a multi-center breast cancer cohort for evaluation of phenotypic risk factors and genetic markers

This article aims to provide a detailed description of the Singapore Breast Cancer Cohort (SGBCC), an ongoing multi-ethnic cohort established with the overarching goal of identifying genetic markers for breast cancer risk, prognosis and treatment response, as well as understanding the ethnic differences in disease risk and outcome in an Asian setting. The cohort comprises breast cancer patients aged 21 years and above from six public hospitals which diagnose and treat nearly 76% of breast cancer cases in Singapore. Self-reported data on sociodemographic and lifestyle factors, reproductive risk factors, medical history and family history of breast or ovarian cancer are collected using a structured questionnaire. Clinical data on tumour characteristics and treatment modalities are obtained from medical records. Bio-specimens (blood or saliva) are collected at recruitment. Follow-up on survival information is done through routine linkage with the Registry of Births and Deaths. As of 31 December 2016, 7,768 subjects had been recruited to the study, 76% of whom contributed bio-specimens. The SGBCC provides a unique, large and rich resource for new research on breast cancer related phenotypic risk factors and genetic markers.


Introduction
Global incidence rates of breast cancer are on the rise and more than two million women are diagnosed with the disease every year [1]. The increase in incidence can be largely attributed to a surge in breast cancer rates in Asia, possibly due to changes in lifestyle and reproductive profiles [2]. Recent studies have found that breast cancer rates in current Asian generations are surpassing even the historically high rates in the United States [3], highlighting an urgent need for efficient prevention and treatment strategies among Asian populations.
Aetiology, diagnosis, treatment and survivorship of breast cancer have been well-studied in western populations. Guidelines on breast cancer detection and treatment in many Asian countries are largely based on evidence from western studies. However, Asian women appear to be substantially different from European women in terms of lifestyles, reproductive profile, genetic susceptibility to breast cancer, cultural and religious beliefs related to health, socioeconomic status, and drug metabolism and response [4][5][6][7][8].
Singapore is a multi-ethnic country with three main ethnic groups: Chinese (75%), Malays (14%), and Indians (9%) [9]. It has attained one of the highest standards of living in Asia and established one of the most efficient healthcare systems in the world. In 2002, a population-based mammographic breast screening program, Breast Screen Singapore, was established [10]. The participation rate under Breast Screen Singapore ranged from 9.9% to 13.7% for the reported periods 2002 to 2009 [10]. The age-standardised five-year relative survival rate of breast cancer improved from 50.2% in 1973-77 to 79.5% in 2008-2012, attributable to early detection and advances in cancer treatment in recent decades [11]. However, breast cancer is still the most common cancer and the leading cause of cancer death among women in Singapore [12]. The breast cancer incidence rate in Singapore nearly tripled from 23.8 per 100,000 in 1975-79 to 64.7 per 100,000 in 2010-2014, and is now amongst the highest in Asia [13]. A strong birth cohort effect was also observed, implying that a gradual change towards a more westernized lifestyle has contributed to the increasing incidence rate, especially for the more recent birth cohorts [14]. The age-standardised incidence of breast cancer in Singapore is higher in Chinese women (66.0 per 100,000) than in Malay women (60.4 per 100,000) and Indian women (58.8 per 100,000) [13]. Malay women have the worst five-year overall survival rate (58.5%) among the three ethnic groups, as they are more likely to present at a younger age, with more advanced stages and more aggressive tumour biology [15]. The reasons for such ethnic differences remain unclear, but it is possible that genetic, socioeconomic or cultural differences play a role.
As such, the Singapore Breast Cancer Cohort Project (SGBCC) was established in 2010 to evaluate plausible genetic (germline) and non-genetic risk factors (e.g. lifestyle, demographic, reproductive and family history) pertaining to breast cancer. We aim to identify new biomarkers for prognosis and response to treatment, and to understand the differences in survival among Asian ethnic populations through follow-up of the patients in the healthcare system or national registries. Finally, in collaboration with the Multi-Ethnic Cohort (MEC) [16], which recruits women from the general population, we can further identify new biomarkers for disease risk and diagnosis.

Study design and study population
SGBCC was first established as a cohort with both retrospective and prospective components. Recruitment started at National University Hospital (NUH, a tertiary academic hospital) in April 2010. Subsequently, recruitment sites extended to five other tertiary hospitals, namely KK Women's and Children's Hospital (KKH, in 2011), Tan  At each participating site, eligible breast cancer patients are invited to participate during outpatient visits at breast surgeons' or oncologists' clinics. Eligibility is assessed with the following criteria: (1) a diagnosis of breast carcinoma in situ or invasive breast cancer; (2) citizen or permanent resident of Singapore; and (3) aged 21 years and above. Informed consent is sought by trained research coordinators in the patient's language of choice (English, Chinese or Malay). In addition, information from medical records is stored at the respective hospitals and is requested, upon approval from the institutional review board, for project-specific topics.
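The three eligibility criteria above can be expressed as a simple screening rule. The following is a minimal sketch only: the field names and category labels (`carcinoma_in_situ`, `permanent_resident`, and so on) are hypothetical and are not taken from the SGBCC data dictionary.

```python
from dataclasses import dataclass

# Hypothetical category labels, chosen for illustration only.
ELIGIBLE_DIAGNOSES = {"carcinoma_in_situ", "invasive"}
ELIGIBLE_RESIDENCY = {"citizen", "permanent_resident"}
MIN_AGE_YEARS = 21


@dataclass
class Patient:
    diagnosis: str   # type of breast cancer diagnosis
    residency: str   # residency status in Singapore
    age_years: int   # age at the time of screening for eligibility


def is_eligible(patient: Patient) -> bool:
    """Return True only if all three recruitment criteria are met."""
    return (
        patient.diagnosis in ELIGIBLE_DIAGNOSES
        and patient.residency in ELIGIBLE_RESIDENCY
        and patient.age_years >= MIN_AGE_YEARS
    )
```

In this sketch a patient failing any single criterion is excluded, mirroring the conjunctive wording of the criteria.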
Consent for passive follow-up of participants is obtained. Disease outcomes such as recurrence, disease progression, and occurrence of other primary cancers are obtained from hospital medical records. Vital status and cause of death are obtained via linkage with the Registry of Births and Deaths in accordance with the audit cycle at each hospital [17]. In Singapore, nearly all deaths of citizens and permanent residents are certified. The certificate of cause of death is issued by doctors or authorized medical practitioners.

Data collection
Structured questionnaire. The structured questionnaire was adapted from the questionnaire of the KARolinska MAmmography Project for Risk Prediction of Breast Cancer (KARMA) study and translated from the English version into Mandarin and Malay [18]. The questionnaire is self-administered in paper format, facilitated by a research coordinator. If a patient is illiterate, the research coordinator reads the questions to the patient in English, Mandarin or Malay.
Baseline information is obtained at the time of recruitment. The variables include ethnicity, place of birth, marital status, employment status, type of housing, highest educational qualification attained, history of previous or existing illnesses such as diabetes, hypertension, and renal impairment, family history of breast cancer, family history of ovarian cancer, menstrual (age at menarche, age at menopause) and reproductive (parity status, age at first childbirth, breastfeeding) risk factors for breast cancer, use of oral contraceptives, use of hormonal replacement therapy, tobacco smoking, alcohol consumption, participation in and attitudes towards the mammographic screening program, and self-reported weight and height at the time of recruitment. All variables and corresponding questions are available at https://blog.nus.edu.sg/sgbcc/for-researchers/.
Breast cancer registry and/or medical records. Hospitals have differing schedules for updating their in-house breast cancer registries, with collection of variables starting in different years. Where participants or variables are not found in the breast cancer registry, medical records are accessed from the electronic medical record system of the individual restructured hospitals in SGBCC. Electronic medical record systems are widely adopted in government restructured hospitals in Singapore, with four of our sites (SGH, NCCS, KKH and CGH) sharing the same system. NUH and TTSH have independent electronic medical record systems. Variables are extracted two years after patients' entry to the study to allow for the maturation of treatment modalities and for quality checks by hospital staff. Extracted variables include the participant's date of birth and date of diagnosis; tumour characteristics, including tumour stage, tumour size (in millimetres), tumour grade, histological type, estrogen receptor status, progesterone receptor status and human epidermal growth factor receptor 2 status; and treatment-related variables, including surgery, radiotherapy, adjuvant chemotherapy, neo-adjuvant chemotherapy, endocrine therapy, and targeted therapy.
Biological specimens. Blood specimens of 20 ml are obtained by trained nurses or phlebotomists in the clinics after the interview or during subsequent clinical visits. Two types of blood tubes (BD Vacutainer® ethylenediaminetetraacetic acid (K2 EDTA) and BD Vacutainer® serum-separating tube (SST)) were used in all hospitals except for TTSH, where two tubes of K2 EDTA were collected. Blood specimens are processed at a central biobank on a weekly basis. Upon receipt of the biological specimen, the whole blood sample is separated by centrifugation into multiple aliquots of plasma, buffy coat, red blood cells, serum and blood clot. All aliquots are stored at -80˚C and an inventory tracking system is maintained so that each aliquot remains linked to its biological specimen at all times. If the patient declines to donate blood, a saliva specimen is collected by spitting into an Oragene® DNA OG-500 self-collection kit manufactured by DNA Genotek® [19,20]. The saliva specimen is stored in the original collection tube at -80˚C before DNA extraction. DNA is extracted from buffy coat and saliva using the Qiagen® Flexigene DNA Kit (for buffy coat) and the Oragene® prepIT•L2P reagent (for saliva) according to the manufacturers' protocols. Quantification of DNA is done using Trinean® or NanoDrop® platforms [21,22]. DNA samples are kept at -20˚C for long-term storage.
Germline genetic information. In collaboration with the Breast Cancer Association Consortium (BCAC), 4,464 participants were genotyped using a custom single nucleotide polymorphism (SNP) genotyping array (Illumina OncoArray-500K BeadChip) (Fig 1) [23]. The OncoArray contains ~500,000 SNPs with a genome-wide backbone of ~275,000 tag SNPs and additional content comprising variants associated with five common cancers (breast, colorectal, lung, ovarian, and prostate), ancestry, quantitative traits, and pharmacogenetics [24,25]. As part of the Breast Cancer Risk after Diagnostic Gene Sequencing (BRIDGES) initiative [26], next-generation targeted sequencing of 34 genes known or suspected to be associated with breast cancer was performed for 4,464 patients (Fig 1) [27]. Of the 4,464 patients, 385 have at least one protein truncating variant in any of the 34 genes studied (S2 Fig). In addition, whole-exome sequencing (Roche SeqCap EZ Human Exome v3.0) and array-based profiling of DNA methylation (Illumina Infinium MethylationEPIC BeadChip) were performed for subsets of 1,153 and 1,408 breast cancer patients, respectively.

Findings to date
Recruitment for SGBCC is still ongoing. As of 31 December 2016, 7,768 breast cancer patients had been enrolled and full clinical data are available. The overall participation rate was 86% and 5,931 (76%) subjects contributed bio-specimens (Fig 2). The cohort grew quickly in the first two years of recruitment at each participating site, mainly driven by prevalent cases (recruited more than one year after diagnosis of breast cancer) among participants on routine surveillance, and then slowed to a steady rate of 200 newly diagnosed patients per year thereafter. Table 1 compares participant characteristics with national-level statistics reported in the Singapore cancer registry report on female breast cancer in 2010-2014 [13], the Singapore census in 2010 [9], and the National Health Survey in 2010 [28]. The associations between selected clinical characteristics and case status (incident or prevalent breast cancer), assessed using the Chi-square test, are summarized in Table 2. A total of 626 deaths, 458 of them due to breast cancer, were observed. Significant differences (log-rank p-value < 0.0001) in overall survival were observed across stages at diagnosis and across the three major ethnic groups (Fig 3). The observed survival difference by ethnicity is in agreement with the existing literature reporting that certain ethnicities, such as Malay, are independently associated with worse survival [15]. Further studies are needed to clarify the underlying reasons. S1 Table lists protein truncating variant (PTV) carriership among breast cancer patients from SGBCC. We used the Singapore MEC as controls to compare the frequency of PTVs in the three major ethnic groups (Chinese, Malay and Indian) [16]. The MEC enrolled 34,870 males and females from the general population between 2013 and 2016 and aims to monitor risk factors for the development of common health conditions (http://blog.nus.edu.sg/sphs/multiethnic-cohort/) [16].
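The log-rank comparison of overall survival reported above can be illustrated with a minimal sketch. It implements the standard two-group log-rank test in plain Python; the comparisons in Fig 3 involve more than two groups, so this is a simplified illustration under that assumption, not the analysis code used for the cohort. The function name and the input layout are hypothetical.

```python
import math


def logrank_two_groups(times, events, groups):
    """Two-group log-rank test.

    times  : follow-up time for each subject
    events : 1 if death was observed, 0 if the subject was censored
    groups : group label (0 or 1) for each subject
    Returns (chi-square statistic, p-value on 1 degree of freedom).
    """
    records = list(zip(times, events, groups))
    event_times = sorted({t for t, e, _ in records if e == 1})

    observed_minus_expected = 0.0
    variance = 0.0
    for t in event_times:
        at_risk = [r for r in records if r[0] >= t]
        n = len(at_risk)                                  # at risk, both groups
        n1 = sum(1 for r in at_risk if r[2] == 1)         # at risk, group 1
        d = sum(1 for r in at_risk if r[0] == t and r[1] == 1)   # deaths at t
        d1 = sum(1 for r in at_risk
                 if r[0] == t and r[1] == 1 and r[2] == 1)       # deaths, group 1
        # Expected deaths in group 1 under the null hypothesis of equal hazards.
        observed_minus_expected += d1 - d * n1 / n
        if n > 1:
            # Hypergeometric variance of the group 1 death count at time t.
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)

    if variance == 0:
        raise ValueError("log-rank statistic is undefined for these data")

    chi2 = observed_minus_expected ** 2 / variance
    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value
```

For comparisons across three or more groups, such as the ethnic groups or stages in Fig 3, the multi-group generalisation of the test would be needed; established survival analysis packages provide it.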
We matched 4,124 controls from MEC to 4,457 breast cancer patients by ethnicity and age (enrolment age within +/-5 years of the patient's age at diagnosis). S2 Table presents the frequency of rare PTV carriership among breast cancer patients from SGBCC and controls from MEC by ethnicity. To date, the most robust and reliable breast cancer risk predictor composed of common genetic variants identified by GWAS is the 313-SNP breast cancer polygenic risk score (PRS), developed in women of European ancestry [29]. Genotyping was done for controls from MEC with matched cases from SGBCC. The distribution of the PRS differed between cases and controls within each ethnic group (Kruskal-Wallis test p-values: Chinese, P = 1.79E-49; Malay, P = 7.70E-12; Indian, P = 4.44E-7) (Fig 4).
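The two steps described above, matching controls to cases on ethnicity and age and scoring each woman with a polygenic risk score, can be sketched as follows. This is an illustration under stated assumptions: the greedy one-to-one matching without replacement is an assumed strategy (the text does not specify the matching algorithm), the record layout is hypothetical, and the SNP identifiers and weights in the usage example are made up rather than taken from the 313-SNP score.

```python
def match_controls(cases, controls, max_age_gap=5):
    """Greedy 1:1 matching without replacement (an assumed strategy).

    cases, controls : lists of dicts with keys 'id', 'ethnicity', 'age'
    Returns a dict mapping case id -> matched control id; cases with no
    eligible control are left out of the result.
    """
    available = list(controls)
    matches = {}
    for case in cases:
        candidates = [
            c for c in available
            if c["ethnicity"] == case["ethnicity"]
            and abs(c["age"] - case["age"]) <= max_age_gap
        ]
        if not candidates:
            continue
        # Prefer the control closest in age to the case.
        best = min(candidates, key=lambda c: abs(c["age"] - case["age"]))
        matches[case["id"]] = best["id"]
        available.remove(best)
    return matches


def polygenic_risk_score(dosages, weights):
    """Weighted sum of risk-allele dosages (0, 1 or 2 copies per SNP).

    dosages : dict SNP id -> allele dosage for one individual
    weights : dict SNP id -> per-allele weight (log odds ratio)
    SNPs missing from `dosages` are skipped, which is a simplification.
    """
    return sum(w * dosages[snp] for snp, w in weights.items() if snp in dosages)


# Usage with made-up records and made-up SNP weights.
cases = [
    {"id": "case1", "ethnicity": "Chinese", "age": 50},
    {"id": "case2", "ethnicity": "Malay", "age": 40},
]
controls = [
    {"id": "ctrl1", "ethnicity": "Chinese", "age": 56},
    {"id": "ctrl2", "ethnicity": "Chinese", "age": 53},
    {"id": "ctrl3", "ethnicity": "Malay", "age": 46},
]
matched = match_controls(cases, controls)
score = polygenic_risk_score({"snp_a": 2, "snp_b": 1}, {"snp_a": 0.1, "snp_b": -0.2})
```

Here `case2` stays unmatched because the only Malay control is six years older, which falls outside the five-year window.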

Strengths and limitations
The SGBCC is one of the largest breast cancer cohort studies in Asia and has several unique strengths. Annually, approximately 1,800 women in Singapore are diagnosed with breast cancer and reported to the Singapore Cancer Registry. SGBCC recruits breast cancer patients from six of the ten restructured hospitals in Singapore. Three of the restructured hospitals that are not part of SGBCC were established in 2010 (Khoo Teck Puat Hospital), 2015 (Ng Teng Fong General Hospital) and 2018 (Sengkang General Hospital). Our cohort participants have a similar age and ethnicity distribution to breast cancer patients in Singapore as identified by the Singapore Cancer Registry, which strengthens the generalisability of the study. The participation rate of the current study was high (86%) across all study sites, and 76% of our cohort participants have donated a saliva or blood sample. With the high level of adoption of electronic medical records among healthcare institutions in Singapore, additional direct patient contact is not required for follow-up, which supports a higher participation rate, improves the accuracy of clinical information and reduces the potential for loss to follow-up. We acknowledge that there are some limitations to our cohort. As with all prevalent case cohort studies, survivorship bias is present. Only patients who were alive at the time of enrolment were included, and patients diagnosed in the years before the start of recruitment would tend to have a better prognosis. Patients with short survival times and low compliance with post-treatment surveillance and follow-up clinical care were more likely to be missed. Thus, the prevalent-case part of the cohort is biased towards more favourable survival outcomes. However, the proportion of incident cases at each site increases to approximately 80% after a period of five years.
In addition, a study has demonstrated that the inclusion of prevalent cases in a population-based epidemiological cohort of breast cancer patients does not bias the hazard ratio estimation for three prognostic factors (clinical stage, grade and estrogen receptor status) in a left truncation Cox survival analysis when the proportional hazards assumption holds [32]. Information on exposure variables prior to the occurrence of breast cancer, such as menstrual and reproductive risk factors, may be subject to recall bias. Social desirability bias may affect responses to questions on tobacco smoking, alcohol consumption, physical activities, and participation in and attitudes towards the mammographic screening program.
Breast cancer patients are approached by trained research coordinators during their outpatient visit at SGBCC hospital sites. Informed consent is sought in the patient's language of choice (English, Chinese or Malay). In an in-person interview with a research coordinator, participants answer a comprehensive questionnaire assessing known breast cancer risk factors and attitudes towards mammography screening. A blood or saliva sample is taken. Information on tumour characteristics, treatment, recurrence, survival and other adverse outcomes is retrieved from medical records. Date and cause of death are updated via record linkage to a national registry.
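The left truncation adjustment mentioned under the limitations can be illustrated with a small sketch of how the risk set is formed. In a left-truncated analysis, a patient counts as under observation only from study entry (here, time from diagnosis to recruitment) until death or censoring, so prevalent cases do not contribute to the period before they were recruited. The function names and the toy numbers are hypothetical; this is not the cohort's analysis code.

```python
def naive_risk_set_size(exit_times, t):
    """Subjects counted as at risk at time t when delayed entry is ignored."""
    return sum(1 for exit_time in exit_times if exit_time >= t)


def left_truncated_risk_set_size(entry_times, exit_times, t):
    """Subjects at risk at time t when delayed entry is respected.

    entry_times : years from diagnosis to recruitment (0 for incident cases)
    exit_times  : years from diagnosis to death or censoring
    A subject is at risk at t only if already recruited (entry < t)
    and still under follow-up (exit >= t).
    """
    return sum(
        1 for entry, exit_time in zip(entry_times, exit_times)
        if entry < t <= exit_time
    )


# Toy data: one incident case and two prevalent cases recruited
# two and three years after diagnosis.
entry_times = [0.0, 2.0, 3.0]
exit_times = [5.0, 4.0, 6.0]
naive_at_year_1 = naive_risk_set_size(exit_times, 1.0)
adjusted_at_year_1 = left_truncated_risk_set_size(entry_times, exit_times, 1.0)
```

At one year after diagnosis the naive count treats all three patients as at risk, whereas the left-truncated count includes only the incident case, because the two prevalent cases had to survive until recruitment to be observed at all.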