Deep Learning Image Analysis of Benign Breast Disease to Identify Subsequent Risk of Breast Cancer


Vellal AD, Sirinukunwattana K, Kensler KH, Baker GM, Stancu AL, Pyle ME, Collins LC, Schnitt SJ, Connolly JL, Veta M, et al. Deep Learning Image Analysis of Benign Breast Disease to Identify Subsequent Risk of Breast Cancer. JNCI Cancer Spectr. 2021;5(1):pkaa119.

Date Published:

3 Sept, 2020


Background: New biomarkers of risk may improve breast cancer risk prediction. We developed a computational pathology method to segment benign breast disease (BBD) whole slide images (WSIs) into epithelium, fibrous stroma, and fat. We applied our method to the BBD breast cancer nested case-control study within the Nurses’ Health Studies to assess whether computer-derived tissue composition or a morphometric signature was associated with subsequent risk of breast cancer.

Methods: Tissue segmentation and nuclei detection deep-learning networks were established and applied to 3795 WSIs from 293 cases who developed breast cancer and 1132 controls who did not. Percentages of each tissue region were calculated and 615 morphometric features were extracted. Elastic net regression was used to create a breast cancer morphometric signature. Associations between breast cancer risk factors and age-adjusted tissue composition among controls were assessed using analysis of covariance. Unconditional logistic regression, adjusting for the matching factors, BBD histological subtypes, parity, menopausal status, and BMI, was used to evaluate the relationship between tissue composition and breast cancer risk.

Results: Among controls, BBD subtypes, parity, and number of births were differentially associated with all three tissue regions (p<0.05); select regions were associated with childhood body size, BMI, age at menarche, and menopausal status (p<0.05). A higher proportion of epithelial tissue was associated with increased breast cancer risk (OR=1.39, 95% CI 0.91-2.14 comparing highest and lowest quartiles; p-trend<0.05). No morphometric signature was associated with breast cancer.
Conclusion: The amount of epithelial tissue may be incorporated into risk assessment models to improve breast cancer risk prediction.

Competing Interest Statement: The authors have declared no competing interest.

Funding Statement: This work was supported by the National Institutes of Health/National Cancer Institute R21CA187642 (RMT), R01CA175080 (RMT), R01CA240341 (RMT, YJH), UM1CA186107 (AHE), and U01CA176726 (AHE); Susan G. Komen for the Cure IIR13264020 (RMT); Breast Cancer Research Foundation 17-174; the Klarman Family Foundation (YJH); and the BIDMC High School Summer Research Program (ADV).

Author Declarations:

I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained. Yes.

The details of the IRB/oversight body that provided approval or exemption for the research described are given below: The study protocol was approved by the institutional review boards of the Brigham and Women’s Hospital and Harvard T.H. Chan School of Public Health, and those of participating registries as required.

All necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived. Yes.

I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance). Yes.

I have followed all appropriate research reporting guidelines and uploaded the relevant EQUATOR Network research reporting checklist(s) and other pertinent material as supplementary files, if applicable. Yes.

The data that support the findings of this study are available from the Nurses’ Health Studies, but they are not publicly available. Investigators interested in using the data can request access, and feasibility will be discussed at an investigators' meeting. Limits are not placed on scientific questions or methods, and there is no requirement for co-authorship. Additional data sharing information and policy details are available online. The source code is available on GitHub.
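As an illustration of two steps named in the Methods (calculating the percentage of each tissue region, and comparing risk between quartiles), the following is a minimal sketch and not the authors' code. The label codes, the choice to compute percentages over tissue pixels only, the function names, and the example counts are all assumptions made for illustration. The odds ratio shown is a crude, unadjusted one, whereas the study used unconditional logistic regression adjusted for several covariates.

```python
from collections import Counter

# Hypothetical label codes for a segmented image; the encoding used in the
# study is not stated in the abstract.
BACKGROUND, EPITHELIUM, STROMA, FAT = 0, 1, 2, 3


def tissue_composition(mask):
    """Return the percentage of each tissue class among tissue pixels.

    `mask` is a 2D list of integer labels. Background pixels are excluded
    from the denominator (an assumption, not a detail from the paper).
    """
    counts = Counter(label for row in mask for label in row)
    tissue_total = counts[EPITHELIUM] + counts[STROMA] + counts[FAT]
    if tissue_total == 0:
        raise ValueError("mask contains no tissue pixels")
    return {
        "epithelium": 100.0 * counts[EPITHELIUM] / tissue_total,
        "stroma": 100.0 * counts[STROMA] / tissue_total,
        "fat": 100.0 * counts[FAT] / tissue_total,
    }


def crude_odds_ratio(exposed_cases, exposed_controls, ref_cases, ref_controls):
    """Unadjusted odds ratio of an exposed group versus a reference group."""
    return (exposed_cases * ref_controls) / (exposed_controls * ref_cases)


# Toy 3x4 mask: 2 background, 2 epithelium, 3 stroma, 5 fat pixels.
mask = [
    [0, 1, 1, 2],
    [2, 2, 3, 3],
    [3, 3, 3, 0],
]
composition = tissue_composition(mask)
print(composition)

# Made-up counts (NOT study data): highest quartile versus lowest quartile.
print(crude_odds_ratio(30, 20, 20, 30))
```

In a real analysis the mask would come from the segmentation network's output for each WSI, and the quartile comparison would be estimated within a regression model rather than from a 2x2 table.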

Last updated on 02/25/2021