Importance: A molecular diagnostic method that incorporates information about the transcriptional status of all genes across multiple tissue types can strengthen confidence in cancer diagnosis.
Objective: To determine the practical use of a whole transcriptome-based pan-cancer method in diagnosing primary and metastatic cancers and resolving complex diagnoses.
Design, setting, and participants: This cross-sectional diagnostic study assessed Supervised Cancer Origin Prediction Using Expression (SCOPE), a machine learning method using whole-transcriptome RNA sequencing data. Training was performed on publicly available primary cancer data sets, including The Cancer Genome Atlas. Testing was performed retrospectively on untreated primary cancers and treated metastases from volunteer adult patients at BC Cancer in Vancouver, British Columbia, from January 1, 2013, to March 31, 2016. The study spanned 10 822 samples and 66 output classes representing untreated primary cancers (n = 40) and adjacent normal tissues (n = 26). SCOPE's performance was demonstrated on 211 untreated primary mesotheliomas and 201 treatment-resistant metastatic cancers. Finally, SCOPE was used to identify the putative site of origin in 15 cases that initially presented as cancers of unknown primary origin.
Results: A total of 10 688 adult patient samples representing 40 untreated primary tumor types and 26 adjacent-normal tissues were used for training. Demographic data were not available for all data sets. In the training data set, 5157 of 10 244 (50.3%) were male, and the mean (SD) age was 58.9 (14.5) years. Testing was performed on 211 patients with untreated primary mesothelioma (173 [82.0%] male; mean [SD] age, 64.5 [11.3] years), 201 patients with treatment-resistant cancers (141 [70.1%] female; mean [SD] age, 55.6 [12.9] years), and 15 patients with cancers of unknown primary origin. Of the treatment-resistant cancers, 168 were metastatic and 33 were the primary presentation. An accuracy rate of 99% was obtained for the primary epithelioid mesotheliomas tested (125 of 126). The remaining 85 mesotheliomas had a mixed etiology (sarcomatoid mesotheliomas) and were correctly identified as a mixture of their primary components, with potential implications for resolving subtypes and incidences of mixed histology. On the 201 treatment-resistant cancers, SCOPE achieved an overall mean (SD) accuracy rate of 86% (11%) and an F1 score of 0.79 (0.12). For the 15 cancers with an indeterminate diagnosis from conventional pathology, SCOPE matched the putative diagnosis in 12.
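As an illustration of how the accuracy and F1 figures above might be derived from predicted and true cancer-type labels, the following minimal sketch computes overall accuracy along with the mean (SD) of per-class recall and F1. It is not the authors' code. The function names, the toy labels, and the choice to average across classes are assumptions made here for illustration; the abstract does not state over which units the reported mean (SD) values were computed.

```python
from collections import Counter
from statistics import mean, stdev


def per_class_scores(y_true, y_pred):
    """Return {label: (recall, f1)} for every label present in y_true."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fn[truth] += 1  # the true class was missed
            fp[pred] += 1   # the predicted class was wrongly assigned
    scores = {}
    for label in sorted(set(y_true)):
        predicted = tp[label] + fp[label]
        actual = tp[label] + fn[label]  # > 0 because label occurs in y_true
        precision = tp[label] / predicted if predicted else 0.0
        recall = tp[label] / actual
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        scores[label] = (recall, f1)
    return scores


def summarize(y_true, y_pred):
    """Overall accuracy plus mean (SD) of per-class recall and F1.

    Averaging across classes is an assumption of this sketch.
    stdev() needs at least two classes.
    """
    scores = per_class_scores(y_true, y_pred)
    recalls = [r for r, _ in scores.values()]
    f1s = [f for _, f in scores.values()]
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return {
        "overall_accuracy": correct / len(y_true),
        "mean_recall": mean(recalls),
        "sd_recall": stdev(recalls),
        "mean_f1": mean(f1s),
        "sd_f1": stdev(f1s),
    }


# Hypothetical toy labels, not study data.
toy_true = ["breast"] * 4 + ["lung"] * 4 + ["colon"] * 2
toy_pred = ["breast", "breast", "breast", "lung",
            "lung", "lung", "lung", "lung",
            "colon", "breast"]

if __name__ == "__main__":
    print(summarize(toy_true, toy_pred))
```

Reporting a per-class summary alongside overall accuracy keeps rare tumor types from being masked by well-represented ones.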
Conclusions and relevance: These results suggest that machine learning approaches incorporating multiple tumor profiles can more accurately identify the cancerous state and discriminate it from normal cells. SCOPE uses whole transcriptomes from normal and tumor tissues, and the results of this study suggest that it performs well for rare cancer types, primary cancers, treatment-resistant metastatic cancers, and cancers of unknown primary origin. The genes most relevant to SCOPE's decision-making were examined, and several are known biological markers of the respective cancers. SCOPE may be applied as an orthogonal diagnostic method when the site of origin of a cancer is unknown or when standard pathology assessment is inconclusive.