Introduction: Increasingly, logistic regression methods for genetic association studies of binary phenotypes must be able to accommodate data sparsity, which arises from unbalanced case-control ratios and/or rare genetic variants. Sparseness leads to maximum likelihood estimators (MLEs) of log-OR parameters that are biased away from their null value of zero and tests with inflated type 1 errors. Different penalized-likelihood methods have been developed to mitigate sparse-data bias. We study penalized logistic regression using a class of log-F priors indexed by a shrinkage parameter m to shrink the biased MLE towards zero. For a given m, log-F-penalized logistic regression may be easily implemented using data augmentation and standard software.
Method: We propose a two-step approach to the analysis of a genetic association study: first, a set of variants that show evidence of association with the trait is used to estimate m; and second, the estimated m is used for log-F-penalized logistic regression analyses of all variants using data augmentation with standard software. Our estimate of m is the maximizer of a marginal likelihood obtained by integrating the latent log-ORs out of the joint distribution of the parameters and observed data. We consider two approximate approaches to maximizing the marginal likelihood: (i) a Monte Carlo EM algorithm (MCEM) and (ii) a Laplace approximation (LA) to each integral, followed by derivative-free optimization of the approximation.
Results: We evaluate the statistical properties of our proposed two-step method and compare its performance to other shrinkage methods in a simulation study. Our simulation studies suggest that the proposed log-F-penalized approach has lower bias and mean squared error than the other methods considered. We also illustrate the approach on data from a study of genetic associations with "super senior" cases and middle-aged controls.
Discussion/conclusion: We have proposed a method for single rare variant analysis with binary phenotypes by logistic regression penalized by log-F priors. Our method has the advantage of being easily extended to correct for confounding due to population structure and genetic relatedness through a data augmentation approach.
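The data-augmentation step described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the standard construction in which a log-F(m, m) prior on one log-OR is equivalent to adding a pseudo-record with m/2 successes and m/2 failures, the penalized covariate set to 1, and all other covariates (including the intercept) set to 0. The function names, the Newton-Raphson fitter, the fixed value m = 2, and the toy counts are all hypothetical; estimating m from the marginal likelihood is not shown.

```python
import numpy as np

def fit_weighted_logistic(X, y, w, n_iter=100, tol=1e-10):
    """Weighted logistic regression fitted by Newton-Raphson."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))
        score = X.T @ (w * (y - p))                      # gradient of the log-likelihood
        info = (X * (w * p * (1.0 - p))[:, None]).T @ X  # observed information matrix
        step = np.linalg.solve(info, score)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta

def augment_for_log_f(X, y, m, penalized_cols):
    """Append pseudo-records encoding a log-F(m, m) prior on each penalized column.

    Each penalized column gets two rows (one success, one failure), each with
    weight m/2; the intercept and all other covariates are zero in those rows.
    """
    n, k = X.shape
    X_parts, y_parts, w_parts = [X], [y], [np.ones(n)]
    for j in penalized_cols:
        pseudo = np.zeros((2, k))
        pseudo[:, j] = 1.0
        X_parts.append(pseudo)
        y_parts.append(np.array([1.0, 0.0]))
        w_parts.append(np.array([m / 2.0, m / 2.0]))
    return np.vstack(X_parts), np.concatenate(y_parts), np.concatenate(w_parts)

# Hypothetical sparse data: 4 carriers (3 cases, 1 control),
# 60 non-carriers (20 cases, 40 controls).
carrier = np.repeat([1.0, 1.0, 0.0, 0.0], [3, 1, 20, 40])
y = np.repeat([1.0, 0.0, 1.0, 0.0], [3, 1, 20, 40])
X = np.column_stack([np.ones_like(carrier), carrier])  # column 0 = intercept

# Unpenalized MLE versus log-F(2, 2)-penalized estimate of the carrier log-OR.
beta_mle = fit_weighted_logistic(X, y, np.ones(len(y)))
X_aug, y_aug, w_aug = augment_for_log_f(X, y, m=2.0, penalized_cols=[1])
beta_pen = fit_weighted_logistic(X_aug, y_aug, w_aug)
print("MLE log-OR:", beta_mle[1], "penalized log-OR:", beta_pen[1])
```

Because the pseudo-records are ordinary weighted observations, any logistic regression routine that accepts case weights could stand in for the hand-written fitter here.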
Use of menopausal hormone therapy (MHT) is associated with increased risk for breast cancer. However, the relevant mechanisms and its interaction with genetic variants are not fully understood. We conducted a genome-wide interaction analysis between MHT use and genetic variants for breast cancer risk in 27,585 cases and 34,785 controls from 26 observational studies. All women were post-menopausal and of European ancestry. Multivariable logistic regression models were used to test for multiplicative interactions between genetic variants and current MHT use. We considered interaction p-values < 5 × 10⁻⁸ as genome-wide significant, and p-values < 1 × 10⁻⁵ as suggestive. Linkage disequilibrium (LD)-based clumping was performed to identify independent candidate variants. None of the 9.7 million genetic variants tested for interactions with MHT use reached genome-wide significance. Only 213 variants, representing 18 independent loci, had p-values < 1 × 10⁻⁵. The strongest evidence was found for rs4674019 (p-value = 2.27 × 10⁻⁷), which showed genome-wide significant interaction (p-value = 3.8 × 10⁻⁸) with current MHT use when analysis was restricted to population-based studies only. Limiting the analyses to combined estrogen-progesterone MHT use only or to estrogen receptor (ER) positive cases did not identify any genome-wide significant evidence of interactions. In this large genome-wide SNP-MHT interaction study of breast cancer, we found no strong support for common genetic variants modifying the effect of MHT on breast cancer risk. These results suggest that common genetic variation has limited impact on the observed MHT-breast cancer risk association.
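To make the idea of a multiplicative interaction test concrete, the sketch below works through the simplest possible case. It is an illustration under stated assumptions, not the study's model: it treats both the variant (carrier vs non-carrier) and MHT use (current vs never) as binary and omits covariates, in which case the interaction term of a logistic model equals the log of the ratio of the two stratum-specific odds ratios. All counts and function names are hypothetical; only the two p-value thresholds come from the abstract.

```python
import math

def stratum_odds_ratio(case_g1, ctrl_g1, case_g0, ctrl_g0):
    """Carrier vs non-carrier odds ratio within one exposure stratum."""
    return (case_g1 * ctrl_g0) / (case_g0 * ctrl_g1)

def interaction_wald_test(unexposed, exposed):
    """Wald test of the multiplicative gene-by-exposure interaction.

    Each argument is a tuple (case_g1, ctrl_g1, case_g0, ctrl_g0).
    Returns (ratio of odds ratios, z statistic, two-sided p-value).
    """
    ratio = stratum_odds_ratio(*exposed) / stratum_odds_ratio(*unexposed)
    # Standard error of the log ratio: all eight cell counts contribute.
    se = math.sqrt(sum(1.0 / count for count in unexposed + exposed))
    z = math.log(ratio) / se
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal p-value
    return ratio, z, p_value

def classify(p_value):
    """Apply the thresholds stated in the abstract."""
    if p_value < 5e-8:
        return "genome-wide significant"
    if p_value < 1e-5:
        return "suggestive"
    return "not significant"

# Hypothetical counts: the odds ratio is 2 among never users and 4 among current users.
never_users = (20, 10, 40, 40)
current_users = (40, 10, 40, 40)
ratio, z, p_value = interaction_wald_test(never_users, current_users)
print(ratio, z, p_value, classify(p_value))
```

With a per-allele dosage coding and adjustment covariates, as is usual in genome-wide analyses, the same hypothesis would be tested through the coefficient of the product term in a fitted logistic model rather than through this closed form.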
Brain and Behavior
Introduction: Severe internal carotid stenosis, if left untreated, can pose serious risks for ischemic stroke and cognitive impairments. The effects of revascularization on any aspects of cognition, however, are not well understood, as conflicting results are reported, which have mainly been centered on paper-based cognitive analyses. Here, we summarized and evaluated the publications to date of functional MRI (fMRI) studies that examined the mechanisms of functional brain activation and connectivity as a way to reflect cognitive effects of revascularization on patients with carotid stenosis.
Methods: A search of PubMed and Google Scholar, covering the relevant literature until November 1, 2021, yielded eight original studies in this line of research, including seven resting-state and one task-based fMRI reports.
Results: Findings demonstrated treatment-related alterations in fMRI signal intensity and symmetry level, regional fMRI activation pattern, and functional brain network connectivity. The functional brain changes were associated largely with improvement in cognitive function assessed using standard cognitive test scores.
Conclusions: These findings support the contribution of fMRI to the understanding of brain functional activation and connectivity changes revealing cognitive effects of revascularization in the management of severe carotid stenosis. The review also highlighted the importance of reproducibility through enhancing experimental designs and cognitive task applications with future research for potential clinical translation.
Cancer Epidemiology, Biomarkers & Prevention
Background: A previous International Lymphoma Epidemiology (InterLymph) Consortium evaluation of joint associations between five immune gene variants and autoimmune conditions reported interactions between B-cell response-mediated autoimmune conditions and the rs1800629 genotype on risk of B-cell NHL subtypes. Here, we extend that evaluation using NHL subtype-specific polygenic risk scores (PRS) constructed from loci identified in genome-wide association studies of three common B-cell NHL subtypes.
Methods: In a pooled analysis of NHL cases and controls of Caucasian descent from 14 participating InterLymph studies, we evaluated joint associations between B-cell mediated autoimmune conditions and tertile (T) of PRS for risk of diffuse large B-cell lymphoma (DLBCL, n=1914), follicular lymphoma (FL, n=1733) and marginal zone lymphoma (MZL, n=407), using unconditional logistic regression.
Results: We demonstrated a positive association of DLBCL PRS with DLBCL risk (T2 vs T1: odds ratio, OR=1.24, 95% confidence interval, CI=1.08-1.43; T3 vs T1: OR=1.81, 95% CI=1.59-2.07; P-trend<0.0001). DLBCL risk also increased with increasing PRS tertile among those with an autoimmune condition, being highest for those with a B-cell mediated autoimmune condition and a T3 PRS (OR=6.46 vs no autoimmune condition and a T1 PRS, P-trend<0.0001, p-interaction=0.49). FL and MZL risk demonstrated no evidence of joint associations or significant p-interaction.
Conclusions: Our results suggest that PRS constructed from currently known subtype-specific loci may not necessarily capture biological pathways shared with autoimmune conditions.
Impact: Targeted genetic (PRS) screening among population subsets with autoimmune conditions may offer opportunities for identifying those at highest risk for (and for early detection of) DLBCL.
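As a rough illustration of how a PRS can be built and split into tertiles, the sketch below uses a common construction that the abstract does not spell out: the score is assumed to be the sum of risk-allele dosages weighted by per-variant log odds ratios, and the tertile cutpoints are assumed to come from a reference group such as controls. The weights, dosages, reference scores, and function names are all hypothetical.

```python
import numpy as np

def polygenic_risk_score(dosages, log_odds_ratios):
    """Weighted sum of risk-allele dosages (rows = individuals, columns = variants)."""
    return np.asarray(dosages, dtype=float) @ np.asarray(log_odds_ratios, dtype=float)

def assign_tertiles(scores, reference_scores):
    """Label each score 1, 2 or 3 using cutpoints taken from a reference group."""
    cut1, cut2 = np.quantile(reference_scores, [1.0 / 3.0, 2.0 / 3.0])
    scores = np.asarray(scores, dtype=float)
    return 1 + (scores > cut1).astype(int) + (scores > cut2).astype(int)

# Hypothetical per-variant weights and genotype dosages (0, 1 or 2 risk alleles).
weights = [0.1, 0.2, 0.3]
dosages = [[0, 1, 2],
           [2, 2, 2],
           [0, 0, 0]]
scores = polygenic_risk_score(dosages, weights)

# Hypothetical reference-group scores used only to define the cutpoints.
reference = np.array([0.0, 0.3, 0.6, 0.9, 1.2, 1.5])
tertiles = assign_tertiles(scores, reference)
print(scores, tertiles)
```

The tertile labels could then enter a logistic model jointly with autoimmune-condition status, with the lowest tertile and no autoimmune condition as the reference category.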
Breast Cancer Research
Background: Genome-wide association studies (GWAS) have identified multiple common breast cancer susceptibility variants. Many of these variants have differential associations by estrogen receptor (ER) status, but how these variants relate to other tumor features and intrinsic molecular subtypes is unclear.
Methods: Among 106,571 invasive breast cancer cases and 95,762 controls of European ancestry with data on 173 breast cancer variants identified in previous GWAS, we used novel two-stage polytomous logistic regression models to evaluate variants in relation to multiple tumor features (ER, progesterone receptor (PR), human epidermal growth factor receptor 2 (HER2) and grade) adjusting for each other, and to intrinsic-like subtypes.
Results: Eighty-five of 173 variants were associated with at least one tumor feature (false discovery rate < 5%), most commonly ER and grade, followed by PR and HER2. Models for intrinsic-like subtypes found nearly all of these variants (83 of 85) associated at p < 0.05 with risk for at least one luminal-like subtype, and approximately half (41 of 85) of the variants were associated with risk of at least one non-luminal subtype, including 32 variants associated with triple-negative (TN) disease. Ten variants were associated with risk of all subtypes, with differing magnitudes. Five variants were associated with risk of luminal A-like and TN subtypes in opposite directions.
Conclusion: This report demonstrates a high level of complexity in the etiologic heterogeneity of breast cancer susceptibility variants and can inform investigations of subtype-specific risk prediction.
Leukemia & Lymphoma
Purpose: Polycyclic aromatic hydrocarbons (PAHs) are a group of environmental pollutants associated with multiple cancers, including female breast cancer. Several xenobiotic metabolism genes (XMGs), including the CYP450 family, play an important role in activating and detoxifying PAHs, and variations in the activity of the enzymes they encode can impact this process. This study aims to examine the association between XMGs and breast cancer, and to assess whether these variants modify the effects of PAH exposure on breast cancer risk.
Methods: In a case-control study in Vancouver, British Columbia, and Kingston, Ontario, 1037 breast cancer cases and 1046 controls had DNA extracted from blood or saliva and genotyped for 138 single nucleotide polymorphisms (SNPs) and tagSNPs in 27 candidate XMGs. Occupational PAH exposure was assessed using a measurement-based job-exposure matrix.
Results: An association between genetic variants and breast cancer was observed among six XMGs, including increased risk among the minor allele carriers of AKR1C3 variant rs12387 (OR 2.71, 95% CI 1.42-5.19) and AKR1C4 variant rs381267 (OR 2.50, 95% CI 1.23-5.07). Heterogeneous effects of occupational PAH exposure were observed among carriers of AKR1C3/4 variants, as well as the PTGS2 variant rs5275.
Conclusion: Our findings support an association between SNPs of XMGs and female breast cancer, including novel genetic variants that modify the toxicity of PAH exposure. These results highlight the interplay between genetic and environmental factors, which can be helpful in understanding the modifiable risks of breast cancer and its complex etiology.
Journal of Translational Genetics and Genomics
Aim: Recessive genetic variation is thought to play a role in non-Hodgkin lymphoma (NHL) etiology. Runs of homozygosity (ROH), defined based on long, continuous segments of homozygous SNPs, can be used to estimate both measured and unmeasured recessive genetic variation. We sought to examine genome-wide homozygosity and NHL risk.
Methods: We used data from eight genome-wide association studies of four common NHL subtypes: 3061 chronic lymphocytic leukemia (CLL), 3814 diffuse large B-cell lymphoma (DLBCL), 2784 follicular lymphoma (FL), and 808 marginal zone lymphoma (MZL) cases, as well as 9374 controls. We examined the effect of homozygous variation on risk by: (1) estimating the fraction of the autosome containing runs of homozygosity (FROH); (2) calculating an inbreeding coefficient derived from the correlation among uniting gametes (F3); and (3) examining specific autosomal regions containing ROH. For each, we calculated beta coefficients and standard errors using logistic regression and combined estimates across studies using random-effects meta-analysis.
Results: We discovered positive associations between FROH and CLL (β = 21.1, SE = 4.41, P = 1.6 × 10⁻⁶) and FL (β = 11.4, SE = 5.82, P = 0.02) but not DLBCL (P = 1.0) or MZL (P = 0.91). For F3, we observed an association with CLL (β = 27.5, SE = 6.51, P = 2.4 × 10⁻⁵). We did not find evidence of associations with specific ROH, suggesting that the associations observed with FROH and F3 for CLL and FL risk were not driven by a single region of homozygosity.
Conclusion: Our findings support the role of recessive genetic variation in the etiology of CLL and FL; additional research is needed to identify the specific loci associated with NHL risk.
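Two of the quantities in this abstract lend themselves to a short worked sketch: FROH, the fraction of the autosome covered by runs of homozygosity, and the random-effects pooling of per-study estimates. The abstract does not say which random-effects estimator was used, so the DerSimonian-Laird estimator below is an assumption, as are the function names, the non-overlapping segment coordinates, the autosome length, and the per-study values, all of which are hypothetical.

```python
import numpy as np

def froh(roh_segments, autosome_length_bp):
    """Fraction of the autosome covered by ROH.

    roh_segments is a list of (start_bp, end_bp) pairs assumed not to overlap.
    """
    total_roh = sum(end - start for start, end in roh_segments)
    return total_roh / autosome_length_bp

def dersimonian_laird(betas, standard_errors):
    """Random-effects pooled estimate, its standard error, and tau^2."""
    b = np.asarray(betas, dtype=float)
    v = np.asarray(standard_errors, dtype=float) ** 2
    w = 1.0 / v                                   # fixed-effect (inverse-variance) weights
    fixed = np.sum(w * b) / np.sum(w)
    q = np.sum(w * (b - fixed) ** 2)              # Cochran's Q heterogeneity statistic
    c = np.sum(w) - np.sum(w ** 2) / np.sum(w)
    tau2 = max(0.0, (q - (len(b) - 1)) / c)       # between-study variance, truncated at 0
    w_re = 1.0 / (v + tau2)                       # random-effects weights
    pooled = np.sum(w_re * b) / np.sum(w_re)
    pooled_se = np.sqrt(1.0 / np.sum(w_re))
    return pooled, pooled_se, tau2

# Hypothetical individual: two ROH segments on a 35 Mb toy "autosome".
example_froh = froh([(1_000_000, 3_000_000), (10_000_000, 11_500_000)], 35_000_000)

# Hypothetical per-study beta coefficients and standard errors.
pooled, pooled_se, tau2 = dersimonian_laird([0.0, 2.0], [1.0, 1.0])
print(example_froh, pooled, pooled_se, tau2)
```

In an analysis like the one described, each study would first regress case status on FROH by logistic regression, and the resulting beta coefficients and standard errors would be the inputs to the pooling step.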
British Journal of Cancer
PTEN loss is a putative driver in histotypes of ovarian cancer (high-grade serous (HGSOC), endometrioid (ENOC), clear cell (CCOC), mucinous (MOC), low-grade serous (LGSOC)). We aimed to characterise PTEN expression as a biomarker in epithelial ovarian cancer in a large population-based study.
Tumours from 5400 patients from a multicentre observational, prospective cohort study of the Ovarian Tumour Tissue Analysis Consortium were used to evaluate associations between immunohistochemical PTEN patterns and overall survival time, age, stage, grade, residual tumour, CD8+ tumour-infiltrating lymphocytes (TIL) counts, expression of oestrogen receptor (ER), progesterone receptor (PR) and androgen receptor (AR) by means of Cox proportional hazard models and generalised Cochran–Mantel–Haenszel tests.
Downregulation of cytoplasmic PTEN expression was most frequent in ENOC (most frequently in younger patients; p value = 0.0001) and CCOC and was associated with longer overall survival in HGSOC (hazard ratio: 0.78, 95% CI: 0.65–0.94, p value = 0.022). PTEN expression was associated with ER, PR and AR expression (p values: 0.0008, 0.062 and 0.0002, respectively) in HGSOC and with lower CD8 counts in CCOC (p value < 0.0001). Heterogeneous expression of PTEN was more prevalent in advanced HGSOC (p value = 0.019) and associated with higher CD8 counts (p value = 0.0016).
PTEN loss is a frequent driver in ovarian carcinoma associating distinctly with expression of hormonal receptors and CD8+ TIL counts in HGSOC and CCOC histotypes.
The journals of gerontology. Series A, Biological sciences and medical sciences, 2020
The genetic basis of healthy aging and longevity remains largely unexplained. One hypothesis as to why long-lived individuals do not appear to have a lower number of common-complex disease variants is that, despite carrying risk variants, they express disease-linked alleles at a lower level than the wild-type alleles. Allele-specific abundance (ASA) is the different transcript abundance of the two haplotypes of a diploid individual. We sequenced the transcriptomes of four healthy centenarians and four mid-life controls. CIBERSORT was used to estimate blood cell fractions: neutrophils were the most abundant source of RNA, followed by CD8+ T cells, resting NK cells, and monocytes. ASA variants were more common in noncoding than coding regions. Centenarians and controls had a comparable distribution of ASA variants by predicted effect, and we did not observe an overall bias in expression toward major or minor alleles. Immune pathways were most highly represented among the gene set that showed ASA. Although we found evidence of ASA in disease-associated genes and transcription factors, we did not observe any differences in the pattern of expression between centenarians and controls in this small pilot study.
Bioinformatics (Oxford, England), 2020
We present the R package SimRVSequences to simulate sequence data for pedigrees. SimRVSequences allows for simulations of large numbers of single-nucleotide variants (SNVs) and scales well with increasing numbers of pedigrees. Users provide a sample of pedigrees and SNV data from a sample of unrelated individuals.
BMC geriatrics, 2019
Super-Seniors are healthy, long-lived individuals who were recruited at age 85 years or older with no history of cancer, cardiovascular disease, diabetes, dementia, or major pulmonary disease. In a 10-year follow-up, we aimed to determine whether surviving Super-Seniors showed compression of morbidity, and to test whether the allele frequencies of longevity-associated variants in APOE and FOXO3 were more extreme in such long-term survivors.
PloS one, 2019
Inflammation contributes to breast cancer development through its effects on cell damage. This damage is usually dealt with by key genes involved in apoptosis and autophagy pathways.
Scientific reports, 2018
Diffuse Large B-Cell Lymphoma (DLBCL) is an aggressive hematological cancer for which mitochondrial metabolism may play an important role. Mitochondrial DNA (mtDNA) encodes crucial mitochondrial proteins, yet the relationship between mtDNA and DLBCL remains unclear. We analyzed the functional consequences and mutational spectra of mtDNA somatic mutations and private constitutional variants in 40 DLBCL tumour-normal pairs. While private constitutional variants occurred frequently in the D-Loop, somatic mutations were randomly distributed across the mitochondrial genome. Heteroplasmic constitutional variants showed a trend towards loss of heteroplasmy in the corresponding tumour regardless of whether the reference or variant allele was being lost, suggesting that these variants are selectively neutral. The mtDNA mutational spectrum showed minimal support for ROS damage and revealed strand asymmetry with increased C > T and A > G transitions on the heavy strand, consistent with a replication-associated mode of mutagenesis. These heavy strand transitions carried higher proportions of amino acid changes - which were also more pathogenic - than equivalent substitutions on the light strand. Taken together, endogenous replication-associated events underlie mtDNA mutagenesis in DLBCL and preferentially generate functionally consequential mutations. Yet mtDNA somatic mutations remain selectively neutral, suggesting that mtDNA-encoded mitochondrial functions may not play an important role in DLBCL.
PloS one, 2018
To understand why some people live to advanced age in good health and others do not, it is important to study not only disease, but also long-term good health. The Super-Seniors Study aims to identify factors associated with healthy aging.
Source code for biology and medicine, 2018
Studies that ascertain families containing multiple relatives affected by disease can be useful for identification of causal, rare variants from next-generation sequencing data.
Leukemia & lymphoma, 2017
We studied 140 families with two or more lymphoid cancers, including non-Hodgkin lymphoma (NHL), Hodgkin lymphoma (HL), chronic lymphocytic leukemia (CLL), and multiple myeloma (MM), for deviation from the population age of onset and lymphoid cancer co-occurrence patterns. Median familial NHL, HL, CLL and MM ages of onset are substantially earlier than comparable population data. NHL, HL and CLL (but not MM) also show earlier age of onset in later generations, known as anticipation. The co-occurrence of lymphoid cancers is significantly different from that expected based on population frequencies (p < .0001), and the pattern differs more in families with more affected members (p < .0001), suggesting specific lymphoid cancer combinations have a shared genetic basis. These families provide evidence for inherited factors that increase the risk of multiple lymphoid cancers. This study was approved by the BC Cancer Agency - University of British Columbia Clinical Research Ethics Board.
Several studies have found that long-lived individuals do not appear to carry lower numbers of common disease-associated variants than ordinary people; it has been hypothesized that they may instead carry protective variants. An intriguing type of protective variant is buffering variants that protect against variants that have deleterious effects. We genotyped 18 variants in 15 genes related to longevity or healthy aging that had been previously reported as having a gene-gene interaction or buffering effect. We compared a group of 446 healthy oldest-old 'Super-Seniors' (individuals 85 or older who have never been diagnosed with cancer, cardiovascular disease, dementia, diabetes or major pulmonary disease) to 421 random population-based midlife controls. Cases and controls were of European ancestry. Association tests of individual SNPs showed that Super-Seniors were less likely than controls to carry an APOEε4 allele or a haptoglobin HP2 allele. Interactions between APOE/FOXO3, APOE/CRYL1, and LPA/CRYL1 did not remain significant after multiple testing correction. In a network analysis of the candidate genes, lipid and cholesterol metabolism was a common theme. APOE, HP, and CRYL1 have all been associated with Alzheimer's Disease, the pathology of which involves lipid and cholesterol pathways. Age-related changes in lipid and cholesterol maintenance, particularly in the brain, may be central to healthy aging and longevity.