Aim: run univariate and bivariate analyses using eQTL data for select loci

1 Background

Bivariate local \(r_{g}\) analyses revealed several loci (representing LD blocks) where significant \(r_{g}\)'s were found between multiple trait pairs. Of note, some LD blocks contained local \(r_{g}\)'s between neurodegenerative traits and between neuropscyhiatric traits, but with no overlap between the disorder groups. In addition, we observed LD blocks where several traits were associated with one trait in particular, but directions of effect were opposing (e.g. locus 1719, where BIP and MDD were positively correlated with SCZ, while LBD was negatively correlated with SCZ; or locus 2351, where AD and PD were positively and negatively correlated with LBD, respectively). Assuming that these assocations also correlate with expression quantitative trait loci (eQTLs), such differences may be driven by associations to different gene eQTLs.

2 Supplementary code

Following section includes any intermediary code used in this .Rmd.

2.1 Get genic regions

source(here::here("scripts", "04a_get_gene_loci.R"))

All genic regions are shown below.

gene_loci <- 
  read_delim(
    file = here::here("results", "04_eqtl_univar_bivar", "window_100000", "gene_filtered.loci"),
    delim = "\t"
    )

gene_loci %>% 
  DT::datatable(rownames = FALSE,
                options = list(scrollX = TRUE),
                class = 'white-space: nowrap')

2.2 Pre-processing eQTL data

source(here::here("scripts", "04b_preprocess_eqtl.R"))

2.3 Get sample overlap

source(here::here("scripts", "04c_get_sample_overlaps.R"))

Subset of sample overlap matrix shown below.

sample_overlap <-
  read.table(
    file = here::here("results", "04_eqtl_univar_bivar", "sample_overlap.txt")
    )

as.matrix(sample_overlap)[1:10,1:10]

##                          AD2019  BIP2021 LBD2020 MDD2019 PD2019.meta5.ex23andMe
## AD2019                  1.00000  0.00628 0.02589 0.01946                0.01193
## BIP2021                 0.00628  1.00000 0.00800 0.05997               -0.00150
## LBD2020                 0.02589  0.00800 1.00000 0.00040                0.01281
## MDD2019                 0.01946  0.05997 0.00040 1.00000                0.00436
## PD2019.meta5.ex23andMe  0.01193 -0.00150 0.01281 0.00436                1.00000
## SCZ2018                 0.01150  0.12928 0.00661 0.03615                0.00295
## EQTLGEN_ENSG00000007047 0.00000  0.00000 0.00000 0.00000                0.00000
## EQTLGEN_ENSG00000007255 0.00000  0.00000 0.00000 0.00000                0.00000
## EQTLGEN_ENSG00000041353 0.00000  0.00000 0.00000 0.00000                0.00000
## EQTLGEN_ENSG00000048028 0.00000  0.00000 0.00000 0.00000                0.00000
##                         SCZ2018 EQTLGEN_ENSG00000007047 EQTLGEN_ENSG00000007255
## AD2019                  0.01150                       0                       0
## BIP2021                 0.12928                       0                       0
## LBD2020                 0.00661                       0                       0
## MDD2019                 0.03615                       0                       0
## PD2019.meta5.ex23andMe  0.00295                       0                       0
## SCZ2018                 1.00000                       0                       0
## EQTLGEN_ENSG00000007047 0.00000                       1                       0
## EQTLGEN_ENSG00000007255 0.00000                       0                       1
## EQTLGEN_ENSG00000041353 0.00000                       0                       0
## EQTLGEN_ENSG00000048028 0.00000                       0                       0
##                         EQTLGEN_ENSG00000041353 EQTLGEN_ENSG00000048028
## AD2019                                        0                       0
## BIP2021                                       0                       0
## LBD2020                                       0                       0
## MDD2019                                       0                       0
## PD2019.meta5.ex23andMe                        0                       0
## SCZ2018                                       0                       0
## EQTLGEN_ENSG00000007047                       0                       0
## EQTLGEN_ENSG00000007255                       0                       0
## EQTLGEN_ENSG00000041353                       1                       0
## EQTLGEN_ENSG00000048028                       0                       1

2.4 Prepare input info file

# Most input info already available, just need to add eqtls
input_info  <- 
  read_delim(
    file = file.path(here::here("results", "01_input_prep"),
                     "input.info.txt"),
    delim = "\t"
  )

file_paths <- 
  list.files(
      path = here::here("results", "04_eqtl_univar_bivar", "qtl_files"),
      pattern = "lava.gz",
      full.names = T
    )

input_info <-
  tibble(
    phenotype = 
      basename(file_paths) %>% 
      str_remove(".lava.gz"),
    cases = rep(1, times = length(file_paths)),
    controls = rep(0, times = length(file_paths)),
    filename = file_paths
  ) %>% 
  dplyr::bind_rows(
    input_info
  )

write_delim(
  input_info,
  path = file.path(here::here("results", "04_eqtl_univar_bivar"),
                   "input.info.txt"),
  delim = "\t"
)

2.5 Run univariate and bivariate tests

This was run using nohup:

# Have to navigate to root project folder for script to work (as it uses here package)
cd /home/rreynolds/misc_projects/neurodegen-psych-local-corr

nohup Rscript \
/home/rreynolds/misc_projects/neurodegen-psych-local-corr/scripts/04d_run_univar_bivar_test.R \
&>/home/rreynolds/misc_projects/neurodegen-psych-local-corr/logs/04d_run_univar_bivar_test_multwindows.log&

2.6 Tidying, applying multiple test correction and loading results

Once run, results can be loaded using the following code chunk:

source(here::here("scripts", "04e_tidy_univar_bivar.R"))

results <- readRDS(here::here("results", "04_eqtl_univar_bivar", "results_summary.rds"))

2.7 Partial correlations

source(here::here("scripts", "04f_run_partial_corr.R"))

pcorr <-
  readRDS(here::here("results", "04_eqtl_univar_bivar", "window_100000", "pcor", "eqtl_pcor.partcorr.lava.rds")) %>% 
  lapply(., function(list){
    
    list %>%
      qdapTools::list_df2df(col1 = "list_name") %>%
      dplyr::mutate(
        eqtl_dataset =
          case_when(
            stringr::str_detect(phen1, "PSYCH") |
              stringr::str_detect(phen2, "PSYCH") |
              stringr::str_detect(z, "PSYCH") ~ "PSYCHENCODE",
            stringr::str_detect(phen1, "EQTLGEN") |
              stringr::str_detect(phen2, "EQTLGEN") |
              stringr::str_detect(z, "EQTLGEN") ~ "EQTLGEN"
          )
      ) %>%
      dplyr::mutate(
        across(
          .cols = phen1:z,
          ~ case_when(
            !str_detect(.x, "ENSG") ~ .x %>%
              str_replace_all("[:digit:]", "") %>%
              str_remove("\\..*"),
            str_detect(.x, "ENSG") ~ str_remove(.x, ".*_")
          )
        )
      )
    
  }) %>%
  qdapTools::list_df2df(col1 = "list_name_2") %>%
  dplyr::select(
    eqtl_dataset,
    gene_locus,
    gene_name,
    everything(),
    -contains("list_name")
  ) %>%
  as_tibble()

2.8 Loading plotting functions

source(here::here("R", "plot_chord_diagram.R"))
source(here::here("R", "plot_locus.R"))
source(here::here("R", "plot_edge_diagram.R"))

3 Methods

3.1 Deriving loci for eQTL analyses

Firstly, we determined a number of LD blocks of interest from the LD blocks highlighted by bivariate local \(r_{g}\) analyses. LD blocks included:

Two LD blocks where AD and PD were correlated, but the correlation was either positive or negative.
- Locus 681, containing SNCA, which is of interest to PD
- Locus 1273, containing CLU, which is of interest to AD
An LD block where correlations were observed among neurodegenerative traits and among neuropsychiatric traits, but not between disorder groups, i.e. locus 2281, containing RAB27B
Two LD blocks where several traits were associated with one trait in particular, but directions of effect were opposing.
- Locus 1719, containing DRD2
- Locus 2351, containing APOE

From these LD blocks of interest, we defined genic regions. That is, a 100-kb window was added to the start and end of any protein-coding, antisense or lincRNA gene that overlapped an LD block of of interest (window used as most cis-eQTLs lie outside gene body, but within 100 kb of the gene, see Figure 2 in PMID: 34475573). These genic regions (\(n\) = 92) were carried forward in downstream analyses. Further, to ensure that our results were not driven by window size, we re-ran all analyses using a 50 kb window.

3.2 Sample overlap

Due to the potential sample overlap between cohorts and its impact on any estimated correlations, it is advised to use cross-trait LDSC (PMID:25642630; PMID:26414676) to obtain an estimate of the sample overlap. This, however, is not possible with eQTL summary statistics. Provided a reasonable certainty that there is no sample overlap between eQTL and GWAS summary statistics, it can be assumed to be zero (which precludes any correlations between eQTLGen and PsychENCODE, as both use GTEx samples, albeit from different tissues). This, however, needs double-checking in terms of eQTL and GWAS cohorts.

3.3 Running univariate and bivariate tests using eQTL summary statistics

For each genic region, only those traits that were found to have significant local \(r_{g}\) in the associated LD block were used for univariate and bivariate analyses with eQTL summary statistics. As described in 02_run_univar_bivar_test.html, a univariate test was performed as a filtering step for bivariate local \(r_{g}\) analyses. Bivariate local correlations were only performed (i) if the eQTL within the genic region exhibited a significant univariate local genetic signal and (ii) for pairs of traits which both exhibited a significant univariate local genetic signal. A cut-off of p < 0.05/92 was used to determine univariate significance. Using a 100-kb window, this resulted in a total of 354 bivariate tests spanning 55 distinct genic regions (with a 50-kb window, a total of 267 bivariate tests spanning 50 distinct genic regions). Bivariate results were multiple test corrected using two strategies: (i) a more stringent Bonferroni correction (i.e. p < 0.05/n bivariate tests) and (ii) a more lenient FDR correction.

3.4 Partial correlations

Partial correlations were computed where three-way relationships were observed between 2 disease traits and an eQTL. The partial correlation reflects the correlation between 2 traits (e.g. disease X and Y) that can be explained by a third trait (e.g. eQTL, Z). Thus, a partial correlation approaching 0 suggests that trait Z captures an increasing proportion of the correlation between traits X and Y. Due to the three-way nature of the relationships, 3 possible conformations were possible (i.e. X~Y|Z, X~Z|Y and Y~Z|X); partial correlations were computed for all 3.

4 Results

4.1 Tabular results

4.1.1 Univariate results

print("Univariate column descriptions:")

## [1] "Univariate column descriptions:"

tibble(
  column = colnames(results$univ$window_100000),
  description = 
    c(
      "LD block",
      "eQTL dataset used",
      "Genic region as identified by ensembl ID",
      "Gene name",
      "Genic region chromosome",
      "Genic region start base pair",
      "Genic region end base pair",
      "The number of SNPs within the locus",
      "The number of PCs retained within the locus",
      "Analysed phenotype",
      "Observed local heritability",
      "P-value from the univariate test (F-test for continuous, Chi-sq for binary)"
    )
) %>%
  DT::datatable(rownames = FALSE,
                options = list(scrollX = TRUE),
                class = 'white-space: nowrap')

print("Univariate results (p < 0.05/n_loci):")

## [1] "Univariate results (p < 0.05/n_loci):"

results$univ %>% 
  qdapTools::list_df2df(col1 = "window_size") %>% 
  dplyr::filter(p < 0.05/nrow(gene_loci)) %>%
  DT::datatable(rownames = FALSE,
                options = list(scrollX = TRUE),
                class = 'white-space: nowrap')

4.1.2 Bivariate results

Multiple test correction is complicated by the many overlapping genic regions and the fact that we're repeatedly testing the same disease phenotypes within one LD block (and simply substituting the eQTL-gene)
Thus, applied to multiple test correction strategies:
- A stringent Bonferroni, correcting for all bivariate tests performed
- A lenient FDR, correcting for all bivariate tests performed

# Pivot results
results_long <- 
  results$bivar %>% 
  qdapTools::list_df2df(col1 = "window_size") %>% 
  tidyr::pivot_longer( 
    cols = -c("window_size", "ld_block", "eqtl_dataset", "gene_locus", "gene_name", "phen1", "phen2"),
    names_to = "feature",
    values_to = "value"
  ) %>% 
  tidyr::pivot_wider(names_from = "window_size") %>% 
  dplyr::mutate(
    across(
      .cols = contains("window"),
      ~ case_when(
        feature == "p" ~ -log10(.x),
        TRUE ~ .x
      )
    )
  ) %>%
  dplyr::mutate(
    feature = 
      case_when(
        feature == "p" ~ "-log10(p)",
        TRUE ~ feature
      ), 
    phenotype_corr =
      case_when(
        str_detect(phen2, "ENSG") ~ "gwas-eqtl",
        TRUE ~ "gwas-gwas"
      ),
    doe_100000 =
      case_when(
        feature == "rho" & window_100000 < 0 ~ "-",
        feature == "rho" & window_100000 > 0 ~ "+"
      ),
    doe_50000 =
      case_when(
        feature == "rho" & window_50000 < 0 ~ "-",
        feature == "rho" & window_50000 > 0 ~ "+" 
      ),
    doe_same =
      case_when(
        doe_100000 == doe_50000 ~ TRUE,
        doe_100000 != doe_50000 ~ FALSE
      )
  )

print("Bivariate column descriptions:")

## [1] "Bivariate column descriptions:"

tibble(
  column = colnames(results$bivar$window_100000),
  description = 
    c(
      "LD block",
      "eQTL dataset used",
      "Genic region as identified by ensembl ID",
      "Gene name",
      "Genic region chromosome",
      "Genic region start base pair",
      "Genic region end base pair",
      "The number of SNPs within the locus",
      "The number of PCs retained within the locus",
      "Phenotype 1",
      "Phenotype 2",
      "Standardised coefficient for the local genetic correlation",
      "Lower 95% confidence estimate for rho",
      "Upper 95% confidence estimate for rho",
      "Equivalent of taking the square of rho. Denotes the proportion of variance in genetic signal for phen1 explained by phen2 (and vice versa)",
      "Lower 95% confidence estimate for r2",
      "Upper 95% confidence estimate for r2",
      "Simulation p-values for the local genetic correlation",
      "FDR-corrected p-values",
      "Bonferroni-corrected p-values"
    )
) %>%
  DT::datatable(rownames = FALSE,
                options = list(scrollX = TRUE),
                class = 'white-space: wrap')

print("Bivariate results (FDR < 0.05):")

## [1] "Bivariate results (FDR < 0.05):"

bivar_fdr <-
  results$bivar %>% 
  lapply(., function(df) df %>% dplyr::filter(fdr < 0.05))
  

bivar_fdr %>% 
  qdapTools::list_df2df(col1 = "window_size") %>% 
  DT::datatable(rownames = FALSE,
                options = list(scrollX = TRUE),
                class = 'white-space: nowrap')

print("Bivariate results (Bonferroni p < 0.05):")

## [1] "Bivariate results (Bonferroni p < 0.05):"

bivar_bonf <-
  results$bivar %>% 
  lapply(., function(df) df %>% dplyr::filter(bonferroni < 0.05))

bivar_bonf %>% 
  qdapTools::list_df2df(col1 = "window_size") %>% 
  DT::datatable(rownames = FALSE,
                options = list(scrollX = TRUE),
                class = 'white-space: nowrap')

4.1.3 Overlapping bivariate results between windows

This includes all results that were FDR < 0.05 when using either a window size of 50 kb or 100 kb. Note that there may be some genic regions where a significant genetic correlation was observed using one window size, but not the other.

overlap <- 
  bivar_fdr$window_100000 %>%
  dplyr::select(
    -chr, -start, -stop, -contains("n_"), -contains("lower"), -contains("upper")
  ) %>% 
  dplyr::left_join(
    results$bivar$window_50000 %>% 
      dplyr::select(
        -chr, -start, -stop, -contains("n_"), -contains("lower"), -contains("upper")
      ),
    by = c("ld_block", "eqtl_dataset", "gene_locus", "gene_name", "phen1", "phen2"),
    suffix = c("_100000", "_50000")
  ) %>% 
  dplyr::bind_rows(
    bivar_fdr$window_50000 %>%
      dplyr::select(
        -chr, -start, -stop, -contains("n_"), -contains("lower"), -contains("upper")
      ) %>% 
      dplyr::left_join(
        results$bivar$window_100000 %>% 
          dplyr::select(
            -chr, -start, -stop, -contains("n_"), -contains("lower"), -contains("upper")
          ),
        by = c("ld_block", "eqtl_dataset", "gene_locus", "gene_name", "phen1", "phen2"),
        suffix = c("_50000", "_100000")
      )
  ) %>% 
  select(
    ld_block:phen2, contains("rho"), contains("r2"), contains("p"), contains("fdr"), contains("bonferroni")
  ) %>% 
  dplyr::distinct()

overlap %>% 
  dplyr::arrange(ld_block, gene_name) %>% 
  DT::datatable(rownames = FALSE,
                options = list(scrollX = TRUE),
                class = 'white-space: nowrap')

# saveRDS(
#   overlap,
#   here::here("results", "04_eqtl_univar_bivar", "fdr_overlap.rds")
# )

4.1.4 Partial correlations

print("Partial correlations column descriptions:")

## [1] "Partial correlations column descriptions:"

tibble(
    column = colnames(pcorr),
    description = 
      c("eQTL dataset from which gene eQTLs are derived",
        "Ensembl ID for gene for which eQTLs were tested in genic region",
        "HGNC symbol for gene for which eQTLs were tested in genic region",
        "Genic region chromosome",
        "Genic region start base pair",
        "Genic region end base pair",
        "The number of SNPs within the genic region",
        "The number of PCs retained within the genic region",
        "Phenotype 1. Gene expression traits (i.e. gene eQTLs) are denoted by their ensembl gene id",
        "Phenotype 2. Gene expression traits (i.e. gene eQTLs) are denoted by their ensembl gene id",
        "Phenotype which the genetic correlation between phenotype 1 and 2 was conditioned on",
        "The proportion of genetic signal in phenotype 1 explained by z",
        "The proportion of genetic signal in phenotype 2 explained by z",
        "The partial correlation between phen1 and phen2 conditioned on z",
        "Lower bound of 95% confidence interval for partial correlation",
        "Upper bound of 95% confidence interval for partial correlation",
        "Simulation p-values for the partial genetic correlation"
      )
  ) %>% 
  DT::datatable(rownames = FALSE,
                options = list(scrollX = TRUE),
                class = 'white-space: nowrap')

print("All partial correlations:")

## [1] "All partial correlations:"

pcorr %>% 
  DT::datatable(rownames = FALSE,
                options = list(scrollX = TRUE),
                class = 'white-space: nowrap')

4.2 Comparison of bivariate local \(r_{g}\)'s across 50-kb and 100-kb windows

4.2.1 Text

Bivariate \(r^{2}\), \(\rho\) and -log10(p-value) estimates between phenotypes that could be tested using genic regions with a 50-kb or 100-kb window were significantly positively correlated across the two window sizes (Figure 4.1).
- Notably, \(r^{2}\) estimates (i.e. the proportion of variance in genetic signal for one phenotype explained by the other) tended to be higher when using the 50-kb window, as compared to the 100-kb window, as evidenced by the fitted line falling below the equivalent of \(y = x\).
- P-values between GWAS and eQTLs tended to be more significant when using the 50-kb window, as compared to the 100-kb window, as evidenced by the fitted line falling below the equivalent of \(y = x\). In contrast, p-values between gwas traits were comparable across the two window sizes.
- While rho estimates between the two window sizes were positively correlated, 30 local correlations (of the total 254 testable) did not share the same direction of effect.
Using the more lenient FDR cut-off, we detect a total of 83 significant bivariate local \(r_{g}\)'s across 41 distinct genic regions, which replicate irrespective of the window size used to define genic regions (Figure 4.2).
- This equates to 51% of the significant 162 bivariate local \(r_{g}\)'s across both window sizes.
- Alternatively, this equates to 61% of the 135 significant bivariate local \(r_{g}\)'s using the 100-kb window and 75% of the 110 significant bivariate local \(r_{g}\)'s using the 50-kb window.
By comparison, using the more stringent Bonferroni cut-off, we detect a total of 42 significant bivariate local \(r_{g}\)'s across 29 distinct genic regions, which replicate irrespective of the window size used to define genic regions (Figure 4.2).
- This equates to 48% of the significant 88 bivariate local \(r_{g}\)'s across both window sizes.
- Alternatively, this equates to 57% of the 74 significant bivariate local \(r_{g}\)'s using the 100-kb window and 75% of the 56 significant bivariate local \(r_{g}\)'s using the 50-kb window.
As seen with \(r^{2}\) estimates from all phenotype pairs that could be tested across both window sizes, bivariate local \(r_{g}\)'s that were significant (Bonferroni-corrected p < 0.05) across both window sizes tended to have a higher \(r^{2}\) estimate when using the 50-kb window, as compared to the 100-kb window (Figure 4.3).
Irrespective of stringency used, more significant bivariate local \(r_{g}\)'s were identified using the 100-kb window. Further, there were some significant bivariate local \(r_{g}\)'s that were only identified when using one of the window sizes (Figure 4.2).

4.2.2 Figures

results_long %>% 
  dplyr::filter(
    !feature %in%
      c("chr", "start", "stop", "fdr", "bonferroni", "n_pcs", "n_snps"),
    !str_detect(feature, "upper"),
    !str_detect(feature, "lower")
  ) %>%
  ggplot(
    aes(
      x = window_50000,
      y = window_100000
    )
  ) + 
  geom_point(
    aes(
      fill = doe_same
    ),
    shape = 21,
    colour = "black",
    alpha = 0.4
  ) + 
  geom_smooth(
    method = "lm",
    formula = "y~x",
    level = 0.99,
    colour = "black",
    fill = "grey"
  ) +
  ggpubr::stat_cor(
    method = "pearson",
    cor.coef.name = "R"
  ) +
  geom_abline(
    intercept = 0,
    linetype = "dashed",
    colour = "#EE442F"
  ) +
  facet_wrap(
    vars(phenotype_corr, feature),
    scales = "free"
  ) +
  labs(
    x = "50-kb window",
    y = "100-kb window"
  ) +
  scale_fill_discrete(na.translate = F) +
  theme_rhr

Scatter plot of -log10(p-value), r2 and rho for each pair of phenotypes that could be tested across genic regions with a 50-kb or 100-kb window. Points are coloured, where applicable, by whether they share the same direction of effect. The black line represents a linear model fitted to the data, with the 99% confidence interval indicated with a grey fill. The red dashed line represents the line y = x.

Figure 4.1: Scatter plot of -log10(p-value), r2 and rho for each pair of phenotypes that could be tested across genic regions with a 50-kb or 100-kb window. Points are coloured, where applicable, by whether they share the same direction of effect. The black line represents a linear model fitted to the data, with the 99% confidence interval indicated with a grey fill. The red dashed line represents the line y = x.

results$bivar %>% 
  qdapTools::list_df2df(col1 = "window_size") %>% 
  tidyr::pivot_longer( 
    cols = -c("window_size", "ld_block", "eqtl_dataset", "gene_locus", "gene_name", "phen1", "phen2"),
    names_to = "feature",
    values_to = "value"
  ) %>% 
  dplyr::mutate(
    window_size =
      window_size %>% 
      readr::parse_number() %>% 
      format(scientific = FALSE) %>% 
      as.factor()
  ) %>% 
  dplyr::filter(
    feature %in% c("bonferroni", "fdr"),
    value < 0.05
  ) %>% 
  dplyr::count(window_size, feature, name = "n_total") %>% 
  dplyr::mutate(
    unique =
      case_when(
        feature == "bonferroni" ~ n_total - (overlap %>% dplyr::filter(bonferroni_100000 < 0.05 & bonferroni_50000 < 0.05) %>% nrow()),
        feature == "fdr" ~ n_total - (overlap %>% dplyr::filter(fdr_100000 < 0.05 & fdr_50000 < 0.05) %>% nrow())
      ),
    shared =
      n_total-unique
  ) %>% 
  tidyr::pivot_longer(
    cols = unique:shared,
    names_to = "sharing",
    values_to = "n"
  ) %>% 
  ggplot(
    aes(
      x = window_size,
      y = n,
      fill = sharing
    )
  ) + 
  geom_col() +
  facet_grid(cols = vars(feature)) +
  labs(
    x = "Window size (kb)",
    y = "Number of significant bivariate local rg"
  ) +
  theme_rhr

Number of significant bivariate local genetic correlations across window sizes, using either the stringent bonferroni cut-off or the more lenient FDR cut-off. Bars are coloured by whether local genetic correlations are significant across both window sizes (shared) or only one (unique).

Figure 4.2: Number of significant bivariate local genetic correlations across window sizes, using either the stringent bonferroni cut-off or the more lenient FDR cut-off. Bars are coloured by whether local genetic correlations are significant across both window sizes (shared) or only one (unique).

a <-
  results_long %>% 
  inner_join(
    overlap %>%
      dplyr::filter(fdr_100000 < 0.05 & fdr_50000 < 0.05) %>% 
      dplyr::select(ld_block, eqtl_dataset, gene_locus, gene_name, phen1, phen2)
  ) %>% 
  dplyr::filter(feature == "r2") %>% 
  dplyr::rename(
    `50` = window_50000,
    `100` = window_100000
  ) %>% 
  ggpubr::ggpaired(
    cond1 = "50",
    cond2 = "100",
    fill = "condition",
    line.color = "gray", 
    line.size = 0.4,
    ggtheme = theme_rhr
  ) + 
  labs(
    x = "Window size (kb)",
    y = "r2"
  ) +
  theme(legend.position = "none")

b <- 
  results_long %>% 
  inner_join(
    overlap %>%
      dplyr::filter(fdr_100000 < 0.05 & fdr_50000 < 0.05) %>% 
      dplyr::select(ld_block, eqtl_dataset, gene_locus, gene_name, phen1, phen2)
  ) %>%
  dplyr::filter(
    feature %in% c("r2"), 
    !is.na(window_50000), 
    !is.na(window_100000)
  ) %>% 
  dplyr::mutate(
    diff = window_100000 - window_50000
    ) %>% 
  ggplot(aes(x = diff)) + 
  geom_density() + 
  labs(
    x = "Difference in r2 between two window sizes\n(diff = 100 kb - 50 kb)",
    y = "Density"
  ) +
  theme_rhr

ggpubr::ggarrange(
  a,b,
  labels = letters[1:2]
)

(a) Boxplot displaying r2 estimates between phenotype pairs that demonstrated significant bivariate local genetic correlation (FDR < 0.05) across both window sizes. Grey lines indicate paired estimates. (b) Density plot displaying difference in r2 estimates between phenotype pairs that demonstrated significant bivariate local genetic correlation (FDR < 0.05) across both window sizes.

Figure 4.3: (a) Boxplot displaying r2 estimates between phenotype pairs that demonstrated significant bivariate local genetic correlation (FDR < 0.05) across both window sizes. Grey lines indicate paired estimates. (b) Density plot displaying difference in r2 estimates between phenotype pairs that demonstrated significant bivariate local genetic correlation (FDR < 0.05) across both window sizes.

4.3 Significant bivariate local \(r_{g}\)'s at a global level (100-kb window)

4.3.1 Text

Using the more lenient FDR cut-off, we detect a total of 135 significant bivariate local \(r_{g}\)'s across 47 distinct genic regions, 43 of which involved a GWAS-eQTL pairing (27 distinct genic regions). When pairs of disease phenotypes (which were tested across multiple genes within an LD block) are collapsed within an LD block, we detect a total of 51 significant bivariate local \(r_{g}\)'s.
By comparison, using the more stringent Bonferroni cut-off, we detect a total of 74 significant bivariate local \(r_{g}\)'s across 41 distinct genic regions, 19 of which involved a GWAS-eQTL pairing (17 distinct genic regions). When pairs of disease phenotypes (which were tested across multiple genes within an LD block) are collapsed within an LD block, we detect a total of 25 significant bivariate local \(r_{g}\)'s.
Of the two eQTL datasets, the dataset with the highest number of genic regions (i.e. a gene where eQTLs were detected) was PsychENCODE (eQTLGen, n genic regions = 47; PsychENCODE, n genic regions = 87). In spite of this, the eQTL dataset with highest number of significant bivariate local \(r_{g}\)'s was eQTLGen, irrespective of the stringency of multiple test correction (Figure 4.5, 4.4). This likely reflects the bias in the selection of the LD blocks, which were weighted towards the AD/PD phenotypes, where we might expect to see a greater association with blood-derived molecular phenotypes.
Given that genic regions might vary in size, we plotted the number of SNPs within a locus and the p-value from the univariate test to determine whether there was a relationship between locus size and univariate p-value. There was no apparent relationship between the two (as determined by visual inspection, Figure 4.6).

4.3.2 Figures

data_to_plot <-
  bivar_fdr$window_100000 %>% 
  dplyr::filter(!str_detect(phen2, "ENSG")) %>% 
  dplyr::distinct(ld_block, phen1, phen2, rho) %>% 
  dplyr::group_by(ld_block, phen1, phen2) %>% 
  dplyr::summarise(rho = mean(rho)) %>% 
  dplyr::bind_rows(
    bivar_fdr$window_100000 %>% 
      dplyr::filter(str_detect(phen2, "ENSG")) %>% 
      dplyr::distinct(eqtl_dataset, ld_block, phen1, phen2, rho) %>% 
      dplyr::mutate(
        phen2 = 
          case_when(
            str_detect(phen2, "ENSG") ~ eqtl_dataset,
            TRUE ~ phen2
          )
      ) %>% 
      dplyr::select(ld_block, phen1, phen2, rho)
  ) %>% 
  dplyr::ungroup()

plot_bivar_chord_diagram(
  bivar_corr = data_to_plot,
  fct_phen = c(fct_disease, "EQTLGEN", "PSYCHENCODE"),
  palette = c(RColorBrewer::brewer.pal(n = 6, name = "BrBG"), "black", "grey")
)

Chord diagram showing the number of **distinct** significant bivariate local genetic correlations between each of the traits across all LD blocks (FDR-corrected p < 0.05). Pairs of disease phenotypes (which were tested across multiple genes within an LD block) were collapsed and their average rho within the LD block used. Positive and negative correlations are coloured red and blue, respectively.

Figure 4.4: Chord diagram showing the number of distinct significant bivariate local genetic correlations between each of the traits across all LD blocks (FDR-corrected p < 0.05). Pairs of disease phenotypes (which were tested across multiple genes within an LD block) were collapsed and their average rho within the LD block used. Positive and negative correlations are coloured red and blue, respectively.

data_to_plot <-
  bivar_bonf$window_100000 %>% 
  dplyr::filter(!str_detect(phen2, "ENSG")) %>% 
  dplyr::distinct(ld_block, phen1, phen2, rho) %>% 
  dplyr::group_by(ld_block, phen1, phen2) %>% 
  dplyr::summarise(rho = mean(rho)) %>% 
  dplyr::bind_rows(
    bivar_bonf$window_100000 %>% 
      dplyr::filter(str_detect(phen2, "ENSG")) %>% 
      dplyr::distinct(eqtl_dataset, ld_block, phen1, phen2, rho) %>% 
      dplyr::mutate(
        phen2 = 
          case_when(
            str_detect(phen2, "ENSG") ~ eqtl_dataset,
            TRUE ~ phen2
          )
      ) %>% 
      dplyr::select(ld_block, phen1, phen2, rho)
  ) %>% 
  dplyr::ungroup()

plot_bivar_chord_diagram(
  bivar_corr = data_to_plot,
  fct_phen = c(fct_disease, "EQTLGEN", "PSYCHENCODE"),
  palette = c(RColorBrewer::brewer.pal(n = 6, name = "BrBG"), "black", "grey")
)

Figure 4.5: Chord diagram showing the number of distinct significant bivariate local genetic correlations between each of the traits across all LD blocks (Bonferroni-corrected p < 0.05). Pairs of disease phenotypes (which were tested across multiple genes within an LD block) were collapsed and their average rho within the LD block used. Positive and negative correlations are coloured red and blue, respectively.

results$univ$window_100000 %>% 
  dplyr::mutate(
    phen =
      case_when(
        str_detect(phen, "ENSG") ~ eqtl_dataset,
        TRUE ~ phen
      )
  ) %>% 
  ggplot(
    aes(
      x = n_snps,
      y = -log10(p)
    )
  ) +
  geom_point(alpha = 0.5) +
  scale_x_log10() +
  scale_colour_manual(
    values = c(RColorBrewer::brewer.pal(n = 6, name = "BrBG"), "black", "grey")
  ) +
  facet_wrap(vars(phen)) +
  labs(
    x = "Number of SNPs within the locus (logarithmic scale)",
    y = "-log10(p-value)"
  ) +
  theme_rhr

Figure 4.6: Scatter plot of number of SNPs in a tested genic region and the p-value from the univariate test for each trait.

bivar_fdr$window_100000 %>% 
  dplyr::mutate(
    phenotype_corr =
      case_when(
        str_detect(phen2, "ENSG") ~ "gwas-eqtl",
        TRUE ~ "gwas-gwas"
      )
  ) %>% 
  ggplot(
    aes(
      x = phenotype_corr,
      y = r2
      )
    ) +
  geom_boxplot(fill = "#d9d9d9", outlier.alpha = 0) +
  ggbeeswarm::geom_beeswarm(alpha = 0.5) +
  theme_rhr

Boxplot of explained variance (r2, the proportion of variance in genetic signal of one disease trait in a pair explained by the other) in trait pairs involved a disease and gene expression trait (gwas-eqtl) or two disease traits (gwas-gwas). Only local rgs that passed significance are plotted (FDR < 0.05).

Figure 4.7: Boxplot of explained variance (r2, the proportion of variance in genetic signal of one disease trait in a pair explained by the other) in trait pairs involved a disease and gene expression trait (gwas-eqtl) or two disease traits (gwas-gwas). Only local rgs that passed significance are plotted (FDR < 0.05).

4.4 Phenotype correlations at genic region level (100-kb window) using lenient FDR multiple test correction

4.4.1 Text

Given the overlap between genic regions, significance was determined using the more lenient FDR multiple test correction. Results that do not replicate using the more stringent Bonferroni multiple test correction should be interpreted with caution.

Locus 681
- Across the entire LD block, a negative local \(r_{g}\) was observed between AD and PD.
- When locus 681 was subsetted by genic region, a significant negative \(r_{g}\) between AD and PD was only observed in the SNCA genic region.
- In addition, a significant negative \(r_{g}\) was observed between PD and eQTLGen-derived SNCA eQTLs (implying that increased PD risk is correlated with decreased SNCA expression).
  - While initially this may seem contrary to expectation, given that SNCA duplication/triplication events cause familial PD, it is important to bear in mind that this correlation is using blood-derived SNCA eQTLs. SNCA transcript counts in circulating blood cells have been observed to be reduced in early PD, an association that was seen in two PD case-control studies (Harvard Biomarker Study and the multicentre PROBE study) and a study of patients near disease onset (PPMI; PMID: 26220939). Importantly, this reduction in indviduals with PD remained significant after adjusting for covariates of gender, white and red blood cell counts.
  - Worth noting that the \(r^2\) (proportion of variance in genetic signal for one phenotype explained by the other) was \(r^2\) = 0.02 (as compared to the \(r^2\) between AD and PD at this locus, which was \(r^2\) = 0.3).
Locus 1273
- Across the entire LD block, a positive local \(r_{g}\) was observed between AD and PD.
- When locus 1273 was subsetted by genic region, a significant positive local \(r_{g}\) was observed between AD and PD in 6 out of 15 genic regions (in the remaining genic regions, local \(r_{g}\)'s between AD and PD were non-significant). This included the SCARA5 genic region, where significant negative local \(r_{g}\)'s were observed between AD and eQTLGen-derived SCARA5 eQTLs, as well as, PD and eQTLGen-derived SCARA5 eQTLs.
  - SCARA5 is a ferritin receptor that internalises ferritin (protein that stores iron), after which iron is liberated and transported to the cytosol. It is important for non-transferrin iron delivery (PMID: 19154717).
  - This is notable, as ferritin has found to be increased in AD and PD and, more generally, several studies have implicated cellular iron overload and iron-induced oxidative stress in AD, PD and other neurodegenerative diseases (as reviewed in PMID: 28154410).
- In the PBK genic region there was also a significant positive local \(r_{g}\)'s between AD and PD, as well as between AD and eQTLGen-derived PBK eQTLs, as well as, PD and eQTLGen-derived PBK eQTLs.
  - PBK/TOPK (PDZ-binding kinase, T-LAK-cell-originated protein kinase) is a serine-threonine kinase. Functions partly as an MAP kinase kinase by phosphorylation of p38 mitogen-activated protein kinase (MAPK) and is active during mitosis through phosphorylation of residue Thr9 by cyclin B1/Cdk1 (PMID: 33670114)
  - Abundant in highly proliferative tissues, such as the placenta, testis, T-LAK cells, activated lymphoid cells and lymphoid tumors, but it is expressed at a very low level or is even absent in non-proliferative normal tissues, such as normal adult brain tissue (with the exception of the subependymal zone and early postnatal cerebellar external layer, which is enriched with rapidly proliferating progenitor cells) (PMID: 33670114, (16291951)[https://pubmed.ncbi.nlm.nih.gov/16291951/]).
- A significant positive local \(r_{g}\) between AD and PD was also observed at the ELP3 genic region, together with a positive local \(r_{g}\) between PD and eQTLGen-derived ELP3 eQTLs. ELP3 is a component of the RNA polymerase II elongator complex (specifically it is the catalytic tRNA acetyltransferase subunit), which is required for multiple tRNA modifications. It is thought to be involved in neurogenesis (PMID: 19185337). Emil Gustavsson mentioned that ELP3 was a PD candidate risk gene he had been working on, but was unable to link to PD risk.
- Notably, while no local \(r_{g}\) was observed between AD and PD in the CLU locus, a significant positive local \(r_{g}\) was observed between AD and eQTLGen-derived CLU eQTLs and a negative local \(r_{g}\) was observed between PD and eQTLGen-derived CLU eQTLs. CLU has been associated with AD (PMID: 30872998), although the mechanism by which it might confer risk or protection is not known.
Locus 1719
- Across the entire LD block, a positive local \(r_{g}\) was observed between BIP, MDD and SCZ and a negative local \(r_{g}\) between SCZ and LBD.
- This relationship was almost entirely replicated in the NCAM1 locus, the exception being that no positive local \(r_{g}\) was observed between BIP and MDD. In this locus, however, a significant positive local \(r_{g}\) was observed between BIP and eQTLGen-derived NCAM1 eQTLs.
  - NCAM1 is a cell adhesion protein, which is a member of the immunoglobulin superfamily, and is involved in cell-to-cell interactions as well as cell-matrix interactions during development and differentiation. It is best known as a marker of neural lineages, but is also expressed in the hematopoietic system (PMID: 28791027).
  - Genetic variation in NCAM1 has previously been associated with increased risk of BIP in Japanese individuals (PMID: 15050861).
- The positive local \(r_{g}\) observed between MDD and SCZ was also replicated in the ANKK1 locus. In addition, a significant negative local \(r_{g}\) was observed between both neuropsychiatric disorders and eQTLGen-derived ANKK1 eQTLs, an observation which was replicated for MDD and SCZ using ANKK1 PsychENCODE-derived eQTLs.
  - ANKK1 encodes a protein belonging to the Ser/Thr protein kinase family.
  - This gene is closely linked to DRD2 gene, and a well-studied restriction fragment length polymorphism designated TaqIA, was originally associated with the DRD2 gene, however, later was determined to be located in exon 8 of ANKK1 gene, where it causes a nonconservative amino acid substitution (PMID: 15146457, 18621654).
  - TaqIA polymorphism thought to be a marker of both DRD2 and ANKK1 genetic variant. Has been linked to: alcoholism, antisocial traits, schizophrenia, eating disorders, and some behavioral childhood disorders (PMID: 19526298).
- Worth noting that DRD2 is not expressed in whole blood (GTEx v8, median TPM = 0, n = 755), which would explain why no eQTLs were found to regulate DRD2 in the eQTLGen dataset. In the PsychENCODE dataset, eQTLs were detected and had sufficient local heritability (\(h^2\) = 0.025, p = 0.0021), but no significant local \(r_{g}\)'s were detected.
Locus 2281
- Across the entire LD block, a positive local \(r_{g}\) was observed between (i) BIP and SCZ, (ii) MDD and SCZ and (iii) AD and PD.
- The positive local \(r_{g}\) observed between MDD and SCZ was replicated in the TCF4 genic region. In addition, a significant negative local \(r_{g}\) was observed between (i) MDD and eQTLGen-derived TCF4 eQTLs and (ii) AD and eQTLGen-derived TCF4 eQTLs.
- In the RAB27B genic region, a positive local \(r_{g}\) was observed between (i) BIP and SCZ and (ii) BIP and MDD.
- In addition, a negative local \(r_{g}\) was observed between MDD and eQTLGen-derived RAB27B eQTLs. Notably, when using PsychENCODE-derived RAB27B eQTLs, a positive local \(r_{g}\) was observed between MDD and RAB27B eQTLs. In other words, the direction of effect changed depending on the tissue.
Locus 2351
- Across the entire LD block, a positive local \(r_{g}\) was observed between AD and LBD and a negative local \(r_{g}\) between PD and LBD. While the correlation between AD and PD did not pass the Bonferroni cut-off of p = 0.0001666667, it was nominally significant at p = 0.000119, with a rho = -0.18.
- While each of these correlations was not replicated in every genic region with a significant local \(r_{g}\) between a disease phenotype and eQTL, where they were, they shared the same direction of effect as the local \(r_{g}\) observed across the entire LD block.
- A total of 16 genic regions contained a local \(r_{g}\) between a disease phenotype and eQTL.
- No significant local \(r_{g}\) was observed between APOE eQTLs and any of the disease traits. In part this was because there was insufficient heritability in the region using PsychENCODE eQTLs (univariate p-value = 0.28). While there was sufficient heritability using eQTLGen eQTLs, bivariate p-values did not pass FDR-correction (AD-APOE, rho = 0.18, p = 0.039; LBD-APOE, rho = 0.09, p = 0.374; PD-APOE, rho = -0.22, p = 0.063). However, as seen at the level of LD block, AD and LBD were positively correlated, while (i) AD and PD and (ii) LBD and PD were negatively correlated.
- At the TOMM40 genic region (which is adjacent to the APOE genic region), disease phenotypes were correlated as observed at the level of LD block. In addition significant positive local \(r_{g}\)'s were observed between eQTLGen-derived TOMM40 eQTLs and both AD and LBD (TOMM40 has been previously associated with AD). Similar local \(r_{g}\) patterns were seen at the PVRL2 genic region, which is adjacent to the TOMM40 genic region. Notably, haplotypes in the PVRL2 have been found to exert risk effects on AD in an APOE-E4 and APOE-E2 genotype-independent manner (PMID: 31346172)
- Other notable eQTL-trait associations include:
  - A positive correlation between BCL3 eQTLs and AD/PD, which themselves were also positively correlated in the region.
    - A WGS study of 757 blood samples from the ADNI cohort found an association between common SNPs in BCL3 and CSF Abeta1-42, after adjusting for APOE genotype (PMID: 28589856). The same study also tested 11 other genes in the APOE region (including BCAM, CBLC, PVRL2, TOMM40, APOE, APOC1, APOC1P1, APOC4, APOC2, CLPTM1, and RELB). While CBLC, BCAM and CLPTM1 did not pass the Bonferroni cut-off (0.05/12 tested genes), they were nominally significant (p-value adjusted for APOE genotype < 0.05).
    - BCL3 upregulated postmortem in the brain of patients with LOAD and identified as a DEG in both AD blood and MCI blood (PMID: 30478411).
  - A positive correlation between AD and GEMIN7 eQTLs (across both eQTLGen and PyschENCODE, albeit stronger in PsychENCODE). GEMIN7 is a component of the core SMN complex (PMID: 12065586), which is required for pre-mRNA splicing in the nucleus.

4.4.2 Figures

ld_blocks <- 
  bivar_fdr$window_100000$ld_block %>% unique() %>% sort()

qtl_genes <-
  bivar_fdr$window_100000 %>%
  dplyr::filter(str_detect(phen2, "ENSG")) %>% 
  .[["gene_locus"]] %>% 
  unique()

bivar_list <- 
  setNames(
    bivar_fdr$window_100000 %>% 
      dplyr::group_split(ld_block),
    nm = 
      str_c("locus_", ld_blocks)
  )

ref <- import("/data/references/ensembl/gtf_gff3/v87/Homo_sapiens.GRCh37.87.gtf")
ref <- ref %>% keepSeqlevels(c(1:22), pruning.mode = "coarse") 
ref <- ref[ref$type == "gene"]

loci_gr <-
  LAVA::read.loci(here::here("results", "01_input_prep", "gwas_filtered.loci")) %>%
  dplyr::rename(locus = LOC) %>% 
  dplyr::filter(locus %in% bivar_fdr$window_100000$ld_block) %>% 
  GenomicRanges::makeGRangesFromDataFrame(
    .,
    keep.extra.columns = TRUE,
    ignore.strand = TRUE,
    seqinfo = NULL,
    seqnames.field = "CHR",
    start.field = "START",
    end.field = "STOP"
  )
  

fig_list <- vector(mode = "list", length = length(ld_blocks))

for(block in ld_blocks){
  
  locus_plot <- 
    plot_locus(
    locus_gr = loci_gr[loci_gr$locus == block], 
    ref = ref,
    window = 50000,
    highlight_gene = qtl_genes,
    highlight_gene_label = "QTL - local rg?"
    )
  
  edge_plot <- 
    plot_qtl_edge_diagram(
          bivar_corr_qtl = bivar_list[[str_c("locus_", block)]],
          phen = fct_disease,
          seed = 89
        )
  
  if(length(edge_plot)/4 >=2){
    
    height <- c(1,3.5)
    
  } else{

    height <- c(1,1)
  }
  
  fig_list[[which(ld_blocks == block)]] <- 
    ggpubr::ggarrange(
      locus_plot,
      ggpubr::ggarrange(
        plotlist = edge_plot,
        ncol = 4,
        nrow = ceiling(length(edge_plot)/4), 
        common.legend = TRUE,
        legend = "top"
      ),
      ncol = 1,
      labels = letters[1:2],
      heights = height
    )
  
    names(fig_list)[which(ld_blocks == block)] <- str_c("locus_", block)
  
}

fig_list[1:4]

## $locus_681

(a) Locus plot. Genic regions (including 100-kb window) with a significant bivariate local genetic correlation between a disease phenotype and eQTL are highlighted in blue (FDR-corrected p < 0.05). (b) Edge diagrams for genic regions where a significant bivariate local genetic correlation was observed between a disease phenotype and eQTL (FDR-corrected p < 0.05). Edges display the standardised coefficient for genetic correlation (rho) for significant bivariate local genetic correlations, with negative and positive correlations indicated by blue and red colour, respectively. GWAS and eQTL nodes are indicated by grey and white fill, respectively.

Figure 4.8: (a) Locus plot. Genic regions (including 100-kb window) with a significant bivariate local genetic correlation between a disease phenotype and eQTL are highlighted in blue (FDR-corrected p < 0.05). (b) Edge diagrams for genic regions where a significant bivariate local genetic correlation was observed between a disease phenotype and eQTL (FDR-corrected p < 0.05). Edges display the standardised coefficient for genetic correlation (rho) for significant bivariate local genetic correlations, with negative and positive correlations indicated by blue and red colour, respectively. GWAS and eQTL nodes are indicated by grey and white fill, respectively.

## 
## $locus_1273

Figure 4.9: (a) Locus plot. Genic regions (including 100-kb window) with a significant bivariate local genetic correlation between a disease phenotype and eQTL are highlighted in blue (FDR-corrected p < 0.05). (b) Edge diagrams for genic regions where a significant bivariate local genetic correlation was observed between a disease phenotype and eQTL (FDR-corrected p < 0.05). Edges display the standardised coefficient for genetic correlation (rho) for significant bivariate local genetic correlations, with negative and positive correlations indicated by blue and red colour, respectively. GWAS and eQTL nodes are indicated by grey and white fill, respectively.

## 
## $locus_1719

Figure 4.10: (a) Locus plot. Genic regions (including 100-kb window) with a significant bivariate local genetic correlation between a disease phenotype and eQTL are highlighted in blue (FDR-corrected p < 0.05). (b) Edge diagrams for genic regions where a significant bivariate local genetic correlation was observed between a disease phenotype and eQTL (FDR-corrected p < 0.05). Edges display the standardised coefficient for genetic correlation (rho) for significant bivariate local genetic correlations, with negative and positive correlations indicated by blue and red colour, respectively. GWAS and eQTL nodes are indicated by grey and white fill, respectively.

## 
## $locus_2281

Figure 4.11: (a) Locus plot. Genic regions (including 100-kb window) with a significant bivariate local genetic correlation between a disease phenotype and eQTL are highlighted in blue (FDR-corrected p < 0.05). (b) Edge diagrams for genic regions where a significant bivariate local genetic correlation was observed between a disease phenotype and eQTL (FDR-corrected p < 0.05). Edges display the standardised coefficient for genetic correlation (rho) for significant bivariate local genetic correlations, with negative and positive correlations indicated by blue and red colour, respectively. GWAS and eQTL nodes are indicated by grey and white fill, respectively.

fig_list[5]

## $locus_2351

(a) Locus plot. Genic regions with a significant bivariate local genetic correlation between a disease phenotype and eQTL are highlighted in blue (FDR-corrected p < 0.05). (b) Edge diagrams for genic regions where a significant bivariate local genetic correlation was observed between a disease phenotype and eQTL (FDR-corrected p < 0.05). Edges display the standardised coefficient for genetic correlation (rho) for significant bivariate local genetic correlations, with negative and positive correlations indicated by blue and red colour, respectively. GWAS and eQTL nodes are indicated by grey and white fill, respectively.

Figure 4.12: (a) Locus plot. Genic regions with a significant bivariate local genetic correlation between a disease phenotype and eQTL are highlighted in blue (FDR-corrected p < 0.05). (b) Edge diagrams for genic regions where a significant bivariate local genetic correlation was observed between a disease phenotype and eQTL (FDR-corrected p < 0.05). Edges display the standardised coefficient for genetic correlation (rho) for significant bivariate local genetic correlations, with negative and positive correlations indicated by blue and red colour, respectively. GWAS and eQTL nodes are indicated by grey and white fill, respectively.

4.5 Phenotype correlations at genic region level (100-kb window) using stringent bonferroni multiple test correction

4.5.1 Text

Of the 43 eQTL-GWAS bivariate correlations observed using the lenient FDR correction, 19 survived Bonferroni correction.
Some of these included:
- Locus 681
  - Negative \(r_{g}\) between AD and PD and between PD and eQTLGen-derived SNCA eQTLs. This was also significant when using a window size of 50 kb.
- Locus 1273
  - Positive local \(r_{g}\) between PD and eQTLGen-derived ESCO2, PBK and PNOC eQTLs. Using a window size of 50 kb, associations with PBK and PNOC eQTLs were significant, while the association with ESCO2 eQTLs did not survive Bonferroni correction (but was significant using FDR).
- Locus 1719
  - Positive local \(r_{g}\) between MDD and SCZ and negative local \(r_{g}\) between both neuropsychiatric disorders and eQTLGen-derived ANKK1 eQTLs. Using a window size of 50 kb, the association of MDD with ANKK1 eQTLs was significant, while the association of SCZ with ANKK1 eQTLs was nominal (Bonferroni-corrected p-value = 0.073).
  - Positve local \(r_{g}\) between BIP and eQTLGen-derived NCAM1 eQTLs. This was also significant when using a window size of 50 kb.
- Locus 2281
  - Positive local \(r_{g}\) between MDD and PsychENCODE-derived RAB27B eQTLs. This was also significant when using a window size of 50 kb.
  - Negative local \(r_{g}\) between MDD and eQTLGen-derived TCF4 eQTLs. Using a window size of 50 kb, this association did not survive Bonferroni correction, but was significant using the lenient FDR cut-off.
- Locus 2351 (of the 22 eQTL-GWAS bivariate correlations observed using the lenient FDR correction, 10 survived Bonferroni correction)
  - Positive local \(r_{g}\) between AD and eQTLGen- and PsychENCODE-derived GEMIN7 eQTLs. Using a window size of 50 kb, neither of these associations survived Bonferroni correction; however, the association between AD and PsychENCODE-derived GEMIN7 eQTLs was significant using the lenient FDR cut-off.
  - Positive local \(r_{g}\) between AD and eQTLGen-derived PVRL2 eQTLs. This was also significant when using a window size of 50 kb.

4.5.2 Figures

ld_blocks <- 
  bivar_bonf$window_100000 %>%
  dplyr::group_by(eqtl_dataset, gene_locus) %>%
  .[["ld_block"]] %>% 
  unique() %>% 
  sort()

qtl_genes <-
  bivar_bonf$window_100000 %>%
  dplyr::group_by(eqtl_dataset, gene_locus) %>%
  dplyr::filter(str_detect(phen2, "ENSG")) %>% 
  .[["gene_locus"]] %>% 
  unique()

bivar_list <- 
  setNames(
    bivar_bonf$window_100000 %>% 
      dplyr::group_split(ld_block),
    nm = 
      str_c("locus_", ld_blocks)
  )
  
fig_list <- vector(mode = "list", length = length(ld_blocks))

for(block in ld_blocks){
  
  locus_plot <- 
    plot_locus(
    locus_gr = loci_gr[loci_gr$locus == block], 
    ref = ref,
    highlight_gene = qtl_genes,
    highlight_gene_label = "QTL - local rg?"
    )
  
  edge_plot <- 
    plot_qtl_edge_diagram(
          bivar_corr_qtl = bivar_list[[str_c("locus_", block)]],
          phen = fct_disease,
          seed = 89
        )
  
  if(length(edge_plot)/4 >=2){
    
    height <- c(1,3.5)
    
  } else{

    height <- c(1,0.5)
  }
  
  fig_list[[which(ld_blocks == block)]] <- 
    ggpubr::ggarrange(
      locus_plot,
      ggpubr::ggarrange(
        plotlist = edge_plot,
        ncol = 4,
        nrow = ceiling(length(edge_plot)/4), 
        common.legend = TRUE,
        legend = "top"
      ),
      ncol = 1,
      labels = letters[1:2],
      heights = height
    )
  
    names(fig_list)[which(ld_blocks == block)] <- str_c("locus_", block)
  
}

fig_list[1:4]

## $locus_681

Figure 4.13: (a) Locus plot. Genic regions (including 100-kb window) with a significant bivariate local genetic correlation between a disease phenotype and eQTL are highlighted in blue (Bonferroni-corrected p < 0.05). (b) Edge diagrams for genic regions where a significant bivariate local genetic correlation was observed between a disease phenotype and eQTL (Bonferroni-corrected p < 0.05). Edges display the standardised coefficient for genetic correlation (rho) for significant bivariate local genetic correlations, with negative and positive correlations indicated by blue and red colour, respectively. GWAS and eQTL nodes are indicated by grey and white fill, respectively.

## 
## $locus_1273

Figure 4.14: (a) Locus plot. Genic regions (including 100-kb window) with a significant bivariate local genetic correlation between a disease phenotype and eQTL are highlighted in blue (Bonferroni-corrected p < 0.05). (b) Edge diagrams for genic regions where a significant bivariate local genetic correlation was observed between a disease phenotype and eQTL (Bonferroni-corrected p < 0.05). Edges display the standardised coefficient for genetic correlation (rho) for significant bivariate local genetic correlations, with negative and positive correlations indicated by blue and red colour, respectively. GWAS and eQTL nodes are indicated by grey and white fill, respectively.

## 
## $locus_1719

Figure 4.15: (a) Locus plot. Genic regions (including 100-kb window) with a significant bivariate local genetic correlation between a disease phenotype and eQTL are highlighted in blue (Bonferroni-corrected p < 0.05). (b) Edge diagrams for genic regions where a significant bivariate local genetic correlation was observed between a disease phenotype and eQTL (Bonferroni-corrected p < 0.05). Edges display the standardised coefficient for genetic correlation (rho) for significant bivariate local genetic correlations, with negative and positive correlations indicated by blue and red colour, respectively. GWAS and eQTL nodes are indicated by grey and white fill, respectively.

## 
## $locus_2281

Figure 4.16: (a) Locus plot. Genic regions (including 100-kb window) with a significant bivariate local genetic correlation between a disease phenotype and eQTL are highlighted in blue (Bonferroni-corrected p < 0.05). (b) Edge diagrams for genic regions where a significant bivariate local genetic correlation was observed between a disease phenotype and eQTL (Bonferroni-corrected p < 0.05). Edges display the standardised coefficient for genetic correlation (rho) for significant bivariate local genetic correlations, with negative and positive correlations indicated by blue and red colour, respectively. GWAS and eQTL nodes are indicated by grey and white fill, respectively.

fig_list[5]

## $locus_2351

Figure 4.17: (a) Locus plot. Genic regions with a significant bivariate local genetic correlation between a disease phenotype and eQTL are highlighted in blue (Bonferroni-corrected p < 0.05). (b) Edge diagrams for genic regions where a significant bivariate local genetic correlation was observed between a disease phenotype and eQTL (Bonferroni-corrected p < 0.05). Edges display the standardised coefficient for genetic correlation (rho) for significant bivariate local genetic correlations, with negative and positive correlations indicated by blue and red colour, respectively. GWAS and eQTL nodes are indicated by grey and white fill, respectively.

5 Session info

Show/hide

## ─ Session info ───────────────────────────────────────────────────────────────────────────────────────────────────────
##  setting  value                       
##  version  R version 4.0.5 (2021-03-31)
##  os       Ubuntu 16.04.7 LTS          
##  system   x86_64, linux-gnu           
##  ui       X11                         
##  language (EN)                        
##  collate  en_GB.UTF-8                 
##  ctype    en_GB.UTF-8                 
##  tz       Europe/London               
##  date     2022-10-03                  
## 
## ─ Packages ───────────────────────────────────────────────────────────────────────────────────────────────────────────
##  ! package              * version  date       lib source         
##  P abind                  1.4-5    2016-07-21 [?] CRAN (R 4.0.5) 
##  P assertthat             0.2.1    2019-03-21 [?] CRAN (R 4.0.5) 
##  P backports              1.3.0    2021-10-27 [?] CRAN (R 4.0.5) 
##  P beeswarm               0.4.0    2021-06-01 [?] CRAN (R 4.0.5) 
##  P Biobase                2.50.0   2020-10-27 [?] Bioconductor   
##  P BiocGenerics         * 0.36.1   2021-04-16 [?] Bioconductor   
##  P BiocParallel           1.24.1   2020-11-06 [?] Bioconductor   
##  P Biostrings             2.58.0   2020-10-27 [?] Bioconductor   
##  P bit                    4.0.4    2020-08-04 [?] CRAN (R 4.0.5) 
##  P bit64                  4.0.5    2020-08-30 [?] CRAN (R 4.0.5) 
##  P bitops                 1.0-7    2021-04-24 [?] CRAN (R 4.0.5) 
##  P bookdown               0.22     2021-04-22 [?] CRAN (R 4.0.5) 
##  P broom                  0.7.10   2021-10-31 [?] CRAN (R 4.0.5) 
##  P bslib                  0.3.1    2021-10-06 [?] CRAN (R 4.0.5) 
##  P car                    3.0-11   2021-06-27 [?] CRAN (R 4.0.5) 
##  P carData                3.0-4    2020-05-22 [?] CRAN (R 4.0.5) 
##  P cellranger             1.1.0    2016-07-27 [?] CRAN (R 4.0.5) 
##  P chron                  2.3-56   2020-08-18 [?] CRAN (R 4.0.5) 
##  P circlize             * 0.4.13   2021-06-09 [?] CRAN (R 4.0.5) 
##  P cli                    3.1.0    2021-10-27 [?] CRAN (R 4.0.5) 
##  P colorspace             2.0-2    2021-06-24 [?] CRAN (R 4.0.5) 
##  P cowplot                1.1.1    2020-12-30 [?] CRAN (R 4.0.5) 
##  P crayon                 1.4.2    2021-10-29 [?] CRAN (R 4.0.5) 
##  P crosstalk              1.1.1    2021-01-12 [?] CRAN (R 4.0.5) 
##  P curl                   4.3.2    2021-06-23 [?] CRAN (R 4.0.5) 
##  P data.table             1.14.2   2021-09-27 [?] CRAN (R 4.0.5) 
##  P DBI                    1.1.1    2021-01-15 [?] CRAN (R 4.0.5) 
##  P dbplyr                 2.1.1    2021-04-06 [?] CRAN (R 4.0.5) 
##  P DelayedArray           0.16.3   2021-03-24 [?] Bioconductor   
##  P digest                 0.6.29   2021-12-01 [?] CRAN (R 4.0.5) 
##  P dplyr                * 1.0.7    2021-06-18 [?] CRAN (R 4.0.5) 
##  P DT                     0.19     2021-09-02 [?] CRAN (R 4.0.5) 
##  P ellipsis               0.3.2    2021-04-29 [?] CRAN (R 4.0.5) 
##  P evaluate               0.14     2019-05-28 [?] CRAN (R 4.0.5) 
##  P fansi                  0.5.0    2021-05-25 [?] CRAN (R 4.0.5) 
##  P farver                 2.1.0    2021-02-28 [?] CRAN (R 4.0.5) 
##  P fastmap                1.1.0    2021-01-25 [?] CRAN (R 4.0.5) 
##  P forcats              * 0.5.1    2021-01-27 [?] CRAN (R 4.0.5) 
##  P foreign                0.8-81   2020-12-22 [?] CRAN (R 4.0.5) 
##  P fs                     1.5.1    2021-11-30 [?] CRAN (R 4.0.5) 
##  P generics               0.1.1    2021-10-25 [?] CRAN (R 4.0.5) 
##  P GenomeInfoDb         * 1.26.7   2021-04-08 [?] Bioconductor   
##  P GenomeInfoDbData       1.2.4    2021-07-03 [?] Bioconductor   
##  P GenomicAlignments      1.26.0   2020-10-27 [?] Bioconductor   
##  P GenomicRanges        * 1.42.0   2020-10-27 [?] Bioconductor   
##  P ggbeeswarm           * 0.6.0    2017-08-07 [?] CRAN (R 4.0.5) 
##  P ggforce                0.3.3    2021-03-05 [?] CRAN (R 4.0.5) 
##  P ggplot2              * 3.3.5    2021-06-25 [?] CRAN (R 4.0.5) 
##  P ggpubr                 0.4.0    2020-06-27 [?] CRAN (R 4.0.5) 
##  P ggraph               * 2.0.5    2021-02-23 [?] CRAN (R 4.0.5) 
##  P ggrepel                0.9.1    2021-01-15 [?] CRAN (R 4.0.5) 
##  P ggsignif               0.6.3    2021-09-09 [?] CRAN (R 4.0.5) 
##  P GlobalOptions          0.1.2    2020-06-10 [?] CRAN (R 4.0.5) 
##  P glue                   1.5.1    2021-11-30 [?] CRAN (R 4.0.5) 
##  P graphlayouts           0.7.1    2020-10-26 [?] CRAN (R 4.0.5) 
##  P gridExtra              2.3      2017-09-09 [?] CRAN (R 4.0.5) 
##  P gtable                 0.3.0    2019-03-25 [?] CRAN (R 4.0.5) 
##  P haven                  2.4.3    2021-08-04 [?] CRAN (R 4.0.5) 
##  P here                   1.0.1    2020-12-13 [?] CRAN (R 4.0.5) 
##  P highr                  0.9      2021-04-16 [?] CRAN (R 4.0.5) 
##  P hms                    1.1.1    2021-09-26 [?] CRAN (R 4.0.5) 
##  P htmltools              0.5.2    2021-08-25 [?] CRAN (R 4.0.5) 
##  P htmlwidgets            1.5.4    2021-09-08 [?] CRAN (R 4.0.5) 
##  P httr                   1.4.2    2020-07-20 [?] CRAN (R 4.0.5) 
##  P igraph                 1.2.7    2021-10-15 [?] CRAN (R 4.0.5) 
##  P IRanges              * 2.24.1   2020-12-12 [?] Bioconductor   
##  P jquerylib              0.1.4    2021-04-26 [?] CRAN (R 4.0.5) 
##  P jsonlite               1.7.2    2020-12-09 [?] CRAN (R 4.0.5) 
##  P knitr                  1.36     2021-09-29 [?] CRAN (R 4.0.5) 
##  P labeling               0.4.2    2020-10-20 [?] CRAN (R 4.0.5) 
##  P lattice                0.20-44  2021-05-02 [?] CRAN (R 4.0.5) 
##  P LAVA                   0.0.6    2021-09-01 [?] xgit (@7be3421)
##  P lifecycle              1.0.1    2021-09-24 [?] CRAN (R 4.0.5) 
##  P lubridate              1.8.0    2021-10-07 [?] CRAN (R 4.0.5) 
##  P magrittr               2.0.1    2020-11-17 [?] CRAN (R 4.0.5) 
##  P MASS                   7.3-54   2021-05-03 [?] CRAN (R 4.0.5) 
##  P Matrix                 1.3-4    2021-06-01 [?] CRAN (R 4.0.5) 
##  P MatrixGenerics         1.2.1    2021-01-30 [?] Bioconductor   
##  P matrixStats            0.61.0   2021-09-17 [?] CRAN (R 4.0.5) 
##  P mgcv                   1.8-36   2021-06-01 [?] CRAN (R 4.0.5) 
##  P modelr                 0.1.8    2020-05-19 [?] CRAN (R 4.0.5) 
##  P munsell                0.5.0    2018-06-12 [?] CRAN (R 4.0.5) 
##  P nlme                   3.1-152  2021-02-04 [?] CRAN (R 4.0.5) 
##  P openxlsx               4.2.4    2021-06-16 [?] CRAN (R 4.0.5) 
##  P pillar                 1.6.4    2021-10-18 [?] CRAN (R 4.0.5) 
##  P pkgconfig              2.0.3    2019-09-22 [?] CRAN (R 4.0.5) 
##  P polyclip               1.10-0   2019-03-14 [?] CRAN (R 4.0.5) 
##  P purrr                * 0.3.4    2020-04-17 [?] CRAN (R 4.0.5) 
##  P qdapTools              1.3.5    2020-04-17 [?] CRAN (R 4.0.5) 
##  P R6                     2.5.1    2021-08-19 [?] CRAN (R 4.0.5) 
##  P RColorBrewer           1.1-2    2014-12-07 [?] CRAN (R 4.0.5) 
##  P Rcpp                   1.0.7    2021-07-07 [?] CRAN (R 4.0.5) 
##  P RCurl                  1.98-1.5 2021-09-17 [?] CRAN (R 4.0.5) 
##  P readr                * 2.0.2    2021-09-27 [?] CRAN (R 4.0.5) 
##  P readxl                 1.3.1    2019-03-13 [?] CRAN (R 4.0.5) 
##  P reprex                 2.0.0    2021-04-02 [?] CRAN (R 4.0.5) 
##  P rio                    0.5.27   2021-06-21 [?] CRAN (R 4.0.5) 
##  P rlang                  0.4.12   2021-10-18 [?] CRAN (R 4.0.5) 
##  P rmarkdown              2.11     2021-09-14 [?] CRAN (R 4.0.5) 
##  P rprojroot              2.0.2    2020-11-15 [?] CRAN (R 4.0.5) 
##  P Rsamtools              2.6.0    2020-10-27 [?] Bioconductor   
##  P rstatix                0.7.0    2021-02-13 [?] CRAN (R 4.0.5) 
##  P rstudioapi             0.13     2020-11-12 [?] CRAN (R 4.0.5) 
##  P rtracklayer          * 1.50.0   2020-10-27 [?] Bioconductor   
##  P rvest                  1.0.1    2021-07-26 [?] CRAN (R 4.0.5) 
##  P S4Vectors            * 0.28.1   2020-12-09 [?] Bioconductor   
##  P sass                   0.4.0    2021-05-12 [?] CRAN (R 4.0.5) 
##  P scales                 1.1.1    2020-05-11 [?] CRAN (R 4.0.5) 
##  P sessioninfo          * 1.1.1    2018-11-05 [?] CRAN (R 4.0.5) 
##  P shape                  1.4.6    2021-05-19 [?] CRAN (R 4.0.5) 
##  P stringi                1.7.6    2021-11-29 [?] CRAN (R 4.0.5) 
##  P stringr              * 1.4.0    2019-02-10 [?] CRAN (R 4.0.5) 
##  P SummarizedExperiment   1.20.0   2020-10-27 [?] Bioconductor   
##  P tibble               * 3.1.6    2021-11-07 [?] CRAN (R 4.0.5) 
##  P tidygraph              1.2.0    2020-05-12 [?] CRAN (R 4.0.5) 
##  P tidyr                * 1.1.4    2021-09-27 [?] CRAN (R 4.0.5) 
##  P tidyselect             1.1.1    2021-04-30 [?] CRAN (R 4.0.5) 
##  P tidyverse            * 1.3.1    2021-04-15 [?] CRAN (R 4.0.5) 
##  P tweenr                 1.0.2    2021-03-23 [?] CRAN (R 4.0.5) 
##  P tzdb                   0.2.0    2021-10-27 [?] CRAN (R 4.0.5) 
##  P utf8                   1.2.2    2021-07-24 [?] CRAN (R 4.0.5) 
##  P vctrs                  0.3.8    2021-04-29 [?] CRAN (R 4.0.5) 
##  P vipor                  0.4.5    2017-03-22 [?] CRAN (R 4.0.5) 
##  P viridis                0.6.2    2021-10-13 [?] CRAN (R 4.0.5) 
##  P viridisLite            0.4.0    2021-04-13 [?] CRAN (R 4.0.5) 
##  P vroom                  1.5.5    2021-09-14 [?] CRAN (R 4.0.5) 
##  P withr                  2.4.3    2021-11-30 [?] CRAN (R 4.0.5) 
##  P xfun                   0.27     2021-10-18 [?] CRAN (R 4.0.5) 
##  P XML                    3.99-0.8 2021-09-17 [?] CRAN (R 4.0.5) 
##  P xml2                   1.3.2    2020-04-23 [?] CRAN (R 4.0.5) 
##  P XVector                0.30.0   2020-10-27 [?] Bioconductor   
##  P yaml                   2.2.1    2020-02-01 [?] CRAN (R 4.0.5) 
##  P zip                    2.2.0    2021-05-31 [?] CRAN (R 4.0.5) 
##  P zlibbioc               1.36.0   2020-10-27 [?] Bioconductor   
## 
## [1] /home/rreynolds/misc_projects/neurodegen-psych-local-corr/renv/library/R-4.0/x86_64-pc-linux-gnu
## [2] /opt/R/4.0.5/lib/R/library
## 
##  P ── Loaded and on-disk path mismatch.

Dissecting individual loci with eQTLs

Regina H. Reynolds

03 October 2022