Aim: run univariate and bivariate analyses using eQTL data for select loci
Bivariate local \(r_{g}\) analyses revealed several loci (representing LD blocks) where significant \(r_{g}\)'s were found between multiple trait pairs. Of note, some LD blocks contained local \(r_{g}\)'s between neurodegenerative traits and between neuropscyhiatric traits, but with no overlap between the disorder groups. In addition, we observed LD blocks where several traits were associated with one trait in particular, but directions of effect were opposing (e.g. locus 1719, where BIP and MDD were positively correlated with SCZ, while LBD was negatively correlated with SCZ; or locus 2351, where AD and PD were positively and negatively correlated with LBD, respectively). Assuming that these assocations also correlate with expression quantitative trait loci (eQTLs), such differences may be driven by associations to different gene eQTLs.
Following section includes any intermediary code used in this .Rmd
.
source(here::here("scripts", "04a_get_gene_loci.R"))
All genic regions are shown below.
gene_loci <-
read_delim(
file = here::here("results", "04_eqtl_univar_bivar", "window_100000", "gene_filtered.loci"),
delim = "\t"
)
gene_loci %>%
DT::datatable(rownames = FALSE,
options = list(scrollX = TRUE),
class = 'white-space: nowrap')
source(here::here("scripts", "04b_preprocess_eqtl.R"))
source(here::here("scripts", "04c_get_sample_overlaps.R"))
Subset of sample overlap matrix shown below.
sample_overlap <-
read.table(
file = here::here("results", "04_eqtl_univar_bivar", "sample_overlap.txt")
)
as.matrix(sample_overlap)[1:10,1:10]
## AD2019 BIP2021 LBD2020 MDD2019 PD2019.meta5.ex23andMe
## AD2019 1.00000 0.00628 0.02589 0.01946 0.01193
## BIP2021 0.00628 1.00000 0.00800 0.05997 -0.00150
## LBD2020 0.02589 0.00800 1.00000 0.00040 0.01281
## MDD2019 0.01946 0.05997 0.00040 1.00000 0.00436
## PD2019.meta5.ex23andMe 0.01193 -0.00150 0.01281 0.00436 1.00000
## SCZ2018 0.01150 0.12928 0.00661 0.03615 0.00295
## EQTLGEN_ENSG00000007047 0.00000 0.00000 0.00000 0.00000 0.00000
## EQTLGEN_ENSG00000007255 0.00000 0.00000 0.00000 0.00000 0.00000
## EQTLGEN_ENSG00000041353 0.00000 0.00000 0.00000 0.00000 0.00000
## EQTLGEN_ENSG00000048028 0.00000 0.00000 0.00000 0.00000 0.00000
## SCZ2018 EQTLGEN_ENSG00000007047 EQTLGEN_ENSG00000007255
## AD2019 0.01150 0 0
## BIP2021 0.12928 0 0
## LBD2020 0.00661 0 0
## MDD2019 0.03615 0 0
## PD2019.meta5.ex23andMe 0.00295 0 0
## SCZ2018 1.00000 0 0
## EQTLGEN_ENSG00000007047 0.00000 1 0
## EQTLGEN_ENSG00000007255 0.00000 0 1
## EQTLGEN_ENSG00000041353 0.00000 0 0
## EQTLGEN_ENSG00000048028 0.00000 0 0
## EQTLGEN_ENSG00000041353 EQTLGEN_ENSG00000048028
## AD2019 0 0
## BIP2021 0 0
## LBD2020 0 0
## MDD2019 0 0
## PD2019.meta5.ex23andMe 0 0
## SCZ2018 0 0
## EQTLGEN_ENSG00000007047 0 0
## EQTLGEN_ENSG00000007255 0 0
## EQTLGEN_ENSG00000041353 1 0
## EQTLGEN_ENSG00000048028 0 1
# Most input info already available, just need to add eqtls
input_info <-
read_delim(
file = file.path(here::here("results", "01_input_prep"),
"input.info.txt"),
delim = "\t"
)
file_paths <-
list.files(
path = here::here("results", "04_eqtl_univar_bivar", "qtl_files"),
pattern = "lava.gz",
full.names = T
)
input_info <-
tibble(
phenotype =
basename(file_paths) %>%
str_remove(".lava.gz"),
cases = rep(1, times = length(file_paths)),
controls = rep(0, times = length(file_paths)),
filename = file_paths
) %>%
dplyr::bind_rows(
input_info
)
write_delim(
input_info,
path = file.path(here::here("results", "04_eqtl_univar_bivar"),
"input.info.txt"),
delim = "\t"
)
This was run using nohup
:
# Have to navigate to root project folder for script to work (as it uses here package)
cd /home/rreynolds/misc_projects/neurodegen-psych-local-corr
nohup Rscript \
/home/rreynolds/misc_projects/neurodegen-psych-local-corr/scripts/04d_run_univar_bivar_test.R \
&>/home/rreynolds/misc_projects/neurodegen-psych-local-corr/logs/04d_run_univar_bivar_test_multwindows.log&
Once run, results can be loaded using the following code chunk:
source(here::here("scripts", "04e_tidy_univar_bivar.R"))
results <- readRDS(here::here("results", "04_eqtl_univar_bivar", "results_summary.rds"))
source(here::here("scripts", "04f_run_partial_corr.R"))
pcorr <-
readRDS(here::here("results", "04_eqtl_univar_bivar", "window_100000", "pcor", "eqtl_pcor.partcorr.lava.rds")) %>%
lapply(., function(list){
list %>%
qdapTools::list_df2df(col1 = "list_name") %>%
dplyr::mutate(
eqtl_dataset =
case_when(
stringr::str_detect(phen1, "PSYCH") |
stringr::str_detect(phen2, "PSYCH") |
stringr::str_detect(z, "PSYCH") ~ "PSYCHENCODE",
stringr::str_detect(phen1, "EQTLGEN") |
stringr::str_detect(phen2, "EQTLGEN") |
stringr::str_detect(z, "EQTLGEN") ~ "EQTLGEN"
)
) %>%
dplyr::mutate(
across(
.cols = phen1:z,
~ case_when(
!str_detect(.x, "ENSG") ~ .x %>%
str_replace_all("[:digit:]", "") %>%
str_remove("\\..*"),
str_detect(.x, "ENSG") ~ str_remove(.x, ".*_")
)
)
)
}) %>%
qdapTools::list_df2df(col1 = "list_name_2") %>%
dplyr::select(
eqtl_dataset,
gene_locus,
gene_name,
everything(),
-contains("list_name")
) %>%
as_tibble()
source(here::here("R", "plot_chord_diagram.R"))
source(here::here("R", "plot_locus.R"))
source(here::here("R", "plot_edge_diagram.R"))
Firstly, we determined a number of LD blocks of interest from the LD blocks highlighted by bivariate local \(r_{g}\) analyses. LD blocks included:
From these LD blocks of interest, we defined genic regions. That is, a 100-kb window was added to the start and end of any protein-coding, antisense or lincRNA gene that overlapped an LD block of of interest (window used as most cis-eQTLs lie outside gene body, but within 100 kb of the gene, see Figure 2 in PMID: 34475573). These genic regions (\(n\) = 92) were carried forward in downstream analyses. Further, to ensure that our results were not driven by window size, we re-ran all analyses using a 50 kb window.
Due to the potential sample overlap between cohorts and its impact on any estimated correlations, it is advised to use cross-trait LDSC (PMID:25642630; PMID:26414676) to obtain an estimate of the sample overlap. This, however, is not possible with eQTL summary statistics. Provided a reasonable certainty that there is no sample overlap between eQTL and GWAS summary statistics, it can be assumed to be zero (which precludes any correlations between eQTLGen and PsychENCODE, as both use GTEx samples, albeit from different tissues). This, however, needs double-checking in terms of eQTL and GWAS cohorts.
For each genic region, only those traits that were found to have significant local \(r_{g}\) in the associated LD block were used for univariate and bivariate analyses with eQTL summary statistics. As described in 02_run_univar_bivar_test.html, a univariate test was performed as a filtering step for bivariate local \(r_{g}\) analyses. Bivariate local correlations were only performed (i) if the eQTL within the genic region exhibited a significant univariate local genetic signal and (ii) for pairs of traits which both exhibited a significant univariate local genetic signal. A cut-off of p < 0.05/92 was used to determine univariate significance. Using a 100-kb window, this resulted in a total of 354 bivariate tests spanning 55 distinct genic regions (with a 50-kb window, a total of 267 bivariate tests spanning 50 distinct genic regions). Bivariate results were multiple test corrected using two strategies: (i) a more stringent Bonferroni correction (i.e. p < 0.05/n bivariate tests) and (ii) a more lenient FDR correction.
Partial correlations were computed where three-way relationships were observed between 2 disease traits and an eQTL. The partial correlation reflects the correlation between 2 traits (e.g. disease X and Y) that can be explained by a third trait (e.g. eQTL, Z). Thus, a partial correlation approaching 0 suggests that trait Z captures an increasing proportion of the correlation between traits X and Y. Due to the three-way nature of the relationships, 3 possible conformations were possible (i.e. X~Y|Z, X~Z|Y and Y~Z|X); partial correlations were computed for all 3.
print("Univariate column descriptions:")
## [1] "Univariate column descriptions:"
tibble(
column = colnames(results$univ$window_100000),
description =
c(
"LD block",
"eQTL dataset used",
"Genic region as identified by ensembl ID",
"Gene name",
"Genic region chromosome",
"Genic region start base pair",
"Genic region end base pair",
"The number of SNPs within the locus",
"The number of PCs retained within the locus",
"Analysed phenotype",
"Observed local heritability",
"P-value from the univariate test (F-test for continuous, Chi-sq for binary)"
)
) %>%
DT::datatable(rownames = FALSE,
options = list(scrollX = TRUE),
class = 'white-space: nowrap')
print("Univariate results (p < 0.05/n_loci):")
## [1] "Univariate results (p < 0.05/n_loci):"
results$univ %>%
qdapTools::list_df2df(col1 = "window_size") %>%
dplyr::filter(p < 0.05/nrow(gene_loci)) %>%
DT::datatable(rownames = FALSE,
options = list(scrollX = TRUE),
class = 'white-space: nowrap')
# Pivot results
results_long <-
results$bivar %>%
qdapTools::list_df2df(col1 = "window_size") %>%
tidyr::pivot_longer(
cols = -c("window_size", "ld_block", "eqtl_dataset", "gene_locus", "gene_name", "phen1", "phen2"),
names_to = "feature",
values_to = "value"
) %>%
tidyr::pivot_wider(names_from = "window_size") %>%
dplyr::mutate(
across(
.cols = contains("window"),
~ case_when(
feature == "p" ~ -log10(.x),
TRUE ~ .x
)
)
) %>%
dplyr::mutate(
feature =
case_when(
feature == "p" ~ "-log10(p)",
TRUE ~ feature
),
phenotype_corr =
case_when(
str_detect(phen2, "ENSG") ~ "gwas-eqtl",
TRUE ~ "gwas-gwas"
),
doe_100000 =
case_when(
feature == "rho" & window_100000 < 0 ~ "-",
feature == "rho" & window_100000 > 0 ~ "+"
),
doe_50000 =
case_when(
feature == "rho" & window_50000 < 0 ~ "-",
feature == "rho" & window_50000 > 0 ~ "+"
),
doe_same =
case_when(
doe_100000 == doe_50000 ~ TRUE,
doe_100000 != doe_50000 ~ FALSE
)
)
print("Bivariate column descriptions:")
## [1] "Bivariate column descriptions:"
tibble(
column = colnames(results$bivar$window_100000),
description =
c(
"LD block",
"eQTL dataset used",
"Genic region as identified by ensembl ID",
"Gene name",
"Genic region chromosome",
"Genic region start base pair",
"Genic region end base pair",
"The number of SNPs within the locus",
"The number of PCs retained within the locus",
"Phenotype 1",
"Phenotype 2",
"Standardised coefficient for the local genetic correlation",
"Lower 95% confidence estimate for rho",
"Upper 95% confidence estimate for rho",
"Equivalent of taking the square of rho. Denotes the proportion of variance in genetic signal for phen1 explained by phen2 (and vice versa)",
"Lower 95% confidence estimate for r2",
"Upper 95% confidence estimate for r2",
"Simulation p-values for the local genetic correlation",
"FDR-corrected p-values",
"Bonferroni-corrected p-values"
)
) %>%
DT::datatable(rownames = FALSE,
options = list(scrollX = TRUE),
class = 'white-space: wrap')
print("Bivariate results (FDR < 0.05):")
## [1] "Bivariate results (FDR < 0.05):"
bivar_fdr <-
results$bivar %>%
lapply(., function(df) df %>% dplyr::filter(fdr < 0.05))
bivar_fdr %>%
qdapTools::list_df2df(col1 = "window_size") %>%
DT::datatable(rownames = FALSE,
options = list(scrollX = TRUE),
class = 'white-space: nowrap')
print("Bivariate results (Bonferroni p < 0.05):")
## [1] "Bivariate results (Bonferroni p < 0.05):"
bivar_bonf <-
results$bivar %>%
lapply(., function(df) df %>% dplyr::filter(bonferroni < 0.05))
bivar_bonf %>%
qdapTools::list_df2df(col1 = "window_size") %>%
DT::datatable(rownames = FALSE,
options = list(scrollX = TRUE),
class = 'white-space: nowrap')
overlap <-
bivar_fdr$window_100000 %>%
dplyr::select(
-chr, -start, -stop, -contains("n_"), -contains("lower"), -contains("upper")
) %>%
dplyr::left_join(
results$bivar$window_50000 %>%
dplyr::select(
-chr, -start, -stop, -contains("n_"), -contains("lower"), -contains("upper")
),
by = c("ld_block", "eqtl_dataset", "gene_locus", "gene_name", "phen1", "phen2"),
suffix = c("_100000", "_50000")
) %>%
dplyr::bind_rows(
bivar_fdr$window_50000 %>%
dplyr::select(
-chr, -start, -stop, -contains("n_"), -contains("lower"), -contains("upper")
) %>%
dplyr::left_join(
results$bivar$window_100000 %>%
dplyr::select(
-chr, -start, -stop, -contains("n_"), -contains("lower"), -contains("upper")
),
by = c("ld_block", "eqtl_dataset", "gene_locus", "gene_name", "phen1", "phen2"),
suffix = c("_50000", "_100000")
)
) %>%
select(
ld_block:phen2, contains("rho"), contains("r2"), contains("p"), contains("fdr"), contains("bonferroni")
) %>%
dplyr::distinct()
overlap %>%
dplyr::arrange(ld_block, gene_name) %>%
DT::datatable(rownames = FALSE,
options = list(scrollX = TRUE),
class = 'white-space: nowrap')
# saveRDS(
# overlap,
# here::here("results", "04_eqtl_univar_bivar", "fdr_overlap.rds")
# )
print("Partial correlations column descriptions:")
## [1] "Partial correlations column descriptions:"
tibble(
column = colnames(pcorr),
description =
c("eQTL dataset from which gene eQTLs are derived",
"Ensembl ID for gene for which eQTLs were tested in genic region",
"HGNC symbol for gene for which eQTLs were tested in genic region",
"Genic region chromosome",
"Genic region start base pair",
"Genic region end base pair",
"The number of SNPs within the genic region",
"The number of PCs retained within the genic region",
"Phenotype 1. Gene expression traits (i.e. gene eQTLs) are denoted by their ensembl gene id",
"Phenotype 2. Gene expression traits (i.e. gene eQTLs) are denoted by their ensembl gene id",
"Phenotype which the genetic correlation between phenotype 1 and 2 was conditioned on",
"The proportion of genetic signal in phenotype 1 explained by z",
"The proportion of genetic signal in phenotype 2 explained by z",
"The partial correlation between phen1 and phen2 conditioned on z",
"Lower bound of 95% confidence interval for partial correlation",
"Upper bound of 95% confidence interval for partial correlation",
"Simulation p-values for the partial genetic correlation"
)
) %>%
DT::datatable(rownames = FALSE,
options = list(scrollX = TRUE),
class = 'white-space: nowrap')
print("All partial correlations:")
## [1] "All partial correlations:"
pcorr %>%
DT::datatable(rownames = FALSE,
options = list(scrollX = TRUE),
class = 'white-space: nowrap')
results_long %>%
dplyr::filter(
!feature %in%
c("chr", "start", "stop", "fdr", "bonferroni", "n_pcs", "n_snps"),
!str_detect(feature, "upper"),
!str_detect(feature, "lower")
) %>%
ggplot(
aes(
x = window_50000,
y = window_100000
)
) +
geom_point(
aes(
fill = doe_same
),
shape = 21,
colour = "black",
alpha = 0.4
) +
geom_smooth(
method = "lm",
formula = "y~x",
level = 0.99,
colour = "black",
fill = "grey"
) +
ggpubr::stat_cor(
method = "pearson",
cor.coef.name = "R"
) +
geom_abline(
intercept = 0,
linetype = "dashed",
colour = "#EE442F"
) +
facet_wrap(
vars(phenotype_corr, feature),
scales = "free"
) +
labs(
x = "50-kb window",
y = "100-kb window"
) +
scale_fill_discrete(na.translate = F) +
theme_rhr
Figure 4.1: Scatter plot of -log10(p-value), r2 and rho for each pair of phenotypes that could be tested across genic regions with a 50-kb or 100-kb window. Points are coloured, where applicable, by whether they share the same direction of effect. The black line represents a linear model fitted to the data, with the 99% confidence interval indicated with a grey fill. The red dashed line represents the line y = x.
results$bivar %>%
qdapTools::list_df2df(col1 = "window_size") %>%
tidyr::pivot_longer(
cols = -c("window_size", "ld_block", "eqtl_dataset", "gene_locus", "gene_name", "phen1", "phen2"),
names_to = "feature",
values_to = "value"
) %>%
dplyr::mutate(
window_size =
window_size %>%
readr::parse_number() %>%
format(scientific = FALSE) %>%
as.factor()
) %>%
dplyr::filter(
feature %in% c("bonferroni", "fdr"),
value < 0.05
) %>%
dplyr::count(window_size, feature, name = "n_total") %>%
dplyr::mutate(
unique =
case_when(
feature == "bonferroni" ~ n_total - (overlap %>% dplyr::filter(bonferroni_100000 < 0.05 & bonferroni_50000 < 0.05) %>% nrow()),
feature == "fdr" ~ n_total - (overlap %>% dplyr::filter(fdr_100000 < 0.05 & fdr_50000 < 0.05) %>% nrow())
),
shared =
n_total-unique
) %>%
tidyr::pivot_longer(
cols = unique:shared,
names_to = "sharing",
values_to = "n"
) %>%
ggplot(
aes(
x = window_size,
y = n,
fill = sharing
)
) +
geom_col() +
facet_grid(cols = vars(feature)) +
labs(
x = "Window size (kb)",
y = "Number of significant bivariate local rg"
) +
theme_rhr
Figure 4.2: Number of significant bivariate local genetic correlations across window sizes, using either the stringent bonferroni cut-off or the more lenient FDR cut-off. Bars are coloured by whether local genetic correlations are significant across both window sizes (shared) or only one (unique).
a <-
results_long %>%
inner_join(
overlap %>%
dplyr::filter(fdr_100000 < 0.05 & fdr_50000 < 0.05) %>%
dplyr::select(ld_block, eqtl_dataset, gene_locus, gene_name, phen1, phen2)
) %>%
dplyr::filter(feature == "r2") %>%
dplyr::rename(
`50` = window_50000,
`100` = window_100000
) %>%
ggpubr::ggpaired(
cond1 = "50",
cond2 = "100",
fill = "condition",
line.color = "gray",
line.size = 0.4,
ggtheme = theme_rhr
) +
labs(
x = "Window size (kb)",
y = "r2"
) +
theme(legend.position = "none")
b <-
results_long %>%
inner_join(
overlap %>%
dplyr::filter(fdr_100000 < 0.05 & fdr_50000 < 0.05) %>%
dplyr::select(ld_block, eqtl_dataset, gene_locus, gene_name, phen1, phen2)
) %>%
dplyr::filter(
feature %in% c("r2"),
!is.na(window_50000),
!is.na(window_100000)
) %>%
dplyr::mutate(
diff = window_100000 - window_50000
) %>%
ggplot(aes(x = diff)) +
geom_density() +
labs(
x = "Difference in r2 between two window sizes\n(diff = 100 kb - 50 kb)",
y = "Density"
) +
theme_rhr
ggpubr::ggarrange(
a,b,
labels = letters[1:2]
)
Figure 4.3: (a) Boxplot displaying r2 estimates between phenotype pairs that demonstrated significant bivariate local genetic correlation (FDR < 0.05) across both window sizes. Grey lines indicate paired estimates. (b) Density plot displaying difference in r2 estimates between phenotype pairs that demonstrated significant bivariate local genetic correlation (FDR < 0.05) across both window sizes.
data_to_plot <-
bivar_fdr$window_100000 %>%
dplyr::filter(!str_detect(phen2, "ENSG")) %>%
dplyr::distinct(ld_block, phen1, phen2, rho) %>%
dplyr::group_by(ld_block, phen1, phen2) %>%
dplyr::summarise(rho = mean(rho)) %>%
dplyr::bind_rows(
bivar_fdr$window_100000 %>%
dplyr::filter(str_detect(phen2, "ENSG")) %>%
dplyr::distinct(eqtl_dataset, ld_block, phen1, phen2, rho) %>%
dplyr::mutate(
phen2 =
case_when(
str_detect(phen2, "ENSG") ~ eqtl_dataset,
TRUE ~ phen2
)
) %>%
dplyr::select(ld_block, phen1, phen2, rho)
) %>%
dplyr::ungroup()
plot_bivar_chord_diagram(
bivar_corr = data_to_plot,
fct_phen = c(fct_disease, "EQTLGEN", "PSYCHENCODE"),
palette = c(RColorBrewer::brewer.pal(n = 6, name = "BrBG"), "black", "grey")
)
Figure 4.4: Chord diagram showing the number of distinct significant bivariate local genetic correlations between each of the traits across all LD blocks (FDR-corrected p < 0.05). Pairs of disease phenotypes (which were tested across multiple genes within an LD block) were collapsed and their average rho within the LD block used. Positive and negative correlations are coloured red and blue, respectively.
data_to_plot <-
bivar_bonf$window_100000 %>%
dplyr::filter(!str_detect(phen2, "ENSG")) %>%
dplyr::distinct(ld_block, phen1, phen2, rho) %>%
dplyr::group_by(ld_block, phen1, phen2) %>%
dplyr::summarise(rho = mean(rho)) %>%
dplyr::bind_rows(
bivar_bonf$window_100000 %>%
dplyr::filter(str_detect(phen2, "ENSG")) %>%
dplyr::distinct(eqtl_dataset, ld_block, phen1, phen2, rho) %>%
dplyr::mutate(
phen2 =
case_when(
str_detect(phen2, "ENSG") ~ eqtl_dataset,
TRUE ~ phen2
)
) %>%
dplyr::select(ld_block, phen1, phen2, rho)
) %>%
dplyr::ungroup()
plot_bivar_chord_diagram(
bivar_corr = data_to_plot,
fct_phen = c(fct_disease, "EQTLGEN", "PSYCHENCODE"),
palette = c(RColorBrewer::brewer.pal(n = 6, name = "BrBG"), "black", "grey")
)
Figure 4.5: Chord diagram showing the number of distinct significant bivariate local genetic correlations between each of the traits across all LD blocks (Bonferroni-corrected p < 0.05). Pairs of disease phenotypes (which were tested across multiple genes within an LD block) were collapsed and their average rho within the LD block used. Positive and negative correlations are coloured red and blue, respectively.
results$univ$window_100000 %>%
dplyr::mutate(
phen =
case_when(
str_detect(phen, "ENSG") ~ eqtl_dataset,
TRUE ~ phen
)
) %>%
ggplot(
aes(
x = n_snps,
y = -log10(p)
)
) +
geom_point(alpha = 0.5) +
scale_x_log10() +
scale_colour_manual(
values = c(RColorBrewer::brewer.pal(n = 6, name = "BrBG"), "black", "grey")
) +
facet_wrap(vars(phen)) +
labs(
x = "Number of SNPs within the locus (logarithmic scale)",
y = "-log10(p-value)"
) +
theme_rhr
Figure 4.6: Scatter plot of number of SNPs in a tested genic region and the p-value from the univariate test for each trait.
bivar_fdr$window_100000 %>%
dplyr::mutate(
phenotype_corr =
case_when(
str_detect(phen2, "ENSG") ~ "gwas-eqtl",
TRUE ~ "gwas-gwas"
)
) %>%
ggplot(
aes(
x = phenotype_corr,
y = r2
)
) +
geom_boxplot(fill = "#d9d9d9", outlier.alpha = 0) +
ggbeeswarm::geom_beeswarm(alpha = 0.5) +
theme_rhr
Figure 4.7: Boxplot of explained variance (r2, the proportion of variance in genetic signal of one disease trait in a pair explained by the other) in trait pairs involved a disease and gene expression trait (gwas-eqtl) or two disease traits (gwas-gwas). Only local rgs that passed significance are plotted (FDR < 0.05).
Given the overlap between genic regions, significance was determined using the more lenient FDR multiple test correction. Results that do not replicate using the more stringent Bonferroni multiple test correction should be interpreted with caution.
ld_blocks <-
bivar_fdr$window_100000$ld_block %>% unique() %>% sort()
qtl_genes <-
bivar_fdr$window_100000 %>%
dplyr::filter(str_detect(phen2, "ENSG")) %>%
.[["gene_locus"]] %>%
unique()
bivar_list <-
setNames(
bivar_fdr$window_100000 %>%
dplyr::group_split(ld_block),
nm =
str_c("locus_", ld_blocks)
)
ref <- import("/data/references/ensembl/gtf_gff3/v87/Homo_sapiens.GRCh37.87.gtf")
ref <- ref %>% keepSeqlevels(c(1:22), pruning.mode = "coarse")
ref <- ref[ref$type == "gene"]
loci_gr <-
LAVA::read.loci(here::here("results", "01_input_prep", "gwas_filtered.loci")) %>%
dplyr::rename(locus = LOC) %>%
dplyr::filter(locus %in% bivar_fdr$window_100000$ld_block) %>%
GenomicRanges::makeGRangesFromDataFrame(
.,
keep.extra.columns = TRUE,
ignore.strand = TRUE,
seqinfo = NULL,
seqnames.field = "CHR",
start.field = "START",
end.field = "STOP"
)
fig_list <- vector(mode = "list", length = length(ld_blocks))
for(block in ld_blocks){
locus_plot <-
plot_locus(
locus_gr = loci_gr[loci_gr$locus == block],
ref = ref,
window = 50000,
highlight_gene = qtl_genes,
highlight_gene_label = "QTL - local rg?"
)
edge_plot <-
plot_qtl_edge_diagram(
bivar_corr_qtl = bivar_list[[str_c("locus_", block)]],
phen = fct_disease,
seed = 89
)
if(length(edge_plot)/4 >=2){
height <- c(1,3.5)
} else{
height <- c(1,1)
}
fig_list[[which(ld_blocks == block)]] <-
ggpubr::ggarrange(
locus_plot,
ggpubr::ggarrange(
plotlist = edge_plot,
ncol = 4,
nrow = ceiling(length(edge_plot)/4),
common.legend = TRUE,
legend = "top"
),
ncol = 1,
labels = letters[1:2],
heights = height
)
names(fig_list)[which(ld_blocks == block)] <- str_c("locus_", block)
}
fig_list[1:4]
## $locus_681
Figure 4.8: (a) Locus plot. Genic regions (including 100-kb window) with a significant bivariate local genetic correlation between a disease phenotype and eQTL are highlighted in blue (FDR-corrected p < 0.05). (b) Edge diagrams for genic regions where a significant bivariate local genetic correlation was observed between a disease phenotype and eQTL (FDR-corrected p < 0.05). Edges display the standardised coefficient for genetic correlation (rho) for significant bivariate local genetic correlations, with negative and positive correlations indicated by blue and red colour, respectively. GWAS and eQTL nodes are indicated by grey and white fill, respectively.
##
## $locus_1273
Figure 4.9: (a) Locus plot. Genic regions (including 100-kb window) with a significant bivariate local genetic correlation between a disease phenotype and eQTL are highlighted in blue (FDR-corrected p < 0.05). (b) Edge diagrams for genic regions where a significant bivariate local genetic correlation was observed between a disease phenotype and eQTL (FDR-corrected p < 0.05). Edges display the standardised coefficient for genetic correlation (rho) for significant bivariate local genetic correlations, with negative and positive correlations indicated by blue and red colour, respectively. GWAS and eQTL nodes are indicated by grey and white fill, respectively.
##
## $locus_1719
Figure 4.10: (a) Locus plot. Genic regions (including 100-kb window) with a significant bivariate local genetic correlation between a disease phenotype and eQTL are highlighted in blue (FDR-corrected p < 0.05). (b) Edge diagrams for genic regions where a significant bivariate local genetic correlation was observed between a disease phenotype and eQTL (FDR-corrected p < 0.05). Edges display the standardised coefficient for genetic correlation (rho) for significant bivariate local genetic correlations, with negative and positive correlations indicated by blue and red colour, respectively. GWAS and eQTL nodes are indicated by grey and white fill, respectively.
##
## $locus_2281
Figure 4.11: (a) Locus plot. Genic regions (including 100-kb window) with a significant bivariate local genetic correlation between a disease phenotype and eQTL are highlighted in blue (FDR-corrected p < 0.05). (b) Edge diagrams for genic regions where a significant bivariate local genetic correlation was observed between a disease phenotype and eQTL (FDR-corrected p < 0.05). Edges display the standardised coefficient for genetic correlation (rho) for significant bivariate local genetic correlations, with negative and positive correlations indicated by blue and red colour, respectively. GWAS and eQTL nodes are indicated by grey and white fill, respectively.
fig_list[5]
## $locus_2351
Figure 4.12: (a) Locus plot. Genic regions with a significant bivariate local genetic correlation between a disease phenotype and eQTL are highlighted in blue (FDR-corrected p < 0.05). (b) Edge diagrams for genic regions where a significant bivariate local genetic correlation was observed between a disease phenotype and eQTL (FDR-corrected p < 0.05). Edges display the standardised coefficient for genetic correlation (rho) for significant bivariate local genetic correlations, with negative and positive correlations indicated by blue and red colour, respectively. GWAS and eQTL nodes are indicated by grey and white fill, respectively.
ld_blocks <-
bivar_bonf$window_100000 %>%
dplyr::group_by(eqtl_dataset, gene_locus) %>%
.[["ld_block"]] %>%
unique() %>%
sort()
qtl_genes <-
bivar_bonf$window_100000 %>%
dplyr::group_by(eqtl_dataset, gene_locus) %>%
dplyr::filter(str_detect(phen2, "ENSG")) %>%
.[["gene_locus"]] %>%
unique()
bivar_list <-
setNames(
bivar_bonf$window_100000 %>%
dplyr::group_split(ld_block),
nm =
str_c("locus_", ld_blocks)
)
fig_list <- vector(mode = "list", length = length(ld_blocks))
for(block in ld_blocks){
locus_plot <-
plot_locus(
locus_gr = loci_gr[loci_gr$locus == block],
ref = ref,
highlight_gene = qtl_genes,
highlight_gene_label = "QTL - local rg?"
)
edge_plot <-
plot_qtl_edge_diagram(
bivar_corr_qtl = bivar_list[[str_c("locus_", block)]],
phen = fct_disease,
seed = 89
)
if(length(edge_plot)/4 >=2){
height <- c(1,3.5)
} else{
height <- c(1,0.5)
}
fig_list[[which(ld_blocks == block)]] <-
ggpubr::ggarrange(
locus_plot,
ggpubr::ggarrange(
plotlist = edge_plot,
ncol = 4,
nrow = ceiling(length(edge_plot)/4),
common.legend = TRUE,
legend = "top"
),
ncol = 1,
labels = letters[1:2],
heights = height
)
names(fig_list)[which(ld_blocks == block)] <- str_c("locus_", block)
}
fig_list[1:4]
## $locus_681
Figure 4.13: (a) Locus plot. Genic regions (including 100-kb window) with a significant bivariate local genetic correlation between a disease phenotype and eQTL are highlighted in blue (Bonferroni-corrected p < 0.05). (b) Edge diagrams for genic regions where a significant bivariate local genetic correlation was observed between a disease phenotype and eQTL (Bonferroni-corrected p < 0.05). Edges display the standardised coefficient for genetic correlation (rho) for significant bivariate local genetic correlations, with negative and positive correlations indicated by blue and red colour, respectively. GWAS and eQTL nodes are indicated by grey and white fill, respectively.
##
## $locus_1273
Figure 4.14: (a) Locus plot. Genic regions (including 100-kb window) with a significant bivariate local genetic correlation between a disease phenotype and eQTL are highlighted in blue (Bonferroni-corrected p < 0.05). (b) Edge diagrams for genic regions where a significant bivariate local genetic correlation was observed between a disease phenotype and eQTL (Bonferroni-corrected p < 0.05). Edges display the standardised coefficient for genetic correlation (rho) for significant bivariate local genetic correlations, with negative and positive correlations indicated by blue and red colour, respectively. GWAS and eQTL nodes are indicated by grey and white fill, respectively.
##
## $locus_1719
Figure 4.15: (a) Locus plot. Genic regions (including 100-kb window) with a significant bivariate local genetic correlation between a disease phenotype and eQTL are highlighted in blue (Bonferroni-corrected p < 0.05). (b) Edge diagrams for genic regions where a significant bivariate local genetic correlation was observed between a disease phenotype and eQTL (Bonferroni-corrected p < 0.05). Edges display the standardised coefficient for genetic correlation (rho) for significant bivariate local genetic correlations, with negative and positive correlations indicated by blue and red colour, respectively. GWAS and eQTL nodes are indicated by grey and white fill, respectively.
##
## $locus_2281
Figure 4.16: (a) Locus plot. Genic regions (including 100-kb window) with a significant bivariate local genetic correlation between a disease phenotype and eQTL are highlighted in blue (Bonferroni-corrected p < 0.05). (b) Edge diagrams for genic regions where a significant bivariate local genetic correlation was observed between a disease phenotype and eQTL (Bonferroni-corrected p < 0.05). Edges display the standardised coefficient for genetic correlation (rho) for significant bivariate local genetic correlations, with negative and positive correlations indicated by blue and red colour, respectively. GWAS and eQTL nodes are indicated by grey and white fill, respectively.
fig_list[5]
## $locus_2351
Figure 4.17: (a) Locus plot. Genic regions with a significant bivariate local genetic correlation between a disease phenotype and eQTL are highlighted in blue (Bonferroni-corrected p < 0.05). (b) Edge diagrams for genic regions where a significant bivariate local genetic correlation was observed between a disease phenotype and eQTL (Bonferroni-corrected p < 0.05). Edges display the standardised coefficient for genetic correlation (rho) for significant bivariate local genetic correlations, with negative and positive correlations indicated by blue and red colour, respectively. GWAS and eQTL nodes are indicated by grey and white fill, respectively.
Show/hide
## ─ Session info ───────────────────────────────────────────────────────────────────────────────────────────────────────
## setting value
## version R version 4.0.5 (2021-03-31)
## os Ubuntu 16.04.7 LTS
## system x86_64, linux-gnu
## ui X11
## language (EN)
## collate en_GB.UTF-8
## ctype en_GB.UTF-8
## tz Europe/London
## date 2022-10-03
##
## ─ Packages ───────────────────────────────────────────────────────────────────────────────────────────────────────────
## ! package * version date lib source
## P abind 1.4-5 2016-07-21 [?] CRAN (R 4.0.5)
## P assertthat 0.2.1 2019-03-21 [?] CRAN (R 4.0.5)
## P backports 1.3.0 2021-10-27 [?] CRAN (R 4.0.5)
## P beeswarm 0.4.0 2021-06-01 [?] CRAN (R 4.0.5)
## P Biobase 2.50.0 2020-10-27 [?] Bioconductor
## P BiocGenerics * 0.36.1 2021-04-16 [?] Bioconductor
## P BiocParallel 1.24.1 2020-11-06 [?] Bioconductor
## P Biostrings 2.58.0 2020-10-27 [?] Bioconductor
## P bit 4.0.4 2020-08-04 [?] CRAN (R 4.0.5)
## P bit64 4.0.5 2020-08-30 [?] CRAN (R 4.0.5)
## P bitops 1.0-7 2021-04-24 [?] CRAN (R 4.0.5)
## P bookdown 0.22 2021-04-22 [?] CRAN (R 4.0.5)
## P broom 0.7.10 2021-10-31 [?] CRAN (R 4.0.5)
## P bslib 0.3.1 2021-10-06 [?] CRAN (R 4.0.5)
## P car 3.0-11 2021-06-27 [?] CRAN (R 4.0.5)
## P carData 3.0-4 2020-05-22 [?] CRAN (R 4.0.5)
## P cellranger 1.1.0 2016-07-27 [?] CRAN (R 4.0.5)
## P chron 2.3-56 2020-08-18 [?] CRAN (R 4.0.5)
## P circlize * 0.4.13 2021-06-09 [?] CRAN (R 4.0.5)
## P cli 3.1.0 2021-10-27 [?] CRAN (R 4.0.5)
## P colorspace 2.0-2 2021-06-24 [?] CRAN (R 4.0.5)
## P cowplot 1.1.1 2020-12-30 [?] CRAN (R 4.0.5)
## P crayon 1.4.2 2021-10-29 [?] CRAN (R 4.0.5)
## P crosstalk 1.1.1 2021-01-12 [?] CRAN (R 4.0.5)
## P curl 4.3.2 2021-06-23 [?] CRAN (R 4.0.5)
## P data.table 1.14.2 2021-09-27 [?] CRAN (R 4.0.5)
## P DBI 1.1.1 2021-01-15 [?] CRAN (R 4.0.5)
## P dbplyr 2.1.1 2021-04-06 [?] CRAN (R 4.0.5)
## P DelayedArray 0.16.3 2021-03-24 [?] Bioconductor
## P digest 0.6.29 2021-12-01 [?] CRAN (R 4.0.5)
## P dplyr * 1.0.7 2021-06-18 [?] CRAN (R 4.0.5)
## P DT 0.19 2021-09-02 [?] CRAN (R 4.0.5)
## P ellipsis 0.3.2 2021-04-29 [?] CRAN (R 4.0.5)
## P evaluate 0.14 2019-05-28 [?] CRAN (R 4.0.5)
## P fansi 0.5.0 2021-05-25 [?] CRAN (R 4.0.5)
## P farver 2.1.0 2021-02-28 [?] CRAN (R 4.0.5)
## P fastmap 1.1.0 2021-01-25 [?] CRAN (R 4.0.5)
## P forcats * 0.5.1 2021-01-27 [?] CRAN (R 4.0.5)
## P foreign 0.8-81 2020-12-22 [?] CRAN (R 4.0.5)
## P fs 1.5.1 2021-11-30 [?] CRAN (R 4.0.5)
## P generics 0.1.1 2021-10-25 [?] CRAN (R 4.0.5)
## P GenomeInfoDb * 1.26.7 2021-04-08 [?] Bioconductor
## P GenomeInfoDbData 1.2.4 2021-07-03 [?] Bioconductor
## P GenomicAlignments 1.26.0 2020-10-27 [?] Bioconductor
## P GenomicRanges * 1.42.0 2020-10-27 [?] Bioconductor
## P ggbeeswarm * 0.6.0 2017-08-07 [?] CRAN (R 4.0.5)
## P ggforce 0.3.3 2021-03-05 [?] CRAN (R 4.0.5)
## P ggplot2 * 3.3.5 2021-06-25 [?] CRAN (R 4.0.5)
## P ggpubr 0.4.0 2020-06-27 [?] CRAN (R 4.0.5)
## P ggraph * 2.0.5 2021-02-23 [?] CRAN (R 4.0.5)
## P ggrepel 0.9.1 2021-01-15 [?] CRAN (R 4.0.5)
## P ggsignif 0.6.3 2021-09-09 [?] CRAN (R 4.0.5)
## P GlobalOptions 0.1.2 2020-06-10 [?] CRAN (R 4.0.5)
## P glue 1.5.1 2021-11-30 [?] CRAN (R 4.0.5)
## P graphlayouts 0.7.1 2020-10-26 [?] CRAN (R 4.0.5)
## P gridExtra 2.3 2017-09-09 [?] CRAN (R 4.0.5)
## P gtable 0.3.0 2019-03-25 [?] CRAN (R 4.0.5)
## P haven 2.4.3 2021-08-04 [?] CRAN (R 4.0.5)
## P here 1.0.1 2020-12-13 [?] CRAN (R 4.0.5)
## P highr 0.9 2021-04-16 [?] CRAN (R 4.0.5)
## P hms 1.1.1 2021-09-26 [?] CRAN (R 4.0.5)
## P htmltools 0.5.2 2021-08-25 [?] CRAN (R 4.0.5)
## P htmlwidgets 1.5.4 2021-09-08 [?] CRAN (R 4.0.5)
## P httr 1.4.2 2020-07-20 [?] CRAN (R 4.0.5)
## P igraph 1.2.7 2021-10-15 [?] CRAN (R 4.0.5)
## P IRanges * 2.24.1 2020-12-12 [?] Bioconductor
## P jquerylib 0.1.4 2021-04-26 [?] CRAN (R 4.0.5)
## P jsonlite 1.7.2 2020-12-09 [?] CRAN (R 4.0.5)
## P knitr 1.36 2021-09-29 [?] CRAN (R 4.0.5)
## P labeling 0.4.2 2020-10-20 [?] CRAN (R 4.0.5)
## P lattice 0.20-44 2021-05-02 [?] CRAN (R 4.0.5)
## P LAVA 0.0.6 2021-09-01 [?] xgit (@7be3421)
## P lifecycle 1.0.1 2021-09-24 [?] CRAN (R 4.0.5)
## P lubridate 1.8.0 2021-10-07 [?] CRAN (R 4.0.5)
## P magrittr 2.0.1 2020-11-17 [?] CRAN (R 4.0.5)
## P MASS 7.3-54 2021-05-03 [?] CRAN (R 4.0.5)
## P Matrix 1.3-4 2021-06-01 [?] CRAN (R 4.0.5)
## P MatrixGenerics 1.2.1 2021-01-30 [?] Bioconductor
## P matrixStats 0.61.0 2021-09-17 [?] CRAN (R 4.0.5)
## P mgcv 1.8-36 2021-06-01 [?] CRAN (R 4.0.5)
## P modelr 0.1.8 2020-05-19 [?] CRAN (R 4.0.5)
## P munsell 0.5.0 2018-06-12 [?] CRAN (R 4.0.5)
## P nlme 3.1-152 2021-02-04 [?] CRAN (R 4.0.5)
## P openxlsx 4.2.4 2021-06-16 [?] CRAN (R 4.0.5)
## P pillar 1.6.4 2021-10-18 [?] CRAN (R 4.0.5)
## P pkgconfig 2.0.3 2019-09-22 [?] CRAN (R 4.0.5)
## P polyclip 1.10-0 2019-03-14 [?] CRAN (R 4.0.5)
## P purrr * 0.3.4 2020-04-17 [?] CRAN (R 4.0.5)
## P qdapTools 1.3.5 2020-04-17 [?] CRAN (R 4.0.5)
## P R6 2.5.1 2021-08-19 [?] CRAN (R 4.0.5)
## P RColorBrewer 1.1-2 2014-12-07 [?] CRAN (R 4.0.5)
## P Rcpp 1.0.7 2021-07-07 [?] CRAN (R 4.0.5)
## P RCurl 1.98-1.5 2021-09-17 [?] CRAN (R 4.0.5)
## P readr * 2.0.2 2021-09-27 [?] CRAN (R 4.0.5)
## P readxl 1.3.1 2019-03-13 [?] CRAN (R 4.0.5)
## P reprex 2.0.0 2021-04-02 [?] CRAN (R 4.0.5)
## P rio 0.5.27 2021-06-21 [?] CRAN (R 4.0.5)
## P rlang 0.4.12 2021-10-18 [?] CRAN (R 4.0.5)
## P rmarkdown 2.11 2021-09-14 [?] CRAN (R 4.0.5)
## P rprojroot 2.0.2 2020-11-15 [?] CRAN (R 4.0.5)
## P Rsamtools 2.6.0 2020-10-27 [?] Bioconductor
## P rstatix 0.7.0 2021-02-13 [?] CRAN (R 4.0.5)
## P rstudioapi 0.13 2020-11-12 [?] CRAN (R 4.0.5)
## P rtracklayer * 1.50.0 2020-10-27 [?] Bioconductor
## P rvest 1.0.1 2021-07-26 [?] CRAN (R 4.0.5)
## P S4Vectors * 0.28.1 2020-12-09 [?] Bioconductor
## P sass 0.4.0 2021-05-12 [?] CRAN (R 4.0.5)
## P scales 1.1.1 2020-05-11 [?] CRAN (R 4.0.5)
## P sessioninfo * 1.1.1 2018-11-05 [?] CRAN (R 4.0.5)
## P shape 1.4.6 2021-05-19 [?] CRAN (R 4.0.5)
## P stringi 1.7.6 2021-11-29 [?] CRAN (R 4.0.5)
## P stringr * 1.4.0 2019-02-10 [?] CRAN (R 4.0.5)
## P SummarizedExperiment 1.20.0 2020-10-27 [?] Bioconductor
## P tibble * 3.1.6 2021-11-07 [?] CRAN (R 4.0.5)
## P tidygraph 1.2.0 2020-05-12 [?] CRAN (R 4.0.5)
## P tidyr * 1.1.4 2021-09-27 [?] CRAN (R 4.0.5)
## P tidyselect 1.1.1 2021-04-30 [?] CRAN (R 4.0.5)
## P tidyverse * 1.3.1 2021-04-15 [?] CRAN (R 4.0.5)
## P tweenr 1.0.2 2021-03-23 [?] CRAN (R 4.0.5)
## P tzdb 0.2.0 2021-10-27 [?] CRAN (R 4.0.5)
## P utf8 1.2.2 2021-07-24 [?] CRAN (R 4.0.5)
## P vctrs 0.3.8 2021-04-29 [?] CRAN (R 4.0.5)
## P vipor 0.4.5 2017-03-22 [?] CRAN (R 4.0.5)
## P viridis 0.6.2 2021-10-13 [?] CRAN (R 4.0.5)
## P viridisLite 0.4.0 2021-04-13 [?] CRAN (R 4.0.5)
## P vroom 1.5.5 2021-09-14 [?] CRAN (R 4.0.5)
## P withr 2.4.3 2021-11-30 [?] CRAN (R 4.0.5)
## P xfun 0.27 2021-10-18 [?] CRAN (R 4.0.5)
## P XML 3.99-0.8 2021-09-17 [?] CRAN (R 4.0.5)
## P xml2 1.3.2 2020-04-23 [?] CRAN (R 4.0.5)
## P XVector 0.30.0 2020-10-27 [?] Bioconductor
## P yaml 2.2.1 2020-02-01 [?] CRAN (R 4.0.5)
## P zip 2.2.0 2021-05-31 [?] CRAN (R 4.0.5)
## P zlibbioc 1.36.0 2020-10-27 [?] Bioconductor
##
## [1] /home/rreynolds/misc_projects/neurodegen-psych-local-corr/renv/library/R-4.0/x86_64-pc-linux-gnu
## [2] /opt/R/4.0.5/lib/R/library
##
## P ── Loaded and on-disk path mismatch.