Aim: create input files required to run LAVA
To run LAVA requires that a number of files have been generated, most important of which are (i) the loci of interest (as defined based either on genomic coordinates or RS ID) and (ii) the sample overlap matrix. As described by the authors in their pre-print, loci can be defined in (i) a targeted manner (e.g. follow up of a smaller subset of loci highlighted through GWAS) or (ii) a more agnostic manner (e.g. scanning multiple traits across the entire genome). Our interest stems from (i) the pathological overlap between neurodegenerative diseases pathologically classified by the deposition of specific proteins (i.e. amyloid-\(\beta\) in AD, \(\alpha\)-synuclein in PD and the combination of both in Lewy body dementia); (ii) the higher prevalence of depression in individuals with dementia compared to those without dementia (clinically significant in 35% of the PD population, PMID: 17987654, 30536144); and (iii) the higher risk of dementia diagnoses in individuals with schizophrenia versus individuals without a history of serious mental illness (PMID: 33688938, 26444987). Thus, we will begin by evaluating genome-wide significant loci from three neurodenerative disorders (Alzheimer's disease, AD; Lewy body dementia, LBD; and Parkinson's disease, PD), and three neuropsychiatric disorders (bipolar disorder, BIP; major depressive disorder, MDD; and schizophrenia, SCZ). The sample overlap matrix is important as it sample overlap can result in an upward bias of the estimated correlation.
Genome-wide significant loci (p < 5 x \(10^{-8}\)) were derived from the latest publicly-available AD, BIP, LBD, MDD, PD and SCZ GWASs. Genome-wide significant loci were overlapped with LD blocks generated in the original LAVA pre-print. These LD blocks represent approximately equal sized, semi-independent blocks of SNPs, with a minimum size requirement of 2500 SNPs (resulting in an average block size of around 1Mb). The advantage of using these blocks is that the method used to derive them attempts to determine the most suitable breakpoint in a block of LD (i.e. the point at which LD is the lowest), ensuring that the LD blocks we eventually test are not "breaking" LD chunks in an inappropriate manner. Only those LD blocks containing GWAS loci were carried forward in downstream analyses.
Due to the potential sample overlap between some GWASs and its impact on any estimated correlations, we used cross-trait LDSC (PMID:25642630; PMID:26414676) to obtain an estimate of the sample overlap. Summary statistics for each phenotype were pre-processed using LDSC's munge_sumstats.py
using HapMap Project Phase 3 SNPs (PMID:20811451). For the LD reference panel, 1000 Genomes Project Phase 3 European population SNPs were used (PMID:23128226). Any shared variance due to sample overlap was modelled as a residual genetic covariance. As described in the LAVA pre-print, genetic covariance derived from cross-trait LDSC was used to create a symmetric matrix, with diagonals populated by comparisons of each phenotype with itself. This was then converted to a correlation matrix.
Following section includes any intermediary code used in this .Rmd
.
To run some of the scripts within this .Rmd requires packages that are not available from CRAN/Bioconductor. These include LDSCforRyten
. If these are not already installed, run the code chunk below.
devtools::install_github("RHReynolds/LDSCforRyten")
Note: running this script requires an internet connection, in order to load the LD blocks from from the LAVA-partitioning GitHub repository.
source(here::here("scripts", "01a_get_test_loci.R"))
All LD blocks containing genome-wide signficiant loci are shown below.
loci <-
read_delim(
file = here::here("results", "01_input_prep", "gwas_filtered.loci"),
delim = "\t")
loci %>%
DT::datatable(rownames = FALSE,
options = list(scrollX = TRUE),
class = 'white-space: nowrap')
source(here::here("scripts", "01b_get_sample_overlaps.R"))
Sample overlap matrix shown below.
sample_overlap <-
read.table(
file = here::here("results", "01_input_prep", "sample_overlap.txt")
)
sample_overlap %>%
as.matrix()
## AD2019 BIP2021 LBD2020 MDD2019 PD2019.meta5.ex23andMe
## AD2019 1.00000 0.00628 0.02589 0.01946 0.01193
## BIP2021 0.00628 1.00000 0.00800 0.05997 -0.00150
## LBD2020 0.02589 0.00800 1.00000 0.00040 0.01281
## MDD2019 0.01946 0.05997 0.00040 1.00000 0.00436
## PD2019.meta5.ex23andMe 0.01193 -0.00150 0.01281 0.00436 1.00000
## SCZ2018 0.01150 0.12928 0.00661 0.03615 0.00295
## SCZ2018
## AD2019 0.01150
## BIP2021 0.12928
## LBD2020 0.00661
## MDD2019 0.03615
## PD2019.meta5.ex23andMe 0.00295
## SCZ2018 1.00000
phenotypes <- c("AD2019", "LBD2020", "PD2019.meta5.ex23andMe", "BIP2021", "MDD2019", "SCZ2018") %>% sort()
gwas_details <-
readxl::read_xlsx(path = "/data/LDScore/GWAS/LDSC_GWAS_details.xlsx") %>%
dplyr::mutate(
Original_name = Original_name %>%
stringr::str_replace_all("_", "\\.")
) %>%
dplyr::select(Original_name,
cases = N_case,
controls = N_ctrl)
input_info <-
tibble(
phenotype =
list.files(
path = gwas_dir,
full.names = F,
recursive = T,
pattern = ".lava.gz"
) %>%
stringr::str_remove("/.*") %>%
stringr::str_replace_all("_", "\\."),
filename =
list.files(
path = gwas_dir,
full.names = T,
recursive = T,
pattern = ".lava.gz"
)
) %>%
dplyr::mutate(
phenotype =
case_when(phenotype == "MDD2019.ex23andMe" ~ "MDD2019",
TRUE ~ phenotype)
) %>%
dplyr::filter(phenotype %in% phenotypes) %>%
dplyr::inner_join(
gwas_details,
by = c("phenotype" = "Original_name")) %>%
# Continuous phenotypes should have cases = 1 and controls = 0
# Continuous phenotypes will have NA in controls
# Can use this as conditional criteria
dplyr::mutate(
cases =
case_when(is.na(controls) ~ 1,
TRUE ~ readr::parse_number(cases)),
controls =
case_when(is.na(controls) ~ 0,
TRUE ~ readr::parse_number(controls))
) %>%
dplyr::select(phenotype, cases, controls, filename)
write_delim(
input_info,
path = file.path(here::here("results", "01_input_prep"),
"input.info.txt"),
delim = "\t"
)
Input info shown below.
read_delim(
file = file.path(here::here("results", "01_input_prep"),
"input.info.txt"),
delim = "\t"
)
source(here::here("scripts", "00_trait_h2.R"))
h2 <-
read_delim(
here::here("results", "00_trait_h2", "h2.txt")
)
h2
loci %>%
dplyr::mutate(width = STOP - START) %>%
ggplot(
aes(
x = as.factor(CHR),
y = width)
) +
geom_boxplot() +
scale_y_sqrt(n.breaks = 10) +
labs(x = "Chromosome",
y = "Locus width (bp, square root scale)")
loci %>%
dplyr::count(CHR) %>%
ggplot(
aes(
x = forcats::fct_reorder(.f = as.factor(CHR),
.x = n,
.fun = median,
.desc = TRUE),
y = n)
) +
geom_col() +
labs(x = "Chromosome",
y = "Number of LD blocks")
gwas_loci <-
readRDS(here::here("results", "01_input_prep", "gwas_filtered_loci.rds")) %>%
dplyr::mutate(
GWAS =
fct_relevel(
GWAS,
c("ad", "lbd", "pd", "bip", "mdd", "scz")
),
gwas_type =
case_when(
GWAS %in% c("ad", "lbd", "pd") ~ "neurodegenerative",
GWAS %in% c("bip", "mdd", "scz") ~ "neuropsychiatric"
)
)
gwas_loci %>%
dplyr::count(LD_CHR, LD_LOC, GWAS) %>%
dplyr::inner_join(
gwas_loci %>%
dplyr::count(LD_CHR, LD_LOC, name = "n_total")
) %>%
dplyr::mutate(
LD_LOC = as.factor(LD_LOC)
) %>%
dplyr::slice_max(order_by = n_total, n = 50) %>%
ggplot(
aes(
x = n,
y = forcats::fct_reorder(.f = LD_LOC,
.x = n,
.fun = sum,
.desc = FALSE),
fill = GWAS)
) +
geom_col(colour = "black") +
scale_fill_brewer(
palette = "BrBG",
type = "div",
name = "GWAS"
) +
guides(fill = guide_legend(nrow = 2, byrow = TRUE)) +
labs(x = "Number of SNPs",
y = "LD block")
gwas_loci %>%
dplyr::count(LD_CHR, GWAS) %>%
dplyr::mutate(
LD_CHR = as.factor(LD_CHR)
) %>%
ggplot(
aes(
x = fct_rev(LD_CHR),
y = n,
fill = GWAS
)
) +
geom_col(
position = position_dodge2(preserve = "single"),
colour = "black"
) +
scale_fill_brewer(
palette = "BrBG",
type = "div"
) +
guides(fill = guide_legend(nrow = 2, byrow = TRUE)) +
labs(x = "Chromosome",
y = "Number of SNPs") +
coord_flip()
gwas_loci %>%
dplyr::distinct(LD_CHR, gwas_type, GWAS) %>%
dplyr::count(LD_CHR, gwas_type) %>%
dplyr::mutate(
LD_CHR = as.factor(LD_CHR)
) %>%
ggplot(
aes(
x = n,
y = forcats::fct_reorder(.f = LD_CHR,
.x = n,
.fun = sum,
.desc = FALSE),
fill = gwas_type)
) +
geom_col(colour = "black") +
scale_fill_manual(
values = RColorBrewer::brewer.pal(4, name = "BrBG")[c(2,3)],
name = "Phenotype category"
) +
guides(fill = guide_legend(nrow = 2, byrow = TRUE)) +
labs(
x = "Number of phenotypes with overlapping\ngenome-wide significant SNPs",
y = "Chromosome"
)
data_to_plot <-
gwas_loci %>%
dplyr::distinct(LD_CHR, gwas_type, GWAS, LD_LOC) %>%
dplyr::count(LD_CHR, gwas_type) %>%
dplyr::mutate(
LD_CHR = as.factor(LD_CHR)
)
a <-
data_to_plot %>%
ggplot(
aes(
x = n,
y = forcats::fct_reorder(.f = LD_CHR,
.x = n,
.fun = sum,
.desc = FALSE),
fill = gwas_type)
) +
geom_col(colour = "black") +
scale_fill_manual(
values = RColorBrewer::brewer.pal(4, name = "BrBG")[c(2,3)],
name = "Phenotype category"
) +
labs(
x = "Number of LD blocks with overlapping\ngenome-wide significant SNPs",
y = "Chromosome"
)
b <-
data_to_plot %>%
dplyr::inner_join(
data_to_plot %>%
dplyr::group_by(LD_CHR) %>%
dplyr::mutate(n_total = sum(n))
) %>%
dplyr::mutate(
LD_CHR = as.factor(LD_CHR),
prop = n/n_total
) %>%
ggplot(
aes(
x = prop,
y = forcats::fct_reorder(.f = LD_CHR,
.x = prop,
.fun = max,
.desc = FALSE),
fill = gwas_type)
) +
geom_col(colour = "black") +
scale_fill_manual(
values = RColorBrewer::brewer.pal(4, name = "BrBG")[c(2,3)],
name = "Phenotype category"
) +
labs(
x = "Proportion of LD blocks with overlapping\ngenome-wide significant SNPs each phenotype category",
y = "Chromosome"
)
ggpubr::ggarrange(
a,b,
labels = letters[1:2],
align = "hv",
common.legend = TRUE
)
data_to_plot <-
h2 %>%
dplyr::mutate(
total_observed_scale_h2 =
(total_observed_scale_h2 *100) %>% round(digits = 1),
se =
(total_observed_scale_h2_se * 100) %>% round(digits = 1),
label =
str_c(
total_observed_scale_h2, "%\n(", total_observed_scale_h2_se, ")"
)
)
data_to_plot %>%
ggplot(
aes(
x = 1,
y = fct_rev(phen),
fill = total_observed_scale_h2,
label = label
)
) +
ggplot2::geom_tile(colour = "black") +
ggplot2::geom_text(
size = 2.5
) +
ggplot2::coord_equal() +
ggplot2::labs(x = "", y = "", fill = "Global observed-scale\nSNP heritability (h2)") +
ggplot2::scale_fill_distiller(
palette = "GnBu",
limits = c(0, 100)
) +
ggplot2::guides(colour = F) +
theme_rhr +
ggplot2::theme(
panel.border = element_blank(),
panel.grid = element_blank(),
axis.ticks = element_blank(),
axis.text.x = element_blank()
)
Show/hide
## ─ Session info ───────────────────────────────────────────────────────────────────────────────────────────────────────
## setting value
## version R version 4.0.5 (2021-03-31)
## os Ubuntu 16.04.7 LTS
## system x86_64, linux-gnu
## ui X11
## language (EN)
## collate en_GB.UTF-8
## ctype en_GB.UTF-8
## tz Europe/London
## date 2022-09-27
##
## ─ Packages ───────────────────────────────────────────────────────────────────────────────────────────────────────────
## ! package * version date lib source
## P abind 1.4-5 2016-07-21 [?] CRAN (R 4.0.5)
## P assertthat 0.2.1 2019-03-21 [?] CRAN (R 4.0.5)
## P backports 1.3.0 2021-10-27 [?] CRAN (R 4.0.5)
## P bit 4.0.4 2020-08-04 [?] CRAN (R 4.0.5)
## P bit64 4.0.5 2020-08-30 [?] CRAN (R 4.0.5)
## P bookdown 0.22 2021-04-22 [?] CRAN (R 4.0.5)
## P broom 0.7.10 2021-10-31 [?] CRAN (R 4.0.5)
## P bslib 0.3.1 2021-10-06 [?] CRAN (R 4.0.5)
## P car 3.0-11 2021-06-27 [?] CRAN (R 4.0.5)
## P carData 3.0-4 2020-05-22 [?] CRAN (R 4.0.5)
## P cellranger 1.1.0 2016-07-27 [?] CRAN (R 4.0.5)
## P cli 3.1.0 2021-10-27 [?] CRAN (R 4.0.5)
## P colorspace 2.0-2 2021-06-24 [?] CRAN (R 4.0.5)
## P cowplot 1.1.1 2020-12-30 [?] CRAN (R 4.0.5)
## P crayon 1.4.2 2021-10-29 [?] CRAN (R 4.0.5)
## P crosstalk 1.1.1 2021-01-12 [?] CRAN (R 4.0.5)
## P curl 4.3.2 2021-06-23 [?] CRAN (R 4.0.5)
## P data.table * 1.14.2 2021-09-27 [?] CRAN (R 4.0.5)
## P DBI 1.1.1 2021-01-15 [?] CRAN (R 4.0.5)
## P dbplyr 2.1.1 2021-04-06 [?] CRAN (R 4.0.5)
## P digest 0.6.29 2021-12-01 [?] CRAN (R 4.0.5)
## P dplyr * 1.0.7 2021-06-18 [?] CRAN (R 4.0.5)
## P DT 0.19 2021-09-02 [?] CRAN (R 4.0.5)
## P ellipsis 0.3.2 2021-04-29 [?] CRAN (R 4.0.5)
## P evaluate 0.14 2019-05-28 [?] CRAN (R 4.0.5)
## P fansi 0.5.0 2021-05-25 [?] CRAN (R 4.0.5)
## P farver 2.1.0 2021-02-28 [?] CRAN (R 4.0.5)
## P fastmap 1.1.0 2021-01-25 [?] CRAN (R 4.0.5)
## P forcats * 0.5.1 2021-01-27 [?] CRAN (R 4.0.5)
## P foreign 0.8-81 2020-12-22 [?] CRAN (R 4.0.5)
## P fs 1.5.1 2021-11-30 [?] CRAN (R 4.0.5)
## P generics 0.1.1 2021-10-25 [?] CRAN (R 4.0.5)
## P ggplot2 * 3.3.5 2021-06-25 [?] CRAN (R 4.0.5)
## P ggpubr 0.4.0 2020-06-27 [?] CRAN (R 4.0.5)
## P ggsignif 0.6.3 2021-09-09 [?] CRAN (R 4.0.5)
## P glue 1.5.1 2021-11-30 [?] CRAN (R 4.0.5)
## P gridExtra 2.3 2017-09-09 [?] CRAN (R 4.0.5)
## P gtable 0.3.0 2019-03-25 [?] CRAN (R 4.0.5)
## P haven 2.4.3 2021-08-04 [?] CRAN (R 4.0.5)
## P here * 1.0.1 2020-12-13 [?] CRAN (R 4.0.5)
## P highr 0.9 2021-04-16 [?] CRAN (R 4.0.5)
## P hms 1.1.1 2021-09-26 [?] CRAN (R 4.0.5)
## P htmltools 0.5.2 2021-08-25 [?] CRAN (R 4.0.5)
## P htmlwidgets 1.5.4 2021-09-08 [?] CRAN (R 4.0.5)
## P httr 1.4.2 2020-07-20 [?] CRAN (R 4.0.5)
## P jquerylib 0.1.4 2021-04-26 [?] CRAN (R 4.0.5)
## P jsonlite 1.7.2 2020-12-09 [?] CRAN (R 4.0.5)
## P knitr 1.36 2021-09-29 [?] CRAN (R 4.0.5)
## P labeling 0.4.2 2020-10-20 [?] CRAN (R 4.0.5)
## P lifecycle 1.0.1 2021-09-24 [?] CRAN (R 4.0.5)
## P lubridate 1.8.0 2021-10-07 [?] CRAN (R 4.0.5)
## P magrittr 2.0.1 2020-11-17 [?] CRAN (R 4.0.5)
## P modelr 0.1.8 2020-05-19 [?] CRAN (R 4.0.5)
## P munsell 0.5.0 2018-06-12 [?] CRAN (R 4.0.5)
## P openxlsx 4.2.4 2021-06-16 [?] CRAN (R 4.0.5)
## P pillar 1.6.4 2021-10-18 [?] CRAN (R 4.0.5)
## P pkgconfig 2.0.3 2019-09-22 [?] CRAN (R 4.0.5)
## P purrr * 0.3.4 2020-04-17 [?] CRAN (R 4.0.5)
## P R6 2.5.1 2021-08-19 [?] CRAN (R 4.0.5)
## P RColorBrewer 1.1-2 2014-12-07 [?] CRAN (R 4.0.5)
## P Rcpp 1.0.7 2021-07-07 [?] CRAN (R 4.0.5)
## P readr * 2.0.2 2021-09-27 [?] CRAN (R 4.0.5)
## P readxl 1.3.1 2019-03-13 [?] CRAN (R 4.0.5)
## P reprex 2.0.0 2021-04-02 [?] CRAN (R 4.0.5)
## P rio 0.5.27 2021-06-21 [?] CRAN (R 4.0.5)
## P rlang 0.4.12 2021-10-18 [?] CRAN (R 4.0.5)
## P rmarkdown 2.11 2021-09-14 [?] CRAN (R 4.0.5)
## P rprojroot 2.0.2 2020-11-15 [?] CRAN (R 4.0.5)
## P rstatix 0.7.0 2021-02-13 [?] CRAN (R 4.0.5)
## P rstudioapi 0.13 2020-11-12 [?] CRAN (R 4.0.5)
## P rvest 1.0.1 2021-07-26 [?] CRAN (R 4.0.5)
## P sass 0.4.0 2021-05-12 [?] CRAN (R 4.0.5)
## P scales 1.1.1 2020-05-11 [?] CRAN (R 4.0.5)
## P sessioninfo * 1.1.1 2018-11-05 [?] CRAN (R 4.0.5)
## P stringi 1.7.6 2021-11-29 [?] CRAN (R 4.0.5)
## P stringr * 1.4.0 2019-02-10 [?] CRAN (R 4.0.5)
## P tibble * 3.1.6 2021-11-07 [?] CRAN (R 4.0.5)
## P tidyr * 1.1.4 2021-09-27 [?] CRAN (R 4.0.5)
## P tidyselect 1.1.1 2021-04-30 [?] CRAN (R 4.0.5)
## P tidyverse * 1.3.1 2021-04-15 [?] CRAN (R 4.0.5)
## P tzdb 0.2.0 2021-10-27 [?] CRAN (R 4.0.5)
## P utf8 1.2.2 2021-07-24 [?] CRAN (R 4.0.5)
## P vctrs 0.3.8 2021-04-29 [?] CRAN (R 4.0.5)
## P vroom 1.5.5 2021-09-14 [?] CRAN (R 4.0.5)
## P withr 2.4.3 2021-11-30 [?] CRAN (R 4.0.5)
## P xfun 0.27 2021-10-18 [?] CRAN (R 4.0.5)
## P xml2 1.3.2 2020-04-23 [?] CRAN (R 4.0.5)
## P yaml 2.2.1 2020-02-01 [?] CRAN (R 4.0.5)
## P zip 2.2.0 2021-05-31 [?] CRAN (R 4.0.5)
##
## [1] /home/rreynolds/misc_projects/neurodegen-psych-local-corr/renv/library/R-4.0/x86_64-pc-linux-gnu
## [2] /opt/R/4.0.5/lib/R/library
##
## P ── Loaded and on-disk path mismatch.