Aim: to format SRP058181 so that it can be used to validate (i) the deconvolution results and (ii) the differentially spliced genes.

1 File paths/files for workflow

library(tidyverse) # dplyr, stringr, tidyr and the %>% pipe
library(readxl)    # read_excel()

source(here::here("R", "file_paths.R"))

sample_info <- 
  read_excel(
    path = 
      file.path(
        path_to_raw_data,
        "sample_details/SRP058181_sample_metadata.xlsx")
  ) %>% 
  dplyr::na_if(.,"N/A") %>% 
  dplyr::select(-Proteomics, -Proteomics_SV1, -Proteomics_SV2, -Proteomics_SV3, -Microarray_study_ID) %>% 
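  dplyr::mutate(sample_id = `RNA-Seq_Samples`,
                # Recode Roman-numeral Braak stages as integers. Replacements are
                # applied in the order listed, so longer patterns must come before
                # their substrings; ranges (e.g. "II-III") take the higher stage.
                Braak_score = Braak_score %>% 
                  str_replace_all(c("IV" = "4",
                                    "II-III" = "3",
                                    "I-II" = "2",
                                    "III" = "3",
                                    "II" = "2",
                                    "I" = "1")) %>% 
                  as.integer(),
                # Split PD cases by dementia status (PD vs. PDD).
                Disease_group = ifelse(Condition == "Control", "Control",
                                       ifelse(Condition == "PD" & Dementia == "no", "PD",
                                              ifelse(Condition == "PD" & Dementia == "yes", "PDD", NA))),
                # Any remaining NA (e.g. PD with missing dementia status) is assigned to PD.
                Disease_group = replace_na(Disease_group, "PD") %>% 
                  ordered(levels = c("Control", "PD", "PDD"))) %>% 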
  dplyr::select(sample_id, Disease_group, everything(), -`RNA-Seq_Samples`)
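
The Braak recoding above depends on the order of the replacement vector, because `str_replace_all()` applies a named vector of patterns one after another. The following is a minimal sketch on made-up input strings (the `braak_raw` values are hypothetical and not taken from the SRP058181 metadata) showing what the mapping returns:

```r
library(stringr)

# Hypothetical Braak strings for illustration only
braak_raw <- c("I", "I-II", "II", "II-III", "III", "IV", NA)

# Same mapping as in the workflow: longer patterns are listed before their
# substrings, and ranges are assigned the higher stage
braak_map <- c("IV" = "4",
               "II-III" = "3",
               "I-II" = "2",
               "III" = "3",
               "II" = "2",
               "I" = "1")

braak_int <- as.integer(str_replace_all(braak_raw, braak_map))
braak_int
# 1 2 2 3 3 4 NA
```

If "I" were listed first, every "I" would be replaced before the longer patterns could match (e.g. "II" would become "11"), which is why the short patterns sit at the end of the vector.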

2 Downloading the data from recount2

  • Data are available from recount2 under the study ID SRP058181.
  • Downloaded using the recount package. As we already have mean coverage and bigwigs, these do not need to be downloaded.
library(recount)

counts <- c("rse-gene", "rse-exon", "counts-gene", "counts-exon")
junctions <- c("rse-jx", "counts-jx")

for(i in seq_along(counts)){
  
  download_study("SRP058181", 
               type = counts[i], 
               outdir = file.path(path_to_recount, "counts"), 
               download = TRUE)
  
}


for(i in seq_along(junctions)){
  
  download_study("SRP058181", 
               type = junctions[i], 
               outdir = file.path(path_to_recount, "junctions"), 
               download = TRUE)
  
}

download_study(
  "SRP058181", 
  type = "phenotype", 
  outdir = path_to_recount, 
  download = TRUE
)

3 Background

  • IMPORTANT: All samples were derived from males of European ancestry.
  • Library prep and sequencing:
    • Illumina TruSeq RNA Sample Prep Kit
    • polyA selection
    • 101-bp paired end on Illumina HiSeq 2000
    • Alignment performed as part of recount2.
      • Aligned to hg38 using Rail-RNA.
      • Gene and exon counts compiled by Rail-RNA using Gencode v25 annotation.

3.1 Sample demographics - discrete variables

  • Note: there are some individuals with both PD and dementia. For downstream analyses, these should be separated into their own group (PDD) to mirror our experimental set-up.
    Figure: Number of individuals (a) from each brain bank and (b) with dementia across disease groups.

3.2 Sample demographics - numeric variables

  • Individuals with PD and "yes" in the "Dementia" column were assigned to the PDD group in the newly created "Disease_group" column.
  • Individuals with PD and "NA" in the "Dementia" column were assigned to the PD group in the newly created "Disease_group" column.
  • Note that some of the numeric variables are not complete for all samples:
    • Only 11 samples have motor onset recorded
    • 9 samples without Braak score
    • 10 PD/PDD samples without disease duration.
      Figure: Plots of clinical, pathological and sample measures across disease groups.
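
Missing values of the kind listed above can be tallied per disease group. The sketch below uses a small made-up data frame; `toy_info` and the `Disease_duration` column name are assumptions for illustration, and only `Disease_group` and `Braak_score` are names that appear in the workflow.

```r
library(dplyr)

# Hypothetical sample table; values are invented for illustration
toy_info <- data.frame(
  Disease_group = c("Control", "Control", "PD", "PD", "PDD"),
  Braak_score = c(1L, NA, 2L, NA, 3L),
  Disease_duration = c(NA, NA, 10, NA, 15)
)

# Count NA values in each numeric variable within each disease group
missing_by_group <- toy_info %>%
  group_by(Disease_group) %>%
  summarise(across(c(Braak_score, Disease_duration), ~ sum(is.na(.x))),
            .groups = "drop")

missing_by_group
```

Applied to the real `sample_info`, the same pattern would show whether the missing values are concentrated in one group, which matters when judging whether groups are matched.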

3.2.1 Kruskal-Wallis

3.2.2 Post-hoc pairwise Wilcoxon tests with FDR correction

3.2.3 Summary

  • Braak score (i.e. AD pathology), age at death and RIN were significantly different between groups.
  • Post-hoc pairwise Wilcoxon tests (FDR-corrected for multiple comparisons) show that the significant differences in Braak score, age at death and RIN arise only between each disease group and the control group, and not between PD with and without dementia.
  • Groups are matched for PMI.
  • Cannot say whether groups are matched for motor onset and disease duration, given the missing values.
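
The two-step testing named in 3.2.1 and 3.2.2 can be sketched with base R as follows. This is an illustration on simulated data, not the code used to produce the results summarised above: the `toy` data frame and its values are invented, and the use of `kruskal.test()` and `pairwise.wilcox.test()` is an assumption about how such tests might be run.

```r
set.seed(1)

# Simulated RIN values for three groups of 10 samples (illustration only)
toy <- data.frame(
  Disease_group = factor(rep(c("Control", "PD", "PDD"), each = 10),
                         levels = c("Control", "PD", "PDD")),
  RIN = c(rnorm(10, mean = 8), rnorm(10, mean = 7), rnorm(10, mean = 7))
)

# Step 1: omnibus Kruskal-Wallis test across the three groups
kw <- kruskal.test(RIN ~ Disease_group, data = toy)

# Step 2: post-hoc pairwise Wilcoxon rank-sum tests, FDR-adjusted
posthoc <- pairwise.wilcox.test(toy$RIN, toy$Disease_group,
                                p.adjust.method = "fdr")

kw$p.value
posthoc$p.value # matrix of adjusted p-values, one cell per group pair
```

In the workflow this would be repeated for each numeric variable (Braak score, age at death, RIN, PMI and so on), with the post-hoc step being of interest only where the omnibus test is significant.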

4 Deconvolution

4.1 Summary

  • Looking only at controls, the two datasets are indistinguishable, suggesting that cell-type proportions between the two are similar.
  • Some themes recur across the two datasets, e.g. astrocyte and oligodendrocyte proportions appear quite similar, and both datasets show a similar rise in microglial and vascular proportions in PD compared to controls.

5 Post-quantification QC

For details, see: SRP058181_post_quant_QC.html

6 Leafcutter

  • To determine the appropriate covariate correction, PCA was run on expression data with different correction methods applied. For details, see: SRP058181_post_correction_PCA.html
  • Differential splicing performed with two different covariate correction strategies:

7 Replication analyses

8 Session info

## R version 3.6.1 (2019-07-05)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 16.04.6 LTS
## 
## Matrix products: default
## BLAS:   /usr/lib/libblas/libblas.so.3.6.0
## LAPACK: /usr/lib/lapack/liblapack.so.3.6.0
## 
## locale:
##  [1] LC_CTYPE=en_GB.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_GB.UTF-8        LC_COLLATE=en_GB.UTF-8    
##  [5] LC_MONETARY=en_GB.UTF-8    LC_MESSAGES=en_GB.UTF-8   
##  [7] LC_PAPER=en_GB.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
## [1] parallel  stats4    stats     graphics  grDevices utils     datasets 
## [8] methods   base     
## 
## other attached packages:
##  [1] UpSetR_1.4.0                rtracklayer_1.46.0         
##  [3] GenomicRanges_1.38.0        GenomeInfoDb_1.22.1        
##  [5] IRanges_2.20.2              S4Vectors_0.24.4           
##  [7] BiocGenerics_0.32.0         RNAseqProcessing_0.0.0.9000
##  [9] readxl_1.3.1                forcats_0.5.1              
## [11] stringr_1.4.0               dplyr_1.0.2                
## [13] purrr_0.3.4                 readr_1.4.0                
## [15] tidyr_1.1.1                 tibble_3.0.3               
## [17] tidyverse_1.3.0             ggpubr_0.4.0               
## [19] ggplot2_3.3.2               ggsci_2.9                  
## [21] GeneOverlap_1.22.0          gProfileR_0.7.0            
## [23] data.table_1.13.0           corrplot_0.84              
## [25] broom_0.7.0                
## 
## loaded via a namespace (and not attached):
##  [1] colorspace_2.0-0            ggsignif_0.6.0             
##  [3] ellipsis_0.3.1              rio_0.5.16                 
##  [5] rprojroot_2.0.2             XVector_0.26.0             
##  [7] fs_1.5.0                    rstudioapi_0.13            
##  [9] farver_2.0.3                DT_0.15                    
## [11] lubridate_1.7.9             xml2_1.3.2                 
## [13] knitr_1.29                  jsonlite_1.7.1             
## [15] Rsamtools_2.2.3             dbplyr_1.4.4               
## [17] png_0.1-7                   compiler_3.6.1             
## [19] httr_1.4.2                  backports_1.1.8            
## [21] assertthat_0.2.1            Matrix_1.2-17              
## [23] cli_2.2.0.9000              htmltools_0.5.1.1          
## [25] tools_3.6.1                 gtable_0.3.0               
## [27] glue_1.4.2                  GenomeInfoDbData_1.2.2     
## [29] Rcpp_1.0.5                  carData_3.0-4              
## [31] Biobase_2.46.0              cellranger_1.1.0           
## [33] vctrs_0.3.2                 Biostrings_2.54.0          
## [35] gdata_2.18.0                crosstalk_1.1.0.1          
## [37] xfun_0.16                   openxlsx_4.2.3             
## [39] rvest_0.3.6                 lifecycle_0.2.0            
## [41] gtools_3.8.2                rstatix_0.6.0              
## [43] XML_3.99-0.3                zlibbioc_1.32.0            
## [45] scales_1.1.1                hms_1.0.0                  
## [47] SummarizedExperiment_1.16.1 RColorBrewer_1.1-2         
## [49] yaml_2.2.1                  curl_4.3                   
## [51] gridExtra_2.3               stringi_1.5.3              
## [53] highr_0.8                   caTools_1.18.0             
## [55] zip_2.1.0                   BiocParallel_1.20.1        
## [57] rlang_0.4.7                 pkgconfig_2.0.3            
## [59] bitops_1.0-6                matrixStats_0.56.0         
## [61] evaluate_0.14               lattice_0.20-38            
## [63] labeling_0.4.2              htmlwidgets_1.5.3          
## [65] GenomicAlignments_1.22.1    cowplot_1.0.0              
## [67] tidyselect_1.1.0            here_1.0.0                 
## [69] plyr_1.8.6                  magrittr_2.0.1             
## [71] bookdown_0.21               R6_2.5.0                   
## [73] gplots_3.0.4                generics_0.0.2             
## [75] DelayedArray_0.12.3         DBI_1.1.1                  
## [77] pillar_1.4.6                haven_2.3.1                
## [79] foreign_0.8-72              withr_2.2.0                
## [81] abind_1.4-5                 RCurl_1.98-1.2             
## [83] modelr_0.1.8                crayon_1.4.1               
## [85] car_3.0-9                   KernSmooth_2.23-15         
## [87] rmarkdown_2.5               grid_3.6.1                 
## [89] blob_1.2.1                  reprex_1.0.0               
## [91] digest_0.6.27               munsell_0.5.0