Aim: to format SRP058181, such that it can be used to validate (i) deconvolution and (ii) differentially spliced genes

1 File paths/files for workflow

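library(tidyverse) # dplyr, stringr, tidyr
library(readxl)

# Project-specific paths (path_to_raw_data and path_to_recount are used below)
source(here::here("R", "file_paths.R"))

sample_info <- 
  read_excel(
    path = 
      file.path(
        path_to_raw_data,
        "sample_details/SRP058181_sample_metadata.xlsx")
  ) %>% 
  # Recode "N/A" strings as NA across the whole table (run here with dplyr 1.0.2; see session info)
  dplyr::na_if(.,"N/A") %>% 
  dplyr::select(-Proteomics, -Proteomics_SV1, -Proteomics_SV2, -Proteomics_SV3, -Microarray_study_ID) %>% 
  dplyr::mutate(sample_id = `RNA-Seq_Samples`,
                # Convert Roman-numeral Braak stages to integers; ranges take the upper stage.
                # Replacements are applied in order, so longer patterns must precede their substrings.
                Braak_score = Braak_score %>% 
                  str_replace_all(c("IV" = "4",
                                    "II-III" = "3",
                                    "I-II" = "2",
                                    "III" = "3",
                                    "II" = "2",
                                    "I" = "1")) %>% 
                  as.integer(),
                # Split PD by dementia status; anything left unassigned (e.g. PD with missing Dementia) becomes PD
                Disease_group = dplyr::case_when(
                  Condition == "Control" ~ "Control",
                  Condition == "PD" & Dementia == "no" ~ "PD",
                  Condition == "PD" & Dementia == "yes" ~ "PDD",
                  TRUE ~ NA_character_),
                Disease_group = replace_na(Disease_group, "PD") %>% 
                  ordered(levels = c("Control", "PD", "PDD"))) %>% 
  dplyr::select(sample_id, Disease_group, everything(), -`RNA-Seq_Samples`)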
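
The Braak recoding above depends on the order of the replacement patterns. As a minimal sketch of an alternative (not the approach used in this report), an exact-match lookup vector removes that ordering dependence. The input values below are invented for illustration, and the object names `braak_raw`, `braak_lookup` and `braak_int` are hypothetical.

```r
# Hypothetical example values; the real column is Braak_score in sample_info.
braak_raw <- c("I", "I-II", "II", "II-III", "III", "IV", NA)

# Exact-match lookup: ranges are collapsed to their upper stage,
# mirroring the mapping used in the chunk above.
braak_lookup <- c(
  "I" = 1L, "I-II" = 2L, "II" = 2L,
  "II-III" = 3L, "III" = 3L, "IV" = 4L
)

# Values absent from the lookup (including NA) are returned as NA.
braak_int <- unname(braak_lookup[braak_raw])
braak_int
```

Because each value is matched as a whole string, an unexpected label yields NA rather than a partially replaced string, which makes unhandled categories easier to spot.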

2 Downloading the data from recount2

Data are available from recount2 under the ID SRP058181.
They were downloaded using the recount package. As we already have mean coverage and bigwigs, these do not need to be downloaded.

library(recount)

counts <- c("rse-gene", "rse-exon", "counts-gene", "counts-exon")
junctions <- c("rse-jx", "counts-jx")

# Gene- and exon-level files
for (type in counts) {
  download_study("SRP058181", 
                 type = type, 
                 outdir = file.path(path_to_recount, "counts"), 
                 download = TRUE)
}

# Junction-level files
for (type in junctions) {
  download_study("SRP058181", 
                 type = type, 
                 outdir = file.path(path_to_recount, "junctions"), 
                 download = TRUE)
}

download_study(
  "SRP058181", 
  type = "phenotype", 
  outdir = path_to_recount, 
  download = TRUE
)

3 Background

Reference: https://bmcmedgenomics.biomedcentral.com/articles/10.1186/s12920-016-0164-y
RNA-seq was performed on prefrontal cortex (BA9) from 44 neurologically normal control individuals and 29 individuals with diagnosed PD.
- Note that this differs from our PD-sequencing samples, which were sampled from anterior cingulate (BA24, BA32 and BA33).

IMPORTANT: All samples were derived from males of European ancestry.
Library prep and sequencing:
- Illumina TruSeq RNA Sample Prep Kit
- polyA selection
- 101-bp paired end on Illumina HiSeq 2000
- Alignment performed as part of recount2.
  - Aligned to hg38 using Rail-RNA.
  - Gene and exon counts compiled by Rail-RNA using Gencode v25 annotation.

3.1 Sample demographics - discrete variables

Note: there are some individuals with both PD and dementia. For downstream analyses, these should be separated into their own group (PDD) to mirror our experimental set-up.

Figure: Number of individuals (a) from each brain bank and (b) with dementia across disease groups.

3.2 Sample demographics - numeric variables

Individuals with PD and a "yes" in the "Dementia" column of the sample info were assigned to the PDD group in the newly created "Disease_group" column.
Individuals with PD and "NA" in the "Dementia" column were assigned to the PD group in the same column.
Note that some of the numeric variables are not complete for all samples:
- Only 11 samples with motor onset
- 9 samples without Braak score
- 10 PD/PDD samples without disease duration.
  
Figure: Plots of clinical, pathological and sample measures across disease groups.
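
Counts of missing values such as those listed above can be tabulated directly from the metadata. The following is a minimal sketch on invented toy data: `Braak_score` and `Disease_group` follow the column names used in this report, while `Disease_duration` and all values are assumptions made for illustration.

```r
# Toy metadata; values are invented and Disease_duration is a hypothetical column name.
toy_info <- data.frame(
  Disease_group = c("Control", "Control", "PD", "PD", "PDD", "PDD"),
  Braak_score = c(1L, NA, 2L, 3L, NA, 4L),
  Disease_duration = c(NA, NA, 10, NA, 15, 8),
  stringsAsFactors = FALSE
)

numeric_cols <- c("Braak_score", "Disease_duration")

# Total number of missing values per variable.
missing_total <- colSums(is.na(toy_info[numeric_cols]))

# Number of missing values per variable within each disease group.
missing_by_group <- aggregate(
  is.na(toy_info[numeric_cols]),
  by = list(Disease_group = toy_info$Disease_group),
  FUN = sum
)

missing_total
missing_by_group
```

Splitting by group matters here because a variable such as disease duration is expected to be missing for controls, so only the PD/PDD rows are informative.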

3.2.1 Kruskal-Wallis

3.2.2 Post-hoc pairwise wilcox tests with FDR correction

3.2.3 Summary

Braak score (i.e. AD pathology), age at death and RIN were found to differ significantly between groups.
Post-hoc pairwise Wilcoxon tests (FDR-corrected for multiple comparisons) show that the significant differences in Braak score, age at death and RIN are observed only between each disease group and the control group, and not between PD with and without dementia.
Groups are matched for PMI.
Cannot really say whether groups are matched for motor onset and disease duration, given the missing values.
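
The two-step testing procedure summarised above (omnibus Kruskal-Wallis, then FDR-corrected pairwise Wilcoxon tests) can be sketched with base R as follows. This is an illustration on simulated values, not the report's original chunk; the data frame `toy` and its RIN values are invented.

```r
set.seed(1)

# Simulated RIN values for three groups; purely illustrative.
toy <- data.frame(
  Disease_group = factor(
    rep(c("Control", "PD", "PDD"), each = 10),
    levels = c("Control", "PD", "PDD")
  ),
  RIN = c(rnorm(10, mean = 8, sd = 0.3),
          rnorm(10, mean = 7, sd = 0.3),
          rnorm(10, mean = 7, sd = 0.3))
)

# Omnibus test: does RIN differ between any of the groups?
kw <- kruskal.test(RIN ~ Disease_group, data = toy)

# Post-hoc pairwise Wilcoxon rank-sum tests with FDR-adjusted p-values.
pw <- pairwise.wilcox.test(toy$RIN, toy$Disease_group, p.adjust.method = "fdr")

kw$p.value
pw$p.value
```

`pw$p.value` is a lower-triangular matrix of adjusted p-values, with one cell per pair of groups, which is the form needed to check whether a difference lies only between controls and each disease group.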

4 Deconvolution

Deconvolution of SRP058181, using our own snRNA-seq. See report: SRP058181_deconvolution.html.

4.1 Summary

Looking only at controls, the two datasets are indistinguishable, suggesting that cell-type proportions between the two are similar.
We also pick up on some similar themes, e.g. astrocyte and oligodendrocyte proportions appear quite similar between the two datasets, and both datasets show a similar rise in microglial and vascular proportions in PD compared to controls.

5 Post-quantification QC

For details, see: SRP058181_post_quant_QC.html

6 Leafcutter

To determine appropriate covariate correction, PCA was run on expression data with different correction methods applied. For details, see: SRP058181_post_correction_PCA.html
Differential splicing was performed with two different covariate correction strategies:
- AoD & RIN
- AoD, RIN, PMI and cell-type proportions

For full details, see: SRP058181_leafcutter.html
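
As a rough sketch of how the two covariate strategies could be passed to LeafCutter, the differential splicing script takes a groups file whose first two columns are sample and group, with any further columns treated as confounders. The code below writes one such file per strategy from invented toy data. Sample IDs, values, file names and the single `astrocyte_prop` column (standing in for the cell-type proportions) are all assumptions, and only two groups are included because LeafCutter compares groups pairwise.

```r
# Toy covariates; sample IDs, values and column names are invented.
toy_cov <- data.frame(
  sample_id = c("S1", "S2", "S3", "S4"),
  Disease_group = c("Control", "Control", "PD", "PD"),
  AoD = c(70, 82, 75, 79),
  RIN = c(7.8, 8.1, 6.9, 7.2),
  PMI = c(3, 12, 5, 20),
  astrocyte_prop = c(0.21, 0.19, 0.25, 0.23)
)

# Confounder columns for each correction strategy.
strategies <- list(
  aod_rin = c("AoD", "RIN"),
  aod_rin_pmi_celltype = c("AoD", "RIN", "PMI", "astrocyte_prop")
)

out_dir <- tempdir()

# Write one headerless, tab-separated groups file per strategy.
group_files <- vapply(names(strategies), function(nm) {
  out <- file.path(out_dir, paste0("groups_", nm, ".txt"))
  write.table(
    toy_cov[, c("sample_id", "Disease_group", strategies[[nm]])],
    file = out, sep = "\t", quote = FALSE,
    row.names = FALSE, col.names = FALSE
  )
  out
}, character(1))

group_files
```

Keeping the strategies in a named list means that adding a further correction scheme only requires one more list entry.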

7 Replication analyses

For details, refer to: cluster_validation_btwn_datasets.html
- Layers of replication:
  1. At level of junctions
  2. At level of clusters
  3. At level of differential splicing

8 Session info

## R version 3.6.1 (2019-07-05)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 16.04.6 LTS
## 
## Matrix products: default
## BLAS:   /usr/lib/libblas/libblas.so.3.6.0
## LAPACK: /usr/lib/lapack/liblapack.so.3.6.0
## 
## locale:
##  [1] LC_CTYPE=en_GB.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_GB.UTF-8        LC_COLLATE=en_GB.UTF-8    
##  [5] LC_MONETARY=en_GB.UTF-8    LC_MESSAGES=en_GB.UTF-8   
##  [7] LC_PAPER=en_GB.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
## [1] parallel  stats4    stats     graphics  grDevices utils     datasets 
## [8] methods   base     
## 
## other attached packages:
##  [1] UpSetR_1.4.0                rtracklayer_1.46.0         
##  [3] GenomicRanges_1.38.0        GenomeInfoDb_1.22.1        
##  [5] IRanges_2.20.2              S4Vectors_0.24.4           
##  [7] BiocGenerics_0.32.0         RNAseqProcessing_0.0.0.9000
##  [9] readxl_1.3.1                forcats_0.5.1              
## [11] stringr_1.4.0               dplyr_1.0.2                
## [13] purrr_0.3.4                 readr_1.4.0                
## [15] tidyr_1.1.1                 tibble_3.0.3               
## [17] tidyverse_1.3.0             ggpubr_0.4.0               
## [19] ggplot2_3.3.2               ggsci_2.9                  
## [21] GeneOverlap_1.22.0          gProfileR_0.7.0            
## [23] data.table_1.13.0           corrplot_0.84              
## [25] broom_0.7.0                
## 
## loaded via a namespace (and not attached):
##  [1] colorspace_2.0-0            ggsignif_0.6.0             
##  [3] ellipsis_0.3.1              rio_0.5.16                 
##  [5] rprojroot_2.0.2             XVector_0.26.0             
##  [7] fs_1.5.0                    rstudioapi_0.13            
##  [9] farver_2.0.3                DT_0.15                    
## [11] lubridate_1.7.9             xml2_1.3.2                 
## [13] knitr_1.29                  jsonlite_1.7.1             
## [15] Rsamtools_2.2.3             dbplyr_1.4.4               
## [17] png_0.1-7                   compiler_3.6.1             
## [19] httr_1.4.2                  backports_1.1.8            
## [21] assertthat_0.2.1            Matrix_1.2-17              
## [23] cli_2.2.0.9000              htmltools_0.5.1.1          
## [25] tools_3.6.1                 gtable_0.3.0               
## [27] glue_1.4.2                  GenomeInfoDbData_1.2.2     
## [29] Rcpp_1.0.5                  carData_3.0-4              
## [31] Biobase_2.46.0              cellranger_1.1.0           
## [33] vctrs_0.3.2                 Biostrings_2.54.0          
## [35] gdata_2.18.0                crosstalk_1.1.0.1          
## [37] xfun_0.16                   openxlsx_4.2.3             
## [39] rvest_0.3.6                 lifecycle_0.2.0            
## [41] gtools_3.8.2                rstatix_0.6.0              
## [43] XML_3.99-0.3                zlibbioc_1.32.0            
## [45] scales_1.1.1                hms_1.0.0                  
## [47] SummarizedExperiment_1.16.1 RColorBrewer_1.1-2         
## [49] yaml_2.2.1                  curl_4.3                   
## [51] gridExtra_2.3               stringi_1.5.3              
## [53] highr_0.8                   caTools_1.18.0             
## [55] zip_2.1.0                   BiocParallel_1.20.1        
## [57] rlang_0.4.7                 pkgconfig_2.0.3            
## [59] bitops_1.0-6                matrixStats_0.56.0         
## [61] evaluate_0.14               lattice_0.20-38            
## [63] labeling_0.4.2              htmlwidgets_1.5.3          
## [65] GenomicAlignments_1.22.1    cowplot_1.0.0              
## [67] tidyselect_1.1.0            here_1.0.0                 
## [69] plyr_1.8.6                  magrittr_2.0.1             
## [71] bookdown_0.21               R6_2.5.0                   
## [73] gplots_3.0.4                generics_0.0.2             
## [75] DelayedArray_0.12.3         DBI_1.1.1                  
## [77] pillar_1.4.6                haven_2.3.1                
## [79] foreign_0.8-72              withr_2.2.0                
## [81] abind_1.4-5                 RCurl_1.98-1.2             
## [83] modelr_0.1.8                crayon_1.4.1               
## [85] car_3.0-9                   KernSmooth_2.23-15         
## [87] rmarkdown_2.5               grid_3.6.1                 
## [89] blob_1.2.1                  reprex_1.0.0               
## [91] digest_0.6.27               munsell_0.5.0

SRP058181

Regina H. Reynolds
