Aims: Using SRP058181 (i) determine effect of covariate correction on gene expression and (ii) determine whether a PC axis can be correlated with changes in the proportions of more than one cell-type.

1 File paths for workflow

source(here::here("R", "file_paths.R"))

2 Data pre-correction

2.1 Loading and filtering data

  • Recount2 provide a SummarisedExperiment. DESeq does have a function DESeqDataSet to convert such SummarisedExperiment objects. However, metadata in the summarised experiment does not include columns for sample characteristics including PMI, RIN, etc.
  • Therefore, chose to re-generate the DESeqDataSet from the count matrix in the SummarisedExperiment. This was done in: SRP058181_post_quant_QC.html.
  • Given that PCA plots without filtered data already available in SRP058181_post_quant_QC.html, will instead use filtered (but not batch-corrected) data initially.
  • Samples removed
dds <- 
  readRDS(
    file.path(
      path_to_results,
      "SRP058181/gene_level_quant/SRP058181_DESeqDataSet.Rds"
    )
  )
dds <- estimateSizeFactors(dds)

# Design formula
design(dds)
## ~Disease_Group
# Remove SRR2015746 (C0061), due to uncertainty re. sex
dds <- dds[, dds$recount_id != "SRR2015746"]
# dds <- dds[, dds$Source != "HBSFRC"]

# Filter genes such that only genes with a minimum count of 1 across 100% of samples in each group are retained
filter_count <- DEGreport::degFilter(counts = counts(dds),
                                     metadata = as.data.frame(colData(dds)),
                                     group = "Disease_Group",
                                     min = 1, # All samples in group must have more than expr > 0
                                     minreads = 0)
cat("Genes in final count matrix: ", nrow(filter_count))
## Genes in final count matrix:  27357
filter_dds <- dds[rownames(filter_count),] 
  • Following filtering, the number of indivuals per source and disease group are:
colData(filter_dds) %>% 
  as_tibble() %>% 
  dplyr::group_by(Source, Disease_Group) %>% 
  dplyr::summarise(n = n())

2.2 Correlations between covariates

  • See positive correlations between:
    • Disease group and dementia status (as would be expected)
    • Disease group and Braak score
    • Astrocyte, microglial, OPC and vascular proportions.
    • Excitatory and inhibitory neuron proportions.
    • Several measures of read count i.e. sizeFactor, read_count_as_reported_by_sra, reads_downloaded, mapped_read_count, auc
    • Source and PMI -- also expected if different brain banks have different criteria for PMI
  • See negative correlations between:
    • Excitatory/inhibitory neuron proportions and age at death.
    • Excitatory/inhibitory neuron proportions and oligo proportions.
DEGreport::degCorCov(
  colData(filter_dds)[, 
                      !colnames(colData(filter_dds)) %in% 
                        c("Lewy_Body_pathology", "ApoE", "Cause_of_death", 
                          "proportion_of_reads_reported_by_sra_downloaded", "paired_end",
                          "sra_misreported_paired_end", "Motor_onset", "Disease_duration")]
)

2.3 Heatmap of sample-to-sample distances

  • Transformed count matrix can be transposed (i.e. samples as rows and genes as columns), and the distance between samples can be computed and plotted in a heatmap. This can then provide an overview of the similarities/dissimilarities between samples.
  • As count data is not normally distributed,vst function from DESeq2 used to apply a variance-stabilising transformation. This transformation is roughly similar to putting the data on the log2 scale, while also dealing with the sampling variability of low counts. See Regularized logarithm transformation from DESeq2 paper for more thorough explanation. N.B. The vst function corrects for library size.
  • vst transformation has the option of being blinded to experimental design when estimating dispersion. By setting blind = FALSE, vst will use the design formula to calculate within-group variability, and if blind = TRUE it will calculate the across-all-sample variability. It does not use the design to remove variation in the data. Thus, it will not remove variation that can be associated with batch or other covariates.
vsd <- vst(object = filter_dds, 
           blind = FALSE # not blind to experimental design   
           )

sampleDists <- vsd %>% 
  assay() %>% 
  t() %>% # Transpose transformed count matrix, such that samples now rows.
  dist(method = "euclidean") # Compute distances between rows of the matrix i.e. sample-to-sample distance
  
sampleDistMatrix <- sampleDists %>% 
  as.matrix() # Convert to matrix

rownames(sampleDistMatrix) <- str_c(vsd$Disease_Group, ", ", vsd$Source)
colnames(sampleDistMatrix) <- NULL
pheatmap <- pheatmap(sampleDistMatrix,
                     clustering_distance_rows=sampleDists,
                     clustering_distance_cols=sampleDists,
                     col= viridis(n = 20), 
                     main = "Heatmap: uncorrected data")

  • Based on heatmap of sample-to-sample distances, some samples do appear to be grouping by disease, although it is not entirely clear cut.
  • Re-assuringly, there is mixing of the sources i.e. samples not clustering by brain bank.

2.4 Scree plot

  • A scree plot shows the fraction of total variance explained by each PC.
pca_vsd <- prcomp(t(assay(vsd)))

tibble(PC = c(1:72), 
       sdev = pca_vsd$sdev) %>% 
  dplyr::mutate(d = sdev^2, 
                pev = d/sum(d), 
                cev = cumsum(d)/sum(d))
ggarrange(pcascree(pca_vsd, type = "pev") + theme_rhr,
          pcascree(pca_vsd, type = "cev") + theme_rhr, 
          nrow = 2,
          labels = c("a", "b"))
Plot of (a) the proportion of variance explained by each PC and (b) the cumulative proportion of variance explained with the addition of each PC.

Figure 2.1: Plot of (a) the proportion of variance explained by each PC and (b) the cumulative proportion of variance explained with the addition of each PC.

2.5 Correlation of PC axes with sample covariates

  • Easiest way to determine associations between PC and co-variates is to perform a correlation analysis between PCs and co-variates.
  • Figure and table below show that there is a significant association between PC1 and RIN, PC1 and Age at death (AoD), as well as proportions of several cell types.
  • Table shows FDR-corrected p-values of association.
  • Based on these analyses, will be important to correct any analyses for:
    • RIN
    • AoD
    • Several cell-type proportions.
  • Note: Sex not included in these analyses as all individuals are male.
# Source function correlate PCs
source(here::here("R", "correlatePCs.R"))

PC_corr <- correlatePCs(pcaobj = pca_vsd, 
                        coldata = colData(vsd) %>% 
                          as.data.frame() %>% 
                          dplyr::select(Disease_Group, Age_at_death, PMI, RIN, Braak_score,mapped_read_count, contains("_proportion")), 
                        pcs = 1:72)

# Some proportions so highly correlated that p-value = 0
# Replace these 0s with p-value lower than minimum of 1.6E-9
PC_corr_FDR_uncorrected <- PC_corr$pvalue %>% 
  as_tibble() %>% 
  dplyr::mutate(PC = str_c("PC_", c(1:72))) %>% 
  tidyr::gather(key = covariate, value = p, -PC) %>% 
  dplyr::mutate(p = case_when(p == 0 ~ 1 * 10^-12,
                              TRUE ~ p),
                FDR = p.adjust(p, method = "fdr"))
PC_corr_FDR_mat <- PC_corr_FDR_uncorrected %>% 
  dplyr::select(-p) %>% 
  tidyr::spread(key = covariate, value = FDR) %>% 
  dplyr::mutate(PC_number = str_replace(PC, "PC_", "") %>% 
                  as.numeric()) %>% 
  dplyr::arrange(PC_number) %>% 
  dplyr::select(-PC_number, -PC) %>% 
  as.matrix()

PC_corr_FDR_mat <- PC_corr_FDR_mat[, match(colnames(PC_corr$statistic), colnames(PC_corr_FDR_mat))]
rownames(PC_corr_FDR_mat) <- PC_corr_FDR_uncorrected %>% .[["PC"]] %>% unique()

heatmap.2(x = -log10(PC_corr_FDR_mat[1:40,]) %>% t(), 
          Colv = NA, 
          Rowv = NA,
          dendrogram = "none",
          col = viridis(n = 50),
          trace = "none",
          density.info = "none", 
          key = TRUE, 
          key.xlab = "-log10(FDR-corrected p-value)", 
          cexRow = 0.65, cexCol = 0.65)
Heatmap of significance of (cor)relations between PCA scores (only first 40) and sample covariates. Significance computed using Kruskal-Wallis for categorical variables and the cor.test based on Spearman's correlation for continuous variables. P-values were FDR-corrected.

Figure 2.2: Heatmap of significance of (cor)relations between PCA scores (only first 40) and sample covariates. Significance computed using Kruskal-Wallis for categorical variables and the cor.test based on Spearman's correlation for continuous variables. P-values were FDR-corrected.

  • Worth noting that the "strongest effect" of disease group was with PC1. Cannot speak of correlations with disease group as testing using Kruskal-Wallis.
  • This is something we may want to bear in mind if we correct using PC axes -- could correct away effect of disease group.

3 Data following correction with covariates as in original study: AoD, PMI and RIN

Start by correcting for same covariates as used in original study without cell-type proportions

3.1 Correcting the data

  • While PMI was not found significantly correlated with any PC axes (unlike AoD and RIN), will correct for this as this was also performed in the original study.
  • In the original study, RIN was binned into two categories, RIN <= 7 and RIN > 7. In this case, we will just use numbers as is.
  • As setting blind = FALSE in vst does not remove variation due to confounding covariates, this will have to be done using limma::removeBatchEffect, which removes any shifts in the log2-scale expression data that can be explained by confounders.
  • This batch-corrected data can then be plotted in a similar manner to above.
  • Note: scale() function is used to ensure that overall mean expression of each gene (i.e. as calculated with rowMeans()) remains unaffected by limma::removeBatchEffect. In general, scale() is also recommended as it ensures variables that are measured in different scales (e.g. age of death vs RIN) are comparable.
vsd_covar <- vst(filter_dds,
                     blind = FALSE)

# Current design
design(filter_dds)
## ~Disease_Group
# Change design
design(filter_dds) <- ~ Age_at_death + PMI + RIN + Disease_Group
design(filter_dds)
## ~Age_at_death + PMI + RIN + Disease_Group
# Batch correcting and filtering vsd data 
design <- model.matrix(design(filter_dds), data = colData(filter_dds))

design
##            (Intercept) Age_at_death PMI RIN Disease_GroupPD Disease_GroupPDD
## SRR2015713           1           73   2 7.7               0                0
## SRR2015714           1           91   2 7.9               0                0
## SRR2015715           1           82   2 8.6               0                0
## SRR2015716           1           97   2 9.1               0                0
## SRR2015717           1           86   5 8.6               0                0
## SRR2015718           1           91   2 8.7               0                0
## SRR2015719           1           81   3 6.0               0                0
## SRR2015720           1           79   2 8.4               0                0
## SRR2015721           1           63   2 6.5               0                0
## SRR2015722           1           66  19 7.1               0                0
## SRR2015723           1           69  15 7.8               0                0
## SRR2015724           1           79  21 8.0               0                0
## SRR2015725           1           61  10 8.2               0                0
## SRR2015726           1           58  20 8.4               0                0
## SRR2015727           1           70  21 8.2               0                0
## SRR2015728           1           66  17 8.5               0                0
## SRR2015729           1           73  19 7.5               0                0
## SRR2015730           1           60  24 7.9               0                0
## SRR2015731           1           76  26 7.3               0                0
## SRR2015732           1           61  17 7.8               0                0
## SRR2015733           1           62  18 6.6               0                0
## SRR2015734           1           69  26 8.7               0                0
## SRR2015735           1           61  25 8.1               0                0
## SRR2015736           1           88  11 7.1               0                0
## SRR2015737           1           93  13 6.4               0                0
## SRR2015738           1           53  24 7.3               0                0
## SRR2015739           1           57  24 8.3               0                0
## SRR2015740           1           46  21 7.6               0                0
## SRR2015741           1           57  20 7.7               0                0
## SRR2015742           1           80  15 7.3               0                0
## SRR2015743           1           74   2 8.5               0                0
## SRR2015744           1           69   2 8.4               0                0
## SRR2015745           1           76   2 7.5               0                0
## SRR2015747           1           87   2 8.7               0                0
## SRR2015748           1           86   2 8.7               0                0
## SRR2015749           1           54  24 8.3               0                0
## SRR2015750           1           68  19 6.3               0                0
## SRR2015751           1           52  23 7.4               0                0
## SRR2015752           1           46  30 8.2               0                0
## SRR2015753           1           55  26 7.6               0                0
## SRR2015754           1           57  18 7.8               0                0
## SRR2015755           1           66  32 8.4               0                0
## SRR2015756           1           64  19 8.7               0                0
## SRR2015757           1           74   3 7.2               0                1
## SRR2015758           1           83   2 7.1               0                1
## SRR2015759           1           80   2 8.5               1                0
## SRR2015760           1           83   2 7.3               0                1
## SRR2015761           1           80   2 8.4               1                0
## SRR2015762           1           84   2 6.7               0                1
## SRR2015763           1           88   2 8.1               0                1
## SRR2015764           1           81   2 5.8               1                0
## SRR2015765           1           77   4 6.1               1                0
## SRR2015766           1           64   4 8.1               1                0
## SRR2015767           1           85   3 6.1               0                1
## SRR2015768           1           94   9 6.4               1                0
## SRR2015769           1           80  27 6.5               0                1
## SRR2015770           1           85  16 6.9               1                0
## SRR2015771           1           75   7 8.0               1                0
## SRR2015772           1           74  15 7.1               1                0
## SRR2015773           1           89  31 6.1               1                0
## SRR2015774           1           66  11 7.7               1                0
## SRR2015775           1           65   8 6.8               1                0
## SRR2015776           1           85  19 6.1               1                0
## SRR2015777           1           64   4 7.8               0                1
## SRR2015778           1           64   1 6.1               1                0
## SRR2015779           1           75  30 6.9               0                1
## SRR2015780           1           68  18 7.3               0                1
## SRR2015781           1           95  16 7.0               0                1
## SRR2015782           1           74  23 7.5               1                0
## SRR2015783           1           68  23 7.9               1                0
## SRR2015784           1           79  12 6.9               1                0
## SRR2015785           1           70  25 6.7               1                0
## attr(,"assign")
## [1] 0 1 2 3 4 4
## attr(,"contrasts")
## attr(,"contrasts")$Disease_Group
## [1] "contr.treatment"
design_treatment <- design[,c("(Intercept)", "Disease_GroupPD", "Disease_GroupPDD")]
design_batch <- design[,c("Age_at_death", "PMI", "RIN")] %>% 
  # standardise covariates
  scale()

assay(vsd_covar) <- limma::removeBatchEffect(assay(vsd_covar), design = design_treatment, covariates = design_batch)

3.2 Heatmap of sample-to-sample distances

  • If we now plot the corrected vst-transformed data we see no improvement on clustering by disease group. If anything, same as before.
sampleDists <- vsd_covar %>% 
  assay() %>% 
  t() %>% # Transpose transformed count matrix, such that samples now rows.
  dist(method = "euclidean") # Compute distances between rows of the matrix i.e. sample-to-sample distance
  
sampleDistMatrix <- sampleDists %>% 
  as.matrix() # Convert to matrix

rownames(sampleDistMatrix) <- str_c(vsd_covar$Disease_Group, ", ", vsd_covar$Source)
colnames(sampleDistMatrix) <- NULL
pheatmap <- pheatmap(sampleDistMatrix,
                     clustering_distance_rows=sampleDists,
                     clustering_distance_cols=sampleDists,
                     col= viridis(n = 20), 
                     main = "Heatmap: corrected by AoD, PMI and RIN")

3.3 Scree plot

pca_vsd_covar <- prcomp(t(assay(vsd_covar)))

ggarrange(pcascree(pca_vsd_covar, type = "pev") + theme_rhr,
          pcascree(pca_vsd_covar, type = "cev") + theme_rhr, 
          nrow = 2,
          labels = c("a", "b"))
Plot of (a) the proportion of variance explained by each PC and (b) the cumulative proportion of variance explained with the addition of each PC.

Figure 3.1: Plot of (a) the proportion of variance explained by each PC and (b) the cumulative proportion of variance explained with the addition of each PC.

3.4 Correlation of PC axes with remaining sample covariates

  • Removing effects of age of death, PMI and RIN result in no correlation of PCs with disease group.
  • Further, cell-type proportions still correlated with PC axes that explain most variation in the data.
PC_corr <- 
  correlatePCs(pcaobj = pca_vsd_covar, 
               coldata = colData(vsd_covar) %>% 
                                as.data.frame() %>% 
                                dplyr::select(Disease_Group, Braak_score,mapped_read_count,contains("_proportion")), 
               pcs = 1:72)

# Replace p-values = 0, with low-pvalue
PC_corr_FDR_covar <- PC_corr$pvalue %>% 
  as_tibble() %>% 
  dplyr::mutate(PC = str_c("PC_", c(1:72))) %>% 
  tidyr::gather(key = covariate, value = p, -PC) %>% 
  dplyr::mutate(p = case_when(p == 0 ~ 1*10^-12,
                              TRUE ~ p),
                FDR = p.adjust(p, method = "fdr"))

PC_corr_FDR_covar %>% 
  dplyr::inner_join(PC_corr$statistic %>% 
               as_tibble() %>% 
               dplyr::mutate(PC = str_c("PC_", c(1:72))) %>% 
               tidyr::gather(key = covariate, value = test_statistic, -PC)) %>% 
  dplyr::arrange(FDR) %>% 
  DT::datatable(rownames = FALSE,
                options = list(scrollX = TRUE),
                class = 'white-space: nowrap')
PC_corr_FDR_mat <- PC_corr_FDR_covar %>% 
  dplyr::select(-p) %>% 
  tidyr::spread(key = covariate, value = FDR) %>% 
  dplyr::mutate(PC_number = str_replace(PC, "PC_", "") %>% 
                  as.numeric()) %>% 
  dplyr::arrange(PC_number) %>% 
  dplyr::select(-PC_number, -PC) %>% 
  as.matrix()
  
PC_corr_FDR_mat <- PC_corr_FDR_mat[, match(colnames(PC_corr$statistic), colnames(PC_corr_FDR_mat))]
rownames(PC_corr_FDR_mat) <- PC_corr_FDR_covar %>% .[["PC"]] %>% unique()

heatmap.2(x = -log10(PC_corr_FDR_mat)[1:40,] %>% t(), 
          Colv = NA, 
          Rowv = NA,
          dendrogram = "none",
          col = viridis(n = 50),
          trace = "none",
          density.info = "none", 
          key = TRUE, 
          key.xlab = "-log10(FDR-corrected p-value)", 
          cexRow = 0.65, cexCol = 0.65)
Heatmap of significance of (cor)relations between PCA scores (from vst-transformed data corrected by AoD, PMI and RIN) and sample covariates. Significance computed using Kruskal-Wallis for categorical variables and the cor.test based on Spearman's correlation for continuous variables. P-values were FDR-corrected.

Figure 3.2: Heatmap of significance of (cor)relations between PCA scores (from vst-transformed data corrected by AoD, PMI and RIN) and sample covariates. Significance computed using Kruskal-Wallis for categorical variables and the cor.test based on Spearman's correlation for continuous variables. P-values were FDR-corrected.

4 Data following correction with covariates: AoD, PMI, RIN and cell-type proportions

Correcting for covariates, including cell-type proportions.

4.1 Correcting the data

  • Correcting for covariates found to be significantly correlated with PC axes from filtered, but uncorrected data. These include:
# Get covar names
covar_colnames <- c(PC_corr_FDR_uncorrected %>% 
  dplyr::filter(FDR < 0.05, covariate != "Disease_Group") %>% 
  .[["covariate"]] %>% 
  unique(), "PMI") 

covar_colnames
##  [1] "Age_at_death"          "RIN"                   "Vascular_proportion"  
##  [4] "Astro_proportion"      "OPC_proportion"        "Oligo_proportion"     
##  [7] "Inhibitory_proportion" "Microglia_proportion"  "Excitatory_proportion"
## [10] "PMI"
vsd_covar_ct <- vst(filter_dds,
                     blind = FALSE)

# Current design
design(filter_dds)
## ~Age_at_death + PMI + RIN + Disease_Group
# Change design
design(filter_dds) <- ~ Age_at_death + PMI + RIN + Astro_proportion + Excitatory_proportion + Inhibitory_proportion + Microglia_proportion + Oligo_proportion + OPC_proportion + Vascular_proportion + Disease_Group
design(filter_dds)
## ~Age_at_death + PMI + RIN + Astro_proportion + Excitatory_proportion + 
##     Inhibitory_proportion + Microglia_proportion + Oligo_proportion + 
##     OPC_proportion + Vascular_proportion + Disease_Group
# Batch correcting and filtering vsd data 
design <- model.matrix(design(filter_dds), data = colData(filter_dds))

design
##            (Intercept) Age_at_death PMI RIN Astro_proportion
## SRR2015713           1           73   2 7.7      0.317628470
## SRR2015714           1           91   2 7.9      0.144444110
## SRR2015715           1           82   2 8.6      0.264410170
## SRR2015716           1           97   2 9.1      0.133938210
## SRR2015717           1           86   5 8.6      0.082716680
## SRR2015718           1           91   2 8.7      0.301827300
## SRR2015719           1           81   3 6.0      0.004081459
## SRR2015720           1           79   2 8.4      0.264108930
## SRR2015721           1           63   2 6.5      0.293554540
## SRR2015722           1           66  19 7.1      0.025600335
## SRR2015723           1           69  15 7.8      0.186805170
## SRR2015724           1           79  21 8.0      0.269654270
## SRR2015725           1           61  10 8.2      0.110553660
## SRR2015726           1           58  20 8.4      0.069511300
## SRR2015727           1           70  21 8.2      0.166278260
## SRR2015728           1           66  17 8.5      0.096365480
## SRR2015729           1           73  19 7.5      0.303567680
## SRR2015730           1           60  24 7.9      0.017056694
## SRR2015731           1           76  26 7.3      0.191778260
## SRR2015732           1           61  17 7.8      0.033671357
## SRR2015733           1           62  18 6.6      0.243732440
## SRR2015734           1           69  26 8.7      0.224252780
## SRR2015735           1           61  25 8.1      0.052709340
## SRR2015736           1           88  11 7.1      0.232266560
## SRR2015737           1           93  13 6.4      0.305767830
## SRR2015738           1           53  24 7.3      0.030169912
## SRR2015739           1           57  24 8.3      0.071914020
## SRR2015740           1           46  21 7.6      0.008273732
## SRR2015741           1           57  20 7.7      0.016609421
## SRR2015742           1           80  15 7.3      0.145583400
## SRR2015743           1           74   2 8.5      0.100300690
## SRR2015744           1           69   2 8.4      0.117336130
## SRR2015745           1           76   2 7.5      0.264845280
## SRR2015747           1           87   2 8.7      0.097025365
## SRR2015748           1           86   2 8.7      0.347456960
## SRR2015749           1           54  24 8.3      0.079489390
## SRR2015750           1           68  19 6.3      0.277366040
## SRR2015751           1           52  23 7.4      0.197652460
## SRR2015752           1           46  30 8.2      0.008366320
## SRR2015753           1           55  26 7.6      0.086838510
## SRR2015754           1           57  18 7.8      0.005911213
## SRR2015755           1           66  32 8.4      0.087293840
## SRR2015756           1           64  19 8.7      0.032967526
## SRR2015757           1           74   3 7.2      0.208997730
## SRR2015758           1           83   2 7.1      0.329267050
## SRR2015759           1           80   2 8.5      0.115616400
## SRR2015760           1           83   2 7.3      0.146002870
## SRR2015761           1           80   2 8.4      0.239067510
## SRR2015762           1           84   2 6.7      0.252657380
## SRR2015763           1           88   2 8.1      0.199928920
## SRR2015764           1           81   2 5.8      0.232121900
## SRR2015765           1           77   4 6.1      0.266157450
## SRR2015766           1           64   4 8.1      0.171510500
## SRR2015767           1           85   3 6.1      0.026376536
## SRR2015768           1           94   9 6.4      0.470961120
## SRR2015769           1           80  27 6.5      0.183678460
## SRR2015770           1           85  16 6.9      0.359981630
## SRR2015771           1           75   7 8.0      0.173112700
## SRR2015772           1           74  15 7.1      0.303035400
## SRR2015773           1           89  31 6.1      0.289158430
## SRR2015774           1           66  11 7.7      0.247226250
## SRR2015775           1           65   8 6.8      0.343444260
## SRR2015776           1           85  19 6.1      0.075658350
## SRR2015777           1           64   4 7.8      0.246304970
## SRR2015778           1           64   1 6.1      0.183992010
## SRR2015779           1           75  30 6.9      0.314297080
## SRR2015780           1           68  18 7.3      0.285236030
## SRR2015781           1           95  16 7.0      0.173917060
## SRR2015782           1           74  23 7.5      0.201486830
## SRR2015783           1           68  23 7.9      0.130935480
## SRR2015784           1           79  12 6.9      0.350070400
## SRR2015785           1           70  25 6.7      0.325087160
##            Excitatory_proportion Inhibitory_proportion Microglia_proportion
## SRR2015713          0.3950412600          0.0214479040          0.032719570
## SRR2015714          0.0105791320          0.0016683202          0.009936404
## SRR2015715          0.4983624000          0.0338336830          0.034877320
## SRR2015716          0.4572664500          0.0218681670          0.047339170
## SRR2015717          0.4233714000          0.0285239820          0.016753344
## SRR2015718          0.3619220300          0.0173113680          0.042988700
## SRR2015719          0.0002029018          0.0001992489          0.004930648
## SRR2015720          0.3705339400          0.0349377060          0.023475217
## SRR2015721          0.2384062600          0.0284928770          0.034085333
## SRR2015722          0.0943603500          0.0041718970          0.005621809
## SRR2015723          0.3740270000          0.0267784150          0.057285080
## SRR2015724          0.4233678300          0.0222742650          0.039071325
## SRR2015725          0.7478977000          0.0555598920          0.008909402
## SRR2015726          0.3724319000          0.0407935940          0.022795074
## SRR2015727          0.2335179400          0.0101048425          0.015968949
## SRR2015728          0.0687043740          0.3578559500          0.032157525
## SRR2015729          0.2031654600          0.0106369900          0.048579026
## SRR2015730          0.8437762000          0.0277052880          0.010424337
## SRR2015731          0.2224616400          0.0124794170          0.011109368
## SRR2015732          0.6263220000          0.0190644900          0.020182211
## SRR2015733          0.2314415400          0.0285673680          0.021781282
## SRR2015734          0.5185401400          0.0225306020          0.020285452
## SRR2015735          0.4164575600          0.0244964300          0.016492998
## SRR2015736          0.3547639000          0.0231488050          0.028515352
## SRR2015737          0.1383720600          0.0133647580          0.100518880
## SRR2015738          0.4673352200          0.0269302200          0.012242259
## SRR2015739          0.5877753000          0.0382446160          0.009284164
## SRR2015740          0.8236916700          0.0415883030          0.010140169
## SRR2015741          0.4921760600          0.0386445860          0.013094717
## SRR2015742          0.3162230000          0.0214704420          0.018812528
## SRR2015743          0.6860475500          0.0541235770          0.026304385
## SRR2015744          0.2901664000          0.0178152300          0.036035348
## SRR2015745          0.1605238200          0.0164642600          0.036441382
## SRR2015747          0.4895161000          0.0459123850          0.024905257
## SRR2015748          0.0529401940          0.0164213200          0.028230590
## SRR2015749          0.5662024600          0.0551115300          0.014810667
## SRR2015750          0.2380494500          0.0176432360          0.022530837
## SRR2015751          0.3537919200          0.0294949470          0.014853206
## SRR2015752          0.8497667300          0.0511921800          0.009488656
## SRR2015753          0.5085383700          0.0373799300          0.025945755
## SRR2015754          0.4092319000          0.0176657160          0.013677426
## SRR2015755          0.4365720700          0.0295782180          0.015748710
## SRR2015756          0.5035308000          0.0606974660          0.020453080
## SRR2015757          0.5267436000          0.0292147270          0.017173665
## SRR2015758          0.2136725800          0.0300858190          0.037622996
## SRR2015759          0.6038947700          0.0321264080          0.021231314
## SRR2015760          0.4207707300          0.0158284160          0.020312782
## SRR2015761          0.2383388700          0.0143245100          0.027999280
## SRR2015762          0.1170267240          0.0086719160          0.019031802
## SRR2015763          0.5561599000          0.0258920510          0.021742163
## SRR2015764          0.3086498000          0.0162843480          0.059743058
## SRR2015765          0.1930185100          0.0224425000          0.027352706
## SRR2015766          0.2645124000          0.0148259485          0.022638962
## SRR2015767          0.0004231346          0.0003661439          0.030453691
## SRR2015768          0.0617366900          0.0063833860          0.069979920
## SRR2015769          0.1800757600          0.0274648310          0.060583550
## SRR2015770          0.1254862700          0.0125799960          0.070164464
## SRR2015771          0.6187107600          0.0159419900          0.030258244
## SRR2015772          0.1712337300          0.0125954560          0.051837430
## SRR2015773          0.1879545500          0.0143161690          0.048937693
## SRR2015774          0.2796594500          0.0108342110          0.039777120
## SRR2015775          0.2862134300          0.0150001790          0.067998890
## SRR2015776          0.0013608827          0.0009181094          0.022617275
## SRR2015777          0.3595136400          0.0190840550          0.021304049
## SRR2015778          0.2235179500          0.0266799930          0.015409340
## SRR2015779          0.0513598200          0.0051720035          0.017739294
## SRR2015780          0.0701377300          0.0073131113          0.026312000
## SRR2015781          0.0618039260          0.0065083974          0.028345788
## SRR2015782          0.2343975300          0.0158466120          0.030232424
## SRR2015783          0.5377744400          0.0573031600          0.024340523
## SRR2015784          0.0966068400          0.0087238210          0.056345780
## SRR2015785          0.0336413200          0.0052393787          0.058059160
##            Oligo_proportion OPC_proportion Vascular_proportion Disease_GroupPD
## SRR2015713       0.12396010    0.030836953         0.078365800               0
## SRR2015714       0.81347007    0.005247565         0.014654455               0
## SRR2015715       0.10298584    0.019555487         0.045975108               0
## SRR2015716       0.22854781    0.011952660         0.099087500               0
## SRR2015717       0.40411380    0.015707118         0.028813662               0
## SRR2015718       0.17815189    0.020315295         0.077483440               0
## SRR2015719       0.98590260    0.001559540         0.003123550               0
## SRR2015720       0.24052013    0.019447638         0.046976414               0
## SRR2015721       0.28737018    0.030288657         0.087802150               0
## SRR2015722       0.85846920    0.005704316         0.006072115               0
## SRR2015723       0.19716983    0.015721120         0.142213360               0
## SRR2015724       0.17168856    0.017928070         0.056015710               0
## SRR2015725       0.03296958    0.023079902         0.021029802               0
## SRR2015726       0.19942862    0.025706947         0.269332530               0
## SRR2015727       0.52511716    0.017241554         0.031771276               0
## SRR2015728       0.28296176    0.132096050         0.029858842               0
## SRR2015729       0.14575593    0.023207160         0.265087750               0
## SRR2015730       0.05542423    0.014982286         0.030630918               0
## SRR2015731       0.50628670    0.019544123         0.036340510               0
## SRR2015732       0.07786898    0.022510076         0.200380880               0
## SRR2015733       0.33241966    0.035928500         0.106129150               0
## SRR2015734       0.10112041    0.025302917         0.087967664               0
## SRR2015735       0.43528620    0.017093700         0.037463750               0
## SRR2015736       0.25288627    0.038363915         0.070055190               0
## SRR2015737       0.25132730    0.050698508         0.139950650               0
## SRR2015738       0.40801194    0.013039357         0.042271107               0
## SRR2015739       0.21429110    0.019335583         0.059155222               0
## SRR2015740       0.04325353    0.019260370         0.053792294               0
## SRR2015741       0.39316484    0.016357890         0.029952513               0
## SRR2015742       0.41500340    0.019278806         0.063628470               0
## SRR2015743       0.07539407    0.017537607         0.040292155               0
## SRR2015744       0.46196890    0.014834735         0.061843320               0
## SRR2015745       0.31758177    0.029336058         0.174807430               0
## SRR2015747       0.25519463    0.018629653         0.068816540               0
## SRR2015748       0.48558867    0.015830247         0.053532016               0
## SRR2015749       0.18753791    0.018960843         0.077887240               0
## SRR2015750       0.07160624    0.032607704         0.340196460               0
## SRR2015751       0.22320305    0.035781340         0.145223140               0
## SRR2015752       0.03400154    0.014249515         0.032935100               0
## SRR2015753       0.10689118    0.029693238         0.204713060               0
## SRR2015754       0.50385493    0.010252212         0.039406580               0
## SRR2015755       0.34900308    0.023126833         0.058677185               0
## SRR2015756       0.27517010    0.028270103         0.078910950               0
## SRR2015757       0.16821395    0.016262362         0.033394072               0
## SRR2015758       0.20460717    0.041767705         0.142976660               0
## SRR2015759       0.16120683    0.017738363         0.048185986               1
## SRR2015760       0.34659103    0.013390496         0.037103593               0
## SRR2015761       0.39820632    0.020537930         0.061525565               1
## SRR2015762       0.54711230    0.013254060         0.042245846               0
## SRR2015763       0.14764732    0.015154713         0.033474956               0
## SRR2015764       0.26734295    0.030219795         0.085638140               1
## SRR2015765       0.36962387    0.023340017         0.098064980               1
## SRR2015766       0.49006394    0.009342668         0.027105605               1
## SRR2015767       0.92414635    0.003342916         0.014891294               0
## SRR2015768       0.14150392    0.027774414         0.221660550               1
## SRR2015769       0.06490280    0.013515749         0.469778800               0
## SRR2015770       0.13575529    0.053383440         0.242648880               1
## SRR2015771       0.08042159    0.014797695         0.066757050               1
## SRR2015772       0.28224513    0.027936058         0.151116830               1
## SRR2015773       0.28070268    0.027079059         0.151851400               1
## SRR2015774       0.32968953    0.016822303         0.075991124               1
## SRR2015775       0.05050916    0.015687004         0.221147060               1
## SRR2015776       0.84765500    0.006430063         0.045360412               1
## SRR2015777       0.23869560    0.022149460         0.092948250               0
## SRR2015778       0.38740265    0.025090247         0.137907800               1
## SRR2015779       0.41206208    0.017405247         0.181964400               0
## SRR2015780       0.44396806    0.020981798         0.146051270               0
## SRR2015781       0.61711430    0.015734850         0.096575655               0
## SRR2015782       0.23840897    0.020226637         0.259401050               1
## SRR2015783       0.15392125    0.022921873         0.072803326               1
## SRR2015784       0.30641827    0.027279107         0.154555720               1
## SRR2015785       0.28371197    0.022451410         0.271809600               1
##            Disease_GroupPDD
## SRR2015713                0
## SRR2015714                0
## SRR2015715                0
## SRR2015716                0
## SRR2015717                0
## SRR2015718                0
## SRR2015719                0
## SRR2015720                0
## SRR2015721                0
## SRR2015722                0
## SRR2015723                0
## SRR2015724                0
## SRR2015725                0
## SRR2015726                0
## SRR2015727                0
## SRR2015728                0
## SRR2015729                0
## SRR2015730                0
## SRR2015731                0
## SRR2015732                0
## SRR2015733                0
## SRR2015734                0
## SRR2015735                0
## SRR2015736                0
## SRR2015737                0
## SRR2015738                0
## SRR2015739                0
## SRR2015740                0
## SRR2015741                0
## SRR2015742                0
## SRR2015743                0
## SRR2015744                0
## SRR2015745                0
## SRR2015747                0
## SRR2015748                0
## SRR2015749                0
## SRR2015750                0
## SRR2015751                0
## SRR2015752                0
## SRR2015753                0
## SRR2015754                0
## SRR2015755                0
## SRR2015756                0
## SRR2015757                1
## SRR2015758                1
## SRR2015759                0
## SRR2015760                1
## SRR2015761                0
## SRR2015762                1
## SRR2015763                1
## SRR2015764                0
## SRR2015765                0
## SRR2015766                0
## SRR2015767                1
## SRR2015768                0
## SRR2015769                1
## SRR2015770                0
## SRR2015771                0
## SRR2015772                0
## SRR2015773                0
## SRR2015774                0
## SRR2015775                0
## SRR2015776                0
## SRR2015777                1
## SRR2015778                0
## SRR2015779                1
## SRR2015780                1
## SRR2015781                1
## SRR2015782                0
## SRR2015783                0
## SRR2015784                0
## SRR2015785                0
## attr(,"assign")
##  [1]  0  1  2  3  4  5  6  7  8  9 10 11 11
## attr(,"contrasts")
## attr(,"contrasts")$Disease_Group
## [1] "contr.treatment"
design_treatment <- design[,c("(Intercept)", "Disease_GroupPD", "Disease_GroupPDD")]
design_batch <- design[,covar_colnames] %>% 
  # standardise covariates
  scale()

assay(vsd_covar_ct) <- limma::removeBatchEffect(assay(vsd_covar_ct), design = design_treatment, covariates = design_batch)

4.2 Heatmap of sample-to-sample distances

  • If we now plot the corrected vst-transformed data we see some separation of disease groups -- more so than when only correcting for AoD, PMI and RIN. But still not entirely separated.
sampleDists <- vsd_covar_ct %>% 
  assay() %>% 
  t() %>% # Transpose transformed count matrix, such that samples now rows.
  dist(method = "euclidean") # Compute distances between rows of the matrix i.e. sample-to-sample distance
  
sampleDistMatrix <- sampleDists %>% 
  as.matrix() # Convert to matrix

rownames(sampleDistMatrix) <- str_c(vsd_covar_ct$Disease_Group, ", ", vsd_covar_ct$Source)
colnames(sampleDistMatrix) <- NULL
pheatmap <- pheatmap(sampleDistMatrix,
                     clustering_distance_rows=sampleDists,
                     clustering_distance_cols=sampleDists,
                     col= viridis(n = 20), 
                     main = "Heatmap: corrected by AoD, PMI, RIN, and cell-type proportions")

4.3 Scree plot

pca_vsd_covar_ct <- prcomp(t(assay(vsd_covar_ct)))

ggarrange(pcascree(pca_vsd_covar, type = "pev") + theme_rhr,
          pcascree(pca_vsd_covar, type = "cev") + theme_rhr, 
          nrow = 2,
          labels = c("a", "b"))
Plot of (a) the proportion of variance explained by each PC and (b) the cumulative proportion of variance explained with the addition of each PC.

Figure 4.1: Plot of (a) the proportion of variance explained by each PC and (b) the cumulative proportion of variance explained with the addition of each PC.

4.4 Correlation of PC axes with remaining sample covariates

  • By removing effects of age of death, PMI, RIN and changes in any cell-type proportion, we do see a correlation (only nominally significant) of disease group with PC axes 2, 4 and 6.
PC_corr <- 
  correlatePCs(pcaobj = pca_vsd_covar_ct, 
               coldata = colData(vsd_covar_ct) %>% 
                                as.data.frame() %>% 
                                dplyr::select(Disease_Group, 
                                              Braak_score,mapped_read_count), 
               pcs = 1:72)

# Replace p-values = 0, with low-pvalue
PC_corr_FDR_covar_ct <- PC_corr$pvalue %>% 
  as_tibble() %>% 
  dplyr::mutate(PC = str_c("PC_", c(1:72))) %>% 
  tidyr::gather(key = covariate, value = p, -PC) %>% 
  dplyr::mutate(p = case_when(p == 0 ~ 1*10^-12,
                              TRUE ~ p),
                FDR = p.adjust(p, method = "fdr"))

PC_corr_FDR_covar_ct %>% 
  dplyr::inner_join(PC_corr$statistic %>% 
               as_tibble() %>% 
               dplyr::mutate(PC = str_c("PC_", c(1:72))) %>% 
               tidyr::gather(key = covariate, value = test_statistic, -PC)) %>% 
  dplyr::arrange(FDR) %>% 
  DT::datatable(rownames = FALSE,
                options = list(scrollX = TRUE),
                class = 'white-space: nowrap')
PC_corr_FDR_mat <- PC_corr_FDR_covar_ct %>% 
  dplyr::select(-p) %>% 
  tidyr::spread(key = covariate, value = FDR) %>% 
  dplyr::mutate(PC_number = str_replace(PC, "PC_", "") %>% 
                  as.numeric()) %>% 
  dplyr::arrange(PC_number) %>% 
  dplyr::select(-PC_number, -PC) %>% 
  as.matrix()
  
PC_corr_FDR_mat <- PC_corr_FDR_mat[, match(colnames(PC_corr$statistic), colnames(PC_corr_FDR_mat))]
rownames(PC_corr_FDR_mat) <- PC_corr_FDR_covar_ct %>% .[["PC"]] %>% unique()

heatmap.2(x = -log10(PC_corr_FDR_mat)[1:40,] %>% t(), 
          Colv = NA, 
          Rowv = NA,
          dendrogram = "none",
          col = viridis(n = 50),
          trace = "none",
          density.info = "none", 
          key = TRUE, 
          key.xlab = "-log10(FDR-corrected p-value)", 
          cexRow = 0.65, cexCol = 0.65)
Heatmap of significance of (cor)relations between PCA scores (from vst-transformed data corrected by AoD, RIN, and cell-type proportions) and sample covariates. Significance computed using Kruskal-Wallis for categorical variables and the cor.test based on Spearman's correlation for continuous variables. P-values were FDR-corrected.

Figure 4.2: Heatmap of significance of (cor)relations between PCA scores (from vst-transformed data corrected by AoD, RIN, and cell-type proportions) and sample covariates. Significance computed using Kruskal-Wallis for categorical variables and the cor.test based on Spearman's correlation for continuous variables. P-values were FDR-corrected.

5 Data following correction with PC axes

Given that we have corrected our own data using PC axes, worth seeing how well that would work for this dataset.

5.1 Correcting the data

  • If we filter the uncorrected PC correlates for FDR < 0.05, we see that of those PC axes significantly correlated with covariates, PC axes 1, 2 and 4 appear to be those axes that correlate with most covariates.
  • It may be preferable to correct using PC axes because the number of covariates included in design is lower using PC axes, as opposed to individual covariates (where we are including 9), thus increasing the degrees of freedom in the final model.
  • We can extract these PCs from the pca object returned by prcomp() and use these to correct our vst-transformed data, as opposed to using covariates.
# Extract coordinates of individuals on principle components 1-4
PC_axes <- pca_vsd$x[, str_c("PC", 1:4)] %>% 
  as.data.frame() %>% 
  tibble::rownames_to_column(var = "recount_id")

# Include PC axes in filter_dds colData & design
filter_dds <- DESeqDataSetFromMatrix(countData = assay(filter_dds),
                              colData = colData(filter_dds) %>% 
                                as_tibble() %>% 
                                dplyr::inner_join(PC_axes),
                              design = ~ PC1 + PC2 + PC3 + PC4 + Disease_Group
                              )

vsd_PCs <- vst(filter_dds,
               blind = FALSE)

# Batch correcting and filtering vsd data 
design <- model.matrix(design(filter_dds), data = colData(filter_dds))

design
##            (Intercept)          PC1         PC2         PC3         PC4
## SRR2015713           1   -9.9606932 -33.8188613 -14.2830142 -17.0275765
## SRR2015714           1   84.4257298  21.9375544 -21.8144451 -25.2722675
## SRR2015715           1  -32.2848777 -15.6287023 -26.7508252  -3.9450243
## SRR2015716           1  -23.7546842  -4.6171358 -33.0568136  15.8098543
## SRR2015717           1  -20.1509379  13.9029931 -25.5831468 -19.9194467
## SRR2015718           1   -5.8642078  -9.9645544 -27.8780578  15.4365415
## SRR2015719           1  158.2771857 121.6088989  -6.1790406 -22.1971244
## SRR2015720           1   -8.6015290 -18.6554964  -7.1498246 -28.9671311
## SRR2015721           1   -7.7685229   1.5541157 -32.6489148   2.3263840
## SRR2015722           1   10.7608678  62.5091434 -26.5851516 -18.8312106
## SRR2015723           1   -8.6844364 -17.7287064 -29.6711252  23.9279335
## SRR2015724           1  -15.1204185 -18.0922139 -15.2593758  -4.7457894
## SRR2015725           1  -72.0433240   5.4265567 -30.6307851  -9.8765262
## SRR2015726           1  -26.6502342 -10.7046899 -20.0011549  17.4226125
## SRR2015727           1    1.9696007  27.1640062 -28.4169344  -2.8090203
## SRR2015728           1   -3.8442714  67.2078656 -89.1905796   4.6894774
## SRR2015729           1   14.5664227 -30.5397042 -25.0626767  41.3297196
## SRR2015730           1  -80.2128595  12.0011808 -31.1681203 -13.6390944
## SRR2015731           1    2.9013811  18.3892080 -16.7710893 -21.9367056
## SRR2015732           1  -45.6822410 -20.1905561 -15.1395331 -12.6345069
## SRR2015733           1   18.8442628 -26.9675707  -1.9132165 -41.5554899
## SRR2015734           1  -40.3670535  -7.1972193 -29.7565245   3.0022787
## SRR2015735           1  -29.3263364  27.8916944 -19.2866243 -17.1654995
## SRR2015736           1   -9.8586615 -12.0692380 -21.1384760 -16.7993628
## SRR2015737           1   30.8161942 -33.0136966   8.7045358   4.0979614
## SRR2015738           1  -40.7847140  22.1200329  59.6757498 -43.5808306
## SRR2015739           1  -62.8831035  36.6406052   3.5711003  24.7677573
## SRR2015740           1  -89.3397345  20.3320554   9.5424638  -3.8849141
## SRR2015741           1  -53.8024494  46.1941081  21.3517523  -5.8963566
## SRR2015742           1    0.1097669   0.9627987  44.5191832 -25.0092659
## SRR2015743           1  -65.9578593  20.6995303   5.2575359  33.6943228
## SRR2015744           1    7.8387740   0.3122915  59.1004877 -12.0876556
## SRR2015745           1   22.0334149  -6.0491942  53.5009755  15.1941235
## SRR2015747           1  -49.3288803  22.8596005  16.9312602  14.8438747
## SRR2015748           1   28.0763338  62.0650539  34.0898348  83.2802039
## SRR2015749           1  -58.1818948   6.9918380  55.1019594 -17.0232896
## SRR2015750           1    7.5151893 -48.7724774  27.9538860  21.8070685
## SRR2015751           1  -20.1483354  -7.4952934  35.4490896  -4.3123868
## SRR2015752           1 -102.1743639  32.8436371  10.7668295   6.3097887
## SRR2015753           1  -59.8847922  19.9498756  16.5264429  46.8846171
## SRR2015754           1  -34.5731657  40.3498188  50.5808752 -25.7664806
## SRR2015755           1  -46.2497149  39.1776481  18.8643980  13.4760646
## SRR2015756           1  -61.0381924  37.8207803  10.6616342  26.6957707
## SRR2015757           1  -17.0213209 -39.1590979   1.2717605 -63.5637099
## SRR2015758           1   32.6125719 -46.8338382  -9.3809608 -12.2795481
## SRR2015759           1  -52.6654424   5.4499031  18.0787596  -5.0201505
## SRR2015760           1  -14.0757910   0.8458461 -20.9442822 -25.1513202
## SRR2015761           1   18.9802277  -8.0984889 -18.0958232 -10.1046362
## SRR2015762           1   59.5822193 -21.4101177  -2.9945977 -44.7097526
## SRR2015763           1  -36.2996775  -8.0981456 -27.4389520 -21.9496849
## SRR2015764           1   23.4590596 -40.1350158   0.1388361 -25.0359933
## SRR2015765           1   23.5282192 -12.4176220 -16.8314980   4.5509168
## SRR2015766           1   10.8390727   6.4487980 -24.7871226 -19.1784641
## SRR2015767           1  161.0463169  66.6644246 -11.6232292   6.0181336
## SRR2015768           1   49.1141212 -44.7755134 -13.4621647  26.7977897
## SRR2015769           1   20.7694340 -49.3807156 -20.8767591  54.9757423
## SRR2015770           1   43.2143346 -56.8262078 -13.8287993   9.9180525
## SRR2015771           1  -37.3769015 -18.8664212 -29.0856681  -2.4759278
## SRR2015772           1   37.9363188 -33.9686192 -17.9098461  -1.8006934
## SRR2015773           1   32.0111242 -30.3848660 -12.1016634   0.8716096
## SRR2015774           1   18.6105069 -20.4691676 -17.7970223  -8.2777788
## SRR2015775           1   10.0086735 -48.5707188 -23.6730887  37.3172578
## SRR2015776           1  156.1441777  39.2647139 -11.1646477   1.9144869
## SRR2015777           1   -0.9240225 -15.9204065  41.3615408  -6.3554656
## SRR2015778           1    1.9657094   8.2890862  49.3373048   0.9928173
## SRR2015779           1   77.2467514 -37.8007414  59.6734343 -14.6302100
## SRR2015780           1   42.1574419  -1.0145537  26.0190382  22.4497777
## SRR2015781           1   60.6579931  -9.1927059  51.1147098 -23.7621953
## SRR2015782           1   20.9324647 -31.4755485  41.2646497  25.4841303
## SRR2015783           1  -56.7532607  12.9590261  14.7408416   7.8874669
## SRR2015784           1   52.9311824 -27.7524125  35.6542302  18.9114549
## SRR2015785           1   57.7558609  -4.7784550   6.5264756  66.0924960
##            Disease_GroupPD Disease_GroupPDD
## SRR2015713               0                0
## SRR2015714               0                0
## SRR2015715               0                0
## SRR2015716               0                0
## SRR2015717               0                0
## SRR2015718               0                0
## SRR2015719               0                0
## SRR2015720               0                0
## SRR2015721               0                0
## SRR2015722               0                0
## SRR2015723               0                0
## SRR2015724               0                0
## SRR2015725               0                0
## SRR2015726               0                0
## SRR2015727               0                0
## SRR2015728               0                0
## SRR2015729               0                0
## SRR2015730               0                0
## SRR2015731               0                0
## SRR2015732               0                0
## SRR2015733               0                0
## SRR2015734               0                0
## SRR2015735               0                0
## SRR2015736               0                0
## SRR2015737               0                0
## SRR2015738               0                0
## SRR2015739               0                0
## SRR2015740               0                0
## SRR2015741               0                0
## SRR2015742               0                0
## SRR2015743               0                0
## SRR2015744               0                0
## SRR2015745               0                0
## SRR2015747               0                0
## SRR2015748               0                0
## SRR2015749               0                0
## SRR2015750               0                0
## SRR2015751               0                0
## SRR2015752               0                0
## SRR2015753               0                0
## SRR2015754               0                0
## SRR2015755               0                0
## SRR2015756               0                0
## SRR2015757               0                1
## SRR2015758               0                1
## SRR2015759               1                0
## SRR2015760               0                1
## SRR2015761               1                0
## SRR2015762               0                1
## SRR2015763               0                1
## SRR2015764               1                0
## SRR2015765               1                0
## SRR2015766               1                0
## SRR2015767               0                1
## SRR2015768               1                0
## SRR2015769               0                1
## SRR2015770               1                0
## SRR2015771               1                0
## SRR2015772               1                0
## SRR2015773               1                0
## SRR2015774               1                0
## SRR2015775               1                0
## SRR2015776               1                0
## SRR2015777               0                1
## SRR2015778               1                0
## SRR2015779               0                1
## SRR2015780               0                1
## SRR2015781               0                1
## SRR2015782               1                0
## SRR2015783               1                0
## SRR2015784               1                0
## SRR2015785               1                0
## attr(,"assign")
## [1] 0 1 2 3 4 5 5
## attr(,"contrasts")
## attr(,"contrasts")$Disease_Group
## [1] "contr.treatment"
# Split into treatment and batch effects
design_treatment <- design[,c("(Intercept)", "Disease_GroupPD", "Disease_GroupPDD")]
design_batch <- 
  design[,str_c("PC", 1:4)] %>%
  # standardise covariates
  scale()

# Correction with limma
assay(vsd_PCs) <- limma::removeBatchEffect(assay(vsd_PCs), design = design_treatment, covariates = design_batch)

5.2 Heatmap of sample-to-sample distances

  • Vst-trasnformed data corrected with PC axes looks very uniform across all samples (apart from one sample), making it hard to distinguish between groups.
  • While there is some clustering within disease groups, it is still not entirely clear cut.
sampleDists <- vsd_PCs %>% 
  assay() %>% 
  t() %>% # Transpose transformed count matrix, such that samples now rows.
  dist(method = "euclidean") # Compute distances between rows of the matrix i.e. sample-to-sample distance
  
sampleDistMatrix <- sampleDists %>% 
  as.matrix() # Convert to matrix

rownames(sampleDistMatrix) <- str_c(vsd_PCs$Disease_Group, ", ", vsd_PCs$Source)
colnames(sampleDistMatrix) <- NULL
pheatmap <- pheatmap(sampleDistMatrix,
                     clustering_distance_rows=sampleDists,
                     clustering_distance_cols=sampleDists,
                     col= viridis(n = 20), 
                     main = "Heatmap: corrected by PC axes 1-4")

5.3 Scree plot

pca_vsd_PCs <- prcomp(t(assay(vsd_PCs)))

ggarrange(pcascree(pca_vsd_PCs, type = "pev") + theme_rhr,
          pcascree(pca_vsd_PCs, type = "cev") + theme_rhr, 
          nrow = 2,
          labels = c("a", "b"))
Plot of (a) the proportion of variance explained by each PC and (b) the cumulative proportion of variance explained with the addition of each PC.

Figure 5.1: Plot of (a) the proportion of variance explained by each PC and (b) the cumulative proportion of variance explained with the addition of each PC.

5.4 Correlation of PC axes with remaining sample covariates

  • Significant correlation of Braak score with PC3 and OPC proportions with PC4.
  • Disease group correlates with PC20, which explains very little variation in the dataset.
PC_corr_PCs <- correlatePCs(pcaobj = pca_vsd_PCs, 
                        coldata = colData(vsd_PCs) %>% 
                          as.data.frame() %>% 
                          dplyr::select(Disease_Group, Age_at_death, PMI, RIN, Braak_score,mapped_read_count, contains("_proportion")), 
                        pcs = 1:72)

PC_corr_FDR_PCs <- PC_corr_PCs$pvalue %>% 
  as_tibble() %>% 
  dplyr::mutate(PC = str_c("PC_", c(1:72))) %>% 
  tidyr::gather(key = covariate, value = p, -PC) %>% 
  dplyr::mutate(FDR = p.adjust(p, method = "fdr"))

PC_corr_FDR_PCs %>% 
  dplyr::inner_join(PC_corr_PCs$statistic %>% 
               as_tibble() %>% 
               dplyr::mutate(PC = str_c("PC_", c(1:72))) %>% 
               tidyr::gather(key = covariate, value = test_statistic, -PC)) %>% 
  dplyr::arrange(FDR) %>% 
  DT::datatable(rownames = FALSE,
                options = list(scrollX = TRUE),
                class = 'white-space: nowrap')
PC_corr_FDR_mat <- PC_corr_FDR_PCs %>% 
  dplyr::select(-p) %>% 
  tidyr::spread(key = covariate, value = FDR) %>% 
  dplyr::mutate(PC_number = str_replace(PC, "PC_", "") %>% 
                  as.numeric()) %>% 
  dplyr::arrange(PC_number) %>% 
  dplyr::select(-PC_number, -PC) %>% 
  as.matrix()
  
PC_corr_FDR_mat <- PC_corr_FDR_mat[, match(colnames(PC_corr_PCs$statistic), colnames(PC_corr_FDR_mat))]
rownames(PC_corr_FDR_mat) <- PC_corr_FDR_PCs %>% .[["PC"]] %>% unique()

heatmap.2(x = -log10(PC_corr_FDR_mat)[1:40,] %>% t(), 
          Colv = NA, 
          Rowv = NA,
          dendrogram = "none",
          col = viridis(n = 50),
          trace = "none",
          density.info = "none", 
          key = TRUE, 
          key.xlab = "-log10(FDR-corrected p-value)", 
          cexRow = 0.65, cexCol = 0.65)
Heatmap of significance of (cor)relations between PCA scores (from vst-transformed data corrected by PC axes 1-4) and sample covariates. Significance computed using Kruskal-Wallis for categorical variables and the cor.test based on Spearman's correlation for continuous variables. P-values were FDR-corrected.

Figure 5.2: Heatmap of significance of (cor)relations between PCA scores (from vst-transformed data corrected by PC axes 1-4) and sample covariates. Significance computed using Kruskal-Wallis for categorical variables and the cor.test based on Spearman's correlation for continuous variables. P-values were FDR-corrected.

6 Conclusions

  • No clear separation of disease groups following correction. Clearest separation is correction by AoD, PMI, RIN and cell-type proportions, although that involves removing 10 degrees of freedom from the final model and is also not a clean separation.
  • While correcting for PC axes 1-4 worked well for our own data, in terms of separating disease groups, it is not clear it would work as well here.
list_vsd <- list(pca_vsd, pca_vsd_covar, pca_vsd_covar_ct, pca_vsd_PCs)
plot_list <- vector(mode = "list", length = 4)
titles <- c("Uncorrected", "Corrected by AoD, PMI and RIN", "Corrected by AoD, PMI, RIN and cell-type proportions", "Corrected by PC axes 1-4")

for(i in 1:length(list_vsd)){
  
  plot_list[[i]] <- fviz_pca_ind(list_vsd[[i]],
                                    geom.ind = "point", # show points only (nbut not "text")
                                    col.ind = colData(vsd)[,c("Disease_Group")], # color by groups
                                    addEllipses = TRUE, ellipse.type = "confidence", # Confidence ellipses
                                    palette =  pal_jco("default", alpha = 0.9)(10)[c(3,9,1,2)],
                                    mean.point = FALSE,
                                    legend.title = "Disease group", 
                                    title = titles[i]
  ) +
    coord_cartesian(xlim = c(-120, 120))
  
}

ggarrange(plotlist = plot_list,
          labels = c("a", "b", "c", "d"),
          nrow = 4)
PCA plot of individuals grouped by disease group. Each plot represents data that is (a) uncorrected, (B) corrected by AoD, PMI and RIN, (c) corrected by AoD, PMI, RIN and all cell-type proportions or (d) corrected by PC axes 1-4. Ellipses represent the 95% confidence level. Points and ellipses are coloured by disease group.

Figure 6.1: PCA plot of individuals grouped by disease group. Each plot represents data that is (a) uncorrected, (B) corrected by AoD, PMI and RIN, (c) corrected by AoD, PMI, RIN and all cell-type proportions or (d) corrected by PC axes 1-4. Ellipses represent the 95% confidence level. Points and ellipses are coloured by disease group.

7 Session Info

sessionInfo()
## R version 3.6.1 (2019-07-05)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 16.04.6 LTS
## 
## Matrix products: default
## BLAS:   /usr/lib/libblas/libblas.so.3.6.0
## LAPACK: /usr/lib/lapack/liblapack.so.3.6.0
## 
## locale:
##  [1] LC_CTYPE=en_GB.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_GB.UTF-8        LC_COLLATE=en_GB.UTF-8    
##  [5] LC_MONETARY=en_GB.UTF-8    LC_MESSAGES=en_GB.UTF-8   
##  [7] LC_PAPER=en_GB.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
## [1] parallel  stats4    stats     graphics  grDevices utils     datasets 
## [8] methods   base     
## 
## other attached packages:
##  [1] vsn_3.54.0                  viridis_0.5.1              
##  [3] viridisLite_0.3.0           readxl_1.3.1               
##  [5] pcaExplorer_2.12.0          forcats_0.5.1              
##  [7] stringr_1.4.0               dplyr_1.0.2                
##  [9] purrr_0.3.4                 readr_1.4.0                
## [11] tidyr_1.1.1                 tibble_3.0.3               
## [13] tidyverse_1.3.0             pheatmap_1.0.12            
## [15] ggsci_2.9                   ggpubr_0.4.0               
## [17] ggrepel_0.8.2               gplots_3.0.4               
## [19] factoextra_1.0.7            ggplot2_3.3.2              
## [21] DT_0.15                     devtools_2.3.2             
## [23] usethis_1.6.3               DESeq2_1.26.0              
## [25] SummarizedExperiment_1.16.1 DelayedArray_0.12.3        
## [27] BiocParallel_1.20.1         matrixStats_0.56.0         
## [29] Biobase_2.46.0              GenomicRanges_1.38.0       
## [31] GenomeInfoDb_1.22.1         IRanges_2.20.2             
## [33] S4Vectors_0.24.4            BiocGenerics_0.32.0        
## [35] DEGreport_1.22.0           
## 
## loaded via a namespace (and not attached):
##   [1] rappdirs_0.3.1              SparseM_1.78               
##   [3] AnnotationForge_1.28.0      pkgmaker_0.31.1            
##   [5] bit64_4.0.2                 knitr_1.29                 
##   [7] data.table_1.13.0           rpart_4.1-15               
##   [9] RCurl_1.98-1.2              doParallel_1.0.15          
##  [11] generics_0.0.2              preprocessCore_1.48.0      
##  [13] callr_3.5.1                 cowplot_1.0.0              
##  [15] RSQLite_2.2.0               bit_4.0.4                  
##  [17] xml2_1.3.2                  lubridate_1.7.9            
##  [19] httpuv_1.5.4                assertthat_0.2.1           
##  [21] xfun_0.16                   hms_1.0.0                  
##  [23] evaluate_0.14               promises_1.1.1             
##  [25] progress_1.2.2              caTools_1.18.0             
##  [27] dbplyr_1.4.4                Rgraphviz_2.30.0           
##  [29] igraph_1.2.5                DBI_1.1.1                  
##  [31] geneplotter_1.64.0          tmvnsim_1.0-2              
##  [33] htmlwidgets_1.5.3           reshape_0.8.8              
##  [35] ellipsis_0.3.1              crosstalk_1.1.0.1          
##  [37] backports_1.1.8             bookdown_0.21              
##  [39] annotate_1.64.0             gridBase_0.4-7             
##  [41] biomaRt_2.42.1              vctrs_0.3.2                
##  [43] here_1.0.0                  remotes_2.2.0              
##  [45] Cairo_1.5-12.2              abind_1.4-5                
##  [47] cachem_1.0.3                withr_2.2.0                
##  [49] lasso2_1.2-20               checkmate_2.0.0            
##  [51] prettyunits_1.1.1           mnormt_2.0.2               
##  [53] cluster_2.1.0               crayon_1.4.1               
##  [55] genefilter_1.68.0           labeling_0.4.2             
##  [57] edgeR_3.28.1                pkgconfig_2.0.3            
##  [59] nlme_3.1-141                pkgload_1.1.0              
##  [61] nnet_7.3-12                 rlang_0.4.7                
##  [63] lifecycle_0.2.0             registry_0.5-1             
##  [65] affyio_1.56.0               BiocFileCache_1.10.2       
##  [67] GOstats_2.52.0              modelr_0.1.8               
##  [69] cellranger_1.1.0            rprojroot_2.0.2            
##  [71] rngtools_1.5                graph_1.64.0               
##  [73] Matrix_1.2-17               carData_3.0-4              
##  [75] reprex_1.0.0                base64enc_0.1-3            
##  [77] GlobalOptions_0.1.2         processx_3.4.5             
##  [79] png_0.1-7                   rjson_0.2.20               
##  [81] bitops_1.0-6                shinydashboard_0.7.1       
##  [83] ConsensusClusterPlus_1.50.0 KernSmooth_2.23-15         
##  [85] blob_1.2.1                  shape_1.4.4                
##  [87] shinyAce_0.4.1              jpeg_0.1-8.1               
##  [89] rstatix_0.6.0               ggsignif_0.6.0             
##  [91] scales_1.1.1                GSEABase_1.48.0            
##  [93] memoise_2.0.0               magrittr_2.0.1             
##  [95] plyr_1.8.6                  bibtex_0.4.3               
##  [97] gdata_2.18.0                zlibbioc_1.32.0            
##  [99] threejs_0.3.3               compiler_3.6.1             
## [101] RColorBrewer_1.1-2          d3heatmap_0.6.1.2          
## [103] clue_0.3-57                 affy_1.64.0                
## [105] cli_2.2.0.9000              XVector_0.26.0             
## [107] Category_2.52.1             ps_1.3.4                   
## [109] htmlTable_2.0.1             Formula_1.2-4              
## [111] MASS_7.3-51.4               tidyselect_1.1.0           
## [113] stringi_1.5.3               shinyBS_0.61               
## [115] highr_0.8                   yaml_2.2.1                 
## [117] askpass_1.1                 locfit_1.5-9.4             
## [119] latticeExtra_0.6-29         grid_3.6.1                 
## [121] tools_3.6.1                 rio_0.5.16                 
## [123] circlize_0.4.10             rstudioapi_0.13            
## [125] foreach_1.5.0               foreign_0.8-72             
## [127] logging_0.10-108            gridExtra_2.3              
## [129] farver_2.0.3                BiocManager_1.30.10        
## [131] digest_0.6.27               shiny_1.5.0                
## [133] Rcpp_1.0.5                  car_3.0-9                  
## [135] broom_0.7.0                 later_1.1.0.1              
## [137] httr_1.4.2                  Nozzle.R1_1.1-1            
## [139] ggdendro_0.1.21             AnnotationDbi_1.48.0       
## [141] ComplexHeatmap_2.7.7        psych_2.0.7                
## [143] colorspace_2.0-0            rvest_0.3.6                
## [145] XML_3.99-0.3                fs_1.5.0                   
## [147] topGO_2.38.1                splines_3.6.1              
## [149] RBGL_1.62.1                 sessioninfo_1.1.1          
## [151] xtable_1.8-4                jsonlite_1.7.1             
## [153] testthat_2.3.2              R6_2.5.0                   
## [155] Hmisc_4.4-1                 pillar_1.4.6               
## [157] htmltools_0.5.1.1           mime_0.9                   
## [159] NMF_0.23.0                  glue_1.4.2                 
## [161] fastmap_1.0.1               codetools_0.2-16           
## [163] pkgbuild_1.1.0              lattice_0.20-38            
## [165] curl_4.3                    gtools_3.8.2               
## [167] zip_2.1.0                   GO.db_3.10.0               
## [169] openxlsx_4.2.3              openssl_1.4.2              
## [171] survival_3.2-7              limma_3.42.2               
## [173] rmarkdown_2.5               desc_1.2.0                 
## [175] munsell_0.5.0               GetoptLong_1.0.2           
## [177] GenomeInfoDbData_1.2.2      iterators_1.0.12           
## [179] reshape2_1.4.4              haven_2.3.1                
## [181] gtable_0.3.0