Aim: check the tissue and cell-type specificity of the genes with colocalisations at PP H4 >= 0.75 when using p12 = 5e-6, i.e. MMRN1 and SNCA-AS1.

All external datasets used are available at: https://github.com/RHReynolds/MarkerGenes

1 Definition of specificity

  • The proportion of a gene's expression that can be attributed to one cell type or tissue within a dataset of cell types/tissues. Values range from 0 to 1, with 0 indicating a gene is not expressed at all in a cell type/tissue and 1 indicating a gene is only expressed in that cell type/tissue. If the expression of a gene is shared between two or more cell types/tissues, it will get a lower specificity measure in each.
  • Specificity is calculated per dataset.
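As a minimal sketch of the definition above: this assumes specificity is simply a gene's mean expression in each cell type divided by its summed mean expression across all cell types in the dataset. The pipeline used to build the MarkerGenes specificity files may apply additional normalisation. Gene names and values are made up for illustration.

```r
# Toy mean-expression matrix (hypothetical values): rows = genes, columns = cell types
mean_expr <- matrix(
  c(8, 1, 1,
    2, 2, 2,
    0, 0, 5),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("geneA", "geneB", "geneC"),
                  c("neuron", "astrocyte", "microglia"))
)

# Specificity: proportion of each gene's total expression found in each cell type
# (each row is divided by its own row sum, so every row sums to 1)
toy_specificity <- mean_expr / rowSums(mean_expr)

round(toy_specificity, 2)
```

In this toy example, geneC is expressed only in microglia and so has a specificity of 1 there and 0 elsewhere, whereas geneB is shared equally across the three cell types and has a specificity of roughly 0.33 in each.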

2 Tissue specificity

  • Genes to investigate:
genes <- c("MMRN1", "SNCA-AS1")

genes
## [1] "MMRN1"    "SNCA-AS1"
  • Tissue specificity derived from GTEx v8.
  • All tissues, apart from brain, were collapsed to an organ level.
  • Also, tissues were excluded on the following criteria:
    • Fewer than 100 samples available.
    • Gene expression outlier, e.g. testis.
    • Non-natural tissue e.g. EBV-transformed lymphocytes and cultured fibroblasts.
    • Brain cortex and Brain cerebellum (using instead the more specific anterior cingulate cortex, frontal cortex and cerebellar hemisphere) to reduce redundancy across brain regions.
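The exclusion criteria above could be applied along the following lines. This is only a sketch: the `tissues` table, its sample counts and the exclusion vectors are hypothetical stand-ins, not the actual GTEx metadata or the code used to generate the specificity file loaded below.

```r
# Hypothetical tissue metadata; names and sample counts are illustrative only
tissues <- data.frame(
  tissue = c("Brain - Frontal Cortex (BA9)", "Brain - Cortex", "Testis",
             "Cells - Cultured fibroblasts", "Kidney - Medulla", "Thyroid"),
  n_samples = c(200, 250, 350, 500, 4, 650),
  stringsAsFactors = FALSE
)

# Exclusion lists mirroring the criteria above (example entries, not exhaustive)
expression_outliers <- c("Testis")
non_natural <- c("Cells - Cultured fibroblasts", "Cells - EBV-transformed lymphocytes")
redundant_brain <- c("Brain - Cortex", "Brain - Cerebellum")

# Keep tissues with at least 100 samples that are not on any exclusion list
keep <- (tissues$n_samples >= 100) &
  !(tissues$tissue %in% c(expression_outliers, non_natural, redundant_brain))

tissues[keep, ]
```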
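# Load specificity
# (path_to_markergenes_pkg is assumed to be defined earlier and to point to a local copy of the MarkerGenes repository)
gtex <- readRDS(str_c(path_to_markergenes_pkg, "specificity_df/GTEx_v8.Rds"))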

# Plot specificity
gtex %>% 
  dplyr::filter(Description %in% genes) %>% 
  dplyr::mutate(brain = case_when(str_detect(Organ, "Brain") ~ TRUE,
                                  TRUE ~ FALSE)) %>% 
  ggplot(aes(x = MarkerGenes::reorder_within(x = Organ,
                                             by = specificity,
                                             within = Description,
                                             fun = median,
                                             desc = TRUE),
             y = specificity,
             fill = brain)
  ) +
  geom_col() +
  MarkerGenes::scale_x_reordered() +
  facet_wrap(vars(Description), scales = "free_x", nrow = 3) +
  labs(x = "Tissue", y = "Specificity", title = "") +
  # scale_y_continuous(limits = c(0,1)) +
  scale_fill_manual(values = c("#888888", "#00BFC4")) +
  theme_rhr
**Figure**: Specificity of MMRN1 and SNCA-AS1 in GTEx v8 tissues.


  • MMRN1 is most specific to the thyroid, followed by adipose tissue and lung.
    • Not much appears to be known about MMRN1 function.
    • It has primarily been described as a component of secretory granules found in megakaryocytes and endothelial cells (ECs). There is also some suggestion that it may mediate cellular adhesion via integrin receptors (https://pubmed.ncbi.nlm.nih.gov/22566882/).
    • N.B. Megakaryocytes are derived from the same common myeloid progenitor as microglia (https://www.nature.com/articles/nature09615).
  • SNCA-AS1 appears to be quite specific to brain tissues, in general. Notably, there appear to be a number of tissues in which it is not expressed e.g. blood vessels, adrenal gland, heart, liver, etc.

3 Cell-type specificity

  • Cell-type specificity derived from a number of external datasets. All are based on single-nucleus RNA-sequencing. Notably, the AIBS dataset uses SMART-seq library construction (i.e. full-length sequencing of transcripts, as opposed to just the 3'-end).
# Load specificity matrices
load(str_c(path_to_markergenes_pkg, "specificity_matrices/AIBS2018_MTG.rda"))
load(str_c(path_to_markergenes_pkg, "specificity_matrices/Habib2017_DroNc_Human.rda"))
load(str_c(path_to_markergenes_pkg, "specificity_matrices/Agarwal2020_CRTX.rda"))
load(str_c(path_to_markergenes_pkg, "specificity_matrices/Agarwal2020_SNIG.rda"))
load(str_c(path_to_markergenes_pkg, "specificity_matrices/Lake2018_FrontalCortexOnly.rda"))

specificity <- 
  MarkerGenes::query_gene_ctd(genes = genes,
                              ctd_AIBS2018, ctd_DRONC_human, ctd_Agarwal2020_CRTX, ctd_Agarwal2020_SNIG, ctd_BlueLake2018_FrontalCortexOnly,
                              celltypeLevel = 1, 
                              median_included = F,
                              genelistSpecies = "human", 
                              ctdSpecies = "human")

specificity %>% 
  dplyr::distinct(Gene, Study) %>% 
  dplyr::arrange(Gene)
  • Worth noting that SNCA-AS1 is only found in two human brain datasets:
    1. ctd_AIBS2018: dataset includes single-nucleus transcriptomes from 15,928 nuclei derived from both frozen and neurosurgical human brain specimens, to survey cell type diversity in the human middle temporal gyrus (MTG). Nuclei from 8 human tissue donors ranging in age from 24-66 years were analyzed, revealing 75 transcriptionally distinct cell types: 45 inhibitory neuron types, 24 excitatory neuron types, and 6 non-neuronal types.
    2. ctd_Agarwal2020_SNIG: dataset includes single-nucleus transcriptomes from 5943 nuclei derived from the substantia nigra of five human postmortem brains. Sequenced using the 10x Genomics Chromium platform.
  • Thus, will plot MMRN1 and SNCA-AS1 specificity from these datasets.
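The check for which datasets contain both genes can also be done programmatically rather than by eye. A minimal sketch, using a toy `gene_study` table with hypothetical study names in place of the Gene/Study pairs listed above:

```r
# Toy stand-in for the distinct Gene/Study pairs (study names are hypothetical)
gene_study <- data.frame(
  Gene  = c("MMRN1", "MMRN1", "MMRN1", "SNCA-AS1", "SNCA-AS1"),
  Study = c("study_A", "study_B", "study_C", "study_A", "study_C"),
  stringsAsFactors = FALSE
)
genes_of_interest <- c("MMRN1", "SNCA-AS1")

# Count how many of the genes of interest are present in each study
n_genes_per_study <- tapply(gene_study$Gene, gene_study$Study,
                            function(g) sum(genes_of_interest %in% g))

# Keep only studies in which every gene of interest is present
shared_studies <- names(n_genes_per_study)[n_genes_per_study == length(genes_of_interest)]
shared_studies
```

The resulting vector could then be passed to the `Study %in% ...` filter in place of hard-coded study names.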
# Plot
specificity %>% 
  dplyr::filter(Study %in% c("ctd_Agarwal2020_SNIG", "ctd_AIBS2018")) %>%
  ggplot(aes(x = MarkerGenes::reorder_within(x = CellType, 
                                             by = Specificity, 
                                             within = Gene, 
                                             fun = median, 
                                             desc = T), 
             y = Specificity)) + 
  geom_col() +
  MarkerGenes::scale_x_reordered() +
  facet_wrap(vars(Study, Gene), scales = "free_x") + 
  labs(x = "") +
  coord_cartesian(ylim = c(0,1)) +
  theme_rhr
**Figure**: Plot of gene specificity value for MMRN1 and SNCA-AS1 across external RNA-seq datasets. DaNs = dopaminergic neurons; GABA=GABAergic interneurons, ODC=oligodendrocytes, OPC=oligodendrocyte precursor cells.


  • MMRN1 is most specific to microglia in both datasets used; SNCA-AS1 appears highly neuron-specific.
  • It is worth noting that the expression profiles of MMRN1 and SNCA-AS1 are very different in that MMRN1 has a much higher mean expression than SNCA-AS1.
  • Also worth noting (as it is difficult to see from the plot) that MMRN1 is actually detected in all cell types except endothelial cells in the AIBS dataset. This is not the case for SNCA-AS1 where no expression is detected in several cell types in both datasets -- hard to say whether this is due to biology (i.e. cell-type-specific expression) or a technical artefact (i.e. gene dropout).
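Because zero values are hard to see on the plots, it can help to tabulate them directly. The sketch below uses a toy table with hypothetical genes and values; the column names follow those used in the plotting code. A zero here only shows that no expression was detected and cannot by itself distinguish cell-type-specific expression from dropout.

```r
# Toy stand-in for the specificity table (hypothetical genes and values)
toy <- data.frame(
  Gene = rep(c("geneA", "geneB"), each = 3),
  Study = "study_A",
  CellType = rep(c("neuron", "astrocyte", "microglia"), times = 2),
  Mean_Expression = c(5.0, 0.2, 1.1, 0.4, 0, 0),
  stringsAsFactors = FALSE
)

# Flag gene/cell-type combinations with any detected expression
toy$detected <- toy$Mean_Expression > 0

# Number of cell types per gene in which no expression was detected
n_undetected <- tapply(!toy$detected, toy$Gene, sum)
n_undetected

# List the undetected gene/cell-type combinations
toy[!toy$detected, c("Gene", "Study", "CellType")]
```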
# Plot
specificity %>% 
  dplyr::filter(Study %in% c("ctd_Agarwal2020_SNIG", "ctd_AIBS2018")) %>%
  ggplot(aes(x = MarkerGenes::reorder_within(x = CellType, 
                                             by = Specificity, 
                                             within = Gene, 
                                             fun = median, 
                                             desc = T), 
             y = Mean_Expression)) + 
  geom_col() +
  MarkerGenes::scale_x_reordered() +
  facet_wrap(vars(Study, Gene), scales = "free") + 
  labs(x = "", y = "Mean expression") +
  theme_rhr
**Figure**: Plot of mean expression for MMRN1 and SNCA-AS1 across external RNA-seq datasets. Mean expression has been ordered by specificity, as in the previous plot to permit easier comparison. DaNs = dopaminergic neurons; GABA=GABAergic interneurons, ODC=oligodendrocytes, OPC=oligodendrocyte precursor cells.


specificity %>% 
  dplyr::filter(Study %in% c("ctd_Agarwal2020_SNIG", "ctd_AIBS2018")) %>% 
  dplyr::arrange(Gene, Study, -Specificity)

4 Session Info

sessionInfo()
## R version 3.6.1 (2019-07-05)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 16.04.6 LTS
## 
## Matrix products: default
## BLAS:   /usr/lib/libblas/libblas.so.3.6.0
## LAPACK: /usr/lib/lapack/liblapack.so.3.6.0
## 
## locale:
##  [1] LC_CTYPE=en_GB.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_GB.UTF-8        LC_COLLATE=en_GB.UTF-8    
##  [5] LC_MONETARY=en_GB.UTF-8    LC_MESSAGES=en_GB.UTF-8   
##  [7] LC_PAPER=en_GB.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
##  [1] MarkerGenes_0.0.0.9000 forcats_0.5.0          stringr_1.4.0         
##  [4] dplyr_1.0.2            purrr_0.3.4            readr_1.3.1           
##  [7] tidyr_1.1.1            tibble_3.0.3           tidyverse_1.3.0       
## [10] ggpubr_0.4.0           ggplot2_3.3.2          devtools_2.3.2        
## [13] usethis_1.6.3          data.table_1.13.0     
## 
## loaded via a namespace (and not attached):
##  [1] colorspace_1.4-1     ggsignif_0.6.0       ellipsis_0.3.1      
##  [4] rio_0.5.16           rprojroot_2.0.2      ggdendro_0.1.21     
##  [7] fs_1.5.0             rstudioapi_0.11      farver_2.0.3        
## [10] remotes_2.2.0        bit64_4.0.2          AnnotationDbi_1.48.0
## [13] lubridate_1.7.9      xml2_1.3.2           knitr_1.29          
## [16] pkgload_1.1.0        jsonlite_1.7.1       broom_0.7.0         
## [19] dbplyr_1.4.4         EWCE_0.99.2          compiler_3.6.1      
## [22] httr_1.4.2           backports_1.1.8      assertthat_0.2.1    
## [25] limma_3.42.2         cli_2.2.0.9000       htmltools_0.5.1.1   
## [28] prettyunits_1.1.1    tools_3.6.1          gtable_0.3.0        
## [31] glue_1.4.2           reshape2_1.4.4       rappdirs_0.3.1      
## [34] Rcpp_1.0.5           carData_3.0-4        Biobase_2.46.0      
## [37] cellranger_1.1.0     vctrs_0.3.2          xfun_0.16           
## [40] ps_1.3.4             openxlsx_4.2.3       testthat_2.3.2      
## [43] rvest_0.3.6          lifecycle_0.2.0      rstatix_0.6.0       
## [46] XML_3.99-0.3         MASS_7.3-51.4        scales_1.1.1        
## [49] hms_0.5.3            parallel_3.6.1       yaml_2.2.1          
## [52] curl_4.3             memoise_1.1.0        gridExtra_2.3       
## [55] biomaRt_2.42.1       stringi_1.4.6        RSQLite_2.2.0       
## [58] highr_0.8            S4Vectors_0.24.4     desc_1.2.0          
## [61] BiocGenerics_0.32.0  pkgbuild_1.1.0       zip_2.1.0           
## [64] rlang_0.4.7          pkgconfig_2.0.3      evaluate_0.14       
## [67] labeling_0.3         bit_4.0.4            processx_3.4.5      
## [70] tidyselect_1.1.0     plyr_1.8.6           magrittr_1.5        
## [73] R6_2.4.1             IRanges_2.20.2       generics_0.0.2      
## [76] DBI_1.1.0            pillar_1.4.6         haven_2.3.1         
## [79] foreign_0.8-72       withr_2.2.0          abind_1.4-5         
## [82] modelr_0.1.8         crayon_1.3.4         car_3.0-9           
## [85] BiocFileCache_1.10.2 rmarkdown_2.5        progress_1.2.2      
## [88] grid_3.6.1           readxl_1.3.1         blob_1.2.1          
## [91] callr_3.5.1          reprex_0.3.0         digest_0.6.25       
## [94] HGNChelper_0.8.1     openssl_1.4.2        stats4_3.6.1        
## [97] munsell_0.5.0        sessioninfo_1.1.1    askpass_1.1