Aim: identification of (i) differentially expressed genes and (ii) differentially spliced genes, and their associated molecular pathways, that distinguish between control, PD, PDD and DLB.
source(here::here("R", "file_paths.R"))
DEGreport
can easily plot the correlation amongst covariates.dds <-
readRDS(
file.path(
path_to_results,
"Salmon_quant/bulk_tissue/PD_tissue_polyA_DESeqDataSet_design_SexRINAoD.Rds"
)
)
DEGreport::degCorCov(colData(dds))
All steps below, describe the workflow for bulk-tissue RNA-sequencing:
fastp
combined with fastQC
.STAR
.RNASeqQC
.Salmon
used to quantify gene-level reads.Scaden
.DESeq2
used for differential expression analyses.Leafcutter
used. Defines intron clusters by shared junctions and calculates differential usage of introns within a cluster across case/controls.recount2
(ctrl = 44, case = 29). Study SRP058181 on https://jhubiostatistics.shinyapps.io/recount/.clusterProfiler
.bcl2fastq
and the following flags.
--mask-short-adapter-reads 0
--use-bases-mask Y*,I8Y7,I8,Y*
--no-lane-splitting
--output-dir
--runfolder-dir
--sample-sheet
/data/RNAseq_PD/tissue_polyA_samples/raw_data/bcl2fastq_demultiplex_nohup_1091.e
.# prefix to file names always NM followed by 4 digits, therefore "NM...._", will remove anything between 'NM' and '_'
nohup Rscript \
/home/rreynolds/packages/RNAseqProcessing/QC/prealignmentQC_fastp_PEadapters.R \
/data/RNAseq_PD/tissue_polyA_samples/raw_data \
/data/RNAseq_PD/tissue_polyA_samples/QC \
PD_tissue_polyA \
NM...._ \
_.* \
&>/home/rreynolds/projects/Aim2_PDsequencing_wd/LBD-seq-bulk-analyses/nohup_logs/PD_tissue_polyA_fastp.log&
--sjdbFileChrStartEnd
flag.# STAR 1st pass
nohup Rscript \
/home/rreynolds/packages/RNAseqProcessing/alignment/STAR_alignment_withReadGroups_multi2pass.R \
/data/RNAseq_PD/tissue_polyA_samples/QC/fastp \
/data/STAR_data/genome_index_hg38_ens_v97/sjdbOverhang_99 \
/data/RNAseq_PD/tissue_polyA_samples/STAR \
--sample_prefix=NM...._ \
--sample_suffix=_S.* \
--read_groups=/home/rreynolds/projects/Aim2_PDsequencing_wd/raw_data/sample_details/Flowcell_info.txt \
&>/home/rreynolds/projects/Aim2_PDsequencing_wd/LBD-seq-bulk-analyses/nohup_logs/PD_tissue_polyA_STAR_1pass.log&
# SJ.out.tab files moved to SJ_out_1pass folder
mv /data/RNAseq_PD/tissue_polyA_samples/STAR/*.SJ.out.tab /data/RNAseq_PD/tissue_polyA_samples/STAR/SJ_out_1pass/
nohup Rscript \
/home/rreynolds/packages/RNAseqProcessing/alignment/STAR_splice_junction_merge.R \
/data/RNAseq_PD/tissue_polyA_samples/STAR/SJ_out_1pass \
-o /data/RNAseq_PD/tissue_polyA_samples/STAR/ \
&>/home/rreynolds/projects/Aim2_PDsequencing_wd/LBD-seq-bulk-analyses/nohup_logs/PD_tissue_polyA_SJ_filtering.log&
# STAR 2nd pass
nohup Rscript \
/home/rreynolds/packages/RNAseqProcessing/alignment/STAR_alignment_withReadGroups_multi2pass.R \
/data/RNAseq_PD/tissue_polyA_samples/QC/fastp \
/data/STAR_data/genome_index_hg38_ens_v97/sjdbOverhang_99 \
/data/RNAseq_PD/tissue_polyA_samples/STAR \
--sample_prefix=NM...._ \
--sample_suffix=_S.* \
--read_groups=/home/rreynolds/projects/Aim2_PDsequencing_wd/raw_data/sample_details/Flowcell_info.txt \
--sj_file=/data/RNAseq_PD/tissue_polyA_samples/STAR/merged_junctions.SJ.out.tab \
&>/home/rreynolds/projects/Aim2_PDsequencing_wd/LBD-seq-bulk-analyses/nohup_logs/PD_tissue_polyA_STAR_2pass.log&
# Post-alignment QC script
nohup Rscript \
/home/rreynolds/packages/RNAseqProcessing/QC/post_alignment_QC_RSeQC.R \
/data/RNAseq_PD/tissue_polyA_samples/STAR \
/data/RNAseq_PD/tissue_polyA_samples/QC/ \
/data/references/ensembl/bed/v97/ensembl_GRCh38_v97.bed \
100 \
--sample_suffix=_Aligned.sortedByCoord.out.bam \
&>/home/rreynolds/projects/Aim2_PDsequencing_wd/LBD-seq-bulk-analyses/nohup_logs/PD_tissue_polyA_postalignment_QC.log&
# MultiQC for full report
multiqc /data/RNAseq_PD/tissue_polyA_samples/ \
-o /data/RNAseq_PD/tissue_polyA_samples/QC/multiqc/ \
--ignore /data/RNAseq_PD/tissue_polyA_samples/raw_data/ \
-n PD_tissue_polyA_full_report
Salmon
used as:
DESeq2
.DESeq2
takes as input un-normalised counts from a variety of sources (e.g. Salmon
, HTSeq
)(see here)RNASeqProcessing
..html
files linked above can be found in the misc_scripts
directory of this project. Many processes were run with logs, which can be found here in the nohug_logs
directory of this project## R version 3.6.1 (2019-07-05)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 16.04.6 LTS
##
## Matrix products: default
## BLAS: /usr/lib/libblas/libblas.so.3.6.0
## LAPACK: /usr/lib/lapack/liblapack.so.3.6.0
##
## locale:
## [1] LC_CTYPE=en_GB.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_GB.UTF-8 LC_COLLATE=en_GB.UTF-8
## [5] LC_MONETARY=en_GB.UTF-8 LC_MESSAGES=en_GB.UTF-8
## [7] LC_PAPER=en_GB.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C
##
## attached base packages:
## [1] parallel stats4 stats graphics grDevices utils datasets
## [8] methods base
##
## other attached packages:
## [1] MarkerGenes_0.0.0.9000 EWCE_0.99.2
## [3] rjson_0.2.20 readxl_1.3.1
## [5] forcats_0.5.1 stringr_1.4.0
## [7] dplyr_1.0.2 purrr_0.3.4
## [9] readr_1.4.0 tidyr_1.1.1
## [11] tibble_3.0.3 ggplot2_3.3.2
## [13] tidyverse_1.3.0 ggsci_2.9
## [15] DT_0.15 devtools_2.3.2
## [17] usethis_1.6.3 DEGreport_1.22.0
## [19] DESeq2_1.26.0 SummarizedExperiment_1.16.1
## [21] DelayedArray_0.12.3 BiocParallel_1.20.1
## [23] matrixStats_0.56.0 Biobase_2.46.0
## [25] GenomicRanges_1.38.0 GenomeInfoDb_1.22.1
## [27] IRanges_2.20.2 S4Vectors_0.24.4
## [29] BiocGenerics_0.32.0
##
## loaded via a namespace (and not attached):
## [1] tidyselect_1.1.0 RSQLite_2.2.0
## [3] AnnotationDbi_1.48.0 htmlwidgets_1.5.3
## [5] grid_3.6.1 munsell_0.5.0
## [7] codetools_0.2-16 withr_2.2.0
## [9] colorspace_2.0-0 highr_0.8
## [11] knitr_1.29 rstudioapi_0.13
## [13] labeling_0.4.2 lasso2_1.2-20
## [15] GenomeInfoDbData_1.2.2 mnormt_2.0.2
## [17] farver_2.0.3 bit64_4.0.2
## [19] rprojroot_2.0.2 vctrs_0.3.2
## [21] generics_0.0.2 xfun_0.16
## [23] BiocFileCache_1.10.2 R6_2.5.0
## [25] doParallel_1.0.15 clue_0.3-57
## [27] locfit_1.5-9.4 bitops_1.0-6
## [29] cachem_1.0.3 reshape_0.8.8
## [31] assertthat_0.2.1 scales_1.1.1
## [33] nnet_7.3-12 gtable_0.3.0
## [35] Cairo_1.5-12.2 processx_3.4.5
## [37] rlang_0.4.7 genefilter_1.68.0
## [39] GlobalOptions_0.1.2 splines_3.6.1
## [41] broom_0.7.0 checkmate_2.0.0
## [43] yaml_2.2.1 reshape2_1.4.4
## [45] modelr_0.1.8 crosstalk_1.1.0.1
## [47] backports_1.1.8 Hmisc_4.4-1
## [49] tools_3.6.1 bookdown_0.21
## [51] psych_2.0.7 logging_0.10-108
## [53] ellipsis_0.3.1 RColorBrewer_1.1-2
## [55] ggdendro_0.1.21 sessioninfo_1.1.1
## [57] Rcpp_1.0.5 plyr_1.8.6
## [59] base64enc_0.1-3 progress_1.2.2
## [61] zlibbioc_1.32.0 RCurl_1.98-1.2
## [63] ps_1.3.4 prettyunits_1.1.1
## [65] rpart_4.1-15 openssl_1.4.2
## [67] GetoptLong_1.0.2 cowplot_1.0.0
## [69] haven_2.3.1 ggrepel_0.8.2
## [71] cluster_2.1.0 fs_1.5.0
## [73] here_1.0.0 magrittr_2.0.1
## [75] data.table_1.13.0 circlize_0.4.10
## [77] reprex_1.0.0 tmvnsim_1.0-2
## [79] pkgload_1.1.0 hms_1.0.0
## [81] mime_0.9 evaluate_0.14
## [83] xtable_1.8-4 XML_3.99-0.3
## [85] jpeg_0.1-8.1 gridExtra_2.3
## [87] shape_1.4.4 testthat_2.3.2
## [89] compiler_3.6.1 biomaRt_2.42.1
## [91] crayon_1.4.1 htmltools_0.5.1.1
## [93] Formula_1.2-4 geneplotter_1.64.0
## [95] lubridate_1.7.9 DBI_1.1.1
## [97] dbplyr_1.4.4 ComplexHeatmap_2.7.7
## [99] MASS_7.3-51.4 rappdirs_0.3.1
## [101] Matrix_1.2-17 cli_2.2.0.9000
## [103] pkgconfig_2.0.3 foreign_0.8-72
## [105] xml2_1.3.2 foreach_1.5.0
## [107] annotate_1.64.0 XVector_0.26.0
## [109] rvest_0.3.6 callr_3.5.1
## [111] digest_0.6.27 ConsensusClusterPlus_1.50.0
## [113] rmarkdown_2.5 cellranger_1.1.0
## [115] HGNChelper_0.8.1 htmlTable_2.0.1
## [117] edgeR_3.28.1 curl_4.3
## [119] lifecycle_0.2.0 nlme_3.1-141
## [121] jsonlite_1.7.1 desc_1.2.0
## [123] askpass_1.1 limma_3.42.2
## [125] pillar_1.4.6 lattice_0.20-38
## [127] Nozzle.R1_1.1-1 fastmap_1.0.1
## [129] httr_1.4.2 pkgbuild_1.1.0
## [131] survival_3.2-7 glue_1.4.2
## [133] remotes_2.2.0 png_0.1-7
## [135] iterators_1.0.12 bit_4.0.4
## [137] stringi_1.5.3 blob_1.2.1
## [139] latticeExtra_0.6-29 memoise_2.0.0