This function reduces GO redundancy by first creating a semantic similarity
matrix (using GOSemSim::mgoSim), which is then passed through
rrvgo::reduceSimMatrix(). The latter reduces a set of GO terms based on their
semantic similarity and scores (by default, a score based on set size is
assigned).
go_reduce(
  pathway_df,
  orgdb = "org.Hs.eg.db",
  threshold = 0.7,
  scores = NULL,
  measure = "Wang"
)
Argument | Description
---|---
pathway_df | a
orgdb |
threshold |
scores | named vector, with scores (weights) assigned to each term. Higher is better. Can be NULL (the default), in which case a default score based on set size is assigned, thus favoring larger sets. Note: if you have p-values as scores, consider log-transforming them.
measure |
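
As a rough illustration of the note on p-values: because higher scores are preferred, raw p-values (where smaller is better) would need to be transformed so that the most significant term scores highest. The sketch below assumes a -log10 transform, which is one common choice and not necessarily the one the original documentation had in mind. The GO IDs and p-values are placeholders.

```r
# Hypothetical p-values keyed by GO ID (placeholder values, for illustration only).
p_values <- c("GO:0000001" = 1e-8, "GO:0000002" = 0.003, "GO:0000003" = 0.04)

# Higher scores are preferred, so convert p-values with -log10
# (an assumed transform; smaller p-values become larger scores).
scores <- -log10(p_values)

# The most significant term now carries the highest score.
names(which.max(scores))
```

A vector like this could then be supplied as `scores = scores`, presumably with names matching the GO IDs in `pathway_df`.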
Returns a tibble of pathway results, in which each pathway has been assigned to a "reduced" parent term. New columns:
- parent_id: the GO ID of the parent term
- parent_term: a description of the parent term's GO ID
- parent_sim_score: the similarity score between the child GO term and its parent term
By default, semantic similarity is calculated using the "Wang" method, a
graph-based strategy that computes semantic similarity from the topology of
the GO graph structure. GOSemSim::mgoSim does permit the use of other measures
(primarily information-content measures), but "Wang" is the default in
GOSemSim and was therefore used as the default here. If you wish to use a
different measure, please refer to the GOSemSim documentation.
rrvgo::reduceSimMatrix() creates a distance matrix, defined as
(1-simMatrix). The terms are then hierarchically clustered using complete
linkage (an agglomerative, or "bottom-up" clustering approach), and the tree
is cut at the desired threshold. The term with the highest "score" is used to
represent each group.
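
The reduction step described above can be sketched with base R alone. This is a minimal illustration of the general technique (distance as 1 - similarity, complete-linkage clustering, cutting the tree at a threshold, then picking the top-scoring term per cluster), not the rrvgo implementation itself. The term names, similarity values and scores are made up.

```r
# Toy similarity matrix for four hypothetical terms (values are made up).
terms <- c("GO:A", "GO:B", "GO:C", "GO:D")
sim <- matrix(
  c(1.0, 0.8, 0.2, 0.1,
    0.8, 1.0, 0.3, 0.2,
    0.2, 0.3, 1.0, 0.9,
    0.1, 0.2, 0.9, 1.0),
  nrow = 4, dimnames = list(terms, terms)
)

# Hypothetical scores; higher is better.
scores <- c("GO:A" = 10, "GO:B" = 25, "GO:C" = 40, "GO:D" = 5)

# Distance is defined as 1 - similarity.
dist_mat <- stats::as.dist(1 - sim)

# Agglomerative clustering with complete linkage, cut at a height of 0.7.
tree <- stats::hclust(dist_mat, method = "complete")
clusters <- stats::cutree(tree, h = 0.7)

# Within each cluster, the highest-scoring term acts as the representative.
parent <- vapply(
  split(names(clusters), clusters),
  function(members) members[which.max(scores[members])],
  character(1)
)

reduced <- data.frame(
  term = names(clusters),
  cluster = unname(clusters),
  parent = unname(parent[as.character(clusters)]),
  stringsAsFactors = FALSE
)
reduced
```

In this toy case the two highly similar pairs each form one cluster, and the higher-scoring member of each pair becomes the parent. A higher cut height merges more terms into fewer groups; a lower one keeps more terms separate.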
Yu et al. (2010). GOSemSim: an R package for measuring semantic similarity among GO terms and gene products. Bioinformatics, 26(7), 976-978. http://bioinformatics.oxfordjournals.org/cgi/content/abstract/26/7/976 PMID: 20179076
Yu (2021). Biomedical Knowledge Mining using GOSemSim and clusterProfiler. https://yulab-smu.top/biomedical-knowledge-mining-book/index.html
Sayols S (2020). rrvgo: a Bioconductor package to reduce and visualize Gene Ontology terms. https://ssayols.github.io/rrvgo
See also go_plot for plotting the output of go_reduce, GOSemSim::mgoSim for
calculation of semantic similarity, and rrvgo::reduceSimMatrix() for reduction
of the similarity matrix.
Other GO-related functions: go_plot()
file_path <-
  system.file(
    "testdata",
    "go_test_data.txt",
    package = "rutils",
    mustWork = TRUE
  )

pathway_df <-
  readr::read_delim(file_path, delim = "\t")

go_reduce(
  pathway_df = pathway_df,
  orgdb = "org.Hs.eg.db",
  threshold = 0.9,
  scores = NULL,
  measure = "Wang"
)
#> [1] "Reducing sub-ontology: BP"
#> # A tibble: 10 x 6
#>    go_type go_id   go_term          parent_id parent_sim_score parent_term
#>    <chr>   <chr>   <chr>            <chr>                <dbl> <chr>
#>  1 BP      GO:001… Regulation of n… GO:00109…            1     regulation of ne…
#>  2 BP      GO:006… Axon development GO:00109…            0.618 regulation of ne…
#>  3 BP      GO:005… Negative regula… GO:00109…            0.408 regulation of ne…
#>  4 BP      GO:000… Ensheathment of… GO:00109…            0.304 regulation of ne…
#>  5 BP      GO:002… Regulation of c… GO:00226…            1     regulation of ce…
#>  6 BP      GO:009… Regulation of t… GO:00226…            0.169 regulation of ce…
#>  7 BP      GO:005… Synapse organiz… GO:00508…            1     synapse organiza…
#>  8 BP      GO:007… Neuron death     GO:00508…            0.227 synapse organiza…
#>  9 BP      GO:005… Establishment o… GO:00508…            0.156 synapse organiza…
#> 10 BP      GO:003… Cytoskeleton-de… GO:00508…            0.134 synapse organiza…