This function reduces GO term redundancy. It first creates a semantic similarity matrix (using GOSemSim::mgoSim()), which is then passed to rrvgo::reduceSimMatrix(). The latter reduces a set of GO terms based on their semantic similarity and scores (if no scores are provided, a default score based on set size is assigned).

go_reduce(
  pathway_df,
  orgdb = "org.Hs.eg.db",
  threshold = 0.7,
  scores = NULL,
  measure = "Wang"
)

Arguments

pathway_df

a data.frame or tibble object containing at least the following columns:

  • go_type: the sub-ontology the GO term relates to. Should be one of c("BP", "CC", "MF").

  • go_id: the Gene Ontology identifier (e.g. GO:0016209)
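
A minimal input only needs the two columns listed above. The sketch below builds such a data frame by hand and checks it with check_pathway_df(), a hypothetical helper that is not part of the package and simply encodes the documented requirements. Apart from GO:0016209, the GO IDs are illustrative placeholders.

```r
# Hypothetical minimal input: only the two required columns.
# GO IDs are illustrative; only GO:0016209 comes from the documentation above.
pathway_df <- data.frame(
  go_type = c("BP", "BP", "MF"),
  go_id   = c("GO:0010975", "GO:0061564", "GO:0016209"),
  stringsAsFactors = FALSE
)

# Hypothetical helper (not part of the package): checks the documented requirements
check_pathway_df <- function(df) {
  missing_cols <- setdiff(c("go_type", "go_id"), names(df))
  if (length(missing_cols) > 0) {
    stop("Missing column(s): ", paste(missing_cols, collapse = ", "))
  }
  if (!all(df$go_type %in% c("BP", "CC", "MF"))) {
    stop("go_type must be one of BP, CC, MF")
  }
  # GO identifiers are "GO:" followed by seven digits
  if (!all(grepl("^GO:[0-9]{7}$", df$go_id))) {
    stop("go_id must look like GO:0016209")
  }
  invisible(TRUE)
}

check_pathway_df(pathway_df)
```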

orgdb

character() vector, indicating the name of the org.* Bioconductor annotation package to be used (e.g. "org.Hs.eg.db" for human, the default).

threshold

numeric() vector. Similarity threshold (0-1) for rrvgo::reduceSimMatrix(). Default is 0.7. Higher values produce fewer, larger groups. Some guidance:

  • For large term groupings, use threshold = 0.9

  • For medium term groupings, use threshold = 0.7

  • For small term groupings, use threshold = 0.5

  • For tiny term groupings, use threshold = 0.4
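
The guidance above follows from how the threshold is applied. Assuming, as the Details section describes, that terms are clustered with complete linkage on 1 - similarity and the tree is cut at height threshold, a higher threshold tolerates less similar terms within one group and therefore yields fewer, larger groups. The toy example below uses a made-up 5 x 5 similarity matrix (not real GO data) with base R's hclust() and cutree() to show the effect of each suggested value.

```r
# Made-up similarity matrix for five hypothetical terms (not real GO data)
terms <- c("A", "B", "C", "D", "E")
sim <- matrix(c(
  1.00, 0.80, 0.55, 0.40, 0.15,
  0.80, 1.00, 0.60, 0.45, 0.20,
  0.55, 0.60, 1.00, 0.50, 0.25,
  0.40, 0.45, 0.50, 1.00, 0.35,
  0.15, 0.20, 0.25, 0.35, 1.00
), nrow = 5, dimnames = list(terms, terms))

# Complete-linkage clustering on distance = 1 - similarity
tree <- hclust(as.dist(1 - sim), method = "complete")

# Number of groups obtained when cutting the tree at each suggested threshold
thresholds <- c(large = 0.9, medium = 0.7, small = 0.5, tiny = 0.4)
n_groups <- sapply(thresholds, function(th) length(unique(cutree(tree, h = th))))
n_groups
# large: 1 group, medium: 2, small: 3, tiny: 4
```

With this matrix the merge heights are roughly 0.2, 0.45, 0.6 and 0.85, so each suggested threshold falls between two merges and gives a different number of groups.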

scores

named vector of scores (weights) assigned to each term. Higher is better. Defaults to NULL (no scores), in which case a default score based on set size is assigned, favoring larger sets. Note: if you have p-values as scores, consider transforming them with -log10(p), so that more significant terms receive higher scores.
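
For example, p-values can be turned into scores where higher is better with -log10(p). The sketch below assumes that the score names are the GO IDs of the terms being reduced; the p-values are made up for illustration, and only GO:0016209 comes from this page.

```r
# Made-up p-values for illustration, named by (hypothetical) GO IDs
pvals <- c("GO:0016209" = 1e-8, "GO:0010975" = 0.001, "GO:0061564" = 0.04)

# -log10() keeps the names and turns small p-values into large scores
scores <- -log10(pvals)
round(scores, 2)
# GO:0016209 scores 8, GO:0010975 scores 3, GO:0061564 about 1.4
```

Under that assumption, the resulting vector would then be supplied as the scores argument of go_reduce().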

measure

character() vector, indicating the method used to calculate semantic similarity. Must be one of the methods supported by GOSemSim: c("Resnik", "Lin", "Rel", "Jiang", "Wang"). Default is "Wang".
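
If you wrap go_reduce() in your own code, the allowed values can be checked up front with base R's match.arg(). The helper validate_measure() below is hypothetical and only illustrates the documented choices; it is not necessarily how go_reduce() validates its input.

```r
# Hypothetical helper: restricts `measure` to the GOSemSim methods listed above
validate_measure <- function(measure = c("Wang", "Resnik", "Lin", "Rel", "Jiang")) {
  # With no argument supplied, match.arg() returns the first choice ("Wang")
  match.arg(measure)
}

validate_measure()          # "Wang"
validate_measure("Resnik")  # "Resnik"
# validate_measure("Cosine") throws an error
```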

Value

a tibble of the input pathway results, with each pathway assigned to a "reduced" parent term. New columns:

  • parent_id: the GO ID of the parent term

  • parent_term: a description of the parent GO term

  • parent_sim_score: the similarity score between the child GO term and its parent term

Details

By default, semantic similarity is calculated using the "Wang" method, a graph-based strategy that computes semantic similarity from the topology of the GO graph. GOSemSim::mgoSim() also supports other measures (primarily information-content measures), which can be selected via the measure argument. "Wang" is the default in GOSemSim and was therefore chosen as the default here. For details on the alternatives, refer to the GOSemSim documentation.
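
To make the graph-based idea concrete, here is a small base-R sketch of the Wang measure between two single terms, following its commonly cited definition. Each ancestor t of a term A receives an S-value: S_A(A) = 1, and S_A(t) is the maximum over t's children t' (within A's ancestry) of w * S_A(t'), where w is 0.8 for is_a and 0.6 for part_of edges. The similarity of A and B is the sum of S_A(t) + S_B(t) over their shared ancestors, divided by the sum of all S-values of A and B. The toy DAG is made up, and this is not GOSemSim's code; GOSemSim::mgoSim() additionally combines term-level scores across two sets of terms (by default with the best-match average).

```r
# Made-up toy DAG: each row is an edge from a child term to a parent term
edges <- data.frame(
  child  = c("A",    "B",    "C",    "C",       "D"),
  parent = c("R",    "R",    "A",    "B",       "A"),
  type   = c("is_a", "is_a", "is_a", "part_of", "is_a"),
  stringsAsFactors = FALSE
)
# Semantic contribution factors per relation type
weights <- c(is_a = 0.8, part_of = 0.6)

# S-values of `term` and all of its ancestors: walk up the DAG and keep,
# for each ancestor, the maximum weighted contribution over all paths
s_values <- function(term, edges, weights) {
  s <- setNames(1, term)
  queue <- term
  while (length(queue) > 0) {
    current <- queue[1]
    queue <- queue[-1]
    up <- edges[edges$child == current, ]
    for (i in seq_len(nrow(up))) {
      p <- up$parent[i]
      candidate <- weights[[up$type[i]]] * s[[current]]
      if (is.na(s[p]) || candidate > s[[p]]) {
        s[p] <- candidate
        queue <- c(queue, p)  # re-propagate from an improved ancestor
      }
    }
  }
  s
}

# Wang similarity of two single terms: shared ancestors' S-values
# divided by the total S-values of both terms
wang_sim <- function(a, b, edges, weights) {
  sa <- s_values(a, edges, weights)
  sb <- s_values(b, edges, weights)
  shared <- intersect(names(sa), names(sb))
  sum(sa[shared] + sb[shared]) / (sum(sa) + sum(sb))
}

wang_sim("C", "D", edges, weights)  # (1.6 + 1.28) / (3.04 + 2.44), about 0.526
```

Here C reaches the root R through A (0.8 * 0.8 = 0.64) and through B (0.8 * 0.6 = 0.48); only the larger value is kept, which is what makes the measure depend on the graph topology.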

rrvgo::reduceSimMatrix() creates a distance matrix, defined as (1-simMatrix). The terms are then hierarchically clustered using complete linkage (an agglomerative, or "bottom-up" clustering approach), and the tree is cut at the desired threshold. The term with the highest "score" is used to represent each group.
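
The steps in this paragraph can be sketched in a few lines of base R. The function reduce_sketch() below is hypothetical and simplified: it does not reproduce every detail of rrvgo::reduceSimMatrix() (for example its size-based default scores), and it works on a made-up similarity matrix with made-up named scores. It clusters terms with complete linkage on 1 - similarity, cuts the tree at threshold, and picks the highest-scoring term of each group as its parent.

```r
# Hypothetical, simplified re-creation of the reduction step (not rrvgo's code)
reduce_sketch <- function(sim, scores, threshold = 0.7) {
  # Distance = 1 - similarity, clustered with complete linkage
  tree <- hclust(as.dist(1 - sim), method = "complete")
  # Cut the tree at height `threshold`; one integer group label per term
  cluster <- cutree(tree, h = threshold)
  terms <- rownames(sim)
  # Representative (parent) of each group = term with the highest score
  reps <- vapply(split(terms, cluster),
                 function(x) x[which.max(scores[x])],
                 character(1))
  parent <- unname(reps[as.character(cluster)])
  data.frame(
    go = terms,
    cluster = unname(cluster),
    parent = parent,
    # Similarity between each term and its group's representative
    parent_sim_score = sim[cbind(terms, parent)],
    stringsAsFactors = FALSE
  )
}

# Made-up similarity matrix and scores for four hypothetical terms
terms <- c("A", "B", "C", "D")
sim <- matrix(c(
  1.00, 0.85, 0.40, 0.20,
  0.85, 1.00, 0.45, 0.25,
  0.40, 0.45, 1.00, 0.65,
  0.20, 0.25, 0.65, 1.00
), nrow = 4, dimnames = list(terms, terms))
scores <- c(A = 2, B = 5, C = 3, D = 1)

reduce_sketch(sim, scores, threshold = 0.7)
# A and B form one group (parent B); C and D form another (parent C)
```

Each representative has a similarity of 1 to itself, which matches the parent_sim_score of 1 seen for parent rows in the example output.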

See also

go_plot() for plotting the output of go_reduce(), GOSemSim::mgoSim() for calculating semantic similarity, and rrvgo::reduceSimMatrix() for reducing the similarity matrix

Other GO-related functions: go_plot()

Examples

file_path <- system.file(
  "testdata",
  "go_test_data.txt",
  package = "rutils",
  mustWork = TRUE
)
pathway_df <- readr::read_delim(file_path, delim = "\t")
#>
#> ── Column specification ────────────────────────────────────────────────────────
#> cols(
#>   go_type = col_character(),
#>   go_id = col_character(),
#>   go_term = col_character()
#> )
go_reduce(
  pathway_df = pathway_df,
  orgdb = "org.Hs.eg.db",
  threshold = 0.9,
  scores = NULL,
  measure = "Wang"
)
#> [1] "Reducing sub-ontology: BP"
#> preparing gene to GO mapping data...
#> No scores provided. Falling back to term's size
#> # A tibble: 10 x 6
#>    go_type go_id   go_term          parent_id parent_sim_score parent_term
#>    <chr>   <chr>   <chr>            <chr>                <dbl> <chr>
#>  1 BP      GO:001… Regulation of n… GO:00109…            1     regulation of ne…
#>  2 BP      GO:006… Axon development GO:00109…            0.618 regulation of ne…
#>  3 BP      GO:005… Negative regula… GO:00109…            0.408 regulation of ne…
#>  4 BP      GO:000… Ensheathment of… GO:00109…            0.304 regulation of ne…
#>  5 BP      GO:002… Regulation of c… GO:00226…            1     regulation of ce…
#>  6 BP      GO:009… Regulation of t… GO:00226…            0.169 regulation of ce…
#>  7 BP      GO:005… Synapse organiz… GO:00508…            1     synapse organiza…
#>  8 BP      GO:007… Neuron death     GO:00508…            0.227 synapse organiza…
#>  9 BP      GO:005… Establishment o… GO:00508…            0.156 synapse organiza…
#> 10 BP      GO:003… Cytoskeleton-de… GO:00508…            0.134 synapse organiza…