GO term functional network, enrichment network, functional network drawing, R language, Cytoscape

Kiavash Movahedi et al. (2020) Nat Neuroscience

Tools:
simplifyEnrichment (R package)
Cytoscape
Adobe Illustrator (AI)

Preface:
In a GO term functional network, each node represents one Gene Ontology (GO) term, and the node size encodes its enrichment score (or whichever other score the story calls for). Functionally related terms are connected by edges. The hand-drawn outlines enclose the terms that were assigned to the same cluster.

Why cluster GO terms? Almost any experimental condition now yields hundreds of enriched GO terms. If these terms are inspected one by one and a few are picked by hand, the selection is inevitably biased and can steer the follow-up experiments in the wrong direction.
A well-drawn GO term network helps to show the differences in biological function more comprehensively.

Drawing steps:

R language part

library(msigdbr)
library(simplifyEnrichment)
library(dplyr)

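# Mouse GO biological process gene sets from MSigDB
# (recent msigdbr releases may expect collection/subcollection instead of category/subcategory)
m_df = msigdbr(species = "Mus musculus", category = "C5", subcategory = "GO:BP")
# Randomly draw 100 distinct GO IDs as a stand-in for a real enrichment result
GOIDs = m_df$gs_exact_source %>% unique() %>% sample(size = 100)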

head(GOIDs)
# Prints six GO IDs such as "GO:0009256" and "GO:0006103"; the exact IDs differ between runs

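# Pairwise semantic similarity between the GO terms
mat = GO_similarity(GOIDs)
# Cluster the terms with k-means; the result has one row per term with its cluster label
df = simplifyGO(mat, method = "kmeans")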

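# Turn the similarity matrix into a long edge list (Var1, Var2, value)
net = reshape2::melt(mat)
# Keep only term pairs with similarity above 0.6
net = net[which(net$value > 0.6),]
# Keep each pair once and drop self-loops
net = net[which(as.character(net$Var1) > as.character(net$Var2)),]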

head(net)
#            Var1       Var2     value
# 402  GO:0015824 GO:0014050 0.6017792
# 511  GO:1990266 GO:0072676 0.7430274
# 1109 GO:1901224 GO:0070374 0.6283545
# 1430 GO:1900425 GO:0060331 0.7138334
# 1785 GO:1905456 GO:1902037 0.7216737
# 1834 GO:0060442 GO:0048755 0.6681442

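# Placeholder node sizes; replace them with the real enrichment score.
# nrow(df) keeps this valid even if fewer than 100 terms remain after GO_similarity()
df$size = sample(1:100, size = nrow(df), replace = TRUE)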
head(df)
#           id                                                           term cluster size
# 1 GO:1901298 regulation of hydrogen peroxide-mediated programmed cell death       6   81
# 2 GO:0015824                                              proline transport       3   59
# 3 GO:0051703                     intraspecies interaction between organisms       5   25
# 4 GO:0038127                                         ERBB signaling pathway       6   71
# 5 GO:0014050                     negative regulation of glutamate secretion       3   75
# 6 GO:0072676                                           lymphocyte migration       3    1

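# Export the edge and node tables as tab-separated files for Cytoscape
write.table(net, file = "net.txt", col.names = TRUE, row.names = FALSE, quote = FALSE, sep = "\t")
write.table(df, file = "df.txt", col.names = TRUE, row.names = FALSE, quote = FALSE, sep = "\t")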


########################
# variables annotation #
########################
# 
# net   : edge table of the network (Var1, Var2, value)
# df    : node table of the network (id, term, cluster, size)
# GOIDs : the GO terms selected for the analysis
# 
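
The edge filtering and the node table above can be tried on a toy matrix without downloading any annotation data. The sketch below uses only base R. The three GO IDs, the similarity values, the cluster labels and the adjusted p-values are all made up for illustration, and -log10(p.adjust) is just one common choice for a node size, not something prescribed by simplifyEnrichment.

```r
# Toy symmetric similarity matrix (all values are invented for illustration)
ids <- c("GO:0000001", "GO:0000002", "GO:0000003")
toy_mat <- matrix(c(1.0, 0.8, 0.2,
                    0.8, 1.0, 0.7,
                    0.2, 0.7, 1.0),
                  nrow = 3, dimnames = list(ids, ids))

# Long format with columns Var1, Var2, value (same shape as reshape2::melt(mat))
toy_net <- as.data.frame(as.table(toy_mat), responseName = "value",
                         stringsAsFactors = FALSE)

# Same two filters as in the main script:
# similarity threshold, then keep each pair once and drop self-loops
toy_net <- toy_net[toy_net$value > 0.6, ]
toy_net <- toy_net[toy_net$Var1 > toy_net$Var2, ]

# Toy node table; p.adjust is a hypothetical column from an enrichment result
toy_df <- data.frame(id = ids,
                     cluster = c(1, 1, 2),
                     p.adjust = c(1e-6, 1e-3, 0.04))
toy_df$size <- -log10(toy_df$p.adjust)

# Every edge endpoint should also be present in the node table
missing_nodes <- setdiff(c(toy_net$Var1, toy_net$Var2), toy_df$id)
stopifnot(length(missing_nodes) == 0)

print(toy_net)
print(toy_df)
```

With a real enrichment result, the random df$size in the main script would be replaced by a score computed in the same way from the actual statistics.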

After exporting the data, import both tables into Cytoscape to draw the network.
The detailed Cytoscape steps are not covered in this article. Here are some Cytoscape tutorials:

https://zhuanlan.zhihu.com/p/220527695
http://www.360doc.com/content/19/0409/20/49059453_827533578.shtml

Drawing the functional network is not difficult, but a few settings deserve attention:

  1. Lay the network out so that it fills the canvas, neither too flat nor too wide
  2. Remember to map the node size to df$size
  3. Remember that each group of nodes should correspond to one GO cluster
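
Cytoscape needs to know which columns hold the source and target of each edge and which column is the node key. As a hedged sketch, the helper below (to_cytoscape_tables, a name made up here) renames the columns of the edge and node tables accordingly. The second function shows how the import and the size mapping could be scripted with the RCy3 Bioconductor package instead of clicking through the GUI. This is an assumption on top of the original workflow, which is done by hand: it requires RCy3 to be installed and Cytoscape to be running, so the function is only defined here and not called. The demo tables are invented for illustration.

```r
# Rename columns to the source/target/id convention and
# drop edges whose endpoints are missing from the node table
to_cytoscape_tables <- function(net, df) {
  edges <- data.frame(source = as.character(net$Var1),
                      target = as.character(net$Var2),
                      weight = net$value,
                      stringsAsFactors = FALSE)
  nodes <- data.frame(id = as.character(df$id),
                      df[setdiff(names(df), "id")],
                      stringsAsFactors = FALSE)
  keep <- edges$source %in% nodes$id & edges$target %in% nodes$id
  list(nodes = nodes, edges = edges[keep, , drop = FALSE])
}

# Optional: send the tables to a running Cytoscape session via RCy3
# (assumes RCy3 is installed and Cytoscape is open; not called in this sketch)
push_to_cytoscape <- function(tabs, title = "GO term network") {
  RCy3::cytoscapePing()
  RCy3::createNetworkFromDataFrames(nodes = tabs$nodes, edges = tabs$edges,
                                    title = title)
  RCy3::lockNodeDimensions(TRUE)
  # Continuous mapping from the size column to node diameters of 20 to 80
  RCy3::setNodeSizeMapping(table.column = "size",
                           table.column.values = range(tabs$nodes$size),
                           sizes = c(20, 80))
  RCy3::layoutNetwork("force-directed")
  RCy3::exportImage("go_network", type = "PDF")
}

# Invented demo tables; the second edge points at a term absent from the nodes
demo_net <- data.frame(Var1 = c("GO:0000002", "GO:0000003"),
                       Var2 = c("GO:0000001", "GO:0000009"),
                       value = c(0.8, 0.7))
demo_df <- data.frame(id = c("GO:0000001", "GO:0000002", "GO:0000003"),
                      cluster = c(1, 1, 2),
                      size = c(81, 59, 25))
tabs <- to_cytoscape_tables(demo_net, demo_df)
print(tabs$edges)
```

With the real data, tabs would be built from net and df, and push_to_cytoscape(tabs) would be called once Cytoscape is open. Cluster outlines and final styling are still done afterwards as described in the rest of the article.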

After we finish drawing the network in Cytoscape, we get the following pdf (note that the network must be exported as a pdf, otherwise it cannot be edited in detail in Adobe Illustrator (AI))

Then comes the Illustrator (AI) step. A few remarks:

  1. After opening the network pdf, adjust the edge color (#7FB5E5) and the node color (#E9471E)
  2. If the edges are too dense, remember to make them partly transparent
  3. If you have trouble remembering which nodes belong to which cluster, mark the positions of the GO term clusters in Cytoscape first
  4. Circle the nodes of the same cluster with an ellipse, then summarize the general function of each cluster by hand
  5. If you want to label genes, remember to set the gene names in italics


Keywords: R Language network

Added by Vanness on Mon, 03 Jan 2022 21:47:14 +0200