Identify the group of genes of interest on the UMAP map (single cell data)

Reference link: https://www.jianshu.com/p/37d2e8d68c91
https://satijalab.org/seurat/articles/pbmc3k_tutorial.html
https://satijalab.org/seurat/articles/visualization_vignette.html
The requirement we need to solve in this code is to process the expression matrix (. csv file) we downloaded from the GEO database with the package seurat. The desired goal is to draw a UMAP map and mark the genes we are interested in.
(1) At present, my understanding of the package of seurat is almost zero. It's like learning from scratch.
(2) On the other hand, the processing method of the processing matrix we use is relatively vague.
These are the two problems I have encountered. I believe I have the ability to avert danger every time. I believe I can go on to overcome the difficulties. I hope my analysis results are reliable enough to provide guidance for their next experiments.

In fact, we don't need to use the encapsulated package of seurat, although each step of processing is fool like, which is very cool to use. But I need to put a question mark on the credibility of the results.
Can we simply search and draw UMAP diagram directly using matrix?
Last night, we used umap to directly process the original count matrix. As a result, the calculation was not completed all night (this method must be wrong). Later, check pipeline on official website , it is found that the hypervariable genes are used to find before calculating the principal components. Therefore, the general idea now should be to go through it again according to the process of the official website.

1. First of all, the first step, will be our Objects that convert csv data to seurat.

Reference link: https://www.jianshu.com/p/41d7fdae0484 #There is something wrong with the author's code. Don't use her
https://www.jianshu.com/p/37d2e8d68c91

data<-read.csv("process.csv",header=T)
dim(data)
#36920   454
#First, convert the data into a secrut object. Our matrix is the count matrix
#Convert data to sparse matrix
dataan <- as(as.matrix(data),"dgCMatrix")

Error display:

Error in as(as.matrix(data), "dgCMatrix") :
There is no method or default function that can be used to cast "matrix" to "dgCMatrix"

But yesterday seemed to be a success. Why?
The main reason found is the lack of dependent packages.

library(dplyr)
library(Seurat)
library(patchwork)

After loading, continue to run this step without error.
But the emergence of warning:

Warning message:
In storage.mode(from) <- "double" : NAs introduced by coercion
#The value of NA is forced to be introduced.

Let's see if the value of NA is generated.
The main reason is that one column of my matrix is a character, and when I read the data, I did not set this column (column X in the figure below, that is, gene name) as the row name.

I'll try reading the matrix again.

library(dplyr)
library(Seurat)
library(patchwork)
data<-read.csv("process.csv",header=T)
row.names(data)<-data[,1]
data<-data[,-1]
dim(data)
#36920   453
#First, convert the data into a secrut object. Our matrix is the count matrix
#Convert data to sparse matrix
data_1<- as.matrix(data)
data_2 <- as(as.matrix(data),"dgCMatrix") #The main reason is that the package was not loaded.

After reading the matrix again, a new error occurs.

Error in asMethod(object) :
'Realloc' could not re-allocate memory (132335648 bytes)

Search on the search engine with the keyword "Realloc 'could not re allocate memory".
I just can't reallocate memory (every time I encounter this type of problem, I'll be stupid immediately).
After checking, it is necessary to clear the running memory of R. maybe I have processed many large matrices before and after, so fill up the memory.
Reference link: https://www.zhihu.com/question/390053502
Solution:

rm(list=ls())
gc()
           used  (Mb) gc trigger   (Mb)  max used
Ncells  3632349 194.0    6540962  349.4   6540962
Vcells 40283137 307.4  139136965 1061.6 173921129
         (Mb)
Ncells  349.4
Vcells 1327.0

rstudioapi::restartSession() #Restart R

Restarting R session...

Finally.
Let's look at the structure of this sparse matrix.

This matrix should be similar to the expression matrix I am familiar with.
Then, convert the data to a seurat object.

data_3<- CreateSeuratObject(counts = data_2, project = "tang", min.cells = 3, min.features = 200)

During the conversion, new problems arise.

Error in h(simpleError(msg, call)) :
An error occurred while evaluating the 'x' parameter while selecting a method for the 'colSums' function: Cholmod error' out of memory 'at file... / Core/cholmod_memory.c, line 146

It's strange to say out of memory.

The solution to this problem is very simple. It's because I've taken up all the space in front of me. It's strange to be dissatisfied with several super large matrices. At this time, the simplest way is to restart Rstudio.

After this problem is solved, our seurat object has been created. Then, the next thing we have to face is how to deal with this object.

2. Process the seurat object.

(I forgot to record. The next steps are encapsulated in the seurat package. You only need to run it like a fool.)

data_3<-FindVariableFeatures(data_3,selection.method = "vst",nfeatures = 5633) #Search for hypervariable genes
all.genes<-rownames(data_3)
data_3<-ScaleData(data_3,features = all.genes) #normalization
data_3<-RunPCA(data_3,features = VariableFeatures(object = data_3)) #PCA
data_3<-FindNeighbors(data_3,dims = 1:40) #clustering
data_3<-FindClusters(data_3,resolution = 0.5)
data_3<-RunUMAP(data_3,dims = 1:40) #UMAP

3. Individually label the cell populations of interest

DimPlot(data_3,reduction = "umap")
#cell.highlight
#cols.highlight

g1_treat <- WhichCells(data_3, idents = c("1","2","3"))
DimPlot(data_3,label=T,cells.highlight= list(g1_treat), cols.highlight = c("pink"),cols = "grey")

g2_treat <- WhichCells(data_3, idents = c("5"))
DimPlot(data_3,label=T,cells.highlight= list(g2_treat), cols.highlight = c("pink"),cols = "grey")

g3_treat <- WhichCells(data_3, idents = c("5","0"))
DimPlot(data_3,label=T,cells.highlight= list(g3_treat), cols.highlight = c("pink"),cols = "grey")

g4_treat <- WhichCells(data_3, idents = c("0"))
DimPlot(data_3,label=T,cells.highlight= list(g4_treat), cols.highlight = c("pink"),cols = "grey")

The resulting figure is shown below.


4. Label cells expressing genes of interest to us

FeaturePlot(data_3,features = c("Gpr65"))
FeaturePlot(data_3,features = c("Vip"))
FeaturePlot(data_3,features = c("Glp1r"))
FeaturePlot(data_3,features = c("Oxtr"))

FeaturePlot(data_3,features = c("Gpr65","Vip","Glp1r","Oxtr"))

Of course, there are other details that can be further explored. So far, we have basically completed our needs.

However, for our project, it should be understood that the author draws the tsne diagram.

In the tsne diagram, the author divides the cells into 27 categories, while we only have 6 categories. I don't know whether there will be any difference.
If you just want to see it, it doesn't matter.

data_3<-RunTSNE(data_3,dims = 1:40)
DimPlot(data_3,reduction = "tsne")


The article also said that some cells were removed due to cell damage during operation. We marked them in the figure.
(pictures can't be uploaded here)
Roughly speaking, these damaged cells are not in the class of genes we are interested in. So we can remove it. Or these damaged cells will not affect our experiment.

Added by PhantomCube on Mon, 17 Jan 2022 07:59:35 +0200