Because the tutorials span different time cycles, software updates and the specificity of data sets, many small partners follow the tutorials of different systems and get different error reports.
Generally speaking, I will use 500 two different normal blood cells as the control of the inferCNV algorithm, and then mix 300 two different normal blood cells into the epithelial cells with the calculated copy number as the control condition. The code is as follows:
dat=cbind(epiMat,TcellsMat,monoMat) # sce is the predicted copy number of other cells after removing two different normal blood cells groupinfo=data.frame(v1=colnames(dat), v2=c( sce$celltype , rep('spike-Tcells',300), rep('ref-Tcells',500), rep('spike-mono',300), rep('ref-mono',500)))
Because the predicted epithelial cells may be clustered. In fact, it is normal for all other cells after removing two different normal blood cells.
So when we run the inforcnv process, we will set the cluster_by_groups=T :
library(infercnv) infercnv_obj = CreateInfercnvObject(raw_counts_matrix=expFile, annotations_file=groupFiles, delim="\t", gene_order_file= geneFile, ref_group_names=c('ref-Tcells', 'ref-mono')) ## This depends on the information in your grouping information # cutoff=1 works well for Smart-seq2, and cutoff=0.1 works well for 10x Genomics infercnv_obj2 = infercnv::run(infercnv_obj, cutoff=0.1, # cutoff=1 works well for Smart-seq2, and cutoff=0.1 works well for 10x Genomics out_dir= "infercnv_output", # dir is auto-created for storing outputs cluster_by_groups=T , # cluster hclust_method="ward.D2", plot_steps=F)
In our previous tutorials, we didn't actually set up cluster_by_groups=T, because we predict all epithelial cells, and we won't cluster them at this time. So there's no need to go to the cluster_by_groups=T, but recently I found that many other non epithelial cells may also have copy numbers, so I predicted all other cells after removing two different normal blood cells, so I set up cluster_by_groups=T .
In this way, the dendrogram file of inforcnv cannot be read with the previous code:
infercnv.dend <- read.dendrogram(file = "inferCNV_output/infercnv.observations_dendrogram.txt")
Instead, you need to use the ape package's read Tree function, as follows:
myTree <- ape::read.tree(file = "inferCNV_output/infercnv.observations_dendrogram.txt") u
We can see that the dendrogram file of inforcnv read in is actually 9 contents:
> length(myTree) [1] 9 > unlist(lapply(myTree, function(x){length(x$tip.label)})) [1] 23 858 57 19 71 10 14 300 300
However, if we look at the previous cell grouping information, it is as follows:
> as.data.frame(table(gp$V2)) Var1 Freq 1 Bcells 14 2 CD24-epi 71 3 cyc-epi 57 4 endo 10 5 ESR1-epi 1 6 fibro 23 7 lum-epi 858 8 plasma 19 9 ref-mono 500 10 ref-Tcells 500 11 SMC 2 12 spike-mono 300 13 spike-Tcells 300
There should be 13 cell types, of which two 'ref Tcells' and' ref mono 'are separate files and will not be in inforcnv observations_ dendrogram. Txt, and the two cell subsets of ESR1 EPI and SMC were directly filtered because the number of cells was too small. So it is the 9 contents of the dendrogram file of inforcnv read in.
If you really think my tutorial is helpful to your scientific research project and makes you enlightened, or your project uses a lot of my skills, please add a short thank you when publishing your achievements in the future, as shown below:
We thank Dr.Jianming Zeng(University of Macau), and all the members of his bioinformatics team, biotrainee, for generously sharing their experience and codes.
If I have traveled around the world in universities and research institutes (including Chinese mainland) in ten years, I will give priority to seeing you if you have such a friendship.