Cluster of inforcnv_ by_ The groups parameter affects the reading function of subsequent hierarchical clustering files

Because the tutorials span different time cycles, software updates and the specificity of data sets, many small partners follow the tutorials of different systems and get different error reports.

Generally speaking, I will use 500 two different normal blood cells as the control of the inferCNV algorithm, and then mix 300 two different normal blood cells into the epithelial cells with the calculated copy number as the control condition. The code is as follows:

# sce is the predicted copy number of other cells after removing two different normal blood cells
                     v2=c( sce$celltype ,

Because the predicted epithelial cells may be clustered. In fact, it is normal for all other cells after removing two different normal blood cells.

So when we run the inforcnv process, we will set the cluster_by_groups=T :

infercnv_obj = CreateInfercnvObject(raw_counts_matrix=expFile,
gene_order_file= geneFile,
'ref-mono'))  ## This depends on the information in your grouping information

# cutoff=1 works well for Smart-seq2, and cutoff=0.1 works well for 10x Genomics
infercnv_obj2 = infercnv::run(infercnv_obj,
cutoff=0.1, # cutoff=1 works well for Smart-seq2, and cutoff=0.1 works well for 10x Genomics
out_dir= "infercnv_output",  # dir is auto-created for storing outputs
cluster_by_groups=T ,   # cluster

In our previous tutorials, we didn't actually set up cluster_by_groups=T, because we predict all epithelial cells, and we won't cluster them at this time. So there's no need to go to the cluster_by_groups=T, but recently I found that many other non epithelial cells may also have copy numbers, so I predicted all other cells after removing two different normal blood cells, so I set up cluster_by_groups=T .

In this way, the dendrogram file of inforcnv cannot be read with the previous code:

 infercnv.dend <- read.dendrogram(file = "inferCNV_output/infercnv.observations_dendrogram.txt")

Instead, you need to use the ape package's read Tree function, as follows:

myTree <- ape::read.tree(file = "inferCNV_output/infercnv.observations_dendrogram.txt")

We can see that the dendrogram file of inforcnv read in is actually 9 contents:

> length(myTree)
[1] 9
> unlist(lapply(myTree, function(x){length(x$tip.label)}))
[1]  23 858  57  19  71  10  14 300 300

However, if we look at the previous cell grouping information, it is as follows:

           Var1 Freq
1        Bcells   14
2      CD24-epi   71
3       cyc-epi   57
4          endo   10
5      ESR1-epi    1
6         fibro   23
7       lum-epi  858
8        plasma   19
9      ref-mono  500
10   ref-Tcells  500
11          SMC    2
12   spike-mono  300
13 spike-Tcells  300

There should be 13 cell types, of which two 'ref Tcells' and' ref mono 'are separate files and will not be in inforcnv observations_ dendrogram. Txt, and the two cell subsets of ESR1 EPI and SMC were directly filtered because the number of cells was too small. So it is the 9 contents of the dendrogram file of inforcnv read in.

If you really think my tutorial is helpful to your scientific research project and makes you enlightened, or your project uses a lot of my skills, please add a short thank you when publishing your achievements in the future, as shown below:

We thank Dr.Jianming Zeng(University of Macau), and all the members of his bioinformatics team, biotrainee, for generously sharing their experience and codes.

If I have traveled around the world in universities and research institutes (including Chinese mainland) in ten years, I will give priority to seeing you if you have such a friendship.

Added by jcavard on Thu, 03 Mar 2022 08:05:28 +0200