# preface

Cell picture data are as follows

# Cell recognition

To classify the cells in the picture, we must first identify the cells in the picture
First read the picture, then convert it into a gray image, then Gaussian blur the picture, then convert the picture into a binary image by using an adaptive threshold, and then define an elliptical check image for the operation of corrosion before expansion to deal with noise.
The renderings are as follows:
Gray image

The threshold is processed as a binary image

Corrosion expansion denoised image

However, there is still some noise, so calculate the connected part, calculate the number of pixels for each connected part, and filter out the parts less than 100 pixels
The renderings are as follows:
The connected parts are displayed, and different colors represent different connected parts

The effect after removing the connected part with the number of pixels less than 100 is shown in the figure

Then find the boundary of each connected part and mark each part boundary with a circle
The renderings are as follows:
Each circle represents a recognized cell

However, some adjacent cells were identified as one. At this time, I tried many methods, such as cleaning the picture again, calculating its foreground and background, and using the watershed algorithm, which did not get a good improvement, so I had to give up temporarily and discard these abnormal data later.

# Cell classification

After identifying cells, they should be classified. Because they do not know the classification basis, the color and size of cells are recognized as the characteristics of classification. The cells were classified by Kmeans self clustering method
The cell color adopts the mean value of the color in the labeled area, and the cell size adopts the area of the labeled area
As shown in the figure, the horizontal axis represents the cell area and the vertical axis represents the cell color. However, it can be seen in the figure that there are three values with abnormally large area, which are exactly where the previously marked cells are wrong, so these three values are removed temporarily:

Remove the three outliers as shown in the figure

Finally, self clustering is carried out by Kmeans algorithm, which is divided into two categories and marked on the original graph
The classification effect is as follows

The marking effect is carried out on the image, in which the red dot is one type and the blue dot is one type

# The complete code is as follows:

```import cv2
import numpy as np
from skimage import measure,color
import collections
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans

gray = cv2.cvtColor(image_orign,cv2.COLOR_BGR2GRAY)
blurred=cv2.GaussianBlur(gray,(11,11),0)
kernel=cv2.getStructuringElement(cv2.MORPH_ELLIPSE,(5,5))
erode_img=cv2.erode(thresh,kernel,iterations=1)#corrosion
dilate_img=cv2.dilate(erode_img,kernel,iterations=1)#expand

#watershed algorithm
# sure_bg=cv2.dilate(dilate_img,kernel,iterations=3)#expand
# cv2.imshow('t',sure_bg)
# cv2.waitKey(0)
# dist_transform=cv2.distanceTransform(dilate_img,cv2.DIST_L2,5)
# ret ,sure_fg=cv2.threshold(dist_transform,0.55*dist_transform.max(),255,0)
# cv2.imshow('t',sure_fg)
# cv2.waitKey(0)
# sure_fg = np.uint8(sure_fg)
# unknow=cv2.subtract(sure_bg,sure_fg)
# ret2,makers=cv2.connectedComponents(sure_fg)
# markers=makers+1
# markers[unknow==0]=255
# markers=cv2.watershed(image_orign,markers)
# dilate_img[markers == 255] =0
# cv2.imshow('t',dilate_img)
# cv2.waitKey(0)

labels=measure.label(dilate_img,connectivity=2,background=1)
dst=color.label2rgb(labels)
image=np.zeros(dilate_img.shape,dtype='uint8')
cell_color=[]
for label in collections.Counter(np.unique(labels)).keys():
if(label==1):
continue
if(num_pixel>100):

cell_set=[]
cell_size=[]
cont,hierarchy=cv2.findContours(image.copy(),cv2.RETR_EXTERNAL,cv2.CHAIN_APPROX_SIMPLE)
for i in cont:
x,y,w,h=cv2.boundingRect(i)

cell_set.append((int(cx),int(cy)))
roi=np.zeros(gray.shape,dtype='uint8')
roi = cv2.circle(roi, (int(cx),int(cy)), int(radius), 255, cv2.FILLED)
cell_color.append(area.sum()/cv2.countNonZero(area))

value=[]
d=cell_size.copy()
d.sort()
for i in d[len(d)-3:]:
cell_set.pop(cell_size.index(i))
cell_color.pop(cell_size.index(i))
cell_size.remove(i)

for i in range(len(cell_set)):
value.append([cell_size[i],cell_color[i]])
data=dict(zip(cell_set,value))
plt.scatter(cell_size,cell_color)
plt.show()

pre=KMeans(n_clusters=2).fit_predict(value)
c=[(255,0,0),(0,0,255)]
c2=['red','blue']
for i in range(len(list(pre))):
plt.scatter(value[i][0],value[i][1],c=c2[pre[i]])
cv2.circle(image_orign,(cell_set[i][0],cell_set[i][1]),3,c[pre[i]],cv2.FILLED)
cv2.imshow('t',image_orign)
cv2.waitKey(0)
plt.show()
```

Keywords: Python Machine Learning

Added by jamiel on Sat, 08 Jan 2022 08:14:36 +0200