Reading and visualization of dicom,mhd and raw files in medical imaging

Medical imaging

Medical image refers to the technology and processing process of obtaining internal tissue image of human body or part of human body in a non-invasive way for medical treatment or medical research.

At present, the main processing methods include: X-ray imaging (X-ray), computed tomography (CT), magnetic resonance imaging (MRI), ultrasonic imaging (ultrasound), positron emission tomography (PET), electroencephalogram (EEG), magnetoencephalogram (MEG), eye tracking, transcranial magnetic wave stimulation (TMS) and other modern imaging technologies, The process of examining parts of the human body that cannot be examined by non-surgical means.

The rise of medical imaging makes the diagnosis of patients have more intuitive evidence. In addition, it also has a great impact on the relevant research in the medical field.

dicom file related

1.1 what is dicom image

DICOM (Digital Imaging and Communications in Medicine), namely medical digital imaging and communication, is an international standard for medical images and related information. It is the most widely used medical image format at present. The common CT, MRI and cardiovascular imaging mostly use docim format.

1.2 what is in DICOM images

doicm mainly stores two aspects of information:

(1) Information about the patient's phi (protected health information)
In short, it is the patient's relevant information (which is protected). For example, name, gender, age, follow-up medical records, etc

(2) Intuitive image information
Part of the information is a certain layer (slice, which will be discussed later) of the scanned patient image. Doctors can open it through a special dicom reader to see the patient's condition. The other part is the relevant equipment information. For example, if the dicom image produced is a layer of the X-ray image scanned by the X-ray machine, dicom will store the relevant equipment information about the X-ray machine.

More specifically, this information can be divided into the following four categories:

(1)Patient
For example, the patient's name, gender, weight, ID, etc. mainly reflect the patient's personal information.

(2)Study
That is, the information recorded about the patient receiving a certain examination, such as examination number, examination date, patient's age at the time of examination, examination location, etc.

(3)Series
Store simple information about the inspection, such as layer thickness, layer spacing, image orientation, etc.

(4)Image
Store detailed information about inspection results (such as a layer of X-ray image), such as image shooting date, image resolution, pixel spacing, etc.

be careful
dicom file ≠ patient image

On the one hand, in the actual examination process, such as CT examination, X-ray and other radioactive light are used to do multi-layer cross-sectional scanning around the human body together with the detector. In other words, the result we get is the cross-sectional images layer by layer. Only when these cross-sectional images are superimposed on the z-axis is a complete medical image. Each dicom stores one layer of images (but not just images). In other words, a patient has multiple dicom files.

As shown in the figure above, a complete medical image is a three-dimensional image formed by stacking many cross-sectional images on the z-axis. dicom stores only one image and all related information.

On the other hand, dicom files not only store the slices affected by the patient, but also store relevant information (mentioned above).

To sum up, dicom is a file that stores a certain layer of sectional view and related information of medical images taken by patients.

1.3 dicom structure and composition

The internal structure of dicom file is shown in the figure below:

It can be seen that a complete dicom file consists of two parts: file header and data set.

1.3.1 file header

Dicom file header contains relevant information about data sets, mainly including the following parts:

File preamble

Dicom prefix

File information element

1.3.2 data set

Data set is the place where relevant data is stored. The basic unit of data set is data element. The data element consists of four parts:

(1) Tag
That is, the label part is used to identify the specific type of data. It includes two parts: Group number and element number. Here, the code corresponding to Group refers to the category of stored information, that is, indicating that the information is patient,study, series , which of the four categories of image. After determining the Group iu, retrieve the Element to determine the specific information. For example, (00080050) refers to the patient's examination number information.

See Appendix for more specific Tag table.

(2) VR (value representation)
Store the data type of information pointed to by Tag.

(3) VL (data length)
Save the data length of the information pointed to by Tag.

(4) Range of values
Save the specific value of Tag pointing information.

Generally speaking, dicom seems a little troublesome when dealing with it. Therefore, dicom files will be converted to process the converted files. The conversion method used here is to convert dicom files into raw files and mhd files.

raw file related

Raw file, which means "unprocessed file", saves pure pixel information. Often, the images in all dicom files of a patient are extracted and put in a raw file. In other words, a patient corresponds to a raw file, in which the image information of the patient is stored. (it can be understood that different dicom slice images of the patient are stacked together to form a three-dimensional image).

mhd file related

As mentioned above, the dicom file contains some other information besides the slice image. Then, after the file format conversion, the image information is extracted by the raw file, and the non image information is stored in the mhd header file. Simply put, mhd header files store non image information in all dicom files about a patient.

From the above relationship, we can know that raw files and mhd files correspond one-to-one, and their number is less than or equal to (in fact, it must be less than) the number of dicom files.

Corresponding code

2.1 dicom visualization
import matplotlib.patches as patches
import matplotlib.pyplot as plt
import pydicom
from pydicom.data import get_testdata_files
import os
import xlrd
import xlwt
print(__doc__)
path = r'C:\Users\86152.000\Desktop\python\M'
files = os.listdir(path)
for filename in files:
    dataset = pydicom.dcmread(filename)
# Normal mode:
print()
print("Filename.........:", filename)
print("Storage type.....:", dataset.SOPClassUID)
print()
pat_name = dataset.PatientName
display_name = pat_name.family_name + ", " + pat_name.given_name
print("Patient's name...:", display_name)
print("Patient id.......:", dataset.PatientID)
print("Modality.........:", dataset.Modality)
print("Study Date.......:", dataset.StudyDate)
if 'PixelData' in dataset:
    rows = int(dataset.Rows)
    cols = int(dataset.Columns)
    print("Image size.......: {rows:d} x {cols:d}, {size:d} bytes".format(
        rows=rows, cols=cols, size=len(dataset.PixelData)))
    if 'PixelSpacing' in dataset:
        print("Pixel spacing....:", dataset.PixelSpacing)
# use .get() if not sure the item exists, and want a default value if missing
print("Slice location...:", dataset.get('SliceLocation', "(missing)"))
def readexcel():
            workbook=xlrd.open_workbook(r'C:\Users\86152.000\Desktop\python\annotated_coordinates.xlsx')
            print(workbook.sheet_names())
            sheet1=workbook.sheet_by_name('Sheet1')
            nrows=sheet1.nrows
            ncols=sheet1.ncols
            print(nrows,ncols)
            r=2
            c=4
            while True:
                cellA=sheet1.cell(r,c).value
                cellB=sheet1.cell(r,c).value
                print(cellA,cellB)
                rect=patches.Rectangle((20,30),10,20,linewidth=1,edgecolor='red',facecolor='none')
                ax=plt.subplot(1,1,1)
                ax.add_patch(rect)
                plt.imshow(dataset.pixel_array, cmap=plt.cm.bone)
                plt.show()
                r=r+1
                if r==nrows:
                    c=c+1
                    r=2
                    if c==ncols-2:
                       print('complete')
                       break
readexcel()

The output is as follows:


The red box here can be removed and commented in the code. (I wanted to mark dicom data, but the tutor said it should be raw data)

2.2 dicom reads and converts into raw and mhd files
import cv2
import os
import pydicom
import numpy
import SimpleITK

Path and list declarations

A folder in the same directory as the python file stores dicom files. The file path should preferably not contain Chinese

PathDicom = r"/home/Wangling/TuoBaoqiang/python/Test1"

The folder in the same directory as the python file is used to store mhd files and raw files. The file path should preferably not contain Chinese

SaveRawDicom = r"/home/Wangling/TuoBaoqiang/python/Data_Img/test1"

lstFilesDCM= []
lstFilesDCM_0 = []
temp = []
listdicom = []

First level directory under dicom folder - > group one-dimensional list

root_1 = os.listdir(PathDicom)
group = []
for i in range(0,len(root_1)):
group.append(PathDicom + '/' + root_1[i]) #group list

Second level directory under dicom folder - > 000x one-dimensional list

root_2 = []
for j in range(0,len(group)):
root_2. Append (OS. Listdir (Group [J]) # dicom folder under group, root_2 is a two-dimensional list, for example [['0001', '0002', '0003', '0104']]
nameformhd = sum(root_2, []) # will root_ 2 is reduced to a one-dimensional list in case of naming the mhd file at the end of the program

Complete directory of each dicom file - > / group / 000x

for m in range(0,len(group)):
for n in range(0,len(root_2[0]): # suppose that the number of dicom folders in each group is equal, that is, 10. If there is any change, it needs to be modified!
eachdicom = group[m] + '/' + root_2[m][n]
lstFilesDCM_0.append(eachdicom)

print('the dicom filelist - >% s'% lstFilesDCM_0) # print dicom file list

The complete list will be lstfilesdcm_ The dicom file address under 0 is read into listicon

for k in range(0,len(lstFilesDCM_0)):
for dirName, subdirList, fileList in os.walk(lstFilesDCM_0[k]):
for filename in fileList:
if “img” in filename.lower(): # judge whether the file is a dicom file
temp.append(os.path.join(dirName, filename))
listdicom.append(temp) # add to the list
temp = []

print('listicon =% s'% listicon) # print the list of all dicom files loaded

print('the number of listicon is% d '% len (listicon)) # print the number of dicom files to be processed

for filenames in listdicom:
#print('the current processing list is% s'% filenames) # print the dicom list currently being processed
lstFilesDCM = filenames

Step 1: take the last picture as the reference picture and think that all pictures have the same dimension

RefDs = pydicom.read_file(lstFilesDCM[-1],force = True)  # Read the last dicom picture
RefDs_0 = pydicom.read_file(lstFilesDCM[0],force = True)    # Read the first dicom picture, and the third step is standby

Step 2: get the dimension of the 3D picture composed of dicom pictures

ConstPixelDims = (int(RefDs.Rows), int(RefDs.Columns), len(lstFilesDCM))  # ConstPixelDims is a tuple

Step 3: obtain the Spacing in x direction and y direction, and obtain the layer thickness in z direction

RefDs.SliceThickness = abs(RefDs_0.ImagePositionPatient[2]-RefDs.ImagePositionPatient[2])/(len(lstFilesDCM)-1)
 #  Print the value of slicethickness to judge whether the data is correct
print(float(RefDs_0.SliceThickness),RefDs.SliceThickness,RefDs_0.ImagePositionPatient[2],RefDs.ImagePositionPatient[2],len(lstFilesDCM))

print(RefDs_0.ImagePositionPatient)
print(RefDs.ImagePositionPatient)

ConstPixelSpacing = (float(RefDs.PixelSpacing[0]), float(RefDs.PixelSpacing[1]), float(RefDs.SliceThickness))

Step 4: get the origin of the image

Origin = RefDs.ImagePositionPatient

Create a numpy three-dimensional array according to the dimension and set the element type to pixel_array.dtype

ArrayDicom = numpy.zeros(ConstPixelDims, dtype=RefDs.pixel_array.dtype)  # array is a numpy array

Step 5: traverse all dicom files, read image data and store it in numpy array

i = 0
for filenameDCM in lstFilesDCM:
    ds = pydicom.read_file(filenameDCM)
    ArrayDicom[:, :, lstFilesDCM.index(filenameDCM)] = ds.pixel_array
    cv2.imwrite("out_" + str(i) + ".png", ArrayDicom[:, :, lstFilesDCM.index(filenameDCM)])
    i += 1

Step 6: transpose the numpy array, that is, transform the coordinate axis (x,y,z) into (z,y,x). This is the format of dicom storage file, that is, the first dimension is the Z axis, which is convenient for image stacking

ArrayDicom = numpy.transpose(ArrayDicom, (2, 0, 1))

Step 7: convert the current numpy array into mhd and raw files through SimpleITK

sitk_img = SimpleITK.GetImageFromArray(ArrayDicom, isVector=False)
sitk_img.SetSpacing(ConstPixelSpacing)
sitk_img.SetOrigin(Origin)
SimpleITK.WriteImage(sitk_img,os.path.join(SaveRawDicom,nameformhd[listdicom.index(filenames)]+ ".mhd"))

The output is as follows:

2.3 raw file visualization
import os
import SimpleITK as sitk
import matplotlib.pyplot as plt

if name=="main":
#Identify Chinese path name
path=r"C:\Users\86152.000\Desktop\python\SaveRaw\0002.mhd"
pwd = os.getcwd()
os.chdir(os.path.dirname(path))

#Read slice image
image = sitk.ReadImage(os.path.basename(path))
image = sitk.GetArrayFromImage(image)

#Display image
for i in range(1,image.shape[0],1):
    plt.figure()
    plt.imshow(image[i,:,:], cmap='gray')
    plt.pause(1)
    plt.close()
    print(i)

The output is as follows:

2.4 traversing the specified file
import os
import SimpleITK as sitk
import matplotlib.pyplot as plt

def show_files(path, all_files):
#First, traverse all files and folders in the current directory
file_list = os.listdir(path)
#Prepare a loop to judge whether each element is a folder or a file. If it is a file, pass the name into the list. If it is a folder, recursion
for file in file_list:
#Using OS path. The join () method obtains the full path name and stores it in cur_path variable, otherwise you can only traverse one layer of directory at a time
cur_path = os.path.join(path, file)
#Determine whether it is a folder
if os.path.isdir(cur_path):
show_files(cur_path, all_files)
else:
#Determine whether the suffix is satisfied
if suffix in cur_path:
all_files.append(cur_path)
return all_files

Operation on the specified file

import os

def show_files(path, all_files):
#First, traverse all files and folders in the current directory
file_list = os.listdir(path)
#Prepare a loop to judge whether each element is a folder or a file. If it is a file, pass the name into the list. If it is a folder, recursion
for file in file_list:
#Using OS path. The join () method obtains the full path name and stores it in cur_path variable, otherwise you can only traverse one layer of directory at a time
cur_path = os.path.join(path, file)
#Determine whether it is a folder
if os.path.isdir(cur_path):
show_files(cur_path, all_files)
else:
#Determine whether the suffix is satisfied
if suffix in cur_path:
all_files.append(cur_path)
return all_files

Keywords: AI Computer Vision Deep Learning

Added by Leadwing on Sat, 12 Feb 2022 14:36:31 +0200