# ID3 Decision Tree Algorithm

## 1. Theory

1. Purity
For a branch node, if all of the samples it contains belong to the same class, its purity is maximal. We always prefer higher purity, i.e., as many samples as possible belonging to the same class. How, then, do we measure "purity"? This is where "information entropy" comes in.

2. information entropy
Assuming that the proportion of class-k samples in the current sample set D is pk (k = 1, 2, ..., |y|), the information entropy of D is defined as:

```
Ent(D) = -∑(k=1..|y|) pk·log2(pk)        (by convention, pk·log2(pk) = 0 when pk = 0)
```

Obviously, the smaller Ent(D), the higher the purity of D. Since 0 ≤ pk ≤ 1, we have log2(pk) ≤ 0 and hence Ent(D) ≥ 0. In the limiting cases: if all samples in D belong to the same class, Ent(D) attains its minimum, 0; if the samples in D are spread evenly over all |y| categories, Ent(D) attains its maximum, log2 |y|.
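As a quick numeric sanity check, the definition above can be evaluated directly. This is a minimal sketch; `entropy` here is an illustrative helper, not part of the implementation later in this article:

```python
import math

def entropy(probs):
    """Ent(D) = -sum(pk * log2(pk)), taking pk * log2(pk) = 0 when pk = 0."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

print(entropy([1.0]))        # pure set: entropy 0, the minimum
print(entropy([0.5, 0.5]))   # two equally likely classes: log2(2) = 1.0
print(entropy([0.25] * 4))   # four equally likely classes: log2(4) = 2.0, the maximum
```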

3. Information gain
Assume that discrete attribute a has V possible values {a¹, a², ..., a^V}. If a is used to split the sample set D, then V branch nodes are generated, where the v-th branch node contains Dv, the subset of samples in D that take value a^v on attribute a. Different branch nodes contain different numbers of samples, so we assign each branch node a weight |Dv|/|D|, giving more influence to branch nodes with more samples. The information gain obtained by splitting the sample set D on attribute a is therefore defined as:

```
Gain(D,a) = Ent(D) - ∑(v=1..V) |Dv|/|D|·Ent(Dv)
```

Ent(D) is the information entropy of dataset D before the split, and ∑(v=1..V) |Dv|/|D|·Ent(Dv) is the weighted information entropy after the split. Their difference is the decrease in information entropy achieved by the split, i.e., the improvement in purity. Obviously, the larger Gain(D,a), the greater the purity improvement and the better the split.
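The before/after comparison can be made concrete with a small example. This is an illustrative sketch with made-up data; `info_gain` is a hypothetical helper, not the `chooseBestFeature` function implemented later:

```python
import math
from collections import Counter

def ent(labels):
    """Information entropy of a list of class labels."""
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def info_gain(values, labels):
    """Gain(D, a) = Ent(D) - sum over v of |Dv|/|D| * Ent(Dv)."""
    n = len(labels)
    gain = ent(labels)
    for v in set(values):
        subset = [lab for val, lab in zip(values, labels) if val == v]
        gain -= len(subset) / n * ent(subset)
    return gain

# An attribute that separates the labels perfectly recovers all of Ent(D) = 1.0 ...
print(info_gain(['a', 'a', 'b', 'b'], ['yes', 'yes', 'no', 'no']))  # 1.0
# ... while an attribute independent of the labels gains nothing
print(info_gain(['a', 'b', 'a', 'b'], ['yes', 'yes', 'no', 'no']))  # 0.0
```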

4. Gain ratio
The information gain criterion has a preference for attributes with more possible values. The C4.5 algorithm therefore uses the gain ratio instead of information gain to select the optimal partition attribute, defined as:

```
Gain_ratio(D,a) = Gain(D,a) / IV(a)
```

where

```
IV(a) = -∑(v=1..V) |Dv|/|D|·log2(|Dv|/|D|)
```

IV(a) is called the intrinsic value of attribute a. The more possible values attribute a has (the larger V is), the larger IV(a) tends to be. Dividing by it partly offsets the preference for attributes with many possible values.

> In fact, the gain ratio criterion has a preference for attributes with *fewer* possible values. C4.5 therefore does not apply the gain ratio criterion directly: it first picks out, from the candidate partition attributes, those whose information gain is above average, and then selects from among them the one with the highest gain ratio.
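The effect of the normalizer IV(a) can be checked numerically. This is a minimal sketch; `intrinsic_value` is an illustrative helper, not part of the implementation below:

```python
import math
from collections import Counter

def intrinsic_value(values):
    """IV(a) = -sum over v of |Dv|/|D| * log2(|Dv|/|D|)."""
    n = len(values)
    return -sum(c / n * math.log2(c / n) for c in Counter(values).values())

# Two values, evenly split: IV = 1.0
print(intrinsic_value(['a', 'a', 'b', 'b']))
# Four distinct values: IV = 2.0, so the same Gain(D,a) yields half the gain ratio
print(intrinsic_value(['a', 'b', 'c', 'd']))
```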
5. Gini index
The CART decision tree algorithm uses the Gini index to select partition attributes, defined as:

```
Gini(D) = ∑(k=1..|y|) ∑(k'≠k) pk·pk' = 1 - ∑(k=1..|y|) pk²
```

The Gini index can be interpreted as the probability that two samples drawn at random from dataset D have inconsistent class labels. The smaller Gini(D), the higher the purity of D.

The Gini index of attribute a is defined as:

```
Gini_index(D,a) = ∑(v=1..V) |Dv|/|D|·Gini(Dv)
```

The optimal partition attribute is then selected with the Gini index: the attribute that minimizes the Gini index after partitioning is chosen.
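The two limiting cases of the definition can be checked directly. This is an illustrative sketch; `gini` is a hypothetical helper, not part of the implementation below:

```python
from collections import Counter

def gini(labels):
    """Gini(D) = 1 - sum(pk^2) over the class proportions pk."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

print(gini(['yes', 'yes', 'yes']))  # 0.0: two random draws always agree on the label
print(gini(['yes', 'no']))          # 0.5: two balanced classes, the two-class maximum
```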

## 2. Code implementation

### 1. Imports and data

```python
import math
import numpy as np
import pandas as pd
import matplotlib
import matplotlib.pyplot as plt

# `data` is assumed to be a pandas DataFrame already holding the
# watermelon dataset shown below (e.g. loaded with pd.read_csv)
data
```
|    | Color and lustre | Root pedicle     | Tapping      | Texture         | Umbilical Cord   | Touch       | Good melon |
|----|------------------|------------------|--------------|-----------------|------------------|-------------|------------|
| 0  | Dark green       | Coiled           | Turbid Sound | clear           | Sunken           | Hard Slide  | yes        |
| 1  | Black            | Coiled           | Depressed    | clear           | Sunken           | Hard Slide  | yes        |
| 2  | Black            | Coiled           | Turbid Sound | clear           | Sunken           | Hard Slide  | yes        |
| 3  | Dark green       | Coiled           | Depressed    | clear           | Sunken           | Hard Slide  | yes        |
| 4  | plain            | Coiled           | Turbid Sound | clear           | Sunken           | Hard Slide  | yes        |
| 5  | Dark green       | Curl up a little | Turbid Sound | clear           | Slightly concave | Soft sticky | yes        |
| 6  | Black            | Curl up a little | Turbid Sound | Slightly blurry | Slightly concave | Soft sticky | yes        |
| 7  | Black            | Curl up a little | Turbid Sound | clear           | Slightly concave | Hard Slide  | yes        |
| 8  | Black            | Curl up a little | Depressed    | Slightly blurry | Slightly concave | Hard Slide  | no         |
| 9  | Dark green       | Stiff            | Crisp        | clear           | flat             | Soft sticky | no         |
| 10 | plain            | Stiff            | Crisp        | vague           | flat             | Hard Slide  | no         |
| 11 | plain            | Coiled           | Turbid Sound | vague           | flat             | Soft sticky | no         |
| 12 | Dark green       | Curl up a little | Turbid Sound | Slightly blurry | Sunken           | Hard Slide  | no         |
| 13 | plain            | Curl up a little | Depressed    | Slightly blurry | Sunken           | Hard Slide  | no         |
| 14 | Black            | Curl up a little | Turbid Sound | clear           | Slightly concave | Soft sticky | no         |
| 15 | plain            | Coiled           | Turbid Sound | vague           | flat             | Hard Slide  | no         |
| 16 | Dark green       | Coiled           | Depressed    | Slightly blurry | Slightly concave | Hard Slide  | no         |

### 2. Functions

#### Calculating Entropy

```python
def calcEntropy(dataSet):
    mD = len(dataSet)
    dataLabelList = [x[-1] for x in dataSet]
    dataLabelSet = set(dataLabelList)
    ent = 0
    for label in dataLabelSet:
        mDv = dataLabelList.count(label)
        prop = float(mDv) / mD
        ent = ent - prop * math.log(prop, 2)
    return ent
```

#### Split Dataset

```python
# index   - index of the attribute to split on
# feature - the attribute value to keep
# Return value - the subset of dataSet whose value at `index` equals `feature`,
#                with that column removed
def splitDataSet(dataSet, index, feature):
    splitedDataSet = []
    for data in dataSet:
        if data[index] == feature:
            sliceTmp = data[:index]
            sliceTmp.extend(data[index + 1:])
            splitedDataSet.append(sliceTmp)
    return splitedDataSet
```

#### Choosing the best feature

```python
# Return value - index of the best attribute
def chooseBestFeature(dataSet):
    entD = calcEntropy(dataSet)
    mD = len(dataSet)
    featureNumber = len(dataSet[0]) - 1  # the last column is the label
    maxGain = -100
    maxIndex = -1
    for i in range(featureNumber):
        entDCopy = entD
        featureI = [x[i] for x in dataSet]
        featureSet = set(featureI)
        for feature in featureSet:
            splitedDataSet = splitDataSet(dataSet, i, feature)  # split the dataset
            mDv = len(splitedDataSet)
            entDCopy = entDCopy - float(mDv) / mD * calcEntropy(splitedDataSet)
        # entDCopy now holds the information gain of attribute i
        if maxIndex == -1 or maxGain < entDCopy:
            maxGain = entDCopy
            maxIndex = i
    return maxIndex
```

#### Finding the majority label

```python
# Return value - the majority label
def mainLabel(labelList):
    labelRec = labelList[0]
    maxLabelCount = -1
    for label in set(labelList):
        if labelList.count(label) > maxLabelCount:
            maxLabelCount = labelList.count(label)
            labelRec = label
    return labelRec
```

#### Building the tree

```python
def createFullDecisionTree(dataSet, featureNames, featureNamesSet, labelListParent):
    labelList = [x[-1] for x in dataSet]
    if len(dataSet) == 0:
        return mainLabel(labelListParent)
    elif len(dataSet[0]) == 1:  # no attributes left to split on
        return mainLabel(labelList)  # return the majority label of this dataset
    elif labelList.count(labelList[0]) == len(labelList):  # all samples share one label
        return labelList[0]

    bestFeatureIndex = chooseBestFeature(dataSet)
    bestFeatureName = featureNames.pop(bestFeatureIndex)
    myTree = {bestFeatureName: {}}
    featureList = featureNamesSet.pop(bestFeatureIndex)
    featureSet = set(featureList)
    for feature in featureSet:
        featureNamesNext = featureNames[:]
        featureNamesSetNext = featureNamesSet[:]
        splitedDataSet = splitDataSet(dataSet, bestFeatureIndex, feature)
        myTree[bestFeatureName][feature] = createFullDecisionTree(splitedDataSet, featureNamesNext, featureNamesSetNext, labelList)
    return myTree
```

#### Initialization

```python
# Return values
# dataSet         - the dataset as a list of rows
# featureNames    - the attribute names
# featureNamesSet - the possible values of each attribute
def readWatermelonDataSet():
    dataSet = data.values.tolist()
    featureNames = ['Color and lustre', 'Root pedicle', 'Tapping', 'texture', 'Umbilical Cord', 'Touch']
    # Collect the set of values of each attribute
    featureNamesSet = []
    for i in range(len(dataSet[0]) - 1):
        col = [x[i] for x in dataSet]
        colSet = set(col)
        featureNamesSet.append(list(colSet))
    return dataSet, featureNames, featureNamesSet
```

#### Drawing

```python
# Enable Chinese characters in matplotlib figures
matplotlib.rcParams['font.sans-serif'] = ['SimHei']
matplotlib.rcParams['font.serif'] = ['SimHei']

# Branch node, also known as a decision node
decisionNode = dict(boxstyle="sawtooth", fc="0.8")

# Leaf node
leafNode = dict(boxstyle="round4", fc="0.8")

# Arrow style
arrow_args = dict(arrowstyle="<-")


def plotNode(nodeTxt, centerPt, parentPt, nodeType):
    """
    Draw a single node
    :param nodeTxt: text describing the node
    :param centerPt: coordinates of the text
    :param parentPt: coordinates of the parent node
    :param nodeType: node type, either a leaf node or a decision node
    :return:
    """
    createPlot.ax1.annotate(nodeTxt, xy=parentPt, xycoords='axes fraction',
                            xytext=centerPt, textcoords='axes fraction',
                            va="center", ha="center", bbox=nodeType,
                            arrowprops=arrow_args)


def getNumLeafs(myTree):
    """
    Count the leaf nodes
    :param myTree:
    :return:
    """
    # Total number of leaf nodes
    numLeafs = 0

    # The first (and only) key is the root node
    firstStr = list(myTree.keys())[0]

    # The subtree stored under the root
    secondDict = myTree[firstStr]

    # Recursively traverse for leaf nodes
    for key in secondDict.keys():
        # If the value is a dictionary, it is a subtree: recurse
        if type(secondDict[key]).__name__ == 'dict':
            numLeafs += getNumLeafs(secondDict[key])
        # Otherwise it is a leaf node
        else:
            numLeafs += 1
    return numLeafs


def getTreeDepth(myTree):
    """
    Compute the depth of the tree
    :param myTree:
    :return:
    """
    # Maximum number of layers seen so far
    maxDepth = 0

    # Root node
    firstStr = list(myTree.keys())[0]

    # The subtree stored under the root
    secondDict = myTree[firstStr]

    # Traverse all children
    for key in secondDict.keys():
        # If the value is a dictionary, recurse and add one level
        if type(secondDict[key]).__name__ == 'dict':
            thisDepth = 1 + getTreeDepth(secondDict[key])
        # Otherwise this branch ends in a leaf node
        else:
            thisDepth = 1

        # Keep the maximum
        if thisDepth > maxDepth:
            maxDepth = thisDepth

    return maxDepth


def plotMidText(cntrPt, parentPt, txtString):
    """
    Write text at the midpoint between parent and child nodes
    :param cntrPt: child node coordinates
    :param parentPt: parent node coordinates
    :param txtString: text to fill in
    :return:
    """
    # Midpoint on the x-axis
    xMid = (parentPt[0] - cntrPt[0]) / 2.0 + cntrPt[0]
    # Midpoint on the y-axis
    yMid = (parentPt[1] - cntrPt[1]) / 2.0 + cntrPt[1]
    # Draw
    createPlot.ax1.text(xMid, yMid, txtString)


def plotTree(myTree, parentPt, nodeTxt):
    """
    Draw all nodes of the tree recursively
    :param myTree: the tree
    :param parentPt: coordinates of the parent node
    :param nodeTxt: text for the node
    :return:
    """
    # Number of leaf nodes
    numLeafs = getNumLeafs(myTree=myTree)

    # Depth of the tree
    depth = getTreeDepth(myTree=myTree)

    # Text of the root node
    firstStr = list(myTree.keys())[0]

    # Center the current root above its leaves: the current x offset plus half
    # the width occupied by this subtree's leaves; the current y offset is the y-axis
    cntrPt = (plotTree.xOff + (1.0 + float(numLeafs)) / 2.0 / plotTree.totalW, plotTree.yOff)

    # Label the edge between this node and its parent
    plotMidText(cntrPt, parentPt, nodeTxt)

    # Draw the node itself
    plotNode(firstStr, cntrPt, parentPt, decisionNode)

    # Subtree under the current root
    secondDict = myTree[firstStr]

    # Move the y offset down by 1/D: the next level is drawn one layer lower
    plotTree.yOff = plotTree.yOff - 1.0 / plotTree.totalD

    # Loop over all keys
    for key in secondDict.keys():
        # If the value is a dictionary, it represents a subtree: recurse
        if isinstance(secondDict[key], dict):
            plotTree(secondDict[key], cntrPt, str(key))
        else:
            # Move the x offset right by 1/W for the next leaf
            plotTree.xOff = plotTree.xOff + 1.0 / plotTree.totalW
            # Uncomment to trace the leaf node coordinates
            # print((plotTree.xOff, plotTree.yOff), secondDict[key])
            # Draw the leaf node
            plotNode(secondDict[key], (plotTree.xOff, plotTree.yOff), cntrPt, leafNode)
            # Label the edge between the leaf and its parent
            plotMidText((plotTree.xOff, plotTree.yOff), cntrPt, str(key))

    # Before returning from the recursion, move the y offset back up by 1/D,
    # so that drawing continues on the previous layer
    plotTree.yOff = plotTree.yOff + 1.0 / plotTree.totalD


def createPlot(inTree):
    """
    Draw the decision tree
    :param inTree: decision tree dictionary
    :return:
    """
    # Create the figure
    fig = plt.figure(1, facecolor='white')
    fig.clf()
    axprops = dict(xticks=[], yticks=[])
    createPlot.ax1 = plt.subplot(111, frameon=False, **axprops)
    # Total width of the decision tree
    plotTree.totalW = float(getNumLeafs(inTree))
    # Total depth of the decision tree
    plotTree.totalD = float(getTreeDepth(inTree))
    # Initial x offset, -1/(2W); each leaf moves 1/W to the right, so the leaves
    # land at x = 1/(2W), 3/(2W), 5/(2W), ..., (2W-1)/(2W)
    plotTree.xOff = -0.5 / plotTree.totalW
    # Initial y offset; each level moves 1/D down or up
    plotTree.yOff = 1.0
    # Draw the node image
    plotTree(inTree, (0.5, 1.0), '')
    # Show the plot
    plt.show()
```

### 3. Results

```python
dataSet, featureNames, featureNamesSet = readWatermelonDataSet()
testTree = createFullDecisionTree(dataSet, featureNames, featureNamesSet, featureNames)
createPlot(testTree)
```

## 3. Reference

[Machine Learning]-Decision Tree (Watermelon Dataset)


Added by pennythetuff on Sat, 23 Oct 2021 19:04:03 +0300