Classification (I): notes on tree-based, lazy, and probabilistic classifiers

Prepare training and test data sets

Right at the start, I found that the churn dataset could no longer be loaded from the C50 package. After some searching, I found it in another package, modeldata, under the name mlc_churn.

# install.packages("C50")
# library(C50)
# data('churn', package = 'C50')   # no longer available in C50
# install.packages("modeldata")
library(modeldata)
churn <- mlc_churn
# 70/30 split into training and test sets
ind <- sample(2, nrow(churn), replace = TRUE,
              prob = c(0.7, 0.3))
trainset <- churn[ind == 1, ]
testset <- churn[ind == 2, ]

This dataset differs slightly from the one in the book: it has more samples, which should not affect the results.

Extension: a function that splits the data into training and test sets in one step:

split_data <- function(data, p = 0.7, s = 666) {  # split_data is a hypothetical name
  set.seed(s)                              # seed for a reproducible split
  n <- nrow(data)
  index <- sample(seq_len(n))              # random permutation of the row indices
  train <- data[index[1:floor(n * p)], ]
  test <- data[index[(floor(n * p) + 1):n], ]
  li <- list(train = train, test = test)
  return(li)
}

5.3 using a recursive partitioning tree to build a classification model

Recursive partitioning repeats two steps at every node: find the split that best separates the classes, then partition the data and recurse into each child node. CP is the complexity parameter used for cost-complexity pruning. Decision trees are prone to bias (toward predictors with many possible split points) and to overfitting. Conditional inference trees reduce this bias, while overfitting can be addressed with random forests or by pruning the tree.

library(rpart)
churn.rp <- rpart(churn ~ ., data = trainset)
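To see what rpart does at a single node, here is a minimal base-R sketch of a split search on one numeric predictor, assuming binary splits scored by the decrease in Gini impurity (rpart's default criterion for classification). The functions `gini` and `best_split` and the toy data are hypothetical; the real rpart also handles categorical predictors, surrogate splits, and the cp stopping rule.

```r
# Gini impurity of a set of class labels: 1 - sum(p_k^2)
gini <- function(y) {
  p <- table(y) / length(y)
  1 - sum(p^2)
}

# Search the midpoints between sorted unique values of numeric x for the
# binary split with the largest weighted decrease in Gini impurity
best_split <- function(x, y) {
  parent <- gini(y)
  values <- sort(unique(x))
  cuts <- (head(values, -1) + tail(values, -1)) / 2
  gains <- sapply(cuts, function(cut) {
    left <- y[x <= cut]
    right <- y[x > cut]
    child <- (length(left) * gini(left) + length(right) * gini(right)) / length(y)
    parent - child
  })
  list(cut = cuts[which.max(gains)], gain = max(gains))
}

# Toy data (hypothetical): values up to 3 are "yes", the rest "no"
x <- c(1, 2, 3, 4, 10, 11, 12, 13)
y <- factor(c("yes", "yes", "yes", "no", "no", "no", "no", "no"))
res <- best_split(x, y)
print(res)   # cut = 3.5, gain = 0.46875 (both children are pure)
```

A full tree applies this search to every predictor, keeps the best split, and recurses into both children.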

5.4 recursive partitioning tree visualization

The plot and text functions draw the classification tree.

plot(churn.rp, margin = 0.1)   # draw the tree; margin leaves extra white space around it
text(churn.rp, all = TRUE, use.n = TRUE)   # all = TRUE labels every node; use.n = TRUE adds per-class counts
# adjust the parameters to change the layout
plot(churn.rp, uniform = TRUE, branch = 0.6, margin = 0.1)   # uniform node spacing; branch (0 to 1) sets branch shape, 1 = square shoulders
text(churn.rp, all = TRUE, use.n = TRUE)

5.5 evaluating the prediction performance of the recursive partitioning tree

library(caret)   # provides confusionMatrix()
# predict classes for the test set
predictions <- predict(churn.rp, testset, type = "class")
table(testset$churn, predictions)
# #############
       yes   no
  yes  133   81
  no    29 1278
# generate the confusion matrix with summary statistics
confusionMatrix(table(predictions, testset$churn))
# ##############
Confusion Matrix and Statistics

predictions  yes   no
        yes  133   29
        no    81 1278
               Accuracy : 0.9277          
                 95% CI : (0.9135, 0.9402)
    No Information Rate : 0.8593          
    P-Value [Acc > NIR] : < 2.2e-16       
                  Kappa : 0.6671   
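The Accuracy, No Information Rate, and Kappa figures above can be reproduced by hand from the table counts. A minimal base-R sketch, using the counts above (rows = predictions, columns = actual classes); Cohen's kappa compares the observed agreement with the agreement expected by chance from the row and column totals. The helper `cm_stats` is a hypothetical name.

```r
# rpart test-set confusion matrix from above (rows = predicted, columns = actual)
cm <- matrix(c(133, 81, 29, 1278), nrow = 2,
             dimnames = list(predicted = c("yes", "no"), actual = c("yes", "no")))

cm_stats <- function(cm) {
  n <- sum(cm)
  observed <- sum(diag(cm)) / n                       # accuracy: share on the diagonal
  expected <- sum(rowSums(cm) * colSums(cm)) / n^2    # agreement expected by chance
  c(accuracy = observed,
    kappa = (observed - expected) / (1 - expected),   # Cohen's kappa
    nir = max(colSums(cm)) / n)                       # no-information rate: largest actual class
}

stats <- cm_stats(cm)
round(stats, 4)   # accuracy 0.9277, kappa 0.6671, nir 0.8593
```

The same function applies to the other confusion tables in these notes, as long as the predictions are on the rows and the actual classes on the columns.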

5.6 pruning the recursive partitioning tree

Sometimes it is necessary to prune rules (splits) with weak classification power, to avoid overfitting and improve prediction accuracy. The cost-complexity method is used here.
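For reference, the cost-complexity criterion behind this pruning, as we understand it from the rpart documentation: for a subtree T with risk (misclassification error) R(T) and |T| terminal nodes, the penalized risk is given below, where T_1 is the tree with no splits and rpart reports the penalty on the scaled cp axis.

```latex
R_{\alpha}(T) = R(T) + \alpha \, |T|, \qquad cp = \frac{\alpha}{R(T_1)}
```

Larger values of alpha (equivalently cp) favor smaller subtrees. The cptable lists the cp values at which the optimal subtree changes, together with the cross-validated error (xerror) that is typically used to choose one of them for prune().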

[1] 0.4523327
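# cost-complexity parameter at the minimum (row 5 of the CP table)
churn.cp <- churn.rp$cptable[5, "CP"]
churn.cp
[1] 0.01014199
# prune the tree at this CP value and plot the pruned tree
prune.tree <- prune(churn.rp, cp = churn.cp)
plot(prune.tree, margin = 0.1)
text(prune.tree, all = TRUE, use.n = TRUE)
# evaluate the pruned tree; accuracy may be slightly lower than before pruning,
# in exchange for less overfitting
predictions <- predict(prune.tree, testset, type = "class")
confusionMatrix(table(predictions, testset$churn))
# ################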

I did not notice much of a difference.

5.7 building a classification model with a conditional inference tree

Besides the traditional rpart decision tree, the conditional inference tree (ctree) is another commonly used tree-based classification algorithm. Like rpart, it recursively partitions the data with univariate splits. The difference is that a conditional inference tree chooses the split variable according to statistical significance tests rather than by maximizing an information (impurity) measure; rpart uses the Gini index for this. Note that the Gini index here is an impurity measure, unrelated to the Gini coefficient that measures the gap between rich and poor.
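To make the contrast concrete, here is a simplified base-R sketch of significance-based variable selection: test each predictor's association with the class and pick the one with the smallest p-value. This is only an approximation for illustration; ctree itself uses a permutation-test framework with multiplicity-adjusted p-values and a stopping rule. The choice of Kruskal-Wallis and chi-squared tests, the function `select_by_pvalue`, and the toy data are all assumptions.

```r
# Simplified significance-based split-variable selection (not ctree's exact procedure)
select_by_pvalue <- function(data, target) {
  y <- data[[target]]
  predictors <- setdiff(names(data), target)
  pvals <- sapply(predictors, function(v) {
    x <- data[[v]]
    if (is.numeric(x)) {
      kruskal.test(x, y)$p.value                          # numeric predictor vs class
    } else {
      suppressWarnings(chisq.test(table(x, y))$p.value)   # categorical predictor vs class
    }
  })
  list(pvalues = pvals, selected = names(which.min(pvals)))
}

# Toy data (hypothetical): only day_charge is related to the class
set.seed(1)
n <- 200
cls <- factor(sample(c("yes", "no"), n, replace = TRUE))
toy <- data.frame(
  day_charge = ifelse(cls == "yes", 45, 30) + rnorm(n, sd = 5),
  calls = rnorm(n, mean = 100, sd = 20),
  plan = factor(sample(c("0", "1"), n, replace = TRUE)),
  churn = cls
)
res <- select_by_pvalue(toy, "churn")
res$selected   # "day_charge"
```

Because the choice depends on a p-value rather than on how many cut points a variable offers, this style of selection avoids favoring predictors with many possible splits.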

library(party)   # provides ctree() and treeresponse()
# conditional inference tree
ctree.moddel <- ctree(churn ~ ., data = trainset)

5.8 conditional inference tree visualization

# visualize the full tree
plot(ctree.moddel)
# tree on a single predictor
daycharge.model <- ctree(churn ~ total_day_charge, data = trainset)
plot(daycharge.model)

Building the tree on total_day_charge alone simplifies the plot considerably.

5.9 evaluating the prediction performance of the conditional inference tree

# predict classes for the test set
ctree.predict <- predict(ctree.moddel, testset)
table(ctree.predict, testset$churn)
ctree.predict  yes   no
          yes  139    9
          no    75 1298
confusionMatrix(table(ctree.predict, testset$churn))
# #######################
Confusion Matrix and Statistics

ctree.predict  yes   no
          yes  139    9
          no    75 1298
               Accuracy : 0.9448          
                 95% CI : (0.9321, 0.9557)
    No Information Rate : 0.8593          
    P-Value [Acc > NIR] : < 2.2e-16  
# class probabilities (yes, no) for the first five test samples
treeresponse(ctree.moddel, newdata = testset[1:5, ])
[1] 0.02715356 0.97284644

[1] 0.06842105 0.93157895

[1] 0.06842105 0.93157895

[1] 0.06842105 0.93157895

[1] 0.02715356 0.97284644

5.10 using the k-nearest neighbor classification algorithm

k-nearest neighbors (kNN) is a nonparametric lazy learning method: it makes no assumptions about the data distribution and has no explicit training step.

library(class)   # provides knn()
# recode the yes/no plans as 0/1 so that knn() can treat them as numbers
levels(trainset$international_plan) = list("0" = "no", "1" = "yes")
levels(trainset$voice_mail_plan) = list("0" = "no", "1" = "yes")
levels(testset$international_plan) = list("0" = "no", "1" = "yes")
levels(testset$voice_mail_plan) = list("0" = "no", "1" = "yes")

# drop the label and the non-numeric columns area_code and state
churn.knn <- knn(trainset[, !names(trainset) %in% c("churn", "area_code", "state")],
                 testset[, !names(testset) %in% c("churn", "area_code", "state")], trainset$churn, k = 3)
# ########################
Confusion Matrix and Statistics

       yes   no
  yes   76  138
  no    46 1261
               Accuracy : 0.879          
                 95% CI : (0.8616, 0.895)
    No Information Rate : 0.9198         
    P-Value [Acc > NIR] : 1              
                  Kappa : 0.3901  

The kNN algorithm classifies a sample by its distance to the training samples, for example Euclidean or Manhattan distance. With k = 1, a sample is assigned to the class of its single nearest neighbor; with k = 3, as above, its three nearest neighbors vote. A small k tends to overfit and a large k tends to underfit; a suitable value can be found by cross-validation. The advantages are that the learning cost is zero, no distributional assumption is needed, and any type of data can be handled as long as a suitable distance can be defined. The disadvantages are that the result is hard to interpret, computation becomes very expensive on large datasets, and high-dimensional data should first undergo dimensionality reduction. Character (categorical) data must first be converted to integers, which is why the plan variables are recoded as 0/1 above. The kknn package provides weighted k-nearest neighbor classification, regression, and clustering.
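The distance-and-vote mechanics fit in a few lines of base R. A minimal sketch, assuming Euclidean distance, numeric (already integer-coded) features, and simple majority voting with ties going to the first factor level (class::knn breaks ties at random, so results can differ on ties). Standardizing the features first is our own choice here, since Euclidean distance is scale-sensitive; `knn_predict` and the toy data are hypothetical.

```r
# Minimal kNN classifier: Euclidean distance + majority vote
knn_predict <- function(train, test, cl, k = 3) {
  train <- as.matrix(train)
  test <- as.matrix(test)
  preds <- apply(test, 1, function(row) {
    d <- sqrt(colSums((t(train) - row)^2))   # distance to every training row
    nearest <- order(d)[seq_len(k)]          # indices of the k closest rows
    votes <- table(cl[nearest])              # votes per class level
    names(votes)[which.max(votes)]           # majority class; ties go to the first level
  })
  factor(unname(preds), levels = levels(cl))
}

# Toy data (hypothetical): two well-separated groups
train_x <- data.frame(a = c(1, 1.2, 0.8, 5, 5.2, 4.8), b = c(1, 0.9, 1.1, 5, 5.1, 4.9))
cl <- factor(c("yes", "yes", "yes", "no", "no", "no"))
test_x <- data.frame(a = c(1.1, 4.9), b = c(1, 5))

# standardize with the training set's mean and sd so no feature dominates the distance
train_s <- scale(train_x)
test_s <- scale(test_x, center = attr(train_s, "scaled:center"),
                scale = attr(train_s, "scaled:scale"))

pred <- knn_predict(train_s, test_s, cl, k = 3)
pred   # yes no
```

Every prediction scans the whole training set, which is why the cost grows quickly with dataset size.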

5.11 using logistic regression

Logistic regression is a probabilistic, statistical classifier: it models the log-odds (logit) of the outcome as a linear function of the predictors. In R, it is fitted with glm by specifying family = binomial.
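A small simulated sketch of how a binomial glm turns the linear predictor into a probability through the inverse logit (plogis), and which class that probability refers to: for a factor response, glm treats the first level as failure, so with levels c("yes", "no") the fitted value is P(churn == "no"). The simulated data and the names `calls`, `toy_churn`, and `toy_fit` are hypothetical.

```r
set.seed(42)
n <- 500
calls <- rpois(n, lambda = 2)                       # hypothetical predictor, e.g. service calls
p_yes <- plogis(-3 + calls)                         # true model: more calls, more churn
toy_churn <- factor(ifelse(runif(n) < p_yes, "yes", "no"), levels = c("yes", "no"))

toy_fit <- glm(toy_churn ~ calls, family = binomial)
eta <- predict(toy_fit, type = "link")        # linear predictor (log-odds)
prob <- predict(toy_fit, type = "response")   # P(toy_churn == "no"), the second level
all.equal(prob, plogis(eta))                  # TRUE: response = inverse logit of the link

# turn probabilities into labels with a 0.5 cut-off
toy_class <- factor(ifelse(prob > 0.5, "no", "yes"), levels = c("yes", "no"))
table(predicted = toy_class, actual = toy_churn)
```

Because the modeled level is "no", the fitted coefficient on calls comes out negative even though more calls make churn ("yes") more likely.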

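# logistic regression on all predictors
fit <- glm(churn ~ ., data = trainset, family = binomial)
summary(fit)   # check which variables are significant
# refit with only the significant variables
fit <- glm(churn ~ international_plan + voice_mail_plan + total_intl_calls + number_customer_service_calls,
           data = trainset, family = binomial)
# glm models the probability of the second factor level, so pred is P(churn == "no")
pred <- predict(fit, testset, type = "response")
Class <- pred > .5   # TRUE means predicted "no"
summary(Class)
   Mode   FALSE    TRUE 
logical      44    1477 
churn.mod <- ifelse(testset$churn == "yes", 1, 0)   # 1 = churned
pred_class <- churn.mod
pred_class[pred <= .5] <- 1 - pred_class[pred <= .5]
# Caution: pred_class equals churn.mod exactly when pred > .5, so the diagonal of ctb
# counts the test rows predicted "no" (1287 + 190 = 1477, the TRUE count above) rather
# than the correct predictions. A direct comparison is
# table(ifelse(pred > .5, "no", "yes"), testset$churn).
ctb <- table(churn.mod, pred_class)
ctb
# ###########
churn.mod    0    1
        0 1287   20
        1   24  190
confusionMatrix(ctb)
Confusion Matrix and Statistics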

churn.mod    0    1
        0 1287   20
        1   24  190
               Accuracy : 0.9711          
                 95% CI : (0.9614, 0.9789)
    No Information Rate : 0.8619          
    P-Value [Acc > NIR] : <2e-16          
                  Kappa : 0.8794 

Logistic regression is easy to interpret, directly outputs probabilities and confidence intervals, and can quickly incorporate new data to update the model. Its drawback is that it suffers from multicollinearity, so the explanatory variables should be linearly independent.

5.12 using the naive Bayes classification algorithm

Naive Bayes is also a probability-based classifier; it assumes that the features are conditionally independent of each other given the class.
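The prediction rule is Bayes' theorem plus the independence assumption: score each class by its prior times the product of per-feature conditional probabilities, then normalize. A minimal base-R sketch for categorical features only (e1071's naiveBayes additionally models numeric features with Gaussian densities and offers Laplace smoothing); `nb_fit`, `nb_posterior`, and the toy data are hypothetical. The priors computed here correspond to the "A-priori probabilities" printed by naiveBayes below.

```r
# Fit: class priors P(class) and, per feature, a table of P(value | class)
nb_fit <- function(x, y) {
  list(prior = table(y) / length(y),
       cond = lapply(x, function(col) prop.table(table(y, col), margin = 1)))
}

# Posterior for one new observation: prior * product of P(x_j | class), normalized
nb_posterior <- function(model, newrow) {
  score <- setNames(as.numeric(model$prior), names(model$prior))
  for (j in names(model$cond)) {
    score <- score * model$cond[[j]][, as.character(newrow[[j]])]
  }
  score / sum(score)
}

# Toy data (hypothetical), features already coded as "0"/"1"
x <- data.frame(intl_plan = c("1", "1", "0", "0", "0", "0", "1", "0"),
                vmail     = c("0", "0", "0", "1", "1", "0", "1", "1"))
y <- factor(c("yes", "yes", "yes", "no", "no", "no", "no", "no"))
model <- nb_fit(x, y)
post <- nb_posterior(model, list(intl_plan = "1", vmail = "0"))
post   # no 0.0909..., yes 0.9091...
```

Without smoothing, a feature value never seen with a class zeroes out that class's score, which is what Laplace smoothing guards against.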

library(e1071)   # provides naiveBayes()
classifer <- naiveBayes(trainset[, !names(trainset) %in% c("churn")], trainset$churn)
classifer
# ##############
Naive Bayes Classifier for Discrete Predictors

naiveBayes.default(x = trainset[, !names(trainset) %in% c("churn")], 
    y = trainset$churn)
# ###############
A-priori probabilities:
      yes        no 
0.1417074 0.8582926 
bayes.table <- table(predict(classifer, testset[, !names(testset) %in% c("churn")]), testset$churn)
bayes.table
# #############
       yes   no
  yes  104   52
  no   110 1255
confusionMatrix(bayes.table)
# ############
Confusion Matrix and Statistics

       yes   no
  yes  104   52
  no   110 1255
               Accuracy : 0.8935          
                 95% CI : (0.8769, 0.9086)
    No Information Rate : 0.8593          
    P-Value [Acc > NIR] : 4.220e-05       
                  Kappa : 0.5032  

The evaluation follows the same routine as above. Naive Bayes assumes that the feature variables are conditionally independent given the class. Its advantages are that it is relatively simple and easy to apply; it suits small training sets and data that may contain missing values or noise. Its drawback is the assumption that all features are conditionally independent and equally important, which rarely holds in the real world. The chapter is summarized in the following figures:

