- 22nd Sep 2021
- 06:03 am

**Analysis Report**

**Predicting Safety Issues Using Dials: Decision Tree Technique**

**1. Introduction**

Decision trees are versatile machine learning algorithms that can perform both classification and regression tasks, and they are powerful enough to fit complex data sets. A decision tree represents a sequence of choices and their outcomes in the form of a tree. Each internal node tests a condition on one input variable, each branch (edge) corresponds to an outcome of that test, and each leaf gives the predicted result. In this report, we use a decision tree to predict the safety issue level.
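As a minimal sketch of the two task types, the R code below uses the `rpart` package (also used in the appendix) on R's built-in `iris` and `mtcars` data sets rather than the dial data. The same `rpart()` call fits a classification tree when `method = "class"` and a regression tree when `method = "anova"`.

```r
library(rpart)

# Classification: predict a categorical label (iris species)
class_tree <- rpart(Species ~ ., data = iris, method = "class")
class_pred <- predict(class_tree, iris, type = "class")

# Regression: predict a numeric value (fuel efficiency, mpg)
reg_tree <- rpart(mpg ~ ., data = mtcars, method = "anova")
reg_pred <- predict(reg_tree, mtcars)

# print() lists every node with its splitting condition; leaves are marked with *
print(class_tree)
head(class_pred)
head(reg_pred)
```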

**2. Data set and Methodology**

The data set contains the measurements from 20 dials, along with the safety issue, which is labeled red, yellow, or green. It consists of 1,000 observations. We apply a decision tree algorithm to predict the safety issue (red, yellow, or green) from the 20 dial measurements. The classification model was validated on held-out data: the data set was split into a training set and a test set in a 7:3 ratio, the model was built on the training set, and it was evaluated on the test set. The complexity parameter (CP) is one of the key parameters of a decision tree and controls how far the tree grows: a split is not attempted unless it improves the overall fit by at least a factor of CP, so larger CP values produce smaller trees. The first step is therefore to find the optimal complexity parameter.
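The 7:3 split can be sketched in R as follows. This is illustrative only: `dials` is a hypothetical synthetic stand-in for the real data set (1,000 rows of 20 random dial readings with random labels), and `set.seed(2021)` is an assumed, arbitrary seed added so the split can be repeated; it will not reproduce the exact split behind the results in this report.

```r
set.seed(2021)  # arbitrary seed so the split is reproducible

# Hypothetical stand-in data: 1000 rows, 20 dials, random labels
n <- 1000
dials <- as.data.frame(matrix(rnorm(n * 20), ncol = 20))
names(dials) <- paste0("dial", 1:20)
dials$Safety_Issue <- factor(sample(c("Green", "Red", "Yellow"), n, replace = TRUE))

# 70% of row indices go to training, the remaining 30% to testing
train_idx <- sample(nrow(dials), size = round(0.7 * nrow(dials)))
train <- dials[train_idx, ]
test  <- dials[-train_idx, ]

c(train = nrow(train), test = nrow(test))
```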

**3. Tuning the complexity parameter**

According to Figure 01, the cross-validated error levels off at a complexity parameter value of 0.025. We therefore take 0.025 as the optimal CP for the rest of the analysis.
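Reading the CP off the plot can also be done programmatically from the `cptable` that `rpart` stores with each fitted tree. The sketch below assumes synthetic stand-in data (the label is generated from `dial14` and `dial15` plus noise) and shows two common choices: the CP with the lowest cross-validated error (`xerror`), and the 1-SE rule, which picks the simplest tree whose error is within one standard error (`xstd`) of that minimum.

```r
library(rpart)

# Synthetic stand-in data: label depends on dial14 and dial15 plus noise
set.seed(1)
n <- 1000
dials <- as.data.frame(matrix(rnorm(n * 20), ncol = 20))
names(dials) <- paste0("dial", 1:20)
score <- dials$dial15 + 0.5 * dials$dial14 + rnorm(n, sd = 0.5)
dials$Safety_Issue <- cut(score, breaks = c(-Inf, -0.5, 0.5, Inf),
                          labels = c("Red", "Yellow", "Green"))

# Grow a full tree (cp = 0) with 10-fold cross-validation
full_tree <- rpart(Safety_Issue ~ ., data = dials, method = "class",
                   control = rpart.control(cp = 0, xval = 10))
cp_tab <- full_tree$cptable  # columns: CP, nsplit, rel error, xerror, xstd

# Option 1: CP with the lowest cross-validated error
best_row <- which.min(cp_tab[, "xerror"])
cp_min <- cp_tab[best_row, "CP"]

# Option 2 (1-SE rule): first (simplest) tree within one SE of the minimum
threshold <- cp_tab[best_row, "xerror"] + cp_tab[best_row, "xstd"]
cp_1se <- cp_tab[which(cp_tab[, "xerror"] <= threshold)[1], "CP"]

# Prune the full tree back to the chosen complexity
pruned <- prune(full_tree, cp = cp_1se)
c(cp_min = cp_min, cp_1se = cp_1se)
```

Because rows of `cptable` are ordered from the largest CP (smallest tree) down, the 1-SE choice is never smaller than the minimum-error CP, so it favours a simpler tree.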

**4. Decision tree results**

Figure 02 shows the decision tree, which can be used to predict the safety issue from the measurements of the 20 dials. The tree gives 7 rules, each with a confidence value. For example:

Rule 01: If dial14 >= -0.57 and Dial15 >= 1.5, the predicted safety issue is Green with 84% confidence.

Rule 02: If dial14 >= -0.57 and Dial15 < 1>= -0.39, the predicted safety issue is Red with 77% confidence.

By following the tree from the root to a leaf, the safety issue for any set of dial readings can be predicted.
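The rules can also be listed, and new readings classified, directly in R. This minimal sketch uses synthetic stand-in data, so its rules and probabilities will differ from Figure 02. `path.rpart()` returns the chain of conditions from the root to each leaf, and `predict(..., type = "prob")` returns the class probabilities for a reading; the all-zero `new_reading` is hypothetical.

```r
library(rpart)

# Synthetic stand-in data: label depends on dial14 and dial15 plus noise
set.seed(1)
n <- 1000
dials <- as.data.frame(matrix(rnorm(n * 20), ncol = 20))
names(dials) <- paste0("dial", 1:20)
score <- dials$dial15 + 0.5 * dials$dial14 + rnorm(n, sd = 0.5)
dials$Safety_Issue <- cut(score, breaks = c(-Inf, -0.5, 0.5, Inf),
                          labels = c("Red", "Yellow", "Green"))

tree <- rpart(Safety_Issue ~ ., data = dials, method = "class",
              control = rpart.control(cp = 0.025))

# Leaves are the rows of tree$frame whose split variable is "<leaf>"
leaf_nodes <- as.numeric(rownames(tree$frame)[tree$frame$var == "<leaf>"])

# One rule per leaf: the conditions on the path from the root
rules <- path.rpart(tree, nodes = leaf_nodes, pl = FALSE, print.it = FALSE)
rules[[1]]

# Hypothetical new reading with every dial at 0
new_reading <- as.data.frame(matrix(0, nrow = 1, ncol = 20,
                                    dimnames = list(NULL, paste0("dial", 1:20))))
pred_class <- predict(tree, new_reading, type = "class")
pred_prob  <- predict(tree, new_reading, type = "prob")
pred_class
pred_prob
```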

**5. Model Evaluation**

The decision tree was evaluated on the test set created at the first stage.

Figure 02: Decision tree

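| Actual | Predicted Green | Predicted Red | Predicted Yellow |
|---|---|---|---|
| Green | 57 | 22 | 25 |
| Red | 15 | 78 | 4 |
| Yellow | 4 | 22 | 73 |

Table 01: Confusion matrix (rows: actual safety issue; columns: predicted safety issue)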

According to Table 01, the model correctly classified 57 of the 104 green safety issues, 78 of the 97 red safety issues, and 73 of the 99 yellow safety issues. The overall accuracy is therefore (57 + 78 + 73) / 300 = 208 / 300 = 69.33%.
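These figures can be checked with a few lines of R. The sketch enters the counts from Table 01 by hand, with rows as the actual class and columns as the predicted class (the same orientation as `table(test$Safety_Issue, ...)` in the appendix), and also computes per-class recall and precision, which are not reported above.

```r
# Table 01 entered by hand: rows = actual class, columns = predicted class
conf_mat <- matrix(c(57, 22, 25,
                     15, 78,  4,
                      4, 22, 73),
                   nrow = 3, byrow = TRUE,
                   dimnames = list(actual    = c("Green", "Red", "Yellow"),
                                   predicted = c("Green", "Red", "Yellow")))

# Correct predictions lie on the diagonal
accuracy  <- sum(diag(conf_mat)) / sum(conf_mat)  # (57 + 78 + 73) / 300
recall    <- diag(conf_mat) / rowSums(conf_mat)   # share of each actual class found
precision <- diag(conf_mat) / colSums(conf_mat)   # share of each predicted class that is right

round(accuracy, 4)  # 0.6933
round(rbind(recall, precision), 3)
```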

**6. Recommendation and next steps.**

The decision tree model has reasonably good accuracy (69.33%, almost 70%) and can be used to predict safety issues from the 20 dial measurements. Of the 20 dials, dial15, dial6, and dial8 appear to be the most important, as they appear in the top nodes of the tree, and dial15 is the most important of these. Management can therefore pay particular attention to dial15 and take the necessary actions to control safety issues. Ensemble methods such as random forests and bagging could increase the model's accuracy, so a natural next step is to apply them to make the predictions more accurate and reliable.
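A minimal sketch of two of these next steps, on synthetic stand-in data (the label is generated from `dial14` and `dial15`, so the importance ranking will not match the real dials). First, `rpart` stores a `variable.importance` score for each fitted tree, a more direct measure than position in the tree. Second, bagging can be built from `rpart` alone by fitting one tree per bootstrap sample and taking a majority vote. A random forest additionally samples a random subset of dials at each split; it is available in add-on packages such as `randomForest` and is not shown here. The number of trees (50) is an arbitrary choice.

```r
library(rpart)

# Synthetic stand-in data and a 70:30 split
set.seed(1)
n <- 1000
dials <- as.data.frame(matrix(rnorm(n * 20), ncol = 20))
names(dials) <- paste0("dial", 1:20)
score <- dials$dial15 + 0.5 * dials$dial14 + rnorm(n, sd = 0.5)
dials$Safety_Issue <- cut(score, breaks = c(-Inf, -0.5, 0.5, Inf),
                          labels = c("Red", "Yellow", "Green"))
train_idx <- sample(n, 0.7 * n)
train <- dials[train_idx, ]
test  <- dials[-train_idx, ]

# Single tree and its variable importance (larger = more useful for splitting)
single <- rpart(Safety_Issue ~ ., data = train, method = "class",
                control = rpart.control(cp = 0.025))
head(sort(single$variable.importance, decreasing = TRUE), 3)

# Bagging: one fully grown tree per bootstrap sample (no internal CV needed)
n_trees <- 50
votes <- sapply(seq_len(n_trees), function(b) {
  boot <- train[sample(nrow(train), replace = TRUE), ]
  fit  <- rpart(Safety_Issue ~ ., data = boot, method = "class",
                control = rpart.control(cp = 0, xval = 0))
  as.character(predict(fit, test, type = "class"))
})

# votes is a (test rows x trees) matrix; take the most frequent label per row
bagged_pred <- apply(votes, 1, function(v) names(which.max(table(v))))

acc_single <- mean(predict(single, test, type = "class") == test$Safety_Issue)
acc_bagged <- mean(bagged_pred == test$Safety_Issue)
c(single_tree = acc_single, bagged = acc_bagged)
```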

**7. Appendix.**

```r
library(readxl)
library(rpart)
library(rpart.plot)

data <- read_excel("Data for Analytics Individual Assignment Combined.xlsx")
View(data)
summary(data)

# Split into train (70%) and test (30%) sets
sample_ind <- sample(nrow(data), nrow(data) * 0.7)
train <- data[sample_ind, ]
test <- data[-sample_ind, ]

# CP tuning: grow a full tree (cp = 0)
base_model <- rpart(Safety_Issue ~ ., data = train, method = "class",
                    control = rpart.control(cp = 0))

# Examine the complexity plot
printcp(base_model)
plotcp(base_model)

# Decision tree with the chosen CP
model <- rpart(Safety_Issue ~ ., data = train, method = "class",
               control = rpart.control(cp = 0.025))
rpart.plot(model, box.palette = list("green", "red", "yellow"))

# Validate the decision tree on the test set
# (named pred so it does not mask the predict() function)
pred <- predict(model, test, type = "class")

# Confusion matrix (rows: actual, columns: predicted)
table_mat <- table(test$Safety_Issue, pred)
table_mat

# Accuracy
accuracy_Test <- sum(diag(table_mat)) / sum(table_mat)
accuracy_Test
```