EES1137 - Winter 2025: Lecture 17 GraphViz

Hello,

I have been trying to turn each lecture into a script.

This has been really great for learning and just having some functions on hand.

However, I am struggling with the decision tree material from lecture 17, most notably the Graphviz part of it.

If I comment out the following lines:

# import graphviz
# dot_data = tree.export_graphviz(model, out_file = None,
#     class_names = iris.target_names,
#     feature_names = iris.feature_names,
#     impurity = False, filled = True,
#     label = "none")
# graph = graphviz.Source(dot_data)
# graph.format = 'pdf'
# graph.render("iris", view = True)
### UNSURE WHY GRAPHVIZ ISN'T WORKING IN SCRIPT FORM

My code works perfectly and returns confusion matrices for both the testing and training data. However, I would like the script to display the decision tree graphically as well.
I've installed graphviz with pip install, but it still doesn't work.

I've attached my code below.

Please let me know if I'm using graphviz improperly!
Thank you,
Raabez

# decisionTree.py

from sklearn.datasets import load_iris

iris = load_iris()
X = iris.data
y = iris.target

from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size = 0.33, random_state = 42)
# random_state seeds the shuffle, so you get a different split depending on the number
# setting it means you get the same results every time you run this script
# if you don't set it, you get a different random split each time you run the code,
# so your plots may look slightly different every time.

from sklearn import tree, metrics
model = tree.DecisionTreeClassifier(max_depth=2)
# Decision Trees fit crazy well, but you don't want to fit to the noise
# so you want to start low with max_depth
model.fit(X_train, y_train)
# fit with the training data, you don't want the model to see your testing data yet

# import graphviz
# dot_data = tree.export_graphviz(model, out_file = None,
#     class_names = iris.target_names,
#     feature_names = iris.feature_names,
#     impurity = False, filled = True,
#     label = "none")
# graph = graphviz.Source(dot_data)
# graph.format = 'pdf'
# graph.render("iris", view = True)
### UNSURE WHY GRAPHVIZ ISN'T WORKING IN SCRIPT FORM

# lec 17, slide 21 tells you how to interpret a decision tree

# print a confusion matrix to determine the effectiveness of a classifier
# on diagonal = correctly classified, off diagonal = mis-classified
y_pred = model.predict(X_train)
print(metrics.confusion_matrix(y_train, y_pred))

# now, see how well it works on the test dataset (first time model is interacting with the test dataset)
y_pred = model.predict(X_test)
print(metrics.confusion_matrix(y_test, y_pred))

# a model's strength is based on how well it works on the testing dataset
# it will be good on the training set, as the model was made using the training set

# prune the tree at some level, where the results are "good enough" and the model is not "too complex"
## try max_depth at lower numbers until you start getting bad confusion matrices for training and testing dataset

# model works pretty good!
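
The max_depth comments above could be tried out with a small loop. This is only a sketch: the depth range 1 to 5, the use of metrics.accuracy_score as a one-number summary in place of the full confusion matrix, and random_state = 0 on the classifier are all assumptions added for illustration and are not from the lecture.

```python
# depthSweep.py (hypothetical file name)
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn import tree, metrics

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size = 0.33, random_state = 42)

results = {}
for depth in range(1, 6):  # assumed range, adjust as needed
    # random_state on the classifier is an assumption, used only for repeatability
    model = tree.DecisionTreeClassifier(max_depth = depth, random_state = 0)
    model.fit(X_train, y_train)
    train_acc = metrics.accuracy_score(y_train, model.predict(X_train))
    test_acc = metrics.accuracy_score(y_test, model.predict(X_test))
    results[depth] = (train_acc, test_acc)
    print(f"max_depth={depth}: train accuracy={train_acc:.3f}, test accuracy={test_acc:.3f}")
```

A depth where the test accuracy stops improving while the training accuracy keeps climbing is a reasonable candidate for where to prune.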

Re: Lecture 17 GraphViz

by Alexey Fedoseev, Thursday, 3 April 2025, 08:36

Hi Raabez,

When you say that Graphviz isn't working, what kind of error message are you seeing? Did you run `conda install python-graphviz` (not pip) as mentioned in the lecture?

Alexey
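
A note on the pip versus conda question: the graphviz package on pip is only the Python interface. Rendering to PDF also needs the separate Graphviz `dot` program to be installed and on the PATH. A minimal sketch of a check that could be run before calling graph.render, using only the standard library (the function name find_dot_executable is made up for this example):

```python
import shutil

def find_dot_executable():
    """Return the full path of the Graphviz `dot` program, or None if it is not on PATH."""
    return shutil.which("dot")

dot_path = find_dot_executable()
if dot_path is None:
    # The Python package may import fine, but rendering needs this executable.
    message = "Graphviz 'dot' executable not found on PATH"
else:
    message = f"Graphviz 'dot' executable found at {dot_path}"
print(message)
```

If this reports that `dot` is missing, that would be consistent with `import graphviz` succeeding while graph.render fails. The exact error message from the script is still needed to confirm it.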