BCH2203 - Winter 2022: 5. Seeds classification

Opened: Wednesday, 23 March 2022, 12:59 AM

Due: Wednesday, 30 March 2022, 11:59 PM

For this assignment, you will be working with the 'seeds' data set, hosted at the UCI Machine Learning Repository: https://archive.ics.uci.edu/ml/datasets/seeds

The data consist of measurements of three types of wheat seed. The original file does not include column names, so please use the CSV file below, which does.

You should write a script using sklearn to:

Read this data set in (e.g. using pandas).
Split it into training and testing data sets.
Create and fit several decision trees, exploring at least 8 parameter sets, where a parameter set is one combination of maximum depth (max_depth), minimum samples per leaf (min_samples_leaf), and max_features.
Determine the confusion matrix for each tree. Which parameters worked best?
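
The steps above could be organized roughly as in the sketch below. It is only one possible layout, not a model solution. The function names (evaluate_trees, main), the 70/30 split, the fixed random seed, the specific parameter values, and the assumption that the class label is the last column of the CSV file are all illustrative choices that do not come from the assignment text; check the actual column names in seeds_dataset2.csv before relying on them.

```python
from itertools import product

import pandas as pd
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# 2 x 2 x 2 = 8 parameter sets; the values themselves are illustrative.
PARAM_SETS = [
    {"max_depth": depth, "min_samples_leaf": leaf, "max_features": feats}
    for depth, leaf, feats in product([2, 5], [1, 5], [2, None])
]


def evaluate_trees(X, y, param_sets, test_size=0.3, seed=0):
    """Fit one tree per parameter set on a single train/test split.

    Returns a list of (params, test accuracy, confusion matrix) tuples.
    """
    # Stratify so each seed type appears in both the training and test sets.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=seed, stratify=y
    )
    results = []
    for params in param_sets:
        tree = DecisionTreeClassifier(random_state=seed, **params)
        tree.fit(X_train, y_train)
        y_pred = tree.predict(X_test)
        results.append(
            (params, accuracy_score(y_test, y_pred), confusion_matrix(y_test, y_pred))
        )
    return results


def main(csv_path="seeds_dataset2.csv", label_column=None):
    """Read the CSV file and print one confusion matrix per parameter set."""
    data = pd.read_csv(csv_path)
    # Assumption: the class label is the last column unless told otherwise.
    if label_column is None:
        label_column = data.columns[-1]
    X = data.drop(columns=[label_column])
    y = data[label_column]
    for params, accuracy, matrix in evaluate_trees(X, y, PARAM_SETS):
        print(params, f"accuracy={accuracy:.3f}")
        print(matrix)
```

Calling main() from the directory that contains seeds_dataset2.csv prints each parameter set with its test accuracy and confusion matrix. With only a few dozen test samples, differences between parameter sets can come down to one or two seeds, so it is worth repeating the run with a different seed before drawing conclusions.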

Submit your script and conclusions by 11:59 PM on March 30th.

seeds_dataset2.csv
23 March 2022, 10:55 AM