Assignment
5. Seeds classification
Opened: Wednesday, 23 March 2022, 12:59 AM
Due: Wednesday, 30 March 2022, 11:59 PM
For this assignment, you will be working with the 'seeds' data set, hosted at the UCI Machine Learning Repository: https://archive.ics.uci.edu/ml/datasets/seeds
This data set consists of measurements of 3 types of wheat seed. The repository copy does not contain column names, so please use the csv file below, which does contain them.
You should write a script using sklearn to:
- Read this data set in (e.g. using pandas).
- Split it into training and testing data sets.
- Create and fit a number of decision trees, exploring at least 8 parameter sets, where one parameter set is a combination of maximum depth (max_depth), minimum samples per leaf (min_samples_leaf), and max_features.
- Determine the confusion matrix for each tree. Which parameters worked best?
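The steps above could be organised roughly as in the following sketch. It is only one possible layout, not a model answer. Several things in it are assumptions rather than part of the assignment: the label column name `type`, the 70/30 stratified split, the fixed random seed, and the particular values in `PARAM_GRID` (chosen only so that 2 x 2 x 2 gives 8 parameter sets). Check the column names in the provided csv file before using it.

```python
from itertools import product

import pandas as pd
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Assumed example values: 2 x 2 x 2 = 8 parameter sets.
PARAM_GRID = {
    "max_depth": [3, None],
    "min_samples_leaf": [1, 5],
    "max_features": [None, "sqrt"],
}


def evaluate_trees(df, target="type", grid=PARAM_GRID, seed=0):
    """Fit one decision tree per parameter set and score it on held-out data.

    `target` is a hypothetical name for the seed-type column.
    """
    X = df.drop(columns=[target])
    y = df[target]

    # Stratify so each seed type appears in both the training and test sets.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, stratify=y, random_state=seed
    )

    labels = sorted(y.unique())  # fixes the row/column order of each matrix
    names = list(grid)
    results = []
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        tree = DecisionTreeClassifier(random_state=seed, **params)
        tree.fit(X_train, y_train)
        y_pred = tree.predict(X_test)
        results.append(
            {
                "params": params,
                "confusion_matrix": confusion_matrix(y_test, y_pred, labels=labels),
                "accuracy": accuracy_score(y_test, y_pred),
            }
        )
    return results


def report(csv_path, target="type"):
    """Read the csv file, evaluate every parameter set, and print the results."""
    df = pd.read_csv(csv_path)
    results = evaluate_trees(df, target=target)
    for r in results:
        print(r["params"], f"accuracy={r['accuracy']:.3f}")
        print(r["confusion_matrix"])
    best = max(results, key=lambda r: r["accuracy"])
    print("Highest test accuracy:", best["params"])
```

Calling `report("seeds.csv")` (the file name is a placeholder) prints one confusion matrix per parameter set. Rows are true classes and columns are predicted classes, so off-diagonal entries show which seed types get confused with each other. Test accuracy is used here as a simple way to rank the parameter sets; with a data set this small, the ranking can change with a different split, which is worth mentioning in your conclusions.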
Submit your script and conclusions by 11:59 PM on Wednesday, 30 March 2022.
- 23 March 2022, 11:02 AM