Assignment 11
Due date: April 13th at midnight (Thursday night). Late assignments will NOT be accepted!
Let us continue studying the codon data set introduced in Assignment 10. The modified data can be found here.
0) You must use version control ("git") as you develop your code. We suggest you start, from the Linux command line, by creating a new directory (e.g. assignment11), cd into that directory, initialize a git repository within it ("git init"), and perform "git add ..." and "git commit" repeatedly as you add to your code. You will hand in the output of "git log" for your assignment repository as part of the assignment. You must have a significant number of commits representing the modifications, alterations and changes to your code. If your log does not show a significant number of commits with meaningful comments, you will lose marks.
1) Create a file named codon_utilities.py containing whatever functions you think you need to implement a modular solution to this problem.
You may reuse your data-loading function from Assignment 10. However, returning the Kingdoms as indices may be more useful than returning them as strings. The numpy.unique function has some functionality in it that might be useful here.
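As a minimal sketch of what the numpy.unique hint points at: with return_inverse=True, numpy.unique returns the sorted unique labels together with, for each input entry, the index of its label in that sorted array. The helper name kingdoms_to_indices and the sample kingdom codes below are hypothetical, for illustration only; they are not taken from the data set.

```python
import numpy as np


def kingdoms_to_indices(kingdoms):
    """Convert an array of kingdom labels (strings) to integer indices.

    Hypothetical helper. Returns the sorted unique labels and, for each
    input entry, the index of its label within that sorted array.
    """
    labels, indices = np.unique(kingdoms, return_inverse=True)
    return labels, indices


# Hypothetical kingdom codes, for illustration only.
kingdoms = np.array(["vrl", "bct", "pln", "bct", "vrl"])
labels, y = kingdoms_to_indices(kingdoms)
print(labels)  # ['bct' 'pln' 'vrl']
print(y)       # [2 0 1 0 2]
```

Because labels[y] reproduces the original strings, the index array can be used as the classification target while the labels array is kept around for printing readable results.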
2) Create a Python script called codon_classification.py that will perform the following steps:
- Load and return the modified data, linked above. The data should be split into training and testing data sets.
- Three different classification models should be trained on the training data. The models should include a kNN model (with \(k=1\)), a decision tree, and a neural network. The models should be fit without outputting tonnes of text to the screen. Do this as efficiently as possible.
- The models should each be tested against the test data. The accuracy of each model should be printed out, along with sentences describing what is happening as the script runs.
- A final sentence should be printed out, declaring the model with the highest accuracy.
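The steps above might be sketched as follows. This assumes scikit-learn (the assignment does not name a library) and uses synthetic data from make_classification as a stand-in for the codon data, whose loading is not shown; in an actual submission the helper functions would live in codon_utilities.py. Keeping the models in a dictionary and fitting and scoring them in one loop is one possible reading of "as efficiently as possible", not a prescribed solution.

```python
import warnings

from sklearn.datasets import make_classification
from sklearn.exceptions import ConvergenceWarning
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in (assumption): 300 samples, 64 features, 3 classes.
# Replace X and y with the codon features and kingdom indices.
X, y = make_classification(n_samples=300, n_features=64, n_informative=10,
                           n_classes=3, random_state=0)

# Hold out 20% of the samples for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

# The three required models; kNN uses k=1.
models = {
    "kNN (k=1)": KNeighborsClassifier(n_neighbors=1),
    "decision tree": DecisionTreeClassifier(random_state=0),
    "neural network": MLPClassifier(max_iter=500, random_state=0),
}

accuracies = {}
for name, model in models.items():
    print(f"Training the {name} model on the training data...")
    with warnings.catch_warnings():
        # Keep the screen quiet if the network stops at max_iter.
        warnings.simplefilter("ignore", category=ConvergenceWarning)
        model.fit(X_train, y_train)
    # For classifiers, score() returns the mean accuracy on the given data.
    accuracies[name] = model.score(X_test, y_test)
    print(f"The {name} model has a test accuracy of {accuracies[name]:.3f}.")

best = max(accuracies, key=accuracies.get)
print(f"The model with the highest accuracy is the {best} model.")
```

If two models tie, max returns the first one in dictionary order, so the insertion order of the models decides which name is reported.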
Be sure to comment and document your functions. Defensive programming is not needed for this assignment.
Submit your codon_utilities.py and codon_classification.py files, and the output of git log from your assignment repository.
Both Python code files must be added and committed frequently to the repository. To capture the output of git log, use redirection (git log > git.log) and hand in the git.log file.
Assignments will be graded on a 10-point basis. The due date is April 13th, 2023 (midnight). Late assignments will not be accepted.