Assignment 11
Due date: December 5th at midnight (Thursday night). Late assignments will NOT be accepted!
Let us continue studying the codon data set introduced in Assignment 10. The modified data can be found here.
0) You must use version control ("git") as you develop your code. We suggest you start, from the Linux command line, by creating a new directory (e.g. assignment11), changing into that directory with cd, initializing a git repository within it ("git init"), and running "git add ..., git commit" repeatedly as you add to your code. You will hand in the output of "git log" for your assignment repository as part of the assignment. Your log must show a significant number of commits, each with a meaningful comment, reflecting the changes made to your code; otherwise you will lose marks.
1) Create a file named Codon.Utilities.R containing whatever functions you think you need to implement a modular solution to this problem.
You may reuse and modify your data-loading function from Assignment 10.
2) Create an R script called Codon.Classification.R that will perform the following steps:
- Load and return the modified data, linked above. The data should be split into training and testing data sets.
- Three different classification models should be trained on the training data. The models should include a decision tree and cross-validated kNN and Support Vector Machine models. The target should be the 'Kingdom' column.
- The models should each be tested against the test data. The accuracy of each model should be printed out, along with sentences describing what is happening as the script runs.
- A final sentence should be printed out, declaring the model with the highest accuracy.
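As a rough illustration of the steps above, the following sketch splits a data set, trains two of the model types, and reports the most accurate one. It is not a reference solution: it uses R's built-in iris data with 'Species' standing in for 'Kingdom', and the helper names split_data and accuracy, the 80/20 split, the seed, and the candidate values of k are all assumptions made for the example. The kNN step uses leave-one-out cross-validation via knn.cv from the class package, which is only one possible form of cross-validation. An SVM would follow the same train, predict, score pattern using a package such as e1071; it is left out here because that package is not part of the standard R distribution.

```r
library(rpart)  # decision trees (recommended package shipped with R)
library(class)  # knn() and knn.cv() (recommended package shipped with R)

# Hypothetical helper: randomly split a data frame into training and test sets.
split_data <- function(data, train_fraction = 0.8, seed = 42) {
  stopifnot(is.data.frame(data),
            is.numeric(train_fraction), length(train_fraction) == 1,
            train_fraction > 0, train_fraction < 1)
  n_train <- floor(train_fraction * nrow(data))
  if (n_train < 1 || n_train >= nrow(data)) {
    stop("The split must leave at least one row in each set.")
  }
  set.seed(seed)  # assumed seed, so the split is reproducible
  train_idx <- sample.int(nrow(data), n_train)
  list(train = data[train_idx, , drop = FALSE],
       test  = data[-train_idx, , drop = FALSE])
}

# Hypothetical helper: fraction of predictions that match the true labels.
accuracy <- function(predicted, actual) {
  stopifnot(length(predicted) == length(actual), length(actual) > 0)
  mean(as.character(predicted) == as.character(actual))
}

target <- "Species"  # stand-in for the 'Kingdom' column
parts <- split_data(iris)
features <- setdiff(names(iris), target)

cat("Training a decision tree...\n")
tree_fit <- rpart(Species ~ ., data = parts$train, method = "class")
tree_pred <- predict(tree_fit, newdata = parts$test, type = "class")
tree_acc <- accuracy(tree_pred, parts$test[[target]])
cat("Decision tree accuracy:", round(tree_acc, 3), "\n")

cat("Choosing k for kNN by leave-one-out cross-validation...\n")
candidate_k <- c(1, 3, 5, 7, 9)  # assumed search grid
cv_acc <- sapply(candidate_k, function(k) {
  cv_pred <- knn.cv(parts$train[, features], cl = parts$train[[target]], k = k)
  accuracy(cv_pred, parts$train[[target]])
})
best_k <- candidate_k[which.max(cv_acc)]
knn_pred <- knn(parts$train[, features], parts$test[, features],
                cl = parts$train[[target]], k = best_k)
knn_acc <- accuracy(knn_pred, parts$test[[target]])
cat("kNN accuracy (k =", best_k, "):", round(knn_acc, 3), "\n")

# Compare the models and declare the one with the highest test accuracy.
accuracies <- c("decision tree" = tree_acc, "kNN" = knn_acc)
best_model <- names(accuracies)[which.max(accuracies)]
cat("The model with the highest accuracy is the", best_model, "model.\n")
```

Keeping the split and scoring logic in small functions lets the script itself read as a sequence of steps, which fits the modular layout asked for in part 1.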
Be sure to comment your code. The usual defensive programming is required.
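One way to read the defensive programming requirement is that each function checks its inputs and stops with a clear message before doing any work. The sketch below shows that idea for a data-loading function. The function name load_codon_data, the CSV file format, and the specific checks are assumptions for illustration; the actual data file may need a different reader or additional checks.

```r
# Hypothetical loader: the name, the CSV format, and the checks are assumptions.
load_codon_data <- function(file_path, target = "Kingdom") {
  # Check the argument itself before touching the file system.
  if (!is.character(file_path) || length(file_path) != 1 || is.na(file_path)) {
    stop("file_path must be a single character string.")
  }
  if (!file.exists(file_path)) {
    stop("Data file not found: ", file_path)
  }

  data <- read.csv(file_path, stringsAsFactors = FALSE)

  # Check that the loaded data is usable for classification.
  if (nrow(data) == 0) {
    stop("The data file contains no rows.")
  }
  if (!(target %in% names(data))) {
    stop("The data file has no '", target, "' column.")
  }
  if (anyNA(data[[target]])) {
    stop("The '", target, "' column contains missing values.")
  }

  data[[target]] <- factor(data[[target]])  # classifiers expect a factor target
  data
}

# Usage with a small temporary file (made-up values, not the real data).
tmp <- tempfile(fileext = ".csv")
write.csv(data.frame(Kingdom = c("A", "B"), feature1 = c(0.02, 0.03)),
          tmp, row.names = FALSE)
demo <- load_codon_data(tmp)
str(demo)
```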
Submit your Codon.Utilities.R and Codon.Classification.R files, and the output of git log from your assignment repository.
Both R code files must be added and committed frequently to the repository. To capture the output of git log, use redirection (git log > git.log) and hand in the git.log file.
Assignments will be graded on a 10-point basis. The due date is December 5th, 2024 (midnight). Late assignments will not be accepted.