MSC1090 - Fall 2024

Assignment 11

Completion requirements
Opened: Thursday, 28 November 2024, 12:00 AM
Due: Thursday, 5 December 2024, 11:59 PM

Due date: Thursday, December 5th, 2024 at 11:59 PM. Late assignments will NOT be accepted!


Let us continue studying the codon data set introduced in Assignment 10. The modified data can be found here.


0) You must use version control ("git") as you develop your code. We suggest you start, from the Linux command line, by creating a new directory (e.g. assignment11), changing into that directory with cd, and initializing a git repository within it ("git init"). Then run "git add ..." and "git commit" repeatedly as you add to your code. You will hand in the output of "git log" for your assignment repository as part of the assignment. Your log must show a significant number of commits, each representing a change to your code and each with a meaningful commit message. If it does not, you will lose marks.
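
The workflow suggested in step 0 might look like the following minimal sketch. The directory name assignment11 comes from the text above; the user name, email address, file contents, and commit messages are placeholders chosen for illustration only, and the two "git config" lines are an assumption needed only if git does not already know your identity.

```shell
# Create the assignment directory and start a repository inside it.
mkdir -p assignment11
cd assignment11
git init

# Identity used for commits in this repository only (placeholder values).
git config user.name "Your Name"
git config user.email "you@example.com"

# Start with a small stub file and commit it.
echo "# Helper functions for the codon classification." > Codon.Utilities.R
git add Codon.Utilities.R
git commit -m "Add Codon.Utilities.R with a header comment"

# Repeat "git add" and "git commit" after each meaningful change.
echo "# TODO: port the data-loading function from Assignment 10." >> Codon.Utilities.R
git add Codon.Utilities.R
git commit -m "Note the data-loading function to port from Assignment 10"
```

Committing after each small, working change, with a message that says what changed, is what produces the kind of log the assignment asks for.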


1) Create a file named Codon.Utilities.R containing whatever functions you think you need to implement a modular solution to this problem.

You may reuse and modify your data-loading function from Assignment 10.


2) Create an R script called Codon.Classification.R that will perform the following steps:

  1. Load and return the modified data, linked above. The data should be split into training and testing data sets.
  2. Three different classification models should be trained on the training data. The models should include a decision tree and cross-validated kNN and Support Vector Machine models. The target should be the 'Kingdom' column.
  3. The models should each be tested against the test data. The accuracy of each model should be printed out, along with sentences describing what is happening as the script runs.
  4. A final sentence should be printed out, declaring the model with the highest accuracy.

Be sure to comment your code. The usual defensive programming is required.


Submit your Codon.Utilities.R and Codon.Classification.R files, along with the output of git log from your assignment repository.

Both R code files must be added and committed frequently to the repository. To capture the output of git log, use redirection (git log > git.log) and hand in the resulting git.log file.
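
As a rough illustration of the redirection step, the sketch below builds a throwaway repository so that it runs on its own; the directory name gitlog-demo, the identity values, and the commit message are all placeholders. In practice only the "git log > git.log" line is needed, run from inside your assignment repository.

```shell
# Throwaway repository so the example is self-contained (placeholder names).
mkdir -p gitlog-demo
cd gitlog-demo
git init
git config user.name "Your Name"
git config user.email "you@example.com"
echo "# placeholder" > Codon.Classification.R
git add Codon.Classification.R
git commit -m "Add Codon.Classification.R placeholder"

# Write the commit history to the file git.log instead of the terminal.
git log > git.log

# Optional check: count the commits recorded in the file.
grep -c "^commit " git.log
```

Because git.log is created after the last commit, it shows up as an untracked file; it is handed in alongside the R files rather than committed.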

Assignments will be graded out of 10 points. The due date is December 5th, 2024 at 11:59 PM. Late assignments will not be accepted.

All content on this website is made available under the Creative Commons Attribution 4.0 International licence, with the exception of all videos which are released under the Creative Commons Attribution-NoDerivatives 4.0 International licence.