Assignment 3
In this assignment you will write functions that produce professional-quality plots. Ideally, you could reuse this approach for any figure you need for your papers, thesis, etc. To that end, I invite you to use a representative sample of your own data. It doesn't need to be unpublished or new data, just something that resembles the data you deal with in your research. If you don't have any data available, you can use other data close to your interests, either from the R data sets or from other sources, such as Open Data Toronto or the UCI Machine Learning Repository. If you use one of the R data sets, do not use the ones we have presented and discussed in class, and dump the data to a file so that you have a data file to load.
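If you go the R data set route, dumping the data to a file could look roughly like the sketch below. The function name dump_dataset and the choice of airquality are placeholders for illustration only (they are not part of the assignment), and the file is written to a temporary directory just to keep the example self-contained.

```r
# Hypothetical helper: write a data frame to a CSV file so it can be loaded later.
# The name dump_dataset is an assumption made for this sketch.
dump_dataset <- function(data, path) {
  stopifnot(is.data.frame(data), is.character(path), length(path) == 1)
  write.csv(data, file = path, row.names = FALSE)
  invisible(path)
}

# 'airquality' is only a stand-in; choose a data set that was NOT discussed in class.
out_file <- dump_dataset(datasets::airquality,
                         file.path(tempdir(), "airquality.csv"))
```

In your submission you would write the file next to your code rather than into tempdir(), so that the data file can be handed in with the script.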
If you need some inspiration, we invite you to visit our "Visualization Gallery", which is composed entirely of outstanding submissions from students in previous years (maybe next year your plots could be displayed there as well).
Create a utilities file, named plotting.utilities.R, to contain the code for this assignment. Create as many functions as necessary for your code to be modular. A minimum of four functions is expected.
1a) Create a function which takes a single string as an argument: the name of the file to be read. The function should read in the associated file, print out a nice sentence saying which file is being processed, and return the resulting data.
Within the comments of this function, describe what the data is, what it represents, how it was collected, and any other information needed to understand the data. This is for my edification, so that I have a better idea of what I'm looking at.
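A minimal sketch of such a reader is shown below, assuming the data is stored as a CSV file. The function name read_data_file is hypothetical, and the comment block is only a template for the data description you are asked to write.

```r
# Data description (template; replace with the details of your own data):
#   - what the data is and what it represents
#   - how it was collected
#   - anything else needed to understand the columns and units
#
# Hypothetical reader, assuming a CSV file; adapt the read call to your format.
read_data_file <- function(file_name) {
  # Defensive checks on the single string argument.
  stopifnot(is.character(file_name), length(file_name) == 1)
  if (!file.exists(file_name)) {
    stop("File not found: ", file_name)
  }

  # Tell the user which file is being processed.
  cat("Processing data file:", file_name, "\n")

  data <- read.csv(file_name)
  return(data)
}
```

If your data comes in another format, only the read.csv call needs to change; the checks, the message, and the return statement stay the same.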
1b) Create a driver function for your assignment. The function should take (at least) two arguments: the name of the data file which contains the data to be plotted, and a flag, 'plot1' or 'plot2', to indicate which plot to create. Depending on the value of the second argument, the function will perform the following actions:
- if the argument is plot1, the function will generate a professional/publication-quality plot, preferably using your own data, following the criteria and conventions discussed in class.
- if the argument is plot2, the function will generate a professional/publication-quality plot of a different type, preferably using your own data, following the criteria and conventions discussed in class. You may use the same data as used for the plot1 argument.
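One possible shape for the driver is sketched below, under the assumption that each plot lives in its own function. All names here (run_assignment, make_plot1, make_plot2, and the out_dir argument) are invented for illustration, and the two plotting functions are left as stubs.

```r
# Stub plotting functions: placeholders for your own code.
make_plot1 <- function(data, out_file) {
  # ... code for the first figure goes here ...
  invisible(out_file)
}

make_plot2 <- function(data, out_file) {
  # ... code for the second figure (a different plot type) goes here ...
  invisible(out_file)
}

# Hypothetical driver: reads the data file, then dispatches on the flag.
run_assignment <- function(data_file, which_plot, out_dir = ".") {
  data <- read.csv(data_file)
  out_file <- file.path(out_dir, paste0(which_plot, ".pdf"))

  # switch() picks the branch matching the flag; the unnamed last
  # argument is the fallback for any other value.
  switch(which_plot,
    plot1 = make_plot1(data, out_file),
    plot2 = make_plot2(data, out_file),
    stop("Unknown flag '", which_plot, "'; expected 'plot1' or 'plot2'.")
  )
}
```

Keeping the dispatch in one place means that adding a third plot later only requires one more branch and one more plotting function.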
Please make sure your plots follow the professional-plotting criteria outlined in class. You can use the basic plotting tools available in R or ggplot2.
Each plot should contain more than one graphical representation, i.e., it cannot be just dots representing the data; it should be something like the data points plus a fit. In other words, at least two graphical representations, or additional statistical results, should be present! Please select an appropriate file type for saving the two plots, one that preserves the quality of your figures!
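As an illustration of "data points plus a fit", the sketch below overlays a linear regression line on a scatter plot using base R graphics and saves the result as a PDF, a vector format that keeps its quality when scaled. The function name plot_with_fit, the column arguments, and the synthetic data are all assumptions made for this example; a linear fit is just one possible second representation.

```r
# Hypothetical plotting function: scatter plot + linear fit, saved as PDF.
plot_with_fit <- function(data, x_col, y_col, out_file) {
  stopifnot(is.data.frame(data), is.character(out_file), length(out_file) == 1)

  # Second graphical representation: a least-squares line through the points.
  fit <- lm(data[[y_col]] ~ data[[x_col]])

  pdf(out_file, width = 6, height = 4)
  on.exit(dev.off())  # close the device even if plotting fails

  plot(data[[x_col]], data[[y_col]], pch = 19,
       xlab = x_col, ylab = y_col)
  abline(fit, lwd = 2, col = "firebrick")
  legend("topleft", legend = c("Data", "Linear fit"),
         pch = c(19, NA), lty = c(NA, 1), lwd = c(NA, 2),
         col = c("black", "firebrick"), bty = "n")

  invisible(fit)
}

# Synthetic data, generated only to exercise the function.
set.seed(1)
demo <- data.frame(x = 1:20)
demo$y <- 2 * demo$x + rnorm(20)
plot_file <- file.path(tempdir(), "plot1.pdf")
fit <- plot_with_fit(demo, "x", "y", plot_file)
```

For a real figure you would replace the column names used as axis labels with descriptive labels that include units, as discussed in class.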
Additionally,
- your main driver program should have defensive programming strategies for dealing with the arguments. The is.character function is useful here.
- the functions should have arguments for receiving information, and return statements in the cases where you need to communicate further information to the rest of the code,
- no global variables of any kind, i.e., functions cannot access variables that are not passed to them.
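The defensive checks could be collected in a small helper along the lines of the sketch below. The name check_driver_args is hypothetical; the point is that everything the function needs arrives through its arguments, and bad input stops the program with a clear message.

```r
# Hypothetical validation helper for the driver's two arguments.
# It uses only what is passed in; no global variables are accessed.
check_driver_args <- function(data_file, which_plot) {
  if (!is.character(data_file) || length(data_file) != 1) {
    stop("'data_file' must be a single character string.")
  }
  if (!file.exists(data_file)) {
    stop("Data file not found: ", data_file)
  }
  if (!is.character(which_plot) || length(which_plot) != 1) {
    stop("'which_plot' must be a single character string.")
  }
  if (!(which_plot %in% c("plot1", "plot2"))) {
    stop("'which_plot' must be either 'plot1' or 'plot2'.")
  }
  invisible(TRUE)
}
```

Calling this helper as the first line of the driver keeps the validation separate from the plotting logic.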
Please submit:
- your plotting.utilities.R code,
- the final products of your R script, i.e. two plot files,
- your data, so that when the script is run it will run successfully.
Assignments will be graded on a 10-point basis.
The due date is March 22, 2023 at midnight, with a 0.5-point penalty per day for late submission until the cut-off date of March 29, 2023 at 9:00am.