Assignment 5
Due date: Thursday, February 16th at midnight (Thursday night).
Unexpected things can happen in real life when people are not careful with statistics. One example is the excessively large rolldown mistake made by the Massachusetts and Michigan lotteries in the early 2000s. If you are interested in the details, you can read the Massachusetts government's report, which was commissioned in the aftermath of the media uproar.
As you may be aware, in a lottery where you pick \(k\) numbers, a contestant picks \(k\) numbers out of \(N\). If those \(k\) numbers are then drawn, the contestant wins the jackpot, the biggest prize. If \(k - 1\) of the numbers are drawn, the contestant wins a smaller prize; if \(k - 2\) are drawn, an even smaller prize; and so on. If the jackpot is not won in a given draw, it increases for the next draw. The lotteries mentioned in the previous paragraph included a rolldown rule in their design: if the jackpot grew too large, the extra money would "roll down" to the higher-probability wins. This meant that matching 5, 4, or 3 numbers, instead of all 6, paid more than under normal circumstances. The mistake the lotteries made was to make those payouts too big. This is the subject of this assignment.
0) You must use version control (git) as you develop your code. We suggest that you start, from the Linux command line, by creating a new directory (e.g. assignment5), changing into it with cd, initializing a git repository within it with "git init", and then running "git add ..." and "git commit" repeatedly as you add to your code. You will hand in the output of "git log" for your assignment repository as part of the assignment. Your log must show a significant number of commits representing the changes to your code; if it does not show a significant number of commits with meaningful commit messages, you will lose marks.
1) Create a file named Lottery.Utilities.R containing the following functions.
1a) Write a function which calculates the probability of getting \(m\) out of \(k\) numbers correct, when \(k\) numbers are drawn from a pool of \(N\) numbers. This is the same as the "if a bag contains N marbles, k of which are red, and you pick k marbles from the bag, what is the probability of picking m red marbles" problem. The probability is given by the expression
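$$P(m, k, N) = \frac{{k \choose m}{N - k \choose k - m}}{{N \choose k}}$$
where \({N \choose k}\) ("N choose k") is the number of ways of picking \(k\) numbers from a pool of \(N\). The numerator counts the tickets that contain \(m\) of the \(k\) winning numbers and \(k - m\) of the \(N - k\) other numbers. The choose() function will be useful here.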
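A minimal sketch of the formula above in R, assuming the hypothetical name lottery.prob; it is a direct transcription of \(P(m, k, N)\) using choose(), and because choose() is vectorized it also accepts a vector of values for \(m\).

```r
# P(m, k, N): probability that exactly m of the k numbers on a ticket are
# among the k numbers drawn from a pool of N (a hypergeometric probability).
# The name lottery.prob is only illustrative.
lottery.prob <- function(m, k, N) {
  choose(k, m) * choose(N - k, k - m) / choose(N, k)
}

# Usage for a 6/46 lottery: probability of matching all 6 numbers.
lottery.prob(6, 6, 46)
```

A quick sanity check is that the probabilities for \(m = 0, 1, \dots, k\) sum to 1, e.g. sum(lottery.prob(0:6, 6, 46)).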
1b) If we consider a lottery ticket to be an investment, we can define the "return" on the investment as the amount of money we get from the lottery minus the amount we invested. On average, the amount of money we get from a ticket is the sum, over every possible way of winning, of the award multiplied by the probability of winning it:
$$R = p_kA_k + p_{k-1}A_{k-1} + \dots + p_1A_1 - C$$
where \(R\) is the average return per ticket, \(p_m = P(m, k, N)\) is the probability of getting \(m\) numbers correct, \(A_m\) is the award for getting \(m\) numbers correct, and \(C\) is the cost of the ticket. Under normal circumstances, for a properly designed lottery, \(R\) is always negative.
Write a function that takes the arguments \(k\), \(N\), a vector of awards, and the cost per ticket, and returns the average return for a lottery ticket. You may assume that the awards vector is ordered by the number of correct numbers: the first entry is the award for getting 1 number correct, the second is the award for getting 2 numbers correct, and so on.
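A sketch of the average-return formula, assuming the hypothetical name lottery.average.return. To keep it self-contained it uses R's built-in hypergeometric density, dhyper(m, k, N - k, k), which equals \(P(m, k, N)\); in Lottery.Utilities.R you would call your own 1a) function instead.

```r
# Average (expected) return R for one ticket, following the formula in 1b).
# awards[m] is the prize for exactly m correct numbers, m = 1, ..., k.
# dhyper(m, k, N - k, k) equals P(m, k, N) from 1a).
lottery.average.return <- function(k, N, awards, cost) {
  m <- seq_len(k)                   # 1, 2, ..., k correct numbers
  probs <- dhyper(m, k, N - k, k)   # p_m for each m
  sum(probs * awards) - cost        # p_1 A_1 + ... + p_k A_k - C
}

# Normal (non-rolldown) Massachusetts 6/46 awards, $2 ticket
lottery.average.return(6, 46, c(0, 2, 5, 150, 4000, 500000), 2)  # about -1.258585
```

Because awards and probs are both vectors of length \(k\), the sum is computed in one vectorized expression rather than a loop.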
1c) Write a function that randomly draws \(k\) distinct numbers from the numbers 1 through \(N\) and returns them. The built-in sample() function will be useful here.
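A minimal sketch, assuming the hypothetical name lottery.draw. It uses sample.int(), the integer form of sample(); sample(N, k) would behave the same here, since sample() treats a single number N as 1:N, and both sample without replacement by default.

```r
# Draw k distinct numbers from 1, ..., N (sampling without replacement).
# The name lottery.draw is only illustrative.
lottery.draw <- function(k, N) {
  sample.int(N, k)
}

set.seed(1)          # make the example reproducible
lottery.draw(6, 46)
```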
1d) Write a function that takes two vectors as arguments and returns the number of values the two vectors have in common. The %in% operator is useful here.
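One way to use %in% for this, sketched with the hypothetical name count.matches and assuming, as in a lottery, that neither vector contains repeated values.

```r
# Count how many values the two vectors have in common.
# ticket %in% winning is TRUE wherever a ticket number was drawn;
# sum() counts the TRUE entries. The name count.matches is illustrative.
count.matches <- function(ticket, winning) {
  sum(ticket %in% winning)
}

count.matches(c(3, 17, 1, 42, 6, 30), 1:6)   # 3 (the numbers 3, 1 and 6)
```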
1e) Let us assume you bought \(X\) lottery tickets, each with a random set of \(k\) numbers. Suppose that the winning draw turned out to be the numbers 1 through \(k\). Write a function that takes as its arguments \(X\), \(k\), \(N\), a vector of awards, and the cost per ticket, and
- randomly generates a lottery ticket's numbers,
- compares the ticket numbers to the winning numbers, 1 through \(k\),
- calculates the return for the given ticket,
- repeats the above for \(X\) tickets, and returns the total return (total winnings minus total cost) of the \(X\) tickets.
Note that this function should not use the function from part 1b): that function computes the average (expected) return per ticket, whereas this one computes the actual return of specific, randomly generated tickets.
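The steps listed above can be sketched as a simple loop, assuming the hypothetical name simulate.tickets. The draw and match steps are written inline (mirroring 1c and 1d) so the sketch stands alone; a ticket with no correct numbers is assumed to win nothing, since the awards vector starts at 1 correct number.

```r
# Total return from X random tickets, when the winning numbers are 1, ..., k.
# awards[m] is the prize for exactly m correct numbers.
# The name simulate.tickets is only illustrative.
simulate.tickets <- function(X, k, N, awards, cost) {
  winning <- seq_len(k)                    # winning draw: 1 through k
  total <- 0
  for (i in seq_len(X)) {
    ticket <- sample.int(N, k)             # random ticket (as in 1c)
    m <- sum(ticket %in% winning)          # correct numbers (as in 1d)
    award <- if (m > 0) awards[m] else 0   # no prize for 0 matches
    total <- total + award - cost          # add this ticket's return
  }
  total
}

set.seed(1)
simulate.tickets(1000, 6, 46, c(0, 2, 5, 150, 4000, 500000), 2)
```

Since every ticket costs \(C\) and no award is negative, the result can never be below \(-XC\), which is a handy check on your own implementation.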
We will use the Massachusetts lottery mentioned above as our case study. The lottery was a 6/46 game, meaning 6 numbers were drawn from the numbers 1 through 46, and the cost per ticket was $2. Under normal, non-rolldown circumstances, the awards for getting 1 through 6 numbers correct were [0, 2, 5, 150, 4000, 500000], in dollars, where the jackpot listed here is the minimum; the jackpot could rise to as much as $2,000,000, at which point a rolldown would occur. Under a rolldown, the non-jackpot awards changed depending on the number of tickets sold, but an example awards vector might be [0, 2, 27, 807, 22096, 2000000].
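As a hand check of the formula in 1b) for this case study: with \(N = 46\) and \(k = 6\) there are \({46 \choose 6} = 9366819\) possible tickets, of which \({6 \choose m}{40 \choose 6 - m}\) match exactly \(m\) numbers, namely 3948048, 1370850, 197600, 11700, 240 and 1 for \(m = 1, \dots, 6\). With the normal awards vector the average return is

$$R = \frac{0 \cdot 3948048 + 2 \cdot 1370850 + 5 \cdot 197600 + 150 \cdot 11700 + 4000 \cdot 240 + 500000 \cdot 1}{9366819} - 2 = \frac{6944700}{9366819} - 2 \approx -1.2586$$

which agrees with the sample script output in part 2). Repeating the calculation with the example rolldown awards vector gives \(\frac{24821840}{9366819} - 2 \approx +0.65\), a positive average return.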
2) Create an R script called Lottery.Analysis.R that performs the following steps:
- sources your utilities file, Lottery.Utilities.R,
- takes an argument from the command line. If the argument is "Normal", the script performs the remaining steps below using the non-rolldown awards vector, [0, 2, 5, 150, 4000, 500000]. If the argument is "Rolldown", it performs them using the rolldown awards vector, [0, 2, 27, 807, 22096, 2000000]. For any other (or missing) argument, the script stops with an error message. The script should also print which case it is using, e.g. "Using the rolldown awards vector."
- calculates the average return for a ticket using the selected awards vector, and prints a sentence describing the result: "The average return for a ticket is ..."
- calculates the total return from buying 1000 tickets, and repeats this 1000 times, giving a vector of 1000 totals.
- prints a sentence describing the average of these 1000-ticket totals: "The average return for the 1000 tickets is ...".
- runs the hist() function on the resulting 1000 totals.
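The defensive handling of the command-line argument could look like the following sketch, which assumes the hypothetical helper name select.awards. It returns the chosen awards vector and stops with an error for a missing, extra, or unrecognized argument.

```r
# Map the command-line case name to an awards vector, stopping with an error
# for anything other than exactly one of "Normal" or "Rolldown".
# The name select.awards is only illustrative.
select.awards <- function(args) {
  if (length(args) != 1) {
    stop("Usage: Rscript Lottery.Analysis.R Normal|Rolldown")
  }
  if (args == "Normal") {
    cat("Using the normal awards vector.\n")
    c(0, 2, 5, 150, 4000, 500000)
  } else if (args == "Rolldown") {
    cat("Using the rolldown awards vector.\n")
    c(0, 2, 27, 807, 22096, 2000000)
  } else {
    stop("The argument must be either 'Normal' or 'Rolldown'.")
  }
}

# In the script itself this would be called as:
# awards <- select.awards(commandArgs(trailingOnly = TRUE))
```

With the awards vector in hand, the 1000 repetitions could be collected with totals <- replicate(1000, simulate.tickets(1000, 6, 46, awards, 2)), where simulate.tickets is a placeholder name for your 1e) function; mean(totals) and hist(totals) then cover the last two steps.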
Your script should output something like this, when run from the shell terminal:
$
$ Rscript Lottery.Analysis.R Normal
Using the normal awards vector.
The average return for a ticket is -1.258585
The average return for the 1000 tickets is -1326.219
$
Note that when you run this script from the bash prompt with Rscript, R writes the histogram to a file named "Rplots.pdf" in the current directory. Experiment with the number of bins in your histogram (the breaks argument of hist()) until the plot is easy to read. Your plot should look something like this:
[Figure: example histogram of the 1000 simulated 1000-ticket returns]
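A small sketch of adjusting the bin count, assuming the hypothetical name draw.return.histogram. When breaks is a single number, hist() treats it only as a suggestion and picks "pretty" break points near that many bins, so it is worth trying a few values.

```r
# Histogram of the simulated 1000-ticket returns. n.bins is passed to
# hist() as 'breaks', i.e. a suggested number of bins.
# The name draw.return.histogram is only illustrative.
draw.return.histogram <- function(returns, n.bins = 50) {
  hist(returns,
       breaks = n.bins,
       main = "Return from 1000 tickets (1000 repetitions)",
       xlab = "Total return (dollars)")
}
```

For example, draw.return.histogram(totals, n.bins = 100), where totals is your vector of 1000 results, draws a finer histogram than the default.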
Be sure to comment and document your functions. Defensive programming of the functions is not required, but defensive programming of the script is expected.
Submit your Lottery.Utilities.R and Lottery.Analysis.R files, and the output of git log from your assignment repository.
Both R code files must be added and committed frequently to the repository. To capture the output of git log, use redirection (git log > git.log) and hand in the git.log file.
Assignments will be graded on a 10-point scale. The due date is February 16th, 2023 (midnight), with a penalty of 0.5 points per day for late submissions, up to the cut-off date of February 23rd, 2023, at 11:00am.