Assignment 10
Due date: Tuesday, April 9th at midnight (Tuesday night).
0) You must use version control ("git"), as you develop your code. We suggest you start, from the Linux command line, by creating a new directory, e.g. assignment10, cd into that directory and initialize a git repository ("git init"
) within it, and perform "git add ..., git commit"
repeatedly as you add to your code. You will hand in the output of "git log"
for your assignment repository as part of the assignment. You must have a significant number of commits representing the modifications, alterations and changes to your code. If your log does not show a significant number of commits with meaningful comments you will lose marks.
In this assignment we will run a Markov Chain Monte Carlo analysis to examine the German tank problem. The problem has a long history, but in summary: during World War II, the Allied forces captured or destroyed German tanks. Those tanks had sequential serial numbers. The Allies wanted to estimate the number of tanks that had been manufactured based on the serial numbers of the tanks that had been captured or destroyed.
The goal of this assignment is to determine \(N\), the number of tanks that have been manufactured. To do this we will use a Markov Chain Monte Carlo Bayesian approach. Our Markov Chain will search through the possible values of \(N\), and then be used to determine the mean and median values of the number of tanks produced.
Let us assume that the serial numbers of the tanks that have been captured or destroyed are 10, 15, 101, 124, 256, 202, 97, 200, 220, 121, 111, 198, 151. Let us assume that the tanks that are in the field of battle have been randomly shuffled.
1) Create a file named german_tank_utilities.py
containing the following functions.
1a) Write a function whose purpose is to return the above list of tank serial numbers.
1b) Write a function that will calculate and return the value of our likelihood function, \(L = P(\mathbf{x}|N)\), where \(\mathbf{x}\) are the tank serial numbers. This function will calculate the probability of getting the serial numbers \(\mathbf{x}\) given some value of \(N\).
As it happens, the functional form of this probability has already been calculated in the Wikipedia article linked above, and is given by
$$P(\mathbf{x}|N) = \frac{{m - 1 \choose k - 1}}{{N \choose k}}$$
where \(m\) is the maximum value of \(\mathbf{x}\) and \(k\) is the number of elements in \(\mathbf{x}\). The math.comb()
function may be helpful here. Also note that for the case of \(N\) < \(m\) the value of the likelihood is zero.
1c) Write a function which will return the value of your prior. To do this, choose a sensible functional form for the prior, based on your existing knowledge of the problem.
1d) Write a sampling function which will determine the next value of \(N\), and return it. The scipy.stats.randint.rvs()
function may be helpful here.
2) Create a Python script, named run_german_tank.py
, which:
- Reads in the list of tank serial numbers,
- initializes \(N\),
- Loops 10000 times, during which the code will perform the Metropolis algorithm, updating the value of \(N\) appropriately. The chain of values of \(N\) should be kept. The
random.random()
function may be helpful here. - Once the looping is complete, the code should print out the mean and median values of \(N\), less an appropriate burn-in period.
My code looked like this when I ran it.
$
$ python run_german_tank.py
The mean number of tanks is 274.94278095661355
The median number of tanks is 270.0
$
Note that the theoretical mean value for our particular data is about 278, with a standard deviation of about 25.
Submit your german_tank_utilities.py
and run_german_tank.py
files, and the output of git log
from your assignment repository.
Both Python code files must be added and committed frequently to the repository. To capture the output of git log
use redirection (git log > git.log
, and hand in the git.log
file).
Assignments will be graded on a 10 point basis. Due date is April 9th 2024 (midnight), with 0.5 penalty point per day off for late submission until the cut-off date of April 11th 2024, at midnight.