Assignment 6
Note: late assignments will not be accepted!
Let us consider the following situation: you've bought a half dozen eggs. You're worried about salmonella contamination, so you take DNA samples from the 6 eggs. You decide to test which eggs contain salmonella by aligning the DNA samples against a reference salmonella genome and a reference chicken genome.
For this assignment, we will not get our genomes from GenBank. Instead, we will use the data contained in the file a6data.tar.gz. This archive contains the following files:
- salmonella.fa, which contains the salmonella genome,
- chicken.fa, which contains the chicken genome (for the purpose of this assignment, chicken.fa is actually only one tenth of one of the chicken's chromosomes), and
- eggX-fragmentYY.fa, which contain the sequences sampled from the eggs. There are roughly 40 fragments per egg, each about 150 nucleotides long. Do not hard-code these file names, but rather search for them using dir and startsWith.
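As a rough illustration of the file search described above, here is a minimal sketch. The helper name find_fragment_files, the throwaway demo directory, and the idea of grouping files by the text before the first "-" are assumptions made for this example, not part of the assignment.

```r
# Hypothetical helper: list the fragment files in a directory and group
# them by the egg they belong to. Assumes names look like eggX-fragmentYY.fa.
find_fragment_files <- function(data_dir) {
  all_files <- dir(data_dir)
  # Keep only names that begin with "egg"; this skips the reference genomes.
  fragment_files <- all_files[startsWith(all_files, "egg")]
  # The egg label is everything before the first "-", e.g. "egg1".
  egg_labels <- sub("-.*$", "", fragment_files)
  split(file.path(data_dir, fragment_files), egg_labels)
}

# Demo on a temporary directory filled with empty, made-up files.
demo_dir <- file.path(tempdir(), "a6demo")
dir.create(demo_dir, showWarnings = FALSE)
demo_names <- c("salmonella.fa", "chicken.fa",
                "egg1-fragment01.fa", "egg1-fragment02.fa",
                "egg2-fragment01.fa")
invisible(file.create(file.path(demo_dir, demo_names)))

fragments_by_egg <- find_fragment_files(demo_dir)
print(names(fragments_by_egg))
```

Grouping the paths by egg up front is one possible design; it lets later steps loop over eggs rather than over individual files.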
Your task is to write a driver function, using as many other functions as you think are necessary to perform modular programming, which will
- build the BLAST indices for the two reference genomes. You may hard-code the reference files. You may also assume that the 'data' directory is located in the same directory as your driver function.
- use rBLAST to align the fragments with the references,
- count the number of matches for each reference,
- for each egg, use the ratio of the number of hits against the salmonella reference to the number of hits against the chicken reference as a measure of the rottenness of that egg, and
- print out the rottenness of each egg, sorted from freshest to most rotten.
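The last two steps in the list above (computing the ratio and sorting) might look roughly like the sketch below. The hit counts are invented purely for illustration, and the column names are assumptions; real counts would come from the alignment step.

```r
# Made-up hit counts for three eggs (illustration only).
hit_counts <- data.frame(
  egg = c("egg1", "egg2", "egg3"),
  salmonella_hits = c(2, 30, 0),
  chicken_hits = c(38, 10, 40),
  stringsAsFactors = FALSE
)

# Rottenness = salmonella hits divided by chicken hits.
hit_counts$rottenness <- hit_counts$salmonella_hits / hit_counts$chicken_hits

# order() returns the row indices that sort rottenness in increasing order,
# i.e. from freshest to most rotten.
sorted_eggs <- hit_counts[order(hit_counts$rottenness), ]

for (i in seq_len(nrow(sorted_eggs))) {
  cat(sprintf("%s: rottenness = %.3f\n",
              sorted_eggs$egg[i], sorted_eggs$rottenness[i]))
}
```

Note that an egg with zero chicken hits would produce Inf or NaN under this ratio, so a real solution may want to decide how to handle that case.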
In this assignment, you'll be using the rBLAST library to perform alignment. You will need BLAST installed on your laptop. Some functions I found helpful were dir, startsWith, and order. There are many warnings and errors generated when this code runs, which is annoying. I found the 'silent' flag in the predict function to be helpful, as well as the suppressWarnings function.
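To show how suppressWarnings behaves, here is a small sketch in base R. The function noisy_step is a hypothetical stand-in for any call that emits warnings; it does not use rBLAST.

```r
# Hypothetical stand-in for a call that warns but still returns a useful value.
noisy_step <- function() {
  warning("something harmless happened")
  42
}

# suppressWarnings() evaluates the expression, discards any warnings,
# and passes the return value through unchanged.
result <- suppressWarnings(noisy_step())
print(result)
```

Because suppressWarnings hides every warning raised inside the wrapped expression, it is probably safest to wrap only the specific call that is known to be noisy.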
Keep best practices in mind, i.e., use functions, comment your code, and use good names for variables and functions.
Submit your code by May 3rd, 2023, at 23:55. Late assignments will not be accepted!