In this assignment, you'll be using the Blast submodule of Biopython.
The use case is as follows:
Given a number of batches of DNA fragments taken from 12 eggs, we want to determine which batches contain salmonella by aligning the fragments against a reference salmonella genome and a reference chicken genome.
For the purpose of this assignment, do not get the reference genomes from GenBank; instead, use the following files:
- genome.fa for the salmonella genome and
- chromosome1.fa for the chicken genome
(for the purpose of this assignment, chromosome1.fa is actually only one tenth of one of the chicken's chromosomes).
The data from the experiment are as follows:
- We have 12 batches (one per egg) of roughly 40 DNA fragments.
- Each fragment is 150 bases long.
Goal: find out which of the eggs is/are contaminated with salmonella.
All the data can be found at /scinet/course/bch2203/a4data.zip on the teach cluster, or in the zip file a4data.zip below.
Your task is to write a script that uses Biopython's Bio.Blast module to build indices for the two reference genomes (which are given as FASTA files), and then to use blastn to align the fragments with the references and count the matches, all within your Python script. For each egg, use the ratio of the number of hits against salmonella to the number of hits against chicken as a measure of the rottenness of that egg.
Print the rottenness of each egg, with the eggs sorted from freshest to most rotten.
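To make the build-index, align and count steps concrete, here is a minimal sketch of the underlying BLAST+ calls. It deliberately uses only the standard library (subprocess) to show which commands are run; it is not the Biopython solution the assignment asks for. In Biopython releases that still ship Bio.Blast.Applications, the wrappers NcbimakeblastdbCommandline and NcbiblastnCommandline accept the same options as keyword arguments. The database name, the e-value threshold of 1e-10, and the choice to count distinct fragments (rather than individual alignments) are assumptions made for illustration only.

```python
import subprocess


def makeblastdb_command(fasta_path, db_name):
    """Return the argument list that builds a nucleotide BLAST database."""
    return ["makeblastdb", "-in", fasta_path, "-dbtype", "nucl", "-out", db_name]


def blastn_command(query_path, db_name, evalue=1e-10):
    """Return the argument list for a blastn search with tabular output.

    The e-value cutoff is an assumed value, not one given in the assignment.
    """
    return ["blastn", "-query", query_path, "-db", db_name,
            "-outfmt", "6", "-evalue", str(evalue)]


def count_fragments_with_hits(tabular_output):
    """Count distinct query fragments that have at least one alignment.

    In tabular output (-outfmt 6) each line is one alignment and the
    first tab-separated column is the query ID.
    """
    query_ids = set()
    for line in tabular_output.splitlines():
        if line.strip():
            query_ids.add(line.split("\t")[0])
    return len(query_ids)


def build_database(fasta_path, db_name):
    """Run makeblastdb; requires the BLAST+ tools to be on the PATH."""
    subprocess.run(makeblastdb_command(fasta_path, db_name), check=True)


def count_hits(query_path, db_name):
    """Run blastn for one batch of fragments and count the matching fragments."""
    result = subprocess.run(blastn_command(query_path, db_name),
                            capture_output=True, text=True, check=True)
    return count_fragments_with_hits(result.stdout)
```

With the BLAST+ tools installed, build_database("genome.fa", "salmonella_db") would be called once per reference genome, and count_hits would be called once per egg and per database, with the path to that egg's fragment file as the query.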
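The rottenness ratio and the sorted report could be sketched as follows. The function names and the hit counts in the example are made up for illustration and are not results from the actual data; the handling of an egg with zero chicken hits is likewise an assumption, since the assignment does not specify it.

```python
def rottenness(salmonella_hits, chicken_hits):
    """Return the ratio of salmonella hits to chicken hits for one egg.

    Assumption: with no chicken hits, the egg counts as infinitely rotten
    if it has any salmonella hits and as perfectly fresh (0.0) otherwise.
    """
    if chicken_hits == 0:
        return float("inf") if salmonella_hits > 0 else 0.0
    return salmonella_hits / chicken_hits


def sort_eggs_by_rottenness(hit_counts):
    """Return (egg, rottenness) pairs ordered from freshest to most rotten.

    hit_counts maps an egg label to a (salmonella_hits, chicken_hits) tuple.
    """
    scores = {egg: rottenness(salmonella, chicken)
              for egg, (salmonella, chicken) in hit_counts.items()}
    return sorted(scores.items(), key=lambda item: item[1])


def print_report(hit_counts):
    """Print one line per egg, freshest first."""
    for egg, score in sort_eggs_by_rottenness(hit_counts):
        print(f"{egg}: rottenness {score:.3f}")


if __name__ == "__main__":
    # Hypothetical counts, for demonstration only.
    example_counts = {"egg1": (0, 38), "egg2": (12, 30), "egg3": (3, 36)}
    print_report(example_counts)
```

Keeping the ratio, the sorting and the printing in separate functions makes each piece easy to test on its own, in line with the best practices asked for in this assignment.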
Keep best practices in mind, i.e., use functions, document and comment your code, use good names for variables and functions.
Submit your script by March 17, 2022 at 11:55 PM.
- 10 March 2022, 11:37 PM