6. Classify downloaded protein structures

Completion requirements
Opened: Sunday, 10 April 2022, 12:00 AM
Due: Sunday, 17 April 2022, 11:59 PM

In this assignment, you'll try out the k-means method on protein structures. The protein structures to use are those of human hemoglobin, as found in the PDB at rcsb.org by searching for "hemoglobin" and selecting "Homo sapiens" and the best refinement resolution (<1.5 Angstrom). This results in the following 44 PDBIDs:

6LCX, 6LCW, 6KAO, 6KAP, 6L5V, 6KA9, 6KAI, 6KAH, 6KAE, 3S66, 7JY3,
2D5Z, 2W72, 7DY4, 7DY3, 2DN2, 2DN1, 2DN3, 1J40, 1J41, 1IRD, 5QR5,
5QQR, 5QR1, 5QQY, 4Y08, 6TX8, 4HF3, 5O10, 3ZOO, 5TY3, 3TEM, 6YA6,
3UVC, 4GR8, 7N5O, 6H5W, 6E4F, 4QC4, 6J6M, 7P8X, 7MU3, 3U9W, 4AJX

Your task is to write a script that

  • Uses Biopython's Bio.PDB module to download the protein structures with these PDBIDs
  • For each, extracts the 3-dimensional positions of all the atoms
  • Computes or determines N, the number of atoms stored in the structure, and M, the mean square displacement of atoms from the center, defined as 
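$$ M = \frac{1}{N} \sum_{i=1}^{N} \left[ (x_{i} - \langle x \rangle)^{2} + (y_{i} - \langle y \rangle)^{2} + (z_{i} - \langle z \rangle)^{2} \right] $$

where

$$ \langle x \rangle = \frac{1}{N} \sum_{i=1}^{N} x_{i},\quad \langle y \rangle = \frac{1}{N} \sum_{i=1}^{N} y_{i},\quad \langle z \rangle = \frac{1}{N} \sum_{i=1}^{N} z_{i} $$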

  • With these 44 pairs of (N,M), performs a k-means clustering with the number of clusters set to 3, 4, 5, and 6.
  • Produces plots of the results. The plots should be scatter plots of M and N, with the colour of each point determined by the cluster number found by the k-means method.
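
The computation of N and M from a set of atom positions can be sketched as below. This is only an illustrative sketch, not a reference solution: the function names `atom_count_and_msd` and `load_coords` and the download directory `pdb_files` are made up for this example, and `load_coords` assumes that Biopython is installed, that the network is reachable, and that each entry is available in PDB format. Which atoms to count (waters, hetero atoms, alternate locations) is a choice the task leaves open; the sketch simply takes every atom that `get_atoms()` yields.

```python
from math import fsum


def atom_count_and_msd(coords):
    """Return (N, M) for a sequence of (x, y, z) atom positions."""
    coords = [tuple(map(float, c)) for c in coords]
    n = len(coords)
    if n == 0:
        raise ValueError("structure contains no atoms")
    # Centre of the atom positions: (<x>, <y>, <z>)
    center = [fsum(c[k] for c in coords) / n for k in range(3)]
    # Mean over all atoms of the squared distance to the centre
    m = fsum((c[k] - center[k]) ** 2 for c in coords for k in range(3)) / n
    return n, m


def load_coords(pdb_id, pdir="pdb_files"):
    """Download one entry and return its atom positions (hypothetical helper).

    Assumes Biopython and network access; imported lazily so that the
    check below runs without either.
    """
    from Bio.PDB import PDBList, PDBParser

    path = PDBList().retrieve_pdb_file(pdb_id, pdir=pdir, file_format="pdb")
    structure = PDBParser(QUIET=True).get_structure(pdb_id, path)
    return [atom.get_coord() for atom in structure.get_atoms()]


# Hand-made check: four points, each at distance 1 from their centre (1, 1, 0)
toy = [(0, 1, 0), (2, 1, 0), (1, 0, 0), (1, 2, 0)]
print(atom_count_and_msd(toy))  # (4, 1.0)
```

With the assumptions above, the 44 pairs could then be collected with something like `[atom_count_and_msd(load_coords(pdb_id)) for pdb_id in pdb_ids]`, where `pdb_ids` is the list of PDBIDs given earlier.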

Your script may combine the four plots into a single figure using subplots if you want. Submit your script and the four plots (or the combined figure) by April 17, 2022 at 23:55.
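
As an illustration of the clustering and plotting steps, the sketch below runs a small hand-written k-means (Lloyd's algorithm) on a list of (N, M) pairs and draws one subplot per number of clusters. It is a sketch under stated assumptions, not the expected solution: the names `kmeans` and `plot_clusterings`, the output file name, the 2x2 layout, and the choice of N on the horizontal axis are all made up for this example, and the raw (N, M) values are clustered without any rescaling. In practice a library routine such as scikit-learn's `KMeans` would be the more usual choice; the hand-written version is used here only to keep the example free of dependencies other than matplotlib.

```python
import random


def kmeans(points, k, n_iter=100, seed=0):
    """Lloyd's algorithm on 2-D points; returns one cluster label per point."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # start from k randomly chosen data points
    labels = None
    for _ in range(n_iter):
        # Assignment step: nearest centre by squared Euclidean distance
        new_labels = [
            min(
                range(k),
                key=lambda j: (p[0] - centers[j][0]) ** 2 + (p[1] - centers[j][1]) ** 2,
            )
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: move each centre to the mean of its members
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centre if a cluster went empty
                centers[j] = (
                    sum(p[0] for p in members) / len(members),
                    sum(p[1] for p in members) / len(members),
                )
    return labels


def plot_clusterings(points, ks=(3, 4, 5, 6), filename="kmeans_hemoglobin.png"):
    """Scatter M against N once per k, coloured by cluster (needs matplotlib)."""
    import matplotlib.pyplot as plt  # imported lazily; only needed for plotting

    n_vals = [p[0] for p in points]
    m_vals = [p[1] for p in points]
    fig, axes = plt.subplots(2, 2, figsize=(10, 8))
    for ax, k in zip(axes.flat, ks):
        ax.scatter(n_vals, m_vals, c=kmeans(points, k), cmap="tab10")
        ax.set_title(f"k = {k}")
        ax.set_xlabel("N (number of atoms)")
        ax.set_ylabel("M (mean square displacement)")
    fig.tight_layout()
    fig.savefig(filename)


# Made-up (N, M) pairs forming two obvious groups, just to exercise kmeans
toy = [(1000, 10.0), (1001, 10.0), (5000, 50.0), (5001, 50.0)]
print(kmeans(toy, 2))
```

Calling `plot_clusterings(points)` with the 44 real (N, M) pairs would write a single figure with four panels. Because N and M have very different magnitudes, the clustering of raw values is dominated by whichever quantity varies more; whether to rescale is left open here.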

All content on this website is made available under the Creative Commons Attribution 4.0 International licence, with the exception of all videos which are released under the Creative Commons Attribution-NoDerivatives 4.0 International licence.