Chapter 15 PART 3 - BIOLOGICAL PROBLEMS
15.1 Problem 1 - Count Nucleotides
I have part of a DNA sequence from the SARS-CoV-2 genome. Can you tell me the counts of A, T, G, and C in it?
ATTAAAGGTTTATACCTTCCCAGGTAACAAACCAACCAACTTTCGATCTCTTGTAGATCTGTTCTCTAAACGAACTTTAAAATCTGTGTGGCTGTCACTCGGCTG
CATGCTTAGTGCACTCACGCAGTATAATTAATAACTAATTACTGTCGTTGACAGGACACGAGTAACTCGTCTATCTTCTGCAGGCTGCTTACGGTTTCGTCCGTG
TTGCAGCCGATCATCAGCACATCTAGGTTTCGTCCGGGTGTGACCGAAAGGTAAGATGGAGAGCCTTGTCCCTGGTTTCAACGAGAAAACACACGTCCAACTCAG
TTTGCCTGTTTTACAGGTTCGCGACGTGCTCGTACGTGGCTTTGGAGACTCCGTGGAGGAGGTCTTATCAGAGGCACGTCAACATCTTAAAGATGGCACTTGTGG
CTTAGTAGAAGTTGAAAAAGGCGTTTTGCCTCAACTTGAACAGCCCTATGTGTTCATCAAACGTTCGGATGCTCGAACTGCACCTCATGGTCATGTTATGGTTGA
GCTGGTAGCAGAACTCGAAGGCATTCAGTACGGTCGTAGTGGTGAGACACTTGGTGTCCTTGTCCCTCATGTGGGCGAAATACCAGTGGCTTACCGCAAGGTTCT
TCTTCGTAAGAACGGTAATAAAGGAGCTGGTGGCCATAGTTACGGCGCCGATCTAAAGTCATTTGACTTAGGCGACGAGCTTGGCACTGATCCTTATGAAGATTT
TCAAGAAAACTGGAACACTAAACATAGCAGTGGTGTTACCCGTGAACTCATGCGTGAGCTTAACGGAGGGGCATACACTCGCTATGTCGATAACAACTTCTGTGG
CCCTGATGGCTACCCTCTTGAGTGCATTAAAGACCTTCTAGCACGTGCTGGTAAAGCTTCATGCACTTTGTCCGAACAACTGGACTTTATTGACACTAAGAGGGG
TGTATACTGCTGCCGTGAACATGAGCATGAAATTGCTTGGTACACGGAACGTTCTGAAAAGAGCTATGAATTGCA
Biological context: These counts are helpful when you design primers for PCR. Primers taken from very GC-rich regions tend to have a melting temperature that is too high, and primers from very AT-rich regions one that is too low.
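One possible way to count the nucleotides is sketched below. The function names count_nucleotides and gc_fraction are made up for this illustration, and the sample is only the first 16 bases of the sequence, so the full problem is still yours to solve. Characters other than A, T, G, and C (such as line breaks left over from pasting) are simply ignored.

```python
from collections import Counter


def count_nucleotides(dna):
    """Return the counts of A, T, G and C in a DNA string."""
    counts = Counter(dna.upper())
    # Only report the four bases; anything else (e.g. newlines) is ignored.
    return {base: counts[base] for base in "ATGC"}


def gc_fraction(dna):
    """Return the fraction of bases that are G or C (0.0 for an empty input)."""
    counts = count_nucleotides(dna)
    total = sum(counts.values())
    return (counts["G"] + counts["C"]) / total if total else 0.0


sample = "ATTAAAGGTTTATACC"  # first 16 bases of the sequence above
print(count_nucleotides(sample))  # {'A': 6, 'T': 6, 'G': 2, 'C': 2}
print(gc_fraction(sample))        # 0.25
```

The GC fraction is included because it is the number you would look at when judging a candidate primer region.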
15.2 Problem 2 - Reverse Complement
Can you find the reverse complement of the above sequence? The concept of reverse complement is explained below.
15.2.1 Reverse complement
To determine the reverse complement of a sequence (let’s say ATGGG), you first substitute A with T, T with A, G with C, and C with G. Therefore, ATGGG becomes TACCC after this step.
Then you read the new sequence from the other end. That means the reverse complement of ATGGG is CCCAT.
Let’s work out two examples. They will also come in handy when we get to palindromes in biology.
ATGG: The reverse complement of ATGG is CCAT. It is not identical to the original sequence. So, ATGG is not a palindrome.
ACGT: The reverse complement of ACGT is (drumroll) ACGT. So, ACGT is a palindrome.
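The two steps described above (substitute, then read from the other end) can be sketched in Python as follows. The name reverse_complement is chosen for this illustration, and the sketch assumes the input contains only A, T, G, and C.

```python
# Translation table for the substitution step: A<->T and G<->C.
COMPLEMENT = str.maketrans("ATGC", "TACG")


def reverse_complement(dna):
    """Complement every base, then read the result from the other end."""
    complemented = dna.upper().translate(COMPLEMENT)
    return complemented[::-1]


print(reverse_complement("ATGGG"))  # CCCAT
print(reverse_complement("ATGG"))   # CCAT
print(reverse_complement("ACGT"))   # ACGT
```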
15.3 Problem 3 - Palindromes
Can you check every 4-nucleotide word in the sequence and print the locations of all 4-nucleotide palindromes?
(Let us say dna[4:8] is a palindrome and so is dna[15:19]. Your answer will be 4, 15.)
Remember, a palindrome in biology is not the same as a palindrome in English.
15.3.1 Palindrome in Biology
A DNA sequence is called a palindrome if it is identical to its reverse complement. Let’s work out two examples to learn about palindromes in biology.
Case 1 - ATGG: The reverse complement of ATGG is CCAT. It is not identical to the original sequence. So, ATGG is not a palindrome.
Case 2 - ACGT: The reverse complement of ACGT is (drumroll) ACGT. So, ACGT is a palindrome.
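A minimal sketch of the palindrome search is shown below. It slides a 4-letter window along the sequence and keeps every start position where the window equals its own reverse complement. The function names and the short test string are invented for this illustration, and the code assumes the sequence contains only A, T, G, and C.

```python
COMPLEMENT = str.maketrans("ATGC", "TACG")


def reverse_complement(dna):
    """Complement every base, then reverse the result."""
    return dna.translate(COMPLEMENT)[::-1]


def find_palindromes(dna, k=4):
    """Return the 0-based start positions of all k-letter palindromes."""
    dna = dna.upper()
    positions = []
    for i in range(len(dna) - k + 1):
        word = dna[i:i + k]
        if word == reverse_complement(word):
            positions.append(i)
    return positions


toy = "TTACGTAAGAATTCC"  # made-up test sequence
print(find_palindromes(toy))  # [2, 9]  (ACGT at 2, AATT at 9)
```

The positions follow Python slicing, so a result of 2 means dna[2:6] is a palindrome.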
Biological context:
Palindromes are fun in English, but biological palindromes can mean life or death for bacteria. Bacteria (as simple as you may think they are) can read palindromes in a DNA sequence. They use proteins called restriction enzymes to do that.
When a virus enters a bacterial cell, the bacterium fights back with its restriction enzymes. These enzymes recognize specific palindromes in the viral genome and cut the DNA at those locations. https://en.wikipedia.org/wiki/Restriction_enzyme
15.4 Problem 4 - Translation
For the fourth problem, you will find a protein sequence from its DNA sequence. This is known as translation in biology.
In a cell, translation follows transcription, which copies DNA into RNA. Since we are doing this on a computer, we can skip this intermediate step and read the protein sequence from the DNA itself.
Please find the protein sequence for the following gene. Do this manually first and DM the answer. Then write Python code to solve the same problem.
ATGAAACGCATTAGCACCACCATTACCACCACCATCACCATTACCACAGGTAACGGTGCGGGCTGA
Use the DNA codon table from here. Make sure to use single-letter symbols for the amino acids in your translation, with the stop codon represented by *.
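Here is one way the translation could be sketched in Python. It assumes the standard genetic code, written in the compact form where the 64 codons are listed in T, C, A, G order and matched against a 64-letter string of amino acids. The names CODON_TABLE and translate are made up for this illustration. The demo uses a short toy sequence so that the gene above is still yours to translate.

```python
from itertools import product

# Standard genetic code (an assumption of this sketch).
# Codons are generated in the order TTT, TTC, TTA, TTG, TCT, ...
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate DNA codon by codon, stopping after the first stop codon (*)."""
    dna = dna.upper()
    protein = []
    # Step through the sequence three bases at a time; a trailing
    # incomplete codon is ignored.
    for i in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        protein.append(amino_acid)
        if amino_acid == "*":
            break
    return "".join(protein)


print(translate("ATGGCCTGA"))  # MA*
```

Reading starts at the first base, so the sketch assumes the sequence begins at the start codon and is in the correct reading frame.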