In Russia, The Gene Domain Models You

When I first started getting interested in Computational Biology, I had a very simplistic mental model of genes based on the “central dogma of biology”: DNA makes RNA and RNA makes proteins.  I thought that the human genome was just computer code – instead of binary ones and zeros, it was base-four “A” “T” “C” “G”.*.  Therefore, to understand it, just start mapping the genome like the human genome project did and then see the areas that I am interested in: the four billion nucleotides are memory on the hard drive with some of the genome being the operating system and some being installed software.  When the computer is powered on, the ones and zeros are read off of the hard drive and placed into memory – the equivalent would be the DNA being read out of the chromosomes and placed into RNA.

Man, was I wrong.

DNA is not just your operating system and installed programs – it is the complete instructions to make the computer, turn it on, write, compile, and then run the most sophisticated software in the world.  Oh, it also has the instructions to replicate itself and correct errors that might be introduced during replication.  And we can barely figure out how to write Objective C for a phone app…

So carrying the analogy of the human genome to computer, even if we could determine which part of bytes on the hard drive is for building a transistor, for example, we have another problem.  Unlike a computer’s hard drive where each byte is allocated to only one purpose, each nucleotide can be used by none, one, or many genes.  It is a fricken madhouse – a single “C-G” pair might be used in creating a val amino acid in one protein for one gene, frameshifted to creating an ala amino acid in a different protein for another gene, and then be used in the regulation region of yet another gene being read in the reverse direction.

The implication is that it does not appear possible to “build up” and domain model of the genome.  You can’t take a series of nucleotides of DNA like “AAAGGTTCGTAGATGCTAG” and know anything about what it is doing.  Rather, it looks like the domain model has to work in reverse:  Given a gene like TP53, allocate different sections of DNA to the different regions: Promoter, Intons, Exons, UTRs, etc….

From a F# modeling point of view, DNA is certainly an “OR” type:

type DNA = A | T | C| G

With the caveat that we sometimes don’t know if a location is really an A, rather it is not a G or T. Sweeping aside that problem, then individual nucleotides are an “AND” type:

Type Nucleotide = {Index: int; DNA: DNA; etc…}

let nucleotide0 = {Index=0; DNA=C; etc…}

and then gene would look something like this:

let tP53 = {Gene= TP53; Promoter = [nucleotide0; nucleotide1]; etc…}

Note that I am only 1/3 of the way through my genetics class right now, so I might change my understanding next week.  For now, this is my Mental Model 0.2 of the genome.

*side note: some April 4th, I want to sneak into all of the computational biologists offices and steal the A,C,T,G,Us keys from their keyboards.  Pandemonium would reign.

One Response to In Russia, The Gene Domain Models You

  1. Pingback: F# Weekly #8, 2021 – .NET 6 Preview 1 and a methodical approach to looking at F# compile times – Sergey Tihon's Blog

Leave a comment