Wednesday, September 18, 2013

Genes, and pseudogenes, and junk DNA! Oh, my!

I'd like to tack on some of my thoughts to Steve's post on genomics.

Keep in mind I'm simplifying the science, since the details may be overwhelming for a wider audience. However, if someone wants to discuss specifics, I'm happy to go into more detail if I can.

  1. The Genomes OnLine Database (GOLD) is the main online database for all the genomes that have been sequenced in the world. It's meant to be comprehensive. There are over 1300 species that have had their genomes sequenced, and a good deal more in the pipeline. Genomes that have been sequenced include humans (2004) and chimps (2005), but also dogs, cats, mice, rats, fruit flies, cows, etc. Overall, however, as Steve pointed out, this is but a fraction of all the available organisms in the world.

  2. As an aside, scientists have tried to sequence extinct genomes like the Neanderthal genome. At least as I understand it, the primary problem is we don't actually have a complete Neanderthal genome to look at. What we have are bits and pieces that are fossilized, very old, and of poor quality, not to mention contaminated by microorganisms like bacteria. So scientists have had to mix and match material from different Neanderthals, perhaps make inferences as to what's likely, and try to stitch it all together into one complete genome. What they sequenced is more like a Neanderfrankenthal genome.

  3. According to this article (2008):

    Researchers have carried out the largest study of differences between human and chimpanzee genomes, identifying regions that have been duplicated or lost during evolution of the two lineages. The study, published in Genome Research, is the first to compare many human and chimpanzee genomes in the same fashion....

    The team looked at genomes of 30 chimpanzees and 30 humans: a direct comparison of this scale or type has not been carried out before....

    The project used DNA samples from 30 chimpanzees (29 from W Africa, one from E Africa): the chimpanzee reference was produced using DNA from Clint, the chimpanzee whose DNA was used for the genome sequence.

    Human DNA samples were obtained from following participants: ten Yoruba (Ibadan, Nigeria), ten Biaka rainforest hunter-gatherers (Central African Republic) and ten Mbuti rainforest hunter-gatherers (Democratic Republic of Congo). The human reference is a European-American male from the HapMap Project (NA10852).

    It's possible to have sample selection bias. It's possible the subjects chosen aren't representative of the general population, and so whatever outcomes or results a study finds aren't necessarily generalizable to the entire population. Are these 30 chimps representative of the general population of chimps, and are these 30 humans representative of the general population of humans?

    It's possible to have other biases like sample size bias as well. Are 30 chimps and 30 humans enough to draw the sorts of conclusions the study wants to draw about the human-chimp genomes?

    There are many other possible biases a study can have, intentionally or unintentionally, known or unknown to the authors.

    However, since I'm not an expert in the field, I don't know if these questions are relevant and worth asking here. Or, if they are, whether they have been adequately addressed.

  4. The percentage similarity between human and chimp genomes can differ depending on which expert we ask. For example, someone like Richard Buggs would say around 70%, while Todd Wood would say around 98%.

    I'm simplifying and potentially oversimplifying here. I think people will get different figures depending on what they measure in the genomes and how they measure it. For example, say we compare Robert Heinlein's Citizen of the Galaxy with Rudyard Kipling's Kim. Heinlein based his scifi novel in large part on Kipling's. If we compare plots, the plots are very similar, with the main difference being that one takes place in the 1800s, whereas the other takes place sometime in the distant future. If we compare language, we might say both are written in English, so they are very similar, but Kim is 19th century Britspeak, whereas Citizen of the Galaxy is mid-20th century American. If we compare word count or page count, then maybe Citizen of the Galaxy is, say, 200 pages, whereas Kim is 250 pages given the same font size and so forth. If we compare letters, then we can say they each use the same 26 letters of the alphabet. And so on.

    Anyway, there are many ways to line up and compare the books, and we can get different figures depending on what we're looking at and how we're looking and so forth. This analogy has significant limitations, of course, but that's the general gist of it from a bird's eye view, I think.
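    To make the "it depends how you count" point a bit more concrete, here is a minimal Python sketch. The two aligned sequences are made up for illustration (they are not real human or chimp DNA), and the two scoring rules are just two of many possible conventions: one ignores alignment gaps (insertions and deletions), the other counts every gap column as a difference.

```python
# Toy, made-up aligned sequences (NOT real genomic data).
# "-" marks an alignment gap, i.e. an insertion or deletion.
SEQ_A = "ACGTACGTACGTACGTACGT"
SEQ_B = "ACGTACCTAC----GTACGT"


def identity_ignoring_gaps(a: str, b: str) -> float:
    """Fraction of matching positions, looking only at columns with no gap."""
    columns = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    matches = sum(1 for x, y in columns if x == y)
    return matches / len(columns)


def identity_counting_gaps(a: str, b: str) -> float:
    """Fraction of matching positions over all columns; gaps count as differences."""
    columns = list(zip(a, b))
    matches = sum(1 for x, y in columns if x == y and x != "-")
    return matches / len(columns)


if __name__ == "__main__":
    print(f"ignoring gaps: {identity_ignoring_gaps(SEQ_A, SEQ_B):.4f}")
    print(f"counting gaps: {identity_counting_gaps(SEQ_A, SEQ_B):.4f}")
```

    The same pair of toy sequences scores about 94% similar under the first rule and 75% under the second. Nothing about the sequences changed, only the decision about what to count.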

  5. Nevertheless, 98% is the mainstream figure.

    a. This figure originated back in the 1970s with a professor named Allan Wilson and his PhD student Mary-Claire King. They concluded humans and chimps are genetically 99% similar. From what I understand, several modern studies have more or less confirmed their finding, although some modern studies say closer to 98%. But the figure is obviously still very high.

    b. However, it's worth noting Wilson and King didn't look at human or chimp DNA as such. (Actually, I don't even know if they would have had the means or technology to do so back then; or if they did whether it was feasible.) Rather they primarily looked at proteins (and their constituent amino acids), which are the products of DNA.

    c. It's also worth noting Wilson and King, along with many though not all secular scientists in relevant fields today, didn't think the similarities between the human and chimp genetic sequences were, in and of themselves, necessarily as significant as other factors. Rather, they thought the differences between the two species also had to do with many other things, like gene regulation, for instance.

  6. In fact, even a dyed-in-the-wool secularist like Jerry Coyne accepts the 98% figure (he says 98.5%) but argues there's more than meets the eye with the figure. Here's an excerpt from the section titled "Our Genetic Heritage" from Chapter 8 "What About Us?" of his book Why Evolution Is True (emphasis in the original):

    [R]ecent work shows that our genetic resemblance to our evolutionary cousins is not quite as close as we thought. Consider this. A 1.5 percent difference in protein sequence means that when we line up the same protein (say, hemoglobin) of humans and chimps, on average we'll see a difference at just one out of every hundred amino acids. But proteins are typically composed of several hundred amino acids. So a 1.5 percent difference in a protein three hundred amino acids long translates into about four differences in the total protein sequence. (To use an analogy, if you change only 1 percent of the letters on this page, you will alter far more than 1 percent of the sentences.) That oft-quoted 1.5 percent difference between ourselves and chimps, then, is really larger than it looks: a lot more than 1.5 percent of our proteins will differ by at least one amino acid from the sequence in chimps. And since proteins are essential for building and maintaining our bodies, a single difference can have substantial effects.

    Now that we've finally sequenced the genomes of both chimp and human, we can see directly that more than 80 percent of all the proteins shared by the two species differ in at least one amino acid. Since our genomes have about 25,000 protein-making genes, that translates to a difference in the sequence of more than 20,000 of them. That's not a trivial divergence. Obviously, more than a few genes distinguish us. And molecular evolutionists have recently found that humans and chimps differ not only in the sequence of genes, but also in the presence of genes. More than 6 percent of genes found in humans simply aren't found in any form in chimpanzees. There are over fourteen hundred novel genes expressed in humans but not in chimps. We also differ from chimps in the number of copies of many genes that we do share. The salivary enzyme amylase, for example, acts in the mouth to break down starch into digestible sugar. Chimps have but a single copy of the gene, while individual humans have between two and sixteen, with an average of six copies. This difference probably resulted from natural selection to help us digest our food, as the ancestral human diet was probably much richer in starch than that of fruit-eating apes.

    Putting this together, we see that the genetic divergence between ourselves and chimpanzees comes in several forms - changes not only in the proteins produced by genes, but also in the presence or absence of genes, the number of gene copies, and when and where genes are expressed during development. We can no longer claim that "humanness" rests on only one type of mutation, or changes in only a few key genes. But this is not really surprising if you think about the many traits that distinguish us from our closest relatives. There are differences not only in anatomy, but also in physiology (we are the sweatiest of apes, and the only ape whose females have concealed ovulation), behavior (humans pair-bond and other apes do not), language, and brain size and configuration (surely there must be many differences in how the neurons in our brains are hooked up). Despite our general resemblance to our primate cousins, then, evolving a human from an apelike ancestor probably required substantial genetic change.

    Can we say anything about the specific genes that did make us human? Right now, not very much. Using genomic "scans" that compare the entire DNA sequence of chimps and humans, we can pick out classes of genes that have evolved rapidly on the human branch of our divergence. These happen to include genes involved in the immune system, gamete formation, cell death, and, most intriguingly, sensory perception and nerve formation. But it's a different matter entirely to zero in on a single gene and demonstrate that mutations in that gene actually produced human/chimp differences. There are "candidate" genes of this sort, including one (FOXP2) that might have been involved in the appearance of human speech, but the evidence is inconclusive. And it might always remain so. Conclusive proof that a given gene causes human/chimp differences requires moving the gene from one species to another and seeing what difference it makes, and that's not the kind of experiment anyone would want to try.
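    The arithmetic in Coyne's first two paragraphs can be checked with a few lines of Python. This is only a rough sketch: the function names are mine, and the "at least one difference" calculation assumes every amino acid position differs independently with the same 1.5 percent probability, which is a simplifying assumption of this sketch rather than anything Coyne states.

```python
def expected_differences(per_residue_diff: float, protein_length: int) -> float:
    """Average number of differing amino acids in a protein of the given length."""
    return per_residue_diff * protein_length


def prob_at_least_one_difference(per_residue_diff: float, protein_length: int) -> float:
    """Chance a protein differs somewhere, ASSUMING independent, uniform differences."""
    return 1.0 - (1.0 - per_residue_diff) ** protein_length


def genes_differing(fraction_differing: float, gene_count: int) -> int:
    """Number of genes implied by a fraction of a total gene count."""
    return round(fraction_differing * gene_count)


if __name__ == "__main__":
    # Figures taken from the quoted passage: 1.5%, 300 residues, 80%, 25,000 genes.
    print(expected_differences(0.015, 300))          # about four to five differences
    print(prob_at_least_one_difference(0.015, 300))  # toy-model probability
    print(genes_differing(0.80, 25_000))             # 80% of 25,000 genes
```

    Under these assumptions a 300-residue protein averages 4.5 differences, in line with Coyne's "about four," and 80 percent of 25,000 genes is 20,000, matching his figure. The toy model puts the chance of at least one difference per protein at roughly 99 percent, higher than the "more than 80 percent" Coyne reports from the sequenced genomes, which is unsurprising given that real proteins vary in length and differences are not spread evenly across them.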

  7. How do we make an organism like a mammal via genes? Basically, and this will be very much simplified, mammals (and I'll include humans here at least for illustrative purposes) are made of and made by proteins. (Proteins can be further subdivided into amino acids. There are over 300 amino acids in existence, but proteins in primates and other mammals are only made up of about 20 amino acids.)

    All the information needed to make all the proteins in our bodies is encoded in our DNA. DNA makes something called RNA. RNA in turn makes proteins. Thus we go unidirectionally from DNA to RNA to proteins to, eventually, well, organism. This is known as the Central Dogma of molecular biology. See here for more information. (BTW, this is an oversimplified version of the Central Dogma. Plus, it's known the central dogma isn't, in fact, a dogma; it has its exceptions. Likewise, there have been challenges to the Central Dogma. For example, see here.)

    A standard definition for a gene is a particular region of DNA that codes for proteins (via RNA). This brings us back full circle.
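    For readers who think better in code, the DNA to RNA to protein flow can be sketched in Python. This is a simplified illustration, not a bioinformatics tool: the input sequence is made up, the function names are mine, and "transcription" here just swaps T for U on the coding strand, skipping the template strand and all RNA processing. The codon table is the standard genetic code, which maps 64 three-letter codons onto 20 amino acids plus stop signals.

```python
from itertools import product

# Standard genetic code, codons ordered with bases U, C, A, G
# (third base changing fastest). "*" marks a stop codon.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def transcribe(coding_dna: str) -> str:
    """Simplified transcription: copy the coding strand, replacing T with U."""
    return coding_dna.upper().replace("T", "U")


def translate(mrna: str) -> str:
    """Read the mRNA three bases at a time until a stop codon is reached."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


if __name__ == "__main__":
    toy_gene = "ATGGCCTGGTAA"  # made-up example, not a real gene
    mrna = transcribe(toy_gene)
    print(mrna)
    print(translate(mrna))
```

    The one-letter output uses the usual amino acid abbreviations (M for methionine, A for alanine, W for tryptophan).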

  8. Humans have about 25,000 genes. But only about 2% of our genome actually codes for proteins. The rest (98%) is non-coding DNA, popularly known as "junk DNA".

  9. Pseudogenes are a significant chunk of this non-coding "junk DNA."

    a. Neo-Darwinists argue pseudogenes are vestigial genes similar to vestigial organs (and in fact they argue vestigial genes can sometimes correspond to vestigial organs). They argue pseudogenes used to code proteins in the past, but no longer code proteins in the present. As such, pseudogenes no longer function. However, neo-Darwinists argue, humans and chimps should each have similar pseudogenes in similar sequences in similar places, and by inference these same pseudogenes should be traceable back to a common ancestor.

    b. All this has been challenged. For starters, junk DNA is not truly junk in the sense of useless or without function. Not all junk DNA may directly code proteins, but junk DNA serves a lot of other functions such as all the activities involved in gene regulation. For example, see here, here, here, and here.

    c. One of the more prominent challenges has been the ENCODE project. ENCODE concluded that over 80% of our genome actually does have some function, thereby dispelling the myth of junk DNA, pseudogenes included.

    Of course, neo-Darwinists have been highly critical of ENCODE. This includes people like Coyne, the Panda's Thumb, and especially Larry Moran. Maybe some of their criticisms are correct to some degree, but that doesn't mean they're right about the main point: even if 80% is too high, it doesn't follow that junk DNA is functionless.

    d. Moreover, here is James Shapiro talking about ENCODE:

    In other words, the old idea of the genome as a string of genes interspersed with unimportant noncoding DNA is no longer tenable. Many eminent scientists had opined that the noncoding DNA, much of it repeated at many different locations, is nothing more than "junk DNA." ENCODE revealed that most (and probably just about all) of this noncoding and repetitive DNA contained essential regulatory information. Moreover, much of it was also copied into RNA with additional but still unknown functions....

    In 2005, I published two articles on the functional importance of repetitive DNA with Rick von Sternberg. The major article was entitled "Why repetitive DNA is essential to genome function."

    These articles with Rick are important to me (and to this blog) for two reasons. The first is that shortly after we submitted them, Rick became a momentary celebrity of the Intelligent Design movement. Critics have taken my co-authorship with Rick as an excuse for "guilt-by-association" claims that I have some ID or Creationist agenda, an allegation with no basis in anything I have written.

    The second reason the two articles with Rick are important is because they were, frankly, prescient, anticipating the recent ENCODE results. Our basic idea was that the genome is a highly sophisticated information storage organelle. Just like electronic data storage devices, the genome must be highly formatted by generic (i.e. repeated) signals that make it possible to access the stored information when and where it will be useful....

    So, while Rick's choice of evolutionary philosophies is different from mine, I am grateful to him for doing so much work on a paper that remains a source of justified scientific pride. Thinking of the genome informatically and of mobile DNA as a potent force for genome organization are central to the arguments presented on this blog and in my book.

    e. As for pseudogenes that are currently said to have no function, just because we haven't discovered a function for them doesn't mean they have none. The absence of evidence is not evidence of absence.

    f. Furthermore, here's Michael Behe's take on pseudogenes from his book Darwin's Black Box:

    [E]ven if pseudogenes have no function, evolution has explained nothing about how pseudogenes arose. In order to make even a pseudo-copy of a gene, a dozen sophisticated proteins are required: to pry apart the two DNA strands, to align the copying machinery at the right place, to stitch the nucleotides together into a string, to insert the pseudocopy back into the DNA, and much more. In his article [Ken] Miller has not told us how any of these functions might have arisen in a Darwinian step-by-step process, nor has he pointed to articles in the scientific literature where we can find the information. He can't do that, because the information is nowhere to be found.

    Folks such as Douglas Futuyma, who cite vestigial organs as evidence of evolution, have the same problem. Futuyma never explains how a real pelvis or eye developed in the first place, so as to be able to give rise to a vestigial organ later on, yet both the functioning organ and the vestigial organ require explanation. I do not purport to understand everything about design or evolution - far from it; I just cannot ignore the evidence for design. If I insert a letter into a photocopier, for instance, and it makes a dozen good copies and one copy that has a couple of large smears on it, I would be wrong to use the smeared copy as evidence that the photocopier arose by chance.

    Arguments based on perceived faults or vestigial genes and organs run the danger of the argument of Diogenes that the progression of seasons shows intelligent design. It is scientifically unsound to make any assumptions of the way things ought to be.

    g. There is some controversy over whether a pseudogene can be "resurrected." Many neo-Darwinists don't think this is possible since it's generally a series of multiple mutations that bring about pseudogenes, so they think it'd be hard to reverse the process and bring the pseudogene back to normal, as it were. Perhaps I'm mistaken, but isn't this tied into the bigger debate over whether Dollo's law (the idea that evolution is irreversible) truly holds?

    h. On the one hand, secular neo-Darwinists argue, "Biology is the study of complicated things that give the appearance of having been designed for a purpose."

    But on the other hand, given their arguments for pseudogenes and junk DNA, they'd just as well argue, "Biology is the study of complicated things that don't give the appearance of having been designed for a purpose."

    If something appears designed, then it's an argument for neo-Darwinism. If something appears not designed, then it's an argument for neo-Darwinism. This makes one wonder, is neo-Darwinism even falsifiable?

    i. Creationists and intelligent design theorists aren't necessarily allergic to all pseudogenes as such. They can take each pseudogene on a case by case basis. It's not as if all pseudogenes are one and the same. And it's not as if creationists and ID theorists entirely deny all common descent or ancestry, which may be an assumption neo-Darwinists have in the debate. For example, I think many if not most creationists and most if not all ID theorists accept common ancestry between domestic dogs and gray wolves. If dogs and gray wolves share pseudogenes, then those pseudogenes could very well be indicative of common ancestry between dogs and gray wolves. Thus, if a pseudogene is used to evidence common descent or ancestry between, say, humans and chimps, then let's take a look at the particular pseudogene(s) in question.

    j. One of the most commonly cited pseudogene examples to prove common ancestry between humans and chimps is the GULO gene, which helps synthesize Vitamin C. Most mammals can synthesize Vitamin C. But humans and chimps can't. Neo-Darwinists say GULO was inactivated in primates including humans and chimps. Since the pseudogene sits in a similar place in both humans and chimps, looks similar, and is inactivated in a similar way, they take it as evidence of common ancestry. But check out Jonathan McLatchie's post for a possible response.

  10. It's interesting our definition for the "gene" isn't necessarily static. See here for instance. Can we take the most basic terms in genetics for granted?

1 comment:

  1. Thanks, Patrick.

    Just to clarify: my statement about sample selection bias wasn't confined to comparisons between human and chimp genomes, but between human genomes and millions of other species, extant or extinct.

    My point is that if, instead of comparing the human genome to the chimp genome or other primate genomes, we were to compare the human genome to various fauna and flora across the board, there might be striking parallel sequences like the GULO gene, but Darwinians wouldn't draw the same conclusions if something like that turned up when comparing the human genome to a kumquat genome.

    In other words, I'm not suggesting geneticists have undersampled human and chimp genomes. If anything, those have been oversampled, given their prior assumption that we're closely related to chimps.

    Rather, I'm suggesting that geneticists have undersampled non-human genomes across the board. The sample selection bias lies in comparing the human genome to primate genomes rather than systematically comparing the human genome to a vast range of faunal and floral genomes, extant or extinct.

    I don't have a particular problem with the 98% figure. Of course, I'm not qualified to take issue with it.

    For even if that figure is accurate, even Darwinians admit that there must be more to it than meets the eye, given how human intellectual abilities vastly outstrip chimpanzee abilities.