There is a general rule of thumb in science that whenever something looks understandable, it usually is a gross simplification and in reality everything is so complicated that nobody understands anything. Mathematics is an exception, and it achieves lucidity only by studying models that are too simple to describe the real world. Often such simplifications are good enough that we can get reasonable results using them, but that does not mean that we understand the issue, or are even conscious of how little we understand of it.
This seems to be the case with human blood groups. Most of us, at least I did, think of blood groups as the ABO-system and the Rh-system, where the first has three alleles A, B and O and the second has two alleles Rh+ and Rh-, and where the A, B and Rh+ alleles are dominant and the O and Rh- alleles are recessive. It is nothing like this in reality. Up to 2015 scientists had found 34 human blood group systems. The familiar ABO-system is numbered the first and the Rh-system has been given the number 4 in the list of human blood group systems. The ABO-system has many more alleles than three: both A and B have over a hundred alleles and O, today denoted as H, has several alleles, though two are most common. There is an article by Laura Cooling, “Blood Groups in Infection and Host Susceptibility”, freely available at
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4475644/
Several alleles of many blood groups are present in high frequencies in the same population. For instance, Finns have 31% O, 44% A, 17% B and 8% AB in the ABO-system. Though there are over 100 alleles of type A and B and two main alleles of O (plus some rare ones), we can treat this system as being composed of three alleles: A, B and H. If the only effect influencing the distribution is random mating and alleles A, B and H have the probabilities x, y and z respectively, then the genotypes AA, BB and HH have the probabilities x², y² and z², and AH, BH and AB have the probabilities 2xz, 2yz and 2xy. The blood types are then O=z², A=x²+2xz and B=y²+2yz, so O+A=(x+z)² and O+B=(y+z)². Solving the equations gives z=√O, x=√(O+A)-√O, y=√(O+B)-√O. We can see if the system is in this Hardy-Weinberg equilibrium by checking if AB=2xy. For Finns we get x=0.309, y=0.136, z=0.557 and for checking we calculate 2xy=0.084, which agrees with AB=0.08. Clearly, this population is in Hardy-Weinberg equilibrium for ABO and allele H has frequency 55.7%, but A and B also have relatively high frequencies, 30.9% and 13.6% respectively.
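The calculation for the Finnish figures can be reproduced with a few lines of Python. This is only a sketch of the arithmetic above; the function name abo_allele_freqs is my own choice for illustration, and the model assumes nothing but random mating.

```python
from math import sqrt

def abo_allele_freqs(o, a, b):
    """Estimate allele frequencies (x, y, z) of A, B and H from the
    blood type frequencies O, A and B, assuming random mating."""
    z = sqrt(o)            # HH = z^2 = O
    x = sqrt(o + a) - z    # (x + z)^2 = O + A
    y = sqrt(o + b) - z    # (y + z)^2 = O + B
    return x, y, z

# Finnish blood type frequencies quoted in the text
x, y, z = abo_allele_freqs(o=0.31, a=0.44, b=0.17)
predicted_ab = 2 * x * y   # Hardy-Weinberg prediction for blood type AB

print(round(x, 3), round(y, 3), round(z, 3))  # 0.309 0.136 0.557
print(round(predicted_ab, 3))                 # 0.084, observed AB is 0.08
```

The AB frequency is not used in the estimation, which is what makes the comparison of 2xy with the observed AB an independent check.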
It is not strange in itself that several alleles can be maintained in the same population in relatively high frequencies. If no allele gives a selective advantage, the frequencies stay stable and the only puzzle is how the population got the initial distribution of the alleles. But according to Cooling’s article, blood group alleles do have selective advantages: the long article goes through the results showing that many blood groups have different responses to infectious diseases. This is also the case with ABO.
As there are selective advantages, we would expect that an advantageous allele should reach a very high penetration and the less advantageous should disappear or remain in low frequencies. The article reports one such case: the Duffy blood group includes an allele which offers malaria resistance. Cooling writes:
“Experiments in humans confirmed that many blacks not only were resistant to P. vivax but also displayed resistance to Plasmodium knowlesi and P. cynomolgi as well. Serologic studies subsequently showed that blacks, especially those residing in areas where P. vivax is endemic, had a high incidence of the Fy(a−b−) phenotype. The most recent global maps show the highest incidence of the Duffy-null phenotype in sub-Saharan Africa, where frequencies of the Fy(a−b−) phenotype reach 98 to 100% in western, central, and southeastern Africa. The high incidence of the Fy-null phenotype coincides with a low incidence of P. vivax (0.6%).”
This is as one would expect if there is natural selection of an advantageous allele: the advantageous allele reaches penetration of 98-100%. According to the Hardy-Weinberg equations, selective advantage should finally lead to fixation if the population size is finite (as it always is, but fixation takes time, so read finite as small). In this example of the Duffy blood group there probably would be fixation if there were no gene flow from other areas.
Unlike in this Duffy example, in most of the reported cases alleles of blood group systems do not get fixed or rise to very high percentages. For instance, when discussing the ABO-system, the article states that O blood type people (with two H-alleles) are more susceptible to severe forms of cholera and get serious norovirus diarrhea more often, but that they are relatively resistant to SARS and to severe malaria. However, the H-allele of the ABO-system has not disappeared in cholera areas. Obviously this is not a simple case of natural selection of an advantageous allele.
A cursory reading of the long article gives the impression that blood groups do have relevance to susceptibility to bacteria and viruses. Many of the cases where the selective advantage of an allele is best demonstrated deal with malaria resistance. It may be relevant that malaria affects not only humans but also African apes (chimpanzee, bonobo and gorilla), and humans have had a very long contact with this disease. For most blood groups and other diseases the results are often contradictory and there is little conclusive evidence that the observed frequencies of blood groups are a result of natural selection driven by these differences in susceptibility to infectious diseases. Indeed, it is known that in many cases the observed frequencies were caused by migrations, and frequencies of blood types have changed in recorded history as a result of migrations and possibly of diseases.
We get to the problem of this post. Why do the frequencies differ from population to population, and in particular, is this a case of so-called balancing selection?
Let us focus only on the ABO blood group system as there is good data on geographic distributions of phenotypic blood types. Allele B has an eastern distribution, which can be explained by diffusion from Central Asia. Allele A is found in Northern Europe, especially in Saami, some places in Western and Central Europe, but also in some Australian aboriginal tribes and in Blackfoot (and some other) Indians. Allele H is the most common everywhere and it reached 100% in Central and South America before Europeans moved there. Amerinds (=American Indians speaking languages of the Amerind group) who migrated to these regions from Siberia apparently lacked A and B alleles. Northern American Indians had A but lacked B.
A natural explanation is that when Amerinds moved from Siberia to the Americas some 19,000-17,000 years ago there were no A or B alleles in Siberia. In this way it might appear that O must be the oldest blood type and alleles A and B came later. This is what was initially thought, but it was shown that O has appeared in humans in at least two separate mutations and that A is the oldest allele. However, more recent research claims that all alleles A, B and H are 20 million years old:
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3494955/
The ABO-system can be found in many primates. The article claims it is not because of convergent evolution, which in this case would mean that the same mutations have happened in different species, but that humans have directly inherited ABO blood groups from earlier primates and that here we have a case of balancing selection.
What is balancing selection?
The term balancing selection only restates what is observed: the phenomenon is caused by some mechanism which maintains several alleles in high frequencies for a long time. It is necessary to specify which mechanism of balancing selection is proposed. The most common mechanism is heterozygote advantage.
In a two-allele system it leads to a ratio of homozygotes to heterozygotes that is different from the Hardy-Weinberg equilibrium. This can be tested easily for the ABO-system from the observed blood type frequencies. For Finns the results agreed with the equilibrium, as shown above. I tried to find published results that did not agree with the equilibrium and found this one:
https://www.sciencedirect.com/science/article/pii/S1110863011000796
This paper gives phenotypic ABO frequencies for Lagos, Nigeria, as A 23.1%, B 21.3%, AB 2.7% and O 52.9%. From these we get allele frequencies (x, y, z) = (14.4%, 13.4%, 72.7%). The paper notes that the measured results differ significantly from the Hardy-Weinberg law, and indeed the predicted 2xy=3.87% is clearly higher than the observed AB=2.7%, but is the difference large enough to suggest heterozygote advantage?
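The Lagos and Finnish figures can be put side by side in a short Python sketch. The function name hw_check and the dictionary layout are hypothetical choices of mine. Besides the 2xy check used in the text, the sketch prints the sum x+y+z; this extra check is my addition and not taken from the paper, but it follows from the same equations, since under Hardy-Weinberg equilibrium the three estimates must add up to exactly 1.

```python
from math import sqrt

def hw_check(o, a, b, ab):
    """Estimate ABO allele frequencies from blood type frequencies and
    return the quantities needed to compare them with Hardy-Weinberg."""
    z = sqrt(o)
    x = sqrt(o + a) - z
    y = sqrt(o + b) - z
    return {
        "x": x, "y": y, "z": z,
        "predicted_ab": 2 * x * y,   # what random mating predicts for AB
        "observed_ab": ab,
        "allele_sum": x + y + z,     # equals 1 in exact equilibrium
    }

# Blood type frequencies quoted in the text
populations = {
    "Finns": dict(o=0.31, a=0.44, b=0.17, ab=0.08),
    "Lagos": dict(o=0.529, a=0.231, b=0.213, ab=0.027),
}

results = {name: hw_check(**freqs) for name, freqs in populations.items()}
for name, r in results.items():
    print(name,
          "predicted AB:", round(r["predicted_ab"], 4),
          "observed AB:", r["observed_ab"],
          "x+y+z:", round(r["allele_sum"], 4))
```

Whether a gap of this size is statistically significant depends on the sample size, which is not quoted here, so the sketch only reports the raw differences.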
The difference may be caused by cancellation of the most significant digits when calculating x=√(O+A)-√O and y=√(O+B)-√O. We can check if the system is in equilibrium in another way. The three-allele system ABO can be reduced into a two-allele system by grouping two alleles together. Let us group A and B together into allele D. Then DD+DH=A+B, HH=O, DD=AA+BB+AB. If the (D, H) system is in Hardy-Weinberg equilibrium, the frequency r of D should be r=√(O+A+B)-√O=0.259, so DD should be r²=0.0671. We can calculate DD from x=0.144, y=0.134 and AB=0.027. That yields DD=x²+y²+AB=0.0657. It is almost what the equilibrium predicts. If there is a heterozygote advantage of DH, there must be homozygote disadvantage for DD. There is a very small difference, 0.0657 is a bit smaller than what it should be, 0.0671, but this difference is so small that it can be better explained e.g. by a small deviation from random mating. The difference is significant in the meaning intended in the paper but it is not significant enough to suggest heterozygote advantage.
Thus, this balancing selection is not heterozygote advantage.
The second most common mechanism of balancing selection is frequency-dependent selection, and one form of it has been proposed as the correct mechanism: negative frequency-dependent selection. This mechanism is expected to work in the following way. Bacteria and viruses attack a population where some allele has high penetration. These pathogens respond differently to different alleles and reduce the number of carriers of one or some alleles. As a result, less common alleles become more frequent. Next time other pathogens attack the newly popular alleles and again the frequencies change.
Another form of balancing selection is that each allele has advantages and disadvantages depending on the place or age or a changing environment. Natural selection first favors one, then another. Age is probably not the balancing issue since most deaths from diseases happen in small children and there seems to be no balancing advantage of ABO-alleles in later life.
Negative frequency-dependent selection or some other form of balancing selection can indeed keep many alleles at high frequencies in the same population, but I think that a more natural explanation is a combination of natural selection, the founder effect and drift, and mixing of populations through migration.
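To see why this mechanism keeps rare alleles from disappearing, here is a toy model in Python. Everything in it is an assumption made for illustration and comes neither from Cooling's article nor from the other papers: alleles are treated as haploid types, the fitness of an allele falls linearly with its own frequency, and the strength s=0.5 is an arbitrary hypothetical value. It captures only the advantage of being rare, not the episodic attacks by different pathogens described above.

```python
def step(freqs, s=0.5):
    """One generation of a toy negative frequency-dependent model:
    the more common an allele is, the lower its fitness."""
    fitness = [1.0 - s * p for p in freqs]               # assumed linear penalty
    mean_fitness = sum(p * w for p, w in zip(freqs, fitness))
    return [p * w / mean_fitness for p, w in zip(freqs, fitness)]

def simulate(freqs, generations, s=0.5):
    """Iterate the model and return the final allele frequencies."""
    for _ in range(generations):
        freqs = step(freqs, s)
    return freqs

start = [0.90, 0.08, 0.02]     # one dominant allele and two rare ones
final = simulate(start, 200)
print([round(p, 3) for p in final])
```

In this toy model the rare alleles grow until all three have equal shares. Real populations with unequal frequencies, like the ABO frequencies discussed here, would need unequal penalties or other forces on top of this.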
I suggest a scenario where humans left Africa with all three alleles. The migration to the east passed through malaria areas and selective pressure boosted the frequency of the advantageous H allele to 100% or near it. An offshoot of this population moved to Siberia and from there to the Americas. This would be the reason why Amerinds had only the H allele. This was natural selection, not balancing selection.
Hunter-gatherers migrated to Europe in two waves: Aurignacian and Gravettian. Some mechanism removed the mtDNA allele M from the European gene pool during the last Ice Age. The reason could be selective pressure, but there was a founder effect: the population was reduced to a very low level during a cold phase. The Western European hunter-gatherers lost the H and B alleles. I propose that this was a case of a founder effect, drift and fixation in a small population.
Northern American Indians have allele A. Of course A could have gone extinct while Amerinds passed to Central America, but I think that a more likely reason is that there was a later migration through the land bridge over the Bering Strait, or possibly they got the allele A even later from Western Europe, not excluding even the Vikings as one source. Some Australian aborigines have a high frequency of the A allele. Aborigines migrated to the continent about 60,000 years ago, but there must have been later migrations: Aborigines have art like all human populations, yet art first appeared 40,000-30,000 years ago in Europe. Allele A may have arrived on the continent during later migrations. Allele B has spread through human migrations from Central Asia in rather recent times, as Amerinds lack it. In some way H moved to Europe. The founding population of American Indians is also the founding population of Indo-Europeans, and Indo-European migrations took H to Europe around 3000 BC, but blood type O is high also in the Middle East, thus it probably arrived in Europe already with Anatolian farmers. All of these are cases of migration.
Thus, I do not think balancing selection is needed as an explanation for the ABO blood group allele frequency distributions, and for now I am skeptical of balancing selection as an explanation for any blood group distributions, but time will tell. Of course, part of my reluctance towards balancing selection is that its most common mechanism is heterozygote advantage, and I am rather skeptical of it after spending some time with those equations.