Race and genetics

I think your view is not that of modern genetics: Automatic statistical techniques find clusters (edit: I mean here: possibly overlapping groupings of genetic data) which correspond very roughly to commonly used racial categories.

Look at the graph in the following post in which automatic, statistical techniques were used to visualize variations in human genetic diversity:

Or if you want something peer reviewed, try this:
“Despite the low average level of between-group variation, clusters recently inferred from multilocus genetic data coincide closely with groups defined by self-identified race or continental ancestry.”

Obviously, when you attempt to cut up a complex dataset there can be some things that don’t fit neatly into any of your categories, and there may be multiple ways of categorizing the same data. In that sense, race (and any other categorization) is purely a social construct. On the other hand, those categorizations that are provided by automatic, apolitical statistical techniques which are ‘optimal’ in some formal sense correspond roughly to everyday racial categories. In that sense, race seems to be a biological reality.


No, taking that as evidence for racial clustering is a mistake; see for instance this discussion by Bolnick. In short, while the techniques may have been automatic, they were based around taking some number of clusters as an assumption and so couldn’t help but find them somewhere.

If you keep that in mind, you can see how the same chart fits as well or better with everyone on a single continuum or genetic cline, and from what I can tell that’s the current consensus. You can certainly cut that distribution into any number of pieces, but like MarjaE suggested, the divisions will be essentially arbitrary.

Also, this is all very off-topic for this thread.


I don’t really see how specifying the number of clusters makes a difference to the fact that the resulting clusters coincide with commonly used racial categories.

If the hypothesis that race has no basis in biological reality were true, then no matter what number of clusters you specify a priori, the result would not correspond to common racial categories. Note that I never said that our everyday racial categories are optimal in any sense of the word, just that they have a basis in non-socially constructed reality.

The text you’re linking seems to point out a feature of a statistical technique that allows you to choose how finely-grained you want your clustering to be. I agree that this does imply that clusters are not set in stone, but it doesn’t mean the clusters themselves are arbitrary.

I find the following argument useful as a rule of thumb, and it’s the reason I didn’t read the text you linked in detail:

By the same argument you’re referencing, we could say that culture does not exist, never mind that we can statistically identify clusters of commonly held beliefs, values, common languages, common types of cultural expression, etc. The result of any clustering (automatic or manual) depends on the number of clusters that you ask for. Does the US have one culture? Does it have twelve? More or less? If there is no clear answer, does this mean the concept of US culture has no basis in reality?

Another tack is: If this argument were true, this would apply to ALL categorization, and not just racial categorization. Does this mean every scientist who studies clustering is misguided?


No. You can use statistical analysis to show some things actually do come in separate or semi-separate clusters, as for instance is the basis for biological taxonomy. The point is that no such analysis was done here; the possibility of a single cline was never considered.

So sure, the resulting groups may coincide with common racial categories, though as Bolnick discusses, not always as well as presented. But that doesn’t change the fact that they aren’t actually clusters the way they’re assumed to be; they may just be sections of a continuum. There is obviously variation, but it would take more to show the splits aren’t arbitrary, and in this case they seem to be.

But hey, Bolnick is a lot more knowledgeable than I am, and explains this a lot better. If you are actually interested in what modern genetics supports, you’d be better reading her chapter than listening to me. And if you’d rather trust an analogy from a blog post than that, what are the odds you’d listen to me anyway?


I understand your point, and I think we are arguing slightly different things here:

My claim is that the common, non-scientific concept of race, that is, the concept of race as it is commonly understood among the general population, has a basis in biological reality. It corresponds, however fuzzily, to some genetic truth.

Your claim is that based on distribution of genotypes, the term “race” is not appropriate from the perspective of biological taxonomy, because the clusters that can be extracted from the dataset don’t meet the necessary criteria.

I have no issue with your claim and what you are saying seems to support mine.

My claim was straightforward: human variation appears to be a cline, and the existence of genetic clustering, which you presented as supported by modern genetics, actually is not. So no, that doesn’t support what you were saying, and no, it’s still not on topic here. :unamused:


To nitpick: In order to do that you have to define what you mean by “cluster”. The concept of a cluster is itself fuzzy, so any statistical technique that identifies clusters, or checks whether a dataset is clustered, will require some assumptions to do its job.

E.g., in the following 1-dimensional data set, how many clusters are there?


Depending on the algorithm you use, and the parameters you choose, you could find one, two or three clusters in the above set.
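To illustrate how the answer depends on a parameter, here is a minimal sketch on made-up numbers (the `points` list is hypothetical and is not the data set referred to above). The helper `gap_clusters` is a name invented for this example; it simply starts a new group whenever two neighbouring values are further apart than `max_gap`.

```python
def gap_clusters(points, max_gap):
    """Split sorted 1-D points wherever two neighbours are more than max_gap apart."""
    ordered = sorted(points)
    clusters = [[ordered[0]]]
    for prev, cur in zip(ordered, ordered[1:]):
        if cur - prev > max_gap:
            clusters.append([cur])      # gap too large: start a new cluster
        else:
            clusters[-1].append(cur)    # close enough: extend the current cluster
    return clusters


# Hypothetical data chosen for illustration only.
points = [1.0, 1.2, 1.4, 3.0, 3.2, 3.4, 8.0, 8.2, 8.4]

# Same data, three different settings of the one free parameter.
counts = {gap: len(gap_clusters(points, gap)) for gap in (0.5, 2.0, 5.0)}
print(counts)
```

On this toy data the same procedure reports three, two, or one cluster as `max_gap` grows from 0.5 to 2.0 to 5.0, so the count is a property of the data and the chosen parameter together.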

My apologies, I used the term “cluster” in the sense that I’m familiar with from CS (edit: and, on reflection, also with the definition on Wikipedia), which roughly corresponds to “the output of a clustering algorithm”. I didn’t mean to imply non-overlap, distance between clusters or anything like that. I understand that this use is maybe non-standard and possibly misleading (I will edit the above post to clarify).

Your annoyance at the off-topic discussion is somewhat disingenuous given that we’ve equally participated in it. If you want the last word, just say it.

Well, how many clusters do you find in the following 1-dimensional data set?


This is what a genetic cline looks like, and from all evidence what human variation looks like. The type of algorithm that was used to generate the chart would be happy to cut it into two, three, or more pieces. Are you actually saying you’d describe it as clustered data? Because I know lots of CS people, and they generally would not.
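As a rough sketch of that behaviour, the snippet below runs a plain 1-D k-means on evenly spaced values. This is only a stand-in: it is not the algorithm behind the linked chart, and the function name `kmeans_1d`, the deterministic starting centres, and the 100-point `cline` are all assumptions made for illustration.

```python
def kmeans_1d(points, k, iterations=50):
    """Plain Lloyd-style k-means on 1-D data with deterministic starting centres."""
    ordered = sorted(points)
    n = len(ordered)
    # Start the centres spread evenly through the sorted data.
    centres = [ordered[(2 * i + 1) * n // (2 * k)] for i in range(k)]
    groups = []
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for x in ordered:
            nearest = min(range(k), key=lambda j: abs(x - centres[j]))
            groups[nearest].append(x)
        # Move each centre to the mean of its group (keep it if the group is empty).
        centres = [sum(g) / len(g) if g else c for g, c in zip(groups, centres)]
    return groups


# A perfectly smooth gradient: 100 evenly spaced hypothetical samples.
cline = list(range(100))

pieces = {k: kmeans_1d(cline, k) for k in (2, 3, 4)}
for k, groups in pieces.items():
    # Show the lowest and highest value that landed in each piece.
    print(k, [(g[0], g[-1]) for g in groups])
```

Whatever `k` is requested, the algorithm returns that many contiguous, non-empty pieces of the gradient, even though the data contain no gaps at all.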

I suppose being annoyed at the off-topic discussion is disingenuous, sure, though I do think it belongs elsewhere or nowhere. As far as me being annoyed, it’s really more that you would talk about what modern evidence shows, explicitly refuse to read through a discussion of modern evidence, and then pretend it supports what you were saying anyway.


I think you were right to call me out on my use of the term “cluster”. I expressed myself badly (somewhat influenced by the nature article, which uses similar language), and I appreciate the feedback. I have also read through the discussion you linked in the meantime: it didn’t challenge any beliefs I held.

I still hold to my original point, which was meant to be “commonly used racial categories have some basis in biological reality”, i.e., they are better predictors of genetic similarity than if you group people randomly. Any discussion of “clustering” was supposed to be in the service of making that point. In no way did I mean to imply that these clusters are non-contiguous; my only point was that they express similarity information. The blog post I linked about culture may be of interest.
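The “better than random grouping” criterion can be sketched numerically. The example below is a toy under stated assumptions: the one-dimensional `cline`, the three labels, and the helper `mean_within_group_variance` are all hypothetical, and within-group variance is just one possible measure of similarity.

```python
import random


def mean_within_group_variance(values, labels):
    """Average, over groups, of the variance of the values inside each group."""
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(label, []).append(value)
    variances = []
    for members in groups.values():
        mean = sum(members) / len(members)
        variances.append(sum((x - mean) ** 2 for x in members) / len(members))
    return sum(variances) / len(variances)


# Hypothetical smooth gradient of 99 samples, with no gaps anywhere.
cline = list(range(99))

# Grouping 1: three contiguous slices of the gradient.
sliced = [i // 33 for i in cline]

# Grouping 2: the same label counts, assigned at random.
rng = random.Random(0)
shuffled = sliced[:]
rng.shuffle(shuffled)

within_sliced = mean_within_group_variance(cline, sliced)
within_random = mean_within_group_variance(cline, shuffled)
print(within_sliced, within_random)
```

In this toy, contiguous slices of a gapless gradient still group similar values together far better than random labels do. That is consistent with the weaker “better than random” claim, and it says nothing about whether the data fall into separate clusters.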

Also, feel free to ask your CS people if it is weird to call a partition provided by a clustering algorithm a “cluster”, regardless of whether it is contiguous with another cluster or not. (Or never mind, from Wikipedia: “clustering is the task of grouping a set of objects in such a way that objects in the same group (called a cluster) are more similar (in some sense or another) to each other than to those in other groups (clusters)”)

First, modern genetics is redundant.

Second, I recall the consensus being that first, there is more diversity within each group than between groups, and second, that there is more diversity between groups in Africa than between groups in the rest of the world.

Third, I recall recent discoveries showing that the few exceptions such as Neanderthal genes are found in all groups in the rest of the world.

Fourth, none of this could fit all traditional racial categories because there are inconsistencies among traditional racial categories, such as whether northern and southern Asian populations are grouped into one or two categories.

P.S. And fifth, there’s a long history of people being racist against what were obviously not racial groups.


Only if you consider Mendel “modern.”


The claim about diversity within group vs between groups is true, but only when talking about individual loci. When you look at multiple loci and take into account correlations, then “racially different” populations start to cluster: Check out the diagram here, and look at this. See also the analogy to culture here:

“It should be terribly obvious that almost all variation in people’s cultural traits is within-culture rather than between-culture. Do you play the piano? Speak Chinese? Eat meat? Vote straight Libertarian? Have gay sex? Go to the zoo? Certainly there is more variation among individuals within California in all of these areas than there is between the average Californian and the average New Yorker.”
Does this mean it doesn’t make sense to speak of California and New York culture?
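Going back to the single-locus versus multilocus point, a toy simulation can make it concrete. Everything here is assumed for illustration: two hypothetical populations whose allele frequencies are 0.45 and 0.55 at every locus, independent loci, and a simple “sum the alleles and compare to the midpoint” rule. It is a supervised simplification, not the unsupervised method used in the studies under discussion.

```python
import random


def simulate(n_per_pop, n_loci, freq_a=0.45, freq_b=0.55, seed=0):
    """Draw haploid 0/1 genotypes for two hypothetical populations."""
    rng = random.Random(seed)

    def person(freq):
        return [1 if rng.random() < freq else 0 for _ in range(n_loci)]

    pop_a = [person(freq_a) for _ in range(n_per_pop)]
    pop_b = [person(freq_b) for _ in range(n_per_pop)]
    return pop_a, pop_b


def accuracy(pop_a, pop_b, loci_used):
    """Fraction of people assigned to the right population using the first loci_used loci."""
    threshold = loci_used * 0.5  # midpoint between expected sums for 0.45 and 0.55
    correct = sum(sum(p[:loci_used]) < threshold for p in pop_a)
    correct += sum(sum(p[:loci_used]) > threshold for p in pop_b)
    return correct / (len(pop_a) + len(pop_b))


pop_a, pop_b = simulate(n_per_pop=300, n_loci=1000)
acc_one = accuracy(pop_a, pop_b, loci_used=1)
acc_many = accuracy(pop_a, pop_b, loci_used=1000)
print(acc_one, acc_many)
```

With one locus the rule is barely better than a coin flip, because almost all variation is within each population. With a thousand loci, the small per-locus differences add up and assignment becomes nearly perfect. Whether real human data behave like this idealised model is exactly what the thread is arguing about.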

I’m not sure how meaningful your third point is. Regarding your fourth point, the inconsistency between traditional racial categories, I’m not suggesting that ALL racial categories are equally meaningful or good or that we are currently using very good ones. What I’m suggesting is that the everyday racial categories we use are a better clustering of the data than you would get if you randomly assigned people to groups. In this respect, racial categories have a degree of biological reality (i.e., they aren’t freely constructed).

Regarding your other point about the possibility of subdividing categories: If you distinguish a downpour from a drizzle, yet I call both of these rain, is this evidence that weather is not real? Since I can further subdivide US culture into northern US culture and southern US culture, is this evidence that culture is not real? “Black” could be considered a racial category, as could be “African American”. The latter is more precise than the former, yet I would assume that both contain some level of information that would allow me to make certain predictions about genetics better than chance. Categories don’t have to be clear cut or non-overlapping to be meaningful. I’m not arguing that “race” is a good term for commonly used racial categories, and it’s quite possible that commonly used racial categories don’t qualify as “races” in some strict biological sense. My argument is that our common-use racial taxonomies do cluster the genetic data better than just randomly putting people into groups.

Regarding the fifth point, again I’m not saying that all racial categories invented by humans make sense, just that commonly used categories have some “genetic reality” (i.e., more than assigning people to groups randomly).

on a purely technical level, you are wrong. there are clustering techniques which do estimate the number of clusters (and estimating it as 1 corresponds to a single cline). it’s a harder problem, and not quite as satisfactorily solved, but it has been done.

also these clustering techniques generally assume something like a number of separated proto-populations (as would be formed by population bottlenecks or founder effects) which then admix together to produce what we observe today. so, in a sense, they really DO assume that modern populations are “sections of a continuum”; each subject has a point estimate corresponding to how much of each proto-population’s alleles is present in his or her genome. the clustering is on hypothetical populations, not observed subjects.
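A heavily simplified sketch of that admixture idea follows. It assumes just two hypothetical proto-populations whose allele frequencies `p1` and `p2` are already known, independent haploid loci, and a grid search over the mixing fraction `q`. Real tools estimate the frequencies and the fractions jointly, so this is not how any particular program works, and every name in it is made up for the example.

```python
import math
import random


def mixed_frequencies(q, p1, p2):
    """Expected allele frequency per locus for ancestry fraction q from population 1."""
    return [q * a + (1 - q) * b for a, b in zip(p1, p2)]


def log_likelihood(genome, q, p1, p2):
    """Log-probability of the observed 0/1 alleles given ancestry fraction q."""
    return sum(
        math.log(f if allele else 1 - f)
        for allele, f in zip(genome, mixed_frequencies(q, p1, p2))
    )


def estimate_q(genome, p1, p2, steps=100):
    """Point estimate of q: the grid value with the highest likelihood."""
    grid = [i / steps for i in range(steps + 1)]
    return max(grid, key=lambda q: log_likelihood(genome, q, p1, p2))


rng = random.Random(1)
n_loci = 2000
# Hypothetical proto-population allele frequencies, kept away from 0 and 1.
p1 = [rng.uniform(0.1, 0.9) for _ in range(n_loci)]
p2 = [rng.uniform(0.1, 0.9) for _ in range(n_loci)]

# Simulate one admixed individual with 30% ancestry from population 1.
true_q = 0.3
genome = [1 if rng.random() < f else 0 for f in mixed_frequencies(true_q, p1, p2)]

estimated_q = estimate_q(genome, p1, p2)
print(true_q, estimated_q)
```

The output of such a model is a fraction per individual, not a hard label, which is why individuals with intermediate values of `q` line up along a continuum between the hypothetical source populations.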

Except that’s more or less what I said; the problem with the sources that were linked is that such techniques were not used. Again, this is discussed in the chapter I linked.

Yes, he was born in 1822.

Unfortunately, the link doesn’t work.

Huh! If you have some other source for things like this, I was linking Individual Ancestry Inference and the Reification of Race as a Biological Phenomenon. Looking now, it’s apparently available when I’m signed into Google and not otherwise. I’d seen it described as freely accessible and didn’t know to check if it depended on something like that. My apologies.

No, not really. It has been a while.

again, the software cited, STRUCTURE, does have the capability to infer the “best” number of clusters. it’s not terribly good at it, but it is there. i know this because i’ve used it.

and there is, nonetheless, the remarkable coincidence that if you do assume clusters (which really just correspond to hypothetical isolated human populations in the past) and standard allele inheritance models on neutral genes (i.e. neither “positive” nor “negative”), you just happen to get predictions which correspond to national borders within Europe.
