The Flattening of Blackness in Data: What AI Cannot See

By Yanica Faustin, PhD (she/her), Senior Consultant at Advancing Health Equity — March 10th, 2026

I am a health equity researcher who focuses on maternal health inequities within the Black population. I focus on maternal health because I believe you can tell the health of a nation by the health of its birthing people. Mothers are the canary in the coal mine: when maternal outcomes are poor, they often reveal deeper structural failures in a society’s health, safety, and care systems.

My work is especially concerned with the relationship between racism, racial stress, and health within the Black population in the United States. And when I say “the Black population,” I want to be clear about what I mean, because this is where the problem begins.

Long before artificial intelligence entered the conversation, our data systems had already decided whose experiences would be visible. AI may be accelerating how decisions get made, but the underlying assumptions about race were baked into the data long ago. In U.S. data systems, Blackness is treated as singular. That assumption is not neutral. It is harmful.

How Our Data Systems Flatten Black Identity

Black people living in the U.S. are not a monolith. The Black population is heterogeneous and deeply diverse. It includes Black Americans whose families have lived in this country for generations, Black immigrants from the African continent, and Black immigrants from the Caribbean. These groups share racialization in the U.S., but they do not share identical histories, relationships to Black identity, or experiences of racism. Yet when we collect data, we treat them as if they do.

Most U.S. data systems ask a single question: Are you Black or not? Everyone who checks that box is grouped together. That data is then analyzed, summarized, and used to inform research, policy, clinical guidelines, and public health practice. Entire populations disappear inside that single checkbox. Black migrants, in particular, are rendered statistically invisible.

What makes this especially striking is that we already know how to do better. Asian populations are routinely disaggregated. Latino populations are routinely disaggregated. The system recognizes heterogeneity when it decides it matters. It does not do the same for Black populations.

What the Evidence Already Shows (But the Data Ignores)

This lack of disaggregation matters because although all of these groups are racialized as Black in the U.S., their lived experiences of Blackness are not identical. Black identity is shaped by geography, history, migration, and context. Even within these broad categories, experiences vary widely. Not every Black American has the same experience. Not every Black African immigrant does. Not every Black Caribbean immigrant does. But we are not even at the point of grappling with that level of complexity yet. As a field, we are still stuck on a much more basic question: do we understand that not all Black people experience life in the same way? Health outcomes tell us that the answer is no.

Decades of public health theory help explain why these differences matter. Life course theory teaches us that timing, duration, and accumulation of exposure shape health across the lifespan (Elder, 1998). The weathering hypothesis shows how chronic exposure to racism accelerates biological aging (Geronimus, 1992). Pearlin’s work on chronic stress demonstrates how repeated stressors wear down the body over time (Pearlin, 1989). When we apply these frameworks, it becomes clear why variation within the Black population is not just interesting, but essential to understanding inequities.

Existing quantitative and qualitative research already shows that adverse health outcomes vary by nativity within the Black population (Elo & Culhane, 2010). Black immigrants often experience better health outcomes than Black Americans in the first generation. But that advantage erodes over time (Teitler et al., 2012). The longer individuals live in the U.S., and the longer their exposure to U.S. racism, the worse their health outcomes become (Teitler et al., 2015). By the second generation, outcomes often converge with, or grow worse than, those of Black Americans (Ifatunji et al., 2022). In short, the longer your exposure to racism in the U.S., the greater the health toll. These patterns are not random. They are signals. And yet our data systems are not designed to capture them.

From Data to AI: Automating the “Flattening”

When we flatten the Black population into a single category, we lose the ability to see how identity, discrimination, stress, and exposure operate differently across groups. We miss how people conceptualize their racial identity differently. We miss how they describe and experience discrimination differently. And we miss how their bodies respond to those experiences differently. The data cannot tell us what we have not allowed it to ask.

This brings us to the growing conversation about artificial intelligence. We keep hearing that AI is the future. That AI will solve problems. That AI can help us analyze data, predict risk, and improve health outcomes. There is also no shortage of discussion about AI and bias, about racism in algorithms, and about the idea that AI is only as good as the data it is trained on.

Here is what I think we are not talking about enough. The problem is not simply that AI is biased. The problem is that AI is being asked to operate on data systems that were never built to see certain populations at all.

AI does not question categories. It learns them. If Blackness is flattened at the input level, nuance cannot magically appear at the output. When racial data treats Black identity as singular, AI inherits and operationalizes that assumption. It scales it. It makes it more efficient. And it makes it harder to challenge.
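The point above can be made concrete with a minimal sketch. The records and field names below are invented for illustration; the mechanics are what matter: once a data system stores only the single checkbox, no downstream model, however sophisticated, can recover the distinctions that were discarded at collection.

```python
# Hypothetical records (invented for illustration, not real data).
# Imagine a system that, at intake, knows nativity:
records = [
    {"id": 1, "race": "Black", "nativity": "US-born"},
    {"id": 2, "race": "Black", "nativity": "Caribbean-born"},
    {"id": 3, "race": "Black", "nativity": "Africa-born"},
]

# A flattening data system persists only the checkbox answer:
flattened = [{"id": r["id"], "race": r["race"]} for r in records]

# Anything trained on `flattened` sees exactly one indistinguishable group.
distinct_groups = {r["race"] for r in flattened}
print(distinct_groups)  # {'Black'}
```

The loss happens at the storage step, not the modeling step: the model is faithful to its inputs, and the inputs were already flattened.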

As noted above, we already know how to disaggregate: we do it for Asian populations and for Latino populations. The failure to do the same for Black populations is not an oversight. It is a design flaw. And design flaws have consequences.

Why This Matters for Health Equity

This is not just about representation. When Black migrants and Black Americans are lumped together, masking effects occur: average outcomes appear better than they actually are for Black Americans, because populations with different exposure histories are combined. This has real implications for our understanding of the state of racial health inequities, how we allocate resources, and how we justify policy and practice decisions.
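A toy calculation shows how the masking works. Every number below is invented purely to illustrate the arithmetic: if one subgroup has a lower adverse-outcome rate, the pooled "Black" rate sits between the two and understates the burden on the subgroup with the higher rate.

```python
# Invented adverse-outcome rates per 1,000 births (hypothetical, not real data).
subgroups = {
    "Black American":            {"rate": 12.0, "births": 70_000},
    "Black immigrant (1st gen)": {"rate": 7.0,  "births": 30_000},
}

total_births = sum(g["births"] for g in subgroups.values())

# Pooled rate is a births-weighted average of the subgroup rates.
pooled_rate = sum(
    g["rate"] * g["births"] for g in subgroups.values()
) / total_births

for name, g in subgroups.items():
    print(f"{name}: {g['rate']:.1f} per 1,000")
print(f"Pooled 'Black' rate: {pooled_rate:.1f} per 1,000")  # 10.5
```

With these invented numbers, a report showing only the pooled 10.5 would understate the 12.0 rate for Black Americans, which is exactly the masking effect described above.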

Recent data releases continue to show widening racial health inequities (NYC Department of Health and Mental Hygiene, 2025), even in places like New York City, which has one of the largest Black immigrant populations in the country. Yet Black populations are still reported as a single group. What would we see if that data were disaggregated? Would inequities for Black Americans appear even more severe? Would differences between Black migrant groups become clearer? We cannot know, because the system was not designed to let us see.

These data systems were built for administrative convenience and political comfort, not for complexity. And now, as AI becomes embedded in healthcare decision-making, those same limitations are becoming infrastructure.

The stakes extend beyond misrepresentation. Flattened categories feed risk prediction, clinical decision support, resource allocation, and policy justification. Long before AI entered the picture, we were already using incomplete data to shape practice and policy. AI simply accelerates the consequences of those choices.

The Uncomfortable Truth

AI cannot see what our data systems refuse to conceptualize. And until we confront how Blackness has been flattened in our data systems, the future AI is building will continue to reflect that erasure.
