Without Small Data, AI in Health Care Contributes to Disparities

Artificial intelligence systems in health care must be trained on the data of lived experience to prevent bias and disparities


Several years ago, I attended an international health care conference, eagerly awaiting the keynote speaker’s talk about a diabetes intervention that targeted people in lower socioeconomic groups of the U.S. He noted how an AI tool enabled researchers and physicians to use pattern recognition to better plan treatments for people with diabetes.

The speaker described the study, the ideas behind it and the methods and results. He also described the typical person who was part of the project: a 55-year-old Black female with a 7th to 8th grade reading level and a body mass index suggesting obesity. This woman, the speaker said, rarely adhered to her normal diabetes treatment plan. This troubled me: whether or not a person adhered to her treatment was reduced to a binary yes or no. And that did not take into consideration her lived experience—the things in her day-to-day life that led to her health problems and her inability to stick to her treatment.

The algorithm rested on data from medications, laboratory tests and diagnosis codes, among other things, and, based on this study, doctors would be delivering health care and designing treatment plans for middle-aged, lower-income Black women without any notion of how feasible those plans would be. Such practices would undoubtedly add to health disparities and health inequity.


As we continue to build and use AI in health care, if we want true equity in access, delivery and outcomes, we need a more holistic approach throughout the health care process and ecosystem. AI developers must come from diverse backgrounds to achieve this, and they will need to train their systems on “small data”—information about human experience, choices, knowledge and, more broadly, the social determinants of health. The clinical errors that we will avoid in doing so will save money, shrink stigma and lead to better lives.

To me, one of the fundamental flaws of artificial intelligence in health care is its overreliance on big data, such as medical records, imaging and biomarker values, while ignoring the small data. Yet these small data are crucial to understanding whether people can access health care, as well as how it is delivered, and whether people can adhere to treatment plans. They are the missing component in the push to bring AI into every facet of medicine, and without them, AI will not only continue to be biased, it will promote bias.

Holistic approaches to AI development in health care can happen at any point; lived-experience data can inform early stages like problem definition, data acquisition, curation and preparation stages, intermediate work like model development and training, and the final step of results interpretation.

For example, if the AI diabetes model, based on a platform called R, had been trained on small data, it would have known that some participants needed to travel by bus or train for more than an hour to get to a medical center, while others worked jobs that made it difficult to get to the doctor during business hours. The model could have accounted for food deserts, which limit access to nutritious foods and physical activity opportunities, as food insecurity is more common in people with diabetes (16 percent) than in those without (9 percent).

These factors are part of socioeconomic status, which is more than income: it includes social class and educational attainment as well as the opportunities and privileges afforded to people in our society. A better approach would have meant including data that capture or consider the social determinants of health along with health equity. These data points could include economic stability, neighborhood or environment attributes, social and community context, education access and quality, and health care access and quality.

All this could have given providers and health systems more nuance into why any one woman in the study might not be able to adhere to a regimen that includes many office visits, multiple medications per day, physical activity or community support groups. The treatment protocols could have included longer-acting medications, interventions that don’t require travel and more.

Instead, what we were left with in that talk was the impression that the typical Black woman in the study does not care about her condition and its chronic health implications. Such research results are often interpreted narrowly, without the “whole” of a person’s life experiences and conditions. Clinical recommendations, then, exclude the social determinants of health for the “typical” patient and are given, reported and recorded without understanding the “how”: how the Black female patient lives, works, travels, worships and ages. This is profoundly harmful medicine.

Predictive modeling, generative AI and many other technological advances are blasting through public health and life science modeling without small data being baked into the project life cycle. In the case of COVID-19 and pandemic preparedness, people with darker skin were less likely to receive supplemental oxygen and lifesaving treatment than people with lighter skin, because the rapid development of pulse oximeter algorithms did not take into account that darker skin causes the oximeter to overestimate how much oxygenated blood patients have, and so to underestimate how severe a case of COVID-19 is.

Human-machine pairing requires that we all reflect rather than rush to judgment or results, and that we ask the critical questions that can inform equity in health decision-making, such as questions about health care resource allocation, resource utilization and disease management. Algorithmic predictions have been found to account for 4.7 times more health disparities in pain relative to the standard deviation, and have been shown to result in racial biases in cardiology, radiology and nephrology, just to name a few. Model results are not the end of the data work but should be embedded in the algorithmic life cycle.

The need for lived experience data is also a talent problem: Who is doing the data gathering and algorithmic development? Only 5 percent of active physicians in 2018 identified as Black, and about 6 percent identified as Hispanic or Latine. Doctors who look like their patients, and have some understanding of the communities where they practice, are more likely to ask about the things that become small data.

The same goes for the people who build AI platforms; participation in science and engineering education has dropped among the same groups, as well as among American Indians or Alaska Natives. We must bring more people from diverse groups into AI development, use and results interpretation.

How to address this is layered. In employment, people of color can be invisible but present, absent or unheard in data work; I talk about this in my book Leveraging Intersectionality: Seeing and Not Seeing. Organizations must be held accountable for the systems that they use or create; they must foster inclusive talent as well as leadership. They must be intentional in recruitment and retention of people of color and in understanding the organizational experiences that people of color have.

The small data paradigm in AI can serve to unpack lived experience. Otherwise, bias is coded into data sets that do not represent truth, with coding that embeds erasure of human context and counting that informs our interpretation, ultimately amplifying bias in “typical” patients’ lives. The data problem points to a talent problem, at both the clinical and technological levels. The development of such systems can’t be binary, like the AI in the diabetes study. Nor can the labeling of the “typical” patient as adherent or nonadherent be accepted as the final version of truth; the inequities in care must be accounted for.

This is an opinion and analysis article, and the views expressed by the author or authors are not necessarily those of Scientific American.