Will AI make research on humans… less human?

[Illustration: a man in a business suit uploading his brain to a computer.]

If you’re a human, there’s a very good chance you’ve been involved in human subjects research.

Maybe you’ve participated in a clinical trial, completed a survey about your health habits, or taken part in a graduate student’s experiment for $20 when you were in college. Or maybe you’ve conducted research yourself as a student or professional.

Key takeaways

  • AI is changing the way people conduct research on humans, but our regulatory frameworks to protect human subjects haven’t kept pace. 
  • AI has the potential to improve health care and make research more efficient, but only if it’s built responsibly with appropriate oversight. 
  • Our data is being used in ways we may not know about or consent to, and underrepresented populations bear the greatest burden of risk. 

As the name suggests, human subjects research (HSR) is research on human subjects. Federal regulations define it as research involving a living person that requires interacting with them to obtain information or biological samples. It also encompasses research that “obtains, uses, studies, analyzes, or generates” private information or biospecimens that could be used to identify the subject. It falls into two major buckets: social-behavioral-educational and biomedical.  

If you want to conduct human subjects research, you have to seek Institutional Review Board (IRB) approval. IRBs are research ethics committees designed to protect human subjects, and any institution conducting federally funded research must have them. 

We didn’t always have protections for human subjects in research. The 20th century was rife with horrific research abuses. Public backlash after the Tuskegee Syphilis Study was exposed in 1972 led, in part, to the publication of the Belmont Report in 1979, which established three ethical principles to govern HSR: respect for people’s autonomy, minimizing potential harms and maximizing benefits, and distributing the risks and rewards of research fairly. This became the foundation for the federal policy for human subjects protection, known as the Common Rule, which regulates IRBs.

[Photo: older Black men included in the Tuskegee syphilis study.]

It’s not 1979 anymore. And now AI is changing the way people conduct research on humans, but our ethical and regulatory frameworks have not kept up. 

Tamiko Eto, a certified IRB professional (CIP) and expert in the field of HSR protection and AI governance, is working to change that. Eto founded TechInHSR, a consultancy that supports IRBs reviewing research involving AI. I recently spoke with Eto about how AI has changed the game and the biggest benefits — and greatest risks — of using AI in HSR. Our conversation below has been lightly edited for length and clarity.

You have over two decades of experience in human subjects research protection. How has the widespread adoption of AI changed the field?

AI has actually flipped the old research model on its head entirely. We used to study individual people to learn something about the general population. But now AI is pulling huge patterns from population-level data and using that to make decisions about an individual. That shift is exposing the gaps that we have in our IRB world, because a lot of what we do is driven by what’s called the Belmont Report.

That was written almost half a century ago, and it wasn’t really thinking about what I would term “human data subjects.” It was thinking about actual physical beings and not necessarily their data. AI is more about human data subjects; it’s their information that’s getting pulled into these AI systems, often without their knowledge. And so now what we have is this world where massive amounts of personal data are collected and reused over and over by multiple companies, often without consent and almost always without proper oversight.

Could you give me an example of human subjects research that heavily involves AI?

In areas like social-behavioral-educational research, we’re going to see projects where people train models on student-level data to identify ways to improve or enhance teaching and learning.

In health care, we use medical records to train models to identify possible ways that we can predict certain diseases or conditions. The way we understand identifiable data and re-identifiable data has also changed with AI. 

So right now, people can use that data without any oversight, claiming it’s de-identified because of our old, outdated definitions of identifiability.

Where are those definitions from?

Health care definitions are based on HIPAA.

The law wasn’t shaped around the way we look at data now, especially in the world of AI. Essentially it says that if you remove certain identifiers from the data, then that individual can’t reasonably be re-identified — which we now know is not true.

What’s something that AI can improve in the research process? Most people aren’t necessarily familiar with why IRB protections exist. What’s the argument for using AI?

So AI does have real potential to improve health care, patient care, and research in general — if we build it responsibly. We do know that well-designed tools can actually help catch problems earlier, like detecting sepsis or spotting signs of certain cancers in imaging and diagnostics, because we’re able to compare the tool’s output to what expert clinicians would do.

Though I’m seeing in my field that not a lot of these tools are designed well, nor is the plan for their continued use really thought through. And that does cause harm.

I’ve been focusing on how we leverage AI to improve our operations: AI is helping us handle large amounts of data and reduce repetitive tasks that make us less productive and less efficient. So it does have some capabilities to help us in our workflows so long as we use it responsibly. 

It can speed up the actual process of research in terms of submitting an [IRB] application for us. IRB members can use it to review and analyze certain levels of risk and red flags, and to guide how we communicate with the research team. AI has shown a lot of potential, but again, it entirely depends on whether we build it and use it responsibly.

What do you see as the greatest near-term risks posed by using AI in human subjects research?

The immediate risks are things that we know already: these black-box decisions where we don’t actually know how the AI is reaching its conclusions, which makes it very difficult for us to make informed decisions about how it’s used.

Even if AI improves to the point where we can understand it a little bit more, the issue that we’re facing now is the ethics of collecting that data in the first place. Did we have authorization? Do we have permission? Is it rightfully ours to take and even commodify?

So I think that leads into the other risk, which is privacy. Other countries may be a little bit better at it than we are, but here in the US, we don’t have a lot of privacy rights or ownership of our own data. We’re not able to say whether our data gets collected, how it gets collected, how it’s going to be used, and who it’s going to be shared with — that essentially is not a right that US citizens have right now.

Everything is identifiable, so that increases the risk to the people whose data we use, making it essentially not safe. There are studies out there showing that we can re-identify somebody just from their MRI scan, even though we don’t have a face, we don’t have names, we don’t have anything else; we can re-identify them through certain patterns. We can identify people through their step counts on their Fitbits or Apple Watches, depending on their location data.

I think maybe the biggest thing that’s coming up these days is what’s called a digital twin. It’s basically a detailed digital version of you built from your data. That could be a lot of information grabbed about you from different sources: your medical records, biometric data that may be out there, social media, movement patterns if they’re capturing them from your Apple Watch, online behavior from your chats, LinkedIn, voice samples, writing styles. The AI system gathers all your behavioral data and then creates a model that duplicates you so that it can do some really good things. It can predict how you’ll respond to medications, for example.

But it can also do some bad things. It can mimic your voice, or it can do things without your permission. There is this digital twin out there that you never authorized to be created. It’s technically you, but you have no rights to your digital twin. That’s something that hasn’t been addressed in the privacy world as well as it should be, because it goes under the guise of “if we’re using it to help improve health, then it’s justified use.”

What about some of the long-term risks?

We don’t really have a lot we can do now. IRBs are technically prohibited from considering long-term impact or societal risks. We’re only thinking about that individual and the impact on that individual. But in the world of AI, the harms that matter the most are going to be discrimination, inequity, the misuse of data, and all of that stuff that happens at a societal scale.

Then I think the other risk we were talking about is the quality of the data. The IRB has to follow this principle of justice, which means that the research benefits and harms should be equally distributed across the population. But what’s happening is that marginalized groups end up having their data used to train these tools, usually without consent, and then they disproportionately suffer when the tools are inaccurate and biased against them.

So they’re not getting any of the benefits of the tools that get refined and actually put out there, but they’re responsible for the costs of it all. 

Could someone who was a bad actor take this data and use it to potentially target people?

Absolutely. We don’t have adequate privacy laws, so it’s largely unregulated and it gets shared with people who can be bad actors or even sell it to bad actors, and that could harm people.

How can IRB professionals become more AI literate?

One thing that we have to realize is that AI literacy is not just about understanding technology. I don’t think just understanding how it works is going to make us literate so much as knowing what questions we need to ask.

I have some work out there as well on a three-stage framework for IRB review of AI research that I created. It was to help IRBs better assess what risks arise at certain development time points and to understand that the process is cyclical, not linear. It’s a different way for IRBs to look at research phases and evaluate them. By building that kind of understanding, we can review cyclical projects so long as we slightly shift what we’re used to doing.

As AI hallucination rates decrease and privacy concerns are addressed, do you think more people will embrace AI in human subjects research?

There’s this concept of automation bias, where we have this tendency to just trust the output of a computer. It doesn’t have to be AI, but we tend to trust any computational tool and not really second guess it. And now with AI, because we have developed these relationships with these technologies, we still trust it. 

And then also, we’re fast-paced. We want to get through things quickly, especially in the clinic. Clinicians don’t have a lot of time, so they’re not going to have time to double-check if the AI output was correct.

I think it’s the same for an IRB person. If I was pressured by my boss saying “you have to get X amount done every day,” and if AI makes that faster and my job’s on the line, then it’s more likely that I’m going to feel that pressure to just accept the output and not double-check it. 

And ideally the rate of hallucinations is going to go down, right?

What do we mean when we say AI improves? In my mind, an AI model only becomes less biased or less hallucinatory when it gets more data from groups that it previously ignored or wasn’t normally trained on. So we need to get more data to make it perform better.

So if companies are like, “Okay, let’s just get more data,” then more than likely they’re going to get this data without consent. They’re just going to scrape it from places people never expected — which people never agreed to.

I don’t think that’s progress. I don’t think that’s saying the AI improved; it’s just further exploitation. Improvement requires ethical data sourcing, with permission, that benefits everybody and puts limits on how our data is collected and used. I think that’s going to come with laws, regulations, and transparency, but more than that, I think it’s going to come from clinicians.

Companies that are creating these tools are lobbying so that if anything goes wrong, they’re not going to be accountable or liable. They’re going to put all of the liability onto the end user, meaning the clinician or the patient.

If I was a clinician and I knew that I was liable for any of the mistakes that were made by the AI, I wouldn’t embrace it because I wouldn’t want to be liable if it made that mistake. I would always be a little bit cautious about that.

Walk me through the worst-case scenario. How can we avoid that?

I think it all starts in the research phase. The worst-case scenario for AI is that it shapes the decisions that are made about our personal lives: our jobs, our health care, whether we get a loan, whether we get a house. Right now, everything has been built on biased data and largely with no oversight.

IRBs are there primarily for federally funded research. But because this AI research is done with unconsented human data, IRBs usually just give waivers, or it doesn’t even go through an IRB. It’s going to slip past all the protections that we would normally have built in for human subjects.

At the same time, people are going to trust these systems so much that they’re just going to stop questioning their output. We’re relying on tools that we don’t fully understand. We’re just further embedding these inequities into our everyday systems, starting in that research phase. And people trust research for the most part. They’re not going to question the tools that come out of it and end up getting deployed into real-world environments. It’s just consistently feeding into continued inequity, injustice, and discrimination, and that’s going to harm underrepresented populations and anyone whose data wasn’t in the majority at the time those tools were developed.
