An Asian MIT student asked AI to turn an image of her into a professional headshot. It made her white, with lighter skin and blue eyes.
Rona Wang, a 24-year-old MIT student, was experimenting with the AI image creator Playground AI to create a professional LinkedIn photo.


@Hazdaz@lemmy.world

Ask AI to generate an image of a basketball player and see what happens.

This isn’t some OMG ThE CoMpUtER Is tHe rAcIsT… this is a model using historical data to alter or generate a new image. But our news media will of course try to turn it into some clickbait BS.

I asked a taxi driver in Bollywood to take me to the home of someone famous. He took me to an Indian person’s house. Does he think all famous people are Indian?

deleted by creator

Or… and I’m just spitballing here. Don’t ask it to do something you know probably won’t give you something you’re happy with, and you won’t be insulted…

These biases have always existed in the training data used for ML models (society shapes the data we collect, and its biases are latent within it), but it’s definitely interesting that generative models now make these biases much, much more visible (figuratively, and with image models literally) to the lay person.

But they know the AIs have these biases, at least now. Shouldn’t they be able to code them out or lessen them? Or would that just create more problems?

Sorry, I’m no programmer, so I have no idea if that’s even possible or not. It just sounds possible in my head.

Dale

It’s possible, sure. In order to train these image AIs you essentially feed them a massive number of pictures as “training data.” These biases happen because more often than not the training data used is mostly pictures of white people. This might be due to racial bias on the part of the creators, or a more charitable explanation: they only had the rights to pictures of mostly white people. Either way, the fix is to train the AI on more diverse faces.

CharlestonChewbacca

That’s not how it works. You don’t just “program out the biases”; you have to retrain the model with more inclusive training data.

Dojan

You don’t really program them; they learn from the data provided. If, say, you want a model that generates faces, and you provide it with 500 faces, 470 of which are of black women, then when you ask it to generate a face, it’ll most likely generate the face of a black woman.

The models are essentially maps of probability, you give it a prompt, and ask it what the most likely output is given said prompt.
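A minimal sketch of that idea in Python (a toy stand-in, not a real image model; the 470/500 split is the hypothetical dataset from above):

import random

# Toy "model": it has no notion of race or fairness, it simply
# reproduces the empirical distribution of its training data.
training_faces = ["black woman"] * 470 + ["other"] * 30

def generate_face():
    return random.choice(training_faces)

samples = [generate_face() for _ in range(1000)]
print(samples.count("black woman") / len(samples))  # ~0.94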

If she had used a model trained to generate pornography, it would’ve likely given her something more pornographic, if not outright explicit.


You’ve also touched on a real problem with large language models: they’re not programmed, but rather prompted.

When it comes to Bing Chat, ChatGPT, and others, they have additional AI agents sitting alongside them to help filter out problematic content, both content provided by the user and content the LLM itself generates. In one prompt I tried, the filter marked my content as problematic and the bot gave me a canned response: “Hi, I’m Bing. Sorry, can’t help you with this. Have a nice day. :)”

These filters are very crude, but are necessary because of problems inherent in the source data the model was trained on. See, if you crawl the internet for data to train on, you’re bound to bump into all sorts of good information: Wikipedia articles, Q&A forums, recipe blogs, personal blogs, fanfiction sites, etc. Enough of this data will give you a well-rounded model capable of generating believable content across a wide range of topics. However, you can’t feasibly filter the entire internet. Among all of this you’ll find hate speech, blogs run by neo-nazis and conspiracy theorists, blogs where people talk about their depression, suicide notes, misogyny, racism, and all sorts of depressing, disgusting, evil, and dark aspects of humanity.

Thus there’s no code you can change to fix racism.

if (bot.response == racist) 
{
    dont();
}

But rather simple measures that read the user/agent interaction, filter it for possible bad words, or, more likely, use another AI model to gauge the probability of an interaction being negative:

if (interaction.weightedResult < negative)
{
    return "I'm sorry, but I can't help you with this at the moment. I'm still learning though. Try asking me something else instead! 😊";
}
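In practice, that second check might look something like this sketch (the classifier is just one publicly available example from the Hugging Face hub, and the threshold is an arbitrary assumption):

from transformers import pipeline

# Assumption: any toxicity classifier with a label/score interface
# would do; "unitary/toxic-bert" is one example.
toxicity = pipeline("text-classification", model="unitary/toxic-bert")

CANNED_REPLY = ("I'm sorry, but I can't help you with this at the "
                "moment. I'm still learning though. Try asking me "
                "something else instead!")

def moderate(message, threshold=0.8):
    # Return the canned reply if the message scores as toxic,
    # otherwise None, meaning "safe to forward to the LLM".
    result = toxicity(message)[0]
    if result["label"] == "toxic" and result["score"] > threshold:
        return CANNED_REPLY
    return None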

As an aside, if she’d prompted “professional Asian woman” it likely would’ve done a better job. Depending on how much “creative license” she gives the model, though, it still won’t give her her own face back. I get the idea of what she’s trying to do, and there are certainly ways of achieving it, but she likely wasn’t using a product/model weighted to do specifically the thing she was asking it to do.
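To make the “creative license” knob concrete: in Stable Diffusion img2img it’s the strength parameter, which controls how far the output may drift from the input photo. A rough sketch with the diffusers library (the model id and file names are illustrative assumptions):

import torch
from diffusers import StableDiffusionImg2ImgPipeline
from PIL import Image

pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

init = Image.open("selfie.jpg").convert("RGB").resize((512, 512))

# strength=0.3 keeps most of the original photo and lightly restyles
# it; strength=0.9 follows the prompt and largely invents a new face.
result = pipe(
    prompt="professional headshot of an Asian woman, business attire",
    image=init,
    strength=0.3,
    guidance_scale=7.5,
).images[0]
result.save("headshot.png")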


Edit

Just as a test, because I myself got curious; I had Stable Diffusion generate 20 images given the prompt

professional person dressed in business attire, smiling

20 sampling steps, using DPM++ 2M SDE Karras, and the v1-5-pruned-emaonly Stable Diffusion model.

Here’s the result

I changed the prompt to

professional person dressed in business attire, smiling, [diverse, diversity]

And here is the result

The models can generate non-white men, but it is in a way just a reflection of our society. White men are the default. Likewise if you prompt it for “loving couple” there’ll be so many images of straight couples. But don’t just take my word for it, here’s an example.
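If anyone wants to reproduce a test like this themselves, here’s roughly what it looks like with the diffusers library (a sketch; mapping the web UI’s “DPM++ 2M SDE Karras” onto these scheduler options is my assumption):

import torch
from diffusers import StableDiffusionPipeline, DPMSolverMultistepScheduler

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")
pipe.scheduler = DPMSolverMultistepScheduler.from_config(
    pipe.scheduler.config,
    algorithm_type="sde-dpmsolver++",
    use_karras_sigmas=True,
)

images = pipe(
    "professional person dressed in business attire, smiling",
    num_inference_steps=20,
    num_images_per_prompt=4,  # repeat batches to reach 20 images
).images
for i, img in enumerate(images):
    img.save(f"sample_{i}.png")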

@Blademax@lemmy.world

The Hands/digits…the horror…

It’s clearly biased against fingered folks!

Dojan

It can do faces quite well on second passes but struggles hard with hands.

Corporate photography tends to be uncanny and creepy to begin with, so using an AI to generate it made it even more so.

I totally didn’t just spend 30 minutes generating corporate stock photos and laughing at the creepy results. 😅

@Blademax@lemmy.world

Just glad there’s a tell for AI photos. Hope it never figures it out.

@Tgs91@lemmy.world

Shouldn’t they be able to code them out?

You can’t “code them out” because AI isn’t using a simple script like traditional software. These are giant nested statistical models that learn from data. An image model learns to understand the images it was trained on and how they relate to text. You can’t tell it “in this situation, don’t consider race” because the situation itself is not coded anywhere. It’s all learned behavior from the training data.

Shouldn’t they be able to lessen them?

For this one the answer is YES. And they DO lessen them as much as they can. But they’re training on data scraped from many sources. You can try to curate the data to remove racism/sexism, but there’s no easy way to remove bias from data that is so open ended. There is no way to do this in an automated way besides using an AI model, and for that, you need to already have a model that understands race/gender/etc bias, which doesn’t really exist. You can have humans go through the data to try to remove bias, but that introduces a ton of problems as well. Many humans would disagree on what is biased. And human labelers also have a shockingly high error rate. People are flat out bad at repetitive tasks.

And even that only covers data that actively contains bigotry. In most of these generative AI cases, the real issue is just a lack of data, or imbalanced data from the internet. For this specific article, the user asked to make a photo look professional. Training data where photos were clearly in a professional setting probably came from sites like LinkedIn, which had a disproportionate number of white users. These models also have a better understanding of English than other languages because there is so much more training data available in English. So Asian professional sites may exist in the training data, but the model didn’t understand the language as well, so it’s not as confident about professional images of Asians.

So you can address this by curating the training data. But this is just ONE of THOUSANDS and THOUSANDS of biases, and it’s not possible to control all of them in the data. Often if you try to correct one bias, it accidentally causes the model to perform even worse on other biases.
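To make the “lessen them” part concrete: one common mitigation is to oversample under-represented groups during training. A sketch (assuming you already have per-example group labels, which in practice is the hard part):

from collections import Counter
from torch.utils.data import WeightedRandomSampler

# Toy labels; in reality these would come from (noisy) annotation.
group_labels = ["white", "white", "white", "asian", "black"]
counts = Counter(group_labels)

# Inverse-frequency weights: rarer groups get sampled more often.
weights = [1.0 / counts[g] for g in group_labels]
sampler = WeightedRandomSampler(weights, num_samples=len(weights))
# Pass sampler=sampler to a DataLoader: each group then appears
# roughly equally often per epoch, at the cost of repeating rare
# examples (and possibly worsening other, uncorrected biases).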

They do their best. But ultimately these are statistical models that reflect the existing data on the internet. As long as the internet contains bias, so will AI.

Thank you so much for taking the time to answer!

Why is anyone surprised at this? People are using AI for things it was never designed and optimized for.

This was kind of my thought. This is a rather complex task, and I’m not clear what even a “good” outcome would look like, especially given that the first photo was already a pretty good photo. Should it just color correct and sharpen it? Should it change the background? Should it reposition your head?

I’m curious what it would do if you fed it already-good professional photos of white people. Would it just spit back the same image?

Like there has to be a cap on how much it will change so it still looks like you, in which case I assume you’d need to feed it multiple images to get a good result.

@Fantomas@lemmy.world

A.I. - Aryan Intelligence.

@gmtom@lemmy.world

This is just dumb rage-bait. At worst this shows a bias in training data, probably because the AI was developed in a majority white country that used images of majority white people to train it.

And likely it’s not even that. The AI has no concept of race, so it doesn’t know to make white people white and Asian people Asian; it would be just as likely to do the reverse.

So? There are white people in the world. Ten bucks says she tuned it to make her look white for the clicks. I’ve seen this in person several times at my local college. People are dying for attention, and shit like this is an easy in.

@Jackthelad@lemmy.world

This is why you get a human to take professional shots for you.

@Haha@lemmy.world

Garbage post

@Squeezer@lemmy.world

How can bias amplification be controlled in these models? Surely it’s built in?

Hm. Probably trained on more white people than Asians shrug

AI turned her into a White Walker

I read this as “White Wanker”.

@AbouBenAdhem@lemmy.world

They should just call AIs “confirmation bias amplifiers”.

AI learns what is in the data.

Stereotype machines

Humans will identify stereotypes in AI-generated materials that match the dataset.

Assume the dataset will grow and eventually mimic reality.

How will the law handle discrimination based on data-supported stereotypes?

@Pipoca@lemmy.world

Assume the dataset will grow and eventually mimic reality.

How would that happen, exactly?

Stereotypes themselves and historical bias can bias data. And AI trained on biased data will just learn those biases.

For example, in surveys, white people and black people self-report similar levels of drug use. However, for a number of reasons, poor black drug users are caught at a much higher rate than rich white drug users. If you train a model on arrest data, it’ll learn that rich white people don’t use drugs much but poor black people do tons of drugs. But that simply isn’t true.
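A toy numeric version of that point (all rates made up for illustration): equal true drug use, unequal enforcement, and the arrest data comes out wildly skewed.

true_use_rate = {"rich white": 0.10, "poor black": 0.10}  # surveys: similar
catch_rate = {"rich white": 0.05, "poor black": 0.40}     # enforcement: not

for group in true_use_rate:
    observed = true_use_rate[group] * catch_rate[group]
    print(f"{group}: apparent use rate in arrest data = {observed:.3f}")

# rich white: apparent use rate in arrest data = 0.005
# poor black: apparent use rate in arrest data = 0.040
# A model trained on arrests learns an 8x difference that isn't real.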

The datasets will get better because people have started to care.

Historically, much of the data used was whatever was easy and cheap to acquire. Surveys of classmates. Arrest reports. Publicly available, government-curated data.

Good data costs money and time to create.

The more people fact-check, the more flaws can be found and corrected. The more attention a dataset gets, the more funding is likely to come in to resurvey or whatever.

It’s part of the peer review thing.

@Pipoca@lemmy.world

It’s not necessarily a matter of fact checking, but of correcting for systemic biases in the data. That’s often not the easiest thing to do. Systems run by humans often have outcomes that reflect the biases of the people involved.

The power of suggestion runs fairly deep with people. You can change a hiring manager’s opinion of a resume by only changing the name at the top of it. You can change the terms a college kid enrolled in a winemaking program uses to describe a white wine using a bit of red food coloring. Blind auditions for orchestras result in significantly more women being picked than unblinded auditions.

Correcting for biases is difficult, and it’s especially difficult on very large data sets like the ones you’d use to train chatgpt. I’m really not very hopeful that chatgpt will ever reflect only justified biases, rather than the biases of the broader culture.
