Artists and researchers are exposing copyrighted material hidden within A.I. tools, raising fresh legal questions.

We Asked A.I. to Create the Joker. It Generated a Copyrighted Image.::Artists and researchers are exposing copyrighted material hidden within A.I. tools, raising fresh legal questions.

Jilanico
link
fedilink
English
09M

So stable diffusion, midjourney, etc., all have massive databases with every picture on the Internet stored in them? I know the AI models are trained on lots of images, but are the images actually stored? I’m skeptical, but I’m no expert.

QubaXR
link
fedilink
English
49M

These models were trained on datasets that, without compensating the authors, used their work as training material. It’s not every picture on the net, but a lot of it is scrubbing websites, portfolios and social networks wholesale.

A similar situation happens with large language models. Recently Meta admitted to using illegally pirated books (Books3 database to be precise) to train their LLM without any plans to compensate the authors, or even as much as paying for a single copy of each book used.

Jilanico
link
fedilink
English
09M

Most of the stuff that inspires me probably wasn’t paid for. I just randomly saw it online or on the street, much like an AI.

AI using straight up pirated content does give me pause tho.

QubaXR
link
fedilink
English
3
edit-2
9M

I was on the same page as you for the longest time. I cringed at the whole “No AI” movement and artists’ protest. I used the very same idea: Generations of artists honed their skills by observing the masters, copying their techniques and only then developing their own unique style. Why should AI be any different? Surely AI will not just copy works wholesale and instead learn color, composition, texture and other aspects of various works to find it’s own identity.

It was only when my very own prompts started producing results I started recognizing as “homages” at best and “rip-offs” at worst that gave me a stop.

I suspect that earlier generations of text to image models had better moderation of training data. As the arms race heated up and pace of development picked up, companies running these services started rapidly incorporating whatever training data they could get their hands on, ethics, copyright or artists’ rights be damned.

I remember when MidJourney introduced Niji (their anime model) and I could often identify the mangas and characters used to train it. The imagery Niji produced kept certain distinct and unique elements of character designs from that training data - as a result a lot of characters exhibited “Chainsaw Man” pointy teeth and sticking out tongue - without as much as a mention of the source material or even the themes.

How much profit do you make from this stuff ?

Jilanico
link
fedilink
English
09M

The stuff I sell on jilanico.com? Enough to make it worth my while.

Create a post

This is a most excellent place for technology news and articles.


Our Rules


  1. Follow the lemmy.world rules.
  2. Only tech related content.
  3. Be excellent to each another!
  4. Mod approved content bots can post up to 10 articles per day.
  5. Threads asking for personal tech support may be deleted.
  6. Politics threads may be removed.
  7. No memes allowed as posts, OK to post as comments.
  8. Only approved bots from the list below, to ask if your bot can be added please contact us.
  9. Check for duplicates before posting, duplicates may be removed

Approved Bots


  • 1 user online
  • 186 users / day
  • 583 users / week
  • 1.37K users / month
  • 4.49K users / 6 months
  • 1 subscriber
  • 7.41K Posts
  • 84.7K Comments
  • Modlog