‘Impossible’ to create AI tools like ChatGPT without copyrighted material, OpenAI says

Pressure grows on artificial intelligence firms over the content used to train their products

Help Help! My business model is illegal, but it makes SO MUCH money! What do I doooo?

It doesn’t really make any money yet, and also the law is bad. Copyright shouldn’t be a thing, but if it has to exist, it should be of short duration.

@gardylou@lemmy.world

deleted by creator

If the rule is stupid or evil, we should applaud people who break it.

@Grabbels@lemmy.world

Except they pocket millions of dollars by breaking that rule while the original creators of their “essential data” don’t get a single cent, even as their creations indirectly show up in content generated by AI. If it really were about changing the rules, they wouldn’t be so obviously focused on making it profitable; they would instead use that money to make it available for the greater good AND pay the people who made their training data. Right now they’re hell-bent on commercialising their products as fast as possible.

If their statement is that stealing literally all the content on the internet is the only way to make AI work (instead of, for example, using their profits to pay for a selection of that data and using only that), then the business model is wrong and illegal. It’s as simple as that.

I don’t get why people are so hell-bent on defending OpenAI in this case; if I were to launch a food-delivery service that’s affordable for everyone, but I shoplifted all my ingredients “because it’s the only way”, most would agree that’s wrong and my business is illegal. Why is this OpenAI case any different? Because AI is an essential development? Oh, and affordable food isn’t?

I am not defending OpenAI; I am attacking copyright. Do you have freedom of speech if you have nothing to say? Do you have it if you are a total asshole? Do you have it if you are the nicest human who ever lived? Do you have it if you have no desire to use it?

@kiagam@lemmy.world

We should use those who break it as a beacon to rally around and change the stupid rule.

deleted by creator

If I steal something from you, I have it and you don’t. When I copy an idea from you, you still have the idea. As a whole, the two-person system has more knowledge, while actual theft is zero-sum. Downloading a car and stealing a car are not the same thing.

And don’t even try the “rewarding artists and inventors” argument. Companies that fund R&D get tax breaks for it, so they already get money. And artists are rarely compensated appropriately.

@McArthur@lemmy.world

It feels to me like every other post on Lemmy is talking about how copyright is bad and should be changed, or how piracy is caused by fragmentation and difficulty accessing content (streaming sites). Then, whenever this topic comes up, everyone completely flips. But in my mind, all this would do is fragment the AI market much like the streaming services (suddenly you have 10 different models with different licenses) and make it harder for non-mega-corps without infinite money to fund their own LLMs (of good quality).

Like seriously, can’t we just stay consistent and keep saying copyright is bad even in this case? It’s not really an AI problem that jobs are affected, just a capitalism problem. Throw in some good social safety nets, tax these big AI companies, and we wouldn’t even have to worry about artists’ well-being.

Marxism-Fennekinism

I think looking at copyright in a vacuum is unhelpful because it’s only one part of the problem. IMO, the reason people are okay with piracy of name-brand media but are not okay with OpenAI using human-created artwork comes from the same logic as not liking companies and capitalism in general. People don’t like the fact that AI is extracting value from individual artists to make the rich even richer while not giving anything in return to the individual artists, in the same way we object to massive and extremely profitable media companies paying their artists peanuts. It’s also extremely hypocritical that the government, and by extension “copyright”, seems to care much more that OpenAI is using name-brand media than it cares about OpenAI scraping the internet for independent artists’ work.

Something else to consider is that AI is also undermining copyleft licenses. We saw this with GitHub Copilot, a 100% proprietary product that was nevertheless trained on all of GitHub’s user-generated code, including GPL and other copyleft-licensed code. The art equivalent would be CC-BY-SA licenses, where derivatives also have to be Creative Commons.

@McArthur@lemmy.world

Maybe I’m optimistic, but I think your comparison to big media companies paying their artists peanuts highlights to me that the best outcome is to let AI go wild and just… provide some form of government support (I don’t care what form; that’s another discussion). Because in the end, the more stuff we can train AI on freely, the faster we automate away labour.

I think another good comparison is reparations. If you could come to me with some plan that perfectly pays out the correct amount of money to every person on earth who was impacted by slavery and other racist policies, to make up for what they missed out on, I’d probably be fine with it. But that is such a complex (impossible, I’d say) task that it can’t be done, and so I end up being against reparations and instead just say “give everyone money; it might overcompensate some, but better that than undercompensating others”. Why bother figuring out such a complex, costly and bureaucratic way to repay artists when we could just give everyone robust social services, paid for by taxing AI products an amount equal to however much money they have removed from the workforce through automation?

@dasgoat@lemmy.world

Cool! Then don’t!

@Treczoks@lemmy.world

If a business relies on breaking the law as the foundation of its business model, it is not a business but an organized crime syndicate. A mafia.

@platypus_plumba@lemmy.world

It’s impossible to extract all the money from a bank without robbing the bank :(

@badbytes@lemmy.world

Impossible? Then illegal. Get fucked, AI.

@unreasonabro@lemmy.world

Finally capitalism will notice how many times it has shot itself in the foot with its ridiculous, greedy, infinite copyright scheme.

As a musician, the people not involved in the making of my music make all my money nowadays instead of me anyway. Burn it all down.

Pitchfork fest 2024

@unreasonabro@lemmy.world

… that’s a good album name, might use that ;)

██████████

it would sell

I wonder: if the act of picking cotton had been copyrighted, would we have gotten the cotton gin? We have automated most non-creative pursuits and displaced their workers. Is it because people can take joy in creative pursuits that we balk at the automation? If you have a particular style of picking items to fulfil Amazon orders, should that be copyrighted and protected from being used elsewhere?

██████████

Bro, the cotton gin literally led to millions more Black slaves because cotton was now profitable. Worst example possible.

I literally coughed, I laughed so hard.

So automation can lead to more (crappy) jobs? https://www.smbc-comics.com/comic/the-future

@S410@lemmy.ml

They’re not wrong, though?

Almost all information that currently exists has been created in the last century or so. Only a fraction of all that information is available to be legally acquired for use, and only a fraction of that already small fraction has been explicitly licensed under permissive licenses.

Things that we don’t even think about as “protected works” are in fact just that. It doesn’t matter what it is: napkin doodles, writings on bathroom stall walls, letters written to friends and family. All of those things are protected, unless stated otherwise. And, I don’t know about you, but I’ve never seen a license notice attached to a napkin doodle.

Now, imagine trying to raise a child while avoiding every piece of information like that: information that you aren’t licensed to use. You wouldn’t end up with a person well suited to existing in the world. They’d lack education in science and technology, they’d lack understanding of pop culture, they’d know no brand names, etc.

Machine learning models are similar. You can train them that way, sure, but they’d be basically useless for real-world applications.

@AntY@lemmy.world

The main difference between the two in your analogy, one that has great bearing on this particular problem, is that the machine learning model is a product to be monetized.

@BURN@lemmy.world

Also an “AI” is not human, and should not be regulated as such

Neither is a corporation, and yet they claim First Amendment rights.

@BURN@lemmy.world

That’s an entirely separate problem, but is certainly a problem

I don’t think it is. We keep awarding all this non-human stuff more rights than we ourselves have. You can’t put a corporation in jail, but you can put me in jail. I don’t have freedom from religion, but a corporation does.

@BURN@lemmy.world

Corporations are not people, and should not be treated as such.

If a company does something illegal, the penalty should be spread to the board. It’d make them think twice about breaking the law.

We should not be awarding human rights to non-human, non-sentient creations. LLMs and any kind of Generative AI are not human and should not in any case be treated as such.

Corporations are not people, and should not be treated as such.

Understood. Please tell Disney that they no longer own Mickey Mouse.

@S410@lemmy.ml

Not necessarily. There are plenty that are open source and available for free to anyone willing to provide their own computational power.
In cases where you pay for a service, it could be argued that you aren’t paying for access to the model or its results, but for the convenience and computational power necessary to run the model.
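
To make the “provide your own computational power” part concrete, here’s a minimal sketch of running an open-weights model locally. It assumes the Hugging Face transformers library is installed; the checkpoint name below is just an illustrative example of an openly downloadable model, not a recommendation:

```python
# Minimal sketch: running an open-weights language model on your own hardware.
# Assumes the Hugging Face `transformers` library; the model name is illustrative.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "mistralai/Mistral-7B-v0.1"  # any open-weights checkpoint works

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

prompt = "The debate over AI and copyright is"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50)

print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

The point being: nothing here involves paying anyone for access to the model itself; the cost is the hardware and electricity to run it.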

deweydecibel

And ultimately replace the humans it learned from.

Yes, clearly 90 years plus the death of the artist is acceptable.

@testfactor@lemmy.world

And real children aren’t in a capitalist society?

Exatron

The difference here is that a child can’t absorb and suddenly use massive amounts of data.

@S410@lemmy.ml

The act of learning is absorbing and using massive amounts of data. Almost any child can, for example, re-create copyrighted cartoon characters in their drawings or whistle copyrighted tunes.

If you look at pretty much any human-created work, you will be able to trace elements of it to many different sources. We usually call those “sources of inspiration”. Of course, in the case of human-created works, it’s not a big deal. Generally, it’s considered transformative and fair use.

Exatron

The problem is that a human doesn’t absorb exact copies of what they learn from, and fair use doesn’t include taking entire works, shoving them in a box, and shaking it until something you want comes out.

@S410@lemmy.ml

Except for all the cases when humans do exactly that.

A lot of learning is, really, little more than memorization: spelling of words, mathematical formulas, physical constants, etc. But, of course, those are pretty small, so they don’t count?

Then there’s things like sayings, which are entire phrases that only really work if they’re repeated verbatim. You sure can deliver the same idea using different words, but it’s not the same saying at that point.

To make a cover of a song, for example, you have to memorize the lyrics and melody of the original, exactly, to be able to re-create it. If you want to make that cover in the style of some other artist, you, obviously, have to learn their style: that is, analyze and memorize what makes that style unique. (e.g. C418 - Haggstrom, but it’s composed by John Williams)

Sometimes the artists don’t even realize they’re doing exactly that, so we end up with “subconscious plagiarism” cases, e.g. Bright Tunes Music v. Harrisongs Music.

Some people, like Stephen Wiltshire, are very good at memorizing and replicating certain things; way better than you, me, or even current machine learning systems. And for that they’re praised.

Exatron

Except they literally don’t. Human memory doesn’t retain an exact copy of things. Very good isn’t the same as exactly. And human beings can’t grab everything they see and instantly use it.

@S410@lemmy.ml

Machine learning doesn’t retain an exact copy either. Just how on earth do you think a model trained on terabytes of data can be only a few gigabytes in size, yet contain “exact copies” of everything? If “AI” could function as a compression algorithm, it’d definitely be used as one. But it can’t, so it isn’t.
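
As a rough back-of-envelope sketch of why verbatim storage doesn’t add up (all the numbers below are illustrative assumptions, not measurements of any particular model):

```python
# Back-of-envelope check: could a model store exact copies of its training data?
# All numbers are illustrative assumptions, not measured values.

training_corpus_bytes = 10e12   # assume ~10 TB of training text
model_params = 7e9              # assume a 7-billion-parameter model
bytes_per_param = 2             # assume 16-bit (2-byte) weights

model_bytes = model_params * bytes_per_param          # ~14 GB of weights
compression_needed = training_corpus_bytes / model_bytes

print(f"Model size: ~{model_bytes / 1e9:.0f} GB")
print(f"The corpus is ~{compression_needed:.0f}x larger than the model,")
print("so verbatim storage of everything is impossible, even before the")
print("weights also have to encode grammar, facts, style, and so on.")
```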

Machine learning can definitely re-create certain things really closely, but to do it well, it generally requires a lot of repeats in the training set. Which, granted, is a big problem that exists right now, and which people are trying to solve. But even right now, if you want an “exact” re-creation of something, cherry picking is almost always necessary, since (unsurprisingly) ML systems have a tendency to create things that have not been seen before.

Here’s an image from an article claiming that machine learning image generators plagiarize things.

However, if you take a second to look at the image, you’ll see that the prompters literally ask for screencaps of specific movies with specific actors, etc. and even then the resulting images aren’t one-to-one copies. It doesn’t take long to spot differences, like different lighting, slightly different poses, different backgrounds, etc.

If you got ahold of a human artist specializing in photoreal drawings and asked them to re-create a specific part of a movie they’ve seen a couple dozen or a hundred times, they’d most likely produce something remarkably similar in accuracy. Very similar to what machine learning image generators are capable of at the moment.

Is this the point where we start UBI and start restructuring society for the future of AI?

@LouNeko@lemmy.world

Copyright protection only exists in the context of generating profit from someone else’s work. If you were to figure out cold fusion and I looked at your research and said, “That’s cool, but I am going to go do some woodworking,” I am not infringing any copyrights. It’s only ever an issue if the financial incentive to trace the profits back to their copyrighted source outweighs the cost of doing so. That’s why China has had free rein to steal any Western technology: fighting them in their courts is not worth it. But with AI it’s way easier to trace the output back to its source (especially for art), so the incentive is there.

The main issue is the extraction of value from the original data. If I were to steal some bricks from your infinite brick pile and build a house out of them, do you have a right to my house? Technically I never stole a house from you.

You stole bricks. How rich I am does not change what you did. Copying is not theft. You can keep stretching any shady analogy you want, but you can’t change the fundamentals.

bravesirrbn ☑️

Then don’t

@kibiz0r@lemmy.world

I’m dumbfounded that any Lemmy user supports OpenAI in this.

We’re mostly refugees from Reddit, right?

Reddit invited us to make stuff and share it with our peers, and that was great. Some posts were just links to the content’s real home: YouTube, a random WordPress blog, a GitHub project, or whatever. The post text, the comments, and the replies only lived on Reddit. That wasn’t a huge problem, because that’s the part that was specific to Reddit. And besides, there were plenty of third-party apps to interact with those bits of content however you wanted to.

But as Reddit started to dominate Google search results, it displaced results that might have linked to the “real home” of that content. And Reddit realized a tremendous opportunity: they now had a chokehold not just on user comments and text posts, but on anything that people dared to promote online.

At the same time, Reddit slowly moved from a place where something might get posted by the author of the original thing to a place where you’ll only see the post if it came from a high-karma user or bot. Mutated or distorted copies of the original, reformatted to cut through the noise and gain the favor of the algorithm. Re-posts of re-posts, with no reference back to the original, divorced from whatever context or commentary the original creator may have provided. No way for the audience to respond to the author in any meaningful way and start a dialogue.

This is a miniature preview of the future brought to you by LLM vendors. A monetized portal to a dead internet. A one-way street. An incestuous ouroboros of re-posts of re-posts. Automated remixes of automated remixes.

There are genuine problems with copyright law. Don’t get me wrong. Perhaps the most glaring problem is the fact that many prominent creators don’t even own the copyright to the stuff they make. It was invented to protect creators, but in practice this “protection” gets assigned to a publisher immediately after the protected work comes into being.

And then that copyright – the very same thing that was intended to protect creators – is used as a weapon against the creator and against their audience. Publishers insert a copyright chokepoint in-between the two, and they squeeze as hard as they desire, wringing it of every drop of profit, keeping creators and audiences far away from each other. Creators can’t speak out of turn. Fans can’t remix their favorite content and share it back to the community.

This is a dysfunctional system. Audiences are denied the ability to access information or participate in culture if they can’t pay for admission. Creators are underpaid, and their creative ambitions are redirected to what’s popular. We end up with an auto-tuned culture – insular, uncritical, and predictable. Creativity reduced to a product.

But.

If the problem is that copyright law has severed the connection between creator and audience in order to set up a toll booth along the way, then we won’t solve it by giving OpenAI a free pass to do the exact same thing at massive scale.

Too long didn’t read, busy downloading a car now. How much did Disney pay for this comment?

@Evotech@lemmy.world

Then LLMs should be FOSS

@BURN@lemmy.world

That does nothing to solve the problem of data being used without consent to train the models. It doesn’t matter if the model is FOSS if it stole all the data it trained on.

The only way I can steal data from you is if I break into your office and walk off with your hard drive. Do you still have access to something? Then it hasn’t been stolen.

rivermonster

All AI should be FOSS and public domain, owned by the people, and all gains from its use taxed at 100%. It’s only because of the public that AI exists: schools, universities, the NSF, grants, and all the other places taxes have been poured into created the advances upon which AI stands, as well as the critical AI research itself.
