‘Impossible’ to create AI tools like ChatGPT without copyrighted material, OpenAI says
Pressure grows on artificial intelligence firms over the content used to train their products
Help Help! My business model is illegal, but it makes SO MUCH money! What do I doooo?
It doesn’t really make any money yet, and the law is bad anyway. Copyright shouldn’t be a thing, but if it has to exist, it should be short in duration.
deleted by creator
If the rule is stupid or evil we should applaud people who break it.
Except they pocket millions of dollars by breaking that rule, and the original creators of their “essential data” don’t get a single cent while their creations indirectly show up in content generated by AI. If it really was about changing the rules, they wouldn’t be so obvious about making it profitable, but would rather use that money to make it available for the greater good AND pay the people that made their training data. Right now they’re hell-bent on commercialising their products as fast as possible.
If their statement is that stealing literally all the content on the internet is the only way to make AI work (instead of, for example, using their profits to pay for a selection of that data and using only that), then the business model is wrong and illegal. It’s as simple as that.
I don’t get why people are so hell-bent on defending OpenAI in this case; if I were to launch a food-delivery service that’s affordable for everyone, but I shoplifted all my ingredients “because it’s the only way”, most would agree that’s wrong and my business is illegal. Why is this OpenAI case any different? Because AI is an essential development? Oh, and affordable food isn’t?
I am not defending OpenAI; I am attacking copyright. Do you have freedom of speech if you have nothing to say? Do you have it if you are a total asshole? Do you have it if you are the nicest human who ever lived? Do you have it and have no desire to use it?
we should use those who break it as a beacon to rally around and change the stupid rule
deleted by creator
If I steal something from you, I have it and you don’t. When I copy an idea from you, you still have the idea, and as a whole the two-person system has more knowledge. Actual theft is zero-sum; copying isn’t. Downloading a car and stealing a car are not the same thing.
And don’t even try the rewarding-artists-and-inventors argument. Companies that fund R&D get tax breaks for it, so they already get money. And artists are rarely compensated appropriately.
It feels to me like every other post on Lemmy is talking about how copyright is bad and should be changed, or how piracy is caused by fragmentation and difficulty accessing content (streaming sites). Then whenever this topic comes up, everyone completely flips. But in my mind, all this would do is fragment the AI market much like streaming services did (suddenly you have 10 different models with different licenses) and make it harder for non-mega-corps without infinite money to fund their own LLMs (of good quality).
Like seriously, can’t we just stay consistent and keep saying copyright is bad even in this case? It’s not really an AI problem that jobs are affected, just a capitalism problem. Throw in some good social safety nets and tax these big AI companies, and we wouldn’t even have to worry about artists’ well-being.
I think looking at copyright in a vacuum is unhelpful because it’s only one part of the problem. IMO, the reason people are okay with piracy of name-brand media but not okay with OpenAI using human-created artwork follows from the same logic as not liking companies and capitalism in general. People don’t like the fact that AI is extracting value from individual artists to make the rich even richer while giving nothing in return to those artists, in the same way we object to massive and extremely profitable media companies paying their artists peanuts. It’s also extremely hypocritical that the government, and by extension “copyright”, seems to care much more that OpenAI is using name-brand media than it cares about OpenAI scraping the internet for independent artists’ work.
Something else to consider is that AI is also undermining copyleft licenses. We saw this with GitHub Copilot, a 100% proprietary product that was nonetheless trained on all of GitHub’s user-generated code, including GPL and other copyleft-licensed code. The art equivalent would be CC-BY-SA licenses, where derivatives also have to be Creative Commons.
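To make the copyleft concern concrete, here is a hypothetical sketch of how a training pipeline could filter files by SPDX license identifier. The file contents, the `COPYLEFT` set, and `is_copyleft` are all illustrative assumptions, not how Copilot actually works; note that a file with no header at all still sails through, even though it is copyrighted by default.

```python
import re

# Illustrative set of copyleft SPDX identifiers (an assumption, not a
# complete list; real SPDX ids also carry suffixes like "-only").
COPYLEFT = {"GPL-2.0", "GPL-3.0", "AGPL-3.0", "LGPL-3.0", "CC-BY-SA-4.0"}

SPDX_RE = re.compile(r"SPDX-License-Identifier:\s*([\w.+-]+)")

def is_copyleft(source: str) -> bool:
    """Return True if the file declares a copyleft SPDX license."""
    match = SPDX_RE.search(source)
    return bool(match) and match.group(1) in COPYLEFT

# Hypothetical repository files.
files = {
    "a.c": "// SPDX-License-Identifier: GPL-3.0\nint main(void) { return 0; }",
    "b.py": "# SPDX-License-Identifier: MIT\nprint('hi')",
    "c.js": "console.log('no license header at all');",
}

# c.js has no header but is still copyrighted by default, and it passes
# the filter anyway: exactly the gap described above.
training_set = {name for name, src in files.items() if not is_copyleft(src)}
print(sorted(training_set))  # ['b.py', 'c.js']
```

Even this naive filter would have excluded GPL code from the training set; the point of the comment is that no such filter was applied at all.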
Maybe I’m optimistic, but your comparison to big media companies paying their artists peanuts highlights to me that the best outcome is to let AI go wild and just… provide some form of government support (I don’t care what form; that’s another discussion). Because in the end, the more stuff we can train AI on freely, the faster we automate away labour.
I think another good comparison is reparations. If you could come to me with some plan that perfectly pays out the correct amount of money to every person on earth impacted by slavery and other racist policies, to make up for what they missed out on, I’d probably be fine with it. But that is such a complex (impossible, I’d say) task that it can’t be done, so I end up being against reparations and instead just say “give everyone money; it might overcompensate some, but better that than undercompensating others”. Why bother figuring out such a complex, costly, and bureaucratic way to repay artists when we could just give everyone robust social services, paid for by taxing AI products an amount equal to however much money they have removed from the workforce through automation?
Cool! Then don’t!
If a business relies on breaking the law as a fundament of their business model, it is not a business but an organized crime syndicate. A Mafia.
It’s impossible to extract all the money from a bank without robbing the bank :(
Impossible. Then illegal? Get fucked AI
finally capitalism will notice how many times it has shot up its own foot with their ridiculous, greedy infinite copyright scheme
As a musician, people not involved in the making of my music make all my money nowadays instead of me anyway. burn it all down
Pitchfork fest 2024
… that’s a good album name, might use that ;)
it would sell
I wonder: if the act of picking cotton had been copyrighted, would we have gotten the cotton gin? We have automated most non-creative pursuits and displaced their workers. Is it because people can take joy in creative pursuits that we balk at this automation? If you have a particular style of picking items to fulfill Amazon orders, should that be copyrighted and protected from being used elsewhere?
Bro, the cotton gin literally led to millions more Black people being enslaved, because cotton was now profitable. Worst example possible.
i literally coughed i laughed so hard
So automation can lead to more (crappy) jobs? https://www.smbc-comics.com/comic/the-future
They’re not wrong, though?
Almost all information that currently exists has been created in the last century or so. Only a fraction of all that information is available to be legally acquired for use and only a fraction of that already small fraction has been explicitly licensed using permissive licenses.
Things that we don’t even think about as “protected works” are in fact just that. It doesn’t matter what it is: napkin doodles, writings on bathroom stall walls, letters written to friends and family. All of those things are protected unless stated otherwise. And, I don’t know about you, but I’ve never seen a license notice attached to a napkin doodle.
Now, imagine trying to raise a child while avoiding every piece of information like that: information that you aren’t licensed to use. You wouldn’t end up with a person well suited to exist in the world. They’d lack education in science and technology, they’d lack understanding of pop culture, they’d know no brand names, etc.
Machine learning models are similar. You can train them that way, sure, but they’d be basically useless for real-world applications.
The main difference between the two in your analogy, and one that has great bearing on this particular problem, is that the machine learning model is a product to be monetized.
Also an “AI” is not human, and should not be regulated as such
Neither is a corporation and yet they claim first amendment rights.
That’s an entirely separate problem, but is certainly a problem
I don’t think it is. We have all this non-human stuff that we are awarding more rights to than we have ourselves. You can’t put a corporation in jail, but you can put me in jail. I don’t have freedom from religion, but a corporation does.
Corporations are not people, and should not be treated as such.
If a company does something illegal, the penalty should be spread to the board. It’d make them think twice about breaking the law.
We should not be awarding human rights to non-human, non-sentient creations. LLMs and any kind of Generative AI are not human and should not in any case be treated as such.
Understand. Please tell Disney that they no longer own Mickey Mouse.
Not necessarily. There’s plenty that are open source and available for free to anyone willing to provide their own computational power.
In cases where you pay for a service, it could be argued that you aren’t paying for the access to the model or its results, but the convenience and computational power necessary to run the model.
And ultimately replace the humans it learned from.
Yes clearly 90 years plus death of artist is acceptable
And real children aren’t in a capitalist society?
The difference here is that a child can’t absorb and suddenly use massive amounts of data.
The act of learning is absorbing and using massive amounts of data. Almost any child can, for example, re-create copyrighted cartoon characters in their drawing or whistle copyrighted tunes.
If you look at pretty much any human-created work, you will be able to trace elements of it to many different sources. We usually call these “sources of inspiration”. Of course, in the case of human-created works, it’s not a big deal; generally, it’s considered transformative and fair use.
The problem is that a human doesn’t absorb exact copies of what it learns from, and fair use doesn’t include taking entire works, shoving them in a box, and shaking it until something you want comes out.
Except for all the cases when humans do exactly that.
A lot of learning is really little more than memorization: spellings of words, mathematical formulas, physical constants, etc. But, of course, those are pretty small, so they don’t count?
Then there are things like sayings: entire phrases that only really work if they’re repeated verbatim. Sure, you can deliver the same idea using different words, but it’s not the same saying at that point.
To make a cover of a song, for example, you have to memorize the lyrics and melody of the original exactly to be able to re-create it. If you want to make that cover in the style of some other artist, you obviously have to learn their style: that is, analyze and memorize what makes that style unique (e.g. “C418 - Haggstrom, but composed by John Williams”).
Sometimes artists don’t even realize they’re doing exactly that, so we end up with “subconscious plagiarism” cases, e.g. Bright Tunes Music v. Harrisongs Music.
Some people, like Stephen Wiltshire, are very good at memorizing and replicating certain things; way better than you, me, or even current machine learning systems. And for that they’re praised.
Except they literally don’t. Human memory doesn’t retain an exact copy of things. “Very good” isn’t the same as “exact”. And human beings can’t grab everything they see and instantly use it.
Machine learning doesn’t retain an exact copy either. Just how on earth do you think a model trained on terabytes of data can be only a few gigabytes in size, yet contain “exact copies” of everything? If “AI” could function as a compression algorithm, it’d definitely be used as one. But it can’t, so it isn’t.
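The size mismatch in that argument can be checked with rough arithmetic. The figures below are ballpark public estimates (Stable Diffusion v1 was reportedly trained on roughly two billion LAION images, with weights of about 4 GB), not exact numbers:

```python
# Back-of-envelope: how many bytes of model weights exist per training image?
training_images = 2_000_000_000   # assumed size of the LAION-2B training set
model_bytes = 4 * 1024**3         # assumed ~4 GB of model weights

bytes_per_image = model_bytes / training_images
print(f"{bytes_per_image:.1f} bytes of weights per training image")

# Even a heavily compressed 64x64 thumbnail takes a few thousand bytes,
# so at ~2 bytes per image the weights cannot store exact copies.
```

Whatever the exact numbers, the ratio is off by three orders of magnitude from what storing exact copies would require.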
Machine learning can definitely re-create certain things really closely, but to do it well, it generally requires a lot of repeats in the training set. Which, granted, is a big problem that exists right now, and which people are trying to solve. But even right now, if you want an “exact” re-creation of something, cherry picking is almost always necessary, since (unsurprisingly) ML systems have a tendency to create things that have not been seen before.
Here’s an image from an article claiming that machine learning image generators plagiarize things.
However, if you take a second to look at the image, you’ll see that the prompters literally ask for screencaps of specific movies with specific actors, etc. and even then the resulting images aren’t one-to-one copies. It doesn’t take long to spot differences, like different lighting, slightly different poses, different backgrounds, etc.
If you got ahold of a human artist specializing in photoreal drawing and asked them to re-create a specific part of a movie they’ve seen a couple dozen or hundred times, they’d most likely produce something similarly accurate. Very similar to what machine learning image generators are capable of at the moment.
Is this the point where we start UBI and start restructuring society for the future of AI?
Copyright protection only matters in the context of generating profit from someone else’s work. If you were to figure out cold fusion and I looked at your research and said “That’s cool, but I am going to go do some woodworking,” I would not be infringing any copyrights. It’s only ever an issue if the financial incentive to trace the profits back to their copyrighted source outweighs the cost of doing so. That’s why China has had free rein to steal any Western technology; fighting them in their courts is not worth it. But with AI it’s way easier to trace the output back to its source (especially for art), so the incentive is there.
The main issue is the extraction of value from the original data. If I were to steal some bricks from your infinite brick pile and build a house out of them, do you have a right to my house? Technically, I never stole a house from you.
You stole bricks. How rich I am does not impact what you did. Copying is not theft. You can keep stretching any shady analogy you want but you can’t change the fundamentals.
Then don’t
I’m dumbfounded that any Lemmy user supports OpenAI in this.
We’re mostly refugees from Reddit, right?
Reddit invited us to make stuff and share it with our peers, and that was great. Some posts were just links to the content’s real home: YouTube, a random WordPress blog, a GitHub project, or whatever. The post text, the comments, and the replies only lived on Reddit. That wasn’t a huge problem, because that’s the part that was specific to Reddit. And besides, there were plenty of third-party apps to interact with those bits of content however you wanted.
But as Reddit started to dominate Google search results, it displaced results that might have linked to the “real home” of that content. And Reddit realized a tremendous opportunity: They now had a chokehold on not just user comments and text posts, but anything that people dare to promote online.
At the same time, Reddit slowly moved from a place where something might get posted by the author of the original thing to a place where you’ll only see the post if it came from a high-karma user or bot. Mutated or distorted copies of the original, reformatted to cut through the noise and gain the favor of the algorithm. Re-posts of re-posts, with no reference back to the original, divorced from whatever context or commentary the original creator may have provided. No way for the audience to respond to the author in any meaningful way and start a dialogue.
This is a miniature preview of the future brought to you by LLM vendors. A monetized portal to a dead internet. A one-way street. An incestuous ouroboros of re-posts of re-posts. Automated remixes of automated remixes.
–
There are genuine problems with copyright law. Don’t get me wrong. Perhaps the most glaring problem is the fact that many prominent creators don’t even own the copyright to the stuff they make. It was invented to protect creators, but in practice this “protection” gets assigned to a publisher immediately after the protected work comes into being.
And then that copyright – the very same thing that was intended to protect creators – is used as a weapon against the creator and against their audience. Publishers insert a copyright chokepoint between the two, and they squeeze as hard as they desire, wringing out every drop of profit and keeping creators and audiences far away from each other. Creators can’t speak out of turn. Fans can’t remix their favorite content and share it back with the community.
This is a dysfunctional system. Audiences are denied the ability to access information or participate in culture if they can’t pay for admission. Creators are underpaid, and their creative ambitions are redirected to what’s popular. We end up with an auto-tuned culture – insular, uncritical, and predictable. Creativity reduced to a product.
But.
If the problem is that copyright law has severed the connection between creator and audience in order to set up a toll booth along the way, then we won’t solve it by giving OpenAI a free pass to do the exact same thing at massive scale.
Too long didn’t read, busy downloading a car now. How much did Disney pay for this comment?
Then LLMs should be FOSS
That does nothing to solve the problem of data being used without consent to train the models. It doesn’t matter if the model is FOSS if it stole all the data it trained on.
The only way I can steal data from you is if I break into your office and walk off with your hard drive. Do you still have access to it? Then it hasn’t been stolen.
All AI should be FOSS and public domain, owned by the people, and all gains from its use taxed at 100%. It’s only because of the public that AI exists: through schools, universities, the NSF, grants, and all the other places taxes have been poured into, which created the advances upon which AI stands, as well as the critical AI research itself.