

This isn’t sci-fi AI. It’s not going to use itself. It’s a new tool for digital artists, including VFX artists. I expect it will benefit them by making them more productive.


That’s what’s so depressing about lemmy. People convince me that there is some genuine issue that should be addressed. The mob grabs torches and pitchforks and goes to demand that… Money be given to rich people instead of changing anything. It certainly makes you understand why the world is as it is and that it will only become more so.


I see how I misunderstood.

This conception of individual rights seems rather ad hoc. I don’t think I could have guessed that that’s what you meant, rather than copyrights.

I don’t see the connection to copyright, in any case. How does fair use interfere with anyone’s right to earn a living? And if it does, why support the Internet Archive?


As far as I can tell, this community hates open models just as much as any others. Some seem to hate them even more. That’s the point about this “nightshade” tool.


I may not be understanding the logic here. It sounds like your issue is control. You want to have control over media you bought, and you want to have control over AI models rather than just a subscription.

There are a number of open models. As far as I can see, these are also largely rejected by this community. In lawsuits against their makers, the community also sides against fair use.


> Why isn’t the fact that AI is largely garnering the same responses even from DIAMETRICALLY OPPOSED GROUPS telling you something about how bad of an idea it is in its current incarnation?

I’m not seeing anything remarkable from organized groups. For example, the Internet Archive and libraries favor strong fair use. The copyright industry obviously sees this as an opportunity to expand property rights against the public interest. Tech companies have always been on either side, depending on their particular interest. Basically, everyone is on the usual side, just as you’d expect. Only on social media are things kinda weird. I don’t think people are considering their own interests, but I really don’t get what drives this.


You have a corporation that doesn’t want to spend money to care for individual copyrights, or even lose customers over it. That describes ISPs. Still, people side with the corporation.

When you say individual rights, you, of course, mean copyrights; intellectual property rights. Giving property such a high priority clashes with the otherwise anti-capitalist attitudes here. It’s not just pro-capitalist; it’s pro conservative capitalist.


I never understand how this community relates to copyright. It’s all the freedom of the high seas until AI gets mentioned; then the most dogmatic copyright maximalists come out. It’s all anti-capitalist until AI is mentioned, and then the most conservative, devout Ayn Rand followers show up.


Yes, I shouldn’t bother replying in these threads. In truth, I’ve already given up on this community but sometimes when I’m bored I can’t help a little peek. Maybe in a few years, some of the smarter ones will wonder why nothing ever came of this. Anyway, be careful with those AI detectors. They don’t work and sooner or later someone is going to get in trouble over that.


There is no problem with ingesting synthetic data. Well, at least none coming from the fact that it is synthetic. If there was a fundamental difference between the 1s and 0s encoding synthetic data and the 1s and 0s encoding any other data, then you could easily filter it. But there isn’t. The ideas that this community has are magical thinking.



No. I simply don’t see a plausible scenario for that. The social media comments are quite deplorable. You really have to look for bubbles with educated people. I don’t know why this gets so much traction. Maybe it’s because the copyright industry likes it, or maybe it feeds some psychological need like Intelligent Design.



Hmm. Per Facebook v. Power Ventures, it could be a (criminal) violation of the CFAA to “circumvent” IP blocks.


The critical window of shadow libraries
> We can only expect these trends to continue to worsen, and many works to be lost well before they enter the public domain.
>
> We are on the eve of a revolution in preservation, but “the lost cannot be recovered.” We have a critical window of about 5-10 years during which it’s still fairly expensive to operate a shadow library and create many mirrors around the world, and during which access has not been completely shut down yet.
>
> If we can bridge this window, then we’ll indeed have preserved humanity’s knowledge and culture in perpetuity. We should not let this time go to waste. We should not let this critical window close on us.
>
> Let’s go.
>
> - Anna and the team

Heh. Funny that this comment is uncontroversial. The Internet Archive supports Fair Use because, of course, it does.

This is from a position paper explicitly endorsed by the IA:

> Based on well-established precedent, the ingestion of copyrighted works to create large language models or other AI training databases generally is a fair use.

By

  • Library Copyright Alliance
  • American Library Association
  • Association of Research Libraries

The copyright industry wants money. So, 4 legs good, 2 legs better. It’s depressing to see how easily people are led around by the nose.


Let’s engage in a little fantasy. Someone invents a magic machine that is able to duplicate apartments, condos, houses… You want to live in New York? You can copy yourself a penthouse overlooking Central Park for just a few cents. It’s magic. You don’t need space. It’s all in a pocket dimension, like the Tardis or whatever. Awesome, right? Of course, not everyone would like that. The owner of that penthouse, for one. Their multi-million-dollar investment is suddenly almost worthless. They would certainly demand that you must not copy their property without consent. And so would a lot of people. “And what about the poor construction workers?” ask the owners of construction companies. And who will pay to have any new house built?

So in this fantasy story, the government goes and bans the magic copy machine. Taxes are raised to create a big new police bureau to monitor the country and make sure that no one uses such a machine without a license.

That turns the magical wish fulfillment into a dystopian story: a society that rejects living in a rent-free wonderland and instead chooses to make itself poor. People work to ensure poverty, not to create wealth.

You get that I’m talking about data, information, knowledge. The first magic machine was the printing press. Now we have computers and the Internet.

I’m not talking about a utopian vision here. Facts, scientific theories, mathematical theorems… all of that is already free for everyone. Inventors can get patents, but only for 20 years and only if they publish them. They can keep their inventions secret and take their chances. But if they want a government-enforced monopoly, they must publish their inventions so that others may learn from them.

In the US, that’s how the Constitution demands it. The copyright clause: [The United States Congress shall have power] To promote the Progress of Science and useful Arts, by securing for limited Times to Authors and Inventors the exclusive Right to their respective Writings and Discoveries.

Cutting down on Fair Use makes everyone poorer and only a very few, very rich people richer. Have you ever thought about where the money goes if AI training requires a license?

For example, to Reddit, because Reddit has the rights to all those posts. So do Facebook and Xitter. Of course, there’s also old money, like the NYT or Getty. The NYT has the rights to all their old issues going back about a century. If AI training requires a license, they can sell all their old newspapers again. That’s pure profit. Do you think they will give their employees raises out of the pure goodness of their hearts if they win their lawsuits? They have no legal or economic reason to do so. The belief that this would happen is trickle-down economics.


> “AI has the potential to disrupt many professions, not just individual creators. The response to this disruption (e.g., support for worker retraining through institutions such as community colleges and public libraries) should be developed on an economy-wide basis, and copyright law should not be treated as a means for addressing these broader societal challenges.” Going down a typical copyright path of creating new rights and licensing markets could, for AI, serve to worsen social problems like inequality, surveillance and monopolistic behavior of Big Tech and Big Media.
>
> Second, any new copyright regulation of AI should not negatively impact the public’s right and ability to access information, knowledge, and culture. A primary purpose of copyright is to expand access to knowledge. See Authors Guild v. Google, 804 F.3d 202, 212 (2d Cir. 2015) (“Thus, while authors are undoubtedly important intended beneficiaries of copyright, the ultimate, primary intended beneficiary is the public, whose access to knowledge copyright seeks to advance . . . .”). Proposals to amend the Copyright Act to address AI should be evaluated by the impact such new regulations would have on the public’s access to information, knowledge, and culture. In cases where proposals would have the effect of reducing public access, they should be rejected or balanced out with appropriate exceptions and limitations.
>
> Third, universities, libraries, and other publicly-oriented institutions must be able to continue to ensure the public’s access to high quality, verifiable sources of news, scientific research and other information essential to their participation in our democratic society. Strong libraries and educational institutions can help mitigate some of the challenges to our information ecosystem, including those posed by AI. Libraries should be empowered to provide access to educational resources of all sorts – including the powerful Generative AI tools now being developed.

Perhaps controversial statements.


Internet Archive Submits Comments on Copyright and Artificial Intelligence
This was published in November 2023, but may be of general interest now, because of current events.

Now, now, let’s not get hung up on our differences. We all took different journeys to get here. The important thing is that we all agree now that property owners are entitled to a share of the money that other people make with their labor. Obviously only intellectual property owners. I’m sure those filthy landlords are still parasites. It’s not like apartments can be copied at almost 0 cost.


In a way this thread is heart-warming. There are so many different people here - liberals, socialists, anarchists, communists, progressives, … - and yet they can all agree on 1 fundamental ethical principle: The absolute sanctity of intellectual property.



[French media] said the investigation was focused on a lack of moderators on Telegram, and that police considered that this situation allowed criminal activity to go on undeterred on the messaging app.

Europe defending its citizens against the tech giants, I’m sure.



It’s rather more than that. At the very least, it is a DRM system meant to curtail fair use. We’re not just talking about AI training. The AutoTLDR bot here would also be affected. Manually copy/pasting articles while removing the metadata becomes illegal. Platforms have a legal duty to stop copyright infringement. In practice, they will probably have to use the metadata label to stop reposts and re-uploads of images and articles.

This bill was obviously written by lobbyists for major corpos like Adobe. It would make the C2PA standard legally binding. They have been working on this for the last couple of years. OpenAI already uses it.

At the very least, this bill will entrench the monopolies of the corporations behind it, at the expense of the rights of ordinary people.


I don’t think it’ll stop there. Look at age verification laws in various red states and around the world. Once you have this system in place, it would be obvious to demand mandatory content warnings in the metadata. We’re not just talking about erotic images but also about articles on LGBTQ matters.

More control over the flow of information is the way we are going anyway. From age verification to copyright enforcement, it’s all about making sure that only the right people can access certain information. Copyright used to be about which businesses may print a book. Now it’s about what you can do at home with your own computer. We’re moving in this dystopian direction anyway, and this bill is a big step.


The bill talks about “provenance”. The ambition is literally a system to track where information comes from and how it is processed. If this was merely DRM, that would be bad enough. But this is an intentionally dystopian overreach.

E.g., you have cameras that automatically add the tracking data to all photos, and then Photoshop adds data about all post-processing. Obviously, this can’t be secure. (NB: This is real and not hypothetical. More)
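The tamper-evident labelling idea can be sketched in a few lines. This is my own toy illustration, not C2PA itself: the real standard uses X.509 certificate signatures and a binary manifest format, whereas this sketch uses a plain HMAC with a hypothetical device key. The point it demonstrates is the same: editing the content, or editing the label, breaks verification.

```python
import hashlib, hmac, json

SIGNING_KEY = b"camera-vendor-secret"  # hypothetical per-device signing key

def label(content: bytes, origin: str) -> dict:
    """Attach a signed provenance manifest to some content."""
    manifest = {"origin": origin, "sha256": hashlib.sha256(content).hexdigest()}
    payload = json.dumps(manifest, sort_keys=True).encode()
    manifest["sig"] = hmac.new(SIGNING_KEY, payload, hashlib.sha256).hexdigest()
    return manifest

def verify(content: bytes, manifest: dict) -> bool:
    """Check that neither the content nor the manifest was tampered with."""
    claimed = {k: v for k, v in manifest.items() if k != "sig"}
    if claimed["sha256"] != hashlib.sha256(content).hexdigest():
        return False  # content was altered after labelling
    payload = json.dumps(claimed, sort_keys=True).encode()
    expected = hmac.new(SIGNING_KEY, payload, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, manifest["sig"])

photo = b"raw pixel data"
m = label(photo, "camera-serial-1234")
print(verify(photo, m))            # True: label intact
print(verify(photo + b"edit", m))  # False: edits break the label
```

Which is exactly why the scheme needs the surveillance layer: anyone can simply strip the manifest off, and only legal penalties for doing so make the label stick.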

The thing is, a door lock isn’t secure either. It takes seconds to break down a door, or to break a window instead. The secret ingredient is surveillance and punishment. Someone hears or sees something and calls the police. To make the ambition work, you need something at the hardware level in any device that can process and store data. You also need a lot of surveillance to crack down on people who deal in illegal hardware.

I’m afraid this is not as crazy as it sounds. You may have heard about the recent “Chat Control” debate in the EU. That is a proposal, with a lot of support, that would let police scan the files on a phone to look for “child porn” (mind that this includes sexy selfies that 17-year-olds exchange with their friends). Mandatory watermarking, which lets the government trace a photo to the camera and its owner, is mild by comparison.


The bill wants government agencies like DARPA to help in the development of better tracking systems. Nice for the corpos that they get some of that tax money. But it also creates a dynamic in the government that will make it much more likely that we continue on a dystopian path. For agencies, funding will be on the line; plus there are egos. Meanwhile, you still have the content industry lobbying for more control over its intellectual “property”.


This bill reads like it was written by Adobe.

This provenance labelling scheme already exists. Adobe was a major force behind it. (see here: https://en.wikipedia.org/wiki/Content_Authenticity_Initiative ). This bill would make it so that further development will be tax-funded through organizations like DARPA.

Of course, they are also against fair use. They pay license fees for AI training. For them, it means more cash flow.


Don’t be a fool. Of course, content corporations like Disney or the NYT are able to prove just when something was made.


This is a brutally dystopian law. Forget the AI angle and turn on your brain.

Any information will get a label saying who owns it and what can be done with it. Tampering with these labels becomes a crime. This is the infrastructure for the complete control of the flow of all information.



Why is she claiming that the bill is about liability?


The winners of a system don’t have an incentive to undermine the rules. Quite the opposite. The NYT wants these rules because it would benefit from them. There are at least 2 image generators that adhere to capitalist ethics. I don’t know what Claro uses, but I see no indication that they are being uppity.


> The key problem is that copyright infringement by a private individual is regarded by the court as something so serious that it negates the right to privacy. It’s a sign of the twisted values that copyright has succeeded in imposing on many legal systems. It equates the mere copying of a digital file with serious crimes that merit a prison sentence, an evident absurdity.
>
> This is a good example of how copyright’s continuing obsession with ownership and control of digital material is warping the entire legal system in the EU. What was supposed to be simply a fair way of rewarding creators has resulted in a monstrous system of routine government surveillance carried out on hundreds of millions of innocent people just in case they copy a digital file.


It’s not. It’s supposed to target certain open source AIs (Stable Diffusion specifically).

Latent diffusion models work on compressed images, which takes fewer resources. The compression is handled by a type of neural network called a VAE (variational autoencoder). For this attack to work, you must have access to the specific VAE that you are targeting.

The image is subtly altered so that the compressed image looks completely different from the original. You can only do that if you know what the compression AI does. Stable Diffusion is a necessary part of the Glaze software. It is ineffective against any closed source image generators that have trained their own VAE (or equivalent).

This kind of attack is notoriously fickle and thwarted by even small changes. It’s probably not even very effective against the intended target.
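The mechanism described above can be sketched in miniature. This is my own numpy toy, with a random linear map standing in for the VAE encoder (the real Glaze/Nightshade perturbations are computed against Stable Diffusion’s actual encoder by gradient descent). The attack nudges the image within a small per-pixel budget so that its latent code drifts toward that of a decoy image; a model trained on the poisoned image would then “see” the decoy.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((16, 64))  # stand-in "encoder" weights (known to attacker)

def encode(x):
    return W @ x  # latent code of "image" x

x = rng.random(64)              # the original image (a flat vector here)
target = encode(rng.random(64)) # latent code of a decoy image
eps = 0.05                      # imperceptibility budget: max change per "pixel"

delta = np.zeros_like(x)
for _ in range(100):
    # gradient of ||encode(x + delta) - target||^2 w.r.t. delta is 2 W^T (z - target)
    z = encode(x + delta)
    grad = 2 * W.T @ (z - target)
    delta -= 0.001 * grad               # step toward the decoy latent
    delta = np.clip(delta, -eps, eps)   # stay within the pixel budget

before = np.linalg.norm(encode(x) - target)
after = np.linalg.norm(encode(x + delta) - target)
print(after < before)  # True: the perturbed latent is closer to the decoy
```

Note that this only works because `W` is known. Swap in a different encoder, or even retrain the same one, and the carefully optimized perturbation is just faint noise, which is why the attack is so fickle.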

If you’re all about intellectual property, it kinda makes sense that freely shared AI is your main enemy.


Mozilla Builders Accelerator 2024 Advancing innovation in open source AI
cross-posted from: https://lemmy.world/post/16327419

> cross-posted from: https://lemmy.world/post/16324188
>
> > The Mozilla Builders Accelerator funds and supports impactful projects that are vital to the open source AI ecosystem. Selected projects will receive up to $100,000 in funding and engage in a focused 12-week program.
> >
> > Applications are now open!
> >
> > June 3rd, 2024: Applications Open
> > July 8th, 2024: Early Application Deadline
> > August 1st, 2024: Final Application Deadline
> > September 12th, 2024: Accelerator Kick Off
> > December 5th, 2024: Demo Day

As far as I know, federated learning is pretty much dead. The point would be that it allows organizations to create a joint model without sharing data. But it doesn’t look like anyone who doesn’t want to share data wants to share a model.
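For reference, the joint-model idea being described boils down to federated averaging: each party fits a model on data that never leaves its premises, and only the fitted parameters are pooled. A minimal sketch of my own (not any particular framework), using linear regression as the local model:

```python
import numpy as np

rng = np.random.default_rng(1)
true_w = np.array([2.0, -1.0])  # the signal shared across all parties' data

def local_fit(n):
    # private data is generated and fitted here; only the weights leave
    X = rng.standard_normal((n, 2))
    y = X @ true_w + 0.01 * rng.standard_normal(n)
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    return w, n

# three organizations with differently sized private datasets
updates = [local_fit(n) for n in (50, 200, 120)]
total = sum(n for _, n in updates)
global_w = sum(w * (n / total) for w, n in updates)  # sample-weighted average

print(np.allclose(global_w, true_w, atol=0.05))  # True: joint model recovered
```

The technical side works fine; the problem, as said, is that the organizations willing to share a model are generally the same ones willing to share data.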


Thanks for the explanation. You are certainly more polite and productive than most people here.

The DMCA gives explicit rules on takedowns in section c here. Complying with DMCA notices is not adjudicating the law, nor prosecuting anyone. It is simply taking the necessary steps to avoid liability. If youtube were to prosecute fraudulent DMCA notices, then it would be engaging in (probably) criminal vigilantism.

Courts have ruled that merely reacting to DMCA notices is not sufficient to avoid liability. Youtube was taken to court over this, and Content ID is the result. (EU law is considerably harsher and positively demands something like it.)

It was a predicted consequence of these laws that they would favor major rights-holders. Mind that the same people here who want youtube to adjudicate the law are also against fair use. They would have cheered those lawsuits against youtube/Big Tech, just as they now cheer lawsuits against fair use. They want more capitalism. Maybe they delude themselves into thinking that more of the same will have a different outcome.


I understand the insanity. They want a private company to prosecute “fraud”. Yikes. Less Ayn Rand and more civics lessons, please.


Most false claims are not fraudulent. The burden of proof is where US law puts it.

Thanks for explaining how people see this.