The point being made was that publicly available data includes a whole lot of copyrighted data to begin with, and it’s pretty much impossible to filter it out.
Good example: the Eiffel Tower in Paris is not copyright protected, but the lights on it are, so you can only use pictures of the Eiffel Tower taken during the day, and only if the picture itself isn’t copyright protected by the original photographer. Copyright law has all these complex caveats and exceptions that make it impossible to tell at a glance whether or not something is protected.
This in turn means that if AI cannot legally train on copyrighted materials it finds online without paying huge sums of money, then effectively only mega corporations that can pay copyright fines as a cost of doing business will be able to afford training decent AI.
The only other option for producing any AI of this type is a very narrow, curated set of known materials with a public-use license, but that is not going to get you anything competent on its own.
EDIT: In case it isn’t clear, I am clarifying what I understood from Grimy@lemmy.world’s comment, not adding to it.
I clarified the comment above, which was misunderstood. Whether it makes a moral/sane argument is subjective, and I am not covering that.
I am not sure why you think there is a claim that OpenAI is trying to make companies pay. On the contrary, the comment I was clarifying (so not my opinion/words) states that OpenAI is arguing that anyone should be able to use copyrighted materials for free to train AI.
The cost of running an online service like ChatGPT is wildly beside the point of the argument presented. You can run your own open source large language models at home about as well as you can run Bethesda’s Starfield on a similarly spec’d PC.
Those open source large language models are trained on the same collections of data, including copyrighted data.
The logic being used here is:
If it becomes globally forbidden to train AI with copyrighted materials, or there is a large price or fine for using them in training, then the non-corporate, free, open source side of AI will perish or have to go underground, while the for-profit mega corporations will continue to exploit the data and train AI as usual because they can pay to settle in court.
The ethical dilemma as I understand it is:
Allowing AI to train for free is a direct threat to creatives and a win for BigProfit Entertainment; not allowing it to train for free is a threat to public, democratic AI and a win for BigTech merging with BigCrime.
That is very well put, I really wish I could have started with that.
Though I envision it as a loss for BigProfit Entertainment, since I see this as a real boon for the indie gaming, animation and eventually filmmaking industries.
And using publicly available data to train gets you a shitty chatbot…
Hell, even using copyrighted data to train isn’t that great.
Like, what do you even think they’re doing here for your conspiracy?
You think OpenAI is saying they should pay for the data? They’re trying to use it for free.
Was this a meta joke and you had a chatbot write your comment?
if someone said this to me I’d cry
That’s insane logic…
Like you’re essentially saying I can copy/paste any article without a paywall to my own blog and sell adspace on it…
And you’re still saying OpenAI is trying to make AI companies pay?
Like, do you think AI runs off free cloud services? The hardware is insanely expensive.
And OpenAI is trying to argue the opposite, that AI companies shouldn’t have to pay to use copyrighted works.
You have zero idea what is going on, but you are really confident you do
It’s definitely overall quite a messy situation.
I didn’t want any of this shit. IDGAF if we don’t have AI. I’m still not sure the internet actually improved anything, let alone what the benefits of AI are supposed to be.
It doesn’t matter what you want. What matters is if corporations can extract $ from you, gain an efficiency, or cut their workforce using it.
That’s what the drive for AI is all about.
No doubt.
You don’t have to use it. You can even disconnect from the internet completely.
What’s the benefit of stopping me from using it?
If the data has to be paid for, OpenAI will gladly do it with a smile on their face. It guarantees them a monopoly and ownership of the economy.
Paying more but having no competition except Google is a good deal for them.
Eh, the issue is lots of people wouldn’t be willing to sell tho.
Like, you think an author wants the chatbot to read their collected works and use that? Regardless of if it’s quoting full texts or “creating” text in their style.
No author is going to want that.
And if it’s up to publishers, they likely won’t either. Why take one small payday if that could potentially lead to a loss of sales a few years down the road?
It’s not like the people making the chatbots just need to buy a retail copy of the text to be in the legal clear.
The publishers will absolutely sell imo. They just publish; the book will be worth the same with or without the help of AI to write it.
I guess there is a possibility that people start replacing bought books with personalized LLM outputs, but that strikes me as unlikely.