New York Becomes First State to Ban Bots That Scrape News Sites

TryingToBeGood@reddthat.com · 18 hours ago

Zarxrax@lemmy.world · 15 hours ago

Inaccurate headline. The bill doesn’t ban web scraping; it just requires that bots accurately identify themselves through the user agent string, and possibly meet some additional requirements to disclose the purpose of scraping the data.
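
For anyone wondering what "identify themselves through the user agent string" could look like in practice, here is a minimal sketch in Python. The bot name, info URL, and `purpose=` field are all made up for illustration; nothing in the article says the bill prescribes this particular format.

```python
from urllib.request import Request

# Hypothetical values for illustration only; none of these come from the bill.
BOT_NAME = "ExampleNewsBot"
BOT_VERSION = "1.0"
BOT_INFO_URL = "https://example.com/bot-info"
BOT_PURPOSE = "search-indexing"


def build_user_agent() -> str:
    """Return a User-Agent string that names the bot, links to a page
    describing it, and states why it is fetching pages."""
    return f"{BOT_NAME}/{BOT_VERSION} (+{BOT_INFO_URL}; purpose={BOT_PURPOSE})"


def make_request(url: str) -> Request:
    """Build (but do not send) a request that carries the self-identifying header."""
    return Request(url, headers={"User-Agent": build_user_agent()})
```

A site operator could then match on the bot name in their logs or block rules, which is exactly what a "stealth" crawler that spoofs a browser string prevents.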

deliriousdreams@fedia.io · 12 hours ago

And there’s a fine if the company doesn’t comply, which is basically now gonna be considered the cost of doing business.

toiletobserver@lemmy.world · 18 hours ago

How will they enforce it? Pretty please?

TryingToBeGood@reddthat.com · 18 hours ago

The New York measure defines a stealth crawler as any software that retrieves, scrapes or otherwise accesses a website, including AI agents. Under the bill, the attorney general’s office would be able to sue companies that fail to disclose such activity. Violations could net civil penalties of up to $15,000 per day.

🤔

Pika@sh.itjust.works · 17 hours ago

“This website or search engine is not designed for the state of New York” is, I assume, going to be a disclaimer we will be seeing soon.

pelespirit@sh.itjust.works · 17 hours ago

Soooo, that means archive.is too. This fucking sucks.

XLE@piefed.social · edit-2 15 hours ago

I hope those tools are exempt because, just like a browser, they respond only to specific commands issued by a human user. They don’t “crawl” pages in the way we describe bots that jump from page to page.

Prove_your_argument@piefed.social · 15 hours ago

Isn’t there something like 340 news sites that actively block them already?

toiletobserver@lemmy.world · 17 hours ago

Thanks, that becomes the cost of doing business. Harumph.

AceFuzzLord@lemmy.zip · 17 hours ago

The only bots I hope get exempted are the Internet Archive bots. They’re the only bots I support.

Rob T Firefly@lemmy.world · 14 hours ago

The article is about “stealth bots” that don’t identify themselves as such. The Internet Archive bots have always been clearly identifiable.

rob200@retrofed.com · 15 hours ago

Realistically, what good would it do? Once you have already scraped the pattern of news sites, it’s already over. All this actually does is prevent new startups from competing in the AI space. So really, this is the fastest enshittification of a medium on record. Whether you like or hate AI, this is an enshittification of it. (I hate AI.)

nullspace@lemmy.world · 14 hours ago

I’m guessing it’s to eliminate the issue of a site not getting clicks because the article you were about to read is already summarized for you. It also opens the door for revenue negotiations for allowing their content to be scraped for that purpose, as the scraper bots would now be identified.

Pika@sh.itjust.works · 17 hours ago

Honestly, I would love it if forced identification were required, but archival services also need a hard exemption from being blocked.

tangeli@piefed.social · 17 hours ago

Why only news sites?

Any bot that ignores robots.txt should be banned.
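
For reference, respecting robots.txt is not hard for a well-behaved bot. A rough sketch using Python's standard `urllib.robotparser`; the sample rules and the `ExampleNewsBot` name are invented for illustration, and a real crawler would fetch the site's actual robots.txt instead of using an inline string.

```python
from urllib.robotparser import RobotFileParser

# Made-up robots.txt content: one named bot is kept out of /premium/,
# everyone else is allowed everywhere.
SAMPLE_ROBOTS_TXT = """\
User-agent: ExampleNewsBot
Disallow: /premium/

User-agent: *
Disallow:
"""


def allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)
```

With the sample rules, `allowed(SAMPLE_ROBOTS_TXT, "ExampleNewsBot", "https://example.com/premium/story")` is refused, while the same bot fetching something under `/free/` is permitted. Note that this only works if the bot sends its real name, which is why the identification requirement and robots.txt go together.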