At least 3 major outlets — The New York Times, The Guardian, and Reddit — have blocked the Internet Archive’s Wayback Machine from accessing their content

Innerworld@lemmy.world · 1 day ago

lechekaflan@lemmy.world · 3 hours ago

Fuck you spez

M0oP0o@mander.xyz · 12 hours ago

I noticed a few days ago, when looking into Americans leaving loaded firearms in ovens, that we are losing archived news. I would find an article or story that is just missing now; all that's left is a headline linking to nowhere. And I have seen this trend with all kinds of things: we are losing knowledge for no other reason than the possibility of an extra dollar at some point. Mix in the overwhelming amount of LLM-generated bullshit pretending to be information, tailored to people's perceived interests (if you live in a religious area, for example, you see more religious bullshit), and we have almost inescapable silos.

I don’t think I need to explain how dangerous this is.

AlexLost@lemmy.world · 9 hours ago

Remember to support your local library. Only physically written words are going to be safe in the coming age.

M0oP0o@mander.xyz · 8 hours ago

As someone that has been on my local library board… I got bad news for you on that one. Libraries are culling books like never before, facing licensing issues like never before, and funding issues like never before.

traxex@lemmy.dbzer0.com · 16 hours ago

Reminder to donate to the Internet Archive so they can keep fighting the good fight.

Saryn@lemmy.world · 19 hours ago

“Content scraping is harming the information business in ways that could not have been foreseen.”

What an absolutely ridiculous thing to say.

Cantaloupe@lemmy.fedioasis.cc · 12 hours ago

Gee, I wonder what else could be scraping content from all these websites?

REDACTED@infosec.pub · 13 hours ago

To be fair, the archive did get heavily abused as a way to read without paywalls. I know this is a controversial opinion, but seeing comments on other threads like “Remember to support news media”, then “use archive to bypass paywalls”, then anger towards said companies for caring about getting paid or growing, makes one question where exactly Lemmy draws the line between piracy and paid content. Or are we simply against sites like 404Media altogether just because of paywalls?

Saryn@lemmy.world · 11 hours ago

That’s not the point. The point is content scraping (and crawling) is the cornerstone of the contemporary information environment. It’s how we got to this technological paradigm in the first place.

This whole “people are bypassing paywalls” thing is a badly evidenced non-issue, and all too convenient. What these companies are really saying is “Content scraping is bad when others do it. Only I and other big fish get to do it and profit billions out of it. Fuck ordinary citizens. Fuck everyone and everything but me and my dreams of endless wealth and power.”

To be fair.

gagcar@lemmus.org · 2 hours ago

You say bypassing paywalls is a non-issue, but it is basically the only thing I have heard people on social media say to use it for. You can have your problems with data harvesting, but don't pretend that getting around paywalls was not what the average individual user was using it for.

ameancow@lemmy.world · 18 hours ago

“This isn’t letting us shape reality, that’s our entire business model, we are working tirelessly to shape people’s reality so this is definitely a no-go.”

Takeshidude@lemmy.world · 18 hours ago

Start self-hosting ArchiveBox. They can't block everyone.

fierysparrow89@lemmy.world · 4 hours ago

If you have a concrete suggestion as to the stack don’t hold back 😃

green_goglin@thelemmy.club · 23 hours ago

Nobody tell the NYT that you can add another “.” after “.com” to bypass their paywall.

brucethemoose@lemmy.world · 13 hours ago

Why does this work? Is it a deliberate bypass on the NYT’s part?

green_goglin@thelemmy.club · 12 hours ago

No idea, but I love it.

anon_8675309@lemmy.world · 12 hours ago

Hmmm. Interesting.

gAlienLifeform@lemmy.world · 18 hours ago

I’m probably screwing it up here, but neither of these are working for me

https://www.nytimes.com.2026/02/04/us/politics/supreme-court-california-congressional-map.html

https://www.nytimes…com/2026/02/04/us/politics/supreme-court-california-congressional-map.html

SocialMediaRefugee@lemmy.world · 5 hours ago

Put the extra “.” after the “.com” so “.com.”

gAlienLifeform@lemmy.world · 17 hours ago

Ah, https://www.nytimes.com/2026/02/04/us/politics/supreme-court-california-congressional-map.html won't work in my usual browser (which just ends up loading the NYT's homepage), but it does work in a Chrome incognito window.

Thank you!

green_goglin@thelemmy.club · 12 hours ago

you’re welcome:

Viking_Hippie@lemmy.dbzer0.com · 4 hours ago

Still getting this bullshit:

green_goglin@thelemmy.club · 3 hours ago

beep boop

M0oP0o@mander.xyz · 12 hours ago

https://www.nytimes.com/2026/02/04/us/politics/supreme-court-california-congressional-map.html

You are missing the .com. part…

green_goglin@thelemmy.club · 12 hours ago

oof, I brainfarted. Markdown and code formatting auto-format the link and, in doing so, autocorrect it.

M0oP0o@mander.xyz · 12 hours ago

It happens. Just wanted to point it out in case people trying it thought it didn't actually work.

gAlienLifeform@lemmy.world · 16 hours ago

I think autocomplete or something might have messed with what you intended to post; that link still hits the paywall for me. But using your guidance, I was eventually able to figure out that

nytimes.com./2026 etc.

works in a Chrome incognito window. The “.” after “com” and the “/” after that “.” are apparently the critical bits.

stegosaur@lemmy.world · 21 hours ago

Awesome, this is the best paywall hack I have ever seen!

tackleberry@thelemmy.club · 23 hours ago

Fuck Reddit. That website has been selling our data and using it to train AI… I say fuck 'em

Buddahriffic@lemmy.world · 13 hours ago

FYI, any data on Lemmy can be used for the same purpose, for free. The federation infrastructure could even give AI models more direct access than Reddit is likely giving them. Just in case anyone assumes that because this is community-run, the data isn't being used this way: it's not being sold, but it can be accessed by the same entities if they want it.

brucethemoose@lemmy.world · 13 hours ago

The users are still the users though, not the product like Reddit.

Buddahriffic@lemmy.world · 12 hours ago

Oh yeah, not saying they are generally equivalent, just in that one particular aspect: access to comment data for any purpose.

brucethemoose@lemmy.world · 12 hours ago

Yep.

TBH I think it’s kind of silly for the Fediverse to try and block scraping, as long as that scraping isn’t effectively a DDoS. It’s public.
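The thread doesn't say how each outlet actually blocks the Wayback Machine. One common mechanism, assumed here purely for illustration, is a robots.txt rule aimed at a specific crawler. robots.txt is advisory: it only keeps out crawlers that choose to honor it. The minimal sketch below uses Python's standard `urllib.robotparser` to check such a rule; the robots.txt text is made up, and `archive.org_bot` is an assumed crawler name, not something confirmed in this thread.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt, for illustration only; not any outlet's real file.
# "archive.org_bot" is an ASSUMED user-agent name for the archive's crawler.
SAMPLE_ROBOTS_TXT = """\
User-agent: archive.org_bot
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() accepts an iterable of lines, so no network fetch is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "https://example.com/2026/02/04/some-article.html"
    for agent in ("archive.org_bot", "SomeOtherBot"):
        print(agent, is_allowed(SAMPLE_ROBOTS_TXT, agent, url))
```

Running it should print `False` for the assumed archive crawler and `True` for any other agent, since the catch-all `*` group allows everything. Nothing in the file stops a crawler that simply ignores it.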

ameancow@lemmy.world · 18 hours ago

about 1 out of every 5 posts is an advertisement in disguise, and about 15% or more of users are actually bots.

All of this is expected as a consequence of partnership with AI companies and google, and the site is basically walking dead, just a shell of corporate interests, manufactured conversations, algorithmically fed bait posts and so on… but it is a tad creepy how many of the AI bots keep making posts in “explain the joke” subreddits. We are so fucked.

tackleberry@thelemmy.club · 16 hours ago

Great catch! You can actually see the AI slop when it pops up. Reddit is dead, and you should delete your data from that cesspool.

ameancow@lemmy.world · 15 hours ago

I got 12 years of some of the top submissions and comments of all time before they banned and then repeatedly shadowbanned me for no reason*. I will leave my data there because I want our granddroids to learn the very best from us.

*I think it was for speaking out against genocide and using stern language with MAGA chuds.

Tony Bark@pawb.social · 1 day ago

Really? They think Internet Archive is the problem?

ameancow@lemmy.world · 17 hours ago

Yah it’s a problem for their agenda of manufacturing culture, social discourse and consent for hundreds of millions of people.

Tollana1234567@lemmy.today · 3 hours ago

It's their excuse to silence dissent when the time comes to censor overreaching “political trends”.

AmbitiousProcess (they/them)@piefed.social · 1 day ago

They think AI companies are using it as a “backdoor” to scrape their content. Which is patently ridiculous, but that won’t stop them.

Tollana1234567@lemmy.today · 3 hours ago

Reddit already allows AI companies to scrape the site, specifically Google.

ohulancutash@feddit.uk · 1 day ago

They think they want their revenue streams

gAlienLifeform@lemmy.world · 23 hours ago

Is the Guardian actually blocking the Internet Archive? Seems to work for me

https://web.archive.org/web/20260224104430/https://www.theguardian.com/us-news/2026/feb/23/trump-iran-airstrikes-nuclear-deal

Meanwhile,

https://web.archive.org/web/20260224121247/https://www.mediapost.com/publications/article/413017/ai-basic-training-newsrooms-offer-little-practica.html?initial_article=412911&es_index_start=3&es_index=0

Edit: huh, the MediaPost article did in fact start loading on the Archive a few minutes after I posted this.
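Both links above follow the Wayback Machine's snapshot URL pattern: `web.archive.org/web/`, a 14-digit capture timestamp, then the original URL. A minimal sketch for pulling those two parts apart, assuming exactly that shape (variants with modifiers appended to the timestamp are not handled) and treating the timestamp as UTC, which is an assumption:

```python
import re
from datetime import datetime, timezone
from typing import NamedTuple

# ASSUMED shape: https://web.archive.org/web/<14-digit timestamp>/<original URL>
_SNAPSHOT_RE = re.compile(r"^https?://web\.archive\.org/web/(\d{14})/(.+)$")


class Snapshot(NamedTuple):
    captured_at: datetime
    original_url: str


def parse_snapshot_url(url: str) -> Snapshot:
    """Split a Wayback snapshot URL into capture time and original URL."""
    match = _SNAPSHOT_RE.match(url)
    if match is None:
        raise ValueError(f"not a Wayback snapshot URL: {url}")
    stamp, original = match.groups()
    # Timestamp is YYYYMMDDhhmmss; treated as UTC (assumption).
    captured_at = datetime.strptime(stamp, "%Y%m%d%H%M%S").replace(tzinfo=timezone.utc)
    return Snapshot(captured_at, original)


if __name__ == "__main__":
    snap = parse_snapshot_url(
        "https://web.archive.org/web/20260224104430/"
        "https://www.theguardian.com/us-news/2026/feb/23/trump-iran-airstrikes-nuclear-deal"
    )
    print(snap.captured_at.isoformat(), snap.original_url)
```

Query strings on the original URL, like the MediaPost link's `?initial_article=...`, are kept intact because everything after the timestamp is captured as-is.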

CombatWombat@feddit.online · 1 day ago

I’m certain they’ve wanted to do this for a long time, and AI is a convenient way to justify it, rather than admitting they don’t want humans using it to circumvent the paywall. It does solidify for me personally that the LA Times is the paper of record for the United States going forward, rather than the New York Times.

hector@lemmy.today · 17 hours ago

I just got a gift subscription to the NYTimes, for the first time since I quit in 2018, and it's really gone downhill. I'm learning about more big scoops from the Guardian via Lemmy posts than I see in their paper. I think Israel's final solution for Gaza broke their brain: they had an identity crisis and sided with Israel and fascism over all the fourth-estate democracy mumbo jumbo.

They haven't broken a single big story that I recall in the past year. Not a single one; even the Wall Street Journal published Epstein's birthday letter from the president. The NYTimes gave up; they are no longer the paper of record. Whatever their problems before, they used to cover events more thoroughly and had the courage to break big stories, and now they don't.

teslekova@sh.itjust.works · 1 hour ago

That’s actually pretty sad. Also a serious problem for the USA. NYT, for all its faults, really was the best one.

gAlienLifeform@lemmy.world · 23 hours ago

The LA Times also blocks the Internet Archive, unfortunately. I'd recommend PBS, NPR, ProPublica, or some other nonprofit organization as your US paper of record.

CombatWombat@feddit.online · 19 hours ago

Ugh. Thanks for the heads-up. I've definitely posted archive links before without noticing they're blocked. PBS and NPR have really gone downhill with the budget cuts. ProPublica is great, but their coverage is pretty narrow, so there are a lot of stories they don't cover at all. It's getting harder and harder to find a quality source.

cecinestpasunbot@lemmy.ml · 15 hours ago

Unfortunately, I think most quality sources with broad coverage aren't free. Even the paid sources almost always have a corporate bias. Of those, the Financial Times probably does the least to editorialize. Beyond that, I think you just have to find independent journalists or outlets with a narrower investigative focus that you can trust.

WesternInfidels@feddit.online · 22 hours ago

The South African billionaire paper that wouldn’t endorse Harris? Well, our options all suck, I guess.

9tr6gyp3@lemmy.world · 1 day ago

Wait until they find out that AI is scraping their web sites.

ameancow@lemmy.world · 15 hours ago

All of these companies only benefit from AI being employed to manufacture consent, alter reality and shape people’s social trends and habits. This is why they don’t want their data archived, they want to be able to use mobs of AI agents disguised as people to shape narrative and decide what people think is true.

It’s already in massive progress across Reddit because it’s so easy to disperse undercover AI instances and create conversations to influence people.

Even if you think of yourself as a critical thinker and reasonable, if you go into a huge, popular post and everyone in there is saying the sky is green, and you ask what they're talking about because you know the sky is blue, and then dozens of people pile on you, downvote you, call you names, and insult you for believing false facts, calling you naive and easily programmed, you're really going to question reality, and you may even go outside to take a second look at the sky.

Of course, they wouldn't do anything this bold. They will instead push far more subtle “common knowledge” sentiments, able to change the minds of people who are otherwise smart and logical but who, like every person everywhere, just want to fit in. So if those people see constant messages like “Of course it's not a genocide, that was obviously manufactured propaganda, I have a brother over there and he's saying…” etc., etc., that will absolutely change public perception of events and issues. To a cataclysmic degree.

It’s already happening and it’s even happening here. Everyone needs to get a lot more skeptical and a lot less online.

teslekova@sh.itjust.works · 1 hour ago

Ideally, we would have our own personally run AI instances that can give us a probability that what we are reading is LLM generated. It’s still pretty good at recognising itself. That will be an arms race, though.

Tollana1234567@lemmy.today · 3 hours ago

Reddit is trying to achieve what FB is doing; Meta already has complete control over what FB pushes as propaganda.

The Velour Fog @lemmy.world · 1 day ago

Well, Reddit’s got a contract for AI companies to scrape their content, so pig boy Spez is getting paid, he don’t give a fuck

Fuckfuckmyfuckingass@lemmy.world · 1 day ago

I’m sure they don’t care, or are all about it.

Aritoteles@lemmy.world · 12 hours ago

is this a meme? I don’t get the joke.

TrackinDaKraken@lemmy.world · 1 day ago

Gotta control the press before you can rewrite history.

user314_lemmus_v3s@lemmy.world · 1 day ago

I wonder what happened to the Archive in 2024 when it was “hacked” and some pages “disappeared”…

Formfiller@lemmy.world · 1 day ago

That’s very 1984 of them

Not In Our Back Yard: Publishers Block Wayback Machine