Google is cannibalizing the web to feed AI

Sahwa@reddthat.com · 7 days ago

Google is cannibalizing the web to feed AI

chunes@lemmy.world · 7 days ago

Model collapse isn’t a thing anymore. https://arxiv.org/html/2510.16657v1

Grandwolf319@sh.itjust.works · 7 days ago

“Our key finding is that by injecting information through an external synthetic data verifier, whether a human or a better model, synthetic retraining will not cause model collapse.”

Yeah, if you have a source of truth, then your model is basically being trained on that.

It’s like already having the answer.

chunes@lemmy.world · 7 days ago

The point is that the verified part only needs to be a very small share of what the model is trained on.

Grandwolf319@sh.itjust.works · 7 days ago

My point was that having a verifier means you’re not really training a model on another model’s data. It’s basically as if you were getting new raw data from a non-AI source.

CmdrShepard49@sh.itjust.works · 7 days ago

“Our key finding is that by injecting information through an external synthetic data verifier, whether a human or a better model, synthetic retraining will not cause model collapse.”

Lol, so to make a great model, they just need to have an even better one available first or a human who can verify every single thing it ingests.

Hmm, call me skeptical of this claim.

corsicanguppy@lemmy.ca · 7 days ago

This assumes everything on the external side is valid. If one slop cluster feeds off another (a slopveyor?), then there is nothing truly external for the validation hall monitor to compare against. They’re trusting another model’s output as if it were gospel.

artyom@piefed.social · 7 days ago

LOL OK