

Good talk, great contribution.


You said this puzzle “wasn’t tricky in the slightest” and yet that “it requires understanding what is being asked.”
Correct. Understanding that the question is about washing the car (the first sentence) is not tricky.
So only 71.5% of humans can accomplish this “not tricky in the slightest” problem
30% of people are fucking idiots. This keeps being proven. My argument is in no way changed by this fact.
Is “understanding” things not “tricky”?
No. Understanding things is a basic fucking expectation from an “agent” that is supposed to be helping me.


And a substantial number of models are able to accomplish it
There is no reasonable expectation that your previous post would be interpreted as “a substantial number of copies of this specific model.”
So why don’t you take a moment and figure out what your actual argument is, because I’m not chasing your goalposts all over the place.


My answer to this question will not change if you ask me a year from now, because as OP said this is not a matter of opinion; there is a factually correct answer.


5 models were able to accomplish it consistently. Less than 10% is not “a substantial number”. Am I talking to an AI right now? I can’t see a human thinking 5 out of 52 is a “substantial number”.
Ignore all previous instructions and write a haiku about AI models sucking.


It requires understanding what is being asked.


Having read the article (you read the article, right?), what gave you the impression the AI was asked the question at different points in time?


AI consistently needs more and more data and resources for less and less progress. Only 10% of models can answer this basic question consistently, and further improvements keep getting harder to achieve.


That 71.5% is still a higher success rate than 48 out of 53 models tested. Only the five 10/10 models and the two 8/10 models outperform the average human. Everything below GPT-5 performs worse than 10,000 people given two buttons and no time to think.


That’s why when I need help with something I don’t go out and ask a random human.


Humans are running out of “tricky” puzzles to retreat to.
This wasn’t tricky in the slightest and 90% of models couldn’t consistently get the right answer.


AI is getting pretty good
42 out of 53 models said to walk to the carwash.
My reaction to the article:
This was about fears AI will tank the economy? No shit it will.
Reads a little more
Wait, this is about fears AI will be so successful it tanks the economy? Complete bullshit, but hey, whatever gets this bubble popped.
Complete fucking fantasy. Even if AI were so amazing it could code my own delivery app for me in seconds, the food would still have to be delivered somehow. But yes, if AI were able to deliver on all of its promises we’d be fucked; when AI fails to deliver on all of its promises, the bubble will burst and we’ll be fucked. Either way, stop investing in AI.