Apple study exposes deep cracks in LLMs’ “reasoning” capabilities

@nutsack@lemmy.world

cracks? it doesn’t even exist. we figured this out a long time ago.

@CombatWombat1212@lemmy.ml

So do I every time I ask it a slightly complicated programming question

@anon_8675309@lemmy.world

Did anyone believe they had the ability to reason?

@Aeri@lemmy.world

People are stupid OK? I’ve had people who think that it can in fact do math, “better than a calculator”

Semperverus

I still believe they have the ability to reason to a very limited capacity. Everyone says that they’re just very sophisticated parrots, but there is something emergent going on. These AIs need to have a world-model inside of themselves to be able to parrot things as correctly as they currently do (yes, including the hallucinations and the incorrect answers). Sure, they are using tokens instead of real dictionary words, which comes with things like the strawberry problem, but just because they are not nearly as sophisticated as us doesn’t mean there is no reasoning happening.

We are not special.

@trolololol@lemmy.world

What’s the strawberry problem? Does it think it’s a berry? I wonder why

@galanthus@lemmy.world

If the only thing you feed an AI is words, then how would it possibly understand what these words mean if it does not have access to the things the words are referring to?

If it does not know the meaning of words, then what can it do but find patterns in the ways they are used?

This is a shitpost.

We are special, I am in any case.

Semperverus

It is akin to the relativity problem in physics. Where is the center of the universe? What “grid” do things move through? The answer is that everything moves relative to one another, and somehow that fact causes the phenomena in our universe (and in these language models) to emerge.

Likewise, our brains do a significantly more sophisticated but not entirely different version of this. There are more “cores” in our brains that are good at different tasks and constantly talk back and forth with each other, and our frontal lobe provides the advanced thinking and networking on top of that. The LLMs are more equivalent to Broca’s area; they haven’t built out the full frontal lobe yet (or rather, the “Multiple Demand network”).

You are right in that an AI will never know what an apple tastes like, or what a breeze on its face feels like until we give them sensory equipment to read from.

In this case though, it’s the equivalent of a college student having no real-world experience and only the knowledge from their books, lectures, and labs. You can still work with the concepts of, and reason about, things you have never touched if you are given enough information about them beforehand.

@galanthus@lemmy.world

The two rhetorical questions in your first paragraph assume the universe is discrete and finite, and I am not sure why. But also, that has nothing to do with what we are talking about. You think that if you show that computers and brains work the same way (they don’t), or in a similar way (maybe), I will have to accept an AI can do everything a human can, but that is not true at all.

Treating an AI like a subject capable of receiving information is inaccurate, but I will still assume it is identical to a human in that regard for the sake of argument.

It would still be nothing like a college student grappling with abstract concepts. It would be like giving you university textbooks on quantum mechanics written in Chinese, and making you study them (it would be even more accurate if you didn’t know any language at all). You would be able to notice patterns in the ways the words are placed relative to each other, and also use this information (theoretically) to make a combination of characters that resembles the texts you have, but you wouldn’t be able to understand what they reference. Even if you had a dictionary you wouldn’t be, because you wouldn’t be able to understand the definitions. Words don’t magically have their meanings stored inside; they are interpreted in our heads. An AI can’t do that, so the word means nothing to it.

@whotookkarl@lemmy.world

Here’s the cycle we’ve gone through multiple times and are currently in:

AI winter (low research funding) -> incremental scientific advancement -> breakthrough for new capabilities from multiple incremental advancements to the scientific models over time building on each other (expert systems, LLMs, neural networks, etc.) -> engineering creates new tech products/frameworks/services based on new science -> hype for new tech creates sales and economic activity, research funding, subsidies, etc. -> (for LLMs we’re here) people become familiar with new tech capabilities and limitations through use -> hype spending bubble bursts when the overspend can’t keep up with expectations that the money line goes up forever, or with new research breakthroughs -> AI winter -> etc…

kingthrillgore

I feel like a draft landed on Tim’s desk a few weeks ago, explains why they suddenly pulled back on OpenAI funding.

People on the removed superfund birdsite are already saying Apple is missing out on the next revolution.

@BreadstickNinja@lemmy.world

“Superfund birdsite” I am shamelessly going to steal from you

kingthrillgore

please, be my guest.

@RaoulDook@lemmy.world

I hope this gets circulated enough to reduce the ridiculous amount of investment and energy waste that the ramping-up of “AI” services has brought. All the companies have just gone way too far off the deep end with this shit that most people don’t even want.

@WrenFeathers@lemmy.world

Someone needs to pull the plug on all of that stuff.

@jabathekek@sopuli.xyz

*starts sweating

Look at that subtle pixel count, the tasteful colouring… oh my god, it’s even transparent…

@WhatAmLemmy@lemmy.world

The results of this new GSM-Symbolic paper aren’t completely new in the world of AI research. Other recent papers have similarly suggested that LLMs don’t actually perform formal reasoning and instead mimic it with probabilistic pattern-matching of the closest similar data seen in their vast training sets.

WTF kind of reporting is this, though? None of this is recent or new at all, like in the slightest. I am shit at math, but have a high-level understanding of statistical modeling concepts, mostly as of a decade ago, and even I knew this. I recall a stats PhD describing models as “stochastic parrots”; nothing more than probabilistic mimicry. It was obviously no different the instant LLMs came on the scene. If only tech journalists bothered to do a superficial amount of research, instead of being spoon fed spin from tech bros with a profit motive…

@aesthelete@lemmy.world

If only tech journalists bothered to do a superficial amount of research, instead of being spoon fed spin from tech bros with a profit motive…

This is outrageous! I mean the pure gall of suggesting journalists should be something other than part of a human centipede!

no banana

It’s written as if they literally expected AI to be self reasoning and not just a mirror of the bullshit that is put into it.

@Sterile_Technique@lemmy.world

Probably because that’s the common expectation due to calling it “AI”. We’re well past the point of putting the lid back on that can of worms, but we really should have saved that label for… y’know… intelligence, that’s artificial. People think we’ve made an early version of Halo’s Cortana or Star Trek’s Data, and not just a spellchecker on steroids.

The day we make actual AI is going to be a really confusing one for humanity.

JDPoZ

…a spellchecker on steroids.

Ask literally any of the LLM chat bots out there still using any headless GPT instances from 2023 how many Rs there are in “strawberry,” and enjoy. 🍓

@Sterile_Technique@lemmy.world

That was both hilarious and painful.

And I don’t mean to always hate on it - the tech is useful in some contexts, I just can’t stand that we call it ‘intelligence’.

Semperverus

This problem is due to the fact that the AI isn’t using English words internally, it’s tokenizing. There are no Rs in {35006}.
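To make the tokenizing point concrete, here is a minimal sketch with a toy tokenizer. The vocabulary, the subword split, and the IDs are all made up for illustration; real tokenizers use different pieces and different numbers. The only point is that the model receives a list of integers, and the letters are recoverable only after decoding back to text.

```python
# Toy vocabulary: pieces and IDs are hypothetical, not taken from any real model.
TOY_VOCAB = {"str": 101, "aw": 102, "berry": 103}
ID_TO_PIECE = {token_id: piece for piece, token_id in TOY_VOCAB.items()}


def encode(word):
    """Split a word into toy token IDs using greedy longest-match."""
    ids = []
    pos = 0
    while pos < len(word):
        # Try the longest remaining substring first, then shrink it.
        for end in range(len(word), pos, -1):
            piece = word[pos:end]
            if piece in TOY_VOCAB:
                ids.append(TOY_VOCAB[piece])
                pos = end
                break
        else:
            raise ValueError(f"no toy token covers {word[pos:]!r}")
    return ids


def decode(ids):
    """Map toy token IDs back to text."""
    return "".join(ID_TO_PIECE[token_id] for token_id in ids)


ids = encode("strawberry")
print(ids)                     # [101, 102, 103] -- what the model "sees"
print(decode(ids).count("r"))  # 3 -- only countable once it is text again
```

Counting letters is trivial on the decoded string, but nothing in `[101, 102, 103]` says how many Rs each ID contains; a model would have to have learned that association separately.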

@jabathekek@sopuli.xyz

describing models as “stochastic parrots”

That is SUCH a good description.

@fluxion@lemmy.world

Clearly this sort of reporting is not prevalent enough given how many people think we have actually come up with something new these last few years and aren’t just throwing shitloads of graphics cards and data at statistical models

sircac

They predict, not reason…

The Snark Urge

One time I exposed deep cracks in my calculator’s ability to write words with upside down numbers. I only ever managed to write BOOBS and hELLhOLE.

LLMs aren’t reasoning. They can do some stuff okay, but they aren’t thinking. Maybe if you had hundreds of them with unique training data all voting on proposals you could get something along the lines of a kind of recognition, but at that point you might as well just simulate cortical columns and try to do Jeff Hawkins’ idea.

@CosmoNova@lemmy.world

Are you telling me Apple hasn’t seen through the grift and is approaching this with an open mind just to learn how full of bullshit most of the claims from the likes of Altman are? And now they’re sharing their gruesome discoveries with everyone while they’re unveiling them?

@WhatAmLemmy@lemmy.world

I would argue that Apple Intelligence™️ is evidence they never bought the grift. It’s very focused on tailored models scoped to the specific tasks that AI does well; creative and non-critical tasks like assisting with text processing/transforming, image generation, photo manipulation.

The Siri integrations seem more like they’re using the LLM to stitch together the APIs that were already exposed between apps (used by shortcuts, etc.), each having internal logic and validation that’s entirely programmed (and documented) by humans. They market it as a whole lot more, but they market every new product as some significant milestone for mankind … even when it’s a feature that other phones have had for years, but in an iPhone!

@Chickenstalker@lemmy.world

Are we not flawed too? Does that not make AI…human?

@rickdg@lemmy.world

Real headline: Apple research presents possible improvements in benchmarking LLMs.

@patatahooligan@lemmy.world

Not even close. The paper is questioning LLMs’ ability to reason. The article talks about fundamental flaws of LLMs and how we might need different approaches to achieve reasoning. The benchmark is only used to prove the point. It is definitely not the headline.

@rickdg@lemmy.world

Once there’s a benchmark, LLMs can optimise for it. This is just another piece of news where people call “game over” but the money poured into R&D isn’t stopping anytime soon. Wasn’t synthetic data supposed to be game over for LLMs? Its limitations have been identified and it’s still being leveraged.
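For anyone wondering what a templated benchmark looks like in practice, here is a rough sketch, assuming the general idea of generating many variants of one word problem by swapping names and numbers. The template, the names, and the number ranges below are invented for illustration and are not taken from the paper.

```python
import random

# Hypothetical placeholder values; a real benchmark would use its own templates.
NAMES = ["Ava", "Ben", "Chen", "Dara"]
TEMPLATE = (
    "{name} picks {a} apples on Monday and {b} apples on Tuesday. "
    "How many apples does {name} have in total?"
)


def make_variant(rng):
    """Fill the template with random values; return (question, correct answer)."""
    name = rng.choice(NAMES)
    a = rng.randint(2, 50)
    b = rng.randint(2, 50)
    question = TEMPLATE.format(name=name, a=a, b=b)
    return question, a + b


rng = random.Random(0)  # fixed seed so the same variants come out each run
for _ in range(3):
    question, answer = make_variant(rng)
    print(question, "->", answer)
```

The reasoning needed is identical for every variant, so a solver that actually does the addition should score the same on all of them, while one that memorised a specific question and answer should not.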
