Tech experts are starting to doubt that ChatGPT and A.I. 'hallucinations' will ever go away: 'This isn’t fixable'
fortune.com
Even OpenAI CEO Sam Altman was skeptical a few weeks ago: "I probably trust the answers that come out of ChatGPT the least of anybody on Earth."

Experts are starting to doubt it, and even OpenAI CEO Sam Altman is a bit stumped.

@Zeshade@lemmy.world · 25 · 1Y

In my limited experience, the issue is often that the “chatbot” doesn’t even check what it says now against what it said a few paragraphs above. It contradicts itself in very obvious ways. Shouldn’t a different algorithm that adds some sort of separate logic check be able to help tremendously? Or a check to ensure recipes are edible (for this specific application)? A bit like physics-informed neural networks.

@Zeth0s@lemmy.world · 23 · 1Y

That’s called the context window. For ChatGPT it is about 4k tokens (a token is roughly three-quarters of an English word). Using the API it goes up to about 32k. Alternative models go up to about 64k.

The model doesn’t see anything you said before that window.

That is one of the biggest limitations of the current generation of LLMs.
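To make the idea of a fixed context window concrete, here is a minimal Python sketch of how a chat front end might trim history so it fits. It is illustrative only: `truncate_to_context` and `rough_token_count` are hypothetical helpers, word counting stands in for a real subword tokenizer, and actual services may trim, summarize, or otherwise manage history differently.

```python
from typing import Callable, List


def rough_token_count(text: str) -> int:
    # Crude stand-in for a real tokenizer: one "token" per whitespace-separated word.
    return len(text.split())


def truncate_to_context(
    messages: List[str], max_tokens: int, count_tokens: Callable[[str], int]
) -> List[str]:
    """Keep the most recent messages whose combined token count fits in max_tokens."""
    kept: List[str] = []
    total = 0
    # Walk backwards from the newest message so recent turns are kept first.
    for msg in reversed(messages):
        n = count_tokens(msg)
        if total + n > max_tokens:
            break  # this message and everything older is dropped
        kept.append(msg)
        total += n
    kept.reverse()  # restore chronological order
    return kept


history = ["my name is Ada", "nice to meet you", "what is my name?"]
window = truncate_to_context(history, max_tokens=8, count_tokens=rough_token_count)
print(window)  # ['nice to meet you', 'what is my name?']
```

Here the oldest turn, the one containing the name, is dropped, so a model that only sees `window` has no direct access to it.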

@Womble@lemmy.world · 3 · 1Y

That’s not 100% true. They also work by modifying the meanings of words based on context, and those modified meanings then propagate forward indefinitely. But yes, the direct context is limited, so things outside it aren’t directly used.

@Zeth0s@lemmy.world · 1 · 1Y

They don’t really change the meaning of the words; they just look for the “best” words given the recent context, by taking into account the different possible meanings of the words.
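The “look for the best word” step can be sketched as a softmax over scores followed by picking the top token. The scores below are made up for illustration; a real model produces one score (logit) per vocabulary entry, tens of thousands of them, and chat models usually sample with a temperature rather than always taking the top choice.

```python
import math
from typing import Dict, Tuple


def softmax(logits: Dict[str, float]) -> Dict[str, float]:
    # Subtract the largest score before exponentiating, for numerical stability.
    m = max(logits.values())
    exps = {tok: math.exp(score - m) for tok, score in logits.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}


def pick_next_token(logits: Dict[str, float]) -> Tuple[str, Dict[str, float]]:
    # Greedy decoding: turn scores into probabilities and take the most likely token.
    probs = softmax(logits)
    return max(probs, key=probs.get), probs


# Made-up scores for continuing "John wants his bank to cash the ..."
scores = {"check": 2.0, "money": 1.0, "river": 0.5}
token, probs = pick_next_token(scores)
print(token)  # check
```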

@Womble@lemmy.world · 1 · 1Y

No, they do. That’s one of the key innovations of LLMs: the attention and feed-forward steps, where information is propagated between related words based on context. From https://www.understandingai.org/p/large-language-models-explained-with?r=cfv1p:

For example, in the previous section we showed a hypothetical transformer figuring out that in the partial sentence “John wants his bank to cash the,” his refers to John. Here’s what that might look like under the hood. The query vector for his might effectively say “I’m seeking: a noun describing a male person.” The key vector for John might effectively say “I am: a noun describing a male person.” The network would detect that these two vectors match and move information about the vector for John into the vector for his.
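The mechanism in the quoted passage can be sketched as single-head scaled dot-product attention. This is a toy under stated assumptions: the vectors are hand-picked, whereas a real model derives query, key and value vectors from each token through learned projection matrices, uses many heads and hundreds of dimensions, and applies this inside every layer. The last step adds the attention output back onto the vector for “his” (a residual connection), which is how information about “John” ends up in it.

```python
import math
from typing import List, Tuple

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(xs: List[float]) -> List[float]:
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]


def attend(query: Vector, keys: List[Vector], values: List[Vector]) -> Tuple[Vector, List[float]]:
    """Scaled dot-product attention for a single query vector."""
    scale = math.sqrt(len(query))
    weights = softmax([dot(query, k) / scale for k in keys])
    # Weighted sum of the value vectors: the information pulled into this token.
    dim = len(values[0])
    update = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return update, weights


# Hand-picked toy vectors for the tokens "John", "wants", "his".
# Key feature 0 plays the role of "I am: a noun describing a male person".
keys = [[4.0, 0.0], [0.0, 4.0], [0.0, 0.0]]
# Values: one slot per token, so we can see whose information arrives.
values = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
# Query for "his" plays the role of "I'm seeking: a noun describing a male person".
query_his = [4.0, 0.0]

his_vector = [0.0, 0.0, 1.0]
update, weights = attend(query_his, keys, values)
# Residual connection: the attention output is added to the token's own vector.
his_vector = [h + u for h, u in zip(his_vector, update)]
print([round(w, 3) for w in weights])  # nearly all weight lands on "John"
```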

@Zeth0s@lemmy.world · 1 · 1Y

That’s exactly what I said:

They don’t really change the meaning of the words; they just look for the “best” words given the recent context, by taking into account the different possible meanings of the words.

The words’ meanings haven’t changed; the model just chooses based on the context, accounting for the different meanings of the words.

@Womble@lemmy.world · 1 · 1Y

The key vector for John might effectively say “I am: a noun describing a male person.” The network would detect that these two vectors match and move information about the vector for John into the vector for his.

This is the bit you are missing: the attention network actively changes the token vectors depending on context, which transfers new information into the meaning of that word.

@Zeth0s@lemmy.world · 1 · 1Y

The network doesn’t detect matches, but the model definitely works on similarities. Words are mapped into a high-dimensional space, with the idea that this space can mathematically preserve conceptual similarity as spatial proximity.

Words are transformed into a mathematical representation that is able (or at least tries) to retain their semantic information.
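The idea that similarity is encoded as proximity is usually measured with cosine similarity. A minimal sketch, with hypothetical hand-picked 3-D vectors standing in for learned embeddings:

```python
import math
from typing import Dict, List


def cosine_similarity(a: List[float], b: List[float]) -> float:
    # Cosine of the angle between two vectors: 1.0 = same direction, 0.0 = orthogonal.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical 3-D embeddings, hand-picked for illustration (real ones are learned
# and have hundreds or thousands of dimensions).
embeddings: Dict[str, List[float]] = {
    "king": [0.9, 0.8, 0.1],
    "queen": [0.85, 0.75, 0.2],
    "apple": [0.1, 0.2, 0.95],
}

print(cosine_similarity(embeddings["king"], embeddings["queen"]))
print(cosine_similarity(embeddings["king"], embeddings["apple"]))
```

With these toy values, “king” and “queen” come out far more similar than “king” and “apple”; in a trained model the geometry is learned rather than chosen.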

But the different meanings of words belong to the words themselves and are defined by the language; the model cannot modify them.

Anyway, we are getting into details here. We could bore the audience to death.

Edit: I asked GPT-4 to summarize the concepts. I believe it did a decent job. I hope it helps:

  1. Embedding Space:

    • Initially, every token is mapped to a point (or vector) in a high-dimensional space via embeddings. This space is typically called the “embedding space.”
    • The dimensionality of this space is determined by the size of the embeddings. For many Transformer models, this is often several hundred dimensions, e.g., 768 for some versions of GPT and BERT.
  2. Positional Encodings:

    • These are vectors added to the embeddings to provide positional context. They share the same dimensionality as the embedding vectors, so they exist within the same high-dimensional space.
  3. Transformations Through Layers:

    • As tokens’ representations (vectors) pass through Transformer layers, they undergo a series of linear and non-linear transformations. These include matrix multiplications, additions, and the application of functions like softmax.
    • At each layer, the vectors are “moved” within this high-dimensional space. When we say “moved,” we mean they are transformed, resulting in a change in their coordinates in the vector space.
    • The self-attention mechanism allows a token’s representation to be influenced by other tokens’ representations, effectively “pulling” or “pushing” it in various directions in the space based on the context.
  4. Nature of the Vector Space:

    • This space is abstract and high-dimensional, making it hard to visualize directly. However, in this space, the “distance” and “direction” between vectors can have semantic meaning. Vectors close to each other can be seen as semantically similar or related.
    • The exact nature and structure of this space are learned during training. The model adjusts the parameters (like weights in the attention mechanisms and feed-forward networks) to ensure that semantically or syntactically related concepts are positioned appropriately relative to each other in this space.
  5. Output Space:

    • The final layer of the model projects each token representation into an output space the size of the vocabulary (the logits); a softmax then turns these into a probability distribution over all possible tokens for next-word prediction.

In essence, the entire process of token representation within the Transformer model can be seen as continuous transformations within a vector space. The space itself can be considered a learned representation where relative positions and directions hold semantic and syntactic significance. The model’s training process essentially shapes this space in a way that facilitates accurate and coherent language understanding and generation.
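Item 2 above can be illustrated with the sinusoidal positional encoding from the original Transformer paper (“Attention Is All You Need”). This is one variant only: GPT-2, for example, uses learned position embeddings, and many newer models use rotary embeddings instead. The helper names and the embedding values are hypothetical.

```python
import math
from typing import List


def sinusoidal_positional_encoding(position: int, d_model: int) -> List[float]:
    """Sinusoidal encoding: sin on even indices, cos on odd indices."""
    pe = [0.0] * d_model
    for i in range(0, d_model, 2):
        # Each sin/cos pair uses a different wavelength, growing geometrically with i.
        angle = position / (10000 ** (i / d_model))
        pe[i] = math.sin(angle)
        if i + 1 < d_model:
            pe[i + 1] = math.cos(angle)
    return pe


def add_positions(token_embeddings: List[List[float]]) -> List[List[float]]:
    # Same dimensionality as the embeddings, so the two are simply added element-wise.
    d_model = len(token_embeddings[0])
    return [
        [e + p for e, p in zip(emb, sinusoidal_positional_encoding(pos, d_model))]
        for pos, emb in enumerate(token_embeddings)
    ]


# The same (made-up) embedding at two positions ends up as two different vectors.
same_word_twice = [[1.0, 1.0, 1.0, 1.0], [1.0, 1.0, 1.0, 1.0]]
with_positions = add_positions(same_word_twice)
print(with_positions[0] != with_positions[1])  # True
```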

@Womble@lemmy.world · 1 · 1Y

Yes, of course it works on similarities; I haven’t disputed that. My point was that the transformations of the token vectors are a transfer of information, and that this information is not lost as things move out of the context window. It may slowly decohere over time if it is not reinforced, but the model does not immediately forget things as they move out of context, as you originally claimed.

@cryball@sopuli.xyz · 3 · 1Y

Shouldn’t a different algorithm that adds some sort of separate logic check be able to help tremendously?

Maybe, but it might not be that simple. The issue is that one would have to design that logic in a way that a human can verify. That logic would then be quite specific to a single task and not generally useful, at which point the benefit of the AI is almost nil.

And if there were an algorithm that was better at determining what was or was not the goal, why wouldn’t that algorithm be used in the first place?

@kromem@lemmy.world · 1 · 1Y

Shouldn’t a different algorithm that adds some sort of separate logic check be able to help tremendously?

You, in your “limited experience,” pretty much exactly described the fix.

The problem is that most current applications of LLMs are low-hanging fruit because the technology is so new.

And those low-hanging-fruit applications are generally averse to paying 2-10x the query cost and latency just to fix things like jailbreaking or hallucinations, which is what multiple passes, especially with additional context lookups, would require.
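The “multiple passes” approach can be sketched as a generate, check, retry loop. Everything here is a hypothetical stand-in: `fake_generate` plays the model, `no_bleach` plays the separate logic check (echoing the “are these recipes edible?” idea from the top comment), and the check is counted as a full pass on the assumption that it is itself a model call. Counting passes shows where the extra query cost comes from.

```python
from typing import Callable, Optional, Tuple


def generate_with_check(
    prompt: str,
    generate: Callable[[str], str],
    check: Callable[[str], bool],
    max_attempts: int = 3,
) -> Tuple[Optional[str], int]:
    """Generate an answer, validate it with a separate check, retry on failure.

    Returns the first answer that passes (or None) and the number of passes made,
    counting the check as a pass too (assumed here to be another model call).
    """
    passes = 0
    for _ in range(max_attempts):
        answer = generate(prompt)
        passes += 1
        ok = check(answer)
        passes += 1
        if ok:
            return answer, passes
    return None, passes


# Hypothetical stand-ins: a "model" that gets it wrong once, and a crude recipe check.
_canned = iter(["Add 2 cups of bleach.", "Add 2 cups of flour."])


def fake_generate(prompt: str) -> str:
    return next(_canned)


def no_bleach(text: str) -> bool:
    return "bleach" not in text.lower()


result, passes = generate_with_check("Cake recipe?", fake_generate, no_bleach)
print(result, passes)  # Add 2 cups of flour. 4
```

One failed attempt doubles the work compared with a first-try success (4 passes instead of 2), and a harder check or more retries pushes it further.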

But in the next 18 months you will very likely see multiple companies going after exactly these kinds of scenarios, with a focus on more business-critical LLM integrations.

To put it in perspective, this is like people looking at AIM messenger back in the day and saying that the New York Times has nothing to worry about regarding the growth of social media.

We’re still very much in the infancy of this technology’s real-world applications, and because of that, many of the shortcomings that aren’t fixable within the core product don’t have mature secondary markets built around fixing them yet.

So far, yours is actually the most informed comment I’ve seen in this thread. Well done!

@Zeshade@lemmy.world · 2 · 1Y

Thanks! And thanks for your insights. Yes, I meant that my experience using LLMs is limited to asking Bing Chat questions about everyday problems, as I would a friend who “knows everything.” I’ve never looked into the science of formulating the “perfect prompts” I sometimes hear about. I do have some experience in AI/ML development in general.

@doggle@lemmy.world · 3 · 1Y

They do keep context up to a point, but they can’t hold everything in memory; otherwise, the longer a conversation went on, the slower and more computationally expensive that logic check would become. Server hardware is not cheap, and AI models are already computationally intensive.
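A rough way to see why cost grows with conversation length: causal self-attention compares each token with every token before it, so the number of query-key scores grows quadratically. The sketch below counts them for one head processing a sequence from scratch; it ignores key/value caching, which makes generating each new token roughly linear in context length, though the cache itself still grows with the conversation. The function name is hypothetical.

```python
def attention_score_count(n_tokens: int) -> int:
    """Query-key scores one causal self-attention head computes when it processes
    n_tokens from scratch: token i is compared against tokens 0..i."""
    return n_tokens * (n_tokens + 1) // 2


for n in (1_000, 2_000, 4_000):
    print(n, attention_score_count(n))
```

Doubling the context from 1,000 to 2,000 tokens takes the count from 500,500 to 2,001,000, roughly four times as many, and that is per head, per layer.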


This is a most excellent place for technology news and articles.


Our Rules


  1. Follow the lemmy.world rules.
  2. Only tech related content.
  3. Be excellent to each other!
  4. Mod approved content bots can post up to 10 articles per day.
  5. Threads asking for personal tech support may be deleted.
  6. Politics threads may be removed.
  7. No memes allowed as posts, OK to post as comments.
  8. Only approved bots from the list below; to ask if your bot can be added, please contact us.
  9. Check for duplicates before posting; duplicates may be removed.

Approved Bots

