Study finds that ChatGPT will cheat when given the opportunity and lie to cover it up later.

@kromem@lemmy.world

I see a lot of comments that aren’t up to date with current research, claiming that because “an LLM doesn’t know the difference between true and false,” what it does can’t be described as ‘lying.’

Here’s a paper from October 2023 showing that LLMs can and do develop internal representations of whether a statement is true or false: The Geometry of Truth: Emergent Linear Structure in Large Language Model Representations of True/False Datasets

It’s just the latest in a series of studies over the past year finding that LLMs can and do develop abstracted world models in linear representations. For those curious and looking for a more digestible writeup, see Do Large Language Models learn world models or just surface statistics? from the researchers behind one of the first papers to find this.
