ChatGPT can get worse over time, Stanford study finds | Fortune
fortune.com
external-link
ChatGPT went from answering a simple math correctly 98% of the time to just 2%, over the course of a few months.

Over just a few months, ChatGPT went from correctly answering a simple math problem 98% of the time to just 2%, study finds. Researchers found wild fluctuations—called drift—in the technology’s abi…::ChatGPT went from answering a simple math correctly 98% of the time to just 2%, over the course of a few months.

@blue_zephyr@lemmy.world
link
fedilink
English
15
edit-2
1Y

This paper is pretty unbelievable to me in the literal sense. From a quick glance:

First of all they couldn’t even bother to check for simple spelling mistakes. Second, all they’re doing is asking whether a number is prime or not and then extrapolating the results to be representative of solving math problems.

But most importantly I don’t believe for a second that the same model with a few adjustments over a 3 month period would completely flip performance on any representative task. I suspect there’s something seriously wrong with how they collect/evaluate the answers.

And finally, according to their own results, GPT3.5 did significantly better at the second evaluation. So this title is a blatant misrepresentation.

Also the study isn’t peer-reviewed.

Create a post

This is a most excellent place for technology news and articles.


Our Rules


  1. Follow the lemmy.world rules.
  2. Only tech related content.
  3. Be excellent to each another!
  4. Mod approved content bots can post up to 10 articles per day.
  5. Threads asking for personal tech support may be deleted.
  6. Politics threads may be removed.
  7. No memes allowed as posts, OK to post as comments.
  8. Only approved bots from the list below, to ask if your bot can be added please contact us.
  9. Check for duplicates before posting, duplicates may be removed

Approved Bots


  • 1 user online
  • 191 users / day
  • 586 users / week
  • 1.37K users / month
  • 4.49K users / 6 months
  • 1 subscriber
  • 7.41K Posts
  • 84.7K Comments
  • Modlog