A screenshot of this question was making the rounds last week, but this article covers testing it against all the well-known models out there.

Also includes outtakes on the ‘reasoning’ models.

  • mirshafie@europe.pub
    6 hours ago

    Yeah, I think it’s fascinating to read Claude’s transcripts while it’s working. It’s crazy how you can give it a two-sentence prompt that is really quite a complex task, and it splits the problem into chunks that it works through and second-guesses until it’s confident (and usually correct).