A screenshot of this question was making the rounds last week, but this article covers testing against all the well-known models out there.

It also includes outtakes on the ‘reasoning’ models.

  • Pup Biru@aussie.zone · 2 days ago

    yeah i find the thinking fascinating with maths too… LLMs are horrible at maths, but so am i if i have to do it in my head. the way it breaks a problem down into tiny bits that are certainly in its training data, and then combines those bits, is an impressive emergent behaviour imo given it’s just doing statistical next-token prediction

    • mirshafie@europe.pub · 2 days ago

      Your verbal faculties are bad at math. Other parts of your brain do calculations.

      LLMs are a computer’s verbal faculties. But guess what: the computer underneath is just a really big calculator. So when an LLM realizes it’s doing a math problem and launches a calculator/equation solver, it’s not so bad after all.
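The "launch a calculator" pattern mentioned above can be sketched roughly like this. This is a hypothetical harness, not any real API: the model is assumed to emit either plain text or a structured tool request, and the harness evaluates the math with an actual evaluator instead of letting the model predict digits.

```python
import ast
import operator

# Safe evaluator for basic arithmetic -- this plays the "calculator" tool.
OPS = {
    ast.Add: operator.add, ast.Sub: operator.sub,
    ast.Mult: operator.mul, ast.Div: operator.truediv,
    ast.Pow: operator.pow, ast.USub: operator.neg,
}

def calculator(expr: str) -> float:
    """Evaluate an arithmetic expression without using eval()."""
    def ev(node):
        if isinstance(node, ast.Constant):
            return node.value
        if isinstance(node, ast.BinOp):
            return OPS[type(node.op)](ev(node.left), ev(node.right))
        if isinstance(node, ast.UnaryOp):
            return OPS[type(node.op)](ev(node.operand))
        raise ValueError("unsupported expression")
    return ev(ast.parse(expr, mode="eval").body)

def answer(model_output: dict) -> str:
    # A tool-calling model returns either free text or a tool request;
    # the shape of this dict is an assumption for illustration.
    if model_output.get("tool") == "calculator":
        return str(calculator(model_output["arguments"]["expression"]))
    return model_output["text"]

print(answer({"tool": "calculator",
              "arguments": {"expression": "37 * 89 + 14"}}))  # 3307
```

The point of the pattern: the language model only has to recognize *that* it is doing arithmetic and format the expression; the exact answer comes from deterministic code.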

      • Pup Biru@aussie.zone · 2 days ago

        that solver would be tool use, though… i’m talking about just the “thinking” LLMs. it’s fascinating to read the thinking block, because it breaks the problem down into basic chunks, solves those chunks (which would have been in its training data, so they’re easy), often with multiple methods, and then compares the results to check itself
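The decompose-and-cross-check behaviour described above resembles a familiar mental-arithmetic trick, sketched here as plain code (the function names and the two methods are illustrative, not anything a model actually runs): solve the same multiplication two different ways and only accept the answer when the methods agree.

```python
def multiply_by_distribution(a: int, b: int) -> int:
    # Chunk b into tens and units: a*b = a*(tens*10) + a*units
    tens, units = divmod(b, 10)
    return a * tens * 10 + a * units

def multiply_by_round_number(a: int, b: int) -> int:
    # Round a up to the nearest ten, then subtract the overshoot:
    # a*b = (a + r)*b - r*b, where r rounds a to a multiple of 10.
    r = (10 - a % 10) % 10
    return (a + r) * b - r * b

def checked_multiply(a: int, b: int) -> int:
    """Solve with two independent methods and compare, as a self-check."""
    first = multiply_by_distribution(a, b)
    second = multiply_by_round_number(a, b)
    assert first == second, "methods disagree; re-derive"
    return first

print(checked_multiply(47, 86))  # 4042
```

Each sub-step (multiplying by a single digit or a round number) is the kind of "easy chunk" plentiful in training data; agreement between independent derivations is what gives the final answer its confidence.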

        • mirshafie@europe.pub · 14 hours ago

          Yeah, I think it’s fascinating to read Claude’s transcripts while it’s working. It’s crazy how you can give it a two-sentence prompt that is really quite a complex task, and it splits the problem into chunks that it works through and second-guesses until it’s confident (and usually correct).