• mesa@piefed.social
    9 days ago

    I’ve heard the same, but I haven’t seen real evidence anywhere, so I’m skeptical. But yes, I agree: if they CAN get that data, it means the training data is better-ish…

    But we are still on this site for a reason :)

    • NuXCOM_90Percent@lemmy.zip
      9 days ago

      I mean… if the reason you left is that you didn’t want your data scraped then… the fediverse is one of the worst places to go? Because anyone can run a modified Lemmy instance to pull everything through the tools specifically designed to do that.

      Let alone just scraping websites that don’t have teams of big corporate lawyers.

      • Cybersteel@lemmy.world
        8 days ago

        What kind of data would be valuable that people leave behind on Reddit? Personal information? Personally, I don’t really share that when commenting, though Reddit probably knows where I live through my unsecured connection. As for my comments or posts on Reddit, it’s not real data anyway because I always lie online.

    • Alex@lemmy.ml
      9 days ago

      It’s all relative, I guess. I can see why the original GPTs used the Reddit corpus for training. However, I’ve always been a little sceptical about the quality of any training set drawn from social media, given how much it exaggerates the extremes of people’s behaviour.