• NuXCOM_90Percent@lemmy.zip
    English
    arrow-up
    39
    arrow-down
    1
    ·
    7 days ago

    For unauthorized scrapers? Definitely

    For paid API usage? That tends not to be public, for obvious reasons, but people have allegedly run tests and found “deleted” content in the results.

    • mesa@piefed.social
      English
      arrow-up
      11
      ·
      7 days ago

      I’ve heard the same, but I haven’t seen real evidence anywhere, so I’m skeptical. But yes, I agree: if they CAN get that data, it means the training data is better-ish…

      But we are still on this site for a reason :)

      • NuXCOM_90Percent@lemmy.zip
        English
        arrow-up
        22
        arrow-down
        1
        ·
        7 days ago

        I mean… if the reason you left is that you didn’t want your data scraped, then… the fediverse is one of the worst places to go? Anyone can run a modified Lemmy instance and pull everything through the tools specifically designed to do that.

        Let alone just scraping websites that don’t have teams of big corporate lawyers.

        • Cybersteel@lemmy.world
          English
          arrow-up
          1
          ·
          5 days ago

          What kind of data would be valuable that people leave behind on Reddit? Personal information? Personally, I don’t really do that when it comes to commenting, though Reddit probably knows where I live through my unsecured connection. When it comes to comments or posts on Reddit, it’s not real data anyway, because I always lie online.

      • Alex@lemmy.ml
        English
        arrow-up
        3
        ·
        7 days ago

        It’s all relative, I guess. I can see why the original GPTs used the Reddit corpus for training. However, I’ve always been a little sceptical about the quality of any training set drawn from social media, given how much it exaggerates the extremes of people’s behaviour.

      • aramis87@fedia.io
        arrow-up
        6
        ·
        7 days ago

        I edited everything before deleting it and periodically double-checked that it was still deleted, but it all got restored sometime earlier this year.

        • bamboo@lemmy.blahaj.zone
          English
          arrow-up
          5
          ·
          7 days ago

          Just checked my old account again, and all edited content is still there, with “Fuck u/spez” appearing as the top comment in some posts that are like 12 years old

      • NuXCOM_90Percent@lemmy.zip
        English
        arrow-up
        7
        arrow-down
        1
        ·
        7 days ago

        Version control mother fucker, do you speak it?

        But, in all seriousness: That is what they use for the comments. It is why a lot of the mass delete tools were “accidentally” undone during The Exodus. Because it was literally just rolling back. Theoretically there might be a limited number of revisions available but if you believe that I have a bridge to sell you. Because imagine if A Brown Person wrote a message then edited it five times so that Chloe couldn’t alert Jack Bauer to who to torture