I’m finding myself with a couple of really big databases, and my PC is throwing memory errors, so I’m moving the project to Polars and learning along the way. I’d like to read about your experience: how you did it, what frustrated you, and what you found good. (I’m still getting used to the syntax, but I’m loving how fast it reads the databases.)

  • driving_crooner@lemmy.eco.br (OP)
    11 days ago

    I had to move away from apply a while ago because it was extremely slow, and started using masks and vectorized operations instead. That’s actually what’s blocking me right now, because I can’t find a way to make it work. (I used to do df.loc[mask, 'column'], but df.with_columns(pl.when(mask).then()…) is not working as expected.)

    • 8uurg@lemmy.world
      11 days ago

      It is unclear to me what you are trying to accomplish. Do you want to update only the elements where the mask is true?

    • driving_crooner@lemmy.eco.br (OP)
        11 days ago

        There’s this categorical column of integers that has some exceptional cases where letters are included. I need to format the column while leaving those exceptional cases untouched, but I just found out that the problem came from pandas importing it as a mixed type, while Polars imported it as a string, preserving the original correct formatting.