I’m finding myself with a couple of really big databases and my PC is throwing memory errors, so I’m moving the project to Polars and learning as I go. I’d like to read about your experience: how you made the switch, what frustrated you, and what you found good. (I’m still getting used to the syntax, but I’m loving how fast it reads the databases.)
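
For reference, the pattern that’s been saving my memory so far is the lazy API: scan the file, describe the query, and only collect at the end so Polars can read just what it needs. This is just a sketch with made-up file and column names, not my real schema (and on older Polars versions the method is `groupby` rather than `group_by`):

```python
import polars as pl

# Lazy scan: nothing is loaded into memory yet.
# "transactions.csv", "amount" and "customer_id" are placeholder names.
lf = pl.scan_csv("transactions.csv")

result = (
    lf.filter(pl.col("amount") > 0)
      .group_by("customer_id")
      .agg(pl.col("amount").sum().alias("total_spent"))
      .collect()  # the file is only read and processed here
)
print(result.head())
```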

  • misk@sopuli.xyz · 4 points · 10 days ago

    I thought I’d be using Polars more, but in the end, professionally, when I have to process large amounts of data I won’t be doing that on my computer but on a Hadoop cluster via PySpark, which also has a very non-Pythonic syntax. For smaller stuff, Pandas is just more convenient.
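
    To give a flavour of what I mean by non-Pythonic, here’s roughly what the same “total per group” query from above looks like in PySpark. The file and column names are just placeholders:

    ```python
    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.getOrCreate()

    # Everything goes through Column expressions and runs on the JVM.
    df = spark.read.csv("transactions.csv", header=True, inferSchema=True)

    totals = (
        df.filter(F.col("amount") > 0)
          .groupBy("customer_id")
          .agg(F.sum("amount").alias("total_spent"))
    )
    totals.show()
    ```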

    • driving_crooner@lemmy.eco.br (OP) · 3 points · 10 days ago

      My company is moving to Databricks, which I know uses PySpark, but I’ve never used it. I guess eventually I’m going to have to learn it too.