The talk presents the experience of converting a large codebase from Pandas to Polars, highlighting significant performance improvements, maintainability, and practical lessons learned throughout the process. Jeroen and I discuss the challenges faced, the benefits of using Polars, and the methodologies applied during the migration.

Key Points

  • Codebase Transition: Converted a 20,000-line codebase from Pandas to Polars, resulting in a 98% cost reduction and improved maintainability.
  • Performance Gains: Processing time reduced from 5 hours (Pandas) to 1 second (Polars) for large datasets.
  • Use of Lazy API: Leveraged Polars’ Lazy API for deferred computation, reducing memory usage and execution time.
  • Benchmarking Importance: Regular benchmarking identified performance improvements and validated optimizations.
  • Community Support: Engaged with the Polars community for insights and problem-solving during the transition.
  • Iterative Improvement Approach: Incremental migration, starting with low-hanging fruit, allowed gradual integration of Polars.
  • Final Results: Achieved 20% overall processing time reduction, handled 50 sample datasets with 40 GB RAM.
  • Book Announcement: Announced ‘The Definitive Guide to Polars’ to share project insights and help others transition.

This talk repeats the PyData NYC talk I gave together with Jeroen.

See also my related talk: PyData NYC 2024