The talk describes our experience converting a large codebase from Pandas to Polars, covering the performance gains, the maintainability improvements, and the practical lessons learned along the way. Jeroen and I discuss the challenges we faced, the benefits of Polars, and the methods we applied during the migration.
Key Points
- Codebase Transition: Converted a 20,000-line codebase from Pandas to Polars, resulting in a 98% cost reduction and improved maintainability.
- Performance Gains: Processing time reduced from 5 hours (Pandas) to 1 second (Polars) for large datasets.
- Use of Lazy API: Leveraged Polars’ Lazy API for deferred computation, reducing memory usage and execution time.
- Benchmarking Importance: Regular benchmarking identified performance improvements and validated optimizations.
- Community Support: Engaged with the Polars community for insights and problem-solving during the transition.
- Iterative Improvement Approach: Incremental migration, starting with low-hanging fruit, allowed gradual integration of Polars.
- Final Results: Achieved a 20% reduction in overall processing time and handled 50 sample datasets with 40 GB of RAM.
- Book Announcement: Announced ‘The Definitive Guide to Polars’ to share project insights and help others transition.
This talk repeats the PyData NYC talk I gave together with Jeroen.
See also my related talk: PyData NYC 2024