Migration to Rust-Only with Polars - Status

✅ Completed

  1. Removed Python/PyO3 dependencies
     • Removed pyproject.toml
     • Removed src/robin_sparkless/ Python package directory
     • Removed Python tests
     • Updated Cargo.toml to remove PyO3 and add Polars

  2. Replaced DataFusion with Polars
     • Updated all modules to use Polars instead of DataFusion
     • Removed arrow_conversion.rs (no longer needed)
     • Removed lazy.rs (using Polars LazyFrame directly)

  3. Created Rust-only API
     • lib.rs: Public Rust API exports
     • session.rs: SparkSession with Polars backend
     • dataframe.rs: DataFrame using Polars LazyFrame
     • column.rs: Column using Polars Expr
     • functions.rs: Helper functions using Polars
     • expression.rs: Expression utilities
     • schema.rs: Schema conversion for Polars

  4. Updated documentation
     • README.md: Reflects Rust-only project with Polars

🔧 Remaining Work

The migration itself is complete (Rust-only + Polars backend, build/test green). Optional features are implemented:

  • SQL (optional sql feature): SparkSession::sql(), temp views, in-memory saveAsTable/write_delta_table, catalog listTables/dropTable, read_delta(name_or_path); see QUICKSTART.md.
  • Delta Lake (optional delta feature): read_delta, read_delta_with_version, write_delta.
  • Benchmarks: cargo bench compares robin-sparkless against plain Polars; the target is to stay within ~2x of Polars.

Remaining work is parity and feature expansion:

  • Broader function coverage: Phase 6 implemented array_position, array_remove, and posexplode, and added the cume_dist, ntile, and nth_value API (fixtures are covered via a multi-step workaround). Phase 8 is complete: array_repeat, array_flatten, Map functions (create_map, map_keys, map_values, map_entries, map_from_arrays), and String 6.4 functions (soundex, levenshtein, crc32, xxhash64). JSON functions (get_json_object, from_json, to_json) are implemented. Additional edge-case parity fixtures.
  • Path to 100% (ROADMAP Phases 16–27): Phases 18–25 are complete (~283 functions, 159 fixtures, plan interpreter). Phase 26 (publish the Rust crate on crates.io) and Phase 27 (Sparkless integration, 200+ tests) remain. See ROADMAP.md and FULL_BACKEND_ROADMAP.md.

Sparkless integration: Robin-sparkless is designed to replace the backend of Sparkless. See SPARKLESS_INTEGRATION_ANALYSIS.md for phases: fixture converter, structural alignment, function parity, and test conversion.

Architecture

The new architecture:

  • SparkSession: Entry point, uses Polars for file I/O
  • DataFrame: Wraps Polars LazyFrame/DataFrame, provides PySpark-like API
  • Column: Wraps Polars Expr, provides column operations
  • Functions: Helper functions that return Polars expressions
  • Schema: Converts between Polars schemas and custom schema types
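
The wrapper layering described above can be illustrated with a small, dependency-free sketch. It is not the actual robin-sparkless code: `Expr` here is a toy stand-in for Polars' expression type, and the `gt`, `plus`, `col`, and `lit` names are assumptions chosen to resemble a PySpark-style surface. The point is only that a Column owns an expression and that helper functions build expressions without touching any data.

```rust
// Toy stand-in for an expression type such as Polars' Expr (hypothetical, not the real type).
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Col(String),
    Lit(i64),
    Add(Box<Expr>, Box<Expr>),
    Gt(Box<Expr>, Box<Expr>),
}

// Column is a thin wrapper that owns an expression and exposes chainable methods.
#[derive(Debug, Clone, PartialEq)]
struct Column {
    expr: Expr,
}

impl Column {
    // Builds `self + other` as a new expression node.
    fn plus(self, other: Column) -> Column {
        Column { expr: Expr::Add(Box::new(self.expr), Box::new(other.expr)) }
    }

    // Builds `self > other` as a new expression node.
    fn gt(self, other: Column) -> Column {
        Column { expr: Expr::Gt(Box::new(self.expr), Box::new(other.expr)) }
    }
}

// Helper functions in the spirit of functions.rs: they only construct expressions.
fn col(name: &str) -> Column {
    Column { expr: Expr::Col(name.to_string()) }
}

fn lit(value: i64) -> Column {
    Column { expr: Expr::Lit(value) }
}

fn main() {
    // Reads like PySpark's (col("age") + 1) > 18, but yields an expression tree.
    let predicate = col("age").plus(lit(1)).gt(lit(18));
    println!("{:?}", predicate.expr);
}
```

Keeping Column as a thin wrapper means the public API can follow PySpark naming while the underlying expression type stays an implementation detail.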

All operations are lazy by default (using Polars LazyFrame) and execute when actions like collect() or show() are called.
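
To make the transformation/action split concrete, here is a minimal toy model of lazy execution using only the standard library. The `LazyFrame`, `Step`, `filter_gt`, and `add` names below are hypothetical and deliberately simplified to a single integer column; they are not the Polars or robin-sparkless API. Transformations only append to a plan, and nothing is computed until `collect()` runs.

```rust
// Toy model of lazy execution over one integer column (illustrative only).
#[derive(Debug, Clone, PartialEq)]
enum Step {
    FilterGt(i64), // keep values strictly greater than the threshold
    Add(i64),      // add a constant to every value
}

struct LazyFrame {
    source: Vec<i64>,
    plan: Vec<Step>,
}

impl LazyFrame {
    fn new(source: Vec<i64>) -> Self {
        LazyFrame { source, plan: Vec::new() }
    }

    // Transformation: records a step and touches no data.
    fn filter_gt(mut self, min: i64) -> Self {
        self.plan.push(Step::FilterGt(min));
        self
    }

    // Transformation: records a step and touches no data.
    fn add(mut self, n: i64) -> Self {
        self.plan.push(Step::Add(n));
        self
    }

    // Action: runs the recorded plan, in order, against the source data.
    fn collect(&self) -> Vec<i64> {
        let mut rows = self.source.clone();
        for step in &self.plan {
            rows = match step {
                Step::FilterGt(min) => rows.into_iter().filter(|v| v > min).collect(),
                Step::Add(n) => rows.into_iter().map(|v| v + n).collect(),
            };
        }
        rows
    }
}

fn main() {
    // Two transformations are queued; no rows are processed yet.
    let lf = LazyFrame::new(vec![1, 5, 10]).filter_gt(2).add(100);
    assert_eq!(lf.plan.len(), 2);

    // The action triggers execution.
    println!("{:?}", lf.collect()); // prints [105, 110]
}
```

A real engine can also optimize the recorded plan before running it, which this sketch does not attempt.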

Historical Notes (archived)

Earlier versions of this doc tracked Polars API mismatches and compilation errors during the migration. Those items are no longer current; parity coverage is tracked in PARITY_STATUS.md and future work in ROADMAP.md.