Migration to Rust-Only with Polars - Status
✅ Completed
- Removed Python/PyO3 dependencies
  - Removed pyproject.toml
  - Removed src/robin_sparkless/ Python package directory
  - Removed Python tests
  - Updated Cargo.toml to remove PyO3 and add Polars
- Replaced DataFusion with Polars
  - Updated all modules to use Polars instead of DataFusion
  - Removed arrow_conversion.rs (no longer needed)
  - Removed lazy.rs (using Polars LazyFrame directly)
- Created Rust-only API
  - lib.rs: Public Rust API exports
  - session.rs: SparkSession with Polars backend
  - dataframe.rs: DataFrame using Polars LazyFrame
  - column.rs: Column using Polars Expr
  - functions.rs: Helper functions using Polars
  - expression.rs: Expression utilities
  - schema.rs: Schema conversion for Polars
- Updated documentation
  - README.md: Reflects Rust-only project with Polars
🚧 Remaining Work
The migration itself is complete (Rust-only + Polars backend, build/test green). Optional features are implemented:
- SQL (optional sql feature): SparkSession::sql(), temp views, in-memory saveAsTable/write_delta_table, catalog listTables/dropTable, read_delta(name_or_path); see QUICKSTART.md.
- Delta Lake (optional delta feature): read_delta, read_delta_with_version, write_delta.
- Benchmarks: cargo bench (robin vs Polars); target within ~2x.
Remaining work is parity and feature expansion:
- Broader function coverage: Phase 6 array_position, array_remove, posexplode implemented; cume_dist, ntile, nth_value API (fixtures covered via multi-step workaround). Phase 8 completed: array_repeat, array_flatten, Map (create_map, map_keys, map_values, map_entries, map_from_arrays), String 6.4 (soundex, levenshtein, crc32, xxhash64). JSON (get_json_object, from_json, to_json) implemented. Additional edge-case parity fixtures.
- Path to 100% (ROADMAP Phases 16–27): Phases 18–25 completed (~283 functions, 159 fixtures, plan interpreter). Remaining: Phase 26 (publish Rust crate on crates.io) and Phase 27 (Sparkless integration, 200+ tests). See ROADMAP.md and FULL_BACKEND_ROADMAP.md.
Sparkless integration: Robin-sparkless is designed to replace the backend of Sparkless. See SPARKLESS_INTEGRATION_ANALYSIS.md for phases: fixture converter, structural alignment, function parity, and test conversion.
Architecture
The new architecture:
- SparkSession: Entry point, uses Polars for file I/O
- DataFrame: Wraps Polars LazyFrame/DataFrame, provides PySpark-like API
- Column: Wraps Polars Expr, provides column operations
- Functions: Helper functions that return Polars expressions
- Schema: Converts between Polars schemas and custom schema types
All operations are lazy by default (using Polars LazyFrame) and execute when actions like collect() or show() are called.
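The wrapper-plus-lazy-execution idea can be illustrated with a small, std-only toy model. This is only a sketch under stated assumptions: the `Expr`, `Step`, `Column`, and `DataFrame` types below are hypothetical stand-ins written for this example, not the Polars types and not the actual robin-sparkless API. It shows a `Column` wrapping an expression tree, a `DataFrame` that merely records steps, and a `collect()` action that finally runs them.

```rust
// Toy model only: std-only stand-ins for the roles that Polars `Expr` and
// `LazyFrame` play. None of this is the real Polars or robin-sparkless API.

/// Stand-in for an expression tree.
#[derive(Clone, Debug)]
enum Expr {
    Col(usize), // reference a column by position
    Lit(i64),   // literal value
    Gt(Box<Expr>, Box<Expr>),
    Add(Box<Expr>, Box<Expr>),
}

impl Expr {
    /// Evaluate the expression against a single row.
    fn eval(&self, row: &[i64]) -> i64 {
        match self {
            Expr::Col(i) => row[*i],
            Expr::Lit(v) => *v,
            Expr::Gt(a, b) => (a.eval(row) > b.eval(row)) as i64,
            Expr::Add(a, b) => a.eval(row) + b.eval(row),
        }
    }
}

/// Thin wrapper around an expression, mirroring the `Column` role.
#[derive(Clone, Debug)]
struct Column(Expr);

fn col(i: usize) -> Column {
    Column(Expr::Col(i))
}

fn lit(v: i64) -> Column {
    Column(Expr::Lit(v))
}

impl Column {
    fn gt(self, other: Column) -> Column {
        Column(Expr::Gt(Box::new(self.0), Box::new(other.0)))
    }
    fn plus(self, other: Column) -> Column {
        Column(Expr::Add(Box::new(self.0), Box::new(other.0)))
    }
}

/// One recorded step of the logical plan.
#[derive(Clone, Debug)]
enum Step {
    Filter(Expr),
    WithColumn(Expr),
}

/// Records transformations; nothing is evaluated until `collect`.
#[derive(Clone, Debug)]
struct DataFrame {
    source: Vec<Vec<i64>>,
    plan: Vec<Step>,
}

impl DataFrame {
    fn new(source: Vec<Vec<i64>>) -> Self {
        DataFrame { source, plan: Vec::new() }
    }
    /// Transformation: only appends to the plan.
    fn filter(mut self, c: Column) -> Self {
        self.plan.push(Step::Filter(c.0));
        self
    }
    /// Transformation: only appends to the plan.
    fn with_column(mut self, c: Column) -> Self {
        self.plan.push(Step::WithColumn(c.0));
        self
    }
    fn pending_steps(&self) -> usize {
        self.plan.len()
    }
    /// Action: runs every recorded step, in order, over the source rows.
    fn collect(&self) -> Vec<Vec<i64>> {
        let mut rows = self.source.clone();
        for step in &self.plan {
            match step {
                Step::Filter(e) => rows.retain(|r| e.eval(r.as_slice()) != 0),
                Step::WithColumn(e) => {
                    for r in rows.iter_mut() {
                        let v = e.eval(r.as_slice());
                        r.push(v);
                    }
                }
            }
        }
        rows
    }
}

fn main() {
    let df = DataFrame::new(vec![vec![1, 10], vec![2, 20], vec![3, 30]])
        .filter(col(0).gt(lit(1)))
        .with_column(col(0).plus(col(1)));
    // Only the plan exists at this point; no row has been touched.
    println!("pending steps: {}", df.pending_steps());
    // The action triggers execution of the recorded plan.
    println!("{:?}", df.collect());
}
```

In the real crate the plan, optimization, and execution are delegated to Polars; the toy version only makes the split between recording transformations and running an action visible.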
Historical Notes (archived)
Earlier versions of this doc tracked Polars API mismatches and compilation errors during the migration. Those items are no longer current; parity coverage is tracked in PARITY_STATUS.md and future work in ROADMAP.md.