Isolating All Polars Code into robin-sparkless-polars¶

This document describes how to put all Polars-using code into a single crate (robin-sparkless-polars) so that only one workspace crate depends on the polars library. The root crate becomes a thin facade with no direct Polars dependency.

Goals¶

Single Polars boundary: Only robin-sparkless-polars depends on the polars crate. Root and core do not.
Preserve public API: Existing robin_sparkless::* and robin_sparkless::prelude::* remain unchanged for downstream users.
Faster non-Polars iteration: Changes to core (config, schema types, error types, date_utils) do not trigger a Polars rebuild.

Current State (as implemented)¶

Crate / location	Polars usage
robin-sparkless-core	None
robin-sparkless-polars	Column, functions, UDFs, type_coercion, expression, DataFrame, Session, plan, schema_conv, traits, error (`From<PolarsError>`) — the only crate that depends on Polars
robin-sparkless (root)	Facade only; re-exports from core and robin-sparkless-polars. No direct Polars dependency.

Polars is used only in robin-sparkless-polars. The former robin-sparkless-expr crate was merged into robin-sparkless-polars.

Target Layout¶

robin-sparkless-core
Unchanged: schema types, config, error (no From<PolarsError>), date_utils. No Polars.
robin-sparkless-polars (new)
Only crate that depends on the polars library. It contains:
Everything currently in robin-sparkless-expr: column, expression, functions, udfs, udf_registry, udf_context, type_coercion.
Everything in the root that uses Polars: dataframe/, session, plan/, schema_conv, traits, root functions (e.g. broadcast), and the Polars-specific error impl.
Optional features: sql, delta (same as today, with same deps: spark-sql-parser, sqlparser, deltalake, tokio).
robin-sparkless (root)
Facade only:
Depends on robin-sparkless-core and robin-sparkless-polars (no direct polars dependency).
Re-exports public API from core and from robin-sparkless-polars so that robin_sparkless::* and robin_sparkless::prelude::* stay the same.
Keeps config (re-export of core), schema (re-export of core types + schema_from_json and StructTypePolarsExt from the polars crate), prelude, and feature flags for sql/delta that are passed through to robin-sparkless-polars.
robin-sparkless-expr
Removed as a separate crate; its code lives inside robin-sparkless-polars.

Dependency Graph (After)¶

robin-sparkless-core  (no Polars)
         │
         ▼
robin-sparkless-polars  (depends on polars; contains expr + dataframe + session + plan + schema_conv + traits + Polars error impl)
         │
         ▼
robin-sparkless  (facade; re-exports core + robin-sparkless-polars)

Implementation Notes¶

Error handling¶

Core keeps the single EngineError type and existing From impls (e.g. serde_json::Error, std::io::Error). It does not implement From<PolarsError>.
robin-sparkless-polars adds impl From<PolarsError> for robin_sparkless_core::EngineError (moving the current root error.rs logic into the polars crate). No duplicate enum.
Root re-exports EngineError from core; callers still get the same type, with Polars errors convertible via the impl in the polars crate.

Schema¶

Core: DataType, StructField, StructType (unchanged).
robin-sparkless-polars: schema_conv (Polars schema conversion), StructTypePolarsExt, and schema_from_json (can live in a small schema module that re-exports core types and adds these).
Root schema.rs: re-exports from core and from robin_sparkless_polars (StructTypePolarsExt, schema_from_json) so the public API is unchanged.

Types currently re-exported from root¶

Expr and LiteralValue: today pub type Expr = polars::prelude::Expr. These can become re-exports from robin_sparkless_polars (which re-exports polars::prelude::Expr / LiteralValue), so root still exposes robin_sparkless::Expr and robin_sparkless::LiteralValue without depending on Polars.

Cargo and features¶

robin-sparkless-polars:
Same polars and polars-plan versions/features as today.
Same extra deps as current root + expr (serde, chrono, regex, rand, etc.).
Features sql and delta mirror current root (spark-sql-parser/sqlparser, deltalake/tokio).
robin-sparkless (root):
[dependencies]: robin-sparkless-core, robin-sparkless-polars (with optional features for sql/delta). No direct polars dependency.
[features]: sql and delta forwarded to robin-sparkless-polars.

Tests and examples¶

Tests and examples that use DataFrame/Session/Column stay as they are; they depend on the root crate. The root crate’s re-exports ensure they still see the same API. No need to depend on robin-sparkless-polars directly unless we want to add crate-specific tests there.

Summary¶

Yes, all Polars code can be isolated into a single crate, robin-sparkless-polars, by:

Adding the new crate and moving into it everything that currently uses Polars (current expr crate contents + root’s dataframe, session, plan, schema_conv, traits, broadcast, and the From<PolarsError> impl).
Removing the robin-sparkless-expr crate (merged into robin-sparkless-polars).
Turning the root into a facade that depends only on core and robin-sparkless-polars and re-exports the existing public API.

That leaves only robin-sparkless-polars depending on the polars library and keeps the current public API and behavior intact.