Isolating All Polars Code into robin-sparkless-polars
This document describes how to put all Polars-using code into a single crate (robin-sparkless-polars) so that only one workspace crate depends on the polars library. The root crate becomes a thin facade with no direct Polars dependency.
Goals
- Single Polars boundary: Only robin-sparkless-polars depends on the polars crate. Root and core do not.
- Preserve public API: Existing robin_sparkless::* and robin_sparkless::prelude::* remain unchanged for downstream users.
- Faster non-Polars iteration: Changes to core (config, schema types, error types, date_utils) do not trigger a Polars rebuild.
Current State (as implemented)
| Crate / location | Polars usage |
|---|---|
| robin-sparkless-core | None |
| robin-sparkless-polars | Column, functions, UDFs, type_coercion, expression, DataFrame, Session, plan, schema_conv, traits, error (From<PolarsError>) — the only crate that depends on Polars |
| robin-sparkless (root) | Facade only; re-exports from core and robin-sparkless-polars. No direct Polars dependency. |
Polars is used only in robin-sparkless-polars. The former robin-sparkless-expr crate was merged into robin-sparkless-polars.
Target Layout
- robin-sparkless-core
  Unchanged: schema types, config, error (no From<PolarsError>), date_utils. No Polars.
- robin-sparkless-polars (new)
  Only crate that depends on the polars library. It contains:
  - Everything currently in robin-sparkless-expr: column, expression, functions, udfs, udf_registry, udf_context, type_coercion.
  - Everything in the root that uses Polars: dataframe/, session, plan/, schema_conv, traits, root functions (e.g. broadcast), and the Polars-specific error impl.
  - Optional features: sql, delta (same as today, with same deps: spark-sql-parser, sqlparser, deltalake, tokio).
- robin-sparkless (root)
  Facade only:
  - Depends on robin-sparkless-core and robin-sparkless-polars (no direct polars dependency).
  - Re-exports public API from core and from robin-sparkless-polars so that robin_sparkless::* and robin_sparkless::prelude::* stay the same.
  - Keeps config (re-export of core), schema (re-export of core types + schema_from_json and StructTypePolarsExt from the polars crate), prelude, and feature flags for sql/delta that are passed through to robin-sparkless-polars.
- robin-sparkless-expr
  Removed as a separate crate; its code lives inside robin-sparkless-polars.
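The three-layer layout can be illustrated with a minimal single-file sketch in which modules stand in for crates. The module names mirror the crate names, but the types inside them (SparklessConfig, the simplified DataFrame and Session, and create_dataframe) are hypothetical placeholders, not the project's real definitions. The point is only that downstream code imports from the facade's prelude and never names the engine crate.

```rust
// Stand-in for robin-sparkless-core: plain types, no engine dependency.
pub mod sparkless_core {
    /// Hypothetical config type, used only for this illustration.
    #[derive(Debug, Clone, PartialEq)]
    pub struct SparklessConfig {
        pub case_sensitive: bool,
    }
}

// Stand-in for robin-sparkless-polars: the only layer that would import polars.
pub mod sparkless_polars {
    use super::sparkless_core::SparklessConfig;

    /// Hypothetical placeholder for the engine-backed DataFrame.
    #[derive(Debug)]
    pub struct DataFrame {
        pub columns: Vec<String>,
    }

    /// Hypothetical placeholder for the session type.
    pub struct Session {
        config: SparklessConfig,
    }

    impl Session {
        pub fn new(config: SparklessConfig) -> Self {
            Session { config }
        }

        /// Builds a frame, lower-casing column names unless case-sensitive.
        pub fn create_dataframe(&self, columns: &[&str]) -> DataFrame {
            let columns = columns
                .iter()
                .map(|c| {
                    if self.config.case_sensitive {
                        c.to_string()
                    } else {
                        c.to_lowercase()
                    }
                })
                .collect();
            DataFrame { columns }
        }
    }
}

// Stand-in for the root facade: re-exports only, no logic of its own.
pub mod robin_sparkless {
    pub use super::sparkless_core::SparklessConfig;
    pub use super::sparkless_polars::{DataFrame, Session};

    pub mod prelude {
        pub use super::{DataFrame, Session, SparklessConfig};
    }
}

/// Downstream code: imports from the facade only.
pub fn downstream_usage() -> Vec<String> {
    use robin_sparkless::prelude::*;
    let session = Session::new(SparklessConfig { case_sensitive: false });
    session.create_dataframe(&["ID", "Name"]).columns
}
```

In a real workspace each module would be its own crate, and the facade's re-exports would be `pub use` lines pointing at the dependency crates rather than at sibling modules.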
Dependency Graph (After)
robin-sparkless-core (no Polars)
│
▼
robin-sparkless-polars (depends on polars; contains expr + dataframe + session + plan + schema_conv + traits + Polars error impl)
│
▼
robin-sparkless (facade; re-exports core + robin-sparkless-polars)
Implementation Notes
Error handling
- Core keeps the single EngineError type and existing From impls (e.g. serde_json::Error, std::io::Error). It does not implement From<PolarsError>.
- robin-sparkless-polars adds impl From<PolarsError> for robin_sparkless_core::EngineError (moving the current root error.rs logic into the polars crate). No duplicate enum.
- Root re-exports EngineError from core; callers still get the same type, with Polars errors convertible via the impl in the polars crate.
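The error split described above can be sketched as follows, again with modules standing in for crates. The EngineError variants, the PolarsError stand-in, and the select helper are all hypothetical; the real enums are not shown in this document. One caveat the sketch does not model: inside a single crate this impl compiles freely, whereas across real crate boundaries Rust's orphan rule restricts where an impl of a foreign trait for a foreign type may live, so how the actual crates satisfy coherence is an open detail here.

```rust
// Stand-in for the external polars crate (hypothetical, simplified error type).
pub mod polars_stub {
    #[derive(Debug)]
    pub enum PolarsError {
        ColumnNotFound(String),
        ComputeError(String),
    }
}

// Stand-in for robin-sparkless-core: owns the error enum, knows nothing of Polars.
pub mod sparkless_core {
    /// Hypothetical variants; the real EngineError is not shown in this document.
    #[derive(Debug, PartialEq)]
    pub enum EngineError {
        NotFound(String),
        Io(String),
        Internal(String),
    }

    // Engine-agnostic conversions stay in core.
    impl From<std::io::Error> for EngineError {
        fn from(e: std::io::Error) -> Self {
            EngineError::Io(e.to_string())
        }
    }
}

// Stand-in for robin-sparkless-polars: adds the engine-specific conversion.
pub mod sparkless_polars {
    use super::polars_stub::PolarsError;
    use super::sparkless_core::EngineError;

    impl From<PolarsError> for EngineError {
        fn from(e: PolarsError) -> Self {
            match e {
                PolarsError::ColumnNotFound(name) => EngineError::NotFound(name),
                PolarsError::ComputeError(msg) => EngineError::Internal(msg),
            }
        }
    }

    // Pretend engine call that fails with the engine's own error type.
    fn engine_select(column: &str) -> Result<usize, PolarsError> {
        if column == "id" {
            Ok(0)
        } else {
            Err(PolarsError::ColumnNotFound(column.to_string()))
        }
    }

    /// Public API returns EngineError; `?` performs the conversion.
    pub fn select(column: &str) -> Result<usize, EngineError> {
        Ok(engine_select(column)?)
    }
}
```

Callers only ever see EngineError, so the facade can re-export that one type from core without exposing the engine's error type.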
Schema
- Core: DataType, StructField, StructType (unchanged).
- robin-sparkless-polars: schema_conv (Polars schema conversion), StructTypePolarsExt, and schema_from_json (can live in a small schema module that re-exports core types and adds these).
- Root schema.rs: re-exports from core and from robin_sparkless_polars (StructTypePolarsExt, schema_from_json) so the public API is unchanged.
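The name StructTypePolarsExt suggests the extension-trait pattern: the engine crate defines a trait and implements it for a type owned by core, so core stays free of engine types. A minimal sketch of that pattern is below. The method name to_polars_schema, the DataType variants, and the Schema and EngineType stand-ins are assumptions for illustration, not the project's real signatures.

```rust
// Stand-in for the external polars crate (hypothetical, simplified schema types).
pub mod polars_stub {
    #[derive(Debug, Clone, PartialEq)]
    pub enum EngineType {
        Int64,
        Boolean,
    }

    #[derive(Debug, PartialEq)]
    pub struct Schema(pub Vec<(String, EngineType)>);
}

// Stand-in for robin-sparkless-core: engine-agnostic schema types.
pub mod sparkless_core {
    /// Hypothetical variants; the real DataType is not shown in this document.
    #[derive(Debug, Clone, PartialEq)]
    pub enum DataType {
        Long,
        Boolean,
    }

    #[derive(Debug, Clone, PartialEq)]
    pub struct StructField {
        pub name: String,
        pub data_type: DataType,
    }

    #[derive(Debug, Clone, PartialEq)]
    pub struct StructType {
        pub fields: Vec<StructField>,
    }
}

// Stand-in for robin-sparkless-polars: owns the trait, so it may implement it
// for the core type.
pub mod sparkless_polars {
    use super::polars_stub::{EngineType, Schema};
    use super::sparkless_core::{DataType, StructType};

    pub trait StructTypePolarsExt {
        /// Hypothetical method name.
        fn to_polars_schema(&self) -> Schema;
    }

    impl StructTypePolarsExt for StructType {
        fn to_polars_schema(&self) -> Schema {
            let fields = self
                .fields
                .iter()
                .map(|f| {
                    let ty = match f.data_type {
                        DataType::Long => EngineType::Int64,
                        DataType::Boolean => EngineType::Boolean,
                    };
                    (f.name.clone(), ty)
                })
                .collect();
            Schema(fields)
        }
    }
}

/// Builds a core schema and converts it through the extension trait.
pub fn example_schema() -> polars_stub::Schema {
    use sparkless_core::{DataType, StructField, StructType};
    use sparkless_polars::StructTypePolarsExt; // trait must be in scope

    let schema = StructType {
        fields: vec![
            StructField { name: "id".to_string(), data_type: DataType::Long },
            StructField { name: "active".to_string(), data_type: DataType::Boolean },
        ],
    };
    schema.to_polars_schema()
}
```

Because the trait is local to the engine-side crate, this arrangement also works across real crate boundaries; callers just need the trait in scope, which is why the facade re-exports it.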
Types currently re-exported from root
Expr and LiteralValue: today pub type Expr = polars::prelude::Expr. These can become re-exports from robin_sparkless_polars (which re-exports polars::prelude::Expr / LiteralValue), so root still exposes robin_sparkless::Expr and robin_sparkless::LiteralValue without depending on Polars.
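A chained re-export keeps the type identical at every path, which is what lets the facade expose Expr without depending on the engine directly. The sketch below uses a simplified stand-in for polars::prelude (the real Expr and LiteralValue have many more variants), and the col helper is a hypothetical example function.

```rust
// Stand-in for the external polars crate's prelude (simplified, hypothetical).
pub mod polars_stub {
    pub mod prelude {
        #[derive(Debug, Clone, PartialEq)]
        pub enum LiteralValue {
            Int(i64),
            Str(String),
        }

        #[derive(Debug, Clone, PartialEq)]
        pub enum Expr {
            Column(String),
            Literal(LiteralValue),
        }
    }
}

// Stand-in for robin-sparkless-polars: re-exports the engine types.
pub mod sparkless_polars {
    pub use super::polars_stub::prelude::{Expr, LiteralValue};

    /// Hypothetical helper that builds a column expression.
    pub fn col(name: &str) -> Expr {
        Expr::Column(name.to_string())
    }
}

// Stand-in for the root facade: re-exports from the polars-side crate only.
pub mod robin_sparkless {
    pub use super::sparkless_polars::{col, Expr, LiteralValue};
}

/// The assignment below type-checks only because both paths name one type.
pub fn same_type_through_every_path() -> bool {
    let via_facade: robin_sparkless::Expr = robin_sparkless::col("id");
    let via_engine: polars_stub::prelude::Expr = via_facade.clone();
    via_engine == robin_sparkless::Expr::Column("id".to_string())
}
```

Downstream code that already names robin_sparkless::Expr keeps compiling, since a re-export introduces a new path to the same type rather than a new type.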
Cargo and features
- robin-sparkless-polars:
  - Same polars and polars-plan versions/features as today.
  - Same extra deps as current root + expr (serde, chrono, regex, rand, etc.).
  - Features sql and delta mirror current root (spark-sql-parser/sqlparser, deltalake/tokio).
- robin-sparkless (root):
  - [dependencies]: robin-sparkless-core, robin-sparkless-polars (with optional features for sql/delta). No direct polars dependency.
  - [features]: sql and delta forwarded to robin-sparkless-polars.
Tests and examples
- Tests and examples that use DataFrame/Session/Column stay as they are; they depend on the root crate. The root crate's re-exports ensure they still see the same API. No need to depend on robin-sparkless-polars directly unless we want to add crate-specific tests there.
Summary
Yes, all Polars code can be isolated into a single crate, robin-sparkless-polars, by:
- Adding the new crate and moving into it everything that currently uses Polars (current expr crate contents + root's dataframe, session, plan, schema_conv, traits, broadcast, and the From<PolarsError> impl).
- Removing the robin-sparkless-expr crate (merged into robin-sparkless-polars).
- Turning the root into a facade that depends only on core and robin-sparkless-polars and re-exports the existing public API.
That leaves only robin-sparkless-polars depending on the polars library and keeps the current public API and behavior intact.