Rust-first testing and parity
This project is now Rust-first. The core behavior of SparkSession,
DataFrame, Column, and related APIs is validated by Rust tests in
tests/:
- dataframe_core.rs – basic DataFrame creation, filter, select, column–column comparison semantics, etc.
- groupby_orderby_core.rs – group_by and order_by behavior using both string and Column arguments.
- sql_core.rs – SparkSession::sql, temp views, and basic DDL (e.g. DROP TABLE / DROP VIEW).
- delta_core.rs – write_delta / read_delta_from_path round‑trip tests when the delta feature is enabled.
- lazy_backend.rs, error_handling.rs, parity.rs, and other existing Rust tests.
PySpark parity fixtures
PySpark is still used as an external reference via JSON fixtures in
tests/fixtures/, but it is only needed when (re)generating those
fixtures – not for normal Rust test runs:
- tests/parity.rs loads fixtures from tests/fixtures/ (and, if present, tests/fixtures/converted/) and validates that robin-sparkless matches the recorded behavior.
- Python helper scripts such as tests/convert_sparkless_fixtures.py and tests/regenerate_expected_from_pyspark.py can be run manually – or via make sparkless-parity – to convert Sparkless expected_outputs and refresh fixtures from PySpark when behavior changes.
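The idea of checking output against recorded behavior can be illustrated with a small, self-contained sketch. It does not reproduce tests/parity.rs or the real fixture schema, neither of which is shown here: the Fixture struct, its fields, the check_parity and row helpers, and the sample data are all hypothetical. In particular, the ordered flag is an assumption about how a fixture might mark whether row order is part of the expected result.

```rust
// Hypothetical, simplified stand-in for a parity fixture. The real JSON
// schema under tests/fixtures/ is not shown in this document.
#[derive(Debug)]
struct Fixture {
    name: &'static str,
    expected_rows: Vec<Vec<String>>,
    // Assumed flag: whether row order is part of the recorded behavior
    // (for example, after an order_by).
    ordered: bool,
}

/// Compare actual rows against the fixture. When the fixture does not pin
/// down row order, both sides are sorted before comparison.
fn check_parity(fixture: &Fixture, mut actual: Vec<Vec<String>>) -> Result<(), String> {
    let mut expected = fixture.expected_rows.clone();
    if !fixture.ordered {
        expected.sort();
        actual.sort();
    }
    if expected == actual {
        Ok(())
    } else {
        Err(format!(
            "fixture `{}` mismatch:\n  expected: {:?}\n  actual:   {:?}",
            fixture.name, expected, actual
        ))
    }
}

/// Build one row of string cells from string slices.
fn row(cells: &[&str]) -> Vec<String> {
    cells.iter().map(|c| c.to_string()).collect()
}

fn main() {
    let fixture = Fixture {
        name: "filter_age_gt_30", // made-up fixture name
        expected_rows: vec![row(&["alice", "34"]), row(&["carol", "41"])],
        ordered: false,
    };

    // Same rows in a different order still pass, because `ordered` is false.
    let actual = vec![row(&["carol", "41"]), row(&["alice", "34"])];
    assert!(check_parity(&fixture, actual).is_ok());

    // A missing row is reported as a mismatch.
    let short = vec![row(&["alice", "34"])];
    assert!(check_parity(&fixture, short).is_err());
}
```

Returning a Result with both sides in the message, rather than asserting directly, keeps the comparison reusable across many fixtures and makes a failure readable without re-running PySpark.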
Python bindings and CI
The previous Python package (PyO3 bindings, pyproject.toml, and
Python‑focused CI) has been removed from this repository. Any future
language bindings are expected to live out‑of‑tree and call the Rust
crate via FFI.
The only remaining Python code in this repo is test tooling:
- Parity fixture generation and regeneration scripts under tests/.
New behavior tests should be written in Rust, ideally:
- As focused unit or integration tests against SparkSession/DataFrame/Column, and
- Backed by PySpark‑derived fixtures when subtle semantics need to be preserved.
Running tests
Common commands for day‑to‑day development:
- make test – run Rust tests (cargo test) including core, parity, and plan tests.
- make check-full – format check, Clippy, cargo audit, cargo deny, and Rust tests (what CI runs).
- make test-parity-phase-a … make test-parity-phase-g – run parity fixtures for a specific phase (see docs/PARITY_STATUS.md and tests/fixtures/phase_manifest.json).
- make test-parity-phases – run all parity phases (A–G).
- make sparkless-parity – when SPARKLESS_EXPECTED_OUTPUTS is set and PySpark/Java are available, convert Sparkless fixtures, regenerate expected results from PySpark, and run the Rust parity suite.
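The sparkless-parity target only does its extra work when SPARKLESS_EXPECTED_OUTPUTS is set. How the Makefile implements that check is not shown here. If Rust-side tooling ever needs the same gate, one possible shape is sketched below. The helper name outputs_path_from is hypothetical, and treating the variable's value as a filesystem path, with an empty value meaning "not configured", is an assumption.

```rust
use std::env;
use std::ffi::OsString;
use std::path::PathBuf;

/// Interpret the raw value of SPARKLESS_EXPECTED_OUTPUTS.
/// Unset and empty are both treated as "not configured".
/// Assumption: the value is a filesystem path.
fn outputs_path_from(raw: Option<OsString>) -> Option<PathBuf> {
    raw.filter(|v| !v.is_empty()).map(PathBuf::from)
}

fn main() {
    // Read the environment once at the edge, so the helper stays pure
    // and can be tested without mutating process-wide state.
    match outputs_path_from(env::var_os("SPARKLESS_EXPECTED_OUTPUTS")) {
        Some(path) => println!("Sparkless outputs configured at {}", path.display()),
        None => println!("SPARKLESS_EXPECTED_OUTPUTS not set; skipping Sparkless conversion"),
    }
}
```

Passing the raw value in as an argument, instead of reading the environment inside the helper, avoids calling env::set_var in tests, which is unsafe in multi-threaded test runs.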