Closed GitHub Issues – Test Coverage¶

This document maps closed GitHub issues to tests in this repo. Tests are run by cargo test --workspace, pytest tests/, and pytest tests/parity/ (see TESTING_GUIDE.md).

Summary¶

PySpark parity (feature) issues (#141–#157)
Closed as deferred / out of scope. No implementation to test; no fixture required. Some have stub coverage (e.g. JVM stubs, hash/xxhash64) via fixtures that assert our implementation’s behavior.
[Sparkless parity] test_* issues (#1–#140)
Closed when parity was achieved. Coverage is via parity fixtures in tests/fixtures/ and tests/fixtures/converted/. Fixture names often differ from the test name (e.g. test_string_upper → string_upper_lower.json, test_group_by → groupby_*.json).
#186 (lit date/datetime), #187 (Window API)
Python tests: test_robin_sparkless.py (test_lit_*, test_window_row_number_rank_over), test_lit_date_datetime_pyspark_parity.py, test_window_pyspark_parity.py, test_column_vs_column_pyspark_parity.py, test_issue_176_pyspark_parity.py. These parity tests use predetermined expected outputs (from a prior PySpark run); no PySpark at test runtime.

Working tests for previously skipped fixtures¶

The following fixtures were previously skipped (platform/algorithm difference). They are now un-skipped and expected values were set to our implementation’s output so we have a working test for the corresponding closed issues:

Fixture	Closed issue	Note
`string_xxhash64`	#116	Expected values updated to twox_hash XXH64 output (our implementation).
`with_hash`	#149	Expected values updated to our `hash()` implementation output.

PySpark parity (feature) issues (#141–#157)¶

#	Title	Test coverage
157	from_csv, to_csv, schema_of_csv, schema_of_json	Deferred. `read_csv` / `read_json` path-based I/O covered by `read_csv.json`, `read_json.json`.
156	DataFrame.pivot()	Deferred. No fixture.
155	join(how='left_semi' \| 'left_anti')	Deferred. `converted/semi_join.json`, `converted/anti_join.json` exist but skipped.
154	JVM/runtime stubs (broadcast, partition_id, input_file_name, etc.)	`with_jvm_stubs.json` exists; skipped (environment-specific expected values). Stubs are implemented and used.
153	JSON write append mode	Deferred. No fixture.
152	Delta Lake schema evolution and MERGE	Deferred. No fixture.
151	createDataFrame input types and schema inference	Covered by session/creation tests and parity fixtures that use various schemas.
150	array_distinct ordering (first-occurrence)	`array_distinct.json` runs; ordering semantics may differ from PySpark.
149	hash() algorithm (Murmur3 vs xxHash64)	`with_hash.json` – un-skipped; expected values set to our implementation.
148	sentences (NLP) and JVM/UDTF helpers	Deferred. No fixture.
147	Sketch-based approximate aggregate functions	Deferred. No fixture.
146	XML and XPath functions	Deferred. No fixture.
145	Structured Streaming	Deferred. No fixture.
144	Catalog and DataFrameWriterV2 (writeTo)	Deferred. No fixture.
143	UDF and UDTF support	Deferred. No fixture.
142	RDD and distributed execution APIs	Deferred. No fixture.
141	SQL — full DDL/DML and advanced SQL	Deferred. Plan fixtures and SQL tests cover supported subset.

[Sparkless parity] test_* issues (#1–#140)¶

Covered by pytest parity tests (tests/parity/), JSON fixtures in tests/fixtures/ and tests/fixtures/converted/, and plan fixtures in tests/fixtures/plans/. Many converted fixtures are still skipped. Root fixtures and tests/parity/ are the main source of running tests.

Representative mapping (issue test name → fixture(s)):

Joins: test_join_* → inner_join.json, left_join.json, right_join.json, outer_join.json, etc.
SQL / group / filter: test_group_by → groupby_*.json; test_basic_select → filter_age_gt_30.json and similar; test_filtered_select → filter + select fixtures.
String: test_string_upper / test_string_lower → string_upper_lower.json; test_string_length → string_length_trim.json; test_crc32 → string_crc32.json; test_xxhash64 → string_xxhash64.json (now running).
Math: test_math_* → math_sin_cos.json, math_sqrt_pow.json, math_cosh_cbrt.json, etc.
Conditionals: test_when_otherwise → when_otherwise.json, when_then_otherwise.json; test_coalesce → coalesce.json; test_nullif / test_isnull / test_ifnull → phase15_aliases_nvl_isnull.json, etc.
Arrays: test_array_* → array_*.json (array_distinct, array_union, array_contains, etc.); test_size → array_size.json; test_element_at → element_at.json.
Window: test_row_number, test_rank, test_dense_rank, etc. → row_number_window.json, rank_window.json, lag_lead_window.json, ntile_window.json, etc.
Hash: test_xxhash64 → string_xxhash64.json; hash() → with_hash.json (both now running).

To run parity tests:

make test-parity-phases
# or: pytest tests/parity/ -v

Plan fixtures (for execute_plan):

cargo test -p robin-sparkless-polars plan_parity --features sql

Verifying all tests pass¶

make check-full
pytest tests -n 12
pytest tests/parity/ -v

This runs Rust checks (all features), Python lint, the main pytest suite, and parity tests. No failures expected on main.