Closed GitHub Issues – Test Coverage

This document maps closed GitHub issues to tests in this repo. Tests are run by cargo test --workspace, pytest tests/, and pytest tests/parity/ (see TESTING_GUIDE.md).

Summary

  • PySpark parity (feature) issues (#141–#157)
    Closed as deferred / out of scope. Most have no implementation to test and need no fixture. A few have stub coverage (e.g. JVM stubs, hash/xxhash64) via fixtures that assert our implementation’s behavior.

  • [Sparkless parity] test_* issues (#1–#140)
    Closed when parity was achieved. Coverage is via parity fixtures in tests/fixtures/ and tests/fixtures/converted/. Fixture names often differ from the test name (e.g. test_string_upper → string_upper_lower.json, test_group_by → groupby_*.json).

  • #186 (lit date/datetime), #187 (Window API)
    Python tests: test_robin_sparkless.py (test_lit_*, test_window_row_number_rank_over), test_lit_date_datetime_pyspark_parity.py, test_window_pyspark_parity.py, test_column_vs_column_pyspark_parity.py, test_issue_176_pyspark_parity.py. These parity tests compare against predetermined expected outputs recorded from a prior PySpark run, so PySpark is not needed at test runtime.
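
A minimal sketch of the "predetermined expected output" pattern described above. The data, the expected rows, and row_number_by_dept are illustrative assumptions; row_number_by_dept is a pure-Python stand-in for the engine call under test, not the robin-sparkless API.

```python
import json

# Expected rows as they would be recorded once from a PySpark run
# (illustrative values, not taken from this repo).
EXPECTED = [
    {"dept": "a", "salary": 10, "row_number": 1},
    {"dept": "a", "salary": 20, "row_number": 2},
    {"dept": "b", "salary": 5, "row_number": 1},
]


def canonical(rows):
    """Order-insensitive form of a list of row dicts, for comparison."""
    return sorted(json.dumps(row, sort_keys=True, default=str) for row in rows)


def row_number_by_dept(rows):
    """Hypothetical stand-in: row_number() over (partition by dept order by salary)."""
    counters = {}
    out = []
    for row in sorted(rows, key=lambda r: (r["dept"], r["salary"])):
        counters[row["dept"]] = counters.get(row["dept"], 0) + 1
        out.append({**row, "row_number": counters[row["dept"]]})
    return out


def test_row_number_matches_recorded_output():
    data = [
        {"dept": "b", "salary": 5},
        {"dept": "a", "salary": 20},
        {"dept": "a", "salary": 10},
    ]
    # No PySpark import: the comparison is against the recorded rows only.
    assert canonical(row_number_by_dept(data)) == canonical(EXPECTED)
```

Comparing canonicalized rows keeps the test independent of row order, which is one possible design choice; the real tests may compare differently.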

Working tests for previously skipped fixtures

The following fixtures were previously skipped because of platform or algorithm differences. They are now un-skipped, with expected values set to our implementation’s output, so each corresponding closed issue has a working test. These tests guard our own behavior against regressions; they do not assert PySpark parity.

| Fixture | Closed issue | Note |
| --- | --- | --- |
| string_xxhash64 | #116 | Expected values updated to twox_hash XXH64 output (our implementation). |
| with_hash | #149 | Expected values updated to our hash() implementation output. |
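
A rough sketch of what un-skipping a fixture and pinning its expected values could look like. The fixture keys used here ("skip", "skip_reason", "expected") are assumptions for illustration; the actual fixture schema in tests/fixtures/ may differ.

```python
import json
from pathlib import Path


def load_fixture(path):
    """Return (fixture, skip_reason); skip_reason is None when the fixture runs.

    The "skip" / "skip_reason" keys are assumed, not confirmed by the repo.
    """
    fixture = json.loads(Path(path).read_text(encoding="utf-8"))
    if fixture.get("skip"):
        return fixture, fixture.get("skip_reason", "skipped")
    return fixture, None


def pin_expected(path, actual_rows):
    """Un-skip a fixture and store the implementation's own output as expected."""
    fixture = json.loads(Path(path).read_text(encoding="utf-8"))
    fixture["expected"] = actual_rows
    fixture.pop("skip", None)
    fixture.pop("skip_reason", None)
    Path(path).write_text(json.dumps(fixture, indent=2) + "\n", encoding="utf-8")
```

After pin_expected runs, load_fixture returns no skip reason, and the fixture passes only while the implementation keeps producing the same values.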

PySpark parity (feature) issues (#141–#157)

| # | Title | Test coverage |
| --- | --- | --- |
| 157 | from_csv, to_csv, schema_of_csv, schema_of_json | Deferred. read_csv / read_json path-based I/O covered by read_csv.json, read_json.json. |
| 156 | DataFrame.pivot() | Deferred. No fixture. |
| 155 | join(how='left_semi' \| 'left_anti') | Deferred. converted/semi_join.json, converted/anti_join.json exist but skipped. |
| 154 | JVM/runtime stubs (broadcast, partition_id, input_file_name, etc.) | with_jvm_stubs.json exists; skipped (environment-specific expected values). Stubs are implemented and used. |
| 153 | JSON write append mode | Deferred. No fixture. |
| 152 | Delta Lake schema evolution and MERGE | Deferred. No fixture. |
| 151 | createDataFrame input types and schema inference | Covered by session/creation tests and parity fixtures that use various schemas. |
| 150 | array_distinct ordering (first-occurrence) | array_distinct.json runs; ordering semantics may differ from PySpark. |
| 149 | hash() algorithm (Murmur3 vs xxHash64) | with_hash.json – un-skipped; expected values set to our implementation. |
| 148 | sentences (NLP) and JVM/UDTF helpers | Deferred. No fixture. |
| 147 | Sketch-based approximate aggregate functions | Deferred. No fixture. |
| 146 | XML and XPath functions | Deferred. No fixture. |
| 145 | Structured Streaming | Deferred. No fixture. |
| 144 | Catalog and DataFrameWriterV2 (writeTo) | Deferred. No fixture. |
| 143 | UDF and UDTF support | Deferred. No fixture. |
| 142 | RDD and distributed execution APIs | Deferred. No fixture. |
| 141 | SQL — full DDL/DML and advanced SQL | Deferred. Plan fixtures and SQL tests cover supported subset. |

[Sparkless parity] test_* issues (#1–#140)

Covered by pytest parity tests (tests/parity/), JSON fixtures in tests/fixtures/ and tests/fixtures/converted/, and plan fixtures in tests/fixtures/plans/. Many converted fixtures are still skipped, so the fixtures directly under tests/fixtures/ and the tests in tests/parity/ account for most of the tests that actually run.

Representative mapping (issue test name → fixture(s)):

  • Joins: test_join_* → inner_join.json, left_join.json, right_join.json, outer_join.json, etc.
  • SQL / group / filter: test_group_by → groupby_*.json; test_basic_select → filter_age_gt_30.json and similar; test_filtered_select → filter + select fixtures.
  • String: test_string_upper / test_string_lower → string_upper_lower.json; test_string_length → string_length_trim.json; test_crc32 → string_crc32.json; test_xxhash64 → string_xxhash64.json (now running).
  • Math: test_math_* → math_sin_cos.json, math_sqrt_pow.json, math_cosh_cbrt.json, etc.
  • Conditionals: test_when_otherwise → when_otherwise.json, when_then_otherwise.json; test_coalesce → coalesce.json; test_nullif / test_isnull / test_ifnull → phase15_aliases_nvl_isnull.json, etc.
  • Arrays: test_array_* → array_*.json (array_distinct, array_union, array_contains, etc.); test_size → array_size.json; test_element_at → element_at.json.
  • Window: test_row_number, test_rank, test_dense_rank, etc. → row_number_window.json, rank_window.json, lag_lead_window.json, ntile_window.json, etc.
  • Hash: test_xxhash64 → string_xxhash64.json; hash() → with_hash.json (both now running).
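
The mapping above can be sketched as a small lookup that resolves an issue's test name to the fixture files covering it. This is an illustrative helper, not something the repo is known to ship; ISSUE_TO_FIXTURES holds only a subset of the entries listed, and the file names in the usage note are hypothetical.

```python
from fnmatch import fnmatch

# Subset of the representative mapping: test-name pattern -> fixture patterns.
ISSUE_TO_FIXTURES = {
    "test_join_*": ["inner_join.json", "left_join.json", "right_join.json", "outer_join.json"],
    "test_group_by": ["groupby_*.json"],
    "test_string_upper": ["string_upper_lower.json"],
    "test_string_lower": ["string_upper_lower.json"],
    "test_xxhash64": ["string_xxhash64.json"],
}


def fixtures_for(test_name, available):
    """Return the fixture file names in `available` that cover `test_name`."""
    matched = []
    for name_pattern, fixture_patterns in ISSUE_TO_FIXTURES.items():
        if not fnmatch(test_name, name_pattern):
            continue
        for fixture in available:
            covered = any(fnmatch(fixture, pattern) for pattern in fixture_patterns)
            if covered and fixture not in matched:
                matched.append(fixture)
    return matched
```

For example, fixtures_for("test_group_by", ["groupby_sum.json", "inner_join.json"]) returns only the groupby file, since the fixture name differs from the test name but matches its glob.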

To run parity tests:

make test-parity-phases
# or: pytest tests/parity/ -v

Plan fixtures (for execute_plan):

cargo test -p robin-sparkless-polars plan_parity --features sql

Verifying all tests pass

make check-full
pytest tests -n 12
pytest tests/parity/ -v

Together these commands run the Rust checks (all features), Python lint, the main pytest suite, and the parity tests. No failures are expected on main.
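
For scripted use, the three commands could be chained so that the first failure stops the run. The sketch below is one way to do that and is not part of the repo; run_all and its runner parameter are hypothetical names. Note that the -n flag on pytest comes from the pytest-xdist plugin, so that plugin must be installed.

```python
import shlex
import subprocess

# The verification commands from this section, in order.
COMMANDS = [
    "make check-full",
    "pytest tests -n 12",
    "pytest tests/parity/ -v",
]


def run_all(commands=COMMANDS, runner=subprocess.run):
    """Run each command in order; return the first failing command, or None.

    `runner` is injectable so the sequencing can be exercised without
    actually invoking make or pytest.
    """
    for command in commands:
        result = runner(shlex.split(command))
        if result.returncode != 0:
            return command
    return None
```

Calling run_all() from the repository root returns None when every step exits with status 0, and otherwise returns the command string that failed.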