# Parity Status (PySpark vs Robin Sparkless)
This doc is the living parity matrix for robin-sparkless.
- Oracle: PySpark (fixtures generated by `tests/gen_pyspark_cases.py`); 4.9.0+ adds an opt-in PySpark 4.1 oracle via `tests/requirements-pyspark4.txt` and the nightly workflow `.github/workflows/pyspark4-oracle.yml`.
- Compat profiles: the default oracle remains PySpark 3.5 (`compat=3.5`); PySpark 4 tests use `SPARKLESS_PYSPARK_COMPAT=4.0` (see PYSPARK_COMPAT_PROFILES.md).
- Harness: `pytest tests/parity/` (plus issue-specific tests under `tests/dataframe/` and `tests/sql/`). The legacy Rust integration harness `tests/parity.rs` was removed; use `make test-parity-phases` instead.
- Fixtures: `tests/fixtures/*.json` (operations format); `tests/fixtures/plans/*.json` (plan format, see LOGICAL_PLAN_FORMAT.md); `tests/fixtures/phase_manifest.json` (phase-to-fixture mapping).
- Sparkless integration: robin-sparkless is designed to replace Sparkless's backend. Sparkless has 270+ expected_outputs, which a fixture converter can translate to robin-sparkless format. See SPARKLESS_INTEGRATION_ANALYSIS.md §4.
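The profile switch can be shown with a short example. Below is a minimal sketch of how a test helper might read the compat setting. Only the environment variable name and the `3.5`/`4.0` values come from this page. The helper names `compat_profile`, `profile_version` and `wants_pyspark4` are hypothetical; the authoritative rules live in PYSPARK_COMPAT_PROFILES.md.

```python
import os
from typing import Mapping, Optional, Tuple

# Values taken from this page: the default profile is 3.5, and PySpark 4 runs
# set SPARKLESS_PYSPARK_COMPAT=4.0. The helper names below are hypothetical.
DEFAULT_COMPAT = "3.5"
COMPAT_ENV_VAR = "SPARKLESS_PYSPARK_COMPAT"


def compat_profile(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the active compat profile, falling back to the 3.5 default."""
    env = os.environ if env is None else env
    return env.get(COMPAT_ENV_VAR, DEFAULT_COMPAT)


def profile_version(profile: str) -> Tuple[int, int]:
    """Parse a 'major.minor' profile string such as '4.0' into (4, 0)."""
    major, minor = profile.split(".", 1)
    return int(major), int(minor)


def wants_pyspark4(env: Optional[Mapping[str, str]] = None) -> bool:
    """True when the active profile targets PySpark 4.x semantics."""
    return profile_version(compat_profile(env)) >= (4, 0)


print(compat_profile({}))                       # 3.5
print(wants_pyspark4({COMPAT_ENV_VAR: "4.0"}))  # True
```

The helpers accept an explicit mapping so that tests can exercise both profiles without mutating `os.environ`.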
Status as of May 2026: the main pytest suite reports 3115 passed, 64 skipped (`pytest tests -n 12`). Parity JSON fixtures: 212+ hand-written fixtures in phases A–G, run via `make test-parity-phases`. The fixture `with_rand_seed.json` is marked `"skip": true` in tooling (non-deterministic seed parity).

- Phase G ✅ COMPLETED: Parity fixture expansion; 201 hand-written fixtures passing (filter_age_lt_25, filter_name_eq, select_single_column, groupby_count_desc, limit_one, orderby_desc, with_column_lit, distinct_all, fillna_simple, filter_then_select, groupby_sum_simple, filter_ge, filter_ne, filter_le, filter_or_simple, filter_eq_lit, select_reorder, and 40+ more added).
- Phase C ✅ COMPLETED: DataFrameReader/Writer parity: spark.read().option/options/format/load/table/csv/parquet/json; df.write().option/options/partition_by/parquet/csv/json; fixtures read_csv_with_options, read_table.
- Phase D ✅ COMPLETED: DataFrame method gaps: df.createOrReplaceTempView, df.corr(col1,col2), df.cov(col1,col2), toDF/toJSON/toPandas, columns, cache, hint, repartitionByRange, sortWithinPartitions, sameSemantics, semanticHash, isLocal, inputFiles, writeTo (stub).
- Phase E ✅ COMPLETED: SparkSession & Catalog stubs: spark.catalog(), spark.conf(), spark.range(), spark.version, spark.newSession(), spark.stop(), spark.getActiveSession(), spark.getDefaultSession(), spark.udf() (stub); Catalog 27 methods (functional: dropTempView, listTables, tableExists, etc.; stubs: cacheTable, createTable, etc.).
- Gap closure (Feb 2026): bitmap (5), make_dt_interval, make_ym_interval, to_timestamp_ltz/ntz, sequence, shuffle, inline, inline_outer, `regr_*` (9); DataFrame cube, rollup, write, data, toLocalIterator, persist/unpersist and stubs (rdd, foreach, foreachPartition, mapInPandas, mapPartitions, storageLevel, isStreaming, withWatermark).
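The status note says that `with_rand_seed.json` carries `"skip": true`. The sketch below shows how a loader might honor that flag when collecting operations-format fixtures. It assumes `"skip"` is a top-level key in the fixture JSON. The function `iter_active_fixtures` is hypothetical and is not the repository's actual loader.

```python
import json
from pathlib import Path
from typing import Any, Dict, Iterator, Tuple

# Hypothetical loader; assumes "skip" is a top-level key in each fixture file.
MANIFEST_NAME = "phase_manifest.json"


def iter_active_fixtures(fixture_dir: Path) -> Iterator[Tuple[str, Dict[str, Any]]]:
    """Yield (name, fixture) for operations-format fixtures not marked "skip": true."""
    for path in sorted(fixture_dir.glob("*.json")):  # non-recursive, so plans/ is excluded
        if path.name == MANIFEST_NAME:
            continue  # phase-to-fixture mapping, not a fixture
        fixture = json.loads(path.read_text(encoding="utf-8"))
        if isinstance(fixture, dict) and fixture.get("skip") is True:
            continue  # e.g. with_rand_seed.json (non-deterministic seed parity)
        yield path.stem, fixture
```

A typical call would be `[name for name, _ in iter_active_fixtures(Path("tests/fixtures"))]`.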
- Signature alignment (optional params and two-arg when): fixtures position_start, assert_true_err_msg, like_escape_char, ilike_escape_char, months_between_round_off, parse_url_key, make_timestamp_timezone, to_timestamp_format, to_char_format, when_two_arg added.
- Phase 25 ✅ COMPLETED: Plan interpreter (execute_plan), expression interpreter, LOGICAL_PLAN_FORMAT.md, plan fixtures in tests/fixtures/plans/ (filter_select_limit, join_simple, with_column_functions), plan_parity_fixtures test; create_dataframe_from_rows (Rust + Python). Remaining: Phase 26 (crate publish), Phase 27 (Sparkless integration).
- Phase 24 ✅ COMPLETED: bit (bit_and, bit_or, bit_xor, bit_count, bit_get, bitwise_not/bitwiseNOT), control (assert_true, raise_error), JVM stubs (broadcast, spark_partition_id, input_file_name, monotonically_increasing_id, current_catalog, current_database, current_schema, current_user, user), random (rand, randn with per-row values when used in with_column/with_columns), crypto (aes_encrypt, aes_decrypt, try_aes_decrypt; AES-128-GCM). Fixtures with_bit_ops, with_rand_seed, with_jvm_stubs. See PYSPARK_DIFFERENCES.md for crypto semantics.
- Phase 23 ✅ COMPLETED: JSON/URL/misc (isin, url_decode, url_encode, json_array_length, parse_url, hash, shift_left, shift_right, version, equal_null, stack); fixtures with_isin, with_url_decode, with_url_encode, json_array_length_test, with_hash, with_shift_left.
- Phase 22 ✅ COMPLETED: Datetime extensions (curdate, now, localtimestamp, date_diff, dateadd, datepart, extract, date_part, unix_micros, unix_millis, unix_seconds, dayname, weekday, make_timestamp, make_timestamp_ntz, make_interval, timestampadd, timestampdiff, days, hours, minutes, months, years, from_utc_timestamp, to_utc_timestamp, convert_timezone, current_timezone, to_timestamp); fixtures with_dayname, with_weekday, with_extract, with_unix_micros, make_timestamp_test, timestampadd_test, from_utc_timestamp_test.
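Phase 24 and Phase 23 add bit-level functions (bit_count, bit_get, shift_left, shift_right). Parity for these depends on matching JVM integer widths. The sketch below gives reference semantics for 64-bit (LongType) inputs only, as we understand Spark's behavior. Narrower integer types use their own widths and are not modeled. All function names are illustrative, not robin-sparkless APIs.

```python
# Reference semantics for 64-bit (LongType) inputs only; Spark uses narrower
# widths for int/short/byte columns. Function names are illustrative.
MASK64 = (1 << 64) - 1


def to_signed64(x: int) -> int:
    """Reinterpret the low 64 bits of x as a two's-complement signed long."""
    x &= MASK64
    return x - (1 << 64) if x >> 63 else x


def bit_count64(x: int) -> int:
    """Number of set bits in the 64-bit two's-complement form of x."""
    return bin(x & MASK64).count("1")


def bit_get64(x: int, pos: int) -> int:
    """Bit value (0 or 1) at pos, counting from the rightmost bit at 0."""
    if not 0 <= pos < 64:
        raise ValueError(f"invalid bit position: {pos}")
    return ((x & MASK64) >> pos) & 1


def shift_left64(x: int, n: int) -> int:
    """Left shift that wraps like a JVM long; shift distance is taken mod 64."""
    return to_signed64(x << (n % 64))


def shift_right64(x: int, n: int) -> int:
    """Arithmetic (sign-preserving) right shift of a JVM long."""
    return to_signed64(x) >> (n % 64)


print(bit_count64(-1))       # 64
print(shift_left64(1, 63))   # -9223372036854775808
print(shift_right64(-8, 1))  # -4
```

Python integers are unbounded, so the explicit 64-bit mask is what reproduces the JVM overflow behavior that a fixture such as with_bit_ops would compare against.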
- Phase 21 ✅ COMPLETED: String (btrim, locate, conv), binary (hex, unhex, bin, getbit), type (to_char, to_varchar, to_number, try_to_number, try_to_timestamp), array (arrays_overlap, arrays_zip, explode_outer, posexplode_outer, array_agg), map (str_to_map), struct (transform_keys, transform_values).
- Phase 20 ✅ COMPLETED: Ordering (asc, desc, nulls_first/last), aggregates (median, mode, stddev_pop, var_pop, try_sum, try_avg), numeric (bround, negate, positive, cot, csc, sec, e, pi); fixtures groupby_median, with_bround; OrderBy supports optional nulls_first.
- Phase 19 ✅ COMPLETED: Aggregates (any_value, bool_and, bool_or, count_if, max_by, min_by, percentile, product, collect_list, collect_set), `try_*` (try_divide, try_add, try_subtract, try_multiply), misc (width_bucket, elt, bit_length, typeof); fixtures groupby_any_value, groupby_product, try_divide, width_bucket.
- Phase 18 ✅ COMPLETED: array/map/struct (map_filter, zip_with, map_zip_with).
- Phase 17 ✅ COMPLETED: Datetime/unix, math (pmod, factorial).
- Phase 16 ✅ COMPLETED: String/regex.
- Phase 15 ✅ COMPLETED: aliases, string, math, array_distinct. Remaining: ROADMAP Phases 26–27 (crate publish, Sparkless integration).
- Phase 14: Math (sin, cos, tan, asin, acos, atan, atan2, degrees, radians, signum), datetime (quarter, weekofyear, dayofweek, dayofyear, add_months, months_between, next_day), type/conditional (cast, try_cast, isnan, greatest, least); parity parser extended; fixtures math_sin_cos, datetime_quarter_week.
- Phase 13: String/binary/collection batch 1: ascii, format_number, overlay, position, char, chr, base64, unbase64, sha1, sha2, md5, array_compact implemented in Rust; parity parser and fixtures string_ascii, string_format_number.
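Phase 20 notes that OrderBy supports an optional `nulls_first`. The PySpark behavior a fixture has to match is that ascending sorts place NULLs first and descending sorts place them last, unless overridden with `asc_nulls_last()` and similar functions. The sketch below models that rule on a single column. The function `order_by` is a hypothetical helper, not the robin-sparkless API.

```python
from typing import List, Optional, Sequence


def order_by(values: Sequence[Optional[int]], ascending: bool = True,
             nulls_first: Optional[bool] = None) -> List[Optional[int]]:
    """Sort one column the way Spark's ORDER BY places NULLs (illustrative)."""
    if nulls_first is None:
        # Spark defaults: ASC -> NULLS FIRST, DESC -> NULLS LAST.
        nulls_first = ascending
    non_null = sorted((v for v in values if v is not None), reverse=not ascending)
    nulls = [v for v in values if v is None]
    return nulls + non_null if nulls_first else non_null + nulls


print(order_by([3, None, 1]))                     # [None, 1, 3]
print(order_by([3, None, 1], ascending=False))    # [3, 1, None]
print(order_by([3, None, 1], nulls_first=False))  # [1, 3, None]
```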
- Phase 12: DataFrame methods implemented in Rust and exposed in Python: sample, random_split, first, head, tail, take, is_empty, to_json, to_pandas, explain, print_schema, checkpoint, repartition, coalesce, offset, summary, to_df, select_expr, col_regex, with_columns, with_columns_renamed, stat (cov/corr), na (fill/drop), freq_items, approx_quantile, crosstab, melt, except_all, intersect_all, sample_by, and Spark no-ops. Parity fixtures for first/head/offset: first_row, head_n, offset_n.
- Phase 11: Parity harness supports date, timestamp, and boolean in fixture input; datetime fixtures date_add_sub, datediff, datetime_hour_minute; String 6.4 fixtures string_soundex, string_levenshtein, string_crc32, string_xxhash64. Window fixtures percent_rank, cume_dist, ntile, nth_value are covered (multi-step workaround in harness).
- Phase 6: array functions array_position, array_remove, posexplode are implemented (via Polars list.eval); array fixtures array_contains, element_at, array_size, array_sum; array extensions (exists, forall, filter, transform, array_sum, array_mean).
- Phase 8: array_flatten and array_repeat implemented via map UDFs; map functions (create_map, map_keys, map_values, map_entries, map_from_arrays) implemented, with a map represented as List(Struct{key, value}).
- JSON: get_json_object, from_json, to_json implemented.

CI runs format, clippy, audit, deny, and all tests (including parity). Python smoke tests live in tests/python/ (run via make test or make test-python); see EMBEDDING.md.
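The Phase 8 note says a map is stored as `List(Struct{key, value})`. The plain-Python model below shows what that encoding implies for the map functions. Each map is a list of `{"key", "value"}` dicts, which mirrors what PySpark's `map_entries` returns. The real implementation works on Polars list/struct columns, and every function here is an illustration rather than the library's API. Duplicate-key and null-key checks are not modeled.

```python
from typing import Any, Dict, List, Sequence

# Illustrative model of "Map as List(Struct{key, value})" using plain Python
# values; duplicate-key and null-key checks are deliberately not modeled.
Entry = Dict[str, Any]


def create_map(*kv: Any) -> List[Entry]:
    """create_map(k1, v1, k2, v2, ...) -> list of {key, value} structs."""
    if len(kv) % 2:
        raise ValueError("create_map expects an even number of arguments")
    return [{"key": k, "value": v} for k, v in zip(kv[::2], kv[1::2])]


def map_from_arrays(keys: Sequence[Any], values: Sequence[Any]) -> List[Entry]:
    """Pair two equal-length arrays into map entries."""
    if len(keys) != len(values):
        raise ValueError("key and value arrays must have the same length")
    return [{"key": k, "value": v} for k, v in zip(keys, values)]


def map_keys(m: List[Entry]) -> List[Any]:
    return [e["key"] for e in m]


def map_values(m: List[Entry]) -> List[Any]:
    return [e["value"] for e in m]


def map_entries(m: List[Entry]) -> List[Entry]:
    # With this encoding, map_entries is just a copy of the underlying list.
    return list(m)


m = create_map("a", 1, "b", 2)
print(map_keys(m), map_values(m))  # ['a', 'b'] [1, 2]
```

One consequence of this encoding is that map_keys and map_values reduce to projecting a struct field out of each list element.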
## Phase test coverage
Parity fixtures are grouped into phases (A–G) defined in tests/fixtures/phase_manifest.json. Run phase-specific tests:
```bash
make test-parity-phases   # runs pytest tests/parity/

# Full suite (includes delta/integration markers); see docs/TESTING_GUIDE.md.
# The -n flag requires the pytest-xdist plugin.
pytest tests -n 10
```
Python phase smoke tests: test_phase_a_signature_alignment, test_phase_b_functions, test_phase_c_reader_writer, test_phase_d_dataframe_methods, test_phase_e_spark_session_catalog, test_phase_f_behavioral. When adding new fixtures, add the fixture name to the appropriate phase in phase_manifest.json. See TEST_CREATION_GUIDE.md for phase testing details.
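As a sketch of the "add the fixture name to the appropriate phase" step, the helper below assumes that `phase_manifest.json` maps each phase letter to a list of fixture names, for example `{"A": ["filter_age_gt_30"]}`. That layout is an assumption, so check the real file before relying on it. The functions `phase_of` and `register_fixture` are hypothetical. They also guard against listing one fixture under two phases.

```python
import json
from pathlib import Path
from typing import Dict, List, Optional

# Assumed manifest shape: {"A": ["fixture_a", ...], "B": [...], ...}.
# Check tests/fixtures/phase_manifest.json for the real layout before use.
Manifest = Dict[str, List[str]]


def phase_of(manifest: Manifest, fixture: str) -> Optional[str]:
    """Return the phase that lists fixture, or None if it is unassigned."""
    for phase, names in manifest.items():
        if fixture in names:
            return phase
    return None


def register_fixture(manifest_path: Path, phase: str, fixture: str) -> None:
    """Add fixture to phase (idempotent); refuse to list it under two phases."""
    manifest: Manifest = json.loads(manifest_path.read_text(encoding="utf-8"))
    current = phase_of(manifest, fixture)
    if current is not None and current != phase:
        raise ValueError(f"{fixture} is already listed under phase {current}")
    names = manifest.setdefault(phase, [])
    if fixture not in names:
        names.append(fixture)
        names.sort()  # keep diffs of the manifest stable
    manifest_path.write_text(json.dumps(manifest, indent=2) + "\n", encoding="utf-8")
```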
## Legend
- ✅ Covered: Covered by one or more fixtures (listed)
- 🚧 Not yet covered: Supported/partially supported but missing fixture coverage
- ❌ Not implemented: Not implemented in the Rust API yet
- ⚠️ Diverges: Implemented but intentionally differs from PySpark (must be documented)
## Coverage Matrix (high level)
| Area | Capability | Status | Fixtures |
|---|---|---|---|
| Data creation | SparkSession::create_dataframe (simple rows) | ✅ Covered | filter_age_gt_30, groupby_count, groupby_with_nulls (and most others) |
| Data creation | SparkSession::create_dataframe_from_rows (arbitrary schema) | ✅ Covered | Used by plan interpreter; plan fixtures |
| Plan execution | execute_plan (serialized logical plan) | ✅ Covered | tests/fixtures/plans/filter_select_limit, join_simple, with_column_functions (plan_parity_fixtures) |
| IO | read_csv | ✅ Covered | read_csv |
| IO | read_parquet | ✅ Covered | read_parquet |
| IO | read_json | ✅ Covered | read_json |
| IO | spark.read().option/options().csv (reader options) | ✅ Covered | read_csv_with_options |
| IO | spark.read().table(name) (temp view) | ✅ Covered | read_table |
| DataFrame | select | ✅ Covered | many (e.g. filter_age_gt_30) |
| DataFrame | filter basic comparisons | ✅ Covered | filter_age_gt_30 |
| DataFrame | filter nested boolean logic | ✅ Covered | filter_and_or, filter_nested, filter_not |
| DataFrame | orderBy | ✅ Covered | many (e.g. filter_age_gt_30, groupby_count) |
| GroupBy | groupBy(...).count() | ✅ Covered | groupby_count, groupby_with_nulls |
| GroupBy | groupBy(...).sum() | ✅ Covered | groupby_sum |
| GroupBy | groupBy(...).avg() | ✅ Covered | groupby_avg |
| GroupBy | groupBy(...).min() | ✅ Covered | groupby_min |
| GroupBy | groupBy(...).max() | ✅ Covered | groupby_max |
| GroupBy | groupBy with NULL keys | ✅ Covered | groupby_null_keys |
| GroupBy | groupBy single-row groups / single group | ✅ Covered | groupby_single_row_groups, groupby_single_group |
| GroupBy | multi-agg agg([..]) | ✅ Covered | groupby_multi_agg |
| GroupBy | stddev, variance, count_distinct in agg | ✅ Covered | groupby_stddev_count_distinct |
| DataFrame | withColumn (arithmetic) | ✅ Covered | type_coercion_mixed |
| DataFrame | withColumn (logical/boolean) | ✅ Covered | with_logical_column |
| DataFrame | withColumn (mixed arithmetic + comparison) | ✅ Covered | with_arithmetic_logical_mix |
| Functions | when().then().otherwise() | ✅ Covered | when_otherwise, when_then_otherwise |
| Functions | coalesce() | ✅ Covered | coalesce |
| Null semantics | NULL equality/inequality | ✅ Covered | null_comparison_equality |
| Null semantics | NULL ordering comparisons | ✅ Covered | null_comparison_ordering |
| Null semantics | eqNullSafe | ✅ Covered | null_safe_equality |
| Null semantics | NULLs inside filter predicates | ✅ Covered | null_in_filter |
| Type coercion | numeric comparison coercion (int vs double) | ✅ Covered | type_coercion_numeric |
| Type coercion | numeric arithmetic coercion (int + double) | ✅ Covered | type_coercion_mixed |
| Joins | inner/left/right/outer joins | ✅ Covered | inner_join, left_join, right_join, outer_join |
| Joins | join with NULL keys (inner: nulls excluded) | ✅ Covered | join_null_keys |
| Joins | join with duplicate keys (cartesian match) | ✅ Covered | join_duplicate_keys |
| Windows | row_number, rank, dense_rank, lag, lead | ✅ Covered | row_number_window, rank_window, lag_lead_window |
| Strings | upper, lower, substring, concat, concat_ws | ✅ Covered | string_upper_lower, string_substring, string_concat |
| Strings | length, trim, ltrim, rtrim, regexp_extract, regexp_replace, split, initcap | ✅ Covered | string_length_trim |
| Config | spark.sql.caseSensitive (case-insensitive column resolution) | ✅ Covered | case_insensitive_columns |
| DataFrame | union / unionAll | ✅ Covered | union_all |
| DataFrame | unionByName | ✅ Covered | union_by_name |
| DataFrame | distinct / dropDuplicates | ✅ Covered | distinct |
| DataFrame | drop (columns) | ✅ Covered | drop_columns |
| DataFrame | dropna | ✅ Covered | dropna |
| DataFrame | fillna (single value) | ✅ Covered | fillna |
| DataFrame | limit | ✅ Covered | limit |
| DataFrame | withColumnRenamed | ✅ Covered | with_column_renamed |
| Array/List | array, array_contains, element_at, size/array_size, array_join, array_sort, array_slice, explode; array_position, array_remove, posexplode (implemented) | ✅ Covered | array_contains, element_at, array_size |
| Windows | first_value, last_value, percent_rank | ✅ Covered | first_value_window, last_value_window, percent_rank_window |
| Windows | cume_dist, ntile, nth_value | ✅ Covered | cume_dist_window, ntile_window, nth_value_window (multi-step workaround in harness) |
| Strings | regexp_extract_all, regexp_like | ✅ Covered | regexp_extract_all, regexp_like |
| Strings | repeat, reverse, instr, lpad, rpad | ✅ Covered | string_repeat_reverse, string_lpad_rpad |
| Strings | mask, translate, substring_index; soundex, levenshtein, crc32, xxhash64 (Phase 8) | ✅ Covered | string_mask, string_translate, string_substring_index, string_soundex, string_levenshtein, string_crc32, string_xxhash64 |
| Strings (Phase 13) | ascii, format_number, overlay, position, char, chr, base64, unbase64, sha1, sha2, md5 | ✅ Implemented | string_ascii, string_format_number |
| Strings (Phase 16) | regexp_count, regexp_instr, regexp_substr, split_part, find_in_set, format_string, printf | ✅ Covered | regexp_count, regexp_substr, regexp_instr, split_part, find_in_set, format_string |
| Datetime (Phase 17) | unix_timestamp, from_unixtime, make_date, timestamp_seconds/millis/micros, unix_date, date_from_unix_date | ✅ Covered | unix_timestamp, from_unixtime, make_date, timestamp_seconds, timestamp_millis, timestamp_micros, unix_date, date_from_unix_date |
| Math (Phase 17) | pmod, factorial | ✅ Covered | pmod, factorial |
| Array | array_sum, array_exists, forall, filter, transform; array_flatten, array_repeat (Phase 8); array_compact (Phase 13) | ✅ Implemented | array_sum |
| Map | create_map, map_keys, map_values, map_entries, map_from_arrays (Phase 8) | ✅ Implemented | No fixture yet |
| JSON | get_json_object, from_json, to_json (Phase 10) | ✅ get_json_object covered | json_get_json_object |
| Math | sqrt, pow, exp, log | ✅ Covered | math_sqrt_pow |
| GroupBy | first, last, approx_count_distinct in agg | ✅ Covered | groupby_first_last |
| GroupBy (Phase 19) | any_value, bool_and, bool_or, product, collect_list, collect_set, count_if, percentile, max_by, min_by | ✅ Covered | groupby_any_value, groupby_product |
| Misc (Phase 19) | try_divide, try_add, try_subtract, try_multiply, width_bucket, elt, bit_length, typeof | ✅ Covered | try_divide, width_bucket |
| DataFrame | replace, crossJoin, describe, subtract, intersect | ✅ Covered | replace, cross_join, describe, subtract, intersect |
| SQL | SparkSession::sql() (optional sql feature) | ✅ Implemented | No fixture (SQL translated to DataFrame ops; parity via DataFrame fixtures) |
| Datetime | year, month, day, to_date, date_format; current_date, date_add, hour, etc. | ✅ Covered | date_add_sub, datediff, datetime_hour_minute |
| DataFrame (Phase 12) | first, head, offset, sample, to_json, summary, stat, select_expr, freq_items, crosstab, melt, etc. (Rust + PyO3) | ✅ first/head/offset/summary covered | first_row, head_n, offset_n, summary; additional Phase 12 ops implemented, fixtures TBD |
| DataFrame (Phase D) | createOrReplaceTempView, corr(col1,col2), cov(col1,col2), toDF/toJSON/toPandas, columns, cache, hint, repartitionByRange, sortWithinPartitions, sameSemantics, semanticHash, isLocal, inputFiles, writeTo (stub) | ✅ Implemented | Python: test_phase_d_dataframe_methods; table read via read_table fixture |
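Several rows above (NULL equality, eqNullSafe, NULLs in filter predicates, inner joins excluding NULL keys) rest on SQL three-valued logic. The sketch below states those rules in plain Python, with `None` standing in for SQL NULL. The helper names are illustrative only. The same `=` rule explains why an inner join never matches two NULL keys.

```python
from typing import Any, Optional


def sql_eq(a: Any, b: Any) -> Optional[bool]:
    """SQL `=`: NULL (None) if either side is NULL (three-valued logic)."""
    if a is None or b is None:
        return None
    return a == b


def eq_null_safe(a: Any, b: Any) -> bool:
    """`<=>` / eqNullSafe: never NULL; two NULLs compare equal."""
    if a is None and b is None:
        return True
    if a is None or b is None:
        return False
    return a == b


def keep_in_filter(pred: Optional[bool]) -> bool:
    """filter() keeps a row only when the predicate is TRUE; NULL drops it."""
    return pred is True


rows = [1, None, 2]
print([r for r in rows if keep_in_filter(sql_eq(r, None))])        # []
print([r for r in rows if keep_in_filter(eq_null_safe(r, None))])  # [None]
```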
## Fixture Index
| Fixture | What it covers |
|---|---|
| filter_age_gt_30 | Filter + select + orderBy (baseline) |
| filter_and_or | AND/OR precedence + parentheses |
| filter_nested | Nested boolean logic |
| filter_not | NOT / negation |
| groupby_count | groupBy + count + orderBy |
| groupby_with_nulls | groupBy with NULLs |
| groupby_sum | groupBy + sum |
| groupby_avg | groupBy + avg |
| groupby_min | groupBy + min |
| groupby_max | groupBy + max |
| groupby_null_keys | groupBy with NULL keys |
| groupby_single_row_groups | groupBy with single-row groups (each key once) |
| groupby_single_group | groupBy with single group (all same key) |
| join_null_keys | inner join with NULL join keys (nulls excluded) |
| join_duplicate_keys | inner join with duplicate keys (multiple matches) |
| case_insensitive_columns | case-insensitive column resolution (filter/select/orderBy with mixed-case names) |
| read_csv | CSV read path + operations |
| read_parquet | Parquet read path + operations |
| read_json | JSON read path + operations |
| read_csv_with_options | spark.read.option("header","true").csv(path) with reader_options |
| read_table | spark.read.table("name") via table_source (temp view) |
| with_logical_column | Logical columns/expressions in withColumn |
| with_arithmetic_logical_mix | Mixed arithmetic + comparison in withColumn |
| when_otherwise | when/then/otherwise |
| when_then_otherwise | chained when |
| coalesce | coalesce null handling |
| null_comparison_equality | NULL equality/inequality semantics |
| null_comparison_ordering | NULL ordering semantics |
| null_safe_equality | eqNullSafe semantics |
| null_in_filter | NULLs in filter predicates |
| type_coercion_numeric | int/double comparison coercion |
| type_coercion_mixed | int+double arithmetic coercion |
| inner_join | inner join on dept_id |
| left_join | left join + orderBy |
| right_join | right join + orderBy |
| outer_join | outer join + orderBy |
| groupby_multi_agg | groupBy + multiple aggregations in one agg() |
| groupby_stddev_count_distinct | groupBy + stddev and count_distinct in agg |
| row_number_window | row_number() over partition by dept order by salary desc |
| rank_window | rank() over partition with ties |
| lag_lead_window | lag and lead over partition |
| string_upper_lower | upper(), lower() |
| string_substring | substring() 1-based |
| string_concat | concat(), concat_ws() |
| string_length_trim | length(), trim() in withColumn |
| union_all | union (vertical stack, same schema) |
| union_by_name | unionByName (align columns by name) |
| distinct | distinct (drop duplicate rows) |
| drop_columns | drop(columns) |
| dropna | dropna (drop rows with nulls) |
| fillna | fillna (fill nulls with value) |
| limit | limit(n) |
| with_column_renamed | withColumnRenamed(old, new) |
| array_contains | split + array_contains(col, lit) |
| element_at | split + element_at(col, 1-based index) |
| array_size | split + size(col) |
| first_value_window | first_value over partition |
| last_value_window | last_value over partition |
| percent_rank_window | percent_rank over partition |
| cume_dist_window | cume_dist over partition |
| ntile_window | ntile(n) over partition |
| nth_value_window | nth_value over partition |
| regexp_like | regexp_like(col, pattern) boolean match |
| regexp_extract_all | regexp_extract_all(col, pattern) list of matches |
| string_repeat_reverse | repeat(col, n), reverse(col) |
| string_lpad_rpad | lpad(col, len, pad), rpad(col, len, pad) |
| math_sqrt_pow | sqrt(col), pow(col, exp) |
| groupby_first_last | groupBy + first(name), last(name) |
| groupby_any_value | groupBy + any_value(column) |
| groupby_product | groupBy + product(column) |
| try_divide | try_divide(col, col): null on divide-by-zero |
| width_bucket | width_bucket(value, min, max, num_bucket) |
| cross_join | crossJoin (cartesian product) |
| describe | describe() summary statistics |
| summary | summary() (same as describe) |
| replace | replace(column, old_value, new_value) |
| subtract | subtract (set difference) |
| intersect | intersect (set intersection) |
| first_row | first() – first row as one-row DataFrame |
| head_n | head(n) – first n rows |
| offset_n | offset(n) – skip first n rows |
| string_mask | mask(col) – replace upper/lower/digit with X/x/n |
| string_translate | translate(col, from_str, to_str) |
| string_substring_index | substring_index(col, delim, count) before/after nth delim |
| array_sum | array(cols) + array_sum(col) |
| json_get_json_object | get_json_object(col, '$.path') |
| date_add_sub | date_add(col('d'), 7), date_sub(col('d'), 3) |
| datediff | datediff(col('end'), col('start')) |
| datetime_hour_minute | hour(col('ts')), minute(col('ts')) with timestamp input |
| string_soundex | soundex(col('name')) |
| string_levenshtein | levenshtein(col('a'), col('b')) |
| string_crc32 | crc32(col('s')) |
| string_xxhash64 | xxhash64(col('s')) |
| string_ascii | ascii(col('name')) → first-char code point |
| string_format_number | format_number(col('value'), 2) → fixed-decimal string |
| phase15_aliases_nvl_isnull | nvl, nvl2, isnull, isnotnull (Phase 15) |
| string_left_right_replace | left, right, replace, startswith, endswith, contains, like, ilike, rlike |
| math_cosh_cbrt | cosh, sinh, tanh, acosh, asinh, atanh, cbrt, expm1, log1p, log10, log2, rint, hypot |
| array_distinct | array_distinct(col); the JSON fixture may be skipped, but the Python tests in `tests/dataframe/test_issue_415_array_distinct*.py` and `test_issue_439_*` run in the main suite |
| regexp_count | regexp_count(col, pattern) – count non-overlapping matches |
| regexp_substr | regexp_substr(col, pattern) – first match substring |
| regexp_instr | regexp_instr(col, pattern) – 1-based position of first match |
| split_part | split_part(col, delim, part_num) – 1-based part of split |
| find_in_set | find_in_set(col('str'), col('set')) – 1-based index in comma-delimited list |
| format_string | format_string('%d %s', col('a'), col('b')) – printf-style formatting |
| unix_timestamp | unix_timestamp(col), unix_timestamp(col, format) – string to seconds |
| from_unixtime | from_unixtime(col), from_unixtime(col, format) – seconds to formatted string |
| make_date | make_date(year, month, day) – build date from parts |
| timestamp_seconds | timestamp_seconds(col) – seconds epoch to timestamp |
| timestamp_millis | timestamp_millis(col) – millis epoch to timestamp |
| timestamp_micros | timestamp_micros(col) – micros epoch to timestamp |
| unix_date | unix_date(col) – date to days since epoch |
| date_from_unix_date | date_from_unix_date(col) – days to date |
| pmod | pmod(a, b) – positive modulus |
| factorial | factorial(n) – n! for n 0..20 |
| with_bit_ops | bit operations (bit_and, bit_or, bit_xor, bit_count, bit_get) via withColumn |
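Three fixtures in the index (pmod, width_bucket, string_substring_index) test functions whose edge cases are easy to get wrong when porting from the JVM. The sketch below gives reference semantics as we understand them from the Spark SQL function documentation. It covers integer `pmod` with a NULL result on a zero divisor (non-ANSI mode). It covers `width_bucket` with its 0 and `num_bucket + 1` overflow buckets and reversed ranges. It covers `substring_index` with positive and negative counts. The code is illustrative, not robin-sparkless internals. NaN/infinite inputs and overlapping delimiters are not handled.

```python
from typing import Optional


def _java_rem(a: int, n: int) -> int:
    """Remainder with the sign of the dividend (JVM semantics, not Python's %)."""
    r = abs(a) % abs(n)
    return r if a >= 0 else -r


def pmod(a: int, n: int) -> Optional[int]:
    """Integer pmod; None (SQL NULL) on a zero divisor, as in non-ANSI mode."""
    if n == 0:
        return None
    r = _java_rem(a, n)
    return _java_rem(r + n, n) if r < 0 else r


def width_bucket(value: float, lo: float, hi: float, num_bucket: int) -> Optional[int]:
    """Equi-width bucket in 1..num_bucket; 0 and num_bucket + 1 catch out-of-range values."""
    if num_bucket <= 0 or lo == hi:
        return None
    if lo < hi:
        if value < lo:
            return 0
        if value >= hi:
            return num_bucket + 1
        return int(num_bucket * (value - lo) / (hi - lo)) + 1
    # Reversed range (lo > hi): buckets count downwards from lo.
    if value > lo:
        return 0
    if value <= hi:
        return num_bucket + 1
    return int(num_bucket * (lo - value) / (lo - hi)) + 1


def substring_index(s: str, delim: str, count: int) -> str:
    """Text before the count-th delim (count > 0) or after the |count|-th from the right."""
    if count == 0 or delim == "":
        return ""
    parts = s.split(delim)
    return delim.join(parts[:count] if count > 0 else parts[count:])


print(pmod(-10, 3))                                # 2
print(width_bucket(5.3, 0.2, 10.6, 5))             # 3
print(substring_index("www.apache.org", ".", -2))  # apache.org
```

The `_java_rem` helper matters because Python's `%` takes the sign of the divisor, so `pmod(-10, -3)` would otherwise disagree with Spark's result of -1.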
## Next additions to the matrix (recommended)
- Add more join edge-case fixtures (e.g. left/outer with null keys) if needed.
- ROADMAP Phases 16–27: Phases 16–25 are completed per the status notes above (Phases 20–24 covered full parity in five parts). Remaining: Phase 26 (publish crate on crates.io) and Phase 27 (Sparkless integration, 200+ tests). See ROADMAP.md, GAP_ANALYSIS_SPARKLESS_3.28.md.
## Sparkless Test Conversion
Sparkless (github.com/eddiethedean/sparkless) has 270+ JSON expected outputs in tests/expected_outputs/. These can drive robin-sparkless parity tests via a fixture converter that maps Sparkless JSON format → robin-sparkless fixture format. See SPARKLESS_INTEGRATION_ANALYSIS.md §4 for:
- Fixture format comparison (input_data vs input/rows; expected_output vs expected)
- Conversion steps per test
- Priority order: parity/dataframe, parity/functions, then parity/sql
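The field mapping and priority order listed above can be sketched as a tiny converter. Only two renames come from this page: `input_data` → `input.rows` and `expected_output` → `expected`. The `name` and `operations` keys are hypothetical placeholders, and each Sparkless test's operations still have to be translated individually. The priority ranking assumes test paths begin with `parity/dataframe/`, `parity/functions/` or `parity/sql/`. See SPARKLESS_INTEGRATION_ANALYSIS.md §4 for the real format comparison.

```python
from typing import Any, Dict, List

# Field mapping from SPARKLESS_INTEGRATION_ANALYSIS.md §4:
#   input_data -> input.rows, expected_output -> expected.
# "name" and "operations" are hypothetical placeholders for illustration.
PRIORITY = ("parity/dataframe", "parity/functions", "parity/sql")


def convert_case(name: str, case: Dict[str, Any]) -> Dict[str, Any]:
    """Map one Sparkless expected-output case onto a robin-sparkless-style fixture."""
    missing = [k for k in ("input_data", "expected_output") if k not in case]
    if missing:
        raise KeyError(f"{name}: missing {missing}")
    return {
        "name": name,
        "input": {"rows": case["input_data"]},
        "operations": [],  # hypothetical: must be filled in per test
        "expected": case["expected_output"],
    }


def conversion_order(paths: List[str]) -> List[str]:
    """Sort test paths: parity/dataframe first, then functions, then sql, then the rest."""
    def rank(path: str) -> int:
        for i, prefix in enumerate(PRIORITY):
            if path.startswith(prefix + "/"):
                return i
        return len(PRIORITY)
    return sorted(paths, key=lambda p: (rank(p), p))
```

Raising on missing keys, rather than emitting a half-filled fixture, keeps unconvertible Sparkless cases visible during the migration.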