Parity Check: Robin-Sparkless vs Sparkless 3.28.0
Date: February 2026
Method: Direct comparison of Sparkless 3.28.0 sparkless.sql.functions API vs robin-sparkless implementation.
Gap closure (Feb 2026): Implemented functions: bitmap_\* (5), make_dt_interval, make_ym_interval, to_timestamp_ltz, to_timestamp_ntz, sequence, shuffle, inline, inline_outer, regr_\* (9). Implemented DataFrame methods: cube, rollup, write, data, toLocalIterator, persist, unpersist; stubs only for rdd, foreach, foreachPartition, mapInPandas, mapPartitions, storageLevel, isStreaming, withWatermark. Deferred: from_xml, to_xml, schema_of_xml, xpath_\*, sentences.
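The name-level comparison described under Method can be sketched roughly as follows. This is a minimal illustration, not the script used for this report: `public_names` and `parity_report` are hypothetical helpers, and the two `SimpleNamespace` objects are stand-ins for the real `sparkless.sql.functions` module and the robin-sparkless bindings.

```python
import types


def public_names(module):
    """Return the set of public callable names exposed by a module-like object."""
    return {
        name
        for name in dir(module)
        if not name.startswith("_") and callable(getattr(module, name))
    }


def parity_report(reference, candidate):
    """Compare two function namespaces by name only (no behavioural check)."""
    ref = public_names(reference)
    cand = public_names(candidate)
    shared = ref & cand
    return {
        "implemented": sorted(shared),
        "missing": sorted(ref - cand),
        "extra": sorted(cand - ref),
        "coverage": len(shared) / len(ref) if ref else 1.0,
    }


# Hypothetical stand-ins for the two APIs being compared.
reference = types.SimpleNamespace(
    abs=abs,
    upper=str.upper,
    decode=lambda col, charset: col,
    encode=lambda col, charset: col,
)
candidate = types.SimpleNamespace(abs=abs, upper=str.upper, lit_i64=int)

report = parity_report(reference, candidate)
print(report["missing"], report["coverage"])
```

A name match says nothing about behaviour, which is why the report also relies on parity fixtures to validate the implemented functions.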
Robin-Sparkless Implemented Functions (by Sparkless name)
The following Sparkless function names have an equivalent implementation in robin-sparkless:
Implemented (matches Sparkless 3.28.0)
| Sparkless name | Robin-Sparkless | Notes |
|---|---|---|
| abs | ✅ | |
| acos | ✅ | |
| acosh | ✅ | |
| add_months | ✅ | |
| any_value | ✅ | Phase 19 |
| approx_count_distinct | ✅ | GroupedData method |
| array | ✅ | |
| array_append | ✅ | Phase 18 |
| array_compact | ✅ | |
| array_contains | ✅ | |
| array_distinct | ✅ | |
| array_except | ✅ | Phase 18 |
| array_insert | ✅ | Phase 18 |
| array_intersect | ✅ | Phase 18 |
| array_join | ✅ | |
| array_max | ✅ | |
| array_min | ✅ | |
| array_position | ✅ | |
| array_prepend | ✅ | Phase 18 |
| array_remove | ✅ | |
| array_repeat | ✅ | |
| array_size | ✅ | |
| array_sort | ✅ | |
| array_union | ✅ | Phase 18 |
| ascii | ✅ | |
| asin | ✅ | |
| asinh | ✅ | |
| atan | ✅ | |
| atan2 | ✅ | |
| atanh | ✅ | |
| avg | ✅ | |
| base64 | ✅ | |
| bit_length | ✅ | Phase 19 |
| bool_and | ✅ | Phase 19 |
| bool_or | ✅ | Phase 19 |
| cast | ✅ | |
| cbrt | ✅ | |
| ceil | ✅ | |
| ceiling | ✅ | |
| char | ✅ | |
| chr | ✅ | |
| coalesce | ✅ | |
| col | ✅ | |
| collect_list | ✅ | Phase 19 |
| collect_set | ✅ | Phase 19 |
| concat | ✅ | |
| concat_ws | ✅ | |
| contains | ✅ | |
| cos | ✅ | |
| cosh | ✅ | |
| count | ✅ | |
| count_distinct | ✅ | |
| count_if | ✅ | Phase 19 |
| crc32 | ✅ | |
| create_map | ✅ | |
| cume_dist | ✅ | |
| current_date | ✅ | |
| current_timestamp | ✅ | |
| date_add | ✅ | |
| date_format | ✅ | |
| date_from_unix_date | ✅ | Phase 17 |
| date_sub | ✅ | |
| datediff | ✅ | |
| day | ✅ | |
| dayofmonth | ✅ | |
| dayofweek | ✅ | |
| dayofyear | ✅ | |
| decode | ❌ | Not implemented (listed under Crypto / binary gaps) |
| degrees | ✅ | |
| dense_rank | ✅ | |
| element_at | ✅ | |
| elt | ✅ | Phase 19 |
| endswith | ✅ | |
| every | ✅ | Phase 19 (alias bool_and) |
| exp | ✅ | |
| explode | ✅ | |
| expm1 | ✅ | |
| factorial | ✅ | Phase 17 |
| find_in_set | ✅ | Phase 16 |
| first | ✅ | GroupedData |
| first_value | ✅ | |
| floor | ✅ | |
| format_number | ✅ | |
| format_string | ✅ | Phase 16 |
| from_json | ✅ | |
| btrim | ✅ | Phase 21 |
| locate | ✅ | Phase 21 |
| conv | ✅ | Phase 21 |
| hex | ✅ | Phase 21 |
| unhex | ✅ | Phase 21 |
| bin | ✅ | Phase 21 |
| getbit | ✅ | Phase 21 |
| to_char | ✅ | Phase 21 |
| to_varchar | ✅ | Phase 21 |
| to_number | ✅ | Phase 21 |
| try_to_number | ✅ | Phase 21 |
| try_to_timestamp | ✅ | Phase 21 |
| str_to_map | ✅ | Phase 21 |
| arrays_overlap | ✅ | Phase 21 |
| arrays_zip | ✅ | Phase 21 |
| explode_outer | ✅ | Phase 21 |
| posexplode_outer | ✅ | Phase 21 |
| array_agg | ✅ | Phase 21 |
| transform_keys | ✅ | Phase 21 |
| transform_values | ✅ | Phase 21 |
| from_unixtime | ✅ | Phase 17 |
| get | ✅ | Phase 18 (map element) |
| get_json_object | ✅ | |
| greatest | ✅ | |
| hour | ✅ | |
| hypot | ✅ | |
| ifnull | ✅ | |
| ilike | ✅ | |
| initcap | ✅ | |
| instr | ✅ | |
| isnan | ✅ | |
| isnotnull | ✅ | |
| isnull | ✅ | |
| lag | ✅ | |
| last | ✅ | GroupedData |
| last_day | ✅ | |
| last_value | ✅ | |
| lcase | ✅ | |
| lead | ✅ | |
| least | ✅ | |
| left | ✅ | |
| length | ✅ | |
| levenshtein | ✅ | |
| like | ✅ | |
| lit | ✅ | lit_i32, lit_i64, lit_f64, lit_bool, lit_str |
| ln | ✅ | |
| log | ✅ | |
| log10 | ✅ | |
| log1p | ✅ | |
| log2 | ✅ | |
| lower | ✅ | |
| lpad | ✅ | |
| ltrim | ✅ | |
| make_date | ✅ | Phase 17 |
| map_concat | ✅ | Phase 18 |
| map_contains_key | ✅ | Phase 18 |
| map_entries | ✅ | |
| map_filter | ✅ | Phase 18 |
| map_from_arrays | ✅ | |
| map_from_entries | ✅ | Phase 18 |
| map_keys | ✅ | |
| map_values | ✅ | |
| map_zip_with | ✅ | Phase 18 |
| mask | ✅ | |
| max | ✅ | |
| max_by | ✅ | Phase 19 |
| md5 | ✅ | |
| min | ✅ | |
| min_by | ✅ | Phase 19 |
| minute | ✅ | |
| month | ✅ | |
| months_between | ✅ | |
| named_struct | ✅ | Phase 18 |
| nanvl | ✅ | |
| next_day | ✅ | |
| nth_value | ✅ | |
| ntile | ✅ | |
| nullif | ✅ | |
| nvl | ✅ | |
| nvl2 | ✅ | |
| overlay | ✅ | |
| percent_rank | ✅ | |
| percentile | ✅ | Phase 19 |
| pmod | ✅ | Phase 17 |
| posexplode | ✅ | |
| position | ✅ | |
| pow | ✅ | |
| power | ✅ | |
| printf | ✅ | Phase 16 |
| product | ✅ | Phase 19 |
| quarter | ✅ | |
| radians | ✅ | |
| rank | ✅ | |
| regexp_count | ✅ | Phase 16 |
| regexp_extract | ✅ | |
| regexp_extract_all | ✅ | |
| regexp_instr | ✅ | Phase 16 |
| regexp_like | ✅ | |
| regexp_replace | ✅ | |
| regexp_substr | ✅ | Phase 16 |
| repeat | ✅ | |
| replace | ✅ | |
| reverse | ✅ | |
| right | ✅ | |
| rint | ✅ | |
| rlike | ✅ | (regexp alias) |
| round | ✅ | |
| row_number | ✅ | |
| rpad | ✅ | |
| rtrim | ✅ | |
| second | ✅ | |
| sha1 | ✅ | |
| sha2 | ✅ | |
| signum | ✅ | (sign alias) |
| sin | ✅ | |
| sinh | ✅ | |
| size | ✅ | |
| some | ✅ | Phase 19 (alias bool_or) |
| soundex | ✅ | |
| split | ✅ | |
| split_part | ✅ | Phase 16 |
| sqrt | ✅ | |
| startswith | ✅ | |
| stddev | ✅ | |
| struct | ✅ | Phase 18 |
| substr | ✅ | |
| substring | ✅ | |
| substring_index | ✅ | |
| sum | ✅ | |
| tan | ✅ | |
| tanh | ✅ | |
| timestamp_micros | ✅ | Phase 17 |
| timestamp_millis | ✅ | Phase 17 |
| timestamp_seconds | ✅ | Phase 17 |
| to_date | ✅ | |
| to_degrees | ✅ | |
| to_json | ✅ | |
| to_radians | ✅ | |
| to_unix_timestamp | ✅ | Phase 17 |
| translate | ✅ | |
| trim | ✅ | |
| trunc | ✅ | |
| try_add | ✅ | Phase 19 |
| try_divide | ✅ | Phase 19 |
| try_element_at | ✅ | Phase 19 |
| try_multiply | ✅ | Phase 19 |
| try_subtract | ✅ | Phase 19 |
| typeof | ✅ | Phase 19 |
| ucase | ✅ | |
| unbase64 | ✅ | |
| unix_date | ✅ | Phase 17 |
| unix_timestamp | ✅ | Phase 17 |
| upper | ✅ | |
| variance | ✅ | |
| weekofyear | ✅ | |
| when | ✅ | (case_when equivalent) |
| width_bucket | ✅ | Phase 19 |
| xxhash64 | ✅ | |
| year | ✅ | |
| zip_with | ✅ | Phase 18 |
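The `try_*` rows in the table follow the PySpark convention of returning NULL instead of raising an error. The sketch below illustrates that convention in plain Python for two of them. It assumes 64-bit integer operands for `try_add` and uses `None` for NULL; it is an illustration of the expected semantics, not the robin-sparkless implementation.

```python
I64_MIN, I64_MAX = -(2**63), 2**63 - 1


def try_add(a, b):
    """Return a + b, or None if an operand is NULL or the sum overflows i64."""
    if a is None or b is None:
        return None
    result = a + b
    return result if I64_MIN <= result <= I64_MAX else None


def try_divide(dividend, divisor):
    """Return dividend / divisor as a float, or None for NULL or zero divisor."""
    if dividend is None or divisor is None or divisor == 0:
        return None
    return dividend / divisor


print(try_add(2, 3))         # ordinary addition
print(try_add(I64_MAX, 1))   # overflow yields None instead of an error
print(try_divide(6, 3))      # floating-point division
print(try_divide(1, 0))      # division by zero yields None
```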
Not Implemented in Robin-Sparkless (Gaps)
Approx / distinct
- approx_percentile
Crypto / binary
- aes_decrypt, aes_encrypt, try_aes_decrypt
- to_binary, try_to_binary
- decode, encode
- ~~hex, unhex~~ ✅ Phase 21
- ~~bin~~ ✅ Phase 21
- ~~getbit~~ ✅ Phase 21
Array (additional)
- aggregate (array aggregate) — deferred
- ~~array_agg~~ ✅ Phase 21
- ~~arrays_overlap, arrays_zip~~ ✅ Phase 21
- ~~explode_outer, posexplode_outer~~ ✅ Phase 21
Map
- ~~str_to_map~~ ✅ Phase 21
Struct
- ~~transform_keys, transform_values~~ ✅ Phase 21
Ordering (Phase 20)
- asc, asc_nulls_first, asc_nulls_last ✅
- desc, desc_nulls_first, desc_nulls_last ✅
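The null-placement variants listed above differ only in where NULL rows land. Here is a minimal plain-Python sketch of the ordering PySpark defines, where ascending sorts place nulls first by default and descending sorts place them last. The helper `sort_values` is hypothetical and is not part of either library.

```python
def sort_values(values, descending=False, nulls_first=None):
    """Sort a list that may contain None, mimicking PySpark null placement."""
    if nulls_first is None:
        # PySpark defaults: asc -> nulls first, desc -> nulls last.
        nulls_first = not descending
    nulls = [v for v in values if v is None]
    non_null = sorted((v for v in values if v is not None), reverse=descending)
    return nulls + non_null if nulls_first else non_null + nulls


data = [3, None, 1, 2]
asc = sort_values(data)                                   # like asc / asc_nulls_first
asc_nulls_last = sort_values(data, nulls_first=False)     # like asc_nulls_last
desc = sort_values(data, descending=True)                 # like desc / desc_nulls_last
desc_nulls_first = sort_values(data, descending=True, nulls_first=True)
print(asc, asc_nulls_last, desc, desc_nulls_first)
```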
Control
- ~~assert_true, raise_error~~ — implemented (Phase 24; control functions)
Bit / bitmap
- ~~bit_and, bit_or, bit_xor, bit_count, bit_get~~ — implemented (Phase 24; bit operations)
- ~~bitwiseNOT, bitwise_not~~ — implemented (Phase 24; aliases)
- ~~bitmap_* functions~~ ✅ Feb 2026 gap closure (5 functions)
JVM / runtime (defer)
- ~~broadcast, spark_partition_id, input_file_name~~ — implemented as stubs (Phase 24; see PYSPARK_DIFFERENCES.md)
- ~~monotonically_increasing_id~~ — implemented as stub (Phase 24; constant 0; see PYSPARK_DIFFERENCES.md)
- ~~current_catalog, current_database, current_schema, current_user, user~~ — implemented as stubs (Phase 24; placeholders; see PYSPARK_DIFFERENCES.md)
Numeric (Phase 20)
- bround ✅
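In PySpark, `bround` differs from `round` only in its tie-breaking rule: `bround` rounds halves to the nearest even digit (HALF_EVEN), while `round` rounds halves away from zero (HALF_UP). The sketch below shows the difference with Python's `decimal` module. The helper names are illustrative, and the code assumes robin-sparkless follows the PySpark rule.

```python
from decimal import Decimal, ROUND_HALF_EVEN, ROUND_HALF_UP


def _quantize(value, scale, rounding):
    """Round value to `scale` decimal places using the given rounding mode."""
    exponent = Decimal(1).scaleb(-scale)  # scale=2 -> Decimal('0.01')
    return Decimal(str(value)).quantize(exponent, rounding=rounding)


def bround(value, scale=0):
    """Banker's rounding (HALF_EVEN), as PySpark's bround does."""
    return _quantize(value, scale, ROUND_HALF_EVEN)


def round_half_up(value, scale=0):
    """HALF_UP rounding, as PySpark's round does."""
    return _quantize(value, scale, ROUND_HALF_UP)


for x in (2.5, 3.5):
    print(x, bround(x), round_half_up(x))
```

The two functions agree on every input except exact ties, so parity fixtures for `bround` are only informative if they include values such as 2.5 or 0.125.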
String
- ~~btrim, conv, locate~~ ✅ Phase 21
Math (Phase 20)
- cot, csc, sec, e, pi ✅
- negate, negative, positive ✅
Aggregates (Phase 20)
- covar_pop, covar_samp, corr (deferred for groupBy agg)
- median, mode ✅
- percentile_approx (deferred)
- stddev_pop, stddev_samp, var_pop, var_samp ✅
- kurtosis, skewness (deferred)
- try_sum, try_avg ✅
Datetime
- convert_timezone, current_timezone
- curdate, date_diff, date_part, date_trunc
- dateadd, datepart
- dayname, days, hours, months, years
- extract, localtimestamp
- make_interval, make_timestamp, make_timestamp_ltz, make_timestamp_ntz
- ~~make_dt_interval, make_ym_interval~~ ✅ Feb 2026 gap closure
- now
- timestampadd, timestampdiff
- to_timestamp
- ~~to_timestamp_ltz, to_timestamp_ntz~~ ✅ Feb 2026 gap closure
- from_utc_timestamp, to_utc_timestamp
- unix_micros, unix_millis, unix_seconds
- weekday
JSON / XML
- json_array_length, json_object_keys, json_tuple
- from_xml, to_xml, schema_of_xml (deferred)
- parse_url
Schema / I/O
- from_csv, to_csv
- schema_of_csv, schema_of_json
Type / cast
- ~~to_char, to_number, to_varchar~~ ✅ Phase 21
- ~~try_to_number, try_to_timestamp~~ ✅ Phase 21
URL
- url_decode, url_encode
Misc
- call_function, equal_null
- grouping, grouping_id
- hash
- ~~inline, inline_outer~~ ✅ Feb 2026 gap closure
- isin
- sentences (deferred)
- ~~sequence~~ ✅ Feb 2026 gap closure
- sha (generic)
- shiftLeft, shiftRight, shiftRightUnsigned
- stack
- ~~shuffle~~ ✅ Feb 2026 gap closure
- version
- window, window_time
- xpath_* functions (deferred)
Regression (defer)
- ~~regr_*~~ ✅ Feb 2026 gap closure (9 functions)
Random / UDF
- ~~rand, randn~~ — implemented (Phase 24): real RNG with seed; per-row values in with_column/with_columns (see PYSPARK_DIFFERENCES.md)
- udf, pandas_udf
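The note on `rand` and `randn` above describes seeded generation with a distinct value per row. A minimal plain-Python sketch of that behaviour follows. The helper `rand_column` is hypothetical, and the real implementation may use a different generator, so the concrete values will not match.

```python
import random


def rand_column(n_rows, seed=None):
    """Generate one uniform value in [0, 1) per row from a single seeded RNG."""
    rng = random.Random(seed)
    return [rng.random() for _ in range(n_rows)]


first = rand_column(4, seed=42)
second = rand_column(4, seed=42)

# Same seed -> same column; each row still gets its own value.
print(first == second, len(set(first)))
```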
Summary
| Metric | Sparkless 3.28.0 | Robin-Sparkless | Coverage |
|---|---|---|---|
| Function names | ~280+ distinct | ~290+ implemented | ~95%+ |
| Parity fixtures | — | 159 passing | — |
Conclusion: Robin-sparkless has substantial parity with Sparkless 3.28.0 for the core PySpark operations used in typical data pipelines (filter, select, groupBy, join, window, array, map, string, math, datetime, type/conditional). Phases 20–22 completed (~25 datetime extensions in Phase 22). Remaining gaps are addressed in ROADMAP Phases 23–24. The 159 parity fixtures validate behavior for implemented functions.