What Robin-Sparkless Does NOT Have (vs Sparkless 3.28.0)¶

This list is only items that Sparkless has and robin-sparkless does not (or has only as a stub/no-op). Implemented equivalents (e.g. signum for sign, avg for mean, trunc for date_trunc) are not listed.

PySpark parity scope: All items below are PySpark APIs (or direct Sparkless equivalents of PySpark), unless marked as Sparkless-specific. PySpark reference: pyspark.sql (SparkSession, DataFrame, Catalog, functions), pyspark.sql.functions, and RDD / SparkContext. Tracking these as unimplemented is for PySpark parity; implementing them would align robin-sparkless with standard PySpark behavior.

For a consolidated view of deferred/optional scope (XML, UDF, streaming, RDD, sketch, Catalog DDL), see DEFERRED_SCOPE.md.

Functions (sparkless.sql.functions) — Missing or stub only¶

Crypto / binary¶

~~aes_decrypt, aes_encrypt, try_aes_decrypt~~ — implemented (AES-128-GCM; see PYSPARK_DIFFERENCES)
~~to_binary, try_to_binary~~ — implemented
~~decode, encode~~ — implemented (UTF-8, hex)
~~octet_length, char_length, character_length~~ — implemented

Approx / aggregates¶

~~approx_percentile~~ — implemented
~~percentile_approx~~ — implemented
~~covar_pop, covar_samp, corr as groupBy aggregations~~ — implemented
~~kurtosis, skewness~~ — implemented

Array¶

~~aggregate (array fold)~~ — implemented (simplified: zero + sum(list))
~~cardinality~~ — implemented (alias for size)

Bitmap (PySpark 3.5+)¶

~~bitmap_bit_position, bitmap_bucket_number, bitmap_construct_agg, bitmap_count, bitmap_or_agg~~ — implemented

Datetime / interval¶

~~make_dt_interval, make_ym_interval~~ — implemented
~~to_timestamp_ltz, to_timestamp_ntz~~ — implemented (we have to_timestamp)

JSON / XML / CSV¶

~~json_object_keys, json_tuple~~ — implemented
from_xml, to_xml, schema_of_xml (PySpark 3.4+/4.0+; optional/deferred)
~~from_csv, to_csv~~ — implemented (minimal)
~~schema_of_csv, schema_of_json~~ — implemented (stub: return literal schema string)

Misc / UDF / JVM¶

call_function (stub: not supported; Sparkless-specific — PySpark equivalent: register UDF with spark.udf.register then use the name in expr() or SQL)
~~grouping, grouping_id~~ — implemented (stub: return 0)
~~inline, inline_outer~~ — implemented (explode list of structs; use unnest for struct fields)
sentences (optional/deferred): PySpark pyspark.sql.functions.sentences — NLP string→array of array of words; implement only if prioritized.
~~sequence~~ — implemented (generate array of numbers)
~~sha~~ — we have sha1, sha2
~~shuffle~~ — implemented
window, window_time (PySpark; we have .over(); thin wrappers if needed; see PYSPARK_DIFFERENCES)
pandas_udf (PySpark pandas_udf decorator for scalar, grouped map, and other function types; robin-sparkless currently supports only a minimal grouped aggregation variant via pandas_udf(..., function_type="grouped_agg") on the Python side)
count_min_sketch, histogram_numeric, hll_sketch_agg, hll_sketch_estimate, hll_union, hll_union_agg (PySpark 3.5+; stub/defer)
session_window (PySpark Structured Streaming; stub: no streaming)
call_udf, udtf, reduce, reflect, java_method (PySpark JVM/UDTF APIs; stub: not supported)

Regression¶

~~regr_avgx, regr_avgy, regr_count, regr_intercept, regr_r2, regr_slope, regr_sxx, regr_sxy, regr_syy~~ — implemented

XPath (deferred)¶

XPath (deferred): PySpark xpath, xpath_boolean, xpath_double, etc. (3.5+) — require XML support; stub or defer (see PYSPARK_DIFFERENCES).

Aliases we don’t expose (Sparkless has, we have equivalent under different name)¶

~~sign~~ — alias of signum
~~std~~ — alias of stddev
~~mean~~ — alias of avg
~~date_trunc~~ — alias of trunc
~~regexp~~ — alias of rlike

DataFrame methods — Missing or no-op only¶

corr — We have df.stat().corr(col1, col2) (scalar) and df.corr() (correlation matrix DataFrame).
~~createGlobalTempView, createOrReplaceGlobalTempView~~ — implemented (stub: same catalog as temp view).
createTempView, createOrReplaceTempView — we expose create_or_replace_temp_view (SQL feature).
~~cube, rollup~~ — implemented (multiple grouping sets then union).
~~data~~ — implemented (best-effort: same as collect(), list of row dicts).
~~dtypes~~ — implemented (returns list of (name, dtype_string)).
foreach, foreachPartition — PySpark DataFrame; stub: raise NotImplementedError (see PYSPARK_DIFFERENCES).
mapInPandas, mapPartitions — PySpark DataFrame; stub: raise NotImplementedError.
rdd — PySpark DataFrame.rdd; stub: raise NotImplementedError (use collect() or toLocalIterator() for local data).
registerTempTable — legacy; we have create_or_replace_temp_view.
~~repartitionByRange~~ — implemented (no-op).
sameSemantics, semanticHash — we have no-op stubs that return fixed values.
~~sortWithinPartitions~~ — implemented (no-op).
storageLevel — PySpark; stub: returns None (DataFrame is lazy; no storage level).
to — PySpark generic writer; we have write_delta with delta feature.
~~toLocalIterator~~ — implemented (best-effort: same as collect(), iterable of rows).
~~unpersist~~ — we have it (no-op).
withWatermark — PySpark Structured Streaming; no-op stub (streaming not supported).
~~write~~ — implemented (generic write: parquet/csv/json, mode overwrite/append). writeTo — PySpark DataFrameWriterV2 / catalog table API; stub or use write to path.
isStreaming — PySpark; stub: always returns False.

Summary counts (approximate)¶

Category	Sparkless	Robin-Sparkless missing (approx)
Functions	417	~55–60 (crypto, XML, regr, bitmap, CSV/JSON schema, UDF/UDTF, etc.)
DataFrame methods	95	~20–25 (cube/rollup, RDD/foreach, writeTo, streaming, etc.)

For full “we have it” lists see GAP_ANALYSIS_SPARKLESS_3.28.md and PARITY_CHECK_SPARKLESS_3.28.md.