PySpark compatibility profiles

Sparkless 4.9+ supports compatibility profiles: named bundles of Spark SQL configuration settings that select PySpark 3.5 or PySpark 4.x semantics. Profiles let you opt into PySpark 4.x behavior while PySpark 3.5 semantics remain the default.

See also: PYSPARK_4_PARITY_PLAN.md, PYSPARK_DIFFERENCES.md.

Setting a profile

```python
from sparkless.sql import SparkSession

spark = SparkSession.builder.app_name("demo").get_or_create()

# Tier A (default): PySpark 3.5-like semantics
# spark.conf.set("sparkless.pyspark.compat", "3.5")  # optional; this is the default

# Tier B: PySpark 4.0+ semantics (ANSI on, map normalization, etc.)
spark.conf.set("sparkless.pyspark.compat", "4.0")
```

Or set the profile through the environment (useful for tests and CI):

```shell
export SPARKLESS_PYSPARK_COMPAT=4.0   # or 3.5 (default)
```

When sparkless.pyspark.compat is set, Sparkless applies the corresponding settings bundle from the table below. Individual keys set after the profile is applied still override the bundle's values.

Profile bundles

| Setting | compat=3.5 (default) | compat=4.0 |
|---|---|---|
| spark.sql.ansi.enabled | false | true |
| spark.sql.legacy.disableMapKeyNormalization | true | false |
| spark.sql.pyspark.legacy.inferMapTypeFromFirstPair.enabled | true | false |
| spark.sql.pyspark.legacy.inferArrayTypeFromFirstElement.enabled | true | false |
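The precedence described above (profile bundle first, then individual keys) can be modeled in a few lines of plain Python. This is a minimal sketch assuming a dict-based config; `PROFILE_BUNDLES` and `resolve_conf` are hypothetical names for illustration, not Sparkless internals. The bundle values are copied from the table above.

```python
# Hypothetical model of profile resolution; not Sparkless's actual implementation.
PROFILE_BUNDLES = {
    "3.5": {
        "spark.sql.ansi.enabled": "false",
        "spark.sql.legacy.disableMapKeyNormalization": "true",
        "spark.sql.pyspark.legacy.inferMapTypeFromFirstPair.enabled": "true",
        "spark.sql.pyspark.legacy.inferArrayTypeFromFirstElement.enabled": "true",
    },
    "4.0": {
        "spark.sql.ansi.enabled": "true",
        "spark.sql.legacy.disableMapKeyNormalization": "false",
        "spark.sql.pyspark.legacy.inferMapTypeFromFirstPair.enabled": "false",
        "spark.sql.pyspark.legacy.inferArrayTypeFromFirstElement.enabled": "false",
    },
}


def resolve_conf(compat="3.5", overrides=None):
    """Return the effective settings: profile bundle first, then individual keys."""
    if compat not in PROFILE_BUNDLES:
        raise ValueError(f"unknown compat profile: {compat!r}")
    effective = dict(PROFILE_BUNDLES[compat])  # copy so the bundle itself is never mutated
    effective.update(overrides or {})  # individual keys win over the bundle
    return effective


# Tier B semantics, but keep non-ANSI behavior for a legacy pipeline.
conf = resolve_conf("4.0", {"spark.sql.ansi.enabled": "false"})
print(conf["spark.sql.ansi.enabled"])  # false
print(conf["spark.sql.legacy.disableMapKeyNormalization"])  # false
```

Copying the bundle before applying overrides keeps each profile definition immutable, so resolving one session's settings cannot leak into another.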

Individual overrides

After setting a profile, you may override specific keys (PySpark-compatible names):

| Key | Effect |
|---|---|
| spark.sql.ansi.enabled | When true, invalid casts, integer overflow, and divide-by-zero may raise errors instead of returning null. |
| spark.sql.legacy.disableMapKeyNormalization | When true, skip normalizing the map key -0.0 to 0.0. |
| spark.sql.pyspark.legacy.inferMapTypeFromFirstPair.enabled | When true, infer map schema from the first non-null pair only (3.x style). |
| spark.sql.pyspark.legacy.inferArrayTypeFromFirstElement.enabled | When true, infer array schema from the first element only. |
| spark.sql.legacy.postgres.datetimeMapping.enabled | When true, PostgreSQL JDBC uses PySpark 3.5 datetime mapping. |
| spark.sql.legacy.mysql.datetimeMapping.enabled | When true, MySQL/MariaDB JDBC uses PySpark 3.5 type mapping. |
| spark.sql.legacy.oracle.timestampMapping.enabled | When true, Oracle JDBC uses PySpark 3.5 timestamp write mapping. |
| spark.sql.legacy.mssqlserver.datetimeMapping.enabled | When true, SQL Server JDBC uses PySpark 3.5 type mapping. |
| spark.sql.legacy.db2.datetimeMapping.enabled | When true, DB2 JDBC uses PySpark 3.5 type mapping. |
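To make the semantic switches concrete, the sketch below imitates three of them on plain Python values. It is an illustration under stated assumptions, not Sparkless code: `cast_to_int`, `normalize_map_key`, and `infer_array_type` are hypothetical helpers, type names are simplified strings, nulls are skipped during inference, and only `long`/`double` widening is modeled when merging element types.

```python
import math


def cast_to_int(value, ansi_enabled):
    """Hypothetical cast: null on failure (non-ANSI) or an error (ANSI)."""
    try:
        return int(value)
    except ValueError:
        if ansi_enabled:
            raise  # ANSI mode: an invalid cast is an error
        return None  # legacy mode: an invalid cast yields null


def normalize_map_key(key, disable_normalization):
    """Hypothetical key normalization: -0.0 becomes 0.0 unless disabled."""
    if not disable_normalization and isinstance(key, float) and key == 0.0:
        return 0.0  # -0.0 == 0.0 is True, so both zeros become +0.0
    return key


def _type_of(value):
    # Simplified type names; bool and nested types are not modeled.
    if isinstance(value, float):
        return "double"
    if isinstance(value, int):
        return "long"
    return "string"


def _merge(a, b):
    if a == b:
        return a
    if {a, b} == {"long", "double"}:
        return "double"  # widen integers to doubles
    raise TypeError(f"cannot merge {a} and {b}")


def infer_array_type(values, from_first_element):
    """Hypothetical inference from the first non-null element or from all of them."""
    non_null = [v for v in values if v is not None]
    if not non_null:
        return None
    result = _type_of(non_null[0])
    if from_first_element:
        return result  # legacy: look at the first element only
    for v in non_null[1:]:
        result = _merge(result, _type_of(v))  # 4.0 style: merge every element
    return result


print(cast_to_int("abc", ansi_enabled=False))  # None
print(math.copysign(1.0, normalize_map_key(-0.0, disable_normalization=False)))  # 1.0
print(math.copysign(1.0, normalize_map_key(-0.0, disable_normalization=True)))  # -1.0
print(infer_array_type([1, 2.5], from_first_element=True))  # long
print(infer_array_type([1, 2.5], from_first_element=False))  # double
```

`math.copysign` is used to reveal the sign of zero, because `-0.0 == 0.0` compares equal in Python and would hide the difference.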

Environment variables

| Variable | Values | Purpose |
|---|---|---|
| SPARKLESS_PYSPARK_COMPAT | 3.5 (default), 4.0 | Compatibility profile applied at session creation if sparkless.pyspark.compat is not set |
| PYSPARK_YM_INTERVAL_LEGACY | 1 | Restores the pre-4.0 YearMonthIntervalType shapes returned by collect() (PySpark 4.0 changed them) |
| SPARKLESS_TEST_MODE | sparkless, pyspark | Selects the backend for dual-backend pytest runs (see TESTING_GUIDE.md) |
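The SPARKLESS_PYSPARK_COMPAT row implies a simple lookup order at session creation: an explicit sparkless.pyspark.compat setting wins, then the environment variable, then the 3.5 default. A minimal sketch of that order, assuming the session config is available as a dict; `effective_compat` is a hypothetical helper, not part of Sparkless.

```python
import os


def effective_compat(session_conf, environ=None):
    """Hypothetical lookup: explicit config, then env var, then the 3.5 default."""
    environ = os.environ if environ is None else environ
    if "sparkless.pyspark.compat" in session_conf:
        return session_conf["sparkless.pyspark.compat"]  # explicit setting wins
    return environ.get("SPARKLESS_PYSPARK_COMPAT", "3.5")


print(effective_compat({}, environ={}))  # 3.5
print(effective_compat({}, environ={"SPARKLESS_PYSPARK_COMPAT": "4.0"}))  # 4.0
print(effective_compat({"sparkless.pyspark.compat": "3.5"},
                       environ={"SPARKLESS_PYSPARK_COMPAT": "4.0"}))  # 3.5
```

Passing `environ` explicitly keeps the function testable without touching the real process environment.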

Sparkless 5.0.0 note

Sparkless 5.0.0 will default to compat=4.0. The compat=3.5 profile will remain available via config for at least 12 months after 5.0.0 ships.