Integration Test Guide (Python)
This guide describes how to write and maintain Python integration tests under tests/integration so they:
- Use the shared test harness (fixtures + backend abstraction).
- Treat PySpark behavior as the source of truth.
- Work against both the PySpark backend and the Robin backend without changing test code.
1. Use the shared harness
- Do NOT import PySpark or sparkless directly in tests.
- Import them through the shared harness helpers instead (see References).
- Use the `spark` fixture from `tests/conftest.py` for the default session. The backend is chosen via `SPARKLESS_TEST_BACKEND` / `MOCK_SPARK_TEST_BACKEND` or `@pytest.mark.backend(...)`.
- Do not construct a `SparkSession` manually (e.g. `SparkSession("TestApp")` or `SparkSession.builder...getOrCreate()`); obtain sessions only through fixtures. Never call `spark.stop()` in tests.
2. Sessions with custom config
For tests that need a session with specific config (e.g. spark.sql.caseSensitive=true), use a dedicated fixture instead of building a session inside the test:
- Default session: use the `spark` fixture (case-insensitive by default on both backends).
- Case-sensitive session: use the `spark_case_sensitive` fixture provided in `tests/integration/test_case_sensitivity.py` (or a similar fixture in conftest). It creates a session with `spark.sql.caseSensitive=true`, yields it, and stops it after the test.
Example:
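```python
import pytest

class TestCaseSensitivity:  # illustrative class name
    def test_case_sensitive_mode(self, spark_case_sensitive):
        spark = spark_case_sensitive
        df = spark.createDataFrame([{"Name": "Alice"}])
        # With spark.sql.caseSensitive=true, "name" must not resolve to "Name".
        with pytest.raises(Exception):
            df.select("name").collect()
```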
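One way such a fixture could be structured is sketched below. This is a minimal sketch, not the harness's actual code: the `create_session` callable and the `spark_session_factory` fixture are hypothetical stand-ins for whatever `tests/fixtures/spark_backend.py` really uses to build a backend-specific session. The create/yield/stop lifecycle lives in a plain generator so it can be exercised without running pytest.

```python
from typing import Any, Callable, Dict, Iterator

import pytest

# Hypothetical factory signature: (app_name, config) -> session for the active backend.
SessionFactory = Callable[[str, Dict[str, str]], Any]


def case_sensitive_session(create_session: SessionFactory) -> Iterator[Any]:
    """Create a case-sensitive session, hand it to the test, then stop it."""
    session = create_session(
        "CaseSensitiveTests", {"spark.sql.caseSensitive": "true"}
    )
    try:
        yield session
    finally:
        # The fixture owns this session, so the fixture (not the test) stops it.
        session.stop()


@pytest.fixture
def spark_case_sensitive(spark_session_factory: SessionFactory) -> Iterator[Any]:
    # `spark_session_factory` is a hypothetical fixture returning a SessionFactory.
    yield from case_sensitive_session(spark_session_factory)
```

Because teardown sits in a `finally` block, the session is stopped even if the test body fails, which is why tests themselves never need to call `spark.stop()`.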
3. Backend-agnostic Row and conf
- Row field names / values: Do not rely on backend-specific Row internals (e.g. `row.__dict__["_data_dict"]` or `row._schema.fields`). Use an accessor that works on both backends: `row.asDict()` (PySpark) or `dict(row)` / `_row_to_dict(row)` (see `tests/integration/test_case_sensitivity.py`).
- Column names only: use `list(_row_to_dict(row).keys())` or a `_row_keys(row)` helper.
- Case sensitivity config: PySpark's `RuntimeConfig` may not have `is_case_sensitive()`. When the method is missing, use a helper that checks `conf.get("spark.sql.caseSensitive", "false").strip().lower() == "true"` (see `_is_case_sensitive(spark)` in the case sensitivity tests).
4. Backend-specific behavior
When behavior differs by backend (e.g. PySpark raises `AMBIGUOUS_REFERENCE` for ambiguous column names, while the mock backend may return the first match), write the test so both outcomes are accepted:
- Use try/except: assert the "lenient" outcome when the operation succeeds, and accept the known exception (e.g. a message containing `"ambiguous"`) when the backend is strict.
- Or use `@pytest.mark.backend("pyspark")` / `get_backend_type()` to skip or xfail on the other backend, and document the divergence.
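The try/except option can be factored into a small helper, sketched below. The name `accept_lenient_or_strict` and its signature are hypothetical, not part of the harness. The lenient check runs outside the `try` block so its own assertion failures are never mistaken for the strict backend's error, and an exception whose message lacks the expected fragment still fails the test.

```python
from typing import Any, Callable


def accept_lenient_or_strict(
    action: Callable[[], Any],
    check_lenient: Callable[[Any], None],
    error_fragment: str,
) -> str:
    """Accept either backend behavior for one operation (hypothetical helper).

    Returns "lenient" if ``action`` succeeds and ``check_lenient`` passes, or
    "strict" if ``action`` raises an exception whose message contains
    ``error_fragment`` (case-insensitive). Any other exception propagates.
    """
    try:
        result = action()
    except Exception as exc:
        if error_fragment.lower() not in str(exc).lower():
            raise  # unexpected error: let the test fail
        return "strict"
    # Outside the try block so an assertion failure here is never swallowed.
    check_lenient(result)
    return "lenient"
```

For example, after joining two DataFrames that both have an `id` column, a test could pass `lambda: joined.select("id").collect()` as the action and `"ambiguous"` as the fragment; PySpark's `AMBIGUOUS_REFERENCE` message contains that word.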
5. Running integration tests
- PySpark backend (parity / expectation updates):
- Robin backend (default):
Use a 15-minute timeout for full runs if needed.
6. References
- Harness: `tests/conftest.py`, `tests/fixtures/spark_backend.py`, `tests/fixtures/spark_imports.py`
- Example: `tests/integration/test_case_sensitivity.py` (fixtures `spark`, `spark_case_sensitive`; helpers `_row_to_dict`, `_row_keys`, `_is_case_sensitive`)