JDBC and SQLite Testing¶
This document describes how to run JDBC tests for all supported database backends.
Supported Backends¶
| Backend | Feature | Crate Dependency | URL Scheme / Driver Hint |
|---|---|---|---|
| PostgreSQL | jdbc |
postgres |
jdbc:postgresql:, postgres://, postgresql:// |
| SQLite | sqlite |
rusqlite |
jdbc:sqlite:, sqlite: |
| MySQL | jdbc_mysql |
mysql |
jdbc:mysql: / com.mysql.cj.jdbc.Driver |
| MariaDB | jdbc_mariadb |
mysql |
jdbc:mariadb: / org.mariadb.jdbc.Driver |
| SQL Server | jdbc_mssql |
tiberius |
jdbc:sqlserver: / com.microsoft.sqlserver.jdbc.SQLServerDriver |
| Oracle | jdbc_oracle |
oracle-rs |
jdbc:oracle: / oracle.jdbc.OracleDriver |
| DB2 | jdbc_db2 |
odbc-api |
jdbc:db2: / com.ibm.db2.jcc.DB2Driver |
Enable one or more features via Cargo (e.g. --features jdbc,sqlite,jdbc_mysql).
PySpark-Compatible JDBC Options¶
The following PySpark JDBC options are supported:
Connection & Query Options¶
| Option | Type | Description |
|---|---|---|
url |
String | Required. JDBC connection URL |
dbtable |
String | Table name to read/write |
query |
String | SQL query for reads (alternative to dbtable) |
user |
String | Database username |
password |
String | Database password |
driver |
String | Driver class hint (for disambiguation) |
Read Options¶
| Option | Type | Description |
|---|---|---|
sessionInitStatement |
String | SQL to execute after connection (e.g., SET timezone='UTC') |
queryTimeout |
Integer | Query timeout in seconds (backend-specific implementation) |
prepareQuery |
String | SQL to execute before main query (for CTEs, temp tables) |
fetchsize |
Integer | Number of rows to fetch per round trip |
partitionColumn |
String | Column for partitioned reads |
lowerBound |
Long | Lower bound for partitioning |
upperBound |
Long | Upper bound for partitioning |
numPartitions |
Integer | Number of partitions |
Write Options¶
| Option | Type | Description |
|---|---|---|
batchsize |
Integer | Rows per batch/transaction (default: 1000) |
truncate |
Boolean | Use TRUNCATE vs DELETE for Overwrite mode |
cascadeTruncate |
Boolean | Add CASCADE to TRUNCATE (Postgres/Oracle only) |
SaveMode Behavior¶
| Mode | Behavior |
|---|---|
Append |
Insert rows into existing table |
Overwrite |
Truncate/delete existing data, then insert |
ErrorIfExists |
Error if table has any existing rows |
Ignore |
Do nothing if table has existing rows |
Usage Examples¶
# Read with session initialization
df = spark.read.format("jdbc") \
.option("url", "jdbc:postgresql://localhost/db") \
.option("dbtable", "users") \
.option("sessionInitStatement", "SET timezone='UTC'") \
.option("queryTimeout", "30") \
.load()
# Write with batching
df.write.format("jdbc") \
.option("url", "jdbc:mysql://localhost/db") \
.option("dbtable", "results") \
.option("batchsize", "5000") \
.option("truncate", "true") \
.mode("overwrite") \
.save()
Quick Reference¶
| What | Command | Requires |
|---|---|---|
| Rust unit tests (URL routing, options) | cargo test -p robin-sparkless-polars jdbc --features "jdbc,sqlite" |
Nothing |
| Python SQLite tests (no server) | pytest tests/sql/test_jdbc_sqlite.py -v |
Nothing (uses temp files) |
| Python integration (Postgres) | pytest tests/sql/test_jdbc_read_write_postgres.py -v |
Postgres + env vars |
| Python integration (MySQL) | pytest tests/sql/test_jdbc_read_write_mysql.py -v |
MySQL + env vars |
| Python integration (MariaDB) | pytest tests/sql/test_jdbc_read_write_mariadb.py -v |
MariaDB + env vars |
| Python integration (SQL Server) | pytest tests/sql/test_jdbc_read_write_mssql.py -v |
SQL Server + env vars |
| Python integration (Oracle) | pytest tests/sql/test_jdbc_read_write_oracle.py -v |
Oracle + env vars |
| Python integration (DB2) | pytest tests/sql/test_jdbc_read_write_db2.py -v |
DB2 + ODBC + env vars |
All Python integration tests (except SQLite) are skipped if the corresponding env var is not set.
SQLite Tests (No External Dependencies)¶
The test_jdbc_sqlite.py test suite uses temporary SQLite database files and requires no external server or environment variables. It provides comprehensive coverage of JDBC functionality:
# Run SQLite JDBC tests (32 tests)
pytest tests/sql/test_jdbc_sqlite.py -v
# Run with parallel workers
pytest tests/sql/test_jdbc_sqlite.py -n 12
These tests cover:
- Basic read/write operations with all save modes (append, overwrite, error, ignore)
- PySpark-compatible options: sessionInitStatement, batchsize, truncate, prepareQuery
- Data type roundtrips: integers, floats, nulls, special characters, booleans, large strings
- DataFrame operations after JDBC read: filter, select, groupBy, orderBy, count
- Multiple API patterns: .option(), .options(), properties dict
Environment Variables¶
Each backend uses its own SPARKLESS_TEST_JDBC_<DB>_URL, _USER, _PASSWORD:
| Backend | URL Env Var | User Env Var | Password Env Var |
|---|---|---|---|
| PostgreSQL | SPARKLESS_TEST_JDBC_URL |
SPARKLESS_TEST_JDBC_USER |
SPARKLESS_TEST_JDBC_PASSWORD |
| MySQL | SPARKLESS_TEST_JDBC_MYSQL_URL |
SPARKLESS_TEST_JDBC_MYSQL_USER |
SPARKLESS_TEST_JDBC_MYSQL_PASSWORD |
| MariaDB | SPARKLESS_TEST_JDBC_MARIADB_URL |
SPARKLESS_TEST_JDBC_MARIADB_USER |
SPARKLESS_TEST_JDBC_MARIADB_PASSWORD |
| SQL Server | SPARKLESS_TEST_JDBC_MSSQL_URL |
SPARKLESS_TEST_JDBC_MSSQL_USER |
SPARKLESS_TEST_JDBC_MSSQL_PASSWORD |
| Oracle | SPARKLESS_TEST_JDBC_ORACLE_URL |
SPARKLESS_TEST_JDBC_ORACLE_USER |
SPARKLESS_TEST_JDBC_ORACLE_PASSWORD |
| DB2 | SPARKLESS_TEST_JDBC_DB2_URL |
SPARKLESS_TEST_JDBC_DB2_USER |
SPARKLESS_TEST_JDBC_DB2_PASSWORD |
Local Setup with Docker¶
NOTE: Tests use testcontainers by default, which automatically starts Docker containers. Manual setup is only needed for debugging or when testcontainers isn't available.
All database services are defined in a single docker-compose.yml:
# Start individual services
docker compose up postgres -d
docker compose up mysql mariadb -d
# Start all services
docker compose up -d
# Stop all services
docker compose down
PostgreSQL (port 5432)¶
docker compose up postgres -d
export SPARKLESS_TEST_JDBC_URL="postgres://sparkless:sparkless@localhost:5432/sparkless"
export SPARKLESS_TEST_JDBC_USER=sparkless
export SPARKLESS_TEST_JDBC_PASSWORD=sparkless
MySQL (port 3306)¶
docker compose up mysql -d
docker exec -i sparkless-mysql mysql -usparkless -psparkless sparkless < tests/sql/ddl/mysql.sql
export SPARKLESS_TEST_JDBC_MYSQL_URL="jdbc:mysql://localhost:3306/sparkless"
export SPARKLESS_TEST_JDBC_MYSQL_USER=sparkless
export SPARKLESS_TEST_JDBC_MYSQL_PASSWORD=sparkless
MariaDB (port 3307)¶
docker compose up mariadb -d
docker exec -i sparkless-mariadb mariadb -usparkless -psparkless sparkless < tests/sql/ddl/mariadb.sql
export SPARKLESS_TEST_JDBC_MARIADB_URL="jdbc:mariadb://localhost:3307/sparkless"
export SPARKLESS_TEST_JDBC_MARIADB_USER=sparkless
export SPARKLESS_TEST_JDBC_MARIADB_PASSWORD=sparkless
SQL Server (port 1433)¶
docker compose up mssql -d
# Wait for health check, then create tables
docker exec -i sparkless-mssql /opt/mssql-tools18/bin/sqlcmd -S localhost -U sa -P "Sparkless123!" -C -i /dev/stdin < tests/sql/ddl/mssql.sql
export SPARKLESS_TEST_JDBC_MSSQL_URL="jdbc:sqlserver://localhost:1433;databaseName=sparkless;encrypt=false;trustServerCertificate=true"
export SPARKLESS_TEST_JDBC_MSSQL_USER=sa
export SPARKLESS_TEST_JDBC_MSSQL_PASSWORD="Sparkless123!"
Oracle (port 1521)¶
docker compose up oracle -d
# Wait ~2-3 minutes for startup, then:
docker exec -i sparkless-oracle sqlplus system/oracle@FREEPDB1 < tests/sql/ddl/oracle.sql
export SPARKLESS_TEST_JDBC_ORACLE_URL="jdbc:oracle:thin:@//localhost:1521/FREEPDB1"
export SPARKLESS_TEST_JDBC_ORACLE_USER=system
export SPARKLESS_TEST_JDBC_ORACLE_PASSWORD=oracle
DB2 (port 50000)¶
DB2 requires ODBC driver manager (brew install unixodbc on macOS, apt install unixodbc-dev on Linux).
docker compose up db2 -d
# Wait several minutes for first startup, then:
docker exec -i sparkless-db2 su - db2inst1 -c "db2 connect to SPARKDB && db2 -tvf /dev/stdin" < tests/sql/ddl/db2.sql
export SPARKLESS_TEST_JDBC_DB2_URL="jdbc:db2://localhost:50000/SPARKDB"
export SPARKLESS_TEST_JDBC_DB2_USER=db2inst1
export SPARKLESS_TEST_JDBC_DB2_PASSWORD=password
CI Strategy¶
For CI, you can:
- Default workflow: Keep CI lightweight—only run Postgres tests (or skip all JDBC tests).
- Nightly / on-demand: Add optional jobs that spin up each backend via service containers.
Example GitHub Actions job for MySQL:
mysql-jdbc-tests:
runs-on: ubuntu-latest
services:
mysql:
image: mysql:8.0
env:
MYSQL_ROOT_PASSWORD: root
MYSQL_DATABASE: sparkless
MYSQL_USER: sparkless
MYSQL_PASSWORD: sparkless
ports:
- 3306:3306
options: >-
--health-cmd="mysqladmin ping -h 127.0.0.1 -uroot -proot --silent"
--health-interval=5s
--health-timeout=5s
--health-retries=10
steps:
- uses: actions/checkout@v4
- name: Setup DDL
run: mysql -h 127.0.0.1 -usparkless -psparkless sparkless < tests/sql/ddl/mysql.sql
- name: Build (with jdbc_mysql)
run: cargo build --features jdbc_mysql
- name: Rust tests
run: cargo test -p robin-sparkless-polars mysql --features jdbc_mysql
env:
ROBIN_SPARKLESS_TEST_JDBC_MYSQL_URL: "jdbc:mysql://127.0.0.1:3306/sparkless"
ROBIN_SPARKLESS_TEST_JDBC_MYSQL_USER: sparkless
ROBIN_SPARKLESS_TEST_JDBC_MYSQL_PASSWORD: sparkless
- name: Python tests
run: pytest tests/sql/test_jdbc_read_write_mysql.py -v
env:
SPARKLESS_TEST_JDBC_MYSQL_URL: "jdbc:mysql://127.0.0.1:3306/sparkless"
SPARKLESS_TEST_JDBC_MYSQL_USER: sparkless
SPARKLESS_TEST_JDBC_MYSQL_PASSWORD: sparkless
Repeat a similar pattern for each backend. Oracle and DB2 tests may be skipped in public CI due to licensing/image size constraints—run them locally or in private infrastructure.