This repo provides a self-contained PySpark development environment that runs locally on your laptop.
It uses Docker, Visual Studio Code, and the Dev Containers extension to run Spark and JupyterLab in a container while keeping the workspace mounted in your editor.
- Install Docker Desktop
- Install Visual Studio Code
- Install the Dev Containers extension
- Clone this repository to your laptop.
- Open the repo folder in VS Code.
- Run the VS Code Command Palette command `Dev Containers: Reopen in Container`.
- Wait for the container to build. The `postCreateCommand` installs the project and development tools from `pyproject.toml`.
- Open `test.ipynb` in VS Code.
- Use the JupyterLab interface at http://localhost:8888.
- If the notebook asks for a kernel, select the `vscode_pyspark` kernel.
- Run the cells in order to explore the local Spark session.
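
If the JupyterLab page does not load after the container finishes building, it can help to confirm that the forwarded port is reachable from the host. The following is a minimal sketch using only the Python standard library; `port_is_open` is a hypothetical helper and is not part of this repo.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to (host, port) succeeds within `timeout` seconds."""
    try:
        # create_connection raises OSError (refused, unreachable, timed out) on failure.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # 8888 is the JupyterLab port named in the steps above.
    status = "reachable" if port_is_open("localhost", 8888) else "not reachable yet"
    print(f"JupyterLab on http://localhost:8888 is {status}")
```

The same helper can be pointed at any other port the container forwards. A `False` result only means nothing accepted a TCP connection on that port; it does not say why.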
The project includes Python tooling for local checks and CI:
- `ruff` for linting
- `black` for formatting
- `pytest` for tests
- `pre-commit` for consistent hooks
You can run the checks locally with:
python -m pip install -e '.[dev]'
ruff check .
black --check .
pytest

- The devcontainer now uses a pinned base image (`jupyter/pyspark-notebook:spark-3.5.0`) for reproducible builds.
- The Spark UI is exposed on port `4040` for debugging and monitoring.
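
The local checks listed earlier (`ruff check .`, `black --check .`, `pytest`) can also be chained from a single Python script, so that one run reports every failing tool instead of stopping at the first. This is a rough sketch, assuming the dev extras are already installed and the script is run from the repo root; `CHECKS` and `run_checks` are hypothetical names, not something this repo ships.

```python
import subprocess

# The three check commands from this README; the runner around them is illustrative only.
CHECKS = [
    ["ruff", "check", "."],
    ["black", "--check", "."],
    ["pytest"],
]


def run_checks(commands):
    """Run each command in order and return the list of commands that failed.

    A command counts as failed if it exits with a non-zero status
    or if its executable cannot be found.
    """
    failed = []
    for cmd in commands:
        print("$", " ".join(cmd))
        try:
            ok = subprocess.run(cmd, check=False).returncode == 0
        except FileNotFoundError:
            # The tool is not installed or not on PATH.
            ok = False
        if not ok:
            failed.append(cmd)
    return failed
```

Calling `run_checks(CHECKS)` returns an empty list when every tool exits with status 0. Note that `pytest` exits non-zero when it collects no tests, so an empty test suite is reported as a failure here.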