FrameX

FrameX is an Arrow-backed Python library for parallel dataframe and array processing on a single machine.

It combines:

Pandas-like tabular APIs (DataFrame, Series, GroupBy)
NumPy-compatible chunked arrays (NDArray with NumPy protocol support)
Arrow-native storage/interop (to_arrow, Parquet/IPC I/O)
Eager execution with optional lazy pipelines (.lazy().collect())
Runtime backends for local threads/processes plus optional Ray/Dask executors

Why FrameX

FrameX is aimed at local analytics workflows that are bigger than comfortable single-threaded scripts but do not yet require distributed infrastructure.

Typical fit:

ETL and analytics pipelines on medium-to-large local datasets
Feature engineering workflows that mix table and array operations
Migration paths from Pandas scripts where API familiarity matters

Installation

From PyPI:

pip install pyframe-xpy

From source:

git clone https://github.com/aeiwz/FrameX.git
cd FrameX
pip install -e .

Requirements:

Python >=3.10
Core dependencies: pyarrow, numpy
Optional compatibility: pandas (pip install "pyframe-xpy[pandas_compat]")

Quick Start

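import framex as fx

# Build a small in-memory frame from a dict of columns.
df = fx.DataFrame(
    {
        "group": ["a", "a", "b"],
        "value": [10, 20, 30],
        "is_refund": [False, True, False],
    }
)

# Drop refunded rows, aggregate per group, then sort by the summed value.
result = (
    df.filter(~df["is_refund"])
      .groupby("group")
      .agg({"value": ["sum", "mean", "count"]})
      .sort("value_sum", ascending=False)
)

# to_pandas() returns a pandas object, so pandas (an optional dependency) must be installed.
print(result.to_pandas())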
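
To make the expected result concrete, here is a minimal plain-Python sketch of the same filter, group, aggregate, and sort steps, using only the standard library. It is not FrameX code: the helper name summarize is hypothetical, and the output column names (value_sum, value_mean, value_count) are an assumption inferred from the value_sum column used in the sort call above.

```python
from collections import defaultdict

rows = [
    {"group": "a", "value": 10, "is_refund": False},
    {"group": "a", "value": 20, "is_refund": True},
    {"group": "b", "value": 30, "is_refund": False},
]


def summarize(rows):
    """Hypothetical helper mirroring filter -> groupby -> agg -> sort."""
    # Step 1: drop refunded rows.
    kept = [r for r in rows if not r["is_refund"]]

    # Step 2: collect the remaining values per group.
    buckets = defaultdict(list)
    for r in kept:
        buckets[r["group"]].append(r["value"])

    # Step 3: compute sum, mean, and count for each group.
    out = [
        {
            "group": group,
            "value_sum": sum(values),
            "value_mean": sum(values) / len(values),
            "value_count": len(values),
        }
        for group, values in buckets.items()
    ]

    # Step 4: sort by the summed value, largest first.
    return sorted(out, key=lambda r: r["value_sum"], reverse=True)


summary = summarize(rows)
for row in summary:
    print(row)
```

Group b sorts first because its sum (30) exceeds that of group a, whose refunded row (20) is filtered out before aggregation.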

Core API

Top-level imports:

import framex as fx

Main objects and helpers:

fx.DataFrame, fx.Series, fx.Index, fx.LazyFrame
fx.NDArray, fx.array(...)
fx.read_parquet, fx.write_parquet, fx.read_ipc, fx.write_ipc, fx.read_csv, fx.write_csv
fx.read_json, fx.write_json, fx.read_ndjson, fx.write_ndjson
fx.read_file, fx.write_file for format auto-detection
fx.from_pandas, fx.from_dask, fx.from_ray, fx.from_dataframe
fx.get_config, fx.set_backend, fx.set_workers, fx.set_serializer, fx.set_kernel_backend
fx.set_array_backend for auto/NumExpr/Numba/JAX/PyTorch/CuPy acceleration modes
fx.recommend_best_performance_config() to inspect hardware-tuned settings
fx.auto_configure_hardware() to apply best-performance config automatically
fx.StreamProcessor for micro-batch streaming pipelines

Compression:

transparent extension-based compression for read_file / write_file
supported wrappers: .gz, .bz2, .xz, .zip, and .zst/.zstd (when zstandard is installed)

Acceleration extras:

pip install "pyframe-xpy[accel]"         # numexpr + numba
pip install "pyframe-xpy[gpu]"           # cupy (CUDA)
pip install "pyframe-xpy[ml_accel]"      # jax + pytorch
pip install "pyframe-xpy[pandas_fast]"   # modin backend
pip install "pyframe-xpy[distributed]"   # Dask + Ray distributed/HPC backends
pip install zstandard                    # .zst/.zstd file compression

Backend notes:

fx.set_backend(name), where name is one of "threads", "processes", "ray", "dask", or "hpc"
Ray and Dask execution backends require their respective runtimes to be installed/available.
HPC mode ("hpc") uses cluster-oriented execution via Dask or Ray:
- FRAMEX_HPC_ENGINE=dask|ray
- FRAMEX_DASK_SCHEDULER_ADDRESS=<tcp://...> to connect existing Dask clusters
- FRAMEX_RAY_ADDRESS=<ray://...> to connect existing Ray clusters
- optional SLURM bootstrap: FRAMEX_DASK_SLURM=1 (requires dask-jobqueue)
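
The HPC settings above are plain environment variables, so they can be set from Python before the backend is selected. The sketch below is one possible way to do that, under stated assumptions: the helper hpc_environment is hypothetical, the scheduler address is a placeholder, and it assumes FrameX reads these variables when the "hpc" backend is selected, which the notes above do not spell out.

```python
import os


def hpc_environment(engine, address=None):
    """Hypothetical helper: build the FRAMEX_* variables for HPC mode."""
    if engine not in ("dask", "ray"):
        raise ValueError(f"engine must be 'dask' or 'ray', got {engine!r}")

    env = {"FRAMEX_HPC_ENGINE": engine}
    if address is not None:
        # Each engine has its own address variable.
        key = (
            "FRAMEX_DASK_SCHEDULER_ADDRESS"
            if engine == "dask"
            else "FRAMEX_RAY_ADDRESS"
        )
        env[key] = address
    return env


# Placeholder address; replace it with the address of an existing cluster.
env = hpc_environment("dask", "tcp://scheduler.example:8786")
os.environ.update(env)

# With the variables in place, HPC mode would then be selected with:
#   import framex as fx
#   fx.set_backend("hpc")
print(env)
```

Exporting the same variables in the shell before starting Python would have the same effect.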

Test support notes:

Some tests are gated on optional backends and are intentionally skipped when their dependencies are not installed.
Typical skip reasons: missing dask.distributed, dask.dataframe, ray, or ray.data.
To run the full optional matrix locally:

pip install "pyframe-xpy[distributed]"
pytest -q
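
Before running the suite, it can help to see which of the optional modules named above are importable in the current environment. The following is a small standard-library sketch for that; the helper is_available is hypothetical and is not part of FrameX or its test suite.

```python
import importlib.util

# Modules listed as typical skip reasons.
OPTIONAL_MODULES = ["dask.distributed", "dask.dataframe", "ray", "ray.data"]


def is_available(name):
    """Hypothetical helper: True if the module can be located."""
    try:
        return importlib.util.find_spec(name) is not None
    except (ImportError, ValueError):
        # find_spec imports the parent package of a dotted name and raises
        # if that parent is missing or broken.
        return False


for name in OPTIONAL_MODULES:
    status = "available" if is_available(name) else "missing (related tests skip)"
    print(f"{name}: {status}")
```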

Documentation

Canonical docs are in the docs/documents directory.

Website (Docs UI)

The docs website lives in the website directory (Next.js App Router).

Main docs routes (with the dev server running locally):

http://localhost:3000/docs/features
http://localhost:3000/docs/tutorial_etl_pipeline
http://localhost:3000/docs/use_cases
http://localhost:3000/docs/configuration_guide
http://localhost:3000/docs/performance_test

Run locally:

cd website
npm install
npm run dev

Production build:

npm run build
npm run start

Development

Install dev dependencies:

pip install -e ".[dev]"

Run tests:

pytest

Benchmarks

Benchmark code and generated reports are in the benchmarks directory.

Run the full benchmark suite (includes in-terminal progress bar and report generation):

python3 -m benchmarks.benchmark_suite

Run workload capability matrix checks:

python3 -m benchmarks.check_framex_workloads

Benchmark outputs are written to benchmarks/results:

benchmark_results.json
benchmark_results.csv
benchmark_report.md
framex_workload_check.json
performance_speedup.png
parallel_processing_scaling.png
multiprocessing_scaling.png
memory_peak_rss.png

Project Status

FrameX is pre-1.0 (0.1.2) and in active development.

APIs are usable and documented
compatibility/performance behavior will continue to evolve
pin versions for production-critical workloads

License

MIT
