PF-GAP

PF-GAP is a flexible, extensible framework for proximity-based learning on time series and structured data. It builds on the original Proximity Forest (PF) model and introduces:

- GAP proximities
  - Supervised imputation (test and train sets)
  - Intra-class outlier scores
  - Returnable for visualization, SVM kernel, etc.
- Custom distance functions (in Python, Maple, or Java)
- Support for multivariate and variable-length time series
- Parallel training and proximity computation
- Flexible data formatting and imputation options
- Regression (extrinsic)
- Customizable node purity measures and aggregation schemes

🛠 Installation

Requirements

Java 17+
Recommended: Python 3.8+ (tested with Python 3.13)
Python packages (for running the demo files):

  pip install numpy pandas matplotlib scikit-learn aeon

Optional: Maple 2016+ (for Maple-based distance functions)
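
One way to confirm the Java requirement before running anything is to parse the banner printed by `java -version`. The sketch below is not part of PF-GAP: the helper names `parse_java_major` and `installed_java_major` are made up for illustration, and it assumes the usual version string formats (`"17.0.2"` for modern releases, `"1.8.0_292"` for the legacy scheme).

```python
import re
import shutil
import subprocess


def parse_java_major(version_output):
    """Extract the major Java version from `java -version` output, or None."""
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        return None
    major = int(match.group(1))
    # Legacy scheme: Java 8 and earlier report themselves as 1.8, 1.7, ...
    if major == 1 and match.group(2) is not None:
        return int(match.group(2))
    return major


def installed_java_major():
    """Return the major version of the `java` found on PATH, or None."""
    if shutil.which("java") is None:
        return None
    # `java -version` writes its banner to stderr rather than stdout.
    result = subprocess.run(["java", "-version"], capture_output=True, text=True)
    return parse_java_major(result.stderr or result.stdout)


if __name__ == "__main__":
    major = installed_java_major()
    if major is None:
        print("No usable java found on PATH")
    elif major >= 17:
        print(f"Java {major} found, which meets the 17+ requirement")
    else:
        print(f"Java {major} found, but PF-GAP needs 17+")
```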

📂 Repository Structure

PF-GAP/
├── PFGAP/                  # Java source code
├── docs/                   # Project feature and usage documentation
│   ├── custom_distances/   # Documentation and examples for custom Java distances
│   ├── demo/               # Demo scripts (converted from notebooks), toy data, example Maple/Python
│   └── *.md                # Markdown files for feature documentation
├── Application/
│   ├── PFGAP.jar           # Compiled Java executable
│   └── PF_wrapper.py       # Python interface to PFGAP.jar
└── README.md

⚡ Quickstart

Download the PFGAP.jar file. For convenience, also download PF_wrapper.py so that you can call the JAR from Python.

🚀 Usage

For more detailed descriptions, please refer to the documentation.

Training

Use PF_wrapper.train() to train a proximity forest:

import PF_wrapper as PF

PF.train(
    train_file="Data/GunPoint_TRAIN.tsv",
    model_name="Spartacus",
    return_proximities=True,
    output_directory="training_output",
    entry_separator="\t"
)

Prediction

Use PF_wrapper.predict() to evaluate a saved model on a test set:

PF.predict(
    model_name="training_output/Spartacus",
    testfile="Data/GunPoint_TEST.tsv",
    entry_separator="\t"
)

Imputation

PF-GAP supports iterative imputation for both training and test sets:

PF.train(
    train_file="Data/differentlengths.txt",
    test_file="Data/differentlengths_test.txt",
    train_labels="Data/differentlabels.txt",
    test_labels="Data/differentlabels_test.txt",
    impute_training_data=True,
    return_imputed_training=True,
    impute_testing_data=True,
    return_imputed_testing=True,
    impute_iterations=5,
    data_dimension=2,
    entry_separator=",",
    array_separator=":"
)

Custom Distances

You can define your own distance function in:

- Java: compile one or more .class or .jar files. See docs/custom_distances for more information.
- Python: PythonDistance.py with a function Distance(list1, list2)
- Maple: MapleDistance.mpl with a function Distance(list1, list2)

Specify the custom distance source using:

distances=["javadistance:customdistance.class"]

or

distances=["javadistance:userdistances.jar:customdistance"]

or

distances=["python"]  # or ["maple"]
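
As a sketch of the Python option, a `PythonDistance.py` could look like the following. Only the file name and the `Distance(list1, list2)` signature come from the description above. The Euclidean metric, the assumption that both arguments arrive as flat sequences of numbers, and the shared-prefix handling of unequal lengths are illustrative choices, not PF-GAP's documented behavior.

```python
# PythonDistance.py (illustrative sketch, not shipped with PF-GAP)
import math


def Distance(list1, list2):
    """Euclidean distance between two univariate series.

    Assumption: each argument is a flat sequence of numbers. Series of
    unequal length are compared over their shared prefix only, because
    zip() stops at the shorter input.
    """
    return math.sqrt(sum((float(a) - float(b)) ** 2
                         for a, b in zip(list1, list2)))


if __name__ == "__main__":
    print(Distance([0, 0], [3, 4]))  # 5.0
```

Comparing only the shared prefix keeps the function total on variable-length data. An elastic measure such as DTW would be a more natural choice for series of different lengths.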

📊 Demo Notebooks

| Demo | Description |
| --- | --- |
| `demo_gunpoint.py` | Classic PF classification on UCR GunPoint dataset |
| `demo_multi_impute.py` | Imputation on multivariate time series with missing values |
| `demo_load_japanese.py` | Large-scale multivariate classification with variable-length sequences |
| `demo_regression.py` | Time Series Extrinsic Regression on the FloodModeling1 dataset |

Example MDS Visualization

📄 Data Format

PF-GAP supports flexible input formats:

- UCR-style .tsv files (label + data in one file)
- Custom delimited files with:
  - entry_separator (e.g., ",", " ")
  - array_separator (e.g., ":" for 2D arrays)

For multivariate or 3D data, use data_dimension=2.
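
To make the UCR-style layout concrete, the sketch below writes a two-row toy file and reads it back, treating the first tab-separated field of each row as the label and the remaining fields as the series values. The function name `read_ucr_tsv` and the toy values are invented here. PF-GAP reads such files itself, so this is only meant for inspecting data before training.

```python
import csv
import os
import tempfile


def read_ucr_tsv(path, entry_separator="\t"):
    """Return (labels, series) from rows shaped like 'label<sep>v1<sep>v2...'."""
    labels, series = [], []
    with open(path, newline="") as handle:
        for row in csv.reader(handle, delimiter=entry_separator):
            if not row:
                continue  # skip blank lines
            labels.append(row[0])
            series.append([float(value) for value in row[1:]])
    return labels, series


# Toy file: two instances of length 3 with labels "1" and "2".
with tempfile.TemporaryDirectory() as tmp:
    toy_path = os.path.join(tmp, "toy_TRAIN.tsv")
    with open(toy_path, "w", newline="") as handle:
        handle.write("1\t0.5\t0.7\t0.9\n2\t-1.0\t0.0\t1.0\n")
    labels, series = read_ucr_tsv(toy_path)

print(labels)  # ['1', '2']
print(series)  # [[0.5, 0.7, 0.9], [-1.0, 0.0, 1.0]]
```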

📂 Output Files

Depending on options, PF-GAP may generate:

| File | Description |
| --- | --- |
| `Predictions.txt` | Predictions on test set |
| `Predictions_saved.txt` | Predictions from a saved model |
| `TrainingProximities.txt` | Proximity matrix for training set |
| `TestTrainProximities.txt` | Proximities between test and train |
| `outlier_scores.txt` | Intra-class outlier scores |
| `imputed_train.txt` | Imputed training data (if requested) |
| `imputed_test.txt` | Imputed test data (if requested) |

Use PF_wrapper.getArray(filename) to load proximity or outlier arrays.

🔹 Outlier Scores

Set return_training_outlier_scores=True to compute intra-class outlier scores for the training set.
These are saved to outlier_scores.txt in the output directory.
Use PF_wrapper.getArray(f"{output_directory}/outlier_scores.txt") to load them as a NumPy array.
Note that outlier scores are not supported for regression.
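
Once loaded, the scores can be ranked to shortlist suspicious training instances. This sketch uses a plain list in place of the loaded array, and assumes that a larger score means a more atypical instance within its class. That is the usual convention for forest-based outlier scores, but it is not stated above. `top_outliers` is a hypothetical helper.

```python
def top_outliers(scores, k=3):
    """Return the indices of the k largest scores, most outlying first."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order[:k]


# Toy scores standing in for the contents of outlier_scores.txt.
toy_scores = [0.8, 12.5, 1.1, 0.3, 7.9]
print(top_outliers(toy_scores, k=2))  # [1, 4]
```

The returned indices refer to row positions in the training file, so they can be used to pull out the corresponding series for manual inspection.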

🔹 Imputed Data

If impute_training_data=True and return_imputed_training=True, the imputed training set is saved to:

[output_directory]/[train_file].txt

Similarly, return_imputed_testing=True saves:

[output_directory]/[test_file].txt

These files preserve the original format and delimiters.

🔹 Proximity Matrices

If return_proximities=True, proximity matrices are saved to:
- TrainingProximities.txt (train vs. train)
- TestTrainProximities.txt (test vs. train)
These are used internally for imputation and outlier detection, but can also be used for:
- MDS or PHATE visualization
- Clustering
- Custom analysis

Load them with:

p = PF_wrapper.getArray(f"{output_directory}/TrainingProximities.txt")

or:

pt = PF_wrapper.getArray(f"{output_directory}/TestTrainProximities.txt")
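
MDS and most clustering routines expect dissimilarities rather than proximities. One common conversion, sketched below with nested lists standing in for the loaded training proximity array, symmetrizes the matrix, rescales it by its largest entry, and takes one minus the result. Whether PF-GAP's matrices are already symmetric or normalized is not stated here, so both steps are defensive assumptions, as is the requirement that proximities be non-negative. `proximities_to_distances` is a hypothetical name.

```python
def proximities_to_distances(prox):
    """Convert a square, non-negative proximity matrix to dissimilarities."""
    n = len(prox)
    # Average entries (i, j) and (j, i) so the result is symmetric.
    sym = [[(prox[i][j] + prox[j][i]) / 2.0 for j in range(n)]
           for i in range(n)]
    # Rescale by the largest entry so proximities land in [0, 1].
    peak = max((value for row in sym for value in row), default=0.0) or 1.0
    # High proximity becomes low distance; self-distance is forced to zero.
    return [[0.0 if i == j else 1.0 - sym[i][j] / peak for j in range(n)]
            for i in range(n)]


# Toy 3x3 matrix with a deliberately asymmetric pair (0.6 vs 0.2).
toy = [[1.0, 0.6, 0.0],
       [0.2, 1.0, 0.4],
       [0.0, 0.4, 1.0]]
distances = proximities_to_distances(toy)
for row in distances:
    print(row)
```

The resulting matrix can be handed to any embedding or clustering routine that accepts precomputed dissimilarities, for example the MDS implementation in scikit-learn.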

📖 Citation

If you use PF-GAP in your work, please cite the appropriate paper(s) from the following list:

Ben Shaw, Jake S. Rhodes, Soukaina Filali Boubrahimi, and Kevin R. Moon. Forest Proximities for Time Series, IntelliSys 2025
arXiv preprint

Ben Shaw, Adam Rustad, Sofia Pelagalli Maia, Jake S. Rhodes, and Kevin R. Moon. The Generalized Proximity Forest, ACDSA 2026
arXiv preprint
