This repository contains the PurPliance component that extracts privacy-statement tuples from policy sentences.
Download the NER model en_core_web_lg.high_f1_data_org.model.tar.xz
and extract it to src/oppnlp/analyze/pded/models/:
src/oppnlp/analyze/pded/models $ tar -xvf en_core_web_lg.high_f1_data_org.model.tar.xz

To run the code, use a virtual environment and run the test as follows:
# Create new conda python environment.
conda create -n purpliance_oss python=3.8
conda activate purpliance_oss
# Install the current package.
pip install -e .
pip install -r requirements.txt
# Test: extract privacy statements from each file in test/policies and output
# them as JSON files in test/policies/stmt.
bash test/test.sh
# Extract privacy statements from each file in $INPUT_DIR and output them as
# JSON files in $INPUT_DIR/stmt.
# $INPUT_DIR contains the plain-text, sentencized *.txt files to be analyzed.
# Each non-blank line in a file should contain exactly one sentence.
python src/runner/analyze/priv_stmt/run_priv_stmt_extractor.py $INPUT_DIR

The code was tested on Ubuntu 18.04 and macOS 12.6.
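
The extractor expects sentencized input: every non-blank line of each *.txt file in $INPUT_DIR holds exactly one sentence. If your policies are raw paragraphs, they need to be split first. The snippet below is a minimal sketch of such a preprocessing step and is not part of this repository: the function names `sentencize` and `write_sentencized` are hypothetical, and the regex boundary rule is a naive assumption that will missplit abbreviations (for example "e.g. Google"). A proper sentence segmenter would likely give better results.

```python
import re
from pathlib import Path
from typing import List

# Naive sentence boundary (an assumption for illustration only):
# sentence-ending punctuation, then whitespace, then an uppercase letter or digit.
_BOUNDARY = re.compile(r"(?<=[.!?])\s+(?=[A-Z0-9])")


def sentencize(text: str) -> List[str]:
    """Split raw text into a list of sentences using the naive rule above."""
    # Collapse all whitespace, including newlines, so that each sentence
    # ends up on a single line when written out.
    flat = " ".join(text.split())
    return [s for s in _BOUNDARY.split(flat) if s]


def write_sentencized(raw_path: Path, out_dir: Path) -> Path:
    """Write raw_path as a one-sentence-per-line *.txt file inside out_dir."""
    out_dir.mkdir(parents=True, exist_ok=True)
    out_path = out_dir / (raw_path.stem + ".txt")
    sentences = sentencize(raw_path.read_text(encoding="utf-8"))
    out_path.write_text("\n".join(sentences) + "\n", encoding="utf-8")
    return out_path
```

After writing the sentencized files into a directory, pass that directory as $INPUT_DIR to run_priv_stmt_extractor.py as shown above.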
PurPliance is licensed under the BSD-3-Clause License (see LICENSE.txt).
This repo uses code from the PolicyLint/PoliCheck repository.