Kiji Privacy Proxy

An intelligent privacy layer for AI APIs. Kiji automatically detects and masks personally identifiable information (PII) in requests to AI services, ensuring your sensitive data never leaves your control.

Built by 575 Lab - Dataiku's Open Source Office.

🎯 Why Kiji Privacy Proxy?

When you use AI services such as OpenAI or Anthropic, any sensitive data in your prompts is sent to external servers. Kiji addresses this with:

  • 🔒 Automatic PII Protection - ML-powered detection of 16+ PII types (emails, SSNs, credit cards, etc.)
  • 🎭 Seamless Masking - Replaces sensitive data with realistic dummy values before API calls
  • 🔄 Transparent Restoration - Restores original data in responses so your app works normally
  • 🚀 Zero Code Changes - Works as a transparent proxy with automatic configuration (PAC) on macOS
  • 🌐 Browser-Ready - Automatic proxy setup for Safari, Chrome - no environment variables needed
  • 🏃 Fast Local Inference - ONNX-optimized model runs locally, no external API calls
  • 💻 Easy to Use - Desktop app for macOS, standalone server for Linux
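
For readers unfamiliar with PAC: a Proxy Auto-Config file is a small JavaScript function, `FindProxyForURL(url, host)`, that the browser calls to decide whether a request goes through a proxy. The sketch below generates such a file from Python. The host list, the `127.0.0.1:8081` address (borrowed from the Quick Start exports), and the helper name `build_pac` are illustrative assumptions; the PAC file Kiji actually installs is not shown in this README and may differ.

```python
import json


def build_pac(hosts, proxy="127.0.0.1:8081"):
    """Return PAC text routing `hosts` through `proxy`; all else goes direct.

    Hypothetical helper for illustration, not part of Kiji.
    """
    # json.dumps yields a safely quoted JavaScript array literal.
    host_list = json.dumps(sorted(hosts))
    return (
        "function FindProxyForURL(url, host) {\n"
        f"  var proxied = {host_list};\n"
        "  for (var i = 0; i < proxied.length; i++) {\n"
        "    if (host === proxied[i]) {\n"
        f'      return "PROXY {proxy}";\n'
        "    }\n"
        "  }\n"
        '  return "DIRECT";\n'
        "}\n"
    )
```

Calling `build_pac(["api.openai.com"])` returns text that sends only that host through the proxy, which is why other browsing is unaffected.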

Use Cases:

  • Protect customer data when using ChatGPT for customer support
  • Sanitize logs before sending to AI for analysis
  • Comply with privacy regulations (GDPR, HIPAA, CCPA)
  • Prevent accidental data leaks in development/testing

⚡ Quick Start

For Users

macOS (Desktop App):

# Download from releases
# https://github.com/dataiku/kiji-proxy/releases

# Install
open Kiji-Privacy-Proxy-*.dmg
# Drag to Applications folder

Linux (Standalone Server):

# Download and extract
wget https://github.com/dataiku/kiji-proxy/releases/download/vX.Y.Z/kiji-privacy-proxy-X.Y.Z-linux-amd64.tar.gz
tar -xzf kiji-privacy-proxy-X.Y.Z-linux-amd64.tar.gz
cd kiji-privacy-proxy-X.Y.Z-linux-amd64

# Run
./run.sh

Test It:

macOS (with automatic PAC):

# Start with sudo for automatic browser configuration
sudo "/Applications/Kiji Privacy Proxy.app/Contents/MacOS/kiji-proxy"

# Open browser - requests to api.openai.com automatically go through proxy!
# No configuration needed for Safari/Chrome

# For CLI tools, set environment variables:
export OPENAI_API_KEY="sk-..."
export HTTP_PROXY=http://127.0.0.1:8081
export HTTPS_PROXY=http://127.0.0.1:8081

curl https://api.openai.com/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -d '{
    "model": "gpt-4",
    "messages": [{"role": "user", "content": "My email is john@example.com"}]
  }'

Linux (manual proxy configuration):

# Set environment variables
export OPENAI_API_KEY="sk-..."
export HTTP_PROXY=http://127.0.0.1:8081
export HTTPS_PROXY=http://127.0.0.1:8081

curl https://api.openai.com/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -d '{
    "model": "gpt-4",
    "messages": [{"role": "user", "content": "My email is john@example.com"}]
  }'

What happens:

# Check logs - "john@example.com" was masked before sending to OpenAI
# Response contains the original email (restored automatically)
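
The curl calls above can also be made from Python. The following is a minimal sketch using only the standard library; it builds the same request and an opener pointed at the proxy address from the exports. The function names are hypothetical, and the sketch assumes the proxy is listening on `127.0.0.1:8081` as in the examples.

```python
import json
import urllib.request

PROXY_URL = "http://127.0.0.1:8081"  # same address as the exports above
API_URL = "https://api.openai.com/v1/chat/completions"


def build_chat_request(api_key, content, model="gpt-4"):
    """Build the same POST request as the curl example, without sending it."""
    body = json.dumps(
        {"model": model, "messages": [{"role": "user", "content": content}]}
    ).encode("utf-8")
    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {api_key}",
    }
    return urllib.request.Request(API_URL, data=body, headers=headers, method="POST")


def build_proxied_opener(proxy_url=PROXY_URL):
    """Return an opener that routes HTTP and HTTPS traffic through the proxy."""
    handler = urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    return urllib.request.build_opener(handler)
```

To send the request, call `build_proxied_opener().open(request)` while the proxy is running. For the HTTPS call to succeed, Python must also trust the proxy's local MITM certificate; installing that certificate is outside the scope of this sketch.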

For Developers

# Clone and setup
git clone https://github.com/dataiku/kiji-proxy.git
cd kiji-proxy

# Install dependencies
make electron-install
make setup-onnx

# Run with debugger (VSCode)
# Press F5

# Or run directly
make electron

See full documentation: docs/README.md


✨ Key Features

  • 16+ PII Types Detected - Email, phone, SSN, credit cards, IP addresses, URLs, and more
  • ML-Powered - DistilBERT transformer model with ONNX Runtime (the model and dataset are listed under HuggingFace Models & Data)
  • Automatic Configuration - PAC (Proxy Auto-Config) for zero-setup browser integration on macOS
  • Real-Time Processing - Sub-100ms latency for most requests
  • Thread-Safe - Handles concurrent requests with isolated mappings
  • Desktop UI - Native Electron app for macOS with visual request monitoring
  • Production Ready - Systemd service, Docker support, comprehensive logging
  • Privacy First - All processing happens locally, no external dependencies
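
One way to read "isolated mappings" is that each request owns its own placeholder-to-original dictionary, so concurrent requests cannot restore each other's data. The sketch below illustrates that idea under loud assumptions: the regex detector, the placeholder format, and `handle_request` are hypothetical stand-ins, not Kiji's implementation.

```python
import re
from concurrent.futures import ThreadPoolExecutor

# Stand-in detector for illustration; the real proxy uses an ML model.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def handle_request(prompt):
    """Mask, fake an upstream call, and restore with a mapping local to this call."""
    mapping = {}  # created per call, so it is never shared between requests

    def mask(match):
        placeholder = f"user{len(mapping) + 1}@masked.invalid"
        mapping[placeholder] = match.group(0)
        return placeholder

    masked = EMAIL_RE.sub(mask, prompt)
    reply = f"echo: {masked}"  # stand-in for the real API response
    for placeholder, original in mapping.items():
        reply = reply.replace(placeholder, original)
    return masked, reply


def handle_many(prompts, workers=8):
    """Process prompts concurrently; each worker call gets its own mapping."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(handle_request, prompts))
```

Every request here reuses the placeholder `user1@masked.invalid` for a different original value, which is exactly the situation where a single shared mapping would restore the wrong data.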

📚 Documentation

Complete documentation is available in docs/README.md.


🤗 HuggingFace Models & Data

The PII detection model and training data are published on HuggingFace:

| Resource | Link |
|---|---|
| Quantized ONNX model | DataikuNLP/kiji-pii-model-onnx |
| Trained SafeTensors model | DataikuNLP/kiji-pii-model |
| Training dataset | DataikuNLP/kiji-pii-training-data |

You can train your own model or fine-tune the existing one. See Customizing the PII Model for the full workflow.
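
If you experiment with the model directly, and assuming it is a token classifier that emits one label per token, those labels have to be merged into entity spans before anything can be masked. The sketch below shows that merge step for BIO-style labels. The label names (`B-NAME`, `I-NAME`, `O`) and the token/label input shape are assumptions for illustration; check the model card for the label set the published model actually uses.

```python
def merge_bio(tokens, labels):
    """Merge per-token BIO labels into (entity_type, text) spans.

    Hypothetical helper; tokens are joined with spaces for simplicity.
    """
    spans = []
    current_type, current_tokens = None, []

    def flush():
        # Close the open span, if any, and reset the accumulator.
        nonlocal current_type, current_tokens
        if current_type is not None:
            spans.append((current_type, " ".join(current_tokens)))
        current_type, current_tokens = None, []

    for token, label in zip(tokens, labels):
        if label.startswith("B-"):
            flush()
            current_type, current_tokens = label[2:], [token]
        elif label.startswith("I-") and current_type == label[2:]:
            current_tokens.append(token)
        else:
            # "O", or an I- tag that does not continue the open span.
            flush()
    flush()
    return spans
```

For example, tokens `["Call", "John", "Smith"]` with labels `["O", "B-NAME", "I-NAME"]` merge into a single `NAME` span covering both name tokens.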


πŸ—οΈ Architecture

┌─────────────────┐    ┌────────────────────┐        ┌─────────────────┐
│  Your App/CLI   │───►│ Kiji Privacy Proxy │───────►│   OpenAI API    │
│                 │    │     (Port 8080)    │        │  (Masked Data)  │
│                 │◄───│    - Detect PII    │◄───────│                 │
│  Original Data  │    │    - Mask/Restore  │        │                 │
└─────────────────┘    └────────────────────┘        └─────────────────┘

What Happens:

  1. Your app sends a request to Kiji Privacy Proxy
  2. Kiji detects PII using the ML model
  3. PII is replaced with dummy data
  4. The request is forwarded to OpenAI (with masked data)
  5. The response is received and PII is restored
  6. The original-looking response is returned to your app
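
The steps above can be sketched end to end in a few lines. This is a simplified illustration, not the proxy's code: a regex stands in for the ML detector, only emails are handled, and `detect`, `mask`, `restore`, `proxy_roundtrip`, and the `person1@example.org` dummy format are all hypothetical names chosen for the example.

```python
import re
from dataclasses import dataclass

# Stand-in detector: the real proxy uses an ML model, not a regex.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


@dataclass
class Span:
    start: int
    end: int
    kind: str


def detect(text):
    """Step 2: locate PII spans (emails only in this sketch)."""
    return [Span(m.start(), m.end(), "EMAIL") for m in EMAIL_RE.finditer(text)]


def mask(text, spans):
    """Step 3: replace each span with a dummy and record dummy -> original."""
    mapping, reverse, pieces, cursor = {}, {}, [], 0
    for span in sorted(spans, key=lambda s: s.start):
        original = text[span.start:span.end]
        dummy = reverse.get(original)
        if dummy is None:  # repeated values reuse the same dummy
            dummy = f"person{len(mapping) + 1}@example.org"
            mapping[dummy] = original
            reverse[original] = dummy
        pieces.extend([text[cursor:span.start], dummy])
        cursor = span.end
    pieces.append(text[cursor:])
    return "".join(pieces), mapping


def restore(text, mapping):
    """Step 5: put the original values back into the upstream response."""
    for dummy, original in mapping.items():
        text = text.replace(dummy, original)
    return text


def proxy_roundtrip(prompt, send_upstream):
    """Steps 1-6, with `send_upstream` standing in for the call to the AI API."""
    masked, mapping = mask(prompt, detect(prompt))  # steps 2-3
    response = send_upstream(masked)                # step 4
    return restore(response, mapping)               # steps 5-6
```

Passing a fake `send_upstream` that echoes its input makes it easy to confirm that the upstream side only ever sees the dummy value while the caller gets the original back.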

🤝 Contributing

We welcome contributions! Here's how to help:

  1. Report Issues - Found a bug? Open an issue
  2. Submit PRs - See docs/02-development-guide.md for dev setup
  3. Improve Docs - Documentation PRs are always welcome
  4. Share Feedback - Start a discussion
  5. Join our Slack - Slack Community

Quick Contribution Guide:

# 1. Fork and clone
git clone https://github.com/YOUR-USERNAME/kiji-proxy.git

# 2. Create feature branch
git checkout -b feature/my-feature

# 3. Make changes and add changeset
cd src/frontend
npm run changeset

# 4. Test
make test-all
make check

# 5. Submit PR

See CONTRIBUTING.md for detailed guidelines.


💖 Support the Project

If you find Kiji useful, here's how you can support its development:

⭐ Star the Repository

Click the ⭐ button at the top of this page - it helps others discover the project!

πŸ› Report Issues & Request Features

Found a bug or have an idea? Open an issue

📝 Contribute Code or Documentation

Pull requests are welcome! See CONTRIBUTING.md for guidelines.

💬 Spread the Word

  • Share on Twitter/LinkedIn
  • Write a blog post about your experience
  • Present at meetups/conferences

🎓 Improve the ML Model

  • Contribute training data samples
  • Improve PII detection accuracy
  • Add support for new PII types

📚 Write Tutorials

  • Create video tutorials
  • Write integration guides
  • Share use cases and examples

Every contribution, big or small, makes a difference!


🧪 Development

Prerequisites

  • Go 1.21+ with CGO enabled
  • Node.js 20+
  • Python 3.13
  • Rust toolchain

Quick Setup

# Install dependencies
make electron-install

# Run with VSCode debugger (F5)
# Or run directly
make electron

Available Commands

make help              # Show all commands
make electron          # Build and run Electron app
make build-dmg         # Build macOS DMG
make build-linux       # Build Linux tarball
make test-all          # Run all tests
make check             # Code quality checks

See docs/02-development-guide.md for detailed development guide.


📦 Releases

Download the latest release from GitHub Releases:

  • macOS: Kiji-Privacy-Proxy-{version}.dmg (~400MB)
  • Linux: kiji-privacy-proxy-{version}-linux-amd64.tar.gz (~150MB)

Automated Builds: CI/CD builds both platforms in parallel on every release tag.

See docs/04-release-management.md for release process.


🔒 Security

Reporting Vulnerabilities:

Do not open public issues for security vulnerabilities.

Email: opensource@dataiku.com (or contact maintainers privately)

Security Features:

  • All processing happens locally
  • No external API calls for PII detection
  • Optional encrypted storage for mappings
  • MITM certificate for local use only

See docs/05-advanced-topics.md#security-best-practices for security guidelines.


📄 License

Copyright (c) 2026 Dataiku SAS

This project is licensed under the Apache 2.0 License - see the LICENSE file for details.



🙏 Acknowledgments

  • ONNX Runtime - Microsoft's cross-platform ML inference engine
  • HuggingFace - DistilBERT model and tokenizers
  • Electron - Cross-platform desktop framework
  • Go Community - Excellent libraries and tools

Made with ❤️ for privacy-conscious developers

GitHub • Issues • Discussions • Slack • Documentation