feat: add output_length_guard plugin #24

Open
msureshkumar88 wants to merge 1 commit into main from feat/output-length-guard-plugin

@msureshkumar88
Collaborator

Pull Request #3926: Rust Acceleration for Output Length Guard Plugin

PR Link: IBM/mcp-context-forge#3926

Status: CLOSED (Recreated in PR #4104)
Closed Date: April 9, 2026


📋 Overview

This pull request introduces a PyO3-based Rust execution engine for the output length guard plugin in the IBM/mcp-context-forge repository. The implementation creates a hybrid Python-Rust architecture that speeds up the benchmarked scenarios by roughly 3x to 19x while keeping the existing Python implementation as a fallback for backward compatibility.

Architecture Design

The hybrid approach divides responsibilities between Python and Rust:

Python Layer Handles:

  • Plugin lifecycle management
  • Hook integration with the framework
  • MCP content dictionary handling
  • Fallback behavior when Rust is unavailable

Rust Layer Handles:

  • High-performance string truncation
  • Recursive list/dict traversal
  • Violation detection
  • Passthrough optimization

The Rust engine exposes a high-level process() API that reduces Python-Rust boundary crossing to a single call per tool_post_invoke invocation, minimizing FFI overhead.


🎯 Problems Solved

Gap 1: No Rust Acceleration Path (MEDIUM Priority)

Problem:
The output length guard was the only post-invoke plugin without Rust optimization, creating a performance bottleneck in the plugin ecosystem.

Solution:
Introduced OutputLengthGuardEngine with automatic detection and graceful fallback to the Python implementation when the Rust module is unavailable.

Gap 2: O(n) Character Counting Performance (MEDIUM Priority)

Problem:
The initial Rust implementation was 124x slower than Python for large strings due to inefficient character counting.

Solution:
Implemented count_chars_capped() with:

  • Early-exit optimization at limit + 1
  • Byte-length fast path for ASCII strings
  • Zero-copy string borrowing
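
To make the early-exit idea concrete, here is a sketch of the counting logic written in Python over UTF-8 bytes, mirroring what a Rust function over a `&str` might do. The signature and return convention (returning `limit + 1` to mean "over the limit") are assumptions for illustration; the actual `count_chars_capped()` in `src/lib.rs` may differ.

```python
def count_chars_capped(data: bytes, limit: int) -> int:
    """Count UTF-8 characters in data, never counting past limit + 1.

    A result of limit + 1 means "over the limit"; the exact count is not needed.
    """
    cap = limit + 1

    # ASCII fast path: every character is one byte, so the byte length is the count.
    if data.isascii():
        return min(len(data), cap)

    count = 0
    for byte in data:
        # Continuation bytes look like 0b10xxxxxx; every other byte starts a character.
        if (byte & 0xC0) != 0x80:
            count += 1
            if count == cap:
                break  # early exit: the rest of the string is irrelevant
    return count
```

For a 10 KB string and a limit of 500, the loop inspects only about the first 501 characters instead of the whole payload.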

Gap 3: Per-item FFI Overhead (LOW Priority)

Problem:
Processing lists required crossing the Python-Rust boundary for each item, causing significant overhead.

Solution:
Added batch fast path for all-string lists in truncate mode, processing entire lists in a single Rust call.
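
The dispatch decision behind the batch fast path might look like the sketch below. It covers truncate mode only, and `truncate_batch` is a hypothetical stand-in for the single Rust call; here it is plain Python so the example runs on its own.

```python
from typing import Any, List


def truncate_one(text: str, max_chars: int, ellipsis: str = "…") -> str:
    """Truncate a single string; assumes the ellipsis counts toward max_chars."""
    if len(text) <= max_chars:
        return text
    return text[: max(max_chars - len(ellipsis), 0)] + ellipsis


def truncate_batch(items: List[str], max_chars: int) -> List[str]:
    """Stand-in for one call into Rust that handles the whole list."""
    return [truncate_one(item, max_chars) for item in items]


def process_list(items: List[Any], max_chars: int) -> List[Any]:
    # Fast path: an all-string list crosses the boundary once.
    if all(isinstance(item, str) for item in items):
        return truncate_batch(items, max_chars)
    # Slow path: mixed lists are handled item by item.
    return [
        truncate_one(item, max_chars) if isinstance(item, str) else item
        for item in items
    ]
```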


🚀 Performance Results

Comprehensive benchmarks (1000 iterations + 50 warmup, character mode, max_chars=500):

| Scenario | Python Time | Rust Time | Speedup |
|---|---|---|---|
| Short list passthrough (4 items) | 2.88 μs | 0.15 μs | 18.9x faster |
| Short string passthrough (11 chars) | 0.62 μs | 0.06 μs | 9.8x faster |
| Wide nested dict (d=2, b=20, 400 leaves) | 651 μs | 76 μs | 8.5x faster |
| Deep nested dict (d=5, b=3, 243 leaves) | 426 μs | 61 μs | 7.0x faster |
| Block mode (10 KB string) | 10.4 μs | 2.0 μs | 5.1x faster |
| List of 10 x 10KB strings | 105 μs | 35 μs | 3.0x faster |

Key Takeaways

  • Consistent speedups across all scenarios (3x - 19x)
  • Largest gains in passthrough scenarios (minimal processing)
  • Significant improvements for nested structures (7x - 8.5x)
  • Substantial gains even for large string processing (3x - 5x)
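
For readers who want to reproduce this kind of measurement, a timing harness with the same shape as the methodology above (50 warmup calls, then 1000 timed iterations) could look like this. It is a sketch, not the contents of `compare_performance.py`; reporting the mean per call is an assumption, since the PR does not say which statistic was used.

```python
import time
from typing import Callable


def bench(fn: Callable[[], object], iterations: int = 1000, warmup: int = 50) -> float:
    """Return the mean wall-clock time per call in microseconds."""
    for _ in range(warmup):
        fn()  # warm caches and lazy initialisation before timing
    start = time.perf_counter()
    for _ in range(iterations):
        fn()
    elapsed = time.perf_counter() - start
    return elapsed / iterations * 1e6


def speedup(baseline_us: float, candidate_us: float) -> float:
    """How many times faster the candidate is than the baseline."""
    return baseline_us / candidate_us
```

Applied to the last table row, `speedup(105, 35)` gives 3.0, matching the reported figure.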

⚡ Optimizations Implemented

  1. O(1) Python len() Pre-check

    • Skip Rust extraction for strings already under limit
    • Eliminates unnecessary FFI calls
  2. Zero-copy PyString::to_str() Borrow

    • Replaces full string copy with borrow
    • Reduces memory allocation overhead
  3. count_chars_capped() Early-exit

    • Stops counting at limit + 1
    • Avoids processing entire large strings
  4. byte_offset_of_char() Direct Slicing

    • Zero-copy truncation using byte offsets
    • Maintains UTF-8 character boundaries
  5. String::with_capacity() Pre-sized Allocation

    • Eliminates reallocation during string building
    • Improves memory efficiency
  6. Batch List Processing

    • Process all-string lists in one Rust call
    • Reduces FFI overhead significantly
  7. Numeric String Skip

    • Skip character counting for strings > 50 bytes
    • Optimizes common numeric data patterns
  8. MCP Content Dict Exclusion

    • Preserve Python-side logic for MCP structures
    • Maintains compatibility with framework
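
Optimization 4 relies on finding the byte offset at which a given character starts, so the string can be sliced without ever splitting a multi-byte character. A Python rendering of that idea over UTF-8 bytes is sketched below; the function name comes from the list above, but the signature and the `truncate_utf8` helper are assumptions. Note that a Python slice copies, whereas slicing a Rust `&str` at the same offset borrows.

```python
def byte_offset_of_char(data: bytes, char_index: int) -> int:
    """Byte offset where character number char_index (0-based) starts.

    Returns len(data) when the string has fewer characters than char_index.
    """
    seen = 0
    for offset, byte in enumerate(data):
        if (byte & 0xC0) != 0x80:  # not a continuation byte: a character starts here
            if seen == char_index:
                return offset
            seen += 1
    return len(data)


def truncate_utf8(data: bytes, max_chars: int) -> bytes:
    """Keep the first max_chars characters, cutting only on a character boundary."""
    return data[: byte_offset_of_char(data, max_chars)]
```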

✅ Test Results

Rust Tests

  • 47/47 unit tests passed
  • Clippy clean (no warnings) ✅
  • rustfmt clean (formatting verified) ✅

Python Tests

  • 331/331 tests passed
  • 1 expected skip (intentional)
  • 0 failures

Performance Verification

  • All benchmark scenarios show expected improvements ✅
  • No regressions detected ✅
  • Fallback behavior verified ✅

⚠️ Critical Issues Identified & Resolved

Issue 1: Broader Processing Scope

Problem:
Rust fast path processes more payload types than the original Python implementation:

  • Python only mutates dict["text"], list[str], and MCP text items
  • Rust can now truncate/block string-valued metadata (type, mimeType, IDs, URLs, annotations)

Impact:
Could potentially modify critical metadata fields that should remain intact.

Mitigation:
Added METADATA_KEYS list to explicitly preserve critical fields and maintain semantic integrity.
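
A recursive traversal that honours such a key list could be sketched as follows. The key set is hypothetical, derived only from the field kinds named above; the real METADATA_KEYS in the Rust source may contain different entries, and `guard_value` is an illustrative name.

```python
from typing import Any

# Hypothetical contents, based on the metadata fields mentioned in this issue.
METADATA_KEYS = frozenset({"type", "mimeType", "id", "url", "annotations"})


def guard_value(value: Any, max_chars: int, ellipsis: str = "…") -> Any:
    """Truncate string leaves recursively, leaving metadata keys untouched."""
    if isinstance(value, str):
        if len(value) <= max_chars:
            return value
        return value[: max(max_chars - len(ellipsis), 0)] + ellipsis
    if isinstance(value, list):
        return [guard_value(item, max_chars, ellipsis) for item in value]
    if isinstance(value, dict):
        return {
            # A metadata key preserves its whole subtree, not just a string value.
            key: item if key in METADATA_KEYS else guard_value(item, max_chars, ellipsis)
            for key, item in value.items()
        }
    return value
```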

Issue 2: Token-Mode Semantics Divergence

Problem:
Different behavior based on whether Rust module loads:

  • Python path: Ignores token limits for plain str/dict/list
  • Rust path: Enforces token bounds for ALL shapes
  • Same configuration produces different results depending on Rust availability

Impact:
Inconsistent behavior across environments could lead to unexpected truncation.

Status:
Documented for awareness; requires architectural decision on desired behavior.
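
One way to keep this kind of divergence visible while the decision is pending is a parity check that runs both paths over the same payloads. The sketch below uses toy stand-ins for the two paths (they are not the plugin's real functions, and the 8-character bound is arbitrary); only the comparison helper is the point.

```python
from typing import Any, Callable, Iterable, List, Tuple


def find_divergences(
    reference: Callable[[Any], Any],
    candidate: Callable[[Any], Any],
    payloads: Iterable[Any],
) -> List[Tuple[Any, Any, Any]]:
    """Return (payload, reference_result, candidate_result) for every mismatch."""
    mismatches = []
    for payload in payloads:
        expected = reference(payload)
        actual = candidate(payload)
        if expected != actual:
            mismatches.append((payload, expected, actual))
    return mismatches


# Toy stand-ins: one path ignores the limit for plain strings, the other enforces it.
def toy_python_path(payload: Any) -> Any:
    return payload


def toy_rust_path(payload: Any) -> Any:
    return payload[:8] if isinstance(payload, str) else payload
```

Running `find_divergences(toy_python_path, toy_rust_path, ["short", "a much longer string"])` reports only the second payload, which is the shape of report a real parity test would produce.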

Issue 3: Bug Fix - Structured Content Display

Problem:
When a structuredContent value was truncated, content[0].text showed only the truncated value instead of the full JSON representation.

Before:

content[0].text = "Helloasds…"  ❌

After:

content[0].text = "{\"message\":\"Helloasds…\"}"  ✅

Solution:
Removed single-key dict value extraction logic (lines 529-536 in src/lib.rs) to preserve full JSON context.
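
The intended "after" behaviour can be illustrated in Python: truncate the values, then serialize the whole structure into the text item. This is a sketch of the behaviour only; the actual fix lives in the Rust source, and the `render_structured` helper, the compact JSON separators, and the ellipsis budget are assumptions chosen to reproduce the example above.

```python
import json
from typing import Any, Dict


def truncate(text: str, max_chars: int, ellipsis: str = "…") -> str:
    """Assumes the ellipsis counts toward max_chars."""
    if len(text) <= max_chars:
        return text
    return text[: max(max_chars - len(ellipsis), 0)] + ellipsis


def render_structured(structured: Dict[str, Any], max_chars: int) -> Dict[str, Any]:
    truncated = {
        key: truncate(value, max_chars) if isinstance(value, str) else value
        for key, value in structured.items()
    }
    # Serialize the full dict, even when it has a single key, so the text item
    # keeps its JSON context instead of collapsing to the bare value.
    text = json.dumps(truncated, separators=(",", ":"), ensure_ascii=False)
    return {"structuredContent": truncated, "content": [{"type": "text", "text": text}]}
```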


📁 Files Changed

| File | Type | Description |
|---|---|---|
| plugins_rust/output_length_guard/Cargo.toml | New | Rust crate configuration with dependencies |
| plugins_rust/output_length_guard/pyproject.toml | New | Maturin build configuration for Python packaging |
| plugins_rust/output_length_guard/Makefile | New | Build, test, and install automation targets |
| plugins_rust/output_length_guard/src/lib.rs | New | Core Rust implementation (1,297 lines + 47 tests) |
| plugins_rust/output_length_guard/src/bin/stub_gen.rs | New | Python type stub generator for IDE support |
| plugins_rust/output_length_guard/compare_performance.py | New | Comprehensive benchmark script |
| plugins/output_length_guard/output_length_guard.py | Modified | +70/-1 lines (Rust integration layer) |

👥 Contributors

  • gandhipratik203 - PR Author & Primary Developer
  • msureshkumar88 (Suresh Kumar Moharajan) - Contributor
  • lucarlig - Reviewer (requested changes)
  • jonpspri - Maintainer (closed PR, recreated in #4104)

📝 Commit History

Total Commits: 28
Timeline: March 24 - April 9, 2026

Commit Categories:

  • Feature development and initial implementation
  • Performance optimizations and benchmarking
  • Bug fixes and edge case handling
  • Test coverage improvements
  • Documentation updates
  • Code review feedback integration

🔄 Next Steps

This PR was closed and the work was recreated in PR #4104 for a clean implementation history. The recreation allows for:

  • Clean commit history without experimental iterations
  • Incorporation of all review feedback
  • Proper documentation of final design decisions
  • Streamlined merge process

📚 Technical Details

Dependencies Added

  • PyO3 - Python-Rust FFI bindings
  • pyo3-build - Build-time Python integration
  • unicode-segmentation - Proper Unicode character handling

Build System

  • Maturin - Rust-Python package builder
  • uv - Fast Python package installer
  • Automated wheel building and installation

Testing Strategy

  • Unit tests for all Rust functions
  • Integration tests for Python-Rust boundary
  • Performance benchmarks for regression detection
  • Fallback behavior verification

🎓 Lessons Learned

  1. FFI Overhead Matters: Minimizing boundary crossings is critical for performance
  2. Early Optimization Pays Off: Fixing character counting removed the initial 124x slowdown relative to Python
  3. Batch Processing Wins: Processing collections in bulk reduces overhead significantly
  4. Fallback is Essential: Graceful degradation ensures reliability across environments
  5. Testing is Critical: Comprehensive test coverage caught semantic divergence issues

End of PR Description

For the latest updates and continued work, see PR #4104

Add new output_length_guard plugin with Rust+Python implementation.
Includes core library, build configuration, and performance comparison script.

Signed-off-by: Suresh Kumar Moharajan <suresh.kumar.m@ibm.com>