From 9d4dcbf1a4ff272f27e1a27e6b23ce2c507f7ac4 Mon Sep 17 00:00:00 2001
From: Koichi ITO
Date: Mon, 1 Jun 2026 21:45:26 +0900
Subject: [PATCH] Speed Up `Tool::Schema` Validation by 5x to 100x

## Motivation and Context

`MCP::Tool::Schema` previously used the pure-Ruby `json-schema` gem for both construction-time metaschema validation and runtime argument / result validation. For deep schemas this cost ~32-100ms per construction (see issue #364).

`json_schemer` (https://rubygems.org/gems/json_schemer, v2.5.0) is an actively maintained alternative that is much faster on this workload. Switching reduces cold-start warming pressure in horizontally scaled multi-tenant deployments (the scenario raised in #364) and brings runtime data validation down to sub-millisecond per call once the per-instance schemer is memoized.

The implementation preserves the legacy behavior of the `json-schema` based code.

Metaschema is pinned to draft-04 (`meta_schema:` option); the `$schema` dialect URI is still emitted in `to_h` as 2020-12 but runtime validation continues to use draft-04, exactly as before. Moving the runtime validator itself to 2020-12 is out of scope for this PR and warrants its own discussion.

`format` keywords are not enforced (`format: false` option), matching what `json-schema`'s draft-04 path did.

Malformed schemas (e.g. an invalid `pattern` regex) continue to surface as `ArgumentError, "Invalid JSON Schema: ..."`; the `RegexpError` `json_schemer` raises during construction is wrapped.

`Symbol` values in argument data are coerced to strings inside the internal `stringify` helper so they validate against `type: "string"` the same way they did before.

The public `Schema#schema` accessor is dropped. It was a refactor artifact from #198 (the consolidation of `InputSchema`'s `properties` / `required` readers into a single hash) with no callers in `lib/`; tests that needed the merged hash now read it through `to_h`. The accessor's only remaining role would have been to expose a mutation path that could desynchronize the memoized `@schemer`, which this change removes.

Runtime dependency delta: drops `json-schema` (and `addressable`), adds `json_schemer`, `hana`, `regexp_parser`, and `simpleidn`. Each added gem is a single-purpose library tied to JSON Schema spec compliance.

Closes #364.

## How Has This Been Tested?

Side-by-side benchmark on representative schemas (simple, with `$ref` / `additionalProperties`, nested, depth 20, depth 40): construction-time metaschema validation is 4.7x to ~100x faster; runtime data validation with the memoized schemer is sub-millisecond across all sizes.

Existing tests in the four touched test files had their stub targets, message assertions, or accessor reads updated to track the new validator while preserving the original test intent. The "unexpected errors bubble up" tests in `input_schema_test.rb` and `output_schema_test.rb` now stub `JSONSchemer::Schema#validate` instead of `JSON::Validator.fully_validate`. The cache tests in `schema_test.rb` now stub `JSONSchemer::Schema#validate_schema`. The two "detailed error message" tests in `tool_test.rb` switched from asserting `json-schema`'s exact wording to format-agnostic substrings (`"properties/count/minimum"`, `"number"`). The "required arguments are converted to strings" test in `input_schema_test.rb` now reads the result through `to_h[:required]` instead of the removed `schema` accessor.

Regression tests added for the legacy-behavior preservations above: `format` keyword is not enforced, invalid `pattern` raises `ArgumentError` (not `RegexpError`), and `Symbol` values validate against `type: "string"`.

## Breaking Changes

`Schema#schema` is no longer a public method. The accessor had no callers outside the test suite, but anyone reading the internal representation directly (rather than through `to_h`) will need to switch.
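
For callers affected by the accessor removal, the switch might look like the sketch below. `SketchSchema` is a hypothetical stand-in written for this description, not the SDK class; it mirrors only what the patch states: the constructor normalizes through a JSON round trip, and the hash is read through `to_h` rather than a public `schema` reader.

```ruby
require "json"

# Hypothetical stand-in for illustration only; this is not the SDK's `MCP::Tool::Schema`.
class SketchSchema
  def initialize(schema = {})
    # Same normalization as the constructor in this patch: a JSON round trip
    # turns Symbol values into strings and symbolizes the keys.
    @schema = JSON.parse(JSON.dump(schema), symbolize_names: true)
  end

  # The only public read path in this sketch. The real `to_h` also emits
  # `$schema`, which is left out here.
  def to_h
    @schema
  end
end

sketch = SketchSchema.new(properties: { message: { type: "string" } }, required: [:message])

# Before this change a caller might have written: sketch.schema[:required]
required = sketch.to_h[:required]
has_reader = sketch.respond_to?(:schema)
```

Here `required` holds the stringified names and `has_reader` is false, because the class no longer defines a `schema` reader.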
`Schema` / `InputSchema` / `OutputSchema` otherwise keep their constructor signatures, `to_h` output, `ValidationError` / `ArgumentError` raise points, and the `$schema` dialect emission. The validator error message wording inside `ArgumentError` and `ValidationError` now comes from `json_schemer` and still includes the JSON pointer path and a description of the mismatch.
---
 lib/mcp/tool/input_schema.rb        |  4 +-
 lib/mcp/tool/schema.rb              | 57 +++++++++++++++++++----------
 mcp.gemspec                         |  2 +-
 test/mcp/tool/input_schema_test.rb  | 30 ++++++++++++---
 test/mcp/tool/output_schema_test.rb |  8 ++--
 test/mcp/tool/schema_test.rb        | 14 +++----
 test/mcp/tool_test.rb               |  8 ++--
 7 files changed, 81 insertions(+), 42 deletions(-)

diff --git a/lib/mcp/tool/input_schema.rb b/lib/mcp/tool/input_schema.rb
index 724f9bbc..1fca775a 100644
--- a/lib/mcp/tool/input_schema.rb
+++ b/lib/mcp/tool/input_schema.rb
@@ -12,9 +12,9 @@ def missing_required_arguments?(arguments)
       end
 
       def missing_required_arguments(arguments)
-        return [] unless schema[:required].is_a?(Array)
+        return [] unless @schema[:required].is_a?(Array)
 
-        (schema[:required] - arguments.keys.map(&:to_s))
+        (@schema[:required] - arguments.keys.map(&:to_s))
       end
 
       def validate_arguments(arguments)
diff --git a/lib/mcp/tool/schema.rb b/lib/mcp/tool/schema.rb
index 820a7601..e71a8d84 100644
--- a/lib/mcp/tool/schema.rb
+++ b/lib/mcp/tool/schema.rb
@@ -1,7 +1,7 @@
 # frozen_string_literal: true
 
 require "digest"
-require "json-schema"
+require "json_schemer"
 
 module MCP
   class Tool
@@ -38,11 +38,10 @@ def clear
 
       # JSON Schema 2020-12 is the default dialect for MCP schema definitions
       # per MCP 2025-11-25 (SEP-1613). Note: emission only — runtime validation
-      # is still performed against the JSON Schema draft-04 metaschema because
-      # the `json-schema` gem does not yet support 2020-12.
+      # is still performed against the JSON Schema draft-04 metaschema.
       JSON_SCHEMA_2020_12_URI = "https://json-schema.org/draft/2020-12/schema"
 
-      attr_reader :schema
+      DRAFT4_META_SCHEMA_URI = "http://json-schema.org/draft-04/schema#"
 
       def initialize(schema = {})
         @schema = JSON.parse(JSON.dump(schema), symbolize_names: true)
@@ -51,7 +50,7 @@ def initialize(schema = {})
       end
 
       def ==(other)
-        other.is_a?(self.class) && schema == other.schema
+        other.is_a?(self.class) && @schema == other.instance_variable_get(:@schema)
       end
 
       def to_h
@@ -62,8 +61,38 @@ def to_h
 
       private
 
+      def stringify(obj)
+        case obj
+        when Hash
+          obj.each_with_object({}) { |(k, v), h| h[k.to_s] = stringify(v) }
+        when Array
+          obj.map { |v| stringify(v) }
+        when Symbol
+          obj.to_s
+        else
+          obj
+        end
+      end
+
+      # Lazily built so a cache hit in `validate_schema!` avoids the schemer construction cost.
+      # Memoized per Schema instance because schema content is fixed at construction,
+      # so the compiled schemer is reusable across many `fully_validate` calls.
+      #
+      # `format: false` preserves the legacy behavior of the previous `json-schema` based implementation,
+      # which did not enforce `format` keywords. `RegexpError` from a malformed `pattern` is re-raised as
+      # `ArgumentError` so callers see the same exception class they used to.
+      def schemer
+        @schemer ||= JSONSchemer.schema(
+          stringify(schema_for_validation),
+          meta_schema: DRAFT4_META_SCHEMA_URI,
+          format: false,
+        )
+      rescue RegexpError => e
+        raise ArgumentError, "Invalid JSON Schema: #{e.message}"
+      end
+
       def fully_validate(data)
-        JSON::Validator.fully_validate(schema_for_validation, data)
+        schemer.validate(stringify(data)).map { |validation_error| validation_error.fetch("error") }
       end
 
       def validate_schema!
@@ -75,16 +104,7 @@ def validate_schema!
         key = Digest::SHA256.hexdigest(JSON.generate(target, max_nesting: false))
         return if VALIDATION_CACHE.validated?(key)
 
-        gem_path = File.realpath(Gem.loaded_specs["json-schema"].full_gem_path)
-        schema_reader = JSON::Schema::Reader.new(
-          accept_uri: false,
-          accept_file: ->(path) { File.realpath(path.to_s).start_with?(gem_path) },
-        )
-        metaschema_path = Pathname.new(JSON::Validator.validator_for_name("draft4").metaschema)
-        # Converts metaschema to a file URI for cross-platform compatibility
-        metaschema_uri = JSON::Util::URI.file_uri(metaschema_path.expand_path.cleanpath.to_s.tr("\\", "/"))
-        metaschema = metaschema_uri.to_s
-        errors = JSON::Validator.fully_validate(metaschema, target, schema_reader: schema_reader)
+        errors = schemer.validate_schema.map { |validation_error| validation_error.fetch("error") }
         if errors.any?
           raise ArgumentError, "Invalid JSON Schema: #{errors.join(", ")}"
         end
@@ -92,9 +112,8 @@ def validate_schema!
         VALIDATION_CACHE.store(key)
       end
 
-      # The `json-schema` gem's draft-04 validator cannot resolve newer or unknown `$schema`
-      # dialect URIs. Strip the top-level `$schema` before validation so a dialect URI
-      # (whether SDK-injected by `to_h` or user-supplied) does not break the validator.
+      # `json_schemer` is pinned to the draft-04 metaschema, so strip top-level `$schema` before validation:
+      # this preserves the legacy behavior of ignoring the advertised dialect URI when the SDK validates schemas.
       def schema_for_validation
         return @schema unless @schema.key?(:"$schema")
 
diff --git a/mcp.gemspec b/mcp.gemspec
index 4c44f498..9c4d6021 100644
--- a/mcp.gemspec
+++ b/mcp.gemspec
@@ -30,5 +30,5 @@ Gem::Specification.new do |spec|
   spec.executables = spec.files.grep(%r{^exe/}) { |f| File.basename(f) }
   spec.require_paths = ["lib"]
 
-  spec.add_dependency("json-schema", ">= 4.1")
+  spec.add_dependency("json_schemer", ">= 2.4")
 end
diff --git a/test/mcp/tool/input_schema_test.rb b/test/mcp/tool/input_schema_test.rb
index 80a57828..972c843a 100644
--- a/test/mcp/tool/input_schema_test.rb
+++ b/test/mcp/tool/input_schema_test.rb
@@ -7,7 +7,7 @@ class Tool
     class InputSchemaTest < ActiveSupport::TestCase
       test "required arguments are converted to strings" do
         input_schema = InputSchema.new(properties: { message: { type: "string" } }, required: [:message])
-        assert_equal ["message"], input_schema.schema[:required]
+        assert_equal ["message"], input_schema.to_h[:required]
       end
 
       test "to_h returns a hash representation of the input schema" do
@@ -139,10 +139,10 @@ class InputSchemaTest < ActiveSupport::TestCase
       test "unexpected errors bubble up from validate_arguments" do
         schema = InputSchema.new(properties: { foo: { type: "string" } }, required: ["foo"])
 
-        JSON::Validator.stub(:fully_validate, ->(*) { raise "unexpected error" }) do
-          assert_raises(RuntimeError) do
-            schema.validate_arguments({ foo: "bar" })
-          end
+        JSONSchemer::Schema.any_instance.stubs(:validate).raises("unexpected error")
+
+        assert_raises(RuntimeError) do
+          schema.validate_arguments(foo: "bar")
         end
       end
 
@@ -200,6 +200,26 @@ class InputSchemaTest < ActiveSupport::TestCase
         schema6 = InputSchema.new(properties: { foo: { type: "string" } }, required: ["foo"], additionalProperties: false)
         refute_equal schema1, schema6
       end
+
+      test "format keyword is not enforced (legacy behavior)" do
+        schema = InputSchema.new(
+          properties: { email: { type: "string", format: "email" } },
+          required: ["email"],
+        )
+        assert_nil(schema.validate_arguments(email: "not_an_email"))
+      end
+
+      test "invalid pattern raises ArgumentError, not RegexpError" do
+        error = assert_raises(ArgumentError) do
+          InputSchema.new(properties: { id: { type: "string", pattern: "[" } })
+        end
+        assert_includes error.message, "Invalid JSON Schema"
+      end
+
+      test "Symbol values in arguments are treated as strings" do
+        schema = InputSchema.new(properties: { foo: { type: "string" } }, required: ["foo"])
+        assert_nil(schema.validate_arguments(foo: :bar))
+      end
     end
   end
 end
diff --git a/test/mcp/tool/output_schema_test.rb b/test/mcp/tool/output_schema_test.rb
index 073de2c6..5403f008 100644
--- a/test/mcp/tool/output_schema_test.rb
+++ b/test/mcp/tool/output_schema_test.rb
@@ -110,10 +110,10 @@ class OutputSchemaTest < ActiveSupport::TestCase
       test "unexpected errors bubble up from validate_result" do
         schema = OutputSchema.new(properties: { foo: { type: "string" } }, required: ["foo"])
 
-        JSON::Validator.stub(:fully_validate, ->(*) { raise "unexpected error" }) do
-          assert_raises(RuntimeError) do
-            schema.validate_result({ foo: "bar" })
-          end
+        JSONSchemer::Schema.any_instance.stubs(:validate).raises("unexpected error")
+
+        assert_raises(RuntimeError) do
+          schema.validate_result(foo: "bar")
         end
       end
 
diff --git a/test/mcp/tool/schema_test.rb b/test/mcp/tool/schema_test.rb
index 48d93d80..edc90e30 100644
--- a/test/mcp/tool/schema_test.rb
+++ b/test/mcp/tool/schema_test.rb
@@ -10,7 +10,7 @@ class SchemaTest < ActiveSupport::TestCase
       end
 
       test "validates a schema once and reuses the result for identical schemas" do
-        JSON::Validator.expects(:fully_validate).once.returns([])
+        JSONSchemer::Schema.any_instance.expects(:validate_schema).once.returns([])
 
         schema = { properties: { validates_once: { type: "string" } } }
         InputSchema.new(schema)
@@ -18,7 +18,7 @@ class SchemaTest < ActiveSupport::TestCase
       end
 
       test "validates distinct schemas separately" do
-        JSON::Validator.expects(:fully_validate).twice.returns([])
+        JSONSchemer::Schema.any_instance.expects(:validate_schema).twice.returns([])
 
         InputSchema.new(properties: { distinct_a: { type: "string" } })
         InputSchema.new(properties: { distinct_b: { type: "string" } })
@@ -64,11 +64,11 @@ class SchemaTest < ActiveSupport::TestCase
           break
         end
 
-        JSON::Validator.stub(:fully_validate, []) do
-          assert_nothing_raised do
-            InputSchema.new(schema)
-            InputSchema.new(schema)
-          end
+        JSONSchemer::Schema.any_instance.stubs(:validate_schema).returns([])
+
+        assert_nothing_raised do
+          InputSchema.new(schema)
+          InputSchema.new(schema)
         end
       end
 
diff --git a/test/mcp/tool_test.rb b/test/mcp/tool_test.rb
index e32df0a9..bdec7b6a 100644
--- a/test/mcp/tool_test.rb
+++ b/test/mcp/tool_test.rb
@@ -167,8 +167,8 @@ class InputSchemaTool < Tool
       end
 
       assert_includes error.message, "Invalid JSON Schema"
-      assert_includes error.message, "#/properties/count/minimum"
-      assert_includes error.message, "string did not match the following type: number"
+      assert_includes error.message, "properties/count/minimum"
+      assert_includes error.message, "number"
     end
 
     test ".define allows definition of simple tools with a block" do
@@ -431,8 +431,8 @@ class OutputSchemaObjectTool < Tool
       end
 
       assert_includes error.message, "Invalid JSON Schema"
-      assert_includes error.message, "#/properties/count/minimum"
-      assert_includes error.message, "string did not match the following type: number"
+      assert_includes error.message, "properties/count/minimum"
+      assert_includes error.message, "number"
     end
 
     test "output_schema accepts $ref in schema" do