Google conversational search tracer drops structuredDocumentInfo references from AnswerQueryResponse #35

@glaziermag

Summary

GoogleConversationalSearchTracer appears to drop valid Discovery Engine / Vertex AI Search AnswerQueryResponse references when the reference content is structured_document_info.

The tracer extracts citations, references, and related questions from answer_query responses and adds them to the OpenLayer trace. However, extract_references only handles reference.chunk_info and skips references where chunk_info is nil.

For structured data store Answer API responses, valid references can use structured_document_info, so those references are omitted from the trace.

Relevant code

In lib/openlayer/integrations/google_conversational_search_tracer.rb, extract_references currently does:

chunk_info = safe_extract(reference, :chunk_info)
next nil if chunk_info.nil?

It then extracts content/document metadata only from chunk_info.

Why this matters

In a corpus of 500 real direct REST servingConfigs:answer responses, the tracer dropped structuredDocumentInfo references in 497 cases.

A minimized offline repro shows:

{
  "source_references": 1,
  "traced_references": 0,
  "traced_context": 0,
  "traced_citations": 1,
  "traced_related_questions": 1
}

At the same time, the tracer preserved answer text, state, citations, related questions, and skipped reasons, so the loss appears specific to structured-document references.

The Discovery Engine v1 generated schema/proto treats this as a valid reference variant. Answer.Reference.content is a oneof with variants including unstructured_document_info, chunk_info, and structured_document_info.

Expected behavior

The tracer should preserve valid reference variants, including structured_document_info, when building trace references/context.

A structured-document reference could be mapped using fields such as:

structured_document_info.document
structured_document_info.uri
structured_document_info.title
structured_document_info.struct_data
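
A minimal sketch of that mapping, under stated assumptions: `read_field` and `map_structured_document_info` are hypothetical names, and the output keys are illustrative rather than the tracer's actual reference schema. `struct_data` is passed through unchanged; a real implementation might convert it to a plain Hash first.

```ruby
# Hypothetical helper (not part of the tracer): reads a field from a
# duck-typed object or a Hash and returns nil when the field is absent.
def read_field(obj, name)
  return nil if obj.nil?
  return obj[name] if obj.is_a?(Hash)

  obj.respond_to?(name) ? obj.public_send(name) : nil
end

# Maps a structured_document_info value to a plain Hash.
# The keys below are assumptions chosen for illustration.
def map_structured_document_info(info)
  return nil if info.nil?

  {
    type: "structured_document",
    document: read_field(info, :document),
    uri: read_field(info, :uri),
    title: read_field(info, :title),
    struct_data: read_field(info, :struct_data) # passed through as-is
  }
end

# Usage with a stand-in object shaped like the fixture in the repro.
info_type = Struct.new(:document, :uri, :title, :struct_data, keyword_init: true)
sample = info_type.new(
  uri: "https://example.test/doc",
  title: "Example structured document",
  struct_data: {"kind" => "example"}
)
p map_structured_document_info(sample)
```

Returning nil for a nil input keeps the helper usable inside a `filter_map`, so references without this variant can still be dropped or handled by another branch.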

Actual behavior

References with structured_document_info are skipped because extract_references only accepts references with chunk_info.

This can also make the trace internally inconsistent: metadata can include a non-zero references_count, while the emitted step has no usable references and no context.
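
One way to surface that inconsistency in a test is a small check over the emitted step. This is only a sketch: the location of `references_count` under `step[:metadata]` is an assumption, and `reference_inconsistencies` is a hypothetical helper, not something the tracer provides.

```ruby
# Returns a list of human-readable problems for one traced step.
# Assumes (hypothetically) that the step is a Hash with :references and a
# :metadata Hash that carries :references_count.
def reference_inconsistencies(step, context)
  declared = step.dig(:metadata, :references_count).to_i
  emitted = Array(step[:references]).length
  problems = []

  if declared != emitted
    problems << "references_count is #{declared} but #{emitted} references were emitted"
  end
  if declared.positive? && Array(context).empty?
    problems << "references_count is #{declared} but context is empty"
  end

  problems
end

# Usage: a step shaped like the failing case described above.
step = {references: [], metadata: {references_count: 1}}
puts reference_inconsistencies(step, [])
```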

Reproduction

No live Google calls are required. This repro uses a sanitized saved direct REST AnswerQueryResponse shape and a tiny duck-typed object adapter so the tracer sees Ruby-style methods such as answer_text, structured_document_info, and related_questions.

Save this as repro.rb and run it from a checkout of this repo with:

TRACER_PATH=lib/openlayer/integrations/google_conversational_search_tracer.rb ruby repro.rb

require "json"
require "time"
require File.expand_path(ENV.fetch("TRACER_PATH"))

fixture = {
  "answer" => {
    "answerText" => "Example answer with a structured reference.",
    "state" => "SUCCEEDED",
    "citations" => [
      {"endIndex" => "39", "sources" => [{"referenceId" => "0"}]}
    ],
    "references" => [
      {
        "structuredDocumentInfo" => {
          "document" => "projects/example/locations/global/collections/default_collection/dataStores/example/branches/0/documents/example-doc",
          "uri" => "https://example.test/doc",
          "title" => "Example structured document",
          "structData" => {"kind" => "example"}
        }
      }
    ],
    "relatedQuestions" => ["Example related question?"],
    "answerSkippedReasons" => []
  }
}

def snake_key(key)
  key.to_s
    .gsub(/([A-Z]+)([A-Z][a-z])/, "\\1_\\2")
    .gsub(/([a-z\d])([A-Z])/, "\\1_\\2")
    .downcase
    .to_sym
end

class DynamicObject
  def initialize(hash)
    @hash = hash
  end

  def method_missing(name, *args)
    return @hash[name] if args.empty? && @hash.key?(name)
    super
  end

  def respond_to_missing?(name, include_private = false)
    @hash.key?(name) || super
  end
end

def objectify(value)
  case value
  when Hash
    DynamicObject.new(value.each_with_object({}) { |(k, v), h| h[snake_key(k)] = objectify(v) })
  when Array
    value.map { |v| objectify(v) }
  else
    value
  end
end

class FakeData
  attr_reader :records

  def initialize
    @records = []
  end

  def stream(_pipeline_id, **trace_data)
    @records << trace_data
  end
end

class FakePipelines
  attr_reader :data

  def initialize(data)
    @data = data
  end
end

class FakeClient
  attr_reader :inference_pipelines

  def initialize(data)
    @inference_pipelines = FakePipelines.new(data)
  end
end

data = FakeData.new

Openlayer::Integrations::GoogleConversationalSearchTracer.send_trace(
  args: [],
  kwargs: {query: {text: "offline repro"}},
  response: objectify(fixture),
  start_time: Time.at(1_700_000_000),
  end_time: Time.at(1_700_000_001),
  openlayer_client: FakeClient.new(data),
  inference_pipeline_id: "offline"
)

step = data.records[0][:rows][0][:steps][0]
puts JSON.pretty_generate(
  source_references: fixture["answer"]["references"].length,
  traced_references: Array(step[:references]).length,
  traced_context: Array(data.records[0][:rows][0][:context]).length,
  traced_citations: Array(step[:citations]).length,
  traced_related_questions: Array(step[:relatedQuestions]).length
)

On the current tracer, the output is:

{
  "source_references": 1,
  "traced_references": 0,
  "traced_context": 0,
  "traced_citations": 1,
  "traced_related_questions": 1
}

Notes

I am not claiming the Google API response is wrong. The direct REST response is treated as ground truth. The issue is that the tracer drops a valid reference variant while attempting to extract Answer API grounding/reference metadata.

A possible fix would be to branch inside extract_references for structured_document_info and, possibly, unstructured_document_info, instead of returning nil whenever chunk_info is absent.
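
A rough sketch of that branching, not a patch against the actual method: the `safe_extract` below is a stand-in written for this example, the returned Hash shape is an assumption, and the nesting of document fields under `chunk_info.document_metadata` reflects my reading of the v1 proto and should be verified against the generated classes.

```ruby
# Stand-in for the tracer's safe_extract helper (assumed behavior:
# return nil instead of raising when a field is missing).
def safe_extract(obj, name)
  obj.respond_to?(name) ? obj.public_send(name) : nil
rescue StandardError
  nil
end

# Sketch of extract_references that accepts any populated content variant
# instead of returning nil whenever chunk_info is absent.
def extract_references(references)
  # Content variants of Answer.Reference, in the order they are tried.
  variants = %i[chunk_info structured_document_info unstructured_document_info]

  Array(references).filter_map do |reference|
    variant = variants.find { |name| !safe_extract(reference, name).nil? }
    next nil if variant.nil? # no known variant populated, so skip it

    info = safe_extract(reference, variant)
    # Assumption: chunk_info nests document fields under document_metadata,
    # while the two *_document_info variants carry them directly.
    meta = variant == :chunk_info ? safe_extract(info, :document_metadata) : info

    {
      variant: variant,
      document: safe_extract(meta, :document),
      uri: safe_extract(meta, :uri),
      title: safe_extract(meta, :title)
    }
  end
end
```

With the `objectify(fixture)` response from the repro above, `extract_references(response.answer.references)` would yield one entry tagged `:structured_document_info` rather than an empty array.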
