Summary
GoogleConversationalSearchTracer appears to drop valid Discovery Engine / Vertex AI Search AnswerQueryResponse references when the reference content is structured_document_info.
The tracer extracts citations, references, and related questions from answer_query responses and adds them to the OpenLayer trace. However, extract_references only handles reference.chunk_info and skips references where chunk_info is nil.
For Answer API responses from structured data stores, valid references can use structured_document_info instead of chunk_info, so those references are omitted from the trace.
Relevant code
In lib/openlayer/integrations/google_conversational_search_tracer.rb, extract_references currently does:
```ruby
chunk_info = safe_extract(reference, :chunk_info)
next nil if chunk_info.nil?
```
It then extracts content/document metadata only from chunk_info.
Why this matters
In a corpus of 500 real direct REST servingConfigs:answer responses, the tracer dropped the structuredDocumentInfo references in 497 of them.
A minimized offline repro shows:
```json
{
  "source_references": 1,
  "traced_references": 0,
  "traced_context": 0,
  "traced_citations": 1,
  "traced_related_questions": 1
}
```
At the same time, the tracer preserved answer text, state, citations, related questions, and skipped reasons, so the loss appears specific to structured-document references.
The Discovery Engine v1 generated schema/proto treats this as a valid reference variant. Answer.Reference.content is a oneof with variants including unstructured_document_info, chunk_info, and structured_document_info.
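Because content is a oneof, a consumer has to check which variant is populated rather than assume chunk_info. A minimal sketch, assuming each variant is exposed as a snake_case reader that returns nil when unset (as in the duck-typed adapter used in the repro); REFERENCE_VARIANTS and reference_variant are hypothetical names, not part of the tracer:

```ruby
# Hypothetical list of Answer.Reference content variants, checked in order.
REFERENCE_VARIANTS = %i[chunk_info structured_document_info unstructured_document_info].freeze

# Returns [variant_name, value] for the first populated variant, or nil.
# Assumes an unset variant reader returns nil.
def reference_variant(reference)
  REFERENCE_VARIANTS.each do |name|
    next unless reference.respond_to?(name)

    value = reference.public_send(name)
    return [name, value] unless value.nil?
  end
  nil
end

# Example: a reference that only carries structured_document_info.
ExampleReference = Struct.new(:chunk_info, :structured_document_info, :unstructured_document_info)
ref = ExampleReference.new(nil, {title: "Doc"}, nil)
reference_variant(ref) # => [:structured_document_info, {title: "Doc"}]
```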
Expected behavior
The tracer should preserve valid reference variants, including structured_document_info, when building trace references/context.
A structured-document reference could be mapped using fields such as:
structured_document_info.document
structured_document_info.uri
structured_document_info.title
structured_document_info.struct_data
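A minimal sketch of such a mapping, assuming a plain Hash output and that struct_data is serialized to JSON so it can serve as reference/context text. map_structured_reference and the output keys are hypothetical and would need to match whatever shape the tracer already emits for chunk_info references; a Google::Protobuf::Struct value would need converting to a Hash (the sketch calls to_h for that reason):

```ruby
require "json"

# Hypothetical mapper from a structured_document_info value to a trace-friendly Hash.
# Uses reader methods, so it works with duck-typed objects; struct_data is assumed Hash-like.
def map_structured_reference(info)
  return nil if info.nil?

  struct_data = info.respond_to?(:struct_data) ? info.struct_data : nil
  {
    document_id: info.document,
    uri: info.uri,
    title: info.title,
    # Serialize the structured fields so they can be used as textual context.
    content: struct_data.nil? ? nil : JSON.generate(struct_data.to_h)
  }
end

StructuredInfo = Struct.new(:document, :uri, :title, :struct_data)
info = StructuredInfo.new("projects/example/documents/example-doc",
                          "https://example.test/doc",
                          "Example structured document",
                          {"kind" => "example"})
map_structured_reference(info)[:content] # => "{\"kind\":\"example\"}"
```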
Actual behavior
References with structured_document_info are skipped because extract_references only accepts references with chunk_info.
This can also make the trace internally inconsistent: metadata can include a non-zero references_count, while the emitted step has no usable references and no context.
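A regression test could catch this mismatch by comparing the reported count against what the step actually carries. A minimal sketch, assuming the step is a Hash with an optional :references array (the shape the repro reads) and that the caller already has the references_count value; references_consistent? is a hypothetical helper:

```ruby
# Hypothetical check: the reported reference count should match the
# number of references actually emitted on the step.
def references_consistent?(references_count, step)
  emitted = Array(step[:references]).length
  references_count.to_i == emitted
end

references_consistent?(1, {references: []})               # => false (the reported bug)
references_consistent?(1, {references: [{title: "Doc"}]})  # => true
```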
Reproduction
No live Google calls are required. This repro uses a sanitized saved direct REST AnswerQueryResponse shape and a tiny duck-typed object adapter so the tracer sees Ruby-style methods such as answer_text, structured_document_info, and related_questions.
Save this as repro.rb and run it from a checkout of this repo with:
```sh
TRACER_PATH=lib/openlayer/integrations/google_conversational_search_tracer.rb ruby repro.rb
```

```ruby
require "json"
require "time"
require File.expand_path(ENV.fetch("TRACER_PATH"))

# Sanitized direct REST AnswerQueryResponse with one structured-document reference.
fixture = {
  "answer" => {
    "answerText" => "Example answer with a structured reference.",
    "state" => "SUCCEEDED",
    "citations" => [
      {"endIndex" => "39", "sources" => [{"referenceId" => "0"}]}
    ],
    "references" => [
      {
        "structuredDocumentInfo" => {
          "document" => "projects/example/locations/global/collections/default_collection/dataStores/example/branches/0/documents/example-doc",
          "uri" => "https://example.test/doc",
          "title" => "Example structured document",
          "structData" => {"kind" => "example"}
        }
      }
    ],
    "relatedQuestions" => ["Example related question?"],
    "answerSkippedReasons" => []
  }
}

# Convert REST camelCase keys (e.g. "answerText") to snake_case symbols.
def snake_key(key)
  key.to_s
    .gsub(/([A-Z]+)([A-Z][a-z])/, "\\1_\\2")
    .gsub(/([a-z\d])([A-Z])/, "\\1_\\2")
    .downcase
    .to_sym
end

# Duck-typed adapter that exposes hash keys as reader methods.
class DynamicObject
  def initialize(hash)
    @hash = hash
  end

  def method_missing(name, *args)
    return @hash[name] if args.empty? && @hash.key?(name)
    super
  end

  def respond_to_missing?(name, include_private = false)
    @hash.key?(name) || super
  end
end

# Recursively wrap hashes so nested fields are reachable as methods.
def objectify(value)
  case value
  when Hash
    DynamicObject.new(value.each_with_object({}) { |(k, v), h| h[snake_key(k)] = objectify(v) })
  when Array
    value.map { |v| objectify(v) }
  else
    value
  end
end

# Fake Openlayer client that records streamed trace data instead of sending it.
class FakeData
  attr_reader :records

  def initialize
    @records = []
  end

  def stream(_pipeline_id, **trace_data)
    @records << trace_data
  end
end

class FakePipelines
  attr_reader :data

  def initialize(data)
    @data = data
  end
end

class FakeClient
  attr_reader :inference_pipelines

  def initialize(data)
    @inference_pipelines = FakePipelines.new(data)
  end
end

data = FakeData.new
Openlayer::Integrations::GoogleConversationalSearchTracer.send_trace(
  args: [],
  kwargs: {query: {text: "offline repro"}},
  response: objectify(fixture),
  start_time: Time.at(1_700_000_000),
  end_time: Time.at(1_700_000_001),
  openlayer_client: FakeClient.new(data),
  inference_pipeline_id: "offline"
)

# Compare what went in with what the tracer emitted.
step = data.records[0][:rows][0][:steps][0]
puts JSON.pretty_generate(
  source_references: fixture["answer"]["references"].length,
  traced_references: Array(step[:references]).length,
  traced_context: Array(data.records[0][:rows][0][:context]).length,
  traced_citations: Array(step[:citations]).length,
  traced_related_questions: Array(step[:relatedQuestions]).length
)
```
On the current tracer, the output is:
```json
{
  "source_references": 1,
  "traced_references": 0,
  "traced_context": 0,
  "traced_citations": 1,
  "traced_related_questions": 1
}
```
Notes
I am not claiming the Google API response is wrong. The direct REST response is treated as ground truth. The issue is that the tracer drops a valid reference variant while attempting to extract Answer API grounding/reference metadata.
A possible fix would be to branch inside extract_references for structured_document_info and, possibly, unstructured_document_info, instead of returning nil whenever chunk_info is absent.
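A minimal sketch of that branching, written as a standalone method rather than a patch to the tracer. It assumes references expose snake_case readers that return nil when unset; the output keys and the read_field / extract_references_sketch names are hypothetical. The chunk_info branch only approximates existing behavior (content plus document_metadata), and the unstructured_document_info branch assumes the v1 chunk_contents field; the real tracer's mapping should win wherever it differs:

```ruby
require "json"

# Hypothetical reader that tolerates objects lacking a field (including nil).
def read_field(obj, name)
  obj.respond_to?(name) ? obj.public_send(name) : nil
end

# Keeps every Answer.Reference content variant instead of
# returning nil whenever chunk_info is absent.
def extract_references_sketch(references)
  Array(references).filter_map do |reference|
    if (chunk = read_field(reference, :chunk_info))
      meta = read_field(chunk, :document_metadata)
      {type: "chunk", content: read_field(chunk, :content),
       document: read_field(meta, :document), title: read_field(meta, :title),
       uri: read_field(meta, :uri)}
    elsif (info = read_field(reference, :structured_document_info))
      data = read_field(info, :struct_data)
      {type: "structured_document", content: data && JSON.generate(data.to_h),
       document: read_field(info, :document), title: read_field(info, :title),
       uri: read_field(info, :uri)}
    elsif (info = read_field(reference, :unstructured_document_info))
      chunks = Array(read_field(info, :chunk_contents))
      {type: "unstructured_document",
       content: chunks.map { |c| read_field(c, :content) }.compact.join("\n"),
       document: read_field(info, :document), title: read_field(info, :title),
       uri: read_field(info, :uri)}
    end
  end
end

SketchReference = Struct.new(:chunk_info, :structured_document_info, :unstructured_document_info)
SketchInfo = Struct.new(:document, :uri, :title, :struct_data)
refs = [SketchReference.new(nil, SketchInfo.new("doc-1", "https://example.test/doc", "Example", {"kind" => "example"}), nil)]
extract_references_sketch(refs).length # => 1
```

With this shape, a response whose only reference is structured_document_info yields one traced reference instead of zero.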