feat: Support file_data (URI references) in GcsArtifactService.save_artifact #5230

@blueeye-040

Description

🔴 Required Information

Is your feature request related to a specific problem?

GcsArtifactService.save_artifact() raises NotImplementedError when the artifact is a types.Part with file_data set, i.e. a URI pointer to an existing file (e.g. gs://my-bucket/report.pdf). As a result, users working with GCS-hosted files (PDFs, videos, large datasets) cannot save them as artifacts without re-uploading the content. InMemoryArtifactService already supports this case, so GcsArtifactService is inconsistent with the rest of the service layer.

Describe the Solution You'd Like

In _save_artifact, replace the raise NotImplementedError block with handling for two sub-cases:

  1. Internal artifact references (artifact:// URIs): validate the URI format with artifact_util.parse_artifact_uri(), then write a zero-byte blob with file_uri stored in the blob metadata and mime_type, if set, as content_type.
  2. External URIs (gs:// or other schemes): write a zero-byte blob with file_uri stored in the blob metadata and mime_type, if set, as content_type.

In _load_artifact, before calling download_as_bytes(), check whether blob.metadata is set and contains "file_uri". If it does, return:
types.Part(file_data=types.FileData(file_uri=..., mime_type=blob.content_type))

This approach stores only a pointer (no data copy), consistent with how InMemoryArtifactService stores file_data parts as-is.
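
The pointer-only round trip described above can be sketched without any GCS dependency. This is a minimal illustration, not the actual service code: FileData and FakeBlob are hypothetical stdlib-only stand-ins for types.FileData and the GCS blob, and save_pointer/load_pointer are illustrative names that do not exist in the codebase.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FileData:
    """Hypothetical stand-in for types.FileData (stdlib only)."""
    file_uri: str
    mime_type: Optional[str] = None


@dataclass
class FakeBlob:
    """Hypothetical in-memory blob exposing only the attributes used here."""
    metadata: Optional[dict] = None
    content_type: Optional[str] = None
    data: bytes = b""

    def upload_from_string(self, data, content_type="text/plain"):
        # Records the body and content type instead of calling GCS.
        self.data = data
        self.content_type = content_type


def save_pointer(blob, file_data):
    """Store only the URI pointer: empty body, URI kept in blob metadata."""
    if not file_data.file_uri:
        raise ValueError("Artifact file_data must have a file_uri.")
    blob.metadata = {**(blob.metadata or {}), "file_uri": file_data.file_uri}
    if file_data.mime_type:
        blob.upload_from_string(b"", content_type=file_data.mime_type)
    else:
        blob.upload_from_string(b"")


def load_pointer(blob):
    """Rebuild the pointer from metadata; None means a regular byte artifact."""
    if blob.metadata and "file_uri" in blob.metadata:
        return FileData(
            file_uri=blob.metadata["file_uri"],
            mime_type=blob.content_type or None,
        )
    return None


blob = FakeBlob()
original = FileData("gs://my-bucket/report.pdf", "application/pdf")
save_pointer(blob, original)
restored = load_pointer(blob)
```

After the save, the blob body is empty and the loaded value equals the original pointer, so no file content is copied at any point.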

Impact on your work

Users who already have files in GCS cannot use GcsArtifactService to register those files as artifacts. They are forced to either download and re-upload the content (defeating the purpose of GCS URIs) or fall back to InMemoryArtifactService, which does not persist across sessions.

Willingness to contribute

Yes, I am implementing this and will submit a PR.


🟡 Recommended Information

Describe Alternatives You've Considered

  • Copying/rewriting the file bytes from the source GCS URI into the artifact bucket — rejected because it duplicates data and requires extra GCS permissions (cross-bucket reads).
  • Rejecting file_data with a descriptive error — already done, but doesn't solve the problem.
  • Using InMemoryArtifactService — does not persist across sessions or deployments, not viable for production.

Proposed API / Implementation

# _save_artifact — replace lines 232–236 in gcs_artifact_service.py
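elif artifact.file_data:
    # A pointer without a target URI cannot be resolved later, so reject it.
    if not artifact.file_data.file_uri:
        raise InputValidationError("Artifact file_data must have a file_uri.")
    # Internal artifact:// references must parse; other URIs (e.g. gs://) are stored as-is.
    if artifact_util.is_artifact_ref(artifact):
        if not artifact_util.parse_artifact_uri(artifact.file_data.file_uri):
            raise InputValidationError(
                f"Invalid artifact reference URI: {artifact.file_data.file_uri}"
            )
    # Store only the pointer: the URI goes into blob metadata and the blob body stays empty.
    blob.metadata = {**(blob.metadata or {}), "file_uri": artifact.file_data.file_uri}
    if artifact.file_data.mime_type:
        blob.upload_from_string(b"", content_type=artifact.file_data.mime_type)
    else:
        # Without an explicit content_type, upload_from_string() defaults to "text/plain".
        blob.upload_from_string(b"")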

# _load_artifact — add before download_as_bytes()
# A "file_uri" entry marks a pointer artifact: return the reference, skip the download.
if blob.metadata and "file_uri" in blob.metadata:
    return types.Part(
        file_data=types.FileData(
            file_uri=blob.metadata["file_uri"],
            mime_type=blob.content_type or None,
        )
    )
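
One detail that may be worth checking in the proposal: Blob.upload_from_string() in google-cloud-storage defaults content_type to "text/plain", so a pointer saved without a mime_type would likely load back with mime_type "text/plain" rather than None. The sketch below reproduces that behaviour with a hypothetical FakeBlob and shows one possible workaround, keeping the original MIME type in metadata under a made-up "file_mime_type" key. Both the key and the helper names are assumptions for illustration, not part of the proposed change.

```python
from typing import Optional

# Mirrors the default of google-cloud-storage's Blob.upload_from_string().
DEFAULT_CONTENT_TYPE = "text/plain"


class FakeBlob:
    """Hypothetical in-memory blob; only tracks metadata and content type."""

    def __init__(self):
        self.metadata = None
        self.content_type = None

    def upload_from_string(self, data, content_type=DEFAULT_CONTENT_TYPE):
        self.content_type = content_type


def save_pointer(blob, file_uri, mime_type=None):
    """Save a URI pointer and remember whether a MIME type was actually given."""
    metadata = {**(blob.metadata or {}), "file_uri": file_uri}
    if mime_type:
        # "file_mime_type" is a hypothetical metadata key used for this sketch.
        metadata["file_mime_type"] = mime_type
    blob.metadata = metadata
    blob.upload_from_string(b"", content_type=mime_type or DEFAULT_CONTENT_TYPE)


def load_mime_type(blob) -> Optional[str]:
    """Return the MIME type the caller supplied, or None if there was none."""
    return (blob.metadata or {}).get("file_mime_type")


untyped = FakeBlob()
save_pointer(untyped, "gs://my-bucket/data.bin")

typed = FakeBlob()
save_pointer(typed, "gs://my-bucket/report.pdf", "application/pdf")
```

Here untyped.content_type ends up as "text/plain" even though no MIME type was supplied, while load_mime_type(untyped) still returns None. Whether that distinction matters depends on how callers use mime_type after loading.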

Additional Context

  • InMemoryArtifactService._save_artifact() already handles this at lines 129–138.
  • artifact_util.is_artifact_ref() and artifact_util.parse_artifact_uri() are the existing helpers for URI validation.
  • _get_artifact_version_sync() already constructs gs:// canonical URIs (line 387), confirming the pattern of storing URI metadata is established.
  • New tests are needed in tests/unittests/artifacts/test_artifact_service.py, following the existing MockBlob pattern.

Metadata

Labels

  • needs review [Status]: The PR/issue is awaiting review from the maintainer
  • services [Component]: This issue is related to runtime services, e.g. sessions, memory, artifacts, etc.
