Skip to content

feat(custom-engine): HTTP transport multipart file upload (http.files)#102

Open
zpzjzj wants to merge 3 commits into
mainfrom
feat/custom-http-multipart
Open

feat(custom-engine): HTTP transport multipart file upload (http.files)#102
zpzjzj wants to merge 3 commits into
mainfrom
feat/custom-http-multipart

Conversation

@zpzjzj

@zpzjzj zpzjzj commented Jun 10, 2026

Copy link
Copy Markdown
Collaborator

Summary

Builds on the http JSON core (#99). When engine.custom.http.files is declared, the request becomes multipart/form-data — the rendered JSON body is sent as the payload field and each matched workspace file as a files part (the part filename is the workspace-relative path). Without files the request stays plain JSON (unchanged).

Refs docs/design/custom-engine.md (HTTP file upload). Not Closes — URL artifact download in the result is the remaining follow-up (PR-C).

What's implemented

  • config (validator.go): validateCustomHTTP accepts files; each path must be a non-empty workspace-relative path (no leading / or ..). The shared config.WorkspaceRelPathSafe applies the same rule at config load and in the transport.
  • agent (custom_http_files.go): expandHTTPFiles resolves each entry against the workspace file listing — exact path or doublestar glob (*, **); only regular files (the find -type f listing excludes directories and .git); required: true (default) errors on no match, required: false skips; results deduped + sorted. buildMultipartBody/addFilePart stream each file (download → os.Stat cap-check → io.Copy) with a 256 MiB cumulative cap.
  • agent (custom_http.go): run dispatches JSON vs multipart via a new buildRequest; the multipart Content-Type (with boundary) wins over any user header.

Security hardening (from self-review)

  • find -print0 (NUL-separated) instead of newline-separated: a workspace filename containing a newline can no longer split one listing entry into a bogus absolute path that a **/* glob would match and read off-workspace. Listing entries are additionally filtered through WorkspaceRelPathSafe. Regression test included.
  • Effective size cap: the file size is checked via os.Stat before loading, and content is streamed (io.Copy), so a single oversized file is rejected before it is read into memory.
  • Invalid glob patterns now surface an error instead of silently matching nothing.

Known limitations

  • The multipart body is assembled in memory bounded by maxHTTPUploadBytes (256 MiB); with many parallel cases peak RAM is N × cap. A streaming io.Pipe body would lower it — a future improvement (noted in code).
  • internal/evaluator/artifacts_collect.go::listWorkspaceFiles has the same pre-existing newline-split issue; out of scope here, flagged for a separate fix.

Test plan

  • make test (go test -race ./...) — green; new custom_http_test.go multipart cases: exact file, glob (excludes dirs + non-matches), required-missing error, optional-missing skip, unsafe-path reject, newline-filename regression, multipart Content-Type, payload carries SessionInput.
  • internal/config http file-path validator cases (valid globs / .. / absolute).
  • make verify (fmt + vet + revive + golangci-lint) — 0 issues.
  • make e2e custom-engine (local + http JSON) suite — green.

Notes for reviewers

  • listWorkspaceFiles mirrors the evaluator's collect-artifacts lister; they stay separate only because agent cannot import evaluator (a shared leaf package would consolidate them).

🤖 Generated with Claude Code

Builds on the http JSON core (#99): when engine.custom.http.files is declared,
the request becomes multipart/form-data — the rendered JSON body is sent as the
`payload` field and each matched workspace file as a `files` part (the part
filename is the workspace-relative path). Without files the request stays plain
JSON (unchanged).

- config: validateCustomHTTP now accepts files; each path must be a non-empty
  relative path (no leading / or ..). Shared config.WorkspaceRelPathSafe applies
  the same rule at config load and in the transport.
- agent (custom_http_files.go): expandHTTPFiles resolves each entry against the
  workspace file listing — exact path or doublestar glob; only regular files
  (the `find -type f` listing excludes directories and .git); required:true (the
  default) errors on no match, required:false skips; results deduped + sorted.
  buildMultipartBody streams the payload + file parts with a 256 MiB total cap.
  Reuses the temp-file + DownloadFile read pattern.
- custom_http.go run(): un-blanks rt and dispatches JSON vs multipart via the
  new buildRequest; the multipart Content-Type (with boundary) wins over any
  user header.

listWorkspaceFiles mirrors the evaluator's collect-artifacts lister; they stay
separate only because agent cannot import evaluator (a shared leaf package would
consolidate them).

Scope: multipart upload only. URL artifact download in the result is a follow-up.

Tests: httptest multipart cases (exact file, glob excluding dirs/non-matches,
required-missing error, optional-missing skip, unsafe path reject, multipart
Content-Type, payload carries SessionInput); validator file-path cases. make
test (race), make verify (0 issues), and the e2e custom-engine suite are green.

Refs docs/design/custom-engine.md (HTTP file upload)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@zpzjzj zpzjzj requested a review from hittyt as a code owner June 10, 2026 08:26
…r test

CI (ubuntu + windows) failed on the newline-filename test:
- A newline in a multipart part filename corrupts the Content-Disposition
  header (Go's multipart writer does not escape CR/LF) — a real injection/
  corruption edge, not just a test artifact.
- Windows cannot create a newline-named file, so the real-file test was
  non-portable.

Fix: listWorkspaceFiles now also drops entries containing CR/LF (alongside the
absolute/.. filter and the -print0 split fix), so such names never reach a
multipart filename or DownloadFile. Replace the non-portable real-file test
with a portable fakeFindRuntime test that feeds canned `find -print0` output
(a safe relative path, an absolute entry, and a CR/LF name) and asserts only
the safe relative path survives expandHTTPFiles.
@zpzjzj

zpzjzj commented Jun 11, 2026

Copy link
Copy Markdown
Collaborator Author

@codex review

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 254a3669d6

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread internal/agent/custom_http_files.go Outdated
…ching

listWorkspaceFiles strips a leading ./ from every workspace entry, but
expandHTTPFiles matched the raw configured path/glob against the stripped
listing. So a natural relative form like `./diff.patch` or `./src/*.go` matched
nothing — a required entry errored "matched no files" and an optional one was
silently skipped even though the file exists. Normalize the configured pattern
(strip leading ./) the same way before matching. Test covers exact + glob.

Found by Codex review on #102.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant