feat: upload CI logs to S3, CI gets called every midnight #66
redpanda-f wants to merge 8 commits into main from
Conversation
Pull request overview
This PR adds functionality to upload CI run state and logs to AWS S3 for post-run inspection and debugging. The feature is designed to help diagnose test failures by preserving the complete state directory from CI runs.
Changes:
- Adds an AWS CLI installation step that checks whether the AWS CLI is already present before installing
- Adds an S3 upload step that syncs the `~/.foc-devnet/state/latest` directory to S3 under a structured path that includes the branch name, run ID, and run attempt number
@redpanda-f I've opened a new pull request, #67, to work on those changes. Once the pull request is ready, I'll request review from you.
…step (#67) * Initial plan * Consolidate AWS CLI install and S3 upload into a single conditional step Co-authored-by: redpanda-f <181817029+redpanda-f@users.noreply.github.com> --------- Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com> Co-authored-by: redpanda-f <181817029+redpanda-f@users.noreply.github.com>
BigLep
left a comment
If pushing logs to s3 is indeed required vs. whatever log retention we get from GitHub (#65 (comment)), then I'd like to see documentation on:
- Why we're doing this? (Again, this may be totally valid, but I haven't seen the motivation written anywhere)
- How does someone access the logs? I would want this bucket to be publicly accessible (just as GitHub Actions logs are accessible) so there is no barrier for someone to access it quickly
How about we also define/create the bucket in code, so that things like the bucket's retention policy are set there too and are easily discoverable by looking here rather than by querying AWS?
Also, I see there are changes to ci.yml itself. Is that structurally right? I thought ci.yml was for validating foc-devnet itself, and wasn't where we'd be executing tests for validating FOC. In #8 (comment) there was discussion of creating a new repo for running FOC integration tests. It's ok if the plan has changed, but that should be documented/communicated. I think part of what's missing here is:
That is doable; for now I have done it manually. However, feel free to let me know if you want it in scope as part of #8:
The current changes in ci.yml are a proof that S3 uploading works, and not much more. This will later be useful as a standalone action that works in tandem with a separate common devnet setup action (.github/actions/devnet-setup/action.yml), available as part of #62. That separate action abstracts away the "foc-devnet start" process and provides common steps that any workflow can reuse. With that in place, it becomes clearer what ci.yml entails, which is not much more than "does foc-devnet start correctly" (see .github/workflows/ci.yml in #62). I am currently attempting e2e tests as separate workflows, as described above in #62 (WIP). That way, we get the "code separation" from foc-devnet anyway, without actually having to create a separate repo.
FYI, there is also this little-advertised feature of our custom GitHub runners: they all have access to an S3 bucket for storing artifacts. It shouldn't be used as a cache because it doesn't have things like poisoning protections, but it can be (and in some cases is) used as artifact storage (e.g. kubo uploads HTML reports to it so that they can be rendered in the browser). By default it has 90-day retention set, the same as GitHub artifacts, but we obviously have full control over it. There are these 2 helper actions for interacting with it -
That actually sounds very nice, and almost exactly what we need. Can you say more about the HTML reports and rendering in the browser? Do we have a web server fronting this as well? It would be useful to use that for our nightly reports. Also, do we have an upper limit on the retention period? I'll get in touch with you; this sounds like a better direction.
No server, just for static websites. Let me show you an example. Here, https://github.com/ipfs/kubo/actions/runs/22498049626/attempts/1#summary-65177630169, we have a workflow run summary (that's just an md file, $GITHUB_STEP_SUMMARY, that you can write to from your job). There, you'll see two links:
They're both static websites; S3 handles the rendering bit. And you can upload them like this:
Nope, we can set it to whatever we want, or not set it at all.
S3 is not mandatory for what we are doing right now, although it will be necessary in the longer term. Will close #66 and rely on GitHub Actions' log retention for now.