
feat: upload CI logs to S3, CI gets called every midnight#66

Closed
redpanda-f wants to merge 8 commits into main from feat/redpanda/s3-uploads

Conversation

@redpanda-f
Collaborator

No description provided.

Copilot AI review requested due to automatic review settings February 27, 2026 07:22
@FilOzzy FilOzzy added this to FOC Feb 27, 2026
@github-project-automation github-project-automation bot moved this to 📌 Triage in FOC Feb 27, 2026
Contributor

Copilot AI left a comment


Pull request overview

This PR adds functionality to upload CI run state and logs to AWS S3 for post-run inspection and debugging. The feature is designed to help diagnose test failures by preserving the complete state directory from CI runs.

Changes:

  • Adds AWS CLI installation step that checks if AWS CLI is already present before installing
  • Adds S3 upload step that syncs ~/.foc-devnet/state/latest directory to S3 with a structured path including branch name, run ID, and run attempt number
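The two steps described in the overview could be sketched roughly like this. This is illustrative only: the step names, the install method, and the `if: always()` condition are assumptions; the bucket name `filoz-foc-devnet` and the `runs/<branch>/<run-id>/<attempt>` key layout are inferred from the example URL later in the thread.

```yaml
# Sketch only; the actual workflow may differ in names and details.
- name: Install AWS CLI (if missing)
  run: |
    if ! command -v aws >/dev/null 2>&1; then
      curl -sSL "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip" -o awscliv2.zip
      unzip -q awscliv2.zip
      sudo ./aws/install
    fi

- name: Upload CI state to S3
  if: always()   # upload even when earlier steps fail, for debugging
  run: |
    BRANCH="${GITHUB_REF_NAME//\//-}"   # slashes in branch names are not valid in a flat key layout
    aws s3 sync ~/.foc-devnet/state/latest \
      "s3://filoz-foc-devnet/runs/${BRANCH}/${GITHUB_RUN_ID}/${GITHUB_RUN_ATTEMPT}/"
```

`GITHUB_REF_NAME`, `GITHUB_RUN_ID`, and `GITHUB_RUN_ATTEMPT` are standard variables that GitHub Actions sets in every job.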

@redpanda-f redpanda-f self-assigned this Feb 27, 2026
@redpanda-f redpanda-f changed the title feat: upload CI logs to S3 feat: upload CI logs to S3, CI gets called every midnight Feb 27, 2026
Contributor

Copilot AI commented Feb 27, 2026

@redpanda-f I've opened a new pull request, #67, to work on those changes. Once the pull request is ready, I'll request review from you.

Copilot AI and others added 2 commits February 27, 2026 13:12
…step (#67)

* Initial plan

* Consolidate AWS CLI install and S3 upload into a single conditional step

Co-authored-by: redpanda-f <181817029+redpanda-f@users.noreply.github.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: redpanda-f <181817029+redpanda-f@users.noreply.github.com>
@redpanda-f redpanda-f marked this pull request as draft February 27, 2026 08:02
@rjan90 rjan90 moved this from 📌 Triage to ⌨️ In Progress in FOC Feb 27, 2026
@rjan90 rjan90 added this to the M4.1: mainnet ready milestone Feb 27, 2026
@rjan90 rjan90 linked an issue Feb 27, 2026 that may be closed by this pull request
@redpanda-f redpanda-f marked this pull request as ready for review February 27, 2026 12:55
@redpanda-f redpanda-f requested a review from rvagg February 27, 2026 12:55
@redpanda-f redpanda-f mentioned this pull request Feb 27, 2026
Contributor

@BigLep BigLep left a comment


If pushing logs to s3 is indeed required vs. whatever log retention we get from GitHub (#65 (comment)), then I'd like to see documentation on:

  1. Why are we doing this? (Again, this may be totally valid, but I haven't seen the motivation written anywhere)
  2. How does someone access the logs? I would want this bucket to be publicly accessible (just as GitHub Actions logs are accessible) so there is no barrier to accessing them quickly

How about we also define/create the bucket in code, so that things like the bucket's retention policy are also set and easily discoverable from looking here, vs. needing to query AWS?

@BigLep
Contributor

BigLep commented Feb 28, 2026

Also, I see there are changes to ci.yml itself. Is that structurally right? I thought ci.yml was for validating foc-devnet itself, and that it wasn't where we'd be executing tests for validating FOC.

In #8 (comment) there was discussion of creating a new repo for running FOC integration tests. It's ok if the plan has changed, but that should be documented/communicated. I think part of what's missing here is:

I also think it will be helpful to have a comment in foc-localnet ci.yml about its purpose and what's in scope and not in scope for that CI job so others don't get confused.

(source)

@redpanda-f
Collaborator Author

If pushing logs to s3 is indeed required vs. whatever log retention we get from GitHub (#65 (comment)), then I'd like to see documentation on:

  1. Why are we doing this? (Again, this may be totally valid, but I haven't seen the motivation written anywhere)
  2. How does someone access the logs? I would want this bucket to be publicly accessible (just as GitHub Actions logs are accessible) so there is no barrier to accessing them quickly
  1. GH Actions has a retention policy of anywhere up to 90 days (ref: https://docs.github.com/en/organizations/managing-organization-settings/configuring-the-retention-period-for-github-actions-artifacts-and-logs-in-your-organization). We may want longer retention, both in case issues raised by failed nightlies are aggressively deprioritized and only picked up later, and for future reference. Hence S3 sounds like a good choice.
  2. The bucket is currently publicly accessible to everyone. For example, here is a file from one of the runs: https://filoz-foc-devnet.s3.ap-southeast-1.amazonaws.com/runs/feat-redpanda-s3-uploads/22481882552/1/foc_metadata.json
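Judging by that example URL, the object key is built by replacing slashes in the branch name with hyphens. A minimal sketch of that transformation (variable names are illustrative, not the actual workflow code):

```shell
# Sketch: how the key prefix in the example URL above appears to be built.
# Slashes in the branch name are replaced with hyphens so the key has a fixed depth.
branch="feat/redpanda/s3-uploads"
run_id="22481882552"
run_attempt="1"

sanitized="${branch//\//-}"                        # feat-redpanda-s3-uploads
prefix="runs/${sanitized}/${run_id}/${run_attempt}"
echo "$prefix"                                     # runs/feat-redpanda-s3-uploads/22481882552/1
```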

How about we also define/create the bucket in code, so that things like the bucket's retention policy are also set and easily discoverable from looking here, vs. needing to query AWS?

That is doable; for now I have done it manually. However, feel free to let me know if you want these in scope as part of #8:

  • s3 bucket creation IaC
  • s3 lifecycle / retention policy of logs
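If the lifecycle/retention policy does get codified, the corresponding S3 lifecycle configuration could look roughly like this. The rule name, prefix, and the 180-day window are placeholders, not decided values:

```json
{
  "Rules": [
    {
      "ID": "expire-ci-run-logs",
      "Status": "Enabled",
      "Filter": { "Prefix": "runs/" },
      "Expiration": { "Days": 180 }
    }
  ]
}
```

This is the JSON shape that `aws s3api put-bucket-lifecycle-configuration` accepts, so it could live in the repo alongside whatever IaC creates the bucket.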

@redpanda-f
Collaborator Author

Also, I see there are changes to ci.yml itself. Is that structurally right? I thought ci.yml was for validating foc-devnet itself, and that it wasn't where we'd be executing tests for validating FOC.

In #8 (comment) there was discussion of creating a new repo for running FOC integration tests. It's ok if the plan has changed, but that should be documented/communicated. I think part of what's missing here is:

I also think it will be helpful to have a comment in foc-localnet ci.yml about its purpose and what's in scope and not in scope for that CI job so others don't get confused.

(source)

The current changes in ci.yml are a proof that S3 uploading works, and not much more. This will later be useful as a standalone action that works in tandem with a separate common devnet setup action (.github/actions/devnet-setup/action.yml) available as part of #62. That separate action abstracts away the "foc-devnet start" process and provides common steps that any ci.yml or e2e_some_specific_test.yml can use.

In fact, with that it becomes slightly clearer what ci.yml entails, which is not much more than "does foc-devnet start correctly" (see .github/workflows/ci.yml in #62).

I am currently attempting e2e tests as separate workflows, as described above in #62 (WIP). That way, we get the "code separation" from foc-devnet anyway, while not really having to create a separate repo.
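The separation described above could be sketched like this. The action path and test filename pattern come from this thread; the job layout, checkout step, and test entrypoint are assumptions, since the actual action's inputs aren't shown here:

```yaml
# Illustrative sketch of an e2e workflow reusing the shared setup action.
jobs:
  e2e-some-specific-test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Start devnet                       # shared "foc-devnet start" logic
        uses: ./.github/actions/devnet-setup
      - name: Run e2e test
        run: ./scripts/e2e_some_specific_test.sh # hypothetical test entrypoint
```

With this shape, ci.yml stays a thin "does foc-devnet start correctly" check, while each e2e scenario gets its own workflow file built on the same setup action.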

@galargh

galargh commented Mar 1, 2026

FYI, there is also this little-advertised feature of our custom GitHub runners, where they all have access to an S3 bucket for storing artifacts. It shouldn't be used as a cache because it doesn't have things like poisoning protections, etc., but it can be (and in some cases is) used as artifact storage (e.g. kubo uploads HTML reports to it so that they can be rendered in the browser). By default it has a 90-day retention set, same as GH artifacts, but we obviously have full control over it. There are two helper actions for interacting with it - download-artifact and upload-artifact - in https://github.com/ipdxco/custom-github-runners/tree/main/.github/actions

@redpanda-f
Collaborator Author

redpanda-f commented Mar 1, 2026

FYI, there is also this little-advertised feature of our custom GitHub runners, where they all have access to an S3 bucket for storing artifacts. It shouldn't be used as a cache because it doesn't have things like poisoning protections, etc., but it can be (and in some cases is) used as artifact storage (e.g. kubo uploads HTML reports to it so that they can be rendered in the browser). By default it has a 90-day retention set, same as GH artifacts, but we obviously have full control over it. There are two helper actions for interacting with it - download-artifact and upload-artifact - in https://github.com/ipdxco/custom-github-runners/tree/main/.github/actions

That actually sounds very nice, and almost exactly what we need. Can you describe the HTML rendering in the browser in more detail? Do we have a web server fronting this as well? It would be useful to use that for our nightly reports.

Also, do we have an upper limit on the retention period?

Will get in touch with you. This sounds like a better direction.

@galargh

galargh commented Mar 1, 2026

That actually sounds very nice, and almost exactly what we need. Can you describe the HTML rendering in the browser in more detail? Do we have a web server fronting this as well? It would be useful to use that for our nightly reports.

No server, just static websites. Let me show you an example. Here, https://github.com/ipfs/kubo/actions/runs/22498049626/attempts/1#summary-65177630169, we have a workflow run summary (that's just an md file - $GITHUB_STEP_SUMMARY - that you can write to from your job). There, you'll see two links:

They're both static websites, s3 handles the rendering bit. And you can upload them like this:
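The actual upload invocation isn't captured in this transcript, but the $GITHUB_STEP_SUMMARY mechanism mentioned above is standard GitHub Actions behavior: a job appends Markdown to that file and GitHub renders it on the run's summary page. A minimal sketch, with the variable pointed at a temp file so it can run outside Actions too (the report title and link are placeholders):

```shell
# In a real job the runner sets GITHUB_STEP_SUMMARY; here we fall back to a
# temp file to simulate that behavior locally.
export GITHUB_STEP_SUMMARY="${GITHUB_STEP_SUMMARY:-$(mktemp)}"

# Append Markdown; on GitHub this shows up on the workflow run summary page.
{
  echo "## Nightly devnet report"
  echo "- logs: (link to the uploaded S3 artifacts would go here)"
} >> "$GITHUB_STEP_SUMMARY"

cat "$GITHUB_STEP_SUMMARY"
```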

Also, do we have an upper limit on the retention period?

Nope, we can set it to whatever we want or not set it at all.

@redpanda-f
Collaborator Author

S3 is not mandatory for what we are doing right now, although it will be necessary in the longer term. Will close #66 and rely on GH Actions log retention for now.
Follow-up tasks would transition to our own S3 or IPDX's S3 buckets.

@redpanda-f redpanda-f closed this Mar 2, 2026
@github-project-automation github-project-automation bot moved this from ⌨️ In Progress to 🎉 Done in FOC Mar 2, 2026

Labels

None yet

Projects

Status: 🎉 Done

Development

Successfully merging this pull request may close these issues.

CI/Nightly: Introduce logs dumping to S3 Buckets

6 participants