Conversation
There was a problem hiding this comment.
Pull request overview
Adds Slack notifications for CI runs, including deep links to uploaded run artifacts, to improve visibility into nightly/CI health and debugging.
Changes:
- Introduces a new
Notify Slackworkflow that runs onworkflow_runcompletion of theCIworkflow and posts a structured Slack message. - Updates the
CIworkflow to run nightly on a cron schedule. - Adds AWS/S3 setup and a post-run upload step to sync
~/.foc-devnet/state/latestto S3 for later inspection.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
.github/workflows/notify-slack.yml |
New workflow that queries CI job results and posts a Slack message with job status + S3 artifact links. |
.github/workflows/ci.yml |
Adds nightly schedule, AWS/S3 bootstrap, S3 upload of run state, and adjusts start invocation. |
| # Upload state/latest directory to S3 for post-run inspection | ||
| # Path: s3://<CI_LOGS_BUCKET>/runs/<branch>/<run_id>/<run_attempt>/ | ||
| # Requires: AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, AWS_REGION secrets | ||
| # and CI_LOGS_BUCKET repository variable. | ||
| - name: "EXEC: {Install AWS CLI and upload state/latest to S3}, independent" | ||
| if: always() | ||
| env: | ||
| AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }} | ||
| AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }} | ||
| AWS_DEFAULT_REGION: ${{ secrets.AWS_DEFAULT_REGION }} | ||
| CI_LOGS_BUCKET: ${{ secrets.CI_LOGS_BUCKET }} |
There was a problem hiding this comment.
The comments above this step say CI_LOGS_BUCKET is a "repository variable" and mention "AWS_REGION" secrets, but the workflow actually sources CI_LOGS_BUCKET and region from secrets.* (and uses AWS_DEFAULT_REGION). Please align the comments with the actual configuration to avoid confusion during setup/rotation.
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
BigLep
left a comment
There was a problem hiding this comment.
A few things:
- Where did the slack reporting requirement come from? In #8 we discussed creating a github issue if the nightly job fails. I agree slack is nice for more "in your face" visibility, but the problem I see with slack threads are:
- Easy to get lost in the stream of other threads.
- Doesn't provide a durable place to have followup task tracking, investigation, etc.
- They don't show up on our standup board. If nightly jobs are failing, I would want this talked about during standup and I'd like to assign someone to it.
I think all of the above is facilitated better with an issue, but I am open to change my mind if the team feels differently.
- (probably a misunderstanding by me) If "it bases itself on #66", then why isn't this PR targeting https://github.com/FilOzone/foc-devnet/tree/feat/redpanda/s3-uploads ?
- (nit) I think it's ideal if PRs give more context for the motivation of the change. That can be as simple as linking to the issue that motivated the work (which I assume is #8 in this case).
There was a problem hiding this comment.
Why package this as a an action vs. just a secript that the ci.yml workflow can use?
I am assuming there will be workflows, one per composite e2e test. Each of them can decide whether they want to report it or not. For example, ci run where |
It bases itself on #66, merge that before so diff is smaller.
See issue #8 for context.
Motivation: Introduce reporting of select workflows as needed