feat: slack reporting #68

Closed
redpanda-f wants to merge 3 commits into feat/redpanda/s3-uploads from feat/redpanda/slack_reporting

Conversation

@redpanda-f
Collaborator

@redpanda-f redpanda-f commented Feb 27, 2026

This PR is based on #66; merge that first so the diff here is smaller.

See issue #8 for context.

Motivation: Introduce reporting of select workflows as needed

Copilot AI review requested due to automatic review settings February 27, 2026 13:15
@redpanda-f redpanda-f self-assigned this Feb 27, 2026
@FilOzzy FilOzzy added this to FOC Feb 27, 2026
@github-project-automation github-project-automation bot moved this to 📌 Triage in FOC Feb 27, 2026
@redpanda-f redpanda-f added the bug and enhancement labels and removed the bug label Feb 27, 2026
@redpanda-f redpanda-f moved this from 📌 Triage to ⌨️ In Progress in FOC Feb 27, 2026
@redpanda-f redpanda-f added this to the M4.2: mainnet GA milestone Feb 27, 2026
Contributor

Copilot AI left a comment

Pull request overview

Adds Slack notifications for CI runs, including deep links to uploaded run artifacts, to improve visibility into nightly/CI health and debugging.

Changes:

  • Introduces a new Notify Slack workflow that runs on workflow_run completion of the CI workflow and posts a structured Slack message.
  • Updates the CI workflow to run nightly on a cron schedule.
  • Adds AWS/S3 setup and a post-run upload step to sync ~/.foc-devnet/state/latest to S3 for later inspection.
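As a rough illustration of the first two changes listed above, here is a minimal sketch. It is not the PR's actual workflow content. The workflow display name `CI`, the secret name `SLACK_WEBHOOK_URL`, the cron expression, and the plain-text message are all assumptions made for illustration; the real workflow reportedly posts a structured message with per-job status and S3 links.

```yaml
# Hypothetical sketch of .github/workflows/notify-slack.yml
name: Notify Slack
on:
  workflow_run:
    workflows: ["CI"]        # assumed display name of the CI workflow
    types: [completed]
jobs:
  notify:
    runs-on: ubuntu-latest
    steps:
      - name: Post run summary to Slack
        env:
          SLACK_WEBHOOK_URL: ${{ secrets.SLACK_WEBHOOK_URL }}  # assumed secret name
          CONCLUSION: ${{ github.event.workflow_run.conclusion }}
          RUN_URL: ${{ github.event.workflow_run.html_url }}
        run: |
          # Slack incoming webhooks accept a JSON body with a "text" field.
          curl -sS -X POST -H 'Content-type: application/json' \
            --data "{\"text\": \"CI finished: ${CONCLUSION} ${RUN_URL}\"}" \
            "$SLACK_WEBHOOK_URL"
---
# Hypothetical trigger fragment for .github/workflows/ci.yml
on:
  schedule:
    - cron: "0 0 * * *"      # assumed: once a day at midnight UTC
```

Note that a `workflow_run` trigger only fires when the workflow file exists on the repository's default branch, so a notifier like this cannot be exercised from a feature branch alone.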

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| .github/workflows/notify-slack.yml | New workflow that queries CI job results and posts a Slack message with job status + S3 artifact links. |
| .github/workflows/ci.yml | Adds nightly schedule, AWS/S3 bootstrap, S3 upload of run state, and adjusts start invocation. |

Comment on lines +372 to +382
# Upload state/latest directory to S3 for post-run inspection
# Path: s3://<CI_LOGS_BUCKET>/runs/<branch>/<run_id>/<run_attempt>/
# Requires: AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, AWS_REGION secrets
# and CI_LOGS_BUCKET repository variable.
- name: "EXEC: {Install AWS CLI and upload state/latest to S3}, independent"
  if: always()
  env:
    AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
    AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
    AWS_DEFAULT_REGION: ${{ secrets.AWS_DEFAULT_REGION }}
    CI_LOGS_BUCKET: ${{ secrets.CI_LOGS_BUCKET }}

Copilot AI Feb 27, 2026

The comments above this step say CI_LOGS_BUCKET is a "repository variable" and mention "AWS_REGION" secrets, but the workflow actually sources CI_LOGS_BUCKET and region from secrets.* (and uses AWS_DEFAULT_REGION). Please align the comments with the actual configuration to avoid confusion during setup/rotation.
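One way the step could look with comments and configuration aligned is sketched below. This is an assumption-laden illustration, not the PR's code: the `run` body is not shown in the excerpt, so the `aws s3 sync` invocation, the use of `GITHUB_REF_NAME` as the branch segment, and the presence of `aws` on the runner's PATH are all guesses.

```yaml
# Upload state/latest directory to S3 for post-run inspection
# Path: s3://<CI_LOGS_BUCKET>/runs/<branch>/<run_id>/<run_attempt>/
# Requires secrets: AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY,
# AWS_DEFAULT_REGION and CI_LOGS_BUCKET.
- name: "EXEC: {Install AWS CLI and upload state/latest to S3}, independent"
  if: always()
  env:
    AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
    AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
    AWS_DEFAULT_REGION: ${{ secrets.AWS_DEFAULT_REGION }}
    CI_LOGS_BUCKET: ${{ secrets.CI_LOGS_BUCKET }}
  run: |
    # Assumed upload command; GITHUB_REF_NAME, GITHUB_RUN_ID and
    # GITHUB_RUN_ATTEMPT are default variables set by GitHub Actions.
    aws s3 sync "$HOME/.foc-devnet/state/latest" \
      "s3://${CI_LOGS_BUCKET}/runs/${GITHUB_REF_NAME}/${GITHUB_RUN_ID}/${GITHUB_RUN_ATTEMPT}/"
```

If the bucket name were instead stored as a repository variable, as the original comment says, the env line would read `${{ vars.CI_LOGS_BUCKET }}`; either choice works as long as the comment matches it.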

redpanda-f and others added 2 commits February 27, 2026 13:20
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Contributor

@BigLep BigLep left a comment

A few things:

  1. Where did the Slack reporting requirement come from? In #8 we discussed creating a GitHub issue if the nightly job fails. I agree Slack is nice for more "in your face" visibility, but the problems I see with Slack threads are:
    • Easy to get lost in the stream of other threads.
    • Doesn't provide a durable place to have followup task tracking, investigation, etc.
    • They don't show up on our standup board. If nightly jobs are failing, I would want this talked about during standup and I'd like to assign someone to it.
      I think all of the above is better facilitated by an issue, but I am open to changing my mind if the team feels differently.
  2. (probably a misunderstanding by me) If "it bases itself on #66", then why isn't this PR targeting https://github.com/FilOzone/foc-devnet/tree/feat/redpanda/s3-uploads ?
  3. (nit) I think it's ideal if PRs give more context for the motivation of the change. That can be as simple as linking to the issue that motivated the work (which I assume is #8 in this case).

Contributor

Why package this as an action vs. just a script that the ci.yml workflow can use?

@redpanda-f
Collaborator Author

A few things:

  1. Where did the Slack reporting requirement come from? In CI/Nightly End-to-End Validation of FOC as a whole #8 we discussed creating a GitHub issue if the nightly job fails. I agree Slack is nice for more "in your face" visibility, but the problems I see with Slack threads are:

    • Easy to get lost in the stream of other threads.
    • Doesn't provide a durable place to have followup task tracking, investigation, etc.
    • They don't show up on our standup board. If nightly jobs are failing, I would want this talked about during standup and I'd like to assign someone to it.
      I think all of the above is better facilitated by an issue, but I am open to changing my mind if the team feels differently.
  2. (probably a misunderstanding by me) If "it bases itself on feat: upload CI logs to S3, CI gets called every midnight #66", then why isn't this PR targeting https://github.com/FilOzone/foc-devnet/tree/feat/redpanda/s3-uploads ?

  3. (nit) I think it's ideal if PRs give more context for the motivation of the change. That can be as simple as linking to the issue that motivated the work (which I assume is CI/Nightly End-to-End Validation of FOC as a whole #8 in this case).

  1. I can extend this PR to include GH issue posting. In fact, Slack reporting can be removed entirely if bots are not useful (I think they are a nice-to-have).
  2. I try to keep PRs small so that each PR introduces a single piece of functionality. Sometimes that requires a PR to build on the code of another PR that is still open. There are a few ways of tackling that:
    2.1. Make the dependent PR (this one) target the base branch (in this case feat/redpanda/s3-uploads). This holds back the base PR merge until all dependents are ready.
    2.2. Make the dependent PR (this one) target main, but call out the dependency. The base PR merge is not held back, and the dependent PR is blocked until the base lands in mainline. The pro is that the dependency is clear; the con is that the PR diff looks larger than it really is.
    I generally go with 2.2, as the dependency is clear and each PR has a single purpose. However, I have no strong opinions. Large composite PRs are also okay, just clunky imo.
  3. Will do.

Why package this as an action vs. just a script that the ci.yml workflow can use?

I am assuming there will be several workflows, one per composite e2e test. Each of them can decide whether it wants to report or not. For example, a CI run started with --notest may not need an issue created.
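To make the "separate workflow that decides whether to report" idea concrete, the following is a minimal sketch of a reporter that opens a GitHub issue only when a scheduled run fails. It is hypothetical throughout: the file name, the workflow name `CI`, and the issue title and body are assumptions, and nothing like it is shown in this PR.

```yaml
# Hypothetical sketch of .github/workflows/report-failure.yml
name: Report nightly failure
on:
  workflow_run:
    workflows: ["CI"]        # assumed display name of the CI workflow
    types: [completed]
permissions:
  issues: write              # needed for the token to create issues
jobs:
  open-issue:
    # Report only failed runs that were started by the cron schedule.
    if: ${{ github.event.workflow_run.conclusion == 'failure' && github.event.workflow_run.event == 'schedule' }}
    runs-on: ubuntu-latest
    steps:
      - name: Open a tracking issue
        env:
          GH_TOKEN: ${{ github.token }}
          GH_REPO: ${{ github.repository }}
          RUN_URL: ${{ github.event.workflow_run.html_url }}
        run: |
          gh issue create \
            --title "Nightly CI failed" \
            --body "Failed run: ${RUN_URL}"
```

Because the condition lives in the reporter rather than in ci.yml, each e2e workflow could get its own reporter (or none) without changing the test workflow itself.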

@redpanda-f redpanda-f changed the base branch from main to feat/redpanda/s3-uploads March 2, 2026 03:41
@redpanda-f
Collaborator Author

Slack reporting is deprioritized / removed, and #68 will be closed. GH issues are the only focus for now, since that's where a lot of our tracking happens anyway. There is buy-in from @rvagg and @BigLep here.

I will raise a separate composite PR for this.

@redpanda-f redpanda-f closed this Mar 2, 2026
@github-project-automation github-project-automation bot moved this from ⌨️ In Progress to 🎉 Done in FOC Mar 2, 2026

Labels

enhancement New feature or request

Projects

Status: 🎉 Done

3 participants