Branding: improve IO resilience#3182
Merged
ChrisLovering merged 5 commits intopython-discord:mainfrom Nov 23, 2024
Merged
Conversation
Contributor
Author
|
For testing purposes, you can do this. async with self.bot.http_session.get(...) as response:
response.status = 500
_raise_for_status(response)The cog is overall not very testable. I have some ideas on how to improve it, but I'd rather leave it for a separate PR. |
If we fail to fetch an event, the whole branding sync will now be aborted. This will prevent situations where we fail to fetch the current event due to a 5xx error and the cog resorts to the fallback branding in the middle of an event. Error handling is moved to the cog. The repo abstraction will now propagate errors rather than silence them.
Use the tenacity lib to retry 5xx responses from GitHub.
We now send a result embed after refresh. It would be noisy to also send the calendar embed. Users can invoke the calendar manually if desired.
80d1e8c to
9dc2aea
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Closes #2808
This PR aims to improve the resiliency of the Branding manager cog. It is a rework of #2869, which I've closed and re-implemented in this branch in adherence with the reviewer's request to use the tenacity lib.
Context
The Branding manager automatically discovers events once per day or when instructed to. During each such discovery, it makes multiple HTTP requests to the GitHub API. First, it fetches a list of all events in the Branding repository, and then it fetches those events one-by-one.
Currently, if the bot fails to fetch an event, it simply skips it. This seemed useful initially, as we didn't want the bot to stop the discovery iteration if just one event is badly configured. In practice, however, this causes undesired behaviour: if the currently active event is skipped due to a 5xx error, the bot may reset the guild's branding to the fallback event, despite the event still being in progress.
This PR makes adjustments to the error handling methodology. Now, if any of the requests fail, the whole discovery iteration is aborted & retried the following day (or manually via the
synccommand). The last valid state remains in the Redis cache, so the cog will continue to work just fine.Additionally, 5xx errors are now retried up to 5 times, with exponential backoff. This prevents discovery iterations from being cancelled due to stray server errors.
Implementation
The solution is fairly simple. The repository class no longer swallows IO/deserialization errors, and instead lets them propagate to the cog. The cog mostly already handles exceptions coming from the repo, I just added error handling where it was missing.
The calendar refresh can now fail (well, it could always fail, but the error was hidden). I added an embed response to inform the user of the result:
Because of this, it no longer automatically invokes the calendar view after the refresh - it seemed noisy. But feel free to propose other solutions.
I've used the Apache 2.0 licensed tenacity library for the retry with backoff.