Conversation
Pull request overview
This PR replaces the previous memcache-based storage of GitHub Security Advisory (GHSA) data in audit taskflows with a dedicated SQL (SQLite) persistence layer exposed via the existing ghsa MCP server/toolbox, to prevent advisory notes from being mixed/discarded and avoid memcache-clear confirmation stalls.
Changes:
- Update audit taskflows/prompts to stop using the memcache toolbox and instead use the ghsa toolbox + DB-backed GHSA/GHSASummary records.
- Extend the ghsa toolbox/server configuration to include a dedicated data directory env var for persistence.
- Add SQLAlchemy models and MCP tools to store/fetch/clear GHSA advisories and summary records.
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 9 comments.
| File | Description |
|---|---|
| src/seclab_taskflows/toolboxes/ghsa.yaml | Adds GHSA_DIR env to back the dedicated advisory DB storage location. |
| src/seclab_taskflows/taskflows/audit/fetch_security_advisories.yaml | Updates fetching instructions/toolboxes to use ghsa DB storage instead of memcache. |
| src/seclab_taskflows/taskflows/audit/classify_application_local.yaml | Replaces memcache toolbox dependency with ghsa toolbox for advisory-aware classification. |
| src/seclab_taskflows/taskflows/audit/audit_issue_local_iter.yaml | Replaces memcache toolbox dependency with ghsa toolbox for advisory-aware auditing. |
| src/seclab_taskflows/prompts/audit/known_security_advisories.yaml | Switches advisory lookup instructions from memcache to DB-backed GHSA/GHSASummary. |
| src/seclab_taskflows/mcp_servers/ghsa_models.py | Introduces SQLAlchemy models/tables for GHSA and GHSASummary persistence. |
| src/seclab_taskflows/mcp_servers/ghsa.py | Implements SQLite backend + new MCP tools for storing and retrieving advisories/summaries. |
Comments suppressed due to low confidence (1)
src/seclab_taskflows/taskflows/audit/fetch_security_advisories.yaml:29
- Spelling/grammar: "provide a summary of the findings including." should not end with a period (use a colon or remove the punctuation) to avoid a sentence fragment.
```text
If one or more advisories are found, provide a summary of the findings including.
The ghsa_id of each advisory, followed by the severity and type of vulnerability, together with the location of the vulnerable code if available.
```
```python
    for advisory in results:
        backend.store_new_ghsa(
            process_repo(owner, repo),
            advisory["ghsa_id"],
            advisory["severity"],
            advisory["cve_id"],
            advisory["description"],
            advisory["summary"],
            advisory["published_at"],
            advisory["state"],
        )
```
fetch_and_store_GHSA_list calls backend.store_new_ghsa in a loop, and store_new_ghsa opens a new SQLAlchemy Session and commits per advisory. This is likely to be slow for repos with many advisories. Prefer doing the upsert work in a single session/transaction for the whole batch.
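As a rough sketch of the batched pattern (plain sqlite3 is used here for brevity, and the name `store_ghsa_batch` is hypothetical; the server itself uses SQLAlchemy, where the equivalent is one Session with a single commit at the end):

```python
import sqlite3

def store_ghsa_batch(conn: sqlite3.Connection, repo: str, advisories: list[dict]) -> None:
    # Build all rows up front, then write them in one transaction
    # instead of opening a session and committing per advisory.
    rows = [
        (repo, a["ghsa_id"], a["severity"], a["cve_id"],
         a["description"], a["summary"], a["published_at"], a["state"])
        for a in advisories
    ]
    with conn:  # sqlite3 context manager: single BEGIN ... COMMIT for the batch
        conn.executemany(
            "INSERT OR REPLACE INTO ghsa "
            "(repo, ghsa_id, severity, cve_id, description, summary, published_at, state) "
            "VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
            rows,
        )
```

With a unique key on `ghsa_id`, `INSERT OR REPLACE` also makes re-fetches idempotent rather than duplicating rows.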
```text
Fetch and store all GitHub Security Advisories (GHSAs) for the repo {{ globals.repo }}
with the return_results option.

If an error occurs during fetching, store the error message in memcache under the key 'security_advisories_{{ globals.repo }}'.
If an error occurs during fetching, store a GHSASummary with the error message.
Ensure the error message starts with "Error:" followed by a description of the error.

If fetching is successful, store the list of advisories in memcache under the key 'security_advisories_{{ globals.repo }}'.

If one or more advisories are found, provide a summary of the findings including:
1. How many advisories were found
2. The severity levels of the advisories
If one or more advisories are found, provide a summary of the findings including.
The ghsa_id of each advisory, followed by the severity and type of vulnerability, together with the location of the vulnerable code if available.
and store it in a GHSASummary entry in the database.
```
Prompt text has a few issues that may confuse the agent: "with the return_results option" is ambiguous (it’s a boolean, but the prompt doesn’t say true/false), and "provide a summary of the findings including." has a stray period. Consider clarifying return_results: true/false explicitly and fixing the punctuation to make the instructions unambiguous.
This issue also appears on line 28 of the same file.
See below for a potential fix:

```text
with the `return_results` option set to `false`.
If an error occurs during fetching, store a GHSASummary with the error message.
Ensure the error message starts with "Error:" followed by a description of the error.
If one or more advisories are found, provide a summary of the findings, including the ghsa_id of each advisory, the severity and type of vulnerability, and the location of the vulnerable code if available, and store this summary in a GHSASummary entry in the database.
```
```yaml
toolboxes:
  - seclab_taskflows.toolboxes.ghsa
  - seclab_taskflows.toolboxes.local_file_viewer
```
```python
    """Fetch all GitHub Security Advisories (GHSAs) for a specific repository and store them in the database."""
    results = await fetch_GHSA_list_from_gh(owner, repo)
    if isinstance(results, str):
        return results
```
fetch_and_store_GHSA_list returns raw error strings from fetch_GHSA_list_from_gh (e.g., "Request error: ...") without normalizing them. Since the taskflow prompt requires errors to start with "Error:", consider wrapping/standardizing error returns here (and/or in fetch_GHSA_list_from_gh) so downstream prompts can reliably detect failures.
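One way to standardize this would be a small helper (the name `normalize_error` and the sample message are illustrative, not from the PR):

```python
def normalize_error(message: str) -> str:
    # Prefix non-conforming failure strings so downstream prompts can
    # detect errors by a stable "Error:" prefix without re-prefixing
    # strings that already conform.
    if message.startswith("Error:"):
        return message
    return f"Error: {message}"
```

`fetch_and_store_GHSA_list` could then `return normalize_error(results)` instead of returning the raw string.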
```diff
@@ -6,7 +6,12 @@
 import json
```
This file appears to lack the standard SPDX copyright/license header used by other modules in src/seclab_taskflows/mcp_servers/. Consider adding the SPDX header above the imports to keep licensing metadata consistent.
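For example, in the conventional form (the copyright line and license identifier below are placeholders; copy the exact header the other modules in `src/seclab_taskflows/mcp_servers/` use rather than assuming one):

```python
# SPDX-FileCopyrightText: <copy the project's actual copyright line>
# SPDX-License-Identifier: <copy the identifier declared by sibling modules>
```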
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Removed the function get_ghsa_with_id_from_db that retrieves a GHSA advisory record by ID.
Clarified the user prompt for fetching GitHub Security Advisories by specifying that the return_results option should be set to true.
Pull request overview
Copilot reviewed 7 out of 7 changed files in this pull request and generated 6 comments.
```text
Fetch and store all GitHub Security Advisories (GHSAs) for the repo {{ globals.repo }}
with the return_results option set to true.
```
The updated prompt text has minor grammar/punctuation issues (missing period after the repo interpolation, and a trailing space after "option."). Since this text directly steers tool usage, please tighten it up so it’s unambiguous and consistently formatted.
```diff
- Fetch and store all GitHub Security Advisories (GHSAs) for the repo {{ globals.repo }}
- with the return_results option set to true.
+ Fetch and store all GitHub Security Advisories (GHSAs) for the repo {{ globals.repo }}.
+ Set the return_results option to true.
```
```python
        if not Path(self.db_dir).exists():
            db_uri = "sqlite://"
        else:
            db_uri = f"sqlite:///{self.db_dir}/ghsa.db"
```
If the configured GHSA_DIR directory doesn’t exist, this falls back to an in-memory SQLite database ("sqlite://"), which defeats the goal of persisting advisories across runs. Consider creating the directory (mkdir parents/exist_ok) and always using a file-backed DB URI, or returning an explicit error if the directory cannot be created.
```diff
- if not Path(self.db_dir).exists():
-     db_uri = "sqlite://"
- else:
-     db_uri = f"sqlite:///{self.db_dir}/ghsa.db"
+ db_dir_path = Path(self.db_dir)
+ try:
+     db_dir_path.mkdir(parents=True, exist_ok=True)
+ except OSError as e:
+     logging.error("Failed to create GHSA database directory '%s': %s", self.db_dir, e)
+     raise RuntimeError(f"Cannot create GHSA database directory: {self.db_dir}") from e
+ db_path = db_dir_path / "ghsa.db"
+ db_uri = f"sqlite:///{db_path}"
```
```python
    for advisory in results:
        backend.store_new_ghsa(
            process_repo(owner, repo),
            advisory["ghsa_id"],
            advisory["severity"],
            advisory["cve_id"],
            advisory["description"],
            advisory["summary"],
            advisory["published_at"],
            advisory["state"],
        )
```
fetch_and_store_GHSA_list stores each advisory by calling backend.store_new_ghsa(...) inside a loop, which opens a new Session and commits for every row. This is unnecessarily expensive for large advisory lists; prefer using a single Session/transaction for the whole batch (and commit once), or add a backend method that upserts the whole list in one go.
```diff
- for advisory in results:
-     backend.store_new_ghsa(
-         process_repo(owner, repo),
-         advisory["ghsa_id"],
-         advisory["severity"],
-         advisory["cve_id"],
-         advisory["description"],
-         advisory["summary"],
-         advisory["published_at"],
-         advisory["state"],
-     )
+ # Use a single Session/transaction for the whole batch to avoid per-row session/commit overhead.
+ repo_identifier = process_repo(owner, repo)
+ with Session(bind=engine) as session:
+     for advisory in results:
+         ghsa_obj = GHSA(
+             repo=repo_identifier,
+             ghsa_id=advisory["ghsa_id"],
+             severity=advisory["severity"],
+             cve_id=advisory["cve_id"],
+             description=advisory["description"],
+             summary=advisory["summary"],
+             published_at=advisory["published_at"],
+             state=advisory["state"],
+         )
+         # merge provides upsert-like behaviour if a matching row already exists
+         session.merge(ghsa_obj)
+     session.commit()
```
```python
        existing.high_severity_count = high_severity_count
        existing.medium_severity_count = medium_severity_count
        existing.low_severity_count = low_severity_count
        existing.summary_notes = (existing.summary_notes or "") + (summary_notes or "")
```
store_new_ghsa_summary appends to summary_notes even on the "store" path (existing.summary_notes = ... + ...). This makes add_ghsa_summary_notes largely redundant and can cause unbounded growth / stale errors being carried forward. Consider making store_new_ghsa_summary overwrite summary_notes and keeping all appending behavior in add_ghsa_summary_notes (ideally with a delimiter like a newline).
```diff
- existing.summary_notes = (existing.summary_notes or "") + (summary_notes or "")
+ existing.summary_notes = summary_notes
```
```python
    """Get all GHSA advisory records for a repository."""
    return json.dumps(backend.get_ghsas(process_repo(owner, repo)))


@mcp.tool()
def store_new_ghsa_summary(
```
get_ghsa_with_id_from_db appears to be functionally identical to get_ghsa_from_db (same parameters and behavior). Consider removing one, or making one an alias/wrapper, to avoid divergence and extra maintenance.
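A thin alias is one way to keep both tool names without risking divergence. A minimal sketch (the in-memory `FAKE_DB` stands in for the real SQL-backed store and exists only for illustration):

```python
import json

# Hypothetical stand-in for the SQL-backed advisory store.
FAKE_DB = {"octo/repo": [{"ghsa_id": "GHSA-xxxx-yyyy-zzzz", "severity": "high"}]}

def get_ghsa_from_db(owner: str, repo: str) -> str:
    """Canonical tool: return all advisories for a repo as a JSON string."""
    return json.dumps(FAKE_DB.get(f"{owner}/{repo}", []))

def get_ghsa_with_id_from_db(owner: str, repo: str) -> str:
    """Kept only as an alias; delegating means the two can never diverge."""
    return get_ghsa_from_db(owner, repo)
```

(The PR history below shows the duplicate was ultimately removed instead, which is equally valid.)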
```text
If an error occurs during fetching, store the error message in memcache under the key 'security_advisories_{{ globals.repo }}'.
If an error occurs during fetching, store a GHSASummary with the error message.
Ensure the error message starts with "Error:" followed by a description of the error.
```
This prompt requires any fetch error message to start with "Error:", but the GHSA MCP server can also return non-prefixed failure strings (e.g. "Could not parse response" / "No advisories found."). Either relax this requirement in the prompt, or update the GHSA server to consistently prefix non-success terminal returns with "Error:" so the model can comply deterministically.
```diff
- Ensure the error message starts with "Error:" followed by a description of the error.
+ Use the error text exactly as returned by the GHSA toolbox; do not assume it starts with any particular prefix.
```
|
@m-y-mo thanks for creating this PR. Removing the memcache makes sense, and it would also allow the user to see exactly which advisories the workflow sees. Looks like the linter is complaining 🔮
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Pull request overview
Copilot reviewed 7 out of 7 changed files in this pull request and generated 2 comments.
```python
    return backend.store_new_ghsa_summary(
        repo_name,
        existing["total_advisories"],
        existing["high_severity_count"],
        existing["medium_severity_count"],
        existing["low_severity_count"],
        summary_notes,
```
add_ghsa_summary_notes is documented/named as an append operation, but it overwrites summary_notes by calling store_new_ghsa_summary(..., summary_notes) without combining with the existing notes. This makes it impossible to accumulate notes and is inconsistent with the tool name/docstring. Consider appending to existing["summary_notes"] (with a separator) rather than replacing it.
```diff
- return backend.store_new_ghsa_summary(
-     repo_name,
-     existing["total_advisories"],
-     existing["high_severity_count"],
-     existing["medium_severity_count"],
-     existing["low_severity_count"],
-     summary_notes,
+ # Combine existing summary notes with the new notes, if any.
+ existing_notes = existing.get("summary_notes") or ""
+ if summary_notes:
+     if existing_notes:
+         combined_notes = existing_notes + "\n" + summary_notes
+     else:
+         combined_notes = summary_notes
+ else:
+     combined_notes = existing_notes
+ return backend.store_new_ghsa_summary(
+     repo_name,
+     existing["total_advisories"],
+     existing["high_severity_count"],
+     existing["medium_severity_count"],
+     existing["low_severity_count"],
+     combined_notes,
```
```python
    """Fetch all GitHub Security Advisories (GHSAs) for a specific repository and store them in the database."""
    results = await fetch_GHSA_list_from_gh(owner, repo)
    if isinstance(results, str):
        return results
    for advisory in results:
        backend.store_new_ghsa(
            process_repo(owner, repo),
            advisory["ghsa_id"],
            advisory["severity"],
            advisory["cve_id"],
            advisory["description"],
            advisory["summary"],
            advisory["published_at"],
            advisory["state"],
        )
    if return_results:
        return json.dumps(results, indent=2)
    return f"Fetched and stored {len(results)} GHSAs for {owner}/{repo}"
```
fetch_and_store_GHSA_list stores new GHSA rows but never clears existing rows for the repo. If the repo’s advisories are removed/withdrawn, or if a fetch returns "No advisories found." / an error, stale advisories (and any previous summary) will remain in the DB and be treated as current by downstream prompts. Consider clearing existing GHSA/GHSASummary for the repo at the start of this tool (or explicitly handling the no-advisories/error cases by clearing/updating the summary).
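A minimal sketch of the delete-then-insert pattern (plain sqlite3 for brevity, with a reduced column set; the real server would do the same with a SQLAlchemy `delete()` plus inserts in one transaction, and the function name is hypothetical):

```python
import sqlite3

def replace_repo_advisories(conn: sqlite3.Connection, repo: str, advisories: list[dict]) -> None:
    # Delete the repo's old rows and insert the fresh list atomically,
    # so withdrawn advisories cannot survive a successful re-fetch.
    with conn:
        conn.execute("DELETE FROM ghsa WHERE repo = ?", (repo,))
        conn.executemany(
            "INSERT INTO ghsa (repo, ghsa_id, severity) VALUES (?, ?, ?)",
            [(repo, a["ghsa_id"], a["severity"]) for a in advisories],
        )
```

Because both statements run in one transaction, a failed fetch that raises before the insert leaves the old rows intact rather than half-cleared.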
Add newline at the end of the file for consistency.
Pull request overview
Copilot reviewed 7 out of 7 changed files in this pull request and generated 3 comments.
```python
class Base(DeclarativeBase):
    pass

class GHSA(Base):
    __tablename__ = "ghsa"

    id: Mapped[int] = mapped_column(primary_key=True)
    ghsa_id: Mapped[str]
    repo: Mapped[str]
    severity: Mapped[str]
    cve_id: Mapped[Optional[str]] = mapped_column(nullable=True)
    description: Mapped[Optional[str]] = mapped_column(Text, nullable=True)
    summary: Mapped[Optional[str]] = mapped_column(Text, nullable=True)
    published_at: Mapped[Optional[str]] = mapped_column(nullable=True)
    state: Mapped[Optional[str]] = mapped_column(nullable=True)

    def __repr__(self):
        return (
            f"<GHSA(id={self.id}, ghsa_id={self.ghsa_id}, repo={self.repo}, "
            f"severity={self.severity}, cve_id={self.cve_id}, description={self.description}, summary={self.summary}, "
            f"published_at={self.published_at}, state={self.state})>"
        )

class GHSASummary(Base):
    __tablename__ = "ghsa_summary"
```
ghsa_models.py is missing the required two blank lines between top-level class definitions (e.g., between Base and GHSA, and between GHSA and GHSASummary). This will likely fail Ruff/pycodestyle (E302). Add an extra blank line before each top-level class (except the first one).
```python
    """Fetch all GitHub Security Advisories (GHSAs) for a specific repository and store them in the database."""
    results = await fetch_GHSA_list_from_gh(owner, repo)
    if isinstance(results, str):
        return results
    for advisory in results:
```
fetch_and_store_GHSA_list returns raw error strings from fetch_GHSA_list_from_gh unchanged. Those errors currently come back as e.g. "Request error: ..." / "HTTP error: ..." (no leading "Error:"), which conflicts with the updated taskflow prompt that requires error messages to start with "Error:". Consider normalizing error returns here (and/or in fetch_GHSA_list_from_gh) to consistently prefix errors with "Error:".
```python
    return backend.store_new_ghsa_summary(
        repo_name,
        existing["total_advisories"],
        existing["high_severity_count"],
        existing["medium_severity_count"],
        existing["low_severity_count"],
        summary_notes,
```
add_ghsa_summary_notes is documented/named as appending notes, but it overwrites summary_notes by passing only the new notes into store_new_ghsa_summary. Either actually append to the existing summary_notes value (e.g., concatenate with a separator) or rename/update the docstring to reflect overwrite behavior.
```diff
- return backend.store_new_ghsa_summary(
-     repo_name,
-     existing["total_advisories"],
-     existing["high_severity_count"],
-     existing["medium_severity_count"],
-     existing["low_severity_count"],
-     summary_notes,
+ existing_notes = existing.get("summary_notes") or ""
+ new_notes = summary_notes or ""
+ if existing_notes and new_notes:
+     combined_notes = existing_notes.rstrip() + "\n\n" + new_notes.lstrip()
+ elif new_notes:
+     combined_notes = new_notes
+ else:
+     combined_notes = existing_notes
+ return backend.store_new_ghsa_summary(
+     repo_name,
+     existing["total_advisories"],
+     existing["high_severity_count"],
+     existing["medium_severity_count"],
+     existing["low_severity_count"],
+     combined_notes,
```
|
@Kwstubbs Thanks. I've addressed your comment and fixed the linter errors.
Renamed function to better reflect its purpose.
Pull request overview
Copilot reviewed 7 out of 7 changed files in this pull request and generated no new comments.
@Kwstubbs The PR #54 added the memcache toolbox to the classify_application_local and audit_issue_local_iter taskflows, which sometimes causes these taskflows to store audit/issue notes in memcache. In some cases, the notes are not labelled with an issue id and get mixed up or discarded, and sometimes the taskflows may also get stuck asking for confirmation to clear the memcache. This PR removes the memcache toolbox dependency and uses a dedicated database table for advisory information.