Skip to content

use dedicated database for advisories#69

Merged
m-y-mo merged 9 commits intomainfrom
add_advisory_store
Mar 31, 2026
Merged

use dedicated database for advisories#69
m-y-mo merged 9 commits intomainfrom
add_advisory_store

Conversation

@m-y-mo
Copy link
Copy Markdown
Contributor

@m-y-mo m-y-mo commented Mar 26, 2026

@Kwstubbs The PR #54 added memcache toolbox to the classify_application_local and audit_issue_local_iter taskflows, which sometimes causes these taskflows to store audit/issue notes in memcache. In some cases, the notes are not labelled with issue id and got mixed up/discarded, and sometimes the taskflows may also get stuck asking for a confirmation to clear the memcache. This PR remove the memcache toolbox dependency and use a dedicated database table for advisory information.

Copilot AI review requested due to automatic review settings March 26, 2026 14:36
@m-y-mo m-y-mo requested a review from Kwstubbs as a code owner March 26, 2026 14:36
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR replaces the previous memcache-based storage of GitHub Security Advisory (GHSA) data in audit taskflows with a dedicated SQL (SQLite) persistence layer exposed via the existing ghsa MCP server/toolbox, to prevent advisory notes from being mixed/discarded and avoid memcache-clear confirmation stalls.

Changes:

  • Update audit taskflows/prompts to stop using the memcache toolbox and instead use the ghsa toolbox + DB-backed GHSA/GHSASummary records.
  • Extend the ghsa toolbox/server configuration to include a dedicated data directory env var for persistence.
  • Add SQLAlchemy models and MCP tools to store/fetch/clear GHSA advisories and summary records.

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 9 comments.

Show a summary per file
File Description
src/seclab_taskflows/toolboxes/ghsa.yaml Adds GHSA_DIR env to back the dedicated advisory DB storage location.
src/seclab_taskflows/taskflows/audit/fetch_security_advisories.yaml Updates fetching instructions/toolboxes to use ghsa DB storage instead of memcache.
src/seclab_taskflows/taskflows/audit/classify_application_local.yaml Replaces memcache toolbox dependency with ghsa toolbox for advisory-aware classification.
src/seclab_taskflows/taskflows/audit/audit_issue_local_iter.yaml Replaces memcache toolbox dependency with ghsa toolbox for advisory-aware auditing.
src/seclab_taskflows/prompts/audit/known_security_advisories.yaml Switches advisory lookup instructions from memcache to DB-backed GHSA/GHSASummary.
src/seclab_taskflows/mcp_servers/ghsa_models.py Introduces SQLAlchemy models/tables for GHSA and GHSASummary persistence.
src/seclab_taskflows/mcp_servers/ghsa.py Implements SQLite backend + new MCP tools for storing and retrieving advisories/summaries.
Comments suppressed due to low confidence (1)

src/seclab_taskflows/taskflows/audit/fetch_security_advisories.yaml:29

  • Spelling/grammar: "provide a summary of the findings including." should not end with a period (use a colon or remove the punctuation) to avoid a sentence fragment.
        If one or more advisories are found, provide a summary of the findings including.
        The ghsa_id of each advisory, followed by the severity and type of vulnerability, together with the location of the vulnerable code if available.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +223 to +233
for advisory in results:
backend.store_new_ghsa(
process_repo(owner, repo),
advisory["ghsa_id"],
advisory["severity"],
advisory["cve_id"],
advisory["description"],
advisory["summary"],
advisory["published_at"],
advisory["state"],
)
Copy link

Copilot AI Mar 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fetch_and_store_GHSA_list calls backend.store_new_ghsa in a loop, and store_new_ghsa opens a new SQLAlchemy Session and commits per advisory. This is likely to be slow for repos with many advisories. Prefer doing the upsert work in a single session/transaction for the whole batch.

Copilot uses AI. Check for mistakes.
Comment on lines +22 to +30
Fetch and store all GitHub Security Advisories (GHSAs) for the repo {{ globals.repo }}
with the return_results option.

If an error occurs during fetching, store the error message in memcache under the key 'security_advisories_{{ globals.repo }}'.
If an error occurs during fetching, store a GHSASummary with the error message.
Ensure the error message starts with "Error:" followed by a description of the error.

If fetching is successful, store the list of advisories in memcache under the key 'security_advisories_{{ globals.repo }}'.

If one or more advisories are found, provide a summary of the findings including:
1. How many advisories were found
2. The severity levels of the advisories
If one or more advisories are found, provide a summary of the findings including.
The ghsa_id of each advisory, followed by the severity and type of vulnerability, together with the location of the vulnerable code if available.
and store it in a GHSASummary entry in the database.
Copy link

Copilot AI Mar 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Prompt text has a few issues that may confuse the agent: "with the return_results option" is ambiguous (it’s a boolean, but the prompt doesn’t say true/false), and "provide a summary of the findings including." has a stray period. Consider clarifying return_results: true/false explicitly and fixing the punctuation to make the instructions unambiguous.

This issue also appears on line 28 of the same file.

See below for a potential fix:

        with the `return_results` option set to `false`.

        If an error occurs during fetching, store a GHSASummary with the error message.
        Ensure the error message starts with "Error:" followed by a description of the error.

        If one or more advisories are found, provide a summary of the findings, including the ghsa_id of each advisory, the severity and type of vulnerability, and the location of the vulnerable code if available, and store this summary in a GHSASummary entry in the database.
      toolboxes:
        - seclab_taskflows.toolboxes.ghsa
        - seclab_taskflows.toolboxes.local_file_viewer

        - seclab_taskflows.toolboxes.ghsa
        - seclab_taskflows.toolboxes.local_file_viewer

Copilot uses AI. Check for mistakes.
Comment on lines +219 to +222
"""Fetch all GitHub Security Advisories (GHSAs) for a specific repository and store them in the database."""
results = await fetch_GHSA_list_from_gh(owner, repo)
if isinstance(results, str):
return results
Copy link

Copilot AI Mar 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fetch_and_store_GHSA_list returns raw error strings from fetch_GHSA_list_from_gh (e.g., "Request error: ...") without normalizing them. Since the taskflow prompt requires errors to start with "Error:", consider wrapping/standardizing error returns here (and/or in fetch_GHSA_list_from_gh) so downstream prompts can reliably detect failures.

Copilot uses AI. Check for mistakes.
@@ -6,7 +6,12 @@
import json
Copy link

Copilot AI Mar 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This file appears to lack the standard SPDX copyright/license header used by other modules in src/seclab_taskflows/mcp_servers/. Consider adding the SPDX header above the imports to keep licensing metadata consistent.

Copilot uses AI. Check for mistakes.
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings March 26, 2026 14:57
m-y-mo added 3 commits March 26, 2026 14:58
Removed the function get_ghsa_with_id_from_db that retrieves a GHSA advisory record by ID.
Clarified the user prompt for fetching GitHub Security Advisories by specifying that the return_results option should be set to true.
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated 6 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +22 to +23
Fetch and store all GitHub Security Advisories (GHSAs) for the repo {{ globals.repo }}
with the return_results option set to true.
Copy link

Copilot AI Mar 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The updated prompt text has minor grammar/punctuation issues (missing period after the repo interpolation, and a trailing space after "option."). Since this text directly steers tool usage, please tighten it up so it’s unambiguous and consistently formatted.

Suggested change
Fetch and store all GitHub Security Advisories (GHSAs) for the repo {{ globals.repo }}
with the return_results option set to true.
Fetch and store all GitHub Security Advisories (GHSAs) for the repo {{ globals.repo }}.
Set the return_results option to true.

Copilot uses AI. Check for mistakes.
Comment on lines +60 to +63
if not Path(self.db_dir).exists():
db_uri = "sqlite://"
else:
db_uri = f"sqlite:///{self.db_dir}/ghsa.db"
Copy link

Copilot AI Mar 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the configured GHSA_DIR directory doesn’t exist, this falls back to an in-memory SQLite database ("sqlite://"), which defeats the goal of persisting advisories across runs. Consider creating the directory (mkdir parents/exist_ok) and always using a file-backed DB URI, or returning an explicit error if the directory cannot be created.

Suggested change
if not Path(self.db_dir).exists():
db_uri = "sqlite://"
else:
db_uri = f"sqlite:///{self.db_dir}/ghsa.db"
db_dir_path = Path(self.db_dir)
try:
db_dir_path.mkdir(parents=True, exist_ok=True)
except OSError as e:
logging.error("Failed to create GHSA database directory '%s': %s", self.db_dir, e)
raise RuntimeError(f"Cannot create GHSA database directory: {self.db_dir}") from e
db_path = db_dir_path / "ghsa.db"
db_uri = f"sqlite:///{db_path}"

Copilot uses AI. Check for mistakes.
Comment on lines +226 to +236
for advisory in results:
backend.store_new_ghsa(
process_repo(owner, repo),
advisory["ghsa_id"],
advisory["severity"],
advisory["cve_id"],
advisory["description"],
advisory["summary"],
advisory["published_at"],
advisory["state"],
)
Copy link

Copilot AI Mar 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fetch_and_store_GHSA_list stores each advisory by calling backend.store_new_ghsa(...) inside a loop, which opens a new Session and commits for every row. This is unnecessarily expensive for large advisory lists; prefer using a single Session/transaction for the whole batch (and commit once), or add a backend method that upserts the whole list in one go.

Suggested change
for advisory in results:
backend.store_new_ghsa(
process_repo(owner, repo),
advisory["ghsa_id"],
advisory["severity"],
advisory["cve_id"],
advisory["description"],
advisory["summary"],
advisory["published_at"],
advisory["state"],
)
# Use a single Session/transaction for the whole batch to avoid per-row session/commit overhead.
repo_identifier = process_repo(owner, repo)
with Session(bind=engine) as session:
for advisory in results:
ghsa_obj = GHSA(
repo=repo_identifier,
ghsa_id=advisory["ghsa_id"],
severity=advisory["severity"],
cve_id=advisory["cve_id"],
description=advisory["description"],
summary=advisory["summary"],
published_at=advisory["published_at"],
state=advisory["state"],
)
# merge provides upsert-like behaviour if a matching row already exists
session.merge(ghsa_obj)
session.commit()

Copilot uses AI. Check for mistakes.
existing.high_severity_count = high_severity_count
existing.medium_severity_count = medium_severity_count
existing.low_severity_count = low_severity_count
existing.summary_notes = (existing.summary_notes or "") + (summary_notes or "")
Copy link

Copilot AI Mar 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

store_new_ghsa_summary appends to summary_notes even on the "store" path (existing.summary_notes = ... + ...). This makes add_ghsa_summary_notes largely redundant and can cause unbounded growth / stale errors being carried forward. Consider making store_new_ghsa_summary overwrite summary_notes and keeping all appending behavior in add_ghsa_summary_notes (ideally with a delimiter like a newline).

Suggested change
existing.summary_notes = (existing.summary_notes or "") + (summary_notes or "")
existing.summary_notes = summary_notes

Copilot uses AI. Check for mistakes.
Comment on lines +277 to +281
"""Get all GHSA advisory records for a repository."""
return json.dumps(backend.get_ghsas(process_repo(owner, repo)))

@mcp.tool()
def store_new_ghsa_summary(
Copy link

Copilot AI Mar 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

get_ghsa_with_id_from_db appears to be functionally identical to get_ghsa_from_db (same parameters and behavior). Consider removing one, or making one an alias/wrapper, to avoid divergence and extra maintenance.

Copilot uses AI. Check for mistakes.

If an error occurs during fetching, store the error message in memcache under the key 'security_advisories_{{ globals.repo }}'.
If an error occurs during fetching, store a GHSASummary with the error message.
Ensure the error message starts with "Error:" followed by a description of the error.
Copy link

Copilot AI Mar 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This prompt requires any fetch error message to start with "Error:", but the GHSA MCP server can also return non-prefixed failure strings (e.g. "Could not parse response" / "No advisories found."). Either relax this requirement in the prompt, or update the GHSA server to consistently prefix non-success terminal returns with "Error:" so the model can comply deterministically.

Suggested change
Ensure the error message starts with "Error:" followed by a description of the error.
Use the error text exactly as returned by the GHSA toolbox; do not assume it starts with any particular prefix.

Copilot uses AI. Check for mistakes.
@Kwstubbs
Copy link
Copy Markdown
Contributor

Kwstubbs commented Mar 27, 2026

@m-y-mo thanks for creating this PR. Removing the memcache makes sense, and it would also allow the user to see exactly which advisories the workflows sees. Looks like the linter is complaining 🔮

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings March 27, 2026 08:06
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated 2 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +312 to +318
return backend.store_new_ghsa_summary(
repo_name,
existing["total_advisories"],
existing["high_severity_count"],
existing["medium_severity_count"],
existing["low_severity_count"],
summary_notes,
Copy link

Copilot AI Mar 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add_ghsa_summary_notes is documented/named as an append operation, but it overwrites summary_notes by calling store_new_ghsa_summary(..., summary_notes) without combining with the existing notes. This makes it impossible to accumulate notes and is inconsistent with the tool name/docstring. Consider appending to existing["summary_notes"] (with a separator) rather than replacing it.

Suggested change
return backend.store_new_ghsa_summary(
repo_name,
existing["total_advisories"],
existing["high_severity_count"],
existing["medium_severity_count"],
existing["low_severity_count"],
summary_notes,
# Combine existing summary notes with the new notes, if any.
existing_notes = existing.get("summary_notes") or ""
if summary_notes:
if existing_notes:
combined_notes = existing_notes + "\n" + summary_notes
else:
combined_notes = summary_notes
else:
combined_notes = existing_notes
return backend.store_new_ghsa_summary(
repo_name,
existing["total_advisories"],
existing["high_severity_count"],
existing["medium_severity_count"],
existing["low_severity_count"],
combined_notes,

Copilot uses AI. Check for mistakes.
Comment on lines +222 to +239
"""Fetch all GitHub Security Advisories (GHSAs) for a specific repository and store them in the database."""
results = await fetch_GHSA_list_from_gh(owner, repo)
if isinstance(results, str):
return results
for advisory in results:
backend.store_new_ghsa(
process_repo(owner, repo),
advisory["ghsa_id"],
advisory["severity"],
advisory["cve_id"],
advisory["description"],
advisory["summary"],
advisory["published_at"],
advisory["state"],
)
if return_results:
return json.dumps(results, indent=2)
return f"Fetched and stored {len(results)} GHSAs for {owner}/{repo}"
Copy link

Copilot AI Mar 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fetch_and_store_GHSA_list stores new GHSA rows but never clears existing rows for the repo. If the repo’s advisories are removed/withdrawn, or if a fetch returns "No advisories found." / an error, stale advisories (and any previous summary) will remain in the DB and be treated as current by downstream prompts. Consider clearing existing GHSA/GHSASummary for the repo at the start of this tool (or explicitly handling the no-advisories/error cases by clearing/updating the summary).

Copilot uses AI. Check for mistakes.
Copilot AI review requested due to automatic review settings March 27, 2026 08:20
Add newline at the end of the file for consistency.
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated 3 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +9 to +33
class Base(DeclarativeBase):
pass

class GHSA(Base):
__tablename__ = "ghsa"

id: Mapped[int] = mapped_column(primary_key=True)
ghsa_id: Mapped[str]
repo: Mapped[str]
severity: Mapped[str]
cve_id: Mapped[Optional[str]] = mapped_column(nullable=True)
description: Mapped[Optional[str]] = mapped_column(Text, nullable=True)
summary: Mapped[Optional[str]] = mapped_column(Text, nullable=True)
published_at: Mapped[Optional[str]] = mapped_column(nullable=True)
state: Mapped[Optional[str]] = mapped_column(nullable=True)

def __repr__(self):
return (
f"<GHSA(id={self.id}, ghsa_id={self.ghsa_id}, repo={self.repo}, "
f"severity={self.severity}, cve_id={self.cve_id}, description={self.description}, summary={self.summary}, "
f"published_at={self.published_at}, state={self.state})>"
)

class GHSASummary(Base):
__tablename__ = "ghsa_summary"
Copy link

Copilot AI Mar 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ghsa_models.py is missing the required two blank lines between top-level class definitions (e.g., between Base and GHSA, and between GHSA and GHSASummary). This will likely fail Ruff/pycodestyle (E302). Add an extra blank line before each top-level class (except the first one).

Copilot uses AI. Check for mistakes.
Comment on lines +219 to +223
"""Fetch all GitHub Security Advisories (GHSAs) for a specific repository and store them in the database."""
results = await fetch_GHSA_list_from_gh(owner, repo)
if isinstance(results, str):
return results
for advisory in results:
Copy link

Copilot AI Mar 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fetch_and_store_GHSA_list returns raw error strings from fetch_GHSA_list_from_gh unchanged. Those errors currently come back as e.g. "Request error: ..." / "HTTP error: ..." (no leading "Error:"), which conflicts with the updated taskflow prompt that requires error messages to start with "Error:". Consider normalizing error returns here (and/or in fetch_GHSA_list_from_gh) to consistently prefix errors with "Error:".

Copilot uses AI. Check for mistakes.
Comment on lines +309 to +315
return backend.store_new_ghsa_summary(
repo_name,
existing["total_advisories"],
existing["high_severity_count"],
existing["medium_severity_count"],
existing["low_severity_count"],
summary_notes,
Copy link

Copilot AI Mar 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add_ghsa_summary_notes is documented/named as appending notes, but it overwrites summary_notes by passing only the new notes into store_new_ghsa_summary. Either actually append to the existing summary_notes value (e.g., concatenate with a separator) or rename/update the docstring to reflect overwrite behavior.

Suggested change
return backend.store_new_ghsa_summary(
repo_name,
existing["total_advisories"],
existing["high_severity_count"],
existing["medium_severity_count"],
existing["low_severity_count"],
summary_notes,
existing_notes = existing.get("summary_notes") or ""
new_notes = summary_notes or ""
if existing_notes and new_notes:
combined_notes = existing_notes.rstrip() + "\n\n" + new_notes.lstrip()
elif new_notes:
combined_notes = new_notes
else:
combined_notes = existing_notes
return backend.store_new_ghsa_summary(
repo_name,
existing["total_advisories"],
existing["high_severity_count"],
existing["medium_severity_count"],
existing["low_severity_count"],
combined_notes,

Copilot uses AI. Check for mistakes.
@m-y-mo
Copy link
Copy Markdown
Contributor Author

m-y-mo commented Mar 27, 2026

@Kwstubbs Thanks. I've addressed your comment and fixed the linter errors.

Renamed function to better reflect its purpose.
Copilot AI review requested due to automatic review settings March 27, 2026 08:30
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated no new comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@m-y-mo m-y-mo merged commit 3cdb7d0 into main Mar 31, 2026
13 checks passed
@m-y-mo m-y-mo deleted the add_advisory_store branch March 31, 2026 10:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants