Add support for JavaScript #593
base: staging
Commit b1a6b81
Check failure
Code scanning / CodeQL
Uncontrolled data used in path expression High
Copilot Autofix
AI 1 day ago
In general, to fix this you should ensure that any filesystem path derived from user input is constrained to a safe root directory and normalized before use. That means: (1) define a base directory under which all analyses must occur; (2) resolve the user-supplied path relative to that base; (3) reject anything whose normalized/real path escapes that base (e.g., via `..` or absolute paths); and (4) only then pass the safe resolved path into the rest of the code. This prevents a client from causing the server to traverse or operate on arbitrary parts of the filesystem.

For this codebase, the best minimally invasive fix is to introduce such validation in `SourceAnalyzer.analyze_local_folder`, since that is the point where the untrusted `path: str` first enters the analyzer layer and gets converted to a `Path`. We can:

- Introduce a configurable base directory, `CODE_GRAPH_PROJECTS_ROOT`, defaulting to the current working directory if unset. This keeps behavior similar while allowing operators to constrain where analyses can occur.
- In `analyze_local_folder`, convert both the configured base path and the user `path` into absolute, resolved `Path` objects using `.resolve()`.
- If the resolved `path` is not under the base (check via `relative_to` or a simple prefix check), log an error and raise an exception instead of proceeding.
- Pass the validated `Path` to `analyze_sources` so that downstream calls (`path.rglob(...)`) operate only within the validated directory tree.

This change only touches `api/analyzers/source_analyzer.py`, keeps the external API of `analyze_local_folder` unchanged, and preserves existing functionality for valid paths that lie under the configured base directory. We will add a small helper method inside `SourceAnalyzer` to encapsulate the “ensure path under base” logic and call it from `analyze_local_folder`. We do not need new imports beyond what already exists.

Check failure
Code scanning / CodeQL
Uncontrolled data used in path expression High
Copilot Autofix
AI 1 day ago
General approach: treat user-provided paths as relative to, or confined within, a known safe root directory on the server, and validate/normalize them before using them as a root for filesystem traversal. Use `Path.resolve()` and then verify that the resolved path is within a configured base directory (for example, the directory where repositories are stored). Reject the request or raise an error if the requested path escapes this root. This prevents a client from causing analysis of arbitrary directories like `/etc` or `/`.

Best concrete fix here: centralize the trust boundary in `SourceAnalyzer.analyze_local_folder` by:

- Defining a safe root directory (for example, a `REPOS_ROOT` under the project). Since we must not assume wider project structure, we’ll use an environment variable `CODEGRAPH_ROOT` with a reasonable default (current working directory).
- In `analyze_local_folder`, converting the incoming `path` string into a `Path`, resolving it, and then checking that it is contained within the safe root using `Path.is_relative_to` (Python 3.9+) or an equivalent prefix check.
- Calling `analyze_sources` with the resolved safe `Path`.
- Applying the same validation in `analyze_local_repository`, which also takes a user-controlled `path`, before interacting with `pygit2.Repository`.

This confines all downstream uses, including the `rglob` on line 180, without changing the external API signatures or the functional behavior for legitimate, in-root paths. The only edits needed are within `api/analyzers/source_analyzer.py`: adding an import for `os` (for the environment variable) if needed, a helper to get/validate the root, and modifications to `analyze_local_folder` (and `analyze_local_repository`) to perform normalization and the “within root” check.

Check failure
Code scanning / CodeQL
Uncontrolled data used in path expression High
Copilot Autofix
AI 1 day ago
General approach: normalize and validate the user-supplied path before using it for filesystem traversal. Optionally (and recommended), restrict analysis to paths under a configured safe root directory. Even if we cannot see configuration here, we can at least normalize and refuse paths that do not resolve to directories or that attempt to climb above an optional root.

Best concrete fix within the shown code:

- In `SourceAnalyzer.analyze_local_folder`, convert the incoming `path: str` to a normalized, absolute `Path` using `Path(path).resolve(strict=True)` inside a `try` block.
- Optionally read an environment variable `CODEGRAPH_REPOS_ROOT` (a safe base directory). If set, ensure the requested path is inside that root. This mirrors the “safe-root” pattern in the background section and doesn’t change existing behavior when the env var is unset.
- Pass the resolved `Path` into `analyze_sources` instead of creating a new `Path` from the raw string.

Concretely, in `api/analyzers/source_analyzer.py`, update `analyze_local_folder` so that:

- It resolves `path` with `base_path = Path(path).resolve(strict=True)`.
- If `CODEGRAPH_REPOS_ROOT` is set in the environment, it also resolves that to `root = Path(os.environ["CODEGRAPH_REPOS_ROOT"]).resolve(strict=True)` and verifies `str(base_path).startswith(str(root))`. If not, it raises `ValueError` or logs and returns early.
- It calls `self.analyze_sources(base_path, ignore, g)` instead of passing `Path(path)`.

This adds robust validation at the boundary where untrusted data enters the analyzer, addresses all CodeQL variants that flow through this method, and preserves existing functionality when no root is configured.
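As a rough illustration of the steps above, here is a minimal standalone sketch. The helper name `resolve_analysis_path` is hypothetical and is not taken from the repository; only the `CODEGRAPH_REPOS_ROOT` variable name comes from the comment. The sketch uses `Path.relative_to` rather than the `startswith` string comparison, on the assumption that a component-wise check is preferable: a plain prefix test would also accept a sibling such as `/srv/repos-evil` when the root is `/srv/repos`.

```python
import os
from pathlib import Path


def resolve_analysis_path(path: str, root_env: str = "CODEGRAPH_REPOS_ROOT") -> Path:
    """Hypothetical helper: normalize `path` and confine it to an optional root."""
    # strict=True raises FileNotFoundError when the path does not exist.
    base_path = Path(path).resolve(strict=True)
    if not base_path.is_dir():
        raise ValueError(f"Not a directory: {base_path}")

    root_value = os.environ.get(root_env)
    if root_value:
        root = Path(root_value).resolve(strict=True)
        try:
            # relative_to compares whole path components, so a sibling
            # directory sharing a name prefix with the root is rejected.
            base_path.relative_to(root)
        except ValueError:
            raise ValueError(f"{base_path} is outside {root}") from None

    # When the variable is unset, the path is only normalized, not confined.
    return base_path
```

A caller along the lines of `analyze_local_folder` could then pass the returned `Path` on to `analyze_sources` instead of building a new `Path` from the raw string.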
We will need to add an `import os` at the top of `api/analyzers/source_analyzer.py` to access environment variables.

Check failure
Code scanning / CodeQL
Uncontrolled data used in path expression High
Copilot Autofix
AI 1 day ago
In general, to fix uncontrolled path usage you must (1) normalize the user-provided path, and (2) enforce that it lies within an allowed base directory (or otherwise belongs to an allow-list) before using it. For directory-wide operations like `rglob`, define a server-side base directory (e.g., from an environment variable or a constant), resolve both base and user path, and then verify that the user path is a subpath of the base. If the check fails, reject the request.

For this codebase, the most direct and non-disruptive fix is inside `SourceAnalyzer.analyze_sources`, because that is where `Path(path)` is used to traverse the filesystem with `rglob`. We can:

- Define a base directory from an environment variable such as `CODE_GRAPH_BASE_DIR` (defaulting to the current working directory or another reasonable root).
- Resolve the incoming `path`.
- Check that the resolved `path` is equal to the base dir or is a subdirectory of it. In Python ≥3.9 this can be done safely with `Path.is_relative_to`; we can implement a small helper using `Path.relative_to` if needed.

This keeps all existing behavior for callers who already point to directories under the chosen base, and restricts path traversal to safe locations. The only file that needs code changes is `api/analyzers/source_analyzer.py`. We will:

- Import `os` to read the environment variable.
- In `analyze_sources`, compute `base_dir = Path(os.getenv("CODE_GRAPH_BASE_DIR", ".")).resolve()` and validate that the resolved `path` is under `base_dir` before doing `rglob`.
- Raise a `ValueError` (or similar) if the check fails.

No changes are needed in `tests/index.py` or `api/index.py` to achieve the core mitigation, since their use of `SourceAnalyzer` will automatically be constrained by this validation.
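The guard described for `analyze_sources` might look roughly like the sketch below. The function names `is_within` and `list_source_files` and the `pattern` parameter are hypothetical stand-ins for illustration; the real method takes other arguments and does more than list files. The helper is written with `Path.relative_to` so it also runs on Python versions before 3.9.

```python
import os
from pathlib import Path
from typing import List


def is_within(child: Path, parent: Path) -> bool:
    """Return True if child equals parent or lies below it (component-wise)."""
    try:
        child.relative_to(parent)
        return True
    except ValueError:
        return False


def list_source_files(path: Path, pattern: str = "*.py") -> List[Path]:
    """Hypothetical stand-in for the traversal step inside analyze_sources."""
    # Base directory comes from the environment, defaulting to the current
    # working directory, as proposed in the comment above.
    base_dir = Path(os.getenv("CODE_GRAPH_BASE_DIR", ".")).resolve()
    resolved = path.resolve()
    if not is_within(resolved, base_dir):
        raise ValueError(f"{resolved} is outside the allowed base {base_dir}")
    # Only reached for paths equal to or below base_dir.
    return sorted(resolved.rglob(pattern))
```

The check runs eagerly and returns a list, so an out-of-base path fails at call time rather than when a lazy iterator is first consumed.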