Add support for JavaScript by gkorland · Pull Request #593 · FalkorDB/code-graph

gkorland · 2026-03-10T08:58:32Z

Migrated from falkordb/code-graph-backend#59

Summary

Add support for JavaScript code analysis using tree-sitter.

Changes:

New JavaScriptAnalyzer class using tree-sitter for JavaScript
Extracts functions and classes from JavaScript code
First and second pass analysis methods
Updated source_analyzer.py to include JavaScript
Added tree-sitter-javascript dependency

Resolves #540

Originally authored by @gkorland in falkordb/code-graph-backend#59

Migrated from FalkorDB/code-graph-backend PR #59.
Original issue: FalkorDB/code-graph-backend#51
Resolves #540
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

vercel · 2026-03-10T08:58:33Z

Project	Deployment	Actions	Updated (UTC)
code-graph	Error		Mar 10, 2026 8:58am

coderabbitai · 2026-03-10T08:58:54Z

Warning

Rate limit exceeded

@gkorland has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 9 minutes and 1 second before requesting another review.

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 9f28da76-7731-4a46-8888-68a08dc3b467

📥 Commits

Reviewing files that changed from the base of the PR and between e0bcdf6 and b1a6b81.

📒 Files selected for processing (4)

api/analyzers/javascript/__init__.py
api/analyzers/javascript/analyzer.py
api/analyzers/source_analyzer.py
pyproject.toml

api/analyzers/source_analyzer.py

    def analyze_sources(self, path: Path, ignore: list[str], graph: Graph) -> None:
        path = path.resolve()
-        files = list(path.rglob("*.java")) + list(path.rglob("*.py")) + list(path.rglob("*.cs"))
+        files = list(path.rglob("*.java")) + list(path.rglob("*.py")) + list(path.rglob("*.cs")) + list(path.rglob("*.js"))


In general, to fix this you should ensure that any filesystem path derived from user input is constrained to a safe root directory and normalized before use. That means: (1) define a base directory under which all analyses must occur; (2) resolve the user-supplied path relative to that base; (3) reject anything whose normalized/real path escapes that base (e.g., via .. or absolute paths); and (4) only then pass the safe resolved path into the rest of the code. This prevents a client from causing the server to traverse or operate on arbitrary parts of the filesystem.

For this codebase, the best minimally invasive fix is to introduce such validation in SourceAnalyzer.analyze_local_folder, since that is the point where the untrusted path: str first enters the analyzer layer and gets converted to a Path. We can:

Decide on a safe root for analysis, e.g. an environment-variable-controlled base like CODE_GRAPH_PROJECTS_ROOT, defaulting to the current working directory if unset. This keeps behavior similar while allowing operators to constrain where analyses can occur.

In analyze_local_folder, convert both the configured base path and the user path into absolute, resolved Path objects using .resolve().

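If path is not under the base, log an error and raise an exception instead of proceeding. Check containment with relative_to, which compares whole path components; a plain string-prefix check is only safe if it also requires a path separator after the base, otherwise a sibling directory whose name merely starts with the base name would pass.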

Pass the resolved, safe Path to analyze_sources so that downstream calls (path.rglob(...)) operate only within the validated directory tree.

This change only touches api/analyzers/source_analyzer.py, keeps the external API of analyze_local_folder unchanged, and preserves existing functionality for valid paths that lie under the configured base directory. We will add a small helper method inside SourceAnalyzer to encapsulate the “ensure path under base” logic and call it from analyze_local_folder. The only new import is os, which is needed to read the environment variable.
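To illustrate why the containment check matters, here is a minimal sketch comparing a naive string-prefix test with a component-wise test based on relative_to. The helper names and the example paths are hypothetical and are not part of the pull request; PurePosixPath is used only so that the comparison behaves the same on every platform.

```python
from pathlib import PurePosixPath


def is_within_prefix(candidate: PurePosixPath, base: PurePosixPath) -> bool:
    """Naive string-prefix check, shown only to illustrate the pitfall."""
    return str(candidate).startswith(str(base))


def is_within(candidate: PurePosixPath, base: PurePosixPath) -> bool:
    """Component-wise containment check built on relative_to."""
    try:
        candidate.relative_to(base)
        return True
    except ValueError:
        return False


# Hypothetical paths: the sibling shares a name prefix with the base.
base = PurePosixPath("/srv/projects")
sibling = PurePosixPath("/srv/projects-private/secrets")
inside = PurePosixPath("/srv/projects/demo/src")

prefix_accepts_sibling = is_within_prefix(sibling, base)   # True: wrongly accepted
strict_accepts_sibling = is_within(sibling, base)          # False: rejected
strict_accepts_inside = is_within(inside, base)            # True: accepted
```

The prefix test accepts the sibling directory because the two strings share leading characters, while relative_to rejects it because the path components differ.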

api/analyzers/source_analyzer.py

    def analyze_sources(self, path: Path, ignore: list[str], graph: Graph) -> None:
        path = path.resolve()
-        files = list(path.rglob("*.java")) + list(path.rglob("*.py")) + list(path.rglob("*.cs"))
+        files = list(path.rglob("*.java")) + list(path.rglob("*.py")) + list(path.rglob("*.cs")) + list(path.rglob("*.js"))


General approach: treat user-provided paths as relative or confined within a known safe root directory on the server, and validate/normalize them before using them as a root for filesystem traversal. Use Path.resolve() and then verify that the resolved path is within a configured base directory (for example, the directory where repositories are stored). Reject or error out if the requested path escapes this root. This prevents a client from causing analysis of arbitrary directories like /etc or /.

Best concrete fix here: centralize the trust boundary in SourceAnalyzer.analyze_local_folder by:

Introducing a function that returns the safe root directory (e.g., from an environment variable or a default like REPOS_ROOT under the project). Since we must not assume wider project structure, we’ll use an environment variable CODEGRAPH_ROOT with a reasonable default (current working directory).

In analyze_local_folder, convert the incoming path string into a Path, resolve it, and then check that it is contained within the safe root using Path.is_relative_to (Python 3.9+) or an equivalent component-wise check.

If the check fails, log an error and raise an exception; if it passes, proceed to call analyze_sources with the resolved safe Path.

Optionally, apply similar confinement in analyze_local_repository, which also takes a user-controlled path, before interacting with pygit2.Repository.

This confines all downstream uses, including the rglob on line 180, without changing the external API signatures or the functional behavior for legitimate, in-root paths. The only edits needed are within api/analyzers/source_analyzer.py: adding imports for os (for environment variable) if needed, a helper to get/validate the root, and modifications to analyze_local_folder (and analyze_local_repository) to perform normalization and the “within root” check.
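The order of operations described above (resolve first, then check containment) can be sketched as follows. This is an illustrative example under stated assumptions, not the patch itself: the function name confine and the temporary directory layout are hypothetical, and Path.is_relative_to requires Python 3.9 or later.

```python
import tempfile
from pathlib import Path


def confine(user_path: str, root: Path) -> Path:
    """Resolve user_path against root and raise ValueError if it escapes root."""
    root = root.resolve()
    # Resolving collapses ".." segments and symlinks before the check runs,
    # so "demo/../../outside" is judged by where it really points.
    candidate = (root / user_path).resolve()
    if not candidate.is_relative_to(root):
        raise ValueError(f"Path '{candidate}' is outside of the allowed root '{root}'")
    return candidate


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "repos"
    (root / "demo").mkdir(parents=True)

    ok = confine("demo", root)  # stays inside the root

    try:
        confine("demo/../../outside", root)  # climbs above the root
        escaped = True
    except ValueError:
        escaped = False
```

Checking the unresolved path instead would let a request such as demo/../../outside appear to sit under the root even though it points elsewhere.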

api/analyzers/source_analyzer.py

    def analyze_sources(self, path: Path, ignore: list[str], graph: Graph) -> None:
        path = path.resolve()
-        files = list(path.rglob("*.java")) + list(path.rglob("*.py")) + list(path.rglob("*.cs"))
+        files = list(path.rglob("*.java")) + list(path.rglob("*.py")) + list(path.rglob("*.cs")) + list(path.rglob("*.js"))


General approach: normalize and validate the user-supplied path before using it for filesystem traversal. Optionally (and recommended), restrict analysis to paths under a configured safe root directory. Even if we cannot see configuration here, we can at least normalize the path and refuse paths that do not resolve to directories or that attempt to climb above an optional root.

Best concrete fix within the shown code:

In SourceAnalyzer.analyze_local_folder, convert the incoming path: str to a normalized, absolute Path using Path(path).resolve(strict=True) inside a try block.

Optionally support an environment variable CODEGRAPH_REPOS_ROOT (a safe base directory). If set, ensure the requested path is inside that root. This mirrors the “safe-root” pattern in the background section and doesn’t change existing behavior when the env var is unset.

Pass this validated Path into analyze_sources instead of creating a new Path from the raw string.

Keep the public function signature unchanged to avoid breaking callers.

Concretely, in api/analyzers/source_analyzer.py:

Modify analyze_local_folder so that:

It resolves path with base_path = Path(path).resolve(strict=True).

If CODEGRAPH_REPOS_ROOT is set in the environment, also resolves that to root = Path(os.environ["CODEGRAPH_REPOS_ROOT"]).resolve(strict=True) and verifies that base_path either equals root or that str(base_path) starts with str(root) followed by a path separator. If not, it raises ValueError or logs and returns early.

It calls self.analyze_sources(base_path, ignore, g) instead of Path(path).

This adds robust validation at the boundary where untrusted data enters the analyzer, addresses all CodeQL variants that flow through this method, and preserves existing functionality when no root is configured.

We will need to add an import os at the top of api/analyzers/source_analyzer.py to access environment variables.
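As a small illustration of the strict resolution step, the sketch below shows how Path.resolve(strict=True) behaves for an existing directory, a missing path, and a regular file. The helper name resolve_existing_dir is hypothetical, and the extra is_dir() check is an assumption added here to cover the "does not resolve to a directory" case; the patch under discussion only relies on strict resolution.

```python
import tempfile
from pathlib import Path
from typing import Optional


def resolve_existing_dir(raw: str) -> Optional[Path]:
    """Return the resolved directory, or None if it is missing or not a directory."""
    try:
        # strict=True raises FileNotFoundError (an OSError) for missing paths.
        resolved = Path(raw).resolve(strict=True)
    except OSError:
        return None
    # Assumption: a regular file is not a valid analysis root.
    return resolved if resolved.is_dir() else None


with tempfile.TemporaryDirectory() as tmp:
    existing = resolve_existing_dir(tmp)
    missing = resolve_existing_dir(str(Path(tmp) / "no-such-dir"))

    a_file = Path(tmp) / "notes.txt"
    a_file.write_text("not a directory")
    not_a_dir = resolve_existing_dir(str(a_file))
```

Returning None mirrors the "log and return early" variant; a caller that prefers a hard failure could raise ValueError instead.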

api/analyzers/source_analyzer.py

    def analyze_sources(self, path: Path, ignore: list[str], graph: Graph) -> None:
        path = path.resolve()
-        files = list(path.rglob("*.java")) + list(path.rglob("*.py")) + list(path.rglob("*.cs"))
+        files = list(path.rglob("*.java")) + list(path.rglob("*.py")) + list(path.rglob("*.cs")) + list(path.rglob("*.js"))


In general, to fix uncontrolled path usage you must (1) normalize the user-provided path, and (2) enforce that it lies within an allowed base directory (or otherwise belongs to an allow‑list) before using it. For directory‑wide operations like rglob, define a server‑side base directory (e.g., from an environment variable or a constant), resolve both base and user path, and then verify that the user path is a subpath of the base. If the check fails, reject the request.

For this codebase, the most direct and non‑disruptive fix is inside SourceAnalyzer.analyze_sources, because that is where Path(path) is used to traverse the filesystem with rglob. We can:

Define an allowed root directory for analysis using an environment variable such as CODE_GRAPH_BASE_DIR (defaulting to the current working directory or another reasonable root).

Resolve both the base directory and the user‑supplied path.

Check that path is equal to the base dir or is a subdirectory of it. In Python ≥3.9 this can be done safely with Path.is_relative_to; we can implement a small helper using Path.relative_to if needed.

If the check fails, raise a clear exception instead of proceeding.

This keeps all existing behavior for callers who already point to directories under the chosen base, and restricts path traversal to safe locations. The only file that needs code changes is api/analyzers/source_analyzer.py. We will:

Import os to read the environment variable.

In analyze_sources, compute base_dir = Path(os.getenv("CODE_GRAPH_BASE_DIR", ".")).resolve() and validate that the resolved path is under base_dir before doing rglob.

Log an error and raise ValueError (or similar) if the check fails.

No changes are needed in tests/index.py or api/index.py to achieve the core mitigation, since their use of SourceAnalyzer will automatically be constrained by this validation.
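The base-directory lookup described above can be sketched in isolation. The helper get_base_dir and the example value /srv/code-graph are hypothetical; only the variable name CODE_GRAPH_BASE_DIR and the "." default come from the suggestion. The environment is passed in as a mapping so the behavior can be exercised without modifying the real process environment.

```python
import os
from pathlib import Path
from typing import Mapping


def get_base_dir(env: Mapping[str, str] = os.environ) -> Path:
    """Return the allowed analysis root (hypothetical helper).

    Reads CODE_GRAPH_BASE_DIR from env and falls back to "." (the current
    working directory) when the variable is unset.
    """
    return Path(env.get("CODE_GRAPH_BASE_DIR", ".")).resolve()


# Unset: the root is whatever directory the process was started in.
default_root = get_base_dir({})

# Set: the root is the configured directory (hypothetical value).
configured_root = get_base_dir({"CODE_GRAPH_BASE_DIR": "/srv/code-graph"})
```

One consequence of the "." default is that the effective root depends on the working directory of the server process, so deployments that analyze folders elsewhere on disk would need to set the variable explicitly.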

api/analyzers/javascript/analyzer.py

+            heritage = entity.node.child_by_field_name('body')
+            if heritage is None:
+                return
+            superclass_node = entity.node.child_by_field_name('name')


Add support for JavaScript

b1a6b81

vercel bot had a problem deploying to Preview March 10, 2026 08:58 Failure

github-advanced-security bot found potential problems Mar 10, 2026

github-code-quality bot found potential problems Mar 10, 2026

@@ -1,6 +1,7 @@
             from contextlib import nullcontext
             from pathlib import Path
             from typing import Optional
+            import os
             from api.entities.entity import Entity
             from api.entities.file import File
@@ -184,6 +185,34 @@
                     # Second pass analysis of the source code
                     self.second_pass(graph, files, path)
+                def _resolve_and_validate_path(self, user_path: str) -> Path:
+                    """
+                    Resolve a user-supplied path against a safe root and ensure it does not escape.
+                    The safe root can be configured via the CODE_GRAPH_PROJECTS_ROOT environment
+                    variable; if unset, the current working directory is used.
+                    """
+                    base_dir_env = os.environ.get("CODE_GRAPH_PROJECTS_ROOT")
+                    if base_dir_env:
+                        base_dir = Path(base_dir_env)
+                    else:
+                        base_dir = Path.cwd()
+                    base_dir = base_dir.resolve()
+                    candidate = Path(user_path)
+                    if not candidate.is_absolute():
+                        candidate = base_dir / candidate
+                    candidate = candidate.resolve()
+                    try:
+                        # Ensure candidate is within base_dir
+                        candidate.relative_to(base_dir)
+                    except ValueError:
+                        logging.error(f"Requested path '{candidate}' is outside of allowed base directory '{base_dir}'")
+                        raise ValueError("Requested path is not allowed")
+                    return candidate
                 def analyze_local_folder(self, path: str, g: Graph, ignore: Optional[list[str]] = []) -> None:
                     """
                     Analyze path.
@@ -195,8 +224,11 @@
                     logging.info(f"Analyzing local folder {path}")
+                    # Resolve and validate the provided path against a safe root
+                    safe_path = self._resolve_and_validate_path(path)
                     # Analyze source files
-                    self.analyze_sources(Path(path), ignore, g)
+                    self.analyze_sources(safe_path, ignore, g)
                     logging.info("Done analyzing path")

@@ -1,6 +1,7 @@
             from contextlib import nullcontext
             from pathlib import Path
             from typing import Optional
+            import os
             from api.entities.entity import Entity
             from api.entities.file import File
@@ -21,6 +22,36 @@
             # Configure logging
             logging.basicConfig(level=logging.DEBUG, format='%(filename)s - %(asctime)s - %(levelname)s - %(message)s')
+            def _get_safe_root() -> Path:
+                """
+                Returns the root directory under which analysis is allowed.
+                The root can be configured via the CODEGRAPH_ROOT environment variable,
+                otherwise the current working directory is used.
+                """
+                root_env = os.environ.get("CODEGRAPH_ROOT")
+                if root_env:
+                    return Path(root_env).resolve()
+                return Path.cwd().resolve()
+            def _ensure_within_root(requested_path: Path) -> Path:
+                """
+                Resolve the requested path and ensure it resides within the safe root.
+                Raises a ValueError if the path is outside the allowed root.
+                """
+                safe_root = _get_safe_root()
+                resolved = requested_path.resolve()
+                try:
+                    # Python 3.9+: Path.is_relative_to
+                    if resolved.is_relative_to(safe_root):
+                        return resolved
+                except AttributeError:
+                    # Fallback for older Python versions
+                    if os.path.commonpath([str(safe_root), str(resolved)]) == str(safe_root):
+                        return resolved
+                raise ValueError(f"Path '{resolved}' is outside of the allowed root '{safe_root}'")
             # List of available analyzers
             analyzers: dict[str, AbstractAnalyzer] = {
                 # '.c': CAnalyzer(),
@@ -195,8 +226,15 @@
                     logging.info(f"Analyzing local folder {path}")
+                    # Normalize and validate that the path is within the allowed root
+                    try:
+                        target_path = _ensure_within_root(Path(path))
+                    except ValueError as e:
+                        logging.error(str(e))
+                        raise
                     # Analyze source files
-                    self.analyze_sources(Path(path), ignore, g)
+                    self.analyze_sources(target_path, ignore, g)
                     logging.info("Done analyzing path")
@@ -213,12 +250,19 @@
                     from pygit2.repository import Repository
-                    proj_name = Path(path).name
+                    # Normalize and validate repository path
+                    try:
+                        repo_path = _ensure_within_root(Path(path))
+                    except ValueError as e:
+                        logging.error(str(e))
+                        raise
+                    proj_name = repo_path.name
                     graph = Graph(proj_name)
-                    self.analyze_local_folder(path, graph, ignore)
+                    self.analyze_local_folder(str(repo_path), graph, ignore)
                     # Save processed commit hash to the DB
-                    repo = Repository(path)
+                    repo = Repository(str(repo_path))
                     current_commit = repo.walk(repo.head.target).__next__()
                     graph.set_graph_commit(current_commit.short_id)

@@ -18,6 +18,7 @@
             from multilspy.multilspy_logger import MultilspyLogger
             import logging
+            import os
             # Configure logging
             logging.basicConfig(level=logging.DEBUG, format='%(filename)s - %(asctime)s - %(levelname)s - %(message)s')
@@ -193,10 +194,29 @@
                         ignore (List(str)): List of paths to skip
                     """
-                    logging.info(f"Analyzing local folder {path}")
+                    try:
+                        base_path = Path(path).resolve(strict=True)
+                    except FileNotFoundError:
+                        logging.error("Path '%s' does not exist or is not accessible", path)
+                        return
+                    safe_root = os.environ.get("CODEGRAPH_REPOS_ROOT")
+                    if safe_root:
+                        try:
+                            root_path = Path(safe_root).resolve(strict=True)
+                        except FileNotFoundError:
+                            logging.error("Configured CODEGRAPH_REPOS_ROOT '%s' does not exist", safe_root)
+                            return
+                        base_path_str = str(base_path)
+                        root_path_str = str(root_path)
+                        if not base_path_str.startswith(root_path_str.rstrip(os.sep) + os.sep) and base_path_str != root_path_str:
+                            logging.error("Path '%s' is outside of allowed root '%s'", base_path, root_path)
+                            return
+                    logging.info(f"Analyzing local folder {base_path}")
                     # Analyze source files
-                    self.analyze_sources(Path(path), ignore, g)
+                    self.analyze_sources(base_path, ignore, g)
                     logging.info("Done analyzing path")

@@ -1,6 +1,7 @@
             from contextlib import nullcontext
             from pathlib import Path
             from typing import Optional
+            import os
             from api.entities.entity import Entity
             from api.entities.file import File
@@ -176,7 +177,16 @@
                     self.second_pass(graph, files, path)
                 def analyze_sources(self, path: Path, ignore: list[str], graph: Graph) -> None:
+                    # Resolve the target path and enforce that it lies within an allowed base directory.
                     path = path.resolve()
+                    base_dir_env = os.getenv("CODE_GRAPH_BASE_DIR", ".")
+                    base_dir = Path(base_dir_env).resolve()
+                    try:
+                        # This will raise ValueError if 'path' is not inside 'base_dir'.
+                        path.relative_to(base_dir)
+                    except ValueError:
+                        logging.error("Refusing to analyze path '%s' outside of base directory '%s'", path, base_dir)
+                        raise ValueError(f"Path '{path}' is not allowed for analysis")
                     files = list(path.rglob("*.java")) + list(path.rglob("*.py")) + list(path.rglob("*.cs")) + list(path.rglob("*.js"))
                     # First pass analysis of the source code
                     self.first_pass(path, files, ignore, graph)

Add support for JavaScript #593
gkorland wants to merge 1 commit into staging from backend/add-javascript-support
