Add Kotlin language support #592
base: staging
Separate Kotlin superclass and interface symbols before storing them.
The class branch records the same captured `user_type` nodes as both `base_class` and `implement_interface`, while the object branch can only emit `implement_interface`. That guarantees incorrect inheritance metadata for Kotlin declarations.
🧩 Analysis chain
🌐 Web query:
In the Python `tree-sitter` package version 0.25.x, does `Query` expose a `captures(node)` method directly, or must captures be executed via `QueryCursor(query).captures(node)`? Please use the official tree-sitter Python API documentation.
💡 Result:
In `tree-sitter` (py-tree-sitter) 0.25.x, `Query` does not expose a `captures(node)` method directly; its documented methods are things like `capture_name`, `disable_capture`, etc., but no `captures`. [1]
To execute captures, you must use `QueryCursor.captures(node)`, i.e. create a cursor for the query and run captures through it (conceptually: `QueryCursor(query).captures(node)`). [2]
Sources (official docs): [1], [2]
Use `self._captures()` instead of calling `.captures()` directly on query objects.
The `Query` object in tree-sitter 0.25.x does not expose a `.captures()` method. All six query calls at lines 71, 80, 88, 96, 103, and 110 must use `self._captures(pattern, node)` instead, which properly wraps the `QueryCursor`. Every other analyzer in the codebase follows this pattern; the Kotlin analyzer is the only one using the incorrect direct call syntax.
Check failure
Code scanning / CodeQL
Uncontrolled data used in path expression High
Copilot Autofix
AI 4 days ago
In general, to fix uncontrolled path usage, you should constrain user-provided paths to lie under a known safe root and validate them after normalization. That usually means: (1) decide on an allowed base directory (for example, a projects root), (2) convert user input into a `Path` relative to that base or reject absolute paths, (3) call `.resolve()` (or `os.path.realpath`) and then confirm that the resolved path is within the base directory using a prefix or ancestor check (`is_relative_to` or try/except with `relative_to`), and (4) only then use that path for filesystem traversal or as a workspace root.
For this codebase, the best non-breaking fix is to add validation inside `SourceAnalyzer.analyze_local_folder`, since that is where the untrusted `path: str` first becomes a `Path`, before it is used by `analyze_sources` and `second_pass`. We can introduce a constant safe root (for example, the directory where repositories live) and then:
- If `path` is absolute, interpret it as relative to the safe root only if it does not escape that root; otherwise reject it.
- Use `Path(...).resolve()` to eliminate `..` segments and symbolic links, then verify it is under the safe root via `resolved_path.is_relative_to(base_dir)` (Python 3.9+) or a `try: resolved_path.relative_to(base_dir)` check.
- In `tests/index.py` the test route still just calls `analyze_local_folder`, so raising will surface as an internal error, which is acceptable for a security fix.
Concretely, in `api/analyzers/source_analyzer.py` around `analyze_local_folder`, we will:
- Import `os` at the top (a standard library module) because we may want to use `os.path` constants if needed, though we can largely use `Path`.
- Replace `Path(path)` with a small normalization/validation sequence:
  - Read `CODE_GRAPH_PROJECTS_ROOT` with a sensible default (such as the current working directory or a `repos` directory under it).
  - Compute `base_path = Path(base_root).resolve()`.
  - Compute `user_path = (base_path / path).resolve()` if `path` is relative, or `Path(path).resolve()` if absolute, then ensure `user_path` is under `base_path` using `.relative_to`. If not, raise `ValueError("Path is outside of allowed root")`.
- Pass `user_path` into `analyze_sources` instead of the raw, possibly unsafe `Path(path)`.
This keeps all existing analyzer functionality, but prevents a client from using HTTP to point the analyzer at arbitrary directories outside the intended root. It also addresses all the variants, since every call path goes through `analyze_local_folder`.
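The normalization/validation sequence described above can be sketched as a standalone function. This is a minimal illustration, not the project's code: `resolve_under_root` and `default_projects_root` are hypothetical names, and `CODE_GRAPH_PROJECTS_ROOT` is only the example variable name the Autofix text proposes.

```python
import os
from pathlib import Path


def resolve_under_root(path: str, base_root: str) -> Path:
    """Resolve `path` and return it only if it lies under `base_root`."""
    base_path = Path(base_root).resolve()
    candidate = Path(path)
    if candidate.is_absolute():
        user_path = candidate.resolve()
    else:
        # Relative input is interpreted relative to the safe root.
        user_path = (base_path / candidate).resolve()
    try:
        # relative_to raises ValueError when user_path is not under base_path.
        user_path.relative_to(base_path)
    except ValueError:
        raise ValueError("Path is outside of allowed root") from None
    return user_path


def default_projects_root() -> str:
    # Assumed configuration: an environment variable, falling back to the
    # current working directory.
    return os.environ.get("CODE_GRAPH_PROJECTS_ROOT", os.getcwd())
```

A caller would pass the returned `Path` on to the traversal code, for example `resolve_under_root(path, default_projects_root())`, so the raw string never reaches `rglob`.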
Code scanning / CodeQL
Uncontrolled data used in path expression High
Copilot Autofix
AI 4 days ago
General approach: normalize user-provided paths and constrain them to a configured safe root directory. Convert the incoming string to an absolute, normalized `Path` (using `resolve()`), then ensure it is a subpath of a “projects root” directory that the server is allowed to analyze. Reject paths outside that root before calling `analyze_local_folder`. This addresses all flows where tainted `path` is turned into a `Path` and used with `rglob` or as a base for other paths.
Best concrete fix with minimal behavior change:
- In `tests/index.py`, before using `path`, resolve it to an absolute path and ensure it is under a configured root (e.g., from an environment variable like `CODE_GRAPH_PROJECTS_ROOT`, defaulting to the current working directory for backward compatibility). If `path` escapes that root, reject the request with a 400 error.
- In `SourceAnalyzer.analyze_local_folder`, immediately convert the incoming string to a normalized `Path` via `Path(path).resolve()` and then pass that normalized `Path` to `analyze_sources`. This limits propagation of any oddities like `..` segments and ensures downstream code uses a canonical path. Since `analyze_local_folder` is the main entry point for string paths, tightening it is low-impact and addresses all future call sites as well.
- The `second_pass` and `analyze_sources` methods already operate on `Path` objects; with the above normalization in place, no further change is required there.
Concretely:
In `tests/index.py`, in the handler that calls `analyze_local_folder(path, g, ignore)` (lines 340–369 region), normalize and confine `path` before validation and use.
- `from pathlib import Path` is already present, and `os` is imported, so no new imports are needed.
- […] `os.path.isdir(path)` check.
In `api/analyzers/source_analyzer.py`, in `analyze_local_folder`, convert `path` once to a resolved `Path` and pass that into `analyze_sources`.
This preserves existing functionality (still analyzes the same directory, only now canonicalized) and stops arbitrary “weird” paths from flowing unprocessed, while `tests/index.py` enforces that untrusted HTTP clients can only request analysis within the configured root directory.
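A framework-free sketch of the request-side check might look like the following. It assumes nothing about the real handler in `tests/index.py`: `check_analysis_request` is a hypothetical helper, and the `(body, status)` tuple merely stands in for whatever response object the route returns.

```python
from pathlib import Path
from typing import Dict, Tuple


def check_analysis_request(path: str, projects_root: str) -> Tuple[Dict[str, str], int]:
    """Return (body, status) for a request that asks to analyze `path`."""
    root = Path(projects_root).resolve()
    resolved = Path(path).resolve()
    # Reject anything that is neither the root itself nor a descendant of it.
    if resolved != root and root not in resolved.parents:
        return {"status": "path is outside the allowed root"}, 400
    if not resolved.is_dir():
        return {"status": "path must be an existing directory"}, 400
    # Downstream code receives the canonical path, never the raw string.
    return {"status": "success", "path": str(resolved)}, 200
```

Doing the containment check before the directory check means a request for a path outside the root gets the same 400 whether or not that path exists, so the response does not reveal what is on disk elsewhere.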
Kotlin relation resolution is still disabled here.
Both branches assign `NullLanguageServer()` for `.kt` and `.kts`. Since `AbstractAnalyzer.resolve()` depends on `request_definition()`, once the edge wiring below is corrected, every Kotlin lookup will still resolve to `[]`.
🧰 Tools
🪛 GitHub Check: CodeQL
[failure] 142-142: Uncontrolled data used in path expression
This path depends on a user-provided value.
Use resolved entity IDs here, not `Node.id`.
`entity.symbols` now holds raw tree-sitter `Node` objects. `symbol.id` is a parser node identifier, but `Graph.connect_entities()` expects the database ID returned by `graph.add_entity()`, so these relations now miss or miswire edges across every language. This is especially broken for parameter and return-type nodes, which are never graph entities on their own.
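To illustrate why a parser node identifier cannot stand in for a graph entity ID, here is a toy sketch. Everything in it is invented for the example: `ParsedSymbol`, `ToyGraph`, `link_symbols`, and the method signatures are stand-ins, not the project's `Graph` or tree-sitter's `Node`.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class ParsedSymbol:
    """Stand-in for a parser node: `node_id` only has meaning inside one parse tree."""
    node_id: int
    name: str


class ToyGraph:
    """Stand-in graph that hands out its own IDs, as a database would."""

    def __init__(self) -> None:
        self._next_id = 100
        self.ids_by_name: Dict[str, int] = {}
        self.edges: List[Tuple[int, int, str]] = []

    def add_entity(self, name: str) -> int:
        entity_id = self._next_id
        self._next_id += 1
        self.ids_by_name[name] = entity_id
        return entity_id

    def connect_entities(self, relation: str, src_id: int, dest_id: int) -> None:
        self.edges.append((src_id, dest_id, relation))


def link_symbols(graph: ToyGraph, src_id: int, relation: str,
                 symbols: List[ParsedSymbol]) -> None:
    """Connect `src_id` to each symbol that resolves to a stored entity."""
    for symbol in symbols:
        dest_id = graph.ids_by_name.get(symbol.name)
        if dest_id is None:
            # e.g. a parameter type that was never stored as an entity
            continue
        graph.connect_entities(relation, src_id, dest_id)
```

In this toy, passing `symbol.node_id` to `connect_entities` would record an edge to an ID the graph never issued; resolving to the stored entity ID first, and skipping symbols that have none, avoids both failure modes the review comment describes.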
Check failure
Code scanning / CodeQL
Uncontrolled data used in path expression High
Copilot Autofix
AI 4 days ago
In general, to fix this kind of issue you should restrict user-supplied paths so they cannot escape a designated safe root. For server-side code, that typically means: define a base directory under which analysis is allowed; combine the user input with that base; normalize the result (e.g., via `Path.resolve()` or `os.path.realpath()`); then verify that the normalized path is still within the base before using it to walk the filesystem.
For this codebase, the single best fix with minimal functional change is:
- In `tests/index.py`, normalize and constrain the user-provided `path` to a configured base directory before passing it into `SourceAnalyzer.analyze_local_folder`. That prevents analysis of arbitrary locations while keeping the rest of the analyzer logic unchanged.
- Harden `SourceAnalyzer.analyze_local_folder` in `api/analyzers/source_analyzer.py` by normalizing its `path` argument with `Path(path).resolve()` and then using that resolved `Path` consistently. This ensures that even other call sites (such as `analyze_local_repository`) benefit from normalization, and it ensures that downstream uses (like `analyze_sources`) see a canonical path.
Concretely:
- In `tests/index.py`, after validating that `path` is a directory, convert it to an absolute, normalized `Path` object, derive a `safe_base` (for example from an environment variable like `CODE_GRAPH_BASE_DIR` or defaulting to the current working directory or another project root), and then check that `resolved_path` is within `safe_base` (using `.resolve()` and a `.relative_to()` try/except). If the check fails, return HTTP 400. Pass the normalized string/path into `analyzer.analyze_local_folder`.
- In `api/analyzers/source_analyzer.py`, in `analyze_local_folder`, convert `path` to `base_path = Path(path).resolve()` once and use `base_path` for logging and for `analyze_sources`. This prevents relative path oddities and guarantees `analyze_sources` always works with a normalized `Path`.
No new imports are needed beyond `pathlib.Path` and `os`, which are already used in these files.
Code scanning / CodeQL
Uncontrolled data used in path expression High
Copilot Autofix
AI 4 days ago
In general, to fix this issue you should not let arbitrary user-supplied paths control which part of the filesystem is traversed. Instead, define a safe root directory (or an allow-list of roots), normalize the user input, and ensure the resolved path is contained within that root before using it for any filesystem operations like `rglob`.
For this codebase, the minimal change that preserves behavior is to (a) introduce a helper that validates a user-provided `path` against a configured base directory, and (b) call it in `SourceAnalyzer.analyze_local_folder` before converting to `Path` and traversing. A reasonable base directory is the repository root or a subdirectory configured via environment variable; to avoid assumptions about project layout, we can resolve the provided path to an absolute path and ensure it is under the current working directory (or another obvious base such as an environment variable, if present). The validation should: use `Path(path).resolve()` to normalize any `..` segments or symlinks, compute the allowed base as `Path(os.getcwd()).resolve()` (or similar) and ensure `resolved_path.is_dir()` and `allowed_base in resolved_path.parents or resolved_path == allowed_base`. If the check fails, log and raise an exception rather than proceeding.
Concretely, in `api/analyzers/source_analyzer.py` you will: (1) import `os` for environment access if needed, (2) add a small private method `_validate_root_path` or a local check in `analyze_local_folder` that resolves and validates the incoming `path` string, returning a safe `Path` object, and (3) replace the direct `Path(path)` call in `analyze_local_folder` with the validated path returned by this helper. No changes are needed in `tests/index.py` or `api/index.py` because the taint will be neutralized within the analyzer. This fixes all variants of the alert because every entry point (`analyze_local_folder`, `analyze_local_repository`) funnels through this validation before `rglob` or repository operations.
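One possible shape for such a helper, written as a free function so it runs on its own, is sketched below. The name `validate_root_path`, its `allowed_base` parameter, and the log message are assumptions made for the example; the real method would live on `SourceAnalyzer`.

```python
import logging
import os
from pathlib import Path

logger = logging.getLogger(__name__)


def validate_root_path(path: str, allowed_base: str = "") -> Path:
    """Return the resolved directory for `path`, or raise if it is unsafe."""
    # Assumed default: the current working directory when no base is given.
    base = Path(allowed_base or os.getcwd()).resolve()
    resolved_path = Path(path).resolve()
    inside = resolved_path == base or base in resolved_path.parents
    if not (inside and resolved_path.is_dir()):
        # Log and raise rather than proceeding with an unsafe path.
        logger.error("Rejected analysis path: %s", resolved_path)
        raise ValueError(f"{path!r} is not a directory under {base}")
    return resolved_path
```

Comparing against `resolved_path.parents` works on `Path` components rather than raw strings, so a sibling directory whose name merely starts with the base directory's name is not mistaken for a child.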
Code scanning / CodeQL
Uncontrolled data used in path expression High
Copilot Autofix
AI 4 days ago
General approach: normalize any user-supplied path before using it for filesystem traversal and ensure it lies under a configured safe root directory. This protects operations like `path.rglob(...)` from being used to scan arbitrary locations. The safest place to do this is at the boundary where untrusted `str` becomes a `Path` (inside `SourceAnalyzer.analyze_local_folder` and `analyze_local_repository`), so all downstream methods can assume they receive safe `Path` objects.
Concrete fix in `api/analyzers/source_analyzer.py`:
1. Decide on a safe root. A natural choice is the current working directory (server root) or a directory pointed to by an environment variable. Since we must not add new imports beyond “well-known” ones and can’t see the rest of the project, we’ll conservatively restrict analysis to be under the server’s current working directory (`Path.cwd()`), which is at least stable and predictable. If your app already has a notion of a repos directory, you can later replace `safe_root` with that.
2. In `analyze_local_folder(self, path: str, g: Graph, ignore: Optional[list[str]] = [])`, instead of passing `Path(path)` directly to `analyze_sources`, build a `Path` from the string, resolve it to an absolute canonical path (`.resolve()`), and verify it is within `safe_root`. If not, log and raise a clear exception. Example logic:
   - `base_path = Path.cwd().resolve()`
   - `target = Path(path).expanduser().resolve()`
   - If `base_path not in target.parents and target != base_path`: `raise ValueError(...)`
   - `self.analyze_sources(target, ignore, g)`
   This uses `Path.resolve()` to normalize `..` and symbolic links, which is the `pathlib` equivalent of `os.path.realpath`. Using `is_relative_to` would be ideal in Python 3.9+, but to maintain compatibility, we can check `base_path in target.parents or target == base_path`.
3. Apply the same validation in `analyze_local_repository(self, path: str, ignore: Optional[list[str]] = None)`. Here `path` again comes from untrusted sources (e.g., via API), and we should ensure the repository lies under the same safe root. Normalize and validate the path before:
   - `proj_name = Path(path).name`
   - `Repository(path)`
   - `self.analyze_local_folder(path, graph, ignore)`
   After validation, we should use the normalized/safe `str` (e.g., `safe_path_str = str(target)`) consistently for `Repository` and for forwarding to `analyze_local_folder`.
4. No changes are needed to `analyze_sources`, `analyze_files`, `first_pass`, or `second_pass`, because they already operate on `Path` objects and lists of files built from that base. By ensuring the base `Path` itself is safe, those calls become safe.
5. We can reuse existing imports: `Path` from `pathlib` is already imported. We will add no new external dependencies.
This fix does not change the core functionality for “normal” use (analyzing directories under the server’s working directory), but prevents clients from making the server analyze or access arbitrary directories outside that root.
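For the repository entry point, the validate-then-derive order could be sketched as below. `safe_repository_target` is a hypothetical helper, and the safe root is passed in explicitly here rather than fixed to `Path.cwd()` so the sketch is easy to exercise.

```python
from pathlib import Path
from typing import Tuple


def safe_repository_target(path: str, safe_root: Path) -> Tuple[str, str]:
    """Return (project name, canonical path string) for a repository under `safe_root`."""
    base_path = safe_root.resolve()
    target = Path(path).expanduser().resolve()
    if base_path not in target.parents and target != base_path:
        raise ValueError(f"{target} is outside of {base_path}")
    # Derive everything from the validated target, not from the raw input string.
    safe_path_str = str(target)
    return target.name, safe_path_str
```

Taking the project name from the resolved target rather than from `Path(path).name` means an input ending in `..` yields the name of the directory actually analyzed.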
Check failure
Code scanning / CodeQL
Uncontrolled data used in path expression High
Copilot Autofix
AI 4 days ago
In general, to fix this kind of problem you must constrain user-controlled paths before using them in filesystem operations. A common approach is to define a trusted “root” directory under which all operations must occur, resolve the user-provided value to an absolute/normalized path, and then verify that this resolved path is contained within the root (for example by checking that `resolved_path.is_relative_to(root)` in modern Python, or by comparing string prefixes after `resolve()`/`normpath()`).
For this codebase, the minimal change that addresses CodeQL’s concern without altering external behavior too much is to validate the `path` argument inside `SourceAnalyzer.analyze_local_folder` before it is converted to a `Path` and passed to `analyze_sources`. We can:
- Convert the `path` argument and the root to `Path` objects, call `.resolve()` on both to normalize and remove `..` segments, and then ensure the user path is under the root. In Python 3.9+, `Path.is_relative_to(root)` is the cleanest way; otherwise we fall back to checking that `str(resolved_path).startswith(str(root) + os.sep)` or similar.
Concretely, in `api/analyzers/source_analyzer.py` we will:
- Import `os` since we need `os.getcwd()` and, for compatibility, possibly `os.path.commonpath` if we avoid `is_relative_to`.
- Add the validation in `analyze_local_folder` before calling `self.analyze_sources(Path(path), ...)`.
- Keep the public signature (`analyze_local_folder(path: str, ...)`) and its call sites unchanged, so the rest of the application logic does not need modification.
No changes are required in `tests/index.py` or `api/index.py` for this particular issue; they will continue to pass the untrusted `path` string to `analyze_local_folder`, which now enforces safe usage.
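The fallback for interpreters without `is_relative_to` deserves care, because a bare string-prefix comparison is easy to get wrong. The sketch below contrasts an `os.path.commonpath` check with a deliberately flawed prefix check; both function names are hypothetical.

```python
import os


def is_under_root(candidate: str, root: str) -> bool:
    """True when `candidate` resolves to `root` itself or to a path beneath it."""
    real_root = os.path.realpath(root)
    real_candidate = os.path.realpath(candidate)
    # commonpath compares whole path components, not characters.
    return os.path.commonpath([real_root, real_candidate]) == real_root


def naive_prefix_check(candidate: str, root: str) -> bool:
    # Deliberately flawed: no separator is appended, so sibling names match.
    return os.path.realpath(candidate).startswith(os.path.realpath(root))
```

With a root of `/srv/projects`, the naive check accepts `/srv/projects-evil`, which is why the text above appends `os.sep` to the root before calling `startswith`. Note that `os.path.commonpath` raises `ValueError` on Windows when the two paths are on different drives, so a caller there would need to treat that exception as a rejection.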