Conversation
Fixes #51 Add support for JavaScript code analysis using tree-sitter. * Add `api/analyzers/javascript/analyzer.py` implementing `JavaScriptAnalyzer` class using tree-sitter for JavaScript. - Implement methods for first and second pass analysis. - Use tree-sitter to parse JavaScript code. - Extract functions and classes from JavaScript code. - Connect entities in the graph. * Update `api/analyzers/source_analyzer.py` to include `JavaScriptAnalyzer` in the analyzers list. * Add `tree-sitter-javascript` dependency to `pyproject.toml`. * Add utility functions for JavaScript analysis in `api/analyzers/utils.py`. --- For more details, open the [Copilot Workspace session](https://copilot-workspace.githubnext.com/FalkorDB/code-graph-backend/issues/51?shareId=XXXX-XXXX-XXXX-XXXX).
|
The latest updates on your projects. Learn more about Vercel for GitHub.
|
|
No actionable comments were generated in the recent review. 🎉 📝 WalkthroughWalkthroughAdds JavaScript support: new Changes
Sequence Diagram(s)sequenceDiagram
participant FS as FileSystem (*.js)
participant Analyzer as JavaScriptAnalyzer
participant LSP as JS Language Server
participant Graph as CodeGraph
rect rgba(100,149,237,0.5)
FS->>Analyzer: provide file contents (first_pass)
Analyzer->>Analyzer: parse with tree-sitter, create Entities
Analyzer->>Graph: add Entities (declarations)
end
rect rgba(60,179,113,0.5)
Analyzer->>LSP: request symbol/type info (second_pass)
LSP-->>Analyzer: return resolution data
Analyzer->>Graph: add edges (calls, inherits, references)
end
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (6)
api/analyzers/javascript/analyzer.py (3)
22-71: Enhance function declaration processing and handle arrow functions.Currently,
process_function_declarationonly captures standard function declarations. If you'd like to capture arrow functions or function expressions, consider expanding the query or logic. Also, note that usingfind_child_of_type(node, 'identifier')might skip function declarations without a named identifier (like anonymous functions).🧰 Tools
🪛 Ruff (0.8.2)
22-22:
Functionmay be undefined, or defined from star imports(F405)
35-35:
find_child_of_typemay be undefined, or defined from star imports(F405)
45-45:
find_child_of_typemay be undefined, or defined from star imports(F405)
62-62:
Functionmay be undefined, or defined from star imports(F405)
72-102: Extend class declaration handling for inheritance.This method correctly extracts the class name from the
identifierchild. You may want to handleextendsclauses (e.g.,class Foo extends Bar) or keep track of implemented interfaces in the future.🧰 Tools
🪛 Ruff (0.8.2)
72-72:
Classmay be undefined, or defined from star imports(F405)
85-85:
find_child_of_typemay be undefined, or defined from star imports(F405)
99-99:
Classmay be undefined, or defined from star imports(F405)
157-220: Protect against missing function entities and arrow function calls.
- The second pass currently assumes function declarations are always standard. Arrow functions won't be captured, so the calls might remain unresolved.
assert(caller_f is not None)may crash if the function is somehow not recognized. Consider a safer check, logging a warning, or creating a placeholder entity to avoid halting the entire analysis.-assert(caller_f is not None) +if caller_f is None: + logger.warning(f"Caller function '{caller_name}' not found. Skipping relationship.") + continue🧰 Tools
🪛 Ruff (0.8.2)
216-216:
Functionmay be undefined, or defined from star imports(F405)
api/analyzers/utils.py (2)
25-38: Augment arrow function or unnamed function handling.
extract_js_function_nameassumes there's an identifier child. Consider fallback logic for arrow or anonymous functions if needed (e.g., generating a placeholder name).
40-53: Handle anonymous or default-export classes.Similar to functions, classes can sometimes be declared without a direct identifier (export default class, etc.). Consider a fallback name or a distinct approach for these cases.
api/analyzers/source_analyzer.py (1)
20-21: Consider broader JavaScript-related extensions.Mapping
.jstoJavaScriptAnalyzeris a good start. If you plan to handle.mjs,.cjs, or.jsx, consider extending the analyzers dictionary or making it more flexible.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
api/analyzers/javascript/analyzer.py(1 hunks)api/analyzers/source_analyzer.py(2 hunks)api/analyzers/utils.py(1 hunks)pyproject.toml(1 hunks)
🧰 Additional context used
🪛 Ruff (0.8.2)
api/analyzers/javascript/analyzer.py
3-3: from ..utils import * used; unable to detect undefined names
(F403)
5-5: from ...entities import * used; unable to detect undefined names
(F403)
22-22: Function may be undefined, or defined from star imports
(F405)
35-35: find_child_of_type may be undefined, or defined from star imports
(F405)
45-45: find_child_of_type may be undefined, or defined from star imports
(F405)
62-62: Function may be undefined, or defined from star imports
(F405)
72-72: Class may be undefined, or defined from star imports
(F405)
85-85: find_child_of_type may be undefined, or defined from star imports
(F405)
99-99: Class may be undefined, or defined from star imports
(F405)
123-123: File may be undefined, or defined from star imports
(F405)
216-216: Function may be undefined, or defined from star imports
(F405)
🔇 Additional comments (5)
api/analyzers/javascript/analyzer.py (3)
18-21: Analyzer constructor looks good.The constructor properly initializes the Parser with
JS_LANGUAGE. This is a succinct and clear approach.
103-156: Consider robust UTF-8 handling.The code tries to decode the file content using UTF-8, which might fail if the file is in another encoding. This fallback is fine in many cases, but you might want to log or handle suspected encoding issues more explicitly if supporting diverse codebases.
🧰 Tools
🪛 Ruff (0.8.2)
123-123:
Filemay be undefined, or defined from star imports(F405)
10-13: Confirm Tree-sitter JavaScript library availability.The import statements
import tree_sitter_javascript as tsjsandLanguage, Parserfromtree_sitterlook correct for parsing JavaScript. Just verify that thepyproject.tomlchanges (addingtree-sitter-javascript) are indeed installed in your environment.✅ Verification successful
tree-sitter-javascript dependency is properly configured
The
pyproject.tomlfile shows thattree-sitter-javascriptversion^0.23.2is correctly listed as a dependency in the project's Poetry configuration. This confirms that the import statementimport tree_sitter_javascript as tsjswill work as expected when the dependencies are installed.🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Ensure tree-sitter-javascript is installed in the current environment pip freeze | grep "tree-sitter-javascript"Length of output: 136
Script:
#!/bin/bash # Check if pyproject.toml exists and contains tree-sitter-javascript dependency fd pyproject.toml --exec cat {}Length of output: 877
Script:
#!/bin/bash # Check if requirements.txt exists and contains tree-sitter-javascript dependency fd requirements.txt --exec cat {}Length of output: 65574
Script:
#!/bin/bash # Check if setup.py exists and contains tree-sitter-javascript dependency fd setup.py --exec cat {}Length of output: 29
Script:
#!/bin/bash # Look for any other Python dependency files fd ".*requirements.*\.txt|poetry\.lock|Pipfile" --exec cat {}Length of output: 65602
api/analyzers/source_analyzer.py (1)
11-11: JavaScript analyzer import addition confirmed.Adding
JavaScriptAnalyzeraligns with the pull request objective to support.jsfiles.pyproject.toml (1)
16-16: Dependency addition looks appropriate.
tree-sitter-javascript = "^0.23.2"is correctly specified. Validate that this version meets any advanced language feature requirements you may want to parse (e.g., newer JS syntax).✅ Verification successful
Version 0.23.2 is appropriate for the JavaScript parsing requirements
Based on the code analysis, the tree-sitter-javascript dependency is used for basic JavaScript AST parsing, focusing on:
- Function declarations and their parameters
- Class declarations
- Function call expressions
- Basic identifier extraction
The specified version (^0.23.2) is actually newer than the latest PyPI version (0.23.1) and supports all the JavaScript syntax features currently being used in the analyzer. The codebase doesn't use any advanced JavaScript features that would require a newer version, as it's primarily focused on fundamental syntax elements like functions, classes, and method calls.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Check if the newly added dependency is at the latest stable version curl -s https://pypi.org/pypi/tree-sitter-javascript/json | jq '.info.version'Length of output: 87
Script:
#!/bin/bash # Check if there are any JavaScript files in the repository to understand the JS features being used fd -e js -x head -n 50Length of output: 24
Script:
#!/bin/bash # Look for any configuration files that might specify JavaScript version/features fd -g '*babel*' -g '*eslint*' -g '*tsconfig*' -g '*jsconfig*'Length of output: 273
Script:
#!/bin/bash # Check if tree-sitter-javascript is already in use and how rg -l "tree-sitter-javascript"Length of output: 57
Script:
#!/bin/bash # Check the content of poetry.lock to see how tree-sitter-javascript is being used cat poetry.lock | grep -A 10 "tree-sitter-javascript"Length of output: 678
Script:
#!/bin/bash # Look for Python files that might be using tree-sitter-javascript rg -l "from tree_sitter"Length of output: 147
Script:
#!/bin/bash # Check how tree-sitter-javascript is being used in the JavaScript analyzer cat api/analyzers/javascript/analyzer.pyLength of output: 7853
Script:
#!/bin/bash # Check utils.py to understand the common tree-sitter usage patterns cat api/analyzers/utils.pyLength of output: 1429
api/analyzers/utils.py
Outdated
| def extract_js_function_name(node: Node) -> str: | ||
| """ | ||
| Extract the function name from a JavaScript function node. | ||
|
|
||
| Args: | ||
| node (Node): The AST node representing the function. | ||
|
|
||
| Returns: | ||
| str: The name of the function. | ||
| """ | ||
| for child in node.children: | ||
| if child.type == 'identifier': | ||
| return child.text.decode('utf-8') | ||
| return '' | ||
|
|
||
| def extract_js_class_name(node: Node) -> str: | ||
| """ | ||
| Extract the class name from a JavaScript class node. | ||
|
|
||
| Args: | ||
| node (Node): The AST node representing the class. | ||
|
|
||
| Returns: | ||
| str: The name of the class. | ||
| """ | ||
| for child in node.children: | ||
| if child.type == 'identifier': | ||
| return child.text.decode('utf-8') | ||
| return '' |
There was a problem hiding this comment.
Language specific utilizes shouldn't be added to utils
api/analyzers/javascript/analyzer.py
Outdated
| """ | ||
| Perform the first pass processing of a JavaScript source file. | ||
|
|
||
| Args: | ||
| path (Path): The path to the JavaScript source file. | ||
| f (io.TextIOWrapper): The file object representing the opened JavaScript source file. | ||
| graph (Graph): The Graph object where entities will be added. | ||
|
|
||
| Returns: | ||
| None | ||
| """ |
There was a problem hiding this comment.
Comment should include information about what the function does, "processing JavaScript file" is too general.
specify which entities are extracted.
api/analyzers/javascript/analyzer.py
Outdated
| try: | ||
| # Parse file | ||
| content = f.read() | ||
| tree = self.parser.parse(content) | ||
| except Exception as e: | ||
| logger.error(f"Failed to process file {path}: {e}") | ||
| return |
There was a problem hiding this comment.
I think this is a bit of a waste, we've already read the file and parsed it on the first pass.
api/analyzers/javascript/analyzer.py
Outdated
| caller = function_def[0] | ||
| caller_name = caller.text.decode('utf-8') | ||
| caller_f = graph.get_function_by_name(caller_name) | ||
| assert(caller_f is not None) |
There was a problem hiding this comment.
Using assert in production code is unconventional. Consider handling this case more gracefully.
| assert(caller_f is not None) | |
| if caller_f is None: logger.error(f'Caller function not found: {caller_name}'); continue |
Resolve merge conflicts with main branch which restructured the analyzer pattern. Rewrite JavaScriptAnalyzer to follow the new AbstractAnalyzer interface with get_entity_label, get_entity_name, get_entity_docstring, get_entity_types, add_symbols, resolve_symbol, and other required abstract methods. Address PR review comments: - Replace wildcard imports with explicit imports (CodeRabbit) - Move JS-specific utils out of shared utils.py (swilly22) - Remove redundant file re-parsing in second pass (swilly22) - Replace assert with proper error handling pattern (copilot-reviewer) - Improve docstrings to be more specific (swilly22) Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
| def analyze_sources(self, path: Path, ignore: list[str], graph: Graph) -> None: | ||
| path = path.resolve() | ||
| files = list(path.rglob("*.java")) + list(path.rglob("*.py")) + list(path.rglob("*.cs")) | ||
| files = list(path.rglob("*.java")) + list(path.rglob("*.py")) + list(path.rglob("*.cs")) + list(path.rglob("*.js")) |
Check failure
Code scanning / CodeQL
Uncontrolled data used in path expression High
| def analyze_sources(self, path: Path, ignore: list[str], graph: Graph) -> None: | ||
| path = path.resolve() | ||
| files = list(path.rglob("*.java")) + list(path.rglob("*.py")) + list(path.rglob("*.cs")) | ||
| files = list(path.rglob("*.java")) + list(path.rglob("*.py")) + list(path.rglob("*.cs")) + list(path.rglob("*.js")) |
Check failure
Code scanning / CodeQL
Uncontrolled data used in path expression High
| def analyze_sources(self, path: Path, ignore: list[str], graph: Graph) -> None: | ||
| path = path.resolve() | ||
| files = list(path.rglob("*.java")) + list(path.rglob("*.py")) + list(path.rglob("*.cs")) | ||
| files = list(path.rglob("*.java")) + list(path.rglob("*.py")) + list(path.rglob("*.cs")) + list(path.rglob("*.js")) |
Check failure
Code scanning / CodeQL
Uncontrolled data used in path expression High
| def analyze_sources(self, path: Path, ignore: list[str], graph: Graph) -> None: | ||
| path = path.resolve() | ||
| files = list(path.rglob("*.java")) + list(path.rglob("*.py")) + list(path.rglob("*.cs")) | ||
| files = list(path.rglob("*.java")) + list(path.rglob("*.py")) + list(path.rglob("*.cs")) + list(path.rglob("*.js")) |
Check failure
Code scanning / CodeQL
Uncontrolled data used in path expression High
Update from >=0.23.2,<0.24.0 (no such version exists) to >=0.25.0,<0.26.0 to match the tree-sitter>=0.25.2 base dependency. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (3)
api/analyzers/source_analyzer.py (1)
148-148: JavaScript second-pass symbol resolution is effectively disabled.With
NullLanguageServer,AbstractAnalyzer.resolve()will throw onlsp.request_definition(...)and silently return[](caught by the base class's blanketexcept). This meansresolve_type/resolve_methodinJavaScriptAnalyzerwill never resolve any symbols — noCALLS,EXTENDS, orPARAMETERSedges will be created during the second pass.This is fine for an initial implementation, but consider adding a TODO or logging a warning so it's clear that JS symbol resolution is a known gap. When ready, integrating a real JS/TS language server (e.g.,
typescript-language-server) would enable full graph connectivity.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@api/analyzers/source_analyzer.py` at line 148, The JavaScript language-server entry currently uses lsps[".js"] = NullLanguageServer(), which causes AbstractAnalyzer.resolve() to throw on lsp.request_definition(...) and prevents JavaScriptAnalyzer.resolve_type/resolve_method from producing CALLS/EXTENDS/PARAMETERS edges; update the code to mark this as a known limitation by adding a TODO and/or logging a clear warning where lsps[".js"] is set (or in NullLanguageServer.__init__), referencing lsps, NullLanguageServer, AbstractAnalyzer.resolve, lsp.request_definition, and JavaScriptAnalyzer.resolve_type/resolve_method so future work can replace NullLanguageServer with a real JS/TS language server (e.g., typescript-language-server) to enable second-pass symbol resolution.api/analyzers/javascript/analyzer.py (2)
40-45: Docstring extraction only captures immediate preceding single comments.This won't capture JSDoc blocks that are separated from the declaration by whitespace/newlines (since they wouldn't be
prev_sibling), nor will it capture multi-line comment blocks split across multiplecommentnodes. For typical single-block JSDoc (/** ... */), this should work fine. Consider this a known limitation for the initial implementation.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@api/analyzers/javascript/analyzer.py` around lines 40 - 45, get_entity_docstring currently only returns node.prev_sibling if it's a single 'comment', which misses JSDoc blocks separated by whitespace/newlines and multi-line comment sequences; update get_entity_docstring to walk backwards from the declaration node via prev_sibling, skipping over non-significant whitespace/newline nodes and collecting contiguous 'comment' nodes (e.g., accumulate while prev_sibling.type == 'comment'), decode and join their .text values into a single string, then return that combined docstring for the function_declaration, class_declaration, and method_definition cases (refer to the get_entity_docstring function and the node.prev_sibling traversal).
68-72: Parameter query only captures simple identifiers.The tree-sitter query
(formal_parameters (identifier)@parameter)will miss destructured parameters ({a, b}), default values (a = 1), and rest parameters (...args), which are common in modern JavaScript. This is acceptable for an initial implementation but worth noting as a known limitation.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@api/analyzers/javascript/analyzer.py` around lines 68 - 72, The current Tree-sitter query only matches simple identifier parameters; update the query passed to self.language.query in analyzer.py to also capture destructured, default, and rest forms by including nodes like formal_parameters (identifier | object_pattern | array_pattern | assignment_pattern | rest_pattern) `@parameter` (or the equivalent node names used by the JavaScript grammar), then adjust the captures handling (captures = query.captures(entity.node)) to iterate over captures['parameter'] as before and call entity.add_symbol("parameters", parameter) so destructured ({a,b}), default (a = 1), and rest (...args) parameters are included along with simple identifiers.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@api/analyzers/javascript/analyzer.py`:
- Around line 50-61: In add_symbols (method add_symbols on Entity handling
class_declaration), remove the unused assignment to superclass_node and
eliminate or simplify the misleading heritage variable: either rename heritage
to body_node and use it for a clear defensive check or drop the body existence
guard entirely since class bodies are required; keep the loop that detects
child.type == 'class_heritage' and call entity.add_symbol("base_class",
heritage_child) for heritage_child.type == 'identifier' unchanged.
---
Duplicate comments:
In `@api/analyzers/javascript/analyzer.py`:
- Around line 1-13: The import statements in analyzer.py are already explicit
and clean, so no change is required; keep the explicit imports
(SyncLanguageServer, Entity, File, AbstractAnalyzer, tree_sitter_javascript as
tsjs, Language, Node, logging) as-is and ensure the logger variable logger =
logging.getLogger('code_graph') remains correctly named and used consistently
throughout this module.
---
Nitpick comments:
In `@api/analyzers/javascript/analyzer.py`:
- Around line 40-45: get_entity_docstring currently only returns
node.prev_sibling if it's a single 'comment', which misses JSDoc blocks
separated by whitespace/newlines and multi-line comment sequences; update
get_entity_docstring to walk backwards from the declaration node via
prev_sibling, skipping over non-significant whitespace/newline nodes and
collecting contiguous 'comment' nodes (e.g., accumulate while prev_sibling.type
== 'comment'), decode and join their .text values into a single string, then
return that combined docstring for the function_declaration, class_declaration,
and method_definition cases (refer to the get_entity_docstring function and the
node.prev_sibling traversal).
- Around line 68-72: The current Tree-sitter query only matches simple
identifier parameters; update the query passed to self.language.query in
analyzer.py to also capture destructured, default, and rest forms by including
nodes like formal_parameters (identifier | object_pattern | array_pattern |
assignment_pattern | rest_pattern) `@parameter` (or the equivalent node names used
by the JavaScript grammar), then adjust the captures handling (captures =
query.captures(entity.node)) to iterate over captures['parameter'] as before and
call entity.add_symbol("parameters", parameter) so destructured ({a,b}), default
(a = 1), and rest (...args) parameters are included along with simple
identifiers.
In `@api/analyzers/source_analyzer.py`:
- Line 148: The JavaScript language-server entry currently uses lsps[".js"] =
NullLanguageServer(), which causes AbstractAnalyzer.resolve() to throw on
lsp.request_definition(...) and prevents
JavaScriptAnalyzer.resolve_type/resolve_method from producing
CALLS/EXTENDS/PARAMETERS edges; update the code to mark this as a known
limitation by adding a TODO and/or logging a clear warning where lsps[".js"] is
set (or in NullLanguageServer.__init__), referencing lsps, NullLanguageServer,
AbstractAnalyzer.resolve, lsp.request_definition, and
JavaScriptAnalyzer.resolve_type/resolve_method so future work can replace
NullLanguageServer with a real JS/TS language server (e.g.,
typescript-language-server) to enable second-pass symbol resolution.
| def add_symbols(self, entity: Entity) -> None: | ||
| if entity.node.type == 'class_declaration': | ||
| heritage = entity.node.child_by_field_name('body') | ||
| if heritage is None: | ||
| return | ||
| superclass_node = entity.node.child_by_field_name('name') | ||
| # Check for `extends` clause via class_heritage | ||
| for child in entity.node.children: | ||
| if child.type == 'class_heritage': | ||
| for heritage_child in child.children: | ||
| if heritage_child.type == 'identifier': | ||
| entity.add_symbol("base_class", heritage_child) |
There was a problem hiding this comment.
Dead code and misleading variable names in add_symbols for class declarations.
Two issues:
- Line 55:
superclass_nodeis assigned but never used (confirmed by Ruff F841). Remove it. - Line 52: Variable named
heritagebut it fetches the'body'field — misleading. It's also only used for theNonecheck on line 53, which is a defensive guard that's unlikely to trigger (classes without bodies are syntactically invalid JS).
🔧 Proposed fix
def add_symbols(self, entity: Entity) -> None:
if entity.node.type == 'class_declaration':
- heritage = entity.node.child_by_field_name('body')
- if heritage is None:
+ body = entity.node.child_by_field_name('body')
+ if body is None:
return
- superclass_node = entity.node.child_by_field_name('name')
# Check for `extends` clause via class_heritage
for child in entity.node.children:🧰 Tools
🪛 Ruff (0.15.1)
[error] 55-55: Local variable superclass_node is assigned to but never used
Remove assignment to unused variable superclass_node
(F841)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@api/analyzers/javascript/analyzer.py` around lines 50 - 61, In add_symbols
(method add_symbols on Entity handling class_declaration), remove the unused
assignment to superclass_node and eliminate or simplify the misleading heritage
variable: either rename heritage to body_node and use it for a clear defensive
check or drop the body existence guard entirely since class bodies are required;
keep the loop that detects child.type == 'class_heritage' and call
entity.add_symbol("base_class", heritage_child) for heritage_child.type ==
'identifier' unchanged.
Fixes #51
Add support for JavaScript code analysis using tree-sitter.
api/analyzers/javascript/analyzer.pyimplementingJavaScriptAnalyzerclass using tree-sitter for JavaScript.api/analyzers/source_analyzer.pyto includeJavaScriptAnalyzerin the analyzers list.tree-sitter-javascriptdependency topyproject.toml.api/analyzers/utils.py.For more details, open the Copilot Workspace session.
Summary by CodeRabbit