Conversation
Migrated from FalkorDB/code-graph-backend PR #97. Original issue: FalkorDB/code-graph-backend#61 Resolves #535 Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
|
The latest updates on your projects. Learn more about Vercel for GitHub.
|
📝 WalkthroughWalkthroughThis pull request introduces import support for Python by extending the analyzer framework with abstract methods for extracting and resolving imports, providing concrete Python implementations using tree-sitter, and integrating import tracking into the source analysis pipeline to create IMPORTS relationships in the entity graph. Changes
Sequence Diagram(s)sequenceDiagram
participant SA as SourceAnalyzer
participant Analyzer as Language Analyzer
participant File as File Entity
participant Graph as Entity Graph
participant LSP as LSP
rect rgba(100, 150, 200, 0.5)
Note over SA,File: First Pass: Import Extraction
SA->>Analyzer: add_file_imports(file)
Analyzer->>File: extract imports via tree-sitter
Analyzer->>File: add_import(import_node)
File->>File: append to imports list
end
rect rgba(150, 100, 200, 0.5)
Note over SA,Graph: Second Pass: Import Resolution & Graph Creation
SA->>Analyzer: resolve_import(files, lsp, ...)
Analyzer->>LSP: query symbol locations
Analyzer->>Analyzer: match import to entity
Analyzer-->>SA: return list[Entity]
SA->>File: add_resolved_import(entity)
File->>File: append to resolved_imports
SA->>Graph: create IMPORTS edge
Graph->>Graph: file -> entity relationship
end
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~30 minutes Poem
🚥 Pre-merge checks | ✅ 3 | ❌ 2❌ Failed checks (2 warnings)
✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 4
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
api/analyzers/source_analyzer.py (1)
29-29:⚠️ Potential issue | 🔴 CriticalCritical:
CSharpAnalyzerfails to instantiate due to missing abstract method implementations.Verification confirms that
CSharpAnalyzercannot be instantiated because it doesn't implement the newly abstract methodsadd_file_imports()andresolve_import(). Since these are@abstractmethodinAbstractAnalyzer, Python raisesTypeErrorat module load time.JavaAnalyzerhas both methods correctly implemented;CSharpAnalyzermust be updated to match.Add these methods to
CSharpAnalyzer:def add_file_imports(self, file: File) -> None: """C# import tracking not yet implemented.""" pass def resolve_import(self, files: dict[Path, File], lsp: SyncLanguageServer, file_path: Path, path: Path, import_node: Node) -> list[Entity]: """C# import resolution not yet implemented.""" return []🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@api/analyzers/source_analyzer.py` at line 29, CSharpAnalyzer currently cannot be instantiated because it omits the abstract methods defined in AbstractAnalyzer; implement add_file_imports(self, file: File) -> None and resolve_import(self, files: dict[Path, File], lsp: SyncLanguageServer, file_path: Path, path: Path, import_node: Node) -> list[Entity] on the CSharpAnalyzer class, with add_file_imports returning None (no-op or a brief docstring) and resolve_import returning an empty list, matching the signatures used by JavaAnalyzer so the class satisfies the abstract interface and no TypeError is raised at import time.
🧹 Nitpick comments (2)
tests/test_py_imports.py (2)
42-57: Consider using a public API instead of_query().The test accesses the internal
_query()method (prefixed with underscore) to verify relationships. While this works, it couples the test to implementation details. If a public query method exists or could be added, that would make the tests more maintainable.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/test_py_imports.py` around lines 42 - 57, Tests are calling the internal g._query(...) which couples tests to implementation; replace uses of _query in tests (the two occurrences checking IMPORTS to Class and Function) with the public query API (e.g., g.query(...) or g.run_query(...)) if available, or add a small public wrapper method on the graph object (name it query or run_query) that delegates to _query and use that in tests; update the assertions to call this public method (referencing g._query in the diff to locate the spots to change) so tests rely only on the public API.
59-59: Remove debug print statement.Print statements in tests can clutter test output. Consider removing this or using
logging.debug()if the message is needed for debugging.🧹 Suggested fix
- print("✓ Import tracking test passed")🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/test_py_imports.py` at line 59, Remove the debug print statement that emits "✓ Import tracking test passed" from the test (tests/test_py_imports.py); either delete that line entirely or replace it with a logging.debug call and ensure the module imports the logging library and uses logging.getLogger(__name__).debug(...) so test output remains clean.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@api/analyzers/java/analyzer.py`:
- Around line 131-145: The annotations in add_file_imports and resolve_import
reference File and Entity but those names are only available via a wildcard
import; add explicit imports for File and Entity at the top of the module
(alongside existing imports like Node, SyncLanguageServer, Path) so the type
annotations no longer rely on the wildcard; update the module import list to
include File and Entity (used by add_file_imports and resolve_import) and run
linters to confirm F405 is resolved.
In `@api/analyzers/python/analyzer.py`:
- Around line 160-216: The resolve_import function currently only resolves class
imports by calling resolve_type and doesn't handle wildcard imports or function
imports; update resolve_import to (1) detect wildcard imports in
import_from_statement (e.g., the '*' or wildcard_import child) and, when found,
resolve the entire module by calling the module-resolution path used elsewhere
(use the same file/module lookup used by resolve_type) and add all exported
entities to res; and (2) when resolving individual imported names (both in
import_statement/aliased_import and
import_from_statement/aliased_import/dotted_name cases) try both
resolve_type(...) and resolve_method(...) (or the function-resolution helper
used in this analyzer) and extend res with results from both so functions and
classes are captured. Ensure these changes are made inside resolve_import so
existing control flow (import_statement vs import_from_statement) remains intact
and exceptions are still caught.
In `@test-project/c.java`:
- Line 1: The package declaration "package test-project;" in c.java is invalid
because hyphens are not allowed; update the package statement (the package
declaration at the top of c.java) to a valid Java package identifier such as
"package test.project;" or "package test_project;" and move the file to the
matching directory structure (e.g., test/project/c.java for test.project) so the
package name and filesystem layout align.
In `@tests/source_files/py_imports/module_b.py`:
- Around line 3-10: The resolver currently only walks up to class_definition and
therefore resolves ClassA but drops function_a; update the Python import target
resolution in api/analyzers/python/analyzer.py to also recognize and resolve
function definitions (and plain name bindings) in addition to class_definition
so that imports like ClassA and function_a from module_b.py are both linked;
modify the resolver that walks AST nodes (the logic that checks for
class_definition) to also check for function_definition and simple name
assignments so IMPORTS edges are created for function_a as well as ClassA.
---
Outside diff comments:
In `@api/analyzers/source_analyzer.py`:
- Line 29: CSharpAnalyzer currently cannot be instantiated because it omits the
abstract methods defined in AbstractAnalyzer; implement add_file_imports(self,
file: File) -> None and resolve_import(self, files: dict[Path, File], lsp:
SyncLanguageServer, file_path: Path, path: Path, import_node: Node) ->
list[Entity] on the CSharpAnalyzer class, with add_file_imports returning None
(no-op or a brief docstring) and resolve_import returning an empty list,
matching the signatures used by JavaAnalyzer so the class satisfies the abstract
interface and no TypeError is raised at import time.
---
Nitpick comments:
In `@tests/test_py_imports.py`:
- Around line 42-57: Tests are calling the internal g._query(...) which couples
tests to implementation; replace uses of _query in tests (the two occurrences
checking IMPORTS to Class and Function) with the public query API (e.g.,
g.query(...) or g.run_query(...)) if available, or add a small public wrapper
method on the graph object (name it query or run_query) that delegates to _query
and use that in tests; update the assertions to call this public method
(referencing g._query in the diff to locate the spots to change) so tests rely
only on the public API.
- Line 59: Remove the debug print statement that emits "✓ Import tracking test
passed" from the test (tests/test_py_imports.py); either delete that line
entirely or replace it with a logging.debug call and ensure the module imports
the logging library and uses logging.getLogger(__name__).debug(...) so test
output remains clean.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 4d159efb-8f1f-417d-871d-0b2ef50aea9c
📒 Files selected for processing (12)
CI_OPTIMIZATION.mdapi/analyzers/analyzer.pyapi/analyzers/java/analyzer.pyapi/analyzers/python/analyzer.pyapi/analyzers/source_analyzer.pyapi/entities/file.pytest-project/a.ctest-project/b.py___test-project/c.javatests/source_files/py_imports/module_a.pytests/source_files/py_imports/module_b.pytests/test_py_imports.py
| def add_file_imports(self, file: File) -> None: | ||
| """ | ||
| Extract and add import statements from the file. | ||
| Java imports are not yet implemented. | ||
| """ | ||
| # TODO: Implement Java import tracking | ||
| pass | ||
|
|
||
| def resolve_import(self, files: dict[Path, File], lsp: SyncLanguageServer, file_path: Path, path: Path, import_node: Node) -> list[Entity]: | ||
| """ | ||
| Resolve an import statement to the entities it imports. | ||
| Java imports are not yet implemented. | ||
| """ | ||
| # TODO: Implement Java import resolution | ||
| return [] |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Verify whether File/Entity are re-exported or only arrive via the wildcard import.
sed -n '1,200p' api/entities/__init__.py
rg -n -C2 '^class (File|Entity)\b' api/entitiesRepository: FalkorDB/code-graph
Length of output: 686
Import File and Entity explicitly before using them in these annotations.
Ruff flags F405 on lines 131 and 139 because File and Entity are available only through the wildcard import on line 4. If CI runs linting, this change will fail the branch even though the methods are stubs.
Minimal fix
from ...entities import *
+from ...entities.entity import Entity
+from ...entities.file import File🧰 Tools
🪛 Ruff (0.15.5)
[error] 131-131: File may be undefined, or defined from star imports
(F405)
[warning] 139-139: Unused method argument: files
(ARG002)
[error] 139-139: File may be undefined, or defined from star imports
(F405)
[warning] 139-139: Unused method argument: lsp
(ARG002)
[warning] 139-139: Unused method argument: file_path
(ARG002)
[warning] 139-139: Unused method argument: path
(ARG002)
[warning] 139-139: Unused method argument: import_node
(ARG002)
[error] 139-139: Entity may be undefined, or defined from star imports
(F405)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@api/analyzers/java/analyzer.py` around lines 131 - 145, The annotations in
add_file_imports and resolve_import reference File and Entity but those names
are only available via a wildcard import; add explicit imports for File and
Entity at the top of the module (alongside existing imports like Node,
SyncLanguageServer, Path) so the type annotations no longer rely on the
wildcard; update the module import list to include File and Entity (used by
add_file_imports and resolve_import) and run linters to confirm F405 is
resolved.
| def resolve_import(self, files: dict[Path, File], lsp: SyncLanguageServer, file_path: Path, path: Path, import_node: Node) -> list[Entity]: | ||
| """ | ||
| Resolve an import statement to the entities it imports. | ||
| """ | ||
| res = [] | ||
|
|
||
| try: | ||
| if import_node.type == 'import_statement': | ||
| # Handle "import module" or "import module as alias" | ||
| # Find all dotted_name and aliased_import nodes | ||
| for child in import_node.children: | ||
| if child.type == 'dotted_name': | ||
| # Try to resolve the module/name | ||
| identifier = child.children[0] if child.child_count > 0 else child | ||
| resolved = self.resolve_type(files, lsp, file_path, path, identifier) | ||
| res.extend(resolved) | ||
| elif child.type == 'aliased_import': | ||
| # Get the actual name from aliased import (before 'as') | ||
| if child.child_count > 0: | ||
| actual_name = child.children[0] | ||
| if actual_name.type == 'dotted_name' and actual_name.child_count > 0: | ||
| identifier = actual_name.children[0] | ||
| else: | ||
| identifier = actual_name | ||
| resolved = self.resolve_type(files, lsp, file_path, path, identifier) | ||
| res.extend(resolved) | ||
|
|
||
| elif import_node.type == 'import_from_statement': | ||
| # Handle "from module import name1, name2" | ||
| # Find the 'import' keyword to know where imported names start | ||
| import_keyword_found = False | ||
| for child in import_node.children: | ||
| if child.type == 'import': | ||
| import_keyword_found = True | ||
| continue | ||
|
|
||
| # After 'import' keyword, dotted_name nodes are the imported names | ||
| if import_keyword_found and child.type == 'dotted_name': | ||
| # Try to resolve the imported name | ||
| identifier = child.children[0] if child.child_count > 0 else child | ||
| resolved = self.resolve_type(files, lsp, file_path, path, identifier) | ||
| res.extend(resolved) | ||
| elif import_keyword_found and child.type == 'aliased_import': | ||
| # Handle "from module import name as alias" | ||
| if child.child_count > 0: | ||
| actual_name = child.children[0] | ||
| if actual_name.type == 'dotted_name' and actual_name.child_count > 0: | ||
| identifier = actual_name.children[0] | ||
| else: | ||
| identifier = actual_name | ||
| resolved = self.resolve_type(files, lsp, file_path, path, identifier) | ||
| res.extend(resolved) | ||
|
|
||
| except Exception as e: | ||
| logger.debug(f"Failed to resolve import: {e}") | ||
|
|
||
| return res |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Check if wildcard imports exist in test files and need handling
rg -n "from .* import \*" tests/source_files/py_imports/ || echo "No wildcard imports in test data"
# Check the test data to understand what import patterns are being tested
fd -t f "\.py$" tests/source_files/py_imports/ --exec cat {}Repository: FalkorDB/code-graph
Length of output: 630
🏁 Script executed:
# Check for wildcard imports in entire codebase (not just tests)
rg -n "from .* import \*" --type py .
# Look for resolve_type and resolve_method implementations
rg -n "def resolve_type|def resolve_method" api/analyzers/python/analyzer.pyRepository: FalkorDB/code-graph
Length of output: 1166
🏁 Script executed:
# Get the resolve_type method implementation
sed -n '94,110p' api/analyzers/python/analyzer.py
# Get the resolve_method method implementation
sed -n '104,120p' api/analyzers/python/analyzer.py
# Check what the Entity class looks like to understand what can be resolved
rg -n "class Entity" api/entities.py | head -20Repository: FalkorDB/code-graph
Length of output: 2072
🏁 Script executed:
# Find where entities are defined
fd -t f entities.py
# Check the Python analyzer imports to understand the structure
head -20 api/analyzers/python/analyzer.py
# Check if there are tests for function imports
rg -n "function" tests/ --type py | grep -i import | head -20Repository: FalkorDB/code-graph
Length of output: 1466
🏁 Script executed:
# View the full test to see what's expected
cat tests/test_py_imports.py
# Check if wildcard import tests exist
rg -n "import \*" tests/ --type py -B2 -A2Repository: FalkorDB/code-graph
Length of output: 2990
🏁 Script executed:
# Find where resolve_import is called
rg -n "resolve_import" api/analyzers/python/analyzer.py
# Look at what calls resolve_import and in what context
rg -B5 -A5 "resolve_import" api/analyzers/python/analyzer.pyRepository: FalkorDB/code-graph
Length of output: 686
🏁 Script executed:
# Search for calls to resolve_import (function invocation)
rg "\.resolve_import\(" api/analyzers/python/analyzer.py
# Also check if resolve_import is used anywhere else
rg "resolve_import" api/analyzers/python/Repository: FalkorDB/code-graph
Length of output: 239
🏁 Script executed:
# Find where add_import is called and how imports are processed
rg -n "add_import" api/analyzers/python/analyzer.py -B10 -A10
# Check the broader context around the import extraction
rg -n "extract_imports\|import_statement\|import_from" api/analyzers/python/analyzer.py -B3 -A3Repository: FalkorDB/code-graph
Length of output: 1176
🏁 Script executed:
# Look at the File class to understand add_import and how imports are processed
fd -t f "file.py\|entities.py" api/ --exec head -50 {}
# Search for where IMPORTS relationship is created
rg -n "IMPORTS\|add_import" api/ --type py -B2 -A2 | head -100Repository: FalkorDB/code-graph
Length of output: 45
🏁 Script executed:
# Search more broadly for IMPORTS relationship
rg -n "IMPORTS" . --type py
# Find the File class definition
rg -n "class File" api/ --type py -A20Repository: FalkorDB/code-graph
Length of output: 1726
🏁 Script executed:
# Look at source_analyzer.py around line 182 to see how imports are resolved
sed -n '170,200p' api/analyzers/source_analyzer.py
# Look at add_import method in File class
rg -n "def add_import" api/entities/file.py -A10
# Check resolved_imports and how it's populated
rg -n "resolved_imports" api/entities/file.py -B2 -A2Repository: FalkorDB/code-graph
Length of output: 2516
Handle function imports and wildcard imports in resolve_import().
The implementation handles import module, import module as alias, from module import name, and from module import name as alias. However:
-
Wildcard imports are not handled: Patterns like
from module import *are not processed. These exist throughout the codebase (e.g.,from .info import *,from .entities import *). -
Function imports are not resolved: The code only calls
resolve_type(), which filters forclass_definitionand returns only classes. For function imports likefrom module import function_a,resolve_method()should also be tried to resolve function entities.
The existing test test_py_imports.py expects both class and function imports to be tracked, indicating this is a required use case.
🧰 Tools
🪛 Ruff (0.15.5)
[error] 160-160: File may be undefined, or defined from star imports
(F405)
[error] 160-160: Entity may be undefined, or defined from star imports
(F405)
[warning] 213-213: Do not catch blind exception: Exception
(BLE001)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@api/analyzers/python/analyzer.py` around lines 160 - 216, The resolve_import
function currently only resolves class imports by calling resolve_type and
doesn't handle wildcard imports or function imports; update resolve_import to
(1) detect wildcard imports in import_from_statement (e.g., the '*' or
wildcard_import child) and, when found, resolve the entire module by calling the
module-resolution path used elsewhere (use the same file/module lookup used by
resolve_type) and add all exported entities to res; and (2) when resolving
individual imported names (both in import_statement/aliased_import and
import_from_statement/aliased_import/dotted_name cases) try both
resolve_type(...) and resolve_method(...) (or the function-resolution helper
used in this analyzer) and extend res with results from both so functions and
classes are captured. Ensure these changes are made inside resolve_import so
existing control flow (import_statement vs import_from_statement) remains intact
and exceptions are still caught.
| @@ -0,0 +1,26 @@ | |||
| package test-project; | |||
There was a problem hiding this comment.
🧩 Analysis chain
🌐 Web query:
Does Java allow hyphens in package names such as package test-project;, and how should package declarations align with directory structure?
💡 Result:
No. A Java package name is a dot-separated sequence of identifiers, and - is not legal in a Java identifier—so package test-project; will not compile. (The JLS even notes package-info would be an illegal identifier because of the hyphen.) [1]
How it should align with directories (common/tooling expectation):
- If you declare
package com.example.app;, then under your source root the file should be in:com/example/app/YourClass.java
- Compiled output similarly uses the package as folders:
com/example/app/YourClass.class(under the class output directory) [2]
In practice, IDEs and build tools flag/mis-handle “package doesn’t match directory” because the convention is that the package name matches the relative folder path from the configured source root. [3]
Use instead: package test.project; or package test_project; (underscore is allowed in identifiers; hyphen is not). [1]
Sources: [1] Java Language Specification, Chapter 7 (Packages) (docs.oracle.com) [2] javac manual (package hierarchy maps to directories) (docs.oracle.com) [3] IntelliJ Inspectopedia: “Package name does not match containing directory” (jetbrains.com)
Citations:
- 1: https://docs.oracle.com/javase/specs/jls/se7/html/jls-7.html?utm_source=openai
- 2: https://docs.oracle.com/en/java/javase/25/docs/specs/man/javac.html?utm_source=openai
- 3: https://www.jetbrains.com/help/inspectopedia/PackageDirectoryMismatch.html?utm_source=openai
Fix invalid Java package declaration on line 1.
package test-project; is invalid Java syntax because hyphens are not allowed in package identifiers. Java package names must be dot-separated sequences of valid identifiers. Use package test.project; or package test_project; instead, and align the directory structure accordingly (e.g., test/project/c.java for test.project).
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@test-project/c.java` at line 1, The package declaration "package
test-project;" in c.java is invalid because hyphens are not allowed; update the
package statement (the package declaration at the top of c.java) to a valid Java
package identifier such as "package test.project;" or "package test_project;"
and move the file to the matching directory structure (e.g., test/project/c.java
for test.project) so the package name and filesystem layout align.
| from module_a import ClassA, function_a | ||
|
|
||
| class ClassB(ClassA): | ||
| """A class that extends ClassA.""" | ||
|
|
||
| def method_b(self): | ||
| """A method in ClassB.""" | ||
| result = function_a() |
There was a problem hiding this comment.
Resolve imported functions, not just classes.
This fixture imports both ClassA and function_a, but the current Python resolver path only walks up to class_definition, so function_a is dropped and no IMPORTS edge is created for it. That leaves the PR only partially implementing Python import tracking.
Possible fix in api/analyzers/python/analyzer.py
def resolve_type(self, files: dict[Path, File], lsp: SyncLanguageServer, file_path: Path, path, node: Node) -> list[Entity]:
res = []
if node.type == 'attribute':
node = node.child_by_field_name('attribute')
for file, resolved_node in self.resolve(files, lsp, file_path, path, node):
- type_dec = self.find_parent(resolved_node, ['class_definition'])
- if type_dec in file.entities:
- res.append(file.entities[type_dec])
+ decl = resolved_node
+ if decl.type not in ['class_definition', 'function_definition']:
+ decl = self.find_parent(resolved_node, ['class_definition', 'function_definition'])
+ if decl in file.entities:
+ res.append(file.entities[decl])
return res🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/source_files/py_imports/module_b.py` around lines 3 - 10, The resolver
currently only walks up to class_definition and therefore resolves ClassA but
drops function_a; update the Python import target resolution in
api/analyzers/python/analyzer.py to also recognize and resolve function
definitions (and plain name bindings) in addition to class_definition so that
imports like ClassA and function_a from module_b.py are both linked; modify the
resolver that walks AST nodes (the logic that checks for class_definition) to
also check for function_definition and simple name assignments so IMPORTS edges
are created for function_a as well as ClassA.
Migrated from falkordb/code-graph-backend#97
Summary
Add Python import tracking to code graph. Extends the analyzer to detect and track Python import statements, creating IMPORTS relationships in the graph.
Changes:
Resolves #535
Originally authored by @Copilot in falkordb/code-graph-backend#97
Summary by CodeRabbit
Release Notes
New Features
Chores