Add Python import tracking to code graph #583
base: staging
@@ -0,0 +1,160 @@

# CI Pipeline Optimization Analysis (Staging Branch)

## Current Workflows on Staging

The staging branch has 3 workflow files (identical to main):

| Workflow | File | Trigger | ~Duration |
|---|---|---|---|
| **Build** | `nextjs.yml` | All PRs + push to main | **~1 min** |
| **Playwright Tests** | `playwright.yml` | PRs + push to main/staging | **~10 min** (x2 shards) |
| **Release image** | `release-image.yml` | Tags + main push | release-only |

Additionally, **CodeQL** runs on staging pushes.
## Playwright Tests — The Bottleneck

This is the critical path. It runs 2 shards in parallel, each taking ~10 min. Measured from recent staging runs:

| Step | Shard 1 | Shard 2 | % of total |
|---|---|---|---|
| **Seed test data into FalkorDB** | **223s** | **220s** | **37%** |
| **Run Playwright tests** | 264s | 262s | 44% |
| **Install Playwright browsers** | 48s | 51s | 8% |
| Install backend deps (`pip install`) | 28s | 31s | 5% |
| Build frontend | 12s | 12s | 2% |
| Install frontend deps (`npm ci`) | 8s | 8s | 1% |
| Container init + setup | ~15s | ~15s | 3% |

**Total per shard: ~600s (10 min). Total billable: ~20 min.**
## Build Workflow — Wasted Work

The Build workflow (~64s total) installs backend dependencies but does nothing with them:

| Step | Duration |
|---|---|
| Install frontend deps | 7s |
| Build frontend | 14s |
| Lint frontend | <1s |
| **Install backend deps (`pip install`)** | **35s** |

The backend install accounts for **55% of the Build workflow** and serves no purpose.

---
## Optimization Recommendations

### 1. Cache or pre-seed FalkorDB test data (saves **~3.5 min/shard = ~7 min total**)

`seed_test_data.py` clones 2 GitHub repos (GraphRAG-SDK, Flask) and runs full source analysis on every run. This is the single biggest time sink at **37% of Playwright runtime**.

**Options:**
- **Best**: Export the seeded graph as an RDB dump, commit it as a test fixture, and restore it with `redis-cli`. Eliminates the 220s step entirely.
- **Good**: Cache the cloned repos + analysis output with `actions/cache`, keyed on the seed script hash + repo commit SHAs.
- **Minimum**: Cache just the git clones to skip network time.
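The cache key the "Good" option describes can be sketched as a hash of the seed script contents plus the pinned repo SHAs. This is an illustrative sketch (the function name and file paths are hypothetical, not part of the repo):

```python
import hashlib
from pathlib import Path

def seed_cache_key(seed_script: Path, repo_shas: list[str]) -> str:
    """Derive an actions/cache key from the seed script and pinned repo SHAs."""
    h = hashlib.sha256()
    h.update(seed_script.read_bytes())      # any edit to the script invalidates the cache
    for sha in repo_shas:
        h.update(sha.encode())              # any change to a seeded repo does too
    return f"falkordb-seed-{h.hexdigest()[:16]}"
```

Any change to the script or to either seeded repo produces a new key, so a stale graph is never restored.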
### 2. Cache Playwright browsers (saves **~50s/shard = ~1.5 min total**)

Browsers are installed from scratch every run (`npx playwright install --with-deps`). Add:

```yaml
- name: Cache Playwright browsers
  id: playwright-cache
  uses: actions/cache@v4
  with:
    path: ~/.cache/ms-playwright
    key: playwright-${{ runner.os }}-${{ hashFiles('package-lock.json') }}

- name: Install Playwright Browsers
  if: steps.playwright-cache.outputs.cache-hit != 'true'
  run: npx playwright install --with-deps chromium

- name: Install Playwright system deps
  if: steps.playwright-cache.outputs.cache-hit == 'true'
  run: npx playwright install-deps chromium
```
### 3. Switch `pip install` to `uv` (saves **~15-20s/shard**)

Both workflows use slow `pip install`. `uv sync` is 3-5x faster:

```yaml
- name: Install uv
  uses: astral-sh/setup-uv@v5
  with:
    version: "latest"

- name: Install dependencies
  run: uv sync
```
### 4. Remove the unused backend install from the Build workflow (saves **~35s**)

`nextjs.yml` installs backend deps but runs no backend tests or lint. Either:
- **Remove** the `Setup Python` and `Install backend dependencies` steps entirely
- **Or** add backend unit tests / pylint to justify the install
### 5. Add concurrency groups (saves **queued minutes**)

The Build workflow has no concurrency group, so rapid pushes queue redundant runs:

```yaml
concurrency:
  group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}
  cancel-in-progress: true
```

The Playwright workflow also lacks a concurrency group.
### 6. Add an npm cache (saves **~3-5s/shard**)

Neither workflow caches npm. Add to `setup-node`:

```yaml
- uses: actions/setup-node@v4
  with:
    node-version: 24
    cache: 'npm'
    cache-dependency-path: |
      package-lock.json
      app/package-lock.json
```
### 7. Docker build caching for releases (saves **~2-5 min** per release)

There is no layer caching on the Docker build. Add:

```yaml
- uses: docker/build-push-action@v5
  with:
    context: .
    file: ./Dockerfile
    push: true
    tags: ${{ env.TAGS }}
    cache-from: type=gha
    cache-to: type=gha,mode=max
```
### 8. Deduplicate npm installs in the Playwright workflow

The Playwright workflow runs `npm ci` twice: once for the frontend (`./app`) and once for the root (Playwright). These could be consolidated, or at least cached.

---
## Summary

| # | Optimization | Time saved | Effort |
|---|---|---|---|
| 1 | Cache/pre-seed FalkorDB data | **~7 min** | Medium |
| 2 | Cache Playwright browsers | **~1.5 min** | Low |
| 3 | Switch from `pip` to `uv` | **~40s** | Low |
| 4 | Remove unused backend install from Build | **~35s** | Trivial |
| 5 | Add concurrency groups | Variable | Trivial |
| 6 | Add npm cache | ~10s | Trivial |
| 7 | Docker layer caching | ~2-5 min (releases) | Low |
| 8 | Deduplicate npm installs | ~5s | Low |

**Total potential savings: ~9-10 min per CI run**, bringing Playwright from ~10 min/shard down to ~4-5 min/shard (dominated by the actual test execution).

The single biggest win is **pre-seeding FalkorDB data** — it alone accounts for 37% of the Playwright workflow runtime.
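As a quick sanity check on the summary arithmetic, here are the per-run savings from the table above, in seconds:

```python
# Per-run savings from the summary table, in seconds
savings = {
    "seed_cache": 7 * 60,     # ~7 min
    "browsers": 90,           # ~1.5 min
    "uv": 40,
    "backend_install": 35,
    "npm_cache": 10,
    "dedupe_npm": 5,
}
total_min = sum(savings.values()) / 60
print(total_min)  # → 10.0, consistent with the "~9-10 min" estimate
```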
@@ -122,3 +122,95 @@ def resolve_symbol(self, files: dict[Path, File], lsp: SyncLanguageServer, file_

```python
            return self.resolve_method(files, lsp, file_path, path, symbol)
        else:
            raise ValueError(f"Unknown key {key}")

    def add_file_imports(self, file: File) -> None:
        """
        Extract and add import statements from the file.

        Supports:
        - import module
        - import module as alias
        - from module import name
        - from module import name1, name2
        - from module import name as alias
        """
        try:
            import warnings
            with warnings.catch_warnings():
                warnings.simplefilter("ignore")
                # Query for both import types
                import_query = self.language.query("""
                    (import_statement) @import
                    (import_from_statement) @import_from
                """)

            captures = import_query.captures(file.tree.root_node)

            # Add all import statement nodes to the file
            if 'import' in captures:
                for import_node in captures['import']:
                    file.add_import(import_node)

            if 'import_from' in captures:
                for import_node in captures['import_from']:
                    file.add_import(import_node)
        except Exception as e:
            logger.debug(f"Failed to extract imports from {file.path}: {e}")

    def resolve_import(self, files: dict[Path, File], lsp: SyncLanguageServer, file_path: Path, path: Path, import_node: Node) -> list[Entity]:
        """
        Resolve an import statement to the entities it imports.
        """
        res = []

        try:
            if import_node.type == 'import_statement':
                # Handle "import module" or "import module as alias"
                # Find all dotted_name and aliased_import nodes
                for child in import_node.children:
                    if child.type == 'dotted_name':
                        # Try to resolve the module/name
                        identifier = child.children[0] if child.child_count > 0 else child
                        resolved = self.resolve_type(files, lsp, file_path, path, identifier)
                        res.extend(resolved)
                    elif child.type == 'aliased_import':
                        # Get the actual name from aliased import (before 'as')
                        if child.child_count > 0:
                            actual_name = child.children[0]
                            if actual_name.type == 'dotted_name' and actual_name.child_count > 0:
                                identifier = actual_name.children[0]
                            else:
                                identifier = actual_name
                            resolved = self.resolve_type(files, lsp, file_path, path, identifier)
                            res.extend(resolved)

            elif import_node.type == 'import_from_statement':
                # Handle "from module import name1, name2"
                # Find the 'import' keyword to know where imported names start
                import_keyword_found = False
                for child in import_node.children:
                    if child.type == 'import':
                        import_keyword_found = True
                        continue

                    # After 'import' keyword, dotted_name nodes are the imported names
                    if import_keyword_found and child.type == 'dotted_name':
                        # Try to resolve the imported name
                        identifier = child.children[0] if child.child_count > 0 else child
                        resolved = self.resolve_type(files, lsp, file_path, path, identifier)
                        res.extend(resolved)
                    elif import_keyword_found and child.type == 'aliased_import':
                        # Handle "from module import name as alias"
                        if child.child_count > 0:
                            actual_name = child.children[0]
                            if actual_name.type == 'dotted_name' and actual_name.child_count > 0:
                                identifier = actual_name.children[0]
                            else:
                                identifier = actual_name
                            resolved = self.resolve_type(files, lsp, file_path, path, identifier)
                            res.extend(resolved)

        except Exception as e:
            logger.debug(f"Failed to resolve import: {e}")

        return res
```
Comment on lines +160 to +216

**Contributor**

🧩 Analysis chain: scripts executed against FalkorDB/code-graph (outputs elided).

```shell
#!/bin/bash
# Check if wildcard imports exist in test files and need handling
rg -n "from .* import \*" tests/source_files/py_imports/ || echo "No wildcard imports in test data"
# Check the test data to understand what import patterns are being tested
fd -t f "\.py$" tests/source_files/py_imports/ --exec cat {}

# Check for wildcard imports in entire codebase (not just tests)
rg -n "from .* import \*" --type py .
# Look for resolve_type and resolve_method implementations
rg -n "def resolve_type|def resolve_method" api/analyzers/python/analyzer.py

# Get the resolve_type method implementation
sed -n '94,110p' api/analyzers/python/analyzer.py
# Get the resolve_method method implementation
sed -n '104,120p' api/analyzers/python/analyzer.py
# Check what the Entity class looks like to understand what can be resolved
rg -n "class Entity" api/entities.py | head -20

# Find where entities are defined
fd -t f entities.py
# Check the Python analyzer imports to understand the structure
head -20 api/analyzers/python/analyzer.py
# Check if there are tests for function imports
rg -n "function" tests/ --type py | grep -i import | head -20

# View the full test to see what's expected
cat tests/test_py_imports.py
# Check if wildcard import tests exist
rg -n "import \*" tests/ --type py -B2 -A2

# Find where resolve_import is called and in what context
rg -n "resolve_import" api/analyzers/python/analyzer.py
rg -B5 -A5 "resolve_import" api/analyzers/python/analyzer.py
# Search for calls to resolve_import (function invocation)
rg "\.resolve_import\(" api/analyzers/python/analyzer.py
# Also check if resolve_import is used anywhere else
rg "resolve_import" api/analyzers/python/

# Find where add_import is called and how imports are processed
rg -n "add_import" api/analyzers/python/analyzer.py -B10 -A10
rg -n "extract_imports\|import_statement\|import_from" api/analyzers/python/analyzer.py -B3 -A3

# Search for the IMPORTS relationship and the File class definition
rg -n "IMPORTS" . --type py
rg -n "class File" api/ --type py -A20

# Look at source_analyzer.py to see how imports are resolved
sed -n '170,200p' api/analyzers/source_analyzer.py
# Look at add_import and resolved_imports in the File class
rg -n "def add_import" api/entities/file.py -A10
rg -n "resolved_imports" api/entities/file.py -B2 -A2
```

**Handle function imports and wildcard imports in …** The implementation handles … The existing test …

🧰 Tools
🪛 Ruff (0.15.5)
- [error] 160-160: (F405)
- [error] 160-160: (F405)
- [warning] 213-213: Do not catch blind exception: (BLE001)
@@ -0,0 +1,11 @@

```c
#include <stdio.h>
#include "/src/ff.h"

/* Create an empty intset. */
intset* intsetNew(void) {
    intset *is = zmalloc(sizeof(intset));
    is->encoding = intrev32ifbe(INTSET_ENC_INT16);
    is->length = 0;
    return is;
}
```
**Contributor**

🧩 Analysis chain

🏁 Script executed against FalkorDB/code-graph (script and output elided).

**Import `File` and `Entity` explicitly before using them in these annotations.**

Ruff flags F405 on lines 131 and 139 because `File` and `Entity` are available only through the wildcard import on line 4. If CI runs linting, this change will fail the branch even though the methods are stubs.

Minimal fix: …

🧰 Tools
🪛 Ruff (0.15.5)
- [error] 131-131: `File` may be undefined, or defined from star imports (F405)
- [warning] 139-139: Unused method argument: `files` (ARG002)
- [error] 139-139: `File` may be undefined, or defined from star imports (F405)
- [warning] 139-139: Unused method argument: `lsp` (ARG002)
- [warning] 139-139: Unused method argument: `file_path` (ARG002)
- [warning] 139-139: Unused method argument: `path` (ARG002)
- [warning] 139-139: Unused method argument: `import_node` (ARG002)
- [error] 139-139: `Entity` may be undefined, or defined from star imports (F405)