160 changes: 160 additions & 0 deletions CI_OPTIMIZATION.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,160 @@
# CI Pipeline Optimization Analysis (Staging Branch)

## Current Workflows on Staging

The staging branch has 3 workflow files (identical to main):

| Workflow | File | Trigger | ~Duration |
|---|---|---|---|
| **Build** | `nextjs.yml` | All PRs + push to main | **~1 min** |
| **Playwright Tests** | `playwright.yml` | PRs + push to main/staging | **~10 min** (x2 shards) |
| **Release image** | `release-image.yml` | Tags + main push | release-only |

Additionally, **CodeQL** runs on staging pushes.

## Playwright Tests — The Bottleneck

This is the critical path. It runs 2 shards in parallel, each taking ~10 min. Measured from recent staging runs:

| Step | Shard 1 | Shard 2 | % of total |
|---|---|---|---|
| **Seed test data into FalkorDB** | **223s** | **220s** | **37%** |
| **Run Playwright tests** | 264s | 262s | 44% |
| **Install Playwright browsers** | 48s | 51s | 8% |
| Install backend deps (`pip install`) | 28s | 31s | 5% |
| Build frontend | 12s | 12s | 2% |
| Install frontend deps (`npm ci`) | 8s | 8s | 1% |
| Container init + setup | ~15s | ~15s | 3% |

**Total per shard: ~600s (10 min). Total billable: ~20 min.**

## Build Workflow — Wasted Work

The Build workflow (~64s total) installs backend dependencies but does nothing with them:

| Step | Duration |
|---|---|
| Install frontend deps | 7s |
| Build frontend | 14s |
| Lint frontend | <1s |
| **Install backend deps (`pip install`)** | **35s** |

The backend install accounts for **55% of the Build workflow** and serves no purpose.

---

## Optimization Recommendations

### 1. Cache or pre-seed FalkorDB test data (saves **~3.5 min/shard = ~7 min total**)

`seed_test_data.py` clones 2 GitHub repos (GraphRAG-SDK, Flask) and runs full source analysis every run. This is the single biggest time sink at **37% of Playwright runtime**.

**Options:**
- **Best**: Export the seeded graph as an RDB dump, commit it as a test fixture, and restore with `redis-cli`. Eliminates the 220s step entirely.
- **Good**: Cache the cloned repos + analysis output with `actions/cache` keyed on the seed script hash + repo commit SHAs.
- **Minimum**: Cache just the git clones to skip network time.
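
For the RDB-fixture option, a minimal sketch of the restore step, assuming the FalkorDB container is named `falkordb`, mounts `./falkordb-data` as its data directory, and that the dump is committed as `tests/fixtures/seed.rdb` (all three names are illustrative, not the actual workflow values):

```yaml
# Illustrative only: container name, data path, and fixture path are assumptions.
- name: Restore pre-seeded FalkorDB data
  run: |
    # Stop the container, drop the fixture in as its dump file, restart.
    docker stop falkordb
    cp tests/fixtures/seed.rdb ./falkordb-data/dump.rdb
    docker start falkordb
    # Wait until the server has loaded the dump before tests run.
    timeout 30 bash -c 'until redis-cli -p 6379 ping; do sleep 1; done'
```

The fixture would need regenerating whenever `seed_test_data.py` or the seeded repos change, so keying a checksum of the seed script into the fixture filename is worth considering.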

### 2. Cache Playwright browsers (saves **~50s/shard = ~1.5 min total**)

Browsers are installed from scratch every run (`npx playwright install --with-deps`). Add:

```yaml
- name: Cache Playwright browsers
id: playwright-cache
uses: actions/cache@v4
with:
path: ~/.cache/ms-playwright
key: playwright-${{ runner.os }}-${{ hashFiles('package-lock.json') }}

- name: Install Playwright Browsers
if: steps.playwright-cache.outputs.cache-hit != 'true'
run: npx playwright install --with-deps chromium

- name: Install Playwright system deps
if: steps.playwright-cache.outputs.cache-hit == 'true'
run: npx playwright install-deps chromium
```

### 3. Switch `pip install` to `uv` (saves **~15-20s/shard**)

Both workflows install backend dependencies with plain `pip install`; `uv sync` is typically 3-5x faster:

```yaml
- name: Install uv
uses: astral-sh/setup-uv@v5
with:
version: "latest"

- name: Install dependencies
run: uv sync
```

### 4. Remove unused backend install from Build workflow (saves **~35s**)

`nextjs.yml` installs backend deps but runs no backend tests or lint. Either:
- **Remove** the `Setup Python` and `Install backend dependencies` steps entirely
- **Or** add backend unit tests / pylint to justify the install

### 5. Add concurrency groups (saves **queued minutes**)

The Build workflow has no concurrency group. Rapid pushes queue redundant runs:

```yaml
concurrency:
group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}
cancel-in-progress: true
```

The Playwright workflow also lacks a concurrency group.

### 6. Add npm cache (saves **~3-5s/shard**)

Neither workflow caches npm. Add to `setup-node`:

```yaml
- uses: actions/setup-node@v4
with:
node-version: 24
cache: 'npm'
cache-dependency-path: |
package-lock.json
app/package-lock.json
```

### 7. Docker build caching for releases (saves **~2-5 min** on releases)

No layer caching on the Docker build. Add:

```yaml
- uses: docker/build-push-action@v5
with:
context: .
file: ./Dockerfile
push: true
tags: ${{ env.TAGS }}
cache-from: type=gha
cache-to: type=gha,mode=max
```

### 8. Deduplicate npm installs in Playwright workflow

The Playwright workflow runs `npm ci` twice — once for frontend (`./app`) and once for root (Playwright). These could be consolidated or at least cached.
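
One possible consolidation, assuming the two lockfiles involved are the root `package-lock.json` and `app/package-lock.json` (a sketch under those assumptions, not the exact workflow):

```yaml
# One setup-node step caching both lockfiles covers both installs.
- uses: actions/setup-node@v4
  with:
    node-version: 24
    cache: 'npm'
    cache-dependency-path: |
      package-lock.json
      app/package-lock.json

- name: Install deps (root + app)
  run: |
    npm ci
    npm ci --prefix app
```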

---

## Summary

| # | Optimization | Time saved | Effort |
|---|---|---|---|
| 1 | Cache/pre-seed FalkorDB data | **~7 min** | Medium |
| 2 | Cache Playwright browsers | **~1.5 min** | Low |
| 3 | Switch to `uv` from `pip` | **~40s** | Low |
| 4 | Remove unused backend install from Build | **~35s** | Trivial |
| 5 | Add concurrency groups | Variable | Trivial |
| 6 | Add npm cache | ~10s | Trivial |
| 7 | Docker layer caching | ~2-5 min (releases) | Low |
| 8 | Deduplicate npm installs | ~5s | Low |

**Total potential savings: ~9-10 min per CI run**, bringing Playwright from ~10 min/shard down to ~4-5 min/shard (dominated by the actual test execution).

The single biggest win is **pre-seeding FalkorDB data** — it alone accounts for 37% of the Playwright workflow runtime.
29 changes: 29 additions & 0 deletions api/analyzers/analyzer.py
@@ -149,3 +149,32 @@ def resolve_symbol(self, files: dict[Path, File], lsp: SyncLanguageServer, file_

pass

@abstractmethod
def add_file_imports(self, file: File) -> None:
"""
Add import statements to the file.

Args:
file (File): The file to add imports to.
"""

pass

@abstractmethod
def resolve_import(self, files: dict[Path, File], lsp: SyncLanguageServer, file_path: Path, path: Path, import_node: Node) -> list[Entity]:
"""
Resolve an import statement to entities.

Args:
files (dict[Path, File]): All files in the project.
lsp (SyncLanguageServer): The language server.
file_path (Path): The path to the file containing the import.
path (Path): The path to the project root.
import_node (Node): The import statement node.

Returns:
list[Entity]: List of resolved entities.
"""

pass

16 changes: 16 additions & 0 deletions api/analyzers/java/analyzer.py
@@ -127,3 +127,19 @@ def resolve_symbol(self, files: dict[Path, File], lsp: SyncLanguageServer, file_
return self.resolve_method(files, lsp, file_path, path, symbol)
else:
raise ValueError(f"Unknown key {key}")

def add_file_imports(self, file: File) -> None:
"""
Extract and add import statements from the file.
Java imports are not yet implemented.
"""
# TODO: Implement Java import tracking
pass

def resolve_import(self, files: dict[Path, File], lsp: SyncLanguageServer, file_path: Path, path: Path, import_node: Node) -> list[Entity]:
"""
Resolve an import statement to the entities it imports.
Java imports are not yet implemented.
"""
# TODO: Implement Java import resolution
return []
Comment on lines +131 to +145
⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

```bash
#!/bin/bash
# Verify whether File/Entity are re-exported or only arrive via the wildcard import.
sed -n '1,200p' api/entities/__init__.py
rg -n -C2 '^class (File|Entity)\b' api/entities
```

Repository: FalkorDB/code-graph


Import `File` and `Entity` explicitly before using them in these annotations.

Ruff flags F405 on lines 131 and 139 because `File` and `Entity` are available only through the wildcard import on line 4. If CI runs linting, this change will fail the branch even though the methods are stubs.

Minimal fix:

```diff
 from ...entities import *
+from ...entities.entity import Entity
+from ...entities.file import File
```
🧰 Tools
🪛 Ruff (0.15.5)

[error] 131-131: File may be undefined, or defined from star imports

(F405)


[warning] 139-139: Unused method argument: files

(ARG002)


[error] 139-139: File may be undefined, or defined from star imports

(F405)


[warning] 139-139: Unused method argument: lsp

(ARG002)


[warning] 139-139: Unused method argument: file_path

(ARG002)


[warning] 139-139: Unused method argument: path

(ARG002)


[warning] 139-139: Unused method argument: import_node

(ARG002)


[error] 139-139: Entity may be undefined, or defined from star imports

(F405)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@api/analyzers/java/analyzer.py` around lines 131 - 145, The annotations in
add_file_imports and resolve_import reference File and Entity but those names
are only available via a wildcard import; add explicit imports for File and
Entity at the top of the module (alongside existing imports like Node,
SyncLanguageServer, Path) so the type annotations no longer rely on the
wildcard; update the module import list to include File and Entity (used by
add_file_imports and resolve_import) and run linters to confirm F405 is
resolved.

92 changes: 92 additions & 0 deletions api/analyzers/python/analyzer.py
@@ -122,3 +122,95 @@ def resolve_symbol(self, files: dict[Path, File], lsp: SyncLanguageServer, file_
return self.resolve_method(files, lsp, file_path, path, symbol)
else:
raise ValueError(f"Unknown key {key}")

def add_file_imports(self, file: File) -> None:
"""
Extract and add import statements from the file.

Supports:
- import module
- import module as alias
- from module import name
- from module import name1, name2
- from module import name as alias
"""
try:
import warnings
with warnings.catch_warnings():
warnings.simplefilter("ignore")
# Query for both import types
import_query = self.language.query("""
(import_statement) @import
(import_from_statement) @import_from
""")

captures = import_query.captures(file.tree.root_node)

# Add all import statement nodes to the file
if 'import' in captures:
for import_node in captures['import']:
file.add_import(import_node)

if 'import_from' in captures:
for import_node in captures['import_from']:
file.add_import(import_node)
except Exception as e:
logger.debug(f"Failed to extract imports from {file.path}: {e}")

def resolve_import(self, files: dict[Path, File], lsp: SyncLanguageServer, file_path: Path, path: Path, import_node: Node) -> list[Entity]:
"""
Resolve an import statement to the entities it imports.
"""
res = []

try:
if import_node.type == 'import_statement':
# Handle "import module" or "import module as alias"
# Find all dotted_name and aliased_import nodes
for child in import_node.children:
if child.type == 'dotted_name':
# Try to resolve the module/name
identifier = child.children[0] if child.child_count > 0 else child
resolved = self.resolve_type(files, lsp, file_path, path, identifier)
res.extend(resolved)
elif child.type == 'aliased_import':
# Get the actual name from aliased import (before 'as')
if child.child_count > 0:
actual_name = child.children[0]
if actual_name.type == 'dotted_name' and actual_name.child_count > 0:
identifier = actual_name.children[0]
else:
identifier = actual_name
resolved = self.resolve_type(files, lsp, file_path, path, identifier)
res.extend(resolved)

elif import_node.type == 'import_from_statement':
# Handle "from module import name1, name2"
# Find the 'import' keyword to know where imported names start
import_keyword_found = False
for child in import_node.children:
if child.type == 'import':
import_keyword_found = True
continue

# After 'import' keyword, dotted_name nodes are the imported names
if import_keyword_found and child.type == 'dotted_name':
# Try to resolve the imported name
identifier = child.children[0] if child.child_count > 0 else child
resolved = self.resolve_type(files, lsp, file_path, path, identifier)
res.extend(resolved)
elif import_keyword_found and child.type == 'aliased_import':
# Handle "from module import name as alias"
if child.child_count > 0:
actual_name = child.children[0]
if actual_name.type == 'dotted_name' and actual_name.child_count > 0:
identifier = actual_name.children[0]
else:
identifier = actual_name
resolved = self.resolve_type(files, lsp, file_path, path, identifier)
res.extend(resolved)

except Exception as e:
logger.debug(f"Failed to resolve import: {e}")

return res
Comment on lines +160 to +216
⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Scripts executed:

```bash
#!/bin/bash
# Check if wildcard imports exist in test files and need handling
rg -n "from .* import \*" tests/source_files/py_imports/ || echo "No wildcard imports in test data"

# Check the test data to understand what import patterns are being tested
fd -t f "\.py$" tests/source_files/py_imports/ --exec cat {}

# Check for wildcard imports in the entire codebase (not just tests)
rg -n "from .* import \*" --type py .

# Look for the resolve_type and resolve_method implementations
rg -n "def resolve_type|def resolve_method" api/analyzers/python/analyzer.py
sed -n '94,120p' api/analyzers/python/analyzer.py

# Find where entities are defined and check the analyzer's imports
fd -t f entities.py
head -20 api/analyzers/python/analyzer.py

# Check the tests covering function and wildcard imports
rg -n "function" tests/ --type py | grep -i import | head -20
cat tests/test_py_imports.py
rg -n "import \*" tests/ --type py -B2 -A2

# Find where resolve_import and add_import are called
rg -n "resolve_import" api/analyzers/python/
rg -n "add_import" api/analyzers/python/analyzer.py -B10 -A10

# Locate the IMPORTS relationship and the File class
rg -n "IMPORTS" . --type py
rg -n "class File" api/ --type py -A20

# See how source_analyzer.py resolves imports and how File tracks them
sed -n '170,200p' api/analyzers/source_analyzer.py
rg -n "def add_import" api/entities/file.py -A10
rg -n "resolved_imports" api/entities/file.py -B2 -A2
```

Repository: FalkorDB/code-graph


Handle function imports and wildcard imports in `resolve_import()`.

The implementation handles `import module`, `import module as alias`, `from module import name`, and `from module import name as alias`. However:

1. **Wildcard imports are not handled**: Patterns like `from module import *` are not processed. These exist throughout the codebase (e.g., `from .info import *`, `from .entities import *`).

2. **Function imports are not resolved**: The code only calls `resolve_type()`, which filters for `class_definition` and returns only classes. For function imports like `from module import function_a`, `resolve_method()` should also be tried to resolve function entities.

The existing test `test_py_imports.py` expects both class and function imports to be tracked, indicating this is a required use case.
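
One way the missing wildcard case could be detected (an illustrative sketch using a minimal stand-in for the tree-sitter `Node` type; the real analyzer would inspect actual AST nodes, and the `wildcard_import` node name is what the tree-sitter-python grammar is believed to use):

```python
from dataclasses import dataclass, field

@dataclass
class StubNode:
    """Minimal stand-in for a tree-sitter Node: a type name plus children."""
    type: str
    children: list = field(default_factory=list)

def is_wildcard_import(import_node: StubNode) -> bool:
    """True for `from module import *` style statements."""
    if import_node.type != "import_from_statement":
        return False
    # tree-sitter-python represents the `*` as a wildcard_import child node.
    return any(child.type == "wildcard_import" for child in import_node.children)

# `from x import *` vs. `from x import name`
star = StubNode("import_from_statement",
                [StubNode("from"), StubNode("dotted_name"),
                 StubNode("import"), StubNode("wildcard_import")])
named = StubNode("import_from_statement",
                 [StubNode("from"), StubNode("dotted_name"),
                  StubNode("import"), StubNode("dotted_name")])
```

On a wildcard hit, `resolve_import()` could then fall back to resolving the whole module and extending the result with every exported entity, alongside trying `resolve_method()` for the function case.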

🧰 Tools
🪛 Ruff (0.15.5)

[error] 160-160: File may be undefined, or defined from star imports

(F405)


[error] 160-160: Entity may be undefined, or defined from star imports

(F405)


[warning] 213-213: Do not catch blind exception: Exception

(BLE001)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@api/analyzers/python/analyzer.py` around lines 160 - 216, The resolve_import
function currently only resolves class imports by calling resolve_type and
doesn't handle wildcard imports or function imports; update resolve_import to
(1) detect wildcard imports in import_from_statement (e.g., the '*' or
wildcard_import child) and, when found, resolve the entire module by calling the
module-resolution path used elsewhere (use the same file/module lookup used by
resolve_type) and add all exported entities to res; and (2) when resolving
individual imported names (both in import_statement/aliased_import and
import_from_statement/aliased_import/dotted_name cases) try both
resolve_type(...) and resolve_method(...) (or the function-resolution helper
used in this analyzer) and extend res with results from both so functions and
classes are captured. Ensure these changes are made inside resolve_import so
existing control flow (import_statement vs import_from_statement) remains intact
and exceptions are still caught.

13 changes: 13 additions & 0 deletions api/analyzers/source_analyzer.py
@@ -114,6 +114,10 @@ def first_pass(self, path: Path, files: list[Path], ignore: list[str], graph: Gr
# Walk through the AST
graph.add_file(file)
self.create_hierarchy(file, analyzer, graph)

# Extract import statements
if not analyzer.is_dependency(str(file_path)):
analyzer.add_file_imports(file)

def second_pass(self, graph: Graph, files: list[Path], path: Path) -> None:
"""
@@ -148,6 +152,8 @@ def second_pass(self, graph: Graph, files: list[Path], path: Path) -> None:
for i, file_path in enumerate(files):
file = self.files[file_path]
logging.info(f'Processing file ({i + 1}/{files_len}): {file_path}')

# Resolve entity symbols
for _, entity in file.entities.items():
entity.resolved_symbol(lambda key, symbol, fp=file_path: analyzers[fp.suffix].resolve_symbol(self.files, lsps[fp.suffix], fp, path, key, symbol))
for key, symbols in entity.symbols.items():
@@ -167,6 +173,13 @@ def second_pass(self, graph: Graph, files: list[Path], path: Path) -> None:
graph.connect_entities("RETURNS", entity.id, resolved_symbol.id)
elif key == "parameters":
graph.connect_entities("PARAMETERS", entity.id, resolved_symbol.id)

# Resolve file imports
for import_node in file.imports:
resolved_entities = analyzers[file_path.suffix].resolve_import(self.files, lsps[file_path.suffix], file_path, path, import_node)
for resolved_entity in resolved_entities:
file.add_resolved_import(resolved_entity)
graph.connect_entities("IMPORTS", file.id, resolved_entity.id)

def analyze_files(self, files: list[Path], path: Path, graph: Graph) -> None:
self.first_pass(path, files, [], graph)
20 changes: 20 additions & 0 deletions api/entities/file.py
@@ -21,10 +21,30 @@ def __init__(self, path: Path, tree: Tree) -> None:
self.path = path
self.tree = tree
self.entities: dict[Node, Entity] = {}
self.imports: list[Node] = []
self.resolved_imports: set[Entity] = set()

def add_entity(self, entity: Entity):
entity.parent = self
self.entities[entity.node] = entity

def add_import(self, import_node: Node):
"""
Add an import statement node to track.

Args:
import_node (Node): The import statement node.
"""
self.imports.append(import_node)

def add_resolved_import(self, resolved_entity: Entity):
"""
Add a resolved import entity.

Args:
resolved_entity (Entity): The resolved entity that is imported.
"""
self.resolved_imports.add(resolved_entity)

def __str__(self) -> str:
return f"path: {self.path}"
11 changes: 11 additions & 0 deletions test-project/a.c
@@ -0,0 +1,11 @@
#include <stdio.h>
#include "/src/ff.h"


/* Create an empty intset. */
intset* intsetNew(void) {
intset *is = zmalloc(sizeof(intset));
is->encoding = intrev32ifbe(INTSET_ENC_INT16);
is->length = 0;
return is;
}