The Python Analyzer module is a language-specific analyzer component responsible for parsing and extracting code structure information from Python files. It leverages Python's built-in AST (Abstract Syntax Tree) module to analyze Python source code, identify classes and functions, and extract relationships between code components.
Key Purpose: Convert raw Python source code into a structured representation of code components (classes, functions) with their dependencies and call relationships, enabling downstream documentation generation and code analysis.
The Python Analyzer is one of multiple language analyzers in the CodeWiki documentation system:
[Repository]
↓
[RepoAnalyzer] → [Language-Specific Analyzers]
├── Python Analyzer ← You are here
├── JavaScript Analyzer
├── TypeScript Analyzer
├── Java Analyzer
├── C/C++/C# Analyzers
└── PHP/Kotlin Analyzers
↓
[CallGraphAnalyzer] (builds call graphs from extracted nodes)
↓
[DocumentationGenerator] (generates markdown/HTML documentation)
The Python Analyzer follows this component flow:
- Input: Python source file with repository context
- Parsing: Convert source code to AST using ast.parse()
- Visitor Pattern: Traverse AST with specialized handlers
- Output: Node objects for classes/functions and Relationship objects for dependencies
Input File → AST Parser → Tree Visitor → [ClassDef, FunctionDef, Call handlers]
↓ ↓
Node Objects Relationships
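The flow above can be illustrated with a minimal visitor built on the standard library. This is only a sketch: the class name OutlineVisitor and its depth counter are hypothetical and are not taken from the analyzer's source.

```python
import ast


class OutlineVisitor(ast.NodeVisitor):
    """Hypothetical visitor: records module-level class and function names."""

    def __init__(self):
        self.classes = []
        self.functions = []
        self._depth = 0  # 0 means we are not inside any class or function

    def visit_ClassDef(self, node):
        if self._depth == 0:
            self.classes.append(node.name)
        self._descend(node)

    def visit_FunctionDef(self, node):
        if self._depth == 0:
            self.functions.append(node.name)
        self._descend(node)

    # Async functions are handled the same way as regular ones.
    visit_AsyncFunctionDef = visit_FunctionDef

    def _descend(self, node):
        self._depth += 1
        self.generic_visit(node)  # dispatches to visit_* for child nodes
        self._depth -= 1


SOURCE = '''
class User:
    def validate(self):
        return True

async def fetch():
    pass

def main():
    def inner():
        pass
'''

visitor = OutlineVisitor()
visitor.visit(ast.parse(SOURCE))
print(visitor.classes, visitor.functions)
```

Methods and nested functions are still visited, but only module-level definitions are recorded, which mirrors the top-level-only behaviour described in this document.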
Type: AST Visitor Pattern Implementation
Inherits from: ast.NodeVisitor
Responsibility: Extract code structure and relationships from Python AST
| Attribute | Type | Purpose |
|---|---|---|
| file_path | str | Path to the Python file being analyzed |
| content | str | Raw source code content |
| repo_path | Optional[str] | Repository root for relative path calculation |
| nodes | List[Node] | Extracted classes and functions |
| call_relationships | List[CallRelationship] | Function call dependencies |
| current_class_name | Optional[str] | Tracks nested class context |
| current_function_name | Optional[str] | Tracks nested function context |
| top_level_nodes | dict | Map of top-level components by name |
Initializes the analyzer with Python file context.
Parameters:
- file_path: Path to Python file
- content: Raw file content
- repo_path: Optional repository root path
Side Effects: Initializes all tracking structures
Main analysis entry point - parses Python code and visits AST nodes.
Process:
- Parses content using ast.parse()
- Traverses AST using visitor pattern
- Collects nodes and relationships
- Handles syntax errors gracefully
Error Handling:
- SyntaxWarnings suppressed (for example, invalid escape sequences in regex patterns within the analyzed code)
- SyntaxErrors logged with warnings
- General exceptions logged with traceback
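A minimal sketch of this error-handling strategy, using only the standard library. The helper name parse_quietly and the log messages are assumptions made for illustration; the analyzer's own implementation may differ.

```python
import ast
import logging
import warnings
from typing import Optional

logger = logging.getLogger(__name__)


def parse_quietly(content: str, file_path: str = "<unknown>") -> Optional[ast.Module]:
    """Hypothetical helper: return the parsed module, or None if parsing fails."""
    try:
        with warnings.catch_warnings():
            # Invalid escapes such as "\d" in a non-raw string are reported as
            # DeprecationWarning or SyntaxWarning depending on the Python version.
            warnings.simplefilter("ignore", SyntaxWarning)
            warnings.simplefilter("ignore", DeprecationWarning)
            return ast.parse(content, filename=file_path)
    except SyntaxError as exc:
        logger.warning("Could not parse %s: %s", file_path, exc)
    except Exception:
        # logger.exception records the traceback along with the message.
        logger.exception("Unexpected error while parsing %s", file_path)
    return None


broken = parse_quietly("def broken(", "broken.py")
valid = parse_quietly('pattern = "^\\d+"', "regex.py")
```

Returning None lets the caller fall back to empty node and relationship lists instead of propagating the exception.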
# Usage
analyzer = PythonASTAnalyzer(file_path, content, repo_path)
analyzer.analyze()
nodes = analyzer.nodes
relationships = analyzer.call_relationships
Processes class definitions.
Extracts:
- Class name and inheritance (base classes)
- Docstring
- Source code range
- Component ID in format: relative_path::ClassName
Creates:
- Node object with type "class"
- CallRelationship for each inherited base class
Special Behavior:
- Tracks current_class_name for nested context
- Continues traversal into class body for methods
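The class-level fields listed above can all be read from an ast.ClassDef node. The sketch below assumes Python 3.9+ (for ast.unparse); the helper name base_class_names is hypothetical.

```python
import ast
from typing import List


def base_class_names(class_node: ast.ClassDef) -> List[str]:
    """Hypothetical helper: return base classes as source text."""
    # Keyword arguments such as metaclass=... live in class_node.keywords,
    # so they are not reported as base classes here.
    return [ast.unparse(base) for base in class_node.bases]


source = "class User(Base, mixins.Loggable, metaclass=Meta):\n    '''A user.'''\n"
class_node = ast.parse(source).body[0]

bases = base_class_names(class_node)
doc = ast.get_docstring(class_node)
span = (class_node.lineno, class_node.end_lineno)  # 1-indexed, inclusive
```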
Processes function definitions.
Extracts:
- Function name and parameters
- Docstring
- Source code range
- Component ID in format: relative_path::function_name (only top-level)
Filtering:
- Skips test functions (starting with _test_)
- Only captures top-level functions (not methods)
Special Behavior:
- Tracks current_function_name for nested context
- Continues traversal into function body
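As an illustration of the extraction step, the following sketch collects a function's name, parameters, docstring, and line range from its AST node. The helper describe_function and the choice to prefix variadic parameters with * and ** are assumptions; the document does not specify how the analyzer formats parameters.

```python
import ast


def describe_function(node):
    """Hypothetical helper: summarise a FunctionDef or AsyncFunctionDef node."""
    args = node.args
    params = [a.arg for a in args.posonlyargs + args.args + args.kwonlyargs]
    if args.vararg:
        params.append("*" + args.vararg.arg)
    if args.kwarg:
        params.append("**" + args.kwarg.arg)
    return {
        "name": node.name,
        "parameters": params,
        "docstring": ast.get_docstring(node) or "",
        "start_line": node.lineno,    # 1-indexed
        "end_line": node.end_lineno,  # inclusive
    }


source = 'async def fetch(url, *, timeout=10, **options):\n    """Fetch a URL."""\n    return url\n'
info = describe_function(ast.parse(source).body[0])
```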
Processes function call nodes.
Relationship Extraction:
- Identifies caller (current class or function)
- Identifies callee (called function name)
- Records call line number
- Marks relationship as resolved/unresolved
Scope Tracking:
- Only records calls within class definitions or top-level functions
- Ignores calls within nested structures
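Identifying the callee comes down to inspecting the func attribute of an ast.Call node. The sketch below shows one plausible way to do this; the helper callee_name is hypothetical, and reducing obj.method() to "method" is an assumption rather than documented analyzer behaviour.

```python
import ast
from typing import Optional


def callee_name(call: ast.Call) -> Optional[str]:
    """Hypothetical helper: best-effort name of the function being called."""
    func = call.func
    if isinstance(func, ast.Name):       # helper_func()
        return func.id
    if isinstance(func, ast.Attribute):  # self.repo.save() -> "save"
        return func.attr
    return None                          # e.g. handlers[0](), no simple name


source = "helper_func()\nself.repo.save(user)\nhandlers[0]()\n"
calls = [n for n in ast.walk(ast.parse(source)) if isinstance(n, ast.Call)]
# Sort by line number so the result does not depend on traversal order.
names = sorted((c.lineno, callee_name(c)) for c in calls)
```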
Example:
# If analyzing:
class MyClass:
    def method(self):
        helper_func()   # Creates CallRelationship
        print("test")   # Ignored (builtin)
Source: dependency_analyzer_models
Represents a code component (class or function).
Key Fields for Python:
Node(
    id="relative_path::ComponentName",     # Unique identifier
    name="ComponentName",                  # Component name
    component_type="class",                # Type discriminator: "class" or "function"
    file_path="/absolute/path/file.py",    # Absolute path
    relative_path="src/module/file.py",    # Repo-relative path
    source_code="...",                     # Full source text
    start_line=10,                         # 1-indexed line number
    end_line=25,                           # Inclusive end line
    has_docstring=True,                    # Docstring presence
    docstring="Documentation text",        # Extracted docstring
    parameters=["arg1", "arg2"],           # For functions
    base_classes=["BaseClass", "Mixin"],   # For classes
    display_name="class MyClass"           # Human-readable name
)
Source: dependency_analyzer_models
Represents a dependency between two code components.
Structure:
CallRelationship(
    caller="relative_path::ClassName",       # Caller ID
    callee="relative_path::function_name",   # Callee ID
    call_line=45,                            # Where call occurs
    is_resolved=True                         # Whether callee was found among the file's top-level components
)
Resolution States:
- is_resolved=True: Callee found in same file (top-level)
- is_resolved=False: External call (different file/module)
Phase 1: Input & Parsing
- Read Python file content
- Parse to AST using ast.parse()
- Validate syntax
Phase 2: Tree Traversal
- Visit AST nodes using visitor pattern
- For each node type:
  - ClassDef: Extract class name, bases, docstring → Create Node object
  - FunctionDef/AsyncFunctionDef: Extract function details → Create Node object
  - Call: Extract call relationships → Create CallRelationship object
Phase 3: Output
- Collect all Node objects (classes and functions)
- Collect all CallRelationship objects
- Return tuple of (nodes, relationships)
- Initialization
  - Store file context (path, content, repo root)
  - Initialize empty collections for nodes and relationships
- Parsing
  - Parse Python content using ast.parse()
  - Handle SyntaxErrors gracefully with logging
- Tree Traversal
  - Visit AST root node
  - Recursively visit all child nodes using visitor pattern
  - Dispatch to specialized visit methods based on node type
- Class Processing
  - Extract class name, base classes, docstring
  - Create Node object with type="class"
  - Create CallRelationship for inheritance (if base exists)
  - Set context for nested method analysis
- Function Processing
  - Extract function name, parameters, docstring
  - Create Node object only if top-level (not nested in class)
  - Apply filtering rules (skip test functions)
  - Set context for analyzing function body
- Call Tracking
  - Identify function calls within current scope
  - Filter out Python built-ins
  - Create CallRelationship with resolved/unresolved status
  - Continue traversal to nested calls
- Result Compilation
  - Return collected nodes and call_relationships
  - Ready for downstream analysis
The Python analyzer maintains a list of Python built-in functions and classes so that calls to them do not create spurious dependencies:
PYTHON_BUILTINS = {
    "print", "len", "str", "int", "float", "bool",
    "list", "dict", "tuple", "set", "range", "enumerate",
    "zip", "isinstance", "hasattr", "getattr", "setattr",
    "open", "super", "__import__", "type", "object",
    # ... (40+ built-ins)
    "max", "min", "sum", "abs", "round", "sorted"
}
Impact: Only user-defined function calls create relationships
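A short sketch of how such a set can be applied when scanning calls. The function user_defined_calls is hypothetical and the set below is only an excerpt of the list above; the closing lines show an alternative way to build the set from the running interpreter, which is an assumption and not the analyzer's documented approach.

```python
import ast
import builtins

# Excerpt of the set shown above, for illustration only.
PYTHON_BUILTINS = {"print", "len", "str", "isinstance", "sorted"}


def user_defined_calls(source, ignored=PYTHON_BUILTINS):
    """Return (line, name) for direct calls whose name is not in `ignored`."""
    found = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            if node.func.id not in ignored:
                found.append((node.lineno, node.func.id))
    return sorted(found)


source = "print(len(items))\nresult = helper_func(items)\nreport = sorted(result)\n"
calls = user_defined_calls(source)

# Alternative (assumption): derive public built-in names from the interpreter
# instead of hard-coding them. Names starting with "_" are skipped here.
ALL_BUILTINS = {name for name in dir(builtins) if not name.startswith("_")}
```

A hard-coded set keeps results stable across Python versions, while the derived set tracks whatever interpreter runs the analysis.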
Converts absolute paths to repository-relative paths for consistent component IDs:
def _get_relative_path(self) -> str:
    if self.repo_path:
        return os.path.relpath(self.file_path, self.repo_path)
    return str(self.file_path)
Example:
- Input: /home/user/project/src/main/app.py
- Repo root: /home/user/project
- Output: src/main/app.py
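The example can be reproduced with a standalone function. This sketch uses posixpath rather than os.path so that the result is the same on every operating system; the function name relative_path is hypothetical.

```python
import posixpath


def relative_path(file_path, repo_path=None):
    """Repo-relative path when a repo root is known, else the path unchanged."""
    if repo_path:
        return posixpath.relpath(file_path, repo_path)
    return str(file_path)


inside = relative_path("/home/user/project/src/main/app.py", "/home/user/project")
no_repo = relative_path("/home/user/project/src/main/app.py")
# A file outside the repo root still gets a path, but one that climbs upward.
outside = relative_path("/tmp/x.py", "/home/user/project")
```

The last case is worth noting: relpath does not fail for files outside the repository, so component IDs for such files would begin with "../".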
Standardized format for unique identification:
relative_path::ComponentName
relative_path::ClassName.MethodName # For class methods (future)
Examples:
- src/api/handlers.py::process_request
- src/models/user.py::User
- src/models/user.py::User.validate
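Building and splitting IDs in this format is straightforward. The two helpers below are hypothetical names introduced for illustration; they are not part of the documented API.

```python
SEPARATOR = "::"


def make_component_id(relative_path, name):
    """Join a repo-relative path and a component name into an ID."""
    return f"{relative_path}{SEPARATOR}{name}"


def split_component_id(component_id):
    """Split an ID back into (relative_path, component_name)."""
    # rpartition splits on the last separator, so the name part is always
    # everything after the final "::".
    path, _, name = component_id.rpartition(SEPARATOR)
    return path, name


user_id = make_component_id("src/models/user.py", "User")
parts = split_component_id("src/models/user.py::User.validate")
```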
Tracks nested scope to distinguish between:
class MyClass:               # Top-level class → Node created
    def method(self):        # Class method → NOT a node
        def helper():        # Nested function → NOT a node
            pass

def top_level_func():        # Top-level function → Node created
    def inner():             # Nested function → NOT a node
        pass
| Error Type | Handler | Result |
|---|---|---|
| SyntaxWarning (escape sequences) | Suppressed | Silent ignore |
| SyntaxError | Logged warning | Return empty results |
| Exception | Logged error + traceback | Return empty results |
Components that feed into PythonASTAnalyzer:
- RepoAnalyzer: Reads Python files and invokes the analyzer
- FileManager: Provides file content and path utilities
- Logger: Logs analysis events for debugging
External Dependencies:
- codewiki.src.be.dependency_analyzer.models.core: Node, CallRelationship
- Standard library: ast, logging, pathlib, os, sys
Analysis Pipeline Flow:
PythonASTAnalyzer
↓ (nodes, relationships)
CallGraphAnalyzer
↓ (call graph)
DependencyGraphBuilder
↓ (dependency structure)
DocumentationGenerator
↓ (analysis results)
Output: Markdown/HTML
Key Consumers:
- dependency_analysis_services: RepoAnalyzer, CallGraphAnalyzer
- dependency_graph_construction: DependencyGraphBuilder
- documentation_generation: DocumentationGenerator
from codewiki.src.be.dependency_analyzer.analyzers.python import (
    PythonASTAnalyzer,
    analyze_python_file
)
# Method 1: Using utility function
file_path = "src/models/user.py"
with open(file_path, 'r') as f:
    content = f.read()
nodes, relationships = analyze_python_file(
    file_path=file_path,
    content=content,
    repo_path="/home/user/project"
)
# nodes: List[Node] - extracted classes and functions
# relationships: List[CallRelationship] - function calls
# Method 2: Using analyzer class directly
analyzer = PythonASTAnalyzer(
    file_path="src/api/handlers.py",
    content=file_content,
    repo_path="/home/user/project"
)
# Analyze the file
analyzer.analyze()
# Access results
for node in analyzer.nodes:
    print(f"Found {node.component_type}: {node.name}")
    print(f" Location: {node.file_path}:{node.start_line}")
    if node.docstring:
        print(f" Docs: {node.docstring[:50]}...")
for rel in analyzer.call_relationships:
    status = "✓" if rel.is_resolved else "?"
    print(f"{status} {rel.caller} → {rel.callee} (line {rel.call_line})")
from pathlib import Path
repo_path = Path("/home/user/project")
all_nodes = []
all_relationships = []
for py_file in repo_path.rglob("*.py"):
    if "venv" in py_file.parts or "__pycache__" in py_file.parts:
        continue
    content = py_file.read_text()
    nodes, rels = analyze_python_file(
        str(py_file),
        content,
        str(repo_path)
    )
    all_nodes.extend(nodes)
    all_relationships.extend(rels)
print(f"Total: {len(all_nodes)} components, {len(all_relationships)} relationships")
The analyzer implements the classic Visitor pattern for AST traversal:
NodeVisitor (base class)
↓
PythonASTAnalyzer (concrete visitor)
├── visit_ClassDef()
├── visit_FunctionDef()
├── visit_AsyncFunctionDef()
├── visit_Call()
└── generic_visit() (default handler)
Benefit: Separates AST structure from processing logic
Maintains stack-like context for nested structures:
self.current_class_name = None # Track class scope
self.current_function_name = None # Track function scope
# When entering:
self.current_class_name = "MyClass" # Set context
self.generic_visit(node) # Visit children
self.current_class_name = None # Restore context
Benefit: Distinguishes between methods and functions
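The snippet above resets the context to None on exit. One variation, offered as an assumption rather than a description of the analyzer, is to restore the previous value instead, so that members following a nested class are still attributed to the outer class. The class ScopeTracker below is hypothetical.

```python
import ast


class ScopeTracker(ast.NodeVisitor):
    """Hypothetical visitor: pairs each function with its enclosing class."""

    def __init__(self):
        self.current_class_name = None
        self.seen = []  # (function name, enclosing class name or None)

    def visit_ClassDef(self, node):
        previous = self.current_class_name
        self.current_class_name = node.name
        try:
            self.generic_visit(node)
        finally:
            # Restore the outer context rather than resetting to None.
            self.current_class_name = previous

    def visit_FunctionDef(self, node):
        self.seen.append((node.name, self.current_class_name))
        self.generic_visit(node)


source = '''
class Outer:
    class Inner:
        def a(self): pass
    def b(self): pass

def c(): pass
'''

tracker = ScopeTracker()
tracker.visit(ast.parse(source))
```

With a plain reset to None, method b would be recorded without a class because it follows the nested Inner class; saving and restoring the previous value avoids that.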
Phase 1: Extract relationships with is_resolved=?
↓
Phase 2: Match against top_level_nodes dictionary
↓
Result: is_resolved=True/False flags set
Benefit: Handles forward references gracefully
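The two phases can be sketched as follows. The Call dataclass is a hypothetical stand-in for CallRelationship, and rewriting the callee into a full component ID during resolution is an assumption based on the ID format shown earlier.

```python
from dataclasses import dataclass


@dataclass
class Call:  # hypothetical stand-in for CallRelationship
    caller: str
    callee: str
    call_line: int
    is_resolved: bool = False


def resolve(calls, top_level_nodes, relative_path):
    """Phase 2: mark calls whose callee is a known top-level component."""
    for call in calls:
        if call.callee in top_level_nodes:
            call.callee = f"{relative_path}::{call.callee}"
            call.is_resolved = True
    return calls


# Phase 1 output: main() calls helper() before helper is defined in the file,
# so nothing can be resolved until the whole file has been visited.
calls = [
    Call(caller="app.py::main", callee="helper", call_line=3),
    Call(caller="app.py::main", callee="fetch_remote", call_line=4),
]
top_level_nodes = {"main": object(), "helper": object()}

resolved = resolve(calls, top_level_nodes, "app.py")
```

Deferring resolution until traversal is complete is what makes the order of definitions in the file irrelevant.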
| Feature | Support | Notes |
|---|---|---|
| Classes | ✅ | Including inheritance |
| Functions | ✅ | Top-level only |
| Async Functions | ✅ | Same as regular functions |
| Methods | Partial | Extracted but not indexed as separate nodes |
| Decorators | ⏳ | Not currently extracted |
| Type Hints | ⏳ | Not currently used |
| Imports | ⏳ | Not analyzed for dependency extraction |
| Docstrings | ✅ | Extracted and stored |
| Parameters | ✅ | Extracted for functions |
- No Import Analysis: Cross-module dependencies not tracked
- Limited Method Analysis: Methods attached to classes but not independent nodes
- No Decorator Extraction: Decorator information lost
- No Type Hint Parsing: Type annotations ignored
- Syntax-Only: No semantic analysis or type checking
Uses standard Python logging:
logger = logging.getLogger(__name__)
# Log levels:
logger.debug() # Analysis completion info
logger.warning() # SyntaxErrors during parsing
logger.error() # Unexpected exceptions
Configure logging via dependency_analyzer_utils:
from codewiki.src.be.dependency_analyzer.utils.logging_config import ColoredFormatter
| Operation | Complexity | Notes |
|---|---|---|
| File parsing | O(n) | n = lines of code |
| AST traversal | O(n) | Linear tree walk |
| Call resolution | O(k) | k = number of calls |
| Total | O(n) | Single pass analysis |
Typical Performance:
- Small file (100 LOC): < 10ms
- Medium file (1000 LOC): 10-50ms
- Large file (10000 LOC): 50-200ms
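Timings like these depend on hardware and Python version, so it can be useful to measure them locally. The sketch below times only ast.parse plus a full tree walk on synthetic input, not the analyzer itself; the function name time_parse_and_walk is hypothetical.

```python
import ast
import time


def time_parse_and_walk(source, repeats=5):
    """Best-of-N wall-clock seconds to parse `source` and walk its AST once."""
    best = float("inf")
    node_count = 0
    for _ in range(repeats):
        start = time.perf_counter()
        tree = ast.parse(source)
        node_count = sum(1 for _node in ast.walk(tree))
        best = min(best, time.perf_counter() - start)
    return best, node_count


# Roughly 1000 lines of synthetic code: 500 two-line functions.
synthetic = "\n".join(f"def f{i}(x):\n    return x + {i}" for i in range(500))
seconds, nodes = time_parse_and_walk(synthetic)
print(f"{seconds * 1000:.1f} ms for {nodes} AST nodes")
```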
# File content has syntax error
file_content = """
def broken_function(
# Missing closing paren
"""
analyzer = PythonASTAnalyzer(path, file_content)
analyzer.analyze()
# Result:
# ⚠️ Warning logged: "Could not parse file.py: invalid syntax"
# nodes = [] (empty)
# call_relationships = [] (empty)
# File being analyzed contains regex patterns
file_content = r"""
regex_pattern = "^\d+" # Warning without suppression
"""
analyzer = PythonASTAnalyzer(path, file_content)
analyzer.analyze()
# Result:
# ✅ Analyzed successfully (warnings suppressed)
# nodes = [any found classes/functions]
analyzer = PythonASTAnalyzer(
    file_path="/absolute/path/file.py",
    content=file_content,
    repo_path=None  # No repo context
)
# Fallback behavior:
# - Uses absolute path for component IDs
# - Still extracts structure normally
# - Less useful for cross-repo linking
- ClassDef Handling
  - Simple classes
  - Inherited classes
  - Multiple inheritance
- Function Extraction
  - Top-level functions
  - Async functions
  - Test function filtering
- Call Detection
  - Simple function calls
  - Method calls
  - Built-in filtering
  - Unresolved calls
- Error Resilience
  - Invalid syntax
  - Missing files
  - Encoding issues
# Minimal mock for testing
mock_node = Node(
    id="test.py::TestFunc",
    name="TestFunc",
    component_type="function",
    file_path="test.py",
    relative_path="test.py",
    source_code="def TestFunc(): pass",
    start_line=1,
    end_line=1,
    has_docstring=False,
    docstring="",
    parameters=[],
    node_type="function",
    base_classes=None,
    class_name=None,
    display_name="function TestFunc",
    component_id="test.py::TestFunc"
)
- dependency_analyzer_models: Core data structures (Node, CallRelationship)
- language_analyzers: Other language analyzers (JS, TypeScript, Java, etc.)
- dependency_analysis_services: RepoAnalyzer that orchestrates language-specific analyzers
- dependency_graph_construction: Builds dependency graphs from analysis results
- documentation_generation: Generates documentation from analyzed code
- Import Tracking
  - Extract import and from...import statements
  - Build cross-module dependency graph
  - Track external library usage
- Type Hint Extraction
  - Parse type annotations
  - Track parameter and return types
  - Enable type-based documentation
- Decorator Support
  - Extract decorator information
  - Track framework-specific markers (Flask routes, FastAPI endpoints, etc.)
  - Integrate with LLM analysis
- Advanced Scope Analysis
  - Independent method nodes
  - Nested function support
  - Lambda expression tracking
- Semantic Analysis
  - Basic type inference
  - Unused code detection
  - Complexity metrics
The Python Analyzer module provides robust, AST-based parsing of Python source files, extracting class and function definitions along with their call relationships. Its integration with the broader CodeWiki analysis pipeline enables:
- Accurate Code Mapping: Extraction of top-level classes and functions with their source ranges and docstrings
- Dependency Tracking: Identification of function call relationships
- Foundation for Documentation: Structured data ready for LLM analysis and documentation generation
- Language Agnostic Integration: Plugs seamlessly into multi-language analysis system
By combining Python's native AST module with the visitor pattern and careful context tracking, the analyzer achieves high precision while maintaining simplicity and maintainability.