Comprehensive pytest-based test suite for the skillkit library, validating all core functionality, integrations, edge cases, and performance characteristics.
pytestpytest --cov=src/skillkit --cov-report=html
# View report: open htmlcov/index.htmlpytest tests/test_parser.py -v
pytest tests/test_models.py -v
pytest tests/test_manager.py -v
pytest tests/test_script_detector.py -v # Script detection (Phase 10)
pytest tests/test_script_executor.py -v # Script execution (Phase 10)test_discovery.py - Skill discovery and filesystem scanning (7 tests passing)
- Validates discovery from multiple sources
- Tests graceful error handling for invalid skills
- Verifies duplicate name handling with warnings
- Tests empty directory handling with INFO logging
test_parser.py - YAML frontmatter parsing (8 tests passing)
- Tests valid skill parsing (basic, with arguments, Unicode)
- Validates error messages for invalid YAML
- Checks required field validation (name, description)
- Parametrized tests for all invalid skill scenarios
test_models.py - Data model validation (5 tests passing)
- Tests SkillMetadata and Skill dataclass instantiation
- Validates lazy content loading pattern
- Verifies content caching behavior (@cached_property)
- Tests optional fields (allowed_tools can be None)
test_processors.py - Content processing strategies (7 tests passing)
- Tests $ARGUMENTS substitution at various positions
- Validates escaping ($$ARGUMENTS → $ARGUMENTS literal)
- Tests size limits (1MB argument size enforcement)
- Tests special characters and empty arguments
test_manager.py - Orchestration layer (6 tests passing)
- Tests end-to-end workflows (discover → list → invoke)
- Validates skill not found error handling
- Tests graceful degradation with mixed valid/invalid skills
- Verifies caching behavior and content load errors
test_async_discovery.py - Async skill discovery functionality
- Tests async file I/O wrappers (_read_skill_file_async)
- Validates async discovery methods (ascan_directory, afind_skill_files)
- Tests SkillManager async discovery (adiscover)
- Verifies async/sync state management and AsyncStateError validation
- Tests concurrent async discovery and event loop responsiveness
- Validates async vs sync discovery equivalence
test_async_invocation.py - Async skill invocation capabilities
- Tests Skill.ainvoke() async method
- Validates SkillManager.ainvoke_skill() async method
- Tests concurrent async invocations (10+ parallel)
- Verifies async/sync state management and error handling
- Tests async invocation performance (minimal overhead <5ms)
- Validates edge cases (long arguments, special characters, Unicode)
- Stress tests with 50 concurrent invocations
test_langchain_async.py - Async LangChain integration
- Tests async LangChain tool creation with ainvoke()
- Validates concurrent tool invocations (10+ parallel)
- Tests dual-mode support (sync and async invocation)
- Verifies closure capture pattern for async tools
- Tests state management (AsyncStateError after sync discover)
- Validates Pydantic schema handling for async tools
- Tests async tool performance characteristics
test_discovery_plugin.py - Plugin discovery functionality
- Tests discover_plugin_manifest() function
- Validates multi-directory skill discovery from plugin manifests
- Tests graceful error handling for malformed manifests
- Verifies security validations (path traversal prevention)
- Tests async plugin discovery (adiscover_skills)
- Validates edge cases (empty skills list, non-existent directories)
test_parser_plugin.py - Plugin manifest parsing
- Tests parse_plugin_manifest() function
- Validates valid manifest parsing with all fields
- Tests missing required fields error handling
- Verifies JSON bomb protection (MAX_MANIFEST_SIZE limit)
- Tests security validations (path traversal, absolute paths, drive letters)
- Validates manifest version compatibility
- Tests integration with real fixture files
test_manager_plugin.py - SkillManager plugin integration
- Tests plugin source building with manifest parsing
- Validates plugin skill namespacing (_plugin_skills registry)
- Tests qualified name lookups (plugin:skill syntax)
- Verifies conflict resolution (project skills win over plugin)
- Tests multi-source discovery with plugins
- Validates end-to-end plugin workflows (sync and async)
test_script_detector.py - Script detection and metadata extraction (16 tests passing, 2 skipped)
- Tests detection of Python, Shell, JavaScript, Ruby, Perl scripts
- Validates exclusion of non-script files (.json, .md, .txt)
- Tests hidden file exclusion (files starting with
.) - Verifies pycache directory exclusion
- Tests nested directory scanning up to max_depth (default 5 levels)
- Validates description extraction from Python docstrings
- Tests description extraction from shell script comments
- Verifies JSDoc comment extraction for JavaScript
- Tests empty description handling when no comments exist
- Validates multiple script detection in single skill
- Tests graceful degradation on file read errors
- Performance benchmarks (skipped, requires pytest-benchmark)
test_script_executor.py - Script execution and security controls (17 tests passing)
- Tests successful script execution (exit code 0)
- Validates failed script execution (exit code 1)
- Tests timeout handling (exit code 124, configurable timeouts)
- Verifies JSON argument passing via stdin
- Tests path traversal prevention (../../etc/passwd blocked)
- Validates symlink security (rejects symlinks outside skill directory)
- Tests setuid/setgid permission checks (dangerous permissions rejected)
- Verifies output truncation at 10MB limit
- Tests environment variable injection (SKILL_NAME, SKILL_BASE_DIR, SKILL_VERSION, SKILLKIT_VERSION)
- Validates tool restriction enforcement (requires "Bash" in allowed-tools)
- Tests None/empty allowed_tools (allows all scripts)
- Verifies execution time measurement accuracy
- Tests argument size limit enforcement (10MB max)
- Validates signal detection (SIGSEGV, SIGKILL)
- Tests interpreter not found error handling
- Performance benchmarks (skipped, requires pytest-benchmark)
test_path_resolver.py - Secure file path resolution
- Tests FilePathResolver.resolve_path() function
- Validates relative path resolution within base directory
- Tests path traversal prevention (../, absolute paths)
- Verifies symlink resolution and escape detection
- Tests security error logging with detailed context
- Validates cross-platform path handling
- Tests edge cases (Unicode, spaces, special characters, very long paths)
test_file_references_integration.py - File reference integration
- Tests end-to-end file reference resolution workflow
- Validates integration with file-reference-skill example
- Tests skill invocation with supporting files
- Verifies security validation in real-world scenarios
- Tests symlink escape blocking in real skill usage
- Validates performance (<1ms per resolution, <100ms for 100 files)
test_langchain_integration.py - LangChain StructuredTool integration (8 tests passing)
- Validates tool creation from skills
- Tests tool invocation and argument passing
- Verifies error propagation to framework
- Tests long arguments (10KB+)
- Validates tool count matches skill count
test_edge_cases.py - Boundary conditions and error scenarios
- ✅ Invalid YAML syntax handling
- ✅ Symlink handling in skill directories
- ✅ Permission denied on Unix (tested)
- ✅ Missing required field logging
- ✅ Content load error after file deletion
- ✅ Duplicate skill name handling
- ✅ Large skills (500KB+ content) with lazy loading
- ✅ Windows line endings on Unix
test_performance.py - Performance validation
- ✅ Discovery time: <500ms for 50 skills
- ✅ Invocation overhead: <25ms average
- ✅ Memory usage: <5MB for 50 skills
- ✅ Cache effectiveness validation
test_installation.py - Package distribution validation (8 tests passing)
- Import validation with/without extras
- Version metadata validation
- Package structure verification
- Type hints availability (py.typed marker)
Pre-created SKILL.md files for consistent testing:
Valid Skills:
- valid-basic/ - Minimal valid skill
- valid-with-arguments/ - Skill with $ARGUMENTS placeholder
- valid-unicode/ - Skill with Unicode content (你好 🎉)
Invalid Skills:
- invalid-missing-name/ - Missing required 'name' field
- invalid-missing-description/ - Missing required 'description' field
- invalid-yaml-syntax/ - Malformed YAML frontmatter
Edge Case Skills:
- edge-large-content/ - Large skill (1MB+ content) for lazy loading tests
- edge-special-chars/ - Special characters and injection pattern testing
Script Execution Skills:
- script-skill/ - Test skill with multiple scripts for execution testing
scripts/extract.py- Python script demonstrating JSON stdin processingscripts/convert.sh- Shell script for format conversion examplescripts/stdin_test.py- Python script for JSON argument validationscripts/timeout_test.py- Python script with infinite loop for timeout testing
- restricted-skill/ - Test skill with tool restrictions (no Bash in allowed-tools)
- Used for testing tool restriction enforcement
- Demonstrates blocked script execution when Bash not allowed
Programmatic fixtures for flexible testing:
- temp_skills_dir - Temporary directory for test isolation (auto-cleanup)
- skill_factory - Factory function for creating SKILL.md files dynamically
- sample_skills - Pre-created set of 5 diverse sample skills
- fixtures_dir - Path to static test fixtures directory (tests/fixtures/skills/)
- skills_directory - Path to example skills directory (examples/skills/)
- skill_manager_async - Async-initialized SkillManager for async tests
- create_large_skill - Helper for creating 500KB+ skills
- create_permission_denied_skill - Factory for Unix permission error testing
Filter tests by category using pytest markers:
# Run only integration tests
pytest -m integration
# Run only async tests
pytest -m asyncio
# Run only performance tests
pytest -m performance
# Skip slow tests
pytest -m "not slow"
# Run LangChain-specific tests
pytest -m requires_langchain
# Run plugin tests
pytest -m plugin
# Run security tests
pytest -m security
# Run script execution tests
pytest tests/test_script_detector.py tests/test_script_executor.pyAvailable markers:
integration- Integration tests with external frameworksasyncio- Async tests requiring asyncio event loopperformance- Performance validation tests (may take 15+ seconds)slow- Tests that take longer than 1 secondrequires_langchain- Tests requiring langchain-core dependencyplugin- Plugin system tests (discovery, parsing, manager)security- Security validation tests (path traversal, symlinks, script permissions)
Minimum coverage: 70% line coverage across all modules Current coverage: ~39% overall, 83.8% for script modules ✅ (script module exceeds target)
Coverage by Module (as of Phase 10 completion):
__init__.py: 100.00%core/__init__.py: 100.00%core/scripts.py: 83.80% ✅ (Phase 10 - Script Execution)core/exceptions.py: 72.97% ✅core/models.py: 36.96%core/processors.py: 30.59%core/path_resolver.py: 31.82%core/parser.py: 12.80%core/discovery.py: 10.96%core/manager.py: 7.94%integrations/langchain.py: 0.00% (needs script tool integration tests)
Note: Overall project coverage is lower because many legacy modules need test updates for v0.3 script integration. The script execution module itself achieves excellent coverage.
# Check coverage with failure on <70%
pytest --cov=src/skillkit --cov-fail-under=70
# Generate detailed HTML report
pytest --cov=src/skillkit --cov-report=html
open htmlcov/index.html# Core functionality only (v0.1)
pytest tests/test_discovery.py tests/test_parser.py tests/test_models.py tests/test_processors.py tests/test_manager.py
# Async functionality (v0.2)
pytest tests/test_async_discovery.py tests/test_async_invocation.py tests/test_langchain_async.py
# Plugin system (v0.3)
pytest tests/test_discovery_plugin.py tests/test_parser_plugin.py tests/test_manager_plugin.py
# Script execution (v0.3 Phase 10)
pytest tests/test_script_detector.py tests/test_script_executor.py
# File references & security (v0.2+)
pytest tests/test_path_resolver.py tests/test_file_references_integration.py
# Integration tests
pytest tests/test_langchain_integration.py
# Edge cases and performance
pytest tests/test_edge_cases.py tests/test_performance.py
# All v0.1 tests
pytest tests/test_discovery.py tests/test_parser.py tests/test_models.py tests/test_processors.py tests/test_manager.py tests/test_langchain_integration.py tests/test_edge_cases.py tests/test_performance.py tests/test_installation.py
# All v0.2 tests
pytest tests/test_async_discovery.py tests/test_async_invocation.py tests/test_langchain_async.py tests/test_path_resolver.py tests/test_file_references_integration.py
# All v0.3 tests
pytest tests/test_discovery_plugin.py tests/test_parser_plugin.py tests/test_manager_plugin.pypytest -vvpytest -spytest -n autopytest -xpytest --lfpytest --durations=10pytest --log-cli-level=DEBUGpytest --pdbpytest tests/test_parser.py::test_parse_valid_basic_skill -vpytest -k "test_parse" -v
pytest -k "invalid" -v- Follow naming convention:
test_<module>_<scenario> - Add docstrings: Explain what the test validates
- Use fixtures: Leverage conftest.py fixtures for setup
- Parametrize when possible: Reduce duplication with @pytest.mark.parametrize
- Test one thing: Each test should validate one specific behavior
- Add markers: Tag tests with appropriate markers (integration, slow, etc.)
def test_parse_valid_skill_with_unicode(fixtures_dir):
"""Validate Unicode/emoji content is handled correctly.
Tests that the parser can handle SKILL.md files containing Unicode
characters and emoji in both frontmatter and content.
"""
parser = SkillParser()
skill_path = fixtures_dir / "valid-unicode" / "SKILL.md"
metadata = parser.parse_skill_file(skill_path)
assert metadata.name is not None
assert metadata.description is not NoneTests are designed to run in automated environments:
# Example GitHub Actions workflow
- name: Run tests
run: |
pytest --cov=src/skillkit --cov-fail-under=70 --cov-report=xml
- name: Upload coverage
uses: codecov/codecov-action@v3- Python: 3.10+ (minimum for full async support)
- pytest: 7.0+
- pytest-asyncio: 0.21.0+ (for async tests)
- pytest-cov: 4.0+ (for coverage measurement)
- PyYAML: 6.0+ (core dependency)
- aiofiles: 23.0+ (async file I/O for v0.2+)
- langchain-core: 0.1.0+ (for LangChain integration tests)
- pydantic: 2.0+ (validation for LangChain integration)
Overall Status: ✅ v0.3 Phase 10 Complete (Script execution tests passing)
-
Total test count: 19 test files across core, async, plugin, script execution, and integration tests
- ✅ Core functionality: 33 tests (test_discovery, test_parser, test_models, test_processors, test_manager)
- ✅ Async functionality: ~60 tests (test_async_discovery, test_async_invocation, test_langchain_async)
- ✅ Plugin system: ~40 tests (test_discovery_plugin, test_parser_plugin, test_manager_plugin)
- ✅ Script execution (Phase 10): 33 tests (test_script_detector, test_script_executor) ✅
- ✅ File references & security: ~80 tests (test_path_resolver, test_file_references_integration)
- ✅ LangChain integration: 8 tests (test_langchain_integration)
- ✅ Edge cases: 8 tests (test_edge_cases)
- ✅ Performance: 4 tests (test_performance)
- ✅ Installation validation: 8 tests (test_installation)
-
Test execution time:
- Core tests: <0.15 seconds
- Async tests: <0.50 seconds
- Plugin tests: <0.30 seconds
- Script execution tests: <1.7 seconds
- Full suite: ~10.5 seconds
-
Coverage: ~39% overall, 83.8% for script modules ✅ (script module exceeds 70% target)
-
Assertion count: 600+ assertions validating behavior
-
Test files: 19 test modules + conftest.py
-
Static fixtures: Multiple SKILL.md files, plugin manifests, and script fixtures
-
Dynamic fixtures: 10+ programmatic fixtures
Breakdown by Version:
- v0.1 (MVP):
- Phase 1 (Setup): ✅ Complete
- Phase 2 (Foundational): ✅ Complete
- Phase 3 (Core - US1): ✅ Complete (33/33 passing)
- Phase 4 (LangChain - US2): ✅ Complete (8/8 passing)
- Phase 5 (Edge Cases - US3): ✅ Complete (8/8 passing)
- Phase 6 (Performance - US4): ✅ Complete (4/4 passing)
- Phase 7 (Installation - US5): ✅ Complete (7/8 passing, 1 skipped)
- Phase 8 (Polish): ✅ Complete
- v0.2 (Async + File References):
- Async discovery: ✅ Complete
- Async invocation: ✅ Complete
- Async LangChain: ✅ Complete
- File path resolver: ✅ Complete
- File references integration: ✅ Complete
- v0.3 (Plugins + Script Execution):
- Plugin discovery: ✅ Complete
- Plugin manifest parsing: ✅ Complete
- Plugin manager integration: ✅ Complete
- Phase 10 (Script Execution): ✅ Complete
- Script detection tests: ✅ 16 tests passing (T072-T076)
- Script execution tests: ✅ 17 tests passing (T077-T085)
- Integration tests: ✅ Complete (T086-T087)
- Test fixtures: ✅ Complete (T088-T090)
- Coverage verification: ✅ 83.8% for scripts.py (T091)
# Ensure package installed in development mode
pip install -e ".[dev]"# Verify conftest.py is present
ls tests/conftest.py
# Check fixtures directory structure
ls tests/fixtures/skills/# Some tests require Unix permissions (skip on Windows)
pytest -m "not unix_only"# Install pytest-cov
pip install pytest-cov
# Verify source path is correct
pytest --cov=src/skillkit --cov-report=termWhen adding new features:
- Write tests first (TDD approach)
- Ensure tests pass:
pytest - Verify coverage:
pytest --cov=src/skillkit - Run type checking:
mypy src/skillkit --strict - Format code:
ruff format tests/ - Lint code:
ruff check tests/
- Main documentation: README.md
- Test specifications: specs/001-pytest-test-scripts/
- pytest documentation: https://docs.pytest.org/
- Coverage.py documentation: https://coverage.readthedocs.io/