diff --git a/.mypy.ini b/.mypy.ini index f9d68f19..cf6c2344 100644 --- a/.mypy.ini +++ b/.mypy.ini @@ -1,5 +1,6 @@ [mypy] # Global mypy configuration +mypy_path = test/unit [mypy-nodescraper.base.regexanalyzer] ignore_errors = True diff --git a/docs/PLUGIN_DOC.md b/docs/PLUGIN_DOC.md index 88c06e42..80d4e012 100644 --- a/docs/PLUGIN_DOC.md +++ b/docs/PLUGIN_DOC.md @@ -41,6 +41,8 @@ | OobBmcArchivePlugin | SSH (BMC) shell: tar+gzip archives for each path in collection_args (see PathSpec entries).
Uses sudo on the BMC when collection_args paths require elevated access. | - | **Collection Args:**
- `paths`: list[nodescraper.plugins.ooband.bmc_archive.collector_args.PathSpec] — Named BMC paths to archive with tar czf -. Configure in plugin config under plugins.OobBmcArchivePlugin.collection_ar...
- `sudo`: bool — Default sudo setting for paths that do not specify sudo.
- `timeout`: int — Default per-path tar timeout in seconds.
- `skip_if_missing`: bool — Skip paths that do not exist on the BMC instead of failing collection.
- `ignore_failed_read`: bool — When true, pass GNU tar's --ignore-failed-read when the remote tar supports it. | [BmcArchiveDataModel](#BmcArchiveDataModel-Model) | [BmcArchiveCollector](#Collector-Class-BmcArchiveCollector) | - | | RedfishEndpointPlugin | Redfish GET: explicit paths from collection_args.uris (parallel when max_workers>1).
Optional paged GET following the Members collection OData nextLink field when follow_next_link is true.
Redfish GET tree: when discover_tree is true, walks from api_root using OData resource id links and Members navigation (depth and endpoint caps from collection_args). | For each entry in analysis_args.checks, reads JSON paths in collected responses and compares values to constraints (eq, min/max, anyOf, regex, etc.).
URI key "*" runs checks against every collected response body.
**Analyzer Args:**
- `checks`: dict[str, dict[str, Union[int, float, str, bool, dict[str, Any]]]] — Map: URI or '*' -> { property_path: constraint }. URI keys must match a key in the collected responses (exact match).... | **Collection Args:**
- `uris`: list[str] — Redfish URIs to GET. Ignored when discover_tree is True.
- `discover_tree`: bool — If True, discover endpoints from the BMC Redfish tree (service root and links) instead of using uris.
- `tree_max_depth`: int — When discover_tree is True: max traversal depth (1=service root only, 2=root + collections, 3=+ members).
- `tree_max_endpoints`: int — When discover_tree is True: max endpoints to discover (0=no limit).
- `max_workers`: int — Max concurrent GETs (1=sequential). Use >1 for async endpoint fetches.
- `follow_next_link`: bool — If True, follow Redfish Members collection OData nextLink pagination for each URI and merge all pages into a single r...
- `max_pages`: int — When follow_next_link is True: safety cap on the number of pages to follow per URI (default 200). | [RedfishEndpointDataModel](#RedfishEndpointDataModel-Model) | [RedfishEndpointCollector](#Collector-Class-RedfishEndpointCollector) | [RedfishEndpointAnalyzer](#Data-Analyzer-Class-RedfishEndpointAnalyzer) | | RedfishOemDiagPlugin | Redfish LogService.CollectDiagnosticData for each entry in collection_args.oem_diagnostic_types (collection_args.log_service_path selects the LogService).
Optional binary archives under the plugin log path when log_path is set. | Summarizes success/failure per OEM diagnostic type from collected results.
When analysis_args.require_all_success is true, fails the run if any type failed collection.
**Analyzer Args:**
- `require_all_success`: bool — If True, analysis fails when any OEM type collection failed. | **Collection Args:**
- `log_service_path`: str — Redfish path to the LogService (e.g. DiagLogs).
- `oem_diagnostic_types_allowable`: Optional[list[str]] — Allowable OEM diagnostic types for this architecture/BMC. When set, used for validation and as default for oem_diagno...
- `oem_diagnostic_types`: list[str] — OEM diagnostic types to collect. When empty and oem_diagnostic_types_allowable is set, defaults to that list.
- `task_timeout_s`: int — Max seconds to wait for each BMC task. | [RedfishOemDiagDataModel](#RedfishOemDiagDataModel-Model) | [RedfishOemDiagCollector](#Collector-Class-RedfishOemDiagCollector) | [RedfishOemDiagAnalyzer](#Data-Analyzer-Class-RedfishOemDiagAnalyzer) | +| ServiceabilityPluginMI3XX | - | **Analyzer Args:**
- `hub_python_module`: Optional[str] — Import path for the hub module (class implements hub_analyze_method); hub_options forwards kwargs.
- `hub_display_name`: Optional[str] — Optional label for analyzer status messages.
- `afid_sag_path`: Optional[str] — Path to hub config (e.g. AFID_SAG.json); passed as hub_init_path_kwarg.
- `hub_init_path_kwarg`: str — Hub __init__ keyword that receives afid_sag_path.
- `hub_analyze_method`: str — Hub method called with rf_events first (default get_service_info).
- `skip_hub`: bool — If True, only build afid_events without running the service hub.
- `cper_decode_module`: Optional[str] — Module import path for CPER decoding when events include CPER attachments.
- `cper_decode_method`: str — Callable on cper_decode_module: file-like CPER in, (return_code, decode_dict) out.
- `hub_options`: Optional[dict[str, Any]] — Extra kwargs for hub __init__ and analyze; collected cper_data overrides cper_data key.
- `from_ac_cycle`: int — from_ac_cycle kwarg for the hub analyze call (merged after hub_options).
- `from_date`: Optional[str] — Optional from_date for the hub analyze call (merged after hub_options).
- `designation_serials`: Optional[dict[str, str]] — Optional designation_serials for the hub analyze call (merged after hub_options).
- `suppress_service_actions`: Optional[list[str]] — Optional suppress_service_actions for the hub analyze call (merged after hub_options). | **Collection Args:**
- `uri`: Optional[str] — Optional alias for ``rf_event_log_uri``. When both ``uri`` and ``rf_event_log_uri`` are explicitly set to non-empty v...
- `rf_event_log_uri`: str — Redfish URI for the event log ``Entries`` collection.
- `rf_chassis_devices`: Optional[List[str]] — Chassis designations for Assembly GETs; required with ``rf_assembly_uri_template``.
- `rf_assembly_uri_template`: Optional[str] — Redfish URI template containing ``{device}`` for each chassis Assembly resource.
- `rf_firmware_bundle_uri`: Optional[str] — Redfish URI for firmware bundle inventory when subclasses extract component details.
- `follow_next_link`: bool — If True, follow Members@odata.nextLink up to max_pages; else single GET.
- `max_pages`: int — Safety cap on the number of pages when following event log pagination.
- `top`: Optional[int] — Most recent N entries via $skip after count probe; None collects full window.
- `reference_time`: Optional[str] — Optional ISO-8601 date or date-time used with time_operator (e.g. 2026-05-17 or 2026-05-17T13:01:00).
- `time_operator`: Optional[Literal['>', '>=', '<', '<=', '==']] — Comparison operator applied when reference_time is set. | [ServiceabilityDataModel](#ServiceabilityDataModel-Model) | [MI3XXCollector](#Collector-Class-MI3XXCollector) | [MI3XXAnalyzer](#Data-Analyzer-Class-MI3XXAnalyzer) | +| ServiceabilityPluginBase | - | - | - | [ServiceabilityDataModel](#ServiceabilityDataModel-Model) | [ServiceabilityCollectorBase](#Collector-Class-ServiceabilityCollectorBase) | - | # Collectors @@ -1045,6 +1047,34 @@ RedfishOemDiagDataModel - Redfish LogService.CollectDiagnosticData for each entry in collection_args.oem_diagnostic_types (collection_args.log_service_path selects the LogService). - Optional binary archives under the plugin log path when log_path is set. +## Collector Class MI3XXCollector + +### Description + +MI3XX OOB Redfish serviceability collector. + +**Bases**: ['ServiceabilityCollectorBase'] + +**Link to code**: [mi3xx_collector.py](https://github.com/amd/node-scraper/blob/HEAD/nodescraper/plugins/serviceability/mi3xx/mi3xx_collector.py) + +### Provides Data + +ServiceabilityDataModel + +## Collector Class ServiceabilityCollectorBase + +### Description + +OOB Redfish collection skeleton; subclasses implement filtering, CPER handling, and JSON parsing. + +**Bases**: ['RedfishDataCollector', 'Generic'] + +**Link to code**: [serviceability_collector.py](https://github.com/amd/node-scraper/blob/HEAD/nodescraper/plugins/serviceability/serviceability_collector.py) + +### Provides Data + +ServiceabilityDataModel + # Data Models ## GenericCollectionDataModel Model @@ -1549,6 +1579,30 @@ Collected Redfish OEM diagnostic log results: OEM type -> result (success, error - **results**: `dict[str, nodescraper.plugins.ooband.redfish_oem_diag.oem_diag_data.OemDiagTypeResult]` +## ServiceabilityDataModel Model + +### Description + +Collected Redfish responses and intermediate serviceability fields. 
+ +**Link to code**: [serviceability_data.py](https://github.com/amd/node-scraper/blob/HEAD/nodescraper/plugins/serviceability/serviceability_data.py) + +**Bases**: ['DataModel'] + +### Model annotations and fields + +- **responses**: `dict[str, Any]` +- **rf_events**: `list[Any]` +- **assembly_info**: `Dict[str, DeviceInfo]` +- **cper_raw**: `Dict[str, str]` +- **cper_data**: `Dict[str, Any]` +- **component_details**: `Optional[str]` +- **log_path**: `Optional[str]` +- **bmc_host**: `Optional[str]` +- **afid_events**: `List[AfidEvent]` +- **serviceability**: `Optional[ServiceabilityBlock]` +- **result**: `Optional[ServiceabilityResult]` + # Data Analyzers ## Data Analyzer Class GenericAnalyzer @@ -1978,6 +2032,16 @@ Analyzes Redfish OEM diagnostic log collection results. - Summarizes success/failure per OEM diagnostic type from collected results. - When analysis_args.require_all_success is true, fails the run if any type failed collection. +## Data Analyzer Class MI3XXAnalyzer + +### Description + +Build AFID events from collected data and run the configured service hub. + +**Bases**: ['DataAnalyzer'] + +**Link to code**: [mi3xx_analyzer.py](https://github.com/amd/node-scraper/blob/HEAD/nodescraper/plugins/serviceability/mi3xx/mi3xx_analyzer.py) + # Analyzer Args ## Analyzer Args Class GenericAnalyzerArgs @@ -2300,3 +2364,29 @@ Analyzer args for Redfish OEM diagnostic log results. ### Annotations / fields - **require_all_success**: `bool` — If True, analysis fails when any OEM type collection failed. + +## Analyzer Args Class ServiceabilityAnalyzerArgs + +### Description + +Analyzer args for serviceability plugins that run a configurable Python hub. 
+ +**Bases**: ['AnalyzerArgs'] + +**Link to code**: [analyzer_args.py](https://github.com/amd/node-scraper/blob/HEAD/nodescraper/plugins/serviceability/analyzer_args.py) + +### Annotations / fields + +- **hub_python_module**: `Optional[str]` — Import path for the hub module (class implements hub_analyze_method); hub_options forwards kwargs. +- **hub_display_name**: `Optional[str]` — Optional label for analyzer status messages. +- **afid_sag_path**: `Optional[str]` — Path to hub config (e.g. AFID_SAG.json); passed as hub_init_path_kwarg. +- **hub_init_path_kwarg**: `str` — Hub __init__ keyword that receives afid_sag_path. +- **hub_analyze_method**: `str` — Hub method called with rf_events first (default get_service_info). +- **skip_hub**: `bool` — If True, only build afid_events without running the service hub. +- **cper_decode_module**: `Optional[str]` — Module import path for CPER decoding when events include CPER attachments. +- **cper_decode_method**: `str` — Callable on cper_decode_module: file-like CPER in, (return_code, decode_dict) out. +- **hub_options**: `Optional[dict[str, Any]]` — Extra kwargs for hub __init__ and analyze; collected cper_data overrides cper_data key. +- **from_ac_cycle**: `int` — from_ac_cycle kwarg for the hub analyze call (merged after hub_options). +- **from_date**: `Optional[str]` — Optional from_date for the hub analyze call (merged after hub_options). +- **designation_serials**: `Optional[dict[str, str]]` — Optional designation_serials for the hub analyze call (merged after hub_options). +- **suppress_service_actions**: `Optional[list[str]]` — Optional suppress_service_actions for the hub analyze call (merged after hub_options). 
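Review note on the "merged after hub_options" wording in the analyzer args above: the sketch below is a standalone mirror of `resolved_hub_options` from `analyzer_args.py` in this change, written as a plain function (the name `resolve_hub_options` is illustrative, not part of the PR) so the precedence can be checked without pydantic. It suggests that `from_ac_cycle` is always written from the field, so a `from_ac_cycle` key placed in `hub_options` appears to be replaced even when the field is left at its default of -1, while the other three fields only override when they are set.

```python
from typing import Any, Optional


def resolve_hub_options(
    hub_options: Optional[dict[str, Any]] = None,
    from_ac_cycle: int = -1,
    from_date: Optional[str] = None,
    designation_serials: Optional[dict[str, str]] = None,
    suppress_service_actions: Optional[list[str]] = None,
) -> dict[str, Any]:
    """Illustrative mirror of ServiceabilityAnalyzerArgs.resolved_hub_options."""
    # Start from a copy so the caller's hub_options dict is not mutated.
    merged = dict(hub_options or {})
    # Always written: replaces any from_ac_cycle key supplied in hub_options.
    merged["from_ac_cycle"] = from_ac_cycle
    # Optional fields override hub_options only when explicitly set.
    if from_date is not None:
        merged["from_date"] = from_date
    if designation_serials is not None:
        merged["designation_serials"] = designation_serials
    if suppress_service_actions is not None:
        merged["suppress_service_actions"] = suppress_service_actions
    return merged


if __name__ == "__main__":
    # hub_options carries from_ac_cycle=5, but the field default (-1) wins.
    print(resolve_hub_options({"from_ac_cycle": 5, "verbose": True}, from_date="2026-05-17"))
```

If the override of `hub_options["from_ac_cycle"]` by the default is intended, it may be worth stating in the field description; otherwise the field could default to `None` and be merged conditionally like the others.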
diff --git a/docs/generate_plugin_doc_bundle.py b/docs/generate_plugin_doc_bundle.py index 4d873ca5..cd9897b0 100644 --- a/docs/generate_plugin_doc_bundle.py +++ b/docs/generate_plugin_doc_bundle.py @@ -41,7 +41,7 @@ from typing import Any, Iterable, List, Optional, Type LINK_BASE_DEFAULT = "https://github.com/amd/node-scraper/blob/HEAD/" -REL_ROOT_DEFAULT = "nodescraper/plugins/inband" +REL_ROOT_DEFAULT = "nodescraper/plugins" # Import and document every concrete plugin under nodescraper.plugins (inband, ooband, # generic_collection, regex_search, serviceability, …). PACKAGE_PLUGINS_ROOT = "nodescraper.plugins" diff --git a/nodescraper/configbuilder.py b/nodescraper/configbuilder.py index 7823b95a..bc8f1b8a 100644 --- a/nodescraper/configbuilder.py +++ b/nodescraper/configbuilder.py @@ -24,6 +24,7 @@ # ############################################################################### import enum +import inspect import logging from typing import Any, Optional, Type, Union @@ -64,9 +65,17 @@ def gen_config(self, plugin_names: list[str]) -> PluginConfig: @classmethod def _build_plugin_config(cls, plugin_class: Type[PluginInterface]) -> dict: type_map = TypeUtils.get_func_arg_types(plugin_class.run, plugin_class) + run_sig = inspect.signature(plugin_class.run) config = {} for arg, arg_data in type_map.items(): + param = run_sig.parameters.get(arg) + # Skip *args/**kwargs: variadic parameters have no config entry; ServiceabilityPlugin uses kwargs for the hub call + if param is not None and param.kind in ( + inspect.Parameter.VAR_KEYWORD, + inspect.Parameter.VAR_POSITIONAL, + ): + continue cls._update_config(arg, arg_data, config) return config diff --git a/nodescraper/interfaces/dataanalyzertask.py b/nodescraper/interfaces/dataanalyzertask.py index 0e6b3b06..fd6cc284 100644 --- a/nodescraper/interfaces/dataanalyzertask.py +++ b/nodescraper/interfaces/dataanalyzertask.py @@ -99,7 +99,7 @@ def wrapper( result = analyzer.result result.finalize(analyzer.logger) - analyzer._run_hooks(result) + 
analyzer._run_hooks(result, data=data) return result diff --git a/nodescraper/interfaces/datacollectortask.py b/nodescraper/interfaces/datacollectortask.py index 3c30a6ea..60826b16 100644 --- a/nodescraper/interfaces/datacollectortask.py +++ b/nodescraper/interfaces/datacollectortask.py @@ -204,7 +204,8 @@ def __init_subclass__(cls, **kwargs) -> None: if not issubclass(cls.DATA_MODEL, DataModel): raise TypeError(f"DATA_MODEL must be a subclass of DataModel in {cls.__name__}") if hasattr(cls, "collect_data"): - cls.collect_data = collect_decorator(cls.collect_data) + if "collect_data" in vars(cls): + cls.collect_data = collect_decorator(cls.collect_data) else: raise TypeError(f"Data collector {cls.__name__} must implement collect_data") diff --git a/nodescraper/interfaces/plugin.py b/nodescraper/interfaces/plugin.py index 06959b54..9e22d346 100644 --- a/nodescraper/interfaces/plugin.py +++ b/nodescraper/interfaces/plugin.py @@ -26,7 +26,7 @@ import abc import inspect import logging -from typing import Callable, Generic, Optional, Type, Union +from typing import Any, Callable, Generic, Optional, Type, Union from nodescraper.constants import DEFAULT_EVENT_REPORTER, DEFAULT_LOGGER from nodescraper.models import PluginResult, SystemInfo @@ -125,7 +125,7 @@ def _update_queue(self, queue_item: tuple) -> None: self.queue_callback(queue_item) @abc.abstractmethod - def run(self, **kwargs) -> PluginResult: + def run(self, **kwargs: Any) -> PluginResult: """Plugin run function Returns: diff --git a/nodescraper/plugins/serviceability/__init__.py b/nodescraper/plugins/serviceability/__init__.py new file mode 100644 index 00000000..c5e9f857 --- /dev/null +++ b/nodescraper/plugins/serviceability/__init__.py @@ -0,0 +1,89 @@ +############################################################################### +# +# MIT License +# +# Copyright (c) 2026 Advanced Micro Devices, Inc. 
+# +# Permission is hereby granted, free of charge, to any person obtaining a copy +# of this software and associated documentation files (the "Software"), to deal +# in the Software without restriction, including without limitation the rights +# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +# copies of the Software, and to permit persons to whom the Software is +# furnished to do so, subject to the following conditions: +# +# The above copyright notice and this permission notice shall be included in all +# copies or substantial portions of the Software. +# +# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE +# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE +# SOFTWARE. 
+# +############################################################################### +from .afid_events import build_afid_events_from_data +from .analyzer_args import ServiceabilityAnalyzerArgs +from .mi3xx import ( + MI3XXAnalyzer, + MI3XXCollector, + MI3XXCollectorArgs, + MI3XXDataModel, + MI3XXDeviceInfo, + MI3XXResult, + ServiceabilityPluginMI3XX, + build_mi3xx_reporting_version_fields, +) +from .se_adapter import ( + format_serviceability_solution_lines, + serviceability_block_from_service_result, +) +from .se_models import AfidEvent, ServiceabilityBlock, ServiceabilitySolution +from .se_runner import SeRunError, run_service_hub +from .serviceability_collector import ServiceabilityCollectorBase +from .serviceability_data import ( + DeviceInfo, + ServiceabilityDataModel, + ServiceabilityResult, +) +from .serviceability_plugin_base import ServiceabilityPluginBase +from .time_utils import ( + TimeOperator, + compare_iso_datetime, + is_valid_iso_datetime, + normalize_se_timestamp, + parse_iso_datetime, + satisfies_time_check, +) + +__all__ = [ + "AfidEvent", + "DeviceInfo", + "MI3XXAnalyzer", + "MI3XXCollector", + "MI3XXCollectorArgs", + "MI3XXDataModel", + "MI3XXDeviceInfo", + "MI3XXResult", + "SeRunError", + "ServiceabilityAnalyzerArgs", + "ServiceabilityBlock", + "ServiceabilityCollectorBase", + "ServiceabilityDataModel", + "ServiceabilityPluginBase", + "ServiceabilityPluginMI3XX", + "ServiceabilityResult", + "ServiceabilitySolution", + "TimeOperator", + "build_afid_events_from_data", + "build_mi3xx_reporting_version_fields", + "compare_iso_datetime", + "format_serviceability_solution_lines", + "is_valid_iso_datetime", + "normalize_se_timestamp", + "parse_iso_datetime", + "run_service_hub", + "serviceability_block_from_service_result", + "satisfies_time_check", +] diff --git a/nodescraper/plugins/serviceability/afid_events.py b/nodescraper/plugins/serviceability/afid_events.py new file mode 100644 index 00000000..a84af503 --- /dev/null +++ 
b/nodescraper/plugins/serviceability/afid_events.py @@ -0,0 +1,188 @@ +############################################################################### +# +# MIT License +# +# Copyright (c) 2026 Advanced Micro Devices, Inc. +# +# Permission is hereby granted, free of charge, to any person obtaining a copy +# of this software and associated documentation files (the "Software"), to deal +# in the Software without restriction, including without limitation the rights +# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +# copies of the Software, and to permit persons to whom the Software is +# furnished to do so, subject to the following conditions: +# +# The above copyright notice and this permission notice shall be included in all +# copies or substantial portions of the Software. +# +# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE +# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE +# SOFTWARE. 
+# +############################################################################### +from __future__ import annotations + +from typing import Any, Optional + +from .se_models import AfidEvent +from .serviceability_data import ServiceabilityDataModel +from .time_utils import normalize_se_timestamp + +_EVENT_TIMESTAMP_KEYS = ("Created", "EventTimestamp", "Timestamp") +_AFID_KEYS = ("Afid", "AFID", "afid") + + +def build_afid_events_from_data(data: ServiceabilityDataModel) -> list[AfidEvent]: + """Build SE input events from collected Redfish and CPER fields.""" + events: list[AfidEvent] = [] + seen: set[tuple[int, str, str]] = set() + + for rf_event in data.rf_events: + parsed = _afid_event_from_rf_member(rf_event) + if parsed is None: + continue + key = (parsed.afid, parsed.serviceable_unit, parsed.time) + if key in seen: + continue + seen.add(key) + events.append(parsed) + + for unit, payload in data.cper_data.items(): + parsed = _afid_event_from_cper_slot(str(unit), payload) + if parsed is None: + continue + key = (parsed.afid, parsed.serviceable_unit, parsed.time) + if key in seen: + continue + seen.add(key) + events.append(parsed) + + return events + + +def _afid_event_from_rf_member(member: Any) -> Optional[AfidEvent]: + if not isinstance(member, dict): + return None + afid = _extract_afid(member) + unit = _extract_serviceable_unit(member) + timestamp = _extract_timestamp(member) + if afid is None or unit is None or timestamp is None: + return None + return AfidEvent( + afid=afid, + serviceable_unit=unit, + time=normalize_se_timestamp(timestamp), + ) + + +def _afid_event_from_cper_slot(unit: str, payload: Any) -> Optional[AfidEvent]: + if not isinstance(payload, dict): + return None + afid = _extract_afid(payload) + timestamp = _extract_timestamp(payload) + unit_name = str(payload.get("serviceable_unit") or unit).strip() + if afid is None or not unit_name or timestamp is None: + return None + return AfidEvent( + afid=afid, + serviceable_unit=unit_name, + 
time=normalize_se_timestamp(timestamp), + ) + + +def _extract_afid(payload: dict[str, Any]) -> Optional[int]: + for key in _AFID_KEYS: + if key in payload and payload[key] is not None: + return int(payload[key]) + oem = payload.get("Oem") + if isinstance(oem, dict): + for vendor_payload in oem.values(): + found = _extract_afid_from_oem_fragment(vendor_payload) + if found is not None: + return found + return None + + +def _extract_afid_from_oem_fragment(vendor_payload: Any) -> Optional[int]: + """Resolve AFID from one ``Oem`` property value (dict or list of dicts, e.g. ``AMDFieldIdentifiers``).""" + if isinstance(vendor_payload, dict): + for key in _AFID_KEYS: + if key in vendor_payload and vendor_payload[key] is not None: + return int(vendor_payload[key]) + elif isinstance(vendor_payload, list): + for item in vendor_payload: + if isinstance(item, dict): + for key in _AFID_KEYS: + if key in item and item[key] is not None: + return int(item[key]) + return None + + +def _origin_dict_to_unit(value: Any) -> Optional[str]: + if not isinstance(value, dict): + return None + odata_id = value.get("@odata.id") or value.get("odata.id") + if odata_id: + return _unit_from_odata_id(str(odata_id)) + return None + + +def _extract_serviceable_unit(payload: dict[str, Any]) -> Optional[str]: + for key in ("serviceable_unit", "ServiceableUnit", "OriginOfCondition", "Device"): + value = payload.get(key) + if value is None: + continue + if isinstance(value, dict): + odata_id = value.get("@odata.id") or value.get("odata.id") + if odata_id: + return _unit_from_odata_id(str(odata_id)) + text = str(value).strip() + if text: + return _unit_from_odata_id(text) if "/" in text else text + + links = payload.get("Links") or payload.get("links") + if isinstance(links, dict): + ooc = ( + links.get("OriginOfCondition") + or links.get("originOfCondition") + or links.get("OriginofCondition") + ) + unit = _origin_dict_to_unit(ooc) + if unit: + return unit + + oem = payload.get("Oem") + if 
isinstance(oem, dict): + for vendor_payload in oem.values(): + if isinstance(vendor_payload, dict): + unit = vendor_payload.get("serviceable_unit") or vendor_payload.get( + "ServiceableUnit" + ) + if unit is not None and str(unit).strip(): + return str(unit).strip() + elif isinstance(vendor_payload, list): + for item in vendor_payload: + if not isinstance(item, dict): + continue + su = item.get("ServiceableUnits") or item.get("serviceable_units") + if isinstance(su, list) and su: + u = _origin_dict_to_unit(su[0]) + if u: + return u + return None + + +def _extract_timestamp(payload: dict[str, Any]) -> Optional[str]: + for key in _EVENT_TIMESTAMP_KEYS: + value = payload.get(key) + if value is not None and str(value).strip(): + return str(value).strip() + return None + + +def _unit_from_odata_id(odata_id: str) -> str: + segment = odata_id.rstrip("/").split("/")[-1] + return segment or odata_id diff --git a/nodescraper/plugins/serviceability/analyzer_args.py b/nodescraper/plugins/serviceability/analyzer_args.py new file mode 100644 index 00000000..639822cc --- /dev/null +++ b/nodescraper/plugins/serviceability/analyzer_args.py @@ -0,0 +1,150 @@ +############################################################################### +# +# MIT License +# +# Copyright (c) 2026 Advanced Micro Devices, Inc. +# +# Permission is hereby granted, free of charge, to any person obtaining a copy +# of this software and associated documentation files (the "Software"), to deal +# in the Software without restriction, including without limitation the rights +# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +# copies of the Software, and to permit persons to whom the Software is +# furnished to do so, subject to the following conditions: +# +# The above copyright notice and this permission notice shall be included in all +# copies or substantial portions of the Software. 
+# +# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE +# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE +# SOFTWARE. +# +############################################################################### +from __future__ import annotations + +from typing import Any, Optional + +from pydantic import Field, field_validator, model_validator + +from nodescraper.models import AnalyzerArgs + + +class ServiceabilityAnalyzerArgs(AnalyzerArgs): + """Analyzer args for serviceability plugins that run a configurable Python hub.""" + + hub_python_module: Optional[str] = Field( + default=None, + description="Import path for the hub module (class implements hub_analyze_method); hub_options forwards kwargs.", + ) + hub_display_name: Optional[str] = Field( + default=None, + description="Optional label for analyzer status messages.", + ) + afid_sag_path: Optional[str] = Field( + default=None, + description="Path to hub config (e.g. 
AFID_SAG.json); passed as hub_init_path_kwarg.", + ) + hub_init_path_kwarg: str = Field( + default="afid_sag", + description="Hub __init__ keyword that receives afid_sag_path.", + ) + hub_analyze_method: str = Field( + default="get_service_info", + description="Hub method called with rf_events first (default get_service_info).", + ) + skip_hub: bool = Field( + default=False, + description="If True, only build afid_events without running the service hub.", + ) + cper_decode_module: Optional[str] = Field( + default=None, + description="Module import path for CPER decoding when events include CPER attachments.", + ) + cper_decode_method: str = Field( + default="analyze_cper", + description="Callable on cper_decode_module: file-like CPER in, (return_code, decode_dict) out.", + ) + hub_options: Optional[dict[str, Any]] = Field( + default=None, + description="Extra kwargs for hub __init__ and analyze; collected cper_data overrides cper_data key.", + ) + from_ac_cycle: int = Field( + default=-1, + ge=-1, + description="from_ac_cycle kwarg for the hub analyze call (merged after hub_options).", + ) + from_date: Optional[str] = Field( + default=None, + description="Optional from_date for the hub analyze call (merged after hub_options).", + ) + designation_serials: Optional[dict[str, str]] = Field( + default=None, + description="Optional designation_serials for the hub analyze call (merged after hub_options).", + ) + suppress_service_actions: Optional[list[str]] = Field( + default=None, + description="Optional suppress_service_actions for the hub analyze call (merged after hub_options).", + ) + + def resolved_hub_options(self) -> dict[str, Any]: + """Merge hub_options with from_ac_cycle, from_date, designation_serials, and suppress_service_actions.""" + merged = dict(self.hub_options or {}) + merged["from_ac_cycle"] = self.from_ac_cycle + if self.from_date is not None: + merged["from_date"] = self.from_date + if self.designation_serials is not None: + 
merged["designation_serials"] = self.designation_serials + if self.suppress_service_actions is not None: + merged["suppress_service_actions"] = self.suppress_service_actions + return merged + + @field_validator("hub_analyze_method", "hub_init_path_kwarg") + @classmethod + def _strip_non_empty_hub_hooks(cls, value: str) -> str: + text = str(value).strip() + if not text: + raise ValueError("must not be empty") + return text + + @field_validator("hub_options", mode="before") + @classmethod + def _none_empty_hub_options(cls, value: object) -> Optional[dict[str, Any]]: + if value is None: + return None + if isinstance(value, dict) and not value: + return None + return value # type: ignore[return-value] + + @field_validator("from_date", mode="before") + @classmethod + def _strip_from_date(cls, value: object) -> Optional[str]: + if value is None: + return None + text = str(value).strip() + return text or None + + @field_validator( + "afid_sag_path", + "hub_python_module", + "hub_display_name", + "cper_decode_module", + ) + @classmethod + def _strip_optional_strings(cls, value: Optional[str]) -> Optional[str]: + if value is None: + return None + text = str(value).strip() + return text or None + + @model_validator(mode="after") + def _require_hub_config_when_running(self) -> ServiceabilityAnalyzerArgs: + if self.skip_hub: + return self + if not self.afid_sag_path: + raise ValueError("afid_sag_path is required when running the service hub.") + if not self.hub_python_module: + raise ValueError("hub_python_module is required when running the service hub.") + return self diff --git a/nodescraper/plugins/serviceability/cper_decode.py b/nodescraper/plugins/serviceability/cper_decode.py new file mode 100644 index 00000000..d4e9b20e --- /dev/null +++ b/nodescraper/plugins/serviceability/cper_decode.py @@ -0,0 +1,145 @@ +############################################################################### +# +# MIT License +# +# Copyright (c) 2026 Advanced Micro Devices, Inc. 
+# +# Permission is hereby granted, free of charge, to any person obtaining a copy +# of this software and associated documentation files (the "Software"), to deal +# in the Software without restriction, including without limitation the rights +# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +# copies of the Software, and to permit persons to whom the Software is +# furnished to do so, subject to the following conditions: +# +# The above copyright notice and this permission notice shall be included in all +# copies or substantial portions of the Software. +# +# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE +# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE +# SOFTWARE. 
+# +############################################################################### +"""Decode collected CPER attachments via a configured Python decode module.""" +from __future__ import annotations + +import base64 +import binascii +import importlib +import io +import logging +from typing import Any, Callable, Optional + + +class CperDecodeError(RuntimeError): + """Raised when the configured CPER decode module cannot be loaded or decoding fails.""" + + +def _load_decode_callable( + cper_decode_module: str, + cper_decode_method: str, +) -> Callable[[io.BytesIO], tuple[int, Any]]: + """Import a decode callable from analysis_args (module + method name).""" + try: + module = importlib.import_module(cper_decode_module) + except ImportError as exc: + raise CperDecodeError( + f"Cannot import cper_decode_module {cper_decode_module!r}: {exc}" + ) from exc + + decode_fn = getattr(module, cper_decode_method, None) + if decode_fn is None: + raise CperDecodeError( + f"Module {cper_decode_module!r} has no callable {cper_decode_method!r}" + ) + if not callable(decode_fn): + raise CperDecodeError(f"{cper_decode_module!r}.{cper_decode_method!r} is not callable") + return decode_fn + + +def count_ras_err_entries(decode_payload: Any) -> int: + """Count RasErr* keys in a decoded CPER triage_result dict.""" + if not isinstance(decode_payload, dict): + return 0 + triage_result = decode_payload.get("triage_result", {}) + if not isinstance(triage_result, dict): + return 0 + return sum(1 for key in triage_result if str(key).startswith("RasErr")) + + +def decode_cper_raw_attachments( + cper_raw: dict[str, str], + *, + cper_decode_module: str, + cper_decode_method: str = "analyze_cper", + logger: Optional[logging.Logger] = None, +) -> dict[str, Any]: + """Decode base64 CPER blobs keyed by Redfish event Id. + + The decode callable must accept a binary file-like object and return + ``(return_code, decode_dict)``. 
Results are passed to the service hub as + ``cper_data``; the hub does not perform CPER decoding itself. + + Returns ``{event_id: {"return_code": int, "decode": dict}}``. + """ + if not cper_raw: + return {} + + decode_fn = _load_decode_callable(cper_decode_module, cper_decode_method) + + decoded: dict[str, Any] = {} + errors: list[str] = [] + + for event_id, payload_b64 in cper_raw.items(): + try: + raw = base64.b64decode(payload_b64, validate=True) + except (binascii.Error, ValueError) as exc: + errors.append(f"event {event_id}: invalid base64 ({exc})") + continue + + try: + return_code, decode_payload = decode_fn(io.BytesIO(raw)) + except Exception as exc: # noqa: BLE001 + msg = f"event {event_id}: {exc}" + errors.append(msg) + if logger is not None: + logger.warning("CPER decode failed for Redfish event %s: %s", event_id, exc) + continue + + if return_code != 0: + errors.append(f"event {event_id}: decode return code {return_code}") + + decoded[str(event_id)] = { + "return_code": return_code, + "decode": decode_payload, + } + if logger is not None: + ras_count = count_ras_err_entries(decode_payload) + if return_code == 0: + logger.info( + "CPER decoded for Redfish event %s (return_code=0, %d RasErr entr%s)", + event_id, + ras_count, + "y" if ras_count == 1 else "ies", + ) + else: + logger.warning( + "CPER decoded for Redfish event %s with non-zero return_code=%s " + "(%d RasErr entr%s)", + event_id, + return_code, + ras_count, + "y" if ras_count == 1 else "ies", + ) + + if errors and not decoded: + raise CperDecodeError("; ".join(errors)) + + if logger is not None and errors: + for msg in errors: + logger.warning("CPER decode issue: %s", msg) + + return decoded diff --git a/nodescraper/plugins/serviceability/mi3xx/__init__.py b/nodescraper/plugins/serviceability/mi3xx/__init__.py new file mode 100644 index 00000000..b97928b3 --- /dev/null +++ b/nodescraper/plugins/serviceability/mi3xx/__init__.py @@ -0,0 +1,46 @@ 
+############################################################################### +# +# MIT License +# +# Copyright (c) 2026 Advanced Micro Devices, Inc. +# +# Permission is hereby granted, free of charge, to any person obtaining a copy +# of this software and associated documentation files (the "Software"), to deal +# in the Software without restriction, including without limitation the rights +# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +# copies of the Software, and to permit persons to whom the Software is +# furnished to do so, subject to the following conditions: +# +# The above copyright notice and this permission notice shall be included in all +# copies or substantial portions of the Software. +# +# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE +# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE +# SOFTWARE. 
+# +############################################################################### +from .mi3xx_analyzer import MI3XXAnalyzer +from .mi3xx_collector import MI3XXCollector +from .mi3xx_collector_args import MI3XXCollectorArgs +from .mi3xx_data import ( + MI3XXDataModel, + MI3XXDeviceInfo, + MI3XXResult, + build_mi3xx_reporting_version_fields, +) +from .serviceability_plugin_mi3xx import ServiceabilityPluginMI3XX + +__all__ = [ + "MI3XXAnalyzer", + "MI3XXCollector", + "MI3XXCollectorArgs", + "MI3XXDataModel", + "MI3XXDeviceInfo", + "MI3XXResult", + "ServiceabilityPluginMI3XX", + "build_mi3xx_reporting_version_fields", +] diff --git a/nodescraper/plugins/serviceability/mi3xx/mi3xx_analyzer.py b/nodescraper/plugins/serviceability/mi3xx/mi3xx_analyzer.py new file mode 100644 index 00000000..6150398e --- /dev/null +++ b/nodescraper/plugins/serviceability/mi3xx/mi3xx_analyzer.py @@ -0,0 +1,213 @@ +############################################################################### +# +# MIT License +# +# Copyright (c) 2026 Advanced Micro Devices, Inc. +# +# Permission is hereby granted, free of charge, to any person obtaining a copy +# of this software and associated documentation files (the "Software"), to deal +# in the Software without restriction, including without limitation the rights +# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +# copies of the Software, and to permit persons to whom the Software is +# furnished to do so, subject to the following conditions: +# +# The above copyright notice and this permission notice shall be included in all +# copies or substantial portions of the Software. +# +# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. 
IN NO EVENT SHALL THE +# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE +# SOFTWARE. +# +############################################################################### +from __future__ import annotations + +from typing import Any, ClassVar, Optional + +from pydantic import BaseModel, Field + +from nodescraper.enums import ExecutionStatus +from nodescraper.interfaces import DataAnalyzer +from nodescraper.models import TaskResult +from nodescraper.plugins.serviceability.afid_events import build_afid_events_from_data +from nodescraper.plugins.serviceability.analyzer_args import ServiceabilityAnalyzerArgs +from nodescraper.plugins.serviceability.cper_decode import ( + CperDecodeError, + decode_cper_raw_attachments, +) +from nodescraper.plugins.serviceability.se_adapter import ( + format_serviceability_solution_lines, +) +from nodescraper.plugins.serviceability.se_models import ServiceabilityBlock +from nodescraper.plugins.serviceability.se_runner import SeRunError, run_service_hub +from nodescraper.plugins.serviceability.serviceability_data import ( + ServiceabilityDataModel, +) + +from .mi3xx_cper_utils import RF_CPER_AFID_MIN, should_skip_cper_fetch_or_decode + + +class AfidSagMetadataArtifact(BaseModel): + """Hub AFID_SAG metadata snapshot; written to ``afid_sag_metadata.json``.""" + + ARTIFACT_LOG_BASENAME: ClassVar[str] = "afid_sag_metadata" + + metadata: dict[str, Any] = Field(default_factory=dict) + + +class MI3XXAnalyzer(DataAnalyzer[ServiceabilityDataModel, ServiceabilityAnalyzerArgs]): + """Build AFID events from collected data and run the configured service hub.""" + + DATA_MODEL = ServiceabilityDataModel + + def analyze_data( + self, + data: ServiceabilityDataModel, + args: Optional[ServiceabilityAnalyzerArgs] = None, + ) -> TaskResult: + if args is None: + 
self.result.status = ExecutionStatus.NOT_RAN + self.result.message = "ServiceabilityAnalyzerArgs are required" + return self.result + + events = data.afid_events or build_afid_events_from_data(data) + data.afid_events = events + + if args.skip_hub: + data.serviceability = ServiceabilityBlock(afid_events=events) + self.result.status = ExecutionStatus.OK + self.result.message = f"Built {len(events)} AFID event(s); hub skipped" + self._log_serviceability_solutions(data.serviceability) + return self.result + + parent = self.parent or self.__class__.__name__ + cper_data = data.cper_data or {} + cper_raw_to_decode = self._cper_raw_needing_decode(data) + skipped_cper = len(data.cper_raw or {}) - len(cper_raw_to_decode) + if skipped_cper: + self.logger.info( + "(%s) Skipping CPER decode for %d CPER attachment(s); Redfish log " + "already has usable ACA fields (AFID<%s or no serial on decode)", + parent, + skipped_cper, + RF_CPER_AFID_MIN, + ) + if cper_raw_to_decode and not cper_data: + if not args.cper_decode_module: + self.logger.warning( + "(%s) %d CPER attachment(s) collected but cper_decode_module is " + "not set in analysis_args; skipping CPER decode", + parent, + len(cper_raw_to_decode), + ) + else: + self.logger.info( + "(%s) Decoding %d CPER attachment(s) via %s.%s", + parent, + len(cper_raw_to_decode), + args.cper_decode_module, + args.cper_decode_method, + ) + try: + cper_data = decode_cper_raw_attachments( + cper_raw_to_decode, + cper_decode_module=args.cper_decode_module, + cper_decode_method=args.cper_decode_method, + logger=self.logger, + ) + data.cper_data = cper_data + self.logger.info( + "(%s) CPER decode finished: %d of %d attachment(s) decoded", + parent, + len(cper_data), + len(cper_raw_to_decode), + ) + except CperDecodeError as exc: + self.logger.warning( + "(%s) %s; continuing without decoded CPER", + parent, + exc, + ) + elif cper_data: + self.logger.info( + "(%s) Using %d pre-decoded CPER record(s) from collection", + parent, + len(cper_data), + ) 
+ + try: + block = run_service_hub( + hub_python_module=args.hub_python_module, # type: ignore[arg-type] + hub_display_name=args.hub_display_name, + afid_events=events, + afid_sag_path=args.afid_sag_path, # type: ignore[arg-type] + rf_events=data.rf_events, + cper_data=cper_data or None, + hub_options=args.resolved_hub_options(), + hub_analyze_method=args.hub_analyze_method, + hub_init_path_kwarg=args.hub_init_path_kwarg, + ) + except (SeRunError, ValueError) as exc: + self.result.status = ExecutionStatus.ERROR + self.result.message = str(exc) + return self.result + + data.serviceability = block + self._append_afid_sag_metadata_artifact(block) + self._log_serviceability_solutions(block) + hub_label = args.hub_display_name or args.hub_python_module + self.result.status = ExecutionStatus.OK + cper_summary = "" + if cper_data: + cper_summary = f", {len(cper_data)} decoded CPER(s)" + elif cper_raw_to_decode: + cper_summary = f", {len(cper_raw_to_decode)} CPER attachment(s) not decoded" + elif data.cper_raw: + cper_summary = f", {len(data.cper_raw)} CPER attachment(s) omitted (ACA on log entry)" + ver_bits: list[str] = [] + if block.hub_version: + ver_bits.append(f"hub {block.hub_version}") + if block.afid_sag_file_version: + ver_bits.append(f"AFID_SAG {block.afid_sag_file_version}") + ver_suffix = f" [{'; '.join(ver_bits)}]" if ver_bits else "" + self.result.message = ( + f"{hub_label}: {len(block.solution)} solution(s) " + f"from {len(data.rf_events)} Redfish event(s){cper_summary}{ver_suffix}" + ) + return self.result + + @staticmethod + def _cper_raw_needing_decode(data: ServiceabilityDataModel) -> dict[str, str]: + """Subset of ``cper_raw`` that still needs configured CPER decode (not already on the log).""" + raw = data.cper_raw or {} + if not raw: + return {} + by_id: dict[str, dict[str, Any]] = {} + for member in data.rf_events: + if not isinstance(member, dict): + continue + eid = member.get("Id") + if eid is not None: + by_id[str(eid)] = member + out: 
dict[str, str] = {} + for event_id, blob in raw.items(): + ev = by_id.get(str(event_id)) + if ev is not None and should_skip_cper_fetch_or_decode(ev): + continue + out[str(event_id)] = blob + return out + + def _append_afid_sag_metadata_artifact(self, block: ServiceabilityBlock) -> None: + if block.afid_sag_metadata is None: + return + self.result.artifacts.append( + AfidSagMetadataArtifact(metadata=dict(block.afid_sag_metadata)) + ) + + def _log_serviceability_solutions(self, block: ServiceabilityBlock) -> None: + parent = self.parent or self.__class__.__name__ + for line in format_serviceability_solution_lines(block): + self.logger.info("(%s) %s", parent, line) diff --git a/nodescraper/plugins/serviceability/mi3xx/mi3xx_collector.py b/nodescraper/plugins/serviceability/mi3xx/mi3xx_collector.py new file mode 100644 index 00000000..8921796c --- /dev/null +++ b/nodescraper/plugins/serviceability/mi3xx/mi3xx_collector.py @@ -0,0 +1,170 @@ +############################################################################### +# +# MIT License +# +# Copyright (c) 2026 Advanced Micro Devices, Inc. +# +# Permission is hereby granted, free of charge, to any person obtaining a copy +# of this software and associated documentation files (the "Software"), to deal +# in the Software without restriction, including without limitation the rights +# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +# copies of the Software, and to permit persons to whom the Software is +# furnished to do so, subject to the following conditions: +# +# The above copyright notice and this permission notice shall be included in all +# copies or substantial portions of the Software. +# +# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. 
IN NO EVENT SHALL THE +# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE +# SOFTWARE. +# +############################################################################### +from __future__ import annotations + +import base64 +from typing import Any, Optional + +from nodescraper.plugins.serviceability.serviceability_collector import ( + ServiceabilityCollectorBase, +) +from nodescraper.plugins.serviceability.serviceability_data import DeviceInfo +from nodescraper.plugins.serviceability.time_utils import satisfies_time_check + +from .mi3xx_collector_args import MI3XXCollectorArgs +from .mi3xx_cper_utils import RF_CPER_AFID_MIN, should_skip_cper_fetch_or_decode + +_EVENT_TIMESTAMP_KEYS = ("Created", "EventTimestamp", "Timestamp") + + +class MI3XXCollector(ServiceabilityCollectorBase[MI3XXCollectorArgs]): + """MI3XX OOB Redfish serviceability collector.""" + + def satisfies_reference_time( + self, + candidate: str, + args: MI3XXCollectorArgs, + ) -> bool: + """Test a timestamp against optional reference-time filter settings.""" + if args.reference_time is None or args.time_operator is None: + return True + return satisfies_time_check(candidate, args.reference_time, args.time_operator) + + def filter_event_members( + self, + members: list[Any], + args: MI3XXCollectorArgs, + ) -> list[Any]: + filtered: list[Any] = [] + for member in members: + if not isinstance(member, dict): + filtered.append(member) + continue + timestamp = self._event_timestamp(member) + if timestamp is None or self.satisfies_reference_time(timestamp, args): + filtered.append(member) + return filtered + + def is_cper_event(self, event: dict) -> bool: + if "CPER" in event: + return True + if str(event.get("DiagnosticDataType", "")).upper() == "CPER": + return True + if event.get("AdditionalDataURI"): + return True + 
message_id = str(event.get("MessageId", "")).lower() + message = str(event.get("Message", "")).lower() + return "cper" in message_id or "cper" in message or "diagnostic" in message_id + + def collect_cper_attachments(self, rf_events: list[Any]) -> dict[str, str]: + """Fetch CPER binaries from BMC; decoding runs in the analyzer.""" + parent = self.parent or self.__class__.__name__ + attachments: dict[str, str] = {} + for event in rf_events: + if not isinstance(event, dict) or not self.is_cper_event(event): + continue + uri = event.get("AdditionalDataURI") + event_id = event.get("Id") + if not uri or not event_id: + continue + + if should_skip_cper_fetch_or_decode(event): + self.logger.info( + "(%s) Skipping CPER attachment fetch for Redfish event %s " + "(ACA decode already on log entry; AFID<%s check or no serial)", + parent, + event_id, + RF_CPER_AFID_MIN, + ) + continue + + try: + resp = self.connection.get_response(uri) + except Exception as exc: # noqa: BLE001 + self.logger.warning( + "(%s) Failed to fetch CPER attachment for event %s: %s", + parent, + event_id, + exc, + ) + continue + if not resp.ok: + self.logger.warning( + "(%s) Failed to fetch CPER attachment for event %s: HTTP %s", + parent, + event_id, + resp.status_code, + ) + continue + + size_bytes = len(resp.content) + attachments[str(event_id)] = base64.b64encode(resp.content).decode("ascii") + self.logger.info( + "(%s) Fetched CPER attachment for Redfish event %s (%d bytes)", + parent, + event_id, + size_bytes, + ) + + if attachments: + self.logger.info( + "(%s) Collected %d CPER attachment(s) for analyzer decode", + parent, + len(attachments), + ) + return attachments + + def parse_assembly_entry( + self, + designation: str, + assembly_member_entry: dict[str, Any], + args: MI3XXCollectorArgs, + ) -> DeviceInfo: + return DeviceInfo( + name=assembly_member_entry.get("Name") or designation, + part_number=assembly_member_entry.get("PartNumber"), + 
production_date=assembly_member_entry.get("ProductionDate"), + serial_number=assembly_member_entry.get("SerialNumber"), + version=assembly_member_entry.get("Version"), + ) + + def extract_component_details( + self, + firmware_inventory_payload: dict[str, Any], + args: MI3XXCollectorArgs, + ) -> Optional[str]: + details = firmware_inventory_payload.get("Details") + if details is not None: + return str(details) + return None + + @staticmethod + def _event_timestamp(event: dict[str, Any]) -> Optional[str]: + for key in _EVENT_TIMESTAMP_KEYS: + value = event.get(key) + if value is not None and str(value).strip(): + return str(value).strip() + return None diff --git a/nodescraper/plugins/serviceability/mi3xx/mi3xx_collector_args.py b/nodescraper/plugins/serviceability/mi3xx/mi3xx_collector_args.py new file mode 100644 index 00000000..8d35cd2e --- /dev/null +++ b/nodescraper/plugins/serviceability/mi3xx/mi3xx_collector_args.py @@ -0,0 +1,172 @@ +############################################################################### +# +# MIT License +# +# Copyright (c) 2026 Advanced Micro Devices, Inc. +# +# Permission is hereby granted, free of charge, to any person obtaining a copy +# of this software and associated documentation files (the "Software"), to deal +# in the Software without restriction, including without limitation the rights +# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +# copies of the Software, and to permit persons to whom the Software is +# furnished to do so, subject to the following conditions: +# +# The above copyright notice and this permission notice shall be included in all +# copies or substantial portions of the Software. +# +# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. 
IN NO EVENT SHALL THE +# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE +# SOFTWARE. +# +############################################################################### +from __future__ import annotations + +from typing import List, Optional + +from pydantic import Field, field_validator, model_validator + +from nodescraper.models import CollectorArgs +from nodescraper.plugins.serviceability.time_utils import ( + TimeOperator, + is_valid_iso_datetime, +) + + +class MI3XXCollectorArgs(CollectorArgs): + """MI3XX OOB Redfish serviceability collector arguments.""" + + uri: Optional[str] = Field( + default=None, + description=( + "Optional alias for ``rf_event_log_uri``. When both ``uri`` and ``rf_event_log_uri`` " + "are explicitly set to non-empty values, ``uri`` wins." + ), + ) + rf_event_log_uri: str = Field( + default="/redfish/v1/Systems/UBB/LogServices/EventLog/Entries", + description="Redfish URI for the event log ``Entries`` collection.", + ) + rf_chassis_devices: Optional[List[str]] = Field( + default=None, + description="Chassis designations for Assembly GETs; required with ``rf_assembly_uri_template``.", + ) + rf_assembly_uri_template: Optional[str] = Field( + default=None, + description="Redfish URI template containing ``{device}`` for each chassis Assembly resource.", + ) + rf_firmware_bundle_uri: Optional[str] = Field( + default=None, + description="Redfish URI for firmware bundle inventory when subclasses extract component details.", + ) + follow_next_link: bool = Field( + default=True, + description="If True, follow Members@odata.nextLink up to max_pages; else single GET.", + ) + max_pages: int = Field( + default=200, + ge=1, + le=10_000, + description="Safety cap on the number of pages when following event log pagination.", + ) + top: Optional[int] = Field( + 
default=None, + ge=1, + description="Most recent N entries via $skip after count probe; None collects full window.", + ) + reference_time: Optional[str] = Field( + default=None, + description=( + "Optional ISO-8601 date or date-time used with time_operator " + "(e.g. 2026-05-17 or 2026-05-17T13:01:00)." + ), + ) + time_operator: Optional[TimeOperator] = Field( + default=None, + description="Comparison operator applied when reference_time is set.", + ) + + @field_validator("rf_event_log_uri") + @classmethod + def _strip_rf_event_log_uri(cls, value: object) -> str: + text = str(value).strip() + if not text: + raise ValueError("rf_event_log_uri must be a non-empty Redfish URI") + return text + + @field_validator("reference_time") + @classmethod + def _validate_reference_time_iso(cls, value: Optional[str]) -> Optional[str]: + if value is None: + return None + text = str(value).strip() + if not text: + raise ValueError("reference_time must be a non-empty ISO-8601 string") + if not is_valid_iso_datetime(text): + raise ValueError(f"reference_time is not ISO-8601 compliant: {value!r}") + return text + + @model_validator(mode="after") + def _require_event_log_uri(self) -> MI3XXCollectorArgs: + if not self.resolved_event_log_uri(): + raise ValueError( + "Provide a non-empty rf_event_log_uri or uri for the event log collection." + ) + return self + + @model_validator(mode="after") + def _assembly_consistency(self) -> MI3XXCollectorArgs: + has_tpl = bool( + self.rf_assembly_uri_template and "{device}" in self.rf_assembly_uri_template + ) + has_dev = bool(self.rf_chassis_devices) + if has_tpl != has_dev: + raise ValueError( + "Provide both rf_assembly_uri_template (with '{device}') and rf_chassis_devices, " + "or omit both to skip assembly collection." 
+ ) + return self + + @model_validator(mode="after") + def _reference_time_requires_operator(self) -> MI3XXCollectorArgs: + has_ref = self.reference_time is not None + has_op = self.time_operator is not None + if has_ref != has_op: + raise ValueError("Provide both reference_time and time_operator, or omit both.") + return self + + @classmethod + def default_event_log_uri(cls) -> str: + """Return the built-in default for ``rf_event_log_uri`` (reads the field default; no duplicate constant).""" + raw = cls.model_fields["rf_event_log_uri"].default + if not isinstance(raw, str): + raise TypeError("rf_event_log_uri field default must be a str") + return raw + + def resolved_event_log_uri(self) -> str: + """Resolve the event log ``Entries`` URI from ``uri`` and ``rf_event_log_uri``.""" + uri_set = "uri" in self.model_fields_set + rf_set = "rf_event_log_uri" in self.model_fields_set + + def _strip(value: Optional[str]) -> str: + if value is None: + return "" + return str(value).strip() + + uri_s = _strip(self.uri) + rf_s = _strip(self.rf_event_log_uri) + + if uri_set and rf_set and uri_s and rf_s: + return uri_s + if rf_set: + return rf_s + if uri_set and uri_s: + return uri_s + if uri_set and not uri_s and not rf_set: + return rf_s + if not uri_set and not rf_set: + return rf_s + return "" diff --git a/nodescraper/plugins/serviceability/mi3xx/mi3xx_cper_utils.py b/nodescraper/plugins/serviceability/mi3xx/mi3xx_cper_utils.py new file mode 100644 index 00000000..fe9661dc --- /dev/null +++ b/nodescraper/plugins/serviceability/mi3xx/mi3xx_cper_utils.py @@ -0,0 +1,117 @@ +############################################################################### +# +# MIT License +# +# Copyright (c) 2026 Advanced Micro Devices, Inc. 
+# +# Permission is hereby granted, free of charge, to any person obtaining a copy +# of this software and associated documentation files (the "Software"), to deal +# in the Software without restriction, including without limitation the rights +# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +# copies of the Software, and to permit persons to whom the Software is +# furnished to do so, subject to the following conditions: +# +# The above copyright notice and this permission notice shall be included in all +# copies or substantial portions of the Software. +# +# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE +# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE +# SOFTWARE. +# +############################################################################### +from __future__ import annotations + +from typing import Any + +# Redfish CPER (RF) style AFIDs start at this value; lower values are in-band / +# OEM-field AFIDs already reflected on the log entry. 
+RF_CPER_AFID_MIN = 10000 + +_SERIAL_KEYS = ("SerialNumber", "serial_number", "UbbSerial", "ubb_serial") + + +def event_afids_from_oem(event: dict[str, Any]) -> list[int]: + """AFIDs from ``Oem.AMDFieldIdentifiers`` (or similar list-of-dicts).""" + oem = event.get("Oem") + if not isinstance(oem, dict): + return [] + raw = oem.get("AMDFieldIdentifiers") + if not isinstance(raw, list): + return [] + out: list[int] = [] + for item in raw: + if not isinstance(item, dict): + continue + for key in ("AFID", "Afid", "afid"): + if key in item and item[key] is not None: + try: + out.append(int(item[key])) + except (TypeError, ValueError): + pass + break + return out + + +def _err_data_arr_entries(event: dict[str, Any]) -> list[dict[str, Any]]: + oem = event.get("Oem") + if not isinstance(oem, dict): + return [] + arr = oem.get("ErrDataArr") + if not isinstance(arr, list): + return [] + return [e for e in arr if isinstance(e, dict)] + + +def event_has_aca_decode(event: dict[str, Any]) -> bool: + """True when the log entry includes ACA-style ``DecodedData`` under ``ErrDataArr``.""" + for entry in _err_data_arr_entries(event): + decoded = entry.get("DecodedData") + if isinstance(decoded, dict) and decoded: + return True + return False + + +def _nonempty_serial_in_mapping(obj: Any) -> bool: + if not isinstance(obj, dict): + return False + for key in _SERIAL_KEYS: + val = obj.get(key) + if val is not None and str(val).strip(): + return True + return False + + +def event_aca_includes_serial(event: dict[str, Any]) -> bool: + """Serial (or UBB serial) present on any ``ErrDataArr`` row (typically ``MetaData``).""" + for entry in _err_data_arr_entries(event): + meta = entry.get("MetaData") + if _nonempty_serial_in_mapping(meta): + return True + decoded = entry.get("DecodedData") + if _nonempty_serial_in_mapping(decoded): + return True + return False + + +def should_skip_cper_fetch_or_decode(event: dict[str, Any]) -> bool: + """Whether to omit CPER binary fetch and configured CPER 
decode for this Redfish member. + + Skip when: + + * Every OEM-listed AFID is below ``RF_CPER_AFID_MIN`` (non-RF CPER range), + ACA ``DecodedData`` is present, and a serial is present on the entry; or + * ACA ``DecodedData`` is present but no serial — the CPER blob does not add + actionable identity beyond what is already missing from the log. + """ + if not event_has_aca_decode(event): + return False + if not event_aca_includes_serial(event): + return True + afids = event_afids_from_oem(event) + if not afids: + return False + return all(afid < RF_CPER_AFID_MIN for afid in afids) diff --git a/nodescraper/plugins/serviceability/mi3xx/mi3xx_data.py b/nodescraper/plugins/serviceability/mi3xx/mi3xx_data.py new file mode 100644 index 00000000..17a60eaa --- /dev/null +++ b/nodescraper/plugins/serviceability/mi3xx/mi3xx_data.py @@ -0,0 +1,186 @@ +############################################################################### +# +# MIT License +# +# Copyright (c) 2026 Advanced Micro Devices, Inc. +# +# Permission is hereby granted, free of charge, to any person obtaining a copy +# of this software and associated documentation files (the "Software"), to deal +# in the Software without restriction, including without limitation the rights +# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +# copies of the Software, and to permit persons to whom the Software is +# furnished to do so, subject to the following conditions: +# +# The above copyright notice and this permission notice shall be included in all +# copies or substantial portions of the Software. +# +# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. 
IN NO EVENT SHALL THE +# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE +# SOFTWARE. +# +############################################################################### +from __future__ import annotations + +import json +import os +from typing import Any, Dict, List, Optional + +from pydantic import BaseModel, Field + +from nodescraper.models import DataModel + + +class MI3XXDeviceInfo(BaseModel): + """Device identity with separate board and product fields.""" + + board_product_name: Optional[str] = Field( + default=None, + description="Board product name (IPMI board information area).", + ) + board_part_number: Optional[str] = Field( + default=None, + description="Board part number.", + ) + board_serial_number: Optional[str] = Field( + default=None, + description="Board serial number.", + ) + board_manufacturing_date: Optional[str] = Field( + default=None, + description=( + "Board manufacturing date as a rendered string " + "(not IPMI minutes-since-1996 encoding)." 
+ ), + ) + product_name: Optional[str] = Field( + default=None, + description="Product name (IPMI product information area).", + ) + product_part_number: Optional[str] = Field( + default=None, + description="Product part or model number.", + ) + product_serial_number: Optional[str] = Field( + default=None, + description="Product serial number.", + ) + product_version: Optional[str] = Field( + default=None, + description="Product version (no board-area equivalent in IPMI FRU).", + ) + oem_extensions: Dict[str, Any] = Field( + default_factory=dict, + description=("Vendor-specific fields: extra board/product data, multirecord, etc."), + ) + + +class MI3XXResult(BaseModel): + """Structured serviceability report output.""" + + node: Optional[str] = None + node_scraper_version: Optional[str] = Field( + default=None, + description="Version of amd-node-scraper that produced this report.", + ) + plugin_name: Optional[str] = Field( + default=None, + description="Name of the serviceability plugin that produced this report.", + ) + plugin_version: Optional[str] = Field( + default=None, + description="Version of the serviceability plugin that produced this report.", + ) + reporter_extensions: Dict[str, str] = Field( + default_factory=dict, + description="Additional tool versions keyed by name.", + ) + service_recommendations: Dict[str, List[dict]] = Field(default_factory=dict) + service_action_definitions: Dict[str, dict] = Field(default_factory=dict) + afid_sag_metadata: Dict[str, Any] = Field(default_factory=dict) + node_info: Dict[str, Any] = Field(default_factory=dict) + extensions: Dict[str, Any] = Field( + default_factory=dict, + description="Additional implementation-specific fields.", + ) + + +def build_mi3xx_reporting_version_fields( + *, + plugin_name: Optional[str] = None, + plugin_version: Optional[str] = None, + node_scraper_version: Optional[str] = None, + **reporter_extensions: str, +) -> Dict[str, Any]: + """Build keyword arguments for result versioning fields. 
+ + Args: + plugin_name: Name of the reporting plugin. + plugin_version: Version of the reporting plugin. + node_scraper_version: Node scraper version; defaults to the installed package version. + reporter_extensions: Additional tool versions as keyword arguments. + + Returns: + Dictionary of versioning fields for a result model. + """ + import nodescraper + + return { + "node_scraper_version": node_scraper_version or nodescraper.__version__, + "plugin_name": plugin_name, + "plugin_version": plugin_version, + "reporter_extensions": dict(reporter_extensions), + } + + +class MI3XXDataModel(DataModel): + """Collected OOB Redfish serviceability data model.""" + + collected_data: Dict[str, Any] = Field( + default_factory=dict, + description="Arbitrary keyed payloads from the collector implementation.", + ) + device_info: Dict[str, MI3XXDeviceInfo] = Field( + default_factory=dict, + description="Optional device identity keyed by implementer-defined labels.", + ) + artifacts: Dict[str, Any] = Field( + default_factory=dict, + description="Filename to JSON-serializable payload for log_model output.", + ) + endpoint: Optional[str] = Field( + default=None, + description="Optional host or service endpoint label (not necessarily a BMC).", + ) + log_path: Optional[str] = None + result: Optional[MI3XXResult] = None + + def log_model(self, log_path: str) -> None: + """Write artifact files and a JSON summary under the log directory. + + Args: + log_path: Directory path for output files. + + Returns: + None. 
+ """ + os.makedirs(log_path, exist_ok=True) + for filename, payload in self.artifacts.items(): + if not filename or not str(filename).strip(): + continue + artifact_path = os.path.join(log_path, str(filename).strip()) + with open(artifact_path, "w", encoding="utf-8") as handle: + json.dump(payload, handle, indent=2) + summary_path = os.path.join(log_path, "MI3XX_data.json") + with open(summary_path, "w", encoding="utf-8") as handle: + json.dump( + self.model_dump( + exclude={"artifacts"}, + mode="json", + ), + handle, + indent=2, + ) diff --git a/nodescraper/plugins/serviceability/mi3xx/serviceability_plugin_mi3xx.py b/nodescraper/plugins/serviceability/mi3xx/serviceability_plugin_mi3xx.py new file mode 100644 index 00000000..d578d949 --- /dev/null +++ b/nodescraper/plugins/serviceability/mi3xx/serviceability_plugin_mi3xx.py @@ -0,0 +1,51 @@ +############################################################################### +# +# MIT License +# +# Copyright (c) 2026 Advanced Micro Devices, Inc. +# +# Permission is hereby granted, free of charge, to any person obtaining a copy +# of this software and associated documentation files (the "Software"), to deal +# in the Software without restriction, including without limitation the rights +# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +# copies of the Software, and to permit persons to whom the Software is +# furnished to do so, subject to the following conditions: +# +# The above copyright notice and this permission notice shall be included in all +# copies or substantial portions of the Software. +# +# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. 
IN NO EVENT SHALL THE +# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE +# SOFTWARE. +# +############################################################################### +from nodescraper.plugins.serviceability.analyzer_args import ServiceabilityAnalyzerArgs +from nodescraper.plugins.serviceability.serviceability_data import ( + ServiceabilityDataModel, +) +from nodescraper.plugins.serviceability.serviceability_plugin_base import ( + ServiceabilityPluginBase, +) +from nodescraper.utils import register_log_dir_name + +from .mi3xx_analyzer import MI3XXAnalyzer +from .mi3xx_collector import MI3XXCollector +from .mi3xx_collector_args import MI3XXCollectorArgs + +register_log_dir_name("ServiceabilityPluginMI3XX", "serviceability_plugin_MI3XX") +register_log_dir_name("MI3XXCollector", "MI3XX_collector") +register_log_dir_name("MI3XXAnalyzer", "MI3XX_analyzer") + + +class ServiceabilityPluginMI3XX(ServiceabilityPluginBase): + """MI3XX OOB Redfish serviceability: BMC event log, CPER attachments, and service hub analysis.""" + + DATA_MODEL = ServiceabilityDataModel + COLLECTOR = MI3XXCollector + ANALYZER = MI3XXAnalyzer + COLLECTOR_ARGS = MI3XXCollectorArgs + ANALYZER_ARGS = ServiceabilityAnalyzerArgs diff --git a/nodescraper/plugins/serviceability/se_adapter.py b/nodescraper/plugins/serviceability/se_adapter.py new file mode 100644 index 00000000..bea1d4a0 --- /dev/null +++ b/nodescraper/plugins/serviceability/se_adapter.py @@ -0,0 +1,344 @@ +############################################################################### +# +# MIT License +# +# Copyright (c) 2026 Advanced Micro Devices, Inc. 
+# +# Permission is hereby granted, free of charge, to any person obtaining a copy +# of this software and associated documentation files (the "Software"), to deal +# in the Software without restriction, including without limitation the rights +# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +# copies of the Software, and to permit persons to whom the Software is +# furnished to do so, subject to the following conditions: +# +# The above copyright notice and this permission notice shall be included in all +# copies or substantial portions of the Software. +# +# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE +# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE +# SOFTWARE. +# +############################################################################### +"""Map serviceability plugin models to/from Python service hub results.""" +from __future__ import annotations + +import json +from collections import defaultdict +from typing import Any, Dict, List, Optional, Tuple + +from .se_models import AfidEvent, ServiceabilityBlock, ServiceabilitySolution + +# Hub payload keys commonly holding a one-line human summary (not raw OEM metadata). +_SUMMARY_VALUE_KEYS: Tuple[str, ...] = ( + "short_service", + "short_service_info", + "summary", + "message", + "title", + "recommendation", + "solution", + "service_recommendation", + "action", +) +_UNIT_LABEL_KEYS: Tuple[str, ...] 
= ( + "oem", + "OEM", + "unit", + "serviceable_unit", + "designation", + "chassis", + "device", +) + + +def _hub_version_display(version_info: Any) -> Optional[str]: + """Pick a single hub version string from common hub result version dict layouts.""" + if not isinstance(version_info, dict) or not version_info: + return None + primary = ( + version_info.get("isa_version") + or version_info.get("version") + or version_info.get("engine_version") + or version_info.get("VERSION") + ) + if primary is None: + return None + text = str(primary).strip() + if not text: + return None + bd = version_info.get("build_date") + if bd and str(bd).strip(): + return f"{text} (build {str(bd).strip()})" + return text + + +def _afid_sag_file_version_display(metadata: Any) -> Optional[str]: + """Build a short AFID_SAG file identity string from hub ``afid_sag_metadata``.""" + if not isinstance(metadata, dict) or not metadata: + return None + pid = metadata.get("sag_pid") or metadata.get("pid") + rev = metadata.get("sag_revision") or metadata.get("revision") + extra = ( + metadata.get("sag_version") + or metadata.get("file_version") + or metadata.get("schema_version") + ) + parts: list[str] = [] + if pid and str(pid).strip(): + parts.append(f"PID {str(pid).strip()}") + if rev and str(rev).strip(): + parts.append(f"revision {str(rev).strip()}") + if extra and str(extra).strip(): + ex = str(extra).strip() + if ex not in (str(pid or "").strip(), str(rev or "").strip()): + parts.append(f"version {ex}") + if not parts: + return None + return ", ".join(parts) + + +def _human_summary_line_from_hub_value(value: Any) -> Optional[str]: + """Pick a single human-readable line from a hub fragment (string, number, or dict).""" + if value is None: + return None + if isinstance(value, str): + text = value.strip() + return text or None + if isinstance(value, (int, float)) and not isinstance(value, bool): + return str(value).strip() or None + if isinstance(value, dict): + for key in _SUMMARY_VALUE_KEYS: + 
if key not in value: + continue + got = _human_summary_line_from_hub_value(value[key]) + if got: + return got + for key in ("service_action", "ServiceAction"): + if key not in value: + continue + raw = value[key] + if isinstance(raw, dict): + inner = ( + raw.get("title") + or raw.get("text") + or raw.get("name") + or raw.get("service_action") + ) + if isinstance(inner, str) and inner.strip(): + return inner.strip() + got = _human_summary_line_from_hub_value(raw) + if got: + return got + else: + s = str(raw).strip() + if s: + return s + for alt in ("text", "name", "description", "details"): + if isinstance(value.get(alt), str) and str(value[alt]).strip(): + return str(value[alt]).strip() + return None + text = str(value).strip() + return text or None + + +def _unit_label_from_short_service_item(item: dict[str, Any]) -> str: + for key in _UNIT_LABEL_KEYS: + raw = item.get(key) + if raw is not None and str(raw).strip(): + return str(raw).strip() + return "" + + +def _maybe_unwrap_outer_unit_map(d: dict[str, Any]) -> dict[str, Any]: + """If the hub wraps {wrapper: {unit: {...}}}, return the inner unit map.""" + if len(d) != 1: + return d + _, inner = next(iter(d.items())) + if isinstance(inner, dict) and inner and all(isinstance(v, dict) for v in inner.values()): + return inner + return d + + +def _merged_short_service_lines_from_unit_messages(entries: List[Tuple[str, str]]) -> List[str]: + """Group (unit, message) rows by message; merge units when the message is identical.""" + by_message: dict[str, list[str]] = defaultdict(list) + for unit, msg in entries: + if not msg: + continue + by_message[msg].append(unit or "") + + lines: list[str] = [] + for msg in sorted(by_message.keys(), key=lambda m: (-len(by_message[m]), m.lower())): + units = sorted({u for u in by_message[msg] if u}) + if len(units) <= 1: + u = units[0] if units else "" + lines.append(f"{msg} ({u})" if u else msg) + else: + lines.append(f"{msg} — OEMs/units: {', '.join(units)}") + return lines + + +def 
_format_short_service_info_for_block(raw: Any) -> Optional[str]: + """Turn hub ``short_service_info`` into multiline log/LLM text (no JSON dump of unit maps).""" + if raw is None: + return None + if isinstance(raw, str): + text = raw.strip() + return text or None + if isinstance(raw, (list, tuple)): + if raw and all(isinstance(x, dict) for x in raw): + entries: list[tuple[str, str]] = [] + for item in raw: + assert isinstance(item, dict) + unit = _unit_label_from_short_service_item(item) + msg = _human_summary_line_from_hub_value( + item + ) or _human_summary_line_from_hub_value(item.get("short_service_info")) + if msg: + entries.append((unit, msg)) + lines = _merged_short_service_lines_from_unit_messages(entries) + out = "\n".join(lines).strip() + return out or None + parts = [str(x).strip() for x in raw if x is not None and str(x).strip()] + return "\n".join(parts) if parts else None + if isinstance(raw, dict): + d = _maybe_unwrap_outer_unit_map(raw) + if d and all(isinstance(v, dict) for v in d.values()): + entries = [] + for unit_key, inner in d.items(): + msg = _human_summary_line_from_hub_value(inner) + if msg: + entries.append((str(unit_key).strip(), msg)) + lines = _merged_short_service_lines_from_unit_messages(entries) + out = "\n".join(lines).strip() + if out: + return out + flat_lines: list[str] = [] + for key in sorted(d.keys(), key=lambda x: str(x).lower()): + val = d[key] + if isinstance(val, dict): + msg = _human_summary_line_from_hub_value(val) + if msg: + flat_lines.append(f"{key}: {msg}") + elif val is not None and str(val).strip(): + flat_lines.append(f"{key}: {str(val).strip()}") + if flat_lines: + return "\n".join(flat_lines) + try: + compact = json.dumps(d, sort_keys=True) + except TypeError: + compact = str(d) + compact = compact.strip() + return compact or None + text = str(raw).strip() + return text or None + + +def format_serviceability_solution_lines(block: ServiceabilityBlock) -> list[str]: + """Human-readable lines for logging or 
console output.""" + lines: list[str] = [] + if block.short_service_info: + lines.append("short_service_info:") + for part in block.short_service_info.splitlines(): + lines.append(f" {part}" if part else " ") + lines.append("") + if block.solution_reasoning: + lines.append(block.solution_reasoning) + if block.hub_version: + lines.append(f"Hub version: {block.hub_version}") + if block.afid_sag_file_version: + lines.append(f"AFID_SAG file: {block.afid_sag_file_version}") + if not block.solution: + lines.append("No service actions recommended.") + return lines + for index, solution in enumerate(block.solution, start=1): + units = ", ".join(solution.serviceable_unit) + title = (solution.service_action_title or "").strip() + action = f"service action {solution.service_action_num}" + if title: + action = f"{action} ({title})" + lines.append(f"[{index}] AFID {solution.afid}, {action}, units: [{units}]") + return lines + + +def serviceability_block_from_service_result( + afid_events: list[AfidEvent], + result: Any, + *, + hub_label: str = "Service hub", + rf_event_count: int = 0, +) -> ServiceabilityBlock: + """Build a ``ServiceabilityBlock`` from a hub result with ``service_info``.""" + grouped: dict[tuple[int, int], list[str]] = defaultdict(list) + titles: dict[tuple[int, int], str] = {} + service_info = getattr(result, "service_info", None) or {} + + def _action_title(info: dict[str, Any]) -> str: + raw = info.get("title") or info.get("service_action") or info.get("ServiceAction") + if raw is None: + return "" + if isinstance(raw, dict): + return str(raw.get("title") or raw.get("text") or raw.get("name") or "").strip() + return str(raw).strip() + + for designation, afid_map in service_info.items(): + if not isinstance(afid_map, dict): + continue + unit = str(designation).strip() if designation is not None else "" + for afid_raw, info in afid_map.items(): + if not isinstance(info, dict): + continue + san_raw = info.get("service_action_number") + if san_raw is None: + 
continue + try: + afid = int(afid_raw) + san = int(san_raw) + except (TypeError, ValueError): + continue + key = (afid, san) + if unit and unit not in grouped[key]: + grouped[key].append(unit) + label = _action_title(info) + if label and key not in titles: + titles[key] = label + + solutions = [ + ServiceabilitySolution( + afid=afid, + serviceable_unit=units, + service_action_num=san, + service_action_title=titles.get((afid, san)), + ) + for (afid, san), units in sorted(grouped.items()) + ] + raw_metadata = getattr(result, "afid_sag_metadata", None) + metadata: Dict[str, Any] = raw_metadata if isinstance(raw_metadata, dict) else {} + version_info = ( + getattr(result, "engine_version_info", None) + or getattr(result, "isa_version_info", None) + or getattr(result, "version_info", None) + or {} + ) + hub_version = _hub_version_display(version_info) + afid_sag_file_version = _afid_sag_file_version_display(metadata) + reasoning = ( + f"{hub_label}: {len(solutions)} recommendation(s) from {rf_event_count} Redfish event(s)." + ) + meta_out: Optional[dict[str, Any]] = dict(metadata) if isinstance(raw_metadata, dict) else None + short_service_info = _format_short_service_info_for_block( + getattr(result, "short_service_info", None) + ) + return ServiceabilityBlock( + afid_events=list(afid_events), + solution=solutions, + solution_reasoning=reasoning, + hub_version=hub_version, + afid_sag_file_version=afid_sag_file_version, + afid_sag_metadata=meta_out, + short_service_info=short_service_info, + ) diff --git a/nodescraper/plugins/serviceability/se_models.py b/nodescraper/plugins/serviceability/se_models.py new file mode 100644 index 00000000..6aa855a3 --- /dev/null +++ b/nodescraper/plugins/serviceability/se_models.py @@ -0,0 +1,102 @@ +############################################################################### +# +# MIT License +# +# Copyright (c) 2026 Advanced Micro Devices, Inc. 
+# +# Permission is hereby granted, free of charge, to any person obtaining a copy +# of this software and associated documentation files (the "Software"), to deal +# in the Software without restriction, including without limitation the rights +# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +# copies of the Software, and to permit persons to whom the Software is +# furnished to do so, subject to the following conditions: +# +# The above copyright notice and this permission notice shall be included in all +# copies or substantial portions of the Software. +# +# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE +# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE +# SOFTWARE. +# +############################################################################### +from __future__ import annotations + +from typing import Any, List, Optional + +from pydantic import BaseModel, Field, field_validator + + +class AfidEvent(BaseModel): + """One AFID occurrence on a serviceable unit.""" + + afid: int = Field(description="AMD Fault ID.") + serviceable_unit: str = Field( + description="Unit label (e.g. gpu02); standardized per platform.", + ) + time: str = Field( + description="First-occurrence timestamp (SE format, e.g. 
2026-05-07 12:50:42.096-07:00).", + ) + + @field_validator("serviceable_unit") + @classmethod + def _strip_serviceable_unit(cls, value: str) -> str: + text = str(value).strip() + if not text: + raise ValueError("serviceable_unit must be non-empty") + return text + + +class ServiceabilitySolution(BaseModel): + """Recommended service action for an AFID.""" + + afid: int + serviceable_unit: List[str] = Field( + description="Affected serviceable units for this AFID and service action.", + ) + service_action_num: int = Field( + description="Service action number from AFID_SAG.json.", + ) + service_action_title: Optional[str] = Field( + default=None, + description=("Short service action label from the hub."), + ) + + +class ServiceabilityBlock(BaseModel): + """ANC-style serviceability section: SE input, output, and optional reasoning.""" + + afid_events: List[AfidEvent] = Field( + default_factory=list, + description="Summarized AFID events from collected data.", + ) + solution: List[ServiceabilitySolution] = Field( + default_factory=list, + description="Hub output: recommended service actions.", + ) + solution_reasoning: Optional[str] = Field( + default=None, + description="Human-readable summary of recommendations (counts and hub label).", + ) + hub_version: Optional[str] = Field( + default=None, + description="Service hub package/build version string when the hub returned it.", + ) + afid_sag_file_version: Optional[str] = Field( + default=None, + description="AFID_SAG.json identity/revision string when the hub returned metadata.", + ) + afid_sag_metadata: Optional[dict[str, Any]] = Field( + default=None, + description="Hub-reported AFID_SAG metadata dict when the hub exposes afid_sag_metadata.", + ) + short_service_info: Optional[str] = Field( + default=None, + description=( + "Brief hub summary derived from short_service_info (human-readable lines; " + "per-unit dict payloads are collapsed, identical messages merged with unit lists)." 
+ ), + ) diff --git a/nodescraper/plugins/serviceability/se_runner.py b/nodescraper/plugins/serviceability/se_runner.py new file mode 100644 index 00000000..6ff8b60e --- /dev/null +++ b/nodescraper/plugins/serviceability/se_runner.py @@ -0,0 +1,194 @@ +############################################################################### +# +# MIT License +# +# Copyright (c) 2026 Advanced Micro Devices, Inc. +# +# Permission is hereby granted, free of charge, to any person obtaining a copy +# of this software and associated documentation files (the "Software"), to deal +# in the Software without restriction, including without limitation the rights +# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +# copies of the Software, and to permit persons to whom the Software is +# furnished to do so, subject to the following conditions: +# +# The above copyright notice and this permission notice shall be included in all +# copies or substantial portions of the Software. +# +# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE +# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE +# SOFTWARE. 
+# +############################################################################### +"""Invoke a configured Python service hub against collected Redfish events.""" +from __future__ import annotations + +import importlib +import inspect +from pathlib import Path +from typing import Any, Callable, Optional, Type + +from .se_adapter import serviceability_block_from_service_result +from .se_models import AfidEvent, ServiceabilityBlock + + +def _signature_accepts_var_keyword(sig: inspect.Signature) -> bool: + return any(p.kind == inspect.Parameter.VAR_KEYWORD for p in sig.parameters.values()) + + +def _instantiate_hub( + hub_cls: Type[Any], + config_path: str, + init_path_kwarg: str, + hub_options: Optional[dict[str, Any]], +) -> Any: + """Construct the hub with ``config_path`` under ``init_path_kwarg``, plus matching options.""" + init_sig = inspect.signature(hub_cls.__init__) + kwargs: dict[str, Any] = {init_path_kwarg: config_path} + if not hub_options: + return hub_cls(**kwargs) + if _signature_accepts_var_keyword(init_sig): + merged = dict(hub_options) + merged[init_path_kwarg] = config_path + return hub_cls(**merged) + for key, val in hub_options.items(): + if key in init_sig.parameters: + kwargs[key] = val + kwargs[init_path_kwarg] = config_path + return hub_cls(**kwargs) + + +def _call_hub_analyze( + analyze: Callable[..., Any], + rf_events: list[Any], + cper_data: Optional[dict[str, Any]], + hub_options: Optional[dict[str, Any]], +) -> Any: + """Invoke the hub analyze callable with ``cper_data`` and per-parameter ``hub_options``.""" + sig = inspect.signature(analyze) + params = sig.parameters + eo = dict(hub_options or {}) + + if _signature_accepts_var_keyword(sig): + if "cper_data" in params: + eo["cper_data"] = dict(cper_data) if cper_data else None + return analyze(list(rf_events), **eo) + + kw = {k: v for k, v in eo.items() if k in params} + if "cper_data" in params: + kw["cper_data"] = dict(cper_data) if cper_data else None + return 
analyze(list(rf_events), **kw) + + +class SeRunError(RuntimeError): + """Raised when the service hub fails or returns invalid output.""" + + +def run_service_hub( + *, + hub_python_module: str, + hub_display_name: Optional[str] = None, + afid_events: list[AfidEvent], + afid_sag_path: str, + rf_events: list[Any], + cper_data: Optional[dict[str, Any]] = None, + hub_options: Optional[dict[str, Any]] = None, + hub_analyze_method: str = "get_service_info", + hub_init_path_kwarg: str = "afid_sag", +) -> ServiceabilityBlock: + """Run the configured Python service hub and return a :class:`ServiceabilityBlock`. + + The runner imports ``hub_python_module``, picks the unique class that implements + ``hub_analyze_method``, constructs it with the config file path passed as + ``hub_init_path_kwarg``, then calls the analyze method with ``rf_events`` and any + ``hub_options`` keys that match the method signature (plus ``cper_data`` when + supported). Result mapping is handled by :func:`serviceability_block_from_service_result`. + """ + sag_path = Path(afid_sag_path) + if not sag_path.is_file(): + raise SeRunError(f"Hub config file not found: {afid_sag_path}") + + if not rf_events: + raise SeRunError( + "Collected Redfish events are required; re-run collection or use skip_hub." 
+ ) + + label = hub_display_name or hub_python_module + try: + mod = importlib.import_module(hub_python_module) + except ImportError as exc: + raise SeRunError(f"Cannot import {hub_python_module}: {exc}") from exc + + hub_cls = _resolve_hub_class(mod, hub_analyze_method) + + try: + instance = _instantiate_hub( + hub_cls, + afid_sag_path, + hub_init_path_kwarg, + hub_options, + ) + analyze = getattr(instance, hub_analyze_method) + result = _call_hub_analyze( + analyze, + rf_events, + cper_data, + hub_options, + ) + except Exception as exc: + raise SeRunError(f"{label} {hub_analyze_method}() failed: {exc}") from exc + + if result is None: + return ServiceabilityBlock( + afid_events=list(afid_events), + solution=[], + solution_reasoning=f"{label}: no service actions after event filtering.", + ) + + return serviceability_block_from_service_result( + afid_events, + result, + hub_label=label, + rf_event_count=len(rf_events), + ) + + +def _is_hub_class(obj: Any, analyze_method: str = "get_service_info") -> bool: + return inspect.isclass(obj) and callable(getattr(obj, analyze_method, None)) + + +def _resolve_hub_class(mod: Any, analyze_method: str = "get_service_info") -> Type[Any]: + """Find the hub class in ``mod`` that implements ``analyze_method``.""" + package = mod.__name__ + candidates: list[Type[Any]] = [] + seen: set[int] = set() + + def add_candidate(obj: Any) -> None: + if not _is_hub_class(obj, analyze_method): + return + key = id(obj) + if key in seen: + return + seen.add(key) + candidates.append(obj) + + for name in getattr(mod, "__all__", []) or []: + add_candidate(getattr(mod, name, None)) + + for _, obj in inspect.getmembers(mod, inspect.isclass): + obj_module = getattr(obj, "__module__", "") + if obj_module == package or obj_module.startswith(f"{package}."): + add_candidate(obj) + + if len(candidates) == 1: + return candidates[0] + if not candidates: + raise SeRunError( + f"No class with {analyze_method}() found in {package}; " + "check hub_python_module 
and hub_analyze_method in analysis_args." + ) + names = ", ".join(cls.__name__ for cls in candidates) + raise SeRunError(f"Multiple classes with {analyze_method}() in {package}: {names}.") diff --git a/nodescraper/plugins/serviceability/serviceability_collector.py b/nodescraper/plugins/serviceability/serviceability_collector.py new file mode 100644 index 00000000..0ad28643 --- /dev/null +++ b/nodescraper/plugins/serviceability/serviceability_collector.py @@ -0,0 +1,254 @@ +############################################################################### +# +# MIT License +# +# Copyright (c) 2026 Advanced Micro Devices, Inc. +# +# Permission is hereby granted, free of charge, to any person obtaining a copy +# of this software and associated documentation files (the "Software"), to deal +# in the Software without restriction, including without limitation the rights +# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +# copies of the Software, and to permit persons to whom the Software is +# furnished to do so, subject to the following conditions: +# +# The above copyright notice and this permission notice shall be included in all +# copies or substantial portions of the Software. +# +# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE +# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE +# SOFTWARE. 
+# +############################################################################### +from __future__ import annotations + +import abc +from typing import Any, ClassVar, Generic, Literal, Optional, Protocol, TypeVar, cast +from urllib.parse import urlparse + +from pydantic import BaseModel, Field + +from nodescraper.base import RedfishDataCollector +from nodescraper.connection.redfish import ( + RF_MEMBERS, + RF_MEMBERS_COUNT, + RedfishGetResult, +) +from nodescraper.enums import ExecutionStatus +from nodescraper.models import CollectorArgs, TaskResult + +from .serviceability_data import DeviceInfo, ServiceabilityDataModel + + +class ServiceabilityUriManifestArtifact(BaseModel): + """Resolved Redfish URIs for this serviceability run (``serviceability_uri_manifest.json``).""" + + ARTIFACT_LOG_BASENAME: ClassVar[str] = "serviceability_uri_manifest" + + artifact_kind: Literal["ServiceabilityUriManifest"] = "ServiceabilityUriManifest" + event_log_uri: str + assembly_get_uris: list[str] = Field(default_factory=list) + firmware_inventory_uri: Optional[str] = None + + +class FirmwareInventoryArtifact(BaseModel): + """Firmware inventory Redfish GET; written to ``firmware_inventory.json`` with path, success, data, error, and status_code fields (same layout as a Redfish GET artifact row).""" + + ARTIFACT_LOG_BASENAME: ClassVar[str] = "firmware_inventory" + + path: str + success: bool + data: Optional[dict[str, Any]] = None + error: Optional[str] = None + status_code: Optional[int] = None + + @classmethod + def from_redfish_get(cls, res: RedfishGetResult) -> FirmwareInventoryArtifact: + return cls.model_validate(res.model_dump(mode="python")) + + +class _ServiceabilityCollectArg(Protocol): + follow_next_link: bool + max_pages: int + top: Optional[int] + rf_assembly_uri_template: Optional[str] + rf_chassis_devices: Optional[list[str]] + rf_firmware_bundle_uri: Optional[str] + + def resolved_event_log_uri(self) -> str: ... 
+ + +TServiceabilityCollectArg = TypeVar("TServiceabilityCollectArg", bound=_ServiceabilityCollectArg) + + +class ServiceabilityCollectorBase( + RedfishDataCollector[ServiceabilityDataModel, CollectorArgs], + Generic[TServiceabilityCollectArg], +): + """OOB Redfish collection skeleton; subclasses implement filtering, CPER handling, and JSON parsing.""" + + DATA_MODEL = ServiceabilityDataModel + + def __init__(self, **kwargs: Any) -> None: + self._log_path: Optional[str] = kwargs.get("log_path") + super().__init__(**kwargs) + + @abc.abstractmethod + def filter_event_members( + self, + members: list[Any], + args: TServiceabilityCollectArg, + ) -> list[Any]: + """Return the event list to retain for downstream analysis.""" + + @abc.abstractmethod + def is_cper_event(self, event: dict) -> bool: + """Return whether a Redfish event entry should be treated as diagnostic-backed.""" + + @abc.abstractmethod + def collect_cper_attachments(self, rf_events: list[Any]) -> dict[str, str]: + """Fetch CPER binary attachments for qualifying events (base64 by event Id).""" + + @abc.abstractmethod + def parse_assembly_entry( + self, + designation: str, + assembly_member_entry: dict[str, Any], + args: TServiceabilityCollectArg, + ) -> DeviceInfo: + """Map one Assemblies[] member dict into DeviceInfo.""" + + @abc.abstractmethod + def extract_component_details( + self, + firmware_inventory_payload: dict[str, Any], + args: TServiceabilityCollectArg, + ) -> Optional[str]: + """Derive component-details text from a firmware inventory GET payload, or None.""" + + def _fetch_event_log(self, args: TServiceabilityCollectArg, uri: str): + if args.follow_next_link: + return self._run_redfish_get_paged(uri, max_pages=args.max_pages, log_artifact=True) + return self._run_redfish_get(uri, log_artifact=True) + + def collect_data( + self, args: Optional[CollectorArgs] = None + ) -> tuple[TaskResult, Optional[ServiceabilityDataModel]]: + if args is None: + self.result.status = ExecutionStatus.NOT_RAN + 
self.result.message = "Collector args are required" + return self.result, None + + svc_args = cast(TServiceabilityCollectArg, args) + event_uri = svc_args.resolved_event_log_uri() + self.logger.info( + "Serviceability: event log Redfish URI %s (follow_next_link=%s)", + event_uri, + svc_args.follow_next_link, + ) + if svc_args.top is not None: + res = self._fetch_top(svc_args, svc_args.top, svc_args.max_pages) + else: + res = self._fetch_event_log(svc_args, event_uri) + + if not res.success or res.data is None: + self.result.status = ExecutionStatus.ERROR + self.result.message = f"Redfish GET failed for {event_uri}: {res.error}" + return self.result, None + + members = res.data.get(RF_MEMBERS, []) + responses = {res.path: res.data} + raw_base_url = getattr(self.connection, "base_url", None) + bmc_host = urlparse(raw_base_url).hostname if raw_base_url else None + + try: + filtered_members = self.filter_event_members(members, svc_args) + except ValueError as exc: + self.result.status = ExecutionStatus.ERROR + self.result.message = f"Event filter failed: {exc}" + return self.result, None + + assembly_info: dict[str, DeviceInfo] = {} + assembly_get_uris: list[str] = [] + tpl = svc_args.rf_assembly_uri_template + devices = svc_args.rf_chassis_devices + if tpl and devices: + for device in devices: + uri_asm = tpl.format(device=device) + assembly_get_uris.append(uri_asm) + self.logger.info( + "Serviceability: assembly Redfish GET %s (chassis designation=%s)", + uri_asm, + device, + ) + assembly_res = self._run_redfish_get(uri_asm, log_artifact=True) + if not assembly_res.success or assembly_res.data is None: + continue + responses[assembly_res.path] = assembly_res.data + + assemblies = assembly_res.data.get("Assemblies", []) + if not assemblies: + continue + + entry = assemblies[0] + assembly_info[device] = self.parse_assembly_entry(device, entry, svc_args) + + cper_raw = self.collect_cper_attachments(filtered_members or []) + + component_details, firmware_uri_used = 
self._fetch_component_details(responses, svc_args) + + data = ServiceabilityDataModel( + responses=responses, + rf_events=filtered_members or [], + assembly_info=assembly_info, + cper_raw=cper_raw, + component_details=component_details, + log_path=self._log_path, + bmc_host=bmc_host, + ) + self.result.artifacts.append( + ServiceabilityUriManifestArtifact( + event_log_uri=event_uri, + assembly_get_uris=assembly_get_uris, + firmware_inventory_uri=firmware_uri_used, + ) + ) + self.result.status = ExecutionStatus.OK + self.result.message = f"Collected {len(members)} event log member(s)" + return self.result, data + + def _fetch_component_details( + self, responses: dict[str, Any], args: TServiceabilityCollectArg + ) -> tuple[Optional[str], Optional[str]]: + """Return ``(component_details, firmware_uri)``; firmware_uri is set when a GET was attempted.""" + fw_uri = args.rf_firmware_bundle_uri + if not fw_uri or not str(fw_uri).strip(): + return None, None + fw_uri = str(fw_uri).strip() + self.logger.info("Serviceability: firmware inventory Redfish GET %s", fw_uri) + fw_res = self._run_redfish_get(fw_uri, log_artifact=False) + self.result.artifacts.append(FirmwareInventoryArtifact.from_redfish_get(fw_res)) + if not fw_res.success or fw_res.data is None: + return None, fw_uri + responses[fw_res.path] = fw_res.data + return self.extract_component_details(fw_res.data, args), fw_uri + + def _fetch_top(self, args: TServiceabilityCollectArg, top: int, max_pages: int): + event_uri = args.resolved_event_log_uri() + probe = self._run_redfish_get(f"{event_uri}?$top=1", log_artifact=True) + if not probe.success or probe.data is None: + return probe + + count = probe.data.get(RF_MEMBERS_COUNT, 0) + + if count <= top: + return self._fetch_event_log(args, event_uri) + + skip = count - top + skip_uri = f"{event_uri}?$skip={skip}" + if args.follow_next_link: + return self._run_redfish_get_paged(skip_uri, max_pages=max_pages, log_artifact=True) + return self._run_redfish_get(skip_uri, 
log_artifact=True) diff --git a/nodescraper/plugins/serviceability/serviceability_data.py b/nodescraper/plugins/serviceability/serviceability_data.py new file mode 100644 index 00000000..b275c579 --- /dev/null +++ b/nodescraper/plugins/serviceability/serviceability_data.py @@ -0,0 +1,107 @@ +############################################################################### +# +# MIT License +# +# Copyright (c) 2026 Advanced Micro Devices, Inc. +# +# Permission is hereby granted, free of charge, to any person obtaining a copy +# of this software and associated documentation files (the "Software"), to deal +# in the Software without restriction, including without limitation the rights +# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +# copies of the Software, and to permit persons to whom the Software is +# furnished to do so, subject to the following conditions: +# +# The above copyright notice and this permission notice shall be included in all +# copies or substantial portions of the Software. +# +# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE +# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE +# SOFTWARE. 
+# +############################################################################### +from __future__ import annotations + +import json +import os +from typing import Any, Dict, List, Optional + +from pydantic import BaseModel, Field + +from nodescraper.models import DataModel + +from .se_models import AfidEvent, ServiceabilityBlock + + +class DeviceInfo(BaseModel): + """Chassis fields from Assembly parsing; extra vendor keys belong in oem_extensions.""" + + name: Optional[str] = None + part_number: Optional[str] = None + production_date: Optional[str] = None + serial_number: Optional[str] = None + version: Optional[str] = None + oem_extensions: Dict[str, Any] = Field( + default_factory=dict, + description="Opaque vendor/product extensions parsed by the concrete collector.", + ) + + +class ServiceabilityResult(BaseModel): + """Structured serviceability output (typically populated by a downstream analyzer).""" + + node: Optional[str] = None + service_recommendations: Dict[str, List[dict]] = {} + service_action_definitions: Dict[str, dict] = {} + afid_sag_metadata: Dict[str, Any] = {} + node_info: Dict[str, Any] = {} + + +class ServiceabilityDataModel(DataModel): + """Collected Redfish responses and intermediate serviceability fields.""" + + responses: dict[str, Any] = {} + rf_events: list[Any] = [] + assembly_info: Dict[str, DeviceInfo] = {} + cper_raw: Dict[str, str] = Field( + default_factory=dict, + description=( + "Base64-encoded CPER attachment bytes keyed by Redfish event Id; " + "populated during collection and decoded in the analyzer." 
+ ), + ) + cper_data: Dict[str, Any] = {} + component_details: Optional[str] = None + log_path: Optional[str] = None + bmc_host: Optional[str] = None + afid_events: List[AfidEvent] = Field( + default_factory=list, + description="Service Hub input; built during analysis when not pre-filled.", + ) + serviceability: Optional[ServiceabilityBlock] = Field( + default=None, + description="ANC-style serviceability block (SE input + output).", + ) + result: Optional[ServiceabilityResult] = None + + def log_model(self, log_path: str) -> None: + """Write collector artifacts and optional serviceability.json under log_path.""" + os.makedirs(log_path, exist_ok=True) + responses_path = os.path.join(log_path, "redfish_responses.json") + with open(responses_path, "w", encoding="utf-8") as f: + json.dump(self.responses, f, indent=2) + if self.cper_data: + cper_path = os.path.join(log_path, "cper_data.json") + with open(cper_path, "w", encoding="utf-8") as f: + json.dump(self.cper_data, f, indent=2) + if self.serviceability is not None: + serviceability_path = os.path.join(log_path, "serviceability.json") + with open(serviceability_path, "w", encoding="utf-8") as f: + json.dump( + self.serviceability.model_dump(mode="json"), + f, + indent=2, + ) diff --git a/nodescraper/plugins/serviceability/serviceability_plugin_base.py b/nodescraper/plugins/serviceability/serviceability_plugin_base.py new file mode 100644 index 00000000..67ff45ca --- /dev/null +++ b/nodescraper/plugins/serviceability/serviceability_plugin_base.py @@ -0,0 +1,46 @@ +############################################################################### +# +# MIT License +# +# Copyright (c) 2026 Advanced Micro Devices, Inc. 
+# +# Permission is hereby granted, free of charge, to any person obtaining a copy +# of this software and associated documentation files (the "Software"), to deal +# in the Software without restriction, including without limitation the rights +# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +# copies of the Software, and to permit persons to whom the Software is +# furnished to do so, subject to the following conditions: +# +# The above copyright notice and this permission notice shall be included in all +# copies or substantial portions of the Software. +# +# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE +# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE +# SOFTWARE. 
+# +############################################################################### +from nodescraper.base import OOBandDataPlugin +from nodescraper.models import CollectorArgs + +from .analyzer_args import ServiceabilityAnalyzerArgs +from .serviceability_collector import ServiceabilityCollectorBase +from .serviceability_data import ServiceabilityDataModel + + +class ServiceabilityPluginBase( + OOBandDataPlugin[ + ServiceabilityDataModel, + CollectorArgs, + ServiceabilityAnalyzerArgs, + ], +): + """OOB Redfish plugin stub; subclass with a concrete COLLECTOR and COLLECTOR_ARGS.""" + + DATA_MODEL = ServiceabilityDataModel + COLLECTOR = ServiceabilityCollectorBase + COLLECTOR_ARGS = CollectorArgs + ANALYZER_ARGS = ServiceabilityAnalyzerArgs diff --git a/nodescraper/plugins/serviceability/time_utils.py b/nodescraper/plugins/serviceability/time_utils.py new file mode 100644 index 00000000..7b9465c5 --- /dev/null +++ b/nodescraper/plugins/serviceability/time_utils.py @@ -0,0 +1,147 @@ +############################################################################### +# +# MIT License +# +# Copyright (c) 2026 Advanced Micro Devices, Inc. +# +# Permission is hereby granted, free of charge, to any person obtaining a copy +# of this software and associated documentation files (the "Software"), to deal +# in the Software without restriction, including without limitation the rights +# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +# copies of the Software, and to permit persons to whom the Software is +# furnished to do so, subject to the following conditions: +# +# The above copyright notice and this permission notice shall be included in all +# copies or substantial portions of the Software. +# +# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. 
IN NO EVENT SHALL THE +# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE +# SOFTWARE. +# +############################################################################### +from __future__ import annotations + +from datetime import datetime, timezone +from typing import Literal + +TimeOperator = Literal[">", ">=", "<", "<=", "=="] + +_TIME_OPERATORS: set[str] = {">", ">=", "<", "<=", "=="} + + +def _as_utc_for_compare(value: datetime) -> datetime: + """Normalize naive datetimes to UTC for comparisons against offset-aware values.""" + if value.tzinfo is None: + return value.replace(tzinfo=timezone.utc) + return value.astimezone(timezone.utc) + + +def is_valid_iso_datetime(value: str) -> bool: + """Return whether a string is ISO-8601 compliant. + + Args: + value: Date or date-time string to validate. + + Returns: + True if the value parses as ISO-8601. + """ + try: + parse_iso_datetime(value) + except ValueError: + return False + return True + + +def normalize_se_timestamp(value: str) -> str: + """Normalize a timestamp to the Service Hub wire format. + + Accepts ISO-8601 (``2026-05-07T12:50:42``) and SE-style strings with a space + separator (``2026-05-07 12:50:42.096-07:00``). + """ + text = str(value).strip() + if not text: + raise ValueError("Empty datetime string") + if " " in text and "T" not in text: + return text + parsed = parse_iso_datetime(text) + micro = parsed.microsecond + base = parsed.strftime("%Y-%m-%d %H:%M:%S") + if micro: + base = f"{base}.{micro:06d}".rstrip("0").rstrip(".") + offset = parsed.strftime("%z") + if offset: + return f"{base}{offset[:3]}:{offset[3:]}" + return base + + +def parse_iso_datetime(value: str) -> datetime: + """Parse an ISO-8601 or SE-style date-time string. + + Args: + value: Date (e.g. 
2026-05-17), ISO date-time, or SE format with a space separator. + + Returns: + Parsed datetime. + """ + text = str(value).strip() + if not text: + raise ValueError("Empty datetime string") + if text.endswith("Z"): + text = f"{text[:-1]}+00:00" + if " " in text and "T" not in text: + text = text.replace(" ", "T", 1) + try: + parsed = datetime.fromisoformat(text) + except ValueError as exc: + raise ValueError(f"Not ISO-8601 compliant: {value!r}") from exc + if "T" not in value and "+" not in value and value.count("-") == 2: + return parsed.replace(hour=0, minute=0, second=0, microsecond=0) + return parsed + + +def compare_iso_datetime(left: str, right: str, operator: TimeOperator) -> bool: + """Compare two ISO-8601 values with a relational operator. + + Args: + left: Left-hand date or date-time string. + right: Right-hand date or date-time string. + operator: One of >, >=, <, <=, or ==. + + Returns: + Result of the comparison. + """ + if operator not in _TIME_OPERATORS: + raise ValueError(f"Unsupported time operator: {operator!r}") + left_dt = _as_utc_for_compare(parse_iso_datetime(left)) + right_dt = _as_utc_for_compare(parse_iso_datetime(right)) + if operator == ">": + return left_dt > right_dt + if operator == ">=": + return left_dt >= right_dt + if operator == "<": + return left_dt < right_dt + if operator == "<=": + return left_dt <= right_dt + return left_dt == right_dt + + +def satisfies_time_check( + candidate: str, + reference: str, + operator: TimeOperator, +) -> bool: + """Test whether candidate satisfies operator against reference. + + Args: + candidate: Date or date-time string to test. + reference: Reference date or date-time string. + operator: One of >, >=, <, <=, or ==. + + Returns: + True when the comparison holds. 
+ """ + return compare_iso_datetime(candidate, reference, operator) diff --git a/nodescraper/taskresulthooks/filesystemloghook.py b/nodescraper/taskresulthooks/filesystemloghook.py index 831e3fbe..50184b4e 100644 --- a/nodescraper/taskresulthooks/filesystemloghook.py +++ b/nodescraper/taskresulthooks/filesystemloghook.py @@ -28,7 +28,7 @@ from nodescraper.interfaces.taskresulthook import TaskResultHook from nodescraper.models import DataModel, TaskResult -from nodescraper.utils import pascal_to_snake +from nodescraper.utils import resolve_log_dir_name class FileSystemLogHook(TaskResultHook): @@ -43,9 +43,9 @@ def process_result(self, task_result: TaskResult, data: Optional[DataModel] = No """Log task result to the filesystem (single events.json per directory).""" log_path = self.log_base_path if task_result.parent: - log_path = os.path.join(log_path, pascal_to_snake(task_result.parent)) + log_path = os.path.join(log_path, resolve_log_dir_name(task_result.parent)) if task_result.task: - log_path = os.path.join(log_path, pascal_to_snake(task_result.task)) + log_path = os.path.join(log_path, resolve_log_dir_name(task_result.task)) task_result.log_result(log_path) diff --git a/nodescraper/utils.py b/nodescraper/utils.py index e7a201b8..11c3ab57 100644 --- a/nodescraper/utils.py +++ b/nodescraper/utils.py @@ -189,18 +189,35 @@ def get_unique_filename(directory, filename) -> str: count += 1 -def pascal_to_snake(input_str: str) -> str: - """Convert PascalCase to snake_case +_LOG_DIR_NAME_OVERRIDES: dict[str, str] = {} - Args: - input_str (str): string to convert - Returns: - str: converted string +def register_log_dir_name(class_name: str, log_dir_name: str) -> None: + """Register a filesystem log directory name for a task or plugin class.""" + _LOG_DIR_NAME_OVERRIDES[class_name] = log_dir_name + + +def resolve_log_dir_name(class_name: str) -> str: + """Map a class name to its log directory (override or snake_case).""" + if class_name in _LOG_DIR_NAME_OVERRIDES: + return 
_LOG_DIR_NAME_OVERRIDES[class_name] + return pascal_to_snake(class_name) + + +def pascal_to_snake(input_str: str) -> str: + """Convert PascalCase to snake_case. + + Handles embedded acronyms with digits (e.g. ``ServiceabilityPluginMI3XX``, + ``MI3XXCollector``) without splitting into single-letter segments. """ + if not input_str: + return "" if input_str.isupper(): return input_str.lower() - return ("_").join(re.split("(?<=.)(?=[A-Z])", input_str)).lower() + normalized = re.sub(r"([A-Z][A-Z0-9]+)([A-Z][a-z])", r"\1_\2", input_str) + normalized = re.sub(r"([a-z])([A-Z][A-Z0-9]+)", r"\1_\2", normalized) + normalized = re.sub(r"([a-z0-9])([A-Z])", r"\1_\2", normalized) + return normalized.lower() def bytes_to_human_readable(input_bytes: int) -> str: diff --git a/pyproject.toml b/pyproject.toml index 9e24d056..8cf05b74 100644 --- a/pyproject.toml +++ b/pyproject.toml @@ -42,6 +42,7 @@ dev = [ "pytest-cov", "mypy", "types-paramiko", + "types-requests", "types-setuptools", ] @@ -80,5 +81,10 @@ profile = "black" select = ["F", "B", "T20", "N", "W", "I", "E"] ignore = ["E501", "N806"] +[tool.mypy] +python_version = "3.9" +mypy_path = ["test/unit"] +explicit_package_bases = true + [tool.setuptools_scm] version_scheme = "post-release" diff --git a/test/unit/framework/common/shared_utils.py b/test/unit/framework/common/shared_utils.py index 5b882549..7ba16c16 100644 --- a/test/unit/framework/common/shared_utils.py +++ b/test/unit/framework/common/shared_utils.py @@ -23,7 +23,7 @@ # SOFTWARE. 
# ############################################################################### -from typing import Optional +from typing import Any, Dict, List, Optional from unittest.mock import MagicMock from nodescraper.constants import DEFAULT_EVENT_REPORTER @@ -83,7 +83,14 @@ def build_from_model(cls, model): class DummyDataModel(DataModel): - foo: str = None + foo: Optional[str] = None + some_version: str = "0" + + +# Module-level defaults so ``run`` signatures stay stable for ConfigBuilder tests. +_TEST_PLUGIN_A_LIST_DEFAULT: List[Any] = [1] +_TEST_PLUGIN_A_DICT_DEFAULT: Dict[str, Any] = {} +_TEST_PLUGIN_A_MODEL_DEFAULT = TestModelArg() class TestPluginA(PluginInterface[MockConnectionManager, None]): @@ -95,10 +102,12 @@ def run( self, test_bool_arg: bool = True, test_str_arg: str = "test", - test_list_arg: list[int] = [1], # noqa: B006 - test_dict_arg: dict = {}, # noqa: B006 - test_model_arg: Optional[TestModelArg] = None, - ): + test_list_arg: List[Any] = _TEST_PLUGIN_A_LIST_DEFAULT, + test_dict_arg: Dict[str, Any] = _TEST_PLUGIN_A_DICT_DEFAULT, + test_model_arg: TestModelArg = _TEST_PLUGIN_A_MODEL_DEFAULT, + **kwargs: Any, + ) -> PluginResult: + _ = kwargs return PluginResult( source="testA", status=ExecutionStatus.ERROR, diff --git a/test/unit/instinct_shaped_engine.py b/test/unit/instinct_shaped_engine.py new file mode 100644 index 00000000..6fa7f234 --- /dev/null +++ b/test/unit/instinct_shaped_engine.py @@ -0,0 +1,68 @@ +############################################################################### +# +# MIT License +# +# Copyright (c) 2026 Advanced Micro Devices, Inc. 
+# +# Permission is hereby granted, free of charge, to any person obtaining a copy +# of this software and associated documentation files (the "Software"), to deal +# in the Software without restriction, including without limitation the rights +# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +# copies of the Software, and to permit persons to whom the Software is +# furnished to do so, subject to the following conditions: +# +# The above copyright notice and this permission notice shall be included in all +# copies or substantial portions of the Software. +# +# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE +# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE +# SOFTWARE. 
+# +############################################################################### + +from __future__ import annotations + +from typing import Any, Optional + +__all__ = ["InstinctShapedEngine"] + +_LAST_CALL: dict[str, Any] = {} + + +def clear_last_call() -> None: + _LAST_CALL.clear() + + +def get_last_call() -> dict[str, Any]: + return dict(_LAST_CALL) + + +class InstinctShapedEngine: + """Mirrors keyword parameters of ``InstinctServiceAssistant.get_service_info``.""" + + def __init__(self, afid_sag: str) -> None: + self.afid_sag = afid_sag + + def get_service_info( + self, + rf_events: list[Any], + from_ac_cycle: int = -1, + from_date: Optional[str] = None, + cper_data: Optional[dict[str, Any]] = None, + designation_serials: Optional[dict[str, str]] = None, + suppress_service_actions: Optional[list[str]] = None, + ) -> None: + _LAST_CALL.clear() + _LAST_CALL.update( + from_ac_cycle=from_ac_cycle, + from_date=from_date, + cper_data=cper_data, + designation_serials=designation_serials, + suppress_service_actions=suppress_service_actions, + rf_len=len(rf_events), + ) + return None diff --git a/test/unit/mock_python_engine.py b/test/unit/mock_python_engine.py new file mode 100644 index 00000000..f48a7e43 --- /dev/null +++ b/test/unit/mock_python_engine.py @@ -0,0 +1,43 @@ +"""Mock Python service hub for unit tests.""" + +from __future__ import annotations + +from types import SimpleNamespace +from typing import Any, Optional + +from serviceability_dummy_data import ( + DUMMY_HUB_VERSION, + DUMMY_SAG_PID, + DUMMY_SAG_REVISION, + DUMMY_SERVICE_ACTION_NUM, + DUMMY_SERVICE_ACTION_TITLE, + DUMMY_UNIT_A, +) + + +class MockServiceEngine: + def __init__(self, afid_sag: str) -> None: + self.afid_sag = afid_sag + + def get_service_info( + self, + rf_events: list[dict[str, Any]], + cper_data: Optional[dict[str, Any]] = None, + **kwargs: Any, + ) -> SimpleNamespace: + del cper_data, kwargs + service_info: dict[str, dict[str, dict[str, str]]] = {} + for event in rf_events: + 
afid = event.get("Afid") + unit = event.get("serviceable_unit", DUMMY_UNIT_A) + if afid is None: + continue + service_info.setdefault(str(unit), {})[str(afid)] = { + "service_action_number": str(DUMMY_SERVICE_ACTION_NUM), + "title": DUMMY_SERVICE_ACTION_TITLE, + } + return SimpleNamespace( + service_info=service_info, + afid_sag_metadata={"sag_pid": DUMMY_SAG_PID, "sag_revision": DUMMY_SAG_REVISION}, + engine_version_info={"version": DUMMY_HUB_VERSION}, + ) diff --git a/test/unit/plugin/fixtures/afid_sag_sample.json b/test/unit/plugin/fixtures/afid_sag_sample.json new file mode 100644 index 00000000..952999e6 --- /dev/null +++ b/test/unit/plugin/fixtures/afid_sag_sample.json @@ -0,0 +1,8 @@ +{ + "9001": { + "service_action_num": 99 + }, + "9002": { + "service_action_num": 88 + } +} diff --git a/test/unit/plugin/test_afid_events_bmc_schema.py b/test/unit/plugin/test_afid_events_bmc_schema.py new file mode 100644 index 00000000..8529577c --- /dev/null +++ b/test/unit/plugin/test_afid_events_bmc_schema.py @@ -0,0 +1,82 @@ +############################################################################### +# +# MIT License +# +# Copyright (c) 2026 Advanced Micro Devices, Inc. +# +# Permission is hereby granted, free of charge, to any person obtaining a copy +# of this software and associated documentation files (the "Software"), to deal +# in the Software without restriction, including without limitation the rights +# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +# copies of the Software, and to permit persons to whom the Software is +# furnished to do so, subject to the following conditions: +# +# The above copyright notice and this permission notice shall be included in all +# copies or substantial portions of the Software. +# +# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. 
IN NO EVENT SHALL THE +# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE +# SOFTWARE. +# +############################################################################### +"""AFID / serviceable unit extraction for OpenBMC-style LogEntry payloads.""" +from __future__ import annotations + +from serviceability_dummy_data import ( + DUMMY_AFID_A, + DUMMY_AFID_BELOW_RF, + DUMMY_AFID_FATAL_HBM, + DUMMY_TIMESTAMP, + DUMMY_UNIT_A, + DUMMY_UNIT_B, + DUMMY_UNIT_C, + dummy_fatal_hbm_log_entry, + dummy_openbmc_log_entry, + dummy_openbmc_log_entry_serviceable_units_only, +) + +from nodescraper.plugins.serviceability.afid_events import ( + _afid_event_from_rf_member, + build_afid_events_from_data, +) +from nodescraper.plugins.serviceability.serviceability_data import ( + ServiceabilityDataModel, +) + + +def test_afid_event_from_openbmc_log_entry_with_links_and_amd_field_identifiers(): + ev = _afid_event_from_rf_member(dummy_openbmc_log_entry()) + assert ev is not None + assert ev.afid == DUMMY_AFID_BELOW_RF + assert ev.serviceable_unit == DUMMY_UNIT_A + assert DUMMY_TIMESTAMP[:10] in ev.time + + +def test_serviceable_unit_from_oem_serviceable_units_when_no_links(): + ev = _afid_event_from_rf_member(dummy_openbmc_log_entry_serviceable_units_only()) + assert ev is not None + assert ev.afid == DUMMY_AFID_A + assert ev.serviceable_unit == DUMMY_UNIT_B + + +def test_afid_event_fatal_hbm_log_entry(): + ev = _afid_event_from_rf_member(dummy_fatal_hbm_log_entry()) + assert ev is not None + assert ev.afid == DUMMY_AFID_FATAL_HBM + assert ev.serviceable_unit == DUMMY_UNIT_C + + +def test_build_afid_events_from_data_includes_openbmc_entries(): + data = ServiceabilityDataModel( + rf_events=[dummy_openbmc_log_entry(), dummy_fatal_hbm_log_entry()], + cper_data={}, + ) + events = build_afid_events_from_data(data) 
+ assert len(events) == 2 + by_afid_oam = {(e.afid, e.serviceable_unit) for e in events} + assert (DUMMY_AFID_BELOW_RF, DUMMY_UNIT_A) in by_afid_oam + assert (DUMMY_AFID_FATAL_HBM, DUMMY_UNIT_C) in by_afid_oam diff --git a/test/unit/plugin/test_mi3xx_collector.py b/test/unit/plugin/test_mi3xx_collector.py new file mode 100644 index 00000000..1cddc2f3 --- /dev/null +++ b/test/unit/plugin/test_mi3xx_collector.py @@ -0,0 +1,301 @@ +############################################################################### +# +# MIT License +# +# Copyright (c) 2026 Advanced Micro Devices, Inc. +# +# Permission is hereby granted, free of charge, to any person obtaining a copy +# of this software and associated documentation files (the "Software"), to deal +# in the Software without restriction, including without limitation the rights +# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +# copies of the Software, and to permit persons to whom the Software is +# furnished to do so, subject to the following conditions: +# +# The above copyright notice and this permission notice shall be included in all +# copies or substantial portions of the Software. +# +# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE +# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE +# SOFTWARE. 
+# +############################################################################### +import pytest +from pydantic import ValidationError +from serviceability_dummy_data import ( + DUMMY_BMC_HOST, + DUMMY_CPER_BYTES_BASIC, + DUMMY_CPER_BYTES_RF, + DUMMY_CPER_EVENT_ID_BASIC, + DUMMY_CPER_EVENT_ID_RF, + DUMMY_EVENT_URI, + DUMMY_EVENT_URI_ALT, + DUMMY_TIMESTAMP_EARLIER, + DUMMY_TIMESTAMP_LATER, + dummy_cper_basic_member, + dummy_cper_rf_member, + dummy_cper_skip_member, +) + +from nodescraper.connection.redfish import RF_MEMBERS, RedfishGetResult +from nodescraper.enums import ExecutionStatus +from nodescraper.plugins.serviceability import ( + MI3XXAnalyzer, + MI3XXCollector, + MI3XXCollectorArgs, + MI3XXDataModel, + MI3XXDeviceInfo, + MI3XXResult, + ServiceabilityDataModel, + ServiceabilityPluginBase, + ServiceabilityPluginMI3XX, + build_mi3xx_reporting_version_fields, + compare_iso_datetime, + is_valid_iso_datetime, + satisfies_time_check, +) + +EVENT_URI = DUMMY_EVENT_URI + + +@pytest.fixture +def mi3xx_collector(system_info, redfish_conn_mock): + redfish_conn_mock.base_url = f"https://{DUMMY_BMC_HOST}/redfish/v1" + return MI3XXCollector( + system_info=system_info, + connection=redfish_conn_mock, + log_path="/tmp/mi3xx.log", + ) + + +def test_mi3xx_collector_args_default_event_log_uri(): + args = MI3XXCollectorArgs() + uri = args.resolved_event_log_uri() + assert uri == MI3XXCollectorArgs.default_event_log_uri() + assert uri.startswith("/redfish/") + assert "EventLog" in uri + + +def test_mi3xx_collector_args_requires_event_log_uri(): + with pytest.raises(ValidationError): + MI3XXCollectorArgs(uri="", rf_event_log_uri="") + + +def test_mi3xx_collector_args_uri_alias_prefers_uri_when_both_set(): + args = MI3XXCollectorArgs( + uri=f" {DUMMY_EVENT_URI_ALT} ", + rf_event_log_uri=DUMMY_EVENT_URI, + ) + assert args.resolved_event_log_uri() == DUMMY_EVENT_URI_ALT + + +def test_mi3xx_collector_args_strips_rf_event_log_uri(): + args = MI3XXCollectorArgs(rf_event_log_uri=f" 
{DUMMY_EVENT_URI_ALT} ") + assert args.rf_event_log_uri == DUMMY_EVENT_URI_ALT + assert args.resolved_event_log_uri() == DUMMY_EVENT_URI_ALT + + +def test_mi3xx_collector_args_assembly_requires_both_template_and_devices(): + with pytest.raises(ValidationError): + MI3XXCollectorArgs( + rf_event_log_uri=EVENT_URI, + rf_assembly_uri_template="/redfish/v1/Chassis/{device}/Assembly", + ) + with pytest.raises(ValidationError): + MI3XXCollectorArgs( + rf_event_log_uri=EVENT_URI, + rf_chassis_devices=["dummy-chassis"], + ) + + +def test_mi3xx_collector_args_reference_time_requires_operator(): + with pytest.raises(ValidationError): + MI3XXCollectorArgs( + rf_event_log_uri=EVENT_URI, + reference_time="2000-01-01", + ) + + +def test_mi3xx_collector_args_accepts_iso_date_and_datetime(): + date_args = MI3XXCollectorArgs( + rf_event_log_uri=EVENT_URI, + reference_time="2000-01-01", + time_operator=">=", + ) + assert date_args.reference_time == "2000-01-01" + + +def test_time_utils_iso_validation_and_comparison(): + assert is_valid_iso_datetime("2000-01-01") + assert satisfies_time_check("2000-01-02", "2000-01-01", ">") + assert compare_iso_datetime("2000-01-01T00:00:00", "2000-01-01T00:00:00", "==") + + +def test_serviceability_plugin_mi3xx_wiring(): + assert issubclass(ServiceabilityPluginMI3XX, ServiceabilityPluginBase) + assert ServiceabilityPluginMI3XX.DATA_MODEL is ServiceabilityDataModel + assert ServiceabilityPluginMI3XX.COLLECTOR is MI3XXCollector + assert ServiceabilityPluginMI3XX.COLLECTOR_ARGS is MI3XXCollectorArgs + assert ServiceabilityPluginMI3XX.ANALYZER is MI3XXAnalyzer + + +def test_mi3xx_collector_no_args(mi3xx_collector): + result, data = mi3xx_collector.collect_data() + assert result.status == ExecutionStatus.NOT_RAN + assert "required" in result.message.lower() + assert data is None + + +def test_mi3xx_collector_success_minimal(mi3xx_collector, redfish_conn_mock): + redfish_conn_mock.run_get_paged.return_value = RedfishGetResult( + path=EVENT_URI, + 
success=True, + data={RF_MEMBERS: [{"Id": "dummy-1", "Created": DUMMY_TIMESTAMP_LATER}]}, + status_code=200, + ) + args = MI3XXCollectorArgs(rf_event_log_uri=EVENT_URI) + result, data = mi3xx_collector.collect_data(args=args) + assert result.status == ExecutionStatus.OK + assert data is not None + assert len(data.rf_events) == 1 + assert data.bmc_host == DUMMY_BMC_HOST + assert data.log_path == "/tmp/mi3xx.log" + + +def test_mi3xx_collector_satisfies_reference_time_helper(mi3xx_collector): + args = MI3XXCollectorArgs( + rf_event_log_uri=EVENT_URI, + reference_time="2000-01-01", + time_operator=">=", + ) + assert mi3xx_collector.satisfies_reference_time(DUMMY_TIMESTAMP_LATER, args) + assert not mi3xx_collector.satisfies_reference_time(DUMMY_TIMESTAMP_EARLIER, args) + + +def test_mi3xx_collector_fetches_cper_attachments(mi3xx_collector, redfish_conn_mock): + import base64 + from unittest.mock import MagicMock + + redfish_conn_mock.run_get_paged.return_value = RedfishGetResult( + path=EVENT_URI, + success=True, + data={RF_MEMBERS: [dummy_cper_basic_member()]}, + status_code=200, + ) + response = MagicMock() + response.ok = True + response.status_code = 200 + response.content = DUMMY_CPER_BYTES_BASIC + redfish_conn_mock.get_response.return_value = response + + args = MI3XXCollectorArgs(rf_event_log_uri=EVENT_URI) + result, data = mi3xx_collector.collect_data(args=args) + assert result.status == ExecutionStatus.OK + assert data is not None + assert data.cper_raw[DUMMY_CPER_EVENT_ID_BASIC] == base64.b64encode( + DUMMY_CPER_BYTES_BASIC + ).decode("ascii") + assert data.cper_data == {} + + +def test_mi3xx_collector_skips_cper_when_aca_serial_and_low_afids( + mi3xx_collector, redfish_conn_mock +): + redfish_conn_mock.get_response.reset_mock() + redfish_conn_mock.run_get_paged.return_value = RedfishGetResult( + path=EVENT_URI, + success=True, + data={RF_MEMBERS: [dummy_cper_skip_member()]}, + status_code=200, + ) + args = MI3XXCollectorArgs(rf_event_log_uri=EVENT_URI) + 
result, data = mi3xx_collector.collect_data(args=args) + assert result.status == ExecutionStatus.OK + assert data is not None + assert data.cper_raw == {} + redfish_conn_mock.get_response.assert_not_called() + + +def test_mi3xx_collector_fetches_cper_when_rf_afid(mi3xx_collector, redfish_conn_mock): + import base64 + from unittest.mock import MagicMock + + redfish_conn_mock.get_response.reset_mock() + redfish_conn_mock.run_get_paged.return_value = RedfishGetResult( + path=EVENT_URI, + success=True, + data={RF_MEMBERS: [dummy_cper_rf_member()]}, + status_code=200, + ) + response = MagicMock() + response.ok = True + response.status_code = 200 + response.content = DUMMY_CPER_BYTES_RF + redfish_conn_mock.get_response.return_value = response + + args = MI3XXCollectorArgs(rf_event_log_uri=EVENT_URI) + result, data = mi3xx_collector.collect_data(args=args) + assert result.status == ExecutionStatus.OK + assert data is not None + assert data.cper_raw[DUMMY_CPER_EVENT_ID_RF] == base64.b64encode(DUMMY_CPER_BYTES_RF).decode( + "ascii" + ) + redfish_conn_mock.get_response.assert_called_once() + + +def test_mi3xx_collector_filters_events_by_reference_time(mi3xx_collector, redfish_conn_mock): + redfish_conn_mock.run_get_paged.return_value = RedfishGetResult( + path=EVENT_URI, + success=True, + data={ + RF_MEMBERS: [ + {"Id": "dummy-1", "Created": DUMMY_TIMESTAMP_LATER}, + {"Id": "dummy-2", "Created": DUMMY_TIMESTAMP_EARLIER}, + ] + }, + status_code=200, + ) + args = MI3XXCollectorArgs( + rf_event_log_uri=EVENT_URI, + reference_time="2000-01-01", + time_operator=">=", + ) + result, data = mi3xx_collector.collect_data(args=args) + assert result.status == ExecutionStatus.OK + assert data is not None + assert [event["Id"] for event in data.rf_events] == ["dummy-1"] + + +def test_mi3xx_device_info_fields(): + info = MI3XXDeviceInfo( + board_product_name="dummy-board", + board_serial_number="dummy-serial-001", + product_version="0.0-dummy", + ) + assert info.board_product_name == 
"dummy-board" + assert info.product_version == "0.0-dummy" + + +def test_mi3xx_result_reporting_versions(): + version_fields = build_mi3xx_reporting_version_fields( + plugin_name="dummy_plugin", + plugin_version="0.0-dummy", + node_scraper_version="0.0-dummy", + dummy_hub_version="0.0-dummy", + ) + result = MI3XXResult(node="dummy-node", **version_fields) + assert result.plugin_name == "dummy_plugin" + assert result.reporter_extensions["dummy_hub_version"] == "0.0-dummy" + + +def test_mi3xx_data_model_log_model(tmp_path): + model = MI3XXDataModel( + collected_data={"events": [{"id": 1}]}, + artifacts={"events.json": [{"id": 1}]}, + ) + model.log_model(str(tmp_path)) + assert (tmp_path / "events.json").is_file() + assert (tmp_path / "MI3XX_data.json").is_file() diff --git a/test/unit/plugin/test_mi3xx_cper_utils.py b/test/unit/plugin/test_mi3xx_cper_utils.py new file mode 100644 index 00000000..b156b930 --- /dev/null +++ b/test/unit/plugin/test_mi3xx_cper_utils.py @@ -0,0 +1,104 @@ +############################################################################### +# +# MIT License +# +# Copyright (c) 2026 Advanced Micro Devices, Inc. +# +# Permission is hereby granted, free of charge, to any person obtaining a copy +# of this software and associated documentation files (the "Software"), to deal +# in the Software without restriction, including without limitation the rights +# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +# copies of the Software, and to permit persons to whom the Software is +# furnished to do so, subject to the following conditions: +# +# The above copyright notice and this permission notice shall be included in all +# copies or substantial portions of the Software. +# +# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. 
IN NO EVENT SHALL THE +# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE +# SOFTWARE. +# +############################################################################### +import pytest +from serviceability_dummy_data import ( + DUMMY_AFID_B, + DUMMY_AFID_BELOW_RF, + DUMMY_RF_CPER_AFID, + dummy_aca_err_row, +) + +from nodescraper.plugins.serviceability.mi3xx.mi3xx_cper_utils import ( + event_aca_includes_serial, + event_afids_from_oem, + event_has_aca_decode, + should_skip_cper_fetch_or_decode, +) + + +def test_skip_when_afids_below_threshold_and_aca_has_serial(): + event = { + "Oem": { + "AMDFieldIdentifiers": [{"AFID": DUMMY_AFID_BELOW_RF}], + "ErrDataArr": [dummy_aca_err_row()], + } + } + assert event_afids_from_oem(event) == [DUMMY_AFID_BELOW_RF] + assert should_skip_cper_fetch_or_decode(event) is True + + +def test_no_skip_when_rf_range_afid_even_with_aca_serial(): + event = { + "Oem": { + "AMDFieldIdentifiers": [{"AFID": DUMMY_RF_CPER_AFID}], + "ErrDataArr": [dummy_aca_err_row()], + } + } + assert should_skip_cper_fetch_or_decode(event) is False + + +def test_skip_when_aca_decode_without_serial(): + event = { + "Oem": { + "AMDFieldIdentifiers": [{"AFID": DUMMY_RF_CPER_AFID}], + "ErrDataArr": [dummy_aca_err_row(serial=False)], + } + } + assert event_has_aca_decode(event) is True + assert event_aca_includes_serial(event) is False + assert should_skip_cper_fetch_or_decode(event) is True + + +def test_no_skip_when_no_err_data_decoded(): + event = { + "Oem": { + "AMDFieldIdentifiers": [{"AFID": DUMMY_AFID_BELOW_RF}], + } + } + assert should_skip_cper_fetch_or_decode(event) is False + + +def test_no_skip_when_aca_serial_but_no_afid_list(): + event = { + "Oem": { + "ErrDataArr": [dummy_aca_err_row()], + } + } + assert event_afids_from_oem(event) == [] + assert 
should_skip_cper_fetch_or_decode(event) is False + + +@pytest.mark.parametrize( + "afids,expect_skip", + [ + ([DUMMY_AFID_BELOW_RF, DUMMY_AFID_B], True), + ([DUMMY_AFID_BELOW_RF, DUMMY_RF_CPER_AFID], False), + ], +) +def test_skip_requires_all_afids_below_rf_threshold(afids, expect_skip): + identifiers = [{"AFID": a} for a in afids] + event = {"Oem": {"AMDFieldIdentifiers": identifiers, "ErrDataArr": [dummy_aca_err_row()]}} + assert should_skip_cper_fetch_or_decode(event) is expect_skip diff --git a/test/unit/plugin/test_se_runner.py b/test/unit/plugin/test_se_runner.py new file mode 100644 index 00000000..554f0ccc --- /dev/null +++ b/test/unit/plugin/test_se_runner.py @@ -0,0 +1,403 @@ +############################################################################### +# +# MIT License +# +# Copyright (c) 2026 Advanced Micro Devices, Inc. +# +# Permission is hereby granted, free of charge, to any person obtaining a copy +# of this software and associated documentation files (the "Software"), to deal +# in the Software without restriction, including without limitation the rights +# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +# copies of the Software, and to permit persons to whom the Software is +# furnished to do so, subject to the following conditions: +# +# The above copyright notice and this permission notice shall be included in all +# copies or substantial portions of the Software. +# +# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE +# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE +# SOFTWARE. 
+# +############################################################################### +import json +from pathlib import Path +from types import SimpleNamespace +from typing import Any + +import pytest +from pydantic import ValidationError +from serviceability_dummy_data import ( + DUMMY_AFID_A, + DUMMY_AFID_B, + DUMMY_AFID_C, + DUMMY_DESIGNATION_A, + DUMMY_DESIGNATION_B, + DUMMY_HUB_VERSION, + DUMMY_OEM_VENDOR, + DUMMY_RF_EVENT_COUNT, + DUMMY_SAG_PID, + DUMMY_SAG_REVISION, + DUMMY_SERVICE_ACTION_NUM, + DUMMY_TIMESTAMP, + DUMMY_UNIT_A, + DUMMY_UNIT_B, + DUMMY_UNIT_C, +) + +from nodescraper.enums import ExecutionStatus +from nodescraper.plugins.serviceability import ( + AfidEvent, + MI3XXAnalyzer, + SeRunError, + ServiceabilityAnalyzerArgs, + ServiceabilityBlock, + ServiceabilityDataModel, + build_afid_events_from_data, + format_serviceability_solution_lines, + normalize_se_timestamp, + run_service_hub, + serviceability_block_from_service_result, +) +from nodescraper.plugins.serviceability.se_models import ServiceabilitySolution + +FIXTURES = Path(__file__).resolve().parent / "fixtures" +AFID_SAG = FIXTURES / "afid_sag_sample.json" +EXAMPLE_EVENTS = [ + AfidEvent(afid=DUMMY_AFID_A, serviceable_unit=DUMMY_UNIT_A, time=DUMMY_TIMESTAMP), + AfidEvent(afid=DUMMY_AFID_B, serviceable_unit=DUMMY_UNIT_B, time=DUMMY_TIMESTAMP), + AfidEvent(afid=DUMMY_AFID_C, serviceable_unit=DUMMY_UNIT_C, time=DUMMY_TIMESTAMP), +] + + +def test_afid_event_requires_non_empty_serviceable_unit(): + with pytest.raises(ValidationError): + AfidEvent(afid=1, serviceable_unit=" ", time=DUMMY_TIMESTAMP) + + +def test_normalize_se_timestamp_preserves_format_value(): + sample = "2000-01-01 12:00:00.000+00:00" + assert normalize_se_timestamp(sample) == sample + + +def test_analyzer_args_require_hub_config(): + with pytest.raises(ValidationError): + ServiceabilityAnalyzerArgs() + with pytest.raises(ValidationError, match="hub_python_module"): + ServiceabilityAnalyzerArgs(afid_sag_path=str(AFID_SAG)) + args = 
ServiceabilityAnalyzerArgs( + hub_python_module="dummy.test.module", + afid_sag_path=str(AFID_SAG), + ) + assert args.hub_python_module == "dummy.test.module" + + +def test_resolved_hub_options_explicit_fields_override_options_bag(): + args = ServiceabilityAnalyzerArgs( + hub_python_module="dummy.test.module", + afid_sag_path=str(AFID_SAG), + hub_options={"from_ac_cycle": 9, "extra": 1}, + from_ac_cycle=3, + from_date="2025-01-01", + designation_serials={"U": "S"}, + suppress_service_actions=["99"], + ) + merged = args.resolved_hub_options() + assert merged["from_ac_cycle"] == 3 + assert merged["from_date"] == "2025-01-01" + assert merged["designation_serials"] == {"U": "S"} + assert merged["suppress_service_actions"] == ["99"] + assert merged["extra"] == 1 + + +def test_format_serviceability_solution_lines(): + block = ServiceabilityBlock( + afid_events=EXAMPLE_EVENTS[:1], + solution=[ + ServiceabilitySolution( + afid=DUMMY_AFID_A, + serviceable_unit=[DUMMY_DESIGNATION_A, DUMMY_DESIGNATION_B], + service_action_num=DUMMY_SERVICE_ACTION_NUM, + service_action_title="RMA", + ) + ], + solution_reasoning="Dummy test reasoning.", + hub_version="1.0.0-test", + afid_sag_file_version="PID sag-1, revision rev-a", + ) + lines = format_serviceability_solution_lines(block) + assert lines[0] == "Dummy test reasoning." 
+ assert lines[1] == "Hub version: 1.0.0-test" + assert lines[2] == "AFID_SAG file: PID sag-1, revision rev-a" + assert f"AFID {DUMMY_AFID_A}" in lines[3] + assert DUMMY_DESIGNATION_A in lines[3] + assert "service action 99 (RMA)" in lines[3] + + +def test_serviceability_block_from_service_result(): + result = SimpleNamespace( + service_info={ + DUMMY_DESIGNATION_A: { + str(DUMMY_AFID_A): { + "service_action_number": str(DUMMY_SERVICE_ACTION_NUM), + "error_category": "dummy_category", + "error_type": "dummy_type", + "title": "Dummy service action", + } + }, + DUMMY_DESIGNATION_B: { + str(DUMMY_AFID_A): { + "service_action_number": str(DUMMY_SERVICE_ACTION_NUM), + "error_category": "dummy_category", + "error_type": "dummy_type", + "title": "Dummy service action", + } + }, + }, + afid_sag_metadata={"sag_pid": DUMMY_SAG_PID, "sag_revision": DUMMY_SAG_REVISION}, + engine_version_info={"version": DUMMY_HUB_VERSION}, + ) + block = serviceability_block_from_service_result( + EXAMPLE_EVENTS[:1], + result, + hub_label="Dummy test hub", + rf_event_count=DUMMY_RF_EVENT_COUNT, + ) + assert len(block.solution) == 1 + assert block.solution[0].afid == DUMMY_AFID_A + assert block.solution[0].service_action_num == DUMMY_SERVICE_ACTION_NUM + assert block.solution[0].service_action_title == "Dummy service action" + assert set(block.solution[0].serviceable_unit) == {DUMMY_DESIGNATION_A, DUMMY_DESIGNATION_B} + assert block.hub_version == DUMMY_HUB_VERSION + assert block.afid_sag_file_version is not None + assert DUMMY_SAG_PID in block.afid_sag_file_version + assert DUMMY_SAG_REVISION in block.afid_sag_file_version + assert f"{DUMMY_RF_EVENT_COUNT} Redfish event(s)" in block.solution_reasoning + assert "Dummy test hub" in block.solution_reasoning + + +def test_serviceability_block_from_service_result_isa_version_info(): + result = SimpleNamespace( + service_info={}, + afid_sag_metadata={"sag_pid": DUMMY_SAG_PID, "sag_revision": DUMMY_SAG_REVISION}, + isa_version_info={"VERSION": 
"1.2.3"}, + ) + block = serviceability_block_from_service_result( + EXAMPLE_EVENTS[:1], + result, + hub_label="ISA", + rf_event_count=1, + ) + assert block.hub_version == "1.2.3" + assert block.afid_sag_file_version is not None + assert DUMMY_SAG_PID in block.afid_sag_file_version + + +def test_resolve_hub_class_finds_package_export(): + import types + + submodule = types.ModuleType("fake_engine.impl") + submodule.__dict__["EngineImpl"] = type( + "EngineImpl", + (), + {"get_service_info": lambda self, rf_events, cper_data=None: None}, + ) + package = types.ModuleType("fake_engine") + package.EngineImpl = submodule.EngineImpl # type: ignore[attr-defined] + package.__all__ = ["EngineImpl"] + + from nodescraper.plugins.serviceability.se_runner import _resolve_hub_class + + assert _resolve_hub_class(package) is submodule.EngineImpl + + +def test_run_service_hub_with_mock_module(): + rf_events = [ + {"Afid": DUMMY_AFID_A, "serviceable_unit": DUMMY_UNIT_A, "Created": DUMMY_TIMESTAMP}, + {"Afid": DUMMY_AFID_C, "serviceable_unit": DUMMY_UNIT_C, "Created": DUMMY_TIMESTAMP}, + ] + block = run_service_hub( + hub_python_module="mock_python_engine", + afid_events=EXAMPLE_EVENTS[:2], + afid_sag_path=str(AFID_SAG), + rf_events=rf_events, + ) + assert len(block.solution) == 2 + assert block.solution[0].afid == DUMMY_AFID_A + assert block.solution[0].service_action_num == DUMMY_SERVICE_ACTION_NUM + + +def test_run_service_hub_custom_analyze_method_and_path_kwarg(): + import sys + import types + + init_log: list[tuple[str, bool]] = [] + analyze_log: list[Any] = [] + + class AltEngine: + def __init__(self, rulebook_path: str, debug: bool = False) -> None: + init_log.append((rulebook_path, debug)) + + def analyze_events(self, rf_events, cper_data=None): + analyze_log.append((list(rf_events), cper_data)) + return None + + mod = types.ModuleType("alt_service_engine") + mod.AltEngine = AltEngine + mod.__all__ = ["AltEngine"] + sys.modules["alt_service_engine"] = mod + try: + 
run_service_hub( + hub_python_module="alt_service_engine", + afid_events=EXAMPLE_EVENTS[:1], + afid_sag_path=str(AFID_SAG), + rf_events=[{"Afid": 1}], + cper_data={"k": 1}, + hub_options={"debug": True}, + hub_analyze_method="analyze_events", + hub_init_path_kwarg="rulebook_path", + ) + finally: + del sys.modules["alt_service_engine"] + + assert init_log[0][0] == str(AFID_SAG) + assert init_log[0][1] is True + assert analyze_log[0][1] == {"k": 1} + + +def test_run_service_hub_accepts_hub_options(): + rf_events = [ + {"Afid": DUMMY_AFID_A, "serviceable_unit": DUMMY_UNIT_A, "Created": DUMMY_TIMESTAMP}, + ] + block = run_service_hub( + hub_python_module="mock_python_engine", + afid_events=EXAMPLE_EVENTS[:1], + afid_sag_path=str(AFID_SAG), + rf_events=rf_events, + hub_options={"reporting_level": "verbose"}, + ) + assert len(block.solution) == 1 + + +def test_run_service_hub_forwards_full_hub_options_kwargs(): + from instinct_shaped_engine import clear_last_call, get_last_call + + clear_last_call() + rf_events = [ + {"Afid": DUMMY_AFID_A, "serviceable_unit": DUMMY_UNIT_A, "Created": DUMMY_TIMESTAMP}, + ] + run_service_hub( + hub_python_module="instinct_shaped_engine", + afid_events=EXAMPLE_EVENTS[:1], + afid_sag_path=str(AFID_SAG), + rf_events=rf_events, + cper_data={"decoded": True}, + hub_options={ + "from_ac_cycle": 2, + "from_date": "2024-06-01", + "designation_serials": {"GPU0": "SN1"}, + "suppress_service_actions": ["42"], + }, + ) + got = get_last_call() + assert got["from_ac_cycle"] == 2 + assert got["from_date"] == "2024-06-01" + assert got["cper_data"] == {"decoded": True} + assert got["designation_serials"] == {"GPU0": "SN1"} + assert got["suppress_service_actions"] == ["42"] + + +def test_run_service_hub_collected_cper_overrides_hub_options_cper_data(): + from instinct_shaped_engine import clear_last_call, get_last_call + + clear_last_call() + rf_events = [ + {"Afid": DUMMY_AFID_A, "serviceable_unit": DUMMY_UNIT_A, "Created": DUMMY_TIMESTAMP}, + ] + 
run_service_hub( + hub_python_module="instinct_shaped_engine", + afid_events=EXAMPLE_EVENTS[:1], + afid_sag_path=str(AFID_SAG), + rf_events=rf_events, + cper_data={"from_collector": 1}, + hub_options={"cper_data": {"from_options": 2}, "from_ac_cycle": 0}, + ) + assert get_last_call()["cper_data"] == {"from_collector": 1} + + +def test_run_service_hub_missing_sag_raises(): + with pytest.raises(SeRunError, match="Hub config file not found"): + run_service_hub( + hub_python_module="mock_python_engine", + afid_events=EXAMPLE_EVENTS, + afid_sag_path="/nonexistent/dummy_afid_sag.json", + rf_events=[{"Afid": DUMMY_AFID_A}], + ) + + +def test_build_afid_events_from_rf_members(): + data = ServiceabilityDataModel( + rf_events=[ + { + "Afid": DUMMY_AFID_A, + "serviceable_unit": DUMMY_UNIT_A, + "Created": DUMMY_TIMESTAMP, + }, + { + "Oem": { + DUMMY_OEM_VENDOR: { + "Afid": DUMMY_AFID_B, + "serviceable_unit": DUMMY_UNIT_B, + } + }, + "EventTimestamp": DUMMY_TIMESTAMP, + }, + ] + ) + events = build_afid_events_from_data(data) + assert len(events) == 2 + assert events[0].afid == DUMMY_AFID_A + assert events[1].afid == DUMMY_AFID_B + + +def test_mi3xx_analyzer_runs_python_hub(system_info): + data = ServiceabilityDataModel( + rf_events=[ + { + "Afid": DUMMY_AFID_A, + "serviceable_unit": DUMMY_UNIT_A, + "Created": DUMMY_TIMESTAMP, + }, + { + "Afid": DUMMY_AFID_C, + "serviceable_unit": DUMMY_UNIT_C, + "Created": DUMMY_TIMESTAMP, + }, + ] + ) + analyzer = MI3XXAnalyzer(system_info=system_info) + args = ServiceabilityAnalyzerArgs( + hub_python_module="mock_python_engine", + afid_sag_path=str(AFID_SAG), + hub_options={"include_raw_events": False}, + ) + result = analyzer.analyze_data(data, args=args) + assert result.status == ExecutionStatus.OK + assert data.serviceability is not None + assert len(data.serviceability.solution) == 2 + + +def test_mi3xx_analyzer_writes_serviceability_json(tmp_path, system_info): + data = ServiceabilityDataModel( + afid_events=EXAMPLE_EVENTS[:1], + 
serviceability=ServiceabilityBlock( + afid_events=EXAMPLE_EVENTS[:1], + solution=[], + ), + ) + data.log_model(str(tmp_path)) + payload = json.loads((tmp_path / "serviceability.json").read_text(encoding="utf-8")) + assert payload["afid_events"][0]["afid"] == DUMMY_AFID_A diff --git a/test/unit/plugin/test_serviceability_collector.py b/test/unit/plugin/test_serviceability_collector.py new file mode 100644 index 00000000..1ce3fbb2 --- /dev/null +++ b/test/unit/plugin/test_serviceability_collector.py @@ -0,0 +1,344 @@ +############################################################################### +# +# MIT License +# +# Copyright (c) 2026 Advanced Micro Devices, Inc. +# +# Permission is hereby granted, free of charge, to any person obtaining a copy +# of this software and associated documentation files (the "Software"), to deal +# in the Software without restriction, including without limitation the rights +# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +# copies of the Software, and to permit persons to whom the Software is +# furnished to do so, subject to the following conditions: +# +# The above copyright notice and this permission notice shall be included in all +# copies or substantial portions of the Software. +# +# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE +# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE +# SOFTWARE. 
+# +############################################################################### +import json +from typing import Any, Optional + +import pytest +from pydantic import ValidationError +from serviceability_dummy_data import DUMMY_BMC_HOST, DUMMY_EVENT_URI + +from nodescraper.connection.redfish import ( + RF_MEMBERS, + RF_MEMBERS_COUNT, + RedfishGetResult, +) +from nodescraper.enums import ExecutionStatus +from nodescraper.models import CollectorArgs +from nodescraper.plugins.serviceability import ( + DeviceInfo, + MI3XXCollectorArgs, + ServiceabilityAnalyzerArgs, + ServiceabilityDataModel, + ServiceabilityPluginBase, +) +from nodescraper.plugins.serviceability.serviceability_collector import ( + ServiceabilityCollectorBase, +) + +EVENT_URI = DUMMY_EVENT_URI + + +class _StubServiceabilityCollector(ServiceabilityCollectorBase[MI3XXCollectorArgs]): + def filter_event_members( + self, + members: list[Any], + args: MI3XXCollectorArgs, + ) -> list[Any]: + return members + + def is_cper_event(self, event: dict) -> bool: + return False + + def collect_cper_attachments(self, rf_events: list[Any]) -> dict[str, str]: + return {} + + def parse_assembly_entry( + self, + designation: str, + assembly_member_entry: dict[str, Any], + args: MI3XXCollectorArgs, + ) -> DeviceInfo: + return DeviceInfo(name=designation, serial_number=assembly_member_entry.get("SerialNumber")) + + def extract_component_details( + self, + firmware_inventory_payload: dict[str, Any], + args: MI3XXCollectorArgs, + ) -> Optional[str]: + return firmware_inventory_payload.get("Details") + + +@pytest.fixture +def stub_serviceability_collector(system_info, redfish_conn_mock): + redfish_conn_mock.base_url = f"https://{DUMMY_BMC_HOST}/redfish/v1" + return _StubServiceabilityCollector( + system_info=system_info, + connection=redfish_conn_mock, + log_path="/tmp/serviceability.log", + ) + + +def test_mi3xx_collector_args_default_event_log_uri(): + args = MI3XXCollectorArgs() + uri = args.resolved_event_log_uri() + 
assert uri == MI3XXCollectorArgs.default_event_log_uri() + assert uri.startswith("/redfish/") + assert "EventLog" in uri + + +def test_mi3xx_collector_args_requires_event_log_uri(): + with pytest.raises(ValidationError): + MI3XXCollectorArgs(uri="", rf_event_log_uri="") + + +def test_mi3xx_collector_args_uri_alias_prefers_uri_over_rf_event_log_uri(): + args = MI3XXCollectorArgs( + uri=" /redfish/v1/Systems/Dummy/LogServices/DummyEventLog/EntriesAlt ", + rf_event_log_uri="/redfish/v1/Systems/Dummy/LogServices/DummyEventLog/Entries", + ) + assert ( + args.resolved_event_log_uri() + == "/redfish/v1/Systems/Dummy/LogServices/DummyEventLog/EntriesAlt" + ) + + +def test_mi3xx_collector_args_assembly_requires_both_template_and_devices(): + with pytest.raises(ValidationError): + MI3XXCollectorArgs( + rf_event_log_uri=EVENT_URI, + rf_assembly_uri_template="/redfish/v1/Chassis/{device}/Assembly", + ) + with pytest.raises(ValidationError): + MI3XXCollectorArgs( + rf_event_log_uri=EVENT_URI, + rf_chassis_devices=["dummy-chassis"], + ) + + +def test_mi3xx_collector_args_assembly_template_must_include_device_placeholder(): + with pytest.raises(ValidationError): + MI3XXCollectorArgs( + rf_event_log_uri=EVENT_URI, + rf_assembly_uri_template="/redfish/v1/Chassis/dummy-chassis/Assembly", + rf_chassis_devices=["dummy-chassis"], + ) + + +def test_mi3xx_collector_args_assembly_optional_when_omitted(): + args = MI3XXCollectorArgs(rf_event_log_uri=EVENT_URI) + assert args.rf_assembly_uri_template is None + assert args.rf_chassis_devices is None + + +def test_serviceability_plugin_base_wiring(): + assert ServiceabilityPluginBase.DATA_MODEL is ServiceabilityDataModel + assert ServiceabilityPluginBase.COLLECTOR is ServiceabilityCollectorBase + assert getattr(ServiceabilityPluginBase, "COLLECTOR_ARGS", CollectorArgs) is CollectorArgs + assert ServiceabilityPluginBase.ANALYZER_ARGS is ServiceabilityAnalyzerArgs + assert ServiceabilityPluginBase.ANALYZER is None + + +def 
test_stub_collector_no_args(stub_serviceability_collector):
+    result, data = stub_serviceability_collector.collect_data()
+    assert result.status == ExecutionStatus.NOT_RAN
+    assert "required" in result.message.lower()
+    assert data is None
+
+
+def test_stub_collector_event_log_get_fails(stub_serviceability_collector, redfish_conn_mock):
+    redfish_conn_mock.run_get_paged.return_value = RedfishGetResult(
+        path=EVENT_URI,
+        success=False,
+        error="timeout",
+        status_code=None,
+    )
+    args = MI3XXCollectorArgs(rf_event_log_uri=EVENT_URI)
+    result, data = stub_serviceability_collector.collect_data(args=args)
+    assert result.status == ExecutionStatus.ERROR
+    assert EVENT_URI in result.message
+    assert data is None
+
+
+def test_stub_collector_success_minimal(stub_serviceability_collector, redfish_conn_mock):
+    members = [{"Id": "1"}]
+    redfish_conn_mock.run_get_paged.return_value = RedfishGetResult(
+        path=EVENT_URI,
+        success=True,
+        data={RF_MEMBERS: members},
+        status_code=200,
+    )
+    args = MI3XXCollectorArgs(rf_event_log_uri=EVENT_URI)
+    result, data = stub_serviceability_collector.collect_data(args=args)
+    assert result.status == ExecutionStatus.OK
+    assert data is not None
+    assert data.rf_events == members
+    assert EVENT_URI in data.responses
+    assert data.bmc_host == DUMMY_BMC_HOST
+    assert data.log_path == "/tmp/serviceability.log"
+    redfish_conn_mock.run_get_paged.assert_called_once()
+
+
+def test_stub_collector_filter_raises_maps_to_error(
+    stub_serviceability_collector, redfish_conn_mock
+):
+    class _BadFilter(_StubServiceabilityCollector):
+        def filter_event_members(self, members, args):
+            raise ValueError("bad filter")
+
+    collector = _BadFilter(
+        system_info=stub_serviceability_collector.system_info,
+        connection=redfish_conn_mock,
+    )
+    redfish_conn_mock.run_get_paged.return_value = RedfishGetResult(
+        path=EVENT_URI,
+        success=True,
+        data={RF_MEMBERS: []},
+        status_code=200,
+    )
+    args = MI3XXCollectorArgs(rf_event_log_uri=EVENT_URI)
+    result, data = collector.collect_data(args=args)
+    assert result.status == ExecutionStatus.ERROR
+    assert "Event filter failed" in result.message
+    assert data is None
+
+
+def test_stub_collector_assembly_and_firmware_paths(
+    stub_serviceability_collector, redfish_conn_mock
+):
+    tpl = "/redfish/v1/Chassis/{device}/Assembly"
+    asm_uri = tpl.format(device="dummy-chassis")
+    fw_uri = "/redfish/v1/UpdateService/FirmwareInventory"
+
+    def run_get_side_effect(path: str, *_args, **_kwargs):
+        if path == EVENT_URI:
+            return RedfishGetResult(
+                path=EVENT_URI,
+                success=True,
+                data={RF_MEMBERS: []},
+                status_code=200,
+            )
+        if path == asm_uri:
+            return RedfishGetResult(
+                path=asm_uri,
+                success=True,
+                data={"Assemblies": [{"SerialNumber": "dummy-asm-serial"}]},
+                status_code=200,
+            )
+        if path == fw_uri:
+            return RedfishGetResult(
+                path=fw_uri,
+                success=True,
+                data={"Details": "dummy-fw-summary"},
+                status_code=200,
+            )
+        raise AssertionError(f"unexpected Redfish GET path: {path!r}")
+
+    redfish_conn_mock.run_get.side_effect = run_get_side_effect
+
+    def run_get_paged_forbidden(*_args, **_kwargs):
+        raise AssertionError("run_get_paged must not run when follow_next_link=False")
+
+    redfish_conn_mock.run_get_paged.side_effect = run_get_paged_forbidden
+
+    args = MI3XXCollectorArgs(
+        rf_event_log_uri=EVENT_URI,
+        rf_assembly_uri_template=tpl,
+        rf_chassis_devices=["dummy-chassis"],
+        rf_firmware_bundle_uri=fw_uri,
+        follow_next_link=False,
+    )
+    result, data = stub_serviceability_collector.collect_data(args=args)
+    assert result.status == ExecutionStatus.OK
+    assert data is not None
+    assert "dummy-chassis" in data.assembly_info
+    assert data.assembly_info["dummy-chassis"].serial_number == "dummy-asm-serial"
+    assert data.component_details == "dummy-fw-summary"
+    assert asm_uri in data.responses
+
+
+def test_stub_collector_top_when_count_exceeds_top_uses_skip_and_paged(
+    stub_serviceability_collector, redfish_conn_mock
+):
+    probe = RedfishGetResult(
+        path=f"{EVENT_URI}?$top=1",
+        success=True,
+        data={RF_MEMBERS_COUNT: 100},
+        status_code=200,
+    )
+    window = RedfishGetResult(
+        path=f"{EVENT_URI}?$skip=90",
+        success=True,
+        data={RF_MEMBERS: [{"Id": "last"}]},
+        status_code=200,
+    )
+    redfish_conn_mock.run_get.return_value = probe
+    redfish_conn_mock.run_get_paged.return_value = window
+    args = MI3XXCollectorArgs(rf_event_log_uri=EVENT_URI, top=10)
+    result, data = stub_serviceability_collector.collect_data(args=args)
+    assert result.status == ExecutionStatus.OK
+    assert data is not None
+    assert data.rf_events == [{"Id": "last"}]
+    redfish_conn_mock.run_get.assert_called_once()
+    assert "?$top=1" in redfish_conn_mock.run_get.call_args[0][0]
+    redfish_conn_mock.run_get_paged.assert_called_once_with(
+        f"{EVENT_URI}?$skip=90", max_pages=args.max_pages
+    )
+
+
+def test_stub_collector_top_when_count_within_top_fetches_full_log(
+    stub_serviceability_collector, redfish_conn_mock
+):
+    probe = RedfishGetResult(
+        path=f"{EVENT_URI}?$top=1",
+        success=True,
+        data={RF_MEMBERS_COUNT: 3},
+        status_code=200,
+    )
+    full = RedfishGetResult(
+        path=EVENT_URI,
+        success=True,
+        data={RF_MEMBERS: [{"Id": "a"}, {"Id": "b"}]},
+        status_code=200,
+    )
+    redfish_conn_mock.run_get.return_value = probe
+    redfish_conn_mock.run_get_paged.return_value = full
+    args = MI3XXCollectorArgs(rf_event_log_uri=EVENT_URI, top=50)
+    result, data = stub_serviceability_collector.collect_data(args=args)
+    assert result.status == ExecutionStatus.OK
+    assert data is not None
+    assert len(data.rf_events) == 2
+    redfish_conn_mock.run_get_paged.assert_called_once_with(EVENT_URI, max_pages=args.max_pages)
+
+
+def test_serviceability_data_model_log_model_writes_json(tmp_path):
+    model = ServiceabilityDataModel(
+        responses={"/x": {"ok": True}},
+        cper_data={"slot": {"raw": "data"}},
+    )
+    model.log_model(str(tmp_path))
+    responses_file = tmp_path / "redfish_responses.json"
+    cper_file = tmp_path / "cper_data.json"
+    assert responses_file.is_file()
+    assert cper_file.is_file()
+    assert json.loads(responses_file.read_text(encoding="utf-8")) == {"/x": {"ok": True}}
+    assert json.loads(cper_file.read_text(encoding="utf-8")) == {"slot": {"raw": "data"}}
+
+
+def test_serviceability_data_model_log_model_skips_cper_when_empty(tmp_path):
+    model = ServiceabilityDataModel(responses={})
+    model.log_model(str(tmp_path))
+    assert (tmp_path / "redfish_responses.json").is_file()
+    assert not (tmp_path / "cper_data.json").exists()
diff --git a/test/unit/serviceability_dummy_data.py b/test/unit/serviceability_dummy_data.py
new file mode 100644
index 00000000..379727d1
--- /dev/null
+++ b/test/unit/serviceability_dummy_data.py
@@ -0,0 +1,180 @@
+###############################################################################
+#
+# MIT License
+#
+# Copyright (c) 2026 Advanced Micro Devices, Inc.
+#
+# Permission is hereby granted, free of charge, to any person obtaining a copy
+# of this software and associated documentation files (the "Software"), to deal
+# in the Software without restriction, including without limitation the rights
+# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+# copies of the Software, and to permit persons to whom the Software is
+# furnished to do so, subject to the following conditions:
+#
+# The above copyright notice and this permission notice shall be included in all
+# copies or substantial portions of the Software.
+#
+# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+# SOFTWARE.
+#
+###############################################################################
+"""Shared dummy values for serviceability unit tests (not production data)."""
+
+from __future__ import annotations
+
+from typing import Any
+
+DUMMY_AFID_A = 9001
+DUMMY_AFID_B = 9002
+DUMMY_AFID_C = 9003
+DUMMY_AFID_BELOW_RF = 22
+DUMMY_AFID_FATAL_HBM = 25
+DUMMY_RF_CPER_AFID = 10000
+DUMMY_SERVICE_ACTION_NUM = 99
+DUMMY_SERVICE_ACTION_TITLE = "Dummy service action"
+DUMMY_UNIT_A = "dummy_unit_a"
+DUMMY_UNIT_B = "dummy_unit_b"
+DUMMY_UNIT_C = "dummy_unit_c"
+DUMMY_DESIGNATION_A = "DUMMY_SLOT_A"
+DUMMY_DESIGNATION_B = "DUMMY_SLOT_B"
+DUMMY_EVENT_URI = "/redfish/v1/Systems/Dummy/LogServices/DummyEventLog/Entries"
+DUMMY_EVENT_URI_ALT = "/redfish/v1/Systems/Dummy/LogServices/DummyEventLog/EntriesAlt"
+DUMMY_EVENT_LOG_BASE = "/redfish/v1/Systems/Dummy/LogServices/DummyEventLog"
+DUMMY_CPER_ATTACHMENT_URI_1 = f"{DUMMY_EVENT_LOG_BASE}/Attachments/1"
+DUMMY_CPER_ATTACHMENT_URI_2 = f"{DUMMY_EVENT_LOG_BASE}/Attachments/2"
+DUMMY_TIMESTAMP = "2000-01-01T12:00:00+00:00"
+DUMMY_TIMESTAMP_EARLIER = "1999-12-31T12:00:00+00:00"
+DUMMY_TIMESTAMP_LATER = "2000-01-02T12:00:00+00:00"
+DUMMY_RF_EVENT_COUNT = 2
+DUMMY_SAG_PID = "dummy-sag-pid"
+DUMMY_SAG_REVISION = "dummy-rev-0"
+DUMMY_HUB_VERSION = "0.0.0-dummy"
+DUMMY_BMC_HOST = "dummy-bmc.example"
+DUMMY_OEM_VENDOR = "DummyVendor"
+DUMMY_GPU_SERIAL_NUMBER = "DUMMY-GPU-SERIAL-0001"
+DUMMY_DECODED_ERROR_TYPE = "dummy_error_type"
+DUMMY_RF_EVENT_ID_1 = "dummy-rf-evt-1"
+DUMMY_RF_EVENT_ID_2 = "dummy-rf-evt-2"
+DUMMY_CPER_EVENT_ID_BASIC = "dummy-cper-evt-1"
+DUMMY_CPER_EVENT_ID_SKIP = "dummy-cper-evt-skip"
+DUMMY_CPER_EVENT_ID_RF = "dummy-cper-evt-rf"
+DUMMY_CPER_BYTES_BASIC = b"\x01\x02dummy-cper"
+DUMMY_CPER_BYTES_RF = b"\xaa\xbb"
+
+
+def dummy_chassis_uri(unit: str) -> str:
+    return f"/redfish/v1/Chassis/{unit}"
+
+
+def dummy_aca_err_row(*, serial: bool = True, decoded: bool = True) -> dict[str, Any]:
+    meta = {"SerialNumber": DUMMY_GPU_SERIAL_NUMBER} if serial else {"GpuFw": "dummy-fw"}
+    decoded_data = {"error_type": DUMMY_DECODED_ERROR_TYPE} if decoded else {}
+    return {"DecodedData": decoded_data, "MetaData": meta}
+
+
+def dummy_cper_rf_member() -> dict[str, Any]:
+    """RF-range AFID with ACA decode + serial (CPER attachment fetch expected)."""
+    return {
+        "Id": DUMMY_CPER_EVENT_ID_RF,
+        "Created": DUMMY_TIMESTAMP_LATER,
+        "DiagnosticDataType": "CPER",
+        "AdditionalDataURI": DUMMY_CPER_ATTACHMENT_URI_2,
+        "Oem": {
+            "AMDFieldIdentifiers": [{"AFID": DUMMY_RF_CPER_AFID}],
+            "ErrDataArr": [dummy_aca_err_row()],
+        },
+    }
+
+
+def dummy_cper_skip_member() -> dict[str, Any]:
+    """Low AFID with ACA decode + serial (CPER attachment fetch skipped)."""
+    return {
+        "Id": DUMMY_CPER_EVENT_ID_SKIP,
+        "Created": DUMMY_TIMESTAMP_LATER,
+        "DiagnosticDataType": "CPER",
+        "AdditionalDataURI": DUMMY_CPER_ATTACHMENT_URI_1,
+        "Oem": {
+            "AMDFieldIdentifiers": [{"AFID": DUMMY_AFID_BELOW_RF}],
+            "ErrDataArr": [
+                {
+                    "DecodedData": {"error_type": "dummy_on_die_ecc"},
+                    "MetaData": {"SerialNumber": DUMMY_GPU_SERIAL_NUMBER},
+                }
+            ],
+        },
+    }
+
+
+def dummy_cper_basic_member() -> dict[str, Any]:
+    """CPER event without OEM ACA block (attachment fetch expected)."""
+    return {
+        "Id": DUMMY_CPER_EVENT_ID_BASIC,
+        "Created": DUMMY_TIMESTAMP_LATER,
+        "DiagnosticDataType": "CPER",
+        "AdditionalDataURI": DUMMY_CPER_ATTACHMENT_URI_1,
+    }
+
+
+def dummy_openbmc_log_entry() -> dict[str, Any]:
+    """OpenBMC-style LogEntry with Links OOC and AMDFieldIdentifiers[]."""
+    return {
+        "@odata.id": f"{DUMMY_EVENT_URI}/1",
+        "Created": DUMMY_TIMESTAMP,
+        "Id": DUMMY_RF_EVENT_ID_1,
+        "Links": {
+            "OriginOfCondition": {"@odata.id": dummy_chassis_uri(DUMMY_UNIT_A)},
+        },
+        "Oem": {
+            "AMDFieldIdentifiers": [
+                {
+                    "AFID": DUMMY_AFID_BELOW_RF,
+                    "Description": "dummy on-die ECC, uncorrected, non-fatal",
+                    "ServiceableUnits": [{"@odata.id": dummy_chassis_uri(DUMMY_UNIT_A)}],
+                    "ServiceableUnits@odata.count": 1,
+                }
+            ],
+            "AMDFieldIdentifiers@Members.count": 1,
+        },
+    }
+
+
+def dummy_openbmc_log_entry_serviceable_units_only() -> dict[str, Any]:
+    """LogEntry with ServiceableUnits only (no Links OOC)."""
+    return {
+        "Created": DUMMY_TIMESTAMP,
+        "Oem": {
+            "AMDFieldIdentifiers": [
+                {
+                    "AFID": DUMMY_AFID_A,
+                    "ServiceableUnits": [{"@odata.id": dummy_chassis_uri(DUMMY_UNIT_B)}],
+                }
+            ],
+        },
+    }
+
+
+def dummy_fatal_hbm_log_entry() -> dict[str, Any]:
+    """Minimal CPER-style row with Links + AMDFieldIdentifiers[]."""
+    return {
+        "Created": DUMMY_TIMESTAMP_LATER,
+        "Id": DUMMY_RF_EVENT_ID_2,
+        "Links": {
+            "OriginOfCondition": {"@odata.id": dummy_chassis_uri(DUMMY_UNIT_C)},
+        },
+        "Oem": {
+            "AMDFieldIdentifiers": [
+                {
+                    "AFID": DUMMY_AFID_FATAL_HBM,
+                    "Description": "dummy fatal HBM",
+                    "ServiceableUnits": [{"@odata.id": dummy_chassis_uri(DUMMY_UNIT_C)}],
+                    "ServiceableUnits@odata.count": 1,
+                }
+            ],
+            "AMDFieldIdentifiers@Members.count": 1,
+        },
+    }