Skip to content

Add: enable_dump_args — orchestrator args dump #792

Open
vegetabledoww wants to merge 3 commits into
hw-native-sys:mainfrom
vegetabledoww:main
Open

Add: enable_dump_args — orchestrator args dump #792
vegetabledoww wants to merge 3 commits into
hw-native-sys:mainfrom
vegetabledoww:main

Conversation

@vegetabledoww
Copy link
Copy Markdown

@vegetabledoww vegetabledoww commented May 18, 2026

…lars)

New diagnostic flag parallel to enable_dump_tensor. When enabled, dumps orchestrator-level tensor shapes/dtype/addresses and scalar values to args_dump.json via the existing tensor dump collector SHM infrastructure.

Supports both a2a3 slim、 a5 slim platforms and a2a3 onboard, Python and C++ APIs.

Summary

Add --dump-args diagnostic flag that captures orchestrator-level tensor metadata (shape, dtype, address) and scalar values into args_dump.json via the existing tensor dump collector infrastructure.

Design

  • Reuse existing tensor dump channel — no new data transfer path; shares the existing dump SHM, collector, and host export pipeline.
  • New TensorDumpStage::ARGS_DUMP — stage distinguishes args dump from regular tensor dumps.
  • Independent flag--dump-args is independent of --dump-tensor; can be enabled separately.
  • Dual-platform — supports both a2a3 and a5, sim and onboard.

Scope

~40 files across Python CLI/bindings, common config, a2a3 and a5 device/host/runtime layers.

Verification

  • a2a3sim: 1 passed
  • a2a3 onboard: 1 passed
  • a5sim: 1 passed
  • All three produce valid args_dump.json (json.load() validated)
  • UT: 244 passed (1 pre-existing failure, 1 skipped HW test)

Copy link
Copy Markdown

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a feature to dump orchestrator-level arguments (tensors and scalars) into an args_dump.json file for the a2a3 and a5 platforms. Key changes include updating the CallConfig structure, Python bindings, and the TensorDumpCollector to handle a new ARGS_DUMP stage. Feedback identifies several critical issues: the collector currently records arena offsets instead of actual scalar values, and the JSON export is missing essential tensor metadata such as shapes and data pointers. Additionally, several leftover debug logs need to be removed, an indentation error in the a5 collector should be corrected, and the flush logic in the a2a3 executor should be updated to a loop for consistency across platforms.

e.ndims = rec.ndims;
memcpy(e.shapes.data(), rec.shapes, sizeof(e.shapes));
e.type = (rec.ndims == 0) ? "scalar" : "tensor";
e.scalar_value = (rec.ndims == 0 && rec.payload_size > 0 && rec.payload_offset > 0) ? rec.payload_offset : 0;
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The scalar_value is being assigned the payload_offset, which is the offset into the circular arena where the value was written, not the actual 8-byte scalar value. The collector needs to read the uint64_t value from the arena using this offset (handling circular wrap if necessary) to correctly capture the argument value. Additionally, per repository rules, ensure this 64-bit unsigned integer is represented as a string in the JSON output to prevent precision loss in JavaScript-based parsers.

References
  1. When serializing 64-bit unsigned integers to JSON, represent them as strings to prevent precision loss in JavaScript-based parsers.

e.ndims = rec.ndims;
memcpy(e.shapes.data(), rec.shapes, sizeof(e.shapes));
e.type = (rec.ndims == 0) ? "scalar" : "tensor";
e.scalar_value = (rec.ndims == 0 && rec.payload_size > 0) ? rec.payload_offset : 0;
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

Similar to the a2a3 implementation, scalar_value is being set to the arena offset (payload_offset) instead of the actual value stored in the arena. This will result in incorrect values in the exported JSON. Ensure the resulting 64-bit value is serialized as a string.

References
  1. When serializing 64-bit unsigned integers to JSON, represent them as strings to prevent precision loss.

Comment on lines +623 to +625
json << " {\"arg_index\": " << e.arg_index << ", \"type\": \"" << e.type << "\", \"dtype\": \"" << dtype_name << "\", \"ndims\": " << static_cast<uint32_t>(e.ndims);
json << ", \"scalar_value\": " << e.scalar_value;
json << "}";
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The JSON export for orchestrator arguments is missing critical metadata for tensors. The PR description states that it should dump shapes and addresses, but the current implementation only exports arg_index, type, dtype, ndims, and scalar_value. For tensors, the shapes array and the data_ptr (address) should also be included in the JSON object. Note that the data_ptr (a 64-bit address) must be represented as a string to comply with repository standards.

References
  1. When serializing 64-bit unsigned integers (such as hardware addresses) to JSON, represent them as strings to prevent precision loss.

#endif
{
bool en = is_dump_args_enabled();
LOG_ERROR("ARGS_DEBUG: Thread %d is_dump_args_enabled=%d args_addr=%p", thread_idx, en, (void*)orch_args_cached_);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

This LOG_ERROR with the ARGS_DEBUG prefix appears to be leftover debug code and should be removed before merging.

// Flush the args dump buffer so the host collector can pick it up.
{
bool en = is_dump_args_enabled();
LOG_ERROR("ARGS_FLUSH: Thread %d enabled=%d", thread_idx, en);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

This LOG_ERROR with the ARGS_FLUSH prefix appears to be leftover debug code and should be removed.

Comment on lines +762 to +765
int orch_thread = thread_num_ - 1;
if (orch_thread >= sched_thread_num_ && orch_thread < thread_num_) {
dump_tensor_flush(orch_thread);
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The logic here only flushes the last thread (thread_num_ - 1), but the comment above (line 757) says it should flush all threads from sched_thread_num_ to thread_num_ - 1. If there are multiple orchestrator threads, some buffers will not be flushed. This should be a loop to maintain consistency with the implementation in other runtimes like a5.

        for (int t = sched_thread_num_; t < thread_num_; t++) {
            dump_tensor_flush(t);
        }
References
  1. For files shared across different runtimes, maintain identical code for consistency.

Comment on lines +478 to +482
LOG_ERROR("A5_ARGS_FLUSH2: thread=%d count=%u", thread_idx, buf->count);
uint64_t buf_addr = reinterpret_cast<uint64_t>(buf);
uint32_t seq = s_dump_states[thread_idx]->current_buf_seq;
if (enqueue_dump_ready_buffer(thread_idx, buf_addr, seq) != 0) {
int enq_rc = enqueue_dump_ready_buffer(thread_idx, buf_addr, seq);
LOG_ERROR("A5_ARGS_ENQ: thread=%d rc=%d seq=%u", thread_idx, enq_rc, seq);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

These LOG_ERROR statements with A5_ARGS_FLUSH2 and A5_ARGS_ENQ prefixes are clearly debug logs and should be removed.

return 0;
}
} else {
auto export_start = std::chrono::steady_clock::now();
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The code inside the else block is not properly indented.

Comment on lines +1076 to +1079
json << " {\"arg_index\": " << e.arg_index << ", \"type\": \"" << e.type << "\", \"dtype\": \""
<< dtype_name << "\", \"ndims\": " << static_cast<uint32_t>(e.ndims);
json << ", \"scalar_value\": " << e.scalar_value;
json << "}";
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The JSON export for a5 is also missing tensor shapes and addresses (data_ptr), which are required for the feature to be fully functional. Ensure data_ptr is serialized as a string per repository rules for 64-bit integers.

References
  1. When serializing 64-bit unsigned integers (such as hardware addresses) to JSON, represent them as strings to prevent precision loss.

…lars)

New diagnostic flag parallel to enable_dump_tensor. When enabled, dumps
orchestrator-level tensor shapes/dtype/addresses and scalar values to
args_dump.json via the existing tensor dump collector SHM infrastructure.

Supports both a2a3 and a5 platforms, sim and onboard, Python and C++ APIs.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant