Add: enable_dump_args — orchestrator args dump #792
Conversation
There was a problem hiding this comment.
Code Review
This pull request introduces a feature to dump orchestrator-level arguments (tensors and scalars) into an args_dump.json file for the a2a3 and a5 platforms. Key changes include updating the CallConfig structure, Python bindings, and the TensorDumpCollector to handle a new ARGS_DUMP stage. Feedback identifies several critical issues: the collector currently records arena offsets instead of actual scalar values, and the JSON export is missing essential tensor metadata such as shapes and data pointers. Additionally, several leftover debug logs need to be removed, an indentation error in the a5 collector should be corrected, and the flush logic in the a2a3 executor should be updated to a loop for consistency across platforms.
| e.ndims = rec.ndims; | ||
| memcpy(e.shapes.data(), rec.shapes, sizeof(e.shapes)); | ||
| e.type = (rec.ndims == 0) ? "scalar" : "tensor"; | ||
| e.scalar_value = (rec.ndims == 0 && rec.payload_size > 0 && rec.payload_offset > 0) ? rec.payload_offset : 0; |
There was a problem hiding this comment.
The scalar_value is being assigned the payload_offset, which is the offset into the circular arena where the value was written, not the actual 8-byte scalar value. The collector needs to read the uint64_t value from the arena using this offset (handling circular wrap if necessary) to correctly capture the argument value. Additionally, per repository rules, ensure this 64-bit unsigned integer is represented as a string in the JSON output to prevent precision loss in JavaScript-based parsers.
References
- When serializing 64-bit unsigned integers to JSON, represent them as strings to prevent precision loss in JavaScript-based parsers.
| e.ndims = rec.ndims; | ||
| memcpy(e.shapes.data(), rec.shapes, sizeof(e.shapes)); | ||
| e.type = (rec.ndims == 0) ? "scalar" : "tensor"; | ||
| e.scalar_value = (rec.ndims == 0 && rec.payload_size > 0) ? rec.payload_offset : 0; |
There was a problem hiding this comment.
Similar to the a2a3 implementation, scalar_value is being set to the arena offset (payload_offset) instead of the actual value stored in the arena. This will result in incorrect values in the exported JSON. Ensure the resulting 64-bit value is serialized as a string.
References
- When serializing 64-bit unsigned integers to JSON, represent them as strings to prevent precision loss.
| json << " {\"arg_index\": " << e.arg_index << ", \"type\": \"" << e.type << "\", \"dtype\": \"" << dtype_name << "\", \"ndims\": " << static_cast<uint32_t>(e.ndims); | ||
| json << ", \"scalar_value\": " << e.scalar_value; | ||
| json << "}"; |
There was a problem hiding this comment.
The JSON export for orchestrator arguments is missing critical metadata for tensors. The PR description states that it should dump shapes and addresses, but the current implementation only exports arg_index, type, dtype, ndims, and scalar_value. For tensors, the shapes array and the data_ptr (address) should also be included in the JSON object. Note that the data_ptr (a 64-bit address) must be represented as a string to comply with repository standards.
References
- When serializing 64-bit unsigned integers (such as hardware addresses) to JSON, represent them as strings to prevent precision loss.
| #endif | ||
| { | ||
| bool en = is_dump_args_enabled(); | ||
| LOG_ERROR("ARGS_DEBUG: Thread %d is_dump_args_enabled=%d args_addr=%p", thread_idx, en, (void*)orch_args_cached_); |
| // Flush the args dump buffer so the host collector can pick it up. | ||
| { | ||
| bool en = is_dump_args_enabled(); | ||
| LOG_ERROR("ARGS_FLUSH: Thread %d enabled=%d", thread_idx, en); |
| int orch_thread = thread_num_ - 1; | ||
| if (orch_thread >= sched_thread_num_ && orch_thread < thread_num_) { | ||
| dump_tensor_flush(orch_thread); | ||
| } |
There was a problem hiding this comment.
The logic here only flushes the last thread (thread_num_ - 1), but the comment above (line 757) says it should flush all threads from sched_thread_num_ to thread_num_ - 1. If there are multiple orchestrator threads, some buffers will not be flushed. This should be a loop to maintain consistency with the implementation in other runtimes like a5.
for (int t = sched_thread_num_; t < thread_num_; t++) {
dump_tensor_flush(t);
}References
- For files shared across different runtimes, maintain identical code for consistency.
| LOG_ERROR("A5_ARGS_FLUSH2: thread=%d count=%u", thread_idx, buf->count); | ||
| uint64_t buf_addr = reinterpret_cast<uint64_t>(buf); | ||
| uint32_t seq = s_dump_states[thread_idx]->current_buf_seq; | ||
| if (enqueue_dump_ready_buffer(thread_idx, buf_addr, seq) != 0) { | ||
| int enq_rc = enqueue_dump_ready_buffer(thread_idx, buf_addr, seq); | ||
| LOG_ERROR("A5_ARGS_ENQ: thread=%d rc=%d seq=%u", thread_idx, enq_rc, seq); |
| return 0; | ||
| } | ||
| } else { | ||
| auto export_start = std::chrono::steady_clock::now(); |
| json << " {\"arg_index\": " << e.arg_index << ", \"type\": \"" << e.type << "\", \"dtype\": \"" | ||
| << dtype_name << "\", \"ndims\": " << static_cast<uint32_t>(e.ndims); | ||
| json << ", \"scalar_value\": " << e.scalar_value; | ||
| json << "}"; |
There was a problem hiding this comment.
The JSON export for a5 is also missing tensor shapes and addresses (data_ptr), which are required for the feature to be fully functional. Ensure data_ptr is serialized as a string per repository rules for 64-bit integers.
References
- When serializing 64-bit unsigned integers (such as hardware addresses) to JSON, represent them as strings to prevent precision loss.
…lars) New diagnostic flag parallel to enable_dump_tensor. When enabled, dumps orchestrator-level tensor shapes/dtype/addresses and scalar values to args_dump.json via the existing tensor dump collector SHM infrastructure. Supports both a2a3 and a5 platforms, sim and onboard, Python and C++ APIs.
…s/data_ptr, remove debug logs, fix indent
…lars)
New diagnostic flag parallel to enable_dump_tensor. When enabled, dumps orchestrator-level tensor shapes/dtype/addresses and scalar values to args_dump.json via the existing tensor dump collector SHM infrastructure.
Supports both a2a3 slim、 a5 slim platforms and a2a3 onboard, Python and C++ APIs.
Summary
Add
--dump-argsdiagnostic flag that captures orchestrator-level tensor metadata (shape, dtype, address) and scalar values intoargs_dump.jsonvia the existing tensor dump collector infrastructure.Design
TensorDumpStage::ARGS_DUMP— stage distinguishes args dump from regular tensor dumps.--dump-argsis independent of--dump-tensor; can be enabled separately.Scope
~40 files across Python CLI/bindings, common config, a2a3 and a5 device/host/runtime layers.
Verification
1 passed1 passed1 passedargs_dump.json(json.load() validated)