Bug
In the async engine (DATA_DESIGNER_ASYNC_ENGINE=1), AsyncTaskScheduler._run_cell only writes columns tracked in _instance_to_columns back to the RowGroupBufferManager. Side-effect columns produced by generators (e.g. __trace from with_trace, __reasoning_content from extract_reasoning_content) are present in the result dict but are silently dropped during the buffer write-back.
When a downstream column references a side-effect column in its prompt template, the value is missing from the row buffer, causing a template rendering error:
The following ['<column>__reasoning_content'] columns are missing!
All rows for that downstream column fail as non-retryable, and the entire dataset generation fails.
Root Cause
_instance_to_columns is built from the generators dict, which maps only primary column names to generator instances, so side-effect columns are never registered in it. The buffer write loop in _run_cell (lines 796-799) iterates only over the output_cols taken from this map, so any extra keys in the result dict are never written to the buffer.
The same issue exists in _run_batch for batch generators.
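The dropping behavior described above can be illustrated with a small standalone sketch. It does not use the real scheduler or buffer classes: write_back_tracked_only, the plain dicts, and the column names "topic" and "answer" are hypothetical stand-ins chosen for illustration, and the loop only approximates what the write-back is described as doing.

```python
# Toy model of the write-back described above. All names here are
# hypothetical stand-ins, not the actual Data Designer implementation.

def write_back_tracked_only(buffer_row: dict, result: dict, output_cols: list[str]) -> dict:
    """Copy only the tracked output columns from result into the buffer row."""
    for col in output_cols:
        if col in result:
            buffer_row[col] = result[col]
    return buffer_row


# Input row as the generator received it.
row_data = {"topic": "tides"}

# Generator result: the primary column plus a side-effect column.
result = {
    **row_data,
    "answer": "The moon's gravity.",
    "answer__reasoning_content": "Think about gravitational pull...",
}

# Only the primary column is tracked, mirroring _instance_to_columns.
tracked_output_cols = ["answer"]

buffer_row = write_back_tracked_only(dict(row_data), result, tracked_output_cols)

# A downstream template that needs the side-effect column finds it missing.
required_by_downstream = ["answer__reasoning_content"]
missing = [col for col in required_by_downstream if col not in buffer_row]
print(f"The following {missing} columns are missing!")
```

In this toy model the side-effect key is present in result but never reaches buffer_row, which matches the symptom reported for the downstream column.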
Impact
Any pipeline using extract_reasoning_content=True or with_trace != TraceType.NONE where a downstream column references the side-effect column will fail under the async engine. The sync engine is unaffected because it mutates the row dict in place.
Fix
After writing tracked output_cols, also persist any new keys from the result dict (keys not present in the input row_data) to the buffer. Apply the same pattern to _run_batch.
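A minimal sketch of the proposed write-back, under the assumption that the buffer row, the input row_data, and the generator result can all be treated as plain dicts. The helper names persist_result and persist_batch_results are hypothetical and are not part of the real AsyncTaskScheduler or RowGroupBufferManager API; the actual change would go through whatever update method the buffer manager exposes.

```python
# Sketch of the proposed fix using plain dicts. Helper names are
# hypothetical; the real code would write through the buffer manager.

def persist_result(buffer_row: dict, row_data: dict, result: dict, output_cols: list[str]) -> dict:
    """Write tracked columns, then any keys the generator newly added."""
    # Step 1: tracked output columns, as before.
    for col in output_cols:
        if col in result:
            buffer_row[col] = result[col]
    # Step 2: side-effect columns, i.e. keys absent from the input row.
    for key, value in result.items():
        if key not in row_data and key not in output_cols:
            buffer_row[key] = value
    return buffer_row


def persist_batch_results(buffer_rows: list[dict], input_rows: list[dict],
                          results: list[dict], output_cols: list[str]) -> list[dict]:
    """Apply the same pattern row by row, as a batch path might."""
    return [
        persist_result(buf, row, res, output_cols)
        for buf, row, res in zip(buffer_rows, input_rows, results)
    ]


fix_row_data = {"topic": "tides"}
fix_result = {
    **fix_row_data,
    "answer": "The moon's gravity.",
    "answer__reasoning_content": "Think about gravitational pull...",
}

fixed_row = persist_result(dict(fix_row_data), fix_row_data, fix_result, ["answer"])
fixed_batch = persist_batch_results(
    [dict(fix_row_data)], [fix_row_data], [fix_result], ["answer"]
)
```

Filtering on keys that are absent from row_data keeps the write-back from overwriting upstream columns that the generator merely passed through, while still capturing anything the generator added.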
Affected files
packages/data-designer-engine/src/data_designer/engine/dataset_builders/async_scheduler.py