Handle row-level failures in Jinja expression columns by dropping invalid rows

## Summary

`ExpressionColumnConfig` currently treats any per-row Jinja render failure as a full-column failure. This makes expression columns much more brittle than LLM-backed columns: one bad row can abort the whole dataset build even when the expression is valid for the rest of the batch.

Feature request: keep expression columns as full-column processing, but handle row-level render failures by dropping only the affected rows and reporting structured warning counts. If every row is dropped, fail the column as a user template/config error.

## Current behavior

In the current sync engine path, `ExpressionColumnGenerator.generate(...)` renders the expression row by row inside a full-column generator. If one row renders to empty text, `UserTemplateSandboxEnvironment.validate_rendered_text(...)` raises `UserTemplateError("User template renders to empty text.")`. That exception escapes the generator and `ColumnWiseBuilder._run_batch(...)` reports the entire expression column as failed.

This differs from LLM-backed columns, which use cell-by-cell generation. Their worker error callback marks only the failing record for omission and continues processing the remaining rows.

## Concrete example

A workflow generated math problems, reviewed each generated row with an `llm-structured` column, and then projected the reviewed answer into a required output field:

```python
ExpressionColumnConfig(
    name="output",
    expr="{{ review.canonical_answer }}",
    dtype="str",
)
```

The upstream review column produced valid structured objects, but a small number of generated examples were intentionally judged invalid or unsolvable by the review model. For those rows, the structured review contained an empty canonical answer:

```json
{
  "review": {
    "is_valid": false,
    "canonical_answer": "",
    "issue": "Under-determined problem; multiple solutions exist."
  }
}
```

Observed outcome in a 1024-row generation:

- 1010 rows had a non-empty `review.canonical_answer`
- 14 rows had `review.canonical_answer == ""`
- the expression column failed on the first empty render
- the job exited nonzero before writing the final `output` column
- downstream consumers saw a partial parquet shard without the required `output` column and had to treat the whole generation as failed

The relevant log looked like:

```text
[INFO] Generating column `output` from expression
UserTemplateError: User template renders to empty text.
DatasetGenerationError: Failed to process column 'output':
User provided prompt generation template is invalid.
```

From the user's perspective this is surprising because a tiny number of bad model-generated rows can invalidate the entire expression column, even though LLM generation failures elsewhere are handled as per-record drops.

## Requested behavior

For expression columns, keep full-column processing, but introduce row-level error handling during expression rendering:

1. Render the expression for each row as today.
2. If the rendered value is `None`, empty, or whitespace-only, drop that row instead of failing the whole column.
3. If rendering raises a row-specific error, drop that row instead of failing the whole column.
4. Track dropped rows by error category.
5. Log a warning that includes the column name, total dropped count, total input rows, and a breakdown by error type.
6. If all rows are dropped, raise a `UserTemplateError` so clearly broken expressions still fail loudly.
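
The steps above could be sketched roughly as follows. This is a minimal illustration, not the library's implementation: `render_with_row_drops` and the `render_row` callable are hypothetical names, the `UserTemplateError` class here is a local stand-in for the real exception, and a real change would live inside `ExpressionColumnGenerator.generate(...)`.

```python
import logging
from collections import Counter
from typing import Any, Callable

logger = logging.getLogger(__name__)


class UserTemplateError(Exception):
    """Local stand-in for the library's exception of the same name."""


def render_with_row_drops(
    column_name: str,
    rows: list[dict[str, Any]],
    render_row: Callable[[dict[str, Any]], Any],  # hypothetical per-row renderer
) -> list[dict[str, Any]]:
    """Render the expression for every row, dropping rows that fail."""
    kept: list[dict[str, Any]] = []
    dropped: Counter = Counter()

    for row in rows:
        try:
            value = render_row(row)
        except Exception:
            # Step 3: a row-specific render error drops only this row.
            dropped["TemplateRenderError"] += 1
            continue
        if value is None or not str(value).strip():
            # Step 2: None, empty, or whitespace-only output drops the row.
            dropped["EmptyRenderedExpression"] += 1
            continue
        kept.append({**row, column_name: value})

    total_dropped = sum(dropped.values())
    if total_dropped:
        # Steps 4 and 5: report the count per category.
        breakdown = ", ".join(f"{k}={v}" for k, v in sorted(dropped.items()))
        summary = (
            f"Expression column '{column_name}' dropped "
            f"{total_dropped}/{len(rows)} rows after render: {breakdown}."
        )
        if not kept:
            # Step 6: nothing survived, so fail loudly.
            logger.error(summary)
            raise UserTemplateError(
                f"Expression column '{column_name}' produced no valid rows."
            )
        logger.warning("%s Continuing with %d rows.", summary, len(kept))
    return kept
```

Catching a broad `Exception` keeps the sketch short; a real implementation would presumably catch only the render errors that are known to be row-specific, so that static template problems still propagate.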

Example warning for partial drops:

```text
[WARNING] Expression column 'output' dropped 14/1024 rows after render: EmptyRenderedExpression=14. Continuing with 1010 rows.
```

Example all-dropped failure:

```text
[ERROR] Expression column 'output' dropped 1024/1024 rows after render: EmptyRenderedExpression=1024.
UserTemplateError: Expression column 'output' produced no valid rows.
```

Possible error categories:

- `EmptyRenderedExpression` for `None`, empty, or whitespace-only render results
- `TemplateRenderError` for row-specific Jinja rendering exceptions
- `TypeCastError` for failures converting the rendered value to the configured dtype
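
As an illustration of how a rendered string might be mapped to the empty and cast categories, here is a small sketch. The function name `classify_rendered_value` and the `_CASTERS` table are hypothetical; only `"str"` appears as a dtype in this issue, so the `"int"` and `"float"` entries are assumptions.

```python
from typing import Any, Optional

# Assumed dtype names; only "str" is confirmed by the example in this issue.
_CASTERS = {"str": str, "int": int, "float": float}


def classify_rendered_value(rendered: Optional[str], dtype: str) -> tuple[Any, Optional[str]]:
    """Return (value, None) on success or (None, category) for a row to drop."""
    if rendered is None or not rendered.strip():
        return None, "EmptyRenderedExpression"
    # An unknown dtype raises KeyError here on purpose: that is a static
    # config error, not a row-level failure.
    caster = _CASTERS[dtype]
    try:
        return caster(rendered), None
    except ValueError:
        return None, "TypeCastError"
```

For example, `classify_rendered_value("n/a", "float")` would return `(None, "TypeCastError")`, while an unsupported dtype is left to surface as a configuration failure rather than being counted as a dropped row.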

Static configuration errors should still fail immediately. For example, missing required columns, invalid Jinja syntax, unsupported template operations, or an invalid expression dtype should remain full-column/user-template failures before row-level processing begins.

## Impact on existing users

Positive impact:

- Makes expression columns consistent with model-backed columns when failures are caused by individual records.
- Prevents large generation jobs from being invalidated by a few stochastic upstream outputs.
- Improves robustness for common patterns where expression columns project or normalize fields produced by LLM columns, validators, or judges.
- Preserves visibility into quality problems through warning counts and downstream row counts/yield metrics.

Behavior change to be aware of:

- A workflow that currently fails on the first empty expression render would instead complete with fewer rows.
- This could mask some user mistakes if users ignore warnings. The all-dropped case should still fail, and static template/config errors should still fail before row processing.
- If maintainers want a transition path, this could be exposed through a run/config flag, but the default row-drop behavior would better match existing LLM cell failure semantics.
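
If a flag is added, its effect could be as small as the sketch below. The policy values `"drop"` and `"raise"`, the `RowRenderError` class, and `record_row_failure` are all hypothetical names used only to show the shape of the switch; the issue does not propose a specific flag name.

```python
from collections import Counter
from typing import Literal

# Hypothetical policy values; "drop" is the proposed default.
RowErrorPolicy = Literal["drop", "raise"]


class RowRenderError(Exception):
    """Hypothetical error used when the strict policy is selected."""


def record_row_failure(
    category: str,
    dropped: Counter,
    policy: RowErrorPolicy = "drop",
) -> None:
    """Either count the failed row for later reporting or fail immediately."""
    if policy == "raise":
        # Strict mode: keep today's fail-on-first-bad-row behavior.
        raise RowRenderError(f"Row failed with {category}")
    dropped[category] += 1
```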

## Why this matters

Expression columns render deterministically, but their inputs often are not deterministic: in real pipelines, they frequently depend on stochastic upstream LLM-generated fields. Treating every render failure as a global configuration error is too strict for that usage pattern. Dropping only the affected rows gives users the same resilience they already get from LLM columns while retaining fail-fast behavior for truly broken expressions.


Handle row-level failures in Jinja expression columns by dropping invalid rows #749
