Custom column generators currently wrap all exceptions in CustomColumnGenerationError, which the async scheduler treats as non-retryable. This means transient failures (503s, rate limits, timeouts) cause rows to be permanently dropped instead of retried in salvage rounds.
Problem
In custom.py, the generate method catches all exceptions and wraps them:
    except Exception as e:
        raise CustomColumnGenerationError(...) from e
The scheduler only retries exceptions in _RETRYABLE_MODEL_ERRORS (ModelInternalServerError, ModelRateLimitError, etc.). The original error is buried as __cause__ and never checked.
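To make the failure mode concrete, here is a minimal, self-contained sketch. The exception classes are local stand-ins that only reuse the names from this issue, and `generate_wrapping_everything`, `rate_limited`, and `is_retryable` are hypothetical helpers, not the project's actual code. It assumes the scheduler checks only the type of the exception it receives.

```python
# Stand-in exception types; the real ones live in the library.
class ModelInternalServerError(Exception):
    """Stand-in for a retryable provider 5xx error."""


class ModelRateLimitError(Exception):
    """Stand-in for a retryable provider rate-limit error."""


class CustomColumnGenerationError(Exception):
    """Stand-in for the wrapper raised by the custom column generator."""


_RETRYABLE_MODEL_ERRORS = (ModelInternalServerError, ModelRateLimitError)


def generate_wrapping_everything(user_fn):
    # Mirrors the current behavior described above: every exception is wrapped.
    try:
        return user_fn()
    except Exception as e:
        raise CustomColumnGenerationError("custom generator failed") from e


def rate_limited():
    # Hypothetical user function that hits a transient provider error.
    raise ModelRateLimitError("rate limited by provider")


def is_retryable(exc):
    # Assumed scheduler check: inspects the exception's own type, not __cause__.
    return isinstance(exc, _RETRYABLE_MODEL_ERRORS)


try:
    generate_wrapping_everything(rate_limited)
except Exception as exc:
    caught = exc

retryable = is_retryable(caught)
cause_is_retryable = isinstance(caught.__cause__, _RETRYABLE_MODEL_ERRORS)
```

Under these assumptions the wrapper is classified as non-retryable even though its `__cause__` is a retryable error.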
Proposed fix
If the original exception is already a retryable model error, re-raise it unwrapped:
    except Exception as e:
        if isinstance(e, _RETRYABLE_MODEL_ERRORS):
            raise
        raise CustomColumnGenerationError(...) from e
This gives custom generators that call model APIs (via the models dict) the same salvage/retry behavior as built-in LLM columns, while non-model errors remain non-retryable.
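A rough sketch of how the proposed handler would behave, again using local stand-in exception classes named after the ones in this issue. `generate`, `classify`, and the sample user functions are hypothetical and only approximate what the scheduler might do with each outcome.

```python
# Stand-in exception types; the real ones live in the library.
class ModelInternalServerError(Exception):
    """Stand-in for a retryable provider 5xx error."""


class ModelRateLimitError(Exception):
    """Stand-in for a retryable provider rate-limit error."""


class CustomColumnGenerationError(Exception):
    """Stand-in for the wrapper raised by the custom column generator."""


_RETRYABLE_MODEL_ERRORS = (ModelInternalServerError, ModelRateLimitError)


def generate(user_fn):
    try:
        return user_fn()
    except Exception as e:
        # Retryable model errors pass through unwrapped so the caller can see them.
        if isinstance(e, _RETRYABLE_MODEL_ERRORS):
            raise
        # Everything else is still wrapped and stays non-retryable.
        raise CustomColumnGenerationError("custom generator failed") from e


def classify(user_fn):
    # Hypothetical scheduler-side decision based on the exception type.
    try:
        generate(user_fn)
    except _RETRYABLE_MODEL_ERRORS:
        return "retry"
    except CustomColumnGenerationError:
        return "drop"
    return "ok"


def rate_limited():
    raise ModelRateLimitError("rate limited by provider")


def bad_user_code():
    raise ValueError("bug in the custom generator")


def healthy():
    return {"column": "value"}
```

With this handler, `classify(rate_limited)` takes the retry path, while `classify(bad_user_code)` is still wrapped and dropped.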
Impact
- Custom generators using model_aliases would benefit from automatic retries on transient failures
- No change for custom generators that don't interact with models
- Consistent behavior between LLMTextColumnConfig and CustomColumnConfig when both hit the same provider errors