feat: make LiteralPredicate serializable via internal IcebergBaseModel#2561
feat: make LiteralPredicate serializable via internal IcebergBaseModel#2561Fokko merged 9 commits intoapache:mainfrom
Conversation
|
In the issue #2523 it is said to derive the class from |
|
I have now marked it as a Draft since I am not sure now that is the kind of implementation you want. Tests are still passing now using LiteralPredicate as subclass of IcebergBaseModel, but had to make |
|
Something fishy that I had to pull in order for tests to pass was putting the attribute The problem is that the earlier implementation was calling def _to_unbound_term(term: Union[str, UnboundTerm[Any]]) -> UnboundTerm[Any]:
return Reference(term) if isinstance(term, str) else termIf term is not _______________________________________________ test_not_equal_to_invert _______________________________________________
def test_not_equal_to_invert() -> None:
> bound = NotEqualTo(
term=BoundReference( # type: ignore
field=NestedField(field_id=1, name="foo", field_type=StringType(), required=False),
accessor=Accessor(position=0, inner=None),
),
literal="hello",
)Should we address this in this PR or delegate it to another issue? |
That's a bit of an edge case, since you deliberately ignore the type annotation. We could add a check in the function itself: def _to_unbound_term(term: Union[str, UnboundTerm[Any]]) -> UnboundTerm[Any]:
if isinstance(term, str),
return Reference(term)
elif isinstance(term, UnboundTerm):
return term
else:
raise ValueError(f"Expected UnboundTerm or str, but got: {term}")
return Reference(term) if isinstance(term, str) else term |
I do not ignore the type annotation, it is the current implementation and a current test that is ignoring the annotation. I am trying to just implement the issue requirement and since now the types are checked in runtime by pydantic the (old) test is not passing. |
00bc5db to
5118748
Compare
pyiceberg/expressions/__init__.py
Outdated
| def __init__(self, *args: Any, **kwargs: Any) -> None: | ||
| if args: | ||
| if len(args) != 2: | ||
| raise TypeError("Expected (term, literal)") | ||
| kwargs = {"term": args[0], "literal": args[1], **kwargs} | ||
| super().__init__(**kwargs) |
There was a problem hiding this comment.
After having many issues with an init such as:
def __init__(self, term: Union[str, UnboundTerm[Any]], literals: Union[Iterable[L], Iterable[Literal[L]]]):
super().__init__(term=_to_unbound_term(term), items=_to_literal_set(literals))Because there are some typing errors with _transform_literal in pyiceberg/transforms.py for example:
pyiceberg/transforms.py:1113: error: Argument 1 to "_transform_literal" has incompatible type "Callable[[str | None], str | None]"; expected "Callable[[str], str]" [arg-type]
pyiceberg/transforms.py:1113: error: Argument 1 to "_transform_literal" has incompatible type "Callable[[bool | None], bool | None]"; expected "Callable[[str], str]" [arg-type]
pyiceberg/transforms.py:1113: error: Argument 1 to "_transform_literal" has incompatible type "Callable[[int | None], int | None]"; expected "Callable[[str], str]" [arg-type]
pyiceberg/transforms.py:1113: error: Argument 1 to "_transform_literal" has incompatible type "Callable[[float | None], float | None]"; expected "Callable[[str], str]" [arg-type]
pyiceberg/transforms.py:1113: error: Argument 1 to "_transform_literal" has incompatible type "Callable[[bytes | None], bytes | None]"; expected "Callable[[str], str]" [arg-type]
pyiceberg/transforms.py:1113: error: Argument 1 to "_transform_literal" has incompatible type "Callable[[UUID | None], UUID | None]"; expected "Callable[[str], str]" [arg-type]
I decided to just go for this implementation of init. The problem now is that:
assert_type(EqualTo("a", "b"), EqualTo[str]) # <-- Fails
------
tests/expressions/test_expressions.py:1238: error: Expression is of type "LiteralPredicate[L]", not "EqualTo[str]" [assert-type]So I am really stuck, would you mind lending a hand here? @Fokko
There was a problem hiding this comment.
Always! So, I think the linter isn't really sure what to do. It is pretty clear:
In the signature, we see that transform accepts an optional, which I think is correct. However, _transform_literal requires non-null, which is incorrect.
def _truncate_number(
name: str, pred: BoundLiteralPredicate[L], transform: Callable[[Optional[L]], Optional[L]]
) -> Optional[UnboundPredicate[Any]]:
boundary = pred.literal
if not isinstance(boundary, (LongLiteral, DecimalLiteral, DateLiteral, TimestampLiteral)):
raise ValueError(f"Expected a numeric literal, got: {type(boundary)}")
if isinstance(pred, BoundLessThan):
return LessThanOrEqual(Reference(name), _transform_literal(transform, boundary.decrement()))
elif isinstance(pred, BoundLessThanOrEqual):The following change suppresses most of the warnings for me:
-def _transform_literal(func: Callable[[L], L], lit: Literal[L]) -> Literal[L]:
+def _transform_literal(func: Callable[[Any], Any], lit: Literal[L]) -> Literal[L]:
"""Small helper to upwrap the value from the literal, and wrap it again."""
return literal(func(lit.value))
pyiceberg/expressions/__init__.py
Outdated
| class LiteralPredicate(IcebergBaseModel, UnboundPredicate[L], ABC): | ||
| type: TypingLiteral["lt", "lt-eq", "gt", "gt-eq", "eq", "not-eq", "starts-with", "not-starts-with"] = Field(alias="type") | ||
| term: UnboundTerm[L] | ||
| literal: Literal[L] = Field(serialization_alias="value") |
There was a problem hiding this comment.
| literal: Literal[L] = Field(serialization_alias="value") | |
| value: Literal[L] = Field() |
pyiceberg/expressions/__init__.py
Outdated
| @field_validator("term", mode="before") | ||
| @classmethod | ||
| def _coerce_term(cls, v: Any) -> UnboundTerm[Any]: | ||
| return _to_unbound_term(v) | ||
|
|
||
| def __init__(self, term: Union[str, UnboundTerm[Any]], literal: Union[L, Literal[L]]): # pylint: disable=W0621 | ||
| super().__init__(term) | ||
| self.literal = _to_literal(literal) # pylint: disable=W0621 | ||
| @field_validator("literal", mode="before") | ||
| @classmethod | ||
| def _coerce_literal(cls, v: Union[L, Literal[L]]) -> Literal[L]: | ||
| return _to_literal(v) | ||
|
|
||
| @field_serializer("literal") | ||
| def ser_literal(self, literal: Literal[L]) -> str: | ||
| return "Any" |
There was a problem hiding this comment.
If we just call the field value, and we add a literal property:
@property
def literal(self) -> Literal[L]:
return self.value
pyiceberg/expressions/__init__.py
Outdated
| def __init__(self, *args: Any, **kwargs: Any) -> None: | ||
| if args: | ||
| if len(args) != 2: | ||
| raise TypeError("Expected (term, literal)") | ||
| kwargs = {"term": args[0], "literal": args[1], **kwargs} | ||
| super().__init__(**kwargs) |
There was a problem hiding this comment.
Always! So, I think the linter isn't really sure what to do. It is pretty clear:
In the signature, we see that transform accepts an optional, which I think is correct. However, _transform_literal requires non-null, which is incorrect.
def _truncate_number(
name: str, pred: BoundLiteralPredicate[L], transform: Callable[[Optional[L]], Optional[L]]
) -> Optional[UnboundPredicate[Any]]:
boundary = pred.literal
if not isinstance(boundary, (LongLiteral, DecimalLiteral, DateLiteral, TimestampLiteral)):
raise ValueError(f"Expected a numeric literal, got: {type(boundary)}")
if isinstance(pred, BoundLessThan):
return LessThanOrEqual(Reference(name), _transform_literal(transform, boundary.decrement()))
elif isinstance(pred, BoundLessThanOrEqual):The following change suppresses most of the warnings for me:
-def _transform_literal(func: Callable[[L], L], lit: Literal[L]) -> Literal[L]:
+def _transform_literal(func: Callable[[Any], Any], lit: Literal[L]) -> Literal[L]:
"""Small helper to upwrap the value from the literal, and wrap it again."""
return literal(func(lit.value))
pyiceberg/expressions/__init__.py
Outdated
| value: Literal[L] = Field(alias="literal", serialization_alias="value") | ||
|
|
||
| model_config = ConfigDict(populate_by_name=True, frozen=True, arbitrary_types_allowed=True) | ||
|
|
||
| def __init__( | ||
| self, | ||
| term: Union[str, UnboundTerm[Any], BoundReference[Any]], | ||
| literal: Union[L, Literal[L], None] = None, | ||
| **data: Any, | ||
| ) -> None: # pylint: disable=W0621 | ||
| extra = dict(data) | ||
|
|
||
| literal_candidates = [] | ||
| if literal is not None: | ||
| literal_candidates.append(literal) | ||
| if "literal" in extra: | ||
| literal_candidates.append(extra.pop("literal")) | ||
| if "value" in extra: | ||
| literal_candidates.append(extra.pop("value")) | ||
|
|
||
| literal_candidates = [candidate for candidate in literal_candidates if candidate is not None] | ||
|
|
||
| if not literal_candidates: | ||
| raise TypeError("LiteralPredicate requires a literal or value argument") | ||
| if len(literal_candidates) > 1: | ||
| raise TypeError("literal/value provided multiple times") | ||
|
|
||
| init = cast("Callable[..., None]", IcebergBaseModel.__init__) | ||
| init(self, term=_to_unbound_term(term), literal=_to_literal(literal_candidates[0]), **extra) | ||
|
|
||
| @field_validator("term", mode="before") | ||
| @classmethod | ||
| def _convert_term(cls, value: Any) -> UnboundTerm[Any]: | ||
| return _to_unbound_term(value) | ||
|
|
||
| @field_validator("value", mode="before") | ||
| @classmethod | ||
| def _convert_value(cls, value: Any) -> Literal[Any]: | ||
| return _to_literal(value) |
There was a problem hiding this comment.
I'm able to get the test passing with:
| value: Literal[L] = Field(alias="literal", serialization_alias="value") | |
| model_config = ConfigDict(populate_by_name=True, frozen=True, arbitrary_types_allowed=True) | |
| def __init__( | |
| self, | |
| term: Union[str, UnboundTerm[Any], BoundReference[Any]], | |
| literal: Union[L, Literal[L], None] = None, | |
| **data: Any, | |
| ) -> None: # pylint: disable=W0621 | |
| extra = dict(data) | |
| literal_candidates = [] | |
| if literal is not None: | |
| literal_candidates.append(literal) | |
| if "literal" in extra: | |
| literal_candidates.append(extra.pop("literal")) | |
| if "value" in extra: | |
| literal_candidates.append(extra.pop("value")) | |
| literal_candidates = [candidate for candidate in literal_candidates if candidate is not None] | |
| if not literal_candidates: | |
| raise TypeError("LiteralPredicate requires a literal or value argument") | |
| if len(literal_candidates) > 1: | |
| raise TypeError("literal/value provided multiple times") | |
| init = cast("Callable[..., None]", IcebergBaseModel.__init__) | |
| init(self, term=_to_unbound_term(term), literal=_to_literal(literal_candidates[0]), **extra) | |
| @field_validator("term", mode="before") | |
| @classmethod | |
| def _convert_term(cls, value: Any) -> UnboundTerm[Any]: | |
| return _to_unbound_term(value) | |
| @field_validator("value", mode="before") | |
| @classmethod | |
| def _convert_value(cls, value: Any) -> Literal[Any]: | |
| return _to_literal(value) | |
| value: Literal[L] = Field() | |
| model_config = ConfigDict(arbitrary_types_allowed=True) | |
| def __init__(self, term: Union[str, UnboundTerm[Any]], literal: Union[L, Literal[L]]): # pylint: disable=W0621 | |
| super().__init__(term=_to_unbound_term(term), value=_to_literal(literal)) # pylint: disable=W0621 |
Less is more :)
There was a problem hiding this comment.
Linter is not happy with that:
mypy.....................................................................Failed
- hook id: mypy
- exit code: 1
pyiceberg/expressions/__init__.py:436: note: "__init__" of "UnboundPredicate" defined here
pyiceberg/expressions/__init__.py:754: error: Unexpected keyword argument "value" for "__init__" of "UnboundPredicate" [call-arg]
tests/expressions/test_expressions.py:433: error: Argument "term" to "NotEqualTo" has incompatible type "BoundReference[str]"; expected "str | UnboundTerm[Any]" [arg-type]
tests/expressions/test_expressions.py:440: error: Argument "term" to "EqualTo" has incompatible type "BoundReference[str]"; expected "str | UnboundTerm[Any]" [arg-type]
tests/expressions/test_expressions.py:450: error: Argument "term" to "GreaterThanOrEqual" has incompatible type "BoundReference[str]"; expected "str | UnboundTerm[Any]" [arg-type]
tests/expressions/test_expressions.py:457: error: Argument "term" to "LessThan" has incompatible type "BoundReference[str]"; expected "str | UnboundTerm[Any]" [arg-type]
tests/expressions/test_expressions.py:467: error: Argument "term" to "LessThanOrEqual" has incompatible type "BoundReference[str]"; expected "str | UnboundTerm[Any]" [arg-type]
tests/expressions/test_expressions.py:474: error: Argument "term" to "GreaterThan" has incompatible type "BoundReference[str]"; expected "str | UnboundTerm[Any]" [arg-type]
Found 7 errors in 2 files (checked 167 source files)```
There was a problem hiding this comment.
This error im not sure how to tackle it:
tests/expressions/test_expressions.py:433: error: Argument "term" to "NotEqualTo" has incompatible type "BoundReference[str]"; expected "str | UnboundTerm[Any]" [arg-type]
The test errors can be easily fixed by:
| value: Literal[L] = Field(alias="literal", serialization_alias="value") | |
| model_config = ConfigDict(populate_by_name=True, frozen=True, arbitrary_types_allowed=True) | |
| def __init__( | |
| self, | |
| term: Union[str, UnboundTerm[Any], BoundReference[Any]], | |
| literal: Union[L, Literal[L], None] = None, | |
| **data: Any, | |
| ) -> None: # pylint: disable=W0621 | |
| extra = dict(data) | |
| literal_candidates = [] | |
| if literal is not None: | |
| literal_candidates.append(literal) | |
| if "literal" in extra: | |
| literal_candidates.append(extra.pop("literal")) | |
| if "value" in extra: | |
| literal_candidates.append(extra.pop("value")) | |
| literal_candidates = [candidate for candidate in literal_candidates if candidate is not None] | |
| if not literal_candidates: | |
| raise TypeError("LiteralPredicate requires a literal or value argument") | |
| if len(literal_candidates) > 1: | |
| raise TypeError("literal/value provided multiple times") | |
| init = cast("Callable[..., None]", IcebergBaseModel.__init__) | |
| init(self, term=_to_unbound_term(term), literal=_to_literal(literal_candidates[0]), **extra) | |
| @field_validator("term", mode="before") | |
| @classmethod | |
| def _convert_term(cls, value: Any) -> UnboundTerm[Any]: | |
| return _to_unbound_term(value) | |
| @field_validator("value", mode="before") | |
| @classmethod | |
| def _convert_value(cls, value: Any) -> Literal[Any]: | |
| return _to_literal(value) | |
| value: Literal[L] = Field() | |
| model_config = ConfigDict(populate_by_name=True, frozen=True, arbitrary_types_allowed=True) | |
| def __init__(self, term: Union[str, UnboundTerm[Any], BoundReference[Any]], literal: Union[L, Literal[L]]): # pylint: disable=W0621 | |
| super().__init__(term=_to_unbound_term(term), value=_to_literal(literal)) # pylint: disable=W0621 |
There was a problem hiding this comment.
For the mypy error under tests/**, we can add a # type: ignore for each line
There was a problem hiding this comment.
I'm not sure by the fix, typically you would not pass in a BoundReference to a LiteralPredicate, which is implicitly unbound.
There was a problem hiding this comment.
If we are testing with BoundedReferences why not make the class allow to accept them?
There was a problem hiding this comment.
I'm not sure by the fix, typically you would not pass in a
BoundReferenceto aLiteralPredicate, which is implicitly unbound.
I mean, I get that and I still don't know why those tests exist, but if they are there, then I suppose LiteralPredicates where suppose to accept them, right?
There was a problem hiding this comment.
In order to supress the mypy call-arg error, we can just use # type: ignore[call-arg], is it okey for you?
pyiceberg/expressions/__init__.py
Outdated
| def _to_unbound_term(term: Union[str, UnboundTerm[Any], BoundReference[Any]]) -> UnboundTerm[Any]: | ||
| if isinstance(term, str): | ||
| return Reference(term) | ||
| if isinstance(term, UnboundTerm): | ||
| return term | ||
| if isinstance(term, BoundReference): | ||
| return Reference(term.field.name) | ||
| raise ValueError(f"Expected UnboundTerm | BoundReference | str, got {type(term).__name__}") |
There was a problem hiding this comment.
There were some faulty tests: #2632
| def _to_unbound_term(term: Union[str, UnboundTerm[Any], BoundReference[Any]]) -> UnboundTerm[Any]: | |
| if isinstance(term, str): | |
| return Reference(term) | |
| if isinstance(term, UnboundTerm): | |
| return term | |
| if isinstance(term, BoundReference): | |
| return Reference(term.field.name) | |
| raise ValueError(f"Expected UnboundTerm | BoundReference | str, got {type(term).__name__}") | |
| def _to_unbound_term(term: Union[str, UnboundTerm[Any]]) -> UnboundTerm[Any]: | |
| return Reference(term) if isinstance(term, str) else term |
There was a problem hiding this comment.
That PR makes sense, thanks for doing it. Now the LiteralPredicate makes more sense to me.
pyiceberg/expressions/__init__.py
Outdated
| def __init__(self, term: Union[str, UnboundTerm[Any]], literal: Union[L, Literal[L]]): # pylint: disable=W0621 | ||
| super().__init__(term) | ||
| self.literal = _to_literal(literal) # pylint: disable=W0621 | ||
| def __init__(self, term: Union[str, UnboundTerm[Any], BoundReference[Any]], literal: Union[L, Literal[L]]): |
There was a problem hiding this comment.
| def __init__(self, term: Union[str, UnboundTerm[Any], BoundReference[Any]], literal: Union[L, Literal[L]]): | |
| def __init__(self, term: Union[str, UnboundTerm[Any]], literal: Union[L, Literal[L]]): |
cccb370 to
1a3e702
Compare
|
Rebased to have the new tests from #2632 and implemented the suggested changes. |
kevinjqliu
left a comment
There was a problem hiding this comment.
LGTM! I left a few nit comments, please take a look!
| class LiteralPredicate(UnboundPredicate[L], ABC): | ||
| literal: Literal[L] | ||
| class LiteralPredicate(IcebergBaseModel, UnboundPredicate[L], ABC): | ||
| type: TypingLiteral["lt", "lt-eq", "gt", "gt-eq", "eq", "not-eq", "starts-with", "not-starts-with"] = Field(alias="type") |
There was a problem hiding this comment.
👍
this matches the LiteralExpression definition
There was a problem hiding this comment.
nit: could you include this in the PR description so its easily referenced in the future?
|
|
||
|
|
||
| def _transform_literal(func: Callable[[L], L], lit: Literal[L]) -> Literal[L]: | ||
| def _transform_literal(func: Callable[[Any], Any], lit: Literal[L]) -> Literal[L]: |
There was a problem hiding this comment.
nit: is this change relevant? i dont see _transform_literal used anywhere
There was a problem hiding this comment.
In order to silence this mypy errors:
- hook id: mypy
- exit code: 1
pyiceberg/transforms.py:1049: error: Argument 1 to "_transform_literal" has incompatible type "Callable[[str | None], str | None]"; expected "Callable[[str], str]" [arg-type]
pyiceberg/transforms.py:1049: error: Argument 1 to "_transform_literal" has incompatible type "Callable[[bool | None], bool | None]"; expected "Callable[[str], str]" [arg-type]
pyiceberg/transforms.py:1049: error: Argument 1 to "_transform_literal" has incompatible type "Callable[[int | None], int | None]"; expected "Callable[[str], str]" [arg-type]
pyiceberg/transforms.py:1049: error: Argument 1 to "_transform_literal" has incompatible type "Callable[[float | None], float | None]"; expected "Callable[[str], str]" [arg-type]
pyiceberg/transforms.py:1049: error: Argument 1 to "_transform_literal" has incompatible type "Callable[[bytes | None], bytes | None]"; expected "Callable[[str], str]" [arg-type]
pyiceberg/transforms.py:1049: error: Argument 1 to "_transform_literal" has incompatible type "Callable[[UUID | None], UUID | None]"; expected "Callable[[str], str]" [arg-type]
pyiceberg/transforms.py:1049: error: Argument 1 to "_transform_literal" has incompatible type "Callable[[Decimal | None], Decimal | None]"; expected "Callable[[str], str]" [arg-type]
pyiceberg/transforms.py:1049: error: Argument 1 to "_transform_literal" has incompatible type "Callable[[datetime | None], datetime | None]"; expected "Callable[[str], str]" [arg-type]
pyiceberg/transforms.py:1049: error: Argument 1 to "_transform_literal" has incompatible type "Callable[[date | None], date | None]"; expected "Callable[[str], str]" [arg-type]
pyiceberg/transforms.py:1049: error: Argument 1 to "_transform_literal" has incompatible type "Callable[[time | None], time | None]"; expected "Callable[[str], str]" [arg-type]
pyiceberg/transforms.py:1051: error: Argument 1 to "_transform_literal" has incompatible type "Callable[[str | None], str | None]"; expected "Callable[[str], str]" [arg-type]```
| assert_type(In("a", ("a", "b", "c")), In[str]) | ||
| assert_type(In("a", (1, 2, 3)), In[int]) | ||
| assert_type(NotIn("a", ("a", "b", "c")), NotIn[str]) |
There was a problem hiding this comment.
nit: should we use _assert_literal_predicate_type for these too?
Fokko
left a comment
There was a problem hiding this comment.
Looks good, thanks @jaimeferj for working on this, and @kevinjqliu for the review 🚀
Closes #2523
Rationale for this change
Spec alignment
LiteralPredicate.typeuses the same enum as the REST OpenAPILiteralExpression.type:"lt" | "lt-eq" | "gt" | "gt-eq" | "eq" | "not-eq" | "starts-with" | "not-starts-with".Source: OpenAPI spec (LiteralExpression).
Ref: https://github.com/apache/iceberg/blob/b987e60bbd581d6e9e583107d5a85022261ff0d8/open-api/rest-catalog-open-api.yaml#L2264
Are these changes tested?
yes
Are there any user-facing changes?