Fix TableScan.update to exclude cached properties#2178
Fix TableScan.update to exclude cached properties#2178kevinjqliu merged 6 commits intoapache:mainfrom
TableScan.update to exclude cached properties#2178Conversation
pyiceberg/table/__init__.py
Outdated
| return type(self)(**{**self.__dict__, **overrides}) | ||
| data = {**self.__dict__, **overrides} | ||
|
|
||
| # Cached properties are also stored in the __dict__, so must be removed |
There was a problem hiding this comment.
Note that normal methods annotated with property (not cached_property) aren't stored in dict.
There was a problem hiding this comment.
I don't think this is the nicest solution, though it feels like a minimally viable one
There was a problem hiding this comment.
See #2178 (comment), I'm no longer convinced
|
you could add |
Yes, I think you're right. I was a bit hesitant to change the constructor, and also technically subclasses of it would need to take |
|
Maybe its fine to duplicate the update method in the subclasses and leave a comment about why its there for the cached properties? |
Yeah I think maybe we should just make a larger change here and having subclasses implementing updates sounds good. Will need to think about how to go about that though I realised that the current PR (that removes only cached properties) doesn't guard against the case where a subclass has a private member. |
|
interesting! good catch. I think implicitly the what if we just filter for only the constructor params? wdyt? |
Yes, I thought about this and think it's a good idea! We achieve something similar to that too with @jayceslesar's Was also toying with a polymorphic solution (#2199) that doesn't assume that |
|
After some thought, we can maybe narrow down to a few approaches:
@kevinjqliu @jayceslesar, happy to take suggestions / hear thoughts! The first three work; I'm leaning towards (1) currently. |
| params = signature(type(self).__init__).parameters.keys() - {"self"} # Skip "self" parameter | ||
| kwargs = {param: getattr(self, param) for param in params} # Assume parameters are attributes |
There was a problem hiding this comment.
@kevinjqliu, I wrote things slightly differently to #2178 (comment), LMKWYT of this.
Preferred getattr over self.__dict__ since it feels less low-level.
There was a problem hiding this comment.
LGTM! this is great. lets add some comments to explain why we're doing this
|
Thanks for the thoughtful explanation @jayceslesar I prefer option 1 too. We're making the intent of Option 2 introduces some maintenance burden; we'd have to remember updating |
kevinjqliu
left a comment
There was a problem hiding this comment.
Im in favor of option 1, the current approach
Looks like theres a merge conflict too
| params = signature(type(self).__init__).parameters.keys() - {"self"} # Skip "self" parameter | ||
| kwargs = {param: getattr(self, param) for param in params} # Assume parameters are attributes |
There was a problem hiding this comment.
LGTM! this is great. lets add some comments to explain why we're doing this
|
@kevinjqliu, merged and added comment! Are we good to merge? |
TableScan.update to exclude cached properties
|
Thanks for adding a fix this bug @smaheshwar-pltr. And thanks for the great discussion and review @jayceslesar |
<!--
Thanks for opening a pull request!
-->
<!-- In the case this PR will resolve an issue, please replace
${GITHUB_ISSUE_ID} below with the actual Github issue id. -->
<!-- Closes #${GITHUB_ISSUE_ID} -->
# Rationale for this change
Closes apache#2179.
# Are these changes tested?
Yes.
# Are there any user-facing changes?
Yes, the scenario shown in the test and described in the issue now
works.
<!-- In the case of user-facing changes, please add the changelog label.
-->
---------
Co-authored-by: Sreesh Maheshwar <smaheshwar@palantir.com>
Rationale for this change
Closes #2179.
Are these changes tested?
Yes.
Are there any user-facing changes?
Yes, the scenario shown in the test and described in the issue now works.