WIP: Add JSON serializer for ASTs and store them upon node creation#699
WIP: Add JSON serializer for ASTs and store them upon node creation#699shangyian wants to merge 1439 commits intoDataJunction:mainfrom
Conversation
Build SQL for materialized cubes
…y-attributes Client functionality for availability + column attributes
squash alembic migrations
Add specialized exception to Python client.
…nge results in status defaulting to invalid
…atus-bug Fix node update status bug
pdm lock check doesn't run for any of the packages after the monorepo move only the root level pdm lock check (which only has pylint and pre-commit in it) actually runs. keeping only one single top-level pdm.lock file at the repo root level may work for simple development tasks (e.g. pdm sync from repo root), but any pdm functions that depend on more granular pdm.lock input (e.g. pdm export) will break when run from an individual package folder if the individual pdm.lock files in each package folder is not kept in sync as well. adding all packages back into the .pre-commit config fixes this a new dedicated Github Action step to force a pdm.lock check on every PR also has been added.
* bound dimensions * pylint can't read * filter used dimensions from used bound dimensions
…inks Add ability to remove dimension links
samredai
left a comment
There was a problem hiding this comment.
This makes sense to me, thanks @shangyian! So much good stuff in such few lines. And I'm understanding it right that this PR creates and stores the query ast (via the node validation) but doesn't actually utilize it yet in the SQL generation? It makes sense to break that out into a separate PR.
| ) | ||
| validated_node.required_dimensions = matched_bound_columns | ||
|
|
||
| validated_node.query_ast = json.loads(json.dumps(query_ast, cls=ASTEncoder)) |
There was a problem hiding this comment.
Yep, this just handles serializing and storing the ast. As I mentioned below, I might try some basic deserialization to make sure it works for query building, but I'll put the actual implementation in a separate PR :)
CircArgs
left a comment
There was a problem hiding this comment.
@shangyian a few questions taking a quick peek. These are all pretty much the same question from different directions I think since they are all some information I think are used after compilation but may be ignored during this serialization
- how are
parentandparent_keyhandled when deserializing - does this account for potential circular references like
Column<->Table - for
Tablein particular, some of the ignored attributes are only set during compilation and I think are potentially used in some build stuff, are these somehow backfilled during deserialization?
| _is_compiled: bool = False | ||
|
|
||
| @property | ||
| def json_ignore_keys(self): |
|
@shangyian this is awesome. From what you said it sounds like there still may need to be adjustments done to this code once we start deserializing and using this code? |
Python client cleanup: namespaces and test separation.
Reading on the bigger screen now... I see this is just meant to be serialization. When I was imagining this, if I had to handle potential circular stuff like in my question and your writeup, I figured maybe a flat structure like |
So right now it's handling the circular stuff by storing a
Yeah, so it sounds like I might need to take a stab at deserialization and make sure that all works with this setup. If not, a flat structure like you described will probably help! I think the case where having Table fully populated with columns will be used is when we're trying to build a query that needs one or more columns from that table to be grouped or filtered on as dimensions. |
@agorajek It's quite possible, so I'll try setting up some basic deserialization before merging just to make sure that this setup is actually enough. |
Move to Doks theme and change landing image
…r references and thus can serialize more of the AST
3cea1d7 to
b7eff1c
Compare

Summary
This PR adds a custom JSON encoder for query ASTs:
ASTEncoder. This encoder uses our own circular check so that we can short-circuit the processing of circular dependencies but not raise an error. We may want to determine what's causing these circular dependencies (it looks related toFunctionTableExpression), but that's a separate issue.This also adds a
query_astcolumn toNodeRevisionso that every time we create a node, we can store the parsed query AST alongside it. The logic for actually using this cached AST can be done separatelyTest Plan
make checkpassesmake testshows 100% unit test coverageDeployment Plan