WIP: Add JSON serializer for ASTs and store them upon node creation by shangyian · Pull Request #699 · DataJunction/dj

shangyian · 2023-08-07T16:23:18Z

Summary

This PR adds a custom JSON encoder for query ASTs, ASTEncoder. The encoder uses its own circular-reference check so that it can short-circuit the processing of circular references instead of raising an error. We may want to determine what's causing these circular references (it looks related to FunctionTableExpression), but that's a separate issue.
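For reference, Python's built-in `json` encoder raises `ValueError: Circular reference detected` when `check_circular` is left at its default of `True` and the object graph loops back on itself. Below is a minimal sketch of the short-circuit idea. It assumes hypothetical `Table`/`Column` classes and a `{"__ref__": ...}` placeholder, and it is not the PR's actual `ASTEncoder`:

```python
import json
from dataclasses import dataclass, field
from typing import Any, List, Optional, Set


@dataclass(eq=False)
class Table:
    """Hypothetical stand-in for an AST table node."""
    name: str
    columns: List["Column"] = field(default_factory=list)


@dataclass(eq=False)
class Column:
    """Hypothetical stand-in for an AST column node; points back at its table."""
    name: str
    table: Optional[Table] = None


class ShortCircuitEncoder(json.JSONEncoder):
    """Sketch: stop at already-processed objects instead of raising."""

    def __init__(self, *args: Any, **kwargs: Any) -> None:
        # Turn off json's built-in check, which raises ValueError on cycles.
        kwargs["check_circular"] = False
        super().__init__(*args, **kwargs)
        self._processed: Set[int] = set()

    def default(self, o: Any) -> Any:
        if isinstance(o, (Table, Column)):
            if id(o) in self._processed:
                # Seen before: emit a placeholder rather than recursing forever.
                return {"__ref__": f"{type(o).__name__}:{o.name}"}
            self._processed.add(id(o))
            return {"__class__": type(o).__name__, **vars(o)}
        return super().default(o)


orders = Table("orders")
orders.columns.append(Column("id", table=orders))
serialized = json.loads(json.dumps(orders, cls=ShortCircuitEncoder))
```

Here the `orders` table is serialized in full once. The column's back-reference to it becomes a placeholder instead of triggering an error.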

This also adds a query_ast column to NodeRevision so that every time we create a node, we can store the parsed query AST alongside it. The logic for actually using this cached AST can be done separately.
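As a rough illustration of the storage side, the sketch below keeps the serialized AST as JSON text next to the query in an in-memory SQLite table. The `node_revision` table layout, the node name, and the AST dict are all hypothetical. The real `NodeRevision` model and database setup aren't shown in this PR description.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE node_revision ("
    "id INTEGER PRIMARY KEY, name TEXT, query TEXT, query_ast TEXT)"
)

# Hypothetical output of serializing a parsed query with a custom encoder.
query_ast = {"__class__": "Query", "select": {"projection": ["id"], "from": "orders"}}

# Store the JSON-encoded AST alongside the query when the node is created.
conn.execute(
    "INSERT INTO node_revision (name, query, query_ast) VALUES (?, ?, ?)",
    ("default.orders_ids", "SELECT id FROM orders", json.dumps(query_ast)),
)

# Later, the cached AST can be read back as plain JSON data.
(stored,) = conn.execute(
    "SELECT query_ast FROM node_revision WHERE name = ?", ("default.orders_ids",)
).fetchone()
cached_ast = json.loads(stored)
```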

Test Plan

PR has an associated issue: AST Serialization for improving Build performance #688
make check passes
make test shows 100% unit test coverage

Deployment Plan

pdm lock check doesn't run for any of the packages after the monorepo move; only the root-level pdm lock check (which only has pylint and pre-commit in it) actually runs. Keeping a single top-level pdm.lock file at the repo root may work for simple development tasks (e.g. pdm sync from the repo root), but any pdm command that depends on more granular pdm.lock input (e.g. pdm export) will break when run from an individual package folder if each package folder's pdm.lock file is not kept in sync as well. Adding all packages back into the .pre-commit config fixes this, and a new dedicated GitHub Action step that forces a pdm.lock check on every PR has also been added.

samredai

This makes sense to me, thanks @shangyian! So much good stuff in so few lines. Am I understanding it right that this PR creates and stores the query AST (via node validation) but doesn't actually use it yet in SQL generation? It makes sense to break that out into a separate PR.

samredai · 2023-08-07T20:58:12Z

datajunction-server/datajunction_server/api/helpers.py

    )
    validated_node.required_dimensions = matched_bound_columns
-
+    validated_node.query_ast = json.loads(json.dumps(query_ast, cls=ASTEncoder))


Yep, this just handles serializing and storing the ast. As I mentioned below, I might try some basic deserialization to make sure it works for query building, but I'll put the actual implementation in a separate PR :)

CircArgs

@shangyian a few questions from a quick peek. I think these are all pretty much the same question from different directions, since they all concern information that is used after compilation but may be ignored during this serialization:

How are parent and parent_key handled when deserializing?
Does this account for potential circular references like Column <-> Table?
For Table in particular, some of the ignored attributes are only set during compilation, and I think they're potentially used in some build stuff. Are these somehow backfilled during deserialization?

CircArgs · 2023-08-07T21:26:38Z

datajunction-server/datajunction_server/sql/parsing/ast.py

    _is_compiled: bool = False

+    @property
+    def json_ignore_keys(self):


I like this pattern 🙂
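One way an encoder could honor a per-class `json_ignore_keys` property is sketched below. Only the `json_ignore_keys` name comes from the diff. The `Table` class, its attributes, and `IgnoreKeysEncoder` are hypothetical.

```python
import json
from typing import Any, Set


class Table:
    """Hypothetical AST node carrying some compile-time-only state."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.alias = None
        self._is_compiled = False
        self._resolved_node = object()  # stand-in for state that isn't JSON-serializable

    @property
    def json_ignore_keys(self) -> Set[str]:
        # Attributes to leave out of the stored JSON.
        return {"_is_compiled", "_resolved_node"}


class IgnoreKeysEncoder(json.JSONEncoder):
    def default(self, o: Any) -> Any:
        ignore = getattr(o, "json_ignore_keys", None)
        if ignore is not None:
            # Keep every instance attribute except the ones the class opts out of.
            kept = {k: v for k, v in vars(o).items() if k not in ignore}
            return {"__class__": type(o).__name__, **kept}
        return super().default(o)


encoded = json.loads(json.dumps(Table("orders"), cls=IgnoreKeysEncoder))
```

Because the property lives on each class, adding a new AST type only requires declaring its own ignored keys. The encoder itself doesn't change.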

agorajek · 2023-08-07T21:48:14Z

@shangyian this is awesome. From what you said it sounds like there still may need to be adjustments done to this code once we start deserializing and using this code?

CircArgs · 2023-08-07T22:33:10Z

Reading on the bigger screen now...

I see this is just meant to be serialization. When I was imagining this, I figured that to handle potential circular references like the ones in my question and your writeup, a flat structure like {hash(node): node_data} could work.
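The flat layout suggested here might look like the sketch below. Every node is stored once in a table, and references between nodes become keys, so a Column <-> Table cycle needs no special handling. The sketch assigns sequential string keys instead of using `hash(node)`, since a content-based hash could give two distinct but equal-looking nodes the same key. The `Table`/`Column` classes and `flatten` helper are hypothetical.

```python
import json
from typing import Any, Dict, List


class Table:
    """Hypothetical AST table node."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.columns: List["Column"] = []


class Column:
    """Hypothetical AST column node with a back-reference to its table."""

    def __init__(self, name: str, table: Table) -> None:
        self.name = name
        self.table = table


def flatten(root: Any) -> Dict[str, Any]:
    """Store each node once under a string key; references become {"__ref__": key}."""
    keys: Dict[int, str] = {}  # id(node) -> key assigned on first sight
    nodes: Dict[str, Dict[str, Any]] = {}
    pending: List[Any] = []

    def ref(obj: Any) -> str:
        if id(obj) not in keys:
            keys[id(obj)] = str(len(keys))
            pending.append(obj)  # queue the node so its fields get encoded once
        return keys[id(obj)]

    def encode(value: Any) -> Any:
        if isinstance(value, (Table, Column)):
            return {"__ref__": ref(value)}
        if isinstance(value, list):
            return [encode(item) for item in value]
        return value

    root_key = ref(root)
    while pending:
        obj = pending.pop()
        nodes[keys[id(obj)]] = {
            "__class__": type(obj).__name__,
            **{attr: encode(value) for attr, value in vars(obj).items()},
        }
    return {"root": root_key, "nodes": nodes}


orders = Table("orders")
orders.columns.append(Column("id", orders))
flat = flatten(orders)
payload = json.dumps(flat)  # the cycle is now just two keys pointing at each other
```

String keys are used because JSON object keys are always strings, so the structure round-trips through `json.loads` unchanged.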

shangyian · 2023-08-07T23:10:12Z

@CircArgs -

does this account for potential circular references like Column <-> Table

So right now it handles the circular stuff by keeping a _processed set and stopping serialization whenever it comes across an AST entity that's already in _processed. This might be an issue if it turns out that we do need at least one layer of serialized circular entities.
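If entries are never removed from a processed set, the set also cuts off nodes that are merely shared (referenced twice) rather than circular. If that turns out to matter, you can track only the current ancestor path instead. That approach still stops on true cycles but keeps shared subtrees. Below is a minimal sketch comparing the two, with a hypothetical `Node` class standing in for AST entities:

```python
from typing import Any, List, Set


class Node:
    """Hypothetical AST node: a label plus child references."""

    def __init__(self, label: str) -> None:
        self.label = label
        self.children: List["Node"] = []


def serialize_with_seen_set(node: Node, seen: Set[int]) -> Any:
    # Every visited node stays in `seen`, so a shared (non-circular) node is cut off too.
    if id(node) in seen:
        return {"__ref__": node.label}
    seen.add(id(node))
    return {
        "label": node.label,
        "children": [serialize_with_seen_set(child, seen) for child in node.children],
    }


def serialize_with_ancestor_path(node: Node, path: Set[int]) -> Any:
    # Only nodes on the current root-to-node path are tracked, so only true cycles are cut off.
    if id(node) in path:
        return {"__ref__": node.label}
    path.add(id(node))
    try:
        return {
            "label": node.label,
            "children": [serialize_with_ancestor_path(child, path) for child in node.children],
        }
    finally:
        path.discard(id(node))  # leaving this subtree: the node is no longer an ancestor


# A root that references the same expression twice; the expression points back at the root.
root, expr = Node("root"), Node("expr")
root.children = [expr, expr]
expr.children = [root]

by_seen_set = serialize_with_seen_set(root, set())
by_ancestor_path = serialize_with_ancestor_path(root, set())
```

The trade-off is that the ancestor-path version repeats a shared subtree each time it appears. Output size can grow when subtrees are heavily reused.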

for Table in particular, some of the ignored attributes are only set during compilation and I think are potentially used in some build stuff, are these somehow backfilled during deserialization?

Yeah, so it sounds like I should take a stab at deserialization to make sure everything works with this setup. If it doesn't, a flat structure like you described will probably help! I think a fully populated Table (with its columns) will matter when we build a query that needs to group or filter on one or more of that table's columns as dimensions.
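On the `parent`/`parent_key` question, one option is to not store back-pointers at all and rebuild them during deserialization. At that point, each child's parent and the attribute it sits under are already known. Below is a minimal sketch under that assumption. The generic `Node` class, the `__class__` tag, and `from_json` are hypothetical; only the `parent`/`parent_key` names come from the review questions.

```python
import json
from typing import Any, Optional


class Node:
    """Hypothetical generic AST node; `node_type` records the serialized class tag."""

    def __init__(self, node_type: str) -> None:
        self.node_type = node_type
        self.parent: Optional["Node"] = None
        self.parent_key: Optional[str] = None


def from_json(data: Any, parent: Optional[Node] = None, parent_key: Optional[str] = None) -> Any:
    """Rebuild nodes from {"__class__": ...} dicts, backfilling parent and parent_key."""
    if isinstance(data, list):
        # Items in a list share the parent and the attribute name that holds the list.
        return [from_json(item, parent, parent_key) for item in data]
    if isinstance(data, dict) and "__class__" in data:
        node = Node(data["__class__"])
        node.parent, node.parent_key = parent, parent_key
        for key, value in data.items():
            if key != "__class__":
                setattr(node, key, from_json(value, node, key))
        return node
    return data


payload = json.dumps({
    "__class__": "Query",
    "select": {
        "__class__": "Select",
        "projection": [{"__class__": "Column", "name": "id"}],
    },
})
query = from_json(json.loads(payload))
column = query.select.projection[0]
```

This only restores tree back-pointers. State populated during compilation, such as a Table's resolved columns, would still need to be stored explicitly or recomputed after loading.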

shangyian · 2023-08-07T23:15:34Z

From what you said it sounds like there still may need to be adjustments done to this code once we start deserializing and using this code?

@agorajek It's quite possible, so I'll try setting up some basic deserialization before merging just to make sure that this setup is actually enough.

shangyian and others added 30 commits July 6, 2023 08:49

Add functionality to build queries for materialized cube nodes.

65fa1a8

Add tests for building queries for materialized cubes

0b27d6f

Add client functionality to set availability state and column attributes on a node

2061f27

Fix tests

2a48a87

Add materialized check to find existing cube function

2320251

Adding integration tests for client availability state posting and column attributes

7ea0ed6

Merge pull request DataJunction#608 from shangyian/build-cubes

9ae4d2d

Build SQL for materialized cubes

Merge pull request DataJunction#609 from shangyian/client-availability-attributes

5f58597

Client functionality for availability + column attributes

squash alembic migrations

9c41536

Migrate demo.db based on new migration file

3dd42a6

Remove top-level dj.demo.db

6a4b515

Merge pull request DataJunction#613 from shangyian/squash-alembic

edfb964

squash alembic migrations

Add specialized exception to Python client.

fd15b5a

Merge pull request DataJunction#614 from DataJunction/client-exceptions

bc82bf3

Add specialized exception to Python client.

Pin pydantic<2 as we're not compatible with the latest release

edfc649

Merge pull request DataJunction#615 from shangyian/pin-pydantic-client

70b9e71

Fix node update status bug, where updating a node without a query change results in status defaulting to invalid

d6c7c46

Merge pull request DataJunction#621 from shangyian/fix-node-update-status-bug

da3fd37

Fix node update status bug

add /nodes/{name}/validate/ endpoint (DataJunction#619)

8214d6d

Add ability to remove dimension links

bc8045c

Add tests for deleting dimension links

6c23e7a

Lint

076194b

linters

3aaedff

Add tests for client

aa8c8ff

Clean up tests

ef470cc

bound dimensions on metric nodes (DataJunction#622)

0d4f544

* bound dimensions * pylint can't read * filter used dimensions from used bound dimensions

Cleaning up API based on comments

e864a87

Fix protected access

620c58f

Merge pull request DataJunction#623 from shangyian/remove-dimension-links

a22deda

Add ability to remove dimension links

shangyian added 8 commits August 7, 2023 12:10

Remove extraneous assets

bed20f4

Add netlify config

c79692a

Add netlify config

0c7d0dd

Fix theme choice

3084b66

Change deployment settings

4906a23

Tweak params

3f41535

Disable postcss

2050a8c

Change collapsible sidebar

cf80a00

samredai approved these changes Aug 7, 2023

shangyian added 3 commits August 7, 2023 14:14

Switch to svg file

564ae11

Revert

27ac8be

Fix image from svg file

f27153a

CircArgs reviewed Aug 7, 2023

agorajek and others added 2 commits August 7, 2023 14:50

Merge pull request DataJunction#695 from DataJunction/issue-630

e82dbef

Python client cleanup: namespaces and test separation.

Override css

ba072ff

shangyian added 5 commits August 7, 2023 16:33

Rename shortcodes to custom

6796c69

Merge pull request DataJunction#694 from shangyian/docs-updates

a672863

Move to Doks theme and change landing image

Add JSON serializer for query ASTs and store them upon node creation

a91f459

Fix lint

96e9bf7

Update json serializer so that we automatically short-circuit circular references and thus can serialize more of the AST

b7eff1c

shangyian force-pushed the json-serialize-ast branch from 3cea1d7 to b7eff1c August 9, 2023 16:07

shangyian added 2 commits August 9, 2023 09:09

Undo sql test changes

a18c3a8

Add json deserialization and incorporate into query building

780f793

shangyian changed the title ~~Add JSON serializer for ASTs and store them upon node creation~~ WIP: Add JSON serializer for ASTs and store them upon node creation Aug 9, 2023

shangyian mentioned this pull request Aug 18, 2023

TBD: Ast serde #727

Open

3 tasks

shangyian force-pushed the main branch from 52c88f0 to 5e6a05f April 1, 2026 09:07

shangyian wants to merge 1439 commits into DataJunction:main from shangyian:json-serialize-ast


6 participants
