
feat(isthmus): add PostgreSQL TPC-H integration testing #700

Draft
nielspardon wants to merge 2 commits into substrait-io:main from nielspardon:par-pge2e

@nielspardon (Member) commented Feb 4, 2026

This PR adds integration testing for isthmus against a PostgreSQL database.

  • adds TPC-H data generated with scale factor 0.01 (i.e. the tiny data set) as *.tbl files
  • uses Testcontainers to automatically spin up a PostgreSQL container during testing
  • populates the PostgreSQL database by running the DDL statements and loading the data from the *.tbl files
  • runs both a reference SQL query and a SQL query generated from the Substrait plan against the PostgreSQL database and checks whether both produce the same records
  • I had to create PostgreSQL-specific copies of the DDL statements and the TPC-H queries, since there are minor syntax differences from the copies we currently have in the queries subfolder, e.g.:
    • PostgreSQL expects interval '1 day' instead of interval '1' day(3).
    • identifiers in Substrait plans are effectively handled as case-sensitive, so I had to uppercase and quote the identifiers in the reference SQL queries and the DDL statements
  • the good news: all TPC-H queries except query 21 produce the same results with the testing method I'm using. I still need to figure out what's going wrong with query 21
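The TPC-H dbgen tool writes each *.tbl row as pipe-separated values with a trailing '|' after the last column, which PostgreSQL's COPY with DELIMITER '|' would typically reject as extra data after the last expected column. One option is to parse the rows in Java before inserting them. The following is a minimal sketch under that assumption, not necessarily how this PR loads the data; TblLineParser and parseLine are hypothetical names.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper (not part of isthmus): splits one dbgen *.tbl line into column values. */
final class TblLineParser {

    private TblLineParser() {}

    /**
     * dbgen terminates every column with '|', including the last one,
     * so the trailing delimiter is dropped before splitting.
     */
    static List<String> parseLine(String line) {
        String body = line.endsWith("|") ? line.substring(0, line.length() - 1) : line;
        // split() takes a regex, so '|' must be escaped; limit -1 keeps empty fields
        return Arrays.asList(body.split("\\|", -1));
    }
}
```

For a made-up sample line in the dbgen format, parseLine("0|AFRICA|sample comment|") yields the three values 0, AFRICA and sample comment, which could then be bound to a JDBC PreparedStatement.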
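Checking that the reference SQL and the SQL generated from the Substrait plan "produce the same records" can be sketched as an order-insensitive multiset comparison. This assumes both results have already been read from JDBC into lists of rows; ResultSetComparator and sameRecords are hypothetical names, not part of isthmus or this PR. BigDecimal values are normalized because BigDecimal.equals treats 1.5 and 1.50 as different.

```java
import java.math.BigDecimal;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper: compares two query results as multisets of rows. */
final class ResultSetComparator {

    private ResultSetComparator() {}

    /** BigDecimal.equals is scale-sensitive, so trailing zeros are stripped first. */
    static Object normalize(Object value) {
        if (value instanceof BigDecimal) {
            return ((BigDecimal) value).stripTrailingZeros();
        }
        return value;
    }

    /** Counts how often each normalized row occurs, so duplicate rows are respected. */
    static Map<List<Object>, Integer> countRows(List<List<Object>> rows) {
        Map<List<Object>, Integer> counts = new HashMap<>();
        for (List<Object> row : rows) {
            List<Object> key = new ArrayList<>(row.size());
            for (Object value : row) {
                key.add(normalize(value));
            }
            counts.merge(key, 1, Integer::sum);
        }
        return counts;
    }

    /** True if both results contain the same rows with the same multiplicities, in any order. */
    static boolean sameRecords(List<List<Object>> expected, List<List<Object>> actual) {
        return expected.size() == actual.size() && countRows(expected).equals(countRows(actual));
    }
}
```

For queries with an ORDER BY on a unique key, a stricter check could compare the normalized row lists directly with List.equals instead.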
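The interval difference could in principle also be handled by a mechanical rewrite rather than hand-edited copies. A minimal regex-based sketch, assuming interval literals only appear as a single quoted integer followed by one unit and an optional precision such as day (3); IntervalRewriter and toPostgres are hypothetical names.

```java
import java.util.regex.Pattern;

/** Hypothetical helper: rewrites interval '90' day (3) into interval '90 day'. */
final class IntervalRewriter {

    private IntervalRewriter() {}

    // Quoted integer, a single unit keyword, then an optional "(n)" precision suffix.
    private static final Pattern ANSI_INTERVAL = Pattern.compile(
            "(?i)interval\\s+'(\\d+)'\\s+(year|month|day|hour|minute|second)\\b(?:\\s*\\(\\d+\\))?");

    static String toPostgres(String sql) {
        // Fold the unit into the quoted literal and drop the precision
        return ANSI_INTERVAL.matcher(sql).replaceAll("interval '$1 $2'");
    }
}
```

For example, toPostgres("date '1998-12-01' - interval '90' day (3)") returns "date '1998-12-01' - interval '90 day'". This covers only the interval syntax; the identifier quoting would still need separate handling.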

This is currently a draft because I want to get some early feedback. What I'm still planning to do:

  • add Javadoc for the new code
  • Q: should we always run the integration tests, or should we run them in a separate Gradle task?
  • I also want to enable this for TPC-DS in a follow-up PR and then see whether we can extend this to more databases supported by Testcontainers

Signed-off-by: Niels Pardon <par@zurich.ibm.com>
@benbellick (Member)

I am working on a fuller response here, but just wanted to see if you were aware of the existence of this related-seeming repository?

@nielspardon (Member, Author) commented Feb 5, 2026

> I am working on a fuller response here, but just wanted to see if you were aware of the existence of this related-seeming repository?

I am aware of the repo. The problem with the consumer-testing repo is that, by design, it takes a centralized approach and is hence difficult to maintain; it is currently not actively maintained. I would suggest not doing the consumer testing centrally and instead providing a Substrait testkit that integrations can test against in a decentralized setup. This brings the testing closer to the code being tested, allowing for shorter cycles. This is, e.g., what a project like dbt does with its many adapters.

@andrew-coleman has been looking into updating the consumer-testing repo to a more recent spec version, and it led him down a rabbit hole: many of the currently tested consumers have not updated to the latest spec version, and in some cases the consumers seem to be unmaintained, e.g. Apache Ibis.
