Conversation

@suddendust
Contributor

@suddendust suddendust commented Jan 12, 2026

Description

This PR implements Collection#create(Key key, Document document) for FlatPostgresCollection.

Testing

  • Added integration tests.

Checklist:

  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • Any dependent changes have been merged and published in downstream modules

@suddendust suddendust changed the title [Draft] Pg write create FlatPostgresCollection Create Doc Impl Jan 12, 2026
suresh-prakash
suresh-prakash previously approved these changes Jan 13, 2026
private final DataType canonicalType;
@Getter private final PostgresDataType postgresType;
private final boolean nullable;
private final boolean array;
Contributor

Nit: isArray, since array alone is slightly confusing without looking at this line.

Contributor Author

Changed

@codecov

codecov bot commented Jan 14, 2026

Codecov Report

❌ Patch coverage is 81.77083% with 35 lines in your changes missing coverage. Please review.
✅ Project coverage is 80.52%. Comparing base (4853757) to head (900c87f).
⚠️ Report is 1 commit behind head on main.

Files with missing lines Patch % Lines
...documentstore/postgres/FlatPostgresCollection.java 81.81% 20 Missing and 8 partials ⚠️
...rg/hypertrace/core/documentstore/CreateResult.java 65.00% 3 Missing and 4 partials ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##               main     #266      +/-   ##
============================================
+ Coverage     80.51%   80.52%   +0.01%     
- Complexity     1385     1392       +7     
============================================
  Files           234      237       +3     
  Lines          6194     6379     +185     
  Branches        554      584      +30     
============================================
+ Hits           4987     5137     +150     
- Misses          831      854      +23     
- Partials        376      388      +12     
Flag Coverage Δ
integration 80.52% <81.77%> (+0.01%) ⬆️
unit 57.15% <7.29%> (-1.50%) ⬇️

Flags with carried forward coverage won't be shown.


@suddendust
Contributor Author

suddendust commented Jan 14, 2026

@suresh-prakash Can you take a look at the following:

  1. If the code isn't able to find the columnMetadata of a particular column even after retries, this column is skipped (rather than failing). Is this behaviour correct? I went with this design to write on a best-effort basis.
  2. Similarly, if for some reason we're unable to parse the value of a column, we add it to skippedFields list rather than failing the query.
  3. If any columns are skipped this way, we return the list to the client in the CreateResult. For this, CreateResult has been enhanced.
  4. We retry once by default in case we get the following SQL state in the first attempt: UNDEFINED_COLUMN("42703") or DATATYPE_MISMATCH("42804"). The logic is that maybe the schema has changed, so it is better to refresh it and retry.
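The retry condition in point 4 could be sketched roughly as follows. This is an illustrative sketch only: `RetrySketch`, `RETRYABLE_SQL_STATES`, and `shouldRetry` are hypothetical names, not the actual FlatPostgresCollection code.

```java
import java.sql.SQLException;
import java.util.Set;

class RetrySketch {
  // SQL states that suggest the cached schema may be stale:
  // 42703 = undefined_column, 42804 = datatype_mismatch.
  static final Set<String> RETRYABLE_SQL_STATES = Set.of("42703", "42804");

  // Retry once at most, and only for the two schema-related SQL states;
  // the caller would refresh the cached schema before the second attempt.
  static boolean shouldRetry(SQLException e, int attempt) {
    return attempt == 0 && RETRYABLE_SQL_STATES.contains(e.getSQLState());
  }
}
```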

@suresh-prakash
Contributor

  1. If the code isn't able to find the columnMetadata of a particular column even after retries, this column is skipped (rather than failing). Is this behaviour correct? I went with this design to write on a best-effort basis.

Best-effort makes sense. But most of the time, such log messages are ignored even though we may be setting them in the CreateResult. I'd rather we fail it so that the client can take the necessary action (they may choose to retry with the column ignored, if we throw a custom exception with meaningful info).

  2. Similarly, if for some reason we're unable to parse the value of a column, we add it to the skippedFields list rather than failing the query.

This is fine, I guess, because that's the way the Mongo impl. works today. If I give an invalid selection, it doesn't throw; rather, it omits the field from the JSON, which in turn results in a null (or missing) value on the caller/client side.

  4. We retry once by default in case we get the following SQL state in the first attempt: UNDEFINED_COLUMN("42703") or DATATYPE_MISMATCH("42804"). The logic is that maybe the schema has changed, so it is better to refresh it and retry.

100% makes sense. 🙂

@puneet-traceable

puneet-traceable commented Jan 14, 2026

  4. We retry once by default in case we get the following SQL state in the first attempt: UNDEFINED_COLUMN("42703") or DATATYPE_MISMATCH("42804"). The logic is that maybe the schema has changed, so it is better to refresh it and retry.

Why retry on DATATYPE_MISMATCH? This can be a bad situation to be in. If the client's data type mismatches the DB column type, then retries won't help. A DB column data type change should never be the case in Postgres.

@suddendust
Contributor Author

DB Column data type change should never be the case in postgres.

Well, theoretically speaking, the rationale behind the retry is: maybe the data type changed and the schema is stale, so refresh it and try again.

@suddendust
Contributor Author

@suresh-prakash I am not throwing an exception because Mongo doesn't throw one either. So, to keep the interface behaviour consistent, shall we stick to that?

@suresh-prakash
Contributor

@suresh-prakash I am not throwing an exception because Mongo doesn't throw one either. So, to keep the interface behaviour consistent, shall we stick to that?

True. But silently neglecting it seems dangerous; most clients would be unaware of it. While it may give immediate compatibility, it could create far more issues later. Since this write path anyway requires code changes in the clients, I still think it would be better to let the clients handle it rather than the library doing it behind the scenes.

@kotharironak Thoughts here?

@suddendust
Contributor Author

suddendust commented Jan 16, 2026

@suresh-prakash How about a config to control this (in customParameters)? Maybe something like bestEffortWrites: true/false. If true, it would write on a best-effort basis. If false, it would do a strict match; that is, all fields passed in the doc should be present in the schema, along with the right value types.

@suddendust
Contributor Author

suddendust commented Jan 16, 2026

@suresh-prakash I have added a bestEffortWrites custom param that controls the data flow as follows:

  1. If true, then PG would skip any fields passed in the document that are not present in the schema, or whose passed values' types don't conform to what is present in the schema.
  2. If false, it does a strict match. All fields present in the doc should be present in the schema, and the passed values' types should conform to the defined schema.

Wdyt?
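As a rough sketch of how such a flag could gate the two modes (all names here are hypothetical, and the type-conformance check is omitted for brevity; only field presence is validated):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class WriteModeSketch {
  // Returns the document fields that will actually be written.
  static List<String> planColumns(
      Map<String, String> schema, Map<String, Object> document, boolean bestEffortWrites) {
    List<String> accepted = new ArrayList<>();
    for (Map.Entry<String, Object> field : document.entrySet()) {
      if (schema.containsKey(field.getKey())) {
        accepted.add(field.getKey());
      } else if (!bestEffortWrites) {
        // Strict mode: any field missing from the schema fails the whole write.
        throw new IllegalArgumentException("Field not in schema: " + field.getKey());
      }
      // Best-effort mode: unknown fields are skipped here and would be
      // reported back to the client via CreateResult's skipped-fields list.
    }
    return accepted;
  }
}
```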

@suresh-prakash
Contributor

@suresh-prakash I have added a bestEffortWrites custom param that controls the data flow as follows:

  1. If true, then PG would skip any fields passed in the document that are not present in the schema, or whose passed values' types don't conform to what is present in the schema.
  2. If false, it does a strict match. All fields present in the doc should be present in the schema, and the passed values' types should conform to the defined schema.

Wdyt?

This is a nice middle-ground. 🙂 Just a small suggestion though: instead of a boolean, can we make it an enum (say, MissingColumnStrategy) please? That way, if a third strategy arises in the future, it's straightforward to extend. E.g. values: SKIP, THROW, IGNORE_DOCUMENT, MENTION_IN_RESPONSE, etc. (Out of these, we can implement just the necessary ones today.)
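The suggested enum could look roughly like this; the value set and javadoc below are illustrative, not necessarily what was merged:

```java
// Illustrative sketch of the suggested MissingColumnStrategy enum, with
// only the two values needed today.
enum MissingColumnStrategy {
  /** Skip the non-conforming field and continue the write (best-effort). */
  SKIP,
  /** Fail the whole write if any field doesn't match the schema. */
  THROW
}
```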

this.succeed = succeed;
private final CreateStatus status;
private final boolean onRetry;
private final List<String> skippedFields;
Contributor

Nit: Can be a set.

* a field doesn't match the schema. The write operation will fail.
*/
THROW,
IGNORE_DOCUMENT
Contributor

Nit: Can also come in later when the need arises.

*
* <p>This enum defines how the system should behave when encountering fields that either don't
* exist in the schema or have incompatible types.
*/
Contributor

Can we also mention what is the default (in case not specified)? Or, is it a mandatory field?

Contributor

The default can go in an @implSpec-annotated documentation comment so that every implementor agrees to this specification.

Contributor

Even better would be to expose a static method called MissingColumnStrategy.default() so that it is self-documented (and gives an opportunity to later change the default across all implementations in a single go).
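One wrinkle: `default` is a reserved word in Java, so the static method would need a slightly different name, e.g. `defaultStrategy()`. A minimal sketch (the values and the chosen default are assumptions, not the merged implementation):

```java
// Sketch of a self-documenting default; `default` is a Java keyword, so
// the factory method is named defaultStrategy() instead.
enum MissingColumnStrategy {
  SKIP,
  THROW;

  // Single place to change the library-wide default across all
  // implementations in one go.
  static MissingColumnStrategy defaultStrategy() {
    return SKIP;
  }
}
```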

@suresh-prakash suresh-prakash merged commit a45e01d into hypertrace:main Jan 19, 2026
6 checks passed
