[HWORKS-2802 / -2807] Document partitioned_by on feature group creation by jimdowling · Pull Request #585 · logicalclocks/logicalclocks.github.io

jimdowling · 2026-05-21T12:49:02Z

Summary

User-guide section documenting the partitioned_by parameter on feature group creation, under the existing partitioning area in docs/user_guides/fs/feature_group/create.md.

Covers:

- Usage with create_feature_group / get_or_create_feature_group, and the resulting Hive on-disk layout (year=.../month=.../day=.../).
- The contract: the user's dataframe never carries the grain columns. The client derives them from event_time on each write, and the backend registers them as ordinary partition columns through the normal table-creation path; there are no Delta GENERATED columns and no dedicated backend Spark job.
- Validation rules: mutually exclusive with partition_key; requires event_time; every grain must be a member of the grain enum, with no duplicates; no collision with event_time or an existing feature name; and the hour grain requires a timestamp event_time.
- Partition pruning: the grain columns are real partition columns, so a direct grain filter prunes natively, and the query layer rewrites an event_time-range read into equivalent grain predicates (this prunes for hierarchical specs). Includes a hierarchical vs non-hierarchical behavior table.
- Online feature store: online-enabled partitioned_by is not supported yet (deferred to HWORKS-2808), so the grains are offline-only and online_partition_columns is effectively always False today. A feature view may still select a derived grain, which is served offline (training data / batch inference), but get_feature_vector / get_feature_vectors raise a FeatureStoreException, and feature-view creation warns when the view also joins an online-enabled feature group.
- Formats: DELTA, ICEBERG, and HUDI on non-stream feature groups (the client materializes the grains; Hudi partitions on them). HUDI on the Python engine becomes a stream feature group, and stream feature groups are not yet supported.
- The feature group UI Table DDL card.
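
To make the contract above concrete, here is a minimal standard-library sketch of deriving grain values from a row's event_time and rendering the Hive-style directory they would land in. It is not the Hopsworks client code: `derive_grains` and `hive_partition_path` are hypothetical names, the grain set is limited to four grains named in this PR, and the unpadded value formatting is an assumption.

```python
from datetime import datetime

# Grains named in this PR (week omitted for brevity); the real enum may differ.
GRAIN_EXTRACTORS = {
    "year": lambda ts: ts.year,
    "month": lambda ts: ts.month,
    "day": lambda ts: ts.day,
    "hour": lambda ts: ts.hour,
}


def derive_grains(event_time: datetime, partitioned_by: list[str]) -> dict[str, int]:
    """Derive one value per requested grain from a row's event_time."""
    return {grain: GRAIN_EXTRACTORS[grain](event_time) for grain in partitioned_by}


def hive_partition_path(grains: dict[str, int]) -> str:
    """Render grain values as a Hive-style directory path (key=value/...)."""
    return "".join(f"{name}={value}/" for name, value in grains.items())


row_event_time = datetime(2026, 5, 21, 12, 49, 2)
grains = derive_grains(row_event_time, ["year", "month", "day"])
print(grains)  # {'year': 2026, 'month': 5, 'day': 21}
print(hive_partition_path(grains))  # year=2026/month=5/day=21/
```

The user's row carries only event_time; the grain values exist solely as derived partition columns, which is the point of the contract.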

Pairs with:

- hopsworks-api#961 (Python client): client-side grain materialization, cross-engine predicate translator, and the feature-view online-serving guard.
- hopsworks-ee#3034 (Backend): persistence, validation, Hudi activation, and the offline_only online groundwork.
- loadtest#859: End-to-end workflows (feature group + feature-view serving guard).

JIRA: HWORKS-2802. Engineering walkthrough: Confluence page.

Test plan

- hopsworks-docs markdownlint clean (221 files, 0 errors).
- hopsworks-docs snakeoil clean (Python code blocks pass ruff at line length 88).
- hopsworks-docs check (mkdocs strict build) clean: API cross-references resolve, nav intact, no broken links.

🤖 Generated with Claude Code

…tion

https://hopsworks.atlassian.net/browse/HWORKS-2802

Add a section to docs/user_guides/fs/feature_group/create.md describing the storage-engine-native partitioned_by parameter for Delta feature groups. Covers:

- Usage example with create_feature_group / get_or_create_feature_group.
- The CREATE TABLE … USING DELTA … GENERATED ALWAYS AS … contract: the storage layer derives the partition columns; the user's dataframe never carries them.
- Validation rules: mutual exclusion with partition_key, requires event_time.
- Partition pruning table: Delta auto-derives partition predicates from the GENERATED expressions for hierarchical specs (year / year+month / year+month+day / year+month+day+hour), so `fg.read(start_time=..., end_time=...)` and `fg.filter(fg.event_time >= ...)` prune at the partition level. Non-hierarchical specs (e.g. ["month"], ["year","week"]) are valid but skip the auto-derivation; only direct predicates on the grain columns prune. Recommend hierarchical specs.
- Online feature store behavior: derived columns live offline-only by default; online_partition_columns=true opts into online materialization. Until the onlinefs consumer filter ships, the backend rejects partitioned_by + online_enabled=true with the default online_partition_columns=false. Document both workarounds.
- Hudi: partitioned_by + HUDI is rejected at creation; Hudi support is tracked under a separate follow-up ticket.

Signed-off-by: Jim Dowling <jim@logicalclocks.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

https://hopsworks.atlassian.net/browse/HWORKS-2802 The partitioned_by section described Delta GENERATED ALWAYS AS columns and storage-engine-side derivation, which is no longer how it works. Document the real design: the client derives the grain columns from event_time and writes them as real partition columns, pruning works natively on grain filters and via predicate translation on event_time ranges. Correct the online-store note: online-enabled partitioned_by feature groups are rejected entirely until HWORKS-2808, not only with the default online_partition_columns. Signed-off-by: Jim Dowling <jim@logicalclocks.com> Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
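
The predicate translation on event_time ranges mentioned in this commit can be pictured with a small sketch. It is only an illustration under stated assumptions, not the query layer's actual algorithm: it assumes a year/month/day spec, enumerates every day partition overlapping a closed range, and renders them as an OR of equality predicates. The names `day_partitions_for_range` and `to_grain_predicate` are hypothetical, and a real translator would likely emit something more compact.

```python
from datetime import datetime, timedelta


def day_partitions_for_range(
    start: datetime, end: datetime
) -> list[tuple[int, int, int]]:
    """List the (year, month, day) partitions that overlap [start, end]."""
    if end < start:
        raise ValueError("end must not be earlier than start")
    partitions = []
    current = start.date()
    while current <= end.date():
        partitions.append((current.year, current.month, current.day))
        current += timedelta(days=1)
    return partitions


def to_grain_predicate(partitions: list[tuple[int, int, int]]) -> str:
    """Render the partitions as an OR of equality predicates on grain columns."""
    clauses = [
        f"(year = {y} AND month = {m} AND day = {d})" for y, m, d in partitions
    ]
    return " OR ".join(clauses)


parts = day_partitions_for_range(
    datetime(2025, 12, 31, 22, 0), datetime(2026, 1, 1, 3, 0)
)
print(to_grain_predicate(parts))
```

In this sketch the grain predicate only narrows which partitions are scanned; the original event_time filter would still be applied to the rows inside them to get exact range semantics.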

…io into HWORKS-2802

…note https://hopsworks.atlassian.net/browse/HWORKS-2802 The Hudi follow-up materializes the grain columns server-side and partitions on them directly; the CustomKeyGenerator phrasing described a mechanism the revised design no longer uses. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

…io into HWORKS-2802

Copilot

Pull request overview

Adds documentation to the Feature Group creation guide describing the new partitioned_by parameter for time-grain partitioning.

Changes:

Introduces a new “Time-grain partitioning with partitioned_by” section with a Python usage example.
Documents partition-pruning behavior for hierarchical vs non-hierarchical grain specs.
Adds notes about online feature store and Hudi behavior (currently conflicting with the PR description).

https://hopsworks.atlassian.net/browse/HWORKS-2802 Flesh out the partitioned_by section into reference for the shipped feature: the parameter list (partitioned_by + online_partition_columns with their constraints), cross-session persistence and the round-trip through get_feature_group, the on-disk Hive layout, a read/partition-pruning example with the hierarchical-vs-non-hierarchical matrix, a clickstream-by-hour example, and the current online and Hudi limitations (online rejected at create and on enable). Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
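
As a rough aid for reading the hierarchical-vs-non-hierarchical matrix, the sketch below classifies a spec using the examples given in this PR's commit notes: year, year+month, year+month+day, and year+month+day+hour count as hierarchical, while specs such as ["month"] or ["year", "week"] do not. The helper name `is_hierarchical` is hypothetical, and the prefix rule is an assumption inferred from those examples rather than the client's actual check.

```python
HIERARCHY = ["year", "month", "day", "hour"]


def is_hierarchical(partitioned_by: list[str]) -> bool:
    """True when the spec is a non-empty prefix of year > month > day > hour."""
    n = len(partitioned_by)
    return 0 < n <= len(HIERARCHY) and partitioned_by == HIERARCHY[:n]


for spec in (["year"], ["year", "month", "day"], ["month"], ["year", "week"]):
    print(spec, "->", is_hierarchical(spec))
```

Under this reading, only specs for which the function returns True benefit from pruning on event_time-range reads; the others prune only on direct grain filters.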

…io into HWORKS-2802

https://hopsworks.atlassian.net/browse/HWORKS-2807 partitioned_by now works on DELTA and ICEBERG; NONE is rejected alongside Hudi. Update the section heading, supported-formats note, and the Hudi fallback guidance. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

https://hopsworks.atlassian.net/browse/HWORKS-2807 Non-stream Hudi feature groups now support partitioned_by (direct Spark write); stream feature groups and NONE are rejected. Update the section heading, supported-formats note, Hudi note, and add a stream note. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

Copilot

Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated 3 comments.

…eature views https://hopsworks.atlassian.net/browse/HWORKS-2802 Document that the hour grain requires a timestamp event_time (rejected on a date event_time), and that a feature view may select the derived grain columns even when it joins online-enabled feature groups: the grains are served from the offline store (training data, batch inference) and excluded from the online feature vector. Signed-off-by: Jim Dowling <jim@logicalclocks.com> Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
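
The hour-grain rule in this commit joins the other validation rules listed in the PR summary. The sketch below collects them in one place as plain Python. It is an assumption-laden illustration, not the client or backend validator: `validate_partitioned_by` is a hypothetical name, the grain set is taken from the grains mentioned in this PR, the `"timestamp"` / `"date"` type strings are placeholders, and the real code may raise a different exception type.

```python
VALID_GRAINS = {"year", "month", "week", "day", "hour"}  # assumed enum members


def validate_partitioned_by(
    partitioned_by, event_time, event_time_type, partition_key, feature_names
):
    """Raise ValueError if the spec breaks one of the documented rules."""
    if partition_key:
        raise ValueError("partitioned_by and partition_key are mutually exclusive")
    if not event_time:
        raise ValueError("partitioned_by requires event_time")
    unknown = [g for g in partitioned_by if g not in VALID_GRAINS]
    if unknown:
        raise ValueError(f"unknown grains: {unknown}")
    if len(set(partitioned_by)) != len(partitioned_by):
        raise ValueError("duplicate grains in partitioned_by")
    clashes = [g for g in partitioned_by if g == event_time or g in feature_names]
    if clashes:
        raise ValueError(f"grains collide with existing columns: {clashes}")
    if "hour" in partitioned_by and event_time_type != "timestamp":
        raise ValueError("the hour grain requires a timestamp event_time")


# A valid hierarchical spec passes silently.
validate_partitioned_by(
    ["year", "month", "day"], "event_time", "timestamp", [], ["user_id", "amount"]
)

# The hour grain on a date-typed event_time is rejected.
try:
    validate_partitioned_by(["hour"], "event_time", "date", [], ["user_id"])
except ValueError as err:
    print(err)
```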

…io into HWORKS-2802

https://hopsworks.atlassian.net/browse/HWORKS-2802 Document that the feature group overview shows a Table DDL card with the Spark SQL CREATE TABLE for the offline table (format + partition columns) and the RonDB CREATE TABLE for the online table when online-enabled. Signed-off-by: Jim Dowling <jim@logicalclocks.com> Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…y grains https://hopsworks.atlassian.net/browse/HWORKS-2802 Selecting a derived partitioned_by grain column into a feature view does not silently exclude it from the online vector: get_feature_vector and get_feature_vectors raise a FeatureStoreException, and feature-view creation warns when the view also joins an online-enabled feature group. The grains remain available offline (training data, batch inference). Signed-off-by: Jim Dowling <jim@logicalclocks.com> Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

https://hopsworks.atlassian.net/browse/HWORKS-2802 Reword the hourly-partitioning example so "clickstream" is not mistaken for a stream feature group (stream=True), which partitioned_by does not yet support. Signed-off-by: Jim Dowling <jim@logicalclocks.com> Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

jimdowling changed the title ~~[HWORKS-2802] Document partitioned_by parameter on feature group creation~~ [HWORKS-2802 / -2807] Document partitioned_by parameter on feature group creation May 21, 2026

jimdowling and others added 7 commits May 30, 2026 11:43

Merge remote-tracking branch 'upstream/main' into HWORKS-2802

e3d5db3

Merge branch 'main' of github.com:logicalclocks/logicalclocks.github.…

523c327

…io into HWORKS-2802

Merge branch 'main' of github.com:logicalclocks/logicalclocks.github.…

a899acc

…io into HWORKS-2802

Merge branch 'main' of github.com:logicalclocks/logicalclocks.github.…

c28568c

…io into HWORKS-2802

Merge branch 'main' of github.com:logicalclocks/logicalclocks.github.…

f1376e2

…io into HWORKS-2802

jimdowling marked this pull request as ready for review June 11, 2026 04:35

jimdowling requested a review from Copilot June 11, 2026 04:35

Copilot started reviewing on behalf of jimdowling June 11, 2026 04:35

Copilot AI reviewed Jun 11, 2026


jimdowling and others added 4 commits June 11, 2026 06:41

Merge branch 'main' of github.com:logicalclocks/logicalclocks.github.…

fcaf241

…io into HWORKS-2802

jimdowling requested a review from Copilot June 13, 2026 10:04

Copilot started reviewing on behalf of jimdowling June 13, 2026 10:04

Copilot AI reviewed Jun 13, 2026


Comment thread docs/user_guides/fs/feature_group/create.md

Comment thread docs/user_guides/fs/feature_group/create.md

Comment thread docs/user_guides/fs/feature_group/create.md Outdated

jimdowling and others added 4 commits June 15, 2026 10:43

Merge branch 'main' of github.com:logicalclocks/logicalclocks.github.…

6977a16

…io into HWORKS-2802

gibchikafa approved these changes Jun 18, 2026


jimdowling changed the title ~~[HWORKS-2802 / -2807] Document partitioned_by parameter on feature group creation~~ [HWORKS-2802 / -2807] Document partitioned_by on feature group creation Jun 18, 2026

jimdowling wants to merge 17 commits into logicalclocks:main from jimdowling:HWORKS-2802


3 participants
