Normalize lists argument in MerlinDataLoader._augment_schema #818

Open
shaun0927 wants to merge 1 commit into NVIDIA-Merlin:main from shaun0927:fix/812-augment-schema-lists-none
@shaun0927
Goals ⚽

Stop MerlinDataLoader._augment_schema from raising TypeError when the caller omits lists= (the common case for datasets without list-typed features).

Fixes #812.

Implementation Details 🚧

transformers4rec/torch/utils/data_utils.py normalizes cats, conts, and labels to empty lists when they default to None, but the normalization for lists was missing:

cats = cats or []
conts = conts or []
labels = labels or []
# missing: lists = lists or []

schema = schema.select_by_name(conts + cats + labels + lists)

Any caller that does not pass lists= (or passes lists=None explicitly) hits TypeError: can only concatenate list (not "NoneType") to list at the select_by_name line. The later for col in lists: loop would also raise TypeError: 'NoneType' object is not iterable.
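
Both failure modes can be reproduced in plain Python without the library. This is only an illustrative sketch: the helper names `concat_error` and `iterate_error` and the sample column name are hypothetical, not part of transformers4rec.

```python
def concat_error(lists=None):
    """Mimic the `conts + cats + labels + lists` expression."""
    conts, cats, labels = [], ["x"], []
    try:
        conts + cats + labels + lists
    except TypeError as exc:
        return str(exc)
    return None  # no error: lists was a real list


def iterate_error(lists=None):
    """Mimic the later `for col in lists:` loop."""
    try:
        for _col in lists:
            pass
    except TypeError as exc:
        return str(exc)
    return None  # no error: lists was iterable


print(concat_error())   # message from the list concatenation
print(iterate_error())  # message from the loop
```

Passing `lists=[]` to either helper returns `None`, which is why a single normalization to an empty list covers both code paths.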

Minimal fix: add the matching lists = lists or [] alongside the other three:

         cats = cats or []
         conts = conts or []
         labels = labels or []
+        lists = lists or []

         schema = schema.select_by_name(conts + cats + labels + lists)

Testing Details 🔍

Pre-fix repro (extracted from the source, no install needed):

class Schema:
    def select_by_name(self, cols):
        return self

def _augment_schema(schema, cats=None, conts=None, labels=None, lists=None):
    cats = cats or []
    conts = conts or []
    labels = labels or []
    return schema.select_by_name(conts + cats + labels + lists)

_augment_schema(Schema(), cats=["x"])
# TypeError: can only concatenate list (not "NoneType") to list

With the patch, the same call succeeds.

No unit test is added: _augment_schema is currently exercised indirectly by every MerlinDataLoader construction in the test suite that supplies lists=[...]. I am happy to add a direct test for lists=None if reviewers would prefer.
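
A direct test could look roughly like the sketch below. It runs against a hypothetical stand-in, `augment_columns`, that only mirrors the normalization and concatenation order shown earlier; a real test would call MerlinDataLoader._augment_schema with an actual schema object, and the column names here are invented for illustration.

```python
def augment_columns(cats=None, conts=None, labels=None, lists=None):
    """Hypothetical stand-in: returns the names that would be selected."""
    cats = cats or []
    conts = conts or []
    labels = labels or []
    lists = lists or []
    return conts + cats + labels + lists


def test_lists_defaults_to_empty():
    kwargs = dict(cats=["item_id"], conts=["price"], labels=["click"])
    expected = ["price", "item_id", "click"]
    # Omitted, explicit None, and explicit empty list must all agree.
    assert augment_columns(**kwargs) == expected
    assert augment_columns(lists=None, **kwargs) == expected
    assert augment_columns(lists=[], **kwargs) == expected


def test_lists_are_appended_last():
    selected = augment_columns(cats=["item_id"], lists=["item_seq"])
    assert selected == ["item_id", "item_seq"]


# Plain calls so the sketch runs without a test runner; pytest would
# also collect the two functions by name.
test_lists_defaults_to_empty()
test_lists_are_appended_last()
```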

cats/conts/labels are normalized to [] when None, but lists was not.
The subsequent 'conts + cats + labels + lists' raises
TypeError: can only concatenate list (not "NoneType") to list
for any caller that omits lists=, which is the common case for
datasets without list-typed features.

Fixes NVIDIA-Merlin#812
@copy-pr-bot
copy-pr-bot Bot commented Apr 17, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@shaun0927
Author

FYI — I have read CLA.md and agree to its terms for this submission. The changes in this PR are entirely my original work, made on my own behalf (not in the course of employment by any other party), and are offered under the Apache 2.0 license of the project. Happy to re-state this in any additional form the maintainers prefer.

Development

Successfully merging this pull request may close these issues.

[BUG] MerlinDataLoader._augment_schema raises TypeError when lists is omitted
