fix categories import #1010

jonas2612 · 2025-10-30T08:47:30Z

Possible solution to Issue #1009

codecov · 2025-10-30T08:57:43Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 92.20%. Comparing base (84db22b) to head (040021b).

Additional details and impacted files

@@           Coverage Diff           @@
##             main    #1010   +/-   ##
=======================================
  Coverage   92.20%   92.20%           
=======================================
  Files          49       49           
  Lines        7560     7560           
=======================================
  Hits         6971     6971           
  Misses        589      589

Files with missing lines	Coverage Δ
src/spatialdata/models/models.py	`88.61% <100.00%> (ø)`

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

melonora · 2025-10-30T08:59:02Z

Hi @jonas2612, Thanks for the PR. However, before I look further into this, could you please check whether the problem persists when you use the PR that unpins dask, #1006? I had a different issue with partitioning, but it could be connected so want to see if it would be fixed.

Also, in general, cat.as_known in the newer dask versions does not necessarily preserve the order anymore, so it might be better to wait with this PR a tiny bit as we expect to have the unpinning dask PR reviewed soon. It would then be better to branch of from that point given what I just mentioned.

jonas2612 · 2025-10-30T09:03:54Z

Hi @melonora. Yes, of course. I'll check it today and get back to you

melonora · 2025-10-30T09:06:28Z

Cheers, please also check this https://github.com/melonora/spatialdata/blob/e017ca7d6107623750196606a07fe8e4407c242f/src/spatialdata/_core/operations/rasterize.py#L674-L677.

This might provide some context for what I mentioned in my message above.

jonas2612 · 2025-10-30T09:23:54Z

That indeed looks very similar to the bug I get.
Still, I checked it with your PR too, but the error still exists:

Here, I expect either 500 or 550 genes, but only receive 17.

I understand the difficulty, although do not understand the background well, why the ordering of the categories is important. But if it's better for you, I can branch off and redo the change at a later time point after the PR is reviewed

melonora · 2025-10-30T12:09:53Z

@jonas2612, the reason why it is important is because as_known is basically reassigning a column. If the order is changed, a particular category belonging to a certain index can have changed.

LucaMarconato · 2026-01-03T21:13:07Z

Thanks @jonas2612 for the PR, I will merge it now. It introduces a performance regression but we can address it later.

I did some quick tests and it seems that

data[c] = data[c].cat.as_known()

takes the same time as

data[c] = data[c].cat.set_categories(data[c].compute().cat.categories)

Since there may be some concern around the order of categories with as_known() (@melonora do you have updates from the Dask team?), let's use compute() for now. I opened an issue to remember to explore faster alternatives: #1042.

LucaMarconato · 2026-01-03T22:28:32Z

I added a test, which fails before the new change in the pair. As I explain in the test, compute() does not preserve the order of categories (same as as_known()). We may want to just accept it as the new behavior, or follow up with the dask team.

LucaMarconato · 2026-01-03T22:42:44Z

I'll skip the CI from the latest push since I just removed a comment.

fix categories import

64c7f67

ensure reproducibility

8425f34

LucaMarconato mentioned this pull request Jan 3, 2026

Categorical column in points element becomes filled with NaNs #1041

Closed

LucaMarconato added 2 commits January 3, 2026 22:13

Merge branch 'main' into jonas2612/main

6c50cfc

add test for missing categories bug in PointsModel

040021b

LucaMarconato mentioned this pull request Jan 3, 2026

Categories missing with highly partitioned dask dataframes in PointsModel #1009

Closed

This was linked to issues Jan 3, 2026

Categories missing with highly partitioned dask dataframes in PointsModel #1009

Closed

Categorical column in points element becomes filled with NaNs #1041

Closed

LucaMarconato mentioned this pull request Jan 3, 2026

Find faster way to infer categories in PointsModel.parse() #1042

Open

remove comment

5103dbe

LucaMarconato merged commit 8022f5c into scverse:main Jan 3, 2026
1 of 7 checks passed

LucaMarconato added the release-fixed label Jan 6, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

fix categories import #1010

fix categories import #1010

Uh oh!

jonas2612 commented Oct 30, 2025

Uh oh!

codecov bot commented Oct 30, 2025 •

edited

Loading

Uh oh!

melonora commented Oct 30, 2025 •

edited

Loading

Uh oh!

jonas2612 commented Oct 30, 2025

Uh oh!

melonora commented Oct 30, 2025

Uh oh!

jonas2612 commented Oct 30, 2025

Uh oh!

melonora commented Oct 30, 2025

Uh oh!

LucaMarconato commented Jan 3, 2026 •

edited

Loading

Uh oh!

LucaMarconato commented Jan 3, 2026

Uh oh!

LucaMarconato commented Jan 3, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

fix categories import #1010

fix categories import #1010

Uh oh!

Conversation

jonas2612 commented Oct 30, 2025

Uh oh!

codecov bot commented Oct 30, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

melonora commented Oct 30, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

jonas2612 commented Oct 30, 2025

Uh oh!

melonora commented Oct 30, 2025

Uh oh!

jonas2612 commented Oct 30, 2025

Uh oh!

melonora commented Oct 30, 2025

Uh oh!

LucaMarconato commented Jan 3, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

LucaMarconato commented Jan 3, 2026

Uh oh!

LucaMarconato commented Jan 3, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

codecov bot commented Oct 30, 2025 •

edited

Loading

melonora commented Oct 30, 2025 •

edited

Loading

LucaMarconato commented Jan 3, 2026 •

edited

Loading