Skip to content

Software Gardening Part 2 — v0.5.1 maintenance release#368

Open
gwaybio wants to merge 31 commits into
cytomining:masterfrom
gwaybio:software-gardening-part2
Open

Software Gardening Part 2 — v0.5.1 maintenance release#368
gwaybio wants to merge 31 commits into
cytomining:masterfrom
gwaybio:software-gardening-part2

Conversation

@gwaybio

@gwaybio gwaybio commented Jun 12, 2026

Copy link
Copy Markdown
Member
  • Removed dead code. We deleted the crop_generators/ plugin dispatch system (replaced by a direct import of SingleImageCropGenerator), the dataset/sampling.py single-cell export module, training-related methods from CropGenerator, and the comet_ml/dead imports from main.py
  • Simplified profiling. Profile now directly instantiates SingleImageCropGenerator instead of going through dynamic importlib dispatch; checkpoint loading always renames the Dense head and loads backbone weights by name, removing a fragile try/except
  • Bolstered documentation. We added module docstrings and method docstrings to profiling.py, cropping.py, and all retained dataset/ and imaging/ modules
  • Updated README. We added sections covering how inference works, image preparation (illumination correction + compression), large-scale parallel profiling with split, and verifying the installation
  • Updated ROADMAP and AGENTS.md. We aligned version numbers (v0.5.1 current, v0.6.x PyTorch rewrite), described the agentic AI vision, and documented scope expectations for contributors
    Reformatted all markdown files to one sentence per line
  • Dependency fixes. Very narrow, fragile range but it works. We pinned numpy<2.0 to fix TF 2.14 / ml-dtypes ABI incompatibility; added tensorflow-macos as an explicit platform-conditional dependency so uv sync --frozen works on macOS arm64 CI runners
  • Integration test suite. We added @pytest.mark.integration tests that download the Cell Painting CNN v1 checkpoint from Zenodo, verify the 1280-d pool5 embedding, and run the full Profile pipeline end-to-end; added a dedicated zenodo-integration-test CI job that runs these separately from the main matrix

gwaybio added 2 commits June 12, 2026 11:11
ty was causing too many issues for this task
@gwaybio gwaybio marked this pull request as ready for review June 12, 2026 17:36
@gwaybio gwaybio requested a review from d33bs as a code owner June 12, 2026 17:36
@gwaybio

gwaybio commented Jun 12, 2026

Copy link
Copy Markdown
Member Author

@d33bs - please ignore for now

gwaybio and others added 24 commits June 16, 2026 05:27
ml-dtypes 0.2.0 (a TF 2.14 dep) was compiled against the numpy 1.x C
ABI; numpy 2.x changed the ABI and broke bfloat16() at import time.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
uv generates lockfiles at lock time on the resolving platform; the
tensorflow meta-package's conditional dep on tensorflow-macos was never
added because the lock was generated on Linux. Declaring it explicitly
with a platform marker forces uv to include it in the cross-platform
lockfile so uv sync --frozen works on macOS CI runners.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
build_model names the GlobalAveragePooling2D layer "pool5"; the test
config still had the old name "features" from before the refactor.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Runs pytest -m integration on ubuntu-24.04/Python 3.11 only, after the
main matrix passes. Downloads the Cell Painting CNN v1 checkpoint from
Zenodo and validates the full profiling pipeline end-to-end.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ert 1280-d embedding

- Build model with num_classes=490 to match the actual Zenodo checkpoint
  so load_weights genuinely validates all weights including the head
- Add train.validation.batch_size to zenodo_config; required by
  SingleImageCropGenerator.start()
- Assert features.shape == (2, 1280) to explicitly verify the pool5
  embedding dimension

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
addopts in pyproject.toml applies globally; without overriding it the
job was running with conflicting -m flags. Clear addopts so the job
runs only the three @pytest.mark.integration tests.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The try/except approach assumed strict loading could succeed, but the
Dense classifier head will virtually never match the user's class count.
Always renaming it to 'classifier' before load_weights(by_name=True)
means the checkpoint's Dense head is unconditionally skipped and only
backbone weights are loaded — which is all profiling needs.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@gwaybio

gwaybio commented Jun 16, 2026

Copy link
Copy Markdown
Member Author

@d33bs - this is now ready for your eyes. This was a much heavier lift than I naively expected (see my first failed attempt #367 ) Essentially, for any DeepProfiler PyPI release to be valid, we needed to overhaul the project sooner than I had estimated. See above for updated PR description

@gwaybio

gwaybio commented Jun 16, 2026

Copy link
Copy Markdown
Member Author

@jccaicedo - tagging you for visibility. If you're able, we'd benefit from your comments as well. Thanks!

@gwaybio gwaybio changed the title More software gardening and rescoping Software Gardening Part 2 — v0.5.1 maintenance release Jun 16, 2026

@d33bs d33bs left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! Looking good to me.

Comment thread .github/workflows/integration-test.yml Outdated

python-type-checks:
name: Python type checks (ty)
name: Python type checks (mypy)

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like we slid back to mypy. I don't suggest this as a sustainable path forward, there are just too many edge cases which mypy forces.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i was unimpressed by ty! It was providing many false positives, which mypy avoided. I'll drop it now though, heeding your warnings.

Comment thread .github/workflows/integration-test.yml Outdated
if: (matrix.os != 'ubuntu-24.04') || (matrix.python-version != env.TARGET_PYTHON_VERSION)
run: uv run pytest ${{ github.event.inputs.pytest_addopts }}

zenodo-integration-test:

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Keeping this clear (Zenodo archive integration issues are top of mind for me and initially misread).

Suggested change
zenodo-integration-test:
zenodo-model-integration-test:

Comment thread README.md
- Python 3.10+ is required.
- Linux (Ubuntu 20.04+) or macOS.
- For GPU acceleration, a CUDA-compatible GPU is recommended.
- Python 3.10 or 3.11

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Later Python version compatibility should be a priority for us on this project soon, Python 3.10 will be EoL this year, meaning we only have one version of compatibility afterwards.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, agree... if we were a bit later in revitalizing this project, we would have had to use a different strategy given the narrowing window of dependency support

class IlluminationStatistics():
"""Accumulate pixel statistics across all images in one plate.

Builds a per-channel pixel histogram and a mean image (at downscaled

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This sounds useful!

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes! I meant to tag @jenna-tomkinson 👀

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

WHOA! I didn't realize DeepProfiler offered illumination correction. I think this is something we can test to see the performance compared to CellProfiler, especially since this looks to be fairly parameter-free. I wonder why this is kept under DeepProfiler though, feels like this could be an independent software for IC in cytomining, but just my first impression.

# Read metadata and split dataset in training and validation

def read_dataset(config, mode='train'):
"""Build an :class:`ImageDataset` from a config dict.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seeing the missing docstrings replaced: consider adding the pydocstring ruff linter config to the pyproject.toml.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks, added!

Comment thread AGENTS.md Outdated
- **Do not modify deprecated code** unless fixing a bug that affects the current release.
Deprecated paths are: `deepprofiler/learning/`, `deepprofiler/imaging/augmentations.py`, `deepprofiler/imaging/cropping.py`, `deepprofiler/dataset/sampling.py`, `plugins/`.
- **Do not anticipate the v0.4.x rewrite** in v0.3.x changes.
- **Do not anticipate the v0.6.x PyTorch rewrite** in v0.5.1 changes.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like we should avoid version specifics in the agents file. Perhaps this is better suited for the roadmap with strong guidance here to abide what's in the roadmap as per existing git tags and upstream repo status?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yep, i agree - i'll update this file

Comment thread deepprofiler/profiling.py Outdated
useful for fine-tuning experiments.
"""
num = config["train"]["model"]["params"]["conv_blocks"]
assert num in _EFFICIENTNET_MODELS, f"{num} not in supported EfficientNet variants: {list(_EFFICIENTNET_MODELS)}"

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider removing these asserts and using if-cases with specific Exceptions instead.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, we can include another ruff linter rule to avoid these: https://docs.astral.sh/ruff/rules/assert

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added, thanks!!

Comment on lines +15 to +18
skimage.io.imsave(tmp_path + "/rand_img_1.png", numpy.random.randint(256, size=(16, 16), dtype=numpy.uint16))
skimage.io.imsave(tmp_path + "/rand_img_2.png", numpy.random.randint(256, size=(16, 16), dtype=numpy.uint16))
skimage.io.imsave(tmp_path + "/rand_img_3.png", numpy.random.randint(256, size=(16, 16), dtype=numpy.uint16))
skimage.io.imsave(tmp_path + "/rand_outlines.png", numpy.random.randint(256, size=(16, 16), dtype=numpy.uint16))

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a way to test a real dataset here too? Perhaps something from the publication? It need not be a "full dataset" but having something "real" would probably give us better observation of the code changes. Maybe this is something to think about including in OME-IRIS too, eventually.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added in ae4db39 - thanks for this suggestion!

@gwaybio

gwaybio commented Jun 16, 2026

Copy link
Copy Markdown
Member Author

thanks for the review @d33bs ! I'll keep this PR open for ~1 week to give @jccaicedo a chance to comment. Juan, please let us know if you need more time!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants