v0.1.0: monorepo merge — unified training + inference, canonical model, archive-free inference, vendored baselines (#41)
Draft
xuefei-wang wants to merge 269 commits into
From reviews/2026-05-10-2345/simplification.md H1+H2 and complexity.md H2:
- Delete _zarr_group_filesystem_path and _read_v3_1d_array from training/utils.py. Both were verbatim copies of annotations.py's group_filesystem_path / read_v3_1d_array with zero callers across the repo (verified by grep). The annotations.py versions are the canonical ones imported by training/dataset.py.
- Delete the three pass-through static shim methods on FullImageDataset (_group_filesystem_path, _read_v3_1d_array, _centroid_to_cell_idx_fast). None were called anywhere, adding zero value and only obscuring that the real helpers live in annotations.py. Note: _build_centroid_tree is kept (also flagged but not in the HIGH list).
- Backport the zstd-level-aware codec read from dct_kit/config.py into annotations.py:read_v3_1d_array. The old training-side copy hardcoded Zstd(level=0) while the inference side correctly reads level from the codec config. With archives written at a non-zero compression level the training-side read would silently produce garbage. Both paths now share the level-aware contract.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…es (Theme F)
config.py and utils.py had grown to 1.3k and 1.5k LOC, mixing archive fingerprinting, patch extraction, metric trackers, baseline IO, and the core TissueNetConfig/RNG/log helpers in one place each. Carve four focused modules out (verbatim, no logic changes):
- training/archive.py: zarr v3 alpha metadata patch, archive metadata / array fingerprinting, FOV-key discovery, and the per-process caches.
- training/patch.py: per-cell patch extraction (compute_distance_transform, extract_patch_from_zarr, extract_patch).
- training/metrics.py: confusion-matrix hierarchy adjustment, MP per-marker reduction, MPMetricsTracker, LossesAndMetrics, build_label_remap.
- training/baseline_features.py: baseline classifier feature extraction pipeline (_conf_mat_summary, compute_baseline_metrics, save_baseline_predictions, _extract_all_dataset_features, extract_features_from_zarr, _get_cell_data_from_ds).
Re-exports at the bottom of config.py and utils.py keep all tests/scripts working unchanged (230 passed, 1 skipped, matching the pre-split baseline). dataset.py is updated to import directly from the new homes for cached_archive_metadata_fingerprint and extract_patch.
Two non-mechanical touches required to keep monkey-patch-based tests green:
- baseline_features.extract_features_from_zarr looks up _discover_fov_keys and _extract_all_dataset_features via the config / utils modules at call time, so tests that monkeypatch those symbols on the legacy modules still take effect after the split. _FINGERPRINT_CACHE / _FOV_KEYS_CACHE dicts are re-exported from config.py for the same reason (test_dataset_cache mutates them).
- metrics.LossesAndMetrics.compute defers import of _conf_mat_summary to method-call time to avoid a metrics <-> baseline_features import cycle (baseline_features needs adjust_conf_mat_hierarchy at module load).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
From reviews/2026-05-10-2345/docs.md HIGH findings:
- README: add a "Training" section describing the [train] extra and the
four main entry points under scripts/. Move "Download the model"
after "Installation" (was non-executable in reading order).
- docs/index.md: add a "Training" section explaining that training-only
code lives under deepcell_types.training, gated behind the [train]
extra, with pointers to scripts/{train,predict,pretrain,
benchmark_gold_standard,ingest_gold_to_zarr}.py. Fix the long-standing
"sorce" typo.
- docs/site/tutorial.md: bump the example archive placeholder from
tissuenet-v8.zarr → tissuenet-v9.zarr to match DCTConfig's probe
order (v9 is the canonical contemporary archive).
The docs.md HIGH for the broken `from utils import download_training_data`
import in docs/site/API-key.md was fixed in 88b95f9.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Five MEDIUM/HIGH findings from reviews/2026-05-10-2345 in one batch:
- complexity H1: TissueNetConfig.get_marker_positivity() and marker_positivity_labels[] now share a single LazyMarkerPositivityDict. Previously the plain-dict cache populated by get_marker_positivity() was discarded the first time marker_positivity_labels was accessed (the property replaced the field), causing wasted I/O and divergent caches. _marker_positivity_cache is now Optional[LazyMP...] and lazily constructed on first access; get_marker_positivity routes through marker_positivity_labels for a single source of truth.
- numerical M1: MarkerEmbeddingLayer.forward zeros output for padding positions (ch_idx == -1). Without this, F.normalize(proj(0)) yielded a unit-norm direction equal to F.normalize(proj.bias), a non-trivial embedding flowing into the transformer for tokens that should be invisible.
- numerical M2: CellTypeAnnotator.forward zeros spatial features for padding positions BEFORE the fusion concat. Otherwise padding tokens enter self.fusion with [0, spatial_feat] and emerge as W_spatial @ spatial_feat + bias.
- API M1: rename predict(tissue_exclude=...) → predict(tissue_filter=...). The old name was inverted: "tissue_exclude='colon'" actually meant "filter TO colon-associated cell types". The deprecated alias stays (keyword-only) and emits DeprecationWarning; passing both raises TypeError.
- API M3: predict(return_probabilities=True) returns a PredictionResult dataclass with cell_types, probabilities (full per-cell softmax matrix), and cell_indices. Default behaviour unchanged (returns list[str]). PredictionResult and DCTConfig are now hoisted to top-level so `from deepcell_types import PredictionResult, DCTConfig` works.
Tests: 233 passed, 1 skipped. Added 3 new tests covering return_probabilities, tissue_exclude DeprecationWarning, and the both-args TypeError.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
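The alias contract described under API M1 (keyword-only deprecated name, DeprecationWarning, TypeError when both are passed) can be sketched as follows. This is a minimal illustration only: the `predict` below is a hypothetical stand-in that returns the resolved filter, not the real deepcell_types.predict signature or body.

```python
import warnings


def predict(raw, *, tissue_filter=None, tissue_exclude=None):
    """Hypothetical stand-in; only the alias handling is shown."""
    if tissue_exclude is not None:
        if tissue_filter is not None:
            # Both spellings supplied: refuse rather than guess.
            raise TypeError("pass tissue_filter or tissue_exclude, not both")
        warnings.warn(
            "tissue_exclude is deprecated; use tissue_filter",
            DeprecationWarning,
            stacklevel=2,
        )
        tissue_filter = tissue_exclude
    # The real function would run inference; the sketch returns the filter.
    return tissue_filter
```

Making the alias keyword-only keeps positional call sites from silently binding to the deprecated name.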
- tests M3: add a regression anchor in test_train_loop_smoke.py that asserts scripts/train.py still contains the AMP scheduler-gate predicate. The 2-line _run_gated_step helper is faithful to the production behavior but a silent drift would otherwise let the emulator tests pass while real training desynchronizes OneCycleLR.
- tests M2: same idea for test_zero_channel_masking.py. The unit-test helper is a verbatim copy of __getitem__'s masking block; a refactor could let the copy drift. New test asserts training/dataset.py still contains _zero_channel_cache and fov_zero_mask.
- docs M4: add CHANGELOG.md documenting the 0.0.1 → 0.1.0 release (canonical-only refactor, training subpackage, breaking removal of CellTypeCLIPModel, deprecated tissue_exclude alias, num_workers=0 default, TissueNetConfig env-var default). Bump version in pyproject.toml.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
complexity H8: replace FullImageDataset.indices' positional 8-tuple
with a CellIndexRecord NamedTuple. Named fields make grep / refactor
safe (no more record[6] / record[5] magic numbers across 10+ call
sites). NamedTuple IS a tuple, so positional access still works for
backward compat with serialized caches that stored raw 8-tuples.
Production call sites in dataset.py now use .ct_label_standard,
.dataset_name, .fov_name, .ds_idx, .domain accessors. Mock-index
constructors in tests/{test_v2,test_samplers,test_stratified_splits,
test_dataset_splits}.py updated to build CellIndexRecord instances.
complexity H7: introduce DataLoaderConfig dataclass + matching
create_dataloader_from_config(zarr_dir, dct_config, cfg) wrapper.
Lets new callers pass a single discoverable object instead of 20+
keyword arguments. The legacy keyword signature of create_dataloader
is preserved verbatim so train.py / predict.py / tests don't need
any change. Field defaults mirror create_dataloader's defaults
exactly — DataLoaderConfig() is equivalent to no-override.
Tests: 235 passed, 1 skipped (analysis-only env failure unchanged).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
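The backward-compatibility claim for CellIndexRecord (named access plus unchanged positional access) can be illustrated with a small sketch. Only the five field names listed in the message come from the source; the other three fields, the field order, and the sample values are placeholders invented for this example.

```python
from typing import NamedTuple


class CellIndexRecord(NamedTuple):
    # "source" fields are named in the commit message; "placeholder" fields
    # and the ordering are assumptions made for this sketch.
    cell_idx: int            # placeholder
    centroid_y: float        # placeholder
    centroid_x: float        # placeholder
    ct_label_standard: str   # source
    dataset_name: str        # source
    fov_name: str            # source
    ds_idx: int              # source
    domain: str              # source


# A raw 8-tuple, as a serialized cache might have stored it.
legacy = (7, 12.5, 40.0, "Tcell", "dsA", "fov_01", 3, "codex")
record = CellIndexRecord(*legacy)
```

Because a NamedTuple instance is a tuple, `record[5]` and `record.fov_name` read the same slot, and the record still compares equal to the raw tuple.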
Collapse training pipeline into deepcell-types (canonical-only)
…ne submodule rebase
Three independent bugs surfaced when running training against the current
master HEAD from a fresh workspace install:
1. tissue_idx kwarg mismatch (scripts/train.py:121, scripts/predict.py:208 + 334)
scripts pass `tissue_idx=batch_data.tissue_idx` to
`CellTypeAnnotator.forward(...)`, but the model's forward signature is
`(sample, spatial_context, ch_idx, padding_mask, ct_exclude=None,
return_attn_weights=False, domain_idx=None)` — no `tissue_idx`. The
tissue-FiLM MP head experiment was rolled back (see memory
`v10_mp_expansion_tissue_negative.md`) and the model dropped the
parameter, but the scripts kept passing it. Result: every training /
prediction run dies at the first forward pass with
`TypeError: ...got an unexpected keyword argument 'tissue_idx'`.
Fix: drop the kwarg at all three call sites. `batch_data.tissue_idx`
is still populated by the dataloader and remains available to anyone
who needs it downstream — the model just doesn't consume it.
2. Circular import between training/utils.py and training/baseline_features.py
utils.py re-exports four symbols from baseline_features.py at module
level for backward compat. baseline_features.py also imports private
helpers (`_atomic_np_savez` etc.) from utils.py. When utils.py is
imported first (training path) the cycle resolves fine, but when
baseline_features.py is imported first (baseline path — e.g.
`import xgb.run`), the partially-initialized utils.py reaches back to
`baseline_features._extract_all_dataset_features` before that name is
defined, and ImportError fires.
Fix: convert the re-exports to a module-level `__getattr__` so the
lookup is deferred until actual access, by which point both modules
have finished initializing. Existing callers
(`from deepcell_types.training.utils import save_baseline_predictions`,
verified in tests/test_v2.py) keep working.
3. Submodule rebase (baselines/{maps,cellsighter,xgboost,nimbus})
Each baseline's pyproject.toml listed `deepcelltypes @ git+...
deepcelltypes-cell-type-assignment-pytorch.git` as a dep; that URL
now resolves to the renamed research workspace (no longer a Python
package) and `uv pip install` fails with a metadata-name mismatch.
Each baseline also imported from `deepcelltypes.{config,utils,dataset}`
— the pre-refactor flat layout. Companion commits on each submodule's
`fix/post-refactor-imports` branch replace the dep URL with a plain
`deepcell-types` and rebase imports onto
`deepcell_types.training.{config,utils,dataset,metrics,baseline_features}`.
This parent commit bumps the submodule pointers to those branch tips.
End-to-end verification: with the three fixes, a fresh workspace `uv sync`
+ smoke training (`scripts/train.py` with the v10 split + svd_512_v6
embeddings) gets through model build, GPU allocation, and reaches batch 0
of epoch 0. The xgboost baseline imports cleanly after
`uv pip install -e baselines/xgboost`.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
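The deferred re-export in fix 2 relies on the module-level `__getattr__` hook (PEP 562). A self-contained sketch, using two throwaway in-memory modules with hypothetical names (`demo_utils`, `demo_baseline_features`) in place of the real training/utils.py and training/baseline_features.py:

```python
import importlib
import sys
import types

# Stand-in for baseline_features.py (contents are hypothetical).
provider = types.ModuleType("demo_baseline_features")
provider.save_baseline_predictions = lambda: "saved"
sys.modules[provider.__name__] = provider

# Stand-in for utils.py: nothing is imported from the provider at load time.
legacy = types.ModuleType("demo_utils")
_LAZY = {"save_baseline_predictions": "demo_baseline_features"}


def _lazy_getattr(name):
    """PEP 562 hook: resolve a re-exported name on first access."""
    if name in _LAZY:
        module = importlib.import_module(_LAZY[name])
        return getattr(module, name)
    raise AttributeError(f"module 'demo_utils' has no attribute {name!r}")


legacy.__getattr__ = _lazy_getattr
sys.modules[legacy.__name__] = legacy

# Legacy-style import still works; the lookup happens only now.
from demo_utils import save_baseline_predictions
```

Since the provider is only touched when the name is requested, neither module needs the other to be fully initialized at import time.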
…port fix(train,predict,utils): tissue_idx kwarg + circular import + baseline submodule rebase
uv.lock is regenerated on every branch switch when pyproject.toml shapes differ (master vs training), so keeping it tracked produces constant churn. /reviews/ holds local /deep-review outputs.
Untouched since Oct 2024 and broken since the kit was inlined in 0a8108e: it COPYs a non-existent top-level requirements.txt and pip-installs the deleted deepcelltypes-kit/ directory. No CI, docs, or scripts reference it.
…us v0.0.5
model.py: replace channel_feat[padding_mask] = 0.0 (in-place under AMP autocast on a tensor in the backward graph) with an out-of-place masked_fill on padding_mask.unsqueeze(-1). Eliminates the latent "a leaf Variable that requires grad has been used in an in-place operation" risk and the gradient corruption it would cause on padding rows.
metrics.py: make MPMetricsTracker.compute symmetric across mp_macro_f1 and mp_macro_accuracy w.r.t. vacuous markers (n_pos_gt == 0 and n_pos_pred == 0). f1s already appended np.nan + used nanmean; accuracies appended a real value + used mean, asymmetrically inflating macro accuracy. Now both go through nanmean with np.nan sentinels, so the two headline MP numbers come from the same denominator.
scripts/benchmark_gold_standard.py: support nimbus-inference v0.0.5 + the actual Pan-Multiplex gold-standard directory layout. The script previously called Nimbus.prepare_normalization_dict (removed in v0.0.5) and assumed per-subset labels/ + raw/ dirs; the real layout has <subset>/fovs/ plus a single central gold_standard_groundtruth.csv. Now:
- prepare_normalization_dict is invoked on MultiplexDataset (its v0.0.5 home),
- discover_gold_standard_subsets accepts both layouts and pivots the central CSV per-FOV when needed,
- the segmentation naming convention probes both Pan-Multiplex (<fov>.ome.tif) and legacy DeepCell (<fov>_whole_cell.tiff) names,
- the image suffix is autodetected.
Smoke against /data/xwang3/nimbus_gold_standard/gold_standard_labelled completes end-to-end (macro F1 0.7400, micro 0.8382, 56 markers, 939K cell-marker pairs).
Pytest baseline unchanged: 255 passed / 1 skipped / 1 failed (the known post-PR-#62 analysis.validate_mp_refinement path drift).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
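The metrics.py change can be sketched in plain Python: vacuous markers contribute a NaN sentinel to both lists, and both macro numbers are NaN-aware means over the same surviving markers. The tuple layout and function names below are assumptions for illustration, not MPMetricsTracker's actual interface.

```python
import math


def nanmean(values):
    """Mean over the non-NaN entries; NaN if nothing survives."""
    kept = [v for v in values if not math.isnan(v)]
    return sum(kept) / len(kept) if kept else math.nan


def macro_scores(per_marker):
    """per_marker: (f1, accuracy, n_pos_gt, n_pos_pred) tuples (assumed layout)."""
    f1s, accs = [], []
    for f1, acc, n_pos_gt, n_pos_pred in per_marker:
        vacuous = n_pos_gt == 0 and n_pos_pred == 0
        # Same sentinel for both metrics, so both share one denominator.
        f1s.append(math.nan if vacuous else f1)
        accs.append(math.nan if vacuous else acc)
    return nanmean(f1s), nanmean(accs)


markers = [(0.8, 0.9, 10, 12), (0.6, 0.7, 5, 4), (0.0, 1.0, 0, 0)]
macro_f1, macro_acc = macro_scores(markers)
```

In this toy input the third marker is vacuous with a trivially perfect accuracy of 1.0; a plain mean would have pulled macro accuracy up, which is the asymmetry the commit removes.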
After PR #62 split the monorepo, `analysis/` lives in the research workspace and is no longer importable from this repo's pytest session. The stage7 synthetic-gold-validation test imports `analysis.validate_mp_refinement` and fails collection with `ModuleNotFoundError: No module named 'analysis'` unless the workspace is on PYTHONPATH. Guard the import with `pytest.importorskip(...)` so the suite reports skipped instead of failed in the default sibling-repo-only invocation. Bumps the sibling pytest baseline to 255 passed / 2 skipped / 0 failed. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… --zarr_dir defaults
train.py / predict.py / pretrain.py used to default --zarr_dir to DATA_DIR / "tissuenet-caitlin-labels.zarr", forcing users to set DATA_DIR to the wrapper directory and have the scripts append the inner archive name. Switch to default --zarr_dir = DATA_DIR so the env var holds the actual archive root directly; this matches both how TissueNetConfig(zarr_path=...) is invoked elsewhere and how the baseline runners take their --zarr_dir.
The three baseline submodules (xgboost, cellsighter, maps) make the same change on their --zarr_dir defaults; pointers are bumped here. The cellsighter submodule also includes a smoke-safety fix (best_macro_acc=-inf so the first val pass always saves a checkpoint even when macro_accuracy is exactly 0.0); the xgboost submodule includes a label-tightening fix for tiny subset smokes where GroupShuffleSplit can leave compact labels with zero examples in inner_train (rejected by modern xgboost.sklearn.XGBClassifier).
Smoke verification on the v10 7-dataset subset (post-tier-3-repair archive): all 3 baselines + main model now complete end-to-end:
- main train.py (cuda:0): train_macro_acc=0.0268, best ckpt saved
- xgb baseline (CPU): macro=0.2209, CSV + model.json saved
- cellsighter baseline (cuda:1): macro=0.0131, CSV + .pth saved
- maps baseline (cuda:2): val_loss=5.96, CSV + .pth + stats saved
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
CellTypeAnnotator.forward zeroed padding rows in two places. The first (`channel_feat[padding_mask] = 0.0`) was switched to an out-of-place `masked_fill` in 782c611 to avoid an in-place write on a tensor that's in the AMP autocast backward graph. The second site (`spatial_expanded[padding_mask] = 0.0`) was left as an in-place write, guarded by a defensive `.clone()` on the preceding `expand()` view. That guard is correct today, but the asymmetry is a trap: anyone who removes the `.clone()` thinking it's redundant will silently reintroduce the same AMP-graph hazard the earlier fix addressed. Switching to the same masked_fill pattern removes the trap and drops the now-unneeded clone — masked_fill materializes the expand() view into a fresh tensor. Pytest unchanged: 255 passed / 2 skipped / 0 failed. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Drop LoRA from MarkerEmbeddingLayer (and CLI flags / auto-detect):
the trainable projection makes a LoRA adapter mathematically redundant
(proj.W + lora_B @ lora_A collapses into proj_eff.W). Confirmed on v8
that LoRA-8 ties exactly with no-LoRA.
- Add --mean_intensity_mode {none|cls_residual|per_channel|both} to
CellTypeAnnotator: scatter per-cell mean marker intensity into a global
marker-position vector and inject as a CLS residual and/or per-channel
feature. Zero-init the output projection so warm-start from a baseline
ckpt preserves predictions at step 0.
- Add --freeze_backbone to train.py: requires_grad=False on everything
except intensity_cls_branch/intensity_per_channel_proj. Use with
--pretrained_path to train a cheap mean-intensity adapter on top of an
existing ckpt.
- Final-eval val-cap automation: when --max_val_samples is set (cheap
per-epoch val), final eval is rebuilt with no cap so the headline test
number is apples-to-apples vs baselines (which never cap their val).
- Auto-detect mean_intensity_mode from ckpt keys in predict.py,
benchmark_gold_standard.py, and deepcell_types.predict.
- Make pretrained loading tolerate numpy-scalar metadata
(torch.load weights_only=False for pretrained_path).
- Add scripts/fold_lora_into_proj.py utility to fold legacy LoRA weights
into proj.weight so old ckpts load against the LoRA-free model.
- Change canonical defaults: --resnet_channels 48, --domain_weight 0.1,
--mean_intensity_mode cls_residual, --best_metric macro_f1.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
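The redundancy argument for dropping LoRA (proj.W + lora_B @ lora_A collapses into a single effective weight) can be checked numerically with a tiny pure-Python example. The matrices are arbitrary illustrative values, and the helpers are written out only to keep the sketch dependency-free.

```python
def matmul(a, b):
    """Plain nested-list matrix product."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def add(a, b):
    """Elementwise sum of two same-shape matrices."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


W = [[1.0, 2.0], [3.0, 4.0]]   # trainable projection weight
B = [[1.0], [2.0]]             # rank-1 LoRA factor, shape (2, 1)
A = [[0.5, -1.0]]              # rank-1 LoRA factor, shape (1, 2)
x = [[2.0], [3.0]]             # one input column vector

# Adapter applied as a separate branch: W x + B (A x)
separate = add(matmul(W, x), matmul(B, matmul(A, x)))

# Adapter folded into the projection: (W + B A) x
W_eff = add(W, matmul(B, A))
folded = matmul(W_eff, x)
```

Both paths give the same output, so when W itself is trainable the low-rank term adds no extra expressiveness; folding old checkpoints amounts to storing W_eff in place of W.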
* feat(baselines): NaN missing-value support + XGBoost submodule bump
Adds a ``missing_value: float = 0.0`` knob to
``extract_features_from_zarr`` so each baseline can pick its preferred
sentinel for absent markers:
- MAPS / CellSighter continue to receive 0.0 (default; their MLP/CNN
layers don't tolerate NaN).
- XGBoost now receives NaN (per submodule update), routing absent
markers through XGBoost's ``missing=NaN`` learned per-split default
direction rather than conflating them with "marker present, mean
intensity 0.0".
Implementation:
- ``_extract_all_dataset_features`` records a per-dataset
``present_markers`` bool mask alongside features/labels/cell_sizes.
- ``extract_features_from_zarr`` accumulates per-split block metadata
(``{split}_block_sizes``, ``{split}_block_absent``) and applies
``_apply_missing_value`` post-extraction (and post-cache-load). The
cache stays missing-value-agnostic — same .npz/pickle serves a MAPS
run and an XGBoost run.
- Cache version bumped 5 -> 6 so legacy caches without
``present_markers`` are rebuilt automatically.
Submodule bump (baselines/xgboost):
- fix(tuning): carve FOV-grouped inner-val for early stopping instead
of leaking the test set into best_iteration.
- feat(missing): pass missing_value=np.nan from run.py and tuning.py.
Tests:
- tests/test_baseline_feature_splits.py: 5 passed.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
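How a missing-value-agnostic cache can serve both a 0.0 run and a NaN run is sketched below: the sentinel is applied after extraction, block by block, using the per-block absent-marker masks. The function name, argument layout, and nested-list representation are assumptions for illustration; the real `_apply_missing_value` may differ.

```python
import math


def apply_missing_value_sketch(features, block_sizes, block_absent, missing_value):
    """Return a copy of `features` with absent-marker columns overwritten.

    features:     list of per-cell rows (lists of floats)
    block_sizes:  number of rows contributed by each dataset block
    block_absent: per block, a list of bools (True = marker absent)
    """
    out = [list(row) for row in features]  # leave the cached rows untouched
    start = 0
    for size, absent in zip(block_sizes, block_absent):
        for row in out[start:start + size]:
            for col, is_absent in enumerate(absent):
                if is_absent:
                    row[col] = missing_value
        start += size
    return out


cached = [[1.0, 0.0], [2.0, 0.0], [0.0, 3.0]]
sizes = [2, 1]
absent = [[False, True], [True, False]]

for_xgboost = apply_missing_value_sketch(cached, sizes, absent, math.nan)
for_maps = apply_missing_value_sketch(cached, sizes, absent, 0.0)
```

The same cached rows yield NaN in absent columns for XGBoost and 0.0 for MAPS / CellSighter, without rebuilding the cache.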
* chore(submodule): bump baselines/xgboost to include FOV-grouped Optuna fix
dbca43c switches the Optuna inner-val split from cell-level
StratifiedShuffleSplit to FOV-grouped GroupShuffleSplit so
hyperparameter selection sees the same FOV-generalisation gap as the
reported test set (and drops the singleton duplication workaround).
See xuefei-wang/deepcelltypes-xgboost#3 for the full change.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* chore(submodule): bump baselines/xgboost for train_best_model retighten
57b4997 adds label-space re-tightening in train_best_model to handle
the case where GroupShuffleSplit on small splits leaves some classes
absent from inner-train (mirrors run.py:178-204). At full scale this
is a no-op; on small splits it allows the tuned XGBoost path to
complete without an XGBClassifier label-space rejection.
Surfaced during smoke testing on a 4-FOV split.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* chore(submodule): re-pin baselines/xgboost to the merged main HEAD
xuefei-wang/deepcelltypes-xgboost#3 merged as 6cda78d (squash). Re-pin
to the merged HEAD on main so the pointer tracks an actual branch tip
rather than the pre-squash branch commit.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
xuefei-wang/deepcelltypes-xgboost#4 widens the tuning.py --metric click.Choice to also accept macro_f1 / weighted_f1, matching the research workspace's headline metric. Bumps the submodule pointer to the merged commit (470a74d). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
#6) Pulls xuefei-wang/deepcelltypes-xgboost#5 — each saved XGB ckpt now writes a sidecar <model>.remap.json with the post-GSS → ct2idx mapping, so out-of-band evaluators don't need to replay the GroupShuffleSplit to recover the label-space. Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Pulls xuefei-wang/deepcelltypes-maps#3 — MAPS output head now covers all 51 archive ct2idx classes instead of just classes seen in train, removing the 5–10 pp macro-F1 artifact from classes with zero train support. Existing v10 ckpts unaffected (eval-side reads n_out from the ckpt). Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…backbone (#8)
The existing --freeze_backbone freezes every parameter except the mean-intensity branches (intensity_cls_branch / intensity_per_channel_proj). The CT classifier head, CLS token, and final norm (all CT-task layers) stay frozen. That's a tight definition of "adapter only" but it leaves a known limitation: the CT head can't adapt to whatever the mean-intensity branch adds to the CLS embedding, so the model's improvement is capped by the pretrained head's biases.
Add --unfreeze_ct_head (default off). With this flag set alongside --freeze_backbone, the freeze policy additionally re-enables:
- ct_head (the CT classifier MLP, ~105K params)
- final_norm (LayerNorm before heads, ~512 params)
- cls_token (the trainable CLS embedding parameter)
The heavy backbone (transformer 3.2M, per-channel encoder 130K, marker embedder LoRA 175K, spatial encoder 57K) stays frozen as before.
Use case: train Frozen-CLS variants where you want the new mean-intensity side-input AND the CT head to co-adapt without unleashing the full transformer. Brings the trainable share from ~3% to ~6% of total parameters.
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Both `scripts/predict.py --ct_abstention_k` and the module-form `deepcell_types.predict()` now default to k=0.5, the v10 published headline operating point (≈9% of cells abstained, +5pp macro_F1 on kept cells; clears every baseline including XGBoost-tuned on the held-out test split).
Module-form predict() additions:
- `ct_abstention_k=0.5` parameter, k<=0 / None disables.
- Per-FOV IQR fence Q1 - k*IQR on max-softmax (the whole FOV is a single tissue×modality group). compute_iqr_fence already guards n<4.
- Abstained cells get the sentinel label "Unknown" in `cell_types`. Original argmax preserved in PredictionResult.cell_types_raw, with a boolean PredictionResult.abstained mask alongside.
scripts/predict.py:
- --ct_abstention_k default flipped from None to 0.5. Set 0 or a negative value to disable. Help text updated to point at docs/reports/ct_iqr_abstention_test.md instead of the older audit doc.
- Guard tightened to `k > 0` so the disable contract is explicit.
Tests:
- Replaced test_default_no_abstention_column with two new tests: test_default_k_0_5_abstention_is_on (≈10% abstained on synthetic frame) and test_disable_abstention_with_nonpositive_k (k<=0 / None as no-op).
- All 24 existing canonical-inference + abstention tests pass; the 1-cell test_predict_* cases trip compute_iqr_fence's n<4 guard, so no abstention fires and assertions hold.
Backwards-compat note: callers that don't pass `ct_abstention_k=` will now see "Unknown" labels appear for low-confidence cells. To restore the pre-change behaviour, pass `ct_abstention_k=0` (or None) at the call site, or `--ct_abstention_k 0` on the CLI.
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
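The per-FOV fence rule (abstain when max-softmax falls below Q1 - k*IQR, skip when n<4 or k is non-positive) can be sketched with the standard library. The quartile interpolation method, the strict `<` comparison, and both function bodies are assumptions for this illustration; the repository's compute_iqr_fence may compute quartiles differently.

```python
import statistics


def compute_iqr_fence_sketch(confidences, k):
    """Return Q1 - k*IQR, or None when there are fewer than 4 cells."""
    if len(confidences) < 4:
        return None
    q1, _, q3 = statistics.quantiles(confidences, n=4, method="inclusive")
    return q1 - k * (q3 - q1)


def apply_abstention(cell_types, confidences, k=0.5):
    """Relabel low-confidence cells as "Unknown"; return (labels, mask)."""
    no_abstention = (list(cell_types), [False] * len(cell_types))
    if k is None or k <= 0:
        return no_abstention
    fence = compute_iqr_fence_sketch(confidences, k)
    if fence is None:
        return no_abstention
    mask = [c < fence for c in confidences]
    labels = ["Unknown" if m else ct for ct, m in zip(cell_types, mask)]
    return labels, mask


raw_types = ["T", "B", "T", "NK", "B", "T"]
max_softmax = [0.90, 0.92, 0.94, 0.96, 0.98, 0.20]
labels, abstained = apply_abstention(raw_types, max_softmax, k=0.5)
```

Keeping `raw_types` alongside `labels` and `abstained` mirrors the cell_types_raw / abstained pair described above, so callers can always recover the original argmax.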
…,score) (#10)
benchmark_gold_standard.py evaluates marker-positivity at a single hardcoded threshold (default 0.5). Per-marker threshold tuning needs the raw per-cell scores, which the script wasn't persisting.
When the DCT_GOLD_PREDS_CSV env var is set, after the inference pass the script writes a flat CSV of (fov, cell_id, channel, pred_score) for every prediction the model produced. Downstream callers can then apply oracle CV per-marker τ (or any other threshold-tuning protocol) without re-running the model; see analysis/rescore_gold_oracle_cv.py in the research workspace, which consumes this CSV to produce the final_*_gold_metrics_learned.json adaptive-τ tables.
No behaviour change when the env var is unset (no file written).
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Module docstring previously documented `k=None (off): default`. The production default in both `scripts/predict.py:81` and `deepcell_types/predict.py:242` is `k=0.5`. Updated the Pareto-sweep note to reflect the current operating point. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Per the data-provenance audit, the --min_channels=3 filter is vacuous on the labeled v10 corpus — the 622 archive FOVs excluded from v10 are unlabeled (no standardized_source annotations), not channel-filtered. The filter logic in dataset.py / baseline_features.py is retained and gated on `min_channels > 0`, so callers who pass it explicitly still get the behavior; only the default changes from 3 → 0 across the 4 CLI scripts. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ntime After dropping the --min_channels default 3 → 0 (the filter is a no-op on the labeled v10 corpus), load_fov_splits was strict-failing on the recorded vs runtime min_channels metadata. Add min_channels to _ADVISORY_SPLIT_METADATA_KEYS so the mismatch logs a warning instead of raising. Restores load-compatibility with all existing fov_split_v10*.json files (which carry min_channels=3 in their metadata). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
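The advisory-key behaviour (warn on a min_channels mismatch, still fail on other metadata mismatches) might look roughly like the sketch below. Only `_ADVISORY_SPLIT_METADATA_KEYS` and `min_channels` come from the message; the checker function, its arguments, and the use of ValueError are assumptions.

```python
import logging

logger = logging.getLogger(__name__)

# Keys whose mismatch is logged rather than treated as fatal.
_ADVISORY_SPLIT_METADATA_KEYS = {"min_channels"}


def check_split_metadata(recorded, runtime):
    """Compare split-file metadata against runtime settings (hypothetical helper)."""
    for key, recorded_value in recorded.items():
        if key not in runtime or runtime[key] == recorded_value:
            continue
        message = (
            f"split metadata mismatch for {key!r}: "
            f"recorded {recorded_value!r}, runtime {runtime[key]!r}"
        )
        if key in _ADVISORY_SPLIT_METADATA_KEYS:
            logger.warning(message)   # advisory: keep loading
        else:
            raise ValueError(message)  # strict: refuse the split file
```

With this shape, an existing split file recorded with min_channels=3 still loads under a runtime default of 0, while a mismatch on any non-advisory key continues to raise.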
…e #79) (#11)
* docs(abstention): fix stale docstring (k=0.5 is the default)
Module docstring previously documented `k=None (off): default`. The production default in both `scripts/predict.py:81` and `deepcell_types/predict.py:242` is `k=0.5`. Updated the Pareto-sweep note to reflect the current operating point.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* scripts: drop --min_channels default 3 → 0 (filter is a no-op on v10)
Per the data-provenance audit, the --min_channels=3 filter is vacuous on the labeled v10 corpus: the 622 archive FOVs excluded from v10 are unlabeled (no standardized_source annotations), not channel-filtered. The filter logic in dataset.py / baseline_features.py is retained and gated on `min_channels > 0`, so callers who pass it explicitly still get the behavior; only the default changes from 3 → 0 across the 4 CLI scripts.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(splits): tolerate min_channels mismatch between split file and runtime
After dropping the --min_channels default 3 → 0 (the filter is a no-op on the labeled v10 corpus), load_fov_splits was strict-failing on the recorded vs runtime min_channels metadata. Add min_channels to _ADVISORY_SPLIT_METADATA_KEYS so the mismatch logs a warning instead of raising. Restores load-compatibility with all existing fov_split_v10*.json files (which carry min_channels=3 in their metadata).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(predict): FOV-grouped train sampler for --learn_mp_thresholds (#79)
Root cause: when --learn_mp_thresholds is on, predict.py builds the train loader with use_weighted_sampler=False, which falls back to shuffle=True. Each random batch of 256 hits ~256 unique FOVs, and FullImageDataset's _get_zarr_arrays runs `raw_np = raw_zarr[:]` on the first hit per worker (populating _zero_channel_cache) even when the FOV exceeds the per-worker numpy budget, which is a full ~1 GB cold zarr load per FOV. With 8 spawn workers × prefetch=4 × random FOVs, batch 0 waits on terabytes of cold zarr reads and effectively never arrives. Training avoids this entirely because FOVGroupedSampler keeps each worker on one FOV at a time.
Fix: add SequentialFOVGroupedSampler (the uniform-coverage counterpart to FOVGroupedSampler with the same cache-locality guarantee) and a fov_grouped_train flag on create_dataloader to enable it. predict.py passes the flag when --learn_mp_thresholds is set, so the original issue-#79 invocation now runs to completion.
Smoke (8 workers, spawn, v10 test split): first 5 batches in 5 min (cold-load), subsequent batches stream from per-worker numpy cache.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ty tissue buckets
Three cleanups from the 2026-05-19 triple-review low-priority list:
(1) deepcell_types/model.py constructor defaults aligned with CLI:
- resnet_base_channels: 32 → 48 (canonical paper recipe, matches
--resnet_channels default in scripts/train.py + predict.py)
- mean_intensity_mode: "none" → "cls_residual" (canonical paper recipe,
matches --mean_intensity_mode default in scripts/train.py)
Direct callers like `CellTypeAnnotator(...)` without explicit kwargs
were previously silently building the pre-v10 model variant.
(2) deepcell_types/training/config.py:_compute_all_mappings:
Drop tissues whose tissue_celltype_mapping ends up with an empty
allowed-CT set. Previously 4 tissues (esophagus, immune,
musculoskeletal, colon) had keys created on first sighting but never
populated, since their only FOVs lacked standardized_source
annotations. Empty sets are a bug attractor under
--apply_tissue_mask (the mask becomes all-Inf → NaN softmax). Now
the empty entries are filtered out and --apply_tissue_mask just
skips unmapped tissues.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
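Why an empty allowed-CT set is hazardous under a tissue mask, and what filtering it out looks like, can be shown with a few lines of plain Python. The mapping contents and both helper functions are illustrative assumptions, not the code in _compute_all_mappings.

```python
import math


def masked_softmax(logits, allowed):
    """Softmax with classes outside `allowed` masked to -inf."""
    masked = [x if i in allowed else -math.inf for i, x in enumerate(logits)]
    peak = max(masked)
    exps = [math.exp(x - peak) for x in masked]  # -inf - (-inf) is NaN
    total = sum(exps)
    return [e / total for e in exps]


def drop_empty_tissues(mapping):
    """Keep only tissues that map to at least one allowed cell type."""
    return {tissue: cts for tissue, cts in mapping.items() if cts}


tissue_celltype_mapping = {"lung": {0, 1}, "colon": set()}
cleaned = drop_empty_tissues(tissue_celltype_mapping)

ok = masked_softmax([1.0, 2.0, 3.0], cleaned["lung"])
poisoned = masked_softmax([1.0, 2.0, 3.0], tissue_celltype_mapping["colon"])
```

With a non-empty set the masked softmax is a valid distribution over the allowed classes; with an empty set every logit is -inf and every output is NaN, which is why unmapped tissues are better skipped than masked.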
Pinned commit a21b97f was the head of merged PR #2; main tip b5447d1 is the merge commit. Same effective content, but the pointer now matches main rather than a non-tip commit. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Rebased onto current master (atop the rebased #52).
MAPS adapter:
- epoch schedule --max_epochs 500 / --min_epochs 250 / --patience 100 with early stopping on a FOV-grouped inner-validation loss (reported test set never feeds selection);
- DCT-safe normalization default (train-set z-score then /255), with a /255-only ablation via --no_znorm; reproducibility metadata recorded;
- stale normalization-default comment corrected to match the code.
Final tree taken from the pre-rebase tip (81ac2ad); the original branch's merge-commit resolution (DCT-safe README wording) is preserved here.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- model.py: thread ct_head_width / ct_head_depth through the annotator so the residual-MLP head size is no longer hardcoded at 512/4.
- predict.py: _infer_ct_head_params() derives head arch/width/depth from a checkpoint state_dict when the config omits them, so config-less resMLP checkpoints load; master's ct_out_key / vocab-guard logic is preserved.
- retrain_head.py: record ct_head_width/depth + stage-2 provenance in the deployable checkpoint config.
- scripts/predict.py: build the model through the inferred head params.
Salvaged from PR #55 / stale #41. The released v0.1.0 checkpoint is a legacy MLP head and still loads unchanged.
Note: scripts/predict.py carries a near-duplicate of _infer_ct_head_params (follow-up: dedupe into deepcell_types/predict.py).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Address review of #56: - scripts/predict.py imported a near-duplicate of _infer_ct_head_params; import the canonical helper from deepcell_types.predict instead (single source of truth, no drift). - Wrap the call so a self-inconsistent checkpoint (config says resmlp but the state_dict lacks ct_head.inp.0.weight) raises a clear ValueError instead of a bare KeyError, matching the deepcell_types.predict path. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
docs(baselines): correct stale faithfulness claims in baseline READMEs
…ning feat(maps): DCT-safe training schedule and normalization controls
feat(model): configurable resMLP head + config-less head-shape inference (salvage of #55)
Release-readiness fixes to the user-facing predict() path: - Abstention is now opt-in: ct_abstention_k defaults to None (raw argmax) instead of 0.2. The old default silently relabelled low-confidence cells to "Unknown" in the plain list[str] return at a benchmark-tuned operating point; k=0.2 still reproduces the paper operating point. - Mask all-zero channels in PatchDataset, matching the training dataloader (which attention-masks them). A listed marker that is all-zero on a FOV was previously fed as a present zero token with a real marker embedding, an input the model was trained never to see. Dropping it is equivalent to training's attention mask (padding channels are inert for the CLS). - Reject non-finite (NaN/inf) raw up front instead of silently labelling every cell as class 0 via a poisoned softmax. - Size patch tensors to the real channel count instead of MAX_NUM_CHANNELS; padding tokens are provably inert, so this is numerically identical while avoiding the per-channel ResNet + quadratic-transformer work over padding. Tests: abstention-opt-in default, non-finite rejection, all-zero channel masking, padding numerical-inertness, and a config/preprocessing MPP+percentile parity assertion. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- README: the quickstart downloaded the latest checkpoint but then called predict(model_name="deepcell-types_2026-05-17"), which resolves to a file download_model() never wrote -> FileNotFoundError on copy-paste. Capture the path download_model() returns and pass it straight to predict(). Also add the DEEPCELL_ACCESS_TOKEN prerequisite to the model-download section (previously only documented elsewhere). - docs/index.md "Recognized channels" limitation said the registry comes from the zarr archive; with 0.1.0 it ships in the packaged vocab.json by default and the archive is an override. Corrected to match. - CHANGELOG: abstention entry now describes the opt-in (default None) behavior; added entries for all-zero-channel masking, the non-finite input guard, and the real-channel-width tensor sizing. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Close the highest-value test gaps the review flagged: - test_auth.py (new): the download/integrity/extraction layer (utils/_auth.py) was entirely untested. Cover the md5/sha256 hash dispatch + bad-length error, extract_archive's zip-slip / tar-traversal / tar-symlink rejection and the benign-archive happy path, fetch_data's cache-hit and missing-token branches (no network), and the model-registry digest shapes. - compat_marker0_zero now has a behavioral test asserting it zeros marker-0's mean-intensity column (the released-checkpoint parity contract), via a hook on the intensity CLS branch — the branch's final layer is zero-initialized, so the contract must be checked on the branch input, not the fresh model's ct_logits. - An end-to-end numeric regression pin: a fixed-seed checkpoint on a fixed FOV must reproduce a golden softmax fingerprint and be deterministic across calls, so preprocessing/forward drift fails instead of shipping green. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
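The md5/sha256 hash dispatch that test_auth.py exercises can be pictured as picking the algorithm from the length of the expected hex digest. A minimal sketch, assuming 32 hex characters means MD5 and 64 means SHA-256; `verify_digest` is a hypothetical name, not the helper in utils/_auth.py:

```python
import hashlib

# Assumed mapping from hex-digest length to algorithm name.
_ALGO_BY_LENGTH = {32: "md5", 64: "sha256"}


def verify_digest(data: bytes, expected_hex: str) -> bool:
    """Return True if `data` hashes to `expected_hex`.

    The algorithm is chosen from the digest length; any other length is
    rejected with a ValueError instead of silently passing or failing.
    """
    algo = _ALGO_BY_LENGTH.get(len(expected_hex))
    if algo is None:
        raise ValueError(
            f"unsupported digest length {len(expected_hex)}; "
            "expected 32 (md5) or 64 (sha256) hex characters"
        )
    return hashlib.new(algo, data).hexdigest() == expected_hex.lower()
```

Raising on an unexpected length keeps a truncated or mistyped registry entry from being mistaken for a checksum mismatch.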
scripts/evaluate_on_test.sh required embeddings/svd_512.npz (absent from the repo, regenerable only via OpenAI + the multi-GB archive), so the headline test-split evaluation was not runnable out of the box — even though load_state_dict overwrites the marker embeddings with the checkpoint's, making the SVD file's values unused (only its shape matters). - scripts/predict.py: load the checkpoint before the marker embeddings and add _resolve_marker_embeddings(), which builds a correctly-shaped zeros placeholder from the checkpoint when --svd_embeddings_path is omitted. - evaluate_on_test.sh: the SVD path is now optional (passed only when set), and the default MODEL_CKPT points at the download_model() cache (~/.deepcell/ models) instead of a local dct-final-ckpt path. - test_scripts_predict.py: unit-test the placeholder / delegate / error paths (pure, no archive or network needed). Not run end-to-end here (needs the registration-gated archive in DATA_DIR); a single confirmatory real-archive run is recommended before trusting the numbers. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
deepcell_types.baselines.maps.run does `import click` at module top, so tests/baselines/test_maps_normalization.py (which imports normalize_features from it) raised a ModuleNotFoundError collection error — not a clean skip — on an inference-only install with no [train]/baseline-maps extra. This turned the inference-only CI job red (pre-existing on master). The baselines conftest's hand-maintained collect_ignore list was missing this entry; add a click gate matching the other baseline-test gates. Verified by simulating an inference-only collection (extra-only packages hidden via a meta-path blocker): `pytest --collect-only tests` now exits 0 with the maps-normalization module excluded and no remaining collection errors. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
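The inference-only simulation described in this commit hides optional packages from the import system. A minimal sketch of such a meta-path blocker, assuming the hidden names are top-level packages; `BlockPackages` and `import_blocked` are illustrative names, not the script used for the verification:

```python
import importlib
import importlib.abc
import sys


class BlockPackages(importlib.abc.MetaPathFinder):
    """Make the listed top-level packages look uninstalled."""

    def __init__(self, names):
        self.names = set(names)

    def find_spec(self, fullname, path=None, target=None):
        if fullname.split(".")[0] in self.names:
            raise ModuleNotFoundError(f"No module named {fullname!r}", name=fullname)
        return None  # not blocked: defer to the remaining finders


def import_blocked(module_name, blocked):
    """Return True if importing `module_name` fails while `blocked` is hidden."""
    blocker = BlockPackages(blocked)
    saved = sys.modules.pop(module_name, None)  # force a fresh lookup
    sys.meta_path.insert(0, blocker)
    try:
        importlib.import_module(module_name)
        return False
    except ModuleNotFoundError:
        return True
    finally:
        sys.meta_path.remove(blocker)
        if saved is not None:
            sys.modules[module_name] = saved
```

With a blocker like this installed before collection, a test module that imports a hidden package at module top fails the same way it would on a machine without the extra.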
…w findings Addresses code-review findings on PR #57: - dataset.py: use the same `max(axis=1) == 0` criterion as the training dataloader for all-zero channel masking (the previous `(== 0).all(axis=1)` diverged for negative-valued input, breaking the claimed training parity). - dataset.py: fix the comment's reference to a non-existent test file; the real test is test_channel_padding_is_numerically_inert. - dataset.py: broaden the all-masked error message — channels can now be dropped for being unmatched, duplicate, or all-zero, not only unmatched. - test_canonical_inference.py: pin the opt-in `ct_abstention_k=None` default via a signature check (the behavioural assertions use a uniform input that never trips the IQR fence, so they alone don't catch a default regression). - scripts/predict.py: note in --ct_abstention_k help that the batch CLI deliberately defaults abstention ON (paper reproduction) while the predict() library API defaults it OFF. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
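The difference between the two all-zero criteria only appears once a channel holds a negative value. A small illustration, assuming each row is one channel's flattened pixels (the real PatchDataset layout may differ):

```python
import numpy as np

patch = np.array(
    [
        [0.0, 0.0, 0.0],   # truly empty channel
        [-1.0, 0.0, 0.0],  # negative-valued channel whose max is 0
        [0.0, 2.0, 0.0],   # channel with signal
    ]
)

masked_by_max = patch.max(axis=1) == 0    # training-dataloader criterion
masked_by_all = (patch == 0).all(axis=1)  # previous inference criterion

print(masked_by_max.tolist())  # [True, True, False]
print(masked_by_all.tolist())  # [True, False, False]
```

The second row is masked by the training criterion but kept by the stricter elementwise check, which is the parity gap the commit closes.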
…=None) The tutorial still described abstention as on-by-default (k=0.2) and told users to disable it with k=0, contradicting PR #57's change making it opt-in. Now documents the None default (raw argmax for every cell) and shows k=0.2 to enable the paper's headline operating point. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
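The PR summary describes abstention as an IQR fence on the per-FOV confidence distribution. A minimal sketch of that idea, assuming the fence is the lower Tukey fence `Q1 - k * IQR` and that `None` or `0` disables it; `apply_abstention` is a hypothetical helper, and the quantile method used by the package may differ:

```python
from statistics import quantiles

UNKNOWN = "Unknown"


def apply_abstention(labels, confidences, k=None, min_cells=4):
    """Relabel low-confidence cells of one FOV to the "Unknown" sentinel.

    k=None (or 0) leaves the raw argmax labels untouched, as does a FOV
    with fewer than `min_cells` cells.
    """
    if not k or len(labels) < min_cells:
        return list(labels)
    q1, _, q3 = quantiles(confidences, n=4, method="inclusive")
    fence = q1 - k * (q3 - q1)  # assumed form of the IQR fence
    return [UNKNOWN if c < fence else lab for lab, c in zip(labels, confidences)]


labels = ["T cell", "B cell", "T cell", "NK cell", "T cell"]
confidences = [0.90, 0.91, 0.92, 0.93, 0.20]
print(apply_abstention(labels, confidences))         # raw labels, abstention off
print(apply_abstention(labels, confidences, k=0.2))  # last cell becomes "Unknown"
```

Because the fence is computed per FOV, the same absolute confidence can be kept in one FOV and relabeled in another.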
Release-readiness fixes: README quickstart, abstention opt-in, inference parity, tests
Drop docs/reviews/ (internal multi-agent review + ablation reports kept only for provenance; nothing in code/docs references them) and the dev-only `if __name__ == "__main__"` runner appended to tests/test_v2.py. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The [analysis] extra documented figure scripts (plot_tsne.py, plot_experiment_results.py) that do not exist in the repo, and nothing imports its only deps (seaborn, openpyxl). Remove the extra, its mention in `all`, and the dangling allowlist entries in tests/conftest.py. Retarget a package-data comment to HierarchicalLoss (the real consumer of combined_celltypes.yaml). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- scripts/train.py: docstring said "DANN disabled by default" but --domain_weight defaults to 0.1 (DANN enabled). - Remove references to internal-monorepo files not shipped in the public repo (analysis/ct_abstention_iqr.py, preprocess_for_training.py, analysis.test_split_summary, dct-final-ckpt/). - training/utils.py: BatchData.tissue_idx docstring described a removed 'index 0 = null token' scheme; the code now raises on a missing tissue. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Remove public-but-uncalled API surface (0.1.0 is unreleased, so never shipped):
TissueNetConfig.{get_excluded_ct_indices, get_channel_embedding,
get_celltype_embedding, combined_celltype_mapping, color_mapping, core_tree,
lineage_mapping, validate}, the now-unused yaml import, and
create_dataloader_from_config (plus its dataset re-export and __all__ entry).
DataLoaderConfig is kept (exercised by the test suite).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- splits/fov_split_test_current.json: replace a leaked author-machine zarr_path (/data/xwang3/...) with the $DATA_DIR placeholder used by the other three. - splits/README.md: document fov_split_test_current.json (the actual default headline-eval split) and the prior- vs current-archive fingerprint split. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- scripts/pretrain.py: usage/runtime hints used bare `python pretrain.py` / `python train.py`, which only run from scripts/; align to `python scripts/...` to match the README. - baselines/nimbus/run.py: reword the in-code TODO documenting the centroid scale_factor overlap as a 'Known limitation' note (the limitation is real and documented intentionally; not unfinished work to flag for release). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
All four methods (DCT, MAPS, CellSighter, XGBoost) now use the main
DeepCell-Types sampler by default — sqrt-inverse-frequency with a
1000-count floor — so the baseline-vs-DCT comparison no longer confounds
the method with its class-balancing scheme. Each method's own faithful
sampler stays available as an opt-in ablation.
- samplers.py: factor the DCT weight formula into a shared label-array
helper `compute_sample_weights_dct()`; `compute_sample_weights()` now
delegates to it (byte-identical weights — DCT main path unchanged).
- CellSighter: default `--class_balance` equal -> sqrt (`sqrt` already
maps to the DCT sampler in `create_dataloader`); the faithful
equal-proportion + size_data recipe stays as the `equal` ablation.
- MAPS: add `--class_balance {dct,full_inv_freq,none}`, default `dct`;
`full_inv_freq` is the faithful mahmoodlab/MAPS `n/count` sampler.
- XGBoost (plain + tuned): add `--class_balance {dct,none}`, default
`dct`, applied as a per-row `sample_weight` in `fit()` (the tree analog
of the neural samplers); `none` restores faithful unweighted XGBoost.
- READMEs updated to reflect the new default + the faithful ablations.
Code-only change; baseline numbers must be regenerated by retraining
with the new default before they land in any figure.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…it metadata) Follow-up to the release-cleanup audit on this PR: - scripts/train.py: align the DEFAULT_LOSS_WEIGHTS fallback (domain 0.0 -> 0.1) with the documented "DANN enabled by default" / --domain_weight default. Behavior-neutral for the CLI (the per-run loss_weights dict always overrides "domain" with --domain_weight), fixing only the fallback used by a programmatic forward_one_batch(loss_weights=None) caller. - scripts/evaluate_on_test.sh: drop the surviving internal `dct-final-ckpt/` reference from the header comment; point at the public deepcell_types.baselines path instead. - tests/conftest.py: remove the deleted `[analysis]` extra from the optional- extras comment. - splits/fov_split_test_current.json: correct stale metadata (num_val_fovs 431 -> 129; add num_heldout_fovs: 302) to match the actual val/heldout keys (verified: val=129, bit-identical to fov_split_test.json; heldout=302). Full suite: 358 passed, 1 skipped. ruff clean. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… provenance Two fixes from the PR #60 deep-review. M1 — XGB sample_weight scale. `compute_sample_weights_dct` returns raw sqrt-inverse-frequency weights (mean several×, up to ~47× on floored rare classes). A WeightedRandomSampler is scale-invariant, but XGBoost consumes sample_weight as an absolute per-row multiplier on the summed gradient/hessian per leaf, so the raw weights silently inflate hessian mass and weaken reg_lambda / min_child_weight relative to the unweighted run — confounding class balance with reduced regularization. Add a `normalize` kwarg (default False, preserving the resampling and main-model paths bit-for-bit; verified: test_samplers 24/24 still pass) and opt the three XGB sites (run.py, tuning.py objective + train_best_model) into normalize=True so the dct-vs-none ablation isolates balancing. M3 — provenance. Because all baselines now default to the shared sampler, two prediction CSVs trained under different schemes are byte-schema-identical. `save_baseline_predictions` now writes the active class_balance (and size_data for CellSighter) to a sidecar `*.meta.json` — a sidecar, not a CSV column, so the prediction schema and downstream softmax-column selection are unchanged. Also warn when CellSighter `--size_data` is set under a non-`equal` scheme (silently inert otherwise), and soften the sampler docstring (rare-tail-is- unbalanced note; drop the unverified "236 cells" figure). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
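The shared scheme is described as sqrt-inverse-frequency with a 1000-count floor, with an optional mean-1 normalization for XGBoost. A minimal sketch, assuming the per-sample weight is `1 / sqrt(max(count, floor))`; the absolute scale of the real `compute_sample_weights_dct()` may differ, and `sample_weights_sketch` is a hypothetical name:

```python
from collections import Counter
from math import sqrt


def sample_weights_sketch(labels, floor=1000, normalize=False):
    """Per-sample weights: sqrt-inverse class frequency with a count floor."""
    counts = Counter(labels)
    weights = [1.0 / sqrt(max(counts[y], floor)) for y in labels]
    if normalize:
        # Rescale to mean 1 so an absolute-weight consumer (e.g. XGBoost's
        # sample_weight) sees the same total mass as the unweighted run.
        mean_w = sum(weights) / len(weights)
        weights = [w / mean_w for w in weights]
    return weights
```

Classes below the floor all receive the same weight, which is why the rare tail stays effectively unbalanced. Normalization changes only the scale: a resampling sampler ignores it, while XGBoost does not.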
…defaults PR #60 added `--class_balance` to the maps/xgboost/xgboost-tune commands but did not update the frozen-option snapshot tests, so test_xgboost_*_frozen and test_maps_*_frozen failed with "Extra items in the left set: 'class_balance'" (CI-blocking). Add `class_balance` to XGBOOST_OPTS, XGBOOST_TUNE_OPTS, and MAPS_OPTS, and lock the unified-sampler defaults (xgb/xgboost-tune/maps -> dct, cellsighter -> sqrt) via default-value assertions so a silent default flip is caught. Verified: 9/9 frozen-option tests pass. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…servative claim From the PR #60 deep-review docs findings: - Note that CellSighter's `--class_balance sqrt` and MAPS/XGB's `dct` select the identical shared scheme (the equivalence was only in a code docstring, so a user scripting "same sampler everywhere" hit click Invalid value). - Document that the 1000-count floor leaves the rare tail effectively unbalanced (the scheme mainly rebalances the head), so "balanced" is not overstated for the region macro-F1 weights most. - Hedge the XGB README's "had made the XGBoost rare-class macro number conservative" — an unmeasured causal claim — to "we expect ... (direction not yet measured)", and document the new mean-1 sample_weight normalization. - Replace bare "WeightedRandomSampler" with a note that the main model wraps it as FOVGroupedSampler (identical draw distribution). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
create_dataloader_from_config (its only functional consumer) was removed earlier in this PR, leaving DataLoaderConfig as an exported-but-inert dataclass with no ergonomic call path. Remove it for a clean release surface; the full keyword API of create_dataloader remains the single way to configure a loader. - dataloader.py: drop the class and its now-orphaned `dataclass`/`typing` imports; update the module docstring. - dataset.py: drop the back-compat re-export and the `__all__` entry. - test_training_import_order.py: drop the DataLoaderConfig import assertions. Full suite: 358 passed, 1 skipped. ruff clean. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…mpler feat(baselines): unify all baselines on the DCT sampler by default
Rebased onto master (which now has #60's --class_balance / DCT-sampler default). Adds an opt-in --val_split_file to MAPS, CellSighter, XGBoost (plain + tuned) so each trains on the FULL --split_file train and selects its checkpoint / early-stop / Optuna-trial on an EXTERNAL validation set — the 'val' FOVs of --val_split_file, capped to 200k cells at seed 42 (mirroring dataloader.py:269-273 max_val_samples), scored with the canonical hierarchical ct_macro_f1 (metrics.py:399-419). The reported set stays --split_file 'val'. Legacy inner-val carve is unchanged when the flag is absent. This matches how the main DCT model selects (val_macro_f1 on the canonical 302-FOV validation, 200k cap), making baseline model-selection consistent with DCT instead of each baseline self-carving a different 10% inner-val. Combined-state fix (needed once #58's eval_set_external path meets #60's sample_weight): thread class_balance through xgb/tuning.py _run_canonical_val_tuning -> run_tuning / train_best_model, and apply compute_sample_weights_dct in the eval_set_external branch — otherwise the tuned-XGB canonical run would tune with DCT weights but ship an unweighted final model. Also adds --features_cache to xgboost-tune for cache reuse, and lists val_split_file in the frozen CLI option snapshots. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
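The 200k-cell cap at seed 42 amounts to a deterministic subsample of the validation cells. A rough sketch using the standard-library RNG; the repository code referenced above may use a different generator, so the selected indices would not match, and `cap_validation_indices` is a hypothetical name:

```python
import random


def cap_validation_indices(n_cells, max_val_samples=200_000, seed=42):
    """Return a reproducible subset of cell indices no larger than the cap."""
    if n_cells <= max_val_samples:
        return list(range(n_cells))
    rng = random.Random(seed)  # local RNG: does not disturb global state
    return sorted(rng.sample(range(n_cells), max_val_samples))
```

Fixing the seed means every baseline scores its candidates on the same capped validation cells, so selection metrics stay comparable across methods.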
…selection feat(baselines): --val_split_file canonical external-val selection (consistent with DCT)
chore: release-readiness cleanup (cruft, stale refs, dead API, split docs)
Summary
Merges the separate training repository (`deepcelltypes-cell-type-assignment-pytorch`) into this repo and replaces the legacy `CellTypeCLIPModel` inference path with the current canonical model. This is the `v0.1.0` release cut.

Before this PR, `vanvalenlab/deepcell-types` was inference-only: it shipped `CellTypeCLIPModel`, the `dct_kit/` helpers, and a top-level `__init__` that exported just `predict`. After it, a single package covers training and inference: inference stays a plain `pip install deepcell-types`, the full training pipeline lives behind a `[train]` extra, and the four paper comparison baselines are vendored behind per-baseline extras.

Canonical model
- `model.py` is rewritten around `CellTypeAnnotator`; `CellTypeCLIPModel` / `CellTypeDataEncoder` are removed. Canonical training defaults (`scripts/train.py`, click-based CLI): `--resnet_channels 48`, `--domain_weight 0.1`, `--best_metric macro_f1`.
- … into a marker-position vector and injected as a CLS residual. The output projection is zero-init, so warm-starting from a checkpoint preserves predictions at step 0.
- … (`--domain_weight 0.1`; `0` disables it).
- `--freeze_backbone` trains only the mean-intensity branches on top of an existing checkpoint; `--unfreeze_ct_head` additionally co-adapts the CT head / CLS token / final norm without unfreezing the transformer backbone.
- … `masked_fill`) through the channel encoder, fusion, and mean-intensity paths so masked tokens contribute exactly zero rather than leaking `bias`/`spatial_feat` into the transformer.
- `scripts/train.py` bundles `ct2idx`, `n_heads`, and `compat_marker0_zero` into the checkpoint, and inference asserts the vocabulary ordering matches (a permuted vocabulary previously passed the count-only check and silently mislabeled cells).
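
The vocabulary-ordering assertion can be sketched as a comparison of the name-to-index maps rather than of their sizes. This is an illustration only; `check_vocab` is a hypothetical name and the package's own check may be structured differently:

```python
def check_vocab(checkpoint_ct2idx, runtime_ct2idx):
    """Raise if the runtime vocabulary does not match the checkpoint's.

    A count-only check accepts any permutation; comparing each name's
    index catches the permuted case as well.
    """
    if len(checkpoint_ct2idx) != len(runtime_ct2idx):
        raise ValueError(
            f"vocabulary size mismatch: checkpoint has {len(checkpoint_ct2idx)}, "
            f"runtime has {len(runtime_ct2idx)}"
        )
    moved = sorted(
        name
        for name, idx in checkpoint_ct2idx.items()
        if runtime_ct2idx.get(name) != idx
    )
    if moved:
        raise ValueError(f"vocabulary ordering mismatch for: {moved}")


ckpt = {"B cell": 0, "T cell": 1, "NK cell": 2}
permuted = {"T cell": 0, "B cell": 1, "NK cell": 2}  # same size, wrong order
check_vocab(ckpt, dict(ckpt))  # passes silently
```

Calling `check_vocab(ckpt, permuted)` raises, whereas a size-only comparison would accept it and every "B cell" prediction would be reported as "T cell".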

Canonical-only inference
- … packaged `vocab.json` snapshot, so `pip install deepcell-types` + `download_model()` is enough to run `predict()`; the multi-GB TissueNet zarr archive is no longer required (pass `zarr_path=` / set `DEEPCELL_TYPES_ZARR_PATH` only if you need it). Verified identical predictions with vs. without the archive on the paper checkpoint.
- … `ct_abstention_k=0.2`), bucketed per-FOV everywhere (CLI, Python API, library): cells below an IQR fence on the FOV confidence distribution are relabeled to the `"Unknown"` sentinel (skipped when `k` is disabled or the FOV has <4 cells).
- `predict(..., preprocess=...)` overrides the per-FOV normalization without retraining, backed by a bounded op library (`apply_config`, `make_preprocessor`, `DEFAULT_CONFIG`) and a composition-guided adaptation loop (`skills/preproc-adapt/`).
- … `DCTConfig.PERCENTILE_THRESHOLD`) is now `99.9`, matching the recipe the training archive was built with (was `99.0`, a carryover from the original packaging).
- `predict(return_probabilities=True)` returns a `PredictionResult` dataclass with the full per-cell softmax matrix, cell indices, and the pre-abstention argmax labels (`cell_types_raw`).
- `_torch_load_weights` loads with `weights_only=True` and emits a loud warning if it has to fall back to unsafe pickle on an older torch; a missing checkpoint raises a clear `FileNotFoundError` pointing at `download_model()`.

New public API
- `predict`, `DCTConfig`, `PredictionResult`, `preprocess_fov`, `apply_config`, `make_preprocessor`, and `DEFAULT_CONFIG` are importable from `deepcell_types` directly.
- `preprocess_fov(raw, mask, native_mpp, channel_names) → PreprocessedFov` is the standalone preprocessing entry point.

Monorepo: training pipeline
- `deepcell_types.training` ships from this repo behind `pip install "deepcell-types[train]"`: `config.py`, `dataset.py`, `archive.py`, `annotations.py`, `baseline_features.py`, `gold_metadata.py`, `losses.py`, `metrics.py`, `patch.py`, `utils.py`, `abstention.py`.
- `scripts/`: `train.py`, `pretrain.py`, `predict.py`, `generate_openai_embeddings.py`, `generate_splits.py`, `split_val_for_test.py`, plus the release-archive gate (`validate_archive_contract.py`, `check_release_archive.sh`).
- … `splits/` (`fov_split{,_valsubset,_test}.json` + README), so the published train/val/test partition is reproducible from the repo.
- … anywhere (`--enable_wandb` is gone; confusion matrices save locally as PNGs).
- `zarr>=3.1` pulls the Python floor up to 3.11 for the train extra.

Baselines
- … `deepcell_types/baselines/` (`cellsighter`, `maps`, `nimbus`, `xgb`), invoked through the unified runner `python -m deepcell_types.baselines <name>`, each with a self-contained install extra (`baseline-cellsighter`, `baseline-maps`, `baseline-nimbus`, `baseline-xgboost`).
- … source; third-party licenses are tracked in `deepcell_types/baselines/NOTICE`.
- `extract_features_from_zarr(missing_value=...)` lets each baseline choose its absent-marker sentinel: MAPS / CellSighter keep `0.0`; XGBoost can pass `np.nan` so absent markers route through XGBoost's learned `missing` direction instead of being conflated with "present, intensity 0.0". The feature matrix records a `present_markers` mask and the cache stays missing-value-agnostic.

Breaking changes
- `CellTypeCLIPModel` removed. No shim; use `from deepcell_types import predict, DCTConfig`.
- `predict()` arguments after `mpp` are keyword-only, preventing accidental transposition of the adjacent string arguments. `device=` is the preferred spelling (`device_num=` remains a deprecated alias).
- `predict(num_workers=...)` default is now `0` (was `24`); 24 workers OOM'd machines with <64 GB RAM.
- … of prior releases; pass `ct_abstention_k=0` to recover raw argmax.
- … `99.0 → 99.9` shifts ~5% of predicted labels; on a held-out test-split sample it reproduces the canonical predictions slightly better (92.5% vs 91.9% argmax agreement).
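
The keyword-only and deprecated-alias changes follow a common Python pattern. A toy sketch of the calling convention only; `predict_sketch` is a stand-in, and its parameter names other than `mpp`, `device`, `device_num`, and `num_workers` are assumptions rather than the real `predict()` signature:

```python
import warnings


def predict_sketch(raw, mask, channel_names, mpp, *, model_name=None,
                   device=None, device_num=None, num_workers=0):
    """Everything after `mpp` must be passed by keyword (note the bare `*`)."""
    if device_num is not None:
        warnings.warn(
            "device_num= is deprecated; use device= instead",
            DeprecationWarning,
            stacklevel=2,
        )
        if device is None:
            device = device_num
    # Placeholder return: a real implementation would run inference here.
    return {"model_name": model_name, "device": device, "num_workers": num_workers}
```

A call such as `predict_sketch(raw, mask, names, 0.5, "some-model")` now fails with a `TypeError` instead of silently binding the string to the wrong parameter, while `device_num=0` keeps working with a deprecation warning.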

Packaging / infra
- … `vocab.json`, `channel_mapping.yaml`, and `training/config/*.yaml` (incl. `combined_celltypes.yaml`), which were previously outside the package tree and absent after `pip install`.
- `tifffile` declared in the `[train]` extra.
- … `.github/workflows/ci.yml`); inference vs. `[train]` test boundary enforced.
- `LICENSE` text matches the OSI Apache 2.0 text exactly (LIC: Revert licence text to exactly match OSI Apache 2 #42); `NOTICE` aligned to the vanvalenlab convention.

Tests
35 test modules under `tests/` (plus `tests/baselines/`) covering canonical inference, abstention CLI, checkpoint round-trip, dataset/split/sampler behavior, preprocessing + the preprocess hook, losses, hierarchical eval, archive-contract validation, baseline feature splits, and vendored-baseline equivalence against upstream.

See `CHANGELOG.md` for the full `0.1.0` entry and migration notes.