Update keras, tf and new model usage, numpy 2.0 updates by JGSweets · Pull Request #1206 · capitalone/DataProfiler

JGSweets · 2026-03-13T22:13:54Z

This PR:

Allows Keras versions > 3.4.
Allows the most recent TF releases, using metal instead of macos.
Updates numpy for v2+.

NOTES:

Built off of "refactor: move from deprecated pkg_resources" (#1202); this will need a rebase once that merges.

JGSweets · 2026-05-06T19:56:44Z

        :param fn: Plugin function
        :return: function
        """
-        global plugins_dict


global is only needed when a function rebinds a module variable, like plugins_dict = {...}.
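
The distinction can be shown with a minimal sketch. The names `register` and `reset` are hypothetical, and the dict here only stands in for the real `plugins_dict`; this is not DataProfiler's code.

```python
# Hypothetical module-level registry, standing in for plugins_dict.
plugins_dict = {}


def register(name, fn):
    """Mutate the module-level dict in place; no `global` is needed."""
    plugins_dict[name] = fn
    return fn


def reset():
    """Rebind the module-level name; this is the case that needs `global`."""
    global plugins_dict
    plugins_dict = {}


register("upper", str.upper)
size_after_register = len(plugins_dict)
reset()
size_after_reset = len(plugins_dict)
```

Adding a key only mutates the existing object, so Python resolves `plugins_dict` through the normal name lookup. Only the assignment in `reset` would otherwise create a new local variable.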

JGSweets · 2026-05-06T20:01:45Z

@shania-m this is the final PR that will update TF and keras to be current!

JGSweets · 2026-05-11T18:08:44Z

@shania-m I've updated numpy to allow for 2.0+ as well here!

shania-m · 2026-05-11T19:03:33Z

@JGSweets please let me know when it’s ready for review

JGSweets · 2026-05-11T20:21:24Z

@shania-m fixed all the related issues. Updating mypy was more complicated than expected when trying to handle the numpy update!

JGSweets · 2026-05-20T13:51:09Z

@shania-m it is ready for review!

shania-m · 2026-05-22T15:29:02Z

+        """Compiles the loss for the given model and number of labels."""
+        # Compile the model
+        softmax_output_layer_name = model.output_names[0]
+        # losses = {softmax_output_layer_name: "categorical_crossentropy"}


please remove

shania-m · 2026-05-22T15:32:33Z

+        # Compile the model
+        softmax_output_layer_name = model.output_names[0]
+        # losses = {softmax_output_layer_name: "categorical_crossentropy"}
+        losses = ["categorical_crossentropy", None, None]


Quick question — the loss assignment changed from dict-based (by output name) to list-based (by position):

CharacterLevelCnnModel (3 outputs)

losses = ["categorical_crossentropy", None, None]

CharLoadTFModel (2 outputs)

losses = ["categorical_crossentropy", None]

Can you confirm the output ordering is stable and these align correctly with the model outputs? Just want to make sure the positional assignment matches up since the dict approach was order-independent.

Good callout! I believe the previous layers were list-based, which is why Keras 3 required list-based losses. This code matched that. However, like you, I prefer the order-independent approach, so I am looking into what it requires and how to keep backwards compatibility when loading a model that was originally list-based.
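
The ordering concern can be sketched without Keras. The helper `losses_by_name` and the output names other than `softmax_output` are hypothetical, chosen only to illustrate why a positional list depends on output order while a name-keyed dict does not.

```python
def losses_by_name(output_names, positional_losses):
    """Pair each positional loss with the output name at the same index."""
    if len(output_names) != len(positional_losses):
        raise ValueError("need exactly one loss entry per model output")
    return dict(zip(output_names, positional_losses))


# Assumed output order for a three-output model; only the first has a loss.
# "argmax_output" and "confidence_output" are made-up names.
names = ["softmax_output", "argmax_output", "confidence_output"]
positional = ["categorical_crossentropy", None, None]

by_name = losses_by_name(names, positional)

# If the output order ever changed, the same positional list would attach
# the loss to a different output.
reordered = list(reversed(names))
misaligned = losses_by_name(reordered, positional)
```

With the dict form, the loss stays attached to `softmax_output` no matter where that output sits in the list.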

shania-m · 2026-05-22T15:33:05Z

Few small things, thanks for the contribution!

JGSweets · 2026-05-22T16:58:43Z


+    def _compile_model(self, num_labels: int) -> None:
+        """Compile the model with dict-based losses and metrics."""
+        losses = {


ensure we utilize dict based solution

socket-security · 2026-05-22T16:59:11Z

Review the following changes in direct dependencies. Learn more about Socket for GitHub.

Package	Supply Chain Security	Vulnerability
numpy@1.26.4 ⏵ 2.4.6	⁺¹
memray@1.11.0 ⏵ 1.19.3	^-20	⁺¹
keras@3.4.0 ⏵ 3.14.1		⁺⁷⁵
pre-commit@2.19.0 ⏵ 4.3.0	^-1

View full report

JGSweets · 2026-05-22T16:59:13Z

+        cls, softmax_output: tf.Tensor, argmax_output: tf.Tensor | None = None
+    ) -> dict[str, tf.Tensor]:
+        """Return normalized dict outputs for training and inference."""
+        if argmax_output is None:


ensure normalized dict based model outputs

socket-security · 2026-05-22T16:59:13Z

Warning

Review the following alerts detected in dependencies.

According to your organization's Security Policy, it is recommended to resolve "Warn" alerts. Learn more about Socket for GitHub.

Action Severity Alert
Warn
License policy violation: pypi `numpy` under FSFAP
License: FSFAP - The applicable license policy does not permit this license (5) (numpy-2.4.6/vendored-meson/meson/test cases/frameworks/6 gettext/data3/metainfo.its)

From: requirements.txt → `pypi/numpy@2.4.6`

ℹ Read more on: This package | This alert | What is a license policy violation?
Next steps: Take a moment to review the security alert above. Review the linked package source code to understand the potential risk. Ensure the package is not malicious before proceeding. If you're unsure how to proceed, reach out to your security team or ask the Socket team for help at `support@socket.dev`.

Suggestion: Find a package that does not violate your license policy or adjust your policy to allow this package's license.
Mark the package as acceptable risk. To ignore this alert only in this pull request, reply with the comment `@SocketSecurity ignore pypi/numpy@2.4.6`. You can also ignore all packages with `@SocketSecurity ignore-all`. To ignore an alert for all future pull requests, use Socket's Dashboard to change the triage state of this alert.

View full report

JGSweets · 2026-05-22T16:59:40Z

+    @classmethod
+    def _normalize_model_outputs(cls, model: tf.keras.Model) -> tf.keras.Model:
+        """Convert list-style outputs to the normalized dict structure."""
+        return labeler_utils.normalize_tf_model_outputs(


Converts the previous list style, for the consistency that Keras 3 requires.

JGSweets · 2026-05-22T17:06:04Z


    # boolean if the label mapping requires the mapping for index 0 reserved
    requires_zero_mapping: bool = True
+    _SOFTMAX_OUTPUT = "softmax_output"


normalize layer names

JGSweets · 2026-05-22T17:06:12Z


    # boolean if the label mapping requires the mapping for index 0 reserved
    requires_zero_mapping = False
+    _SOFTMAX_OUTPUT = "softmax_output"


normalize layer names

JGSweets · 2026-05-22T17:07:34Z

-                num_labels, activation="softmax", name="softmax_output"
+                num_labels,
+                activation="softmax",
+                name=self._new_softmax_head_name(),


allows iteration on layer name due to keras reqs

JGSweets · 2026-05-22T17:09:36Z

+            acc_value = next(
+                (value for key, value in model_results.items() if key.endswith("acc")),
+                np.nan,
+            )
+            f1_value = next(
+                (value for key, value in model_results.items() if "f1" in key.lower()),
+                np.nan,
+            )


due to dict based output
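
A minimal standalone sketch of this lookup pattern follows. The helper `pick_metric` and the sample keys are hypothetical, and `math.nan` is used in place of `np.nan` only to keep the example free of dependencies.

```python
import math


def pick_metric(results, matches):
    """Return the first value whose key satisfies `matches`, else NaN."""
    return next(
        (value for key, value in results.items() if matches(key)),
        math.nan,
    )


# Hypothetical evaluation results; with dict-based outputs the metric keys
# are assumed to carry an output-name prefix, so exact key names may vary.
model_results = {"loss": 0.42, "softmax_output_acc": 0.91}

acc_value = pick_metric(model_results, lambda key: key.endswith("acc"))
f1_value = pick_metric(model_results, lambda key: "f1" in key.lower())
```

Matching on a suffix or substring avoids hard-coding a prefixed key, and the `next(..., default)` form falls back to NaN when no metric matches instead of raising.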

JGSweets · 2026-05-22T17:10:51Z

        BaseModel.__init__(self, label_mapping, parameters)

+    @classmethod
+    def _create_model_outputs(


similar to char_load_tf_model.py but with the threshargmax

JGSweets · 2026-05-22T17:11:16Z

+            acc_value = next(
+                (value for key, value in model_results.items() if key.endswith("acc")),
+                np.nan,
+            )
+            f1_value = next(
+                (value for key, value in model_results.items() if "f1" in key.lower()),
+                np.nan,
+            )


due to dict based change

JGSweets · 2026-05-22T17:11:56Z

    return None


+def normalize_tf_model_outputs(


this allows us backwards compatibility with the list based models.
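
The idea can be sketched roughly as below. This is not the body of `labeler_utils.normalize_tf_model_outputs`, which is not shown in this thread; `normalize_outputs` and the second output name are hypothetical, and strings stand in for tensors.

```python
def normalize_outputs(outputs, names):
    """Return outputs as a dict keyed by name, accepting list or dict input."""
    if isinstance(outputs, dict):
        return dict(outputs)  # already in the normalized structure
    if len(outputs) > len(names):
        raise ValueError("more outputs than known output names")
    # Older list-style outputs are matched to names by position.
    return dict(zip(names, outputs))


# "argmax_output" is a made-up name for illustration.
names = ("softmax_output", "argmax_output")

old_style = normalize_outputs(["probs", "labels"], names)
new_style = normalize_outputs({"softmax_output": "probs"}, names)
```

Both the old list form and the new dict form end up in one structure, so downstream code only has to handle name-keyed outputs.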

JGSweets · 2026-05-22T17:27:54Z

@shania-m sorry to add so much more, but this should be safer since it has the dict mapping back!

shania-m · 2026-05-22T17:36:35Z

Thanks for the contributions!
Some non blocking recommendations:

1. Add an upper bound to numpy: numpy>=1.22.0,<3.0.0 instead of numpy>=1.0.0, to prevent future breakage from numpy 3.
2. Add an upper bound to keras: keras>3.4.0,<4.0.0, to protect against future Keras major versions.
3. Add a test for loading old-format models: verify that _normalize_model_outputs correctly handles models saved with the previous list-style output format.
4. Track numpy private API usage: add a code comment noting that _histograms_impl is a private module, linking to any numpy discussion about a public replacement.
5. Update CHANGELOG: document the numpy 2.0 and keras >3.4 support as a notable change.
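
To make the effect of the suggested bounds concrete, here is a small sketch that checks a few versions against them. It compares plain `X.Y.Z` release tuples only, which is a simplification of the real version-specifier rules (no pre-releases or local versions), and the helper names are hypothetical.

```python
def parse(version):
    """Turn 'X.Y.Z' into a tuple of ints (release segments only)."""
    return tuple(int(part) for part in version.split("."))


def numpy_allowed(version):
    """Mirror of the suggested bound numpy>=1.22.0,<3.0.0."""
    return parse("1.22.0") <= parse(version) < parse("3.0.0")


def keras_allowed(version):
    """Mirror of the suggested bound keras>3.4.0,<4.0.0."""
    return parse("3.4.0") < parse(version) < parse("4.0.0")


# Versions taken from the dependency report in this thread.
numpy_old_ok = numpy_allowed("1.26.4")
numpy_new_ok = numpy_allowed("2.4.6")
keras_old_ok = keras_allowed("3.4.0")
keras_new_ok = keras_allowed("3.14.1")
```

Under these bounds both numpy versions from the report pass, while keras 3.4.0 itself is excluded because the lower bound is strict.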

JGSweets · 2026-05-22T19:45:02Z

1. Updated.
2. Updated.
3. Added unit tests to address this.
4. Added a comment.
5. Added a CHANGELOG.

Thanks!

shania-m · 2026-05-22T19:55:02Z

Warning

Review the following alerts detected in dependencies.

According to your organization's Security Policy, it is recommended to resolve "Warn" alerts. Learn more about Socket for GitHub.
Action Severity Alert (click "▶" to expand/collapse)
Warn
License policy violation: pypi numpy under FSFAP
License: FSFAP - The applicable license policy does not permit this license (5) (numpy-2.4.6/vendored-meson/meson/test cases/frameworks/6 gettext/data3/metainfo.its)

From: requirements.txt → pypi/numpy@2.4.6

ℹ Read more on: This package | This alert | What is a license policy violation?
Next steps: Take a moment to review the security alert above. Review
the linked package source code to understand the potential risk. Ensure the
package is not malicious before proceeding. If you're unsure how to proceed,
reach out to your security team or ask the Socket team for help at
support@socket.dev.

Suggestion: Find a package that does not violate your license policy or adjust your policy to allow this package's license.
_Mark the package as acceptable risk_. To ignore this alert only
in this pull request, reply with the comment
`@SocketSecurity ignore pypi/numpy@2.4.6`. You can
also ignore all packages with `@SocketSecurity ignore-all`.
To ignore an alert for all future pull requests, use Socket's Dashboard to
change the [triage state of this alert](https://socket.dev/dashboard/org/CapitalOne/diff-scan/fb93fa10-16b6-463d-98b5-4ecd9aace5bf/alert/QlzpKL-e6SclipJA2kKiS3Hc2nht0YycgksTG8-rSmHk).
View full report

@JGSweets i need to review these before approving

JGSweets · 2026-05-22T20:32:47Z

@shania-m of course! ty for working with me on this!

JGSweets · 2026-05-22T20:40:18Z

After this goes in, what would be the steps needed to make a release? I assume I cannot be part of that, but if so I'm happy to help achieve that as well!

JGSweets · 2026-05-28T16:23:14Z

I realized it might also be good to add py3.12 / py3.13 to the test list, especially since py3.10 reaches EOL this fall.

Will do that in a subsequent PR.

JGSweets requested a review from a team as a code owner March 13, 2026 22:13

JGSweets added 13 commits May 6, 2026 14:21

refactor: move from deprecated pkg_resources

d6b1ec1

fix: to use func

9920fab

fix: add missing change

7592895

refactor: resources to be in package

a3592fb

fix: tests bc of almost

3ecaf6b

feat: refactor to pass in a path or string or None

0a2efd3

fix: import for older versions

fb321a2

fix: Tranversable must be done at runtime

2207920

refactor: keras reqs and others

96344db

refactor: losses for keras and tests

0458e73

fix: remove unneeded global

2fe4ddd

fix: accidentally duplicated test on rebase

c41303e

fix: rebase duplicates

0615268

JGSweets force-pushed the update-keras branch from bf7f29c to 0615268 Compare May 6, 2026 19:38

fix: keras reqs

f08af16

JGSweets commented May 6, 2026

refactor: update to be more than 3.4.0 for keras

e5f4041

JGSweets changed the title from "[WIP] Update keras, tf and new model usage" to "Update keras, tf and new model usage" May 6, 2026

JGSweets changed the title from "Update keras, tf and new model usage" to "Update keras, tf and new model usage, numpy 2.0 updates" May 8, 2026

refactor: numpy2 and mypy

052d058

JGSweets added 2 commits May 11, 2026 13:29

fix: mypy 3.10

3965667

fix: bugs

f1046a9

JGSweets added 3 commits May 11, 2026 14:05

fix: float

8f1b4e0

refactor: for hist fix too

fdc671e

fix: issue with none in hist

34c47fe

shania-m requested changes May 22, 2026

shania-m reviewed May 22, 2026

JGSweets added 2 commits May 22, 2026 11:04

fix: remove comment

57066fb

refactor: to still utilize dict mapping for losses

5de7abe

JGSweets commented May 22, 2026

fix: int pre-commit

e1afcf7

JGSweets commented May 22, 2026

fix: train labeling

0b00aed

JGSweets added 3 commits May 22, 2026 12:48

refactor notes, reqs, and change log

8edd1dc

fix: pre-commit

03b4fa1

refactor: add unit tests validating usage of the old load format

ffbac1a
