
add drop parameter to OneHotEncoder#934

Open
karen-elisha wants to merge 1 commit into feature-engine:main from karen-elisha:onehot-drop-parameter

Is your feature request related to a problem?

When drop_last=True, OneHotEncoder always drops the last category (by insertion order), so users cannot control which category is used as the reference group. In many modeling scenarios (e.g., logistic regression), the choice of reference category matters because the coefficients of the remaining dummy variables are interpreted relative to it.
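To illustrate the problem, here is a stdlib-only sketch of the current behavior, assuming "insertion order" means the order in which categories first appear in the training data. `legacy_drop_last` is a hypothetical helper, not feature-engine code. Reordering the same rows changes which category becomes the reference:

```python
def legacy_drop_last(values):
    """Hypothetical mimic of drop_last=True: keep categories in order of
    first appearance and treat the last one as the dropped reference."""
    # dict.fromkeys preserves first-appearance order (Python 3.7+)
    categories = list(dict.fromkeys(values))
    return categories[-1]


colors = ["red", "blue", "green", "blue"]
print(legacy_drop_last(colors))                  # green
print(legacy_drop_last(list(reversed(colors))))  # red
```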

What does this PR do?

Adds a new drop parameter to OneHotEncoder that lets users choose which category to drop:

  • drop="last" — drops the last category alphabetically
  • drop="first" — drops the first category alphabetically
  • drop="most_frequent" — drops the most frequent category found during fit()
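The three options can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the code in this PR: `choose_dropped_category` and `one_hot_with_drop` are hypothetical helpers, categories are assumed to be sortable strings, and output columns follow an assumed `<variable>_<category>` naming pattern. Ties for "most_frequent" are broken alphabetically to match the edge-case rule in this PR, but without the warning.

```python
from collections import Counter


def choose_dropped_category(values, drop):
    """Hypothetical helper: return the category to drop for a given option.

    "first"/"last" use alphabetical order; "most_frequent" counts
    occurrences in the data passed in (i.e. the data seen during fit).
    """
    categories = sorted(set(values))
    if drop == "first":
        return categories[0]
    if drop == "last":
        return categories[-1]
    if drop == "most_frequent":
        counts = Counter(values)
        # max() returns the first maximal item it meets, so iterating over
        # the sorted categories breaks ties alphabetically.
        return max(categories, key=counts.__getitem__)
    raise ValueError(f"unsupported drop option: {drop!r}")


def one_hot_with_drop(variable, values, drop):
    """Hypothetical k-1 encoding: {column_name: [0/1, ...]} without the
    dropped (reference) category."""
    dropped = choose_dropped_category(values, drop)
    kept = [c for c in sorted(set(values)) if c != dropped]
    return {f"{variable}_{c}": [int(v == c) for v in values] for c in kept}


colors = ["red", "blue", "red", "green", "red"]
print(choose_dropped_category(colors, "first"))          # blue
print(choose_dropped_category(colors, "last"))           # red
print(choose_dropped_category(colors, "most_frequent"))  # red
print(one_hot_with_drop("color", colors, "first"))
```

Note that "last" and "most_frequent" happen to coincide in this toy data; they are independent rules in general.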

Backward compatibility

  • The existing drop_last parameter continues to work as before.
  • If both drop_last and drop are set, a FutureWarning is raised and drop takes precedence.
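One way to express this precedence rule is sketched below. It assumes "set" means drop_last=True together with a non-None drop value; `resolve_drop` and the "legacy_last" return value are hypothetical names for illustration, not part of the PR or feature-engine.

```python
import warnings


def resolve_drop(drop_last=False, drop=None):
    """Hypothetical precedence rule: the new `drop` wins over `drop_last`.

    Returns the effective setting: the `drop` value when given,
    "legacy_last" when only drop_last=True (old behavior), else None
    (keep all k dummies).
    """
    if drop is not None:
        if drop_last:
            # FutureWarning as described in the PR; message text is assumed
            warnings.warn(
                "Both drop_last and drop were set; drop takes precedence.",
                FutureWarning,
            )
        return drop
    return "legacy_last" if drop_last else None


print(resolve_drop(drop_last=True))  # legacy_last
print(resolve_drop(drop="first"))    # first
```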

Edge case handling

  • If drop="most_frequent" and multiple categories are tied for the highest frequency, a UserWarning is raised and the first category alphabetically among the tied ones is dropped.
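The tie-breaking rule can be sketched with `collections.Counter` and the standard `warnings` module. `most_frequent_to_drop` is a hypothetical helper and the warning text is assumed; only the behavior (UserWarning, then alphabetical tie-break) comes from the description.

```python
import warnings
from collections import Counter


def most_frequent_to_drop(values):
    """Hypothetical helper: return the most frequent category, warning and
    picking the alphabetically first one when several are tied."""
    counts = Counter(values)
    top = max(counts.values())
    tied = sorted(c for c, n in counts.items() if n == top)
    if len(tied) > 1:
        warnings.warn(
            f"Categories {tied} are tied for the highest frequency; "
            f"dropping {tied[0]!r}.",
            UserWarning,
        )
    return tied[0]


print(most_frequent_to_drop(["b", "b", "a"]))            # b (no warning)
print(most_frequent_to_drop(["b", "a", "b", "a", "c"]))  # a (with UserWarning)
```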

Files changed

  • feature_engine/encoding/one_hot.py — added drop parameter, validation, and fit logic
  • tests/test_encoding/test_onehot_encoder.py — added 7 new tests covering all drop options

Tests

All 39 tests pass.
