Fix dynamic top-k index scores#98
Conversation
- Calculate theta_k for train only, and reuse for val and test. - Catch edge cases max = min leading to zero div - Printing optional, off by default - Elbow method removes nans and skips leading zeros for tighter distribution. - Update default concept caption v
|
All index scores should now be bound between -1 and 1. |
gabrieletijunaityte
left a comment
There was a problem hiding this comment.
Looks great! Thank you for the changes.
|
Do you think we could cache these elbow values as the data split is consistent? Or is the computation very quick? |
|
@gabrieletijunaityte thanks! What do you mean by cached here? The values are only computed for the train split; then cached (in the dictionary) and retrieved for the other splits. See src/models/text_alignment_model.py line 134. |
|
Should be OK now @gabrieletijunaityte ! |
- Calculate thresholds on train set. - Option (default True) to use stored thresholds if available. - Option (default True) to save newly calculated values. - Option (default False) to also compute new thresholds if old thresholds are available for some concepts.
v2 (old) has hand picked theta values. v3 uses the values from v2 but adds baseline values. v4 has newly calculated thresholds (using elbow).
|
@gabrieletijunaityte could you please review again? I made the following changes:
I think that's all! Needed more features than I initially thought but all works well now. The only thing that I haven't implemented is to look for uni- vs bimodal distributions and add both max and min accordingly. Not sure if it's worth it, but it's possible .. |
- Save mode - Save average index
|
@gabrieletijunaityte could you please review again? I have now:
|
gabrieletijunaityte
left a comment
There was a problem hiding this comment.
Looks good, thanks for all the iterations!
What does this PR do?
Before submitting
pytestcommand?