```python
gloss += dloss_alpha * dl
```
(MoveSim/code/main.py, line 165 in 93e6837)
Is the current implementation of the mobility regularity-aware loss correct? At the moment, both $L_d$ and $L_p$ are added directly to the loss used to compute the policy gradient, but they cannot produce any gradient with respect to the input, since the generated sequences are discrete values.
I think the correct way is to add them to the reward instead. Is that right?
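A minimal sketch of the reward-based alternative, assuming a single-step categorical policy trained with REINFORCE (all names here are illustrative, not taken from the MoveSim code): folding a regularity penalty into the scalar reward lets it scale the score-function gradient $\nabla \log \pi(a)$, whereas adding it to a loss computed on a discrete sample contributes no gradient at all.

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def reinforce_step(logits, reward_fn, penalty_fn, beta=1.0, lr=0.1, rng=None):
    """One REINFORCE update for a single categorical "token" policy (illustrative).

    The regularity penalty is folded into the scalar reward rather than
    added to the loss: the discrete sample blocks backpropagation, but a
    scalar reward still scales the score-function gradient grad log pi(a).
    """
    probs = softmax(logits)
    a = int(rng.choice(len(probs), p=probs))  # discrete sample: non-differentiable
    r = reward_fn(a) - beta * penalty_fn(a)   # regularity term enters via the reward
    grad_log_pi = -probs                      # d log pi(a) / d logits = onehot(a) - probs
    grad_log_pi[a] += 1.0
    return logits + lr * r * grad_log_pi, a, r
```

With a penalty placed on one token, repeated updates push probability mass away from it, which is the behavior one would expect from $L_d$ and $L_p$ acting through the reward rather than through the loss.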