Interaction of padding and bidirectional mask #6

@vaibhavad

Description

Hi,

Thanks for sharing this very interesting work. I have a question about how the bidirectional attention mask is implemented here.

Based on this implementation, it seems that even the padding tokens in a batch will get unmasked, whereas they should remain masked under both unidirectional and bidirectional attention. Is my understanding correct?
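For concreteness, here is a minimal sketch of the behavior I would expect, assuming the Hugging Face convention of a `(batch, seq_len)` `attention_mask` with 1 for real tokens and 0 for padding (the function name and shapes are just illustrative, not taken from this repo): lifting the causal constraint should only unmask real-token pairs, while padding keys stay masked.

```python
import torch

def bidirectional_additive_mask(attention_mask: torch.Tensor) -> torch.Tensor:
    """Build an additive attention mask that is bidirectional over real
    tokens but still blocks padding.

    attention_mask: (batch, seq_len) tensor, 1 = real token, 0 = padding.
    Returns a (batch, 1, 1, seq_len) tensor that is 0 where a key may be
    attended to and a large negative value where the key is padding.
    """
    # Broadcasting over query positions makes attention bidirectional:
    # every query may attend to every key, except padding keys, which
    # receive a large negative bias and are zeroed out by the softmax.
    # Only key positions are masked, so no row is entirely -inf and the
    # softmax stays NaN-free; outputs at pad queries are discarded anyway.
    neg_inf = torch.finfo(torch.float32).min
    return (1.0 - attention_mask[:, None, None, :].float()) * neg_inf
```

Adding this to the raw attention scores before the softmax keeps padding keys at effectively zero probability for every query, in both directions, which is what I would have expected instead of unmasking the full square.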
