Differences in faithfulness metric #26

@dropberry

Hi there, the formula for faithfulness stated in the appendix of your paper doesn't match the formula illustrated in Figure 1.

According to the formula in the appendix, faithfulness is the number of model response claims that are entailed in the retrieved chunks, divided by the total number of model response claims. According to the formula in Figure 1, it is the number of correct model response claims in relevant chunks plus the number of incorrect model response claims in the retrieved chunks, divided by the total number of model response claims.

In my analysis, I had some special cases where correct model response claims were entailed in irrelevant chunks. I worked with the formulas as presented in the picture on your GitHub, which is how I noticed the difference between RAGChecker's calculation of faithfulness and mine.
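To make the discrepancy concrete, here is a minimal sketch of both formulas as I read them. The `Claim` class, its fields, and the two function names are hypothetical and only for illustration; this is not RAGChecker's actual code, and the four claims are made-up toy data.

```python
from dataclasses import dataclass


@dataclass
class Claim:
    # Hypothetical per-claim labels, assumed to come from the checker's output.
    correct: bool        # claim is entailed by the ground-truth answer
    in_relevant: bool    # claim is entailed by at least one relevant chunk
    in_irrelevant: bool  # claim is entailed by at least one irrelevant chunk


def faithfulness_appendix(claims):
    """Appendix reading: claims entailed in any retrieved chunk / all claims."""
    entailed = sum(c.in_relevant or c.in_irrelevant for c in claims)
    return entailed / len(claims)


def faithfulness_figure(claims):
    """Figure 1 reading: (correct claims in relevant chunks
    + incorrect claims in any retrieved chunk) / all claims."""
    correct_in_relevant = sum(c.correct and c.in_relevant for c in claims)
    incorrect_in_retrieved = sum(
        (not c.correct) and (c.in_relevant or c.in_irrelevant) for c in claims
    )
    return (correct_in_relevant + incorrect_in_retrieved) / len(claims)


# Toy data: the second claim is the special case, a correct claim
# that is entailed only in an irrelevant chunk.
claims = [
    Claim(correct=True, in_relevant=True, in_irrelevant=False),
    Claim(correct=True, in_relevant=False, in_irrelevant=True),
    Claim(correct=False, in_relevant=False, in_irrelevant=True),
    Claim(correct=False, in_relevant=False, in_irrelevant=False),
]

print(faithfulness_appendix(claims))  # 0.75
print(faithfulness_figure(claims))    # 0.5
```

The two readings agree whenever no correct claim is entailed only in irrelevant chunks; each such claim is counted by the appendix formula but not by the Figure 1 formula.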

In theory, this shouldn't be possible: if there is a correct claim in the model response, then there should be a matching correct claim in the gt_answer as well. However, the claim in the gt_answer that should have been labeled 'Entailment' was labeled 'Neutral'.

It would be nice to update the picture on GitHub so that it matches RAGChecker's calculation, or at least to add a note that the formulas differ in this case.
