Differences in faithfulness metric

Hi there, the formula for faithfulness stated in the appendix of your paper doesn't match the formula illustrated in Figure 1.

According to the formula in the appendix, the number of model response claims that are entailed in the retrieved chunks is divided by the number of all model response claims. According to the formula in Figure 1, the sum of the number of correct model response claims in relevant chunks and the number of incorrect model response claims in the retrieved chunks is divided by the number of all model response claims.

In my analysis, I had some special cases where correct model response claims were entailed in irrelevant chunks. I worked with the formulas as presented in the picture on your GitHub, which is how I noticed the difference between RAGChecker's calculation of faithfulness and mine.
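To make the difference concrete, here is a minimal sketch of both formulas as I read them. It is not RAGChecker's code: the `Claim` class, its fields, and the two function names are hypothetical and only serve to illustrate the counting on a toy example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Claim:
    """Hypothetical record for one model response claim."""
    correct: bool              # claim is entailed by the ground-truth answer
    in_relevant_chunk: bool    # claim is entailed by at least one relevant chunk
    in_irrelevant_chunk: bool  # claim is entailed by at least one irrelevant chunk

    @property
    def in_retrieved(self) -> bool:
        # A claim counts as "in the retrieved chunks" if any chunk entails it.
        return self.in_relevant_chunk or self.in_irrelevant_chunk


def faithfulness_appendix(claims: list[Claim]) -> float:
    """Appendix reading: claims entailed in retrieved chunks / all claims."""
    if not claims:
        return 0.0
    return sum(c.in_retrieved for c in claims) / len(claims)


def faithfulness_figure(claims: list[Claim]) -> float:
    """Figure 1 reading: (correct claims in relevant chunks
    + incorrect claims in retrieved chunks) / all claims."""
    if not claims:
        return 0.0
    correct_in_relevant = sum(c.correct and c.in_relevant_chunk for c in claims)
    incorrect_in_retrieved = sum((not c.correct) and c.in_retrieved for c in claims)
    return (correct_in_relevant + incorrect_in_retrieved) / len(claims)


claims = [
    Claim(correct=True, in_relevant_chunk=True, in_irrelevant_chunk=False),
    # The special case: a correct claim entailed only by an irrelevant chunk.
    Claim(correct=True, in_relevant_chunk=False, in_irrelevant_chunk=True),
    Claim(correct=False, in_relevant_chunk=False, in_irrelevant_chunk=True),
    Claim(correct=False, in_relevant_chunk=False, in_irrelevant_chunk=False),
]

print(faithfulness_appendix(claims))  # 0.75
print(faithfulness_figure(claims))    # 0.5
```

The second claim is the only one the two readings treat differently: the appendix formula counts it, while the Figure 1 formula does not.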

In theory, this shouldn't be possible: if the model response contains a correct claim, there should be a matching correct claim in the gt_answer as well, so the chunk that entails it should count as relevant. In my case, however, the claim in the gt_answer that should have been labeled 'Entailment' was labeled 'Neutral'.

It would be nice if you could update the picture on GitHub so that it matches RAGChecker's calculation, or at least add a note that the formulas differ in this case.

Differences in faithfulness metric #26