Is there a script we can use to run the evaluation on a specific benchmark from the paper?