feat: Use {{field_name}} placeholders and move criteria to user prompt #400
daniel5u wants to merge 16 commits into
feat: v2.2.2
…lated runtime config
Replace format_map() with regex-based _replace_placeholders() to avoid ValueError when criteria contain JSON braces. Move rule-specific content from system prompt to user prompt for cleaner LLM judge instructions. Make description optional in CustomLLMRuleArgs.
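To illustrate the motivation in this commit message, here is a minimal sketch of a regex-based substitution next to `str.format_map()`. The pattern, the standalone function name `replace_placeholders`, and the choice to leave unknown fields untouched are assumptions for illustration, not the PR's actual implementation.

```python
import re

# Hypothetical pattern: matches {{name}} where name is made of word characters.
_PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def replace_placeholders(text: str, inputs: dict) -> str:
    """Substitute {{field_name}} from inputs; leave every other brace untouched."""

    def _sub(match: re.Match) -> str:
        key = match.group(1)
        # Assumption: unknown fields are left as written rather than raising.
        return str(inputs[key]) if key in inputs else match.group(0)

    return _PLACEHOLDER.sub(_sub, text)


criterion = 'Answer: {{answer}}. Reply as JSON: {"status": true}'

# format_map() treats the JSON braces as a replacement field and fails.
try:
    criterion.format_map({"answer": "42"})
    format_map_failed = False
except (KeyError, ValueError):
    format_map_failed = True

# The regex version only touches the double-brace placeholder.
rendered = replace_placeholders(criterion, {"answer": "42"})
```

Because only `{{...}}` spans are matched, JSON examples inside criteria pass through unchanged.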
Resolve conflicts by keeping main's version of LLMCustomRule files
which use {{field_name}} placeholder syntax and criteria in user prompt.
Code Review
This pull request introduces a template-based approach for custom LLM rules, allowing dynamic placeholder substitution within criteria and moving rule-specific context to the user prompt. Key feedback includes a critical mismatch between the regex implementation (expecting double braces {{...}}) and the provided examples/tests (using single braces {...}). Other recommendations involve restoring the metric name in default labels for better tracking, ensuring the LLM receives the metric description for context, and moving the re import to the top of the file to follow standard Python conventions.
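The brace mismatch flagged in this summary can be reproduced with a short sketch. The regexes and the helper names `render` and `find_single_brace_fields` are hypothetical; the PR's actual pattern is not shown in full here.

```python
import re

# Assumed pattern for the double-brace syntax.
DOUBLE_BRACE = re.compile(r"\{\{(\w+)\}\}")
# Assumed pattern for a lone {name} that is not part of {{name}}.
SINGLE_BRACE = re.compile(r"(?<!\{)\{(\w+)\}(?!\})")


def render(text: str, inputs: dict) -> str:
    """Replace {{name}} with inputs[name]; anything else is left as-is."""
    return DOUBLE_BRACE.sub(lambda m: str(inputs.get(m.group(1), m.group(0))), text)


def find_single_brace_fields(text: str, inputs: dict) -> list:
    """Return input keys that appear as {name}, which render() silently ignores."""
    return [name for name in SINGLE_BRACE.findall(text) if name in inputs]


inputs = {"content": "hello"}

double = render("Evaluate: {{content}}", inputs)   # placeholder is substituted
single = render("Evaluate: {content}", inputs)     # placeholder is left untouched
suspicious = find_single_brace_fields("Evaluate: {content}", inputs)
```

A check like `find_single_brace_fields` could be used in tests to catch examples that still use the single-brace form.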
    @staticmethod
    def _replace_placeholders(text: str, inputs: dict) -> str:
        """Replace {{field_name}} placeholders, leaving other braces intact."""
        import re
    f'- If the criteria do not specify labels, use "label": ["QUALITY_BAD.{custom_rule.metric}"] when status is true.\n'
    "- If the criteria do not specify score semantics, use score 1 for pass/good and score 0 for fail/bad.\n"
    "- If the criteria do not specify pass/good or fail/bad standard, return 1 for all inputs."
    '- If no labels are specified, use "label": ["QUALITY_GOOD"] when status is false and "label": ["QUALITY_BAD"] when status is true.\n'
The default failure label has been changed from QUALITY_BAD.{metric} to QUALITY_BAD. This loses the specific metric context in the output. Additionally, this is inconsistent with other error labels in this class (e.g., lines 172 and 206) which still include the metric or exception name. It is recommended to maintain the metric name in the label for consistency and better downstream analysis.
    - '- If no labels are specified, use "label": ["QUALITY_GOOD"] when status is false and "label": ["QUALITY_BAD"] when status is true.\n'
    + f'- If no labels are specified, use "label": ["QUALITY_GOOD"] when status is false and "label": ["QUALITY_BAD.{custom_rule.metric}"] when status is true.\n'
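As a rough sketch of the recommendation, the default label and the matching prompt instruction could both be derived from the metric name. The helper names `default_labels` and `default_label_instruction` are hypothetical and do not appear in the PR.

```python
from typing import List


def default_labels(metric: str, status: bool) -> List[str]:
    """Default labels; status=True means the rule flagged a problem."""
    # Keeping the metric in the failure label preserves per-rule context downstream.
    return [f"QUALITY_BAD.{metric}"] if status else ["QUALITY_GOOD"]


def default_label_instruction(metric: str) -> str:
    """Prompt line telling the LLM judge which default labels to emit."""
    return (
        '- If no labels are specified, use "label": ["QUALITY_GOOD"] when status is false '
        f'and "label": ["QUALITY_BAD.{metric}"] when status is true.\n'
    )
```

Generating both from the same metric value keeps the prompt text and any fallback labels in code from drifting apart.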
    user_content = "\n".join(
        self._replace_placeholders(criterion, inputs) for criterion in custom_rule.criteria
    )
The metric name and description are no longer passed to the LLM in the prompt. While the PR description mentions moving rule-specific content to the user prompt, these fields were removed entirely. Consider including them at the beginning of the user_content to provide the LLM with the necessary context for evaluation.
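One way this suggestion might look is sketched below, with the metric and an optional description prepended to the rendered criteria. `RuleSketch`, `build_user_content`, and the "Metric:" / "Description:" header wording are assumptions for illustration; the real `CustomLLMRuleArgs` class is not reproduced here.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class RuleSketch:
    """Hypothetical stand-in for the PR's custom rule arguments."""
    metric: str
    criteria: List[str] = field(default_factory=list)
    description: Optional[str] = None  # optional, as described in the PR


def _replace_placeholders(text: str, inputs: dict) -> str:
    # Assumed pattern: only {{name}} is substituted; unknown names are kept.
    return re.sub(r"\{\{(\w+)\}\}", lambda m: str(inputs.get(m.group(1), m.group(0))), text)


def build_user_content(rule: RuleSketch, inputs: dict) -> str:
    """Prefix the rendered criteria with the metric name and, if set, the description."""
    header = [f"Metric: {rule.metric}"]
    if rule.description:
        header.append(f"Description: {rule.description}")
    body = [_replace_placeholders(criterion, inputs) for criterion in rule.criteria]
    return "\n".join(header + body)


with_desc = build_user_content(
    RuleSketch("tone", ["Check {{content}}"], "Politeness"), {"content": "hi"}
)
without_desc = build_user_content(RuleSketch("tone", ["Check {{content}}"]), {"content": "hi"})
```

Skipping the description line when it is unset keeps the prompt free of empty fields.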