Update gkd_loss.py teacher_prompt supports two formats:#9562
Conversation
teacher_prompt supports two formats:
1. str: Used only as the user content for the teacher, replacing the last user message.
2. list[dict]: A complete list of messages (including system/user/assistant),
which directly replaces the teacher's messages (retaining the assistant from the original data as the response).
There was a problem hiding this comment.
Code Review
This pull request updates build_opsd_teacher_data in swift/rlhf_trainers/gkd_loss.py to support teacher_prompt as either a string or a list of message dictionaries. A review comment points out an inefficiency where an assistant message is fetched from the original messages and then immediately popped when strip_assistant is True. A code suggestion was provided to optimize this logic by only fetching the fallback assistant message when strip_assistant is False.
Important
The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.
| # Fallback: If teacher_prompt does not end with an assistant message, fetch it from the original messages | ||
| if (not teacher_messages) or teacher_messages[-1].get('role') != 'assistant': | ||
| orig = data.get('messages', []) | ||
| for m in reversed(orig): | ||
| if m.get('role') == 'assistant': | ||
| teacher_messages.append(dict(m)) | ||
| break | ||
| if strip_assistant and teacher_messages and teacher_messages[-1].get('role') == 'assistant': | ||
| teacher_messages.pop() |
There was a problem hiding this comment.
When strip_assistant is True and teacher_prompt (as a list) does not end with an assistant message, the code currently fetches the assistant message from the original messages, appends it, and then immediately pops it. This is redundant and inefficient. We can optimize this by only running the fallback logic when strip_assistant is False.
| # Fallback: If teacher_prompt does not end with an assistant message, fetch it from the original messages | |
| if (not teacher_messages) or teacher_messages[-1].get('role') != 'assistant': | |
| orig = data.get('messages', []) | |
| for m in reversed(orig): | |
| if m.get('role') == 'assistant': | |
| teacher_messages.append(dict(m)) | |
| break | |
| if strip_assistant and teacher_messages and teacher_messages[-1].get('role') == 'assistant': | |
| teacher_messages.pop() | |
| if strip_assistant: | |
| if teacher_messages and teacher_messages[-1].get('role') == 'assistant': | |
| teacher_messages.pop() | |
| else: | |
| # Fallback: If teacher_prompt does not end with an assistant message, fetch it from the original messages | |
| if (not teacher_messages) or teacher_messages[-1].get('role') != 'assistant': | |
| orig = data.get('messages', []) | |
| for m in reversed(orig): | |
| if m.get('role') == 'assistant': | |
| teacher_messages.append(dict(m)) | |
| break |
|
If replacing the full messages, would using the |
That makes perfect sense, I'll update it right away. |
teacher_prompt supports two formats:
1. str: Used only as the user content for the teacher, replacing the last user message.
2. list[dict]: A complete list of messages (including system/user/assistant),
which directly replaces the teacher's messages (retaining the assistant from the original data as the response).
PR type
PR information
Write the detail information belongs to this PR.
Experiment results
Paste your experiment result here(if needed).