Skip to content

Update gkd_loss.py teacher_prompt supports two formats:#9562

Open
yhy19 wants to merge 1 commit into
modelscope:mainfrom
yhy19:patch-1
Open

Update gkd_loss.py teacher_prompt supports two formats:#9562
yhy19 wants to merge 1 commit into
modelscope:mainfrom
yhy19:patch-1

Conversation

@yhy19

@yhy19 yhy19 commented Jun 15, 2026

Copy link
Copy Markdown

teacher_prompt supports two formats:
1. str: Used only as the user content for the teacher, replacing the last user message.
2. list[dict]: A complete list of messages (including system/user/assistant),
which directly replaces the teacher's messages (retaining the assistant from the original data as the response).

PR type

  • Bug Fix
  • New Feature
  • Document Updates
  • More Models or Datasets Support

PR information

Write the detail information belongs to this PR.

Experiment results

Paste your experiment result here(if needed).

teacher_prompt supports two formats:
        1. str: Used only as the user content for the teacher, replacing the last user message.
        2. list[dict]: A complete list of messages (including system/user/assistant),
           which directly replaces the teacher's messages (retaining the assistant from the original data as the response).

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request updates build_opsd_teacher_data in swift/rlhf_trainers/gkd_loss.py to support teacher_prompt as either a string or a list of message dictionaries. A review comment points out an inefficiency where an assistant message is fetched from the original messages and then immediately popped when strip_assistant is True. A code suggestion was provided to optimize this logic by only fetching the fallback assistant message when strip_assistant is False.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

Comment on lines +290 to +298
# Fallback: If teacher_prompt does not end with an assistant message, fetch it from the original messages
if (not teacher_messages) or teacher_messages[-1].get('role') != 'assistant':
orig = data.get('messages', [])
for m in reversed(orig):
if m.get('role') == 'assistant':
teacher_messages.append(dict(m))
break
if strip_assistant and teacher_messages and teacher_messages[-1].get('role') == 'assistant':
teacher_messages.pop()

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

When strip_assistant is True and teacher_prompt (as a list) does not end with an assistant message, the code currently fetches the assistant message from the original messages, appends it, and then immediately pops it. This is redundant and inefficient. We can optimize this by only running the fallback logic when strip_assistant is False.

Suggested change
# Fallback: If teacher_prompt does not end with an assistant message, fetch it from the original messages
if (not teacher_messages) or teacher_messages[-1].get('role') != 'assistant':
orig = data.get('messages', [])
for m in reversed(orig):
if m.get('role') == 'assistant':
teacher_messages.append(dict(m))
break
if strip_assistant and teacher_messages and teacher_messages[-1].get('role') == 'assistant':
teacher_messages.pop()
if strip_assistant:
if teacher_messages and teacher_messages[-1].get('role') == 'assistant':
teacher_messages.pop()
else:
# Fallback: If teacher_prompt does not end with an assistant message, fetch it from the original messages
if (not teacher_messages) or teacher_messages[-1].get('role') != 'assistant':
orig = data.get('messages', [])
for m in reversed(orig):
if m.get('role') == 'assistant':
teacher_messages.append(dict(m))
break

@hjh0119

hjh0119 commented Jun 15, 2026

Copy link
Copy Markdown
Collaborator

If replacing the full messages, would using the teacher_messages key be more appropriate?

@yhy19

yhy19 commented Jun 15, 2026

Copy link
Copy Markdown
Author

If replacing the full messages, would using the teacher_messages key be more appropriate?

That makes perfect sense, I'll update it right away.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants