Implementation for soft offline distillation using saved top-k teacher logits#3382
ajkv-google wants to merge 5 commits into main
Conversation
```python
def __init__(self, data_dir: str, epochs: int = 100):
    # Check if the user passed a directory or a direct file path
    if tf.io.gfile.isdir(data_dir):
        self.filepath = os.path.join(data_dir, "teacher_top_k_global.array_record")
```
Is it ok to hardcode this file as teacher_top_k_global.array_record?
In the save top-k teacher logits file (from this PR), we write a single ArrayRecord file from one host rather than having multiple hosts write their own chunks of data, so I named the file "teacher_top_k_global.array_record". However, not everyone running offline distillation will use the same file to save top-k teacher logits, so I will add a field to the config that lets users specify the filename of the saved top-k teacher logits, making it dynamic.
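A minimal sketch of what such a config field could look like. The class and field names here (`OfflineDistillationConfig`, `teacher_logits_filename`, `resolve_path`) are hypothetical illustrations, not the PR's actual code:

```python
from dataclasses import dataclass
import os


@dataclass
class OfflineDistillationConfig:
    # Hypothetical field names; the actual config keys in the PR may differ.
    offline_data_dir: str = ""
    # Filename of the saved top-k teacher logits, no longer hardcoded.
    teacher_logits_filename: str = "teacher_top_k_global.array_record"

    def resolve_path(self) -> str:
        """Return the full file path, accepting either a directory or a direct file path."""
        if os.path.splitext(self.offline_data_dir)[1]:  # has an extension, looks like a file
            return self.offline_data_dir
        return os.path.join(self.offline_data_dir, self.teacher_logits_filename)
```

With a default value on the field, existing runs that rely on the current filename keep working, while users who saved their logits under a different name can override it.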
```python
if __name__ == "__main__":
    app.run(main)

parser = argparse.ArgumentParser()
```
I think these should go inside types.py to add them as part of the config.
That makes sense; putting these in the config makes the command less complex and keeps things more organized. I moved them to types.py and verified that training ran successfully after the change.
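A sketch of what replacing the argparse flags with config fields could look like. The names `DistillationConfig` and `from_overrides` are assumptions for illustration, not the actual contents of types.py:

```python
from dataclasses import dataclass, fields


@dataclass
class DistillationConfig:
    # Hypothetical: the former command-line flags now live on the config.
    offline_distillation: bool = False
    offline_data_dir: str = ""


def from_overrides(overrides: dict) -> DistillationConfig:
    """Build a config from key=value overrides, ignoring keys the config does not define."""
    known = {f.name for f in fields(DistillationConfig)}
    return DistillationConfig(**{k: v for k, v in overrides.items() if k in known})
```

This way the trainer reads `config.offline_distillation` and `config.offline_data_dir` like any other config value, instead of threading a separate argparse namespace through the training entry point.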
Description
This PR introduces an end-to-end offline distillation training pipeline. Previously, the distillation loop executed in an "online" mode, which required both the frozen Teacher model and the learning Student model to be loaded and executed simultaneously during training. This change allows the trainer to load pre-computed top-K Teacher logits from .array_record files, letting us bypass the teacher model's forward pass during the training loop.
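To illustrate the core idea, here is a minimal NumPy sketch of a soft distillation loss computed from saved top-k teacher logits. This is an assumption about the general technique, not the PR's implementation: the student's logits are gathered at the teacher's saved top-k vocabulary indices, both distributions are renormalized over those k entries, and KL(teacher || student) is taken over them:

```python
import numpy as np


def topk_distill_loss(student_logits, topk_indices, topk_teacher_logits, temperature=1.0):
    """Soft distillation loss over saved top-k teacher logits (illustrative sketch).

    student_logits: [batch, vocab] student outputs.
    topk_indices: [batch, k] vocabulary positions saved alongside the teacher logits.
    topk_teacher_logits: [batch, k] pre-computed teacher logits at those positions.
    """
    # Gather student logits at the teacher's top-k vocabulary positions.
    gathered = np.take_along_axis(student_logits, topk_indices, axis=-1)
    t = topk_teacher_logits / temperature
    s = gathered / temperature
    # Renormalize both distributions over just the k saved entries.
    t_logp = t - np.log(np.sum(np.exp(t), axis=-1, keepdims=True))
    s_logp = s - np.log(np.sum(np.exp(s), axis=-1, keepdims=True))
    teacher_p = np.exp(t_logp)
    # KL(teacher || student), averaged over the batch.
    return float(np.mean(np.sum(teacher_p * (t_logp - s_logp), axis=-1)))
```

Because only the k saved entries enter the loss, no teacher forward pass is needed at training time; the trainer just streams `(topk_indices, topk_teacher_logits)` records from the .array_record file alongside the input batch.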
Tests
Tested this code change by running the following command:

```shell
python3 src/maxtext/trainers/post_train/distillation/train_distill.py \
  src/maxtext/configs/post_train/distillation.yml steps=100 \
  tokenizer_path="/mnt/ajkv/disks/codebase/maxtext/src/maxtext/assets/tokenizers/tokenizer_llama3.tiktoken" \
  --offline_distillation \
  --offline_data_dir="/mnt/ajkv/disks/teacher_logits_output/teacher_top_k_global.array_record"
```

Truncated output showing the successful run: https://paste.googleplex.com/6342987127848960#l=8.
Verified that the training ran successfully and finished the distillation run.
Checklist
Before submitting this PR, please make sure (put X in square brackets):
gemini-review label.