Merged
8 changes: 4 additions & 4 deletions PREFLIGHT.md
@@ -7,12 +7,12 @@ Before you run ML workload on Multihost with GCE or GKE, simply apply `bash pref

Here is an example for GCE:
```
-bash preflight.sh PLATFORM=GCE && python3 -m maxtext.trainers.pre_train.train src/maxtext/configs/base.yml run_name=${YOUR_JOB_NAME?}
+bash preflight.sh PLATFORM=GCE && python3 -m maxtext.trainers.pre_train.train run_name=${YOUR_JOB_NAME?}
```

Here is an example for GKE:
```
-bash preflight.sh PLATFORM=GKE && python3 -m maxtext.trainers.pre_train.train src/maxtext/configs/base.yml run_name=${YOUR_JOB_NAME?}
+bash preflight.sh PLATFORM=GKE && python3 -m maxtext.trainers.pre_train.train run_name=${YOUR_JOB_NAME?}
```

# Optimization 2: NUMA binding (applicable only to v4 and v5p)
@@ -22,14 +22,14 @@ For GCE,
[preflight.sh](https://github.com/google/maxtext/blob/main/preflight.sh) installs the `numactl` dependency for you, so you can use it directly. Here is an example:

```
-bash preflight.sh PLATFORM=GCE && numactl --membind 0 --cpunodebind=0 python3 -m maxtext.trainers.pre_train.train src/maxtext/configs/base.yml run_name=${YOUR_JOB_NAME?}
+bash preflight.sh PLATFORM=GCE && numactl --membind 0 --cpunodebind=0 python3 -m maxtext.trainers.pre_train.train run_name=${YOUR_JOB_NAME?}
```

For GKE,
`numactl` should be built into your Docker image from [maxtext_tpu_dependencies.Dockerfile](https://github.com/google/maxtext/blob/main/src/dependencies/dockerfiles/maxtext_tpu_dependencies.Dockerfile), so you can use it directly if you built the MaxText Docker image. Here is an example:

```
-bash preflight.sh PLATFORM=GKE && numactl --membind 0 --cpunodebind=0 python3 -m maxtext.trainers.pre_train.train src/maxtext/configs/base.yml run_name=${YOUR_JOB_NAME?}
+bash preflight.sh PLATFORM=GKE && numactl --membind 0 --cpunodebind=0 python3 -m maxtext.trainers.pre_train.train run_name=${YOUR_JOB_NAME?}
```

1. `numactl`: This is the command-line tool used for controlling NUMA policy for processes or shared memory. It's particularly useful on multi-socket systems where memory locality can impact performance.
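Note that the `--membind` and `--cpunodebind` flags in the commands above always use the same node number, so both memory allocation and CPU scheduling stay on one NUMA node. As a minimal sketch (the `numa_prefix` helper below is hypothetical, not part of MaxText), the binding prefix can be built like this:

```shell
# Hypothetical helper (not part of MaxText): build the numactl prefix that
# pins both memory allocation and CPU scheduling to a single NUMA node.
numa_prefix() {
  node="$1"
  echo "numactl --membind ${node} --cpunodebind=${node}"
}

# On a two-node v4/v5p host you would typically bind everything to node 0:
numa_prefix 0   # → numactl --membind 0 --cpunodebind=0
```

You would then prepend the emitted prefix to the training command, exactly as in the examples above.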
4 changes: 2 additions & 2 deletions docs/guides/checkpointing_solutions/convert_checkpoint.md
@@ -70,7 +70,7 @@ Finally, run below command to complete the conversion
# Optional: If run out of disk space when downloading HuggingFace safetensors,
# customize your "HF_HOME" to redirect the cache to a larger or mounted disk (e.g., on a TPU VM).
# export HF_HOME="/dev/shm/huggingface_tmp"
-python3 -m maxtext.checkpoint_conversion.to_maxtext maxtext/configs/base.yml \
+python3 -m maxtext.checkpoint_conversion.to_maxtext \
model_name=${MODEL_NAME?} \
hf_access_token=${HF_TOKEN?} \
base_output_directory=${MODEL_CHECKPOINT_DIRECTORY?} \
@@ -108,7 +108,7 @@ Use the `to_huggingface.py` script to convert a MaxText checkpoint into the Hugg
The following command converts a MaxText checkpoint and saves it locally, to GCS, or uploads it directly to the Hugging Face Hub.

```bash
-python3 -m maxtext.checkpoint_conversion.to_huggingface src/maxtext/configs/base.yml \
+python3 -m maxtext.checkpoint_conversion.to_huggingface \
model_name=<MODEL_NAME> \
load_parameters_path=<path-to-maxtext-checkpoint> \
base_output_directory=<path-to-save-converted-checkpoint> \
8 changes: 4 additions & 4 deletions docs/run_maxtext/run_maxtext_localhost.md
@@ -58,7 +58,7 @@ bash tools/setup/setup.sh DEVICE={tpu|gpu}
After the installation is complete, run a short training job using synthetic data to confirm everything is working correctly. This command trains a model for just 10 steps. Remember to replace `$YOUR_JOB_NAME` with a unique name for your run and `gs://<my-bucket>` with the path to the GCS bucket you configured in the prerequisites.

```bash
-python3 -m maxtext.trainers.pre_train.train src/maxtext/configs/base.yml \
+python3 -m maxtext.trainers.pre_train.train \
run_name=${YOUR_JOB_NAME?} \
base_output_directory=gs://<my-bucket> \
dataset_type=synthetic \
@@ -72,7 +72,7 @@ python3 -m maxtext.trainers.pre_train.train src/maxtext/configs/base.yml \
To demonstrate model output, run the following command:

```bash
-python3 -m maxtext.inference.decode src/maxtext/configs/base.yml \
+python3 -m maxtext.inference.decode \
run_name=${YOUR_JOB_NAME?} \
base_output_directory=gs://<my-bucket> \
per_device_batch_size=1
@@ -92,7 +92,7 @@ To use a pre-configured model for TPUs, you override the `model_name` parameter,
<summary><strong>llama3-8b (TPU)</strong></summary>

```bash
-python3 -m maxtext.trainers.pre_train.train maxtext/configs/base.yml \
+python3 -m maxtext.trainers.pre_train.train \
model_name=llama3-8b \
run_name=${YOUR_JOB_NAME?} \
base_output_directory=gs://<my-bucket> \
@@ -106,7 +106,7 @@ python3 -m maxtext.trainers.pre_train.train maxtext/configs/base.yml \
<summary><strong>qwen3-4b (TPU)</strong></summary>

```bash
-python3 -m maxtext.trainers.pre_train.train maxtext/configs/base.yml \
+python3 -m maxtext.trainers.pre_train.train \
model_name=qwen3-4b \
run_name=${YOUR_JOB_NAME?} \
base_output_directory=gs://<my-bucket> \
2 changes: 1 addition & 1 deletion docs/run_maxtext/run_maxtext_single_host_gpu.md
@@ -148,7 +148,7 @@ Hardware: GPU
```

```bash
-python3 -m maxtext.trainers.pre_train.train src/maxtext/configs/base.yml run_name=gpu01 base_output_directory=/deps/output \
+python3 -m maxtext.trainers.pre_train.train run_name=gpu01 base_output_directory=/deps/output \
dataset_type=synthetic enable_checkpointing=True steps=10 attention=cudnn_flash_te scan_layers=False \
use_iota_embed=True hardware=gpu per_device_batch_size=12
```
2 changes: 1 addition & 1 deletion docs/run_maxtext/run_maxtext_via_multihost_job.md
@@ -68,7 +68,7 @@ The `multihost_job.py` script:

```sh
RUN_NAME=${YOUR_JOB_NAME?} # You may set this to any unique name for a fresh run.
-python3 multihost_job.py --NUM_SLICES=${NODE_COUNT?} --RUN_NAME=${RUN_NAME?} --BUCKET_NAME=${BUCKET_NAME?} --CQR_EXTRA_ARGS="--reserved" --COMMAND="bash tools/setup/setup.sh && python3 -m maxtext.trainers.pre_train.train src/maxtext/configs/base.yml run_name=${RUN_NAME?}"
+python3 multihost_job.py --NUM_SLICES=${NODE_COUNT?} --RUN_NAME=${RUN_NAME?} --BUCKET_NAME=${BUCKET_NAME?} --CQR_EXTRA_ARGS="--reserved" --COMMAND="bash tools/setup/setup.sh && python3 -m maxtext.trainers.pre_train.train run_name=${RUN_NAME?}"
```

We tell `multihost_job` to target the `reserved` pool by including `--reserved` as an extra argument to the CQR request, but you may instead target the `on-demand` pool by removing the `--CQR_EXTRA_ARGS` flag (on-demand is the default), or the pre-emptible pool with `--CQR_EXTRA_ARGS="--best-effort"`, which may be necessary if your reservation is full.
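The three capacity pools differ only in the extra arguments forwarded to the CQR request. A hypothetical helper (not part of `multihost_job.py`) makes the mapping explicit:

```shell
# Hypothetical helper: map a capacity-pool name to the --CQR_EXTRA_ARGS
# value to append to the multihost_job.py invocation. "on-demand" is the
# default pool, so it needs no extra arguments at all.
cqr_args_for_pool() {
  case "$1" in
    reserved)    echo '--CQR_EXTRA_ARGS="--reserved"' ;;
    best-effort) echo '--CQR_EXTRA_ARGS="--best-effort"' ;;
    on-demand)   echo "" ;;   # default pool: omit the flag entirely
    *)           echo "unknown pool: $1" >&2; return 1 ;;
  esac
}

cqr_args_for_pool reserved      # → --CQR_EXTRA_ARGS="--reserved"
cqr_args_for_pool on-demand     # → (empty: the flag is simply omitted)
```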
2 changes: 1 addition & 1 deletion docs/run_maxtext/run_maxtext_via_multihost_runner.md
@@ -106,7 +106,7 @@ Although there are several steps below, most are for the initial setup. Once set
Set config values for `base_output_directory` and `dataset_path` in `configs/base.yml` if not set already.

```
-python3 multihost_runner.py --TPU_PREFIX=${TPU_PREFIX?} --COMMAND="python3 -m maxtext.trainers.pre_train.train src/maxtext/configs/base.yml run_name=${RUN_NAME?}"
+python3 multihost_runner.py --TPU_PREFIX=${TPU_PREFIX?} --COMMAND="python3 -m maxtext.trainers.pre_train.train run_name=${RUN_NAME?}"
```

If you are running the `multihost_runner.py` script from a TPUVM, you will need to set `--INTERNAL_IP=true`.
4 changes: 2 additions & 2 deletions docs/run_maxtext/run_maxtext_via_pathways.md
@@ -96,7 +96,7 @@ xpk workload create-pathways \
--project=${PROJECT?} \
--zone=${ZONE?} \
--docker-image=${DOCKER_IMAGE?} \
-    --command="python3 -m maxtext.trainers.pre_train.train src/maxtext/configs/base.yml \
+    --command="python3 -m maxtext.trainers.pre_train.train \
base_output_directory=gs://${BUCKET_NAME?} \
per_device_batch_size=1 \
enable_checkpointing=false \
@@ -154,7 +154,7 @@ export JAX_PLATFORMS=proxy
export JAX_BACKEND_TARGET=grpc://127.0.0.1:29000

# Run the training script
-python3 -m maxtext.trainers.pre_train.train src/maxtext/configs/base.yml \
+python3 -m maxtext.trainers.pre_train.train \
base_output_directory=gs://${BUCKET_NAME?} \
per_device_batch_size=1 \
enable_checkpointing=false \
4 changes: 2 additions & 2 deletions docs/run_maxtext/run_maxtext_via_xpk.md
@@ -187,7 +187,7 @@ For instance, to run a job across **four TPU slices**, you would change `--num-s
--base-docker-image maxtext_base_image\
--tpu-type v5litepod-256\
--num-slices 1\
-    --command "python3 -m maxtext.trainers.pre_train.train src/maxtext/configs/base.yml run_name=${USER}-tpu-job base_output_directory=${BASE_OUTPUT_DIR?} dataset_path=${DATASET_PATH?} steps=100"
+    --command "python3 -m maxtext.trainers.pre_train.train run_name=${USER}-tpu-job base_output_directory=${BASE_OUTPUT_DIR?} dataset_path=${DATASET_PATH?} steps=100"
```

- **On your GPU cluster:**
@@ -199,7 +199,7 @@ For instance, to run a job across **four TPU slices**, you would change `--num-s
--base-docker-image maxtext_base_image\
--device-type h100-80gb-8\
--num-nodes 2\
-    --command "python3 -m maxtext.trainers.pre_train.train src/maxtext/configs/base.yml run_name=${USER}-gpu-job base_output_directory=${BASE_OUTPUT_DIR?} dataset_path=${DATASET_PATH?} steps=100"
+    --command "python3 -m maxtext.trainers.pre_train.train run_name=${USER}-gpu-job base_output_directory=${BASE_OUTPUT_DIR?} dataset_path=${DATASET_PATH?} steps=100"
```

______________________________________________________________________
8 changes: 4 additions & 4 deletions docs/tutorials/first_run.md
@@ -49,7 +49,7 @@ pre-commit install
4. After installation completes, run training on synthetic data with the following command:

```sh
-python3 -m maxtext.trainers.pre_train.train src/maxtext/configs/base.yml \
+python3 -m maxtext.trainers.pre_train.train \
run_name=${YOUR_JOB_NAME?} \
base_output_directory=gs://<my-bucket> \
dataset_type=synthetic \
@@ -61,7 +61,7 @@ Optional: If you want to try training on a Hugging Face dataset, see [Data Input
5. To demonstrate model output, run the following command:

```sh
-python3 -m maxtext.inference.decode src/maxtext/configs/base.yml \
+python3 -m maxtext.inference.decode \
run_name=${YOUR_JOB_NAME?} \
base_output_directory=gs://<my-bucket> \
per_device_batch_size=1
@@ -83,7 +83,7 @@ You can use [demo_decoding.ipynb](https://github.com/AI-Hypercomputer/maxtext/bl
2. After installation is complete, run training with the following command on synthetic data:

```sh
-python3 -m maxtext.trainers.pre_train.train src/maxtext/configs/base.yml \
+python3 -m maxtext.trainers.pre_train.train \
run_name=${YOUR_JOB_NAME?} \
base_output_directory=gs://<my-bucket> \
dataset_type=synthetic \
@@ -93,7 +93,7 @@ python3 -m maxtext.trainers.pre_train.train src/maxtext/configs/base.yml \
3. To demonstrate model output, run the following command:

```sh
-python3 -m maxtext.inference.decode src/maxtext/configs/base.yml \
+python3 -m maxtext.inference.decode \
run_name=${YOUR_JOB_NAME?} \
base_output_directory=gs://<my-bucket> \
per_device_batch_size=1
1 change: 0 additions & 1 deletion docs/tutorials/posttraining/full_finetuning.md
@@ -101,7 +101,6 @@ Below is a sample training script.

```sh
python3 -m maxtext.trainers.pre_train.train \
-  src/maxtext/configs/base.yml \
run_name=${RUN_NAME?} \
base_output_directory=${BASE_OUTPUT_DIRECTORY?} \
load_parameters_path=${MODEL_CKPT_PATH?} \
6 changes: 3 additions & 3 deletions docs/tutorials/posttraining/knowledge_distillation.md
@@ -132,7 +132,7 @@ python3 -m pip install torch --index-url https://download.pytorch.org/whl/cpu
export PRE_TRAINED_MODEL_CKPT_DIRECTORY=${BASE_DIRECTORY?}/llama3.1-8b-ckpt

# Convert to MaxText format
-python3 -m maxtext.checkpoint_conversion.to_maxtext src/maxtext/configs/base.yml \
+python3 -m maxtext.checkpoint_conversion.to_maxtext \
model_name=llama3.1-8b \
hf_access_token=${HF_TOKEN?} \
base_output_directory=${PRE_TRAINED_MODEL_CKPT_DIRECTORY?} \
@@ -170,7 +170,7 @@ You can now fine-tune your smaller student model using supervised fine-tuning te
Example command to run fine-tuning on a TPU v6e-8:

```bash
-python3 -m maxtext.trainers.post_train.sft.train_sft_deprecated src/maxtext/configs/post_train/sft.yml \
+python3 -m maxtext.trainers.post_train.sft.train_sft_deprecated \
run_name=${RUN_NAME?} \
base_output_directory=${BASE_DIRECTORY?}/distillation/qwen3-32b-distill-llama3.1-8b \
tokenizer_path=meta-llama/Llama-3.1-8B-Instruct tokenizer_type=huggingface \
@@ -209,7 +209,7 @@ largest_dir="${sorted_dirs[-1]}"
FINE_TUNED_MODEL_CKPT_PATH=${CHECKPOINTS_PATH?}/${largest_dir}/model_params

# Fine-tune student model on original dataset
-python3 -m maxtext.trainers.post_train.sft.train_sft src/maxtext/configs/post_train/sft.yml \
+python3 -m maxtext.trainers.post_train.sft.train_sft \
run_name=${RUN_NAME?}_stage2 \
base_output_directory=${BASE_DIRECTORY?}/distillation/qwen3-32b-distill-llama3.1-8b \
tokenizer_path=meta-llama/Llama-3.1-8B-Instruct tokenizer_type=huggingface \
2 changes: 1 addition & 1 deletion docs/tutorials/posttraining/multimodal.md
@@ -38,7 +38,7 @@ Then use this command to convert an unscanned checkpoint from HuggingFace to Max
```shell
export HF_ACCESS_TOKEN=hf_...
export MAXTEXT_CKPT_GCS_PATH=gs://...
-python -m maxtext.checkpoint_conversion.to_maxtext maxtext/configs/base.yml \
+python -m maxtext.checkpoint_conversion.to_maxtext \
model_name=gemma3-4b \
hf_access_token=${HF_ACCESS_TOKEN?} \
base_output_directory=${MAXTEXT_CKPT_GCS_PATH?} \
4 changes: 2 additions & 2 deletions docs/tutorials/posttraining/rl.md
@@ -133,7 +133,7 @@ export MAXTEXT_CKPT_PATH=<gcs path for MaxText checkpoint> # e.g., gs://my-bucke
Run the following command for GRPO:

```
-python3 -m src.maxtext.trainers.post_train.rl.train_rl src/maxtext/configs/post_train/rl.yml \
+python3 -m maxtext.trainers.post_train.rl.train_rl \
model_name=${MODEL?} \
tokenizer_path=${TOKENIZER?} \
load_parameters_path=${MAXTEXT_CKPT_PATH?} \
@@ -157,7 +157,7 @@ The overview of what this run will do is as follows:
Run the following command for GSPO:

```
-python3 -m src.maxtext.trainers.post_train.rl.train_rl src/maxtext/configs/post_train/rl.yml \
+python3 -m maxtext.trainers.post_train.rl.train_rl \
model_name=${MODEL?} \
tokenizer_path=${TOKENIZER?} \
load_parameters_path=${MAXTEXT_CKPT_PATH?} \
4 changes: 2 additions & 2 deletions docs/tutorials/posttraining/rl_on_multi_host.md
@@ -196,7 +196,7 @@ xpk workload create-pathways --workload ${WORKLOAD?} \
--tpu-type=${TPU_TYPE?} --num-slices=1 \
--project=${PROJECT_ID?} --priority=high \
--command "HF_TOKEN=${HF_TOKEN?} TF_CPP_MIN_LOG_LEVEL=0 JAX_PLATFORMS=proxy JAX_BACKEND_TARGET=grpc://127.0.0.1:29000 ENABLE_PATHWAYS_PERSISTENCE='1' \
-python3 -m src.maxtext.trainers.post_train.rl.train_rl src/maxtext/configs/post_train/rl.yml \
+python3 -m maxtext.trainers.post_train.rl.train_rl \
model_name=${MODEL?} \
tokenizer_path=${TOKENIZER?} \
load_parameters_path=${MAXTEXT_CKPT_PATH?} \
@@ -213,7 +213,7 @@ xpk workload create-pathways --workload ${WORKLOAD?} \
--tpu-type=${TPU_TYPE?} --num-slices=1 \
--project=${PROJECT_ID?} --priority=high \
--command "HF_TOKEN=${HF_TOKEN?} TF_CPP_MIN_LOG_LEVEL=0 JAX_PLATFORMS=proxy JAX_BACKEND_TARGET=grpc://127.0.0.1:29000 ENABLE_PATHWAYS_PERSISTENCE='1' \
-python3 -m src.maxtext.trainers.post_train.rl.train_rl src/maxtext/configs/post_train/rl.yml \
+python3 -m maxtext.trainers.post_train.rl.train_rl \
model_name=${MODEL?} \
tokenizer_path=${TOKENIZER?} \
load_parameters_path=${MAXTEXT_CKPT_PATH?} \
2 changes: 1 addition & 1 deletion docs/tutorials/posttraining/sft.md
@@ -88,7 +88,7 @@ export PRE_TRAINED_MODEL_CKPT_PATH=<gcs path for MaxText checkpoint> # e.g., gs:
Now you are ready to run SFT using the following command:

```sh
-python3 -m maxtext.trainers.post_train.sft.train_sft src/maxtext/configs/post_train/sft.yml \
+python3 -m maxtext.trainers.post_train.sft.train_sft \
run_name=${RUN_NAME?} \
base_output_directory=${BASE_OUTPUT_DIRECTORY?} \
model_name=${PRE_TRAINED_MODEL?} \
4 changes: 2 additions & 2 deletions docs/tutorials/posttraining/sft_on_multi_host.md
@@ -143,7 +143,7 @@ xpk workload create \
--workload=${WORKLOAD_NAME?} \
--tpu-type=${TPU_TYPE?} \
--num-slices=${TPU_SLICE?} \
-  --command "python3 -m maxtext.trainers.post_train.sft.train_sft src/maxtext/configs/post_train/sft.yml run_name=${WORKLOAD_NAME?} base_output_directory=${OUTPUT_PATH?} model_name=${MODEL_NAME?} load_parameters_path=${MODEL_CHECKPOINT_PATH?} hf_access_token=${HF_TOKEN?} tokenizer_path=${TOKENIZER_PATH?} per_device_batch_size=1 steps=${STEPS?} profiler=xplane hf_path=${DATASET_NAME?} train_split=${TRAIN_SPLIT?} train_data_columns=${TRAIN_DATA_COLUMNS?}"
+  --command "python3 -m maxtext.trainers.post_train.sft.train_sft run_name=${WORKLOAD_NAME?} base_output_directory=${OUTPUT_PATH?} model_name=${MODEL_NAME?} load_parameters_path=${MODEL_CHECKPOINT_PATH?} hf_access_token=${HF_TOKEN?} tokenizer_path=${TOKENIZER_PATH?} per_device_batch_size=1 steps=${STEPS?} profiler=xplane hf_path=${DATASET_NAME?} train_split=${TRAIN_SPLIT?} train_data_columns=${TRAIN_DATA_COLUMNS?}"
```

Once the fine-tuning is completed, you can access your model checkpoints at `$OUTPUT_PATH/$WORKLOAD_NAME/checkpoints`.
@@ -159,7 +159,7 @@ xpk workload create-pathways \
--workload=${WORKLOAD_NAME?} \
--tpu-type=${TPU_TYPE?} \
--num-slices=${TPU_SLICE?} \
-  --command="JAX_PLATFORMS=proxy JAX_BACKEND_TARGET=grpc://127.0.0.1:29000 ENABLE_PATHWAYS_PERSISTENCE=1 python3 -m maxtext.trainers.post_train.sft.train_sft src/maxtext/configs/post_train/sft.yml run_name=${WORKLOAD_NAME?} base_output_directory=${OUTPUT_PATH?} model_name=${MODEL_NAME?} load_parameters_path=${MODEL_CHECKPOINT_PATH?} hf_access_token=${HF_TOKEN?} tokenizer_path=${TOKENIZER_PATH?} per_device_batch_size=1 steps=${STEPS?} profiler=xplane checkpoint_storage_use_zarr3=False checkpoint_storage_use_ocdbt=False enable_single_controller=True"
+  --command="JAX_PLATFORMS=proxy JAX_BACKEND_TARGET=grpc://127.0.0.1:29000 ENABLE_PATHWAYS_PERSISTENCE=1 python3 -m maxtext.trainers.post_train.sft.train_sft run_name=${WORKLOAD_NAME?} base_output_directory=${OUTPUT_PATH?} model_name=${MODEL_NAME?} load_parameters_path=${MODEL_CHECKPOINT_PATH?} hf_access_token=${HF_TOKEN?} tokenizer_path=${TOKENIZER_PATH?} per_device_batch_size=1 steps=${STEPS?} profiler=xplane checkpoint_storage_use_zarr3=False checkpoint_storage_use_ocdbt=False enable_single_controller=True"
```

Once the fine-tuning is completed, you can access your model checkpoints at `$OUTPUT_PATH/$WORKLOAD_NAME/checkpoints`.
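The checkpoint location is simply the concatenation of the values exported before creating the workload. A minimal sketch, assuming placeholder bucket and workload names:

```shell
# Assumptions: OUTPUT_PATH and WORKLOAD_NAME match the values you exported
# earlier; the names below are placeholders for illustration only.
OUTPUT_PATH="gs://my-bucket/sft-runs"
WORKLOAD_NAME="sft-demo"
CKPT_DIR="${OUTPUT_PATH}/${WORKLOAD_NAME}/checkpoints"
echo "${CKPT_DIR}"   # → gs://my-bucket/sft-runs/sft-demo/checkpoints

# To browse the saved checkpoints, e.g. with gsutil:
# gsutil ls "${CKPT_DIR}"
```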
6 changes: 3 additions & 3 deletions docs/tutorials/pretraining.md
@@ -35,7 +35,7 @@ We can use this **command** for pretraining:

```bash
# replace base_output_directory with your bucket
-python3 -m maxtext.trainers.pre_train.train maxtext/configs/base.yml \
+python3 -m maxtext.trainers.pre_train.train \
base_output_directory=gs://runner-maxtext-logs run_name=demo \
model_name=deepseek2-16b per_device_batch_size=1 steps=10 max_target_length=2048 enable_checkpointing=false \
dataset_type=hf hf_path=allenai/c4 hf_data_dir=en train_split=train \
@@ -102,7 +102,7 @@ This **command** shows pretraining with Grain pipeline, along with evaluation:

```bash
# replace DATASET_GCS_BUCKET and base_output_directory with your buckets
-python3 -m maxtext.trainers.pre_train.train maxtext/configs/base.yml \
+python3 -m maxtext.trainers.pre_train.train \
base_output_directory=gs://runner-maxtext-logs run_name=demo \
model_name=deepseek2-16b per_device_batch_size=1 steps=10 max_target_length=2048 enable_checkpointing=false \
dataset_type=grain grain_file_type=arrayrecord grain_train_files=/tmp/gcsfuse/array-record/c4/en/3.0.1/c4-train.array_record* grain_worker_count=2 \
@@ -139,7 +139,7 @@ This **command** shows pretraining with TFDS pipeline, along with evaluation:

```bash
# replace base_output_directory and dataset_path with your buckets
-python3 -m maxtext.trainers.pre_train.train maxtext/configs/base.yml \
+python3 -m maxtext.trainers.pre_train.train \
base_output_directory=gs://runner-maxtext-logs run_name=demo \
model_name=deepseek2-16b per_device_batch_size=1 steps=10 max_target_length=2048 enable_checkpointing=false \
dataset_type=tfds dataset_path=gs://maxtext-dataset dataset_name='c4/en:3.0.1' train_split=train \
@@ -66,7 +66,7 @@ If a ground-truth version isn't available, you'll need to debug the conversion m
3. After the conversion is done, run a decode to check the correctness of the generated code.
Example command:
```bash
-python3 -m maxtext.inference.decode src/maxtext/configs/base.yml model_name=gemma3-4b tokenizer_path=src/maxtext/assets/tokenizers/tokenizer.gemma3 \
+python3 -m maxtext.inference.decode model_name=gemma3-4b tokenizer_path=src/maxtext/assets/tokenizers/tokenizer.gemma3 \
load_parameters_path=<Your-converted-ckpt-path> per_device_batch_size=1 run_name=ht_test \
max_prefill_predict_length=8 max_target_length=16 steps=1 async_checkpointing=false scan_layers=true \
prompt='I love to' attention='dot_product'