From 14472dc33ba601a5c687e78a08df4239a2fa851b Mon Sep 17 00:00:00 2001
From: "Roberto A. Foglietta"
Date: Sun, 11 Jan 2026 14:07:00 +0100
Subject: [PATCH 01/16] Add Ubuntu quick start section to README

Added quick start instructions for Ubuntu installation.

Signed-off-by: Roberto A. Foglietta
---
 README.md | 19 +++++++++++++++++++
 1 file changed, 19 insertions(+)

diff --git a/README.md b/README.md
index 798c0e951..7de650489 100644
--- a/README.md
+++ b/README.md
@@ -157,6 +157,25 @@ This project is based on the [llama.cpp](https://github.com/ggerganov/llama.cpp)
 
 ## Installation
 
+### Ubuntu quick start
+
+```
+sudo apt install ccache clang libomp-dev
+
+git clone --recursive https://github.com/microsoft/BitNet.git
+cd BitNet/
+
+mkdir -p models/BitNet-b1.58-2B-4T/
+link="https://huggingface.co/microsoft/bitnet-b1.58-2B-4T-gguf/resolve/main/ggml-model-i2_s.gguf"
+wget -c "$link" -O models/BitNet-b1.58-2B-4T/ggml-model-i2_s.gguf
+
+python3 setup_env.py -md models/BitNet-b1.58-2B-4T/ -q i2_s
+
+sysprompt="You are a helpful assistant"
+python3 run_inference.py -m models/BitNet-b1.58-2B-4T/ggml-model-i2_s.gguf -p "$sysprompt" -cnv --temp 0.3
+
+```
+
 ### Requirements
 - python>=3.9
 - cmake>=3.22

From 961b02d32d7306d5de5dbd55625f729bce0265ef Mon Sep 17 00:00:00 2001
From: "Roberto A. Foglietta"
Date: Sun, 11 Jan 2026 14:10:30 +0100
Subject: [PATCH 02/16] Update model download link and inference command

Keep the text inside an 80-column-wide code window

Signed-off-by: Roberto A. Foglietta
---
 README.md | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/README.md b/README.md
index 7de650489..fefb28730 100644
--- a/README.md
+++ b/README.md
@@ -166,13 +166,15 @@ git clone --recursive https://github.com/microsoft/BitNet.git
 cd BitNet/
 
 mkdir -p models/BitNet-b1.58-2B-4T/
-link="https://huggingface.co/microsoft/bitnet-b1.58-2B-4T-gguf/resolve/main/ggml-model-i2_s.gguf"
+url="https://huggingface.co/microsoft/bitnet-b1.58-2B-4T-gguf"
+link="$url/resolve/main/ggml-model-i2_s.gguf"
 wget -c "$link" -O models/BitNet-b1.58-2B-4T/ggml-model-i2_s.gguf
 
 python3 setup_env.py -md models/BitNet-b1.58-2B-4T/ -q i2_s
 
 sysprompt="You are a helpful assistant"
-python3 run_inference.py -m models/BitNet-b1.58-2B-4T/ggml-model-i2_s.gguf -p "$sysprompt" -cnv --temp 0.3
+model="models/BitNet-b1.58-2B-4T/ggml-model-i2_s.gguf"
+python3 run_inference.py -m $model -p "$sysprompt" -cnv --temp 0.3
 
 ```

From 3ee53d1a5802151a8bc1ba216a473ff0adb0f923 Mon Sep 17 00:00:00 2001
From: "Roberto A. Foglietta"
Date: Sun, 11 Jan 2026 14:19:15 +0100
Subject: [PATCH 03/16] Update model download and setup instructions in README

Stronger use of variables for bash code flexibility

Signed-off-by: Roberto A. Foglietta
---
 README.md | 13 +++++++------
 1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/README.md b/README.md
index fefb28730..bf091ece4 100644
--- a/README.md
+++ b/README.md
@@ -165,16 +165,17 @@ sudo apt install ccache clang libomp-dev
 
 git clone --recursive https://github.com/microsoft/BitNet.git
 cd BitNet/
 
-mkdir -p models/BitNet-b1.58-2B-4T/
+gguf="ggml-model-i2_s.gguf"
+mdir="models/BitNet-b1.58-2B-4T/"
 url="https://huggingface.co/microsoft/bitnet-b1.58-2B-4T-gguf"
-link="$url/resolve/main/ggml-model-i2_s.gguf"
-wget -c "$link" -O models/BitNet-b1.58-2B-4T/ggml-model-i2_s.gguf
+link="$url/resolve/main/$gguf"
+modprm="$mdir/$gguf"
+mkdir -p $mdir && wget -c "$link" -O $modprm
 
-python3 setup_env.py -md models/BitNet-b1.58-2B-4T/ -q i2_s
+python3 setup_env.py -md $mdir -q i2_s
 
 sysprompt="You are a helpful assistant"
-model="models/BitNet-b1.58-2B-4T/ggml-model-i2_s.gguf"
-python3 run_inference.py -m $model -p "$sysprompt" -cnv --temp 0.3
+python3 run_inference.py -m $modprm -p "$sysprompt" -cnv --temp 0.3
 
 ```

From fe9bf30171cc87e0c395986545f1ca3f1d1ad6df Mon Sep 17 00:00:00 2001
From: "Roberto A. Foglietta"
Date: Sun, 11 Jan 2026 17:05:48 +0100
Subject: [PATCH 04/16] Update Ubuntu quick start instructions in README

Added swapoff command and updated model directory path in Ubuntu quick
start instructions. Included additional options for running inference
with multiple threads.

Signed-off-by: Roberto A. Foglietta
---
 README.md | 12 +++++++++---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/README.md b/README.md
index bf091ece4..e510e2b54 100644
--- a/README.md
+++ b/README.md
@@ -161,21 +161,27 @@ This project is based on the [llama.cpp](https://github.com/ggerganov/llama.cpp)
 
 ```
 sudo apt install ccache clang libomp-dev
+sudo swapoff -a
 
 git clone --recursive https://github.com/microsoft/BitNet.git
 cd BitNet/
 
 gguf="ggml-model-i2_s.gguf"
-mdir="models/BitNet-b1.58-2B-4T/"
+mdir="models/BitNet-b1.58-2B-4T"
 url="https://huggingface.co/microsoft/bitnet-b1.58-2B-4T-gguf"
 link="$url/resolve/main/$gguf"
 modprm="$mdir/$gguf"
-mkdir -p $mdir && wget -c "$link" -O $modprm 
+mkdir -p $mdir && wget -c "$link" -O $modprm
 
+pip install -r requirements.txt
 python3 setup_env.py -md $mdir -q i2_s
+cmake --build build --config Release
 
 sysprompt="You are a helpful assistant"
-python3 run_inference.py -m $modprm -p "$sysprompt" -cnv --temp 0.3
+python3 run_inference.py -m $modprm -p "$sysprompt" -cnv --temp 0.3 -t $(nproc)
+# Alternative with a file prompt
+pretkns="--override-kv tokenizer.ggml.pre=str:llama3"
+llama-cli -m $modprm -f ${file_prompt} -cnv --temp 0.3 -t $(nproc) $pretkns
 
 ```
 

From c8e95a5a446ed7c8fcc269dbec47def72e4c9538 Mon Sep 17 00:00:00 2001
From: "Roberto A. Foglietta"
Date: Sun, 11 Jan 2026 19:19:09 +0100
Subject: [PATCH 05/16] Enhance README with file prompt usage details

Updated instructions for running inference with file prompts and
specific parameters.

Signed-off-by: Roberto A. Foglietta
---
 README.md | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/README.md b/README.md
index e510e2b54..6703abbe6 100644
--- a/README.md
+++ b/README.md
@@ -177,11 +177,16 @@ pip install -r requirements.txt
 python3 setup_env.py -md $mdir -q i2_s
 cmake --build build --config Release
+export PATH="$PATH:$PWD/build/bin/"
 
 sysprompt="You are a helpful assistant"
 python3 run_inference.py -m $modprm -p "$sysprompt" -cnv --temp 0.3 -t $(nproc)
-# Alternative with a file prompt
+
+# Alternative with a file prompt and specific parameters
+tempr="--temp 0.3 --dynatemp-range 0.1"
+file_prompt=${file_prompt:-/dev/null -p '$sysprompt'}
 pretkns="--override-kv tokenizer.ggml.pre=str:llama3"
-llama-cli -m $modprm -f ${file_prompt} -cnv --temp 0.3 -t $(nproc) $pretkns
+intcnv="-i --multiline-input -cnv -c 4096 -b 2048"
+llama-cli -m $modprm -f ${file_prompt} -t $(nproc) $pretkns $tempr $intcnv
 
 ```

From 655761974cc26af82e5613e1585cf9d824a239b1 Mon Sep 17 00:00:00 2001
From: "Roberto A. Foglietta"
Date: Sun, 11 Jan 2026 20:33:05 +0100
Subject: [PATCH 06/16] Upgrade pip and install requirements with filtering

Upgrade pip before installing requirements and filter output.

Signed-off-by: Roberto A. Foglietta
---
 README.md | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/README.md b/README.md
index 6703abbe6..ce9f1ad10 100644
--- a/README.md
+++ b/README.md
@@ -173,7 +173,8 @@ link="$url/resolve/main/$gguf"
 modprm="$mdir/$gguf"
 mkdir -p $mdir && wget -c "$link" -O $modprm
 
-pip install -r requirements.txt
+{ python3 -m pip install --upgrade pip; pip install -r requirements.txt; }\
+  | grep -ve "^Requirement already satisfied:"
 python3 setup_env.py -md $mdir -q i2_s
 cmake --build build --config Release

From 93d32254598756ad7e3742983603e53c94146310 Mon Sep 17 00:00:00 2001
From: "Roberto A. Foglietta"
Date: Sun, 11 Jan 2026 20:51:49 +0100
Subject: [PATCH 07/16] Modify inference command parameters in README

Updated parameters for inference command in README.

Signed-off-by: Roberto A. Foglietta
---
 README.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/README.md b/README.md
index ce9f1ad10..6085f2c35 100644
--- a/README.md
+++ b/README.md
@@ -183,10 +183,10 @@ sysprompt="You are a helpful assistant"
 python3 run_inference.py -m $modprm -p "$sysprompt" -cnv --temp 0.3 -t $(nproc)
 
 # Alternative with a file prompt and specific parameters
-tempr="--temp 0.3 --dynatemp-range 0.1"
+tempr="--temp 0.3 --dynatemp-range 0.1 --no-warmup"
 file_prompt=${file_prompt:-/dev/null -p '$sysprompt'}
 pretkns="--override-kv tokenizer.ggml.pre=str:llama3"
-intcnv="-i --multiline-input -cnv -c 4096 -b 2048"
+intcnv="-i --multiline-input -cnv -c 8192 -b 4096"
 llama-cli -m $modprm -f ${file_prompt} -t $(nproc) $pretkns $tempr $intcnv

From 71e90cf28b3ed186383855b4e2d5f571b4cd29ec Mon Sep 17 00:00:00 2001
From: "Roberto A. Foglietta"
Date: Mon, 12 Jan 2026 00:25:58 +0100
Subject: [PATCH 08/16] Update tokenizer and input parameters in README

Some more parameters added to the alternative llama start command

Signed-off-by: Roberto A. Foglietta
---
 README.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/README.md b/README.md
index 6085f2c35..ee9babde6 100644
--- a/README.md
+++ b/README.md
@@ -185,8 +185,8 @@ python3 run_inference.py -m $modprm -p "$sysprompt" -cnv --temp 0.3 -t $(nproc)
 
 # Alternative with a file prompt and specific parameters
 tempr="--temp 0.3 --dynatemp-range 0.1 --no-warmup"
 file_prompt=${file_prompt:-/dev/null -p '$sysprompt'}
-pretkns="--override-kv tokenizer.ggml.pre=str:llama3"
-intcnv="-i --multiline-input -cnv -c 8192 -b 4096"
+pretkns="--override-kv tokenizer.ggml.pre=str:llama3 --mlock"
+intcnv="-i --multiline-input -cnv -c 8192 -b 4096 -co --keep -1 -n -1"
 llama-cli -m $modprm -f ${file_prompt} -t $(nproc) $pretkns $tempr $intcnv

From a6fb95f1b330e50c01cc36c8c6eeeb8181f50f2f Mon Sep 17 00:00:00 2001
From: "Roberto A. Foglietta"
Date: Sat, 17 Jan 2026 00:06:57 +0100
Subject: [PATCH 09/16] Add bash functions for perplexity and token counting

Added useful bash functions for analyzing files and counting tokens.

Signed-off-by: Roberto A. Foglietta
---
 README.md | 26 ++++++++++++++++++++++++++
 1 file changed, 26 insertions(+)

diff --git a/README.md b/README.md
index ee9babde6..43dd0d99c 100644
--- a/README.md
+++ b/README.md
@@ -189,6 +189,32 @@ pretkns="--override-kv tokenizer.ggml.pre=str:llama3 --mlock"
 intcnv="-i --multiline-input -cnv -c 8192 -b 4096 -co --keep -1 -n -1"
 llama-cli -m $modprm -f ${file_prompt} -t $(nproc) $pretkns $tempr $intcnv
 
+# Useful bash functions
+
+function perplexity() {
+    for f in "$@"; do
+        printf "Analysing '$f'\nwait for results ...\n"
+        llama-perplexity -m $modprm -t $(nproc) $pretkns $tempr -f "$f" 2>&1 | grep PPL
+    done
+}
+
+function llama3-token-counter() {
+    for f in "$@"; do
+        llama-cli -m $modprm -t $(nproc) $pretkns -f "$f" -c 1 2>&1 |\
+            sed -ne "s/.*too long (\([0-9]\+\) tok.*/\\1/p"
+    done
+}
+
+function round_to_even() { declare -i n=${1:-0}; echo $[n${2:-}+((n%2)*${3:-1})]; }
+function min() { declare -i m=${1:-} a=${2:-0}; [ $m -gt $a ] && m=$a; echo $m; }
+function max() { declare -i m=${1:-} a=${2:-0}; [ $m -lt $a ] && m=$a; echo $m; }
+
+function session_tokens() {
+    for f in "$@"; do
+        timeout 1 llama-cli -m $modprm -f "$f" -t $(nproc) $pretkns $tempr $extra_params \
+            --prompt-cache-ro -n 1 2>&1 | sed -ne "s/main: loaded .* \([0-9]\+\) tokens/\1/p"
+    done
+}
 ```
 
 ### Requirements

From e994d2562e311e3b2ab197b1d9c8f071a5a0d54 Mon Sep 17 00:00:00 2001
From: "Roberto A. Foglietta"
Date: Sat, 17 Jan 2026 00:12:23 +0100
Subject: [PATCH 10/16] Update intcnv parameters in README.md

Standard parameters for BitNet + better UX interaction

Signed-off-by: Roberto A. Foglietta
---
 README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/README.md b/README.md
index 43dd0d99c..4aaa270c4 100644
--- a/README.md
+++ b/README.md
@@ -186,7 +186,7 @@ python3 run_inference.py -m $modprm -p "$sysprompt" -cnv --temp 0.3 -t $(nproc)
 tempr="--temp 0.3 --dynatemp-range 0.1 --no-warmup"
 file_prompt=${file_prompt:-/dev/null -p '$sysprompt'}
 pretkns="--override-kv tokenizer.ggml.pre=str:llama3 --mlock"
-intcnv="-i --multiline-input -cnv -c 8192 -b 4096 -co --keep -1 -n -1"
+intcnv="-i --multiline-input -cnv -co -c 4096 -b 2048 -ub 256 --keep -1 -n -1"
 llama-cli -m $modprm -f ${file_prompt} -t $(nproc) $pretkns $tempr $intcnv
 
 # Useful bash functions

From d170f908ab7bfabdd8c45e29ef7e11acfb1a9403 Mon Sep 17 00:00:00 2001
From: "Roberto A. Foglietta"
Date: Sat, 17 Jan 2026 00:15:11 +0100
Subject: [PATCH 11/16] Reorder arguments in run_inference.py command

Signed-off-by: Roberto A. Foglietta
---
 README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/README.md b/README.md
index 4aaa270c4..ed5ac2022 100644
--- a/README.md
+++ b/README.md
@@ -180,7 +180,7 @@ cmake --build build --config Release
 export PATH="$PATH:$PWD/build/bin/"
 
 sysprompt="You are a helpful assistant"
-python3 run_inference.py -m $modprm -p "$sysprompt" -cnv --temp 0.3 -t $(nproc)
+python3 run_inference.py -m $modprm -p "$sysprompt" -t $(nproc) --temp 0.3 -cnv
 
 # Alternative with a file prompt and specific parameters

From 880c1e5b4762c4d5a34b60017e98b6481bf5aec0 Mon Sep 17 00:00:00 2001
From: "Roberto A. Foglietta"
Date: Sat, 17 Jan 2026 02:15:42 +0100
Subject: [PATCH 12/16] Clean up README.md by removing functions

Removed unused and incomplete functions from README.md.

Signed-off-by: Roberto A. Foglietta
---
 README.md | 11 -----------
 1 file changed, 11 deletions(-)

diff --git a/README.md b/README.md
index ed5ac2022..96ba87b48 100644
--- a/README.md
+++ b/README.md
@@ -204,17 +204,6 @@ function llama3-token-counter() {
         sed -ne "s/.*too long (\([0-9]\+\) tok.*/\\1/p"
     done
 }
-
-function round_to_even() { declare -i n=${1:-0}; echo $[n${2:-}+((n%2)*${3:-1})]; }
-function min() { declare -i m=${1:-} a=${2:-0}; [ $m -gt $a ] && m=$a; echo $m; }
-function max() { declare -i m=${1:-} a=${2:-0}; [ $m -lt $a ] && m=$a; echo $m; }
-
-function session_tokens() {
-    for f in "$@"; do
-        timeout 1 llama-cli -m $modprm -f "$f" -t $(nproc) $pretkns $tempr $extra_params \
-            --prompt-cache-ro -n 1 2>&1 | sed -ne "s/main: loaded .* \([0-9]\+\) tokens/\1/p"
-    done
-}
 ```
 
 ### Requirements

From 27c4e2fc918646481916e6626cc6617fd242984b Mon Sep 17 00:00:00 2001
From: "Roberto A. Foglietta"
Date: Sat, 17 Jan 2026 20:52:46 +0100
Subject: [PATCH 13/16] Add llvm-dev to Ubuntu quick start instructions

Signed-off-by: Roberto A. Foglietta
---
 README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/README.md b/README.md
index 96ba87b48..4b7c9163d 100644
--- a/README.md
+++ b/README.md
@@ -160,7 +160,7 @@ This project is based on the [llama.cpp](https://github.com/ggerganov/llama.cpp)
 ### Ubuntu quick start
 
 ```
-sudo apt install ccache clang libomp-dev
+sudo apt install ccache clang libomp-dev llvm-dev
 sudo swapoff -a
 
 git clone --recursive https://github.com/microsoft/BitNet.git

From 3c69743bd9bfcca512e3dcf6a2292c5688225b8b Mon Sep 17 00:00:00 2001
From: "Roberto A. Foglietta"
Date: Sat, 17 Jan 2026 21:25:06 +0100
Subject: [PATCH 14/16] Update usage instructions for setup_env.py

Signed-off-by: Roberto A. Foglietta
---
 README.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/README.md b/README.md
index 4b7c9163d..4540c0bdf 100644
--- a/README.md
+++ b/README.md
@@ -247,7 +247,7 @@ python setup_env.py -md models/BitNet-b1.58-2B-4T -q i2_s
 
 ```
-usage: setup_env.py [-h] [--hf-repo {1bitLLM/bitnet_b1_58-large,1bitLLM/bitnet_b1_58-3B,HF1BitLLM/Llama3-8B-1.58-100B-tokens,tiiuae/Falcon3-1B-Instruct-1.58bit,tiiuae/Falcon3-3B-Instruct-1.58bit,tiiuae/Falcon3-7B-Instruct-1.58bit,tiiuae/Falcon3-10B-Instruct-1.58bit}] [--model-dir MODEL_DIR] [--log-dir LOG_DIR] [--quant-type {i2_s,tl1}] [--quant-embd]
+usage: setup_env.py [-h] [--hf-repo {1bitLLM/bitnet_b1_58-large,1bitLLM/bitnet_b1_58-3B,HF1BitLLM/Llama3-8B-1.58-100B-tokens,tiiuae/Falcon3-1B-Instruct-1.58bit,tiiuae/Falcon3-3B-Instruct-1.58bit,tiiuae/Falcon3-7B-Instruct-1.58bit,tiiuae/Falcon3-10B-Instruct-1.58bit}] [--model-dir MODEL_DIR] [--log-dir LOG_DIR] [--quant-type {i2_s,tl1,tl2}] [--quant-embd]
                     [--use-pretuned]
 
 Setup the environment for running inference
@@ -260,7 +260,7 @@ optional arguments:
                         Directory to save/load the model
   --log-dir LOG_DIR, -ld LOG_DIR
                         Directory to save the logging info
-  --quant-type {i2_s,tl1}, -q {i2_s,tl1}
+  --quant-type {i2_s,tl1,tl2}, -q {i2_s,tl1,tl2}
                         Quantization type
   --quant-embd          Quantize the embeddings to f16
   --use-pretuned, -p    Use the pretuned kernel parameters

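As a sanity check on the variable plumbing introduced in patches 03 and 04, the composed download URL and model path can be recomputed in a plain shell. The snippet below is illustrative and not part of the patch series; all values are copied verbatim from the hunks above:

```shell
# Recompute the quick-start variables exactly as the patched README defines them
gguf="ggml-model-i2_s.gguf"
mdir="models/BitNet-b1.58-2B-4T"
url="https://huggingface.co/microsoft/bitnet-b1.58-2B-4T-gguf"
link="$url/resolve/main/$gguf"
modprm="$mdir/$gguf"

# The composed values equal the literal URL and path used before patch 03
echo "$link"
echo "$modprm"
```

Printing the two variables confirms the refactor in patch 03 is behavior-preserving: the composed strings are identical to the hard-coded ones they replaced.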
From 3cee7d880a1b4849c179f1458dc0a213c00567a0 Mon Sep 17 00:00:00 2001
From: Suhaibinator <42899065+Suhaibinator@users.noreply.github.com>
Date: Sun, 20 Oct 2024 08:16:21 -0700
Subject: [PATCH 15/16] Update README.md, fix grammar

---
 README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/README.md b/README.md
index 4540c0bdf..ba2f45a77 100644
--- a/README.md
+++ b/README.md
@@ -333,7 +333,7 @@ python utils/e2e_benchmark.py -m /path/to/model -n 200 -p 256 -t 4
    
 This command would run the inference benchmark using the model located at `/path/to/model`, generating 200 tokens from a 256 token prompt, utilizing 4 threads.  
 
-For the model layout that do not supported by any public model, we provide scripts to generate a dummy model with the given model layout, and run the benchmark on your machine:
+For the model layouts that are not supported by any public models, we provide scripts to generate a dummy model with the given model layout, and run the benchmark on your machine:
 
 ```bash
 python utils/generate-dummy-bitnet-model.py models/bitnet_b1_58-large --outfile models/dummy-bitnet-125m.tl1.gguf --outtype tl1 --model-size 125M

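The benchmark invocation touched by the patch above can be wrapped in a small dry-run helper, in the same variable-driven style the series applies to the quick start. The `e2e_cmd` function is a hypothetical sketch (it is not part of the repo or the patches); it only echoes the command line so it can be inspected before running:

```shell
# Hypothetical dry-run builder for the e2e benchmark command shown above
e2e_cmd() {
    local model="$1" ntok="$2" plen="$3" threads="$4"
    echo "python utils/e2e_benchmark.py -m $model -n $ntok -p $plen -t $threads"
}

# Reproduces the documented example: 200 tokens from a 256-token prompt, 4 threads
e2e_cmd /path/to/model 200 256 4
```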
From 4721c9eff7648165ea23ab0c31445d8ec3083a33 Mon Sep 17 00:00:00 2001
From: "Roberto A. Foglietta" 
Date: Sat, 17 Jan 2026 21:56:49 +0100
Subject: [PATCH 16/16] Add git submodule sync and update commands

Signed-off-by: Roberto A. Foglietta 
---
 README.md | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/README.md b/README.md
index ba2f45a77..eb1becd0d 100644
--- a/README.md
+++ b/README.md
@@ -165,6 +165,8 @@ sudo swapoff -a
 
 git clone --recursive https://github.com/microsoft/BitNet.git
 cd BitNet/
+git submodule sync --recursive
+git submodule update --init --recursive
 
 gguf="ggml-model-i2_s.gguf"
 mdir="models/BitNet-b1.58-2B-4T"
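The `llama3-token-counter` helper added in patch 09 works by forcing a tiny context (`-c 1`) and scraping the resulting "too long" diagnostic with `sed`. The extraction step can be exercised on its own; the sample log line below is invented to match the pattern the function greps for, and real llama-cli output may differ:

```shell
# Made-up sample line shaped like the diagnostic the sed pattern targets
sample="main: error: prompt is too long (1234 tokens, max 0)"

# Same GNU sed extraction used by llama3-token-counter in patch 09
tokens=$(printf '%s\n' "$sample" | sed -ne "s/.*too long (\([0-9]\+\) tok.*/\1/p")
echo "$tokens"
```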