Conversation
Co-authored-by: Natalia Groza <natalia.groza@intel.com>
Pull request overview
Updates the continuous batching accuracy demo documentation to reflect newer Llama 3.1 model IDs and to standardize the command examples for Linux shell usage.
Changes:
- Switches Meta-Llama model references from Llama 3 to Llama 3.1 in export and `lm-eval` examples.
- Standardizes code fences to `bash` and adds a note that steps were verified on Linux.
```diff
 lm-eval --model local-chat-completions --tasks gsm8k --model_args model=meta-llama/Meta-Llama-3.1-8B-Instruct,base_url=http://localhost:8000/v3/chat/completions,num_concurrent=1,max_retries=3,tokenized_requests=False --verbosity DEBUG --log_samples --output_path test/ --seed 1 --apply_chat_template --limit 100
-local-chat-completions ({'model': 'meta-llama/Meta-Llama-3-8B-Instruct', 'base_url': 'http://localhost:8000/v3/chat/completions', 'num_concurrent': 10, 'max_retries': 3, 'tokenized_requests': False}), gen_kwargs: ({}), limit: 100.0, num_fewshot: None, batch_size: 1
+local-chat-completions ({'model': 'meta-llama/Meta-Llama-3.1-8B-Instruct', 'base_url': 'http://localhost:8000/v3/chat/completions', 'num_concurrent': 10, 'max_retries': 3, 'tokenized_requests': False}), gen_kwargs: ({}), limit: 100.0, num_fewshot: None, batch_size: 1
```
The example output shows num_concurrent: 10 while the command above sets num_concurrent=1. This is confusing for readers trying to reproduce the run; please make the command and the captured output consistent (either update the command args or regenerate/update the output snippet).
````diff
-```console
-lm-eval --model local-completions --tasks gsm8k --model_args model=meta-llama/Meta-Llama-3-8B,base_url=http://localhost:8000/v3/completions,num_concurrent=1,max_retries=3,tokenized_requests=False --verbosity DEBUG --log_samples --output_path results/ --seed 1 --limit 100
+```bash
+lm-eval --model local-completions --tasks gsm8k --model_args model=meta-llama/Meta-Llama-3.1-8B,base_url=http://localhost:8000/v3/completions,num_concurrent=1,max_retries=3,tokenized_requests=False --verbosity DEBUG --log_samples --output_path results/ --seed 1 --limit 100
````
The example output shows num_concurrent: 10 while the command above sets num_concurrent=1. Please align the command and the captured output snippet so the documentation is reproducible.
Suggested change:
```diff
-lm-eval --model local-completions --tasks gsm8k --model_args model=meta-llama/Meta-Llama-3.1-8B,base_url=http://localhost:8000/v3/completions,num_concurrent=1,max_retries=3,tokenized_requests=False --verbosity DEBUG --log_samples --output_path results/ --seed 1 --limit 100
+lm-eval --model local-completions --tasks gsm8k --model_args model=meta-llama/Meta-Llama-3.1-8B,base_url=http://localhost:8000/v3/completions,num_concurrent=10,max_retries=3,tokenized_requests=False --verbosity DEBUG --log_samples --output_path results/ --seed 1 --limit 100
```
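A common stumbling point in these commands is that `--model_args` is a single comma-separated string of `key=value` pairs, so `num_concurrent` has to be edited inside that string rather than passed as a separate flag. A minimal sketch of how such a string decomposes (`parse_model_args` is a hypothetical helper for illustration, not lm-eval's actual parser):

```python
def parse_model_args(model_args: str) -> dict:
    """Split a comma-separated key=value string into a dict.

    Hypothetical helper for illustration only; lm-eval parses its
    --model_args option internally in a similar key=value fashion.
    """
    return dict(pair.split("=", 1) for pair in model_args.split(","))

args = parse_model_args(
    "model=meta-llama/Meta-Llama-3.1-8B,"
    "base_url=http://localhost:8000/v3/completions,"
    "num_concurrent=10,max_retries=3,tokenized_requests=False"
)
print(args["num_concurrent"])  # -> 10
```

Note that `split("=", 1)` splits only on the first `=`, which keeps values such as the `base_url` URL intact.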
```diff
 The [lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness) framework provides a convenient method of evaluating the quality of the model exposed over OpenAI API.
 It reports end to end quality of served model from the client application point of view.

+**Note**: Below steps have been verified on Linux
```
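For readers unfamiliar with what "exposed over OpenAI API" means in practice, here is a minimal sketch of the kind of completion request the harness issues against the served endpoint (the payload shape follows the OpenAI-style completions API; the URL is the one used in the commands above, and `build_completion_request` is an illustrative helper, not part of lm-eval):

```python
import json

# Endpoint used throughout the docs under review.
BASE_URL = "http://localhost:8000/v3/completions"

def build_completion_request(model: str, prompt: str, max_tokens: int = 16) -> str:
    """Build a JSON body for an OpenAI-style /completions call.

    Illustrative only: lm-eval constructs these requests internally;
    this just shows the payload shape a served model must accept.
    """
    return json.dumps({"model": model, "prompt": prompt, "max_tokens": max_tokens})

body = build_completion_request("meta-llama/Meta-Llama-3.1-8B", "The capital of France is")
```

Because the harness acts purely as an HTTP client, the accuracy it reports includes the whole serving stack (tokenization, scheduling, generation), not just the bare model.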
Minor grammar/punctuation: consider adding a period and/or clarifying the scope (e.g., "verified on Linux only") so readers on other OSes know what to expect.
Suggested change:
```diff
-**Note**: Below steps have been verified on Linux
+**Note:** The following steps have been verified on Linux only.
```