
CB accuracy demo update (#4021)#4023

Open
mzegla wants to merge 1 commit intomainfrom
cb_accuracy_demo_update
Conversation


@mzegla mzegla commented Feb 27, 2026

No description provided.

Co-authored-by: Natalia Groza <natalia.groza@intel.com>

Copilot AI left a comment


Pull request overview

Updates the continuous batching accuracy demo documentation to reflect newer Llama 3.1 model IDs and to standardize the command examples for Linux shell usage.

Changes:

  • Switches Meta-Llama model references from Llama 3 to Llama 3.1 in export and lm-eval examples.
  • Standardizes code fences to bash and adds a note that steps were verified on Linux.

Comment on lines +44 to +46

```diff
 lm-eval --model local-chat-completions --tasks gsm8k --model_args model=meta-llama/Meta-Llama-3.1-8B-Instruct,base_url=http://localhost:8000/v3/chat/completions,num_concurrent=1,max_retries=3,tokenized_requests=False --verbosity DEBUG --log_samples --output_path test/ --seed 1 --apply_chat_template --limit 100

-local-chat-completions ({'model': 'meta-llama/Meta-Llama-3-8B-Instruct', 'base_url': 'http://localhost:8000/v3/chat/completions', 'num_concurrent': 10, 'max_retries': 3, 'tokenized_requests': False}), gen_kwargs: ({}), limit: 100.0, num_fewshot: None, batch_size: 1
+local-chat-completions ({'model': 'meta-llama/Meta-Llama-3.1-8B-Instruct', 'base_url': 'http://localhost:8000/v3/chat/completions', 'num_concurrent': 10, 'max_retries': 3, 'tokenized_requests': False}), gen_kwargs: ({}), limit: 100.0, num_fewshot: None, batch_size: 1
```

Copilot AI Feb 27, 2026


The example output shows num_concurrent: 10 while the command above sets num_concurrent=1. This is confusing for readers trying to reproduce the run; please make the command and the captured output consistent (either update the command args or regenerate/update the output snippet).
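This kind of drift between a documented command and its captured output can be checked mechanically. A minimal sketch: the `parse_model_args` helper is hypothetical (not part of lm-eval), but the `model_args` string it parses is taken verbatim from the command above, and lm-eval does accept it as comma-separated `key=value` pairs:

```python
# Hypothetical helper: split an lm-eval style model_args string into a dict
# so the documented command can be compared against the logged config.
def parse_model_args(raw: str) -> dict:
    out = {}
    for pair in raw.split(","):
        key, _, value = pair.partition("=")
        out[key] = value
    return out

# model_args taken verbatim from the documented command
cmd_args = parse_model_args(
    "model=meta-llama/Meta-Llama-3.1-8B-Instruct,"
    "base_url=http://localhost:8000/v3/chat/completions,"
    "num_concurrent=1,max_retries=3,tokenized_requests=False"
)

# Value echoed in the captured output snippet above
logged_num_concurrent = "10"

# The mismatch the review points out
print(cmd_args["num_concurrent"], "vs", logged_num_concurrent)
```

Running a check like this against doc snippets would flag the `1` vs `10` discrepancy before the documentation ships.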

````diff
-```console
-lm-eval --model local-completions --tasks gsm8k --model_args model=meta-llama/Meta-Llama-3-8B,base_url=http://localhost:8000/v3/completions,num_concurrent=1,max_retries=3,tokenized_requests=False --verbosity DEBUG --log_samples --output_path results/ --seed 1 --limit 100
+```bash
+lm-eval --model local-completions --tasks gsm8k --model_args model=meta-llama/Meta-Llama-3.1-8B,base_url=http://localhost:8000/v3/completions,num_concurrent=1,max_retries=3,tokenized_requests=False --verbosity DEBUG --log_samples --output_path results/ --seed 1 --limit 100
````

Copilot AI Feb 27, 2026


The example output shows num_concurrent: 10 while the command above sets num_concurrent=1. Please align the command and the captured output snippet so the documentation is reproducible.

Suggested change

```diff
-lm-eval --model local-completions --tasks gsm8k --model_args model=meta-llama/Meta-Llama-3.1-8B,base_url=http://localhost:8000/v3/completions,num_concurrent=1,max_retries=3,tokenized_requests=False --verbosity DEBUG --log_samples --output_path results/ --seed 1 --limit 100
+lm-eval --model local-completions --tasks gsm8k --model_args model=meta-llama/Meta-Llama-3.1-8B,base_url=http://localhost:8000/v3/completions,num_concurrent=10,max_retries=3,tokenized_requests=False --verbosity DEBUG --log_samples --output_path results/ --seed 1 --limit 100
```

The [lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness) framework provides a convenient method of evaluating the quality of the model exposed over OpenAI API.
It reports the end-to-end quality of the served model from the client application's point of view.
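As a quick smoke test before launching a full lm-eval run, one can post a single OpenAI-style chat completion to the same endpoint the harness targets. A minimal sketch, assuming the demo's model server is listening on `localhost:8000`; `build_chat_request` and `post_chat` are illustrative helpers, not lm-eval internals:

```python
import json
import urllib.request

# Endpoint used by the lm-eval examples in this demo
BASE_URL = "http://localhost:8000/v3/chat/completions"

def build_chat_request(model: str, prompt: str, max_tokens: int = 64) -> dict:
    """Build an OpenAI-style chat-completions payload."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }

def post_chat(payload: dict) -> str:
    """POST the payload and return the first choice's text.
    Requires the model server from this demo to be running."""
    req = urllib.request.Request(
        BASE_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]

payload = build_chat_request("meta-llama/Meta-Llama-3.1-8B-Instruct", "What is 2+2?")
# print(post_chat(payload))  # uncomment with the server running
```

If this single request succeeds, the endpoint, model name, and base URL passed to lm-eval's `model_args` are all wired up correctly.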

**Note**: Below steps have been verified on Linux

Copilot AI Feb 27, 2026


Minor grammar/punctuation: consider adding a period and/or clarifying the scope (e.g., "verified on Linux only") so readers on other OSes know what to expect.

Suggested change

```diff
-**Note**: Below steps have been verified on Linux
+**Note:** The following steps have been verified on Linux only.
```