# behavioral-testing

Here are 16 public repositories matching this topic...

Basaltlabs-app / Gauntlet

Community-driven behavioral reliability benchmark for LLMs. 231 probes across 19 modules, deterministic scoring, perplexity correlation, layer sensitivity mapping, quant method capture, hardware-stratified community rankings. Every test contributes to the community dataset.

benchmark mcp community-driven model-evaluation ai-evaluation llm ollama sycophancy hallucination-detection llm-testing hardware-benchmark ai-trust trust-scoring behavioral-testing llm-benchmark deterministic-scoring

Updated May 4, 2026
Python

qualixar / agentassert-abc

Formal behavioral specification and runtime enforcement for autonomous AI agents. Agent Behavioral Contracts (ABC).

formal-verification ai-agents drift-detection behavioral-testing agent-reliability qualixar agent-contracts

Updated May 24, 2026
Python

stef41 / modeldiff

Behavioral regression testing for LLMs — diff, drift, fingerprint. Zero deps.

python nlp machine-learning evaluation regression-testing fingerprinting model-comparison drift-detection llm behavioral-testing

Updated Apr 10, 2026
Python

senaayy / Computational-Cognitive-Lab

python machine-learning neuroscience computational-neuroscience cognitive-science mne-python biomedical-engineering eeg-analysis stroop-test neurotechnology behavioral-testing erp-analysis

Updated Dec 12, 2025
Python

stef41 / modeldiffx

Model behavioral diffing - compare LLM outputs across versions, detect regressions.

python testing regression-testing model-evaluation llm behavioral-testing

Updated Apr 11, 2026
Python

abdul-hamid-achik / cairntrace

Behavioral browser-spec layer for agent-in-session use. Specs declare intent+outcomes; agents execute + heal via agent-browser or Playwright. CLI + MCP server, agent-neutral.

typescript mcp browser-testing ai-agents bun e2e-testing playwright behavioral-testing agent-browser

Updated May 29, 2026
TypeScript

Ufosxm34gt / Conversational-Red-Teaming-Casebook

Bots I broke and how I broke them, documented as practice toward becoming a conversational red teamer

nlp machine-learning natural-language-processing ai chatbot transformers artificial-intelligence openai language-models ai-safety conversational-ai red-teaming ethical-ai llm prompt-engineering behavioral-testing

Updated Jul 1, 2025

JSLEEKR / agentspec

Agent behavioral testing -- YAML specs for tool calls, sequences, constraints

cli golang yaml mcp specification developer-tools testing-framework ai-agents active-project agent-testing behavioral-testing

Updated Mar 29, 2026
Go

StanislavBG / stepproof

Regression testing CLI for AI agents — define expected behaviors in YAML, run in CI, fail deploys on behavioral drift

nodejs testing cli open-source devops typescript ci-cd developer-tools regression-testing ai-agents llm ai-testing behavioral-testing

Updated Apr 6, 2026
TypeScript

ad25343 / GlassBox

Spec-driven development for GenAI applications. A working reference implementation showing behavioral spec, conformance scoring, drift detection, and model comparison — all running together.

react python observability claude fastapi observability-data llm llms anthropic genai claude-code spec-driven-development behavioral-testing

Updated May 5, 2026
TypeScript

chanikkyasaai / trajex

AI agent behavioral testing — learns what correct looks like, catches deviations automatically. Zero API keys needed.

python evaluation tracing pytest openai trajectory ai-agents langchain llm-testing behavioral-testing

Updated Apr 18, 2026
Python

iYashMaurya / LiveGate

AI deployment gate that mines real traffic, fires probes at staging, and tells you if your code will break — before your users do. Built on gitagent + Lyzr Studio.

deployment ci-cd deployment-automation opentelemetry ai-agent traffic-replay behavioral-testing lyzr gitagent eal-environment-testing

Updated Apr 10, 2026
JavaScript

harman-04 / mockito-spies-and-verification-demo

Advanced Mockito usage featuring Spies, Mocks, and behavioral verification to test a shopping cart checkout flow.

mockito junit5 java-testing behavioral-testing spy-vs-mock

Updated Feb 15, 2026
Java

ollieb89 / ai-workflow-evals

Catch AI behavioral regressions before merge. Run eval suites for prompts, agents, and workflows in GitHub Actions.

ci-cd developer-tools regression-testing eval github-actions ai-testing prompt-testing ai-quality llm-testing behavioral-testing

Updated Mar 22, 2026
TypeScript

SyncTek-LLC / specterqa

AI persona-based behavioral testing for web apps. No test scripts. YAML-configured. Vision-powered.

python testing cli qa ai vision developer-tools code-of-conduct software-quality persona playwright behavioral-testing trust-index

Updated Mar 21, 2026
Python

GenesisClawbot / llm-drift

LLM drift detector — know within 5 min when GPT-4o, Claude, or Gemini silently changes behaviour. Open source, self-hostable.

saas gemini openai regression-testing gpt claude mlops drift-detection production-ml model-testing ai-monitoring llm llmops prompt-testing llm-monitoring llm-observability behavioral-testing

Updated May 31, 2026
Python
