ImageForge

Standalone, reusable local AI image-generation service. Runs on Apple Silicon (MPS) using Stable Diffusion via diffusers, and fully offline once the model weights have been downloaded on first use. No paid APIs, no cloud inference: generation runs entirely on your machine.

Target hardware: M2 Pro, 16 GB unified memory (tested).


Two connection surfaces

| Surface | Transport | Use case |
| --- | --- | --- |
| HTTP API (FastAPI) | TCP on port 8765 | Any app, curl, browser, Python SDK |
| MCP server | stdio | AI agents: Claude Code, scripts, orchestrators |

Both surfaces share the same process-level singletons: the SD engine (load-once weights), the prompt-assist client (gemma4/Ollama), and the content-addressed cache.


Quick start

Fastest path — one command

cd ~/Developer/imageforge
./start.sh            # bootstrap deps into the venv, seed .env, launch HTTP API

start.sh verifies the SD venv, installs the light HTTP/MCP layer if anything is missing, seeds .env from .env.example on first run, reports whether the heavy SD stack is present, then launches. Other modes:

./start.sh api --reload   # HTTP API with hot-reload
./start.sh mcp            # MCP stdio server
./start.sh selftest       # MCP self-test (fast, no GPU)
./start.sh check          # bootstrap + dependency report, then exit

The manual steps below are the detailed equivalent if you prefer to run each part yourself (or are not on a Mac — see also the CPU-only Dockerfile).

1. Install dependencies

All SD components run under the existing venv at ~/.floor-voice-studio/venv-sd. Install the ImageForge HTTP/MCP layer into it:

cd ~/Developer/imageforge

arch -arm64 $HOME/.floor-voice-studio/venv-sd/bin/pip install \
    fastapi "uvicorn[standard]" mcp pydantic pydantic-settings httpx

Pre-installed in the venv: diffusers, torch, transformers, Pillow, accelerate.

2. Configure

cp .env.example .env
# edit .env as needed — defaults work for M2 Pro 16 GB

3. Launch the HTTP API

./run_api.sh
# with hot-reload during development:
./run_api.sh --reload

4. Launch the MCP server

./run_mcp.sh
# or directly:
arch -arm64 $HOME/.floor-voice-studio/venv-sd/bin/python -m imageforge.mcp_server

5. (Optional) Run in Docker — CPU-only, for Linux / CI

On a Mac, prefer ./start.sh (native MPS). Docker on macOS has no MPS passthrough, so the container runs the engine CPU-only (slow — minutes per image). It exists for portability / Linux servers / CI smoke tests:

docker build -t imageforge .
docker run --rm -p 8765:8765 \
  -v "$HOME/.cache/huggingface:/root/.cache/huggingface" \
  -v "$(pwd)/outputs:/app/outputs" imageforge
curl http://127.0.0.1:8765/health

Configuration

| Variable | Default | Description |
| --- | --- | --- |
| IMAGEFORGE_MODEL | sd-turbo | Active model key (see model table below) |
| DEVICE | mps | mps / cuda / cpu |
| PORT | 8765 | HTTP API port |
| HOST | 127.0.0.1 | HTTP API bind address |
| OLLAMA_URL | http://127.0.0.1:11434 | Ollama base URL for prompt-assist |
| OLLAMA_MODEL | gemma4:latest | Model used for prompt expansion |
| OUTPUT_DIR | outputs/ | Where generated images land on disk |
| CACHE_DIR | cache/ | Content-addressed cache directory |
| MAX_RES | 1024 | Maximum allowed image dimension |
| ALLOW_FLUX | 0 | Set 1 to unlock the guarded flux-q4 model |
| WARMUP | 0 | Set 1 to preload SD weights at API startup |
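The table maps directly onto environment lookups with defaults. As a rough illustration, here is a stdlib-only sketch of that mapping. The project itself uses pydantic-settings in config.py, whose real field names are not shown in this README, so `Settings`, `load_settings`, and the choice to treat only "1" as enabled are assumptions.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    """Hypothetical stand-in that mirrors part of the configuration table."""
    model: str
    device: str
    host: str
    port: int
    max_res: int
    allow_flux: bool
    warmup: bool


def _flag(value: str) -> bool:
    # The table documents 0/1 switches; this sketch treats only "1" as enabled.
    return value.strip() == "1"


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Read the documented variables, falling back to the documented defaults."""
    env = os.environ if env is None else env
    return Settings(
        model=env.get("IMAGEFORGE_MODEL", "sd-turbo"),
        device=env.get("DEVICE", "mps"),
        host=env.get("HOST", "127.0.0.1"),
        port=int(env.get("PORT", "8765")),
        max_res=int(env.get("MAX_RES", "1024")),
        allow_flux=_flag(env.get("ALLOW_FLUX", "0")),
        warmup=_flag(env.get("WARMUP", "0")),
    )
```

Passing a plain dict instead of os.environ makes the defaults easy to check in a test without touching the real environment.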

Model policy table

| Model key | Steps | Default res | Est. RAM | Est. time (M2 Pro) | Operations | Guard | Quality |
| --- | --- | --- | --- | --- | --- | --- | --- |
| sd-turbo | 1–2 | 512 px | ~4 GB | ~2–3 s | text2img, img2img, inpaint | none | Draft |
| sdxl-turbo | 4 | 512 px (up to 1024) | ~7 GB | ~8 s | text2img, img2img, inpaint | none | Good |
| sdxl | 30 | 1024 px | ~7 GB | ~40 s | text2img, img2img, inpaint | none | High |
| sdxl-lcm | 6 | 1024 px | ~9 GB | ~20 s | text2img, img2img, inpaint | none | Best (non-FLUX) |
| sdxl-portrait | 30 | 1024 px | ~8 GB | ~40 s | text2img, img2img, inpaint | none | High + portrait LoRA |
| flux-q4 | 4 | 1024 px | ~8 GB active / 16–18 GB peak | ~45 s | text2img only | ALLOW_FLUX=1 | Highest |
| flux2-klein | 4 | 1024 px | ~10 GB (int8) / 13–16 GB (bf16) | ~35–55 s | text2img only | ALLOW_KLEIN=1 | Frontier |

Why sd-turbo is the default: 1–2 step inference at roughly 2–3 s per image; safe on any machine with 8 GB+. Use sdxl-lcm when quality matters and time allows.

sdxl vs sdxl-portrait: sdxl is plain SDXL base-1.0 (30 steps, 1024 px, no LoRA). sdxl-portrait is the same checkpoint with a fused portrait LoRA — it requires IMAGEFORGE_LORA_PORTRAIT=/path/to/lora.safetensors to be set; without it, it behaves like plain sdxl.

FLUX note: flux-q4 peaks at 16–18 GB due to GGUF loader overhead — this causes macOS to swap on a 16 GB machine. Enable only with ALLOW_FLUX=1 and accept the trade-off, or use a machine with 24 GB+.

FLUX.2-klein note: flux2-klein uses a new architecture (Qwen3-4B encoder, Flux2KleinPipeline) distinct from FLUX.1. With Quanto int8 quantisation the transformer fits in ~9.4–10.4 GB; plain bf16 peaks at 13–16 GB (RED on 16 GB machines). The Phase-2 loader (kind="flux2-klein") has now shipped: klein_loader.load_flux2_klein is wired in via pipeline.py's _load_flux2_klein behind ALLOW_KLEIN=1. The loader is no longer the blocker; loading now fails only if the weights cannot be downloaded (no network or missing HF checkpoint) or if diffusers is older than 0.38 (which lacks Flux2KleinPipeline). Enable with ALLOW_KLEIN=1.
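The memory figures above can be turned into a quick "what fits on this machine" check. The sketch below copies the upper-bound estimates from the model policy table; the `models_within` helper and its 4 GB headroom for macOS and other apps are assumptions, not part of ImageForge.

```python
# Upper-bound RAM estimates in GB, copied from the model policy table.
# flux-q4 uses its 18 GB load peak; flux2-klein uses the ~10 GB int8 figure.
PEAK_RAM_GB = {
    "sd-turbo": 4,
    "sdxl-turbo": 7,
    "sdxl": 7,
    "sdxl-lcm": 9,
    "sdxl-portrait": 8,
    "flux-q4": 18,
    "flux2-klein": 10,
}


def models_within(total_gb: float, headroom_gb: float = 4.0) -> list[str]:
    """Return model keys whose estimated peak fits in total_gb minus headroom.

    headroom_gb is an assumed allowance for the OS and other apps,
    not a figure published by the project.
    """
    budget = total_gb - headroom_gb
    return [key for key, peak in PEAK_RAM_GB.items() if peak <= budget]


print(models_within(16))  # every model except flux-q4
print(models_within(8))   # only sd-turbo
```

Under these assumptions a 16 GB machine rules out flux-q4 and an 8 GB machine is left with sd-turbo, which matches the guidance in the notes above.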


HTTP API

Endpoint reference

GET /health

Probe MPS availability and the Ollama endpoint.

curl http://127.0.0.1:8765/health
{
  "status": "ok",
  "device": "mps",
  "mps_available": true,
  "engine_constructed": true,
  "resident_models": ["sd-turbo"],
  "default_model": "sd-turbo",
  "ollama": {
    "reachable": true,
    "url": "http://127.0.0.1:11434",
    "model": "gemma4:latest",
    "model_available": true
  }
}
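A consumer can turn this payload into a short list of actionable problems before it sends any generation request. The following is a minimal sketch that relies only on the fields shown in the sample response; `health_problems` is a hypothetical helper name.

```python
import json


def health_problems(payload: dict) -> list[str]:
    """Return human-readable problems found in a /health response body."""
    problems = []
    if payload.get("status") != "ok":
        problems.append(f"status is {payload.get('status')!r}")
    if payload.get("device") == "mps" and not payload.get("mps_available"):
        problems.append("device is mps but MPS is not available")
    ollama = payload.get("ollama") or {}
    if not ollama.get("reachable"):
        problems.append("Ollama unreachable (prompts will pass through unchanged)")
    elif not ollama.get("model_available"):
        problems.append(f"Ollama model {ollama.get('model')!r} is not pulled")
    return problems


sample = json.loads("""
{"status": "ok", "device": "mps", "mps_available": true,
 "ollama": {"reachable": true, "model": "gemma4:latest", "model_available": false}}
""")
print(health_problems(sample))
```

An empty list means the service looks ready. An Ollama problem alone is not fatal, since prompt-assist degrades gracefully.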

GET /models

List all registered models and their capabilities.

curl http://127.0.0.1:8765/models

POST /generate

Text-to-image. Returns a base64 PNG plus generation metadata.

curl -X POST http://127.0.0.1:8765/generate \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "a red fox sitting in autumn leaves, golden hour, cinematic",
    "seed": 42,
    "width": 512,
    "height": 512,
    "return_mode": "both"
  }'

With prompt-assist (gemma4 expands the short request):

curl -X POST http://127.0.0.1:8765/generate \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "a red fox",
    "assist_prompt": true,
    "model": "sdxl-turbo",
    "seed": 7
  }'

Key request fields:

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| prompt | string | required | Image description |
| negative_prompt | string | null | What to avoid |
| model | string | sd-turbo | Model key |
| steps | int | model default | Inference steps |
| guidance | float | model default | CFG scale (turbo models use 0) |
| width / height | int | model default | Must be multiples of 8 |
| seed | int | random | Reproducibility seed |
| assist_prompt | bool | false | Expand via gemma4 first |
| use_cache | bool | true | Return cached result for identical seeded params |
| return_mode | string | "both" | "base64" / "path" / "both" |
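The size constraints (multiples of 8, capped by MAX_RES) are easy to check on the client before a request is sent. This is a minimal sketch; the helper names are hypothetical, and the server's own validation and error messages are not shown in this README.

```python
def check_dimensions(width: int, height: int, max_res: int = 1024) -> None:
    """Raise ValueError if a size breaks the documented constraints."""
    for name, value in (("width", width), ("height", height)):
        if value <= 0:
            raise ValueError(f"{name} must be positive, got {value}")
        if value % 8 != 0:
            raise ValueError(f"{name} must be a multiple of 8, got {value}")
        if value > max_res:
            raise ValueError(f"{name} exceeds MAX_RES ({max_res}), got {value}")


def snap_to_multiple_of_8(value: int) -> int:
    """Round down to the nearest multiple of 8 (minimum 8)."""
    return max(8, (value // 8) * 8)


check_dimensions(512, 512)          # passes silently
print(snap_to_multiple_of_8(500))   # 496
```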

Response shape:

{
  "ok": true,
  "task": "text2img",
  "model": "sd-turbo",
  "seed": 42,
  "steps": 2,
  "guidance": 0.0,
  "width": 512,
  "height": 512,
  "elapsed_s": 2.871,
  "prompt": "a red fox sitting in autumn leaves, golden hour, cinematic",
  "assisted": false,
  "cached": false,
  "output_path": "$(pwd)/outputs/20260610_075828_521b48.png",
  "image_base64": "<base64 PNG data>",
  "mime": "image/png"
}
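The use_cache field and the "cached" flag in the response depend on recognising "identical seeded params". One common way to do that is to hash a canonical serialisation of the request, as sketched below. The real key scheme in engine/cache.py is not shown here, so `cache_key` and its choice of SHA-256 over sorted JSON are assumptions.

```python
import hashlib
import json
from typing import Optional


def cache_key(params: dict) -> Optional[str]:
    """Return a stable hex digest for seeded params, or None if unseeded."""
    if params.get("seed") is None:
        # A random seed yields a different image each time, so there is
        # nothing meaningful to reuse.
        return None
    # sort_keys makes the digest independent of dict insertion order.
    canonical = json.dumps(params, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


a = cache_key({"prompt": "a red fox", "seed": 42, "width": 512})
b = cache_key({"width": 512, "seed": 42, "prompt": "a red fox"})
print(a == b)  # True: same params in a different order give the same key
```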

POST /edit

Image-to-image. Alter an existing image guided by a prompt. Supply the source image as base64 (init_image) or a local path (init_image_path).

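# Path-based (simpler for local consumers).
# The JSON body is single-quoted, so the shell would not expand $(pwd) inside it;
# the quote is closed around the substitution so the server receives an absolute path.
curl -X POST http://127.0.0.1:8765/edit \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "same fox but in a snowy winter forest",
    "init_image_path": "'"$(pwd)"'/outputs/selftest.png",
    "strength": 0.65,
    "seed": 100
  }'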

strength: 0.0 = keep the original, 1.0 = full regeneration. Default 0.6.
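Strength interacts with the step count. In the diffusers Stable Diffusion img2img pipeline, the number of denoising steps actually run is min(int(steps * strength), steps). The sketch below works through that arithmetic, assuming ImageForge passes steps and strength through to diffusers unchanged (not confirmed by this README).

```python
def effective_steps(num_inference_steps: int, strength: float) -> int:
    """Denoising steps an img2img run performs for a given strength.

    Mirrors the rule in the diffusers img2img pipeline:
    min(int(steps * strength), steps).
    """
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must be between 0.0 and 1.0")
    return min(int(num_inference_steps * strength), num_inference_steps)


for steps, strength in [(2, 0.65), (2, 0.4), (30, 0.65)]:
    print(steps, strength, effective_steps(steps, strength))
```

With a 2-step turbo model, any strength below 0.5 rounds down to zero effective steps. Requests in that range may fail or return the input largely unchanged, depending on how the engine handles the case. Raising steps or strength so that steps * strength is at least 1 avoids this.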

POST /inpaint

Regenerate only the white regions of a mask within the source image.

curl -X POST http://127.0.0.1:8765/inpaint \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "a bright orange pumpkin lantern",
    "init_image_path": "/path/to/source.png",
    "mask_image_path": "/path/to/mask.png",
    "strength": 0.85
  }'

Mask convention: white pixels = repaint, black pixels = keep.
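To make the convention concrete, the sketch below builds a rectangular mask as an 8-bit grayscale PNG using only the standard library. `rect_mask_png` is a hypothetical helper, and whether the engine expects grayscale or RGB masks is not stated here, so treat the output format as an assumption.

```python
import struct
import zlib


def _chunk(tag: bytes, data: bytes) -> bytes:
    """Encode one PNG chunk: length, tag, data, CRC-32 of tag + data."""
    crc = zlib.crc32(tag + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + tag + data + struct.pack(">I", crc)


def rect_mask_png(width: int, height: int, box: tuple[int, int, int, int]) -> bytes:
    """Return an 8-bit grayscale PNG: white inside box, black elsewhere.

    box is (left, top, right, bottom) in pixels; right and bottom are exclusive.
    """
    left, top, right, bottom = box
    left, right = max(0, left), min(width, right)
    top, bottom = max(0, top), min(height, bottom)
    white_run = b"\xff" * max(0, right - left)
    rows = []
    for y in range(height):
        row = bytearray(width)  # all zeros = black = keep
        if top <= y < bottom and white_run:
            row[left:right] = white_run  # white = repaint
        rows.append(b"\x00" + bytes(row))  # leading 0 = no filter on this scanline
    # IHDR: width, height, bit depth 8, colour type 0 (grayscale), no interlace.
    header = struct.pack(">IIBBBBB", width, height, 8, 0, 0, 0, 0)
    return (
        b"\x89PNG\r\n\x1a\n"
        + _chunk(b"IHDR", header)
        + _chunk(b"IDAT", zlib.compress(b"".join(rows)))
        + _chunk(b"IEND", b"")
    )
```

Writing the result with pathlib.Path("mask.png").write_bytes(rect_mask_png(512, 512, (128, 128, 384, 384))) gives a mask that repaints the central square. Pillow, which is already in the venv, can produce the same mask with Image.new("L", ...) and ImageDraw.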

GET /docs

Interactive OpenAPI docs (Swagger UI) — open http://127.0.0.1:8765/docs in a browser.


Python httpx snippet (voice-studio consumer pattern)

import httpx, base64, pathlib

IMAGEFORGE = "http://127.0.0.1:8765"

async def generate_avatar(prompt: str, seed: int = 42) -> pathlib.Path:
    """Generate an avatar image and return the local path."""
    async with httpx.AsyncClient(timeout=120) as client:
        r = await client.post(
            f"{IMAGEFORGE}/generate",
            json={
                "prompt": prompt,
                "model": "sd-turbo",
                "seed": seed,
                "width": 512,
                "height": 512,
                "return_mode": "both",
                "assist_prompt": False,
            },
        )
        r.raise_for_status()
        data = r.json()

    # The output file is already on disk at data["output_path"].
    # Optionally decode the inline base64 if you need the bytes in-process:
    png_bytes = base64.b64decode(data["image_base64"])

    out = pathlib.Path(data["output_path"])
    print(f"avatar generated: {out}  ({data['elapsed_s']:.1f}s, seed={data['seed']})")
    return out


async def edit_avatar(source_path: str, prompt: str) -> pathlib.Path:
    """Alter an existing avatar."""
    async with httpx.AsyncClient(timeout=120) as client:
        r = await client.post(
            f"{IMAGEFORGE}/edit",
            json={
                "prompt": prompt,
                "init_image_path": source_path,
                "strength": 0.65,
                "return_mode": "both",
            },
        )
        r.raise_for_status()
        data = r.json()
    return pathlib.Path(data["output_path"])


# Synchronous wrapper for non-async callers:
import asyncio

def generate_avatar_sync(prompt: str, seed: int = 42) -> pathlib.Path:
    return asyncio.run(generate_avatar(prompt, seed))

MCP server

The MCP server exposes ImageForge as Model-Context-Protocol tools over stdio. Logs go to stderr; stdout is the protocol channel. It is identical to the HTTP surface in capability — it shares the same engine, cache, and prompt-assist singletons.

Claude Code registration

Add to your project's .mcp.json (or to ~/.claude/settings.json under "mcpServers"):

{
  "mcpServers": {
    "imageforge": {
      "command": "arch",
      "args": [
        "-arm64",
        "$HOME/.floor-voice-studio/venv-sd/bin/python",
        "-m",
        "imageforge.mcp_server"
      ],
      "cwd": "~/Developer/imageforge"
    }
  }
}

Alternative (via run_mcp.sh):

{
  "mcpServers": {
    "imageforge": {
      "command": "~/Developer/imageforge/run_mcp.sh",
      "args": [],
      "cwd": "~/Developer/imageforge"
    }
  }
}
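MCP clients generally spawn the configured command directly rather than through a shell, so `$HOME` and `~` in the JSON above may reach the OS unexpanded. If the server fails to start, one workaround is to generate the file with absolute paths. The following is a minimal sketch; `build_mcp_config` is a hypothetical helper, and the paths are the ones used throughout this README.

```python
import json
from pathlib import Path


def build_mcp_config(home: Path, project_dir: Path) -> dict:
    """Return an .mcp.json payload with every path fully expanded."""
    python = home / ".floor-voice-studio" / "venv-sd" / "bin" / "python"
    return {
        "mcpServers": {
            "imageforge": {
                "command": "arch",
                "args": ["-arm64", str(python), "-m", "imageforge.mcp_server"],
                "cwd": str(project_dir),
            }
        }
    }


if __name__ == "__main__":
    home = Path.home()
    config = build_mcp_config(home, home / "Developer" / "imageforge")
    print(json.dumps(config, indent=2))
```

Redirect the printed JSON into .mcp.json, or merge the "imageforge" entry into an existing file by hand.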

MCP tool list

| Tool | Required args | Optional args | Returns |
| --- | --- | --- | --- |
| generate_image | prompt | model, steps, guidance, width, height, seed, negative_prompt, assist | ImageContent (base64 PNG) + TextContent (JSON metadata) |
| edit_image | prompt, image_path | strength, model, steps, guidance, seed, negative_prompt, assist | Edited PNG + JSON metadata |
| inpaint_image | prompt, image_path, mask_path | strength, model, steps, guidance, seed, negative_prompt, assist | Result PNG + JSON metadata |
| assist_prompt | prompt | style | TextContent: expanded prompt + tags |
| list_models | (none) | (none) | TextContent: JSON model registry |

Image paths for edit_image / inpaint_image: pass absolute paths or project-relative paths (e.g. outputs/foo.png). The server resolves relative paths against the project root.

assist=true on generation tools calls gemma4/Ollama to expand a short request into a richer SD prompt. Transparent fallback to original prompt if Ollama is unavailable.
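The transparent fallback can be pictured as a small wrapper around whatever function calls Ollama. In the sketch below, `assist_or_passthrough` and the `expand` callable are hypothetical. The "passthrough" label matches the self-test output shown later in this README, while "ollama" as the success label is an assumption.

```python
from typing import Callable


def assist_or_passthrough(prompt: str, expand: Callable[[str], str]) -> dict:
    """Try to expand the prompt; on any failure return it unchanged."""
    try:
        expanded = expand(prompt).strip()
    except Exception:
        # Network errors, timeouts, and bad responses all degrade the same way.
        expanded = ""
    if not expanded:
        return {"prompt": prompt, "assisted": False, "source": "passthrough"}
    # The "ollama" label is assumed; only "passthrough" appears in the
    # self-test output.
    return {"prompt": expanded, "assisted": True, "source": "ollama"}


def unreachable(_prompt: str) -> str:
    raise ConnectionError("Ollama is not running")


print(assist_or_passthrough("a red fox", unreachable))
```

The caller never needs to branch on Ollama availability: it always receives a usable prompt, and the "assisted" flag records which path was taken.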

Example tool call (Claude Code session)

When imageforge is registered, Claude Code can call tools like:

<use_mcp_tool>
  <server_name>imageforge</server_name>
  <tool_name>generate_image</tool_name>
  <arguments>
    {
      "prompt": "friendly robot avatar, flat design, vibrant colors",
      "model": "sd-turbo",
      "seed": 99,
      "width": 512,
      "height": 512,
      "assist": true
    }
  </arguments>
</use_mcp_tool>

The response includes the image inline as ImageContent (rendered in-chat) plus a TextContent block with the metadata JSON:

{
  "task": "text2img",
  "model": "sd-turbo",
  "seed": 99,
  "steps": 2,
  "guidance": 0.0,
  "width": 512,
  "height": 512,
  "elapsed_s": 2.91,
  "prompt": "friendly robot avatar, ...",
  "output_path": "$(pwd)/outputs/20260610_XXXXXX_XXXXXX.png"
}

MCP self-test (no MCP client needed)

# Handshake + tool discovery + list_models + assist (no GPU):
arch -arm64 $HOME/.floor-voice-studio/venv-sd/bin/python \
    -m imageforge.mcp_server --selftest

# Also run a real generation through the MCP path (loads SD, ~15 s):
arch -arm64 $HOME/.floor-voice-studio/venv-sd/bin/python \
    -m imageforge.mcp_server --selftest --gen

Expected output (with --gen; the generate_image lines appear only when --gen is passed):

[selftest] initialize ok: imageforge v0.1.0
[selftest] list_tools -> ['generate_image', 'edit_image', 'inpaint_image', 'assist_prompt', 'list_models']
[selftest] list_models ok (sd-turbo present)
[selftest] assist_prompt ok: {"prompt": ..., "assisted": false, "source": "passthrough"}
[selftest] generate_image (this loads SD; may take a while)...
[selftest] generate_image -> content kinds: ['image', 'text']
[selftest] generate_image ok: 24871 bytes PNG, mime=image/png
[selftest] ALL PASS

Unit tests

# MCP layer tests:
arch -arm64 $HOME/.floor-voice-studio/venv-sd/bin/python -m unittest tests.test_mcp

# Include the real SD generation test (loads diffusers, ~15 s):
IMAGEFORGE_TEST_GEN=1 arch -arm64 $HOME/.floor-voice-studio/venv-sd/bin/python \
    -m unittest tests.test_mcp

# All tests:
arch -arm64 $HOME/.floor-voice-studio/venv-sd/bin/python -m pytest tests/

Proven end-to-end result

A real smoke test was run on 2026-06-10 (M2 Pro, 16 GB, macOS):

Smoke test output (outputs/20260610_075828_521b48.json):

{
  "saved_at_iso": "2026-06-10T07:58:28Z",
  "filename": "20260610_075828_521b48.png",
  "seed": 42,
  "model": "sd-turbo",
  "task": "text2img",
  "steps": 2,
  "guidance": 0.0,
  "width": 256,
  "height": 256,
  "elapsed_s": 15.432,
  "prompt": "a red fox in autumn leaves"
}

Three generated images, one edited image, and one inpainted image are present in outputs/ as proof. The MCP self-test (--selftest --gen) exercised the full MCP handshake, tool discovery, and SD generation through the MCP path.


How to add a new consumer

ImageForge is designed so adding a new consumer is three steps:

  1. Decide which surface to use: HTTP for any process/language, MCP for AI agents.
  2. Point at the service: For HTTP, import httpx and call POST /generate. For MCP, add the .mcp.json registration block above.
  3. No engine changes needed: The shared Engine, cache, and prompt-assist singletons are already running. Your consumer pays zero additional RAM cost for the model weights — they are already loaded.

Example: a new Python agent using the HTTP surface

# In any Python process on the same machine
import base64

import httpx

async def make_image(description: str) -> bytes:
    """Request an image and return the decoded PNG bytes."""
    async with httpx.AsyncClient(timeout=120) as client:
        r = await client.post(
            "http://127.0.0.1:8765/generate",
            json={"prompt": description, "return_mode": "base64"},
        )
        r.raise_for_status()
        return base64.b64decode(r.json()["image_base64"])
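For consumers that cannot add httpx as a dependency, the same call can be made with the standard library. This sketch separates building the request from sending it, so the request can be inspected without a running service. The helper names are hypothetical; the endpoint and fields are the ones documented above.

```python
import base64
import json
import urllib.request

IMAGEFORGE = "http://127.0.0.1:8765"


def build_generate_request(prompt: str, **fields) -> urllib.request.Request:
    """Build (but do not send) a POST /generate request."""
    body = json.dumps({"prompt": prompt, "return_mode": "base64", **fields})
    return urllib.request.Request(
        f"{IMAGEFORGE}/generate",
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def make_image_blocking(prompt: str, timeout: float = 120.0) -> bytes:
    """Send the request and return the decoded PNG bytes."""
    request = build_generate_request(prompt)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return base64.b64decode(json.load(resp)["image_base64"])
```

Extra request fields such as seed or model can be passed as keyword arguments to build_generate_request.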

Example: a new agent using the MCP surface

Add to the agent's MCP config:

"imageforge": {
  "command": "arch",
  "args": ["-arm64", "$HOME/.floor-voice-studio/venv-sd/bin/python",
           "-m", "imageforge.mcp_server"],
  "cwd": "~/Developer/imageforge"
}

Then call generate_image / edit_image / inpaint_image as MCP tools.


Probe scripts

Two standalone measurement scripts live at the project root (not under research/):

| Script | Purpose | Run command |
| --- | --- | --- |
| probe_klein_mps.py | Measure FLUX.2-klein-4B peak RSS + latency on MPS (gates D8 migration decision) | PROBE_DOWNLOAD=1 arch -arm64 $HOME/.floor-voice-studio/venv-sd/bin/python probe_klein_mps.py |
| probe_dit_throughput.py | Measure from-scratch DiT steps/sec at 256/512 px (gates Phase-3 viability) | arch -arm64 $HOME/.floor-voice-studio/venv-sd/bin/python probe_dit_throughput.py --resolution 256 --steps 100 |

The two scripts target different hardware: the DiT throughput probe requires a CUDA machine (4090), and the klein MPS-fit probe requires a 16 GB Mac. See research/OPERATION.md for full gate criteria.


Project layout

imageforge/
├── imageforge/
│   ├── settings.py            # Global constants (PROJECT_ROOT, OLLAMA_URL, etc.)
│   ├── config.py              # Pydantic-settings singleton (get_settings())
│   ├── api/
│   │   ├── app.py             # FastAPI application (all routes + lifespan)
│   │   ├── server.py          # Compatibility shim (re-exports app)
│   │   ├── prompt_bridge.py   # Async bridge to prompt-assist for HTTP handlers
│   │   └── schemas.py         # Shared Pydantic schemas
│   ├── engine/
│   │   ├── pipeline.py        # Core SD engine (load-once, MPS-locked inference)
│   │   ├── models.py          # Model registry (specs + resolve_model)
│   │   └── cache.py           # Content-addressed cache + outputs persistence
│   ├── mcp/
│   │   └── server.py          # MCP server (re-export of mcp_server.py)
│   ├── mcp_server.py          # MCP stdio server (canonical entrypoint)
│   ├── prompt/                # Prompt-assist module namespace
│   └── services/
│       └── prompt_assist.py   # Ollama/gemma4 async client (expand + tag)
├── models/                    # Downloaded HF model weights (git-ignored)
├── outputs/                   # Generated images (git-ignored)
├── cache/                     # Content-addressed PNG + JSON cache (git-ignored)
├── tests/                     # pytest test suite
├── probe_klein_mps.py         # MPS-fit probe for FLUX.2-klein (run at repo root)
├── probe_dit_throughput.py    # DiT throughput probe (run at repo root)
├── eval_harness.py            # Evaluation harness (run at repo root)
├── bench_flux_lora_latency.py # FLUX LoRA latency benchmark (run at repo root)
├── recaption.py               # Image recaptioning utility (run at repo root)
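├── start.sh                   # One-command bootstrap + launcher (api / mcp / selftest / check)
├── run_api.sh                 # Launch HTTP API
├── run_mcp.sh                 # Launch MCP server
├── Dockerfile                 # CPU-only image for Linux / CI
├── .env.example               # Config template
└── pyproject.toml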

Troubleshooting

MPS OOM / generation crashes

  • Symptom: Python crashes or produces a RuntimeError: MPS out of memory during generation.
  • Fix:
    1. Reduce image size: "width": 256, "height": 256.
    2. Switch to a lighter model: IMAGEFORGE_MODEL=sd-turbo.
    3. Close memory-intensive apps (browsers, Xcode).
    4. Do NOT enable flux-q4 on 16 GB — peak memory is 16–18 GB.
    5. Restart the service after an OOM to free GPU memory.

Model not downloading / HuggingFace errors

  • Symptom: OSError: ... is not a local folder and is not a valid model identifier.
  • Fix: Models are downloaded from HuggingFace on first use. Ensure internet access for the first run; subsequent runs are fully offline. Downloaded weights land in ~/.cache/huggingface/. Check HF_HOME if disk space is a concern.
  • Verify the venv has huggingface_hub installed:
    arch -arm64 $HOME/.floor-voice-studio/venv-sd/bin/python -c "import huggingface_hub; print('ok')"

Ollama unreachable / prompt-assist disabled

  • Symptom: assist_prompt returns "assisted": false or /health shows "ollama": {"reachable": false}.
  • This is not fatal. Prompt-assist gracefully degrades — generation continues with the original prompt unchanged.
  • Fix: Start Ollama (ollama serve) and pull the model:
    ollama pull gemma4:latest
  • Verify:
    curl http://127.0.0.1:11434/api/tags

FLUX guarded — cannot load flux-q4

  • Symptom: RuntimeError: Model 'flux-q4' is guarded: ....
  • Fix: Set ALLOW_FLUX=1 in .env or environment. Read the guard reason first — 16 GB machines will swap.

Port already in use

lsof -i :8765
kill <PID>
./run_api.sh

Or change PORT in .env.
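A launcher or test script can run the same check programmatically before it starts the API. The sketch below uses a plain TCP connect attempt; `port_in_use` is a hypothetical helper, and the host and port defaults are the documented ones.

```python
import socket


def port_in_use(host: str = "127.0.0.1", port: int = 8765) -> bool:
    """Return True if something is already accepting connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        probe.settimeout(0.5)
        # connect_ex returns 0 on success and an errno value otherwise.
        return probe.connect_ex((host, port)) == 0


if __name__ == "__main__":
    print("8765 busy" if port_in_use() else "8765 free")
```

Unlike lsof, this does not say which process holds the port, so it only indicates that picking another PORT or stopping the other process is needed.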

MCP server produces no output / hangs

The MCP server writes logs to stderr and the protocol to stdout. If you run it interactively you will see no stdout output — that is correct. Use --selftest to verify it works without a real MCP client.
