Skip to content

Add blog post: expose llama.cpp over Inlets Cloud#51

Open
welteki wants to merge 1 commit into
inlets:masterfrom
welteki:inlets-cloud-llama
Open

Add blog post: expose llama.cpp over Inlets Cloud#51
welteki wants to merge 1 commit into
inlets:masterfrom
welteki:inlets-cloud-llama

Add blog post: expose llama.cpp over Inlets Cloud

93d953c
Select commit
Loading
Failed to load commit list.
reviewfn / succeeded Jul 1, 2026 in 2m 54s

AI Code Review Results

AI Pull Request Overview

Disclaimer: This review was generated by automated AI and may contain errors. Do not trust its outputs without human verification.

Summary

  • Adds a tutorial for exposing a local llama.cpp server through Inlets Cloud with bearer-token authentication.
  • The Inlets Cloud tunnel flow is generally reproducible and gives readers concrete commands.
  • The OpenCode configuration is aligned with the OpenAI-compatible /v1 endpoint pattern.
  • The Claude Code section appears to direct an Anthropic client at a plain OpenAI-compatible llama-server, which is likely to fail for readers.
  • The post image front matter is commented out, so the rollup card and social metadata will not use a post-specific image.
  • The title and description match the tutorial scope, but the Claude Code claim should be corrected before publication.

Approval rating (1-10)

6/10. Useful tutorial, but the Claude Code instructions need correction because they likely do not work against llama-server directly.

Summary per file

Summary per file
File path Summary
blog/_posts/2026-07-01-expose-llama-cpp-with-inlets-cloud.md New tutorial for tunneling and authenticating a local llama.cpp endpoint.
images/2026-07-inlets-cloud-llama-cpp/create-access-token.png Screenshot showing Inlets Cloud access token creation.

Overall Assessment

The article has a clear reader goal and most of the Inlets Cloud setup is concrete enough to follow. I would not publish it as-is because the Claude Code section appears to assume Claude Code can talk directly to llama-server's OpenAI-compatible API by setting ANTHROPIC_BASE_URL; that is a reproducibility issue for a major promised outcome of the post. The missing image metadata is lower severity, but it will affect the blog listing and social preview quality for a rollup: true post.

Detailed Review

Detailed Review

Content review

Findings

Severity File Lines Issue
High blog/_posts/2026-07-01-expose-llama-cpp-with-inlets-cloud.md 217-231 The Claude Code instructions point ANTHROPIC_BASE_URL at the Inlets tunnel for llama-server, but the rest of the article sets up a plain OpenAI-compatible /v1 API. Claude Code's Anthropic environment variables are for Anthropic-compatible endpoints, not OpenAI-compatible llama-server endpoints, so this example is likely to fail when Claude Code sends Anthropic Messages API requests. Either remove this section, add the required Anthropic-compatible proxy/router layer, or change the tutorial to use a client that supports OpenAI-compatible providers directly.
Low blog/_posts/2026-07-01-expose-llama-cpp-with-inlets-cloud.md 8-11 The post is marked rollup: true, but the only post image metadata is commented out. Existing rollup cards render post.image when present, and meta.html uses page.image for Twitter/OpenGraph metadata, so this post will publish without a card thumbnail and with the generic social image. Add a suitable image: /images/2026-07-inlets-cloud-llama-cpp/... value, or remove the commented placeholder if the generic preview is intentional.

Additional observations

The title and description fit the main tutorial: the post does explain how to expose llama.cpp through Inlets Cloud with bearer-token authentication.

The opening establishes the value quickly, but the bullet "Accessing a model of your choice remotely from anywhere with unlimited tokens" overstates the outcome. The later examples still have context, output, hardware, and tunnel availability constraints; wording such as "without per-token hosted API billing" would be more precise.

The Inlets Cloud setup is structured well for reproducibility: prerequisites, token creation, tunnel creation, auth, and endpoint testing are presented in a usable order.

The image asset matches the access-token section and renders as a relevant screenshot, but it is not configured as the post's feature image.

AI agent details.

Agent processing time: 2m47.004s
Environment preparation time: 4.007s
Total time from webhook: 2m56.696s