Skip to content

auto-download model in dit.New()#48

Merged
recepgunes1 merged 2 commits into
mainfrom
lib-auto-download
May 19, 2026
Merged

auto-download model in dit.New()#48
recepgunes1 merged 2 commits into
mainfrom
lib-auto-download

Conversation

@dogancanbakir
Copy link
Copy Markdown
Member

@dogancanbakir dogancanbakir commented May 18, 2026

Closes #47. Refs projectdiscovery/httpx#2495.

dit.New() now downloads model.json to ~/.dit/ on miss; library users no longer need to reimplement what the CLI was doing.

Summary by CodeRabbit

  • New Features

    • Automatic model download: the classifier now fetches a pretrained model on first use and caches it locally for subsequent runs.
    • Offline local load: added a way to explicitly load a local model file with no network access.
  • Documentation

    • README updated with detailed explanation and usage examples for model loading, caching, and explicit local load.

Review Change Stack

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 18, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: b8182a24-d7ec-4da0-b347-81120cd6d499

📥 Commits

Reviewing files that changed from the base of the PR and between ba98bd8 and 7b89a4b.

📒 Files selected for processing (1)
  • dit.go
🚧 Files skipped from review as they are similar to previous changes (1)
  • dit.go

Walkthrough

This PR centralizes model download logic in the dit library: it adds a canonical ModelURL and Download(dest) with timeout and robust streaming, updates New() to download-and-cache model.json when not found locally, simplifies CLI callers to use the library, and documents the behavior in the README.

Changes

Model Download Consolidation

Layer / File(s) Summary
Core library model download foundation
dit.go
ModelURL and downloadTimeout constants added; imports expanded. New() now tries FindModel("model.json") and falls back to downloading ModelURL into the model cache before calling Load(). Download(dest string) error performs a context-timed HTTP GET, validates HTTP 200, creates destination directories, streams the response to dest, removes partial files on error, closes resources, and logs the downloaded size.
CLI simplification using shared download logic
internal/cli/run.go, internal/cli/data.go, internal/cli/up.go
CLI now imports github.com/happyhackingspace/dit and uses dit.ModelURL where appropriate. run.go replaces loadOrDownloadModel with loadModel that calls dit.New() for default loading, removing duplicated download/persistence code; data.go and up.go fetch the model using dit.ModelURL.
Library usage documentation
README.md
"As a Library" example updated to explain that dit.New() downloads model.json on first use into ~/.dit/ and reuses the cached file, and shows how to call dit.Load(...) to explicitly load a local model.json with no network access.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related issues

Poem

🐇 I dug a tunnel to the net so wide,
Where ModelURL and cache reside.
One download first, then quiet rest—
Local model.json serves the best. 🌱📦

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 40.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately describes the primary change: dit.New() now auto-downloads the model instead of failing when no local model is found.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch lib-auto-download

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@dogancanbakir dogancanbakir requested a review from recepgunes1 May 18, 2026 11:43
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
dit.go (1)

71-82: ⚖️ Poor tradeoff

Concurrent New() calls may trigger parallel downloads.

If multiple goroutines call New() simultaneously on first use, each could pass the FindModel check and start downloading to the same destination file. This could cause file corruption or redundant downloads.

Consider using sync.Once or a file lock to serialize the download:

♻️ Example fix using sync.Once
+import "sync"
+
+var downloadOnce sync.Once
+var downloadErr error

 func New() (*Classifier, error) {
 	if path, err := FindModel("model.json"); err == nil {
 		return Load(path)
 	}

 	dest := filepath.Join(ModelDir(), "model.json")
-	slog.Info("Model not found, downloading", "url", ModelURL, "dest", dest)
-	if err := Download(dest); err != nil {
-		return nil, fmt.Errorf("dit: %w", err)
-	}
+	downloadOnce.Do(func() {
+		slog.Info("Model not found, downloading", "url", ModelURL, "dest", dest)
+		downloadErr = Download(dest)
+	})
+	if downloadErr != nil {
+		return nil, fmt.Errorf("dit: %w", downloadErr)
+	}
 	return Load(dest)
 }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@dit.go` around lines 71 - 82, Concurrent calls to New() can all pass
FindModel and start Download to the same dest; serialize the first-time download
by introducing a package-level guard (e.g., a sync.Once or a file lock) around
the FindModel -> Download -> Load sequence so only one goroutine performs the
download while others wait and then call Load; update New() to use that guard
(referencing New, FindModel, Download, Load, ModelDir, ModelURL) and ensure any
Download error is propagated to callers and subsequent callers re-check
FindModel/Load after the guarded operation.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@dit.go`:
- Around line 91-94: The call to http.Get(ModelURL) uses the default client with
no timeout and can hang; replace it with an HTTP request using a cancellable
context or an http.Client with a configured Timeout (e.g., context.WithTimeout +
http.NewRequestWithContext or a client := &http.Client{Timeout: ...} then
client.Do) to bound the total wait; update the request usage where
http.Get(ModelURL) appears (and keep proper resp.Body.Close and error wrapping
like fmt.Errorf("download model: %w", err)) so the download will fail fast on
slow/blocked servers.

---

Nitpick comments:
In `@dit.go`:
- Around line 71-82: Concurrent calls to New() can all pass FindModel and start
Download to the same dest; serialize the first-time download by introducing a
package-level guard (e.g., a sync.Once or a file lock) around the FindModel ->
Download -> Load sequence so only one goroutine performs the download while
others wait and then call Load; update New() to use that guard (referencing New,
FindModel, Download, Load, ModelDir, ModelURL) and ensure any Download error is
propagated to callers and subsequent callers re-check FindModel/Load after the
guarded operation.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 6429ead2-54bb-4ac8-808d-80a15df80074

📥 Commits

Reviewing files that changed from the base of the PR and between ed950ea and ba98bd8.

📒 Files selected for processing (5)
  • README.md
  • dit.go
  • internal/cli/data.go
  • internal/cli/run.go
  • internal/cli/up.go

Comment thread dit.go Outdated
Comment thread dit.go Outdated
@dogancanbakir dogancanbakir requested a review from recepgunes1 May 19, 2026 15:26
@recepgunes1 recepgunes1 merged commit 684ce13 into main May 19, 2026
8 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

auto-download model for library users

2 participants