
v1.1.0 — Vector Search + Embeddings

Released: 2026-04-16

Forge gains hybrid semantic + keyword search. Developers can now find code by meaning, not just exact text matches. The embedding pipeline runs locally via Ollama or the ORT backend — no data ever leaves the machine.


New

  • forge_search MCP tool upgraded to hybrid semantic + keyword search
  • Combines BM25 keyword scores with cosine-similarity vector scores from LanceDB
  • Existing keyword-only callers see no behavior change (backward-compatible)
  • Semantic results surface related code even when exact query terms are absent
  • --with-embeddings flag on forge index — generates embeddings during indexing, writes vectors to LanceDB, builds an IVF-PQ vector index
  • ORT embedding backend — ort crate v2.0.0-rc.10 (load-dynamic), HuggingFace tokenizer integration. Behind the ort-backend Cargo feature flag.
  • Ollama embedding backend — HTTP dispatch with batch processing, per-chunk retry on batch failure, and oversized-chunk splitting
  • 5-candidate Ollama model path search: $OLLAMA_MODELS, ~/.ollama/, systemd installs (/usr/share/ollama/.ollama/, /var/lib/ollama/.ollama/), Homebrew prefix. User path wins over system; $OLLAMA_MODELS takes precedence over all.
  • HuggingFace model names automatically mapped to Ollama equivalents (e.g., nomic-ai/nomic-embed-text-v1.5 → nomic-embed-text)
  • forge setup — automatic Ollama provisioning. Detects system Ollama; if absent, downloads the upstream .tar.zst, extracts to ~/.forge/, starts the daemon, and pulls the embedding model. Works from a clean state in ~1 minute over a typical network connection.
  • forge bundle create / forge bundle install — offline air-gapped distribution. create packages the Ollama binary, inference libraries (CPU/CUDA/Vulkan/MLX), nomic-embed-text model blobs, ORT shared library, ONNX model, and tokenizer (~7 GB). install deploys on a machine with no network access.
  • forge warmup-model — pre-downloads the ONNX model so first-run latency on forge index --with-embeddings is amortized
  • SCIP and LCOV coverage ingestion paths normalized — forge ingest scip and forge coverage now correctly import from real-world monorepo paths
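The notes above don't say how Forge weights the BM25 and cosine signals against each other. A minimal sketch of one common fusion scheme, assuming min-max normalization followed by a weighted sum (the `alpha` parameter and both function names are illustrative, not Forge's actual API):

```rust
/// Scale a score list into [0, 1] so BM25 scores (unbounded) and cosine
/// similarities (roughly [-1, 1]) become comparable. Assumed scheme; the
/// release notes do not specify Forge's normalization.
fn min_max_normalize(scores: &[f32]) -> Vec<f32> {
    let min = scores.iter().cloned().fold(f32::INFINITY, f32::min);
    let max = scores.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
    let range = max - min;
    scores
        .iter()
        .map(|s| if range > 0.0 { (s - min) / range } else { 0.5 })
        .collect()
}

/// Blend keyword and vector scores per candidate.
/// alpha = 0.0 is keyword-only; alpha = 1.0 is vector-only.
fn fuse(bm25: &[f32], cosine: &[f32], alpha: f32) -> Vec<f32> {
    let kw = min_max_normalize(bm25);
    let sem = min_max_normalize(cosine);
    kw.iter()
        .zip(sem.iter())
        .map(|(k, v)| (1.0 - alpha) * k + alpha * v)
        .collect()
}

fn main() {
    // Three candidate chunks: strong keyword match, strong semantic
    // match, and a weak match on both.
    let fused = fuse(&[12.0, 3.0, 0.0], &[0.2, 0.9, 0.4], 0.5);
    println!("{:?}", fused); // [0.5, 0.625, 0.142857...]
}
```

With `alpha = 0.5` the second candidate ranks first despite the weaker keyword score, which is the behavior the notes describe: semantic results surface even when exact query terms are absent.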
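The 5-candidate model path search reduces to an ordered candidate list. The precedence below comes from the bullet above; the function signature and the exact subpaths under each root are assumptions for illustration:

```rust
use std::path::PathBuf;

/// Hypothetical reconstruction of the 5-candidate Ollama model path
/// search. `env_models` stands in for $OLLAMA_MODELS so the precedence
/// is explicit and testable; the real helper is not shown in the notes.
fn ollama_model_candidates(
    env_models: Option<&str>,
    home: &str,
    homebrew_prefix: &str,
) -> Vec<PathBuf> {
    let mut candidates = Vec::new();
    // $OLLAMA_MODELS takes precedence over everything.
    if let Some(p) = env_models {
        candidates.push(PathBuf::from(p));
    }
    // User-level install wins over system-level installs.
    candidates.push(PathBuf::from(home).join(".ollama"));
    // systemd installs (upstream installer).
    candidates.push(PathBuf::from("/usr/share/ollama/.ollama"));
    candidates.push(PathBuf::from("/var/lib/ollama/.ollama"));
    // Homebrew prefix last.
    candidates.push(PathBuf::from(homebrew_prefix).join(".ollama"));
    candidates
}

fn main() {
    let c = ollama_model_candidates(Some("/data/models"), "/home/dev", "/opt/homebrew");
    for p in &c {
        println!("{}", p.display());
    }
}
```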

Changed

  • ort-backend split into a separate Cargo feature, decoupled from the always-on embeddings feature. The default binary builds without libonnxruntime.so, making embeddings usable on systems without glibc 2.38+.
  • Ollama upstream download switched from bare binary to .tar.zst archive (Ollama 0.20+ format)
  • forge bundle create now bundles Ollama’s sibling lib/ollama/ directory containing GPU/CPU inference runtime libraries
  • Workspace version bumped from 1.0.0 to 1.1.0
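The ort-backend feature split described above might look like the following Cargo.toml sketch. The `embeddings` and `ort-backend` feature names and the `ort` version come from these notes; the dependency wiring is an assumption:

```toml
[features]
default = ["embeddings"]   # embeddings stay always-on by default
embeddings = []            # Ollama HTTP backend; no native libraries required
ort-backend = ["dep:ort"]  # opt-in; needs a deployable libonnxruntime.so

[dependencies]
# load-dynamic defers loading libonnxruntime.so to runtime, so the
# default binary builds and runs without it present.
ort = { version = "2.0.0-rc.10", optional = true, features = ["load-dynamic"] }
```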

Fixed

  • forge ingest scip and forge coverage silently imported zero rows when given monorepo-shaped paths. Root cause: inconsistent path normalization between the index and report parsers. Both now share the same canonical-path logic.
  • Embedding requests for chunks larger than the model’s token limit were silently truncated. Oversized chunks are now split into sub-chunks so all source content reaches the embedder.
  • Embedding batches that failed at the provider boundary aborted the entire batch. Failed batches now retry per-chunk so a single bad chunk does not poison the rest.
  • forge bundle create could not locate Ollama models installed via the upstream installer (/usr/share/ollama/.ollama/). The 5-candidate search resolves this.
  • MCP tracing output written to stdout corrupted JSON-RPC framing — redirected to stderr.
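The per-chunk retry fix reduces to a batch-first strategy with an individual fallback. The `Embedder` trait and toy provider below are stand-ins assumed for illustration; only the retry shape reflects the notes:

```rust
/// Minimal stand-in for an embedding provider; Forge's real interface
/// is not shown in the release notes.
trait Embedder {
    fn embed_batch(&self, chunks: &[&str]) -> Result<Vec<Vec<f32>>, String>;
    fn embed_one(&self, chunk: &str) -> Result<Vec<f32>, String>;
}

/// Batch-first with per-chunk fallback: if the whole batch fails at the
/// provider boundary, each chunk is retried individually, so one bad
/// chunk loses only its own embedding (None) instead of the whole batch.
fn embed_with_fallback(e: &dyn Embedder, chunks: &[&str]) -> Vec<Option<Vec<f32>>> {
    match e.embed_batch(chunks) {
        Ok(vecs) => vecs.into_iter().map(Some).collect(),
        Err(_) => chunks.iter().map(|c| e.embed_one(c).ok()).collect(),
    }
}

/// Toy provider that rejects any batch containing the chunk "bad" and
/// embeds everything else as a 1-dimensional length vector.
struct Toy;
impl Embedder for Toy {
    fn embed_batch(&self, chunks: &[&str]) -> Result<Vec<Vec<f32>>, String> {
        if chunks.contains(&"bad") {
            return Err("provider rejected batch".into());
        }
        Ok(chunks.iter().map(|c| vec![c.len() as f32]).collect())
    }
    fn embed_one(&self, chunk: &str) -> Result<Vec<f32>, String> {
        if chunk == "bad" {
            Err("provider rejected chunk".into())
        } else {
            Ok(vec![chunk.len() as f32])
        }
    }
}

fn main() {
    // The middle chunk poisons the batch; only it is dropped.
    let out = embed_with_fallback(&Toy, &["fn main() {}", "bad", "struct S;"]);
    println!("{:?}", out);
}
```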

Notes

  • The embeddings feature is on by default in release builds. The ort-backend feature is opt-in for builds targeting glibc 2.38+ systems with a deployable libonnxruntime.so.
  • A pure-Rust embedding backend via tract-onnx was attempted and abandoned. The tract analyser does not support the nomic-embed-text-v1.5 ONNX graph (fails at node #138, /embeddings/word_embeddings/Gather). ORT remains the recommended local backend until upstream resolves this.