v1.1.0 — Vector Search + Embeddings
Released: 2026-04-16
Forge gains hybrid semantic + keyword search. Developers can now find code by meaning, not just exact text matches. The embedding pipeline runs locally via Ollama or the ORT backend — no data ever leaves the machine.
Hybrid Search
- `forge_search` MCP tool upgraded to hybrid semantic + keyword search
- Combines BM25 keyword scores with cosine-similarity vector scores from LanceDB
- Existing keyword-only callers see no behavior change (backward-compatible)
- Semantic results surface related code even when exact query terms are absent
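The score fusion can be sketched as follows. The release notes do not specify the combination scheme, so the per-channel min-max normalization and the `alpha` weight here are illustrative assumptions, not the shipped implementation:

```python
def hybrid_scores(bm25, cosine, alpha=0.5):
    """Fuse BM25 keyword scores with cosine-similarity vector scores.

    `bm25` and `cosine` map doc id -> raw score. `alpha` is a hypothetical
    weight for the semantic channel. Each channel is min-max normalized
    first so the two score scales are comparable.
    """
    def norm(scores):
        if not scores:
            return {}
        lo, hi = min(scores.values()), max(scores.values())
        span = (hi - lo) or 1.0
        return {d: (s - lo) / span for d, s in scores.items()}

    kw, sem = norm(bm25), norm(cosine)
    docs = kw.keys() | sem.keys()
    # A doc missing from one channel contributes 0 from that channel,
    # which is how semantic-only hits can still surface.
    return {d: (1 - alpha) * kw.get(d, 0.0) + alpha * sem.get(d, 0.0)
            for d in docs}
```

A doc that matches only semantically (absent from the BM25 results) still receives a nonzero fused score, which is what lets related code surface without exact term matches.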
Embeddings Infrastructure
- `--with-embeddings` flag on `forge index` — generates embeddings during indexing, writes vectors to LanceDB, builds an IVF-PQ vector index
- ORT embedding backend — `ort` crate v2.0.0-rc.10 (load-dynamic), HuggingFace tokenizer integration. Behind the `ort-backend` Cargo feature flag.
- Ollama embedding backend — HTTP dispatch with batch processing, per-chunk retry on batch failure, and oversized-chunk splitting
- 5-candidate Ollama model path search: `$OLLAMA_MODELS`, `~/.ollama/`, systemd installs (`/usr/share/ollama/.ollama/`, `/var/lib/ollama/.ollama/`), Homebrew prefix. User path wins over system; `$OLLAMA_MODELS` takes precedence over all.
- HuggingFace model names automatically mapped to Ollama equivalents (e.g., `nomic-ai/nomic-embed-text-v1.5` → `nomic-embed-text`)
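The model-resolution precedence can be sketched as an ordered candidate list. The Homebrew-prefix layout and the `HOMEBREW_PREFIX` fallback below are assumptions; only the directories and the precedence order come from the notes:

```python
# HuggingFace -> Ollama name mapping; only the pair named in the release
# notes is shown here.
HF_TO_OLLAMA = {"nomic-ai/nomic-embed-text-v1.5": "nomic-embed-text"}

def candidate_model_dirs(env, home):
    """Ordered Ollama model search path, highest precedence first.

    $OLLAMA_MODELS beats everything; the user directory beats the two
    systemd install locations; Homebrew comes last (its exact on-disk
    layout is an assumption).
    """
    dirs = []
    if env.get("OLLAMA_MODELS"):
        dirs.append(env["OLLAMA_MODELS"])
    dirs.append(f"{home}/.ollama/")
    dirs += ["/usr/share/ollama/.ollama/", "/var/lib/ollama/.ollama/"]
    dirs.append(env.get("HOMEBREW_PREFIX", "/opt/homebrew"))
    return dirs
```

Resolution would then try each candidate in order and use the first directory that actually contains the model blobs.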
Setup and Distribution
- `forge setup` — automatic Ollama provisioning. Detects system Ollama; if absent, downloads the upstream `.tar.zst`, extracts to `~/.forge/`, starts the daemon, and pulls the embedding model. Works from a clean state in ~1 minute on a network connection.
- `forge bundle create` / `forge bundle install` — offline air-gapped distribution. `create` packages the Ollama binary, inference libraries (CPU/CUDA/Vulkan/MLX), `nomic-embed-text` model blobs, ORT shared library, ONNX model, and tokenizer (~7 GB). `install` deploys on a machine with no network access.
- `forge warmup-model` — pre-downloads the ONNX model so first-run latency on `forge index --with-embeddings` is amortized
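The provisioning flow can be sketched as a short control loop. The callables are stand-ins for the real steps (downloading, extraction, daemon management), and the exact ordering beyond what the notes state is an assumption:

```python
def provision_ollama(detect, download, extract, start_daemon, pull_model):
    """Sketch of the `forge setup` flow: reuse a system Ollama when one
    is detected; otherwise download the upstream .tar.zst, extract it to
    ~/.forge/, and start the daemon. Either way, ensure the embedding
    model is pulled."""
    if not detect():
        archive = download()           # upstream .tar.zst
        extract(archive, "~/.forge/")  # private install, no root needed
        start_daemon()
    pull_model("nomic-embed-text")
```

Keeping the fallback install under `~/.forge/` means setup never touches a system-managed Ollama installation.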
Ingestion Improvements
- SCIP and LCOV coverage ingestion paths normalized — `forge ingest scip` and `forge coverage` now correctly import from real-world monorepo paths
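The shared canonicalization might look like the sketch below. This is a hypothetical illustration of the problem class, not Forge's actual logic: monorepo tools emit a mix of absolute paths, `./` prefixes, and `..` segments, and both parsers must reduce them to the same repo-relative key:

```python
from pathlib import PurePosixPath

def canonical_path(root, reported):
    """Normalize a path from a SCIP index or coverage report to a
    repo-relative POSIX path (hypothetical sketch)."""
    p = PurePosixPath(reported)
    # Strip the workspace root if the tool emitted an absolute path.
    try:
        p = p.relative_to(root)
    except ValueError:
        pass
    # Collapse '.' and '..' segments lexically.
    parts = []
    for part in p.parts:
        if part == ".":
            continue
        if part == ".." and parts:
            parts.pop()
        else:
            parts.append(part)
    return "/".join(parts)
```

With both parsers routed through one such function, a file reported as `/repo/pkg/a/src/lib.rs` by one tool and `./pkg/a/src/lib.rs` by another lands on the same index row.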
Changed
- `ort-backend` split into a separate Cargo feature, decoupled from the always-on `embeddings` feature. The default binary builds without `libonnxruntime.so`, making `embeddings` usable on systems without glibc 2.38+.
- Ollama upstream download switched from bare binary to `.tar.zst` archive (Ollama 0.20+ format)
- `forge bundle create` now bundles Ollama’s sibling `lib/ollama/` directory containing GPU/CPU inference runtime libraries
- Workspace version bumped from 1.0.0 to 1.1.0
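In `Cargo.toml` terms, the feature split looks roughly like this; the dependency lists are illustrative assumptions — only the feature names and which feature is on by default come from the notes:

```toml
[features]
# `embeddings` stays in the default set; `ort-backend` is opt-in so the
# default binary carries no libonnxruntime.so / glibc 2.38+ requirement.
default = ["embeddings"]
embeddings = []            # Ollama-backed embedding plumbing (assumed contents)
ort-backend = ["dep:ort"]  # opt-in ONNX Runtime backend (assumed contents)
```

Builds targeting systems with a deployable `libonnxruntime.so` would enable it with `cargo build --features ort-backend`.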
Fixed
- `forge ingest scip` and `forge coverage` silently imported zero rows when given monorepo-shaped paths. Root cause: inconsistent path normalization between the index and report parsers. Both now share the same canonical-path logic.
- Embedding requests for chunks larger than the model’s token limit were silently truncated. Oversized chunks are now split into sub-chunks so all source content reaches the embedder.
- Embedding batches that failed at the provider boundary aborted the entire batch. Failed batches now retry per-chunk so a single bad chunk does not poison the rest.
- `forge bundle create` could not locate Ollama models installed via the upstream installer (`/usr/share/ollama/.ollama/`). The 5-candidate search resolves this.
- MCP `tracing` output written to stdout corrupted JSON-RPC framing — redirected to stderr.
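The two embedding fixes — oversized-chunk splitting and per-chunk retry — can be sketched as follows. This is a minimal illustration, not Forge's code; `count_tokens` stands in for a real tokenizer, and the bisection split strategy is an assumption:

```python
def split_oversized(chunk, token_limit, count_tokens=len):
    """Split a chunk exceeding the model's token limit into sub-chunks
    instead of silently truncating it."""
    if count_tokens(chunk) <= token_limit:
        return [chunk]
    mid = len(chunk) // 2
    return (split_oversized(chunk[:mid], token_limit, count_tokens)
            + split_oversized(chunk[mid:], token_limit, count_tokens))

def embed_batch(chunks, embed):
    """Try the whole batch once; on a provider failure, retry chunk by
    chunk so one bad chunk doesn't poison the rest. Chunks that still
    fail yield None."""
    try:
        return embed(chunks)
    except Exception:
        out = []
        for c in chunks:
            try:
                out.extend(embed([c]))
            except Exception:
                out.append(None)
        return out
```

Together these guarantee that every embeddable piece of source content reaches the embedder, and that a single failing chunk costs only its own vector.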
Notes
- The `embeddings` feature is on by default in release builds. The `ort-backend` feature is opt-in for builds targeting glibc 2.38+ systems with a deployable `libonnxruntime.so`.
- A pure-Rust embedding backend via `tract-onnx` was attempted and abandoned. The `tract` analyser does not support the `nomic-embed-text-v1.5` ONNX graph (fails at node #138, `/embeddings/word_embeddings/Gather`). ORT remains the recommended local backend until upstream resolves this.