Compute vector embeddings for all graph nodes to enable semantic search. Requires: ``pip install code-review-graph[embeddings]`` (local provider only; cloud providers like ``openai`` / ``google`` / ``minimax`` use stdlib ``urllib``). Default model: all-MiniLM-L6-v2. Override via ``model`` param or CRG_EMBEDDING_MODEL env var.
(
repo_root: str | None = None,
model: str | None = None,
provider: str | None = None,
)
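The summary names a default model and two ways to override it. As a rough illustration only, the sketch below shows one way such a lookup could be ordered. `resolve_model_name` and `DEFAULT_LOCAL_MODEL` are hypothetical names that do not appear in the source, and the precedence (explicit argument, then `CRG_EMBEDDING_MODEL`, then the default) is an assumption, not something the listing confirms.

```python
from __future__ import annotations

import os
from collections.abc import Mapping

# Default model named in the docstring.
DEFAULT_LOCAL_MODEL = "all-MiniLM-L6-v2"


def resolve_model_name(
    model: str | None = None,
    env: Mapping[str, str] | None = None,
) -> str:
    """Hypothetical helper: pick the local embedding model name.

    Assumed precedence: explicit argument, then CRG_EMBEDDING_MODEL,
    then the documented default. The real lookup may differ.
    """
    env = os.environ if env is None else env
    # Empty strings are treated as "not set" and fall through.
    return model or env.get("CRG_EMBEDDING_MODEL") or DEFAULT_LOCAL_MODEL
```

Passing `env` explicitly keeps the helper easy to exercise without touching the real process environment.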
def embed_graph(
    repo_root: str | None = None,
    model: str | None = None,
    provider: str | None = None,
) -> dict[str, Any]:
    """Compute vector embeddings for all graph nodes to enable semantic search.

    Requires: ``pip install code-review-graph[embeddings]`` (local provider only;
    cloud providers like ``openai`` / ``google`` / ``minimax`` use stdlib ``urllib``).
    Default model: all-MiniLM-L6-v2. Override via ``model`` param or
    CRG_EMBEDDING_MODEL env var.
    Changing the model or provider re-embeds all nodes automatically.

    Only embeds nodes that don't already have up-to-date embeddings.

    Args:
        repo_root: Repository root path. Auto-detected if omitted.
        model: Embedding model name. For local: HuggingFace ID or path;
            for openai: model ID (e.g. ``text-embedding-3-small``);
            for google: Gemini model ID. Falls back to
            CRG_EMBEDDING_MODEL / CRG_OPENAI_MODEL env vars as appropriate.
        provider: Provider name: ``local`` (default), ``openai``, ``google``,
            or ``minimax``. ``openai`` requires CRG_OPENAI_BASE_URL +
            CRG_OPENAI_API_KEY + CRG_OPENAI_MODEL env vars and accepts
            any OpenAI-compatible endpoint (real OpenAI, Azure, new-api,
            LiteLLM, vLLM, LocalAI, Ollama openai-mode, etc.).

    Returns:
        Number of nodes embedded and total embedding count.
    """
    store, root = _get_store(repo_root)
    try:
        db_path = get_db_path(root)
        try:
            emb_store = EmbeddingStore(db_path, provider=provider, model=model)
        except ValueError as exc:
            # Unknown provider name or missing provider env vars — surface
            # as a structured error rather than a traceback.
            logger.error("embed_graph: %s", exc)
            return {"status": "error", "error": str(exc)}
        try:
            if not emb_store.available:
                if provider in ("openai", "google", "minimax"):
                    err = (
                        f"The '{provider}' embedding provider is not available. "
                        "Check the required environment variables "
                        "(see README and `get_provider()` docstring) and that "
                        "the endpoint is reachable."
                    )
                else:
                    err = (
                        "The local embedding provider needs sentence-transformers. "
                        "Install with: pip install code-review-graph[embeddings] — "
                        "or switch provider to 'openai' / 'google' / 'minimax'."
                    )
                return {"status": "error", "error": err}

            newly_embedded = embed_all_nodes(store, emb_store)
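The listing reports configuration problems as a `{"status": "error", "error": ...}` dictionary instead of raising, and the docstring lists three environment variables that the `openai` provider requires. A caller might check those variables up front and handle the error shape roughly as sketched below. `preflight_openai` and `error_message` are hypothetical helpers, not part of the package, and the wording of the message is invented for illustration. Only the error payload shape and the variable names come from the listing.

```python
from __future__ import annotations

import os
from collections.abc import Mapping
from typing import Any

# Variables the docstring lists as required for the ``openai`` provider.
OPENAI_ENV_VARS = ("CRG_OPENAI_BASE_URL", "CRG_OPENAI_API_KEY", "CRG_OPENAI_MODEL")


def preflight_openai(env: Mapping[str, str] | None = None) -> dict[str, Any] | None:
    """Hypothetical check: return an error payload if a variable is unset."""
    env = os.environ if env is None else env
    missing = [name for name in OPENAI_ENV_VARS if not env.get(name)]
    if missing:
        # Mirrors the error shape returned in the listing above.
        return {"status": "error", "error": "missing env vars: " + ", ".join(missing)}
    return None


def error_message(result: Mapping[str, Any]) -> str | None:
    """Return the message from an error payload, or None for any other result."""
    if result.get("status") == "error":
        return str(result.get("error", ""))
    return None
```

Because the listing is cut off before the success path, the sketch deliberately inspects only the error shape and makes no assumption about the keys of a successful result.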