Lowercase + strip to alphanumeric tokens of length ≥ 2. Tolerates ``None`` documents — Chroma can return ``None`` in the ``documents`` field for drawers without text content, which would otherwise raise ``AttributeError`` mid-rerank.
(text: str)
| 61 | |
| 62 | |
| 63 | def _tokenize(text: str) -> list: |
| 64 | """Lowercase + strip to alphanumeric tokens of length ≥ 2. |
| 65 | |
| 66 | Tolerates ``None`` documents — Chroma can return ``None`` in the |
| 67 | ``documents`` field for drawers without text content, which would |
| 68 | otherwise raise ``AttributeError`` mid-rerank. |
| 69 | """ |
| 70 | if not text: |
| 71 | return [] |
| 72 | return _TOKEN_RE.findall(text.lower()) |
| 73 | |
| 74 | |
| 75 | def _bm25_scores( |
no outgoing calls