A memory-mapped hash table backed by an index file, a heap file, and a roster file. **Index file (``path``)** Slot 0 is a *header* storing three uint64 counters: * ``occupied_count`` — number of live (OCCUPIED) data slots * ``deleted_count`` — number of soft-deleted (DEL
| 146 | |
| 147 | |
| 148 | class MmapCache: |
| 149 | """ |
| 150 | A memory-mapped hash table backed by an index file, a heap file, and a |
| 151 | roster file. |
| 152 | |
| 153 | **Index file (``path``)** |
| 154 | |
| 155 | Slot 0 is a *header* storing three uint64 counters: |
| 156 | |
| 157 | * ``occupied_count`` — number of live (OCCUPIED) data slots |
| 158 | * ``deleted_count`` — number of soft-deleted (DELETED) data slots |
| 159 | * ``high_water_mark`` — highest data-slot index ever written |
| 160 | |
| 161 | Data slots occupy positions ``1 … size-1``. Each data slot stores a |
| 162 | null-padded key plus a pointer (offset + length) into the heap file and |
| 163 | an mtime timestamp (nanoseconds). |
| 164 | |
| 165 | **Heap file (``path + ".heap"``)** |
| 166 | |
| 167 | A flat binary append-log for variable-size values. Deleted or superseded |
| 168 | heap regions are reclaimed by ``atomic_rebuild``. |
| 169 | |
| 170 | **Roster file (``path + ".roster"``)** |
| 171 | |
| 172 | A packed array of ``uint32`` slot indices, one per live (OCCUPIED) entry. |
| 173 | ``list_items()`` reads this file directly instead of scanning the full |
| 174 | index, making it O(occupied) regardless of table size or fill factor. |
| 175 | The roster is kept in sync by ``put()`` and ``delete()`` under the write |
| 176 | lock, and rebuilt atomically by ``atomic_rebuild``. |
| 177 | |
| 178 | **Performance summary** |
| 179 | |
| 180 | * ``get`` / ``get_mtime`` / ``contains``: O(1) average (open-addressing) |
| 181 | * ``put`` / ``delete``: O(1) average + one roster file append/rewrite |
| 182 | * ``list`` / ``list_items``: O(occupied) — independent of table size |
| 183 | * ``get_stats`` occupied+deleted counters: O(1) from header |
| 184 | * ``atomic_rebuild``: O(items) to repopulate |
| 185 | |
| 186 | **Key length** |
| 187 | |
| 188 | Keys are encoded to bytes and silently truncated to ``key_size`` bytes — |
| 189 | two logical keys that share the same first ``key_size`` bytes will collide |
| 190 | on the same slot. Callers that may use long or user-supplied keys must |
| 191 | either size ``key_size`` accordingly or pre-hash the keys themselves. |
| 192 | """ |
| 193 | |
| 194 | def __init__( |
| 195 | self, |
| 196 | path, |
| 197 | size=1_000_000, |
| 198 | slot_size=96, |
| 199 | key_size=64, |
| 200 | heap_path=None, |
| 201 | staleness_check_interval=0.25, |
| 202 | verify_checksums=True, |
| 203 | max_segment_bytes=DEFAULT_MAX_SEGMENT_BYTES, |
| 204 | ): |
| 205 | _ensure_xxhash() |
no outgoing calls