Filesystem-based transactional key-value store. Transactionality is achieved by using a log (aka journal) to record changes. The log is a series of numbered files called segments. Each segment is a series of log entries. The segment number together with the offset of each entry rel
| 66 | |
| 67 | |
| 68 | class LegacyRepository: |
| 69 | """ |
| 70 | Filesystem-based transactional key-value store. |
| 71 | |
| 72 | Transactionality is achieved by using a log (aka journal) to record changes. The log is a series of numbered files |
| 73 | called segments. Each segment is a series of log entries. The segment number together with the offset of each |
| 74 | entry relative to its segment start establishes an ordering of the log entries. This is the "definition" of |
| 75 | time for the purposes of the log. |
| 76 | |
| 77 | Log entries are either PUT, DELETE, or COMMIT. |
| 78 | |
| 79 | A COMMIT is always the final log entry in a segment and marks all data from the beginning of the log until the |
| 80 | segment ending with the COMMIT as committed and consistent. The segment number of a segment ending with a COMMIT |
| 81 | is called the transaction ID of that commit, and a segment ending with a COMMIT is called committed. |
| 82 | |
| 83 | When reading from a repository it is first checked whether the last segment is committed. If it is not, then |
| 84 | all segments after the last committed segment are deleted; they contain log entries whose consistency is not |
| 85 | established by a COMMIT. |
| 86 | |
| 87 | Note that the COMMIT cannot establish consistency by itself, but only manages to do so with proper support from |
| 88 | the platform (including the hardware). See platform.base.SyncFile for details. |
| 89 | |
| 90 | A PUT inserts a key-value pair. The value is stored in the log entry, hence the repository implements |
| 91 | full data logging, meaning that all data is consistent, not just metadata (which is common in filesystems). |
| 92 | |
| 93 | A DELETE marks a key as deleted. |
| 94 | |
| 95 | For a given key only the last entry regarding the key, which is called current (all other entries are called |
| 96 | superseded), is relevant: If there is no entry or the last entry is a DELETE then the key does not exist. |
| 97 | Otherwise the last PUT defines the value of the key. |
| 98 | |
| 99 | By superseding a PUT (with either another PUT or a DELETE) the log entry becomes obsolete. A segment containing |
| 100 | such obsolete entries is called sparse, while a segment containing no such entries is called compact. |
| 101 | |
| 102 | Sparse segments can be compacted and thereby disk space freed. This destroys the transaction for which the |
| 103 | superseded entries were current. |
| 104 | |
| 105 | On-disk layout: |
| 106 | |
| 107 | dir/README |
| 108 | dir/config |
| 109 | dir/data/<X // SEGMENTS_PER_DIR>/<X> |
| 110 | dir/index.X |
| 111 | dir/hints.X |
| 112 | |
| 113 | Filesystem interaction |
| 114 | ---------------------- |
| 115 | |
| 116 | LoggedIO generally tries to rely on common behaviours across transactional file systems. |
| 117 | |
| 118 | Segments that are deleted are truncated first, which avoids problems if the FS needs to |
| 119 | allocate space to delete the dirent of the segment. This mostly affects CoW file systems, |
| 120 | traditional journaling file systems have a fairly good grip on this problem. |
| 121 | |
| 122 | Note that deletion, i.e. unlink(2), is atomic on every file system that uses inode reference |
| 123 | counts, which includes pretty much all of them. To remove a dirent the inodes refcount has |
| 124 | to be decreased, but you can't decrease the refcount before removing the dirent nor can you |
| 125 | decrease the refcount after removing the dirent. File systems solve this with a lock, |
no outgoing calls