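The byte layout that the key-encoding portion of `serialize_entity_key` produces for version 3 can be sketched in isolation. This is an illustration under stated assumptions, not the library's code: `pack_join_keys`, `unpack_join_keys`, and `STRING_TYPE_TAG` are hypothetical names, the tag value 2 is an assumed stand-in for `ValueType.STRING`, and entity values are left out entirely.

```python
import struct
from typing import List, Tuple

# Assumption: numeric stand-in for ValueType.STRING; the real tag comes from the enum.
STRING_TYPE_TAG = 2


def pack_join_keys(join_keys: List[str]) -> bytes:
    """Pack join keys following the version-3 layout (entity values omitted)."""
    sorted_keys = sorted(join_keys)
    # Version 3 starts with the number of keys as a little-endian uint32.
    output: List[bytes] = [struct.pack("<I", len(sorted_keys))]
    for key in sorted_keys:
        encoded = key.encode("utf8")
        output.append(struct.pack("<I", STRING_TYPE_TAG))  # type tag
        output.append(struct.pack("<I", len(encoded)))  # byte length of the key
        output.append(encoded)
    return b"".join(output)


def unpack_join_keys(data: bytes) -> Tuple[List[str], int]:
    """Read the keys back; return them with the offset where values would start."""
    (count,) = struct.unpack_from("<I", data, 0)
    offset = 4
    keys: List[str] = []
    for _ in range(count):
        _tag, length = struct.unpack_from("<II", data, offset)
        offset += 8
        keys.append(data[offset : offset + length].decode("utf8"))
        offset += length
    return keys, offset


blob = pack_join_keys(["driver_id", "customer_id"])
keys, values_offset = unpack_join_keys(blob)
```

Because the keys are sorted before packing, the same set of join keys yields the same bytes regardless of input order, which is the stability property the docstring asks for. The count and length prefixes are what let a reader find where each key ends without any delimiter.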


def serialize_entity_key(
    entity_key: EntityKeyProto, entity_key_serialization_version=3
) -> bytes:
    """
    Serialize entity key to a bytestring so it can be used as a lookup key in a hash table.

    We need this encoding to be stable; therefore we cannot just use protobuf serialization
    here since it does not guarantee that two proto messages containing the same data will
    serialize to the same byte string[1].

    [1] https://developers.google.com/protocol-buffers/docs/encoding

    Args:
        entity_key_serialization_version: version of the entity key serialization
            Versions:
                version 3: entity_key size is added to the serialization for deserialization purposes
        entity_key: EntityKeyProto

    Returns: bytes of the serialized entity key
    """
    if entity_key_serialization_version < 3:
        # Not raising an error; keeping this as a warning so stores can still be reserialized.
        # We should remove this after a few releases.
        warnings.warn(
            "Serialization of entity key with version < 3 is removed. Please use version 3 by setting entity_key_serialization_version=3. "
            "To reserialize your online store features, refer to https://github.com/feast-dev/feast/blob/master/docs/how-to-guides/entity-reserialization-of-from-v2-to-v3.md"
        )

    sorted_keys: List[str]
    sorted_values: List[ValueProto]
    if not entity_key.join_keys:
        sorted_keys = []
        sorted_values = []
    elif len(entity_key.join_keys) == 1:
        # Fast path: single entity, no sorting needed
        sorted_keys = [entity_key.join_keys[0]]
        sorted_values = [entity_key.entity_values[0]]
    else:
        # Multi-entity: sort (key, value) pairs by join key so the output order is stable
        pairs = sorted(zip(entity_key.join_keys, entity_key.entity_values))
        sorted_keys = [k for k, _ in pairs]
        sorted_values = [v for _, v in pairs]

    output: List[bytes] = []

    if entity_key_serialization_version > 2:
        # Version 3 prefixes the payload with the number of join keys (little-endian uint32)
        output.append(struct.pack("<I", len(sorted_keys)))

    # Optimize key encoding by pre-encoding all strings
    if sorted_keys:
        encoded_keys = [k.encode("utf8") for k in sorted_keys]
        for k_encoded in encoded_keys:
            output.append(struct.pack("<I", ValueType.STRING))
            if entity_key_serialization_version > 2:
                output.append(struct.pack("<I", len(k_encoded)))
            output.append(k_encoded)

    for v in sorted_values: