hub / github.com/uber/petastorm / NdarrayCodec

Class NdarrayCodec

petastorm/codecs.py:133–171

Encodes a NumPy ndarray into, or decodes an ndarray from, a Spark DataFrame field.
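The round trip can be tried without Spark or petastorm installed. The following is a minimal sketch that mirrors only the serialization steps of `encode` and `decode` from the source listing on this page; the helper names `encode_ndarray` and `decode_ndarray` are hypothetical and are not part of petastorm.

```python
from io import BytesIO

import numpy as np


def encode_ndarray(value):
    """Serialize an ndarray to .npy bytes, as the tail of NdarrayCodec.encode does."""
    memfile = BytesIO()
    np.save(memfile, value)
    return bytearray(memfile.getvalue())


def decode_ndarray(blob):
    """Rebuild the ndarray from .npy bytes, as NdarrayCodec.decode does."""
    return np.load(BytesIO(blob))


# Hypothetical sample data for illustration only.
original = np.arange(6, dtype=np.float32).reshape(2, 3)
blob = encode_ndarray(original)
restored = decode_ndarray(blob)
```

Because the payload is a complete `.npy` file, dtype and shape travel with the bytes, which is why `decode` needs no information from the schema field to rebuild the array.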

Source from the content-addressed store, hash-verified

class NdarrayCodec(DataframeColumnCodec):
    """Encodes numpy ndarray into, or decodes an ndarray from, a spark dataframe field."""

    def encode(self, unischema_field, value):
        expected_dtype = unischema_field.numpy_dtype
        if isinstance(value, np.ndarray):
            if expected_dtype != value.dtype.type:
                raise ValueError('Unexpected type of {} feature. '
                                 'Expected {}. Got {}'.format(unischema_field.name, expected_dtype, value.dtype))

            expected_shape = unischema_field.shape
            if not _is_compliant_shape(value.shape, expected_shape):
                raise ValueError('Unexpected dimensions of {} feature. '
                                 'Expected {}. Got {}'.format(unischema_field.name, expected_shape, value.shape))
        else:
            raise ValueError('Unexpected type of {} feature. '
                             'Expected ndarray of {}. Got {}'.format(unischema_field.name, expected_dtype, type(value)))

        memfile = BytesIO()
        np.save(memfile, value)
        return bytearray(memfile.getvalue())

    def decode(self, unischema_field, value):
        memfile = BytesIO(value)
        return np.load(memfile)

    def spark_dtype(self):
        # Lazy loading pyspark to avoid creating pyspark dependency on data reading code path
        # (currently works only with make_batch_reader). We should move all pyspark related code into a separate module
        import pyspark.sql.types as sql_types

        return sql_types.BinaryType()

    def __str__(self):
        """Represent this as the following form:

        >>> NdarrayCodec()
        """
        return f'{type(self).__name__}()'
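The three `ValueError` branches in `encode` can be exercised in isolation. The sketch below is an illustration under stated assumptions, not the library's code: the body of `_is_compliant_shape` is not shown in this listing, so `is_compliant_shape` here is a hypothetical stand-in that assumes a `None` entry in the expected shape matches any size, and `validate` is a hypothetical helper that takes the dtype and shape directly instead of a unischema field.

```python
import numpy as np


def is_compliant_shape(actual, expected):
    """Hypothetical stand-in: same rank, and None in `expected` matches any size."""
    if len(actual) != len(expected):
        return False
    return all(e is None or a == e for a, e in zip(actual, expected))


def validate(value, expected_dtype, expected_shape):
    """Raise ValueError under the same three conditions that encode checks."""
    if not isinstance(value, np.ndarray):
        raise ValueError('Expected ndarray of {}. Got {}'.format(expected_dtype, type(value)))
    # encode compares against the scalar type (e.g. np.uint8), not the dtype object.
    if expected_dtype != value.dtype.type:
        raise ValueError('Expected {}. Got {}'.format(expected_dtype, value.dtype))
    if not is_compliant_shape(value.shape, expected_shape):
        raise ValueError('Expected {}. Got {}'.format(expected_shape, value.shape))


def fails(value, expected_dtype, expected_shape):
    """Return True if validate rejects the value."""
    try:
        validate(value, expected_dtype, expected_shape)
    except ValueError:
        return True
    return False
```

Note that the dtype test is an exact match on the scalar type, so an array of `float64` is rejected for a field declared as `float32`; no implicit cast happens before serialization.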

Callers 8

schema.py (File) · 0.90
test_common.py (File) · 0.90
test_str_special_method (Function) · 0.90
test_decode_row.py (File) · 0.90
test_nominal_case (Function) · 0.90
test_make_named_tuple (Function) · 0.90

Calls

no outgoing calls

Tested by 4

test_str_special_method (Function) · 0.72
test_nominal_case (Function) · 0.72
test_make_named_tuple (Function) · 0.72
