Cast numpy/pytorch/tensorflow/pandas objects to python lists. It works recursively. If `optimize_list_casting` is True, then to avoid iterating over possibly long lists, it first checks (recursively) whether the first element that is not None or empty (if it is a sequence) has to be cast.
(obj: Any, only_1d_for_numpy=False, optimize_list_casting=True)
| 466 | |
| 467 | |
| 468 | def cast_to_python_objects(obj: Any, only_1d_for_numpy=False, optimize_list_casting=True) -> Any: |
| 469 | """ |
| 470 | Cast numpy/pytorch/tensorflow/pandas objects to python lists. |
| 471 | It works recursively. |
| 472 | |
| 473 | If `optimize_list_casting` is True, then to avoid iterating over possibly long lists, it first checks (recursively) whether the first element that is not None or empty (if it is a sequence) has to be cast. |
| 474 | If the first element needs to be cast, then all the elements of the list will be cast; otherwise they stay the same. |
| 475 | This trick makes it possible to cast objects that contain tokenizer outputs without iterating over every single token, for example. |
| 476 | |
| 477 | Args: |
| 478 | obj: the object (nested struct) to cast |
| 479 | only_1d_for_numpy (bool, default ``False``): whether to keep the full multi-dim tensors as multi-dim numpy arrays, or convert them to |
| 480 | nested lists of 1-dimensional numpy arrays. This can be useful to keep only 1-d arrays to instantiate Arrow arrays. |
| 481 | Indeed, Arrow only supports converting 1-dimensional array values. |
| 482 | optimize_list_casting (bool, default ``True``): whether to optimize list casting by checking the first non-null element to see if it needs to be cast |
| 483 | and, if it doesn't, skipping the check for the rest of the list elements. |
| 484 | |
| 485 | Returns: |
| 486 | casted_obj: the cast object |
| 487 | """ |
| 488 | return _cast_to_python_objects( |
| 489 | obj, only_1d_for_numpy=only_1d_for_numpy, optimize_list_casting=optimize_list_casting |
| 490 | )[0] |
| 491 | |
| 492 | |
| 493 | @dataclass(repr=False) |