Filter out rows that don't satisfy the given predicate. You can use a function, a callable class, or an expression to perform the transformation. For functions, Ray Data uses stateless Ray tasks. For classes, Ray Data uses stateful Ray actors. For more information, see Stateful Transforms.
(
self,
fn: Optional[UserDefinedFunction[Dict[str, Any], bool]] = None,
expr: Optional[Union[str, Expr]] = None,
*,
compute: Union[str, ComputeStrategy] = None,
fn_args: Optional[Iterable[Any]] = None,
fn_kwargs: Optional[Dict[str, Any]] = None,
fn_constructor_args: Optional[Iterable[Any]] = None,
fn_constructor_kwargs: Optional[Dict[str, Any]] = None,
num_cpus: Optional[float] = None,
num_gpus: Optional[float] = None,
memory: Optional[float] = None,
concurrency: Optional[Union[int, Tuple[int, int], Tuple[int, int, int]]] = None,
ray_remote_args_fn: Optional[Callable[[], Dict[str, Any]]] = None,
**ray_remote_args,
)
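The three predicate styles accepted by this signature can be sketched as follows. This is a minimal illustration, not the library's reference usage: `is_small`, `IsMultipleOf`, and `run_with_ray` are hypothetical names introduced here, the divisor `5` and the actor pool size of `2` are arbitrary choices, and `run_with_ray` assumes a working Ray installation, so it is defined but not called.

```python
from typing import Any, Dict, List


def is_small(row: Dict[str, Any]) -> bool:
    """Stateless predicate (hypothetical): keep rows whose "id" is at most 4."""
    return row["id"] <= 4


class IsMultipleOf:
    """Stateful predicate (hypothetical): the divisor is set once, at construction."""

    def __init__(self, divisor: int):
        self.divisor = divisor

    def __call__(self, row: Dict[str, Any]) -> bool:
        return row["id"] % self.divisor == 0


def run_with_ray() -> Dict[str, List[Dict[str, Any]]]:
    """Apply each predicate style to a small dataset. Requires Ray to be installed."""
    import ray
    from ray.data.expressions import col

    ds = ray.data.range(20)
    return {
        # Function predicate: run by stateless Ray tasks.
        "function": ds.filter(is_small).take_all(),
        # Callable class: run by stateful Ray actors. The constructor
        # argument is forwarded through fn_constructor_args.
        "class": ds.filter(
            IsMultipleOf,
            fn_constructor_args=(5,),
            compute=ray.data.ActorPoolStrategy(size=2),  # pool size is arbitrary
        ).take_all(),
        # Predicate expression passed through the expr parameter.
        "expression": ds.filter(expr=col("id") > 17).take_all(),
    }


# The predicates are plain Python callables, so they can be checked
# locally on ordinary row dictionaries without starting Ray.
rows = [{"id": i} for i in range(10)]
small = [r for r in rows if is_small(r)]
multiples = [r for r in rows if IsMultipleOf(5)(r)]
```

Keeping the predicate logic in ordinary callables makes it easy to unit-test before handing it to `filter`; only the choice between a function and a class changes whether Ray Data schedules tasks or actors.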

@PublicAPI(api_group=BT_API_GROUP)
def filter(
    self,
    fn: Optional[UserDefinedFunction[Dict[str, Any], bool]] = None,
    expr: Optional[Union[str, Expr]] = None,
    *,
    compute: Union[str, ComputeStrategy] = None,
    fn_args: Optional[Iterable[Any]] = None,
    fn_kwargs: Optional[Dict[str, Any]] = None,
    fn_constructor_args: Optional[Iterable[Any]] = None,
    fn_constructor_kwargs: Optional[Dict[str, Any]] = None,
    num_cpus: Optional[float] = None,
    num_gpus: Optional[float] = None,
    memory: Optional[float] = None,
    concurrency: Optional[Union[int, Tuple[int, int], Tuple[int, int, int]]] = None,
    ray_remote_args_fn: Optional[Callable[[], Dict[str, Any]]] = None,
    **ray_remote_args,
) -> "Dataset":
    """Filter out rows that don't satisfy the given predicate.

    You can use either a function or a callable class or an expression to
    perform the transformation.
    For functions, Ray Data uses stateless Ray tasks. For classes, Ray Data uses
    stateful Ray actors. For more information, see
    :ref:`Stateful Transforms <stateful_transforms>`.

    .. tip::
        If you use the `expr` parameter with a predicate expression, Ray Data
        optimizes your filter with native Arrow interfaces.

    .. deprecated::
        String expressions are deprecated and will be removed in a future version.
        Use predicate expressions from `ray.data.expressions` instead.

    Examples:

        >>> import ray
        >>> from ray.data.expressions import col
        >>> ds = ray.data.range(100)
        >>> # String expressions (deprecated - will warn)
        >>> ds.filter(expr="id <= 4").take_all()
        [{'id': 0}, {'id': 1}, {'id': 2}, {'id': 3}, {'id': 4}]
        >>> # Using predicate expressions (preferred)
        >>> ds.filter(expr=(col("id") > 10) & (col("id") < 20)).take_all()
        [{'id': 11}, {'id': 12}, {'id': 13}, {'id': 14}, {'id': 15}, {'id': 16}, {'id': 17}, {'id': 18}, {'id': 19}]

    Time complexity: O(dataset size / parallelism)

    Args:
        fn: The predicate to apply to each row, or a class type
            that can be instantiated to create such a callable.
        expr: An expression that represents a predicate (boolean condition) for filtering.
            Can be either a string expression (deprecated) or a predicate expression
            from `ray.data.expressions`.
        compute: The compute strategy to use for the map operation.

            * If ``compute`` is not specified for a function, will use ``ray.data.TaskPoolStrategy()`` to launch concurrent tasks based on the available resources and number of input blocks.

            * Use ``ray.data.TaskPoolStrategy(size=n)`` to launch at most ``n`` concurrent Ray tasks.