MCPcopy
hub / github.com/ray-project/ray / join

Method join

python/ray/data/dataset.py:3052–3223  ·  view source on GitHub ↗

Join :class:`Datasets ` on join keys. Args: ds: Other dataset to join against join_type: The kind of join that should be performed, one of ("inner", "left_outer", "right_outer", "full_outer", "left_semi", "right_semi",

(
        self,
        ds: "Dataset",
        join_type: str,
        num_partitions: int,
        on: Tuple[str] = ("id",),
        right_on: Optional[Tuple[str]] = None,
        left_suffix: Optional[str] = None,
        right_suffix: Optional[str] = None,
        *,
        partition_size_hint: Optional[int] = None,
        aggregator_ray_remote_args: Optional[Dict[str, Any]] = None,
        validate_schemas: bool = False,
    )

Source from the content-addressed store, hash-verified

3050 @AllToAllAPI
3051 @PublicAPI(api_group=SMJ_API_GROUP)
3052 def join(
3053 self,
3054 ds: "Dataset",
3055 join_type: str,
3056 num_partitions: int,
3057 on: Tuple[str] = ("id",),
3058 right_on: Optional[Tuple[str]] = None,
3059 left_suffix: Optional[str] = None,
3060 right_suffix: Optional[str] = None,
3061 *,
3062 partition_size_hint: Optional[int] = None,
3063 aggregator_ray_remote_args: Optional[Dict[str, Any]] = None,
3064 validate_schemas: bool = False,
3065 ) -> "Dataset":
3066 """Join :class:`Datasets <ray.data.Dataset>` on join keys.
3067
3068 Args:
3069 ds: Other dataset to join against
3070 join_type: The kind of join that should be performed, one of ("inner",
3071 "left_outer", "right_outer", "full_outer", "left_semi", "right_semi",
3072 "left_anti", "right_anti").
3073 num_partitions: Total number of "partitions" input sequences will be split
3074 into with each partition being joined independently. Increasing number
3075 of partitions allows to reduce individual partition size, hence reducing
3076 memory requirements when individual partitions are being joined. Note
3077 that, consequently, this will also be a total number of blocks that will
3078 be produced as a result of executing join.
3079 on: The columns from the left operand that will be used as
3080 keys for the join operation.
3081 right_on: The columns from the right operand that will be
3082 used as keys for the join operation. When none, `on` will
3083 be assumed to be a list of columns to be used for the right dataset
3084 as well.
3085 left_suffix: (Optional) Suffix to be appended for columns of the left
3086 operand.
3087 right_suffix: (Optional) Suffix to be appended for columns of the right
3088 operand.
3089 partition_size_hint: (Optional) Hint to joining operator about the estimated
3090 avg expected size of the individual partition (in bytes).
3091 This is used in estimating the total dataset size and allow to tune
3092 memory requirement of the individual joining workers to prevent OOMs
3093 when joining very large datasets.
3094 aggregator_ray_remote_args: (Optional) Parameter overriding `ray.remote`
3095 args passed when constructing joining (aggregator) workers.
3096 validate_schemas: (Optional) Controls whether validation of provided
3097 configuration against input schemas will be performed (defaults to
3098 false, since obtaining schemas could be prohibitively expensive).
3099
3100 Returns:
3101 A :class:`Dataset` that holds rows of input left Dataset joined with the
3102 right Dataset based on join type and keys.
3103
3104 Examples:
3105
3106 .. testcode::
3107 :skipif: True
3108
3109 doubles_ds = ray.data.range(4).map(

Callers 15

setup.pyFile · 0.45
find_versionFunction · 0.45
_find_bazel_binFunction · 0.45
fixed_isdirFunction · 0.45
buildFunction · 0.45
_walk_thirdparty_dirFunction · 0.45
copy_fileFunction · 0.45
pip_runFunction · 0.45
_init_argsMethod · 0.45
to_bytesMethod · 0.45

Calls 5

schemaMethod · 0.95
JoinClass · 0.90
LogicalPlanClass · 0.90
_validate_schemasMethod · 0.80
_from_parentMethod · 0.80

Tested by 15

test_tilde_expansionMethod · 0.36
start_usage_stats_serverFunction · 0.36
download_model_from_s3Function · 0.36
get_expected_contentMethod · 0.36
mock_serverFunction · 0.36
test_chatMethod · 0.36
test_completionMethod · 0.36