Repository: github.com/dask/dask

Method drop_duplicates

dask/dataframe/dask_expr/_collection.py, lines 4341–4361
drop_duplicates(
        self,
        ignore_index=False,
        split_every=None,
        split_out=True,
        shuffle_method=None,
        keep="first",
    )

Source:

    def drop_duplicates(
        self,
        ignore_index=False,
        split_every=None,
        split_out=True,
        shuffle_method=None,
        keep="first",
    ):
        shuffle_method = _get_shuffle_preferring_order(shuffle_method)
        if keep is False:
            raise NotImplementedError("drop_duplicates with keep=False")
        return new_collection(
            DropDuplicates(
                self,
                ignore_index=ignore_index,
                split_out=split_out,
                split_every=split_every,
                shuffle_method=shuffle_method,
                keep=keep,
            )
        )

Callers (15)

nunique · Method · 0.95
test_merge_after_rename · Function · 0.95
test_split_every · Function · 0.95
merge_chunk · Function · 0.45
_get_categories · Function · 0.45
_get_categories_agg · Function · 0.45
_nunique_df_chunk · Function · 0.45
most_recent_tail_summary · Function · 0.45
most_recent_head_summary · Function · 0.45
_lower · Method · 0.45
drop_duplicates · Method · 0.45
test_disk_shuffle · Function · 0.45

Calls (3)

new_collection · Function · 0.90
DropDuplicates · Class · 0.90

Tested by (15)

test_merge_after_rename · Function · 0.76
test_split_every · Function · 0.76
test_disk_shuffle · Function · 0.36
test_task_shuffle · Function · 0.36
test_task_shuffle_index · Function · 0.36
test_drop_duplicates · Function · 0.36
test_drop_duplicates · Function · 0.36