Method sample

dask/dataframe/dask_expr/_collection.py:2578–2624 · view source on GitHub ↗

Random sample of items Parameters ---------- n : int, optional Number of items to return is not supported by dask. Use frac instead. frac : float, optional Approximate fraction of items to return. This sampling fraction is

(self, n=None, frac=None, replace=False, random_state=None)

Source from the content-addressed store, hash-verified

2576	return func(self, args, *kwargs)
2577
2578	def sample(self, n=None, frac=None, replace=False, random_state=None):
2579	"""Random sample of items
2580
2581	Parameters
2582	----------
2583	n : int, optional
2584	Number of items to return is not supported by dask. Use frac
2585	instead.
2586	frac : float, optional
2587	Approximate fraction of items to return. This sampling fraction is
2588	applied to all partitions equally. Note that this is an
2589	approximate fraction. You should not expect exactly ``len(df) * frac``
2590	items to be returned, as the exact number of elements selected will
2591	depend on how your data is partitioned (but should be pretty close
2592	in practice).
2593	replace : boolean, optional
2594	Sample with or without replacement. Default = False.
2595	random_state : int or ``np.random.RandomState``
2596	If an int, we create a new RandomState with this as the seed;
2597	Otherwise we draw from the passed RandomState.
2598
2599	See Also
2600	--------
2601	DataFrame.random_split
2602	pandas.DataFrame.sample
2603	"""
2604	if n is not None:
2605	msg = (
2606	"sample does not support the number of sampled items "
2607	"parameter, 'n'. Please use the 'frac' parameter instead."
2608	)
2609	if isinstance(n, Number) and 0 <= n <= 1:
2610	warnings.warn(msg)
2611	frac = n
2612	else:
2613	raise ValueError(msg)
2614
2615	if frac is None:
2616	raise ValueError("frac must not be None")
2617
2618	if random_state is None:
2619	random_state = np.random.RandomState()
2620
2621	state_data = random_state_data(self.npartitions, random_state)
2622	return new_collection(
2623	expr.Sample(self, state_data=state_data, frac=frac, replace=replace)
2624	)
2625
2626	def _repr_data(self):
2627	raise NotImplementedError

Callers 15

sizeof_python_collectionFunction · 0.80

test_sample_with_more_bag_partitionsFunction · 0.80

test_sample_size_exactly_kFunction · 0.80

test_sample_k_bigger_than_bag_sizeFunction · 0.80

test_sample_empty_partitionFunction · 0.80

test_sample_size_k_bigger_than_smallest_partition_sizeFunction · 0.80

test_sample_k_equal_bag_size_with_unbalanced_partitionsFunction · 0.80

test_sample_k_larger_than_partitionsFunction · 0.80

test_sample_return_bagFunction · 0.80

test_reservoir_sample_map_partitions_correctnessFunction · 0.80

sampleFunction · 0.80

pd_splitFunction · 0.80

Calls 3

random_state_dataFunction · 0.90

new_collectionFunction · 0.90

RandomStateMethod · 0.45

Tested by 15

test_sample_with_more_bag_partitionsFunction · 0.64

test_sample_size_exactly_kFunction · 0.64

test_sample_k_bigger_than_bag_sizeFunction · 0.64

test_sample_empty_partitionFunction · 0.64

test_sample_size_k_bigger_than_smallest_partition_sizeFunction · 0.64

test_sample_k_equal_bag_size_with_unbalanced_partitionsFunction · 0.64

test_sample_k_larger_than_partitionsFunction · 0.64

test_sample_return_bagFunction · 0.64

test_reservoir_sample_map_partitions_correctnessFunction · 0.64

test_sampleFunction · 0.64

test_embarrassingly_parallel_operationsFunction · 0.64

test_sampleFunction · 0.64