hub / github.com/ray-project/ray / from_items

Function from_items

python/ray/data/read_api.py:166–254

Create a :class:`~ray.data.Dataset` from a list of local Python objects. Use this method to create small datasets from data that fits in memory. The column name defaults to "item".

(
    items: List[Any],
    *,
    parallelism: int = -1,
    override_num_blocks: Optional[int] = None,
)
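The "item" default can be pictured with a small pure-Python sketch. It does not call Ray: `to_row` is a hypothetical helper, and passing mapping items through unchanged is an assumption that the docstring does not state.

```python
from collections.abc import Mapping
from typing import Any, Dict, List


def to_row(item: Any) -> Dict[str, Any]:
    """Hypothetical helper: wrap a bare value under the default "item" column."""
    # Assumption: mapping-like items already name their own columns,
    # so they are copied through as-is.
    if isinstance(item, Mapping):
        return dict(item)
    # Bare values get the default column name from the docstring.
    return {"item": item}


rows: List[Dict[str, Any]] = [to_row(x) for x in [1, 2, 3]]
print(rows)
```

Under this assumption, a list of plain integers yields single-column rows, which matches the one-column `item` schema shown in the docstring example.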


@PublicAPI
def from_items(
    items: List[Any],
    *,
    parallelism: int = -1,
    override_num_blocks: Optional[int] = None,
) -> MaterializedDataset:
    """Create a :class:`~ray.data.Dataset` from a list of local Python objects.

    Use this method to create small datasets from data that fits in memory. The column
    name defaults to "item".

    Examples:

        >>> import ray
        >>> ds = ray.data.from_items([1, 2, 3, 4, 5])
        >>> ds  # doctest: +ELLIPSIS
        shape: (5, 1)
        ╭───────╮
        │ item  │
        │ ---   │
        │ int64 │
        ╞═══════╡
        │ 1     │
        │ 2     │
        │ 3     │
        │ 4     │
        │ 5     │
        ╰───────╯
        (Showing 5 of 5 rows)
        >>> ds.schema()
        Column  Type
        ------  ----
        item    int64

    Args:
        items: List of local Python objects.
        parallelism: This argument is deprecated. Use ``override_num_blocks`` argument.
        override_num_blocks: Override the number of output blocks from all read tasks.
            By default, the number of output blocks is dynamically decided based on
            input data size and available resources. You shouldn't manually set this
            value in most cases.

    Returns:
        A :class:`~ray.data.Dataset` holding the items.
    """
    import builtins

    parallelism = _get_num_output_blocks(parallelism, override_num_blocks)
    if parallelism == 0:
        raise ValueError(f"parallelism must be -1 or > 0, got: {parallelism}")

    detected_parallelism, _, _ = _autodetect_parallelism(
        parallelism,
        ray.util.get_current_placement_group(),
        DataContext.get_current(),
    )
    # Truncate parallelism to number of items to avoid empty blocks.
    detected_parallelism = min(len(items), detected_parallelism)
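The excerpt ends at the truncation step. As a rough illustration of why truncating avoids empty blocks, here is a standalone sketch that does not use Ray. `plan_blocks` is a hypothetical helper, the even contiguous split is an assumption rather than something the excerpt shows, and the auto-detection path (`-1`) is left out.

```python
from typing import Any, List


def plan_blocks(items: List[Any], requested_blocks: int) -> List[List[Any]]:
    """Hypothetical helper: split items into contiguous, non-empty blocks."""
    if requested_blocks <= 0:
        raise ValueError(f"requested_blocks must be > 0, got: {requested_blocks}")
    # Same truncation as in the excerpt: never more blocks than items.
    num_blocks = min(len(items), requested_blocks)
    if num_blocks == 0:
        return []
    # Assumption: give the first `remainder` blocks one extra item each,
    # which keeps the original item order intact.
    size, remainder = divmod(len(items), num_blocks)
    blocks: List[List[Any]] = []
    start = 0
    for i in range(num_blocks):
        end = start + size + (1 if i < remainder else 0)
        blocks.append(items[start:end])
        start = end
    return blocks


print(plan_blocks([1, 2, 3, 4, 5], 2))
print(plan_blocks([1, 2, 3], 10))
```

Asking for ten blocks over three items yields three single-item blocks instead of seven empty ones, which is the effect the truncation comment describes.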

Callers 1

from_tf · Function · 0.70

Calls 15

add · Method · 0.95

build · Method · 0.95

_autodetect_parallelism · Function · 0.90

DelegatingBlockBuilder · Class · 0.90

FromItems · Class · 0.90

DatasetStats · Class · 0.90

LogicalPlan · Class · 0.90

MaterializedDataset · Class · 0.90

_get_num_output_blocks · Function · 0.85

from_block · Method · 0.80

put · Method · 0.65

copy · Method · 0.65

Tested by

no test coverage detected
