hub / github.com/pathwaycom/pathway / read

Function read

python/pathway/io/fs/__init__.py:32–269

Reads a table from one or several files with the specified format. If the format is ``"plaintext"``, the table consists of a single column ``data``, with each cell containing a single line from the file. If the format is ``"plaintext_by_file"`` or ``"binary"``, the table consists of a single column ``data``, with each cell containing the contents of a whole file.
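The row layouts described for ``"plaintext"``, ``"plaintext_by_file"``, and ``"binary"`` can be illustrated with a standard-library sketch. This is not Pathway's implementation: the helper name `rows_for_format` is hypothetical, and details such as how line endings are handled are assumptions made for the illustration.

```python
from pathlib import Path


def rows_for_format(path: Path, fmt: str) -> list[dict]:
    """Mimic the single-column ``data`` layout described for each format."""
    if fmt == "plaintext":
        # One row per line of the file (line endings dropped in this sketch).
        text = path.read_text(encoding="utf-8")
        return [{"data": line} for line in text.splitlines()]
    if fmt == "plaintext_by_file":
        # One row per file, holding the whole decoded contents.
        return [{"data": path.read_text(encoding="utf-8")}]
    if fmt == "binary":
        # One row per file, holding raw bytes with no UTF-8 decoding.
        return [{"data": path.read_bytes()}]
    raise ValueError(f"format not covered by this sketch: {fmt!r}")
```

For a file containing two lines, the sketch yields two rows for ``"plaintext"`` and exactly one row for each of the other two formats.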

(
    path: str | PathLike,
    format: Literal[
        "csv", "json", "plaintext", "plaintext_by_file", "binary", "only_metadata"
    ],
    *,
    schema: type[Schema] | None = None,
    mode: Literal["streaming", "static"] = "streaming",
    csv_settings: CsvParserSettings | None = None,
    json_field_paths: dict[str, str] | None = None,
    object_pattern: str = "*",
    with_metadata: bool = False,
    name: str | None = None,
    autocommit_duration_ms: int | None = 1500,
    max_backlog_size: int | None = None,
    debug_data: Any = None,
    _stacklevel: int = 1,
    **kwargs,
)
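A minimal usage sketch of this signature, assuming the `pathway` package is installed and that the function is reachable as `pw.io.fs.read` (inferred from the module path above). The schema `PetSchema`, its columns, and the helper name `read_pets_static` are hypothetical, and `pw.debug.table_to_pandas` is used here only to materialize the result for inspection.

```python
from pathlib import Path


def read_pets_static(directory: Path):
    """Read every CSV file under `directory` once and return a DataFrame."""
    # Imported here so the sketch can be loaded without Pathway installed.
    import pathway as pw

    # Hypothetical schema for illustration: two string columns.
    class PetSchema(pw.Schema):
        owner: str
        pet: str

    table = pw.io.fs.read(
        str(directory),
        format="csv",
        schema=PetSchema,
        mode="static",  # ingest what is on disk now instead of waiting for updates
    )
    # Runs the computation and converts the resulting table to pandas.
    return pw.debug.table_to_pandas(table)
```

Static mode is chosen so that the call returns once the existing files have been read; with the default ``"streaming"`` mode the engine keeps waiting for updates.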

Source from the content-addressed store, hash-verified

@check_arg_types
@trace_user_frame
def read(
    path: str | PathLike,
    format: Literal[
        "csv", "json", "plaintext", "plaintext_by_file", "binary", "only_metadata"
    ],
    *,
    schema: type[Schema] | None = None,
    mode: Literal["streaming", "static"] = "streaming",
    csv_settings: CsvParserSettings | None = None,
    json_field_paths: dict[str, str] | None = None,
    object_pattern: str = "*",
    with_metadata: bool = False,
    name: str | None = None,
    autocommit_duration_ms: int | None = 1500,
    max_backlog_size: int | None = None,
    debug_data: Any = None,
    _stacklevel: int = 1,
    **kwargs,
) -> Table:
    """Reads a table from one or several files with the specified format.

    In case the format is ``"plaintext"``, the table will consist of a single column
    ``data`` with each cell containing a single line from the file.

    In case the format is one of ``"plaintext_by_file"`` or ``"binary"`` the table will
    consist of a single column ``data`` with each cell containing contents of the whole file.

    If the format is ``"only_metadata"``, only the metadata column will be read, without
    opening and without reading the contents of the files. The metadata is then available
    in the ``_metadata`` column.

    Args:
        path: Path to the file or to the folder with files or
            `glob <https://en.wikipedia.org/wiki/Glob_(programming)>`_ pattern for the
            objects to be read. The connector will read the contents of all matching files as well
            as recursively read the contents of all matching folders.
        format: Format of data to be read. Currently ``"csv"``, ``"json"``, ``"plaintext"``,
            ``"plaintext_by_file"``, ``"binary"``, and ``"only_metadata"`` formats are
            supported. The difference between ``"plaintext"`` and ``"plaintext_by_file"`` is
            how the input is tokenized: if the ``"plaintext"`` option is chosen, it's split
            by the newlines. Otherwise, the files are split in full and one row will
            correspond to one file. In case the ``"binary"`` format is specified,
            the data is read as raw bytes without UTF-8 parsing. Finally, if ``"only_metadata"``
            is chosen, the connector only scans the filesystem for file additions,
            changes, modifications, and provides them in the metadata column.
        schema: Schema of the resulting table.
        mode: Denotes how the engine polls the new data from the source. Currently
            ``"streaming"`` and ``"static"`` are supported. If set to ``"streaming"`` the engine will wait for
            the updates in the specified directory. It will track file additions, deletions, and
            modifications and reflect these events in the state. For example, if a file was deleted,
            ``"streaming"`` mode will also remove rows obtained by reading this file from the table. On
            the other hand, the ``"static"`` mode will only consider the available data and ingest all
            of it in one commit. The default value is ``"streaming"``.
        csv_settings: Settings for the CSV parser. This parameter is used only in case
            the specified format is ``"csv"``.
        json_field_paths: If the format is ``"json"``, this field allows to map field names
            into path in the read json object. For the field which require such mapping,
            it should be given in the format ``<field_name>: <path to be mapped>``,
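The ``mode`` description in the docstring says that ``"streaming"`` tracks file additions, deletions, and modifications, while ``"static"`` ingests only what is available at the time of reading. The following standard-library sketch illustrates that idea by comparing two directory snapshots. It is not how Pathway's engine works internally, and the names `take_snapshot` and `diff_snapshots` are hypothetical.

```python
from pathlib import Path

# File name -> rows (lines) read from that file.
Snapshot = dict[str, list[str]]


def take_snapshot(directory: Path) -> Snapshot:
    """Read every file in `directory` once, as a one-shot "static" read would."""
    return {
        p.name: p.read_text(encoding="utf-8").splitlines()
        for p in sorted(directory.iterdir())
        if p.is_file()
    }


def diff_snapshots(old: Snapshot, new: Snapshot) -> dict[str, list[str]]:
    """Classify the changes a "streaming" reader would have to reflect."""
    return {
        "added": sorted(new.keys() - old.keys()),
        "deleted": sorted(old.keys() - new.keys()),
        "modified": sorted(k for k in old.keys() & new.keys() if old[k] != new[k]),
    }
```

In this picture, a deleted file means its rows disappear from the next snapshot, which mirrors the docstring's statement that streaming mode removes rows obtained from a deleted file.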

Callers

nothing calls this directly

Calls 6

internal_connector_mode · Function · 0.90

internal_read_method · Function · 0.90

construct_schema_and_data_format · Function · 0.90

_get_unique_name · Function · 0.90

table_from_datasource · Function · 0.90

select · Method · 0.45

Tested by

no test coverage detected