Reads a table from one or several files with the specified format. In case the format is ``"plaintext"``, the table will consist of a single column ``data`` with each cell containing a single line from the file. In case the format is one of ``"plaintext_by_file"`` or ``"binary"``, the table will consist of a single column ``data`` with each cell containing the contents of the whole file.
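As a rough illustration of the row layouts described above, the following standard-library sketch emulates how ``"plaintext"``, ``"plaintext_by_file"``, and ``"binary"`` might turn files into rows of a single ``data`` column. It is not the connector's reader: ``rows_for_format`` is a hypothetical helper, rows are plain ``dict`` objects, and lines are split with ``str.splitlines()``, which may not match the connector's exact handling of trailing newlines or ``\r\n``.

```python
import tempfile
from pathlib import Path


def rows_for_format(files: list[Path], fmt: str) -> list[dict]:
    """Hypothetical emulation of the documented row layout for each format."""
    rows: list[dict] = []
    for f in files:
        if fmt == "plaintext":
            # One row per line; the file is decoded as UTF-8 text.
            for line in f.read_text(encoding="utf-8").splitlines():
                rows.append({"data": line})
        elif fmt == "plaintext_by_file":
            # One row per file, holding the whole decoded text.
            rows.append({"data": f.read_text(encoding="utf-8")})
        elif fmt == "binary":
            # One row per file, holding raw bytes with no UTF-8 decoding.
            rows.append({"data": f.read_bytes()})
        else:
            raise ValueError(f"format not covered by this sketch: {fmt!r}")
    return rows


with tempfile.TemporaryDirectory() as tmp:
    a = Path(tmp, "a.txt")
    b = Path(tmp, "b.txt")
    # Write bytes so the newline characters are identical on every platform.
    a.write_bytes(b"alpha\nbeta\n")
    b.write_bytes(b"gamma\n")
    files = [a, b]
    plaintext_rows = rows_for_format(files, "plaintext")
    by_file_rows = rows_for_format(files, "plaintext_by_file")
    binary_rows = rows_for_format(files, "binary")

print(plaintext_rows)  # [{'data': 'alpha'}, {'data': 'beta'}, {'data': 'gamma'}]
print(by_file_rows)    # [{'data': 'alpha\nbeta\n'}, {'data': 'gamma\n'}]
```

The same two files yield three rows in ``"plaintext"`` but two rows in ``"plaintext_by_file"`` and ``"binary"``; only the last keeps the content as ``bytes``.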
(
    path: str | PathLike,
    format: Literal[
        "csv", "json", "plaintext", "plaintext_by_file", "binary", "only_metadata"
    ],
    *,
    schema: type[Schema] | None = None,
    mode: Literal["streaming", "static"] = "streaming",
    csv_settings: CsvParserSettings | None = None,
    json_field_paths: dict[str, str] | None = None,
    object_pattern: str = "*",
    with_metadata: bool = False,
    name: str | None = None,
    autocommit_duration_ms: int | None = 1500,
    max_backlog_size: int | None = None,
    debug_data: Any = None,
    _stacklevel: int = 1,
    **kwargs,
) -> Table
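The ``path`` argument can name a single file, a folder that is read recursively, or a glob pattern. A minimal standard-library sketch of that expansion follows, assuming plain ``glob.glob`` semantics for patterns and ``os.walk`` for folders. ``resolve_inputs`` is hypothetical, and details such as hidden files, symbolic links, ordering, and the ``object_pattern`` filter are not modeled.

```python
import glob
import os
import tempfile
from pathlib import Path


def resolve_inputs(path: str) -> list[str]:
    """Hypothetical: expand a file, folder, or glob pattern into file paths."""
    found: set[str] = set()
    # glob.glob returns the path itself when it exists and has no wildcards.
    for match in glob.glob(path):
        if os.path.isdir(match):
            # Matching folders are walked recursively.
            for root, _dirs, names in os.walk(match):
                for name in names:
                    found.add(os.path.join(root, name))
        elif os.path.isfile(match):
            found.add(match)
    return sorted(found)


with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "logs", "2024").mkdir(parents=True)
    Path(tmp, "logs", "a.log").write_text("x")
    Path(tmp, "logs", "2024", "b.log").write_text("y")
    Path(tmp, "notes.txt").write_text("z")

    def rel(paths):
        # Show results relative to the temporary directory for readability.
        return [os.path.relpath(p, tmp) for p in paths]

    single_file = rel(resolve_inputs(os.path.join(tmp, "notes.txt")))
    whole_folder = rel(resolve_inputs(os.path.join(tmp, "logs")))
    pattern = rel(resolve_inputs(os.path.join(tmp, "*.txt")))

print(single_file)   # ['notes.txt']
print(whole_folder)  # ['logs/2024/b.log', 'logs/a.log'] on POSIX
print(pattern)       # ['notes.txt']
```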
@check_arg_types
@trace_user_frame
def read(
    path: str | PathLike,
    format: Literal[
        "csv", "json", "plaintext", "plaintext_by_file", "binary", "only_metadata"
    ],
    *,
    schema: type[Schema] | None = None,
    mode: Literal["streaming", "static"] = "streaming",
    csv_settings: CsvParserSettings | None = None,
    json_field_paths: dict[str, str] | None = None,
    object_pattern: str = "*",
    with_metadata: bool = False,
    name: str | None = None,
    autocommit_duration_ms: int | None = 1500,
    max_backlog_size: int | None = None,
    debug_data: Any = None,
    _stacklevel: int = 1,
    **kwargs,
) -> Table:
    """Reads a table from one or several files with the specified format.

    In case the format is ``"plaintext"``, the table will consist of a single column
    ``data`` with each cell containing a single line from the file.

    In case the format is one of ``"plaintext_by_file"`` or ``"binary"``, the table will
    consist of a single column ``data`` with each cell containing the contents of the whole file.

    If the format is ``"only_metadata"``, only the metadata column will be read, without
    opening or reading the contents of the files. The metadata is then available
    in the ``_metadata`` column.

    Args:
        path: Path to the file or to the folder with files or
            `glob <https://en.wikipedia.org/wiki/Glob_(programming)>`_ pattern for the
            objects to be read. The connector will read the contents of all matching files as well
            as recursively read the contents of all matching folders.
        format: Format of data to be read. Currently ``"csv"``, ``"json"``, ``"plaintext"``,
            ``"plaintext_by_file"``, ``"binary"``, and ``"only_metadata"`` formats are
            supported. The difference between ``"plaintext"`` and ``"plaintext_by_file"`` is
            how the input is tokenized: if the ``"plaintext"`` option is chosen, the input is split
            by newlines. Otherwise, each file is read in full and one row
            corresponds to one file. In case the ``"binary"`` format is specified,
            the data is read as raw bytes without UTF-8 parsing. Finally, if ``"only_metadata"``
            is chosen, the connector only scans the filesystem for file additions and
            modifications, and provides them in the metadata column.
        schema: Schema of the resulting table.
        mode: Denotes how the engine polls the new data from the source. Currently
            ``"streaming"`` and ``"static"`` are supported. If set to ``"streaming"``, the engine will wait for
            updates in the specified directory. It will track file additions, deletions, and
            modifications and reflect these events in the state. For example, if a file was deleted,
            ``"streaming"`` mode will also remove the rows obtained by reading this file from the table. On
            the other hand, the ``"static"`` mode will only consider the available data and ingest all
            of it in one commit. The default value is ``"streaming"``.
        csv_settings: Settings for the CSV parser. This parameter is used only in case
            the specified format is ``"csv"``.
        json_field_paths: If the format is ``"json"``, this field maps field names
            to paths in the read JSON object. Each field that requires such a mapping
            should be given in the format ``<field_name>: <path to be mapped>``,
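The ``json_field_paths`` description is cut off before it shows the path syntax, so the sketch below assumes, as a guess, that paths are JSON Pointer strings (RFC 6901) such as ``"/owner/name"``. ``resolve_pointer`` and ``apply_field_paths`` are hypothetical helpers, and the fallback that reads an unmapped field from the top-level key of the same name is likewise an assumption.

```python
import json


def resolve_pointer(document, pointer: str):
    """Resolve an RFC 6901 JSON Pointer such as "/owner/name"."""
    if pointer == "":
        return document
    if not pointer.startswith("/"):
        raise ValueError(f"not a JSON Pointer: {pointer!r}")
    current = document
    for token in pointer[1:].split("/"):
        # RFC 6901 unescaping: "~1" -> "/" first, then "~0" -> "~".
        token = token.replace("~1", "/").replace("~0", "~")
        if isinstance(current, list):
            current = current[int(token)]
        else:
            current = current[token]
    return current


def apply_field_paths(raw: str, fields: list[str], field_paths: dict[str, str]) -> dict:
    """Hypothetical: build one row, using a mapped path where one is given."""
    document = json.loads(raw)
    row = {}
    for field in fields:
        # Assumption: unmapped fields are top-level keys with the same name.
        default = "/" + field.replace("~", "~0").replace("/", "~1")
        row[field] = resolve_pointer(document, field_paths.get(field, default))
    return row


line = '{"id": 7, "owner": {"name": "Ada", "tags": ["x", "y"]}}'
row = apply_field_paths(
    line,
    fields=["id", "owner_name", "first_tag"],
    field_paths={"owner_name": "/owner/name", "first_tag": "/owner/tags/0"},
)
print(row)  # {'id': 7, 'owner_name': 'Ada', 'first_tag': 'x'}
```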
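One way to picture the difference between the two ``mode`` values is as a sequence of row insertions (``+1``) and retractions (``-1``). The sketch below is a simplified model, not the engine itself: it assumes one row per line (as with ``"plaintext"``), treats a changed file as replaced in full, and uses hypothetical helpers ``diff_snapshots`` and ``apply_changes`` on in-memory snapshots instead of watching a real directory.

```python
from collections import Counter


def diff_snapshots(
    old: dict[str, list[str]], new: dict[str, list[str]]
) -> list[tuple[str, str, int]]:
    """Hypothetical: turn two directory snapshots into (path, row, +1/-1) changes."""
    changes = []
    for path in sorted(old.keys() | new.keys()):
        before = old.get(path, [])
        after = new.get(path, [])
        if before == after:
            continue  # unchanged file: nothing to emit
        # Assumption: a changed or deleted file retracts all of its old rows...
        changes.extend((path, row, -1) for row in before)
        # ...and a new or changed file inserts all rows of its current version.
        changes.extend((path, row, +1) for row in after)
    return changes


def apply_changes(table: Counter, changes) -> Counter:
    """Fold changes into a multiset of (path, row) pairs."""
    for path, row, diff in changes:
        table[(path, row)] += diff
    return +table  # unary plus drops entries whose count fell to zero


snapshot_1 = {"a.txt": ["alpha", "beta"], "b.txt": ["gamma"]}
snapshot_2 = {"a.txt": ["alpha", "delta"]}  # b.txt deleted, a.txt modified

# Both modes start by ingesting everything currently available in one batch.
initial = diff_snapshots({}, snapshot_1)
static_table = apply_changes(Counter(), initial)  # "static" stops here

# "streaming" keeps diffing newer snapshots against older ones.
streaming_table = apply_changes(Counter(), initial)
update = diff_snapshots(snapshot_1, snapshot_2)
streaming_table = apply_changes(streaming_table, update)

print(sorted(static_table))     # [('a.txt', 'alpha'), ('a.txt', 'beta'), ('b.txt', 'gamma')]
print(sorted(streaming_table))  # [('a.txt', 'alpha'), ('a.txt', 'delta')]
```

Treating a modified file as a full retraction plus re-insertion keeps the model simple; the rows from the deleted ``b.txt`` disappear from the streaming table, matching the behavior the ``mode`` description gives as an example.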