def get_schema_from_dataset_url(dataset_url_or_urls, hdfs_driver='libhdfs3', storage_options=None, filesystem=None):
    """Returns a :class:`petastorm.unischema.Unischema` object loaded from a dataset specified by a url.

    :param dataset_url_or_urls: a url to a parquet directory or a url list (with the same scheme) to parquet files.
    :param hdfs_driver: A string denoting the hdfs driver to use (if using a dataset on hdfs). Current choices are
        libhdfs (Java through JNI) or libhdfs3 (C++).
    :param storage_options: Dict of kwargs forwarded to ``fsspec`` to initialize the filesystem.
    :param filesystem: the ``pyarrow.FileSystem`` to use.
    :return: A :class:`petastorm.unischema.Unischema` object
    """
    # Resolve the url (or list of urls) into a filesystem object and the matching path(s).
    fs, path_or_paths = get_filesystem_and_path_or_paths(dataset_url_or_urls, hdfs_driver,
                                                         storage_options=storage_options,
                                                         filesystem=filesystem)

    # Open the dataset without validating that all files share one parquet schema.
    dataset = pq.ParquetDataset(path_or_paths, filesystem=fs, validate_schema=False, metadata_nthreads=10)

    # Get a unischema stored in the dataset metadata.
    stored_schema = get_schema(dataset)

    return stored_schema


def infer_or_load_unischema(dataset):
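The docstring of `get_schema_from_dataset_url` says that a url list must use the same scheme. The sketch below illustrates what that constraint means, using only the standard library. The helper `check_same_scheme` and the example urls are hypothetical and are not part of petastorm; how the library itself enforces the constraint is not shown in this excerpt.

```python
from urllib.parse import urlparse


def check_same_scheme(dataset_url_or_urls):
    """Hypothetical helper (not part of petastorm): return the scheme shared by all urls.

    Accepts a single url string or a list of url strings, mirroring the
    ``dataset_url_or_urls`` argument. Raises ValueError if the schemes differ.
    """
    # Normalize a single url into a one-element list.
    if isinstance(dataset_url_or_urls, str):
        urls = [dataset_url_or_urls]
    else:
        urls = list(dataset_url_or_urls)
    if not urls:
        raise ValueError('At least one url is required')

    # Collect the distinct schemes; a valid input yields exactly one.
    schemes = {urlparse(url).scheme for url in urls}
    if len(schemes) != 1:
        raise ValueError('All urls must share one scheme, got: {}'.format(sorted(schemes)))
    return schemes.pop()


# Illustrative urls only; they do not need to exist for the check to run.
print(check_same_scheme('hdfs://namenode/datasets/example'))
print(check_same_scheme(['file:///tmp/a.parquet', 'file:///tmp/b.parquet']))
```

A list such as `['file:///tmp/a.parquet', 's3://bucket/b.parquet']` would raise `ValueError` here, since a single filesystem object cannot serve both locations.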