Pushes the [`IterableDatasetDict`] to the Hub as a Parquet dataset. The [`IterableDatasetDict`] is pushed using HTTP requests and requires neither git nor git-lfs to be installed. Each dataset split will be pushed independently. The pushed dataset will keep the original split names.
(
self,
repo_id: str,
config_name: str = "default",
set_default: Optional[bool] = None,
data_dir: Optional[str] = None,
commit_message: Optional[str] = None,
commit_description: Optional[str] = None,
private: Optional[bool] = None,
token: Optional[str] = None,
revision: Optional[str] = None,
create_pr: Optional[bool] = False,
max_shard_size: Optional[Union[int, str]] = None,
num_shards: Optional[dict[str, int]] = None,
embed_external_files: bool = True,
num_proc: Optional[int] = None,
)
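As a rough illustration of the signature above, the sketch below builds a two-split [`IterableDatasetDict`] from generators and pushes it. It assumes a recent `datasets` release that provides this method and a prior Hub login. The helper `resolve_data_dir` is hypothetical and not part of `datasets`: it only mirrors the documented `data_dir` default (the `config_name` if different from "default", else "data"). The repo ID is a placeholder.

```python
from typing import Optional


def resolve_data_dir(config_name: str = "default", data_dir: Optional[str] = None) -> str:
    """Hypothetical helper (not a `datasets` API) mirroring the documented `data_dir` default."""
    if data_dir is not None:
        return data_dir
    # Documented rule: use `config_name` if it differs from "default", else "data".
    return config_name if config_name != "default" else "data"


def push_streaming_splits(repo_id: str, config_name: str = "default"):
    """Build two small streaming splits and push them; requires network access and a Hub token."""
    # Imported lazily so the helper above stays usable without `datasets` installed.
    from datasets import IterableDataset, IterableDatasetDict

    def gen(start, stop):
        for i in range(start, stop):
            yield {"id": i, "text": f"example {i}"}

    splits = IterableDatasetDict(
        {
            "train": IterableDataset.from_generator(gen, gen_kwargs={"start": 0, "stop": 8}),
            "test": IterableDataset.from_generator(gen, gen_kwargs={"start": 8, "stop": 10}),
        }
    )
    # Each split is pushed independently and keeps its name ("train", "test").
    return splits.push_to_hub(
        repo_id,
        config_name=config_name,
        private=True,
        commit_message="Upload dataset",
    )


if __name__ == "__main__":
    # Placeholder repo ID; replace with your own namespace before running.
    print(resolve_data_dir("en"))
    # push_streaming_splits("<user>/<dataset_name>", config_name="en")
```

The push call is left commented out because it creates a real commit on the Hub. According to the signature, it returns a `CommitInfo`.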
        return IterableDatasetDict({k: dataset.cast(features=features) for k, dataset in self.items()})

    def push_to_hub(
        self,
        repo_id: str,
        config_name: str = "default",
        set_default: Optional[bool] = None,
        data_dir: Optional[str] = None,
        commit_message: Optional[str] = None,
        commit_description: Optional[str] = None,
        private: Optional[bool] = None,
        token: Optional[str] = None,
        revision: Optional[str] = None,
        create_pr: Optional[bool] = False,
        max_shard_size: Optional[Union[int, str]] = None,
        num_shards: Optional[dict[str, int]] = None,
        embed_external_files: bool = True,
        num_proc: Optional[int] = None,
    ) -> CommitInfo:
        """Pushes the [`IterableDatasetDict`] to the Hub as a Parquet dataset.
        The [`IterableDatasetDict`] is pushed using HTTP requests and requires neither git nor git-lfs to be installed.

        Each dataset split will be pushed independently. The pushed dataset will keep the original split names.

        The resulting Parquet files are self-contained by default: if your dataset contains [`Image`] or [`Audio`]
        data, the Parquet files will store the bytes of your images or audio files.
        You can disable this by setting `embed_external_files` to `False`.

        Args:
            repo_id (`str`):
                The ID of the repository to push to in the following format: `<user>/<dataset_name>` or
                `<org>/<dataset_name>`. Also accepts `<dataset_name>`, which will default to the namespace
                of the logged-in user.

                It can also be a location inside a bucket, e.g. `buckets/<user_or_org>/<bucket_name>/...`
            config_name (`str`):
                Configuration name of a dataset. Defaults to "default".
            set_default (`bool`, *optional*):
                Whether to set this configuration as the default one. Otherwise, the default configuration is the one
                named "default".
            data_dir (`str`, *optional*):
                Directory name that will contain the uploaded data files. Defaults to the `config_name` if different
                from "default", else "data".

                <Added version="2.17.0"/>
            commit_message (`str`, *optional*):
                Commit message to use when pushing. Will default to `"Upload dataset"`.
            commit_description (`str`, *optional*):
                Description of the commit that will be created.
                Also used as the description of the PR if one is created (`create_pr` is `True`).

                <Added version="2.16.0"/>
            private (`bool`, *optional*):
                Whether to make the repo private. If `None` (default), the repo will be public unless the
                organization's default is private. This value is ignored if the repo already exists.
            token (`str`, *optional*):
                An optional authentication token for the Hugging Face Hub. If no token is passed, will default
                to the token saved locally when logging in with `huggingface-cli login`. Will raise an error
                if no token is passed and the user is not logged in.
            revision (`str`, *optional*):