github.com/huggingface/datasets / push_to_hub

Method push_to_hub

src/datasets/dataset_dict.py:2331–2518

Pushes the [`IterableDatasetDict`] to the Hub as a Parquet dataset. The push goes over HTTP, so neither git nor git-lfs needs to be installed. Each split is pushed independently, and the pushed dataset keeps the original split names.
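Because every split is pushed on its own and keeps its name, one way to picture the upload is a mapping from each split name to its own list of Parquet files. This is a hypothetical sketch: `plan_split_files` is not part of `datasets`, and the `{split}-{index:05d}-of-{total:05d}.parquet` file naming is an assumption about the layout, not something this page documents.

```python
def plan_split_files(num_shards: dict[str, int], data_dir: str = "data") -> dict[str, list[str]]:
    """Hypothetical helper: list the Parquet paths each split would upload.

    Each split is planned independently, and its original name is kept both as
    the dict key and as the file-name prefix (the naming scheme is assumed).
    """
    plan: dict[str, list[str]] = {}
    for split, total in num_shards.items():
        if total < 1:
            raise ValueError(f"split {split!r} needs at least one shard, got {total}")
        plan[split] = [
            f"{data_dir}/{split}-{index:05d}-of-{total:05d}.parquet"
            for index in range(total)
        ]
    return plan


if __name__ == "__main__":
    for split, paths in plan_split_files({"train": 2, "test": 1}).items():
        print(split, paths)
```

Keeping one independent file list per split is what lets a failure or re-run for one split leave the others untouched in this model.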

(
        self,
        repo_id: str,
        config_name: str = "default",
        set_default: Optional[bool] = None,
        data_dir: Optional[str] = None,
        commit_message: Optional[str] = None,
        commit_description: Optional[str] = None,
        private: Optional[bool] = None,
        token: Optional[str] = None,
        revision: Optional[str] = None,
        create_pr: Optional[bool] = False,
        max_shard_size: Optional[Union[int, str]] = None,
        num_shards: Optional[dict[str, int]] = None,
        embed_external_files: bool = True,
        num_proc: Optional[int] = None,
    )
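The signature takes `max_shard_size` as either an `int` or a `str`, and `num_shards` as a per-split `dict[str, int]`. The sketch below shows one plausible way those two arguments could be combined into a shard count for a single split. It is hypothetical throughout: `parse_size` and `shards_for_split` are not `datasets` functions, the unit table (decimal `KB`/`MB`/`GB`, binary `KiB`/`MiB`/`GiB`) and the `"500MB"` fallback are assumptions, and it assumes the split's byte size is known in advance, which is generally not the case for a streamed `IterableDataset`.

```python
import math
import re
from typing import Optional, Union

# Assumed unit table: decimal SI prefixes plus binary IEC prefixes (case-sensitive).
_UNITS = {
    "B": 1,
    "KB": 10**3, "MB": 10**6, "GB": 10**9,
    "KiB": 2**10, "MiB": 2**20, "GiB": 2**30,
}


def parse_size(size: Union[int, str]) -> int:
    """Hypothetical: convert 500_000_000, "2048", or "500MB" to a byte count."""
    if isinstance(size, int):
        return size
    match = re.fullmatch(r"\s*(\d+)\s*([A-Za-z]*)\s*", size)
    unit = (match.group(2) or "B") if match else None
    if unit not in _UNITS:
        raise ValueError(f"unrecognized size: {size!r}")
    return int(match.group(1)) * _UNITS[unit]


def shards_for_split(
    split: str,
    split_nbytes: int,
    max_shard_size: Optional[Union[int, str]] = None,
    num_shards: Optional[dict[str, int]] = None,
    fallback_max_shard_size: Union[int, str] = "500MB",  # assumed fallback
) -> int:
    """Hypothetical: an explicit per-split count wins; otherwise divide by the size cap."""
    if num_shards is not None and split in num_shards:
        return num_shards[split]
    cap = parse_size(max_shard_size if max_shard_size is not None else fallback_max_shard_size)
    if cap <= 0:
        raise ValueError("max_shard_size must be positive")
    # Round up so no shard exceeds the cap; always produce at least one shard.
    return max(1, math.ceil(split_nbytes / cap))
```

Letting an explicit `num_shards` entry override the size-based estimate matches the per-split shape of the `num_shards` type in the signature, but the precedence rule itself is an assumption here.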

Source

def push_to_hub(
    self,
    repo_id: str,
    config_name: str = "default",
    set_default: Optional[bool] = None,
    data_dir: Optional[str] = None,
    commit_message: Optional[str] = None,
    commit_description: Optional[str] = None,
    private: Optional[bool] = None,
    token: Optional[str] = None,
    revision: Optional[str] = None,
    create_pr: Optional[bool] = False,
    max_shard_size: Optional[Union[int, str]] = None,
    num_shards: Optional[dict[str, int]] = None,
    embed_external_files: bool = True,
    num_proc: Optional[int] = None,
) -> CommitInfo:
    """Pushes the [`IterableDatasetDict`] to the hub as a Parquet dataset.
    The [`IterableDatasetDict`] is pushed using HTTP requests and does not need to have neither git or git-lfs installed.

    Each dataset split will be pushed independently. The pushed dataset will keep the original split names.

    The resulting Parquet files are self-contained by default: if your dataset contains [`Image`] or [`Audio`]
    data, the Parquet files will store the bytes of your images or audio files.
    You can disable this by setting `embed_external_files` to False.

    Args:
        repo_id (`str`):
            The ID of the repository to push to in the following format: `<user>/<dataset_name>` or
            `<org>/<dataset_name>`. Also accepts `<dataset_name>`, which will default to the namespace
            of the logged-in user.

            It could also be a location inside a bucket, e.g. `buckets/<user_or_org>/<bucket_name>/...`
        config_name (`str`):
            Configuration name of a dataset. Defaults to "default".
        set_default (`bool`, *optional*):
            Whether to set this configuration as the default one. Otherwise, the default configuration is the one
            named "default".
        data_dir (`str`, *optional*):
            Directory name that will contain the uploaded data files. Defaults to the `config_name` if different
            from "default", else "data".

            <Added version="2.17.0"/>
        commit_message (`str`, *optional*):
            Message to commit while pushing. Will default to `"Upload dataset"`.
        commit_description (`str`, *optional*):
            Description of the commit that will be created.
            Additionally, description of the PR if a PR is created (`create_pr` is True).

            <Added version="2.16.0"/>
        private (`bool`, *optional*):
            Whether to make the repo private. If `None` (default), the repo will be public unless the
            organization's default is private. This value is ignored if the repo already exists.
        token (`str`, *optional*):
            An optional authentication token for the Hugging Face Hub. If no token is passed, will default
            to the token saved locally when logging in with `huggingface-cli login`. Will raise an error
            if no token is passed and the user is not logged-in.
        revision (`str`, *optional*):
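The `repo_id` rules in the docstring can be restated as a small routing function. This is a hypothetical sketch: `resolve_target` and `PushTarget` are not `datasets` APIs, and the sketch only mirrors the documented formats. The call list for this method includes `_push_to_bucket` and `_push_to_repo`, which suggests a bucket/repo split of this kind, but the actual dispatch code is not part of this excerpt.

```python
from typing import NamedTuple


class PushTarget(NamedTuple):
    kind: str      # "repo" or "bucket"
    location: str  # "<namespace>/<dataset_name>", or the path inside the bucket


def resolve_target(repo_id: str, logged_in_user: str) -> PushTarget:
    """Hypothetical: classify repo_id according to the formats the docstring documents."""
    if repo_id.startswith("buckets/"):
        # "buckets/<user_or_org>/<bucket_name>/..." names a location inside a bucket.
        location = repo_id[len("buckets/"):]
        segments = location.split("/")
        if len(segments) < 2 or not segments[0] or not segments[1]:
            raise ValueError("expected buckets/<user_or_org>/<bucket_name>/...")
        return PushTarget("bucket", location)
    parts = repo_id.split("/")
    if len(parts) == 1 and parts[0]:
        # A bare "<dataset_name>" falls back to the logged-in user's namespace.
        return PushTarget("repo", f"{logged_in_user}/{parts[0]}")
    if len(parts) == 2 and all(parts):
        # "<user>/<dataset_name>" or "<org>/<dataset_name>" is used as-is.
        return PushTarget("repo", repo_id)
    raise ValueError(f"invalid repo_id: {repo_id!r}")
```

For example, `resolve_target("squad_v3", "alice")` yields a repo target at `alice/squad_v3`, while `"buckets/alice/my-bucket/exports"` is routed to a bucket location.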

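The docstring says the Parquet files are self-contained by default: for [`Image`] or [`Audio`] data, the file bytes are stored rather than only a path, and `embed_external_files=False` turns this off. A minimal stdlib sketch of that idea, assuming media values are dicts of the form `{"path": ..., "bytes": ...}`; that record layout and the `embed_media` helper are assumptions for illustration, not the `datasets` implementation.

```python
def embed_media(record: dict, media_columns: list[str], embed_external_files: bool = True) -> dict:
    """Hypothetical: return a copy of `record` with local media files inlined as bytes."""
    out = dict(record)  # shallow copy; the caller's record is never mutated
    if not embed_external_files:
        return out  # keep only the references to the external files
    for column in media_columns:
        value = out.get(column)
        # Only inline values that point at a file and do not carry bytes yet.
        if isinstance(value, dict) and value.get("bytes") is None and value.get("path"):
            with open(value["path"], "rb") as f:
                out[column] = {"path": value["path"], "bytes": f.read()}
    return out
```

The trade-off is size versus portability: embedded files make each Parquet shard larger, but a consumer can read the images or audio without access to the original local paths.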
Calls 8

_check_values_type · Method · 0.95
fromkeys · Method · 0.80
split · Method · 0.80
repo_info · Method · 0.80
create_branch · Method · 0.80
_push_to_bucket · Function · 0.70
_push_to_repo · Function · 0.70
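The docstring's `config_name`, `set_default`, and `data_dir` descriptions together decide where a configuration's files land and which configuration is treated as the default. The sketch below restates those documented rules and builds a dataset-card-style `configs` entry. The `data_dir` fallback follows the docstring; the metadata keys (`config_name`, `data_files`, `split`, `path`, `default`) and the `<split>-*` glob are assumptions about the Hub's metadata format, and `config_entry` is a hypothetical helper.

```python
from typing import Optional


def resolve_data_dir(config_name: str = "default", data_dir: Optional[str] = None) -> str:
    """Documented default: the config name if it is not "default", else "data"."""
    if data_dir is not None:
        return data_dir
    return config_name if config_name != "default" else "data"


def config_entry(
    splits: list[str],
    config_name: str = "default",
    set_default: Optional[bool] = None,
    data_dir: Optional[str] = None,
) -> dict:
    """Hypothetical: build one `configs` metadata entry for a pushed configuration."""
    directory = resolve_data_dir(config_name, data_dir)
    entry = {
        "config_name": config_name,
        # Assumed glob: every shard of a split shares the "<split>-" file prefix.
        "data_files": [{"split": s, "path": f"{directory}/{s}-*"} for s in splits],
    }
    if set_default:
        # Without this flag, the configuration named "default" stays the default one.
        entry["default"] = True
    return entry
```

Pushing a second configuration such as `"en"` therefore lands in `en/` rather than `data/`, so it does not overwrite the files of the `"default"` configuration.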