Saves the model to the specified path or folder. When saving the model, make sure to also keep track of the versions of dependencies and Python used. Loading and saving the model should be done using the same dependencies and Python. Moreover, models saved in one version of BERTopic should not be loaded in other versions.
(
self,
path,
serialization: Literal["safetensors", "pickle", "pytorch"] = "pickle",
save_embedding_model: Union[bool, str] = True,
save_ctfidf: bool = False,
)
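The advice about keeping track of dependency and Python versions could be followed with a small helper that writes them to a file next to the saved model. The following is a minimal sketch using only the standard library, not part of BERTopic itself; the function names (`collect_environment`, `write_environment`, `environment_mismatches`) and the JSON layout are assumptions made for illustration.

```python
import json
import platform
from importlib import metadata
from pathlib import Path


def collect_environment(packages):
    """Return the Python version and the installed version of each package."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Record packages that are not installed as None instead of failing.
            versions[name] = None
    return {"python": platform.python_version(), "packages": versions}


def write_environment(path, packages):
    """Write the environment description to `path` as JSON and return it."""
    env = collect_environment(packages)
    Path(path).write_text(json.dumps(env, indent=2, sort_keys=True))
    return env


def environment_mismatches(path):
    """Return {name: (saved, current)} for every version that differs."""
    saved = json.loads(Path(path).read_text())
    # Re-check exactly the packages that were recorded at save time.
    current = collect_environment(saved["packages"])
    mismatches = {}
    if saved["python"] != current["python"]:
        mismatches["python"] = (saved["python"], current["python"])
    for name, version in saved["packages"].items():
        if version != current["packages"][name]:
            mismatches[name] = (version, current["packages"][name])
    return mismatches
```

One possible use is to call `write_environment("my_model.env.json", ["bertopic", "umap-learn", "hdbscan", "sentence-transformers"])` right after `topic_model.save("my_model")`, and to check `environment_mismatches("my_model.env.json")` before loading. The file name and the package list here are hypothetical; adjust them to the dependencies your model actually uses.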
| 3414 | ) |
| 3415 | |
````python
def save(
    self,
    path,
    serialization: Literal["safetensors", "pickle", "pytorch"] = "pickle",
    save_embedding_model: Union[bool, str] = True,
    save_ctfidf: bool = False,
):
    """Saves the model to the specified path or folder.

    When saving the model, make sure to also keep track of the versions
    of dependencies and Python used. Loading and saving the model should
    be done using the same dependencies and Python. Moreover, models
    saved in one version of BERTopic should not be loaded in other versions.

    Arguments:
        path: If `serialization` is `safetensors` or `pytorch`, this is a directory.
            If `serialization` is `pickle`, then this is a file.
        serialization: If `pickle`, the entire model will be pickled. If `safetensors`
            or `pytorch`, the model will be saved without the embedding,
            dimensionality reduction, and clustering algorithms.
            These are very efficient formats and are typically advised.
        save_embedding_model: If `serialization` is `pickle`, then you can choose to skip
            saving the embedding model. If `serialization` is `safetensors`
            or `pytorch`, this variable can be used as a string pointing
            towards a Hugging Face model.
        save_ctfidf: Whether to save c-TF-IDF information if `serialization` is `safetensors`
            or `pytorch`.

    Examples:
    To save the model in an efficient and safe format (safetensors) with c-TF-IDF information:

    ```python
    topic_model.save("model_dir", serialization="safetensors", save_ctfidf=True)
    ```

    If you wish to also add a pointer to the embedding model, which will be downloaded from
    Hugging Face upon loading:

    ```python
    embedding_model = "sentence-transformers/all-MiniLM-L6-v2"
    topic_model.save("model_dir", serialization="safetensors", save_embedding_model=embedding_model)
    ```

    Or, if you want to save the full model with pickle:

    ```python
    topic_model.save("my_model")
    ```

    NOTE: Pickle can run arbitrary code and is generally considered to be less safe than
    safetensors.
    """
    if serialization == "pickle":
        # Adjacent string literals are joined as-is, so each piece ends with a space.
        logger.warning(
            "When you use `pickle` to save/load a BERTopic model, "
            "please make sure that the environments in which you save "
            "and load the model are **exactly** the same. The version of BERTopic, "
            "its dependencies, and Python need to remain the same."
````