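The `add_files` docstring says that each file in the folder is routed to a parser according to its file extension. The following is a minimal, standard-library-only sketch of that kind of dispatch. The extension map, the parser labels, and the helper name `route_files_by_extension` are hypothetical and used only for illustration; they are not taken from the library's actual routing logic.

```python
from collections import defaultdict
from pathlib import Path

# Hypothetical extension-to-parser mapping, for illustration only;
# the library's real routing table is not shown in this section.
PARSER_BY_EXTENSION = {
    ".pdf": "pdf",
    ".docx": "office",
    ".pptx": "office",
    ".xlsx": "office",
    ".txt": "text",
    ".csv": "text",
    ".md": "text",
}


def route_files_by_extension(file_names):
    """Group file names by the (hypothetical) parser that would handle them."""
    routed = defaultdict(list)
    for name in file_names:
        # Compare extensions case-insensitively, so "a.PDF" matches ".pdf".
        ext = Path(name).suffix.lower()
        # Files with an unrecognized or missing extension are collected separately.
        routed[PARSER_BY_EXTENSION.get(ext, "unsupported")].append(name)
    return dict(routed)
```

In a design like this, the caller lists the input folder once, and each parser then receives only the batch of files it can handle, so adding support for a new file type only requires a new entry in the map.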
        return self.add_files()

    def add_files(self, input_folder_path=None, encoding="utf-8", chunk_size=400,
                  get_images=True, get_tables=True, smart_chunking=1, max_chunk_size=600,
                  table_grid=True, get_header_text=True, table_strategy=1, strip_header=False,
                  verbose_level=2, copy_files_to_library=True, set_custom_logging=-1,
                  use_logging_file=False):

        """Main method to integrate documents into a Library. Pass the path to a local folder, and each
        file in it is routed to the appropriate parser based on its file extension.

        Parameters
        ----------
        input_folder_path : str, default=None
            The path to the local folder containing the files to be ingested.

        encoding : str, default="utf-8"
            The encoding to use for reading files.

        chunk_size : int, default=400
            The size of text chunks to create during parsing.

        get_images : bool, default=True
            Whether to extract images from the documents.

        get_tables : bool, default=True
            Whether to extract tables from the documents.

        smart_chunking : int, default=1
            The strategy for smart chunking of text.

        max_chunk_size : int, default=600
            The maximum size of text chunks.

        table_grid : bool, default=True
            Whether to use a grid for tables.

        get_header_text : bool, default=True
            Whether to extract header text from the documents.

        table_strategy : int, default=1
            The strategy to use for table extraction.

        strip_header : bool, default=False
            Whether to strip headers from the documents.

        verbose_level : int, default=2
            The level of verbosity for logging.

        copy_files_to_library : bool, default=True
            Whether to copy the files to the library.

        set_custom_logging : int, default=-1
            Custom logging level, between 0 and 50, to apply to the parsing job.

        use_logging_file : bool, default=False
            Whether the parser logs to a file (True) or to stdout (False, the default).

        Returns
        -------