hub / github.com/llmware-ai/llmware / add_files

Method add_files

llmware/library.py:516–625

Main method to integrate documents into a Library. Pass a local folder path and all files will be routed to the appropriate parser by file type extension.

(self, input_folder_path=None, encoding="utf-8", chunk_size=400,
 get_images=True, get_tables=True, smart_chunking=1, max_chunk_size=600,
 table_grid=True, get_header_text=True, table_strategy=1, strip_header=False,
 verbose_level=2, copy_files_to_library=True, set_custom_logging=-1,
 use_logging_file=False)
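
A call built from this signature might look like the sketch below. It assumes llmware is installed and that a library is created with `Library().create_new_library(...)`, as in the project's published examples. The helper name `ingest_folder` and the `PARSE_OPTIONS` dictionary are hypothetical and exist only for illustration; the option values simply repeat the documented defaults.

```python
# Keyword arguments mirroring some of the documented defaults of add_files.
PARSE_OPTIONS = {
    "encoding": "utf-8",
    "chunk_size": 400,
    "max_chunk_size": 600,
    "smart_chunking": 1,
    "get_images": True,
    "get_tables": True,
    "copy_files_to_library": True,
}


def ingest_folder(library_name, folder_path, **overrides):
    """Create a library and parse every file in folder_path into it.

    Hypothetical helper: library_name and folder_path are supplied by the caller.
    """
    # Imported lazily so the sketch can be read without llmware installed.
    from llmware.library import Library

    # Caller-supplied overrides take precedence over the defaults above.
    options = {**PARSE_OPTIONS, **overrides}
    library = Library().create_new_library(library_name)
    library.add_files(input_folder_path=folder_path, **options)
    return library
```

For example, `ingest_folder("my_library", "/path/to/docs", chunk_size=300)` would pass a smaller chunk size and leave the other options at their defaults.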


        return self.add_files()

    def add_files(self, input_folder_path=None, encoding="utf-8", chunk_size=400,
                  get_images=True, get_tables=True, smart_chunking=1, max_chunk_size=600,
                  table_grid=True, get_header_text=True, table_strategy=1, strip_header=False,
                  verbose_level=2, copy_files_to_library=True, set_custom_logging=-1,
                  use_logging_file=False):

        """Main method to integrate documents into a Library - pass a local filepath folder and all files will be
        routed to appropriate parser by file type extension.

        Parameters
        ----------
        input_folder_path : str, default=None
            The path to the folder containing files to be ingested. If not provided, defaults to None.

        encoding : str, default="utf-8"
            The encoding to use for reading files.

        chunk_size : int, default=400
            The size of text chunks to create during parsing.

        get_images : bool, default=True
            Whether to extract images from the documents.

        get_tables : bool, default=True
            Whether to extract tables from the documents.

        smart_chunking : int, default=1
            The strategy for smart chunking of text.

        max_chunk_size : int, default=600
            The maximum size of text chunks.

        table_grid : bool, default=True
            Whether to use a grid for tables.

        get_header_text : bool, default=True
            Whether to extract header text from the documents.

        table_strategy : int, default=1
            The strategy to use for table extraction.

        strip_header : bool, default=False
            Whether to strip headers from the documents.

        verbose_level : int, default=2
            The level of verbosity for logging.

        copy_files_to_library : bool, default=True
            Whether to copy the files to the library.

        set_custom_logging : int, default=-1
            Will apply a custom logging level between 0-50 for the parsing job.

        use_logging_file : bool, default=False
            Whether parse should log to stdout (default) or to file (set to True)

        Returns
        -------
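
The docstring summary says files are routed to a parser by file type extension. The idea can be sketched with the standard library alone. The mapping and the parser labels below are hypothetical and are not taken from llmware, whose actual routing logic is not part of this excerpt.

```python
from collections import defaultdict
from pathlib import Path

# Hypothetical extension-to-parser mapping, for illustration only;
# llmware's real routing table is not shown in the excerpt above.
PARSER_BY_EXTENSION = {
    ".pdf": "pdf_parser",
    ".docx": "office_parser",
    ".pptx": "office_parser",
    ".xlsx": "office_parser",
    ".txt": "text_parser",
    ".csv": "text_parser",
}


def route_files(file_names):
    """Group file names by the parser their extension maps to."""
    routed = defaultdict(list)
    for name in file_names:
        # Compare extensions case-insensitively, so ".PDF" matches ".pdf".
        ext = Path(name).suffix.lower()
        parser = PARSER_BY_EXTENSION.get(ext, "unsupported")
        routed[parser].append(name)
    return dict(routed)


routed = route_files(["report.PDF", "notes.txt", "deck.pptx", "archive.zip"])
print(routed)
```

Files whose extension is not in the mapping are collected under an `"unsupported"` key here rather than raising an error; that is a design choice made for this sketch, not a statement about how llmware handles them.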

Callers 15

add_file · Method · 0.95
embeddings_fast_start · Function · 0.80
build_lib · Function · 0.80
build_lib · Function · 0.80
rag · Function · 0.80
set_up_library · Function · 0.80
setup_library · Function · 0.80
build_lib · Function · 0.80
build_lib · Function · 0.80
embeddings_lancedb · Function · 0.80

Calls 6

get_library_card · Method · 0.95
Parser · Class · 0.90
CollectionWriter · Class · 0.90
get_input_path · Method · 0.80
ingest · Method · 0.80
build_text_index · Method · 0.45