hub / github.com/llmware-ai/llmware / Setup

Class Setup

llmware/setup.py:36–179

Implements the download of sample files from an AWS S3 bucket.
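The class docstring in the source listing names eight sample domains, and its `os.listdir` example shows each one as a subfolder of the sample files path. A minimal sketch of picking one of those folders is given here. `SAMPLE_DOMAINS` and `sample_domain_path` are hypothetical names introduced for illustration and are not part of llmware; the folder names are copied from the docstring and may not match the current bucket contents.

```python
import os

# Folder names copied from the Setup class docstring (assumption: the
# bucket still uses these names; the docstring says files are updated).
SAMPLE_DOMAINS = (
    "AgreementsLarge",
    "Agreements",
    "UN-Resolutions-500",
    "Invoices",
    "FinDocs",
    "AWS-Transcribe",
    "SmallLibrary",
    "Images",
)


def sample_domain_path(sample_files_path, domain):
    """Hypothetical helper: return the path of one sample domain folder."""
    if domain not in SAMPLE_DOMAINS:
        raise ValueError(f"unknown sample domain: {domain!r}")
    return os.path.join(sample_files_path, domain)


# Typical use, assuming llmware is installed and the download succeeded:
#   from llmware.setup import Setup
#   agreements = sample_domain_path(Setup().load_sample_files(), "Agreements")
```

Validating the name up front gives a clear error for a misspelled domain, rather than a missing-folder error later on.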

Source from the content-addressed store, hash-verified

class Setup:

    """Implements the download of sample files from an AWS S3 bucket.

    ``Setup`` implements the download of sample files from an AWS S3 bucket. Currently, there are samples
    from eight domains:

    - AgreementsLarge (~80 sample contracts)
    - Agreements (~15 sample employment agreements)
    - UN-Resolutions-500 (500 United Nations Resolutions over ~2 years)
    - Invoices (~40 invoice sample documents)
    - FinDocs (~15 financial annual reports, earnings and 10Ks)
    - AWS-Transcribe (~5 AWS-transcribe JSON files)
    - SmallLibrary (~10 mixed document types for quick testing)
    - Images (~3 images for OCR processing)

    The sample files are updated continuously. Calling ``Setup().load_sample_files(over_write=True)``
    downloads the newest version of the sample files.

    The sample files were prepared by LLMWare from public domain materials, or invented bespoke.
    If you have any concerns about Personally Identifiable Information (PII), or the suitability of any material
    we included, please contact us, e.g. by raising an issue on GitHub or sending an email.
    We reserve the right to withdraw documents at any time.

    Examples
    ----------
    >>> import os
    >>> from llmware.setup import Setup
    >>> sample_files_path = Setup().load_sample_files()
    >>> sample_files_path
    '/home/user/llmware_data/sample_files'
    >>> os.listdir(sample_files_path)
    ['AWS-Transcribe', '.DS_Store', 'SmallLibrary', 'UN-Resolutions-500', 'Invoices', 'Images', 'AgreementsLarge', 'Agreements', 'FinDocs']

    If you have called the function before and want the newest version of the sample files,
    set ``over_write=True``.
    >>> sample_files_path = Setup().load_sample_files(over_write=True)
    """
    @staticmethod
    def load_sample_files(over_write=False):

        """ Downloads sample document files from non-restricted AWS S3 bucket. """

        if not os.path.exists(LLMWareConfig.get_llmware_path()):
            LLMWareConfig.setup_llmware_workspace()

        # not configurable - will pull into /sample_files under llmware_path
        sample_files_path = os.path.join(LLMWareConfig.get_llmware_path(), "sample_files")

        if not os.path.exists(sample_files_path):
            os.makedirs(sample_files_path, exist_ok=True)
        else:
            if not over_write:
                logger.info(f"Setup - sample_files path already exists - {sample_files_path}")
                return sample_files_path

        # pull from sample files bucket
        logger.info(f"Setup - sample_files - downloading requested sample files from AWS S3 bucket - may take a minute.")
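The visible part of `load_sample_files` decides whether to download by checking only that the `sample_files` folder exists. The sketch below isolates that guard so it can be tried without llmware or network access. `ensure_sample_dir` is a hypothetical helper, not an llmware function: it takes the workspace path as an argument instead of reading it from `LLMWareConfig`, and it returns a flag where the real method goes on to download.

```python
import os
import tempfile


def ensure_sample_dir(base_path, over_write=False):
    """Return (sample_files_path, needs_download) for a workspace at base_path.

    Hypothetical helper that mirrors the guard shown in load_sample_files.
    """
    sample_files_path = os.path.join(base_path, "sample_files")

    if not os.path.exists(sample_files_path):
        # First call: create the folder and signal that a download is needed.
        os.makedirs(sample_files_path, exist_ok=True)
        return sample_files_path, True

    # Folder already exists: download again only when over_write is set.
    return sample_files_path, bool(over_write)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workspace:
        print(ensure_sample_dir(workspace))                   # first call
        print(ensure_sample_dir(workspace))                   # folder now exists
        print(ensure_sample_dir(workspace, over_write=True))  # forced refresh
```

Because the check looks only at the folder, an empty or partially downloaded `sample_files` directory is also treated as present. In that situation, `over_write=True` is how a caller requests a fresh download.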

Callers 15

embeddings_fast_start · Function · 0.90
build_lib · Function · 0.90
build_lib · Function · 0.90
rag · Function · 0.90
set_up_agreements · Function · 0.90
setup_library · Function · 0.90
build_lib · Function · 0.90
build_lib · Function · 0.90
embeddings_lancedb · Function · 0.90

Calls

no outgoing calls