hub / github.com/bhaskatripathi/pdfGPT / text_to_chunks

Function text_to_chunks

api.py:48–65
(texts, word_length=150, start_page=1)

Source from the content-addressed store, hash-verified

def text_to_chunks(texts, word_length=150, start_page=1):
    text_toks = [t.split(' ') for t in texts]
    chunks = []

    for idx, words in enumerate(text_toks):
        for i in range(0, len(words), word_length):
            chunk = words[i : i + word_length]
            if (
                (i + word_length) > len(words)
                and (len(chunk) < word_length)
                and (len(text_toks) != (idx + 1))
            ):
                text_toks[idx + 1] = chunk + text_toks[idx + 1]
                continue
            chunk = ' '.join(chunk).strip()
            chunk = f'[Page no. {idx+start_page}]' + ' ' + '"' + chunk + '"'
            chunks.append(chunk)
    return chunks
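
Usage sketch (illustrative, not part of the repository): the snippet below reproduces the function as listed above and runs it on two made-up page strings with a small `word_length`, to show how a short trailing chunk is carried over to the next page. The sample texts and the parameter values are assumptions chosen only for the demonstration.

```python
def text_to_chunks(texts, word_length=150, start_page=1):
    # Copied from the listing above so the example is self-contained.
    text_toks = [t.split(' ') for t in texts]
    chunks = []

    for idx, words in enumerate(text_toks):
        for i in range(0, len(words), word_length):
            chunk = words[i : i + word_length]
            if (
                (i + word_length) > len(words)
                and (len(chunk) < word_length)
                and (len(text_toks) != (idx + 1))
            ):
                # Short tail of a non-final page: prepend it to the next page.
                text_toks[idx + 1] = chunk + text_toks[idx + 1]
                continue
            chunk = ' '.join(chunk).strip()
            chunk = f'[Page no. {idx+start_page}]' + ' ' + '"' + chunk + '"'
            chunks.append(chunk)
    return chunks


# Hypothetical input: one string per page.
pages = ["a b c d e f", "g h i"]
chunks = text_to_chunks(pages, word_length=4)
for c in chunks:
    print(c)
# [Page no. 1] "a b c d"
# [Page no. 2] "e f g h"
# [Page no. 2] "i"
```

The two leftover words of page 1 (`e f`) are merged into page 2 and labeled with page 2's number; only the final page may emit a chunk shorter than `word_length`.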

Callers 1

load_recommender · Function · 0.85

Calls

no outgoing calls

Tested by

no test coverage detected