hub / github.com/langroid/langroid / test_text_token_chunking

Function test_text_token_chunking

tests/main/test_parser.py:79–97
(
    chunk_size: int, max_chunks: int, min_chunk_chars: int, discard_chunk_chars: int
)

Source from the content-addressed store, hash-verified

    ],
)
def test_text_token_chunking(
    chunk_size: int, max_chunks: int, min_chunk_chars: int, discard_chunk_chars: int
):
    cfg = ParsingConfig(
        chunk_size=chunk_size,
        max_chunks=max_chunks,
        min_chunk_chars=min_chunk_chars,
        discard_chunk_chars=discard_chunk_chars,
        token_encoding_model="text-embedding-3-small",
    )

    parser = Parser(cfg)

    text = generate_random_text(60)
    chunks = parser.chunk_tokens(text)

    assert len(chunks) <= max_chunks
    assert all(len(c) >= discard_chunk_chars for c in chunks)
    assert all(parser.num_tokens(c) <= chunk_size + 5 for c in chunks)


def test_extract_content():
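
The three assertions in test_text_token_chunking check a cap on the number of chunks, a minimum character length per chunk, and a per-chunk token budget. The sketch below illustrates those invariants with a toy chunker. It is not langroid's implementation: `chunk_by_tokens` and the standalone `num_tokens` are hypothetical helpers, whitespace-separated words stand in for real tokens, and `min_chunk_chars` and the test's 5-token slack are left out for brevity.

```python
from typing import List


def chunk_by_tokens(
    text: str,
    chunk_size: int,
    max_chunks: int,
    discard_chunk_chars: int,
) -> List[str]:
    """Toy chunker (hypothetical): whitespace words stand in for tokens."""
    words = text.split()
    chunks: List[str] = []
    for start in range(0, len(words), chunk_size):
        if len(chunks) >= max_chunks:
            break  # enforce the cap on the number of chunks
        chunk = " ".join(words[start : start + chunk_size])
        if len(chunk) < discard_chunk_chars:
            continue  # drop chunks that are too short to keep
        chunks.append(chunk)
    return chunks


def num_tokens(text: str) -> int:
    """Count tokens under the same whitespace assumption."""
    return len(text.split())


# 60 synthetic words, loosely mirroring generate_random_text(60) in the test.
sample = " ".join(f"word{i}" for i in range(60))
result = chunk_by_tokens(sample, chunk_size=8, max_chunks=5, discard_chunk_chars=5)

# The same three invariants the test asserts, minus the token slack.
assert len(result) <= 5
assert all(len(c) >= 5 for c in result)
assert all(num_tokens(c) <= 8 for c in result)
```

A real tokenizer can split text at sub-word boundaries, which may be why the original test allows each chunk to exceed `chunk_size` by up to 5 tokens; the toy version splits exactly on word boundaries and needs no such slack.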

Callers

nothing calls this directly

Calls 5

chunk_tokens · Method · 0.95
num_tokens · Method · 0.95
ParsingConfig · Class · 0.90
Parser · Class · 0.90
generate_random_text · Function · 0.90

Tested by

no test coverage detected
