hub / github.com/unclecode/crawl4ai / RegexChunking

Class RegexChunking

crawl4ai/chunking_strategy.py:19–32


# Regex-based chunking
class RegexChunking(ChunkingStrategy):
    def __init__(self, patterns=None, **kwargs):
        if patterns is None:
            patterns = [r'\n\n']  # Default split pattern
        self.patterns = patterns

    def chunk(self, text: str) -> list:
        paragraphs = [text]
        for pattern in self.patterns:
            new_paragraphs = []
            for paragraph in paragraphs:
                new_paragraphs.extend(re.split(pattern, paragraph))
            paragraphs = new_paragraphs
        return paragraphs

# NLP-based sentence chunking
class NlpSentenceChunking(ChunkingStrategy):
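
The listing above applies each pattern in turn to the pieces produced by the previous pattern. A minimal, self-contained sketch of that behavior follows. It copies the class body from the listing and supplies a hypothetical stand-in for the ChunkingStrategy base class, whose real definition is not shown on this page, so the stand-in's interface is an assumption.

```python
import re
from abc import ABC, abstractmethod


class ChunkingStrategy(ABC):
    """Hypothetical stand-in for the real base class (assumed interface)."""

    @abstractmethod
    def chunk(self, text: str) -> list:
        ...


class RegexChunking(ChunkingStrategy):
    """Copied from the listing above so the example runs without crawl4ai."""

    def __init__(self, patterns=None, **kwargs):
        if patterns is None:
            patterns = [r'\n\n']  # Default split pattern
        self.patterns = patterns

    def chunk(self, text: str) -> list:
        paragraphs = [text]
        # Each pattern re-splits every piece left by the previous pattern.
        for pattern in self.patterns:
            new_paragraphs = []
            for paragraph in paragraphs:
                new_paragraphs.extend(re.split(pattern, paragraph))
            paragraphs = new_paragraphs
        return paragraphs


# Default pattern: split on a blank line.
default_chunks = RegexChunking().chunk("First paragraph.\n\nSecond paragraph.")

# Two patterns applied in order: blank lines first, then semicolons.
multi_chunks = RegexChunking(patterns=[r'\n\n', r';\s*']).chunk("a; b\n\nc")

# A trailing separator leaves an empty string in the result.
edge_chunks = RegexChunking().chunk("intro\n\n")

print(default_chunks)  # ['First paragraph.', 'Second paragraph.']
print(multi_chunks)    # ['a', 'b', 'c']
print(edge_chunks)     # ['intro', '']
```

Because re.split is used directly, empty strings are not filtered out, and a pattern containing a capturing group would also return the matched separators as chunks. Callers that want only non-empty text would need to filter the result themselves.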

Callers 13

test_regex_chunking · Function · 0.90
fetch_page · Method · 0.85
run_old · Method · 0.85
fetch_pages · Method · 0.85
run · Method · 0.85
arun · Method · 0.85
arun_many · Method · 0.85
fetch_page · Method · 0.85
fetch_pages · Method · 0.85
run · Method · 0.85

Calls

no outgoing calls

Tested by 3

test_regex_chunking · Function · 0.72
