hub / github.com/unclecode/crawl4ai / split_and_parse_json_objects

Function split_and_parse_json_objects

crawl4ai/utils.py:88–130

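Splits a JSON string that represents a list of objects and tries to parse each object individually. Parameters: json_string (str): a string representation of a list of JSON objects, e.g., '[{...}, {...}, ...]'. Returns: a tuple of two lists. The first list contains all successfully parsed JSON objects; the second contains the string representations of all segments that could not be parsed.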

(json_string)
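A minimal usage sketch of the behavior described above. To keep the snippet runnable on its own, the function body is copied from the source listing on this page rather than imported; the sample input `raw` is made up for illustration.

```python
import json


def split_and_parse_json_objects(json_string):
    # Copied from the listing on this page so the example is self-contained.
    if json_string.startswith('[') and json_string.endswith(']'):
        json_string = json_string[1:-1].strip()

    segments = []
    depth = 0
    start_index = 0
    for i, char in enumerate(json_string):
        if char == '{':
            if depth == 0:
                start_index = i
            depth += 1
        elif char == '}':
            depth -= 1
            if depth == 0:
                segments.append(json_string[start_index:i + 1])

    parsed_objects = []
    unparsed_segments = []
    for segment in segments:
        try:
            parsed_objects.append(json.loads(segment))
        except json.JSONDecodeError:
            unparsed_segments.append(segment)
    return parsed_objects, unparsed_segments


# Illustrative input: the middle object is malformed (missing value).
raw = '[{"a": 1}, {"b": }, {"c": {"d": 2}}]'
parsed, unparsed = split_and_parse_json_objects(raw)

print(parsed)    # [{'a': 1}, {'c': {'d': 2}}]
print(unparsed)  # ['{"b": }']
```

The malformed object ends up in the second list as a raw string, so one bad element does not discard the rest of the array.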

Source from the content-addressed store, hash-verified

def split_and_parse_json_objects(json_string):
    """
    Splits a JSON string which is a list of objects and tries to parse each object.

    Parameters:
    json_string (str): A string representation of a list of JSON objects, e.g., '[{...}, {...}, ...]'.

    Returns:
    tuple: A tuple containing two lists:
        - First list contains all successfully parsed JSON objects.
        - Second list contains the string representations of all segments that couldn't be parsed.
    """
    # Trim the leading '[' and trailing ']'
    if json_string.startswith('[') and json_string.endswith(']'):
        json_string = json_string[1:-1].strip()

    # Split the string into segments that look like individual JSON objects
    segments = []
    depth = 0
    start_index = 0

    for i, char in enumerate(json_string):
        if char == '{':
            if depth == 0:
                start_index = i
            depth += 1
        elif char == '}':
            depth -= 1
            if depth == 0:
                segments.append(json_string[start_index:i+1])

    # Try parsing each segment
    parsed_objects = []
    unparsed_segments = []

    for segment in segments:
        try:
            obj = json.loads(segment)
            parsed_objects.append(obj)
        except json.JSONDecodeError:
            unparsed_segments.append(segment)

    return parsed_objects, unparsed_segments
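One property of the depth scan in the listing is worth noting: it counts every `{` and `}` character, including those that appear inside string values, so an otherwise valid object such as `{"text": "a } b"}` is cut short and lands in the unparsed list. The sketch below shows one possible string-aware variant of the splitting step. The name `split_json_objects_string_aware` is hypothetical and not part of crawl4ai; it illustrates the idea only and is not the library's implementation.

```python
import json


def split_json_objects_string_aware(json_string):
    """Hypothetical variant of the depth scan that ignores braces inside JSON strings."""
    segments = []
    depth = 0
    start_index = 0
    in_string = False  # True while the scan is inside a double-quoted string
    escaped = False    # True when the previous character was a backslash in a string

    for i, char in enumerate(json_string):
        if in_string:
            if escaped:
                escaped = False
            elif char == "\\":
                escaped = True
            elif char == '"':
                in_string = False
            continue  # braces inside strings never change the depth
        if char == '"':
            in_string = True
        elif char == "{":
            if depth == 0:
                start_index = i
            depth += 1
        elif char == "}" and depth > 0:
            depth -= 1
            if depth == 0:
                segments.append(json_string[start_index:i + 1])
    return segments


# Illustrative input with a closing brace inside a string value.
segments = split_json_objects_string_aware('[{"text": "a } b"}, {"n": 1}]')
objects = [json.loads(s) for s in segments]
print(objects)  # [{'text': 'a } b'}, {'n': 1}]
```

The `depth > 0` guard is an additional assumption of this sketch: it keeps a stray `}` from driving the counter negative, which would otherwise prevent later objects from being detected.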

Callers 2

extract_blocks · Function · 0.85
extract · Method · 0.85

Calls

no outgoing calls

Tested by

no test coverage detected
