hub / github.com/microsoft/SkillOpt / extract_json

Function extract_json

skillopt/utils/json_utils.py:93–155 · view source on GitHub ↗

Extract a JSON object from LLM response text. Tries ```json fences first, then bare {...} patterns.

(text: str)

Source from the content-addressed store, hash-verified

91
92
93	def extract_json(text: str) -> dict \| None:
94	"""Extract a JSON object from LLM response text.
95
96	Tries ```json fences first, then bare {...} patterns.
97	"""
98	m = re.search(r"```json\s(.?)```", text, re.DOTALL)
99	if m:
100	try:
101	return json.loads(m.group(1))
102	except json.JSONDecodeError:
103	pass
104	m = re.search(r"\{.*\}", text, re.DOTALL)
105	if m:
106	try:
107	return json.loads(m.group(0))
108	except json.JSONDecodeError:
109	pass
110	# Tolerant fallback for non-OpenAI backends (Claude/Qwen, …) whose free-form
111	# JSON strict json.loads rejects — unescaped ASCII quotes inside CJK string
112	# values, trailing commas, etc. Repair so the analyst's edits aren't silently
113	# dropped, but ONLY a single unambiguous object: never feed the greedy `{.*}`
114	# span or the raw text, or json_repair would quietly return one of several
115	# objects (empirically the wrong/last one) — strictly worse than None, which
116	# the caller can detect and retry/skip.
117	#
118	# Pick the candidate FIRST, before importing json_repair, so the optional
119	# dependency only matters (and only warns) when there is genuinely a single
120	# malformed object we could have repaired. Ordinary no-JSON / prose replies
121	# have no candidate and return None silently.
122	candidate = None
123	fenced = re.search(r"```json\s(.?)```", text, re.DOTALL)
124	if fenced and len(_top_level_brace_objects(fenced.group(1))) == 1:
125	candidate = fenced.group(1)
126	else:
127	objs = _top_level_brace_objects(text)
128	if len(objs) == 1:
129	candidate = objs[0]
130	# 0 or >1 top-level objects → too ambiguous to repair safely → None
131	if not candidate:
132	return None
133	# Final guard: only repair spans that actually look like an intended JSON
134	# object. Prose pseudo-objects in single quotes / backticks / bare text
135	# (e.g. `{op: delete}`) reach here because the scan only skips double-quoted
136	# prose; repairing them would fabricate a wrong dict (worse than None).
137	if not _looks_json_like(candidate):
138	return None
139	try:
140	from json_repair import repair_json
141	except ModuleNotFoundError:
142	warnings.warn(
143	"json_repair not installed; malformed-JSON recovery disabled — "
144	"a non-OpenAI analyst edit may be silently dropped. pip install json_repair",
145	RuntimeWarning,
146	stacklevel=2,
147	)
148	return None
149	try:
150	repaired = repair_json(candidate, return_objects=True)

Callers 15

run_error_analyst_minibatchFunction · 0.90

run_success_analyst_minibatchFunction · 0.90

_merge_batchFunction · 0.90

merge_patchesFunction · 0.90

run_meta_skillFunction · 0.90

consolidate_appendix_notesFunction · 0.90

rewrite_skill_from_suggestionsFunction · 0.90

rank_and_selectFunction · 0.90

decide_autonomous_learning_rateFunction · 0.90

run_slow_updateFunction · 0.90

test_code_fence_jsonMethod · 0.90

test_bare_json_objectMethod · 0.90

Calls 2

_top_level_brace_objectsFunction · 0.85

_looks_json_likeFunction · 0.85

Tested by 15

test_code_fence_jsonMethod · 0.72

test_bare_json_objectMethod · 0.72

test_code_fence_takes_precedenceMethod · 0.72

test_broken_fence_falls_back_to_bareMethod · 0.72

test_nested_jsonMethod · 0.72

test_no_json_returns_noneMethod · 0.72

test_empty_string_returns_noneMethod · 0.72

test_malformed_json_returns_noneMethod · 0.72

test_empty_json_objectMethod · 0.72

test_json_with_escaped_charsMethod · 0.72

test_only_fence_with_no_json_syntaxMethod · 0.72

test_prose_pseudo_json_returns_noneMethod · 0.72