hub / github.com/camelot-dev/camelot / compute_parse_errors

Method compute_parse_errors

camelot/parsers/base.py:178–238

Compute parse errors for the table.

Parameters
----------
table : camelot.core.Table

Returns
-------
Tuple
    Parse errors

(self, table)

Source from the content-addressed store, hash-verified

        return idx

    def compute_parse_errors(self, table):
        """Compute parse errors for the table.

        Parameters
        ----------
        table : camelot.core.Table

        Returns
        -------
        Tuple
            Parse errors
        """
        pos_errors = []
        # Process textlines from both orientations in a single global
        # reading-order stream (-y0 top-first, then x0 left-first) rather
        # than the previous vertical-pass-then-horizontal-pass loop.
        #
        # Cell.text is an *appending* setter, so the order textlines are
        # visited determines the order their fragments concatenate in a
        # cell. The old "all vertical, then all horizontal" order meant a
        # glyph that playa happened to classify as a vertical textline
        # (e.g. a lone single character split off from a word) was
        # appended *before* the horizontal textlines of the same cell,
        # floating it to the front of the cell text. Reported as #385
        # ('d' of 'dihydroclorid' jumping to the start of the cell).
        #
        # Sorting both orientations together by reading order places each
        # textline by its own position, so the cell accumulates
        # top-to-bottom, left-to-right regardless of orientation tag.
        textlines = [
            (t, direction)
            for direction in ("vertical", "horizontal")
            for t in self.t_bbox[direction]
        ]
        textlines.sort(key=lambda td: (-td[0].y0, td[0].x0))
        for t, direction in textlines:
            indices, error = get_table_index(
                table,
                t,
                direction,
                split_text=self.split_text,
                flag_size=self.flag_size,
                strip_text=self.strip_text,
            )
            if len(indices) > 0:
                if indices[0][:2] != (-1, -1):
                    pos_errors.append(error)
                    indices = type(self)._reduce_index(
                        table, indices, shift_text=self.shift_text
                    )
                    for r_idx, c_idx, text in indices:
                        # replace_text (#482) is applied after the
                        # split/strip/flag-size pipeline, at the
                        # last point before the text reaches the
                        # output cell. Order: strip first (already
                        # done upstream in get_table_index), then
                        # replace, then assign.
                        if self.replace_text:
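
The reading-order comment in the listing above can be illustrated with a small standalone sketch. It does not use camelot itself: `TextLine`, `Cell` and `fill` are hypothetical stand-ins, and the coordinates are made up. The only behaviour borrowed from the source is that the cell's text setter appends and that the sort key is `(-y0, x0)`.

```python
from collections import namedtuple

# Hypothetical stand-in for a layout textline: its text plus the
# bottom-left corner of its bounding box (PDF y grows upwards).
TextLine = namedtuple("TextLine", ["text", "x0", "y0"])


class Cell:
    """Toy cell whose ``text`` setter appends instead of overwriting."""

    def __init__(self):
        self._text = ""

    @property
    def text(self):
        return self._text

    @text.setter
    def text(self, fragment):
        self._text += fragment


def fill(textlines):
    """Assign each textline to one cell in the given visiting order."""
    cell = Cell()
    for t in textlines:
        cell.text = t.text
    return cell.text


# Made-up input: a lone 'd' tagged vertical, the rest tagged horizontal.
t_bbox = {
    "vertical": [TextLine("d", x0=10.0, y0=100.0)],
    "horizontal": [
        TextLine("Name: ", x0=10.0, y0=120.0),
        TextLine("ihydroclorid", x0=15.0, y0=100.0),
    ],
}

# Old order: every vertical textline first, then every horizontal one.
old_order = t_bbox["vertical"] + t_bbox["horizontal"]

# New order: one stream sorted top-first (-y0), then left-first (x0).
new_order = sorted(old_order, key=lambda t: (-t.y0, t.x0))

print(repr(fill(old_order)))
print(repr(fill(new_order)))
```

With the old order the stray 'd' is appended before anything else in the cell; with the sorted stream it lands where its coordinates put it, directly before the rest of its word.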

Callers 1

record_parse_metadata · Method · 0.95

Calls 3

get_table_index · Function · 0.85
text_replace · Function · 0.85
_reduce_index · Method · 0.45
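
The source comment states that replacement runs after stripping ("strip first ..., then replace, then assign"). A minimal sketch of why that order can matter is given here. Both helpers are hypothetical: the real `text_replace` signature and the shape of `replace_text` are not shown in the listing, so a plain dict of substitutions is assumed, and stripping is assumed to drop every occurrence of the listed characters.

```python
def strip_chars(text, chars):
    # Assumption: stripping removes every occurrence of each listed character.
    return text.translate(str.maketrans("", "", chars))


def apply_replacements(text, replacements):
    # Hypothetical helper: apply each old -> new substitution in turn.
    for old, new in replacements.items():
        text = text.replace(old, new)
    return text


raw = "1.234,50"
strip_text = "."            # drop thousands separators
replace_text = {",": "."}   # turn the decimal comma into a point

# Order described in the source comment: strip, then replace.
strip_then_replace = apply_replacements(strip_chars(raw, strip_text), replace_text)

# Reversed order, shown only for contrast.
replace_then_strip = strip_chars(apply_replacements(raw, replace_text), strip_text)

print(strip_then_replace)
print(replace_then_strip)
```

In this made-up case, stripping first keeps the decimal point that the replacement introduces, while the reversed order strips it out again.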

Tested by

no test coverage detected