hub / github.com/potamides/DeTikZify / preprocess

Function preprocess

examples/pretrain.py:17–28

Concatenate captions and OCR tokens.

(batch, size)

Source from the content-addressed store, hash-verified

@batchify
def preprocess(batch, size):
    """Concatenate captions and OCR tokens."""
    for caption_images in chain.from_iterable(batch['caption_images']):
        caption = caption_images['caption']
        for cil_pair in caption_images['cil_pairs']:
            sub_caption = cil_pair['sub_caption']
            ocr = " ".join(cil_pair['image_ocr'])
            if text:=" ".join(filter(None, [caption, sub_caption, ocr])):
                yield dict(
                    text=text,
                    image=convert(expand(cil_pair['image'], size, do_trim=True), "png")
                )

def parse_args():
    argument_parser = ArgumentParser(
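
The text-joining step of preprocess can be tried in isolation. The sketch below is not part of the repository: join_text_fields is a hypothetical helper that mirrors only the caption, sub-caption, and OCR concatenation, and the sample batch is invented for illustration. It leaves out the @batchify decorator and the expand/convert image handling, and it assumes that batch['caption_images'] holds one list of caption records per batch row, as the chain.from_iterable call suggests.

```python
from itertools import chain


def join_text_fields(batch):
    """Yield one joined string per caption / sub-caption / OCR triple.

    Hypothetical helper: reproduces only the text logic of preprocess.
    """
    for caption_images in chain.from_iterable(batch["caption_images"]):
        caption = caption_images["caption"]
        for cil_pair in caption_images["cil_pairs"]:
            sub_caption = cil_pair["sub_caption"]
            ocr = " ".join(cil_pair["image_ocr"])
            # filter(None, ...) drops empty strings and None, so missing
            # fields do not leave stray spaces in the joined text.
            if text := " ".join(filter(None, [caption, sub_caption, ocr])):
                yield text


# Invented sample data: two batch rows, three caption/image pairs in total.
batch = {
    "caption_images": [
        [
            {
                "caption": "Figure 1: Loss curves.",
                "cil_pairs": [
                    {"sub_caption": "(a) Training", "image_ocr": ["epoch", "loss"]},
                    {"sub_caption": "", "image_ocr": []},
                ],
            }
        ],
        [
            {
                "caption": "",
                "cil_pairs": [{"sub_caption": None, "image_ocr": []}],
            }
        ],
    ]
}

texts = list(join_text_fields(batch))
for text in texts:
    print(text)
```

Pairs whose caption, sub-caption, and OCR tokens are all empty produce no output at all, which matches the walrus-guarded yield in the original function: such images are dropped rather than paired with an empty string.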

Callers

nothing calls this directly

Calls 2

convert · Function · 0.90
expand · Function · 0.90

Tested by

no test coverage detected