Parse base64 image with the LLM. `model` will be set to llm's default if not provided. If llm's `model` is also not set, ``OpenAI`` ``gpt-4o-mini`` will be used.

Args:
b_64_img: Image in base64 format to be parsed. See `img_to_b64` for the conversion utility.
llm: LLM i
(
b_64_img,
llm: pw.UDF,
prompt: str,
model: str | None = None,
**kwargs,
)
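The request that `parse_image` assembles can be sketched with the standard library alone. The block below is an illustration under stated assumptions, not the library's implementation: `resolve_model`, `build_messages`, `fake_llm`, and `parse_image_sketch` are hypothetical names, `fake_llm` stands in for the wrapped LLM call, and the image bytes are a placeholder rather than a real PNG.

```python
import asyncio
import base64

# Assumption: mirrors the default vision model named in the docstring.
DEFAULT_VISION_MODEL = "gpt-4o-mini"


def resolve_model(model, llm_kwargs):
    """Pick the explicit model, then the LLM's own setting, then the default."""
    return model or llm_kwargs.get("model") or DEFAULT_VISION_MODEL


def build_messages(b_64_img, prompt):
    """Build one user message holding the prompt and the image as a data URL."""
    content = [
        {"type": "text", "text": prompt},
        {
            "type": "image_url",
            "image_url": {"url": f"data:image/png;base64,{b_64_img}"},
        },
    ]
    return [{"role": "user", "content": content}]


async def fake_llm(model, messages, **kwargs):
    """Hypothetical stand-in for the LLM call; it only echoes what it received."""
    return f"{model} received {len(messages[0]['content'])} content parts"


async def parse_image_sketch(b_64_img, prompt, model=None, llm_kwargs=None, **kwargs):
    """Resolve the model, build the messages, and forward extra kwargs to the LLM."""
    resolved = resolve_model(model, llm_kwargs or {})
    messages = build_messages(b_64_img, prompt)
    return await fake_llm(model=resolved, messages=messages, **kwargs)


# Placeholder bytes, not a real PNG; a real caller would encode actual image data.
b_64_img = base64.b64encode(b"example image bytes").decode("utf-8")
result = asyncio.run(
    parse_image_sketch(b_64_img, "Describe the image.", temperature=0.0)
)
print(result)  # gpt-4o-mini received 2 content parts
```

Because neither `model` nor an LLM-level setting is supplied here, the sketch falls back to the default model. Passing `model="..."` or `llm_kwargs={"model": "..."}` changes the resolved name in that order of precedence.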
async def parse_image(
    b_64_img,
    llm: pw.UDF,
    prompt: str,
    model: str | None = None,
    **kwargs,
) -> str:
    """
    Parse base64 image with the LLM. `model` will be set to llm's default if not provided.
    If llm's `model` is also not set, ``OpenAI`` ``gpt-4o-mini`` will be used.

    Args:
        b_64_img: Image in base64 format to be parsed. See `img_to_b64` for the conversion utility.
        llm: LLM instance to be called with image.
        prompt: Instructions for image parsing.
        model: Optional LLM model name. Defaults to ``OpenAI`` ``gpt-4o-mini``,
            if neither `model` nor `llm.model` is set.
        kwargs: Additional arguments to be sent to the LLM inference.
            Refer to the specific provider's API for available options.
            Examples include `temperature`, `max_tokens`, etc.
    """
    model = model or llm.kwargs.get("model") or DEFAULT_VISION_MODEL  # type:ignore

    content = [
        {"type": "text", "text": prompt},
        {
            "type": "image_url",
            "image_url": {"url": f"data:image/png;base64,{b_64_img}"},
        },
    ]

    messages = [
        {
            "role": "user",
            "content": content,
        }
    ]

    logger.info(f"Parsing table, model: {model}\nmessages: {str(content)[:350]}...")

    response = await coerce_async(llm.func)(model=model, messages=messages, **kwargs)

    logger.info(f"Parsed table, model: {model}\nmessages: {str(response)}...")

    return response


async def parse_image_details(