hub / github.com/langroid/langroid / test_get_pdf_doc_path

Function test_get_pdf_doc_path

tests/test_pdf_parser_extra.py:40–57
(pdflib: str)

Source from the content-addressed store, hash-verified

@pytest.mark.parametrize("pdflib", ["unstructured"])
def test_get_pdf_doc_path(pdflib: str):
    current_dir = os.path.dirname(os.path.abspath(__file__))
    tests_root = os.path.abspath(os.path.join(current_dir, ".."))
    path = os.path.join(tests_root, "main", "data", "dummy.pdf")
    pdf_parser = DocumentParser.create(
        path, ParsingConfig(pdf=PdfParsingConfig(library=pdflib))
    )
    doc = pdf_parser.get_doc()

    # Check the results
    assert isinstance(doc.content, str)
    assert len(doc.content) > 0  # assuming the PDF is not empty
    assert doc.metadata.source == path

    docs = pdf_parser.get_doc_chunks()
    assert len(docs) > 0
    assert all(d.metadata.is_chunk for d in docs)
    assert all(path in d.metadata.source for d in docs)
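
The path handling at the top of the test can be tried in isolation. The sketch below is illustrative only: `resolve_data_path` is a hypothetical helper that is not part of langroid, and the `/repo/...` location is an invented example. It repeats the same three `os.path` steps the test uses: take the directory of the test file, step up one level with `..`, then join the data sub-path.

```python
import os


def resolve_data_path(test_file: str, *parts: str) -> str:
    """Join `parts` onto the parent of the directory containing `test_file`.

    Hypothetical helper that mirrors the os.path calls in the test above.
    """
    current_dir = os.path.dirname(os.path.abspath(test_file))
    parent_dir = os.path.abspath(os.path.join(current_dir, ".."))
    return os.path.join(parent_dir, *parts)


# Invented file location, used only to show how the path resolves.
example = resolve_data_path(
    "/repo/tests/test_pdf_parser_extra.py", "main", "data", "dummy.pdf"
)
print(example)
```

Because `..` is applied to the test file's own directory, the result depends on where the test file lives. For a file directly under `tests/`, the variable named `tests_root` would point at the directory above `tests/`. This page does not confirm whether `dummy.pdf` exists at the resulting location.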

Callers

nothing calls this directly

Calls 5

ParsingConfig · Class · 0.90
PdfParsingConfig · Class · 0.90
get_doc · Method · 0.80
get_doc_chunks · Method · 0.80
create · Method · 0.45

Tested by

no test coverage detected
