hub / github.com/clips/pattern / process_pdf

Function process_pdf

pattern/web/pdf/pdfinterp.py:812–834
(rsrcmgr, device, fp, pagenos=None, maxpages=0, password='',
                caching=True, check_extractable=True)

Source from the content-addressed store, hash-verified

class PDFTextExtractionNotAllowed(PDFInterpreterError): pass

def process_pdf(rsrcmgr, device, fp, pagenos=None, maxpages=0, password='',
                caching=True, check_extractable=True):
    # Create a PDF parser object associated with the file object.
    parser = PDFParser(fp)
    # Create a PDF document object that stores the document structure.
    doc = PDFDocument(caching=caching)
    # Connect the parser and document objects.
    parser.set_document(doc)
    doc.set_parser(parser)
    # Supply the document password for initialization.
    # (If no password is set, give an empty string.)
    doc.initialize(password)
    # Check if the document allows text extraction. If not, abort.
    if check_extractable and not doc.is_extractable:
        raise PDFTextExtractionNotAllowed('Text extraction is not allowed: %r' % fp)
    # Create a PDF interpreter object.
    interpreter = PDFPageInterpreter(rsrcmgr, device)
    # Process each page contained in the document.
    for (pageno,page) in enumerate(doc.get_pages()):
        if pagenos and (pageno not in pagenos): continue
        interpreter.process_page(page)
        if maxpages and maxpages <= pageno+1: break
    return
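
The interaction between pagenos and maxpages in the final loop is easy to misread, so here is a minimal standalone sketch of just that selection logic. The helper name select_pages is hypothetical and is not part of pattern or pdfminer; it only mirrors the loop body above, with plain list items standing in for page objects.

```python
def select_pages(pages, pagenos=None, maxpages=0):
    """Return the pages that the loop in process_pdf would pass to
    interpreter.process_page (hypothetical helper, for illustration only)."""
    selected = []
    for pageno, page in enumerate(pages):
        # An empty or None pagenos means "no filter": every page is eligible.
        if pagenos and pageno not in pagenos:
            continue  # skipped pages never reach the maxpages check below
        selected.append(page)
        # maxpages is compared against the zero-based page index plus one,
        # not against the number of pages processed so far.
        if maxpages and maxpages <= pageno + 1:
            break
    return selected


pages = ["p0", "p1", "p2", "p3", "p4", "p5"]

print(select_pages(pages))                              # all six pages
print(select_pages(pages, maxpages=2))                  # ['p0', 'p1']
print(select_pages(pages, pagenos={1, 3}))              # ['p1', 'p3']
print(select_pages(pages, pagenos={0, 4}, maxpages=2))  # ['p0', 'p4']
```

The last call shows the subtle part: because the maxpages test runs only after a page has been processed, a page listed in pagenos that lies beyond maxpages is still processed once before the loop stops.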

Callers 1

_parse · Method · 0.90

Calls 9

set_document · Method · 0.95
set_parser · Method · 0.95
initialize · Method · 0.95
get_pages · Method · 0.95
process_page · Method · 0.95
PDFParser · Class · 0.90
PDFDocument · Class · 0.90
PDFPageInterpreter · Class · 0.85

Tested by

no test coverage detected
