hub / github.com/ArtifexSoftware/pdf2docx / table

Method table

pdf2docx/main.py:82–108

Extract table content from PDF pages.

(pdf_file, password:str=None, start:int=0, end:int=None, pages:list=None, **kwargs)


@staticmethod
def table(pdf_file, password:str=None, start:int=0, end:int=None, pages:list=None, **kwargs):
    '''Extract table content from pdf pages.

    Args:
        pdf_file (str) : PDF filename to read from.
        password (str): Password for encrypted pdf. Default to None if not encrypted.
        start (int, optional): First page to process. Defaults to 0.
        end (int, optional): Last page to process. Defaults to None.
        pages (list, optional): Range of pages, e.g. --pages=1,3,5. Defaults to None.
    '''
    # index starts from zero or one
    if isinstance(pages, int): pages = [pages] # in case --pages=1
    if not kwargs.get('zero_based_index', True):
        start = max(start-1, 0)
        if end: end -= 1
        if pages: pages = [i-1 for i in pages]

    cv = Converter(pdf_file, password)
    try:
        tables = cv.extract_tables(start, end, pages, **kwargs)
    except Exception as e:
        tables = []
        logging.error(e)
    finally:
        cv.close()

    return tables
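The index handling at the top of the method body can be easier to follow in isolation. The sketch below lifts that logic into a standalone function so it can be run without a PDF. The name `normalize_page_args` and the explicit `zero_based_index` parameter are assumptions made for illustration; in the source, the flag is read from `**kwargs` and no such helper exists.

```python
from typing import List, Optional, Tuple, Union


def normalize_page_args(
    start: int = 0,
    end: Optional[int] = None,
    pages: Union[int, List[int], None] = None,
    zero_based_index: bool = True,
) -> Tuple[int, Optional[int], Optional[List[int]]]:
    """Hypothetical helper mirroring the index handling in table()."""
    # A single page number (e.g. --pages=1) is wrapped into a list.
    if isinstance(pages, int):
        pages = [pages]

    # One-based input is shifted down by one; start is clamped at 0.
    if not zero_based_index:
        start = max(start - 1, 0)
        if end:
            end -= 1
        if pages:
            pages = [i - 1 for i in pages]

    return start, end, pages


print(normalize_page_args(start=1, end=3, zero_based_index=False))  # (0, 2, None)
print(normalize_page_args(pages=2, zero_based_index=False))         # (0, None, [1])
print(normalize_page_args(start=1, end=3))                          # (1, 3, None)
```

With the default `zero_based_index=True` the arguments pass through unchanged, apart from a bare integer `pages` being wrapped in a list.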

Callers

nothing calls this directly

Calls 4

extract_tables · Method · 0.95

close · Method · 0.95

Converter · Class · 0.85

get · Method · 0.80
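The calls listed above run in a fixed order: the `zero_based_index` lookup (which appears to be the `get` entry, via `kwargs.get`), then `Converter(...)`, then `extract_tables`, with `close` in a `finally` clause. As a rough illustration of what that shape implies for callers, the sketch below runs the same try/except/finally pattern against a stand-in object. `StubConverter` and `extract_or_empty` are hypothetical names invented here; they are not part of pdf2docx, and the stub's return value is made up.

```python
import logging


class StubConverter:
    """Hypothetical stand-in for Converter; not part of pdf2docx."""

    def __init__(self, fail: bool = False):
        self.fail = fail
        self.closed = False

    def extract_tables(self, start=0, end=None, pages=None, **kwargs):
        # Simulate either a failure or a single one-cell table.
        if self.fail:
            raise ValueError("simulated extraction failure")
        return [[["cell"]]]

    def close(self):
        self.closed = True


def extract_or_empty(cv, start=0, end=None, pages=None, **kwargs):
    """Same try/except/finally shape as table(), for any converter-like object."""
    try:
        tables = cv.extract_tables(start, end, pages, **kwargs)
    except Exception as e:
        # Errors are logged and swallowed; the caller gets an empty list.
        tables = []
        logging.error(e)
    finally:
        # Runs on both the success and the failure path.
        cv.close()
    return tables


ok, bad = StubConverter(), StubConverter(fail=True)
print(extract_or_empty(ok), ok.closed)    # [[['cell']]] True
print(extract_or_empty(bad), bad.closed)  # [] True
```

One consequence of this pattern is that an empty result does not distinguish "no tables found" from "extraction failed"; a caller that needs to tell the two apart would have to inspect the log output.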

Tested by

no test coverage detected