Function save_pdf

archivebox/extractors/pdf.py:35–70

Prints a PDF of the linked site to a file using headless Chrome (chrome --headless).

(link: Link, out_dir: Optional[Path]=None, timeout: int=TIMEOUT)

@enforce_types
def save_pdf(link: Link, out_dir: Optional[Path]=None, timeout: int=TIMEOUT) -> ArchiveResult:
    """print PDF of site to file using chrome --headless"""

    out_dir = out_dir or Path(link.link_dir)
    output: ArchiveOutput = 'output.pdf'
    cmd = [
        *chrome_args(),
        '--print-to-pdf',
        link.url,
    ]
    status = 'succeeded'
    timer = TimedProgress(timeout, prefix=' ')
    try:
        # Chrome runs with out_dir as its working directory
        result = run(cmd, cwd=str(out_dir), timeout=timeout)

        # Non-zero exit code: pass stderr (or stdout) along as hints
        if result.returncode:
            hints = (result.stderr or result.stdout).decode()
            raise ArchiveError('Failed to save PDF', hints)

        chmod_file('output.pdf', cwd=str(out_dir))
    except Exception as err:
        # On failure, the exception itself is recorded as the output
        status = 'failed'
        output = err
        chrome_cleanup()
    finally:
        timer.end()

    return ArchiveResult(
        cmd=cmd,
        pwd=str(out_dir),
        cmd_version=CHROME_VERSION,
        output=output,
        status=status,
        **timer.stats,
    )
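
The control flow above (build a command, run it in the output directory, turn any failure into a 'failed' result instead of raising) can be sketched outside ArchiveBox with only the standard library. This is a minimal illustration, not the project's code: PdfResult, print_to_pdf, chrome_binary and runner are hypothetical names, the hard-coded "--headless" flag stands in for whatever chrome_args() actually returns, and the timer and chmod steps are left out.

```python
import subprocess
from dataclasses import dataclass
from pathlib import Path
from typing import Callable, List, Union


@dataclass
class PdfResult:
    """Hypothetical stand-in for ArchiveResult, reduced to four fields."""
    cmd: List[str]
    pwd: str
    status: str
    output: Union[str, Exception]


def print_to_pdf(
    url: str,
    out_dir: Path,
    timeout: int = 60,
    chrome_binary: str = "chromium",  # assumption: binary name varies by system
    runner: Callable[..., subprocess.CompletedProcess] = subprocess.run,
) -> PdfResult:
    # With no path given, headless Chrome's --print-to-pdf writes
    # output.pdf into the current working directory.
    cmd = [chrome_binary, "--headless", "--print-to-pdf", url]
    status = "succeeded"
    output: Union[str, Exception] = "output.pdf"
    try:
        result = runner(cmd, cwd=str(out_dir), capture_output=True, timeout=timeout)
        if result.returncode:
            # Prefer stderr, fall back to stdout, as the original does
            hints = (result.stderr or result.stdout).decode()
            raise RuntimeError(f"Failed to save PDF: {hints}")
    except Exception as err:
        # Missing binary, timeout, or non-zero exit all end up here;
        # the exception replaces the output path in the result.
        status = "failed"
        output = err
    return PdfResult(cmd=cmd, pwd=str(out_dir), status=status, output=output)
```

Passing a fake runner that returns a subprocess.CompletedProcess makes it possible to exercise both the success and failure paths without a Chrome install. With the default runner, a call such as print_to_pdf("https://example.com", Path("/tmp")) would need a Chromium-compatible binary on PATH.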

nothing calls this directly

end (Method) · 0.95

chrome_args (Function) · 0.85

TimedProgress (Class) · 0.85

ArchiveError (Class) · 0.85

chmod_file (Function) · 0.85

chrome_cleanup (Function) · 0.85

run (Function) · 0.50

ArchiveResult (Class) · 0.50

no test coverage detected