github.com/ArchiveBox/ArchiveBox

Function save_pdf

archivebox/extractors/pdf.py:35–70

Prints a PDF of the link's URL to output.pdf in the link's output directory, using Chrome in --headless mode.
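The following sketch shows roughly what "using chrome --headless" amounts to by building a comparable command line. It assumes a Chromium-family binary named `chromium` and uses only the `--headless` and `--print-to-pdf` flags. The real `chrome_args()` derives its flags from ArchiveBox configuration and may add others. When `--print-to-pdf` is given no path, Chrome writes `output.pdf` into its current working directory. `save_pdf` appears to rely on this, because it runs the command with `cwd=str(out_dir)` and then expects `output.pdf` there. `build_pdf_cmd` is a hypothetical helper, not part of ArchiveBox.

```python
from typing import List, Sequence


def build_pdf_cmd(url: str, chrome_binary: str = "chromium",
                  extra_args: Sequence[str] = ()) -> List[str]:
    """Hypothetical helper: assemble a headless-Chrome print-to-PDF command.

    The binary name and flag set are assumptions for illustration. With no
    path after --print-to-pdf, Chrome writes output.pdf into the process's
    current working directory.
    """
    # Same ordering as save_pdf: browser args first, then the PDF flag, then the URL.
    return [chrome_binary, "--headless", *extra_args, "--print-to-pdf", url]


print(build_pdf_cmd("https://example.com"))
# ['chromium', '--headless', '--print-to-pdf', 'https://example.com']
```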

(link: Link, out_dir: Optional[Path]=None, timeout: int=TIMEOUT)

Source from the content-addressed store, hash-verified

@enforce_types
def save_pdf(link: Link, out_dir: Optional[Path]=None, timeout: int=TIMEOUT) -> ArchiveResult:
    """print PDF of site to file using chrome --headless"""

    out_dir = out_dir or Path(link.link_dir)
    output: ArchiveOutput = 'output.pdf'
    cmd = [
        *chrome_args(),
        '--print-to-pdf',
        link.url,
    ]
    status = 'succeeded'
    timer = TimedProgress(timeout, prefix=' ')
    try:
        result = run(cmd, cwd=str(out_dir), timeout=timeout)

        if result.returncode:
            hints = (result.stderr or result.stdout).decode()
            raise ArchiveError('Failed to save PDF', hints)

        chmod_file('output.pdf', cwd=str(out_dir))
    except Exception as err:
        status = 'failed'
        output = err
        chrome_cleanup()
    finally:
        timer.end()

    return ArchiveResult(
        cmd=cmd,
        pwd=str(out_dir),
        cmd_version=CHROME_VERSION,
        output=output,
        status=status,
        **timer.stats,
    )
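The control flow in `save_pdf` follows a common extractor pattern:

* Run the command inside the output directory.
* Treat a nonzero exit code as a failure, using stderr (or stdout) as the hint.
* Fix up the output file's permissions.
* Turn any exception into a `'failed'` status with the exception as the output.
* Always stop the timer in `finally`.

The sketch below reproduces that pattern using only the standard library. `ExtractorResult`, `run_and_record`, and the `0o644` mode are hypothetical stand-ins for ArchiveBox's `ArchiveResult`, `run`/`chmod_file`, and `TimedProgress`. They are not their actual implementations. The sketch also makes two simplifications relative to `save_pdf`: it raises `RuntimeError` where `save_pdf` raises `ArchiveError`, and it omits the `chrome_cleanup()` call.

```python
import os
import subprocess
import sys
import tempfile
import time
from dataclasses import dataclass
from pathlib import Path
from typing import List, Union


@dataclass
class ExtractorResult:
    """Hypothetical stand-in for ArchiveBox's ArchiveResult."""
    cmd: List[str]
    pwd: str
    output: Union[str, Exception]
    status: str
    start_ts: float
    end_ts: float


def run_and_record(cmd: List[str], out_dir: Path, output_name: str,
                   timeout: int = 60) -> ExtractorResult:
    output: Union[str, Exception] = output_name
    status = "succeeded"
    start = time.time()
    try:
        # Run inside out_dir so a tool that writes a relative path
        # (like Chrome's bare --print-to-pdf) drops its file there.
        result = subprocess.run(cmd, cwd=str(out_dir), capture_output=True,
                                timeout=timeout)
        if result.returncode:
            hints = (result.stderr or result.stdout).decode()
            raise RuntimeError(f"Command failed: {hints.strip()}")
        # Assumed file mode; raises FileNotFoundError if the tool exited 0
        # without producing the file, which the except branch records.
        os.chmod(out_dir / output_name, 0o644)
    except Exception as err:
        # Nonzero exit, timeout, or missing file: the exception becomes the output.
        status = "failed"
        output = err
    finally:
        end = time.time()
    return ExtractorResult(cmd=cmd, pwd=str(out_dir), output=output,
                           status=status, start_ts=start, end_ts=end)


if __name__ == "__main__":
    # Simulate a tool that writes output.pdf into its working directory.
    demo_dir = Path(tempfile.mkdtemp())
    fake_cmd = [sys.executable, "-c", "open('output.pdf', 'wb').write(b'%PDF')"]
    print(run_and_record(fake_cmd, demo_dir, "output.pdf").status)  # succeeded
```

Recording the failure as data, rather than letting the exception propagate, lets a caller run several extractors in sequence and keep going when one of them fails.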

Callers

No direct callers detected.

Calls (8)

end · Method · 0.95
chrome_args · Function · 0.85
TimedProgress · Class · 0.85
ArchiveError · Class · 0.85
chmod_file · Function · 0.85
chrome_cleanup · Function · 0.85
run · Function · 0.50
ArchiveResult · Class · 0.50

Tested by

no test coverage detected