Function benchmark_batch

scripts/benchmark.py:15–37 · view source on GitHub ↗

(documents: list[str])

Source from the content-addressed store, hash-verified

13
14
15	def benchmark_batch(documents: list[str]) -> None:
16	num_threads = int(os.environ["RAYON_NUM_THREADS"])
17	num_bytes = sum(map(len, map(str.encode, documents)))
18	print(f"num_threads: {num_threads}, num_bytes: {num_bytes}")
19
20	enc = tiktoken.get_encoding("gpt2")
21	enc.encode("warmup")
22
23	start = time.perf_counter_ns()
24	enc.encode_ordinary_batch(documents, num_threads=num_threads)
25	end = time.perf_counter_ns()
26	print(f"tiktoken \t{num_bytes / (end - start) * 1e9} bytes / s")
27
28	import transformers
29
30	hf_enc = cast(Any, transformers).GPT2TokenizerFast.from_pretrained("gpt2")
31	hf_enc.model_max_length = 1e30 # silence!
32	hf_enc.encode("warmup")
33
34	start = time.perf_counter_ns()
35	hf_enc(documents)
36	end = time.perf_counter_ns()
37	print(f"huggingface \t{num_bytes / (end - start) * 1e9} bytes / s")
38
39

nothing calls this directly

encode_ordinary_batchMethod · 0.80

encodeMethod · 0.45

no test coverage detected

searching dependent graphs…