MCPcopy
hub / github.com/openai/tiktoken / test_hyp_offsets

Function test_hyp_offsets

tests/test_offsets.py:31–46  ·  view source on GitHub ↗
(make_enc: Callable[[], tiktoken.Encoding], data)

Source from the content-addressed store, hash-verified

29@hypothesis.given(data=st.data())
30@hypothesis.settings(deadline=None, max_examples=MAX_EXAMPLES)
31def test_hyp_offsets(make_enc: Callable[[], tiktoken.Encoding], data):
32 enc = make_enc()
33
34 tokens_st = st.lists(
35 st.integers(0, enc.n_vocab - 1).filter(
36 lambda x: x in enc._special_tokens.values() or x in enc._mergeable_ranks.values()
37 ),
38 min_size=1,
39 max_size=20,
40 )
41 tokens = data.draw(tokens_st)
42
43 # This is a dumb hack to make sure that our tokens are a valid UTF-8 string
44 # We could potentially drop this, see the TODO in decode_with_offsets
45 tokens = enc.encode(enc.decode(tokens, errors="ignore"), allowed_special="all")
46 assert enc.decode_with_offsets(tokens)[1] == _token_offsets_reference(enc, tokens)
47
48
49def test_basic_offsets():

Callers

nothing calls this directly

Calls 4

_token_offsets_referenceFunction · 0.85
decode_with_offsetsMethod · 0.80
encodeMethod · 0.45
decodeMethod · 0.45

Tested by

no test coverage detected

Used in the wild real call sites across dependent graphs

searching dependent graphs…