hub / github.com/jaymody/picoGPT / get_pairs

Function get_pairs

encoder.py:35–44

Returns the set of adjacent symbol pairs in a word. The word is represented as a tuple of symbols, where each symbol is a variable-length string.

(word)

Source from the content-addressed store, hash-verified

def get_pairs(word):
    """Return set of symbol pairs in a word.
    Word is represented as tuple of symbols (symbols being variable-length strings).
    """
    pairs = set()
    prev_char = word[0]
    for char in word[1:]:
        pairs.add((prev_char, char))
        prev_char = char
    return pairs
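
A minimal usage sketch may help show what the function returns. The function body is copied from the source above; the example words are made up for illustration and do not come from the repository.

```python
def get_pairs(word):
    """Return set of symbol pairs in a word.
    Word is represented as tuple of symbols (symbols being variable-length strings).
    """
    pairs = set()
    prev_char = word[0]
    for char in word[1:]:
        pairs.add((prev_char, char))
        prev_char = char
    return pairs


# Illustrative inputs (assumed for this sketch, not taken from the repository).
chars = ("h", "e", "l", "l", "o")
assert get_pairs(chars) == {("h", "e"), ("e", "l"), ("l", "l"), ("l", "o")}

# Symbols may be longer than one character, e.g. after some merges.
merged = ("he", "l", "lo")
assert get_pairs(merged) == {("he", "l"), ("l", "lo")}

# A repeated adjacent pair appears only once, because the result is a set.
assert get_pairs(("a", "b", "a", "b")) == {("a", "b"), ("b", "a")}

# A single-symbol word has no adjacent pairs.
assert get_pairs(("a",)) == set()
```

Because the function reads `word[0]` unconditionally, an empty tuple raises `IndexError`, so callers need to pass at least one symbol.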

Callers 1

bpe · Method · 0.85

Calls

no outgoing calls

Tested by

no test coverage detected