MCPcopy
hub / github.com/lucidrains/DALLE-pytorch / encode

Method encode

dalle_pytorch/tokenizer.py:119–125  ·  view source on GitHub ↗
(self, text)

Source from the content-addressed store, hash-verified

117 return word
118
119 def encode(self, text):
120 bpe_tokens = []
121 text = whitespace_clean(basic_clean(text)).lower()
122 for token in re.findall(self.pat, text):
123 token = ''.join(self.byte_encoder[b] for b in token.encode('utf-8'))
124 bpe_tokens.extend(self.encoder[bpe_token] for bpe_token in self.bpe(token).split(' '))
125 return bpe_tokens
126
127 def decode(self, tokens, remove_start_end = True, pad_tokens = set()):
128 if torch.is_tensor(tokens):

Callers 1

tokenizeMethod · 0.95

Calls 4

bpeMethod · 0.95
whitespace_cleanFunction · 0.85
basic_cleanFunction · 0.85
encodeMethod · 0.45

Tested by

no test coverage detected