hub / github.com/facebookresearch/MetaCLIP / encode

Method encode

src/mini_clip/tokenizer.py:137–143
(self, text)

Source from the content-addressed store, hash-verified

135          return word
136
137      def encode(self, text):
138          bpe_tokens = []
139          text = whitespace_clean(basic_clean(text)).lower()
140          for token in re.findall(self.pat, text):
141              token = ''.join(self.byte_encoder[b] for b in token.encode('utf-8'))
142              bpe_tokens.extend(self.encoder[bpe_token] for bpe_token in self.bpe(token).split(' '))
143          return bpe_tokens
144
145      def decode(self, tokens):
146          text = ''.join([self.decoder[token] for token in tokens])
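
The listing above chains four steps: clean and lowercase the text, split it with a regex, map each piece's UTF-8 bytes to printable characters, then run BPE merges and look up vocabulary ids. The following is a minimal, self-contained sketch of that flow, not the repository's implementation. Everything in it is an assumption made for illustration: the helper bodies, the simplified `PAT` pattern (written for the standard `re` module), the toy `BYTE_ENCODER` mapping, and the two-entry `MERGES` table are all hypothetical stand-ins.

```python
import html
import re

# Hypothetical stand-ins for the helpers that encode() calls.
def basic_clean(text):
    return html.unescape(text).strip()

def whitespace_clean(text):
    return re.sub(r"\s+", " ", text).strip()

# Simplified split pattern (assumption): letter runs, single digits,
# or runs of any other non-space characters.
PAT = re.compile(r"[a-z]+|[0-9]|[^\sa-z0-9]+")

# Toy byte-to-character table (assumption): printable ASCII maps to itself,
# every other byte is shifted past 255 so each byte gets a unique symbol.
BYTE_ENCODER = {b: chr(b) if 33 <= b <= 126 else chr(256 + b) for b in range(256)}

# Toy merge table (assumption): earlier entries have higher priority.
MERGES = [("l", "o"), ("lo", "w</w>")]
RANKS = {pair: rank for rank, pair in enumerate(MERGES)}

# Vocabulary: single symbols, their end-of-word variants, then merged symbols.
_symbols = [BYTE_ENCODER[b] for b in range(256)]
_vocab = _symbols + [s + "</w>" for s in _symbols] + ["".join(m) for m in MERGES]
ENCODER = {symbol: idx for idx, symbol in enumerate(_vocab)}

def bpe(token):
    """Greedily apply the best-ranked merge until none applies."""
    word = list(token[:-1]) + [token[-1] + "</w>"]
    while len(word) > 1:
        pairs = [(word[i], word[i + 1]) for i in range(len(word) - 1)]
        best = min(pairs, key=lambda p: RANKS.get(p, float("inf")))
        if best not in RANKS:
            break
        merged, i = [], 0
        while i < len(word):
            if i < len(word) - 1 and (word[i], word[i + 1]) == best:
                merged.append(word[i] + word[i + 1])
                i += 2
            else:
                merged.append(word[i])
                i += 1
        word = merged
    return " ".join(word)

def encode(text):
    """Mirror the control flow of the encode method shown above."""
    bpe_tokens = []
    text = whitespace_clean(basic_clean(text)).lower()
    for token in PAT.findall(text):
        token = "".join(BYTE_ENCODER[b] for b in token.encode("utf-8"))
        bpe_tokens.extend(ENCODER[piece] for piece in bpe(token).split(" "))
    return bpe_tokens

print(encode("  LOW   low "))  # [513, 513]
print(encode("lot"))           # [512, 372]
```

With this toy table, "low" collapses into the single merged symbol `low</w>`, while "lot" stops after the first merge and yields two ids. A real vocabulary would load its merge list from a file and would also reserve ids for start-of-text and end-of-text markers, which this sketch leaves out.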

Callers 5

wiki_bigrams · Function · 0.80
tokenize · Function · 0.80
__iter__ · Method · 0.80
__getitem__ · Method · 0.80
__iter__ · Method · 0.80

Calls 3

bpe · Method · 0.95
whitespace_clean · Function · 0.85
basic_clean · Function · 0.85

Tested by

no test coverage detected