Function tokenizer_image_token

llava/mm_utils.py:185–204

(prompt, tokenizer, image_token_index=IMAGE_TOKEN_INDEX, return_tensors=None)
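
The `prompt` argument is plain text in which each literal `<image>` marks where an image belongs. The function's first step is to split on that string, so a prompt with n placeholders always yields n + 1 text chunks, some of which may be empty, and one `image_token_index` later goes between each pair of neighbouring chunks. A minimal illustration of just that splitting step follows. The helper `placeholder_layout` and its `placeholder` parameter are hypothetical and exist only for this example; the real function hard-codes `'<image>'`.

```python
from typing import List, Tuple


def placeholder_layout(prompt: str, placeholder: str = "<image>") -> Tuple[int, List[str]]:
    """Hypothetical helper: count placeholders and return the text chunks around them."""
    chunks = prompt.split(placeholder)  # n placeholders -> n + 1 chunks (possibly empty)
    return len(chunks) - 1, chunks


print(placeholder_layout("<image>\nWhat is shown?"))        # (1, ['', '\nWhat is shown?'])
print(placeholder_layout("Compare <image> with <image>"))   # (2, ['Compare ', ' with ', ''])
print(placeholder_layout("No image here"))                  # (0, ['No image here'])
```

An empty leading or trailing chunk is still passed to the tokenizer, which is why the BOS handling in the function has to cope with chunks that contain nothing but a BOS id.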


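```python
import torch

from llava.constants import IMAGE_TOKEN_INDEX


def tokenizer_image_token(prompt, tokenizer, image_token_index=IMAGE_TOKEN_INDEX, return_tensors=None):
    # Tokenize the text between <image> placeholders separately; with a
    # BOS-adding tokenizer, every chunk starts with its own BOS id.
    prompt_chunks = [tokenizer(chunk).input_ids for chunk in prompt.split('<image>')]

    def insert_separator(X, sep):
        # Interleave sep between consecutive elements: [a, b, c] -> [a, sep, b, sep, c].
        return [ele for sublist in zip(X, [sep]*len(X)) for ele in sublist][:-1]

    input_ids = []
    offset = 0
    if len(prompt_chunks) > 0 and len(prompt_chunks[0]) > 0 and prompt_chunks[0][0] == tokenizer.bos_token_id:
        # Emit a single leading BOS; the per-chunk copies are stripped below.
        offset = 1
        input_ids.append(prompt_chunks[0][0])

    # The separator is padded to offset + 1 copies so that x[offset:] trims the
    # same amount from every item: when offset is 1 it drops each chunk's BOS
    # (assuming the tokenizer adds BOS on every call) and leaves exactly one
    # image_token_index per separator; when offset is 0 nothing is trimmed.
    for x in insert_separator(prompt_chunks, [image_token_index] * (offset + 1)):
        input_ids.extend(x[offset:])

    if return_tensors is not None:
        if return_tensors == 'pt':
            return torch.tensor(input_ids, dtype=torch.long)
        raise ValueError(f'Unsupported tensor type: {return_tensors}')
    return input_ids
```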
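
To see the BOS handling end to end without PyTorch or a real checkpoint, here is a minimal sketch under stated assumptions. `Encoding` and `ToyTokenizer` are hypothetical stand-ins for a Hugging Face tokenizer and its output (whitespace splitting, a fixed vocabulary, optional BOS on every call); `IMAGE_TOKEN_INDEX` is assumed to be -200 as in `llava.constants` upstream, though any id outside the vocabulary would do; and the `return_tensors` branch is dropped so the copy returns a plain list.

```python
from dataclasses import dataclass
from typing import Dict, List

# Assumption: same sentinel as llava.constants.IMAGE_TOKEN_INDEX; any out-of-vocab id works.
IMAGE_TOKEN_INDEX = -200


@dataclass
class Encoding:
    """Hypothetical stand-in for a tokenizer output exposing .input_ids."""
    input_ids: List[int]


class ToyTokenizer:
    """Hypothetical whitespace tokenizer that optionally prepends BOS on every call."""

    def __init__(self, vocab: Dict[str, int], add_bos: bool = True,
                 bos_token_id: int = 1, unk_token_id: int = 0):
        self.vocab = vocab
        self.add_bos = add_bos
        self.bos_token_id = bos_token_id
        self.unk_token_id = unk_token_id

    def __call__(self, text: str) -> Encoding:
        ids = [self.vocab.get(word, self.unk_token_id) for word in text.split()]
        return Encoding(([self.bos_token_id] if self.add_bos else []) + ids)


def tokenizer_image_token(prompt, tokenizer, image_token_index=IMAGE_TOKEN_INDEX):
    # Same logic as the source listing, minus the return_tensors branch.
    prompt_chunks = [tokenizer(chunk).input_ids for chunk in prompt.split("<image>")]

    def insert_separator(X, sep):
        return [ele for sublist in zip(X, [sep] * len(X)) for ele in sublist][:-1]

    input_ids = []
    offset = 0
    if prompt_chunks and prompt_chunks[0] and prompt_chunks[0][0] == tokenizer.bos_token_id:
        offset = 1
        input_ids.append(prompt_chunks[0][0])

    for x in insert_separator(prompt_chunks, [image_token_index] * (offset + 1)):
        input_ids.extend(x[offset:])
    return input_ids


vocab = {"describe": 10, "this": 11, "a": 12, "b": 13}
with_bos = ToyTokenizer(vocab, add_bos=True)
without_bos = ToyTokenizer(vocab, add_bos=False)

print(tokenizer_image_token("<image> describe this", with_bos))     # [1, -200, 10, 11]
print(tokenizer_image_token("<image> describe this", without_bos))  # [-200, 10, 11]
print(tokenizer_image_token("a <image> b <image>", with_bos))       # [1, 12, -200, 13, -200]
```

With the BOS-adding tokenizer each of the chunks arrives with its own BOS, yet the output contains one BOS in total and exactly one image id per placeholder. With the tokenizer that adds no BOS, `offset` stays 0, the separator is a single image id, and no ids are trimmed.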

Related symbols (score):
generate_stream (method) · 0.90
main (function) · 0.90
__getitem__ (method) · 0.90
eval_model (function) · 0.90
preprocess_llama_2 (function) · 0.90
preprocess_v1 (function) · 0.90
preprocess_mpt (function) · 0.90
preprocess_plain (function) · 0.90
get_tokenize_len (function) · 0.90
insert_separator (function) · 0.85

No test coverage detected.