Function tokenize_fn

examples/discrete_diffusion/train_llada2.py:105–108

(examples: Dict, tokenizer, text_column: str, max_length: int)

Source from the content-addressed store, hash-verified

def tokenize_fn(examples: Dict, tokenizer, text_column: str, max_length: int):
    texts = examples[text_column]
    texts = [t for t in texts if isinstance(t, str) and len(t.strip()) > 0]
    return tokenizer(texts, truncation=True, padding=False, max_length=max_length)
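The following is a minimal sketch of how the function behaves on a batch that contains blank and non-string entries. `WhitespaceTokenizer` is a hypothetical stand-in written for this illustration, not the tokenizer the training script uses; it only mimics the call shape `tokenizer(texts, truncation=..., padding=..., max_length=...)` seen in the source. The copy of `tokenize_fn` is reproduced so the block runs on its own.

```python
from typing import Dict, List


def tokenize_fn(examples: Dict, tokenizer, text_column: str, max_length: int):
    texts = examples[text_column]
    # Keep only non-empty strings; None and whitespace-only rows are dropped.
    texts = [t for t in texts if isinstance(t, str) and len(t.strip()) > 0]
    return tokenizer(texts, truncation=True, padding=False, max_length=max_length)


class WhitespaceTokenizer:
    """Hypothetical stand-in for a real tokenizer (assumption, not from the source)."""

    def __init__(self) -> None:
        self.vocab: Dict[str, int] = {}

    def __call__(self, texts: List[str], truncation: bool = True,
                 padding: bool = False, max_length: int = 8) -> Dict[str, List[List[int]]]:
        input_ids = []
        for text in texts:
            # Assign each new word the next free integer id.
            ids = [self.vocab.setdefault(w, len(self.vocab)) for w in text.split()]
            if truncation:
                ids = ids[:max_length]
            input_ids.append(ids)
        # padding is accepted but ignored here, matching padding=False in the source.
        return {"input_ids": input_ids,
                "attention_mask": [[1] * len(ids) for ids in input_ids]}


batch = {"text": ["hello world", "   ", None, "a b c d e"]}
encoded = tokenize_fn(batch, WhitespaceTokenizer(), text_column="text", max_length=3)

print(len(batch["text"]), len(encoded["input_ids"]))  # 4 2
print(encoded["input_ids"])  # [[0, 1], [2, 3, 4]]
```

Because of the filter, the output can have fewer rows than the input batch. If the function is applied through a batched dataset map, the caller would likely need to drop the original columns so that row counts stay consistent; whether the training script does this is not shown in this excerpt.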

mainFunction · 0.85

no outgoing calls

no test coverage detected
