hub / github.com/huggingface/diffusers / tokenize_fn

Function tokenize_fn

examples/discrete_diffusion/train_llada2.py:105–108
(examples: Dict, tokenizer, text_column: str, max_length: int)

Source from the content-addressed store, hash-verified

def tokenize_fn(examples: Dict, tokenizer, text_column: str, max_length: int):
    texts = examples[text_column]
    texts = [t for t in texts if isinstance(t, str) and len(t.strip()) > 0]
    return tokenizer(texts, truncation=True, padding=False, max_length=max_length)


class RandomTokenDataset(torch.utils.data.Dataset):
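
To make the filtering step concrete, here is a minimal sketch that runs tokenize_fn on a small batch. The WhitespaceTokenizer class is a hypothetical stand-in written only for this illustration; it is not part of diffusers or transformers, and it only mimics the call signature that tokenize_fn relies on. The batch contents and the "text" column name are likewise assumptions.

```python
from typing import Dict, List


def tokenize_fn(examples: Dict, tokenizer, text_column: str, max_length: int):
    texts = examples[text_column]
    # Drop entries that are not strings or that contain only whitespace.
    texts = [t for t in texts if isinstance(t, str) and len(t.strip()) > 0]
    return tokenizer(texts, truncation=True, padding=False, max_length=max_length)


class WhitespaceTokenizer:
    """Hypothetical stand-in for a real tokenizer (illustration only)."""

    def __init__(self) -> None:
        self.vocab: Dict[str, int] = {}

    def __call__(self, texts: List[str], truncation: bool, padding: bool, max_length: int):
        input_ids = []
        for text in texts:
            # Assign each new whitespace-separated token the next free id.
            ids = [self.vocab.setdefault(tok, len(self.vocab)) for tok in text.split()]
            if truncation:
                ids = ids[:max_length]
            input_ids.append(ids)
        return {"input_ids": input_ids}


# Assumed sample batch: two usable strings and three entries that get filtered out.
batch = {"text": ["hello world", "", "   ", None, "a b c d e"]}
encoded = tokenize_fn(batch, WhitespaceTokenizer(), text_column="text", max_length=3)

print(len(batch["text"]), len(encoded["input_ids"]))  # 5 2
print(encoded["input_ids"])  # [[0, 1], [2, 3, 4]]
```

Note that the returned batch can be shorter than the input batch, because empty, whitespace-only, and non-string entries are dropped before tokenization. Any caller that pairs the output with other per-row columns would need to account for that difference in length.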

Callers 1

main · Function · 0.85

Calls

no outgoing calls

Tested by

no test coverage detected
