hub / github.com/modelscope/FunASR / build_model

Method build_model

funasr/auto/auto_model.py:281–428 · view source on GitHub ↗

Download model from hub, build all components, and load pretrained weights. This method handles the full model construction pipeline: 1. Download model files from ModelScope/HuggingFace (if not local) 2. Parse config.yaml to determine model class, tokenizer, frontend

(**kwargs)

Source from the content-addressed store, hash-verified

279
280	@staticmethod
281	def build_model(**kwargs):
282	"""Download model from hub, build all components, and load pretrained weights.
283
284	This method handles the full model construction pipeline:
285	1. Download model files from ModelScope/HuggingFace (if not local)
286	2. Parse config.yaml to determine model class, tokenizer, frontend
287	3. Instantiate tokenizer, frontend, and model via the registry
288	4. Load pretrained weights from model.pt
289
290	Args:
291	**kwargs: Must include 'model' (str). All other config.yaml fields can be overridden.
292
293	Returns:
294	tuple: (model, kwargs) where model is the instantiated nn.Module and
295	kwargs contains the resolved configuration.
296	"""
297	assert "model" in kwargs
298	if "model_conf" not in kwargs:
299	logging.info("download models from model hub: {}".format(kwargs.get("hub", "ms")))
300	kwargs = download_model(**kwargs)
301
302	set_all_random_seed(kwargs.get("seed", 0))
303
304	device = kwargs.get("device", "cuda")
305	if (
306	(device.startswith("cuda") and not torch.cuda.is_available())
307	or (device.startswith("xpu") and not torch.xpu.is_available())
308	or (device.startswith("mps") and not torch.backends.mps.is_available())
309	or (device.startswith("npu") and not is_npu_available())
310	or kwargs.get("ngpu", 1) == 0
311	):
312	device = "cpu"
313	kwargs["batch_size"] = 1
314	kwargs["device"] = device
315
316	ncpu = _resolve_ncpu(kwargs, 4)
317	kwargs["ncpu"] = ncpu
318	if torch.get_num_threads() != ncpu:
319	torch.set_num_threads(ncpu)
320
321	# build tokenizer
322	tokenizer = kwargs.get("tokenizer", None)
323	kwargs["tokenizer"] = tokenizer
324	kwargs["vocab_size"] = -1
325
326	if tokenizer is not None:
327	tokenizers = (
328	tokenizer.split(",") if isinstance(tokenizer, str) else tokenizer
329	) # type of tokenizers is list!!!
330	tokenizers_conf = kwargs.get("tokenizer_conf", {})
331	tokenizers_build = []
332	vocab_sizes = []
333	token_lists = []
334
335	### === only for kws ===
336	token_list_files = kwargs.get("token_lists", [])
337	seg_dicts = kwargs.get("seg_dicts", [])
338	### === only for kws ===

Callers 5

__init__Method · 0.95

test_unregistered_model_reports_import_failuresMethod · 0.80

from_pretrainedMethod · 0.80

Calls 11

download_modelFunction · 0.90

set_all_random_seedFunction · 0.90

deep_updateFunction · 0.90

load_pretrained_modelFunction · 0.90

is_npu_availableFunction · 0.85

_resolve_ncpuFunction · 0.85

_format_unregistered_component_errorFunction · 0.85

printMethod · 0.80

get_vocab_sizeMethod · 0.45

output_sizeMethod · 0.45

evalMethod · 0.45

Tested by 1

test_unregistered_model_reports_import_failuresMethod · 0.64