
Function train

spacy/training/loop.py:35–150

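Train a pipeline.

nlp (Language): The initialized nlp object with the full config.
output_path (Optional[Path]): Optional output path to save trained model to.
use_gpu (int): Whether to train on GPU. Make sure to call require_gpu before calling this function.
stdout (file): A file-like object to write output messages. To disable printing, set to io.StringIO.
stderr (file): A second file-like object to write output messages. To disable printing, set to io.StringIO.

RETURNS (tuple): The final nlp object and the path to the exported model.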

(
    nlp: "Language",
    output_path: Optional[Path] = None,
    *,
    use_gpu: int = -1,
    stdout: IO = sys.stdout,
    stderr: IO = sys.stderr,
)
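
The stdout and stderr parameters accept any file-like object, and the docstring suggests io.StringIO as a way to silence output. A minimal sketch of that pattern follows. Note that report_progress is a hypothetical stand-in for any function that writes to an injected stream; it is not part of spaCy, and the message format is made up for illustration.

```python
import io
import sys
from typing import IO


def report_progress(step: int, *, stdout: IO = sys.stdout) -> None:
    """Hypothetical stand-in: write one status line to the injected stream."""
    stdout.write(f"step {step} done\n")


# Pass a StringIO instance (not the class itself) to keep messages
# off the console.
buffer = io.StringIO()
for step in range(3):
    report_progress(step, stdout=buffer)

# The captured text remains available if it is needed later.
captured = buffer.getvalue()
```

Injecting the stream as a keyword-only argument, as the signature above does, lets callers redirect or discard output without touching the global sys.stdout.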

Source from the content-addressed store, hash-verified

def train(
    nlp: "Language",
    output_path: Optional[Path] = None,
    *,
    use_gpu: int = -1,
    stdout: IO = sys.stdout,
    stderr: IO = sys.stderr,
) -> Tuple["Language", Optional[Path]]:
    """Train a pipeline.

    nlp (Language): The initialized nlp object with the full config.
    output_path (Optional[Path]): Optional output path to save trained model to.
    use_gpu (int): Whether to train on GPU. Make sure to call require_gpu
        before calling this function.
    stdout (file): A file-like object to write output messages. To disable
        printing, set to io.StringIO.
    stderr (file): A second file-like object to write output messages. To disable
        printing, set to io.StringIO.

    RETURNS (tuple): The final nlp object and the path to the exported model.
    """
    # We use no_print here so we can respect the stdout/stderr options.
    msg = Printer(no_print=True)
    # Create iterator, which yields out info after each optimization step.
    config = nlp.config.interpolate()
    if config["training"]["seed"] is not None:
        fix_random_seed(config["training"]["seed"])
    allocator = config["training"]["gpu_allocator"]
    if use_gpu >= 0 and allocator:
        set_gpu_allocator(allocator)
    T = registry.resolve(config["training"], schema=ConfigSchemaTraining)  # type: ignore[arg-type]
    dot_names = [T["train_corpus"], T["dev_corpus"]]
    train_corpus, dev_corpus = resolve_dot_names(config, dot_names)
    optimizer = T["optimizer"]
    score_weights = T["score_weights"]
    batcher = T["batcher"]
    train_logger = T["logger"]
    before_to_disk = create_before_to_disk_callback(T["before_to_disk"])
    before_update = T["before_update"]

    # Helper function to save checkpoints. This is a closure for convenience,
    # to avoid passing in all the args all the time.
    def save_checkpoint(is_best):
        with nlp.use_params(optimizer.averages):
            before_to_disk(nlp).to_disk(output_path / DIR_MODEL_LAST)
        if is_best:
            # Avoid saving twice (saving will be more expensive than
            # the dir copy)
            if (output_path / DIR_MODEL_BEST).exists():
                shutil.rmtree(output_path / DIR_MODEL_BEST)
            shutil.copytree(output_path / DIR_MODEL_LAST, output_path / DIR_MODEL_BEST)

    # Components that shouldn't be updated during training
    frozen_components = T["frozen_components"]
    # Components that should set annotations on update
    annotating_components = T["annotating_components"]
    # Create iterator, which yields out info after each optimization step.
    training_step_iterator = train_while_improving(
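
The save_checkpoint closure in the listing always writes the latest model and, when the step is the best so far, replaces the "best" directory with a copy of it rather than serializing twice. The sketch below reproduces only that directory pattern with the standard library. The directory names, make_checkpointer, and the write_model callable are assumptions made for illustration; they stand in for the pipeline serialization and do not reflect spaCy's own helpers.

```python
import shutil
import tempfile
from pathlib import Path

# Directory names are assumptions for this sketch.
DIR_LAST = "model-last"
DIR_BEST = "model-best"


def make_checkpointer(output_path: Path, write_model):
    """Build a closure that saves to 'last' and optionally mirrors to 'best'.

    write_model is a hypothetical callable that serializes into a directory.
    """

    def save_checkpoint(is_best: bool) -> None:
        last_dir = output_path / DIR_LAST
        write_model(last_dir)
        if is_best:
            best_dir = output_path / DIR_BEST
            # copytree refuses to overwrite an existing directory,
            # so the old copy is removed first.
            if best_dir.exists():
                shutil.rmtree(best_dir)
            shutil.copytree(last_dir, best_dir)

    return save_checkpoint


state = {"step": 1}


def write_model(target: Path) -> None:
    """Dummy serializer: record the current step in a text file."""
    target.mkdir(parents=True, exist_ok=True)
    (target / "weights.txt").write_text(f"step={state['step']}")


with tempfile.TemporaryDirectory() as tmp:
    out = Path(tmp)
    save_checkpoint = make_checkpointer(out, write_model)
    save_checkpoint(is_best=True)   # step 1 is the best so far
    state["step"] = 2
    save_checkpoint(is_best=False)  # step 2 only refreshes the last copy
    last_text = (out / DIR_LAST / "weights.txt").read_text()
    best_text = (out / DIR_BEST / "weights.txt").read_text()
```

After both calls the "last" directory holds step 2 while "best" still holds step 1, which is the behavior the closure's comment describes: one save, plus a cheaper directory copy when the score improves.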

Callers 2

test_pretraining_training · Function · 0.90

test_annotating_components_from_config · Function · 0.90

Calls 11

resolve_dot_names · Function · 0.85

create_before_to_disk_callback · Function · 0.85

train_while_improving · Function · 0.85

create_train_batches · Function · 0.85

create_evaluation_callback · Function · 0.85

clean_output_dir · Function · 0.85

update_meta · Function · 0.85

save_checkpoint · Function · 0.85

log_step · Function · 0.85

select_pipes · Method · 0.80

use_params · Method · 0.80
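
Of the calls listed above, resolve_dot_names is the one that turns the train_corpus and dev_corpus settings into objects taken from elsewhere in the config. The sketch below shows only the dotted-path lookup idea on a plain nested dict. lookup_dot_name and lookup_dot_names are hypothetical helpers, the config values are made up, and the real spaCy function does more than this (it works on the resolved config rather than raw dictionaries).

```python
from typing import Any, Dict, List


def lookup_dot_name(config: Dict[str, Any], dot_name: str) -> Any:
    """Walk a nested dict along a dotted path such as 'corpora.train'."""
    node: Any = config
    for key in dot_name.split("."):
        if not isinstance(node, dict) or key not in node:
            raise KeyError(f"cannot resolve {dot_name!r} at {key!r}")
        node = node[key]
    return node


def lookup_dot_names(config: Dict[str, Any], dot_names: List[str]) -> List[Any]:
    """Resolve several dotted paths against the same config, in order."""
    return [lookup_dot_name(config, name) for name in dot_names]


# Illustrative config: the training block points at corpora by dotted name.
config = {
    "corpora": {
        "train": {"path": "train.spacy"},
        "dev": {"path": "dev.spacy"},
    },
    "training": {"train_corpus": "corpora.train", "dev_corpus": "corpora.dev"},
}

T = config["training"]
dot_names = [T["train_corpus"], T["dev_corpus"]]
train_corpus, dev_corpus = lookup_dot_names(config, dot_names)
```

Keeping the corpora in their own block and referring to them by name means the training block can be pointed at a different corpus by changing a single string.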

Tested by 2

test_pretraining_training · Function · 0.72

test_annotating_components_from_config · Function · 0.72
