hub / github.com/explosion/spaCy / train

Function train

spacy/training/loop.py:35–150

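Train a pipeline.

nlp (Language): The initialized nlp object with the full config.
output_path (Optional[Path]): Optional output path to save trained model to.
use_gpu (int): Whether to train on GPU. Make sure to call require_gpu before calling this function.
stdout (file): A file-like object to write output messages. To disable printing, set to io.StringIO.
stderr (file): A second file-like object to write output messages. To disable printing, set to io.StringIO.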

(
    nlp: "Language",
    output_path: Optional[Path] = None,
    *,
    use_gpu: int = -1,
    stdout: IO = sys.stdout,
    stderr: IO = sys.stderr,
)
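A minimal sketch of calling this function from Python. It assumes spaCy v3 is installed and that `load_config` (in `spacy.util`) and `init_nlp` (in `spacy.training.initialize`) are importable from those paths; neither appears in the excerpt on this page. The wrapper name `train_from_config` and its `quiet` flag are hypothetical.

```python
import io
from pathlib import Path
from typing import Optional


def train_from_config(
    config_path: str, output_dir: Optional[str] = None, quiet: bool = False
):
    """Hypothetical helper: load a config, initialize the pipeline, train it."""
    # Imported lazily so this module can be imported without spaCy installed.
    from spacy import util
    from spacy.training.initialize import init_nlp  # assumed import path
    from spacy.training.loop import train

    config = util.load_config(config_path)
    # train() expects an nlp object that has already been initialized.
    nlp = init_nlp(config)

    # output_path must be a Path (or None), because train() joins it with "/".
    output_path = Path(output_dir) if output_dir is not None else None

    # stdout and stderr are keyword-only; passing StringIO instances
    # captures the progress messages instead of printing them.
    kwargs = {}
    if quiet:
        kwargs = {"stdout": io.StringIO(), "stderr": io.StringIO()}

    # Returns the (nlp, output_path) tuple described in the docstring.
    return train(nlp, output_path, use_gpu=-1, **kwargs)
```

For GPU training, the docstring asks callers to invoke `require_gpu` before `train`; the sketch leaves `use_gpu` at its default of -1.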

Source from the content-addressed store, hash-verified

def train(
    nlp: "Language",
    output_path: Optional[Path] = None,
    *,
    use_gpu: int = -1,
    stdout: IO = sys.stdout,
    stderr: IO = sys.stderr,
) -> Tuple["Language", Optional[Path]]:
    """Train a pipeline.

    nlp (Language): The initialized nlp object with the full config.
    output_path (Optional[Path]): Optional output path to save trained model to.
    use_gpu (int): Whether to train on GPU. Make sure to call require_gpu
        before calling this function.
    stdout (file): A file-like object to write output messages. To disable
        printing, set to io.StringIO.
    stderr (file): A second file-like object to write output messages. To disable
        printing, set to io.StringIO.

    RETURNS (tuple): The final nlp object and the path to the exported model.
    """
    # We use no_print here so we can respect the stdout/stderr options.
    msg = Printer(no_print=True)
    # Create iterator, which yields out info after each optimization step.
    config = nlp.config.interpolate()
    if config["training"]["seed"] is not None:
        fix_random_seed(config["training"]["seed"])
    allocator = config["training"]["gpu_allocator"]
    if use_gpu >= 0 and allocator:
        set_gpu_allocator(allocator)
    T = registry.resolve(config["training"], schema=ConfigSchemaTraining)  # type: ignore[arg-type]
    dot_names = [T["train_corpus"], T["dev_corpus"]]
    train_corpus, dev_corpus = resolve_dot_names(config, dot_names)
    optimizer = T["optimizer"]
    score_weights = T["score_weights"]
    batcher = T["batcher"]
    train_logger = T["logger"]
    before_to_disk = create_before_to_disk_callback(T["before_to_disk"])
    before_update = T["before_update"]

    # Helper function to save checkpoints. This is a closure for convenience,
    # to avoid passing in all the args all the time.
    def save_checkpoint(is_best):
        with nlp.use_params(optimizer.averages):
            before_to_disk(nlp).to_disk(output_path / DIR_MODEL_LAST)
        if is_best:
            # Avoid saving twice (saving will be more expensive than
            # the dir copy)
            if (output_path / DIR_MODEL_BEST).exists():
                shutil.rmtree(output_path / DIR_MODEL_BEST)
            shutil.copytree(output_path / DIR_MODEL_LAST, output_path / DIR_MODEL_BEST)

    # Components that shouldn't be updated during training
    frozen_components = T["frozen_components"]
    # Components that should set annotations on update
    annotating_components = T["annotating_components"]
    # Create iterator, which yields out info after each optimization step.
    training_step_iterator = train_while_improving(
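The `save_checkpoint` closure in the source above serializes the model once to the "last" directory and, when the checkpoint is the best so far, copies that directory rather than serializing a second time. The standard-library sketch below illustrates that pattern on its own. It is not spaCy code: `make_checkpoint_saver`, `write_model`, and `demo` are hypothetical names, and the directory names stand in for `DIR_MODEL_LAST` and `DIR_MODEL_BEST`, whose values are not shown in this excerpt.

```python
import shutil
import tempfile
from pathlib import Path

# Stand-ins for DIR_MODEL_LAST / DIR_MODEL_BEST (values assumed for illustration).
DIR_LAST = "model-last"
DIR_BEST = "model-best"


def make_checkpoint_saver(output_path: Path, write_model):
    """Return a closure that always writes 'last' and copies it to 'best' on demand."""

    def save_checkpoint(is_best: bool) -> None:
        last = output_path / DIR_LAST
        best = output_path / DIR_BEST
        write_model(last)  # the expensive serialization happens only once
        if is_best:
            # copytree refuses to write into an existing directory by default,
            # so the previous best checkpoint is removed first.
            if best.exists():
                shutil.rmtree(best)
            shutil.copytree(last, best)

    return save_checkpoint


def demo() -> list:
    """Save twice: the first checkpoint is the best, the second is not."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        state = {"step": 0}

        def write_model(path: Path) -> None:
            path.mkdir(parents=True, exist_ok=True)
            (path / "weights.txt").write_text(f"step {state['step']}")

        save = make_checkpoint_saver(root, write_model)
        state["step"] = 1
        save(is_best=True)
        state["step"] = 2
        save(is_best=False)
        return [
            (root / DIR_LAST / "weights.txt").read_text(),
            (root / DIR_BEST / "weights.txt").read_text(),
        ]
```

Calling `demo()` leaves the "last" directory holding the step-2 weights while "best" still holds step 1, which is the behavior the closure is designed to give.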

Calls 11

resolve_dot_names · Function · 0.85
train_while_improving · Function · 0.85
create_train_batches · Function · 0.85
clean_output_dir · Function · 0.85
update_meta · Function · 0.85
save_checkpoint · Function · 0.85
log_step · Function · 0.85
select_pipes · Method · 0.80
use_params · Method · 0.80
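Among the calls listed above, `resolve_dot_names` receives the full config plus the strings stored under `train_corpus` and `dev_corpus`. The sketch below is a simplified, hypothetical stand-in that only follows a dotted path such as `corpora.train` through nested dictionaries. The real function is assumed to do more (for example, resolving registered functions), and the config values here are placeholders.

```python
from functools import reduce
from typing import Any, Dict, List


def lookup_dot_name(config: Dict[str, Any], dot_name: str) -> Any:
    """Follow a dotted path such as 'corpora.train' through nested dicts."""
    return reduce(lambda node, key: node[key], dot_name.split("."), config)


def lookup_dot_names(config: Dict[str, Any], dot_names: List[str]) -> List[Any]:
    """Look up several dotted paths, preserving their order."""
    return [lookup_dot_name(config, name) for name in dot_names]


# Placeholder config: the training block points at other sections by name.
config = {
    "corpora": {"train": "TRAIN_READER", "dev": "DEV_READER"},
    "training": {"train_corpus": "corpora.train", "dev_corpus": "corpora.dev"},
}

T = config["training"]
train_corpus, dev_corpus = lookup_dot_names(
    config, [T["train_corpus"], T["dev_corpus"]]
)
```

Storing the corpus location as a name rather than a value lets the training block refer to a section defined elsewhere in the same config.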

Tested by 2
