hub / github.com/tensorlayer/TensorLayer / create_distributed_session

Function create_distributed_session

tensorlayer/distributed.py:397–490

Creates a distributed session. It calls `MonitoredTrainingSession` to create a :class:`MonitoredSession` for distributed training.

(
    task_spec=None, checkpoint_dir=None, scaffold=None, hooks=None, chief_only_hooks=None, save_checkpoint_secs=600,
    save_summaries_steps=object(), save_summaries_secs=object(), config=None, stop_grace_period_secs=120,
    log_step_count_steps=100
)
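
The `object()` defaults for `save_summaries_steps` and `save_summaries_secs` act as sentinels: they let the callee tell "argument not supplied" apart from an explicit `None`. The following is a minimal sketch of how such a sentinel could be resolved, assuming the defaults stated in the docstring (100 steps when neither is given, seconds-based saving not enabled). The names `_USE_DEFAULT` and `resolve_summary_schedule` are hypothetical and are not part of TensorLayer or TensorFlow.

```python
# Hypothetical module-level sentinel; not a TensorLayer or TensorFlow name.
_USE_DEFAULT = object()


def resolve_summary_schedule(save_summaries_steps=_USE_DEFAULT,
                             save_summaries_secs=_USE_DEFAULT):
    """Return (steps, secs) after replacing sentinel values.

    Assumed rules, taken from the docstring wording:
    - neither supplied: save every 100 global steps, no time-based saving
    - only one supplied: the other is disabled (None)
    - both supplied: use them unchanged (None, None disables the saver)
    """
    steps_unset = save_summaries_steps is _USE_DEFAULT
    secs_unset = save_summaries_secs is _USE_DEFAULT

    if steps_unset and secs_unset:
        return 100, None
    if steps_unset:
        return None, save_summaries_secs
    if secs_unset:
        return save_summaries_steps, None
    return save_summaries_steps, save_summaries_secs


if __name__ == "__main__":
    print(resolve_summary_schedule())                        # nothing supplied
    print(resolve_summary_schedule(save_summaries_secs=30))  # time-based only
    print(resolve_summary_schedule(None, None))              # saver disabled
```

A plain `None` default could not express all three cases, because `None` is itself a meaningful value here ("do not use the default summary saver").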

Source from the content-addressed store, hash-verified

@deprecated(date="2018-10-30", instructions="Using the TensorLayer distributed trainer.")
def create_distributed_session(
    task_spec=None, checkpoint_dir=None, scaffold=None, hooks=None, chief_only_hooks=None, save_checkpoint_secs=600,
    save_summaries_steps=object(), save_summaries_secs=object(), config=None, stop_grace_period_secs=120,
    log_step_count_steps=100
):
    """Creates a distributed session.

    It calls `MonitoredTrainingSession` to create a :class:`MonitoredSession` for distributed training.

    Parameters
    ----------
    task_spec : :class:`TaskSpecDef`.
        The task spec definition from create_task_spec_def()
    checkpoint_dir : str.
        Optional path to a directory where to restore variables.
    scaffold : ``Scaffold``
        A `Scaffold` used for gathering or building supportive ops.
        If not specified, a default one is created. It's used to finalize the graph.
    hooks : list of ``SessionRunHook`` objects.
        Optional
    chief_only_hooks : list of ``SessionRunHook`` objects.
        Activate these hooks if `is_chief==True`, ignore otherwise.
    save_checkpoint_secs : int
        The frequency, in seconds, that a checkpoint is saved
        using a default checkpoint saver. If `save_checkpoint_secs` is set to
        `None`, then the default checkpoint saver isn't used.
    save_summaries_steps : int
        The frequency, in number of global steps, that the
        summaries are written to disk using a default summary saver. If both
        `save_summaries_steps` and `save_summaries_secs` are set to `None`, then
        the default summary saver isn't used. Default 100.
    save_summaries_secs : int
        The frequency, in secs, that the summaries are written
        to disk using a default summary saver. If both `save_summaries_steps` and
        `save_summaries_secs` are set to `None`, then the default summary saver
        isn't used. Default not enabled.
    config : ``tf.ConfigProto``
        an instance of `tf.ConfigProto` proto used to configure the session.
        It's the `config` argument of constructor of `tf.Session`.
    stop_grace_period_secs : int
        Number of seconds given to threads to stop after
        `close()` has been called.
    log_step_count_steps : int
        The frequency, in number of global steps, that the
        global step/sec is logged.

    Examples
    --------
    A simple example for distributed training where all the workers use the same dataset:

    >>> task_spec = TaskSpec()
    >>> with tf.device(task_spec.device_fn()):
    >>>     tensors = create_graph()
    >>> with tl.DistributedSession(task_spec=task_spec,
    ...                            checkpoint_dir='/tmp/ckpt') as session:
    >>>     while not session.should_stop():
    >>>         session.run(tensors)


Callers

nothing calls this directly

Calls 2

target · Method · 0.80
is_master · Method · 0.80
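
The two calls listed above suggest that the function reads the session target and the chief flag from `task_spec`. The sketch below illustrates one way that derivation could look, together with the `chief_only_hooks` rule from the docstring ("Activate these hooks if `is_chief==True`, ignore otherwise"). It is not the library's implementation: `StubTaskSpec`, `session_role`, and `active_hooks` are hypothetical names, and the fallback for a missing `task_spec` (no target, chief role) is an assumption.

```python
class StubTaskSpec:
    """Hypothetical stand-in exposing only target() and is_master()."""

    def __init__(self, target, master):
        self._target = target
        self._master = master

    def target(self):
        return self._target

    def is_master(self):
        return self._master


def session_role(task_spec=None):
    """Return (master_target, is_chief) for a session.

    Assumption: without a task spec the process runs alone,
    so it has no remote target and acts as chief.
    """
    if task_spec is None:
        return None, True
    return task_spec.target(), task_spec.is_master()


def active_hooks(hooks, chief_only_hooks, is_chief):
    """Combine hooks, adding chief-only hooks only on the chief."""
    active = list(hooks or [])
    if is_chief:
        active.extend(chief_only_hooks or [])
    return active


if __name__ == "__main__":
    # "grpc://worker1:2222" is a made-up address for illustration.
    worker = StubTaskSpec("grpc://worker1:2222", master=False)
    target, is_chief = session_role(worker)
    print(target, is_chief, active_hooks(["log"], ["save"], is_chief))
```

Keeping the role lookup separate from hook selection makes the non-chief case easy to check: a worker that is not the master ends up with only the shared hooks.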

Tested by

no test coverage detected
