hub / github.com/appvision-ai/fast-bert / lr_find

Method lr_find

fast_bert/learner_cls.py:798–893

Performs the learning rate range test. Arguments: start_lr (float, optional): the starting learning rate for the range test. Default: None (uses the learning rate from the optimizer). end_lr (float, optional): the maximum learning rate to test. Default: 10.

(
        self,
        start_lr,
        end_lr=10,
        use_val_loss=True,
        optimizer_type="lamb",
        num_iter=100,
        step_mode="exp",
        smooth_f=0.05,
        diverge_th=5,
    )
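The `step_mode`, `end_lr`, and `num_iter` parameters describe a ramp from the starting learning rate up to `end_lr`. The sketch below shows one common way such a ramp is computed, geometric for "exp" and arithmetic for "linear". The helper names `exp_lr`, `linear_lr`, and `lr_schedule` are hypothetical and are not part of fast-bert; the exact step indexing used by the library's `ExponentialLR` and `LinearLR` classes is not shown on this page and is assumed here.

```python
def exp_lr(start_lr: float, end_lr: float, num_iter: int, step: int) -> float:
    """Learning rate at `step` (1..num_iter) on a geometric ramp (assumed form)."""
    r = step / num_iter
    return start_lr * (end_lr / start_lr) ** r


def linear_lr(start_lr: float, end_lr: float, num_iter: int, step: int) -> float:
    """Learning rate at `step` (1..num_iter) on an arithmetic ramp (assumed form)."""
    r = step / num_iter
    return start_lr + r * (end_lr - start_lr)


def lr_schedule(start_lr, end_lr=10, num_iter=100, step_mode="exp"):
    """Return the list of learning rates visited during the range test."""
    mode = step_mode.lower()
    if mode == "exp":
        fn = exp_lr
    elif mode == "linear":
        fn = linear_lr
    else:
        raise ValueError(f"expected 'exp' or 'linear', got {step_mode!r}")
    return [fn(start_lr, end_lr, num_iter, s) for s in range(1, num_iter + 1)]


if __name__ == "__main__":
    rates = lr_schedule(1e-5, end_lr=10, num_iter=100, step_mode="exp")
    print(rates[0], rates[-1])
```

With "exp", each step multiplies the rate by a constant factor, so small learning rates are sampled as densely as large ones on a log axis; with "linear", most steps land near `end_lr`.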

Source from the content-addressed store, hash-verified

796         self.model.to(self.device)
797
798     def lr_find(
799         self,
800         start_lr,
801         end_lr=10,
802         use_val_loss=True,
803         optimizer_type="lamb",
804         num_iter=100,
805         step_mode="exp",
806         smooth_f=0.05,
807         diverge_th=5,
808     ):
809         """Performs the learning rate range test.
810         Arguments:
811             start_lr (float, optional): the starting learning rate for the range test.
812                 Default: None (uses the learning rate from the optimizer).
813             end_lr (float, optional): the maximum learning rate to test. Default: 10.
814             num_iter (int, optional): the number of iterations over which the test
815                 occurs. Default: 100.
816             step_mode (str, optional): one of the available learning rate policies,
817                 linear or exponential ("linear", "exp"). Default: "exp".
818             smooth_f (float, optional): the loss smoothing factor within the [0, 1[
819                 interval. Disabled if set to 0, otherwise the loss is smoothed using
820                 exponential smoothing. Default: 0.05.
821             diverge_th (int, optional): the test is stopped when the loss surpasses the
822                 threshold: diverge_th * best_loss. Default: 5.
823         Reference:
824             [Training Neural Nets on Larger Batches: Practical Tips for 1-GPU, Multi-GPU & Distributed setups](
825             https://medium.com/huggingface/ec88c3e51255)
826             [thomwolf/gradient_accumulation](https://gist.github.com/thomwolf/ac7a7da6b1888c2eeac8ac8b9b05d3d3)
827         """
828
829         # Reset test results
830         self.history = {"lr": [], "loss": []}
831         self.best_loss = None
832         self.state_cacher = StateCacher(True, cache_dir=self.output_dir)
833
834         self.optimizer = self.get_optimizer(lr=start_lr, optimizer_type=optimizer_type)
835
836         if hasattr(self.model, "module"):
837             self.model = self.model.module
838
839         self.state_cacher.store("model", self.model.state_dict())
840         self.state_cacher.store("optimizer", self.optimizer.state_dict())
841
842         # Parallelize the model architecture
843         if self.multi_gpu is True:
844             self.model = torch.nn.DataParallel(self.model)
845
846         # Check if the optimizer is already attached to a scheduler
847         self._check_for_scheduler()
848
849         # Set the starting learning rate
850         if start_lr:
851             self._set_learning_rate(start_lr)
852
853         # Initialize the proper learning rate policy
854         if step_mode.lower() == "exp":
855             lr_schedule = ExponentialLR(self.optimizer, end_lr, num_iter)
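The listing stops before the training loop, but the docstring describes two pieces of loop logic: exponential smoothing of the loss controlled by `smooth_f`, and early stopping once the loss exceeds `diverge_th * best_loss`. A minimal sketch of that logic on a plain list of losses follows. The function name `range_test` is hypothetical, and the weighting convention (the new loss weighted by `smooth_f`, the previous smoothed value by `1 - smooth_f`) is an assumption, since the loop body is not shown here.

```python
def range_test(raw_losses, smooth_f=0.05, diverge_th=5):
    """Smooth a sequence of losses and stop early on divergence.

    Returns (history, best_loss), where history holds the smoothed losses
    recorded up to and including the step at which the test stopped.
    """
    if not 0 <= smooth_f < 1:
        raise ValueError("smooth_f must lie in the interval [0, 1)")

    history = []
    best_loss = None
    for loss in raw_losses:
        # Exponential smoothing against the previous recorded value
        # (assumed convention; disabled when smooth_f is 0).
        if smooth_f > 0 and history:
            loss = smooth_f * loss + (1 - smooth_f) * history[-1]
        history.append(loss)

        # Track the lowest smoothed loss seen so far.
        if best_loss is None or loss < best_loss:
            best_loss = loss

        # Stop once the loss surpasses diverge_th * best_loss.
        if loss > diverge_th * best_loss:
            break
    return history, best_loss


if __name__ == "__main__":
    print(range_test([1.0, 0.5, 10.0, 0.1], smooth_f=0, diverge_th=5))
```

In the example call, the third loss exceeds five times the best loss seen so far, so the fourth value is never recorded.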

Callers

nothing calls this directly

Calls 14

_check_for_scheduler · Method · 0.95
_set_learning_rate · Method · 0.95
_train_batch · Method · 0.95
validate · Method · 0.95
get_lr · Method · 0.95
reset · Method · 0.95
plot · Method · 0.95
get_optimizer · Method · 0.80
StateCacher · Class · 0.70
ExponentialLR · Class · 0.70
LinearLR · Class · 0.70
TrainDataLoaderIter · Class · 0.70

Tested by

no test coverage detected