Stopper

class liesel.goose.Stopper(max_iter, patience, atol=0.001, rtol=0.0)

Bases: object

Handles (early) stopping for optim_flat().

Parameters:
  • max_iter (int) – The maximum number of optimization steps.

  • patience (int) – Length of the recent loss window considered for early stopping. Early stopping is checked only after more than patience optimization steps have been evaluated. In other words, because i is zero-based, stop_early() first returns True no earlier than i == patience + 1.

  • atol (float, default: 0.001) – The non-negative absolute tolerance for early stopping.

  • rtol (float, default: 0.0) – The non-negative relative tolerance for early stopping. The default of 0.0 means that no early stopping happens based on the relative tolerance.
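The timing implied by patience can be illustrated with a small sketch. The helper stop_early_sketch below is hypothetical and written in plain Python. It is not Liesel's implementation, and it only assumes the guard stated in the parameter description (no early stop while i <= patience) together with a tolerance-free comparison.

```python
def stop_early_sketch(patience, i, loss_history):
    """Hypothetical helper illustrating the documented timing, not Liesel's code."""
    # Guard taken from the parameter description: no early stop while i <= patience.
    if i <= patience:
        return False
    # Window of the most recent `patience` losses ending at iteration i.
    recent = loss_history[: i + 1][-patience:]
    # Tolerance-free rule: stop if the oldest loss in the window is also the best.
    return recent[0] <= min(recent)


patience = 3
flat_losses = [1.0] * 8  # a loss that never improves
decisions = [
    stop_early_sketch(patience, i, flat_losses) for i in range(len(flat_losses))
]
first_stop = decisions.index(True)
print(first_stop)  # 4, i.e. patience + 1
```

Even though the loss never improves, the sketch does not report a stop before i == patience + 1.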

Notes

Early stopping is based on the window of the most recent patience loss values ending at the current zero-based iteration i. Without tolerances, early stopping happens when the oldest loss value in this window is also the best loss value in this window. This is a rolling-window rule, not a best-so-far rule that counts the number of iterations since the global best loss. It can therefore continue while the recent window still contains newer improvements, even if the global best loss was observed before the current window. A simplified pseudo-implementation is:

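import numpy as np


def stop(patience, i, loss_history):
    # Losses observed up to and including the current iteration i.
    current_history = loss_history[: i + 1]
    # Rolling window of the most recent `patience` losses.
    recent_history = current_history[-patience:]
    oldest_within_patience = recent_history[0]
    best_within_patience = np.min(recent_history)

    # Stop when the oldest loss in the window is also the best one.
    return oldest_within_patience <= best_within_patience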
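The difference between the rolling-window rule and a best-so-far rule can be made concrete with a short comparison. Both functions below are illustrative sketches in plain Python. best_so_far_stop in particular is a hypothetical alternative shown only for contrast and is not part of Stopper.

```python
def rolling_window_stop(patience, i, loss_history):
    """Rolling-window rule without tolerances, as described in the notes."""
    recent = loss_history[: i + 1][-patience:]
    return recent[0] <= min(recent)


def best_so_far_stop(patience, i, loss_history):
    """Hypothetical best-so-far rule: count iterations since the global best."""
    current = loss_history[: i + 1]
    best_index = current.index(min(current))
    return i - best_index >= patience


# The global best (1.0) occurs at index 1, but the loss keeps improving afterwards.
losses = [5.0, 1.0, 4.0, 3.5, 3.0, 2.5, 2.0]
patience, i = 3, 6

print(rolling_window_stop(patience, i, losses))  # False: window is [3.0, 2.5, 2.0]
print(best_so_far_stop(patience, i, losses))     # True: 5 iterations since the best
```

The rolling-window rule continues because the newest value in the window is better than the oldest, even though the global best lies outside the window.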

Absolute and relative tolerances make it possible to stop even when the oldest loss within patience is not the best. In that case, the algorithm stops when the absolute or relative difference between the oldest loss within patience and the best loss within patience is small enough to be negligible. If either of the two conditions is met, early stopping happens. The relative difference is calculated with respect to the best loss within patience. A simplified pseudo-implementation is:

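import numpy as np


def stop(patience, i, loss_history, atol, rtol):
    # Rolling window of the most recent `patience` losses ending at iteration i.
    current_history = loss_history[: i + 1]
    recent_history = current_history[-patience:]
    oldest_within_patience = recent_history[0]
    best_within_patience = np.min(recent_history)

    # Non-negative improvement from the oldest to the best loss in the window.
    diff = oldest_within_patience - best_within_patience
    # Relative improvement with respect to the best loss in the window.
    rel_diff = diff / np.abs(best_within_patience)

    abs_improvement_is_neglectable = diff <= atol
    rel_improvement_is_neglectable = rel_diff <= rtol

    # Either condition alone is sufficient for early stopping.
    return abs_improvement_is_neglectable | rel_improvement_is_neglectable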
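A worked example may help to see how the two tolerances behave at different loss scales. The function tolerance_checks is a hypothetical helper in plain Python that returns both conditions separately. It assumes that the best loss in the window is not exactly zero.

```python
def tolerance_checks(window, atol, rtol):
    """Return (absolute condition, relative condition) for one loss window."""
    oldest, best = window[0], min(window)
    diff = oldest - best
    rel_diff = diff / abs(best)  # assumes best != 0
    return diff <= atol, rel_diff <= rtol


small = [10.0005, 10.0002, 10.0]   # improvement of about 0.0005
large = [1000.5, 1000.2, 1000.0]   # improvement of 0.5, relative 0.0005

print(tolerance_checks(small, atol=0.001, rtol=0.0))    # (True, False)
print(tolerance_checks(large, atol=0.001, rtol=0.0))    # (False, False)
print(tolerance_checks(large, atol=0.001, rtol=0.001))  # (False, True)
```

With the default rtol of 0.0, only the absolute condition can trigger a stop while the window still improves. On the larger loss scale, the same relative improvement stops the optimization only once rtol is set to a positive value.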

Methods

continue_(i, loss_history)

Whether optimization should continue (inverse of stop_now()).

stop_early(i, loss_history)

Whether the early stopping criterion described in the notes is met.

stop_now(i, loss_history)

Whether optimization should stop now.

which_best_in_recent_history(i, loss_history)

Identifies the index of the best observation in the recent loss window.
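How these methods could interact in an optimization loop is sketched below. SketchStopper is a hypothetical stand-in written in plain Python, and all of its method bodies are assumptions rather than Liesel's code. In particular, the sketch assumes that stop_now() combines early stopping with a check against max_iter, and that the relative check is skipped when the best loss is exactly zero. Only the relationship between continue_() and stop_now() is taken from the documentation.

```python
class SketchStopper:
    """Hypothetical stand-in for Stopper; all method bodies are assumptions."""

    def __init__(self, max_iter, patience, atol=0.001, rtol=0.0):
        self.max_iter = max_iter
        self.patience = patience
        self.atol = atol
        self.rtol = rtol

    def stop_early(self, i, loss_history):
        # No early stop while i <= patience.
        if i <= self.patience:
            return False
        recent = loss_history[: i + 1][-self.patience:]
        oldest, best = recent[0], min(recent)
        diff = oldest - best
        # Assumption: skip the relative check when the best loss is exactly 0.
        rel_ok = best != 0 and diff / abs(best) <= self.rtol
        return diff <= self.atol or rel_ok

    def stop_now(self, i, loss_history):
        # Assumption: stop on early stopping or once max_iter is reached.
        return self.stop_early(i, loss_history) or i >= self.max_iter

    def continue_(self, i, loss_history):
        # Documented relationship: the inverse of stop_now().
        return not self.stop_now(i, loss_history)


stopper = SketchStopper(max_iter=100, patience=3)
loss_history = []
i = 0
while True:
    loss_history.append(1.0 / (i + 1) ** 2)  # toy loss that flattens out
    if not stopper.continue_(i, loss_history):
        break
    i += 1

print(i)  # 16
```

In this toy run, the improvement within the window first drops below atol at i == 16, well before max_iter is reached.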

Attributes