Optimizer

class optimizer.Adadelta(rho=0.9, epsilon=1e-06, *args, **kwargs)[source]

Bases: optimizer.Optimizer

AdaDelta optimization algorithm

Update the parameters according to the rule

c = rho * c + (1. - rho) * gradient * gradient
update = gradient * sqrt(d + epsilon) / (sqrt(c) + epsilon)
parameter -= learning_rate * update
d = rho * d + (1. - rho) * update * update
Parameters
  • rho (float (default=0.9)) – Decay factor

  • epsilon (float (default=1e-6)) – Precision parameter to overcome numerical overflows

  • *args (list) – Class specialization variables.

  • **kwargs (dict) – Class specialization variables.

update(params, gradients)[source]

Update the given parameters according to the class optimization algorithm

Parameters
  • params (list) – List of parameters to update

  • gradients (list) – List of corresponding gradients

Returns

params – The updated parameters

Return type

list
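
A minimal NumPy sketch of the rule above; the function name, the explicit state arguments c and d, and the default values are illustrative only and are not part of the optimizer.Adadelta interface:

import numpy as np

def adadelta_step(parameter, gradient, c, d,
                  learning_rate=1.0, rho=0.9, epsilon=1e-6):
    # running average of the squared gradients
    c = rho * c + (1. - rho) * gradient * gradient
    # scale the gradient by the ratio of the two running averages
    update = gradient * np.sqrt(d + epsilon) / (np.sqrt(c) + epsilon)
    parameter = parameter - learning_rate * update
    # running average of the squared updates, used at the next step
    d = rho * d + (1. - rho) * update * update
    return parameter, c, d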

class optimizer.Adagrad(epsilon=1e-06, *args, **kwargs)[source]

Bases: optimizer.Optimizer

Adagrad optimizer specialization

Update the parameters according to the rule

c += gradient * gradient
parameter -= learning_rate * gradient / (sqrt(c) + epsilon)
Parameters
  • epsilon (float (default=1e-6)) – Precision parameter to overcome numerical overflows

  • *args (list) – Class specialization variables.

  • **kwargs (dict) – Class specialization variables.

update(params, gradients)[source]

Update the given parameters according to the class optimization algorithm

Parameters
  • params (list) – List of parameters to update

  • gradients (list) – List of corresponding gradients

Returns

params – The updated parameters

Return type

list
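
A minimal NumPy sketch of the rule above, with the squared-gradient accumulator c passed explicitly; the function name and defaults are illustrative only:

import numpy as np

def adagrad_step(parameter, gradient, c,
                 learning_rate=1e-2, epsilon=1e-6):
    # accumulate the sum of squared gradients
    c = c + gradient * gradient
    # the effective per-parameter step shrinks as the accumulator grows
    parameter = parameter - learning_rate * gradient / (np.sqrt(c) + epsilon)
    return parameter, c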

class optimizer.Adam(beta1=0.9, beta2=0.999, epsilon=1e-08, *args, **kwargs)[source]

Bases: optimizer.Optimizer

Adam optimization algorithm

Update the parameters according to the rule

at = learning_rate * sqrt(1 - B2**iterations) / (1 - B1**iterations)
m = B1 * m + (1 - B1) * gradient
v = B2 * v + (1 - B2) * gradient * gradient
parameter -= at * m / (sqrt(v) + epsilon)
Parameters
  • beta1 (float (default=0.9)) – B1 factor

  • beta2 (float (default=0.999)) – B2 factor

  • epsilon (float (default=1e-8)) – Precision parameter to overcome numerical overflows

  • *args (list) – Class specialization variables.

  • **kwargs (dict) – Class specialization variables.

update(params, gradients)[source]

Update the given parameters according to the class optimization algorithm

Parameters
  • params (list) – List of parameters to update

  • gradients (list) – List of corresponding gradients

Returns

params – The updated parameters

Return type

list
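
A minimal NumPy sketch of the rule above; the moment buffers m and v and the iteration counter are passed explicitly, and the function name and defaults are illustrative only:

import numpy as np

def adam_step(parameter, gradient, m, v, iterations,
              learning_rate=1e-3, beta1=0.9, beta2=0.999, epsilon=1e-8):
    # bias-corrected step size (iterations is the 1-based update counter)
    at = learning_rate * np.sqrt(1. - beta2**iterations) / (1. - beta1**iterations)
    # running averages of the gradient and of the squared gradient
    m = beta1 * m + (1. - beta1) * gradient
    v = beta2 * v + (1. - beta2) * gradient * gradient
    parameter = parameter - at * m / (np.sqrt(v) + epsilon)
    return parameter, m, v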

class optimizer.Adamax(beta1=0.9, beta2=0.999, epsilon=1e-08, *args, **kwargs)[source]

Bases: optimizer.Optimizer

Adamax optimization algorithm

Update the parameters according to the rule

at = learning_rate / (1 - B1**iterations)
m = B1 * m + (1 - B1) * gradient
v = max(B2 * v, abs(gradient))
parameter -= at * m / (v + epsilon)
Parameters
  • beta1 (float (default=0.9)) – B1 factor

  • beta2 (float (default=0.999)) – B2 factor

  • epsilon (float (default=1e-8)) – Precision parameter to overcome numerical overflows

  • *args (list) – Class specialization variables.

  • **kwargs (dict) – Class specialization variables.

update(params, gradients)[source]

Update the given parameters according to the class optimization algorithm

Parameters
  • params (list) – List of parameters to update

  • gradients (list) – List of corresponding gradients

Returns

params – The updated parameters

Return type

list
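
A minimal NumPy sketch of the rule above, analogous to the Adam sketch but with the infinity-norm second moment; the function name and defaults are illustrative only:

import numpy as np

def adamax_step(parameter, gradient, m, v, iterations,
                learning_rate=2e-3, beta1=0.9, beta2=0.999, epsilon=1e-8):
    # bias-corrected step size (only the first moment needs correction)
    at = learning_rate / (1. - beta1**iterations)
    m = beta1 * m + (1. - beta1) * gradient
    # infinity-norm based second moment (element-wise maximum)
    v = np.maximum(beta2 * v, np.abs(gradient))
    parameter = parameter - at * m / (v + epsilon)
    return parameter, m, v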

class optimizer.Momentum(momentum=0.9, *args, **kwargs)[source]

Bases: optimizer.Optimizer

Stochastic Gradient Descent with Momentum specialization

Update the parameters according to the rule

v = momentum * v - learning_rate * gradient
parameter += v
Parameters
  • momentum (float (default=0.9)) – Momentum value

  • *args (list) – Class specialization variables.

  • **kwargs (dict) – Class specialization variables.

update(params, gradients)[source]

Update the given parameters according to the class optimization algorithm

Parameters
  • params (list) – List of parameters to update

  • gradients (list) – List of corresponding gradients

Returns

params – The updated parameters

Return type

list
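
A minimal NumPy sketch of the rule above, with the velocity buffer v passed explicitly; the function name and defaults are illustrative only:

def momentum_step(parameter, gradient, v,
                  learning_rate=1e-2, momentum=0.9):
    # the velocity accumulates an exponentially decaying sum of past gradient steps
    v = momentum * v - learning_rate * gradient
    parameter = parameter + v
    return parameter, v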

class optimizer.NesterovMomentum(momentum=0.9, *args, **kwargs)[source]

Bases: optimizer.Optimizer

Stochastic Gradient Descent with Nesterov Momentum specialization

Update the parameters according to the rule

v = momentum * v - learning_rate * gradient
parameter += momentum * v - learning_rate * gradient
Parameters
  • momentum (float (default=0.9)) – Momentum value

  • *args (list) – Class specialization variables.

  • **kwargs (dict) – Class specialization variables.

update(params, gradients)[source]

Update the given parameters according to the class optimization algorithm

Parameters
  • params (list) – List of parameters to update

  • gradients (list) – List of corresponding gradients

Returns

params – The updated parameters

Return type

list
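
A minimal NumPy sketch of the rule above; it differs from the Momentum sketch only in the momentum-scaled look-ahead term. The function name and defaults are illustrative only:

def nesterov_step(parameter, gradient, v,
                  learning_rate=1e-2, momentum=0.9):
    v = momentum * v - learning_rate * gradient
    # look-ahead correction: momentum-scaled velocity plus the plain gradient step
    parameter = parameter + momentum * v - learning_rate * gradient
    return parameter, v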

class optimizer.Optimizer(lr=0.001, decay=0.0, lr_min=0.0, lr_max=inf, *args, **kwargs)[source]

Bases: object

Abstract base class for the optimizers

Parameters
  • lr (float (default=1e-3)) – Learning rate value

  • decay (float (default=0.)) – Learning rate decay

  • lr_min (float (default=0.)) – Minimum of learning rate domain

  • lr_max (float (default=np.inf)) – Maximum of learning rate domain

  • *args (list) – Class specialization variables.

  • **kwargs (dict) – Class specialization variables.

update(params, gradients)[source]

Update the optimizer parameters

Parameters
  • params (list) – List of parameters to update

  • gradients (list) – List of corresponding gradients

Return type

self
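
A hedged usage sketch of the update interface shared by all the specializations; the array shapes, the choice of SGD, and the forwarding of lr through **kwargs are assumptions for illustration only:

import numpy as np
from optimizer import SGD  # assumes the module is importable as `optimizer`

# toy parameters and their gradients (shapes are illustrative)
weights = np.random.uniform(size=(10, 5))
bias    = np.zeros(shape=(5,))
grad_w  = np.random.uniform(size=(10, 5))
grad_b  = np.random.uniform(size=(5,))

opt = SGD(lr=1e-2)  # lr is assumed to be forwarded to the Optimizer base class
# update() returns the list of updated parameters
weights, bias = opt.update(params=[weights, bias], gradients=[grad_w, grad_b])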

class optimizer.RMSprop(rho=0.9, epsilon=1e-06, *args, **kwargs)[source]

Bases: optimizer.Optimizer

RMSprop optimization algorithm

Update the parameters according to the rule

c = rho * c + (1. - rho) * gradient * gradient
parameter -= learning_rate * gradient / (sqrt(c) + epsilon)
Parameters
  • rho (float (default=0.9)) – Decay factor

  • epsilon (float (default=1e-6)) – Precision parameter to overcome numerical overflows

  • *args (list) – Class specialization variables.

  • **kwargs (dict) – Class specialization variables.

update(params, gradients)[source]

Update the given parameters according to the class optimization algorithm

Parameters
  • params (list) – List of parameters to update

  • gradients (list) – List of corresponding gradients

Returns

params – The updated parameters

Return type

list
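
A minimal NumPy sketch of the rule above, with the running average c passed explicitly; the function name and defaults are illustrative only:

import numpy as np

def rmsprop_step(parameter, gradient, c,
                 learning_rate=1e-3, rho=0.9, epsilon=1e-6):
    # exponential moving average of the squared gradients
    c = rho * c + (1. - rho) * gradient * gradient
    parameter = parameter - learning_rate * gradient / (np.sqrt(c) + epsilon)
    return parameter, c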

class optimizer.SGD(*args, **kwargs)[source]

Bases: optimizer.Optimizer

Stochastic Gradient Descent specialization

Update the parameters according to the rule

parameter -= learning_rate * gradient
Parameters
  • *args (list) – Class specialization variables.

  • **kwargs (dict) – Class specialization variables.

update(params, gradients)[source]

Update the given parameters according to the class optimization algorithm

Parameters
  • params (list) – List of parameters to update

  • gradients (list) – List of corresponding gradients

Returns

params – The updated parameters

Return type

list
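
A minimal sketch of the rule above; the function name and the default learning rate are illustrative only:

def sgd_step(parameter, gradient, learning_rate=1e-2):
    # plain (stochastic) gradient descent step on a NumPy array or scalar
    return parameter - learning_rate * gradient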