tf.keras.mixed_precision.experimental.Policy


Class Policy

A dtype policy for a Keras layer.

Aliases:

  • Class tf.compat.v1.keras.mixed_precision.experimental.Policy
  • Class tf.compat.v2.keras.mixed_precision.experimental.Policy

A dtype policy determines the computation dtype and the variable dtype of a Keras layer. Each layer has a policy. Policies can be passed to the 'dtype' argument of layer constructors, or a global policy can be set with 'tf.keras.mixed_precision.experimental.set_policy'. A layer will default to the global policy if no policy is passed to its constructor.

For most models, each layer will have the same computation dtype and variable dtype, which will typically be float32. However, when mixed precision training is used, most layers will instead have a float16 computation dtype and a float32 variable dtype. See this link for more information on mixed precision training. When the variable dtype does not match the computation dtype, variables will be automatically cast to the computation dtype to avoid type errors.

In the near future, policies will also determine the loss scaling algorithm for Keras models.

Policies are constructed by passing a string to the constructor, e.g. tf.keras.mixed_precision.experimental.Policy('float32'). The string determines the compute and variable dtypes. Currently, it can take one of the following forms:

  • Any dtype name, such as 'float32' or 'float64'. Both the variable and compute dtypes will be that dtype.
  • '<dtype>_with_float32_vars', where <dtype> is any dtype. The compute dtype will be <dtype>, while the variable dtype is float32. This is intended for the use of mixed precision, which uses float16 or bfloat16 for most computations, and float32 for variables. This policy is only useful if <dtype> is float16 or bfloat16, although <dtype> is allowed to be any dtype. Note we will have a "mixed" policy in the future, which will make it even easier to use mixed precision by enabling other features such as loss scaling.
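
The two name forms above can be summarized with a small sketch. The helper parse_policy_name below is hypothetical and is not part of TensorFlow; it only mirrors the documented mapping from a policy name to its compute and variable dtypes, and it assumes the prefix is a valid dtype name without checking it.

```python
from typing import Tuple

# Suffix used by the mixed precision name form described above.
_FLOAT32_VARS_SUFFIX = "_with_float32_vars"


def parse_policy_name(name: str) -> Tuple[str, str]:
    """Return (compute_dtype, variable_dtype) for a policy name.

    Hypothetical helper for illustration only; dtype names are not validated.
    """
    if name.endswith(_FLOAT32_VARS_SUFFIX):
        compute_dtype = name[: -len(_FLOAT32_VARS_SUFFIX)]
        if not compute_dtype:
            raise ValueError("Missing dtype prefix in policy name: %r" % name)
        # Compute in the prefix dtype, keep variables in float32.
        return compute_dtype, "float32"
    # A plain dtype name is used for both computations and variables.
    return name, name


for policy_name in ("float32", "float16_with_float32_vars"):
    print(policy_name, "->", parse_policy_name(policy_name))
```

Under this sketch, only names ending in the suffix produce differing compute and variable dtypes, which is the case in which variables need to be cast.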

How to use mixed precision in layers with Policies

To use mixed precision in a model, the 'float16_with_float32_vars' policy can be used. tf.keras.mixed_precision.experimental.set_policy sets the default policy that layers use when no policy is passed to them. Note that loss scaling must also be done, e.g. with a tf.keras.mixed_precision.experimental.LossScaleOptimizer. For example:

import tensorflow as tf

tf.keras.mixed_precision.experimental.set_policy(
    'float16_with_float32_vars')
# Sequential expects its layers as a list.
model = tf.keras.models.Sequential([
    tf.keras.layers.Input((100,)),
    # Dense layers use global policy of 'float16_with_float32_vars'
    tf.keras.layers.Dense(10),
    tf.keras.layers.Dense(10),
    # Softmax should be done in float32 for numeric stability. We pass
    # dtype='float32' to use float32 instead of the global policy.
    tf.keras.layers.Activation('softmax', dtype='float32')
])
opt = tf.keras.mixed_precision.experimental.LossScaleOptimizer(...)
... # Train `model` with `opt`.

Alternatively, the policy can be passed to individual layers instead of setting the global policy with set_policy:

import tensorflow as tf

policy = tf.keras.mixed_precision.experimental.Policy(
    'float16_with_float32_vars')
# Sequential expects its layers as a list.
model = tf.keras.models.Sequential([
    tf.keras.layers.Input((100,)),
    tf.keras.layers.Dense(10, dtype=policy),
    tf.keras.layers.Dense(10, dtype=policy),
    # Softmax should be done in float32 for numeric stability.
    tf.keras.layers.Activation('softmax', dtype='float32')
])
opt = tf.keras.mixed_precision.experimental.LossScaleOptimizer(...)
... # Train `model` with `opt`.

As the above example shows, strings can be directly passed to layer constructors in the dtype argument instead of policies, but only if the string is convertible to a dtype.

The deprecated "infer" policy

In addition to a dtype or "<dtype>_with_float32_vars", a policy can also be "infer". This policy is deprecated and not recommended. When a layer has an infer policy, it will infer the computation and variable dtype from the first input the first time the layer is called.

Once the layer is called for the first time, the layer's policy will change to the dtype of the first input.

Similarly to "infer", there is a deprecated "infer_with_float32_vars" policy that infers the compute dtype, but not the variable dtype.
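
The behavior of the two deprecated policies can be sketched as follows. The function resolve_inferred_policy is hypothetical and not a TensorFlow API. It assumes that "infer_with_float32_vars" resolves to the "<dtype>_with_float32_vars" form on the first call, which is consistent with the description above but is not stated explicitly there.

```python
def resolve_inferred_policy(policy_name: str, first_input_dtype: str) -> str:
    """Return the policy name a layer would hold after its first call.

    Hypothetical illustration: `first_input_dtype` is the dtype name of the
    first input passed to the layer, e.g. 'float16'.
    """
    if policy_name == "infer":
        # Both compute and variable dtypes follow the first input.
        return first_input_dtype
    if policy_name == "infer_with_float32_vars":
        # Assumption: only the compute dtype is inferred; variables stay float32.
        return first_input_dtype + "_with_float32_vars"
    # Non-inferring policies are unaffected by the input dtype.
    return policy_name


print(resolve_inferred_policy("infer", "float16"))
print(resolve_inferred_policy("infer_with_float32_vars", "float16"))
print(resolve_inferred_policy("float32", "float16"))
```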

In TensorFlow 1, only the "infer" and "infer_with_float32_vars" policies are available.

__init__


__init__(name)

Constructs the policy.

The name argument determines the compute and variable dtype, and has no additional effect on the Policy. The compute and variable dtypes can only be specified through name, and cannot be specified directly.

Args:

  • name: A string. Can be one of the following values:
    • Any dtype name, such as 'float32' or 'float64'. Both the variable and compute dtypes will be that dtype.
    • '<dtype>_with_float32_vars', where <dtype> is any dtype. The compute dtype will be <dtype>, while the variable dtype is float32. This is intended for the use of mixed precision, which uses float16 or bfloat16 for most computations, and float32 for variables. This policy is only useful if <dtype> is float16 or bfloat16, although <dtype> is allowed to be any dtype. Note we will have a "mixed" policy in the future, which will make it even easier to use mixed precision by enabling other features such as loss scaling.
    • 'infer' or 'infer_with_float32_vars' (deprecated): Infer the computation dtype from the input dtype.

Properties

compute_dtype

The compute dtype of this policy.

This is the dtype layers will do their computations in.

If this is None, the policy is "infer" or "infer_with_float32_vars" and variable_dtype is either None or float32 respectively.

Note that even if the compute dtype is float16 or bfloat16, hardware devices may not do individual adds, multiplies, and other fundamental operations in [b]float16, but instead may do some of them in float32 for numeric stability. The compute dtype is the dtype of the inputs and outputs of the TensorFlow ops that the layer executes. Internally, many TensorFlow ops will do certain internal calculations in float32, or some other device-internal intermediate format with higher precision than [b]float16, to increase numeric stability.

For example, a tf.keras.layers.Dense layer, when run on a GPU with a float16 compute dtype, will pass float16 inputs to tf.matmul. However, tf.matmul will use float32 intermediate math. The performance benefit of float16 is still apparent, due to increased memory bandwidth and the fact that GPUs have specialized hardware for computing matmuls on float16 while still keeping intermediate computations in float32.
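
To illustrate why higher-precision intermediate math matters, the sketch below emulates float16 and float32 rounding with Python's struct module (the 'e' and 'f' formats). It is an illustration only and makes no claim about how tf.matmul is implemented; all helper names are hypothetical.

```python
import struct


def round_to_float16(x: float) -> float:
    """Round a Python float to the nearest IEEE half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def round_to_float32(x: float) -> float:
    """Round a Python float to the nearest IEEE single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def accumulate(values, round_accumulator):
    """Sum `values`, rounding the running total after every addition."""
    total = 0.0
    for value in values:
        total = round_accumulator(total + value)
    return total


# 4096 float16 inputs, each equal to 1.0.
inputs = [round_to_float16(1.0)] * 4096

# Accumulating in float16 stalls once the total reaches 2048, because
# 2049 is not representable in float16 and rounds back to 2048.
float16_total = accumulate(inputs, round_to_float16)

# Accumulating in float32 and casting only the final result to float16
# gives the exact answer, since 4096 is representable in float16.
float32_total = round_to_float16(accumulate(inputs, round_to_float32))

print(float16_total, float32_total)
```

The inputs and the final output are float16 in both cases; only the precision of the running total differs.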

Returns:

The compute dtype of this policy, or None if the compute dtype should be inferred from the inputs.

name

Returns the name of this policy.

should_cast_variables

Returns True if variables should be cast.

This is true if the variable dtype is not the same as the compute dtype.

Returns:

True, if variables should be cast.
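
The rule above amounts to a single comparison. The sketch below uses a hypothetical stand-in class, PolicySketch, rather than the real Policy class, and represents dtypes as plain strings (or None for an inferred dtype).

```python
from typing import NamedTuple, Optional


class PolicySketch(NamedTuple):
    """Hypothetical stand-in for a policy; not the TensorFlow class."""

    compute_dtype: Optional[str]
    variable_dtype: Optional[str]

    @property
    def should_cast_variables(self) -> bool:
        # True whenever the variable dtype differs from the compute dtype.
        return self.variable_dtype != self.compute_dtype


mixed = PolicySketch(compute_dtype="float16", variable_dtype="float32")
plain = PolicySketch(compute_dtype="float32", variable_dtype="float32")

print(mixed.should_cast_variables)
print(plain.should_cast_variables)
```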

variable_dtype

The variable dtype of this policy.

This is the dtype layers will create their variables in, unless a layer explicitly chooses a different dtype. If this is different from Policy.compute_dtype and both are non-None, layers will cast variables to the compute dtype to avoid type errors.

If this is None, the policy is "infer" and the compute_dtype is also None. If compute_dtype is None, this is either None or float32.

Returns:

The variable dtype of this policy, or None if the variable dtype should be inferred from the inputs.
