Retry Policies
A Retry Policy works together with timeouts to give fine-grained control over how an execution behaves when failures occur.
A Retry Policy is a collection of attributes that instructs the Temporal Server how to retry a failure of a Workflow Execution or an Activity Task Execution.
(Retry Policies do not apply to Workflow Task Executions, which always retry indefinitely.)
- Activity retry simulator
- How to set a custom Retry Policy for an Activity
- How to set a Retry Policy for a Workflow
Diagram that shows the retry interval and its formula
Default behavior
Workflow Execution: When a Workflow Execution is spawned, it is not associated with a default Retry Policy and thus does not retry by default. The intention is that a Workflow Definition should be written to never fail due to intermittent issues; an Activity is designed to handle such issues.
Activity Execution: When an Activity Execution is spawned, it is associated with a default Retry Policy, and thus Activity Task Executions are retried by default. When an Activity Task Execution is retried, the Cluster places a new Activity Task into its respective Activity Task Queue, which results in a new Activity Task Execution.
Custom Retry Policy
To use a custom Retry Policy, provide it as an options parameter when starting a Workflow Execution or Activity Execution. Only certain scenarios merit starting a Workflow Execution with a custom Retry Policy, such as the following:
- A Temporal Cron Job or some other stateless, always-running Workflow Execution that can benefit from retries.
- A file-processing or media-encoding Workflow Execution that downloads files to a host.
Properties
Default values for Retry Policy
Initial Interval = 1 second
Backoff Coefficient = 2.0
Maximum Interval = 100 × Initial Interval
Maximum Attempts = ∞
Non-Retryable Errors = []
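The defaults above can be modeled as a small data structure. The following is a minimal sketch, not a Temporal SDK type: the class name RetryPolicySketch, its field names, and the helper effective_maximum_interval are hypothetical, and the use of None for "not set" and 0 for "unlimited" are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from datetime import timedelta
from typing import List, Optional


@dataclass
class RetryPolicySketch:
    """Hypothetical stand-in for a Retry Policy; not a Temporal SDK type."""

    initial_interval: timedelta = timedelta(seconds=1)
    backoff_coefficient: float = 2.0
    # Assumption: None means "not set", so the default of
    # 100 x Initial Interval applies.
    maximum_interval: Optional[timedelta] = None
    # Assumption: 0 stands for unlimited attempts.
    maximum_attempts: int = 0
    non_retryable_error_types: List[str] = field(default_factory=list)

    def effective_maximum_interval(self) -> timedelta:
        """Return the explicit Maximum Interval, or 100 x Initial Interval."""
        if self.maximum_interval is not None:
            return self.maximum_interval
        return 100 * self.initial_interval


default_policy = RetryPolicySketch()
print(default_policy.effective_maximum_interval())  # 0:01:40 (100 seconds)
```

Deriving the Maximum Interval from the Initial Interval at read time means that changing only the Initial Interval also scales the cap, which matches the "100 × Initial Interval" default.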
Initial Interval
- Description: Amount of time that must elapse before the first retry occurs.
- The default value is 1 second.
- Use case: This is used as the base interval time for the Backoff Coefficient to multiply against.
Backoff Coefficient
- Description: The value dictates how much the retry interval increases.
- The default value is 2.0.
- A backoff coefficient of 1.0 means that the retry interval always equals the Initial Interval.
- Use case: Use this attribute to increase the interval between retries. By having a backoff coefficient greater than 1.0, the first few retries happen relatively quickly to overcome intermittent failures, but subsequent retries happen farther and farther apart to account for longer outages. Use the Maximum Interval attribute to prevent the coefficient from increasing the retry interval too much.
Maximum Interval
- Description: Specifies the maximum interval between retries.
- The default value is 100 times the Initial Interval.
- Use case: This attribute is useful for Backoff Coefficients that are greater than 1.0 because it prevents the retry interval from growing infinitely.
Maximum Attempts
- Description: Specifies the maximum number of execution attempts that can be made in the presence of failures.
- The default is unlimited.
- If this limit is reached and the last attempt fails, the execution fails without retrying again, and an error is returned.
- Setting the value to 0 also means unlimited.
- Setting the value to 1 means a single execution attempt and no retries.
- Setting the value to a negative integer results in an error when the execution is invoked.
- Use case: Use this attribute to ensure that retries do not continue indefinitely. However, in the majority of cases, we recommend relying on the Workflow Execution Timeout, in the case of Workflows, or Schedule-To-Close Timeout, in the case of Activities, to limit the total duration of retries instead of using this attribute.
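The Maximum Attempts rules above can be expressed as a single check. This is a sketch under stated assumptions: the function name may_retry and its parameters are hypothetical, and raising ValueError stands in for whatever error the server actually returns for a negative value.

```python
def may_retry(attempts_made: int, maximum_attempts: int) -> bool:
    """Return True if another attempt is allowed after `attempts_made` failures.

    `attempts_made` counts every execution attempt so far, including the first.
    """
    if maximum_attempts < 0:
        # A negative value is rejected; ValueError is an illustrative choice.
        raise ValueError("maximum_attempts must not be negative")
    if maximum_attempts == 0:
        # 0 means unlimited attempts.
        return True
    return attempts_made < maximum_attempts


print(may_retry(attempts_made=1, maximum_attempts=1))  # False: one attempt, no retries
print(may_retry(attempts_made=2, maximum_attempts=3))  # True: one attempt left
```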
Non-Retryable Errors
- Description: Specifies errors that shouldn't be retried.
- Default is none.
- If one of those errors occurs, the Activity Task Execution or Workflow Execution is not retried.
- The errors are matched against the type field of the ApplicationFailure.
- Use case: There may be errors that you know of that should not trigger a retry. In this case, you can specify them such that, if they occur, the given execution is not retried.
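Matching on the type field can be illustrated with a few lines of code. This sketch assumes exact string matching; ApplicationFailureSketch and is_retryable are hypothetical names used only for illustration and are not part of any Temporal SDK.

```python
from typing import Iterable


class ApplicationFailureSketch(Exception):
    """Hypothetical stand-in for an ApplicationFailure with a type field."""

    def __init__(self, message: str, error_type: str) -> None:
        super().__init__(message)
        self.type = error_type


def is_retryable(
    failure: ApplicationFailureSketch,
    non_retryable_error_types: Iterable[str],
) -> bool:
    """Return False when the failure's type is listed as non-retryable."""
    # Assumption: the comparison is an exact, case-sensitive string match.
    return failure.type not in set(non_retryable_error_types)


failure = ApplicationFailureSketch("card was declined", "PaymentDeclined")
print(is_retryable(failure, []))                   # True: default is none
print(is_retryable(failure, ["PaymentDeclined"]))  # False: listed type
```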
Retry interval
The wait time before a retry is the retry interval. A retry interval is the smaller of two values:
- The Initial Interval multiplied by the Backoff Coefficient raised to the power of the number of retries.
- The Maximum Interval.
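The rule above can be written as a short function. This is a minimal sketch that works in seconds and assumes "number of retries" means the retries already made, so the wait before the first retry equals the Initial Interval. The name retry_interval and its parameters are hypothetical.

```python
from typing import Optional


def retry_interval(
    retries: int,
    initial_interval: float = 1.0,
    backoff_coefficient: float = 2.0,
    maximum_interval: Optional[float] = None,
) -> float:
    """Return the wait in seconds before the next retry.

    `retries` is the number of retries already made (0 before the first retry).
    """
    if maximum_interval is None:
        # Default: 100 x Initial Interval.
        maximum_interval = 100 * initial_interval
    try:
        uncapped = initial_interval * backoff_coefficient ** retries
    except OverflowError:
        # Very large exponents overflow a float; the cap applies anyway.
        uncapped = float("inf")
    return min(uncapped, maximum_interval)


# With the default values, the interval doubles until it reaches the cap.
print([retry_interval(n) for n in range(9)])
# [1.0, 2.0, 4.0, 8.0, 16.0, 32.0, 64.0, 100.0, 100.0]
```

With a Backoff Coefficient of 1.0 the same function returns the Initial Interval for every retry, which matches the behavior described for that attribute.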
Event History
Events are recorded to an Event History slightly differently when a Retry Policy comes into play.
For an Activity Execution, the ActivityTaskStarted Event will not show up in the Workflow Execution Event History until the Activity Execution has completed or failed (having exhausted all retries). This is to avoid filling the Event History with noise. Use the Describe API to get a pending Activity Execution's attempt count.
For a Workflow Execution with a Retry Policy, if the Workflow Execution fails, the Workflow Execution will Continue-As-New and the associated Event is written to the Event History. The WorkflowExecutionContinuedAsNew Event has an "initiator" field that specifies the Retry Policy as its value, along with the new Run Id for the next retry attempt. The new Workflow Execution is created immediately, but the first Workflow Task won't be scheduled until the backoff duration is exhausted. That duration is recorded as the firstWorkflowTaskBackoff field of the new run's WorkflowExecutionStartedEventAttributes event.