How to handle retry logic in Python for API calls and transient failures

Problem

I was getting intermittent failures when calling external APIs in my Python application. Sometimes the network would hiccup, sometimes the API would return a 503 Service Unavailable, and sometimes I’d hit rate limits. My application would crash, and I’d have to manually restart it.

I tried wrapping my API calls in try-except blocks, but that only caught the errors - it didn’t retry the failed requests. I needed a way to automatically retry failed API calls with some backoff strategy to avoid hammering the service.

What I Tried First

My first instinct was to write a custom retry decorator. I created something like this:

custom_retry.py
import functools
import time

import requests

def retry_on_failure(max_retries=3, delay=1):
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(max_retries):
                try:
                    return func(*args, **kwargs)
                except Exception:
                    # Out of attempts: let the last exception propagate
                    if attempt == max_retries - 1:
                        raise
                    # Exponential backoff: delay, 2*delay, 4*delay, ...
                    time.sleep(delay * (2 ** attempt))
        return wrapper
    return decorator

@retry_on_failure(max_retries=3, delay=1)
def fetch_api_data():
    response = requests.get("https://api.example.com/data")
    response.raise_for_status()
    return response.json()

This worked for a simple case, but I quickly ran into issues:

  1. Different projects had different implementations - I had three different retry decorators across my codebase, each with slightly different logic
  2. No jitter - Multiple instances retrying at the same time would cause thundering herd problems
  3. Retrying on all exceptions - Some exceptions shouldn’t be retried (like 401 Unauthorized)
  4. No logging - I couldn’t see when retries were happening
  5. Fixed exponential multiplier - Different APIs needed different backoff strategies
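To make the jitter point concrete, here is a minimal sketch of capped exponential backoff with "full jitter", where each delay is drawn at random below the exponential ceiling. The function name `backoff_delays` and its parameters are illustrative assumptions, not part of any library.

```python
import random

def backoff_delays(attempts, base=1.0, cap=60.0, rng=random):
    """Yield one randomized delay per retry using capped exponential backoff.

    Each delay is drawn uniformly from [0, min(cap, base * 2**n)], so
    clients that fail at the same moment are unlikely to retry together.
    """
    for n in range(attempts):
        ceiling = min(cap, base * (2 ** n))
        yield rng.uniform(0, ceiling)

if __name__ == "__main__":
    # Two simulated clients get different schedules from the same settings
    for client in ("client-a", "client-b"):
        print(client, [round(d, 2) for d in backoff_delays(5)])
```

Because every client picks its own random delay, retries spread out over time instead of arriving in synchronized waves.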

I tried improving my custom decorator, but it kept growing more complex:

improved_retry.py
import functools
import logging
import random
import time

def retry_on_failure(max_retries=3, base_delay=1, max_delay=60, jitter=True, retry_on=None):
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(max_retries):
                try:
                    return func(*args, **kwargs)
                except Exception as e:
                    # Only retry the exception types the caller asked for
                    if retry_on and not isinstance(e, retry_on):
                        raise
                    if attempt == max_retries - 1:
                        logging.error(f"All {max_retries} attempts failed")
                        raise
                    # Capped exponential backoff plus up to 1s of random jitter
                    delay = min(base_delay * (2 ** attempt), max_delay)
                    if jitter:
                        delay = delay + random.uniform(0, 1)
                    logging.warning(f"Attempt {attempt + 1} failed, retrying in {delay:.2f}s")
                    time.sleep(delay)
        return wrapper
    return decorator

This was getting unwieldy, and I realized I was reinventing the wheel. There had to be a better way.

Solution: Tenacity Library

After searching for “python retry decorator” and “python retry logic”, I discovered the Tenacity library - a production-ready, composable retry framework that solved all my problems.

Basic Usage

basic_retry.py
import requests
from tenacity import retry, stop_after_attempt, wait_exponential

@retry(stop=stop_after_attempt(3), wait=wait_exponential(multiplier=1, min=2, max=10))
def fetch_api_data():
    response = requests.get("https://api.example.com/data")
    response.raise_for_status()
    return response.json()

This single decorator replaced my entire custom implementation. The key insight from a Reddit discussion:

“Tenacity for retry logic. Before finding it I had custom retry decorators scattered across every project, each with slightly different backoff logic. Tenacity gives you composable retry strategies in one decorator - exponential backoff, retry on specific exceptions, stop after N attempts, all just stacked as parameters.”

Retry on Specific Exceptions

One critical improvement: only retry on transient errors, not on authentication or validation errors:

retry_specific_exceptions.py
import requests
from tenacity import retry, stop_after_attempt, wait_exponential, retry_if_exception_type

# Retry only on timeouts and connection errors. An HTTPError raised by
# raise_for_status() (any 4xx or 5xx status) is not retried here.
@retry(
    stop=stop_after_attempt(3),
    wait=wait_exponential(multiplier=1, min=2, max=10),
    retry=retry_if_exception_type((requests.exceptions.Timeout, requests.exceptions.ConnectionError))
)
def fetch_api_data():
    # requests has no default timeout; without one, Timeout is never raised
    response = requests.get("https://api.example.com/data", timeout=10)
    response.raise_for_status()
    return response.json()

Adding Logging and Observability

I needed to see what was happening during retries:

retry_with_logging.py
import logging

import requests
from tenacity import retry, stop_after_attempt, wait_exponential, before_sleep_log

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

@retry(
    stop=stop_after_attempt(3),
    wait=wait_exponential(multiplier=1, min=2, max=10),
    before_sleep=before_sleep_log(logger, logging.WARNING)
)
def fetch_api_data():
    logger.info("Fetching API data...")
    response = requests.get("https://api.example.com/data")
    response.raise_for_status()
    return response.json()

Now I get log output like:

log_output.txt
WARNING: Retrying fetch_api_data in 2.0 seconds as it raised ConnectionError: HTTPSConnectionPool...
WARNING: Retrying fetch_api_data in 4.0 seconds as it raised ConnectionError: HTTPSConnectionPool...
INFO: Fetching API data...

Retry Based on Return Value

Sometimes the API doesn’t throw an exception but returns an error response. Tenacity can retry based on the return value:

retry_on_return_value.py
import requests
from tenacity import retry, stop_after_attempt, wait_exponential, retry_if_result

def is_rate_limited(response):
    """Check if response indicates rate limiting"""
    return response.status_code == 429

@retry(
    stop=stop_after_attempt(5),
    wait=wait_exponential(multiplier=1, min=2, max=60),
    retry=retry_if_result(is_rate_limited)
)
def fetch_api_data():
    response = requests.get("https://api.example.com/data")
    return response

response = fetch_api_data()
if response.status_code == 200:
    data = response.json()

Combining Multiple Conditions

You can combine multiple retry conditions:

combined_retry.py
import requests
from tenacity import (
    retry,
    stop_after_attempt,
    wait_exponential,
    wait_random,
    retry_if_exception_type,
    retry_if_result,
    retry_any
)

# Combine exception-based and result-based retry logic
@retry(
    stop=stop_after_attempt(5),
    wait=wait_exponential(multiplier=1, min=2, max=60) + wait_random(0, 1),  # Add jitter
    retry=retry_any(
        retry_if_exception_type((requests.exceptions.Timeout, requests.exceptions.ConnectionError)),
        retry_if_result(lambda r: r.status_code == 429)
    )
)
def fetch_api_data():
    response = requests.get("https://api.example.com/data")
    # Return 429 responses so retry_if_result can see them;
    # raise_for_status() would turn them into an HTTPError that is not retried
    if response.status_code != 429:
        response.raise_for_status()
    return response

Using Retry Context in the Function

Sometimes you need to know which attempt you’re on:

retry_context.py
import requests
from tenacity import retry, stop_after_attempt, wait_exponential

# Tenacity passes the retry state to callbacks; attempt_number starts at 1
def log_attempt(retry_state):
    print(f"Attempt {retry_state.attempt_number} failed, retrying...")

@retry(
    stop=stop_after_attempt(3),
    wait=wait_exponential(multiplier=1, min=2, max=10),
    before_sleep=log_attempt  # Called before each wait between attempts
)
def fetch_api_data():
    response = requests.get("https://api.example.com/data")
    response.raise_for_status()
    return response.json()

# Or use a callback to handle final failure
@retry(
    stop=stop_after_attempt(3),
    wait=wait_exponential(multiplier=1, min=2, max=10),
    retry_error_callback=lambda retry_state: None  # Return None on final failure
)
def fetch_with_fallback():
    response = requests.get("https://api.example.com/data")
    response.raise_for_status()
    return response.json()

Common Mistakes to Avoid

1. Retrying on All Exceptions

wrong_retry_all.py
import requests
from tenacity import retry, stop_after_attempt

# WRONG: Will retry even on authentication errors
@retry(stop=stop_after_attempt(3))
def fetch_api_data():
    response = requests.get("https://api.example.com/data", headers={"Authorization": "Bearer invalid"})
    response.raise_for_status()  # Raises 401 - shouldn't retry!
    return response.json()

correct_retry_specific.py
import requests
from tenacity import retry, stop_after_attempt, retry_if_exception_type

# CORRECT: Only retry on transient errors
@retry(
    stop=stop_after_attempt(3),
    retry=retry_if_exception_type((requests.exceptions.Timeout, requests.exceptions.ConnectionError))
)
def fetch_api_data():
    response = requests.get("https://api.example.com/data")
    response.raise_for_status()
    return response.json()

2. No Backoff or Fixed Delay

wrong_fixed_delay.py
import requests
from tenacity import retry, stop_after_attempt, wait_fixed

# WRONG: No backoff - hammers the server
@retry(stop=stop_after_attempt(5), wait=wait_fixed(1))
def fetch_api_data():
    response = requests.get("https://api.example.com/data")
    response.raise_for_status()
    return response.json()

correct_exponential_backoff.py
import requests
from tenacity import retry, stop_after_attempt, wait_exponential

# CORRECT: Exponential backoff gives the server time to recover
@retry(stop=stop_after_attempt(5), wait=wait_exponential(multiplier=1, min=1, max=60))
def fetch_api_data():
    response = requests.get("https://api.example.com/data")
    response.raise_for_status()
    return response.json()

3. Retrying Indefinitely

wrong_infinite_retry.py
import requests
from tenacity import retry

# WRONG: Will retry forever!
@retry()
def fetch_api_data():
    response = requests.get("https://api.example.com/data")
    response.raise_for_status()
    return response.json()

correct_limited_retry.py
import requests
from tenacity import retry, stop_after_attempt

# CORRECT: Always have a stop condition
@retry(stop=stop_after_attempt(5))
def fetch_api_data():
    response = requests.get("https://api.example.com/data")
    response.raise_for_status()
    return response.json()

Summary

Using Tenacity for retry logic in Python is straightforward:

  1. Install the library: pip install tenacity
  2. Add the decorator with composable retry strategies
  3. Configure stop conditions (attempt count or time limit)
  4. Configure wait strategies (exponential backoff with jitter)
  5. Specify retry conditions (exception types or return values)
  6. Add logging for observability

The key insight from the Reddit discussion: “Tenacity for retry behavior mechanism. It is very helpful for handling transient failures especially for API calls.”

Instead of scattering custom retry decorators across your codebase, use Tenacity for a unified, maintainable approach to handling transient failures.

Final Words + More Resources

My intention with this article was to help others by sharing my knowledge and experience. If you want to get in touch, you can contact me by email.

