How to handle retry logic in Python for API calls and transient failures
Problem
I was getting intermittent failures when calling external APIs in my Python application. Sometimes the network would hiccup, sometimes the API would return a 503 Service Unavailable, and sometimes I’d hit rate limits. My application would crash, and I’d have to manually restart it.
I tried wrapping my API calls in try-except blocks, but that only caught the errors - it didn’t retry the failed requests. I needed a way to automatically retry failed API calls with some backoff strategy to avoid hammering the service.
What I Tried First
My first instinct was to write a custom retry decorator. I created something like this:
import time
import functools

def retry_on_failure(max_retries=3, delay=1):
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(max_retries):
                try:
                    return func(*args, **kwargs)
                except Exception as e:
                    if attempt == max_retries - 1:
                        raise
                    time.sleep(delay * (2 ** attempt))
        return wrapper
    return decorator

@retry_on_failure(max_retries=3, delay=1)
def fetch_api_data():
    import requests
    response = requests.get("https://api.example.com/data")
    response.raise_for_status()
    return response.json()

This worked for a simple case, but I quickly ran into issues:
- Different projects had different implementations - I had three different retry decorators across my codebase, each with slightly different logic
- No jitter - Multiple instances retrying at the same time would cause thundering herd problems
- Retrying on all exceptions - Some exceptions shouldn’t be retried (like 401 Unauthorized)
- No logging - I couldn’t see when retries were happening
- Fixed exponential multiplier - Different APIs needed different backoff strategies
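To make the backoff and jitter points concrete, here is a minimal sketch of how a capped exponential schedule with additive jitter could be computed. The helper name `backoff_delays` and its parameters are hypothetical, chosen only for illustration; they do not come from any library.

```python
import random


def backoff_delays(max_retries=5, base_delay=1.0, max_delay=60.0, jitter=0.0, rng=None):
    """Return the sleep times between attempts (hypothetical helper).

    The delay doubles on each retry, is capped at max_delay, and gets
    up to `jitter` seconds of random noise added on top.
    """
    rng = rng or random.Random()
    delays = []
    # There is one sleep fewer than there are attempts.
    for attempt in range(max_retries - 1):
        delay = min(base_delay * (2 ** attempt), max_delay)
        delay += rng.uniform(0, jitter)
        delays.append(delay)
    return delays


# Without jitter, every client sleeps for exactly the same intervals.
print(backoff_delays(max_retries=5, base_delay=1.0, max_delay=5.0))
# With jitter, clients that fail at the same moment drift apart.
print(backoff_delays(max_retries=5, jitter=1.0))
```

The cap matters as much as the jitter: without `max_delay`, a long outage would push the sleep time into minutes after only a handful of retries.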
I tried improving my custom decorator, but it kept growing more complex:
import time
import functools
import random
import logging

def retry_on_failure(max_retries=3, base_delay=1, max_delay=60, jitter=True, retry_on=None):
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(max_retries):
                try:
                    return func(*args, **kwargs)
                except Exception as e:
                    if retry_on and not isinstance(e, retry_on):
                        raise
                    if attempt == max_retries - 1:
                        logging.error(f"All {max_retries} attempts failed")
                        raise
                    delay = min(base_delay * (2 ** attempt), max_delay)
                    if jitter:
                        delay = delay + random.uniform(0, 1)
                    logging.warning(f"Attempt {attempt + 1} failed, retrying in {delay:.2f}s")
                    time.sleep(delay)
        return wrapper
    return decorator

This was getting unwieldy, and I realized I was reinventing the wheel. There had to be a better way.
Solution: Tenacity Library
After searching for “python retry decorator” and “python retry logic”, I discovered the Tenacity library - a production-ready, composable retry framework that solved all my problems.
Basic Usage
import requests
from tenacity import retry, stop_after_attempt, wait_exponential

@retry(stop=stop_after_attempt(3), wait=wait_exponential(multiplier=1, min=2, max=10))
def fetch_api_data():
    response = requests.get("https://api.example.com/data")
    response.raise_for_status()
    return response.json()

This single decorator replaced my entire custom implementation. The key insight from a Reddit discussion:
“Tenacity for retry logic. Before finding it I had custom retry decorators scattered across every project, each with slightly different backoff logic. Tenacity gives you composable retry strategies in one decorator - exponential backoff, retry on specific exceptions, stop after N attempts, all just stacked as parameters.”
Retry on Specific Exceptions
One critical improvement: only retry on transient errors, not on authentication or validation errors:
import requests
from tenacity import retry, stop_after_attempt, wait_exponential, retry_if_exception_type

# Retry only on timeouts and connection errors. HTTP errors raised by
# raise_for_status() (both 4xx and 5xx) are not retried by this policy.
@retry(
    stop=stop_after_attempt(3),
    wait=wait_exponential(multiplier=1, min=2, max=10),
    retry=retry_if_exception_type((requests.exceptions.Timeout, requests.exceptions.ConnectionError))
)
def fetch_api_data():
    # Set an explicit timeout; requests does not apply one by default
    response = requests.get("https://api.example.com/data", timeout=10)
    response.raise_for_status()
    return response.json()

Adding Logging and Observability
I needed to see what was happening during retries:
import requests
import logging
from tenacity import retry, stop_after_attempt, wait_exponential, before_sleep_log

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

@retry(
    stop=stop_after_attempt(3),
    wait=wait_exponential(multiplier=1, min=2, max=10),
    before_sleep=before_sleep_log(logger, logging.WARNING)
)
def fetch_api_data():
    logger.info("Fetching API data...")
    response = requests.get("https://api.example.com/data")
    response.raise_for_status()
    return response.json()

Now I get log output like:
WARNING: Retrying fetch_api_data in 2.0 seconds as it raised ConnectionError: HTTPSConnectionPool...
WARNING: Retrying fetch_api_data in 4.0 seconds as it raised ConnectionError: HTTPSConnectionPool...
INFO: Fetching API data...

Retry Based on Return Value
Sometimes the API doesn’t throw an exception but returns an error response. Tenacity can retry based on the return value:
import requests
from tenacity import retry, stop_after_attempt, wait_exponential, retry_if_result

def is_rate_limited(response):
    """Check if response indicates rate limiting"""
    return response.status_code == 429

@retry(
    stop=stop_after_attempt(5),
    wait=wait_exponential(multiplier=1, min=2, max=60),
    retry=retry_if_result(is_rate_limited)
)
def fetch_api_data():
    response = requests.get("https://api.example.com/data")
    return response

response = fetch_api_data()
if response.status_code == 200:
    data = response.json()

Combining Multiple Conditions
You can combine multiple retry conditions:
import requests
from tenacity import (
    retry,
    stop_after_attempt,
    wait_exponential,
    wait_random,
    retry_if_exception_type,
    retry_if_result,
    retry_any
)

# Combine exception-based and result-based retry logic
@retry(
    stop=stop_after_attempt(5),
    wait=wait_exponential(multiplier=1, min=2, max=60) + wait_random(0, 1),  # Add jitter
    retry=retry_any(
        retry_if_exception_type((requests.exceptions.Timeout, requests.exceptions.ConnectionError)),
        retry_if_result(lambda r: r.status_code == 429)
    )
)
def fetch_api_data():
    response = requests.get("https://api.example.com/data")
    # Return 429 responses as-is so retry_if_result can see them;
    # raise_for_status() would otherwise turn them into an HTTPError.
    if response.status_code != 429:
        response.raise_for_status()
    return response

Using Retry Context in the Function
Sometimes you need to know which attempt you’re on:
import requests
from tenacity import retry, stop_after_attempt, wait_exponential

@retry(stop=stop_after_attempt(3), wait=wait_exponential(multiplier=1, min=2, max=10))
def fetch_api_data():
    # The function body does not receive the attempt number directly;
    # Tenacity passes a retry_state object to its callbacks instead.
    response = requests.get("https://api.example.com/data")
    response.raise_for_status()
    return response.json()

# Use a callback to handle the final failure
def on_retry_failure(retry_state):
    print(f"Giving up after {retry_state.attempt_number} attempts")
    return None  # Returned to the caller instead of raising RetryError

@retry(
    stop=stop_after_attempt(3),
    wait=wait_exponential(multiplier=1, min=2, max=10),
    retry_error_callback=on_retry_failure
)
def fetch_with_fallback():
    response = requests.get("https://api.example.com/data")
    response.raise_for_status()
    return response.json()

Common Mistakes to Avoid
1. Retrying on All Exceptions
# WRONG: Will retry even on authentication errors
@retry(stop=stop_after_attempt(3))
def fetch_api_data():
    response = requests.get("https://api.example.com/data", headers={"Authorization": "Bearer invalid"})
    response.raise_for_status()  # Raises 401 - shouldn't retry!
    return response.json()

# CORRECT: Only retry on transient errors
@retry(
    stop=stop_after_attempt(3),
    retry=retry_if_exception_type((requests.exceptions.Timeout, requests.exceptions.ConnectionError))
)
def fetch_api_data():
    response = requests.get("https://api.example.com/data")
    response.raise_for_status()
    return response.json()

2. No Backoff or Fixed Delay
# WRONG: No backoff - hammers the server
@retry(stop=stop_after_attempt(5), wait=wait_fixed(1))
def fetch_api_data():
    response = requests.get("https://api.example.com/data")
    response.raise_for_status()
    return response.json()

# CORRECT: Exponential backoff gives the server time to recover
@retry(stop=stop_after_attempt(5), wait=wait_exponential(multiplier=1, min=1, max=60))
def fetch_api_data():
    response = requests.get("https://api.example.com/data")
    response.raise_for_status()
    return response.json()

3. Retrying Indefinitely
# WRONG: Will retry forever!
@retry()
def fetch_api_data():
    response = requests.get("https://api.example.com/data")
    response.raise_for_status()
    return response.json()

# CORRECT: Always have a stop condition
@retry(stop=stop_after_attempt(5))
def fetch_api_data():
    response = requests.get("https://api.example.com/data")
    response.raise_for_status()
    return response.json()

Summary
Using Tenacity for retry logic in Python is straightforward:
- Install the library: pip install tenacity
- Add the decorator with composable retry strategies
- Configure stop conditions (attempt count or time limit)
- Configure wait strategies (exponential backoff with jitter)
- Specify retry conditions (exception types or return values)
- Add logging for observability
The key insight from the Reddit discussion: “Tenacity for retry behavior mechanism. It is very helpful for handling transient failures especially for API calls.”
Instead of scattering custom retry decorators across your codebase, use Tenacity for a unified, maintainable approach to handling transient failures.
Final Words + More Resources
My intention with this article was to share my knowledge and experience to help others. If you want to contact me, you can reach me by email.