Python tenacity in Practice: A Complete Guide to Retries, Backoff, and Fault Tolerance

1. Introduction: Not Every Failure Is a Real Failure

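You have almost certainly run into situations like these:

A third-party API occasionally returns 503 Service Unavailable
A database connection drops for a moment in the middle of a query
Background jobs compete for resources: the first call times out, the second one works
A crawler or webhook flow hits a brief network hiccup

These errors are annoying because they are usually not logic bugs; the environment just happened to be unstable at that moment.

If you raise on every failure, your program comes across as fragile. On the other hand, if you hand-write a pile of for i in range(3): try: ... except: time.sleep(...) loops, they quickly turn into hard-to-maintain spaghetti.

This is where tenacity comes in.

tenacity is a popular retry library in the Python world. It takes care of:

How many times to retry
How long to wait between attempts
Whether to use exponential backoff
Which exceptions should be retried and which should not
How to retry async functions
Whether to log each failed attempt

Today 拍拍君 will take you from the simplest decorator all the way to API clients and async code. By the end, your retry handling will be a lot more mature than a bare sleep(1). Really.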

2. Installation

pip install tenacity

# or with uv (recommended)
uv add tenacity

Once it is installed, these are the imports you will reach for most often:

from tenacity import (
    retry,
    stop_after_attempt,
    stop_after_delay,
    wait_fixed,
    wait_exponential,
    wait_random_exponential,
    retry_if_exception_type,
    retry_if_result,
)

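If you have ever hand-rolled retry logic with only the standard library, you will notice that tenacity cleanly separates the stop condition, the wait strategy, and the retry condition, which makes them pleasant to combine.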

3. The Basic `@retry`: Making a Function Resilient

The simplest form is a single decorator:

from tenacity import retry
import random

@retry
def call_unstable_service():
    if random.random() < 0.7:
        raise RuntimeError("service temporarily unstable")
    return "got the data"

print(call_unstable_service())

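The meaning is straightforward:

When the function raises, tenacity calls it again
As soon as one attempt succeeds, that result is returned
With a bare @retry there is no attempt limit and no wait between attempts, so it keeps retrying immediately until a call succeeds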

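However, the default behavior is usually not explicit enough for real code. In practice you will almost always want to specify:

The maximum number of attempts
How long to wait before each retry

So the form below is far more common.

3.1 Setting the attempt count and a fixed wait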

from tenacity import retry, stop_after_attempt, wait_fixed

@retry(stop=stop_after_attempt(3), wait=wait_fixed(2))
def fetch_profile(user_id: str) -> dict:
    print(f"fetching data for {user_id}...")
    raise ConnectionError("API temporarily unreachable")

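This means:

Try at most 3 times in total (the first call counts as attempt 1)
Wait a fixed 2 seconds after each failure

A roughly equivalent hand-written version looks like this: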

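import time

def fetch_profile_with_retry(user_id: str) -> dict:
    for attempt in range(3):
        try:
            # fetch_profile_once is a hypothetical single-shot version of the call
            return fetch_profile_once(user_id)
        except ConnectionError:
            if attempt == 2:
                raise  # third failure: give up and re-raise
            time.sleep(2)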

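The advantage of tenacity is that you can change the strategy later without rewriting the whole loop.

3.2 Which exception is raised at the end

By default, once tenacity gives up it raises RetryError, which wraps the original exception. This confuses many first-time users.

If you would rather have the original exception from the last attempt re-raised once the attempts are exhausted, add reraise=True: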

from tenacity import retry, stop_after_attempt, wait_fixed

@retry(
    stop=stop_after_attempt(3),
    wait=wait_fixed(1),
    reraise=True,
)
def fetch_orders():
    raise TimeoutError("order service timeout")

拍拍君 strongly recommends adding reraise=True, because the resulting traceback is usually easier to read.

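4. Stop Conditions and Wait Strategies: Don't Just Wait Blindly

Retrying is not just "try again"; it is "try again with a sensible strategy".

4.1 `stop_after_attempt`: stop after a number of attempts

This is the most common stop condition: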

from tenacity import retry, stop_after_attempt

@retry(stop=stop_after_attempt(5), reraise=True)
def sync_inventory():
    ...

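It fits scenarios like these:

The API fails occasionally but usually recovers within a few attempts
The user's request cannot wait too long
You want the cost to stay bounded

4.2 `stop_after_delay`: stop after a total time

Sometimes you care less about the number of attempts and more about how long the whole thing may drag on: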

from tenacity import retry, stop_after_delay, wait_fixed

@retry(
    stop=stop_after_delay(10),
    wait=wait_fixed(2),
    reraise=True,
)
def wait_for_lock():
    raise RuntimeError("resource lock not released yet")

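This means:

Keep retrying until about 10 seconds have passed since the first attempt
Wait 2 seconds between attempts

The deadline is checked after each failed attempt, so a slow final call can push the total a little past 10 seconds.

This works well when waiting on a temporary resource, for example:

A file lock
A free slot in a task queue
An internal service that is briefly unavailable

4.3 `wait_fixed`: simple, but not always the best

A fixed wait is easy to understand:

wait=wait_fixed(3)

But if many clients fail at the same moment and all of them retry together every 3 seconds, the server is more likely to be overwhelmed.

That is why people keep talking about backoff.

4.4 `wait_exponential`: exponential backoff

Exponential backoff makes the wait grow longer with each attempt: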

from tenacity import retry, stop_after_attempt, wait_exponential

@retry(
    stop=stop_after_attempt(5),
    wait=wait_exponential(multiplier=1, min=1, max=10),
    reraise=True,
)
def call_payment_gateway():
    raise ConnectionError("gateway busy")

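The waits look roughly like this:

After the 1st failure, wait 1 second
After the 2nd failure, wait 2 seconds
After the 3rd failure, wait 4 seconds
After the 4th failure, wait 8 seconds
Any later wait would be capped at the 10-second maximum

With stop_after_attempt(5) there are only four waits, so the cap is never reached in this example; it starts to matter once you allow more attempts.

This strategy is a good fit for:

Third-party APIs
Webhook callbacks
Cloud services that are temporarily overloaded

Instead of hammering the service, you give it some room to recover.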
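The schedule above can be reproduced with a few lines of plain Python. This is a sketch of the arithmetic only, assuming the wait is multiplier * 2 ** (attempt - 1) clamped between min and max, which matches recent tenacity releases as far as I know; the function name is made up, and older versions may start the sequence one step later, so check the version you have installed.

```python
def exponential_wait(attempt_number: int, multiplier: float = 1.0,
                     min_wait: float = 0.0, max_wait: float = float("inf"),
                     exp_base: float = 2.0) -> float:
    """Seconds to wait after the given (1-based) failed attempt."""
    raw = multiplier * exp_base ** (attempt_number - 1)
    # Clamp into [min_wait, max_wait], mirroring the min= and max= arguments.
    return max(min_wait, min(raw, max_wait))


# Same parameters as wait_exponential(multiplier=1, min=1, max=10).
schedule = [exponential_wait(n, multiplier=1, min_wait=1, max_wait=10)
            for n in range(1, 7)]
print(schedule)
```

The first four values double each time, and from the fifth failure onward the wait stays at the 10-second cap.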

4.5 `wait_random_exponential`: adding jitter makes it more practical

When several workers run at the same time, 拍拍君 recommends adding some random jitter:

from tenacity import retry, stop_after_attempt, wait_random_exponential

@retry(
    stop=stop_after_attempt(6),
    wait=wait_random_exponential(multiplier=1, max=30),
    reraise=True,
)
def send_webhook():
    raise TimeoutError("webhook timeout")

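The benefits:

Different workers no longer retry together at the same instant
It reduces the thundering herd problem
In distributed systems it is usually gentler than a fixed wait

If you remember only one thing from today, make it this:

When calling external services, a fixed sleep is often merely workable; exponential backoff with jitter is closer to what production code should look like.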
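To see why jitter spreads workers out, here is a plain-Python sketch of the "full jitter" idea: pick a uniform random wait between 0 and the exponential ceiling. It assumes that is how `wait_random_exponential` behaves, as its documentation describes; the function name and the five workers are hypothetical.

```python
import random
from typing import Optional


def full_jitter_wait(attempt_number: int, multiplier: float = 1.0,
                     max_wait: float = 30.0,
                     rng: Optional[random.Random] = None) -> float:
    """Uniform random wait in [0, ceiling], where the ceiling grows exponentially."""
    if rng is None:
        rng = random.Random()
    ceiling = min(max_wait, multiplier * 2 ** (attempt_number - 1))
    return rng.uniform(0, ceiling)


rng = random.Random(42)  # seeded only so the sketch is reproducible
# Five hypothetical workers that all failed for the 4th time at the same moment.
waits = [full_jitter_wait(4, rng=rng) for _ in range(5)]
print([round(w, 2) for w in waits])
```

Without jitter all five workers would come back after exactly 8 seconds; with it, their retries land at different points inside that 8-second window.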

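5. Controlling Exactly What Gets Retried

Not every error should be retried.

For example:

TimeoutError: can be retried
ConnectionError: can often be retried
HTTP 500: usually worth retrying
HTTP 400: your request is most likely malformed, so retrying will not help
ValueError: if your own data is in the wrong format, retrying only wastes time

5.1 Retrying only on specific exceptions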

from tenacity import retry, stop_after_attempt, wait_fixed, retry_if_exception_type

@retry(
    stop=stop_after_attempt(4),
    wait=wait_fixed(1),
    retry=retry_if_exception_type((TimeoutError, ConnectionError)),
    reraise=True,
)
def fetch_user_feed():
    ...

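Now only TimeoutError and ConnectionError trigger a retry.

If the function raises ValueError, it propagates immediately and no time is wasted.

Note that these are Python's built-in exception classes. HTTP libraries such as requests and httpx define their own timeout and connection exceptions, so list those classes explicitly when you use them.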

5.2 Deciding based on the HTTP status

A very common pattern in practice is to encode "should this be retried" in your own exception classes:

import requests
from tenacity import retry, stop_after_attempt, wait_random_exponential
from tenacity import retry_if_exception_type

class RetryableAPIError(Exception):
    pass

class FatalAPIError(Exception):
    pass

@retry(
    stop=stop_after_attempt(5),
    wait=wait_random_exponential(multiplier=1, max=20),
    retry=retry_if_exception_type(RetryableAPIError),
    reraise=True,
)
def get_invoice(invoice_id: str) -> dict:
    response = requests.get(f"https://api.example.com/invoices/{invoice_id}", timeout=5)

    if response.status_code in {500, 502, 503, 504}:
        raise RetryableAPIError(f"server error: {response.status_code}")

    if response.status_code >= 400:
        raise FatalAPIError(f"bad request: {response.status_code}")

    return response.json()

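This pattern is clean because:

The retry policy lives in the decorator
The business logic lives in the function body
The call site is easy to read

One caveat: exceptions raised by requests itself, such as requests.Timeout or requests.ConnectionError, are not RetryableAPIError, so this decorator lets them propagate without retrying. Add them to retry_if_exception_type if you want them retried as well.

5.3 Retrying without an exception: `retry_if_result`

Some functions do not raise; they return a result that means "not ready yet".

This is very common when polling a job's status: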

from tenacity import retry, stop_after_attempt, wait_fixed, retry_if_result


def is_not_ready(result: dict) -> bool:
    return result["status"] != "done"


@retry(
    stop=stop_after_attempt(10),
    wait=wait_fixed(2),
    retry=retry_if_result(is_not_ready),
)
def poll_report(job_id: str) -> dict:
    # pretend this queries the remote job status
    return {"job_id": job_id, "status": "processing"}

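Whenever the return value satisfies is_not_ready(), tenacity keeps retrying. If the job is still not done after 10 attempts, tenacity raises RetryError, because there is no underlying exception to re-raise.

This is useful when:

Waiting for a background job to finish
Waiting for a file transcode to finish
Waiting for an asynchronous workflow to settle

拍拍君 likes this feature a lot, because it makes retry apply not only to exceptions but also to "the state is not ready yet".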

6. In Practice: Using tenacity in an API Client

Knowing the decorator is not enough; in real code you usually want to build the retry policy into a client.

Here is a simplified example:

from dataclasses import dataclass
import requests
from tenacity import retry, stop_after_attempt, wait_random_exponential
from tenacity import retry_if_exception_type


class RetryableAPIError(Exception):
    pass


class FatalAPIError(Exception):
    pass


@dataclass
class ChatPTTClient:
    base_url: str
    api_key: str

    @retry(
        stop=stop_after_attempt(5),
        wait=wait_random_exponential(multiplier=1, max=16),
        retry=retry_if_exception_type((requests.Timeout, RetryableAPIError)),
        reraise=True,
    )
    def summarize(self, text: str) -> dict:
        response = requests.post(
            f"{self.base_url}/summaries",
            headers={"Authorization": f"Bearer {self.api_key}"},
            json={"text": text},
            timeout=10,
        )

        if response.status_code in {429, 500, 502, 503, 504}:
            raise RetryableAPIError(f"temporary error: {response.status_code}")

        if response.status_code >= 400:
            raise FatalAPIError(response.text)

        return response.json()

That keeps the call site clean:

client = ChatPTTClient(base_url="https://api.example.com", api_key="secret")
result = client.summarize("拍拍醬 wants to tidy up today's meeting notes")
print(result)

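The caller does not need to know the retry details, only that the client itself is resilient.

6.1 Should 429 be retried?

In most cases 429 is well worth retrying, provided you pair it with a sensible wait.

If you hit a rate limit:

Use exponential backoff
Preferably honor the Retry-After header
Avoid retrying forever

tenacity supports far more advanced customization, but for most services the following strategy is a good starting point: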

@retry(
    stop=stop_after_attempt(6),
    wait=wait_random_exponential(multiplier=2, max=60),
    reraise=True,
)
def call_rate_limited_api():
    ...

If your upstream is strict, remember that retry is a safety net, not a free pass. You still need to rate-limit your own requests.
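Honoring Retry-After starts with parsing it, since the header may carry either a number of seconds or an HTTP date. The helper below is a standard-library sketch with a made-up name; wiring it into a retry policy (for example through a custom wait callable that reads the value from your own exception) is left as an assumption about how your client is structured.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def retry_after_seconds(header_value: Optional[str],
                        now: Optional[datetime] = None) -> Optional[float]:
    """Return how long the server asked us to wait, or None if unusable."""
    if not header_value:
        return None
    value = header_value.strip()
    if value.isdigit():  # delay-seconds form, e.g. "120"
        return float(value)
    try:
        when = parsedate_to_datetime(value)  # HTTP-date form
    except (TypeError, ValueError):
        return None
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    # A date in the past means "retry now", never a negative wait.
    return max(0.0, (when - now).total_seconds())


fixed_now = datetime(2015, 10, 21, 7, 27, 0, tzinfo=timezone.utc)
from_seconds = retry_after_seconds("120")
from_date = retry_after_seconds("Wed, 21 Oct 2015 07:28:00 GMT", now=fixed_now)
```

With requests, the input would typically come from `response.headers.get("Retry-After")`; when the helper returns None, falling back to exponential backoff is a reasonable default.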

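6.2 Combine retries with a `requests` timeout

Many people add retries but forget to set a timeout, so a single request can hang for a very long time and the overall experience gets worse.

拍拍君's rules are simple:

Set a timeout on every request
Keep the number of retries bounded
Do not make the backoff too aggressive

For example: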

response = requests.get(url, timeout=5)

This is usually much healthier than setting no timeout at all and then retrying three times.
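It also helps to work out the worst case a caller might wait once timeouts and retries are combined. The sketch below is simple arithmetic with a hypothetical helper; it treats the timeout as a hard per-attempt limit, which is a simplification, because the requests timeout applies to connecting and to each read rather than to the whole response.

```python
from typing import Sequence


def worst_case_seconds(max_attempts: int, timeout: float,
                       waits: Sequence[float]) -> float:
    """Upper bound: every attempt uses its full timeout, plus the waits in between."""
    waits_used = list(waits)[: max_attempts - 1]  # no wait after the final attempt
    return max_attempts * timeout + sum(waits_used)


# 3 attempts, 5 s timeout, fixed 2 s wait between attempts.
fixed = worst_case_seconds(3, timeout=5, waits=[2, 2])
# 5 attempts, 10 s timeout, exponential waits of 1, 2, 4 and 8 s.
backoff = worst_case_seconds(5, timeout=10, waits=[1, 2, 4, 8])
print(fixed, backoff)
```

If the resulting number is longer than your user or upstream caller is willing to wait, lower the timeout or the attempt count before shipping the policy.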

7. Async Works Too: `async def` Gets Retries the Same Way

If you are writing an asyncio application, tenacity works there directly as well.

import httpx
from tenacity import retry, stop_after_attempt, wait_random_exponential
from tenacity import retry_if_exception_type


@retry(
    stop=stop_after_attempt(5),
    wait=wait_random_exponential(multiplier=1, max=20),
    retry=retry_if_exception_type((httpx.ReadTimeout, httpx.ConnectError)),
    reraise=True,
)
async def fetch_embeddings(text: str) -> dict:
    async with httpx.AsyncClient(timeout=10) as client:
        response = await client.post(
            "https://api.example.com/embeddings",
            json={"text": text},
        )
        response.raise_for_status()
        return response.json()

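It looks almost identical to the synchronous version, which is really nice. Note that raise_for_status() raises httpx.HTTPStatusError, which is not in the retry tuple above, so 4xx and 5xx responses propagate immediately instead of being retried.

If you happen to use httpx, take a moment to revisit the Python HTTPX tutorial 拍拍 wrote earlier. httpx and tenacity go well together.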

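7.1 What needs special attention in async code?

拍拍君 has three reminders:

Do not use retry as a way to swallow errors
- The real error must still surface in the end
Watch the total latency
- Being async does not mean you can retry endlessly without thinking
Beware of repeated side effects
- If your API charges a payment, creates an order, or sends an email, confirm that it is idempotent before retrying

The third point is very important.

If a request is not idempotent, a retry can cause you to:

Charge a customer twice
Create duplicate resources
Send the same notification again

So retry is best suited to:

Read operations such as GET
POST requests that are safe to resend
APIs that accept an idempotency key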

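8. Add Logging and Observability So You Know How Many Retries Happened

Having retries in your code does not mean you can see them.

In production, 拍拍君 strongly recommends making retry events observable at the very least. Otherwise you end up with: "the system is slow every now and then, but nobody knows it just retried five times."

tenacity provides several hooks; the most common one is before_sleep_log: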

import logging
from tenacity import retry, stop_after_attempt, wait_fixed, before_sleep_log

logger = logging.getLogger(__name__)

@retry(
    stop=stop_after_attempt(4),
    wait=wait_fixed(2),
    before_sleep=before_sleep_log(logger, logging.WARNING),
    reraise=True,
)
def sync_metrics():
    raise TimeoutError("metrics service timeout")

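With this, a warning is logged before each retry.

If you just learned Python argparse yesterday, you can also expose the retry parameters as CLI options so the tool can be tuned per environment, for example:

Only 2 attempts for local testing
5 attempts in CI
Exponential backoff in production

That gives you a lot of flexibility.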

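8.1 Custom retry state output

You can also read RetryCallState for finer control, although most projects do not need to go that deep at the start.

The example below is only an illustration: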

from tenacity import retry, stop_after_attempt, wait_fixed


def before_sleep(retry_state):
    print(
        f"attempt {retry_state.attempt_number} failed, "
        f"waiting {retry_state.next_action.sleep} seconds before the next retry"
    )


@retry(
    stop=stop_after_attempt(3),
    wait=wait_fixed(1),
    before_sleep=before_sleep,
    reraise=True,
)
def fragile_task():
    raise RuntimeError("broken again")