Python Profiling: A Complete Guide to Performance Analysis with cProfile + line_profiler

Python Learning: this article is part of a series.

§ 33: This article

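1. Introduction

Hi, 拍拍君 here! 🐍

Yesterday we learned how multiprocessing gets around the GIL so CPU-bound tasks can fly. But wait: are you sure you know where your program is actually slow?

拍拍君 has seen too many optimization efforts go like this:

"This program is so slow."
"I bet it's this for loop."
Spend three days rewriting that code.
Run it again and find that... the speed hasn't changed at all 😇

That is the tragedy of optimizing by gut feeling. Donald Knuth said that "premature optimization is the root of all evil", but more precisely, optimizing without measuring is the real root of all evil.

Today 拍拍君 will teach you two tools for Python performance analysis:

cProfile: Python's built-in function-level profiler, which finds the functions that take the most time
line_profiler: line-by-line analysis, down to the execution time of each line of code

Pair them with the visualization tool snakeviz and you can spot the bottleneck at a glance. No more blind guessing: optimize scientifically! 🔬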

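2. Installation

cProfile is part of the Python standard library, so there is nothing to install. The other tools need to be installed:

# line_profiler: line-by-line analysis
pip install line_profiler

# snakeviz: interactive visualization of cProfile results
pip install snakeviz

# If you use uv
uv pip install line_profiler snakeviz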

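3. Preparing the Example: a Script That "Looks Perfectly Normal"

To demonstrate profiling, 拍拍君 has prepared a program that looks fine but actually has a bottleneck. Suppose we need to process a set of student score records: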

# slow_program.py
import random
import statistics

def generate_students(n: int) -> list[dict]:
    """Generate n student records."""
    names = ["拍拍君", "拍拍醬", "chatPTT", "小明", "小華"]
    students = []
    for _ in range(n):
        student = {
            "name": random.choice(names),
            "scores": [random.randint(0, 100) for _ in range(50)],
        }
        students.append(student)
    return students

def calculate_stats(scores: list[int]) -> dict:
    """Compute summary statistics for one student."""
    return {
        "mean": statistics.mean(scores),
        "median": statistics.median(scores),
        "stdev": statistics.stdev(scores),
        "max": max(scores),
        "min": min(scores),
    }

def find_top_students(students: list[dict], threshold: float) -> list[str]:
    """Return the names of students whose mean score exceeds the threshold."""
    top = []
    for student in students:
        stats = calculate_stats(student["scores"])
        if stats["mean"] > threshold:
            top.append(student["name"])
    return top

def create_report(students: list[dict]) -> str:
    """Build the full report, one line per student."""
    lines = []
    for student in students:
        stats = calculate_stats(student["scores"])
        line = (
            f"{student['name']}: "
            f"mean={stats['mean']:.1f}, "
            f"median={stats['median']:.1f}, "
            f"stdev={stats['stdev']:.1f}"
        )
        lines.append(line)
    return "\n".join(lines)

def main():
    print("Generating student data...")
    students = generate_students(5000)

    print("Finding top students...")
    top = find_top_students(students, 60.0)

    print("Building report...")
    report = create_report(students)

    print(f"{len(top)} top students found")
    print(f"Report length: {len(report)} characters")

if __name__ == "__main__":
    main()

This program takes a few seconds to run, but where exactly is the time going? 🤔

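4. cProfile: Function-Level Profiling

4.1 The Simplest Usage: the Command Line

No code changes are needed; just run it from the command line:

python -m cProfile slow_program.py

The output is a long table of function-call statistics: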

         1250012 function calls in 3.456 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
    10000    2.890    0.000    3.210    0.000 slow_program.py:19(calculate_stats)
        1    0.123    0.123    1.567    1.567 slow_program.py:33(find_top_students)
        1    0.134    0.134    1.634    1.634 slow_program.py:42(create_report)
        1    0.200    0.200    0.200    0.200 slow_program.py:8(generate_students)
   ...

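4.2 Reading the Columns

Column	Meaning
`ncalls`	Number of times the function was called
`tottime`	Time spent in the function itself (excluding sub-functions)
`percall` (first)	tottime / ncalls
`cumtime`	Cumulative time spent in the function (including sub-functions)
`percall` (second)	cumtime / number of primitive calls

Focus on tottime and cumtime. If cumtime is large but tottime is small, the time is being spent in the sub-functions it calls.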
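
To see the difference between tottime and cumtime in actual numbers, here is a minimal sketch. The functions inner_work and outer_wrapper are made-up stand-ins (they are not part of slow_program.py), and get_stats_profile() assumes Python 3.9 or newer.

```python
import cProfile
import pstats


def inner_work(n: int) -> int:
    """Does the real computation, so the time lands in its own tottime."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def outer_wrapper() -> int:
    """Only delegates: small tottime, but a cumtime that includes inner_work."""
    total = 0
    for _ in range(50):
        total += inner_work(20_000)
    return total


with cProfile.Profile() as pr:
    outer_wrapper()

# get_stats_profile() exposes the same table as plain Python data.
profiles = pstats.Stats(pr).get_stats_profile().func_profiles
outer = profiles["outer_wrapper"]
inner = profiles["inner_work"]

print(f"outer_wrapper: tottime={outer.tottime:.3f}s cumtime={outer.cumtime:.3f}s")
print(f"inner_work:    tottime={inner.tottime:.3f}s cumtime={inner.cumtime:.3f}s")
```

On a typical run, outer_wrapper shows a tiny tottime but a cumtime at least as large as inner_work's, which is the "time is in the callee" pattern described above.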

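4.3 Sorting: Finding the Most Expensive Functions

The default order (by name) is hard to read, so add the -s option to sort:

# Sort by cumulative time (the most common choice)
python -m cProfile -s cumulative slow_program.py

# Sort by the function's own time
python -m cProfile -s tottime slow_program.py

# Sort by call count
python -m cProfile -s calls slow_program.py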

4.4 Using cProfile Inside Your Code

If you only want to profile a specific region:

import cProfile
import pstats

# Option 1: use it as a context manager (Python 3.8+)
with cProfile.Profile() as pr:
    result = find_top_students(students, 60.0)

stats = pstats.Stats(pr)
stats.sort_stats("cumulative")
stats.print_stats(20)  # print only the top 20 entries

# Option 2: save to a file and analyze it later
cProfile.run("main()", "output.prof")

# Later, load the file with pstats for analysis
stats = pstats.Stats("output.prof")
stats.strip_dirs()
stats.sort_stats("cumulative")
stats.print_stats(10)
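
If you find yourself repeating this boilerplate, the pattern can be wrapped in a small helper. This is only a sketch: profile_call and busy_loop are hypothetical names invented for illustration, and the helper relies on the same Python 3.8+ context-manager support as Option 1.

```python
import cProfile
import io
import pstats
from pstats import SortKey


def profile_call(func, *args, top=10, sort_key=SortKey.CUMULATIVE, **kwargs):
    """Run func under cProfile and return (result, report text for the top entries)."""
    with cProfile.Profile() as profiler:
        result = func(*args, **kwargs)
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.strip_dirs().sort_stats(sort_key).print_stats(top)
    return result, buffer.getvalue()


def busy_loop(n):
    """Stand-in workload so the example runs on its own."""
    total = 0
    for i in range(n):
        total += i % 7
    return total


result, report = profile_call(busy_loop, 200_000, top=5)
print(report)
```

Returning the report as a string (instead of printing inside the helper) makes it easy to log it or compare two runs.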

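4.5 Visualizing with snakeviz

Eyes glazing over from text reports? snakeviz gives you an interactive, browser-based chart:

# Save the profile to a file first
python -m cProfile -o output.prof slow_program.py

# Open it with snakeviz (launches your browser automatically)
snakeviz output.prof

snakeviz shows a sunburst chart or an icicle chart, so you can see at a glance:

Which function takes the largest share (the largest area)
The call hierarchy (from the root outward in the sunburst, top down in the icicle)
Click to zoom in on the details

In our example you will clearly see that calculate_stats is called 10000 times (5000 from find_top_students + 5000 from create_report) and accounts for most of the time.

💡 拍拍君's tip: sorting by cumtime is usually the most useful, because it tells you where the time goes from the user's point of view.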
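
If you prefer to stay in the terminal, pstats can also answer "who is calling this function?" through print_callers. The sketch below uses made-up functions (shared_helper, caller_a, caller_b) that mimic the situation of calculate_stats being called from two places; it is not the article's script.

```python
import cProfile
import io
import pstats


def shared_helper(values):
    """Hypothetical helper that two different callers both use."""
    return sum(values) / len(values)


def caller_a(rows):
    results = []
    for row in rows:
        results.append(shared_helper(row))
    return results


def caller_b(rows):
    results = []
    for row in rows:
        results.append(shared_helper(row))
    return results


rows = [[1, 2, 3, 4]] * 1000

with cProfile.Profile() as pr:
    caller_a(rows)
    caller_b(rows)

buffer = io.StringIO()
stats = pstats.Stats(pr, stream=buffer).strip_dirs().sort_stats("cumulative")
stats.print_callers("shared_helper")  # restrict the listing to this function
callers_report = buffer.getvalue()
print(callers_report)
```

The report lists each caller of shared_helper with its own call count and time, which is how you would notice that the same work is being requested from two places.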

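5. line_profiler: Line-by-Line Profiling

cProfile tells you which function is slow, but sometimes you need more precision: which line inside the function is slow?

5.1 Basic Usage: Add the `@profile` Decorator

Add @profile to the function you want to analyze. No import is needed when the script runs under kernprof, which injects profile for you; running the same file with plain python raises a NameError.

# slow_program_lp.py
import statistics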

@profile
def calculate_stats(scores: list[int]) -> dict:
    """Compute summary statistics for one student."""
    return {
        "mean": statistics.mean(scores),
        "median": statistics.median(scores),
        "stdev": statistics.stdev(scores),
        "max": max(scores),
        "min": min(scores),
    }

Then run it with kernprof:

kernprof -l -v slow_program_lp.py

-l: line-by-line mode
-v: print the results immediately (otherwise they are only saved to a file)
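
Because @profile only exists while kernprof is running the script, a common workaround is to fall back to a do-nothing decorator. The sketch below assumes kernprof -l places profile in builtins (which is how the "no import needed" behavior works); add_scores is a made-up example function.

```python
import builtins

# Under `kernprof -l`, builtins.profile is the real line profiler.
# Under plain `python`, it is missing, so use a pass-through decorator instead.
profile = getattr(builtins, "profile", lambda func: func)


@profile
def add_scores(scores: list[int]) -> int:
    """Tiny example function; profiled only when run through kernprof."""
    return sum(scores)


print(add_scores([90, 85, 77]))
```

With this in place the same file runs under both python and kernprof, so you don't have to strip the decorators before committing.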

5.2 Reading the Line-by-Line Output

Timer unit: 1e-06 s

Total time: 2.856 s
File: slow_program_lp.py
Function: calculate_stats at line 20

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
    20                                           @profile
    21                                           def calculate_stats(scores):
    22     10000     895234.0     89.5     31.3       "mean": statistics.mean(scores),
    23     10000     623412.0     62.3     21.8       "median": statistics.median(scores),
    24     10000    1125890.0    112.6     39.4       "stdev": statistics.stdev(scores),
    25     10000      98765.0      9.9      3.5       "max": max(scores),
    26     10000      87654.0      8.8      3.1       "min": min(scores),

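Clear at a glance! statistics.stdev() takes nearly 40% of the time, followed by statistics.mean() and statistics.median(). (The listing is abridged: the docstring, return, and closing-brace lines are omitted.)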

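5.3 Column Reference

Column	Meaning
`Hits`	Number of times the line was executed
`Time`	Total time spent on the line, in timer units (microseconds here)
`Per Hit`	Average time per execution (Time / Hits)
`% Time`	Share of the function's total time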
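
As a quick sanity check, the derived columns can be recomputed by hand. The numbers below are copied from the statistics.stdev row and the header of the sample output in 5.2; the variable names are just for illustration.

```python
TIMER_UNIT_S = 1e-06      # "Timer unit" from the report header
TOTAL_TIME_S = 2.856      # "Total time" from the report header

hits = 10_000
time_units = 1_125_890.0  # "Time" column of the statistics.stdev line

per_hit = time_units / hits                                  # average timer units per execution
percent = time_units * TIMER_UNIT_S / TOTAL_TIME_S * 100     # share of the function's total time

print(f"Per Hit: {per_hit:.1f}")   # matches the 112.6 in the report
print(f"% Time:  {percent:.1f}")   # matches the 39.4 in the report
```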

5.4 Using It from Code (Without `@profile`)

If you don't want to decorate the original source, you can drive the profiler dynamically from code:

from line_profiler import LineProfiler

def analyze_performance():
    lp = LineProfiler()

    # Register the functions to analyze
    lp.add_function(calculate_stats)
    lp.add_function(find_top_students)

    # Wrap the entry point and run it
    lp_wrapper = lp(main)
    lp_wrapper()

    # Print the results
    lp.print_stats()

    # Optionally save them to a file
    lp.dump_stats("line_profile.lprof")

analyze_performance()

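6. In Practice: What to Do After Finding the Bottleneck

6.1 Findings

From the profiling results, we found that:

calculate_stats is called 10000 times (5000 from each of two call sites)
statistics.stdev() is the slowest line
The statistics module's functions are much slower than the built-in max/min

6.2 Strategy 1: Cache Repeated Computation

calculate_stats is called twice for the same student: once in find_top_students and once in create_report. Compute the result once and store it: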

def main():
    students = generate_students(5000)

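    # Compute once, reuse everywhere
    for student in students:
        student["stats"] = calculate_stats(student["scores"])

    top = [s["name"] for s in students if s["stats"]["mean"] > 60.0]
    report = "\n".join(
        f"{s['name']}: mean={s['stats']['mean']:.1f}, "
        f"median={s['stats']['median']:.1f}, "
        f"stdev={s['stats']['stdev']:.1f}"
        for s in students
    )

That single step cuts the number of calls from 10000 to 5000! 🎉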
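
You can confirm the halving without a profiler by simply counting calls. This is a scaled-down sketch (200 students instead of 5000) with a trimmed calculate_stats; the calls counter and the spread lists are illustrative additions, not part of the original script.

```python
import random
import statistics

calls = {"count": 0}  # mutable counter so the helper can update it


def calculate_stats(scores):
    """Trimmed-down version of the article's function that also counts its calls."""
    calls["count"] += 1
    return {"mean": statistics.mean(scores), "stdev": statistics.stdev(scores)}


random.seed(42)
students = [
    {"name": f"student-{i}", "scores": [random.randint(0, 100) for _ in range(50)]}
    for i in range(200)
]

# Before: each pass recomputes the statistics for every student.
top = [s["name"] for s in students if calculate_stats(s["scores"])["mean"] > 60.0]
spread = [calculate_stats(s["scores"])["stdev"] for s in students]
calls_without_cache = calls["count"]

# After: compute once per student, then both passes read the stored dict.
calls["count"] = 0
for s in students:
    s["stats"] = calculate_stats(s["scores"])
top_cached = [s["name"] for s in students if s["stats"]["mean"] > 60.0]
spread_cached = [s["stats"]["stdev"] for s in students]
calls_with_cache = calls["count"]

print(f"without cache: {calls_without_cache} calls, with cache: {calls_with_cache} calls")
# prints: without cache: 400 calls, with cache: 200 calls
```

Both versions produce identical results; only the amount of work changes.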

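6.3 Strategy 2: Use a Faster Implementation

The statistics module favors precision (it handles Decimal and Fraction), which makes it slower. If your data is plain numbers, you can use NumPy: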

import numpy as np

def calculate_stats_fast(scores: list[int]) -> dict:
    """Speed up the statistics with NumPy."""
    arr = np.array(scores)
    return {
        "mean": float(np.mean(arr)),
        "median": float(np.median(arr)),
        "stdev": float(np.std(arr, ddof=1)),
        "max": int(np.max(arr)),
        "min": int(np.min(arr)),
    }

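Or, if you don't want to add a NumPy dependency, pure Python can also be faster. With lists this short (50 scores each), NumPy's per-call overhead is significant, so time both versions on your own data: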

def calculate_stats_pure(scores: list[int]) -> dict:
    """Fast pure-Python version."""
    n = len(scores)
    mean = sum(scores) / n
    sorted_scores = sorted(scores)
    mid = n // 2
    median = (
        sorted_scores[mid]
        if n % 2
        else (sorted_scores[mid - 1] + sorted_scores[mid]) / 2
    )
    variance = sum((x - mean) ** 2 for x in scores) / (n - 1)
    return {
        "mean": mean,
        "median": median,
        "stdev": variance**0.5,
        "max": sorted_scores[-1],
        "min": sorted_scores[0],
    }
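
Before trusting a hand-rolled replacement, it is worth checking that it agrees with the statistics module. A minimal sketch of such a check follows; it repeats calculate_stats_pure so the block runs on its own, and the seed and tolerance are arbitrary choices.

```python
import math
import random
import statistics


def calculate_stats_pure(scores):
    """Same pure-Python implementation as above, repeated for a standalone check."""
    n = len(scores)
    mean = sum(scores) / n
    sorted_scores = sorted(scores)
    mid = n // 2
    median = (
        sorted_scores[mid]
        if n % 2
        else (sorted_scores[mid - 1] + sorted_scores[mid]) / 2
    )
    variance = sum((x - mean) ** 2 for x in scores) / (n - 1)
    return {
        "mean": mean,
        "median": median,
        "stdev": variance**0.5,
        "max": sorted_scores[-1],
        "min": sorted_scores[0],
    }


random.seed(0)
scores = [random.randint(0, 100) for _ in range(50)]
fast = calculate_stats_pure(scores)

# Compare against the reference implementation; floats need a tolerance.
assert math.isclose(fast["mean"], statistics.mean(scores))
assert math.isclose(fast["median"], statistics.median(scores))
assert math.isclose(fast["stdev"], statistics.stdev(scores))
assert fast["max"] == max(scores) and fast["min"] == min(scores)
print("pure-Python results match the statistics module")
```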

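6.4 Before and After

Use cProfile to verify the effect of the optimization:

import cProfile

# Before (assumes the two versions of main are named main_original and main_optimized)
cProfile.run("main_original()", sort="cumulative")

# After
cProfile.run("main_optimized()", sort="cumulative")

# Before: ~3.5 s
# Cache + statistics: ~1.8 s (-49%)
# Cache + numpy: ~0.3 s (-91%)
# Cache + pure Python: ~0.5 s (-86%)

See? Not guessed, but measured. 📊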
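
The percentages in those comments are plain relative changes against the 3.5-second baseline. The sketch below recomputes them from the timings quoted above; percent_change is a helper name made up for this example, and your own timings will differ.

```python
def percent_change(before_s: float, after_s: float) -> float:
    """Relative change in runtime; negative means the code got faster."""
    return (after_s - before_s) / before_s * 100


BASELINE_S = 3.5  # "before" timing quoted above

for label, after_s in [
    ("cache + statistics", 1.8),
    ("cache + numpy", 0.3),
    ("cache + pure Python", 0.5),
]:
    change = percent_change(BASELINE_S, after_s)
    speedup = BASELINE_S / after_s
    print(f"{label}: {change:.0f}% ({speedup:.1f}x faster)")
```

Reporting both the percentage and the speedup factor helps, since "-91%" and "roughly 12x faster" describe the same result but read very differently.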

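7. timeit: a Handy Companion for Micro-Timing

Sometimes you only want to compare the speed of two small snippets, without going to the trouble of running a profiler. Python's built-in timeit is great for that: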

import timeit

# Compare the speed of different ways to write the same thing
setup = "data = list(range(1000))"

# Option 1: list comprehension
t1 = timeit.timeit("[x**2 for x in data]", setup=setup, number=10000)

# Option 2: map
t2 = timeit.timeit("list(map(lambda x: x**2, data))", setup=setup, number=10000)

print(f"List comp: {t1:.3f}s")
print(f"Map:       {t2:.3f}s")

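You can also use it straight from the command line:

# Compare string-concatenation approaches
python -m timeit -s "parts = ['hello'] * 100" "' '.join(parts)"
python -m timeit -s "parts = ['hello'] * 100" "s = ''" "for p in parts: s += ' ' + p"

💡 拍拍君's tip: timeit is for comparing tiny code snippets (microseconds to milliseconds), while cProfile is for finding bottlenecks across a whole program. They work best together.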
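
A single timeit run can be skewed by whatever else the machine is doing. One common approach is timeit.repeat plus taking the minimum, sketched below; best_time is a hypothetical helper name, and the repeat and number values are arbitrary.

```python
import timeit

setup = "data = list(range(1000))"


def best_time(stmt: str, repeat: int = 5, number: int = 1000) -> float:
    """Return the fastest per-loop time in seconds across `repeat` runs."""
    runs = timeit.repeat(stmt, setup=setup, repeat=repeat, number=number)
    return min(runs) / number


comp = best_time("[x**2 for x in data]")
mapped = best_time("list(map(lambda x: x**2, data))")

print(f"list comprehension: {comp * 1e6:.1f} µs per loop")
print(f"map + lambda:       {mapped * 1e6:.1f} µs per loop")
```

The minimum is used rather than the mean because slower runs mostly reflect interference from other processes, not the code being measured.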

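8. Advanced: Other Useful Profiling Tools

Here are a few more advanced tools from 拍拍君 to round out your toolbox:

8.1 memory_profiler: Memory Analysis

pip install memory_profiler

from memory_profiler import profile

@profile
def memory_hungry():
    """This function uses a lot of memory."""
    big_list = [i ** 2 for i in range(1_000_000)]
    big_dict = {i: str(i) for i in range(500_000)}
    del big_list  # release the memory
    return big_dict

if __name__ == "__main__":
    memory_hungry()

python -m memory_profiler your_script.py

The output shows the memory increment for each line. Super handy!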

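8.2 py-spy: a Non-Intrusive Profiler

pip install py-spy

The coolest part is that it can profile a program that is already running, with no code changes:

# Run a program and generate a flame graph
py-spy record -o profile.svg -- python slow_program.py

# Attach to a running program (requires its PID)
py-spy record -o profile.svg --pid 12345

# Live top-style view
py-spy top --pid 12345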

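8.3 scalene: an All-Round Profiler

pip install scalene

scalene slow_program.py

scalene can profile CPU time, memory usage, and GPU time at once, and it separates time spent in Python code from time spent in native C code. The output is very polished.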

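9. Profiling Best Practices

拍拍君's key principles:

✅ DO

Measure first, then optimize: never go by gut feeling
Profile in a realistic environment: use realistically sized data
Run several times and average: smooth out random variation
Start with the biggest bottleneck: that is where the payoff is highest
Measure again after optimizing: confirm it really improved

❌ DON'T

Don't profile a dataset that is too small: the bottleneck won't show
Don't optimize several places at once: you can't attribute the improvement
Don't sacrifice readability for tiny performance gains
Don't profile in debug mode: the results are inaccurate

📋 Profiling Workflow

1. Notice a performance problem
2. Use cProfile to find the slow functions (macro view)
3. Use line_profiler to find the slow lines (micro view)
4. Analyze the cause (algorithm? I/O? repeated computation?)
5. Decide on an optimization strategy
6. Implement the optimization
7. Profile again to verify the effect
8. Go back to step 2 until you are satisfied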

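Conclusion

Today we covered a complete toolchain for Python performance analysis:

cProfile: built in, zero configuration, quickly finds which function is slow
line_profiler: line-by-line analysis that pinpoints the bottleneck
snakeviz: interactive visualization that shows the problem at a glance
timeit: micro-timing for comparing small snippets
py-spy, scalene: advanced tools with more powerful analysis

Remember 拍拍君's words: the first step of optimization is always measuring, not guessing. Find the real bottleneck with tools first, then treat the actual cause. That is how professional engineers work.

Next time your program feels slow, don't rush to rewrite it. Run cProfile first! 🔬

Happy profiling! 🐍✨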
