Python itertools: The Swiss Army Knife of Iterators

Python Learning - This article is part of a series.

§ 24: This article

1. Introduction

Hi, 拍拍君 here! 🔄

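Have you ever run into this: you want to iterate over two lists back to back, generate every permutation or combination, or process consecutive runs of data in groups, and you end up writing a pile of nested for loops and temporary variables?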

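Python's itertools module is here to rescue you. It's the standard library's Swiss Army knife for iterators: a full set of fast, memory-friendly utility functions that let you solve in a single line iteration problems that would otherwise take ten.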

Today 拍拍君 will walk you through the most practical tools in itertools!

2. Why Use itertools?

Let's start with a simple example. Suppose you have three lists and want to process them one after another:

# Without itertools
list_a = [1, 2, 3]
list_b = [4, 5, 6]
list_c = [7, 8, 9]

combined = list_a + list_b + list_c  # builds a brand-new list, costing extra memory
for item in combined:
    print(item)

# With itertools
from itertools import chain

list_a = [1, 2, 3]
list_b = [4, 5, 6]
list_c = [7, 8, 9]

for item in chain(list_a, list_b, list_c):  # lazy iteration, no new list is built
    print(item)

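What's the difference? chain doesn't merge the three lists into a new one; it lazily walks through each iterable in turn. The + version allocates a new list holding every element, so with large data the difference becomes very noticeable.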

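The core philosophy of itertools:

✅ Lazy evaluation: values are computed only as you need them
✅ Memory-friendly: results are never all loaded into memory at once
✅ Composable: tools stack together like building blocks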
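To see the "composable" point in action, here's a small sketch of our own (not one of the article's original examples) that stacks three tools: count produces an endless stream, accumulate turns it into running sums, and islice cuts it off so nothing infinite is ever built.

```python
from itertools import accumulate, count, islice

# count(1)        -> 1, 2, 3, ... (infinite)
# accumulate(...) -> running sums 1, 3, 6, ... (still infinite, still lazy)
# islice(..., 5)  -> stop after 5 items
triangular = list(islice(accumulate(count(1)), 5))
print(triangular)  # [1, 3, 6, 10, 15]
```

Because every stage is lazy, only the values you actually ask for get computed.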

3. Infinite Iterators

itertools provides three iterators that can produce infinite sequences. Pair them with break or islice, or they'll run until the end of time.

3.1 count: An Infinite Counter

from itertools import count

# start at 10, step by 2
for i in count(10, 2):
    if i > 20:
        break
    print(i)
# 10, 12, 14, 16, 18, 20

It's great as an auto-incrementing ID generator:

from itertools import count

id_generator = count(1)

users = ["Alice", "Bob", "Charlie"]
user_records = [{"id": next(id_generator), "name": name} for name in users]
print(user_records)
# [{'id': 1, 'name': 'Alice'}, {'id': 2, 'name': 'Bob'}, {'id': 3, 'name': 'Charlie'}]

3.2 cycle: Endless Cycling

from itertools import cycle, islice

colors = cycle(["red", "green", "blue"])

# take the first 7
print(list(islice(colors, 7)))
# ['red', 'green', 'blue', 'red', 'green', 'blue', 'red']

A practical use case, round-robin task assignment:

from itertools import cycle

workers = cycle(["Alice", "Bob", "Charlie"])
tasks = ["task_1", "task_2", "task_3", "task_4", "task_5"]

assignments = {task: next(workers) for task in tasks}
print(assignments)
# {'task_1': 'Alice', 'task_2': 'Bob', 'task_3': 'Charlie',
#  'task_4': 'Alice', 'task_5': 'Bob'}
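A variant worth knowing, sketched with the same names: since zip stops at its shortest input, you can pair tasks with an infinite cycle directly, without calling next() yourself.

```python
from itertools import cycle

workers = ["Alice", "Bob", "Charlie"]
tasks = ["task_1", "task_2", "task_3", "task_4", "task_5"]

# zip stops when tasks runs out, so the infinite cycle() is safe here
assignments = dict(zip(tasks, cycle(workers)))
print(assignments)
# {'task_1': 'Alice', 'task_2': 'Bob', 'task_3': 'Charlie',
#  'task_4': 'Alice', 'task_5': 'Bob'}
```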

3.3 repeat: The Same Value Over and Over

from itertools import repeat

# repeat(2) alone is infinite; map stops as soon as range(10) is exhausted
squares = map(pow, range(10), repeat(2))
print(list(squares))
# [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

Here repeat(2) keeps yielding 2, and combined with map and pow it computes the squares. It has a bit more of a functional flavor than [x**2 for x in range(10)].
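repeat also accepts an optional second argument, times, which makes it finite. A short sketch of that form, plus a common pairing trick with zip:

```python
from itertools import repeat

# With times=3, repeat() stops by itself after three values
zeros = list(repeat(0, 3))
print(zeros)  # [0, 0, 0]

# Pair every item with the same constant label; zip stops at the shorter input
labeled = list(zip(repeat("user"), ["Alice", "Bob"]))
print(labeled)  # [('user', 'Alice'), ('user', 'Bob')]
```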

4. Terminating Iterators

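Tools in this category consume one or more iterables and stop on their own, either when the input runs out or when a condition ends them.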

4.1 chain: Concatenating Multiple Iterables

We've already seen it above; here's one more common trick:

from itertools import chain

# flatten a nested list (one level only)
nested = [[1, 2], [3, 4], [5, 6]]
flat = list(chain.from_iterable(nested))
print(flat)
# [1, 2, 3, 4, 5, 6]

chain.from_iterable takes an "iterable of iterables", which makes it ideal for flattening a nested structure by one level.
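Because from_iterable pulls the inner iterables lazily, you can also feed it a generator. In this sketch (made-up sentences), each sentence is split only when chain reaches it:

```python
from itertools import chain

sentences = ["the quick fox", "jumps over", "the lazy dog"]

# The generator expression is consumed one sentence at a time
words = list(chain.from_iterable(s.split() for s in sentences))
print(words)
# ['the', 'quick', 'fox', 'jumps', 'over', 'the', 'lazy', 'dog']
```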

4.2 islice: Slicing an Iterator

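It works like a list's [start:stop:step], but on any iterable (negative indices aren't supported, since an iterator's length isn't known in advance):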

from itertools import islice

# read the first 5 lines of a file (without loading the whole file into memory)
with open("huge_file.txt") as f:
    for line in islice(f, 5):
        print(line, end="")

from itertools import islice, count

# take a slice out of an infinite sequence
print(list(islice(count(100), 5, 10)))
# [105, 106, 107, 108, 109]
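islice also takes a step, just like slice notation. What it won't do is negative indexing. A quick sketch of both behaviors:

```python
from itertools import count, islice

# start=0, stop=10, step=3
stepped = list(islice(count(), 0, 10, 3))
print(stepped)  # [0, 3, 6, 9]

# Negative indices are rejected with ValueError
try:
    islice(range(10), -3, None)
    negative_ok = True
except ValueError:
    negative_ok = False
print(negative_ok)  # False
```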

4.3 compress: Filtering with Selectors

from itertools import compress

data = ["A", "B", "C", "D", "E"]
selectors = [True, False, True, False, True]

print(list(compress(data, selectors)))
# ['A', 'C', 'E']

Much more concise than a list comprehension with zip:

# equivalent, but wordier
result = [d for d, s in zip(data, selectors) if s]

4.4 dropwhile and takewhile: Conditional Filtering

from itertools import dropwhile, takewhile

data = [1, 3, 5, 2, 4, 6, 1]

# drop leading elements while they are less than 4
print(list(dropwhile(lambda x: x < 4, data)))
# [5, 2, 4, 6, 1]

# take leading elements while they are less than 4
print(list(takewhile(lambda x: x < 4, data)))
# [1, 3]

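Note: they only look at the leading run of elements that meet the condition. As soon as one element fails, dropwhile yields everything from there on and takewhile stops; neither one filters the whole list.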
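Putting the built-in filter next to takewhile on the same data makes the difference concrete:

```python
from itertools import takewhile

data = [1, 3, 5, 2, 4, 6, 1]

# filter tests every element
filtered = list(filter(lambda x: x < 4, data))
# takewhile stops at the first element that fails (5)
taken = list(takewhile(lambda x: x < 4, data))

print(filtered)  # [1, 3, 2, 1]
print(taken)     # [1, 3]
```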

4.5 accumulate: Running Accumulation

from itertools import accumulate
import operator

# default is a running sum
data = [1, 2, 3, 4, 5]
print(list(accumulate(data)))
# [1, 3, 6, 10, 15]

# running product
print(list(accumulate(data, operator.mul)))
# [1, 2, 6, 24, 120]

# running maximum
data2 = [3, 1, 4, 1, 5, 9, 2, 6]
print(list(accumulate(data2, max)))
# [3, 3, 4, 4, 5, 9, 9, 9]

Perfect for running totals or cumulative max/min!
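Since Python 3.8, accumulate also accepts an initial keyword, which prepends a starting value (so the output is one element longer than the input). A sketch with a made-up account balance:

```python
from itertools import accumulate

# Hypothetical deposits (+) and withdrawals (-) on an account that starts at 100
transactions = [20, -50, 30, -10]
balances = list(accumulate(transactions, initial=100))
print(balances)  # [100, 120, 70, 100, 90]
```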

4.6 starmap: map with Argument Unpacking

from itertools import starmap

pairs = [(2, 3), (4, 5), (6, 7)]

# raise each first element to the power of the second (pow, not multiplication)
print(list(starmap(pow, pairs)))
# [8, 1024, 279936]

# equivalent to
print([pow(a, b) for a, b in pairs])

When your data is already in tuple form, starmap is cleaner than unpacking it yourself.
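A sketch of that case, using hypothetical order records already stored as (item, quantity, unit_price) tuples; starmap unpacks each tuple into the function's three parameters:

```python
from itertools import starmap

orders = [("apple", 3, 10.0), ("pear", 1, 25.0), ("plum", 5, 4.0)]

def describe(item, qty, price):
    """Format one order line; each tuple is unpacked into these three arguments."""
    return f"{item}: {qty} x {price} = {qty * price}"

lines = list(starmap(describe, orders))
print("\n".join(lines))
# apple: 3 x 10.0 = 30.0
# pear: 1 x 25.0 = 25.0
# plum: 5 x 4.0 = 20.0
```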

5. Combinatoric Iterators

This is the flashiest part of itertools: permutations, combinations, and Cartesian products, each in a single line.

5.1 product: Cartesian Product

from itertools import product

# every combination of items from two lists
colors = ["red", "blue"]
sizes = ["S", "M", "L"]

for combo in product(colors, sizes):
    print(combo)
# ('red', 'S'), ('red', 'M'), ('red', 'L'),
# ('blue', 'S'), ('blue', 'M'), ('blue', 'L')

Replacing nested loops:

# product in place of three nested for loops
from itertools import product

for x, y, z in product(range(3), range(3), range(3)):
    if x + y + z == 3:
        print(f"({x}, {y}, {z})")

The repeat parameter takes the Cartesian product of an iterable with itself:

from itertools import product

# same as product([0, 1], [0, 1], [0, 1])
print(list(product([0, 1], repeat=3)))
# [(0,0,0), (0,0,1), (0,1,0), (0,1,1), (1,0,0), (1,0,1), (1,1,0), (1,1,1)]

5.2 permutations: Permutations

from itertools import permutations

# all permutations
print(list(permutations([1, 2, 3])))
# [(1,2,3), (1,3,2), (2,1,3), (2,3,1), (3,1,2), (3,2,1)]

# with a given length
print(list(permutations([1, 2, 3], 2)))
# [(1,2), (1,3), (2,1), (2,3), (3,1), (3,2)]

5.3 combinations and combinations_with_replacement

from itertools import combinations, combinations_with_replacement

# combinations (no repeated elements, order doesn't matter)
print(list(combinations([1, 2, 3, 4], 2)))
# [(1,2), (1,3), (1,4), (2,3), (2,4), (3,4)]

# combinations with replacement
print(list(combinations_with_replacement([1, 2, 3], 2)))
# [(1,1), (1,2), (1,3), (2,2), (2,3), (3,3)]
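The number of results follows the usual counting formulas, which you can check with math.perm and math.comb (both available since Python 3.8). A quick sanity check on 4 items taken 2 at a time:

```python
import math
from itertools import combinations, combinations_with_replacement, permutations

items = [1, 2, 3, 4]
n, r = len(items), 2

counts = {
    "permutations": len(list(permutations(items, r))),  # n! / (n - r)!
    "combinations": len(list(combinations(items, r))),  # n! / (r! (n - r)!)
    "with_replacement": len(list(combinations_with_replacement(items, r))),  # C(n + r - 1, r)
}
print(counts)  # {'permutations': 12, 'combinations': 6, 'with_replacement': 10}

assert counts["permutations"] == math.perm(n, r)
assert counts["combinations"] == math.comb(n, r)
assert counts["with_replacement"] == math.comb(n + r - 1, r)
```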

6. groupby: The Grouping Powerhouse

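groupby is the easiest itertools function to misuse, because it only groups consecutive elements that share a key. To get exactly one group per key, you must first sort the data by that same key.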

from itertools import groupby

data = [
    {"name": "Alice", "dept": "Engineering"},
    {"name": "Bob", "dept": "Engineering"},
    {"name": "Charlie", "dept": "Marketing"},
    {"name": "Diana", "dept": "Marketing"},
    {"name": "Eve", "dept": "Engineering"},
]

# ❌ Wrong! Eve ends up in a second Engineering group
for key, group in groupby(data, key=lambda x: x["dept"]):
    print(f"{key}: {[p['name'] for p in group]}")
# Engineering: ['Alice', 'Bob']
# Marketing: ['Charlie', 'Diana']
# Engineering: ['Eve']          ← split off!

# ✅ Correct! Sort first, then groupby
sorted_data = sorted(data, key=lambda x: x["dept"])
for key, group in groupby(sorted_data, key=lambda x: x["dept"]):
    print(f"{key}: {[p['name'] for p in group]}")
# Engineering: ['Alice', 'Bob', 'Eve']
# Marketing: ['Charlie', 'Diana']

A tip from 拍拍君: if you don't need lazy iteration, pandas.DataFrame.groupby or defaultdict(list) may be more intuitive. itertools.groupby is best suited to streaming data that is already sorted.
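For comparison, here's a sketch of the defaultdict(list) approach on the same data. It doesn't need sorting, because every record is appended to its department's list no matter where it appears:

```python
from collections import defaultdict

data = [
    {"name": "Alice", "dept": "Engineering"},
    {"name": "Bob", "dept": "Engineering"},
    {"name": "Charlie", "dept": "Marketing"},
    {"name": "Diana", "dept": "Marketing"},
    {"name": "Eve", "dept": "Engineering"},
]

by_dept = defaultdict(list)
for person in data:
    # A missing key starts out as an empty list automatically
    by_dept[person["dept"]].append(person["name"])

print(dict(by_dept))
# {'Engineering': ['Alice', 'Bob', 'Eve'], 'Marketing': ['Charlie', 'Diana']}
```

The trade-off is that every group is held in memory at once, which is exactly what groupby over sorted streaming data avoids.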

7. Practical Examples

7.1 Generating Password Candidates

from itertools import product
import string

# every 4-digit numeric PIN
pins = product(string.digits, repeat=4)
# 10^4 = 10,000 in total, generated lazily instead of loaded into memory all at once

# peek at the first 5
from itertools import islice
print(["".join(p) for p in islice(pins, 5)])
# ['0000', '0001', '0002', '0003', '0004']

7.2 Sliding Window

Python 3.12+ has itertools.batched for non-overlapping chunks, but an overlapping sliding window is something you have to assemble yourself:

from itertools import islice
from collections import deque

def sliding_window(iterable, n):
    """產生長度為 n 的滑動視窗"""
    iterator = iter(iterable)
    window = deque(islice(iterator, n), maxlen=n)
    if len(window) == n:
        yield tuple(window)
    for item in iterator:
        window.append(item)
        yield tuple(window)

data = [1, 2, 3, 4, 5, 6]
print(list(sliding_window(data, 3)))
# [(1,2,3), (2,3,4), (3,4,5), (4,5,6)]

💡 Since Python 3.10, itertools.pairwise handles the length-2 sliding window directly.

7.3 Flattening Deeply Nested Structures

def flatten(nested):
    """遞迴展平任意深度的巢狀 list"""
    for item in nested:
        if isinstance(item, (list, tuple)):
            yield from flatten(item)
        else:
            yield item

data = [1, [2, [3, 4]], [5, 6], [[7], 8]]
print(list(flatten(data)))
# [1, 2, 3, 4, 5, 6, 7, 8]

7.4 Grid Search with product

from itertools import product

learning_rates = [0.001, 0.01, 0.1]
batch_sizes = [16, 32, 64]
optimizers = ["adam", "sgd"]

for lr, bs, opt in product(learning_rates, batch_sizes, optimizers):
    print(f"lr={lr}, batch_size={bs}, optimizer={opt}")
    # train_model(lr=lr, batch_size=bs, optimizer=opt)

Much cleaner than three nested for loops!
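If you'd rather keep parameter names attached to their values, one common pattern (sketched here with the same search space) is to store the grid in a dict and zip its keys back onto each product tuple. train_model remains a hypothetical function:

```python
from itertools import product

grid = {
    "lr": [0.001, 0.01, 0.1],
    "batch_size": [16, 32, 64],
    "optimizer": ["adam", "sgd"],
}

# product(*grid.values()) yields tuples in key order; zip pairs them back up with the keys
configs = [dict(zip(grid.keys(), values)) for values in product(*grid.values())]

print(len(configs))  # 18 (3 * 3 * 2)
print(configs[0])    # {'lr': 0.001, 'batch_size': 16, 'optimizer': 'adam'}
# for config in configs:
#     train_model(**config)  # hypothetical training function
```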

8. Newer Tools (Python 3.10+ and 3.12+)

8.1 batched: Processing in Batches

from itertools import batched  # Python 3.12+

data = range(10)
for batch in batched(data, 3):
    print(batch)
# (0, 1, 2)
# (3, 4, 5)
# (6, 7, 8)
# (9,)

Perfect for batched API calls or bulk database writes!
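On Python versions before 3.12, a minimal stand-in can be built on islice. This is a sketch: batched_compat is our own name, and the walrus operator requires Python 3.8+.

```python
from itertools import islice

def batched_compat(iterable, n):
    """Yield tuples of up to n items; a hypothetical stand-in for itertools.batched."""
    if n < 1:
        raise ValueError("n must be at least one")
    iterator = iter(iterable)
    # An empty tuple means the input is exhausted, which ends the loop
    while batch := tuple(islice(iterator, n)):
        yield batch

print(list(batched_compat(range(10), 3)))
# [(0, 1, 2), (3, 4, 5), (6, 7, 8), (9,)]
```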

8.2 pairwise: Adjacent Pairs

from itertools import pairwise  # Python 3.10+

data = [1, 4, 2, 8, 5]
diffs = [b - a for a, b in pairwise(data)]
print(diffs)
# [3, -2, 6, -3]
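pairwise only arrived in Python 3.10. On older versions, a minimal stand-in (pairwise_compat is our own name) can be built with tee, which splits one iterator into two independent ones:

```python
from itertools import tee

def pairwise_compat(iterable):
    """Return an iterator of overlapping pairs (a, b), (b, c), ...; stand-in for itertools.pairwise."""
    first, second = tee(iterable)
    next(second, None)  # advance the second copy by one element
    return zip(first, second)

data = [1, 4, 2, 8, 5]
print(list(pairwise_compat(data)))
# [(1, 4), (4, 2), (2, 8), (8, 5)]
```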

9. Performance Comparison

Let's try to put a number on how much faster itertools actually is:

import timeit
from itertools import chain

list_a = list(range(100_000))
list_b = list(range(100_000))

# Approach 1: concatenate into a new list (list() then copies it a second time)
def concat_lists():
    return list(list_a + list_b)

# Approach 2: use chain
def chain_lists():
    return list(chain(list_a, list_b))

print(f"concat: {timeit.timeit(concat_lists, number=100):.3f}s")
print(f"chain:  {timeit.timeit(chain_lists, number=100):.3f}s")
# Typical result (varies by machine and Python version):
# concat: 0.450s
# chain:  0.350s   ← about 22% faster

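Keep in mind that this benchmark favors chain a little: both functions end up building a full 200,000-element list, but concat_lists copies its data twice. chain's clearest advantage is memory: when you only need to iterate and don't need the combined list, it never allocates one at all.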
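To look at the memory side specifically, here's a sketch using the standard tracemalloc module. Both versions only iterate (via sum) without keeping a combined list around; exact byte counts differ across machines and Python versions, so no numbers are shown.

```python
import tracemalloc
from itertools import chain

list_a = list(range(100_000))
list_b = list(range(100_000))

def peak_bytes(func):
    """Run func and return the peak memory tracemalloc observed during the call."""
    tracemalloc.start()
    func()
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return peak

# list_a + list_b allocates a temporary 200,000-element list before summing
peak_concat = peak_bytes(lambda: sum(list_a + list_b))
# chain walks the two lists in place, so no combined list is ever created
peak_chain = peak_bytes(lambda: sum(chain(list_a, list_b)))

print(f"concat peak: {peak_concat:,} bytes")
print(f"chain peak:  {peak_chain:,} bytes")
```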

10. Quick Reference

| Function | Purpose | Example |
|---|---|---|
| `chain(a, b)` | Concatenate iterables | `chain([1,2], [3,4])` → `1,2,3,4` |
| `islice(it, n)` | Take the first n items | `islice(count(), 3)` → `0,1,2` |
| `count(n)` | Count up from n | `count(10)` → `10,11,12,...` |
| `cycle(it)` | Loop forever | `cycle("AB")` → `A,B,A,B,...` |
| `repeat(x, n)` | Repeat n times | `repeat(0, 3)` → `0,0,0` |
| `product(a, b)` | Cartesian product | `product("AB","12")` → `A1,A2,B1,B2` |
| `permutations(it, r)` | Permutations | `permutations("ABC",2)` → `AB,AC,BA,...` |
| `combinations(it, r)` | Combinations | `combinations("ABC",2)` → `AB,AC,BC` |
| `groupby(it, key)` | Group consecutive items | Sort first, then group |
| `accumulate(it)` | Running accumulation | `accumulate([1,2,3])` → `1,3,6` |
| `compress(d, s)` | Filter by selectors | `compress("ABCD",[1,0,1,0])` → `A,C` |
| `starmap(f, it)` | map with unpacked arguments | `starmap(pow,[(2,3)])` → `8` |
| `batched(it, n)` | Split into batches (3.12+) | `batched(range(5),2)` → `(0,1),(2,3),(4,)` |