The Python collections Module: Supercharge Your Data Structures

Python Learning: this article is part of a series.

§ 25: This article

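1. Introduction

Hi everyone! I'm 拍拍君 🎉

Python's built-in list, dict, set, and tuple are already very handy, but have you ever run into situations like these?

Want to count how many times each letter appears in a text? Writing your own loop plus a dict is such a hassle 😫
Want a dict that supplies a default value when you access a missing key, instead of raising KeyError over and over?
Need a queue that supports fast insertion and removal at both ends?
Want the fields of a tuple to have names, so you no longer have to guess what data[0] and data[1] mean?

All of these can be solved with the collections module in the Python standard library!

Today 拍拍君 will introduce the five most practical tools in collections. Once you've learned them, you'll realize: "Python had this ready for me all along!"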

2. Counter: The Counting Powerhouse

Basic usage

Counter is a dict subclass built specifically for counting hashable items:

from collections import Counter

# Count how many times each letter appears
text = "abracadabra"
counter = Counter(text)
print(counter)
# Counter({'a': 5, 'b': 2, 'r': 2, 'c': 1, 'd': 1})

You can also pass in a list:

votes = ["apple", "banana", "apple", "cherry", "banana", "apple"]
result = Counter(votes)
print(result)
# Counter({'apple': 3, 'banana': 2, 'cherry': 1})

most_common(): find the most common elements

# Top 2
print(result.most_common(2))
# [('apple', 3), ('banana', 2)]

This is super handy for text analysis and log statistics!
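
To make the log-statistics idea concrete, here is a minimal sketch. The log lines and their format (date, time, level, message) are made up for illustration, so adjust the field index to whatever your real logs look like.

```python
from collections import Counter

# Hypothetical log lines, invented for this example
log_lines = [
    "2024-01-01 10:00:01 INFO server started",
    "2024-01-01 10:00:05 WARNING disk usage at 85%",
    "2024-01-01 10:00:09 INFO request handled",
    "2024-01-01 10:00:12 ERROR database timeout",
    "2024-01-01 10:00:15 INFO request handled",
]

# In this assumed format, the third whitespace-separated field is the log level
level_counts = Counter(line.split()[2] for line in log_lines)

print(level_counts.most_common())
# [('INFO', 3), ('WARNING', 1), ('ERROR', 1)]

print(level_counts["DEBUG"])
# 0  (a missing key counts as zero instead of raising KeyError)
```

Calling most_common() with no argument returns every element, ordered from most to least frequent.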

Counter arithmetic

Counter supports addition and subtraction:

c1 = Counter(a=3, b=1)
c2 = Counter(a=1, b=2)

print(c1 + c2)  # Counter({'a': 4, 'b': 3})
print(c1 - c2)  # Counter({'a': 2})  ← zero and negative counts are dropped automatically
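
Beyond + and -, Counter also has intersection (&), union (|), and an in-place subtract() method. The short sketch below reuses the same two counters to show how they differ. The name c3 is only there for illustration.

```python
from collections import Counter

c1 = Counter(a=3, b=1)
c2 = Counter(a=1, b=2)

# Intersection keeps the minimum of each count
print(c1 & c2)  # Counter({'a': 1, 'b': 1})

# Union keeps the maximum of each count
print(c1 | c2)  # Counter({'a': 3, 'b': 2})

# subtract() works in place and, unlike the - operator,
# keeps zero and negative counts
c3 = Counter(a=3, b=1)
c3.subtract(c2)
print(c3)  # Counter({'a': 2, 'b': -1})
```

If you need to know how far a count has gone "into debt", subtract() is probably the one you want. If you only care about what is left over, the - operator is tidier.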

In practice: counting word frequencies

from collections import Counter

article = """
Python is great. Python is fun.
Python is the best programming language.
I love Python and I love coding.
"""

words = article.lower().split()
word_freq = Counter(words)

print("Top 5 most frequent words:")
for word, count in word_freq.most_common(5):
    print(f"  {word}: {count}")
# Top 5 most frequent words:
#   python: 4
#   is: 3
#   i: 2
#   love: 2
#   great.: 1  ← split() keeps the trailing punctuation
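
Notice that "great." still carries its period, because split() only splits on whitespace. One possible fix, sketched here on the assumption that a simple \w+ pattern is a good enough definition of a word for your text, is to tokenize with re.findall() instead:

```python
import re
from collections import Counter

article = """
Python is great. Python is fun.
Python is the best programming language.
I love Python and I love coding.
"""

# \w+ matches runs of letters, digits, and underscores, so punctuation is skipped
words = re.findall(r"\w+", article.lower())
word_freq = Counter(words)

print(word_freq.most_common(4))
# [('python', 4), ('is', 3), ('i', 2), ('love', 2)]

print(word_freq["great"])
# 1
```

This pattern splits contractions such as "don't" into two tokens, so real text analysis may need a more careful tokenizer.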

3. defaultdict: No More KeyError

The pain point

Accessing a missing key in a regular dict raises KeyError:

d = {}
d["fruit"].append("apple")  # ❌ KeyError: 'fruit'

You have to initialize the key by hand first:

d = {}
if "fruit" not in d:
    d["fruit"] = []
d["fruit"].append("apple")

Or use setdefault():

d = {}
d.setdefault("fruit", []).append("apple")

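Both work, but neither is very elegant: the first takes three lines, and setdefault() builds a fresh empty list on every call, even when the key already exists.

defaultdict to the rescue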

from collections import defaultdict

d = defaultdict(list)  # missing keys default to an empty list
d["fruit"].append("apple")
d["fruit"].append("banana")
d["veggie"].append("carrot")

print(dict(d))
# {'fruit': ['apple', 'banana'], 'veggie': ['carrot']}

Common default_factory choices

# Default value of 0 (a counter)
counter = defaultdict(int)
for char in "hello":
    counter[char] += 1
print(dict(counter))
# {'h': 1, 'e': 1, 'l': 2, 'o': 1}

# Default value of an empty set
groups = defaultdict(set)
groups["team_a"].add("Alice")
groups["team_a"].add("Bob")
groups["team_b"].add("Charlie")
print(dict(groups))
# {'team_a': {'Alice', 'Bob'}, 'team_b': {'Charlie'}}  (the order inside a set may vary)
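
The factory does not have to be a built-in type: any callable that takes no arguments will do. There is one behavior worth knowing, though: simply reading a missing key with [] inserts it. The sketch below shows both points. The settings dict and its keys are hypothetical.

```python
from collections import defaultdict

# Any zero-argument callable works as the factory, including a lambda
settings = defaultdict(lambda: "N/A")
settings["theme"] = "dark"

print(settings["theme"])     # dark
print(settings["language"])  # N/A  (the key is created by this lookup)
print(len(settings))         # 2

# .get() and `in` do not call the factory, so nothing is inserted
print(settings.get("font"))  # None
print("font" in settings)    # False
print(len(settings))         # 2
```

So if you only want to peek at a key without growing the dict, reach for .get() or in rather than square brackets.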

In practice: grouping data

from collections import defaultdict

students = [
    ("Alice", "Math"),
    ("Bob", "Science"),
    ("Charlie", "Math"),
    ("Diana", "Science"),
    ("Eve", "Art"),
]

by_subject = defaultdict(list)
for name, subject in students:
    by_subject[subject].append(name)

for subject, names in by_subject.items():
    print(f"{subject}: {', '.join(names)}")
# Math: Alice, Charlie
# Science: Bob, Diana
# Art: Eve

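4. deque: An Efficient Double-Ended Queue

Why not just use a list?

A Python list is fast at the tail: append and pop are O(1) (amortized, in the case of append). Operations at the head, however, are slow:

lst = [1, 2, 3, 4, 5]
lst.insert(0, 0)  # O(n): every element has to shift back!
lst.pop(0)         # O(n): every element has to shift forward!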

Enter deque

deque (pronounced "deck") has O(1) operations at both ends:

from collections import deque

dq = deque([1, 2, 3])

# Right-end operations
dq.append(4)        # deque([1, 2, 3, 4])
dq.pop()             # returns 4 → deque([1, 2, 3])

# Left-end operations: just as fast!
dq.appendleft(0)     # deque([0, 1, 2, 3])
dq.popleft()         # returns 0 → deque([1, 2, 3])
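
If you would like to see the O(n) versus O(1) difference for yourself, here is a rough timing sketch using timeit. The size N and the repeat count are arbitrary choices for illustration, and the actual numbers will depend on your machine, so no expected output is shown.

```python
import timeit
from collections import deque

N = 20_000  # number of items to drain; an arbitrary size picked for this sketch

def drain_list():
    items = list(range(N))
    while items:
        items.pop(0)      # shifts every remaining element: O(n) per call

def drain_deque():
    items = deque(range(N))
    while items:
        items.popleft()   # O(1) per call

list_time = timeit.timeit(drain_list, number=5)
deque_time = timeit.timeit(drain_deque, number=5)

print(f"list.pop(0):     {list_time:.4f} s")
print(f"deque.popleft(): {deque_time:.4f} s")
```

The gap should widen as N grows, since draining the list is quadratic overall while draining the deque stays linear.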

maxlen: a fixed-length sliding window

# Keep only the 3 most recent items
recent = deque(maxlen=3)
for i in range(5):
    recent.append(i)
    print(list(recent))
# [0]
# [0, 1]
# [0, 1, 2]
# [1, 2, 3]    ← the oldest item is dropped automatically
# [2, 3, 4]

This is very practical for sliding windows over logs or for keeping only the last N records.
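
As one example of the sliding-window idea, here is a minimal moving-average sketch built on deque(maxlen=...). The function name moving_average and the sample readings are made up for illustration.

```python
from collections import deque

def moving_average(values, window_size):
    """Yield the average of the last `window_size` values seen so far."""
    window = deque(maxlen=window_size)
    for value in values:
        window.append(value)          # the oldest value falls off automatically
        yield sum(window) / len(window)

readings = [10, 20, 30, 40, 50]
print(list(moving_average(readings, 3)))
# [10.0, 15.0, 20.0, 30.0, 40.0]
```

The first two results average fewer than three values because the window has not filled up yet. Whether that is acceptable depends on your use case.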

rotate: rotation

dq = deque([1, 2, 3, 4, 5])

dq.rotate(2)   # rotate right by 2 steps
print(list(dq))  # [4, 5, 1, 2, 3]

dq.rotate(-2)  # rotate left by 2 steps
print(list(dq))  # [1, 2, 3, 4, 5]
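
One place rotate() comes in handy is round-robin assignment. The sketch below is only an illustration with made-up workers and tasks: whoever is at the front takes the next task and then moves to the back of the line.

```python
from collections import deque

workers = deque(["Alice", "Bob", "Charlie"])
tasks = ["task1", "task2", "task3", "task4", "task5"]

assignments = []
for task in tasks:
    assignments.append((task, workers[0]))  # the worker at the front takes the task
    workers.rotate(-1)                      # then moves to the back of the line

for task, worker in assignments:
    print(f"{task} -> {worker}")
# task1 -> Alice
# task2 -> Bob
# task3 -> Charlie
# task4 -> Alice
# task5 -> Bob
```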

In practice: implementing BFS (breadth-first search)

from collections import deque

def bfs(graph, start):
    visited = set()
    queue = deque([start])
    order = []

    while queue:
        node = queue.popleft()  # O(1)!
        if node not in visited:
            visited.add(node)
            order.append(node)
            queue.extend(graph.get(node, []))

    return order

graph = {
    "A": ["B", "C"],
    "B": ["D", "E"],
    "C": ["F"],
    "D": [], "E": [], "F": [],
}

print(bfs(graph, "A"))
# ['A', 'B', 'C', 'D', 'E', 'F']
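
The same queue-based loop can also report how far each node is from the start. This is a small variation, not part of the original example. The function name bfs_distances is hypothetical, and it assumes an unweighted graph stored as a dict of adjacency lists.

```python
from collections import deque

def bfs_distances(graph, start):
    """Return the number of edges from `start` to every reachable node."""
    distances = {start: 0}
    queue = deque([start])

    while queue:
        node = queue.popleft()
        for neighbor in graph.get(node, []):
            if neighbor not in distances:      # the first visit is the shortest route
                distances[neighbor] = distances[node] + 1
                queue.append(neighbor)

    return distances

graph = {
    "A": ["B", "C"],
    "B": ["D", "E"],
    "C": ["F"],
    "D": [], "E": [], "F": [],
}

print(bfs_distances(graph, "A"))
# {'A': 0, 'B': 1, 'C': 1, 'D': 2, 'E': 2, 'F': 2}
```

Because nodes are marked when they are enqueued rather than when they are dequeued, each node enters the queue at most once.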

5. namedtuple: Tuples with Names

The pain point

A plain tuple is accessed by index, which hurts readability:

point = (3, 4)
print(point[0])  # x? y? Who knows 🤷

namedtuple gives fields names

from collections import namedtuple

Point = namedtuple("Point", ["x", "y"])

p = Point(3, 4)
print(p.x)       # 3
print(p.y)       # 4
print(p[0])      # 3 (indexing still works)
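
A few more conveniences come for free. The sketch below uses a hypothetical Point3D type to show the defaults parameter (available since Python 3.7), the _fields attribute, and ordinary tuple unpacking.

```python
from collections import namedtuple

# defaults apply to the rightmost fields, so here only z gets a default
Point3D = namedtuple("Point3D", ["x", "y", "z"], defaults=[0])

p = Point3D(3, 4)
print(p)                 # Point3D(x=3, y=4, z=0)
print(Point3D._fields)   # ('x', 'y', 'z')

# A namedtuple unpacks just like a regular tuple
x, y, z = p
print(x + y + z)         # 7
```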

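Lighter than a dict

A namedtuple instance takes no more memory than a plain tuple, and far less than a dict:

import sys
from collections import namedtuple

Point = namedtuple("Point", ["x", "y"])
p_nt = Point(3, 4)
p_dict = {"x": 3, "y": 4}
p_tuple = (3, 4)

# Exact numbers depend on the Python version and platform;
# the values below are typical for 64-bit CPython 3.11+.
print(sys.getsizeof(p_nt))    # 56 (same as the plain tuple)
print(sys.getsizeof(p_dict))  # 184
print(sys.getsizeof(p_tuple)) # 56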

_replace(): create a modified copy

A namedtuple is immutable (just like a tuple), but _replace() creates a new instance with the chosen fields changed:

p1 = Point(3, 4)
p2 = p1._replace(y=10)
print(p2)  # Point(x=3, y=10)
print(p1)  # Point(x=3, y=4)  ← the original is unchanged
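
To see the immutability in action, here is a small sketch of what happens when you try to assign to a field directly. The exact wording of the error message varies between Python versions, so it is printed rather than quoted here.

```python
from collections import namedtuple

Point = namedtuple("Point", ["x", "y"])
p1 = Point(3, 4)

try:
    p1.x = 10            # assigning to a field is not allowed
except AttributeError as exc:
    print(f"AttributeError: {exc}")

# _replace() accepts several fields at once and returns a new instance
p2 = p1._replace(x=0, y=0)
print(p2)                # Point(x=0, y=0)
print(p1 is p2)          # False
```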

_asdict(): convert to a dict

print(p1._asdict())
# {'x': 3, 'y': 4}

In practice: handling CSV data

from collections import namedtuple

# Assume each CSV row has the fields name, age, city
Employee = namedtuple("Employee", ["name", "age", "city"])

data = [
    "Alice,30,Taipei",
    "Bob,25,Tokyo",
    "Charlie,35,Seoul",
]

employees = []
for line in data:
    parts = line.split(",")
    emp = Employee(name=parts[0], age=int(parts[1]), city=parts[2])
    employees.append(emp)

for emp in employees:
    print(f"{emp.name} ({emp.age}) lives in {emp.city}")
# Alice (30) lives in Taipei
# Bob (25) lives in Tokyo
# Charlie (35) lives in Seoul
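
The example above splits each line by hand. For real CSV files, the csv module is usually the safer choice because it handles quoting and embedded commas. Here is a sketch of the same idea with csv.reader and the _make() class method. The io.StringIO object and its contents stand in for a real file and are assumptions made for this example.

```python
import csv
import io
from collections import namedtuple

Employee = namedtuple("Employee", ["name", "age", "city"])

# io.StringIO stands in for a real file opened with open(..., newline="")
csv_text = "name,age,city\nAlice,30,Taipei\nBob,25,Tokyo\nCharlie,35,Seoul\n"

reader = csv.reader(io.StringIO(csv_text))
next(reader)  # skip the header row

employees = []
for row in reader:
    emp = Employee._make(row)                        # every field is still a str here
    employees.append(emp._replace(age=int(emp.age)))  # convert age to int

for emp in employees:
    print(f"{emp.name} ({emp.age}) lives in {emp.city}")
# Alice (30) lives in Taipei
# Bob (25) lives in Tokyo
# Charlie (35) lives in Seoul
```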

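💡 Tip: on Python 3.6+ you can consider typing.NamedTuple together with type hints, or on Python 3.7+ go straight to dataclasses. Even so, namedtuple remains one of the most lightweight choices for simple immutable data! If you're curious about dataclasses, take a look at 拍拍君's earlier Python dataclasses tutorial.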
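
For reference, this is roughly what the typing.NamedTuple flavor looks like. It is a minimal sketch: the Employee fields mirror the CSV example, and the default city is an arbitrary value chosen for illustration.

```python
from typing import NamedTuple

class Employee(NamedTuple):
    name: str
    age: int
    city: str = "Taipei"   # a default value, chosen arbitrarily for this sketch

emp = Employee("Alice", 30)
print(emp)                     # Employee(name='Alice', age=30, city='Taipei')
print(emp.name, emp[1])        # Alice 30
print(isinstance(emp, tuple))  # True
```

The result is still a real tuple subclass, so everything shown earlier (_replace(), _asdict(), unpacking) keeps working.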

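6. OrderedDict: The Ordered Dictionary

Isn't dict already ordered in Python 3.7+?

That's right! Since Python 3.7, a regular dict is guaranteed to preserve insertion order. But OrderedDict still has a few unique features:

move_to_end(): move an element to the front or the back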

from collections import OrderedDict

od = OrderedDict()
od["a"] = 1
od["b"] = 2
od["c"] = 3

od.move_to_end("a")           # move to the end
print(list(od.keys()))         # ['b', 'c', 'a']

od.move_to_end("c", last=False)  # move to the front
print(list(od.keys()))            # ['c', 'b', 'a']

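Order-sensitive equality

Regular dicts compare contents only and ignore order. Two OrderedDicts with the same contents in a different order, however, are not equal: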

from collections import OrderedDict

d1 = OrderedDict([("a", 1), ("b", 2)])
d2 = OrderedDict([("b", 2), ("a", 1)])

print(d1 == d2)  # False ← the order differs!

# But with regular dicts:
print({"a": 1, "b": 2} == {"b": 2, "a": 1})  # True
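
One subtlety: the order-sensitive comparison only applies when both sides are OrderedDicts. The sketch below, with made-up variable names, shows the mixed case, plus one possible way to compare regular dicts by order when you really need to.

```python
from collections import OrderedDict

od = OrderedDict([("a", 1), ("b", 2)])
plain = {"b": 2, "a": 1}

# Comparing an OrderedDict with a regular dict ignores order
print(od == plain)                               # True

# To take order into account, compare the item lists instead
print(list(od.items()) == list(plain.items()))   # False
```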

In practice: a simple LRU cache

from collections import OrderedDict

class LRUCache:
    def __init__(self, capacity: int):
        self.cache = OrderedDict()
        self.capacity = capacity

    def get(self, key):
        if key not in self.cache:
            return -1
        self.cache.move_to_end(key)  # recently used, so move it to the end
        return self.cache[key]

    def put(self, key, value):
        if key in self.cache:
            self.cache.move_to_end(key)
        self.cache[key] = value
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)  # evict the least recently used entry

cache = LRUCache(2)
cache.put("a", 1)
cache.put("b", 2)
print(cache.get("a"))   # 1
cache.put("c", 3)       # 'b' is evicted
print(cache.get("b"))   # -1 (already removed)
print(cache.get("c"))   # 3

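7. Comparison Table: The Five Tools at a Glance

Tool	Purpose	Inherits from
`Counter`	Counting	`dict`
`defaultdict`	A dict that supplies default values automatically	`dict`
`deque`	Double-ended queue	none
`namedtuple`	A tuple with named fields	`tuple`
`OrderedDict`	An order-aware dict	`dict`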

8. Bonus Recommendation: ChainMap

If you have several dicts that you want to search as one, without actually merging them, ChainMap is very handy:

from collections import ChainMap

defaults = {"color": "red", "size": "medium"}
user_settings = {"color": "blue"}

config = ChainMap(user_settings, defaults)
print(config["color"])  # blue (user_settings takes priority)
print(config["size"])   # medium (falls back to defaults)

This is especially handy for layered configuration (CLI arguments > environment variables > config file > defaults)!
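
Here is a sketch of that layered setup with three levels. The dicts and their keys are hypothetical stand-ins for parsed CLI arguments, environment variables, and defaults. It also shows two behaviors that are easy to miss: writes go to the first mapping only, and the ChainMap is a live view of the underlying dicts.

```python
from collections import ChainMap

# Hypothetical layers, highest priority first
cli_args = {"port": 9000}
env_vars = {"port": 8080, "debug": "true"}
defaults = {"port": 8000, "debug": "false", "host": "localhost"}

config = ChainMap(cli_args, env_vars, defaults)
print(config["port"])   # 9000       (from cli_args)
print(config["debug"])  # true       (from env_vars)
print(config["host"])   # localhost  (from defaults)

# Writes only ever touch the first mapping
config["host"] = "0.0.0.0"
print(cli_args)          # {'port': 9000, 'host': '0.0.0.0'}
print(defaults["host"])  # localhost

# The underlying dicts stay live: later changes show through
env_vars["timeout"] = 30
print(config["timeout"])  # 30
```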

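9. Summary

Python's collections module provides tools that go beyond the basic data structures:

🔢 Counter → counting and statistics
📦 defaultdict → no more KeyError, elegant grouping
↔️ deque → efficient operations at both ends, sliding windows
🏷️ namedtuple → readable, immutable data
📋 OrderedDict → order-sensitive dict operations

All of these tools live in the standard library, so there's nothing to install! Next time you write Python, check whether collections already has what you need before reinventing the wheel 🛞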
