Python watchdog in Practice: A Complete Guide to File Change Monitoring and Automation

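1. Introduction: Some Tools Don't Run Just Once, They Keep Watching for Changes

In many development workflows, the real pain is not doing something once. It is having to do it again every time a file changes.

For example: rebuilding a site whenever a Markdown file is updated, tidying a folder automatically when a new file shows up, restarting a service after a config file changes, or compressing and moving images as soon as they are dropped into incoming/.

The most intuitive approach is usually a while True: loop that scans the folder once a second for anything new. But polling like this quickly runs into problems: changes are noticed late, every scan costs resources even when nothing happened, the logic gets messy, and it is hard to tell which file actually changed.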
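To make those problems concrete, here is a rough sketch of what the polling approach usually grows into, using only the standard library. The helper names snapshot() and diff_snapshots() are made up for this illustration. Notice that every pass has to walk the entire tree and compare modification times, even when nothing has changed.

```python
from pathlib import Path


def snapshot(root: Path) -> dict[Path, float]:
    """Map every file under root to its last-modified time."""
    # Note: a file deleted between rglob() and stat() would raise here;
    # a real poller has to handle that race as well.
    return {p: p.stat().st_mtime for p in root.rglob("*") if p.is_file()}


def diff_snapshots(
    old: dict[Path, float], new: dict[Path, float]
) -> dict[str, set[Path]]:
    """Compare two snapshots and classify the differences."""
    return {
        "created": new.keys() - old.keys(),
        "deleted": old.keys() - new.keys(),
        "modified": {p for p in old.keys() & new.keys() if old[p] != new[p]},
    }
```

You would call snapshot() in a loop, sleep for a second, and diff against the previous result. It works, but changes are noticed up to one interval late, and a rename only shows up as an unrelated delete plus create.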

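This is where watchdog comes in.

watchdog is a widely used Python package for file system monitoring. It can listen for creation, modification, deletion, and rename events on files and directories. Instead of polling yourself, you let the operating system notify you when something happens. Under the hood it uses the platform's native notification API where one is available (for example inotify on Linux or FSEvents on macOS) and falls back to polling otherwise.

If you have read 拍拍君's earlier Python pathlib tutorial and Python subprocess tutorial, you can think of today's article as the hands-on piece that connects the two. pathlib handles paths, subprocess runs other tools after an event occurs, and watchdog observes the changes themselves.

This article walks through the following:

Understanding the roles of Observer and FileSystemEventHandler
Watching only specific file extensions or directories
Avoiding a burst of events from a single save
Wiring monitoring into automated workflows such as builds, tests, and file conversion
Writing a small tool that looks more like a real project

If you regularly write dev tools, data processing scripts, or static site workflows, this is well worth learning.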

2. Installation: Getting the Watching Capability in Place

Installation is simple:

pip install watchdog

# or with uv
uv add watchdog

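watchdog has two core components that you will use most:

Observer, which listens for file system events in the background
FileSystemEventHandler, which defines what to do when an event occurs

Think of it this way: the Observer is the security system, and the Handler is the person who responds when an alert comes in.

2.1 Minimal runnable example

Let's start with the most basic version. This program watches the ./watched folder and prints a message whenever a file is created, modified, or deleted.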

from pathlib import Path
from time import sleep

from watchdog.events import FileSystemEventHandler
from watchdog.observers import Observer


class SimpleHandler(FileSystemEventHandler):
    def on_created(self, event):
        print(f"[created] {event.src_path}")

    def on_modified(self, event):
        print(f"[modified] {event.src_path}")

    def on_deleted(self, event):
        print(f"[deleted] {event.src_path}")


watch_path = Path("./watched")
watch_path.mkdir(exist_ok=True)

handler = SimpleHandler()
observer = Observer()
observer.schedule(handler, str(watch_path), recursive=True)
observer.start()

print(f"Watching: {watch_path.resolve()}")

try:
    while True:
        sleep(1)
except KeyboardInterrupt:
    observer.stop()

observer.join()

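Once it is running, add or modify files inside watched/ and you will see the events printed in the terminal.

This example already covers several important ideas:

After observer.start(), monitoring runs in a background thread
The main thread must not exit right away, so you usually keep a simple loop running
recursive=True means subdirectories are watched too
Catch KeyboardInterrupt to shut down with observer.stop(), and remember to call observer.join() at the end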

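3. The Event Model: Not Just modified, but Also moved and Directory Events

watchdog does not only tell you that a file changed. It also reports creation, deletion, and move events, and the target of an event can be either a file or a directory.

The most common first step is to filter out directory events.

3.1 Use `on_any_event()` to see what is actually happening

While you are still figuring out which events a particular editor triggers, on_any_event() is very handy.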

from watchdog.events import FileSystemEventHandler


class FileOnlyHandler(FileSystemEventHandler):
    def on_any_event(self, event):
        if event.is_directory:
            return

        print(f"[{event.event_type}] {event.src_path}")

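The point of this code is observation, not final design. Save a file a few times, see which event types show up, and then decide whether to split the logic into on_modified(), on_created(), or on_moved().

3.2 For moves and renames, remember to check `dest_path`

To handle a rename or a move, you usually use on_moved().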

from watchdog.events import FileSystemEventHandler


class MoveHandler(FileSystemEventHandler):
    def on_moved(self, event):
        if event.is_directory:
            return

        print(f"moved: {event.src_path} -> {event.dest_path}")

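This is useful in many workflows: a temp file renamed to its final name, a download going from .part to its real name once it completes, or a cleanup process moving files into another subdirectory.

In other words, you are not just watching whether bytes changed. You are watching whether the workflow has advanced to its next step.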
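As a small illustration of that idea, the sketch below decides which path is worth processing for a given event. It is a plain function, so it can be tried without a running Observer; inside on_created() or on_moved() you would pass event.event_type, event.src_path, and event.dest_path. The function name finished_file() and the set of temporary suffixes are assumptions for this example.

```python
from pathlib import Path
from typing import Optional

# Suffixes that mark a file as still in progress (illustrative choices).
TEMP_SUFFIXES = {".part", ".tmp", ".crdownload"}


def finished_file(
    event_type: str, src_path: str, dest_path: str = ""
) -> Optional[Path]:
    """Return the path worth processing, or None if the event can be ignored."""
    # Only "a file appeared under its final name" is interesting here.
    if event_type not in {"created", "moved"}:
        return None
    # After a move, the file lives at dest_path; otherwise at src_path.
    raw = dest_path if event_type == "moved" else src_path
    if not raw:
        return None
    path = Path(raw)
    if path.suffix.lower() in TEMP_SUFFIXES:
        return None  # still a temporary or partial file
    return path
```

With this in place, a download that lands as video.mp4.part and is later renamed to video.mp4 is ignored on creation and picked up on the move.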

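4. Handle Only the Files You Actually Care About

In real projects you rarely want to watch everything. Most of the time you only want to act when files such as .md, .csv, or .py change.

4.1 Filter by extension with `pathlib`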

from pathlib import Path

from watchdog.events import FileSystemEventHandler


class MarkdownHandler(FileSystemEventHandler):
    def on_modified(self, event):
        if event.is_directory:
            return

        path = Path(event.src_path)
        if path.suffix != ".md":
            return

        print(f"Markdown updated: {path.name}")

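The benefit of pairing this with pathlib is direct: checking the parent directory, swapping the extension, or matching a filename pattern later is all far more comfortable than slicing strings by hand.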
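Here is a quick look at the operations just mentioned, using a made-up path. Nothing here touches the disk; for these calls pathlib works on the path text alone.

```python
from pathlib import Path

path = Path("content/posts/2024/hello-world.md")

parent_name = path.parent.name           # name of the containing directory
html_path = path.with_suffix(".html")    # same path with a different extension
is_post = "posts" in path.parts          # is any path component "posts"?
is_draft = path.match("draft-*.md")      # glob-style match on the file name
relative = path.relative_to("content")   # strip a leading directory

print(parent_name, html_path, is_post, is_draft, relative)
```

Each of these would take a few lines of split() and join() with plain strings, and would be easier to get subtly wrong.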

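4.2 Extract the conditions into `should_handle()`

Once the filter rules start to multiply, avoid cramming every check into the event method. Pulling them into a separate method keeps the code much cleaner.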

from pathlib import Path

from watchdog.events import FileSystemEventHandler


class FilteredHandler(FileSystemEventHandler):
    allowed_suffixes = {".md", ".txt"}

    def should_handle(self, src_path: str) -> bool:
        path = Path(src_path)
        if path.name.startswith("."):
            return False
        return path.suffix.lower() in self.allowed_suffixes

    def on_modified(self, event):
        if event.is_directory:
            return

        if not self.should_handle(event.src_path):
            return

        print(f"handle file: {event.src_path}")

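That way, when you add more rules later, such as ignoring .tmp files, handling only certain subdirectories, or skipping editor swap files, you maintain them in one place.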
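A possible next step is sketched below as a standalone function, so the rules can be tested on their own. The specific suffixes, the backup-file patterns, and the posts subdirectory are assumptions chosen for illustration; adjust them to whatever your editor and project actually produce.

```python
from pathlib import Path

ALLOWED_SUFFIXES = {".md", ".txt"}
IGNORED_SUFFIXES = {".tmp", ".swp", ".swx"}
WATCHED_SUBDIR = "posts"  # hypothetical: only files somewhere under "posts"


def should_handle(src_path: str) -> bool:
    path = Path(src_path)
    name = path.name
    # Hidden files (including Vim's ".name.swp") and "name~" backup files.
    if name.startswith(".") or name.endswith("~"):
        return False
    # Auto-save files of the form "#name#".
    if name.startswith("#") and name.endswith("#"):
        return False
    if path.suffix.lower() in IGNORED_SUFFIXES:
        return False
    # Require the watched directory among the parent components.
    if WATCHED_SUBDIR not in path.parts[:-1]:
        return False
    return path.suffix.lower() in ALLOWED_SUFFIXES
```

Because the function only takes a string, a handful of assert statements is enough to pin the rules down before any real file events are involved.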

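5. The Most Common Beginner Pitfall: One Save, Many Events

This is really common. You press save once, and the terminal shows two, three, or even more modified events.

Usually your program is not broken. Different editors and platforms can genuinely produce multiple events for a single save. Common causes include:

Writing a temp file first, then renaming it to the real file
Updating content and metadata separately
The editor creating its own swap file
An auto-formatter running right after the save

If your handler only prints a log line, this may not matter. But if every trigger reruns a build, a linter, an upload, or a conversion, you should debounce.

5.1 Use a time window to avoid duplicate processing

The version below records when the most recent event for each file was seen. If another event for the same file arrives within a short window, it is skipped.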

from pathlib import Path
from time import monotonic

from watchdog.events import FileSystemEventHandler


class DebouncedHandler(FileSystemEventHandler):
    def __init__(self, cooldown: float = 0.8):
        self.cooldown = cooldown
        self.last_seen: dict[str, float] = {}

    def should_skip(self, src_path: str) -> bool:
        now = monotonic()
        previous = self.last_seen.get(src_path)
        self.last_seen[src_path] = now

        if previous is None:
            return False

        return now - previous < self.cooldown

    def on_modified(self, event):
        if event.is_directory:
            return

        path = Path(event.src_path)
        if path.suffix != ".md":
            return

        if self.should_skip(event.src_path):
            return

        print(f"rebuild for: {path.name}")

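The upside of this approach is that it is simple, intuitive, and good enough for many workflows. The downside is that it looks only at time, not content: if you really do save two different versions in quick succession, the second one may be swallowed. Note also that the code refreshes the timestamp on every event, including skipped ones, so a steady stream of events spaced closer than the cooldown keeps being skipped.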
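If losing the last save is a concern, one alternative is a trailing-edge debounce: wait until events for a file have been quiet for a moment, then run once. The sketch below uses threading.Timer from the standard library; TrailingDebouncer is a hypothetical helper, not part of watchdog. The trade-off is that every run is delayed by delay seconds, and each event briefly spawns a timer thread.

```python
from threading import Lock, Timer
from typing import Callable


class TrailingDebouncer:
    """Call callback(key) once, `delay` seconds after the last trigger for key."""

    def __init__(self, delay: float, callback: Callable[[str], None]):
        self.delay = delay
        self.callback = callback
        self._generation: dict[str, int] = {}
        self._lock = Lock()

    def trigger(self, key: str) -> None:
        # Bump the generation so any older pending timer becomes stale.
        with self._lock:
            generation = self._generation.get(key, 0) + 1
            self._generation[key] = generation
        timer = Timer(self.delay, self._fire, args=(key, generation))
        timer.daemon = True  # do not keep the process alive for a timer
        timer.start()

    def _fire(self, key: str, generation: int) -> None:
        with self._lock:
            if self._generation.get(key) != generation:
                return  # a newer event arrived; let its timer do the work
        self.callback(key)
```

Inside a handler, on_modified() would simply call debouncer.trigger(event.src_path). Keep in mind that the callback then runs on a timer thread, not on the observer thread.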

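5.2 For more robustness, compare content hashes

Another approach is to act only when the content has really changed. Conceptually, you compute a digest of the file, compare it with the last recorded one, and then decide whether to run.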

from hashlib import sha256
from pathlib import Path


def file_digest(path: Path) -> str:
    return sha256(path.read_bytes()).hexdigest()

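This approach is more reliable but also more expensive, especially for large files, since read_bytes() loads the whole file into memory. 拍拍君 usually starts with debounce and upgrades to a hash-based check only when each trigger is genuinely costly.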
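A minimal sketch of that comparison might look like this. ContentTracker is a hypothetical helper name, and the chunked file_digest() here is a variant of the one above that avoids loading a large file into memory in one go.

```python
from hashlib import sha256
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 16) -> str:
    """Hash a file in chunks so large files are never fully in memory."""
    digest = sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


class ContentTracker:
    """Remember the last digest per path and report real content changes."""

    def __init__(self) -> None:
        self.digests: dict[Path, str] = {}

    def has_changed(self, path: Path) -> bool:
        try:
            current = file_digest(path)
        except OSError:
            # The file vanished or is unreadable mid-save; report no change.
            return False
        if self.digests.get(path) == current:
            return False
        self.digests[path] = current
        return True
```

The first call for a path always reports a change, because there is nothing recorded to compare against yet. A handler could call has_changed() after the debounce check, so the hash is only computed for events that survive the time window.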

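6. Wiring `watchdog` into an Automated Workflow

Many people use watchdog not because they want to know that a file changed. What they really want is to do something automatically after the change.

For example:

Rebuild a static site after a .md file is updated
Run tests after a .py file is updated
Compress and move images newly added to incoming/
Reload a local service after config.yaml changes

This is a good fit for subprocess.run().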
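Before wiring anything to the observer, it can help to write down which change should trigger which command. The sketch below is one way to do that; the commands themselves (hugo, pytest -q, and a reload-service.sh script) are placeholders for whatever your project actually uses.

```python
from pathlib import Path
from typing import Optional

# Commands keyed by exact file name take priority over suffix rules.
# All commands here are illustrative placeholders.
NAME_COMMANDS: dict[str, list[str]] = {
    "config.yaml": ["./reload-service.sh"],  # hypothetical script
}
SUFFIX_COMMANDS: dict[str, list[str]] = {
    ".md": ["hugo"],
    ".py": ["pytest", "-q"],
}


def command_for(path: Path) -> Optional[list[str]]:
    """Return the command to run for a changed file, or None to ignore it."""
    command = NAME_COMMANDS.get(path.name)
    if command is None:
        command = SUFFIX_COMMANDS.get(path.suffix.lower())
    # Return a copy so callers cannot mutate the tables by accident.
    return list(command) if command is not None else None
```

The result can be passed straight to subprocess.run() as an argument list, which also avoids going through a shell.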

6.1 Build automatically when Markdown changes

from pathlib import Path
from subprocess import run
from time import monotonic

from watchdog.events import FileSystemEventHandler
from watchdog.observers import Observer


class BuildHandler(FileSystemEventHandler):
    def __init__(self, cooldown: float = 1.0):
        self.cooldown = cooldown
        self.last_seen: dict[str, float] = {}

    def should_skip(self, src_path: str) -> bool:
        now = monotonic()
        previous = self.last_seen.get(src_path)
        self.last_seen[src_path] = now
        return previous is not None and now - previous < self.cooldown

    def on_modified(self, event):
        if event.is_directory:
            return

        path = Path(event.src_path)
        if path.suffix != ".md":
            return

        if self.should_skip(event.src_path):
            return

        print(f"[build] detected update: {path.name}")
        result = run(["hugo"], cwd="/Users/maho/code/my-site")
        print(f"[build] exit code: {result.returncode}")


observer = Observer()
handler = BuildHandler()
observer.schedule(handler, "/Users/maho/code/my-site/content", recursive=True)
observer.start()

try:
    observer.join()
except KeyboardInterrupt:
    observer.stop()
    observer.join()

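The point here is simple: watchdog detects the event and subprocess runs the external command. Together they let you automate many manual steps.

If you are not yet comfortable with subprocess, the Python subprocess tutorial is a good companion for this kind of workflow.

6.2 When running commands, capture errors too

A more production-like version usually does not just print the return code. You want to capture stdout and stderr so that a failure leaves clues behind.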

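from subprocess import CalledProcessError, run


def run_build() -> None:
    try:
        # check=True raises CalledProcessError on a non-zero exit code;
        # capture_output=True with text=True collects stdout/stderr as str.
        result = run(
            ["hugo"],
            cwd="/Users/maho/code/my-site",
            capture_output=True,
            text=True,
            check=True,
        )
    except FileNotFoundError:
        # Raised when the hugo executable or the cwd directory is missing.
        print("build failed: hugo or the site directory was not found")
    except CalledProcessError as exc:
        print("build failed")
        print(exc.stdout)
        print(exc.stderr)
    else:
        print("build success")
        print(result.stdout)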

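If your external process occasionally fails for transient reasons, you can also pair this with the Python tenacity tutorial to make the whole automation pipeline more resilient.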
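If you would rather not add a dependency yet, a hand-rolled retry loop can cover the simplest cases. The sketch below uses only the standard library and is not tenacity's API; run_with_retry() and its parameters are names invented for this example.

```python
import subprocess
import time


def run_with_retry(
    cmd: list[str],
    attempts: int = 3,
    delay: float = 1.0,
    timeout: float = 60.0,
) -> bool:
    """Run cmd until it exits with 0, trying at most `attempts` times."""
    for attempt in range(1, attempts + 1):
        try:
            subprocess.run(
                cmd, capture_output=True, text=True, check=True, timeout=timeout
            )
            return True
        except subprocess.CalledProcessError as exc:
            print(f"attempt {attempt} failed with exit code {exc.returncode}")
        except subprocess.TimeoutExpired:
            print(f"attempt {attempt} timed out after {timeout}s")
        if attempt < attempts:
            time.sleep(delay * attempt)  # wait a little longer each time
    return False
```

A call such as run_with_retry(["hugo"]) returns True as soon as one attempt succeeds. A missing executable still raises FileNotFoundError, which is deliberate here: retrying will not make the command appear.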

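7. If Event Handling Is Heavy, Push It onto a Queue First

If the event handling itself is heavy, do not do all the work directly inside the handler. Keep the callback short so that one long task does not block every event behind it.

A common design here is to push events onto a queue and let a background worker process them.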

from pathlib import Path
from queue import Queue
from time import monotonic

from watchdog.events import FileSystemEventHandler


class QueueHandler(FileSystemEventHandler):
    def __init__(self, queue: Queue, cooldown: float = 0.5):
        self.queue = queue
        self.cooldown = cooldown
        self.last_seen: dict[str, float] = {}

    def on_modified(self, event):
        if event.is_directory:
            return

        path = Path(event.src_path)
        if path.suffix != ".py":
            return

        now = monotonic()
        previous = self.last_seen.get(event.src_path)
        self.last_seen[event.src_path] = now

        if previous is not None and now - previous < self.cooldown:
            return

        self.queue.put(path)


def worker(queue: Queue) -> None:
    while True:
        path = queue.get()
        try:
            print(f"run checks for: {path.name}")
        finally:
            # Always mark the item done so queue.join() cannot hang.
            queue.task_done()

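This design has several advantages:

The handler itself becomes very thin
It is easy to scale out to multiple workers
Retry on failure or batch processing is easier to add later
In tests you can verify the event filtering logic and the actual work logic separately

If you like to engineer your tools step by step, this step is well worth taking.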
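The worker() above is never started and has no way to stop, so here is one way the pieces might be wired together. The STOP sentinel, the processed list, and start_worker() are assumptions for this sketch; the real work would replace the append call.

```python
from pathlib import Path
from queue import Queue
from threading import Thread

STOP = None  # sentinel: putting this on the queue tells the worker to exit


def worker(queue: Queue, processed: list) -> None:
    while True:
        path = queue.get()
        try:
            if path is STOP:
                return
            processed.append(path)  # stand-in for the real, slow work
        finally:
            # Runs on every path, including the early return above.
            queue.task_done()


def start_worker(queue: Queue, processed: list) -> Thread:
    # daemon=True lets the program exit even if the worker is still blocked.
    thread = Thread(target=worker, args=(queue, processed), daemon=True)
    thread.start()
    return thread


if __name__ == "__main__":
    jobs: Queue = Queue()
    results: list = []
    thread = start_worker(jobs, results)
    jobs.put(Path("app.py"))
    jobs.join()          # wait until everything queued so far is handled
    jobs.put(STOP)
    thread.join(timeout=2)
    print(results)
```

In the full program, the same queue object would be handed to QueueHandler, and the shutdown steps would run after observer.stop().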

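8. A Version That Looks More Like a Real Project

Below is a somewhat more complete example. It watches a given folder, handles only .md and .txt files, ignores directory events, applies debounce, and calls a custom callback when an event occurs.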

from __future__ import annotations

from dataclasses import dataclass, field
from pathlib import Path
from time import monotonic, sleep
from typing import Callable

from watchdog.events import FileSystemEvent, FileSystemEventHandler
from watchdog.observers import Observer


@dataclass
class WatchConfig:
    root: Path
    suffixes: set[str] = field(default_factory=lambda: {".md", ".txt"})
    recursive: bool = True
    cooldown: float = 0.8


class SmartHandler(FileSystemEventHandler):
    def __init__(self, config: WatchConfig, callback: Callable[[Path], None]):
        self.config = config
        self.callback = callback
        self.last_seen: dict[Path, float] = {}

    def should_handle(self, event: FileSystemEvent) -> bool:
        if event.is_directory:
            return False

        path = Path(event.src_path)
        return path.suffix.lower() in self.config.suffixes

    def is_debounced(self, path: Path) -> bool:
        now = monotonic()
        previous = self.last_seen.get(path)
        self.last_seen[path] = now
        return previous is not None and now - previous < self.config.cooldown

    def on_modified(self, event: FileSystemEvent) -> None:
        if not self.should_handle(event):
            return

        path = Path(event.src_path)
        if self.is_debounced(path):
            return

        self.callback(path)


def on_change(path: Path) -> None:
    print(f"processing: {path.name}")


def main() -> None:
    config = WatchConfig(root=Path("./notes"))
    config.root.mkdir(exist_ok=True)

    observer = Observer()
    handler = SmartHandler(config, on_change)
    observer.schedule(handler, str(config.root), recursive=config.recursive)
    observer.start()

    print(f"watching {config.root.resolve()}")

    try:
        while True:
            sleep(1)
    except KeyboardInterrupt:
        observer.stop()

    observer.join()


if __name__ == "__main__":
    main()