LangGraph 流式输出与实时交互：从 Token 到事件的 Streaming 架构设计

发表于2026-02-17|更新于2026-04-29|视界AI研究

|浏览量:

LangGraph 流式输出与实时交互：从 Token 到事件的 Streaming 架构设计

昨天我们聊了 Checkpoint 的”记忆力”，今天来聊点更刺激的 —— Streaming 的”表达力”。如果说 Checkpoint 让 Agent 记得住，那 Streaming 就是让 Agent 说得顺、让用户等得不烦。

一、为什么流式输出不是可选项？

先算一笔账：用户能接受的响应延迟是多少？

延迟	用户感知	适用场景
< 100ms	即时反馈	按钮点击、表单验证
100ms - 1s	流畅体验	页面加载、搜索建议
1s - 3s	需要等待提示	API 调用、简单计算
3s - 10s	明显等待	复杂查询、文档生成
> 10s	用户开始焦虑	长文本生成、复杂推理

现在看看 LLM 的实际情况：

GPT-4 生成 1000 字回答：5-15 秒
Claude 3.5 Sonnet 代码生成：10-30 秒
多轮 Agent 推理 + 工具调用：30 秒 - 几分钟

没有流式输出，用户面对的就是一堵黑墙。

Streaming 的价值不只是”让用户看到字在蹦”，而是：

感知加速：首 token 时间从 10s 降到 0.5s，体验提升 20 倍
可中断：用户看到方向错了，可以立即打断
渐进理解：边生成边理解，降低认知负荷
错误早暴露：中间结果有问题，用户能即时发现

二、LangGraph Streaming 核心架构

LangGraph 的 Streaming 系统不是简单的”把 LLM 的流透传出去”，而是一个分层设计的事件流架构。

2.1 五种 Stream Mode 全景图

┌─────────────────────────────────────────────────────────┐
│                    LangGraph Streaming                   │
├─────────────┬─────────────┬─────────────┬───────────────┤
│   values    │   updates   │   custom    │   messages    │
│  (全状态)    │  (状态增量)  │  (自定义)    │  (LLM Token)  │
├─────────────┴─────────────┴─────────────┴───────────────┤
│                      debug (调试全量)                     │
└─────────────────────────────────────────────────────────┘

Mode 1: `values` —— 全状态快照

每次 super-step 结束后，输出完整的 State。

for state in graph.stream(
    {"query": "讲个笑话"},
    stream_mode="values"
):
    print(state)
    # {'query': '讲个笑话', 'topic': '讲个笑话', 'joke': ''}
    # {'query': '讲个笑话', 'topic': '讲个笑话和猫', 'joke': ''}
    # {'query': '讲个笑话', 'topic': '讲个笑话和猫', 'joke': '这是一个关于讲个笑话和猫的笑话...'}

适用场景：

需要完整上下文做前端渲染
状态回溯和可视化
调试时查看完整数据流

注意：内存占用大，State 复杂时慎用。

Mode 2: `updates` —— 状态增量（推荐）

只输出当前 step 的变更，高效且聚焦。

for update in graph.stream(
    {"query": "讲个笑话"},
    stream_mode="updates"
):
    print(update)
    # {'refine_topic': {'topic': '讲个笑话和猫'}}
    # {'generate_joke': {'joke': '这是一个关于讲个笑话和猫的笑话...'}}

适用场景：

前端增量更新 UI
日志记录（只记变化）
大多数生产环境首选

Mode 3: `messages` —— LLM Token 流

捕获图中所有 LLM 调用的 token 输出。

for token, metadata in graph.stream(
    {"topic": "AI"},
    stream_mode="messages"
):
    # token: AIMessageChunk(content="AI")
    # metadata: {'langgraph_node': 'write_poem', 'tags': ['poem']}
    print(token.content, end="", flush=True)

metadata 包含：

langgraph_node: 哪个节点调用的 LLM
tags: 模型标签（用于过滤）
run_id, parent_ids: 执行链路追踪

高级过滤：

# 只获取特定节点的 token
for token, meta in graph.stream(inputs, stream_mode="messages"):
    if meta["langgraph_node"] == "write_poem":
        print(token.content, end="")

# 只获取特定标签的 token  
for token, meta in graph.stream(inputs, stream_mode="messages"):
    if meta.get("tags") == ["joke"]:
        print(token.content, end="")

Mode 4: `custom` —— 自定义数据流

通过 get_stream_writer() 从节点内部发送任意数据。

from langgraph.config import get_stream_writer

def long_running_task(state: State):
    writer = get_stream_writer()
    
    # 发送进度更新
    for i in range(10):
        time.sleep(1)
        writer({
            "type": "progress",
            "step": i + 1,
            "total": 10,
            "message": f"正在处理第 {i+1}/10 个文件..."
        })
    
    return {"result": "任务完成"}

# 消费自定义流
for chunk in graph.stream(inputs, stream_mode="custom"):
    if chunk["type"] == "progress":
        update_progress_bar(chunk["step"], chunk["total"])

典型应用场景：

文件上传/处理进度
多步骤工作流状态
外部 API 调用状态
耗时计算的中期结果

Mode 5: `debug` —— 调试全量

输出所有内部事件，包括：

每个节点的输入/输出
State 的完整变化
任务调度信息
检查点操作

1
2
3

for event in graph.stream(inputs, stream_mode="debug"):
    # 包含 channel_writes, tasks, config 等全部信息
    print(event)

适用场景：

开发调试
性能分析
问题排查
不建议生产环境使用（数据量太大）

2.2 多模式组合使用

LangGraph 支持同时开启多个 stream mode：

# 组合 updates + messages + custom
for mode, chunk in graph.stream(
    inputs,
    stream_mode=["updates", "messages", "custom"]
):
    if mode == "updates":
        # 处理状态更新
        update_ui_state(chunk)
    elif mode == "messages":
        # 处理 LLM token
        token, metadata = chunk
        append_token_to_chat(token.content)
    elif mode == "custom":
        # 处理自定义进度
        if chunk.get("type") == "progress":
            update_progress(chunk)

组合策略建议：

场景	推荐组合
聊天应用	`["messages", "updates"]`
工作流进度	`["updates", "custom"]`
全链路追踪	`["updates", "messages", "custom", "debug"]`
极简模式	`["updates"]`

三、子图（Subgraph）的流式传播

复杂 Agent 经常嵌套子图，LangGraph 的 Streaming 能自动穿透层级。

3.1 启用子图流式输出

for chunk in graph.stream(
    inputs,
    stream_mode="updates",
    subgraphs=True  # 关键参数！
):
    print(chunk)

输出格式变为 (namespace, data) 元组：

(
    (),  # 空 tuple = 主图
    {'node_1': {'foo': 'hi! foo'}}
)
(
    ('node_2:dfddc4ba-c3c5-6887-5012-a243b5b377c2',),  # 子图命名空间
    {'subgraph_node_1': {'bar': 'bar'}}
)
(
    ('node_2:dfddc4ba-c3c5-6887-5012-a243b5b377c2',),
    {'subgraph_node_2': {'foo': 'hi! foobar'}}
)
(
    (),  # 回到主图
    {'node_2': {'foo': 'hi! foobar'}}
)

3.2 命名空间解析

for namespace, update in graph.stream(inputs, stream_mode="updates", subgraphs=True):
    if not namespace:
        print("【主图】", update)
    else:
        path = " -> ".join(namespace)
        print(f"【子图: {path}】", update)

深度嵌套示例：

1 2	namespace = ('parent:abc123', 'child:def456', 'grandchild:ghi789') 表示: 主图 -> parent 子图 -> child 子图 -> grandchild 子图

3.3 子图 Token 流聚合

# 聚合所有子图的 LLM token
all_tokens = defaultdict(list)

for mode, chunk in graph.stream(
    inputs,
    stream_mode=["messages", "updates"],
    subgraphs=True
):
    if mode == "messages":
        token, meta = chunk
        node_path = meta.get("langgraph_node", "unknown")
        all_tokens[node_path].append(token.content)

# 按节点路径分组显示
for path, tokens in all_tokens.items():
    print(f"【{path}】{''.join(tokens)}")

四、生产级流式架构设计

4.1 WebSocket + LangGraph 实时推送

from fastapi import FastAPI, WebSocket
from fastapi.websockets import WebSocketDisconnect
import asyncio
import json

app = FastAPI()

class ConnectionManager:
    def __init__(self):
        self.active_connections: dict[str, WebSocket] = {}
    
    async def connect(self, websocket: WebSocket, client_id: str):
        await websocket.accept()
        self.active_connections[client_id] = websocket
    
    def disconnect(self, client_id: str):
        del self.active_connections[client_id]
    
    async def send_token(self, client_id: str, token: str):
        if ws := self.active_connections.get(client_id):
            await ws.send_json({"type": "token", "content": token})
    
    async def send_state(self, client_id: str, state: dict):
        if ws := self.active_connections.get(client_id):
            await ws.send_json({"type": "state", "data": state})
    
    async def send_progress(self, client_id: str, progress: dict):
        if ws := self.active_connections.get(client_id):
            await ws.send_json({"type": "progress", "data": progress})

manager = ConnectionManager()

@app.websocket("/ws/{client_id}")
async def websocket_endpoint(websocket: WebSocket, client_id: str):
    await manager.connect(websocket, client_id)
    
    try:
        while True:
            # 接收用户输入
            data = await websocket.receive_json()
            query = data.get("query")
            thread_id = data.get("thread_id", client_id)
            
            config = {"configurable": {"thread_id": thread_id}}
            
            # 流式执行 Graph
            async for mode, chunk in graph.astream(
                {"query": query},
                config,
                stream_mode=["messages", "updates", "custom"]
            ):
                if mode == "messages":
                    token, meta = chunk
                    if token.content:
                        await manager.send_token(client_id, token.content)
                
                elif mode == "updates":
                    # 发送状态更新（可用于显示当前执行节点）
                    node_name = list(chunk.keys())[0]
                    await manager.send_state(client_id, {
                        "current_node": node_name,
                        "update": chunk[node_name]
                    })
                
                elif mode == "custom":
                    # 转发自定义进度
                    await manager.send_progress(client_id, chunk)
            
            # 流结束标记
            await websocket.send_json({"type": "complete"})
            
    except WebSocketDisconnect:
        manager.disconnect(client_id)

4.2 Server-Sent Events (SSE) 方案

更适合简单的单向流式推送：

from fastapi import FastAPI
from fastapi.responses import StreamingResponse
from sse_starlette.sse import EventSourceResponse

app = FastAPI()

async def event_generator(query: str, thread_id: str):
    config = {"configurable": {"thread_id": thread_id}}
    
    async for mode, chunk in graph.astream(
        {"query": query},
        config,
        stream_mode=["messages", "updates"]
    ):
        if mode == "messages":
            token, meta = chunk
            if token.content:
                yield {
                    "event": "token",
                    "data": json.dumps({
                        "content": token.content,
                        "node": meta.get("langgraph_node")
                    })
                }
        
        elif mode == "updates":
            yield {
                "event": "state",
                "data": json.dumps(chunk)
            }
    
    yield {"event": "complete", "data": "{}"}

@app.get("/stream/{thread_id}")
async def stream_response(query: str, thread_id: str):
    return EventSourceResponse(
        event_generator(query, thread_id),
        media_type="text/event-stream"
    )

前端消费：

const eventSource = new EventSource(`/stream/${thread_id}?query=${encodeURIComponent(query)}`);

eventSource.addEventListener('token', (e) => {
    const data = JSON.parse(e.data);
    appendToken(data.content);
});

eventSource.addEventListener('state', (e) => {
    const state = JSON.parse(e.data);
    updateStateDisplay(state);
});

eventSource.addEventListener('complete', () => {
    eventSource.close();
});

4.3 带中断控制的流式执行

from langgraph.types import Command, interrupt
import asyncio

class CancellableStream:
    def __init__(self):
        self._cancelled = False
        self._current_task = None
    
    def cancel(self):
        self._cancelled = True
    
    async def stream(self, graph, inputs, config):
        async for mode, chunk in graph.astream(inputs, config, stream_mode="messages"):
            if self._cancelled:
                # 发送中断信号
                yield {"type": "cancelled"}
                break
            
            yield {"mode": mode, "chunk": chunk}

# 使用示例
streamer = CancellableStream()

# 开始流式输出
async for event in streamer.stream(graph, inputs, config):
    if event["type"] == "cancelled":
        print("流已中断")
        break
    # 处理正常事件...

# 用户点击"停止"按钮时
streamer.cancel()

五、高级技巧与性能优化

5.1 Token 级去重与合并

多个节点同时输出时，可能出现交错：

from dataclasses import dataclass
import time

@dataclass
class TokenBuffer:
    """按节点聚合 token，减少前端渲染抖动"""
    buffer: dict[str, list[str]]
    last_flush: float
    flush_interval: float = 0.05  # 50ms
    
    def add(self, node: str, token: str):
        if node not in self.buffer:
            self.buffer[node] = []
        self.buffer[node].append(token)
    
    def should_flush(self) -> bool:
        return time.time() - self.last_flush > self.flush_interval
    
    def flush(self) -> dict[str, str]:
        result = {k: "".join(v) for k, v in self.buffer.items() if v}
        self.buffer = {k: [] for k in self.buffer}
        self.last_flush = time.time()
        return result

# 使用
buffer = TokenBuffer(buffer={})

async for mode, chunk in graph.astream(inputs, stream_mode="messages"):
    if mode == "messages":
        token, meta = chunk
        node = meta.get("langgraph_node", "unknown")
        buffer.add(node, token.content)
        
        if buffer.should_flush():
            batches = buffer.flush()
            for n, content in batches.items():
                if content:
                    await send_to_frontend(n, content)

5.2 流式数据的结构化封装

from enum import Enum
from pydantic import BaseModel
from typing import Literal, Union

class EventType(str, Enum):
    TOKEN = "token"
    STATE = "state"
    PROGRESS = "progress"
    TOOL_CALL = "tool_call"
    TOOL_RESULT = "tool_result"
    ERROR = "error"
    COMPLETE = "complete"

class TokenEvent(BaseModel):
    type: Literal[EventType.TOKEN]
    content: str
    node: str
    model: str | None = None

class StateEvent(BaseModel):
    type: Literal[EventType.STATE]
    node: str
    update: dict
    timestamp: float

class ToolCallEvent(BaseModel):
    type: Literal[EventType.TOOL_CALL]
    tool_name: str
    arguments: dict
    call_id: str

StreamEvent = Union[TokenEvent, StateEvent, ToolCallEvent, ...]

# 统一输出格式
async def structured_stream(graph, inputs, config):
    async for mode, chunk in graph.astream(
        inputs, config, 
        stream_mode=["messages", "updates", "custom"]
    ):
        if mode == "messages":
            token, meta = chunk
            if token.content:
                yield TokenEvent(
                    type=EventType.TOKEN,
                    content=token.content,
                    node=meta.get("langgraph_node", "unknown"),
                    model=meta.get("model_name")
                ).model_dump_json()
        
        elif mode == "updates":
            for node, update in chunk.items():
                yield StateEvent(
                    type=EventType.STATE,
                    node=node,
                    update=update,
                    timestamp=time.time()
                ).model_dump_json()

5.3 异步与同步的选择

# 同步版本（简单场景）
for chunk in graph.stream(inputs, stream_mode="updates"):
    print(chunk)

# 异步版本（生产环境推荐）
async for chunk in graph.astream(inputs, stream_mode="updates"):
    await process_async(chunk)

Python < 3.11 的特殊处理：

import sys
from langgraph.types import StreamWriter

if sys.version_info < (3, 11):
    # 必须显式传递 writer 参数
    async def my_node(state: State, writer: StreamWriter):
        writer({"progress": 50})
        return {"result": "done"}
else:
    # 可以使用 get_stream_writer()
    from langgraph.config import get_stream_writer
    
    async def my_node(state: State):
        writer = get_stream_writer()
        writer({"progress": 50})
        return {"result": "done"}

六、完整实战：智能客服实时对话系统

import asyncio
import json
from datetime import datetime
from typing import TypedDict, Annotated, AsyncGenerator
from operator import add

from fastapi import FastAPI, WebSocket
from langchain.chat_models import init_chat_model
from langgraph.graph import StateGraph, MessagesState, START, END
from langgraph.checkpoint.postgres.aio import AsyncPostgresSaver
from langgraph.config import get_stream_writer

# ============== 配置 ==============
DB_URI = "postgresql://user:pass@localhost/langgraph"
model = init_chat_model("gpt-4o-mini")

# ============== 状态定义 ==============
class CustomerState(MessagesState):
    customer_id: str
    session_id: str
    intent: str | None = None
    requires_escalation: bool = False
    sentiment: str = "neutral"

# ============== 节点实现 ==============
async def analyze_intent(state: CustomerState):
    """分析用户意图并流式输出思考过程"""
    writer = get_stream_writer()
    
    writer({
        "type": "thinking",
        "message": "🔍 正在分析用户意图..."
    })
    
    # 模拟耗时分析
    await asyncio.sleep(0.5)
    
    response = await model.ainvoke([
        {"role": "system", "content": "分析用户意图，输出: query|intent|urgency"},
        {"role": "user", "content": state["messages"][-1].content}
    ])
    
    parts = response.content.split("|")
    intent = parts[1] if len(parts) > 1 else "general"
    urgency = parts[2] if len(parts) > 2 else "low"
    
    writer({
        "type": "intent_detected",
        "intent": intent,
        "urgency": urgency
    })
    
    return {"intent": intent}

async def generate_response(state: CustomerState):
    """生成回复，token 级流式输出"""
    
    # 这里使用 .ainvoke 也能触发 messages mode
    # 因为 LangGraph 会拦截所有 LLM 调用
    response = await model.ainvoke([
        {"role": "system", "content": "你是专业客服助手，回答简洁友好。"},
        *state["messages"]
    ])
    
    return {"messages": [{"role": "assistant", "content": response.content}]}

async def check_escalation(state: CustomerState):
    """检查是否需要升级人工"""
    writer = get_stream_writer()
    
    escalation_keywords = ["投诉", "退款", "经理", "不满", "差评"]
    content = state["messages"][-1].content.lower()
    
    requires_escalation = any(kw in content for kw in escalation_keywords)
    
    if requires_escalation:
        writer({
            "type": "escalation",
            "message": "⚠️ 检测到需要人工介入的信号",
            "reason": "关键词触发"
        })
    
    return {"requires_escalation": requires_escalation}

# ============== 构建图 ==============
def route_by_escalation(state: CustomerState) -> str:
    if state["requires_escalation"]:
        return "human_handoff"
    return "generate_response"

builder = StateGraph(CustomerState)
builder.add_node("analyze_intent", analyze_intent)
builder.add_node("check_escalation", check_escalation)
builder.add_node("generate_response", generate_response)

builder.add_edge(START, "analyze_intent")
builder.add_edge("analyze_intent", "check_escalation")
builder.add_conditional_edges(
    "check_escalation",
    route_by_escalation,
    {"human_handoff": END, "generate_response": "generate_response"}
)
builder.add_edge("generate_response", END)

# ============== WebSocket 服务 ==============
app = FastAPI()

class ChatManager:
    def __init__(self):
        self.connections: dict[str, WebSocket] = {}
    
    async def handle_message(
        self,
        websocket: WebSocket,
        customer_id: str,
        message: str
    ):
        async with AsyncPostgresSaver.from_conn_string(DB_URI) as checkpointer:
            await checkpointer.asetup()
            
            graph = builder.compile(checkpointer=checkpointer)
            
            thread_id = f"{customer_id}:{datetime.now().strftime('%Y%m%d')}"
            config = {"configurable": {"thread_id": thread_id}}
            
            # 流式执行
            async for mode, chunk in graph.astream(
                {
                    "messages": [{"role": "user", "content": message}],
                    "customer_id": customer_id,
                    "session_id": thread_id
                },
                config,
                stream_mode=["messages", "updates", "custom"]
            ):
                if mode == "messages":
                    token, meta = chunk
                    if token.content:
                        await websocket.send_json({
                            "type": "token",
                            "content": token.content,
                            "node": meta.get("langgraph_node")
                        })
                
                elif mode == "updates":
                    await websocket.send_json({
                        "type": "state_update",
                        "data": chunk
                    })
                
                elif mode == "custom":
                    await websocket.send_json({
                        "type": "event",
                        "data": chunk
                    })
            
            await websocket.send_json({"type": "complete"})

manager = ChatManager()

@app.websocket("/chat/{customer_id}")
async def chat_endpoint(websocket: WebSocket, customer_id: str):
    await websocket.accept()
    
    try:
        while True:
            data = await websocket.receive_json()
            await manager.handle_message(
                websocket,
                customer_id,
                data.get("message", "")
            )
    except Exception as e:
        print(f"Error: {e}")
    finally:
        await websocket.close()

# ============== 启动 ==============
if __name__ == "__main__":
    import uvicorn
    uvicorn.run(app, host="0.0.0.0", port=8000)

七、与 Checkpoint 的协同工作

Streaming 和 Checkpoint 是 LangGraph 的两大支柱，它们协同工作：

┌─────────────────────────────────────────────────────────┐
│                      Agent 执行流程                       │
├─────────────────────────────────────────────────────────┤
│  1. 用户输入 → graph.stream()                           │
│                                                          │
│  2. 执行 Node A                                          │
│     ├─► Stream token ───────────────────► 前端          │
│     ├─► Stream progress ────────────────► 前端          │
│     └─► Checkpoint 保存状态 ◄──────────── 持久化         │
│                                                          │
│  3. 执行 Node B                                          │
│     ├─► Stream token ───────────────────► 前端          │
│     ├─► Stream tool call ───────────────► 前端          │
│     └─► Checkpoint 保存状态 ◄──────────── 持久化         │
│                                                          │
│  4. 中断/恢复                                            │
│     ├─► HITL 中断: 从最新 Checkpoint 恢复               │
│     └─► 流式继续: 从断点继续推送                         │
└─────────────────────────────────────────────────────────┘

关键配合点：

Checkpoint 保证状态不丢，Streaming 保证体验流畅
从 Checkpoint 恢复后，可以重新启动 stream 继续推送
Thread ID 同时用于状态隔离和流式会话管理

八、性能基准与优化建议

8.1 不同 Mode 的性能开销

Mode	数据量	CPU 开销	适用场景
`updates`	低（仅变化）	低	默认首选
`values`	高（全状态）	中	小状态场景
`messages`	中（token 流）	低	LLM 对话
`custom`	可控	低	进度通知
`debug`	极高	高	仅调试

8.2 吞吐量优化

# 1. 批量处理 tokens
buffer = []
buffer_size = 10

for token, meta in graph.stream(inputs, stream_mode="messages"):
    buffer.append(token.content)
    if len(buffer) >= buffer_size:
        send_to_frontend("".join(buffer))
        buffer = []

# 2. 使用 asyncio 并发处理
async for mode, chunk in graph.astream(...):
    asyncio.create_task(process_chunk_async(mode, chunk))

# 3. 压缩传输数据
import zlib
compressed = zlib.compress(json.dumps(data).encode())

8.3 内存管理

# 避免缓存过多历史
def limited_history_stream(graph, inputs, config, max_history=100):
    count = 0
    for chunk in graph.stream(inputs, config, stream_mode="updates"):
        yield chunk
        count += 1
        if count > max_history:
            raise StopIteration("历史记录超过限制")

九、研究收获与总结

9.1 核心认知

Streaming 是架构问题，不是 UI 问题
- 从设计阶段就要考虑哪些数据需要流式输出
- 节点粒度的设计直接影响流式体验
五种 Mode 各有战场
- updates：日常首选，轻量高效
- messages：对话场景必备
- custom：工作流进度的救星
- values/debug：调试专用
子图流式需要显式开启
- subgraphs=True 参数别忘了
- 命名空间机制让多层级调试成为可能

9.2 生产 checklist

选择合适的 stream mode（通常 updates + messages）
WebSocket/SSE 连接稳定性保障
Token 级缓冲与批量发送
流式中断与恢复机制
与 Checkpoint 的协同配置
前端渲染性能优化（虚拟滚动等）

9.3 与昨日主题的关联

昨日: Checkpoint 持久化 —— 让 Agent 记得住
今日: Streaming 流式输出 —— 让 Agent 说得顺

两者结合，才能构建出既可靠又流畅的 AI Agent 系统。

参考资源

唔，从 Token 到事件，从同步到异步，Streaming 这玩意儿算是整明白了。现在你的 Agent 不仅能记住（Checkpoint），还能说得流畅自然（Streaming），就差一套靠谱的人机协作（HITL）了 —— 哦等等，HITL 前几天已经写过了 😏

这个 LangGraph 三部曲算是齐活了，明天来点什么新活儿呢？ 🔷

文章编号: #cypher-auto-write-morning-20260217
执行时间: 2026-02-17 10:00 UTC
Agent: Cypher v2.3

文章作者: Channing

文章链接: https://blog.aichanning.cn/langgraph-streaming-architecture/