AI Agent Uncertainty Management: Teaching LLMs to Say "I Don't Know"

Published 2026-02-21 | Updated 2026-04-29 | 视界AI研究

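This morning we discussed how to protect an Agent's external boundary with a sandbox. This afternoon we turn to the Agent's internal cognitive abilities: how to get an LLM to recognize the limits of its own knowledge and ask for help gracefully when it is unsure, instead of confidently making things up.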

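1. The Nature of the Problem: LLM "Hallucination" and Overconfidence

1.1 A Realistic Case

Imagine you are building a medical consultation Agent:

# Dangerous confidence
user_query = "I've recently had headaches along with blurred vision. What could it be?"

# The model might answer:
response = """
Based on your description, these are likely typical symptoms of a migraine. Consider taking
ibuprofen for relief and make sure you get enough sleep. If the symptoms persist, you may
want to see a neurologist.
"""

The answer looks reasonable, but it has serious problems:

Headache plus blurred vision can be a sign of an acute glaucoma attack
Recommending self-medication can delay treatment of an emergency
No confidence is expressed: the model never says this is a guess rather than a diagnosis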

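1.2 Sources of Uncertainty

An LLM's uncertainty comes mainly from three levels:

Level	Type	Examples
Knowledge	Not covered by training data	Events after 2024, private domain knowledge
Reasoning	Broken logical chain	An intermediate step going wrong in multi-step reasoning
Input	Incomplete or ambiguous information	Ambiguous user query, missing key context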

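1.3 Why Traditional Approaches Fall Short

Common practices in traditional prompt engineering:

# Approach 1: ask the model to assess itself
prompt = """
Answer the question, then give a confidence score (0-100%) at the end.
Question: {question}
"""

# Approach 2: ask the model to refuse when it is unsure
prompt = """
If you are not sure of the answer, say "I don't know".
Question: {question}
"""

Limitations:

Poor calibration: an LLM's self-reported confidence often does not match its actual accuracy
Binary thinking: "know / don't know" cannot express degrees of uncertainty
No guidance for action: even when uncertainty is detected, nothing says what to do about it

We need a systematic framework for managing uncertainty.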

2. The Core Architecture of Uncertainty Management

2.1 Architecture Overview

┌─────────────────────────────────────────────────────────────┐
│          Agent Uncertainty Management Architecture          │
├─────────────────────────────────────────────────────────────┤
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐       │
│  │ Uncertainty  │  │  Confidence  │  │ Belief state │       │
│  │ detection    │→ │  calibration │→ │ management   │       │
│  └──────────────┘  └──────────────┘  └──────────────┘       │
│         ↓                 ↓                 ↓               │
│  ┌───────────────────────────────────────────────────────┐  │
│  │                Decision routing layer                 │  │
│  │ [Direct answer] [Clarify] [Tool call] [Human handoff] │  │
│  └───────────────────────────────────────────────────────┘  │
└─────────────────────────────────────────────────────────────┘

2.2 Layer 1: Uncertainty Detection

Uncertainty detection has to pick up several kinds of signal:

from dataclasses import dataclass
from enum import Enum
from typing import Optional, List

class UncertaintyType(Enum):
    KNOWLEDGE_GAP = "knowledge_gap"      # missing knowledge
    AMBIGUITY = "ambiguity"              # ambiguous input
    CONTRADICTION = "contradiction"      # internal contradiction
    LOW_CONFIDENCE = "low_confidence"    # low confidence
    CONTEXT_MISSING = "context_missing"  # insufficient context

@dataclass
class UncertaintySignal:
    type: UncertaintyType
    description: str
    severity: float  # 0-1
    source: str      # where the signal came from
    evidence: Optional[dict] = None

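Implementation 1: a self-assessment prompt

UNCERTAINTY_DETECTION_PROMPT = """You are an expert in uncertainty detection.

Analyze the question below and identify any uncertainty it may involve:

User question: {query}
Available context: {context}

Assess it along these dimensions:
1. Knowledge coverage: is the knowledge involved likely to be in the training data?
2. Time sensitivity: does it involve recent events?
3. Ambiguity: can the question be understood in more than one way?
4. Information completeness: is any key information missing?

Output in JSON format:
{{
    "uncertainties": [
        {{
            "type": "knowledge_gap|ambiguity|contradiction|low_confidence|context_missing",
            "description": "specific description",
            "severity": 0.0-1.0,
            "suggested_action": "clarify|search|escalate|proceed_with_caution"
        }}
    ],
    "overall_confidence": 0.0-1.0,
    "recommendation": "direct_answer|request_clarification|tool_invocation|human_handoff"
}}
"""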

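The prompt above only asks for JSON; something still has to turn the reply into signal objects and decide what happens when the reply is malformed. The sketch below is one possible way to do that. `parse_detection_output` is a hypothetical helper, not part of the original article, and treating unparseable output as maximal uncertainty is an assumption made here for safety.

```python
import json
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional, Tuple


class UncertaintyType(Enum):
    KNOWLEDGE_GAP = "knowledge_gap"
    AMBIGUITY = "ambiguity"
    CONTRADICTION = "contradiction"
    LOW_CONFIDENCE = "low_confidence"
    CONTEXT_MISSING = "context_missing"


@dataclass
class UncertaintySignal:
    type: UncertaintyType
    description: str
    severity: float
    source: str
    evidence: Optional[dict] = None


def _clamp01(value: float) -> float:
    return min(max(float(value), 0.0), 1.0)


def parse_detection_output(raw: str) -> Tuple[List[UncertaintySignal], float]:
    """Hypothetical helper: convert the model's JSON reply into signals.

    Malformed output is reported as maximal uncertainty instead of being trusted.
    """
    try:
        data = json.loads(raw)
        signals = [
            UncertaintySignal(
                type=UncertaintyType(item["type"]),
                description=str(item.get("description", "")),
                severity=_clamp01(item.get("severity", 1.0)),
                source="self_assessment_prompt",
                evidence={"suggested_action": item.get("suggested_action")},
            )
            for item in data.get("uncertainties", [])
        ]
        return signals, _clamp01(data.get("overall_confidence", 0.0))
    except (ValueError, KeyError, TypeError, AttributeError):
        fallback = UncertaintySignal(
            type=UncertaintyType.LOW_CONFIDENCE,
            description="Detector output could not be parsed",
            severity=1.0,
            source="self_assessment_prompt",
        )
        return [fallback], 0.0


reply = ('{"uncertainties": [{"type": "ambiguity", "description": "unclear scope", '
         '"severity": 1.4, "suggested_action": "clarify"}], "overall_confidence": 0.35}')
signals, confidence = parse_detection_output(reply)
print(signals[0].type.value, signals[0].severity, confidence)
```

Out-of-range severities are clamped to the 0-1 interval rather than rejected, so one sloppy field does not discard an otherwise usable reply.
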
Implementation 2: statistical methods based on logits

from typing import List, Optional

import torch
import numpy as np

def compute_token_entropy(logits: torch.Tensor) -> float:
    """
    Mean entropy (in nats) of the generated tokens' distributions; high entropy means high uncertainty
    """
    probs = torch.softmax(logits, dim=-1)
    entropy = -torch.sum(probs * torch.log(probs + 1e-10), dim=-1)
    return entropy.mean().item()

def compute_confidence_from_logprobs(logprobs: List[float]) -> float:
    """
    Average confidence from the generated tokens' log probabilities.
    exp(mean(logprob)) is the geometric mean of the per-token probabilities.
    """
    return float(np.exp(np.mean(logprobs)))

class UncertaintyDetector:
    def __init__(self, entropy_threshold: float = 2.0, 
                 confidence_threshold: float = 0.7):
        self.entropy_threshold = entropy_threshold
        self.confidence_threshold = confidence_threshold
    
    def analyze_generation(self, logits: torch.Tensor, 
                          logprobs: List[float]) -> Optional[UncertaintySignal]:
        entropy = compute_token_entropy(logits)
        confidence = compute_confidence_from_logprobs(logprobs)
        
        if entropy > self.entropy_threshold:
            return UncertaintySignal(
                type=UncertaintyType.LOW_CONFIDENCE,
                description=f"Generation entropy too high ({entropy:.2f}); the model is undecided",
                severity=min(entropy / 4.0, 1.0),
                source="token_entropy",
                evidence={"entropy": entropy, "confidence": confidence}
            )
        
        if confidence < self.confidence_threshold:
            return UncertaintySignal(
                type=UncertaintyType.LOW_CONFIDENCE,
                description=f"Low average token confidence ({confidence:.2f})",
                severity=1 - confidence,
                source="logprob_analysis",
                evidence={"confidence": confidence, "entropy": entropy}
            )
        
        return None  # no significant uncertainty
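
To make the two statistics above concrete, here is a small worked example in plain Python (no `torch` needed). It is only an illustration with made-up probabilities; `entropy_from_probs` and `geometric_mean_confidence` are names introduced here, mirroring `compute_token_entropy` and `compute_confidence_from_logprobs`.

```python
import math
from typing import List


def entropy_from_probs(probs: List[float]) -> float:
    """Shannon entropy in nats of one next-token distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def geometric_mean_confidence(logprobs: List[float]) -> float:
    """exp(mean log-prob): the geometric mean of the chosen tokens' probabilities."""
    return math.exp(sum(logprobs) / len(logprobs))


peaked = [0.97, 0.01, 0.01, 0.01]   # the model strongly prefers one token
flat = [0.25, 0.25, 0.25, 0.25]     # the model is torn between four tokens

print(f"peaked entropy: {entropy_from_probs(peaked):.3f} nats")
print(f"flat entropy:   {entropy_from_probs(flat):.3f} nats")  # equals ln(4)

# Three confident tokens followed by one shaky token (made-up values)
logprobs = [math.log(0.9), math.log(0.9), math.log(0.9), math.log(0.2)]
print(f"sequence confidence: {geometric_mean_confidence(logprobs):.3f}")
```

Two things are worth noticing. Entropy is bounded by the logarithm of the number of candidate tokens, so an absolute threshold such as 2.0 nats only makes sense for a full vocabulary. And because the confidence is a geometric mean, a single low-probability token is enough to pull the whole sequence below the default 0.7 threshold.
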

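2.3 Layer 2: Confidence Calibration

LLM confidence is often poorly calibrated: the model can be wrong when it sounds sure and right when it hesitates.

class ConfidenceCalibrator:
    """
    Calibrates model confidence from historical data
    """
    def __init__(self):
        self.calibration_data = []  # (raw_confidence, actual_correct)
    
    def record_outcome(self, raw_confidence: float, actual_correct: bool):
        """Record a prediction outcome for calibration"""
        self.calibration_data.append((raw_confidence, actual_correct))
    
    def calibrate(self, raw_confidence: float) -> float:
        """
        Placeholder calibration with hand-picked constants.
        Note: this is a piecewise-linear shrinkage, not Temperature Scaling;
        a fitted method should replace it once enough outcomes are recorded.
        """
        # Simple piecewise-linear calibration
        if raw_confidence < 0.3:
            return raw_confidence * 0.5  # push low confidence lower
        elif raw_confidence > 0.9:
            # Moderately reduce high confidence; continues from 0.81 at 0.9
            # up to 0.90 at 1.0 so the mapping stays monotonic
            return 0.81 + (raw_confidence - 0.9) * 0.9
        else:
            return raw_confidence * 0.9  # slightly reduce medium confidence
    
    def expected_calibration_error(self, n_bins: int = 10) -> float:
        """Compute ECE (Expected Calibration Error) over the recorded outcomes"""
        if not self.calibration_data:
            return 0.0
        bins = [[] for _ in range(n_bins)]
        for conf, correct in self.calibration_data:
            index = min(max(int(conf * n_bins), 0), n_bins - 1)
            bins[index].append((conf, correct))
        total = len(self.calibration_data)
        ece = 0.0
        for members in bins:
            if members:
                avg_conf = sum(c for c, _ in members) / len(members)
                accuracy = sum(1 for _, ok in members if ok) / len(members)
                # Weight each bin's |accuracy - confidence| gap by its share of samples
                ece += len(members) / total * abs(accuracy - avg_conf)
        return ece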
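
For comparison with the hand-tuned mapping above, the following is a minimal sketch of what a fitted, temperature-style calibration could look like for scalar confidences. It is a simplification introduced here, not the article's method: classic temperature scaling divides the full logit vector by T before the softmax, while this version rescales a single confidence in logit space and finds T by grid search. The function names and the sample history are hypothetical.

```python
import math
from typing import List, Tuple


def _logit(p: float, eps: float = 1e-6) -> float:
    p = min(max(p, eps), 1 - eps)   # keep log() finite at 0 and 1
    return math.log(p / (1 - p))


def apply_temperature(raw_confidence: float, temperature: float) -> float:
    """Rescale a confidence in logit space; T > 1 softens it, T < 1 sharpens it."""
    return 1 / (1 + math.exp(-_logit(raw_confidence) / temperature))


def fit_temperature(history: List[Tuple[float, bool]]) -> float:
    """Grid-search the T that minimises negative log-likelihood on past outcomes."""
    def nll(t: float) -> float:
        total = 0.0
        for raw, correct in history:
            p = apply_temperature(raw, t)
            total -= math.log(p if correct else 1 - p)
        return total / len(history)

    candidates = [0.5 + 0.05 * i for i in range(191)]  # 0.5 .. 10.0
    return min(candidates, key=nll)


# Made-up log: the model reports 0.9 but is right only 6 times out of 10
history = [(0.9, True)] * 6 + [(0.9, False)] * 4
T = fit_temperature(history)
print(f"fitted T = {T:.2f}, calibrated 0.9 -> {apply_temperature(0.9, T):.2f}")
```

On this overconfident history the fitted temperature comes out above 1, which pulls the reported 0.9 down towards the observed 60% accuracy. A single parameter cannot fix every shape of miscalibration, so ECE on held-out outcomes is still worth checking after fitting.
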

2.4 Layer 3: Belief State Management

The belief state is an explicit representation of what the Agent believes about the world:

from typing import Dict, List, Optional
from datetime import datetime

class Belief:
    """A single belief entry"""
    def __init__(self, content: str, confidence: float, 
                 source: str, timestamp: Optional[datetime] = None):
        self.content = content
        self.confidence = confidence
        self.source = source  # llm_inference, tool_result, user_input
        self.timestamp = timestamp or datetime.now()
        self.verification_status = "unverified"  # unverified|verified|contradicted
    
    def update_confidence(self, new_confidence: float, reason: str):
        old_confidence = self.confidence
        self.confidence = new_confidence
        return f"confidence {old_confidence:.2f} → {new_confidence:.2f} ({reason})"

class BeliefState:
    """
    The Agent's belief state manager
    """
    def __init__(self):
        self.beliefs: Dict[str, Belief] = {}
        self.contradictions: List[dict] = []
    
    def add_belief(self, key: str, content: str, confidence: float, source: str):
        """Add or update a belief"""
        # Build the new belief once, so the "contradicted" mark set below
        # stays on the object that is actually stored
        new_belief = Belief(content, confidence, source)
        existing = self.beliefs.get(key)
        # Different content under the same key counts as a conflict
        if existing is not None and existing.content != content:
            self._handle_contradiction(key, existing, new_belief)
        
        self.beliefs[key] = new_belief
    
    def _handle_contradiction(self, key: str, old: Belief, new: Belief):
        """Handle a belief conflict"""
        self.contradictions.append({
            "key": key,
            "belief_a": old,
            "belief_b": new,
            "timestamp": datetime.now()
        })
        
        # Mark both as needing resolution
        old.verification_status = "contradicted"
        new.verification_status = "contradicted"
    
    def query_belief(self, key: str, min_confidence: float = 0.5) -> Optional[Belief]:
        """Query a belief, subject to a confidence threshold"""
        belief = self.beliefs.get(key)
        if belief and belief.confidence >= min_confidence:
            return belief
        return None
    
    def get_uncertain_beliefs(self, threshold: float = 0.6) -> Dict[str, Belief]:
        """Return all low-confidence beliefs"""
        return {k: v for k, v in self.beliefs.items() 
                if v.confidence < threshold}
    
    def resolve_contradiction(self, key: str, winning_belief: str):
        """Resolve a belief conflict, manually or automatically"""
        # Conflict-resolution logic goes here
        pass
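
`resolve_contradiction` is left open above. One possible policy, shown here purely as a sketch, is to prefer the more trusted source and fall back to the higher confidence. The `SOURCE_PRIORITY` ordering, the standalone `resolve` function and the simplified `Belief` dataclass are all assumptions made for this illustration.

```python
from dataclasses import dataclass
from typing import Dict

# Assumed trust ordering; the article does not prescribe one.
SOURCE_PRIORITY: Dict[str, int] = {
    "tool_result": 3,
    "user_input": 2,
    "llm_inference": 1,
}


@dataclass
class Belief:
    """Simplified stand-in for the Belief class, with only the fields used here."""
    content: str
    confidence: float
    source: str
    verification_status: str = "unverified"


def resolve(old: Belief, new: Belief) -> Belief:
    """Pick a winner: more trusted source first, then higher confidence.

    On a full tie the newer belief wins, on the assumption that it is fresher.
    """
    old_rank = (SOURCE_PRIORITY.get(old.source, 0), old.confidence)
    new_rank = (SOURCE_PRIORITY.get(new.source, 0), new.confidence)
    winner, loser = (old, new) if old_rank > new_rank else (new, old)
    # Winning a conflict is not the same as being verified
    winner.verification_status = "unverified"
    loser.verification_status = "contradicted"
    return winner


guess = Belief("The API limit is 100 requests/min", 0.9, "llm_inference")
lookup = Belief("The API limit is 60 requests/min", 0.75, "tool_result")
print(resolve(guess, lookup).content)
```

Here the tool result wins even though the model's own guess carried a higher confidence number, which is the point of ranking by source first: an overconfident inference should not override an observed fact.
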

3. Implementing Uncertainty Management in LangGraph

3.1 Overall Flow Design

from langgraph.graph import StateGraph, END
from typing import TypedDict, Annotated, Sequence, List, Optional
import operator

class AgentState(TypedDict):
    messages: Annotated[Sequence[dict], operator.add]
    uncertainty_signals: List[UncertaintySignal]
    belief_state: BeliefState
    confidence_score: float
    action_decision: str  # direct_answer, clarify, search, escalate
    final_response: Optional[str]

# Define the nodes
def detect_uncertainty(state: AgentState):
    """Uncertainty detection node"""
    query = state["messages"][-1]["content"]
    
    # Run uncertainty detection.
    # NOTE: `analyze` (query in, list of signals out) and `compute_overall_confidence`
    # are not defined in this article; UncertaintyDetector above only provides
    # analyze_generation(logits, logprobs). Plug in a query-level detector here,
    # for example the prompt-based check from section 2.2.
    detector = UncertaintyDetector()
    signals = detector.analyze(query)
    
    return {
        "uncertainty_signals": signals,
        "confidence_score": compute_overall_confidence(signals)
    }

def decide_action(state: AgentState):
    """Decision routing node"""
    confidence = state["confidence_score"]
    signals = state["uncertainty_signals"]
    
    # Decision logic
    if confidence > 0.8 and not signals:
        return {"action_decision": "direct_answer"}
    elif any(s.type == UncertaintyType.AMBIGUITY for s in signals):
        return {"action_decision": "clarify"}
    elif any(s.type == UncertaintyType.KNOWLEDGE_GAP for s in signals):
        return {"action_decision": "search"}
    elif confidence < 0.4:
        return {"action_decision": "escalate"}
    else:
        return {"action_decision": "proceed_with_caution"}

def generate_response(state: AgentState):
    """Generate the final response"""
    decision = state["action_decision"]
    
    if decision == "direct_answer":
        response = generate_direct_answer(state)
    elif decision == "clarify":
        response = generate_clarification_question(state)
    elif decision == "search":
        response = generate_with_tool_call(state)
    elif decision == "escalate":
        response = generate_handoff_message(state)
    else:
        response = generate_cautious_answer(state)
    
    return {"final_response": response}

# Build the graph
workflow = StateGraph(AgentState)

workflow.add_node("detect", detect_uncertainty)
workflow.add_node("decide", decide_action)
workflow.add_node("generate", generate_response)

workflow.set_entry_point("detect")
workflow.add_edge("detect", "decide")
workflow.add_edge("decide", "generate")
workflow.add_edge("generate", END)

agent = workflow.compile()
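
The listing above calls `compute_overall_confidence` without defining it. The sketch below shows one simple rule that could fill that gap, together with the routing order of `decide_action` written as a plain function so it can be checked without LangGraph. The rule (the worst signal caps confidence) and the lightweight `Signal` stand-in are assumptions for illustration only.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Signal:
    """Stand-in for UncertaintySignal with only the fields used here."""
    type: str
    severity: float  # 0-1


def compute_overall_confidence(signals: List[Signal]) -> float:
    """Assumed rule: the worst signal caps confidence; no signals means 1.0."""
    if not signals:
        return 1.0
    worst = max(min(max(s.severity, 0.0), 1.0) for s in signals)
    return 1.0 - worst


def route(signals: List[Signal]) -> str:
    """Same branch order as decide_action, written as a plain function."""
    confidence = compute_overall_confidence(signals)
    if confidence > 0.8 and not signals:
        return "direct_answer"
    if any(s.type == "ambiguity" for s in signals):
        return "clarify"
    if any(s.type == "knowledge_gap" for s in signals):
        return "search"
    if confidence < 0.4:
        return "escalate"
    return "proceed_with_caution"


print(route([]))
print(route([Signal("knowledge_gap", 0.5)]))
print(route([Signal("low_confidence", 0.9)]))
print(route([Signal("low_confidence", 0.3)]))
```

Because the ambiguity and knowledge-gap checks come before the low-confidence check, a severe ambiguity still leads to a clarifying question rather than a handoff. Whether that ordering is right depends on how costly a wasted clarification round is compared with an unnecessary escalation.
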

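3.2 Implementing Each Decision Path

Path 1: Direct Answer

# `llm`, `format_history` and `format_signals` are assumed to exist: `llm` is an
# already configured model client whose invoke() returns text (with a LangChain
# chat model, use llm.invoke(...).content).
def generate_direct_answer(state: AgentState) -> str:
    """Direct answer when confidence is high"""
    messages = state["messages"]
    
    prompt = """Answer the user's question directly, based on the conversation history below.
    The system detected high confidence, so you can give the answer directly.
    
    {history}
    
    Give a concise, accurate answer.
    """
    
    return llm.invoke(prompt.format(history=format_history(messages)))

Path 2: Request Clarification

def generate_clarification_question(state: AgentState) -> str:
    """Generate a clarifying question"""
    ambiguity_signals = [s for s in state["uncertainty_signals"] 
                        if s.type == UncertaintyType.AMBIGUITY]
    
    prompt = f"""The user's question is ambiguous and needs clarification.

Ambiguities detected:
{format_signals(ambiguity_signals)}

Generate a friendly clarifying question that helps the user pin down what they need.
Requirements:
1. Point out the specific ambiguity
2. Offer possible options
3. Stay polite and professional
"""
    
    return llm.invoke(prompt)

# Example output:
# "The 'performance problem' you mentioned could mean several things:
#  1. Slow system response?
#  2. High memory usage?
#  3. Insufficient throughput?
#  Please describe what you are seeing so that I can help more precisely."

Path 3: Tool Invocation

def generate_with_tool_call(state: AgentState) -> str:
    """Gather more information through tool calls"""
    knowledge_gaps = [s for s in state["uncertainty_signals"]
                     if s.type == UncertaintyType.KNOWLEDGE_GAP]
    
    # Choose suitable tools
    tools_to_call = select_tools_for_gaps(knowledge_gaps)
    
    # Execute the tool calls
    tool_results = execute_tool_calls(tools_to_call)
    
    # Update the belief state
    for result in tool_results:
        state["belief_state"].add_belief(
            key=result["topic"],
            content=result["content"],
            confidence=result["confidence"],
            source="tool_result"
        )
    
    # Generate the answer from the tool results
    return generate_answer_with_evidence(state, tool_results)

Path 4: Human Handoff

def generate_handoff_message(state: AgentState) -> dict:
    """Generate the handoff payload for a human takeover"""
    low_confidence_signals = [s for s in state["uncertainty_signals"]
                             if s.severity > 0.7]
    
    prompt = f"""The system has detected a situation that needs a human expert.

Uncertainty analysis:
{format_signals(low_confidence_signals)}

Please generate:
1. An explanation to the user of why they are being transferred to a human
2. A brief summary of the current situation
3. A description of how the human agent will handle it
"""
    
    # Returns a dict (not a plain string) so the human agent also receives context
    return {
        "type": "handoff",
        "message": llm.invoke(prompt),
        "context": {
            "uncertainty_signals": state["uncertainty_signals"],
            "conversation_history": state["messages"]
        }
    }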
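
The paths above rely on two formatting helpers, `format_history` and `format_signals`, that the article does not show. A minimal sketch of what they might look like follows; the exact output layout is an assumption, and any format the prompt can read would do.

```python
from types import SimpleNamespace
from typing import Iterable, Mapping, Sequence


def format_history(messages: Sequence[Mapping[str, str]]) -> str:
    """Render role/content dicts as one 'role: content' line per message."""
    return "\n".join(f"{m['role']}: {m['content']}" for m in messages)


def format_signals(signals: Iterable) -> str:
    """Render signals as a bullet list that a prompt can embed.

    Accepts any object with .type, .severity and .description; .type may be
    an Enum member (its .value is used) or a plain string.
    """
    lines = []
    for s in signals:
        label = getattr(s.type, "value", s.type)
        lines.append(f"- [{label}] severity={s.severity:.2f}: {s.description}")
    return "\n".join(lines) if lines else "- (none detected)"


demo_signals = [
    SimpleNamespace(type="ambiguity", severity=0.6,
                    description="'performance' is underspecified"),
]
print(format_signals(demo_signals))
print(format_history([{"role": "user", "content": "My app is slow"}]))
```

Returning an explicit "none detected" line for an empty list keeps the prompt well-formed when a path is reached without matching signals.
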

3.3 Visualizing and Explaining Confidence

class UncertaintyVisualizer:
    """Visualizes the results of uncertainty analysis"""
    
    @staticmethod
    def format_uncertainty_report(signals: List[UncertaintySignal], 
                                   confidence: float) -> str:
        """Generate a human-readable uncertainty report"""
        
        report = f"""
## Confidence Analysis

**Overall confidence:** {'🟢' if confidence > 0.7 else '🟡' if confidence > 0.4 else '🔴'} {confidence:.1%}

**Concerns detected:**
"""
        
        for signal in signals:
            emoji = {
                UncertaintyType.KNOWLEDGE_GAP: "📚",
                UncertaintyType.AMBIGUITY: "❓",
                UncertaintyType.CONTRADICTION: "⚠️",
                UncertaintyType.LOW_CONFIDENCE: "🤔",
                UncertaintyType.CONTEXT_MISSING: "📝"
            }.get(signal.type, "•")
            
            report += f"\n{emoji} **{signal.type.value}** (severity: {signal.severity:.0%})\n"
            report += f"   {signal.description}\n"
        
        report += f"""
**System decision:** {get_action_description(signals)}
"""
        return report

def get_action_description(signals: List[UncertaintySignal]) -> str:
    """Describe the decision implied by the signals"""
    if not signals:
        return "Direct answer - the system is fairly sure of the answer"
    
    actions = []
    if any(s.type == UncertaintyType.AMBIGUITY for s in signals):
        actions.append("Request clarification")
    if any(s.type == UncertaintyType.KNOWLEDGE_GAP for s in signals):
        actions.append("Search for more information")
    if any(s.severity > 0.7 for s in signals):
        actions.append("Recommend human intervention")
    
    return " → ".join(actions) if actions else "Answer with caution"

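4. A Complete Implementation for Production

4.1 Enterprise Configuration

# Pydantic v2 moved BaseSettings into the separate pydantic-settings package;
# on Pydantic v1 use `from pydantic import BaseSettings` instead.
from pydantic_settings import BaseSettings

class UncertaintyConfig(BaseSettings):
    """Uncertainty management configuration"""
    
    # Thresholds
    HIGH_CONFIDENCE_THRESHOLD: float = 0.8
    LOW_CONFIDENCE_THRESHOLD: float = 0.4
    ENTROPY_THRESHOLD: float = 2.0
    
    # Behavior
    AUTO_ESCALATE_SEVERITY: float = 0.8  # severity above which to hand off automatically
    MAX_CLARIFICATION_ROUNDS: int = 3    # maximum number of clarification rounds
    ENABLE_CALIBRATION: bool = True      # whether confidence calibration is enabled
    
    # Tools
    SEARCH_CONFIDENCE_BOOST: float = 0.15  # confidence boost after a search
    HUMAN_ESCALATION_COOLDOWN: int = 300   # cooldown between human handoffs (seconds)
    
    # Monitoring
    LOG_UNCERTAINTY_METRICS: bool = True
    ALERT_ON_HIGH_UNCERTAINTY: bool = True

config = UncertaintyConfig()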
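
`MAX_CLARIFICATION_ROUNDS` and `HUMAN_ESCALATION_COOLDOWN` are declared above but not used in the code that follows. The sketch below shows one way they could be enforced per session. `EscalationPolicy`, its method, and the fallback to a cautious answer during the cooldown are hypothetical choices made for this illustration.

```python
import time
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class EscalationPolicy:
    """Hypothetical per-session guard built on two of the settings above."""
    max_clarification_rounds: int = 3
    human_escalation_cooldown: int = 300  # seconds
    _rounds: Dict[str, int] = field(default_factory=dict)
    _last_escalation: Dict[str, float] = field(default_factory=dict)

    def next_action(self, session_id: str, proposed: str,
                    now: Optional[float] = None) -> str:
        """Return the action to take, given what the router proposed."""
        now = time.time() if now is None else now
        if proposed == "clarify":
            used = self._rounds.get(session_id, 0)
            if used >= self.max_clarification_rounds:
                proposed = "escalate"      # stop asking, hand over instead
            else:
                self._rounds[session_id] = used + 1
        if proposed == "escalate":
            last = self._last_escalation.get(session_id)
            if last is not None and now - last < self.human_escalation_cooldown:
                return "cautious_answer"   # a handoff was requested recently
            self._last_escalation[session_id] = now
        return proposed


policy = EscalationPolicy()
actions = [policy.next_action("s1", "clarify", now=0.0) for _ in range(4)]
print(actions)
print(policy.next_action("s1", "escalate", now=10.0))
```

The fourth clarification attempt is converted into a handoff, and a second handoff ten seconds later is downgraded because it falls inside the cooldown window. Passing `now` explicitly keeps the policy easy to test.
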

4.2 The Complete Agent Class

class ProductionAgentState(AgentState, total=False):
    """AgentState plus the extra keys that the nodes below read and write"""
    context: dict
    raw_confidence: float
    generation_logits: Optional[torch.Tensor]
    generation_logprobs: List[float]

class UncertaintyAwareAgent:
    """
    A production-grade Agent with uncertainty management.
    The helpers _handle_*, _explicit_uncertainty_check, _compute_raw_confidence,
    _log_metrics and _get_belief_updates are application-specific and omitted here.
    """
    def __init__(self, config: Optional[UncertaintyConfig] = None):
        self.config = config or UncertaintyConfig()
        self.detector = UncertaintyDetector(
            entropy_threshold=self.config.ENTROPY_THRESHOLD,
            confidence_threshold=self.config.LOW_CONFIDENCE_THRESHOLD
        )
        self.calibrator = ConfidenceCalibrator()
        self.belief_state = BeliefState()
        self.workflow = self._build_workflow()
    
    async def invoke(self, query: str, context: Optional[dict] = None) -> dict:
        """
        Main entry point
        
        Returns:
            {
                "response": str,
                "confidence": float,
                "uncertainty_signals": List[UncertaintySignal],
                "action_taken": str,
                "belief_updates": List[dict]
            }
        """
        # Initialize the state
        state = ProductionAgentState(
            messages=[{"role": "user", "content": query}],
            belief_state=self.belief_state,
            context=context or {}
        )
        
        # Run the workflow
        result = await self.workflow.ainvoke(state)
        
        # Record metrics
        if self.config.LOG_UNCERTAINTY_METRICS:
            self._log_metrics(result)
        
        return {
            "response": result["final_response"],
            "confidence": result["confidence_score"],
            "uncertainty_signals": result["uncertainty_signals"],
            "action_taken": result["action_decision"],
            "belief_updates": self._get_belief_updates()
        }
    
    def _build_workflow(self):
        """Build the processing flow"""
        workflow = StateGraph(ProductionAgentState)
        
        # Add nodes
        workflow.add_node("detect", self._detect_node)
        workflow.add_node("calibrate", self._calibrate_node)
        workflow.add_node("decide", self._decide_node)
        workflow.add_node("execute", self._execute_node)
        workflow.add_node("verify", self._verify_node)
        
        # Set the entry point and edges
        workflow.set_entry_point("detect")
        workflow.add_edge("detect", "calibrate")
        workflow.add_edge("calibrate", "decide")
        workflow.add_edge("decide", "execute")
        workflow.add_edge("execute", "verify")
        workflow.add_edge("verify", END)
        
        return workflow.compile()
    
    async def _detect_node(self, state: AgentState):
        """Detect uncertainty"""
        query = state["messages"][-1]["content"]
        
        # Multi-dimensional detection
        signals = []
        
        # 1. Explicit, prompt-based detection
        explicit_signals = await self._explicit_uncertainty_check(query)
        signals.extend(explicit_signals)
        
        # 2. Implicit detection from generation statistics
        # (compare with None: the truth value of a multi-element tensor is ambiguous)
        if state.get("generation_logits") is not None:
            implicit_signal = self.detector.analyze_generation(
                state["generation_logits"],
                state.get("generation_logprobs", [])
            )
            if implicit_signal:
                signals.append(implicit_signal)
        
        return {
            "uncertainty_signals": signals,
            "raw_confidence": self._compute_raw_confidence(signals)
        }
    
    async def _calibrate_node(self, state: AgentState):
        """Calibrate confidence"""
        if not self.config.ENABLE_CALIBRATION:
            return {"confidence_score": state["raw_confidence"]}
        
        calibrated = self.calibrator.calibrate(state["raw_confidence"])
        return {"confidence_score": calibrated}
    
    async def _decide_node(self, state: AgentState):
        """Decision routing"""
        confidence = state["confidence_score"]
        signals = state["uncertainty_signals"]
        
        # Check whether an automatic human handoff is needed
        if any(s.severity > self.config.AUTO_ESCALATE_SEVERITY for s in signals):
            return {"action_decision": "escalate"}
        
        # Regular decision logic
        if confidence >= self.config.HIGH_CONFIDENCE_THRESHOLD:
            return {"action_decision": "direct_answer"}
        elif any(s.type == UncertaintyType.AMBIGUITY for s in signals):
            return {"action_decision": "clarify"}
        elif any(s.type == UncertaintyType.KNOWLEDGE_GAP for s in signals):
            return {"action_decision": "search"}
        else:
            return {"action_decision": "cautious_answer"}
    
    async def _execute_node(self, state: AgentState):
        """Execute the decision"""
        decision = state["action_decision"]
        
        handlers = {
            "direct_answer": self._handle_direct_answer,
            "clarify": self._handle_clarification,
            "search": self._handle_search,
            "cautious_answer": self._handle_cautious_answer,
            "escalate": self._handle_escalation
        }
        
        # Unknown decisions fall back to the cautious path, not a confident answer
        handler = handlers.get(decision, self._handle_cautious_answer)
        response = await handler(state)
        
        return {"final_response": response}
    
    async def _verify_node(self, state: AgentState):
        """Verify and update the belief state"""
        # Update the belief state
        for signal in state["uncertainty_signals"]:
            if signal.evidence:
                self.belief_state.add_belief(
                    key=f"uncertainty_{signal.type.value}_{datetime.now().isoformat()}",
                    content=signal.description,
                    confidence=1 - signal.severity,  # higher severity, lower belief confidence
                    source="uncertainty_detection"
                )
        
        # No state keys change here. Returning the whole state would re-append
        # `messages` through its operator.add reducer and duplicate the history.
        return {}

4.3 Monitoring and Evaluation

class UncertaintyMetrics:
    """Metrics for monitoring uncertainty management"""
    
    def __init__(self):
        self.metrics = {
            "total_queries": 0,
            "high_confidence_answers": 0,
            "clarification_requests": 0,
            "tool_invocations": 0,
            "human_escalations": 0,
            "calibration_errors": []
        }
    
    def record_interaction(self, result: dict, user_feedback: Optional[str] = None):
        """Record the outcome of one interaction"""
        self.metrics["total_queries"] += 1
        
        action = result["action_taken"]
        if action == "direct_answer":
            self.metrics["high_confidence_answers"] += 1
        elif action == "clarify":
            self.metrics["clarification_requests"] += 1
        elif action == "search":
            self.metrics["tool_invocations"] += 1
        elif action == "escalate":
            self.metrics["human_escalations"] += 1
        
        # If user feedback is available, record the calibration error
        if user_feedback:
            predicted_confidence = result["confidence"]
            actual_correct = user_feedback == "satisfied"
            calibration_error = abs(predicted_confidence - float(actual_correct))
            self.metrics["calibration_errors"].append(calibration_error)
    
    def get_summary(self) -> dict:
        """Return a summary of the metrics"""
        total = self.metrics["total_queries"]
        if total == 0:
            return {"message": "No data yet"}
        
        return {
            "total_queries": total,
            "escalation_rate": self.metrics["human_escalations"] / total,
            "clarification_rate": self.metrics["clarification_requests"] / total,
            "tool_usage_rate": self.metrics["tool_invocations"] / total,
            "mean_calibration_error": (
                sum(self.metrics["calibration_errors"]) / 
                len(self.metrics["calibration_errors"])
                if self.metrics["calibration_errors"] else 0
            )
        }

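5. Tying Back to This Morning's Sandbox Topic

5.1 A Complete Agent Security Architecture

Combining this morning's sandbox isolation with this afternoon's uncertainty management gives a complete Agent security architecture:

┌───────────────────────────────────────────────────────────────────┐
│            Agent Security and Reliability Architecture            │
├───────────────────────────────────────────────────────────────────┤
│                                                                   │
│  ┌─────────────────────────────────────────────────────────────┐  │
│  │           External boundary layer (morning topic)           │  │
│  │  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐       │  │
│  │  │ Sandbox      │  │ Resource     │  │ Audit        │       │  │
│  │  │ isolation    │  │ quotas       │  │ logs         │       │  │
│  │  └──────────────┘  └──────────────┘  └──────────────┘       │  │
│  └─────────────────────────────────────────────────────────────┘  │
│                                 ↓                                 │
│  ┌─────────────────────────────────────────────────────────────┐  │
│  │        Cognitive management layer (afternoon topic)         │  │
│  │  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐       │  │
│  │  │ Uncertainty  │  │ Confidence   │  │ Belief state │       │  │
│  │  │ detection    │  │ calibration  │  │ management   │       │  │
│  │  └──────────────┘  └──────────────┘  └──────────────┘       │  │
│  └─────────────────────────────────────────────────────────────┘  │
│                                 ↓                                 │
│  ┌─────────────────────────────────────────────────────────────┐  │
│  │                  Decision execution layer                   │  │
│  │ [Direct answer] ↔ [Clarify] ↔ [Tool call] ↔ [Human handoff] │  │
│  └─────────────────────────────────────────────────────────────┘  │
│                                                                   │
└───────────────────────────────────────────────────────────────────┘

5.2 How the Two Work Together

Scenario	Role of the sandbox	Role of uncertainty management
Executing external commands	Limits resources and network access	Flags potentially risky operations and asks for confirmation
Handling user uploads	Isolates the execution environment	Detects uncertainty about the content type
Calling third-party APIs	Timeout and error handling	Assesses how trustworthy the API response is
Generating advice	Restricts sensitive domains	States the confidence of the advice
Long-running tasks	Reclaims resources	Communicates uncertainty about progress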
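
As an illustration of the first row of the table, the sketch below shows how an uncertainty check might sit in front of sandboxed command execution. Everything in it is hypothetical: the `gate_command` function, the list of destructive prefixes and the 0.8 threshold are assumptions, and the sandbox itself is out of scope here.

```python
# Assumed examples of commands that change state irreversibly.
DESTRUCTIVE_PREFIXES = ("rm ", "drop ", "shutdown", "mkfs")


def gate_command(command: str, confidence: float,
                 confirm_below: float = 0.8) -> str:
    """Return 'run_in_sandbox' or 'ask_user' for a proposed command.

    The sandbox limits what a command can do; this gate decides whether the
    Agent is sure enough to run it without asking first.
    """
    risky = command.strip().lower().startswith(DESTRUCTIVE_PREFIXES)
    if risky or confidence < confirm_below:
        return "ask_user"
    return "run_in_sandbox"


print(gate_command("ls -la", 0.95))
print(gate_command("rm -rf build", 0.95))
print(gate_command("ls -la", 0.50))
```

A prefix list like this is only a coarse first filter. The point of the sketch is the division of labour: the sandbox bounds the damage, while the confidence gate decides when a human should confirm.
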

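6. Summary and Best Practices

6.1 Key Takeaways

Uncertainty detection: multi-dimensional detection (explicit plus implicit)
Confidence calibration: counters LLM overconfidence
Belief state management: an explicit representation of the Agent's cognitive state
Decision routing: a different strategy for each level of uncertainty
Explainability: show the confidence analysis to users transparently

6.2 Implementation Roadmap

Phase 1: Basic detection (1-2 weeks)

Implement prompt-based uncertainty detection
Add simple routing on confidence thresholds

Phase 2: Calibration (2-3 weeks)

Collect user feedback data
Implement confidence calibration
Tune the threshold parameters

Phase 3: State management (3-4 weeks)

Implement the belief state system
Add conflict detection and resolution
Support belief updates across multi-turn conversations

Phase 4: Production tuning (ongoing)

Metrics monitoring and alerting
A/B testing of different strategies
User satisfaction tracking

6.3 Key Design Principles

1. Transparency over perfection
   It is better to admit uncertainty than to give a wrong answer with confidence

2. Progressive disclosure
   Try low-cost options first (clarification, search) before handing off to a human

3. Continuous learning
   Calibrate the model from user feedback and improve uncertainty detection

4. Human-AI collaboration
   Uncertainty management does not replace humans; it divides work between humans and the Agent more sensibly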

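Written by Cypher | Together with this morning's "AI Agent Security Sandbox Design and Implementation", this post forms a complete series on security and reliability architecture

Further reading:

Morning post: "AI Agent Security Sandbox Design and Implementation: From Principles to Production Practice"
LangGraph official documentation: https://langchain-ai.github.io/langgraph/
Confidence calibration research: "On Calibration of Modern Neural Networks" (ICML 2017)

Author: Channing

Article link: https://blog.aichanning.cn/agent-uncertainty-management/

AI Agent LangGraph Uncertainty LLM Confidence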