引言:当 AI 做出决定,我们需要知道为什么

2024年,某银行的风控 AI 系统拒绝了一位客户的贷款申请,却无法给出明确原因。客户起诉银行"算法歧视",最终银行因无法解释决策逻辑而败诉。同一年,某医疗 AI 推荐的治疗方案被医生质疑,但系统只能输出"置信度 94%"——这个数字对临床决策毫无帮助。

这些事件揭示了一个被忽视的核心命题:当 AI Agent 从"回答问题"进化到"做出决策",可解释性不再是锦上添花,而是系统可信度的基石。

不确定性管理让我们知道 Agent “有多确定”,安全沙箱让我们确保 Agent “安全执行”,而可解释性让我们理解 Agent “为什么这样做”。三者共同构成可信 Agent 系统的支柱。

本文将系统性拆解 AI Agent 可解释性的工程实现路径,从 Tracing 追踪到 Token 级置信度与思维链分析,从本地调试到生产监控。


一、可解释性的层次模型

1.1 可解释性的三个维度

AI Agent 的可解释性不是单一概念,而是分层的系统工程:

| 层次 | 解释对象 | 关键问题 | 技术方案 |
| --- | --- | --- | --- |
| 系统级 | 整个决策流程 | Agent 是如何一步步得出结论的? | Tracing、流程图可视化 |
| 模型级 | 单次推理过程 | 模型为什么生成这个 token? | Logprobs、Attention 可视化 |
| 工具级 | 外部工具调用 | 为什么调用这个工具?参数怎么来的? | 工具调用追踪、输入输出记录 |

1.2 为什么 LLM Agent 的可解释性更难

传统软件的可解释性相对简单——代码逻辑是确定的,输入输出可追溯。但 LLM Agent 面临独特挑战:

# 传统软件:逻辑完全可追踪
def calculate_risk(score):
    if score > 80:
        return "high"
    elif score > 50:
        return "medium"
    return "low"
# 输入 75 -> 输出 "medium",逻辑完全透明

# LLM Agent:黑盒推理
response = llm.invoke("评估这个客户的风险等级")
# 输出可能是 "medium",但我们不知道模型"想"了什么

LLM Agent 的可解释性挑战:

  1. 非确定性:相同输入可能产生不同推理路径
  2. 涌现行为:复杂行为无法从单步推理预测
  3. 工具链复杂:多工具调用的因果关系难以追踪
  4. 长上下文:长链推理中的信息流动不透明

二、系统级可解释性:Tracing 完整追踪

2.1 Tracing 的核心价值

Tracing(追踪)是理解 Agent 行为的入口。它记录 Agent 执行的全生命周期,将黑盒过程转化为可分析的数据流。

用户输入
  ↓
[节点: intent_analysis] - 意图分析
  ↓ (输出: {"intent": "weather_query", "confidence": 0.95})
[节点: tool_selection] - 工具选择
  ↓ (选择: get_weather, 参数: {"city": "Beijing"})
[节点: tool_execution] - 工具执行
  ↓ (结果: {"temp": 22, "condition": "sunny"})
[节点: response_generation] - 响应生成
  ↓
最终输出: "北京今天天气晴朗,22度"

2.2 LangSmith Tracing 实战

LangSmith 是 LangChain 官方提供的可观测性平台,提供开箱即用的 Tracing 能力。

基础集成:

import os
from langsmith import Client
from langchain_openai import ChatOpenAI

# 配置 LangSmith
os.environ["LANGSMITH_TRACING"] = "true"
os.environ["LANGSMITH_API_KEY"] = "your-api-key"
os.environ["LANGSMITH_PROJECT"] = "agent-explainability-demo"

# 创建带追踪的 LLM
llm = ChatOpenAI(model="gpt-4", temperature=0)

# 所有调用自动被追踪
response = llm.invoke("什么是 AI Agent 的可解释性?")
print(response.content)

查看追踪结果:
运行后访问 https://smith.langchain.com 即可看到详细的执行追踪,包括:

  • 每次 LLM 调用的输入输出
  • Token 使用量
  • 延迟统计
  • 完整的调用链
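
除了在 Web 界面查看,也可以用 langsmith SDK 以编程方式拉取追踪数据,用于批量分析或接入自有看板。下面是一个简单示意(假设 LANGSMITH_API_KEY 已配置,project_name 沿用上文示例;Run 对象的具体字段以所用 SDK 版本为准):

from langsmith import Client

client = Client()

# 拉取指定项目下的 run 记录(返回迭代器,这里只看前 5 条)
for i, run in enumerate(client.list_runs(project_name="agent-explainability-demo")):
    if i >= 5:
        break
    print(f"[{run.run_type}] {run.name}")
    print(f"  输入: {str(run.inputs)[:80]}")
    print(f"  输出: {str(run.outputs)[:80]}")
    if run.start_time and run.end_time:
        latency_ms = (run.end_time - run.start_time).total_seconds() * 1000
        print(f"  延迟: {latency_ms:.0f}ms")
    # Token 用量字段在不同版本 SDK 中的位置可能不同,这里做防御性访问
    print(f"  Token 用量: {getattr(run, 'total_tokens', 'N/A')}")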

2.3 自定义 Tracing 属性

为了让追踪更具解释性,我们可以添加自定义元数据:

from langchain_core.runnables import RunnableConfig
from langsmith import traceable

@traceable(
    run_type="chain",
    name="RiskAssessment",
    tags=["production", "v2.1"]
)
def assess_risk(customer_data: dict) -> dict:
    """
    风险评估节点 - 带完整追踪
    """
    # 业务逻辑
    risk_score = calculate_risk(customer_data)

    # 返回带解释性元数据的结果
    return {
        "risk_level": "high" if risk_score > 80 else "medium" if risk_score > 50 else "low",
        "risk_score": risk_score,
        "factors": ["credit_history", "income_stability", "debt_ratio"],
        "confidence": 0.92
    }

# 使用
result = assess_risk({"age": 35, "income": 50000})

追踪输出的解释性价值:

{
  "run_id": "abc-123",
  "name": "RiskAssessment",
  "inputs": {"customer_data": {"age": 35, "income": 50000}},
  "outputs": {
    "risk_level": "medium",
    "risk_score": 65,
    "factors": ["credit_history", "income_stability", "debt_ratio"],
    "confidence": 0.92
  },
  "metadata": {
    "customer_id": "cust_789",
    "model_version": "v2.1",
    "review_required": false
  },
  "latency_ms": 245,
  "tokens_used": 156
}
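
上面 JSON 中的 metadata(如 customer_id、model_version)不会自动出现:对被 @traceable 装饰的函数,可以在调用时通过 langsmith_extra 附加这类业务元数据。下面是一个示意(字段名沿用上文示例):

# 调用时附加业务元数据,使追踪记录携带审计所需的上下文
result = assess_risk(
    {"age": 35, "income": 50000},
    langsmith_extra={
        "metadata": {
            "customer_id": "cust_789",
            "model_version": "v2.1"
        },
        "tags": ["risk-assessment"]
    }
)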

2.4 LangGraph 中的 Tracing 集成

LangGraph 原生支持 LangSmith Tracing,无需额外配置即可获得完整的图执行追踪。

import time
from typing import TypedDict

from langgraph.graph import StateGraph, END
from langchain_openai import ChatOpenAI
from langsmith import traceable

class AgentState(TypedDict):
    query: str
    intent: dict
    tool_calls: list
    response: str
    reasoning_chain: list  # 记录推理链用于解释

# 每个节点自动被追踪
@traceable(run_type="chain")
def analyze_intent(state: AgentState) -> AgentState:
    """意图分析节点"""
    llm = ChatOpenAI(model="gpt-4")

    prompt = f"分析用户查询的意图:{state['query']}"
    response = llm.invoke(prompt)

    # 记录推理步骤
    reasoning_step = {
        "node": "analyze_intent",
        "input": state["query"],
        "output": response.content,
        "timestamp": time.time()
    }

    return {
        **state,
        "intent": {"category": "weather", "confidence": 0.95},
        "reasoning_chain": state.get("reasoning_chain", []) + [reasoning_step]
    }

@traceable(run_type="tool")
def execute_tool(state: AgentState) -> AgentState:
    """工具执行节点"""
    tool_name = state["intent"].get("tool")
    params = state["intent"].get("params")

    # 执行工具(call_tool 为业务侧已有的工具分发函数,此处略去实现)
    result = call_tool(tool_name, params)

    reasoning_step = {
        "node": "execute_tool",
        "tool": tool_name,
        "params": params,
        "result": result,
        "timestamp": time.time()
    }

    return {
        **state,
        "tool_calls": state.get("tool_calls", []) + [result],
        "reasoning_chain": state.get("reasoning_chain", []) + [reasoning_step]
    }

# 构建图(generate_response 节点与上述节点写法相同,此处略去其实现)
workflow = StateGraph(AgentState)
workflow.add_node("analyze_intent", analyze_intent)
workflow.add_node("execute_tool", execute_tool)
workflow.add_node("generate_response", generate_response)

workflow.set_entry_point("analyze_intent")
workflow.add_edge("analyze_intent", "execute_tool")
workflow.add_edge("execute_tool", "generate_response")
workflow.add_edge("generate_response", END)

# 编译 - 自动启用 Tracing
app = workflow.compile()

# 执行
result = app.invoke({"query": "北京今天天气怎么样?"})

LangSmith 中的可视化效果:

在 LangSmith 界面中,你会看到:

  1. 拓扑图:整个 Agent 图的执行路径
  2. 时间线:每个节点的执行时序
  3. 数据流:状态在各节点间的传递
  4. Token 消耗:每个 LLM 调用的成本
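
LangSmith 之外,状态中积累的 reasoning_chain 本身就是一份可以直接展示给用户或写入审计日志的解释数据。下面是把它渲染成文本时间线的一个简单示意(沿用上面 AgentState 的字段约定):

def render_reasoning_chain(final_state: AgentState) -> str:
    """把图执行过程中积累的 reasoning_chain 渲染成可读的时间线"""
    lines = ["推理链:"]
    for i, step in enumerate(final_state.get("reasoning_chain", []), 1):
        lines.append(f"{i}. [{step['node']}]")
        for key, value in step.items():
            if key in ("node", "timestamp"):
                continue
            lines.append(f"   {key}: {str(value)[:80]}")
    return "\n".join(lines)

# result 为上文 app.invoke(...) 的返回值
print(render_reasoning_chain(result))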

三、模型级可解释性:深入推理过程

3.1 Token 级别的置信度分析

LLM 生成文本时,每个 token 都有概率分布。分析这些分布可以揭示模型的"犹豫"和"确定"。

from langchain_openai import ChatOpenAI
import numpy as np

def analyze_token_confidence(response) -> dict:
    """
    分析响应中每个 token 的置信度
    """
    if not hasattr(response, 'response_metadata'):
        return {"error": "No metadata available"}

    # 获取 token 级别的 logprobs
    logprobs = response.response_metadata.get('logprobs', {})

    if not logprobs:
        return {"error": "Logprobs not enabled"}

    token_probs = []
    low_confidence_tokens = []

    for token_info in logprobs.get('content', []):
        token = token_info.get('token', '')
        logprob = token_info.get('logprob', 0)
        prob = np.exp(logprob)

        token_probs.append({
            'token': token,
            'probability': prob,
            'logprob': logprob
        })

        # 标记低置信度 token(概率 < 0.7)
        if prob < 0.7:
            low_confidence_tokens.append({
                'token': token,
                'probability': prob,
                'top_alternatives': token_info.get('top_logprobs', [])[:3]
            })

    return {
        'tokens': token_probs,
        'avg_confidence': np.mean([t['probability'] for t in token_probs]),
        'low_confidence_count': len(low_confidence_tokens),
        'low_confidence_tokens': low_confidence_tokens,
        'overall_uncertainty': len(low_confidence_tokens) / len(token_probs) if token_probs else 0
    }

# 使用
llm = ChatOpenAI(
    model="gpt-4",
    temperature=0,
    model_kwargs={"logprobs": True, "top_logprobs": 5}  # 启用 logprobs,并返回每个位置的候选 token
)

response = llm.invoke("解释量子纠缠现象")
confidence_analysis = analyze_token_confidence(response)

print(f"平均置信度: {confidence_analysis['avg_confidence']:.3f}")
print(f"低置信度 Token 数量: {confidence_analysis['low_confidence_count']}")
print(f"低置信度 Token: {[t['token'] for t in confidence_analysis['low_confidence_tokens']]}")

解释性价值:

当模型在生成专业术语或数字时"犹豫"(低置信度 token),这往往意味着:

  • 训练数据中该领域的样本较少
  • 问题本身存在歧义
  • 模型在"猜测"而非"知道"
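
相比一个整体置信度分数,把低置信度 token 直接标注在输出文本中,更便于人工快速定位模型"犹豫"的位置。下面是基于上面 analyze_token_confidence 返回结构的一个渲染示意:

def highlight_low_confidence(analysis: dict, threshold: float = 0.7) -> str:
    """用 ⟦token:概率⟧ 的形式标注低置信度 token,其余原样拼接"""
    pieces = []
    for t in analysis.get('tokens', []):
        if t['probability'] < threshold:
            pieces.append(f"⟦{t['token']}:{t['probability']:.2f}⟧")
        else:
            pieces.append(t['token'])
    return "".join(pieces)

print(highlight_low_confidence(confidence_analysis))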

3.2 Self-Consistency 可视化

Self-Consistency(自一致性)是一种强大的不确定性量化方法,同样具有很强的解释性价值。

import asyncio
from typing import List, Tuple
from collections import Counter
import difflib

from langchain_openai import ChatOpenAI

class ExplainableSelfConsistency:
    """带解释性的 Self-Consistency 检查器"""

    def __init__(self, num_samples: int = 5, temperature: float = 0.7):
        self.num_samples = num_samples
        self.temperature = temperature

    async def generate_with_explanation(
        self,
        llm,
        prompt: str
    ) -> Tuple[str, List[dict]]:
        """
        生成响应并附带一致性解释
        """
        # 生成多个样本
        # temperature 是模型参数而非 RunnableConfig 字段,这里通过 bind 以非零温度采样,
        # 保证多次调用产生多样化输出
        sampler = llm.bind(temperature=self.temperature)
        samples = []
        for _ in range(self.num_samples):
            response = await sampler.ainvoke(prompt)
            samples.append(response.content)

        # 分析一致性
        analysis = self._analyze_consistency(samples)

        # 生成解释报告
        explanation = self._generate_explanation(samples, analysis)

        return analysis['consensus'], explanation

    def _analyze_consistency(self, samples: List[str]) -> dict:
        """分析样本间的一致性"""
        # 简化的精确匹配分析
        normalized = [s.strip().lower() for s in samples]
        counter = Counter(normalized)

        most_common = counter.most_common(1)[0]
        consensus_answer, count = most_common

        # 找出差异点
        differences = []
        if len(counter) > 1:
            for i, sample in enumerate(samples):
                for j, other in enumerate(samples[i+1:], i+1):
                    diff = list(difflib.unified_diff(
                        sample.splitlines(),
                        other.splitlines(),
                        lineterm='',
                        n=2
                    ))
                    if diff:
                        differences.append({
                            'sample_a': i,
                            'sample_b': j,
                            'diff': '\n'.join(diff[:10])  # 限制长度
                        })

        return {
            'consensus': consensus_answer,
            'consensus_ratio': count / len(samples),
            'total_samples': len(samples),
            'unique_answers': len(counter),
            'all_samples': samples,
            'differences': differences
        }

    def _generate_explanation(self, samples: List[str], analysis: dict) -> List[dict]:
        """生成人类可读的解释"""
        explanation = []

        # 一致性概述
        consensus_ratio = analysis['consensus_ratio']
        if consensus_ratio == 1.0:
            confidence_desc = "高度一致 - 所有样本给出相同回答"
        elif consensus_ratio >= 0.8:
            confidence_desc = "较为一致 - 大部分样本观点一致"
        elif consensus_ratio >= 0.5:
            confidence_desc = "存在分歧 - 样本间有显著差异"
        else:
            confidence_desc = "高度不确定 - 样本分歧很大"

        explanation.append({
            'type': 'consensus_summary',
            'description': confidence_desc,
            'ratio': consensus_ratio,
            'unique_count': analysis['unique_answers']
        })

        # 分歧分析
        if analysis['differences']:
            explanation.append({
                'type': 'differences',
                'description': f"发现 {len(analysis['differences'])} 处显著差异",
                'details': analysis['differences'][:3]  # 只显示前3个差异
            })

        # 样本展示
        explanation.append({
            'type': 'samples',
            'description': '所有生成的样本',
            'samples': [{'index': i, 'content': s[:200]} for i, s in enumerate(samples)]
        })

        return explanation

# 使用示例
async def main():
    llm = ChatOpenAI(model="gpt-4")
    checker = ExplainableSelfConsistency(num_samples=3)

    # 测试不同复杂度的问题
    questions = [
        "2 + 2 等于几?",        # 应该高度一致
        "解释区块链的工作原理",   # 可能有不同角度
        "预测明年股市走势",       # 应该分歧很大
    ]

    for question in questions:
        print(f"\n{'='*60}")
        print(f"问题: {question}")

        answer, explanation = await checker.generate_with_explanation(llm, question)

        print(f"\n共识答案: {answer[:100]}...")
        print(f"\n解释报告:")
        for item in explanation:
            print(f"  [{item['type']}] {item['description']}")

# asyncio.run(main())

解释性价值:

通过 Self-Consistency 分析,我们可以:

  1. 量化不确定性:一致性比率直接映射到置信度
  2. 识别模糊点:查看差异分析了解模型在哪些方面"犹豫"
  3. 提供替代视角:展示多个有效回答,让用户了解问题的开放性

3.3 Chain-of-Thought 可视化

Chain-of-Thought(思维链) prompting 不仅提升推理能力,也提供了可解释性。

from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI

def trace_reasoning_chain(problem: str) -> dict:
    """
    追踪思维链推理过程
    """
    llm = ChatOpenAI(model="gpt-4", temperature=0)

    # CoT Prompt
    prompt = ChatPromptTemplate.from_messages([
        ("system", """你是一个会展示思考过程的 AI。请按以下格式回答:

THINKING:
1. 首先...
2. 然后...
3. 最后...

CONCLUSION:
最终答案"""),
        ("human", "{problem}")
    ])

    chain = prompt | llm
    response = chain.invoke({"problem": problem})

    # 解析思维链
    content = response.content
    thinking_part = ""
    conclusion_part = ""

    if "THINKING:" in content and "CONCLUSION:" in content:
        thinking_start = content.find("THINKING:") + len("THINKING:")
        conclusion_start = content.find("CONCLUSION:")

        thinking_part = content[thinking_start:conclusion_start].strip()
        conclusion_part = content[conclusion_start + len("CONCLUSION:"):].strip()

    # 结构化推理步骤
    steps = []
    for line in thinking_part.split('\n'):
        line = line.strip()
        if line and line[0].isdigit():
            # 提取步骤编号和内容
            step_num = line.split('.')[0]
            step_content = '.'.join(line.split('.')[1:]).strip()
            steps.append({
                "step": int(step_num),
                "content": step_content
            })

    return {
        "problem": problem,
        "reasoning_steps": steps,
        "thinking_raw": thinking_part,
        "conclusion": conclusion_part,
        "step_count": len(steps)
    }

# 使用示例
result = trace_reasoning_chain("""
一个水箱有 100 升水,每天蒸发 5%。
每天又加入 3 升水。
10 天后水箱里有多少水?
""")

print(f"问题: {result['problem']}")
print(f"\n思维链步骤 ({result['step_count']} 步):")
for step in result['reasoning_steps']:
    print(f"  {step['step']}. {step['content']}")
print(f"\n结论: {result['conclusion']}")

解释性价值:

思维链可视化让用户能够:

  1. 验证推理逻辑:检查每一步是否合理
  2. 定位错误点:如果结论错误,在哪一步出了问题
  3. 学习推理模式:了解 AI 是如何处理这类问题的
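
以上面的水箱问题为例,思维链中的每一步都可以用确定性代码交叉验证。下面是一个独立于 LLM 的核对脚本示意(假设每天先蒸发 5%、再加入 3 升):

# 用确定性计算核对思维链的结论
water = 100.0
for day in range(1, 11):
    water = water * 0.95 + 3  # 先蒸发 5%,再加入 3 升
    print(f"第 {day} 天结束: {water:.2f} 升")

# 如果 LLM 的 CONCLUSION 与第 10 天的结果明显不符,即可定位是哪一步推理出了错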

四、工具级可解释性:追踪工具调用

4.1 工具调用决策追踪

当 Agent 调用外部工具时,我们需要理解:为什么调用这个工具?参数是如何确定的?

import asyncio
from typing import Any, Dict, List
from dataclasses import dataclass
from datetime import datetime

@dataclass
class ToolCallRecord:
    """工具调用记录"""
    timestamp: str
    tool_name: str
    reasoning: str  # 为什么调用这个工具
    parameters: Dict[str, Any]
    result: Any
    execution_time_ms: float
    success: bool

class ExplainableToolExecutor:
    """带解释性的工具执行器"""

    def __init__(self):
        self.call_history: List[ToolCallRecord] = []

    async def execute_with_explanation(
        self,
        tool_name: str,
        parameters: Dict[str, Any],
        reasoning_context: str = ""
    ) -> Dict[str, Any]:
        """
        执行工具并记录完整上下文
        """
        start_time = datetime.now()

        # 生成工具选择解释
        reasoning = self._generate_tool_reasoning(
            tool_name,
            parameters,
            reasoning_context
        )

        try:
            # 执行工具
            result = await self._call_tool(tool_name, parameters)
            success = True
            error = None
        except Exception as e:
            result = None
            success = False
            error = str(e)

        execution_time = (datetime.now() - start_time).total_seconds() * 1000

        # 记录调用
        record = ToolCallRecord(
            timestamp=start_time.isoformat(),
            tool_name=tool_name,
            reasoning=reasoning,
            parameters=parameters,
            result=result if success else error,
            execution_time_ms=execution_time,
            success=success
        )
        self.call_history.append(record)

        return {
            "success": success,
            "result": result,
            "error": error,
            "explanation": {
                "why_this_tool": reasoning,
                "parameter_selection": self._explain_parameters(parameters),
                "execution_time_ms": execution_time
            }
        }

    async def _call_tool(self, tool_name: str, parameters: Dict[str, Any]) -> Any:
        """实际的工具分发逻辑(此处为占位实现,生产中应路由到真实工具)"""
        return {"tool": tool_name, "params": parameters}

    def _generate_tool_reasoning(
        self,
        tool_name: str,
        parameters: Dict,
        context: str
    ) -> str:
        """生成工具选择的理由"""
        tool_descriptions = {
            "search": "需要查找外部信息",
            "calculator": "需要进行精确计算",
            "weather_api": "需要获取实时天气数据",
            "database_query": "需要查询历史记录"
        }

        base_reason = tool_descriptions.get(tool_name, "匹配工具功能")

        if context:
            return f"{base_reason}。具体上下文: {context}"
        return base_reason

    def _explain_parameters(self, parameters: Dict) -> str:
        """解释参数选择"""
        explanations = []
        for key, value in parameters.items():
            explanations.append(f"{key}='{value}' (从用户输入提取)")
        return "; ".join(explanations)

    def get_call_history(self) -> List[ToolCallRecord]:
        """获取调用历史"""
        return self.call_history

    def generate_execution_report(self) -> dict:
        """生成执行报告"""
        total_calls = len(self.call_history)
        successful_calls = sum(1 for r in self.call_history if r.success)
        total_time = sum(r.execution_time_ms for r in self.call_history)

        return {
            "total_calls": total_calls,
            "success_rate": successful_calls / total_calls if total_calls > 0 else 0,
            "avg_execution_time_ms": total_time / total_calls if total_calls > 0 else 0,
            "tool_usage": self._get_tool_usage_stats(),
            "timeline": [
                {
                    "time": r.timestamp,
                    "tool": r.tool_name,
                    "success": r.success,
                    "duration_ms": r.execution_time_ms
                }
                for r in self.call_history
            ]
        }

    def _get_tool_usage_stats(self) -> Dict[str, int]:
        """统计工具使用频率"""
        stats = {}
        for record in self.call_history:
            stats[record.tool_name] = stats.get(record.tool_name, 0) + 1
        return stats

# 使用示例
async def demo_tool_explanation():
    executor = ExplainableToolExecutor()

    # 模拟工具调用
    tools = [
        ("search", {"query": "LangGraph 可解释性"}),
        ("calculator", {"expression": "100 * 0.95"}),
        ("weather_api", {"city": "Beijing"})
    ]

    for tool_name, params in tools:
        result = await executor.execute_with_explanation(
            tool_name=tool_name,
            parameters=params,
            reasoning_context=f"用户需求涉及 {tool_name} 相关功能"
        )
        print(f"\n工具: {tool_name}")
        print(f"解释: {result['explanation']['why_this_tool']}")

    # 生成报告
    report = executor.generate_execution_report()
    print(f"\n执行报告:")
    print(f"总调用次数: {report['total_calls']}")
    print(f"成功率: {report['success_rate']:.1%}")
    print(f"平均执行时间: {report['avg_execution_time_ms']:.1f}ms")
    print(f"工具使用统计: {report['tool_usage']}")

# asyncio.run(demo_tool_explanation())

4.2 工具调用链可视化

复杂的 Agent 可能调用多个工具,形成调用链。可视化这个链条对理解 Agent 行为至关重要。

def visualize_tool_chain(tool_records: List[ToolCallRecord]) -> str:
    """
    生成工具调用链的可视化文本表示
    """
    if not tool_records:
        return "无工具调用记录"

    lines = ["工具调用链:", "=" * 50]

    for i, record in enumerate(tool_records, 1):
        status_icon = "✓" if record.success else "✗"
        lines.append(f"\n[{i}] {status_icon} {record.tool_name}")
        lines.append(f"    时间: {record.timestamp}")
        lines.append(f"    原因: {record.reasoning}")
        lines.append(f"    参数: {record.parameters}")
        lines.append(f"    耗时: {record.execution_time_ms:.1f}ms")

        if not record.success:
            lines.append(f"    错误: {record.result}")
        else:
            result_preview = str(record.result)[:100]
            lines.append(f"    结果: {result_preview}...")

    lines.append("\n" + "=" * 50)
    return "\n".join(lines)
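
除了纯文本,也可以把调用链导出为 Mermaid 流程图,在支持 Mermaid 的文档或监控面板中渲染。下面是一个简单的导出示意(节点命名与样式为示例选择):

def tool_chain_to_mermaid(tool_records: List[ToolCallRecord]) -> str:
    """把工具调用链导出为 Mermaid 流程图文本"""
    lines = ["flowchart TD"]
    prev = "start([用户请求])"
    for i, record in enumerate(tool_records):
        status = "成功" if record.success else "失败"
        node = f'T{i}["{record.tool_name}<br/>{status} {record.execution_time_ms:.0f}ms"]'
        lines.append(f"    {prev} --> {node}")
        prev = f"T{i}"
    lines.append(f"    {prev} --> done([最终响应])")
    return "\n".join(lines)

# print(tool_chain_to_mermaid(executor.get_call_history()))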

五、生产级可解释性架构

5.1 完整的可解释性监控体系

from dataclasses import dataclass, field
from typing import Optional, List, Dict
import json
import time

@dataclass
class ExplainabilityReport:
    """可解释性报告"""
    request_id: str
    timestamp: float

    # 系统级
    execution_trace: List[Dict] = field(default_factory=list)
    latency_breakdown: Dict[str, float] = field(default_factory=dict)

    # 模型级
    token_confidence: Optional[Dict] = None
    self_consistency_score: Optional[float] = None
    reasoning_chain: Optional[List[str]] = None

    # 工具级
    tool_calls: List[Dict] = field(default_factory=list)

    # 综合评估
    explainability_score: float = 0.0  # 0-1,整体可解释性评分
    confidence_level: str = "unknown"  # high/medium/low

    def to_dict(self) -> Dict:
        return {
            "request_id": self.request_id,
            "timestamp": self.timestamp,
            "system_level": {
                "execution_trace": self.execution_trace,
                "latency_breakdown": self.latency_breakdown
            },
            "model_level": {
                "token_confidence": self.token_confidence,
                "self_consistency": self.self_consistency_score,
                "reasoning_chain": self.reasoning_chain
            },
            "tool_level": {
                "calls": self.tool_calls
            },
            "assessment": {
                "explainability_score": self.explainability_score,
                "confidence_level": self.confidence_level
            }
        }

class ExplainabilityMonitor:
    """
    可解释性监控系统
    整合所有级别的可解释性数据
    """

    def __init__(self, storage_path: str = "explainability_logs"):
        self.storage_path = storage_path
        self.reports: Dict[str, ExplainabilityReport] = {}

    def start_request(self, request_id: str) -> ExplainabilityReport:
        """开始记录新请求"""
        report = ExplainabilityReport(
            request_id=request_id,
            timestamp=time.time()
        )
        self.reports[request_id] = report
        return report

    def add_execution_step(
        self,
        request_id: str,
        node_name: str,
        input_data: Dict,
        output_data: Dict,
        latency_ms: float
    ):
        """添加执行步骤"""
        if request_id not in self.reports:
            return

        report = self.reports[request_id]
        report.execution_trace.append({
            "node": node_name,
            "input": input_data,
            "output": output_data,
            "latency_ms": latency_ms,
            "timestamp": time.time()
        })
        report.latency_breakdown[node_name] = latency_ms

    def add_model_explanation(
        self,
        request_id: str,
        token_confidence: Optional[Dict] = None,
        self_consistency: Optional[float] = None,
        reasoning: Optional[List[str]] = None
    ):
        """添加模型级解释"""
        if request_id not in self.reports:
            return

        report = self.reports[request_id]
        report.token_confidence = token_confidence
        report.self_consistency_score = self_consistency
        report.reasoning_chain = reasoning

    def add_tool_call(
        self,
        request_id: str,
        tool_name: str,
        reasoning: str,
        parameters: Dict,
        result: Dict,
        success: bool
    ):
        """添加工具调用记录"""
        if request_id not in self.reports:
            return

        self.reports[request_id].tool_calls.append({
            "tool": tool_name,
            "reasoning": reasoning,
            "parameters": parameters,
            "result": result,
            "success": success,
            "timestamp": time.time()
        })

    def finalize_report(self, request_id: str) -> ExplainabilityReport:
        """完成报告并计算可解释性评分"""
        if request_id not in self.reports:
            raise ValueError(f"Unknown request: {request_id}")

        report = self.reports[request_id]

        # 计算可解释性评分
        scores = []

        # 系统级:是否有完整的执行追踪
        if len(report.execution_trace) > 0:
            scores.append(1.0)

        # 模型级:是否有置信度信息
        if report.token_confidence:
            scores.append(0.8)
        if report.self_consistency_score:
            scores.append(0.7)
        if report.reasoning_chain:
            scores.append(0.9)

        # 工具级:是否有工具调用解释
        if all("reasoning" in tc for tc in report.tool_calls):
            scores.append(1.0)

        report.explainability_score = sum(scores) / len(scores) if scores else 0

        # 确定置信度级别
        if report.self_consistency_score:
            if report.self_consistency_score >= 0.8:
                report.confidence_level = "high"
            elif report.self_consistency_score >= 0.5:
                report.confidence_level = "medium"
            else:
                report.confidence_level = "low"

        # 保存报告
        self._save_report(report)

        return report

    def _save_report(self, report: ExplainabilityReport):
        """保存报告到文件"""
        import os
        os.makedirs(self.storage_path, exist_ok=True)

        filepath = f"{self.storage_path}/{report.request_id}.json"
        with open(filepath, "w") as f:
            json.dump(report.to_dict(), f, indent=2)

    def get_report(self, request_id: str) -> Optional[ExplainabilityReport]:
        """获取报告"""
        return self.reports.get(request_id)

    def generate_summary(self, time_window_hours: int = 24) -> Dict:
        """生成汇总统计"""
        cutoff_time = time.time() - (time_window_hours * 3600)
        recent_reports = [
            r for r in self.reports.values()
            if r.timestamp > cutoff_time
        ]

        if not recent_reports:
            return {"error": "No data in time window"}

        return {
            "period_hours": time_window_hours,
            "total_requests": len(recent_reports),
            "avg_explainability_score": sum(
                r.explainability_score for r in recent_reports
            ) / len(recent_reports),
            "confidence_distribution": {
                "high": sum(1 for r in recent_reports if r.confidence_level == "high"),
                "medium": sum(1 for r in recent_reports if r.confidence_level == "medium"),
                "low": sum(1 for r in recent_reports if r.confidence_level == "low")
            },
            "tool_usage": self._aggregate_tool_usage(recent_reports)
        }

    def _aggregate_tool_usage(self, reports: List[ExplainabilityReport]) -> Dict:
        """聚合工具使用统计"""
        usage = {}
        for report in reports:
            for tc in report.tool_calls:
                tool = tc["tool"]
                usage[tool] = usage.get(tool, 0) + 1
        return usage

5.2 人机协作的可解释性界面

def generate_human_readable_explanation(report: ExplainabilityReport) -> str:
    """
    生成人类可读的解释报告
    """
    lines = ["🤖 Agent 决策解释报告", "=" * 50, ""]

    # 概述
    lines.append(f"📊 整体可解释性评分: {report.explainability_score:.1%}")
    lines.append(f"🎯 置信度级别: {report.confidence_level.upper()}")
    lines.append("")

    # 执行流程
    lines.append("📋 执行流程:")
    for i, step in enumerate(report.execution_trace, 1):
        lines.append(f"  {i}. {step['node']} ({step['latency_ms']:.0f}ms)")
    lines.append("")

    # 思维链
    if report.reasoning_chain:
        lines.append("💭 推理过程:")
        for i, reasoning in enumerate(report.reasoning_chain, 1):
            lines.append(f"  {i}. {reasoning}")
        lines.append("")

    # 工具调用
    if report.tool_calls:
        lines.append("🔧 工具调用:")
        for tc in report.tool_calls:
            status = "✓" if tc["success"] else "✗"
            lines.append(f"  {status} {tc['tool']}")
            lines.append(f"    原因: {tc['reasoning']}")
        lines.append("")

    # 建议
    lines.append("💡 建议:")
    if report.confidence_level == "low":
        lines.append("  - 此决策置信度较低,建议人工复核")
    elif report.confidence_level == "medium":
        lines.append("  - 决策可信度中等,关键场景建议验证")
    else:
        lines.append("  - 决策可信度较高,可自动执行")

    if report.explainability_score < 0.5:
        lines.append("  - 可解释性信息不足,建议增强监控")

    return "\n".join(lines)
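
把 ExplainabilityMonitor 与上面的报告函数串起来,一次请求的完整生命周期大致如下(示意,节点名与数据均为假设):

monitor = ExplainabilityMonitor()

# 1. 请求开始
request_id = "req-20240101-0001"
monitor.start_request(request_id)

# 2. Agent 执行过程中,逐步写入各级可解释性数据
monitor.add_execution_step(
    request_id,
    node_name="analyze_intent",
    input_data={"query": "北京今天天气怎么样?"},
    output_data={"intent": "weather_query"},
    latency_ms=180.0
)
monitor.add_model_explanation(
    request_id,
    self_consistency=0.9,
    reasoning=["识别为天气查询", "城市参数取自用户输入"]
)
monitor.add_tool_call(
    request_id,
    tool_name="weather_api",
    reasoning="需要获取实时天气数据",
    parameters={"city": "Beijing"},
    result={"temp": 22, "condition": "sunny"},
    success=True
)

# 3. 请求结束,计算评分并落盘
final_report = monitor.finalize_report(request_id)

# 4. 渲染成面向人的解释文本
print(generate_human_readable_explanation(final_report))

# 5. 汇总最近 24 小时的整体情况
print(monitor.generate_summary(time_window_hours=24))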

六、最佳实践与实施路线图

6.1 可解释性策略选择矩阵

| 场景 | 推荐技术 | 实施复杂度 | 解释价值 |
| --- | --- | --- | --- |
| 日常调试 | LangSmith Tracing | | |
| 生产监控 | 完整监控体系 | | |
| 合规审计 | 全链路记录 | 极高 | |
| 用户解释 | CoT + 可视化 | | |
| 模型调试 | Token Logprobs | | |
| 关键决策 | Self-Consistency | 极高 | |

6.2 实施路线图

| 阶段 | 任务 | 产出 |
| --- | --- | --- |
| 第1周 | 集成 LangSmith Tracing | 基础可观测性 |
| 第2周 | 添加自定义元数据 | 业务上下文追踪 |
| 第3周 | 实现 Token 置信度分析 | 模型级解释 |
| 第4周 | 构建 Self-Consistency 检查 | 不确定性量化 |
| 第5-6周 | 开发可解释性监控系统 | 生产级可观测性 |
| 第7-8周 | 构建人机协作界面 | 用户友好的解释 |

6.3 与其他 P0 能力的协同

可解释性与安全沙箱、不确定性管理形成完整的能力三角:

        不确定性管理
          /    \
         /      \
        /        \
       /    ⚡    \
      /            \
可解释性 ←────────→ 安全沙箱

协同效果:

  1. 不确定性 → 可解释性:高不确定性区域需要更强的解释能力
  2. 可解释性 → 安全沙箱:解释工具调用意图,验证执行合理性
  3. 安全沙箱 → 可解释性:沙箱执行日志本身就是可解释性数据

七、结语

可解释性不是奢侈品,而是生产级 AI Agent 的必要基础设施。当 Agent 做出影响用户的重要决策时,"为什么"和"怎么做"同样重要。

核心要点:

  1. 分层实施:从系统级 Tracing 开始,逐步深入到模型级和工具级
  2. 持续监控:可解释性数据需要长期积累才能发挥价值
  3. 人机协作:最好的解释是让人类能够理解并验证 AI 的决策
  4. 与可信 AI 三位一体:可解释性 + 不确定性管理 + 安全沙箱 = 可信 Agent

记住:能被解释的 AI 才能被信任,能被信任的 AI 才能被使用。


本文是 AI Agent 可信系统系列的一部分。