# Machine Learning Glossary: Reinforcement Learning

#rl

## Bellman equation

#rl

$Q(s, a) = r(s, a) + \gamma \mathbb{E}_{s'|s,a} \max_{a'} Q(s', a')$

$Q(s,a) \gets Q(s,a) + \alpha \left[r(s,a) + \gamma \max_{a'} Q(s',a') - Q(s,a) \right]$
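
The update rule above can be sketched for a tabular Q-function as follows. This is a minimal illustration, not a reference implementation: the function name `q_learning_update`, the dictionary-based table, and the state and action labels are all assumptions made for the example.

```python
from collections import defaultdict

def q_learning_update(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.9):
    """Apply one Q-learning update to the table Q in place; return the new Q(s, a)."""
    # Bellman target: immediate reward plus the discounted best value in s_next.
    target = r + gamma * max(Q[(s_next, a_next)] for a_next in actions)
    # Move Q(s, a) a fraction alpha of the way toward the target.
    Q[(s, a)] += alpha * (target - Q[(s, a)])
    return Q[(s, a)]

# Hypothetical two-action problem; unseen (state, action) pairs default to 0.0.
Q = defaultdict(float)
actions = ["left", "right"]
Q[("s1", "right")] = 2.0

new_value = q_learning_update(
    Q, "s0", "right", r=1.0, s_next="s1", actions=actions, alpha=0.5, gamma=0.9
)
print(new_value)  # target is 1.0 + 0.9 * 2.0 = 2.8, so Q moves halfway from 0 to about 1.4
```

Here $\alpha$ is the learning rate: with $\alpha = 1$ the old estimate is replaced by the target, and with smaller values it is blended with it.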

## Deep Q-Network (DQN)

#rl

In Q-learning, a deep neural network that predicts Q-functions.

Critic is a synonym for Deep Q-Network.
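
As a rough sketch of what "a network that predicts Q-functions" means, the example below maps a state vector to one estimated Q-value per action using a tiny two-layer network in plain Python. Everything here is an assumption for illustration: the helper names (`make_layer`, `dense`, `q_network`), the layer sizes, and the sample state. The weights are random and untrained, so the outputs are not meaningful estimates; in practice such a network would be trained toward the Q-learning target, usually with a deep learning framework.

```python
import random

def make_layer(n_in, n_out, rng):
    """Create a dense layer as (weights, biases) with small random weights."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases

def dense(layer, x):
    """Compute W x + b for one layer."""
    weights, biases = layer
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, biases)]

def q_network(params, state):
    """Map a state vector to one estimated Q-value per action."""
    hidden = [max(0.0, h) for h in dense(params[0], state)]  # ReLU activation
    return dense(params[1], hidden)

rng = random.Random(0)
# Hypothetical sizes: 4 state features, 8 hidden units, 2 actions.
params = [make_layer(4, 8, rng), make_layer(8, 2, rng)]

q_values = q_network(params, [0.1, -0.2, 0.0, 0.3])
# Acting greedily means picking the action with the largest predicted Q-value.
greedy_action = max(range(len(q_values)), key=lambda a: q_values[a])
print(q_values, greedy_action)
```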

The Q-function is also known as the state-action value function.

## return

#rl

$$\text{Return} = r_0 + \gamma r_1 + \gamma^2 r_2 + \ldots + \gamma^{N-1} r_{N-1}$$
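
The formula above can be evaluated directly from a list of rewards. The following is a minimal sketch; the function name `discounted_return` and the sample rewards are assumptions chosen for illustration.

```python
def discounted_return(rewards, gamma):
    """Compute r_0 + gamma * r_1 + ... + gamma^(N-1) * r_(N-1)."""
    total = 0.0
    # Accumulate from the last reward backward, so each step applies one
    # extra factor of gamma to everything that comes later.
    for r in reversed(rewards):
        total = r + gamma * total
    return total

rewards = [1.0, 0.0, 2.0]
print(discounted_return(rewards, gamma=0.5))  # 1.0 + 0.5 * 0.0 + 0.25 * 2.0 = 1.5
```

With $\gamma = 1$ the return is the plain sum of rewards; smaller values of $\gamma$ weight later rewards less.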

Synonym for Q-function.

## trajectory

#rl
