Media Summary: Agent in "reacher" environment trained to reach the ball using a method proposed by Google DeepMind that uses an Actor-Critic structure but outputs a concrete action rather than action probabilities, for continuous actions ... ... in this way to work well with continuous actions is called
6 2 DDPG Deep Deterministic - Detailed Analysis & Overview
Welcome to Week 10 Lecture 4 of the course "Special topics in ML (Reinforcement Learning)" by Prof. Balaraman Ravindran.
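The summary describes an Actor-Critic method whose actor outputs a concrete action rather than action probabilities. A minimal sketch of that distinction, in plain Python, might look like the following. The function names, the single linear layer, the tanh squashing, and the example weights are all illustrative assumptions and are not taken from any of the referenced videos.

```python
import math


def deterministic_actor(state, weights, bias, action_bound=1.0):
    """Map a state directly to one continuous value per action dimension.

    `weights` holds one row per action dimension (hypothetical parameters).
    """
    action = []
    for row, b in zip(weights, bias):
        pre_activation = sum(w * s for w, s in zip(row, state)) + b
        # tanh squashes to (-1, 1); scaling keeps the action inside the bound.
        action.append(action_bound * math.tanh(pre_activation))
    return action


def softmax_policy(state, weights, bias):
    """For contrast: a stochastic policy that returns action probabilities."""
    logits = [
        sum(w * s for w, s in zip(row, state)) + b
        for row, b in zip(weights, bias)
    ]
    largest = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(logit - largest) for logit in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Illustrative inputs only (assumed values, not from the source).
state = [0.5, -0.2]
weights = [[1.0, 0.0], [0.0, 1.0]]
bias = [0.0, 0.0]

action = deterministic_actor(state, weights, bias, action_bound=2.0)
probs = softmax_policy(state, weights, bias)
```

Here `action` is a list of bounded real numbers that could be sent straight to a continuous-control environment, while `probs` is a distribution over a fixed set of discrete choices that would still need to be sampled from.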