Why does the Mars Rover have an expected return of 1.53 when starting from the first cell?

To answer the question mentioned in the previous post, about the difference in the critic neural network between the AC and DDPG algorithms, I thought it would be better to review Stanford's CS234 Reinforcement Learning online course. The first lecture went well, but then the second lecture got into the Markov Reward Process …
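As far as I can tell, the 1.53 in the title is just the state value V(s1) of the lecture's Mars Rover Markov Reward Process: the expected discounted sum of rewards when starting from the first cell. Below is a minimal NumPy sketch that solves the Bellman equation V = R + γ·P·V in closed form. The reward vector matches the lecture's setup (+1 in the first cell, +10 in the last, 0 elsewhere, γ = 0.5), but the transition matrix here is a placeholder of my own; substitute the exact P from the CS234 slides to reproduce V[0] ≈ 1.53.

```python
import numpy as np

# Solve the Bellman equation V = R + gamma * P @ V analytically:
# V = (I - gamma * P)^-1 @ R.
n = 7                      # seven cells in the Mars Rover chain
R = np.zeros(n)
R[0], R[-1] = 1.0, 10.0    # reward +1 in the first cell, +10 in the last
gamma = 0.5

# Placeholder dynamics (not the slides' exact numbers): stay put with
# probability 0.4, step left/right with 0.3 each; moves off the grid
# keep the rover in place.
P = np.zeros((n, n))
for s in range(n):
    P[s, max(s - 1, 0)] += 0.3
    P[s, s] += 0.4
    P[s, min(s + 1, n - 1)] += 0.3

V = np.linalg.solve(np.eye(n) - gamma * P, R)
print(V)  # V[0] is the expected return from the first cell
```

Since γ = 0.5 < 1, the matrix (I − γP) is always invertible, so the linear solve gives the exact value function in one step instead of iterating the Bellman backup until convergence.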

Why doesn't the critic network in the AC algorithm have an action path like the one in the DDPG algorithm?

I am currently working on answering this question before moving forward and applying the AC algorithm. I noticed that difference while migrating from the DDPG algorithm to the AC algorithm. The reason I am migrating to AC is that I need discrete actions for controlling my stepper motor, instead of the continuous actions of DDPG …
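To make the difference concrete, here is a minimal PyTorch sketch of the two critic shapes; the layer sizes and class names are my own placeholders, not anyone's reference implementation. The DDPG critic estimates Q(s, a), so the action enters the network as an input alongside the state. A discrete-action actor-critic typically estimates V(s) instead, so there is no action path at all.

```python
import torch
import torch.nn as nn

# DDPG-style critic: Q(s, a). The action is a network input, so the
# critic has an explicit "action path" concatenated with the state.
class DDPGCritic(nn.Module):
    def __init__(self, state_dim, action_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),  # scalar Q-value for this (s, a) pair
        )

    def forward(self, state, action):
        return self.net(torch.cat([state, action], dim=-1))

# Actor-critic-style critic: V(s). Only the state goes in; the action's
# quality is judged afterwards through the advantage estimate.
class ACCritic(nn.Module):
    def __init__(self, state_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),  # scalar state value V(s)
        )

    def forward(self, state):
        return self.net(state)
```

The intuition I am converging on: DDPG's critic must take the action as an input because a continuous action vector cannot be enumerated, whereas a discrete-action actor-critic can score the sampled action through an advantage estimate like A(s, a) ≈ r + γ·V(s′) − V(s) without ever feeding the action into the network.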