COMA

Contents

  • credit assignment problem

credit assignment problem

  Since all agents are exploring and learning at the same time,
it is difficult for any given agent to estimate the impact of its own action on the
overall return.

  For example, an agent might have chosen the optimal action in a
given state, but the return is lower than average because its teammate took an
exploratory action. The agent (not the teammate) will thus falsely learn to reduce the probability of selecting this optimal action.
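
The effect described above can be reproduced with a tiny numerical sketch. Everything in it is an assumption made for illustration and is not taken from any particular paper: the two-action payoff in `team_reward`, the fixed `baseline` of 0.5 standing in for the "average" return, the learning rate, and the plain REINFORCE-style update in `reinforce_step`. Agent 1 picks the optimal action, the teammate explores, and the shared return alone drives the update.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def team_reward(a1, a2):
    # Hypothetical payoff: the team is rewarded only if both agents pick action 0.
    return 1.0 if (a1 == 0 and a2 == 0) else 0.0


def reinforce_step(logits, action, ret, baseline, lr=0.5):
    """One policy-gradient step on a softmax policy using the shared team return."""
    probs = softmax(logits)
    advantage = ret - baseline
    # Gradient of log pi(action) w.r.t. logit k is 1[k == action] - pi(k).
    return [
        logit + lr * advantage * ((1.0 if k == action else 0.0) - probs[k])
        for k, logit in enumerate(logits)
    ]


logits = [0.0, 0.0]   # agent 1 starts with a uniform policy
baseline = 0.5        # assumed average team return
a1 = 0                # agent 1 chooses the optimal action
a2 = 1                # the teammate takes an exploratory action

ret = team_reward(a1, a2)
p_before = softmax(logits)[0]
new_logits = reinforce_step(logits, a1, ret, baseline)
p_after = softmax(new_logits)[0]

print("team return:", ret)
print("P(optimal action) before:", p_before)
print("P(optimal action) after: ", p_after)
```

Because the return falls below the baseline, the advantage is negative and the update pushes probability away from the action agent 1 actually took, even though that action was the right one. Nothing in the shared return tells agent 1 that the shortfall was caused by the teammate.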
