Why make the thing work on a reward system? Isn't it enough to tell it to "go there, do that"?
A few things from the article are clear:

(1) This all happened in a computer simulation. No real operator died.

(2) The reward system is a method of training an AI to do what you want it to. Think of it as giving the AI pleasure when it does the right thing. But designing the reward system can be much more difficult than you would think, as this article demonstrates.

(3) You would only use "go there and do that" if we knew exactly how to go there and do that in every possible situation and could code the instructions as an algorithm. The whole point of an AI that learns is for the machine to figure out how to go there and do that in a wide variety of situations by trying over and over and getting better as it goes.
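To make the "learn by reward" idea concrete, here is a minimal sketch (not from the article) of tabular Q-learning in a toy one-dimensional world: the agent starts at position 0, the goal is at position 4, and the entire reward system is "+1 for reaching the goal, 0 otherwise." The agent is never told the route; it discovers it by trial and error. All names and parameters here are illustrative choices, not anything specified by the article.

```python
import random

GOAL = 4
ACTIONS = [-1, +1]  # step left, step right

def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Learn action values by trial and error, guided only by the reward."""
    rng = random.Random(seed)
    # Q[state][action index] = learned estimate of long-term reward
    q = {s: [0.0, 0.0] for s in range(GOAL + 1)}
    for _ in range(episodes):
        s = 0
        while s != GOAL:
            # Mostly act greedily, but sometimes explore at random
            if rng.random() < epsilon:
                a = rng.randrange(2)
            else:
                a = max((0, 1), key=lambda i: q[s][i])
            s2 = min(max(s + ACTIONS[a], 0), GOAL)  # move, clamped to the line
            r = 1.0 if s2 == GOAL else 0.0          # the whole "reward system"
            # Standard Q-learning update toward reward plus discounted future value
            q[s][a] += alpha * (r + gamma * max(q[s2]) - q[s][a])
            s = s2
    return q

def greedy_path(q):
    """Follow the learned policy from the start to the goal."""
    s, path = 0, [0]
    while s != GOAL and len(path) < 20:
        a = max((0, 1), key=lambda i: q[s][i])
        s = min(max(s + ACTIONS[a], 0), GOAL)
        path.append(s)
    return path

q = train()
print(greedy_path(q))
```

After training, the greedy policy walks straight to the goal. Nothing in the code says "go right"; the behavior emerges from the reward. The reward-design pitfall is also easy to see here: change the reward to, say, "+1 per step taken" and the same learner would happily wander forever, which is the kind of misaligned incentive the article describes.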