…some of this is The Lebowski Theorem of machine superintelligence in action. These agents didn’t necessarily hack their reward functions but they did take a far easiest path to their goals, e.g. the Tetris playing bot that “paused the game indefinitely to avoid losing”.
Even for humans, we seem to always be hacking and optimizing for whatever gets us the best reward. Incentives are so important.