Game day exercises are days on which you deliberately test the reliability of your system. How do you know if your database replication works? You don’t want to find out during a real emergency. You want to find out in a “controlled” environment, when you are already on alert, so that if something goes wrong you notice it and can fix it.
Stripe runs game day exercises often to test critical parts of its system. The results are often surprising, and the team can learn from them and fix the problem before a pager goes off at 3 AM on a weekend.
In this article, they explain how they run game day exercises. One of the interesting parts is that they inject their crashes directly into the production environment, rather than a sandboxed one, to really test its reliability.
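To make the idea concrete, here is a minimal sketch of what a game day drill checks. It is not Stripe's tooling: `Node`, `Cluster`, and `run_game_day` are hypothetical names, and the "cluster" is an in-memory toy with synchronous replication rather than a real database. The point is only the shape of the exercise: write some data, kill the primary on purpose, fail over, and verify that the data is still readable.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A toy database node (hypothetical, for illustration only)."""
    name: str
    alive: bool = True
    data: dict = field(default_factory=dict)


class Cluster:
    """Toy primary/replica pair with synchronous replication (an assumption)."""

    def __init__(self):
        self.primary = Node("primary")
        self.replica = Node("replica")

    def write(self, key, value):
        if not self.primary.alive:
            raise RuntimeError("no live primary")
        self.primary.data[key] = value
        if self.replica.alive:
            # Copy the write to the replica before returning.
            self.replica.data[key] = value

    def failover(self):
        # Promote the replica only when the primary is down and the replica is up.
        if not self.primary.alive and self.replica.alive:
            self.primary, self.replica = self.replica, self.primary

    def read(self, key):
        if not self.primary.alive:
            raise RuntimeError("no live primary")
        return self.primary.data[key]


def run_game_day(cluster):
    """Inject a primary crash on purpose, then check that reads survive failover."""
    cluster.write("order-1", "paid")
    cluster.primary.alive = False  # the injected failure
    cluster.failover()
    return {
        "promoted": cluster.primary.name,
        "data_survived": cluster.read("order-1") == "paid",
    }


if __name__ == "__main__":
    print(run_game_day(Cluster()))
```

In a real exercise the "injected failure" would be something like terminating an actual database host, and the check would be whatever your monitoring and clients observe. The surprising results mentioned above are exactly the cases where a check like `data_survived` turns out to be false.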