Neural networks (NNs) can learn by “grokking” a pattern in the data: generalization performance stays near chance level while the network completely overfits the training set, then improves rapidly to perfect generalization well after that point.
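One way to make the gap between overfitting and generalization concrete is to measure it on logged accuracy curves. The sketch below is a minimal illustration, not a method from the source: the helper names `first_step_at_or_above` and `grokking_delay`, the 0.99 threshold, and the example curves are all hypothetical and chosen only to show the shape of the phenomenon.

```python
from typing import Optional, Sequence


def first_step_at_or_above(values: Sequence[float], threshold: float) -> Optional[int]:
    """Return the index of the first value >= threshold, or None if never reached."""
    for step, value in enumerate(values):
        if value >= threshold:
            return step
    return None


def grokking_delay(
    train_acc: Sequence[float],
    val_acc: Sequence[float],
    threshold: float = 0.99,  # assumed cutoff for "fits" / "generalizes"
) -> Optional[int]:
    """Number of logged steps between fitting the training set and generalizing.

    Returns None if either curve never reaches the threshold.
    """
    fit_step = first_step_at_or_above(train_acc, threshold)
    gen_step = first_step_at_or_above(val_acc, threshold)
    if fit_step is None or gen_step is None:
        return None
    return gen_step - fit_step


# Made-up curves for illustration only (not measured results):
# training accuracy saturates early, validation accuracy jumps much later.
train_acc = [0.2, 0.6, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0]
val_acc = [0.01, 0.01, 0.02, 0.02, 0.03, 0.5, 1.0, 1.0]

delay = grokking_delay(train_acc, val_acc)
print(delay)
```

A large positive delay on real training logs would be consistent with the pattern described above; a delay near zero would indicate that generalization arrived together with fitting the training set.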