Third normal form (3NF for short) is a database normalization rule that describes how data should be structured in an SQL database. The general idea behind 3NF is that data should not be duplicated, because duplicate copies of the same data can get out of sync.
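To make the idea concrete, here is a minimal sketch of a normalized blog schema using Python’s built-in `sqlite3` module. The table and column names are hypothetical, chosen for illustration. Each fact, such as an author’s name, is stored in exactly one row, and other tables refer to it by ID rather than copying it.

```python
import sqlite3

# Hypothetical normalized (3NF-style) blog schema: every fact is stored once,
# and other tables point at it through IDs instead of copying it.
SCHEMA = """
CREATE TABLE authors (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    picture_url TEXT
);
CREATE TABLE posts (
    id INTEGER PRIMARY KEY,
    author_id INTEGER NOT NULL REFERENCES authors(id),
    title TEXT NOT NULL,
    content TEXT NOT NULL
);
CREATE TABLE tags (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE
);
-- Join table: a post can have many tags, and a tag can be on many posts.
CREATE TABLE post_tags (
    post_id INTEGER NOT NULL REFERENCES posts(id),
    tag_id INTEGER NOT NULL REFERENCES tags(id),
    PRIMARY KEY (post_id, tag_id)
);
-- Join table linking a post to the posts considered related to it.
CREATE TABLE related_posts (
    post_id INTEGER NOT NULL REFERENCES posts(id),
    related_id INTEGER NOT NULL REFERENCES posts(id),
    PRIMARY KEY (post_id, related_id)
);
"""


def create_blog_db() -> sqlite3.Connection:
    """Create an in-memory database with the normalized schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


if __name__ == "__main__":
    conn = create_blog_db()
    conn.execute("INSERT INTO authors (id, name) VALUES (1, 'Ada')")
    conn.execute("INSERT INTO posts VALUES (1, 1, 'First', 'Hello')")
    conn.execute("INSERT INTO posts VALUES (2, 1, 'Second', 'World')")
    # Renaming the author touches exactly one row, however many posts exist.
    cur = conn.execute("UPDATE authors SET name = 'Ada L.' WHERE id = 1")
    print(cur.rowcount)  # 1
```

Because the name lives only in `authors`, renaming an author is a single-row update no matter how many posts they have written.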
Unfortunately, 3NF turned out not to work well for web applications. 3NF was designed to save disk space and reduce update anomalies, not to make it easy to display all the relevant information at once. Imagine a simple blog post: in addition to the blog post content, you need to display:
- The author’s name and profile picture
- A list of related blog posts
- The blog post’s tags
Building an API on top of a 3NF database led to massive numbers of JOIN statements: displaying a simple blog post meant a separate JOIN for each of the pieces of data above. Around 2008, this meant most web applications were built with a caching layer like memcached sitting between the API and the database. Memcached would cache query results so you didn’t have to execute 20 JOINs every single time you displayed a blog post.
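The sketch below shows roughly what rendering one post looks like against a normalized schema, again using `sqlite3` with hypothetical table names. Even this trimmed-down page needs three separate queries, each with its own JOIN; a production page would typically need more.

```python
import sqlite3
from typing import Optional

# Compact, hypothetical normalized blog schema (names chosen for illustration).
SCHEMA = """
CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT, picture_url TEXT);
CREATE TABLE posts (id INTEGER PRIMARY KEY, author_id INTEGER, title TEXT, content TEXT);
CREATE TABLE tags (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE post_tags (post_id INTEGER, tag_id INTEGER);
CREATE TABLE related_posts (post_id INTEGER, related_id INTEGER);
"""


def load_post_page(conn: sqlite3.Connection, post_id: int) -> Optional[dict]:
    """Gather everything one post page displays; each piece needs its own JOIN."""
    # 1. Post content plus the author's name and profile picture.
    row = conn.execute(
        "SELECT p.title, p.content, a.name, a.picture_url "
        "FROM posts p JOIN authors a ON a.id = p.author_id WHERE p.id = ?",
        (post_id,),
    ).fetchone()
    if row is None:
        return None
    title, content, author_name, picture_url = row

    # 2. Tag names, resolved through the post_tags join table.
    tags = [name for (name,) in conn.execute(
        "SELECT t.name FROM post_tags pt JOIN tags t ON t.id = pt.tag_id "
        "WHERE pt.post_id = ? ORDER BY t.name",
        (post_id,),
    )]

    # 3. Titles of related posts, resolved through the related_posts join table.
    related = [rel_title for (rel_title,) in conn.execute(
        "SELECT p.title FROM related_posts rp JOIN posts p ON p.id = rp.related_id "
        "WHERE rp.post_id = ? ORDER BY p.id",
        (post_id,),
    )]

    return {
        "title": title,
        "content": content,
        "author": {"name": author_name, "picture_url": picture_url},
        "tags": tags,
        "related": related,
    }


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO authors VALUES (1, 'Ada', 'ada.png')")
    conn.executemany("INSERT INTO posts VALUES (?, 1, ?, '...')", [(1, "Intro to 3NF"), (2, "Caching")])
    conn.executemany("INSERT INTO tags VALUES (?, ?)", [(1, "sql"), (2, "databases")])
    conn.executemany("INSERT INTO post_tags VALUES (1, ?)", [(1,), (2,)])
    conn.execute("INSERT INTO related_posts VALUES (1, 2)")
    print(load_post_page(conn, 1))
```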
But making your API fast came with a cost: update anomalies. An update anomaly is a case where two objects in the database have a copy of the same data, and you only update one of those objects. For example, if each blog post stores the author’s name, you need to update each blog post if you want to change the author’s name. If you miss one blog post, that’s an update anomaly.
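A small sketch of the update anomaly described above, using plain Python dicts as hypothetical denormalized posts that each store a copy of the author’s name. `rename_in_posts` models an update that only touches the posts the code happened to remember, and `stored_names` detects the resulting disagreement. All names here are illustrative, not from any library.

```python
from typing import Dict, List, Set

Post = Dict[str, object]


def make_posts() -> List[Post]:
    """Hypothetical denormalized posts: each stores its own copy of the author's name."""
    return [
        {"id": 1, "author_id": 7, "author_name": "Ada"},
        {"id": 2, "author_id": 7, "author_name": "Ada"},
        {"id": 3, "author_id": 7, "author_name": "Ada"},
    ]


def rename_in_posts(posts: List[Post], post_ids: Set[int], new_name: str) -> None:
    """Overwrite the copied name, but only on the posts whose IDs were passed in."""
    for post in posts:
        if post["id"] in post_ids:
            post["author_name"] = new_name


def rename_author(posts: List[Post], author_id: int, new_name: str) -> int:
    """Safer variant: update every post by the author, selected by author ID."""
    count = 0
    for post in posts:
        if post["author_id"] == author_id:
            post["author_name"] = new_name
            count += 1
    return count


def stored_names(posts: List[Post], author_id: int) -> Set[str]:
    """All names stored for one author; more than one value signals an update anomaly."""
    return {post["author_name"] for post in posts if post["author_id"] == author_id}


if __name__ == "__main__":
    posts = make_posts()
    rename_in_posts(posts, {1, 2}, "Ada Lovelace")  # post 3 was missed
    print(sorted(stored_names(posts, 7)))  # ['Ada', 'Ada Lovelace']
    rename_author(posts, 7, "Ada Lovelace")
    print(sorted(stored_names(posts, 7)))  # ['Ada Lovelace']
```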
3NF was designed to prevent update anomalies, but once you introduced caching, you had to be careful to invalidate cached results whenever the underlying data changed.
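Below is a minimal cache-aside sketch, assuming a plain Python dict as a stand-in for memcached (no real memcached client API is used or implied). It illustrates the invalidation problem: after the underlying data changes, the cache keeps serving the old value until every affected entry is explicitly invalidated. `PageCache` and `load_page` are hypothetical names.

```python
from typing import Callable, Dict


class PageCache:
    """Cache-aside layer; a plain dict stands in for memcached (illustration only)."""

    def __init__(self, load_from_db: Callable[[int], dict]) -> None:
        self._load_from_db = load_from_db  # the expensive, JOIN-heavy query
        self._entries: Dict[int, dict] = {}
        self.db_hits = 0  # how many times we actually went to the database

    def get(self, post_id: int) -> dict:
        # Serve cached results when present; otherwise query and remember the result.
        if post_id not in self._entries:
            self.db_hits += 1
            self._entries[post_id] = self._load_from_db(post_id)
        return self._entries[post_id]

    def invalidate(self, post_id: int) -> None:
        # Drop a cached page; must run whenever data the page depends on changes.
        self._entries.pop(post_id, None)


if __name__ == "__main__":
    authors = {7: {"name": "Ada"}}
    posts = {1: {"title": "Intro", "author_id": 7}, 2: {"title": "Caching", "author_id": 7}}

    def load_page(post_id: int) -> dict:
        post = posts[post_id]
        return {"title": post["title"], "author_name": authors[post["author_id"]]["name"]}

    cache = PageCache(load_page)
    cache.get(1)
    cache.get(1)  # served from the cache; db_hits stays at 1
    authors[7]["name"] = "Ada Lovelace"  # the source of truth changes...
    print(cache.get(1)["author_name"])  # ...but the cache still prints Ada
    # Renaming an author affects every cached page by that author, not just one.
    for pid, post in posts.items():
        if post["author_id"] == 7:
            cache.invalidate(pid)
    print(cache.get(1)["author_name"])  # prints Ada Lovelace
```

The hard part is the loop at the end: the code that changes an author has to know every cached page that embeds that author’s data.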
When MongoDB was first released, it successfully challenged the notion that every app should be built on top of a 3NF database. You could accept slower writes and the risk of update anomalies in exchange for fast reads of the data needed to display a page, without resorting to caching.
With MongoDB, we largely replaced 3NF with the handy mnemonic “store what you query for.”
Store the data that is relevant to the document on the actual document, rather than scattering it for the sake of saving disk space. Update anomalies can be troublesome, but they’re easier to identify and fix than widespread performance degradation.
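Here is a rough sketch of “store what you query for”, modeling MongoDB-style documents as plain Python dicts (the field names are hypothetical). Rendering the page is a single lookup with no JOINs; the trade-off shows up on writes, where renaming an author means updating every embedded copy. With a real MongoDB driver, that write would typically be one filtered update, such as PyMongo’s `update_many`, rather than the Python loop shown here.

```python
from typing import Dict, List


def make_post_document() -> Dict:
    """Hypothetical post document with everything the page displays embedded in it."""
    return {
        "_id": 1,
        "title": "Intro to 3NF",
        "content": "...",
        "author": {"_id": 7, "name": "Ada", "picture_url": "ada.png"},
        "tags": ["databases", "sql"],
        "related": [{"_id": 2, "title": "Caching"}],
    }


def render_post(posts_by_id: Dict[int, Dict], post_id: int) -> str:
    """One read, no JOINs: all display data lives on the document itself."""
    post = posts_by_id[post_id]
    return f'{post["title"]} by {post["author"]["name"]} [{", ".join(post["tags"])}]'


def rename_author(posts: List[Dict], author_id: int, new_name: str) -> int:
    """The write-side cost: every embedded copy of the author must be updated."""
    modified = 0
    for post in posts:
        if post["author"]["_id"] == author_id:
            post["author"]["name"] = new_name
            modified += 1
    return modified


if __name__ == "__main__":
    post = make_post_document()
    db = {post["_id"]: post}
    print(render_post(db, 1))  # Intro to 3NF by Ada [databases, sql]
    rename_author(list(db.values()), 7, "Ada Lovelace")
    print(render_post(db, 1))  # Intro to 3NF by Ada Lovelace [databases, sql]
```

The related post titles are copies too, so renaming a post would need a similar fan-out update.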