The Standard DBMS Ideal: Normalization
Relational database theory teaches that databases should be highly normalized. The target is Third Normal Form (3NF), which ensures:
- ◆No data redundancy: Every fact is stored in exactly one place.
- ◆Referential Integrity: Enforcing strict foreign keys.
- ◆Anomalies prevention: Preventing updates from causing data inconsistencies.
While normalized databases are efficient for write transactions (OLTP), they require complex SQL joins to read data.
The Web Scale Reality
As web sites scale to millions of concurrent reads, joining five or six tables (e.g. Users, Profiles, Posts, Comments, Likes) degrades query times. CPU cores spend cycles computing hash joins on identical data over and over.
Under scale pressure, database engineers must adopt Denormalization.
Denormalization Patterns
Denormalization means intentionally storing redundant copies of data to speed up reads:
1. Computed Aggregates
Instead of running COUNT() queries inside query scopes:
-- Update counter directly on save
UPDATE Posts SET CommentCount = CommentCount + 1 WHERE PostId = 120;2. Data Duplication
Storing a username directly on the Comments table instead of looking it up in the Users table, saving a query join.
Comparing Architectures
| Feature | Normalized (3NF) | Denormalized |
|---|---|---|
| Write Performance | Fast (Writes once) | Slower (Updates multiple tables) |
| Read Performance | Slower (requires joins) | Fast (Single table lookup) |
| Data Integrity | Enforced by schema | Handled by application logic |
| Storage Cost | Minimal | Higher (Redundant data) |
Architectural Recommendation: Start with a normalized 3NF database schema to protect your system's core transactional records. As read bottlenecks occur, denormalize tables selectively using application-level validation or cache layers (like Redis or Memcached).