Functional Dependencies and Relational Database Design

SQL relational database theory

Determine functional dependencies to structure relations in third normal form.

I recently completed the Databases for Data Scientists specialization on Coursera from the University of Colorado Boulder. Most of the three courses in the specialization were a review for me, but the Relational Database Design course covered normalization theory in greater detail than I had previously seen.

Although I was familiar with entity relationship diagrams and the benefits of normalization as far as minimizing data redundancy, I had not studied the more formal normalization process and the concept of functional dependencies.

Functional Dependencies

In a relation (a table), an attribute (a column) such as “date_of_birth” would always determine a column “zodiac_sign”. In other words, “zodiac_sign” is functionally dependent on “date_of_birth”. Knowing these kinds of relationships between attributes forms the basis for which attributes belong in a relation and which should be linked together with foreign keys.

Normal Forms

Previously, I’ve focused primarily on minimizing data redundancy when determining what attributes belong in the same relation. Having a more formal system of first, second, and third normal forms adds structure to this process.

In particular, the course introduced:

Speeches Example

In a previous project, I created a normalized PostgreSQL database from a CSV file of India’s Independence Day speeches. Spurred on by what I learned in the PostgreSQL for Everybody specialization, my goal was to store every string only once and link tables together with integer keys.

ERD for speech database

Now armed with the terminology of relational database theory, I would not change the database’s structure, but I can provide new justifications for why the design is “suitable”.

For every relation, the primary key determines all of the other attributes. For example, the corpus relation, can be represented as:

corpus (year (pk), pm_id(fk), party_id(fk), title_id(fk), footnote_id(fk), source_id(fk), url_id(fk), text_id(fk))

It has only one functional dependency, which could be written as:

year -> pm_id, party_id, title_id, footnote_id, source_id, url_id, text_id

Of course, this is an overly simple example, but the principles can be useful for working out more complicated situations.

Reuse

Text and figures are licensed under Creative Commons Attribution CC BY-NC 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".