Introduction

What do libraries and databases have in common? Superficially, nothing -- one's a place where people go to get books, the other is a program for storing data. So they're totally different, right? Not so fast ...

Libraries and Databases: definitions

Libraries are buildings that have shelves full of books on all topics. Patrons are allowed to come and borrow books, provided they have a library card and they return the books later. The books are organized in a specific way on the shelves that makes it easy to find them. Librarians are on hand to help patrons in case of difficulties. Periodically, the library buys new books and gets rid of old ones.

A database is a program for the electronic storage, retrieval, and analysis of data. Rows of data are stored in tables; tables have columns for storing units of data. Programs can access the database, provided they have a username, a password, and the necessary permissions. Databases are highly optimized to provide efficient access to data. Rows may be added to and deleted from tables.
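To make the vocabulary concrete, here is a minimal sketch using Python's built-in sqlite3 module. The "books" table and its columns are made up for illustration, and SQLite is an embedded database with no usernames or passwords, so the access-control part of the definition isn't shown here.

```python
import sqlite3

# In-memory database; the "books" table and its columns are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE books (id INTEGER PRIMARY KEY, title TEXT, author TEXT)")

# Add rows: each row holds one value per column.
conn.executemany(
    "INSERT INTO books (title, author) VALUES (?, ?)",
    [("Dune", "Herbert"), ("Emma", "Austen"), ("Persuasion", "Austen")],
)

# Retrieve the rows that match a condition.
austen_titles = [
    row[0]
    for row in conn.execute(
        "SELECT title FROM books WHERE author = ? ORDER BY title", ("Austen",)
    )
]

# Delete a row, then count what is left.
conn.execute("DELETE FROM books WHERE title = ?", ("Dune",))
remaining = conn.execute("SELECT COUNT(*) FROM books").fetchone()[0]
```

Three rows go in, two come back for the author query, and two remain after the delete.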

A shallow comparison

For the rest of this article, I'll investigate, compare, and contrast facets of libraries with corresponding facets of databases.

patron: client program

A patron is a person who uses a library; a client program uses a database.

librarian: in-memory caching

A librarian assists patrons, helping decrease the time they spend trying to accomplish tasks at the library. Librarians are paid for their time. Databases can cache query results, indexes, and data pages in memory to answer client requests faster. Such caching uses valuable memory. Thus, both increase the efficiency of their respective systems, at the cost of system resources.
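As a rough illustration of the trade-off, the sketch below caches query results in memory. It is only an analogy: the cache here sits in the client program (via functools.lru_cache) rather than inside the database engine, and the table, the function name titles_by, and the cache size are all assumptions of mine.

```python
import sqlite3
from functools import lru_cache

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE books (title TEXT, author TEXT)")
conn.executemany(
    "INSERT INTO books VALUES (?, ?)",
    [("Emma", "Austen"), ("Persuasion", "Austen"), ("Dune", "Herbert")],
)

db_hits = 0  # counts how often the database is actually queried


@lru_cache(maxsize=128)  # keeps up to 128 results in memory (arbitrary size)
def titles_by(author):
    """Hypothetical helper: look up an author's titles, remembering the answer."""
    global db_hits
    db_hits += 1
    rows = conn.execute(
        "SELECT title FROM books WHERE author = ? ORDER BY title", (author,)
    )
    return tuple(row[0] for row in rows)


first = titles_by("Austen")   # cache miss: runs the query
second = titles_by("Austen")  # cache hit: answered from memory
```

The second call never touches the database, but the cached answer occupies memory and would go stale if the table changed, which is the kind of cost the librarian's salary stands in for.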

checking out books: locking rows and tables

A patron may check out a book from the library, and take that book home, during which time he or she is the only one able to use the book. A database transaction may place a lock on a row or a table, preventing access for other transactions until it is finished. Thus, both of these restrict concurrent access to resources.
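Here is a small sketch of one transaction shutting another out, again with sqlite3. One caveat: SQLite locks the whole database file rather than individual rows or tables, so this shows the coarsest form of the idea. The connection names and the "on_loan" column are made up for the example.

```python
import os
import sqlite3
import tempfile

# A file-backed database so that two connections can share it.
path = os.path.join(tempfile.mkdtemp(), "library.db")

# isolation_level=None lets us issue BEGIN/COMMIT ourselves;
# timeout=0 makes a blocked connection fail at once instead of waiting.
borrower = sqlite3.connect(path, isolation_level=None, timeout=0)
other = sqlite3.connect(path, isolation_level=None, timeout=0)

borrower.execute("CREATE TABLE books (title TEXT, on_loan INTEGER)")
borrower.execute("INSERT INTO books VALUES ('Dune', 0)")

# The first transaction takes the write lock ("checks out the book").
borrower.execute("BEGIN IMMEDIATE")
borrower.execute("UPDATE books SET on_loan = 1 WHERE title = 'Dune'")

# A second writer is turned away while the lock is held.
try:
    other.execute("BEGIN IMMEDIATE")
    blocked = False
except sqlite3.OperationalError:
    blocked = True

# Committing releases the lock ("returns the book").
borrower.execute("COMMIT")
other.execute("BEGIN IMMEDIATE")  # succeeds now that the lock is free
other.execute("COMMIT")
```

Server databases generally lock at a finer grain and make the second transaction wait rather than fail immediately, but the principle is the same: one holder at a time.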

non-borrowable materials: access control policies

Some library items are not available for checkout: magazines and reference books often must be read in the library. Similarly, database systems use permissions to control and limit access to tables, schemas, and attributes.
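Server databases usually express such rules with SQL GRANT and REVOKE statements. SQLite has no user accounts, so the sketch below uses its authorizer hook as a stand-in: a callback that can veto reads of a particular table. The table names and the callback name are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE circulating (title TEXT)")
conn.execute("CREATE TABLE reference_only (title TEXT)")
conn.execute("INSERT INTO circulating VALUES ('Dune')")
conn.execute("INSERT INTO reference_only VALUES ('Encyclopedia')")
conn.commit()


def deny_reference_reads(action, arg1, arg2, db_name, source):
    """Hypothetical policy: nobody may read columns of 'reference_only'."""
    # For SQLITE_READ, arg1 is the table name and arg2 the column name.
    if action == sqlite3.SQLITE_READ and arg1 == "reference_only":
        return sqlite3.SQLITE_DENY
    return sqlite3.SQLITE_OK


conn.set_authorizer(deny_reference_reads)

# Reading the unrestricted table still works.
allowed = conn.execute("SELECT title FROM circulating").fetchall()

# Reading the restricted table is refused before the query runs.
try:
    conn.execute("SELECT title FROM reference_only")
    denied = False
except sqlite3.DatabaseError:
    denied = True
```

The refused query is rejected outright, so the client never sees any of the restricted rows.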

physical location of books on shelves: physical location of rows on disk

Books are placed on shelves in aisles, close to some books (e.g. on the same shelf) but far from others (e.g. on a different floor). They're often organized by the Dewey Decimal system. Rows are placed on disk in pages and files, close to some rows and far from others, where "distance" is measured by the seek time and rotational latency of the hard disk. When a table has a clustered index, its rows are stored in the order of that index.

looking at nearby books: reading nearby rows

Imagine how long it would take to find 15 books scattered throughout the library, given only their Dewey Decimal numbers. How much faster would it be to find 15 books sitting on the same shelf? (Answer: seconds versus minutes) Similarly, it can be faster to read 10000 sequential rows from a database than to read 15 randomly scattered rows.
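A back-of-the-envelope cost model shows why. Every figure below (seek time, per-page transfer time, rows per page) is an assumption I picked for illustration, not a measurement of any real disk or database.

```python
# All figures below are assumptions chosen for illustration, not measurements.
SEEK_MS = 10.0              # time to move the disk head to a new location
TRANSFER_MS_PER_PAGE = 0.1  # time to read one page once the head is in place
ROWS_PER_PAGE = 100


def read_cost_ms(seeks, pages):
    """Estimated time: one seek per jump plus the transfer time for each page."""
    return seeks * SEEK_MS + pages * TRANSFER_MS_PER_PAGE


# 10000 sequential rows: one seek, then 100 contiguous pages.
sequential_ms = read_cost_ms(seeks=1, pages=10000 // ROWS_PER_PAGE)

# 15 scattered rows: assume each sits on its own page, so 15 seeks and 15 pages.
random_ms = read_cost_ms(seeks=15, pages=15)
```

Under these assumptions the sequential read costs about 20 ms and the scattered read about 151.5 ms, because the time is dominated by seeks rather than by the amount of data transferred. On storage with cheap random access, such as SSDs, the gap would be much smaller.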

card catalog: secondary index

A card catalog is a listing of all the books in the library, typically sorted by author. The advantage of such a listing is that books can easily and quickly be found by author (or any other cataloged attribute) -- simply find the author's entries in the card catalog, read off the Dewey Decimal numbers, and locate the books. Similarly, database rows can be quickly searched using an indexed attribute. The index stores the column's values in sorted order, along with a primary key value or row pointer. This can dramatically decrease search time and the number of disk reads, at the cost of some extra storage for the index itself.
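The effect is easy to observe in SQLite by asking for the query plan before and after creating an index. This is a sketch with made-up table, index, and helper names; the exact wording of the plan text varies between SQLite versions, so the code only collects it rather than relying on a particular phrasing.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE books (id INTEGER PRIMARY KEY, title TEXT, author TEXT)")
conn.executemany(
    "INSERT INTO books (title, author) VALUES (?, ?)",
    [(f"Title {i}", f"Author {i % 50}") for i in range(1000)],
)

query = "SELECT title FROM books WHERE author = ?"


def plan(sql, params):
    """Hypothetical helper: return the steps SQLite reports for a query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " ".join(row[-1] for row in rows)  # last column is the description


plan_before = plan(query, ("Author 7",))  # no index yet: the whole table is scanned

# The "card catalog": author values in sorted order, each pointing at a row.
conn.execute("CREATE INDEX idx_books_author ON books (author)")

plan_after = plan(query, ("Author 7",))  # now a search through the index
matches = conn.execute(query, ("Author 7",)).fetchall()
```

Before the index exists the plan describes a scan of the whole table; afterwards it mentions idx_books_author, meaning SQLite jumps straight to the 20 matching entries instead of examining all 1000 rows.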

Summary

Libraries and databases are both repositories of information; the one stores that information in books, and the other in rows and tables. Both have evolved mechanisms for efficient searching, restricted access, and isolation of physical layout from logical layout. I'm not up to speed on the history of databases, but I'd guess that at least some inspiration for their design was taken from analyzing libraries!