Learn the two main strategies for distributing data: fragmentation and replication.
Fragmentation and replication are the two primary techniques used to distribute data across the sites of a distributed database system.

Fragmentation is the process of breaking a database relation (table) into smaller pieces, called fragments, and storing these fragments at different sites. There are two main types of fragmentation.

Horizontal fragmentation splits a table by its rows: each fragment contains a subset of the rows of the original table, and the full table can be rebuilt by taking the union of its fragments. For example, a `Customers` table could be fragmented by city, with all New York customers stored at the New York site and all London customers at the London site.

Vertical fragmentation splits a table by its columns: each fragment contains a subset of the columns along with the primary key. Keeping the primary key in every fragment is what allows the original rows to be reconstructed by joining the fragments on that key. For example, an `Employees` table could be split into one fragment with `employee_id`, `name`, and personal data, and another with `employee_id`, `salary`, and job data.

Replication, by contrast, is the process of storing copies of the same fragment at multiple sites to increase availability and performance. If a site holding a fragment fails, users can still access the data from a replica at another site. Replication also improves read performance, because queries can be directed to the nearest replica.

The main downside of replication is the overhead of keeping all replicas consistent. When data is updated, the change must be propagated to every copy. Propagating it synchronously slows down writes, while propagating it asynchronously lets some replicas serve stale data for a while; either way, keeping the copies in step can be complex and resource-intensive.
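A minimal sketch of both kinds of fragmentation, assuming a table is modeled as an in-memory list of dicts rather than stored in a real DBMS. The helper names (`horizontal_fragments`, `union`, `vertical_fragments`, `join_on_key`), the extra `address` and `title` columns, and the use of the city name as the site name are hypothetical illustrations. The point is how each kind of fragment is put back together: horizontal fragments by a union, vertical fragments by a join on the primary key.

```python
from typing import Any, Callable

# Hypothetical helpers for illustration only: a "table" is a list of dicts,
# and a "site" is just a string label.
Row = dict[str, Any]


def horizontal_fragments(table: list[Row], site_of: Callable[[Row], str]) -> dict[str, list[Row]]:
    """Split a table by rows: each row goes to the fragment of exactly one site."""
    fragments: dict[str, list[Row]] = {}
    for row in table:
        fragments.setdefault(site_of(row), []).append(row)
    return fragments


def union(fragments: dict[str, list[Row]]) -> list[Row]:
    """Rebuild a horizontally fragmented table as the union of its fragments."""
    return [row for rows in fragments.values() for row in rows]


def vertical_fragments(table: list[Row], key: str, column_groups: list[tuple[str, ...]]) -> list[list[Row]]:
    """Split a table by columns: each fragment keeps the primary key plus one column group."""
    return [[{col: row[col] for col in (key, *group)} for row in table] for group in column_groups]


def join_on_key(fragments: list[list[Row]], key: str) -> list[Row]:
    """Rebuild full rows by joining vertical fragments on the shared primary key."""
    merged: dict[Any, Row] = {}
    for fragment in fragments:
        for row in fragment:
            merged.setdefault(row[key], {}).update(row)
    return list(merged.values())


customers = [
    {"customer_id": 1, "name": "Ada", "city": "New York"},
    {"customer_id": 2, "name": "Ben", "city": "London"},
    {"customer_id": 3, "name": "Cy", "city": "New York"},
]
by_city = horizontal_fragments(customers, site_of=lambda row: row["city"])
print({site: [r["customer_id"] for r in rows] for site, rows in by_city.items()})
# {'New York': [1, 3], 'London': [2]}

employees = [
    {"employee_id": 10, "name": "Dee", "address": "1 Elm St", "salary": 50000, "title": "Engineer"},
    {"employee_id": 11, "name": "Eli", "address": "2 Oak St", "salary": 60000, "title": "Manager"},
]
personal, job = vertical_fragments(employees, "employee_id", [("name", "address"), ("salary", "title")])
print(job[0])
# {'employee_id': 10, 'salary': 50000, 'title': 'Engineer'}
print(join_on_key([personal, job], "employee_id") == employees)
# True
```

Because each customer lands in exactly one fragment, the union returns every row once. If `employee_id` were dropped from either vertical fragment, `join_on_key` would have nothing to match rows on, which is why each vertical fragment carries the primary key.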
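The availability, read-performance, and consistency trade-offs of replication can be illustrated with a small in-memory model. This sketch assumes the simple read-one/write-all (ROWA) scheme, chosen here only for brevity: reads go to the nearest live replica, while every update must reach every copy. `Site`, `ReplicatedFragment`, and the `distance` field are hypothetical; real systems often use quorum-based or asynchronous replication instead.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Site:
    """Hypothetical site holding one copy of a fragment."""
    name: str
    distance: int  # hypothetical network distance from the client; lower is nearer
    up: bool = True
    data: dict[str, Any] = field(default_factory=dict)


class ReplicatedFragment:
    """Hypothetical fragment copied to several sites, kept consistent with read-one/write-all."""

    def __init__(self, sites: list[Site]) -> None:
        self.sites = sites

    def write(self, key: str, value: Any) -> int:
        """Apply an update to every replica and return how many copies were written."""
        down = [s.name for s in self.sites if not s.up]
        if down:
            # Write-all: rejecting the update keeps replicas identical,
            # at the cost of write availability.
            raise RuntimeError(f"update rejected, replicas down at: {down}")
        for site in self.sites:
            site.data[key] = value
        return len(self.sites)

    def read(self, key: str) -> tuple[str, Any]:
        """Serve a read from the nearest replica that is still up."""
        live = [s for s in self.sites if s.up]
        if not live:
            raise RuntimeError("no replica available")
        nearest = min(live, key=lambda s: s.distance)
        return nearest.name, nearest.data[key]


new_york = Site("New York", distance=1)
london = Site("London", distance=5)
fragment = ReplicatedFragment([new_york, london])

print(fragment.write("c1", "Ada"))  # 2: one update, two copies written
print(fragment.read("c1"))          # ('New York', 'Ada'): nearest replica
new_york.up = False
print(fragment.read("c1"))          # ('London', 'Ada'): failover to the surviving replica
try:
    fragment.write("c1", "Ada L.")
except RuntimeError as err:
    print(err)                      # update rejected, replicas down at: ['New York']
```

Rejecting the write while New York is down keeps both copies identical, but it also shows the consistency cost: a single failed site blocks updates even though reads keep working from the other replica.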