What occurs when the same data is duplicated in multiple files of a database?

Computer systems organize data in a hierarchy that begins with bits and bytes and progresses to more complex groupings of data:

  • Fields: Group of characters, words, or a complete number
  • Records: Group of related fields, describes an entity (a person, place or thing about which information must be kept - each characteristic of an entity is an attribute
  • File: Group of records of the same type
  • Database: Group of related files

Figure 6-1

What occurs when the same data is duplicated in multiple files of a database?


FIGURE 6-1 THE DATA HIERARCHY

A computer system organizes data in a hierarchy that starts with the bit, which represents either a 0 or a 1. Bits can be grouped to form a byte to represent one character, number, or symbol. Bytes can be grouped to form a field, and related fields can be grouped to form a record. Related records can be collected to form a file, and related files can be organized into a database.

In most organizations, the traditional approach to information processing meant that databases and other systems tended to grow independently without a company-wide plan. Accounting, finance, manufacturing, human resources, and sales and marketing all developed their own systems and data files.

- [Instructor] Your typical spreadsheet contains lots of columns that will contain the same piece of information over and over. When these types of structures are moved into a database environment, sometimes the same habits come along for the ride. The first problem in developing a relational database is not realizing when you're storing redundant information. And efforts need to be taken to eliminate any potential tendency towards storing duplicate data. Duplicate data are an issue for a couple of reasons. First, they slow down the system by taking up extra bandwidth across the network and extra disc space on the server. Second, they're difficult to maintain requiring multiple reads and writes to update records and keep things in sync. And finally, duplicated data runs the risk of becoming inconsistent. Let's take a look at how duplicates can occur and how the databases design can help us out of the situation. Here we have a simplified table that's keeping track of our customers purchases. We have the data broken down by the invoice number and we can see the name of the customer, their shipping address, what product they ordered, and who manufactured that particular product. We have three invoices for the same customer Bread Express, who ordered three different kinds of olive oil made by the same company, Two Trees. So far, so good. This structure gives us all the information we need to correctly ship the product to the customer. But what happens if Bread Express calls us up and says that they made a mistake. They recently moved and they need all of the oils to go to their new storefront. In order to make that change in the database we need to update the address of all three invoice lines. This creates a couple of opportunities to mix up our record keeping. What happens if we only update two and miss the third? Or, what happens if we type in two correctly but mistype one of them? This structure, while it initially seems like it will meet our needs, it becomes difficult to manage in the long run. The solution is to remove the components that repeat over and over again from the main invoices table. And to place them in completely different tables. We can do this by determining which fields depend on which other fields. In our case, the address really isn't telling us information about the invoice itself. But it's about the customer that placed the order. In other words, the address depends on the customer. Wherever we see this is the case we'll do the following. First, we'll remove the column that isn't dependent on the main topic of the table and place it in a new table. Then, we'll make a copy of the column that it was dependent on and place that in the new table on the side. If we apply this to our invoices table we'll remove the address column from the main table and create a new table that is more specifically about our customers. Now we have a simple way of looking up the correct address whenever we need it. When we want to know where to ship Bread Expresses order we'll look to this new customers table we just made. And we'll find the address listed there a single time. More importantly, in getting back to our example, it makes updates a breeze. We simply change the information in a single location now instead of having to go through and update every invoice. Now you might have noticed that we have a similar situation here with our manufacturers. Although it might not be as apparent since the three invoices we're looking at are all for different products. I'm going to make a huge assumption here that we only have a single supplier for each type of oil. This may or may not be the case and it will be something that I need to verify before implementing the following change. But if this is the case, then if we have more invoices showing you would expect to see that every time cold press olive oil was ordered it was supplied by the Two Trees Olive Oil company. Again, the solution is to decompose our tables into smaller groupings of information about a single topic. So we'll take out the manufacturer column from the invoices table and make a new table about products. Now, if a manufacturer changes for a particular oil it is a simple update to a single area of our database rather than going through every invoice and changing it there. So that's one way a well-structured database can help keep our information up to date. By eliminating any redundancy that's often present in simpler spreadsheet files such as we saw here with the customer addresses, and by making use of multiple tables to divide up the data into smaller related groups, it makes the data easier to manage in the long term.

Practice while you learn with exercise files

Download the files the instructor uses to teach the course. Follow along and learn by watching, listening and practicing.

Download courses and learn on the go

Watch courses on your mobile device without an internet connection. Download courses using your iOS or Android LinkedIn Learning app.

What occurs when the same data is duplicated in multiple files of database?

Data redundancy occurs when the same piece of data is stored in two or more separate places and is a common occurrence in many businesses.

What is data redundancy and inconsistency?

When the same data exists in different formats in multiple tables. This condition is known as Data Inconsistency. It means that different files contain different information about a particular object or person. This can cause unreliable and meaningless information. Data Redundancy leads to Data Inconsistency.

What is the presence of duplicate data in multiple data files?

The presence of duplicate data in multiple data files is called program-data dependence.

What is the duplication of data or the storage of the same data in multiple places quizlet?

Redundancy is the duplication of data or the storing of the same data in more than one place. Redundancy wastes space b/c you're storing the same data in multiple places. The extra space results in larger spreadsheets that require more space in memory and on disk that take longer to save and open.