How Does a Data Warehouse Differ From a Database?
Several fundamental differences separate a data warehouse from a database. The biggest is that most databases serve a single application, and that application is generally transaction-based. When the data in such a database is analyzed, the analysis usually stays within a single domain, although cases that span multiple domains are not uncommon.
Payroll and inventory are examples of the separate systems a database may serve. Each system concentrates on one subject and does not deal with other areas. In contrast, data warehouses deal with multiple domains simultaneously.
Because it deals with multiple subject areas, the data warehouse can find connections between them. This allows it to show how the company is performing as a whole, rather than in individual areas. Another powerful aspect of data warehouses is their ability to support the analysis of trends. They are nonvolatile: the information stored in them changes far less often than it would in a typical operational database.
The two types of data that you will want to become familiar with are operational data and decision support data. The purpose, format, and structure of these two data types are quite different. In most cases, the operational data will be stored in a relational database.
In a relational database, data is stored in tables, and the tables are often normalized. Operational data is structured to handle the transactions that are made on a daily basis. Every time the company sells an item to a customer, a record of the sale must be made, so this data is updated frequently. To keep those updates efficient, the data is spread across a number of tables, each with its own fields. Because of this, a single transaction may be represented by five or more fields spread across several tables. While this design is highly efficient for an operational database, it is not conducive to queries. This is where decision support data is useful: it supports kinds of analysis that operational data does not readily allow.
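To make the point about normalized tables concrete, here is a minimal sketch using Python's built-in sqlite3 module. The four tables (customer, product, invoice, invoice_line), their columns, and the sample rows are hypothetical and chosen only for illustration. It shows that reading back one invoice already takes a three-way join.

```python
import sqlite3


def build_operational_db():
    """Create a tiny in-memory operational database (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE product (product_id INTEGER PRIMARY KEY,
                              description TEXT, unit_price REAL);
        CREATE TABLE invoice (invoice_id INTEGER PRIMARY KEY,
                              customer_id INTEGER REFERENCES customer,
                              invoice_date TEXT);
        CREATE TABLE invoice_line (invoice_id INTEGER REFERENCES invoice,
                                   product_id INTEGER REFERENCES product,
                                   quantity INTEGER);

        -- Illustrative sample data, not taken from any real system.
        INSERT INTO customer VALUES (1, 'Acme Ltd');
        INSERT INTO product VALUES (10, 'Widget', 2.50);
        INSERT INTO product VALUES (11, 'Gadget', 10.00);
        INSERT INTO invoice VALUES (100, 1, '2024-01-15');
        INSERT INTO invoice_line VALUES (100, 10, 4);
        INSERT INTO invoice_line VALUES (100, 11, 1);
        """
    )
    return conn


def fetch_invoice(conn, invoice_id):
    """Reassemble one invoice by joining all four normalized tables."""
    query = """
        SELECT c.name, i.invoice_date, p.description, l.quantity,
               l.quantity * p.unit_price AS line_total
        FROM invoice AS i
        JOIN customer AS c ON c.customer_id = i.customer_id
        JOIN invoice_line AS l ON l.invoice_id = i.invoice_id
        JOIN product AS p ON p.product_id = l.product_id
        WHERE i.invoice_id = ?
        ORDER BY p.product_id
    """
    return conn.execute(query, (invoice_id,)).fetchall()


if __name__ == "__main__":
    connection = build_operational_db()
    for line in fetch_invoice(connection, 100):
        print(line)
```

Each sale touches only a few narrow rows, which keeps updates cheap. The cost appears at read time, when every question about an invoice has to stitch the tables back together.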
To retrieve a single invoice, for example, you will often need to join multiple tables. While operational data deals mostly with the transactions that are made daily, decision support data gives meaning to that operational data. The differences between decision support data and operational data fall into three categories: dimensionality, timespan, and granularity. Dimensionality refers to the fact that the data is related in various ways. The data stored in a data warehouse is often multidimensional, which is very different from the simple, flat view typical of operational data. Data analysts are often interested in viewing the same data along many dimensions, such as sales by product, by region, and by time period.
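As a rough illustration of dimensionality, the sketch below totals the same small set of sales records along different dimensions. The records, the dimension names (product, region, month), and the helper total_by are all hypothetical, invented for this example.

```python
from collections import defaultdict

# Hypothetical sales facts: (product, region, month, amount).
SALES = [
    ("Widget", "North", "2024-01", 100),
    ("Widget", "South", "2024-01", 150),
    ("Gadget", "North", "2024-01", 200),
    ("Widget", "North", "2024-02", 120),
    ("Gadget", "South", "2024-02", 80),
]

# Position of each dimension inside a sales record.
DIMENSIONS = ("product", "region", "month")


def total_by(rows, *dims):
    """Sum the amount column, grouped by the chosen dimensions."""
    positions = [DIMENSIONS.index(d) for d in dims]
    totals = defaultdict(int)
    for row in rows:
        key = tuple(row[p] for p in positions)
        totals[key] += row[3]  # the amount is the last field
    return dict(totals)


if __name__ == "__main__":
    print(total_by(SALES, "product"))
    print(total_by(SALES, "region"))
    print(total_by(SALES, "product", "region"))
```

The same five records answer three different questions depending on which dimensions are chosen, which is the kind of view a flat list of transactions does not offer directly.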
Timespan is the second category. Operational data deals with transactions that are atomic, or current, such as an inventory movement or a purchase order, and it generally covers a short time frame. Decision support data, however, tends to cover long time frames. Many company managers are interested in transactions that occurred over a certain time period. Instead of the purchase of one customer, managers are often more interested in the buying patterns of a group of customers. A sale that has just been made will typically not yet be found in a decision support data warehouse.
Granularity is the third concept that separates operational data from decision support data. Operational data records specific transactions that occurred at a given point in time. Decision support data, however, must be presented at different levels of aggregation. It may be highly summarized, or it may be closer to the level of individual transactions. The managers within an organization need information that is summarized to various degrees.
Data warehouses have become more important in the Information Age, and they are a necessity for many large corporations, as well as some medium-sized businesses. They are much more elaborate than a mere database, and they can find connections in data that cannot be readily found within most databases.
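The idea of granularity described above can be sketched as a roll-up of the same transactions to coarser and coarser time periods. The daily sales figures and the helpers period_key and roll_up are hypothetical, made up for this illustration.

```python
from collections import defaultdict
from datetime import date

# Hypothetical transaction-level data: (sale date, amount).
DAILY_SALES = [
    (date(2023, 11, 3), 40),
    (date(2023, 12, 20), 60),
    (date(2024, 1, 5), 100),
    (date(2024, 1, 20), 50),
    (date(2024, 2, 11), 75),
    (date(2024, 4, 2), 25),
]


def period_key(day, level):
    """Return the label of the period that contains `day` at the given level."""
    if level == "day":
        return day.isoformat()
    if level == "month":
        return f"{day.year}-{day.month:02d}"
    if level == "quarter":
        return f"{day.year}-Q{(day.month - 1) // 3 + 1}"
    if level == "year":
        return str(day.year)
    raise ValueError(f"unknown aggregation level: {level}")


def roll_up(rows, level):
    """Aggregate transaction amounts to the requested level of detail."""
    totals = defaultdict(int)
    for day, amount in rows:
        totals[period_key(day, level)] += amount
    return dict(totals)


if __name__ == "__main__":
    for level in ("day", "month", "quarter", "year"):
        print(level, roll_up(DAILY_SALES, level))
```

Moving from "day" to "year" trades detail for overview, so different managers can read the same underlying data at the level of summary they need.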