Trends in Data Quality and Management

One of the biggest technology trends for 2007 is database management, specifically ensuring data quality. A database is a collection of pieces of information organized according to a schema. The schema describes the pieces of information in the database and the relationships between them. Database models, including the relational, hierarchical, and network models, determine how those relationships are organized.
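As a rough illustration of what a schema expresses, the sketch below uses Python's built-in sqlite3 module to declare two related tables. The table and column names (customer, purchase, and so on) are hypothetical and chosen only for this example; the article does not describe any particular schema.

```python
import sqlite3

# Hypothetical two-table schema; all names are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked
conn.executescript("""
    CREATE TABLE customer (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL
    );
    CREATE TABLE purchase (
        id          INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customer(id),
        amount      REAL NOT NULL
    );
""")
conn.execute("INSERT INTO customer (id, name) VALUES (1, 'Ada')")
conn.execute("INSERT INTO purchase (customer_id, amount) VALUES (1, 19.99)")

# The declared relationship lets the DBMS reject a purchase for a missing customer.
try:
    conn.execute("INSERT INTO purchase (customer_id, amount) VALUES (99, 5.00)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

purchase_count = conn.execute("SELECT COUNT(*) FROM purchase").fetchone()[0]
```

Here the schema carries both the description of each piece of information (column names and types) and the relationship between them (every purchase must refer to an existing customer).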

This structure allows programs to access a database and use its information efficiently to perform functions, answer queries, and otherwise process data. The software that manages this access is called a database management system (DBMS).

DBMSs are usually organized according to the database’s model, but can also be independent of it. These software systems are concerned primarily with concurrency, performance, integrity, and recovery.

The management and improvement of data quality is a major trend in 2007. This article will discuss the history and uses of DBMSs and trends for 2007.

Background

The first DBMS appeared in 1968. Processing was file-based: data was stored in flat files, and processing was shaped by the common use of magnetic tape as the storage medium. Each logical function mapped, through an interface, to one physical file. Programming was complicated and extensive, requiring third-generation languages such as COBOL or BASIC. Third-generation languages are high-level programming languages that make it easier to write programs than earlier languages did.

File-based DBMSs, while innovative, had many limitations. Because each program contained its own data, data was separated and isolated between programs. This, in turn, caused high levels of redundancy, with the same data held by several different programs. These early DBMSs also lacked security, and data integrity was expensive to maintain.
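The redundancy problem can be sketched in a few lines of Python. The two "programs" below, billing and shipping, and their record layouts are hypothetical stand-ins for separate file-based applications that each keep a private copy of the same customer data.

```python
# Two hypothetical programs, each holding its own private copy of one record.
billing_records = {"C001": {"name": "Ada Lovelace", "address": "12 Old Street"}}
shipping_records = {"C001": {"name": "Ada Lovelace", "address": "12 Old Street"}}

# Billing updates its copy; nothing tells shipping that the data changed.
billing_records["C001"]["address"] = "40 New Road"

def inconsistent_keys(a, b):
    """Return the keys present in both stores whose records no longer match."""
    return sorted(k for k in a.keys() & b.keys() if a[k] != b[k])

conflicts = inconsistent_keys(billing_records, shipping_records)
print(conflicts)  # ['C001']
```

Keeping such copies in agreement takes extra checking code in every program, which is one way to read the claim that integrity was expensive to maintain.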

In the 1960s, non-relational databases were introduced, and their dominance continued into the 1980s. Non-relational models integrated and structured operational data so it could be used across applications. Hierarchical models, such as IMS, IBM’s first DBMS, provided more efficient searching, less redundancy, and increased security and data integrity. They were based on tree structures, which resemble an upside-down tree in which each parent record can have many child records.
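A minimal sketch of the hierarchical idea follows: every record has exactly one parent and any number of children, so each record is reached by a single path from the root. The Segment class and the sample names are assumptions made for illustration and do not reflect how IMS is actually implemented.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Segment:
    """One record in a hierarchy: a single parent, any number of children."""
    name: str
    children: List["Segment"] = field(default_factory=list)
    parent: Optional["Segment"] = field(default=None, repr=False, compare=False)

    def add(self, name: str) -> "Segment":
        """Create a child record attached to this one and return it."""
        child = Segment(name, parent=self)
        self.children.append(child)
        return child

    def path(self) -> str:
        """Walk upward to the root to build the record's unique access path."""
        parts = []
        node = self
        while node is not None:
            parts.append(node.name)
            node = node.parent
        return "/".join(reversed(parts))

root = Segment("company")
sales = root.add("sales")
east = sales.add("east-region")
root.add("engineering")
print(east.path())  # company/sales/east-region
```

Because a child can have only one parent, a record that logically belongs under two parents must be duplicated or reached indirectly, which hints at why many-sided relationships are awkward in this model.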

However, these systems were difficult to implement and lacked standards and structural independence. This made it difficult for them to handle the many relationships set up by the tree, and it limited the complexity they could support because new applications could not easily be added to the model.

Prominent network models like CODASYL DBTG grew out of the Integrated Data Store (IDS), which was developed in the early 1960s and later marketed by Honeywell, and were standardized by 1971. The model was divided into three components. First, the network schema organized the database as a whole. Second, the subschema offered a view of the database for each user. Third, a low-level, procedural database language was used to work with the data. The model was composed of sets, each relating an owner record to its member records. Any record could function as an owner in one set and as a member in another.

In the network model, lower-level structures, or branches, could be connected to several higher-level structures, or nodes. This feature makes redundancy in data easier to identify. Network models also speed up retrieval through the use of pointers that locate records directly. This feature, however, complicates data loading and reorganization.
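The owner and member relationships can be sketched with plain object references standing in for pointers. The Record class, the connect helper, and the set names below are hypothetical and are not CODASYL syntax; they only show one record acting as both member and owner, and one record having more than one owner.

```python
class Record:
    """A record that can own sets and belong to sets at the same time."""

    def __init__(self, name):
        self.name = name
        self.members = {}  # set name -> member records this record owns
        self.owners = {}   # set name -> the record that owns this one

def connect(set_name, owner, member):
    """Link member into owner's set with direct references in both directions."""
    owner.members.setdefault(set_name, []).append(member)
    member.owners[set_name] = owner

region = Record("east-region")
customer = Record("customer-17")
product = Record("widget")
order = Record("order-501")

connect("region-customers", region, customer)  # customer is a member here...
connect("customer-orders", customer, order)    # ...and an owner here
connect("product-orders", product, order)      # order gains a second owner

owner_names = sorted(o.name for o in order.owners.values())
print(owner_names)  # ['customer-17', 'widget']
```

Following a reference reaches the related record directly without a search, but every insert or reorganization has to keep both directions of each link up to date, which matches the trade-off described above.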

Relational models and DBMSs continued to mature through the 1970s and 1980s. In the 1990s, relational DBMSs took on object-oriented features. This opened up new application areas such as data warehousing, the exploration of text and multimedia on the Web, ERP (enterprise resource planning), and MRP (manufacturing resource planning). In 1991, Microsoft shipped a personal DBMS as a part of Windows, supplanting other personal DBMSs.

The first Internet database applications became available in 1995. In 1997, XML was integrated into DBMSs. Because XML provides a common and easily understood language and structure, it allows all of an organization’s information resources to be stored together and structured based on need. XML databases have been a trend since the turn of the millennium.

Uses

One of the main benefits of database systems is that multiple applications and different types of users can use them, all drawing on the same data, to perform many functions. Examples of DBMSs include Oracle, Microsoft Access, FileMaker, and Microsoft SQL Server.

DBMSs support queries written in query languages that allow users to analyze and update data. The DBMS also regulates security, controlling access to databases through passwords and other electronic barriers.

In addition, DBMSs provide periodic backups of set attributes or fields, as well as transparent redundancy, which replicates information throughout the system to ensure data consistency. DBMSs also facilitate computation (counting, summing, averaging, cross-referencing) and automation. The trend toward object-relational database management systems (ORDBMSs) currently governs DBMS production.
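The computations mentioned above map onto standard SQL aggregate functions. The sketch below runs them through Python's built-in sqlite3 module; the sale table and its contents are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical sales data; the table and column names are illustrative only.
conn.execute("CREATE TABLE sale (region TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sale VALUES (?, ?)",
    [("east", 100), ("east", 300), ("west", 200)],
)

# Counting, summing, and averaging are done by the DBMS, not the application.
rows = conn.execute("""
    SELECT region, COUNT(*), SUM(amount), AVG(amount)
    FROM sale
    GROUP BY region
    ORDER BY region
""").fetchall()

for region, count, total, average in rows:
    print(region, count, total, average)
```

The application states what it wants computed and leaves the scanning and grouping to the DBMS, which is the sense in which the system "facilitates" computation.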

Points of Interest

Some of the most popular ORDBMSs are IBM’s DB2, Oracle Database, and Microsoft’s SQL Server. Trends in development include master data management, with the DBMS serving as both a source and a repository of master data. This includes software infrastructure and business processing applications that consistently manage and analyze data across diverse environments. The further development and integration of OLAP (online analytical processing) into DBMSs allows for faster and more reliable data analysis and forecasting across different environments.
