"Unlock your business data and gain inovation and competitive advantage with Expert Database Solutions."




Data and Database Integration

Provide a Unified View of Data Across Applications

Data integration involves combining data residing in different sources and providing users with a unified view of these data.

The need for data integration ocurrs as the volume of data collected by a business entity over time becomes very large and the need to share existing data across the entire business enterprise explodes.The process of data integration becomes significant in a variety of situations both commercial and scientific.

Examples:

  • two similar companies merge and need to merge their databases
  • different bioinformatics repositories want to combine research results

The rapid adoption of databases after the 1960s naturally led to the need to share or to merge existing data repositories. This merging can take place at several levels in the database architecture.

One popular solution involves data warehousing. The data warehouse system extracts, transforms, and loads data (ETL) from several data sources into a single queriable database schema. Architecturally, this offers a tightly coupled approach because the data reside together in a single database repository at query-time.

Problems can arise with the data warehouse solution in terms of data "freshness." For example when an original data source gets updated, but the data warehouse still contains the older data and the ETL process needs re-execution. Difficulties also arise in constructing data warehouses when one has only a query interface to a data sources and no access to the full data.

The trend in data integration in recent years has favored loosening the coupling between data. This may involve providing a uniform query interface over a mediated schema, thus transforming a query into specialized queries over the original databases. This process is sometimes called "view-based query-answering" because each of the data sources functions as a view over the (nonexistent) mediated schema. The formal label for this approach is "Local As View" (LAV) where "Local" refers to the local sources/databases.

This model of data integration is not maintenace free as whenever a new source gets integrated and/or an existing source changes its schema the view for the mediated schema must be re-written.