Joe the Business Intelligence Guy

A journey through the design and delivery of business intelligence solutions.

SQL Server™ Schemas and Data Warehouse Solution Organization

Good organization helps us to develop efficiency and, even more importantly, increases our solutions understandability. Organizing the physical implementation of a business intelligence solution’s data tier within SQL Server™ is critical for organizing the overall business intelligence solution. The use of schemas to organize the business intelligence solution’s SQL Server™ database objects (tables, views, stored procedures, functions) has many benefits including:

1.       Clearly differentiates the objects based on their purpose (e.g. Dim = Dimension, Fact = Fact)

2.       Eliminates the usage of suffixes or prefixes in object names (e.g. DimEmployee, FactRetailSales)

3.       Enables control of object visibility by applying security to the schema (e.g. end users only access Dim and Fact schemas)

As somewhat of a relational purist I object to names of many database objects such as tables and views with prefixes to indicate their purpose. For example, many business intelligence or data warehousing efforts include table names like DimDate and DimEmployee for dimension tables and FactSales for fact tables.

Since SQL Server™ 2005, schemas are not just a user based security object and now form a distinct container of objects that exists independently of database users. Additional details on this separation can be found within the SQL Server™ Books on Line topic “User-Schema Separation”.

The great data warehousing visionary, Ralph Kimball, rightly espouses the need to separate concerns in our data tier. Specifically, he addresses the separation of the data staging area and the data presentation area. His colorful analogy of the areas of a restaurant is useful. Ralph likens the data staging area of a data warehouse, “… to the kitchen of a restaurant, where raw food products are transformed into a fine meal.” He also correctly cites that this area should only be accessed by skilled professionals and should not be having to respond to customer inquiries. Lastly, Ralph states that it just is not safe for the patrons to go into our staging area as they could, “…be injured by the dangerous equipment, hot surfaces, and sharp knifes they may encounter in the kitchen.” Of course to further extend this warning, Ralph includes my favorite rationale when he states that, “Besides, things happen in the kitchen that customers just shouldn’t be privy to.”

So what has this advice to do with SQL Server™ schemas? Simple really, by our use of schemas within our data tier we are able to physically and logically separate the data staging and data presentation areas of our data warehouse.

So as a simple example of how schemas help organize our efforts, I’ll share how I use them as part of my own solution design strategy. First, the tables in the presentation area of the data warehouse would be organized into either the Dim or Fact schemas. As such you can expect to find tables named (with the schema qualifier) Dim.Date and Dim.Employee for dimension tables and Fact.Sales for fact tables. Any additional views, stored procedures, functions, etc. that are related to the dimensions, they get created in the related object’s schema.

I use the following schemas:

·       Dim — used for all objects related to the dimensions in the model

·       Fact — used for all objects related to the facts in the model

·       Admin – used for all objects related to configuration that may be accessible to technical end users (e.g. SSIS package configuration table)

·       Audit — used to hold objects related to auditing (e.g. SSIS custom logging and error handling – except for the sysdtslog90 table for default SSIS logging which is assigned to the dbo schema and cannot be changed)

·       Etl — used to hold objects directly related to the ETL process including staging data (configuration values, processing directives, etc.)

·       Report – used to hold objects specifically used by reporting applications (such as views and stored procedures used directly by SSRS)

·       Olap – used to hold objects specifically used by analysis applications (such as views used directly by SSAS

·       Test – used to hold objects specifically used for testing (such as tear down and setup data and scripts including those that may modify configurations)

·       Util – used to hold objects that can be used generically by any part of the solution (such as a uf_TableOfDate function used to create values for a date dimension or to be used to join during direct relation queries)

I hope this information helps you to consider using schemas to help organize your data warehouse solutions.

Advertisements

Written by Joe Salvatore

January 22, 2009 at 3:46 pm

%d bloggers like this: