One good reason for introducing a new standard is efficiency. Most modern relational database management systems are transactional databases. SQL is inherently transactional, making it ideal for OLTP databases. So, does that necessarily mean that if you have a database, you should access the data using SQL? Since you have a hammer, does everything look like a nail?
OLTP databases often need to be able to write lots of data, quickly. But businesses later need to deeply analyze all that transactional data. For that, a copy is extracted from the OLTP database, transformed into a more optimal format, and loaded into a data warehouse or an OLAP database. Yes, you can still use SQL to look inside a data warehouse, but it's harder to use SQL to find patterns, quickly aggregate across the whole dataset, or do comparisons such as a time series. You can grab a coffee while you wait for it to finish.
In the mid-1990s, Microsoft introduced a new language for Multi-Dimensional eXpressions against an OLAP database, called MDX. It is a powerful but compact query language that knows and understands multi-dimensional databases/cubes. For example, if you want to know the sales of souvenir mittens at your retail outlet in Vancouver, BC during the month of February when the 2010 Olympics were held, you would write an MDX query that looks like this:
SELECT
[Store].[Store Country].[Canada].[Vancouver] ON COLUMNS,
[Product].[All Products].[Clothing].[Mittens] ON ROWS
FROM [Sales]
WHERE ([Measures].[Unit Sales], [Date].[2010].[February])
However, if you wanted to write this in SQL, you’d need a more cumbersome SQL query, something like this:
SELECT SUM(Sales.[Unit Sales])
FROM (Sales INNER JOIN Stores
ON Sales.StoreID = Stores.StoreID)
INNER JOIN Products
ON Sales.ProductID = Products.ProductID
WHERE Stores.StoreCity = 'Vancouver'
AND Products.ProductName = 'Mittens'
AND Sales.SaleDate BETWEEN '01-02-2010' AND '28-02-2010'
It becomes even more verbose to write an SQL expression that compares Feb 2010 sales to that of the previous month, or the previous year. In contrast, MDX makes it easy to write very short succinct queries that are more direct to read and understand, which in turn reduces errors. And you can write more general expressions like "prior period" comparisons, that at every level of drill-down in time hierarchy (e.g. day, month, quarter, year) show or compare to the prior unit of time at that level.
I liken it to writing a program in assembler code versus in a high-level programming language. You could do it, but why would you? This is not a perfect example, since assembler code is often faster than the high level language. However, in the SQL vs. MDX debate, it would be the MDX that is both easier to read and often faster since it expresses more clearly the goal of the query!
In recent years, many BI client vendors (SAP BusinessObjects, Microsoft, Cognos, and others) allow connections using MDX. This is a great move as it allows more powerful and general queries, an opportunity for better performance (especially against a cube), and greater interoperability. The latter is particularly valuable in the modern enterprise composed of acquisitions and mergers, thus leading to multi-vendor BI challenges that need to be bridged.
Well explained. I was exactly wondering why I would add an MDX-OLAP layer over an SQL-based Warehouse.
Posted by: Raffael Vogler | April 22, 2013 at 08:26 AM