Data Warehouses

August 20, 2008

Data Warehousing and Ad Hoc Queries

It is interesting to see how various vendors believe that data warehouses and data warehouse appliances will help when it comes to ad hoc query performance.  A lot of databases are tuned to perform well for certain types of queries.  When the user of the database asks something unexpected, performance can grind to a halt.  Stephen Swoyer at TDWI wrote a very good article entitled "In Depth: Closing the Ad Hoc Query Performance Gap for Good" http://www.tdwi.org/News/display.aspx?ID=9034 about this issue.

Performance of ad hoc queries in a data warehouse or data warehouse appliance is a major issue.  As more data is gathered, more people want to try to glean actionable information from that data.  The better able the users are to glean actionable information, the more successful they will be.  There are many companies in this space including:

- illuminate

- Teradata

- Netezza

- DATAllegro

- Dataupia

- Kognitio

- Sybase

- ParAccel

- InfoBright

- Vertica

And of course, the traditional vendors such as Oracle, IBM, and Microsoft should not be forgotten, nor should SAP with its NetWeaver BI (BW) and BIA offerings.  BI is powerful, but to get the most value you need a database engine that performs.

July 31, 2008

Good article on Microsoft's acquisition of DATAllegro...

Stephen Swoyer at TDWI wrote a good piece on Microsoft's acquisition of DATAllegro entitled "Analysis: What's Behind Microsoft's DATAllegro Acquisition?" http://www.tdwi.org/News/display.aspx?ID=9056.  Definitely worth a read.  I too expect Microsoft to say more at their BI Conference, which runs October 6-8 in Seattle http://www.microsoft.com/bi/conference/.  I went to the first conference last year and found it gave good insight into Microsoft's BI directions.  I expect there will be much more this year, especially now that PerformancePoint Server (PPS) is a shipping product.

July 25, 2008

Microsoft's Data Warehouse Appliance Play

I was thinking more about yesterday's announcement that Microsoft is acquiring DATAllegro.  How will this move by Microsoft affect the data warehouse space?  In Gartner's 2007 Magic Quadrant http://mediaproducts.gartner.com/reprints/microsoft/article19/article19.html, Microsoft was positioned behind Teradata, Oracle, and IBM.  Granted, all four of these companies were in the magic quadrant, but it was clear that Microsoft was behind.  DATAllegro, on the other hand, was considered a visionary with a lesser ability to execute.

It is interesting that this acquisition comes at the end of the development cycle of Microsoft SQL Server 2008.  Towards the end of a development cycle, the product management folks within Microsoft start looking at what will be in the next version of the product.  It looks like they decided that MPP was an important piece to have and DATAllegro has the technology they want.  Unless there was a lot of work that was going on behind the scenes over the last year, it is highly unlikely that any of the DATAllegro technology will find its way into SQL Server 2008.  More likely, Microsoft will introduce something in a service pack to SQL Server 2008 or with the next version of SQL Server.

According to Gartner, "Not only has DATAllegro added value to the Ingres DBMS through many enhancements to the functionality (now supplied as part of the Ingres OSS DBMS), but its has also added a software layer over the top of Ingres, creating the MPP architecture to parallelize queries across the processors and to manage the workload of the appliance."  The fact that DATAllegro has figured out how to add MPP functionality with a layer on top of the database likely means that Microsoft will take this technology and wrap the DATAllegro MPP software layer on top of SQL Server.  It is highly unlikely that Microsoft would want to start shipping Ingres.

In a June 2008 DMReview interview http://www.datallegro.com/pdf/articles/stuart_frost_interview_rackemup.pdf, Stuart Frost, DATAllegro's CEO, talks about how they are aggressively targeting Teradata customers right now.  Frost says "We're targeting Teradata customers today because we think they're being overcharged for the capabilities they're getting."  When Microsoft enters a market, they tend to push prices down.  So, how will this impact Teradata?  My guess is that any effect on Teradata is still some time away.  It will take a year or two for Microsoft to deliver a product using the DATAllegro technology and SQL Server.  Then it will take some time for Microsoft to get market traction.  This gives Teradata a lot of time to respond.  What the Microsoft/DATAllegro move means to companies like Netezza, Greenplum, and Kognitio is more interesting, because these other companies now have to compete with the deep pockets and marketing muscle of Microsoft.

July 24, 2008

Consolidation in the Data Warehouse Appliance Space

Wow!  We sort of expected to see some consolidation in the data warehouse appliance space, but I have to admit I was caught somewhat off-guard by Microsoft's acquisition of DATAllegro http://www.datallegro.com/pr/7_24_08_microsoft_acquisition.asp.  I think it is a great move by Microsoft.  The data warehouse appliance is a much-needed product for corporations because the amount of data that corporations collect is ever growing.  Being able to deploy data warehouse solutions quickly, and having those solutions scale well, is a must.  Good for Microsoft.  SQL Server 2008 (code named Katmai) is pretty much in the box at this point, so my guess is that we will see the fruits of this acquisition only after Katmai.

July 16, 2008

The "last 18 inches"...

I was reading the article entitled "Visualize This: A Fresh Perspective on Business Intelligence Systems" by Angela Shen-Hsieh http://www.dmreview.com/specialreports/2008_88/10001663-1.html.  Angela comments "I like to refer to this visual approach as “the last 18 inches” - that is, the distance between the computer screen and the human brain. Focusing on this last and most vital link of the data chain is essential to getting value out of the massive investments in IT infrastructure that companies have made. After all, if we can’t get information to our brains in a meaningful way, then we’re just drowning in numbers."  This is indeed a very important piece of the BI puzzle - how to get information from data.  The data warehouse appliance has done a good job of scaling and allowing us to process more and more data.  The big question is how to understand and comprehend all this data?

June 30, 2008

How will Cloud Computing change BI and Data Warehousing?

This is a question I have been pondering for some time.  How will Cloud Computing change BI and Data Warehousing?  In theory, it should be much easier to get a system up and running.  This is the same reason why the data warehouse appliance is so compelling.  Give me a box, let me plug it in and load my data, and off I go to do the work I really need to do.  The idea of Cloud Computing is even easier.  This is the CPU power/storage that I need, where do I load my data, and off I go to do the work I really need to do.

I work in the technology industry and sometimes we forget that the users of our software/technology don't really care how it works.  Rather, they care that software/technology enables them to do their jobs.  Cloud Computing holds that great promise.

In his article entitled "Send in the Clouds" http://www.bireview.com/bnews/10001601-1.html, Jim Ericson talks about how "Cloud computing will impact the data warehousing/BI equation".  I think this is very pertinent.  We have to get BI to more users and, along with the data warehouse appliance, cloud computing will help to make BI more pervasive.

June 16, 2008

SAP NetWeaver BI (SAP BW) and Microsoft Excel 2007 – MDX Connectivity

I was poking around the SAP Developer Network (http://www.sdn.sap.com) website this weekend and found a presentation entitled "SAP NetWeaver BI 7.0 Native Microsoft Excel 2007 Integration"
(https://www.sdn.sap.com/irj/sdn/go/portal/prtroot/docs/library/uuid/7083c7d3-d1ab-2a10-08ae-c8470148c7c2). Although these slides are dated January 2008, they are worth looking at. Slides 3 and 4 talk about the competitive positioning of Excel 2007 and the business case for Excel vs BEx Analyzer. Slides 5 and 6 show some architecture. In slide 6, I would like to point out that irrespective of how you access NetWeaver BI – ODBO, XMLA, or BAPI – you will hit the same MDX Processor. Many people have asked me if there are advantages of one API vs another and my answer has been that if it is just about the MDX, there is no difference because it is the same MDX Processor that underlies all the different APIs. ODBO and XMLA are industry standard APIs whereas BAPI is an SAP specific API. BAPI does have some slight advantages in that it exposes things that are SAP specific but again, these are minor. When it comes to supporting the MDX Query Language, the SAP MDX Processor has been upgraded to better work with Excel 2007. Slide 9 tells you more about what SAP Support Packages are needed to enable the Excel 2007 ODBO/MDX connectivity.
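
The point about slide 6 can be pictured with a small sketch.  The class names, cube name, and result format below are hypothetical and are not SAP's actual interfaces.  The sketch only illustrates three access paths handing the same MDX statement to one shared processor, so the answer does not depend on which API carried the query.

```python
# Illustrative sketch only: all names are hypothetical, not SAP's real interfaces.

class MdxProcessor:
    """Stands in for the single MDX engine behind every access API."""

    def __init__(self):
        self.executed = []  # (api name, statement) pairs, kept for illustration

    def execute(self, statement, via):
        self.executed.append((via, statement))
        # A real engine would parse and evaluate the MDX. Here we return a
        # token that depends only on the statement, never on the API used.
        return "result-for:" + " ".join(statement.split())


class AccessApi:
    """An access path (ODBO, XMLA, or BAPI) that forwards MDX unchanged."""

    def __init__(self, name, processor):
        self.name = name
        self.processor = processor

    def run(self, statement):
        return self.processor.execute(statement, via=self.name)


processor = MdxProcessor()
apis = [AccessApi(name, processor) for name in ("ODBO", "XMLA", "BAPI")]

# Hypothetical cube name; the statement itself is ordinary MDX.
query = "SELECT [Measures].MEMBERS ON COLUMNS FROM [SalesCube]"
results = {api.name: api.run(query) for api in apis}
```

In this toy model every entry in `results` is identical, which is the sense in which the choice of API makes no difference "if it is just about the MDX".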

June 15, 2008

What do you think a Data Warehouse Appliance is?

A recent TDWI survey asked people "What do you think a data warehouse appliance is?"  It is interesting that 53% of respondents replied that a data warehouse appliance is "server hardware and database software built specifically to be a data warehouse platform".  Also interesting is that only 19% answered that they don't know.  These results again point to how data warehouse appliances are becoming much more mainstream and adoption is picking up.  You can read more about the survey in Philip Russom's article "Defining the Data Warehouse Appliance" http://www.tdwi.org/Publications/display.aspx?id=7784.

May 30, 2008

A Major Shift in the Data Warehouse Space...

I have been a fan of the Data Warehouse Appliance for some time.  The concepts, technologies, and price have real game changer qualities.  Doug Henschen, Editor-in-Chief at Intelligent Enterprise, has done some research on companies who have implemented Data Warehouse Appliances and in his article "Why Not Data Warehouse Appliances?" (http://www.intelligententerprise.com/blog/archives/2008/05/why_not_data_wa.html), Doug also makes the case that "we're on the cusp of a broad adoption phase".

May 29, 2008

IBM Cubing Services - MDX and ODBO - Excel Pivot Tables

In case you missed it, on May 6, IBM announced support for OLE DB for OLAP (ODBO) in their Cubing Services product.  You can read the official notice at http://www-01.ibm.com/common/ssi/rep_ca/3/897/ENUS208-113/ENUS208-113.PDF.  DB2 Warehouse edition is now called InfoSphere Warehouse.

In October 2007, IBM shipped MDX Query Language support in this product.  However, you had to access the MDX functionality via a proprietary interface.  At the Information OnDemand conference in October 2007, IBM had a demo of Business Objects Voyager working with Cubing Services.  Voyager is an OLAP reporting tool from Business Objects which connects to MDX data sources.  Voyager works with Microsoft Analysis Services, SAP BW, and some of the Business Objects Performance Management applications.  It was nice to see Voyager working with Cubing Services, albeit using a proprietary API.  Now Cubing Services supports ODBO so it is possible to connect any ODBO client to Cubing Services.  I have not yet played with this new version of Cubing Services.  I hope to soon.

One thing I am told is that Microsoft Excel works with Cubing Services via ODBO and MDX.  However, IBM's MDX Query Language implementation is what I call the MDX 1999 variant.  The MDX Language Specification is part of the OLE DB for OLAP Specification, which Microsoft last published in 1999; hence the name MDX 1999.  Microsoft has of course added a lot of extensions to the MDX Query Language, and the MDX supported by Microsoft Analysis Services 2005 is what I call MDX 2005.

Excel 2007 uses the MDX Query Language to connect to OLAP data sources.  Excel 2007 is an adaptive product - it will adapt to the MDX supported by the underlying data source.  Therefore, if you connect Excel 2007 to Microsoft Analysis Services 2005, it will function differently than if you connect it to Microsoft Analysis Services 2000.  Excel 2007 is able to determine what MDX Query Language features are supported by the underlying data source and expose functionality accordingly.

Since Cubing Services supports MDX and ODBO, you can connect Excel 2007 to Cubing Services.  However, only the MDX 1999 variant is supported, so Excel 2007 on Cubing Services will function like Excel 2007 on Microsoft Analysis Services 2000.  So far, only SAP BW 7.0 and Microsoft Analysis Services 2005 support the MDX 2005 variant.  I am not sure of IBM's timeline for supporting MDX 2005.  Also, Cubing Services does not yet support XMLA.  I expect XMLA support to come relatively soon, as this is technically easier to develop than ODBO and much easier to develop than MDX 2005 functionality.  If I were a betting man, I would expect IBM to announce XMLA at their next Information OnDemand conference.
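
The adaptive behavior described above can be sketched as capability-driven feature selection.  This is not Excel's actual logic.  The variant labels reuse the "MDX 1999" and "MDX 2005" shorthand from this post, the feature names are invented for illustration, and placing Analysis Services 2000 in the MDX 1999 group is an inference from the comparison made above.

```python
# Illustrative sketch only: not Excel's real implementation.
# Variant labels follow this post's shorthand; feature names are invented.

MDX_VARIANT_BY_SOURCE = {
    "Microsoft Analysis Services 2000": "MDX 1999",  # inferred from the post
    "Microsoft Analysis Services 2005": "MDX 2005",
    "SAP BW 7.0": "MDX 2005",
    "IBM Cubing Services": "MDX 1999",
}

BASE_FEATURES = {"pivot_tables", "drill_down"}                     # hypothetical
EXTENDED_FEATURES = {"extended_feature_a", "extended_feature_b"}   # hypothetical


def client_features(source):
    """Return the feature set a client would expose for a data source."""
    variant = MDX_VARIANT_BY_SOURCE[source]
    if variant == "MDX 2005":
        # Richer MDX support lets the client switch on more functionality.
        return BASE_FEATURES | EXTENDED_FEATURES
    # Otherwise fall back to what the older variant can support.
    return set(BASE_FEATURES)
```

Under these assumptions, `client_features("IBM Cubing Services")` equals `client_features("Microsoft Analysis Services 2000")`, and both are a strict subset of what the client exposes for SAP BW 7.0 or Analysis Services 2005.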