At one of our clients we implemented the MicroStrategy BI tool and used IBM DataStage as our ETL tool.
As you probably know, two key factors to measure the success of BI implementation are data quality and performance (as measured by report execution time). Users always want the reports to run quickly (in seconds) otherwise they lose interest in using the reports. One of the challenges I have faced when designing and developing MicroStrategy reports is with the performance of a report. In this post, I will discuss a solution I implemented to improve the performance.
The business needed a detailed report to analyze the performance of its stores based on customer surveys. For each survey there are about 50 different score types with values provided by the customers. This survey detail report has metrics derived from the values for each score type.
When we deployed the report in production, the performance of the report was good and it was taking only a few seconds to execute and return the results to the users. After a few months, however, users reported issue with the run time of the report, as it was taking about 5 minutes to produce the results.
Finding the Symptom
When the users reported issue with performance of the report, the first thing I looked at was indexes on the fact table and found that the correct indexes were in place. Then I examined the execution flow of the report in detail and determined that 95% of the total report execution time was when querying the data warehouse.
The diagram above shows the “report query flow” in MicroStrategy. When a report is requested by the user, the following steps are involved in executing and delivering a report:
- User sends a report request to client interface (desktop or web server such as MicroStrategy web)
- Desktop / web server passes the request to the Intelligence Server which will coordinate all tasks necessary to execute the report
- The Intelligence Server first checks to see whether the report results are already in memory. If there is a report in cache then that report’s results will be sent to the client interface.
- If no valid cache exists for the report, then the Intelligence Server gets the report definition from the Metadata layer.
- Using the report definition, the SQL Engine generates optimized SQL.
- The Query Engine runs the SQL against the data warehouse and sends the data to Intelligence Server.
- The Analytical Engine performs analytical calculations as necessary.
- The Intelligence Server collects the data and sends to client interface which will format and display to the user.
Identifying the Cause
As this report was taking a long time at Step 6 (querying the data warehouse), I decided to examine the SQL query generated for this report by SQL Engine (you can view the SQL generated by the SQL Engine by running the report in “View SQL” mode in MicroStrategy Desktop).
When I looked at the SQL generated for this report, I found that there were too many SQL passes to the same fact table. Initially, when the fact table size was small, the report didn’t take a long time to run even with so many SQL passes. As the fact table grew to millions of rows, each SQL pass took longer to return results.
Because this report SQL had a large number of SQL passes and the time required for each pass was growing with the increase of table size, the total run time of the report grew geometrically by the number of SQL passes. If there are 60 SQL passes, and each pass takes 1 second, then the total time will be 60 seconds. And if the time to run each pass increased to 2 seconds, then total time increased 120 seconds.
I tried to optimize the SQL (with less passes) generated by the SQL Engine by tweaking VLDB (very large database) property settings and join properties on the metric schema objects in MicroStrategy, but did not succeed in reducing the number of SQL passes. Then I took the generated SQL and put it in Query Analyzer of Microsoft SQL Server and got the execution plan to identify whether any optimization was required on the table indexes; however SQL Server didn’t come up with any recommended index changes.
When I tested the report with only a few metrics on it, it took less time to execute. As I added more and more metrics to the report (built on the same fact table), execution time of the report began increasing.
I concluded that the reason for too many SQL passes in the report was because of the structure of the fact table on which metrics and facts were created for this report. Data stored in the fact table is by rows for different score types, so for each metric defined by a score type, one SQL pass is generated. Because there were more than 60 metrics being calculated for each score type on the report, there were are as many as 60 SQL passes created by the SQL Engine.
Finding a Solution
To compensate for this, I thought of pivoting the data in the fact table (converting rows into columns) by creating a view with pivot logic on the database server. After I created the view on the existing fact table and pointed all the metrics and facts on this report to the new view, I looked at the new SQL generated by the SQL Engine. This time the SQL was much smaller than before and required fewer passes. After implementing the changes, we tested the report execution with the same large set of data in the fact table and with the same indexes.
The report took about 20 seconds to run, compared to 5 minutes previously. Users were happy with the performance of the report after deploying the report modifications to production.
I hope this concept of pivoting data with SQL views in the database engine helps you in tuning the performance of your MicroStrategy reports.