Custom Search
My Zimbio

Monday, April 6, 2009

queries for reporting

When I create sql based reports, I usually start with a mock report and along with the end-users I review the criteria, the output and the sorting options. It is important to understand in the very beginning how much data will be generated and the frequency of the report. It is also good to determine the amount of storage may be required for the report generation and archiving. I then will set to work on designing the initial query. Usually I will create an all encompassing query. I try to think futuristically, what will they possibly want to see a few weeks, months, years from now. If it is not too much on query performance I will try to add these extra fields in my initial code and return set. Then I will start to create the report based on the criteria, parameters and business rules that were documented. Once I finish the initial query and have a sample return set, I will sit down with the end-users or business analysts again to determine the delivery mechnanism.

query optimization

I truly believe that no query should be longer than 32 seconds to return data. If it takes longer it should be checked for performance optimization. Of course there are some exceptions to this but for the most part, most people do not like to wait longer than 30 seconds to retrieve data. They will move on to another site or try to get their information a different way if they are forced to wait extended period of time for data elements to return.

I first try to limit my query to the simplest return set if possible. I try not to do too many sorts, calculations, aggregations, etc in my initial return set. I try to stay away from too many where clauses. For example if I am trying to retrieve data within a date range and I want to do more with the data after I get the initial subset, I will try to break it into two queries. I also try to break my procedures down whenever possible so that they can be ran from start and stop points. What I mean here is that if I am doing a lot of data manipulation for a report. I will first grab a subset of data, then I will do another subset, then another, until i end up with a final output set of data. This way if there is a problem, I don't necessarily have to run everything from the beginning in order to try and diagnose an issue. I try to split the data in like items. Like select information for a particular department within a date range. (I put this return set into a temp table) then from there, I may create another subset that joins to another large table but only has to search through the smaller subset to get the data back. Then I will put the calculations and transforming into another temp table and then I will finish all the prettiness into a final subset of data. I find this works for me and it is really fast if you need to find data issues quickly. You don't have the headache of reading through excessive amounts of code to find a small tiny data issue, or code issue. I sometimes will right the big procedure as one huge select then I will actually go back and break it up especially if I think it will increase query and return set peformance. I usually always leave the end user parameters for sorting to the final return set. I am not sure if all end users really know what they are looking for.

Sunday, April 5, 2009

Database Basics

Simple every day select statements

Select count(*) from table
Select getdate()
Select "titlevalue"

Select * from table
Select top 1 * from table
Select max(column) from table
Select min(column) from table