Posted At : Jan 12, 2010 21:15 PM | Posted By : Ed Tabara
Related Categories: SQL

  1. Apply partitioning on big and fat tables
    Table partitioning means splitting a large table into multiple smaller tables so that queries has to scan less amount data while retrieving. This is very useful when you have very large (possibly having millions of records) table in your database and when you see that after all optimization steps, querying on this table is still executing slowly. Following two following options are available to partition a table.

    • Horizontal partitioning
      Suppose we have a table containing 10 millions of rows. We can divide the table's data into 10 separate partitioning tables where each partition will contain 1 million records and the partition will be based for example upon the value of the ID field. So, first partition will contain the rows in the range 1-1000000, second partition will contain the rows which have a primary key value in the range 1000001-2000000, etc.
      This way we are partitioning the table by grouping the rows based upon a criteria (ID range). It's like we have a stack of books in a box from where we are horizontally splitting the stack by taking a group of books from the top and putting in smaller boxes.

    • Vertical partitioning
      Suppose, we have a table having many columns and also millions of records. Some of the columns in the table are very frequently accessed in some queries and most of the columns in the table are less frequently accessed in some other queries. As the table size is huge in terms of number of columns and rows, any data retrieval query from the table performs slowly. So, based on the frequency of access of the columns we can split the table into two or more tables where each table would contain a few columns from the original tables. In our case, a partition of this table should contain the columns that are frequently accessed by queries and another partition of this table should contain the columns that are less frequently accessed by other queries. Splitting the columns vertically and putting in different thinner partitions is called vertical partition.
      Another good criteria for applying vertical partitioning could be to partition the Indexed columns and non-indexed columns into separate tables. Also, vertical partitioning could be done by splitting the LOB or VARCHARMAX columns into separate tables.
      This type of partitioning has to be done very carefully because if there is any query that involves columns from both partitions, then the query processing engine would require joining two partitions of the tables to retrieve data, which in turn would degrade performance.

    • Partitioning best practice
      1. Consider partitioning big fat tables into different file groups where each file inside the file group is spread into separate physical disks. This would enable database engine to read/write data operations faster.
      2. For history data, consider partitioning based on "Age". For example, suppose a table has order data. To partition this table, use the Order date column to split the table so that, a partition is created to contain each year's sales data.


    • How to partition?
      Suppose, we have an Order table in our database that contains Order data for 4 years (1999, 2000, 2001 and 2002) and this table contains millions of rows. We would like to apply partitioning on this table. To do that, following tasks are to be performed:

      1. Add user defined file groups to the database
        To create a file group do the following
        ALTER DATABASE OrderDB ADD FILEGROUP [1999]
        ALTER DATABASE OrderDB ADD FILE (NAME = N'1999', FILENAME
        = N'C:\OrderDB\1999.ndf', SIZE = 5MB, MAXSIZE = 100MB, FILEGROWTH = 5MB) TO
        FILEGROUP [1999]

        We are adding a file group "1999" and adding a secondary data file "C:\OrderDB\1999.ndf" to this file group. We did this because we would like to put our table partitions into separate files in separate file groups. Using same SQL command you may create another 3 file groups "2000", "2001" and "2002". Each of these file group would store a year's order data inside their corresponding data files.
      2. Create a partition function
        A partition function is an object that defines the boundary points for partitioning data.
        CREATE PARTITION FUNCTION FNOrderDateRange (DateTime) AS
        RANGE LEFT FOR VALUES ('19991231', '20001231', '20011231')

        The above partition function specifies that, the Order date column having value between

        DateTime <= 1999/12/31 would fall into 1st partition.
        DateTime > 1999/12/31 and <= 2000/12/31 would fall info 2nd partition.
        DateTime > 2000/12/31 and <= 2001/12/31 would fall info 3rd partition.
        DateTime > 2001/12/31 would fall info 4th partition.

        The RANGE LEFT is used to specify that the boundary value should fall into left partition. For example, here the boundary value 1999/12/31 is falling into the 1st partition (with all other dates less than this value) and the next value is falling into the next partition. If we specify RANGE RIGHT, then the boundary value would fall into the right partition. So, in this example the boundary value 2000/12/31 would fall into the 2nd partition and any date less than this value would fall into the 1st partition.
      3. Create a partition schema
        The partition scheme maps the partitions of a partitioned table/index to the file groups that will be used to store the partitions. To create a partition schema, use the following command:
        CREATE PARTITION SCHEME OrderDatePScheme AS PARTITION FNOrderDateRange
        TO ([1999], [2000], [2001], [2002])
        Here, we are specifying that

        The 1st partition should go into the "1999" file group
        The 2nd partition should go into the "2000" file group
        The 3rd partition should go into the "2001" file group
        The 4th partition should go into the "2002" file group
      4. Apply partitioning on the table
        Till now we have defined the necessary partitioning criteria. So all we need to do now is to partition the table.
        In order to do this follow the steps:
        Drop the existing clustered index from the table that is most likely created due to the primary key creation on the table. The clustered index can be dropped by using the DROP INDEX statement. Assuming that the PK_Orders is the primary key of the table, use the following command to drop the primary key which will eventually drop the clustered index from the table.
        ALTER TABLE Orders DROP CONSTRAINT PK_Orders
        Recreate the clustered index on the partition scheme: The index can be created on a partitioned scheme as follows:
        CREATE UNIQUE CLUSTERED INDEX PK_Orders ON Orders(OrderDate) ON
        OrderDatePScheme (OrderDate)

        Assuming that the OrderDate column values are unique in the table, the table will be partitioned based on the partition scheme specified (OrderDatePScheme) which internally uses the partition function to partition the table into 4 smaller parts in 4 different file groups.



For more on partioning check Partitioned Tables and Indexes in SQL Server 2005

Comments Comments (1) | Print Print | Email Send | 575 Views | 3% / 6% Popularity


Posted At : Jan 11, 2010 15:10 PM | Posted By : Ed Tabara
Related Categories: SQL


Use Performance Monitoring Tool (also known as Perfmon) to diagnose performance problems
Sometimes when you encounter performance related problems in your database the Profiler alone can't help you identifying the exact cause of the problems.
For example, analyzing the query execution time using the Profiler in the production server you've seen that the corresponding TSQL is executing slowly, say 10 seconds, while the same query takes a much lower time in the Test server, say 200 ms. You analyzed the query execution plans and data volume and found those to be roughly the same.
The Performance Monitoring Tool comes to your aid in these kinds of situations. Performance Monitor is a tool that gathers statistical data related to hardware and software metrics from time to time.
When you issue a TSQL to execute in the database server, there are many stakeholders participating in the actions to execute the query and return result. These include the TSQL Execution engine, Server buffer cache, SQL Optimizer, Output queue, CPU, Disk I/O and lots of other things. So, if one of these does not perform its corresponding task well and fast, the ultimate query execution time taken by the database server would be high. Using the Performance Monitoring tool you can take a microscopic look at the performance of these individual components and identify the root cause of the performance problem.
With Perfmon you can create a counter log including different built in counters that measures performance of each individual components while executing the queries and analyze the counter log with a graphical view to understand what's going on in detail. Also you can combine the Performance counter log with the SQL Profiler trace for a certain period of time to better understand the complete situation while executing a query.
  • Basic use of Performance Monitor
    Steps to create a performance counter log:
    1. Launch the Performance Monitor tool from Tools->Performance Monitor in the SQL profiler tool

    2. Create a new Performance Counter log by clicking on the Counter Logs->New Log Settings

      Specify log file name and press OK.

    3. Click on the Add Counters button to select the preferred counters in the newly created counter log.

    4. Add the preferred counters by selecting desired objects and their corresponding counters from the list. Click on Close when done.

    5. The selected counters will be displayed in the form.

    6. Click on the Log Files tab and click on the Configure tab to specify the log file location and modify log file name if required. Click OK when done.

    7. Click on the Schedule tab to specify a schedule for reading the counter information and write in the log file. Or you can also select Manually for Start log and Stop log options in which case the counter data will be logged after you start the performance counter log.

    8. Click on the General tab and specify the interval for gathering counter data.

    9. Press OK and start performance counter log by selecting the counter log and clicking start. When done stop the counter log.

    10. To view log data close and open the Performance monitor tool again. Click on the view log icon to view counter log. Click on the Source tab and select Log files radio button and add the log file to view by clicking on the Add button.

    11. By default only three default counters are selected to be shown in the counter log output. Specify other counters, that were included while creating the Coutner log, by clicking on the Data tab and selecting the desired counters by clicking on the Add button.

    12. Click OK button to view the performance counter log output in a graphical view.

  • Correlate Performance counter log and SQL Profiler trace for better investigation
    SQL Profiler can give you information about the long running queries, but it can't provide you with the context information to explain the reason for long query execution time. On the other hand, the Performance monitor tool gives you statistics regarding the individual component's performance but, it does not give you information about the query execution time. So, by combining the performance counter log with the SQL Profiler trace you can get the complete picture while diagnosing performance problems in SQL Server.
    Correlating these two things serve another important purpose also. If the same query takes longer time in production server to execute on Production server than on Test server, this indicates that the Test server may not have the same amount of load, environment and query execution context as the Production server has. So, to diagnose the performance problem, you need a way to simulate the Production server's query execution context in the Test server somehow. You can do this by correlating the SQL Profiler trace at the Test server with the Performance counter log that is taken at the Production server. Correlating these two tool's output can help you identifying the exact root cause of the performance problem.
    For example, you might find that each time the query takes 10 seconds to execute in the Production server, the CPU utilization reaches up to 100%. So, instead of trying to tune the SQL, you should investigate the reason why the CPU utilization rises up to 100% to optimize the query performance.
    Here are the steps to correlate the SQL Profiler trace with the Performance counter log:
    1. Create a Performance Counter log by incorporating the following common performance counters. Specify Manual option for starting and stopping the counter log.
      --Network Interface\Output Queue length
      --Processor\%Processor Time
      --SQL Server:Buffer Manager\Buffer Cache Hit Ratio
      --SQL Server:Buffer Manager\Page Life Expectancy
      --SQL Server:SQL Statistics\Batch Requests/Sec
      --SQL Server:SQL Statistics\SQL Compilations
      --SQL Server:SQL Statistics\SQL Re-compilations/Sec

      Create the performance counter log, but don't start it.
    2. Using the SQL Profiler create a trace using the TSQL Duration template. Add Start Time and End Time column to the trace and Start the Profiler trace and the Performance counter log created in the previous step at the same time.
    3. When enough tracing has been done, stop both SQL Profiler trace and the Performance counter log at the same time. Save the SQL Profiler trace as a .trc file in the file system.
    4. Close the SQL Profiler trace window and open the trace file again with the Profiler's .trc file. Then, you have to close the Profiler trace and open the trace file again, otherwise you will not get the Import Performance Data option enabled. Click on the File->Import Performance Data to correlate the Performance Counter log with the SQL Profiler trace. Nota, that if the Import Performance Data option is disabled, something is wrong and review your steps from the beginning. A file browser window will appear and select the Performance counter log file in the file system that is to be correlated.
    5. A window will appear to select the counters to correlate. Select all counters and press OK. You will be presented with a screen like the below that is the correlated output of SQL Profiler trace and Performance Counter log.

    6. Click on a particular TSQL in the Profiler trace output. You'll see that, a Red vertical bar will be set in the Performance counter log output to indicate the particular counter statistics when that particular query was being executed. Similarly, click on the Performance counter log output any where you see a certain performance counter's value is high or above the normal value. You'll see that the corresponding TSQL that was being executed on the database server will be highlighted in the SQL Profiler trace output.



Comments Comments (0) | Print Print | Email Send | 808 Views | 4% / 0% Popularity


Posted At : Jan 08, 2010 15:20 PM | Posted By : Ed Tabara
Related Categories: SQL

  1. Diagnose performance problems using SQL Profiler and Performance Monitoring tool
    • Basic use of SQL Profiler
      1. Launch the SQL Profiler (Tools->SQL Server Profiler in the Management Studio) and connect it to the desired SQL Server instance. Select a new trace to be created (File->New Trace) and select a trace template (a trace template is a template where some pre-selected events and columns are selected to be traced).

      2. Or you can select particular events and select/deselect columns to specify the information you want to see in the trace output.

      3. Or you can organize columns (click the Organize Columns button) to specify the order of their appearance in the trace. Also, specify column filter values to filter the event data which you are interested in. For example, click on the Column Filters and specify the database name value in the Like text box to trace events only for the specified database. Please note that, filtering is important because, SQL profiler would otherwise capture all unnecessary events and trace too many information that you might find difficult to deal with.

      4. Run the profiler and wait for the events to be captured on the trace.

      5. When enough information is traced, stop the profiler (by pressing the red Stop icon) and save the trace.

      6. If the trace is saved on a table you can run a query to retrieve the expensive TSQL's using a query like this:
        SELECT TextData, Duration, ...
        FROM tableName
        ORDER BY Duration DESC

      That's it, nice and easy.
    • Effective use of SQL Profiler to troubleshot performance related problems
      Usually SQL Profiler is used to trace the most expensive TSQLs/Stored Procedures. But, the tool is not limited to it. You can use many powerful features of this tool to diagnose and troubleshoot different kinds of problems that could occur due to many possible reasons.
      When you are running the SQL Profiler, there are two possibilities. Either you have a reported performance related issue that you need to diagnose or you need to diagnose any possible performance issue in advance so that you can make sure your system would perform fast in the production after deployment.
      Here are some tips you may use:
      1. Create your own template when needed
        Usually the existing templates will serve your needs, but there may be situations when you will need a customized template for diagnosing a specific kind of problem in the database server (for example deadlock occurring in the production server). In this case you can create a customized template using File->Templates->New Template and specifying the Template name, events and columns. Also you can select an existing template and modify it according to your needs.


      2. Capture TableScan or DeadLock events
        With SQL Profiler you can edit the templates so that, the profiler listens for any Table scan or deadlock event that might take place in the database. To do this, check the Deadlock Graph, Deadlock and DeadLock chain events in the DeadLock section while creating/editing the tracing template. Then start the profiler and run your application. Sooner or later, when any table scan or deadlock occurs in the database, the corresponding events would be captured in the profiler trace and you would be able to find out the corresponding TSQLs that are responsible for the above described situation.
        Note: You can also require the SQL Server log file to write deadlock events so that you can get important context information from the log when the deadlock took place. This is important because sometimes you need to combine the SQL Server deadlock trace information with that of the SQL Server log file to detect the involved database objects and TSQLs that are causing deadlocks.
        Detecting Table scan:

        Detecting Deadlocks:

      3. Create Replay trace
        In order to troubleshoot any performance problem in the production database server, you need to try to simulate the same environment in your test server first so that the performance problem can be re-generated. The SQL Profiler tool lets you do this by using a Replay trace. You can use a TSQL_Replay Trace template to capture events in the production server and save that trace in a .trace file. Then you can replay the trace on test server to re-generate and diagnose problems.

        For more info check this
      4. Create Tuning trace
        The Database Tuning Advisor is a tool that can give you good tuning suggestions to enhance your database performance. But to get a good and realistic suggestion, you need to provide the tool with appropriate load that is similar to the production environment. So, you need to execute the same set of TSQL's and open the same number of concurrent connections in the test server and then run the tuning advisor there. The SQL Profiler lets you capture the appropriate set of events and columns (for creating load in the tuning advisor tool) by the Tuning template. Run the profiler using the Tuning template, capture the traces and save it. Then use the tuning trace file for creating load in the test server by using the Tuning advisor tool.

        For more info check this
      5. Capture ShowPlan to include SQL execution plans in the profiler
        Sometimes the same query give different performance in the Production and Test server. In this case you will have to take a look at the execution plan that is being used in the Production server for executing the actual Query. But you just can't run the TSQL that is causing performance problem in the production server to view the actual execution plan for lots of reasons as being that this execution plan might not reflect you the true execution plan that is used in reality in a fully loaded production database. The SQL Profiler can help you in this regard. You can include ShowPlan or ShowPlan XML in your trace while profiling in the Production server. This would capture SQL plans along with the TSQL text while tracing. Do this in the test server too and analyze and compare both execution plans to find out the difference in them very easily.
        Specifying Execution plans to be included in the trace

        Execution plan in the profiler trace






Comments Comments (0) | Print Print | Email Send | 767 Views | 4% / 0% Popularity


Posted At : Jan 06, 2010 14:10 PM | Posted By : Ed Tabara
Related Categories: SQL

  1. Some advanced indexing techniques
    • Make computed columns and create index on these
      I am sure most of us have written application codes where you select a result set from the database and for each row do a calculation in the result set to produce the information to show in the output. Sometimes this processing can be done directly in the database. You can specify a database column as a Computed column by specifying a formula. While your TSQL includes the computed column in the select list, the SQL engine will apply the formula to derive the value for this column. So, when executing the query, you will get the result for the computed column.
      But sometimes this might be expensive if the table contains large number of rows. And this might get even worse if the computed column is specified in the WHERE clause in a SELECT statement because in this case the database engine has to calculate computed column's value for each row in the table. To improve performance, you need to create index on the computed columns as in this case SQL Server calculates the result in advance and builds an index over them. Additionally, when the corresponding column values are updated (That the computed column depends on), the index values on computed column are also updated. And so while executing the query, the database engine does not have to execute the computation formula for every row in the result set.

    • Create Indexed Views
      The Views are nothing but compiled SELECTs. Views does not remember the result they were built upon. They are just compiled queries. BUT! Why not to apply indexes on Views and so they to become Indexed Views? For an Indexed View, the database engine processes the SQL and stores the result in the data file just like a clustered table and SQL Server will automatically maintain the index when data in the table will change. As a result, when you will do a SELECT query on the Indexed View, the DB engine will select the values from and index which obviously performs very fast.
      But there are some restrictions:
      1. The View has to be created with SCHEMABINDING option. In this case, the DB engine will not allow you to change the underlying table schema.
      2. The View can't contain any nondeterministic function, DISTINCT clause and subquery.
      3. The underlying tables in the View must have clustered index (Primary keys)
      Note: when data in the base table changes, the DB engine has to update the index also. So, you should consider creating Indexed Views when the view has to process too many rows with aggregate function and when data and the base table do not change often.

    • Create indexes on User Defined Functions
      But this is not something really easy. You will have to create a computed column specifying an UDF as the formula and then you will have to create index on the computed column field. The steps are:
      1. Create the function (if it does not exists already) and make sure that this function is deterministic. Add SCHEMABINDING option in the function definition. Example:
        CREATE FUNCTION [dbo.ufnGetLineTotal]
        (
        -- Add the parameters for the function here
        @UnitPrice [money],
        @UnitPriceDiscount [money],
        @OrderQty [smallint]
        )
        RETURNS money
        WITH SCHEMABINDING
        AS

        BEGIN
        return (((@UnitPrice*((1.0)-@UnitPriceDiscount))*@OrderQty))
        END
      2. Add a computed column in your desired table and specify the function with parameters as the value of the computed column.
      3. Create an index on the computed column.

      Creating indexes on UDFs will give you a tremendous performance benefit when you include the UDF in a query, especially if you use UDFs in the join conditions between different tables/views.

    • Create indexes on XML columns
      XML columns are stored as BLOBs in SQL Server 2005 and later which can be queried using XQuery, but querying XML data types can be very time consuming without an index. The syntax to create Primary XML indexes is:
      CREATE PRIMARY XML INDEX index_name ON <object> ( xml_column )

      When it is created, SQL Server shreds the XML content and creates several rows of data that include information like element and attribute names, the path to the root, node types and values, and so on. So, creating the primary index enable the SQL server to support XQuery requests more easily.
      Primary indexes are good, but SQL Server still needs to scan through the shredded data to find the desired result. Here come to help Secondary XML indexes:
      CREATE XML INDEX
      index_name
      ON <object> ( xml_column )
      USING XML INDEX primary_xml_index_name
      FOR { VALUE | PATH | PROPERTY }

      As seen from the syntax, there are 3 types of them:
      1. Path Secondary XML indexes: useful when using the .exist() methods to determine whether a specific path exists.
      2. Value Secondary XML indexes: used when performing value-based queries where the full path is unknown or includes wildcards.
      3. Property Secondary XML indexes: used to retrieve property values when the path to the value is known.


  2. Apply de-normalizations, use history tables and pre-calculated columns
    • De-normalization
      As written in one of the previous posts, normalization is good, but sometimes having some level of de-normalization is good because the reporting and data analytical queries would run very fast. So, when we have a query with JOINs for example just to get the value of one field that is not in the base table and the query performs bad, we may want to to duplicate that field with it's values in the base table. Yes, we will add redundancy, but because we will not have to do the JOINs the resulting query most probably will run faster and we will get better performance. But remember that de-normalization is a trade-off between data redundancy and select operation's performance. So use this carefully and only after applying all other optimizations if still needed.

    • History tables
      If in your application you have some data retrieval operations, for example reporting, that periodically runs on a time period and if the process involves tables that are large in size having normalized structure, you can consider moving data periodically from your transactional normalized tables into a de-normalized heavily indexed single history table. Also you can create a scheduled operation in your database server that would populate this history table on a specified time on given periods. So, the periodic data retrieval operation than has to read data, will do it only from a single table that is heavily indexed and the operation would perform a lot faster.
      1. Creating the scheduled operation
        Here are the steps to create a scheduled operation in SQL Server that periodically populates a history table on a specified schedule.
        • Make sure that SQL Server Agent is running. To do this, launch the SQL Server Configuration Manager, click on the SQL Server 2005 Services and start the SQL Server Agent by right clicking on it.

        • Expand SQL Server Agent node in the object explorer and click on the Job node to create a new job. In the General tab, provide job name and descriptions.

        • On the Steps tab, click on the New button to create a new job step. Provide a name for the step and also provide TSQL, that would load the history table with the daily sales data, along with providing Type as Transact-SQL script(T-SQL). Press OK to save the Step.

        • Go to the Schedule tab and click on New button to specify a job schedule.

        • Click the OK button to save the schedule and also to apply the schedule on the specified job.
      2. Perform expensive calculations in advance in data INSERT/UPDATE, simplify SELECT query
        In most of the cases in your application you will see that data insert/update operations occur one by one, for each record. But, data retrieval/read operations involve multiple records at a time. So, if you have a slowly running read operation (select query) that has to do complex calculations to determine a resultant value for each row in the big result set, you can consider doing the following:
        • Create an additional column in a table that will contain the calculated value.
        • Create a trigger for Insert/Update events on this table and calculate the value there using the same calculation logic that was in the select query earlier. After calculation, update the newly added column value with the calculated value.
        • Replace the existing calculation logic from your select query with the newly created field.

        After doing this, the Insert/Update operation for each record in the table will be a bit slower because the trigger will now be executed to calculate a resultant value, but the data retrieval operation should run faster than previous. Because while the SELECT query executes, the database engine does not have to process the expensive calculation logic any more for each row.



Comments Comments (0) | Print Print | Email Send | 549 Views | 3% / 0% Popularity


Posted At : Jan 05, 2010 15:12 PM | Posted By : Ed Tabara
Related Categories: SQL

  1. Move your TSQL codes from application into the DB Server
    While i don't think ALL your sql codes should be moved to the DB Server (i think no reason simple scripts to be moved), i do believe that anything that may create performance issues or have to operate with more or less big amount of data should be on the DB Server. Different products are good for different things and as so should do the work they were created for. This is one of the reasons personaly i didn't get TOO excited about ColdFusion + ORM. At least not yet. Maybe this will change over time, but for right now i don't feel too comfortable about all this ORM thing. I believe if/when you will have to optimize the data access performance you wil find yourself in the need to use more Stored Procedures/Views/Functions/Triggers.
    While moving your TSQL codes to the DB Server most probably will not result in an immediate performance boost as in the case of indexes discussed in the previous post, it still will have a lot of benefits in long-run plus it will enable you to perform other optimization steps and apply other techniques easily to further optimize your data access routines. I don't say that you can't optimize while using ORM, i am sure you can, but i don't think anyone can not agree that while having your data access routines using TSQL objects in your database, you have the maximum opportunity to optimize the database.
    Again - each product (ColdFusion, MS SQL, etc) should do the work he is best match for.


  2. Identify inefficient TSQLs, re-factor and apply best practices
    It doesn't matter how good indexes you created, if you have poorly written data retrival/access logic, sooner or latter you will get slow performance. Often we don't see the issues right away but only when there is a critical mass in records number, database calls, etc. Here come to help TSQL Best Practices. Many of us know them (or most of them), but the key is to really USE them and not just know. So here are some of them.
    • Don't use SELECT * in SQL Query
      Unnecessary columns not only adds expense to the data retrieval time, but also the Database engine cannot utilize Covered Indexes and so query performs slowly.
    • Avoid unnecessary columns in SELECT list and unnecessary tables in JOIN conditions
      Selecting unnecessary columns in select query adds overhead to the actual query. Including unnecessary tables in the join conditions forces the database engine to retrieve and fetch unnecessary data and so increase the query execution time.
    • Do not use the COUNT() aggregate in a subquery to do an existence check
      When you use COUNT(), SQL Server does not know that you are doing an existence check. It counts all matching values. When you use EXISTS, SQL Server knows you are doing an existence check. When it finds the first matching value, it returns TRUE and stops looking. The same applies to using COUNT() instead of IN or ANY.
      So, instead of doing
      SELECT yourColumnList
      FROM yourTable
      WHERE 0 < (
               SELECT Count( * ) FROM yourTable2 WHERE ..
            )
      ,
      do
      SELECT yourColumnList
      FROM yourTable
      WHERE EXISTS (
                  SELECT *
                  FROM yourTable2
                  WHERE ...
               )
    • Do your best to avoid joining between two types of columns
    • Avoid deadlocks
      Always access tables in the same order in all your Stored Procedures and Triggers. Keep your transactions as short as possible. During a transaction access only the data you really need. The less data you touch, the better.
    • Avoid using Cursors and UDFs
    • If you don't need always a 100% perfect result, try not to use COUNT( * ) to obtain the record count in the table
      Instead of doing
      SELECT COUNT( * )
      FROM dbo.orders

      you may do
      SELECT rows
         FROM sysindexes
         WHERE id = OBJECT_ID('dbo.Orders') AND indid < 2
    • Use Table variable and Temp table as far as possible
      Use Table Variable or Temp Table for intermediate storage of records in a procedure. There are pros and cons between Table Variable and Temp Table, but in general, if the record set is small you should go for Table Variable.
    • Use Full Text Search for searching textual data instead of LIKE search
      For more info on Full Text Search see this.
    • Where possible, use UNION to implement OR operation
      If distinguished result is not required, use UNION ALL instead of UNION as it is faster.
    • Use Stored Procedures
      this way you get: Code reusability, Access Control, Execution plan reusability, Prevent SQL Injection, more Readability and Manageability.
    • Use Fully Qualified Name for objects (i.e. SchemaName.ObjectName)
      When the execution plan is prepared by the query engine, in the binding process, Query engine has to resolve the Object existence. If you specify the fully qualified name the object resolution become easy for the engine.
    • EXISTS vs IN
      IN Operator can easily be replaced by EXISTS which is more optimized for correlated queries. But you may find IN better for small tables.
    • When possible avoid Function in select statement
      because any functions like CONVERT(), CAST, ISNULL in query may ignore the indexes available on the table.
    • Try to avoid Negative operator
      Avoid using < >, NOT IN, NOT EXISTS kind of operator because it causes a table scan. Query engine has to ensure there is no data till the last row is read.
    • Implement a lazy loading strategy for the large objects
      Store the Large Object columns (like VARCHAR(MAX), Image Text etc) in a different table other than the main one and put a reference to the large object in the main table. Retrieve all the main table data in the query and if large object is required to load, retrieve the large object data from the large object table only when this is required.
    • Do not call functions repeatedly within your stored procedures, triggers, functions and batches
      Instead store the result in a variable for later use.
    • Some good practices in Stored Procedure
      Do NOT use SP_XXX as a naming convention as it will cause additional searches and I/O (because the system Stored Procedure names start the same way). Also this increases the possibility of conflicting with an existing system stored procedure.
      Use Set Nocount On to eliminate Extra network trip. Database Engine, return the number of rows effected by the statements to the client which is unnecessary and you can avoid that using this statement.
      Use WITH RECOMPILE clause in the EXECUTE statement (first time) when index structure changes (So that, the compiled version of the SP can take advantage of the newly created indexes).
      Use default parameter values for easy testing.
    • Some good practices in Triggers
      Avoid the use of triggers as firing and executing them is an expensive process.
      Don't use triggers that can be implemented using constraints.
      Don't use same trigger for different triggering events (Insert, Update, Delete)
      Don't use Transactional codes inside a trigger. The trigger always runs within the transactional scope of the code that fired the trigger.
    • Some good practices in Views
      Use views for re-using complex TSQL blocks and to enable it for Indexed views
      Do not use views that retrieve data from a single table only as it will be an unnecessary overhead.
    • Some good practices in Transactions
      Prior to SQL Server 2005, after BEGIN TRANSACTION and each subsequent modification statement, the value of @@ERROR had to be checked. If its value was non-zero then the last statement caused an error and if any error occurred, the transaction had to be rolled back and an error had to be raised. In SQL Server 2005 and onwards the Try..Catch.. block can be used to handle transactions in the TSQLs. So, try to used Try..Catch based transactional codes.
      Try to avoid nested transaction. Use @@TRANCOUNT variable to determine whether Transaction needs to be started
      Start transaction as late as possible and commit/rollback the transaction as fast as possible to reduce the time period of resource locking.
    • Don't go crazy with Normalization
      Sometimes de-normalization can give you better performance at the cost of Data redundancy, so don't blindly Normalize everything.
    • Avoid wildcard characters at the beginning of a word while searching using the LIKE keyword
      As it will result in an index scan, which defeats the purpose of an index.
    • Wherever possible, use Derived Tables Derived Tables perform much better.
      For example:
      SELECT MIN(Salary) 
      FROM Employees
      WHERE EmpID IN
                  (
                     SELECT TOP 2 EmpID
                     FROM Employees
                     ORDER BY Salary Desc
                  )

      can be written as:
      SELECT MIN(Salary) 
      FROM
         (
            SELECT TOP 2 Salary
            FROM Employees
            ORDER BY Salary DESC
         ) AS A

      which will give you same result but will perform twice faster.

    This are only some of the Best Practices, but using them you will get better results.

Till soon.

Comments Comments (0) | Print Print | Email Send | 470 Views | 2% / 0% Popularity


Posted At : Jan 04, 2010 17:24 PM | Posted By : Ed Tabara
Related Categories: SQL

Having a site, a dynamic one, is cool. But it's not a secret that the Database may be the bottleneck and create you lot of issues as your traffic, DB records number and DB calls grows. So i am going to have a set of posts on how to optimize your SQL Server so that everyone to be happy.
  1. Create proper indexes
    Make sure that every table in your database has a primary key.
    Create non-clustered indexes on columns which are:
    • Frequently used in the search criteria
    • Used to join other tables
    • Used as foreign key fields
    • Of having high selectivity (Column which returns a low percentage (0-5%) of rows from a total number of rows on a particular value)
    • Used in the ORDER BY clause
    • Of type XML (Primary and secondary indexes need to be created)
    To create an index use the following sample:
    CREATE INDEX IX_ProductVendor_VendorID ON Purchasing.ProductVendor (VendorID)

    You can use also the SQL Server Management Studio to create indexes on the desired table, BUT if your table already have some high number of rows it may create performance issues while the manual creation will work like a charm.


  2. Create appropriate covering indexes
    You create Covered indexes in table columns to specify what are the additional column values the index page should store along with the clustered index key values (primary keys). Example:
    CREATE INDEX IX_ProductVendor_VendorID ON Purchasing.ProductVendor (VendorID) INCLUDE(ProductName, ProductPrice)

    Note: Covered index should be created including a few columns that are frequently used in the select queries. But be aware that including too many columns in the covered indexes would not give you too much benefit. Rather, doing this would require too much memory to store all the covered index column values resulting in over consumption of memory and slow performance. Can surely confirm this as was in such situations.


  3. Defragment indexes if fragmentation occurs
    Index fragmentation is a situation where index pages split due to heavy insert/update/delete operations on the tables in the database. If indexes have high fragmentation, scanning and seeking the indexes takes much time or the indexes are not used at all while executing queries and so data retrieval operations perform slow.
    There are 2 types of fragmentation:
    • Internal Fragmentation - occurs due to the data deletion/update operation in the index pages which ends up in creation of lots of empty rows in the pages. Also results in increase of index/data pages that increase query execution time.
    • External Fragmentation - occurs due to the data insert/update operation in the index/data pages which ends up in page splitting and allocation of new index/data pages that are not contiguous in the file system. That reduces performance in determining query result where ranges are specified in the WHERE clauses. Also, the database server cannot take advantage of the read-ahead operations as, the next related data pages are not guaranteed to be contiguous, rather, these next pages could be anywhere in the data file.

    To defragment indexes you can Reorganize or Rebuild them. You should Reorganize indexes when the External Fragmentation value for the corresponding index is in between 10-15 and Internal Fragmentation value is in between 60-75. Otherwise, you should rebuild indexes. But note that while rebuilding indexes for a particular table, the entire table will be locked. But in SQL Server 2005, there is a solution where you can use the ONLINE option as ON while rebuilding indexes for a table. This will rebuild the indexes for the table along with making the table available for transactions.
    And remember that if you work on a transactional database, you should not create more than 5 indexes on the tables on an average. On the other hand, if you work on a Data warehouse application, you should be able to create up to 10 indexes on the tables on an average.

    Below see a list of links to my previous posts where you can find different sql scripts that will help identify potentially useful indexes, indexes not used for some time, etc. Also you may use cfSQLMaster to get all that SQL scripts and many other as a ColdFusion Component.

See you soon with more recommendations.

Comments Comments (0) | Print Print | Email Send | 525 Views | 3% / 0% Popularity


Posted At : Dec 15, 2009 12:36 PM | Posted By : Ed Tabara
Related Categories: SQL

This is a script by Thao Truong and posting it here for me easier to find it later.
--Description: synchronize all stored procedures between two servers
--      @ACTION = 0: Synch procedures exist in server1 but not in server2
--      @ACTION = 1: Synch all procedures from server1 to server2
--WARNING: use this stored procedure VERY CAREFULLY!
CREATE PROCEDURE [dbo].[spSynchAllProcedures]
   @ACTION TINYINT = 0,
   @SourceServer VARCHAR(50) = NULL,
   @SourceDatabase VARCHAR(50) = NULL,
   @TargetServer VARCHAR(50) = NULL,
   @TargetDatabase VARCHAR(50) = NULL
AS
BEGIN
   SET NOCOUNT ON;
   DECLARE @STRSQL NVARCHAR(MAX);
   DECLARE @Params NVARCHAR(MAX);
   DECLARE @ProcedureName VARCHAR(50);
   
   --Insert to temp table since some stored procedures contains over 4000 characters
   CREATE TABLE #tblTmp(item VARCHAR(50));
   IF @ACTION = 0
      SET @STRSQL = N'INSERT INTO #tblTmp
               SELECT [name]
               FROM ' + @SourceServer + '.' + @SourceDatabase + '.dbo.sysobjects
               WHERE xtype = 'P'
               AND [name] NOT IN (SELECT [name]
                           FROM ' + @TargetServer + '.' + @TargetDatabase + '.dbo.sysobjects
                           WHERE xtype = 'P')';
   ELSE IF @ACTION = 1
      SET @STRSQL = N'INSERT INTO #tblTmp
               SELECT [name]
               FROM ' + @SourceServer + '.' + @SourceDatabase + '.dbo.sysobjects
               WHERE xtype = 'P'';

   EXECUTE(@STRSQL);
   
   DECLARE Cur CURSOR FOR
      SELECT item FROM #tblTmp
   OPEN Cur
   FETCH FROM Cur
   INTO @ProcedureName
   WHILE @@FETCH_STATUS = 0
   BEGIN
      --PRINT @ProcedureName;      
      EXEC dbo.spSynchStoredProcedure @ProcedureName, @SourceServer, @SourceDatabase, @TargetServer, @TargetDatabase;

      FETCH NEXT FROM Cur
      INTO @ProcedureName
   END
   CLOSE Cur
   DEALLOCATE Cur

   DROP TABLE #tblTmp;
END

This script synchronize all stored procedures between two servers by using spSynchStoredProcedure.

Comments Comments (0) | Print Print | Email Send | 421 Views | 2% / 0% Popularity