Friday, July 25, 2008

Data Compression in SQL Server 2008

When we talk about compression feature in SQL Server 2008, it address two areas. Data Compression and Backup Compression. I have already discussed Backup compression here. This article is specifically discussing Data compression feature in SQL Server 2008. The idea behind Data compression is, if we can reduce the data size then the resource usage like memory (data buffer) , Storage space and Network traffic can be reduced therefore you will have better performance and manageability. But before going for Data Compression you should do proper analysis, whether this feature is appropriate for your environment or not. If the Database Server is having CPU resource crunch you must not opt Data compression because, Data Compression is more CPU oriented operation. When you have a system where there is no problem of CPU but it is more IO based, then you may go for data compression which will give you better performance.

SQL Server 2008 supports ROW and PAGE compression for both tables and indexes. Data compression can be configured for the following database objects:
• A whole table that is stored as a heap.
• A whole table that is stored as a clustered index.
• A whole non-clustered index.
• A whole indexed view.
• For partitioned tables and indexes, the compression option can be configured for each partition, and the various partitions of an object do not have to have the same compression setting.

As already mentioned there are two levels of compression, PAGE and ROW. Before going for compression of the data you must do proper estimation and planning. Use system stored procedure sp_estimate_data_compression_savings to check whether the data compression can be benefited for a particular table.

What is the difference between PAGE and ROW level compression?
(a) PAGE Level Compression• Compressing the leaf level of tables and indexes with page compression consists of three operations in the following order:
• Row compression
• Prefix compression
• Dictionary compression

(b) ROW Level Compression
• It reduces the metadata overhead that is associated with the record. This
metadata is information about columns, their lengths and offsets. In some
cases, the metadata overhead might be larger than the old storage format.
• It uses variable-length storage format for numeric types (for example
integer, decimal, and float) and the types that are based on numeric (for
example datetime and money).
• It stores fixed character strings by using variable-length format by not
storing the blank characters.

Note : You must read more about ROW and PAGE Compression in Books Online. There are few interesting points like when the PAGE compression is actually take place kind. PAGE compression is a Superset of ROW compression and system goes for page compression only when the page is filled. Till then it is a ROW level compression. Once the page is filled, The whole page is reviewed; each column is evaluated for prefix compression, and then all columns are evaluated for dictionary compression. If page compression has created enough room on the page for an additional row, the row is added, and the data is both row- and page-compressed.

How do we do data compression estimation on a table?
(a) Create a sample table Table
Use AdventureWorks
Go
Create table TestCompression (col int,comments char(8000))
Go
Insert into TestCompression select Message_id,text from sys.messages

Note : I have choosen CHAR Datatype intentionally so that I can demonstrate the estimation properly.
(b) Run the sp_estimate_data_compression_savings system stored procedure against this table
-------------------------------------------------------------------------------------
Row Compression Estimation Report

EXEC sp_estimate_data_compression_savings 'dbo', 'TestCompression', NULL, NULL, 'ROW' ;
Go

object_name : TestCompression
schema_name :DBO
index_id :0
partition_number : 1
size_with_current_compression_setting(KB) : 704888
size_with_requested_compression_setting(KB) : 9952

sample_size_with_current_compression_setting(KB) : 40784
sample_size_with_requested_compression_setting(KB) : 576
-------------------------------------------------------------------------------------
Page Compression Estimation Report

EXEC sp_estimate_data_compression_savings 'dbo', 'TestCompression', NULL, NULL, 'Page' ;
Go
object_name : TestCompression
schema_name :DBO
index_id :0
partition_number : 1
size_with_current_compression_setting(KB) : 704888
size_with_requested_compression_setting(KB) : 9134

sample_size_with_current_compression_setting(KB) : 39664
sample_size_with_requested_compression_setting(KB) : 560

-------------------------------------------------------------------------------------
How to read this result ?
I am not going to explain, how to read the result of this stored procedure in detail, since it is clearly mentioned in Books Online. In the result the size_with_current_compression_setting(KB) : 704888 and
size_with_requested_compression_setting(KB) : 9134
will give you fair amount of idea how compression is going to benefits. See the difference in space usage shown in before compression and after the compression 704888: 9134. Without any doubt the compression is going to save lot of space in this table.

Why Page Compression showing more compression of data than ROW Compression?
If you see the result of PAGE and ROW level compression, you can see that page level compression save more space. Why so? Because Page compression do ROW compression as well as Dictionary and Prefix Compression. So PAGE compression is a superset of ROW Compression. So the point here is, if we want highest level of compression, go for page at the cost of more CPU cycle.

Note : One thing I have noticed in the system stored procedure is, for every run the result is different. There is few KBs size different in each run. I have reported this in Microsoft Connect.

Estimation Report in Query Analyser


Now we know the compression feature will be useful for this table. How do we enable compression for this table? Here we go…








There are few options available to enable Data Compress on a table.
(a) From Management Studio.
(b) While creating in the CREATE Table Statement

CREATE TABLE [dbo].[TestCompression](
[col] [int] NULL,
[comments] [char](8000) NULL
) ON [PRIMARY]
with( DATA_COMPRESSION = PAGE )

(c) In ALTER Table Statement

ALTER TABLE [TestCompression]
REBUILD WITH (DATA_COMPRESSION = PAGE);

Enabling Data Compression from Management Studio
Screen 1


Screen 2


Screen 3


Screen 4



Screen 5


Screen 6
















We have successfully enabled data compression on this table. How do we confirm this. You can query sys.partitions. Here we go….
Select OBJECT_NAME(object_id),data_compression_desc,* From sys.partitions where data_compression<>0.



Screen 7






Now we have enabled compression and let us check what happens if we run the same system stored procedure to do the estimation. See the result. No scope for improvement further because it is already compressed















Few more informations from Books online.
• The compression setting of a table is not automatically applied to its nonclustered indexes.
• Each index must be set individually.
• Tables and indexes can be compressed when they are created by using the CREATE TABLE and CREATE INDEX statements. To change the compression state of a table, index, or partition, use the ALTER TABLE or ALTER INDEX statements.
• Compression is not available for system tables.
• Apply compression only after exploring options like Data defragmentation etc.

Summary
Having explained the whole topic I kept one very important point for the summary section. That is, this feature will be only available in Enterprise Edition, Developer Edition and Evaluation Edition. So the system stored procedures and options in GUI will only be available in the above mentioned editions. In IO bound application certainly this feature will be useful. I keep my finger crossed for the feedback from the community who have successfully implemented and experience this feature.

Thursday, July 10, 2008

SQL Server 2008 – Table Locking enhancement

In SQL Server 2008 ALTER TABLE statement, you can find a new option called SET LOCK_ESCALATION. This is one of the enhancement in Locking in SQL Server 2008. This option can have three value, Auto,Table, Disable

FROM BOL
SET ( LOCK_ESCALATION = { AUTO | TABLE | DISABLE } )
Specifies the allowed methods of lock escalation for a table.

AUTO
This option allows SQL Server Database Engine to select the lock escalation granularity that is appropriate for the table schema.
• If the table is partitioned, lock escalation will be allowed to the heap or B-tree (HoBT) granularity. After the lock is escalated to the HoBT level, the lock will not be escalated later to TABLE granularity.
• If the table is not partitioned, the lock escalation will be done to the TABLE granularity.

TABLE
Lock escalation will be done at table-level granularity regardless whether the table is partitioned or not partitioned. This behavior is the same as in SQL Server 2005. TABLE is the default value.

DISABLE
Prevents lock escalation in most cases. Table-level locks are not completely disallowed. For example, when you are scanning a table that has no clustered index under the serializable isolation level, Database Engine must take a table lock to protect data integrity.

Note : If you use partitions then After you create a partitioned table, consider setting the LOCK_ESCALATION option for the table to AUTO. This can improve concurrency by enabling locks to escalate to partition (HoBT) level instead of the table. For more information

SQL Server 2008 – PowerShell in SQL Server

SQL Server 2008 introduces support for Microsoft PowerShell. PowerShell is a new command-line shell and scripting language which offers more functionality than Windows command prompt utilities.


Read more about powershell here
http://www.microsoft.com/technet/scriptcenter/hubs/msh.mspx

I was planning to blog about this for long time but the problem is, PS is an ocean in itself. For sure, very powerful and very useful in day to day DBA activities. When I started exploring this utility I could figure out many scenarios where SQLPS can really help. Like , backup file management(or for that matter any file management) like moving/deleting old backup files. With my little knowledge and experience in SQL Server I am sure we can write many utility tools using SQLPS.

How to go to SQL Powershell (PS) prompt

(a) Start - - ->> Run -- ->> Cmd -- ->> sqlps
(b) I got a prompt like “PS C:\Documents and Settings\Madhu>
(c) Now I want to change the location to my SQL Server instance and databases

PS C:\Documents and Settings\Madhu>Set-Location SQL:\LHIL075\SQL2008FEB\Databases

OR

PS C:\Documents and Settings\Madhu>CD SQL:\LHIL075\SQL2008FEB\Databases

Note : CD is an aliases for Set-Location cmdlet.

(d) I get the following prompt
PS SQL:\LHIL075\SQL2008FEB\Databases>
(e) Now you can do Get-ChildItem or DIR to get the object in this node


There are many useful cmdlets available. In the coming months we should find many useful utilities using PS.

Using Invoke-Sqlcmd (one of the cmdlet available)
The Invoke-Sqlcmd cmdlet lets you run your sqlcmd script files in a PowerShell environment. Much of what you can do with sqlcmd can also be done using Invoke-Sqlcmd.
This is an example of calling Invoke-Sqlcmd to execute a simple query, similar to specifying sqlcmd with the -Q and -S options:

Summary
Simple... need to explore more. I will be updating with few sample scripts soon