Wednesday, February 13, 2008

SQL Server 2008 - Change Data Capture

What is Change Data Capture?

This new feature of SQL Server 2008 captures all data modifications (i.e. DML: insert/update/delete) on a table once you configure that table for Change Data Capture. The SQL Server community had been waiting for this feature for a long time. Although earlier versions provided a few functions and DBCC commands to read the transaction log, these were never a complete solution. In a couple of projects we had this kind of requirement and had to track every data change for audit purposes. With a lot of effort we wrote our own application using triggers and history table(s).

Change Data Capture is a mechanism to track and capture DML (insert/update/delete) on a table and store the captured data in relational format. A whole table, or only selected columns of it, can be configured for capture. Changes are read from the transaction log and written to the associated change table, so the capture process can be thought of as a customized replication log reader. Because the mechanism is asynchronous, the DML statement that changes the data does not wait for the capture and should see no direct slowdown. Of course, logging and tracking data costs extra I/O that you cannot avoid. SQL Server 2008 provides a set of table-valued functions to query the captured data, for example to build reports or to synchronize a standby database.

How does it work?

Each insert or delete operation that is applied to a source table appears as a single row within the change table. The data columns of the row resulting from an insert operation contain the column values after the insert; the data columns of the row resulting from a delete operation contain the column values prior to the delete. An update operation, however, requires two row entries: one to identify the column values before the update and a second to identify the column values after the update.
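To see these rows, one option is to read the change table directly. The query below is a minimal sketch: it assumes a source table dbo.CDCTest enabled with the default capture instance name, so that its change table is cdc.dbo_CDCTest_CT. The __$operation column records the kind of change: 1 = delete, 2 = insert, 3 = update (values before), 4 = update (values after).

```sql
-- Assumption: dbo.CDCTest is enabled for CDC with the default capture instance
-- name (dbo_CDCTest), so its change table is cdc.dbo_CDCTest_CT.
SELECT
      __$start_lsn                      -- commit LSN of the transaction
    , __$seqval                         -- order of the change inside the transaction
    , CASE __$operation
          WHEN 1 THEN 'delete'
          WHEN 2 THEN 'insert'
          WHEN 3 THEN 'update (before)'
          WHEN 4 THEN 'update (after)'
      END AS operation_name
    , __$update_mask                    -- bit mask of the columns that changed
FROM cdc.dbo_CDCTest_CT
ORDER BY __$start_lsn, __$seqval, __$operation;
```

An update therefore shows up as a pair of rows with the same __$start_lsn and __$seqval, one with operation 3 and one with operation 4.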


Prerequisites

(a) SQL Server 2008 Enterprise, Developer, or Evaluation edition. Many of us would not like to hear this, but it's a fact.
(b) SQL Server Agent must be up and running.

Pre-Conditions

(a) The database must not already contain a schema or a database user named cdc. Change data capture requires exclusive use of the cdc schema and the cdc user. If either a schema or a database user named cdc currently exists in a database, the database cannot be enabled for change data capture until the schema and/or user are dropped or renamed.

(b) The table columns which you are going to capture should not contain the following reserved column names: __$start_lsn, __$end_lsn, __$seqval, __$operation, and __$update_mask.

(c) captured_column_list cannot contain columns defined with a data type introduced in SQL Server 2008.
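Conditions (a) and (b) can be checked with catalog queries before enabling anything. This is a sketch, not an official pre-flight script; the table name dbo.CDCTest is only an example.

```sql
-- Run in the database you plan to enable, BEFORE enabling CDC.
-- Any row returned here means the name cdc is already taken.
SELECT 'schema' AS object_type, name FROM sys.schemas WHERE name = N'cdc'
UNION ALL
SELECT 'user' AS object_type, name FROM sys.database_principals WHERE name = N'cdc';

-- Any row returned here is a source column that collides with a reserved name.
-- dbo.CDCTest is an example table name.
SELECT c.name AS conflicting_column
FROM sys.columns AS c
WHERE c.object_id = OBJECT_ID(N'dbo.CDCTest')
  AND c.name IN (N'__$start_lsn', N'__$end_lsn', N'__$seqval',
                 N'__$operation', N'__$update_mask');
```

Both queries should return no rows on a database and table that are ready to be enabled.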

Best Practices

(a) Create a separate filegroup for change data capture change tables
(b) Configure only the required table and required columns for CDC
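For practice (a), sys.sp_cdc_enable_table accepts a @filegroup_name parameter that places the change table on a specific filegroup. The sketch below assumes a filegroup named CDC_FG has already been added to the database; that name, the table name, and the role name are illustrative.

```sql
-- Assumption: a filegroup named CDC_FG already exists in this database
-- (created with ALTER DATABASE ... ADD FILEGROUP and at least one file).
EXECUTE sys.sp_cdc_enable_table
      @source_schema  = N'dbo'
    , @source_name    = N'CDCTest'      -- example source table
    , @role_name      = N'cdc_Admin'    -- gating role for access to the change data
    , @filegroup_name = N'CDC_FG';      -- change table is created on this filegroup
GO
```

Keeping change tables off the PRIMARY filegroup separates their I/O and growth from the source data.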

Permission

A member of the sysadmin fixed server role can enable CDC on the database.
A member of the db_owner fixed database role can create a capture instance for individual source tables.
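A quick way to confirm that the current login has these memberships is to use the built-in IS_SRVROLEMEMBER and IS_MEMBER functions. This is just a convenience check and is not part of the CDC setup itself.

```sql
-- 1 = member, 0 = not a member, NULL = role name not valid.
SELECT
      IS_SRVROLEMEMBER('sysadmin') AS is_sysadmin   -- needed to enable CDC on the database
    , IS_MEMBER('db_owner')        AS is_db_owner;  -- needed to enable CDC on a table
```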


My questions before starting to test Change Data Capture

(a) Which editions support this feature?
(b) Do we need a primary key on the table we want to track?
(c) What happens when the schema of a CDC-enabled table changes?
(d) Do we need to take any extra care when making schema changes to CDC-enabled tables?
(e) If you disable CDC on a table, what happens to the previously captured data?


1. Enable database for CDC

First check whether the database is already enabled for CDC by running this query:
SELECT is_cdc_enabled FROM sys.databases WHERE name = 'GPx';

If this query returns 0, the database is not yet enabled. Enable it with:

USE GPx;
GO
EXECUTE sys.sp_cdc_enable_db;

When a database is enabled for change data capture, the cdc schema, cdc user, metadata tables, and other system objects are created for the database. Verify that these objects were created.
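One way to check for these objects is to query the catalog views. The following is a sketch to run in the enabled database; it only reads metadata.

```sql
-- The flag should now be 1 for the current database.
SELECT name, is_cdc_enabled
FROM sys.databases
WHERE name = DB_NAME();

-- The cdc user should exist.
SELECT name, type_desc
FROM sys.database_principals
WHERE name = N'cdc';

-- Objects created in the cdc schema (metadata tables such as cdc.change_tables,
-- cdc.captured_columns, cdc.ddl_history and cdc.lsn_time_mapping, among others).
SELECT o.name, o.type_desc
FROM sys.objects AS o
WHERE o.schema_id = SCHEMA_ID(N'cdc')
ORDER BY o.type_desc, o.name;
```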

2. Enable a source table for Change Data Capture

By default, all of the columns in the source table are identified as captured columns. If only a subset of columns needs to be tracked, use the @captured_column_list parameter to specify that subset.
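As an illustration, the call below captures only two columns. It is a sketch under assumptions: the table dbo.CDCTest and the columns PLIContainerID and TotalVisits are taken from the test case in this post, while the capture instance name dbo_CDCTest_subset is made up for the example. @supports_net_changes is set to 0 so that no primary key or unique index columns are required in the column list.

```sql
-- Capture only a subset of columns of dbo.CDCTest.
EXECUTE sys.sp_cdc_enable_table
      @source_schema        = N'dbo'
    , @source_name          = N'CDCTest'
    , @role_name            = N'cdc_Admin'
    , @capture_instance     = N'dbo_CDCTest_subset'   -- hypothetical instance name
    , @supports_net_changes = 0
    , @captured_column_list = N'PLIContainerID, TotalVisits';
GO

-- List the columns that this capture instance actually tracks.
EXECUTE sys.sp_cdc_get_captured_columns
      @capture_instance = N'dbo_CDCTest_subset';
GO
```

A source table can have at most two capture instances, so an explicit @capture_instance name is needed when the default one (dbo_CDCTest) already exists.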


Dynamic management views for CDC

The following DMVs can be used to display information about change data capture and log scan sessions:

(a) sys.dm_cdc_log_scan_sessions
(b) sys.dm_cdc_errors
(c) sys.dm_repl_traninfo
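A minimal sketch of how the first two views might be queried when checking on the capture job; the column selection is just one reasonable choice.

```sql
-- Recent log scan sessions: how long each scan took and how much it processed.
SELECT session_id, start_time, end_time, duration,
       tran_count, command_count, latency, error_count
FROM sys.dm_cdc_log_scan_sessions
ORDER BY session_id DESC;

-- Errors raised by the capture process, newest first.
SELECT session_id, entry_time, error_number, error_message
FROM sys.dm_cdc_errors
ORDER BY entry_time DESC;
```

A steadily growing latency value suggests that the capture job is falling behind the transaction log.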

Test Case

I have a table called CDCTest to enable for CDC, and I plan to capture all of its columns.

Enable table CDCTest for Change Data Capture


EXECUTE sys.sp_cdc_enable_table
@source_schema = N'dbo'
, @source_name = N'CDCTest'
, @role_name = N'cdc_Admin';
GO

Messages
LHI-115\SQL2008(LHI-115\Madhusudanan):
Job 'cdc.GPx_capture' started successfully.
LHI-115\SQL2008(LHI-115\Madhusudanan):
Job 'cdc.GPx_cleanup' started successfully.
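After the jobs have started, the configuration can be confirmed with the catalog view and the CDC help procedures. This is a sketch using the table from the test case.

```sql
-- 1 means the table is now tracked by CDC.
SELECT name, is_tracked_by_cdc
FROM sys.tables
WHERE name = N'CDCTest';

-- Capture instance details: change table, captured columns, gating role, and so on.
EXECUTE sys.sp_cdc_help_change_data_capture
      @source_schema = N'dbo'
    , @source_name   = N'CDCTest';

-- Settings of the capture and cleanup jobs for this database.
EXECUTE sys.sp_cdc_help_jobs;
```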

Perform some DML operations on this table

delete top (3) from cdctest
Insert CDCTest (PLIContainerID,CenterTypeID,TotalVisits) select 1,222,333

Update CDCTest set Code='xxxxx' where plicontainerid between 1 and 20

Query Functions to get the trapped data.

DECLARE @begin_time datetime, @end_time datetime, @from_lsn binary(10), @to_lsn binary(10);

SET @begin_time = GETDATE()-1;
SET @end_time = GETDATE();
-- Map the time interval to a change data capture query range.
SELECT @from_lsn = sys.fn_cdc_map_time_to_lsn('smallest greater than or equal', @begin_time);
SELECT @to_lsn = sys.fn_cdc_map_time_to_lsn('largest less than or equal', @end_time);

-- Return the changes occurring within the query window.
SELECT * FROM cdc.fn_cdc_get_all_changes_dbo_CDCTest(@from_lsn, @to_lsn, 'all');
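Two details are worth knowing here. With the 'all' row filter option the function returns only the after-update row for each update; 'all update old' returns both the before and the after row. Also, the function raises an error when the LSN range falls outside the valid range of the capture instance, which can happen when a time window starts before the table was enabled. The variant below is a sketch that takes the range from sys.fn_cdc_get_min_lsn and sys.fn_cdc_get_max_lsn instead, assuming the default capture instance name dbo_CDCTest.

```sql
DECLARE @from_lsn binary(10), @to_lsn binary(10);

-- Lowest valid LSN for this capture instance and highest LSN captured so far.
SET @from_lsn = sys.fn_cdc_get_min_lsn(N'dbo_CDCTest');
SET @to_lsn   = sys.fn_cdc_get_max_lsn();

-- 'all update old' returns two rows per update:
-- __$operation = 3 (values before) and __$operation = 4 (values after).
SELECT *
FROM cdc.fn_cdc_get_all_changes_dbo_CDCTest(@from_lsn, @to_lsn, N'all update old');
```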


Wrapper Stored Procedure to get CDC captured information for a given table

IF OBJECT_ID(N'dbo.spGetCDCInformationForATable') IS NOT NULL
BEGIN
PRINT 'Dropping Stored Procedure: [dbo].spGetCDCInformationForATable'
DROP PROCEDURE dbo.spGetCDCInformationForATable
END

PRINT 'Creating Stored Procedure: [dbo].spGetCDCInformationForATable'

GO
/******************************************************************************

Copyright 2008

Procedure : [spGetCDCInformationForATable]
Author :
on : <08-Feb-2007>
Description : Get Data captured through Change Data Capture for a given Table

Pending Issue : Nil

Returns : @Err

Date Modified By Description of Change

------- ---------------- --------------------------------
****************************************************************************************/

CREATE PROCEDURE dbo.spGetCDCInformationForATable
    @TableName sysname,
    @SourceTableSchema sysname,
    @begin_time datetime,
    @end_time datetime
AS
BEGIN
    SET NOCOUNT ON;

    DECLARE @from_lsn binary(10), @to_lsn binary(10);

    -- Map the time interval to a change data capture query range.
    SELECT @from_lsn = sys.fn_cdc_map_time_to_lsn('smallest greater than or equal', @begin_time);
    SELECT @to_lsn = sys.fn_cdc_map_time_to_lsn('largest less than or equal', @end_time);

    -- Build the query function name for the default capture instance (<schema>_<table>).
    -- QUOTENAME guards the dynamic SQL against injection through the name parameters.
    DECLARE @SqlStmt nvarchar(1000);
    SET @SqlStmt = N'SELECT * FROM cdc.'
        + QUOTENAME(N'fn_cdc_get_all_changes_' + @SourceTableSchema + N'_' + @TableName)
        + N'(@from_lsn, @to_lsn, N''all'')';

    -- Return the changes occurring within the query window.
    EXEC sys.sp_executesql @SqlStmt,
        @params = N'@from_lsn binary(10), @to_lsn binary(10)',
        @from_lsn = @from_lsn,
        @to_lsn = @to_lsn;
END
GO
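A possible call to the wrapper, shown as a sketch. The arguments are passed by position (table name, schema name, start time, end time), and the one-hour window is only an example.

```sql
DECLARE @begin_time datetime, @end_time datetime;

-- Example window: the last hour.
SET @end_time   = GETDATE();
SET @begin_time = DATEADD(hour, -1, @end_time);

-- Positional arguments: table name, source schema, window start, window end.
EXECUTE dbo.spGetCDCInformationForATable N'CDCTest', N'dbo', @begin_time, @end_time;
```

The window should start after the table was enabled for CDC. Otherwise the mapped LSN can fall below the valid range of the capture instance and the underlying query function raises an error.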

What happens when the schema of a CDC-enabled table changes?

You can add a new column, but data in that column will not be tracked by the existing capture instance. And what happens if a column is dropped? CDC does not propagate schema changes to the change table: whether you add or drop a column, the CDC query functions still return the same set of columns as when CDC was enabled. For changes made after a tracked column is dropped, that column is returned as NULL.

alter table cdctest add ColAddedAfterCDCEnabled int

insert into cdctest(PLIContainerID,ColAddedAfterCDCEnabled) select 111111111 ,222

alter table cdctest drop column completedon

DDL modifications that change the column structure of the source table, such as adding or dropping a column or changing the data type of an existing column, are recorded in the cdc.ddl_history table. Entries in cdc.ddl_history are made at the time the capture process reads the DDL transaction in the log.

You can report the DDL activity on a CDC-enabled table with the sys.sp_cdc_get_ddl_history system stored procedure. It returns the data definition language (DDL) change history associated with the specified capture instance since change data capture was enabled for that capture instance.
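For the test table this could look as follows, assuming the default capture instance name dbo_CDCTest. The second query reads the underlying table directly.

```sql
-- DDL history for one capture instance.
EXECUTE sys.sp_cdc_get_ddl_history
      @capture_instance = N'dbo_CDCTest';

-- The same information straight from the metadata table.
SELECT ddl_command, ddl_lsn, ddl_time
FROM cdc.ddl_history
ORDER BY ddl_time;
```

Because entries are written only when the capture job reaches the DDL transaction in the log, a change made a moment ago may not appear immediately.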


Disable Change Data Capture

sys.sp_cdc_disable_db
Disables change data capture for the current database.

sys.sp_cdc_disable_table
Disables change data capture for the specified source table and capture instance in the current database. It drops the change table and system functions associated with that source table and capture instance, so the data previously captured for that instance is lost. It also deletes any rows associated with the capture instance from the change data capture system tables and, when no other capture instance remains on the table, sets the is_tracked_by_cdc column in the sys.tables catalog view to 0.
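A sketch of the disable calls for the test table, using the procedure names from the released SQL Server 2008 product (sys.sp_cdc_disable_table and sys.sp_cdc_disable_db). The capture instance name dbo_CDCTest is the default one and is an assumption here.

```sql
-- Disable one capture instance on the table. Its change table is dropped,
-- so copy out any captured rows you still need before running this.
EXECUTE sys.sp_cdc_disable_table
      @source_schema    = N'dbo'
    , @source_name      = N'CDCTest'
    , @capture_instance = N'dbo_CDCTest';   -- or N'all' to disable every instance on the table
GO

-- Disable CDC for the whole current database.
EXECUTE sys.sp_cdc_disable_db;
GO
```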
