10/14/2008 23:13:37

Note:

This sample works only with SQL Server 2008 Enterprise and SQL Server 2008 Developer. This sample will not work with any version of SQL Server earlier than SQL Server 2008, or on any edition other than SQL Server 2008 Enterprise or SQL Server 2008 Developer.

The Change Data Capture LSN Based Sample illustrates how to incorporate Change Data Capture technology into an SSIS based ETL environment. It is a complete end to end sample that includes both workload generation and the initial load of target tables as part of the sample setup. After the initial load completes the master package is periodically invoked to gather the changes occurring since the last request and then update the target replicas to reflect those changes.

This sample makes use of Log Sequence Numbers (LSNs) to specify the range for the Change Data Capture enumeration functions that are used to request change data. See the sample “Change Data Capture for Specified Interval Package Sample” to see how TVF wrapper functions can be used to allow datetime values to be used to specify the query range when querying for change data.

The AdventureWorks2008 database is used as the source environment, where existing tables are used to populate comparable tables in the CDCSample schema. The destination tables are created in the CDCSample schema in the database AdventureWorksDW2008.

Important:

Samples are provided for educational purposes only. They are not intended to be used in a production environment and have not been tested in a production environment. Microsoft does not provide technical support for these samples.

Requirements

To run this sample package, the following requirements must be met:

·         The sample package and data files that the sample uses must be installed on the local hard disk drive.

·         You must have installed and have administrative permissions on the AdventureWorks2008 OLTP and AdventureWorksDW2008 databases.

·         You must install SQL Server Database Engine to install support for the change data capture feature. You must also install Replication when you install SQL Server Database Engine.

·         If you intend only to run the sample from the command line, you must install Integration Services.

·         If you want to open and run the sample in SSIS Designer, you must install Business Intelligence Development Studio.

For more information about how to install samples, see "Installing Sample Integration Services Packages" in SQL Server Books Online.

Location of the Sample Files

This sample consists of the following files:

·         The solution file and Readme file.

·         The project files and packages.

·         The Transact-SQL script files.

Location of the Solution File and Readme File

If the samples were installed to the default location, the Change Data Capture since Last Request Package Sample solution file and the README file are located in the following folder:

<drive>:\Program Files\Microsoft SQL Server\100\Samples\Integration Services\Package Samples\Change Data Capture since Last Request Package Sample\

The following table provides the complete name and a short description of the files in this folder.

File

Description

Change Data Capture Sample.sln

The Change Data Capture since Last Request Package Sample solution file.

Readme_ChangeDataCapturesinceLastRequestPackageSample.htm

Readme for the Change Data Capture since Last Request Package Sample.

Location of the Project Files and Packages

The following folder is the default location of the project files and packages for this sample:

<drive>:\Program Files\Microsoft SQL Server\100\Samples\Integration Services\Package Samples\Change Data Capture since Last Request Package Sample\Change Data Capture Sample\

The following table provides the complete name and a short description of the files in this folder.

File

Description

Change Data Capture Sample.dtproj

The Change Data Capture since Last Request Package Sample project file.

SetupCDCSample.dtsx

The setup package and sample test harness.

MasterCDC.dtsx

The master package used to perform an incremental load cycle.

CDCCustomerExtract.dtsx

Child package used to gather the changes to the Customer table for the extraction interval and apply them to the destination.

CDCCreditCardExtract.dtsx

Child package used to gather the changes to the CreditCard table for the extraction interval and apply them to the destination.

CDCWorkOrderExtract.dtsx

Child package used to gather the changes to the WorkOrder table for the extraction interval and apply them to the destination.

Location of the Transact-SQL Script Files

The following folder is the default location of the Transact-SQL script files for this sample:

<drive>:\Program Files\Microsoft SQL Server\100\Samples\Integration Services\Package Samples\Change Data Capture since Last Request Package Sample\Change Data Capture Sample\Scripts\

File

Description

CDCSetupTables.sql

The sample setup file.

CDCCustomerInsert.sql

Script to apply Inserts to CDCSample.Customer table.

CDCCustomerModify.sql

Script to apply Updates to CDCSample.Customer table.

CDCCreditCardInsert.sql

Script to apply Inserts to CDCSample.CreditCard table.

CDCCreditCardModify.sql

Script to apply Updates to CDCSample.CreditCard table.

CDCWorkOrderInsert.sql

Script to apply Inserts to CDCSample.WorkOrder table.

CDCWOrkOrderModify.sql

Script to apply Updates to CDCSample.WorkOrder table.

CDCCleanup.sql

Script used to remove all objects created by the sample.

To apply run sqlcmd –E –iCDCCleanup.sql

 

Note:

The SetupCDCSample package includes the package variable SQLServerInstallPath which defaults to c:\Program Files\Microsoft SQL Server\. The package variable BasePath is derived from this value, assuming the root of the installed Samples is at c:\Program Files\Microsoft SQL Server\100\Samples\. If you used a different install path for SQL Server or for installing the samples, these variables should be modified to reflect the appropriate paths.

Running the Sample

To run the package, you can use either the dtexec command prompt utility (dtexec.exe) or BI Development Studio.

To run the package by using the dtexec utility

1.  Open a Command Prompt window.

2.  Use the Change Directory command, cd, to change the directory that contains the dtexec utility.

The default location of the dtexec utility is <drive>:\Program Files\Microsoft SQL Server\100\DTS\Binn.

3.  At the command prompt, type the following command:

dtexec /f "C:\Program Files\Microsoft SQL Server\100\Samples\Integration Services\Package Samples\Change Data Capture since Last Request Package Sample\Change Data Capture Sample\SetupCDCSample.dtsx "

4.  Press ENTER.

For more information about how to use the dtexec utility run a package, see the topic, "dtexec Utility," in SQL Server Books Online.

To run the package in Business Intelligence Development Studio

1.  In BI Development Studio, on the File menu, point to Open, and then click Project/Solution.

2.  For Files of type, select Integration Services Project Files (.dtproj). Then, locate the Change Data Capture since Last Request Package Sample/Change Data Capture Sample* folder, and double-click the file, Change Data Capture Sample.dtproj.

3.  In Solution Explorer, in the SSIS Packages folder, right-click SetupCDCSample.dtsx, and then click Execute Package.

Sample Packages and Their Components

The sample consists of five packages:

The setup packageSetupCDCSample

The setup package acts as a test harness for the sample. As a test harness, this package does the following processes:

·         Performs all the required initialization. This includes creating a database snapshot.

·         Generates a data manipulation language (DML) workload against the source tables while cycling the master package at 10 second intervals to perform an incremental load of the target tables for all changes processed by the capture process since the last request.

·         Monitors the progress of the run, and stops the sample after all the DML workload has been processed and applied to the destination tables.

·         Validates, at the end of the run, the content of the destination tables. The setup package compares the destination tables with the content of the source tables, and then records the results in the event log.

The master packageMasterCDC

The master package performs an incremental load of the target tables for the interval provided as input.

To determine whether to continue with the incremental load, the master package checks the endpoints of the interval:

·         If both endpoints for the extraction interval are in range, the master package starts the three child packages, which perform the incremental loads for the individual tables.

·         If either the low endpoint or the high endpoint is outside the interval where change data exists for a source table, the master package logs an error and stops the sample.

After all the child packages have finished running, the master package logs a message to the event log that indicates the sample has finished for the interval. The master package then stops the sample.

Three child packagesExtract Customer Data, Extract CreditCard Data and Extract WorkOrder Data

The structure of each of these child packages is identical:

·         Each child package first creates the query that extracts the changed data.

·         Then, the child package runs a data flow task that performs the incremental load of the changed data. The data flow task uses information in the result set that the query returns to split the returned rows. The data flow splits the rows into three distinct flows—insert, update, or delete—that correspond to the operation required to apply the row to the destination. The data flow task then runs the tasks that handle the various flows.

The following sections describe the components found in each package.

SetupCDCSample Package Components

The SetupCDCSample package has several annotated functional groups:

·         Initialize Environment and Enable Change Data Capture

·         Generate Workload

·         Verify Capture Process is Running

·         Perform Initial Load for Target Tables

·         Prepare for Initial Extraction

·         Extract and Process Change Data

·         Validate Incremental Load and Report Run Status

These groups help organize the content as described in the following sections.

Note:

To view the annotated functional groups, open the SetupCDCSample package in Business Intelligence Development Studio. To see a larger view of the package, click the four-way arrow in the lower-right corner of the Control Flow tab, and then position the view window over the package.

Initialize Environment and Enable Change Data Capture

This functional group contains a single Execute SQL task. The purpose of this task is to do the following:

·         Create the sample source tables from existing AdventureWorks2008 tables, and then enable Change Data Capture for these tables.

·         Create the target tables in the AdventureWorks2008DW database.

The following table lists the task that this functional group uses.

Name

Element

Purpose

Create Tables and Enable Change Data Capture

Execute SQL task

Creates the sample source tables from existing AdventureWorks2008 tables:

·         Checks for any objects that were created in previous runs of the sample, and deletes these objects, if present.

·         Enables the AdventureWorks2008 database for change data capture.

·         Creates the CDCSample schema.

·         Creates the tables, CDCSample.Customer, CDCSample.CreditCard, and CDCSample.WorkOrder, in the CDCSample schema. (These tables mirror the comparable tables in the dbo schema.)

Create the target tables in the AdventureWorks2008DW database :

·         Checks for any objects that were created in previous runs of the sample, and then deletes these objects, if present.

·         Creates the CDCSample schema.

·         Creates the target tables, Customer, CreditCard, and WorkOrder, in the CDCSample schema.

·         Initializes the tables, CDCSample.Customer, CDCSample.CreditCard, and CDCSample.WorkOrder, with a subset of the table data from their parent AdventureWorks2008 tables.

·         Enables change data capture on the parent AdventureWorks2008 tables.

Generate Workload

This functional group contains six Execute SQL tasks and a Script task:

·         Each of the three source tables has two Execute SQL tasks: one that generates insert activity against the table and one that generates update activity. Three of the six Execute SQL tasks run after change data capture has been enabled on the source tables. The remaining three run as each of the first three workload tasks finishes.

·         When all the Execute SQL tasks have run, the Script task runs. The Script task sets the package variables that indicate that the workload has finished.

The following table lists the tasks that this functional group uses.

Name

Element

Purpose

Insert Customer Table

Execute SQL task

Applies inserts to the AdventureWorks2008.CDCSample.Customer table.

Modify Customer Table

Execute SQL task

Applies updates to the AdventureWorks2008.CDCSample.Customer table.

Insert CreditCard Table

Execute SQL task

Applies inserts to the AdventureWorks2008.CDCSample.CreditCard table.

Modify CreditCard Table

Execute SQL task

Applies updates to the AdventureWorks2008.CDCSample.CreditCard table.

Insert WorkOrder Table

Execute SQL task

Applies inserts to the AdventureWorks2008.CDCSample.WorkOrder table.

Modify WorkOrder Table

Execute SQL task

Applies updates to the AdventureWorks2008.CDCSample.WorkOrder table.

Mark Workload Completion

Script task

Sets the following package variables:

·         The WorkloadCompleted package variable that indicates whether the workload has finished.

·         The WorkloadEndTime package variable that indicates the time at which the workload finished.

This task uses Visual Basic.

Verify Capture Process is Running

This functional group contains one Execute SQL tasks and one Script task.

·         The Execute SQL task runs after the environment has been initialized and the source tables have been enabled for change Data capture. It is used to verify that the Change Data Capture process has auto-started.

·         If the Change Data Capture process has not auto-started, the Script task runs to log an error in the event log. This error message indicates that SQL Agent must be running in order for the capture process to auto-start. Then the sample stops running. If the capture process is running, the task to create the database snapshot is allowed to proceed in preparation for the initial load of the target tables.

Perform Initial Load for Target Tables

This functional group contains one Script task and three Data Flow tasks:

·         The Script task, Create Database Snapshot, creates a database snapshot of the AdventureWorks2008 database. This snapshot provides the source tables for the initial load of the target tables.

·         The three data flow tasks then extract data from the snapshot tables and apply this data to the destination tables in the AdventureWorksDW2008 database.

The following table summarizes these tasks.

Name

Element

Purpose

Create Database Snapshot

Execute SQL task

Creates a database snapshot for AdventureWorks2008.

Load Target Customer Table from Snapshot

Data FLow task

Uses an OLE DB source to extract data from the snapshot table CDCSample.Customer and an OLE DB destination to apply the rows to the table AdventureWorksDW2008.CDCSample.Customer.

Load CreditCard Table from Snapshot

Data Flow task

Uses an OLE DB source to extract data from the snapshot table CDCSample.CreditCard and an OLE DB destination to apply the rows to the table AdventureWorksDW2008.CDCSample.CreditCard.

Load WorkOrder Table From Snapshot

Data Flow task

Uses an OLE DB source to extract data from the snapshot table CDCSample.WorkOrder and an OLE DB destination to apply the rows to the table AdventureWorksDW2008.CDCSample.WorkOrder.

Prepare for Initial Extraction

This functional group contains one Execute SQL task, which is used to determine an appropriate LSN range to synchronize the first incremental load to the initial load.

Name

Element

Purpose

Get End-points for Initial Query Interval

Execute SQL task

Extracts the last LSN included in the database snapshot from snapshot metadata. It then increments this value so it can serve as the low LSN value for the first incremental load applied after initialization. The task then delays until the capture process has processed through this LSN value in the transaction log. Once this condition is true, the current maximum LSN is identified as the high LSN value for the initial query interval. The binary LSN values to be used for the initial query interval are then converted to strings using sys.fn_varbintohexstr, to be saved as package variables User::ExtractStartLSN and User::ExtractEndLSN. The cdc.lsn_time_mapping table is used to associate an approximate datetime value with the low end point, which is saved as package variable User::ExtractStartTime.

Extract and Process Change Data

This functional group contains a For loop container that includes two tasks:

·         An Execute Package task that run to gather and apply the change data associated with the current query interval to the target tables.

·         An Execute SQL Task that is used to first delay for 10 seconds, and then compute the LSN range for the next query interval.

The For loop maintains the package variable, IntervalID. The For Loop container initializes the IntervalID package variable to 0, and then and increments this variable by 1 on each iteration of the loop. The loop continues to run until the following conditions are true:

·         The package variable WorkloadCompleted is set to 1.

·         The computed ExtractionStartTime to be used in the next iteration exceeds the recorded value for WorkloadEndTime, also a package variable.

Name

Element

Purpose

Cycle Master at 10 second intervals

For Loop Container

First executes the package to gather changes for the interval and apply them to the target tables. It then executes an Execute SQL task to first delay for 10 seconds and then compute the new extraction interval. The For Loop Container terminates when it determines that all of the changes in the generated workload have been applied to the destination tables.

Run Master to Extract Data

Execute Package task

Runs the package MasterCDC that is used to apply the incremental load for a given extraction interval to the destination tables.

Delay 10 Seconds and Set New Query Interval

Execute SQL task

First converts the string representation of the high LSN value for the previous query to a binary(10) value using the function sys.fn_cdc_hexstrtobin. This value is then incremented using sys.fn_cdc_increment_lsn to serve as the low LSN value for the next query interval. After a delay of 10 seconds, the function sys.fn_cdc_get_max_lsn is used to obtain the current maximum LSN processed by the capture process. This value will serve as the high end-point of the next query window. If the low end-point is greater than the determined maximum LSN, the task will continue to delay for 10 seconds and then obtain the current maximum LSN until the low end-point does not exceed the maximum. The binary LSN values to bound the next query interval are then converted to strings using sys.fn_varbintohexstr, to be saved as package variables User::ExtractStartLSN and User::ExtractEndLSN. The cdc.lsn_time_mapping table is used to associate an approximate datetime value with the low end point, which is saved as package variable User::ExtractStartTime.

Validate Incremental Load and Report Run Status

This functional group contains an Execute SQL task and a Script task:

·         The Execute SQL task, Check for Mismatch in Replicas, uses SQL CHECKSUM to compare the source tables with the replicas and sets package variables that indicate the run status.

·         At the end of the run, the Script task, Output Run Completion Status, uses the package variables to report the results of the table comparisons in the event log.

The following table summarizes these tasks.

Name

Element

Purpose

Check for Mismatch in Replicas

Execute SQL task

Uses CHECKSUM to compare the replicas to the source tables. Based on this comparison, then sets the package variables CustomerMismatch, CreditCardMismatch, and WorkOrderMismatch, accordingly.

Output Run Completion Status

Script task

Outputs a status message to the event log at the end of the run. (This message reports on the parity of the package variables, CustomerMismatch, CreditCardMismatch, and WorkOrderMismatch.) Then, sets the run status accordingly.

MasterCDC Package Components

When the SetupCDCSample package runs, five package configurations of type, "Parent package variable", enable the SetupCDCSample package to pass package variables to the MasterCDC package. The following table lists these package variables.

Parent

Child

User::ExtractEndLSN

ExtractEndLSN

User::ExtractStartLSN

ExtractStartLSN

User::IntervalID

IntervalID

User::BasePath

BasePath

The following tables list the tasks, containers, data sources and destinations, and transformations that are used within the master package.

Name

Element

Purpose

Wait for Capture Process to Process Extraction Interval

For Loop container

Contains the following components:

·         An Execute SQL task that sets a package variable that indicates the status of the proposed extraction interval relative to the capture process.

·         A Script task that generates an entry in the error log if the query interval is invalid, and causes the package to stop running. The query interval is invalid if either the low end-point or the high end-point do not lie within the change data capture validity interval for the database.

If the interval is valid, this container also starts the Execute Packages tasks that run the child packages that apply the change data to the destination tables.

After the child packages complete, this container also runs a Script task that records that the extraction interval has finished in the event log.

Check for Valid Interval

Execute SQL task

Sets the DataReady package variable to one of the following values:

0 = Invalid query interval

·         1 = Valid query interval

(This task runs in the For Loop container.)

Log Extract Error

Script task

Logs an error to the event log when the DataReady package variable is set to 0.

(This task runs in the For Loop Container.)

Extract Customer Data

Execute Package task

Runs the CDCCustomerExtract package after the For Loop container finishes, and the DataReady package variable is set to 1

Extract CreditCard Data

Execute Package task

Runs the CDCCreditCardExtract package after the For Loop container finishes, and the DataReady package variable is set to 1.

Extract WorkOrder Data

Execute Package task

Runs the CDCWorkOrderExtract package after the For Loop container finishes, and the DataReady package variable is set to 1.

Log Extraction Complete

Script task

Enters a message in the event log that indicates that the extraction interval has finished.

This task runs after all child packages have finished.

CDCCustomerExtract Package Components

When the MasterCDC package runs, two package configurations of type, "Parent package variable", enable the MasterCDC package to pass package variables to the CDCCustomerExtract package. The following table lists these package variables.

Parent

Child

User::ExtractEndLSN

EndLSN

User::ExtractStartLSN

StartLSN

The following table lists the tasks, containers, data sources and destinations, and transformations that the CDCCustomerExtract package uses.

Name

Element

Purpose

Generate SQL Data Query

Script task

Uses package variables StartLSN and EndLSN to determine the LSN values to supply in the generated function call to be used to query for change data. The query is deposited in the package variable SQLDataQuery.

Process Change Data

Data Flow task

Uses an OLE DB source to query for change data. A data flow component then separates the returned rows into flows—insert, delete, and update—that correspond to the operation required to apply the row to the destination. An OLE DB destination handles the insert flow. Data Flow components that run SQL commands for each row handle the delete and update flows.

Query for Change Data

OLE DB source

Uses the query that the SQLDataQuery package variable supplies to extract change data. The query uses the connection manager, AdventureWorks2008.

Filter Based on Operation

Conditional Split transformation

Divides the results set that the query returns into three different flows. The transformation uses the value of the __$operation column in each returned row to divide the results set:

·         If the value is 2, directs the row to the insert flow.

·         If the value is 4, directs the row to the update flow.

·         If the value is 1, directs the row to the delete flow.

Customer Inserts

OLE DB destination

Processes the insert flow. Uses the connection manager, AdventureWorksDW2008, to connect to the destination and then inserts the specified rows into the target Customer table.

Customer Deletes

OLE DB Command transformation

Processes the delete flow. Uses the connection manager, AdventureWorksDW2008, to connect to the destination and then deletes the specified rows from the target Customer table.

Customer Updates

OLE DB Command transformation

Processes the update flow. Uses the connection manager, AdventureWorksDW2008, to connect to the destination and then updates the specified rows in the target Customer table.

CDCCreditCardExtract Package Components

When the MasterCDC package runs, two package configurations of type, "Parent package variable", enable the MasterCDC package to pass package variables to the CDCCreditCardExtract package. The following table lists these package variables.

Parent

Child

User::ExtractEndLSN

EndLSN

User::ExtractStartLSN

StartLSN

The following table lists the tasks, containers, data sources, data destinations, and transformations that the CDCCreditCardExtract package uses.

Name

Element

Purpose

Generate SQL Data Query

Script task

Uses package variables StartLSN and EndLSN to determine the LSN values to supply in the generated function call to be used to query for change data. The query is deposited in the package variable SQLDataQuery.

Process Change Data

Data Flow task

Uses an OLE DB source to query for change data. A data flow component then separates the returned rows into flows—insert, delete, and update—that correspond to the operation required to apply the row to the destination. An OLE DB destination handles the insert flow. Data Flow components that run SQL commands for each row handle the delete and update flows.

Query for Change Data

OLE DB source

Uses the query that the SQLDataQuery package variable supplies to extract change data. The query uses the connection manager, AdventureWorks2008.

Filter Based on Operation

Conditional Split transformation

Divides the results set that the query returns into three different flows. The transformation uses the value of the __$operation column in each returned row to divide the results set:

·         If the value is 2, directs the row to the insert flow.

·         If the value is 4, directs the row to the update flow.

·         If the value is 1, directs the row to the delete flow.

CreditCard Inserts

OLE DB destination

Processes the insert flow. Uses the connection manager, AdventureWorksDW2008, to connect to the destination and then inserts the specified rows into the target CreditCard table.

CreditCard Deletes

OLE DB Command transformation

Processes the delete flow. Uses the connection manager, AdventureWorksDW2008, to connect to the destination and then deletes the specified rows from the target CreditCard table.

CreditCard Updates

OLE DB Command transformation

Processes the update flow. Uses the connection manager, AdventureWorksDW2008, to connect to the destination and then updates the specified rows in the target CreditCard table.

CDCWorkOrderExtract Package Components

When the MasterCDC package runs, two package configurations of type, "Parent package variable", enable the MasterCDC package to pass package variables to the CDCWorkOrderExtract Extract package. The following table lists these package variables.

Parent

Child

User::ExtractEndLSN

EndLSN

User::ExtractStartLSN

StartLSN

The following table lists the tasks, containers, data sources, data destinations, and transformations that the CDCWorkOrderExtract package uses.

Name

Element

Purpose

Generate SQL Data Query

Script task

Uses package variables StartLSN and EndLSN to determine the LSN values to supply in the generated function call to be used to query for change data. The query is deposited in the package variable SQLDataQuery.

Process Change Data

Data Flow task

Uses an OLE DB source to query for change data. A data flow component then separates the returned rows into flows—insert, delete, and update—that correspond to the operation required to apply the row to the destination. An OLE DB destination handles the insert flow. Data Flow components that run SQL commands for each row handle the delete and update flows.

Query for Change Data

OLE DB source

Uses the query that the SQLDataQuery package variable supplies to extract change data. The query uses the connection manager, AdventureWorks2008.

Filter Based on Operation

Conditional Split transformation

Divides the results set that the query returns into three different flows. The transformation uses the value of the __$operation column in each returned row to divide the results set:

·         If the value is 2, directs the row to the insert flow.

·         If the value is 4, directs the row to the update flow.

·         If the value is 1, directs the row to the delete flow.

WorkOrder Inserts

OLE DB destination

Processes the insert flow. Uses the connection manager, AdventureWorksDW2008, to connect to the destination and then inserts the specified rows into the target WorkOrder table.

WorkOrder Deletes

OLE DB Command transformation

Processes the delete flow. Uses the connection manager, AdventureWorksDW2008, to connect to the destination and then deletes the specified rows from the target WorkOrder table.

WorkOrder Updates

OLE DB Command transformation

Processes the update flow. Uses the connection manager, AdventureWorksDW2008, to connect to the destination and then updates the specified rows in the target WorkOrder table.