August 2013
Volume 28 Number 8
By Pavle Guduric
I worked on a project where we built extract, transform and load (ETL) processes with more than 150 packages. Many of them contained complex transformations and business logic, and thus weren't simple “move data from point A to point B” packages. Making minor changes wasn't straightforward and the results were often unpredictable. To test packages, we used to fill input tables or files with test data, execute the package or task in Microsoft Business Intelligence Development Studio (BIDS), write a SQL query and compare the output produced by the package with what we thought was the correct output. More often, we simply ran the whole ETL process on a sample database and sampled the output data at the end: a time-consuming and unreliable procedure. Unfortunately, this is a common practice among SQL Server Integration Services (SSIS) developers. Even more challenging is determining what effects the execution of one package has on subsequent packages. As you build your ETL process, you create a network of connected packages and resources, and it's difficult to maintain a complete overview of the numerous dependencies among all of these at all times.
This article explains how to perform unit and integration testing of SSIS packages by introducing a library called SSISTester, which is built on top of the managed SSIS API. After reading this article you should be able to use the described techniques and tools to automate unit and integration testing of your existing and new SSIS projects. To understand the article, you should have previous experience with SSIS and C#.
SSISTester
When I started thinking about a testing framework for SSIS packages, I found three aspects to be important. First, I wanted an experience similar to writing tests with the Visual Studio testing framework, so the typical methodology involving setup, verification and cleanup (aka teardown) steps had to be applied. Second, I wanted to use existing and proven tools to write, execute and manage tests. Once again, Visual Studio was the obvious choice. And third, I wanted to be able to code tests in C#. With that in mind I wrote SSISTester, a .NET library that sits on top of the SSIS runtime and exposes an API that allows you to write and execute tests for SSIS packages. The main logical components of the library are depicted in Figure 1.
Figure 1 Logical Components of the SSISTester Library
The Package Repository is used to store raw XML representations of target packages. Each time a test is executed, a new instance of the Microsoft.SqlServer.Dts.Runtime.Package class is deserialized from XML with all fields and properties set to their default values. This is important because you don't want different tests that target the same package to accidentally reuse any of the values set by previous tests.
Instances of test classes are stored within the Test Repository. These classes contain methods that implement your test cases. When a test is executed, these methods are called by the Test Engine. The specific rules that must be followed when creating test classes will be described in detail later.
Metadata contains the attributes needed to decorate a test class so it can be recognized as a test implementation. The Test Engine looks for these attributes when loading tests into the Test Repository.
The Test Context represents a set of classes that provide access to the runtime information during different phases of the test execution. For example, you can use these classes to access different aspects of a package being tested, such as variables, properties, preceding constraints, connection managers, currently executing task, package errors and so forth.
The Test Engine refers to the core classes and interfaces of the SSISTester API that directly utilize the managed SSIS runtime. They are used to load packages and test classes into their respective repositories, as well as to execute tests and to create test results.
Mini ETL
To create the packages and test classes, I'll use Visual Studio 2012 and SQL Server 2012, along with three packages that illustrate a simple ETL scenario in which customer data, delivered as a text file, is transformed and stored within a database. The packages are CopyCustomers.dtsx, LoadCustomers.dtsx and Main.dtsx. CopyCustomers.dtsx copies the Customers.txt file from one location to another and on the way converts all customer names to uppercase text. Customers.txt is a simple CSV file that contains IDs and names of customers, like so:
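The original sample data isn't reproduced here; an illustrative Customers.txt with made-up values might look like this:

```
1,Contoso Ltd
2,Fabrikam Inc
3,Adventure Works Cyclery
```

After CopyCustomers.dtsx runs, the copied file would contain the same rows with the names uppercased.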
LoadCustomers.dtsx loads the converted names into the Demo database. Before it loads data into a target table called CustomersStaging, it truncates all previously stored data. At the end of the process, it stores the number of customers into a variable. Here’s the script to create the Demo database and the CustomersStaging table:
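The original script isn't shown; a minimal T-SQL sketch that would create a matching Demo database and CustomersStaging table (the column names and types are assumptions based on the CSV file described above):

```sql
CREATE DATABASE Demo;
GO
USE Demo;
GO
-- Staging table for the converted customer rows.
CREATE TABLE dbo.CustomersStaging
(
    CustomerId   INT           NOT NULL,
    CustomerName NVARCHAR(100) NOT NULL
);
GO
```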
The package Main.dtsx contains two Execute Package tasks that execute the sub-packages CopyCustomers.dtsx and LoadCustomers.dtsx, respectively. Connection managers in both CopyCustomers.dtsx and LoadCustomers.dtsx are configured using expressions and package variables. The same package variables are retrieved from the parent package configuration when executed from within another package.
Creating Unit Tests
To begin, create a console project and add assembly references to SSIS.Test.dll and SSIS.Test.Report.dll. I’m going to create a unit test for the CopyCustomers.dtsx package first. Figure 2 shows the control flow (left) and data flow (right) for CopyCustomers.dtsx.
Figure 2 Control Flow (Left) and Data Flow (Right) of the CopyCustomers.dtsx Package
Every unit test is implemented in a single class that derives from the BaseUnitTest class and must be decorated with the UnitTest attribute:
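The listing is omitted here; a sketch of the required shape, assuming the attribute and base-class signatures implied by the rest of the article (the SetupContext, VerificationContext and TeardownContext parameter types are assumptions):

```csharp
// Sketch only: the exact SSISTester signatures are assumptions.
[UnitTest("CUSTOMERS", "CopyCustomers.dtsx")]
public class CopyCustomersTest : BaseUnitTest
{
    protected override void Setup(SetupContext context) { /* prepare inputs */ }

    protected override void Verify(VerificationContext context) { /* assert results */ }

    protected override void Teardown(TeardownContext context) { /* clean up */ }
}
```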
The UnitTest attribute marks a class as a unit test implementation so it can be found by the Test Engine. The first parameter corresponds to the Package Repository where a target package will be loaded during test execution, CUSTOMERS in this example. The second parameter can be the name of a target package, the path to a task in the control flow, the path to an event handler or the path to a preceding constraint. In this example it’s the name of the CopyCustomers.dtsx package because I want to test the whole package. Basically, the UnitTest attribute tells the Test Engine to look for the CopyCustomers.dtsx package in the CUSTOMERS repository and execute it during the CopyCustomersTest test.
The base class BaseUnitTest that all unit test implementations need to derive from contains three methods that have to be implemented: Setup, Verify and Teardown.
These three methods are executed during different test phases. The Setup method runs before a target package is executed by the Test Engine. Setup prepares the package and all inputs and outputs the package depends on so it can be successfully validated and executed. In the following example, I set paths to the package variables that are used as connection strings in the connection managers:
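The original listing isn't reproduced; a sketch of what such a Setup method could look like, with hypothetical variable names (User::SourceFile, User::DestinationFolder) standing in for the ones actually used in the package:

```csharp
protected override void Setup(SetupContext context)
{
    // Hypothetical variable names; the package's flat file connection
    // managers build their connection strings from these variables.
    context.Package.Variables["User::SourceFile"].Value =
        @"\\nc1\Customers\Customers.txt";
    context.Package.Variables["User::DestinationFolder"].Value =
        @"C:\TestFiles\Archive";
}
```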
After the Setup method has successfully executed, the Test Engine executes the target package. When the package has executed, the Test Engine calls the Verify method and I can check whether my assertions are true:
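The Verify listing is omitted; a sketch consistent with the asserts described next (the context.Assert helper and its method names are assumptions):

```csharp
protected override void Verify(VerificationContext context)
{
    // 1) The package itself must have executed successfully.
    context.Assert.AreEqual(DTSExecResult.Success, context.Package.ExecutionResult);

    // 2) FST Copy Source File must have copied the file to the archive folder.
    context.Assert.IsTrue(File.Exists(@"C:\TestFiles\Archive\Customers.txt"));

    // 3) and 4) DFT Convert Customer Names must have uppercased every name.
    string[] lines = File.ReadAllLines(@"C:\TestFiles\Archive\Customers.txt");
    context.Assert.IsTrue(lines.Length > 0);
    context.Assert.IsTrue(
        lines.All(l => l.Split(',')[1] == l.Split(',')[1].ToUpper()));
}
```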
The first assert checks whether the package executed successfully. The second one determines whether the FST Copy Source File file system task copied the \\nc1\Customers\Customers.txt file to the C:\TestFiles\Archive folder. The last two asserts validate whether the DFT Convert Customer Names data flow task correctly converted the customer names to uppercase. Earlier, I briefly described the testing context. Here you can see how I used the context parameter to access a package object within the Setup and Verify methods.
At the end of the test, I use the Teardown method to delete the files that were copied or created by the package:
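The listing isn't shown; a sketch of such a Teardown method, under the same path assumptions as the Setup sketch:

```csharp
protected override void Teardown(TeardownContext context)
{
    // Remove the file the package copied so subsequent test runs start clean.
    File.Delete(@"C:\TestFiles\Archive\Customers.txt");
}
```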
Testing Control Flow Tasks
Tests can target specific tasks in the control flow as well. For example, to test the DFT Load Customers data flow in the LoadCustomers.dtsx package, I used an additional parameter of the UnitTest attribute, called ExecutableName, to tell the Test Engine that I want to test this task:
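The attribute usage isn't shown; it could look roughly like this (the exact executable path is an assumption about the package's container layout):

```csharp
// Sketch: targets only the DFT Load Customers task inside LoadCustomers.dtsx.
[UnitTest("CUSTOMERS", "LoadCustomers.dtsx",
    ExecutableName = @"LoadCustomers\SEQC Load\DFT Load Customers")]
public class LoadCustomersTest : BaseUnitTest
{
    // Setup, Verify and Teardown as in the previous example ...
}
```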
ExecutableName represents a path that combines the names of all nested containers, beginning with the package name.
Control and data flow for LoadCustomers.dtsx are shown in Figure 3.
Figure 3 Control Flow (Left) and Data Flow (Right) of the LoadCustomers.dtsx Package
When a test targets a specific task, only that task is executed by the Test Engine. If the successful execution of the target task depends on the execution of preceding tasks, the results of executing those tasks need to be manually generated. The DFT Load Customers data flow expects the target table to be truncated by the SQL Truncate CustomersStaging task. Further, the data flow expects the transformed Customers.txt file at a specific location. Because this file is created by the CopyCustomers.dtsx package, I need to copy it manually. Here’s the Setup method that does all this:
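The listing isn't reproduced; a sketch under the assumption that SetupContext exposes the DataAccess and Package properties described next (the ExecuteNonQuery name on DataAccess and the variable names are assumptions):

```csharp
protected override void Setup(SetupContext context)
{
    // Manually produce what the skipped predecessor tasks would have produced:
    // CopyCustomers.dtsx would normally deliver the converted file.
    File.Copy(@"C:\TestFiles\Customers.txt",
              @"C:\TestFiles\Archive\Customers.txt", true);

    // SQL Truncate CustomersStaging would normally empty the target table.
    context.DataAccess.ExecuteNonQuery("TRUNCATE TABLE dbo.CustomersStaging");

    // Point the connection managers at the test resources via package variables.
    context.Package.Variables["User::SourceFile"].Value =
        @"C:\TestFiles\Archive\Customers.txt";
    context.Package.Variables["User::DemoConnectionString"].Value =
        "Data Source=.;Initial Catalog=Demo;Integrated Security=SSPI;";
}
```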
By using File.Copy, I copy the Customers.txt to the location expected by the data flow. Then I use the DataAccess property of the SetupContext to execute a truncate statement on the target table. This property exposes a lightweight ADO.NET wrapper that enables you to execute SQL commands without having to use SqlConnection and SqlCommand classes every time you want to access the database. At the end, I use the Package property to set the connection strings to the underlying connection managers.
Testing Preceding Constraints
Writing tests that target preceding constraints is also possible. For example, the CountConstraint that precedes the SCR CheckCount script task in the LoadCustomers.dtsx package has an expression that checks whether the variable CustomerCount is greater than zero. If this expression evaluates to true and the SEQC Load task executes successfully, then the script task is executed. Figure 4 shows the complete unit test.
Figure 4 The Complete Unit Test
To prepare the precedence constraint for testing, I need to do two things. First, I have to set the CustomerCount variable to some value, because the expression in the precedence constraint refers to it. In this case, I chose 0. Next, I set the execution result of the preceding task to success, failure or completion. I do this by using the SetExecutionResult method to simulate success of the preceding task. This means that CountConstraint should evaluate to false, and this is what I expect in the Verify method. You can have only one class in which you implement the unit tests for all precedence constraints in a package. Therefore, there's no target path to a particular constraint in the UnitTest attribute, only a Boolean flag that tells the engine this is a unit test class for precedence constraints. The reason for this is that with precedence constraints, there's no need to execute the package or task before the Verify method is called.
Executing Unit Tests
Before I can execute my tests, I need to load target packages and tests into their repositories. To do this, I need a reference to the Test Engine. Open the Program.cs file and replace the empty Main method with this one:
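The listing is omitted; a sketch of such a Main method, using the methods named in the following paragraph (the EngineFactory creation call is an assumption based on the live-test section later in the article):

```csharp
static void Main(string[] args)
{
    // Create the Test Engine (factory method name is an assumption).
    IUnitTestEngine engine = EngineFactory.NewUnitTestEngine();

    // Load all packages from the folder into the CUSTOMERS repository.
    engine.LoadPackages(@"C:\TargetPackages", "CUSTOMERS");

    // Load every class in this assembly decorated with [UnitTest].
    engine.LoadUnitTests();

    // Execute the tests and open the monitoring GUI.
    engine.ExecuteUnitTestsWithGui();
}
```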
The first line creates a reference to the Test Engine. To load all packages from the folder C:\TargetPackages into the CUSTOMERS repository, I use the LoadPackages method. The LoadUnitTests method loads into the specified test repository all classes in the calling assembly that are decorated with the UnitTest attribute. Finally, I call ExecuteUnitTestsWithGui to start the execution of tests and to open the monitoring GUI, which is shown in Figure 5.
Figure 5 The Monitoring GUI During the Execution of Tests
The GUI in Figure 5 is practical if you want to test locally on your machine and you don't want to start Visual Studio. If you'd like to test packages on a server, you could make small modifications to the program and schedule it to run tests directly on a build server, for example:
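The modified program isn't shown; a sketch of how it could look (the SaveAsHtml call is an assumption about the results API; the text only says results can be saved as an HTML report):

```csharp
static void Main(string[] args)
{
    IUnitTestEngine engine = EngineFactory.NewUnitTestEngine();
    engine.LoadPackages(@"C:\TargetPackages", "CUSTOMERS");
    engine.LoadUnitTests();

    // No GUI on a build server: run silently, then persist the results.
    engine.ExecuteUnitTests();
    engine.UnitTestResults.SaveAsHtml(@"C:\TestResults\UnitTests.html");
}
```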
The IUnitTestEngine interface has a UnitTestResults property that lets you access test results and save them as an HTML report. Here I replaced ExecuteUnitTestsWithGui with ExecuteUnitTests, which doesn't show the monitoring GUI. You could also run tests inside Visual Studio or with ReSharper so you don't need to start the console program. To do this, I created a new class called SSISUnitTestAdapter, shown in Figure 6.
Figure 6 The SSISUnitTestAdapter Class
If you've worked with the Microsoft unit testing framework before, you'll recognize the TestClass, AssemblyInitialize and TestMethod attributes. The three test methods, CopyCustomersTest, LoadCustomersTest and LoadCustomersConstraintsTest, wrap the call to the ExecuteUnitTest method, which in turn executes the Setup, Verify and Teardown methods of the class that's passed as a parameter. The Prepare method creates the Test Engine object and loads packages and unit tests into their respective repositories. I used a slightly different method, LoadRepositoryUnitTests, to load only the tests bound to the CUSTOMERS repository. This is useful if you don't want to load all tests. You can execute all tests by clicking Tests | Execute | All Tests in Visual Studio.
Creating Integration Tests
The basic idea of unit tests is to isolate the package or task being tested from all of the possible effects of other packages or tasks. Sometimes, though, it can be challenging to create a realistic test setup and the initial conditions a unit test needs in order to ensure the package or task behaves like a part of a complete ETL process. Because you usually implement ETL processes with a number of packages, you need to perform integration tests to be sure that each package works well when run as part of that process. The idea is to define probing points in your ETL process where you want to perform tests, without having to stop the whole process. As the process progresses and reaches a probing point, your tests are executed and you can verify a “live,” work-in-progress ETL process; hence the name, “live test.”
A live test is basically a post-condition—defined for a package, task or event handler—that needs to be satisfied after the package, task or event handler has executed. This post-condition corresponds to the verification step of a unit test. Live tests are different from the unit tests because it’s not possible to prepare the test prior to package execution or to perform a clean-up step afterward. This is because unlike a unit test, a live test doesn’t execute the package; it’s the other way round: A package executes a test when it comes to the probing point for which a post-condition is defined.
Figure 7 illustrates this difference. Note the position of the package in both figures. When running unit tests, the Test Engine explicitly executes a unit test by calling its Setup, Verify and Teardown methods. A package is executed as a part of this Setup-Verify-Teardown sequence.
Figure 7 Sequence Diagrams for Unit Test (Left) and Live Test (Right) Execution
On the other hand, when running live tests, the Test Engine executes a package explicitly, which in turn triggers the execution of action methods that implement the post-conditions for a package and its tasks.
In order to create a live test for the CopyCustomers.dtsx package, I created a new class called CopyCustomersLiveTests, shown in Figure 8.
Figure 8 The CopyCustomersLiveTests Class
Each live test class must derive from the BaseLiveTest class, a major difference when compared with a unit test. The BaseLiveTest class is used internally by the Test Engine to execute live tests and has no methods that have to be overridden. The ActionClass attribute marks this class as a live test. The parameters are the same as when using the UnitTest attribute—repository and target package. Note that unlike unit tests where each test is implemented in a single, separate class, only one class is needed to implement all post-conditions for a package. Live test classes can have an arbitrary number of post-conditions that should be evaluated. These post-conditions correspond to the Verify method in a unit test and are implemented as methods decorated with the ActionMethod attribute. In the example in Figure 8, I have one post-condition for each task in the package and one for the package itself. ActionMethod accepts a path to the target task, which is the same as the ExecutableName in the UnitTest attribute. This tells the Test Engine to execute this method when the target task has executed. Unlike the Verify method, which is always executed, these post-conditions might not be called when, for example, the target task doesn’t execute successfully or the preceding constraint evaluates to false. The ActionContext parameter provides the same functionality as the VerificationContext.
Executing Live Tests
The steps necessary to execute live tests are slightly different than when executing unit tests. To execute live tests, replace the Main method in the Program.cs file with the code in Figure 9.
Figure 9 The Main Method for Executing Live Tests
I need an instance of ILiveTestEngine, which I create using EngineFactory. Loading packages works the same as with IUnitTestEngine. The LoadActions method loads all actions defined in the calling assembly and is practically the equivalent of LoadUnitTests. At this point, however, the similarity with unit tests ends. Instead of executing unit tests, I tell the Test Engine to execute the Main.dtsx package by calling ExecuteLiveTestsWithGui.
When the Main.dtsx package starts, it runs CopyCustomers.dtsx by executing the EPT CopyCustomers task. Each successfully finished task in CopyCustomers.dtsx triggers one of the corresponding action methods in the CopyCustomersLiveTests class. It's important to note that this test implicitly tests the configuration settings of the CopyCustomers.dtsx package.
Configured variables inherit their values from the Main.dtsx package. Please note that these variables are used as connection strings in the flat file connection managers of the CopyCustomers.dtsx package. This basically means that execution success of the tasks in the CopyCustomers.dtsx package depends on whether the value handover between these two packages works properly. This is a simple example of how interactions and dependencies between packages are tested, but you can imagine more complex scenarios where isolated unit tests wouldn’t be enough to cover the test case.
Test Engine Internals
The core class that implements the main functions of the SSISTester library is TestEngine. It’s an internal class that’s exposed through the IUnitTestEngine and ILiveTestEngine interfaces. The two methods that reveal most of the inner logic are LoadUnitTests (shown in Figure 10) and ExecuteUnitTests.
Figure 10 The LoadUnitTests Method
LoadUnitTests basically iterates all classes decorated with the UnitTest attribute and creates an instance of each. These instances are then cast to BaseUnitTest and are assigned the target package previously loaded from the package repository. At the end, all instances are saved in the _unitTests list. The ExecuteUnitTests method iterates all BaseUnitTest instances and calls ExecuteTest on each:
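The loop isn't shown in full; conceptually it amounts to something like this, using the names from the surrounding text:

```csharp
public void ExecuteUnitTests()
{
    // Run each previously loaded test instance; ExecuteTest drives the
    // Setup-Verify-Teardown sequence shown in Figure 11.
    foreach (BaseUnitTest unitTest in _unitTests)
    {
        unitTest.ExecuteTest();
    }
}
```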
The actual execution of unit tests is implemented in the ExecuteTest method (shown in Figure 11) in the BaseUnitTest class.
Figure 11 The ExecuteTest Method
The most important aspect of this method is that it executes the Setup, Verify and Teardown methods as well as the target package.
Wrapping Up
The examples presented here, along with the accompanying project, should allow you to start testing your SSIS projects. Automating the testing of your SSIS packages can save you a lot of time. What’s more important, automated testing is more reliable because it’s done continuously and you can cover more packages. Once you have written tests, you can always run them during automated build processes. In the end, this means fewer errors and better quality.
Pavle Gudurić is a software engineer based in Germany. He has a master's degree in e-business and several technical certifications, and he develops business intelligence (BI) solutions in the finance industry. Reach him at [email protected].
Thanks to the following technical experts for reviewing this article: Christian Landgrebe (LPA) and Andrew Oakley (Microsoft) Christian Landgrebe leads the database team at LPA, focused on delivering BI solutions to clients in the financial and banking industry. Andrew Oakley is a Senior Program Manager on the patterns & practices team. Prior to becoming a Program Manager, Andrew spent two years as a Technical Evangelist for Visual Studio and the .NET platform. His current project focuses on data access guidance around building polyglot persistent systems using relational and NoSQL data stores.
Great apps—like great artists, actors, and athletes—are great performers. To validate and continually improve their performances, human beings undergo extensive training, practice, rehearsals, and after-the-fact reviews. In software development we achieve the same ends through many levels of rigorous testing across the entire application lifecycle.
Chances are you’ve heard about unit testing, but you might not be clear on what it is, exactly. This primer, then, provides a background understanding of the nature of unit testing and the terminology often used in its context.
Difference between unit testing and other types of testing
Most forms of testing, such as functional testing, UI/acceptance testing, performance profiling, integration testing, security testing and usability testing, work with an app that's been built and deployed to emulators or test devices. Testing then happens through the app's user interface, either manually or through scripts and other automation tools. Such testing, which has no knowledge of the app's code, is referred to as “black box” testing.
Unit testing, on the other hand, works directly against the app's code, not its user interface. Unit tests validate the behaviors of units of code, namely your classes, methods, properties, and functions; as such, unit tests are fully aware of how the code under test works, making unit testing an example of “white box” testing. Each unit test is itself a piece of code, and although you typically write tests in the same language as the app code, this isn't technically required.
Unit testing also occurs earliest in the application lifecycle: you can write tests alongside the app code, or even before any code is written (the approach of test-driven development). Unit testing is also unique in that it happens on development and/or build machines outside the context of mobile platforms, emulators, and app packages. Instead, a test runner feeds the code being tested along with the unit test code into a standalone runtime. For apps written using Apache Cordova, this means working with an independent JavaScript runtime.
The following figure illustrates how app testing (left) is different from unit testing (right).
Be mindful! The diagram above shows how unit tests are run in an environment that's different from the actual mobile platform. This means two things. First, there can be small behavioral differences between the runtimes. Second, platform APIs will not be available in the test environment. How, then, do you effectively test code that calls those platform APIs? See the topic Using mocks for platform APIs, plugins, and other external dependencies.
Consider a simple calculator app. To test its adding function, a UI test script effectively presses the 2, +, 3, and = buttons in the app's UI, then checks that the output control in the UI displays the number 5, as illustrated below.
Somewhere under the covers, of course, the app will have some kind of addition function, as shown below, that’s being invoked through this kind of UI interaction:
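The function itself isn't reproduced here; a minimal version, using the doAdditionOperation name from the discussion that follows, might be:

```javascript
// A deliberately trivial addition function like the one discussed in the text.
function doAdditionOperation(a, b) {
    return a + b;
}

// Example: doAdditionOperation(2, 3) returns 5
```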
Note
You wouldn't likely have such a trivial addition function in a real app, of course, because the + operator has been thoroughly tested by the authors of the runtime and wouldn't need to be unit-tested further. Later on we'll use a more realistic example; for now, this function suffices to explain the fundamentals.
A unit test, then, is a separate piece of code that makes a direct call to this function, passing values like 2 and 3 as parameters and checking that the return value is 5. A typical unit test might look like this, where it has a name and a description of what’s being tested, which is essential for reporting:
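The original listing is omitted; here's a sketch of such a test, with a name and a description as mentioned above. It uses a tiny framework-free helper so the example is self-contained; a real project would use a runner such as Jasmine, QUnit or Mocha instead:

```javascript
// The function under test (repeated here so the example stands alone).
function doAdditionOperation(a, b) {
    return a + b;
}

// Minimal test helper: runs a named test and reports pass/fail.
function test(name, description, fn) {
    try {
        fn();
        console.log("PASS: " + name + " - " + description);
        return true;
    } catch (e) {
        console.log("FAIL: " + name + " - " + e.message);
        return false;
    }
}

// A unit test with a name and a description of what's being tested.
var passed = test(
    "doAdditionOperation_2_plus_3",
    "adds two positive integers and returns their sum",
    function () {
        var result = doAdditionOperation(2, 3);
        if (result !== 5) {
            throw new Error("expected 5 but got " + result);
        }
    });
```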
The unit test knows how the doAdditionOperation function should behave, so it's written to validate that behavior for a specific case. You'd also write other unit tests for this same function to make sure you adequately cover boundary conditions and special cases. (UI tests, of course, typically exercise boundary cases as well.) As a whole, your collection of unit tests for any given unit should fully exercise every unique code path in that unit to achieve 100 percent code coverage. And it should be obvious that if the unit is modified to allow more code paths, you'll want to write additional tests to cover them.
Unit tests for a library belong with the library: In your projects, invest in unit tests for your own app code, but not for libraries. Those libraries should already have their own unit tests within their own repository alongside the library source code. If a library lacks such tests, work with the author or contribute to the open-source project to add them. This way everyone benefits from your contribution, and app developers can use the library with full confidence.
Using unit tests and continuous integration to minimize the cost of bugs
Because unit tests are just pieces of code calling other pieces of code, they can (and should) run very quickly; a whole suite of unit tests typically runs in a matter of seconds. This characteristic makes it easy for a developer to run a test suite against their work-in-progress to quickly detect any unintended changes in behavior—called regressions—prior to committing code to their team’s version control repository.
Unit tests are also often run as part of an automated build process as a way to validate committed code. This practice, known as continuous integration (CI), validates that the newly committed code both builds properly and doesn't introduce any regressions. (Team Foundation Server also offers gated check-in, where code is built and tested before being committed.)
Continuous integration helps to catch bugs very early on in the development cycle and minimize the cost of fixing them. Because each unit test is (ideally) written to test a single case, any given test failure identifies exactly what parameters caused a failure in which unit of code. This means the developer can go immediately to that code unit, review the change history, and find the regression. There’s no guesswork about where to look, and because unit testing happens very near in time to changes in the code, the developer doesn’t spend much time getting familiar with that code again.
In contrast, such an error might be revealed during UI testing, assuming the UI tests themselves are comprehensive. The error could also manifest in any number of ways: bad output, incorrect data being sent to a backend (causing a failure there), or an app crash, for example. The developer assigned to the bug report then has to spend quite a bit of time tracing the problem back to the offending piece of code, and the bug might even pass through the hands of multiple developers before it reaches the right person. Even then, the error might have been checked in days or weeks earlier, so the developer has to shift his or her attention away from current tasks to investigate and fix the bug.
As the image below shows graphically, the cost of fixing bugs increases the further along you are in the app lifecycle, and goes up dramatically once the app becomes publicly available. The cost of a bug takes a big jump at public release because at that point it may include public relations, ratings and reviews, customer retention, potential legal liability, and having to prioritize critical fixes in near-term updates over feature or performance improvements that would gain new customers. Put another way, handling bugs in released versions of an app is often an exercise in damage control, which is expensive!