
What is the Real Performance Impact of Triggers?

Date: January 25, 2013 | By: Guy Glantser

Most DBAs and developers don't like triggers. I know some people who wouldn't even consider using triggers in their applications, no matter what.

Why is that? What is so bad about triggers?

There are several reasons, but when I ask people why they don't like triggers, the answer I hear most often is performance. Most people believe that triggers affect performance dramatically. Is that so?

I have recently been involved in the development of a new database for one of our customers. During the design phase, the trigger question came up again. We wanted to have a column that stores the last update date & time of each row, and we had a discussion about whether to update that column by using a trigger or by explicitly updating it as part of every update against the table.

A trigger really makes sense in this case, because it will always update the column, and it's transparent to the application. Once the trigger is up and running, we don't have to worry about updating that column anymore. Without the trigger, every developer must remember to update that column in every new piece of code that updates the table. And what about the DBA who needs to manually perform an update against the table in the middle of the night because of some critical bug? Do you think he will remember to update that column too? Chances are that if we check that column 2-3 years from now, we will find out that it is unreliable.

So I voted for using a trigger. But the customer was worried about performance, and he insisted that we not use triggers, because they hurt performance. At that point I decided it was time to run a benchmark and find out what the real performance impact of triggers is.

First, I created a test database and a test table:

USE
	master;
GO
 
-- Create the "TriggerTest" database
 
IF
	DB_ID (N'TriggerTest') IS NOT NULL
BEGIN
 
	ALTER DATABASE
		TriggerTest
	SET
		SINGLE_USER
	WITH
		ROLLBACK IMMEDIATE;
 
	DROP DATABASE
		TriggerTest;
 
END;
GO
 
CREATE DATABASE
	TriggerTest;
GO
 
ALTER DATABASE
	TriggerTest
SET
	RECOVERY SIMPLE;
GO
 
-- Create the "Billing.InvoiceDrafts" table
 
USE
	TriggerTest;
GO
 
CREATE SCHEMA
	Billing;
GO
 
CREATE TABLE
	Billing.InvoiceDrafts
(
	Id					INT				NOT NULL	IDENTITY(1,1) ,
	InvoiceNumber		INT				NOT NULL ,
	CustomerName		NVARCHAR(100)	NOT NULL ,
	CustomerAddress		NVARCHAR(500)	NULL ,
	InvoiceDescription	NVARCHAR(200)	NULL ,
	AmountBeforeVAT		DECIMAL(19,2)	NOT NULL ,
	VATAmount			DECIMAL(19,2)	NOT NULL ,
	IssueDateTime		DATE			NOT NULL ,
	InvoiceStatusId		TINYINT			NOT NULL ,
	LastUpdateDateTime	DATETIME2(7)	NOT NULL	DEFAULT (SYSDATETIME ()) ,
 
	CONSTRAINT
		pk_InvoiceDrafts_c_Id
	PRIMARY KEY CLUSTERED
		(Id ASC)
);
GO
 
-- Populate the "Billing.InvoiceDrafts" table with 1,000,000 rows
 
INSERT INTO
	Billing.InvoiceDrafts WITH (TABLOCK)
(
	InvoiceNumber ,
	CustomerName ,
	CustomerAddress ,
	InvoiceDescription ,
	AmountBeforeVAT ,
	VATAmount ,
	IssueDateTime ,
	InvoiceStatusId
)
SELECT TOP (1000000)
	InvoiceNumber		= ABS (CHECKSUM (NEWID ())) % 10000000 + 1 ,
	CustomerName		= N'Customer Name - ' + CAST ((ABS (CHECKSUM (NEWID ())) % 1000000 + 1) AS NVARCHAR(100)) ,
	CustomerAddress		= N'Customer Address - ' + CAST ((ABS (CHECKSUM (NEWID ())) % 1000000 + 1) AS NVARCHAR(500)) ,
	InvoiceDescription	= N'Invoice Description - ' + CAST ((ABS (CHECKSUM (NEWID ())) % 1000000 + 1) AS NVARCHAR(200)) ,
	AmountBeforeVAT		= CAST ((CAST ((ABS (CHECKSUM (NEWID ())) % 100000 + 1) AS DECIMAL(19,2)) / 100.0) AS DECIMAL(19,2)) ,
	VATAmount			= CAST ((CAST ((ABS (CHECKSUM (NEWID ())) % 100000 + 1) AS DECIMAL(19,2)) / 100.0) AS DECIMAL(19,2)) ,
	IssueDateTime		= DATEADD (SECOND , - ABS (CHECKSUM (NEWID ())) % (60 * 60 * 24 * 365) , SYSDATETIME ()) ,
	InvoiceStatusId		= CAST ((ABS (CHECKSUM (NEWID ())) % 5 + 1) AS TINYINT)
FROM
	sys.all_columns AS T1
CROSS JOIN
	sys.all_columns AS T2;
GO

Next, for the sake of the benchmark, I retrieved the whole table in order to make sure it's all in cache (I have enough memory on my laptop):

-- Retrieve all the rows from the table to make sure they are all in cache
 
SELECT
	Id ,
	InvoiceNumber ,
	CustomerName ,
	CustomerAddress ,
	InvoiceDescription ,
	AmountBeforeVAT ,
	VATAmount ,
	IssueDateTime ,
	InvoiceStatusId ,
	LastUpdateDateTime
FROM
	Billing.InvoiceDrafts
ORDER BY
	Id ASC;
GO

In this example, I want the "LastUpdateDateTime" column to be updated to SYSDATETIME () whenever an update is made to any other column in the same row. One of the scenarios is to update the status of the invoice ("InvoiceStatusId"), and this will be the scenario for our benchmark.
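
For example, this is the kind of statement the application will issue (the Id and status values here are hypothetical, for illustration only), and the question is how the "LastUpdateDateTime" column gets maintained along with it:

-- Illustration only - a typical status update with hypothetical values
 
UPDATE
	Billing.InvoiceDrafts
SET
	InvoiceStatusId = 3
WHERE
	Id = 42;
GO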

For the sake of the benchmark, I also created a table to store the benchmark results and a stored procedure that runs the benchmark, once for each method:

-- Prepare the benchmark stuff
 
CREATE TABLE
	dbo.BenchmarkResults
(
	Id							INT		NOT NULL	IDENTITY(1,1) ,
	ProcedureName				SYSNAME	NOT NULL ,
	ExecutionTimeInMicroseconds	INT		NOT NULL ,
 
	CONSTRAINT
		pk_BenchmarkResults_c_Id
	PRIMARY KEY CLUSTERED
		(Id ASC)
);
GO
 
CREATE PROCEDURE
	dbo.PerformBenchmark
(
	@ProcedureName		AS SYSNAME ,
	@NumberOfExecutions	AS INT
)
AS
 
SET NOCOUNT ON;
 
DECLARE
	@Statement	AS NVARCHAR(MAX)	= N'' ,
	@Params		AS NVARCHAR(MAX)	= N'';
 
SET @Statement +=
	N'
		DECLARE
			@ExecutionNumber	AS INT			= 1 ,
			@Id					AS INT ,
			@InvoiceStatusId	AS TINYINT ,
			@StartDateTime		AS DATETIME2(7) ,
			@EndDateTime		AS DATETIME2(7);
 
		WHILE
			@ExecutionNumber <= @pNumberOfExecutions
		BEGIN
 
			SET @Id					= ABS (CHECKSUM (NEWID ())) % 1000000 + 1;
			SET @InvoiceStatusId	= CAST ((ABS (CHECKSUM (NEWID ())) % 5 + 1) AS TINYINT);
 
			SET @StartDateTime = SYSDATETIME ();
 
			EXECUTE Billing.' + QUOTENAME (@ProcedureName) + N'
				@Id					= @Id ,
				@InvoiceStatusId	= @InvoiceStatusId;
 
			SET @EndDateTime = SYSDATETIME ();
 
			INSERT INTO
				dbo.BenchmarkResults
			(
				ProcedureName ,
				ExecutionTimeInMicroseconds
			)
			VALUES
			(
				N''' + @ProcedureName + N''' ,
				DATEDIFF (MICROSECOND , @StartDateTime , @EndDateTime)
			);
 
			SET @ExecutionNumber += 1;
 
		END;
	';
 
SET @Params += N'@pNumberOfExecutions AS INT';
 
EXECUTE sys.sp_executesql
	@statement				= @Statement ,
	@params					= @Params ,
	@pNumberOfExecutions	= @NumberOfExecutions;
GO

Let's try the trigger first. Here is the code to create the trigger:

-- Create a trigger to update the "LastUpdateDateTime" column
 
CREATE TRIGGER
	Billing.trg_InvoiceDrafts_a_u_UpdateLastUpdateDateTime
ON
	Billing.InvoiceDrafts
AFTER
	UPDATE
AS
 
SET NOCOUNT ON;
 
UPDATE
	Billing.InvoiceDrafts
SET
	LastUpdateDateTime = SYSDATETIME ()
WHERE
	Id IN
		(
			SELECT
				Id
			FROM
				inserted
		);
GO

And here is the stored procedure that updates the invoice status without explicitly updating the "LastUpdateDateTime" column (the trigger takes care of that):

CREATE PROCEDURE
	Billing.usp_UpdateInvoiceStatusId_DoNotUpdateLastUpdateDateTime
(
	@Id					AS INT ,
	@InvoiceStatusId	AS TINYINT
)
AS
 
SET NOCOUNT ON;
 
UPDATE
	Billing.InvoiceDrafts
SET
	InvoiceStatusId = @InvoiceStatusId
WHERE
	Id = @Id;
GO

Let's execute this stored procedure once to let it compile and have an execution plan in cache:

-- Execute the "Billing.usp_UpdateInvoiceStatusId_DoNotUpdateLastUpdateDateTime" stored procedure once in order to make sure it's in cache
 
DECLARE
	@Id					AS INT		= ABS (CHECKSUM (NEWID ())) % 1000000 + 1 ,
	@InvoiceStatusId	AS TINYINT	= CAST ((ABS (CHECKSUM (NEWID ())) % 5 + 1) AS TINYINT);
 
EXECUTE Billing.usp_UpdateInvoiceStatusId_DoNotUpdateLastUpdateDateTime
	@Id					= @Id ,
	@InvoiceStatusId	= @InvoiceStatusId;
GO

Now we're ready for the first run of the benchmark. Let’s run the stored procedure 10,000 times with random data and check the average execution time. Remember that the trigger updates the "LastUpdateDateTime" column behind the scenes and contributes to the execution time of the stored procedure…

-- Run the first test - "Billing.usp_UpdateInvoiceStatusId_DoNotUpdateLastUpdateDateTime"
 
EXECUTE dbo.PerformBenchmark
	@ProcedureName		= N'usp_UpdateInvoiceStatusId_DoNotUpdateLastUpdateDateTime' ,
	@NumberOfExecutions	= 10000;
GO
 
SELECT
	ProcedureName						= ProcedureName ,
	AverageExecutionTimeInMicroseconds	= AVG (ExecutionTimeInMicroseconds)
FROM
	dbo.BenchmarkResults
GROUP BY
	ProcedureName
ORDER BY
	AverageExecutionTimeInMicroseconds ASC;
GO

On my laptop, the average execution time of this stored procedure was 7,558 microseconds.

Now let's try the other method, without the trigger.

First, we need to drop the trigger:

-- Drop the trigger
 
DROP TRIGGER
	Billing.trg_InvoiceDrafts_a_u_UpdateLastUpdateDateTime;
GO

Now, let's create another stored procedure that updates the invoice status and the last update date & time together in the same UPDATE statement:

-- Create a stored procedure that updates the "InvoiceStatusId" column for a specific invoice draft
-- This stored procedure also updates the "LastUpdateDateTime" column within the same UPDATE statement
 
CREATE PROCEDURE
	Billing.usp_UpdateInvoiceStatusId_UpdateLastUpdateDateTimeInSameStatement
(
	@Id					AS INT ,
	@InvoiceStatusId	AS TINYINT
)
AS
 
SET NOCOUNT ON;
 
UPDATE
	Billing.InvoiceDrafts
SET
	InvoiceStatusId		= @InvoiceStatusId ,
	LastUpdateDateTime	= SYSDATETIME ()
WHERE
	Id = @Id;
GO

Again, let's execute the stored procedure once to let it compile and have an execution plan in cache:

-- Execute the stored procedure once in order to make sure it's in cache
 
DECLARE
	@Id					AS INT		= ABS (CHECKSUM (NEWID ())) % 1000000 + 1 ,
	@InvoiceStatusId	AS TINYINT	= CAST ((ABS (CHECKSUM (NEWID ())) % 5 + 1) AS TINYINT);
 
EXECUTE Billing.usp_UpdateInvoiceStatusId_UpdateLastUpdateDateTimeInSameStatement
	@Id					= @Id ,
	@InvoiceStatusId	= @InvoiceStatusId;
GO

Now we're ready for the second benchmark run:

-- Run the second test - "Billing.usp_UpdateInvoiceStatusId_UpdateLastUpdateDateTimeInSameStatement"
 
EXECUTE dbo.PerformBenchmark
	@ProcedureName		= N'usp_UpdateInvoiceStatusId_UpdateLastUpdateDateTimeInSameStatement' ,
	@NumberOfExecutions	= 10000;
GO
 
SELECT
	ProcedureName						= ProcedureName ,
	AverageExecutionTimeInMicroseconds	= AVG (ExecutionTimeInMicroseconds)
FROM
	dbo.BenchmarkResults
GROUP BY
	ProcedureName
ORDER BY
	AverageExecutionTimeInMicroseconds ASC;
GO

The average execution time in this case was 2,747 microseconds. This is about a third of the execution time of the trigger method, which is quite an impressive improvement. So the trigger does have a dramatic impact on performance. But why is that? Is it because of the trigger itself or because of something else?

One of the differences between the two methods is that in the first method we actually performed two UPDATE statements each time, while in the second method we performed only a single UPDATE statement. So maybe the second option is faster because we access the table only once, regardless of the trigger. Let’s check…

The following stored procedure updates the two columns in two separate UPDATE statements:

-- Create a stored procedure that updates the "InvoiceStatusId" column for a specific invoice draft
-- This stored procedure also updates the "LastUpdateDateTime" column in a separate UPDATE statement
 
CREATE PROCEDURE
	Billing.usp_UpdateInvoiceStatusId_UpdateLastUpdateDateTimeInSeparateStatement
(
	@Id					AS INT ,
	@InvoiceStatusId	AS TINYINT
)
AS
 
SET NOCOUNT ON;
 
UPDATE
	Billing.InvoiceDrafts
SET
	InvoiceStatusId = @InvoiceStatusId
WHERE
	Id = @Id;
 
UPDATE
	Billing.InvoiceDrafts
SET
	LastUpdateDateTime = SYSDATETIME ()
WHERE
	Id = @Id;
GO

Let's compare this stored procedure to the one with the trigger and see what the impact of the trigger really is…

-- Execute the stored procedure once in order to make sure it's in cache
 
DECLARE
	@Id					AS INT		= ABS (CHECKSUM (NEWID ())) % 1000000 + 1 ,
	@InvoiceStatusId	AS TINYINT	= CAST ((ABS (CHECKSUM (NEWID ())) % 5 + 1) AS TINYINT);
 
EXECUTE Billing.usp_UpdateInvoiceStatusId_UpdateLastUpdateDateTimeInSeparateStatement
	@Id					= @Id ,
	@InvoiceStatusId	= @InvoiceStatusId;
GO
 
-- Run the third test - "Billing.usp_UpdateInvoiceStatusId_UpdateLastUpdateDateTimeInSeparateStatement"
 
EXECUTE dbo.PerformBenchmark
	@ProcedureName		= N'usp_UpdateInvoiceStatusId_UpdateLastUpdateDateTimeInSeparateStatement' ,
	@NumberOfExecutions	= 10000;
GO
 
SELECT
	ProcedureName						= ProcedureName ,
	AverageExecutionTimeInMicroseconds	= AVG (ExecutionTimeInMicroseconds)
FROM
	dbo.BenchmarkResults
GROUP BY
	ProcedureName
ORDER BY
	AverageExecutionTimeInMicroseconds ASC;
GO

The average execution time in this case was 4,434 microseconds. This falls between the trigger method and the single UPDATE statement method. So part of the performance impact of the trigger is due to the additional UPDATE statement, and part is due to the execution of the trigger itself.

There is another difference between the trigger method and the two UPDATE statements method. With the trigger, the two statements are implicitly executed under a single transaction, because a trigger always runs within the transaction of the statement that fired it. In the two statements method (without the trigger), each statement runs in its own transaction (assuming your session runs in auto-commit mode, which is the default).
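
A quick way to see this (an illustrative sketch, not part of the benchmark): @@TRANCOUNT inside a trigger is always at least 1, because the transaction of the statement that fired the trigger is still open while the trigger runs:

-- Illustration only - a temporary trigger that prints @@TRANCOUNT, and a dummy update to fire it
 
CREATE TRIGGER
	Billing.trg_InvoiceDrafts_a_u_ShowTranCount
ON
	Billing.InvoiceDrafts
AFTER
	UPDATE
AS
 
	PRINT N'@@TRANCOUNT inside the trigger: ' + CAST (@@TRANCOUNT AS NVARCHAR(10));
GO
 
UPDATE
	Billing.InvoiceDrafts
SET
	InvoiceStatusId = InvoiceStatusId
WHERE
	Id = 1;		-- Prints: @@TRANCOUNT inside the trigger: 1
GO
 
-- Clean up, so the demo trigger doesn't affect the next benchmark run
 
DROP TRIGGER
	Billing.trg_InvoiceDrafts_a_u_ShowTranCount;
GO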

So our last test uses two UPDATE statements inside a single explicit transaction. This test resembles the trigger method as closely as possible, so the remaining difference in performance is due only to the execution of the trigger itself.

Here is the test:

-- Create a stored procedure that updates the "InvoiceStatusId" column for a specific invoice draft
-- This stored procedure also updates the "LastUpdateDateTime" column in a separate UPDATE statement but in the same transaction
 
CREATE PROCEDURE
	Billing.usp_UpdateInvoiceStatusId_UpdateLastUpdateDateTimeInSeparateStatementInSameTransaction
(
	@Id					AS INT ,
	@InvoiceStatusId	AS TINYINT
)
AS
 
SET NOCOUNT ON;
 
BEGIN TRANSACTION
	UpdateInvoiceStatusId;
 
UPDATE
	Billing.InvoiceDrafts
SET
	InvoiceStatusId = @InvoiceStatusId
WHERE
	Id = @Id;
 
UPDATE
	Billing.InvoiceDrafts
SET
	LastUpdateDateTime = SYSDATETIME ()
WHERE
	Id = @Id;
 
COMMIT TRANSACTION
	UpdateInvoiceStatusId;
GO
 
-- Execute the stored procedure once in order to make sure it's in cache
 
DECLARE
	@Id					AS INT		= ABS (CHECKSUM (NEWID ())) % 1000000 + 1 ,
	@InvoiceStatusId	AS TINYINT	= CAST ((ABS (CHECKSUM (NEWID ())) % 5 + 1) AS TINYINT);
 
EXECUTE Billing.usp_UpdateInvoiceStatusId_UpdateLastUpdateDateTimeInSeparateStatementInSameTransaction
	@Id					= @Id ,
	@InvoiceStatusId	= @InvoiceStatusId;
GO
 
-- Run the fourth test - "Billing.usp_UpdateInvoiceStatusId_UpdateLastUpdateDateTimeInSeparateStatementInSameTransaction"
 
EXECUTE dbo.PerformBenchmark
	@ProcedureName		= N'usp_UpdateInvoiceStatusId_UpdateLastUpdateDateTimeInSeparateStatementInSameTransaction' ,
	@NumberOfExecutions	= 10000;
GO
 
SELECT
	ProcedureName						= ProcedureName ,
	AverageExecutionTimeInMicroseconds	= AVG (ExecutionTimeInMicroseconds)
FROM
	dbo.BenchmarkResults
GROUP BY
	ProcedureName
ORDER BY
	AverageExecutionTimeInMicroseconds ASC;
GO

The average execution time in this case was 3,488 microseconds.

Here is a summary of the 4 methods we used:

Method                                      Average Execution Time (Microseconds)
Single Statement                            2,747
Two Statements in a Single Transaction      3,488
Two Statements in Separate Transactions     4,434
Trigger                                     7,558
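
These relative figures can also be computed directly from the benchmark results. Here is one way to do it (a sketch that reuses the dbo.BenchmarkResults table defined above and shows each method's overhead relative to the fastest one):

-- Compute each method's overhead relative to the fastest method
 
SELECT
	ProcedureName						= ProcedureName ,
	AverageExecutionTimeInMicroseconds	= AVG (ExecutionTimeInMicroseconds) ,
	OverheadVsFastestMethodInPercent	=
		CAST ((AVG (ExecutionTimeInMicroseconds) * 100.0 / MIN (AVG (ExecutionTimeInMicroseconds)) OVER ()) - 100.0 AS DECIMAL(10,1))
FROM
	dbo.BenchmarkResults
GROUP BY
	ProcedureName
ORDER BY
	AverageExecutionTimeInMicroseconds ASC;
GO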

Notice that when we added an explicit transaction to run the two statements together, the execution time dropped by more than 20% (from 4,434 microseconds to 3,488 microseconds). This is because there is one less transaction to deal with. Starting and ending a transaction involves a lot of overhead, mainly in the transaction log.
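
One rough way to observe this overhead is with the undocumented fn_dblog function, which reads the active portion of the transaction log. This is an illustrative sketch only (fn_dblog is unsupported, and a few background log records may get included in the count): under the SIMPLE recovery model, a CHECKPOINT clears most of the log, and then the transaction-begin and commit records generated by a single procedure execution can be counted and compared between the methods:

-- Illustration only - count transaction-begin/commit log records generated by one execution
 
CHECKPOINT;
GO
 
EXECUTE Billing.usp_UpdateInvoiceStatusId_UpdateLastUpdateDateTimeInSeparateStatement
	@Id					= 1 ,
	@InvoiceStatusId	= 1;
GO
 
SELECT
	NumberOfBeginAndCommitRecords = COUNT (*)
FROM
	fn_dblog (NULL , NULL)
WHERE
	Operation IN (N'LOP_BEGIN_XACT' , N'LOP_COMMIT_XACT');
GO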

It's also important to remember that this benchmark was performed with a single thread while no other workload was running against the database. So this benchmark doesn't take into account locking and blocking. But it still provides a rough measure of the impact of using a trigger to perform an update.

So based on this benchmark, the actual impact of the trigger is around 117% (7,558 microseconds for the trigger method versus 3,488 microseconds for its closest equivalent, two statements in a single transaction), which is definitely a huge impact. This means that the customer was right to be worried about the performance impact of the trigger. It also means that all those people saying triggers are bad for performance are right as well.

But does it mean we shouldn't use triggers? Of course not. It's a tradeoff between performance on one hand and ease of development and reliability on the other. If your table gets updated dozens of times per second and your system is already scratching the limits, then performance is too important to let triggers do their bad things. But if you can afford the performance hit (after all, we're talking about a difference of around 4 milliseconds in this case), then I prefer easier development and increased reliability.

Fortunately, I managed to convince the customer in this case, and we used the trigger to update the column. The system is still in its final development phases and hasn't been released yet, so I can't tell you how well (or not) it performs. But we'll find out soon…

Download the script I used throughout this blog post:




Reader Comments

  1. By an anonymous user (unidentified):

    That is correct. The information was useful to me.

  2. By Noam Stein:

    Guy,
    Thanks for a clear overview of this important topic.
    We always knew that triggers have some impact, but here we see the full picture.

  3. By Omri:

    Your style of writing and your explanations really simplify the subject. Thanks.
    This example really helped me understand triggers.

    PS: At first I was sure you would be able to prove otherwise to the customer. I was wrong :)

  4. By Alex Friedman:

    Nice! It really shows it's not just an extra write, but actual additional overhead.

    I usually avoid triggers because of both the performance hit and the hidden functionality.

    Somewhere down the road people forget about the trigger, and then it takes a long time to figure out that the weird issue at hand is caused by a forgotten trigger.

    Still, as you pointed out, it's sometimes the best solution.
