Slow While Loop, Query Improvment AssistanceQuery to normalize table/combine row textIndexing -...
Why exactly do action photographers need high fps burst cameras?
Is there any risk in sharing info about technologies and products we use with a supplier?
Crontab: Ubuntu running script (noob)
Move fast ...... Or you will lose
How does Leonard in "Memento" remember reading and writing?
Finding a logistic regression model which can achieve zero error on a training set training data for a binary classification problem with two features
Alien invasion to probe us, why?
Why is Agricola named as such?
Why is it that Bernie Sanders is always called a "socialist"?
A curious equality of integrals involving the prime counting function?
Which communication protocol is used in AdLib sound card?
Why zero tolerance on nudity in space?
Why are the books in the Game of Thrones citadel library shelved spine inwards?
Consequences of lack of rigour
When can a QA tester start his job?
Why would space fleets be aligned?
How can a large fleets maintain formation in interstellar space?
How to use Mathematica to do a complex integrate with poles in real axis?
How can I get my players to come to the game session after agreeing to a date?
How can prove this integral
Do authors have to be politically correct in article-writing?
How do you voice extended chords?
How can my powered armor quickly replace its ceramic plates?
Is Krishna the only avatar among dashavatara who had more than one wife?
Slow While Loop, Query Improvment Assistance
Query to normalize table/combine row textIndexing - Uniqueidentifier Foreign Key or Intermediary mapping table?Parent-Child Tree Hierarchical ORDEROracle GoldenGate add trandata errorsSHOWPLAN does not display a warning but “Include Execution Plan” does for the same queryBest approach to have a Live copy of a table in the same databaseQuery Performance - Why is a query slower when there are two criteria on the same column?deteriorating stored procedure running timesSelect Into removes IDENTITY property from target tableChanging primary key from IDENTITY to being persisted Computed column using COALESCE
I am working on creating a Datawarehouse.
I have created a Time Dimension (Dim_Time), at 5-minute intervals. Hour Aggregations will have [Minutes] = NULL.
For the purpose of this example:
CREATE TABLE [dbo].[Dim_Time](
[TimeID] [int] IDENTITY(1,1) NOT NULL,
[StartDateTime] [datetime] NULL,
[Hour] [int] NULL,
[Minute] [int] NULL,
CONSTRAINT [PK_Dim_Time] PRIMARY KEY CLUSTERED
([TimeID] ASC)
) ON [PRIMARY]
GO
Then I have an Incoming Table, which is updated every 5 minutes from the OLTP Database.
CREATE TABLE [dbo].[Stg_IncomingQueue](
[IncomingID] [int] IDENTITY(1,1) NOT NULL,
[CustomerID] [int] NOT NULL,
[TimeID] [int] NULL,
[InsertTime] [datetime] NULL,
CONSTRAINT [PK_IncomingQueueMonitor] PRIMARY KEY CLUSTERED
([IncomingID] ASC)
) ON [PRIMARY]
GO
I then have the following While loop. It's purpose is to get the correct 5-Minute time slot (TimeID) that relates to a particular incoming row:
WHILE 0 < (SELECT COUNT(*) FROM [dba_local].[dbo].[Stg_IncomingQueue] WHERE TimeID IS NULL)
BEGIN
SELECT TOP 1 @IncomingID = IncomingID, @RowInserTime = InsertTime
FROM [dba_local].[dbo].[Stg_IncomingQueue] WHERE TimeID IS NULL
;WITH DimTime
AS (
SELECT MAX(TimeID) AS MaxTimeID FROM [dba_local].[dbo].[Dim_Time]
WHERE StartDateTime < @RowInserTime AND [Minute] IS NOT NULL
)
UPDATE [dba_local].[dbo].[Stg_IncomingQueue]
SET TimeID = (SELECT MaxTimeID FROM DimTime)
WHERE IncomingID = @IncomingID
END
It's such a simple process, and yet I cannot figure out a simpler way to update the TimeID. As per the CTE SELECT in the loop, I need to get the MAX(TimeID) where the StartDateTime is less then the rows InsertTime.
Because time is the only relationship, I am struggling with all options to do this in 1 query without the loop, but I feel it is possible
Please can someone help me out here either with a better option or confirming that this is the simplest way.
Thank you very much for your time and assistance.
Wade
sql-server t-sql sql-server-2016
add a comment |
I am working on creating a Datawarehouse.
I have created a Time Dimension (Dim_Time), at 5-minute intervals. Hour Aggregations will have [Minutes] = NULL.
For the purpose of this example:
CREATE TABLE [dbo].[Dim_Time](
[TimeID] [int] IDENTITY(1,1) NOT NULL,
[StartDateTime] [datetime] NULL,
[Hour] [int] NULL,
[Minute] [int] NULL,
CONSTRAINT [PK_Dim_Time] PRIMARY KEY CLUSTERED
([TimeID] ASC)
) ON [PRIMARY]
GO
Then I have an Incoming Table, which is updated every 5 minutes from the OLTP Database.
CREATE TABLE [dbo].[Stg_IncomingQueue](
[IncomingID] [int] IDENTITY(1,1) NOT NULL,
[CustomerID] [int] NOT NULL,
[TimeID] [int] NULL,
[InsertTime] [datetime] NULL,
CONSTRAINT [PK_IncomingQueueMonitor] PRIMARY KEY CLUSTERED
([IncomingID] ASC)
) ON [PRIMARY]
GO
I then have the following While loop. It's purpose is to get the correct 5-Minute time slot (TimeID) that relates to a particular incoming row:
WHILE 0 < (SELECT COUNT(*) FROM [dba_local].[dbo].[Stg_IncomingQueue] WHERE TimeID IS NULL)
BEGIN
SELECT TOP 1 @IncomingID = IncomingID, @RowInserTime = InsertTime
FROM [dba_local].[dbo].[Stg_IncomingQueue] WHERE TimeID IS NULL
;WITH DimTime
AS (
SELECT MAX(TimeID) AS MaxTimeID FROM [dba_local].[dbo].[Dim_Time]
WHERE StartDateTime < @RowInserTime AND [Minute] IS NOT NULL
)
UPDATE [dba_local].[dbo].[Stg_IncomingQueue]
SET TimeID = (SELECT MaxTimeID FROM DimTime)
WHERE IncomingID = @IncomingID
END
It's such a simple process, and yet I cannot figure out a simpler way to update the TimeID. As per the CTE SELECT in the loop, I need to get the MAX(TimeID) where the StartDateTime is less then the rows InsertTime.
Because time is the only relationship, I am struggling with all options to do this in 1 query without the loop, but I feel it is possible
Please can someone help me out here either with a better option or confirming that this is the simplest way.
Thank you very much for your time and assistance.
Wade
sql-server t-sql sql-server-2016
1
Why is this a while loop in the first place? When you think "I need to do x to each row" try to change your thinking to "I need to do x to all of the rows."
– Aaron Bertrand♦
24 mins ago
add a comment |
I am working on creating a Datawarehouse.
I have created a Time Dimension (Dim_Time), at 5-minute intervals. Hour Aggregations will have [Minutes] = NULL.
For the purpose of this example:
CREATE TABLE [dbo].[Dim_Time](
[TimeID] [int] IDENTITY(1,1) NOT NULL,
[StartDateTime] [datetime] NULL,
[Hour] [int] NULL,
[Minute] [int] NULL,
CONSTRAINT [PK_Dim_Time] PRIMARY KEY CLUSTERED
([TimeID] ASC)
) ON [PRIMARY]
GO
Then I have an Incoming Table, which is updated every 5 minutes from the OLTP Database.
CREATE TABLE [dbo].[Stg_IncomingQueue](
[IncomingID] [int] IDENTITY(1,1) NOT NULL,
[CustomerID] [int] NOT NULL,
[TimeID] [int] NULL,
[InsertTime] [datetime] NULL,
CONSTRAINT [PK_IncomingQueueMonitor] PRIMARY KEY CLUSTERED
([IncomingID] ASC)
) ON [PRIMARY]
GO
I then have the following While loop. It's purpose is to get the correct 5-Minute time slot (TimeID) that relates to a particular incoming row:
WHILE 0 < (SELECT COUNT(*) FROM [dba_local].[dbo].[Stg_IncomingQueue] WHERE TimeID IS NULL)
BEGIN
SELECT TOP 1 @IncomingID = IncomingID, @RowInserTime = InsertTime
FROM [dba_local].[dbo].[Stg_IncomingQueue] WHERE TimeID IS NULL
;WITH DimTime
AS (
SELECT MAX(TimeID) AS MaxTimeID FROM [dba_local].[dbo].[Dim_Time]
WHERE StartDateTime < @RowInserTime AND [Minute] IS NOT NULL
)
UPDATE [dba_local].[dbo].[Stg_IncomingQueue]
SET TimeID = (SELECT MaxTimeID FROM DimTime)
WHERE IncomingID = @IncomingID
END
It's such a simple process, and yet I cannot figure out a simpler way to update the TimeID. As per the CTE SELECT in the loop, I need to get the MAX(TimeID) where the StartDateTime is less then the rows InsertTime.
Because time is the only relationship, I am struggling with all options to do this in 1 query without the loop, but I feel it is possible
Please can someone help me out here either with a better option or confirming that this is the simplest way.
Thank you very much for your time and assistance.
Wade
sql-server t-sql sql-server-2016
I am working on creating a Datawarehouse.
I have created a Time Dimension (Dim_Time), at 5-minute intervals. Hour Aggregations will have [Minutes] = NULL.
For the purpose of this example:
CREATE TABLE [dbo].[Dim_Time](
[TimeID] [int] IDENTITY(1,1) NOT NULL,
[StartDateTime] [datetime] NULL,
[Hour] [int] NULL,
[Minute] [int] NULL,
CONSTRAINT [PK_Dim_Time] PRIMARY KEY CLUSTERED
([TimeID] ASC)
) ON [PRIMARY]
GO
Then I have an Incoming Table, which is updated every 5 minutes from the OLTP Database.
CREATE TABLE [dbo].[Stg_IncomingQueue](
[IncomingID] [int] IDENTITY(1,1) NOT NULL,
[CustomerID] [int] NOT NULL,
[TimeID] [int] NULL,
[InsertTime] [datetime] NULL,
CONSTRAINT [PK_IncomingQueueMonitor] PRIMARY KEY CLUSTERED
([IncomingID] ASC)
) ON [PRIMARY]
GO
I then have the following While loop. It's purpose is to get the correct 5-Minute time slot (TimeID) that relates to a particular incoming row:
WHILE 0 < (SELECT COUNT(*) FROM [dba_local].[dbo].[Stg_IncomingQueue] WHERE TimeID IS NULL)
BEGIN
SELECT TOP 1 @IncomingID = IncomingID, @RowInserTime = InsertTime
FROM [dba_local].[dbo].[Stg_IncomingQueue] WHERE TimeID IS NULL
;WITH DimTime
AS (
SELECT MAX(TimeID) AS MaxTimeID FROM [dba_local].[dbo].[Dim_Time]
WHERE StartDateTime < @RowInserTime AND [Minute] IS NOT NULL
)
UPDATE [dba_local].[dbo].[Stg_IncomingQueue]
SET TimeID = (SELECT MaxTimeID FROM DimTime)
WHERE IncomingID = @IncomingID
END
It's such a simple process, and yet I cannot figure out a simpler way to update the TimeID. As per the CTE SELECT in the loop, I need to get the MAX(TimeID) where the StartDateTime is less then the rows InsertTime.
Because time is the only relationship, I am struggling with all options to do this in 1 query without the loop, but I feel it is possible
Please can someone help me out here either with a better option or confirming that this is the simplest way.
Thank you very much for your time and assistance.
Wade
sql-server t-sql sql-server-2016
sql-server t-sql sql-server-2016
edited 23 mins ago
Aaron Bertrand♦
152k18289489
152k18289489
asked 56 mins ago
WadeHWadeH
174110
174110
1
Why is this a while loop in the first place? When you think "I need to do x to each row" try to change your thinking to "I need to do x to all of the rows."
– Aaron Bertrand♦
24 mins ago
add a comment |
1
Why is this a while loop in the first place? When you think "I need to do x to each row" try to change your thinking to "I need to do x to all of the rows."
– Aaron Bertrand♦
24 mins ago
1
1
Why is this a while loop in the first place? When you think "I need to do x to each row" try to change your thinking to "I need to do x to all of the rows."
– Aaron Bertrand♦
24 mins ago
Why is this a while loop in the first place? When you think "I need to do x to each row" try to change your thinking to "I need to do x to all of the rows."
– Aaron Bertrand♦
24 mins ago
add a comment |
1 Answer
1
active
oldest
votes
I created the following minimally complete and verifiable example, based on the two tables in your original question. It uses the LEAD T-SQL statement to obtain a time range from the dbo.Dim_Time table, which can be compared to the incoming rows quite easily.
IF OBJECT_ID(N'dbo.Stg_IncomingQueue', N'U') IS NOT NULL
DROP TABLE dbo.Stg_IncomingQueue;
IF OBJECT_ID(N'dbo.Dim_Time', N'U') IS NOT NULL
DROP TABLE dbo.Dim_time;
CREATE TABLE dbo.Dim_Time(
TimeID int IDENTITY(1,1) NOT NULL,
StartDateTime time(0) NULL,
CONSTRAINT PK_Dim_Time PRIMARY KEY CLUSTERED
(TimeID ASC)
) ON [PRIMARY]
GO
;WITH src AS
(
SELECT TOP (10) sv.number
FROM master.dbo.spt_values sv
WHERE sv.type = N'P'
ORDER BY sv.number
)
INSERT INTO dbo.Dim_Time (StartDateTime)
SELECT TOP(289) CONVERT(time(0), DATEADD(minute, (s3.number * 100 + s2.number * 10 + s1.number) * 5, CONVERT(time(0), '00:00:00')))
FROM src s1
CROSS JOIN src s2
CROSS JOIN src s3
ORDER BY s3.number * 100 + s2.number * 10 + s1.number
CREATE TABLE dbo.Stg_IncomingQueue(
IncomingID int IDENTITY(1,1) NOT NULL,
CustomerID int NOT NULL,
TimeID int NULL,
InsertTime datetime NULL,
CONSTRAINT PK_IncomingQueueMonitor PRIMARY KEY CLUSTERED
(IncomingID ASC)
) ON [PRIMARY]
GO
INSERT INTO dbo.Stg_IncomingQueue (CustomerID, InsertTime)
VALUES (1, DATEADD(SECOND, CONVERT(int, CRYPT_GEN_RANDOM(4), 0), '1901-01-01 00:00:00'))
, (2, DATEADD(SECOND, CONVERT(int, CRYPT_GEN_RANDOM(4), 0), '1901-01-01 00:00:00'))
, (3, DATEADD(SECOND, CONVERT(int, CRYPT_GEN_RANDOM(4), 0), '1901-01-01 00:00:00'))
, (4, DATEADD(SECOND, CONVERT(int, CRYPT_GEN_RANDOM(4), 0), '1901-01-01 00:00:00'));
This piece replaces your entire WHILE
loop, with a single UPDATE
statement, which is both more efficient, and easier to understand.
UPDATE dbo.Stg_IncomingQueue
SET TimeID = t.TimeID
FROM dbo.Stg_IncomingQueue iq
INNER JOIN (
SELECT dt.TimeID
, dt.StartDateTime
, EndDateTime = LEAD(dt.StartDateTime, 1) OVER (ORDER BY dt.StartDateTime)
FROM dbo.Dim_Time dt
) t ON CONVERT(time(0), iq.InsertTime) >= t.StartDateTime AND CONVERT(time(0), iq.InsertTime) < t.EndDateTime;
The results, compared side-by-side with the Dim_Time table:
SELECT *
FROM dbo.Stg_IncomingQueue iq
INNER JOIN dbo.Dim_Time dt ON iq.TimeID = dt.TimeID;
The output looks like:
╔════════════╦════════════╦════════╦═════════════════════════╦════════╦═══════════════╗
║ IncomingID ║ CustomerID ║ TimeID ║ InsertTime ║ TimeID ║ StartDateTime ║
╠════════════╬════════════╬════════╬═════════════════════════╬════════╬═══════════════╣
║ 1 ║ 1 ║ 271 ║ 1875-06-30 22:31:49.000 ║ 271 ║ 22:30:00 ║
║ 2 ║ 2 ║ 116 ║ 1857-07-01 09:38:59.000 ║ 116 ║ 09:35:00 ║
║ 3 ║ 3 ║ 218 ║ 1854-09-18 18:08:39.000 ║ 218 ║ 18:05:00 ║
║ 4 ║ 4 ║ 221 ║ 1860-05-31 18:22:25.000 ║ 221 ║ 18:20:00 ║
╚════════════╩════════════╩════════╩═════════════════════════╩════════╩═══════════════╝
Assuming there isn't a massive amount of incoming rows, this may work fairly well. Be aware, I'm using CONVERT()
to convert the incoming datetime
column into a time(0)
value, which comes at a cost of the query optimizer not being able to use available statistics to help create a great plan. The "actual" query plan for the insert statement shows this warning:
Type conversion in expression (CONVERT(time(0),[iq].[InsertTime],0)>=[dt].[StartDateTime]) may affect "SeekPlan" in query plan choice, Type conversion in expression (CONVERT(time(0),[iq].[InsertTime],0)<[Expr1002]) may affect "SeekPlan" in query plan choice.
If you need to avoid the type-conversion during the update, you can move that workload to the insert operation by updating the definition of dbo.Stg_IncomingQueue
to include a persisted computed column, as in:
CREATE TABLE dbo.Stg_IncomingQueue(
IncomingID int IDENTITY(1,1) NOT NULL,
CustomerID int NOT NULL,
TimeID int NULL,
InsertTime datetime NULL,
InsertTime0 AS CONVERT(TIME(0), InsertTime) PERSISTED
CONSTRAINT PK_IncomingQueueMonitor PRIMARY KEY CLUSTERED
(IncomingID ASC)
) ON [PRIMARY]
GO
The update statement then becomes:
UPDATE dbo.Stg_IncomingQueue
SET TimeID = t.TimeID
FROM dbo.Stg_IncomingQueue iq
INNER JOIN (
SELECT dt.TimeID
, dt.StartDateTime
, EndDateTime = LEAD(dt.StartDateTime, 1) OVER (ORDER BY dt.StartDateTime)
FROM dbo.Dim_Time dt
) t ON iq.InsertTime0 >= t.StartDateTime AND iq.InsertTime0 < t.EndDateTime;
This looks great thank you. I am busy testing it now. I removed the CONVERT(time(0) as I need to compare that entire DateTime values as I have a few years of data to load. In addition I added "WHERE iq.TimeID IS NOT NULL" so that rows are not re-processed. Performance is way better as expected. Thank you.
– WadeH
9 mins ago
Your question seems to imply you only need the 5-minute time-slot for each row, which is date-independent. That's the purpose of the conversion to atime(0)
value.
– Max Vernon
7 mins ago
add a comment |
Your Answer
StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "182"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});
function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: false,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: null,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});
}
});
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fdba.stackexchange.com%2fquestions%2f230890%2fslow-while-loop-query-improvment-assistance%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
1 Answer
1
active
oldest
votes
1 Answer
1
active
oldest
votes
active
oldest
votes
active
oldest
votes
I created the following minimally complete and verifiable example, based on the two tables in your original question. It uses the LEAD T-SQL statement to obtain a time range from the dbo.Dim_Time table, which can be compared to the incoming rows quite easily.
IF OBJECT_ID(N'dbo.Stg_IncomingQueue', N'U') IS NOT NULL
DROP TABLE dbo.Stg_IncomingQueue;
IF OBJECT_ID(N'dbo.Dim_Time', N'U') IS NOT NULL
DROP TABLE dbo.Dim_time;
CREATE TABLE dbo.Dim_Time(
TimeID int IDENTITY(1,1) NOT NULL,
StartDateTime time(0) NULL,
CONSTRAINT PK_Dim_Time PRIMARY KEY CLUSTERED
(TimeID ASC)
) ON [PRIMARY]
GO
;WITH src AS
(
SELECT TOP (10) sv.number
FROM master.dbo.spt_values sv
WHERE sv.type = N'P'
ORDER BY sv.number
)
INSERT INTO dbo.Dim_Time (StartDateTime)
SELECT TOP(289) CONVERT(time(0), DATEADD(minute, (s3.number * 100 + s2.number * 10 + s1.number) * 5, CONVERT(time(0), '00:00:00')))
FROM src s1
CROSS JOIN src s2
CROSS JOIN src s3
ORDER BY s3.number * 100 + s2.number * 10 + s1.number
CREATE TABLE dbo.Stg_IncomingQueue(
IncomingID int IDENTITY(1,1) NOT NULL,
CustomerID int NOT NULL,
TimeID int NULL,
InsertTime datetime NULL,
CONSTRAINT PK_IncomingQueueMonitor PRIMARY KEY CLUSTERED
(IncomingID ASC)
) ON [PRIMARY]
GO
INSERT INTO dbo.Stg_IncomingQueue (CustomerID, InsertTime)
VALUES (1, DATEADD(SECOND, CONVERT(int, CRYPT_GEN_RANDOM(4), 0), '1901-01-01 00:00:00'))
, (2, DATEADD(SECOND, CONVERT(int, CRYPT_GEN_RANDOM(4), 0), '1901-01-01 00:00:00'))
, (3, DATEADD(SECOND, CONVERT(int, CRYPT_GEN_RANDOM(4), 0), '1901-01-01 00:00:00'))
, (4, DATEADD(SECOND, CONVERT(int, CRYPT_GEN_RANDOM(4), 0), '1901-01-01 00:00:00'));
This piece replaces your entire WHILE
loop, with a single UPDATE
statement, which is both more efficient, and easier to understand.
UPDATE dbo.Stg_IncomingQueue
SET TimeID = t.TimeID
FROM dbo.Stg_IncomingQueue iq
INNER JOIN (
SELECT dt.TimeID
, dt.StartDateTime
, EndDateTime = LEAD(dt.StartDateTime, 1) OVER (ORDER BY dt.StartDateTime)
FROM dbo.Dim_Time dt
) t ON CONVERT(time(0), iq.InsertTime) >= t.StartDateTime AND CONVERT(time(0), iq.InsertTime) < t.EndDateTime;
The results, compared side-by-side with the Dim_Time table:
SELECT *
FROM dbo.Stg_IncomingQueue iq
INNER JOIN dbo.Dim_Time dt ON iq.TimeID = dt.TimeID;
The output looks like:
╔════════════╦════════════╦════════╦═════════════════════════╦════════╦═══════════════╗
║ IncomingID ║ CustomerID ║ TimeID ║ InsertTime ║ TimeID ║ StartDateTime ║
╠════════════╬════════════╬════════╬═════════════════════════╬════════╬═══════════════╣
║ 1 ║ 1 ║ 271 ║ 1875-06-30 22:31:49.000 ║ 271 ║ 22:30:00 ║
║ 2 ║ 2 ║ 116 ║ 1857-07-01 09:38:59.000 ║ 116 ║ 09:35:00 ║
║ 3 ║ 3 ║ 218 ║ 1854-09-18 18:08:39.000 ║ 218 ║ 18:05:00 ║
║ 4 ║ 4 ║ 221 ║ 1860-05-31 18:22:25.000 ║ 221 ║ 18:20:00 ║
╚════════════╩════════════╩════════╩═════════════════════════╩════════╩═══════════════╝
Assuming there isn't a massive amount of incoming rows, this may work fairly well. Be aware, I'm using CONVERT()
to convert the incoming datetime
column into a time(0)
value, which comes at a cost of the query optimizer not being able to use available statistics to help create a great plan. The "actual" query plan for the insert statement shows this warning:
Type conversion in expression (CONVERT(time(0),[iq].[InsertTime],0)>=[dt].[StartDateTime]) may affect "SeekPlan" in query plan choice, Type conversion in expression (CONVERT(time(0),[iq].[InsertTime],0)<[Expr1002]) may affect "SeekPlan" in query plan choice.
If you need to avoid the type-conversion during the update, you can move that workload to the insert operation by updating the definition of dbo.Stg_IncomingQueue
to include a persisted computed column, as in:
CREATE TABLE dbo.Stg_IncomingQueue(
IncomingID int IDENTITY(1,1) NOT NULL,
CustomerID int NOT NULL,
TimeID int NULL,
InsertTime datetime NULL,
InsertTime0 AS CONVERT(TIME(0), InsertTime) PERSISTED
CONSTRAINT PK_IncomingQueueMonitor PRIMARY KEY CLUSTERED
(IncomingID ASC)
) ON [PRIMARY]
GO
The update statement then becomes:
UPDATE dbo.Stg_IncomingQueue
SET TimeID = t.TimeID
FROM dbo.Stg_IncomingQueue iq
INNER JOIN (
SELECT dt.TimeID
, dt.StartDateTime
, EndDateTime = LEAD(dt.StartDateTime, 1) OVER (ORDER BY dt.StartDateTime)
FROM dbo.Dim_Time dt
) t ON iq.InsertTime0 >= t.StartDateTime AND iq.InsertTime0 < t.EndDateTime;
This looks great thank you. I am busy testing it now. I removed the CONVERT(time(0) as I need to compare that entire DateTime values as I have a few years of data to load. In addition I added "WHERE iq.TimeID IS NOT NULL" so that rows are not re-processed. Performance is way better as expected. Thank you.
– WadeH
9 mins ago
Your question seems to imply you only need the 5-minute time-slot for each row, which is date-independent. That's the purpose of the conversion to atime(0)
value.
– Max Vernon
7 mins ago
add a comment |
I created the following minimally complete and verifiable example, based on the two tables in your original question. It uses the LEAD T-SQL statement to obtain a time range from the dbo.Dim_Time table, which can be compared to the incoming rows quite easily.
IF OBJECT_ID(N'dbo.Stg_IncomingQueue', N'U') IS NOT NULL
DROP TABLE dbo.Stg_IncomingQueue;
IF OBJECT_ID(N'dbo.Dim_Time', N'U') IS NOT NULL
DROP TABLE dbo.Dim_time;
CREATE TABLE dbo.Dim_Time(
TimeID int IDENTITY(1,1) NOT NULL,
StartDateTime time(0) NULL,
CONSTRAINT PK_Dim_Time PRIMARY KEY CLUSTERED
(TimeID ASC)
) ON [PRIMARY]
GO
;WITH src AS
(
SELECT TOP (10) sv.number
FROM master.dbo.spt_values sv
WHERE sv.type = N'P'
ORDER BY sv.number
)
INSERT INTO dbo.Dim_Time (StartDateTime)
SELECT TOP(289) CONVERT(time(0), DATEADD(minute, (s3.number * 100 + s2.number * 10 + s1.number) * 5, CONVERT(time(0), '00:00:00')))
FROM src s1
CROSS JOIN src s2
CROSS JOIN src s3
ORDER BY s3.number * 100 + s2.number * 10 + s1.number
CREATE TABLE dbo.Stg_IncomingQueue(
IncomingID int IDENTITY(1,1) NOT NULL,
CustomerID int NOT NULL,
TimeID int NULL,
InsertTime datetime NULL,
CONSTRAINT PK_IncomingQueueMonitor PRIMARY KEY CLUSTERED
(IncomingID ASC)
) ON [PRIMARY]
GO
INSERT INTO dbo.Stg_IncomingQueue (CustomerID, InsertTime)
VALUES (1, DATEADD(SECOND, CONVERT(int, CRYPT_GEN_RANDOM(4), 0), '1901-01-01 00:00:00'))
, (2, DATEADD(SECOND, CONVERT(int, CRYPT_GEN_RANDOM(4), 0), '1901-01-01 00:00:00'))
, (3, DATEADD(SECOND, CONVERT(int, CRYPT_GEN_RANDOM(4), 0), '1901-01-01 00:00:00'))
, (4, DATEADD(SECOND, CONVERT(int, CRYPT_GEN_RANDOM(4), 0), '1901-01-01 00:00:00'));
This piece replaces your entire WHILE
loop, with a single UPDATE
statement, which is both more efficient, and easier to understand.
UPDATE dbo.Stg_IncomingQueue
SET TimeID = t.TimeID
FROM dbo.Stg_IncomingQueue iq
INNER JOIN (
SELECT dt.TimeID
, dt.StartDateTime
, EndDateTime = LEAD(dt.StartDateTime, 1) OVER (ORDER BY dt.StartDateTime)
FROM dbo.Dim_Time dt
) t ON CONVERT(time(0), iq.InsertTime) >= t.StartDateTime AND CONVERT(time(0), iq.InsertTime) < t.EndDateTime;
The results, compared side-by-side with the Dim_Time table:
SELECT *
FROM dbo.Stg_IncomingQueue iq
INNER JOIN dbo.Dim_Time dt ON iq.TimeID = dt.TimeID;
The output looks like:
╔════════════╦════════════╦════════╦═════════════════════════╦════════╦═══════════════╗
║ IncomingID ║ CustomerID ║ TimeID ║ InsertTime ║ TimeID ║ StartDateTime ║
╠════════════╬════════════╬════════╬═════════════════════════╬════════╬═══════════════╣
║ 1 ║ 1 ║ 271 ║ 1875-06-30 22:31:49.000 ║ 271 ║ 22:30:00 ║
║ 2 ║ 2 ║ 116 ║ 1857-07-01 09:38:59.000 ║ 116 ║ 09:35:00 ║
║ 3 ║ 3 ║ 218 ║ 1854-09-18 18:08:39.000 ║ 218 ║ 18:05:00 ║
║ 4 ║ 4 ║ 221 ║ 1860-05-31 18:22:25.000 ║ 221 ║ 18:20:00 ║
╚════════════╩════════════╩════════╩═════════════════════════╩════════╩═══════════════╝
Assuming there isn't a massive amount of incoming rows, this may work fairly well. Be aware, I'm using CONVERT()
to convert the incoming datetime
column into a time(0)
value, which comes at a cost of the query optimizer not being able to use available statistics to help create a great plan. The "actual" query plan for the insert statement shows this warning:
Type conversion in expression (CONVERT(time(0),[iq].[InsertTime],0)>=[dt].[StartDateTime]) may affect "SeekPlan" in query plan choice, Type conversion in expression (CONVERT(time(0),[iq].[InsertTime],0)<[Expr1002]) may affect "SeekPlan" in query plan choice.
If you need to avoid the type-conversion during the update, you can move that workload to the insert operation by updating the definition of dbo.Stg_IncomingQueue
to include a persisted computed column, as in:
CREATE TABLE dbo.Stg_IncomingQueue(
IncomingID int IDENTITY(1,1) NOT NULL,
CustomerID int NOT NULL,
TimeID int NULL,
InsertTime datetime NULL,
InsertTime0 AS CONVERT(TIME(0), InsertTime) PERSISTED
CONSTRAINT PK_IncomingQueueMonitor PRIMARY KEY CLUSTERED
(IncomingID ASC)
) ON [PRIMARY]
GO
The update statement then becomes:
UPDATE dbo.Stg_IncomingQueue
SET TimeID = t.TimeID
FROM dbo.Stg_IncomingQueue iq
INNER JOIN (
SELECT dt.TimeID
, dt.StartDateTime
, EndDateTime = LEAD(dt.StartDateTime, 1) OVER (ORDER BY dt.StartDateTime)
FROM dbo.Dim_Time dt
) t ON iq.InsertTime0 >= t.StartDateTime AND iq.InsertTime0 < t.EndDateTime;
This looks great thank you. I am busy testing it now. I removed the CONVERT(time(0) as I need to compare that entire DateTime values as I have a few years of data to load. In addition I added "WHERE iq.TimeID IS NOT NULL" so that rows are not re-processed. Performance is way better as expected. Thank you.
– WadeH
9 mins ago
Your question seems to imply you only need the 5-minute time-slot for each row, which is date-independent. That's the purpose of the conversion to atime(0)
value.
– Max Vernon
7 mins ago
add a comment |
I created the following minimally complete and verifiable example, based on the two tables in your original question. It uses the LEAD T-SQL statement to obtain a time range from the dbo.Dim_Time table, which can be compared to the incoming rows quite easily.
IF OBJECT_ID(N'dbo.Stg_IncomingQueue', N'U') IS NOT NULL
DROP TABLE dbo.Stg_IncomingQueue;
IF OBJECT_ID(N'dbo.Dim_Time', N'U') IS NOT NULL
DROP TABLE dbo.Dim_time;
CREATE TABLE dbo.Dim_Time(
TimeID int IDENTITY(1,1) NOT NULL,
StartDateTime time(0) NULL,
CONSTRAINT PK_Dim_Time PRIMARY KEY CLUSTERED
(TimeID ASC)
) ON [PRIMARY]
GO
;WITH src AS
(
SELECT TOP (10) sv.number
FROM master.dbo.spt_values sv
WHERE sv.type = N'P'
ORDER BY sv.number
)
INSERT INTO dbo.Dim_Time (StartDateTime)
SELECT TOP(289) CONVERT(time(0), DATEADD(minute, (s3.number * 100 + s2.number * 10 + s1.number) * 5, CONVERT(time(0), '00:00:00')))
FROM src s1
CROSS JOIN src s2
CROSS JOIN src s3
ORDER BY s3.number * 100 + s2.number * 10 + s1.number
CREATE TABLE dbo.Stg_IncomingQueue(
IncomingID int IDENTITY(1,1) NOT NULL,
CustomerID int NOT NULL,
TimeID int NULL,
InsertTime datetime NULL,
CONSTRAINT PK_IncomingQueueMonitor PRIMARY KEY CLUSTERED
(IncomingID ASC)
) ON [PRIMARY]
GO
INSERT INTO dbo.Stg_IncomingQueue (CustomerID, InsertTime)
VALUES (1, DATEADD(SECOND, CONVERT(int, CRYPT_GEN_RANDOM(4), 0), '1901-01-01 00:00:00'))
, (2, DATEADD(SECOND, CONVERT(int, CRYPT_GEN_RANDOM(4), 0), '1901-01-01 00:00:00'))
, (3, DATEADD(SECOND, CONVERT(int, CRYPT_GEN_RANDOM(4), 0), '1901-01-01 00:00:00'))
, (4, DATEADD(SECOND, CONVERT(int, CRYPT_GEN_RANDOM(4), 0), '1901-01-01 00:00:00'));
This piece replaces your entire WHILE
loop, with a single UPDATE
statement, which is both more efficient, and easier to understand.
UPDATE dbo.Stg_IncomingQueue
SET TimeID = t.TimeID
FROM dbo.Stg_IncomingQueue iq
INNER JOIN (
SELECT dt.TimeID
, dt.StartDateTime
, EndDateTime = LEAD(dt.StartDateTime, 1) OVER (ORDER BY dt.StartDateTime)
FROM dbo.Dim_Time dt
) t ON CONVERT(time(0), iq.InsertTime) >= t.StartDateTime AND CONVERT(time(0), iq.InsertTime) < t.EndDateTime;
The results, compared side-by-side with the Dim_Time table:
SELECT *
FROM dbo.Stg_IncomingQueue iq
INNER JOIN dbo.Dim_Time dt ON iq.TimeID = dt.TimeID;
The output looks like:
╔════════════╦════════════╦════════╦═════════════════════════╦════════╦═══════════════╗
║ IncomingID ║ CustomerID ║ TimeID ║ InsertTime ║ TimeID ║ StartDateTime ║
╠════════════╬════════════╬════════╬═════════════════════════╬════════╬═══════════════╣
║ 1 ║ 1 ║ 271 ║ 1875-06-30 22:31:49.000 ║ 271 ║ 22:30:00 ║
║ 2 ║ 2 ║ 116 ║ 1857-07-01 09:38:59.000 ║ 116 ║ 09:35:00 ║
║ 3 ║ 3 ║ 218 ║ 1854-09-18 18:08:39.000 ║ 218 ║ 18:05:00 ║
║ 4 ║ 4 ║ 221 ║ 1860-05-31 18:22:25.000 ║ 221 ║ 18:20:00 ║
╚════════════╩════════════╩════════╩═════════════════════════╩════════╩═══════════════╝
Assuming there isn't a massive amount of incoming rows, this may work fairly well. Be aware, I'm using CONVERT()
to convert the incoming datetime
column into a time(0)
value, which comes at a cost of the query optimizer not being able to use available statistics to help create a great plan. The "actual" query plan for the insert statement shows this warning:
Type conversion in expression (CONVERT(time(0),[iq].[InsertTime],0)>=[dt].[StartDateTime]) may affect "SeekPlan" in query plan choice, Type conversion in expression (CONVERT(time(0),[iq].[InsertTime],0)<[Expr1002]) may affect "SeekPlan" in query plan choice.
If you need to avoid the type-conversion during the update, you can move that workload to the insert operation by updating the definition of dbo.Stg_IncomingQueue
to include a persisted computed column, as in:
CREATE TABLE dbo.Stg_IncomingQueue(
IncomingID int IDENTITY(1,1) NOT NULL,
CustomerID int NOT NULL,
TimeID int NULL,
InsertTime datetime NULL,
InsertTime0 AS CONVERT(TIME(0), InsertTime) PERSISTED
CONSTRAINT PK_IncomingQueueMonitor PRIMARY KEY CLUSTERED
(IncomingID ASC)
) ON [PRIMARY]
GO
The update statement then becomes:
UPDATE dbo.Stg_IncomingQueue
SET TimeID = t.TimeID
FROM dbo.Stg_IncomingQueue iq
INNER JOIN (
SELECT dt.TimeID
, dt.StartDateTime
, EndDateTime = LEAD(dt.StartDateTime, 1) OVER (ORDER BY dt.StartDateTime)
FROM dbo.Dim_Time dt
) t ON iq.InsertTime0 >= t.StartDateTime AND iq.InsertTime0 < t.EndDateTime;
I created the following minimally complete and verifiable example, based on the two tables in your original question. It uses the LEAD T-SQL statement to obtain a time range from the dbo.Dim_Time table, which can be compared to the incoming rows quite easily.
IF OBJECT_ID(N'dbo.Stg_IncomingQueue', N'U') IS NOT NULL
DROP TABLE dbo.Stg_IncomingQueue;
IF OBJECT_ID(N'dbo.Dim_Time', N'U') IS NOT NULL
DROP TABLE dbo.Dim_time;
CREATE TABLE dbo.Dim_Time(
TimeID int IDENTITY(1,1) NOT NULL,
StartDateTime time(0) NULL,
CONSTRAINT PK_Dim_Time PRIMARY KEY CLUSTERED
(TimeID ASC)
) ON [PRIMARY]
GO
;WITH src AS
(
SELECT TOP (10) sv.number
FROM master.dbo.spt_values sv
WHERE sv.type = N'P'
ORDER BY sv.number
)
INSERT INTO dbo.Dim_Time (StartDateTime)
SELECT TOP(289) CONVERT(time(0), DATEADD(minute, (s3.number * 100 + s2.number * 10 + s1.number) * 5, CONVERT(time(0), '00:00:00')))
FROM src s1
CROSS JOIN src s2
CROSS JOIN src s3
ORDER BY s3.number * 100 + s2.number * 10 + s1.number
CREATE TABLE dbo.Stg_IncomingQueue(
IncomingID int IDENTITY(1,1) NOT NULL,
CustomerID int NOT NULL,
TimeID int NULL,
InsertTime datetime NULL,
CONSTRAINT PK_IncomingQueueMonitor PRIMARY KEY CLUSTERED
(IncomingID ASC)
) ON [PRIMARY]
GO
INSERT INTO dbo.Stg_IncomingQueue (CustomerID, InsertTime)
VALUES (1, DATEADD(SECOND, CONVERT(int, CRYPT_GEN_RANDOM(4), 0), '1901-01-01 00:00:00'))
, (2, DATEADD(SECOND, CONVERT(int, CRYPT_GEN_RANDOM(4), 0), '1901-01-01 00:00:00'))
, (3, DATEADD(SECOND, CONVERT(int, CRYPT_GEN_RANDOM(4), 0), '1901-01-01 00:00:00'))
, (4, DATEADD(SECOND, CONVERT(int, CRYPT_GEN_RANDOM(4), 0), '1901-01-01 00:00:00'));
This piece replaces your entire WHILE
loop, with a single UPDATE
statement, which is both more efficient, and easier to understand.
UPDATE dbo.Stg_IncomingQueue
SET TimeID = t.TimeID
FROM dbo.Stg_IncomingQueue iq
INNER JOIN (
SELECT dt.TimeID
, dt.StartDateTime
, EndDateTime = LEAD(dt.StartDateTime, 1) OVER (ORDER BY dt.StartDateTime)
FROM dbo.Dim_Time dt
) t ON CONVERT(time(0), iq.InsertTime) >= t.StartDateTime AND CONVERT(time(0), iq.InsertTime) < t.EndDateTime;
The results, compared side-by-side with the Dim_Time table:
SELECT *
FROM dbo.Stg_IncomingQueue iq
INNER JOIN dbo.Dim_Time dt ON iq.TimeID = dt.TimeID;
The output looks like:
╔════════════╦════════════╦════════╦═════════════════════════╦════════╦═══════════════╗
║ IncomingID ║ CustomerID ║ TimeID ║ InsertTime ║ TimeID ║ StartDateTime ║
╠════════════╬════════════╬════════╬═════════════════════════╬════════╬═══════════════╣
║ 1 ║ 1 ║ 271 ║ 1875-06-30 22:31:49.000 ║ 271 ║ 22:30:00 ║
║ 2 ║ 2 ║ 116 ║ 1857-07-01 09:38:59.000 ║ 116 ║ 09:35:00 ║
║ 3 ║ 3 ║ 218 ║ 1854-09-18 18:08:39.000 ║ 218 ║ 18:05:00 ║
║ 4 ║ 4 ║ 221 ║ 1860-05-31 18:22:25.000 ║ 221 ║ 18:20:00 ║
╚════════════╩════════════╩════════╩═════════════════════════╩════════╩═══════════════╝
Assuming there isn't a massive amount of incoming rows, this may work fairly well. Be aware, I'm using CONVERT()
to convert the incoming datetime
column into a time(0)
value, which comes at a cost of the query optimizer not being able to use available statistics to help create a great plan. The "actual" query plan for the insert statement shows this warning:
Type conversion in expression (CONVERT(time(0),[iq].[InsertTime],0)>=[dt].[StartDateTime]) may affect "SeekPlan" in query plan choice, Type conversion in expression (CONVERT(time(0),[iq].[InsertTime],0)<[Expr1002]) may affect "SeekPlan" in query plan choice.
If you need to avoid the type-conversion during the update, you can move that workload to the insert operation by updating the definition of dbo.Stg_IncomingQueue
to include a persisted computed column, as in:
CREATE TABLE dbo.Stg_IncomingQueue(
IncomingID int IDENTITY(1,1) NOT NULL,
CustomerID int NOT NULL,
TimeID int NULL,
InsertTime datetime NULL,
InsertTime0 AS CONVERT(TIME(0), InsertTime) PERSISTED
CONSTRAINT PK_IncomingQueueMonitor PRIMARY KEY CLUSTERED
(IncomingID ASC)
) ON [PRIMARY]
GO
The update statement then becomes:
UPDATE dbo.Stg_IncomingQueue
SET TimeID = t.TimeID
FROM dbo.Stg_IncomingQueue iq
INNER JOIN (
SELECT dt.TimeID
, dt.StartDateTime
, EndDateTime = LEAD(dt.StartDateTime, 1) OVER (ORDER BY dt.StartDateTime)
FROM dbo.Dim_Time dt
) t ON iq.InsertTime0 >= t.StartDateTime AND iq.InsertTime0 < t.EndDateTime;
edited 4 mins ago
answered 25 mins ago
Max VernonMax Vernon
51.1k13112225
51.1k13112225
This looks great thank you. I am busy testing it now. I removed the CONVERT(time(0) as I need to compare that entire DateTime values as I have a few years of data to load. In addition I added "WHERE iq.TimeID IS NOT NULL" so that rows are not re-processed. Performance is way better as expected. Thank you.
– WadeH
9 mins ago
Your question seems to imply you only need the 5-minute time-slot for each row, which is date-independent. That's the purpose of the conversion to atime(0)
value.
– Max Vernon
7 mins ago
add a comment |
This looks great thank you. I am busy testing it now. I removed the CONVERT(time(0) as I need to compare that entire DateTime values as I have a few years of data to load. In addition I added "WHERE iq.TimeID IS NOT NULL" so that rows are not re-processed. Performance is way better as expected. Thank you.
– WadeH
9 mins ago
Your question seems to imply you only need the 5-minute time-slot for each row, which is date-independent. That's the purpose of the conversion to atime(0)
value.
– Max Vernon
7 mins ago
This looks great thank you. I am busy testing it now. I removed the CONVERT(time(0) as I need to compare that entire DateTime values as I have a few years of data to load. In addition I added "WHERE iq.TimeID IS NOT NULL" so that rows are not re-processed. Performance is way better as expected. Thank you.
– WadeH
9 mins ago
This looks great thank you. I am busy testing it now. I removed the CONVERT(time(0) as I need to compare that entire DateTime values as I have a few years of data to load. In addition I added "WHERE iq.TimeID IS NOT NULL" so that rows are not re-processed. Performance is way better as expected. Thank you.
– WadeH
9 mins ago
Your question seems to imply you only need the 5-minute time-slot for each row, which is date-independent. That's the purpose of the conversion to a
time(0)
value.– Max Vernon
7 mins ago
Your question seems to imply you only need the 5-minute time-slot for each row, which is date-independent. That's the purpose of the conversion to a
time(0)
value.– Max Vernon
7 mins ago
add a comment |
Thanks for contributing an answer to Database Administrators Stack Exchange!
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fdba.stackexchange.com%2fquestions%2f230890%2fslow-while-loop-query-improvment-assistance%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
1
Why is this a while loop in the first place? When you think "I need to do x to each row" try to change your thinking to "I need to do x to all of the rows."
– Aaron Bertrand♦
24 mins ago