Search This Blog

Monday, July 31, 2023

Monitoring Long-Running SQL Server Agent Jobs with T-SQL (Updated)

Monitoring Long-Running SQL Server Agent Jobs with T-SQL (Updated)

As a SQL Server DBA, especially in production, you'll eventually hit SQL Agent jobs that run way longer than they should. They blow past SLAs, block other jobs, or start impacting users and apps. SSMS Job Activity Monitor is great for manual checks, but what you really need is automation that pages you the moment a job crosses your threshold.

This post shares a battle-tested T-SQL script that finds long-running Agent jobs and emails you an HTML report. It's dead simple to deploy, needs no third-party tools, and adapts easily to any environment. Even if you have fancy monitoring suites, I still keep this in my toolkit, it's lightweight, portable, and gets the job done anywhere, anytime.


Script Overview

The script is divided into two main parts:

  • Part 1 identifies active/running jobs whose run durations exceed a set threshold.

  • Part 2 formats the results and sends an email alert with the job details.

You can execute the whole script at once or run each part separately based on your workflow.


Setup Instructions

  1. Configure Database Mail if not already enabled.

  2. Update the variables at the top of the script, especially the runtime threshold (in minutes) and email address.

  3. Schedule this script as a SQL Agent job or run manually as required.

  4. Review output in SSMS or wait for the automated email alert.

TSQL Script: 

-- See blog post: https://sqlpal.blogspot.com/2023/07/long-running-sql-server-jobs.html
-- 
/*==========================================================================================
  Script Name : Monitor_Long_Running_SQL_Agent_Jobs.sql

  Purpose     : 
      Identify currently running SQL Server Agent jobs that exceed a configurable 
      runtime threshold and optionally send an HTML email alert with details 
      (job name, duration, and command text).

  Context / Typical Use:
      - Scheduled as a SQL Agent job (e.g., every 5–10 minutes) on production 
        instances to detect runaway or stuck jobs.
      - Can be run interactively to quickly see which jobs are currently 
        running long.

  Key Behaviors:
      - Uses msdb job metadata plus dm_exec dynamic views to correlate SQL Agent 
        job execution with live session/request runtime metrics.
      - Filters only *currently running* jobs and applies a runtime threshold 
        (default: > 30 minutes) to avoid noise from short/expected jobs.
      - Builds an HTML table via FOR XML so the email is readable and easy to scan.
      - Skips the email step entirely when no qualifying long-running jobs exist.

  Assumptions / Requirements:
      - Database Mail is configured and usable on this instance.
      - Caller has appropriate permissions on msdb and server-level DMVs.
      - SQL Server Agent uses the standard "SQLAgent - TSQL JobStep" program_name 
        format for job sessions.

  Configuration Variables (at top of script):
      @MinutesThreshold  - Minimum runtime (minutes) before flagging as "long-running"
      @EmailRecipients   - Semicolon-delimited list of email addresses to notify
      @MailProfile       - Database Mail profile name (optional; uses default if NULL)

  How to Adjust:
      - Modify @MinutesThreshold for your SLA (e.g., 15 for aggressive monitoring).
      - Update @EmailRecipients for environment-specific oncall lists.
      - Set @MailProfile if you use named profiles instead of the default.

  Output:
      - Result set in SSMS with long-running jobs.
      - HTML email with one row per long-running job, when any are found.

  Maintenance Notes:
      - When changing thresholds, recipients, or HTML layout, keep this header 
        updated so behavior matches documentation.
      - If SQL Server version or Agent behavior changes (e.g., program_name 
        format), revisit the session-to-job mapping query.

  Last Updated:
      - 2026-01-29 : Added configurable @MinutesThreshold and @EmailRecipients variables.

==========================================================================================*/

-- ==========================================
-- CONFIGURATION VARIABLES - Modify these per environment
-- ==========================================
DECLARE @MinutesThreshold INT = 30;        -- Jobs running longer than this (minutes) trigger alerts
DECLARE @EmailRecipients  NVARCHAR(500) = '<dba-team@company.com;oncall@company.com>';  -- Semicolon-delimited recipients
DECLARE @MailProfile      NVARCHAR(128) = NULL;  -- Optional: Database Mail profile name; NULL uses default

-- PART 1: Check for active/running jobs and their duration so far
SET NOCOUNT ON;

IF OBJECT_ID('tempdb..#job_durations') IS NOT NULL 
    DROP TABLE #job_durations;

;WITH cte_jobs AS
(
    SELECT 
           sja.session_id AS [Session ID],
           sj.name        AS [Job Name], 
           r.command      AS [Command],

           -- WHY: We derive a simplified job status from start/stop times so that 
           --      downstream logic can just check [Status] instead of multiple datetime columns.
           [Status] = CASE
                WHEN sja.start_execution_date IS NULL THEN 'Not running'
                WHEN sja.start_execution_date IS NOT NULL 
                     AND sja.stop_execution_date IS NULL THEN 'Running'
                WHEN sja.start_execution_date IS NOT NULL 
                     AND sja.stop_execution_date IS NOT NULL THEN 'Not running'
           END,

           sja.start_execution_date AS [Job Start Time],

           r.total_elapsed_time         AS [Elapsed Milliseconds],
           r.total_elapsed_time / 1000  AS [Elapsed Seconds],
           r.total_elapsed_time / 1000 / 60        AS [Elapsed Minutes],
           r.total_elapsed_time / 1000 / 60 / 60   AS [Elapsed Hours],
           sj.job_id                    AS [Job ID] 

    FROM msdb..sysjobs sj
    INNER JOIN msdb..sysjobactivity sja 
        ON sj.job_id = sja.job_id

    -- WHY: SQL Agent job steps appear as sessions whose program_name starts with
    --      'SQLAgent - TSQL JobStep'.  This trick lets us map from dm_exec_sessions
    --      back to the msdb job_id by decoding the program_name.
    INNER JOIN (
        SELECT 
            TRY_CONVERT(binary(30), SUBSTRING(program_name, 30, 34), 1) AS job_id,
            *
        FROM sys.dm_exec_sessions 
        WHERE program_name LIKE 'SQLAgent - TSQL JobStep%'
    ) s ON sj.job_id = s.job_id

    LEFT JOIN sys.dm_exec_requests r 
        ON s.session_id = r.session_id

    WHERE sja.session_id = (
        SELECT MAX(session_id) 
        FROM msdb..sysjobactivity 
        WHERE job_id = sj.job_id
    )
)

-- Persist and enrich only the jobs that are currently running and above the threshold
SELECT 
    cte.*,

    -- Convert the duration into words
    -- WHY: This human-readable string is meant for quick eyeballing in an email 
    --      or SSMS result grid, especially during incident response.
    [Duration] = 
        CASE WHEN [Elapsed Hours] > 0 
             THEN CAST([Elapsed Hours] AS VARCHAR(20)) + ' Hours' 
             ELSE '' 
        END +

        CASE WHEN [Elapsed Minutes] > 0 AND [Elapsed Hours] = 0 
             THEN CAST([Elapsed Minutes] AS VARCHAR(20)) + ' Minutes' 
             ELSE '' 
        END +
        CASE WHEN [Elapsed Minutes] > 0 AND [Elapsed Hours] > 0 
             THEN ', ' + CAST([Elapsed Minutes] - ([Elapsed Hours]*60) AS VARCHAR(20)) 
                      + ' Minutes' 
             ELSE '' 
        END +

        CASE WHEN [Elapsed Seconds] > 0 AND [Elapsed Minutes] = 0 AND [Elapsed Hours] = 0 
             THEN CAST([Elapsed Seconds] AS VARCHAR(20)) + ' Seconds' 
             ELSE '' 
        END +
        CASE WHEN [Elapsed Seconds] > 0 AND [Elapsed Minutes] > 0 
             THEN ', ' + CAST([Elapsed Seconds] - ([Elapsed Minutes] * 60) AS VARCHAR(20)) 
                      + ' Seconds' 
             ELSE '' 
        END   
INTO #job_durations
FROM cte_jobs cte
WHERE [Status] = 'Running'            -- Only currently running jobs matter here
  AND [Elapsed Minutes] > @MinutesThreshold 

SELECT 
    [Session ID], 
    [Job Name], 
    [Command], 
    [Job Start Time], 
    [Duration], 
    [Elapsed Milliseconds] 
FROM #job_durations 
ORDER BY [Elapsed Milliseconds] DESC;

-- WHY: If there are no rows, we skip sending an email. Using GOTO here keeps
--      the control flow simple in a single script that may run under Agent.
IF @@ROWCOUNT = 0 
    GOTO QUIT;

-- PART 2: Send email alert

-- Variables for HTML-formatted email
DECLARE 
    @html_body       NVARCHAR(MAX), 
    @html_table_head VARCHAR(1000),  
    @html_table_tail VARCHAR(1000);

SET @html_table_tail = '</table></body></html>';

SET @html_table_head = 
      '<html><head>'
    + '<style>'
    + 'td {border: solid black;border-width: 1px;padding-left:5px;padding-right:5px;'
    + 'padding-top:1px;padding-bottom:1px;font: 15px arial} '
    + '</style>'
    + '</head>'
    + '<body>'  
    + '<pre style="font-weight: bold">'
    + 'SQL Server Instance: ' + @@SERVERNAME
    + '<br> <br> Report generated on : ' + CAST(GETDATE() AS VARCHAR(100))
    + '<br> <br> Long-running threshold: ' + CAST(@MinutesThreshold AS VARCHAR(10)) + ' minutes'
    + ' <br> <table cellpadding=0 cellspacing=0 border=0>' 
    + '<br> <tr> ' 
    + '<td bgcolor=#E6E6FA><b>Job Name</b></td>'
    + '<td bgcolor=#E6E6FA><b>Duration</b></td>'
    + '<td bgcolor=#E6E6FA><b>Command</b></td></tr>';

SET @html_body = (
    SELECT    
        td = LEFT([Job Name], 50), '',
        td = Duration, '',
        td = [Command], ''
    FROM #job_durations 
    ORDER BY [Elapsed Milliseconds] DESC
    FOR XML RAW('tr'), ELEMENTS
);

-- Combine header, body, and footer into a complete HTML document
SELECT @html_body = @html_table_head + ISNULL(@html_body, '') + @html_table_tail;

-- Send email with HTML body

DECLARE @EmailSubject NVARCHAR(500) = 'Long Running SQL Server Jobs found (>' + CAST(@MinutesThreshold AS VARCHAR(10)) + ' min)';
EXEC msdb.dbo.sp_send_dbmail  
       @profile_name = @MailProfile,  -- NULL uses default profile
       @recipients   = @EmailRecipients,  
       @body         = @html_body,  
       @subject      = @EmailSubject,
       @body_format  = 'HTML';

QUIT:


Example Output

For a demo, I started two sample jobs, doing nothing but running the WAITFOR DELAY command.  Here is the results displayed on the screen:






If a job exceeds your chosen threshold, you’ll receive an HTML-formatted email alert summarizing job details:











These next two screenshots are from a real/production job:












You can adapt thresholds and recipients to suit your production environment or integrate further enhancements as needed.


Tips

  • Customize thresholds at the top of the script for reuse and clarity.

  • Add exclusions or filters for specific job names as needed.

  • Provide context for others: update email subjects, include server or environment tags.

  • Enhance with automation: Schedule this script as a recurring SQL Agent job.

  • Extend for more detail: Add columns for the affected database, more job properties, or job step text.


Conclusion

My hope is that this script gives DBAs a quick, flexible way to keep tabs on long-running SQL Server Agent jobs - no third-party tools required. With just a few tweaks, it can scale nicely for large enterprise setups or work just as well in smaller environments. Feel free to adapt it to your needs, and if you come up with improvements or new ideas, share them in the comments!


Extending to Long-Running Query Monitoring

If you want to take things a step further, you can easily tweak this script’s logic to keep an eye on long-running queries instead of just jobs. By querying dynamic management views like sys.dm_exec_requests and sys.dm_exec_sessions, you can spot active queries that go over a set duration. Grab details like the start_time and total_elapsed_time, join in the actual query text with sys.dm_exec_sql_text, and you’ll be able to create a similar alert system for queries that might be dragging down performance. It’s a great way to dive deeper into SQL Server DMVs and start building your own custom monitoring solutions that go beyond just tracking jobs.