Search This Blog

Sunday, April 23, 2023

Find out how fast is your SQL database growing using the backup metadata from msdb

Find out how fast is your SQL database growing using the backup metadata from msdb

Often, we need to determine the rate of growth of a database or identify which database is expanding the fastest among all those on a SQL instance. Currently, I do not have a central repository where I regularly poll and store the database size or other data. Nor do I have a CMDB tool at my current job.

What I present here is an ad-hoc, indirect method to gauge database growth using backup history information from the MSDB database. This script should be used primarily for ad-hoc solutions, though I find it a convenient and quick method when I need to quickly assess growth.

It’s important to note that this script will not be helpful for databases that are not backed up (there may be justifiable reasons for this) or if you only keep backup history in MSDB for a very short period of time (e.g., purging them after 7 days).


THIS IS A SCRIPT SO PLEASE REMEMBER TO MODIFY THE VALUES FOR 
THE FOLLOWING VARIABLES BEFORE RUNNING IT:

declare @backup_start_date datetime  -- Example: '7/1/2022'
declare @backup_end_date datetime    -- Example: '8/1/2022' 
                                     -- default is current date/time 
declare @number_of_days int = 30  -- ignored if the @backup_start_date is given

 

/*
GET THE DATABASE SIZE GROWTH INDIRECTLY BY COMPARING CHANGE IN 
DATABASE BACKUP SIZE

You can either specify a date range or a number of days to go 
back to compare with. For example, you can specify a date range 
if you want to know how much your database grew during a specific 
period in the past. This info can be useful for comparison between 
two different time periods.

And specify a number of days if you want to know much the database 
grew in the past week, month, year, etc.

Uses the backup metadata from the msdb to get the backup size. 
If you are like me, you probably keep the msdb size in check by 
purging metadata regularly. So, you  can only go as far back as 
you have the backup metadata for.

Limitations:
In the case of an AG cluster, you must run the query on the 
replica where the full backups are performed or run it as a 
multi-server query against all nodes simultaneously (recommended).

Only looks for the full/complete backups (backup type=D). 
In other words, does not consider differential and/or log backups.
*/

-- Check if the temp table already exists and drop if it does
if OBJECT_ID('tempdb..#db_backup_metadata') is not null 
    drop table #db_backup_metadata

go

-- LET'S FIRST GET THE DATA FROM msdb..backupset INTO A TEMP TABLE
declare @backup_start_date datetime -- Example: '7/1/2022'
declare @backup_end_date datetime   -- Example: '8/1/2022'    
                                    -- default is current date/time 

-- ignored if the @backup_start_date is given
declare @number_of_days int = 90    

-- Error handling for date input
if @backup_start_date is null and @number_of_days is null
begin
    raiserror('Error: Either a start date or number of days to go back 
to must be provided.', 16,1)
    return
end

-- Set default values for start and end dates
set @backup_start_date = isnull(@backup_start_date, GETDATE() - 
                         @number_of_days)
set @backup_end_date   = isnull(@backup_end_date, GETDATE())

-- Printing the start and end dates
print '-- BEGIN AND START DATES:'
print @backup_start_date
print @backup_end_date
print '--'

-- Error handling for start and end date logic
if @backup_end_date < @backup_start_date 
begin
    raiserror('Error: Start date cannot be after the end date.', 16,1)
    return
end

-- Select and insert backup data into temp table
SELECT 
    bs.backup_set_id,
    bs.database_name,
    bs.database_creation_date db_create_date,
    bs.backup_start_date,
    cast(bs.backup_size / 1024 / 1024 / 1024 as numeric(19,3)) 
        backup_size_gb
INTO #db_backup_metadata
FROM msdb..backupset bs 
WHERE
    bs.type = 'D' 
    and bs.backup_start_date >= @backup_start_date
    and bs.backup_finish_date <= @backup_end_date
ORDER BY database_name, backup_start_date DESC

-- USE THE CTEs TO GET THE DESIRED DATA
;WITH c1 AS
(
    SELECT database_name, 
           db_create_date, 
           backup_start_date, 
           backup_size_gb 
    FROM #db_backup_metadata a
    WHERE backup_set_id = 
        (SELECT MIN(backup_set_id) 
         FROM #db_backup_metadata 
         WHERE database_name = a.database_name 
         GROUP BY database_name)
),
c2 AS
(
    SELECT database_name, 
           backup_start_date, 
           backup_size_gb 
    FROM #db_backup_metadata a
    WHERE backup_set_id = 
        (SELECT MAX(backup_set_id) 
         FROM #db_backup_metadata 
         WHERE database_name = a.database_name 
         GROUP BY database_name)
)

-- Final select statement to output the growth data
SELECT
    dbs.name db_name,
    c1.db_create_date,
    c1.backup_start_date backup_date1, -- earliest backup
    c2.backup_start_date backup_date2, -- most recent backup
    datediff(D, c1.backup_start_date, c2.backup_start_date) [days],
    c1.backup_size_gb backup_size1,    -- size of the earliest backup
    c2.backup_size_gb backup_size2,    -- size of the most recent backup
    (c1.backup_size_gb - c2.backup_size_gb) * -1 growth_gb,
    try_cast(case when c1.backup_size_gb > 0 then
        ((c1.backup_size_gb - c2.backup_size_gb) * -1) * 100 
        / c1.backup_size_gb 
        else 0 end as numeric(10,2)) growth_rate
FROM sys.databases dbs
LEFT JOIN c1 ON dbs.name = c1.database_name
LEFT JOIN c2 ON c1.database_name = c2.database_name
WHERE dbs.name != 'tempdb'
ORDER BY growth_gb DESC


Now, I'm curious to hear from you! How do you currently track your SQL database growth, and how effective has it been? Or, if you've tried out the method described in this post, I’d love to hear about your experiences. Did it offer new insights into your database management?