Search This Blog

Sunday, April 23, 2023

Find out how fast is your SQL database growing using the backup metadata from msdb

Find out how fast is your SQL database growing using the backup metadata from msdb

Often, we need to determine the rate of growth of a database or identify which database is expanding the fastest among all those on a SQL instance. Currently, I do not have a central repository where I regularly poll and store the database size or other data. Nor do I have a CMDB tool at my current job.

What I present here is an ad-hoc, indirect method to gauge database growth using backup history information from the MSDB database. This script should be used primarily for ad-hoc solutions, though I find it a convenient and quick method when I need to quickly assess growth.

It’s important to note that this script will not be helpful for databases that are not backed up (there may be justifiable reasons for this) or if you only keep backup history in MSDB for a very short period of time (e.g., purging them after 7 days).


THIS IS A SCRIPT SO PLEASE REMEMBER TO MODIFY THE VALUES FOR 
THE FOLLOWING VARIABLES BEFORE RUNNING IT:

declare @backup_start_date datetime  -- Example: '7/1/2022'
declare @backup_end_date datetime    -- Example: '8/1/2022' 
                                     -- default is current date/time 
declare @number_of_days int = 30  -- ignored if the @backup_start_date is given

 

/*
GET THE DATABASE SIZE GROWTH INDIRECTLY BY COMPARING CHANGE IN 
DATABASE BACKUP SIZE

You can either specify a date range or a number of days to go 
back to compare with. For example, you can specify a date range 
if you want to know how much your database grew during a specific 
period in the past. This info can be useful for comparison between 
two different time periods.

And specify a number of days if you want to know much the database 
grew in the past week, month, year, etc.

Uses the backup metadata from the msdb to get the backup size. 
If you are like me, you probably keep the msdb size in check by 
purging metadata regularly. So, you  can only go as far back as 
you have the backup metadata for.

Limitations:
In the case of an AG cluster, you must run the query on the 
replica where the full backups are performed or run it as a 
multi-server query against all nodes simultaneously (recommended).

Only looks for the full/complete backups (backup type=D). 
In other words, does not consider differential and/or log backups.
*/

-- Check if the temp table already exists and drop if it does
if OBJECT_ID('tempdb..#db_backup_metadata') is not null 
    drop table #db_backup_metadata

go

-- LET'S FIRST GET THE DATA FROM msdb..backupset INTO A TEMP TABLE
declare @backup_start_date datetime -- Example: '7/1/2022'
declare @backup_end_date datetime   -- Example: '8/1/2022'    
                                    -- default is current date/time 

-- ignored if the @backup_start_date is given
declare @number_of_days int = 90    

-- Error handling for date input
if @backup_start_date is null and @number_of_days is null
begin
    raiserror('Error: Either a start date or number of days to go back 
to must be provided.', 16,1)
    return
end

-- Set default values for start and end dates
set @backup_start_date = isnull(@backup_start_date, GETDATE() - 
                         @number_of_days)
set @backup_end_date   = isnull(@backup_end_date, GETDATE())

-- Printing the start and end dates
print '-- BEGIN AND START DATES:'
print @backup_start_date
print @backup_end_date
print '--'

-- Error handling for start and end date logic
if @backup_end_date < @backup_start_date 
begin
    raiserror('Error: Start date cannot be after the end date.', 16,1)
    return
end

-- Select and insert backup data into temp table
SELECT 
    bs.backup_set_id,
    bs.database_name,
    bs.database_creation_date db_create_date,
    bs.backup_start_date,
    cast(bs.backup_size / 1024 / 1024 / 1024 as numeric(19,3)) 
        backup_size_gb
INTO #db_backup_metadata
FROM msdb..backupset bs 
WHERE
    bs.type = 'D' 
    and bs.backup_start_date >= @backup_start_date
    and bs.backup_finish_date <= @backup_end_date
ORDER BY database_name, backup_start_date DESC

-- USE THE CTEs TO GET THE DESIRED DATA
;WITH c1 AS
(
    SELECT database_name, 
           db_create_date, 
           backup_start_date, 
           backup_size_gb 
    FROM #db_backup_metadata a
    WHERE backup_set_id = 
        (SELECT MIN(backup_set_id) 
         FROM #db_backup_metadata 
         WHERE database_name = a.database_name 
         GROUP BY database_name)
),
c2 AS
(
    SELECT database_name, 
           backup_start_date, 
           backup_size_gb 
    FROM #db_backup_metadata a
    WHERE backup_set_id = 
        (SELECT MAX(backup_set_id) 
         FROM #db_backup_metadata 
         WHERE database_name = a.database_name 
         GROUP BY database_name)
)

-- Final select statement to output the growth data
SELECT
    dbs.name db_name,
    c1.db_create_date,
    c1.backup_start_date backup_date1, -- earliest backup
    c2.backup_start_date backup_date2, -- most recent backup
    datediff(D, c1.backup_start_date, c2.backup_start_date) [days],
    c1.backup_size_gb backup_size1,    -- size of the earliest backup
    c2.backup_size_gb backup_size2,    -- size of the most recent backup
    (c1.backup_size_gb - c2.backup_size_gb) * -1 growth_gb,
    try_cast(case when c1.backup_size_gb > 0 then
        ((c1.backup_size_gb - c2.backup_size_gb) * -1) * 100 
        / c1.backup_size_gb 
        else 0 end as numeric(10,2)) growth_rate
FROM sys.databases dbs
LEFT JOIN c1 ON dbs.name = c1.database_name
LEFT JOIN c2 ON c1.database_name = c2.database_name
WHERE dbs.name != 'tempdb'
ORDER BY growth_gb DESC


Now, I'm curious to hear from you! How do you currently track your SQL database growth, and how effective has it been? Or, if you've tried out the method described in this post, I’d love to hear about your experiences. Did it offer new insights into your database management? 


Saturday, April 22, 2023

Monitor SQL replication subscriptions synchronization status

Monitor SQL replication subscriptions synchronization status

In your current role if you are managing a SQL server replication setup, you are probably well aware that if any of the subscriber is not getting synchronized within a reasonable time (which is subjective and variable, I may add), it has serious consequences not only for the subscribers but also for the publisher.

So I had written this little script to see if any of the subscriber had not been synchronized in last @hrs. I used 24 hours as a threshold as some of the subscriptions we only synchronized after hours. So adjust that value according to your needs.

Caveat: I wrote this script long time ago to monitor and get alert if any of the replication subscribers were falling behind or had completely stopped getting synchronized with the publisher. So its not up to date with any new features etc. that newer versions of SQL Server may have added to the replication technology. For example there maybe now built in alerts you could leverage to achieve the same results. The built alerts were probably available even back then as well but I had found my little custom script much more suitable for our needs at that time. So I am hoping you may find this just as useful or give a starting point for your own solution.

RUN THIS SCRIPT ON THE DISTRIBUTOR SERVER

/* Execute this script on the distributor server
Description: Return subscriptions that have not been synchronized in last @hrs hours.

The script loops through each publisher served by the local distributor server.
*/ SET NOCOUNT ON -- declare variables DECLARE @SQL NVARCHAR(4000), @SERVERNAME VARCHAR(500), @mypublisher nvarchar(500), @hrs int DECLARE @rcpts nvarchar(1000), @msg nvarchar(1000), @sub nvarchar(150), @query2 nvarchar(2000) SET @hrs = 24 -- return subscription that has not synchronized in last @hrs SET @rcpts = '<YOUR EMAIL ADDRESS>' -- make sure the distribution server is configured for the data access if not exists (select * from sysservers where srvid = 0 and dataaccess = 1) EXEC('exec sp_serveroption @server = ''' + @@SERVERNAME + ''', @optname = ''DATA ACCESS'', @optvalue = ''TRUE''') -- uses a cursor to loop through each publisher DECLARE C1 CURSOR FOR SELECT NAME FROM MSDB..MSdistpublishers ORDER BY 1 OPEN C1 FETCH C1 INTO @mypublisher WHILE @@FETCH_STATUS = 0 BEGIN IF OBJECT_ID('TEMPDB..##Temp_sp_replmonitorhelpsubscription') IS NOT NULL DROP TABLE ##Temp_sp_replmonitorhelpsubscription SET @SQL = 'select * INTO ##Temp_sp_replmonitorhelpsubscription from openquery([' + @@SERVERNAME + '], ''SET FMTONLY OFF exec [distribution].sys.sp_replmonitorhelpsubscription @publisher = ''''' + @mypublisher + ''''', @publication_type = 0, @mode = 0, @exclude_anonymous = 0, @refreshpolicy = N''''0'''''')' PRINT @SQL EXEC (@SQL) IF EXISTS (SELECT @mypublisher Publisher, * FROM ##Temp_sp_replmonitorhelpsubscription WHERE datepart(hh, getdate()-last_distsync) > @hrs) BEGIN SELECT @mypublisher Publisher, * FROM ##Temp_sp_replmonitorhelpsubscription WHERE datepart(hh, getdate()-last_distsync) > @hrs -- SELECT @mypublisher Publisher,Subscriber, Publication FROM ##Temp_sp_replmonitorhelpsubscription WHERE datepart(hh, getdate()-last_distsync) > @hrs SET @sub = @mypublisher + ': Subscribers found with no data sync in last ' + cast(@hrs as varchar(15)) + ' hours' SET @msg = '' SET @query2 = 'SET NOCOUNT ON; SELECT ''' + LEFT(@mypublisher, 25) + ''' Publisher,LEFT(Subscriber, 15) Subscriber, LEFT(Publication, 35) Publication, last_distsync [LastDistSync] FROM ##Temp_sp_replmonitorhelpsubscription WHERE datepart(hh, getdate()-last_distsync) > ' + cast(@hrs as varchar(15)) EXEC dba..sp_sendmail @recipients = @rcpts ,@message = @msg ,@subject = @sub -- ,@body_format = 'HTML' ,@query = @query2 END FETCH C1 INTO @mypublisher END CLOSE C1 DEALLOCATE C1

 .