Search This Blog

Monday, April 8, 2024

Data Search within SQL Server Databases

Search and Export Script for SQL Server Databases

Plainly speaking, this article is about searching for a specific value inside all tables and their columns of a SQL database and return list of tables and columns that contains that value. 

That being said, third-party search tools have been around for a long time, as well as custom SQL scripts that you can find online for free, for the purpose of searching for specific values within a database. And even if they don’t meet all your requirements, they at least offer a starting point. However, I realized these solutions weren't sufficient for my specific needs, and I found it easier and quicker to write my own script.

This script uses the dynamic SQL queries to search across all user tables and columns. It does allow you to specify filter criteria to search within a particular schema, table, or column or columns of specific data type groups, for example search inside numeric data only. It also includes commands for exporting table data using PowerShell and BCP utilities, which was one of my requirements.


/* 
-- CREATE A TEST TABLE FOR DEMO 
IF OBJECT_ID('my_data_search_table', 'U') IS NOT NULL
    DROP TABLE my_data_search_table;

CREATE TABLE my_data_search_table (
    id INT,
    name VARCHAR(100)
);

INSERT INTO my_data_search_table (id, name)
VALUES (1996, 'dummy');

*/
SET NOCOUNT ON;
USE <Your DB Name Here>;
GO

-- Drop temp tables if exists
IF OBJECT_ID('tempdb..#t_table_columns') IS NOT NULL DROP TABLE #t_table_columns;
IF OBJECT_ID('tempdb..#results') IS NOT NULL DROP TABLE #results;
GO

-- Search string
DECLARE @search_value NVARCHAR(4000) = N'dummy'; -- Example search value

/* Limit what column data types to search into 
Here, you can specify whether to search within numeric data types, 
other data types, or both. This helps in narrowing down the search 
to relevant fields, potentially speeding up the search 
process and reducing the load on the database.
*/
-- valid values are numeric, other or both
DECLARE @search_datatype   VARCHAR(10)  
-- SET @search_datatype   = 'numeric'

-- Search mode
DECLARE @exact_match       BIT = 1; 

-- EXECUTION OPTIONS
DECLARE @execute           BIT = 1; -- 1 to execute the search queries
DECLARE @debug             BIT = 0; -- 1 to print only
DECLARE @show_progress     BIT = 1; -- 1 to print progress messages
DECLARE @progress_interval INT = 100;

-- Schema, Table, and Column Filtering (NULL means no filter)
DECLARE @search_column_name NVARCHAR(1000);
DECLARE @search_table_name  NVARCHAR(1000);
DECLARE @search_schema_name NVARCHAR(1000);

-- Dynamic SQL query
DECLARE @SQL NVARCHAR(4000);

-- If wild card searfch is requested i.e. @exact_match = 0 
IF @exact_match = 0 SET @search_value = N'%' + @search_value + N'%';

-- Temp table to store columns metadata
SELECT s.name AS [schema_name],
       t.name AS [table_name], 
       c.name AS [column_name],
       TYPE_NAME(c.system_type_id) AS [column_type]
INTO #t_table_columns 
FROM sys.tables t
JOIN sys.schemas s ON t.schema_id = s.schema_id
JOIN sys.columns c ON t.object_id = c.object_id
WHERE t.is_ms_shipped = 0
  AND TYPE_NAME(c.system_type_id) NOT IN ('image', 'varbinary');


-- Store list of numeric data types into a table variable
DECLARE @numeric_types TABLE (name VARCHAR(100));

INSERT INTO @numeric_types (name)
VALUES
    ('bigint'),
    ('bit'),
    ('decimal'),
    ('int'),
    ('money'),
    ('numeric'),
    ('smallint'),
    ('smallmoney'),
    ('tinyint'),
    ('float'),
    ('real');


IF (@search_datatype = 'numeric' AND ISNUMERIC(@search_value) = 0)
BEGIN
   RAISERROR('Error: Search value (%s) invalid for numeric search.', 16, 1, @search_value)
   GOTO QUIT
END

-- Apply filters
IF @search_schema_name IS NOT NULL AND @search_schema_name <> ''
    DELETE FROM #t_table_columns WHERE [schema_name] <> @search_schema_name;
IF @search_table_name IS NOT NULL AND @search_table_name <> ''
    DELETE FROM #t_table_columns WHERE [table_name] <> @search_table_name;
IF @search_column_name IS NOT NULL AND @search_column_name <> ''
    DELETE FROM #t_table_columns WHERE [column_name] <> @search_column_name;

IF @search_datatype IS NOT NULL AND @search_datatype <> ''
BEGIN
	IF @search_datatype NOT IN ('numeric', 'other', 'both')
	BEGIN
		RAISERROR('Error: Invalid value %s for @search_datatype.', 16, 1, @search_datatype) WITH NOWAIT
		RAISERROR('Valid values are 1) numeric 2) other 3) both.', 16, 1) WITH NOWAIT
		GOTO QUIT
	END
	ELSE IF @search_datatype = 'numeric'
	    DELETE #t_table_columns FROM #t_table_columns t
			WHERE NOT EXISTS (SELECT * FROM @numeric_types v WHERE v.name = t.[column_type])
	ELSE IF @search_datatype = 'other'
	    DELETE #t_table_columns FROM #t_table_columns t
			INNER JOIN @numeric_types v ON v.name = t.[column_type]
END

-- Placeholder for results
SELECT TOP 0 * INTO #results FROM #t_table_columns;

-- Progress tracking
DECLARE @total_columns INT;
DECLARE @counter INT = 0;

-- Variables for cursor
DECLARE @schema_name SYSNAME,
        @table_name  SYSNAME,
        @column_name SYSNAME,
        @column_type NVARCHAR(500);

-- Declare cursor
DECLARE c1 CURSOR STATIC FOR 
    SELECT * FROM #t_table_columns 
    ORDER BY [schema_name], [table_name], [column_name];

OPEN c1;
SELECT @total_columns = @@CURSOR_ROWS; 


FETCH NEXT FROM c1 INTO 
    @schema_name, @table_name, @column_name, @column_type;

WHILE @@FETCH_STATUS = 0
BEGIN
    SET @counter = @counter + 1;

    -- Progress message
    IF @counter % @progress_interval = 0 AND @show_progress = 1
        RAISERROR('%i columns of %i processed', 10, 1, @counter, @total_columns) WITH NOWAIT;

    -- Build and execute the search query
    SET @SQL = N'SELECT TOP 1 ''' + @schema_name + N''' AS [schema_name], ''' 
               + @table_name + N''' AS [table_name], ''' +  @column_name 
               + N''' AS [column_name], ''' +  @column_type 
               + N''' AS [column_type] FROM ' + QUOTENAME(@schema_name) 
               + N'.' + QUOTENAME(@table_name) + N' WHERE TRY_CAST(' 
               + QUOTENAME(@column_name) + N' AS VARCHAR(8000)) LIKE ''' 
               + @search_value + N''';';

    IF @debug = 1 PRINT @SQL;

    -- Insert into results if match is found
    IF @execute = 1 BEGIN
        INSERT INTO #results EXEC (@SQL);
        IF @@ROWCOUNT > 0 BEGIN
            PRINT '';
            RAISERROR('*** match found ***', 10, 1) WITH NOWAIT;
            PRINT @SQL;
            PRINT '';
        END
    END

    FETCH NEXT FROM c1 INTO 
        @schema_name, @table_name, @column_name, @column_type;
END

-- Cleanup
CLOSE c1;
DEALLOCATE c1;

-- Display results with export commands
SELECT *,
       'SELECT COUNT(*) [' + [schema_name] + '.' + [table_name] 
       + '] FROM '  + QUOTENAME([schema_name]) + '.' 
       + QUOTENAME([table_name]) + ' WHERE TRY_CAST(' 
       + QUOTENAME([column_name])  + ' AS VARCHAR(8000)) LIKE ''' 
       + @search_value + N''';' AS [SQL],
       'Send-SQLDataToExcel -Connection "Server=' + @@SERVERNAME 
       + ';Trusted_Connection=True;" -MsSQLserver -DataBase "' 
       + DB_NAME() + '" -SQL "select * from ' 
       + QUOTENAME([schema_name]) + '.' + QUOTENAME([table_name]) 
       + '" -Path "$env:USERPROFILE\Documents\' + DB_NAME() +  '.' 
       + [schema_name] + '.' + [table_name] + '.xlsx"' AS [PS Export],
       'BCP ' + QUOTENAME(DB_NAME()) + '.' +  QUOTENAME([schema_name]) 
       + '.' + QUOTENAME([table_name]) + ' out %USERPROFILE%\Documents\' 
       + DB_NAME() + '_' +  [schema_name] + '_' + [table_name] 
       + '.txt -c -t, -T -S' + @@SERVERNAME AS [BCP Export]
FROM #results 
ORDER BY [schema_name], [table_name], [column_name];

QUIT:


Gotchas and work arounds:


The script does not search within columns of certain data types, notably 'IMAGE' and 'VARBINARY'. These data types often store binary data (e.g., files, images, binary large objects (BLOBs)) that require different methods to search and interpret. Searching within such data types would necessitate additional logic for conversion or binary pattern matching, which is not covered by this script. This is by design.

And for columns with data type VARCHAR(MAX) and NVARCHAR(MAX) that store large amounts of text, the script's default method of casting the column to `VARCHAR(8000)` may truncate data in those columns, possibly missing matches in the truncated portion. You can however modify it to VARCHAR(MAX).

There are also couple performance considerations to keep in mind.  The script uses a cursor to iterate over potentially thousands of columns across many tables and executes a dynamic SQL query for each column. This can be resource-intensive and slow, especially in large databases with many tables or columns. As a mitigating action, you can set some filters (e.g., specific schemas or tables, numeric data only etc.).

Similarly, for very huge tables (e.g. billions of rows), the script understandly performs very poorly, especially if the table has no matching records at all but the script will still have to scan all rows for each and every column in it to find out that it doesn't have any matching records after all. You can however limit the performance hit by applying some filters mentioned above.

And because it uses dynamic SQL, it also poses a risk of SQL injection which is why I do not recommend to build a UI around it where users can supply any input values to it. You can however convert the script into a stored procedure and add code to validate and sanitize any inputs passed to it.

I believe the script is highly customizable and moderately detailed enough method for searching data inside a SQL Server database and, designed to accommodate some of the more typical search scenarios and preferences. Please ensure you modify and adapt it to suit your specific requirements.



Friday, March 29, 2024

Finding the Latest Backup Timestamps for Your Databases

Finding the Latest Backup Timestamps for Your Databases

I wanted to find out the most recent date and time each database was last backed up, focusing specifically on full, differential, and transaction log backups, all displayed on a single line for each database for easier viewing and reading. For example:







This provides a high-level overview of when my databases were last backed up, including differential and transaction log backups, if applicable.  The query results also show the duration of each backup (not shown in the above screenshot). Here is the query:

USE msdb
GO

;WITH cte_latest_backup AS (
    SELECT 
        sd.Name AS DatabaseName,
        bs.type AS backup_type,
        MAX(bs.backup_start_date) AS backup_start_time,
        MAX(bs.backup_finish_date) AS backup_finish_time
    FROM sys.sysdatabases sd
    INNER JOIN msdb.dbo.backupset bs 
       ON bs.database_name = sd.name
    GROUP BY sd.Name, bs.type
),
cte_full_backup AS 
(
    SELECT * FROM cte_latest_backup 
    WHERE backup_type = 'D'
),
cte_diff_backup AS 
(
    SELECT * FROM cte_latest_backup 
    WHERE backup_type = 'I'
),
cte_tlog_backup AS 
(
    SELECT * FROM cte_latest_backup
    WHERE backup_type = 'L'
)

SELECT
    d.name AS [Database], 
    d.recovery_model_desc,
    full_backup.backup_start_time AS full_backup_start,
    diff_backup.backup_start_time AS diff_backup_start,
    tlog_backup.backup_start_time AS tlog_backup_start,
    DATEDIFF(MINUTE, full_backup.backup_start_time, 
             full_backup.backup_finish_time) AS full_backup_elapsed_min,
    DATEDIFF(MINUTE, diff_backup.backup_start_time, 
             diff_backup.backup_finish_time) AS diff_backup_elapsed_min,
    DATEDIFF(SECOND, tlog_backup.backup_start_time, 
             tlog_backup.backup_finish_time) AS tlog_backup_elapsed_sec
FROM master.sys.databases d
LEFT JOIN cte_full_backup full_backup 
  ON full_backup.DatabaseName = d.name
LEFT JOIN cte_diff_backup diff_backup 
  ON diff_backup.DatabaseName = d.name
LEFT JOIN cte_tlog_backup tlog_backup 
  ON tlog_backup.DatabaseName = d.name
WHERE d.name NOT IN ('tempdb')
ORDER BY [Database]



Gotchas:

  • The query doesn't capture other available types of backups like file backup, file group backups, partial backups etc. If your backup strategy includes any of those backup types, please consider adapting the query to meet your needs. 
  • The query doesn't also differentiate whether the full backup was done in copy-only mode or not.
  • In AlwaysOn AG servers, some of your database backups maybe offloaded or distributed across primary and secondary replicas. This query doesn't capture backups performed on other replicas and therefore the most recent backup date/times displayed by the query may not be accurate for databases that are in an Availability Group.






Tuesday, January 30, 2024

How to Find Where Your Databases Reside In The File System

How to Find Where Your Databases Reside In The File System

SQL Script:


/*
RETURNS LIST OF FOLDER NAMES WHERE SQL SERVER DATABASE
FILES ARE STORED. 

It is dynamic enough to be 
used on even much older versions of SQL Server.

*/

DECLARE @PRODUCTVERSION NVARCHAR(132)
DECLARE @PERMISSION_FLAG bit
SET @PRODUCTVERSION = CAST(SERVERPROPERTY('ProductVersion') AS NVARCHAR(32)) 
SET @PERMISSION_FLAG = 1

-- For use on older versions of sql server or if you don't have permission to sys.master_files
-- tested in sql 2000 & 20005
if SUBSTRING(@PRODUCTVERSION, 1, CHARINDEX('.', @PRODUCTVERSION)-1) < 10 OR @PERMISSION_FLAG = 0 OR @PERMISSION_FLAG IS NULL
	BEGIN
	PRINT 'We gonna use old school method.....'
	set nocount on if object_id('tempdb..#t1') is not null drop table #t1
		create table #t1 (fpath varchar(4000), fname varchar(8000))
	insert into #t1
		exec sp_msforeachdb 'select left(filename, len(filename) - charindex(''\'', reverse(rtrim(filename)))) fpath, filename fname from [?].dbo.sysfiles'
	select distinct fpath from #t1 order by 1
END

-- For use on SQL version xxx and above 
ELSE 
	with cte as
	(
		select 
			case 
				when type = 2 then physical_name     -- FILESTREAM
				else left(physical_name, len(physical_name) - charindex('\', reverse(rtrim(physical_name)))) 
			end folder_path, 
			physical_name  
		from sys.master_files
	)
	select distinct 
		folder_path,
		[create_folder] = 'mkdir -p "' + folder_path + '"'
	from cte order by folder_path;




This SQL script is designed to return a list of distinct folder paths where SQL Server database files are stored. It adapts its behavior based on the version of SQL Server it's running against and whether specific permissions are set. 

The output of the script includes a column with command to create directories (mkdir -p), useful in cases where this script is part of a larger process, perhaps in database migrations or backups.



Use Cases:

Finding the folder paths where SQL Server database files are stored can be beneficial for several operational, management, and maintenance tasks related to database administration and infrastructure management. Here are some key use cases:


1. Backup and Restore Operations

- Backup Scripts: Knowing the folder paths helps in scripting backup operations, especially when you want to NOT save the backup files in the same location as the database files or in a related directory structure.

- Restore Operations: When restoring databases, especially to a new server or instance, knowing the original file locations can be crucial for planning the restore process, particularly if you need to maintain a specific directory structure or adhere to storage policies.


2. Disaster Recovery Planning

- Identifying the storage locations of database files is essential for creating effective disaster recovery plans. It allows administrators to ensure that all necessary files are included in backups and can be quickly restored in case of a failure.


3. Capacity Planning and Monitoring

- Storage Management: Monitoring the disk space usage of SQL Server file paths can help in forecasting future storage requirements and preventing disk space shortages that could lead to database downtime.

- Performance Optimization: File placement can significantly impact database performance. Knowing where files are stored enables DBAs to distribute files across different storage subsystems based on performance characteristics.


4. Database Migration

- When migrating databases between servers or instances, it's important to know the current storage paths to replicate the environment accurately on the target server or to plan new storage configurations that improve performance or resource utilization.


5. Auditing and Compliance

- For compliance with internal policies or external regulations, it may be necessary to verify that database files are stored on secure, encrypted, or approved storage devices. Knowing the file paths helps in auditing these practices.


6. Server and Storage Configuration

- High Availability (HA) and Disaster Recovery (DR) Solutions: Configuring solutions like Always On Availability Groups, Database Mirroring, or Log Shipping often requires precise control over file locations.

- Storage Optimization: Identifying file locations helps in moving files to optimize storage across different media (SSDs, HDDs) or to balance I/O load.


7. Troubleshooting and Maintenance

- When dealing with file corruption, access issues, or performance problems, knowing the exact location of database files is the first step in diagnosing and resolving such issues.


8. Automated Scripting and Tooling

- Automating database tasks such as backups, file growth monitoring, or applying storage policies often requires scripts that know where database files are located. Scripts may also need to create or modify directories based on these paths.





Understanding and Using xp_sqlagent_enum_jobs in SQL Server

Understanding and Using xp_sqlagent_enum_jobs in SQL Server

What is xp_sqlagent_enum_jobs?


xp_sqlagent_enum_jobs is an undocumented, extended stored procedure in Microsoft SQL Server. It is used to provide information about the jobs that are currently managed by the SQL Server Agent. This extended stored procedure is used in some internal procedures (e.g. sp_get_composite_job_info) by SQL Server, but it can also be used in custom scripts or applications to obtain details about SQL Agent jobs, especially in automated scripts or applications that need to track job status or performance. For example:


How to start SQL Server job using TSQL and wait for it to complete?


Often, the main impetus for using xp_sqlagent_enum_jobs is to programmatically check if a SQL Server Agent job is currently running, and then either wait for it to complete or take some other action. While interactive tools are available for monitoring job status, there aren't many reliable and well-documented stored procedures, system tables, or DMVs suitable for use in automated, non-interactive scenarios.


Syntax and Usage:


xp_sqlagent_enum_jobs typically requires that minimum two input parameters are given and, a third optional parameter. However, since this procedure is undocumented, the exact nature and behavior of these parameters might vary slightly based on the version of SQL Server. Here is a general explanation of its parameters:

  • Visibility Flag / is_sysadmin (Int): This parameter is usually an integer 0 or 1 to indicate if the executing user is a sysadmin (member of sysadmin role) and determines the scope of the job information returned by the procedure. A value of 1 indicates Yes and the procedure can return information for all jobs. A value of 0 means No in which case the procedure will return only those jobs that are owned by the executing user or jobs that the user has been granted permissions to view. 

In my experience though, xp_sqlagent_enum_jobs  seems to accept not only 0 or 1 but various integer values, both positive and negative.


  • Job Owner (Sysname): This is a string parameter representing the user in the context of whom the job information is requested. It's often provided in the format of a domain and username (e.g., DOMAIN\Username) or 'sa' if a DBA is running this procedure. This parameter is used along with the the visibility flag to determine which jobs are visible to the specified user based on their permissions.

Again in my experience, xp_sqlagent_enum_jobs  seems to accept any value, including invalid or non-existent usernames.

  • Job ID (UNIQUEIDENTIFIER): Optional parameter to get information for a single job with a specific job ID. 

If you are getting Job ID value from the msdb.dbo.sysjobs table, make sure to convert its value to a UNIQUEIDENTIFIER data type.


Basic Examples:


In this example, 1 is the visibility flag, and 'sa' represents the user.
EXEC master.dbo.xp_sqlagent_enum_jobs 1, 'sa' 












To get information about a specific job:

DECLARE @job_name NVARCHAR(100) = 'DatabaseIntegrityCheck - USER_DATABASES'
DECLARE @job_id UNIQUEIDENTIFIER
SELECT @job_id = job_id FROM   msdb..sysjobs WHERE  NAME = @job_name
EXEC master.dbo.Xp_sqlagent_enum_jobs 1, 'sa', @job_id 



When using xp_sqlagent_enum_jobs, consider the following aspects:


Permissions: The actual jobs returned can depend on the permissions of the user executing the procedure or the user specified in the parameters.

Security Risk: As an undocumented feature, using this procedure can pose a security risk, especially if used improperly or without understanding its impact.

Version Dependence: The behavior and availability of xp_sqlagent_enum_jobs can change without notice in different versions of SQL Server.

Results/Output Columns:


The procedure returns a result set with various columns providing details about each job. These details typically include:

  1. Job ID: A unique identifier for each job.
  2. Last Run Date: The date and time when the job was last executed.
  3. Next Run Date: The date and time when the job is scheduled to run next.
  4. Next Run Schedule ID: The identifier of the schedule according to which the job will run next.
  5. Requested To Run: Indicates whether the job has been requested to run immediately (outside its scheduled times).
  6. Request Source: Indicates the source of the run request.
  7. Request Source ID: The identifier of the request source.
  8. Running: Indicates whether the job is currently running.
  9. Current Step: The current step number the job is on (if it's a multi-step job).
  10. Current Retry Attempt: The number of the current retry attempt (in case the job is configured to retry on failure).
  11. State: The current state of the job.


What does it mean that xp_sqlagent_enum_jobs is undocumented?


xp_sqlagent_enum_jobs is indeed an undocumented extended stored procedure in Microsoft SQL Server. This means that it is not officially documented or supported by Microsoft in their public documentation. In essence, this raises several important considerations:

  • Lack of Official Documentation: Since there's no official documentation, detailed information about its parameters, behavior, and potential changes in different versions of SQL Server are not publicly available through Microsoft's channels.
  • Potential for Change: Microsoft may change or remove undocumented features in any new release or update of SQL Server without prior notice. This can lead to compatibility issues or sudden breaks in functionality in systems that rely on these features.
  • Limited Support: If you encounter issues while using undocumented features, finding solutions can be more challenging as these features are typically not covered by standard support channels.
  • Use with Caution: It's advisable to use undocumented features with caution, especially in production environments. They might be less stable or tested compared to officially supported features.
  • Alternative Approaches: For long-term stability and support, it's often better to seek alternative, documented features or methods to achieve the same goals.
  • Community Knowledge: Sometimes, information about undocumented features can be found through community forums, blogs, or from SQL Server experts who share their findings. However, this information is based on personal experiences and might not be universally applicable.


Usage Scenerios:


Here are a few example scenarios of how xp_sqlagent_enum_jobs might be used:

Basic Usage to List All SQL Agent Jobs

This is the simplest form of using xp_sqlagent_enum_jobs to get a list of all SQL Agent jobs with their details.

EXEC master.dbo.xp_sqlagent_enum_jobs 1, '<your_domain\your_login>'

/* OR */

EXEC master.dbo.xp_sqlagent_enum_jobs 1, 'sa'


Checking for Running Jobs

You can use xp_sqlagent_enum_jobs to check if there are any jobs currently running on the SQL Server Agent.


IF OBJECT_ID('tempdb..#RunningJobs') is not null DROP TABLE #RunningJobs;
go
CREATE TABLE #RunningJobs (
    job_id UNIQUEIDENTIFIER NOT NULL,
    last_run_date INT NOT NULL,
    last_run_time INT NOT NULL,
    next_run_date INT NOT NULL,
    next_run_time INT NOT NULL,
    next_run_schedule_id INT NOT NULL,
    requested_to_run INT NOT NULL, -- boolean
    request_source INT NOT NULL,
    request_source_id sysname COLLATE database_default NULL,
    running INT NOT NULL, -- boolean
    current_step INT NOT NULL,
    current_retry_attempt INT NOT NULL,
    state INT NOT NULL
)


INSERT INTO #RunningJobs
EXEC master.dbo.xp_sqlagent_enum_jobs 1, '<your_domain\your_login>'
SELECT * FROM #RunningJobs WHERE running = 1

IF OBJECT_ID('tempdb..#RunningJobs') is not null DROP TABLE #RunningJobs;
go


Finding Jobs with Specific Conditions

You can filter the results to find jobs with specific conditions, like jobs that have failed to run, or jobs scheduled to run at a specific time.

SELECT * FROM #RunningJobs WHERE last_run_outcome <> 1 -- Replace with your specific condition


Wait for a SQL Server job to complete before proceeding



Precautions and Best Practices:



  • Use a Dedicated Account: When specifying the user account (<your_domain\your_login>), it's best to use an account with appropriate permissions.
  • Temporary Tables: In the examples, I've used temporary tables (#RunningJobs) to store the output of xp_sqlagent_enum_jobs. This is a common practice to make data handling easier.
  • Cleanup: Always remember to clean up temporary tables to avoid unnecessary consumption of resources.
  • Testing: Thoroughly test these scripts in a non-production environment first to ensure they work as expected in your specific SQL Server setup.
  • Monitoring and Logging: When using such procedures in scripts or applications, implement adequate logging and error monitoring.
  • Seek Alternatives: While I haven't tried or tested the following myself, here is an article discussing one of the possible alternatives:

xp_sqlagent_enum_jobs alternative








Tuesday, January 23, 2024

SQL Server's Buffer Cache: How to Calculate Dirty Page Sizes

SQL Server's Buffer Cache: How to Calculate Dirty Page Sizes

As we know, the SQL Server buffer cache is an area or region inside the server's memory where SQL Server keeps the data it needs for processing queries.

Primarily, the buffer cache's job is to reduce the amount of time SQL Server spends reading data from the disk. It does this by keeping frequently used data in memory, which is much faster to access than disk storage. 

Additionally, the buffer also contains data that don't already exist on the disk, meaning in the SQL Server data files. That would be the case for the new data, when new data are added, although SQL Server will write those data immediately to the transaction log file (unless the Delayed Durability is turned on), these new data are not immediately written/stored into the data files, SQL Server waits until the the next CHECKPOINT process kicks in which then writes those called "dirty" into the data files. The LAZY WRITER process also writes dirty pages to the disk but it does so continuously, in a much less aggressive manner.

To calculate the size of the dirty pages in buffer cache, SQL Server exposes some meta data regarding the pages in buffer cache through sys.dm_os_buffer_descriptors DMV.  

If you have VIEW SERVER STATE permission or, the SYSDBA role membership, you can query sys.dm_os_buffer_descriptors, mostly to get some summary information as it only contains the meta data. The below query is an example of such, which returns the ratio/percentage of dirty pages against the overall/total size of the buffer cache.

;WITH cte
     AS (SELECT Cast(Count(*) AS FLOAT) Total_Pages,
                Count(CASE WHEN is_modified = 0 THEN 1 END) Clean_Pages,
                Count(CASE WHEN is_modified = 1 THEN 1 END) Dirty_Pages
         FROM   sys.dm_os_buffer_descriptors
         WHERE  page_type IN ( 'DATA_PAGE', 'INDEX_PAGE', 'TEXT_MIX_PAGE' ))

SELECT Total_Pages,
       Clean_Pages,
       Dirty_Pages,
       Dirty_Pages_Percentage = Cast(( Dirty_Pages / Total_Pages ) * 100 
              AS DECIMAL(4, 2)) 
FROM   cte; 


Dirty Pages In SQL Server Buffer Cache







To get this information for each database:

;WITH cte
     AS (SELECT 
		database_id,
                Case When database_id = 32767 then '(ResourceDB)'
                     Else Coalesce(db_name(database_id),'***** TOTAL *****') 
                    End [Database],

		Cast(Count(*) AS FLOAT) Total_Pages,
                Count(CASE WHEN is_modified = 0 THEN 1 END) Clean_Pages,
                Count(CASE WHEN is_modified = 1 THEN 1 END) Dirty_Pages

         FROM   sys.dm_os_buffer_descriptors
         WHERE  page_type IN ( 'DATA_PAGE', 'INDEX_PAGE', 'TEXT_MIX_PAGE' )
		 GROUP BY ROLLUP (database_id))

SELECT [Database],
       Total_Pages,
       Clean_Pages,
       Dirty_Pages,
       Dirty_Pages_Percentage = Cast(( Dirty_Pages / Total_Pages ) * 100 
              AS DECIMAL(4, 2)) 
FROM   cte
ORDER BY Total_Pages desc; 


A word of caution: if you have a very large buffer cache (like 75GB or even 750GB), querying this DMV can be slow. The larger the cache, the more data the DMV has to process. For a 75GB cache, it might need to process about ~10 million rows; for a 750GB cache, that number could jump to ~100 million rows!


Please also note that when using the sys.dm_os_buffer_descriptors tool to view the buffer cache contents, keep in mind that it also includes pages used by SQL Server's internal Resource database. Often, these pages are left out when people are summing up or analyzing the buffer cache data. However, in this particular example, I've chosen to include them in the count and analysis.



Wednesday, January 17, 2024

Best Practice Tip #1: Beware of Best Practices

Best Practice Tip #1: Beware of Best Practices

It may sound contractionary that the first item of a best practices list is to be wary of best practices. When you hear a piece of 'best practice' advice, it's wise to take a step back and think it through before jumping in.

Now, it's true that not every system is completely unique – there are times when systems are pretty much carbon copies of each other. But, each organization usually has its own set of specific needs and ways of doing things. There is no one-size fits for all, all the time. So, even if a piece of advice sounds good on paper, it's important to see if it really fits your particular situation.

Also, be wary of best practices that might be out of date. This includes tips for older versions of SQL Server or from the days when 8 GB of RAM was a lot. When you're searching online, you might run into advice that’s been hanging around for a while. Plus, a strategy that worked wonders for your system in the past might not be the best choice now if things have changed. And don’t forget, even if a tip seems like it'll boost your system, it could have downsides, like the risk of messing up things you've already done, which could be a major issue.


A Detailed Example:


Imagine you're managing your company's databases. You find this often-suggested tip: 'Always update to the latest database software for top performance and security.' It sounds sensible, and in fact, at my current job, it’s almost a given to keep everything up-to-date.

But here’s the catch: just because it's latest and greatest version doesn't mean it's automatically the right choice for you. It’s all about weighing the good against the potential hiccups. New versions might and often do offer cool features, but will they integrate smoothly with your current setup? Could they cause unexpected problems? It’s about finding the right balance for your specific needs. 


Here are few of the points to consider:


  • You have a database version that is stable and has been customized to suit your company's unique needs. Yes, this violates the best practice that don't change the default settings or deploy solutions that will get overwritten if you ever upgrade the version. But, here you are. And the latest database version offers new features, but it's unknown how these will interact with your customizations.
  • The best practices often do become outdated. Research if the advice is based on the latest information or if it was more relevant to older versions of the database software.
  • Analyze how the upgrade might affect your current workload and system performance.
  • Understand the potential risks, such as incompatibility with existing applications or downtime during the upgrade process.
  • While the upgrade might offer improved performance and security, it could also introduce new challenges or instability.
  • The risk of disrupting your current stable environment might outweigh the benefits of the new features in the latest version.


In any of the above circumstances, blindly following the best practice of upgrading to the latest software version without considering these factors could lead to unforeseen complications. 

Also, always test, test and test and have more than one back-out/rollback options ready in case things don't go as planned.

Here are some more examples:

  • Password Management: The old security best practice was to change passwords frequently, like every 90 days. This advise is now being questioned as it often leads to weaker passwords and password fatigue. The modern approach emphasizes strong, unique passwords possibly managed through a password manager, and the use of multi-factor authentication. MFA is already available in Azure SQL Server and, you could achieve this even in on-premise SQL Servers.
  • CPU, Memory and Storage Solutions: The use of physical servers for data storage and processing was a common practice. In today's cloud computing world, memory (both RAM and disk) and CPUs are not only cheap, they are dynamically available, and transient – therefore, not even a long-term investment if you don't end up needing them in future. 
  • Denormalization for Performance: In the past, full denormalization was a common practice to improve database performance, especially in data warehousing systems with limited resources. Nowadays, some or more degree of normalization is preferred. And I am aware, here I am dangerously close to long running debate/feud between the Team Star Schema Vs Team Snow-Flake Schema.
  • Vertical Scaling: The traditional approach to handling increased load or data volume was to scale up (vertical scaling) by upgrading the existing server's hardware (more CPU, RAM, storage). Modern databases often favor horizontal scaling (scaling out), which involves adding more nodes to the system (e.g., AlwaysOn Availability Groups), offering better fault tolerance and scalability.
  • Full Database Backups: In very distant past, relying solely on full database backups was common and/or the only option available, not anymore. While still important, it is inefficient for VLDBs. Incremental and differential backups are now frequently used alongside full backups to minimize data loss and recovery time.
  • Manual Tuning and Maintenance: Earlier, database administrators (DBAs) spent considerable time manually tuning queries and maintaining databases. Today, many databases incorporate automated performance tuning, real-time monitoring, and AI-driven optimization, reducing the need for manual intervention.


These examples illustrate how database management best practices have evolved and will continue to do so.





Tuesday, January 16, 2024

A Primer on Attaching Databases with PowerShell Method AttachDatabase

A Primer on Attaching Databases with PowerShell Method AttachDatabase

Attaching a SQL Server database using PowerShell can be straightforward. Here's an effective and easy approach using the Invoke-Sqlcmd Cmdlet, by encapsulating your SQL statement in a variable, and then pass it to the Invoke-Sqlcmd cmdlet. For example:


$sql_command = "

    USE [master]

    GO

    CREATE DATABASE [TestDB_20240115_173550] ON 

    ( FILENAME = N'K:\\SQL2022AG01\\DATA\\TestDB_20240115_173550.mdf' ),

    ( FILENAME = N'L:\\SQL2022AG01\\LOG\\TestDB_log_20240115_173550.ldf' )

    FOR ATTACH

    GO

"



Invoke-Sqlcmd -ServerInstance SQLVM01\\SQL2022AG01 -Query $sql_command


The above command assumes that the PowerShell module for the SQL Server is installed and imported into your PowerShell session.  For any DBA using PowerShell to manage SQL Server instances, I think this should be automatically loaded, which you can do by copying the SQL Server module files in one of the directories listed in $env:PSModulePath. 

To determine the locations where PowerShell modules can be stored, simply enter $env:PSModulePath in the PowerShell console and press Enter. This will display the paths:

PS C:\Users\Dummy> $env:PSModulePath

C:\Users\dummy\Documents\WindowsPowerShell\Modules;
C:\Program Files\WindowsPowerShell\Modules;
C:\WINDOWS\system32\WindowsPowerShell\v1.0\Modules

On my system, these three directories are where I can place PowerShell module files. The choice of directory depends on whether I want the module accessible only to my user profile or to all users on the computer. It also depends on my permissions to create files and folders in the Program Files and System32 directories.

You can verify if the SQL Server module is loaded into your current session with the Get-Module command. 

PS C:\Users\Dummy> Get-Module -Name sqlserver | Format-List

Name              : SQLServer
Path              : C:\Users\dummy\Documents\WindowsPowerShell\Modules\SQLServer\21.1.18256\SqlServer.psm1

Description       : This module allows SQL Server developers, administrators and business intelligence professionals to automate database development and server administration, as well as both 
                    multidimensional and tabular cube processing.
                    
                    For Release Notes, expand the Package Details section on this page.
                    
                    For automation with SSRS & Power BI Report Server, use the ReportingServicesTools module available at https://www.powershellgallery.com/packages/ReportingServicesTools
ModuleType        : Script
Version           : 21.1.18256
NestedModules     : {Microsoft.SqlServer.Management.PSSnapins, Microsoft.SqlServer.Management.PSProvider, Microsoft.AnalysisServices.PowerShell.Cmdlets, 
                    Microsoft.SqlServer.Assessment.Cmdlets...}
ExportedFunctions : {Invoke-SqlNotebook, SQLSERVER:}
ExportedCmdlets   : {Add-RoleMember, Add-SqlAvailabilityDatabase, Add-SqlAvailabilityGroupListenerStaticIp, Add-SqlAzureAuthenticationContext...}
ExportedVariables : 
ExportedAliases   : {Decode-SqlName, Encode-SqlName}


BTW,  in a layman's terms, a PowerShell module is basically your regular folder containing files like script files, DLL/assemblies files etc, as highlighted SQLServer value in the above example.

If it returns nothing that means you currently don't have the SQLServer module in your session. Generally, though if the module folder is in one of the folders listed in $env:PSModulePath then it should get automatically loaded into your session when you reference any of the commands from that module for the first time. PowerShell won't load a module into your session until it is required.

You can also manually load a module into your session using the Import-Module cmdlet:

Import-Module -Name sqlserver


If not then you can install it from the PSGallery, provided that PSGallery is configured as a repository:



Find-Module -Name SQLServer






Install-Module -Name SQLServer


Of course, I already have a previous version of this module installed, which I can upgrade to the latest version by adding -Force to the Install-Module command. This is also essentially how you would typically upgrade an existing module.




That being said, beyond SQL statements, PowerShell offers native cmdlets for database management. To attach a database, use the AttachDatabase method. Here's an example:


# Connect to the sql server instance
$sql_instance = Get-SqlInstance -ServerInstance SQLVM01\SQL2022AG01

# Attach the AttachDatabase method
$sql_instance.AttachDatabase('TestDB_20240115_173550', 'K:\SQL2022AG01\DATA\TestDB_20240115_173550.mdf')


The value for first parameter, "TestDB_20240115_173550", is the name for the attached database. The second value is the file path for the primary data file. SQL Server will look up the location of other files (secondary data files, transaction logs etc.) from the primary data file and will look for them in their original locations. 

Below is more information on the AttachDatabase method, along with the parameters you can pass to it. Notice the Definition field, each line in it starting with "void...." is called method/function overload  (https://en.wikipedia.org/wiki/Function_overloading)  , each has different list of parameters that SQL Server will automatically determine based on how you execute the method:

Get-Member -InputObject $sql_instance -MemberType Methods `
           -Name AttachDatabase | Format-List

TypeName   : Microsoft.SqlServer.Management.Smo.Server
Name       : AttachDatabase
MemberType : Method
Definition : void AttachDatabase(string name, System.Collections.Specialized.StringCollection files, string owner), 

             void AttachDatabase(string name, System.Collections.Specialized.StringCollection files), 

             void AttachDatabase(string name, System.Collections.Specialized.StringCollection files, 
                Microsoft.SqlServer.Management.Smo.AttachOptions attachOptions), 

             void AttachDatabase(string name, System.Collections.Specialized.StringCollection files, string owner, 
                    Microsoft.SqlServer.Management.Smo.AttachOptions attachOptions)


Lets look at the first function overload:

Void AttachDatabase(string name, System.Collections.Specialized.StringCollection files, string owner)


It accepts 3 parameters:

  1. Name: This would be the name of the database when it's attached
  2. Files: This would be an array with the database file names i.e. MDFs, NDFs, Transaction log files, FileStream directories etc. Only the location of the primary data file is required, rest are optional. If other files are omitted, SQL Server will read their location from the primary data file and then will try to look for them in their original location. If any of the files is missing, it will throw an error, and believe me, those error messages are not very informative!
  3. Owner: The name of the database owner. If you omit this value then the login of the current user will be the owner of the attached database. I generally use "sa" as the database owner.


Below is the second overload for the method. In this one you are not passing the value for Owner parameter.

 
void AttachDatabase(string name, System.Collections.Specialized.StringCollection files) 

Advanced Options: 


The method also supports AttachOptions for more control. Here is list of options you can pass to it:
  • EnableBroker
  • ErrorBrokerConversations
  • NewBroker
  • None
  • RebuildLog
For detailed information, refer to the Microsoft documentation:



In conclusion, PowerShell offers both straightforward and advanced methods for attaching SQL databases. Whether you prefer direct SQL commands or native PowerShell cmdlets, the process is efficient and well-integrated into the SQL Server environment. By understanding the nuances of these methods and ensuring the necessary modules are installed and loaded, you can seamlessly manage your SQL Server databases.