Search This Blog

Tuesday, January 30, 2024

How to Find Where Your Databases Reside In The File System

How to Find Where Your Databases Reside In The File System

SQL Script:


/*
RETURNS LIST OF FOLDER NAMES WHERE SQL SERVER DATABASE
FILES ARE STORED. 

It is dynamic enough to be 
used on even much older versions of SQL Server.

*/

DECLARE @PRODUCTVERSION NVARCHAR(132)
DECLARE @PERMISSION_FLAG bit
SET @PRODUCTVERSION = CAST(SERVERPROPERTY('ProductVersion') AS NVARCHAR(32)) 
SET @PERMISSION_FLAG = 1

-- For use on older versions of sql server or if you don't have permission to sys.master_files
-- tested in sql 2000 & 20005
if SUBSTRING(@PRODUCTVERSION, 1, CHARINDEX('.', @PRODUCTVERSION)-1) < 10 OR @PERMISSION_FLAG = 0 OR @PERMISSION_FLAG IS NULL
	BEGIN
	PRINT 'We gonna use old school method.....'
	set nocount on if object_id('tempdb..#t1') is not null drop table #t1
		create table #t1 (fpath varchar(4000), fname varchar(8000))
	insert into #t1
		exec sp_msforeachdb 'select left(filename, len(filename) - charindex(''\'', reverse(rtrim(filename)))) fpath, filename fname from [?].dbo.sysfiles'
	select distinct fpath from #t1 order by 1
END

-- For use on SQL version xxx and above 
ELSE 
	with cte as
	(
		select 
			case 
				when type = 2 then physical_name     -- FILESTREAM
				else left(physical_name, len(physical_name) - charindex('\', reverse(rtrim(physical_name)))) 
			end folder_path, 
			physical_name  
		from sys.master_files
	)
	select distinct 
		folder_path,
		[create_folder] = 'mkdir -p "' + folder_path + '"'
	from cte order by folder_path;




This SQL script is designed to return a list of distinct folder paths where SQL Server database files are stored. It adapts its behavior based on the version of SQL Server it's running against and whether specific permissions are set. 

The output of the script includes a column with command to create directories (mkdir -p), useful in cases where this script is part of a larger process, perhaps in database migrations or backups.



Use Cases:

Finding the folder paths where SQL Server database files are stored can be beneficial for several operational, management, and maintenance tasks related to database administration and infrastructure management. Here are some key use cases:


1. Backup and Restore Operations

- Backup Scripts: Knowing the folder paths helps in scripting backup operations, especially when you want to NOT save the backup files in the same location as the database files or in a related directory structure.

- Restore Operations: When restoring databases, especially to a new server or instance, knowing the original file locations can be crucial for planning the restore process, particularly if you need to maintain a specific directory structure or adhere to storage policies.


2. Disaster Recovery Planning

- Identifying the storage locations of database files is essential for creating effective disaster recovery plans. It allows administrators to ensure that all necessary files are included in backups and can be quickly restored in case of a failure.


3. Capacity Planning and Monitoring

- Storage Management: Monitoring the disk space usage of SQL Server file paths can help in forecasting future storage requirements and preventing disk space shortages that could lead to database downtime.

- Performance Optimization: File placement can significantly impact database performance. Knowing where files are stored enables DBAs to distribute files across different storage subsystems based on performance characteristics.


4. Database Migration

- When migrating databases between servers or instances, it's important to know the current storage paths to replicate the environment accurately on the target server or to plan new storage configurations that improve performance or resource utilization.


5. Auditing and Compliance

- For compliance with internal policies or external regulations, it may be necessary to verify that database files are stored on secure, encrypted, or approved storage devices. Knowing the file paths helps in auditing these practices.


6. Server and Storage Configuration

- High Availability (HA) and Disaster Recovery (DR) Solutions: Configuring solutions like Always On Availability Groups, Database Mirroring, or Log Shipping often requires precise control over file locations.

- Storage Optimization: Identifying file locations helps in moving files to optimize storage across different media (SSDs, HDDs) or to balance I/O load.


7. Troubleshooting and Maintenance

- When dealing with file corruption, access issues, or performance problems, knowing the exact location of database files is the first step in diagnosing and resolving such issues.


8. Automated Scripting and Tooling

- Automating database tasks such as backups, file growth monitoring, or applying storage policies often requires scripts that know where database files are located. Scripts may also need to create or modify directories based on these paths.





Understanding and Using xp_sqlagent_enum_jobs in SQL Server

Understanding and Using xp_sqlagent_enum_jobs in SQL Server

What is xp_sqlagent_enum_jobs?


xp_sqlagent_enum_jobs is an undocumented, extended stored procedure in Microsoft SQL Server. It is used to provide information about the jobs that are currently managed by the SQL Server Agent. This extended stored procedure is used in some internal procedures (e.g. sp_get_composite_job_info) by SQL Server, but it can also be used in custom scripts or applications to obtain details about SQL Agent jobs, especially in automated scripts or applications that need to track job status or performance. For example:


How to start SQL Server job using TSQL and wait for it to complete?


Often, the main impetus for using xp_sqlagent_enum_jobs is to programmatically check if a SQL Server Agent job is currently running, and then either wait for it to complete or take some other action. While interactive tools are available for monitoring job status, there aren't many reliable and well-documented stored procedures, system tables, or DMVs suitable for use in automated, non-interactive scenarios.


Syntax and Usage:


xp_sqlagent_enum_jobs typically requires that minimum two input parameters are given and, a third optional parameter. However, since this procedure is undocumented, the exact nature and behavior of these parameters might vary slightly based on the version of SQL Server. Here is a general explanation of its parameters:

  • Visibility Flag / is_sysadmin (Int): This parameter is usually an integer 0 or 1 to indicate if the executing user is a sysadmin (member of sysadmin role) and determines the scope of the job information returned by the procedure. A value of 1 indicates Yes and the procedure can return information for all jobs. A value of 0 means No in which case the procedure will return only those jobs that are owned by the executing user or jobs that the user has been granted permissions to view. 

In my experience though, xp_sqlagent_enum_jobs  seems to accept not only 0 or 1 but various integer values, both positive and negative.


  • Job Owner (Sysname): This is a string parameter representing the user in the context of whom the job information is requested. It's often provided in the format of a domain and username (e.g., DOMAIN\Username) or 'sa' if a DBA is running this procedure. This parameter is used along with the the visibility flag to determine which jobs are visible to the specified user based on their permissions.

Again in my experience, xp_sqlagent_enum_jobs  seems to accept any value, including invalid or non-existent usernames.

  • Job ID (UNIQUEIDENTIFIER): Optional parameter to get information for a single job with a specific job ID. 

If you are getting Job ID value from the msdb.dbo.sysjobs table, make sure to convert its value to a UNIQUEIDENTIFIER data type.


Basic Examples:


In this example, 1 is the visibility flag, and 'sa' represents the user.
EXEC master.dbo.xp_sqlagent_enum_jobs 1, 'sa' 












To get information about a specific job:

DECLARE @job_name NVARCHAR(100) = 'DatabaseIntegrityCheck - USER_DATABASES'
DECLARE @job_id UNIQUEIDENTIFIER
SELECT @job_id = job_id FROM   msdb..sysjobs WHERE  NAME = @job_name
EXEC master.dbo.Xp_sqlagent_enum_jobs 1, 'sa', @job_id 



When using xp_sqlagent_enum_jobs, consider the following aspects:


Permissions: The actual jobs returned can depend on the permissions of the user executing the procedure or the user specified in the parameters.

Security Risk: As an undocumented feature, using this procedure can pose a security risk, especially if used improperly or without understanding its impact.

Version Dependence: The behavior and availability of xp_sqlagent_enum_jobs can change without notice in different versions of SQL Server.

Results/Output Columns:


The procedure returns a result set with various columns providing details about each job. These details typically include:

  1. Job ID: A unique identifier for each job.
  2. Last Run Date: The date and time when the job was last executed.
  3. Next Run Date: The date and time when the job is scheduled to run next.
  4. Next Run Schedule ID: The identifier of the schedule according to which the job will run next.
  5. Requested To Run: Indicates whether the job has been requested to run immediately (outside its scheduled times).
  6. Request Source: Indicates the source of the run request.
  7. Request Source ID: The identifier of the request source.
  8. Running: Indicates whether the job is currently running.
  9. Current Step: The current step number the job is on (if it's a multi-step job).
  10. Current Retry Attempt: The number of the current retry attempt (in case the job is configured to retry on failure).
  11. State: The current state of the job.


What does it mean that xp_sqlagent_enum_jobs is undocumented?


xp_sqlagent_enum_jobs is indeed an undocumented extended stored procedure in Microsoft SQL Server. This means that it is not officially documented or supported by Microsoft in their public documentation. In essence, this raises several important considerations:

  • Lack of Official Documentation: Since there's no official documentation, detailed information about its parameters, behavior, and potential changes in different versions of SQL Server are not publicly available through Microsoft's channels.
  • Potential for Change: Microsoft may change or remove undocumented features in any new release or update of SQL Server without prior notice. This can lead to compatibility issues or sudden breaks in functionality in systems that rely on these features.
  • Limited Support: If you encounter issues while using undocumented features, finding solutions can be more challenging as these features are typically not covered by standard support channels.
  • Use with Caution: It's advisable to use undocumented features with caution, especially in production environments. They might be less stable or tested compared to officially supported features.
  • Alternative Approaches: For long-term stability and support, it's often better to seek alternative, documented features or methods to achieve the same goals.
  • Community Knowledge: Sometimes, information about undocumented features can be found through community forums, blogs, or from SQL Server experts who share their findings. However, this information is based on personal experiences and might not be universally applicable.


Usage Scenerios:


Here are a few example scenarios of how xp_sqlagent_enum_jobs might be used:

Basic Usage to List All SQL Agent Jobs

This is the simplest form of using xp_sqlagent_enum_jobs to get a list of all SQL Agent jobs with their details.

EXEC master.dbo.xp_sqlagent_enum_jobs 1, '<your_domain\your_login>'

/* OR */

EXEC master.dbo.xp_sqlagent_enum_jobs 1, 'sa'


Checking for Running Jobs

You can use xp_sqlagent_enum_jobs to check if there are any jobs currently running on the SQL Server Agent.


IF OBJECT_ID('tempdb..#RunningJobs') is not null DROP TABLE #RunningJobs;
go
CREATE TABLE #RunningJobs (
    job_id UNIQUEIDENTIFIER NOT NULL,
    last_run_date INT NOT NULL,
    last_run_time INT NOT NULL,
    next_run_date INT NOT NULL,
    next_run_time INT NOT NULL,
    next_run_schedule_id INT NOT NULL,
    requested_to_run INT NOT NULL, -- boolean
    request_source INT NOT NULL,
    request_source_id sysname COLLATE database_default NULL,
    running INT NOT NULL, -- boolean
    current_step INT NOT NULL,
    current_retry_attempt INT NOT NULL,
    state INT NOT NULL
)


INSERT INTO #RunningJobs
EXEC master.dbo.xp_sqlagent_enum_jobs 1, '<your_domain\your_login>'
SELECT * FROM #RunningJobs WHERE running = 1

IF OBJECT_ID('tempdb..#RunningJobs') is not null DROP TABLE #RunningJobs;
go


Finding Jobs with Specific Conditions

You can filter the results to find jobs with specific conditions, like jobs that have failed to run, or jobs scheduled to run at a specific time.

SELECT * FROM #RunningJobs WHERE last_run_outcome <> 1 -- Replace with your specific condition


Wait for a SQL Server job to complete before proceeding



Precautions and Best Practices:



  • Use a Dedicated Account: When specifying the user account (<your_domain\your_login>), it's best to use an account with appropriate permissions.
  • Temporary Tables: In the examples, I've used temporary tables (#RunningJobs) to store the output of xp_sqlagent_enum_jobs. This is a common practice to make data handling easier.
  • Cleanup: Always remember to clean up temporary tables to avoid unnecessary consumption of resources.
  • Testing: Thoroughly test these scripts in a non-production environment first to ensure they work as expected in your specific SQL Server setup.
  • Monitoring and Logging: When using such procedures in scripts or applications, implement adequate logging and error monitoring.
  • Seek Alternatives: While I haven't tried or tested the following myself, here is an article discussing one of the possible alternatives:

xp_sqlagent_enum_jobs alternative








Tuesday, January 23, 2024

SQL Server's Buffer Cache: How to Calculate Dirty Page Sizes

SQL Server's Buffer Cache: How to Calculate Dirty Page Sizes

As we know, the SQL Server buffer cache is an area or region inside the server's memory where SQL Server keeps the data it needs for processing queries.

Primarily, the buffer cache's job is to reduce the amount of time SQL Server spends reading data from the disk. It does this by keeping frequently used data in memory, which is much faster to access than disk storage. 

Additionally, the buffer also contains data that don't already exist on the disk, meaning in the SQL Server data files. That would be the case for the new data, when new data are added, although SQL Server will write those data immediately to the transaction log file (unless the Delayed Durability is turned on), these new data are not immediately written/stored into the data files, SQL Server waits until the the next CHECKPOINT process kicks in which then writes those called "dirty" into the data files. The LAZY WRITER process also writes dirty pages to the disk but it does so continuously, in a much less aggressive manner.

To calculate the size of the dirty pages in buffer cache, SQL Server exposes some meta data regarding the pages in buffer cache through sys.dm_os_buffer_descriptors DMV.  

If you have VIEW SERVER STATE permission or, the SYSDBA role membership, you can query sys.dm_os_buffer_descriptors, mostly to get some summary information as it only contains the meta data. The below query is an example of such, which returns the ratio/percentage of dirty pages against the overall/total size of the buffer cache.

;WITH cte
     AS (SELECT Cast(Count(*) AS FLOAT) Total_Pages,
                Count(CASE WHEN is_modified = 0 THEN 1 END) Clean_Pages,
                Count(CASE WHEN is_modified = 1 THEN 1 END) Dirty_Pages
         FROM   sys.dm_os_buffer_descriptors
         WHERE  page_type IN ( 'DATA_PAGE', 'INDEX_PAGE', 'TEXT_MIX_PAGE' ))

SELECT Total_Pages,
       Clean_Pages,
       Dirty_Pages,
       Dirty_Pages_Percentage = Cast(( Dirty_Pages / Total_Pages ) * 100 
              AS DECIMAL(4, 2)) 
FROM   cte; 


Dirty Pages In SQL Server Buffer Cache







To get this information for each database:

;WITH cte
     AS (SELECT 
		database_id,
                Case When database_id = 32767 then '(ResourceDB)'
                     Else Coalesce(db_name(database_id),'***** TOTAL *****') 
                    End [Database],

		Cast(Count(*) AS FLOAT) Total_Pages,
                Count(CASE WHEN is_modified = 0 THEN 1 END) Clean_Pages,
                Count(CASE WHEN is_modified = 1 THEN 1 END) Dirty_Pages

         FROM   sys.dm_os_buffer_descriptors
         WHERE  page_type IN ( 'DATA_PAGE', 'INDEX_PAGE', 'TEXT_MIX_PAGE' )
		 GROUP BY ROLLUP (database_id))

SELECT [Database],
       Total_Pages,
       Clean_Pages,
       Dirty_Pages,
       Dirty_Pages_Percentage = Cast(( Dirty_Pages / Total_Pages ) * 100 
              AS DECIMAL(4, 2)) 
FROM   cte
ORDER BY Total_Pages desc; 


A word of caution: if you have a very large buffer cache (like 75GB or even 750GB), querying this DMV can be slow. The larger the cache, the more data the DMV has to process. For a 75GB cache, it might need to process about ~10 million rows; for a 750GB cache, that number could jump to ~100 million rows!


Please also note that when using the sys.dm_os_buffer_descriptors tool to view the buffer cache contents, keep in mind that it also includes pages used by SQL Server's internal Resource database. Often, these pages are left out when people are summing up or analyzing the buffer cache data. However, in this particular example, I've chosen to include them in the count and analysis.



Wednesday, January 17, 2024

Best Practice Tip #1: Beware of Best Practices

Best Practice Tip #1: Beware of Best Practices

It may sound contractionary that the first item of a best practices list is to be wary of best practices. When you hear a piece of 'best practice' advice, it's wise to take a step back and think it through before jumping in.

Now, it's true that not every system is completely unique – there are times when systems are pretty much carbon copies of each other. But, each organization usually has its own set of specific needs and ways of doing things. There is no one-size fits for all, all the time. So, even if a piece of advice sounds good on paper, it's important to see if it really fits your particular situation.

Also, be wary of best practices that might be out of date. This includes tips for older versions of SQL Server or from the days when 8 GB of RAM was a lot. When you're searching online, you might run into advice that’s been hanging around for a while. Plus, a strategy that worked wonders for your system in the past might not be the best choice now if things have changed. And don’t forget, even if a tip seems like it'll boost your system, it could have downsides, like the risk of messing up things you've already done, which could be a major issue.


A Detailed Example:


Imagine you're managing your company's databases. You find this often-suggested tip: 'Always update to the latest database software for top performance and security.' It sounds sensible, and in fact, at my current job, it’s almost a given to keep everything up-to-date.

But here’s the catch: just because it's latest and greatest version doesn't mean it's automatically the right choice for you. It’s all about weighing the good against the potential hiccups. New versions might and often do offer cool features, but will they integrate smoothly with your current setup? Could they cause unexpected problems? It’s about finding the right balance for your specific needs. 


Here are few of the points to consider:


  • You have a database version that is stable and has been customized to suit your company's unique needs. Yes, this violates the best practice that don't change the default settings or deploy solutions that will get overwritten if you ever upgrade the version. But, here you are. And the latest database version offers new features, but it's unknown how these will interact with your customizations.
  • The best practices often do become outdated. Research if the advice is based on the latest information or if it was more relevant to older versions of the database software.
  • Analyze how the upgrade might affect your current workload and system performance.
  • Understand the potential risks, such as incompatibility with existing applications or downtime during the upgrade process.
  • While the upgrade might offer improved performance and security, it could also introduce new challenges or instability.
  • The risk of disrupting your current stable environment might outweigh the benefits of the new features in the latest version.


In any of the above circumstances, blindly following the best practice of upgrading to the latest software version without considering these factors could lead to unforeseen complications. 

Also, always test, test and test and have more than one back-out/rollback options ready in case things don't go as planned.

Here are some more examples:

  • Password Management: The old security best practice was to change passwords frequently, like every 90 days. This advise is now being questioned as it often leads to weaker passwords and password fatigue. The modern approach emphasizes strong, unique passwords possibly managed through a password manager, and the use of multi-factor authentication. MFA is already available in Azure SQL Server and, you could achieve this even in on-premise SQL Servers.
  • CPU, Memory and Storage Solutions: The use of physical servers for data storage and processing was a common practice. In today's cloud computing world, memory (both RAM and disk) and CPUs are not only cheap, they are dynamically available, and transient – therefore, not even a long-term investment if you don't end up needing them in future. 
  • Denormalization for Performance: In the past, full denormalization was a common practice to improve database performance, especially in data warehousing systems with limited resources. Nowadays, some or more degree of normalization is preferred. And I am aware, here I am dangerously close to long running debate/feud between the Team Star Schema Vs Team Snow-Flake Schema.
  • Vertical Scaling: The traditional approach to handling increased load or data volume was to scale up (vertical scaling) by upgrading the existing server's hardware (more CPU, RAM, storage). Modern databases often favor horizontal scaling (scaling out), which involves adding more nodes to the system (e.g., AlwaysOn Availability Groups), offering better fault tolerance and scalability.
  • Full Database Backups: In very distant past, relying solely on full database backups was common and/or the only option available, not anymore. While still important, it is inefficient for VLDBs. Incremental and differential backups are now frequently used alongside full backups to minimize data loss and recovery time.
  • Manual Tuning and Maintenance: Earlier, database administrators (DBAs) spent considerable time manually tuning queries and maintaining databases. Today, many databases incorporate automated performance tuning, real-time monitoring, and AI-driven optimization, reducing the need for manual intervention.


These examples illustrate how database management best practices have evolved and will continue to do so.





Tuesday, January 16, 2024

A Primer on Attaching Databases with PowerShell Method AttachDatabase

A Primer on Attaching Databases with PowerShell Method AttachDatabase

Attaching a SQL Server database using PowerShell can be straightforward. Here's an effective and easy approach using the Invoke-Sqlcmd Cmdlet, by encapsulating your SQL statement in a variable, and then pass it to the Invoke-Sqlcmd cmdlet. For example:


$sql_command = "

    USE [master]

    GO

    CREATE DATABASE [TestDB_20240115_173550] ON 

    ( FILENAME = N'K:\\SQL2022AG01\\DATA\\TestDB_20240115_173550.mdf' ),

    ( FILENAME = N'L:\\SQL2022AG01\\LOG\\TestDB_log_20240115_173550.ldf' )

    FOR ATTACH

    GO

"



Invoke-Sqlcmd -ServerInstance SQLVM01\\SQL2022AG01 -Query $sql_command


The above command assumes that the PowerShell module for the SQL Server is installed and imported into your PowerShell session.  For any DBA using PowerShell to manage SQL Server instances, I think this should be automatically loaded, which you can do by copying the SQL Server module files in one of the directories listed in $env:PSModulePath. 

To determine the locations where PowerShell modules can be stored, simply enter $env:PSModulePath in the PowerShell console and press Enter. This will display the paths:

PS C:\Users\Dummy> $env:PSModulePath

C:\Users\dummy\Documents\WindowsPowerShell\Modules;
C:\Program Files\WindowsPowerShell\Modules;
C:\WINDOWS\system32\WindowsPowerShell\v1.0\Modules

On my system, these three directories are where I can place PowerShell module files. The choice of directory depends on whether I want the module accessible only to my user profile or to all users on the computer. It also depends on my permissions to create files and folders in the Program Files and System32 directories.

You can verify if the SQL Server module is loaded into your current session with the Get-Module command. 

PS C:\Users\Dummy> Get-Module -Name sqlserver | Format-List

Name              : SQLServer
Path              : C:\Users\dummy\Documents\WindowsPowerShell\Modules\SQLServer\21.1.18256\SqlServer.psm1

Description       : This module allows SQL Server developers, administrators and business intelligence professionals to automate database development and server administration, as well as both 
                    multidimensional and tabular cube processing.
                    
                    For Release Notes, expand the Package Details section on this page.
                    
                    For automation with SSRS & Power BI Report Server, use the ReportingServicesTools module available at https://www.powershellgallery.com/packages/ReportingServicesTools
ModuleType        : Script
Version           : 21.1.18256
NestedModules     : {Microsoft.SqlServer.Management.PSSnapins, Microsoft.SqlServer.Management.PSProvider, Microsoft.AnalysisServices.PowerShell.Cmdlets, 
                    Microsoft.SqlServer.Assessment.Cmdlets...}
ExportedFunctions : {Invoke-SqlNotebook, SQLSERVER:}
ExportedCmdlets   : {Add-RoleMember, Add-SqlAvailabilityDatabase, Add-SqlAvailabilityGroupListenerStaticIp, Add-SqlAzureAuthenticationContext...}
ExportedVariables : 
ExportedAliases   : {Decode-SqlName, Encode-SqlName}


BTW,  in a layman's terms, a PowerShell module is basically your regular folder containing files like script files, DLL/assemblies files etc, as highlighted SQLServer value in the above example.

If it returns nothing that means you currently don't have the SQLServer module in your session. Generally, though if the module folder is in one of the folders listed in $env:PSModulePath then it should get automatically loaded into your session when you reference any of the commands from that module for the first time. PowerShell won't load a module into your session until it is required.

You can also manually load a module into your session using the Import-Module cmdlet:

Import-Module -Name sqlserver


If not then you can install it from the PSGallery, provided that PSGallery is configured as a repository:



Find-Module -Name SQLServer






Install-Module -Name SQLServer


Of course, I already have a previous version of this module installed, which I can upgrade to the latest version by adding -Force to the Install-Module command. This is also essentially how you would typically upgrade an existing module.




That being said, beyond SQL statements, PowerShell offers native cmdlets for database management. To attach a database, use the AttachDatabase method. Here's an example:


# Connect to the sql server instance
$sql_instance = Get-SqlInstance -ServerInstance SQLVM01\SQL2022AG01

# Attach the AttachDatabase method
$sql_instance.AttachDatabase('TestDB_20240115_173550', 'K:\SQL2022AG01\DATA\TestDB_20240115_173550.mdf')


The value for first parameter, "TestDB_20240115_173550", is the name for the attached database. The second value is the file path for the primary data file. SQL Server will look up the location of other files (secondary data files, transaction logs etc.) from the primary data file and will look for them in their original locations. 

Below is more information on the AttachDatabase method, along with the parameters you can pass to it. Notice the Definition field, each line in it starting with "void...." is called method/function overload  (https://en.wikipedia.org/wiki/Function_overloading)  , each has different list of parameters that SQL Server will automatically determine based on how you execute the method:

Get-Member -InputObject $sql_instance -MemberType Methods `
           -Name AttachDatabase | Format-List

TypeName   : Microsoft.SqlServer.Management.Smo.Server
Name       : AttachDatabase
MemberType : Method
Definition : void AttachDatabase(string name, System.Collections.Specialized.StringCollection files, string owner), 

             void AttachDatabase(string name, System.Collections.Specialized.StringCollection files), 

             void AttachDatabase(string name, System.Collections.Specialized.StringCollection files, 
                Microsoft.SqlServer.Management.Smo.AttachOptions attachOptions), 

             void AttachDatabase(string name, System.Collections.Specialized.StringCollection files, string owner, 
                    Microsoft.SqlServer.Management.Smo.AttachOptions attachOptions)


Lets look at the first function overload:

Void AttachDatabase(string name, System.Collections.Specialized.StringCollection files, string owner)


It accepts 3 parameters:

  1. Name: This would be the name of the database when it's attached
  2. Files: This would be an array with the database file names i.e. MDFs, NDFs, Transaction log files, FileStream directories etc. Only the location of the primary data file is required, rest are optional. If other files are omitted, SQL Server will read their location from the primary data file and then will try to look for them in their original location. If any of the files is missing, it will throw an error, and believe me, those error messages are not very informative!
  3. Owner: The name of the database owner. If you omit this value then the login of the current user will be the owner of the attached database. I generally use "sa" as the database owner.


Below is the second overload for the method. In this one you are not passing the value for Owner parameter.

 
void AttachDatabase(string name, System.Collections.Specialized.StringCollection files) 

Advanced Options: 


The method also supports AttachOptions for more control. Here is list of options you can pass to it:
  • EnableBroker
  • ErrorBrokerConversations
  • NewBroker
  • None
  • RebuildLog
For detailed information, refer to the Microsoft documentation:



In conclusion, PowerShell offers both straightforward and advanced methods for attaching SQL databases. Whether you prefer direct SQL commands or native PowerShell cmdlets, the process is efficient and well-integrated into the SQL Server environment. By understanding the nuances of these methods and ensuring the necessary modules are installed and loaded, you can seamlessly manage your SQL Server databases.




 



Saturday, January 13, 2024

Troubleshooting SQL Server Error 8623: The query processor ran out of internal resources and could not produce a query plan

SQL Server Error 8623

At least once day, an application for one of the databases I manage gets this error:

The query processor ran out of internal resources and could not produce a query plan. 

This is a rare event and only expected for extremely complex queries or queries that reference 
a very large number of tables or partitions. Please simplify the query. 
If you believe you have received this message in error, 
contact Customer Support Services for more information.

This error is sent to the client that initiated the query, so it won't appear in the SQL Server error log. Consequently, you may not be aware of it until you receive reports from your users. However, you can set up Extended Events, traces, etc., to capture the error. That's exactly what I did because we were uncertain which specific query was causing this error. 

Here is an explanation on this error:

https://learn.microsoft.com/en-us/sql/relational-databases/errors-events/mssqlserver-8623-database-engine-error


In my experience, this is indeed an uncommon situation and therefore can be a daunting challenge. However, the query itself doesn't necessarily have to be highly complex. More often than not, it's an excessively and unnecessarily long or inefficient query. For instance, it could involve an IN clause with thousands of values, an overly convoluted query with numerous derived tables, joins, UNION ALL operations, and so on. As a result, the SQL Server query optimizer eventually gives up attempting to find an optimal execution plan for it and throws the error.

In my specific case, the problematic query consists of over 400 UNION ALL operations! It consistently times out and triggers the error after 20 seconds. The duration after which the query optimizer gives up can vary and depends on various factors.

 
How you resolve it? From the same article:
Simplify the query by breaking the query into multiple queries along the largest dimension. First, remove any query elements that aren't necessary, then try adding a temp table and splitting the query in two. Note that if you move a part of the query to a subquery, function, or a common table expression that isn't sufficient because they get recombined into a single query by the compiler. You can also, try adding hints to force a plan earlier, for example OPTION (FORCE ORDER).

Why did I highlight the use of 'OPTION (FORCE ORDER)'? Because I've found it to be a reliable workaround to make the query work, especially when you need a quick workaround to address the issue immediately. Afterward, you can focus on implementing a better and more sensible permanent solution, assuming you have the option to rewrite the query, if not you can dynamically apply query hints using the Query Store:

https://learn.microsoft.com/en-us/sql/relational-databases/performance/query-store-hints


The database I'm referring to is a third-party database running on SQL Server version 2019 Standard edition, equipped with 8 cores and 48GB of RAM, with 40GB allocated to the SQL Server. However, it's worth noting that this error can occur in older versions as well as potentially in future versions of SQL Server.

Capturing this error using Extended Events:

Given that the error is sent to the client and not logged into SQL Server error log, if you need to identify the query or queries that are encountering this error, you can establish an Extended Events trace to capture the necessary information, including the query text. 

Below is a sample code to create the Extended Events trace:


CREATE EVENT SESSION [trace_error_8623] ON SERVER 
ADD EVENT sqlserver.error_reported(
    ACTION(
	       package0.last_error,
	       sqlserver.client_app_name,
		   sqlserver.client_connection_id,
	       sqlserver.database_id,
		   sqlserver.database_name,
		   sqlserver.is_system,
		   sqlserver.plan_handle,
		   sqlserver.query_hash,
		   sqlserver.query_plan_hash,
		   sqlserver.session_id,
		   sqlserver.sql_text,
		   sqlserver.username
		   )
    WHERE ([error_number]=(8623)))
ADD TARGET package0.event_file(SET filename=N'trace_error_8623.xel')
WITH (MAX_MEMORY=4096 KB,
      EVENT_RETENTION_MODE=ALLOW_SINGLE_EVENT_LOSS,
	  MAX_DISPATCH_LATENCY=30 SECONDS,
	  MAX_EVENT_SIZE=0 KB,
	  MEMORY_PARTITION_MODE=NONE,
	  TRACK_CAUSALITY=OFF,
	  STARTUP_STATE=OFF
	 )
GO

To query/read this XE:

;with cte as
(
		SELECT top 100 
				@@SERVERNAME [SQL Server],
				[file_name],
                cast(event_data as xml) event_data
		FROM sys.fn_xe_file_target_read_file('trace_error_8623*.xel', null, null, null)

)
select -- TOP 10
		 [SQL Server] 
	,en.v.value('../@timestamp', 'varchar(100)') [TimeStamp]
	,en.v.value('.', 'varchar(100)') [Error Number]
	,em.v.value('.', 'varchar(8000)') [Error Message]
	,un.v.value('.', 'varchar(100)') [User Name]
	,st.v.value('.', 'varchar(max)') [SQL Text]
	,datalength(st.v.value('.', 'varchar(max)')) / 1024 [SQL Size (mb)]

	,sid.v.value('.', 'Int') [Session ID]
	,qh.v.value('.', 'varchar(8000)') [Query Hash]
	,CONCAT('0x', ph.v.value('.', 'varchar(200)')) [Plan Handle]
	,dbname.v.value('.', 'varchar(8000)') [DB Name]
	,dbid.v.value('.', 'Int') [DB ID]
	,can.v.value('.', 'varchar(8000)') [Client App Name]

	,cte.event_data [Event Data]

from cte
CROSS APPLY event_data.nodes('//event/data[@name = "error_number"]')     as en(v)
CROSS APPLY event_data.nodes('//event/data[@name = "message"]')  as em(v)
CROSS APPLY event_data.nodes('//event/action[@name = "username"]') as un(v)
CROSS APPLY event_data.nodes('//event/action[@name = "sql_text"]')   as st(v)
CROSS APPLY event_data.nodes('//event/action[@name = "session_id"]')      as sid(v)
CROSS APPLY event_data.nodes('//event/action[@name = "query_hash"]')      as qh(v)
CROSS APPLY event_data.nodes('//event/action[@name = "plan_handle"]')      as ph(v)
CROSS APPLY event_data.nodes('//event/action[@name = "database_name"]')      as dbname(v)
CROSS APPLY event_data.nodes('//event/action[@name = "database_id"]')      as dbid(v)
CROSS APPLY event_data.nodes('//event/action[@name = "client_app_name"]')      as can(v)
;


In my particular case, the issue stems from how the application's user interface handles user selections. Users can make multiple selections, which the application then uses to dynamically construct a query with UNION ALL for each selected value before execution on SQL Server. However, if a user selects an excessive number of values, the accumulation of UNION ALL operations overwhelms the query optimizer, making it difficult to find an optimal execution plan. This problem consistently triggers the error after precisely 20 seconds on our SQL Server.

As a temporary workaround, I implemented the query hint OPTION (FORCE ORDER), which reduced the query execution time to just 7 seconds, still returning same 469 rows. In search of efficiency, replacing the UNION ALL operations with an IN clause further improved performance, allowing the query to finish in slightly under a second and producing the same 469 rows. Opting for an even better solution, I moved all values into a table variable, which enabled the query to complete in less than a second.

It's important to note that the specific solution for this error in one of your queries may differ based on the specific characteristics of your query.

Resources:

Take a look at this article for a guide on reproducing this error on your SQL server:

When I tested the sample code provided in the article, it would trigger this error within a couple of seconds. Here, the suggested workaround is to include the query hint OPTION (USE HINT ('FORCE_LEGACY_CARDINALITY_ESTIMATION')).

However, adding the query hint OPTION (FORCE ORDER) also resolves the issue.