Mission: SQL Homeostasis: September 2023

Sunday, September 24, 2023

Enable PowerShell Remoting

I use PowerShell quite a lot to manage servers, especially SQL Servers. So I need to be able to run PowerShell commands remotely. Fortunately I don't have to worry about this much on Windows Server 2012 and up, because the PowerShell remoting is enabled by default. On the other hand, if an organization deems PowerShell, and PSRemoting in particular, as a security risk and decides to disable it on purpose. that obivoulsly is a different matter.

And, any earlier versions of Window Server and all versions of the Windows client operating systems (Home Pro etc.), PowerShell remoting is turned off by default.

To enable PSRemoting:

Connect / RDP into the computer.

Launch the PowerShell with elevated privileges (Run As Administrator).

Run the command:

Enable-PSRemoting –Verbose and –Confirm

I added the -Verbose and -Confirm parameters because by default, the Enable-PSRemoting runs silently with no output whatsoever.

Here is a link to the Microsoft article for more information on Enable-PSRemoting command:

Enable-PSRemoting

And, if you’re working in a workgroup environment, for example, at home, with no domain controller to handle the security and identity, you need to add list of computers you trust. Here is the article that explains how-to, and troubleshooting other potential issues:

Troubleshoot-PowerShell-Remoting

Once it is enable, here is a basic example how to send command to the remote computer:

Invoke-Command -ComputerName Servername -ScriptBlock {'Hello world'}

That's all nice. But how do you check whether it is already enabled or not? PowerShell doesn't provide a direct way to check whether PSRemoting is enabled or not. Indirectly, you can send a test command to the remote computer.

Invoke-Command -ComputerName Servername -ScriptBlock {'Hello world'}

If errors out, it will display an ugly error message. The error per se could be due to a completely different reason, maybe you typed in wrong server name, WinRM service is not running, issues with your credentials, firewall, trusts etc.

Another, indirect method, is to run the Test-WSMan command to see whether WinRM service is running there or not.

Test-WSMan -ComputerName Servername

This too doesn't guarantee to tell if issue is with PSRemoting setup.

May be some future version of PowerShell will add a command to check status of PSRemoting locally as well as remotely, in secure ways of course. I am not too hopeful though because it has been enabled by default since Windows 2012, so Microsoft may not see a need for it or deem it a security risk.

Friday, September 22, 2023

Get Configuration Change History From the SQL Server Error Logs

There are several options if and when you need to see if any configuration change was made in SQL Server.

The simplest maybe to use the SSMS GUI, using its built-in (aka standard) reports:

The good thing about this standard report is that it reads from the default trace, which being the "default" setting, would have been already running, that means you don't need to setup and enable it. It is just there, unless of course it was purposely disabled/stopped.

The bad is that the default trace is of limited size, I think the size limit is 20 MB with up to 5 rollover files. You cannot change the size of default trace, you can however create your own trace, in which case better to look at setting up Extended Events instead.

You can also use the SQL Server Auditing and Extended Events features as well, which gives you lot more control on what to capture and to what size. The downside is that you need to set it up and it should be already running by the time there is a need to check if any configuration change was made.

SQL Server also logs anytime there is a configuration change, which you can then view and filter to find the configuration changes by searching for string "Configuration option". You can obviously use the SSMS to open and filter the log file, or even your favorite text editor. And better, you can programmatically read and filter using the sys.xp_readerrorlog:

https://learn.microsoft.com/en-us/sql/relational-databases/system-stored-procedures/sp-readerrorlog-transact-sql

Here are some examples:

Example 1: Search for entries containing Configuration option

USE master;
EXEC sys.xp_readerrorlog
	0, -- 0 for the current log file, 1 = archive file #1 etc
        1, -- 1 or NULL for SQL Server error log, 2 for the SQL Agent log
        N'Configuration option', -- Search/filter by this string
        NULL, -- Further refine the search/filter condition
        NULL, -- Start time
	NULL, -- End time
	N'desc'; -- Sorts the results by asc for ascending order or

                 -- desc for descending order

Example 2: A second filter for the Start Time

USE master;
EXEC sys.xp_readerrorlog
	0, -- 0 for the current log file, 1 = archive file #1 etc
        1, -- 1 or NULL for SQL Server error log, 2 for the SQL Agent log
        N'Configuration option', -- Search/filter by this string
        NULL, -- Further refine the search/filter condition
        '2023-09-08 16:00:00', -- Start time
	NULL, -- End time
	N'desc'; -- Sorts the results by asc for ascending order or

                 -- desc for descending order

Example 3: Further refine/search into the results

USE master;
EXEC sys.xp_readerrorlog
	0, -- 0 for the current log file, 1 = archive file #1 etc
        1, -- 1 or NULL for SQL Server error log, 2 for the SQL Agent log
        N'Configuration option', -- Search/filter by this string
        N'Optimize', -- Further refine the search/filter condition
        NULL, -- Start time
	NULL, -- End time
	N'desc'; -- Sorts the results by asc for ascending order or

                 -- desc for descending order

So, what is the catch here? Every time SQL Server is restarted, it starts a new log file and renames the previous log file to errorlog.1, and the previous errrolog.1 gets renamed to errorlog.2 and so forth, up to 7 logs, which you can increase up to 99. Stored procedure sp_cycle_errorlog can be used to start a new log file in between the SQL Server restarts. That means 1) The older configuration change entries would get moved to the so called archived error logs, which sys.xp_readerrorlog does allow to read from as well, and 2) Eventually the oldest archived error log files will get removed, unless of course you setup something to store the log entries somewhere.

Of course, you can also use sys.xp_readerrorlog for other purposes as well, for example to see if there have been any errors in last 24 hours.

USE master;
declare @start_time datetime = getdate() - 1
EXEC sys.xp_readerrorlog
    0, -- 0 for the current log file, 1 = archive file #1 etc
    1, -- 1 or NULL for SQL Server error log, 2 for the SQL Agent log
    N'Severity: 16', -- Search/filter by this string
    N'41145', --  Further refine the search/filter condition
    @start_time, -- Start time
    NULL, -- End time
    N'desc'; -- Sorts the results by asc for ascending order

             -- or desc for descending order

Emergency Access to SQL Server: A DBA's Guide to Regaining Control

When I showed a draft of this article to a friend of mine, to get his opinion on it. He gave some interesting feedback and made valid points:

Me: "I've added comments inside the PowerShell code. Do you think it's clear enough, or should I include more screenshots?"

Friend: "The article's informative, but the title could be clearer. It somewhat implies hacking into SQL Server."

Me: "Is that how it comes across? That wasn't my intention."

Friend: "It might just be my perception. Who's your target audience?"

Me: "It's aimed at SQL Server DBAs and those aspiring to be one. The technical depth is necessary."

Friend: "True, but the heavy PowerShell code could be daunting for newcomers."

Me: "I see your point. Technical articles need to balance depth with accessibility. How about I supplement the article with step-by-step explanations or a beginner's guide section?"

Friend: "That sounds like a great idea! It keeps the article comprehensive yet approachable."

Me: "Thanks for the feedback. I'll revise the title to be more specific and refine the content to cater to a broader audience, including those new to SQL Server."

And so I made some changes, added more comments inside the code and updated the title. But I am still bit apprehensive so I would like to make this clear from the beginning: This is really intended for responsible and cautious DBAs, who have legitimate need to access SQL Server but there is no one around with DBA i.e., sysadmin role or any access at the instance level for that matter. Yes, there can still be loopholes in your SQL Servers if left open, can be leveraged by "ethical" as well as rogue actors alike. Here is a relatively old article discussing some of the ways the SQL Serve security might be compromised:

How To Bypass SQL Server Security

So, always stay vigilant!

How do you manage and troubleshoot a SQL Server if you don't any access to it at all? You can't. Even if you have full administrative rights on the server or even in the entire IT network, you cannot connect to the SQL Server if you don't have valid credentials with sysadmin rights.

Unless...... it is a very very old SQL Server version, like SQL Server 7 or older. Back then SQL Server would automatically add the local administrators as sysdba during the installation. You could go back and remove that access, but I don't think most people did. But back then there was even a bigger security issue with SQL Server, blank password for almighty sa user by default!!!!!

Fortunately those dark days are long gone, SQL Server has come a very long way since then and so have we, in our mindsets regarding the security.

But that is not what I am here to talk about.

So back to the topic, this problem per se is not new and neither is the solution. Most long time DBAs would know the solution already, as they would have run into this issue at some point in their career.

Let me just describe the solution briefly and get it out of the way as this article is not about that.

The solution requires that you are a member of the local administrators group on the computer where SQL Server instance is installed. You log into the server, stop and then restart the SQL Server in a single user mode, that temporarily grants local administrators sysadmin permissions. You then connect to the SQL instance with Windows Authentication, and make yourself or any login you like a member of the sysadmin role. Then stop and restart the SQL service in normal mode. VOILÀ!

Below is a link to a how-to document on this very topic from Microsoft:

https://learn.microsoft.com/en-us/sql/database-engine/configure-windows/connect-to-sql-server-when-system-administrators-are-locked-out

Recently, I got a new gig and had to solve this problem on more than 200 SQL Servers, over a single weekend! Given the short amount of time and a relatively large number of SQL Servers, I decided to write a PowerShell script that I can then call remotely on each SQL Server. At first the PowerShell code was a relatively short one, without any validation or error checking. It basically restarts the SQL instance in single user mode, adds a AD domain group for DBAs to the sysadmin role then restarts back the SQL Instance and the SQL Agent services:

# INPUTS
$sql_instance_name = 'SQL2022AG01'
$login_to_be_granted_access = 'Contaso\Group-MSSQL-DBAs'

# GET THE NAME FOR THE WINDOWS SERVICE AND THE SQL CONNECTION
if($sql_instance_name -eq "MSSQLSERVER") 
{
    $service_name = 'MSSQLSERVER'
    $sql_server_instance = "."
}
else 
{
    $service_name = "MSSQL`$$sql_instance_name"
    $sql_server_instance = ".\$sql_instance_name"
}

$sql =  "CREATE LOGIN [$login_to_be_granted_access] FROM WINDOWS; ALTER SERVER ROLE sysadmin ADD MEMBER $login_to_be_granted_access; "

Stop-Service -Name $service_name -Force

# START SQL SERVER IN SINGLE USER MODE
net start $service_name /f /m
Start-Sleep 2
sqlcmd.exe -E -S $sql_server_instance -Q $sql

# RESTART SQL SERVICES IN NORMAL MODE
$service = Start-Service -Name $service_name -PassThru
Start-Service -Name $service.DependentServices[0].Name

Note: Restarting a Windows service requires administrator privileges, including for the process from where the service is being restarted from, which in this case is PowerShell and therefore it needs to be launched as “Run As Administrator”:

Later, I added more code for validation and error checking and an option to create a SQL Login instead of a Windows Login. So, now the script is longer than maybe it should be:

##### CAUTION: THE SCRIPT WILL STOP AND RESTART YOUR SQL SERVER INSTANCE!!!!!!!!!
<#
THIS SCRIPT IS INTENDED TO GET ACCESS TO SQL SERVER ONLY IF 
YOU DON'T HAVE SYSADMIN PERMISSION. SO ONLY USE THIS IN EMERGENCY

ON THE PLUS SIDE, IT HAS OPTION TO PROMPT YOU FOR CONFIRMATION BEFORE 
STOPPING THE SQL INSTANCE

REQUIREMENTS:.

1. YOU MUST HAVE THE LOCAL ADMINISTRATOR RIGHTS ON THE SERVER
2. YOU ARE RUNNING THE SCRIPT LOCALLY ON THE SERVER, OR 
   THROUGH THE INVOKE-COMMMAND CMDLET AND THE PSREMOTING
   MUST BE ENABLED ON THE TARGET SERVER, WHICH IT SHOULD BE BY 
   DEFAULT BE ON WINDOWS SERVER 2012 AND UP

3. THE POWERSHELL MUST BE LAUNCHED WITH ELEVATED ADMINISTRATIVE PRIVILEGES

PARAMETERS: ONLY THE FIRST PARAMETER, $login_to_be_granted_access, IS REQUIRED

1.  $login_to_be_granted_access --> This can be a Windows Login or SQL Login
                                    $sql_login_password is required for SQL Login

2.  $sql_instance_name -----------> Only specify the sql instance name without the server name
                                    If omitted, the default instance is assumed

3.  $confirm ---------------------> $true or $false to prompt for confirmation

Optional: Only required for SQL Login
4.  $sql_login_password ----------> Password for the SQL Login

#>
param (
  [string] $login_to_be_granted_access = 'sqladmin',
  [string] $sql_instance_name = 'SQL2022AG01',  
  [Boolean] $confirm = $true,
  [string] $sql_login_password = 'WA1!!1P7JRjN7F4eibEES&IxU%Elgw6b#'
)


# set the default preferences
$ErrorActionPreference = 'Stop'
$WarningPreference     = 'Continue'
$InformationPreference = 'Continue'

# if value for sql_instance_name is blnak then assume the default instance
if (-Not ($sql_instance_name)) {$sql_instance_name = 'MSSQLSERVER'}

# if value for $confirm is blank then assume $true
if ($confirm -eq $null) {$confirm = $true}


$msg = 'Computer Name: ' + $env:COMPUTERNAME
Write-Information $msg
$msg = 'SQL Instance Name: ' + $sql_instance_name
Write-Information $msg

Write-Information ""

# SHOW PROMPT IF $confirm IS TRUE
if ($confirm -eq $true)
{
    $valid_responses = "Yes", "yes", "No", "no"
    do {
        Write-Warning "##### CAUTION: THE SCRIPT WILL STOP AND RESTART YOUR SQL SERVER INSTANCE!!!!!!!!!"
        $response = Read-Host -Prompt "Are you sure you want to continue (Yes/No) ?"
        if(-not $valid_responses.Contains($response)){write-host "Please enter Yes or No"}
    } until ($valid_responses.Contains($response))

    if ($response -in ('No', 'no') )
    {
        return
    }
}
else
{
    Write-Warning '$confirm is false therefore all prompts will be suppressed.'
    Write-Information ""
}


# LETS DO A BIT OF VALIDATION
if (-not ($sql_instance_name) -or (-not ($login_to_be_granted_access)))
{
    Throw 'Error: Values for $sql_instance_name and $login_to_be_granted_access are required'
}

if (-not ($login_to_be_granted_access.Contains('\')) -and (-not ($sql_login_password)))
{
    Throw 'A password must be given for SQL Login'

}

# CHECK IF RUNNING POWERSHELL IN ELEVATED PRIVILEDGES MODE
if(-Not (([System.Security.Principal.WindowsIdentity]::GetCurrent()).Owner -eq "S-1-5-32-544"))
{
    Throw 'Error: Powershell must be launched in elevated privileges mode'
}


# GET THE NAME OF THE WINDOWS SERVICE AND THE SQL CONNECTION
if($sql_instance_name -eq "MSSQLSERVER")   # DEFAULT INSTANCE
{
    $service_name = 'MSSQLSERVER'
    $sql_server_instance = "."
}
else                                       # NAMED SQL INSTANCE 
{
    $service_name = "MSSQL`$$sql_instance_name"
    $sql_server_instance = ".\$sql_instance_name"
}


Write-Information "SQL Server: $sql_server_instance"
Write-Information "Serivce Name: $service_name"
Write-Information ""

# GET THE SERVICE OBJECT AND THE DEPENDENT SERVICES
$sql_service = Get-Service -Name $service_name
$dependent_services = $sql_service.DependentServices

# 
# EXTRA CHECK: STOP IF THE SERVICE IS NOT FOUND
if(-Not ($sql_service))
{
        Throw "Error: Please ensure the sql instance $sql_instance_name is valid and a windows service with name $service_name exists..."
}


$msg = "Service Status: " + ($sql_service.Status)
Write-Information $msg

$msg = "Service Startup Type: " + $sql_service.StartType
Write-Information $msg

Write-Information ""

# IF THE SERVICE IS DISABLED, PROMPT TO RE-ENABLE IT IN MANUAL MODE
if($sql_service.StartType -eq 'Disabled')
{

    Write-Warning "SQL Instance sql_instance_name is currently disabled...."
    
    if ($confirm) {Set-Service -Name $service_name -StartupType Manual -Confirm}
    else {Set-Service -Name $service_name -StartupType Manual}
    
    $sql_service.Refresh()

    if($sql_service.StartType -eq 'Disabled')
    {
           Throw "Error: Script cannot continue when SQL Instance is in Disabled mode."
    }


}

# STOP THE SERVICE ONLY IF IT IS RUNNING
# PROMPT TO CONFIRM BEFORE STOPPING THE SERVICE

if($sql_service.Status -eq 'Running')
{
    Write-Warning "Stopping service: $service_name"
    Write-Warning "Any dependent services will also be stopped..."
    
    if ($confirm) {Stop-Service -InputObject $sql_service -Confirm -Force}
    else {Stop-Service -InputObject $sql_service -Force}


    Write-Information ""
    Write-Information "STOP-SERVICE MAY RUN IN ASYNC MODE SO LETS SLEEP FOR FEW SECONDS..."
    Start-Sleep 5
    
    # CHECK TO MAKE SURE THE SERVICE IS NOW STOPPED 
    $sql_service.Refresh()

    if($sql_service.Status -ne "Stopped")
    {
        throw "Error: SQL instance service $service_name must be in stopped state before continuing...."
    }

}



#  A WINDOWS SERVICE CAN ONLY BE STARTED IF IT'S START UP TYPE IS MANUAL, AUTOMATIC OR DELAYED AUTOMATIC START 
#  SO, CONTINUE ONLY IF THE START UP TYPE IS NOT ONE OF THEM

$sql_service.Refresh()
if($sql_service.Status -ne 'Running' -and $sql_service.StartType -in ('Manual', 'Automatic'))
{
    
    Write-Warning  "Starting SQL Service in single user mode..."
    Write-Information ""
    net start $service_name /f /m
    Start-Sleep 2

    # CHECK TO MAKE SURE THE INSTANCE IS NOW RUNNING 
    $sql_service.Refresh()
    if($sql_service.Status -eq "Running")
    {

        if ($login_to_be_granted_access.Contains('\'))
        {
            $sql =  "CREATE LOGIN [$login_to_be_granted_access] FROM WINDOWS; ALTER SERVER ROLE sysadmin ADD MEMBER $login_to_be_granted_access; "
        }
        else
        {
            $sql = "CREATE LOGIN [$login_to_be_granted_access] WITH PASSWORD=N'$sql_login_password', DEFAULT_DATABASE=[master], CHECK_EXPIRATION=OFF, CHECK_POLICY=OFF"
            $sql += "; ALTER SERVER ROLE sysadmin ADD MEMBER $login_to_be_granted_access; "
        }


        $msg = "Adding $login_to_be_granted_access to the SYSADMIN role in SQL Server..."
        Write-Information $msg

        sqlcmd.exe -E -S $sql_server_instance -Q $sql

        Write-Information ""
        Write-Information "Restarting the SQL instance in normal mode..."
        net stop $service_name
        net start $service_name

        # Restart any dependenT service that were running... 
        foreach ($dependent_service in $dependent_services)
        {
            if($dependent_service.Status -eq 'Running')
            {
                $dependent_service_name = $dependent_service.Name
                
                # Check one more time to make sure it's not already running...
                if ((Get-Service -Name $dependent_service_name).Status -ne 'Running')
                {
                    $msg = "Starting dependent service: $dependent_service_name"
                    Write-Information $msg
                    $dependent_service.Start()
                }
            }


        }
    }
    else
    {
        Throw "Error: SQL Instance is not running...."
    }
}

Now what? How do you use it then? If it is just one or two SQL Servers, you can simply copy/paste it into an elevated PowerShell session, and edit the param block with the appropriate values.

param (
  [string] $login_to_be_granted_access = 'sqladmin',
  [string] $sql_instance_name = 'SQL2022AG01',  
  [Boolean] $confirm = $true,
  [string] $sql_login_password = 'WA1!!1P7JRjN7F4eibEES&IxU%Elgw6b#'
)

Note: You can give a Windows login for the $login_to_be_granted_access parameter, as long as it contains a "\" in it, the script will know it is a Windows login.

Then click the green Run Script button or hit F5 key:

It will display a warning and ask if you want to continue:

It will also ask to confirm again before stopping the SQL Service.

You can disable the confirmation prompts by setting $confirm = $false, which is what I would do when running this in non-interactive mode, especially in a batch mode when running it against multiple SQL Servers at the same time. In the following example, the script is saved in a file "add user to sql sysadmin role.ps1, then using the Invoke-Command, I can run it on a remote server (highlighted):

$script_file_path = "add user to sql sysadmin role.ps1"
Invoke-Command -ComputerName SQLServerVM01 -FilePath $script_file_path `
               -ArgumentList 'sqladmin', 'SQL2022AG01', $false, 'WA1!!1P7JRjN7F4eibEES'

To run the script on multiple SQL Servers, you can save the list of SQL servers, instance names and login name in a CSV file. For integrated/Windows logins, the LoginName needs to be in "ServerName\UserName" or "Domain\UserName" format and must contain a "\" as that is what my script is using to dynamically determine whether is is Windows login or a SQL login. For example:

Then use the following PowerShell code to execute the script against all servers listed in the CSV file:

$script_file_path = "add user to sql sysadmin role.ps1"
$csv_data = Import-Csv -Path 'C:\Users\dummy\Documents\myServers.csv'

foreach($csv_row in $csv_data)
{
    Invoke-Command -ComputerName $csv_row.ServerName -FilePath $script_file_path `
                   -ArgumentList $csv_row.LoginName, $csv_row.SQLInstance, `
                                 $false, $csv_row.Password
}

I hope you find this useful in times of need. If you see or come across a bug in this PowerShell code, please let me know.

Download This PowerShell Script From

A Study in SQL Server Ad hoc Query Plans

We often think of ad hoc query plans, perhaps due to a dismissive attitude, as single-use plans that won't be reused frequently or at least not in the very near future. By that time, the plan might have already been evicted from the cache by SQL Server. However, what if a significant portion of our overall workload consists of such queries? Let's say, for the sake of argument, more than 50%. In this scenario, caching these execution plans would simply waste SQL Server's memory, and the server's memory in general. Some might argue this can bloat the plan cache.

As an illustration, consider a real-world scenario where the memory consumed by these query plans is a staggering 30GB!

SELECT 
    instance_name AS name,
    cntr_value / 128 AS pages_mb,
    cntr_value / 128 / 1024 AS pages_gb
FROM sys.dm_os_performance_counters
WHERE counter_name = 'Cache Pages'
ORDER BY pages_mb DESC;

Next, I fire up a slight variation of the query to see the type of workloads using the most memory:

;WITH cte_plan_cache_usage_by_obj_type AS 
(
	SELECT COALESCE(objtype, '-- ALL Plans') objtype,
           Sum(Cast(size_in_bytes AS BIGINT)) / 
		              1024 / 1024 size_mb
    FROM   sys.dm_exec_cached_plans
    GROUP  BY rollup ( objtype )
)
SELECT objtype,
       size_mb,
	   'percent' = 
		   ( size_mb * 100 ) /

                       (SELECT size_mb
			FROM   cte_plan_cache_usage_by_obj_type
			WHERE  objtype = '-- ALL Plans'
		   ) 
FROM   cte_plan_cache_usage_by_obj_type
ORDER  BY size_mb DESC

Wow, the ad hoc queries are using a staggering 94%!

This workload is predominantly ad hoc. And no, I am not in a position to instruct the application team to modify their queries, at least not in the short term. To paraphrase a famous quote:

“In the long run we will be all dead”

- John Maynard Keynes

So, any improvements must be made on the backend SQL Server.

Fortunately, for these types of workloads, SQL Server offers a configurable setting called “optimize for ad hoc workloads”. This setting helps limit or reduce the memory used by single-use ad hoc queries. The key here is "single-use". How does it work? When this setting is enabled, the optimizer doesn't cache the full execution plan for an ad hoc query initially. Instead, it caches only a much smaller plan "stub" along with its query hash and plan hash. This plan "stub" isn't a full execution plan, so it's not reusable for subsequent runs of the same query. However, if that same query is executed again soon after, the optimizer will then cache the reusable, full execution plan. In environments where single-use, ad hoc queries constitute a large portion of the workload, this setting can save a significant amount of memory, freeing it up for other tasks.

Note: For two ad-hoc queries to be reused in SQL Server, they must have an exact text match. This match is sensitive to both case and spacing, even on servers that are case-insensitive.

By default, this setting is turned off. However, I occasionally encounter situations where I find it beneficial to enable it. Why? Because memory in SQL Server is valuable. Perhaps an even more compelling reason is that many SQL servers have a limited memory allocation. As much as I'd like to, I can't convince myself, let alone my manager, to allocate 2TB of RAM to every server.

In the example I presented, the 30GB plan cache might appear excessive without understanding the context. Heck, most of my SQL servers don't even have that much total memory. To shed some light, here's some context regarding this SQL Server:

SQL Version:

Microsoft SQL Server 2019 (RTM-CU18) (KB5017593) - 15.0.4261.1 (X64)

Enterprise Edition: Core-based Licensing (64-bit)

Windows Server 2019 Standard 10.0 <X64> (Build 17763: ) (Hypervisor)

Database Size is 60TB

Application: Bigdata/Data Warehouse/Analytical Reports

CPU and Memory Configuration:

(MEMORY INFO IS IN GIGABYTES)

While a MAXDOP of 8 is the recommended best practice, I'm skeptical whether it's the optimal setting for this SQL Server, particularly considering its primarily analytical workload and 112 cores. It might be worthwhile to adjust this value dynamically based on the time of day and expected workload. For example, if there's an extensive ETL job running every night for 5 hours, designed for exclusive use of the entire server, thereby blocking or prohibiting other users, then that ETL job might benefit from a higher MAXDOP value. I'll explore this in more detail later on.

In this context, where the SQL instance has more than 2TB of RAM available, the 30GB cache size represents less than 1.5%. This percentage is, on average, lower than that of most other SQL Servers. However, in absolute terms, it's still significant, so it's worth investigating.

Let's first check what the current setting is for 'optimize for ad 'hoc workloads'.

It's not enabled, which is the default, so it remains in the turned-off state. Ordinarily, I wouldn't spend more time investigating this; I would simply enable the setting. However, in this case, I've been hired solely to investigate and make recommendations. I'm not authorized to implement any changes myself.

Therefore, I need to delve deeper into my investigation. But before proceeding, let's discuss: what precisely defines an execution plan as 'ad hoc'?

As I highlighted at the outset, some of us often view an ad hoc plan or query as something executed only once. This perception is only partially accurate, especially when considering how SQL Server categorizes a query plan as 'ad hoc' or otherwise in its cache:

“Adhoc: Ad hoc query. Refers to Transact-SQL submitted as language events by using osql or sqlcmd instead of as remote procedure calls”

That description is straight from this MS article:

https://learn.microsoft.com/en-us/sql/relational-databases/system-dynamic-management-views/sys-dm-exec-cached-plans-transact-sql

Even though it doesn't explicitly mention SSMS, queries executed from SSMS are also labeled as ad hoc by the optimizer.

It's not the frequency of a query's execution that determines its classification as an ad hoc query. However, the number of query executions does influence whether the "optimize for ad hoc workloads" setting caches the full execution plan or just a stub.

In practical terms, ad hoc queries are those that either lack parameters or cannot be auto-parameterized by the optimizer and are not part of a database object (like stored procedures, functions, views, etc.). Consider, for instance, this simple query:

SELECT NAME FROM SYS.DATABASES DBS WHERE NAME = 'tempdb';

Observe that the query utilizes a constant value 'tempdb' in the WHERE clause, rather than a variable or parameter. Even so, the optimizer can—and often will—attempt to parameterize this query if it's included within a stored procedure or sent as a prepared SQL query. Interactive client tools like sqlcmd, osql, SSMS, and others don't execute such queries as prepared statements. As a result, the optimizer labels them as ad hoc queries and caches their execution plans accordingly. If you have processes that frequently run certain queries using such tools, consider using sp_executesql. This way, they're cached as prepared plans, which can be reused to enhance query performance.

Now, let's examine the subsequent example where the same SQL statement is executed in three distinct manners to obtain identical results:

-- Ad hoc query
SELECT NAME FROM sys.databases dbs WHERE  NAME = 'tempdb'; 
GO
-- Prepared SQL statement
EXEC Sp_executesql N'SELECT NAME FROM sys.databases dbs WHERE  NAME = @db_name',
                   N'@db_name sysname',
                   N'tempdb' 
GO
-- Stored Procedure
IF Object_id('sp_test_adhoc_plans_cache', 'P') IS NOT NULL
  DROP PROCEDURE sp_test_adhoc_plans_cache
GO
CREATE PROCEDURE sp_test_adhoc_plans_cache
(@db_name SYSNAME = 'tempdb')
AS
    SELECT NAME FROM sys.databases dbs WHERE  NAME = @db_name;
GO

EXEC sp_test_adhoc_plans_cache
GO

Now, observe how the optimizer processed and cached those queries:

SELECT cp.plan_handle,
       cp.objtype,
       cp.cacheobjtype,
       cp.size_in_bytes,
       cp.usecounts,
       qt.text,
       qs.query_hash,
       qs.query_plan_hash
FROM   sys.dm_exec_cached_plans cp
       inner join sys.dm_exec_query_stats qs on qs.plan_handle = cp.plan_handle
       CROSS apply sys.Dm_exec_sql_text(cp.plan_handle) qt
WHERE  qt.text LIKE '%SELECT NAME FROM sys.databases dbs WHERE  NAME =%'
       AND qt.text NOT LIKE 'SELECT cp.plan_handle%'

For clearer visibility of its content, I've divided the results into two screenshots:

Observe that the query hash and plan hash values for them are identical. However, the plan handles and objtype differ, even for the same input value, 'tempdb'. While the cached plan size for the ad hoc query is considerably smaller than for the other plans, it remains larger than it would be if 'optimize for ad hoc workloads' were enabled. Let's confirm that:

EXEC sp_configure 'optimize for ad hoc workloads', 1;

reconfigure with override;

Let's remove the already cached plans:

DBCC FREEPROCCACHE (0x060001002C0F151A10D0B0C3D30100000100000000000000000000000000000);

DBCC FREEPROCCACHE (0x060001008A0FCA2810A6C9B5D3010000010000000000000000000000000000);

DBCC FREEPROCCACHE (0x05000100D563663F50416C7CE30100000100000000000000000000000000000);

Rerun the same three queries and examine the cache information:

Note that the plan_handle values differ from those in the previous screenshot, as I cleared the earlier plans from the cache.

Observe the size of the ad hoc plan: merely 456 bytes, with the cacheobjtype listed as 'Compiled Plan Stub'. That's correct – it's only a stub. It doesn't contain an execution plan, so technically, there's no cached plan available for reuse. However, should the same ad hoc query be executed again in the near future, the optimizer will generate a full execution plan for it, which can be reused by subsequent invocations of the same query. Let's validate that:

-- Ad hoc query

SELECT NAME FROM sys.databases dbs WHERE NAME = 'tempdb';

Here is the info from sys.dm_exec_cached_plans after running the ad hoc query the second time:

The cacheobjtype for the Adhoc plan is no longer a tub.

In summary, when 'optimize for ad hoc workloads' is enabled:

The SQL Server optimizer doesn't immediately cache the full execution plan for ad hoc queries. To clarify, this setting only impacts queries that the optimizer classifies as Ad Hoc.
The optimizer will cache the plan stub, accompanied by the query and plan hash values. This allows it to recognize if the same query is re-executed. In such an instance, it will replace the stub with the complete execution plan.

Given this information, would I recommend enabling the 'optimize for ad hoc workloads' setting? Given that over 90% of the workload is ad hoc, the risks seem minimal. However, I'd like to examine one more piece of data. The following query displays the aggregate memory usage for:

Ad hoc query plans
Plans that are not stubs
Plans with a use count of 2 or fewer
Plans not used by a query in over 5 hours

;with cte_cached_adhoc_plans as
(
SELECT plan_handle,
       MAX(last_execution_time) last_execution_time,
       SUM(execution_count)     execution_count
FROM   sys.dm_exec_query_stats
GROUP  BY plan_handle 
)
SELECT	COUNT(*) [Plan Count],
		SUM(CAST(size_in_bytes AS BIGINT)) / 
		              1024 / 1024 [Size MB],
		MAX(cte_cached_adhoc_plans.execution_count) [Max Exec Count],
		SUM(cte_cached_adhoc_plans.execution_count) [Total Exec Count]

FROM       sys.dm_exec_cached_plans cp 
INNER JOIN cte_cached_adhoc_plans ON cte_cached_adhoc_plans.plan_handle = cp.plan_handle 
WHERE cp.objtype = 'Adhoc'
  AND cte_cached_adhoc_plans.last_execution_time < DATEADD(HOUR, -5, GETDATE())
  AND cp.objtype != 'Compiled Plan Stub'
  AND cte_cached_adhoc_plans.execution_count <= 2

That's nearly 28GB. In this case, I would recommend enabling the 'optimize for ad hoc workloads' setting. However, like any recommendation involving configuration changes, it comes with caveats. Exercising caution is essential, as any change can have potentially unintended and sometimes adverse effects. The organization should have a robust change control procedure in place, complete with a back-out plan.

As an alternative, the following query can be used to generate a script to remove individual ad hoc query plans from the cache. Please tailor the filter conditions to fit your needs. This script could be scheduled to run at appropriate times, perhaps after peak business hours and just before nightly ETL or maintenance tasks commence.

;with cte_cached_adhoc_plans as
(
SELECT plan_handle,
       MAX(last_execution_time) last_execution_time,
       SUM(execution_count)     execution_count
FROM   sys.dm_exec_query_stats
GROUP  BY plan_handle 
)
SELECT	TOP 1000
	[Remove Cached Plan] = 
	'DBCC FREEPROCCACHE (0x' + convert(varchar(max), cte_cached_adhoc_plans.plan_handle, 2) + ');' 
FROM       sys.dm_exec_cached_plans cp 
INNER JOIN cte_cached_adhoc_plans ON cte_cached_adhoc_plans.plan_handle = cp.plan_handle 
WHERE cp.objtype = 'Adhoc'
  AND cte_cached_adhoc_plans.last_execution_time < DATEADD(HOUR, -5, GETDATE())
  AND cp.objtype != 'Compiled Plan Stub'
  AND cte_cached_adhoc_plans.execution_count <= 2

In conclusion, as we're aware, SQL Server will evict cached plans and cached data pages from the buffer cache as necessary, using its continually evolving algorithm. So, why not trust its judgment? I generally concur, but with a caveat. The default settings and behaviors are effective for many SQL installations in a majority of scenarios. However, they don't cater to every SQL Server instance or every situation.

P.S.: As a huge fan of Sherlock Holmes stories, the title of this post pays homage to the famous 'A Study in Scarlet' novel.

Mission: SQL Homeostasis

Search This Blog