Search This Blog

Saturday, January 13, 2024

Troubleshooting SQL Server Error 8623: The query processor ran out of internal resources and could not produce a query plan

SQL Server Error 8623

At least once day, an application for one of the databases I manage gets this error:

The query processor ran out of internal resources and could not produce a query plan. 

This is a rare event and only expected for extremely complex queries or queries that reference 
a very large number of tables or partitions. Please simplify the query. 
If you believe you have received this message in error, 
contact Customer Support Services for more information.

This error is sent to the client that initiated the query, so it won't appear in the SQL Server error log. Consequently, you may not be aware of it until you receive reports from your users. However, you can set up Extended Events, traces, etc., to capture the error. That's exactly what I did because we were uncertain which specific query was causing this error. 

Here is an explanation on this error:

https://learn.microsoft.com/en-us/sql/relational-databases/errors-events/mssqlserver-8623-database-engine-error


In my experience, this is indeed an uncommon situation and therefore can be a daunting challenge. However, the query itself doesn't necessarily have to be highly complex. More often than not, it's an excessively and unnecessarily long or inefficient query. For instance, it could involve an IN clause with thousands of values, an overly convoluted query with numerous derived tables, joins, UNION ALL operations, and so on. As a result, the SQL Server query optimizer eventually gives up attempting to find an optimal execution plan for it and throws the error.

In my specific case, the problematic query consists of over 400 UNION ALL operations! It consistently times out and triggers the error after 20 seconds. The duration after which the query optimizer gives up can vary and depends on various factors.

 
How you resolve it? From the same article:
Simplify the query by breaking the query into multiple queries along the largest dimension. First, remove any query elements that aren't necessary, then try adding a temp table and splitting the query in two. Note that if you move a part of the query to a subquery, function, or a common table expression that isn't sufficient because they get recombined into a single query by the compiler. You can also, try adding hints to force a plan earlier, for example OPTION (FORCE ORDER).

Why did I highlight the use of 'OPTION (FORCE ORDER)'? Because I've found it to be a reliable workaround to make the query work, especially when you need a quick workaround to address the issue immediately. Afterward, you can focus on implementing a better and more sensible permanent solution, assuming you have the option to rewrite the query, if not you can dynamically apply query hints using the Query Store:

https://learn.microsoft.com/en-us/sql/relational-databases/performance/query-store-hints


The database I'm referring to is a third-party database running on SQL Server version 2019 Standard edition, equipped with 8 cores and 48GB of RAM, with 40GB allocated to the SQL Server. However, it's worth noting that this error can occur in older versions as well as potentially in future versions of SQL Server.

Capturing this error using Extended Events:

Given that the error is sent to the client and not logged into SQL Server error log, if you need to identify the query or queries that are encountering this error, you can establish an Extended Events trace to capture the necessary information, including the query text. 

Below is a sample code to create the Extended Events trace:


CREATE EVENT SESSION [trace_error_8623] ON SERVER 
ADD EVENT sqlserver.error_reported(
    ACTION(
	       package0.last_error,
	       sqlserver.client_app_name,
		   sqlserver.client_connection_id,
	       sqlserver.database_id,
		   sqlserver.database_name,
		   sqlserver.is_system,
		   sqlserver.plan_handle,
		   sqlserver.query_hash,
		   sqlserver.query_plan_hash,
		   sqlserver.session_id,
		   sqlserver.sql_text,
		   sqlserver.username
		   )
    WHERE ([error_number]=(8623)))
ADD TARGET package0.event_file(SET filename=N'trace_error_8623.xel')
WITH (MAX_MEMORY=4096 KB,
      EVENT_RETENTION_MODE=ALLOW_SINGLE_EVENT_LOSS,
	  MAX_DISPATCH_LATENCY=30 SECONDS,
	  MAX_EVENT_SIZE=0 KB,
	  MEMORY_PARTITION_MODE=NONE,
	  TRACK_CAUSALITY=OFF,
	  STARTUP_STATE=OFF
	 )
GO

To query/read this XE:

;with cte as
(
		SELECT top 100 
				@@SERVERNAME [SQL Server],
				[file_name],
                cast(event_data as xml) event_data
		FROM sys.fn_xe_file_target_read_file('trace_error_8623*.xel', null, null, null)

)
select -- TOP 10
		 [SQL Server] 
	,en.v.value('../@timestamp', 'varchar(100)') [TimeStamp]
	,en.v.value('.', 'varchar(100)') [Error Number]
	,em.v.value('.', 'varchar(8000)') [Error Message]
	,un.v.value('.', 'varchar(100)') [User Name]
	,st.v.value('.', 'varchar(max)') [SQL Text]
	,datalength(st.v.value('.', 'varchar(max)')) / 1024 [SQL Size (mb)]

	,sid.v.value('.', 'Int') [Session ID]
	,qh.v.value('.', 'varchar(8000)') [Query Hash]
	,CONCAT('0x', ph.v.value('.', 'varchar(200)')) [Plan Handle]
	,dbname.v.value('.', 'varchar(8000)') [DB Name]
	,dbid.v.value('.', 'Int') [DB ID]
	,can.v.value('.', 'varchar(8000)') [Client App Name]

	,cte.event_data [Event Data]

from cte
CROSS APPLY event_data.nodes('//event/data[@name = "error_number"]')     as en(v)
CROSS APPLY event_data.nodes('//event/data[@name = "message"]')  as em(v)
CROSS APPLY event_data.nodes('//event/action[@name = "username"]') as un(v)
CROSS APPLY event_data.nodes('//event/action[@name = "sql_text"]')   as st(v)
CROSS APPLY event_data.nodes('//event/action[@name = "session_id"]')      as sid(v)
CROSS APPLY event_data.nodes('//event/action[@name = "query_hash"]')      as qh(v)
CROSS APPLY event_data.nodes('//event/action[@name = "plan_handle"]')      as ph(v)
CROSS APPLY event_data.nodes('//event/action[@name = "database_name"]')      as dbname(v)
CROSS APPLY event_data.nodes('//event/action[@name = "database_id"]')      as dbid(v)
CROSS APPLY event_data.nodes('//event/action[@name = "client_app_name"]')      as can(v)
;


In my particular case, the issue stems from how the application's user interface handles user selections. Users can make multiple selections, which the application then uses to dynamically construct a query with UNION ALL for each selected value before execution on SQL Server. However, if a user selects an excessive number of values, the accumulation of UNION ALL operations overwhelms the query optimizer, making it difficult to find an optimal execution plan. This problem consistently triggers the error after precisely 20 seconds on our SQL Server.

As a temporary workaround, I implemented the query hint OPTION (FORCE ORDER), which reduced the query execution time to just 7 seconds, still returning same 469 rows. In search of efficiency, replacing the UNION ALL operations with an IN clause further improved performance, allowing the query to finish in slightly under a second and producing the same 469 rows. Opting for an even better solution, I moved all values into a table variable, which enabled the query to complete in less than a second.

It's important to note that the specific solution for this error in one of your queries may differ based on the specific characteristics of your query.

Resources:

Take a look at this article for a guide on reproducing this error on your SQL server:

When I tested the sample code provided in the article, it would trigger this error within a couple of seconds. Here, the suggested workaround is to include the query hint OPTION (USE HINT ('FORCE_LEGACY_CARDINALITY_ESTIMATION')).

However, adding the query hint OPTION (FORCE ORDER) also resolves the issue.