Stored procedures vs. inline SQL
I know stored procedures are more efficient through the execution path than inline SQL in applications. However, when pressed, I'm not super knowledgeable about why.
I'd like to know the technical reasoning for this (in a way that I can explain it to someone later).
Can anyone help me formulate a good answer?
3 Answers
I believe this sentiment was true at one point, but not in current versions of SQL Server. The whole problem was that in the old days ad hoc SQL statements could not be properly optimized because SQL Server could only optimize / compile at the batch level. Now we have statement-level optimization, so a properly parameterized query coming from an application can take advantage of the same execution plan as that query embedded in a stored procedure.
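For illustration, a minimal sketch of a parameterized statement being reused the way a stored procedure call would be (the table name is an assumption):

    -- First call compiles and caches a plan for this statement text:
    exec sp_executesql
        N'select * from dbo.Orders where CustomerId = @id',
        N'@id int', @id = 1;

    -- A second call with a different value reuses the same cached plan,
    -- just as repeated calls to a stored procedure would:
    exec sp_executesql
        N'select * from dbo.Orders where CustomerId = @id',
        N'@id int', @id = 2;

Client libraries typically do this for you: a parameterized command sent from ADO.NET's SqlClient, for example, arrives at the server as an sp_executesql call.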
I still prefer stored procedures from the DBA side for the following reasons (and several of them can have a huge impact on performance):
- Searchability: being able to query sys.sql_modules for references to specific objects makes everyone's lives much easier.
- Isolation from session SET options: one application could have SET ANSI_WARNINGS ON, and the other could have SET ANSI_WARNINGS OFF, and they would each have their own copy of the plan. The plan each gets depends on the parameters in use, the stats in place, etc. the first time the query is called in each case, which could lead to different plans and hence very different performance.

That all said, this question is likely to stir up more religious arguments than technical debate. If we see that happening, we'll probably shut it down.
TL;DR: There is no appreciable performance difference between the two as long as your inline SQL is parameterized.
These are the reasons I've slowly phased out stored procedures:
We run a 'beta' application environment: an environment parallel to production that shares the production database. Because the DB code lives at the application level, and DB structure changes are rare, we can let people confirm new functionality beyond QA and do deployments outside of the production deployment window, while still providing production functionality and non-critical fixes. This would not be possible if half of the application code lived in the DB.
We practice devops at the database level (Octopus + dacpacs). However, while the business layer and up can basically be purged and replaced, and recovery is just the reverse, that is not true for the incremental and potentially destructive changes that must be applied to databases. Consequently, we prefer to keep our DB deployments lighter and less frequent.
To avoid near-exact copies of the same code for optional parameters, we often use a 'where @var is null or @var = table.field' pattern. With a stored proc you're likely to get the same execution plan despite rather different intents, and thus either suffer performance problems or eliminate cached plans with 'recompile' hints. However, with a simple bit of code that appends a "signature" comment to the end of the SQL, we can force different plans based on which variables were null (not to be interpreted as a different plan for all variable combinations, only null vs. not null); see the sketch below. Update 8/2020: this no longer seems to be true, or is now a lot harder to do, as later versions of SQL Server have become pretty smart about ignoring trivial pieces of code.
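A minimal sketch of the pattern with the appended signature comment (table, column, and parameter names are assumptions for illustration):

    -- @customerId and @fromDate are supplied as parameters by the app
    select o.Id, o.Total
    from dbo.Orders o
    where (@customerId is null or @customerId = o.CustomerId)
      and (@fromDate   is null or o.OrderDate >= @fromDate)
    -- sig: customerId=notnull; fromDate=null

Because the plan cache keys on a hash of the full statement text, two calls whose trailing signature comments differ are compiled and cached separately, even though the executable SQL above the comment is identical.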
I can make dramatic changes to the results with only minor on-the-fly changes to the SQL. For example, I can have a statement that closes out with two CTEs, "Raw" and "ReportReady". There is nothing that says both CTEs must be used. My SQL statement can then end with something like:
    ...
    select * from {(detail ? "ReportReady" : "Raw")}
This allows me to use the exact same business-logic method for both a streamlined API call and a report that needs more detail, ensuring that I don't duplicate complicated logic.
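A minimal T-SQL sketch of that shape (tables, columns, and the parameter are assumptions):

    declare @status int = 1;  -- illustrative parameter

    with Raw as (
        select o.Id, o.CustomerId, o.Total
        from dbo.Orders o
        where o.Status = @status
    ),
    ReportReady as (
        select r.Id, r.Total, c.Name as CustomerName
        from Raw r
        join dbo.Customers c on c.Id = r.CustomerId
    )
    select * from ReportReady;  -- the streamlined API path appends "select * from Raw" instead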
There are valid reasons to use procs:
Security - You've got another layer here that the app must go through. If the application service account is not allowed to touch tables, but only has 'execute' permission on procs, you've got some extra protection. This isn't a given, as it does have a cost, but it is a possibility.
Reuse - While I would say that reuse should largely happen at the business layer, to make sure you are not bypassing non-DB business rules, we do still have the occasional low-level, "used everywhere" type of utility procs and functions.
There are some arguments that don't really support procs or are easily mitigated IMO:
Reuse - I mentioned this above as a plus, but I also want to note here that reuse should largely happen at the business layer. A proc to insert a record should not be considered "reusable" when the business layer might also be checking other non-DB services.
Cache plan bloat - The only way this becomes an issue is if you are concatenating values rather than parameterizing. The fact that you rarely get more than one plan per proc actually often hurts you when you have an 'or' in a query; see the illustration below.
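To see the bloat for yourself, single-use ad hoc plans, typically the product of concatenated literals, can be listed from the plan cache. A hedged sketch, assuming VIEW SERVER STATE permission:

    -- Near-duplicate single-use ad hoc plans are the signature of
    -- concatenated (non-parameterized) SQL:
    select cp.usecounts, cp.size_in_bytes, st.text
    from sys.dm_exec_cached_plans cp
    cross apply sys.dm_exec_sql_text(cp.plan_handle) st
    where cp.objtype = 'Adhoc'
      and cp.usecounts = 1
    order by cp.size_in_bytes desc;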
Statement size - An extra KB of SQL statement text over a proc name is typically going to be negligible relative to the data coming back. If it's OK for Entity Framework, it's OK for me.
Seeing the exact query - Making queries easy to find in code is as simple as adding the calling location as a comment to the SQL. Making the SQL copyable from C# code to SSMS is as easy as some creative interpolation and comment usage:
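For example, the generated text might look like this once pasted into SSMS (the calling location and all names here are assumptions):

    -- OrderRepository.GetOrders            (calling location, interpolated in by the app)
    -- declare @customerId int = 42;        (uncomment to run the query standalone in SSMS)
    select o.Id, o.Total
    from dbo.Orders o
    where o.CustomerId = @customerId;

The location comment makes the statement searchable in the codebase, and uncommenting the declare line makes the parameterized text immediately runnable in SSMS with test values.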
SQL injection - Parameterize your queries; done. Note that this protection can actually be undone if the proc itself builds dynamic SQL by concatenation.
Bypassing deployment - We practice devops at the database level as well, so this is not an option for us.
"Slow in the application, fast in SSMS" - This is a plan caching problem which affects both sides. The set options merely cause a new plan to be compiled that appears to fix the problem for THE ONE SET of variables. This only answers why you see different results - the set options themselves do NOT fix the problem of parameters sniffing.
Inline SQL execution plans are not cached - Simply false. A parameterized statement, just like a proc name, is quickly hashed, and a plan is then searched for by that hash. It's 100% the same.
To be clear, I'm talking about raw inline SQL, not code generated by an ORM; we only use Dapper, which is a micro-ORM at best.
https://weblogs.asp.net/fbouma/38178
https://stackoverflow.com/a/15277/852208
While I respect the submitter, I humbly disagree with the answer provided and not for "religious reasons". In other words, I believe there is no facility that Microsoft has provided which decreases the need for the guidance to use stored procedures.
Any guidance provided to a developer that favors the use of raw text SQL queries must come with many caveats. I therefore think the most prudent advice is to strongly encourage the use of stored procedures, and to discourage your developer teams from embedding SQL statements in code or submitting raw, plain-old text-based SQL requests outside of SQL SPROCs (stored procedures).
I think the simple answer to the question of why use a SPROC is, as the submitter surmised: SPROCs are parsed, optimized, and compiled. As such, their query/execution plans are cached, because you've saved a static representation of a query that you will normally vary only by parameters. That is not true of copied/pasted SQL statements, which tend to morph from page to page and component to component, and are often "variablized" to the extent that different tables, even different database names, can be specified from call to call. This type of dynamic ad hoc SQL submission greatly decreases the likelihood that the DB engine will reuse the query plan for your ad hoc statements, which it does according to some very strict rules. Here, I am drawing the distinction between dynamic ad hoc queries (in the spirit of the question raised) and the use of the efficient system SPROC sp_executesql; the sketch below makes that distinction concrete.
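A minimal sketch of the difference (the table name is an assumption):

    -- Dynamic ad hoc: the value is baked into the text, so every distinct
    -- value yields a different string and, typically, a different plan:
    declare @idText varchar(11) = '42';
    exec ('select * from dbo.Orders where CustomerId = ' + @idText);

    -- sp_executesql: the text is constant and the value travels as a
    -- parameter, so the plan can be reused like a stored procedure's:
    exec sp_executesql
        N'select * from dbo.Orders where CustomerId = @id',
        N'@id int', @id = 42;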
More specifically, there are the following components:
When a SQL statement is issued from a web page, termed an "ad hoc statement", the engine looks for an existing execution plan to handle the request. Because this is text submitted from a user, it will be ingested, parsed, compiled, and executed, if it is valid. At this time it will receive a query cost of zero. Query cost is used when the DB engine uses its algorithm in order to determine which execution plans to evict from cache.
Ad hoc queries receive an original query cost value of zero, by default. Upon subsequent execution of the exact same ad hoc query text, by another user process (or the same one), the current query cost is reset to the original compile cost; since our ad hoc query's compile cost is zero, this does not bode well for the possibility of reuse. Zero is, obviously, the lowest possible cost, but why does that lead to eviction?
When memory pressure arises, and it will if you have an often-used site, the DB engine uses a cleanup algorithm to determine how it can reclaim memory the procedure cache is using. It uses the current query cost to decide which plans to evict. As you might guess, plans with a cost of zero are the first to be evicted from cache, because zero essentially means "no current users of, or references to, this plan".
Therefore, it is quite likely that such a plan will be evicted first when memory pressures arise.
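These costs can be observed directly. A hedged sketch (joining on memory_object_address is one commonly used way to tie cache entries to their plans; VIEW SERVER STATE permission is assumed):

    select st.text, e.original_cost, e.current_cost, cp.usecounts
    from sys.dm_os_memory_cache_entries e
    join sys.dm_exec_cached_plans cp
        on cp.memory_object_address = e.memory_object_address
    cross apply sys.dm_exec_sql_text(cp.plan_handle) st
    where e.type in ('CACHESTORE_SQLCP', 'CACHESTORE_OBJCP');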
So, if you have a server build-out with lots of memory "beyond your needs", you may not experience this issue as often as a busy server that has only "sufficient" memory to handle its workload. (Sorry, server memory capacity and utilization are somewhat subjective/relative, though the algorithm is not.)
Now, if I am factually incorrect about one or more points, I'm certainly open to being corrected.
Lastly, the author wrote:
"Now we have statement-level optimization, so a properly parameterized query coming from an application can take advantage of the same execution plan as that query embedded in a stored procedure."
I believe the author is referring to the "optimize for ad hoc workloads" option.
If so, this option allows for a two-step process that avoids immediately placing the full query plan in the procedure cache; only a smaller query stub is cached at first. If the exact same query is sent to the server again while the stub is still in the procedure cache, the full query execution plan is cached at that point. This saves memory, which, during memory-pressure incidents, may allow the eviction algorithm to evict your stub less frequently than it would a larger cached query plan. Again, this depends upon your server's memory and utilization.
However, you have to turn this option on, since it's off by default.
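Turning it on is a server-level configuration change (it is an advanced option, so 'show advanced options' must be enabled first):

    exec sp_configure 'show advanced options', 1;
    reconfigure;
    exec sp_configure 'optimize for ad hoc workloads', 1;
    reconfigure;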
Lastly, I want to stress that, often, the very reason developers embed SQL in pages, components, and other places is that they want the flexibility of submitting dynamic SQL to the database engine. In a real-world use case, therefore, submitting the very same text call-over-call is unlikely, and so are the caching efficiencies we seek when submitting ad hoc queries to SQL Server.
For additional information, please see:
https://technet.microsoft.com/en-us/library/ms181055(v=sql.105).aspx
http://sqlmag.com/database-performance-tuning/don-t-fear-dynamic-sql
Best,
Henry