Does the order of columns in a WHERE clause matter?

Does the order of the columns in a WHERE clause effect performance?

e.g.

  • Visual Studio 2010 complains about database versions
  • Go with SQL Server driver is unable to connect successfully, login fail
  • Mapping SqlGeography with Dapper
  • How Read JSON Data From URL In SQL Server
  • SSRS 2014 report deploy issue
  • How do I get records before and after given one?
  • Say I put a column that has a higher potential for uniqueness first or visa versa?

  • Calculate the Last Day in the CURRENT quarter
  • join two tables and generate two columns(pivot) depending on value in one column
  • SQL update based on value from other table
  • Find SQL query errors without executing the SQL query in SQL server
  • Entity Framework 5 mixing Oracle & SQL Server
  • Version Controlling Database that is used by multiple projects
  • 7 Solutions collect form web for “Does the order of columns in a WHERE clause matter?”

    With a decent query optimiser: it shouldn’t.

    But in practice, I suspect it might.

    You can only tell for your cases by measuring. And the measurements will likely change as the distribution of data changes in the database.

    For Transact-SQL there is a defined precedence for operators in the condition of the WHERE clause. The optimizer may re-order this evaluation, so you shouldn’t rely on short-circuiting behavior for correctness. The order is generally left to right, but selectivity/availability of indexes probably also matters. Simplifying your search condition should improve the ability of the optimizer to handle it.

    Ex:

     WHERE (a OR b) AND (b OR c)
    

    could be simplified to

     WHERE b OR (a AND c)
    

    Clearly in this case if the query can be constructed to find if b holds first it may be able to skip the evaluation of a and c and thus would run faster. Whether the optimizer can do this simple transformation I can’t answer (it may be able to), but the point is that it probably can’t do arbitrarily complex transformations and you may be able to effect query performance by rearranging your condition. If b is more selective or has an index, the optimizer would likely be able to construct a query using it first.

    EDIT: With regard to your question about ordering based on uniqueness, I would assume that the any hints you can provide to the optimizer based on your knowledge (actual, not assumed) of the data couldn’t hurt. Pretend that it won’t do any optimization and construct your query as if you needed to define it from most to least selective, but don’t obsess about it until performance is actually a problem.

    Quoting from the reference above:

    The order of precedence for the logical operators is NOT (highest),
    followed by AND, followed by OR. Parentheses can be used to override
    this precedence in a search condition. The order of evaluation of
    logical operators can vary depending on choices made by the query
    optimizer.

    For SQL Server 2000 / 20005 / 2008, the query optimizer usually will give you identical results no matter how you arrange the columns in the WHERE clause. Having said this, over the years of writing thousands of T-SQL commands I have found a few corner cases where the order altered the performance. Here are some characteristics of the queries that appeared to be subject to this problem:

    1. If you have a large number of tables in your query (10 or more).

    2. If you have several EXISTS, IN, NOT EXISTS, or NOT IN statements in your WHERE clause

    3. If you are using nested CTE’s (common-table expressions) or a large number of CTE’s.

    4. If you have a large number of sub-queries in your FROM clause.

    Here are some tips on trying to evaluate the best way to resolve the performance issue quickly:

    1. If the problem is related to 1 or 2, then try reordering the WHERE clause and compare the sub-tree cost of the queries in the estimated query plans.

    2. If the problem is related to 3 or 4, then try moving the sub-queries and CTE’s out of the query and have them load temporary tables. The query plan optimizer is FAR more efficient at estimating query plans if you reduce the number of complex joins and sub-queries from the body of the T-SQL statement.

    3. If you are using temporary tables, then make certain you have specified primary keys for the temporary tables. This means avoid using SELECT INTO FROM to generate the table. Instead, explicitly create the table and specify a primary KEY before using an INSERT INTO SELECT statement.

    4. If you are using temporary tables and MANY processes on the server use temporary tables as well, then you may want to make a more permanent staging table that is truncated and reloaded during the query process. You are more likely to encounter disk contention issues if you are using the TempDB to store your working / staging tables.

    5. Move the statements in the WHERE clause that will filter the most data to the beginning of the WHERE clause. Please note that if this is your solution to the problem, then you will probably have poor performance again down the line when the query plan gets confused again about generating and picking the best execution plan. You are BEST off finding a way to reduce the complexity of the query so that the order of the WHERE clause is no longer relevant.

    I hope you find this information helpful. Good luck!

    It all depends on the DBMS, query optimizer and rules, but generally it does affect performance.

    If a where clause is ordered such that the first condition reduces the resultset significantly, the remaining conditions will only need to be evaluated for a smaller set. Following that logic, you can optimize a query based on condition order in a where clause.

    In theory any two queries that are equivalent should produce identical query plans. As the order of WHERE clauses has no effect on the logical meaning of the query, this should mean that the order of the WHERE clause should have no effect.

    This is because of the way that the query optimiser works. In a vastly simplified overview:

    1. First SQL Server parses the query and constructs a tree of logical operators (e.g JOIN or SELECT).
    2. Then it translates these logical operators into a “tree of physcial operations” (e.g. “Nested Loops” or “Index scan”, i.e. an execution plan)
    3. Next it permutates through the set of equivalent “trees of physcial operations” (i.e. execution plans) by swapping out equivalent operations, estimating the cost of each plan until it finds the optimal one.

    The second step is done is a completely nieve way – it simply chooses the first / most obvious physical tree that it can, however in the 3rd step the query optimiser is able to look through all equivalent physical trees (i.e. execution plans), and so as long as the queries are actually equivalent it doesn’t matter what initial plan we get in step 2, the set of plans all plans to be considered in step 3 is the same.

    (I can’t remember the real names for the logical / physical trees, they are in a book but unfortunately the book is the other side of the world from me right now)

    See the following series of blog articles for more detail Inside the Optimizer: Constructing a Plan – Part 1

    In reality however often the query optimiser doesn’t have the chance to consider all equivalent trees in step 3 (for complex queries there can be a massive number of possible plans), and so after a certain cutoff time step 3 is cut short and the query optimiser has to choose the best plan that it has found so far – in this case not all plans will be considered.

    There is a lot of behind the sceene magic that goes on to ensure that the query optimiser selectively and inteligently chooses plans to consider, and so most of the time the plan choses is “good enough” – even if its not the absolute fastest plan, its probably not that much slower than the theoretical fastest,

    What this means however is that if we have a different starting plan in step 2 (which might happen if we write our query differently), this potentially means that a different subset of plans is considered in step 3, and so in theory SQL Server can come up with different query plans for equivalent queries depending on the way that they were written.

    In reality however 99% of the time you aren’t going to notice the difference (for many simple plans there wont be any difference as the optimiser will actually consider all plans). Also you can’t predict how any of this is going to work and so things that might seem sensible (like putting the WHERE clauses in a certain order), might not have anything like the expected effect.

    In the vast majority of cases the query optimizer will determine the most efficient way to select the data you have requested, irrespective of the ordering of the SARGS defined in the WHERE clause.

    The ordering is determined by factors such as the selectivity of the column in question (which SQL Server knows based on statistics) and whether or not indexes can be used.

    If you are ANDing conditions the first not true will return false, so order can affect performance.

    MS SQL Server is a Microsoft SQL Database product, include sql server standard, sql server management studio, sql server express and so on.