SQL Server: Joining by primary key or using separate queries?

I have a simple question regarding the dichotomy of joining two tables vs using 2 separate queries.

I was hoping to find an existing question, but my search didn’t yield much (most questions were for more complex problems).

  • Is it possible to deploy an enterprise ASP.NET application and SQL schema changes with zero downtime?
  • The proper use of MSSQL “go” in VBA
  • sql query to manipulate strings
  • Comparing performance of OLEDB and Ado.Net
  • Encrypt In .NET / Decrypt in SQL Server
  • Conditional statement in sql server 2k5
  • For example, consider two tables, A and X, with a very simple schema:

    Table A   +-------------+-------------+-------------+
              | Column A (*)| Column X(FK)| Column C    |
              +-------------+-------------+-------------+
    
    Table X   +-------------+-------------+-------------+
              | Column X (*)| Column Y    | Column Z    |
              +-------------+-------------+-------------+
    

    Where columns A and X are identity columns and primary keys (bigint). There is also an existing foreign key relationship for column X between tables A and X.

    My question is, assuming both tables are sufficiently large (just say 500K rows), would I benefit more in terms of performance from using a single query (see Linq2Sql pseudo code below), or use two separate queries?

    Option 1:

    long aValue = 107;
    DataContext dc = new DataContext();
    var items = (from a in dc.TableA
                 join x in dc.TableX
                 on a.X equals x.X
                 where a.A == aValue
                 select new { a, x });
    

    Option 2:

    • Just assume I write an SP that does 2 separate select statements in serial.

    To further quantify the problem, you can assume for every value of A, there are only a few (0-5) rows that will be joined from Table Y, so the duplication of Table A data returned in the join is not significant.

    I’m asking strictly form a DB server impact standpoint. So ignoring any client-side considerations, (e.g. roundtrip networking latency, L2S query building and data marshalling costs, etc.) my questions are:

    1. Which option will take less time to compute on the DB server?

    2. Which option will require less memory to evaluate the result?

    3. Which option is generally preferred, if there is a best practice?

    Sorry if this sounds too rudimentary, but any insight will be appreciated.

    Thanks,
    – K.

  • Deploy SQL 2008 R2 MDS Functions without MDS
  • why sql server increment the Identity specification?
  • On table update, trigger an action in my .NET code
  • Dynamically Joining a table based on value in recordset
  • SQL Server : TOP along with Distinct
  • SQL Server query to hide duplicate rows column data. Don't want to remove a duplicate row
  • 2 Solutions collect form web for “SQL Server: Joining by primary key or using separate queries?”

    Short answer: Trust the optimizer.

    A Single query (especially with a simple join) against a well indexed table will be more effecient than writing a set of serial SQL statements. I’m not an expert in LINQ, so I’m not sure what columns you’ll be returning with your pseudocode, but if the tables are properly indexed on appropriate hardware, you’ll be fine.

    Well, from my experience in a busy database it is always better to run two queries without join, than one query with join.

    There will be always one difference between these two approaches: when you join, the SQL must match the rows somehow (probably with nested loops when there is small number of rows predicted to be returned). When workload or result sets are big it starts to matter.

    If you don’t expect your database to be a bottleneck and this join simplifies development somehow, then go ahead with it.


    To provide you any numbers on larger tables I have tried two approaches against parent & child tables on the DB I administer. They have 150k rows & 3000k rows respectively. SQL prints those statistics:

    Join query

    SQL Server parse and compile time: 
       CPU time = 0 ms, elapsed time = 7 ms.
    Table 'child'. Scan count 1, logical reads 324, physical reads 0, read-ahead reads 0.
    Table 'parent'. Scan count 1, logical reads 3, physical reads 0, read-ahead reads 0.
    
    SQL Server Execution Times:
       CPU time = 0 ms,  elapsed time = 70 ms.
    

    Separate selects

    SQL Server parse and compile time: 
       CPU time = 0 ms, elapsed time = 0 ms.
    SQL Server parse and compile time: 
       CPU time = 0 ms, elapsed time = 0 ms.
    Table 'parent'. Scan count 1, logical reads 3, physical reads 0, read-ahead reads 0.
    
    SQL Server Execution Times:
       CPU time = 0 ms,  elapsed time = 0 ms.
    
    (1 row(s) affected)
    SQL Server parse and compile time: 
       CPU time = 0 ms, elapsed time = 0 ms.
    Table 'child'. Scan count 1, logical reads 324, physical reads 0, read-ahead reads 0.
    
    SQL Server Execution Times:
       CPU time = 0 ms,  elapsed time = 1 ms.
    

    (tables have proper indexes in place; table names are changed)

    MS SQL Server is a Microsoft SQL Database product, include sql server standard, sql server management studio, sql server express and so on.