How to Speed Up Simple Join

I am no good at SQL.

I am looking for a way to speed up a simple join like this:

    SELECT
        E.expressionID,
        A.attributeName,
        A.attributeValue
    FROM 
        attributes A
    JOIN
        expressions E
    ON 
        E.attributeId = A.attributeId
    

    I run this query tens of thousands of times, and it takes longer and longer as the tables get bigger.

    I am thinking of indexes. If I were speeding up selects on the single tables, I’d probably put a nonclustered index on expressionID for the expressions table and another on (attributeName, attributeValue) for the attributes table, but I don’t know how this would apply to the join.

    EDIT: I already have a clustered index on (expressionId, attributeId) on the expressions table (together they form the PK; attributeId is also an FK) and another clustered index on attributeId (PK) on the attributes table.

    I’ve seen this question but I am asking for something more general and probably far simpler.

    Any help appreciated!

6 Solutions collected from the web for “How to Speed Up Simple Join”

    You definitely want indexes on attributeID on both the attributes and expressions tables. If you don’t currently have those indexes in place, I think you’ll see a big speedup.
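
    To check which indexes are already in place, something like the following should work. It is only a sketch: the table names come from the question, and the dbo schema is an assumption.

```sql
-- List the indexes that currently exist on each table.
-- Table names are from the question; the dbo schema is assumed.
EXEC sp_helpindex 'dbo.attributes';
EXEC sp_helpindex 'dbo.expressions';
```

    Each call returns one row per index with its name, description, and key columns, so you can see whether attributeId leads any of them.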

    In fact, because so few columns are being returned, I would consider a covering index for this query, i.e. an index that includes all the columns the query uses.
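
    As a rough illustration, a covering index on the attributes table might look like the sketch below. The index name is hypothetical, the dbo schema is assumed, and INCLUDE requires SQL Server 2005 or later.

```sql
-- Hypothetical index name; dbo schema assumed.
-- attributeId is the key column used by the join; the two included
-- columns are stored at the leaf level so the query never has to
-- touch the base table for them.
CREATE NONCLUSTERED INDEX IX_attributes_attributeId_cover
    ON dbo.attributes (attributeId)
    INCLUDE (attributeName, attributeValue);
```

    Note that if attributeId is already the clustered key of attributes, as the question’s edit says, the clustered index already contains every column, so an index like this mostly pays off when the table is much wider than these three columns.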

    Some things you need to care about are indexes, the query plan and statistics.

    Put indexes on attributeId, or make sure indexes exist where attributeId is the first column in the key. SQL Server can still use an index in which attributeId is not the first column, but it generally has to scan the index rather than seek into it, which is slower.

    Highlight the query in Query Analyzer and press Ctrl+L to see the plan. It shows how the tables are joined together. Using indexes is almost always better than not. There are fringe cases where a table is small enough that indexes can slow you down, but for now just be aware that 99% of the time indexes are good.
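
    If you would rather get the plan as text than through the Ctrl+L shortcut, a sketch like this should do it. GO is the batch separator used by client tools such as Management Studio and sqlcmd, and SET SHOWPLAN_TEXT has to be the only statement in its batch.

```sql
-- Show the estimated plan as text instead of running the query.
SET SHOWPLAN_TEXT ON;
GO

SELECT
    E.expressionID,
    A.attributeName,
    A.attributeValue
FROM
    attributes A
JOIN
    expressions E
ON
    E.attributeId = A.attributeId;
GO

-- Turn plan-only mode off again so later queries actually execute.
SET SHOWPLAN_TEXT OFF;
GO
```

    In the output, look for whether each table is read with a scan or a seek and which join operator is chosen.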

    Pay attention to the order in which tables are joined. SQL Server maintains statistics on table sizes and uses them to decide which table is better to join first. Look into the SQL Server procedures for updating statistics; it’s been too long, so I don’t have that info handy.
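
    For reference, the statistics can be refreshed with the built-in commands below. This is a minimal sketch; the dbo schema is an assumption.

```sql
-- Refresh statistics for the two tables in the join (dbo schema assumed).
UPDATE STATISTICS dbo.attributes;
UPDATE STATISTICS dbo.expressions;

-- Or refresh statistics for every table in the current database.
EXEC sp_updatestats;
```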

    That should get you started. Really, an entire chapter can be written on how a database can optimize even such a simple query.

    I bet your problem is the huge number of rows that are being inserted into that temp table. Is there any way you can add a WHERE clause before you SELECT every row in the database?

    Another thing to do is add some indexes like this:

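    -- Index names are illustrative only.
    CREATE INDEX IX_attributes_cover ON attributes (attributeId, attributeName, attributeValue);
    CREATE INDEX IX_expressions_cover ON expressions (attributeId, expressionID);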
    

    This is hacky! But useful if it’s a last resort.

    What this does is create a query plan that can be “entirely answered” by indexes. Without it, an index lookup in your query above usually costs two I/Os: one to probe the index, and another to fetch the actual row the index entry refers to (to pull attributeName, etc.).

    This is especially helpful if “attributes” or “expressions” is a wide table, that is, a table whose rows are expensive to fetch.

    Finally, the best way to speed your query is to add a WHERE clause!
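
    For example, if each call only needs the rows for one attribute name, the query could be filtered as in this sketch. The @attributeName variable, its type, and its value are assumptions made for illustration, and the inline DECLARE initializer needs SQL Server 2008 or later.

```sql
-- Hypothetical filter value; the real column type is not given in the question.
DECLARE @attributeName nvarchar(100) = N'color';

SELECT
    E.expressionID,
    A.attributeName,
    A.attributeValue
FROM
    attributes A
JOIN
    expressions E
ON
    E.attributeId = A.attributeId
WHERE
    A.attributeName = @attributeName;  -- return only the rows this call needs
```

    With a filter like this, an index that leads with attributeName on the attributes table lets SQL Server seek to the matching rows before joining.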

    If I’m understanding your schema correctly, you’re stating that your tables kinda look like this:

    Expressions: PK - ExpressionID, AttributeID
    Attributes:  PK - AttributeID
    

    Assuming that each PK is a clustered index, the clustered index on Expressions is ordered by ExpressionID first, so the join on AttributeID cannot seek into it and an Index Scan of the Expressions table is still required. You might want to consider creating an index on the Expressions table on (AttributeID, ExpressionID). This would help avoid the Index Scan that currently occurs.
