Hibernate SQL In clause making CPU usage to 100%

In my java application I am using SQL server and Hibernate3 with EJB. When I tried to execute a select query with In clause, the DB server CPU usage reaches to 100%. But when I tried to run the same query in SQL management studio, the query is running without any CPU spikes. Application server and DB server are two different machines. My table has the following schema,

CREATE TABLE student_table (
       Student_Id BIGINT NOT NULL IDENTITY
     , Class_Id BIGINT NOT NULL
     , Student_First_Name VARCHAR(100) NOT NULL
     , Student_Last_Name VARCHAR(100)
     , Roll_No VARCHAR(100) NOT NULL
     , PRIMARY KEY (Student_Id)
     , CONSTRAINT UK_StudentUnique_1 UNIQUE  (Class_Id, Roll_No)
);

The table contains around 1000k records. My query is

  • MySql database structure for : Search based on single column and different value
  • Get nested XML output sql server
  • SQL Job having issues with transaction log
  • How to import a bak file into SQL Server Express
  • How to Interpret DateTimeOffset column in SQL Server?
  • Log table access using SQL Server Profiler
  • select Student_Id from student_table where Roll_No in ('A101','A102','A103',.....'A250');
    

    In clause contains 250 values, When I tried to run above query in SQL management studio the result is retrieved within 1 seconds and without any CPU spikes. But when I tried to run the same query through hibernate the CPU spikes reaches to 100% for around 60 seconds and result is retrieved around 60 seconds. The hibernate query is,

    Criteria studentCriteria = session.createCriteria(StudentTO.class);
    studentCriteria.add(Restrictions.in("rollNo", rollNoLists)); //rollNoLists is an Arraylist contains 250 Strings
    studentCriteria.setProjection(Projections.projectionList().add(Projections.property("studentId")));
    List<Long> studentIds = new ArrayList<Long>();
    List<Long> results = (ArrayList<Long>) studentCriteria.list();
    if (results != null && results.size() > 0) {
       studentIds.addAll(results);
    }
    return studentIds;
    

    What is the problem why it is so. If the same query is running through management studio the result is retrieved without any spikes and result is retrieved within 1 seconds. Any solution???

    Edit1:
    My hibernate generated query is,

    select this_.Student_Id as y0_ from student_table this_ where this_.Roll_No in (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?,?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
    

    Edit2:
    My execution plan
    This was after indexing roll_no

    CREATE INDEX i_student_roll_no ON student_table (Roll_No) 
    

    My execution plan,

    8 Solutions collect form web for “Hibernate SQL In clause making CPU usage to 100%”

    The query you run from the console is easily cacheable and that’s why the response is instantaneous. If you look at the query, you’ll see that all parameters are embedded in the query, so the query planner can detect there’s no variation and all executions will always go to the same plan and to the same cached result.

    The query that you run with Hibernate, even if it were a native query, it uses a PreparedStatement and parameters are bind at query execution time and to quote one of the best author on indexing:

    What has that to do with bind parameters?

    The shared execution plan caches of DB2, Oracle and SQL Server use a hash value of the literal SQL string as key to the cache. Cached
    plans are not found if the SQL contains literal values that vary with
    each execution.

    Place holders (bind parameters) unify the statement so that the SQL string is identical when executed with different values—thus,
    increasing the cache-hit rate.

    To solve it, you need to add an index on both the (Roll_No, Student_Id) columns so that the query becomes an index-only scan.

    SQL Server defaults to cluster indexes, which limit you to one clustered index per table, so you might want to turn this table into a heap table instead and focus on index-only scans.

    To answer the question “why it is slow through hibernate” you need to see the actual execution plan that is used by the server when you run your hibernate code, NOT the execution plan that the server uses when you run the query from SSMS. The screenshot with the execution plan that you included in the question doesn’t look like the actual plan that you get when running your hibernate code. Once you have that plan you can compare it to the plan you get from SSMS and the difference will most likely explain why it is slow in one case and fast in another.

    There is a very good article by Erland Sommarskog, which focuses on so-called “parameter sniffing”, which could be the reason for a problem here, but not likely. What is useful for us in this article here is that he explains how you can extract execution plan for examination from the cache.

    Without this information you can only guess. One guess is that you pass your parameters as nvarchar, but the indexed field Roll_No is varchar, so index is not used. The server converts your varchar column to nvarchar for the comparison, which means that index can’t be used => it is slow and the conversion may be the reason for high CPU usage. http://sqlperformance.com/2013/04/t-sql-queries/implicit-conversion-costs
    https://www.sqlskills.com/blogs/jonathan/implicit-conversions-that-cause-index-scans/


    Here is not an answer to your question, but a possible solution to the problem. Instead of passing 250 individual parameters to the query for the IN clause use table-valued parameter and pass your values as a table. In the query use JOIN instead of IN. This becomes especially true after your comments that you’ll have 100K parameters (which means that you want to run your query 400 times). In fact, 100K is a bit too much even for a table-valued parameter, so I’d consider having a permanent or temporary helper table, which will hold these Roll_No with proper index. The main query would JOIN to it. Something like this:

    CREATE TABLE RollNumbers (
         Roll_No VARCHAR(100) NOT NULL
         ,PRIMARY KEY (Roll_No)
    );
    

    Make sure there is index in table RollNumbers on Roll_No. Make sure there is index in table student_table on Roll_No. At first INSERT 100K values into RollNumbers and then use them in the main query:

    SELECT Student_Id 
    FROM
        student_table
        INNER JOIN RollNumbers ON RollNumbers.Roll_No = student_table.Roll_No
    

    Depending on the overall system the RollNumbers table could be a permanent table, temporary table or table variable.

    Check the datatype at hibernate level of all the fields used in the query and ensure it matches with you table definition. Frameworks like hibernate uses Unicode supported data types(e.g. nvarchar). Try to change the data type at either of the side.

    Alternatively, you can Add parameter called sendStringParametersAsUnicode in your connection string. It will force hibernate to use varchar instead nvarchar.

    Just give it a try and let us know!

    You’re probably thinking that, because your slow query takes 60 seconds, your “fast” query taking 1 second is actually fast. This is not the case. This execution speed difference keeps you from understanding the actual problem here.

    An additional problem (probably not the actual problem)

    The very simple type of query that you’re running should return results within less than a millisecond if you had an index on Roll_No, regardless if you’re using bind variables or inline values.

    I’m just assuming that you don’t have any indexes apart the ones that are generated from the constraints in your table. So, either, you should add a simple index on Roll_No:

    CREATE INDEX i_student_roll_no ON student_table (Roll_No);
    

    Or you could add an additional column to the above index in order to make it a “covering index” for this query (as explained by Vlad)

    CREATE INDEX i_student_roll_no2 ON student_table (Roll_No, Student_Id);
    

    This would make this particular query even faster, as the execution plan wouldn’t need to hit the disk again to fetch the Student_Id from the table. The information would already be contained in the index. Use covering indexes sparingly, though, as they:

    1. Use up more space, specifically for a medium size table like yours
    2. Work only well as long as your queries are limited to exactly the columns that are really covered, which is unlikely to stay this way in your case.

    How to recognise this using SQL Server Management Studio?

    There’s actually a very nice feature in SQL Server Management Studio. When you turn on execution plans (which you should), you’ll get this additional info about your query:

    SQL Server Management Studio execution plans

    Right click on that info and choose “Missing Index Details…” to get information similar to this one:

    /*
    Missing Index Details from SQLQuery1.sql - 
      LUKAS-ENVY\SQLEXPRESS.test (LUKAS-ENVY\Lukas (52))
    The Query Processor estimates that implementing the 
      following index could improve the query cost by 87.5035%.
    */
    
    /*
    USE [test]
    GO
    CREATE NONCLUSTERED INDEX [<Name of Missing Index, sysname,>]
    ON [dbo].[student_table] ([Roll_No])
    INCLUDE ([Student_Id])
    GO
    */
    

    Why the difference between SQL Server Mgmt Studio and Hibernate?

    Your original question has not yet been answered. Perhaps, the additional index fixes it, but perhaps you didn’t supply all the information. You could have:

    • Bind variable peeking issues
    • N+1 problems in Hibernate (with that many rows)

    By a strange coincidence, I ran into this problem just this week, and it’s not the Hibernate N+1 problem many people here are referring to. I use Amazon Redshift, with a Postgres Java driver.

    For reasons I won’t go into here, I was using a 7,000 parameter long WHERE IN () statement which, when run against the database, results in a 10-second query execution (it’s a complex query on a large table, and we haven’t setup indexing yet–still in development, we haven’t started tuning). When run through Hibernate, there’s a 120-second query execution time.

    I discovered that if you get the actual SQL string out of Hibernate, replace (?,?…?) with a String of the actual values, and run that (still through Hibernate), suddenly everything comes back in 10 seconds again.

    Digging around in the Hibernate internals, it turns out that they do a non-trivial amount of processing per parameter, resulting in an initial CPU spike and bloated execution times for statements with a large number of parameters.

    In addition, once the query’s eventually sent to the database, the database server’s CPU spikes to 100% for the duration when using parameters, but not without parameters. I haven’t checked exact timings for how much of all this processing is happening on which side of the fence, but it looks like using that many parameters is not viable either on the Hibernate side or the database side.

    The solution? Use fewer parameters. Or find a database that supports large parameter sets while remaining performant.

    We’re probably going to switch from Hibernate to jOOQ, since jOOQ allows you to define your own custom SQL fragments that work with the official DSL. Then, we’ll build the IN() clause manually without parameters. We can do this since our IN variables are internal IDs, so SQL injection isn’t a possibility, but if SQL injection is a possibility, make sure to sanitize your inputs.

    It is look like you got not all records when execute query not from hibernate, but in your code all records from query execution placed into collection.

    It’s very easy to determine the issue by running SQL Profiler. You will see exactly what SQL statements are being executed against the database in both scenarios.

    http://www.codeproject.com/Articles/21371/SQL-Server-Profiler-Step-by-Step

    I just point this part of LBushkin answer for you from this post

    Second, when using either IN or OR with a variable number of
    arguments, you are causing the database to have to re-parse the query
    and rebuild an execution plan each time the arguments change. Building
    the execution plan for a query can be an expensive step. Most
    databases cache the execution plans for the queries they run using the
    EXACT query text as a key. If you execute a similar query but with
    different argument values in the predicate – you will most likely
    cause the database to spend a significant amount of time parsing and
    building execution plans. This is why bind variables are strongly
    recommended as a way to ensure optimal query performance.

    So you can try binding variables to prevent running the execution plan each time

    Bind Variables Usage (Parameterized Queries in SQL Server)

    MS SQL Server is a Microsoft SQL Database product, include sql server standard, sql server management studio, sql server express and so on.